This project aims to create PathFinder, a mobile application designed specifically to help visually impaired people navigate in real time. By incorporating computer vision and object recognition technologies, the app provides users with directions and support in avoiding obstacles. It is designed as a user-oriented solution that visually impaired people can use independently and safely while exploring the physical environment.
The scope of the project includes object detection features, GPS tracking, and audio support for users during real-time navigation both indoors and outdoors. The method combines deep learning approaches, specifically Convolutional Neural Networks (CNNs), with continuous GPS data for object detection and accurate navigation assistance.
The project assumes the use of smartphones with sufficient processing power and acknowledges that real-time interaction is essential for effective navigation. The expected benefits include greater mobility, efficient obstacle detection, and an effective audio mechanism that enhances the navigation experience of the visually impaired.
Project Overview:
The goal of this project is to create a mobile app that assists visually impaired people with navigation on the go. The target audience is visually impaired users who need better support for moving around. The project combines object recognition technology and automatic GPS positioning with audio responses for both outdoor and indoor navigation. The methodology includes implementing deep learning models such as CNNs for object identification and integrating the results with GPS for navigation guidance. The project assumes that users have smartphones with sufficient processing power and that real-time response is vital for navigation. The expected outcomes are improved mobility, fewer undetected obstacles, and an efficient audio navigation system.
Background:
The motivation for this project stems from the fact that visually impaired people have difficulty moving independently through their surroundings. Although navigation systems designed for the visually impaired certainly help, most of them cannot detect surrounding objects and provide feedback in real time. The problem statement addresses the absence of a full-fledged, real-time navigation solution for this group of users. The project draws on work in assistive technology, computer vision, and object recognition. The major constraints involve the need for fast computation, timely feedback, and an interface that conveys context to the user.
Problem Statement:
The proposed mobile application provides real-time navigation support to visually impaired users through object detection, integrated GPS, and audio responses. It serves visually impaired users who need better mobility support. Employing deep learning techniques, namely Convolutional Neural Networks for object detection, together with GPS for navigation, the application provides effective indoor and outdoor guidance. Key results include enhanced mobility, reliable obstacle detection, and a robust, audio-based navigation solution that ensures the independence and safety of visually impaired individuals.
Existing navigation assistance solutions designed specifically for the visually impaired show promise in real-time processing, accuracy, and user accessibility, but they generally remain limited. Early GPS-based systems allow users to navigate outdoors; however, they detect obstacles poorly. Other innovations, such as Microsoft's Seeing AI app and the NAVI system, provide audio feedback with object recognition but usually lack indoor navigation and precise object detection. Common deficiencies include restricted real-time processing and incomplete integration of GPS with object recognition. This project instead uses advanced deep learning algorithms, such as YOLO, combined with GPS for complete navigation assistance.
Smart Stick Navigation System: Gharghan et al. (2024) proposed a navigation system for visually impaired users that combines sensor data with machine learning methods to improve obstacle detection and provide real-time assistance, supporting better mobility, independence, and more informed decisions while navigating.
Be My Eyes App: This application connects visually impaired users with sighted volunteers through video calls, providing real-time assistance with tasks such as reading labels or navigating. It shows how community support and technology together foster independence.
Seeing AI: Developed by Microsoft, Seeing AI helps blind and visually impaired users identify products, read text, and recognize currency, further supporting user autonomy and daily function.
YOLOv3: An Incremental Improvement: In 2018, Redmon and Farhadi presented YOLOv3, an improved real-time object detection model that balances speed and precision; its successors, including the YOLOv8 model used in this project, make the approach highly applicable to mobile navigation assistance for visually impaired users.
A Comprehensive Review of Navigation Systems for Visually Impaired Individuals: This paper reviews existing navigation technologies for visually impaired users, assessing the effectiveness of current systems and the challenges they face, and outlining the scope for potential improvements in mobility support.
System Architecture
Indoor Navigation Aid with Object Recognition Capability: The overall architecture comprises a mobile application that integrates computer vision, GPS, and object detection. Live video captured by the user's smartphone camera is processed in real time to detect objects and identify obstacles, while GPS data keeps users oriented along their desired route during outdoor navigation. By combining visual and spatial data, the application identifies obstacles and safe paths, and the recognized objects and directional cues are returned to the mobile application, where text-to-speech functionality delivers audio feedback to the user. This architecture supports efficient real-time processing and delivers accurate navigation assistance immediately.
The architecture also includes Mapbox for navigation, enhancing the user experience with accurate mapping and routing.
The Mapbox Navigation SDK provides real-time turn-by-turn navigation, integrating with GPS for precise positioning and route guidance. This is complemented by Mapbox's Places API, which provides additional contextual data by identifying nearby landmarks, such as buildings or transit stops, so users can understand their location and surroundings in greater detail. After object detection data is processed, this spatial information is delivered to the user as audio feedback synthesized with text-to-speech. This multilayer architecture enables real-time object detection and route planning, ensuring efficient and effective navigation. Recognized objects, obstacles, and directional prompts are converted into audio prompts that keep users informed about the environment they are navigating and independent in their movements.
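To make the routing step more concrete, the following is a minimal Python sketch of how a walking route could be requested from the Mapbox Directions API over HTTP. It is illustrative only: the mobile application itself would use the Mapbox Navigation SDK on-device, and the access token and coordinates below are placeholders.

import requests

MAPBOX_TOKEN = "YOUR_MAPBOX_ACCESS_TOKEN"  # placeholder access token

def fetch_walking_route(start_lon, start_lat, end_lon, end_lat):
    """Request a walking route and return its turn-by-turn instructions."""
    coords = f"{start_lon},{start_lat};{end_lon},{end_lat}"
    url = f"https://api.mapbox.com/directions/v5/mapbox/walking/{coords}"
    params = {
        "access_token": MAPBOX_TOKEN,
        "geometries": "geojson",  # return the route geometry as GeoJSON
        "steps": "true",          # include per-maneuver instructions
    }
    response = requests.get(url, params=params, timeout=10)
    response.raise_for_status()
    route = response.json()["routes"][0]
    # Each step carries a human-readable instruction that can be spoken aloud.
    return [step["maneuver"]["instruction"]
            for leg in route["legs"] for step in leg["steps"]]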
Navigation Assistance Systems
Navigation assistance systems provide obstacle detection and directional guidance to visually impaired users. Typically, integrated GPS supports outdoor navigation, while object detection makes obstacle avoidance possible.
In smartphone-based systems, obstacle detection relies on the device's camera and sensors, and guidance is delivered audibly. In our Navigation Assistance app, for example, real-time object detection is applied to the camera feed to guide users around obstacles through audio cues, helping them navigate indoor and outdoor spaces using object detection combined with GPS.
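As a simple illustration (not the project's exact code), the small function below shows how a detected bounding box's horizontal position in the frame can be turned into a left/ahead/right audio phrase.

def direction_cue(x1, x2, frame_width, label):
    """Map a bounding box's horizontal position to a spoken direction cue."""
    center_x = (x1 + x2) / 2
    if center_x < frame_width / 3:
        position = "on your left"
    elif center_x > 2 * frame_width / 3:
        position = "on your right"
    else:
        position = "ahead"
    return f"{label} {position}"

# Example: a box spanning x = 40..200 in a 640-pixel-wide frame
print(direction_cue(40, 200, 640, "chair"))  # -> "chair on your left"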
Object Detection
Below is a description of the methodology applied in developing the Navigation Assistance application. The smartphone camera captures live video for object detection using deep learning models. Once the camera is on, the app analyzes the environment to detect objects and fuses GPS data to generate navigation cues both indoors and outdoors. YOLO and similar models are used for fast and accurate identification of obstacles. Detected objects are tagged with their position, adding context to the information. This information is then fed into the text-to-speech module, which synthesizes audio descriptions delivered through earphones, allowing users to avoid obstacles and follow paths for confident, independent navigation.
Key Functionalities
Navigation Assistance: GPS and computer vision work together to provide navigation support within the application. GPS tracks routes and destinations outdoors, while computer vision identifies objects and obstacles indoors, ensuring continuous, effective guidance across environments.
Real-time Video Processing: Real-time video processing is central to this application, allowing immediate object detection and navigation feedback. The smartphone camera captures continuous video, which is processed frame by frame either on-device or remotely. This lets the application update its picture of the surroundings instantly for safe and responsive navigation.
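As a rough sketch of the remote-processing option, the code below JPEG-encodes a single frame and posts it to a detection server. The server URL and the response fields are assumptions made for illustration, not part of a published API.

import cv2
import requests

SERVER_URL = "http://localhost:5000/detect"  # assumed server endpoint

def detect_remotely(frame):
    """Send one video frame to the server and return its detections."""
    ok, jpeg = cv2.imencode(".jpg", frame)  # compress the frame for transport
    if not ok:
        return []
    response = requests.post(
        SERVER_URL,
        files={"frame": ("frame.jpg", jpeg.tobytes(), "image/jpeg")},
        timeout=5,
    )
    response.raise_for_status()
    return response.json().get("detections", [])  # assumed response shape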
Object Detection: The app identifies objects around the user using deep learning models. CNN-based detectors such as the YOLO family analyze each frame accurately and identify a large number of object classes quickly. The app alerts users to relevant obstacles, whether a curb, a pole, or a vehicle, so they are aware of hazards ahead.
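The sketch below illustrates one way to keep only the detections that matter for navigation and that pass a confidence threshold; the class set and threshold are example choices, not values prescribed by the project.

OBSTACLE_CLASSES = {"person", "bicycle", "car", "motorbike", "bus", "truck",
                    "chair", "bench", "fire hydrant", "stop sign"}

def filter_obstacles(detections, min_confidence=0.5):
    """detections: list of (class_name, confidence) pairs from the detector."""
    return [(name, conf) for name, conf in detections
            if name in OBSTACLE_CLASSES and conf >= min_confidence]

# Example
print(filter_obstacles([("car", 0.91), ("kite", 0.80), ("person", 0.42)]))
# -> [('car', 0.91)]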
Distance Estimation: Distance estimation enables the Navigation Assistance application to judge how close a detected object is, informing users whether it is near, at a safe distance, or farther away. The app can estimate distance through stereo vision or depth estimation methods. Distance is one of the key parameters for obstacle avoidance, so making users fully aware of nearby hazards improves navigation safety.
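One simple way to approximate distance from a single camera is the pinhole camera model, which relates an object's known real-world width to the width of its bounding box in pixels. The sketch below is illustrative only; the focal length and object widths are assumed values, and stereo vision or learned depth estimation could be used instead, as noted above.

FOCAL_LENGTH_PX = 700  # assumed focal length of the camera, in pixels
KNOWN_WIDTHS_M = {"person": 0.5, "car": 1.8, "chair": 0.45}  # rough real-world widths

def estimate_distance(label, box_width_px):
    """distance ~ (real width * focal length) / bounding-box width in pixels"""
    real_width = KNOWN_WIDTHS_M.get(label)
    if real_width is None or box_width_px <= 0:
        return None
    return real_width * FOCAL_LENGTH_PX / box_width_px

# Example: a car whose bounding box is 300 px wide
print(f"{estimate_distance('car', 300):.1f} m")  # -> 4.2 m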
Audio Feedback: Audio feedback is the primary medium for delivering information to visually impaired users in real time in our Navigation Assistance app. Using text-to-speech technology, the app provides descriptive audio of the surroundings, for example, "Car coming from your left, five meters away" or "Sidewalk ends in two meters." It speaks these descriptions continuously while maintaining smooth, hands-free navigation that is safer and more independent.
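A minimal sketch of the text-to-speech step is shown below using the pyttsx3 offline library. This is only one possible prototyping choice; a production mobile app would typically call the platform's native TTS engine.

import pyttsx3

engine = pyttsx3.init()
engine.setProperty("rate", 170)  # speaking rate in words per minute

def speak(message):
    """Queue a message and speak it aloud."""
    engine.say(message)
    engine.runAndWait()

speak("Car coming from your left, five meters away")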
Usage
Launch the Application: Open the app on your mobile device and ensure it is set up and connected to necessary servers for real-time processing and object detection.
Server Running: Confirm that any required server for processing tasks, such as object detection and distance estimation, is operational.
Activate and Position the Camera: Turn on the smartphone camera and aim it to capture a clear view of the surroundings for better detection accuracy.
Real-time Video Capture: The app starts capturing live video, processing frames to identify objects using computer vision and deep learning models.
Audio Descriptions for Guidance: Detected objects and navigation cues are converted to audio descriptions and delivered through earphones for hands-free navigation.
GPS-Enhanced Outdoor Navigation: Utilize GPS for accurate directional guidance, helping users follow paths and receive real-time updates about their location; a simple distance calculation is sketched below.
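As a small worked example of the kind of calculation behind such location updates, the sketch below computes the haversine (great-circle) distance from the current GPS fix to the next route waypoint; the coordinates are placeholder values.

import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    r = 6371000.0  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Example: distance from the current fix to the next waypoint
print(round(haversine_m(40.7128, -74.0060, 40.7138, -74.0060)))  # about 111 m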
Voice Command Functions (a command-dispatch sketch follows this list):
Camera: Activates roam mode to explore the surroundings.
Navigation: Enters navigation mode for guided assistance.
Go Back: Returns to the previous page in the app.
(in Navigation Mode): Initiates navigation to the specified location.
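The sketch below shows one minimal way to dispatch these voice commands to app modes once speech has been transcribed. The mode names and handlers are placeholders; the speech-to-text step itself would come from the platform's recognizer or a package such as SpeechRecognition.

def handle_command(phrase):
    """Route a transcribed voice command to the matching app mode."""
    phrase = phrase.lower().strip()
    if "camera" in phrase:
        return "roam_mode"        # explore the surroundings
    if "navigation" in phrase:
        return "navigation_mode"  # guided assistance
    if "go back" in phrase:
        return "previous_page"
    return "unknown_command"

print(handle_command("Navigation"))  # -> "navigation_mode"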
The results of the project demonstrate the efficiency of the real-time object detection system in detecting obstacles and instantly providing audio feedback, thereby increasing users' awareness of their environment. During testing, the application identified numerous objects and obstacles, such as people, bicycles, cars, chairs, laptops, and bottles, in both indoor and outdoor locations.
A key success of the system was processing video frames in real time, enabling seamless detection without noticeable lag. This responsiveness ensures that audio alerts are timely, which is important for users adjusting their movements to avoid obstacles with ease. The incorporation of GPS data also proved successful for outdoor navigation, where the app accurately guided users along routes and helped them identify important landmarks. Combined with object detection, GPS data allows smooth and safe navigation even in crowded or complex environments.
The deep learning models used for object recognition, especially YOLO, proved efficient and accurate. The system maintained high precision when detecting multiple moving objects, a critical capability in dynamic environments. Text-to-speech provided clear, concise audio descriptions. Testing showed, however, that the quality of object detection and navigation guidance depends on camera angle and lighting conditions. For example, low-light settings reduce detection accuracy, so improving recognition under low light is a possible direction for future development.
Overall, these results support the reliability of the system as a real-time navigation aid, considerably enhancing the independence and mobility of visually impaired users. Further work may focus on refining object detection under varied conditions, improving distance estimation, and integrating additional sensory feedback techniques that further support navigation.
Testing revealed that the proposed object recognition system using YOLO yields accurate, low-latency results, an important factor for user experience. The model detects multiple objects in real time while providing timely feedback to visually impaired users during navigation.
GPS integration was effective for outdoor navigation, providing accurate location and direction with only minor deviation. Screenshots from navigation mode clearly show routes with real-time positioning, demonstrating the system's ability to keep users oriented. The combination of GPS and object detection significantly enhances user safety by providing both route guidance and obstacle detection.
User testing showed significant improvements in navigation and obstacle detection, allowing users to make their way independently through complex environments. Feedback reinforced that object detection integrated with navigation assistance works far more effectively, providing superior mobility support.
Real-time object detection coupled with GPS creates a full-fledged navigation aid that considerably enhances both spatial and situational awareness for visually impaired users. This integration underlines the value of robust, user-friendly assistive navigation solutions that enrich mobility and environmental awareness.
The project successfully delivered the mobile application "Navigation Assistance with Real-Time Object Detection for Visually Impaired" to improve the mobility of visually impaired people. State-of-the-art technologies such as computer vision and GPS deliver accurate and timely feedback, substantially enhancing the user's ability to navigate safely and independently in various environments.
More than simply demonstrating a particular deep learning model, the project shows that deep learning detectors such as YOLO, integrated with GPS navigation, effectively address the limitations of earlier solutions. This fills critical gaps and gives users altogether more reliable and efficient navigation.
In future work, further refinement and development will improve the application's usability and performance. The major achievement of this work is the design of a system that enhances obstacle detection and provides reliable navigation. The application gives visually impaired people independence and safety and takes assistive technology for better mobility one step ahead.
[1] Sadik Kamel Gharghan, H. S. Kamel, Asaower Ahmad Marir, and Lina Akram Saleh, “Smart Stick Navigation System for Visually Impaired Based on Machine Learning Algorithms Using Sensors Data,” Journal of Sensor and Actuator Networks, vol. 13, no. 4, pp. 43–43, Aug. 2024, doi: https://doi.org/10.3390/jsan13040043.
[2] Be My Eyes, “Be My Eyes - Bringing sight to blind and low-vision people,” Bemyeyes.com, 2019. https://www.bemyeyes.com/
[3] “Seeing AI: New Technology Research to Support the Blind and Visually Impaired Community,” Microsoft Accessibility Blog, Apr. 07, 2016. https://blogs.microsoft.com/accessibility/seeing-ai/
[4] T. Y. Mahesh, S. S. Parvathy, S. Thomas, S. R. Thomas, and T. Sebastian, “CICERONE- A Real Time Object Detection for Visually Impaired People,” IOP Conference Series: Materials Science and Engineering, vol. 1085, no. 1, p. 012006, Feb. 2021, doi: https://doi.org/10.1088/1757-899x/1085/1/012006.
[5] Mustufa Haider Abidi, Arshad Noor Siddiquee, Hisham Alkhalefah, and V. Srivastava, “A Comprehensive Review of Navigation Systems for Visually Impaired Individuals,” Heliyon, vol. 10, no. 11, pp. e31825–e31825, Jun. 2024, doi: https://doi.org/10.1016/j.heliyon.2024.e31825.
We extend our heartfelt gratitude to the dedicated researchers and developers in the field of computer vision and deep learning, whose groundbreaking work laid the foundation for this project. Special thanks to our mentors and advisors for their invaluable guidance and feedback throughout this endeavor. We also appreciate the contributions of the open-source community for providing the tools and frameworks that made this project possible. Lastly, we acknowledge the visually impaired community for inspiring us to develop PathFinder and for their feedback in shaping a user-centric solution.
Technical Specifications
Hardware Requirements: Smartphone with a quad-core processor or higher.
Minimum 4GB RAM.
Built-in GPS and camera capabilities.
Software Requirements: Android or iOS operating system.
Python programming language.
Deep learning frameworks: TensorFlow; the object detection code below additionally uses the Ultralytics YOLO package (PyTorch-based).
OpenCV library for image processing.
Data Used: Pretrained models for object detection (e.g., YOLO).
Custom datasets for indoor and outdoor navigation scenarios.
Python Code for Object Detection:
from ultralytics import YOLO  # YOLOv8 model wrapper
import cv2
import math

cap = cv2.VideoCapture(0)  # start the webcam
cap.set(3, 640)  # frame width
cap.set(4, 480)  # frame height

model = YOLO("yolo-Weights/yolov8n.pt")  # load the pretrained model

# COCO class names in the order the model predicts them
classNames = ["person", "bicycle", "car", "motorbike", "aeroplane", "bus", "train", "truck", "boat",
              "traffic light", "fire hydrant", "stop sign", "parking meter", "bench", "bird", "cat",
              "dog", "horse", "sheep", "cow", "elephant", "bear", "zebra", "giraffe", "backpack", "umbrella",
              "handbag", "tie", "suitcase", "frisbee", "skis", "snowboard", "sports ball", "kite", "baseball bat",
              "baseball glove", "skateboard", "surfboard", "tennis racket", "bottle", "wine glass", "cup",
              "fork", "knife", "spoon", "bowl", "banana", "apple", "sandwich", "orange", "broccoli",
              "carrot", "hot dog", "pizza", "donut", "cake", "chair", "sofa", "pottedplant", "bed",
              "diningtable", "toilet", "tvmonitor", "laptop", "mouse", "remote", "keyboard", "cell phone",
              "microwave", "oven", "toaster", "sink", "refrigerator", "book", "clock", "vase", "scissors",
              "teddy bear", "hair drier", "toothbrush"]

while True:
    success, img = cap.read()          # read one frame from the camera
    results = model(img, stream=True)  # run detection on the frame

    # loop over the detection results
    for r in results:
        boxes = r.boxes  # bounding boxes found in this frame
        for box in boxes:
            # bounding box coordinates
            x1, y1, x2, y2 = box.xyxy[0]
            x1, y1, x2, y2 = int(x1), int(y1), int(x2), int(y2)  # convert to int values
            cv2.rectangle(img, (x1, y1), (x2, y2), (255, 0, 255), 3)  # draw the box on the frame

            confidence = math.ceil(float(box.conf[0]) * 100) / 100  # confidence rounded up to two decimals
            print("Confidence --->", confidence)

            cls = int(box.cls[0])  # predicted class index
            print("Class name -->", classNames[cls])

            # draw the class label above the bounding box
            org = (x1, y1)
            font = cv2.FONT_HERSHEY_SIMPLEX
            fontScale = 1
            color = (255, 0, 0)
            thickness = 2
            cv2.putText(img, classNames[cls], org, font, fontScale, color, thickness)

    cv2.imshow('Webcam', img)  # display the annotated frame
    if cv2.waitKey(1) == ord('q'):  # press 'q' to quit
        break

cap.release()
cv2.destroyAllWindows()