NAVI - Navigation Assistance for Visually Impaired
CONCEPT: Most visually impaired people rely on non-visual cues (sound, touch, memory) to navigate. While current assistive technologies (e.g. canes, guide dogs, and smartphone apps) do help, they leave many critical issues unaddressed, making independent movement hazardous. White canes cannot detect overhead hazards like tree branches and signage. Guide dogs need intensive training and cannot interpret text. Smartphones demand constant manual interaction, and without human assistance, text-based cues like street signs cannot be interpreted properly. To resolve these issues, we had to address this sensory gap. To compensate for the vision loss, we essentially had to 'bring back' vision, since the eyes are what allow the brain to perceive the surroundings. To do so, we decided to describe the surroundings via sound instead, arriving at the concept of a 'Third Eye' and 'Second Brain'.
Our SOLUTION: We use a stereo camera setup to understand the 3D space around the user. Below are the core components:
- YOLO (You Only Look Once): A real-time object detection model. It identifies and labels objects like arrows, chairs, doors, and people, and also gives their positions in the frame. To improve its classification accuracy and capabilities, we will fine-tune the YOLO model on the datasets mentioned in Section 4. Fine-tuning is the process of continuing to train a pre-trained model on a new dataset so that it predicts certain objects more accurately (a sketch of this step appears after this list).
- OCR (Optical Character Recognition): It reads any printed or written text in the video feed. For example, if there's a board that says "EXIT" or "WASHROOM", OCR helps the system extract and understand that text (see the OCR sketch after this list).
- Stereo Vision + Depth Estimation: Just like human eyes, we use two cameras placed slightly apart. This lets us calculate how far each object or piece of text is from the user. From the resulting depth map, we can estimate the number of steps the user needs to walk to reach the target (see the depth sketch after this list).
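
As an illustration of the fine-tuning step, here is a minimal sketch using the Ultralytics YOLO Python API. The dataset file `navi_dataset.yaml`, the checkpoint choice, and the training settings are assumptions standing in for the datasets mentioned in Section 4, not our final configuration.

```python
# Fine-tuning sketch (assumes the `ultralytics` package is installed and that
# navi_dataset.yaml describes our labelled images -- both are placeholders).
from ultralytics import YOLO

# Start from a pre-trained checkpoint so the model already knows generic objects.
model = YOLO("yolov8n.pt")

# Continue training on our own classes (doors, arrows, signage, etc.).
model.train(data="navi_dataset.yaml", epochs=50, imgsz=640)

# Run detection on a single camera frame; each result carries boxes, classes, and confidences.
results = model("frame.jpg")
for box in results[0].boxes:
    print(box.cls, box.conf, box.xyxy)  # class id, confidence, pixel coordinates
```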
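For the OCR step, a hedged sketch is shown below using `pytesseract` on a single frame, assuming OpenCV and the Tesseract binary are installed; the file name and thresholding choice are illustrative.

```python
# OCR sketch: read text such as "EXIT" from one camera frame.
# Assumes OpenCV and pytesseract are installed and Tesseract is on PATH.
import cv2
import pytesseract

frame = cv2.imread("frame.jpg")                      # stand-in for a live camera frame
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)       # OCR works better on grayscale
_, binary = cv2.threshold(gray, 0, 255,
                          cv2.THRESH_BINARY + cv2.THRESH_OTSU)  # boost sign contrast

text = pytesseract.image_to_string(binary)
print("Detected text:", text.strip())                # e.g. "EXIT"
```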
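Finally, a sketch of the depth-to-steps idea using OpenCV's semi-global block matcher on a rectified stereo pair. The focal length, baseline, stride length, and target pixel are made-up values for illustration only.

```python
# Stereo depth sketch: disparity -> depth -> number of steps.
# Assumes the left/right images are already rectified; all constants are illustrative.
import cv2
import numpy as np

FOCAL_PX = 700.0      # camera focal length in pixels (assumed calibration value)
BASELINE_M = 0.06     # distance between the two cameras in metres (assumed)
STRIDE_M = 0.7        # average step length of the user in metres (assumed)

left = cv2.imread("left.jpg", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.jpg", cv2.IMREAD_GRAYSCALE)

# Semi-global block matching gives a disparity value per pixel.
matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64, blockSize=9)
disparity = matcher.compute(left, right).astype(np.float32) / 16.0  # SGBM output is fixed-point

# Depth (metres) = focal_length * baseline / disparity, valid where disparity > 0.
target_disp = disparity[240, 320]          # e.g. the pixel where YOLO/OCR found the target
if target_disp > 0:
    depth_m = FOCAL_PX * BASELINE_M / target_disp
    steps = round(depth_m / STRIDE_M)
    print(f"Target is about {depth_m:.1f} m away (~{steps} steps).")
```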