CANINE: Coaching Visually Impaired Users for Interactive Navigation with a Robot Guide Dog

Cunjun Yu1, Zishuo Wang1, Anxing Xiao1, Linfeng Li1, David Hsu1,2
1School of Computing, National University of Singapore 2Smart Systems Institute, National University of Singapore
RSS 2026

CANINE helps users practice doorway navigation with a robot guide dog by selecting weak sub-skills and generating targeted feedback after each episode.

Abstract

Robot guide dogs can expand independent mobility for visually impaired users, but effective use requires subtle human-robot coordination that is difficult to learn from generic instructions. CANINE is an automated coaching system for interactive navigation with robot guide dogs. It decomposes navigation into sub-skills, tracks learner proficiency with knowledge tracing, prioritizes weak areas for practice, and uses foundation models to infer error causes from episode observations and generate personalized verbal corrections.

A controlled proxy-user study with blindfolded participants shows that CANINE improves learning efficiency and final navigation performance compared with generic verbal instruction. A two-week retention study shows lasting skill improvement, and an exploratory case study with a visually impaired user confirms the value of sub-skill practice, personalized feedback, and adaptive curriculum design while surfacing deployment needs such as multimodal grounding and user control over feedback.

Two-Level Coaching

CANINE system overview

The system separates curriculum selection from episode-level feedback generation.

Inter-skill coaching: CANINE decomposes doorway navigation into Navigate to Door, Open Door, and Enter Room. A Gaussian knowledge tracing model estimates latent proficiency for each sub-skill and selects the weakest one for the next practice episode.
Intra-skill coaching: A chest-mounted camera captures the episode. A VLM extracts structured frame states; an LLM then summarizes the timeline, diagnoses likely error causes, and generates concise verbal feedback.
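The inter-skill loop can be sketched as a per-skill Gaussian belief updated after each episode, with the weakest skill chosen next. This is a minimal illustration, not the paper's implementation; the class names, noise parameters, and 0-to-1 score scale are assumptions.

```python
# Minimal sketch of Gaussian knowledge tracing for curriculum selection.
# All names and parameter values here are illustrative assumptions.
class SkillBelief:
    def __init__(self, mean=0.0, var=1.0):
        self.mean = mean  # estimated proficiency
        self.var = var    # uncertainty about that estimate

    def update(self, score, obs_var=0.25, drift_var=0.01):
        # Predict: proficiency may drift between practice episodes.
        var = self.var + drift_var
        # Correct: Kalman-style update toward the episode score in [0, 1].
        gain = var / (var + obs_var)
        self.mean += gain * (score - self.mean)
        self.var = (1.0 - gain) * var


def select_next_skill(beliefs):
    # Practice the sub-skill with the lowest estimated proficiency.
    return min(beliefs, key=lambda s: beliefs[s].mean)


beliefs = {s: SkillBelief() for s in ("Navigate to Door", "Open Door", "Enter Room")}
beliefs["Navigate to Door"].update(0.9)
beliefs["Open Door"].update(0.3)
beliefs["Enter Room"].update(0.6)
print(select_next_skill(beliefs))  # → Open Door
```

Selecting by lowest mean is the simplest policy; a deployed curriculum might instead trade off weakness against uncertainty (e.g., a lower-confidence-bound rule).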

Robot Guide Dog Platform

Robot guide dog hardware setup

The platform uses a Unitree Go2 quadruped robot with onboard computation, LiDAR-based navigation, a rigid leash interface, a chest-mounted action camera for first-person observation, a microphone for user feedback, and a speaker for generated instructions.

The navigation task focuses on doorway interaction, where users must coordinate position, timing, door handling, and robot-following without visual feedback.

Human Subject Study Results

CANINE improves the interaction-heavy parts of doorway navigation and is perceived as more useful than generic task-level instruction.

Objective gains. Larger improvement and faster final completion on Open Door and Enter Room.
User experience. Participants rated CANINE as more useful and valued adaptive sub-skill practice.
Retention. Follow-up results suggest the learned skills persisted after two weeks.

Performance improvement across sub-skills.


Final completion time after training.


Subjective evaluation.


Two-week retention.

From Frames to Coaching

Timeline summarization example

CANINE avoids direct free-form video prompting. Instead, it samples frames, extracts symbolic states, builds an episode timeline, and then generates feedback from the structured summary. This decomposition makes the coach's reasoning easier to inspect and keeps feedback grounded in the observed episode.
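The decomposition above can be sketched end to end: per-frame symbolic states are collapsed into a phase timeline, and feedback is generated from that summary rather than from raw video. The state schema and the rule standing in for the LLM diagnosis step are illustrative assumptions, not the paper's prompts or models.

```python
from dataclasses import dataclass

# Illustrative frames-to-feedback sketch. The FrameState schema and the
# rule-based stand-in for the VLM/LLM calls are assumptions.
@dataclass
class FrameState:
    t: float           # seconds into the episode
    phase: str         # e.g. "approach", "grasp_handle", "pass_through"
    door_open: bool
    hand_on_handle: bool


def build_timeline(states):
    # Collapse per-frame states into (start, end, phase) segments.
    timeline, start = [], states[0]
    for prev, cur in zip(states, states[1:]):
        if cur.phase != prev.phase:
            timeline.append((start.t, prev.t, start.phase))
            start = cur
    timeline.append((start.t, states[-1].t, start.phase))
    return timeline


def diagnose(timeline):
    # Toy rule standing in for LLM diagnosis: flag a slow handle grasp.
    for t0, t1, phase in timeline:
        if phase == "grasp_handle" and t1 - t0 > 5.0:
            return "Slide your hand along the door edge to find the handle sooner."
    return "Good episode; keep pace with the robot through the doorway."


states = [
    FrameState(0.0, "approach", False, False),
    FrameState(2.0, "grasp_handle", False, True),
    FrameState(9.0, "grasp_handle", True, True),
    FrameState(10.0, "pass_through", True, False),
]
print(diagnose(build_timeline(states)))
```

Because the feedback is computed from an explicit timeline, each verbal correction can be traced back to the segment of the episode that triggered it.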

In the human-judged VLM pipeline comparison, CANINE achieved the highest overall preference rate, with judges favoring its feedback over direct video prompting and structured full-video prompting.

Visually Impaired User Case Study

An exploratory session with a totally blind participant showed learning progression on the same navigation structure used in the controlled study. The participant improved Open Door completion time from 22.4 seconds to 9.1 seconds and described the feedback as accurate.

The case study also revealed deployment considerations that matter beyond proxy-user experiments: feedback should be grounded in haptic cues when language is not enough, and users should be able to pause, replay, or skip coaching to manage cognitive load.


Extending Beyond Navigation

Robotic handover setup

CANINE was also instantiated for robotic handover, where a manipulator hands objects to a user who cannot see the robot's approach. The task shares the core coaching challenge of coordinating human and robot action under asymmetric perception.

The handover study suggests that different tasks may require different feedback timing. Terminal feedback worked for doorway navigation, while handover benefited from concurrent guidance for hand placement and grasp timing.

BibTeX

@inproceedings{yu2026canine,
  title = {CANINE: Coaching Visually Impaired Users for Interactive Navigation with a Robot Guide Dog},
  author = {Yu, Cunjun and Wang, Zishuo and Xiao, Anxing and Li, Linfeng and Hsu, David},
  booktitle = {Robotics: Science and Systems},
  year = {2026}
}