StreamCoach: Real-Time Coaching of Human Physical Skills with Large Language Models

Cunjun Yu, Deepak Edakkattil Gopinath, Andrew Silva, Pradyumna Tambwekar, Emily Sumner, Laporsha Dees, Guy Rosman, John J. Leonard, Avinash Balachandran, David Hsu
Preprint, 2025
StreamCoach teaser

StreamCoach separates fast intervention timing from slower instruction generation for concurrent coaching in high-performance driving.

Abstract

Real-time, language-based coaching of human physical skills has the potential to accelerate skill acquisition in fast-paced domains such as driving, sports, and surgery. Building an effective AI coach is difficult because it must decide both when to intervene and what to say. Traditional methods rely on fixed rules and instruction sets, while large language models can generate more flexible feedback but struggle with real-time responsiveness and domain-specific supervision.

StreamCoach decomposes coaching into two stages: a fast stage that detects intervention opportunities by comparing the learner's current state embedding to past coaching moments, and a slower retrieval-augmented generation stage that produces context-aware instructions from relevant prior examples. In a high-performance driving domain, StreamCoach improves both feedback timing and instruction quality, suggesting a scalable framework for concurrent coaching with language.

Fast-Slow Coaching

StreamCoach fast-slow inference framework

The same semantic representation supports both rapid timing decisions and grounded feedback generation.

Fast inference: StreamCoach embeds the learner's current state and compares it with stored examples of past coaching moments to decide whether feedback is needed now.
Slow inference: When an intervention is triggered, the same embedding retrieves relevant expert examples that condition an LLM to generate domain-specific feedback.
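The fast-then-slow pipeline above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the embedding bank, instructions, threshold, and prompt template are all hypothetical stand-ins, and the similarity search is a plain cosine comparison over stored coaching-moment embeddings.

```python
import numpy as np

def cosine_sim(query, bank):
    # Cosine similarity between one query vector and each row of the bank.
    return (bank @ query) / (np.linalg.norm(bank, axis=1) * np.linalg.norm(query) + 1e-8)

# Hypothetical bank of past coaching-moment embeddings, paired with the
# instruction an expert gave at each moment (stand-ins for real data).
bank = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
instructions = ["Brake later", "Open the wheel earlier", "Carry more speed"]

def fast_stage(state_emb, threshold=0.9):
    # Fast stage: intervene only if the learner's current state is close
    # enough to some stored coaching moment.
    sims = cosine_sim(state_emb, bank)
    return bool(sims.max() >= threshold), sims

def slow_stage(sims, k=2):
    # Slow stage: retrieve the top-k nearest expert examples and fold them
    # into a prompt that conditions the LLM (prompt assembly shown as a
    # plain string here).
    top = np.argsort(sims)[::-1][:k]
    examples = "\n".join(instructions[i] for i in top)
    return f"Past expert feedback in similar states:\n{examples}\nCoach the learner now."

state = np.array([0.9, 0.1])       # current state embedding (toy)
intervene, sims = fast_stage(state)
prompt = slow_stage(sims) if intervene else None
```

Because the fast stage is only a nearest-neighbor check, it can run at control-loop rates, while the LLM call in the slow stage fires only on triggered interventions.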

Evaluation

StreamCoach was evaluated on a simulated high-performance driving dataset with expert coaching annotations.

Timing. Embedding-based retrieval identifies teachable moments without dense hand-coded rules.
Content. Retrieval-augmented examples help generated instructions align with expert guidance.
Generalization. Evaluation uses participant-level splits to test on unseen learners.
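A participant-level split means every sample from a held-out learner lands entirely in the test set. A minimal sketch of such a split, with hypothetical (participant_id, lap) pairs standing in for the real data:

```python
import random

def participant_split(samples, test_frac=0.25, seed=0):
    # Split at the participant level: all samples from a held-out
    # participant go to the test set, so evaluation is on unseen learners
    # rather than unseen laps from seen learners.
    participants = sorted({pid for pid, _ in samples})
    rng = random.Random(seed)
    rng.shuffle(participants)
    n_test = max(1, int(len(participants) * test_frac))
    test_ids = set(participants[:n_test])
    train = [s for s in samples if s[0] not in test_ids]
    test = [s for s in samples if s[0] in test_ids]
    return train, test

# Hypothetical (participant_id, lap) records.
data = [("p1", "lap_a"), ("p1", "lap_b"), ("p2", "lap_c"),
        ("p3", "lap_d"), ("p4", "lap_e")]
train, test = participant_split(data)
```

The key property is that the train and test participant sets are disjoint, which a random per-sample split would not guarantee.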
LLM-as-judge evaluation of generated coaching instructions

Instruction quality of generated coaching feedback, judged by an LLM evaluator against baseline methods.

Ablation on retrieval top-k

Ablation on retrieval context size.

BibTeX

@article{yu2025streamcoach,
  title = {StreamCoach: Real-Time Coaching of Human Physical Skills with Large Language Models},
  author = {Yu, Cunjun and Gopinath, Deepak Edakkattil and Silva, Andrew and Tambwekar, Pradyumna and Sumner, Emily and Dees, Laporsha and Rosman, Guy and Leonard, John J. and Balachandran, Avinash and Hsu, David},
  year = {2025}
}