Stepwise Verification and Remediation of Student Reasoning Errors with Large Language Model Tutors
This paper develops and evaluates stepwise verification methods for detecting student reasoning errors in math problem-solving, showing that grounding tutor responses in explicit error detection improves feedback quality and reduces hallucinations in LLM-based dialog tutoring systems. The work collects a dataset of 1K student solution chains annotated with reasoning errors and demonstrates that verifier-guided generation produces more targeted and correct responses than direct generation baselines.
Large language models (LLMs) offer many opportunities to scale high-quality personalized tutoring. A promising approach is to build dialog tutoring models that scaffold students' problem-solving. However, even though existing models perform well at solving reasoning questions, they can struggle to precisely detect students' errors and to tailor their feedback to these errors. Inspired by real-world teaching practice, where teachers identify student errors and customize their responses based on them, we focus on verifying student solutions step by step and show that grounding tutor responses in this verification improves the overall quality of the generated feedback.
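To make the verify-then-respond pipeline concrete, here is a minimal Python sketch of how a stepwise verifier's output could ground tutor feedback generation. All names, prompts, and the `llm` callable are illustrative assumptions, not the paper's released implementation.

```python
from typing import Callable, Optional

# Minimal sketch of verifier-guided tutor response generation.
# The `llm` callable, function names, and prompts are illustrative
# assumptions, not the authors' released implementation.

def verify_solution(llm: Callable[[str], str], problem: str,
                    steps: list[str]) -> Optional[int]:
    """Check each solution step in order; return the index of the first
    erroneous step, or None if every step looks correct."""
    for i in range(len(steps)):
        context = "\n".join(steps[: i + 1])
        prompt = (
            f"Problem: {problem}\n"
            f"Student solution so far:\n{context}\n"
            "Is the last step correct? Answer 'yes' or 'no'."
        )
        if llm(prompt).strip().lower().startswith("no"):
            return i
    return None

def generate_feedback(llm: Callable[[str], str], problem: str,
                      steps: list[str], error_step: Optional[int]) -> str:
    """Condition the tutor response on the verifier's output so that
    feedback targets the detected error rather than being generated
    directly from the dialog alone."""
    if error_step is None:
        prompt = (f"Problem: {problem}\n"
                  "The student's solution is correct. Briefly confirm it.")
    else:
        prompt = (
            f"Problem: {problem}\n"
            "Student solution:\n" + "\n".join(steps) + "\n"
            f"The first error is step {error_step + 1}: {steps[error_step]}\n"
            "Give a short hint targeting this error without revealing "
            "the full answer."
        )
    return llm(prompt)
```

In this decoupled setup, the response generator never has to locate the mistake itself; it only has to address the step the verifier flags, which is what the paper credits for the more targeted, less hallucinated feedback.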