Stepwise Verification and Remediation of Student Reasoning Errors with Large Language Model Tutors
This paper develops and evaluates stepwise verification models that detect errors in student mathematical reasoning chains before generating tutor responses, and contributes a dataset of 1K student solutions annotated with error identifications. The approach tests whether explicitly verifying student work before response generation improves tutoring quality, compared to direct response generation, in multi-step math problem solving.
Large language models (LLMs) offer many opportunities to scale high-quality personalized tutoring. A promising approach is to build dialog tutoring models that scaffold students' problem-solving. However, even though existing models perform well at solving reasoning questions, they can struggle to precisely detect students' errors and tailor their feedback to these errors. Inspired by real-world teaching practice, where teachers identify student errors and customize their responses based on them, we