Stepwise Verification and Remediation of Student Reasoning Errors with Large Language Model Tutors

Benchmark (Published & Automated) | Relevance: 8/10 | 29 citations | 2024 paper

This paper develops and evaluates stepwise verification models that detect errors in student mathematical reasoning chains before generating tutor responses, collecting a dataset of 1K annotated student solutions with error identification. The approach tests whether explicitly verifying student work before response generation improves tutoring quality compared to direct response generation in multi-step math problem-solving.

Large language models (LLMs) offer many opportunities to scale high-quality personalized tutoring. A promising approach is to build dialog tutoring models that scaffold students' problem-solving. However, even though existing models perform well at solving reasoning questions, they can struggle to precisely detect students' errors and tailor their feedback to those errors. Inspired by real-world teaching practice, where teachers identify student errors and customize their responses based on them, the authors develop stepwise verifiers that detect errors in student solutions before the tutor response is generated.
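The verify-then-respond idea can be sketched as a two-stage pipeline: first check each step of the student's solution, then condition the tutor's reply on the detected error. The sketch below is purely illustrative and is not the paper's method; `verify_steps` uses a toy arithmetic check where the actual system would use an LLM verifier, and all function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Verification:
    step_index: int   # index of the first erroneous step, or -1 if none
    explanation: str  # brief description of the error

def verify_steps(steps):
    """Toy stand-in for a stepwise verifier: flag the first step whose
    arithmetic does not check out. The real approach would verify each
    step of the reasoning chain with a trained model."""
    for i, step in enumerate(steps):
        lhs, sep, rhs = step.partition("=")
        if not sep:
            continue  # skip steps that are not equations
        try:
            if eval(lhs) != eval(rhs):  # illustrative check only
                return Verification(i, f"step {i + 1}: {lhs.strip()} != {rhs.strip()}")
        except Exception:
            continue  # skip steps the toy checker cannot parse
    return Verification(-1, "no error found")

def tutor_response(steps):
    """Condition the tutor's reply on the verifier's output instead of
    generating a response directly from the dialog history."""
    v = verify_steps(steps)
    if v.step_index == -1:
        return "Great work, your reasoning checks out!"
    return f"Let's revisit {v.explanation}. Can you recompute that step?"

print(tutor_response(["2 + 3 = 5", "5 * 4 = 21"]))
```

The design point mirrors the paper's hypothesis: making the error explicit before response generation lets the tutor target its feedback at the specific faulty step rather than the solution as a whole.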

Study Type

Benchmark (Published & Automated)

Tool Types

AI Tutors: 1-to-1 conversational tutoring systems.

Tags

reasoning, evaluation, LLM, computer-science