Stepwise Verification and Remediation of Student Reasoning Errors with Large Language Model Tutors

Relevance: 8/10 · Citations: 29 · Year: 2024

This paper develops and evaluates stepwise verification methods for detecting student reasoning errors in math problem-solving, showing that grounding tutor responses in explicit error detection improves feedback quality and reduces hallucinations in LLM-based dialog tutoring systems. The work collects a dataset of 1K annotated student solution chains and demonstrates that verifier-guided generation produces more targeted, correct responses compared to direct generation baselines.

Large language models (LLMs) offer many opportunities to scale high-quality personalized tutoring. A promising approach is to build dialog tutoring models to scaffold students' problem-solving. However, even though existing models perform well in solving reasoning questions, they can struggle to precisely detect students' errors and tailor their feedback to these errors. Inspired by real-world teaching practice, where teachers identify student errors and customize their responses based on them, the authors verify student solutions step by step and ground tutor response generation in the detected errors.
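The pipeline described above (verify the student's stepwise solution first, then condition the tutor response on the detected error) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the `llm` callable, the prompt wording, and all helper names are hypothetical.

```python
# Sketch of verifier-guided tutor response generation. Assumptions: `llm` is
# any text-in/text-out completion function; prompts and names are illustrative.
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Verification:
    first_error_step: Optional[int]  # index of first incorrect step; None if all correct
    explanation: str                 # verifier's brief justification

def verify_solution(problem: str, steps: list[str], llm: Callable[[str], str]) -> Verification:
    """Check each solution step in order and return the first error found."""
    for i, step in enumerate(steps):
        context = "\n".join(steps[: i + 1])
        prompt = (
            f"Problem: {problem}\n"
            f"Student solution so far:\n{context}\n"
            "Is the last step correct? Answer 'yes' or 'no', then explain briefly."
        )
        answer = llm(prompt)
        if answer.strip().lower().startswith("no"):
            return Verification(first_error_step=i, explanation=answer)
    return Verification(first_error_step=None, explanation="All steps verified.")

def generate_feedback(problem: str, steps: list[str], llm: Callable[[str], str]) -> str:
    """Ground the tutor response in the verifier's error localization."""
    v = verify_solution(problem, steps, llm)
    if v.first_error_step is None:
        return llm(f"Problem: {problem}\nThe solution is correct; give brief praise.")
    return llm(
        f"Problem: {problem}\n"
        f"Step {v.first_error_step + 1} ('{steps[v.first_error_step]}') is wrong "
        f"because: {v.explanation}\n"
        "Write a short tutoring response that addresses exactly this error "
        "without revealing the full solution."
    )
```

Decoupling verification from generation lets the verifier's error localization steer the response toward the student's specific mistake, which is the mechanism the paper credits for more targeted responses with fewer hallucinations than direct generation.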

Tool Types

AI Tutors: 1-to-1 conversational tutoring systems.

Tags

reasoning, evaluation, LLM, computer-science