MathDial: A Dialogue Tutoring Dataset with Rich Pedagogical Properties Grounded in Math Reasoning Problems
MathDial is a dataset of roughly 3,000 one-to-one math tutoring dialogues in which human teachers guide LLM-simulated students through multi-step reasoning problems using scaffolding questions and other pedagogical moves, with extensive annotations for training and evaluating AI tutoring systems. The paper demonstrates that current LLMs make poor tutors, often revealing solutions too early or providing incorrect feedback, and shows that models finetuned on MathDial achieve better interactive tutoring performance.
While automatic dialogue tutors hold great potential for making education personalized and more accessible, research on such systems has been hampered by the lack of sufficiently large, high-quality datasets. Collecting such datasets remains challenging: recording real tutoring sessions raises privacy concerns, and crowdsourcing yields data of insufficient quality. To address this, we propose a framework for generating such dialogues by pairing human teachers with a Large Language Model (LLM) prompted