Novice Learner and Expert Tutor: Evaluating Math Reasoning Abilities of Large Language Models with Misconceptions
This paper proposes a novel evaluation approach for LLMs' mathematical reasoning: models simulate novice learners who make specific misconception-based errors and expert tutors who identify the misconceptions behind incorrect answers, using grade-school math problems from the Eedi platform. The benchmark measures whether LLMs can identify the incorrect answer corresponding to a specific misconception and recognize the misconception(s) explaining a given wrong answer, rather than just solving problems correctly.
We propose novel evaluations for the mathematical reasoning capabilities of Large Language Models (LLMs) based on mathematical misconceptions. Our primary approach is to simulate LLMs as a novice learner and an expert tutor, aiming, respectively, to identify the incorrect answer to a math question that results from a specific misconception and to recognize the misconception(s) behind an incorrect answer. Contrary to traditional LLM-based mathematical evaluations that focus on answering math questions correctly, our evaluations require LLMs to identify incorrect answers tied to specific misconceptions and the misconceptions that explain given wrong answers.
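The two roles translate directly into two multiple-choice evaluation tasks. Below is a minimal sketch of how such an evaluation could be framed in Python; the prompt wording, the `query_llm` helper, and the function names are illustrative assumptions, not the paper's actual implementation.

```python
def query_llm(prompt: str) -> str:
    """Placeholder for a call to any chat-completion API (assumption)."""
    raise NotImplementedError


def novice_learner_eval(question: str, choices: list[str], misconception: str) -> str:
    """Ask the model to role-play a novice holding a given misconception
    and select the incorrect answer that misconception would produce."""
    prompt = (
        "You are a novice student who holds the following misconception:\n"
        f"{misconception}\n\n"
        f"Question: {question}\n"
        + "\n".join(f"{i}. {c}" for i, c in enumerate(choices, 1))
        + "\n\nWhich option would you choose? Answer with the option number."
    )
    return query_llm(prompt).strip()


def expert_tutor_eval(question: str, wrong_answer: str, misconceptions: list[str]) -> str:
    """Ask the model to act as an expert tutor and pick the misconception
    that best explains a student's wrong answer."""
    prompt = (
        "You are an expert math tutor. A student answered the question below "
        f"with the incorrect answer '{wrong_answer}'.\n\n"
        f"Question: {question}\n"
        "Candidate misconceptions:\n"
        + "\n".join(f"{i}. {m}" for i, m in enumerate(misconceptions, 1))
        + "\n\nWhich misconception best explains the error? Answer with the number."
    )
    return query_llm(prompt).strip()
```

Under this framing, accuracy for each role would be the fraction of items where the model's selection matches the misconception-labeled distractor (novice learner) or the ground-truth misconception annotation (expert tutor).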