Novice Learner and Expert Tutor: Evaluating Math Reasoning Abilities of Large Language Models with Misconceptions
This paper evaluates LLMs' mathematical reasoning by testing their ability to simulate novice learners making misconception-based errors and expert tutors diagnosing those misconceptions in grade-school math problems. The evaluation uses a dataset from Eedi's platform with multiple-choice questions tagged with specific mathematical misconceptions.
We propose novel evaluations of the mathematical reasoning capabilities of Large Language Models (LLMs) based on mathematical misconceptions. Our primary approach is to simulate LLMs as a novice learner and an expert tutor, aiming, respectively, to identify the incorrect answer to a math question that results from a specific misconception and to recognize the misconception(s) behind an incorrect answer. In contrast to traditional LLM-based mathematical evaluations that focus on answering math questions correctly, our evaluations assess whether LLMs understand the misconceptions that lead learners to incorrect answers.
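To make the two roles concrete, below is a minimal, illustrative sketch (not the authors' released code) of how the novice-learner and expert-tutor tasks could be posed to a model over misconception-tagged multiple-choice items. The `MisconceptionItem` fields, the prompt wording, and the worked example are hypothetical placeholders for the actual Eedi data schema and prompting setup.

```python
# Hedged sketch of the two misconception-based evaluation tasks described above.
# All field names and prompt templates are assumptions, not the paper's exact setup.

from dataclasses import dataclass
from typing import List


@dataclass
class MisconceptionItem:
    question: str            # grade-school multiple-choice math question
    options: List[str]       # answer options, e.g. ["A) ...", "B) ...", ...]
    correct_option: str      # label of the correct option, e.g. "A"
    misconception: str       # natural-language description of the tagged misconception
    distractor_option: str   # label of the wrong option that misconception produces


def novice_learner_prompt(item: MisconceptionItem) -> str:
    """Novice-learner task: given the misconception, predict the resulting wrong answer."""
    return (
        "You are a student who holds the following misconception:\n"
        f"{item.misconception}\n\n"
        f"Question: {item.question}\n"
        "Options:\n" + "\n".join(item.options) + "\n\n"
        "Which option would you choose? Answer with the option label only."
    )


def expert_tutor_prompt(item: MisconceptionItem) -> str:
    """Expert-tutor task: given a wrong answer, diagnose the underlying misconception."""
    return (
        "You are an experienced math tutor.\n"
        f"Question: {item.question}\n"
        "Options:\n" + "\n".join(item.options) + "\n\n"
        f"A student chose option {item.distractor_option}, which is incorrect.\n"
        "What misconception most likely led the student to this answer?"
    )


if __name__ == "__main__":
    # Hypothetical item: misaligning place value when adding decimals yields 0.30.
    item = MisconceptionItem(
        question="What is 0.5 + 0.25?",
        options=["A) 0.75", "B) 0.30", "C) 0.525", "D) 0.7"],
        correct_option="A",
        misconception="When adding decimals, lines up the numbers from the rightmost digit instead of the decimal point",
        distractor_option="B",
    )
    print(novice_learner_prompt(item))
    print(expert_tutor_prompt(item))
```

In the novice-learner task, a model's prediction would be scored against `distractor_option`; in the expert-tutor task, its free-text diagnosis would be compared to the tagged `misconception`. How that comparison is scored (exact match, multiple choice over candidate misconceptions, or human judgment) is left open here, since the abstract does not specify it.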