Novice Learner and Expert Tutor: Evaluating Math Reasoning Abilities of Large Language Models with Misconceptions

Relevance: 9/10 · Citations: 18 · Year: 2023

This paper evaluates LLMs' mathematical reasoning by testing their ability to simulate novice learners making misconception-based errors and expert tutors diagnosing those misconceptions in grade-school math problems. The evaluation uses a dataset from Eedi's platform with multiple-choice questions tagged with specific mathematical misconceptions.

We propose novel evaluations of the mathematical reasoning capabilities of Large Language Models (LLMs) based on mathematical misconceptions. Our primary approach is to simulate LLMs as a novice learner and as an expert tutor, aiming, respectively, to identify the incorrect answer to a math question that results from a specific misconception, and to recognize the misconception(s) behind an incorrect answer. Contrary to traditional LLM-based mathematical evaluations that focus on answering math questions correctly […]
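The two roles reduce to a simple prompting protocol over misconception-tagged multiple-choice items. Below is a minimal Python sketch of that protocol under stated assumptions: the `MCQItem` class, its field names, and the prompt wording are all illustrative inventions, not the paper's actual implementation or the Eedi dataset's real schema.

```python
# Hypothetical sketch of the novice-learner and expert-tutor evaluation tasks.
# All names and fields here are assumptions for illustration.

from dataclasses import dataclass


@dataclass
class MCQItem:
    question: str
    choices: dict[str, str]         # e.g. {"A": "...", "B": "...", ...}
    correct: str                    # key of the correct choice
    misconceptions: dict[str, str]  # distractor key -> misconception description


def novice_prompt(item: MCQItem, misconception: str) -> str:
    """Novice-learner task: given a misconception, predict the wrong answer
    a student holding it would choose."""
    options = "\n".join(f"{k}. {v}" for k, v in item.choices.items())
    return (
        f"You are a student who holds this misconception: {misconception}\n"
        f"Question: {item.question}\n{options}\n"
        "Which option would you choose? Answer with the letter only."
    )


def tutor_prompt(item: MCQItem, wrong_choice: str) -> str:
    """Expert-tutor task: given a student's wrong answer, diagnose the
    misconception that most plausibly produced it."""
    options = "\n".join(f"{k}. {v}" for k, v in item.choices.items())
    return (
        f"A student answered {wrong_choice} to this question:\n"
        f"{item.question}\n{options}\n"
        "What misconception most likely led to that answer?"
    )


def score_novice(llm_answer: str, item: MCQItem, misconception: str) -> bool:
    """A novice simulation counts as correct only if the model selects the
    distractor tagged with the given misconception."""
    expected = next(k for k, m in item.misconceptions.items() if m == misconception)
    return llm_answer.strip().upper().startswith(expected)
```

In this framing the novice task is scored automatically against the misconception tags, while the tutor task (free-text diagnosis) would need matching against the tagged misconception for the chosen distractor, by exact label or by a similarity judgment.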

Tool Types

AI Tutors: 1-to-1 conversational tutoring systems.

Tags

intelligent tutoring system, evaluation, computer-science