Novice Learner and Expert Tutor: Evaluating Math Reasoning Abilities of Large Language Models with Misconceptions
This paper proposes a novel evaluation approach for LLMs' mathematical reasoning: models simulate novice learners who make specific misconception-based errors and expert tutors who identify the misconceptions behind incorrect answers, using grade-school math problems from the Eedi platform. The benchmark measures whether LLMs can identify the incorrect answer corresponding to a specific misconception and recognize the misconception(s) explaining a given wrong answer, rather than just solving problems correctly.
We propose novel evaluations for the mathematical reasoning capabilities of Large Language Models (LLMs) based on mathematical misconceptions. Our primary approach is to simulate LLMs as a novice learner and an expert tutor, aiming, respectively, to identify the incorrect answer to a math question that results from a specific misconception and to recognize the misconception(s) behind an incorrect answer. Contrary to traditional LLM-based mathematical evaluations that focus on answering math questions correctly, our evaluations require LLMs to identify incorrect answers tied to specific misconceptions and the misconceptions that explain given wrong answers.
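The two roles translate directly into two multiple-choice evaluation tasks. Below is a minimal sketch of how such an evaluation could be framed in Python; the prompt wording, the `query_llm` helper, and the function names are illustrative assumptions, not the paper's actual implementation.

```python
def query_llm(prompt: str) -> str:
    """Placeholder for a call to any chat-completion API (assumption)."""
    raise NotImplementedError


def novice_learner_eval(question: str, choices: list[str], misconception: str) -> str:
    """Ask the model to role-play a novice holding a given misconception
    and select the incorrect answer that misconception would produce."""
    prompt = (
        "You are a novice student who holds the following misconception:\n"
        f"{misconception}\n\n"
        f"Question: {question}\n"
        + "\n".join(f"{i}. {c}" for i, c in enumerate(choices, 1))
        + "\n\nWhich option would you choose? Answer with the option number."
    )
    return query_llm(prompt).strip()


def expert_tutor_eval(question: str, wrong_answer: str, misconceptions: list[str]) -> str:
    """Ask the model to act as an expert tutor and pick the misconception
    that best explains a student's wrong answer."""
    prompt = (
        "You are an expert math tutor. A student answered the question below "
        f"with the incorrect answer '{wrong_answer}'.\n\n"
        f"Question: {question}\n"
        "Candidate misconceptions:\n"
        + "\n".join(f"{i}. {m}" for i, m in enumerate(misconceptions, 1))
        + "\n\nWhich misconception best explains the error? Answer with the number."
    )
    return query_llm(prompt).strip()
```

Under this framing, accuracy for each role would be the fraction of items where the model's selection matches the misconception-labeled distractor (novice learner) or the ground-truth misconception annotation (expert tutor).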