Findings of the BEA 2025 Shared Task on Pedagogical Ability Assessment of AI-powered Tutors
This paper presents findings from the BEA 2025 Shared Task, which evaluates the pedagogical abilities of AI tutors powered by LLMs, specifically assessing tutor responses for mistake identification, guidance provision, and feedback actionability in mathematics education dialogues. The task established pedagogically motivated evaluation tracks grounded in learning science principles to measure how effectively AI tutors remediate student mistakes through dialogue.
This shared task aimed to assess the pedagogical abilities of AI tutors powered by large language models (LLMs), focusing on evaluating the quality of tutor responses aimed at remediating students' mistakes within educational dialogues. The task consisted of five tracks designed to automatically evaluate the AI tutor's performance across the key dimensions of mistake identification, precise location of the mistake, providing guidance, and feedback actionability, all grounded in learning science principles, as well as a track focused on identifying which tutor produced a given response.