Using Large Language Models to Assess Tutors' Performance in Reacting to Students Making Math Errors

Relevance: 8/10 · 11 citations · 2024 paper

This paper evaluates GPT-3.5-Turbo and GPT-4's ability to assess human tutors' performance in responding to K-12 students' math errors, specifically measuring whether tutors use indirect guidance strategies versus direct error correction. The study analyzes 50 real tutoring dialogues to determine if LLMs can provide automated feedback on tutoring quality.

Research suggests that tutors should adopt a strategic approach when addressing math errors made by low-efficacy students. Rather than drawing direct attention to the error, tutors should guide the students to identify and correct their mistakes on their own. While tutor lessons have introduced this pedagogical skill, human evaluation of tutors applying this strategy is arduous and time-consuming. Large language models (LLMs) show promise in providing real-time assessment to tutors during their tutoring sessions.
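The assessment task described above amounts to prompting an LLM to label each tutor response as indirect guidance or direct correction. A minimal sketch of that setup is below; the function names, labels, and the keyword heuristic (standing in for an actual model call) are illustrative assumptions, not the authors' code or rubric wording.

```python
# Sketch of an LLM-as-judge setup for labeling tutor responses.
# All names here are hypothetical; a real system would send the prompt
# to a model such as GPT-4 instead of using the toy heuristic.

LABELS = ("indirect_guidance", "direct_correction")

def build_assessment_prompt(student_turn: str, tutor_turn: str) -> str:
    """Compose a classification prompt for an LLM judge."""
    return (
        "A student made a math error; the tutor then responded.\n"
        f"Student: {student_turn}\n"
        f"Tutor: {tutor_turn}\n"
        "Did the tutor use indirect guidance (prompting the student to "
        "find the mistake) or direct correction (stating the error "
        f"outright)? Answer with one label: {LABELS[0]} or {LABELS[1]}."
    )

def toy_classify(tutor_turn: str) -> str:
    """Keyword stand-in for the model: questions and prompts to
    re-examine work suggest indirect guidance."""
    indirect_cues = ("?", "walk me through", "let's check", "try again")
    if any(cue in tutor_turn.lower() for cue in indirect_cues):
        return LABELS[0]
    return LABELS[1]
```

In the paper's actual pipeline, the model's label for each of the 50 dialogues would be compared against human annotations to measure agreement; the heuristic here only illustrates the input/output shape of that judgment.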

Tool Types

Teacher Support Tools: tools that assist teachers with lesson planning, content generation, grading, and analytics.

Tags

tutoring dialogue evaluation, computer-science