Leveraging LLMs to Assess Tutor Moves in Real-Life Dialogues: A Feasibility Study
This paper evaluates the feasibility of using LLMs (GPT-4, Gemini, LearnLM) to automatically identify and assess two specific tutoring moves in real-life math tutoring dialogues: delivering effective praise and responding to student errors. The study analyzes 50 transcripts of college tutors working with middle school students, demonstrating that LLMs can reliably detect tutoring situations (94-98% accuracy for praise detection, 82-88% for error detection) and evaluate adherence to best practices (83-89% and 73-77% alignment with human judgment).
Tutoring improves student achievement, but identifying which tutoring actions are most associated with student learning, at scale and from audio transcriptions, remains an open research problem. The present study investigates the feasibility and scalability of using generative AI to identify and evaluate specific tutor moves in real-life math tutoring. We analyze 50 randomly selected transcripts of college-student remote tutors assisting middle school students in mathematics. Using GPT-4,