Discerning minds or generic tutors? Evaluating instructional guidance capabilities in Socratic LLMs

Relevance: 10/10 · 1 citation · 2025 paper

This paper introduces GuideEval, a benchmark that evaluates LLMs' ability to provide adaptive Socratic tutoring by assessing three pedagogical phases: perceiving learner states (confusion, comprehension, errors), orchestrating appropriate instructional strategies, and eliciting productive reflections. The benchmark is grounded in authentic K-12 educational dialogues and specifically measures whether LLMs can dynamically adjust their guidance based on student cognitive states rather than just generate generic responses.

The conversational capabilities of large language models hold significant promise for enabling scalable and interactive tutoring. While prior research has primarily examined their ability to generate Socratic questions, it often overlooks a critical aspect: adaptively guiding learners in accordance with their cognitive states. This study moves beyond question generation to emphasize instructional guidance capability. We ask: can LLMs emulate expert tutors who dynamically adjust their strategies in response to learners' cognitive states?
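As a rough illustration of the three-phase rubric described above (perceiving learner states, orchestrating strategies, eliciting reflections), the sketch below scores a single tutoring turn. All class names, keyword heuristics, and scoring rules here are illustrative assumptions, not the benchmark's actual implementation; GuideEval's real judging procedure (e.g., LLM-based or human rubric scoring over K-12 dialogues) is not reproduced.

```python
# Hypothetical sketch of a three-phase tutoring-turn evaluation.
# All names and heuristics below are assumptions for illustration only.
from dataclasses import dataclass
from enum import Enum


class LearnerState(Enum):
    CONFUSION = "confusion"
    COMPREHENSION = "comprehension"
    ERROR = "error"


@dataclass
class DialogueTurn:
    student_utterance: str
    tutor_response: str
    gold_state: LearnerState  # annotated learner state from the dialogue corpus


def perceived_state(response: str) -> LearnerState | None:
    """Toy keyword stand-in for state perception; a real judge would
    use an LLM or a human rubric, not string matching."""
    lowered = response.lower()
    if "not quite" in lowered or "mistake" in lowered:
        return LearnerState.ERROR
    if "confus" in lowered or "let's slow down" in lowered:
        return LearnerState.CONFUSION
    if "exactly" in lowered or "you've got it" in lowered:
        return LearnerState.COMPREHENSION
    return None


def score_turn(turn: DialogueTurn) -> dict[str, bool]:
    """Score one tutor response along the three pedagogical phases."""
    state = perceived_state(turn.tutor_response)
    return {
        # Phase 1: perceiving -- does the response reflect the learner's actual state?
        "perceive": state == turn.gold_state,
        # Phase 2: orchestrating -- crude proxy: any state-specific move at all,
        # rather than a generic canned reply.
        "orchestrate": state is not None,
        # Phase 3: eliciting -- does the tutor end with a reflective question
        # instead of simply handing over the answer?
        "elicit": turn.tutor_response.rstrip().endswith("?"),
    }


if __name__ == "__main__":
    turn = DialogueTurn(
        student_utterance="So the denominator stays the same when I add fractions?",
        tutor_response="Not quite; what happens when the denominators differ?",
        gold_state=LearnerState.ERROR,
    )
    print(score_turn(turn))  # {'perceive': True, 'orchestrate': True, 'elicit': True}
```

The point of the sketch is the structure, not the heuristics: each phase is judged separately, so a tutor that answers correctly but generically can still fail the perceive and orchestrate checks.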

Tool Types

AI Tutors: 1-to-1 conversational tutoring systems.

Tags

tutoring · dialogue evaluation · computer-science