EducationQ: Evaluating LLMs' Teaching Capabilities Through Multi-Agent Dialogue Framework
EducationQ is a multi-agent dialogue framework that evaluates LLMs' teaching capabilities through simulated teacher-student interactions. It tests 14 models on 1,498 questions spanning 13 disciplines and 10 difficulty levels. The framework incorporates formative assessment principles to measure pedagogical effectiveness, including questioning strategies, adaptive feedback, and scaffolding behaviors.
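To make the idea of a simulated teacher-student interaction concrete, the following is a minimal sketch, not the EducationQ implementation. It assumes that each agent can be modeled as a function from the dialogue history to its next utterance, and it uses stub functions in place of LLM calls. All names (`Agent`, `Dialogue`, `run_dialogue`, `count_teacher_questions`) and the fixed number of rounds are hypothetical choices made for illustration. The toy evaluator only counts teacher turns phrased as questions, a crude stand-in for assessing questioning strategies.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical type: an agent maps the dialogue history to its next utterance.
Turn = Tuple[str, str]  # (role, text)
Agent = Callable[[List[Turn]], str]


@dataclass
class Dialogue:
    question: str
    turns: List[Turn] = field(default_factory=list)


def run_dialogue(teacher: Agent, student: Agent, question: str, rounds: int = 3) -> Dialogue:
    """Alternate teacher and student turns for a fixed number of rounds."""
    dialogue = Dialogue(question=question, turns=[("question", question)])
    for _ in range(rounds):
        dialogue.turns.append(("teacher", teacher(dialogue.turns)))
        dialogue.turns.append(("student", student(dialogue.turns)))
    return dialogue


def count_teacher_questions(dialogue: Dialogue) -> int:
    """Toy evaluator: count teacher turns that end with a question mark."""
    return sum(
        1
        for role, text in dialogue.turns
        if role == "teacher" and text.rstrip().endswith("?")
    )


# Stub agents standing in for LLM calls (assumption: real agents would call a model).
def stub_teacher(history: List[Turn]) -> str:
    hints_given = sum(1 for role, _ in history if role == "teacher")
    return f"Hint {hints_given + 1}: what do you notice first?"


def stub_student(history: List[Turn]) -> str:
    return "I think I see it now."


dialogue = run_dialogue(stub_teacher, stub_student, "What is 7 * 8?", rounds=2)
print(len(dialogue.turns), count_teacher_questions(dialogue))
```

Keeping the agents as plain callables means a stub can be swapped for a real model call without changing the dialogue loop or the evaluator.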
Large language models (LLMs) increasingly serve as educational tools, yet evaluating their teaching capabilities remains challenging due to the resource-intensive, context-dependent, and methodologically complex nature of teacher-student interactions. We introduce EducationQ, a multi-agent dialogue framework that efficiently assesses teaching capabilities through simulated dynamic educational scenarios, featuring specialized agents for teaching, learning, and evaluation. Testing 14 LLMs across m