Benchmarking the Pedagogical Knowledge of Large Language Models

Relevance: 7/10 · 2 citations · 2025 paper

This paper introduces The Pedagogy Benchmark, a novel dataset of 920 multiple-choice questions drawn from Chilean teacher training exams. The dataset is designed to evaluate large language models' cross-domain pedagogical knowledge (CDPK) and special education needs and disability (SEND) pedagogical knowledge. The authors test 97 models on their understanding of teaching strategies, assessment methods, and other pedagogical concepts, and report accuracies ranging from 28% to 89%.
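As a rough illustration of how accuracy on a multiple-choice benchmark of this kind could be computed, here is a minimal sketch. The `Question` class, the `accuracy` function, and the two toy items are hypothetical placeholders, not the paper's data or evaluation code. The sketch assumes each question has one gold option label and each model reply has already been reduced to a single label.

```python
from dataclasses import dataclass


@dataclass
class Question:
    """One multiple-choice item (hypothetical schema, not the paper's)."""
    prompt: str
    options: dict[str, str]  # option label -> option text
    answer: str              # gold option label, e.g. "B"


def accuracy(questions: list[Question], predictions: list[str]) -> float:
    """Return the fraction of predictions that match the gold label."""
    if not questions:
        raise ValueError("need at least one question")
    if len(questions) != len(predictions):
        raise ValueError("questions and predictions must have the same length")
    correct = sum(
        1
        for q, p in zip(questions, predictions)
        # Normalize whitespace and case before comparing labels.
        if p.strip().upper() == q.answer.strip().upper()
    )
    return correct / len(questions)


# Toy placeholder items, invented purely for illustration.
toy_questions = [
    Question("Placeholder question 1", {"A": "x", "B": "y"}, answer="A"),
    Question("Placeholder question 2", {"A": "x", "B": "y"}, answer="B"),
]
toy_predictions = ["a", "A"]  # first matches after normalization, second does not

score = accuracy(toy_questions, toy_predictions)
print(f"accuracy = {score:.0%}")
```

With four answer options, random guessing would score about 25% in expectation, which gives a reference point for reading the low end of a reported accuracy range.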

Benchmarks like Massive Multitask Language Understanding (MMLU) have played a pivotal role in evaluating AI's knowledge and abilities across diverse domains. However, existing benchmarks predominantly focus on content knowledge, leaving a critical gap in assessing models' understanding of pedagogy, the method and practice of teaching. This paper introduces The Pedagogy Benchmark, a novel dataset designed to evaluate large language models on their Cross-Domain Pedagogical Knowledge (CDPK) and Spe

Framework Categories

Tool Types

AI Tutors: 1-to-1 conversational tutoring systems.
Teacher Support Tools: tools that assist teachers with lesson planning, content generation, grading, and analytics.

Tags

benchmark dataset education learning computer-science