Benchmarking the Pedagogical Knowledge of Large Language Models

Benchmark (Published & Automated), Relevance: 7/10, 2 cited, 2025 paper

This paper introduces The Pedagogy Benchmark, a dataset of 920 multiple-choice questions drawn from Chilean teacher training exams and designed to evaluate large language models' cross-domain pedagogical knowledge (CDPK) and Special Education Needs and Disability (SEND) knowledge. The authors evaluate 97 models on their understanding of teaching strategies, assessment methods, and other pedagogical concepts, and publish the results on an interactive online leaderboard.

Benchmarks like Massive Multitask Language Understanding (MMLU) have played a pivotal role in evaluating AI's knowledge and abilities across diverse domains. However, existing benchmarks predominantly focus on content knowledge, leaving a critical gap in assessing models' understanding of pedagogy, the method and practice of teaching. This paper introduces The Pedagogy Benchmark, a novel dataset designed to evaluate large language models on their Cross-Domain Pedagogical Knowledge (CDPK) and Spe

Study Type

Benchmark (Published & Automated)

Framework Categories

Tool Types

AI Tutors: 1-to-1 conversational tutoring systems.
Teacher Support Tools: Tools that assist teachers with lesson planning, content generation, grading, and analytics.

Tags

benchmark dataset education learning computer-science