Benchmarking the Pedagogical Knowledge of Large Language Models
This paper introduces The Pedagogy Benchmark, a dataset of 920 multiple-choice questions from Chilean teacher training exams designed to evaluate large language models' cross-domain pedagogical knowledge (CDPK) and Special Education Needs and Disability (SEND) knowledge. The benchmark tests 97 models on their understanding of teaching strategies, assessment methods, and other pedagogical concepts, with results published on an interactive online leaderboard.
Benchmarks like Massive Multitask Language Understanding (MMLU) have played a pivotal role in evaluating AI's knowledge and abilities across diverse domains. However, existing benchmarks predominantly focus on content knowledge, leaving a critical gap in assessing models' understanding of pedagogy: the method and practice of teaching. This paper introduces The Pedagogy Benchmark, a novel dataset designed to evaluate large language models on their Cross-Domain Pedagogical Knowledge (CDPK) and Spe