EduBench: A Comprehensive Benchmarking Dataset for Evaluating Large Language Models in Diverse Educational Scenarios
8/10, cited 6 times, 2025 paper
EduBench introduces a benchmark dataset of 18,821 data points spanning 9 educational scenarios and more than 4,000 educational contexts. The contexts cover K-12 and higher-education subjects, different difficulty levels, and multiple question types. Evaluation uses 12 multi-dimensional metrics covering scenario adaptation, factual and reasoning accuracy, and pedagogical application. The benchmark assesses LLM capabilities on diverse educational tasks, including assignment grading, study planning, tutoring, and psychological counseling, using both human and automated evaluation.
AI Tutors, Teacher Support Tools, large language model evaluation, education, computer-science