EduEval: A Hierarchical Cognitive Benchmark for Evaluating Large Language Models in Chinese Education
Benchmark (Published & Automated) · 10/10 · 2025 paper
EduEval is a comprehensive hierarchical benchmark for evaluating LLMs in Chinese K-12 education. It comprises 24 task types and more than 11,000 questions, organized into six cognitive dimensions (Memorization, Understanding, Application, Reasoning, Creativity, and Ethics) derived from Bloom's Taxonomy and Webb's Depth of Knowledge. The benchmark draws on authentic educational materials, including real exam questions, classroom dialogues, student essays, and expert-designed prompts, spanning primary school through high school.
AI Tutors · Teacher Support Tools · LLM evaluation · K-12 education · computer-science