EduAdapt: A Question Answer Benchmark Dataset for Evaluating Grade-Level Adaptability in LLMs
EduAdapt introduces a benchmark dataset of nearly 48k grade-labeled QA pairs across grades 1-12 and nine science subjects to evaluate whether LLMs can adapt their responses to different grade levels. The paper evaluates multiple open-source LLMs and finds they struggle to generate developmentally appropriate responses, especially for early-grade students.
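To make the dataset's structure concrete, here is a minimal sketch of what a grade-labeled QA record and a grade-band filter might look like. The field names (`question`, `answer`, `grade`, `subject`) are illustrative assumptions, not EduAdapt's actual schema.

```python
# Hypothetical grade-labeled QA records; field names are assumptions,
# not EduAdapt's published schema.
records = [
    {"question": "Why is the sky blue?",
     "answer": "Sunlight bounces around in the air ...",
     "grade": 3, "subject": "physics"},
    {"question": "Explain Rayleigh scattering.",
     "answer": "Scattering intensity varies as 1/lambda^4 ...",
     "grade": 11, "subject": "physics"},
]

def by_grade_band(items, lo, hi):
    """Select records whose grade falls in the inclusive band [lo, hi]."""
    return [r for r in items if lo <= r["grade"] <= hi]

# Evaluating early-grade adaptability would start from a slice like this.
early = by_grade_band(records, 1, 5)
print([r["grade"] for r in early])
```

A grade-band split like this is the natural unit of analysis for the paper's finding that models struggle most on early-grade responses.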
Large language models (LLMs) are transforming education by answering questions, explaining complex concepts, and generating content across a wide range of subjects. Despite strong performance on academic benchmarks, they often fail to tailor responses to students' grade levels. This is a critical need in K-12 education, where age-appropriate vocabulary and explanations are essential for effective learning. Existing models frequently produce outputs that are too advanced or vague for younger learners.