EduAdapt: A Question Answer Benchmark Dataset for Evaluating Grade-Level Adaptability in LLMs

Relevance: 10/10 (2025 paper)

EduAdapt introduces a benchmark dataset of nearly 48k grade-labeled QA pairs across grades 1-12 and nine science subjects to evaluate whether LLMs can adapt their responses to different grade levels. The paper evaluates multiple open-source LLMs and finds they struggle to generate developmentally appropriate responses, especially for early-grade students.

Large language models (LLMs) are transforming education by answering questions, explaining complex concepts, and generating content across a wide range of subjects. Despite strong performance on academic benchmarks, they often fail to tailor responses to students' grade levels. This is a critical need in K-12 education, where age-appropriate vocabulary and explanations are essential for effective learning. Existing models frequently produce outputs that are too advanced or too vague for younger learners.
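To make the idea of grade-level adaptability concrete, here is a minimal sketch of what a grade-labeled QA record in the spirit of EduAdapt might look like, paired with a crude readability check. The record fields and the Flesch-Kincaid-based check are illustrative assumptions, not the paper's actual schema or evaluation rubric.

```python
# Hypothetical sketch: field names and the readability metric are assumptions,
# not taken from the EduAdapt paper.
import re

def flesch_kincaid_grade(text: str) -> float:
    """Approximate U.S. reading grade level via the Flesch-Kincaid formula."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    # Rough syllable count: runs of vowels per word, at least one per word.
    syllables = sum(max(1, len(re.findall(r"[aeiouy]+", w.lower()))) for w in words)
    return 0.39 * (n_words / sentences) + 11.8 * (syllables / n_words) - 15.59

# Hypothetical grade-labeled QA record (grades 1-12, nine science subjects).
example = {
    "grade": 3,
    "subject": "biology",
    "question": "Why do plants need sunlight?",
    "answer": "Plants use sunlight to make their own food. This is called photosynthesis.",
}

# A simple (assumed) adaptability check: is the answer's reading level
# close to the target grade label?
estimated = flesch_kincaid_grade(example["answer"])
print(f"target grade {example['grade']}, estimated reading grade {estimated:.1f}")
```

A readability formula only captures surface-level difficulty; a fuller evaluation of grade-appropriateness (vocabulary, concept depth, explanation style) would need human or LLM-based judging, as the paper's benchmark setting implies.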

Tool Types

AI Tutors: 1-to-1 conversational tutoring systems.
Personalised Adaptive Learning: Systems that adapt content and difficulty to individual learners.

Tags

LLM evaluation, K-12 education, computer-science