FoundationalASSIST: An Educational Dataset for Foundational Knowledge Tracing and Pedagogical Grounding of LLMs
FoundationalASSIST is a K-12 educational dataset of 1.7M student interactions with full question text, actual student responses, and Common Core alignment, designed to evaluate LLMs on knowledge tracing (predicting student performance and exact answers) and pedagogical grounding (understanding properties that make assessment items effective). The paper demonstrates that current frontier LLMs struggle significantly on both task families, performing barely above trivial baselines on knowledge tracing and below random chance on item discrimination.
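To make "trivial baselines" for knowledge tracing concrete, here is a minimal sketch of two such predictors. It rests on assumptions that are not taken from the paper: the `(student, question, correct)` tuple layout, the function names, and the toy records are all hypothetical, and the paper's actual baselines and metrics may differ.

```python
from collections import defaultdict

# Hypothetical record layout: (student_id, question_id, correct), correct in {0, 1}.
# This is an illustrative assumption, not the dataset's real schema.

def majority_baseline(train, test):
    """Predict the most common correctness label in train for every test row."""
    n_correct = sum(correct for _, _, correct in train)
    majority = 1 if 2 * n_correct >= len(train) else 0
    return [majority for _ in test]

def student_rate_baseline(train, test, default=1):
    """Predict 'correct' when the student's past accuracy is at least 50%."""
    totals = defaultdict(lambda: [0, 0])  # student -> [num_correct, num_seen]
    for student, _, correct in train:
        totals[student][0] += correct
        totals[student][1] += 1
    preds = []
    for student, _, _ in test:
        if student in totals:
            num_correct, num_seen = totals[student]
            preds.append(1 if 2 * num_correct >= num_seen else 0)
        else:
            preds.append(default)  # unseen student: fall back to a fixed guess
    return preds

def accuracy(preds, test):
    """Fraction of test rows whose correctness label matches the prediction."""
    hits = sum(int(p == correct) for p, (_, _, correct) in zip(preds, test))
    return hits / len(test)

# Made-up toy interactions, for illustration only.
train = [("s1", "q1", 1), ("s1", "q2", 1), ("s2", "q1", 0),
         ("s2", "q3", 0), ("s1", "q3", 1)]
test = [("s1", "q4", 1), ("s2", "q4", 0), ("s2", "q5", 1)]

print(accuracy(majority_baseline(train, test), test))
print(accuracy(student_rate_baseline(train, test), test))
```

Neither predictor reads the question text or the student's actual answers, which is what makes them a floor: a model that uses that information would be expected to clear them by a wide margin.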
Can large language models (LLMs) understand how students learn? As LLMs are deployed for adaptive testing and personalized tutoring, this question becomes urgent, yet existing resources cannot answer it. Current educational datasets provide only question identifiers and binary correctness labels, rendering them opaque to LLMs that reason in natural language. We address this gap with FoundationalASSIST, the first English educational dataset providing the complete information needed for researc