Problems With Large Language Models for Learner Modelling: Why LLMs Alone Fall Short for Responsible Tutoring in K-12 Education
This paper empirically evaluates the limitations of LLM-based tutoring systems for K-12 education by comparing a deep knowledge tracing (DKT) model against a widely used LLM (with and without fine-tuning) on learner knowledge assessment accuracy, reliability, and temporal coherence, using large-scale student interaction data. The study demonstrates that specialised learner modelling approaches substantially outperform LLMs both at predicting student performance and at maintaining consistent knowledge-state estimates over time.
The rapid rise of large language model (LLM)-based tutors in K-12 education has fostered a misconception that generative models can replace traditional learner modelling for adaptive instruction. This is especially problematic in K-12 settings, which the EU AI Act classifies as a high-risk domain requiring responsible design. Motivated by these concerns, this study synthesises evidence on the limitations of LLM-based tutors and empirically investigates one critical issue: the accuracy, reliability,