Problems With Large Language Models for Learner Modelling: Why LLMs Alone Fall Short for Responsible Tutoring in K-12 Education
This paper empirically evaluates LLM-based tutoring systems against traditional deep knowledge tracing (DKT) models for learner modelling in K-12 education, demonstrating that LLMs fail to accurately track student knowledge over time even after fine-tuning. The study directly measures prediction accuracy, temporal coherence, and the quality of multi-skill mastery estimation on a large-scale K-12 dataset to assess whether LLMs can responsibly support adaptive instruction.
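To make the comparison concrete, the following is a minimal sketch of what a DKT-style learner model computes: a recurrent network maps a student's interaction history (skill id, correctness) to per-skill mastery probabilities at each time step. All weights, dimensions, and the toy history below are illustrative stand-ins, not the trained model or dataset used in this study.

```python
import numpy as np

rng = np.random.default_rng(0)
n_skills, hidden = 4, 8

def encode(skill, correct):
    """One-hot encode an interaction as (skill, correctness) -> 2*n_skills dims,
    the input scheme used in the original DKT formulation."""
    x = np.zeros(2 * n_skills)
    x[skill + correct * n_skills] = 1.0
    return x

# Random stand-in weights; a real model would learn these from interaction logs.
W_xh = rng.normal(scale=0.1, size=(hidden, 2 * n_skills))
W_hh = rng.normal(scale=0.1, size=(hidden, hidden))
W_hy = rng.normal(scale=0.1, size=(n_skills, hidden))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mastery_trajectory(history):
    """Return per-skill next-step correctness probabilities after each interaction."""
    h = np.zeros(hidden)
    out = []
    for skill, correct in history:
        h = np.tanh(W_xh @ encode(skill, correct) + W_hh @ h)  # RNN state update
        out.append(sigmoid(W_hy @ h))  # P(correct) for every skill
    return out

probs = mastery_trajectory([(0, 1), (0, 1), (2, 0)])
print(np.round(probs[-1], 3))  # estimated mastery over all 4 skills
```

This explicit, time-indexed probability vector is precisely what the temporal-coherence and multi-skill criteria above evaluate, and what a free-form generative tutor does not natively expose.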
The rapid rise of large language model (LLM)-based tutors in K--12 education has fostered a misconception that generative models can replace traditional learner modelling for adaptive instruction. This is especially problematic in K--12 settings, which the EU AI Act classifies as a high-risk domain requiring responsible design. Motivated by these concerns, this study synthesises evidence on the limitations of LLM-based tutors and empirically investigates one critical issue: the accuracy, reliability,