Problems With Large Language Models for Learner Modelling: Why LLMs Alone Fall Short for Responsible Tutoring in K-12 Education

Research / Other | Relevance: 9/10 | 2025 paper

This paper empirically evaluates the limitations of LLM-based tutoring systems for K-12 education. Using large-scale student interaction data, it compares a deep knowledge tracing (DKT) model against a widely used LLM (with and without fine-tuning) on the accuracy, reliability, and temporal coherence of learner knowledge assessment. The study demonstrates that specialized learner modelling approaches substantially outperform LLMs in predicting student performance and in maintaining consistent knowledge state estimates over time.
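To make the evaluation dimensions concrete, here is a minimal Python sketch of how prediction accuracy and a simple temporal-coherence proxy could be computed for one learner's sequence of responses. It is an illustration under stated assumptions, not the paper's method: the function names, the 0.5 decision threshold, and the use of mean absolute step change as a smoothness proxy are all hypothetical choices.

```python
from typing import Sequence


def accuracy(predicted_probs: Sequence[float],
             outcomes: Sequence[int],
             threshold: float = 0.5) -> float:
    """Fraction of responses where the thresholded prediction matches the outcome.

    predicted_probs: a model's estimated probability of a correct answer per step.
    outcomes: observed correctness per step (1 = correct, 0 = incorrect).
    threshold: assumed decision cut-off (hypothetical, not taken from the paper).
    """
    if len(predicted_probs) != len(outcomes):
        raise ValueError("predictions and outcomes must have the same length")
    if not outcomes:
        raise ValueError("at least one response is required")
    hits = sum((p >= threshold) == bool(y)
               for p, y in zip(predicted_probs, outcomes))
    return hits / len(outcomes)


def mean_step_change(estimates: Sequence[float]) -> float:
    """Mean absolute change between consecutive knowledge estimates.

    An assumed proxy for temporal coherence: lower values mean the estimate
    moves more smoothly from one interaction to the next.
    """
    if len(estimates) < 2:
        return 0.0
    steps = [abs(later - earlier)
             for earlier, later in zip(estimates, estimates[1:])]
    return sum(steps) / len(steps)


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    outcomes = [0, 1, 1, 1]
    steady_model = [0.25, 0.5, 0.75, 0.75]
    erratic_model = [0.75, 0.25, 0.75, 0.25]
    for name, probs in [("steady", steady_model), ("erratic", erratic_model)]:
        print(name, accuracy(probs, outcomes), mean_step_change(probs))
```

On the toy data, the steady sequence scores higher on accuracy and lower on step change than the erratic one, which is the kind of contrast such metrics are meant to expose. A real evaluation would likely use threshold-free metrics and many learners.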

The rapid rise of large language model (LLM)-based tutors in K-12 education has fostered a misconception that generative models can replace traditional learner modelling for adaptive instruction. This is especially problematic in K-12 settings, which the EU AI Act classifies as a high-risk domain requiring responsible design. Motivated by these concerns, this study synthesises evidence on limitations of LLM-based tutors and empirically investigates one critical issue: the accuracy, reliability,

Study Type

Research / Other

Tool Types

AI Tutors: 1-to-1 conversational tutoring systems.
Personalised Adaptive Learning: Systems that adapt content and difficulty to individual learners.

Tags

LLM evaluation, K-12 education, computer-science