Personalised Adaptive Learning Landscape Summary

Personalised Adaptive Learning: Benchmarking AI-Driven Systems for Education

Systems that adapt content and difficulty to individual learners.

How this was produced: We identified high-relevance papers (scored ≥7/10) classified under this tool type, extracted key sections (abstract, introduction, results, discussion, conclusions) from each, then used Claude to synthesise findings into a structured evidence summary. The focus is on what benchmarks and evaluation methods exist to measure whether these tools work in the lab.


Personalised adaptive learning represents one of the most technically mature — yet pedagogically under-evaluated — areas in AI-for-education research. Our analysis covers 200 papers spanning knowledge tracing models, intelligent tutoring systems (ITS), adaptive content sequencing, and the emerging integration of large language models (LLMs) into personalised learning pathways. The field has produced sophisticated architectures capable of predicting whether a student will answer the next question correctly, but it has done remarkably little to establish whether these predictions translate into genuine, lasting learning.

The dominant research paradigm centres on knowledge tracing (KT), the task of modelling student knowledge states from interaction logs to predict future performance. Deep learning approaches have largely supplanted classical Bayesian methods, with Transformer- and attention-based architectures now standard. However, a critical tension runs through the literature: while prediction accuracy (measured via AUC and RMSE) has improved steadily, the field overwhelmingly evaluates systems on these narrow technical metrics rather than on whether students actually learn more, retain knowledge longer, or develop independent problem-solving capacity. Fewer than a handful of the 200 papers examined conduct longitudinal evaluations or measure transfer of learning to novel contexts.
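
To make concrete what these narrow technical metrics capture, the sketch below shows how a typical KT paper scores a model's next-response predictions. It is illustrative only: the numbers are placeholder values rather than results from any surveyed paper, and scikit-learn is assumed purely for convenience.

```python
# Illustrative sketch: how knowledge-tracing evaluations typically score
# next-response prediction. All values below are placeholders, not real data.
import numpy as np
from sklearn.metrics import roc_auc_score, mean_squared_error

# Observed correctness of each student's next response (1 = correct, 0 = incorrect)
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 1])
# Model's predicted probability that the next response will be correct
y_prob = np.array([0.82, 0.35, 0.67, 0.91, 0.42, 0.58, 0.22, 0.74])

auc = roc_auc_score(y_true, y_prob)                 # how well predictions rank correct vs incorrect
rmse = np.sqrt(mean_squared_error(y_true, y_prob))  # error of the predicted probabilities

print(f"AUC: {auc:.3f}  RMSE: {rmse:.3f}")
# Both numbers describe prediction of the next answer only; neither measures
# retention, transfer to novel problems, or long-term learning gains.
```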

For low- and middle-income countries (LMICs), these gaps matter profoundly. Most benchmark datasets originate from platforms in the United States, South Korea, and China, meaning adaptive systems are trained and validated on learner populations that bear little resemblance to the diverse linguistic, curricular, and infrastructural contexts of LMICs. The cold-start problem — how to personalise effectively when limited learner data exists — is acutely relevant to settings where digital learning infrastructure is nascent. This report sets out what is being measured, what is missing, and where investment could shift the field from prediction optimisation toward genuine impact at scale.