AI Tutors
1-to-1 conversational tutoring systems.
Research Summary
AI tutors represent one of the most mature and actively researched areas in educational AI, and one of the most contested. Our analysis covers 348 papers spanning intelligent tutoring systems (ITS), large language model (LLM)-powered conversational tutors, and adaptive learning platforms, primarily targeting K-12 mathematics and STEM education. The field demonstrates impressive technical progress: systems like ASSISTments, Reasoning Mind Genie 2, and newer LLM-based platforms such as Khanmigo and Duolingo Max can now deliver fluent, personalised instruction at scale. A rigorous randomised controlled trial (RCT) in UK classrooms found human-supervised AI tutoring achieved comparable efficacy to human tutors, with knowledge transfer rates of 66.2% versus 60.7% for human instruction.
Yet beneath this progress lies a fundamental tension. Research consistently reveals that AI tutors, particularly those powered by LLMs, risk undermining the very learning they are designed to support. Studies show that students with unrestricted ChatGPT access scored 17% lower on independent tests despite solving 48% more practice problems. One large-scale study found cognitive engagement scores were significantly lower (mean 2.95/5) for ChatGPT users compared with controls (4.19/5). The field's most authoritative benchmark, TutorBench, demonstrates that no frontier LLM exceeds 56% overall performance on core tutoring skills. These findings point to a critical gap between what AI tutors can do technically and what they achieve pedagogically.
The methodological landscape is shifting rapidly, from rule-based systems toward LLM-powered approaches, and from evaluating answer correctness toward assessing the quality of the tutoring process itself. However, the vast majority of studies measure immediate post-test performance rather than long-term retention, transfer, or metacognitive development. For funders and policymakers in low- and middle-income countries (LMICs), this evidence base demands careful interpretation: AI tutors hold genuine promise for scaling personalised instruction, but deployment without pedagogical safeguards risks creating what researchers have termed a "Zone of No Development", where permanent AI scaffolding replaces, rather than supports, cognitive growth.