UCO: A Multi-Turn Interactive Reinforcement Learning Method for Adaptive Teaching with Large Language Models

Benchmark (Published & Automated) | Relevance: 8/10 | 2025 paper

This paper proposes UCO (Unidirectional Cognitive Optimization), a multi-turn interactive reinforcement learning method for training LLM-based AI tutors. The method adapts teaching strategies to students' cognitive states by evaluating genuine understanding rather than just correct answers, and by operating within each student's Zone of Proximal Development. It is evaluated on the BigMath and MathTutorBench benchmarks against 11 baseline models.

Large language models (LLMs) are shifting from answer providers to intelligent tutors in educational settings, yet current supervised fine-tuning methods only learn surface teaching patterns without dynamic adaptation capabilities. Recent reinforcement learning approaches address this limitation but face two critical challenges. First, they evaluate teaching effectiveness solely based on whether students produce correct outputs, unable to distinguish whether students genuinely understand or echo

Study Type

Benchmark (Published & Automated)

Tool Types

AI Tutors: 1-to-1 conversational tutoring systems.
Personalised Adaptive Learning: Systems that adapt content and difficulty to individual learners.

Tags

tutoring dialogue evaluation computer-science