UCO: A Multi-Turn Interactive Reinforcement Learning Method for Adaptive Teaching with Large Language Models

Relevance: 8/10 · 2025 paper

This paper proposes UCO, a multi-turn interactive reinforcement learning method that trains large language models to become adaptive AI tutors by tracking students' cognitive states through the Zone of Proximal Development framework and rewarding teaching strategies that promote genuine understanding rather than answer repetition. The method is evaluated on mathematics tutoring benchmarks (BigMath and MathTutorBench) and demonstrates performance comparable to closed-source models.
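The summary above does not specify UCO's actual reward. The sketch below is a hypothetical illustration of the distinction it draws: a reward that checks only answer correctness, compared with one that withholds credit when the student's correct reply is a near-verbatim echo of the tutor's words. The function names, the use of Python's `difflib.SequenceMatcher` as a similarity measure, and the 0.8 echo threshold are all assumptions made for illustration. None of them come from the paper, and the sketch does not model the Zone of Proximal Development tracking the method relies on.

```python
from difflib import SequenceMatcher


def correctness_only_reward(student_answer: str, reference: str) -> float:
    """Hypothetical baseline: 1.0 whenever the final answer matches the reference."""
    return 1.0 if student_answer.strip() == reference.strip() else 0.0


def echo_ratio(student_reply: str, tutor_turns: list[str]) -> float:
    """Highest character-level similarity (0..1) between the reply and any tutor turn."""
    return max(
        (SequenceMatcher(None, student_reply.lower(), turn.lower()).ratio()
         for turn in tutor_turns),
        default=0.0,
    )


def echo_aware_reward(student_answer: str, reference: str, student_reply: str,
                      tutor_turns: list[str], echo_threshold: float = 0.8) -> float:
    """Hypothetical variant: correct answers earn no credit if the reply parrots the tutor."""
    if echo_ratio(student_reply, tutor_turns) >= echo_threshold:
        return 0.0  # correct or not, a near-copy of tutor text is treated as repetition
    return correctness_only_reward(student_answer, reference)


if __name__ == "__main__":
    tutor_turns = ["The answer is 12 because 3 times 4 is 12."]
    echoed = "The answer is 12 because 3 times 4 is 12."
    reasoned = "I multiplied the 3 rows by the 4 columns and got 12."

    # A correctness-only reward cannot tell these two replies apart.
    print(correctness_only_reward("12", "12"))                        # 1.0 for both
    print(echo_aware_reward("12", "12", echoed, tutor_turns))         # 0.0
    print(echo_aware_reward("12", "12", reasoned, tutor_turns))       # 1.0
```

Surface string similarity is a crude proxy for "echoing". A real system would need a richer signal about the student's understanding, which is what the paper's cognitive-state tracking is meant to provide.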

Large language models (LLMs) are shifting from answer providers to intelligent tutors in educational settings, yet current supervised fine-tuning methods learn only surface teaching patterns and lack the ability to adapt dynamically. Recent reinforcement learning approaches address this limitation but face two critical challenges. First, they evaluate teaching effectiveness solely by whether students produce correct outputs, so they cannot distinguish whether students genuinely understand or echo…

Tool Types

AI Tutors: 1-to-1 conversational tutoring systems.

Tags

tutoring, dialogue, evaluation, computer-science