Rewarding How Models Think Pedagogically: Integrating Pedagogical Reasoning and Thinking Rewards for LLMs in Education

Research / Other | Relevance: 9/10 | 2026 paper

This paper introduces PedagogicalRL-Thinking, a reinforcement learning framework that trains LLM tutors by applying pedagogical rewards to both visible responses and internal reasoning traces, using domain-specific educational theory (Polya's problem-solving framework) to guide the model's thinking process. The approach is evaluated on mathematics tutoring dialogues with metrics measuring solution correctness, answer leakage prevention, and pedagogical helpfulness.
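
The idea of rewarding both the reasoning trace and the visible reply can be illustrated with a minimal sketch. This is not the paper's reward formulation, which this summary does not specify. The `<think>` tag convention, the function names, the equal default weighting, and the use of answer leakage as a hard gate are all assumptions made for illustration. The two scores are assumed to come from some external judge and are passed in as plain floats in [0, 1].

```python
import re
from dataclasses import dataclass

# Assumption: the tutor wraps its internal reasoning in <think>...</think> tags.
THINK_PATTERN = re.compile(r"<think>(.*?)</think>", re.DOTALL)


@dataclass
class TutorTurn:
    thinking: str   # internal reasoning trace (hidden from the student)
    response: str   # visible reply shown to the student


def split_turn(raw: str) -> TutorTurn:
    """Separate the first <think> block from the visible response."""
    match = THINK_PATTERN.search(raw)
    if match is None:
        return TutorTurn(thinking="", response=raw.strip())
    thinking = match.group(1).strip()
    response = (raw[:match.start()] + raw[match.end():]).strip()
    return TutorTurn(thinking=thinking, response=response)


def leaks_answer(response: str, final_answer: str) -> bool:
    """Naive leakage check: the final answer appears verbatim in the reply."""
    return final_answer in response


def combined_reward(
    turn: TutorTurn,
    final_answer: str,
    thinking_score: float,      # hypothetical judge score for the trace
    helpfulness_score: float,   # hypothetical judge score for the reply
    thinking_weight: float = 0.5,  # illustrative value, not from the paper
) -> float:
    """Blend the two scores; a leaked answer zeroes the reward (assumption)."""
    if leaks_answer(turn.response, final_answer):
        return 0.0
    return (1.0 - thinking_weight) * helpfulness_score + thinking_weight * thinking_score


if __name__ == "__main__":
    raw = "<think>Ask what is known before hinting.</think>What do you already know about x?"
    turn = split_turn(raw)
    print(combined_reward(turn, final_answer="42", thinking_score=0.8, helpfulness_score=0.6))
```

In a real training setup the verbatim-match leakage check would be far too weak, and the scalar reward would feed a policy-gradient update. The sketch only shows where a separate thinking-level signal could enter.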

Large language models (LLMs) are increasingly deployed as intelligent tutoring systems, yet research on optimizing LLMs specifically for educational contexts remains limited. Recent works have proposed reinforcement learning approaches for training LLM tutors, but these methods focus solely on optimizing visible responses while neglecting the model's internal thinking process. We introduce PedagogicalRL-Thinking, a framework that extends pedagogical alignment to reasoning LLMs in education throu

Study Type

Research / Other

Tool Types

AI Tutors: 1-to-1 conversational tutoring systems.

Tags

tutoring dialogue evaluation, computer-science