Rewarding How Models Think Pedagogically: Integrating Pedagogical Reasoning and Thinking Rewards for LLMs in Education

Relevance: 9/10 (2026 paper)

This paper introduces PedagogicalRL-Thinking, a reinforcement learning framework that trains LLM tutors to generate pedagogically appropriate reasoning traces, not just responses, by rewarding thinking processes grounded in Polya's problem-solving framework. Evaluation on mathematics tutoring dialogues measures solution correctness, answer leakage prevention, and helpfulness.

Large language models (LLMs) are increasingly deployed as intelligent tutoring systems, yet research on optimizing LLMs specifically for educational contexts remains limited. Recent work has proposed reinforcement learning approaches for training LLM tutors, but these methods focus solely on optimizing visible responses while neglecting the model's internal thinking process. We introduce PedagogicalRL-Thinking, a framework that extends pedagogical alignment to reasoning LLMs in education through rewards on the thinking process itself, grounded in Polya's problem-solving framework.
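The paper does not spell out its reward implementation here, but a minimal sketch helps make the idea concrete: score each tutor turn on the three evaluation axes named above (solution correctness, answer leakage, helpfulness) and combine them into a single RL reward. All function names, weights, and signatures below are illustrative assumptions, not the paper's actual method.

```python
# Hypothetical sketch of a composite pedagogical reward over a tutor turn.
# Assumption: the tutor emits a hidden thinking trace plus a visible reply.
from dataclasses import dataclass


@dataclass
class TutorTurn:
    thinking: str   # the model's internal reasoning trace
    response: str   # the visible reply shown to the student


def solution_correctness(turn: TutorTurn, reference_answer: str) -> float:
    """Illustrative check that the tutor's reasoning reaches the right answer."""
    return 1.0 if reference_answer in turn.thinking else 0.0


def leaks_answer(turn: TutorTurn, reference_answer: str) -> bool:
    """Illustrative check that the visible reply does not reveal the answer."""
    return reference_answer in turn.response


def helpfulness(turn: TutorTurn) -> float:
    """Placeholder for a learned judge scoring pedagogical helpfulness in [0, 1]."""
    return 0.5  # stub; in practice an LLM judge or classifier would score this


def pedagogical_reward(turn: TutorTurn, reference_answer: str,
                       w_correct: float = 1.0,
                       w_leak: float = 1.0,
                       w_help: float = 1.0) -> float:
    """Combine the three signals into one scalar reward for RL training."""
    reward = w_correct * solution_correctness(turn, reference_answer)
    reward -= w_leak * float(leaks_answer(turn, reference_answer))
    reward += w_help * helpfulness(turn)
    return reward


if __name__ == "__main__":
    turn = TutorTurn(
        thinking="The student should isolate x: 2x + 3 = 7 gives x = 2.",
        response="What operation would undo adding 3 to both sides?",
    )
    print(pedagogical_reward(turn, reference_answer="x = 2"))
```

The key design point this sketch tries to capture is that correctness is scored on the thinking trace while leakage is penalized on the visible response, so the tutor is rewarded for working out the solution internally without handing it to the student.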

Tool Types

AI Tutors: 1-to-1 conversational tutoring systems.

Tags

tutoring dialogue evaluation, computer-science