Cultivating Helpful, Personalized, and Creative AI Tutors: A Framework for Pedagogical Alignment using Reinforcement Learning
EduAlign proposes a reinforcement learning framework to align large language models with the pedagogical principles of helpfulness, personalization, and creativity for AI tutoring. The framework includes HPC-RM, a multi-dimensional reward model trained on 8k educational interactions, and uses Group Relative Policy Optimization (GRPO) to fine-tune LLMs into more pedagogically effective AI tutors.
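To make the summary concrete, the following minimal sketch shows how per-dimension reward scores could feed the group-relative advantage that GRPO uses in place of a learned value baseline. It is an illustration under stated assumptions, not the EduAlign implementation: the equal-weight aggregation in `combine_hpc`, the function names, and the example scores are all hypothetical, and the population standard deviation is one common choice among several.

```python
from statistics import mean, pstdev


def combine_hpc(scores, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Collapse (helpfulness, personalization, creativity) scores into one scalar.

    ASSUMPTION: an equal-weight average. The source does not state how the
    three reward dimensions are aggregated.
    """
    return sum(w * s for w, s in zip(weights, scores))


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of responses sampled for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps), so a response
    is credited only for how it compares with its siblings in the group.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; eps guards against a zero-variance group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical reward-model scores for three responses to a single student query.
group_scores = [(2, 1, 1), (1, 1, 1), (2, 2, 2)]
rewards = [combine_hpc(s) for s in group_scores]
advantages = group_relative_advantages(rewards)
```

Responses scoring above the group mean receive positive advantages and are reinforced, while those below it are discouraged. If every response in a group earns the same reward, all advantages are zero and that group contributes no learning signal.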
The integration of large language models (LLMs) into education presents unprecedented opportunities for scalable personalized learning. However, standard LLMs often function as generic information providers, lacking alignment with fundamental pedagogical principles such as helpfulness, student-centered personalization, and creativity cultivation. To bridge this gap, we propose EduAlign, a novel framework designed to guide LLMs toward becoming more effective and responsible educational assistants.