Pedagogy-driven Evaluation of Generative AI-powered Intelligent Tutoring Systems

Relevance: 9/10 | 1 citation | 2025 paper

This paper reviews evaluation practices for LLM-powered Intelligent Tutoring Systems (ITSs), critically analyzes existing benchmarks, and proposes three pedagogy-driven research directions for establishing unified, scalable evaluation methodologies grounded in learning science principles. It also raises concerns about cognitive offloading, citing empirical studies that link students' over-reliance on AI tutors to reduced independent problem-solving skills.

The interdisciplinary research domain of Artificial Intelligence in Education (AIED) has a long history of developing Intelligent Tutoring Systems (ITSs) by integrating insights from technological advancements, educational theories, and cognitive psychology. The remarkable success of generative AI (GenAI) models has accelerated the development of large language model (LLM)-powered ITSs, which have potential to imitate human-like, pedagogically rich, and cognitively demanding tutoring. However, t

Tool Types

AI Tutors: 1-to-1 conversational tutoring systems.

Tags

tutoring dialogue evaluation, computer-science