Educators' Perceptions of Large Language Models as Tutors: Comparing Human and AI Tutors in a Blind Text-only Setting
This paper compares LLM-based tutors with human tutors on grade-school math word problems by having educators annotate and compare tutoring dialog snippets on engagement, empathy, scaffolding, and conciseness in a blind text-only setting. The study finds that educators with teaching experience perceive LLM tutors as performing better than human tutors across all four pedagogical dimensions.
The rapid development of Large Language Models (LLMs) opens up the possibility of using them as personal tutors. This has led to the development of several intelligent tutoring systems and learning assistants that use LLMs as back-ends with varying degrees of engineering. In this study, we seek to compare human tutors with LLM tutors in terms of engagement, empathy, scaffolding, and conciseness. We ask human tutors to annotate and compare the performance of an LLM tutor with that of a human tutor.