Automated Feedback in Math Education: A Comparative Analysis of LLMs for Open-Ended Responses
This paper compares three models (fine-tuned Mistral/GOAT, SBERT-Canberra, and GPT-4) for automatically scoring and providing qualitative feedback on middle-school students' open-ended math responses, evaluating both scoring accuracy and feedback quality using teacher judgments.
The effectiveness of feedback in enhancing learning outcomes is well documented within Educational Data Mining (EDM). Prior research has explored a range of methodologies for making feedback more effective, and recent developments in Large Language Models (LLMs) have extended their utility to automated feedback systems. This study explores the potential of LLMs to facilitate automated feedback in math education. We examine the effectiveness of LLMs in evaluating student responses and providing qualitative feedback, using teacher judgments as the reference standard.
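To make the scoring-accuracy evaluation concrete, the sketch below shows one common way to quantify agreement between model-assigned and teacher-assigned scores: quadratic weighted kappa, a standard metric for ordinal grading tasks. This is an illustrative assumption, not the paper's stated procedure; the score scale and data here are hypothetical.

```python
# Minimal sketch: agreement between model scores and teacher scores.
# Assumes integer scores on a 0-4 scale; quadratic weighted kappa is a
# common choice for ordinal grades (the paper's exact metric is not
# specified in this excerpt).
from sklearn.metrics import cohen_kappa_score

teacher_scores = [4, 2, 3, 0, 1, 4, 2]   # hypothetical teacher judgments
model_scores   = [4, 2, 2, 1, 1, 4, 3]   # hypothetical model predictions

qwk = cohen_kappa_score(teacher_scores, model_scores, weights="quadratic")
print(f"Quadratic weighted kappa: {qwk:.3f}")
```

Quadratic weighting penalizes large disagreements (e.g., a 4 scored as 0) more heavily than adjacent ones, which matches how graders typically think about near-miss scores.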