LLM-Based Automated Grading with Human-in-the-Loop
This paper presents GradeHITL, an LLM-based automated grading framework with human-in-the-loop that enables AI to ask clarifying questions about rubrics to human experts, dynamically refining grading standards for short-answer assessment. The system is evaluated on mathematics teaching knowledge questions, using reinforcement learning to filter high-quality clarification questions.
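The interaction loop just described — grade, ask the expert a clarifying question when the rubric is ambiguous, fold the answer back into the rubric, and regrade — can be sketched as follows. This is a minimal illustrative sketch only; the class and method names are assumptions for exposition, not GradeHITL's actual API, and the stubs stand in for the LLM and the human expert.

```python
class StubLLM:
    """Stands in for the grading LLM: confident only once the rubric
    addresses partial credit (an illustrative ambiguity)."""
    def grade(self, answer, rubric):
        confident = "partial credit" in rubric
        score = 0.5 if confident else 0.0
        return {"score": score, "confident": confident}

    def clarify_rubric(self, rubric, answer):
        # In the framework, candidate questions like this would be
        # filtered for quality (e.g. via reinforcement learning).
        return "Should partially correct answers receive partial credit?"


class StubExpert:
    """Stands in for the human expert answering clarification questions."""
    def answer(self, question):
        return "Yes: award partial credit for partially correct answers."


def grade_with_hitl(answer, rubric, llm, expert, max_rounds=3):
    """Grade a short answer, refining the rubric with expert answers to
    the LLM's clarifying questions until the LLM is confident."""
    result = llm.grade(answer, rubric)
    for _ in range(max_rounds):
        if result["confident"]:
            break
        question = llm.clarify_rubric(rubric, answer)
        rubric = rubric + "\n" + expert.answer(question)  # refine rubric
        result = llm.grade(answer, rubric)
    return result["score"], rubric


score, refined_rubric = grade_with_hitl(
    "The slope is 2 but the intercept is wrong.",
    "Full marks for correct slope and intercept.",
    StubLLM(), StubExpert(),
)
print(score)  # 0.5 after one clarification round
```

The key design point is that the rubric, not the model, accumulates the expert's clarifications, so later answers are graded against the refined standard without further human input.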
The rise of artificial intelligence (AI) technologies, particularly large language models (LLMs), has brought significant advancements to the education field. Among various applications, automatic short answer grading (ASAG), which focuses on evaluating open-ended textual responses, has seen remarkable progress with LLMs. These models not only improve grading performance over traditional ASAG approaches but also move beyond simple comparisons with predefined answers, enabling more sophisticated evaluation.