From Handwriting to Feedback: Evaluating VLMs and LLMs for AI-Powered Assessment in Indonesian Classrooms
This paper evaluates vision-language models (VLMs) and large language models (LLMs) for automated grading and feedback generation on more than 14K handwritten student answers from Grade 4 classrooms in Indonesia, covering Mathematics and English. The study introduces a multimodal pipeline that processes handwritten responses, grades them against rubrics, and generates personalized feedback in Indonesian.
Despite rapid progress in vision-language and large language models (VLMs and LLMs), their effectiveness for AI-driven educational assessment in real-world, underrepresented classrooms remains largely unexplored. We evaluate state-of-the-art VLMs and LLMs on over 14K handwritten answers from Grade 4 classrooms in Indonesia, covering Mathematics and English, aligned with the local national curriculum. Unlike prior work on clean digital text, our dataset features naturally curly, diverse handwritin