From Handwriting to Feedback: Evaluating VLMs and LLMs for AI-Powered Assessment in Indonesian Classrooms

Relevance: 9/10 (2025 paper)

This paper evaluates state-of-the-art VLMs and LLMs on two tasks: grading more than 14,000 handwritten answers from Indonesian Grade 4 students in Mathematics and English, and generating personalized feedback on those answers. The study introduces a multimodal pipeline that pairs vision models for handwriting recognition with language models for rubric-based assessment, set in real-world, underrepresented classrooms.
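The two-stage idea described above (recognize the handwriting first, then grade the transcription against a rubric) can be sketched roughly as follows. This is only an illustrative outline, not the paper's implementation: the names `Rubric`, `GradedAnswer`, `grade_handwritten_answer`, `stub_transcribe`, and `stub_assess` are hypothetical, and the two stubs merely stand in for a real VLM and LLM so that the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Rubric:
    # Hypothetical, deliberately minimal rubric: one expected answer and a score cap.
    expected_answer: str
    max_score: int


@dataclass
class GradedAnswer:
    transcription: str
    score: int
    feedback: str


def grade_handwritten_answer(
    image: bytes,
    rubric: Rubric,
    transcribe: Callable[[bytes], str],
    assess: Callable[[str, Rubric], Tuple[int, str]],
) -> GradedAnswer:
    """Run the two stages in order: handwriting recognition, then rubric-based grading."""
    # Stage 1: a vision model turns the scanned answer into text.
    text = transcribe(image)
    # Stage 2: a language model scores the text and drafts feedback.
    raw_score, feedback = assess(text, rubric)
    # Clamp the score so a misbehaving grader cannot leave the rubric's range.
    score = max(0, min(raw_score, rubric.max_score))
    return GradedAnswer(transcription=text, score=score, feedback=feedback)


def stub_transcribe(image: bytes) -> str:
    # Stand-in for a VLM call (assumption): treats the bytes as already-recognized text.
    return image.decode("utf-8")


def stub_assess(text: str, rubric: Rubric) -> Tuple[int, str]:
    # Stand-in for an LLM grader (assumption): full marks if the expected answer appears.
    if rubric.expected_answer in text:
        return rubric.max_score, "Correct."
    return 0, f"Expected {rubric.expected_answer}."


if __name__ == "__main__":
    rubric = Rubric(expected_answer="7", max_score=2)
    result = grade_handwritten_answer(
        b"3 + 4 = 7", rubric, stub_transcribe, stub_assess
    )
    print(result)
```

Passing the two stages in as callables keeps the sketch independent of any particular model provider; in practice each stub would be replaced by a call to whichever VLM or LLM is under evaluation.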

Despite rapid progress in vision-language and large language models (VLMs and LLMs), their effectiveness for AI-driven educational assessment in real-world, underrepresented classrooms remains largely unexplored. We evaluate state-of-the-art VLMs and LLMs on over 14K handwritten answers from grade-4 classrooms in Indonesia, covering Mathematics and English aligned with the local national curriculum. Unlike prior work on clean digital text, our dataset features naturally curly, diverse handwritin

Tool Types

Teacher Support Tools: tools that assist teachers with lesson planning, content generation, grading, and analytics.

Tags

AI grading, rubric evaluation, computer-science