From Handwriting to Feedback: Evaluating VLMs and LLMs for AI-Powered Assessment in Indonesian Classrooms

Benchmark (Published & Automated) | Relevance: 9/10 | 2025 paper

This paper evaluates vision-language models (VLMs) and large language models (LLMs) for automated grading and feedback generation on over 14K handwritten student answers from Grade 4 classrooms in Indonesia, covering Mathematics and English. The study introduces a multimodal pipeline that processes handwritten responses, grades them against rubrics, and generates personalized Indonesian feedback.
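The three stages named above (read the handwritten response, grade it against a rubric, produce feedback) can be pictured with a minimal Python sketch. This is an illustration of the pipeline's shape only, not the paper's implementation. Every name here (`RubricItem`, `grade`, `make_feedback`, `assess`) is hypothetical. The transcription step is a pluggable callable standing in for a VLM, the keyword-matching grader is a simple rule-based stand-in for the model-based grading the paper evaluates, and the feedback is an English template rather than LLM-generated Indonesian text.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RubricItem:
    """One rubric criterion (hypothetical structure, not from the paper)."""
    keyword: str  # text expected to appear in a correct answer
    points: int   # points awarded when the keyword is present


def grade(transcript: str, rubric: list[RubricItem]) -> int:
    """Sum the points of every rubric item whose keyword appears in the transcript."""
    text = transcript.lower()
    return sum(item.points for item in rubric if item.keyword.lower() in text)


def make_feedback(score: int, max_score: int) -> str:
    """Template feedback; a stand-in for LLM-generated personalized feedback."""
    if score == max_score:
        return "Full marks: every rubric item was found in the answer."
    return f"Score {score}/{max_score}: some rubric items were missing."


def assess(image: bytes,
           transcribe: Callable[[bytes], str],
           rubric: list[RubricItem]) -> dict:
    """Run the three stages: transcribe, grade, generate feedback."""
    transcript = transcribe(image)  # stage 1: handwriting -> text (VLM stand-in)
    score = grade(transcript, rubric)  # stage 2: rubric-based grading
    max_score = sum(item.points for item in rubric)
    feedback = make_feedback(score, max_score)  # stage 3: feedback
    return {
        "transcript": transcript,
        "score": score,
        "max_score": max_score,
        "feedback": feedback,
    }


if __name__ == "__main__":
    # A fake transcriber replaces the vision model so the sketch runs offline.
    rubric = [RubricItem("12 + 8", 1), RubricItem("20", 1)]
    result = assess(b"", lambda _image: "12 + 8 = 20", rubric)
    print(result["score"], result["feedback"])
```

Keeping transcription as an injected callable means the grading and feedback stages can be exercised without any model; a real system would pass a function that calls a VLM instead.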

Despite rapid progress in vision-language and large language models (VLMs and LLMs), their effectiveness for AI-driven educational assessment in real-world, underrepresented classrooms remains largely unexplored. We evaluate state-of-the-art VLMs and LLMs on over 14K handwritten answers from grade-4 classrooms in Indonesia, covering Mathematics and English aligned with the local national curriculum. Unlike prior work on clean digital text, our dataset features naturally curly, diverse handwriting...

Study Type

Benchmark (Published & Automated)

Tool Types

Teacher Support Tools: tools that assist teachers with lesson planning, content generation, grading, and analytics.

Tags

AI grading, rubric evaluation, computer-science