Machine-Assisted Grading of Nationwide School-Leaving Essay Exams with LLMs and Statistical NLP
This paper evaluates LLM and statistical NLP methods for automated grading of nationwide school-leaving essay exams in Estonia, comparing their performance against human raters using curriculum-based rubrics. The study examines two full national cohorts of trial exams, assessing reliability, validity, bias, and the viability of human-in-the-loop automated scoring for high-stakes K-12 assessments.
Large language models (LLMs) enable rapid and consistent automated evaluation of open-ended exam responses, including dimensions of content and argumentation that have traditionally required human judgment. This is particularly important when large numbers of exams must be graded within a limited time frame, as is the case for nationwide graduation exams in various countries. Here, we examine the applicability of automated scoring to two large datasets of trial exam essays covering two full national cohorts.
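A standard way to quantify agreement between automated and human raters on ordinal rubric scores is quadratic weighted kappa (QWK), which is widely used in automated essay scoring evaluations. The sketch below is illustrative only: the score range and rater lists are hypothetical placeholders, not values from this study.

```python
from collections import Counter

def quadratic_weighted_kappa(human, model, min_score, max_score):
    """Agreement between two raters on an ordinal scale.

    Returns 1.0 for perfect agreement, 0.0 for chance-level
    agreement, and negative values for systematic disagreement.
    Scores are assumed to be integers in [min_score, max_score].
    """
    n_cat = max_score - min_score + 1
    n = len(human)

    # Observed agreement matrix: counts of (human score, model score) pairs.
    observed = [[0] * n_cat for _ in range(n_cat)]
    for h, m in zip(human, model):
        observed[h - min_score][m - min_score] += 1

    # Marginal score distributions, used to build the matrix
    # expected under rater independence.
    hist_h = Counter(h - min_score for h in human)
    hist_m = Counter(m - min_score for m in model)

    num = den = 0.0
    for i in range(n_cat):
        for j in range(n_cat):
            # Quadratic penalty: disagreements farther apart weigh more.
            w = (i - j) ** 2 / (n_cat - 1) ** 2
            expected = hist_h[i] * hist_m[j] / n
            num += w * observed[i][j]
            den += w * expected
    return 1.0 - num / den
```

In practice the same quantity is available as `sklearn.metrics.cohen_kappa_score(..., weights="quadratic")`; the hand-rolled version above only makes the weighting explicit.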