Unveiling the Tapestry of Automated Essay Scoring: A Comprehensive Investigation of Accuracy, Fairness, and Generalizability

Relevance: 7/10 | 15 citations | 2024 paper

This paper comprehensively evaluates nine Automated Essay Scoring (AES) methods across more than 25,000 student essays, examining the interplay between predictive accuracy, fairness (bias across demographic groups), and generalizability in both prompt-specific and cross-prompt settings. The study finds that prompt-specific models achieve higher accuracy but exhibit greater bias across economic-status groups than cross-prompt models.

Automatic Essay Scoring (AES) is a well-established educational pursuit that employs machine learning to evaluate student-authored essays. While much effort has been made in this area, current research primarily focuses on either (i) boosting the predictive accuracy of an AES model for a specific prompt (i.e., developing prompt-specific models), which often relies heavily on labeled data from the same target prompt; or (ii) assessing the applicability of AES models developed on non-target prompts to essays written for the target prompt (i.e., developing cross-prompt models).
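The exact accuracy and fairness metrics used by the paper are not spelled out in this summary. As a minimal sketch, assuming quadratic weighted kappa (QWK, a standard AES agreement metric) as the accuracy measure and economic status as the demographic attribute, a per-group evaluation like the following illustrates the kind of accuracy/bias gap reported between prompt-specific and cross-prompt models. The function names and data below are hypothetical and for illustration only.

```python
# Illustrative sketch (not the paper's code): compute QWK overall and per
# demographic group, then report the largest between-group gap as a simple
# bias indicator.
import numpy as np
from sklearn.metrics import cohen_kappa_score

def qwk(y_true, y_pred):
    """Quadratic weighted kappa between human and model scores."""
    return cohen_kappa_score(y_true, y_pred, weights="quadratic")

def group_accuracy_gap(y_true, y_pred, groups):
    """Per-group QWK and the max pairwise gap (a larger gap suggests more bias)."""
    per_group = {
        g: qwk(y_true[groups == g], y_pred[groups == g])
        for g in np.unique(groups)
    }
    gap = max(per_group.values()) - min(per_group.values())
    return per_group, gap

# Hypothetical data: human scores, model scores, and an economic-status label per essay.
y_true = np.array([3, 4, 2, 5, 3, 4, 1, 2])
y_pred = np.array([3, 4, 3, 4, 3, 4, 2, 2])
econ_status = np.array(["low", "high", "low", "high", "low", "high", "low", "high"])

print("Overall QWK:", qwk(y_true, y_pred))
per_group, gap = group_accuracy_gap(y_true, y_pred, econ_status)
print("Per-group QWK:", per_group, "Gap:", gap)
```

In this framing, a prompt-specific model might show a higher overall QWK but a larger per-group gap than a cross-prompt model, which is the trade-off the study highlights.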

Tool Types

Teacher Support Tools: tools that assist teachers with lesson planning, content generation, grading, and analytics.

Tags

automated essay scoring, evaluation, computer-science