FEANEL: A Benchmark for Fine-Grained Error Analysis in K-12 English Writing
FEANEL is a benchmark for evaluating LLMs' ability to provide fine-grained error analysis and pedagogical feedback on K-12 English writing. It comprises 1,000 student essays with expert-annotated errors, each labeled with an error type and severity and paired with an explanation. The benchmark specifically assesses whether AI systems can both identify writing errors and provide educationally meaningful, interpretable feedback that supports student learning.
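To make the annotation structure concrete, the following is a minimal sketch of what one annotated record might look like, assuming a per-error schema with a text span, an error type, a severity label, and a pedagogical explanation. The field names, the example essay, and the severity scale are illustrative assumptions, not the benchmark's released format.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical schema for a FEANEL-style record. Field names
# (essay_id, span, error_type, severity, explanation) are
# illustrative assumptions, not the benchmark's actual format.

@dataclass
class ErrorAnnotation:
    span: str         # the erroneous text fragment
    error_type: str   # e.g., "verb tense"
    severity: str     # assumed scale: "minor" | "moderate" | "severe"
    explanation: str  # pedagogical explanation aimed at the student

@dataclass
class EssayRecord:
    essay_id: str
    text: str
    annotations: List[ErrorAnnotation]

# A single invented example record for illustration only.
record = EssayRecord(
    essay_id="essay_0001",
    text="Yesterday I go to the park with my friends.",
    annotations=[
        ErrorAnnotation(
            span="go",
            error_type="verb tense",
            severity="moderate",
            explanation=(
                "'Yesterday' signals past time, so the verb "
                "should be the past tense form 'went'."
            ),
        )
    ],
)

print(f"{record.essay_id}: {len(record.annotations)} annotated error(s)")
```

A schema of this shape would let an evaluator score error detection (span and type), severity judgment, and explanation quality separately, which matches the benchmark's stated goal of fine-grained, interpretable feedback.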
Large Language Models (LLMs) have transformed artificial intelligence, offering profound opportunities for educational applications. However, their ability to provide fine-grained educational feedback on K-12 English writing remains underexplored. In this paper, we challenge the error analysis and pedagogical skills of LLMs by introducing the problem of fine-grained error analysis for English learners and presenting the Fine-grained Error ANalysis for English Learners (FEANEL) Benchmark. The benchmark comprises 1,000 student essays with expert-annotated errors, each categorized by type and severity and accompanied by a pedagogical explanation.