FEANEL: A Benchmark for Fine-Grained Error Analysis in K-12 English Writing
FEANEL is a benchmark dataset of 1,000 K-12 student essays (elementary and secondary) annotated with fine-grained errors by language education experts. It evaluates LLMs' ability to identify error types, assess error severity, and provide pedagogical explanations for English writing errors. The benchmark organizes errors with a part-of-speech-based taxonomy and assesses state-of-the-art LLMs on both error analysis accuracy and feedback quality.
Large Language Models (LLMs) have transformed artificial intelligence, offering profound opportunities for educational applications. However, their ability to provide fine-grained educational feedback on K-12 English writing remains underexplored. In this paper, we challenge the error analysis and pedagogical skills of LLMs by introducing the problem of fine-grained error analysis for English learners and presenting the Fine-grained Error ANalysis for English Learners (FEANEL) Benchmark. The bench