Comprehensive Readability Assessment of Scientific Learning Resources
This paper presents AGREE, a dataset of 42,850 computer science learning resources (research papers, lecture notes, Wikipedia content), evaluates their readability using 14 readability indices and 12 lexical measures, and trains machine learning models on these features to assess text difficulty. The work focuses on readability assessment of CS academic and technical content for self-learners, not K-12 educational contexts.
Readability is a measure of how easy or difficult a piece of text is to read. Readability assessment plays a crucial role in guiding content writers and proofreaders on how easy or difficult a piece of text is. In the literature, classical readability indices, lexical measures, and deep learning-based models have been proposed to assess text readability. However, readability assessment using machine and deep learning is a data-intensive task, which requires a reasonably sized dataset for accurate assessment.
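To make the feature side concrete, the sketch below computes one classical readability index, the Flesch Reading Ease score, which is among the indices commonly used in such pipelines. The paper's exact set of 14 indices is not enumerated here, so the choice of Flesch and the vowel-group syllable heuristic are illustrative assumptions, not the authors' implementation.

```python
import re

def count_syllables(word: str) -> int:
    """Rough vowel-group heuristic for English syllable counting (an approximation)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1:
        count -= 1  # discount a likely silent trailing 'e'
    return max(count, 1)

def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease = 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / len(sentences)) - 84.6 * (syllables / len(words))

# Higher scores indicate easier text (roughly 90-100 for very easy, below 30 for very hard).
print(round(flesch_reading_ease("Readability is a measure of how easy a text is to read."), 1))
```

In a pipeline like the one described, scores from indices such as this, together with lexical measures, would form the feature vector fed to the machine learning models.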