Exploring Automatic Readability Assessment for Science Documents within a Multilingual Educational Context
This paper develops and evaluates automatic readability assessment models for science education documents in Basque, Spanish, and English for secondary education (ages 12-16), using both machine learning and deep learning approaches to help teachers select appropriately leveled texts for multilingual STEM instruction. The work creates three annotated corpora (BasqueARA, Agrega2Es, Agrega2En+) and tests feature-based and transformer-based models to classify text difficulty levels.
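The summary does not say which linguistic features the feature-based models use or which learning algorithm they rely on. The snippet below is a minimal, hypothetical sketch of the general feature-based idea, not the authors' pipeline. It extracts three shallow surface features (mean sentence length in words, mean word length in characters, and type-token ratio) and assigns a text to the difficulty level whose mean feature vector is closest, using a nearest-centroid classifier. That classifier stands in here for the machine learning models the paper evaluates. The function names, the feature set, and the "easy"/"hard" labels are illustrative assumptions.

```python
import re
from statistics import mean


def shallow_features(text: str) -> list[float]:
    """Hypothetical feature set: [mean sentence length (words), mean word length (chars), type-token ratio]."""
    # Naive sentence split on terminal punctuation; empty fragments are dropped.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # \w is Unicode-aware in Python 3, so accented Spanish and Basque letters count as word characters.
    words = re.findall(r"\w+", text.lower())
    if not sentences or not words:
        return [0.0, 0.0, 0.0]
    avg_sentence_len = len(words) / len(sentences)
    avg_word_len = mean(len(w) for w in words)
    type_token_ratio = len(set(words)) / len(words)
    return [avg_sentence_len, avg_word_len, type_token_ratio]


def train_centroids(texts: list[str], labels: list[str]) -> dict[str, list[float]]:
    """Average the feature vectors of all training texts that share a difficulty label."""
    grouped: dict[str, list[list[float]]] = {}
    for text, label in zip(texts, labels):
        grouped.setdefault(label, []).append(shallow_features(text))
    return {label: [mean(col) for col in zip(*vecs)] for label, vecs in grouped.items()}


def predict(centroids: dict[str, list[float]], text: str) -> str:
    """Return the label whose centroid has the smallest squared Euclidean distance to the text's features."""
    feats = shallow_features(text)

    def sq_dist(centroid: list[float]) -> float:
        return sum((f - c) ** 2 for f, c in zip(feats, centroid))

    return min(centroids, key=lambda label: sq_dist(centroids[label]))


if __name__ == "__main__":
    # Toy two-level training set; real corpora would have many annotated texts per level.
    centroids = train_centroids(
        [
            "The cell is small. It has parts.",
            "Mitochondria, frequently characterised as the powerhouse of eukaryotic cells, "
            "synthesise adenosine triphosphate through oxidative phosphorylation.",
        ],
        ["easy", "hard"],
    )
    print(predict(centroids, "Plants need light. They grow."))  # prints "easy"
```

The features are left unscaled, so sentence length dominates the distance. A practical system would normalise features, use a much richer feature set, and train a stronger classifier on an annotated corpus.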
Current student-centred, multilingual, active teaching methodologies require that teachers have continuous access to texts that are appropriate in both topic and level of language competence. However, finding such materials is arduous and time-consuming for teachers. Building on automatic readability assessment research, which could help teachers with this task, we explore how natural language processing approaches perform on educational science documents for secondary