ReadMe++: Benchmarking Multilingual Language Models for Multi-Domain Readability Assessment
This paper presents ReadMe++, a multilingual benchmark for sentence-level readability assessment covering 5 languages (Arabic, English, French, Hindi, Russian) and 112 data sources, with sentences annotated on the CEFR scale from the perspective of second-language learners. The work evaluates the ability of various language models to predict text difficulty levels in supervised, unsupervised, and few-shot settings.
We present a comprehensive evaluation of large language models for multilingual readability assessment. Existing evaluation resources lack domain and language diversity, limiting the ability to perform cross-domain and cross-lingual analyses. This paper introduces ReadMe++, a multilingual multi-domain dataset with human annotations of 9757 sentences in Arabic, English, French, Hindi, and Russian, collected from 112 different data sources. This benchmark will encourage research on developing robust multilingual readability assessment methods.