Beyond Flesch-Kincaid: Prompt-based Metrics Improve Difficulty Classification of Educational Texts
This paper introduces prompt-based metrics to evaluate text difficulty and appropriateness for different education levels, improving upon traditional readability measures like Flesch-Kincaid. The authors develop and validate these metrics through user studies and regression experiments to better measure LLMs' ability to adapt educational content to student levels.
Using large language models (LLMs) for educational applications like dialogue-based teaching is a hot topic. Effective teaching, however, requires teachers to adapt the difficulty of content and explanations to the education level of their students. Even the best LLMs today struggle to do this well. If we want to improve LLMs on this adaptation task, we need to be able to measure adaptation success reliably. However, current static metrics for text difficulty, like the Flesch-Kincaid Reading Eas