Beyond Flesch-Kincaid: Prompt-based Metrics Improve Difficulty Classification of Educational Texts
This paper introduces and evaluates prompt-based metrics for measuring the difficulty level of educational texts, aiming to improve upon traditional static metrics like Flesch-Kincaid for assessing whether LLM-generated content is appropriately adapted to different education levels. The work combines LLM-based prompt metrics with traditional readability metrics to build a more reliable evaluation of how well AI systems adapt the difficulty of educational content.
Using large language models (LLMs) for educational applications such as dialogue-based teaching has attracted growing interest. Effective teaching, however, requires adapting the difficulty of content and explanations to the education level of the student, and even the best LLMs today struggle to do this well. To improve LLMs on this adaptation task, we first need to measure adaptation success reliably. However, current static metrics for text difficulty, such as the Flesch-Kincaid Reading Ease score, fall short for this purpose.
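To make the baseline concrete, the static metrics in question are simple surface-level formulas. A minimal sketch of the Flesch Reading Ease score and the related Flesch-Kincaid Grade Level follows; the syllable counter here is a naive vowel-group heuristic (an assumption for illustration — production readability tools use dictionaries or more careful rules):

```python
import re

def count_syllables(word: str) -> int:
    # Naive approximation: count runs of consecutive vowels.
    # Real readability tools use pronunciation dictionaries instead.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text: str) -> float:
    # 206.835 - 1.015 * (words/sentences) - 84.6 * (syllables/words)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))

def flesch_kincaid_grade(text: str) -> float:
    # 0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words))
            - 15.59)
```

Because both formulas depend only on word, sentence, and syllable counts, they cannot tell whether an explanation's concepts or reasoning steps suit a given education level — which is the gap prompt-based metrics aim to fill.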