Automated Educational Question Generation at Different Bloom's Skill Levels Using Large Language Models: Strategies and Evaluation
This paper evaluates five large language models' ability to automatically generate educational questions at different cognitive levels of Bloom's taxonomy using advanced prompting techniques, with question quality assessed by both human experts and an LLM-based evaluator. The study finds that, when properly prompted, LLMs can generate pedagogically relevant questions across cognitive levels, though performance varies significantly across models and the automated LLM-based evaluation does not consistently match human judgment.
Developing questions that are pedagogically sound, relevant, and conducive to learning is a challenging and time-consuming task for educators. Modern large language models (LLMs) generate high-quality content across multiple domains and could therefore help educators develop high-quality questions. Automated educational question generation (AEQG) is important for scaling online education that caters to a diverse student population. Past attempts at AEQG have shown limited abilities to generate questio