How Useful are Educational Questions Generated by Large Language Models?
This paper evaluates the quality and usefulness of educational questions generated by large language models (specifically GPT-3/InstructGPT) using controllable text generation with Bloom's taxonomy and difficulty levels, validated through teacher assessments across two domains. The study demonstrates that LLM-generated questions are judged by teachers as high quality and sufficiently useful for classroom use.
Controllable text generation (CTG) by large language models has huge potential to transform education for teachers and students alike. Specifically, high-quality and diverse question generation can dramatically reduce the load on teachers and improve the quality of their educational content. Recent work in this domain has made progress with generation, but fails to show that real teachers judge the generated questions as sufficiently useful for the classroom setting; or if instead the question