How Useful are Educational Questions Generated by Large Language Models?
This paper evaluates the quality and usefulness of educational questions generated by large language models (specifically InstructGPT/GPT-3). The questions are produced with controllable text generation conditioned on Bloom's taxonomy and difficulty levels, and teachers in two domains (computer science and biology) rate them on quality and usefulness for classroom use.
Controllable text generation (CTG) by large language models has huge potential to transform education for teachers and students alike. Specifically, high-quality and diverse question generation can dramatically reduce the load on teachers and improve the quality of their educational content. Recent work in this domain has made progress with generation, but fails to show that real teachers judge the generated questions as sufficiently useful for the classroom setting; or if instead the question