Docimological Quality Analysis of LLM-Generated Multiple Choice Questions in Computer Science and Medicine
This paper evaluates the docimological quality of multiple-choice questions (MCQs) automatically generated by Large Language Models (LLMs, GPT-based) in the computer science and medicine domains, comparing them against established MCQ item-writing guidelines. The study analyzes LLM-generated questions for pedagogical quality, including clarity, distractor plausibility, and alignment with educational best practices.
Assessment is an essential part of education, both for teachers who assess their students and for learners who may evaluate themselves. Multiple-choice questions (MCQs) are one of the most popular types of knowledge assessment, e.g., in medical education, as they can be automatically graded and can cover a wide range of learning items. However, the creation of high-quality MCQ items is a time-consuming task. The recent advent of Large Language Models (LLMs), such as Generative Pre-trained Transformers (GPT), opens the possibility of generating such items automatically.
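To illustrate why MCQs lend themselves to automated grading, a minimal Python sketch follows; the item identifiers, answer key, and response layout are hypothetical and not taken from the study.

```python
# Minimal sketch of automatic MCQ grading: each response is compared
# against an answer key, which is what makes MCQs cheap to score at scale.
# Item IDs and data below are illustrative only.

answer_key = {"q1": "B", "q2": "D", "q3": "A"}

def grade(responses: dict[str, str]) -> float:
    """Return the fraction of items answered correctly."""
    correct = sum(
        1 for item, choice in responses.items()
        if answer_key.get(item) == choice
    )
    return correct / len(answer_key)

student = {"q1": "B", "q2": "C", "q3": "A"}
print(f"Score: {grade(student):.0%}")  # Score: 67%
```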