Automated Assessment of Students' Code Comprehension using LLMs
This paper evaluates Large Language Models (LLMs) for automatically assessing the correctness of students' line-by-line natural language explanations of computer code (Java programs) in introductory CS courses, comparing LLM performance against fine-tuned encoder-based semantic similarity models. The work focuses on automated short answer grading in the programming domain to support self-explanation activities that improve code comprehension.
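As a rough illustration of the encoder-based semantic similarity baseline mentioned above, the sketch below scores a student's explanation of a code line against a reference (expert) explanation using a pre-trained Sentence-BERT model from the sentence-transformers library. The specific checkpoint, correctness threshold, and example texts are illustrative assumptions, not the models or data used in this work.

```python
# Minimal sketch of an encoder-based semantic similarity grader.
# Assumptions: the checkpoint name, the 0.7 threshold, and the example
# texts are illustrative only; they are not the paper's actual setup.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed generic checkpoint

reference = "The loop iterates over the array and accumulates the sum of its elements."
student = "This for loop goes through every element and adds it to a running total."

# Encode both explanations and compare them with cosine similarity.
embeddings = model.encode([reference, student], convert_to_tensor=True)
similarity = util.cos_sim(embeddings[0], embeddings[1]).item()

# Treat the explanation as correct if it is semantically close to the reference.
is_correct = similarity >= 0.7
print(f"cosine similarity = {similarity:.3f}, correct = {is_correct}")
```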
Assessing students' answers, and in particular their natural language answers, is a crucial challenge in the field of education. Advances in machine learning, including transformer-based models such as Large Language Models (LLMs), have led to significant progress on a variety of natural language tasks. Nevertheless, despite the growing trend of evaluating LLMs across diverse tasks, their use for automated answer assessment has received comparatively little attention. To address this gap, we explore the potential of LLMs for assessing the correctness of students' explanations of computer code.
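To make the LLM-based assessment concrete, the following sketch prompts a chat model to judge a student's explanation of a single line of Java code. The prompt wording, model name, and binary output format are assumptions made for illustration and do not reflect the exact protocol evaluated in this paper.

```python
# Minimal sketch of prompting an LLM to judge a line-by-line explanation.
# Assumptions: the prompt template, model name, and 'correct'/'incorrect'
# output format are illustrative, not the paper's exact setup.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

code_line = "for (int i = 0; i < arr.length; i++) { sum += arr[i]; }"
student_explanation = "This loop adds every element of the array to sum."

prompt = (
    "You are grading a student's explanation of a line of Java code.\n"
    f"Code line: {code_line}\n"
    f"Student explanation: {student_explanation}\n"
    "Answer with 'correct' or 'incorrect', then one sentence of justification."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model; the models studied here may differ
    messages=[{"role": "user", "content": prompt}],
    temperature=0,  # deterministic output for grading consistency
)
print(response.choices[0].message.content)
```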