Detecting LLM-Generated Text in Computing Education: A Comparative Study of ChatGPT Cases
This paper evaluates eight publicly available LLM-generated text detectors (including GPTZero, CopyLeaks, GPTKit, and GLTR) on computing education assignments from university students, comparing their accuracy, false positive rates, and resilience to paraphrasing tools. The study focuses on detecting academic integrity violations when students use ChatGPT to complete assignments, finding significant variation in detector performance and high false positive rates.
Recent improvements in and wide availability of Large Language Models (LLMs) pose a serious threat to academic integrity in education. Modern LLM-generated text detectors attempt to combat this problem by offering educators services to assess whether a given text is LLM-generated. In this work, we collected 124 submissions from computer science students written before the release of ChatGPT. We then generated 40 ChatGPT submissions. We used this data to evaluate eight publicly available LLM-generated text detectors.