Assessing LLM Text Detection in Educational Contexts: Does Human Contribution Affect Detection?
This paper benchmarks LLM-generated text detection methods in educational contexts using a novel dataset (GEDE) of student essays and LLM-generated texts across different 'contribution levels' (from fully human-written to fully AI-generated). The work evaluates state-of-the-art detectors' performance in identifying AI-generated student submissions, particularly examining false positive rates and detection across different levels of human-AI collaboration.
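As a rough illustration of the kind of evaluation described above, the sketch below computes per-contribution-level detection and false positive rates for an arbitrary detector. The `detector_score` function, the record fields, and the decision threshold are hypothetical placeholders for illustration only, not the paper's implementation or the GEDE schema.

```python
from collections import defaultdict

def detector_score(text: str) -> float:
    """Placeholder for any detector mapping a text to a probability of being LLM-generated."""
    return 0.5  # a real detector (e.g., a fine-tuned classifier) would go here

def evaluate_by_level(records, threshold=0.5):
    """records: iterable of dicts with 'text', 'level' (0 = fully human-written,
    higher = more AI contribution), and 'is_ai' (ground-truth label)."""
    stats = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for r in records:
        pred_ai = detector_score(r["text"]) >= threshold
        s = stats[r["level"]]
        if r["is_ai"]:
            s["tp" if pred_ai else "fn"] += 1
        else:
            s["fp" if pred_ai else "tn"] += 1
    results = {}
    for level, s in stats.items():
        fpr = s["fp"] / (s["fp"] + s["tn"]) if (s["fp"] + s["tn"]) else float("nan")
        tpr = s["tp"] / (s["tp"] + s["fn"]) if (s["tp"] + s["fn"]) else float("nan")
        results[level] = {"false_positive_rate": fpr, "detection_rate": tpr}
    return results
```

Reporting false positive rates separately at each contribution level, rather than a single aggregate score, is what lets a benchmark like this reveal whether detectors unfairly flag essays with substantial human contribution.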
Recent advancements in Large Language Models (LLMs) and their increased accessibility have made it easier than ever for students to automatically generate texts, posing new challenges for educational institutions. To enforce norms of academic integrity and ensure students' learning, learning analytics methods to automatically detect LLM-generated text appear increasingly appealing. This paper benchmarks the performance of different state-of-the-art detectors in educational contexts, introducing a novel dataset, GEDE, of student essays and LLM-generated texts spanning different contribution levels, from fully human-written to fully AI-generated.