Assessing LLM Text Detection in Educational Contexts: Does Human Contribution Affect Detection?

Benchmark (Published & Automated) | Relevance: 7/10 | Citations: 2 | 2025 paper

This paper introduces GEDE (Generative Essay Detection in Education), a benchmark dataset with over 900 student essays and 12,500 LLM-generated essays across various contribution levels (from human-written to fully AI-generated), and evaluates five state-of-the-art detection methods. The benchmark assesses detectors' ability to identify different levels of student vs. LLM contribution in educational writing assignments.

Recent advancements in Large Language Models (LLMs) and their increased accessibility have made it easier than ever for students to automatically generate texts, posing new challenges for educational institutions. To enforce norms of academic integrity and ensure students' learning, learning analytics methods that automatically detect LLM-generated text appear increasingly appealing. This paper benchmarks the performance of different state-of-the-art detectors in educational contexts, introducing a novel benchmark dataset, GEDE, that spans varying levels of student and LLM contribution.
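As a rough illustration of how such a benchmark can be used (not the authors' evaluation protocol), the sketch below scores a hypothetical detector on a GEDE-style dataset by computing AUROC separately for each contribution level, comparing LLM-involved essays against purely human-written ones. The record layout, level names, and toy scores are assumptions for the example.

```python
# Minimal sketch: per-contribution-level AUROC for a text detector on a
# GEDE-style dataset. Toy data and names are illustrative only.
from collections import defaultdict
from sklearn.metrics import roc_auc_score

# Toy records: (contribution_level, detector_score, is_llm_generated)
records = [
    ("human-written",   0.12, 0),
    ("human-written",   0.35, 0),
    ("llm-polished",    0.55, 1),
    ("llm-polished",    0.28, 1),
    ("fully-generated", 0.91, 1),
    ("fully-generated", 0.88, 1),
]

# Pool of purely human-written essays serves as the negative class.
human_scores = [s for lvl, s, y in records if y == 0]

# Group LLM-involved essays by their contribution level.
by_level = defaultdict(list)
for lvl, s, y in records:
    if y == 1:
        by_level[lvl].append(s)

# Score each contribution level against the human-written pool.
for lvl, scores in by_level.items():
    y_true = [0] * len(human_scores) + [1] * len(scores)
    y_score = human_scores + scores
    print(f"{lvl}: AUROC = {roc_auc_score(y_true, y_score):.2f}")
```

Reporting one AUROC per contribution level, rather than a single aggregate score, is what lets a benchmark like this show where detectors degrade as human contribution increases.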

Study Type

Benchmark (Published & Automated)

Framework Categories

Tool Types

Teacher Support Tools: tools that assist teachers with lesson planning, content generation, grading, and analytics.

Tags

benchmark, dataset, education, learning, computer-science