Investigating generative AI models and detection techniques: impacts of tokenization and dataset size on identification of AI-generated text

Relevance: 6/10 | 9 citations | 2024 paper

This paper investigates machine learning and large language model techniques for detecting AI-generated text in high school writing assessments, comparing various generative AI models (ChatGPT, Claude, Gemini) and their susceptibility to paraphrasing tools. The study uses essays from the ASAP Kaggle competition to train classifiers distinguishing student-written from AI-generated content.
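To make the classifier setup in the summary concrete, here is a minimal sketch of a tokenization-based detector. It is not the paper's pipeline: the Naive Bayes model, the word n-gram tokenizer, the names `tokenize` and `NaiveBayesDetector`, and the four toy training sentences are all assumptions chosen for illustration, and no ASAP essays are used.

```python
import math
import re
from collections import Counter


def tokenize(text, ngram=1):
    """Split text into lowercase word tokens, joined into n-grams of size `ngram`."""
    words = re.findall(r"[a-z']+", text.lower())
    return [" ".join(words[i:i + ngram]) for i in range(len(words) - ngram + 1)]


class NaiveBayesDetector:
    """Multinomial Naive Bayes with add-one smoothing (illustrative, hypothetical class)."""

    def fit(self, texts, labels, ngram=1):
        self.ngram = ngram
        self.counts = {}              # label -> Counter of token frequencies
        self.docs = Counter(labels)   # label -> number of training documents
        for text, label in zip(texts, labels):
            self.counts.setdefault(label, Counter()).update(tokenize(text, ngram))
        self.vocab = set().union(*self.counts.values())
        return self

    def predict(self, text):
        tokens = tokenize(text, self.ngram)
        total_docs = sum(self.docs.values())
        best_label, best_score = None, -math.inf
        for label, counter in self.counts.items():
            total = sum(counter.values())
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(self.docs[label] / total_docs)
            for tok in tokens:
                score += math.log((counter[tok] + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy data; a real experiment would use labeled essays.
train_texts = [
    "i think dogs are fun cuz they play",
    "my summer was fun and we swam alot",
    "in conclusion, it is important to note that dogs offer companionship",
    "furthermore, it is important to note that summer offers opportunities",
]
train_labels = ["student", "student", "ai", "ai"]

detector = NaiveBayesDetector().fit(train_texts, train_labels)
ai_guess = detector.predict("it is important to note that cats offer companionship")
student_guess = detector.predict("we swam and they play")
```

Changing the `ngram` argument or the number of training texts is one simple way to explore the two factors named in the title, tokenization and dataset size, on a small scale.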

Generative AI models, including ChatGPT, Gemini, and Claude, are increasingly significant in enhancing K–12 education, offering support across various disciplines. These models provide sample answers for humanities prompts, solve mathematical equations, and brainstorm novel ideas. Despite their educational value, ethical concerns have emerged regarding their potential to mislead students into copying answers directly from AI when completing assignments, assessments, or research papers. Current d

Framework Categories

5. Ethics and bias

Tool Types

Teacher Support Tools: tools that assist teachers with lesson planning, content generation, grading, and analytics.
