MinorBench: A hand-built benchmark for content-based risks for children
MinorBench is a hand-built benchmark designed to evaluate how well LLMs refuse unsafe or age-inappropriate queries from children, based on a real-world case study of middle-school students using an LLM chatbot and a novel taxonomy of content-based risks specific to minors.
Large Language Models (LLMs) are rapidly entering children's lives, through parent-driven adoption, schools, and peer networks, yet current AI ethics and safety research does not adequately address content-related risks specific to minors. In this paper, we highlight these gaps with a real-world case study of an LLM-based chatbot deployed in a middle school setting, revealing how students used and sometimes misused the system. Building on these findings, we propose a novel taxonomy of content-based risks specific to minors and introduce MinorBench, a hand-built benchmark for evaluating how well LLMs refuse unsafe or age-inappropriate queries from children.
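
In practice, an evaluation run over a benchmark like this reduces to sending each query to the model under test and scoring whether the response is a refusal. The Python sketch below illustrates that loop under stated assumptions: the file minorbench_prompts.csv and its "prompt" column, the system prompt, the model ID, and the keyword-based refusal check are all illustrative placeholders, not the paper's actual protocol or data format.

    # Minimal sketch of a refusal-rate evaluation over a CSV of benchmark prompts.
    # File name, column name, model ID, and the refusal heuristic are assumptions.
    import csv

    from openai import OpenAI  # any chat-completion client would work here

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # Crude keyword stand-in for a proper refusal classifier.
    REFUSAL_MARKERS = ("i can't", "i cannot", "i'm sorry", "i am sorry")

    def is_refusal(text: str) -> bool:
        """Label a response as a refusal via simple keyword matching."""
        lowered = text.lower()
        return any(marker in lowered for marker in REFUSAL_MARKERS)

    refusals = total = 0
    with open("minorbench_prompts.csv", newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):  # assumes a "prompt" column
            response = client.chat.completions.create(
                model="gpt-4o-mini",  # swap in whichever model is under test
                messages=[
                    {"role": "system", "content": "You are a helpful tutor for middle-school students."},
                    {"role": "user", "content": row["prompt"]},
                ],
            )
            total += 1
            refusals += is_refusal(response.choices[0].message.content)

    print(f"Refusal rate: {refusals / total:.1%} ({refusals}/{total})")

A real evaluation would replace the keyword heuristic with a more reliable refusal judge (human annotation or an LLM-based classifier) and vary the system prompt, since refusal behavior can differ substantially across deployment configurations.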