SproutBench: A Benchmark for Safe and Ethical Large Language Models for Youth
SproutBench is a safety evaluation benchmark for LLMs in youth-facing applications, comprising 1,283 adversarial prompts designed to probe age-appropriate behavior across early childhood (ages 0--6), middle childhood (ages 7--12), and adolescence (ages 13--18). It evaluates 47 LLMs on dimensions including safety, risk prevention, interactivity, and age appropriateness, focusing on ethical and developmental considerations rather than learning outcomes.
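The evaluation structure described above — prompts stratified by age band, scored per dimension, and aggregated per model — can be sketched as a minimal harness. The schema, field names, and scoring function below are hypothetical illustrations, not SproutBench's actual format; a real harness would use rubric-based or LLM-as-judge scoring in place of the placeholder.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical labels; the benchmark's actual identifiers may differ.
AGE_BANDS = ["early_childhood_0_6", "middle_childhood_7_12", "adolescence_13_18"]
DIMENSIONS = ["safety", "risk_prevention", "interactivity", "age_appropriateness"]

@dataclass
class Prompt:
    text: str
    age_band: str  # one of AGE_BANDS

def score_response(response: str, dimension: str) -> float:
    """Placeholder judge: returns 1.0 for any non-empty response.
    A real implementation would apply a per-dimension rubric."""
    return 1.0 if response else 0.0

def evaluate(model_fn, prompts):
    """Run one model over all prompts and aggregate mean scores
    per age band and per dimension."""
    results = {band: {dim: [] for dim in DIMENSIONS} for band in AGE_BANDS}
    for p in prompts:
        response = model_fn(p.text)
        for dim in DIMENSIONS:
            results[p.age_band][dim].append(score_response(response, dim))
    return {
        band: {dim: (mean(vals) if vals else 0.0) for dim, vals in dims.items()}
        for band, dims in results.items()
    }
```

Repeating `evaluate` over each of the 47 models would yield a per-model score matrix (age band x dimension), which is the shape of comparison the benchmark reports.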
The rapid proliferation of large language models (LLMs) in applications targeting children and adolescents necessitates a fundamental reassessment of prevailing AI safety frameworks, which are largely tailored to adult users and neglect the distinct developmental vulnerabilities of minors. This paper highlights key deficiencies in existing LLM safety benchmarks, including their inadequate coverage of age-specific cognitive, emotional, and social risks spanning early childhood (ages 0--6), middle