Education Benchmarks and Evals Mapping

We searched Semantic Scholar for benchmarks and evals relevant to AI in education and mapped them across 11 quality components. We used LLMs to classify 6,529 papers, which are shown below.

Tool Types

Concerns

Cross-cutting risk themes identified across the research: what could go wrong when AI is used in education, and what we know about it.

All Benchmarks
