Research Reports
State-of-the-art analyses and evidence summaries synthesised from high-relevance papers, organised by framework category and tool type.
How these were produced: We identified papers scoring ≥7/10 for relevance to K-12 AI education, extracted key sections (abstract, introduction, results, discussion, conclusions), then used Claude to synthesise findings into structured analyses. Each report reflects what the research covers — and what it doesn't.
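For readers who want to reproduce or adapt the pipeline, here is a minimal sketch of the process described above. Only the overall flow (filtering to papers scoring ≥7/10, extracting the key sections, asking Claude to synthesise) comes from this page; the paper schema, file path, prompt wording, and model ID are illustrative assumptions.

```python
# Sketch of the report-production pipeline. Field names, the input
# schema, the prompt, and the model ID are assumptions, not the
# project's actual code; only the high-level flow mirrors the text.
import json
import anthropic

KEY_SECTIONS = ["abstract", "introduction", "results", "discussion", "conclusions"]


def load_papers(path: str) -> list[dict]:
    """Load scored papers from a JSON file (hypothetical schema)."""
    with open(path) as f:
        return json.load(f)


def high_relevance(papers: list[dict], threshold: int = 7) -> list[dict]:
    """Keep papers scoring >= threshold (out of 10) for K-12 AI education relevance."""
    return [p for p in papers if p.get("relevance_score", 0) >= threshold]


def extract_key_sections(paper: dict) -> str:
    """Concatenate the key sections used as synthesis input."""
    return "\n\n".join(
        f"## {name}\n{paper['sections'][name]}"
        for name in KEY_SECTIONS
        if name in paper.get("sections", {})
    )


def synthesise_report(category: str, papers: list[dict]) -> str:
    """Ask Claude to synthesise extracted sections into a structured analysis."""
    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    corpus = "\n\n---\n\n".join(extract_key_sections(p) for p in papers)
    message = client.messages.create(
        model="claude-sonnet-4-20250514",  # model ID is illustrative
        max_tokens=4096,
        messages=[{
            "role": "user",
            "content": (
                f"Synthesise these {category} paper extracts into a structured "
                "state-of-the-art analysis. Note what the research covers and "
                f"what it does not.\n\n{corpus}"
            ),
        }],
    )
    return message.content[0].text
```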
Framework Categories
General reasoning
Benchmarks measuring general cognitive and reasoning abilities (logic, math, reading comprehension, problem-solving).
Pedagogical knowledge
Benchmarks measuring knowledge about teaching — instructional strategies, learning theories, curriculum design.
Pedagogy of generated outputs
Benchmarks evaluating the pedagogical quality of AI-generated explanations, hints, and instructional content.
Pedagogical interactions
Benchmarks evaluating interactive teaching behaviours — Socratic questioning, scaffolding, adaptive dialogue.
Content knowledge
Benchmarks measuring mastery of subject-matter content (STEM, humanities, etc.).
Content alignment
Benchmarks measuring alignment of content to curricula, standards, or learning objectives.
Scoring and grading
Benchmarks evaluating automated scoring, grading, and rubric application.
Feedback with reasoning
Benchmarks evaluating quality of feedback — explanations, reasoning traces, actionable suggestions.
Ethics and bias
Benchmarks measuring fairness, bias, safety, and ethical behaviour in educational contexts.
Multimodal capabilities
Benchmarks evaluating vision, audio, diagram understanding, and multimodal reasoning for education.
Multilingual capabilities
Benchmarks evaluating performance across languages and cross-lingual educational tasks.
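For organising reports programmatically, the eleven categories above can be represented as a simple mapping; the slugs and shortened descriptions below are hypothetical labels derived from this list, not identifiers used by the project.

```python
# Hypothetical category slugs for tagging and filtering reports.
# Slug names are assumptions; descriptions abbreviate the list above.
FRAMEWORK_CATEGORIES: dict[str, str] = {
    "general-reasoning": "General cognitive and reasoning abilities",
    "pedagogical-knowledge": "Knowledge about teaching and curriculum design",
    "pedagogy-of-generated-outputs": "Pedagogical quality of AI-generated content",
    "pedagogical-interactions": "Interactive teaching behaviours",
    "content-knowledge": "Mastery of subject-matter content",
    "content-alignment": "Alignment to curricula, standards, objectives",
    "scoring-and-grading": "Automated scoring, grading, rubric application",
    "feedback-with-reasoning": "Feedback quality and reasoning traces",
    "ethics-and-bias": "Fairness, bias, safety, ethical behaviour",
    "multimodal-capabilities": "Vision, audio, diagram understanding",
    "multilingual-capabilities": "Cross-lingual educational tasks",
}
```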