Research Reports

State-of-the-art analyses and evidence summaries synthesised from high-relevance papers, organised by framework category and tool type.

How these were produced: We identified papers scoring ≥7/10 for relevance to K-12 AI education, extracted key sections (abstract, introduction, results, discussion, conclusions), then used Claude to synthesise findings into structured analyses. Each report reflects what the research covers — and what it doesn't.
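The selection and extraction steps described above could be sketched roughly as follows. This is a minimal illustration, not the actual pipeline: the names `Paper`, `RELEVANCE_THRESHOLD`, `KEY_SECTIONS`, and `select_for_synthesis` are hypothetical, the data layout is assumed, and the synthesis step itself is left out.

```python
from dataclasses import dataclass, field

# All names below are illustrative assumptions, not the real pipeline.
RELEVANCE_THRESHOLD = 7  # keep papers scoring >= 7 out of 10
KEY_SECTIONS = ("abstract", "introduction", "results", "discussion", "conclusions")


@dataclass
class Paper:
    title: str
    relevance: int  # relevance to K-12 AI education, on a 0-10 scale
    sections: dict = field(default_factory=dict)  # section name -> text


def select_for_synthesis(papers):
    """Keep high-relevance papers and only their key sections."""
    selected = []
    for paper in papers:
        if paper.relevance < RELEVANCE_THRESHOLD:
            continue  # below the relevance cut-off
        # Keep only the key sections that are present in this paper.
        extracted = {
            name: paper.sections[name]
            for name in KEY_SECTIONS
            if name in paper.sections
        }
        selected.append({"title": paper.title, "sections": extracted})
    return selected
```

The returned list is what would then be handed to the synthesis step; a paper with none of the key sections still passes through with an empty `sections` mapping, which makes gaps in coverage visible rather than hiding them.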

Framework Categories (11)
1 General reasoning

General reasoning

Benchmarks measuring general cognitive and reasoning abilities (logic, math, reading comprehension, problem-solving).

2.1 Pedagogy

Pedagogical knowledge

Benchmarks measuring knowledge about teaching — instructional strategies, learning theories, curriculum design.

2.2 Pedagogy

Pedagogy of generated outputs

Benchmarks evaluating the pedagogical quality of AI-generated explanations, hints, and instructional content.

2.3 Pedagogy

Pedagogical interactions

Benchmarks evaluating interactive teaching behaviours — Socratic questioning, scaffolding, adaptive dialogue.

3.1 Educational content

Content knowledge

Benchmarks measuring mastery of subject-matter content (STEM, humanities, etc.).

3.2 Educational content

Content alignment

Benchmarks measuring alignment of content to curricula, standards, or learning objectives.

4.1 Assessment

Scoring and grading

Benchmarks evaluating automated scoring, grading, and rubric application.

4.2 Assessment

Feedback with reasoning

Benchmarks evaluating quality of feedback — explanations, reasoning traces, actionable suggestions.

5 Ethics and bias

Ethics and bias

Benchmarks measuring fairness, bias, safety, and ethical behaviour in educational contexts.

6.1 Digitisation / accessibility

Multimodal capabilities

Benchmarks evaluating vision, audio, diagram understanding, and multimodal reasoning for education.

6.2 Digitisation / accessibility

Multilingual capabilities

Benchmarks evaluating performance across languages and cross-lingual educational tasks.

Tool Type Evidence Summaries (3)