Teacher Support Tools

Tools that assist teachers with lesson planning, content generation, grading, and analytics.

Research Summary

Teacher support tools represent one of the most active areas of artificial intelligence research in K-12 education. This analysis reviewed 238 papers spanning automated grading, feedback generation, lesson planning, question creation, and classroom analytics. The field has made substantial technical progress: automated essay scoring (AES) systems now achieve quadratic weighted kappa (QWK) scores of 0.70–0.95 against human raters, and large language models (LLMs) can generate curriculum-aligned lesson plans, reading comprehension questions, and assessment items with increasing sophistication. Systems have been tested across multiple languages (including English, Arabic, Chinese, Spanish, Indonesian, and Basque) and across subjects ranging from language arts and science to computer programming and visual art.
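QWK, the agreement metric behind the 0.70–0.95 figures, compares two raters' integer scores and penalizes each disagreement by the squared distance between the scores: κ = 1 − Σ w_ij O_ij / Σ w_ij E_ij, where O is the observed matrix of score pairs, E is the matrix expected by chance from each rater's score distribution, and w_ij = (i − j)² / (N − 1)² for N score categories. The following is a minimal, illustrative Python sketch, assuming integer scores on a known range [min_score, max_score]; the function name quadratic_weighted_kappa and the example scores are hypothetical and not taken from any of the reviewed papers.

```python
def quadratic_weighted_kappa(rater_a, rater_b, min_score, max_score):
    """Illustrative QWK between two raters' integer scores (hypothetical helper).

    Assumes every score is an integer in [min_score, max_score].
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("score lists must be non-empty and of equal length")
    if max_score <= min_score:
        raise ValueError("need at least two score categories")
    n_cats = max_score - min_score + 1
    n_items = len(rater_a)

    # Observed matrix: observed[i][j] counts items scored i by A and j by B.
    observed = [[0] * n_cats for _ in range(n_cats)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_score][b - min_score] += 1

    # Each rater's score histogram (row and column marginals).
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(n_cats)) for j in range(n_cats)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_cats):
        for j in range(n_cats):
            # Quadratic weight: larger score gaps are penalized more heavily.
            weight = (i - j) ** 2 / (n_cats - 1) ** 2
            # Chance-expected count if the two raters scored independently.
            expected = hist_a[i] * hist_b[j] / n_items
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected

    if weighted_expected == 0:
        raise ValueError("QWK is undefined when both raters use one identical score")
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical example: scores on a 1-4 rubric from a human rater and an AES system.
human = [1, 2, 3, 4, 4, 2]
model = [1, 2, 3, 3, 4, 2]
print(round(quadratic_weighted_kappa(human, model, 1, 4), 3))
```

A QWK of 1.0 indicates perfect agreement, 0 indicates chance-level agreement, and negative values indicate systematic disagreement. In practice, scikit-learn's `cohen_kappa_score(y1, y2, weights="quadratic")` computes the same statistic; the hand-written version only makes the weighting explicit.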

However, a fundamental tension runs through this literature. The overwhelming majority of papers measure technical performance (agreement with human scores, accuracy, precision, F1) rather than educational impact. Very few studies examine whether these tools actually reduce teacher workload in authentic classrooms, whether AI-generated feedback improves student learning, or how automated systems reshape instructional practice over time. Approximately 60% of papers focus on automated essay and short-answer scoring, yet the evaluation paradigm remains narrowly focused on matching human rater judgements rather than on determining whether those judgements, or the AI's replication of them, genuinely serve learning. This gap between technical sophistication and pedagogical validation is the most critical challenge facing the field.

The implications for low- and middle-income countries (LMICs) are significant. Teacher workload reduction and scalable assessment are pressing needs in contexts where class sizes are large and trained assessors are scarce. Yet nearly all benchmark datasets and evaluation frameworks originate in high-income, English-dominant settings. Building teacher support tools that are equitable, multilingual, and pedagogically grounded, rather than simply accurate, requires a fundamental shift in how the field defines success.

1,701 benchmarks across 11 categories
Pedagogical knowledge: 549 benchmarks
Content knowledge: 233 benchmarks
Content alignment: 326 benchmarks
Scoring and grading: 500 benchmarks
Feedback with reasoning: 432 benchmarks
Ethics and bias: 307 benchmarks