The Promises and Pitfalls of Using Language Models to Measure Instruction Quality in Education
This paper evaluates whether pre-trained language models can automatically assess high-inference teaching practices in K-12 math classrooms and in simulated teacher-training settings. It compares NLP-based measurements against traditional human expert ratings across multiple dimensions of instruction quality, including practices for students with special needs.
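Comparing automated measurements with expert ratings usually comes down to an agreement statistic computed over the same set of observations. The sketch below assumes Pearson correlation as that statistic. The paper may use a different or additional metric. The function name `pearson_r` and the idea of paired per-observation scores are illustrative assumptions, not the authors' code.

```python
import math
from typing import Sequence


def pearson_r(model_scores: Sequence[float], human_scores: Sequence[float]) -> float:
    """Pearson correlation between model-predicted scores and human ratings.

    Hypothetical helper: each position i is assumed to hold the model's score
    and the expert's rating for the same observation (e.g. one classroom segment).
    """
    if len(model_scores) != len(human_scores) or len(model_scores) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    n = len(model_scores)
    mean_m = sum(model_scores) / n
    mean_h = sum(human_scores) / n
    # Covariance and variances (unnormalized; the 1/n factors cancel in the ratio).
    cov = sum((m - mean_m) * (h - mean_h) for m, h in zip(model_scores, human_scores))
    var_m = sum((m - mean_m) ** 2 for m in model_scores)
    var_h = sum((h - mean_h) ** 2 for h in human_scores)
    if var_m == 0 or var_h == 0:
        raise ValueError("correlation is undefined when either score list is constant")
    return cov / math.sqrt(var_m * var_h)


# Illustrative, made-up values: model outputs vs. expert ratings on four segments.
r = pearson_r([0.2, 0.5, 0.9, 0.4], [1, 3, 4, 2])
print(f"model-human correlation: {r:.3f}")
```

A rank-based statistic such as Spearman's rho would be a reasonable alternative when expert ratings are ordinal rubric levels rather than interval-scaled scores.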
Assessing instruction quality is a fundamental component of any effort to improve the education system. However, traditional manual assessments are expensive, subjective, and heavily dependent on observers’ expertise and idiosyncratic factors, which prevents teachers from receiving timely and frequent feedback. Unlike prior research, which mostly focuses on individual low-inference instructional practices, this paper presents the first study that leverages Natural Language Processi