Multilingual Performance of a Multimodal Artificial Intelligence System on Multisubject Physics Concept Inventories
This paper evaluates GPT-4o's multimodal and multilingual performance on physics concept inventories spanning multiple subjects and languages, and compares that performance with undergraduate student benchmarks. Using standardized physics assessments, the study assesses general reasoning, mastery of content knowledge, and multimodal and multilingual capabilities.
We investigate the multilingual and multimodal performance of a large language model-based artificial intelligence (AI) system, GPT-4o, using a diverse set of physics concept inventories spanning multiple languages and subject categories. The inventories, sourced from the PhysPort website, cover classical physics topics such as mechanics, electromagnetism, optics, and thermodynamics, as well as relativity, quantum mechanics, astronomy, mathematics, and laboratory skills. Unlike previous text-onl