Multilingual Performance of a Multimodal Artificial Intelligence System on Multisubject Physics Concept Inventories
This paper evaluates GPT-4o's multimodal and multilingual performance on physics concept inventories (standardized assessments of conceptual understanding) across multiple subjects and languages, and compares the AI's results with undergraduate student performance. Existing concept inventories are uploaded to the system as images, which tests its ability to interpret visual information and answer physics questions in various languages.
We investigate the multilingual and multimodal performance of a large language model-based artificial intelligence (AI) system, GPT-4o, using a diverse set of physics concept inventories spanning multiple languages and subject categories. The inventories, sourced from the PhysPort website, cover classical physics topics such as mechanics, electromagnetism, optics, and thermodynamics, as well as relativity, quantum mechanics, astronomy, mathematics, and laboratory skills. Unlike previous text-onl