Multilingual Performance of a Multimodal Artificial Intelligence System on Multisubject Physics Concept Inventories

Relevance: 7/10; 19 citations; 2025 paper

This paper evaluates GPT-4o's multimodal and multilingual performance on physics concept inventories spanning multiple subjects and languages, comparing AI performance to undergraduate student benchmarks. The study assesses general reasoning, content knowledge mastery, and multimodal/multilingual capabilities using standardized physics assessments.

We investigate the multilingual and multimodal performance of a large language model-based artificial intelligence (AI) system, GPT-4o, using a diverse set of physics concept inventories spanning multiple languages and subject categories. The inventories, sourced from the PhysPort website, cover classical physics topics such as mechanics, electromagnetism, optics, and thermodynamics, as well as relativity, quantum mechanics, astronomy, mathematics, and laboratory skills. Unlike previous text-onl


Framework Categories

1 General reasoning
3.1 Content knowledge
6.1 Multimodal capabilities
6.2 Multilingual capabilities

Tool Types

AI Tutors: 1-to-1 conversational tutoring systems.
