MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models
MDK12-Bench is a comprehensive multimodal reasoning benchmark comprising 141K questions from K-12 examinations across six disciplines (math, physics, chemistry, biology, geography, information science), designed to evaluate MLLMs' high-order reasoning capabilities with fine-grained knowledge annotations and a dynamic evaluation framework to prevent data contamination.
Multimodal reasoning, which integrates language and visual cues for problem solving and decision making, is a fundamental aspect of human intelligence and a crucial step toward artificial general intelligence. However, the evaluation of multimodal reasoning capabilities in Multimodal Large Language Models (MLLMs) remains inadequate: most existing reasoning benchmarks are constrained by limited data size, narrow domain coverage, and unstructured knowledge distribution. To close these gaps, we introduce MDK12-Bench, a large-scale multi-disciplinary benchmark built from real-world K-12 examinations, comprising 141K questions across math, physics, chemistry, biology, geography, and information science, annotated with fine-grained knowledge labels and paired with a dynamic evaluation framework to prevent data contamination.