MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models
MDK12-Bench is a large-scale multimodal reasoning benchmark comprising 141K questions across six K-12 disciplines (math, physics, chemistry, biology, geography, and information science) and spanning grades 1-12. It is designed to evaluate the reasoning capabilities of MLLMs using real-world K-12 examination questions with fine-grained knowledge annotations and difficulty labels.
Multimodal reasoning, which integrates language and visual cues into problem solving and decision making, is a fundamental aspect of human intelligence and a crucial step toward artificial general intelligence. However, the evaluation of multimodal reasoning capabilities in Multimodal Large Language Models (MLLMs) remains inadequate. Most existing reasoning benchmarks are constrained by limited data size, narrow domain coverage, and unstructured knowledge distribution. To close these gaps, we in