MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models
MDK12-Bench is a comprehensive multimodal reasoning benchmark comprising 141K questions from K-12 examinations across six disciplines (math, physics, chemistry, biology, geography, information science), designed to evaluate MLLMs' high-order reasoning capabilities with fine-grained knowledge annotations and a dynamic evaluation framework to prevent data contamination.
Multimodal reasoning, which integrates language and visual cues for problem solving and decision making, is a fundamental aspect of human intelligence and a crucial step toward artificial general intelligence. However, the evaluation of multimodal reasoning capabilities in Multimodal Large Language Models (MLLMs) remains inadequate: most existing reasoning benchmarks are constrained by limited data size, narrow domain coverage, and unstructured knowledge distribution. To close these gaps, we introduce MDK12-Bench, a large-scale multi-disciplinary benchmark built from real-world K-12 examinations, comprising 141K questions across math, physics, chemistry, biology, geography, and information science, annotated with fine-grained knowledge labels and paired with a dynamic evaluation framework to prevent data contamination.