CMMaTH: A Chinese Multi-modal Math Skill Evaluation Benchmark for Foundation Models
CMMaTH is a Chinese multimodal mathematics benchmark containing 23,000 K-12 math problems (elementary to high school) with visual elements, designed to evaluate foundation models' mathematical reasoning capabilities. The benchmark includes an open-source automated evaluation tool (GradeGPT) and provides detailed knowledge point annotations across different grade levels.
With the rapid advancement of multimodal large language models, the evaluation of their multimodal mathematical capabilities continues to receive wide attention. Although datasets such as MathVista provide benchmarks for assessing mathematical capabilities in multimodal scenarios, evaluation tools and datasets for fine-grained assessment in the context of Chinese-language K-12 education are still lacking. To systematically evaluate the capability of multimodal large models in