CMMaTH: A Chinese Multi-modal Math Skill Evaluation Benchmark for Foundation Models

Relevance: 7/10 | 20 citations | 2024 paper

CMMaTH is a Chinese multimodal mathematics benchmark containing 23,000 K-12 math problems (elementary to high school) designed to evaluate foundation models' ability to solve visual-mathematical problems across different grade levels and knowledge points. The paper introduces GradeGPT, an open-source evaluation tool for assessing model performance on this dataset.
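Fine-grained evaluation across grade levels and knowledge points ultimately comes down to grouping per-problem correctness and computing accuracy per group. The sketch below illustrates that aggregation step only. It assumes a hypothetical record layout (`GradedItem` with `grade`, `knowledge_point`, `correct`) and a hypothetical helper `accuracy_by`. Neither is taken from the CMMaTH paper or from GradeGPT, and the toy data is illustrative, not reported results.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class GradedItem:
    # Hypothetical record layout: one graded model answer per problem.
    grade: str            # e.g. "Grade 7" (illustrative label)
    knowledge_point: str  # e.g. "linear functions" (illustrative label)
    correct: bool         # outcome of whatever grading step is used


def accuracy_by(items, key):
    """Return {group: accuracy}, where each group is defined by key(item)."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for item in items:
        group = key(item)
        totals[group] += 1
        hits[group] += int(item.correct)
    return {group: hits[group] / totals[group] for group in totals}


if __name__ == "__main__":
    # Toy data for illustration only; not results from the CMMaTH paper.
    items = [
        GradedItem("Grade 3", "fractions", True),
        GradedItem("Grade 3", "fractions", False),
        GradedItem("Grade 8", "geometry", True),
        GradedItem("Grade 8", "linear functions", True),
    ]
    print(accuracy_by(items, lambda it: it.grade))            # per-grade accuracy
    print(accuracy_by(items, lambda it: it.knowledge_point))  # per-knowledge-point accuracy
```

Passing a different `key` function lets the same helper produce any breakdown (by grade, by knowledge point, or by a tuple of both) without changing the aggregation logic.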

With the rapid advancement of multimodal large language models, evaluating their multimodal mathematical capabilities continues to receive wide attention. Although datasets such as MathVista have been proposed to benchmark mathematical capabilities in multimodal scenarios, there is still a lack of evaluation tools and datasets for fine-grained assessment in the context of K-12 education in the Chinese language. To systematically evaluate the capability of multimodal large models in …

Framework Categories

1 General reasoning
3.1 Content knowledge
3.2 Content alignment
6.1 Multimodal capabilities