CMMaTH: A Chinese Multi-modal Math Skill Evaluation Benchmark for Foundation Models
CMMaTH is a Chinese multimodal mathematics benchmark containing 23,000 K-12 math problems (elementary to high school) designed to evaluate foundation models' ability to solve visual-mathematical problems across different grade levels and knowledge points. The paper introduces GradeGPT, an open-source evaluation tool for assessing model performance on this dataset.
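Evaluating "across different grade levels and knowledge points" amounts to reporting accuracy per group rather than a single overall score. The sketch below illustrates that aggregation step. It is a minimal illustration under stated assumptions: the `Result` record, its field names, and the toy records are hypothetical and are not taken from CMMaTH or GradeGPT. It also assumes each answer has already been judged correct or incorrect, which is the job the paper assigns to GradeGPT.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Result:
    grade: str            # hypothetical field, e.g. "elementary", "high"
    knowledge_point: str  # hypothetical field, e.g. "plane geometry"
    correct: bool         # assumed: verdict already produced by an answer judge


def accuracy_by(results, key):
    """Group results by the attribute named `key` and return {group: accuracy}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for r in results:
        group = getattr(r, key)
        totals[group] += 1
        hits[group] += int(r.correct)
    return {g: hits[g] / totals[g] for g in totals}


# Toy records for illustration only; these are not benchmark results.
results = [
    Result("elementary", "arithmetic", True),
    Result("elementary", "plane geometry", False),
    Result("high", "plane geometry", True),
    Result("high", "functions", False),
]

print(accuracy_by(results, "grade"))            # {'elementary': 0.5, 'high': 0.5}
print(accuracy_by(results, "knowledge_point"))  # {'arithmetic': 1.0, 'plane geometry': 0.5, 'functions': 0.0}
```

The same function serves both breakdowns because the grouping key is passed by name. A per-group report can expose weaknesses, such as a particular knowledge point, that an overall accuracy figure would average away.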
With the rapid advancement of multimodal large language models, evaluating their mathematical capabilities in multimodal settings has attracted wide attention. Although datasets such as MathVista provide benchmarks for assessing mathematical capabilities in multimodal scenarios, there is still a lack of evaluation tools and datasets for fine-grained assessment in the context of Chinese-language K-12 education. To systematically evaluate the capability of multimodal large models in