BanglaMATH: A Bangla benchmark dataset for testing LLM mathematical reasoning at grades 6, 7, and 8

Relevance: 7/10, cited 3 times, 2025 paper

BanglaMATH is a benchmark dataset of 1,700 Bangla math word problems from grades 6-8, designed to evaluate LLMs' mathematical reasoning capabilities in a low-resource language. The paper assesses commercial and open-source LLMs on elementary-level math problems, examining performance across grade levels, reasoning complexity, and robustness to distracting information.

Large Language Models (LLMs) have tremendous potential to play a key role in supporting mathematical reasoning, with growing use in education and AI research. However, most existing benchmarks are limited to English, creating a significant gap for low-resource languages. For example, Bangla is spoken by nearly 250 million people who would collectively benefit from LLMs capable of native fluency. To address this, we present BanglaMATH, a dataset of 1.7k Bangla math word problems across topics suc

Tags

elementary education, benchmark, computer-science