KazMMLU: Evaluating Language Models on Kazakh, Russian, and Regional Knowledge of Kazakhstan
KazMMLU is an MMLU-style benchmark dataset of 23,000 multiple-choice questions in Kazakh and Russian covering high school and university subjects (STEM, humanities, social sciences) with Kazakhstan-specific content. It is used to evaluate multilingual LLMs such as Llama 3.1, Qwen2.5, GPT-4o, and DeepSeek V3.
Despite Kazakhstan's population of twenty million, its culture and languages remain underrepresented in natural language processing. Although large language models (LLMs) continue to advance worldwide, progress for the Kazakh language has been limited, as reflected in the scarcity of dedicated models and benchmark evaluations. To address this gap, we introduce KazMMLU, the first MMLU-style dataset specifically designed for the Kazakh language. KazMMLU comprises 23,000 questions covering high school and university subjects across STEM, the humanities, and the social sciences.
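Evaluation on an MMLU-style benchmark such as KazMMLU usually reduces to asking a model which answer letter it assigns the highest probability after the question and its options. The sketch below illustrates that setup under stated assumptions: the dataset identifier, split name, and column names (question, options, answer) are placeholders for illustration and may differ from the official release, and the small Qwen model is used only to keep the example lightweight.

```python
# Minimal sketch of MMLU-style evaluation, assuming a Hugging Face-style release.
# Dataset id, split, and column names below are assumptions, not the official schema.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "Qwen/Qwen2.5-0.5B-Instruct"  # any causal LM works; small model for illustration
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)
model.eval()

ds = load_dataset("MBZUAI/KazMMLU", split="test")  # assumed identifier and split

LETTERS = ["A", "B", "C", "D", "E"]

def predict_letter(question: str, options: list[str]) -> str:
    """Return the option letter whose next-token probability is highest."""
    prompt = question + "\n" + "\n".join(
        f"{letter}. {opt}" for letter, opt in zip(LETTERS, options)
    ) + "\nAnswer:"
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]  # logits for the next token
    letter_ids = [
        tokenizer.encode(" " + letter, add_special_tokens=False)[0]
        for letter in LETTERS[: len(options)]
    ]
    best = int(torch.argmax(logits[letter_ids]))
    return LETTERS[best]

# Quick accuracy check on a small sample (column names are assumptions).
sample = ds.select(range(100))
correct = sum(
    predict_letter(row["question"], row["options"]) == row["answer"]
    for row in sample
)
print(f"accuracy on sample: {correct / len(sample):.2%}")
```

Scoring only the answer-letter tokens, rather than generating free-form text, keeps the comparison deterministic and is a common way to evaluate base and instruction-tuned models on multiple-choice benchmarks.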