Large Language Models Only Pass Primary School Exams in Indonesia: A Comprehensive Test on IndoMMLU

Relevance: 7/10 | Cited by: 59 | Year: 2023

IndoMMLU is a multi-task language understanding benchmark for Indonesian. It evaluates LLMs on 14,981 questions drawn from primary school through university entrance exams, spanning 64 tasks that cover STEM, humanities, social science, Indonesian language proficiency, and nine local languages and cultures. The benchmark tests the general reasoning, content knowledge, and multilingual capabilities of LLMs such as GPT-3.5, BLOOMZ, and Falcon at each Indonesian education level.

Although large language models (LLMs) are often pre-trained on large-scale multilingual texts, their reasoning abilities and real-world knowledge are mainly evaluated on English datasets. Assessing LLM capabilities beyond English is increasingly vital but is hindered by the lack of suitable datasets. In this work, we introduce IndoMMLU, the first multi-task language understanding benchmark for Indonesian culture and languages, which consists of questions from primary school to university

Tags

large language model evaluation education computer-science