Setting Standards in Turkish NLP: TR-MMLU for Large Language Model Evaluation

Relevance: 6/10 · cited 7 times · 2024 paper

TR-MMLU is a Turkish-language benchmark comprising 6,200 multiple-choice questions across 62 sections drawn from Turkey's education system (including university entrance exams and open education faculty exams), designed to evaluate large language models' knowledge and linguistic capabilities in Turkish. The benchmark tests factual recall, conceptual understanding, logical reasoning, and cultural context across diverse academic disciplines.
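
To make the evaluation setup concrete, below is a minimal sketch of how accuracy might be computed over TR-MMLU-style multiple-choice items. The item schema (question / choices / answer letter) and the `ask_model` stub are illustrative assumptions, not the benchmark's official loader or harness.

```python
# Minimal sketch of multiple-choice accuracy scoring on TR-MMLU-style items.
# Field names and the ask_model stub are assumptions for illustration only.

from typing import Callable

items = [
    {
        "question": "Türkiye Cumhuriyeti hangi yılda kurulmuştur?",
        "choices": ["1920", "1921", "1923", "1938"],
        "answer": "C",  # letter of the correct choice
    },
    # ... the real benchmark spans 6,200 questions across 62 sections
]

def ask_model(question: str, choices: list[str]) -> str:
    """Stand-in for an LLM call; should return one of 'A', 'B', 'C', 'D'."""
    return "A"  # placeholder prediction

def accuracy(items: list[dict], predict: Callable[[str, list[str]], str]) -> float:
    correct = 0
    for item in items:
        pred = predict(item["question"], item["choices"]).strip().upper()
        if pred == item["answer"]:
            correct += 1
    return correct / len(items)

if __name__ == "__main__":
    print(f"Accuracy: {accuracy(items, ask_model):.2%}")
```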

Language models have made remarkable advancements in understanding and generating human language, achieving notable success across a wide array of applications. However, evaluating these models remains a significant challenge, particularly for resource-limited languages such as Turkish. To address this gap, we introduce the Turkish MMLU (TR-MMLU) benchmark, a comprehensive evaluation framework designed to assess the linguistic and conceptual capabilities of large language models (LLMs) in Turkish.

Tags

large language model evaluation, education, computer-science