L3Cube-IndicQuest: A Benchmark Question Answering Dataset for Evaluating Knowledge of LLMs in Indic Context

Relevance: 3/10 · cited 8 times · 2024 paper

This paper presents L3Cube-IndicQuest, a benchmark question-answering dataset for evaluating how well multilingual LLMs represent knowledge across 19 Indic languages and English, covering five India-specific domains. The dataset assesses factual knowledge representation using reference-based metrics and LLM-as-a-judge evaluation.

Large Language Models (LLMs) have made significant progress in incorporating Indic languages within multilingual models. However, it is crucial to quantitatively assess whether these languages perform comparably to globally dominant ones, such as English. Currently, there is a lack of benchmark datasets specifically designed to evaluate the regional knowledge of LLMs in various Indic languages. In this paper, the authors present L3Cube-IndicQuest, a gold-standard factual question-answering benchmark designed to evaluate the regional knowledge of LLMs across 19 Indic languages and English.

Framework Categories

Tool Types

Tags

LLM-as-judge evaluation, computer-science