Evaluating Large Language Model with Knowledge Oriented Language Specific Simple Question Answering
This paper introduces KoLasSimpleQA, a multilingual benchmark for evaluating factual knowledge and hallucination in Large Language Models across 9 languages, covering both general-domain knowledge and language-specific knowledge (history, culture, regional traditions). The benchmark uses simple fact-based questions, each targeting a single knowledge point with an objective, unique, and temporally stable answer, to assess LLMs' factual memory and self-awareness.
We introduce KoLasSimpleQA, the first benchmark for evaluating the multilingual factual ability of Large Language Models (LLMs). Inspired by existing research, we created the question set with features such as single-knowledge-point coverage, absolute objectivity, unique answers, and temporal stability. These questions enable efficient evaluation using the LLM-as-judge paradigm, testing both the LLMs' factual memory and self-awareness ("know what they don't know"). KoLasSimpleQA expands existing research along two dimensions: breadth, covering 9 languages, and depth, covering both the general domain and language-specific domains such as history, culture, and regional traditions.
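To make the evaluation protocol concrete, here is a minimal sketch of LLM-as-judge grading over simple factual questions. The record fields, the judge prompt wording, and the `call_llm` stub are illustrative assumptions, not the paper's released implementation; the CORRECT / INCORRECT / NOT_ATTEMPTED labels follow the grading scheme common to SimpleQA-style benchmarks, which lets factual accuracy and abstention ("know what they don't know") be scored separately.

```python
# Sketch of LLM-as-judge grading for single-knowledge-point factual QA.
# Assumptions (not from the paper): record fields, judge prompt text, and
# the call_llm placeholder are hypothetical; only the overall pattern
# (judge model assigns one of three labels) is what the abstract describes.
from dataclasses import dataclass

@dataclass
class QARecord:
    question: str      # simple question covering a single knowledge point
    gold_answer: str   # unique, objective, temporally stable answer
    language: str      # one of the 9 benchmark languages

JUDGE_PROMPT = """You are grading a factual QA response.
Question: {question}
Reference answer: {gold}
Model response: {response}
Reply with exactly one label: CORRECT, INCORRECT, or NOT_ATTEMPTED."""

def call_llm(prompt: str) -> str:
    """Placeholder for any chat-completion call to the judge model."""
    raise NotImplementedError

def grade(record: QARecord, response: str) -> str:
    """Ask the judge model to label one response; default to INCORRECT
    if the judge returns anything outside the three allowed labels."""
    label = call_llm(JUDGE_PROMPT.format(
        question=record.question,
        gold=record.gold_answer,
        response=response,
    )).strip().upper()
    return label if label in {"CORRECT", "INCORRECT", "NOT_ATTEMPTED"} else "INCORRECT"

def score(records: list[QARecord], responses: list[str]) -> dict[str, float]:
    """Overall accuracy plus accuracy among attempted answers: the gap
    between the two reflects how well the model abstains when unsure."""
    labels = [grade(rec, resp) for rec, resp in zip(records, responses)]
    attempted = [lab for lab in labels if lab != "NOT_ATTEMPTED"]
    return {
        "accuracy": labels.count("CORRECT") / len(labels),
        "accuracy_given_attempted": (
            attempted.count("CORRECT") / len(attempted) if attempted else 0.0
        ),
    }
```

Separating `accuracy` from `accuracy_given_attempted` is one common way to operationalize the memory-versus-self-awareness distinction: a model that declines to answer questions it would get wrong scores higher on the attempted-only metric.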