Evaluating Large Language Model with Knowledge Oriented Language Specific Simple Question Answering
This paper introduces KoLasSimpleQA, a multilingual benchmark for evaluating factual knowledge and hallucination in Large Language Models across 9 languages, covering both general-domain knowledge and language-specific knowledge (history, culture, regional traditions). The benchmark uses simple fact-based questions, each targeting a single knowledge point with an objective, unique, and temporally stable answer, to assess LLMs' factual memory and self-awareness.
We introduce KoLasSimpleQA, the first benchmark for evaluating the multilingual factual ability of Large Language Models (LLMs). Inspired by existing research, we created the question set with features such as single-knowledge-point coverage, absolute objectivity, unique answers, and temporal stability. These questions enable efficient evaluation using the LLM-as-judge paradigm, testing both the LLMs' factual memory and self-awareness ("know what they don't know"). KoLasSimpleQA expands existing research along two dimensions: breadth, covering 9 languages, and depth, covering both the general domain and language-specific domains such as history, culture, and regional traditions.
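To make the evaluation protocol concrete, here is a minimal sketch of LLM-as-judge grading over simple factual questions. The record fields, the judge prompt wording, and the `call_llm` stub are illustrative assumptions, not the paper's released implementation; the CORRECT / INCORRECT / NOT_ATTEMPTED labels follow the grading scheme common to SimpleQA-style benchmarks, which lets factual accuracy and abstention ("know what they don't know") be scored separately.

```python
# Sketch of LLM-as-judge grading for single-knowledge-point factual QA.
# Assumptions (not from the paper): record fields, judge prompt text, and
# the call_llm placeholder are hypothetical; only the overall pattern
# (judge model assigns one of three labels) is what the abstract describes.
from dataclasses import dataclass

@dataclass
class QARecord:
    question: str      # simple question covering a single knowledge point
    gold_answer: str   # unique, objective, temporally stable answer
    language: str      # one of the 9 benchmark languages

JUDGE_PROMPT = """You are grading a factual QA response.
Question: {question}
Reference answer: {gold}
Model response: {response}
Reply with exactly one label: CORRECT, INCORRECT, or NOT_ATTEMPTED."""

def call_llm(prompt: str) -> str:
    """Placeholder for any chat-completion call to the judge model."""
    raise NotImplementedError

def grade(record: QARecord, response: str) -> str:
    """Ask the judge model to label one response; default to INCORRECT
    if the judge returns anything outside the three allowed labels."""
    label = call_llm(JUDGE_PROMPT.format(
        question=record.question,
        gold=record.gold_answer,
        response=response,
    )).strip().upper()
    return label if label in {"CORRECT", "INCORRECT", "NOT_ATTEMPTED"} else "INCORRECT"

def score(records: list[QARecord], responses: list[str]) -> dict[str, float]:
    """Overall accuracy plus accuracy among attempted answers: the gap
    between the two reflects how well the model abstains when unsure."""
    labels = [grade(rec, resp) for rec, resp in zip(records, responses)]
    attempted = [lab for lab in labels if lab != "NOT_ATTEMPTED"]
    return {
        "accuracy": labels.count("CORRECT") / len(labels),
        "accuracy_given_attempted": (
            attempted.count("CORRECT") / len(attempted) if attempted else 0.0
        ),
    }
```

Separating `accuracy` from `accuracy_given_attempted` is one common way to operationalize the memory-versus-self-awareness distinction: a model that declines to answer questions it would get wrong scores higher on the attempted-only metric.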