Are Multilingual LLMs Culturally-Diverse Reasoners? An Investigation into Multicultural Proverbs and Sayings

Relevance: 3/10 · 30 citations · 2023 paper

This paper introduces the MAPS benchmark dataset and uses it to evaluate multilingual large language models' ability to understand and reason with proverbs and sayings from six languages in conversational contexts. The work measures cultural reasoning gaps across languages and whether models can correctly interpret figurative language in context.

Large language models (LLMs) are highly adept at question answering and reasoning tasks, but when reasoning in a situational context, human expectations vary depending on the relevant cultural common ground. As languages are associated with diverse cultures, LLMs should also be culturally diverse reasoners. In this paper, we study the ability of a wide range of state-of-the-art multilingual LLMs (mLLMs) to reason with proverbs and sayings in a conversational context.

Tags

reasoning · evaluation · LLM · computer-science