M-RewardBench: Evaluating Reward Models in Multilingual Settings

Relevance: 2/10 · Cited by 41 · 2024 paper

This paper presents M-RewardBench, a multilingual benchmark for evaluating reward models used in aligning large language models with human preferences across 23 languages, testing capabilities in chat, safety, reasoning, and translation tasks. The work focuses on technical evaluation of reward model performance across languages rather than K-12 educational applications.

Reward models (RMs) have driven the state-of-the-art performance of LLMs today by enabling the integration of human feedback into the language modeling process. However, RMs are primarily trained and evaluated in English, and their capabilities in multilingual settings remain largely understudied. In this work, we conduct a systematic evaluation of several reward models in multilingual settings. We first construct the first-of-its-kind multilingual RM evaluation benchmark, M-RewardBench, consisting of preference instances covering 23 languages and testing the chat, safety, reasoning, and translation capabilities of RMs.
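For readers unfamiliar with how such benchmarks score reward models, the sketch below shows the typical preference-accuracy setup: each instance pairs a prompt with a preferred and a dispreferred completion, and the RM is credited when it scores the preferred one higher. This is a minimal illustration, not the paper's code; the `PreferenceInstance` fields and the `score` callable are hypothetical stand-ins for a real dataset loader and reward model.

```python
# Minimal sketch (assumed, not the authors' implementation) of
# preference-accuracy evaluation for a scalar reward model.
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class PreferenceInstance:
    prompt: str
    chosen: str      # human-preferred completion
    rejected: str    # dispreferred completion
    language: str    # language code, relevant for a multilingual benchmark


def preference_accuracy(
    instances: Iterable[PreferenceInstance],
    score: Callable[[str, str], float],
) -> float:
    """Fraction of pairs where the RM scores the chosen completion higher."""
    data: List[PreferenceInstance] = list(instances)
    correct = sum(
        score(x.prompt, x.chosen) > score(x.prompt, x.rejected) for x in data
    )
    return correct / len(data)


if __name__ == "__main__":
    # Toy scorer that prefers longer completions, purely for illustration.
    toy_score = lambda prompt, completion: float(len(completion))
    data = [
        PreferenceInstance(
            prompt="Translate 'hello' to German.",
            chosen="Hallo!",
            rejected="Hi",
            language="deu_Latn",
        ),
    ]
    print(preference_accuracy(data, toy_score))  # 1.0 for this toy example
```

In practice the accuracy would be reported per language and per task category (chat, safety, reasoning, translation), which is how gaps between English and non-English performance become visible.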

Framework Categories

Tool Types

Tags

reasoning, evaluation, LLM, computer-science