Characterizing Bias: Benchmarking Large Language Models in Simplified versus Traditional Chinese

Relevance: 3/10 · 6 citations · 2025 paper

This paper benchmarks 11 large language models on their performance when prompted in Simplified versus Traditional Chinese, focusing on regional term choice and hiring name choice tasks to identify biases between the two Chinese variants. The work examines representational harms and differential LLM responses across Chinese character sets but does not focus on K-12 education applications.

While the capabilities of Large Language Models (LLMs) have been studied in both Simplified and Traditional Chinese, it remains unclear whether LLMs exhibit differential performance when prompted in these two variants of written Chinese. This understanding is critical, as disparities in the quality of LLM responses can perpetuate representational harms by ignoring the different cultural contexts underlying Simplified versus Traditional Chinese, and can exacerbate downstream harms in LLM-facilitated …

Tool Types

Tags

large language model, evaluation, education, computer-science