How Does Quantization Affect Multilingual LLMs?
This paper analyzes how quantization (a compression technique for LLMs) affects performance across multiple languages, finding that automatic metrics underestimate degradation, that non-Latin-script languages are harmed more, and that challenging tasks like mathematical reasoning degrade fastest.
Quantization techniques are widely used to speed up inference and ease deployment of large language models. While a wide body of work examines the impact of quantization on LLMs in English, none has evaluated it across languages. We conduct a thorough analysis of quantized multilingual LLMs, focusing on performance across languages and at varying scales. We use automatic benchmarks, LLM-as-a-Judge, and human evaluation, finding that (1) harmful effects of quantization are apparent in human evaluation, which automatic metrics underestimate; (2) non-Latin-script languages are harmed more than others; and (3) challenging tasks such as mathematical reasoning degrade fastest.
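To make the compression step concrete, here is a minimal sketch of symmetric per-tensor int8 weight quantization, one common scheme (the specific quantization recipes evaluated in the paper may differ); the function names and tensor values are illustrative, not taken from the study:

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor quantization: map floats onto the int8 range [-127, 127]."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from the int8 codes."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.2, 0.03, 0.9], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# Each weight is off by at most half a quantization step (scale / 2);
# this small per-weight error is what accumulates into the task-level
# degradation the paper measures across languages.
assert np.max(np.abs(w - w_hat)) <= scale / 2 + 1e-6
```

The storage saving comes from holding weights as 1-byte int8 codes plus a single float scale instead of 4-byte floats; the rounding error this introduces is the mechanism behind the performance drops studied above.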