E-EVAL: A Comprehensive Chinese K-12 Education Evaluation Benchmark for Large Language Models

Relevance: 10/10 | Cited by 7 | 2024 paper

E-EVAL is a comprehensive evaluation benchmark designed specifically for the Chinese K-12 education domain. It consists of 4,351 multiple-choice questions at the primary, middle, and high school levels, covering nine subjects (Chinese, English, Politics, History, Ethics, Physics, Chemistry, Mathematics, and Geography), and is used to assess how well LLMs perform in Chinese K-12 education.
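To make the benchmark's structure concrete, the sketch below models each item as a record tagged with a school level and a subject, and scores a model's answers as accuracy per (level, subject) group. The `Question` fields, the single-letter answer format, and the `accuracy_by_group` helper are hypothetical illustrations, not E-EVAL's released data schema or official scoring script.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Question:
    # Hypothetical record layout; E-EVAL's actual file format may differ.
    level: str    # e.g. "primary", "middle", "high"
    subject: str  # e.g. "Physics"
    answer: str   # gold choice letter, assumed to be an uppercase letter like "A"


def accuracy_by_group(questions, predictions):
    """Return {(level, subject): accuracy} for parallel lists of
    questions and predicted choice letters (hypothetical helper)."""
    if len(questions) != len(predictions):
        raise ValueError("questions and predictions must have the same length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for q, pred in zip(questions, predictions):
        key = (q.level, q.subject)
        total[key] += 1
        # Normalize whitespace and case before comparing choice letters.
        if pred.strip().upper() == q.answer:
            correct[key] += 1
    return {key: correct[key] / total[key] for key in total}


if __name__ == "__main__":
    qs = [
        Question("primary", "Mathematics", "B"),
        Question("primary", "Mathematics", "C"),
        Question("high", "Physics", "A"),
    ]
    preds = ["b", "D", "A"]
    print(accuracy_by_group(qs, preds))
    # {('primary', 'Mathematics'): 0.5, ('high', 'Physics'): 1.0}
```

Reporting accuracy per group rather than one pooled score is one way to see whether a model's errors cluster in particular subjects or school levels; how E-EVAL itself aggregates results is not described here.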

With the rapid development of Large Language Models (LLMs), many LLMs are beginning to be used in the Chinese K-12 education domain, and LLMs are becoming increasingly integrated into education. However, there is currently no benchmark for evaluating LLMs that focuses on the Chinese K-12 education domain. There is therefore an urgent need for a comprehensive natural language processing benchmark to accurately assess the capabilities of various LLMs in the Chinese K-12 education domain.

Tags

LLM evaluation, K-12 education, computer-science