CodeApex: A Bilingual Programming Evaluation Benchmark for Large Language Models
Relevance: 2/10 · 12 citations · 2023 paper
CodeApex is a bilingual benchmark for evaluating large language models (LLMs) on programming tasks, covering comprehension, code generation, and code correction. It primarily uses C++ problems and compares a range of LLMs, including GPT-4.
With the emergence of Large Language Models (LLMs), the programming capabilities of models have improved significantly, attracting growing attention from researchers. Evaluating the programming capabilities of LLMs is crucial because it reflects their multifaceted abilities and has numerous downstream applications. In this paper, we propose CodeApex, a bilingual benchmark dataset focusing on the programming comprehension, code generation, and code correction abilities of LLMs.
Tags
commonsense reasoning test, computer-science