CodeApex: A Bilingual Programming Evaluation Benchmark for Large Language Models
Relevance: 2/10 · 12 citations · 2023 paper
CodeApex is a bilingual benchmark for evaluating large language models (LLMs) on programming tasks, covering comprehension, code generation, and code correction. It primarily uses C++ problems and compares a range of LLMs, including GPT-4.
With the emergence of Large Language Models (LLMs), the programming capabilities of models have improved significantly, attracting growing attention from researchers. Evaluating the programming capabilities of LLMs is crucial because it reflects their multifaceted abilities and has numerous downstream applications. In this paper, we propose CodeApex, a bilingual benchmark dataset focusing on the programming comprehension, code generation, and code correction abilities of LLMs.
Tags
commonsense reasoning test, computer-science