Mathify: Evaluating Large Language Models on Mathematical Problem Solving Tasks

Benchmark (Published & Automated); Relevance: 7/10; 17 citations; 2024 paper

This paper introduces MathQuest, a mathematics dataset curated from Indian 11th and 12th grade NCERT textbooks, and fine-tunes three large language models (LLaMA-2, WizardMath, MAmmoTH) to evaluate their mathematical problem-solving capabilities on this benchmark. The authors also test these fine-tuned models on existing math benchmarks to assess performance across different complexity levels.

The rapid progress in the field of natural language processing (NLP) systems and the expansion of large language models (LLMs) have opened up numerous opportunities in the field of education and instructional methods. These advancements offer the potential for tailored learning experiences and immediate feedback, all delivered through accessible and cost-effective services. One notable application area for this technological advancement is in the realm of solving mathematical problems. Mathemati

Study Type

Benchmark (Published & Automated)

Framework Categories

1 General reasoning; 3.1 Content knowledge; 3.2 Content alignment
