Mathify: Evaluating Large Language Models on Mathematical Problem Solving Tasks
This paper introduces MathQuest, a mathematics dataset curated from Indian 11th and 12th grade NCERT textbooks, and fine-tunes three large language models (LLaMA-2, WizardMath, MAmmoTH) to evaluate their mathematical problem-solving capabilities on this benchmark. The authors also test these fine-tuned models on existing math benchmarks to assess performance across different complexity levels.
Rapid progress in natural language processing (NLP) systems and the expansion of large language models (LLMs) have opened up numerous opportunities in education and instructional methods. These advancements offer the potential for tailored learning experiences and immediate feedback, all delivered through accessible and cost-effective services. One notable application area is solving mathematical problems. Mathemati