Can You Trick the Grader? Adversarial Persuasion of LLM Judges
This paper investigates whether LLM judges can be manipulated through persuasive language when evaluating mathematical solutions, finding that strategically embedded rhetorical techniques cause LLM judges to assign scores to incorrect answers that are inflated by up to 8%. The study tests 14 LLMs across six math benchmarks using seven persuasion strategies grounded in Aristotelian rhetoric.
As large language models take on growing roles as automated evaluators in practical settings, a critical question arises: can individuals persuade an LLM judge to assign unfairly high scores? This study is the first to show that strategically embedded persuasive language can bias LLM judges when scoring mathematical reasoning tasks, where correctness should be independent of stylistic variation. Grounded in Aristotle's rhetorical principles, we formalize seven persuasion techniques (Majority,