Temporalizing Confidence: Evaluation of Chain-of-Thought Reasoning with Signal Temporal Logic
This paper proposes using Signal Temporal Logic (STL) to evaluate confidence trajectories in Chain-of-Thought reasoning for LLMs solving high school mathematics problems, with the aim of improving calibration and reducing overconfident incorrect answers. The method is evaluated on Chinese Gaokao mathematics questions, targeting more reliable uncertainty estimates for educational AI systems.
Large Language Models (LLMs) have shown impressive performance on mathematical reasoning tasks when guided by Chain-of-Thought (CoT) prompting. However, they tend to produce highly confident yet incorrect outputs, which poses significant risks in domains such as education, where users may lack the expertise to assess individual reasoning steps. To address this, we propose a structured framework that models stepwise confidence as a temporal signal and evaluates it using Signal Temporal Logic (STL). In particular, we extract a confidence score at each reasoning step, treat the resulting sequence as a discrete-time signal, and check it against STL specifications that encode desirable confidence behavior over the course of the reasoning trace.
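To make the core idea concrete, the following is a minimal sketch in Python, not the paper's exact specification: it treats the per-step confidence scores of a CoT trace as a discrete-time signal and computes the quantitative robustness of two illustrative STL formulas, G(conf >= theta) (confidence stays above a threshold throughout) and F G(conf >= theta) (confidence eventually stabilizes above it). The threshold theta, the example trace, and the choice of formulas are all assumptions made here for illustration; the framework's actual STL specifications may differ.

```python
# Illustrative sketch only: the STL formulas and threshold below are
# assumptions, not the paper's actual specifications.

def robustness_globally(signal, theta):
    """Robustness of G(conf >= theta): min_t (conf_t - theta)."""
    return min(s - theta for s in signal)

def robustness_eventually_globally(signal, theta):
    """Robustness of F G(conf >= theta): max_t min_{t' >= t} (conf_{t'} - theta)."""
    return max(
        min(s - theta for s in signal[t:])
        for t in range(len(signal))
    )

if __name__ == "__main__":
    # Hypothetical stepwise confidences from a six-step CoT trace.
    confidences = [0.55, 0.62, 0.58, 0.71, 0.78, 0.83]
    theta = 0.6

    # Negative: the trace dips below theta at some step.
    print(robustness_globally(confidences, theta))
    # Positive: the trace eventually stabilizes above theta.
    print(robustness_eventually_globally(confidences, theta))
```

Because STL robustness is a signed real value rather than a Boolean, one plausible use, consistent with the calibration goal above, is to threshold it: answers whose confidence trajectories satisfy the specification with positive margin are treated as trustworthy, while violating trajectories flag potentially overconfident errors.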