ConvoLearn: A Dataset of Constructivist Tutor-Student Dialogue
ConvoLearn is a dataset of 1,250 semi-synthetic tutor-student dialogues in middle school Earth Science, grounded in constructivist knowledge-building theory and operationalizing six pedagogical dimensions (cognitive engagement, formative assessment, accountability, cultural responsiveness, metacognition, and power dynamics). The authors demonstrate that fine-tuning LLMs on this dataset shifts model behavior toward knowledge-building strategies, with the fine-tuned Mistral-7B model outperforming base models and Claude Sonnet 4.5 in teacher evaluations.
In educational applications, LLMs exhibit several fundamental pedagogical limitations, such as a tendency to reveal solutions rather than support dialogic learning. We introduce ConvoLearn (https://huggingface.co/datasets/masharma/convolearn), a dataset grounded in knowledge-building theory that operationalizes six core pedagogical dimensions: cognitive engagement, formative assessment, accountability, cultural responsiveness, metacognition, and power dynamics. We construct a semi-synthetic