Alignment Drift in CEFR-prompted LLMs for Interactive Spanish Tutoring
This paper evaluates whether CEFR-based system prompting can reliably constrain LLMs to generate Spanish text appropriate to different student proficiency levels (A1, B1, C1) in simulated tutor-student dialogues, finding that prompting effectiveness degrades over sustained interactions (alignment drift). The study uses automated dialogue simulation with open-source LLMs ranging from 7B to 12B parameters to assess proficiency-aligned adaptive tutoring without human participants.
This paper investigates the potential of Large Language Models (LLMs) as adaptive tutors in the context of second-language learning. In particular, we evaluate whether system prompting can reliably constrain LLMs to generate only text appropriate to the student's competence level. We simulate full teacher-student dialogues in Spanish using instruction-tuned, open-source LLMs ranging in size from 7B to 12B parameters. Dialogues are generated by having an LLM alternate between tutor and student roles.