Learning to Generate Instruction Tuning Datasets for Zero-Shot Task Adaptation

Research / Other · Relevance: 3/10 · 27 citations · 2024 paper

This paper introduces Bonito, a model that generates synthetic instruction tuning datasets from unannotated text to adapt large language models to specialized domains without manual annotation. The work focuses on domain adaptation (biomedical, legal) using synthetic task generation rather than K-12 educational contexts or learning outcomes.

We introduce Bonito, an open-source model for conditional task generation that converts unannotated text into task-specific training datasets for instruction tuning. Our aim is to enable zero-shot task adaptation of large language models on users' specialized, private data. We train Bonito by fine-tuning a pretrained large language model on a new large-scale dataset with 1.65M examples, created by remixing existing instruction tuning datasets into meta-templates. The meta-templates for a dataset produce training examples where the input is the unannotated text and the task attribute, and the output consists of the instruction and the response.
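The remixing step described above can be sketched in a few lines. This is a hypothetical illustration, not the authors' code: the function name, the special tokens (`<|tasktype|>`, `<|pipe|>`), and the field names are assumptions about the general shape of a meta-template, which inverts an annotated example into a task-generation training pair.

```python
# Hypothetical sketch of Bonito-style meta-template remixing.
# An annotated example (context, instruction, response) is inverted into a
# training pair where the model learns to *generate* an instruction/response
# pair conditioned on raw text and a task-type attribute.

def apply_meta_template(context: str, task_type: str,
                        instruction: str, response: str) -> dict:
    """Remix one annotated example into a task-generation training pair."""
    return {
        # Input: the desired task attribute plus the unannotated text.
        "input": f"<|tasktype|>\n{task_type}\n<|context|>\n{context}",
        # Output: the instruction and response the model should produce.
        "output": f"{instruction}\n<|pipe|>\n{response}",
    }

pair = apply_meta_template(
    context="Mitochondria are the powerhouse of the cell.",
    task_type="yes-no question answering",
    instruction="Are mitochondria involved in energy production?",
    response="Yes",
)
```

At adaptation time the trained model is then given only the task type and a user's unannotated text, and it emits the instruction/response pair used for instruction tuning.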

Study Type

Research / Other

Tool Types

Teacher Support Tools: tools that assist teachers with lesson planning, content generation, grading, and analytics.
Personalised Adaptive Learning: systems that adapt content and difficulty to individual learners.

Tags

instructional text generation · computer-science