BloomVQA: Assessing Hierarchical Multi-modal Comprehension
BloomVQA is a novel VQA benchmark dataset that evaluates multi-modal comprehension in large vision-language models using picture stories whose questions are mapped to the levels of Bloom's Taxonomy (from Remember through Create). The benchmark comprises 1200 core samples based on stories from early childhood education, with hierarchical graph representations of story content that enable automated evaluation of model performance across different cognitive skill levels.
We propose a novel VQA dataset, BloomVQA, to facilitate comprehensive evaluation of large vision-language models on comprehension tasks. Unlike current benchmarks, which often focus on fact-based memorization and simple reasoning tasks without theoretical grounding, we collect multiple-choice samples based on picture stories that reflect different levels of comprehension, as laid out in Bloom's Taxonomy, a classic framework for learning assessment widely adopted in education research. Our data map