VLM4Bio: A Benchmark Dataset to Evaluate Pretrained Vision-Language Models for Trait Discovery from Biological Images

Publication Type

Conference Paper

Book Title

NeurIPS 2024: The 38th Conference on Neural Information Processing Systems

Publication Date

December, 2024

Page Numbers

131035 to 131071

Publisher Location

United States of America

Conference Name

NeurIPS 2024: Annual Conference on Neural Information Processing Systems

Conference Location

Vancouver, Canada

Conference Sponsor

N/A

Conference Date

Dec 9, 2024 - Dec 15, 2024

Abstract

Images are increasingly becoming the currency for documenting biodiversity on the planet, providing novel opportunities for accelerating scientific discoveries in the field of organismal biology, especially with the advent of large vision-language models (VLMs). We ask whether pre-trained VLMs can aid scientists in answering a range of biologically relevant questions without any additional fine-tuning. In this paper, we evaluate the effectiveness of 12 state-of-the-art (SOTA) VLMs in the field of organismal biology using a novel dataset, VLM4Bio, consisting of 469K question-answer pairs involving 30K images from three groups of organisms (fishes, birds, and butterflies) and covering five biologically relevant tasks. We also explore how prompting techniques affect VLM performance and test the models for reasoning hallucination, shedding new light on the capabilities of current SOTA VLMs in answering biologically relevant questions using images.

