Leveraging Vision Language Models for Specialized Agricultural Tasks

Muhammad Arbab Arshad, Talukder Zaki Jubery, Tirtho Roy, Rim Nassiri, Asheesh K. Singh, Arti Singh, Chinmay Hegde, Baskar Ganapathysubramanian, Aditya Balu, Adarsh Krishnamurthy, Soumik Sarkar

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

    Abstract

    As Vision Language Models (VLMs) become increasingly accessible to farmers and agricultural experts, there is a growing need to evaluate their potential in specialized tasks. We present AgEval, a comprehensive benchmark for assessing VLMs' capabilities in plant stress phenotyping, offering a solution to the challenge of limited annotated data in agriculture. Our study explores how general-purpose VLMs can be leveraged for domain-specific tasks with only a few annotated examples, providing insights into their behavior and adaptability. AgEval encompasses 12 diverse plant stress phenotyping tasks, evaluating zero-shot and few-shot in-context learning performance of state-of-the-art models including Claude, GPT, Gemini, and LLaVA. Our results demonstrate VLMs' rapid adaptability to specialized tasks, with the best-performing model showing an increase in F1 scores from 46.24% to 73.37% in 8-shot identification. To quantify performance disparities across classes, we introduce metrics such as the coefficient of variation (CV), revealing that VLMs' training impacts classes differently, with CV ranging from 26.02% to 58.03%. We also find that strategic example selection enhances model reliability, with exact category examples improving F1 scores by 15.38% on average. AgEval establishes a framework for assessing VLMs in agricultural applications, offering valuable benchmarks for future evaluations. Our findings suggest that VLMs, with minimal few-shot examples, show promise as a viable alternative to traditional specialized models in plant stress phenotyping, while also highlighting areas for further refinement. Results and benchmark details are available at: https://github.com/arbab-ml/AgEval
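
The coefficient of variation mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common definition, CV = 100 × (standard deviation / mean), computed over per-class F1 scores with the population standard deviation. The abstract does not state the exact formula, so treat that choice as an assumption. The function name `coefficient_of_variation` and the sample scores are hypothetical and are not taken from the AgEval results.

```python
from statistics import mean, pstdev


def coefficient_of_variation(per_class_f1):
    """Return CV in percent: 100 * population std / mean of per-class F1 scores.

    Assumption: population standard deviation; the paper may use a different variant.
    """
    mu = mean(per_class_f1)
    if mu == 0:
        raise ValueError("CV is undefined when the mean F1 score is zero")
    return 100.0 * pstdev(per_class_f1) / mu


# Made-up per-class F1 scores (in percent), for illustration only.
example_f1 = [80.0, 60.0, 40.0, 20.0]
cv = coefficient_of_variation(example_f1)
print(f"CV across classes: {cv:.2f}%")
```

A higher CV means performance is spread more unevenly across classes; identical per-class scores give a CV of zero.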

    Original language: English (US)
    Title of host publication: Proceedings - 2025 IEEE Winter Conference on Applications of Computer Vision, WACV 2025
    Publisher: Institute of Electrical and Electronics Engineers Inc.
    Pages: 6320-6329
    Number of pages: 10
    ISBN (Electronic): 9798331510831
    State: Published - 2025
    Event: 2025 IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2025 - Tucson, United States
    Duration: Feb 28 2025 to Mar 4 2025

    Publication series

    Name: Proceedings - 2025 IEEE Winter Conference on Applications of Computer Vision, WACV 2025

    Conference

    Conference: 2025 IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2025
    Country/Territory: United States
    City: Tucson
    Period: 2/28/25 to 3/4/25

    Keywords

    • agriculture
    • few-shot learning
    • in-context learning
    • large language models
    • vision language models

    ASJC Scopus subject areas

    • Artificial Intelligence
    • Computer Science Applications
    • Computer Vision and Pattern Recognition
    • Human-Computer Interaction
    • Modeling and Simulation
    • Radiology, Nuclear Medicine and Imaging
