Zero-shot Object Navigation with Vision-Language Foundation Models Reasoning

Shuaihang Yuan, Muhammad Shafique, Mohamed Riyadh Baghdadi, Farshad Khorrami, Anthony Tzes, Yi Fang

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

This research introduces a novel method for zero-shot object navigation, enabling agents to navigate unexplored environments. Traditional methods depend on large navigation datasets for training and therefore often fail in new settings. We instead use Large Vision Language Models (LVLMs) to help agents understand and move through unfamiliar visual environments without prior experience. A pretrained LVLM first performs object detection to build a semantic map, and the LVLM is then used again to predict the likely location of the target object. Our experiments on the RoboTHOR benchmark show improved performance, with a 1.8% increase in both Success Rate and Success Weighted by Path Length (SPL) compared to the existing best method, ESC.
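
The two metrics named in the abstract can be illustrated with a minimal sketch. It assumes the commonly used definition of SPL, in which each successful episode contributes the ratio of the shortest-path length to the larger of the shortest-path length and the path length the agent actually travelled, averaged over all episodes. The function names, argument names, and episode values below are hypothetical and are not taken from the paper or from the RoboTHOR tooling.

```python
from typing import Sequence


def success_rate(successes: Sequence[bool]) -> float:
    """Fraction of episodes in which the agent reached the target."""
    return sum(successes) / len(successes)


def spl(
    successes: Sequence[bool],
    shortest_lengths: Sequence[float],
    path_lengths: Sequence[float],
) -> float:
    """Success weighted by Path Length, averaged over all episodes.

    Assumes every shortest-path length is positive (hypothetical inputs).
    """
    total = 0.0
    for succeeded, shortest, travelled in zip(
        successes, shortest_lengths, path_lengths
    ):
        if succeeded:
            # Efficiency is 1.0 for an optimal path and shrinks as the
            # agent travels farther than the shortest path.
            total += shortest / max(travelled, shortest)
    # Failed episodes contribute 0 but still count in the denominator.
    return total / len(successes)


# Three made-up episodes: one inefficient success, one failure,
# and one optimal success.
episode_success = [True, False, True]
episode_shortest = [4.0, 5.0, 2.0]
episode_travelled = [8.0, 6.0, 2.0]

print(success_rate(episode_success))
print(spl(episode_success, episode_shortest, episode_travelled))
```

Because failed episodes add nothing to the sum, SPL can never exceed Success Rate on the same set of episodes, which is why the two numbers are usually reported together.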

Original language: English (US)
Title of host publication: 2024 10th International Conference on Automation, Robotics, and Applications, ICARA 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 501-505
Number of pages: 5
ISBN (Electronic): 9798350394245
State: Published - 2024
Event: 10th International Conference on Automation, Robotics, and Applications, ICARA 2024 - Athens, Greece
Duration: Feb 22 2024 to Feb 24 2024

Publication series

Name: 2024 10th International Conference on Automation, Robotics, and Applications, ICARA 2024

Conference

Conference: 10th International Conference on Automation, Robotics, and Applications, ICARA 2024
Country/Territory: Greece
City: Athens
Period: 2/22/24 to 2/24/24

Keywords

  • Commonsense Reasoning
  • Object Goal Navigation
  • Zero-shot Navigation

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Science Applications
  • Mechanical Engineering
  • Safety, Risk, Reliability and Quality
  • Control and Optimization
  • Modeling and Simulation
