TY - GEN
T1 - Zero-shot Object Navigation with Vision-Language Foundation Models Reasoning
AU - Yuan, Shuaihang
AU - Shafique, Muhammad
AU - Baghdadi, Mohamed Riyadh
AU - Khorrami, Farshad
AU - Tzes, Anthony
AU - Fang, Yi
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - This research introduces a novel method for zero-shot object navigation, enabling agents to navigate unexplored environments. Our approach differs from traditional methods, which often fail in new settings because they depend on large navigation datasets for training. We use Large Vision-Language Models (LVLMs) to help agents understand and move through unfamiliar visual environments without prior experience. The process first uses a pretrained LVLM for object detection to build a semantic map, then employs the LVLM again to predict the likely location of the target object. Our experiments on the RoboTHOR benchmark show improved performance, with a 1.8% increase in both Success Rate and Success weighted by Path Length (SPL) over the previous best method, ESC.
KW - Commonsense Reasoning
KW - Object Goal Navigation
KW - Zero-shot Navigation
UR - http://www.scopus.com/inward/record.url?scp=85197356895&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85197356895&partnerID=8YFLogxK
U2 - 10.1109/ICARA60736.2024.10553173
DO - 10.1109/ICARA60736.2024.10553173
M3 - Conference contribution
AN - SCOPUS:85197356895
T3 - 2024 10th International Conference on Automation, Robotics, and Applications, ICARA 2024
SP - 501
EP - 505
BT - 2024 10th International Conference on Automation, Robotics, and Applications, ICARA 2024
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 10th International Conference on Automation, Robotics, and Applications, ICARA 2024
Y2 - 22 February 2024 through 24 February 2024
ER -