TY - GEN
T1 - Reliable Semantic Understanding for Real World Zero-Shot Object Goal Navigation
AU - Unlu, Halil Utku
AU - Yuan, Shuaihang
AU - Wen, Congcong
AU - Huang, Hao
AU - Tzes, Anthony
AU - Fang, Yi
N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
PY - 2025
Y1 - 2025
N2 - We introduce an innovative approach to advancing semantic understanding in zero-shot object goal navigation (ZS-OGN), enhancing the autonomy of robots in unfamiliar environments. Traditional reliance on labeled data has been a limitation for robotic adaptability, which we address by employing a dual-component framework that integrates a GLIP Vision Language Model for initial detection and an InstructionBLIP model for validation. This combination not only refines object and environmental recognition but also fortifies the semantic interpretation, pivotal for navigational decision-making. Our method, rigorously tested in both simulated and real-world settings, exhibits marked improvements in navigation precision and reliability.
KW - Object Goal Navigation
KW - Safe Navigation
KW - Semantic Scene Understanding
KW - Vision-Language Models
KW - Zero-shot Navigation
UR - http://www.scopus.com/inward/record.url?scp=85212255847&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85212255847&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-78113-1_10
DO - 10.1007/978-3-031-78113-1_10
M3 - Conference contribution
AN - SCOPUS:85212255847
SN - 9783031781124
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 135
EP - 150
BT - Pattern Recognition - 27th International Conference, ICPR 2024, Proceedings
A2 - Antonacopoulos, Apostolos
A2 - Chaudhuri, Subhasis
A2 - Chellappa, Rama
A2 - Liu, Cheng-Lin
A2 - Bhattacharya, Saumik
A2 - Pal, Umapada
PB - Springer Science and Business Media Deutschland GmbH
T2 - 27th International Conference on Pattern Recognition, ICPR 2024
Y2 - 1 December 2024 through 5 December 2024
ER -