TY - GEN
T1 - Exploring the Reliability of Foundation Model-Based Frontier Selection in Zero-Shot Object Goal Navigation
AU - Yuan, Shuaihang
AU - Unlu, Halil Utku
AU - Huang, Hao
AU - Wen, Congcong
AU - Tzes, Anthony
AU - Fang, Yi
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
PY - 2025
Y1 - 2025
N2 - In this paper, we present a novel method for reliable frontier selection in Zero-Shot Object Goal Navigation (ZS-OGN), enhancing robotic navigation systems with foundation models to improve commonsense reasoning in indoor environments. Our approach introduces a multi-expert decision framework to address the nonsensical or irrelevant reasoning often seen in foundation model-based systems. The method comprises two key components: Diversified Expert Frontier Analysis (DEFA) and Consensus Decision Making (CDM). DEFA utilizes three expert models—furniture arrangement, room type analysis, and visual scene reasoning—while CDM aggregates their outputs, prioritizing unanimous or majority consensus for more reliable decisions. Demonstrating state-of-the-art performance on the RoboTHOR and HM3D datasets, our method excels at navigating towards untrained objects or goals and outperforms various baselines, showcasing its adaptability to dynamic real-world conditions and superior generalization capabilities.
AB - In this paper, we present a novel method for reliable frontier selection in Zero-Shot Object Goal Navigation (ZS-OGN), enhancing robotic navigation systems with foundation models to improve commonsense reasoning in indoor environments. Our approach introduces a multi-expert decision framework to address the nonsensical or irrelevant reasoning often seen in foundation model-based systems. The method comprises two key components: Diversified Expert Frontier Analysis (DEFA) and Consensus Decision Making (CDM). DEFA utilizes three expert models—furniture arrangement, room type analysis, and visual scene reasoning—while CDM aggregates their outputs, prioritizing unanimous or majority consensus for more reliable decisions. Demonstrating state-of-the-art performance on the RoboTHOR and HM3D datasets, our method excels at navigating towards untrained objects or goals and outperforms various baselines, showcasing its adaptability to dynamic real-world conditions and superior generalization capabilities.
KW - Foundation Model Reasoning
KW - Zero-shot Object Goal Navigation
UR - http://www.scopus.com/inward/record.url?scp=85212251242&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85212251242&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-78113-1_9
DO - 10.1007/978-3-031-78113-1_9
M3 - Conference contribution
AN - SCOPUS:85212251242
SN - 9783031781124
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 119
EP - 134
BT - Pattern Recognition - 27th International Conference, ICPR 2024, Proceedings
A2 - Antonacopoulos, Apostolos
A2 - Chaudhuri, Subhasis
A2 - Chellappa, Rama
A2 - Liu, Cheng-Lin
A2 - Bhattacharya, Saumik
A2 - Pal, Umapada
PB - Springer Science and Business Media Deutschland GmbH
T2 - 27th International Conference on Pattern Recognition, ICPR 2024
Y2 - 1 December 2024 through 5 December 2024
ER -