TY - JOUR
T1 - Reconstructing room scales with a single sound for augmented reality displays
AU - Liang, Benjamin S.
AU - Liang, Andrew S.
AU - Roman, Iran
AU - Weiss, Tomer
AU - Duinkharjav, Budmonde
AU - Bello, Juan Pablo
AU - Sun, Qi
N1 - Publisher Copyright:
© 2022 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group on behalf of the Korean Information Display Society.
PY - 2023
Y1 - 2023
N2 - Perception and reconstruction of our 3D physical environment is an essential task with broad applications for Augmented Reality (AR) displays. For example, reconstructed geometries are commonly leveraged for displaying 3D objects at accurate positions. While camera-captured images are a frequently used data source for realistically reconstructing 3D physical surroundings, they are limited to line-of-sight environments and require time-consuming, repetitive data-capture procedures to obtain a full 3D picture. For instance, current AR devices require users to scan an entire room to obtain its geometric dimensions. This optical process is tedious and inapplicable when the space is occluded or inaccessible. Unlike light, audio waves propagate through space by bouncing off different surfaces and are not 'occluded' by a single object such as a wall. In this research, we ask the question 'can one hear the size of a room?' To answer it, we propose an approach for inferring room geometries from only a single sound, which we define as an audio wave sequence played from a single loudspeaker, leveraging deep learning to decode the spatial information implicitly carried by a single speaker-and-microphone system. Through a series of experiments and studies, our work demonstrates our method's effectiveness at inferring a 3D environment's spatial layout. Our work introduces a robust building block for multi-modal layout reconstruction.
AB - Perception and reconstruction of our 3D physical environment is an essential task with broad applications for Augmented Reality (AR) displays. For example, reconstructed geometries are commonly leveraged for displaying 3D objects at accurate positions. While camera-captured images are a frequently used data source for realistically reconstructing 3D physical surroundings, they are limited to line-of-sight environments and require time-consuming, repetitive data-capture procedures to obtain a full 3D picture. For instance, current AR devices require users to scan an entire room to obtain its geometric dimensions. This optical process is tedious and inapplicable when the space is occluded or inaccessible. Unlike light, audio waves propagate through space by bouncing off different surfaces and are not 'occluded' by a single object such as a wall. In this research, we ask the question 'can one hear the size of a room?' To answer it, we propose an approach for inferring room geometries from only a single sound, which we define as an audio wave sequence played from a single loudspeaker, leveraging deep learning to decode the spatial information implicitly carried by a single speaker-and-microphone system. Through a series of experiments and studies, our work demonstrates our method's effectiveness at inferring a 3D environment's spatial layout. Our work introduces a robust building block for multi-modal layout reconstruction.
KW - Scene perception
KW - acoustic propagation
KW - audio listening
KW - augmented reality
KW - multi-modal
UR - http://www.scopus.com/inward/record.url?scp=85142302052&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85142302052&partnerID=8YFLogxK
U2 - 10.1080/15980316.2022.2145377
DO - 10.1080/15980316.2022.2145377
M3 - Article
AN - SCOPUS:85142302052
SN - 1598-0316
VL - 24
SP - 1
EP - 12
JO - Journal of Information Display
JF - Journal of Information Display
IS - 1
ER -