Reconstructing room scales with a single sound for augmented reality displays

Benjamin S. Liang, Andrew S. Liang, Iran Roman, Tomer Weiss, Budmonde Duinkharjav, Juan Pablo Bello, Qi Sun

Research output: Contribution to journal › Article › peer-review

Abstract

Perception and reconstruction of our 3D physical environment is an essential task with broad applications for Augmented Reality (AR) displays. For example, reconstructed geometries are commonly leveraged for displaying 3D objects at accurate positions. While camera-captured images are a frequently used data source for realistically reconstructing 3D physical surroundings, they are limited to line-of-sight environments, requiring time-consuming and repetitive data-capture techniques to capture a full 3D picture. For instance, current AR devices require users to scan through a whole room to obtain its geometric sizes. This optical process is tedious and inapplicable when the space is occluded or inaccessible. Audio waves propagate through space by bouncing from different surfaces, but are not 'occluded' by a single object such as a wall, unlike light. In this research, we ask the question ‘can one hear the size of a room?’ To answer it, we propose an approach for inferring room geometries from only a single sound, which we define as an audio wave sequence played from a single loudspeaker, leveraging deep learning to decode the spatial information implicitly carried in a single speaker-and-microphone system. Through a series of experiments and studies, our work demonstrates our method's effectiveness at inferring a 3D environment's spatial layout. Our work introduces a robust building block in multi-modal layout reconstruction.

Original language: English (US)
Journal: Journal of Information Display
State: Accepted/In press - 2022

Keywords

  • Scene perception
  • acoustic propagation
  • audio listening
  • augmented reality
  • multi-modal

ASJC Scopus subject areas

  • Materials Science (all)
  • Electrical and Electronic Engineering
