Spatial Scaper: A Library to Simulate and Augment Soundscapes for Sound Event Localization and Detection in Realistic Rooms

Iran R. Roman, Christopher Ick, Sivan Ding, Adrian S. Roman, Brian Mcfee, Juan P. Bello

Research output: Contribution to journalConference articlepeer-review

Abstract

Sound event localization and detection (SELD) is an important task in machine listening. Major advancements rely on simulated data with sound events in specific rooms and strong spatio-temporal labels. SELD data is simulated by convolving spatialy-localized room impulse responses (RIRs) with sound waveforms to place sound events in a soundscape. However, RIRs require manual collection in specific rooms. We present SpatialScaper, a library for SELD data simulation and augmentation. Compared to existing tools, SpatialScaper emulates virtual rooms via parameters such as size and wall absorption. This allows for parameterized placement (including movement) of foreground and background sound sources. SpatialScaper also includes data augmentation pipelines that can be applied to existing SELD data. As a case study, we use SpatialScaper to add rooms to the DCASE SELD data. Training a model with our data led to progressive performance improves as a direct function of acoustic diversity. These results show that SpatialScaper is valuable to train robust SELD models.

Original languageEnglish (US)
Pages (from-to)1221-1225
Number of pages5
JournalICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
DOIs
StatePublished - 2024
Event2024 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024 - Seoul, Korea, Republic of
Duration: Apr 14 2024Apr 19 2024

Keywords

  • data augmentation
  • data simulation
  • microphone arrays
  • room simulations
  • spatial audio

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering

Fingerprint

Dive into the research topics of 'Spatial Scaper: A Library to Simulate and Augment Soundscapes for Sound Event Localization and Detection in Realistic Rooms'. Together they form a unique fingerprint.

Cite this