TY - GEN
T1 - Sound Source Distance Estimation in Diverse and Dynamic Acoustic Conditions
AU - Kushwaha, Saksham Singh
AU - Roman, Iran R.
AU - Fuentes, Magdalena
AU - Bello, Juan Pablo
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Localizing a moving sound source in the real world involves determining its direction-of-arrival (DOA) and distance relative to a microphone. Advancements in DOA estimation have been facilitated by data-driven methods optimized with large open-source datasets of microphone array recordings in diverse environments. In contrast, estimating a sound source's distance remains understudied. Existing approaches rely on recordings from non-coincident microphones in order to use methods that are susceptible to differences in room reverberation. We present a convolutional recurrent neural network (CRNN) able to estimate the distance of moving sound sources across multiple datasets featuring diverse rooms, outperforming a recently published approach. We also characterize our model's performance as a function of sound source distance and different training losses. This analysis reveals optimal training using a loss that weights model errors as an inverse function of the sound source's true distance. Our study is the first to demonstrate that sound source distance estimation can be performed across diverse acoustic conditions using deep learning.
AB - Localizing a moving sound source in the real world involves determining its direction-of-arrival (DOA) and distance relative to a microphone. Advancements in DOA estimation have been facilitated by data-driven methods optimized with large open-source datasets of microphone array recordings in diverse environments. In contrast, estimating a sound source's distance remains understudied. Existing approaches rely on recordings from non-coincident microphones in order to use methods that are susceptible to differences in room reverberation. We present a convolutional recurrent neural network (CRNN) able to estimate the distance of moving sound sources across multiple datasets featuring diverse rooms, outperforming a recently published approach. We also characterize our model's performance as a function of sound source distance and different training losses. This analysis reveals optimal training using a loss that weights model errors as an inverse function of the sound source's true distance. Our study is the first to demonstrate that sound source distance estimation can be performed across diverse acoustic conditions using deep learning.
KW - distance estimation
KW - mean percentage error
KW - multichannel audio
KW - sound source localization
UR - http://www.scopus.com/inward/record.url?scp=85173064880&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85173064880&partnerID=8YFLogxK
U2 - 10.1109/WASPAA58266.2023.10248194
DO - 10.1109/WASPAA58266.2023.10248194
M3 - Conference contribution
AN - SCOPUS:85173064880
T3 - IEEE Workshop on Applications of Signal Processing to Audio and Acoustics
BT - Proceedings of the 2023 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2023
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2023 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2023
Y2 - 22 October 2023 through 25 October 2023
ER -