TY - GEN
T1 - Few-shot sound event detection
AU - Wang, Yu
AU - Salamon, Justin
AU - Bryan, Nicholas J.
AU - Bello, Juan Pablo
N1 - Publisher Copyright:
© 2020 IEEE.
PY - 2020/5
Y1 - 2020/5
N2 - Locating perceptually similar sound events within a continuous recording is a common task for various audio applications. However, current tools require users to manually listen to and label all the locations of the sound events of interest, which is tedious and time-consuming. In this work, we (1) adapt state-of-the-art metric-based few-shot learning methods to automate the detection of similar-sounding events, requiring only one or few examples of the target event, (2) develop a method to automatically construct a partial set of labeled examples (negative samples) to reduce user labeling effort, and (3) develop an inference-time data augmentation method to increase detection accuracy. To validate our approach, we perform extensive comparative analysis of few-shot learning methods for the task of keyword detection in speech. We show that our approach successfully adapts closed-set few-shot learning approaches to an open-set sound event detection problem.
AB - Locating perceptually similar sound events within a continuous recording is a common task for various audio applications. However, current tools require users to manually listen to and label all the locations of the sound events of interest, which is tedious and time-consuming. In this work, we (1) adapt state-of-the-art metric-based few-shot learning methods to automate the detection of similar-sounding events, requiring only one or few examples of the target event, (2) develop a method to automatically construct a partial set of labeled examples (negative samples) to reduce user labeling effort, and (3) develop an inference-time data augmentation method to increase detection accuracy. To validate our approach, we perform extensive comparative analysis of few-shot learning methods for the task of keyword detection in speech. We show that our approach successfully adapts closed-set few-shot learning approaches to an open-set sound event detection problem.
KW - Few-shot learning
KW - Keyword detection
KW - Keyword spotting
KW - Sound event detection
KW - Speech
UR - http://www.scopus.com/inward/record.url?scp=85091290156&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85091290156&partnerID=8YFLogxK
U2 - 10.1109/ICASSP40776.2020.9054708
DO - 10.1109/ICASSP40776.2020.9054708
M3 - Conference contribution
AN - SCOPUS:85091290156
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 81
EP - 85
BT - 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020
Y2 - 4 May 2020 through 8 May 2020
ER -