TY - GEN
T1 - Fusing shallow and deep learning for bioacoustic bird species classification
AU - Salamon, Justin
AU - Bello, Juan Pablo
AU - Farnsworth, Andrew
AU - Kelling, Steve
N1 - Publisher Copyright:
© 2017 IEEE.
PY - 2017/6/16
Y1 - 2017/6/16
AB - Automated classification of organisms to species based on their vocalizations would contribute tremendously to abilities to monitor biodiversity, with a wide range of applications in the field of ecology. In particular, automated classification of migrating birds' flight calls could yield new biological insights and conservation applications for birds that vocalize during migration. In this paper we explore state-of-the-art classification techniques for large-vocabulary bird species classification from flight calls. In particular, we contrast a 'shallow learning' approach based on unsupervised dictionary learning with a deep convolutional neural network combined with data augmentation. We show that the two models perform comparably on a dataset of 5428 flight calls spanning 43 different species, with both significantly outperforming an MFCC baseline. Finally, we show that by combining the models using a simple late-fusion approach we can further improve the results, obtaining a state-of-the-art classification accuracy of 0.96.
KW - Convolutional neural networks
KW - bioacoustics
KW - data augmentation
KW - deep learning
KW - flight calls
UR - http://www.scopus.com/inward/record.url?scp=85023772547&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85023772547&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2017.7952134
DO - 10.1109/ICASSP.2017.7952134
M3 - Conference contribution
AN - SCOPUS:85023772547
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 141
EP - 145
BT - 2017 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2017 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2017 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2017
Y2 - 5 March 2017 through 9 March 2017
ER -