TY - JOUR
T1 - BirdVoxDetect
T2 - Large-Scale Detection and Classification of Flight Calls for Bird Migration Monitoring
AU - Lostanlen, Vincent
AU - Cramer, Aurora
AU - Salamon, Justin
AU - Farnsworth, Andrew
AU - Van Doren, Benjamin M.
AU - Kelling, Steve
AU - Bello, Juan Pablo
N1 - Publisher Copyright: © 2014 IEEE.
PY - 2024
Y1 - 2024
N2 - Sound event classification has the potential to advance our understanding of bird migration. Although it has long been known that migratory species have a vocal signature of their own, previous work on automatic flight call classification has been limited in robustness and scope: e.g., covering few recording sites, short acquisition segments, and simplified biological taxonomies. In this paper, we present BirdVoxDetect (BVD), the first full-fledged solution to bird migration monitoring from acoustic sensor network data. As open-source software, BVD integrates an original pipeline of three machine learning modules. The first module is a random forest classifier of sensor faults, trained with human-in-the-loop active learning. The second module is a deep convolutional neural network for sound event detection with per-channel energy normalization (PCEN). The third module is a multitask convolutional neural network that predicts the family, genus, and species of flight calls from passerines (Passeriformes) of North America. We evaluate BVD on a new dataset (296 hours from nine locations, the largest to date for this task) and discuss the main sources of estimation error in a real-world deployment: mechanical sensor failures, sensitivity to background noise, misdetection, and taxonomic confusion. Then, we deploy BVD at an unprecedented scale: 6672 hours of audio (approximately one terabyte), corresponding to a full season of bird migration. Running BVD in parallel over the full-season dataset yields 1.6 billion FFTs, 480 million neural network predictions, and over six petabytes of throughput. With this method, our main finding is that deep learning and bioacoustic sensor networks are ready to complement radar observations and crowdsourced surveys for bird migration monitoring, thus benefiting conservation ecology and land-use planning at large.
AB - Sound event classification has the potential to advance our understanding of bird migration. Although it has long been known that migratory species have a vocal signature of their own, previous work on automatic flight call classification has been limited in robustness and scope: e.g., covering few recording sites, short acquisition segments, and simplified biological taxonomies. In this paper, we present BirdVoxDetect (BVD), the first full-fledged solution to bird migration monitoring from acoustic sensor network data. As open-source software, BVD integrates an original pipeline of three machine learning modules. The first module is a random forest classifier of sensor faults, trained with human-in-the-loop active learning. The second module is a deep convolutional neural network for sound event detection with per-channel energy normalization (PCEN). The third module is a multitask convolutional neural network that predicts the family, genus, and species of flight calls from passerines (Passeriformes) of North America. We evaluate BVD on a new dataset (296 hours from nine locations, the largest to date for this task) and discuss the main sources of estimation error in a real-world deployment: mechanical sensor failures, sensitivity to background noise, misdetection, and taxonomic confusion. Then, we deploy BVD at an unprecedented scale: 6672 hours of audio (approximately one terabyte), corresponding to a full season of bird migration. Running BVD in parallel over the full-season dataset yields 1.6 billion FFTs, 480 million neural network predictions, and over six petabytes of throughput. With this method, our main finding is that deep learning and bioacoustic sensor networks are ready to complement radar observations and crowdsourced surveys for bird migration monitoring, thus benefiting conservation ecology and land-use planning at large.
KW - Acoustic signal detection
KW - audio databases
KW - deep learning
KW - ecosystems
KW - phylogeny
UR - http://www.scopus.com/inward/record.url?scp=85202785038&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85202785038&partnerID=8YFLogxK
U2 - 10.1109/TASLP.2024.3444486
DO - 10.1109/TASLP.2024.3444486
M3 - Article
AN - SCOPUS:85202785038
SN - 2329-9290
VL - 32
SP - 4134
EP - 4145
JO - IEEE/ACM Transactions on Audio Speech and Language Processing
JF - IEEE/ACM Transactions on Audio Speech and Language Processing
ER -