TY - JOUR
T1 - Per-channel energy normalization
T2 - Why and how
AU - Lostanlen, Vincent
AU - Salamon, Justin
AU - Cartwright, Mark
AU - McFee, Brian
AU - Farnsworth, Andrew
AU - Kelling, Steve
AU - Bello, Juan Pablo
N1 - Funding Information:
Manuscript received August 6, 2018; revised October 5, 2018; accepted October 9, 2018. Date of publication October 29, 2018; date of current version November 19, 2018. This work was supported in part by the NSF awards under Grant 1633206 and Grant 1633259, in part by the Leon Levy Foundation, and in part by a Google faculty award. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Boaz Rafaely. (Corresponding author: Vincent Lostanlen.) V. Lostanlen, A. Farnsworth, and S. Kelling are with the Cornell Lab of Ornithology, Cornell University, Ithaca, NY 14850 USA (e-mail: vl1019@nyu.edu; af27@cornell.edu; stk2@cornell.edu).
Publisher Copyright:
© 1994-2012 IEEE.
PY - 2019/1
Y1 - 2019/1
N2 - In the context of automatic speech recognition and acoustic event detection, an adaptive procedure named per-channel energy normalization (PCEN) has recently been shown to outperform the pointwise logarithm of the mel-frequency spectrogram (logmelspec) as an acoustic frontend. This letter investigates the adequacy of PCEN for spectrogram-based pattern recognition in far-field noisy recordings, from both theoretical and practical standpoints. First, we apply PCEN to various datasets of natural acoustic environments and find empirically that it Gaussianizes distributions of magnitudes while decorrelating frequency bands. Second, we describe the asymptotic regimes of each component in PCEN: temporal integration, gain control, and dynamic range compression. Third, we give practical advice for adapting PCEN parameters to the temporal properties of the noise to be mitigated, the signal to be enhanced, and the choice of time-frequency representation. As it converts a large class of real-world soundscapes into additive white Gaussian noise, PCEN is a computationally efficient frontend for robust detection and classification of acoustic events in heterogeneous environments.
AB - In the context of automatic speech recognition and acoustic event detection, an adaptive procedure named per-channel energy normalization (PCEN) has recently been shown to outperform the pointwise logarithm of the mel-frequency spectrogram (logmelspec) as an acoustic frontend. This letter investigates the adequacy of PCEN for spectrogram-based pattern recognition in far-field noisy recordings, from both theoretical and practical standpoints. First, we apply PCEN to various datasets of natural acoustic environments and find empirically that it Gaussianizes distributions of magnitudes while decorrelating frequency bands. Second, we describe the asymptotic regimes of each component in PCEN: temporal integration, gain control, and dynamic range compression. Third, we give practical advice for adapting PCEN parameters to the temporal properties of the noise to be mitigated, the signal to be enhanced, and the choice of time-frequency representation. As it converts a large class of real-world soundscapes into additive white Gaussian noise, PCEN is a computationally efficient frontend for robust detection and classification of acoustic events in heterogeneous environments.
KW - Acoustic noise
KW - acoustic sensors
KW - acoustic signal detection
KW - signal classification
KW - spectrogram
UR - http://www.scopus.com/inward/record.url?scp=85055677559&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85055677559&partnerID=8YFLogxK
U2 - 10.1109/LSP.2018.2878620
DO - 10.1109/LSP.2018.2878620
M3 - Article
AN - SCOPUS:85055677559
SN - 1070-9908
VL - 26
SP - 39
EP - 43
JO - IEEE Signal Processing Letters
JF - IEEE Signal Processing Letters
IS - 1
M1 - 8514023
ER -