Per-channel energy normalization: Why and how

Vincent Lostanlen, Justin Salamon, Mark Cartwright, Brian McFee, Andrew Farnsworth, Steve Kelling, Juan Pablo Bello

Research output: Contribution to journal › Article › peer-review

Abstract

In the context of automatic speech recognition and acoustic event detection, an adaptive procedure named per-channel energy normalization (PCEN) has recently been shown to outperform the pointwise logarithm of the mel-frequency spectrogram (logmelspec) as an acoustic frontend. This letter investigates the adequacy of PCEN for spectrogram-based pattern recognition in far-field noisy recordings, from both theoretical and practical standpoints. First, we apply PCEN to various datasets of natural acoustic environments and find empirically that it Gaussianizes distributions of magnitudes while decorrelating frequency bands. Second, we describe the asymptotic regimes of each component in PCEN: temporal integration, gain control, and dynamic range compression. Third, we give practical advice for adapting PCEN parameters to the temporal properties of the noise to be mitigated, the signal to be enhanced, and the choice of time-frequency representation. As it converts a large class of real-world soundscapes into additive white Gaussian noise, PCEN is a computationally efficient frontend for robust detection and classification of acoustic events in heterogeneous environments.
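The three PCEN components named in the abstract (temporal integration, gain control, dynamic range compression) can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation. It follows the formulation commonly used for PCEN on a nonnegative spectrogram E[t, f]: a smoother M[t, f] = (1 - s) * M[t-1, f] + s * E[t, f], then PCEN[t, f] = (E[t, f] / (eps + M[t, f])^alpha + delta)^r - delta^r. The parameter values, the steady-state initialization of M, and the helper names pcen and smoothing_coefficient are illustrative assumptions. So is the mapping from a time constant in seconds to s (the positive root of T^2 s^2 + s - 1 = 0, with T in frames), which behaves like s ≈ 1/T for long time constants.

```python
import numpy as np


def smoothing_coefficient(time_constant, sr, hop_length):
    """Map a time constant (seconds) to the IIR smoothing coefficient s.

    Assumption of this sketch: s is the positive root of
    T**2 * s**2 + s - 1 = 0, where T is the time constant in frames.
    For large T this gives s ~ 1/T, as for a one-pole smoother.
    """
    t_frames = time_constant * sr / hop_length
    return (np.sqrt(1.0 + 4.0 * t_frames**2) - 1.0) / (2.0 * t_frames**2)


def pcen(E, s=0.025, alpha=0.98, delta=2.0, r=0.5, eps=1e-6):
    """Per-channel energy normalization of a nonnegative spectrogram.

    E has shape (n_bands, n_frames). Default values are illustrative
    assumptions, not values prescribed by the letter.
    """
    E = np.asarray(E, dtype=float)
    M = np.empty_like(E)
    # Temporal integration: first-order IIR low-pass filter along time,
    # run independently in every frequency band (channel).
    # Hypothetical choice: start from the first frame (steady state).
    M[:, 0] = E[:, 0]
    for t in range(1, E.shape[1]):
        M[:, t] = (1.0 - s) * M[:, t - 1] + s * E[:, t]
    # Adaptive gain control: divide by a power of the smoothed energy.
    G = E / (eps + M) ** alpha
    # Dynamic range compression: root compression with bias delta,
    # offset by delta**r so that a zero input maps to a zero output.
    return (G + delta) ** r - delta**r


# Usage on a synthetic stand-in for a mel spectrogram:
# 40 bands x 500 frames of stationary noise whose energy grows with band index.
rng = np.random.default_rng(0)
E = rng.exponential(scale=np.linspace(1.0, 100.0, 40)[:, None], size=(40, 500))
s = smoothing_coefficient(time_constant=0.4, sr=22050, hop_length=512)
P = pcen(E, s=s)
print(P.shape)  # (40, 500)
```

The loop over frames is written out for readability. A vectorized alternative is scipy.signal.lfilter([s], [1, s - 1], E, axis=-1), which implements the same recursion but starts from a zero state unless initial conditions are supplied. For production use, librosa provides a PCEN implementation as librosa.pcen.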

Original language: English (US)
Article number: 8514023
Pages (from-to): 39-43
Number of pages: 5
Journal: IEEE Signal Processing Letters
Volume: 26
Issue number: 1
State: Published - Jan 2019

Keywords

  • Acoustic noise
  • acoustic sensors
  • acoustic signal detection
  • signal classification
  • spectrogram

ASJC Scopus subject areas

  • Signal Processing
  • Electrical and Electronic Engineering
  • Applied Mathematics
