Crepe

A Convolutional Representation for Pitch Estimation

Jong Wook Kim, Justin Salamon, Peter Li, Juan Bello

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

The task of estimating the fundamental frequency of a monophonic sound recording, also known as pitch tracking, is fundamental to audio processing with multiple applications in speech processing and music information retrieval. To date, the best performing techniques, such as the pYIN algorithm, are based on a combination of DSP pipelines and heuristics. While such techniques perform very well on average, there remain many cases in which they fail to correctly estimate the pitch. In this paper, we propose a data-driven pitch tracking algorithm, CREPE, which is based on a deep convolutional neural network that operates directly on the time-domain waveform. We show that the proposed model produces state-of-the-art results, performing equally or better than pYIN. Furthermore, we evaluate the model's generalizability in terms of noise robustness. A pre-trained version of CREPE is made freely available as an open-source Python module for easy application.
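The abstract contrasts CREPE with earlier techniques built from DSP pipelines and heuristics. As a rough illustration of that older family, here is a minimal autocorrelation-based pitch estimate for a single frame. It is neither pYIN nor CREPE, and the function names, the 16 kHz sample rate, the frame length, and the 50 to 1000 Hz search range are all assumptions made for this sketch.

```python
import math


def synth_tone(freq_hz, sample_rate=16000, num_samples=1024):
    """Generate a pure sine tone as a list of floats (a stand-in for real audio)."""
    return [math.sin(2.0 * math.pi * freq_hz * n / sample_rate)
            for n in range(num_samples)]


def estimate_f0_autocorr(frame, sample_rate=16000, fmin=50.0, fmax=1000.0):
    """Estimate f0 of one frame as the lag that maximizes the autocorrelation."""
    # Hypothetical search range: lags corresponding to fmax .. fmin.
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Biased autocorrelation: the sum shrinks at longer lags, which
        # favors the shortest period over its integer multiples.
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


# Usage: estimate the pitch of a synthetic 200 Hz tone.
tone = synth_tone(200.0)
print(f"estimated f0: {estimate_f0_autocorr(tone):.1f} Hz")
```

A peak-picking heuristic like this works on a clean tone but can lock onto the wrong octave on noisy or harmonically rich signals, which is the kind of failure case the abstract alludes to.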

Original language: English (US)
Title of host publication: 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 161-165
Number of pages: 5
Volume: 2018-April
ISBN (Print): 9781538646588
DOI: https://doi.org/10.1109/ICASSP.2018.8461329
State: Published - Sep 10 2018
Event: 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Calgary, Canada
Duration: Apr 15 2018 to Apr 20 2018

Other

Other: 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018
Country: Canada
City: Calgary
Period: 4/15/18 to 4/20/18

Fingerprint

Sound recording
Speech processing
Information retrieval
Pipelines
Neural networks
Processing

Keywords

  • Convolutional neural network
  • Pitch estimation

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering

Cite this

Kim, J. W., Salamon, J., Li, P., & Bello, J. (2018). Crepe: A Convolutional Representation for Pitch Estimation. In 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings (Vol. 2018-April, pp. 161-165). [8461329] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICASSP.2018.8461329

@inproceedings{8770e6c4413b43cabd3c4c720e31c741,
title = "Crepe: A Convolutional Representation for Pitch Estimation",
keywords = "Convolutional neural network, Pitch estimation",
author = "Kim, {Jong Wook} and Justin Salamon and Peter Li and Juan Bello",
year = "2018",
month = "9",
day = "10",
doi = "10.1109/ICASSP.2018.8461329",
language = "English (US)",
isbn = "9781538646588",
volume = "2018-April",
pages = "161--165",
booktitle = "2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - GEN

T1 - Crepe

T2 - A Convolutional Representation for Pitch Estimation

AU - Kim, Jong Wook

AU - Salamon, Justin

AU - Li, Peter

AU - Bello, Juan

PY - 2018/9/10

Y1 - 2018/9/10

KW - Convolutional neural network

KW - Pitch estimation

UR - http://www.scopus.com/inward/record.url?scp=85054253049&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85054253049&partnerID=8YFLogxK

U2 - 10.1109/ICASSP.2018.8461329

DO - 10.1109/ICASSP.2018.8461329

M3 - Conference contribution

SN - 9781538646588

VL - 2018-April

SP - 161

EP - 165

BT - 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings

PB - Institute of Electrical and Electronics Engineers Inc.

ER -