Real Time Voice Processing with Audiovisual Feedback: Toward Autonomous Agents with Perfect Pitch

Lawrence K. Saul, Daniel D. Lee, Charles L. Isbell, Yann LeCun

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution

Abstract

We have implemented a real-time front end for detecting voiced speech and estimating its fundamental frequency. The front end performs the signal processing for voice-driven agents that attend to the pitch contours of human speech and provide continuous audiovisual feedback. The algorithm we use for pitch tracking has several distinguishing features: it makes no use of FFTs or autocorrelation at the pitch period; it updates the pitch incrementally on a sample-by-sample basis; it avoids peak picking and does not require interpolation in time or frequency to obtain high-resolution estimates; and it works reliably over a four-octave range, in real time, without the need for postprocessing to produce smooth contours. The algorithm is based on two simple ideas in neural computation: the introduction of a purposeful nonlinearity, and the error signal of a least squares fit. The pitch tracker is used in two real-time multimedia applications: a voice-to-MIDI player that synthesizes electronic music from vocalized melodies, and an audiovisual karaoke machine with multimodal feedback. Both applications run on a laptop and display the user's pitch scrolling across the screen as he or she sings into the computer.
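The abstract names two ingredients without spelling them out: a sample-by-sample update and the error signal of a least squares fit. The Python sketch below illustrates one way these can combine; it is not the paper's algorithm. It assumes the identity x[n-1] + x[n+1] = 2 cos(ω) x[n], which holds exactly for a sampled sinusoid, fits cos(ω) by least squares over exponentially weighted running sums (the `decay` factor is an assumed parameter), and reports the normalized residual, which is near zero only when a single sinusoid dominates the input. The class `IncrementalSinusoidFit` and the helper `freq_to_midi` are hypothetical names; the latter uses the standard MIDI mapping (A4 = 440 Hz is note 69) that a voice-to-MIDI player would need. The purposeful nonlinearity and any band-splitting front end are not modeled here.

```python
import math
from typing import Tuple


class IncrementalSinusoidFit:
    """Hypothetical helper: sample-by-sample least-squares fit of one sinusoid.

    Uses the identity x[n-1] + x[n+1] = 2*cos(w)*x[n], exact for
    x[n] = A*cos(w*n + phi). Running sums are exponentially weighted, so each
    new sample costs O(1) work and no FFT or autocorrelation is computed.
    """

    def __init__(self, sample_rate: float, decay: float = 0.99) -> None:
        self.sample_rate = sample_rate
        self.decay = decay  # assumed forgetting factor for the running sums
        self.x_prev2 = 0.0  # becomes x[n-1] when the next sample x[n+1] arrives
        self.x_prev1 = 0.0  # becomes the center sample x[n]
        self.seen = 0  # number of samples consumed so far
        self.s_xx = 0.0  # weighted sum of x[n]^2
        self.s_sx = 0.0  # weighted sum of (x[n-1] + x[n+1]) * x[n]
        self.s_ss = 0.0  # weighted sum of (x[n-1] + x[n+1])^2

    def update(self, x_new: float) -> Tuple[float, float]:
        """Consume one sample x[n+1]; return (frequency_hz, normalized_error)."""
        if self.seen >= 2:  # a full triple x[n-1], x[n], x[n+1] is available
            s = self.x_prev2 + x_new
            x = self.x_prev1
            d = self.decay
            self.s_xx = d * self.s_xx + x * x
            self.s_sx = d * self.s_sx + s * x
            self.s_ss = d * self.s_ss + s * s
        self.x_prev2, self.x_prev1 = self.x_prev1, x_new
        self.seen += 1
        return self.estimate()

    def estimate(self) -> Tuple[float, float]:
        """Current least-squares frequency and normalized residual."""
        if self.s_xx <= 0.0:
            return 0.0, 1.0  # nothing to fit yet
        c = self.s_sx / (2.0 * self.s_xx)  # least-squares estimate of cos(w)
        w = math.acos(max(-1.0, min(1.0, c)))  # clamp into acos's domain
        freq_hz = w * self.sample_rate / (2.0 * math.pi)
        # Residual of the unclamped fit: sum (s - 2*c*x)^2 = S_ss - S_sx^2 / S_xx.
        err = max(0.0, self.s_ss - self.s_sx ** 2 / self.s_xx)
        # Fraction of the energy in s left unexplained: near 0 for a pure sinusoid.
        return freq_hz, err / (self.s_ss + 1e-12)


def freq_to_midi(freq_hz: float) -> float:
    """Standard MIDI mapping: A4 = 440 Hz is note 69, 12 notes per octave."""
    return 69.0 + 12.0 * math.log2(freq_hz / 440.0)


if __name__ == "__main__":
    fs = 8000.0
    fit = IncrementalSinusoidFit(fs)
    freq, err = 0.0, 1.0
    for n in range(800):
        freq, err = fit.update(math.sin(2.0 * math.pi * 220.0 * n / fs))
    print(f"{freq:.2f} Hz, normalized error {err:.1e}, MIDI note {round(freq_to_midi(freq))}")
```

Fed a 220 Hz sine sampled at 8 kHz, the estimate settles at 220 Hz (MIDI note 57) with a residual near zero; inputs that are not a single sinusoid leave a larger residual, so a caller could compare the normalized error against a threshold of its own choosing as a rough voicing cue.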

Original language: English (US)
Title of host publication: NIPS 2002
Subtitle of host publication: Proceedings of the 15th International Conference on Neural Information Processing Systems
Editors: Suzanna Becker, Sebastian Thrun, Klaus Obermayer
Publisher: MIT Press Journals
Pages: 1181-1188
Number of pages: 8
ISBN (Electronic): 0262025507, 9780262025508
State: Published, 2002
Event: 15th International Conference on Neural Information Processing Systems, NIPS 2002, Vancouver, Canada
Duration: Dec 9 2002 to Dec 14 2002

Publication series

Name: NIPS 2002: Proceedings of the 15th International Conference on Neural Information Processing Systems

Conference

Conference: 15th International Conference on Neural Information Processing Systems, NIPS 2002
Country/Territory: Canada
City: Vancouver
Period: 12/9/02 to 12/14/02

ASJC Scopus subject areas

  • Signal Processing
  • Computer Networks and Communications
  • Information Systems
