Real time voice processing with audiovisual feedback: Toward autonomous agents with perfect pitch

Lawrence K. Saul, Daniel D. Lee, Charles L. Isbell, Yann LeCun

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We have implemented a real time front end for detecting voiced speech and estimating its fundamental frequency. The front end performs the signal processing for voice-driven agents that attend to the pitch contours of human speech and provide continuous audiovisual feedback. The algorithm we use for pitch tracking has several distinguishing features: it makes no use of FFTs or autocorrelation at the pitch period; it updates the pitch incrementally on a sample-by-sample basis; it avoids peak picking and does not require interpolation in time or frequency to obtain high resolution estimates; and it works reliably over a four octave range, in real time, without the need for postprocessing to produce smooth contours. The algorithm is based on two simple ideas in neural computation: the introduction of a purposeful nonlinearity, and the error signal of a least squares fit. The pitch tracker is used in two real time multimedia applications: a voice-to-MIDI player that synthesizes electronic music from vocalized melodies, and an audiovisual Karaoke machine with multimodal feedback. Both applications run on a laptop and display the user's pitch scrolling across the screen as he or she sings into the computer.
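The abstract names a least squares fit and its error signal, updated sample by sample without FFTs or autocorrelation. The sketch below illustrates only that least squares idea, under our own assumptions. It is not the authors' implementation. The function name `track_sinusoid`, the exponential forgetting factor `decay`, and the normalized error are hypothetical choices. It relies on the identity x[t-1] + x[t+1] = 2 cos(ω) x[t], which holds for any pure sinusoid x[t] = A sin(ωt + φ). Because of this identity, cos(ω) can be fit by least squares from running sums that are updated at every sample. The fit residual is near zero when the input is a single sinusoid and grows otherwise. The nonlinearity and any band-splitting front end are omitted, so the input is assumed to be dominated by one sinusoid already.

```python
import math


def track_sinusoid(samples, sample_rate, decay=0.99, eps=1e-12):
    """Hypothetical sketch: per-sample frequency estimate and fit error.

    For a pure sinusoid, y[t] = (x[t-1] + x[t+1]) / 2 equals cos(w) * x[t].
    cos(w) is fit by exponentially weighted least squares (forgetting
    factor `decay`), so each new sample updates three running sums in O(1).
    Returns a list of (frequency_hz, normalized_error) tuples, one per
    interior sample; frequency is None while there is no signal energy.
    """
    s_xx = s_xy = s_yy = 0.0
    estimates = []
    for t in range(1, len(samples) - 1):
        x = samples[t]
        y = 0.5 * (samples[t - 1] + samples[t + 1])  # one-sample lookahead
        # Incremental update of the weighted sums: no buffering, no FFT.
        s_xx = decay * s_xx + x * x
        s_xy = decay * s_xy + x * y
        s_yy = decay * s_yy + y * y
        if s_xx < eps:
            estimates.append((None, 1.0))
            continue
        c = s_xy / s_xx  # least squares estimate of cos(w)
        # Minimum of sum_w (y - c*x)^2, clamped against rounding below zero.
        resid = max(0.0, s_yy - s_xy * s_xy / s_xx)
        err = resid / max(s_yy, eps)  # error signal, relative to signal energy
        w = math.acos(max(-1.0, min(1.0, c)))  # radians per sample
        estimates.append((w * sample_rate / (2 * math.pi), err))
    return estimates
```

For a pure 220 Hz tone sampled at 8 kHz, the last estimate is 220 Hz with an error near zero. Adding an equal-amplitude 1500 Hz component leaves a clearly nonzero error. A voicing detector could threshold this error, but the threshold value would be a further assumption.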

Original language: English (US)
Title of host publication: Advances in Neural Information Processing Systems 15 - Proceedings of the 2002 Conference, NIPS 2002
Publisher: Neural Information Processing Systems Foundation
ISBN (Print): 0262025507, 9780262025508
State: Published - 2003
Event: 16th Annual Neural Information Processing Systems Conference, NIPS 2002, Vancouver, BC, Canada
Duration: Dec 9, 2002 to Dec 14, 2002

Publication series

Name: Advances in Neural Information Processing Systems
ISSN (Print): 1049-5258

Other

Conference: 16th Annual Neural Information Processing Systems Conference, NIPS 2002
Country: Canada
City: Vancouver, BC
Period: 12/9/02 to 12/14/02

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing


Citation

    Saul, L. K., Lee, D. D., Isbell, C. L., & LeCun, Y. (2003). Real time voice processing with audiovisual feedback: Toward autonomous agents with perfect pitch. In Advances in Neural Information Processing Systems 15 - Proceedings of the 2002 Conference, NIPS 2002 (Advances in Neural Information Processing Systems). Neural information processing systems foundation.