Speech Recognition with Weighted Finite-State Transducers

Mehryar Mohri, Fernando Pereira, Michael Riley

Research output: Chapter in Book/Report/Conference proceeding (Chapter)

Abstract

This chapter describes a general representation and algorithmic framework for speech recognition based on weighted finite-state transducers. These transducers provide a common and natural representation for major components of speech recognition systems, including hidden Markov models (HMMs), context-dependency models, pronunciation dictionaries, statistical grammars, and word or phone lattices. General algorithms for building and optimizing transducer models are presented, including composition for combining models, weighted determinization and minimization for optimizing time and space requirements, and a weight pushing algorithm for redistributing transition weights optimally for speech recognition. The application of these methods to large-vocabulary recognition tasks is explained in detail, and experimental results are given, in particular for the North American Business News (NAB) task, in which these methods were used to combine HMMs, full cross-word triphones, a lexicon of 40,000 words, and a large trigram grammar into a single weighted transducer that is only somewhat larger than the trigram word grammar and that runs NAB in real time on a very simple decoder. Another example demonstrates that the same methods can be used to optimize lattices for second-pass recognition.
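
As a rough illustration of the composition operation named in the abstract, the sketch below composes two small weighted transducers over the tropical semiring, where weights add along a path and the best path has the minimum total weight. It is a simplified sketch, not the chapter's algorithm: it assumes epsilon-free transducers with non-negative weights, and the dictionary layout, the function names `compose` and `best_path_weight`, and the toy transducers `t_a` and `t_b` are all hypothetical choices made for this example.

```python
import heapq
import itertools
from collections import defaultdict


def compose(t1, t2):
    """Compose two epsilon-free weighted transducers (tropical semiring).

    Assumed (hypothetical) layout of a transducer:
      'start':  start state
      'finals': {state: final weight}
      'arcs':   {state: [(in_label, out_label, weight, next_state), ...]}
    """
    start = (t1['start'], t2['start'])
    arcs = defaultdict(list)
    finals = {}
    stack = [start]
    seen = {start}
    while stack:
        state = stack.pop()
        q1, q2 = state
        # A pair state is final when both components are final;
        # final weights combine by addition in the tropical semiring.
        if q1 in t1['finals'] and q2 in t2['finals']:
            finals[state] = t1['finals'][q1] + t2['finals'][q2]
        for (i1, o1, w1, n1) in t1['arcs'].get(q1, []):
            for (i2, o2, w2, n2) in t2['arcs'].get(q2, []):
                # Match the output label of t1 with the input label of t2.
                if o1 == i2:
                    nxt = (n1, n2)
                    arcs[state].append((i1, o2, w1 + w2, nxt))
                    if nxt not in seen:
                        seen.add(nxt)
                        stack.append(nxt)
    return {'start': start, 'finals': finals, 'arcs': dict(arcs)}


def best_path_weight(t):
    """Minimum total weight of an accepting path (Dijkstra's algorithm).

    Assumes all arc and final weights are non-negative.
    Returns float('inf') if no final state is reachable.
    """
    counter = itertools.count()  # tie-breaker so states are never compared
    heap = [(0.0, next(counter), t['start'])]
    done = set()
    best = float('inf')
    while heap:
        dist, _, q = heapq.heappop(heap)
        if q in done:
            continue
        done.add(q)
        if q in t['finals']:
            best = min(best, dist + t['finals'][q])
        for (_i, _o, w, nxt) in t['arcs'].get(q, []):
            if nxt not in done:
                heapq.heappush(heap, (dist + w, next(counter), nxt))
    return best


# Toy transducers (hypothetical data): t_a maps "a b" to "x y",
# and t_b maps "x y" to "X Y".
t_a = {
    'start': 0,
    'finals': {2: 0.0},
    'arcs': {0: [('a', 'x', 1.0, 1)], 1: [('b', 'y', 0.5, 2)]},
}
t_b = {
    'start': 0,
    'finals': {2: 1.0},
    'arcs': {0: [('x', 'X', 0.25, 1)], 1: [('y', 'Y', 0.25, 2)]},
}

composed = compose(t_a, t_b)
```

Here `composed` maps "a b" directly to "X Y", with each arc weight being the sum of the matched arc weights. A practical implementation would also need to handle epsilon transitions, which this sketch deliberately omits.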

Original language: English (US)
Title of host publication: Springer Handbooks
Publisher: Springer
Pages: 559-584
Number of pages: 26
State: Published - 2008

Publication series

Name: Springer Handbooks
ISSN (Print): 2522-8692
ISSN (Electronic): 2522-8706

Keywords

  • Automatic Speech Recognition
  • Hidden Markov Model
  • Output Label
  • Speech Recognition
  • Weighted Acceptor

ASJC Scopus subject areas

  • General
