TY - JOUR
T1 - Weighted finite-state transducers in speech recognition
AU - Mohri, Mehryar
AU - Pereira, Fernando
AU - Riley, Michael
PY - 2002/1
AB - We survey the use of weighted finite-state transducers (WFSTs) in speech recognition. We show that WFSTs provide a common and natural representation for hidden Markov models (HMMs), context-dependency, pronunciation dictionaries, grammars, and alternative recognition outputs. Furthermore, general transducer operations combine these representations flexibly and efficiently. Weighted determinization and minimization algorithms optimize their time and space requirements, and a weight pushing algorithm distributes the weights along the paths of a weighted transducer optimally for speech recognition. As an example, we describe a North American Business News (NAB) recognition system built using these techniques that combines the HMMs, full cross-word triphones, a lexicon of 40 000 words, and a large trigram grammar into a single weighted transducer that is only somewhat larger than the trigram word grammar and that runs NAB in real time on a very simple decoder. In another example, we show that the same techniques can be used to optimize lattices for second-pass recognition. In a third example, we show how general automata operations can be used to assemble lattices from different recognizers to improve recognition performance.
UR - http://www.scopus.com/inward/record.url?scp=0036460907&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0036460907&partnerID=8YFLogxK
DO - 10.1006/csla.2001.0184
M3 - Article
AN - SCOPUS:0036460907
VL - 16
SP - 69
EP - 88
JF - Computer Speech and Language
SN - 0885-2308
IS - 1
ER -