TY - GEN
T1 - Full expansion of context-dependent networks in large vocabulary speech recognition
AU - Mohri, Mehryar
AU - Riley, Michael
AU - Hindle, Don
AU - Ljolje, Andrej
AU - Pereira, Fernando
PY - 1998
Y1 - 1998
N2 - We combine our earlier approach to context-dependent network representation with our algorithm for determinizing weighted networks to build optimized networks for large-vocabulary speech recognition combining an n-gram language model, a pronunciation dictionary and context-dependency modeling. While fully-expanded networks have been used before in restrictive settings (medium vocabulary or no cross-word contexts), we demonstrate that our network determinization method makes it practical to use fully-expanded networks also in large-vocabulary recognition with full cross-word context modeling. For the DARPA North American Business News task (NAB), we give network sizes and recognition speeds and accuracies using bigram and trigram grammars with vocabulary sizes ranging from 10000 to 160000 words. With our construction, the fully-expanded NAB context-dependent networks contain only about twice as many arcs as the corresponding language models. Interestingly, we also find that, with these networks, real-time word accuracy is improved by increasing the vocabulary size and n-gram order.
AB - We combine our earlier approach to context-dependent network representation with our algorithm for determinizing weighted networks to build optimized networks for large-vocabulary speech recognition combining an n-gram language model, a pronunciation dictionary and context-dependency modeling. While fully-expanded networks have been used before in restrictive settings (medium vocabulary or no cross-word contexts), we demonstrate that our network determinization method makes it practical to use fully-expanded networks also in large-vocabulary recognition with full cross-word context modeling. For the DARPA North American Business News task (NAB), we give network sizes and recognition speeds and accuracies using bigram and trigram grammars with vocabulary sizes ranging from 10000 to 160000 words. With our construction, the fully-expanded NAB context-dependent networks contain only about twice as many arcs as the corresponding language models. Interestingly, we also find that, with these networks, real-time word accuracy is improved by increasing the vocabulary size and n-gram order.
UR - http://www.scopus.com/inward/record.url?scp=84892168937&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84892168937&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.1998.675352
DO - 10.1109/ICASSP.1998.675352
M3 - Conference contribution
AN - SCOPUS:84892168937
SN - 0780344286
SN - 9780780344280
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 665
EP - 668
BT - Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 1998
T2 - 1998 23rd IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 1998
Y2 - 12 May 1998 through 15 May 1998
ER -