TY - GEN
T1 - Dual Learning for Large Vocabulary On-Device ASR
AU - Peyser, Cal
AU - Huang, Ronny
AU - Sainath, Tara
AU - Prabhavalkar, Rohit
AU - Picheny, Michael
AU - Cho, Kyunghyun
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Dual learning is a paradigm for semi-supervised machine learning that seeks to leverage unsupervised data by solving two opposite tasks at once. In this scheme, each model is used to generate pseudo-labels for unlabeled examples that are used to train the other model. Dual learning has seen some use in speech processing by pairing ASR and TTS as dual tasks. However, these results mostly address only the case of using unpaired examples to compensate for very small supervised datasets, and mostly on large, non-streaming models. Dual learning has not yet been proven effective for using unsupervised data to improve realistic on-device streaming models that are already trained on large supervised corpora. We provide this missing piece through an analysis of an on-device-sized streaming conformer trained on the entirety of Librispeech, showing relative WER improvements of 10.7%/5.2% without an LM and 11.7%/16.4% with an LM.
AB - Dual learning is a paradigm for semi-supervised machine learning that seeks to leverage unsupervised data by solving two opposite tasks at once. In this scheme, each model is used to generate pseudo-labels for unlabeled examples that are used to train the other model. Dual learning has seen some use in speech processing by pairing ASR and TTS as dual tasks. However, these results mostly address only the case of using unpaired examples to compensate for very small supervised datasets, and mostly on large, non-streaming models. Dual learning has not yet been proven effective for using unsupervised data to improve realistic on-device streaming models that are already trained on large supervised corpora. We provide this missing piece through an analysis of an on-device-sized streaming conformer trained on the entirety of Librispeech, showing relative WER improvements of 10.7%/5.2% without an LM and 11.7%/16.4% with an LM.
UR - http://www.scopus.com/inward/record.url?scp=85147796705&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85147796705&partnerID=8YFLogxK
U2 - 10.1109/SLT54892.2023.10023407
DO - 10.1109/SLT54892.2023.10023407
M3 - Conference contribution
AN - SCOPUS:85147796705
T3 - 2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings
SP - 245
EP - 251
BT - 2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2022 IEEE Spoken Language Technology Workshop, SLT 2022
Y2 - 9 January 2023 through 12 January 2023
ER -