Dual Learning for Large Vocabulary On-Device ASR

Cal Peyser, Ronny Huang, Tara Sainath, Rohit Prabhavalkar, Michael Picheny, Kyunghyun Cho

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Dual learning is a paradigm for semi-supervised machine learning that seeks to leverage unsupervised data by solving two opposite tasks at once. In this scheme, each model is used to generate pseudo-labels for unlabeled examples, which are then used to train the other model. Dual learning has seen some use in speech processing by pairing ASR and TTS as dual tasks. However, these results mostly address only the case of using unpaired examples to compensate for very small supervised datasets, and mostly on large, non-streaming models. Dual learning has not yet been proven effective for using unsupervised data to improve realistic on-device streaming models that are already trained on large supervised corpora. We provide this missing piece through an analysis of an on-device-sized streaming conformer trained on the entirety of LibriSpeech, showing relative WER improvements of 10.7%/5.2% without an LM and 11.7%/16.4% with an LM.
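The pseudo-labeling loop described in the abstract can be sketched in miniature. This is an illustrative toy, not the authors' implementation: the real dual tasks are a streaming conformer ASR model and a TTS model, whereas here `ToyASR`, `ToyTTS`, and `dual_learning_step` are hypothetical stand-ins where "training" is a lookup-table update and "audio" is a tuple of ints.

```python
# Toy sketch of the dual-learning scheme: each model pseudo-labels
# unlabeled data for the other model to train on. All names here are
# illustrative placeholders, not the paper's actual models.

class ToyASR:
    """Maps 'audio' (tuples of ints) to 'text' via a learned lookup table."""
    def __init__(self):
        self.table = {}

    def transcribe(self, audio):
        return self.table.get(audio, "")

    def train(self, audio, text):
        # Stand-in for a gradient step on an (audio, pseudo-text) pair.
        self.table[audio] = text


class ToyTTS:
    """Maps 'text' to 'audio' via a learned lookup table."""
    def __init__(self):
        self.table = {}

    def synthesize(self, text):
        return self.table.get(text, ())

    def train(self, text, audio):
        # Stand-in for a gradient step on a (text, pseudo-audio) pair.
        self.table[text] = audio


def dual_learning_step(asr, tts, unpaired_audio, unpaired_text):
    """One round of the dual scheme: each model pseudo-labels the other's data."""
    # ASR pseudo-labels the unpaired audio; the resulting pairs train TTS.
    for audio in unpaired_audio:
        hyp = asr.transcribe(audio)
        if hyp:
            tts.train(hyp, audio)
    # TTS synthesizes the unpaired text; the resulting pairs train ASR.
    for text in unpaired_text:
        synth = tts.synthesize(text)
        if synth:
            asr.train(synth, text)


# Seed ASR with one supervised pair, then let the dual loop propagate
# that knowledge to TTS through pseudo-labels.
asr, tts = ToyASR(), ToyTTS()
asr.train((1, 2, 3), "hello")
dual_learning_step(asr, tts, unpaired_audio=[(1, 2, 3)], unpaired_text=["hello"])
print(tts.synthesize("hello"))  # TTS learned this pair without supervision
```

In the paper's setting, both models start from a large supervised corpus (all of LibriSpeech) rather than a single seed pair, and the pseudo-label exchange supplies the additional unsupervised signal.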

Original language: English (US)
Title of host publication: 2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 245-251
Number of pages: 7
ISBN (Electronic): 9798350396904
State: Published - 2023
Event: 2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Doha, Qatar
Duration: Jan 9, 2023 - Jan 12, 2023

Publication series

Name: 2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings

Conference

Conference: 2022 IEEE Spoken Language Technology Workshop, SLT 2022
Country/Territory: Qatar
City: Doha
Period: 1/9/23 - 1/12/23

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
  • Hardware and Architecture
  • Media Technology
  • Instrumentation
  • Linguistics and Language
