A Comparison of Semi-Supervised Learning Techniques for Streaming ASR at Scale

Cal Peyser, Michael Picheny, Kyunghyun Cho, Rohit Prabhavalkar, W. Ronny Huang, Tara N. Sainath

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Unpaired text and audio injection have emerged as dominant methods for improving ASR performance in the absence of a large labeled corpus. However, little guidance exists on deploying these methods to improve production ASR systems that are trained on very large supervised corpora and with realistic requirements like a constrained model size and CPU budget, streaming capability, and a rich lattice for rescoring and for downstream NLU tasks. In this work, we compare three state-of-the-art semi-supervised methods encompassing both unpaired text and audio as well as several of their combinations in a controlled setting using joint training. We find that in our setting these methods offer many improvements beyond raw WER, including substantial gains in tail-word WER, decoder computation during inference, and lattice density.

Original languageEnglish (US)
Title of host publicationICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing, Proceedings
PublisherInstitute of Electrical and Electronics Engineers Inc.
ISBN (Electronic)9781728163277
DOIs
StatePublished - 2023
Event48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023 - Rhodes Island, Greece
Duration: Jun 4 2023Jun 10 2023

Publication series

NameICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume2023-June
ISSN (Print)1520-6149

Conference

Conference48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023
Country/TerritoryGreece
CityRhodes Island
Period6/4/236/10/23

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering

Fingerprint

Dive into the research topics of 'A Comparison of Semi-Supervised Learning Techniques for Streaming ASR at Scale'. Together they form a unique fingerprint.

Cite this