A study of the recurrent neural network encoder-decoder for large vocabulary speech recognition

Liang Lu, Xingxing Zhang, Kyunghyun Cho, Steve Renals

Research output: Contribution to journal › Conference article › peer-review

Abstract

Deep neural networks have advanced the state of the art in automatic speech recognition when combined with hidden Markov models (HMMs). Recently there has been interest in using systems based on recurrent neural networks (RNNs) to perform sequence modelling directly, without the requirement of an HMM superstructure. In this paper, we study the RNN encoder-decoder approach for large vocabulary end-to-end speech recognition, whereby an encoder transforms a sequence of acoustic vectors into a sequence of feature representations, from which a decoder recovers a sequence of words. We investigated this approach on the Switchboard corpus using a training set of around 300 hours of transcribed audio data. Without the use of an explicit language model or pronunciation lexicon, we achieved promising recognition accuracy, demonstrating that this approach warrants further investigation.

Original language: English (US)
Pages (from-to): 3249-3253
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2015-January
State: Published - 2015
Event: 16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015 - Dresden, Germany
Duration: Sep 6 2015 to Sep 10 2015

Keywords

  • Deep neural networks
  • Encoder-decoder
  • End-to-end speech recognition
  • Recurrent neural networks

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modeling and Simulation
