Development of the MIT ASR system for the 2016 Arabic Multi-genre Broadcast Challenge

Tuka Alhanai, Wei Ning Hsu, James Glass

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

Abstract

The Arabic language, with over 300 million speakers, has significant diversity and breadth, which makes it challenging to build an automated system that understands what is said. This paper describes an Arabic Automatic Speech Recognition (ASR) system developed on a 1,200-hour speech corpus that was made available for the 2016 Arabic Multi-genre Broadcast (MGB) Challenge. A range of Deep Neural Network (DNN) topologies were modeled, including Feed-forward, Convolutional, Time-Delay, Recurrent Long Short-Term Memory (LSTM), Highway LSTM (H-LSTM), and Grid LSTM (G-LSTM) networks. The best performance came from a sequence discriminatively trained G-LSTM network. The best overall Word Error Rate (WER) on the development set was 18.3% (p < 0.001), obtained by combining the hypotheses of 3-layer and 5-layer sequence discriminatively trained G-LSTM models that had been rescored with a 4-gram language model.
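The abstract reports results as Word Error Rate. For readers unfamiliar with the metric, the sketch below computes the standard definition, WER = (S + D + I) / N, where S, D, and I are the substitutions, deletions, and insertions in a minimum-edit alignment of the hypothesis against a reference of N words. This is a minimal illustration only, not the scoring tool used for the MGB Challenge; the function name `word_error_rate` and the example sentences are hypothetical, and real scoring pipelines typically apply text normalization first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (S + D + I) / N using word-level Levenshtein distance.

    Hypothetical helper for illustration; assumes whitespace tokenization
    and no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # One row of the dynamic-programming table: distances from the
    # current reference prefix to every prefix of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or exact match)
            )
        prev = curr

    # Total edits divided by the number of reference words.
    return prev[len(hyp)] / len(ref)


# One substitution (sat -> sit) and one deletion (the) over 6 reference words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions are counted, WER can exceed 100% when a hypothesis contains many spurious words.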

Original language: English (US)
Title of host publication: 2016 IEEE Workshop on Spoken Language Technology, SLT 2016 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 299-304
Number of pages: 6
ISBN (Electronic): 9781509049035
DOIs: https://doi.org/10.1109/SLT.2016.7846280
State: Published - Feb 7 2017
Event: 2016 IEEE Workshop on Spoken Language Technology, SLT 2016 - San Diego, United States
Duration: Dec 13 2016 to Dec 16 2016

Publication series

Name: 2016 IEEE Workshop on Spoken Language Technology, SLT 2016 - Proceedings

Conference

Conference: 2016 IEEE Workshop on Spoken Language Technology, SLT 2016
Country: United States
City: San Diego
Period: 12/13/16 to 12/16/16

Keywords

  • Arabic
  • Automatic Speech Recognition
  • Deep Neural Networks
  • MGB Challenge

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Artificial Intelligence
  • Language and Linguistics
  • Computer Vision and Pattern Recognition
  • Computer Science Applications


Cite this

Alhanai, T., Hsu, W. N., & Glass, J. (2017). Development of the MIT ASR system for the 2016 Arabic Multi-genre Broadcast Challenge. In 2016 IEEE Workshop on Spoken Language Technology, SLT 2016 - Proceedings (pp. 299-304). [7846280] (2016 IEEE Workshop on Spoken Language Technology, SLT 2016 - Proceedings). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/SLT.2016.7846280