On using very large target vocabulary for neural machine translation

Sebastien Jean, Kyunghyun Cho, Roland Memisevic, Yoshua Bengio

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Neural machine translation, a recently proposed approach to machine translation based purely on neural networks, has shown promising results compared to existing approaches such as phrase-based statistical machine translation. Despite its recent success, neural machine translation is limited in handling a large vocabulary, as both training complexity and decoding complexity increase in proportion to the number of target words. In this paper, we propose a method based on importance sampling that allows us to use a very large target vocabulary without increasing training complexity. We show that decoding can be done efficiently, even with a model having a very large target vocabulary, by selecting only a small subset of the whole target vocabulary. The models trained by the proposed approach are empirically found to match, and in some cases outperform, baseline models with a small vocabulary as well as LSTM-based neural machine translation models. Furthermore, when we use an ensemble of a few models with very large target vocabularies, we achieve performance comparable to the state of the art (measured by BLEU) on both the English→German and English→French translation tasks of WMT'14.
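The core idea described in the abstract — approximating the softmax over a huge target vocabulary by scoring only a small sampled subset of words — can be sketched as follows. This is an illustrative simplification, not the paper's exact method: the function name, uniform negative sampling, and all dimensions are assumptions for the example (the paper uses importance sampling with a proposal distribution over corpus partitions).

```python
import numpy as np

rng = np.random.default_rng(0)

def sampled_softmax_loss(hidden, W, target_id, sample_size=50):
    """Approximate softmax cross-entropy over a large target vocabulary
    by scoring only the true target plus a small set of sampled negative
    words. Illustrative sketch: uniform sampling stands in for the
    paper's importance-sampling scheme."""
    vocab_size = W.shape[0]
    # Sample negative word ids uniformly, then drop the target if drawn.
    negatives = rng.choice(vocab_size, size=sample_size, replace=False)
    negatives = negatives[negatives != target_id]
    subset = np.concatenate(([target_id], negatives))
    logits = W[subset] @ hidden          # score only the small subset
    logits -= logits.max()               # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[0]                 # NLL of the true target word

# Tiny usage example: 100k-word vocabulary, 64-dim decoder state.
V, d = 100_000, 64
W = rng.standard_normal((V, d)) * 0.01   # output word embeddings
h = rng.standard_normal(d)               # decoder hidden state
loss = sampled_softmax_loss(h, W, target_id=42)
```

The cost per training step is O(sample_size) rather than O(vocab_size), which is why the target vocabulary can grow without increasing training complexity; at decoding time, a similar small candidate subset replaces the full vocabulary.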

Original language: English (US)
Title of host publication: ACL-IJCNLP 2015 - 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, Proceedings of the Conference
Publisher: Association for Computational Linguistics (ACL)
Pages: 1-10
Number of pages: 10
ISBN (Electronic): 9781941643723
DOIs: https://doi.org/10.3115/v1/p15-1001
State: Published - 2015
Event: 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, ACL-IJCNLP 2015 - Beijing, China
Duration: Jul 26, 2015 - Jul 31, 2015

Publication series

Name: ACL-IJCNLP 2015 - 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, Proceedings of the Conference
Volume: 1

Other

Other: 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, ACL-IJCNLP 2015
Country: China
City: Beijing
Period: 7/26/15 - 7/31/15

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software


Cite this

    Jean, S., Cho, K., Memisevic, R., & Bengio, Y. (2015). On using very large target vocabulary for neural machine translation. In ACL-IJCNLP 2015 - 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, Proceedings of the Conference (pp. 1-10). (ACL-IJCNLP 2015 - 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, Proceedings of the Conference; Vol. 1). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/p15-1001