Improved zero-shot neural machine translation via ignoring spurious correlations

Jiatao Gu, Yong Wang, Kyunghyun Cho, Victor O.K. Li

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Zero-shot translation, translating between language pairs on which a Neural Machine Translation (NMT) system has never been trained, is an emergent property when training the system in multilingual settings. However, naïve training for zero-shot NMT easily fails and is sensitive to hyper-parameter settings. Its performance typically lags far behind the more conventional pivot-based approach, which translates twice using a third language as a pivot. In this work, we address the degeneracy problem caused by capturing spurious correlations, which we analyze quantitatively through the mutual information between the language IDs of the source and decoded sentences. Inspired by this analysis, we propose two simple but effective approaches: (1) decoder pre-training; (2) back-translation. These methods show significant improvements (4 to 22 BLEU points) over vanilla zero-shot translation on three challenging multilingual datasets, and achieve similar or better results than the pivot-based approach.
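The abstract refers to the mutual information between the language ID of a source sentence and the language ID of the decoded output. The sketch below is one way to estimate that quantity empirically from counts. It is a minimal illustration, not the authors' implementation. It assumes you already have a list of `(source_language, decoded_language)` pairs, for example from running a language identifier over model outputs that all requested the same target language. The function name `mutual_information` and the toy data are hypothetical. Under this reading, a well-behaved model's output language should not depend on the source language, so the estimate stays near zero. A value well above zero suggests the output language is tracking the source, which is the kind of spurious correlation the paper describes.

```python
import math
from collections import Counter

def mutual_information(pairs):
    """Empirical mutual information (in nats) between source-language IDs and
    decoded-output language IDs, estimated from (src_lang, out_lang) pairs.
    Hypothetical helper for illustration only."""
    n = len(pairs)
    if n == 0:
        raise ValueError("need at least one (src_lang, out_lang) pair")
    joint = Counter(pairs)                    # counts of (src, out) pairs
    src_marg = Counter(s for s, _ in pairs)   # marginal counts of source IDs
    out_marg = Counter(o for _, o in pairs)   # marginal counts of output IDs
    mi = 0.0
    for (s, o), c in joint.items():
        p_joint = c / n
        p_indep = (src_marg[s] / n) * (out_marg[o] / n)
        # I(X;Y) = sum_{x,y} p(x,y) * log( p(x,y) / (p(x) p(y)) )
        mi += p_joint * math.log(p_joint / p_indep)
    return mi

# Toy data (fabricated for illustration, not results from the paper):
# the output language copies the source language, i.e. it ignores the target tag.
copying = [("de", "de"), ("fr", "fr")] * 50
# The output language is always the requested target, independent of the source.
on_target = [("de", "en"), ("fr", "en")] * 50

print(round(mutual_information(copying), 4))    # 0.6931 (= log 2)
print(round(mutual_information(on_target), 4))  # 0.0
```

Natural logarithms are used here, so the result is in nats. Replacing `math.log` with `math.log2` would report bits instead.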

Original language: English (US)
Title of host publication: ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference
Publisher: Association for Computational Linguistics (ACL)
Pages: 1258-1268
Number of pages: 11
ISBN (Electronic): 9781950737482
State: Published - 2020
Event: 57th Annual Meeting of the Association for Computational Linguistics, ACL 2019 - Florence, Italy
Duration: Jul 28 2019 to Aug 2 2019

Publication series

Name: ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference

Conference

Conference: 57th Annual Meeting of the Association for Computational Linguistics, ACL 2019
Country: Italy
City: Florence
Period: 7/28/19 to 8/2/19

ASJC Scopus subject areas

  • Language and Linguistics
  • Computer Science(all)
  • Linguistics and Language
