Neural machine translation by jointly learning to align and translate

Dzmitry Bahdanau, Kyung Hyun Cho, Yoshua Bengio

Research output: Contribution to conferencePaperpeer-review

Abstract

Neural machine translation is a recently proposed approach to machine translation. Unlike the traditional statistical machine translation, the neural machine translation aims at building a single neural network that can be jointly tuned to maximize the translation performance. The models proposed recently for neural machine translation often belong to a family of encoder–decoders and encode a source sentence into a fixed-length vector from which a decoder generates a translation. In this paper, we conjecture that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder–decoder architecture, and propose to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly. With this new approach, we achieve a translation performance comparable to the existing state-of-the-art phrase-based system on the task of English-to-French translation. Furthermore, qualitative analysis reveals that the (soft-)alignments found by the model agree well with our intuition.

Original languageEnglish (US)
StatePublished - Jan 1 2015
Event3rd International Conference on Learning Representations, ICLR 2015 - San Diego, United States
Duration: May 7 2015May 9 2015

Conference

Conference3rd International Conference on Learning Representations, ICLR 2015
Country/TerritoryUnited States
CitySan Diego
Period5/7/155/9/15

ASJC Scopus subject areas

  • Education
  • Computer Science Applications
  • Linguistics and Language
  • Language and Linguistics

Fingerprint

Dive into the research topics of 'Neural machine translation by jointly learning to align and translate'. Together they form a unique fingerprint.

Cite this