Abstract
This work is the first comprehensive analysis of the properties of word embeddings learned by neural machine translation (NMT) models trained on bilingual texts. We show that the word representations of NMT models outperform those learned from monolingual text by established algorithms such as Skip-gram and CBOW on tasks that require knowledge of semantic similarity and/or lexical–syntactic role. These effects hold when translating from English to French and from English to German, and we argue that the desirable properties of NMT word embeddings should emerge largely independently of the source and target languages. Further, we apply a recently proposed heuristic method for training NMT models with very large vocabularies, and show that this vocabulary expansion method results in minimal degradation of embedding quality. This allows us to make a large vocabulary of NMT embeddings available for future research and applications. Overall, our analyses indicate that NMT embeddings should be used in applications that require word concepts to be organised according to similarity and/or lexical function, while monolingual embeddings are better suited to modelling (nonspecific) inter-word relatedness.
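The comparisons described above rest on intrinsic evaluations of the kind commonly used for word embeddings: cosine similarities between embedding pairs are correlated with human similarity ratings. The sketch below is a generic, hypothetical illustration of that procedure, not the paper's actual evaluation code; names such as `embeddings` and `benchmark_pairs`, and the toy vectors and ratings, are invented for demonstration.

```python
# Hypothetical sketch: scoring word embeddings on a word-similarity benchmark
# (e.g. a SimLex-style dataset) with cosine similarity and Spearman correlation.
import numpy as np
from scipy.stats import spearmanr

# embeddings: word -> vector, e.g. taken from an NMT encoder's embedding layer
# or from a Skip-gram / CBOW model (toy values shown here).
embeddings = {
    "teacher": np.array([0.12, 0.80, -0.31]),
    "instructor": np.array([0.10, 0.77, -0.29]),
    "school": np.array([0.55, 0.20, 0.10]),
    "lesson": np.array([0.40, 0.35, 0.05]),
}

# benchmark_pairs: (word1, word2, human similarity rating) -- placeholder data.
benchmark_pairs = [
    ("teacher", "instructor", 9.0),
    ("teacher", "school", 4.5),
    ("school", "lesson", 5.5),
]

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

model_scores, human_scores = [], []
for w1, w2, rating in benchmark_pairs:
    if w1 in embeddings and w2 in embeddings:
        model_scores.append(cosine(embeddings[w1], embeddings[w2]))
        human_scores.append(rating)

# Spearman's rho between model similarities and human ratings is the usual
# figure of merit for this kind of intrinsic similarity evaluation.
rho, _ = spearmanr(model_scores, human_scores)
print(f"Spearman correlation: {rho:.3f}")
```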
| Original language | English (US) |
|---|---|
| Pages (from-to) | 3-18 |
| Number of pages | 16 |
| Journal | Machine Translation |
| Volume | 31 |
| Issue number | 1-2 |
| DOIs | |
| State | Published - Jun 1 2017 |
Keywords
- Machine translation
- Representation
- Word embeddings
ASJC Scopus subject areas
- Software
- Language and Linguistics
- Linguistics and Language
- Artificial Intelligence