Neural machine translation with a polysynthetic low resource language

John E. Ortega, Richard Castro Mamani, Kyunghyun Cho

Research output: Contribution to journal › Article › peer-review

Abstract

Low-resource languages (LRLs) with complex morphology are known to be more difficult to translate automatically. Some LRLs are more difficult to translate than others because of a lack of research interest or collaboration. In this article, we experiment with a specific LRL, Quechua, which is spoken by millions of people in South America yet has not been the subject of a neural translation approach until now. We improve on the latest published results, reporting baseline BLEU scores obtained with state-of-the-art recurrent neural network approaches to translation. Additionally, we experiment with several morphological segmentation techniques and introduce a new one in order to decompose the language’s suffix-based morphemes. We extend our work to high-resource languages (HRLs) such as Finnish and Spanish to show that Quechua, for qualitative purposes, can be considered compatible with and translatable into other major European languages, with measurements comparable to those of state-of-the-art HRLs at this time. We conclude by making our two best Quechua–Spanish translation engines available online.
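
As a rough illustration of what decomposing suffix-based morphemes can look like, the Python sketch below greedily strips known suffixes from the right end of a word. It is not the segmentation technique introduced in the article. The tiny suffix inventory, the `min_root` threshold, the `@@ ` continuation marker, and the function names `segment` and `mark` are all assumptions chosen for illustration.

```python
# Illustrative suffix inventory (an assumption, not the article's resource):
# -kuna (plural), -pi (locative), -ta (accusative), -y (first-person possessive).
SUFFIXES = ("kuna", "pi", "ta", "y")


def segment(word, suffixes=SUFFIXES, min_root=3):
    """Greedily strip known suffixes from the right end of `word`.

    Returns the root followed by the stripped suffixes in surface order.
    `min_root` (hypothetical threshold) keeps short words from being over-split.
    """
    ordered = sorted(suffixes, key=len, reverse=True)  # prefer the longest match
    morphemes = []
    root = word
    stripped = True
    while stripped:
        stripped = False
        for suffix in ordered:
            if root.endswith(suffix) and len(root) - len(suffix) >= min_root:
                morphemes.insert(0, suffix)
                root = root[: -len(suffix)]
                stripped = True
                break
    return [root] + morphemes


def mark(word):
    """Join morphemes with a '@@ ' continuation marker (assumed convention)."""
    return "@@ ".join(segment(word))


print(segment("wasikunapi"))  # wasi 'house' + -kuna + -pi, "in the houses"
print(mark("wasikunapi"))
```

A rule list like this over-segments roots that merely end in a suffix-like string, which is one reason learned or linguistically informed segmenters are usually preferred in practice.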

Original language: English (US)
Pages (from-to): 325-346
Number of pages: 22
Journal: Machine Translation
Volume: 34
Issue number: 4
State: Published - Dec 2020

Keywords

  • Finnish
  • Low resource languages
  • Morphology
  • Neural machine translation
  • Quechua
  • Spanish

ASJC Scopus subject areas

  • Software
  • Language and Linguistics
  • Linguistics and Language
  • Artificial Intelligence
