Domain and Dialect Adaptation for Machine Translation into Egyptian Arabic

Serena Jeblee, Weston Feely, Houda Bouamor, Alon Lavie, Nizar Habash, Kemal Oflazer

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract

In this paper, we present a statistical machine translation system for English to Dialectal Arabic (DA), using Modern Standard Arabic (MSA) as a pivot. We create a core system to translate from English to MSA using a large bilingual parallel corpus. Then, we design two separate pathways for translation from MSA into DA: a two-step domain and dialect adaptation system and a one-step simultaneous domain and dialect adaptation system. Both variants of the adaptation systems are trained on a 100k sentence tri-parallel corpus of English, MSA, and Egyptian Arabic generated by a rule-based transformation. We test our systems on a held-out Egyptian Arabic test set from the 100k sentence corpus and we achieve our best performance using the two-step domain and dialect adaptation system with a BLEU score of 42.9.

Original language: English (US)
Title of host publication: ANLP 2014 - EMNLP 2014 Workshop on Arabic Natural Language Processing, Proceedings
Editors: Nizar Habash, Stephan Vogel
Publisher: Association for Computational Linguistics (ACL)
Pages: 196-206
Number of pages: 11
ISBN (Electronic): 9781937284961
State: Published - 2014
Event: EMNLP 2014 Workshop on Arabic Natural Language Processing, ANLP 2014 - Doha, Qatar
Duration: Oct 25 2014 → …

Publication series

Name: ANLP 2014 - EMNLP 2014 Workshop on Arabic Natural Language Processing, Proceedings

Conference

Conference: EMNLP 2014 Workshop on Arabic Natural Language Processing, ANLP 2014
Country/Territory: Qatar
City: Doha
Period: 10/25/14 → …

ASJC Scopus subject areas

  • Language and Linguistics
  • Computational Theory and Mathematics
  • Computer Science Applications
  • Software
