TY - GEN
T1 - Domain and Dialect Adaptation for Machine Translation into Egyptian Arabic
AU - Jeblee, Serena
AU - Feely, Weston
AU - Bouamor, Houda
AU - Lavie, Alon
AU - Habash, Nizar
AU - Oflazer, Kemal
N1 - Funding Information:
This publication was made possible by grant NPRP-09-1140-1-177 from the Qatar National Research Fund (a member of the Qatar Foundation) and by computing resources provided by the NSF-sponsored XSEDE program under grant TG-CCR110017. The statements made herein are solely the responsibility of the authors. We thank the reviewers for their comments. Nizar Habash performed most of his contribution to this paper while he was at the Center for Computational Learning Systems at Columbia University.
Publisher Copyright:
©2014 Association for Computational Linguistics
PY - 2014
Y1 - 2014
N2 - In this paper, we present a statistical machine translation system for English to Dialectal Arabic (DA), using Modern Standard Arabic (MSA) as a pivot. We create a core system to translate from English to MSA using a large bilingual parallel corpus. Then, we design two separate pathways for translation from MSA into DA: a two-step domain and dialect adaptation system and a one-step simultaneous domain and dialect adaptation system. Both variants of the adaptation systems are trained on a 100k-sentence tri-parallel corpus of English, MSA, and Egyptian Arabic generated by a rule-based transformation. We test our systems on a held-out Egyptian Arabic test set from the 100k-sentence corpus, and we achieve our best performance, a BLEU score of 42.9, using the two-step domain and dialect adaptation system.
UR - http://www.scopus.com/inward/record.url?scp=85045361508&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85045361508&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85045361508
T3 - ANLP 2014 - EMNLP 2014 Workshop on Arabic Natural Language Processing, Proceedings
SP - 196
EP - 206
BT - ANLP 2014 - EMNLP 2014 Workshop on Arabic Natural Language Processing, Proceedings
A2 - Habash, Nizar
A2 - Vogel, Stephan
PB - Association for Computational Linguistics (ACL)
T2 - EMNLP 2014 Workshop on Arabic Natural Language Processing, ANLP 2014
Y2 - 25 October 2014
ER -