Dialectal Arabic to English Machine Translation: Pivoting through Modern Standard Arabic

Wael Salloum, Nizar Habash

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Modern Standard Arabic (MSA) has a wealth of natural language processing (NLP) tools and resources. In comparison, resources for dialectal Arabic (DA), the unstandardized spoken varieties of Arabic, are still lacking. We present ELISSA, a machine translation (MT) system for DA to MSA. ELISSA employs a rule-based approach that relies on morphological analysis, transfer rules, and dictionaries, in addition to language models, to produce MSA paraphrases of DA sentences. ELISSA can be employed as a general preprocessor for DA when using MSA NLP tools. A manual error analysis of ELISSA’s output shows that it produces correct MSA translations over 93% of the time. Using ELISSA to produce MSA versions of DA sentences as part of an MSA-pivoting DA-to-English MT solution improves BLEU scores on multiple blind test sets by between 0.6% and 1.4%.
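
The pivoting idea in the abstract can be illustrated with a minimal sketch. It is not ELISSA's implementation: it leaves out morphological analysis and transfer rules entirely and keeps only a dictionary lookup followed by language-model selection. The romanized lexicon entries, the bigram scores, and the names `TOY_LEXICON`, `TOY_BIGRAM_LOGPROB`, `UNSEEN_LOGPROB`, and `pivot_to_msa` are all hypothetical and chosen only for illustration.

```python
from itertools import product

# Toy DA-to-MSA lexicon (romanized). Illustrative entries only;
# these are NOT ELISSA's dictionaries.
TOY_LEXICON = {
    "shu": ["maadhaa", "maa"],
    "biddak": ["turiid"],
    "hallaq": ["al-aan"],
}

# Made-up bigram log-probabilities standing in for an MSA language model.
TOY_BIGRAM_LOGPROB = {
    ("<s>", "maadhaa"): -0.5,
    ("<s>", "maa"): -1.5,
    ("maadhaa", "turiid"): -0.7,
    ("maa", "turiid"): -2.0,
    ("turiid", "al-aan"): -1.0,
}
UNSEEN_LOGPROB = -5.0  # flat penalty for any bigram missing from the toy table


def candidates(token):
    """Return MSA candidates for a DA token; unknown tokens pass through unchanged."""
    return TOY_LEXICON.get(token, [token])


def lm_score(words):
    """Sum bigram log-probabilities over the sequence, starting from <s>."""
    score = 0.0
    prev = "<s>"
    for word in words:
        score += TOY_BIGRAM_LOGPROB.get((prev, word), UNSEEN_LOGPROB)
        prev = word
    return score


def pivot_to_msa(da_sentence):
    """Pick the candidate MSA paraphrase with the highest toy LM score."""
    options = [candidates(token) for token in da_sentence.split()]
    # Exhaustive search over all combinations; acceptable only at toy scale.
    best = max(product(*options), key=lm_score)
    return " ".join(best)


if __name__ == "__main__":
    print(pivot_to_msa("shu biddak hallaq"))
```

In an MSA-pivoting setup, the output of a step like `pivot_to_msa` would then be passed to an MSA-to-English MT system. A realistic system would replace the exhaustive search with a lattice or beam search, since the number of combinations grows exponentially with sentence length.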

Original language: English (US)
Title of host publication: Proceedings of the 2nd Workshop on Computational Linguistics for Literature, CLfL 2013 at the 2013 Conference of the North American Chapter of the Association for Computational Linguistics
Subtitle of host publication: Human Language Technologies, NAACL-HLT 2013
Editors: David Elson, Anna Kazantseva, Stan Szpakowicz
Publisher: Association for Computational Linguistics (ACL)
Pages: 348-358
Number of pages: 11
ISBN (Electronic): 9781937284473
State: Published - 2013
Event: 2nd Workshop on Computational Linguistics for Literature, CLfL 2013 at the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2013 - Atlanta, United States
Duration: Jun 14 2013 → …

Publication series

Name: Proceedings of the 2nd Workshop on Computational Linguistics for Literature, CLfL 2013 at the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2013

Conference

Conference: 2nd Workshop on Computational Linguistics for Literature, CLfL 2013 at the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2013
Country/Territory: United States
City: Atlanta
Period: 6/14/13 → …

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language
