Automatic transliteration of romanized dialectal Arabic

Mohamed Al-Badrashiny, Ramy Eskander, Nizar Habash, Owen Rambow

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this paper, we address the problem of converting Dialectal Arabic (DA) text that is written in the Latin script (called Arabizi) into Arabic script following the CODA convention for DA orthography. The presented system uses a finite state transducer trained at the character level to generate all possible transliterations for the input Arabizi words. We then filter the generated list using a DA morphological analyzer. After that we pick the best choice for each input word using a language model. We achieve an accuracy of 69.4% on an unseen test set compared to 63.1% using a system which represents a previously proposed approach.
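The three stages described in the abstract (candidate generation, morphological filtering, language-model selection) can be sketched roughly as follows. This is a toy illustration, not the authors' implementation: a hand-written character map stands in for the trained finite state transducer, a small word set stands in for the DA morphological analyzer, and a greedy bigram lookup stands in for the language model. The names `CHAR_MAP`, `LEXICON`, `BIGRAMS`, the backoff when no candidate passes the filter, and all scores are assumptions made for the example.

```python
from itertools import product

# Hypothetical character map (stand-in for the trained character-level FST):
# each Arabizi character maps to one or more Arabic-script outputs.
CHAR_MAP = {
    "k": ["ك"],
    "t": ["ت", "ط"],
    "a": ["ا", ""],  # a vowel may be written as a letter or dropped
    "b": ["ب"],
}

# Hypothetical lexicon (stand-in for the DA morphological analyzer).
LEXICON = {"كتاب", "كتب"}

# Hypothetical bigram log-probabilities (stand-in for the language model).
BIGRAMS = {("<s>", "كتاب"): -1.0, ("<s>", "كتب"): -2.0}


def generate_candidates(word):
    """Stage 1: expand every character into all of its mapped outputs."""
    options = [CHAR_MAP.get(ch, [ch]) for ch in word]
    return {"".join(parts) for parts in product(*options)}


def filter_candidates(candidates, lexicon):
    """Stage 2: keep candidates the analyzer accepts.

    Falling back to the unfiltered set when nothing passes is an
    assumption of this sketch, not something stated in the abstract.
    """
    kept = {c for c in candidates if c in lexicon}
    return kept or candidates


def pick_best(candidates, prev, bigrams, default=-20.0):
    """Stage 3: choose the candidate with the highest bigram score."""
    # sorted() makes tie-breaking deterministic.
    return max(sorted(candidates), key=lambda c: bigrams.get((prev, c), default))


def transliterate(words, lexicon, bigrams):
    """Run the three stages greedily, left to right, over a sentence."""
    output, prev = [], "<s>"
    for word in words:
        candidates = generate_candidates(word.lower())
        candidates = filter_candidates(candidates, lexicon)
        best = pick_best(candidates, prev, bigrams)
        output.append(best)
        prev = best
    return output


print(transliterate(["ktab"], LEXICON, BIGRAMS))
```

The greedy left-to-right choice keeps the sketch short; a fuller system would more plausibly score whole candidate sequences with the language model rather than one word at a time.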

Original language: English (US)
Title of host publication: CoNLL 2014 - 18th Conference on Computational Natural Language Learning, Proceedings
Publisher: Association for Computational Linguistics (ACL)
Pages: 30-38
Number of pages: 9
ISBN (Electronic): 9781941643020
State: Published - Jan 1 2014
Event: 18th Conference on Computational Natural Language Learning, CoNLL 2014 - Baltimore, United States
Duration: Jun 26 2014 to Jun 27 2014

Publication series

Name: CoNLL 2014 - 18th Conference on Computational Natural Language Learning, Proceedings

Conference

Conference: 18th Conference on Computational Natural Language Learning, CoNLL 2014
Country/Territory: United States
City: Baltimore
Period: 6/26/14 to 6/27/14

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Artificial Intelligence
  • Linguistics and Language
