Processing Spontaneous Orthography

Ramy Eskander, Nizar Habash, Owen Rambow, Nadi Tomeh

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In cases in which there is no standard orthography for a language or language variant, written texts will display a variety of orthographic choices. This is problematic for natural language processing (NLP) because it creates spurious data sparseness. We study the transformation of spontaneously spelled Egyptian Arabic into a conventionalized orthography which we have previously proposed for NLP purposes. We show that a two-stage process can reduce divergences from this standard by 69%, making subsequent processing of Egyptian Arabic easier.

Original language: English (US)
Title of host publication: Proceedings of the 2nd Workshop on Computational Linguistics for Literature, CLfL 2013 at the 2013 Conference of the North American Chapter of the Association for Computational Linguistics
Subtitle of host publication: Human Language Technologies, NAACL-HLT 2013
Editors: David Elson, Anna Kazantseva, Stan Szpakowicz
Publisher: Association for Computational Linguistics (ACL)
Pages: 585-595
Number of pages: 11
ISBN (Electronic): 9781937284473
State: Published - 2013
Event: 2nd Workshop on Computational Linguistics for Literature, CLfL 2013 at the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2013 - Atlanta, United States
Duration: Jun 14 2013 → …

Publication series

Name: Proceedings of the 2nd Workshop on Computational Linguistics for Literature, CLfL 2013 at the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2013

Conference

Conference: 2nd Workshop on Computational Linguistics for Literature, CLfL 2013 at the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2013
Country/Territory: United States
City: Atlanta
Period: 6/14/13 → …

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language
