Foreign Words and the Automatic Processing of Arabic Social Media Text Written in Roman Script

Ramy Eskander, Mohamed Al-Badrashiny, Nizar Habash, Owen Rambow

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Arabic on social media has all the properties of any language on social media that make it tough for natural language processing, plus some specific problems. These include diglossia, the use of an alternative alphabet (Roman), and code switching with foreign languages. In this paper, we present a system which can process Arabic written in Roman alphabet (“Arabizi”). It identifies whether each word is a foreign word or one of another four categories (Arabic, name, punctuation, sound), and transliterates Arabic words and names into the Arabic alphabet. We obtain an overall system performance of 83.8% on an unseen test set.

Original languageEnglish (US)
Title of host publication1st Workshop on Computational Approaches to Code Switching, Switching 2014 at the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014 - Proceedings
EditorsMona Diab, Julia Hirschberg, Pascale Fung, Thamar Solorio
PublisherAssociation for Computational Linguistics (ACL)
Pages1-12
Number of pages12
ISBN (Electronic)9781937284961
StatePublished - 2014
Event1st Workshop on Computational Approaches to Code Switching, Switching 2014 at the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014 - Doha, Qatar
Duration: Oct 25 2014 → …

Publication series

Name1st Workshop on Computational Approaches to Code Switching, Switching 2014 at the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014 - Proceedings

Conference

Conference1st Workshop on Computational Approaches to Code Switching, Switching 2014 at the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014
Country/TerritoryQatar
CityDoha
Period10/25/14 → …

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Language and Linguistics
  • Linguistics and Language

Fingerprint

Dive into the research topics of 'Foreign Words and the Automatic Processing of Arabic Social Media Text Written in Roman Script'. Together they form a unique fingerprint.

Cite this