Building a Corpus for Palestinian Arabic: a Preliminary Study

Mustafa Jarrar, Nizar Habash, Diyam Akra, Nasser Zalmout

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper presents preliminary results in building an annotated corpus of the Palestinian Arabic dialect. The corpus consists of about 43K words drawn from diverse resources. The paper discusses some linguistic facts about the Palestinian dialect, compared with Modern Standard Arabic, especially in terms of morphological, orthographic, and lexical variations, and suggests some directions for resolving the challenges these differences pose to the annotation goal. Furthermore, we present two pilot studies that investigate whether existing tools for processing Modern Standard Arabic and Egyptian Arabic can be used to speed up the annotation of our Palestinian Arabic corpus.

Original language: English (US)
Title of host publication: ANLP 2014 - EMNLP 2014 Workshop on Arabic Natural Language Processing, Proceedings
Editors: Nizar Habash, Stephan Vogel
Publisher: Association for Computational Linguistics (ACL)
Pages: 18-27
Number of pages: 10
ISBN (Electronic): 9781937284961
State: Published - 2014
Event: EMNLP 2014 Workshop on Arabic Natural Language Processing, ANLP 2014 - Doha, Qatar
Duration: Oct 25 2014 → …

Conference

Conference: EMNLP 2014 Workshop on Arabic Natural Language Processing, ANLP 2014
Country/Territory: Qatar
City: Doha
Period: 10/25/14 → …

ASJC Scopus subject areas

  • Language and Linguistics
  • Computational Theory and Mathematics
  • Computer Science Applications
  • Software
