TY - GEN
T1 - Building a Corpus for Palestinian Arabic
T2 - EMNLP 2014 Workshop on Arabic Natural Language Processing, ANLP 2014
AU - Jarrar, Mustafa
AU - Habash, Nizar
AU - Akra, Diyam
AU - Zalmout, Nasser
N1 - Funding Information:
This work is part of the ongoing project Curras, funded by the Palestinian Ministry of Higher Education, Scientific Research Council. Nizar Habash performed most of his work on this paper while he was in the Center for Computational Learning Systems at Columbia University.
Publisher Copyright:
©2014 Association for Computational Linguistics
PY - 2014
Y1 - 2014
N2 - This paper presents preliminary results in building an annotated corpus of the Palestinian Arabic dialect. The corpus consists of about 43K words, stemming from diverse resources. The paper discusses some linguistic facts about the Palestinian dialect, compared with Modern Standard Arabic, especially in terms of morphological, orthographic, and lexical variations, and suggests some directions to resolve the challenges these differences pose to the annotation goal. Furthermore, we present two pilot studies that investigate whether existing tools for processing Modern Standard Arabic and Egyptian Arabic can be used to speed up the annotation process of our Palestinian Arabic corpus.
UR - http://www.scopus.com/inward/record.url?scp=84988699834&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84988699834&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84988699834
T3 - ANLP 2014 - EMNLP 2014 Workshop on Arabic Natural Language Processing, Proceedings
SP - 18
EP - 27
BT - ANLP 2014 - EMNLP 2014 Workshop on Arabic Natural Language Processing, Proceedings
A2 - Habash, Nizar
A2 - Vogel, Stephan
PB - Association for Computational Linguistics (ACL)
Y2 - 25 October 2014
ER -