TY - GEN
T1 - Developing an Egyptian Arabic Treebank
T2 - 9th International Conference on Language Resources and Evaluation, LREC 2014
AU - Maamouri, Mohamed
AU - Bies, Ann
AU - Kulick, Seth
AU - Ciul, Michael
AU - Habash, Nizar
AU - Eskander, Ramy
N1 - Funding Information: This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) under Contract Nos. HR0011-11-C-0145 and HR0011-12-C-0014. The content does not necessarily reflect the position or the policy of the Government, and no official endorsement should be inferred.
PY - 2014
Y1 - 2014
AB - This paper describes the parallel development of an Egyptian Arabic Treebank and a morphological analyzer for Egyptian Arabic (CALIMA). By the very nature of Egyptian Arabic, the data collected is informal, for example Discussion Forum text, which we use for the treebank discussed here. In addition, Egyptian Arabic, like other Arabic dialects, is sufficiently different from Modern Standard Arabic (MSA) that tools and techniques developed for MSA cannot simply be transferred to Egyptian Arabic. In particular, a morphological analyzer for Egyptian Arabic is needed to mediate between the written text and the segmented, vocalized form used for the syntactic trees. This led to the necessity of a feedback loop between the treebank team and the analyzer team, as improvements in each area were fed to the other. Therefore, by necessity, there needed to be close cooperation between the annotation team and the tool development team, which was to their mutual benefit. Collaboration on this type of challenge, where tools and resources are limited, proved to be remarkably synergistic and opens the way to further fruitful work on Arabic dialects.
KW - Dialectal morphological analyzer
KW - Dialectal treebank
KW - Egyptian Arabic
UR - http://www.scopus.com/inward/record.url?scp=85026881692&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85026881692&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85026881692
T3 - Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014
SP - 2348
EP - 2354
BT - Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014
A2 - Calzolari, Nicoletta
A2 - Choukri, Khalid
A2 - Goggi, Sara
A2 - Declerck, Thierry
A2 - Mariani, Joseph
A2 - Maegaard, Bente
A2 - Moreno, Asuncion
A2 - Odijk, Jan
A2 - Mazo, Helene
A2 - Piperidis, Stelios
A2 - Loftsson, Hrafn
PB - European Language Resources Association (ELRA)
Y2 - 26 May 2014 through 31 May 2014
ER -