TY - GEN
T1 - Morphological annotation of quranic Arabic
AU - Dukes, Kais
AU - Habash, Nizar
PY - 2010
Y1 - 2010
N2 - The Quranic Arabic Corpus (http://corpus.quran.com) is an annotated linguistic resource with multiple layers of annotation including morphological segmentation, part-of-speech tagging, and syntactic analysis using dependency grammar. The motivation behind this work is to produce a resource that enables further analysis of the Quran, the 1,400 year old central religious text of Islam. This paper describes a new approach to morphological annotation of Quranic Arabic, a genre difficult to compare with other forms of Arabic. Processing Quranic Arabic is a unique challenge from a computational point of view, since the vocabulary and spelling differ from Modern Standard Arabic. The Quranic Arabic Corpus differs from other Arabic computational resources in adopting a tagset that closely follows traditional Arabic grammar. We made this decision in order to leverage a large body of existing historical grammatical analysis, and to encourage online collaborative annotation. In this paper, we discuss how the unique challenge of morphological annotation of Quranic Arabic is solved using a multi-stage approach. The different stages include automatic morphological tagging using diacritic edit-distance, two-pass manual verification, and online collaborative annotation. This process is evaluated to validate the appropriateness of the chosen methodology.
AB - The Quranic Arabic Corpus (http://corpus.quran.com) is an annotated linguistic resource with multiple layers of annotation including morphological segmentation, part-of-speech tagging, and syntactic analysis using dependency grammar. The motivation behind this work is to produce a resource that enables further analysis of the Quran, the 1,400 year old central religious text of Islam. This paper describes a new approach to morphological annotation of Quranic Arabic, a genre difficult to compare with other forms of Arabic. Processing Quranic Arabic is a unique challenge from a computational point of view, since the vocabulary and spelling differ from Modern Standard Arabic. The Quranic Arabic Corpus differs from other Arabic computational resources in adopting a tagset that closely follows traditional Arabic grammar. We made this decision in order to leverage a large body of existing historical grammatical analysis, and to encourage online collaborative annotation. In this paper, we discuss how the unique challenge of morphological annotation of Quranic Arabic is solved using a multi-stage approach. The different stages include automatic morphological tagging using diacritic edit-distance, two-pass manual verification, and online collaborative annotation. This process is evaluated to validate the appropriateness of the chosen methodology.
UR - http://www.scopus.com/inward/record.url?scp=85006228435&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85006228435&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85006228435
T3 - Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010
SP - 2530
EP - 2536
BT - Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010
A2 - Tapias, Daniel
A2 - Russo, Irene
A2 - Hamon, Olivier
A2 - Piperidis, Stelios
A2 - Calzolari, Nicoletta
A2 - Choukri, Khalid
A2 - Mariani, Joseph
A2 - Mazo, Helene
A2 - Maegaard, Bente
A2 - Odijk, Jan
A2 - Rosner, Mike
PB - European Language Resources Association (ELRA)
T2 - 7th International Conference on Language Resources and Evaluation, LREC 2010
Y2 - 17 May 2010 through 23 May 2010
ER -