TY - GEN
T1 - Morphologically annotated corpora for seven Arabic dialects
T2 - 4th Arabic Natural Language Processing Workshop, WANLP 2019, held at ACL 2019
AU - Alshargi, Faisal
AU - Dibas, Shahd
AU - Alkhereyf, Sakhar
AU - Faraj, Reem
AU - Abdulkareem, Basmah
AU - Yagi, Sane
AU - Kacha, Ouafaa
AU - Habash, Nizar
AU - Rambow, Owen
N1 - Publisher Copyright: © ACL 2019. All rights reserved.
PY - 2019
Y1 - 2019
AB - We present a collection of morphologically annotated corpora for seven Arabic dialects: Taizi Yemeni, Sanaani Yemeni, Najdi, Jordanian, Syrian, Iraqi and Moroccan Arabic. The corpora collectively cover over 200,000 words, and are all manually annotated in a common set of standards for orthography, diacritized lemmas, tokenization, morphological units and English glosses. These corpora will be publicly available to serve as benchmarks for training and evaluating systems for Arabic dialect morphological analysis and disambiguation.
UR - http://www.scopus.com/inward/record.url?scp=85096536274&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85096536274&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85096536274
T3 - ACL 2019 - 4th Arabic Natural Language Processing Workshop, WANLP 2019 - Proceedings of the Workshop
SP - 137
EP - 147
BT - ACL 2019 - 4th Arabic Natural Language Processing Workshop, WANLP 2019 - Proceedings of the Workshop
PB - Association for Computational Linguistics (ACL)
Y2 - 1 August 2019
ER -