TY - GEN
T1 - Automatic morphological enrichment of a morphologically underspecified treebank
AU - Alkuhlani, Sarah
AU - Habash, Nizar
AU - Roth, Ryan
N1 - Publisher Copyright: © 2013 Association for Computational Linguistics.
PY - 2013
Y1 - 2013
N2 - In this paper, we study the problem of automatic enrichment of a morphologically underspecified treebank for Arabic, a morphologically rich language. We show that we can map from a tagset of size six to one with 485 tags at an accuracy rate of 94%-95%. We can also identify the unspecified lemmas in the treebank with an accuracy over 97%. Furthermore, we demonstrate that using our automatic annotations improves the performance of a state-of-the-art Arabic morphological tagger. Our approach combines a variety of techniques from corpus-based statistical models to linguistic rules that target specific phenomena. These results suggest that the cost of treebanking can be reduced by designing underspecified treebanks that can be subsequently enriched automatically.
AB - In this paper, we study the problem of automatic enrichment of a morphologically underspecified treebank for Arabic, a morphologically rich language. We show that we can map from a tagset of size six to one with 485 tags at an accuracy rate of 94%-95%. We can also identify the unspecified lemmas in the treebank with an accuracy over 97%. Furthermore, we demonstrate that using our automatic annotations improves the performance of a state-of-the-art Arabic morphological tagger. Our approach combines a variety of techniques from corpus-based statistical models to linguistic rules that target specific phenomena. These results suggest that the cost of treebanking can be reduced by designing underspecified treebanks that can be subsequently enriched automatically.
UR - http://www.scopus.com/inward/record.url?scp=84903568308&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84903568308&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84903568308
T3 - NAACL HLT 2013 - 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Main Conference
SP - 460
EP - 470
BT - NAACL HLT 2013 - 2013 Conference of the North American Chapter of the Association for Computational Linguistics
PB - Association for Computational Linguistics (ACL)
T2 - 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2013
Y2 - 9 June 2013 through 14 June 2013
ER -