Automatic Morphological Enrichment of a Morphologically Underspecified Treebank

Sarah Alkuhlani, Nizar Habash, Ryan Roth

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this paper, we study the problem of automatic enrichment of a morphologically underspecified treebank for Arabic, a morphologically rich language. We show that we can map from a tagset of size six to one with 485 tags at an accuracy rate of 94%-95%. We can also identify the unspecified lemmas in the treebank with an accuracy over 97%. Furthermore, we demonstrate that using our automatic annotations improves the performance of a state-of-the-art Arabic morphological tagger. Our approach combines a variety of techniques from corpus-based statistical models to linguistic rules that target specific phenomena. These results suggest that the cost of treebanking can be reduced by designing underspecified treebanks that can be subsequently enriched automatically.

Original language: English (US)
Title of host publication: Proceedings of the 2nd Workshop on Computational Linguistics for Literature, CLfL 2013 at the 2013 Conference of the North American Chapter of the Association for Computational Linguistics
Subtitle of host publication: Human Language Technologies, NAACL-HLT 2013
Editors: David Elson, Anna Kazantseva, Stan Szpakowicz
Publisher: Association for Computational Linguistics (ACL)
Pages: 460-470
Number of pages: 11
ISBN (Electronic): 9781937284473
State: Published - 2013
Event: 2nd Workshop on Computational Linguistics for Literature, CLfL 2013 at the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2013 - Atlanta, United States
Duration: Jun 14 2013 → …

Publication series

Name: Proceedings of the 2nd Workshop on Computational Linguistics for Literature, CLfL 2013 at the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2013

Conference

Conference: 2nd Workshop on Computational Linguistics for Literature, CLfL 2013 at the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2013
Country/Territory: United States
City: Atlanta
Period: 6/14/13 → …

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language
