Identifying broken plurals, irregular gender, and rationality in Arabic text

Sarah Alkuhlani, Nizar Habash

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Arabic morphology is complex, partly because of its richness, and partly because of common irregular word forms, such as broken plurals (which resemble singular nouns), and nouns with irregular gender (feminine nouns that look masculine and vice versa). In addition, Arabic morphosyntactic agreement interacts with the lexical semantic feature of rationality, which has no morphological realization. In this paper, we present a series of experiments on the automatic prediction of the latent linguistic features of functional gender and number, and rationality in Arabic. We compare two techniques, using simple maximum likelihood (MLE) with back-off and a support vector machine based sequence tagger (Yamcha). We study a number of orthographic, morphological and syntactic learning features. Our results show that the MLE technique is preferred for words seen in the training data, while the Yamcha technique is optimal for unseen words, which are our real target. Furthermore, we show that for unseen words, morphological features help beyond orthographic features and that syntactic features help even more. A combination of the two techniques improves overall performance even further.
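As a rough illustration of the first technique named in the abstract, the sketch below shows one way a maximum likelihood predictor with back-off could be organized. It is not the authors' implementation: the back-off chain (full word form, then final character, then the globally most frequent tag), the class name `BackoffMLE`, the tag labels, and the transliterated toy training pairs are all assumptions made for this example.

```python
from collections import Counter, defaultdict


class BackoffMLE:
    """Maximum likelihood tag prediction with a chain of back-off keys.

    Hypothetical sketch: the key functions passed in are illustrative
    assumptions, not the feature set used in the paper.
    """

    def __init__(self, key_fns):
        # One count table per back-off level, ordered most to least specific.
        self.key_fns = key_fns
        self.tables = [defaultdict(Counter) for _ in key_fns]
        self.global_counts = Counter()

    def fit(self, pairs):
        # Count how often each tag occurs with each key at every level.
        for word, tag in pairs:
            self.global_counts[tag] += 1
            for fn, table in zip(self.key_fns, self.tables):
                table[fn(word)][tag] += 1
        return self

    def predict(self, word):
        # Use the most specific level that has seen this word's key.
        for fn, table in zip(self.key_fns, self.tables):
            counts = table.get(fn(word))
            if counts:
                return counts.most_common(1)[0][0]
        # Nothing matched: fall back to the most frequent tag overall.
        return self.global_counts.most_common(1)[0][0]


# Toy, hand-made training pairs (illustrative only, not data from the paper).
# Tags combine gender and number: MS, MP, FS.
train = [
    ("kitAb", "MS"),    # book
    ("kutub", "MP"),    # books (broken plural)
    ("bAb", "MS"),      # door
    ("qalam", "MS"),    # pen
    ("madrasa", "FS"),  # school
    ("maktaba", "FS"),  # library
]

model = BackoffMLE([lambda w: w, lambda w: w[-1:]]).fit(train)

print(model.predict("kutub"))    # seen word form
print(model.predict("sayyAra"))  # unseen word, backs off to final character
print(model.predict("nahr"))     # unseen at every level, global majority tag
```

A lookup of this kind is naturally strong on word forms it has already counted and has little to go on otherwise, which matches the abstract's observation that MLE is preferred for seen words while the sequence tagger does better on unseen ones.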

Original language: English (US)
Title of host publication: EACL 2012 - 13th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings
Publisher: Association for Computational Linguistics (ACL)
Pages: 675-685
Number of pages: 11
ISBN (Electronic): 9781937284190
State: Published - 2012
Event: 13th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2012 - Avignon, France
Duration: Apr 23 2012 to Apr 27 2012

Publication series

Name: EACL 2012 - 13th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Software
