TY - GEN
T1 - Identifying broken plurals, irregular gender, and rationality in Arabic text
AU - Alkuhlani, Sarah
AU - Habash, Nizar
N1 - Funding Information:
We would like to thank Yuval Marton for help with the parsing experiments. The first author was funded by a scholarship from the Saudi Arabian Ministry of Higher Education. The rest of the work was funded under DARPA project numbers HR0011-08-C-0004 and HR0011-08-C-0110.
Publisher Copyright:
© 2012 Association for Computational Linguistics.
PY - 2012
Y1 - 2012
N2 - Arabic morphology is complex, partly because of its richness and partly because of common irregular word forms, such as broken plurals (which resemble singular nouns) and nouns with irregular gender (feminine nouns that look masculine and vice versa). In addition, Arabic morphosyntactic agreement interacts with the lexical semantic feature of rationality, which has no morphological realization. In this paper, we present a series of experiments on the automatic prediction of the latent linguistic features of functional gender and number, and rationality in Arabic. We compare two techniques: simple maximum likelihood estimation (MLE) with back-off, and a support-vector-machine-based sequence tagger (Yamcha). We study a number of orthographic, morphological, and syntactic learning features. Our results show that the MLE technique is preferred for words seen in the training data, while the Yamcha technique is optimal for unseen words, which are our real target. Furthermore, we show that for unseen words, morphological features help beyond orthographic features and that syntactic features help even more. A combination of the two techniques improves overall performance even further.
AB - Arabic morphology is complex, partly because of its richness and partly because of common irregular word forms, such as broken plurals (which resemble singular nouns) and nouns with irregular gender (feminine nouns that look masculine and vice versa). In addition, Arabic morphosyntactic agreement interacts with the lexical semantic feature of rationality, which has no morphological realization. In this paper, we present a series of experiments on the automatic prediction of the latent linguistic features of functional gender and number, and rationality in Arabic. We compare two techniques: simple maximum likelihood estimation (MLE) with back-off, and a support-vector-machine-based sequence tagger (Yamcha). We study a number of orthographic, morphological, and syntactic learning features. Our results show that the MLE technique is preferred for words seen in the training data, while the Yamcha technique is optimal for unseen words, which are our real target. Furthermore, we show that for unseen words, morphological features help beyond orthographic features and that syntactic features help even more. A combination of the two techniques improves overall performance even further.
UR - http://www.scopus.com/inward/record.url?scp=84942619790&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84942619790&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84942619790
T3 - EACL 2012 - 13th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings
SP - 675
EP - 685
BT - EACL 2012 - 13th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings
PB - Association for Computational Linguistics (ACL)
T2 - 13th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2012
Y2 - 23 April 2012 through 27 April 2012
ER -