Determining case in Arabic: Learning complex linguistic behavior requires complex linguistic features

Nizar Habash, Ryan Gabbard, Owen Rambow, Seth Kulick, Mitch Marcus

Research output: Contribution to conference; Paper; peer-review

Abstract

This paper discusses automatic determination of case in Arabic. This task is a major source of errors in full diacritization of Arabic. We use a gold-standard syntactic tree and obtain an error rate of about 4.2%, with a machine-learning-based system outperforming a system using hand-written rules. A careful error analysis suggests that when we account for annotation errors in the gold standard, the error rate drops to 0.8%, with the hand-written rules outperforming the machine-learning-based system.

Original language: English (US)
Pages: 1084-1092
Number of pages: 9
State: Published - 2007
Event: 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, EMNLP-CoNLL 2007 - Prague, Czech Republic
Duration: Jun 28 2007 to Jun 28 2007

Other

Other: 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, EMNLP-CoNLL 2007
Country/Territory: Czech Republic
City: Prague
Period: 6/28/07 to 6/28/07

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Information Systems
