Abstract
This paper discusses automatic determination of case in Arabic. This task is a major source of errors in full diacritization of Arabic. We use a gold-standard syntactic tree, and obtain an error rate of about 4.2%, with a machine learning based system outperforming a system using hand-written rules. A careful error analysis suggests that when we account for annotation errors in the gold standard, the error rate drops to 0.8%, with the hand-written rules outperforming the machine learning-based system.
Original language | English (US) |
---|---|
Pages | 1084-1092 |
Number of pages | 9 |
State | Published - 2007 |
Event | 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, EMNLP-CoNLL 2007 - Prague, Czech Republic Duration: Jun 28 2007 → Jun 28 2007 |
Other
Other | 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, EMNLP-CoNLL 2007 |
---|---|
Country/Territory | Czech Republic |
City | Prague |
Period | 6/28/07 → 6/28/07 |
ASJC Scopus subject areas
- Computational Theory and Mathematics
- Computer Science Applications
- Information Systems