Abstract
We explore the contribution of different lexical and inflectional morphological features to dependency parsing of Arabic, a morphologically rich language. We experiment with all leading POS tagsets for Arabic, and introduce a few new sets. We show that training the parser using a simple regular-expression-based extension of an impoverished POS tagset with high prediction accuracy does better than using a highly informative POS tagset with only medium prediction accuracy, although the latter performs best on gold input. Using controlled experiments, we find that definiteness (or determiner presence), the so-called phi-features (person, number, gender), and undiacritized lemma are most helpful for Arabic parsing on predicted input, while case and state are most helpful on gold input.
Original language | English (US) |
---|---|
Pages | 13-21 |
Number of pages | 9 |
State | Published - 2010 |
Event | 1st Workshop on Statistical Parsing of Morphologically-Rich Languages, SPMRL 2010 - Los Angeles, United States. Duration: Jun 5 2010 → … |
Conference
Conference | 1st Workshop on Statistical Parsing of Morphologically-Rich Languages, SPMRL 2010 |
---|---|
Country/Territory | United States |
City | Los Angeles |
Period | 6/5/10 → … |
ASJC Scopus subject areas
- Linguistics and Language
- Language and Linguistics
- Computer Science Applications