Abstract
Most Arabic Named Entity Recognition (NER) systems have been developed using one of two approaches: a rule-based approach or a Machine Learning (ML) based approach, each with its own strengths and weaknesses. In this paper, the problem of Arabic NER is tackled by integrating the two approaches in a pipelined process to create a hybrid system, with the aim of enhancing the overall performance of NER tasks. The proposed system is capable of recognizing 11 different types of named entities (NEs): Person, Location, Organization, Date, Time, Price, Measurement, Percent, Phone Number, ISBN and File Name. Extensive experiments are conducted using three different ML classifiers to evaluate the overall performance of the hybrid system. The empirical results indicate that the hybrid approach outperforms both the rule-based and the ML-based approaches. Moreover, our system outperforms the state of the art in Arabic NER in terms of accuracy when applied to the ANERcorp dataset, with F-measures of 94.4% for Person, 90.1% for Location, and 88.2% for Organization.
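As a rough illustration of what a pipelined hybrid can look like, the sketch below runs a rule stage first and passes its decisions to a later ML stage as features. It is a minimal sketch under stated assumptions, not the system described in the paper: the two regular expressions, the feature names, and the helpers `rule_tag`, `build_features`, and `f_measure` are all hypothetical, and the abstract does not specify how the two stages are connected or which F-measure variant is reported (the balanced form is assumed here).

```python
import re

# Hypothetical rule stage: toy patterns for two of the pattern-like entity
# types named in the abstract. Real rules would be far richer.
RULES = [
    ("Percent", re.compile(r"^\d+(\.\d+)?%$")),
    ("Date", re.compile(r"^\d{1,2}/\d{1,2}/\d{2,4}$")),
]


def rule_tag(token):
    """Return the label of the first rule that matches, or 'O' if none does."""
    for label, pattern in RULES:
        if pattern.match(token):
            return label
    return "O"


def build_features(tokens):
    """Build per-token feature dicts for a downstream ML classifier.

    The rule-based decision for the token and its neighbours is included as
    a feature, which is one common way to chain the two stages (assumed).
    """
    rule_tags = [rule_tag(t) for t in tokens]
    features = []
    for i, token in enumerate(tokens):
        features.append({
            "token": token,
            "rule_tag": rule_tags[i],
            "prev_rule_tag": rule_tags[i - 1] if i > 0 else "BOS",
            "next_rule_tag": rule_tags[i + 1] if i < len(tokens) - 1 else "EOS",
        })
    return features


def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall (assumed variant)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

The feature dicts returned by `build_features` could be fed to any classifier that accepts categorical features; the choice of classifier is left open here, as the abstract only states that three were compared.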
Original language | English (US) |
---|---|
Pages | 2159-2176 |
Number of pages | 18 |
State | Published - 2012 |
Event | 24th International Conference on Computational Linguistics, COLING 2012 - Mumbai, India; Duration: Dec 8 2012 → Dec 15 2012 |
Other
Other | 24th International Conference on Computational Linguistics, COLING 2012 |
---|---|
Country/Territory | India |
City | Mumbai |
Period | 12/8/12 → 12/15/12 |
Keywords
- Machine learning
- Named entity recognition
- Natural language processing
ASJC Scopus subject areas
- Computational Theory and Mathematics
- Language and Linguistics
- Linguistics and Language