A pipeline Arabic named entity recognition using a hybrid approach

Mai Mohamed Oudah, Khaled Shaalan

Research output: Contribution to conference › Paper › peer-review

Abstract

Most Arabic Named Entity Recognition (NER) systems have been developed using one of two approaches, rule-based or Machine Learning (ML) based, each with its own strengths and weaknesses. In this paper, the problem of Arabic NER is tackled by integrating the two approaches in a pipelined process to create a hybrid system, with the aim of enhancing the overall performance of NER tasks. The proposed system is capable of recognizing 11 different types of named entities (NEs): Person, Location, Organization, Date, Time, Price, Measurement, Percent, Phone Number, ISBN and File Name. Extensive experiments are conducted using three different ML classifiers to evaluate the overall performance of the hybrid system. The empirical results indicate that the hybrid approach outperforms both the rule-based and the ML-based approaches. Moreover, our system outperforms the state of the art in Arabic NER in terms of accuracy when applied to the ANERcorp dataset, with F-measures of 94.4% for Person, 90.1% for Location, and 88.2% for Organization.
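The pipelined idea in the abstract can be illustrated with a minimal sketch. It is not the authors' system: the gazetteer, the percent pattern, the feature set, and the toy frequency-based classifier (standing in for the three ML classifiers, which the abstract does not name) are all assumptions made for illustration, and transliterated tokens are used for readability. The sketch only shows the general shape of a pipeline in which a rule-based stage tags each token and its decision is passed on as a feature to a learned stage.

```python
import re
from collections import Counter, defaultdict

# Hypothetical gazetteer and pattern; not taken from the paper.
GAZETTEER = {"Cairo": "Location", "Dubai": "Location"}
PERCENT = re.compile(r"^\d+(\.\d+)?%$")

def rule_tag(token):
    """Stage 1: rule-based decision for a single token."""
    if PERCENT.match(token):
        return "Percent"
    return GAZETTEER.get(token, "O")

def features(tokens, i):
    """Stage 2 input: the rule-based tag plus the previous word."""
    prev_word = tokens[i - 1] if i > 0 else "<s>"
    return (rule_tag(tokens[i]), prev_word)

class FrequencyClassifier:
    """Toy stand-in for an ML classifier: majority label per feature tuple."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def fit(self, X, y):
        for x, label in zip(X, y):
            self.counts[x][label] += 1
        return self

    def predict(self, X):
        # Unseen feature tuples fall back to the rule-based tag, x[0].
        return [
            self.counts[x].most_common(1)[0][0] if x in self.counts else x[0]
            for x in X
        ]

def hybrid_tag(model, tokens):
    """Run both stages: rule-based features first, then the learned model."""
    X = [features(tokens, i) for i in range(len(tokens))]
    return list(zip(tokens, model.predict(X)))

# Tiny hand-made training example (illustrative only).
train_tokens = ["Mr.", "Khaled", "visited", "Cairo"]
train_labels = ["O", "Person", "O", "Location"]
model = FrequencyClassifier().fit(
    [features(train_tokens, i) for i in range(len(train_tokens))],
    train_labels,
)

print(hybrid_tag(model, ["Mr.", "Omar", "visited", "Dubai", "20%"]))
```

In this toy run, "Omar" is labelled Person from the learned context even though no rule covers it, while "20%" keeps the Percent tag assigned by the rule-based stage. That is the kind of complementarity a hybrid pipeline is meant to exploit.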

Original language: English (US)
Pages: 2159-2176
Number of pages: 18
State: Published - 2012
Event: 24th International Conference on Computational Linguistics, COLING 2012 - Mumbai, India
Duration: Dec 8 2012 to Dec 15 2012

Other

Other: 24th International Conference on Computational Linguistics, COLING 2012
Country/Territory: India
City: Mumbai
Period: 12/8/12 to 12/15/12

Keywords

  • Machine learning
  • Named entity recognition
  • Natural language processing

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Language and Linguistics
  • Linguistics and Language
