Studying the impact of language-independent and language-specific features on hybrid Arabic Person name recognition

Mai Oudah, Khaled Shaalan

Research output: Contribution to journal (Article)

Abstract

In this paper, we conduct extensive experiments to study the impact of different categories of features, both in isolation and added incrementally, on Arabic Person name recognition. We present a hybrid system that integrates a rule-based approach with a machine learning (ML)-based approach. Our feature space comprises language-independent and language-specific features. The explored features fall naturally into six categories: Person named entity tags predicted by the rule-based component, word-level features, POS features, morphological features, gazetteer features, and other contextual features. Because the decision tree algorithm has shown comparatively higher efficiency as a classifier in current state-of-the-art hybrid Named Entity Recognition for Arabic, it is adopted in this study as the ML technique used by the hybrid system. The experiments therefore focus on two dimensions: the standard dataset used and the set of selected features. Several standard datasets are used to train and test the hybrid system, including ACE (2003–2004) and ANERcorp. The experimental analysis indicates that both language-independent and language-specific features play an important role in overcoming the challenges posed by the Arabic language and have a critical impact on optimizing the performance of the hybrid system.
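To make the hybrid idea concrete, the following is a minimal sketch of how a rule-based prediction might be combined with other features into one feature vector per token before being passed to a classifier such as a decision tree. It is not the authors' implementation: the gazetteer entries, the honorific trigger rule, the feature names, and the transliterated example tokens are all assumptions chosen for illustration, and POS and morphological features are omitted because they would require an external analyzer.

```python
# Hypothetical sketch only: every name, rule, and list below is an assumption
# made for illustration and is not taken from the paper.

PERSON_GAZETTEER = {"khaled", "mai"}               # toy gazetteer (assumed)
HONORIFIC_TRIGGERS = {"dr.", "mr.", "president"}   # toy rule triggers (assumed)


def rule_based_tag(tokens, i):
    """Toy rule: a token that follows an honorific trigger is tagged PERSON."""
    if i > 0 and tokens[i - 1].lower() in HONORIFIC_TRIGGERS:
        return "PERSON"
    return "O"


def extract_features(tokens, i):
    """Build one feature dictionary for the token at position i."""
    token = tokens[i]
    return {
        # Category 1: tag predicted by the rule-based component
        "rule_tag": rule_based_tag(tokens, i),
        # Word-level feature
        "length": len(token),
        # Gazetteer feature
        "in_gazetteer": token.lower() in PERSON_GAZETTEER,
        # Contextual feature: the previous token, or a sentence-start marker
        "prev_token": tokens[i - 1].lower() if i > 0 else "<s>",
    }


tokens = ["Dr.", "Khaled", "spoke"]
features = [extract_features(tokens, i) for i in range(len(tokens))]
```

Each dictionary in `features` could then be vectorized and used as one training or test instance for the ML component, so the classifier sees the rule-based decision as just one signal among several.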

Original language: English (US)
Pages (from-to): 351-378
Number of pages: 28
Journal: Language Resources and Evaluation
Volume: 51
Issue number: 2
State: Published - Jun 1 2017

Keywords

  • Hybrid approach
  • Information extraction
  • Machine learning
  • Named entity recognition
  • Natural language processing
  • Rule-based approach

ASJC Scopus subject areas

  • Language and Linguistics
  • Education
  • Linguistics and Language
  • Library and Information Sciences
