Automatic Error Type Annotation for Arabic

Riadh Belkebir, Nizar Habash

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We present ARETA, an automatic error type annotation system for Modern Standard Arabic. We design ARETA to address Arabic’s morphological richness and orthographic ambiguity. We base our error taxonomy on the Arabic Learner Corpus (ALC) Error Tagset with some modifications. ARETA achieves a performance of 85.8% (micro average F1 score) on a manually annotated blind test portion of ALC. We also demonstrate ARETA’s usability by applying it to a number of submissions from the QALB 2014 shared task for Arabic grammatical error correction. The resulting analyses give helpful insights on the strengths and weaknesses of different submissions, which is more useful than the opaque M2 scoring metrics used in the shared task. ARETA employs a large Arabic morphological analyzer, but is completely unsupervised otherwise. We make ARETA publicly available.

Original language: English (US)
Title of host publication: CoNLL 2021 - 25th Conference on Computational Natural Language Learning, Proceedings
Editors: Arianna Bisazza, Omri Abend
Publisher: Association for Computational Linguistics (ACL)
Pages: 596-606
Number of pages: 11
ISBN (Electronic): 9781955917056
State: Published - 2021
Event: 25th Conference on Computational Natural Language Learning, CoNLL 2021 - Virtual, Online
Duration: Nov 10 2021 to Nov 11 2021

Publication series

Name: CoNLL 2021 - 25th Conference on Computational Natural Language Learning, Proceedings

Conference

Conference: 25th Conference on Computational Natural Language Learning, CoNLL 2021
City: Virtual, Online
Period: 11/10/21 to 11/11/21

ASJC Scopus subject areas

  • Artificial Intelligence
  • Human-Computer Interaction
  • Linguistics and Language
