AraSAS: The Open Source Arabic Semantic Tagger

Mahmoud El-Haj, Paul Rayson, Elvis de Souza, Nouran Khallaf, Nizar Habash

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

This paper presents (AraSAS) the first open-source Arabic semantic analysis tagging system. AraSAS is a software framework that provides full semantic tagging of text written in Arabic. AraSAS is based on the UCREL Semantic Analysis System (USAS) which was first developed to semantically tag English text. Similarly to USAS, AraSAS uses a hierarchical semantic tag set that contains 21 major discourse fields and 232 fine-grained semantic field tags. The paper describes the creation, validation and evaluation of AraSAS. In addition, we demonstrate a first case study to illustrate the affordances of applying USAS and AraSAS semantic taggers on the Zayed University Arabic-English Bilingual Undergraduate Corpus (ZAEBUC) (Palfreyman and Habash, 2022), where we show and compare the coverage of the two semantic taggers through running them on Arabic and English essays on different topics. The analysis expands to compare the taggers when run on texts in Arabic and English written by the same writer and texts written by male and by female students. Variables for comparison include frequency of use of particular semantic sub-domains, as well as the diversity of semantic elements within a text.

Original languageEnglish (US)
Title of host publication5th Workshop Open-Source Arabic Corpora and Processing Tools with Shared Tasks on Qur'an QA and Fine-Grained Hate Speech Detection, OSACT 2022 - Proceedings at Language Resources and Evaluation Conference, LREC 2022
EditorsHend Al-Khalifa, Tamer Elsayed, Hamdy Mubarak, Abdulmohsen Al-Thubaity, Walid Magdy, Kareem Darwish
PublisherEuropean Language Resources Association (ELRA)
Pages23-31
Number of pages9
ISBN (Electronic)9791095546757
StatePublished - 2022
Event5th Workshop Open-Source Arabic Corpora and Processing Tools with Shared Tasks on Qur'an QA and Fine-Grained Hate Speech Detection, OSACT 2022 - Marseille, France
Duration: Jun 20 2022Jun 25 2022

Publication series

Name5th Workshop Open-Source Arabic Corpora and Processing Tools with Shared Tasks on Qur'an QA and Fine-Grained Hate Speech Detection, OSACT 2022 - Proceedings at Language Resources and Evaluation Conference, LREC 2022

Conference

Conference5th Workshop Open-Source Arabic Corpora and Processing Tools with Shared Tasks on Qur'an QA and Fine-Grained Hate Speech Detection, OSACT 2022
Country/TerritoryFrance
CityMarseille
Period6/20/226/25/22

Keywords

  • Arabic
  • Corpus Linguistics
  • English
  • Semantics
  • Taggers

ASJC Scopus subject areas

  • Language and Linguistics
  • Education
  • Library and Information Sciences
  • Linguistics and Language

Fingerprint

Dive into the research topics of 'AraSAS: The Open Source Arabic Semantic Tagger'. Together they form a unique fingerprint.

Cite this