Spoken Arabic dialect identification using phonotactic modeling

Fadi Biadsy, Julia Hirschberg, Nizar Habash

Research output: Contribution to conference (paper, peer-reviewed)

Abstract

The Arabic language is a collection of multiple variants, among which Modern Standard Arabic (MSA) has a special status as the formal written standard language of the media, culture and education across the Arab world. The other variants are informal spoken dialects that are the media of communication for daily life. Arabic dialects differ substantially from MSA and each other in terms of phonology, morphology, lexical choice and syntax. In this paper, we describe a system that automatically identifies the Arabic dialect (Gulf, Iraqi, Levantine, Egyptian and MSA) of a speaker given a sample of his/her speech. The phonotactic approach we use proves to be effective in identifying these dialects with considerable overall accuracy: 81.60% using 30s test utterances.

Original language: English (US)
Pages: 53-61
Number of pages: 9
State: Published - 2009
Event: EACL 2009 Workshop on Computational Approaches to Semitic Languages, SEMITIC@EACL 2009 - Athens, Greece
Duration: Mar 31 2009 → …

Conference

Conference: EACL 2009 Workshop on Computational Approaches to Semitic Languages, SEMITIC@EACL 2009
Country/Territory: Greece
City: Athens
Period: 3/31/09 → …

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Software
  • Linguistics and Language
