ArzEn-ST: A Three-way Speech Translation Corpus for Code-Switched Egyptian Arabic - English

Injy Hamed, Nizar Habash, Slim Abdennadher, Ngoc Thang Vu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We present our work on collecting ArzEn-ST, a code-switched Egyptian Arabic-English speech translation corpus. This corpus extends the ArzEn speech corpus, which was collected through informal interviews with bilingual speakers. In this work, we collect translations in both directions, into monolingual Egyptian Arabic and into monolingual English, forming a three-way speech translation corpus. We make the translation guidelines and corpus publicly available. We also report results for baseline machine translation and speech translation systems. We believe this is a valuable resource that can motivate and facilitate further research studying the code-switching phenomenon from a linguistic perspective, and that can be used to train and evaluate NLP systems.

Original language: English (US)
Title of host publication: WANLP 2022 - 7th Arabic Natural Language Processing - Proceedings of the Workshop
Publisher: Association for Computational Linguistics (ACL)
Pages: 119-130
Number of pages: 12
ISBN (Electronic): 9781959429272
State: Published - 2022
Event: 7th Arabic Natural Language Processing Workshop, WANLP 2022, held with EMNLP 2022 - Abu Dhabi, United Arab Emirates
Duration: Dec 8 2022 → …

Publication series

Name: WANLP 2022 - 7th Arabic Natural Language Processing - Proceedings of the Workshop

Conference

Conference: 7th Arabic Natural Language Processing Workshop, WANLP 2022, held with EMNLP 2022
Country/Territory: United Arab Emirates
City: Abu Dhabi
Period: 12/8/22 → …

ASJC Scopus subject areas

  • Language and Linguistics
  • Computational Theory and Mathematics
  • Software
  • Linguistics and Language
