Benchmarking Dialectal Arabic-Turkish Machine Translation

Hasan Alkheder, Houda Bouamor, Nizar Habash, Ahmet Zengin

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Due to the significant influx of Syrian refugees into Turkey in recent years, the Syrian Arabic dialect has become increasingly prevalent in certain regions of the country. A machine translation system between Turkish and Syrian Arabic would be crucial for facilitating communication between the Turkish and Syrian communities in these regions, with a positive impact on domains such as politics, trade, and humanitarian aid. Such a system would also benefit Turkey's growing Arab-focused tourism industry. In this paper, we present the first research effort exploring translation between Syrian Arabic and Turkish. We use a set of 2,000 parallel sentences from the MADAR corpus, which covers 25 city dialects from across the Arab world in addition to Modern Standard Arabic (MSA), English, and French. We build our MADAR-Turk data set by manually translating these 2,000 sentences from the Damascus dialect of Syria into Turkish, with the help of two native Arabic speakers from Syria who are also highly fluent in Turkish. Additionally, we explore translation into Turkish from other Arabic dialects and compare the results with those achieved when translating from Syrian Arabic. We evaluate the quality of the translations and report the results. We make this first-of-its-kind data set publicly available to support research on machine translation between these important but understudied language pairs.

Original language: English (US)
Title of host publication: Research Track
Editors: Masao Utiyama, Rui Wang
Publisher: Asia-Pacific Association for Machine Translation
Pages: 261-271
Number of pages: 11
ISBN (Electronic): 9780000000002
State: Published - 2023
Event: 19th Machine Translation Summit, MT Summit 2023 - Macau, China
Duration: Sep 4 2023 to Sep 8 2023

Publication series

Name: MT Summit 2023 - Proceedings of 19th Machine Translation Summit
Volume: 1

Conference

Conference: 19th Machine Translation Summit, MT Summit 2023
Country/Territory: China
City: Macau
Period: 9/4/23 to 9/8/23

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Software
