TY - GEN
T1 - Benchmarking Dialectal Arabic-Turkish Machine Translation
AU - Alkheder, Hasan
AU - Bouamor, Houda
AU - Habash, Nizar
AU - Zengin, Ahmet
N1 - Publisher Copyright: © 2023 The authors. This article is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0)
PY - 2023
Y1 - 2023
AB - Due to the significant influx of Syrian refugees in Turkey in recent years, the Syrian Arabic dialect has become increasingly prevalent in certain regions of Turkey. Developing a machine translation system between Turkish and Syrian Arabic would be crucial in facilitating communication between the Turkish and Syrian communities in these regions, which can have a positive impact on various domains such as politics, trade, and humanitarian aid. Such a system would also contribute positively to the growing Arab-focused tourism industry in Turkey. In this paper, we present the first research effort exploring translation between Syrian Arabic and Turkish. We use a set of 2,000 parallel sentences from the MADAR corpus containing 25 different city dialects from different cities across the Arab world, in addition to Modern Standard Arabic (MSA), English, and French. Additionally, we explore the translation performance into Turkish from other Arabic dialects and compare the results to the performance achieved when translating from Syrian Arabic. We build our MADAR-Turk data set by manually translating the set of 2,000 sentences from the Damascus dialect of Syria to Turkish with the help of two native Arabic speakers from Syria who are also highly fluent in Turkish. We evaluate the quality of the translations and report the results achieved. We make this first-of-a-kind data set publicly available to support research in machine translation between these important but less studied language pairs.
UR - http://www.scopus.com/inward/record.url?scp=85185219845&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85185219845&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85185219845
T3 - MT Summit 2023 - Proceedings of 19th Machine Translation Summit
SP - 261
EP - 271
BT - Research Track
A2 - Utiyama, Masao
A2 - Wang, Rui
PB - Asia-Pacific Association for Machine Translation
T2 - 19th Machine Translation Summit, MT Summit 2023
Y2 - 4 September 2023 through 8 September 2023
ER -