TY - GEN
T1 - Data Augmentation Techniques for Machine Translation of Code-Switched Texts
T2 - 2023 Findings of the Association for Computational Linguistics: EMNLP 2023
AU - Hamed, Injy
AU - Habash, Nizar
AU - Vu, Ngoc Thang
N1 - Publisher Copyright: © 2023 Association for Computational Linguistics.
PY - 2023
Y1 - 2023
AB - Code-switching (CSW) text generation has been receiving increasing attention as a solution to address data scarcity. In light of this growing interest, we need more comprehensive studies comparing different augmentation approaches. In this work, we compare three popular approaches: lexical replacements, linguistic theories, and back-translation (BT), in the context of Egyptian Arabic-English CSW. We assess the effectiveness of the approaches on machine translation and the quality of augmentations through human evaluation. We show that BT and CSW predictive-based lexical replacement, being trained on CSW parallel data, perform best on both tasks. Linguistic theories and random lexical replacement prove to be effective in the lack of CSW parallel data, where both approaches achieve similar results.
UR - http://www.scopus.com/inward/record.url?scp=85183291724&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85183291724&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85183291724
T3 - Findings of the Association for Computational Linguistics: EMNLP 2023
SP - 140
EP - 154
BT - Findings of the Association for Computational Linguistics
PB - Association for Computational Linguistics (ACL)
Y2 - 6 December 2023 through 10 December 2023
ER -