TY - GEN
T1 - The Interplay of Variant, Size, and Task Type in Arabic Pre-trained Language Models
AU - Inoue, Go
AU - Alhafni, Bashar
AU - Baimukan, Nurpeiis
AU - Bouamor, Houda
AU - Habash, Nizar
N1 - Funding Information:
This research was supported with Cloud TPUs from Google’s TensorFlow Research Cloud (TFRC). This work was also carried out on the High Performance Computing resources at New York University Abu Dhabi. The first and second authors were supported by the New York University Abu Dhabi Global PhD Student Fellowship program. We thank Salam Khalifa and Ossama Obeid for helpful discussions. We also thank the anonymous reviewers for their valuable comments.
Publisher Copyright:
© WANLP 2021 - 6th Arabic Natural Language Processing Workshop
PY - 2021
Y1 - 2021
AB - In this paper, we explore the effects of language variants, data sizes, and fine-tuning task types in Arabic pre-trained language models. To do so, we build three pre-trained language models across three variants of Arabic: Modern Standard Arabic (MSA), dialectal Arabic, and classical Arabic, in addition to a fourth language model which is pre-trained on a mix of the three. We also examine the importance of pre-training data size by building additional models that are pre-trained on a scaled-down set of the MSA variant. We compare our different models to each other, as well as to eight publicly available models by fine-tuning them on five NLP tasks spanning 12 datasets. Our results suggest that the variant proximity of pre-training data to fine-tuning data is more important than the pre-training data size. We exploit this insight in defining an optimized system selection model for the studied tasks.
UR - http://www.scopus.com/inward/record.url?scp=85113899606&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85113899606&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85113899606
T3 - WANLP 2021 - 6th Arabic Natural Language Processing Workshop, Proceedings of the Workshop
SP - 92
EP - 104
BT - WANLP 2021 - 6th Arabic Natural Language Processing Workshop, Proceedings of the Workshop
A2 - Habash, Nizar
A2 - Bouamor, Houda
A2 - Hajj, Hazem
A2 - Magdy, Walid
A2 - Zaghouani, Wajdi
A2 - Bougares, Fethi
A2 - Tomeh, Nadi
A2 - Farha, Ibrahim Abu
A2 - Touileb, Samia
PB - Association for Computational Linguistics (ACL)
T2 - 6th Arabic Natural Language Processing Workshop, WANLP 2021
Y2 - 19 April 2021
ER -