TY - GEN
T1 - AraBART: a Pretrained Arabic Sequence-to-Sequence Model for Abstractive Summarization
T2 - 7th Arabic Natural Language Processing Workshop, WANLP 2022 held with EMNLP 2022
AU - Eddine, Moussa Kamal
AU - Tomeh, Nadi
AU - Habash, Nizar
AU - Le Roux, Joseph
AU - Vazirgiannis, Michalis
N1 - Funding Information:
This work was granted access to the HPC resources of IDRIS under the allocation 2021-AD011012694 made by GENCI.
Publisher Copyright:
© 2022 Association for Computational Linguistics.
PY - 2022
Y1 - 2022
N2 - As with most natural language understanding and generation tasks, state-of-the-art models for summarization are transformer-based sequence-to-sequence architectures that are pretrained on large corpora. While most existing models focus on English, Arabic remains understudied. In this paper, we propose AraBART, the first Arabic model in which the encoder and the decoder are pretrained end-to-end, based on BART (Lewis et al., 2020). We show that AraBART achieves the best performance on multiple abstractive summarization datasets, outperforming strong baselines including a pretrained Arabic BERT-based model, multilingual BART, Arabic T5, and a multilingual T5 model. AraBART is publicly available on GitHub and the Hugging Face model hub.
AB - As with most natural language understanding and generation tasks, state-of-the-art models for summarization are transformer-based sequence-to-sequence architectures that are pretrained on large corpora. While most existing models focus on English, Arabic remains understudied. In this paper, we propose AraBART, the first Arabic model in which the encoder and the decoder are pretrained end-to-end, based on BART (Lewis et al., 2020). We show that AraBART achieves the best performance on multiple abstractive summarization datasets, outperforming strong baselines including a pretrained Arabic BERT-based model, multilingual BART, Arabic T5, and a multilingual T5 model. AraBART is publicly available on GitHub and the Hugging Face model hub.
UR - http://www.scopus.com/inward/record.url?scp=85152937175&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85152937175&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85152937175
T3 - WANLP 2022 - 7th Arabic Natural Language Processing Workshop - Proceedings of the Workshop
SP - 31
EP - 42
BT - WANLP 2022 - 7th Arabic Natural Language Processing Workshop - Proceedings of the Workshop
PB - Association for Computational Linguistics (ACL)
Y2 - 8 December 2022
ER -