TY - GEN
T1 - A Characterization Study of Arabic Twitter Data with a Benchmarking for State-of-the-Art Opinion Mining Models
AU - Baly, Ramy
AU - Badaro, Gilbert
AU - El-Khoury, Georges
AU - Moukalled, Rawan
AU - Aoun, Rita
AU - Hajj, Hazem
AU - El-Hajj, Wassim
AU - Habash, Nizar
AU - Shaban, Khaled Bashir
N1 - Publisher Copyright: ©2017 Association for Computational Linguistics
PY - 2017
Y1 - 2017
N2 - Opinion mining in Arabic is a challenging task given the rich morphology of the language. The task becomes more challenging when it is applied to Twitter data, which contains additional sources of noise, such as the use of unstandardized dialectal variations, the non-conformation to grammatical rules, the use of Arabizi and code-switching, and the use of non-text objects such as images and URLs to express opinion. In this paper, we perform an analytical study to observe how such linguistic phenomena vary across different Arab regions. This study of Arabic Twitter characterization aims at providing better understanding of Arabic Tweets, and fostering advanced research on the topic. Furthermore, we explore the performance of the two schools of machine learning on Arabic Twitter, namely the feature engineering approach and the deep learning approach. We consider models that have achieved state-of-the-art performance for opinion mining in English. Results highlight the advantages of using deep learning-based models, and confirm the importance of using morphological abstractions to address Arabic’s complex morphology.
UR - http://www.scopus.com/inward/record.url?scp=85122598038&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85122598038&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85122598038
T3 - WANLP 2017, co-located with EACL 2017 - 3rd Arabic Natural Language Processing Workshop, Proceedings of the Workshop
SP - 110
EP - 118
BT - WANLP 2017, co-located with EACL 2017 - 3rd Arabic Natural Language Processing Workshop, Proceedings of the Workshop
PB - Association for Computational Linguistics (ACL)
T2 - 3rd Arabic Natural Language Processing Workshop, WANLP 2017 held at EACL 2017
Y2 - 3 April 2017
ER -