TY - GEN
T1 - A Sentiment Analysis Service Platform for Streamed Multilingual Tweets
AU - Karageorgou, Ioanna
AU - Liakos, Panagiotis
AU - Delis, Alex
N1 - Publisher Copyright:
© 2020 IEEE.
PY - 2020/12/10
Y1 - 2020/12/10
N2 - Micro-blogging and social-media platforms are now prominent forums for disseminating information, opinions and commentaries. Among these, Twitter enjoys an in-excess of 330M base of users who continually produce and consume information snippets. Users collectively create a voluminous and multi-lingual corpus in a very broad range of topics on a daily basis. The discourse generated in the blogosphere is often of prime interest and importance to individuals, organizations, and companies. These actors would certainly like to periodically receive an overall assessment of demonstrated sentiments on specific issues by automatically classifying tweets expressed in different languages in conjunction with big-data analytics. In this paper, we propose a scalable service platform that employs multilingual sentiment analysis to classify streamed-tweets and yields analytics for selected topics in real-time. We discuss the main component of our Spark-enabled platform as we seek to offer an effective big-data service that can: 1) dynamically handle voluminous as well as high-rate tweettraffic through a multi-component application exploiting the latest software developments, 2) accurately identify messages originated by non-genuine user-accounts, and 3) utilize the Spark machine-learning library (MLib) to successfully classify streamed multi-lingual messages in real-time, using multiple potentially distributed executors. To empower our service platform, we have adopted training sets and developed sentiment analysis (SA) models for English, French, and Greek that help classify streamed tweetswith high accuracy. While experimenting with our distributed analytical platform, we establish both accurate and real-time classification for tweetsexpressed in the above European languages.
AB - Micro-blogging and social-media platforms are now prominent forums for disseminating information, opinions and commentaries. Among these, Twitter enjoys an in-excess of 330M base of users who continually produce and consume information snippets. Users collectively create a voluminous and multi-lingual corpus in a very broad range of topics on a daily basis. The discourse generated in the blogosphere is often of prime interest and importance to individuals, organizations, and companies. These actors would certainly like to periodically receive an overall assessment of demonstrated sentiments on specific issues by automatically classifying tweets expressed in different languages in conjunction with big-data analytics. In this paper, we propose a scalable service platform that employs multilingual sentiment analysis to classify streamed-tweets and yields analytics for selected topics in real-time. We discuss the main component of our Spark-enabled platform as we seek to offer an effective big-data service that can: 1) dynamically handle voluminous as well as high-rate tweettraffic through a multi-component application exploiting the latest software developments, 2) accurately identify messages originated by non-genuine user-accounts, and 3) utilize the Spark machine-learning library (MLib) to successfully classify streamed multi-lingual messages in real-time, using multiple potentially distributed executors. To empower our service platform, we have adopted training sets and developed sentiment analysis (SA) models for English, French, and Greek that help classify streamed tweetswith high accuracy. While experimenting with our distributed analytical platform, we establish both accurate and real-time classification for tweetsexpressed in the above European languages.
KW - Classification of Multilingual tweets
KW - Real-time Big-data Analysis for Streamed tweets
KW - Service Platform for Language Analytics
KW - Spark-enabled Big-data Architecture
UR - http://www.scopus.com/inward/record.url?scp=85103831841&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85103831841&partnerID=8YFLogxK
U2 - 10.1109/BigData50022.2020.9377837
DO - 10.1109/BigData50022.2020.9377837
M3 - Conference contribution
AN - SCOPUS:85103831841
T3 - Proceedings - 2020 IEEE International Conference on Big Data, Big Data 2020
SP - 3262
EP - 3271
BT - Proceedings - 2020 IEEE International Conference on Big Data, Big Data 2020
A2 - Wu, Xintao
A2 - Jermaine, Chris
A2 - Xiong, Li
A2 - Hu, Xiaohua Tony
A2 - Kotevska, Olivera
A2 - Lu, Siyuan
A2 - Xu, Weijia
A2 - Aluru, Srinivas
A2 - Zhai, Chengxiang
A2 - Al-Masri, Eyhab
A2 - Chen, Zhiyuan
A2 - Saltz, Jeff
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 8th IEEE International Conference on Big Data, Big Data 2020
Y2 - 10 December 2020 through 13 December 2020
ER -