TY - GEN
T1 - Optimizing Iterative Algorithms for Social Network Sharding
AU - Deng, Zishi
AU - Suel, Torsten
N1 - Funding Information:
In future work, we plan to do evaluations on additional data sets, including web graphs and additional social networks. We also plan to implement our methods in a distributed environment, most likely MapReduce [6], to measure resulting running times on even larger graphs. Beyond these immediate goals, the problem of how to best perform graph sharding of large social networks in distributed environments remains an interesting research challenge with many open questions and the potential for further significant gains. Acknowledgements. This research was supported by NSF Grant IIS-1718680
Publisher Copyright:
© 2021 IEEE.
PY - 2021
Y1 - 2021
N2 - There has recently been significant interest in applications that require computations on massive graph structures, including scenarios where the graph is too large to be processed on a single machine. In this case, the graph needs to be partitioned into subgraphs that can be assigned to individual machines, in a process called graph or social network sharding. Given the sizes of the graphs involved, it is necessary or at least highly desirable that the partitioning itself can also be performed in a distributed manner, instead of running a sequential partitioning algorithm on a single node.We study such distributed algorithms for graph sharding, where the goal is to create subgraphs of roughly equal size that minimize the number of edges crossing subgraph boundaries. In particular, we focus on two well-known approaches that can be efficiently i mplemented i n M apReduce a nd r elated distributed computing paradigms: the Balanced Label Propagation algorithm of Ugander and Backstrom, and the method of Duong et al. based on the Bayesian Stochastic Block Modeling approach of Hofman and Wiggins. Our contributions are as follows: (1) We perform the first direct experimental comparison of the two approaches, which were independently proposed and published. (2) We propose and evaluate several enhancements of Balanced Label Propagation that result in improved graph shardings. (3) We propose and evaluate hybrid methods that perform label propagation both on individual nodes, as suggested by Ugander and Backstrom, and on stochastic blocks inferred using the approach of Duong et al.
AB - There has recently been significant interest in applications that require computations on massive graph structures, including scenarios where the graph is too large to be processed on a single machine. In this case, the graph needs to be partitioned into subgraphs that can be assigned to individual machines, in a process called graph or social network sharding. Given the sizes of the graphs involved, it is necessary or at least highly desirable that the partitioning itself can also be performed in a distributed manner, instead of running a sequential partitioning algorithm on a single node.We study such distributed algorithms for graph sharding, where the goal is to create subgraphs of roughly equal size that minimize the number of edges crossing subgraph boundaries. In particular, we focus on two well-known approaches that can be efficiently i mplemented i n M apReduce a nd r elated distributed computing paradigms: the Balanced Label Propagation algorithm of Ugander and Backstrom, and the method of Duong et al. based on the Bayesian Stochastic Block Modeling approach of Hofman and Wiggins. Our contributions are as follows: (1) We perform the first direct experimental comparison of the two approaches, which were independently proposed and published. (2) We propose and evaluate several enhancements of Balanced Label Propagation that result in improved graph shardings. (3) We propose and evaluate hybrid methods that perform label propagation both on individual nodes, as suggested by Ugander and Backstrom, and on stochastic blocks inferred using the approach of Duong et al.
KW - community detection
KW - graph partitioning
KW - label propagation
KW - network sharding
KW - social networks
UR - http://www.scopus.com/inward/record.url?scp=85125314378&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85125314378&partnerID=8YFLogxK
U2 - 10.1109/BigData52589.2021.9671621
DO - 10.1109/BigData52589.2021.9671621
M3 - Conference contribution
AN - SCOPUS:85125314378
T3 - Proceedings - 2021 IEEE International Conference on Big Data, Big Data 2021
SP - 400
EP - 408
BT - Proceedings - 2021 IEEE International Conference on Big Data, Big Data 2021
A2 - Chen, Yixin
A2 - Ludwig, Heiko
A2 - Tu, Yicheng
A2 - Fayyad, Usama
A2 - Zhu, Xingquan
A2 - Hu, Xiaohua Tony
A2 - Byna, Suren
A2 - Liu, Xiong
A2 - Zhang, Jianping
A2 - Pan, Shirui
A2 - Papalexakis, Vagelis
A2 - Wang, Jianwu
A2 - Cuzzocrea, Alfredo
A2 - Ordonez, Carlos
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2021 IEEE International Conference on Big Data, Big Data 2021
Y2 - 15 December 2021 through 18 December 2021
ER -