## Abstract

There has recently been significant interest in applications that require computations on massive graph structures, including scenarios where the graph is too large to be processed on a single machine. In this case, the graph needs to be partitioned into subgraphs that can be assigned to individual machines, in a process called graph or social network sharding. Given the sizes of the graphs involved, it is necessary or at least highly desirable that the partitioning itself can also be performed in a distributed manner, instead of running a sequential partitioning algorithm on a single node.We study such distributed algorithms for graph sharding, where the goal is to create subgraphs of roughly equal size that minimize the number of edges crossing subgraph boundaries. In particular, we focus on two well-known approaches that can be efficiently i mplemented i n M apReduce a nd r elated distributed computing paradigms: the Balanced Label Propagation algorithm of Ugander and Backstrom, and the method of Duong et al. based on the Bayesian Stochastic Block Modeling approach of Hofman and Wiggins. Our contributions are as follows: (1) We perform the first direct experimental comparison of the two approaches, which were independently proposed and published. (2) We propose and evaluate several enhancements of Balanced Label Propagation that result in improved graph shardings. (3) We propose and evaluate hybrid methods that perform label propagation both on individual nodes, as suggested by Ugander and Backstrom, and on stochastic blocks inferred using the approach of Duong et al.

Original language | English (US) |
---|---|

Title of host publication | Proceedings - 2021 IEEE International Conference on Big Data, Big Data 2021 |

Editors | Yixin Chen, Heiko Ludwig, Yicheng Tu, Usama Fayyad, Xingquan Zhu, Xiaohua Tony Hu, Suren Byna, Xiong Liu, Jianping Zhang, Shirui Pan, Vagelis Papalexakis, Jianwu Wang, Alfredo Cuzzocrea, Carlos Ordonez |

Publisher | Institute of Electrical and Electronics Engineers Inc. |

Pages | 400-408 |

Number of pages | 9 |

ISBN (Electronic) | 9781665439022 |

DOIs | |

State | Published - 2021 |

Event | 2021 IEEE International Conference on Big Data, Big Data 2021 - Virtual, Online, United States Duration: Dec 15 2021 → Dec 18 2021 |

### Publication series

Name | Proceedings - 2021 IEEE International Conference on Big Data, Big Data 2021 |
---|

### Conference

Conference | 2021 IEEE International Conference on Big Data, Big Data 2021 |
---|---|

Country/Territory | United States |

City | Virtual, Online |

Period | 12/15/21 → 12/18/21 |

## Keywords

- community detection
- graph partitioning
- label propagation
- network sharding
- social networks

## ASJC Scopus subject areas

- Information Systems and Management
- Artificial Intelligence
- Computer Vision and Pattern Recognition
- Information Systems