Exploring the Landscape of Distributed Graph Sketching

David Tench, Evan T. West, Kenny Zhang, Michael A. Bender, Daniel DeLayo, Martín Farach-Colton, Gilvir Gill, Tyler Seip, Victor Zhang

    Research output: Chapter in Book/Report/Conference proceedingConference contribution

    Abstract

    Recent work has initiated the study of dense graph processing using graph sketching methods, which drastically reduce space costs by lossily compressing information about the input graph. In this paper, we explore the strange and surprising performance landscape of sketching algorithms. We highlight both their surprising advantages for processing dense graphs that were previously prohibitively expensive to study, as well as the current limitations of the technique. Most notably, we show how sketching can avoid bottlenecks that limit conventional graph processing methods. Single-machine streaming graph processing systems are typically bottlenecked by CPU performance, and distributed graph processing systems are typically bottlenecked by network latency. We present Landscape, a distributed graph-stream processing system that uses linear sketching to distribute the CPU work of computing graph properties to distributed workers with no need for worker-to-worker communication. As a result, it overcomes the CPU and network bottlenecks that limit other systems. In fact, for the connected components problem, Landscape achieves a stream ingestion rate one-fourth that of maximum sustained RAM bandwidth, and is four times faster than random access RAM bandwidth. Additionally, we prove that for any sequence of graph updates and queries Landscape consumes at most a constant factor more network bandwidth than is required to receive the input stream. We show that this system can ingest up to 332 million stream updates per second on a graph with 217 vertices. We show that it scales well with more distributed compute power: given a cluster of 40 distributed worker machines, it can ingest updates 35 times as fast as with 1 distributed worker machine. Graph sketching algorithms tend to incur high computational costs when answering queries; to address this Landscape uses heuristics to reduce its query latency by up to four orders of magnitude over the prior state of the art.

    Original languageEnglish (US)
    Title of host publicationSIAM Symposium on Algorithm Engineering and Experiments, ALENEX 2025
    PublisherSociety for Industrial and Applied Mathematics Publications
    Pages133-146
    Number of pages14
    ISBN (Electronic)9798331311995
    StatePublished - 2025
    Event2025 SIAM Symposium on Algorithm Engineering and Experiments, ALENEX 2025 - New Orleans, United States
    Duration: Jan 12 2025Jan 13 2025

    Publication series

    NameProceedings of the Workshop on Algorithm Engineering and Experiments
    Volume2025-January
    ISSN (Print)2164-0300

    Conference

    Conference2025 SIAM Symposium on Algorithm Engineering and Experiments, ALENEX 2025
    Country/TerritoryUnited States
    CityNew Orleans
    Period1/12/251/13/25

    ASJC Scopus subject areas

    • General Engineering
    • Applied Mathematics

    Fingerprint

    Dive into the research topics of 'Exploring the Landscape of Distributed Graph Sketching'. Together they form a unique fingerprint.

    Cite this