Don't thrash: How to cache your hash on flash

Michael A. Bender, Martin Farach-Colton, Rob Johnson, Russell Kraner, Bradley C. Kuszmaul, Dzejla Medjedovic, Pablo Montes, Pradeep Shetty, Richard P. Spillane, Erez Zadok

    Research output: Contribution to journalArticlepeer-review

    Abstract

    This paper presents new alternatives to the well-known Bloom filter data structure. The Bloom filter, a compact data structure supporting set insertion and membership queries, has found wide application in databases, storage systems, and networks. Because the Bloom filter performs frequent random reads and writes, it is used almost exclu-sively in RAM, limiting the size of the sets it can represent. This paper first describes the quotient filter, which sup-ports the basic operations of the Bloom filter, achieving roughly comparable performance in terms of space and time, but with better data locality. Operations on the quotient fil-ter require only a small number of contiguous accesses. The quotient filter has other advantages over the Bloom filter: it supports deletions, it can be dynamically resized, and two quotient filters can be efficiently merged. The paper then gives two data structures, the buffered quotient filter and the cascade filter, which exploit the quo-tient filter advantages and thus serve as SSD-optimized al-ternatives to the Bloom filter. The cascade filter has better asymptotic I/O performance than the buffered quotient fil-ter, but the buffered quotient filter outperforms the cascade filter on small to medium data sets. Both data structures significantly outperform recently-proposed SSD-optimized Bloom filter variants, such as the elevator Bloom filter, buffered Bloom filter, and forest-structured Bloom filter. In experiments, the cascade filter and buffered quotient fil-ter performed insertions 8.6-11 times faster than the fastest Bloom filter variant and performed lookups 0.94-2.56 times faster.

    Original languageEnglish (US)
    Pages (from-to)1627-1637
    Number of pages11
    JournalProceedings of the VLDB Endowment
    Volume5
    Issue number11
    DOIs
    StatePublished - Jul 2012

    ASJC Scopus subject areas

    • Computer Science (miscellaneous)
    • General Computer Science

    Fingerprint

    Dive into the research topics of 'Don't thrash: How to cache your hash on flash'. Together they form a unique fingerprint.

    Cite this