On the optimal time/space tradeoff for hash tables

Michael A. Bender, Martín Farach-Colton, John Kuszmaul, William Kuszmaul, Mingmou Liu

    Research output: Chapter in Book/Report/Conference proceedingConference contribution

    Abstract

    For nearly six decades, the central open question in the study of hash tables has been to determine the optimal achievable tradeoff curve between time and space. State-of-the-art hash tables offer the following guarantee: If keys/values are (logn) bits each, then it is possible to achieve constant-time insertions/deletions/queries while wasting only O(loglogn) bits of space per key when compared to the information-theoretic optimum-this bound has been proven to be optimal for a number of closely related problems (e.g., stable hashing, dynamic retrieval, and dynamically-resized filters). This paper shows that O(loglogn) wasted bits per key is not the end of the line for hashing. In fact, for any k [log∗ n], it is possible to achieve O(k)-time insertions/deletions, O(1)-time queries, and the k-th iterated logarithm O(log(k) n) wasted bits per key (all with high probability in n), while also supporting dynamic resizing as the size of the table changes. We further show that this tradeoff curve is the best achievable by any of a large class of hash tables, including any hash table designed using the current framework for making constant-time hash tables succinct. Our result holds for arbitrarily large keys/values, and in the case where keys/values are very small, we can tighten our bounds to o(1) wasted bits per key. Building on this, we obtain a constant-time dynamic filter that uses n glog"-1 g‰+ n loge + o(n) bits of space for a wide choice of false-positive rates ", resolving a long-standing open problem for the design of dynamic filters.

    Original languageEnglish (US)
    Title of host publicationSTOC 2022 - Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing
    EditorsStefano Leonardi, Anupam Gupta
    PublisherAssociation for Computing Machinery
    Pages1284-1297
    Number of pages14
    ISBN (Electronic)9781450392648
    DOIs
    StatePublished - Sep 6 2022
    Event54th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2022 - Rome, Italy
    Duration: Jun 20 2022Jun 24 2022

    Publication series

    NameProceedings of the Annual ACM Symposium on Theory of Computing
    ISSN (Print)0737-8017

    Conference

    Conference54th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2022
    Country/TerritoryItaly
    CityRome
    Period6/20/226/24/22

    Keywords

    • balls and bins
    • data structures
    • hash tables
    • succinct

    ASJC Scopus subject areas

    • Software

    Cite this