Beyond Quantile Methods: Improved Top-K Threshold Estimation for Traditional and Learned Sparse Indexes

Jinrui Gou, Yifan Liu, Minghao Shao, Torsten Suel

    Research output: Chapter in Book/Report/Conference proceedingConference contribution

    Abstract

    Top-k threshold estimation is the problem of estimating the score of the k-th highest ranking result of a search query. A good estimate can be used to speed up many common top-k query processing algorithms, and thus a number of researchers have recently studied the problem. Among the various approaches that have been proposed, quantile methods appear to give the best estimates overall at modest computational costs, followed by sampling-based methods in certain cases. In this paper, we make two main contributions. First, we study how to get even better estimates than the state of the art. Starting from quantile-based methods, we propose a series of enhancements that give improved estimates in terms of the commonly used mean under-prediction fraction (MUF). Second, we study the threshold estimation problem on recently proposed learned sparse index structures, showing that our methods also work well for these cases. Our best methods substantially narrow the gap between the state of the art and the ideal MUF of 1.0, at some additional cost in time and space.

    Original languageEnglish (US)
    Title of host publicationProceedings - 2024 IEEE International Conference on Big Data, BigData 2024
    EditorsWei Ding, Chang-Tien Lu, Fusheng Wang, Liping Di, Kesheng Wu, Jun Huan, Raghu Nambiar, Jundong Li, Filip Ilievski, Ricardo Baeza-Yates, Xiaohua Hu
    PublisherInstitute of Electrical and Electronics Engineers Inc.
    Pages709-716
    Number of pages8
    ISBN (Electronic)9798350362480
    DOIs
    StatePublished - 2024
    Event2024 IEEE International Conference on Big Data, BigData 2024 - Washington, United States
    Duration: Dec 15 2024Dec 18 2024

    Publication series

    NameProceedings - 2024 IEEE International Conference on Big Data, BigData 2024

    Conference

    Conference2024 IEEE International Conference on Big Data, BigData 2024
    Country/TerritoryUnited States
    CityWashington
    Period12/15/2412/18/24

    Keywords

    • candidate generation
    • threshold estimation
    • top-k query processing

    ASJC Scopus subject areas

    • Artificial Intelligence
    • Computer Networks and Communications
    • Computer Science Applications
    • Information Systems
    • Information Systems and Management
    • Safety, Risk, Reliability and Quality
    • Modeling and Simulation

    Fingerprint

    Dive into the research topics of 'Beyond Quantile Methods: Improved Top-K Threshold Estimation for Traditional and Learned Sparse Indexes'. Together they form a unique fingerprint.

    Cite this