Using conjunctions for faster disjunctive top-k queries

Michał Siedlaczek, Antonio Mallia, Torsten Suel

    Research output: Chapter in Book/Report/Conference proceedingConference contribution

    Abstract

    While current search engines use highly complex ranking functions with hundreds of features, they often perform an initial candidate generation step that uses a very simple ranking function to identify a limited set of promising candidates. A common approach is to use a disjunctive top-k query for this step. There are many methods for disjunctive top-k computation, but they tend to be slow for the required values of k, which are in the hundreds to thousands. We propose a new approach to safe disjunctive top-k computation that, somewhat counterintuitively, uses precomputed conjunctions of inverted lists to speed up disjunctive queries. The approach is based on a generalization of the well-known MaxScore algorithm, and utilizes recent improvements in threshold estimation techniques as well as new ideas to obtain significant improvements in performance. Our algorithms are implemented as an extension of the PISA framework for search-engine query processing, and available as open-source to support replication and follow-up work.

    Original languageEnglish (US)
    Title of host publicationWSDM 2022 - Proceedings of the 15th ACM International Conference on Web Search and Data Mining
    PublisherAssociation for Computing Machinery, Inc
    Pages917-927
    Number of pages11
    ISBN (Electronic)9781450391320
    DOIs
    StatePublished - Feb 11 2022
    Event15th ACM International Conference on Web Search and Data Mining, WSDM 2022 - Virtual, Online, United States
    Duration: Feb 21 2022Feb 25 2022

    Publication series

    NameWSDM 2022 - Proceedings of the 15th ACM International Conference on Web Search and Data Mining

    Conference

    Conference15th ACM International Conference on Web Search and Data Mining, WSDM 2022
    Country/TerritoryUnited States
    CityVirtual, Online
    Period2/21/222/25/22

    Keywords

    • Candidate generation
    • Query processing
    • Top-k retrieval

    ASJC Scopus subject areas

    • Computer Networks and Communications
    • Computer Science Applications
    • Software

    Fingerprint

    Dive into the research topics of 'Using conjunctions for faster disjunctive top-k queries'. Together they form a unique fingerprint.

    Cite this