Fast Disjunctive Candidate Generation Using Live Block Filtering

Antonio Mallia, Michał Siedlaczek, Torsten Suel

    Research output: Chapter in Book/Report/Conference proceedingConference contribution

    Abstract

    A lot of research has focused on the efficiency of search engine query processing, and in particular on disjunctive top-k queries that return the highest scoring k results that contain at least one of the query terms. Disjunctive top-k queries over simple ranking functions are commonly used to retrieve an initial set of candidate results that are then reranked by more complex, often machine-learned rankers. Many optimized top-k algorithms have been proposed, including MaxScore, WAND, BMW, and JASS. While the fastest methods achieve impressive results on top-10 and top-100 queries, they tend to become much slower for the larger k commonly used for candidate generation. In this paper, we focus on disjunctive top-k queries for larger k. We propose new algorithms that achieve much faster query processing for values of k up to thousands or tens of thousands. Our algorithms build on top of the live-block filtering approach of Dimopoulos et al, and exploit the SIMD capabilities of modern CPUs. We also perform a detailed experimental comparison of our methods with the fastest known approaches, and release a full model implementation of our methods and of the underlying live-block mechanism, which will allows others to design and experiment with additional methods under the live-block approach.

    Original languageEnglish (US)
    Title of host publicationWSDM 2021 - Proceedings of the 14th ACM International Conference on Web Search and Data Mining
    PublisherAssociation for Computing Machinery, Inc
    Pages671-679
    Number of pages9
    ISBN (Electronic)9781450382977
    DOIs
    StatePublished - Aug 3 2021
    Event14th ACM International Conference on Web Search and Data Mining, WSDM 2021 - Virtual, Online, Israel
    Duration: Mar 8 2021Mar 12 2021

    Publication series

    NameWSDM 2021 - Proceedings of the 14th ACM International Conference on Web Search and Data Mining

    Conference

    Conference14th ACM International Conference on Web Search and Data Mining, WSDM 2021
    Country/TerritoryIsrael
    CityVirtual, Online
    Period3/8/213/12/21

    Keywords

    • early termination
    • inverted index
    • top-k query processing

    ASJC Scopus subject areas

    • Computer Networks and Communications
    • Computer Science Applications
    • Software

    Fingerprint

    Dive into the research topics of 'Fast Disjunctive Candidate Generation Using Live Block Filtering'. Together they form a unique fingerprint.

    Cite this