Document Reordering for Faster Intersection

Qi Wang, Torsten Suel

    Research output: Contribution to journalConference articlepeer-review

    Abstract

    A lot of research has studied how to optimize inverted in- dex structures in search engines through suitable reassign- ment of document identifiers. This approach was originally proposed to allow for better compression of the index, but subsequent work showed that it can also result in significant speed-ups for conjunctive queries and even certain types of disjunctive top-k algorithms. However, we do not have a good understanding of why this happens, and how we could directly optimize an index for query processing speed. As a result, existing techniques attempt to optimize for size, and treat speed increases as a welcome side-effect. In this paper, we take an initial but important step to- wards understanding and modeling speed increases due to document reordering. We define the problem of minimizing the cost of queries given an inverted index and a query dis- tribution, relate it to work on adaptive set intersection, and show that it is fundamentally different from that of mini- mizing compressed index size. We then propose a heuristic algorithm for finding a document reordering that minimizes query processing costs under suitable cost models. Our ex- periments show significant increases in the speed of inter- sections over state-of-the-art reordering techniques.

    Original languageEnglish (US)
    Pages (from-to)475-487
    Number of pages13
    JournalProceedings of the VLDB Endowment
    Volume12
    Issue number5
    DOIs
    StatePublished - 2018
    Event45th International Conference on Very Large Data Bases, VLDB 2019 - Los Angeles, United States
    Duration: Aug 26 2017Aug 30 2017

    ASJC Scopus subject areas

    • Computer Science (miscellaneous)
    • General Computer Science

    Fingerprint

    Dive into the research topics of 'Document Reordering for Faster Intersection'. Together they form a unique fingerprint.

    Cite this