Optimizing top-k document retrieval strategies for block-max indexes

Constantinos Dimopoulos, Sergey Nepomnyachiy, Torsten Suel

    Research output: Chapter in Book/Report/Conference proceedingConference contribution

    Abstract

    Large web search engines use significant hardware and energy resources to process hundreds of millions of queries each day, and a lot of research has focused on how to improve query processing e ciency. One general class of optimizations called early termination techniques is used in all major engines, and essentially involves computing top results without an exhaustive traversal and scoring of all potentially relevant index entries. Recent work in [9, 7] proposed several early termination algorithms for disjunctive top-k query processing, based on a new augmented index structure called Block-Max Index that enables aggressive skipping in the index. In this paper, we build on this work by studying new algorithms and optimizations for Block-Max indexes that achieve significant performance gains over the work in [9, 7]. We start by implementing and comparing Block-Max oriented algorithms based on the well-known Maxscore and WAND approaches. Then we study how to build better Block-Max index structures and design better index-traversal strategies, resulting in new algorithms that achieve a factor of 2 speed-up over the best results in [9] with acceptable space overheads. We also describe and evaluate a hierarchical algorithm for a new recursive Block-Max index structure.

    Original languageEnglish (US)
    Title of host publicationWSDM 2013 - Proceedings of the 6th ACM International Conference on Web Search and Data Mining
    Pages113-122
    Number of pages10
    DOIs
    StatePublished - 2013
    Event6th ACM International Conference on Web Search and Data Mining, WSDM 2013 - Rome, Italy
    Duration: Feb 4 2013Feb 8 2013

    Publication series

    NameWSDM 2013 - Proceedings of the 6th ACM International Conference on Web Search and Data Mining

    Other

    Other6th ACM International Conference on Web Search and Data Mining, WSDM 2013
    Country/TerritoryItaly
    CityRome
    Period2/4/132/8/13

    Keywords

    • block-max inverted index
    • docid-oriented block-max index
    • early termination
    • top-k query processing

    ASJC Scopus subject areas

    • Computer Networks and Communications
    • Computer Science Applications

    Fingerprint

    Dive into the research topics of 'Optimizing top-k document retrieval strategies for block-max indexes'. Together they form a unique fingerprint.

    Cite this