TY - GEN
T1 - Optimizing top-k document retrieval strategies for block-max indexes
AU - Dimopoulos, Constantinos
AU - Nepomnyachiy, Sergey
AU - Suel, Torsten
PY - 2013
Y1 - 2013
N2 - Large web search engines use significant hardware and energy resources to process hundreds of millions of queries each day, and a lot of research has focused on how to improve query processing e ciency. One general class of optimizations called early termination techniques is used in all major engines, and essentially involves computing top results without an exhaustive traversal and scoring of all potentially relevant index entries. Recent work in [9, 7] proposed several early termination algorithms for disjunctive top-k query processing, based on a new augmented index structure called Block-Max Index that enables aggressive skipping in the index. In this paper, we build on this work by studying new algorithms and optimizations for Block-Max indexes that achieve significant performance gains over the work in [9, 7]. We start by implementing and comparing Block-Max oriented algorithms based on the well-known Maxscore and WAND approaches. Then we study how to build better Block-Max index structures and design better index-traversal strategies, resulting in new algorithms that achieve a factor of 2 speed-up over the best results in [9] with acceptable space overheads. We also describe and evaluate a hierarchical algorithm for a new recursive Block-Max index structure.
AB - Large web search engines use significant hardware and energy resources to process hundreds of millions of queries each day, and a lot of research has focused on how to improve query processing e ciency. One general class of optimizations called early termination techniques is used in all major engines, and essentially involves computing top results without an exhaustive traversal and scoring of all potentially relevant index entries. Recent work in [9, 7] proposed several early termination algorithms for disjunctive top-k query processing, based on a new augmented index structure called Block-Max Index that enables aggressive skipping in the index. In this paper, we build on this work by studying new algorithms and optimizations for Block-Max indexes that achieve significant performance gains over the work in [9, 7]. We start by implementing and comparing Block-Max oriented algorithms based on the well-known Maxscore and WAND approaches. Then we study how to build better Block-Max index structures and design better index-traversal strategies, resulting in new algorithms that achieve a factor of 2 speed-up over the best results in [9] with acceptable space overheads. We also describe and evaluate a hierarchical algorithm for a new recursive Block-Max index structure.
KW - block-max inverted index
KW - docid-oriented block-max index
KW - early termination
KW - top-k query processing
UR - http://www.scopus.com/inward/record.url?scp=84874262067&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84874262067&partnerID=8YFLogxK
U2 - 10.1145/2433396.2433412
DO - 10.1145/2433396.2433412
M3 - Conference contribution
AN - SCOPUS:84874262067
SN - 9781450318693
T3 - WSDM 2013 - Proceedings of the 6th ACM International Conference on Web Search and Data Mining
SP - 113
EP - 122
BT - WSDM 2013 - Proceedings of the 6th ACM International Conference on Web Search and Data Mining
T2 - 6th ACM International Conference on Web Search and Data Mining, WSDM 2013
Y2 - 4 February 2013 through 8 February 2013
ER -