TY - GEN
T1 - Fast Disjunctive Candidate Generation Using Live Block Filtering
AU - Mallia, Antonio
AU - Siedlaczek, Michał
AU - Suel, Torsten
N1 - Funding Information:
Our work leaves a number of open questions for future research. For example, one could try to further improve the speed of the Range-DRAAT approach by designing new compressed index formats that enable the use of SIMD commands for score accumulation, or that support faster on-the-fly creation of block-max arrays. There is also the potential to design other types of algorithms under the live-block approach that outperform our methods. Acknowledgements. This research was supported by NSF Grant IIS-1718680 and a grant from Amazon.
Publisher Copyright:
© 2021 ACM.
PY - 2021/8/3
Y1 - 2021/8/3
N2 - A lot of research has focused on the efficiency of search engine query processing, and in particular on disjunctive top-k queries that return the highest scoring k results that contain at least one of the query terms. Disjunctive top-k queries over simple ranking functions are commonly used to retrieve an initial set of candidate results that are then reranked by more complex, often machine-learned rankers. Many optimized top-k algorithms have been proposed, including MaxScore, WAND, BMW, and JASS. While the fastest methods achieve impressive results on top-10 and top-100 queries, they tend to become much slower for the larger k commonly used for candidate generation. In this paper, we focus on disjunctive top-k queries for larger k. We propose new algorithms that achieve much faster query processing for values of k up to thousands or tens of thousands. Our algorithms build on top of the live-block filtering approach of Dimopoulos et al, and exploit the SIMD capabilities of modern CPUs. We also perform a detailed experimental comparison of our methods with the fastest known approaches, and release a full model implementation of our methods and of the underlying live-block mechanism, which will allows others to design and experiment with additional methods under the live-block approach.
AB - A lot of research has focused on the efficiency of search engine query processing, and in particular on disjunctive top-k queries that return the highest scoring k results that contain at least one of the query terms. Disjunctive top-k queries over simple ranking functions are commonly used to retrieve an initial set of candidate results that are then reranked by more complex, often machine-learned rankers. Many optimized top-k algorithms have been proposed, including MaxScore, WAND, BMW, and JASS. While the fastest methods achieve impressive results on top-10 and top-100 queries, they tend to become much slower for the larger k commonly used for candidate generation. In this paper, we focus on disjunctive top-k queries for larger k. We propose new algorithms that achieve much faster query processing for values of k up to thousands or tens of thousands. Our algorithms build on top of the live-block filtering approach of Dimopoulos et al, and exploit the SIMD capabilities of modern CPUs. We also perform a detailed experimental comparison of our methods with the fastest known approaches, and release a full model implementation of our methods and of the underlying live-block mechanism, which will allows others to design and experiment with additional methods under the live-block approach.
KW - early termination
KW - inverted index
KW - top-k query processing
UR - http://www.scopus.com/inward/record.url?scp=85103041122&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85103041122&partnerID=8YFLogxK
U2 - 10.1145/3437963.3441813
DO - 10.1145/3437963.3441813
M3 - Conference contribution
AN - SCOPUS:85103041122
T3 - WSDM 2021 - Proceedings of the 14th ACM International Conference on Web Search and Data Mining
SP - 671
EP - 679
BT - WSDM 2021 - Proceedings of the 14th ACM International Conference on Web Search and Data Mining
PB - Association for Computing Machinery, Inc
T2 - 14th ACM International Conference on Web Search and Data Mining, WSDM 2021
Y2 - 8 March 2021 through 12 March 2021
ER -