TY - GEN
T1 - Using conjunctions for faster disjunctive top-k queries
AU - Siedlaczek, Michał
AU - Mallia, Antonio
AU - Suel, Torsten
N1 - Funding Information:
ACKNOWLEDGMENTS This work was partially supported by NSF Grant IIS-1718680 and a grant from Amazon.
Publisher Copyright:
© 2022 ACM.
PY - 2022/2/11
Y1 - 2022/2/11
N2 - While current search engines use highly complex ranking functions with hundreds of features, they often perform an initial candidate generation step that uses a very simple ranking function to identify a limited set of promising candidates. A common approach is to use a disjunctive top-k query for this step. There are many methods for disjunctive top-k computation, but they tend to be slow for the required values of k, which are in the hundreds to thousands. We propose a new approach to safe disjunctive top-k computation that, somewhat counterintuitively, uses precomputed conjunctions of inverted lists to speed up disjunctive queries. The approach is based on a generalization of the well-known MaxScore algorithm, and utilizes recent improvements in threshold estimation techniques as well as new ideas to obtain significant improvements in performance. Our algorithms are implemented as an extension of the PISA framework for search-engine query processing, and available as open-source to support replication and follow-up work.
AB - While current search engines use highly complex ranking functions with hundreds of features, they often perform an initial candidate generation step that uses a very simple ranking function to identify a limited set of promising candidates. A common approach is to use a disjunctive top-k query for this step. There are many methods for disjunctive top-k computation, but they tend to be slow for the required values of k, which are in the hundreds to thousands. We propose a new approach to safe disjunctive top-k computation that, somewhat counterintuitively, uses precomputed conjunctions of inverted lists to speed up disjunctive queries. The approach is based on a generalization of the well-known MaxScore algorithm, and utilizes recent improvements in threshold estimation techniques as well as new ideas to obtain significant improvements in performance. Our algorithms are implemented as an extension of the PISA framework for search-engine query processing, and available as open-source to support replication and follow-up work.
KW - Candidate generation
KW - Query processing
KW - Top-k retrieval
UR - http://www.scopus.com/inward/record.url?scp=85125814284&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85125814284&partnerID=8YFLogxK
U2 - 10.1145/3488560.3498489
DO - 10.1145/3488560.3498489
M3 - Conference contribution
AN - SCOPUS:85125814284
T3 - WSDM 2022 - Proceedings of the 15th ACM International Conference on Web Search and Data Mining
SP - 917
EP - 927
BT - WSDM 2022 - Proceedings of the 15th ACM International Conference on Web Search and Data Mining
PB - Association for Computing Machinery, Inc
T2 - 15th ACM International Conference on Web Search and Data Mining, WSDM 2022
Y2 - 21 February 2022 through 25 February 2022
ER -