TY - GEN
T1 - A Comparison of Top-k Threshold Estimation Techniques for Disjunctive Query Processing
AU - Mallia, Antonio
AU - Siedlaczek, Michal
AU - Sun, Mengyang
AU - Suel, Torsten
N1 - Funding Information:
Acknowledgements. This research was supported by NSF Grant IIS-1718680 and a grant from Amazon.
Publisher Copyright:
© 2020 ACM.
PY - 2020/10/19
Y1 - 2020/10/19
N2 - In the top-k threshold estimation problem, given a query q, the goal is to estimate the score of the result at rank k. A good estimate of this score can result in significant performance improvements for several query processing scenarios, including selective search, index tiering, and widely used disjunctive query processing algorithms such as MaxScore, WAND, and BMW. Several approaches have been proposed, including parametric approaches, methods using random sampling, and a recent approach based on machine learning. However, previous work fails to perform any experimental comparison between these approaches. In this paper, we address this issue by reimplementing four major approaches and comparing them in terms of estimation error, running time, likelihood of an overestimate, and end-to-end performance when applied to common classes of disjunctive top-k query processing algorithms.
AB - In the top-k threshold estimation problem, given a query q, the goal is to estimate the score of the result at rank k. A good estimate of this score can result in significant performance improvements for several query processing scenarios, including selective search, index tiering, and widely used disjunctive query processing algorithms such as MaxScore, WAND, and BMW. Several approaches have been proposed, including parametric approaches, methods using random sampling, and a recent approach based on machine learning. However, previous work fails to perform any experimental comparison between these approaches. In this paper, we address this issue by reimplementing four major approaches and comparing them in terms of estimation error, running time, likelihood of an overestimate, and end-to-end performance when applied to common classes of disjunctive top-k query processing algorithms.
KW - query processing
KW - threshold estimation
KW - top-k document retrieval
UR - http://www.scopus.com/inward/record.url?scp=85095863214&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85095863214&partnerID=8YFLogxK
U2 - 10.1145/3340531.3412080
DO - 10.1145/3340531.3412080
M3 - Conference contribution
AN - SCOPUS:85095863214
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 2141
EP - 2144
BT - CIKM 2020 - Proceedings of the 29th ACM International Conference on Information and Knowledge Management
PB - Association for Computing Machinery
T2 - 29th ACM International Conference on Information and Knowledge Management, CIKM 2020
Y2 - 19 October 2020 through 23 October 2020
ER -