TY - GEN
T1 - Top-k aggregation using intersections of ranked inputs
AU - Kumar, Ravi
AU - Punera, Kunal
AU - Suel, Torsten
AU - Vassilvitskii, Sergei
PY - 2009
Y1 - 2009
N2 - There has been considerable past work on efficiently computing top k objects by aggregating information from multiple ranked lists of these objects. An important instance of this problem is query processing in search engines: One has to combine information from several different posting lists (rankings) of web pages (objects) to obtain the top k web pages to answer user queries. Two particularly well-studied approaches to achieve efficiency in top-k aggregation include early-termination algorithms (e.g., TA and NRA) and pre-aggregation of some of the input lists. However, there has been little work on a rigorous treatment of combining these approaches. We generalize the TA and NRA algorithms to the case when pre-aggregated intersection lists are available in addition to the original lists. We show that our versions of TA and NRA continue to remain "instance optimal," a very strong optimality notion that is a highlight of the original TA and NRA algorithms. Using an index of millions of web pages and real-world search engine queries, we empirically characterize the performance gains offered by our new algorithms. We show that the practical benefits of intersection lists can be fully realized only with an early-termination algorithm.
AB - There has been considerable past work on efficiently computing top k objects by aggregating information from multiple ranked lists of these objects. An important instance of this problem is query processing in search engines: One has to combine information from several different posting lists (rankings) of web pages (objects) to obtain the top k web pages to answer user queries. Two particularly well-studied approaches to achieve efficiency in top-k aggregation include early-termination algorithms (e.g., TA and NRA) and pre-aggregation of some of the input lists. However, there has been little work on a rigorous treatment of combining these approaches. We generalize the TA and NRA algorithms to the case when pre-aggregated intersection lists are available in addition to the original lists. We show that our versions of TA and NRA continue to remain "instance optimal," a very strong optimality notion that is a highlight of the original TA and NRA algorithms. Using an index of millions of web pages and real-world search engine queries, we empirically characterize the performance gains offered by our new algorithms. We show that the practical benefits of intersection lists can be fully realized only with an early-termination algorithm.
KW - Early-termination
KW - Intersections
KW - NRA
KW - TA
UR - http://www.scopus.com/inward/record.url?scp=70349151328&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=70349151328&partnerID=8YFLogxK
U2 - 10.1145/1498759.1498830
DO - 10.1145/1498759.1498830
M3 - Conference contribution
AN - SCOPUS:70349151328
SN - 9781605583907
T3 - Proceedings of the 2nd ACM International Conference on Web Search and Data Mining, WSDM'09
SP - 222
EP - 231
BT - Proceedings of the 2nd ACM International Conference on Web Search and Data Mining, WSDM'09
T2 - 2nd ACM International Conference on Web Search and Data Mining, WSDM'09
Y2 - 9 February 2009 through 12 February 2009
ER -