TY - CONF
T1 - Efficient query subscription processing for prospective search engines
AU - Irmak, Utku
AU - Mihaylov, Svilen
AU - Suel, Torsten
AU - Ganguly, Samrat
AU - Izmailov, Rauf
N1 - Funding Information:
CIS Department, Polytechnic University, Brooklyn, NY 11201. {uirmak@cis.poly.edu, suel@poly.edu}. The third author was also partially supported by NSF Awards IDM-0205647 and CCR-0093400, and the New York State Center for Advanced Technology in Telecommunications (CATT) at Polytechnic University.
Publisher Copyright:
© 2006 USENIX Association. All rights reserved.
PY - 2006
Y1 - 2006
N2 - Current web search engines are retrospective in that they limit users to searches against already existing pages. Prospective search engines, on the other hand, allow users to upload queries that will be applied to newly discovered pages in the future. Some examples of prospective search are the subscription features in Google News and in RSS-based blog search engines. In this paper, we study the problem of efficiently processing large numbers of keyword query subscriptions against a stream of newly discovered documents, and propose several query processing optimizations for prospective search. Our experimental evaluation shows that these techniques can improve the throughput of a well-known algorithm by more than a factor of 20, and allow matching hundreds or thousands of incoming documents per second against millions of subscription queries per node.
AB - Current web search engines are retrospective in that they limit users to searches against already existing pages. Prospective search engines, on the other hand, allow users to upload queries that will be applied to newly discovered pages in the future. Some examples of prospective search are the subscription features in Google News and in RSS-based blog search engines. In this paper, we study the problem of efficiently processing large numbers of keyword query subscriptions against a stream of newly discovered documents, and propose several query processing optimizations for prospective search. Our experimental evaluation shows that these techniques can improve the throughput of a well-known algorithm by more than a factor of 20, and allow matching hundreds or thousands of incoming documents per second against millions of subscription queries per node.
UR - http://www.scopus.com/inward/record.url?scp=80052126667&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=80052126667&partnerID=8YFLogxK
M3 - Paper
AN - SCOPUS:80052126667
SP - 375
EP - 380
T2 - 2006 USENIX Annual Technical Conference
Y2 - 30 May 2006 through 3 June 2006
ER -