TY - GEN
T1 - Optimized query execution in large search engines with global page ordering
AU - Long, Xiaohui
AU - Suel, Torsten
PY - 2003
Y1 - 2003
N2 - Large web search engines have to answer thousands of queries per second with interactive response times. A major factor in the cost of executing a query is given by the lengths of the inverted lists for the query terms, which increase with the size of the document collection and are often in the range of many megabytes. To address this issue, IR and database researchers have proposed pruning techniques that compute or approximate term-based ranking functions without scanning over the full inverted lists. Over the last few years, search engines have incorporated new types of ranking techniques that exploit aspects such as the hyperlink structure of the web or the popularity of a page to obtain improved results. We focus on the question of how such techniques can be efficiently integrated into query processing. In particular, we study pruning techniques for query execution in large engines in the case where we have a global ranking of pages, as provided by Pagerank or any other method, in addition to the standard term-based approach. We describe pruning schemes for this case and evaluate their efficiency on an experimental clusterbased search engine with 120 million web pages. Our results show that there is significant potential benefit in such techniques.
AB - Large web search engines have to answer thousands of queries per second with interactive response times. A major factor in the cost of executing a query is given by the lengths of the inverted lists for the query terms, which increase with the size of the document collection and are often in the range of many megabytes. To address this issue, IR and database researchers have proposed pruning techniques that compute or approximate term-based ranking functions without scanning over the full inverted lists. Over the last few years, search engines have incorporated new types of ranking techniques that exploit aspects such as the hyperlink structure of the web or the popularity of a page to obtain improved results. We focus on the question of how such techniques can be efficiently integrated into query processing. In particular, we study pruning techniques for query execution in large engines in the case where we have a global ranking of pages, as provided by Pagerank or any other method, in addition to the standard term-based approach. We describe pruning schemes for this case and evaluate their efficiency on an experimental clusterbased search engine with 120 million web pages. Our results show that there is significant potential benefit in such techniques.
UR - http://www.scopus.com/inward/record.url?scp=85012170745&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85012170745&partnerID=8YFLogxK
U2 - 10.1016/b978-012722442-8/50020-3
DO - 10.1016/b978-012722442-8/50020-3
M3 - Conference contribution
AN - SCOPUS:85012170745
T3 - Proceedings - 29th International Conference on Very Large Data Bases, VLDB 2003
SP - 129
EP - 140
BT - Proceedings - 29th International Conference on Very Large Data Bases, VLDB 2003
A2 - Freytag, Johann Christoph
A2 - Lockemann, Peter C.
A2 - Abiteboul, Serge
A2 - Carey, Michael J.
A2 - Selinger, Patricia G.
A2 - Heuer, Andreas
PB - Morgan Kaufmann
T2 - 29th International Conference on Very Large Data Bases, VLDB 2003
Y2 - 9 September 2003 through 12 September 2003
ER -