This chapter discusses optimized query execution in large search engines with global page ordering. Large web search engines must answer thousands of queries per second with interactive response times. A major factor in the cost of executing a query is the length of the inverted lists for the query terms. These lists grow with the size of the document collection and often span many megabytes. To address this issue, information retrieval (IR) and database researchers have proposed pruning techniques that compute or approximate term-based ranking functions without scanning the full inverted lists. This chapter focuses on how such techniques can be integrated efficiently into query processing. It studies pruning techniques for query execution in large engines in the case where, in addition to the standard term-based ranking, a global ranking of pages is available, as provided by PageRank or any other method. The chapter describes pruning schemes for this case and evaluates their efficiency on an experimental cluster-based search engine with 120 million web pages. The results show that such techniques offer significant potential benefits.
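To make the idea of pruning with a global page ordering concrete, here is a minimal sketch. It is a generic safe early-termination scheme and not the specific pruning schemes evaluated in the chapter. It rests on several assumptions that are ours rather than the chapter's. A document's score is its global score (for example, a normalized PageRank) plus the sum of its per-term scores. Document IDs are assigned in order of decreasing global score. The query is conjunctive (AND). Because later documents in the lists have lower global scores, the scan can stop as soon as the best score any remaining document could reach is no better than the current k-th result. The function name `top_k_with_global_order` and its parameters are hypothetical.

```python
import heapq
from typing import List, Tuple

Posting = Tuple[int, float]  # (doc_id, term_score); hypothetical posting format


def top_k_with_global_order(
    query_lists: List[List[Posting]],
    global_score: List[float],
    k: int,
) -> List[Tuple[float, int]]:
    """Return up to k (score, doc_id) pairs, best first, for an AND query.

    Assumes doc IDs were assigned by decreasing global score, so
    global_score is non-increasing in doc_id and every posting list is
    sorted by doc_id. Combined score = global score + sum of term scores.
    """
    if k <= 0 or not query_lists or any(not lst for lst in query_lists):
        return []
    # Upper bound on the term-based part of any document's score.
    # A real index would store each list's maximum; it is computed here for brevity.
    max_term_total = sum(max(s for _, s in lst) for lst in query_lists)
    # Drive the scan with the shortest list and probe the others with cursors.
    ordered = sorted(query_lists, key=len)
    driver, others = ordered[0], ordered[1:]
    cursors = [0] * len(others)
    heap: List[Tuple[float, int]] = []  # min-heap holding the current top k

    for doc_id, term_score in driver:
        # Global scores only decrease from here on, so this bound also
        # covers every document that has not been scanned yet.
        bound = global_score[doc_id] + max_term_total
        if len(heap) == k and bound <= heap[0][0]:
            break  # safe early termination: no later document can enter the top k
        total = term_score
        matched = True
        for i, lst in enumerate(others):
            j = cursors[i]
            while j < len(lst) and lst[j][0] < doc_id:
                j += 1  # advance this list's cursor up to doc_id
            cursors[i] = j
            if j == len(lst) or lst[j][0] != doc_id:
                matched = False  # doc_id is missing from this list
                break
            total += lst[j][1]
        if not matched:
            continue
        score = global_score[doc_id] + total
        if len(heap) < k:
            heapq.heappush(heap, (score, doc_id))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, doc_id))
    return sorted(heap, reverse=True)


if __name__ == "__main__":
    g = [8.0, 6.0, 4.0, 2.0, 1.0]  # global scores, already in decreasing order
    a = [(0, 1.0), (1, 3.0), (2, 2.0), (3, 6.0), (4, 1.0)]
    b = [(0, 2.0), (2, 1.0), (3, 4.0), (4, 2.0)]
    print(top_k_with_global_order([a, b], g, 2))
```

In this example the scan stops before document 4. Its bound of 1.0 + 10.0 cannot beat the current second-best score of 11.0. The cursors advance linearly for clarity. An engine working on compressed lists would more likely use skip pointers or block-level seeks to reach `doc_id`.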
| Field | Value |
|---|---|
| Original language | English (US) |
| Title of host publication | Proceedings 2003 VLDB Conference |
| Subtitle of host publication | 29th International Conference on Very Large Databases (VLDB) |
| Number of pages | 12 |
| State | Published - Jan 1 2003 |
ASJC Scopus subject areas
- Computer Science (all)