Optimized query execution in large search engines with global page ordering

Xiaohui Long, Torsten Suel

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

    Abstract

    Large web search engines have to answer thousands of queries per second with interactive response times. A major factor in the cost of executing a query is given by the lengths of the inverted lists for the query terms, which increase with the size of the document collection and are often in the range of many megabytes. To address this issue, IR and database researchers have proposed pruning techniques that compute or approximate term-based ranking functions without scanning over the full inverted lists. Over the last few years, search engines have incorporated new types of ranking techniques that exploit aspects such as the hyperlink structure of the web or the popularity of a page to obtain improved results. We focus on the question of how such techniques can be efficiently integrated into query processing. In particular, we study pruning techniques for query execution in large engines in the case where we have a global ranking of pages, as provided by Pagerank or any other method, in addition to the standard term-based approach. We describe pruning schemes for this case and evaluate their efficiency on an experimental cluster-based search engine with 120 million web pages. Our results show that there is significant potential benefit in such techniques.
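
To make the idea of pruning with a global page ordering concrete, the following is a minimal sketch and not the schemes evaluated in the paper. It assumes that document IDs are assigned in descending order of the global score (so every inverted list sorted by ID is also sorted by global rank), that a page's total score is its global score plus the sum of its per-term scores, and that queries are conjunctive. The function name `top_k_pruned` and the data layout are hypothetical.

```python
import heapq

def top_k_pruned(postings, global_score, query, k):
    """Return (top-k [(score, doc_id)], number of driver postings scanned).

    Assumptions (illustrative, not from the paper):
      - postings: term -> list of (doc_id, term_score), sorted by doc_id
      - global_score[doc_id] is non-increasing in doc_id
      - total score = global_score[doc] + sum of the doc's term scores
    """
    lists = [postings.get(t, []) for t in query]
    if k <= 0 or not lists or any(not lst for lst in lists):
        return [], 0
    # Upper bound on the term-based part of any score. A real index would
    # store this maximum per term rather than scanning the list for it.
    max_term_sum = sum(max(s for _, s in lst) for lst in lists)
    lists.sort(key=len)               # drive the scan from the shortest list
    driver, others = lists[0], lists[1:]
    pos = [0] * len(others)           # one cursor per remaining list
    heap = []                         # min-heap holding the current top k
    scanned = 0
    for doc, score in driver:
        # Early termination: every later document has a global score no
        # higher than this one, so none can beat the current k-th result.
        if len(heap) == k and global_score[doc] + max_term_sum <= heap[0][0]:
            break
        scanned += 1
        total = global_score[doc] + score
        matched = True
        for i, lst in enumerate(others):
            while pos[i] < len(lst) and lst[pos[i]][0] < doc:
                pos[i] += 1
            if pos[i] == len(lst) or lst[pos[i]][0] != doc:
                matched = False       # doc lacks this term; skip it
                break
            total += lst[pos[i]][1]
        if not matched:
            continue
        if len(heap) < k:
            heapq.heappush(heap, (total, doc))
        elif total > heap[0][0]:
            heapq.heapreplace(heap, (total, doc))
    return sorted(heap, reverse=True), scanned

# Toy index: five pages, already numbered by descending global score.
global_score = [90, 80, 50, 30, 10]
postings = {
    "a": [(0, 10), (1, 20), (2, 10), (3, 20), (4, 10)],
    "b": [(0, 20), (1, 10), (3, 20), (4, 10)],
}
results, scanned = top_k_pruned(postings, global_score, ["a", "b"], k=2)
print(results, scanned)  # [(120, 0), (110, 1)] 2
```

In this toy run the scan stops after two of the four postings in the shorter list, because the third candidate's global score plus the best possible term scores (30 + 40) cannot exceed the current second-best total of 110.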

    Original language: English (US)
    Title of host publication: Proceedings - 29th International Conference on Very Large Data Bases, VLDB 2003
    Editors: Johann Christoph Freytag, Peter C. Lockemann, Serge Abiteboul, Michael J. Carey, Patricia G. Selinger, Andreas Heuer
    Publisher: Morgan Kaufmann
    Pages: 129-140
    Number of pages: 12
    ISBN (Electronic): 0127224424, 9780127224428
    State: Published - 2003
    Event: 29th International Conference on Very Large Data Bases, VLDB 2003 - Berlin, Germany
    Duration: Sep 9 2003 - Sep 12 2003

    Publication series

    Name: Proceedings - 29th International Conference on Very Large Data Bases, VLDB 2003

    Other

    Other: 29th International Conference on Very Large Data Bases, VLDB 2003
    Country/Territory: Germany
    City: Berlin
    Period: 9/9/03 - 9/12/03

    ASJC Scopus subject areas

    • Software
    • Information Systems
    • Hardware and Architecture
    • Information Systems and Management
    • Computer Science Applications
    • Computer Networks and Communications
