Performance of compressed inverted list caching in search engines

Jiangong Zhang, Xiaohui Long, Torsten Suel

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    Due to the rapid growth in the size of the web, web search engines are facing enormous performance challenges. The larger engines in particular have to be able to process tens of thousands of queries per second on tens of billions of documents, making query throughput a critical issue. To satisfy this heavy workload, search engines use a variety of performance optimizations including index compression, caching, and early termination. We focus on two techniques, inverted index compression and index caching, which play a crucial role in web search engines as well as other high-performance information retrieval systems. We perform a comparison and evaluation of several inverted list compression algorithms, including new variants of existing algorithms that have not been studied before. We then evaluate different inverted list caching policies on large query traces, and finally study the possible performance benefits of combining compression and caching. The overall goal of this paper is to provide an updated discussion and evaluation of these two techniques, and to show how to select the best set of approaches and settings depending on parameters such as disk speed and main memory cache size.

    Original language: English (US)
    Title of host publication: Proceeding of the 17th International Conference on World Wide Web 2008, WWW'08
    Pages: 387-396
    Number of pages: 10
    State: Published - 2008
    Event: 17th International Conference on World Wide Web 2008, WWW'08 - Beijing, China
    Duration: Apr 21 2008 to Apr 25 2008

    Publication series

    Name: Proceeding of the 17th International Conference on World Wide Web 2008, WWW'08

    Other

    Other: 17th International Conference on World Wide Web 2008, WWW'08
    Country/Territory: China
    City: Beijing
    Period: 4/21/08 to 4/25/08

    Keywords

    • Index caching
    • Index compression
    • Inverted index
    • Search engines

    ASJC Scopus subject areas

    • Computer Networks and Communications
