Improved methods for static index pruning

Wei Jiang, Juan Rodriguez, Torsten Suel

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    Static Index Pruning is a performance optimization technique for search engines that attempts to identify and remove index postings that are unlikely to lead to top results for typical user queries. The goal is to obtain a much smaller inverted index that can quickly return results that are (almost) as good as those for the unpruned index. We make two contributions. First, we improve on previous results for pruned index size through a careful analysis of both document and query distribution characteristics. We derive an initial model based on unigram probabilities that obtains gains over previous work in some cases, and a bigram-based approach that achieves some additional improvements. We also devise a simple method for generating query logs in the absence of real-life queries, which is useful in modeling top results. Our second contribution is to explore, and compare against, previously proposed approaches that perform pruning based on how often documents or postings appeared in top positions in the past.
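To make the basic idea of static index pruning concrete, the following is a minimal sketch of a generic per-term cutoff: each term keeps only its highest-scoring postings. It is not the unigram or bigram model from the paper. The function names (`build_index`, `prune_index`, `posting_count`), the toy documents, and the use of raw term frequency as the posting score are all assumptions made for illustration.

```python
from collections import defaultdict


def build_index(docs):
    """Map each term to a list of (doc_id, term_frequency) postings."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        counts = defaultdict(int)
        for term in text.lower().split():
            counts[term] += 1
        for term, tf in counts.items():
            index[term].append((doc_id, tf))
    return dict(index)


def prune_index(index, keep_per_term):
    """Keep only the keep_per_term highest-scoring postings of each term.

    Term frequency stands in for a real ranking score (an assumption);
    ties are broken by the smaller doc_id.
    """
    pruned = {}
    for term, postings in index.items():
        ranked = sorted(postings, key=lambda p: (-p[1], p[0]))
        # Restore doc_id order, as is conventional for posting lists.
        pruned[term] = sorted(ranked[:keep_per_term])
    return pruned


def posting_count(index):
    """Total number of postings, a simple proxy for index size."""
    return sum(len(postings) for postings in index.values())


# Hypothetical toy collection.
docs = {
    1: "index pruning for search engines",
    2: "search search search results",
    3: "pruning the index of a search index",
}
full = build_index(docs)
small = prune_index(full, keep_per_term=1)
```

Comparing `posting_count(full)` with `posting_count(small)` shows the size reduction. A realistic evaluation would also measure how closely the top results from the pruned index match those from the full index, which is the trade-off the abstract describes.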

    Original language: English (US)
    Title of host publication: Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016
    Editors: Ronay Ak, George Karypis, Yinglong Xia, Xiaohua Tony Hu, Philip S. Yu, James Joshi, Lyle Ungar, Ling Liu, Aki-Hiro Sato, Toyotaro Suzumura, Sudarsan Rachuri, Rama Govindaraju, Weijia Xu
    Publisher: Institute of Electrical and Electronics Engineers Inc.
    Pages: 686-695
    Number of pages: 10
    ISBN (Electronic): 9781467390040
    State: Published - 2016
    Event: 4th IEEE International Conference on Big Data, Big Data 2016 - Washington, United States
    Duration: Dec 5 2016 → Dec 8 2016

    Publication series

    Name: Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016

    Other

    Other: 4th IEEE International Conference on Big Data, Big Data 2016
    Country/Territory: United States
    City: Washington
    Period: 12/5/16 → 12/8/16

    Keywords

    • index
    • search
    • static pruning

    ASJC Scopus subject areas

    • Computer Networks and Communications
    • Information Systems
    • Hardware and Architecture
