Exploring Size-Speed Trade-Offs in Static Index Pruning

Juan Rodriguez, Torsten Suel

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    Static index pruning techniques remove postings from inverted index structures in order to decrease index size and query processing cost, while minimizing the resulting loss in result quality. A number of authors have proposed pruning techniques that use basic properties of postings as well as results of past queries to decide what postings should be kept. However, many open questions remain, and our goal is to address some of them using a machine learning based approach that tries to predict the usefulness of a posting. In this paper, we explore the following questions: (1) How much does an approach that learns from a rich set of features outperform previous work that uses heuristic approaches or just a few features? (2) What is the relationship between index size and query processing speed in static index pruning? We show that an approach that prunes postings using a rich set of features including post-hits and doc-hits can significantly outperform previous approaches, and that there is a very pronounced trade-off between index size and query processing speed for static index pruning that has not been previously explored.
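
    As a rough illustration of the idea described in the abstract, the sketch below prunes a toy inverted index by keeping only the highest-scoring fraction of postings. It is not the authors' method: the paper learns a usefulness predictor from a rich feature set, whereas this example assumes a stand-in score (raw term frequency). The names `prune_index`, `tf_score`, and `keep_fraction` are hypothetical and chosen for illustration only.

```python
from typing import Callable, Dict, List, Tuple

Posting = Tuple[int, int]          # (doc_id, term_frequency)
Index = Dict[str, List[Posting]]   # term -> postings sorted by doc_id


def prune_index(
    index: Index,
    score: Callable[[str, Posting], float],
    keep_fraction: float,
) -> Index:
    """Keep the globally highest-scoring fraction of postings.

    `score` stands in for a usefulness predictor; any function that maps
    a (term, posting) pair to a number can be plugged in here.
    """
    if not 0.0 <= keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be between 0 and 1")

    # Score every posting in the index, then rank them globally.
    scored = [
        (score(term, posting), term, posting)
        for term, postings in index.items()
        for posting in postings
    ]
    scored.sort(key=lambda item: item[0], reverse=True)

    # The size budget is expressed as a number of postings to keep.
    budget = int(len(scored) * keep_fraction)

    pruned: Index = {}
    for _, term, posting in scored[:budget]:
        pruned.setdefault(term, []).append(posting)

    # Restore doc_id order inside each surviving postings list.
    for postings in pruned.values():
        postings.sort()
    return pruned


def tf_score(term: str, posting: Posting) -> float:
    """Assumed stand-in score: the posting's term frequency."""
    return float(posting[1])


index: Index = {
    "index": [(1, 3), (2, 1), (5, 2)],
    "pruning": [(2, 4), (3, 1)],
}
small = prune_index(index, tf_score, keep_fraction=0.5)
```

    Ranking postings globally, rather than per term, lets the size budget be set directly as a fraction of the whole index. A smaller index does not by itself determine query speed, which is the trade-off the paper investigates.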

    Original language: English (US)
    Title of host publication: Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018
    Editors: Naoki Abe, Huan Liu, Calton Pu, Xiaohua Hu, Nesreen Ahmed, Mu Qiao, Yang Song, Donald Kossmann, Bing Liu, Kisung Lee, Jiliang Tang, Jingrui He, Jeffrey Saltz
    Publisher: Institute of Electrical and Electronics Engineers Inc.
    Pages: 1093-1100
    Number of pages: 8
    ISBN (Electronic): 9781538650356
    State: Published - Jul 2 2018
    Event: 2018 IEEE International Conference on Big Data, Big Data 2018 - Seattle, United States
    Duration: Dec 10 2018 → Dec 13 2018

    Publication series

    Name: Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018

    Conference

    Conference: 2018 IEEE International Conference on Big Data, Big Data 2018
    Country/Territory: United States
    City: Seattle
    Period: 12/10/18 → 12/13/18

    Keywords

    • search engine performance
    • search optimization
    • static index pruning
    • web search engine

    ASJC Scopus subject areas

    • Computer Science Applications
    • Information Systems
