Efficient index updates for mixed update and query loads

Sergey Nepomnyachiy, Torsten Suel

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    Inverted index files are commonly used to support keyword search in document collections. While an index can be constructed efficiently offline, updating it incrementally remains a hard problem, especially when the index does not fit entirely in memory. We propose a novel approach for keeping index files up to date on a system that continuously serves both document updates and user queries. Unlike previous update policies, our approach uses knowledge of both the update term distribution and the query term distribution to partition the terms into functional groups. We implement two schemes that selectively enforce a contiguous on-disk layout of the data, while requiring that the cost of each consolidation be less than its estimated benefit. The first, 'greedy merge', is inspired by the ski-rental problem as studied in competitive analysis. The second, the 'opportunistic prognosticator', makes reliable predictions that turn the online problem into one suited to offline optimization.
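
    The ski-rental analogy behind 'greedy merge' can be illustrated with the classic break-even rule: keep "renting" (paying extra seeks on every query because a term's postings are split into several on-disk fragments) until the extra cost already paid reaches the cost of "buying" (merging those fragments into one contiguous list). The sketch below is a minimal, hypothetical illustration of that rule only, not the authors' implementation. The names GreedyMerger, TermState, seek_cost and merge_cost, and the cost model (a fixed cost per extra fragment read), are all assumptions. In the basic ski-rental setting this rule costs at most roughly twice the offline optimum.

```python
from dataclasses import dataclass


@dataclass
class TermState:
    """Hypothetical per-term bookkeeping for a fragmented inverted list."""
    merge_cost: float      # assumed cost of rewriting the postings contiguously
    fragments: int = 1     # number of non-contiguous on-disk pieces
    wasted: float = 0.0    # extra cost paid so far because of fragmentation


class GreedyMerger:
    """Break-even (ski-rental style) merge rule under an assumed cost model."""

    def __init__(self, seek_cost: float):
        self.seek_cost = seek_cost            # assumed cost per extra fragment read
        self.terms: dict[str, TermState] = {}

    def on_update(self, term: str, merge_cost: float) -> None:
        # The first update creates one fragment; later updates append new ones.
        state = self.terms.get(term)
        if state is None:
            self.terms[term] = TermState(merge_cost=merge_cost)
        else:
            state.fragments += 1
            state.merge_cost = merge_cost

    def on_query(self, term: str) -> bool:
        """Charge the query's extra 'rent'; return True if a merge is triggered."""
        state = self.terms.get(term)
        if state is None or state.fragments <= 1:
            return False
        state.wasted += (state.fragments - 1) * self.seek_cost
        if state.wasted >= state.merge_cost:
            # Break-even reached: "buy" by consolidating into one fragment.
            state.fragments = 1
            state.wasted = 0.0
            return True
        return False
```

    For example, with seek_cost=1.0 and a term that has two fragments and a merge cost of 3.0, the third query pushes the accumulated extra cost to 3.0 and triggers a merge. Later queries then pay no extra cost until new updates fragment the list again.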

    Original language: English (US)
    Title of host publication: Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016
    Editors: Ronay Ak, George Karypis, Yinglong Xia, Xiaohua Tony Hu, Philip S. Yu, James Joshi, Lyle Ungar, Ling Liu, Aki-Hiro Sato, Toyotaro Suzumura, Sudarsan Rachuri, Rama Govindaraju, Weijia Xu
    Publisher: Institute of Electrical and Electronics Engineers Inc.
    Pages: 984-991
    Number of pages: 8
    ISBN (Electronic): 9781467390040
    State: Published - 2016
    Event: 4th IEEE International Conference on Big Data, Big Data 2016, Washington, United States
    Duration: Dec 5 2016 to Dec 8 2016

    Publication series

    Name: Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016

    Other

    Other: 4th IEEE International Conference on Big Data, Big Data 2016
    Country/Territory: United States
    City: Washington
    Period: 12/5/16 to 12/8/16

    ASJC Scopus subject areas

    • Computer Networks and Communications
    • Information Systems
    • Hardware and Architecture
