Compressing inverted indexes with recursive graph bisection: A reproducibility study

Joel Mackenzie, Antonio Mallia, Matthias Petri, J. Shane Culpepper, Torsten Suel

    Research output: Chapter in Book/Report/Conference proceedingConference contribution

    Abstract

    Document reordering is an important but often overlooked preprocessing stage in index construction. Reordering document identifiers in graphs and inverted indexes has been shown to reduce storage costs and improve processing efficiency in the resulting indexes. However, surprisingly few document reordering algorithms are publicly available despite their importance. A new reordering algorithm derived from recursive graph bisection was recently proposed by Dhulipala et al., and shown to be highly effective and efficient when compared against other state-of-the-art reordering strategies. In this work, we present a reproducibility study of this new algorithm. We describe the implementation challenges encountered, and explore the performance characteristics of our clean-room reimplementation. We show that we are able to successfully reproduce the core results of the original paper, and show that the algorithm generalizes to other collections and indexing frameworks. Furthermore, we make our implementation publicly available to help promote further research in this space.

    Original languageEnglish (US)
    Title of host publicationAdvances in Information Retrieval - 41st European Conference on IR Research, ECIR 2019, Proceedings
    EditorsDjoerd Hiemstra, Claudia Hauff, Norbert Fuhr, Leif Azzopardi, Benno Stein, Philipp Mayr
    PublisherSpringer Verlag
    Pages339-352
    Number of pages14
    ISBN (Print)9783030157111
    DOIs
    StatePublished - 2019
    Event41st European Conference on Information Retrieval, ECIR 2019 - Cologne, Germany
    Duration: Apr 14 2019Apr 18 2019

    Publication series

    NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Volume11437 LNCS
    ISSN (Print)0302-9743
    ISSN (Electronic)1611-3349

    Conference

    Conference41st European Conference on Information Retrieval, ECIR 2019
    Country/TerritoryGermany
    CityCologne
    Period4/14/194/18/19

    Keywords

    • Compression
    • Efficiency
    • Reordering
    • Reproducibility

    ASJC Scopus subject areas

    • Theoretical Computer Science
    • General Computer Science

    Fingerprint

    Dive into the research topics of 'Compressing inverted indexes with recursive graph bisection: A reproducibility study'. Together they form a unique fingerprint.

    Cite this