Faster Learned Sparse Retrieval with Guided Traversal

Antonio Mallia, Joel MacKenzie, Torsten Suel, Nicola Tonellotto

    Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

    Abstract

    Neural information retrieval architectures based on transformers such as BERT are able to significantly improve system effectiveness over traditional sparse models such as BM25. Though highly effective, these neural approaches are very expensive to run, making them difficult to deploy under strict latency constraints. To address this limitation, recent studies have proposed new families of learned sparse models that try to match the effectiveness of learned dense models, while leveraging the traditional inverted index data structure for efficiency. Current learned sparse models learn the weights of terms in documents and, sometimes, queries; however, they exploit different vocabulary structures, document expansion techniques, and query expansion strategies, which can make them slower than traditional sparse models such as BM25. In this work, we propose a novel indexing and query processing technique that exploits a traditional sparse model's "guidance" to efficiently traverse the index, allowing the more effective learned model to execute fewer scoring operations. Our experiments show that our guided processing heuristic is able to boost the efficiency of the underlying learned sparse model by a factor of four without any measurable loss of effectiveness.
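
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a hypothetical toy index in which every posting stores both a BM25 weight and a learned weight, and it uses a simple BM25 top-k threshold as the "guidance" that decides whether the learned model scores a document at all. The names `TOY_INDEX` and `guided_top_k` are invented for illustration, and a real engine would also use upper bounds to avoid computing the full BM25 score for every candidate.

```python
import heapq
from collections import defaultdict

# Hypothetical toy index: term -> postings of (doc_id, bm25_weight, learned_weight).
TOY_INDEX = {
    "sparse": [(1, 1.0, 0.5), (2, 2.0, 1.5), (4, 0.5, 0.25)],
    "retrieval": [(1, 1.5, 2.0), (3, 0.5, 0.25), (4, 0.25, 0.25)],
}


def guided_top_k(index, query_terms, k):
    """Return (ranking, learned_evals) for a query.

    BM25 scores decide which documents are kept (the "guidance");
    learned scores are computed only for those documents and are
    used to order the final ranking.
    """
    if k <= 0:
        return [], 0

    # Group the postings of all query terms by document.
    per_doc = defaultdict(list)
    for term in query_terms:
        for doc_id, bm25_w, learned_w in index.get(term, []):
            per_doc[doc_id].append((bm25_w, learned_w))

    heap = []  # min-heap of (bm25_score, doc_id, learned_score)
    learned_evals = 0
    for doc_id in sorted(per_doc):  # document-at-a-time order
        bm25_score = sum(b for b, _ in per_doc[doc_id])
        # Guidance: skip documents that cannot enter the BM25 top-k.
        if len(heap) == k and bm25_score <= heap[0][0]:
            continue
        # Only surviving documents pay for a learned-score evaluation.
        learned_score = sum(w for _, w in per_doc[doc_id])
        learned_evals += 1
        entry = (bm25_score, doc_id, learned_score)
        if len(heap) < k:
            heapq.heappush(heap, entry)
        else:
            heapq.heapreplace(heap, entry)

    # Final ranking uses the learned scores, highest first.
    ranking = sorted(((d, s) for _, d, s in heap),
                     key=lambda pair: pair[1], reverse=True)
    return ranking, learned_evals


ranking, evals = guided_top_k(TOY_INDEX, ["sparse", "retrieval"], k=2)
print(ranking, evals)
```

On this toy data, four documents match the query but only two of them are scored with the learned weights, which is the kind of saving the abstract refers to as executing fewer scoring operations.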

    Original language: English (US)
    Title of host publication: SIGIR 2022 - Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval
    Publisher: Association for Computing Machinery, Inc
    Pages: 1901-1905
    Number of pages: 5
    ISBN (Electronic): 9781450387323
    State: Published - Jul 6 2022
    Event: 45th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2022 - Madrid, Spain
    Duration: Jul 11 2022 to Jul 15 2022

    Publication series

    Name: SIGIR 2022 - Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval

    Conference

    Conference: 45th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2022
    Country/Territory: Spain
    City: Madrid
    Period: 7/11/22 to 7/15/22

    Keywords

    • inverted index
    • learned sparse retrieval
    • query processing

    ASJC Scopus subject areas

    • Computer Graphics and Computer-Aided Design
    • Information Systems
    • Software
