Fast Bag-Of-Words Candidate Selection in Content-Based Instance Retrieval Systems

Michał Siedlaczek, Qi Wang, Yen Yu Chen, Torsten Suel

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

    Abstract

    Many content-based image search and instance retrieval systems implement bag-of-visual-words strategies for candidate selection. Visual processing of an image results in hundreds of visual words that make up a document, and these words are used to build an inverted index. Query processing then consists of an initial candidate selection phase that queries the inverted index, followed by more complex reranking of the candidates using various image features. The initial phase typically uses disjunctive top-k query processing algorithms originally proposed for searching text collections. Our objective in this paper is to optimize the performance of disjunctive top-k computation for candidate selection in content-based instance retrieval systems. While there has been extensive previous work on optimizing this phase for textual search engines, we are unaware of any published work that studies this problem for instance retrieval, where both index and query data are quite different from the distributions commonly found and exploited in the textual case. Using data from a commercial large-scale instance retrieval system, we address this challenge in three steps. First, we analyze the quantitative properties of index structures and queries in the system, and discuss how they differ from the case of text retrieval. Second, we describe an optimized term-at-a-time retrieval strategy that significantly outperforms baseline term-at-a-time and document-at-a-time strategies, achieving up to 66% speed-up over the most efficient baseline. Finally, we show that due to the different properties of the data, several common safe and unsafe early termination techniques from the literature fail to provide any significant performance benefits.
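
    For readers unfamiliar with the terminology, the following is a minimal sketch of a baseline term-at-a-time (TAAT) disjunctive top-k query over an inverted index, the kind of candidate selection step the abstract refers to. It is not the optimized strategy described in the paper. The index layout (a dictionary from visual-word ID to a list of `(doc_id, score)` postings), the additive scoring, the function name `taat_top_k`, and the toy data are all assumptions made for illustration.

```python
import heapq
from collections import defaultdict


def taat_top_k(index, query_terms, k):
    """Baseline term-at-a-time disjunctive top-k (illustrative sketch).

    index: dict mapping a visual-word ID to a list of (doc_id, score) postings.
           This layout and the additive scoring are assumptions.
    query_terms: iterable of visual-word IDs extracted from the query image.
    k: number of candidates to return for the reranking phase.
    """
    accumulators = defaultdict(float)
    # Process one posting list at a time, accumulating partial scores per document.
    for term in query_terms:
        for doc_id, score in index.get(term, ()):
            accumulators[doc_id] += score
    # Disjunctive semantics: any document matching at least one term is a candidate.
    # Ties on score are broken in favor of the smaller doc_id.
    return heapq.nlargest(
        k, accumulators.items(), key=lambda item: (item[1], -item[0])
    )


# Hypothetical toy index: three visual words, four images.
index = {
    17: [(1, 1.0), (2, 0.5)],
    42: [(2, 1.0), (3, 0.25)],
    99: [(1, 0.5), (2, 0.5), (4, 1.0)],
}
top = taat_top_k(index, [17, 42, 99], k=2)
print(top)  # [(2, 2.0), (1, 1.5)]
```

    The returned candidates would then be handed to the more expensive reranking phase. A document-at-a-time strategy would instead traverse all query posting lists in parallel, in document-ID order, and score each document completely before moving to the next.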

    Original language: English (US)
    Title of host publication: Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018
    Editors: Yang Song, Bing Liu, Kisung Lee, Naoki Abe, Calton Pu, Mu Qiao, Nesreen Ahmed, Donald Kossmann, Jeffrey Saltz, Jiliang Tang, Jingrui He, Huan Liu, Xiaohua Hu
    Publisher: Institute of Electrical and Electronics Engineers Inc.
    Pages: 821-830
    Number of pages: 10
    ISBN (Electronic): 9781538650356
    State: Published - Jan 22 2019
    Event: 2018 IEEE International Conference on Big Data, Big Data 2018 - Seattle, United States
    Duration: Dec 10 2018 to Dec 13 2018

    Publication series

    Name: Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018

    Conference

    Conference: 2018 IEEE International Conference on Big Data, Big Data 2018
    Country: United States
    City: Seattle
    Period: 12/10/18 to 12/13/18

    Keywords

    • bag-of-visual-words
    • candidate selection
    • cascade ranking
    • image retrieval
    • inverted index
    • top-k search

    ASJC Scopus subject areas

    • Computer Science Applications
    • Information Systems
