Abstract
A lot of research has studied how to optimize inverted in- dex structures in search engines through suitable reassign- ment of document identifiers. This approach was originally proposed to allow for better compression of the index, but subsequent work showed that it can also result in significant speed-ups for conjunctive queries and even certain types of disjunctive top-k algorithms. However, we do not have a good understanding of why this happens, and how we could directly optimize an index for query processing speed. As a result, existing techniques attempt to optimize for size, and treat speed increases as a welcome side-effect. In this paper, we take an initial but important step to- wards understanding and modeling speed increases due to document reordering. We define the problem of minimizing the cost of queries given an inverted index and a query dis- tribution, relate it to work on adaptive set intersection, and show that it is fundamentally different from that of mini- mizing compressed index size. We then propose a heuristic algorithm for finding a document reordering that minimizes query processing costs under suitable cost models. Our ex- periments show significant increases in the speed of inter- sections over state-of-the-art reordering techniques.
Original language | English (US) |
---|---|
Pages (from-to) | 475-487 |
Number of pages | 13 |
Journal | Proceedings of the VLDB Endowment |
Volume | 12 |
Issue number | 5 |
DOIs | |
State | Published - 2018 |
Event | 45th International Conference on Very Large Data Bases, VLDB 2019 - Los Angeles, United States Duration: Aug 26 2017 → Aug 30 2017 |
ASJC Scopus subject areas
- Computer Science (miscellaneous)
- General Computer Science