TY - GEN
T1 - Improved index compression techniques for versioned document collections
AU - He, Jinru
AU - Zeng, Junyuan
AU - Suel, Torsten
PY - 2010
Y1 - 2010
N2 - Current Information Retrieval systems use inverted index structures for efficient query processing. Due to the extremely large size of many data sets, these index structures are usually kept in compressed form, and many techniques for optimizing compressed size and query processing speed have been proposed. In this paper, we focus on versioned document collections, that is, collections where each document is modified over time, resulting in multiple versions of the document. Consecutive versions of the same document are often similar, and several researchers have explored ideas for exploiting this similarity to decrease index size. We propose new index compression techniques for versioned document collections that achieve reductions in index size over previous methods. In particular, we first propose several bitwise compression techniques that achieve a compact index structure but that are too slow for most applications. Based on the lessons learned, we then propose additional techniques that come close to the sizes of the bitwise technique while also improving on the speed of the best previous methods.
KW - Index compression
KW - Inverted index
KW - Versioned documents
UR - http://www.scopus.com/inward/record.url?scp=78651335930&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=78651335930&partnerID=8YFLogxK
U2 - 10.1145/1871437.1871594
DO - 10.1145/1871437.1871594
M3 - Conference contribution
AN - SCOPUS:78651335930
SN - 9781450300995
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 1239
EP - 1248
BT - CIKM'10 - Proceedings of the 19th International Conference on Information and Knowledge Management and Co-located Workshops
T2 - 19th International Conference on Information and Knowledge Management and Co-located Workshops, CIKM'10
Y2 - 26 October 2010 through 30 October 2010
ER -