TY - GEN
T1 - To index or not to index
T2 - 35th Annual ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2012
AU - Arroyuelo, Diego
AU - González, Senén
AU - Marin, Mauricio
AU - Oyarzún, Mauricio
AU - Suel, Torsten
N1 - Copyright 2012 Elsevier B.V., All rights reserved.
PY - 2012
Y1 - 2012
N2 - Positional ranking functions, widely used in Web search engines, improve result quality by exploiting the positions of the query terms within documents. However, it is well known that positional indexes demand large amounts of extra space, typically about three times the space of a basic nonpositional index. Textual data, on the other hand, is needed to produce text snippets. In this paper, we study time-space trade-offs for search engines with positional ranking functions and text snippet generation. We consider both index-based and non-index-based alternatives for positional data. We aim to answer the question of whether one should index positional data or not. We show that there is a wide range of practical time-space trade-offs. Moreover, we show that both position and textual data can be stored using about 71% of the space used by traditional positional indexes, with a minor increase in query time. This yields considerable space savings and outperforms, both in space and time, recent alternatives from the literature. We also propose several efficient compressed text representations for snippet generation, which are able to use about half of the space of current state-of-the-art alternatives with little impact on query processing time.
AB - Positional ranking functions, widely used in Web search engines, improve result quality by exploiting the positions of the query terms within documents. However, it is well known that positional indexes demand large amounts of extra space, typically about three times the space of a basic nonpositional index. Textual data, on the other hand, is needed to produce text snippets. In this paper, we study time-space trade-offs for search engines with positional ranking functions and text snippet generation. We consider both index-based and non-index-based alternatives for positional data. We aim to answer the question of whether one should index positional data or not. We show that there is a wide range of practical time-space trade-offs. Moreover, we show that both position and textual data can be stored using about 71% of the space used by traditional positional indexes, with a minor increase in query time. This yields considerable space savings and outperforms, both in space and time, recent alternatives from the literature. We also propose several efficient compressed text representations for snippet generation, which are able to use about half of the space of current state-of-the-art alternatives with little impact on query processing time.
KW - positional indexing
KW - text compression for snippet generation
UR - http://www.scopus.com/inward/record.url?scp=84866626346&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84866626346&partnerID=8YFLogxK
U2 - 10.1145/2348283.2348320
DO - 10.1145/2348283.2348320
M3 - Conference contribution
AN - SCOPUS:84866626346
SN - 9781450316583
T3 - SIGIR'12 - Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval
SP - 255
EP - 264
BT - SIGIR'12 - Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval
Y2 - 12 August 2012 through 16 August 2012
ER -