TY - GEN
T1 - Efficient term proximity search with term-pair indexes
AU - Yan, Hao
AU - Shi, Shuming
AU - Zhang, Fan
AU - Suel, Torsten
AU - Wen, Ji Rong
N1 - Copyright 2011 Elsevier B.V., All rights reserved.
PY - 2010
Y1 - 2010
AB - There has been a large amount of research on early termination techniques in web search and information retrieval. Such techniques return the top-k documents without scanning and evaluating the full inverted lists of the query terms. Thus, they can greatly improve query processing efficiency. However, only a limited amount of efficient top-k processing work considers the impact of term proximity, i.e., the distance between term occurrences in a document, which has recently been integrated into a number of retrieval models to improve effectiveness. In this paper, we propose new early termination techniques for efficient query processing for the case where term proximity is integrated into the retrieval model. We propose new index structures based on a term-pair index, and study new document retrieval strategies on the resulting indexes. We perform a detailed experimental evaluation on our new techniques and compare them with the existing approaches. Experimental results on large-scale data sets show that our techniques can significantly improve the efficiency of query processing.
KW - Document structure
KW - Term proximity
KW - Term-pair index
KW - Top-k
UR - http://www.scopus.com/inward/record.url?scp=78651335518&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=78651335518&partnerID=8YFLogxK
U2 - 10.1145/1871437.1871593
DO - 10.1145/1871437.1871593
M3 - Conference contribution
AN - SCOPUS:78651335518
SN - 9781450300995
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 1229
EP - 1238
BT - CIKM'10 - Proceedings of the 19th International Conference on Information and Knowledge Management and Co-located Workshops
T2 - 19th International Conference on Information and Knowledge Management and Co-located Workshops, CIKM'10
Y2 - 26 October 2010 through 30 October 2010
ER -