TY - GEN
T1 - Performance of compressed inverted list caching in search engines
AU - Zhang, Jiangong
AU - Long, Xiaohui
AU - Suel, Torsten
PY - 2008
Y1 - 2008
N2 - Due to the rapid growth in the size of the web, web search engines are facing enormous performance challenges. The larger engines in particular have to be able to process tens of thousands of queries per second on tens of billions of documents, making query throughput a critical issue. To satisfy this heavy workload, search engines use a variety of performance optimizations including index compression, caching, and early termination. We focus on two techniques, inverted index compression and index caching, which play a crucial rule in web search engines as well as other high-performance information retrieval systems. We perform a comparison and evaluation of several inverted list compression algorithms, including new variants of existing algorithms that have not been studied before. We then evaluate different inverted list caching policies on large query traces, and finally study the possible performance benefits of combining compression and caching. The overall goal of this paper is to provide an updated discussion and evaluation of these two techniques, and to show how to select the best set of approaches and settings depending on parameter such as disk speed and main memory cache size.
AB - Due to the rapid growth in the size of the web, web search engines are facing enormous performance challenges. The larger engines in particular have to be able to process tens of thousands of queries per second on tens of billions of documents, making query throughput a critical issue. To satisfy this heavy workload, search engines use a variety of performance optimizations including index compression, caching, and early termination. We focus on two techniques, inverted index compression and index caching, which play a crucial rule in web search engines as well as other high-performance information retrieval systems. We perform a comparison and evaluation of several inverted list compression algorithms, including new variants of existing algorithms that have not been studied before. We then evaluate different inverted list caching policies on large query traces, and finally study the possible performance benefits of combining compression and caching. The overall goal of this paper is to provide an updated discussion and evaluation of these two techniques, and to show how to select the best set of approaches and settings depending on parameter such as disk speed and main memory cache size.
KW - Index caching
KW - Index compression
KW - Inverted index
KW - Search engines
UR - http://www.scopus.com/inward/record.url?scp=55149106898&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=55149106898&partnerID=8YFLogxK
U2 - 10.1145/1367497.1367550
DO - 10.1145/1367497.1367550
M3 - Conference contribution
AN - SCOPUS:55149106898
SN - 9781605580852
T3 - Proceeding of the 17th International Conference on World Wide Web 2008, WWW'08
SP - 387
EP - 396
BT - Proceeding of the 17th International Conference on World Wide Web 2008, WWW'08
T2 - 17th International Conference on World Wide Web 2008, WWW'08
Y2 - 21 April 2008 through 25 April 2008
ER -