TY - GEN
T1 - Efficient index updates for mixed update and query loads
AU - Nepomnyachiy, Sergey
AU - Suel, Torsten
N1 - Publisher Copyright:
© 2016 IEEE.
PY - 2016
Y1 - 2016
N2 - Inverted index files are commonly used to support keyword search in document collections. While the offline construction of an index can be done efficiently, its incremental update remains a hard problem, especially when the index does not completely fit in memory. We propose a novel approach for maintaining up-to-date index files on a system that constantly serves document updates and user queries. Unlike previous updating policies, we use knowledge of both the update term distribution and the query term distribution to partition the terms into functional groups. We implement two schemes for selective enforcement of contiguous layout of the data on disk, while mandating that the cost of the consolidation is less than its estimated benefit. The first is the 'greedy merge' inspired by the ski-rental problem as studied in the context of competitive analysis. The second is the 'opportunistic prognosticator' -by making reliable predictions, the online problem becomes suitable for offline optimizations.
AB - Inverted index files are commonly used to support keyword search in document collections. While the offline construction of an index can be done efficiently, its incremental update remains a hard problem, especially when the index does not completely fit in memory. We propose a novel approach for maintaining up-to-date index files on a system that constantly serves document updates and user queries. Unlike previous updating policies, we use knowledge of both the update term distribution and the query term distribution to partition the terms into functional groups. We implement two schemes for selective enforcement of contiguous layout of the data on disk, while mandating that the cost of the consolidation is less than its estimated benefit. The first is the 'greedy merge' inspired by the ski-rental problem as studied in the context of competitive analysis. The second is the 'opportunistic prognosticator' -by making reliable predictions, the online problem becomes suitable for offline optimizations.
UR - http://www.scopus.com/inward/record.url?scp=85015189695&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85015189695&partnerID=8YFLogxK
U2 - 10.1109/BigData.2016.7840697
DO - 10.1109/BigData.2016.7840697
M3 - Conference contribution
AN - SCOPUS:85015189695
T3 - Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016
SP - 984
EP - 991
BT - Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016
A2 - Ak, Ronay
A2 - Karypis, George
A2 - Xia, Yinglong
A2 - Hu, Xiaohua Tony
A2 - Yu, Philip S.
A2 - Joshi, James
A2 - Ungar, Lyle
A2 - Liu, Ling
A2 - Sato, Aki-Hiro
A2 - Suzumura, Toyotaro
A2 - Rachuri, Sudarsan
A2 - Govindaraju, Rama
A2 - Xu, Weijia
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 4th IEEE International Conference on Big Data, Big Data 2016
Y2 - 5 December 2016 through 8 December 2016
ER -