TY - JOUR
T1 - DiVA
T2 - Using Application-Specific Policies to 'Dive' into Vector Approximations
AU - Tsakalozos, Konstantinos
AU - Evangelatos, Spiros
AU - Psallidas, Fotis
AU - Vieira, Marcos R.
AU - Tsotras, Vassilis J.
AU - Delis, Alex
N1 - Publisher Copyright:
© 2015 The British Computer Society 2015. All rights reserved.
PY - 2016/9/1
Y1 - 2016/9/1
N2 - In high-dimensional data domains, the performance of conventional tree-based access structures is occasionally outperformed by simple sequential scans. To this end, the introduction of approximation-based methods helped speed-up queries by providing compact representations of stored data. Approximation methods exploit vector quantization to index data mainly presumed to follow a uniform distribution. In real-world environments however, we mostly encounter both skewed data and query distributions. To address this dual challenge, we propose DiVA that combines the selective use of an approximation approach with an indexing mechanism to organize data subspaces in a high fan-out hierarchical structure. Moreover, DiVA reorganizes its own elements after receiving application hints regarding data access patterns. These hints or policies trigger the restructuring and possible expansion of DiVA so as to offer finer indexing granularity and improved access times in subspaces emerging as 'hot-spots'. The novelty of our approach lies in the self-organizing nature of DiVA driven by application-provided policies; the latter effectively guide the refinement of DiVA's elements as new data arrive, existing data are updated and the nature of query workloads continually changes. An extensive experimental evaluation using real data shows that DiVA reduces up-to 64% of the total number of I/Os if compared with state-of-art methods including the VA-file, GC-tree and A-tree.
AB - In high-dimensional data domains, the performance of conventional tree-based access structures is occasionally outperformed by simple sequential scans. To this end, the introduction of approximation-based methods helped speed-up queries by providing compact representations of stored data. Approximation methods exploit vector quantization to index data mainly presumed to follow a uniform distribution. In real-world environments however, we mostly encounter both skewed data and query distributions. To address this dual challenge, we propose DiVA that combines the selective use of an approximation approach with an indexing mechanism to organize data subspaces in a high fan-out hierarchical structure. Moreover, DiVA reorganizes its own elements after receiving application hints regarding data access patterns. These hints or policies trigger the restructuring and possible expansion of DiVA so as to offer finer indexing granularity and improved access times in subspaces emerging as 'hot-spots'. The novelty of our approach lies in the self-organizing nature of DiVA driven by application-provided policies; the latter effectively guide the refinement of DiVA's elements as new data arrive, existing data are updated and the nature of query workloads continually changes. An extensive experimental evaluation using real data shows that DiVA reduces up-to 64% of the total number of I/Os if compared with state-of-art methods including the VA-file, GC-tree and A-tree.
KW - accessing skewed multidimensional data and queries
KW - self-organizing indexing methods
KW - vector approximation techniques
UR - http://www.scopus.com/inward/record.url?scp=84991735142&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84991735142&partnerID=8YFLogxK
U2 - 10.1093/comjnl/bxv097
DO - 10.1093/comjnl/bxv097
M3 - Article
AN - SCOPUS:84991735142
SN - 0010-4620
VL - 59
SP - 1363
EP - 1382
JO - Computer Journal
JF - Computer Journal
IS - 9
ER -