TY - JOUR
T1 - Scaling Package Queries to a Billion Tuples via Hierarchical Partitioning and Customized Optimization
AU - Mai, Anh L.
AU - Wang, Pengyu
AU - Abouzied, Azza
AU - Brucato, Matteo
AU - Haas, Peter J.
AU - Meliou, Alexandra
N1 - Publisher Copyright:
© 2024, VLDB Endowment. All rights reserved.
PY - 2024
Y1 - 2024
N2 - A package query returns a packageÐa multiset of tuplesÐthat maximizes or minimizes a linear objective function subject to linear constraints, thereby enabling in-database decision support. Prior work has established the equivalence of package queries to Integer Linear Programs (ILPs) and developed the SketchRefine algorithm for package query processing. While this algorithm was an important first step toward supporting prescriptive analytics scalably inside a relational database, it struggles when the data size grows beyond a few hundred million tuples or when the constraints become very tight. In this paper, we present Progressive Shading, a novel algorithm for processing package queries that can scale efficiently to billions of tuples and gracefully handle tight constraints. Progressive Shading solves a sequence of optimization problems over a hierarchy of relations, each resulting from an ever-finer partitioning of the original tuples into homogeneous groups until the original relation is obtained. This strategy avoids the premature discarding of high-quality tuples that can occur with SketchRefine. Our novel partitioning scheme, Dynamic Low Variance, can handle very large relations with multiple attributes and can dynamically adapt to both concentrated and spread-out sets of attribute values, provably outperforming traditional partitioning schemes such as kd-tree. We further optimize our system by replacing our off-the-shelf optimization software with customized ILP and LP solvers, called Dual Reducer and Parallel Dual Simplex respectively, that are highly accurate and orders of magnitude faster.
AB - A package query returns a packageÐa multiset of tuplesÐthat maximizes or minimizes a linear objective function subject to linear constraints, thereby enabling in-database decision support. Prior work has established the equivalence of package queries to Integer Linear Programs (ILPs) and developed the SketchRefine algorithm for package query processing. While this algorithm was an important first step toward supporting prescriptive analytics scalably inside a relational database, it struggles when the data size grows beyond a few hundred million tuples or when the constraints become very tight. In this paper, we present Progressive Shading, a novel algorithm for processing package queries that can scale efficiently to billions of tuples and gracefully handle tight constraints. Progressive Shading solves a sequence of optimization problems over a hierarchy of relations, each resulting from an ever-finer partitioning of the original tuples into homogeneous groups until the original relation is obtained. This strategy avoids the premature discarding of high-quality tuples that can occur with SketchRefine. Our novel partitioning scheme, Dynamic Low Variance, can handle very large relations with multiple attributes and can dynamically adapt to both concentrated and spread-out sets of attribute values, provably outperforming traditional partitioning schemes such as kd-tree. We further optimize our system by replacing our off-the-shelf optimization software with customized ILP and LP solvers, called Dual Reducer and Parallel Dual Simplex respectively, that are highly accurate and orders of magnitude faster.
UR - http://www.scopus.com/inward/record.url?scp=85190466620&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85190466620&partnerID=8YFLogxK
U2 - 10.14778/3641204.3641222
DO - 10.14778/3641204.3641222
M3 - Conference article
AN - SCOPUS:85190466620
SN - 2150-8097
VL - 17
SP - 1146
EP - 1158
JO - Proceedings of the VLDB Endowment
JF - Proceedings of the VLDB Endowment
IS - 5
T2 - 50th International Conference on Very Large Data Bases, VLDB 2024
Y2 - 25 August 2024 through 29 August 2024
ER -