TY - GEN

T1 - Stochastic Package Queries in Probabilistic Databases

AU - Brucato, Matteo

AU - Yadav, Nishant

AU - Abouzied, Azza

AU - Haas, Peter J.

AU - Meliou, Alexandra

N1 - Funding Information:
Acknowledgments. This work was supported by the NYUAD Center for Interacting Urban Networks (CITIES), and funded by: Tamkeen under the NYUAD Research Institute Award CG001, the Swiss Re Institute under the Quantum Cities initiative, and the National Science Foundation under grants IIS-1453543 and IIS-1943971. The authors would like to thank the anonymous reviewers for their valuable insights, and Arya Mazumdar, Nishad Ranade, and Senay Solak for their help and suggestions during various phases of the work.
Publisher Copyright:
© 2020 Association for Computing Machinery.

PY - 2020/6/14

Y1 - 2020/6/14

N2 - We provide methods for in-database support of decision making under uncertainty. Many important decision problems correspond to selecting a "package" (bag of tuples in a relational database) that jointly satisfy a set of constraints while minimizing some overall "cost" function; in most real-world problems, the data is uncertain. We provide methods for specifying - -via a SQL extension - -and processing stochastic package queries (SPQS), in order to solve optimization problems over uncertain data, right where the data resides. Prior work in stochastic programming uses Monte Carlo methods where the original stochastic optimization problem is approximated by a large deterministic optimization problem that incorporates many "scenarios", i.e., sample realizations of the uncertain data values. For large database tables, however, a huge number of scenarios is required, leading to poor performance and, often, failure of the solver software. We therefore provide a novel ßs algorithm that, instead of trying to solve a large deterministic problem, seamlessly approximates it via a sequence of smaller problems defined over carefully crafted "summaries" of the scenarios that accelerate convergence to a feasible and near-optimal solution. Experimental results on our prototype system show that ßs can be orders of magnitude faster than prior methods at finding feasible and high-quality packages.

AB - We provide methods for in-database support of decision making under uncertainty. Many important decision problems correspond to selecting a "package" (bag of tuples in a relational database) that jointly satisfy a set of constraints while minimizing some overall "cost" function; in most real-world problems, the data is uncertain. We provide methods for specifying - -via a SQL extension - -and processing stochastic package queries (SPQS), in order to solve optimization problems over uncertain data, right where the data resides. Prior work in stochastic programming uses Monte Carlo methods where the original stochastic optimization problem is approximated by a large deterministic optimization problem that incorporates many "scenarios", i.e., sample realizations of the uncertain data values. For large database tables, however, a huge number of scenarios is required, leading to poor performance and, often, failure of the solver software. We therefore provide a novel ßs algorithm that, instead of trying to solve a large deterministic problem, seamlessly approximates it via a sequence of smaller problems defined over carefully crafted "summaries" of the scenarios that accelerate convergence to a feasible and near-optimal solution. Experimental results on our prototype system show that ßs can be orders of magnitude faster than prior methods at finding feasible and high-quality packages.

KW - data integration

KW - decision making

KW - optimization

KW - package queries

KW - portfolio optimization

KW - prescriptive analytics

KW - probabilistic databases

KW - simulation

KW - stochastic programming

UR - http://www.scopus.com/inward/record.url?scp=85086242828&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85086242828&partnerID=8YFLogxK

U2 - 10.1145/3318464.3389765

DO - 10.1145/3318464.3389765

M3 - Conference contribution

AN - SCOPUS:85086242828

T3 - Proceedings of the ACM SIGMOD International Conference on Management of Data

SP - 269

EP - 283

BT - SIGMOD 2020 - Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data

PB - Association for Computing Machinery

T2 - 2020 ACM SIGMOD International Conference on Management of Data, SIGMOD 2020

Y2 - 14 June 2020 through 19 June 2020

ER -