DOI: 10.1145/3318464.3389765
Research article

Stochastic Package Queries in Probabilistic Databases

Published: 31 May 2020

Abstract

We provide methods for in-database support of decision making under uncertainty. Many important decision problems correspond to selecting a "package" (a bag of tuples in a relational database) whose tuples jointly satisfy a set of constraints while minimizing some overall "cost" function; in most real-world problems, the data is uncertain. We provide methods for specifying, via a SQL extension, and processing stochastic package queries (SPQs), in order to solve optimization problems over uncertain data, right where the data resides. Prior work in stochastic programming uses Monte Carlo methods in which the original stochastic optimization problem is approximated by a large deterministic optimization problem that incorporates many "scenarios", i.e., sample realizations of the uncertain data values. For large database tables, however, a huge number of scenarios is required, leading to poor performance and, often, failure of the solver software. We therefore provide a novel SummarySearch algorithm that, instead of trying to solve a large deterministic problem, seamlessly approximates it via a sequence of smaller problems defined over carefully crafted "summaries" of the scenarios that accelerate convergence to a feasible and near-optimal solution. Experimental results on our prototype system show that SummarySearch can be orders of magnitude faster than prior methods at finding feasible and high-quality packages.
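To make the scenario-based approximation described in the abstract concrete, the following is a minimal Python sketch of the naive Monte Carlo formulation, not of the paper's own algorithm. It assumes each tuple has a known cost and an uncertain value drawn from a normal distribution, and it replaces the probabilistic constraint "total value reaches a threshold with probability at least p" with a count over sampled scenarios. The function names, the normal model, and the brute-force search (standing in for an integer programming solver) are all illustrative assumptions.

```python
import itertools
import random


def sample_scenarios(means, stdevs, num_scenarios, seed=0):
    """Draw scenarios: each scenario is one sampled value per tuple.

    Assumes (for illustration only) independent normal values per tuple.
    """
    rng = random.Random(seed)
    return [
        [rng.gauss(m, s) for m, s in zip(means, stdevs)]
        for _ in range(num_scenarios)
    ]


def best_package(costs, scenarios, size, threshold, min_prob):
    """Return (cost, package) for the cheapest feasible package, or None.

    A package is a bag of `size` tuple indices (repetition allowed).
    It is feasible if, in at least a `min_prob` fraction of scenarios,
    the sum of its sampled values is >= `threshold`.
    Brute force over all bags; only suitable for tiny inputs.
    """
    best = None
    indices = range(len(costs))
    for package in itertools.combinations_with_replacement(indices, size):
        # Count the scenarios in which this package meets the threshold.
        satisfied = sum(
            1
            for scenario in scenarios
            if sum(scenario[i] for i in package) >= threshold
        )
        if satisfied / len(scenarios) < min_prob:
            continue
        cost = sum(costs[i] for i in package)
        if best is None or cost < best[0]:
            best = (cost, package)
    return best
```

The work per candidate package grows linearly with the number of scenarios, and the number of candidate packages grows combinatorially with the table size, which gives a rough sense of why the abstract reports that this deterministic approximation becomes impractical for large tables.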

Cited By

  • (2022) HypeR: Hypothetical Reasoning With What-If and How-To Queries Using a Probabilistic Causal Approach. Proceedings of the 2022 International Conference on Management of Data, pages 1598-1611. DOI: 10.1145/3514221.3526149. Online publication date: 10-Jun-2022
  • (2022) Multi-dimensional Probabilistic Regression over Imprecise Data Streams. Proceedings of the ACM Web Conference 2022, pages 3317-3326. DOI: 10.1145/3485447.3512150. Online publication date: 25-Apr-2022
  • (2021) Database systems research in the Arab world. Communications of the ACM, 64:4, pages 120-123. DOI: 10.1145/3447750. Online publication date: 22-Mar-2021
  • (2020) sPaQLTooLs. Proceedings of the VLDB Endowment, 13:12, pages 2881-2884. DOI: 10.14778/3415478.3415499. Online publication date: 14-Sep-2020

Published In

SIGMOD '20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data
June 2020
2925 pages
ISBN: 9781450367356
DOI: 10.1145/3318464
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. data integration
  2. decision making
  3. optimization
  4. package queries
  5. portfolio optimization
  6. prescriptive analytics
  7. probabilistic databases
  8. simulation
  9. stochastic programming

Qualifiers

  • Research-article

Funding Sources

  • NSF
  • Swiss Re Institute
  • NYUAD Research Institute

Conference

SIGMOD/PODS '20
Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%
