Scalable package queries in relational database systems

Published: 01 March 2016

Abstract

Traditional database queries follow a simple model: they define constraints that each tuple in the result must satisfy. This model is computationally efficient, as the database system can evaluate the query conditions on each tuple individually. However, many practical, real-world problems require a collection of result tuples to satisfy constraints collectively, rather than individually. In this paper, we present package queries, a new query model that extends traditional database queries to handle complex constraints and preferences over answer sets. We develop a full-fledged package query system, implemented on top of a traditional database engine. Our work makes several contributions. First, we design PaQL, a SQL-based query language that supports the declarative specification of package queries. We prove that PaQL is at least as expressive as integer linear programming, and therefore, evaluation of package queries is in general NP-hard. Second, we present a fundamental evaluation strategy that combines the capabilities of databases and constraint optimization solvers to derive solutions to package queries. The core of our approach is a set of translation rules that transform a package query to an integer linear program. Third, we introduce an offline data partitioning strategy allowing query evaluation to scale to large data sizes. Fourth, we introduce SketchRefine, a scalable algorithm for package evaluation, with strong approximation guarantees ((1 ± ε)^6-factor approximation). Finally, we present extensive experiments over real-world and benchmark data. The results demonstrate that SketchRefine is effective at deriving high-quality package results, and achieves runtime performance that is an order of magnitude faster than directly using ILP solvers over large datasets.
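
To make the idea of collective constraints concrete, the following is a minimal sketch and not the paper's implementation. It assumes a hypothetical toy relation of meals, a hypothetical package size, and hypothetical calorie bounds, none of which come from the paper. Each tuple gets a 0/1 variable, as it would in an integer linear program, and exhaustive enumeration stands in for a real ILP solver. The enumeration visits all 2^n assignments, which is consistent with the abstract's remark that the problem is NP-hard in general.

```python
from itertools import product

# Hypothetical toy relation (not from the paper): (name, kcal, saturated_fat_g)
MEALS = [
    ("oatmeal", 600, 2),
    ("salad", 400, 1),
    ("pasta", 900, 3),
    ("steak", 1100, 9),
    ("soup", 500, 4),
]


def best_package(rows, size, kcal_lo, kcal_hi):
    """Solve a tiny 0/1 program by brute force.

    Variables:   x_i in {0, 1}, one per tuple (1 means "tuple is in the package").
    Constraints: sum(x_i) == size and kcal_lo <= sum(x_i * kcal_i) <= kcal_hi.
    Objective:   minimize sum(x_i * fat_i).
    Returns (objective_value, [names]) or None if no package is feasible.
    """
    best = None
    for x in product((0, 1), repeat=len(rows)):
        # Cardinality constraint on the package as a whole.
        if sum(x) != size:
            continue
        # Aggregate constraint: holds for the package, not for any single tuple.
        kcal = sum(xi * row[1] for xi, row in zip(x, rows))
        if not (kcal_lo <= kcal <= kcal_hi):
            continue
        fat = sum(xi * row[2] for xi, row in zip(x, rows))
        if best is None or fat < best[0]:
            best = (fat, [row[0] for xi, row in zip(x, rows) if xi])
    return best


if __name__ == "__main__":
    print(best_package(MEALS, size=3, kcal_lo=2000, kcal_hi=2500))
```

No single meal in this toy data has between 2000 and 2500 kcal, so a per-tuple filter would return nothing, while the package-level constraint admits several three-meal combinations. A practical system would hand the same variables, constraints, and objective to an ILP solver rather than enumerate assignments.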

Published In

Proceedings of the VLDB Endowment, Volume 9, Issue 7
March 2016
96 pages
ISSN: 2150-8097

Publisher

VLDB Endowment
