Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.1145/3318464.3389756acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article
Public Access

Thrifty Query Execution via Incrementability

Published: 31 May 2020 Publication History

Abstract

Many applications schedule queries before all data is ready. To return fast query results, database systems can eagerly process existing data and incrementally incorporate new data into prior intermediate results, which often relies on incremental view maintenance (IVM) techniques. However, incrementally maintaining a query result can increase the total amount of work mainly as some early work is not useful for computing the final query result. In this paper, we propose a new metric incrementability to quantify the cost-effectiveness of IVM to decide how eagerly or lazily databases should incrementally execute a query. We further observe that different parts of a query have different levels of incrementability and the query execution should have a decomposed control flow based on the difference. Therefore, to address these needs, we propose a new query processing method Incrementability-aware Query Processing (InQP). We build a prototype InQP system based on Spark and show that InQP significantly reduces resource consumption with a similar latency compared with incrementability-oblivious approaches.

Supplementary Material

MP4 File (3318464.3389756.mp4)
Presentation Video

References

[1]
Apache kafka. https://kafka.apache.org/.
[2]
Spark structured streaming. https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html.
[3]
Y. Ahmad, O. Kennedy, C. Koch, and M. Nikolic. DBToaster: Higher-order delta processing for dynamic, frequently fresh views. Proc. VLDB Endow., 5(10):968--979, 2012.
[4]
A. Ayad and J. F. Naughton. Static optimization of conjunctive queries with sliding windows over infinite streams. In Proceedings of the ACM SIGMOD International Conference on Management of Data, Paris, France, June 13--18, 2004, pages 419--430, 2004.
[5]
B. Babcock, S. Babu, M. Datar, and R. Motwani. Chain: Operator scheduling for memory minimization in data stream systems. In Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, San Diego, California, USA, June 9--12, 2003, pages 253--264, 2003.
[6]
J. A. Blakeley, P. Larson, and F. W. Tompa. Efficiently updating materialized views. In Proceedings of the 1986 ACM SIGMOD International Conference on Management of Data, Washington, DC, USA, May 28--30, 1986, pages 61--71, 1986.
[7]
W. Cai, M. Balazinska, and D. Suciu. Pessimistic cardinality estimation: Tighter upper bounds for intermediate join cardinalities. In Proceedings of the 2019 International Conference on Management of Data, SIGMOD Conference 2019, Amsterdam, The Netherlands, June 30 - July 5, 2019, pages 18--35, 2019.
[8]
P. Carbone, A. Katsifodimos, S. Ewen, V. Markl, S. Haridi, and K. Tzoumas. Apache flink#8482;: Stream and batch processing in a single engine. IEEE Data Eng. Bull., 38(4):28--38, 2015.
[9]
D. Carney, U. cC etintemel, A. Rasin, S. B. Zdonik, M. Cherniack, and M. Stonebraker. Operator scheduling in a data stream manager. In Proceedings of 29th International Conference on Very Large Data Bases, VLDB 2003, Berlin, Germany, September 9--12, 2003, pages 838--849, 2003.
[10]
B. Chandramouli, J. Goldstein, M. Barnett, R. DeLine, J. C. Platt, J. F. Terwilliger, and J. Wernsing. Trill: A high-performance incremental query processor for diverse analytics. Proc. VLDB Endow., 8(4):401--412, 2014.
[11]
S. Chandrasekaran, O. Cooper, A. Deshpande, M. J. Franklin, J. M. Hellerstein, W. Hong, S. Krishnamurthy, S. Madden, V. Raman, F. Reiss, and M. A. Shah. Telegraphcq: Continuous dataflow processing for an uncertain world. In CIDR 2003, First Biennial Conference on Innovative Data Systems Research, Asilomar, CA, USA, January 5--8, 2003, Online Proceedings, 2003.
[12]
S. Chaudhuri, V. R. Narasayya, and R. Ramamurthy. Estimating progress of long running SQL queries. In Proceedings of the ACM SIGMOD International Conference on Management of Data, Paris, France, June 13--18, 2004, pages 803--814, 2004.
[13]
J. Chen, D. J. DeWitt, F. Tian, and Y. Wang. Niagaracq: A scalable continuous query system for internet databases. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, May 16--18, 2000, Dallas, Texas, USA., pages 379--390, 2000.
[14]
R. Chirkova and J. Yang. Materialized views. Foundations and Trends in Databases, 4(4):295--405, 2012.
[15]
Y. Chung, M. L. Mortensen, C. Binnig, and T. Kraska. Estimating the impact of unknown unknowns on aggregate query results. In Proceedings of the 2016 International Conference on Management of Data, SIGMOD Conference 2016, San Francisco, CA, USA, June 26 - July 01, 2016, pages 861--876, 2016.
[16]
L. S. Colby, T. Griffin, L. Libkin, I. S. Mumick, and H. Trickey. Algorithms for deferred view maintenance. In Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, Montreal, Quebec, Canada, June 4--6, 1996, pages 469--480, 1996.
[17]
L. S. Colby, A. Kawaguchi, D. F. Lieuwen, I. S. Mumick, and K. A. Ross. Supporting multiple view maintenance policies. In SIGMOD 1997, Proceedings ACM SIGMOD International Conference on Management of Data, May 13--15, 1997, Tucson, Arizona, USA., pages 405--416, 1997.
[18]
C. Estan and J. F. Naughton. End-biased samples for join cardinality estimation. In Proceedings of the 22nd International Conference on Data Engineering, ICDE 2006, 3--8 April 2006, Atlanta, GA, USA, page 20, 2006.
[19]
L. Gao, M. Wang, X. S. Wang, and S. Padmanabhan. A learning-based approach to estimate statistics of operators in continuous queries: a case study. In Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery, DMKD 2003, San Diego, California, USA, June 13, 2003, pages 66--72, 2003.
[20]
G. Graefe. Query evaluation techniques for large databases. ACM Computing Surveys (CSUR), 25(2):73--169, 1993.
[21]
T. Griffin and B. Kumar. Algebraic change propagation for semijoin and outerjoin queries. SIGMOD Record, 27(3):22--27, 1998.
[22]
T. Griffin and L. Libkin. Incremental maintenance of views with duplicates. In Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data, San Jose, California, USA, May 22--25, 1995, pages 328--339, 1995.
[23]
A. Gupta, I. S. Mumick, and V. S. Subrahmanian. Maintaining views incrementally. In Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, Washington, DC, USA, May 26--28, 1993, pages 157--166, 1993.
[24]
H. He, J. Xie, J. Yang, and H. Yu. Asymmetric batch incremental view maintenance. In K. Aberer, M. J. Franklin, and S. Nishio, editors, Proceedings of the 21st International Conference on Data Engineering, ICDE 2005, 5--8 April 2005, Tokyo, Japan, pages 106--117. IEEE Computer Society, 2005.
[25]
J. M. Hellerstein, P. J. Haas, and H. J. Wang. Online aggregation. In J. Peckham, editor, SIGMOD 1997, Proceedings ACM SIGMOD International Conference on Management of Data, May 13--15, 1997, Tucson, Arizona, USA, pages 171--182. ACM Press, 1997.
[26]
M. Idris, M. Ugarte, and S. Vansummeren. The dynamic yannakakis algorithm: Compact and efficient query processing under updates. In Proceedings of the 2017 ACM International Conference on Management of Data, SIGMOD Conference 2017, Chicago, IL, USA, May 14--19, 2017, pages 1259--1274, 2017.
[27]
S. Krishnan, Z. Yang, K. Goldberg, J. M. Hellerstein, and I. Stoica. Learning to optimize join queries with deep reinforcement learning. CoRR, abs/1808.03196, 2018.
[28]
P. Larson and J. Zhou. Efficient maintenance of materialized outer-join views. In Proceedings of the 23rd International Conference on Data Engineering, ICDE 2007, The Marmara Hotel, Istanbul, Turkey, April 15--20, 2007, pages 56--65, 2007.
[29]
V. Leis, A. Gubichev, A. Mirchev, P. A. Boncz, A. Kemper, and T. Neumann. How good are query optimizers, really? Proc. VLDB Endow., 9(3):204--215, 2015.
[30]
V. Leis, B. Radke, A. Gubichev, A. Kemper, and T. Neumann. Cardinality estimation done right: Index-based join sampling. In CIDR 2017, 8th Biennial Conference on Innovative Data Systems Research, Chaminade, CA, USA, January 8--11, 2017, Online Proceedings, 2017.
[31]
L. F. Mackert and G. M. Lohman. R* optimizer validation and performance evaluation for local queries. In Proceedings of the 1986 ACM SIGMOD International Conference on Management of Data, Washington, DC, USA, May 28--30, 1986, pages 84--95, 1986.
[32]
T. Malik, R. C. Burns, and N. V. Chawla. A black-box approach to query cardinality estimation. In CIDR 2007, Third Biennial Conference on Innovative Data Systems Research, Asilomar, CA, USA, January 7--10, 2007, Online Proceedings, pages 56--67, 2007.
[33]
R. C. Marcus, P. Negi, H. Mao, C. Zhang, M. Alizadeh, T. Kraska, O. Papaemmanouil, and N. Tatbul. Neo: A learned query optimizer. Proc. VLDB Endow., 12(11):1705--1718, 2019.
[34]
M. Nikolic, M. Dashti, and C. Koch. How to win a hot dog eating contest: Distributed incremental view maintenance with batch updates. In Proceedings of the 2016 International Conference on Management of Data, SIGMOD Conference 2016, San Francisco, CA, USA, June 26 - July 01, 2016, pages 511--526, 2016.
[35]
J. Ortiz, M. Balazinska, J. Gehrke, and S. S. Keerthi. An empirical analysis of deep learning for cardinality estimation. CoRR, abs/1905.06425, 2019.
[36]
P. G. Selinger, M. M. Astrahan, D. D. Chamberlin, R. A. Lorie, and T. G. Price. Access path selection in a relational database management system. In Proceedings of the 1979 ACM SIGMOD International Conference on Management of Data, Boston, Massachusetts, USA, May 30 - June 1., pages 23--34, 1979.
[37]
M. A. Sharaf, P. K. Chrysanthis, A. Labrinidis, and K. Pruhs. Algorithms and metrics for processing multiple heterogeneous continuous queries. ACM Trans. Database Syst., 33(1):5:1--5:44, 2008.
[38]
M. Stillger, G. M. Lohman, V. Markl, and M. Kandil. LEO - db2's learning optimizer. In VLDB 2001, Proceedings of 27th International Conference on Very Large Data Bases, September 11--14, 2001, Roma, Italy, pages 19--28, 2001.
[39]
R. Taft, N. El-Sayed, M. Serafini, Y. Lu, A. Aboulnaga, M. Stonebraker, R. Mayerhofer, and F. Andrade. P-store: An elastic database system with predictive provisioning. In Proceedings of the 2018 International Conference on Management of Data, pages 205--219. ACM, 2018.
[40]
D. Tang, Z. Shang, A. J. Elmore, S. Krishnan, and M. J. Franklin. Intermittent query processing. Proc. VLDB Endow., 12(11):1427--1441, July 2019.
[41]
Y. Tao, M. L. Yiu, D. Papadias, M. Hadjieleftheriou, and N. Mamoulis. RPJ: producing fast join results on streams through rate-based optimization. In Proceedings of the ACM SIGMOD International Conference on Management of Data, Baltimore, Maryland, USA, June 14--16, 2005, pages 371--382, 2005.
[42]
I. Trummer. Exact cardinality query optimization with bounded execution cost. In Proceedings of the 2019 International Conference on Management of Data, SIGMOD Conference 2019, Amsterdam, The Netherlands, June 30 - July 5, 2019, pages 2--17, 2019.
[43]
S. Venkataraman, A. Panda, K. Ousterhout, M. Armbrust, A. Ghodsi, M. J. Franklin, B. Recht, and I. Stoica. Drizzle: Fast and adaptable stream processing at scale. In Proceedings of the 26th Symposium on Operating Systems Principles, Shanghai, China, October 28--31, 2017, pages 374--389, 2017.
[44]
S. Viglas and J. F. Naughton. Rate-based query optimization for streaming information sources. In Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, Madison, Wisconsin, USA, June 3--6, 2002, pages 37--48, 2002.
[45]
S. Viglas, J. F. Naughton, and J. Burger. Maximizing the output rate of multi-way join queries over streaming information sources. In Proceedings of 29th International Conference on Very Large Data Bases, VLDB 2003, Berlin, Germany, September 9--12, 2003, pages 285--296, 2003.
[46]
A. Wilschut and P. Apers. Pipelining in query execution. In Proceedings of the International Conference on Databases, Parallel Architectures and Their Applications (PARBASE 1990), pages 562--562, United States, 3 1990. IEEE Computer Society.
[47]
K. Zeng, S. Agarwal, and I. Stoica. iOLAP: Managing uncertainty for efficient incremental OLAP. In Proceedings of the 2016 International Conference on Management of Data, SIGMOD Conference 2016, San Francisco, CA, USA, June 26 - July 01, 2016, pages 1347--1361, 2016.
[48]
J. Zhou, P. Larson, and H. G. Elmongui. Lazy maintenance of materialized views. In Proceedings of the 33rd International Conference on Very Large Data Bases, University of Vienna, Austria, September 23--27, 2007, pages 231--242, 2007.

Cited By

View all
  • (2023)F-IVM: analytics over relational databases under updatesThe VLDB Journal10.1007/s00778-023-00817-w33:4(903-929)Online publication date: 14-Nov-2023
  • (2023)Tempura: a general cost-based optimizer framework for incremental data processing (Journal Version)The VLDB Journal10.1007/s00778-023-00785-132:6(1315-1342)Online publication date: 20-Mar-2023
  • (2022)Efficient Incrementialization of Correlated Nested Aggregate Queries using Relative Partial Aggregate Indexes (RPAI)Proceedings of the 2022 International Conference on Management of Data10.1145/3514221.3517889(136-149)Online publication date: 10-Jun-2022
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
SIGMOD '20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data
June 2020
2925 pages
ISBN:9781450367356
DOI:10.1145/3318464
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 31 May 2020

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. cloud database
  2. incrementability
  3. incremental view maintenance
  4. non-positive query
  5. query service
  6. resource efficiency

Qualifiers

  • Research-article

Funding Sources

Conference

SIGMOD/PODS '20
Sponsor:

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)124
  • Downloads (Last 6 weeks)21
Reflects downloads up to 30 Aug 2024

Other Metrics

Citations

Cited By

View all
  • (2023)F-IVM: analytics over relational databases under updatesThe VLDB Journal10.1007/s00778-023-00817-w33:4(903-929)Online publication date: 14-Nov-2023
  • (2023)Tempura: a general cost-based optimizer framework for incremental data processing (Journal Version)The VLDB Journal10.1007/s00778-023-00785-132:6(1315-1342)Online publication date: 20-Mar-2023
  • (2022)Efficient Incrementialization of Correlated Nested Aggregate Queries using Relative Partial Aggregate Indexes (RPAI)Proceedings of the 2022 International Conference on Management of Data10.1145/3514221.3517889(136-149)Online publication date: 10-Jun-2022
  • (2022)Workload Prediction for Adaptive Approximate Query Processing2022 IEEE International Conference on Big Data (Big Data)10.1109/BigData55660.2022.10020614(217-222)Online publication date: 17-Dec-2022
  • (2021)Resource-efficient Shared Query Execution via Exploiting Time SlacknessProceedings of the 2021 International Conference on Management of Data10.1145/3448016.3457282(1797-1810)Online publication date: 9-Jun-2021
  • (2020)CrocodileDB in actionProceedings of the VLDB Endowment10.14778/3415478.341551313:12(2937-2940)Online publication date: 14-Sep-2020

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Get Access

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media