research-article

Blueprinting the Cloud: Unifying and Automatically Optimizing Cloud Data Infrastructures with BRAD

Published: 01 July 2024

Abstract

Modern organizations manage their data with a wide variety of specialized cloud database engines (e.g., Aurora and BigQuery). However, designing and managing such infrastructures is hard. Developers must consider many possible designs with non-obvious performance consequences; moreover, current software abstractions tightly couple applications to specific systems (e.g., with engine-specific clients), making the infrastructure difficult to change after initial deployment. A better solution would virtualize cloud data management, allowing developers to declaratively specify their workload requirements and rely on automated solutions to design and manage the physical realization. In this paper, we present a technique called blueprint planning that achieves this vision. The key idea is to project data infrastructure design decisions into a unified design space (blueprints). We then systematically search over candidate blueprints using cost-based optimization, leveraging learned models to predict the utility of a blueprint on the workload. We use this technique to build BRAD, the first cloud data virtualization system. BRAD users issue queries to a single SQL interface that can be backed by multiple cloud database services. BRAD automatically selects the most suitable engine for each query, provisions and manages resources to minimize costs, and evolves the infrastructure to adapt to workload shifts. Our evaluation shows that BRAD meets user-defined performance targets and improves cost savings by 1.6--13× compared to serverless auto-scaling or HTAP systems.
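
To make the idea of searching a unified design space concrete, the following is a minimal toy sketch, not BRAD's actual planner. It assumes a blueprint is just a provisioning level per engine plus a query-to-engine routing, and it replaces the learned models with a hand-written latency function. The engine names, costs, latencies, and every function name here are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical hourly cost of each engine at each provisioning level
# (level 0 means the engine is switched off). Made-up numbers.
HOURLY_COST = {
    "oltp_engine": {0: 0.0, 1: 1.0, 2: 2.0},
    "olap_engine": {0: 0.0, 1: 3.0, 2: 6.0},
}

# Hypothetical latency (seconds) of each query class on each engine at level 1.
BASE_LATENCY = {
    ("txn", "oltp_engine"): 0.01,
    ("txn", "olap_engine"): 0.50,
    ("report", "oltp_engine"): 40.0,
    ("report", "olap_engine"): 4.0,
}


@dataclass(frozen=True)
class Blueprint:
    sizes: tuple    # ((engine, level), ...)
    routing: tuple  # ((query_class, engine), ...)


def predict_latency(query_class, engine, level):
    """Stand-in for a learned model: latency shrinks linearly with level."""
    return BASE_LATENCY[(query_class, engine)] / level


def enumerate_blueprints(query_classes):
    """Yield every provisioning/routing combination in the toy design space."""
    engines = sorted(HOURLY_COST)
    for levels in product(*(sorted(HOURLY_COST[e]) for e in engines)):
        level_of = dict(zip(engines, levels))
        for targets in product(engines, repeat=len(query_classes)):
            # A query cannot be routed to an engine that is switched off.
            if any(level_of[e] == 0 for e in targets):
                continue
            yield Blueprint(tuple(level_of.items()),
                            tuple(zip(query_classes, targets)))


def cost(bp):
    """Total hourly cost of the provisioned engines."""
    return sum(HOURLY_COST[e][lvl] for e, lvl in bp.sizes)


def meets_targets(bp, latency_targets):
    """Check the predicted latency of every query class against its target."""
    level_of = dict(bp.sizes)
    return all(predict_latency(q, e, level_of[e]) <= latency_targets[q]
               for q, e in bp.routing)


def plan(latency_targets):
    """Return the cheapest blueprint meeting all targets, or None."""
    query_classes = sorted(latency_targets)
    feasible = (bp for bp in enumerate_blueprints(query_classes)
                if meets_targets(bp, latency_targets))
    return min(feasible, key=cost, default=None)


if __name__ == "__main__":
    best = plan({"txn": 0.05, "report": 5.0})
    print(dict(best.sizes), dict(best.routing), cost(best))
```

With tight targets the sketch keeps both engines and splits the workload between them; with loose targets it consolidates everything onto the cheaper engine. Exhaustive enumeration only works because the toy space is tiny; the abstract states only that the search is cost-based and guided by learned models.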


Published In

Proceedings of the VLDB Endowment  Volume 17, Issue 11
July 2024
1039 pages

Publisher

VLDB Endowment