DOI: 10.1145/3626246.3653371
research-article
Open access

Proactive Resume and Pause of Resources for Microsoft Azure SQL Database Serverless

Published: 09 June 2024

Abstract

Demand-driven resource allocation for cloud databases has become a popular research direction. Recent approaches have evolved from reactive policies to proactive decision making. These approaches use not only the current resource demand but also the predicted demand to make better-informed resource allocation decisions for each database, and thus improve quality of service and reduce operational costs. We present an infrastructure that enables proactive resource allocation for millions of serverless Azure SQL databases. Our solution finds a near-optimal middle ground between high availability of resources, low operational costs, and low computational overhead of the proactive policy. We describe the design principles we followed and the architectural decisions we made during this cross-team, multi-year journey. Given the size and scope of our solution, we believe that relational cloud databases at other companies could benefit from proactive resource allocation capabilities.
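
The contrast between a reactive and a proactive policy can be illustrated with a minimal sketch. This is not the infrastructure described in the paper: the forecast format (a probability of activity per upcoming interval), the idle limit, the threshold, and the names `reactive_decision` and `proactive_decision` are all hypothetical, chosen only to show how predicted demand can change a pause or resume decision.

```python
from enum import Enum
from typing import Sequence


class Action(Enum):
    KEEP_RESUMED = "keep_resumed"
    PAUSE = "pause"
    RESUME = "resume"
    KEEP_PAUSED = "keep_paused"


def reactive_decision(is_resumed: bool, active_now: bool, idle_minutes: int,
                      idle_limit: int = 60) -> Action:
    """Reactive policy: uses only current demand and elapsed idle time.

    idle_limit is a hypothetical value, not one taken from the paper.
    """
    if active_now:
        return Action.KEEP_RESUMED if is_resumed else Action.RESUME
    if is_resumed:
        # Resources stay allocated until the idle limit expires.
        return Action.PAUSE if idle_minutes >= idle_limit else Action.KEEP_RESUMED
    return Action.KEEP_PAUSED


def proactive_decision(is_resumed: bool, active_now: bool,
                       predicted_activity: Sequence[float],
                       threshold: float = 0.5) -> Action:
    """Proactive policy: also consults predicted demand.

    predicted_activity[i] is an assumed forecast probability that the
    database is active in the i-th upcoming interval.
    """
    expects_demand = any(p >= threshold for p in predicted_activity)
    if active_now or expects_demand:
        # Resume ahead of predicted demand, or stay resumed through a short gap.
        return Action.KEEP_RESUMED if is_resumed else Action.RESUME
    # No current or predicted demand: release resources right away.
    return Action.PAUSE if is_resumed else Action.KEEP_PAUSED
```

Under these assumptions, an idle resumed database with no predicted demand is paused immediately instead of waiting for the idle limit, and a paused database with predicted demand is resumed before the first request arrives.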

Supplemental Material

MP4 File
Conference presentation
PPTX File
Conference presentation

    Published In

    SIGMOD/PODS '24: Companion of the 2024 International Conference on Management of Data
    June 2024
    694 pages
    ISBN: 9798400704222
    DOI: 10.1145/3626246
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. autonomous database
    2. proactive auto-scale of resources

    Qualifiers

    • Research-article

    Conference

    SIGMOD/PODS '24

    Acceptance Rates

    Overall Acceptance Rate 785 of 4,003 submissions, 20%

    Article Metrics

    • Total Citations: 0
    • Total Downloads: 380
    • Downloads (Last 12 months): 380
    • Downloads (Last 6 weeks): 49
    Reflects downloads up to 16 Feb 2025
