DOI: 10.1145/3472883.3486985

Parslo: A Gradient Descent-based Approach for Near-optimal Partial SLO Allotment in Microservices

Published: 01 November 2021

Abstract

Modern cloud services are implemented as graphs of loosely-coupled microservices to improve programmability, reliability, and scalability. Service Level Objectives (SLOs) define end-to-end latency targets for the entire service to ensure user satisfaction. In such environments, each microservice is independently deployed and (auto-)scaled. However, it is unclear how to optimally scale individual microservices when end-to-end SLOs are violated or underutilized, and how to size each microservice to meet the end-to-end SLO at minimal total cost. In this paper, we propose Parslo, a gradient descent-based approach to assign partial SLOs among nodes in a microservice graph under an end-to-end latency SLO. At a high level, the Parslo algorithm breaks the end-to-end SLO budget into small incremental "SLO units", and iteratively allocates one marginal SLO unit to the best candidate microservice to achieve the highest total cost savings until the entire end-to-end SLO budget is exhausted. Parslo achieves a near-optimal solution, seeking to minimize the total cost for the entire service deployment, and is applicable to general microservice graphs that comprise patterns like dynamic branching, parallel fan-out, and microservice dependencies. Parslo reduces service deployment costs by more than 6x in real microservice-based applications, compared to a state-of-the-art partial SLO assignment scheme.
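
The unit-by-unit allocation loop described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a simple serial chain of microservices (rather than the general graphs Parslo targets), integer latencies in milliseconds, and hypothetical per-service cost functions whose cost falls as the latency budget grows. The function name `allot_partial_slos` and all of its parameters are illustrative assumptions.

```python
from typing import Callable, List, Sequence


def allot_partial_slos(
    cost_fns: Sequence[Callable[[int], float]],
    min_latency: Sequence[int],
    slo: int,
    unit: int,
) -> List[int]:
    """Greedily split an end-to-end latency SLO across a serial chain.

    Assumptions (not from the paper): services run one after another,
    so partial SLOs simply add up; cost_fns[i](t) is the deployment cost
    of service i when it is given a latency budget of t milliseconds.
    """
    if unit <= 0:
        raise ValueError("unit must be a positive number of milliseconds")

    # Start every service at the smallest budget it can possibly meet.
    budgets = list(min_latency)
    remaining = slo - sum(budgets)
    if remaining < 0:
        raise ValueError("SLO is below the sum of the minimum latencies")

    # Hand out the slack one SLO unit at a time. Any leftover smaller
    # than one unit stays unallocated.
    for _ in range(remaining // unit):
        # Cost saved by each service if it alone received the next unit.
        savings = [f(b) - f(b + unit) for f, b in zip(cost_fns, budgets)]
        best = max(range(len(savings)), key=savings.__getitem__)
        budgets[best] += unit

    return budgets


# Hypothetical two-service chain with cost(t) = work / t.
budgets = allot_partial_slos(
    cost_fns=[lambda t: 100 / t, lambda t: 400 / t],
    min_latency=[10, 10],
    slo=60,
    unit=10,
)
print(budgets)  # [20, 40]
```

In this toy setting the heavier service ends up with the larger share of the budget, because each additional unit of latency saves more cost there until the marginal savings even out.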

    Published In

    SoCC '21: Proceedings of the ACM Symposium on Cloud Computing
    November 2021
    685 pages
    ISBN:9781450386388
    DOI:10.1145/3472883
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Auto-scaling
    2. Microservices
    3. Service Level Objectives

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    SoCC '21: ACM Symposium on Cloud Computing
    November 1 - 4, 2021
    Seattle, WA, USA

    Acceptance Rates

    Overall Acceptance Rate 169 of 722 submissions, 23%
