Parslo: A Gradient Descent-based Approach for Near-optimal Partial SLO Allotment in Microservices

Authors:

Amirhossein Mirhosseini,

Sameh Elnikety,

Thomas F. Wenisch

SoCC '21: Proceedings of the ACM Symposium on Cloud Computing

Pages 442 - 457

https://doi.org/10.1145/3472883.3486985

Published: 01 November 2021

Abstract

Modern cloud services are implemented as graphs of loosely-coupled microservices to improve programmability, reliability, and scalability. Service Level Objectives (SLOs) define end-to-end latency targets for the entire service to ensure user satisfaction. In such environments, each microservice is independently deployed and (auto-)scaled. However, it is unclear how to optimally scale individual microservices when end-to-end SLOs are violated or underutilized, and how to size each microservice to meet the end-to-end SLO at minimal total cost. In this paper, we propose Parslo, a Gradient Descent-based approach to assign partial SLOs among nodes in a microservice graph under an end-to-end latency SLO. At a high level, the Parslo algorithm breaks the end-to-end SLO budget into small incremental "SLO units", and iteratively allocates one marginal SLO unit to the best candidate microservice to achieve the highest total cost savings until the entire end-to-end SLO budget is exhausted. Parslo achieves a near-optimal solution, seeking to minimize the total cost for the entire service deployment, and is applicable to general microservice graphs that comprise patterns like dynamic branching, parallel fan-out, and microservice dependencies. Parslo reduces service deployment costs by more than 6x in real microservice-based applications, compared to a state-of-the-art partial SLO assignment scheme.
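The unit-by-unit allocation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a purely serial chain of microservices (no branching or fan-out), so that partial SLOs simply add up to the end-to-end SLO. The function name `allot_partial_slos`, its parameters, and the cost curves of the form `a / latency` are all hypothetical choices made for illustration.

```python
from typing import Callable, List


def allot_partial_slos(
    cost_fns: List[Callable[[float], float]],
    min_latency: List[float],
    end_to_end_slo: float,
    unit: float,
) -> List[float]:
    """Greedy allotment of SLO units over a serial chain (illustrative only).

    cost_fns[i](t) is an assumed deployment cost of service i when its
    partial SLO is t; it should decrease as t grows.
    """
    # Start every service at the tightest partial SLO it can be given.
    alloc = list(min_latency)
    remaining = end_to_end_slo - sum(alloc)
    if remaining < 0:
        raise ValueError("end-to-end SLO is below the sum of minimum latencies")

    # Hand out one unit at a time until no whole unit fits in the budget.
    # The small tolerance absorbs floating-point drift for fractional units.
    while remaining >= unit - 1e-12:
        # Cost saved by relaxing each service's partial SLO by one unit.
        savings = [f(a) - f(a + unit) for f, a in zip(cost_fns, alloc)]
        # The best candidate is the service with the largest marginal saving.
        best = max(range(len(alloc)), key=savings.__getitem__)
        alloc[best] += unit
        remaining -= unit
    return alloc


# Hypothetical example: two services whose cost falls as 1/t and 4/t.
example = allot_partial_slos(
    [lambda t: 1.0 / t, lambda t: 4.0 / t],
    min_latency=[1, 1],
    end_to_end_slo=5,
    unit=1,
)
print(example)  # [2, 3]
```

In this toy setting the service with the steeper cost curve receives more of the budget. Handling dynamic branching, parallel fan-out, and dependencies, which the abstract says Parslo supports, would require a different way of composing partial SLOs than the plain sum assumed here.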




Index Terms

Networks → Network services → Cloud computing



Published In

SoCC '21: Proceedings of the ACM Symposium on Cloud Computing

November 2021

685 pages

ISBN: 9781450386388

DOI: 10.1145/3472883

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

Research-article
Research
Refereed limited

Funding Sources

Defense Advanced Research Projects Agency

Conference

SoCC '21: ACM Symposium on Cloud Computing

November 1 - 4, 2021

Seattle, WA, USA

Acceptance Rates

Overall Acceptance Rate 169 of 722 submissions, 23%
