DOI: 10.1145/3542929.3563476

Demeter: QoS-aware CPU scheduling to reduce power consumption of multiple black-box workloads

Published: 07 November 2022

Abstract

Energy consumption in cloud data centers has become an increasingly important contributor to greenhouse gas emissions and operating costs. To reduce energy-related costs and improve environmental sustainability, most modern data centers consolidate Virtual Machine (VM) workloads belonging to different application classes: some are latency-critical (LC), while others, known as best-effort (BE), are more tolerant of performance changes. However, in public cloud scenarios, the actual classes of applications are often opaque to data center operators. Heterogeneous applications from different cloud tenants are usually consolidated onto the same hosts to improve energy efficiency, but guaranteeing adequate performance isolation among colocated workloads is not trivial. We tackle these challenges by introducing Demeter, a QoS-aware power management controller for heterogeneous black-box workloads in public clouds. Demeter is designed to work without offline profiling or prior knowledge about black-box workloads. Through correlation analysis between network throughput and CPU resource utilization, Demeter automatically classifies black-box workloads as either LC or BE. By applying differentiated CPU management strategies (including dynamic core allocation and frequency scaling) to LC and BE workloads, Demeter achieves considerable power savings with minimal impact on the performance of all workloads. We discuss the design and implementation of Demeter in this work, and conduct extensive experimental evaluations to demonstrate its effectiveness. Our results show that Demeter not only meets the performance demands of all workloads, but also responds quickly to dynamic load changes in our cloud environment. In addition, Demeter reduces power consumption by an average of 10.6% compared with state-of-the-art mechanisms.
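The classification step described in the abstract can be illustrated with a minimal sketch. This is not Demeter's implementation: the abstract only states that the classification rests on a correlation analysis between network throughput and CPU utilization. The sketch assumes a Pearson coefficient, assumes that a strong positive correlation indicates a request-driven (LC) workload, and uses a hypothetical threshold of 0.8. The function names `pearson` and `classify` are illustrative only.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation of two equal-length series.

    Returns 0.0 when either series is constant (correlation undefined).
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 samples")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def classify(net_throughput: Sequence[float],
             cpu_util: Sequence[float],
             threshold: float = 0.8) -> str:
    """Label a workload "LC" or "BE" from sampled VM metrics.

    Assumption (not from the paper): CPU usage that tracks network
    throughput closely suggests a latency-critical, request-driven service.
    The 0.8 threshold is a hypothetical value chosen for illustration.
    """
    return "LC" if pearson(net_throughput, cpu_util) >= threshold else "BE"


if __name__ == "__main__":
    # CPU utilization rises in step with network throughput.
    print(classify([10, 20, 30, 40], [5, 10, 15, 20]))
    # CPU utilization stays flat regardless of network throughput.
    print(classify([10, 20, 30, 40], [90, 90, 90, 90]))
```

Because the sketch only consumes per-VM counters that a hypervisor can observe from outside the guest, it matches the black-box setting the abstract describes; the choice of coefficient, sampling window, and threshold would need to follow the paper itself.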


Published In

SoCC '22: Proceedings of the 13th Symposium on Cloud Computing
November 2022
574 pages
ISBN:9781450394147
DOI:10.1145/3542929
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. cloud computing
  2. power management
  3. quality of service
  4. workload characterization

Qualifiers

  • Research-article

Conference

SoCC '22: ACM Symposium on Cloud Computing
November 7 - 11, 2022
San Francisco, California

Acceptance Rates

Overall Acceptance Rate 169 of 722 submissions, 23%

Cited By

  • (2024) INS: Identifying and Mitigating Performance Interference in Clouds via Interference-Sensitive Paths. Proceedings of the 2024 ACM Symposium on Cloud Computing, 380-397. DOI: 10.1145/3698038.3698508. Online publication date: 20-Nov-2024
  • (2024) vSPACE: Supporting Parallel Network Packet Processing in Virtualized Environments through Dynamic Core Management. Proceedings of the 2024 International Conference on Parallel Architectures and Compilation Techniques, 14-25. DOI: 10.1145/3656019.3689610. Online publication date: 14-Oct-2024
  • (2024) PvCC: A vCPU Scheduling Policy for DPDK-applied Systems at Multi-Tenant Edge Data Centers. Proceedings of the 25th International Middleware Conference, 379-391. DOI: 10.1145/3652892.3700779. Online publication date: 2-Dec-2024
  • (2024) SLO-Power: SLO and Power-aware Elastic Scaling for Web Services. 2024 IEEE 24th International Symposium on Cluster, Cloud and Internet Computing (CCGrid), 136-147. DOI: 10.1109/CCGrid59990.2024.00025. Online publication date: 6-May-2024
  • (2024) Improving the Efficiency of Serverless Computing via Core-Level Power Management. 2024 IEEE 24th International Symposium on Cluster, Cloud and Internet Computing (CCGrid), 125-135. DOI: 10.1109/CCGrid59990.2024.00024. Online publication date: 6-May-2024
  • (2024) Detection of quality of service degradation on multi-tenant containerized services. Journal of Network and Computer Applications 224:C. DOI: 10.1016/j.jnca.2024.103839. Online publication date: 2-Jul-2024
  • (2023) ODRL: Reinforcement Learning in Priority Scheduling for Running Cost Optimization. 2023 IEEE 29th International Conference on Parallel and Distributed Systems (ICPADS), 2410-2419. DOI: 10.1109/ICPADS60453.2023.00322. Online publication date: 17-Dec-2023
  • (2023) Energy-Aware Online Task Offloading and Resource Allocation for Mobile Edge Computing. 2023 IEEE 43rd International Conference on Distributed Computing Systems (ICDCS), 339-349. DOI: 10.1109/ICDCS57875.2023.00073. Online publication date: Jul-2023
  • (2023) Thoth: Provisioning Over-Committed Memory Resource with Differentiated QoS in Public Clouds. 2023 IEEE International Conference on High Performance Computing & Communications, Data Science & Systems, Smart City & Dependability in Sensor, Cloud & Big Data Systems & Application (HPCC/DSS/SmartCity/DependSys), 82-89. DOI: 10.1109/HPCC-DSS-SmartCity-DependSys60770.2023.00021. Online publication date: 17-Dec-2023
