research-article

CPS: A Cooperative Para-virtualized Scheduling Framework for Manycore Machines

Authors:

Haibo Chen

ASPLOS '23: Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 4

Pages 43 - 56

https://doi.org/10.1145/3623278.3624762

Published: 07 February 2024

Abstract

Today's cloud platforms offer large virtual machine (VM) instances with multiple virtual CPUs (vCPUs) on manycore machines. These machines typically have a deep memory hierarchy to enhance communication between cores. Previous research on performance scalability in virtualized environments has focused on the double scheduling problem, mainly the preemption of synchronization primitives and the traditional NUMA architecture. This paper targets a different source of scalability issues: the absence of runtime hypervisor-internal states (RHS) in the guest. We demonstrate two typical RHS problems, namely invisible pCPU (physical CPU) load and dynamic cache group mapping. Because the guest VM lacks visibility into the latest runtime internal states maintained by the hypervisor, such as pCPU load and vCPU-pCPU mappings, it makes inefficient scheduling decisions, which lead to a collapse in VM performance and low CPU utilization.

To address the RHS issue, we argue that the guest and host schedulers should expose their latest scheduling decisions to each other. We therefore present CPS, a cooperative para-virtualized scheduling framework that enables the proactive exchange of timely scheduling information between the hypervisor and guest VMs. Building on the exchanged information, we propose a series of techniques that help VMs make effective scheduling decisions. We have implemented CPS in Linux KVM and designed corresponding solutions for the two RHS problems. Evaluation results show that, for the two identified problems, CPS improves the performance of PARSEC by 81.1% and FxMark by 1.01x on average.
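To make the two RHS problems concrete, the following is a minimal sketch of how a guest's placement choice could change once host-side state is visible. It is an illustration only and is not taken from the CPS implementation: the `VcpuHint` record, its fields, and both selection functions are hypothetical names, and the cost function is an assumed heuristic rather than the paper's policy.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class VcpuHint:
    """Hypothetical per-vCPU record that a hypervisor could publish to a guest."""
    pcpu_load: float   # assumed: share of the backing pCPU used by other tenants (0.0 to 1.0)
    cache_group: int   # assumed: cache group of the pCPU currently backing this vCPU


def pick_naive(runqueue_len: Dict[int, int]) -> int:
    """Guest-only view: pick the vCPU with the shortest guest run queue."""
    return min(runqueue_len, key=lambda v: (runqueue_len[v], v))


def pick_informed(runqueue_len: Dict[int, int],
                  hints: Dict[int, VcpuHint],
                  waker_vcpu: int) -> int:
    """Combine the guest run queue length with the host-side hints."""
    waker_group = hints[waker_vcpu].cache_group

    def cost(v: int):
        hint = hints[v]
        # Host contention is added to the guest-visible load (invisible pCPU load).
        load = runqueue_len[v] + hint.pcpu_load
        # Among equally loaded vCPUs, prefer the waker's current cache group
        # (dynamic cache group mapping).
        remote = 0 if hint.cache_group == waker_group else 1
        return (load, remote, v)

    return min(runqueue_len, key=cost)
```

For example, if vCPU 1 looks idle inside the guest but its pCPU is 90% busy with other tenants, `pick_naive` still selects it, while `pick_informed` moves to an idle vCPU whose pCPU is free and shares a cache group with the waker.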


Cited By

Dong Y, Mi Z (2024) IOGuard: Software-Based I/O Page Fault Handling with One CPU Core. Proceedings of the 15th Asia-Pacific Symposium on Internetware, pages 337-346. DOI: 10.1145/3671016.3671394. Online publication date: 24-Jul-2024
https://dl.acm.org/doi/10.1145/3671016.3671394

Index Terms

CPS: A Cooperative Para-virtualized Scheduling Framework for Manycore Machines
1. Computer systems organization
  1. Architectures
    1. Parallel architectures
      1. Multicore architectures
2. Software and its engineering
  1. Software organization and properties
    1. Contextual software domains
      1. Operating systems
      2. Software infrastructure
        1. Virtual machines


Published In


ASPLOS '23: Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 4

March 2023

430 pages

ISBN:9798400703942

DOI:10.1145/3623278

Chair: Tor Aamodt
Program Chair: Michael M Swift
Program Co-chair: Natalie Enright Jerger

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

In-Cooperation

SIGBED: ACM Special Interest Group on Embedded Systems

Publisher

Association for Computing Machinery

New York, NY, United States


Funding Sources

The National Natural Science Foundation of China (NSFC)

Conference

ASPLOS '23: 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 4

March 25 - 29, 2023

Vancouver, BC, Canada

Acceptance Rates

Overall Acceptance Rate 535 of 2,713 submissions, 20%

Article Metrics

Total Citations: 1
Total Downloads: 360
Downloads (Last 12 months): 360
Downloads (Last 6 weeks): 45

Reflects downloads up to 09 Nov 2024
