
Catalyst: GPU-assisted rapid memory deduplication in virtualization environments

Published: 08 April 2017

Abstract

Content-based page sharing techniques improve memory efficiency in virtualized systems by identifying and merging identical pages. Kernel Same-page Merging (KSM), a Linux kernel utility for page sharing, sequentially scans the memory pages of virtual machines to deduplicate them. Sequential scanning has several undesirable side effects: CPU cycles are wasted when no sharing opportunities exist, and the rate at which sharing is discovered depends on the scanning rate and the corresponding CPU availability. In this work, we exploit the presence of GPUs on modern systems to enable rapid memory sharing through targeted scanning of pages. Our solution, Catalyst, works in two phases: in the first, the GPU processes the pages of virtual machines to identify likely candidates for sharing; in the second, page-level similarity checks are performed on this targeted set of shareable pages. Opportunistic use of the GPU to produce sharing hints enables rapid, low-overhead duplicate detection and sharing of memory pages in virtualization environments. We evaluate Catalyst against various benchmarks and workloads to demonstrate that it can achieve higher memory sharing in less time than different scan-rate configurations of KSM, at lower or comparable compute costs.
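
The two-phase structure described in the abstract can be illustrated with a minimal CPU-only sketch. It is not the authors' implementation. The abstract does not say how the GPU identifies candidate pages, so the per-page digest used for the hint phase is an assumption, as are the 4 KiB page size and the names `hint_phase`, `merge_phase`, and `deduplicate`. In Catalyst the first phase runs on the GPU; here an ordinary loop stands in for it.

```python
import hashlib
from collections import defaultdict

PAGE_SIZE = 4096  # bytes; a common page size, assumed for this sketch


def hint_phase(pages):
    """Phase 1 (CPU stand-in for the GPU pass): bucket page ids by a digest.

    Only buckets holding two or more pages are returned as sharing hints.
    Using a digest here is an assumption, not something the abstract states.
    """
    buckets = defaultdict(list)
    for page_id, data in pages.items():
        digest = hashlib.blake2b(data, digest_size=8).digest()
        buckets[digest].append(page_id)
    return [ids for ids in buckets.values() if len(ids) > 1]


def merge_phase(pages, hints):
    """Phase 2: full byte comparison, restricted to the hinted pages.

    Returns a mapping from each duplicate page id to the page id it can share.
    """
    shared = {}
    for candidate_ids in hints:
        representatives = []  # page ids with distinct contents in this bucket
        for page_id in candidate_ids:
            for rep_id in representatives:
                if pages[page_id] == pages[rep_id]:
                    shared[page_id] = rep_id
                    break
            else:
                # No identical representative found: this page becomes one.
                representatives.append(page_id)
    return shared


def deduplicate(pages):
    """Run both phases over a {page_id: bytes} mapping."""
    return merge_phase(pages, hint_phase(pages))


if __name__ == "__main__":
    zero = bytes(PAGE_SIZE)
    demo_pages = {
        ("vm1", 0): zero,
        ("vm1", 1): b"\x01" * PAGE_SIZE,
        ("vm2", 0): zero,
        ("vm2", 1): b"\x02" * PAGE_SIZE,
    }
    print(deduplicate(demo_pages))
```

Pages that never land in a multi-page bucket are skipped by the second phase entirely, which is the point of targeted scanning: the expensive byte-for-byte comparison is spent only where a match is likely.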


Published In

ACM SIGPLAN Notices, Volume 52, Issue 7
VEE '17
July 2017
256 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/3140607

VEE '17: Proceedings of the 13th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments
April 2017
261 pages
ISBN: 9781450349482
DOI: 10.1145/3050748
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 08 April 2017
Published in SIGPLAN Volume 52, Issue 7


Author Tags

  1. Graphics Processing Units
  2. KSM
  3. Memory Deduplication
  4. Virtualization

Qualifiers

  • Tutorial
  • Research
  • Refereed limited

Article Metrics

  • Downloads (Last 12 months): 44
  • Downloads (Last 6 weeks): 1
Reflects downloads up to 28 Feb 2025

Cited By

  • (2024) CKSM: An Efficient Memory Deduplication Method for Container-based Cloud Computing Systems. 2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 76-88. DOI: 10.1109/IPDPS57955.2024.00016. Online publication date: 27-May-2024
  • (2024) Tackling Memory Footprint Expansion During Live Migration of Virtual Machines. 2024 IEEE 24th International Symposium on Cluster, Cloud and Internet Computing (CCGrid), pp. 158-167. DOI: 10.1109/CCGrid59990.2024.00027. Online publication date: 6-May-2024
  • (2023) Copy-on-Pin: The Missing Piece for Correct Copy-on-Write. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, pp. 176-191. DOI: 10.1145/3575693.3575716. Online publication date: 27-Jan-2023
  • (2023) Optimization of virtual machines performance using fuzzy hashing and genetic algorithm-based memory deduplication of static pages. Automatika 64:4, pp. 868-877. DOI: 10.1080/00051144.2023.2223479. Online publication date: 27-Jun-2023
  • (2023) Attackers as Instructors: Using Container Isolation to Reduce Risk and Understand Vulnerabilities. Detection of Intrusions and Malware, and Vulnerability Assessment, pp. 177-197. DOI: 10.1007/978-3-031-35504-2_9. Online publication date: 12-Jul-2023
  • (2022) LDT: Lightweight Dirty Tracking of Memory Pages for x86 Systems. 2022 IEEE 29th International Conference on High Performance Computing, Data, and Analytics (HiPC), pp. 85-94. DOI: 10.1109/HiPC56025.2022.00023. Online publication date: Dec-2022
  • (2021) nuKSM: NUMA-aware Memory De-duplication on Multi-socket Servers. Proceedings of the 30th International Conference on Parallel Architectures and Compilation Techniques, pp. 258-273. DOI: 10.1109/PACT52795.2021.00026. Online publication date: 26-Sep-2021
  • (2019) CoWLight. Proceedings of the 8th International Workshop on Hardware and Architectural Support for Security and Privacy, pp. 1-8. DOI: 10.1145/3337167.3337170. Online publication date: 23-Jun-2019
  • (2019) Augmenting Operating Systems with OpenCL Accelerators. ACM Transactions on Design Automation of Electronic Systems 24:3, pp. 1-29. DOI: 10.1145/3315569. Online publication date: 28-Mar-2019
