Catalyst: GPU-assisted rapid memory deduplication in virtualization environments

Authors:

Debadatta Mishra,

Purushottam Kulkarni

ACM SIGPLAN Notices, Volume 52, Issue 7

Pages 44 - 59

https://doi.org/10.1145/3140607.3050760

Published: 08 April 2017

Abstract

Content-based page sharing techniques improve memory efficiency in virtualized systems by identifying and merging identical pages. Kernel Same-page Merging (KSM), a Linux kernel utility for page sharing, sequentially scans the memory pages of virtual machines to deduplicate them. Sequential scanning has several undesirable side effects: CPU cycles are wasted when no sharing opportunities exist, and the rate at which sharing is discovered depends on the scanning rate and the corresponding CPU availability. In this work, we exploit the presence of GPUs on modern systems to enable rapid memory sharing through targeted scanning of pages. Our solution, Catalyst, works in two phases: in the first, the GPU processes the pages of virtual machines to identify likely candidates for sharing; in the second, page-level similarity checks are performed on this targeted set of shareable pages. Opportunistic use of the GPU to produce sharing hints enables rapid, low-overhead duplicate detection and sharing of memory pages in virtualization environments. We evaluate Catalyst against various benchmarks and workloads to demonstrate that it can achieve higher memory sharing in less time than different scan rate configurations of KSM, at lower or comparable compute costs.
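
The two-phase idea in the abstract can be illustrated with a small CPU-only sketch. It is not Catalyst's implementation: the abstract does not say how the GPU derives its hints, so a CRC32 checksum is assumed here as a stand-in for the hint-generation pass, and the names `sharing_hints`, `merge_candidates`, `PAGE_SIZE`, and the `vm:page` identifiers are hypothetical.

```python
import zlib
from collections import defaultdict

PAGE_SIZE = 4096  # assumed page size in bytes


def sharing_hints(pages):
    """Phase 1 (CPU stand-in for the GPU pass): group page ids by a cheap checksum."""
    buckets = defaultdict(list)
    for page_id, data in pages.items():
        buckets[zlib.crc32(data)].append(page_id)
    # Only buckets holding at least two pages are worth a full comparison.
    return [ids for ids in buckets.values() if len(ids) > 1]


def merge_candidates(pages, hints):
    """Phase 2: byte-wise comparison restricted to the hinted pages."""
    mapping = {}  # duplicate page id -> canonical page id
    for ids in hints:
        canonical = []  # pages with distinct content seen so far in this bucket
        for page_id in ids:
            for rep in canonical:
                if pages[page_id] == pages[rep]:
                    mapping[page_id] = rep
                    break
            else:
                # No identical representative found: this page becomes one.
                canonical.append(page_id)
    return mapping


# Hypothetical guest memory: two VMs that each hold one zero-filled page.
pages = {
    "vm1:0": bytes(PAGE_SIZE),
    "vm1:1": b"\x01" * PAGE_SIZE,
    "vm2:0": bytes(PAGE_SIZE),
    "vm2:1": b"\x02" * PAGE_SIZE,
}
hints = sharing_hints(pages)
shared = merge_candidates(pages, hints)
print(shared)
```

Because the full comparison in the second function runs only on hinted pages, a checksum collision costs one extra comparison but can never cause two different pages to be merged.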

Published In

ACM SIGPLAN Notices Volume 52, Issue 7

VEE '17

July 2017

256 pages

ISSN:0362-1340

EISSN:1558-1160

DOI:10.1145/3140607

Editor:
Matthew Fluet

VEE '17: Proceedings of the 13th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments
April 2017
261 pages
ISBN:9781450349482
DOI:10.1145/3050748

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Tutorial
Research
Refereed limited
