GPU swap-aware scheduler: virtual memory management for GPU applications

Authors:

Ya-Shu Chen

SAC '20: Proceedings of the 35th Annual ACM Symposium on Applied Computing

Pages 1222 - 1227

https://doi.org/10.1145/3341105.3373866

Published: 30 March 2020

Abstract

Graphics processing units (GPUs) are widely used for high-performance applications; however, their performance is bottlenecked by data movement arising from limited physical memory. Under memory oversubscription, the cost of swapping data in and out significantly affects application performance. In this study, we propose memory contention-aware priority assignment and virtual memory management to reduce the performance degradation caused by memory contention and memory thrashing. The proposed methodology was evaluated using a series of workloads and a case study, and impressive results were obtained.

Index Terms

1. Computer systems organization
  1. Embedded and cyber-physical systems
    1. Embedded systems
      1. Embedded software
2. Software and its engineering
  1. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        Memory management
        Virtual memory
        Process management
        Scheduling

Published In

SAC '20: Proceedings of the 35th Annual ACM Symposium on Applied Computing

March 2020

2348 pages

ISBN: 9781450368667

DOI: 10.1145/3341105

Conference Chairs: Chih-Cheng Hung (Kennesaw State University), Tomas Cerny (Baylor University)

Program Chairs: Dongwan Shin (New Mexico Tech), Alessio Bechini (University of Pisa, Italy)

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGAPP: ACM Special Interest Group on Applied Computing

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Ministry of Science and Technology, Taiwan

Conference

SAC '20

Sponsor:

SIGAPP

SAC '20: The 35th ACM/SIGAPP Symposium on Applied Computing

March 30 - April 3, 2020

Brno, Czech Republic

Acceptance Rates

Overall Acceptance Rate 1,650 of 6,669 submissions, 25%
