DOI: 10.1145/3341105.3373866

GPU swap-aware scheduler: virtual memory management for GPU applications

Published: 30 March 2020

Abstract

Graphics processing units (GPUs) are widely used for high-performance applications; however, data movement to and from their limited physical memory is a performance bottleneck. Under memory oversubscription, the cost of swapping data in and out significantly affects application performance. In this study, we propose memory contention-aware priority assignment and virtual memory management to reduce the performance degradation caused by memory contention and memory thrashing. The proposed methodology was evaluated using a series of workloads and a case study, and impressive results were obtained.

Published In

SAC '20: Proceedings of the 35th Annual ACM Symposium on Applied Computing
March 2020
2348 pages
ISBN:9781450368667
DOI:10.1145/3341105
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. gpgpu applications
  2. memory contention
  3. memory oversubscription
  4. memory thrashing
  5. virtual memory management

Qualifiers

  • Research-article

Funding Sources

  • Ministry of Science and Technology, Taiwan

Conference

SAC '20: The 35th ACM/SIGAPP Symposium on Applied Computing
March 30 - April 3, 2020
Brno, Czech Republic

Acceptance Rates

Overall Acceptance Rate 1,650 of 6,669 submissions, 25%

Bibliometrics

  • Total Citations: 0
  • Total Downloads: 291
  • Downloads (Last 12 months): 20
  • Downloads (Last 6 weeks): 6

Reflects downloads up to 09 Nov 2024
