DOI: 10.1145/2591971.2592002
research-article

GDM: device memory management for GPGPU computing

Published: 16 June 2014

Abstract

GPGPUs are evolving from dedicated accelerators towards mainstream commodity computing resources. During this transition, the lack of system-level management of device memory space on GPGPUs has become a major hurdle. In existing GPGPU systems, device memory space is still managed explicitly by individual applications, which not only increases the burden on programmers but can also cause application crashes, hangs, or poor performance.
In this paper, we present the design and implementation of GDM, a fully functional GPGPU device memory manager that addresses these problems and unleashes the computing power of GPGPUs in general-purpose environments. To effectively coordinate the device memory usage of different applications, GDM takes control of device memory allocations and of data transfers to and from device memory, leveraging a buffer allocated in each application's virtual memory. GDM exploits the unique features of GPGPU systems and relies on several effective optimization techniques to ensure efficient use of device memory space and to achieve high performance.
We have evaluated GDM and compared it against state-of-the-art GPGPU system software on a range of workloads. The results show that GDM can prevent application crashes, including those induced by device memory leaks, and improve system performance by up to 43%.
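The abstract describes a manager that takes control of device memory allocations and transfers and backs each allocation with a buffer in the owning application's virtual memory. The Python toy model below sketches that general idea under loudly stated assumptions: it is not GDM's implementation, device memory is simulated as a byte budget, and every name (`ToyDeviceMemoryManager`, `malloc`, `launch`, `on_app_exit`) is hypothetical. Device space is committed lazily at kernel launch, and when the device is full, least-recently-used regions are evicted with write-back of modified data; plain LRU is an illustrative choice, not the policy from the paper.

```python
from collections import OrderedDict
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Region:
    app: str             # owning application
    host: bytearray      # backing buffer in the owner's virtual memory
    dirty: bool = False  # device copy is newer than the host buffer


class ToyDeviceMemoryManager:
    """Hypothetical toy model of system-managed device memory (not GDM)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity  # simulated device memory, in bytes
        self.used = 0
        self.regions: Dict[int, Region] = {}
        # Device-resident copies, least recently used first.
        self.device: "OrderedDict[int, bytearray]" = OrderedDict()
        self.next_handle = 1
        self.transfers = 0  # manager-initiated host<->device copies

    def malloc(self, app: str, size: int) -> int:
        # Only the host buffer is allocated here; device space is committed
        # lazily at launch time, so oversubscribing the device does not fail.
        if size > self.capacity:
            raise MemoryError("region larger than device memory")
        h = self.next_handle
        self.next_handle += 1
        self.regions[h] = Region(app, bytearray(size))
        return h

    def memcpy_to_device(self, h: int, data: bytes) -> None:
        r = self.regions[h]
        if len(data) > len(r.host):
            raise ValueError("copy larger than region")
        # Write to the authoritative copy: device if resident, else host.
        if h in self.device:
            self.device[h][: len(data)] = data
            r.dirty = True
        else:
            r.host[: len(data)] = data

    def memcpy_to_host(self, h: int) -> bytes:
        return bytes(self.device[h] if h in self.device else self.regions[h].host)

    def _evict(self, h: int) -> None:
        r = self.regions[h]
        buf = self.device.pop(h)
        if r.dirty:  # write back only data the device has modified
            r.host[:] = buf
            r.dirty = False
            self.transfers += 1
        self.used -= len(buf)

    def _make_resident(self, handles: List[int]) -> None:
        wanted = set(handles)
        if sum(len(self.regions[h].host) for h in wanted) > self.capacity:
            raise MemoryError("kernel working set exceeds device memory")
        needed = sum(len(self.regions[h].host) for h in wanted if h not in self.device)
        for victim in [v for v in self.device if v not in wanted]:  # LRU order
            if self.used + needed <= self.capacity:
                break
            self._evict(victim)
        for h in handles:
            if h not in self.device:  # upload from the host buffer
                self.device[h] = bytearray(self.regions[h].host)
                self.used += len(self.device[h])
                self.transfers += 1
            self.device.move_to_end(h)  # mark as most recently used

    def launch(self, handles: List[int], kernel: Callable[..., None]) -> None:
        self._make_resident(handles)
        kernel(*(self.device[h] for h in handles))
        for h in handles:
            self.regions[h].dirty = True  # conservatively assume the kernel wrote

    def free(self, h: int) -> None:
        if h in self.device:
            self.used -= len(self.device.pop(h))
        del self.regions[h]

    def on_app_exit(self, app: str) -> None:
        # Reclaim regions the application never freed (e.g. leaks).
        for h in [h for h, r in self.regions.items() if r.app == app]:
            self.free(h)


def increment(buf: bytearray) -> None:
    for i in range(len(buf)):
        buf[i] += 1


mgr = ToyDeviceMemoryManager(capacity=8)
a = mgr.malloc("app1", 6)
b = mgr.malloc("app2", 6)      # 12 bytes requested on an 8-byte device
mgr.memcpy_to_device(a, b"\x01" * 6)
mgr.launch([a], increment)     # uploads a
mgr.launch([b], increment)     # evicts a (writing it back), then uploads b
print(mgr.memcpy_to_host(a))   # b'\x02\x02\x02\x02\x02\x02'
mgr.on_app_exit("app2")        # app2 never freed b; the manager reclaims it
print(mgr.used)                # 0
```

Deferring device allocation to launch time is what lets the oversubscribed requests above succeed, and reclaiming regions at application exit is one simple way to keep a leaking application from starving others. The optimizations the paper relies on to reduce transfer cost are not modeled here.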



Information

Published In

SIGMETRICS '14: The 2014 ACM international conference on Measurement and modeling of computer systems
June 2014
614 pages
ISBN:9781450327893
DOI:10.1145/2591971
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. gpu
  2. memory management
  3. operating system

Qualifiers

  • Research-article

Conference

SIGMETRICS '14

Acceptance Rates

SIGMETRICS '14 Paper Acceptance Rate 40 of 237 submissions, 17%;
Overall Acceptance Rate 459 of 2,691 submissions, 17%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 9
  • Downloads (last 6 weeks): 0
Reflects downloads up to 09 Nov 2024

Cited By

  • (2024) gLSM: Using GPGPU to Accelerate Compactions in LSM-tree-based Key-value Stores. ACM Transactions on Storage 20:1 (1-41). DOI: 10.1145/3633782. Online publication date: 30-Jan-2024
  • (2018) DRAGON. Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (1-13). DOI: 10.5555/3291656.3291699. Online publication date: 11-Nov-2018
  • (2018) DRAGON. Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (1-13). DOI: 10.1109/SC.2018.00035. Online publication date: 11-Nov-2018
  • (2017) EffiSha. ACM SIGPLAN Notices 52:8 (3-16). DOI: 10.1145/3155284.3018748. Online publication date: 26-Jan-2017
  • (2017) Efficient exception handling support for GPUs. Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture (109-122). DOI: 10.1145/3123939.3123950. Online publication date: 14-Oct-2017
  • (2017) FLEP. ACM SIGARCH Computer Architecture News 45:1 (483-496). DOI: 10.1145/3093337.3037742. Online publication date: 4-Apr-2017
  • (2017) FLEP. ACM SIGPLAN Notices 52:4 (483-496). DOI: 10.1145/3093336.3037742. Online publication date: 4-Apr-2017
  • (2017) FLEP. ACM SIGOPS Operating Systems Review 51:2 (483-496). DOI: 10.1145/3093315.3037742. Online publication date: 4-Apr-2017
  • (2017) Using High Level GPU Tasks to Explore Memory and Communications Options on Heterogeneous Platforms. Proceedings of the 2017 Workshop on Software Engineering Methods for Parallel and High Performance Applications (21-28). DOI: 10.1145/3085158.3086160. Online publication date: 26-Jun-2017
  • (2017) GPrioSwap. Proceedings of the 10th ACM International Systems and Storage Conference (1-10). DOI: 10.1145/3078468.3078474. Online publication date: 22-May-2017
