research-article

A Performance Study of CUDA UVM versus Manual Optimizations in a Real-World Setup: Application to a Monte Carlo Wave-Particle Event-Based Interaction Model

Authors:

Jose M. Nadal-Serrano,

Marisa Lopez-Vallejo

IEEE Transactions on Parallel and Distributed Systems, Volume 27, Issue 6

Pages 1579–1588

https://doi.org/10.1109/TPDS.2015.2463813

Published: 01 June 2016

Abstract

A performance study of a Monte Carlo model for the simulation of electromagnetic wave propagation in particle-filled atmospheres has been conducted for different CUDA versions and design approaches. The proposed algorithm exhibits a high degree of parallelism, which makes it well suited to GPU implementation. Practical implementation aspects of the model, such as the use of the different types of memory present in a GPU, have also been explained and their impact assessed. A number of setups have been chosen to compare the performance of manually optimized versus Unified Virtual Memory (UVM) implementations across different CUDA versions. The features and relative performance impact of the different options have been discussed, and practical hints and rules useful for speeding up CUDA programs have been extracted.
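The abstract does not spell out the model itself. As a hedged illustration of why photon-based Monte Carlo transport parallelizes so naturally, the sketch below assumes a homogeneous particle-filled slab in which free path lengths between interactions follow an exponential distribution (the Beer-Lambert law), and estimates the fraction of photons that cross the slab without interacting. The function and parameter names (`sample_free_path`, `ballistic_transmission`, `mu_ext`) and the numeric values are hypothetical; this is a CPU sketch, not the authors' event-based model or their CUDA code.

```python
import math
import random


def sample_free_path(mu_ext: float, rng: random.Random) -> float:
    """Sample a free path length from p(s) = mu_ext * exp(-mu_ext * s)
    by inverse-transform sampling (result in units of 1 / mu_ext)."""
    u = rng.random()  # uniform in [0, 1)
    return -math.log(1.0 - u) / mu_ext  # 1 - u lies in (0, 1], so the log is finite


def ballistic_transmission(mu_ext: float, thickness: float,
                           n_photons: int, seed: int = 0) -> float:
    """Estimate the fraction of photons that cross a homogeneous slab of the
    given thickness without a single interaction with the particles."""
    rng = random.Random(seed)  # one shared stream here; a GPU kernel would keep one per thread
    crossed = 0
    for _ in range(n_photons):
        # Photon histories do not depend on each other, so this loop body is
        # what one GPU thread would execute for one photon.
        if sample_free_path(mu_ext, rng) > thickness:
            crossed += 1
    return crossed / n_photons


if __name__ == "__main__":
    mu_ext = 0.5     # hypothetical extinction coefficient, 1/m
    thickness = 2.0  # hypothetical slab thickness, m
    estimate = ballistic_transmission(mu_ext, thickness, 200_000)
    exact = math.exp(-mu_ext * thickness)  # Beer-Lambert: exp(-mu_ext * thickness)
    print(f"Monte Carlo: {estimate:.4f}  Beer-Lambert: {exact:.4f}")
```

Apart from the shared random-number stream used here for simplicity, each photon history is independent of the others, so the loop body maps to one GPU thread per photon; a CUDA version would typically give each thread its own generator state, for example with cuRAND [18]. Where such simulation data lives (managed UVM allocations versus explicitly copied device buffers) is the kind of design choice the paper compares.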

References

[1]

J. Nickolls and W. J. Dally, “The GPU computing era,” IEEE Micro, vol. 30, no. 2, pp. 56–69, Mar./Apr. 2010.

[2]

J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J. C. Phillips, “GPU computing,” Proc. IEEE, vol. 96, no. 5, pp. 879–899, May 2008.

[3]

L. Dematté and D. Prandi, “GPU computing for systems biology,” Briefings Bioinformat., vol. 11, pp. 323–333, 2010.

[4]

W. Liu, B. Schmidt, G. Voss, and W. Müller-Wittig, “Molecular dynamics simulations on commodity GPUs with CUDA,” in Proc. 14th Int. Conf. High Perform. Comput., 2007, pp. 185–196.

[5]

M. Harvey and G. De Fabritiis, “A survey of computational molecular science using graphics processing units,” Wiley Interdisciplinary Rev.: Comput. Molecular Sci., vol. 2, no. 5, pp. 734–742, 2012.

[6]

P. Richmond, “From biological cells to populations of individuals: Complex systems simulations with CUDA,” in Proc. GPU Technol. Conf., no. ID S5133, Mar. 2015.

[7]

S. Che, M. Boyer, J. Meng, D. Tarjan, J. W. Sheaffer, and K. Skadron, “A performance study of general-purpose applications on graphics processors using CUDA,” J. Parallel Distrib. Comput., vol. 68, no. 10, pp. 1370–1380, 2008.

[8]

P. Micikevicius and D. Tarjan, “The art of performance tuning for CUDA and manycore architectures,” Birds-of-a-feather session at SC'09 [Online]. Available: http://www.cs.virginia.edu/ skadron/Papers/cuda_tuning_bof_sc09_final.pdf, (2009).

[9]

S. Ryoo, C. I. Rodrigues, S. S. Baghsorkhi, S. S. Stone, D. B. Kirk, and W.-m. W. Hwu, “Optimization principles and application performance evaluation of a multithreaded GPU using CUDA,” in Proc. 13th ACM Symp. Principles Practice Parallel Program., 2008, pp. 73–82.

[10]

R. Duarte, R. Sendag, and F. J. Vetter, “On the performance and energy-efficiency of multi-core SIMD CPUs and CUDA-enabled GPUs,” in Proc. IEEE Int. Symp. Workload Characterization, 2013, pp. 174–184.

[11]

B. Jang, D. Schaa, P. Mistry, and D. Kaeli, “Exploiting memory access patterns to improve memory performance in data-parallel architectures,” IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 1, pp. 105–118, Jan. 2011.

[12]

Y. Kim and A. Shrivastava, “Memory performance estimation of CUDA programs,” ACM Trans. Embedded Comput. Syst., vol. 13, no. 2, p. 21, 2013.

[13]

M. Wezowicz, T. Estrada, S. Patel, and M. Taufer, “Performance dissection of a molecular dynamics code across CUDA and GPU generations,” in Proc. IEEE 27th Int. Parallel Distrib. Process. Symp. Workshops PhD Forum, 2013, pp. 1355–1364.

[14]

J. M. Nadal-Serrano and M. Lopez-Vallejo, “A time-resolved Monte Carlo smoke model for use at optical and infrared frequencies,” Fire Safety J., vol. 71, pp. 299–309, 2015.

[15]

B. Roysam, A. Cohen, P. Getto, and P. Boyce, “A numerical approach to the computation of light propagation through turbid media: Application to the evaluation of lighted exit signs,” IEEE Trans. Ind. Appl., vol. 29, no. 3, pp. 661–669, May/Jun. 1993.

[16]

L. Devroye, “Random variate generation in one line of code,” in Proc. 28th Conf. Winter Simul., 1996, pp. 265–272.

[17]

NVIDIA CUDA developer zone. (2013, Jul. 2). webpage [Online]. Available: https://developer.nvidia.com/category/zone/cuda-zone

[18]

CURAND - CUDA Toolkit Documentation. (2013, Jul.). v. 5.5 ed., NVIDIA [Online]. Available: http://docs.nvidia.com/cuda/curand/index.html

[19]

NVIDIA Corporation, “Advanced CUDA webinar - memory optimization,” [Online]. Available: http://ondemand.gputechconf.com/gtcexpress/2011/presentations/NVIDIA_GPU_Computing_Webinars_CUDA_Memory_Optimization.pdf, 2011.

[20]

Using shared memory in CUDA C/C++. (2014, Jun. 27) [Online]. Available: http://devblogs.nvidia.com/parallelforall/using-shared-memory-cuda-cc/

[21]

CUDA occupancy calculator. (2015, Jul. 07) [Online]. Available: http://developer.download.nvidia.com/compute/cuda/CUDA_Occupancy_calculator.xls

Index Terms

1. Computing methodologies
2. Software and its engineering
  1. Software notations and tools
    1. General programming languages
      1. Language types
        Parallel programming languages

Index terms have been assigned to the content through auto-classification.

Information

Published In

IEEE Transactions on Parallel and Distributed Systems Volume 27, Issue 6

June 2016

311 pages

ISSN:1045-9219

Copyright © 2015.

Publisher

IEEE Press

Article metrics: 4 total citations; 0 total downloads (0 in the last 12 months, 0 in the last 6 weeks; downloads reflected up to 10 Nov 2024).

Cited By

Cooper B, Scogland T, Ge R (2024). Shared Virtual Memory: Its Design and Performance Implications for Diverse Applications. Proceedings of the 38th ACM International Conference on Supercomputing, pp. 26–37. doi:10.1145/3650200.3656608. Online publication date: 30-May-2024.
https://dl.acm.org/doi/10.1145/3650200.3656608

Allen T, Cooper B, Ge R (2023). Fine-grain Quantitative Analysis of Demand Paging in Unified Virtual Memory. ACM Transactions on Architecture and Code Optimization, 21(1), pp. 1–24. doi:10.1145/3632953. Online publication date: 14-Nov-2023.
https://dl.acm.org/doi/10.1145/3632953

Hijma P, Heldens S, Sclocco A, van Werkhoven B, Bal H (2023). Optimization Techniques for GPU Programming. ACM Computing Surveys, 55(11), pp. 1–81. doi:10.1145/3570638. Online publication date: 16-Mar-2023.
https://dl.acm.org/doi/10.1145/3570638

Allen T, Ge R, de Supinski B, Hall M, Gamblin T (2021). In-depth analyses of unified virtual memory system for GPU accelerated computing. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1–15. doi:10.1145/3458817.3480855. Online publication date: 14-Nov-2021.
https://dl.acm.org/doi/10.1145/3458817.3480855