research-article

A Performance Study of CUDA UVM versus Manual Optimizations in a Real-World Setup: Application to a Monte Carlo Wave-Particle Event-Based Interaction Model

Published: 01 June 2016

Abstract

The performance of a Monte Carlo model for the simulation of electromagnetic wave propagation in particle-filled atmospheres has been evaluated for different CUDA versions and design approaches. The proposed algorithm exhibits a high degree of parallelism, which makes it well suited to implementation on a GPU. Practical implementation aspects of the model, such as the use of the different types of memory present in a GPU, have also been explained and their impact assessed. A number of setups have been chosen in order to compare the performance of manually optimized and Unified Virtual Memory (UVM) implementations across CUDA versions. The features and relative performance impact of the different options are discussed, and practical hints and rules for speeding up CUDA programs are extracted.

Published In

IEEE Transactions on Parallel and Distributed Systems, Volume 27, Issue 6
June 2016
311 pages

Publisher

IEEE Press

Cited By

  • (2024) Shared Virtual Memory: Its Design and Performance Implications for Diverse Applications. Proceedings of the 38th ACM International Conference on Supercomputing, pp. 26-37. DOI: 10.1145/3650200.3656608. Online publication date: 30-May-2024
  • (2023) Fine-grain Quantitative Analysis of Demand Paging in Unified Virtual Memory. ACM Transactions on Architecture and Code Optimization, 21:1, pp. 1-24. DOI: 10.1145/3632953. Online publication date: 14-Nov-2023
  • (2023) Optimization Techniques for GPU Programming. ACM Computing Surveys, 55:11, pp. 1-81. DOI: 10.1145/3570638. Online publication date: 16-Mar-2023
  • (2021) In-depth analyses of unified virtual memory system for GPU accelerated computing. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-15. DOI: 10.1145/3458817.3480855. Online publication date: 14-Nov-2021
