DOI: 10.1007/978-3-642-03644-6_12
Article

Performance Optimization Strategies of High Performance Computing on GPU

Published: 21 August 2009

Abstract

GPUs have recently become widely used in scientific computing and engineering applications, owing primarily to the evolution of GPU architecture. First, we analyze in detail some key performance characteristics of the GPU, along with the relationships among GPU architecture, programming model, and memory hierarchy. Second, we present three performance optimization strategies: prefetching, streamlizing, and task division. We run experiments to characterize how different factors affect efficiency. Finally, we apply the strategies to the HPL benchmark to validate them and obtain a speedup.
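
The abstract names task division as a strategy but does not describe how work is split. As a rough illustration only, the sketch below shows one generic form of static task division: work is divided between a GPU and a CPU in proportion to their throughputs so that both finish at about the same time. The function names and the throughput figures are hypothetical and are not taken from the paper.

```python
def split_work(n_items, gpu_rate, cpu_rate):
    """Split n_items between GPU and CPU in proportion to throughput.

    gpu_rate and cpu_rate are assumed throughputs (items per second);
    in practice they would have to be measured on the target machine.
    Returns (gpu_items, cpu_items).
    """
    if n_items < 0 or gpu_rate <= 0 or cpu_rate <= 0:
        raise ValueError("n_items must be >= 0 and rates must be > 0")
    gpu_items = round(n_items * gpu_rate / (gpu_rate + cpu_rate))
    return gpu_items, n_items - gpu_items


def makespan(gpu_items, cpu_items, gpu_rate, cpu_rate):
    """Time until both devices are done when they run concurrently."""
    return max(gpu_items / gpu_rate, cpu_items / cpu_rate)


if __name__ == "__main__":
    # Hypothetical rates: the GPU is assumed to be 3x faster than the CPU.
    gpu_part, cpu_part = split_work(1000, gpu_rate=300.0, cpu_rate=100.0)
    print(gpu_part, cpu_part)
    print(makespan(gpu_part, cpu_part, 300.0, 100.0))
```

With these assumed rates, the balanced split finishes sooner than assigning all items to the GPU alone, which is the basic motivation for dividing work rather than offloading everything.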

Cited By

  • (2023) Optimization Techniques for GPU Programming. ACM Computing Surveys 55:11 (1-81). DOI: 10.1145/3570638. Online publication date: 16-Mar-2023

Published In

APPT '09: Proceedings of the 8th International Symposium on Advanced Parallel Processing Technologies
August 2009
476 pages
ISBN: 9783642036439
Editors: Yong Dou, Ralf Gruber, Josef M. Joller

Publisher

Springer-Verlag

Berlin, Heidelberg

Author Tags

  1. GPGPU
  2. HPL Benchmark
  3. Optimization Strategy
  4. Stream Computing
  5. Task Division
