GPUMixer: Performance-Driven Floating-Point Tuning for GPU Scientific Applications

Laguna, Ignacio; Wood, Paul C.; Singh, Ranvijay; Bagchi, Saurabh

doi:10.1007/978-3-030-20656-7_12

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11501)

Included in the following conference series:

International Conference on High Performance Computing

Abstract

We present GPUMixer, a tool to perform mixed-precision floating-point tuning on scientific GPU applications. While precision tuning techniques are available, they are designed for serial programs and are accuracy-driven, i.e., they consider configurations that satisfy accuracy constraints, but these configurations may degrade performance. GPUMixer, in contrast, presents a performance-driven approach for tuning. We introduce a novel static analysis that finds Fast Imprecise Sets (FISets), sets of operations on low precision that minimize type conversions, which often yield performance speedups. To estimate the relative error introduced by GPU mixed-precision, we propose shadow computations analysis for GPUs, the first of this class for multi-threaded applications. GPUMixer obtains performance improvements of up to $46.4\%$ of the ideal speedup in comparison to only $20.7\%$ found by state-of-the-art methods.
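The idea behind a shadow computation can be illustrated with a minimal CPU-only sketch: the same operations are carried out twice, once in low precision and once in high precision, and the two results are compared to estimate the relative error. This is not GPUMixer's implementation, which targets multi-threaded GPU code; the function names `to_f32` and `shadow_dot` are hypothetical, and FP32 is emulated here by rounding Python's FP64 values.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def shadow_dot(xs, ys):
    """Dot product in emulated FP32 with an FP64 shadow value.

    Hypothetical helper for illustration only.
    Returns (low_precision_result, shadow_result, relative_error).
    """
    lo = 0.0  # emulated FP32 accumulator
    hi = 0.0  # FP64 shadow accumulator
    for x, y in zip(xs, ys):
        # Low-precision path: round the inputs and every intermediate to FP32.
        prod = to_f32(to_f32(x) * to_f32(y))
        lo = to_f32(lo + prod)
        # Shadow path: the same operations in full FP64.
        hi += x * y
    # Relative error of the low-precision result against the shadow value;
    # fall back to the absolute error when the shadow value is zero.
    rel_err = abs(lo - hi) / abs(hi) if hi != 0.0 else abs(lo - hi)
    return lo, hi, rel_err


if __name__ == "__main__":
    lo, hi, err = shadow_dot([0.1] * 1000, [0.3] * 1000)
    print(f"FP32: {lo!r}  FP64 shadow: {hi!r}  relative error: {err:.3e}")
```

A tuning tool could compare such an error estimate against a user-supplied accuracy threshold before accepting a low-precision configuration; how GPUMixer does this in detail is described in the full paper, not in this abstract.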

This work was performed when P. C. Wood and R. Singh were at Purdue University.

Acknowledgments

We thank the anonymous reviewers for their suggestions and comments on the paper. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under contract DE-AC52-07NA27344 (LLNL-CONF-748618).

Author information

Authors and Affiliations

Lawrence Livermore National Laboratory, Livermore, CA, 94550, USA
Ignacio Laguna
Johns Hopkins Applied Physics Lab, Laurel, MD, 20723, USA
Paul C. Wood
NVIDIA Corporation, Santa Clara, CA, 95051, USA
Ranvijay Singh
Purdue University, West Lafayette, IN, 47907, USA
Saurabh Bagchi

Corresponding author

Correspondence to Ignacio Laguna.

Editor information

Editors and Affiliations

University of Edinburgh, Edinburgh, UK
Michèle Weiland
Helmholtz-Zentrum Dresden-Rossendorf (HZDR), Dresden, Germany
Guido Juckeland
Technical University of Munich, Munich, Germany
Carsten Trinitis
Ohio State University, Columbus, USA
Ponnuswamy Sadayappan

Cite this paper

Laguna, I., Wood, P.C., Singh, R., Bagchi, S. (2019). GPUMixer: Performance-Driven Floating-Point Tuning for GPU Scientific Applications. In: Weiland, M., Juckeland, G., Trinitis, C., Sadayappan, P. (eds) High Performance Computing. ISC High Performance 2019. Lecture Notes in Computer Science, vol 11501. Springer, Cham. https://doi.org/10.1007/978-3-030-20656-7_12

DOI: https://doi.org/10.1007/978-3-030-20656-7_12
Published: 17 May 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-20655-0
Online ISBN: 978-3-030-20656-7
eBook Packages: Computer Science, Computer Science (R0)
