Abstract
We present GPUMixer, a tool to perform mixed-precision floating-point tuning on scientific GPU applications. While precision tuning techniques are available, they are designed for serial programs and are accuracy-driven, i.e., they consider configurations that satisfy accuracy constraints, but these configurations may degrade performance. GPUMixer, in contrast, presents a performance-driven approach for tuning. We introduce a novel static analysis that finds Fast Imprecise Sets (FISets), sets of operations on low precision that minimize type conversions, which often yield performance speedups. To estimate the relative error introduced by GPU mixed-precision, we propose shadow computations analysis for GPUs, the first of this class for multi-threaded applications. GPUMixer obtains performance improvements of up to \(46.4\%\) of the ideal speedup in comparison to only \(20.7\%\) found by state-of-the-art methods.
This work was performed when P. C. Wood and R. Singh wereat Purdue University.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
CoMD-CUDA (2017). https://github.com/NVIDIA/CoMD-CUDA
Che, S., et al.: Rodinia: a benchmark suite for heterogeneous computing. In: IEEE International Symposium on Workload Characterization (IISWC 2009), pp. 44–54. IEEE (2009)
Chiang, W.F., Baranowski, M., Briggs, I., Solovyev, A., Gopalakrishnan, G., Rakamarić, Z.: Rigorous floating-point mixed-precision tuning. In: 44th ACM SIGPLAN Symposium on Principles of Programming Languages, POPL 2017. Association for Computing Machinery (2017)
Chiang, W.-F., Gopalakrishnan, G., Rakamaric, Z., Solovyev, A.: Efficient search for inputs causing high floating-point errors. In: Proceedings of the 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2014, pp. 43–52. ACM, New York (2014)
Damouche, N., Martel, M., Chapoutot, A.: Intra-procedural optimization of the numerical accuracy of programs. In: Núñez, M., Güdemann, M. (eds.) FMICS 2015. LNCS, vol. 9128, pp. 31–46. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-19458-5_3
Darulova, E., Kuncak, V.: Towards a compiler for reals. ACM Trans. Program. Lang. Syst. (TOPLAS) 39(2), 8 (2017)
Guo, H., Rubio-González, C.: Exploiting community structure for floating-point precision tuning. In: Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis, pp. 333–343. ACM (2018)
Harris, M.: Mini-nbody: a simple N-body code (2014). https://github.com/harrism/mini-nbody
Iskhodzhanov, T., Potapenko, A., Samsonov, A., Serebryany, K., Stepanov, E., Vyukov, D.: ThreadSanitizer, MemorySanitizer, 8 November 2012. https://urldefense.proofpoint.com/v2/url?u=http-3A__www.llvm.org_devmtg_2012-2D11_Serebryany-5FTSan-2DMSan.pdf&d=DwIF-g&c=vh6FgFnduejNhPPD0fl_yRaSfZy8CWbWnIf4XJhSqx8&r=UyK1_569d50MjVlUSODJYRW2epEY0RveVNq0YCmePcDz4DQHW-CkWcttrwneZ0md&m=QbB1B0a55LgDuuwoFrE3U3GhMpMGOKghlpBLKQdmd1A&s=XadD1efiG2KOXnZcaadrIMuS10vDECEVJu__wnFtYQU&e=
Karlin, I., Keasler, J., Neely, R.: Lulesh 2.0 updates and changes. Technical report LLNL-TR-641973, August 2013
Lam, M.O., Hollingsworth, J.K.: Fine-grained floating-point precision analysis. Int. J. High Perform. Comput. Appl. 32, 231 (2016). 1094342016652462
Lam, M.O., Hollingsworth, J.K., de Supinski, B.R., LeGendre, M.P.: Automatically adapting programs for mixed-precision floating-point computation. In: Proceedings of the 27th International ACM Conference on Supercomputing, pp. 369–378. ACM (2013)
Lam, M.O., Rountree, B.L.: Floating-point shadow value analysis. In: Proceedings of the 5th Workshop on Extreme-Scale Programming Tools, pp. 18–25. IEEE Press (2016)
Lattner, C., Adve, V.: LLVM: a compilation framework for lifelong program analysis & transformation. In: Proceedings of the International Symposium on Code Generation and Optimization: Feedback-Directed and Runtime Optimization, p. 75. IEEE Computer Society (2004)
Luk, C.-K., et al.: Pin: building customized program analysis tools with dynamic instrumentation. ACM SIGPLAN Not. 40, 190–200 (2005)
Menon, H., et al.: ADAPT: algorithmic differentiation applied to floating-point precision tuning. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, p. 48. IEEE Press (2018)
NDIDIA. CUDA ToolKit Documentation - NVVM IR Specification 1.5 (2018). https://docs.nvidia.com/cuda/nvvm-ir-spec/index.html
Nguyen, H.: GPU Gems 3, pp. 677–694. Addison-Wesley Professional, Reading (2007). chapter 31
Nvidia. Nvidia Tesla P100 GPU. Pascal Architecture White Paper (2016)
Nvidia. CUDA C Programming Guide, v9.0 (2018). http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html
Paganelli, G., Ahrendt, W.: Verifying (in-) stability in floating-point programs by increasing precision, using SMT solving. In: 2013 15th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC), pp. 209–216. IEEE (2013)
Rubio-González, C., et al.: Floating-point precision tuning using blame analysis. In: Proceedings of the 38th International Conference on Software Engineering, ICSE 2016, pp. 1074–1085. ACM, New York (2016)
Rubio-González, C., et al.: Precimonious: tuning assistant for floating-point precision. In: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, p. 27. ACM (2013)
Acknowledgments
We thank the anonymous reviewers for their suggestions and comments on the paper. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under contract DEAC52-07NA27344 (LLNL-CONF-748618).
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2019 This is a U.S. government work and not under copyright protection in the United States; foreign copyright protection may apply
About this paper
Cite this paper
Laguna, I., Wood, P.C., Singh, R., Bagchi, S. (2019). GPUMixer: Performance-Driven Floating-Point Tuning for GPU Scientific Applications. In: Weiland, M., Juckeland, G., Trinitis, C., Sadayappan, P. (eds) High Performance Computing. ISC High Performance 2019. Lecture Notes in Computer Science(), vol 11501. Springer, Cham. https://doi.org/10.1007/978-3-030-20656-7_12
Download citation
DOI: https://doi.org/10.1007/978-3-030-20656-7_12
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-20655-0
Online ISBN: 978-3-030-20656-7
eBook Packages: Computer ScienceComputer Science (R0)