research-article

Supporting extended precision on graphics processors

Authors:

Qiong Luo

DaMoN '10: Proceedings of the Sixth International Workshop on Data Management on New Hardware

Pages 19 - 26

https://doi.org/10.1145/1869389.1869392

Published: 07 June 2010

Abstract

Scientific computing applications often require support for non-traditional data types, for example, numbers with higher precision than 64-bit floats. As graphics processors, or GPUs, have emerged as powerful accelerators for scientific computing, we design and implement a GPU-based extended precision library that enables applications with high precision requirements to run on the GPU. Our library contains arithmetic operators, mathematical functions, and data-parallel primitives, each of which can operate at either multi-term or multi-digit precision. Multi-term precision maintains an accuracy of up to 212 bits of significand, whereas multi-digit precision allows an arbitrary number of bits. Additionally, we have integrated the extended precision algorithms into a GPU-based query processing engine to support efficient query processing with extended precision on GPUs. To demonstrate the usage of our library, we have implemented three applications: parallel summation in climate modeling, Newton's method used in nonlinear physics, and high-precision numerical integration in experimental mathematics. The GPU-based implementations are up to an order of magnitude faster than their optimized, quad-core CPU-based counterparts and achieve the same accuracy.
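
As a rough illustration of the multi-term idea mentioned in the abstract, a value can be stored as an unevaluated sum of ordinary 64-bit floats, each contributing 53 bits of significand (so four terms would give the 212 bits quoted above). The sketch below is not the library's GPU code: it is a minimal CPU-side Python example, under the assumption of IEEE 754 double arithmetic, using Knuth's error-free two-sum and a simplified two-term (double-double) addition. All function names are hypothetical and chosen only for this example.

```python
from fractions import Fraction
from typing import Iterable, Tuple

DoubleDouble = Tuple[float, float]  # (hi, lo): the value represented is hi + lo


def two_sum(a: float, b: float) -> Tuple[float, float]:
    """Knuth's two-sum: return (s, e) with s = fl(a + b) and a + b == s + e exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


def dd_add(x: DoubleDouble, y: DoubleDouble) -> DoubleDouble:
    """Simplified double-double addition (the low-order terms are added with rounding)."""
    s, e = two_sum(x[0], y[0])
    e += x[1] + y[1]
    # Renormalize so that lo is again small relative to hi.
    return two_sum(s, e)


def dd_sum(values: Iterable[float]) -> DoubleDouble:
    """Sequentially accumulate ordinary floats into a double-double total."""
    acc: DoubleDouble = (0.0, 0.0)
    for v in values:
        acc = dd_add(acc, (v, 0.0))
    return acc


def naive_sum(values: Iterable[float]) -> float:
    """Plain left-to-right float summation, for comparison."""
    total = 0.0
    for v in values:
        total += v
    return total


def dd_to_fraction(x: DoubleDouble) -> Fraction:
    """Exact rational value of a double-double, useful for checking accuracy."""
    return Fraction(x[0]) + Fraction(x[1])
```

Summing one large value followed by many tiny ones shows the difference: `naive_sum` drops every addend that falls below half a unit in the last place of the running total, while `dd_sum` keeps those contributions in the low-order term. A data-parallel version would apply the same `dd_add` step inside a reduction tree instead of a sequential loop.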

Index Terms

Supporting extended precision on graphics processors
1. Computing methodologies
  1. Computer graphics
    1. Graphics systems and interfaces
      1. Graphics processors
2. Hardware
  1. Hardware validation
  2. Integrated circuits
    1. Logic circuits
      1. Arithmetic and datapath circuits

Published In

DaMoN '10: Proceedings of the Sixth International Workshop on Data Management on New Hardware

June 2010

56 pages

ISBN:9781450301893

DOI:10.1145/1869389

Copyright © 2010 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMOD: ACM Special Interest Group on Management of Data

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Research Grants Council, University Grants Committee, Hong Kong

Conference

SIGMOD/PODS '10

Sponsor:

SIGMOD

SIGMOD/PODS '10: International Conference on Management of Data

June 7, 2010

Indianapolis, Indiana

Acceptance Rates

Overall Acceptance Rate 94 of 127 submissions, 74%

Citations

Cited By

Kozicki J (2023). Very accurate time propagation of coupled Schrödinger equations for femto- and attosecond physics and chemistry, with C++ source code. Computer Physics Communications 291 (108839). Online publication date: Oct-2023.
https://doi.org/10.1016/j.cpc.2023.108839
Miao D, Laguna I, Rubio-González C (2023). Expression Isolation of Compiler-Induced Numerical Inconsistencies in Heterogeneous Code. High Performance Computing (381-401). Online publication date: 10-May-2023.
https://doi.org/10.1007/978-3-031-32041-5_20
de Fine Licht J, Pattison C, Ziogas A, Simmons-Duffin D, Hoefler T (2022). Fast Arbitrary Precision Floating Point on FPGA. 2022 IEEE 30th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) (1-9). Online publication date: 15-May-2022.
https://doi.org/10.1109/FCCM53951.2022.9786219
Utsugiri T, Kouya T (2022). Acceleration of Matrix Multiplication Based on Triple-Double (TD), and Triple-Single (TS) Precision Arithmetic. Computational Science and Its Applications – ICCSA 2022 Workshops (406-423). Online publication date: 4-Aug-2022.
https://doi.org/10.1007/978-3-031-10562-3_29
Isupov K, Babeshko I (2022). JAD-Based SpMV Kernels Using Multiple-Precision Libraries for GPUs. Supercomputing (148-161). Online publication date: 3-Jan-2022.
https://doi.org/10.1007/978-3-030-92864-3_12
Verschelde J (2021). Accelerated Polynomial Evaluation and Differentiation at Power Series in Multiple Double Precision. 2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) (740-749). Online publication date: Jun-2021.
https://doi.org/10.1109/IPDPSW52791.2021.00111
Kozicki J, Gladky A, Thoeni K (2021). Implementation of high-precision computation capabilities into the open-source dynamic simulation framework YADE. Computer Physics Communications (108167). Online publication date: Sep-2021.
https://doi.org/10.1016/j.cpc.2021.108167
Isupov K, Knyazkov V, Babeshko I, Krutikov A (2021). Computing the Sparse Matrix-Vector Product in High-Precision Arithmetic for GPU Architectures. Mathematical Modeling and Supercomputer Technologies (334-345). Online publication date: 24-Jun-2021.
https://doi.org/10.1007/978-3-030-78759-2_28
Isupov K, Knyazkov V (2020). Multiple-precision matrix-vector multiplication on graphics processing units. Program Systems: Theory and Applications (Программные системы: теория и приложения) 11:3 (61-84). Online publication date: 2020.
https://doi.org/10.25209/2079-3316-2020-11-3-61-84
Исупов К, Князьков В (2020). Multiple-precision matrix-vector multiplication on graphics processing units. Program Systems: Theory and Applications (Программные системы: теория и приложения) 11:3 (33-59). Online publication date: 2020.
https://doi.org/10.25209/2079-3316-2020-11-3-33-59