Abstract
Numerical reproducibility failures appear in massively parallel floating-point computations. One way to guarantee this reproducibility is to extend the IEEE-754 correct rounding to larger computing sequences, e.g. to the BLAS. Is the extra cost for numerical reproducibility acceptable in practice? We present solutions and experiments for the level 1 BLAS and we conclude about their efficiency.
Similar content being viewed by others
References
IEEE 754–2008, Standard for Floating-Point Arithmetic. Institute of Electrical and Electronics Engineers, New York (2008)
Bohlender, G.: Floating-point computation of functions with maximum accuracy. IEEE Trans. Comput. C-26(7), 621–632 (1977)
Chohra, C., Langlois, P., Parello, D.: Implementation and Efficiency of Reproducible Level 1 BLAS (2015). http://hal-lirmm.ccsd.cnrs.fr/lirmm-01179986
Collange, C., Defour, D., Graillat, S., Iakimchuk, R.: Reproducible and accurate matrix multiplication in ExBLAS for high-performance computing. In: SCAN 2014, Würzburg, Germany (2014)
Dekker, T.J.: A floating-point technique for extending the available precision. Numer. Math. 18, 224–242 (1971)
Demmel, J.W., Nguyen, H.D.: Fast reproducible floating-point summation. In: Proceedings of 21th IEEE Symposium on Computer Arithmetic. Austin, Texas, USA (2013)
Intel Math Kernel Library. http://www.intel.com/software/products/mkl/
Jézéquel, F., Langlois, P., Revol, N.: First steps towards more numerical reproducibility. ESAIM: Proc. 45, 229–238 (2013)
Muller, J.M., Brisebarre, N., de Dinechin, F., Jeannerod, C.P., Lefèvre, V., Melquiond, G., Revol, N., Stehlé, D., Torres, S.: Handbook of Floating-Point Arithmetic. Birkhäuser, Boston (2010)
Ogita, T., Rump, S.M., Oishi, S.: Accurate sum and dot product. SIAM J. Sci. Comput. 26(6), 1955–1988 (2005)
Reinders, J.: Intel Threading Building Blocks, 1st edn. O’Reilly & Associates Inc., Sebastopol (2007)
Rump, S.M.: Ultimately fast accurate summation. SIAM J. Sci. Comput. 31(5), 3466–3502 (2009)
Rump, S.M., Ogita, T., Oishi, S.: Accurate floating-point summation - part I: faithful rounding. SIAM J. Sci. Comput. 31(1), 189–224 (2008)
Story, S.: Numerical reproducibility in the Intel Math Kernel Library. Salt Lake City, November 2012
Van Zee, F.G., van de Geijn, R.A.: BLIS: a framework for rapidly instantiating BLAS functionality. ACM Trans. Math. Software 41(3), 14:1–14:33 (2015)
Yamanaka, N., Ogita, T., Rump, S., Oishi, S.: A parallel algorithm for accurate dot product. Parallel Comput. 34(68), 392–410 (2008)
Zhu, Y.K., Hayes, W.B.: Correct rounding and hybrid approach to exact floating-point summation. SIAM J. Sci. Comput. 31(4), 2981–3001 (2009)
Zhu, Y.K., Hayes, W.B.: Algorithm 908: online exact summation of floating-point streams. ACM Trans. Math. Softw. 37(3), 37:1–37:13 (2010)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Chohra, C., Langlois, P., Parello, D. (2016). Efficiency of Reproducible Level 1 BLAS. In: Nehmeier, M., Wolff von Gudenberg, J., Tucker, W. (eds) Scientific Computing, Computer Arithmetic, and Validated Numerics. SCAN 2015. Lecture Notes in Computer Science(), vol 9553. Springer, Cham. https://doi.org/10.1007/978-3-319-31769-4_8
Download citation
DOI: https://doi.org/10.1007/978-3-319-31769-4_8
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-31768-7
Online ISBN: 978-3-319-31769-4
eBook Packages: Computer ScienceComputer Science (R0)