Abstract
Special challenges arise in writing reliable numerical library software for heterogeneous computing environments. Although much software has been written for distributed-memory parallel computers, porting it to a network of workstations requires careful consideration. The symptoms of heterogeneous computing failures range from silently erroneous results to deadlock. Some of the problems are straightforward to solve, but for others the solutions are less obvious or incur an unacceptable overhead. Making software robust on heterogeneous systems often requires additional communication.
This paper addresses the issue of writing reliable numerical software for networks of heterogeneous computers. We describe and illustrate the problems encountered during the development of ScaLAPACK. Where possible, we suggest solutions that avoid potential pitfalls; where that is not possible, we recommend that the software not be used on heterogeneous networks.
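How heterogeneity can turn into deadlock, and why the remedy often costs extra communication, can be illustrated with a small single-process simulation. This is a sketch under stated assumptions, not code from ScaLAPACK: the two hypothetical nodes (`node_a` computing in double precision, `node_b` rounding every operation to single precision), the Newton iteration for sqrt(2), and the tolerance factor are all illustrative choices. In a real message-passing program each iteration would contain a collective call, so nodes that disagree on the iteration count would leave one of them waiting for a message that is never sent.

```python
import struct
from typing import Dict


def to_single(x: float) -> float:
    """Round a Python float (IEEE double) to the nearest IEEE single."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Hypothetical two-node "network": node_a computes in double precision,
# node_b rounds every operation to single precision, and each node derives
# its stopping tolerance from its own local unit roundoff.
NODES: Dict[str, dict] = {
    "node_a": {"eps": 2.0 ** -53, "rnd": lambda x: x},
    "node_b": {"eps": 2.0 ** -24, "rnd": to_single},
}

TOL_FACTOR = 4.0  # illustrative safety factor, an assumption


def newton_step(x: float, a: float, rnd) -> float:
    """One Newton step for sqrt(a), rounded in the node's own arithmetic."""
    return rnd(0.5 * (x + rnd(a / x)))


def converged(x_old: float, x_new: float, eps: float) -> bool:
    """Relative stopping test based on a node-local epsilon."""
    return abs(x_new - x_old) <= TOL_FACTOR * eps * abs(x_new)


def local_iterations(a: float, node: dict, max_iter: int = 50) -> int:
    """Each node decides independently when to stop (the unsafe pattern)."""
    rnd, eps = node["rnd"], node["eps"]
    x = rnd(a)
    for k in range(1, max_iter + 1):
        x_new = newton_step(x, a, rnd)
        if converged(x, x_new, eps):
            return k
        x = x_new
    return max_iter


def broadcast_iterations(a: float, root: str = "node_a",
                         max_iter: int = 50) -> Dict[str, int]:
    """All nodes iterate in lockstep; only `root` evaluates the stopping
    test, and its flag is shared with every node (a simulated broadcast),
    so all nodes perform the same number of iterations."""
    xs = {name: n["rnd"](a) for name, n in NODES.items()}
    counts = {name: 0 for name in NODES}
    for _ in range(max_iter):
        new = {name: newton_step(xs[name], a, n["rnd"])
               for name, n in NODES.items()}
        for name in NODES:
            counts[name] += 1
        # The decision is made once, on the root, from the root's data only.
        stop = converged(xs[root], new[root], NODES[root]["eps"])
        xs = new
        if stop:  # every node sees the same flag and leaves together
            break
    return counts


if __name__ == "__main__":
    local = {name: local_iterations(2.0, n) for name, n in NODES.items()}
    print("independent stopping tests:", local)
    print("root decides and broadcasts:", broadcast_iterations(2.0))
```

With independent stopping tests the two nodes leave the loop after different numbers of iterations; with the shared flag they agree. On a real network the flag would be sent with a broadcast primitive such as `MPI_Bcast`, adding one message per iteration, which is the kind of extra communication the abstract refers to.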
formerly S. Ostrouchov
Copyright information
© 1996 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Blackford, S. et al. (1996). Practical experience in the dangers of heterogeneous computing. In: Waśniewski, J., Dongarra, J., Madsen, K., Olesen, D. (eds) Applied Parallel Computing Industrial Computation and Optimization. PARA 1996. Lecture Notes in Computer Science, vol 1184. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-62095-8_7
DOI: https://doi.org/10.1007/3-540-62095-8_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-62095-2
Online ISBN: 978-3-540-49643-4
eBook Packages: Springer Book Archive