Practical experience in the dangers of heterogeneous computing

  • Conference paper
Applied Parallel Computing Industrial Computation and Optimization (PARA 1996)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 1184))

Abstract

Writing reliable numerical library software for heterogeneous computing environments poses special challenges. Although much software has been written for distributed-memory parallel computers, porting it to a network of workstations requires careful consideration. The symptoms of heterogeneous computing failures range from silently erroneous results to deadlock. Some of the problems are straightforward to solve; for others the solutions are not obvious or incur an unacceptable overhead. Making software robust on heterogeneous systems often requires additional communication.

This paper addresses the issue of writing reliable numerical software for networks of heterogeneous computers. We describe and illustrate the problems encountered during the development of ScaLAPACK. Where possible, we suggest solutions to avoid potential pitfalls; where that is not possible, we recommend that the software not be used on heterogeneous networks.
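As a rough illustration of the kind of failure the abstract describes, the sketch below simulates two processes that test the same stopping criterion against different machine parameters. It is not taken from the paper: the function names, the two epsilon values, and the halving iteration are all hypothetical assumptions chosen to keep the example small. When each process decides locally, the iteration counts differ, which in a real message-passing program could leave one process waiting on a communication step the other never reaches. Letting one process decide and share its decision costs an extra message but keeps the processes in step.

```python
# Hypothetical sketch: ranks, epsilon values and the iteration are
# illustrative assumptions, not code or data from the paper.

# Pretend rank 0 and rank 1 report slightly different machine epsilons.
EPS_BY_PROCESS = {0: 2.0 ** -53, 1: 2.0 ** -52}


def local_iterations(eps, residual=1.0, max_iter=200):
    """Count halving steps until residual <= eps, using only local data."""
    steps = 0
    while residual > eps and steps < max_iter:
        residual *= 0.5
        steps += 1
    return steps


def independent_decisions(eps_by_process):
    """Each process applies the stopping test with its own epsilon."""
    return {rank: local_iterations(eps) for rank, eps in eps_by_process.items()}


def coordinated_decisions(eps_by_process, root=0):
    """Only the root applies the test; its answer is shared with everyone.

    The dictionary assignment stands in for a broadcast, i.e. the
    additional communication needed to keep all processes consistent.
    """
    steps = local_iterations(eps_by_process[root])
    return {rank: steps for rank in eps_by_process}


if __name__ == "__main__":
    print("independent:", independent_decisions(EPS_BY_PROCESS))
    print("coordinated:", coordinated_decisions(EPS_BY_PROCESS))
```

In the independent case the two simulated processes stop after different numbers of steps; in the coordinated case they always agree, at the price of one extra exchange per decision.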

Author note: S. Blackford was formerly S. Ostrouchov.

Editor information

Jerzy Waśniewski, Jack Dongarra, Kaj Madsen, Dorte Olesen

Copyright information

© 1996 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Blackford, S. et al. (1996). Practical experience in the dangers of heterogeneous computing. In: Waśniewski, J., Dongarra, J., Madsen, K., Olesen, D. (eds) Applied Parallel Computing Industrial Computation and Optimization. PARA 1996. Lecture Notes in Computer Science, vol 1184. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-62095-8_7

  • DOI: https://doi.org/10.1007/3-540-62095-8_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-62095-2

  • Online ISBN: 978-3-540-49643-4

  • eBook Packages: Springer Book Archive
