Abstract
In this paper we discuss code optimization techniques for implementing the Level 2 and Level 3 Basic Linear Algebra Subprograms (BLAS) on a single processor of the CRAY Y-MP and the CRAY-2. Our performance measurements show that these techniques improve performance significantly, and that most subroutines come close to the peak performance of the machine for relatively small problem sizes.
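The abstract does not say which optimization techniques are used. As a generic illustration only (an assumption, not the authors' code), the sketch below applies one well-known vector-machine technique to the Level 2 operation y := alpha*A*x + beta*y (the _GEMV routine in the BLAS). It unrolls the loop over the columns of A so that each sweep over y folds in four columns instead of one. The function name `gemv_unrolled`, the column-list storage of A, and the unrolling depth of four are all hypothetical choices made for illustration.

```python
def gemv_unrolled(alpha, a_cols, x, beta, y):
    """Return alpha*A*x + beta*y, where A is given as a list of columns.

    Hypothetical illustration: the column loop is unrolled by four, so each
    sweep over the output vector combines four columns of A. In the plain
    column-by-column (axpy) form, y would be read and written once per
    column; here it is read and written roughly a quarter as often.
    """
    m = len(y)
    n = len(a_cols)
    out = [beta * yi for yi in y]  # scale y by beta once up front
    j = 0
    # Main unrolled loop: four columns per sweep over the output vector.
    while j + 4 <= n:
        c0, c1, c2, c3 = a_cols[j], a_cols[j + 1], a_cols[j + 2], a_cols[j + 3]
        t0, t1, t2, t3 = alpha * x[j], alpha * x[j + 1], alpha * x[j + 2], alpha * x[j + 3]
        for i in range(m):
            out[i] += t0 * c0[i] + t1 * c1[i] + t2 * c2[i] + t3 * c3[i]
        j += 4
    # Clean-up loop for the (n mod 4) leftover columns.
    while j < n:
        t = alpha * x[j]
        col = a_cols[j]
        for i in range(m):
            out[i] += t * col[i]
        j += 1
    return out
```

In a compiled language, the inner loop over `i` is the one a vectorizing compiler would turn into vector instructions. The Python version only shows the loop structure and makes no performance claim.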
Cite this article
Sheikh, Q., Vu, P., Yang, C. et al. Implementation of the Level 2 and 3 BLAS on the CRAY Y-MP and the CRAY-2. J Supercomput 5, 291–305 (1992). https://doi.org/10.1007/BF00127950