Cache Blocking

Gustavson, Fred G.

doi:10.1007/978-3-642-28151-8_3


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7133)

Included in the following conference series:

International Workshop on Applied Parallel Computing


Abstract

Over the past five years, almost all computer manufacturers have dramatically changed their architectures by moving to Multicore (MC) processors. We briefly describe Cache Blocking as it applies to dense linear algebra on computer architectures since about 1985, covering the where, when, how, and why of the technique. It will be seen that the arrangement in memory of the submatrices $A_{ij}$ of $A$ being processed is very important.
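To make the point about the memory arrangement of the submatrices $A_{ij}$ concrete, here is a minimal Python sketch of cache blocking for $C \leftarrow C + AB$. It assumes $n$ is a multiple of the block size nb and that matrices start as flat lists in column-major (Fortran) order. The helper names (to_block_format, from_block_format, block_kernel, blocked_matmul) are hypothetical, and the layout shown, square nb × nb blocks each stored contiguously, is one illustrative choice rather than the chapter's specific method.

```python
# Sketch of cache blocking for C = C + A*B using contiguous square blocks.
# Assumptions (illustrative only): n is a multiple of nb, and matrices are
# flat Python lists in column-major (Fortran) order.


def to_block_format(a, n, nb):
    """Copy a column-major n x n matrix into a square-block layout.

    Block (bi, bj) occupies nb*nb consecutive entries (column-major inside
    the block); the blocks themselves are ordered column-major by (bi, bj).
    """
    nblk = n // nb
    out = [0.0] * (n * n)
    for bj in range(nblk):
        for bi in range(nblk):
            base = (bj * nblk + bi) * nb * nb  # start of block (bi, bj)
            for jj in range(nb):
                for ii in range(nb):
                    i, j = bi * nb + ii, bj * nb + jj
                    out[base + jj * nb + ii] = a[j * n + i]
    return out


def from_block_format(b, n, nb):
    """Inverse of to_block_format: return to ordinary column-major order."""
    nblk = n // nb
    out = [0.0] * (n * n)
    for bj in range(nblk):
        for bi in range(nblk):
            base = (bj * nblk + bi) * nb * nb
            for jj in range(nb):
                for ii in range(nb):
                    out[(bj * nb + jj) * n + bi * nb + ii] = b[base + jj * nb + ii]
    return out


def block_kernel(c, a, b, co, ao, bo, nb):
    """C_blk += A_blk * B_blk on three contiguous nb x nb blocks.

    co, ao, bo are the starting offsets of the blocks. The innermost loop
    runs over i, so it walks both c and a with unit stride.
    """
    for j in range(nb):
        for k in range(nb):
            bkj = b[bo + j * nb + k]
            for i in range(nb):
                c[co + j * nb + i] += a[ao + k * nb + i] * bkj


def blocked_matmul(c, a, b, n, nb):
    """C += A*B, where all three matrices are already in square-block layout."""
    nblk = n // nb
    size = nb * nb

    def off(bi, bj):
        return (bj * nblk + bi) * size

    for bj in range(nblk):
        for bi in range(nblk):
            for bk in range(nblk):
                # C_{bi,bj} += A_{bi,bk} * B_{bk,bj}
                block_kernel(c, a, b, off(bi, bj), off(bi, bk), off(bk, bj), nb)


if __name__ == "__main__":
    n, nb = 4, 2
    a = [float(v) for v in range(n * n)]
    eye = [1.0 if i == j else 0.0 for j in range(n) for i in range(n)]
    c = [0.0] * (n * n)  # zero matrix; its block layout is also all zeros
    blocked_matmul(c, to_block_format(a, n, nb), to_block_format(eye, n, nb), n, nb)
    assert from_block_format(c, n, nb) == a  # A * I == A
```

In standard column-major storage, the nb columns of a submatrix $A_{ij}$ lie n elements apart, so one block is scattered across many cache lines and pages. Storing each block contiguously keeps it in one stretch of memory. A common rule of thumb is to choose nb so that about three blocks fit in the targeted cache level; a production kernel would be written in Fortran or C with additional register blocking.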



Author information

Authors and Affiliations

Fred G. Gustavson: IBM T.J. Watson Research Center, Emeritus, Poland; Umeå University, Sweden


Editor information

Kristján Jónasson


About this paper

Cite this paper

Gustavson, F.G. (2012). Cache Blocking. In: Jónasson, K. (ed.) Applied Parallel and Scientific Computing. PARA 2010. Lecture Notes in Computer Science, vol 7133. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28151-8_3


DOI: https://doi.org/10.1007/978-3-642-28151-8_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28150-1
Online ISBN: 978-3-642-28151-8
eBook Packages: Computer Science, Computer Science (R0)
