DOI: 10.1007/11752578_128
Article

Adapting linear algebra codes to the memory hierarchy using a hypermatrix scheme

Published: 11 September 2005

Abstract

We present the way in which we adapt data and computations to the underlying memory hierarchy by means of a hierarchical data structure known as a hypermatrix. The application of orthogonal block forms produced the best performance on the platforms used.
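
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea, assuming a hypermatrix is a matrix partitioned into blocks, with an upper level of pointers to dense data submatrices and a null pointer standing in for every all-zero block. The two-level layout, the block size `bs`, and the names `to_hypermatrix` and `matvec` are hypothetical choices made for illustration and are not taken from the paper.

```python
from typing import List, Optional

Block = List[List[float]]                   # bottom level: dense data submatrix
HyperMatrix = List[List[Optional[Block]]]   # top level: grid of block pointers


def to_hypermatrix(dense: List[List[float]], bs: int) -> HyperMatrix:
    """Partition an n x n matrix into bs x bs blocks; all-zero blocks become None."""
    n = len(dense)
    assert n % bs == 0, "this sketch assumes the block size divides n"
    nb = n // bs
    hm: HyperMatrix = []
    for bi in range(nb):
        row: List[Optional[Block]] = []
        for bj in range(nb):
            block = [dense[bi * bs + i][bj * bs:(bj + 1) * bs] for i in range(bs)]
            has_nonzero = any(v != 0 for r in block for v in r)
            row.append(block if has_nonzero else None)
        hm.append(row)
    return hm


def matvec(hm: HyperMatrix, x: List[float], bs: int) -> List[float]:
    """Compute y = A x, skipping null (all-zero) blocks entirely."""
    y = [0.0] * (len(hm) * bs)
    for bi, row in enumerate(hm):
        for bj, block in enumerate(row):
            if block is None:
                continue  # no storage and no arithmetic for zero blocks
            for i in range(bs):
                acc = 0.0
                for j in range(bs):
                    acc += block[i][j] * x[bj * bs + j]
                y[bi * bs + i] += acc
    return y


# Block-diagonal example: the two off-diagonal 2 x 2 blocks are all zero.
A = [
    [4.0, 1.0, 0.0, 0.0],
    [1.0, 3.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 0.0],
    [0.0, 0.0, 0.0, 5.0],
]
H = to_hypermatrix(A, bs=2)
y = matvec(H, [1.0, 2.0, 3.0, 4.0], bs=2)
```

In this sketch each stored block is small enough to be processed as a unit, which is the property a blocked layout relies on to keep working data close to the processor. A real implementation would presumably pick block sizes per level of the memory hierarchy and use tuned dense kernels on the data submatrices.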


Cited By

  • (2006) Using non-canonical array layouts in dense matrix operations. Proceedings of the 8th international conference on Applied parallel computing: state of the art in scientific computing, 580-588. DOI: 10.5555/1775059.1775141. Online publication date: 18-Jun-2006
  • (2006) Compiler-optimized kernels. Proceedings of the 2006 international conference on Computational Science and Its Applications - Volume Part V, 762-771. DOI: 10.1007/11751649_84. Online publication date: 8-May-2006

    Published In

    PPAM'05: Proceedings of the 6th international conference on Parallel Processing and Applied Mathematics
    September 2005
    1121 pages
    ISBN: 3540341412
    Editors: Roman Wyrzykowski, Jack Dongarra, Norbert Meyer, Jerzy Waśniewski

    Publisher

    Springer-Verlag

    Berlin, Heidelberg
