research-article
Public Access

Avoiding Communication in Successive Band Reduction

Published: 18 February 2015

Abstract

    The running time of an algorithm depends on both arithmetic and communication (i.e., data movement) costs, and the relative costs of communication are growing over time. In this work, we present sequential and distributed-memory parallel algorithms for tridiagonalizing full symmetric and symmetric band matrices that asymptotically reduce communication compared to previous approaches.
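    The claim that running time depends on both arithmetic and communication can be made concrete with a cost model. The sketch below assumes the widely used α-β-γ model, which the abstract itself does not name. In that model, time is γF + βW + αS, where F counts flops, W counts words moved, S counts messages, and γ, β, α are the per-flop, per-word, and per-message costs. The function names and the parameter values in the example are hypothetical.

```python
def modeled_time(flops, words, messages, gamma, beta, alpha):
    """Running time under the (assumed) alpha-beta-gamma cost model:
    gamma = time per flop, beta = time per word moved,
    alpha = time per message (latency)."""
    return gamma * flops + beta * words + alpha * messages


def communication_fraction(flops, words, messages, gamma, beta, alpha):
    """Share of the modeled running time spent on data movement."""
    total = modeled_time(flops, words, messages, gamma, beta, alpha)
    return (beta * words + alpha * messages) / total
```

    Take hypothetical costs γ = 1, β = 10, and α = 1000, measured in units of one flop. An algorithm that performs 1000 flops and moves 100 words in 10 messages then has a modeled time of 12000, of which 11000 is communication. Halving both W and S drops the time to 6500, even though F is unchanged. As β/γ and α/γ grow, reducing W and S matters more than reducing F.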
    The tridiagonalization of a symmetric band matrix is a key kernel in solving the symmetric eigenvalue problem for both full and band matrices. To preserve structure, tridiagonalization routines use annihilate-and-chase procedures that have previously suffered from poor data locality and high parallel latency cost. We improve both by reorganizing the computation, obtaining asymptotic improvements. We also propose new algorithms for reducing a full symmetric matrix to band form in a communication-efficient manner. In this article, we consider two cases: computing eigenvalues only, and computing eigenvalues together with all eigenvectors.
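    The two-stage pipeline the abstract describes can be illustrated with a minimal sequential sketch. First the full matrix is reduced to band form, then the band form is reduced to tridiagonal form by annihilate-and-chase. The sketch uses Givens rotations in the classical Schwarz style and is not the communication-avoiding algorithm of this article. It stores the matrix as a dense list of lists, applies one rotation at a time, and makes no attempt at data locality. Production codes usually use blocked Householder transformations instead. The names `rotate`, `full_to_band`, and `band_to_tridiagonal` are hypothetical.

```python
import math


def rotate(A, p, c):
    """Apply A <- G A G^T, where G is a Givens rotation in the (p-1, p) plane
    chosen so that A[p][c] (and, by symmetry, A[c][p]) becomes zero.
    Assumes c < p - 1, so the target entry is not in a rotated column."""
    n = len(A)
    x, y = A[p - 1][c], A[p][c]
    r = math.hypot(x, y)
    if r == 0.0:
        return  # target is already zero
    cs, sn = x / r, y / r
    for k in range(n):  # mix rows p-1 and p
        u, v = A[p - 1][k], A[p][k]
        A[p - 1][k] = cs * u + sn * v
        A[p][k] = -sn * u + cs * v
    for k in range(n):  # mix columns p-1 and p
        u, v = A[k][p - 1], A[k][p]
        A[k][p - 1] = cs * u + sn * v
        A[k][p] = -sn * u + cs * v
    A[p][c] = A[c][p] = 0.0  # drop round-off left in the annihilated entry


def full_to_band(A, b):
    """Stage 1 (sketch): reduce a dense symmetric matrix to bandwidth b in
    place by annihilating each column below its b-th subdiagonal, bottom-up."""
    n = len(A)
    for j in range(n - b - 1):
        for i in range(n - 1, j + b, -1):
            rotate(A, i, j)
    return A


def band_to_tridiagonal(A, b):
    """Stage 2 (sketch): Schwarz-style annihilate-and-chase. Each pass removes
    the outermost diagonal, reducing the bandwidth from w to w - 1."""
    n = len(A)
    for w in range(b, 1, -1):
        for j in range(n - w):
            p, c = j + w, j  # annihilate the outermost entry A[j + w][j]
            while p < n:
                rotate(A, p, c)
                # The rotation filled one bulge at A[p + w][p - 1]; chase it.
                p, c = p + w, p - 1
    return A
```

    The transformations are orthogonal similarities, so the resulting tridiagonal matrix has the same eigenvalues as the input. In the inner `while` loop, each annihilation creates a single bulge w rows further down, and that bulge is chased in strides of w. Each rotation does O(n) work on just two rows and two columns. This strided, one-entry-at-a-time pattern is the kind of poor data locality the abstract attributes to earlier annihilate-and-chase procedures. The sketch does not accumulate the rotations, so it covers only the eigenvalues-only case.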

      Information

      Published In

      ACM Transactions on Parallel Computing, Volume 1, Issue 2
      Special Issue on PPoPP 2012
      January 2015
      224 pages
      ISSN: 2329-4949
      EISSN: 2329-4957
      DOI: 10.1145/2737841
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 18 February 2015
      Accepted: 01 July 2014
      Revised: 01 July 2014
      Received: 01 April 2013
      Published in TOPC Volume 1, Issue 2

      Author Tags

      1. Symmetric eigenvalue problem
      2. Band reduction
      3. Communication-avoiding algorithms

      Qualifiers

      • Research-article
      • Research
      • Refereed

      Funding Sources

      • Center for Future Architecture Research
      • Lockheed Martin Corporation
      • Sandia National Laboratories
      • US DOE
      • U.S. Department of Energy Contract
      • Microsoft
      • ParLab
      • DARPA
      • MathWorks
      • NSF
      • Intel
      • STARnet
      • National Instruments
      • Sandia National Laboratories Truman Fellowship in National Security Science and Engineering
      • Samsung
      • UC Discovery
      • Nokia
      • NVIDIA
      • Sandia Corporation
      • Oracle
      • Semiconductor Research Corporation
      • MARCO

      Article Metrics

      • Downloads (last 12 months): 65
      • Downloads (last 6 weeks): 5

      Cited By

      • (2024) Generalized Ware-Amdhal Law. 2024 32nd Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), 215-221. DOI: 10.1109/PDP62718.2024.00037. Online publication date: 20-Mar-2024.
      • (2023) Efficient parallel reduction of bandwidth for symmetric matrices. Parallel Computing 115:C, 102998. DOI: 10.1016/j.parco.2023.102998. Online publication date: 1-Feb-2023.
      • (2023) Algorithm and Software Overhead: A Theoretical Approach to Performance Portability. Parallel Processing and Applied Mathematics, 89-100. DOI: 10.1007/978-3-031-30445-3_8. Online publication date: 27-Apr-2023.
      • (2020) High-performance sampling of generic determinantal point processes. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 378:2166, 20190059. DOI: 10.1098/rsta.2019.0059. Online publication date: 20-Jan-2020.
      • (2019) Improved Unconstrained Energy Functional Method for Eigensolvers in Electronic Structure Calculations. Proceedings of the 48th International Conference on Parallel Processing, 1-11. DOI: 10.1145/3337821.3337914. Online publication date: 5-Aug-2019.
      • (2018) The Singular Value Decomposition: Anatomy of Optimizing an Algorithm for Extreme Scale. SIAM Review 60:4, 808-865. DOI: 10.1137/17M1117732. Online publication date: 8-Nov-2018.
      • (2018) Two-sided orthogonal reductions to condensed forms on asymmetric multicore processors. Parallel Computing 78:C, 85-100. DOI: 10.1016/j.parco.2018.03.005. Online publication date: 1-Oct-2018.
      • (2017) A Communication-Avoiding Parallel Algorithm for the Symmetric Eigenvalue Problem. Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures, 111-121. DOI: 10.1145/3087556.3087561. Online publication date: 24-Jul-2017.
      • (2017) A Parallel Bisection and Inverse Iteration Solver for a Subset of Eigenpairs of Symmetric Band Matrices. Eigenvalue Problems: Algorithms, Software and Applications in Petascale Computing, 31-50. DOI: 10.1007/978-3-319-62426-6_3. Online publication date: 28-Sep-2017.
