research-article

Scalable Parallel Matrix Multiplication on Distributed Memory Parallel Computers

Published: 01 December 2001
    Abstract

    Consider any known sequential algorithm for matrix multiplication over an arbitrary ring with time complexity $O(N^{\alpha})$, where $2 < \alpha \le 3$. We show that such an algorithm can be parallelized on a distributed memory parallel computer (DMPC) in $O(\log N)$ time by using $N^{\alpha}/\log N$ processors. Such a parallel computation is cost optimal and matches the performance of PRAM. Furthermore, our parallelization on a DMPC can be made fully scalable, that is, for all $1 \le p \le N^{\alpha}/\log N$, multiplying two $N \times N$ matrices can be performed by a DMPC with $p$ processors in $O(N^{\alpha}/p)$ time, i.e., linear speedup and cost optimality can be achieved in the range $[1..N^{\alpha}/\log N]$. This unifies all known algorithms for matrix multiplication on DMPC, standard or non-standard, sequential or parallel. Extensions of our methods and results to other parallel systems are also presented. For instance, for all $1 \le p \le N^{\alpha}/\log N$, multiplying two $N \times N$ matrices can be performed by $p$ processors connected by a hypercubic network in $O\big(N^{\alpha}/p + (N^{2}/p^{2/\alpha})(\log p)^{2(\alpha-1)/\alpha}\big)$ time, which implies that if $p = O\big(N^{\alpha}/(\log N)^{2(\alpha-1)/(\alpha-2)}\big)$, linear speedup can be achieved. Such a parallelization is highly scalable. The above claims result in significant progress in scalable parallel matrix multiplication (as well as solving many other important problems) on distributed memory systems, both theoretically and practically.
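
    The bounds quoted in the abstract can be explored numerically. The following is a minimal Python sketch, not code from the paper: the function names (dmpc_time, hypercube_terms, linear_speedup_limit) are hypothetical, all constant factors hidden by the O-notation are dropped, and logarithms are assumed to be base 2. It evaluates the $N^{\alpha}/p$ term, the two terms of the hypercubic-network bound, and the processor count below which the abstract states that linear speedup is achievable.

```python
import math


def dmpc_time(n, p, alpha):
    """Leading term N^alpha / p of the DMPC bound (constants dropped)."""
    return n ** alpha / p


def hypercube_terms(n, p, alpha):
    """Return (computation, communication) terms of the hypercubic bound.

    computation   = N^alpha / p
    communication = (N^2 / p^(2/alpha)) * (log p)^(2(alpha-1)/alpha)
    Base-2 logarithms are an assumption; the O-notation does not fix the base.
    """
    compute = n ** alpha / p
    comm = (n ** 2 / p ** (2 / alpha)) * math.log2(p) ** (2 * (alpha - 1) / alpha)
    return compute, comm


def linear_speedup_limit(n, alpha):
    """N^alpha / (log N)^(2(alpha-1)/(alpha-2)), the stated threshold on p."""
    return n ** alpha / math.log2(n) ** (2 * (alpha - 1) / (alpha - 2))


if __name__ == "__main__":
    # Strassen's exponent log2(7) is one admissible value with 2 < alpha <= 3.
    alpha = math.log2(7)
    n = 4096
    for p in (64, 4096, 2 ** 20):
        compute, comm = hypercube_terms(n, p, alpha)
        # A ratio well below 1 means the N^alpha / p term dominates.
        print(p, compute, comm, comm / compute)
    print(linear_speedup_limit(n, alpha))
```

    As a sanity check on the formulas, taking $N = 16$, $\alpha = 3$ and $p = 16$ makes the computation and communication terms coincide (both about 256), and 16 is also the value returned by linear_speedup_limit for these inputs, so the threshold marks the point where the two terms balance in this small case.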

          Published In

          Journal of Parallel and Distributed Computing, Volume 61, Issue 12
          December 2001
          144 pages

          Publisher

          Academic Press, Inc.

          United States

          Author Tags

          1. cost optimality
          2. distributed memory parallel computer
          3. linear array with reconfigurable pipelined bus system
          4. matrix multiplication
          5. module parallel computer
          6. optical model of computation
          7. scalability
          8. speedup

          Cited By

          • (2018) Merging, sorting and matrix operations on the SOME-bus multiprocessor architecture. Future Generation Computer Systems 20:4 (643-661). DOI: 10.1016/S0167-739X(03)00129-8. Online publication date: 29-Dec-2018
          • (2018) Fast and highly scalable parallel computations for fundamental matrix problems on distributed memory systems. The Journal of Supercomputing 54:3 (271-297). DOI: 10.1007/s11227-009-0319-0. Online publication date: 31-Dec-2018
          • (2007) Analysis of Parallel Algorithms for Matrix Chain Product and Matrix Powers on Distributed Memory Systems. IEEE Transactions on Parallel and Distributed Systems 18:7 (865-878). DOI: 10.1109/TPDS.2007.1027. Online publication date: 1-Jul-2007
          • (2005) Fast and Scalable Parallel Matrix Computations on Distributed Memory Systems. Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01. DOI: 10.1109/IPDPS.2005.221. Online publication date: 4-Apr-2005
          • (2003) Fast solution of large N × N matrix equations in an MIMD-SIMD hybrid system. Parallel Computing 29:11-12 (1669-1684). DOI: 10.1016/j.parco.2003.05.011. Online publication date: 1-Nov-2003
