Parallel Matrix Multiplication on a Linear Array with a Reconfigurable Pipelined Bus System

Authors:

Victor Y. Pan

IEEE Transactions on Computers, Volume 50, Issue 5

Pages 519 - 525

https://doi.org/10.1109/12.926164

Published: 01 May 2001

Abstract

The known fast sequential algorithms for multiplying two $N\times N$ matrices (over an arbitrary ring) have time complexity $O(N^\alpha)$, where $2 < \alpha < 3$. The current best value of $\alpha$ is less than 2.3755. We show that, for all $1 \le p \le N^{\alpha}$, multiplying two $N\times N$ matrices can be performed on a $p$-processor linear array with a reconfigurable pipelined bus system (LARPBS) in $O\left(\frac{N^{\alpha}}{p}+\frac{N^2}{p^{2/\alpha}}\log p\right)$ time. This is currently the fastest parallelization of the best known sequential matrix multiplication algorithm on a distributed memory parallel system. In particular, for all $1 \le p \le N^{2.3755}$, multiplying two $N\times N$ matrices can be performed on a $p$-processor LARPBS in $O\left(\frac{N^{2.3755}}{p}+\frac{N^2}{p^{0.8419}}\log p\right)$ time, and linear speedup can be achieved for $p$ as large as $O(N^{2.3755}/(\log N)^{6.3262})$. Furthermore, multiplying two $N\times N$ matrices can be performed on an LARPBS with $O(N^\alpha)$ processors in $O(\log N)$ time. This compares favorably with the performance on a PRAM.
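The numeric exponents quoted in the abstract can be reproduced from $\alpha = 2.3755$. The sketch below is an illustration, not code from the paper: the function names are hypothetical, constants hidden by the $O(\cdot)$ notation are ignored, and logarithms are taken base 2 for concreteness. It assumes the linear-speedup limit follows from requiring the communication term to be no larger than the compute term. Substituting $p = N^{\alpha}/(\log N)^{c}$ into that requirement gives $c \ge \alpha/(\alpha-2)$, which matches the value 6.3262 in the abstract.

```python
import math

ALPHA = 2.3755  # exponent quoted in the abstract


def comm_exponent(alpha):
    """Exponent 2/alpha on p in the communication term N^2 / p^(2/alpha)."""
    return 2.0 / alpha


def speedup_log_exponent(alpha):
    """Exponent alpha/(alpha-2) on log N in the linear-speedup limit
    p = O(N^alpha / (log N)^(alpha/(alpha-2))) (assumed derivation)."""
    return alpha / (alpha - 2.0)


def time_bound(n, p, alpha=ALPHA):
    """Evaluate N^alpha/p + (N^2 / p^(2/alpha)) * log2(p), constants ignored."""
    compute = n ** alpha / p
    comm = (n ** 2 / p ** (2.0 / alpha)) * math.log2(p) if p > 1 else 0.0
    return compute + comm


print(round(comm_exponent(ALPHA), 4))         # 0.8419
print(round(speedup_log_exponent(ALPHA), 4))  # 6.3262

# With p = N^alpha processors the bound collapses to 1 + alpha * log2(N),
# which illustrates the O(log N) claim.
n = 16
print(time_bound(n, n ** ALPHA))
```

With `p = 1` the communication term vanishes and the bound reduces to the sequential cost $N^{\alpha}$, so the same expression covers both ends of the processor range stated in the abstract.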

Index Terms

Parallel Matrix Multiplication on a Linear Array with a Reconfigurable Pipelined Bus System
1. Mathematics of computing
  1. Mathematical analysis
    1. Numerical analysis
      1. Computations on matrices
2. Theory of computation
  1. Models of computation
    1. Concurrency
      1. Parallel computing models

Published In

IEEE Transactions on Computers Volume 50, Issue 5

May 2001

144 pages

ISSN:0018-9340

Editors: Jean-Luc Gaudiot (Univ. of Southern California, Los Angeles), Fabrizio Lombardi (Northeastern Univ., Boston, MA)

Copyright © 2001 IEEE. All Rights Reserved.

Publisher

IEEE Computer Society

United States

Cited By

Li K (2010). Fast and highly scalable parallel computations for fundamental matrix problems on distributed memory systems. The Journal of Supercomputing 54:3 (271-297). DOI: 10.1007/s11227-009-0319-0. Online publication date: 1-Dec-2010.
https://dl.acm.org/doi/10.1007/s11227-009-0319-0
Dongarra J, Pineau J, Robert Y, Vivien F, Chatterjee S, Scott M (2008). Matrix product on heterogeneous master-worker platforms. Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming (53-62). DOI: 10.1145/1345206.1345217. Online publication date: 20-Feb-2008.
https://dl.acm.org/doi/10.1145/1345206.1345217
Li K (2007). Analysis of Parallel Algorithms for Matrix Chain Product and Matrix Powers on Distributed Memory Systems. IEEE Transactions on Parallel and Distributed Systems 18:7 (865-878). DOI: 10.1109/TPDS.2007.1027. Online publication date: 1-Jul-2007.
https://dl.acm.org/doi/10.1109/TPDS.2007.1027
Zhuo L, Prasanna V (2007). Scalable and Modular Algorithms for Floating-Point Matrix Multiplication on Reconfigurable Computing Systems. IEEE Transactions on Parallel and Distributed Systems 18:4 (433-448). DOI: 10.1109/TPDS.2007.1001. Online publication date: 1-Apr-2007.
https://dl.acm.org/doi/10.1109/TPDS.2007.1001
Li K (2005). Fast and Scalable Parallel Matrix Computations on Distributed Memory Systems. Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01. DOI: 10.1109/IPDPS.2005.221. Online publication date: 4-Apr-2005.
https://dl.acm.org/doi/10.1109/IPDPS.2005.221
Katsinis C (2004). Merging, sorting and matrix operations on the SOME-bus multiprocessor architecture. Future Generation Computer Systems 20:4 (643-661). DOI: 10.1016/S0167-739X(03)00129-8. Online publication date: 1-May-2004.
https://dl.acm.org/doi/10.1016/S0167-739X%2803%2900129-8
Choi M, Park N, Lombardi F (2002). Hardware-Software Co-Reliability in Field Reconfigurable Multi-Processor-Memory Systems. Proceedings of the 16th International Parallel and Distributed Processing Symposium. DOI: 10.5555/645610.661371. Online publication date: 15-Apr-2002.
https://dl.acm.org/doi/10.5555/645610.661371
Li K (2001). Scalable Parallel Matrix Multiplication on Distributed Memory Parallel Computers. Journal of Parallel and Distributed Computing 61:12 (1709-1731). DOI: 10.1006/jpdc.2001.1768. Online publication date: 1-Dec-2001.
https://dl.acm.org/doi/10.1006/jpdc.2001.1768
