research-article

Parallel Matrix Multiplication on a Linear Array with a Reconfigurable Pipelined Bus System

Published: 01 May 2001

    Abstract

    The known fast sequential algorithms for multiplying two $N\times N$ matrices (over an arbitrary ring) have time complexity $O(N^\alpha)$, where $2 < \alpha < 3$. The current best value of $\alpha$ is less than 2.3755. We show that, for all $1 \le p \le N^{\alpha}$, multiplying two $N\times N$ matrices can be performed on a $p$-processor linear array with a reconfigurable pipelined bus system (LARPBS) in $O\left(\frac{N^{\alpha}}{p}+\frac{N^2}{p^{2/\alpha}}\log p\right)$ time. This is currently the fastest parallelization of the best known sequential matrix multiplication algorithm on a distributed memory parallel system. In particular, for all $1 \le p \le N^{2.3755}$, multiplying two $N\times N$ matrices can be performed on a $p$-processor LARPBS in $O\left(\frac{N^{2.3755}}{p}+\frac{N^2}{p^{0.8419}}\log p\right)$ time, and linear speedup can be achieved for $p$ as large as $O(N^{2.3755}/(\log N)^{6.3262})$. Furthermore, multiplying two $N\times N$ matrices can be performed on an LARPBS with $O(N^\alpha)$ processors in $O(\log N)$ time. This compares favorably with the performance on a PRAM.
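
The constants 0.8419 and 6.3262 in the abstract follow from $\alpha = 2.3755$. The exponent of $p$ in the second term is $2/\alpha$. For linear speedup with $p = N^{\alpha}/(\log N)^{c}$, the first term is $(\log N)^{c}$ and the second is $O((\log N)^{2c/\alpha + 1})$, so the first dominates when $c \ge \alpha/(\alpha-2)$. The sketch below only checks this arithmetic and evaluates the bound with constant factors dropped; the function names are illustrative and do not come from the paper, and the base-2 logarithm is an assumption (the base affects only the constant).

```python
import math

ALPHA = 2.3755  # exponent quoted in the abstract


def comm_exponent(alpha: float) -> float:
    """Exponent of p in the second term N^2 / p^(2/alpha)."""
    return 2.0 / alpha


def speedup_log_exponent(alpha: float) -> float:
    """Smallest c such that p = N^alpha / (log N)^c keeps N^alpha / p dominant."""
    return alpha / (alpha - 2.0)


def time_bound(n: float, p: float, alpha: float = ALPHA) -> float:
    """Evaluate N^alpha / p + (N^2 / p^(2/alpha)) * log2(p), for p >= 1.

    Constant factors hidden by the O-notation are ignored (assumption).
    """
    first_term = n ** alpha / p
    second_term = (n ** 2 / p ** (2.0 / alpha)) * math.log2(p)
    return first_term + second_term


print(round(comm_exponent(ALPHA), 4))         # 0.8419
print(round(speedup_log_exponent(ALPHA), 4))  # 6.3262
```

With $p = N^{\alpha}$ the first term is 1 and the second is $\alpha \log_2 N$, which matches the $O(\log N)$ time stated for $O(N^\alpha)$ processors.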

    Published In

    IEEE Transactions on Computers  Volume 50, Issue 5
    May 2001
    144 pages
    ISSN:0018-9340

    Publisher

    IEEE Computer Society

    United States

    Author Tags

    1. Bilinear algorithm
    2. PRAM
    3. cost-optimality
    4. distributed memory system
    5. linear array
    6. matrix multiplication
    7. optical pipelined bus
    8. reconfigurable system
    9. speedup

    Qualifiers

    • Research-article

    Cited By

    • (2010) Fast and highly scalable parallel computations for fundamental matrix problems on distributed memory systems. The Journal of Supercomputing 54:3 (271-297). DOI: 10.1007/s11227-009-0319-0. Online publication date: 1-Dec-2010
    • (2008) Matrix product on heterogeneous master-worker platforms. Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming (53-62). DOI: 10.1145/1345206.1345217. Online publication date: 20-Feb-2008
    • (2007) Analysis of Parallel Algorithms for Matrix Chain Product and Matrix Powers on Distributed Memory Systems. IEEE Transactions on Parallel and Distributed Systems 18:7 (865-878). DOI: 10.1109/TPDS.2007.1027. Online publication date: 1-Jul-2007
    • (2007) Scalable and Modular Algorithms for Floating-Point Matrix Multiplication on Reconfigurable Computing Systems. IEEE Transactions on Parallel and Distributed Systems 18:4 (433-448). DOI: 10.1109/TPDS.2007.1001. Online publication date: 1-Apr-2007
    • (2005) Fast and Scalable Parallel Matrix Computations on Distributed Memory Systems. Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01. DOI: 10.1109/IPDPS.2005.221. Online publication date: 4-Apr-2005
    • (2004) Merging, sorting and matrix operations on the SOME-bus multiprocessor architecture. Future Generation Computer Systems 20:4 (643-661). DOI: 10.1016/S0167-739X(03)00129-8. Online publication date: 1-May-2004
    • (2002) Hardware-Software Co-Reliability in Field Reconfigurable Multi-Processor-Memory Systems. Proceedings of the 16th International Parallel and Distributed Processing Symposium. DOI: 10.5555/645610.661371. Online publication date: 15-Apr-2002
    • (2001) Scalable Parallel Matrix Multiplication on Distributed Memory Parallel Computers. Journal of Parallel and Distributed Computing 61:12 (1709-1731). DOI: 10.1006/jpdc.2001.1768. Online publication date: 1-Dec-2001
