
Fast and highly scalable parallel computations for fundamental matrix problems on distributed memory systems

Published: 01 December 2010

Abstract

We present fast and highly scalable parallel computations for a number of important and fundamental matrix problems on distributed memory systems (DMS). These problems include matrix multiplication, matrix chain product, computing the powers, the inverse, the characteristic polynomial, the determinant, the rank, and the Krylov matrix of a matrix, computing an LU- and a QR-factorization of a matrix, and solving linear systems of equations. Our highly scalable parallel computations for these problems are based on a highly scalable implementation of the fastest sequential matrix multiplication algorithm on DMS. We show that, compared with the best known parallel time complexities on parallel random access machines (PRAM), the most powerful but unrealistic shared memory model of parallel computing, our parallel matrix computations achieve the same speeds on distributed memory parallel computers (DMPC) and incur only an extra polylog factor in the time complexities on DMS with hypercubic networks. Furthermore, our parallel matrix computations are fully scalable on DMPC and highly scalable over a wide range of system sizes on DMS with hypercubic networks. Such fast (in terms of parallel time complexity) and highly scalable (in terms of our definition of scalability) parallel matrix computations have rarely been achieved before on any distributed memory system.
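
The approach summarized above rests on one observation: each of the listed problems can be reduced to a modest number of matrix multiplications, so a single fast, highly scalable parallel multiply on a DMS drives all of the other computations. As an illustration only, below is a minimal sequential NumPy sketch (not the paper's parallel DMS algorithm) of two such reductions: computing matrix powers by repeated squaring, which needs only O(log k) matrix products, and assembling a Krylov matrix from the successive products A^i b.

import numpy as np

def matrix_power(A: np.ndarray, k: int) -> np.ndarray:
    """Compute A**k using O(log k) matrix multiplications (repeated squaring)."""
    result = np.eye(A.shape[0])
    base = A.copy()
    while k > 0:
        if k & 1:                  # this bit of k is set: fold the current square in
            result = result @ base
        base = base @ base         # square; each iteration costs one matrix product
        k >>= 1
    return result

def krylov_matrix(A: np.ndarray, b: np.ndarray, m: int) -> np.ndarray:
    """Return the n x m Krylov matrix [b, A b, A^2 b, ..., A^(m-1) b]."""
    cols = [b]
    for _ in range(m - 1):
        cols.append(A @ cols[-1])  # next column is A times the previous one
    return np.column_stack(cols)

if __name__ == "__main__":
    A = np.array([[2.0, 1.0], [0.0, 3.0]])
    b = np.array([1.0, 1.0])
    print(matrix_power(A, 5))      # A^5 via three squarings plus selected multiplies
    print(krylov_matrix(A, b, 3))  # columns b, Ab, A^2 b

In the paper's setting, each matrix product in such reductions would be carried out by the scalable parallel multiplication routine on the DMS rather than by a sequential call, which is what yields the stated parallel time complexities.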


Cited By

  • (2018) Complexity Analysis of Vedic Mathematics Algorithms for Multicore Environment. International Journal of Rough Sets and Data Analysis 4(4):31-47. https://doi.org/10.4018/IJRSDA.2017100103
  • (2018) Energy- and reliability-aware task scheduling onto heterogeneous MPSoC architectures. The Journal of Supercomputing 62(1):265-289. https://doi.org/10.1007/s11227-011-0720-3


Published In

The Journal of Supercomputing  Volume 54, Issue 3
December 2010
129 pages

Publisher

Kluwer Academic Publishers

United States

Author Tags

  1. Cost optimality
  2. Distributed memory parallel computer
  3. Distributed memory system
  4. Hypercubic network
  5. Matrix computation
  6. Scalability
  7. Speedup

Qualifiers

  • Article
