research-article

Fault-Tolerant Matrix Triangularizations on Systolic Arrays

Published: 01 November 1988

Abstract

This paper examines the checksum methods of Abraham et al. for LU decomposition on multiprocessor arrays. These methods detect a transient error efficiently, but correcting the error is expensive because it requires a computation rollback. The authors show how matrix updating techniques avoid the rollback, and they introduce new checksum methods for Gaussian elimination with pairwise pivoting and for QR decomposition on systolic arrays.
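The checksum encoding that the abstract starts from can be illustrated with a minimal sketch in the style of Huang and Abraham [6]. Append a column-sum row and a row-sum column to A, then run Gaussian elimination on the enlarged matrix. The factors come out as L with an extra column-checksum row and U with an extra row-checksum column, so the factorization can be checked afterwards. The sketch assumes no pivoting and nonzero pivots, and it uses an absolute tolerance for brevity. The names `checksum_lu` and `checksum_errors` are hypothetical. It shows detection only; the rollback-free correction through matrix updating that the paper proposes is not reproduced here.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def checksum_lu(a: Matrix) -> Tuple[Matrix, Matrix]:
    """LU factorization of the full checksum matrix of `a` (no pivoting).

    Returns (l_c, u_r): l_c is L with a column-checksum row appended,
    u_r is U with a row-checksum column appended. Assumes every pivot
    is nonzero; a practical solver would need pivoting.
    """
    n = len(a)
    # Full checksum matrix [[A, Ae], [e^T A, e^T A e]] with e = all ones.
    f = [[float(x) for x in row] + [float(sum(row))] for row in a]
    f.append([sum(f[i][j] for i in range(n)) for j in range(n + 1)])

    l_c = [[0.0] * n for _ in range(n + 1)]
    for k in range(n):
        l_c[k][k] = 1.0
        pivot = f[k][k]
        # Every row below the pivot, including checksum row n, is eliminated.
        for i in range(k + 1, n + 1):
            m = f[i][k] / pivot
            l_c[i][k] = m
            f[i][k] = 0.0
            for j in range(k + 1, n + 1):
                f[i][j] -= m * f[k][j]
    u_r = [row[:] for row in f[:n]]
    return l_c, u_r


def checksum_errors(l_c: Matrix, u_r: Matrix,
                    tol: float = 1e-9) -> Tuple[List[int], List[int]]:
    """Return (columns of L, rows of U) whose checksums are inconsistent.

    An absolute tolerance is a simplification; real schemes scale the
    threshold to the expected roundoff.
    """
    n = len(u_r)
    bad_cols = [k for k in range(n)
                if abs(l_c[n][k] - sum(l_c[i][k] for i in range(n))) > tol]
    bad_rows = [i for i in range(n)
                if abs(u_r[i][n] - sum(u_r[i][:n])) > tol]
    return bad_cols, bad_rows


if __name__ == "__main__":
    a = [[2, 1, 1], [4, 3, 3], [8, 7, 9]]
    l_c, u_r = checksum_lu(a)
    print(checksum_errors(l_c, u_r))  # ([], []): checksums consistent
    u_r[1][2] += 0.5                  # simulate a transient error in U
    print(checksum_errors(l_c, u_r))  # ([], [1]): row 1 of U flagged
```

For this A, L = [[1, 0, 0], [2, 1, 0], [4, 3, 1]] and the appended checksum row of L is [7, 4, 1], its column sums. A mismatch flags the affected column of L or row of U; it does not by itself repair factors that were already computed from corrupted data, which is where rollback or updating comes in.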

References

[1] J. A. Abraham, "Fault tolerant techniques for highly parallel signal processing architectures," in Highly Parallel Signal Processing Architectures, Proc. SPIE, K. Bromley, Ed., vol. 614, 1986, pp. 49-65.
[2] H. M. Ahmed, J.-M. Delosme, and M. Morf, "Highly concurrent computing structures for matrix arithmetic and signal processing," Computer, vol. 15, pp. 65-82, Jan. 1982.
[3] A. Bojanczyk, R. P. Brent, and H. T. Kung, "Numerically stable solution of dense systems of linear equations using mesh-connected processors," SIAM J. Sci. Statist. Comput., vol. 5, pp. 95-104, 1984.
[4] W. M. Gentleman and H. T. Kung, "Matrix triangularization by systolic arrays," in Real Time Signal Processing IV, Proc. SPIE, T. F. Tao, Ed., vol. 298, 1981, pp. 19-26.
[5] P. E. Gill, G. H. Golub, W. Murray, and M. A. Saunders, "Methods for modifying matrix factorizations," Math. Comput., vol. 28, pp. 505-535, 1974.
[6] K.-H. Huang and J. A. Abraham, "Algorithm-based fault tolerance for matrix operations," IEEE Trans. Comput., vol. C-33, pp. 518-528, 1984.
[7] J.-Y. Jou and J. A. Abraham, "Fault-tolerant matrix arithmetic and signal processing on highly concurrent computing structures," Proc. IEEE, vol. 74, pp. 732-741, 1986.
[8] H. T. Kung and M. S. Lam, "Fault-tolerant VLSI systolic arrays and two-level pipelining," in Real Time Signal Processing VI, Proc. SPIE, K. Bromley, Ed., vol. 431, 1983, pp. 143-158.
[9] H. T. Kung and C. E. Leiserson, "Algorithms for VLSI processor arrays," in Introduction to VLSI Systems, C. Mead and L. Conway, Eds. Reading, MA: Addison-Wesley, 1980, pp. 271-292.
[10] F. T. Luk, "Algorithm-based fault tolerance for parallel matrix equation solvers," in Real Time Signal Processing VIII, Proc. SPIE, W. J. Miceli and K. Bromley, Eds., vol. 564, 1985, pp. 49-53.
[11] F. T. Luk, "A rotation method for computing the QR-decomposition," SIAM J. Sci. Statist. Comput., vol. 7, pp. 452-459, 1986.
[12] F. T. Luk and H. Park, "Analysis of algorithm-based fault tolerance techniques," J. Parallel Distrib. Comput., vol. 5, pp. 172-184, 1988.
[13] J. M. Ortega, Numerical Analysis, A Second Course. New York: Academic, 1972.
[14] D. C. Sorensen, "Analysis of pairwise pivoting in Gaussian elimination," IEEE Trans. Comput., vol. C-34, pp. 274-278, 1985.
[15] J. M. Speiser and H. J. Whitehouse, "A review of signal processing with systolic arrays," in Real Time Signal Processing VI, Proc. SPIE, K. Bromley, Ed., vol. 431, 1983, pp. 2-6.
[16] J. H. Wilkinson, The Algebraic Eigenvalue Problem. London, England: Oxford University Press, 1965.

Published In

IEEE Transactions on Computers, Volume 37, Issue 11
November 1988
174 pages

Publisher

IEEE Computer Society

United States

Author Tags

  1. Gaussian elimination
  2. LU decomposition
  3. QR decomposition
  4. cellular arrays
  5. checksum methods
  6. computation rollback
  7. fault tolerant computing
  8. matrix algebra
  9. matrix triangularizations
  10. matrix updating techniques
  11. multiprocessor arrays
  12. pairwise pivoting
  13. parallel algorithms
  14. parallel architectures
  15. systolic arrays
  16. transient error
  17. weighted checksum

Cited By

  • (2022) Software approaches for resilience of high performance computing systems: a survey. Frontiers of Computer Science: Selected Publications from Chinese Universities, vol. 17, no. 4. DOI: 10.1007/s11704-022-2096-3. Online publication date: 12-Dec-2022.
  • (2017) Silent Data Corruption Resilient Two-sided Matrix Factorizations. ACM SIGPLAN Notices, vol. 52, no. 8, pp. 415-427. DOI: 10.1145/3155284.3018750. Online publication date: 26-Jan-2017.
  • (2017) Silent Data Corruption Resilient Two-sided Matrix Factorizations. Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 415-427. DOI: 10.1145/3018743.3018750. Online publication date: 26-Jan-2017.
  • (2015) Resilient Matrix Multiplication of Hierarchical Semi-Separable Matrices. Proceedings of the 5th Workshop on Fault Tolerance for HPC at eXtreme Scale, pp. 19-26. DOI: 10.1145/2751504.2751507. Online publication date: 15-Jun-2015.
  • (2013) Parallel reduction to Hessenberg form with algorithm-based fault tolerance. Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, pp. 1-11. DOI: 10.1145/2503210.2503249. Online publication date: 17-Nov-2013.
  • (2011) Soft error resilient QR factorization for hybrid system with GPGPU. Proceedings of the second workshop on Scalable algorithms for large-scale systems, pp. 11-14. DOI: 10.1145/2133173.2133179. Online publication date: 14-Nov-2011.
  • (2001) An Efficient Algorithm-Based Fault Tolerance Design Using the Weighted Data-Check Relationship. IEEE Transactions on Computers, vol. 50, no. 4, pp. 371-383. DOI: 10.1109/12.919281. Online publication date: 1-Apr-2001.
  • (1998) Fault Tolerant Faddeeva Algorithm. Journal of Parallel and Distributed Computing, vol. 53, no. 1, pp. 78-89. DOI: 10.1006/jpdc.1998.1474. Online publication date: 25-Aug-1998.
  • (1997) Extending Backward Error Assertions to Tolerance of Large Errors in Floating Point Computations. IEEE Transactions on Computers, vol. 46, no. 4, pp. 505-510. DOI: 10.1109/12.588072. Online publication date: 1-Apr-1997.
  • (1996) New Encoding/Decoding Methods for Designing Fault-Tolerant Matrix Operations. IEEE Transactions on Parallel and Distributed Systems, vol. 7, no. 9, pp. 931-938. DOI: 10.1109/71.536937. Online publication date: 1-Sep-1996.