Fault-tolerant computer system design
Publisher: Prentice-Hall, Inc., Division of Simon and Schuster, One Lake Street, Upper Saddle River, NJ, United States
ISBN: 978-0-13-057887-7
Published: 01 February 1996
Pages: 536
Contributors
  • University of Bristol

Reviews

Claudiu Bulaceanu

Now that designing a functional system is routine, emphasis is placed on designing mission-critical systems with enhanced reliability and a high degree of safety. This textbook covers the architecture and design of fault-tolerant and high-availability systems, from both the theoretical and the practical points of view. The book is divided into eight parts.

Part 1 comprises four chapters. The first is an introduction. The second deals with redundancy techniques for hardware, software, and time. Chapter 3 treats evaluation techniques, and Chapter 4 deals with design methodologies.

Part 2, comprising seven chapters, discusses the architecture of fault-tolerant computers. The first chapter is an introduction to fault-tolerant architectures. The second treats the categorization of applications and related systems. The third is dedicated to several well-known computer architectures, such as IBM and VAX. The fourth briefly analyzes several fault-tolerant implementations, such as those from AT&T, Tandem, Stratus, and VAXft. The fifth discusses examples of long-life systems, like the ones used in space for Voyager and Galileo. The sixth highlights the features and characteristics of critical systems, like the one used for the space shuttle. The seventh is a summary of the part.

Part 3, consisting of 13 chapters, presents the principles of fault-tolerant multiprocessor and distributed systems. After an introduction and a review of types of parallel processing, interconnection topology, and programming models, the next chapters deal with two types of fault tolerance: static redundancy and dynamic redundancy. Other chapters deal with fault detection, recovery strategies, recovery techniques and schemes (such as rollback and forward recovery), and reconfiguration in multiprocessors.

Part 4 deals with several case studies of fault-tolerant multiprocessors and distributed systems. It contains ten chapters that cover several well-known implementations, such as the Tandem systems, the Stratus XA/R 300 systems, the Sequoia systems, the Byzantine resilience model, and the space shuttle system.

The six chapters of Part 5 are dedicated to experimental analysis of system dependability. Statistical techniques, the design phase, and the prototype phase are presented, and the operational phase is discussed at length. Part 6 includes five chapters and covers reliability estimation, including element and system reliability, behavioral decomposition, and examples.

Part 7, comprising seven chapters, focuses on fault tolerance in software. It covers the motivation for this area of study, how to deal with faulty programs, fault tolerance in software, reliability models for software, acceptance tests, exception handling, and a case study of an extended distributed recovery block. The final part comprises four chapters and is dedicated to system diagnosis. The different flavors of diagnosis are discussed, including diagnosis under probabilistic models and under bounded models.

This is a classroom textbook intended for advanced undergraduate and graduate students in computer science or a related field of study; IT researchers who want to complete their knowledge of systems reliability; designers of reliable computer systems; and IT project managers. The level and structure of the text, as well as its approach to the fault-tolerant systems field, are typical for the university environment. Each part ends with suitable exercises.

The best part of the book is its comprehensive and clear explanation of a field that is not yet well established and has many things still to uncover. The real-life examples and theoretical mathematical support are additional outstanding features. On the down side, the book would have been improved by following at least one example of the design of a fault-tolerant system from beginning to end, passing through all the phases outlined in the book.
