Software dependability in the operational phase

November 1995

Author:
Inhwan Lee
Univ. of Illinois at Urbana-Champaign

Publisher:

University of Illinois at Urbana-Champaign
Champaign, IL
United States

Order Number: UMI Order No. GAX95-12450

Abstract

Software quality should be built in and maintained throughout the software life cycle, which requires an understanding of software dependability in actual environments. This thesis discusses how to develop analysis techniques for evaluating the dependability of operational software using real measurements while taking design issues into account. The issues addressed include fault categorization and characterization of error propagation, symptom-based diagnosis of recurrent software failures, identification of software fault tolerance, evaluation of the impact of software faults on the overall system, and the development of techniques for analyzing multiway failure dependencies among software and hardware modules. The process is illustrated using a case study of the Tandem GUARDIAN operating system.

The use of process pairs in Tandem systems, originally intended for tolerating hardware faults, allows the system to tolerate about 70% of reported faults in the system software that cause processor failures. The loose coupling between processors, which results in the backup execution (the processor state and the sequence of events) being different from the original execution, is a major reason for the measured software fault tolerance. About 72% of reported field software failures in Tandem systems are recurrences of previously reported faults. In addition to the conventional approach of reducing the number of faults in software, software dependability in Tandem systems can be enhanced by reducing the recurrence rate and by improving the robustness of process pairs and the system configuration. An approach for automatically diagnosing recurrences based on their symptoms is developed. Evaluations of the effectiveness of the approach show that between 75% and 95% of recurrences can be successfully identified by matching failure symptoms, such as stack traces and problem detection locations. Fewer than 10% of faults are misdiagnosed.

Contributors

Inhwan Lee
Hanyang University

Index Terms

Software dependability in the operational phase
1. General and reference
  1. Cross-computing tools and techniques
    1. Reliability
2. Software and its engineering
  1. Software organization and properties
    1. Extra-functional properties
      1. Software reliability

Recommendations

Experimental study of software dependability

Emulation of Transient Software Faults for Dependability Assessment: A Case Study
EDCC '10: Proceedings of the 2010 European Dependable Computing Conference

Fault Tolerance Mechanisms (FTMs) are extensively used in software systems to counteract software faults, in particular against faults that manifest transiently, namely Mandelbugs. In this scenario, Software Fault Injection (SFI) plays a key role for ...

Software Dependability in the Tandem GUARDIAN System

Based on extensive field failure data for Tandem's GUARDIAN operating system, this paper discusses evaluation of the dependability of operational software. Software faults considered are major defects that result in processor failures and invoke backup ...
