Comparison Modeling of System Reliability for Future NASA Projects
Amanda M. Gillespie, ASQ CRE, SAIC
Mark W. Monaghan, Ph.D, SAIC
Yuan Chen, Ph.D, NASA LaRC
Key Words: RBD, Importance Measure, Cut Set, Fussell-Vesely, Comparison Modeling
SUMMARY & CONCLUSIONS
A National Aeronautics and Space Administration (NASA) supported Reliability, Maintainability, and Availability (RMA) analysis team developed an RMA analysis methodology that uses cut set and importance measure analyses to model and compare proposed avionics computing architectures. In this paper we present an effective and efficient application of this RMA analysis methodology, which includes Reliability Block Diagram (RBD) analysis, comparison modeling, cut set analysis, and importance measure analysis. We also demonstrate that integrating RMA early in the system design process provides a key and fundamental decision metric that supports design selection.
The RMA analysis methodology presented in this paper and applied to the avionics architectures enhances the usual way of predicting the need for redundancy based on failure rates or subject matter expert opinion. Typically, RBDs and minimal cut sets, along with the Fussell-Vesely (FV) method, are used to calculate importance measures for each functional element in the architecture [1]. This paper presents an application of the FV importance measure as a methodology for using importance measures in success space to compare architectures. These importance measures are used to identify which functional element is most likely to cause a system failure, thus quickly identifying the path to increase overall system reliability, either by procuring more reliable functional elements or by adding redundancy [2].
This methodology, combining RBD analysis, cut set analysis, and the FV importance measure, allowed the avionics design team to better understand and compare the vulnerabilities in each scenario of the architectures. It also enabled the design team to address deficiencies in the design architectures more efficiently, while balancing the need to design for optimum weight and space allocations.
1 INTRODUCTION
A trade study was performed to evaluate various avionics
computing architectures from the perspectives of reliability,
mass, power, data integrity, software implementation, and
hardware and software integration for future NASA programs.
A set of RBD models were developed to analyze the reliability
of and rank the various computing system architectures.
These reliability analysis modules allowed for ease and
consistency in calculating reliability, cut sets, and importance
measures.
First, the RBD modules were created and the reliability of
each architecture was calculated. Second, cut set analyses
were performed to determine functional elements most likely
to fail in the architecture (i.e., which functional elements had
the largest unreliability). Third, FV importance measures
were calculated for each functional element in each of the
architectures.
Last, identical functional elements were grouped to allow for comparison between the architectures and to provide an understanding of which functional elements had the most significant impact on system reliability.
2 SCOPE
This paper describes the reliability engineering methodology developed for the RBD comparison, cut set analysis, importance analysis, and improvement recommendations for the architectures for future NASA launch vehicles.
3 ASSUMPTIONS
To ensure that the RBD modules for each scenario of the
architecture were comparable, repeatable, and auditable,
assumptions were documented, which included functional
element failure rates, fault tolerance, mission duration, and
cable and connector details.
3.1 Functional Element Failure Rates
The failure rates for the functional elements in the architectures were estimated based on existing avionics system reliability databases. To facilitate comparison, the same failure rates were assumed for the same functional elements, interconnects, and topologies across the various architectures. All functional elements were assumed to follow an exponential failure distribution: not only is this the most common approach, but the exponential distribution also models the useful-life period, in which failures occur randomly. This useful-life period was the focus of the avionics architecture trade study.
Due to the proprietary nature of these failure rates, they
will not be listed in this paper. The functional element failure
rates were based on existing avionics architectures used in
aircraft, and a failure rate environmental conversion was made
to the Spaceflight (SF) environment. The conversions were
made in accordance with the System Reliability Center (SRC)
environmental matrix [3].
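For illustration, the following minimal Python sketch shows the arithmetic implied by these assumptions. The failure rate and the airborne-to-SF conversion factor are hypothetical placeholders, since the actual rates are proprietary and the SRC matrix values are not reproduced here.

```python
import math

# Hypothetical airborne failure rate in failures/hour; the actual rates
# used in the study are proprietary and are not reproduced in this paper.
lambda_airborne = 25.0e-6

# Hypothetical airborne-to-Spaceflight (SF) conversion factor; the real
# factor would come from the SRC environmental matrix [3].
k_sf = 1.2
lambda_sf = lambda_airborne * k_sf

def reliability(lam: float, t_hours: float) -> float:
    """Exponential (useful-life) reliability model: R(t) = exp(-lambda * t)."""
    return math.exp(-lam * t_hours)

t_mission = 6480.0  # nine-month mission duration in hours (Section 3.4)
print(f"R({t_mission:.0f} h) = {reliability(lambda_sf, t_mission):.6f}")
```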
3.2 Fault Tolerance
Avionics systems have evolved over time to incorporate
fault tolerance within the system architecture. The capability
to survive a functional element fault has driven the design of
multiple avionics system configurations [4]. In the Delta
family of launch vehicles, the avionics systems have evolved
over time based on reliability and fault tolerance [5]. Several
examples of various avionic system architectures were
considered. The NASA avionic architecture team selected the
systems to be evaluated based on fault tolerance capability.
The reliability analysis for the selected avionics systems assumed one-fault tolerance for each functional element, i.e., more than one failure in any single functional element type was deemed a system failure. The configuration of the functional elements provided the fault tolerance capability. For example, the failure of more than one of the three Inertial Navigation Units (INU) would result in a system failure. In addition, only hard, non-recoverable failures of the functional elements were considered in the analysis. The impact of a functional element operating in a degraded state was not taken into consideration.
For a triplex voter (TV), the functional unit configuration was assumed to consist of three functional units plus majority-voting logic, with the functional elements in a parallel configuration for reliability calculations (2-of-3 in agreement for success). Figure 1 shows the RBD configuration for a TV.
For a self-checking pair (SCP) configuration, it was assumed that the self-checking pair consists of two flight computers, two switches, or two buses, which needed to have data agreement to be successful. Therefore, the SCP functional elements were in a series configuration for reliability calculations (2-of-2 in agreement for success). Figure 2 shows the RBD configuration for an SCP.
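For illustration, a minimal Python sketch of the two redundancy calculations, assuming independent, identical elements: the TV succeeds when at least two of its three elements agree (parallel, 2-of-3), while the SCP requires agreement of both members (series, 2-of-2). The failure rate used is a hypothetical placeholder.

```python
import math

def r_element(lam: float, t: float) -> float:
    """Exponential reliability of a single functional element."""
    return math.exp(-lam * t)

def r_triplex_voter(r: float) -> float:
    """TV: 2-of-3 majority voting, a parallel configuration.
    Success = all three elements work, or exactly two work."""
    return r**3 + 3 * r**2 * (1 - r)   # equivalently 3r^2 - 2r^3

def r_self_checking_pair(r: float) -> float:
    """SCP: 2-of-2 data agreement, a series configuration."""
    return r**2

# Hypothetical failure rate over the nine-month (6,480 h) mission.
r = r_element(25.0e-6, 6480.0)
print(f"single element: {r:.4f}")
print(f"TV  (2-of-3):   {r_triplex_voter(r):.4f}")
print(f"SCP (2-of-2):   {r_self_checking_pair(r):.4f}")
```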
Figure 1 – RBD Example of TV and Fully-Cross Strapped Architecture

Figure 2 – RBD Example of SCP and Channelized Architecture

3.3 Channelized and Fully-Cross Strapped Configurations

A fully-cross strapped configuration was assumed to be one in which all functional elements could share data, i.e., FC-1 could share data with all switches, and all switches could share data with all instrumentation/sensors and effectors/actuators. Figure 1 shows the RBD configuration of the fully-cross strapped architecture.

A channelized configuration was assumed to be one in which only functional elements of the same channel could share data, i.e., Flight Computer-1 (FC-1) shared data only with Switch-1 (SW-1), and SW-1 could share data only with Data Acquisition Unit-1 (DAU-1), Main Propulsion System-1 (MPS-1), etc. Figure 2 shows the RBD configuration of the channelized architecture.

3.4 Mission Duration

The reliability analysis was performed for a time period of up to nine months (6,480 hours) for mission scenarios that could potentially require an Earth departure stage and a long mission duration.

3.5 Cables and Cable Connectors (C&C)

The different architectures were modeled both with and without cabling as part of the RBD analysis. Cabling was modeled in order to identify any difference that arises when cabling is included in the models. The cable inputs to the functional elements were modeled as having their own individual failure properties. Each cabling assembly RBD “block” was assumed to comprise the cable, supports, and connectors, and each cable assembly was assumed to be routed individually from its source to its destination.
4 RELIABILITY BLOCK DIAGRAMS (RBD)
An RBD provides a pictorial representation of the architecture's reliability performance. The RBD demonstrates the logical connection of functional elements needed for system success. The RBD does not identify the avionics system topology, but rather the functional elements' logical connections [6]. The particular assemblies identified by the RBD blocks represent system operational functions.
The RBD model does not demonstrate physical configuration, cannot predict mass, does not estimate power consumption, and cannot guarantee that the calculated reliability values are achievable. However, when RBDs of different architectures or systems share the same assumptions (failure rates, fault tolerance, etc.), they support an order-of-magnitude ranking of the architectures, from which the design engineer can determine the most reliable system architecture [7].
The architecture RBD calculations take into account the objectives and related engineering-defined aspects of each system configuration from an assessment of operational success. The RBD is assembled as a success path for the system. A series representation indicates a system whose success depends upon the success of each block. Parallel block configurations indicate a group of blocks that provide active or standby redundancy.
PTC Windchill Quality Solutions (formerly RELEX) was used as the primary reliability modeling tool. The various architectures were modeled as different RBD configurations using the failure rates described in Section 3.1. Because the same failure rates were used, any variance in results is due solely to the RBD configurations, which capture the differences between the architectures. This allows for a normalized comparison of the architectures.
Within the RELEX software, the Optimization Simulation (OpSim) module was used to depict the RBDs. Results were calculated both analytically and by Monte-Carlo simulation with 1,000,000 iterations. For the Monte-Carlo simulations, the confidence level was set at 95%.
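The Windchill/RELEX internals are not reproduced here, but the following minimal Python sketch illustrates the same analytical versus Monte-Carlo cross-check for a simple stand-in system (two triplex-voter groups in series), under the study's exponential-failure assumption. The failure rate is a hypothetical placeholder.

```python
import math
import random

LAM = 25.0e-6      # hypothetical element failure rate (failures/hour)
T = 6480.0         # mission time in hours
N = 1_000_000      # Monte-Carlo iterations, matching the study setup

r = math.exp(-LAM * T)          # single-element reliability at time T

# Analytical result: two 2-of-3 triplex-voter groups in series.
r_tv = 3 * r**2 - 2 * r**3
r_analytical = r_tv**2

# Monte-Carlo result: sample element states and apply the same success logic.
random.seed(1)
successes = 0
for _ in range(N):
    if all(sum(random.random() < r for _ in range(3)) >= 2 for _ in range(2)):
        successes += 1
r_monte_carlo = successes / N

print(f"analytical  = {r_analytical:.6f}")
print(f"monte-carlo = {r_monte_carlo:.6f}")
```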
5 RBD ANALYSIS AND RELIABILITY RESULTS
Listed below are the various avionics architectures that were evaluated. Following the list, the quantity of functional elements, the redundancy configuration for each architecture, and the architecture's fault tolerance are provided.
1. Fully Cross-Strapped Switched Triplex Voter (FCSSTV)
2. Partially Cross-Strapped Switched Triplex Voter (PCSSTV)
3. Channelized Bussed Triplex Voter (CBTV)
4. Fully Cross-Strapped Self-Checking (FCSSC)
5. Fully Cross-Strapped Bussed Self-Checking (FCSBSC)
6. Channelized Bussed Self-Checking (CBSC)
5.1 Architecture RBD Reliability Results Summary

Figure 3 and Table 1 show the reliability results for the six architectures. The reliability results are calculated at nine months (6,480 hours) and include the failure contribution from cabling. These reliability results were considered a best estimate of each architecture's reliability. The results may not reflect the actual operational reliability; however, they allow a relative ranking of the architectures to determine whether one is significantly better than the others. Based on these results, two of the architectures (CBTV and CBSC) were eliminated. However, additional RBD comparison data was needed to ensure that the vulnerabilities of each architecture were understood.

Figure 3: Architecture Reliability Comparison Results: Reliability versus Mission Elapsed Time (MET)

Architecture   R (6,480 hrs)
FCSSTV         0.666999
PCSSTV         0.613596
CBTV           0.464581
FCSSC          0.648547
FCSBSC         0.646730
CBSC           0.389427

Table 1: Architecture Reliability Results
6 CUT SET ANALYSIS
Cut set analysis provides a clear indication of where the most likely failure paths lie, subject to the accuracy of the RBD and of the failure data for the functional elements. A cut set is a set of basic events [failures] whose joint occurrence results in the failure of the system [2]. Each cut set can contain anywhere from one to all of the functional elements, depending on the system architecture. A minimal cut set is defined as a set that “cannot be reduced without losing its status as a cut set” [8].

Once the minimal cut sets were identified, a comparison was made to determine whether the system with the lowest reliability contained the most failure paths, i.e., whether a recommendation for the most reliable architecture could be made based on the number of minimal failure paths. It was determined that, for the architectures in this study, the more cut sets an architecture had, the less reliable it generally tended to be. However, as the differences in reliability between the architectures became smaller, a conclusion as to the most reliable architecture could not be drawn from the number of cut sets alone. This was due to the differences in the number of functional elements and their configuration in each of the architectures.

Table 2 shows the architectures ranked from most reliable to least reliable and the number of minimal cut sets calculated for each. This led the authors to conclude that, for the architectures compared in this study, the number of cut sets had no inherent tie to system reliability; rather, it was more a function of the modeling detail and the number of components in each architecture.

Architecture   R (6,480 hrs)   # of Minimal Cut Sets
FCSSTV         0.666999         75
FCSSC          0.648547         67
FCSBSC         0.646730         73
PCSSTV         0.613596        195
CBTV           0.464581        267
CBSC           0.389427        304

Table 2: Architecture Ranking and Number of Minimal Cut Sets
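For illustration, a minimal Python sketch of how minimal cut sets translate into quantitative results under the rare-event approximation: each cut set's unreliability is the product of its members' unreliabilities, and the system unreliability is approximated by their sum. The cut sets and failure rates shown are illustrative placeholders, not those of the studied architectures.

```python
import math

T = 6480.0  # mission time in hours

# Hypothetical failure rates (failures/hour); actual values are proprietary.
rates = {"FC-1": 25e-6, "FC-2": 25e-6, "FC-3": 25e-6,
         "SW-1": 5e-6, "SW-2": 5e-6}

def q(name: str) -> float:
    """Element unreliability at time T under the exponential model."""
    return 1.0 - math.exp(-rates[name] * T)

# Illustrative minimal cut sets: any two of the three FCs failing, or both
# switches failing, fails this hypothetical one-fault-tolerant system.
cut_sets = [["FC-1", "FC-2"], ["FC-1", "FC-3"], ["FC-2", "FC-3"],
            ["SW-1", "SW-2"]]

def q_cut_set(cs):
    """Joint failure probability of all members (assumed independent)."""
    prod = 1.0
    for name in cs:
        prod *= q(name)
    return prod

# Rare-event approximation: system unreliability ~ sum over minimal cut sets.
q_system = sum(q_cut_set(cs) for cs in cut_sets)
print(f"approximate system unreliability at {T:.0f} h = {q_system:.6f}")
```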
7 IMPORTANCE MEASURES IN SUCCESS SPACE
Part of the decision analysis in selecting a specific architecture includes determining which of the functional elements can lead to high-risk scenarios. In order to assess the importance of functional elements in the architecture, or the sensitivity of the architecture reliability to changes in the functional elements' input failure rates, several importance (or sensitivity) measures are available [2]. “Importance measures quantify the criticality of a particular functional element within a system design. They have been widely used as tools for identifying system weakness, and to prioritize reliability improvement activities.” [9]
The various measures are based on slightly different
interpretations of the concept of functional element
importance. Intuitively, the functional element importance
should depend on the location of the functional element in the
system, the reliability of the functional element in question,
and the uncertainty in the estimate of functional element
reliability [10].
For this study, the individual functional element
importance measures were divided by the sum of all
importance measures to determine which element represented
the most significant contribution to architecture reliability.
The importance measure reflects how much relative improvement may be available from improving the performance of a specific functional element. Changing the failure rates of the functional elements with the highest importance measure percent contribution (or adding redundancy to compensate for a high failure rate) will have the most significant effect on increasing system reliability.
Typically, importance measures are used in failure space,
or for Fault Tree Analysis (FTA). However, these importance
factors can be defined in success space (RBD Analysis) by
calculating the measures based on the total success of the
system instead of the total risk. Some importance factors do
not preserve their meaning in success space; therefore, they
fail to rank the functional elements appropriately. However, all importance measures provide a single number for each functional element that can be used as part of a comparative analysis.
There are five importance measures generally accepted for use: Birnbaum (BM), Criticality, Fussell-Vesely (FV), Risk Reduction Worth (RRW), and Risk Achievement Worth (RAW) [11]. The FV is the fractional contribution of the event to the overall architecture reliability. The RAW is the decrease in the architecture reliability if the functional element has always failed. The RRW is the increase in the architecture reliability if the functional element never fails. The BM is the rate of change in the architecture reliability resulting from a change in the reliability of a given event. The BM can also be expressed as BM = RAW + RRW [12]. The Criticality measure is a weighted version of the BM, which takes into account the ratio of functional element failure probability to system reliability.
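For illustration, these success-space definitions can be computed by conditioning, as in the following sketch for a single, hypothetical 2-of-3 group: RRW(i) = R(element i never fails) - R, RAW(i) = R - R(element i has always failed), and BM(i) = RAW(i) + RRW(i), as noted above. The Criticality weighting shown is one common form of the weighting and is an assumption, not the study's exact formula.

```python
def system_r(r):
    """Success probability of a 2-of-3 majority-voting group of
    three independent elements with reliabilities r[0..2]."""
    r1, r2, r3 = r
    return r1*r2 + r1*r3 + r2*r3 - 2*r1*r2*r3

base = [0.90, 0.85, 0.80]          # hypothetical element reliabilities
R = system_r(base)

for i, name in enumerate(["A", "B", "C"]):
    perfect = list(base); perfect[i] = 1.0   # element i never fails
    failed = list(base); failed[i] = 0.0     # element i has always failed
    rrw = system_r(perfect) - R   # RRW: gain if the element never fails
    raw = R - system_r(failed)    # RAW: loss if the element always failed
    bm = raw + rrw                # BM = RAW + RRW [12]
    # Criticality: BM weighted by element unreliability over system
    # unreliability (one common form of the weighting).
    crit = bm * (1.0 - base[i]) / (1.0 - R)
    print(f"{name}: RRW={rrw:.4f}  RAW={raw:.4f}  BM={bm:.4f}  Crit={crit:.4f}")
```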
Due to limitations of the RELEX software in calculating importance measures in success space, all importance measures had to be calculated by hand. The BM, Criticality, RAW, and RRW measures proved significantly time-consuming, as the conditional reliability calculations required multiple runs in RELEX to obtain values (20 or more runs per architecture). However, RELEX quickly provided the minimal cut sets and their corresponding unreliability values, which could then be used to calculate the FV for all functional elements.
Figure 4 shows the comparison of the BM, Criticality,
FV, RRW, and RAW results for the functional elements in the
FCSSTV architecture.
Figure 4: Comparison of Various Importance Measure
Results
Table 3 details the comparison of the RRW, RAW, and
FV measures for the FCSSTV architecture. The RRW and
RAW were calculated on an interval scale to allow for
comparison of architectures [13]. The FV and RAW measures
yielded very similar results, while the FV measure differed
slightly in comparison to the RRW measure.
Table 4 details the comparison of the BM, Criticality, and
FV measures for the FCSSTV architecture. The RRW, BM,
and Criticality measures yielded nearly identical results and
rankings. The FV and RAW measures differed slightly from
the BM and Criticality measures, but the primary contributor
to the unreliability of the architecture (over 21% of the
unreliability in all three measures) remained the same.
RRW              RAW              FV
FC     21.32%    FC     24.89%    FC     27.36%
INU    14.92%    PIC    20.41%    PIC    18.54%
RGA    14.92%    ECU    14.68%    ECU    13.63%
PIC    11.01%    HCU    13.54%    HCU    12.62%
ECU     9.74%    INU    10.00%    INU    10.74%
HCU     9.45%    RGA    10.00%    RGA    10.74%
DAU     4.72%    DAU     2.61%    DAU     2.54%
SW      4.63%    RCS     1.50%    RCS     1.47%
RCS     3.67%    MPS     1.50%    MPS     1.47%
MPS     3.67%    SW      0.74%    SW      0.76%
TVC     1.94%    TVC     0.12%    TVC     0.12%

Table 3: Comparison of RRW, RAW, and FV Importance Measure Results for FCSSTV Architecture
BM               Criticality      FV
FC     21.93%    FC     21.32%    FC     27.36%
INU    14.07%    INU    14.92%    PIC    18.54%
RGA    14.07%    RGA    14.92%    ECU    13.63%
PIC    12.62%    PIC    11.01%    HCU    12.62%
ECU    10.59%    ECU     9.74%    INU    10.74%
HCU    10.15%    HCU     9.45%    RGA    10.74%
DAU     4.36%    DAU     4.72%    DAU     2.54%
SW      3.97%    SW      4.63%    RCS     1.47%
RCS     3.30%    RCS     3.67%    MPS     1.47%
MPS     3.30%    MPS     3.67%    SW      0.76%
TVC     1.63%    TVC     1.94%    TVC     0.12%

Table 4: Comparison of BM, Criticality, and FV Importance Measure Results for FCSSTV Architecture
Although all of the importance measures provided useful and similar information on the functional elements in success space, when weighing the time needed to perform the importance measure analysis against the benefits to be achieved, the FV was chosen as the most efficient way to obtain quantitative importance measures for each functional element for a comparative analysis.
7.1 Fussell-Vesely (FV) Importance for Functional Elements
The FV is the probability that at least one minimal cut set that contains functional element i has failed at time t, given that the system has failed at time t [10]. In other words, the functional element FV is the sum of the unreliabilities of the minimal cut sets containing that functional element, divided by the sum of the unreliabilities of all of the system's minimal cut sets.
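Restating this verbal definition as a formula (the notation here is ours): with minimal cut sets $C_1, \ldots, C_m$ and $Q_j(t)$ the unreliability of cut set $C_j$ at time $t$,

$$ I_{FV}(i \mid t) = \frac{\sum_{j:\, i \in C_j} Q_j(t)}{\sum_{j=1}^{m} Q_j(t)} $$

where the numerator runs over only those minimal cut sets that contain functional element $i$.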
Once the functional element FV percent contributions were calculated for all functional elements and all architectures, a comparison of functional element importance between the various architectures could be made. The implementation of the FV to compare functional elements within the different architectures proved somewhat more involved than first anticipated. Most architectures contained redundancy, with two or more of the same functional element within each; however, there was not a one-to-one correspondence.
For example, the TV architectures (FCSSTV, PCSSTV, and CBTV) contained three FCs, while the SCP architectures (FCSSC, FCSBSC, and CBSC) contained four FCs. Therefore, the functional elements were grouped for comparison by summing the contributions of all like functional elements in each architecture. Table 5 shows an example of how this was done for the FCs.
Architecture   FC-#     FV %     Grouped FV %
FCSSTV         FC-1      9.12%
               FC-2      9.12%    27.36%
               FC-3      9.12%
PCSSTV         FC-1      6.66%
               FC-2      6.66%    19.98%
               FC-3      6.66%
CBTV           FC-1     15.83%
               FC-2      7.63%    39.29%
               FC-3     15.83%
FCSSC          FC-1A     8.35%
               FC-1B     8.35%
               FC-2A     8.35%    33.39%
               FC-2B     8.35%
FCSBSC         FC-1A     8.51%
               FC-1B     8.51%
               FC-2A     8.51%    34.04%
               FC-2B     8.51%
CBSC           FC-1A    11.08%
               FC-1B    11.08%
               FC-2A    11.08%    44.33%
               FC-2B    11.08%

Table 5: Example of Functional Element Grouping for Importance Measure Comparison
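For illustration, the grouping step can be sketched in Python as follows, assuming the per-element FV percentages have already been computed; the values shown mirror Table 5 for two of the architectures. (Grouped sums may differ from the table in the last digit because the per-element values are rounded.)

```python
from collections import defaultdict

# Per-element FV percent contributions, keyed by (architecture, element);
# values taken from Table 5 for two of the architectures.
fv = {
    ("FCSSTV", "FC-1"): 9.12, ("FCSSTV", "FC-2"): 9.12, ("FCSSTV", "FC-3"): 9.12,
    ("FCSSC", "FC-1A"): 8.35, ("FCSSC", "FC-1B"): 8.35,
    ("FCSSC", "FC-2A"): 8.35, ("FCSSC", "FC-2B"): 8.35,
}

def function_of(element: str) -> str:
    """Map an element instance (e.g., 'FC-1A') to its function group ('FC')."""
    return element.split("-")[0]

# Sum the contributions of all like functional elements per architecture.
grouped = defaultdict(float)
for (arch, elem), pct in fv.items():
    grouped[(arch, function_of(elem))] += pct

for (arch, func), pct in sorted(grouped.items()):
    print(f"{arch}: {func} = {pct:.2f}%")
```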
Once all functional element functions were grouped, a few functions could not be compared. For example, CBTV and CBSC had bus functional element contributions, while the other architectures did not. Overall, though, this approach worked to determine which functional elements were most likely to cause a failure. The results are shown in Figure 5.
Figure 5: Functional Element FV Calculations per Architecture
The functional element importance measures provide the designer with more information than a single reliability calculation comparison may provide. Architecture selection is based not only on reliability calculations, but also on weight, space, cost, risk to the mission, etc., and the additional importance measure calculations provide comparison data that allows the designer to make more informed trade decisions in a more efficient and effective manner. The differences in the architectures' vulnerabilities can easily be compared using the importance measure analysis.
Different distributions of the functional elements’ failure
contributions indicate different reliability improvement paths
for the architectures. Intermediate states can be modeled to
include the impact of failure data integrity on the reliability
and integrity of the architectures. These results can be used to
determine the most efficient and cost-effective way to increase
reliability of the architecture.
REFERENCES
1. W.S. Gough, J. Riley, J.M. Koren, “A New Approach to the Analysis of Reliability Block Diagrams,” Proc. Ann. Reliability & Maintainability Symp., (Jan.) 1990, pp. 456-464.
2. B.M. Ayyub, Risk Analysis in Engineering and Economics, New York, Chapman & Hall/CRC, 2003.
3. D. Nicholls, ed., System Reliability Toolkit, Reliability Information Analysis Center (RIAC), 2005.
4. R. Hammett, “Design by Extrapolation: An Evaluation of Fault Tolerant Avionics,” IEEE AESS Systems Magazine, (April) 2002.
5. J.M. Palsulich, B.J. Schinnerer, Launch Vehicles for Low-Cost Planetary Missions, Elsevier Science Ltd., 2002.
6. P.D.T. O’Connor, Practical Reliability Engineering, 4th Edition, Wiley, 2006.
7. M.L.O. e Souza, T.R. de Carvalho, The Fault Avoidance and The Fault Tolerance Approaches for Increasing the Reliability of Aerospace and Automotive Systems, Society of Automotive Engineers Inc., 2005.
8. W.R. Blischke, D.N. Prabhakar Murthy, Reliability: Modeling, Prediction, and Optimization, Wiley, 2000.
9. J.E. Ramirez-Marquez, D.W. Coit, “Composite Importance Measures for Multi-State Systems with Multi-State Components,” IEEE Transactions on Reliability, vol. 54, (Sept.) 2005, p. 517.
10. M. Rausand, A. Hoyland, System Reliability Theory: Models, Statistical Methods, and Applications, 2nd Edition, Wiley, 2003.
11. J. Apt, “Human Spaceflight Risk Management,” Encyclopedia of Aerospace Engineering, Wiley, 2010.
12. M. Stamatelatos, W. Vesely, et al., Fault Tree Handbook with Aerospace Applications, NASA Office of Safety and Mission Assurance, August 2002.
13. W.E. Vesely, T.C. Davis, R.S. Denning, N. Saltos, Measures of Risk Importance and Their Applications, Division of Risk Analysis, Office of Nuclear Regulatory Research, NUREG/CR-3385, 1986.
ACKNOWLEDGMENTS
The authors would like to thank Glen (Spence) Hatfield,
Duane H. Pettit, Joseph M. Schuh and Dr. Robert Hodson,
who contributed to the development and review of the RMA
analysis methodology described in this paper.
BIOGRAPHIES
Amanda M. Gillespie, ASQ CRE
SAIC-LX-O3
Operations & Checkout Bldg, M7-0355
Kennedy Space Center, FL 32899 USA
e-mail: amanda.gillespie-1@nasa.gov
Amanda received her BS in Applied Mathematics from the Georgia
Institute of Technology in 2000. Amanda M. Gillespie is a
Reliability Engineer with SAIC at NASA KSC, FL. As a part of the
KSC RMA team, Amanda works with multiple engineering teams to
evaluate and increase the operational and inherent availability of the
systems. Amanda is a member of the American Society for Quality
(ASQ) Reliability and Statistics Societies. Amanda received her
ASQ Certified Reliability Engineer (CRE) certification in January
2011.
Mark W. Monaghan, Ph.D
SAIC-LX-O3
Operations & Checkout Bldg, M7-0355
Kennedy Space Center, FL 32899 USA
e-mail: mark.w.monaghan@nasa.gov
Mark W. Monaghan received his Ph.D in Applied Decision Science
from Walden University in 2008. Mark is a Reliability Engineer with
SAIC at NASA KSC, FL. As a part of the KSC RMA team, Dr.
Monaghan works with multiple engineering teams to evaluate and
increase the operational and inherent availability of the systems. He
is a senior member of the Institute of Electrical and Electronics Engineers (IEEE) Industry Applications Society (IAS). He is also a member of the American Institute of Aeronautics and Astronautics (AIAA) and the ASQ Reliability Society.
Yuan Chen, PhD
NASA LaRC
Electronic Systems Branch
5 N. Dryden Street, MS 488
Hampton, VA 23681 USA
e-mail: yuan.chen@nasa.gov
Yuan Chen received her Ph.D. in Reliability Engineering from the University of Maryland at College Park, Maryland, in 1998, with a Graduate Fellowship from the National Institute of Standards and Technology. She is currently a senior member of the technical staff in the Electronic Systems Branch, NASA Langley Research Center, Hampton, Virginia. Dr. Chen's research has focused on the development of reliability methodologies for microelectronic devices and systems for space applications. She has authored and co-authored over 40 technical papers, and is a senior member of IEEE and AIAA.