Engineering Design under Uncertainty and Health Prognostics
Springer Series in Reliability Engineering
Series editor
Hoang Pham, Piscataway, USA
More information about this series at http://www.springer.com/series/6917
Chao Hu • Byeng D. Youn • Pingfeng Wang

Chao Hu
Department of Mechanical Engineering, Iowa State University, Ames, IA, USA
and
Department of Electrical and Computer Engineering, Iowa State University, Ames, IA, USA

Byeng D. Youn
School of Mechanical and Aerospace Engineering, Seoul National University, Seoul, Korea (Republic of)

Pingfeng Wang
Department of Industrial and Enterprise Systems Engineering, University of Illinois at Urbana–Champaign, Urbana, IL, USA
This Springer imprint is published by the registered company Springer International Publishing AG
part of Springer Nature
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
Chapters 2 and 3 review the fundamentals of probability theory and statistics, which are an
important tool for reliability engineers because they provide both descriptive and
analytical ways to deal with the uncertainty in simulation or experimental data.
The rest of the chapters in this book address the two main challenges in the
practice of life-cycle reliability engineering: (i) how to design a reliable engineered
system during the design stage and (ii) how to achieve high operational reliability
during operation of a system. Reliability-based design with time-independent
reliability analysis tackles the first challenge. Addressing this challenge is the pri-
mary focus of this book. Chapter 4 presents the fundamentals of reliability analysis,
including the concepts and formulations of time-independent and time-dependent
reliability analyses. Chapter 5 introduces state-of-the-art techniques for time-
independent reliability analysis, including the first- and second-order reliability
methods (FORM/SORM), direct and smart Monte Carlo simulation (MCS), and
emerging stochastic response surface methods. Chapter 6 discusses the advanced
topic of reliability analysis: time-dependent reliability analysis in design. Chapter 7
explains how to design a reliable product by introducing reliability-based design
optimization (RBDO).
Health monitoring, diagnostics, prognostics, and management strategies have
been proposed to address the second challenge. These techniques have been
cohesively integrated to the point where a new discipline has emerged: prognostics
and health management (PHM). In recent years, PHM has been successfully applied
to many engineered systems to assess their health conditions in real time under
actual operating conditions and to adaptively enhance life-cycle reliabilities through
condition-based maintenance, which allows for the anticipation and prevention of
unexpected system failures. Chapter 8 discusses the current state-of-the-art tech-
niques from this emerging discipline and Chap. 9 presents successful practices in
several engineering fields.
Each chapter has an extensive collection of examples and exercises, including
engineering and/or mathematical examples that illustrate the material in that
chapter. Supplemental exercises are also provided to allow learners to practice
applying the methods and to improve their understanding of the topics discussed in that chapter.
The authors would like to acknowledge that the work presented in this book was
partially supported by the following organizations: the US National Science
Foundation (NSF), the NSF I/UCRC Center for e-Design, the US Army TARDEC,
the US Nuclear Regulatory Commission (NRC), the Maryland Industrial
Partnerships Program (MIPS), the Korea Institute of Energy Technology Evaluation
and Planning (KETEP), the National Research Foundation (NRF) of Korea, the
Korea Institute of Machinery and Materials, the Korea Agency for Infrastructure
Technology Advancement (KAIA), General Motors, LG Electronics, and Samsung
Electronics.
In the past few decades, reliability has been widely recognized as of great impor-
tance in engineering product design and development. Hence, considerable
advances have been made in the field of reliability-based design optimization
(RBDO), resulting in new techniques for analyzing and improving the reliability of
an engineered system, while taking into account various sources of uncertainty
(e.g., material properties, loads, and geometric tolerances). Additionally, advanced
maintenance strategies have been developed to help ensure systems operate reliably
throughout their lifetime. This chapter introduces basic concepts of reliability,
provides an overview of the history of reliability engineering, presents the way
reliability efforts can be integrated into product design and development, and
provides a framework for the material that will be covered in the subsequent
chapters.
Ever since this initial "apologetic" beginning, reliability has grown into an
omnipresent attribute that plays an essential role in the safe and effective operation
of almost any modern engineered system. The pervasive nature of reliability in the
general public and the academic community cannot be overstated. In fact, a quick
search of the word "reliability" with Google returns over 100 million results
on the web [3].
From 1816 through 2015, the occurrence of several key events and develop-
ments has led to the establishment of reliability engineering as a scientific disci-
pline. This scientific discipline started to be established in the mid-1950s and
subsequently maintained rapid development through tremendous support from a
vast community of academia, industry, and government constituents. Despite
numerous social, cultural, and technological achievements enabled by reliability
engineering, many challenges still await in the future. In what follows, we briefly
review key events and developments in the chronological history of reliability
engineering.
Our review starts before the first use of the word “reliability” and points out the
essential theoretical foundation of reliability, i.e., the theory of probability and
statistics, which has supported the establishment of reliability engineering as a
scientific discipline. The theory was initially developed in the 1600s to address a series
of questions and interests in gaming and gambling by Blaise Pascal and Pierre de
Fermat [3]. In the 1800s, Laplace further expanded the application domain of this
theory into many practical problems. In addition to probability and statistics as the
theoretical enabler for the emergence of reliability engineering, the concept of mass
production for standardized parts (rifle manufacturing by the Springfield armory in
1795 and Ford Model T car production in 1913) also played an essential role as a
practical enabler [3].
Besides these two enablers, the catalyst for the rise of reliability engineering has
been recognized as the vacuum tube, or more specifically, the triode invented by
Lee de Forest in 1906. By initiating the electronic revolution, the vacuum tube led
to a series of applications such as radio, television, and radar. The vacuum tube,
deemed the active element that contributed to the victory of the Allies in
World War II, was also the primary source of equipment failure, failing about
four times as often as all other equipment.
After the war, these failure events motivated the US Department of Defense to
organize investigations into these events, which eventually led to the emergence of
reliability engineering as a scientific discipline in the 1950s. This new discipline
was, for the first time, consolidated and synthesized in the Advisory Group on
Reliability of Electronic Equipment (AGREE) report in 1957. The AGREE was
jointly established in 1952 between the Department of Defense and the American
Electronics Industry for the following missions [4]:
(1) To recommend measures that would result in equipment that is more reliable;
(2) To help implement reliability programs in government and civilian agencies;
(3) To disseminate a better education on reliability.
With the objective to achieve higher reliability, military-funded projects were
launched and a great deal of effort was devoted to failure data collection and root
cause analysis. Furthermore, the specification of quantitative reliability require-
ments emerged as the beginning of a contractual aspect of reliability. These relia-
bility requirements necessitated the development of reliability prediction techniques
to estimate and predict the reliability of a component before it was built and tested.
The milestone of this development was the publication of a major report (TR-1100)
titled “Reliability Stress Assessment for Electronic Equipment” by the Radio
Corporation of America, a major manufacturer of vacuum tubes. Analytical models
for estimating component failure rates were introduced in the report, which then
facilitated the publication of the influential military standard MH-217 in 1961. This
standard is still being used today for reliability prediction [5]. A timeline of the
aforementioned key events that contributed to the emergence of reliability engi-
neering is shown in Fig. 1.1.
In the 1960s, the decade of the first development phase of reliability engineering,
the discipline proceeded along two tracks [3]:
(1) Increased specialization in the discipline, consisting of increased sophistication
of statistical techniques (e.g., redundancy modeling, Bayesian statistics,
Markov chains), the emergence of the discipline of Reliability Physics to
identify and model physical causes of failure, and the development of a separate
subject, Structural Reliability, to assess the structural integrity of buildings,
bridges, and other construction;
Fig. 1.1 Timeline of key events leading to the birth of reliability engineering: the birth of probability and statistics (Pascal and Fermat, 1654; theoretical enabler), the invention of the vacuum tube and the start of the electronic revolution (de Forest, 1906; catalyst), mass production (Ford car, 1913; practical enabler), and the publication of the AGREE report, marking the birth of reliability engineering (AGREE, 1957; consolidator and synthesizer)
(2) Shifting of the focus from component reliability to system reliability to deal
with increasingly complex engineered systems (e.g., ICBMs, the swing-wing
F-111, the US space shuttle).
The 1970s witnessed work in three broad areas that characterized the develop-
ment of reliability engineering:
(1) Increased interest in system reliability and safety of complex engineered sys-
tems (e.g., nuclear power plants [6]);
(2) A new focus on software reliability, due to the ever-increasing reliance on
software in many safety-critical systems [7];
(3) Design of an incentive program, Reliability Improvement Warranty, to foster
improvements in reliability.
During the last three and a half decades, from 1980 to 2015, significant technical
advances and practical applications have been achieved by academia, government,
industry and/or multilateral collaboration of these stakeholders, with an aim of
addressing the challenges posed by the increasing complexity of modern engineered
systems. An increasing number of publications on reliability engineering can be
found in well-known journals, such as IEEE Transactions on Reliability, and
Reliability Engineering & System Safety. These efforts and developments have
enabled reliability engineering to become a well-established, multidisciplinary field
that endeavors to address the following challenging questions [8]:
(1) Why does a system fail? This question is studied by analyzing failure causes
and mechanisms and identifying failure consequences.
(2) How can reliable systems be designed? This is studied by conducting reliability
analysis, testing, and design optimization.
(3) How can high operational reliability be achieved throughout a system’s
life-cycle? This question is studied by developing and implementing health
monitoring, diagnostics, and prognostics systems.
This book attempts to address the second and third questions from the per-
spective of engineering design. Our focus will be placed on reliability-based design
considering various sources of uncertainty, i.e., addressing the second question. The
emerging discipline of prognostics and health management (PHM), which has
received increasing attention from the reliability community, will be briefly dis-
cussed to address the third question.
Fig. 1.2 Overview of product reliability plan with specific tasks. Acronyms are defined as
follows: DOE, Design of Experiment; HALT, Highly Accelerated Life Testing; CAD,
Computer-Aided Design; FEA, Finite Element Analysis; RBD, Reliability Block
Diagram; FTA, Fault Tree Analysis; FMEA, Failure Modes and Effects Analysis; FRACAS,
Failure Report and Corrective Action System; HASS, Highly Accelerated Stress Screening; and
SPC, Statistical Process Control
techniques (e.g., a reliability block diagram [12] and statistical inference [13]). The
reliability model can be extended by incorporating the effects of operating condi-
tions and environmental factors on product reliability. The reliability model not
only provides valuable information about the reliability of a design, but also paves
the way for reliability-based design optimization (RBDO). In RBDO, the tech-
niques of reliability analysis and design optimization are integrated to develop
reliability-based design methodologies that offer probabilistic approaches to engi-
neering design. Reliability-based design attempts to find the optimum design of an
engineered product that minimizes the cost and satisfies a target level of reliability,
while accounting for uncertainty in parameters and design variables [14]. Once
prototype units are built based on the optimal design derived from RBDO, relia-
bility testing can be performed to gain more insights into the reliability of the
design.
Task IV—Performing Reliability Testing: The design of a reliable product
entails a carefully designed and executed test plan. The plan should place an
emphasis on accelerated life testing (ALT) and highly accelerated life testing
(HALT), both of which expose a product to environmental stresses (e.g., temper-
ature, pressure, and loading) above what the product experiences in normal use, in
order to stimulate failure to occur more quickly than in actual use. ALT/HALT
serve as a formal process to measure and/or improve product reliability in a timely
and cost-effective manner. They guarantee a proportional reduction in the time to
market as well as greatly improved reliability [9]. Efforts should be devoted to
planning robust reliability testing that takes into account various operating condi-
tions using the design of experiment (DOE) technique. Reliability testing should be
integrated with reliability-based design (Task III) in two aspects. First, all failures
observed during reliability testing should be investigated and documented using the
failure report and corrective action system (FRACAS), and the actual root cause of
failure should be fed into reliability-based design as design feedback for reliability
improvement. Second, the testing results should be used to assess and improve the
validity of the reliability model developed and used in Task III. Note that, in
addition to reliability testing, field return and warranty data can be an invaluable
source of information in identifying and addressing potential reliability problems to
improve the design.
Task V—Controlling Production Quality: During the manufacturing phase, the
reliability engineer must conduct quality control by incorporating statistical process
control (SPC). If the manufacturing process is not adequately tracked using SPC
techniques, the relatively large uncertainty inherent in the manufacturing process is
likely to cause manufacturing-induced unreliability. To reduce the magnitude of
manufacturing uncertainty, use of the SPC technique is suggested. SPC uses sta-
tistical monitoring to measure and control the uncertainty in the manufacturing
process. Integrating SPC into the manufacturing process requires the following four
steps: (i) selecting key personnel (e.g., the quality control manager) for SPC skill
training; (ii) conducting sensitivity analysis to identify critical process parameters
that significantly affect the product quality; (iii) deploying hardware and software
for SPC; and (iv) investigating the cause thoroughly if a large uncertainty is
observed. The SPC provides the manufacturing team with the ability to identify
specific causes of large variations and to make process improvements for higher
product reliability. In addition, through proper rework on products that are detected
to have defects, the manufacturing line can output a larger number of qualified
products given the same amount of cost. Thus, greater revenue can be achieved.
In the last decade or so, reliability engineering has been extending its domain of
application to sensory-based health monitoring, diagnostics, and prognostics.
A tremendous amount of research effort has been devoted to this extension, and
this stream of research is centered on utilizing sensory signals acquired from an
engineered system to monitor the health condition and predict the remaining useful
life of the system over its operational lifetime. This is treated as Task VI, and is one
of the major focuses of this book.
This health information provides an advance warning of potential failures and a
window of opportunity for implementing measures to avert these failures.
The remainder of this book also covers techniques in this emerging discipline that enable optimal design of sensor networks for fault
detection, effective extraction of health-relevant information from sensory signals,
and robust prediction of remaining useful life.
References
1. Mobley, R. K. (2002). An introduction to predictive maintenance (2nd ed., Chap. 1). New
York, NY: Elsevier.
2. Coleridge, S. T. (1983). Biographia literaria. In J. Engell & W. J. Bate (Eds.), The collected
works of Samuel Taylor Coleridge. New Jersey, USA: Princeton University Press.
3. Saleh, J. H., & Marais, K. (2006). Highlights from the early (and pre-) history of reliability
engineering. Reliability Engineering & System Safety, 91(2), 249–256.
4. Coppola, A. (1984). Reliability engineering of electronic equipment: A historical perspective.
IEEE Transactions on Reliability, 33(1), 29–35.
5. Denson, W. (1998). The history of reliability prediction. IEEE Transactions on Reliability, 47(3-SP), 321–328.
6. WASH-1400. (1975). Reactor safety study. USA: US Nuclear Regulatory Commission.
7. Moranda, P. B. (1975). Prediction of software reliability during debugging. In Proceedings of
the Annual Reliability Maintenance Symposium (pp. 327–332).
8. Zio, E. (2009). Reliability engineering: Old problems and new challenges. Reliability
Engineering & System Safety, 94(2), 125–141.
9. O’Connor, P. D. T., Newton, D., & Bromley, R. (2002). Practical reliability engineering (4th
ed.). West Sussex, England: Wiley.
10. Ireson, W. G., Coombs, C. F., & Moss, R. Y. (1996). Handbook of reliability engineering and
management (2nd ed.). New York, NY: McGraw-Hill Professional.
11. Xiong, Y., Chen, W., Tsui, K.-L., & Apley, D. (2008). A better understanding of model
updating strategies in validating engineering models. Computer Methods in Applied
Mechanics and Engineering, 198(15–16), 1327–1337.
12. Kuo, W., & Zuo, M. J. (2002). Optimal reliability modeling: Principles and applications.
Hoboken, NJ: Wiley.
13. Singpurwalla, N. D. (2006). Reliability and risk: A Bayesian perspective. New York, NY:
Wiley.
14. Haldar, A., & Mahadevan, S. (2000). Probability, reliability, and statistical methods in
engineering design. New York, NY: Wiley.
Chapter 2
Fundamentals of Probability Theory
With the event E being a countable union of disjoint sets s, the relation above can
be derived from the addition theorem for mutually exclusive events. This will be
explained in detail in a subsequent section. On the other hand, if the sample space X
is uncountably infinite and s represents a real value, s can then be treated as a
continuous random variable. In this case, a probability density function (PDF) is
used to represent p(s).
This section presents the basic axioms and theorems of probability theory that form
the theoretical basis for probability analysis.
Other useful rules that follow from Axioms 1–3 can be expressed as follows:
$$P(\varnothing) = 0$$

$$P(E_1) \le P(E_2) \quad \text{if } E_1 \subseteq E_2 \qquad (2.3)$$

$$P\left(\bar{E}_1\right) = 1 - P(E_1)$$
The third rule of the equation above can be proven using Axioms 2 and 3.
Applications of these axioms and rules lead to useful theorems for the calculation of
the probability of events. These will be detailed in the next subsection.
Events E1, E2, …, En are mutually exclusive if the occurrence of any event excludes
the occurrence of the other n − 1 events. In other words, any two events Ei and Ej in
this set of events cannot occur at the same time, or are mutually exclusive. This can
be mathematically expressed as Ei ∩ Ej = Ø. For example, a battery cell could fail
due to internal short circuit or over-discharge. The events E1 = {failure due to
internal short circuit} and E2 = {failure due to over-discharge} can be treated as
mutually exclusive due to the impossibility of the simultaneous occurrence of the
two events. It follows from Axiom 3 (Sect. 2.2.1) that

$$P(E_1 \cup E_2) = P(E_1) + P(E_2) \qquad (2.4)$$
In general, if events E1, E2, …, En are mutually exclusive, the following addition
theorem applies:
$$P\left(\bigcup_{i=1}^{n} E_i\right) = \sum_{i=1}^{n} P(E_i) \qquad (2.5)$$
With this knowledge of mutually exclusive events in place, let us now investi-
gate a general case where the mutually exclusive assumption may not hold. The
probability that event E1, event E2, or both events occur is computed based on the
union of two events, expressed as

$$P(E_1 \cup E_2) = P(E_1) + P(E_2) - P(E_1 \cap E_2) \qquad (2.6)$$
For the case of n events E1, E2, …, En, the probability that at least one of them
occurs can be derived by generalizing Eq. (2.6) as
$$P\left(\bigcup_{i=1}^{n} E_i\right) = \sum_{j=1}^{n} (-1)^{j+1} P_j \quad \text{with} \quad P_j = \sum_{1 \le i_1 < \cdots < i_j \le n} P\left(\bigcap_{k \in \{i_1, \ldots, i_j\}} E_k\right) \qquad (2.7)$$
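The inclusion–exclusion formula in Eq. (2.7) is easy to evaluate numerically. The following Python sketch (not from the original text; the function names and example probabilities are hypothetical) computes the union probability of n events given a routine that returns the probability of any joint event.

```python
from itertools import combinations

def union_probability(n_events, intersect_prob):
    """Probability of the union of n events via inclusion-exclusion, Eq. (2.7).

    intersect_prob: function mapping a tuple of event indices to the
                    probability of their joint occurrence P(E_i1 ∩ ... ∩ E_ij).
    """
    total = 0.0
    for j in range(1, n_events + 1):
        sign = (-1) ** (j + 1)
        for idx in combinations(range(n_events), j):
            total += sign * intersect_prob(idx)
    return total

# Three independent events with P(E_i) = 0.1 each: joint probabilities multiply.
print(union_probability(3, lambda idx: 0.1 ** len(idx)))  # 0.271 = 1 - 0.9**3
```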
Example 2.1 Suppose that among a set of 50 battery cells, 1 cell suffers from
damage due to an internal short circuit and 2 cells suffer from damage due to
overvoltage. If we randomly choose one cell from the set, what is the
probability of getting a defective cell?
Solution
Since the events E1 = {get a cell with an internal short circuit} and E2 = {get
a cell with an overvoltage} are mutually exclusive, we can compute the
probability of their union using Eq. (2.4) as

$$P(E_1 \cup E_2) = P(E_1) + P(E_2) = \frac{1}{50} + \frac{2}{50} = \frac{3}{50}$$
Events E1 and E2 are independent if the occurrence of one event does not have any
influence on that of the other. In such cases, the probability of a joint event can be
computed as the multiplication of the probabilities of individual events, expressed
as

$$P(E_1 \cap E_2) = P(E_1)\, P(E_2) \qquad (2.8)$$
For a general case where events E1 and E2 may not be independent, the probability
of the joint event can be expressed as

$$P(E_1 \cap E_2) = P(E_2)\, P(E_1 \mid E_2) = P(E_1)\, P(E_2 \mid E_1) \qquad (2.9)$$
In the equations above, P(E1|E2) and P(E2|E1) are conditional probabilities that
assume that E2 and E1 have occurred, respectively.
Assuming events E1, E2, …, En are mutually exclusive (Ei ∩ Ej = Ø for all
i ≠ j) and collectively exhaustive (their union covers the entire sample space, so the sum of their probabilities equals one),
we can then decompose the probability of an arbitrary event EA into the probabilities of
n mutually exclusive joint events, expressed as
$$P(E_A) = \sum_{i=1}^{n} P(E_A \cap E_i) = \sum_{i=1}^{n} P(E_i)\, P(E_A \mid E_i) \qquad (2.10)$$
We call this equation the total probability theorem. This theorem further leads to
Bayes’ theorem that is used to compute the posterior probability of an event as a
function of the prior probability and the likelihood. Bayes' theorem will be discussed
in detail in Chap. 3.
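As a small numerical illustration of Eq. (2.10) and the Bayes update it leads to, consider the following Python sketch. It is not from the original text; the scenario and numbers (components from three suppliers with different defect rates) are hypothetical.

```python
# Total probability theorem, Eq. (2.10), followed by a Bayes update.
prior = [0.5, 0.3, 0.2]              # P(E_i): mutually exclusive, exhaustive
p_defect_given = [0.01, 0.02, 0.05]  # conditional probabilities P(E_A | E_i)

# P(E_A) = sum_i P(E_i) P(E_A | E_i)
p_defect = sum(p * c for p, c in zip(prior, p_defect_given))

# Posterior P(E_i | E_A) by Bayes' theorem (discussed in Chap. 3)
posterior = [p * c / p_defect for p, c in zip(prior, p_defect_given)]

print(p_defect)   # 0.021
print(posterior)  # approximately [0.238, 0.286, 0.476]
```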
A random variable is a function that maps events in the sample space Ω into
outcomes in the real number space ℝ, where the outcomes can be real or integer,
continuous or discrete, success or failure, etc. A random variable, often written as
X: Ω → ℝ, is useful for quantifying uncertainty mathematically. In what follows, we
will introduce two types of random variables, namely discrete random variables and
continuous random variables.
A discrete random variable X is a function that maps events in a sample space into
a finite or countably infinite set of real numbers. An example of a discrete
random variable can be found in specimen tensile tests with 10 kN tensile force. If
we repeat this test 100 times with each test employing 20 specimens, the number of
failed specimens in each tensile test can be treated as a discrete random variable.
This variable can only represent a finite set of discrete integer values between 0 and
20. In general, a discrete random variable can only take values at a finite or countably
infinite set of discrete points. This means that the probability of such a variable can
only be computed at these discrete points. The randomness in this variable is
described using the so-called probability mass function (PMF), denoted as pX(x).
Assuming that X can be any of a series of discrete values x1, x2, …, xM, we can then
define the PMF of X as
$$p_X(x_k) = P(X = x_k), \quad \text{with} \quad \sum_{k=1}^{M} p_X(x_k) = 1 \qquad (2.11)$$
Note that the PMF is a discrete function that consists of a series of discrete values,
as shown in Fig. 2.3a. The cumulative distribution function (CDF) of a discrete
random variable can be computed by summing the PMF values up to the point of interest. The CDF takes the
form of a step function, as shown in Fig. 2.3b. Mathematically, the CDF can be
expressed as
$$F_X(x) = P(X \le x) = \sum_{x_k \le x} p_X(x_k) \qquad (2.12)$$
In the next chapter, we will introduce several commonly used discrete distributions.
Example 2.2 Let X be the number of heads that appear if a coin is tossed
three times sequentially. Compute the PMF of X.
Solution
If H and T represent head and tail, respectively, the sample space for this
experiment can be expressed as X = {HHH, HHT, HTH, THH, HTT, THT,
TTH, TTT}. First, consider the event {X = 0}, which can be easily mapped to
the outcome w = TTT. The probability of {X = 0} or pX(0) can be computed
as
$$p_X(0) = P(\{X = 0\}) = P(\{w = TTT\}) = \left(\frac{1}{2}\right)^3 = \frac{1}{8}$$

$$p_X(1) = P(\{X = 1\}) = P(\{w \in \{HTT, THT, TTH\}\}) = 3\left(\frac{1}{2}\right)^3 = \frac{3}{8}$$

$$p_X(2) = P(\{X = 2\}) = P(\{w \in \{THH, HTH, HHT\}\}) = 3\left(\frac{1}{2}\right)^3 = \frac{3}{8}$$

$$p_X(3) = P(\{X = 3\}) = P(\{w = HHH\}) = \left(\frac{1}{2}\right)^3 = \frac{1}{8}$$
The computations of pX(1) and pX(2) use the addition theorem for mutually
exclusive events and the multiplication theorem for independent events, as
introduced earlier in the chapter.
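The PMF of Example 2.2 can also be checked by brute-force enumeration of the sample space, as in the Python sketch below (an illustration added here, not part of the original example).

```python
from itertools import product
from math import comb

# PMF of the number of heads in three fair coin tosses (Example 2.2),
# obtained by enumerating the eight equally likely outcomes.
outcomes = list(product("HT", repeat=3))
pmf = {k: sum(1 for w in outcomes if w.count("H") == k) / len(outcomes)
       for k in range(4)}
print(pmf)  # {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}

# The same values from the closed form p_X(x) = C(3, x) (1/2)^3
print({x: comb(3, x) * 0.5 ** 3 for x in range(4)})
```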
Let us consider again the outcome of an experiment. This time the experiment is
conducted to test an LED light bulb until it burns out. The random variable
X represents the bulb’s lifetime in hours. Since X can take any positive real value, it
does not make sense to treat X as a discrete random variable. As a random variable
that can represent continuous values, X should be treated as a continuous random
variable, and its randomness can be modeled using a PDF fX(x). Note that the PDF
at any point only provides information on density, not mass. Only integration
(volume) of the PDF gives information on mass (probability). Thus, we become
interested in computing the probability that the outcome of X falls into a specific
interval (x1, x2]. This probability can be computed as
$$P(x_1 < X \le x_2) = \int_{x_1}^{x_2} f_X(x)\, dx \qquad (2.13)$$
Based on the equation above, we can calculate the CDF FX(x) by setting x1 and x2 to
−∞ and x, respectively. This can be expressed as
$$F_X(x) = P(X \le x) = \int_{-\infty}^{x} f_X(t)\, dt \qquad (2.14)$$
The PDF fX(x) has a relationship with the CDF FX(x) almost everywhere, expressed
as
$$f_X(x) = \frac{dF_X(x)}{dx} \qquad (2.15)$$
The relationship between the PDF fX(x) and CDF FX(x) of a continuous random
variable is shown in Fig. 2.4, where it can be observed that Eq. (2.13) can be
equivalently written as

$$P(x_1 < X \le x_2) = F_X(x_2) - F_X(x_1) \qquad (2.16)$$
To satisfy the three axioms in Sect. 2.2.1, a CDF FX(x) must possess the following
properties:
(1) F is non-decreasing, i.e., FX(x1) ≥ FX(x2) whenever x1 > x2.
(2) F is normalized, i.e., FX(−∞) = 0, FX(+∞) = 1.
(3) F is right-continuous, i.e.,
$$\lim_{\varepsilon \downarrow 0} F_X(x + \varepsilon) = F_X(x) \qquad (2.17)$$
Fig. 2.4 Relationship between the CDF FX(x) (top) and the PDF fX(x) (bottom) of a continuous random variable; the area under fX(x) between x1 and x2 equals P(x1 < X ≤ x2) = ∫ from x1 to x2 of fX(x) dx
In addition, the PDF must integrate to one over the entire real line:

$$\int_{-\infty}^{+\infty} f_X(x)\, dx = 1 \qquad (2.18)$$
Example 2.3 Assume that the remaining useful life (in days) of an engineered
system follows the following distribution function
$$f_X(x) = \begin{cases} \dfrac{2x}{\left(1+x^2\right)^2}, & x \ge 0 \\ 0, & x < 0 \end{cases}$$

(1) Verify that fX(x) is a valid PDF; (2) compute the probability that the remaining useful life lies between 3 and 5 days; and (3) compute the reliability of the system at 5 days, i.e., the probability that the remaining useful life is at least 5 days.

Solution

(1) Clearly fX(x) ≥ 0 for all x. Using Eq. (2.18), we have

$$\int_{-\infty}^{+\infty} f_X(x)\, dx = \int_{0}^{+\infty} \frac{2x}{\left(1+x^2\right)^2}\, dx \overset{u\,=\,1+x^2}{=} \int_{1}^{+\infty} \frac{1}{u^2}\, du = 1$$

so fX(x) is a valid PDF.

(2) Using Eq. (2.13), the probability can be computed as

$$P(3 \le X \le 5) = \int_{3}^{5} \frac{2x}{\left(1+x^2\right)^2}\, dx \overset{u\,=\,1+x^2}{=} \int_{10}^{26} \frac{1}{u^2}\, du = \left.-\frac{1}{u}\right|_{10}^{26} = \frac{4}{65}$$

(3) The reliability can be written as P(X ≥ 5). This probability can be computed using Eq. (2.13) as

$$P(X \ge 5) = \int_{5}^{+\infty} \frac{2x}{\left(1+x^2\right)^2}\, dx \overset{u\,=\,1+x^2}{=} \int_{26}^{+\infty} \frac{1}{u^2}\, du = \left.-\frac{1}{u}\right|_{26}^{+\infty} = \frac{1}{26}$$
In Sect. 2.3, we learned that the randomness in a random variable can be exactly
modeled using a distribution function. In engineering practice, we also need to
characterize the statistical nature of a random variable with numerical parameters that
can be estimated from available samples. Such numerical parameters include mean,
standard deviation, skewness, kurtosis, etc.; these are known as statistical moments.
2.4.1 Mean
Let us first consider a discrete random variable X that can take a series of discrete
values x1, x2, …, xM with probabilities pX(x1), pX(x2), …, pX(xM). The mean lX, or
expected value E(X), can be defined as
$$\mu_X = E(X) = \sum_{k=1}^{M} x_k\, p_X(x_k) \qquad (2.19)$$
Note that if we have a set of independent and identically distributed (i.i.d.) random
samples, the arithmetic mean of these samples approaches the expected value in
Eq. (2.19) as the number of samples approaches infinity. The proof of this statement
can be derived based on the strong law of large numbers, which is omitted here. In
reliability analysis, we are particularly interested in computing the expectation of a
system’s performance. A system’s performance is often a function of several random
variables. If we assume that the input X of a system performance function g(X) is a
discrete random variable, we can calculate the expected value of g(X) as
$$E(g(X)) = \sum_{k=1}^{M} g(x_k)\, p_X(x_k) \qquad (2.20)$$
Example 2.4 Consider the discrete random variable in Example 2.2. Compute
its expected value.
Solution
Since X can represent any of the values 0, 1, 2 and 3, its expected value can
be computed as
$$E(X) = \sum_{x=0}^{3} x\, p_X(x) = 0 \cdot \frac{1}{8} + 1 \cdot \frac{3}{8} + 2 \cdot \frac{3}{8} + 3 \cdot \frac{1}{8} = \frac{3}{2}$$
Example 2.5 Assume that the number X of failed power transformers per year
follows the following PMF
$$p_X(x) = \frac{\lambda^x e^{-\lambda}}{x!}, \quad x = 0, 1, 2, \ldots$$

where λ > 0 is a constant parameter. Compute the mean of X.
Solution
The calculation of the mean value of X can be expressed as
$$E(X) = \sum_{x=0}^{+\infty} x\, p_X(x) = \sum_{x=0}^{+\infty} x\, \frac{\lambda^x e^{-\lambda}}{x!} = \sum_{x=1}^{+\infty} \frac{\lambda^x e^{-\lambda}}{(x-1)!} \overset{l\,=\,x-1}{=} \lambda e^{-\lambda} \sum_{l=0}^{+\infty} \frac{\lambda^l}{l!} = \lambda e^{-\lambda}\, e^{\lambda} = \lambda$$
Note: As we will see in Sect. 2.5, the PMF in this example is called a Poisson
distribution, which is useful in modeling randomness in the number of events
occurring in a fixed period of time under the assumptions of occurrence
independence and a constant occurrence rate.
For a continuous random variable X with PDF fX(x), the mean or expected value is defined as

$$\mu_X = E(X) = \int_{-\infty}^{+\infty} x\, f_X(x)\, dx \qquad (2.21)$$
If we treat the integral above as a summation of the multiplications of x and its PDF
value fX(x) over infinitesimally narrow intervals, the expected value of a continuous
random variable can also be regarded as a weighted-sum average, which bears a
resemblance to the discrete case. If we assume that the input X of a system
performance function g(X) is a continuous random variable, we can calculate the
expected value of g(X) as
$$E(g(X)) = \int_{-\infty}^{+\infty} g(x)\, f_X(x)\, dx \qquad (2.22)$$
Example 2.6 Assume that a continuous random variable X follows the fol-
lowing distribution function
$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right]$$

Show that the parameter μ is the mean of X.
Solution
It is equivalent to show that E(X − μ) = 0. This expectation can be computed as

$$E(X-\mu) = \int_{-\infty}^{+\infty} (x-\mu)\, f_X(x)\, dx = \int_{-\infty}^{+\infty} (x-\mu)\, \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right] dx = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} \frac{x-\mu}{\sigma} \exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right] dx$$

Using the substitution z = (x − μ)/σ, with dx = σ dz, we obtain

$$E(X-\mu) = \frac{\sigma}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} z \exp\left(-\frac{1}{2}z^2\right) dz = \frac{\sigma}{\sqrt{2\pi}} \left[-\exp\left(-\frac{1}{2}z^2\right)\right]_{-\infty}^{+\infty} = 0$$
Note: As we will see in Sect. 2.5, the PDF in this example is called a
Gaussian or normal distribution—the most widely used continuous
distribution.
In engineering practice, we often have only a finite set of random samples x1, x2,
…, xM. In such cases, the sample mean can be used as an approximation to the
population mean. The sample mean is expressed as
$$\hat{\mu}_X = \frac{1}{M} \sum_{k=1}^{M} x_k \qquad (2.23)$$
As the number of samples M approaches infinity, the sample mean approaches the
population mean.
2.4.2 Variance
Similar to the mean, the variance is also an expectation. Letting X in Eq. (2.19) be
(X − lX)2, we can then obtain the expression of the variance of a discrete random
variable X as
$$\mathrm{Var}(X) = E\left[(X-\mu_X)^2\right] = \sum_{k=1}^{M} (x_k - \mu_X)^2\, p_X(x_k) \qquad (2.24)$$
The equation above suggests that the variance is the weighted-sum average of the
squared deviation of X from its mean value lX and that the variance measures the
dispersion of samples in a probabilistic manner.
Replacing X in Eq. (2.21) with (X − lX)2 gives the expression of the variance of
a continuous random variable X as
$$\mathrm{Var}(X) = E\left[(X-\mu_X)^2\right] = \int_{-\infty}^{+\infty} (x-\mu_X)^2\, f_X(x)\, dx \qquad (2.25)$$
The standard deviation of X is defined as the square root of its variance, denoted by
σ.
For a random variable with a non-zero mean, the computation is easier using a
simplified formula for variance, specifically
$$\mathrm{Var}(X) = E\left(X^2\right) - \left(E(X)\right)^2 \qquad (2.26)$$
which essentially decomposes the calculation of the variance into the calculations of
the expectation of the square and the square of the expectation. When there is a large
number of random samples x1, x2, …, xM, the sample variance can be computed as

$$\hat{\sigma}_X^2 = \frac{1}{M-1} \sum_{k=1}^{M} \left(x_k - \hat{\mu}_X\right)^2 \qquad (2.27)$$
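The sample estimates of Eqs. (2.23) and (2.27) are one-liners in NumPy, as the following sketch shows (added for illustration; the sample values are placeholders, not data from the text).

```python
import numpy as np

# Sample mean and sample variance from a small set of i.i.d. observations.
x = np.array([102.3, 98.7, 101.1, 99.5, 100.8, 97.9, 103.2, 100.1])

mu_hat = x.mean()          # Eq. (2.23)
var_hat = x.var(ddof=1)    # Eq. (2.27), using the 1/(M-1) factor
print(mu_hat, var_hat, np.sqrt(var_hat))
```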
Observe from Eqs. (2.24) and (2.25) that, for both discrete and continuous
random variables, the variance is defined as the expectation of the squared deviation
of the corresponding random variable from its mean value. This definition of
variance can be generalized to the definition of the nth central moment of X,
expressed as E[(X − lX)n]. The variance is equivalent to the 2nd central moment. In
Sect. 2.4.3, we will introduce two higher-order moments, namely skewness and
kurtosis, which are defined based on the 3rd and 4th central moments, respectively.
Example 2.7 Reconsider the normal random variable X in Example 2.6. Show that the parameter σ² is the variance of X.

Solution
According to Eq. (2.25), we can compute the variance of X as

$$\mathrm{Var}(X) = E\left[(X-\mu)^2\right] = \int_{-\infty}^{+\infty} (x-\mu)^2\, f_X(x)\, dx = \int_{-\infty}^{+\infty} (x-\mu)^2\, \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right] dx = \frac{\sigma}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} \left(\frac{x-\mu}{\sigma}\right)^2 \exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right] dx$$

Using the substitution z = (x − μ)/σ, with dx = σ dz, we obtain

$$\mathrm{Var}(X) = \frac{\sigma^2}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} z^2 \exp\left(-\frac{1}{2}z^2\right) dz = -\frac{\sigma^2}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} z\, d\!\left[\exp\left(-\frac{1}{2}z^2\right)\right] = \underbrace{\left.-\frac{\sigma^2}{\sqrt{2\pi}}\, z \exp\left(-\frac{1}{2}z^2\right)\right|_{-\infty}^{+\infty}}_{0} + \sigma^2 \underbrace{\int_{-\infty}^{+\infty} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}z^2\right) dz}_{1} = \sigma^2$$
The first term on the right-hand side is zero because exp(−z²/2) decays to zero faster than z grows. Detailed computation of the second term (the integral of the standard normal PDF) will be discussed in the next subsection.
Kurtosis measures the portion of the variance that results from infrequent extreme
deviations. Thus, for a random variable with high kurtosis, a large portion of its
variance is due to infrequent extreme deviations.
The first four statistical moments are named mean, variance, skewness, and
kurtosis, respectively. As will be discussed in later chapters, these moments provide
information regarding the uncertainty of a system’s performance for reliability
analysis and, in some cases, can even help estimate the PDF of the system per-
formance. Furthermore, the two lower-order moments (i.e., mean and variance) are
important quantities used in design problems, such as reliability-based robust
design optimization.
Fig. 2.5 Positively skewed PDF (a) versus negatively skewed PDF (b)
In this chapter’s prior sections, we have introduced discrete and continuous random
variables and presented fundamentals of the PMF/PDF and CDF for modeling
randomness in these random variables. In what follows, commonly used univariate
distribution functions will be introduced with an aim to enhance understanding of
probability distributions as well as to provide preliminary information to lay the
foundation for reliability analysis to be introduced in later chapters.
This section introduces three commonly used discrete distributions, namely bino-
mial distribution, Poisson distribution, and geometric distribution.
Binomial Distribution
Recall Example 2.2 where a coin is tossed three times sequentially. The
experiment in this example possesses two properties: (i) only two outcomes (head
and tail) are possible in an experimental trial (tossing a coin); and (ii) repeated trials
are conducted independently with each trial having a constant probability for each
outcome (0.5 for head and 0.5 for tail). If a sequence of n experimental trials
satisfies the two properties and the probability of occurrence of an outcome in each
trial is p0, the number X of occurrences of this outcome follows a binomial dis-
tribution, and its PMF pX(x) can be expressed as

$$p_X(x; n, p_0) = C(n, x)\, p_0^x\, (1-p_0)^{n-x}, \quad x = 0, 1, \ldots, n \qquad (2.30)$$
where C(n, x) = n!/[x!(n − x)!] is the binomial coefficient and can be interpreted as
the number of different ways that the outcome occurs x times out of n trials. The
equation above can be derived by summing the probabilities of C(n, x) mutually
exclusive events, each of which has a probability of occurrence p0^x (1 − p0)^(n−x).
Example 2.8 Reconsider Example 2.2, except here the coin is tossed five
times. Compute the probability of getting at least two heads in five tosses.
Solution
Let X denote the number of heads in five tosses, which follows a binomial
distribution pX(x; 5, 0.5). The probability of {X ≥ 2} can be computed as

$$P(X \ge 2) = 1 - p_X(0; 5, 0.5) - p_X(1; 5, 0.5) = 1 - (0.5)^5 - 5\,(0.5)^5 = \frac{13}{16} = 0.8125$$
This calculation uses the rule for the probability of complementary events, as
outlined in Eq. (2.3).
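The same calculation can be reproduced with the binomial distribution in scipy.stats, as in the short sketch below (an added illustration; SciPy is assumed to be available).

```python
from scipy.stats import binom

# Example 2.8 revisited: number of heads in five fair coin tosses.
n, p0 = 5, 0.5
print(1 - binom.cdf(1, n, p0))         # P(X >= 2) = 0.8125
print(binom.pmf(range(n + 1), n, p0))  # full PMF p_X(x; 5, 0.5)
```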
Poisson Distribution
In Example 2.5, we presented the PMF of a discrete random variable X following
the Poisson distribution. It takes the following form
$$p_X(x) = \frac{\lambda^x e^{-\lambda}}{x!}, \quad x = 0, 1, 2, \ldots \qquad (2.31)$$
The Poisson distribution can be used to model the randomness in the number of
events occurring in a fixed period of time under the assumptions of occurrence
independence and a constant occurrence rate. We have seen from Example 2.5 that
the mean of X is k. It will be shown in Example 2.9 that the variance of X is also k.
The Poisson distribution can be treated as a limiting case of the binomial dis-
tribution. In fact, as the number of trials n → +∞ and the probability of occurrence
p0 → 0 with np0 = λ, the binomial distribution can be well represented by the
Poisson distribution. This can be proved by setting p0 = λ/n in Eq. (2.30) and
applying the limit n → +∞, expressed as
$$p_X(x; n, p_0) = \lim_{n \to +\infty} \frac{n!}{x!\,(n-x)!} \left(\frac{\lambda}{n}\right)^x \left(1 - \frac{\lambda}{n}\right)^{n-x} = \lim_{n \to +\infty} \frac{n(n-1)\cdots(n-x+1)}{n^x}\, \frac{\lambda^x}{x!} \left(1 - \frac{\lambda}{n}\right)^n \left(1 - \frac{\lambda}{n}\right)^{-x}$$

$$= \lim_{n \to +\infty} \left(1 - \frac{1}{n}\right)\cdots\left(1 - \frac{x-1}{n}\right) \frac{\lambda^x}{x!} \left(1 - \frac{\lambda}{n}\right)^n \left(1 - \frac{\lambda}{n}\right)^{-x} = \frac{\lambda^x}{x!} \lim_{n \to +\infty} \left(1 - \frac{\lambda}{n}\right)^n = \frac{\lambda^x e^{-\lambda}}{x!} \qquad (2.32)$$
Example 2.9 Reconsider Example 2.5. Find the variance of the number of
failed power transformers per year.
Solution
According to Eq. (2.26), the variance of X can be computed as
$$\mathrm{Var}(X) = E\left(X^2\right) - \left(E(X)\right)^2 = \sum_{x=0}^{+\infty} x^2\, \frac{\lambda^x e^{-\lambda}}{x!} - \lambda^2 = \sum_{x=1}^{+\infty} (x-1+1)\, \frac{\lambda^x e^{-\lambda}}{(x-1)!} - \lambda^2 = \sum_{x=2}^{+\infty} \frac{\lambda^x e^{-\lambda}}{(x-2)!} + \sum_{x=1}^{+\infty} \frac{\lambda^x e^{-\lambda}}{(x-1)!} - \lambda^2$$

$$\mathrm{Var}(X) = \lambda^2 e^{-\lambda} \sum_{l=0}^{+\infty} \frac{\lambda^l}{l!} + \lambda e^{-\lambda} \sum_{l=0}^{+\infty} \frac{\lambda^l}{l!} - \lambda^2 = \lambda^2 + \lambda - \lambda^2 = \lambda$$
Geometric Distribution
In a sequence of experimental trials, we are often interested in calculating how
many trials have to be conducted until a certain outcome can be observed. For a
Bernoulli sequence of experimental trials, the number X of trials conducted up to and
including the first occurrence of a certain outcome follows a geometric distribution as
$$p_X(x) = (1-p)^{x-1}\, p, \quad x = 1, 2, \ldots \qquad (2.33)$$
Example 2.10 The acceptance scheme for purchasing lots containing a large
number of batteries is to test no more than 75 randomly selected batteries and
to reject a lot if a single battery fails. Assume the probability of a single
battery failure is 0.001.
(1) Compute the probability that a lot is accepted.
(2) Compute the probability that a lot is rejected on the 20th test.
(3) Compute the probability that a lot is rejected in less than 75 trials.
Solution
(1) The probability of acceptance is equal to the probability of no failure in 75 trials:

$$P(\text{accept}) = (1 - 0.001)^{75} \approx 0.9277$$

(2) The probability that a lot is rejected on the 20th test can be computed by using the geometric distribution with x = 20 and p = 0.001 as

$$p_X(20) = (1 - 0.001)^{19} \times 0.001 \approx 9.81 \times 10^{-4}$$
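All three quantities in Example 2.10 can also be obtained from scipy.stats.geom, which follows the same convention as Eq. (2.33) (number of trials up to and including the first occurrence). The sketch below is an added illustration, assuming SciPy is available.

```python
from scipy.stats import geom

# Example 2.10 with the geometric distribution, p = probability of a failed battery.
p = 0.001

accept = (1 - p) ** 75                 # no failed battery in 75 tests
reject_on_20 = geom.pmf(20, p)         # (1-p)^19 * p
reject_before_75 = geom.cdf(74, p)     # 1 - (1-p)^74
print(accept, reject_on_20, reject_before_75)
```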
In reliability analysis, continuous probability distributions are more often used than
discrete probability distributions. This section presents three commonly used con-
tinuous distributions, namely normal distribution, Weibull distribution, and expo-
nential distribution.
Normal Distribution
The most widely used continuous probability distribution is the normal or Gaussian
distribution. The density function of a normal distribution can be expressed as
$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right] \qquad (2.34)$$
where μ and σ are the mean and standard deviation of X, respectively. As the standard deviation increases, the PDF becomes wider and, to keep the total area at one, the height decreases. A change of the mean only shifts the center location of the PDF while the shape remains the same.
If μ = 0 and σ² = 1, X follows a standard normal distribution, its PDF
fX(x) becomes a standard normal density, and its CDF FX(x) becomes a standard
normal CDF, denoted as Φ(x). If μ ≠ 0 or σ² ≠ 1, we can use the substitution
z = (x − μ)/σ to transform the original normal distribution into a standard normal
distribution as
$$f_Z(z) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}z^2\right) \qquad (2.35)$$
Next, let us verify that the normal PDF fX(x) satisfies the requirements of a PDF.
Clearly, fX(x) > 0 for all x ∈ ℝ. We then need to show that the integration of fX(x) is
one. First, we use the substitution z = (x − μ)/σ to simplify the integration as
Fig. 2.6 Normal CDFs FX(x) (top) and PDFs fX(x) (bottom) for different parameter combinations: μ = 0, σ = 0.5; μ = 0, σ = 1.0; μ = 0, σ = 1.5; and μ = 2, σ = 0.5
$$\int_{-\infty}^{+\infty} f_X(x)\, dx = \int_{-\infty}^{+\infty} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right] dx = \int_{-\infty}^{+\infty} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}z^2\right) dz \qquad (2.36)$$
To evaluate this integral, define

$$I = \int_{-\infty}^{+\infty} \exp\left(-\frac{1}{2}z^2\right) dz \qquad (2.37)$$
Next, we change the Cartesian x-z coordinates to polar r-θ coordinates with the substitutions x = r cos θ, z = r sin θ, and dx dz = r dr dθ. In Cartesian coordinates, the
integral is over the entire x-z plane. Equivalently, in polar coordinates, the integration is over the range of the radius r from 0 to +∞ and the range of the angle θ
from 0 to 2π. We then have
$$I^2 = \left(\int_{-\infty}^{+\infty} \exp\left(-\frac{1}{2}x^2\right) dx\right)\left(\int_{-\infty}^{+\infty} \exp\left(-\frac{1}{2}z^2\right) dz\right) = \int_{0}^{+\infty}\int_{0}^{2\pi} \exp\left(-\frac{1}{2}r^2\right) r\, d\theta\, dr$$

$$= 2\pi \int_{0}^{+\infty} \exp\left(-\frac{1}{2}r^2\right) r\, dr = 2\pi\left[-\exp\left(-\frac{1}{2}r^2\right)\right]_{0}^{+\infty} = 2\pi \qquad (2.39)$$
This gives I = √(2π), and the integral in Eq. (2.36) then becomes one. By calculating I, we also resolve the integral left open in Example 2.7.
Lognormal Distribution
If the logarithm of a continuous random variable X follows a normal distribution,
the random variable follows a lognormal distribution
" #
1 1 ln x l 2
fX ð xÞ ¼ pffiffiffiffiffiffi exp ð2:40Þ
2prx 2 r
Here, x > 0, μ > 0, and σ > 0. Note that μ and σ are two lognormal parameters, not
the mean and standard deviation of X. The mean μX and standard deviation σX of X
can be computed as

$$\mu_X = \exp\left(\mu + \sigma^2/2\right), \qquad \sigma_X^2 = \left(e^{\sigma^2} - 1\right)\exp\left(2\mu + \sigma^2\right) \qquad (2.41)$$
Similar to the normal case, we can also transform a lognormal random variable to a
standard normal variable by using the substitution z = (ln x − μ)/σ. We can then
derive the CDF of a lognormal random variable as

$$F_X(x) = P(X \le x) = P\left(\frac{\ln X - \mu}{\sigma} \le \frac{\ln x - \mu}{\sigma}\right) = \int_{-\infty}^{\frac{\ln x - \mu}{\sigma}} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}z^2\right) dz = \Phi\left(\frac{\ln x - \mu}{\sigma}\right) \qquad (2.42)$$
Lognormal PDFs with different means and standard deviations are compared in
Fig. 2.7. As σX increases, the width of the PDF increases and the height decreases.
Unlike the normal distribution, the lognormal distribution can only represent values in
(0, +∞) and is asymmetric.
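As a small illustration of Eqs. (2.41) and (2.42), the following Python sketch converts a given lognormal mean and standard deviation into the parameters μ and σ and then evaluates the CDF through the standard normal CDF. The numerical values are placeholders, not data from the text; SciPy is assumed to be available.

```python
import numpy as np
from scipy.stats import norm

# Illustrative lognormal mean and standard deviation of X.
mu_X, sigma_X = 120.0, 15.0

# Inverse of Eq. (2.41): recover the lognormal parameters mu and sigma.
sigma = np.sqrt(np.log(sigma_X ** 2 / mu_X ** 2 + 1))
mu = np.log(mu_X) - 0.5 * sigma ** 2

# Eq. (2.42): F_X(x) = Phi((ln x - mu) / sigma)
x = 100.0
F_x = norm.cdf((np.log(x) - mu) / sigma)
print(mu, sigma, F_x)
```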
Solution
(1) We first derive the lognormal parameters by inversely using Eq. (2.41) as
$$\mu = \ln\left[\mu_X^2 \Big/ \left(\sigma_X^2 + \mu_X^2\right)^{1/2}\right], \qquad \sigma = \left[\ln\left(\sigma_X^2/\mu_X^2 + 1\right)\right]^{1/2}$$
Then the probability can be computed by using the normal CDF values as
(2) We apply the same procedure to solve this question. First, we compute
the lognormal parameters as μ = 1.9164 and σ = 0.0329. We then
compute the critical standard normal values as
Weibull Distribution
The Weibull distribution is a continuous probability distribution that is widely used
to model the time-to-failure distribution of engineered systems. The PDF of a
Weibull random variable X is defined as
$$f_X(x) = k\lambda\, (\lambda x)^{k-1}\, e^{-(\lambda x)^k} \qquad (2.43)$$
Here, x > 0, λ > 0 is the scale parameter, and k > 0 is the shape parameter.
Integrating the Weibull PDF results in a Weibull CDF of the following form
$$F_X(x) = 1 - e^{-(\lambda x)^k} \qquad (2.44)$$
The failure rate (hazard) function of a Weibull random variable can then be computed as

$$h_X(x) = \frac{f_X(x)}{1 - F_X(x)} = k\lambda\, (\lambda x)^{k-1} \qquad (2.45)$$
for the time x > 0. The failure rate is constant over time if k = 1; in this case, the
Weibull distribution becomes an exponential distribution. A value of k > 1 indicates a
failure rate that increases with time, while a value of k < 1 indicates a failure rate that
decreases with time.
Exponential Distribution
Setting k = 1 in Eq. (2.43) and writing λ for the constant failure rate gives the exponential PDF and CDF

$$f_X(x) = \lambda e^{-\lambda x} \qquad (2.46)$$

$$F_X(x) = 1 - e^{-\lambda x} \qquad (2.47)$$
Example 2.12 Assume that the time to failure T (in hours) of a product follows an exponential distribution with a constant failure rate λ = 0.01 failures per hour. Compute (1) the mean time to failure (MTTF) and (2) the probability that the product survives at least 200 h.

Solution
(1) The MTTF can be computed as

$$\mathrm{MTTF} = \int_{0}^{+\infty} t\, f_T(t)\, dt = \int_{0}^{+\infty} t\, \lambda e^{-\lambda t}\, dt = -\int_{0}^{+\infty} t\, d\!\left(e^{-\lambda t}\right) = \left.-t\, e^{-\lambda t}\right|_{0}^{+\infty} + \int_{0}^{+\infty} e^{-\lambda t}\, dt = 0 + \frac{1}{\lambda} = \frac{1}{\lambda} = 100\ \mathrm{h}$$

(2) The probability of surviving at least 200 h can be computed as

$$P(T \ge 200) = \int_{200}^{+\infty} f_T(t)\, dt = \int_{200}^{+\infty} \lambda e^{-\lambda t}\, dt = \left.-e^{-\lambda t}\right|_{200}^{+\infty} = e^{-2} \approx 0.1353$$
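The exponential results above can be reproduced with scipy.stats.expon, which parameterizes the distribution by its scale 1/λ. The sketch below is an added illustration, assuming SciPy is available.

```python
from scipy.stats import expon

# Exponential time to failure with rate 0.01/h (scale = 1/lambda).
lam = 0.01
T = expon(scale=1 / lam)

print(T.mean())    # MTTF = 1/lambda = 100 h
print(T.sf(200))   # P(T >= 200) = exp(-2) ≈ 0.1353
```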
In the previous sections, we have focused our discussion on single random vari-
ables. In engineering practice, quite often two or more random variables are present.
For example, in a two-dimensional rectangular plate, both the length and width can
be modeled as random variables. Thus, it is important to study multiple random
variables, which are often grouped as a single object, namely a random vector. The
aim of this section is to introduce the probability model of a random vector,
conditional probability, and independence. We will separately discuss both con-
tinuous and discrete random vectors that consist of two random variables.
Let us first consider the discrete case. The joint PMF of two discrete random
variables X and Y is defined by

$$p_{XY}(x, y) = P(X = x, Y = y) \qquad (2.48)$$
In this case, the joint CDF, denoted as FXY(x, y), can be defined in a similar fashion
as the univariate CDF. It takes the following form
$$F_{XY}(x, y) = \sum_{y_j \le y}\, \sum_{x_i \le x} p_{XY}\!\left(x_i, y_j\right) \qquad (2.49)$$
We can derive the marginal PMFs pX(x) and pY(y) from the joint PMF pXY(x, y) as
$$p_X(x_i) = \sum_{j} p_{XY}\!\left(x_i, y_j\right), \qquad p_Y(y_j) = \sum_{i} p_{XY}\!\left(x_i, y_j\right) \qquad (2.50)$$
Example 2.13 Find the marginal PMF pX(x) given the following joint PMF
$$p_{XY}(x, y) = \begin{cases} \dfrac{2\left[(x+1)/(x+2)\right]^y}{n(n+3)}, & x = 0, 1, 2, \ldots, n-1;\ y \ge 0 \\ 0, & \text{otherwise} \end{cases}$$
Solution
For x = 0, 1, 2, …, n − 1, we take the summation over the whole range of y:

$$p_X(x) = \sum_{y=-\infty}^{+\infty} p_{XY}(x, y) = \sum_{y=0}^{+\infty} \frac{2\left[(x+1)/(x+2)\right]^y}{n(n+3)} = \frac{2}{n(n+3)} \cdot \frac{1}{1-(x+1)/(x+2)} = \frac{2(x+2)}{n(n+3)}$$
Thus
$$p_X(x) = \begin{cases} \dfrac{2(x+2)}{n(n+3)}, & x = 0, 1, 2, \ldots, n-1 \\ 0, & \text{otherwise} \end{cases}$$
We can verify that this marginal PMF sums to one:

$$\sum_{x=0}^{n-1} p_X(x) = \sum_{x=0}^{n-1} \frac{2(x+2)}{n(n+3)} = \frac{2}{n(n+3)} \cdot \frac{n(n+3)}{2} = 1$$
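Example 2.13 can be checked numerically for a particular n by summing the joint PMF over a (truncated) grid, as in the following added sketch using NumPy.

```python
import numpy as np

# Numerical check of Example 2.13 for n = 5; y is truncated at a large value,
# so the sums are approximate but essentially exact.
n, y_max = 5, 2000
x = np.arange(n)
y = np.arange(y_max + 1)

ratio = (x + 1) / (x + 2)                           # shape (n,)
p_xy = 2 * ratio[:, None] ** y[None, :] / (n * (n + 3))

p_x = p_xy.sum(axis=1)                              # marginal PMF p_X(x)
print(p_x)                                          # ≈ 2(x+2)/(n(n+3))
print(2 * (x + 2) / (n * (n + 3)))                  # closed form
print(p_x.sum())                                    # ≈ 1
```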
Next, let us consider the continuous case. The joint CDF of two continuous
random variables X and Y is defined by
$$F_{XY}(x, y) = P(X \le x, Y \le y) = \int_{-\infty}^{y}\int_{-\infty}^{x} f_{XY}(s, t)\, ds\, dt \qquad (2.51)$$
where fXY(x, y) is the joint PDF of X and Y. The joint PDF fXY(x, y) can also be
obtained from the joint CDF as

$$f_{XY}(x, y) = \frac{\partial^2 F_{XY}(x, y)}{\partial x\, \partial y} \qquad (2.52)$$
An example joint PDF is shown in Fig. 2.11, where X and Y are jointly continuous.
The line boundary between two surface areas with different line styles follows the
shape of a conditional PDF, which will be introduced in the next section.
The integrations of the joint PDF fXY(x, y) with respect to X and Y give the
marginal PDFs fY(y) and fX(x), respectively, as
$$f_Y(y) = \int_{-\infty}^{+\infty} f_{XY}(s, y)\, ds, \qquad f_X(x) = \int_{-\infty}^{+\infty} f_{XY}(x, t)\, dt \qquad (2.53)$$
The equation above suggests that integrating out one random variable produces the
marginal PDF of the other.
Example 2.14 Find the marginal PDFs fU(u) and fV(v) of two random vari-
ables u and v given the following joint PDF (|ρ| < 1)

$$\phi(u, v; \rho) = \frac{1}{2\pi\sqrt{1-\rho^2}} \exp\left[-\frac{1}{2\left(1-\rho^2\right)}\left(u^2 - 2\rho uv + v^2\right)\right]$$
Solution
The symmetry of the joint PDF indicates that the two random variables have
the same marginal PDFs. It is then sufficient to only compute fU(u). To do so,
we integrate out v as
$$f_U(u) = \int_{-\infty}^{+\infty} \phi(u, v; \rho)\, dv = \int_{-\infty}^{+\infty} \frac{1}{2\pi\sqrt{1-\rho^2}} \exp\left\{-\frac{1}{2\left(1-\rho^2\right)}\left[u^2\left(1-\rho^2\right) + (v-\rho u)^2\right]\right\} dv = \frac{e^{-u^2/2}}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} \frac{1}{\sqrt{2\pi}\sqrt{1-\rho^2}} \exp\left[-\frac{(v-\rho u)^2}{2\left(1-\rho^2\right)}\right] dv$$

With the substitution w = (v − ρu)/√(1 − ρ²), this becomes

$$f_U(u) = \frac{e^{-u^2/2}}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}w^2\right) dw$$
We have shown that this integral is one. This gives the marginal PDF as

$$f_U(u) = \frac{e^{-u^2/2}}{\sqrt{2\pi}}$$
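The marginal derived in Example 2.14 can be confirmed numerically by integrating v out of the joint PDF, as in the added sketch below (SciPy assumed available; the values of ρ and u are arbitrary choices).

```python
import numpy as np
from scipy.integrate import quad

# Integrate v out of the bivariate standard normal PDF at a fixed u and
# compare with the univariate standard normal density.
rho, u = 0.8, 0.5

phi = lambda v: (1 / (2 * np.pi * np.sqrt(1 - rho ** 2))
                 * np.exp(-(u ** 2 - 2 * rho * u * v + v ** 2)
                          / (2 * (1 - rho ** 2))))

marginal, _ = quad(phi, -np.inf, np.inf)
print(marginal, np.exp(-u ** 2 / 2) / np.sqrt(2 * np.pi))  # should match
```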
Recall from Sect. 2.2.3 that the probability of a joint event can be expressed as the product of the probability of an individual event and a conditional probability. It then follows that

$$P(E_1 \mid E_2) = \frac{P(E_1 \cap E_2)}{P(E_2)} \qquad (2.54)$$
In the equation above, P(E1|E2) is the conditional probability, assuming that E2 has
occurred. If we apply this relationship to a discrete random variable, we then obtain
a conditional PMF as

$$p_{X|Y}(x \mid y) = \frac{p_{XY}(x, y)}{p_Y(y)} \qquad (2.55)$$
Similarly, pY|X(y|x) = pXY(x, y)/pX(x). These formulae indicate that, for any fixed
x and y, pY|X(y|x) and pX|Y(x|y) share the same shapes as slices of pXY(x, y) with fixed
x and y, respectively. Two discrete random variables are independent if and only if
the following relationship holds

$$p_{XY}(x, y) = p_X(x)\, p_Y(y) \qquad (2.56)$$
Example 2.15 Let us recall the example of fatigue tests. The sample mea-
surements can be obtained for the physical quantities in the strain-life model
below.
$$\frac{\Delta\varepsilon}{2} = \frac{\sigma_f'}{E}\left(2N_f\right)^b + \varepsilon_f'\left(2N_f\right)^c$$
Note that P(E1) = 8/20 = 2/5, P(E2) = 16/20 = 4/5, and P(E1 ∩ E2) = 4/20 = 1/5. Find the conditional probabilities P(E1|E2) and P(E2|E1).
Solution
From Eq. (2.54), we have
$$P(E_1 \mid E_2) = \frac{P(E_1 \cap E_2)}{P(E_2)} = \frac{1/5}{4/5} = \frac{1}{4}, \qquad P(E_2 \mid E_1) = \frac{P(E_1 \cap E_2)}{P(E_1)} = \frac{1/5}{2/5} = \frac{1}{2}$$
Example 2.16 Find the conditional PMF pY|X(y|x) given the joint PMF in
Example 2.13.
Solution
Recall from Example 2.13 that
$$p_X(x) = \begin{cases} \dfrac{2(x+2)}{n(n+3)}, & x = 0, 1, 2, \ldots, n-1 \\ 0, & \text{otherwise} \end{cases}$$
Thus, for y ≥ 0,

$$p_{Y|X}(y \mid x) = \frac{p_{XY}(x, y)}{p_X(x)} = \frac{2\left[(x+1)/(x+2)\right]^y / \left[n(n+3)\right]}{2(x+2)/\left[n(n+3)\right]} = \frac{1}{x+2}\left(\frac{x+1}{x+2}\right)^y$$

Therefore,

$$p_{Y|X}(y \mid x) = \begin{cases} \dfrac{1}{x+2}\left(\dfrac{x+1}{x+2}\right)^y, & y \ge 0 \\ 0, & y < 0 \end{cases}$$
As shown in Fig. 2.10 (with n = 5), for any fixed x, the joint PMF pXY(x, y) becomes a slice as a function of y, which shares the same shape as that of the above pY|X(y|x).
For two jointly continuous random variables X and Y, the conditional PDFs are defined in an analogous way as

$$f_{X|Y}(x \mid y) = \frac{f_{XY}(x, y)}{f_Y(y)}, \qquad f_{Y|X}(y \mid x) = \frac{f_{XY}(x, y)}{f_X(x)} \qquad (2.57)$$
Two jointly continuous random variables X and Y are independent if and only if
fXY(x, y) = fX(x)fY(y). As shown in Fig. 2.11, a slice of fXY(x, y) for a fixed x (i.e., the
line boundary between two surface areas) shares the same shape as the conditional
PDF.
In the case of a single random variable, we use its variance to describe the extent to
which samples deviate from the mean of this variable. In the case of two random
variables, we need to measure how much the two variables deviate together, or the
cooperative deviation. For that purpose, we define the covariance between two
random variables X and Y as

$$\mathrm{Cov}(X, Y) = E\left[(X-\mu_X)(Y-\mu_Y)\right] = E(XY) - \mu_X \mu_Y$$

Normalizing the covariance by the product of the two standard deviations gives the correlation coefficient

$$\rho_{XY} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y} = E\left[\left(\frac{X-\mu_X}{\sigma_X}\right)\left(\frac{Y-\mu_Y}{\sigma_Y}\right)\right] \qquad (2.60)$$
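In practice, the covariance matrix and correlation coefficient are usually estimated from paired samples. The added sketch below uses NumPy on a few of the (ef′, c) pairs of the kind listed in Table 2.1 of the exercises.

```python
import numpy as np

# Sample covariance matrix and correlation coefficient from paired data
# (the first few (ef', c) pairs of Table 2.1).
x = np.array([0.022, 0.071, 0.146, 0.185, 0.196, 0.215])   # ef'
y = np.array([0.289, 0.370, 0.450, 0.448, 0.452, 0.460])   # c

cov = np.cov(x, y)             # 2x2 covariance matrix, 1/(M-1) normalization
rho = np.corrcoef(x, y)[0, 1]  # correlation coefficient rho_XY
print(cov)
print(rho)
```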
The bivariate normal or Gaussian distribution is a widely used joint distribution that
can be easily extended to the multivariate case. In order to define a general bivariate
normal distribution, let us first define a standard bivariate normal distribution as
$$\phi(u, v; \rho) = \frac{1}{2\pi\sqrt{1-\rho^2}} \exp\left[-\frac{1}{2\left(1-\rho^2\right)}\left(u^2 - 2\rho uv + v^2\right)\right] \qquad (2.61)$$
Fig. 2.12 Scatter plots of random samples from correlated and dependent (a), uncorrelated and
dependent (b), and uncorrelated and independent (c) random variables
where u and v are jointly continuous standard normal variables, and |ρ| < 1 is the
correlation coefficient. Recall from Example 2.14 the following integration

$$\int_{-\infty}^{+\infty} \phi(u, v; \rho)\, dv = \frac{e^{-u^2/2}}{\sqrt{2\pi}} \qquad (2.62)$$

Integrating over u as well gives

$$\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} \phi(u, v; \rho)\, dv\, du = \int_{-\infty}^{+\infty} \frac{e^{-u^2/2}}{\sqrt{2\pi}}\, du \qquad (2.63)$$
This integral of a standard normal PDF has been shown to be one. Thus far, we
have shown that the double integral of the bivariate normal PDF in Eq. (2.61)
equals one, which means the probability that the pair (u, v) falls somewhere in
the 2D plane is one.
A bivariate standard normal PDF surface for ρ = 0 is shown in Fig. 2.13a.
Observe from the figure that the surface exhibits a perfect circular symmetry. In
other words, for all (u, v) combinations on a circle with a certain radius, the standard
normal PDF takes the same values. This can be more clearly seen from the PDF
contour in Fig. 2.13b. Next, let us take one step further to derive Eq. (2.61) under
ρ = 0 as
$$\phi(u, v; \rho = 0) = \frac{1}{2\pi} \exp\left[-\frac{1}{2}\left(u^2 + v^2\right)\right] = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{u^2}{2}\right) \cdot \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{v^2}{2}\right) \qquad (2.64)$$
Fig. 2.13 Bivariate standard normal PDF surface (a) and PDF contour (b): ρ = 0
Fig. 2.14 Bivariate standard normal PDF surface (a) and PDF contour (b): ρ = 0.8
This means that the joint PDF is a product of two marginal standard normal PDFs.
We then note that u and v are independent if ρ = 0. We further note that, if ρ ≠ 0,
the joint PDF is not separable. Thus, jointly continuous standard normal variables
are independent if and only if their correlation coefficient ρ = 0. The joint PDF
surface for ρ = 0.8 is plotted in Fig. 2.14a and the corresponding PDF contour is
plotted in Fig. 2.14b. Note that the circles on which the joint PDF takes constant
values now become ellipses whose axes are the 45° and 135° diagonal lines.
Based on the bivariate standard normal PDF, we can easily define a general
bivariate normal PDF with the mean values μX and μY, the standard deviations σX
and σY, and the correlation coefficient ρXY as

$$f_{XY}(x, y) = \frac{1}{\sigma_X \sigma_Y}\, \phi\!\left(\frac{x-\mu_X}{\sigma_X}, \frac{y-\mu_Y}{\sigma_Y}; \rho_{XY}\right) = \frac{\exp\left\{-\dfrac{1}{2\left(1-\rho_{XY}^2\right)}\left[\left(\dfrac{x-\mu_X}{\sigma_X}\right)^2 - 2\rho_{XY}\left(\dfrac{x-\mu_X}{\sigma_X}\right)\left(\dfrac{y-\mu_Y}{\sigma_Y}\right) + \left(\dfrac{y-\mu_Y}{\sigma_Y}\right)^2\right]\right\}}{2\pi\, \sigma_X \sigma_Y \sqrt{1-\rho_{XY}^2}} \qquad (2.65)$$
By integrating out unneeded variables, we can show that the marginal PDFs
fX(x) and fY(y), respectively, follow normal distributions N(μX, σX²) and N(μY, σY²).
The bivariate normal distribution can be generalized for an N-dimensional random
vector X: Ω → ℝ^N. The joint CDF and PDF for an N-dimensional random vector
X are written as

$$\text{Joint CDF:}\quad F_{\mathbf{X}}(\mathbf{x}) = P\left(\bigcap_{i=1}^{N} \{X_i \le x_i\}\right), \qquad \text{Joint PDF:}\quad f_{\mathbf{X}}(\mathbf{x}) = \frac{\partial^N}{\partial x_1 \cdots \partial x_N} F_{\mathbf{X}}(\mathbf{x}) \qquad (2.66)$$
In particular, if X follows a multivariate normal distribution, its joint PDF takes the form

$$f_{\mathbf{X}}(\mathbf{x}) = (2\pi)^{-N/2}\, \left|\boldsymbol{\Sigma}_{\mathbf{X}}\right|^{-1/2} \exp\left[-\frac{1}{2}\left(\mathbf{x} - \boldsymbol{\mu}_{\mathbf{X}}\right)^{T} \boldsymbol{\Sigma}_{\mathbf{X}}^{-1} \left(\mathbf{x} - \boldsymbol{\mu}_{\mathbf{X}}\right)\right] \qquad (2.67)$$

where μ_X and Σ_X are the mean vector and covariance matrix of X, respectively.
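Evaluating Eq. (2.67) by hand is rarely necessary; scipy.stats provides a multivariate normal object, as in the added sketch below (the mean vector and covariance matrix are illustrative values only).

```python
import numpy as np
from scipy.stats import multivariate_normal

# Joint PDF of Eq. (2.67) for an illustrative 2-dimensional random vector.
mu = np.array([1.0, 2.0])
Sigma = np.array([[1.0, 0.6],
                  [0.6, 2.0]])

mvn = multivariate_normal(mean=mu, cov=Sigma)
print(mvn.pdf([1.5, 1.5]))      # joint PDF at a point
print(mvn.logpdf([1.5, 1.5]))   # log of the joint PDF at the same point
```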
2.7 Exercises
2:1 A drawer contains 10 pairs of loose, unpaired socks. Three pairs are black, 4
are gray, and 3 are red. Answer the following questions:
(1) If you remove 2 of the socks, what is the probability that one is black
and the other is gray?
(2) If you remove 2 of the socks, what is the probability that they will
match?
2:2 Recall the fatigue test example. The sample measurements can be obtained
for the physical quantities in the damage model below.
$$\frac{\Delta\varepsilon}{2} = \frac{\sigma_f'}{E}\left(2N_f\right)^b + \varepsilon_f'\left(2N_f\right)^c$$
Consider a set of 30 measurement data (see Table 2.1) for the fatigue
ductility coefficient (ef′) and exponent (c) used in the strain-life formula.
Answer the following questions:
(1) Compute the sample means and variances of the fatigue ductility
coefficient (ef′) and exponent (c), respectively.
(2) Construct the covariance matrix and find the coefficient of correlation
using the data set given in Table 2.1.
2:3 In a cellular phone company, LCD fracture failures are commonly experi-
enced. To gain a good understanding of the LCD fracture failure, the com-
pany performed a dent test on 50 LCD modules. The test data is given in
Table 2.1 Data for the fatigue ductility coefficient and exponent
ef′ c ef′ c ef′ c ef′ c ef′ c
0.022 0.289 0.253 0.466 0.539 0.630 0.989 0.694 1.611 0.702
0.071 0.370 0.342 0.531 0.590 0.621 1.201 0.690 1.845 0.760
0.146 0.450 0.353 0.553 0.622 0.653 1.304 0.715 1.995 0.759
0.185 0.448 0.354 0.580 0.727 0.635 1.388 0.717 2.342 0.748
0.196 0.452 0.431 0.587 0.729 0.645 1.392 0.716 3.288 0.821
0.215 0.460 0.519 0.655 0.906 0.703 1.426 0.703 6.241 0.894
Table 2.2 where the test data identifies the failure displacement (df) and
failure force (Ff). Answer the following questions:
(1) Compute the sample means and variances of the failure displacement
(df) and failure force (Ff).
(2) Construct the covariance matrix and find the coefficient of correlation
using the data set given in Table 2.2.
2:4 During a manufacturing process that lasts 10 days, 15 units are randomly
sampled each day from the production line to check for defective units.
Based on historical information, it is known that the probability of a
defective unit is 0.05. Any time that two or more defectives are found in the
sample of 15, the process of that day is stopped.
(1) What is the probability that, of the 10 total days, 3 days will have the
production stopped?
(2) Given (1), what is the probability that the process is stopped in the first
2 days?
2:5 Customers arrive in a certain store according to a Poisson process with a rate
of k = 4 per hour (the number of customers arriving at any time interval with
a length t follows a Poisson distribution with the parameter kt). Given that
the store opens at 9:00 am, answer the following questions.
(1) What is the probability that exactly 1 customer has arrived by 9:30 am?
(2) Given (1), what is the probability that a total of 5 customers have
arrived by 11:30 am?
2:6 Let X and Y be two discrete random variables that take any of the values 1, 2,
or 3. Their joint PMF pXY(x, y) is given by the following matrix, with pXY(x,
y) being the element on the xth row and the yth column
Table 2.2 Failure displacement/force data (from the LCD module dent test)
df Ff df Ff df Ff df Ff df Ff
1.0105 2.0428 1.1680 1.9648 1.2717 1.8874 1.2233 2.2746 1.0946 2.7343
0.6915 2.6993 0.6809 2.5530 0.6093 2.4002 1.1010 2.6674 1.0367 1.8956
1.4959 1.9897 0.6728 1.9722 0.6436 2.5901 0.9569 1.8998 1.4014 2.4851
0.8989 2.7379 1.2995 1.3366 1.2011 2.7858 0.6554 2.7567 1.3191 2.1417
0.8676 2.7248 1.4146 1.7351 1.1431 1.9160 1.3022 2.4822 1.2609 1.3805
0.6558 2.0972 1.0804 2.7867 1.0735 2.5905 0.9038 2.5498 0.6462 2.3304
0.8684 2.4429 0.6982 2.4016 1.2567 2.3779 1.1471 2.2240 0.6656 2.0282
0.6417 2.0810 1.3432 2.3986 1.2365 2.2008 1.2671 2.7754 0.6797 1.9377
1.0549 1.9513 0.9043 2.0410 1.3032 2.5604 0.6943 2.3596 1.3185 2.3303
1.2853 2.1653 0.8646 2.1482 0.8592 2.0957 0.7151 2.7484 1.4487 2.0789
        [  0     0    1/8 ]
        [ 1/2   1/4    0  ]
        [  0     0    1/8 ]
likelihood estimation, and statistical tests) can be used to derive such a probability
distribution. These two categories of methods are separately discussed in this
section.
Graphical display of sample data enables visual examination of the data, which
often gives insights useful for choosing an appropriate probability distribution to
model the uncertainty of the population from which the data are sampled. Two
graphical methods, namely histograms and probability plotting, have proved their
usefulness in statistical data analysis.
Histogram
A histogram visually shows the frequency distribution of a given sample of
observations. A histogram can be constructed using the following steps:
Step 1: Specify the number of bins nb based on the number of observations M,
subdivide the range of data into nb equal intervals, and specify the boundaries of
these intervals.
Step 2: Count the number of observations falling into each interval as the frequency
in the interval and, if needed, calculate the normalized frequency by dividing the
observed frequency in the interval by the total number of observations M.
Step 3: Plot the histogram by drawing, above each interval, a rectangle whose width
is the length of the interval and whose height is the frequency (or normalized
frequency) corresponding to the interval.
The number of bins nb is very critical in constructing an informative histogram.
Generally speaking, a choice of nb between 5 and 20 often gives satisfactory results
in engineering practice. An empirical square root relationship between nb and the
number of observations M (i.e., nb = M1/2) can be used as a general guideline in
determining the number of bins.
We can practice the above-mentioned three steps using the compressive
strength data in Table 3.1. The data were obtained from testing 80 Aluminum–
Lithium alloy specimens. In what follows, we demonstrate the process of creating a
histogram in a step-by-step manner.
Step 1: Using the empirical square root relationship, we find n_b = M^{1/2} = 80^{1/2} ≈ 9.
Therefore, the raw data are subdivided into 9 equal intervals. The minimum and
maximum strength values, 76 and 245, are then rounded off to 70 and 250,
respectively. Thus, the total length of the intervals is 180 with 9 bins, which results
in the width of each bin being 20.
Step 2: The number of observations falling into each interval and the resultant
frequency distribution are then counted and summarized, as shown in Table 3.2.
Table 3.1 Compressive strength data (unit: psi) from 80 Aluminum–Lithium alloy specimens
105 221 183 186 121 181 180 143
97 154 153 174 120 168 167 141
245 228 174 199 181 158 176 110
163 131 154 115 160 208 158 133
207 180 190 193 194 133 156 123
134 178 76 167 184 135 229 146
218 157 101 171 165 172 158 169
199 151 142 163 145 171 148 158
160 175 149 87 160 237 150 135
196 201 200 176 150 170 118 149
The normalized frequency is also computed by dividing the frequency by the total
number of observations.
Step 3: The histogram is graphed, as shown in Fig. 3.1a, using the frequency values
in Table 3.2. Also plotted is the normalized histogram, shown in Fig. 3.1b, whose
total area is approximately 1.0.
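The three histogram-construction steps can also be scripted. The sketch below is a minimal illustration that mirrors the worked example (M = 80 observations, 9 equal bins over [70, 250]); it uses synthetic normal samples, with mean and standard deviation chosen to roughly mimic the compressive strength data, as an assumed stand-in for Table 3.1.

# A minimal sketch (not from the book) of Steps 1-3 for building a histogram.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
data = rng.normal(loc=162.66, scale=33.77, size=80)  # assumed surrogate for Table 3.1

# Step 1: number of bins from the square-root rule, edges over [70, 250]
M = len(data)
nb = int(round(np.sqrt(M)))           # 80**0.5 ~ 9 bins
edges = np.linspace(70, 250, nb + 1)  # 9 equal intervals of width 20

# Step 2: frequency and normalized frequency in each interval
freq, _ = np.histogram(data, bins=edges)
norm_freq = freq / M

# Step 3: plot the normalized histogram
plt.bar(edges[:-1], norm_freq, width=np.diff(edges), align='edge', edgecolor='k')
plt.xlabel('Compressive strength (psi)')
plt.ylabel('Normalized frequency')
plt.show()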
A histogram graphically informs an engineer of the properties (e.g., central
tendency, dispersion, skewness) of the distribution of the sample data. These
properties often give insights into the choice of a probability distribution to model
the uncertainty of the population. As can be seen from Fig. 3.1, the histogram
appears to be symmetric and bell-shaped, which provides evidence that the normal
distribution is a good choice to represent the population of compressive strength
measurements. The good match between the histogram and the normal fit
strengthens the validity of this choice. Detailed information regarding how to fit a
normal distribution and how to quantitatively validate this fit will be discussed in
subsequent sections.
The histogram in Fig. 3.1 possesses a single mode where the maximum value is
taken. This type of probability distribution is referred to as a unimodal distribution.
In engineering practice, we may also encounter probability distributions with more
than one mode. An example of a bimodal distribution is shown in Fig. 3.2. The
Fig. 3.1 Histogram (a) and normalized histogram (b) of compressive strength
random variable is the power loss due to the friction between the piston ring and
cylinder liner, oil consumption, blow-by, and/or liner wear rate in a V6 gasoline
engine. Compared to a unimodal distribution, a bimodal distribution is more
complicated and more challenging to analyze. Reliability analysis involving this
type of PDF will be discussed in detail in Chap. 5.
Solution
In conventional statistical inference, probability is interpreted as normalized
frequency. Thus, the reliability can be estimated by computing the normalized
frequency of finding a sample alloy specimen whose compressive strength
is larger than 110 psi, expressed as

R ≈ (number of specimens with compressive strength larger than 110 psi)/M = 74/80 = 0.925
Probability Plotting
As mentioned earlier, the visual display of the sample distribution provided by a
histogram provides insights into which probability distribution offers a reasonable
representation of the uncertainty in the sample data. However, for a problem with
sample data of a small to moderate size, the histogram might give a misleading
indication on the underlying distribution. In such cases, the underlying distribution
can be better identified by plotting the sample data along the x-axis and the cor-
responding empirical cumulative probability values along the y-axis. This graphical
method is referred to as probability plotting. It assumes a hypothesized distribution
and uses a probability paper that is constructed according to the hypothesized
distribution. Probability paper is commonly used for the normal, lognormal, and
Weibull distributions. If the empirical cumulative curve follows a straight line on the
probability paper, it can be concluded that the underlying distribution conforms to
the hypothesis.
The process of constructing a probability plot consists of three steps. First, the
sample of observations is sorted in ascending order. The sorted observations are
denoted as x1, x2, …, xM. Next, the empirical cumulative probability values are
computed for xi as (i − 0.5)/M, i = 1, 2, …, M. Finally, the empirical cumulative
probability values are plotted versus the sample values on the probability paper
corresponding to a hypothesized distribution. If the hypothesized distribution cor-
rectly represents the uncertainty of the data, the plotted points should form an
approximately straight line. The further the points deviate from a straight line, the
greater the indication of a departure from the hypothesized distribution. It is usually
a subjective decision to determine whether the data plot forms a straight line.
This process can be illustrated using the compressive strength data in Table 3.1.
Let us, for example, investigate whether the data follow a normal distribution by
using the normal probability plot. First, we arrange the data in ascending order and
compute the empirical cumulative probability values, as summarized in Table 3.3.
Then, we plot (i − 0.5)/M and xi on a normal probability paper, as shown in
Fig. 3.3a. A straight line is also plotted to help judge whether the data plot follows a
Table 3.3 Empirical cumulative probability values for compressive strength data
xi i (i − 0.5)/M xi i (i − 0.5)/M xi i (i − 0.5)/M xi i (i − 0.5)/M
76 1 0.0063 145 21 0.2562 163 41 0.5062 181 61 0.7562
87 2 0.0187 146 22 0.2687 163 42 0.5188 183 62 0.7688
97 3 0.0313 148 23 0.2813 165 43 0.5313 184 63 0.7813
101 4 0.0437 149 24 0.2938 167 44 0.5437 186 64 0.7937
105 5 0.0563 149 25 0.3063 167 45 0.5563 190 65 0.8063
110 6 0.0688 150 26 0.3187 168 46 0.5687 193 66 0.8187
115 7 0.0813 150 27 0.3312 169 47 0.5813 194 67 0.8313
118 8 0.0938 151 28 0.3438 170 48 0.5938 196 68 0.8438
120 9 0.1063 153 29 0.3563 171 49 0.6062 199 69 0.8562
121 10 0.1187 154 30 0.3688 171 50 0.6188 199 70 0.8688
123 11 0.1313 154 31 0.3812 172 51 0.6312 200 71 0.8812
131 12 0.1437 156 32 0.3937 174 52 0.6438 201 72 0.8938
133 13 0.1563 157 33 0.4063 174 53 0.6563 207 73 0.9063
133 14 0.1688 158 34 0.4188 175 54 0.6687 208 74 0.9187
134 15 0.1812 158 35 0.4313 176 55 0.6813 218 75 0.9313
135 16 0.1938 158 36 0.4437 176 56 0.6937 221 76 0.9437
135 17 0.2062 158 37 0.4562 178 57 0.7063 228 77 0.9563
141 18 0.2188 160 38 0.4688 180 58 0.7188 229 78 0.9688
142 19 0.2313 160 39 0.4813 180 59 0.7312 237 79 0.9812
143 20 0.2437 160 40 0.4938 181 60 0.7438 245 80 0.9938
Fig. 3.3 Normal probability plots of compressive strength data using cumulative probability
values (a) and standard normal scores (b)
straight line. Since all the points appear to lie around this straight line, it can be
concluded that the compressive strength follows, at least approximately, a normal
distribution.
It is worth noting that we can also build a probability plot on a standard two-
dimensional graph paper. This can be done by transforming the CDF values of x_i to
the corresponding standard normal scores z_i. This transformation can be expressed
as

z_i(x_i) = \Phi^{-1}\left(P(Z \le z_i)\right) = \Phi^{-1}\left(\frac{i - 0.5}{M}\right)    (3.1)

where Φ^{-1} is the inverse standard normal CDF. The standard normal scores z_i are
plotted against the sample values x_i in Fig. 3.3b. This normal probability plot makes
the empirical cumulative curve linear by transforming the cumulative probabilities
of x_i, while the one in Fig. 3.3a does so by adjusting the scale of the y-axis. Note
that these two plots are virtually equivalent.
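A minimal sketch of this transformation, assuming synthetic stand-in data for Table 3.1, is given below; it sorts the sample, computes (i − 0.5)/M, converts these probabilities to standard normal scores with the inverse CDF, and plots them against the sorted sample values.

# A minimal sketch (not from the book) of the normal probability plot via Eq. (3.1).
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
data = rng.normal(loc=162.66, scale=33.77, size=80)  # assumed surrogate for Table 3.1

# Step 1: sort the observations
x_sorted = np.sort(data)
M = len(x_sorted)
i = np.arange(1, M + 1)

# Step 2: empirical cumulative probabilities (i - 0.5)/M
p = (i - 0.5) / M

# Step 3: standard normal scores z_i = Phi^{-1}((i - 0.5)/M)
z = norm.ppf(p)

plt.plot(x_sorted, z, 'o')                                           # data points
plt.plot(x_sorted, (x_sorted - x_sorted.mean()) / x_sorted.std(ddof=1), '-')  # reference line
plt.xlabel('Compressive strength (psi)')
plt.ylabel('Standard normal score z_i')
plt.show()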
Step 1: Equate the first population moment E(X) to the first sample moment (sample
mean).
Step 2: Equate the second population moment E[(X − µ)2] to the second sample
moment (sample variance).
Example 3.2 Suppose we have a set of random samples x_1, x_2, …, x_M from a
normal distribution with the PDF

f_X(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right]

Estimate the distributional parameters μ and σ² using the method of moments.

Solution
The first population moment (population mean) is E(X) = μ. Equating it to the
first sample moment (sample mean) gives

E(X) = \mu = \frac{1}{M}\sum_{i=1}^{M} x_i

Next, equating the second theoretical moment about the mean with the cor-
responding sample moment gives:

E\left[(X-\mu)^2\right] = \sigma^2 = \frac{1}{M}\sum_{i=1}^{M}\left(x_i - \hat{\mu}\right)^2
Table 3.4 Mean and standard deviation as functions of distributional parameters

Discrete distributions
  Bernoulli:  p_X(x; p_0) = p_0^x (1 − p_0)^{1−x},  x = 0, 1
              μ_X = p_0,  σ_X² = p_0(1 − p_0)
  Binomial:   p_X(x; n, p_0) = C(n, x) p_0^x (1 − p_0)^{n−x},  x = 0, 1, 2, …, n
              μ_X = n p_0,  σ_X² = n p_0 (1 − p_0)
  Poisson:    p_X(x) = λ^x e^{−λ}/x!,  x = 0, 1, 2, …
              μ_X = λ,  σ_X² = λ
  Geometric:  p_X(x) = (1 − p)^{x−1} p,  x = 1, 2, …
              μ_X = 1/p,  σ_X² = (1 − p)/p²

Continuous distributions
  Normal:     f_X(x) = (1/(√(2π) σ)) exp[−(1/2)((x − μ)/σ)²]
              μ_X = μ,  σ_X² = σ²
  Lognormal:  f_X(x) = (1/(√(2π) σ x)) exp[−(1/2)((ln x − μ)/σ)²]
              μ_X = exp(μ + σ²/2),  σ_X² = [exp(σ²) − 1] exp(2μ + σ²)
Finally, we need to solve the two equations for the two parameters. In this
particular case, the equations appear to have already been solved for μ and σ².
We can thus obtain the method of moments estimators of the parameters as

\hat{\mu} = \frac{1}{M}\sum_{i=1}^{M} x_i, \qquad
\hat{\sigma}^2 = \frac{1}{M}\sum_{i=1}^{M}\left(x_i - \hat{\mu}\right)^2
L(\theta) = f_X(x_1; \theta)\,f_X(x_2; \theta)\cdots f_X(x_M; \theta) = \prod_{i=1}^{M} f_X(x_i; \theta)    (3.2)

For convenience, we can use the logarithm of the likelihood function, namely the
log-likelihood function, expressed as

\ln L(\theta) = \sum_{i=1}^{M} \ln f_X(x_i; \theta)    (3.3)
The maximum likelihood estimator of θ is the value that maximizes the likelihood function
above; it can be obtained by equating the first-order derivative of this function to zero. In cases
where multiple distributional parameters need to be estimated, the likelihood func-
tion becomes a multivariate function of the unknown distributional parameters. We
can find the maximum likelihood estimators of these parameters by equating the
corresponding partial derivatives to zero and solving the resultant set of equations.
Example 3.3 Suppose we have a set of random samples x_1, x_2, …, x_M from a
normal distribution with the PDF

f_X(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right]

Find the maximum likelihood estimators of μ and σ².

Solution
First, the likelihood function of the random samples is derived as

L(\mu, \sigma) = \prod_{i=1}^{M} f_X(x_i; \mu, \sigma)
             = \prod_{i=1}^{M} \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{1}{2}\left(\frac{x_i-\mu}{\sigma}\right)^2\right]
             = \left(2\pi\sigma^2\right)^{-M/2}\exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{M}\left(x_i-\mu\right)^2\right]

The log-likelihood function then reads

\ln L(\mu, \sigma) = -\frac{M}{2}\ln\left(2\pi\sigma^2\right) - \frac{1}{2\sigma^2}\sum_{i=1}^{M}\left(x_i-\mu\right)^2

Next, we take the partial derivatives with respect to μ and σ²:

\frac{\partial \ln L(\mu, \sigma)}{\partial \mu} = \frac{1}{\sigma^2}\sum_{i=1}^{M}\left(x_i-\mu\right)
\frac{\partial \ln L(\mu, \sigma)}{\partial \sigma^2} = -\frac{M}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{M}\left(x_i-\mu\right)^2

Finally, we equate the derivatives to zero and solve the resultant set of
equations to obtain the maximum likelihood estimators

\hat{\mu} = \frac{1}{M}\sum_{i=1}^{M} x_i, \qquad
\hat{\sigma}^2 = \frac{1}{M}\sum_{i=1}^{M}\left(x_i - \hat{\mu}\right)^2
Note that the maximum likelihood estimators are the same as the estimators
from the method of moments (see Example 3.2).
Using this information, we can graphically demonstrate the basic idea of the
MLE. First, we randomly generate 100 samples from a normal distribution with
l = 3.0 and r = 1.0. We then fix r at 1.0 and plot the log-likelihood values against
l in Fig. 3.4. Observe that the log-likelihood function has a maximum value around
l = 3.0, which indicates that the maximum likelihood estimator of l is approxi-
mately equal to its true value.
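The sketch below reproduces this demonstration under the same assumptions (100 samples from a normal distribution with μ = 3.0 and σ = 1.0, with σ fixed at 1.0); it is an illustrative script, not code from the book.

# A minimal sketch (not from the book) of the graphical MLE demonstration.
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
samples = rng.normal(loc=3.0, scale=1.0, size=100)

mu_grid = np.linspace(1.0, 5.0, 200)
log_lik = [np.sum(norm.logpdf(samples, loc=mu, scale=1.0)) for mu in mu_grid]

mu_hat = mu_grid[np.argmax(log_lik)]
print('approximate MLE of mu:', mu_hat)   # close to the sample mean and to 3.0

plt.plot(mu_grid, log_lik)
plt.axvline(mu_hat, linestyle='--')
plt.xlabel('mu')
plt.ylabel('log-likelihood')
plt.show()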
For the case of a discrete probability distribution, the likelihood function of the
samples becomes the probability of obtaining the samples x1, x2, …, xM, expressed
as
L(\theta) = P(X_1 = x_1, X_2 = x_2, \ldots, X_M = x_M) = \prod_{i=1}^{M} p_X(x_i; \theta)    (3.4)

where p_X(x; θ) is the PMF of the discrete random variable and θ is the unknown
distributional parameter. Observe that the maximum likelihood estimator of the
parameter of a discrete distribution maximizes the probability of obtaining the
sample values x_1, x_2, …, x_M.
Example 3.4 Suppose the samples x_1, x_2, …, x_M are randomly drawn from a
Bernoulli distribution with the PMF

p_X(x; p_0) = \begin{cases} p_0^x (1-p_0)^{1-x}, & x = 0, 1 \\ 0, & \text{otherwise} \end{cases}

Find the maximum likelihood estimator of p_0.

Solution
First, we derive the log-likelihood function of the random samples as

\ln L(p_0) = \sum_{i=1}^{M} \ln p_X(x_i; p_0)
           = \ln p_0 \sum_{i=1}^{M} x_i + \ln(1-p_0)\left(M - \sum_{i=1}^{M} x_i\right)

Finally, we equate the derivative to zero and solve the resultant equation to
obtain the maximum likelihood estimator

\hat{p}_0 = \frac{1}{M}\sum_{i=1}^{M} x_i
Note that the estimator is the same as the estimator from the method of
moments (see Table 3.4).
where F_X(x) is the CDF of the hypothesized distribution being tested, and x_i^u and x_i^l
are respectively the upper and lower limits of the ith interval. It follows that the test
statistic

\chi^2 = \sum_{i=1}^{n_b} \frac{(O_i - E_i)^2}{E_i}    (3.6)

approximately follows a chi-square distribution with n_b − k − 1 degrees of freedom,
where k is the number of distributional parameters estimated from the sample data.
The null hypothesis in favor of the hypothesized distribution is rejected at the
significance level α if

\chi^2 = \sum_{i=1}^{n_b} \frac{(O_i - E_i)^2}{E_i} > \chi^2_{\alpha,\,n_b-k-1}    (3.7)

where \chi^2_{\alpha,\,n_b-k-1} is the chi-square critical value with n_b − k − 1 degrees of freedom
and the significance level α.
The chi-square goodness-of-fit test can be applied to both continuous and dis-
crete distributions. Since this test involves the use of binned data (i.e., data cate-
gorized into classes), the value of the chi-square test statistic varies depending on
how the data are binned. Furthermore, this test requires a reasonably large number¹
of observations in order for the chi-square approximation to be valid.
Example 3.5 Use the chi-square goodness-of-fit test to determine whether the
compressive strength data of the Aluminum–Lithium alloy in Table 3.1
follow a normal distribution (α = 0.05).
¹ A rough rule of thumb concerning the sample size is that there should be at least 50 samples in
order for the chi-square approximation to be valid. A more rigorous approach to sample size
determination involves studying the power of a hypothesis test [1].
Solution
The distributional parameters, μ and σ, of the hypothesized normal dis-
tribution can be directly obtained from the sample data as μ_X = 162.66,
σ_X = 33.77. We then follow Table 3.2 to divide the sample data into intervals
and pool end intervals with frequencies less than 5 with neighboring intervals
to make the frequencies at least 5. The distribution of observed and expected
frequencies is shown in the table below. We then execute the hypothesis
testing step-by-step, computing the test statistic

\chi^2 = \sum_{i=1}^{n_b} \frac{(O_i - E_i)^2}{E_i}
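The chi-square calculation in Eqs. (3.6) and (3.7) can be scripted as follows. This sketch uses synthetic stand-in data and an assumed set of bin edges; the observed and expected counts, the test statistic, and the critical value at α = 0.05 are computed as described above.

# A minimal sketch (not from the book) of the chi-square goodness-of-fit test.
import numpy as np
from scipy.stats import norm, chi2

rng = np.random.default_rng(0)
data = rng.normal(loc=162.66, scale=33.77, size=80)  # assumed surrogate sample
M = len(data)

mu_hat, sigma_hat = data.mean(), data.std(ddof=1)

# Assumed bin edges (outer intervals open-ended)
edges = np.array([-np.inf, 110, 130, 150, 170, 190, 210, np.inf])
observed = np.array([np.sum((data > lo) & (data <= hi))
                     for lo, hi in zip(edges[:-1], edges[1:])])

# Expected counts under the fitted normal: E_i = M * [F(x_i^u) - F(x_i^l)]
cdf_vals = norm.cdf(edges, loc=mu_hat, scale=sigma_hat)
expected = M * np.diff(cdf_vals)

chi2_stat = np.sum((observed - expected) ** 2 / expected)

nb, k = len(observed), 2            # k = 2 estimated parameters (mu, sigma)
critical = chi2.ppf(0.95, nb - k - 1)

print('chi-square statistic:', chi2_stat, 'critical value:', critical)
print('reject H0' if chi2_stat > critical else 'fail to reject H0')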
S_X(x) = \frac{1}{M}\sum_{i=1}^{M} I(x_i \le x)    (3.8)

where I(⋅) is an indicator function that takes the value 1 if x_i ≤ x, and 0 otherwise.
As shown in Fig. 3.5, the empirical CDF is essentially a step function that exhibits a
step increase of 1/M at each sample point. Now that we have the definition of the
empirical CDF, the K-S distance can then be defined as the maximum distance
between the empirical and theoretical CDF curves, expressed as

D_{KS} = \max_{x}\left|S_X(x) - F_X(x)\right|
Fig. 3.5 Empirical CDF S_X(x_i), theoretical CDF F_X(x_i), and the K-S distance D_KS, illustrated at sample points x_1, …, x_5
Here, the K-S distance D_KS is the test statistic whose probability distribution
depends only on the sample size M. With a specified significance level α, we can
obtain the critical value from a K-S look-up table. The null hypothesis in favor of
the assumed distribution should be rejected if

D_{KS} > D_{KS,\alpha,M}

where D_{KS,α,M} is the K-S critical value with the sample size M and the significance
level α.
A distinctive feature of the K-S test is that, unlike the chi-square test, it does not
require a large sample size for the distribution of the test statistic to be valid.
However, the K-S test is only applicable to continuous distributions. More details
regarding these two statistical tests can be found in [2] and [3], respectively.
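A minimal sketch of the K-S computation is shown below, again with synthetic stand-in data; the hand-computed K-S distance is cross-checked against scipy.stats.kstest. Note that estimating μ and σ from the same data makes the tabulated critical values only approximate.

# A minimal sketch (not from the book) of the K-S goodness-of-fit test.
import numpy as np
from scipy.stats import norm, kstest

rng = np.random.default_rng(0)
data = rng.normal(loc=162.66, scale=33.77, size=80)  # assumed surrogate sample
M = len(data)

mu_hat, sigma_hat = data.mean(), data.std(ddof=1)

# Empirical CDF evaluated just before and at each sorted sample point
x_sorted = np.sort(data)
F = norm.cdf(x_sorted, loc=mu_hat, scale=sigma_hat)
S_upper = np.arange(1, M + 1) / M     # S_X(x_i) at the step
S_lower = np.arange(0, M) / M         # S_X just before the step

D_KS = max(np.max(S_upper - F), np.max(F - S_lower))
print('K-S distance (manual):', D_KS)

# SciPy computes the same supremum distance
stat, p_value = kstest(data, 'norm', args=(mu_hat, sigma_hat))
print('K-S distance (scipy):', stat, 'p-value:', p_value)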
Example 3.6 Use the K-S goodness-of-fit test to determine whether the
compressive strength data of the Aluminum–Lithium alloy in Table 3.1 can
be represented by a normal distribution (α = 0.05).
Solution
As mentioned in Example 3.5, the distributional parameters of the hypothe-
sized normal distribution can be obtained from the samples as μ_X = 162.66,
σ_X = 33.77. Based on Eq. (3.8), we can compute the empirical CDF S_X(x),
which is plotted along with the theoretical CDF in Fig. 3.6.
We then execute the hypothesis testing step-by-step.
1. Specify the null and alternative hypotheses as
H0 The compressive strength data follow a normal distribution.
H1 The compressive strength data do not follow a normal distribution.
2. The significance level α = 0.05.
3. The K-S test statistic is defined as
Bayes’ theorem (also known as Bayes’ rule or Bayes’ law) is developed based on
the definition of conditional probability. If A and B denote two stochastic events,
P(A|B) denotes the probability of A conditional on B. Bayes' theorem relates the
conditional and marginal probabilities of A and B as
P(A|B) = \frac{P(B|A)\,P(A)}{P(B)}    (3.11)
The theorem simply states that the conditional probability of A given B is equal to the
conditional probability of B given A, multiplied by the marginal probability of
A and divided by the marginal probability of B [4]. It is straightforward to derive
Bayes' theorem in Eq. (3.11) based on the probability rules introduced in Chap. 2.
From the multiplication theorem, we know that P(A,B) = P(A|B)P(B), and simi-
larly, P(B,A) = P(B|A)P(A). Since P(A,B) = P(B,A), the right-hand sides of these two
equations equal each other, which gives P(A|B)P(B) = P(B|A)P(A). Dividing both
sides by P(B) leaves us with Eq. (3.11).
The terms in Bayes’ theorem are defined as follows:
• P(A) is the prior probability or marginal probability of A. The prior probability
can be treated as the subjective probability that expresses our belief prior to the
occurrence of B. It is “prior” in the sense that it does not take into account any
information about B.
• P(B) is the marginal probability of B, and acts as a normalizing constant. Based
on the total probability theorem, this quantity can be computed as the sum of the
conditional probabilities of B under all possible (mutually exclusive) events Ak
(included in a set SA) in the sample space. This can be mathematically expressed
for a discrete sample space as:
P(B) = \sum_{A_i \in S_A} P(B|A_i)\,P(A_i)    (3.12)
Example 3.7 Assume there are three doors (D1, D2, and D3); behind two of
the doors are goats and behind the third door is a new car. The three doors are
equally likely to have the car. Thus, the probability of getting the car by
picking each door at the beginning of the game is simply 1/3. After you have
picked a door, say D1, instead of showing you what is behind that door,
Monty opens another door, say D2, which reveals a goat. At this point, Monty
gives you the opportunity to switch the door from D1 to D3. What should you
do, given that Monty is trying to let you get a goat?
Solution
The question is whether the probability of getting the car by picking the
door D1 is the same as that by picking D3, or mathematically, whether
² A detection rate of 90% means that the test correctly detects defective cases 90% of the time.
³ A false alarm rate of 30% means that, in cases where a bearing unit is not defective, the test
produces an alarm, suggesting the detection of a fault, 30% of the time.
P(\text{defective}\,|\,\text{test+}) = \frac{P(\text{test+}\,|\,\text{defective})\,P(\text{defective})}{P(\text{test+}\,|\,\text{defective})\,P(\text{defective}) + P(\text{test+}\,|\,\text{not defective})\,P(\text{not defective})}    (3.13)
Combining Eqs. (3.11) and (3.12) gives rise to the formula above. Filling in the
prior and conditional probabilities yields:
P(\text{defective}\,|\,\text{test+}) = \frac{0.90 \times 0.20}{0.90 \times 0.20 + 0.30 \times 0.80} \approx 0.4286    (3.14)
Thus, the probability that the unit is defective conditional on the positive test result
is 0.4286. Since this probability is an estimated probability after the data (from the
diagnostic test) is observed, it is termed a posterior probability.
An important contribution of Bayes’ theorem is that it provides a rule on how to
update or revise a prior belief to a posterior belief; this lies at the core of Bayesian
inference. In the bearing example, the reliability or quality engineer may choose to
repeat the diagnostic test (i.e., conduct a second test). After the second test, the
engineer can use the posterior probability of being defective (P = 0.429) in
Eq. (3.14) as the new prior P(defective). By doing so, the engineer has updated the
prior probability of being defective to reflect the result of the first test. If the second
test still gives a positive result, the updated posterior probability of being defective
can be computed as:
P(\text{defective}\,|\,\text{test+}) = \frac{0.90 \times 0.4286}{0.90 \times 0.4286 + 0.30 \times 0.5714} \approx 0.6923    (3.15)
With the second positive test result, we obtain an increase in the posterior proba-
bility from 0.4286 to 0.6923, which means that the added test result (positive) has
increased our belief that the bearing unit might be defective. If the engineer con-
tinues to repeat the test and observe a positive result in each of the repeated tests,
these repeated tests will yield the posterior probabilities shown in Table 3.5.
Bayesian statistics stems from the concept of repeating a test and recomputing
the posterior probability of interest based on the results of the repeated testing. In
the context of reliability analysis, the Bayesian approach begins with a prior
probability of the system success event, and updates this prior probability with new
data to obtain a posterior probability. The posterior probability can then be used as a
prior probability in subsequent analysis. This may be an appropriate strategy for
Table 3.5 Posterior probabilities of being defective after repeated tests with positive results
Test number            1      2      3      4      5      6      7      8      9      10
Posterior probability  0.4286 0.6923 0.8710 0.9529 0.9838 0.9945 0.9982 0.9994 0.9998 0.9999
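The repeated updating that produces Table 3.5 takes only a few lines of code; the sketch below simply re-applies Eq. (3.13), with the posterior of one test used as the prior of the next.

# A minimal sketch (not from the book) reproducing the posterior probabilities in Table 3.5.
p_detect = 0.90       # P(test+ | defective)
p_false_alarm = 0.30  # P(test+ | not defective)

prior = 0.20          # prior probability of being defective
for test in range(1, 11):
    posterior = (p_detect * prior) / (p_detect * prior + p_false_alarm * (1 - prior))
    print(f'Test {test}: posterior = {posterior:.4f}')
    prior = posterior  # the posterior becomes the prior for the next test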
In Bayesian statistics, the quantities in Bayes’ theorem in Eq. (3.11) are typically
expressed in the form of probability distributions rather than point probabilities. In
the bearing example, we assume the prior probability of being defective after
2 years of operation is a point probability of exactly 0.20. However, the uncertainty
in bearing parameters and design variables, as well as our imperfect knowledge of
the reliability of the bearing population, give rise to a certain degree of unit-to-unit
variation in this prior probability. Thus, it is unreasonable to use a precise point
value to represent this probability. Instead, a probability distribution should be used
for the prior defect probability to capture our uncertainty about its true value.
Similarly, the point values for the conditional probabilities should be replaced with
probability distributions to represent our uncertainty about their true values. The
inclusion of the prior and conditional probability distributions eventually produces
a posterior probability distribution that is no longer a single quantity. This posterior
distribution combines the positive result observed from the diagnostic test with the
prior probability distribution to produce an updated posterior distribution that
expresses our knowledge of the probability that the bearing unit is defective.
Let us now express Bayes’ theorem in terms of continuous probability distri-
butions. Let X be a continuous random variable with a PDF f(x, h), where h is the
distributional parameter (e.g., the mean and standard deviation of a normally dis-
tributed variable). The goal of Bayesian inference is to represent prior uncertainty of
a distributional parameter with a probability distribution and to update this prob-
ability distribution with newly acquired data. The updating procedure yields a
posterior probability distribution of the parameter. This perspective is in contrast
with frequentist inference, which relies exclusively on the data as a whole, with no
reference to prior information. From the Bayesian point of view, the parameter θ is
interpreted as a realization of a random variable Θ with a PDF f_Θ(θ). Based on
Bayes' theorem, the posterior distribution of Θ, given a new observation x, can be
expressed as

f_{\Theta|X}(\theta|x) = \frac{f_{X|\Theta}(x|\theta)\,f_{\Theta}(\theta)}{\int f_{X|\Theta}(x|\theta)\,f_{\Theta}(\theta)\,d\theta}    (3.16)
Example 3.8 Suppose that we have a set of random samples x = {x_1, x_2, …,
x_M} drawn from the normal PDF f_X(x; μ, σ) of a random variable X, where the
mean μ is unknown and the standard deviation σ is known. Assume that the
prior distribution of μ, f_M(μ), is a normal distribution with mean u and
variance s². Determine the posterior distribution of μ, f_{M|X}(μ|x), given the
random observations x.
Solution
First, we compute the conditional probability of obtaining x, given μ, as

f_{X|M}(\mathbf{x}|\mu) = \prod_{i=1}^{M} \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{1}{2}\left(\frac{x_i-\mu}{\sigma}\right)^2\right]
                        = \left(2\pi\sigma^2\right)^{-M/2}\exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{M}\left(x_i-\mu\right)^2\right]

Multiplying the likelihood by the prior density and collecting the terms in μ gives

f_{X|M}(\mathbf{x}|\mu)\,f_M(\mu) = \left(2\pi\sigma^2\right)^{-M/2}\left(2\pi s^2\right)^{-1/2}\exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{M}\left(x_i-\mu\right)^2 - \frac{1}{2s^2}\left(\mu-u\right)^2\right]
   = K_1(x_1, \ldots, x_M; \sigma, u, s)\exp\left[-\frac{1}{2}\left(\frac{M}{\sigma^2}+\frac{1}{s^2}\right)\mu^2 + \left(\frac{M\bar{x}}{\sigma^2}+\frac{u}{s^2}\right)\mu\right]
   = K_2(x_1, \ldots, x_M; \sigma, u, s)\exp\left[-\frac{1}{2}\left(\frac{M}{\sigma^2}+\frac{1}{s^2}\right)\left(\mu - \frac{M s^2 \bar{x} + \sigma^2 u}{M s^2 + \sigma^2}\right)^2\right]

Thus, the posterior distribution f_{M|X}(μ|x) is a normal distribution with mean
(M s² x̄ + σ² u)/(M s² + σ²) and variance σ² s²/(M s² + σ²).
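The closed-form posterior derived above can be verified numerically; the sketch below implements the normal-normal update with illustrative (assumed) values of σ, u, s, and the sample.

# A minimal sketch (not from the book) of the conjugate normal-normal update in Example 3.8.
import numpy as np

sigma = 1.0                 # known data standard deviation (assumed)
u, s = 2.0, 0.5             # prior mean and prior standard deviation (assumed)

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=sigma, size=20)   # observed samples (assumed)
M, xbar = len(x), x.mean()

post_mean = (M * s**2 * xbar + sigma**2 * u) / (M * s**2 + sigma**2)
post_var = (sigma**2 * s**2) / (M * s**2 + sigma**2)

print('posterior mean:', post_mean)     # lies between the prior mean u and xbar
print('posterior std :', np.sqrt(post_var))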
As can be observed in Example 3.8, Bayesian inference and the MLE provide
essentially the same estimate if we have an extremely large sample size. In engi-
neering practice, however, we often have very limited sample data due to the
expense and time demands associated with obtaining the data. In such cases, the
MLE may not give an accurate or even reasonable estimate. In contrast, Bayesian
inference gives a better estimate if we assume a reasonable prior distribution. The
term “reasonable” means that the prior assumption is at least consistent with the
underlying distribution of the population. If there is no such consistency, Bayesian
inference may give an erroneous estimate due to the misleading prior information.
Another important observation we can make from Example 3.8 is that the
posterior distribution shares the same form (i.e., normal distribution) with the prior
distribution. In such cases, we say that the prior is conjugate to the likelihood. If we
have a conjugate prior, the posterior distribution can be obtained in an explicit form.
Conjugacy is desirable in Bayesian inference, because using conjugate priors/
likelihoods with known forms significantly eases the evaluation of the posterior
probability. Looking back at Example 3.8, we note that the normal (or Gaussian)
family is conjugate to itself (or self-conjugate): if the likelihood function is normal,
choosing a normal prior ensures that the posterior is also normal. Other conjugate
Bayesian inference models include the binomial inference, exponential inference,
and Poisson inference. Among these inference models, the binomial inference is the
most widely used. For a binomial likelihood with an unknown success probability p_0,
the conjugate prior of p_0 is a beta distribution B(a, b) with the PDF

f_{P_0}(p_0) = \frac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)}\,p_0^{a-1}\left(1-p_0\right)^{b-1}    (3.17)

Based on the form of the binomial distribution, the likelihood function can be
expressed as

f_{X|P_0}(x|p_0) = C(n, x)\,p_0^{x}\left(1-p_0\right)^{n-x}    (3.18)

Applying Bayes' theorem then yields the posterior distribution

f_{P_0|X}(p_0|x) = \frac{\Gamma(n+a+b)}{\Gamma(x+a)\,\Gamma(n+b-x)}\,p_0^{x+a-1}\left(1-p_0\right)^{n+b-x-1}    (3.19)
The posterior distribution follows the same form (i.e., the beta distribution) as the
prior distribution, which suggests that the beta prior is conjugate to the binomial
likelihood. In Example 3.9, we demonstrate this conjugate inference model with a
simple reliability analysis problem.
Solution
The parameters in this example take the following values: a = 4, b = 4,
x = 8, n = 10. The posterior distribution of p0 can be obtained, according to
Eq. (3.19), as B(x + a, n + b − x), or B(12, 6). The prior and posterior dis-
tributions of p_0 are plotted in Fig. 3.7. The figure shows that the posterior
distribution, which results from a combination of the prior information and the
testing data (evidence), lies between the prior distribution and the maximum
likelihood estimator (which exclusively relies on the testing data).
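The beta-binomial update of Example 3.9 can be visualized with a few lines of code; the sketch below plots the Beta(4, 4) prior, the Beta(12, 6) posterior, and the maximum likelihood estimate x/n = 0.8.

# A minimal sketch (not from the book) of the beta-binomial conjugate update in Example 3.9.
import numpy as np
from scipy.stats import beta
import matplotlib.pyplot as plt

a, b = 4, 4        # prior parameters
x, n = 8, 10       # observed successes and number of tests

p0 = np.linspace(0, 1, 500)
plt.plot(p0, beta.pdf(p0, a, b), label='prior B(4, 4)')
plt.plot(p0, beta.pdf(p0, x + a, n + b - x), label='posterior B(12, 6)')
plt.axvline(x / n, linestyle='--', label='maximum likelihood estimate (0.8)')
plt.xlabel('p0')
plt.ylabel('PDF')
plt.legend()
plt.show()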
In many engineering problems, the conjugacy condition does not hold, and
explicit solutions of posterior distributions cannot be readily obtained through
simple mathematical manipulations. In such cases, we are left to draw random
samples from posterior distributions to approximate the distributions. A commonly
used simulation method for drawing samples from a posterior distribution is
referred to as Markov chain Monte Carlo (MCMC), in which two important sam-
pling techniques, namely the Metropolis–Hastings algorithm and Gibbs sampling,
are often used. An in-depth discussion of these techniques is beyond the scope of
this book. Readers are recommended to refer to [3] for detailed information.
In Bayesian updating, Bayesian inference expressed in Eq. (3.16) is often per-
formed iteratively over time. In other words, after observing the initial set of testing
data, Bayesian inference is performed to obtain the resulting posterior probability,
and this posterior probability can then be treated as a prior probability for com-
puting a new posterior probability as the next set of testing data becomes available.
Figure 3.8 shows the overall procedure of Bayesian updating for a distributional
parameter H. In each updating iteration, Bayesian inference is performed with the
most “up-to-date” prior information and the most recent data. The posterior density
of H after one iteration becomes the prior density for the next iteration. The
capability of continuous updating is an attractive feature of Bayesian statistics that
is useful for parameter estimation with evolving data sets or random variables.
Fig. 3.8 Overall procedure of Bayesian updating: in each updating iteration i, the prior density fΘ(θ) and the likelihood function fX|Θ(x|θ) of the observed data X are combined by the Bayesian updating mechanism to produce the posterior density fΘ|X(θ|x), which serves as the prior density for iteration i + 1
3.3 Exercises
3:1 Recall Problem 2.2 in Chap. 2. Answer the following questions based on the 30
sample data points obtained from the fatigue tests described in that problem.
(1) Use normal, Weibull, and lognormal distributions. Find the most suitable
parameters of the three distributions for the fatigue ductility coefficient
(ef′) and exponent (c) using the MLE method.
(2) Find the most suitable distribution for the data set (ef′, c) using the
chi-square goodness-of-fit test.
(3) Verify the results using the graphical methods described in the chapter (a
histogram and a probability plot).
3:2 Recall Problem 2.3 in Chap. 2. Answer the following questions based on the 50
sample data points obtained from LCD module dent tests.
(1) Use normal, Weibull, and uniform distributions. Find the most suitable
parameters of the three distributions for the failure displacement (df) and
failure force (Ff) using the MLE method.
(2) Find the most suitable distributions for the data set (df, Ff) using the
chi-square goodness-of-fit test.
(3) Verify the results using probability plots.
3:3 Suppose that we are interested in identifying the probability distribution for
the number of cars passing through the main gate of the University of
Maryland per minute. The data have been collected by a group of students and
are shown in Table 3.6.
Table 3.7 Data for 100 electronic components’ time-to-failure (TTF) [minutes]
1703.2 1071.4 2225.8 1826.5 1131.0 2068.9 1573.5 1522.1 1490.7 2226.6
1481.1 2065.1 1880.9 2290.9 1786.4 1867.2 1859.1 1907.5 1791.8 1871.0
1990.4 2024.1 1688.6 1962.7 2191.7 1841.0 1814.1 1918.1 2237.5 1396.8
1692.8 707.2 2101.3 2165.4 1975.2 1961.6 2116.7 1373.0 1798.8 2248.4
1872.3 1597.8 1865.1 742.8 1436.7 1380.8 2258.2 1960.0 2182.8 1772.7
2003.6 1589.4 1988.3 1874.9 1859.0 2051.9 1763.0 1854.6 1974.7 2289.9
1945.7 1774.8 1579.6 1430.5 1855.0 1757.9 1029.3 1707.2 1864.7 1964.8
1719.4 1565.2 1736.8 1759.4 1939.4 2065.7 2258.5 2292.8 1452.5 1692.2
2120.7 1934.8 999.4 1919.9 2162.4 2094.9 2158.2 1884.2 1748.7 2260.3
1040.8 1535.0 1283.4 2267.7 2100.3 2007.9 2499.8 1902.9 1599.6 1567.5
Table 3.8 Data for 100 cutting tools’ time-to-failure (TTF) [minutes]
839.3 838.8 959.3 950.5 873.9 948.3 859.8 898.7 903.6 1031.6
852.4 891.5 965.0 856.1 739.0 895.3 916.8 921.7 1093.3 863.7
965.0 927.1 888.6 918.4 1025.4 811.3 960.4 826.9 875.2 980.7
905.6 982.7 892.0 928.4 918.7 1071.5 824.1 743.9 915.0 1064.0
753.3 787.6 836.0 941.7 951.7 791.8 949.1 874.6 975.8 948.2
1046.2 817.6 939.5 850.7 809.7 936.6 1040.8 947.1 857.9 901.4
952.3 848.5 999.0 1007.7 915.8 931.3 907.1 966.3 810.8 771.2
776.4 913.7 1003.7 978.0 1035.7 1065.8 1107.6 766.0 772.6 999.1
870.0 1007.6 877.7 709.8 958.1 874.1 846.0 746.2 994.0 954.7
916.6 1054.9 917.4 812.6 963.4 1017.5 1122.9 865.1 938.8 837.5
3:5 Suppose it is desired to estimate the failure rate of a machine cutting tool.
A test can be performed to estimate its failure rate. The failure times in
minutes are shown in Table 3.8. Answer the following questions:
(1) Construct a histogram of TTF.
(2) Find a probability distribution model fT(t) and its parameters for the TTF
data (use the MLE method for parameter estimation and the K-S
goodness-of-fit test for distribution selection).
(3) Attempt to update the TTF mean value (θ) with aggregation of 100 TTF
data using Bayesian inference. Assume that the TTF follows a normal
distribution with a standard deviation of σ = 80 and that the prior dis-
tribution f_Θ(θ) of θ follows a normal distribution with the mean u = 1000
and the standard deviation s = 100.
3:6 The TTF of a machine has an exponential distribution with parameter λ.
Assume that the prior distribution for λ is exponential with a mean of 100 h.
We have five observed TTFs from five machines; the average TTF is 1200 h.
(1) Compute the posterior distribution of λ based on our observations.
(2) Based on the posterior distribution, what proportion of the machines will
fail after 1000 h?
References
1. Lenth, R. V. (2001). Some practical guidelines for effective sample size determination. The
American Statistician, 55(3), 187–193.
2. http://www.itl.nist.gov/div898/handbook/eda/section3/eda35e.htm
3. http://www.itl.nist.gov/div898/handbook/eda/section3/eda35g.htm
4. Lynch, S. (2010). Introduction to applied Bayesian statistics and estimation for social
scientists. New York, NY: Springer.
Chapter 4
Fundamentals of Reliability Analysis
Failures of engineered systems (e.g., vehicle, aircraft, and material) lead to sig-
nificant maintenance/quality-control costs, human injuries, and fatalities. Examples
of such system failures can be found in various engineering fields: the Chernobyl
disaster in the former Soviet Union (1986), the collapse of the I-35W Mississippi River Bridge in the
U.S. (2007), the explosion of a compressed natural gas (CNG) bus in the Republic
of Korea (2010), and the lithium-ion battery fire/smoke on Boeing 787 Dreamliners
in the U.S. and Japan (2013). Many system failures can be traced back to various
difficulties in evaluating and designing complex systems under highly uncertain
manufacturing and operational conditions. One of the greatest challenges in design
of an engineered system is to ensure high reliability and maintainability of the
system during its life-cycle. Our attempt to address this challenge begins with the
discussion of the fundamentals of reliability analysis. This discussion will be
separately conducted for time-independent and time-dependent reliability analyses,
with an aim to facilitate more in-depth discussions in later chapters.
where the random vector X = (X1, X2,…, XN)T models uncertainty sources, such as
material properties, geometric tolerances, and loading conditions; G(X) is a system
performance function, and the system success event is E_sys = {G(X) ≤ 0}. The
uncertainty of the vector X further propagates and leads to the uncertainty in the
system performance function G. In reliability analysis, equating the system per-
formance function G to zero, i.e., G = 0, gives us the so-called limit-state function,
which separates the safe region G(X) ≤ 0 from the failure region G(X) > 0.
Depending on the specific problems, a wide variety of system performance func-
tions can be defined to formulate time-independent reliabilities. The most
well-known example is the safety margin between the strength and load of an
engineered system, which will be discussed in Sect. 4.2. The concept of
time-independent reliability analysis in a two-dimensional case is illustrated in
Fig. 4.1. The dashed lines represent the contours of the joint PDF of the two
random variables X1 (manufacturing tolerance) and X2 (operational factors). The
basic idea of reliability analysis is to compute the probability that X is located in the
safe region {G ≤ 0}.
Fig. 4.1 The concept of time-independent reliability analysis in a two-dimensional space (X1: manufacturing tolerance, X2: operational factor), showing the joint PDF fX(x) over the safe region and the failure region G > 0. Reprinted (adapted) with permission from Ref. [1]
where X is the random vector representing engineering uncertainty factors, and the
time-to-failure (TTF) T(X) of the system is defined as the time when the system’s
performance function (or health condition) is worse than a predefined critical value.
The equation above indicates that time-dependent reliability analysis requires
modeling of the underlying TTF distribution. This can be done by using a wide
variety of parametric probability distributions; the most commonly used are
exponential distribution, Weibull distribution, and normal distribution. The relia-
bility functions under these distributions will be discussed in Sect. 4.3.
Consider the most well-known performance function, i.e., the safety margin
between the strength S of an engineered system and the load L on this system. This
performance function takes the following form
G = L - S    (4.3)
The strength S and load L are random in nature and their randomness can be
characterized by two PDFs fS(s) and fL(l), respectively. Under the assumption of
normal distributions, these two PDFs are plotted in Fig. 4.2a. The probability of
failure depends on the intersection (shaped) area of the two PDFs, where the load
on the system might exceed its strength. Let lS and lL, respectively, denote the
means of S and L; let rS and rL, respectively, denote the standard deviations of
S and L. We can then compute the mean and standard deviation of the normally
distributed performance function G as
l G ¼ l L lS
qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ð4:4Þ
rG ¼ r2L þ r2S 2qLS rL rS
Fig. 4.2 PDFs fS(s) and fL(l) of the strength and load with means μS and μL (a), and PDF of the performance function G = L − S, showing the safe region G ≤ 0, the failure region G > 0, the failure probability Pf = 1 − R, and the distance βσG from μG to the limit state G = 0 (b)
R = P(G \le 0) = P\left(\frac{G-(\mu_L-\mu_S)}{\sqrt{\sigma_L^2+\sigma_S^2}} \le \frac{0-(\mu_L-\mu_S)}{\sqrt{\sigma_L^2+\sigma_S^2}}\right)
  = P\left(Z \le \frac{\mu_S-\mu_L}{\sqrt{\sigma_L^2+\sigma_S^2}}\right) = \Phi\left(\frac{\mu_S-\mu_L}{\sqrt{\sigma_L^2+\sigma_S^2}}\right)    (4.5)
Based on the equation above and our intuition, we can reduce the intersection
area and thus increase the reliability through either of the following two strategies:
• Increase the relative distance between the two means: As the relative distance μ_S
− μ_L between the means increases, the numerator of the standard normal value
in Eq. (4.5) increases. Accordingly, the standard normal CDF value, or relia-
bility, increases.
• Decrease the variances of two variables S and L: Reduction in either variance
leads to a reduction in the denominator of the standard normal value in
Eq. (4.5). The decrease of the denominator results in an increase of the standard
normal CDF value, or reliability.
The PDF of the system performance function is plotted in Fig. 4.2b, where the
probability of failure, or one minus reliability, is indicated by the shaded area. We
note that the distance between the mean performance function (safety margin) and
the limit state G = 0 is equal to the standard deviation σ_G multiplied by a factor β.
In reliability analysis, this factor is named the reliability index, and is expressed as

\beta = \Phi^{-1}(R) = \frac{\mu_S-\mu_L}{\sqrt{\sigma_L^2+\sigma_S^2}}    (4.6)
The reliability index provides an alternative measure of reliability from the per-
spective of a standard normal distribution. In Chap. 5, we will see that the reliability
index is a very useful measure for reliability assessment.
Example 4.1 Suppose that the coefficients of variation for the load and
strength are ρ_L = 0.2 and ρ_S = 0.1, respectively. Assume both variables
follow normal distributions. Determine the ratio of means μ_S/μ_L required to
achieve a reliability no less than 0.99.

Solution
According to the Appendix in Chap. 5, the reliability index corresponding to
a 0.99 reliability is β = 2.3263. The reliability index can be mathematically
expressed as

\beta = \frac{\mu_S-\mu_L}{\sqrt{\sigma_L^2+\sigma_S^2}} = \frac{\mu_S/\mu_L - 1}{\sqrt{\rho_L^2 + (\mu_S/\mu_L)^2\,\rho_S^2}}

Letting κ = μ_S/μ_L, the equation above can be written as a quadratic equation
with κ as the only unknown, expressed as

\left(1-\beta^2\rho_S^2\right)\kappa^2 - 2\kappa + 1 - \beta^2\rho_L^2 = 0

Solving this quadratic equation gives κ = 1.5951 or κ = 0.5193. Since a reliability
greater than 0.5 requires μ_S > μ_L, the root 0.5193 < 1 does not satisfy κ > 1; thus,
the final solution is κ = μ_S/μ_L = 1.5951.
Let us next consider a general case where the normality assumption may not
hold. To calculate the reliability, we can perform a two-dimensional integration as
R = P(L - S \le 0) = \int_{0}^{+\infty}\left(\int_{0}^{s} f_L(l)\,dl\right) f_S(s)\,ds    (4.7)

Observe that the integration above is performed over the safe region Ω_S that is
defined as Ω_S = {x = (L, S)^T: G(x) ≤ 0}. The integration above can thus be
equivalently expressed as

R = \iint_{\Omega_S} f_L(l)\,f_S(s)\,dl\,ds = \iint_{\Omega_S} f_{\mathbf{X}}(\mathbf{x})\,d\mathbf{x}    (4.8)

where f_X(x) denotes the joint PDF of the vector X. Now let us further generalize this
calculation to any multi-dimensional random vector whose joint PDF may or may
not be separable. In this general case, the time-independent reliability can be for-
mulated as a multi-dimensional integration of the joint PDF over a safe region

R = \int\!\!\cdots\!\!\int_{\Omega_S} f_{\mathbf{X}}(\mathbf{x})\,d\mathbf{x}    (4.9)

where f_X(x) denotes the joint PDF of this random vector, and the safe region Ω_S is
defined as Ω_S = {x: G(x) ≤ 0}.
In engineering practice, however, it is extremely difficult, if not impossible, to
perform multi-dimensional numerical integration when the performance function
involves a large number of random input variables. The search for efficient com-
putational procedures to perform this multi-dimensional integration has resulted in a
variety of numerical and simulation methods, such as first- and second-order reli-
ability methods (FORM/SORM), direct or smart Monte Carlo simulation (MCS),
the dimension reduction (DR) method, the stochastic spectral method, and the
stochastic collocation method. These methods will be introduced in Chap. 5.
Example 4.2 Given the joint density function of two random variables X and Y

f_{XY}(x, y) = \begin{cases} \dfrac{6-x-y}{8}, & \text{if } 0 < x < 2,\ 2 < y < 4 \\ 0, & \text{otherwise} \end{cases}

compute the reliability for the performance function G(X, Y) = 2X − Y, i.e.,
R = P(2X − Y ≤ 0).
Solution
Here, the safe region Ω_S can be defined as Ω_S = {(X, Y): G(X, Y) ≤ 0} =
{(X, Y): 2X ≤ Y}. Given the joint density function, this is equivalent to
Ω_S = [{X: 0 < X < 1} ∩ {Y: 2 < Y < 4}] ∪ [{X: 1 ≤ X < 2} ∩ {Y: Y ≥ 2X}].
Reliability can be computed by performing a two-dimensional inte-
gration over this safe region Ω_S, expressed as

R = \int_{0}^{1}\int_{2}^{4} f_{XY}(x, y)\,dy\,dx + \int_{1}^{2}\int_{2x}^{4} f_{XY}(x, y)\,dy\,dx
  = \int_{0}^{1}\int_{2}^{4} \frac{6-x-y}{8}\,dy\,dx + \int_{1}^{2}\int_{2x}^{4} \frac{6-x-y}{8}\,dy\,dx
  = \int_{0}^{1} \frac{3-x}{4}\,dx + \int_{1}^{2} \frac{(x-2)^2}{2}\,dx
  = \left[\frac{3x}{4} - \frac{x^2}{8}\right]_{0}^{1} + \left[\frac{(x-2)^3}{6}\right]_{1}^{2}
  = \frac{19}{24}
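The result can also be cross-checked numerically; the sketch below integrates the joint PDF over the two sub-regions of the safe region with scipy.integrate.dblquad and compares the sum against 19/24.

# A minimal sketch (not from the book) that verifies Example 4.2 numerically.
from scipy import integrate

def f_xy(y, x):
    # joint PDF, valid for 0 < x < 2 and 2 < y < 4
    return (6.0 - x - y) / 8.0

# Region 1: 0 < x < 1, 2 < y < 4 (the whole strip is safe since 2x <= 2 <= y)
R1, _ = integrate.dblquad(f_xy, 0, 1, lambda x: 2, lambda x: 4)
# Region 2: 1 <= x < 2, 2x <= y < 4
R2, _ = integrate.dblquad(f_xy, 1, 2, lambda x: 2 * x, lambda x: 4)

print(R1 + R2, 19 / 24)   # both values should be ~0.79167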
R_T(t) = P(T(\mathbf{X}) > t) = \int_{t}^{+\infty} f_T(\tau)\,d\tau = 1 - F_T(t)    (4.10)
\mathrm{MTTF} = E[T] = \int_{0}^{\infty} \tau\,f_T(\tau)\,d\tau    (4.11)
In a life test of replaceable units, the mean of all the sample units approaches the
MTTF as the number of tested units approaches infinity. By equating fT(t) to the
negative derivative of RT(t), we can derive another useful formula of MTTF as
\mathrm{MTTF} = -\int_{0}^{\infty} \tau\,\frac{\partial R_T(\tau)}{\partial \tau}\,d\tau = -\left[\tau\,R_T(\tau)\right]_{0}^{\infty} + \int_{0}^{\infty} R_T(\tau)\,d\tau
             = \int_{0}^{\infty} R_T(\tau)\,d\tau    (4.12)
Insights can be gained into failure mechanisms by examining the behavior of the
so-called failure rate. The failure rate, denoted by h(t), can be derived from the
reliability and the TTF distribution. Let h(t)Δt be the conditional probability that the
system will fail at some time t < T ≤ t + Δt given that it has not yet failed at
T = t, i.e.,

h(t)\Delta t = P(t < T \le t + \Delta t\,|\,T > t)    (4.13)

Letting Δt approach zero, the failure rate can be expressed as

h(t) = \frac{f_T(t)}{R_T(t)}    (4.14)
Fig. 4.3 The bathtub curve of the failure rate over time: "infant mortality" failures, constant (random) failures, and wear-out failures
The bathtub shape of the failure rate curve can be attributed to the existence of both
defective and non-defective units in the population. In the first part, the failure rate
curve is dominated by the initially defective units (burn-in failures) whose failure
rate rapidly decreases until these defective units fail. In the second part, the failure
rate remains at a low constant since initially defective units have failed and the
remaining units are not yet experiencing wear-out failures. In the third part, the
non-defective units become dominant but the units begin to wear out over time,
leading to an increasing failure rate.
There are a handful of parametric models that have successfully served as
population models for failure times arising from a wide range of products and
failure mechanisms. Sometimes there are probabilistic arguments based on the
physics of the failure mode that tend to justify the choice of the model. Other times,
the model is used solely because of its empirical success in fitting actual failure
data. The next section discusses some popular parametric models.
This section presents the most commonly used probability distributions for mod-
eling the TTF distribution.
Exponential Distribution
With the assumption of a constant failure rate, the TTF follows an exponential
distribution with only one unknown parameter λ, expressed as

f_T(t) = \lambda e^{-\lambda t}    (4.15)

for t ≥ 0. The exponential distribution is the simplest among all life distributions.
The reliability function can be easily obtained as

R_T(t) = e^{-\lambda t}    (4.16)

The MTTF is the mean value of the exponential random variable t, i.e., MTTF = 1/λ.
As shown in Eq. 4.12, the MTTF can also be computed by integrating the reliability
function from zero to infinity as

\mathrm{MTTF} = \int_{0}^{+\infty} R_T(\tau)\,d\tau = \int_{0}^{+\infty} e^{-\lambda\tau}\,d\tau
             = \left[-\frac{1}{\lambda}e^{-\lambda\tau}\right]_{0}^{+\infty} = \frac{1}{\lambda}    (4.17)
The failure rate (or hazard) function can be easily computed as h(t) = f_T(t)/R_T(t) = λ. The
reliability functions and failure rate functions with different parameters are graph-
ically compared in Fig. 4.4. As λ increases, the reliability decreases more rapidly
over time.
Fig. 4.4 Reliability functions (a) and failure rate functions (b) for exponential distribution
Example 4.3 Suppose the TTF of a unit follows an exponential distribution
with parameter λ. Given that the unit has survived to time a, compute the
probability that it survives an additional time b, i.e., P(t > a + b | t > a).

Solution
This conditional probability can be computed as

P(t > a + b\,|\,t > a) = \frac{P(t > a + b)}{P(t > a)} = \frac{R_T(t = a + b)}{R_T(t = a)}
                       = \frac{e^{-\lambda(a+b)}}{e^{-\lambda a}} = e^{-\lambda b}

which does not depend on a; this is the well-known memoryless property of the
exponential distribution.
Weibull Distribution
A generalization of the exponential distribution is the Weibull distribution, which
can model a constant, decreasing, or increasing failure rate function. The density of
a Weibull TTF distribution can be expressed as
f_T(t) = \frac{k}{\lambda}\left(\frac{t}{\lambda}\right)^{k-1} e^{-(t/\lambda)^k}    (4.18)

Here, t > 0, λ > 0 is the scale parameter, and k > 0 is the shape parameter. The
reliability function can be computed by subtracting the CDF from 1 as

R_T(t) = 1 - \int_{0}^{t} f_T(\tau)\,d\tau = e^{-(t/\lambda)^k}    (4.19)

The MTTF is the mean value of the Weibull random variable t, expressed as

\mathrm{MTTF} = \lambda\,\Gamma(1 + 1/k)    (4.20)

The failure rate function is

h(t) = \frac{k}{\lambda}\left(\frac{t}{\lambda}\right)^{k-1}    (4.21)
Fig. 4.5 Three cases of a failure rate: k = 1 (a); k > 1 (b); k < 1 (c)
From the equation above, we can see that the temporal trend of the failure rate
depends on the shape parameter k. This dependence is illustrated in Fig. 4.5. When
k = 1, the failure rate is a constant, 1/λ, and the Weibull distribution reduces to the
exponential model with MTTF = λ. When k < 1, the failure rate decreases
monotonically and can be used to represent the case of infant mortality. When
k > 1, the failure rate increases monotonically and can be used to represent the case
of wear-out aging. As mentioned before, the Weibull distribution with k = 1 is an
exponential distribution with the rate parameter 1/λ. Note that, for k > 4, the Weibull
distribution becomes symmetric and bell-shaped, like the curve of a normal
distribution.
The reliability functions and failure rate functions with different parameters are
compared in Fig. 4.6. As k increases, the reliability decreases more slowly over
time. For k = 1, the failure rates remain constant over time. For k > 1, the failure
rates increase monotonically over time.
Fig. 4.6 Reliability functions (a) and failure rate functions (b) for a Weibull distribution
Example 4.4 Assume that the failure rate function of a solid-state power unit
takes the following form (t: hrs)

h(t) = 0.003\left(\frac{t}{500}\right)^{0.5}
(2) The conditional probability that the unit fails after 100 h, given that it has
not failed before 50 h, can be computed as

P(T > 100\,|\,T > 50) = \frac{R_T(100)}{R_T(50)} = \frac{\exp\left[-\int_0^{100} h(\tau)\,d\tau\right]}{\exp\left[-\int_0^{50} h(\tau)\,d\tau\right]} \approx 0.944
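The sketch below is a minimal numerical cross-check of this value: it builds R(t) = exp(−∫₀ᵗ h(τ)dτ) from the given failure rate by quadrature and then forms the ratio R(100)/R(50).

# A minimal sketch (not from the book) for Example 4.4.
import numpy as np
from scipy.integrate import quad

def h(t):
    return 0.003 * (t / 500.0) ** 0.5   # given failure rate (t in hours)

def R(t):
    H, _ = quad(h, 0.0, t)              # cumulative hazard
    return np.exp(-H)

print('R(50)  =', R(50.0))
print('R(100) =', R(100.0))
print('P(T > 100 | T > 50) =', R(100.0) / R(50.0))   # ~0.944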
Normal Distribution
Another widely used TTF distribution is the normal, or Gaussian, distribution. The
density of this distribution can be expressed as
f_T(t; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{1}{2}\left(\frac{t-\mu}{\sigma}\right)^2\right]    (4.22)

where μ ≥ 0 is the mean, or MTTF, and σ > 0 is the standard deviation of the TTF.
The reliability function can be expressed in terms of the standard normal CDF as

R_T(t) = \int_{t}^{+\infty} \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{1}{2}\left(\frac{\tau-\mu}{\sigma}\right)^2\right] d\tau = 1 - \Phi\left(\frac{t-\mu}{\sigma}\right)    (4.23)
Examples for modeling TTF with normal distributions include the useful life of the
tread of a tire and the wear-out time of the cutting edge of a machine tool. The
reliability functions and failure-rate functions with different parameters are graph-
ically compared in Fig. 4.7. Two observations can be made: (i) as μ increases, the
reliability curve shifts to the right; (ii) the shape of the reliability curve is deter-
mined by σ, and a decrease of σ leads to the compression of the curve along the
center line t = μ.
Fig. 4.7 Reliability functions (a) and failure rate functions (b) for a normal distribution
Fig. 4.8 Unit-independent (population-wise) life PDFs versus unit-dependent (unit-wise) life PDFs, plotted against unit ID and lifetime
reliability analysis method (see Fig. 4.8 for the difference between population- and
unit-wise reliability analysis) to ensure high operational reliability of an engineered
system throughout its life-cycle. To overcome the limitations of classical reliability
analysis, prognostics and health management (PHM) has recently emerged as a key
technology to (i) evaluate the current health condition (health monitoring) and
(ii) predict the future degradation behavior (health prognostics) of an engineered
system throughout its life-cycle. This emerging discipline will be discussed in
Chap. 8 of this book.
4.4 Exercises
(3) Assuming that X and M have a correlation coefficient ρ_XM = 0.90 and
that μ_X = 60, compute the reliabilities R1 and R2 corresponding to E1 and
E2.
4:2 A cantilever beam is shown in Fig. 4.10. The length of the beam is 100 in. The
width and thickness are represented by w and t, respectively. The free end of
the beam is subjected to two transverse loads X and Y along the orthogonal
directions, as shown in Fig. 4.10.
The stress limit-state function is expressed as

G = \sigma(X, Y) - S = \frac{600}{w t^2}\,Y + \frac{600}{w^2 t}\,X - S
where S is the yield strength, and w and t are the design parameters (fixed in
this problem: w = 2 in and t = 1 in). S, X, and Y are independent random
variables whose means and standard deviations are summarized in Table 4.2.
(1) Calculate the mean and standard deviation of r(X, Y).
(2) Compute the time-independent reliability, defined as P(G ≤ 0).
Fig. 4.10 Cantilever beam of length L, width w, and thickness t, subjected to transverse loads X and Y at the free end
References
1. Hu, C., Wang, P., & Youn, B. D. (2015). Advances in system reliability analysis under
uncertainty. In Numerical methods for reliability and safety assessment (pp. 271–303). Cham:
Springer.
2. Song, J., & Der Kiureghian, A. (2003). Bounds on system reliability by linear programming.
Journal of Engineering Mechanics, 129(6), 627–636.
Chapter 5
Reliability Analysis Techniques
(Time-Independent)
Reliability analysis under uncertainty, which assesses the probability that a sys-
tem’s performance (e.g., fatigue, corrosion, fracture) meets its marginal value while
taking into account various uncertainty sources (e.g., material properties, loads,
geometries), has been recognized as having significant importance in product
design and process development. However, reliability analysis in many engineering
problems has been a challenging task due to the overwhelmingly large computa-
tional burden. To resolve the computational challenges, a variety of numerical and
simulation techniques have been developed during the last two decades. This
chapter is devoted to providing an in-depth discussion of these developments with
the aim of providing insights into their relative merits and limitations.
The time-independent reliability introduced in Chap. 4 is formulated as the
multi-dimensional integral

R = P\left(\mathbf{X} \in \Omega_S\right) = \int\!\!\cdots\!\!\int_{\Omega_S} f_{\mathbf{X}}(\mathbf{x})\,d\mathbf{x}    (5.1)

where X = (X_1, X_2, …, X_N)^T denotes an N-dimensional random vector that models
uncertainty sources, such as material properties, loads, and geometric tolerances;
f_X(x) denotes the joint PDF of this random vector; the safe region Ω_S is defined by
the limit-state function as Ω_S = {X: G(X) ≤ 0}; and G(X) is a system perfor-
mance (or response) function.
Neither analytical multi-dimensional integration nor direct numerical integration
is computationally feasible for large-scale engineering problems in which the number
of random variables is relatively large (e.g., a finite element model with over 20
random variables). Sampling methods offer an alternative; besides direct Monte Carlo
simulation (MCS), smarter sampling schemes have been developed, such as
importance sampling methods [6–8] and the enhanced MCS method with optimized
extrapolation [9]. Despite improved efficiency over direct MCS, these methods are
still computationally expensive.
The stochastic response surface method (SRSM) is an emerging technique for
reliability analysis under uncertainty. As opposed to the deterministic response
surface method (RSM), whose input variables are deterministic, SRSM employs
random variables as its inputs. The aim of SRSM is to alleviate the computational
burden required for accurate uncertainty quantification (i.e., quantifying the
uncertainty in the performance function) and reliability analysis. This is achieved
by constructing an explicit multi-dimensional response surface approximation
based on function values given at a set of sample points. Generally speaking,
uncertainty quantification and reliability analysis through SRSM consists of the
following steps:
Step 1: Determine an approximate functional form for the performance function.
Step 2: Evaluate the parameters of the functional approximation (or the stochastic
response surface) based on the function values at a set of sample points.
Step 3: Conduct MCS or numerical integration based on the functional approxi-
mation to obtain the probabilistic characteristics (e.g., statistical moments,
reliability, and PDF) of the performance function.
The current state-of-the-art SRSMs for uncertainty quantification include the
dimension reduction (DR) method [10–12], the stochastic spectral method [13–15],
and the stochastic collocation method [16–19].
5.2 Expansion Methods

Recall the simple performance function discussed in Chap. 4, where the safety
margin between the strength S of an engineered system and the load L on this
system is defined as the performance function. Under the assumption of normal
distributions for S and L, the performance function G also follows a normal dis-
tribution. By further assuming statistical independence between S and L, we can
compute the reliability based on the standard normal CDF of the following form
R = Φ( (μ_S − μ_L) / √(σ_S² + σ_L²) )        (5.2)
The reliability of the engineered system is estimated based on the first two statistical
moments (mean and standard deviation). Here, we only consider a simple perfor-
mance function, which is a linear combination of two normally distributed random
variables. In fact, this idea of reliability analysis using the first two statistical
moments can be generalized to cases where the functions are in a nonlinear form.
We begin with the first-order Taylor series expansion of the performance function
G(X) at the mean value, expressed as
G(X) ≈ G(μ_X) + Σ_{i=1}^{N} [∂G(μ_X)/∂X_i] (X_i − μ_{X_i})
     = a_1 X_1 + ⋯ + a_N X_N + b        (5.3)
This can be rewritten in a vector form as G(X) = aTX + b, where a = [a1, a2, …,
aN]T contains the first-order partial derivatives of G with respect to input random
variables and is called a sensitivity vector of G. We can then obtain the first-order
approximate mean and variance of G as
μ_G = E[G] ≈ E[a^T X + b] = a^T μ_X + b        (5.4)
and
σ_G² = E[(G − μ_G)²] = E[(G − μ_G)(G − μ_G)^T]
     ≈ E[(a^T X + b − a^T μ_X − b)(a^T X + b − a^T μ_X − b)^T]
     = a^T E[(X − μ_X)(X − μ_X)^T] a
     = a^T Σ_X a        (5.5)
where Σ_X is the covariance matrix of X. Under the assumption of normality for the
performance function, the reliability can be computed based on the first two sta-
tistical moments as
R = Φ( μ_G / σ_G )        (5.6)
Note that the formula above gives the exact reliability only if the performance
function is a linear function of normally distributed random variables. However, it
is rare, in engineering practice, to encounter an engineered system whose perfor-
mance function is a simple linear combination of normally distributed random
variables. It is more likely that the performance function is of a nonlinear form and
that some of the random variables are non-normally distributed. In such cases, the
first-order expansion method leads to an inaccurate reliability estimate that often
contains a large error.
Fig. 5.1 Cantilever beam subjected to vertical loads and a bending moment

Example 5.1 Consider the cantilever beam shown in Fig. 5.1, which is subjected to
two vertical loads, P1 applied 10 ft from the fixed end and P2 applied 20 ft from the
fixed end, as well as a bending moment applied at the free end (10,000 lb-ft). P1 and
P2 follow normal distributions with means of 1000 lb and 500 lb and standard
deviations of 100 lb and 50 lb, respectively. Further assume that P1 and P2 are
uncorrelated. Answer the following questions.
(1) Compute the mean, standard deviation, and coefficient of variation
(COV) of the maximum moment at the fixed end using the first-order
expansion method.
(2) Compute the reliability with an allowable moment ma = 33,000 lb–ft.
Solution
(1) At the fixed end, the maximum moment can be expressed as

m_max = 10P1 + 20P2 + 10,000 = a^T X + b

where a = [10, 20]^T, b = 10,000, and X = [P1, P2]^T. Note that the mean
vector is μ_X = [1000, 500]^T and the covariance matrix is Σ_X = [100², 0; 0, 50²],
where the semicolon separates two adjacent rows of the matrix. We first
compute the mean as
μ_{m_max} = a^T μ_X + b = [10  20] [1000; 500] + 10,000 = 30,000 lb-ft
Similarly, we compute the variance as

σ²_{m_max} = a^T Σ_X a = [10  20] [100²  0; 0  50²] [10; 20] = 2 × 10⁶ (lb-ft)²
Taking the square root of the variance gives us the standard deviation as
σ_{m_max} = √(2 × 10⁶) ≈ 1414.2 lb-ft

and the coefficient of variation as

COV = σ_{m_max}/μ_{m_max} = 1414.2/30,000 ≈ 0.047, or 4.7%
(2) The reliability with the allowable moment m_a is

R = P(m_max ≤ m_a)
  = P( (m_max − μ_{m_max})/σ_{m_max} ≤ (m_a − μ_{m_max})/σ_{m_max} )
  = P( Z ≤ (33,000 − 30,000)/1414.2 )
  = P(Z ≤ 2.12) = Φ(2.12) ≈ 98.3%
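For readers who want to experiment with the first-order expansion method, the following MATLAB sketch reproduces the calculations of Example 5.1 (Eqs. (5.4)–(5.6)). It is a minimal illustration, not a general-purpose implementation; normcdf requires the Statistics and Machine Learning Toolbox (it can be replaced by 0.5*erfc(-x/sqrt(2)) if unavailable).

% First-order expansion (mean-value) method for a linear performance
% function m_max = a'*X + b, using the data of Example 5.1.
a      = [10; 20];                 % sensitivity vector (lb-ft per lb)
b      = 10000;                    % constant moment term (lb-ft)
muX    = [1000; 500];              % means of P1 and P2 (lb)
SigmaX = diag([100^2, 50^2]);      % covariance matrix of [P1; P2]
ma     = 33000;                    % allowable moment (lb-ft)

muG    = a'*muX + b;                  % first-order mean, Eq. (5.4)
sigmaG = sqrt(a'*SigmaX*a);           % first-order std. dev., Eq. (5.5)
covG   = sigmaG/muG;                  % coefficient of variation
R      = normcdf((ma - muG)/sigmaG);  % reliability, cf. Eq. (5.6)
fprintf('mean = %.0f lb-ft, std = %.1f lb-ft, COV = %.3f, R = %.3f\n', ...
        muG, sigmaG, covG, R);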
From what has been discussed, we can see that the linearization of a performance
function at the mean values of the random variables enables estimation of the mean
and variance of the performance function. However, the estimates may contain
large errors if second- and/or high-order expansion terms are significant. A more
accurate approximation can be realized by the second-order Taylor series expansion
of the performance function G(X). This expansion involves quadratic
(second-order) terms and can be expressed as
G(X) ≈ G(μ_X) + Σ_{i=1}^{N} [∂G(μ_X)/∂X_i] (X_i − μ_{X_i})
     + (1/2) Σ_{i=1}^{N} Σ_{j=1}^{N} [∂²G(μ_X)/(∂X_i ∂X_j)] (X_i − μ_{X_i})(X_j − μ_{X_j})        (5.7)
Assuming independent input random variables, the second-order approximate mean of G is

μ_G = E[G] ≈ G(μ_X) + (1/2) Σ_{i=1}^{N} [∂²G(μ_X)/∂X_i²] σ_{X_i}²        (5.8)
and
σ_G² = E[(G − μ_G)²] ≈ Σ_{i=1}^{N} [∂G(μ_X)/∂X_i]² σ_{X_i}²
     + (1/2) Σ_{i=1}^{N} Σ_{j=1}^{N} [∂²G(μ_X)/(∂X_i ∂X_j)]² σ_{X_i}² σ_{X_j}²        (5.9)
Under the assumption of normality for the performance function, the reliability can
then be computed using Eq. (5.6).
5.3 MPP-Based Methods

The expansion methods discussed earlier suffer from the following two drawbacks:
(i) these methods only utilize the first two statistical moments of the input random
variables while ignoring the distribution information of these variables; and (ii) the
mean point of the input random variables is treated as the reference point for
building a linear or quadratic approximation of the performance function, which
may lead to a large error in estimating a high reliability (or a low probability of
failure). As an attempt to overcome the drawbacks of the expansion methods,
Hasofer and Lind proposed the first-order reliability method (FORM) [2] in 1974.
Since then, the attempts to improve FORM have resulted in more advanced
MPP-based methods, including the second-order reliability method (SORM). We
will briefly review these two most-well-known MPP-based methods.
The basic idea of FORM is to linearize the performance function G(X) at the most
probable failure point on the limit-state surface G(X) = 0, or the MPP in the
transformed U-space. The U-space is composed of independent standard normal
variables U that are transformed from the input random variables X in the original
X-space. Compared to the expansion methods, the two distinctive features of
FORM are (i) the transformation T of input random variables to the standard
normal space and (ii) the use of the MPP as the reference point for the linearization
of the performance function. For a normal random variable Xi with mean lXi and
standard deviation rXi, transformation T can be simply defined as
U_i = T(X_i) = (X_i − μ_{X_i}) / σ_{X_i},   i = 1, 2, …, N        (5.10)
In a general case, the transformation formula can be derived based on the CDF
mapping F_{X_i}(X_i) = Φ(U_i), that is,

U_i = T(X_i) = Φ⁻¹[F_{X_i}(X_i)]   or equivalently   X_i = F_{X_i}⁻¹[Φ(U_i)],   i = 1, 2, …, N        (5.11)

where F_{X_i} and Φ are the CDFs of X_i and U_i, respectively. Note that, unlike the
expansion methods, FORM utilizes the distribution information of input random
variables to transform these variables to standard normal random variables. The
transformation of a uniformly distributed random variable Xi to the corresponding
standard normal random variable Ui is illustrated in Fig. 5.2. Observe that the
one-to-one mapping between the CDFs ensures the one-to-one mapping between
the values of the original and transformed variables. The transformations of five of
the most commonly used types of probability distributions (i.e., normal, lognormal,
Weibull, Gumbel, and uniform) are presented in Table 5.1.
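As a small illustration of the CDF mapping in Eqs. (5.10)–(5.11) and Table 5.1, the MATLAB sketch below maps a uniform and a Weibull realization to the U-space and back. All parameter values are assumed for illustration only; norminv, normcdf, unifcdf, and wblcdf are Statistics Toolbox functions.

% CDF mapping between the X-space and the U-space (assumed parameters).
a = 1;  b = 3;                             % uniform bounds (assumed)
x = 2.2;                                   % a realization of the uniform variable
u = norminv(unifcdf(x, a, b));             % X -> U: u = Phi^-1(F_X(x))
xback = a + (b - a)*normcdf(u);            % U -> X (uniform row of Table 5.1)

lam = 0.5;  k = 2;                         % Weibull: F(x) = 1 - exp(-(lam*x)^k) (assumed)
xw  = 1.8;
uw  = norminv(wblcdf(xw, 1/lam, k));       % MATLAB wblcdf uses scale 1/lam, shape k
xwback = (1/lam)*(-log(normcdf(-uw)))^(1/k);   % Weibull row of Table 5.1
fprintf('uniform: x = %.3f -> u = %.3f -> x = %.3f\n', x, u, xback);
fprintf('weibull: x = %.3f -> u = %.3f -> x = %.3f\n', xw, uw, xwback);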
Through the transformation, the performance function G(X) in the original
X-space is mapped onto an equivalent function G(U) = G(T(X)) in the transformed
U-space. The transformation of a performance function involving two normally
distributed input variables is graphically presented in Fig. 5.3. In the U-space, the
Fig. 5.2 Transformation of a uniform random variable to a standard normal random variable
Table 5.1 Probability distributions and their transformations between the X- and U-space

Normal:     f_X(x) = 1/(√(2π)σ) exp[−(1/2)((x − μ)/σ)²], −∞ < x < +∞;   X = μ + σU
Lognormal:  f_X(x) = 1/(√(2π)σx) exp[−(1/2)((ln x − μ)/σ)²], x > 0, with σ_X² = (e^{σ²} − 1) exp(2μ + σ²);   X = exp(μ + σU)
Weibull:    f_X(x) = kλ(λx)^{k−1} e^{−(λx)^k}, x ≥ 0;   X = (1/λ)[−ln(Φ(−U))]^{1/k}
Gumbel:     f_X(x) = α exp[−α(x − ν) − e^{−α(x−ν)}], −∞ < x < +∞;   X = ν − (1/α) ln[−ln(Φ(U))]
Uniform:    f_X(x) = 1/(b − a), a ≤ x ≤ b;   X = a + (b − a)Φ(U)
MPP u* denotes the point on the failure surface which has the minimum distance to
the origin. This distance is called the reliability index, expressed as
" #1=2
X
N
u i
2
b ¼ ku k ¼ ð5:12Þ
i¼1
R ¼ UðbÞ ð5:13Þ
As can be seen from Fig. 5.3, as the minimum distance β between the failure
surface and the origin becomes larger, the probability content of the safe region
grows while that of the failure region shrinks. This indicates a lesser chance that
random samples will fall in the failure region, which means a lower probability of
failure and thus a higher reliability. Thus, the reliability index β is a good measure
of reliability. Note that this measure gives an exact reliability value only if the
failure surface in the U-space is linear. In most engineering problems, however, the
failure surface in the U-space is nonlinear, either due to nonlinearity in the
performance function or due to the nonlinear transformation from a non-normal
distribution to a normal distribution. For the case in Fig. 5.3, the failure surface
exhibits a nonlinear form in the U-space due to the nonlinearity in the performance
function; as a consequence, FORM overestimates the reliability, as indicated by the
shaded error region.

Fig. 5.3 Transformation of a performance function from the X-space to the U-space: failure surface G = 0, safe region G ≤ 0, failure region G > 0, MPP u*, reliability index β, and the FORM linearization
The remaining—however, the most critical—task is to search for the
MPP. Mathematically, this task can be formulated as an optimization problem with
one equality constraint in the U-space, expressed as
minimize   ‖U‖
subject to G(U) = 0        (5.14)
where the optimum point on the failure surface is the MPP u*. The MPP search
generally requires an iterative optimization scheme based on the gradient infor-
mation of the performance function. Among the many MPP search algorithms, the
most widely used is the Hasofer-Lind and Rackwitz-Fiessler (HL-RF) method, due
to its simplicity and efficiency. The HL-RF method consists of the following iter-
ative steps:
Step 1: Set the number of iterations k = 0 and the initial MPP estimate u = u(0)
that corresponds to the mean values of X.
Step 2: Transform u(k) to x(k) using Eq. (5.11). Compute the performance function
G(u(k)) = G(x(k)) and its partial derivatives with respect to the input ran-
dom variables in the U-space as
∇_U G(u^(k)) = [ ∂G/∂U_1, ∂G/∂U_2, …, ∂G/∂U_N ]|_{U=u^(k)}        (5.15)

The unit gradient vector of G at u^(k) is then

n^(k) = ∇_U G(u^(k)) / ‖∇_U G(u^(k))‖        (5.17)
Fig. 5.4 Illustration of the HL-RF MPP search in the U-space: starting point u^(0), unit gradient n^(0), updated estimate u^(1), failure surface G(U) = 0, failure region G(U) > 0, and safe region G(U) ≤ 0
Example 5.2 Consider the following performance function

G(X_1, X_2) = 1 − 80 / (X_1² + 8X_2 + 5)

where X_1 and X_2 each follow a normal distribution with mean 4 and standard
deviation 0.6. Find the MPP and compute the reliability with FORM.
Solution
The first iteration in the HL-RF method is detailed as follows:
Step 1: Set the number of iterations k = 0 and the initial values u^(0) = (0, 0).
Step 2: Transform u^(0) to x^(0) with Eq. (5.11): x^(0) = (4, 4). Compute the
performance function G(x^(0)) as

G(X_1, X_2) = 1 − 80/(X_1² + 8X_2 + 5) = 1 − 80/(4² + 8·4 + 5) ≈ −0.5094
The gradient in the U-space follows from the chain rule, ∂G/∂U_i = (∂G/∂X_i)σ_{X_i},
which gives ∇_U G(u^(0)) = (0.1367, 0.1367). The MPP estimate is then updated as

u^(1) = [ u^(0) · ∇_U G(u^(0)) − G(u^(0)) ] ∇_U G(u^(0)) / ‖∇_U G(u^(0))‖²
      = [ (0, 0) · (0.1367, 0.1367) − (−0.5094) ] (0.1367, 0.1367) / (0.1367² + 0.1367²)
      = (1.8633, 1.8633)
The results for all of the iterations needed to find the MPP are summarized in
Table 5.2. Finally, the reliability is estimated as the standard normal CDF value of
β, which gives R = Φ(β) ≈ 0.9998.
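The MATLAB sketch below implements the HL-RF iteration for the performance function of Example 5.2. It is a minimal illustration of the method described above; the convergence tolerance and iteration limit are arbitrary choices, and the result should be close to the reliability of about 0.9998 reported in the example.

% Minimal HL-RF MPP search for G(X1,X2) = 1 - 80/(X1^2 + 8*X2 + 5),
% with X1, X2 ~ N(4, 0.6^2); safe state is G <= 0.
mu = [4; 4];  sg = [0.6; 0.6];
u  = [0; 0];                              % initial MPP estimate (mean point)
for k = 1:50
    x    = mu + sg.*u;                    % U-space -> X-space (Eq. 5.10)
    D    = x(1)^2 + 8*x(2) + 5;
    G    = 1 - 80/D;                      % performance function value
    dGdx = [160*x(1)/D^2; 640/D^2];       % gradient in the X-space
    dGdu = dGdx.*sg;                      % chain rule: gradient in the U-space
    unew = (dGdu'*u - G)*dGdu/(dGdu'*dGdu);   % HL-RF update
    if norm(unew - u) < 1e-8, u = unew; break; end
    u = unew;
end
beta = norm(u);                           % reliability index, Eq. (5.12)
R    = normcdf(beta);                     % FORM reliability, Eq. (5.13)
fprintf('beta = %.4f, R = %.4f\n', beta, R);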
Fig. 5.5 Comparison of the FORM and SORM approximations of the failure surface at the MPP u* in the U-space (safe region G ≤ 0, reliability index β)

As illustrated in Fig. 5.5, the second-order reliability method (SORM) improves on
FORM by approximating the failure surface at the MPP with a quadratic
(second-order) surface. With Breitung's asymptotic formula [3], the probability of
failure is approximated as

p_f ≈ Φ(−β) ∏_{i=1}^{N−1} (1 + βκ_i)^{−1/2}        (5.18)
where κ_i, i = 1, 2, …, N − 1, are the principal curvatures of G(U) at the MPP, and β is
the reliability index. Clearly, upon completion of the FORM computation, an extra
computational task is needed to find the principal curvatures κ_i. This task can be
completed in two steps, which are listed as follows:
Step 1: Rotate the standard normal variables Ui (in the U-space) to a set of new
standard normal variables Yi (in the Y-space), of which the last variable YN
shares the same direction with the unit gradient vector of G(U) at the
MPP. To do so, we generate an orthogonal rotation matrix R, which can be
derived from a simple matrix R0, expressed as
R_0 = [ 1                              0                              ⋯   0
        0                              1                              ⋯   0
        ⋮                              ⋮                              ⋱   ⋮
        ∂G(u*)/∂u_1 / |∇G(u*)|   ∂G(u*)/∂u_2 / |∇G(u*)|   ⋯   ∂G(u*)/∂u_N / |∇G(u*)| ]        (5.19)
where the last row consists of the components of the unit gradient vector of the
limit-state function at the MPP. Next, the orthogonal matrix R can be obtained by
orthogonalizing R_0 using the Gram-Schmidt algorithm. In the rotated Y-space, the
second-order approximation of the limit-state function at the MPP can be
expressed as

G(Y) ≈ −Y_N + β + (1/2)(Y − Y*)^T R D R^T (Y − Y*)        (5.20)

where Y* is the MPP in the Y-space and D is the N × N second-derivative (Hessian)
matrix of G evaluated at the MPP.
Step 2: Compute the scaled curvature matrix

A = R D R^T / |∇G(u*)|        (5.21)
The principal curvatures κ_i are obtained as the eigenvalues of the (N − 1) × (N − 1)
submatrix of A that excludes the last row and column, and the second-order
approximation can be written in terms of these curvatures as

G(U) ≈ −U_N + β + (1/2) Σ_{i=1}^{N−1} κ_i U_i²        (5.22)
Finally, Breitung’s SORM formula in Eq. (5.18) can be used to compute the
probability of failure or reliability. Besides Breitung’s formula, another popular and
more accurate SORM formulation is given by Tvedt [4].
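As a quick numerical illustration of Eq. (5.18), the short MATLAB sketch below evaluates Breitung's formula for assumed values of the reliability index and the principal curvatures (both are placeholders, not values from any example in this book), and compares the result with the FORM estimate.

% Breitung's SORM formula (Eq. 5.18) with assumed beta and curvatures.
beta  = 2.5;                       % reliability index from a FORM analysis (assumed)
kappa = [0.10, -0.05, 0.02];       % principal curvatures at the MPP (assumed)
pf_form = normcdf(-beta);                                  % FORM: pf = Phi(-beta)
pf_sorm = normcdf(-beta)*prod(1./sqrt(1 + beta*kappa));    % SORM, Eq. (5.18)
fprintf('pf (FORM) = %.4e, pf (SORM) = %.4e\n', pf_form, pf_sorm);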
5.4 Sampling Methods

During the last several decades, sampling methods have played an important role in
advancing research in reliability analysis. These methods generally involve gen-
eration of random samples of input random variables, deterministic evaluations of
the performance function at these random samples, and post-processing to extract
the probabilistic characteristics (e.g., statistical moments, reliability, and PDF) of
the performance function. In this section, direct Monte Carlo simulation (MCS), the
most crude (yet widely used) sampling method, is briefly introduced. Following this
brief introduction, we introduce a smart MCS method that borrows ideas from
MPP-based methods, namely the importance sampling method.
The term “Monte Carlo” was originally used as a Los Alamos code word by Ulam
and von Neumann, who worked on stochastic simulations for the development of
nuclear weapons. Since then, the term has been widely used in articles and monographs and
MCS has been applied to a wide variety of scientific disciplines. The basic idea
behind MCS is to approximate an underlying distribution of a stochastic function
and the associated probabilistic characteristics (e.g., mean, variance, and
higher-order moments) by computing the function values at simulated random
samples.
To introduce this concept, let us rewrite the multi-dimensional integration in
Eq. (4.9) for reliability analysis with an indicator function as
R = ∫_{Ω_S} f_X(x) dx = ∫_X I_{Ω_S}(x) f_X(x) dx = E[ I_{Ω_S}(x) ]        (5.23)
where I_{Ω_S}[·] is an indicator function of the safe or fail state such that

I_{Ω_S}(x) = { 1,  x ∈ Ω_S
               0,  x ∈ X\Ω_S        (5.24)
Fig. 5.6 Standard uniform samples (a) and the corresponding standard normal samples (b)

Direct MCS for reliability analysis proceeds in the following steps:
Step 1: Generate M realizations of standard uniform random variables (see
Fig. 5.6a).
Step 2: Transform the standard uniform samples into random samples x_j, j = 1, 2,
…, M, of the input random vector X using the inverse CDFs of the input
random variables (Fig. 5.6b shows the corresponding standard normal
samples).
Step 3: Evaluate the values of the performance function G(X) at the random
samples xj. This step requires M function evaluations. Upon the comple-
tion of this step, we obtain M random function values G(xj), for j = 1, 2,
…, M, which consist of rich statistical information of the performance
function.
Step 4: Extract the probabilistic characteristics of G(X), including statistical
moments, reliability, and PDF, from the random function values. For
example, the reliability can be estimated by
R = E[ I_{Ω_S}(x) ] ≈ (1/M) Σ_{j=1}^{M} I_{Ω_S}(x_j)        (5.26)
Random sampling allows for the derivation of all probabilistic characteristics (e.g.,
statistical moments, reliability, and PDF) of the performance function. This is
different from MPP-based methods, which are only capable of estimating the
reliability.
Reliability analysis using direct MCS is graphically illustrated in Fig. 5.7. A set
of random samples of X1 and X2 are categorized into two groups, failed samples and
safe samples, separated by the failure surface (or the limit state function) G(X) = 0.
The reliability is computed as the proportion of the safe samples over all random
samples.
Fig. 5.7 Reliability analysis using direct MCS in the X-space: random samples of X_1 and X_2 around the mean point of the joint PDF f_X(x), separated into safe and failed samples by the failure surface G(X) = 0
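The following MATLAB sketch carries out direct MCS for the performance function of Example 5.2 (all distribution parameters are taken from that example; the sample size and random seed are arbitrary choices for illustration). The estimated probability of failure should be on the order of 10⁻⁴, consistent with the FORM result.

% Direct MCS for G(X1,X2) = 1 - 80/(X1^2 + 8*X2 + 5), X1, X2 ~ N(4, 0.6^2);
% the safe state is G <= 0 (Eq. 5.26).
rng(0);                                      % arbitrary seed for reproducibility
M = 1e6;                                     % sample size (assumed large enough)
X = 4 + 0.6*randn(M, 2);                     % random samples of X1 and X2
G = 1 - 80./(X(:,1).^2 + 8*X(:,2) + 5);      % performance function values
R  = mean(G <= 0);                           % reliability estimate
sP = sqrt(R*(1 - R)/M);                      % standard error of the estimate
fprintf('R = %.5f, 95%% CI = [%.5f, %.5f]\n', R, R - 1.96*sP, R + 1.96*sP);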
The precision of the MCS reliability estimate can be quantified by treating the
indicator values as Bernoulli samples. The standard deviation of the estimate is

σ_P = √( R(1 − R)/M )        (5.27)

and the 100(1 − α)% confidence interval of the estimate is

[ R − z_{1−α/2} σ_P,  R + z_{1−α/2} σ_P ]        (5.28)

where z_{1−α/2} is the 100(1 − α/2)th percentile of the standard normal distribution. For
a 95% confidence level, α = 0.05 and z_{1−α/2} = 1.96.
Example 5.3 Recall Exercise 4.2 where a cantilever beam is subjected to two
end transverse loads X and Y in orthogonal directions. Ten sets of random
samples generated for direct MCS are summarized in Table 5.3. Obtain the
reliability estimate using direct MCS and discuss the precision of the
estimate.
Table 5.3 Summary of random samples for direct MCS in Example 5.3

Sample ID   X [lb]   Y [lb]   R [psi]
1           614      853      448,480
2           674      1036     385,540
3           528      1158     359,820
4           483      887      415,130
5           553      1057     383,340
6           566      850      407,240
7           512      944      402,470
8           415      1067     391,870
9           458      903      409,390
10          484      1124     400,800
Solution
First, evaluate the values of the performance function at all ten sample points
as −100,645, 26,390, 66,645, −76,745, 16,635, −67,505, −42,500, −9655,
−69,700 and 8805. The number of safe samples is 6 and the reliability
estimate from direct MCS is R = 6/10 = 0.60. Since direct MCS only
employs a small number of samples (M = 10), the estimate may have a large
variation, which is computed as
σ_P = √( R(1 − R)/M ) = √( 0.6 × 0.4 / 10 ) ≈ 0.155
This indicates that the MC estimate contains a large variation that can be fully
attributed to the small sample size. We can then compute the 95% confidence
interval of the reliability estimate as [0.2962, 0.9038].
As can be observed from Eq. (5.28), direct MCS generally requires a large number
M of samples to obtain a sufficiently small variance in the reliability estimate,
especially in cases of high reliability (or low probability of failure). To alleviate this
computational burden and reduce the variance in the reliability estimate, researchers
have developed a wide variety of smart MCS methods, among which the most
popular one is the importance sampling method [6–8]. The basic idea of importance
sampling is to assign more sample points to the regions that have more impact on
the probability of failure. If these regions are treated with greater importance by
sampling more frequently, the variance in the resulting reliability or probability
estimate can be reduced. Therefore, one of the most important elements in
importance sampling is to choose an appropriate sampling distribution that
encourages random samples to be placed in these regions. An example of such a
sampling distribution is shown in the U-space in Fig. 5.8, where the sampling
distribution centers at the MPP in the standard normal space. For a given number of
random samples, more points are assigned to the failure region by the importance
sampling method than by direct MCS, because the direct MCS sampling distribution
is centered at the means of the random inputs (i.e., at the origin in the U-space).
To examine this, start with the formula for the probability of failure, expressed as
p_f = ∫_X I_{Ω_F}(x) f_X(x) dx ≈ (1/M) Σ_{j=1}^{M} I_{Ω_F}(x_j)        (5.30)
Fig. 5.8 Comparison between direct MCS and importance sampling in the U-space (the importance sampling density is centered at the MPP u* on the failure surface G(X) = 0)

where I_{Ω_F}(x) is the indicator function of the failure region. Importance sampling
rewrites this integral with respect to a sampling density h_X(x) centered in the
important region (e.g., at the MPP), as

p_f = ∫_X I_{Ω_F}(x) [f_X(x)/h_X(x)] h_X(x) dx ≈ (1/M) Σ_{j=1}^{M} I_{Ω_F}(x_j) f_X(x_j)/h_X(x_j)        (5.31)

where the random samples x_j are now drawn from h_X(x) rather than from f_X(x).
It can be expected that, in the new sampling distribution, a random sample point
will have a greater chance of the indicator value being one or, in other words, a
greater chance of falling in the failure region. Therefore, we can expect a larger
number of failure samples, which leads to a smaller variance in the estimate of the
probability of failure.
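A minimal MATLAB sketch of importance sampling in the U-space is given below, again using the performance function of Example 5.2. The sampling center, sample size, and seed are assumptions chosen for illustration; because the estimator reweights every sample by the likelihood ratio, it remains unbiased for any proper choice of the sampling center.

% Importance sampling in the U-space for G(X1,X2) = 1 - 80/(X1^2 + 8*X2 + 5),
% X1, X2 ~ N(4, 0.6^2); failure state is G > 0.
rng(0);                                % arbitrary seed for reproducibility
M  = 2e4;                              % far fewer samples than direct MCS
uc = [2.5, 2.6];                       % assumed sampling center near the MPP
U  = randn(M, 2) + uc;                 % samples from h(u) = N(uc, I)
X  = 4 + 0.6*U;                        % transform to the X-space
G  = 1 - 80./(X(:,1).^2 + 8*X(:,2) + 5);
w  = mvnpdf(U, [0 0])./mvnpdf(U, uc);  % likelihood ratio f_U(u)/h(u)
q  = (G > 0).*w;                       % failure indicator times weight
pf = mean(q);                          % unbiased estimate of the failure probability
se = std(q)/sqrt(M);                   % standard error of the estimate
fprintf('pf = %.3e (s.e. %.1e), R = %.4f\n', pf, se, 1 - pf);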
5.5 Stochastic Response Surface Methods

Stochastic response surface methods (SRSMs) are capable of alleviating the com-
putational burden required by sampling methods, while still maintaining compa-
rable accuracy. This section introduces three state-of-the art SRSMs for uncertainty
quantification and reliability analysis, namely the dimension reduction
(DR) method [10–12], the stochastic spectral method [13–15], and the stochastic
collocation method [16–19].
The DR method is rooted in the high-dimensional model representation (HDMR),
which decomposes an N-dimensional response function into a hierarchy of component
functions of increasing dimensions:

G(X) = G_0 + Σ_{i=1}^{N} G_i(X_i) + Σ_{1≤i_1<i_2≤N} G_{i_1 i_2}(X_{i_1}, X_{i_2})
     + ⋯ + Σ_{1≤i_1<⋯<i_s≤N} G_{i_1⋯i_s}(X_{i_1}, …, X_{i_s}) + ⋯ + G_{1⋯N}(X_1, …, X_N)        (5.32)
Among the different versions of this decomposition, the Cut-HDMR expands the
response function around a reference (cut) point and takes the form

G(X) = G_{C0} + Σ_{i=1}^{N} G_{Ci}(X_i) + Σ_{1≤i_1<i_2≤N} G_{Ci_1 i_2}(X_{i_1}, X_{i_2})
     + ⋯ + Σ_{1≤i_1<⋯<i_s≤N} G_{Ci_1⋯i_s}(X_{i_1}, …, X_{i_s}) + ⋯ + G_{C1⋯N}(X_1, …, X_N)        (5.34)
where the component functions are defined as

G_{C0} = G(μ_X),
G_{Ci} = G(X)|_{X=μ_X\X_i} − G_{C0},
G_{Ci_1 i_2} = G(X)|_{X=μ_X\(X_{i_1}, X_{i_2})} − G_{Ci_1} − G_{Ci_2} − G_{C0},
…        (5.36)
Here, the notation X = μ_X\X_i denotes the vector X with all of its components other
than X_i set equal to the corresponding components of the reference vector μ_X.
A general recursive formula for the component functions can be derived as [21]
G_{Cu}(X_u) = G(X)|_{X=μ_X\X_u} − Σ_{v⊂u} G_{Cv}(X_v)        (5.37)
where the notation X = μ_X\X_u denotes the vector X with all components whose
indices do not belong to the set u set equal to the corresponding components of
the reference vector μ_X.
It is worth noting that we can derive the Cut-HDMR formulae from a Taylor
series expansion of the response function at the reference point μ_X as [11]

G(X) = G(μ_X) + Σ_{j=1}^{∞} (1/j!) Σ_{i=1}^{N} [∂^j G(μ_X)/∂X_i^j] (X_i − μ_{X_i})^j
     + Σ_{j_1, j_2 ≥ 1} (1/(j_1! j_2!)) Σ_{1≤i_1<i_2≤N} [∂^{j_1+j_2} G(μ_X)/(∂X_{i_1}^{j_1} ∂X_{i_2}^{j_2})]
       (X_{i_1} − μ_{X_{i_1}})^{j_1} (X_{i_2} − μ_{X_{i_2}})^{j_2} + ⋯        (5.39)
We can see that any component function in the Cut-HDMR expansion accounts for
an infinite number of Taylor series terms containing the same set of random vari-
ables as that component function. For example, the univariate decomposed com-
ponent function GCi (Xi) in Eq. (5.34) contains the univariate terms with Xi of any
order in the Taylor series expansion, and so on. Thus, the dimension decomposition
of any order in Eq. (5.34) should not be viewed as a Taylor series expansion of the
same order, nor does it represent a limited degree of nonlinearity in G(X). In fact,
the dimension decomposition provides greater accuracy than a Taylor series
expansion of the same or even higher order. In particular, the residual error in a
univariate approximation to a multidimensional integration of a system response
over a symmetric domain was reported to be far less than that of a second-order
Taylor expansion method for probability analysis [11]. We also note that, to con-
struct the dimension decomposition of a response function, or the Cut-HDMR, we
need to first define a reference point μ_X = (μ_{X_1}, μ_{X_2}, …, μ_{X_N}) in the input random
space. Regarding this issue, the work by Sobol [27] suggests that it is optimum to
define the reference point as the mean value of the input random variables. Thus,
this study will employ the mean point of the random inputs as the reference point.
The responses of most practical physical systems are significantly affected by
only low-order interactions (usually up to the second-order) of the input random
variables; the high-order interactions of these variables are often very weak. In these
systems, a few lower-order component functions are sufficient to capture the
response uncertainty. These considerations led to two well-known versions of
Cut-HDMR, namely the univariate dimension reduction (UDR) method [10] and
the bivariate dimension reduction (BDR) method [11]. Considering the component
functions in Eq. (5.34), looking only up to the first-order yields the univariate
decomposed response, expressed as
G_U(X) = G_{C0} + Σ_{i=1}^{N} G_{Ci}(X_i)        (5.40)
Replacing the component functions with the formulae in Eq. (5.36) gives us the
UDR formulation, expressed as
G_U(X) = Σ_{i=1}^{N} G(X)|_{X=μ_X\X_i} − (N − 1) G(μ_X)        (5.41)
For example, if a response function G(X) has three input random variables X1, X2,
and X3, the univariate decomposed response can be expressed as
G_U(X_1, X_2, X_3) = G(X_1, μ_{X_2}, μ_{X_3}) + G(μ_{X_1}, X_2, μ_{X_3})
                   + G(μ_{X_1}, μ_{X_2}, X_3) − 2G(μ_{X_1}, μ_{X_2}, μ_{X_3})        (5.42)
Similarly, retaining the component functions up to the second order yields the
bivariate decomposed response, expressed as

G_B(X) = G_{C0} + Σ_{i=1}^{N} G_{Ci}(X_i) + Σ_{1≤i_1<i_2≤N} G_{Ci_1 i_2}(X_{i_1}, X_{i_2})        (5.43)
Substituting the component functions with the formulae in Eq. (5.36) gives us the
BDR formulation, expressed as
G_B(X) = Σ_{1≤i_1<i_2≤N} G(X)|_{X=μ_X\(X_{i_1}, X_{i_2})}
       − (N − 2) Σ_{i=1}^{N} G(X)|_{X=μ_X\X_i} + [(N − 1)(N − 2)/2] G(μ_X)        (5.44)
For the same response function G(X) with three input random variables X1, X2, and
X3, the bivariate decomposed response can be expressed as
G_B(X_1, X_2, X_3) = G(X_1, X_2, μ_{X_3}) + G(X_1, μ_{X_2}, X_3) + G(μ_{X_1}, X_2, X_3)
                   − G(X_1, μ_{X_2}, μ_{X_3}) − G(μ_{X_1}, X_2, μ_{X_3}) − G(μ_{X_1}, μ_{X_2}, X_3)
                   + G(μ_{X_1}, μ_{X_2}, μ_{X_3})        (5.45)
To further predict the reliability or PDF of the response, the decomposed compo-
nent functions need to be integrated or interpolated, followed by the use of a PDF
generation technique (in the case of integration [11]) or the use of direct MCS (in
the case of interpolation [22]). In what follows, the procedure for numerical
interpolation is discussed in detail.
Numerical Interpolation for Component Function Approximation
Consider the UDR formula in Eq. (5.41) where the univariate component function
can be approximated with function values at a set of univariate sample points,
expressed as
G(X)|_{X=μ_X\X_i} = G(μ_{X_1}, …, μ_{X_{i−1}}, X_i, μ_{X_{i+1}}, …, μ_{X_N})
                  ≈ Σ_{j=1}^{m} a_j(X_i) G(μ_{X_1}, …, μ_{X_{i−1}}, x_i^{(j)}, μ_{X_{i+1}}, …, μ_{X_N})        (5.46)
where m is the number of univariate sample points (X_i = x_i^{(1)}, x_i^{(2)}, …, x_i^{(m)}), and
a_j(X_i) is the jth interpolation basis function. A widely used interpolation basis
function is the Lagrange polynomial. In Lagrange interpolation, a_j(X_i) has the
following form
a_j(X_i) = ∏_{k=1, k≠j}^{m} (X_i − x_i^{(k)}) / ∏_{k=1, k≠j}^{m} (x_i^{(j)} − x_i^{(k)})        (5.47)
Repeating this interpolation for all univariate component functions in Eq. (5.36),
we then have an explicit function approximation for the response function,
expressed as
G_U(X) ≈ Σ_{i=1}^{N} Σ_{j=1}^{m} a_j(X_i) G(μ_{X_1}, …, μ_{X_{i−1}}, x_i^{(j)}, μ_{X_{i+1}}, …, μ_{X_N}) − (N − 1) G(μ_X)        (5.48)
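A small MATLAB sketch of Eqs. (5.47)–(5.48) is given below. The two-variable performance function, its input statistics, and the choice of m = 3 sample points at μ_i and μ_i ± 3σ_i are all assumptions made for illustration; the UDR approximation and the true response agree up to the neglected interaction effects.

% Minimal UDR with Lagrange interpolation (Eqs. 5.46-5.48) for a
% hypothetical two-variable response g(x).
function udr_lagrange_demo()
  mu = [1.0, 2.0];  sg = [0.1, 0.2];                 % assumed means and std. devs.
  m  = 3;  N = numel(mu);
  g  = @(x) x(1)^2 + 3*x(1)*x(2) + exp(0.5*x(2));    % hypothetical response
  g0 = g(mu);
  nodes = zeros(N, m);  vals = zeros(N, m);          % univariate sample points/values
  for i = 1:N
      nodes(i,:) = mu(i) + [-3 0 3]*sg(i);
      for j = 1:m
          x = mu;  x(i) = nodes(i,j);
          vals(i,j) = g(x);
      end
  end
  xtrial = mu + [0.12, -0.25];                       % point at which to evaluate Eq. (5.48)
  GU = -(N-1)*g0;
  for i = 1:N
      for j = 1:m
          GU = GU + lagrange(xtrial(i), nodes(i,:), j)*vals(i,j);
      end
  end
  fprintf('true g = %.4f, UDR approximation = %.4f\n', g(xtrial), GU);
end
function a = lagrange(xi, xnodes, j)
  % jth Lagrange basis polynomial (Eq. 5.47) evaluated at xi
  k = setdiff(1:numel(xnodes), j);
  a = prod(xi - xnodes(k))/prod(xnodes(j) - xnodes(k));
end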
If we use the same number of sample points m for the Lagrange interpolation of all
univariate component functions, we then need (m − 1)N + 1 function evaluations
for the UDR. An empirical sample point distribution for the UDR when m = 3 is
Fig. 5.9 Sample points used by the UDR (m = 3) and by the full factorial design for two random variables (combinations of μ_{X_i} and μ_{X_i} ± 3σ_{X_i})
shown in Fig. 5.9. Also shown is the full factorial (or tensor-product) design (the
number of sample points being m^N) without the use of UDR. It is apparent that,
compared to the full factorial design, UDR achieves a significant reduction in the
number of sample points.
In the case of the BDR, the Lagrange interpolation of the
bivariate component function can be expressed as
G(X)|_{X=μ_X\(X_{i_1}, X_{i_2})} = G(μ_{X_1}, …, μ_{X_{i_1−1}}, X_{i_1}, μ_{X_{i_1+1}}, …, μ_{X_{i_2−1}}, X_{i_2}, μ_{X_{i_2+1}}, …, μ_{X_N})
 ≈ Σ_{j_1=1}^{m} Σ_{j_2=1}^{m} a_{j_1}(X_{i_1}) a_{j_2}(X_{i_2})
   G(μ_{X_1}, …, μ_{X_{i_1−1}}, x_{i_1}^{(j_1)}, μ_{X_{i_1+1}}, …, μ_{X_{i_2−1}}, x_{i_2}^{(j_2)}, μ_{X_{i_2+1}}, …, μ_{X_N})        (5.49)
Repeating this interpolation for all bivariate component functions in Eq. (5.44)
gives us an explicit formula for the bivariate approximation of the response func-
tion, expressed as
G_B(X) ≈ Σ_{1≤i_1<i_2≤N} Σ_{j_1=1}^{m} Σ_{j_2=1}^{m} a_{j_1}(X_{i_1}) a_{j_2}(X_{i_2})
           G(μ_{X_1}, …, μ_{X_{i_1−1}}, x_{i_1}^{(j_1)}, μ_{X_{i_1+1}}, …, μ_{X_{i_2−1}}, x_{i_2}^{(j_2)}, μ_{X_{i_2+1}}, …, μ_{X_N})
       − (N − 2) Σ_{i=1}^{N} Σ_{j=1}^{m} a_j(X_i) G(μ_{X_1}, …, μ_{X_{i−1}}, x_i^{(j)}, μ_{X_{i+1}}, …, μ_{X_N})
       + [(N − 1)(N − 2)/2] G(μ_X)        (5.50)
Since we have N(N − 1)/2 bivariate combinations and we need (m − 1)2 sample
points for each bivariate combination (excluding the m − 1 univariate sample
points), the number of sample points for computing the bivariate component
functions is N(N − 1)(m − 1)2/2. Therefore, the total number of sample points
required by the BDR is N(N − 1)(m − 1)2/2 + (m − 1)N + 1. Similarly, we can
apply Lagrange interpolation to other versions of the DR method involving third-
and higher-order component functions.
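The sample-count comparison above can be checked with a few lines of MATLAB; the values N = 10 and m = 3 below are chosen only as a representative case.

% Number of performance function evaluations for the UDR, the BDR, and a
% full factorial design (assumed case: N = 10 random variables, m = 3 points).
N = 10;  m = 3;
n_udr  = (m - 1)*N + 1;                          % UDR: 21
n_bdr  = N*(N - 1)*(m - 1)^2/2 + (m - 1)*N + 1;  % BDR: 201
n_full = m^N;                                    % full factorial: 59,049
fprintf('UDR: %d, BDR: %d, full factorial: %d\n', n_udr, n_bdr, n_full);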
Monte Carlo Simulation for Uncertainty Quantification
Once Lagrange interpolation is completed for all component functions in the UDR
or BDR, an approximate function Ĝ of the original response function G can be
obtained by interpolation using Lagrange polynomials at a set of sample points.
Thus, any probabilistic characteristics of G(x), including statistical moments, reli-
ability, and PDF, can be easily estimated by performing direct MCS. For example,
any rth moment can be calculated as
β_r ≈ ∫ Ĝ^r(x) f_X(x) dx = E[ Ĝ^r(x) ] = lim_{M→∞} (1/M) Σ_{j=1}^{M} Ĝ^r(x_j)        (5.51)
where β_r is the rth moment of the performance function G(X); f_X(x) is the joint
PDF; x_j is the jth realization of X; and M is the sample size. For reliability esti-
mation, we can define an approximate safe domain for the performance function
G as
Ω̂_S = { x: Ĝ(x) ≤ 0 }        (5.52)

and estimate the reliability as

R ≈ ∫ I_{Ω̂_S}(x) f_X(x) dx = E[ I_{Ω̂_S}(x) ] = lim_{M→∞} (1/M) Σ_{j=1}^{M} I_{Ω̂_S}(x_j)        (5.53)
It should be noted that the MCS performed here employs the explicit interpolation
Ĝ instead of the original performance function G and is thus computationally
inexpensive. It is also
noted that the approximation of the response function over the input domain allows
for the derivation of any probabilistic characteristics (e.g., statistical moments,
reliability, and PDF) based on the same set of sample points. This is desirable,
especially in reliability-based robust design problems where both moment estima-
tion and reliability analysis are required [30–32].
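The post-processing step in Eqs. (5.51)–(5.53) can be sketched in a few lines of MATLAB. The surrogate Ghat, the input distributions, and the threshold below are hypothetical placeholders used only to illustrate the workflow; skewness, kurtosis, and ksdensity are Statistics Toolbox functions.

% Post-processing a surrogate Ghat by direct MCS (cf. Eqs. 5.51-5.53).
Ghat = @(x) 0.2*x(:,1).^2 + x(:,2) - 1.2;             % hypothetical surrogate
M = 1e5;
x = [1.0 + 0.5*randn(M,1), 0.4 + 0.3*randn(M,1)];     % assumed input distributions
g = Ghat(x);
mom = [mean(g), std(g), skewness(g), kurtosis(g)];    % first four statistical moments
R = mean(g <= 0);                                      % reliability estimate
[fG, gpts] = ksdensity(g);                             % kernel estimate of the PDF of G
fprintf('mean %.3f, std %.3f, R = %.4f\n', mom(1), mom(2), R);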
MATLAB Code for UDR-based SRSM
A 99-line MATLAB code that implements a UDR-based SRSM is provided in the
Appendix. The method in the code first uses the UDR to decompose the multidi-
mensional performance function into multiple one-dimensional univariate functions
and then employs cubic spline interpolation to approximate the one-dimensional
univariate functions. The two-step process results in a stochastic response surface,
with which the MCS is then applied to obtain the full probabilistic characteristics
(e.g., statistical moments, reliability, and PDF) of the performance function.
The stochastic spectral method [13] is an emerging technique for reliability analysis
of complex engineering problems. This method uses a number of response samples
and generates a stochastic response surface approximation with multi-dimensional
polynomials over a random space. Once the explicit response surface is constructed,
MCS is often used for reliability analysis due to its convenience. The most popular
stochastic spectral method is the polynomial chaos expansion (PCE) method. The
fundamentals and computational procedures of PCE are detailed next.
Fundamentals of PCE
The original Hermite polynomial chaos, also called the homogeneous chaos, was
derived from the original theory of Wiener [14] for the spectral representation of
stochastic processes in terms of Gaussian random variables. The generalized
polynomial chaos extends this idea to other types of random inputs by employing
orthogonal polynomials from the Askey scheme; the correspondence between
common input distributions and polynomial chaos bases is summarized in Table 5.4.
Table 5.4 Types of random inputs and corresponding generalized polynomial chaos basis

Random variable                    Polynomial chaos                     Support
Continuous  Gaussian               Hermite                              (−∞, +∞)
            Gamma (Exponential)    Generalized Laguerre (Laguerre)      [0, +∞)
            Beta                   Jacobi                               [a, b]
            Uniform                Legendre                             [a, b]
Discrete    Poisson                Charlier                             {0, 1, …}
            Binomial               Krawtchouk                           {0, 1, …, N}
            Negative binomial      Meixner                              {0, 1, …}
            Hypergeometric         Hahn                                 {0, 1, …, N}
G(X) = c_0 Γ_0 + Σ_{i_1=1}^{∞} c_{i_1} Γ_1(ζ_{i_1}(X))
     + Σ_{i_1=1}^{∞} Σ_{i_2=1}^{i_1} c_{i_1 i_2} Γ_2(ζ_{i_1}(X), ζ_{i_2}(X))        (5.55)
     + Σ_{i_1=1}^{∞} Σ_{i_2=1}^{i_1} Σ_{i_3=1}^{i_2} c_{i_1 i_2 i_3} Γ_3(ζ_{i_1}(X), ζ_{i_2}(X), ζ_{i_3}(X)) + ⋯
where Γ_n(ζ_{i_1}(X), ζ_{i_2}(X), …, ζ_{i_n}(X)) denotes the n-dimensional Askey-chaos of
order n in terms of the random variables ζ_{i_1}, ζ_{i_2}, …, ζ_{i_n}. According to the
Cameron-Martin theorem [33], the polynomial chaos expansion in Eq. (5.55)
converges in the L2 sense (the mean-square sense).
For the purpose of notational convenience, Eq. (5.55) is often rewritten as
G(X) = Σ_{i=0}^{∞} s_i Ψ_i(ζ(X)),   ζ = {ζ_1, ζ_2, …}        (5.56)
where there exists a one-to-one mapping between the polynomial basis functions Γ_n
and Ψ_i, and between the PCE coefficients s_i and c_{i_1⋯i_r}.
If the random variables ζ follow the standard normal distribution, the following
expression can be used to obtain the corresponding univariate Hermite polynomials
of order p:
Ψ_p(ζ) = (−1)^p e^{ζ²/2} ∂^p(e^{−ζ²/2}) / ∂ζ^p        (5.57)
Using the expression above, we can easily derive the first five Hermite polynomials,
expressed as

Ψ_0(ζ) = 1,  Ψ_1(ζ) = ζ,  Ψ_2(ζ) = ζ² − 1,  Ψ_3(ζ) = ζ³ − 3ζ,  Ψ_4(ζ) = ζ⁴ − 6ζ² + 3        (5.58)
These polynomials are plotted in Fig. 5.10. Observe that higher-order Hermite
polynomials generally exhibit higher degrees of nonlinearity.
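In practice, the polynomials in Eq. (5.58) are conveniently generated with the standard three-term recurrence of the probabilists' Hermite family rather than by repeated differentiation of Eq. (5.57); the MATLAB sketch below illustrates this (the evaluation grid is an arbitrary choice).

% Probabilists' Hermite polynomials via the recurrence
% He_{p+1}(z) = z*He_p(z) - p*He_{p-1}(z), consistent with Eqs. (5.57)-(5.58).
p_max = 4;
z  = linspace(-3, 3, 121)';          % evaluation grid (arbitrary)
He = zeros(numel(z), p_max + 1);
He(:,1) = 1;                         % He_0
He(:,2) = z;                         % He_1
for p = 1:p_max-1
    He(:,p+2) = z.*He(:,p+1) - p*He(:,p);
end
% e.g., He(:,5) equals z.^4 - 6*z.^2 + 3 up to round-off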
The univariate Hermite polynomials serve as the foundation for constructing the
multi-dimensional Hermite polynomials by taking tensor products. To do so, we
first define a multi-index p = {p_1, p_2, …, p_N} whose ith element p_i is the polynomial
order corresponding to the ith standard normal variable ζ_i. We then define the
modulus of the multi-index p as
|p| = Σ_{i=1}^{N} p_i        (5.59)
The N-dimensional Hermite polynomial associated with the multi-index p is then
obtained as the tensor product

Ψ_p^N(ζ) = ∏_{i=1}^{N} Ψ_{p_i}^1(ζ_i)        (5.60)

For example, in a two-dimensional case (N = 2) with the order up to p = 3, the
multi-indices can be arranged in increasing order of their moduli as

p = [ 0 1 0 | 2 1 0 | 3 2 1 0
      0 0 1 | 0 1 2 | 0 1 2 3 ]        (5.61)
where the first and second rows contain the values for p1 and p2, respectively, and
the vertical lines separate a lower-order |p| from the adjacent higher-order |p| + 1.
Accordingly, the third-order PCE in this two-dimensional case has the following form

G(X) ≈ s_0 + s_1 ζ_1 + s_2 ζ_2 + s_3 (ζ_1² − 1) + s_4 ζ_1 ζ_2 + s_5 (ζ_2² − 1)
     + s_6 (ζ_1³ − 3ζ_1) + s_7 (ζ_1² − 1)ζ_2 + s_8 ζ_1(ζ_2² − 1) + s_9 (ζ_2³ − 3ζ_2)        (5.62)

The polynomial chaos basis functions are mutually orthogonal with respect to the
joint PDF of ζ, that is,

E[ Ψ_i(ζ) Ψ_j(ζ) ] = E[ Ψ_i²(ζ) ] δ_ij        (5.63)

where δ_ij is the Kronecker delta and E[·] is the expectation operator. This property
is very useful in computing the PCE coefficients, as will be discussed later. In
engineering practice, it is impractical to consider an infinite summation in
Eq. (5.56) and we often truncate the expansion up to a specific order p. All
N-dimensional polynomials of orders not exceeding p result in the truncated PCE as
follows (with P denoting the number of unknown PCE coefficients):
G(x) ≈ Σ_{i=0}^{P−1} s_i Ψ_i(ζ),   x = {x_1, x_2, …, x_N},  ζ = {ζ_1, ζ_2, …, ζ_N}        (5.64)

with

P = ( N + p choose p ) = (N + p)! / (N! p!)        (5.65)
Projection method
Based on the orthogonality of the polynomial chaos, the projection method [34,
35] can be used as a non-intrusive approach to compute the expansion coefficients
of a response. Pre-multiplying both sides of Eq. (5.56) by Wj(f) and taking the
expectation gives the following equation
" #
X1
E GðXÞWj ðfÞ ¼ E si Wi ðfÞWj ðfÞ ð5:66Þ
i¼0
Due to the orthogonality of the polynomial chaos, Eq. (5.66) takes the form
E GðXÞWj ðfÞ
sj ¼ h i ð5:67Þ
E W2j ðfÞ
Regression method
An alternative non-intrusive approach is the regression method, which estimates the
PCE coefficients by least squares from performance function values g = [G(x_1), …,
G(x_M)]^T evaluated at a set of regression points; collecting the basis values in the
information matrix Ψ (with entries Ψ_{ki} = Ψ_i(ζ_k)), the coefficients follow as

s = (Ψ^T Ψ)^{−1} Ψ^T g        (5.69)

Now the remaining question is how to select the regression points. Let us first
consider a univariate case (N = 1) where the PCE is truncated up to the degree
p (clearly, P = p + 1). The optimum regression points are given by the roots of the
orthogonal polynomial of the order p + 1, that is {r1, r2, …, rp+1}. For example, the
optimum regression points for a third-order PCE (p = 3) with Hermite polynomials
can be obtained by equating the fourth-order Hermite polynomial to zero, that is f4
− 6f2 + 3 = 0, which gives us the following points: −2.3344, −0.7420, 0.7420 and
2.3344. For a general multivariate case (N > 1), an optimum set of regression points
can be obtained by applying the tensor-product formula to the univariate points {r_1,
r_2, …, r_{p+1}}; each regression point is an N-fold combination ζ_k = (r_{k_1}, r_{k_2}, …, r_{k_N})
of these roots, for k = 1, 2, …, (p + 1)^N. Note that, in cases of high dimensions (N) or high orders
(p), the computational cost of a full tensor-product formula becomes intolerably
expensive. The research efforts to address this issue have resulted in the idea of
selecting a subset of the full tensor-product points [40, 41]. One way is to choose
the first (N − 1)P roots with the smallest Euclidean distances to the origin [40].
Another way is to sort the tensor-product points according to increasing Euclidean
distances and adaptively add points into the information matrix until Ψ^TΨ
becomes invertible, which was reported to give less than (N − 1)P regression points
[41]. It should also be noted that, for very high expansion orders (e.g., above 20),
the regression method with even full tensor-product points can encounter numerical
instability, i.e., the term Ψ^TΨ is ill-conditioned. In such cases, we need to rely on
the projection method to compute the PCE coefficients.
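The regression approach can be sketched in a few lines of MATLAB. The second-order, two-variable performance function below is a hypothetical placeholder (it is not the clutch problem of Example 5.4); the regression points are the tensor product of the roots of the third-order Hermite polynomial, and the coefficients are obtained with the backslash (least-squares) operator.

% Least-squares regression PCE (N = 2, order p = 2) for a hypothetical model
% expressed directly in terms of standard normal variables zeta.
g = @(z) exp(0.3*z(:,1)) + 0.5*z(:,2).^2 - 1;        % hypothetical model
r = roots([1 0 -3 0]);                               % roots of He_3: {0, +-sqrt(3)}
[Z1, Z2] = meshgrid(sort(r), sort(r));
Z = [Z1(:), Z2(:)];                                  % 9 tensor-product regression points
% second-order Hermite basis: 1, z1, z2, z1^2-1, z1*z2, z2^2-1  (P = 6, cf. Eq. 5.65)
Psi = [ones(size(Z,1),1), Z(:,1), Z(:,2), Z(:,1).^2-1, Z(:,1).*Z(:,2), Z(:,2).^2-1];
s   = Psi \ g(Z);                                    % least-squares PCE coefficients
% evaluate the PCE at new samples and compare the sample means
Znew = randn(1e4, 2);
Psin = [ones(1e4,1), Znew(:,1), Znew(:,2), Znew(:,1).^2-1, Znew(:,1).*Znew(:,2), Znew(:,2).^2-1];
fprintf('mean: true %.3f vs PCE %.3f (approximately equal)\n', mean(g(Znew)), mean(Psin*s));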
Recently, to estimate small failure probabilities, shifted and windowed Hermite
polynomial chaos expansions were proposed to enhance the accuracy of the response
surface in the failure region [42]. Although the PCE method is considered to be
accurate, the number of expansion coefficients, and hence the number of required
function evaluations, grows rapidly with the dimension N and the order p (see
Eq. (5.65)), which can make high-order PCE expensive for high-dimensional problems.
Example 5.4 This example examines Fortini’s clutch, shown in Fig. 5.12.
This problem has been extensively used in the field of tolerance design [43,
44]. As shown in Fig. 5.12, the overrunning clutch is assembled by inserting
a hub and four rollers into the cage.
The contact angle y between the vertical line and the line connecting the
centers of two rollers and the hub, is expressed in terms of the independent
component variables, x1, x2, x3, and x4 as follows:
y(x) = arccos[ (x_1 + 0.5(x_2 + x_3)) / (x_4 − 0.5(x_2 + x_3)) ]
Fig. 5.12 Fortini’s clutch. Reprinted (adapted) with permission from Ref. [44]
Solution
Let us first compute the expansion coefficients for the first-order PCE. We
solve ζ² − 1 = 0 for the roots of the second-order Hermite polynomial, which
gives us r_1 = −1 and r_2 = 1. We then obtain 2⁴ = 16 tensor-product
regression points ζ_j = (ζ_{1,j}, ζ_{2,j}, ζ_{3,j}, ζ_{4,j}), for j = 1, 2, …, 16. Using these, we
construct the information matrix of the following form:

Ψ = [ 1   ζ_{1,1}    ζ_{2,1}    ζ_{3,1}    ζ_{4,1}
      1   ζ_{1,2}    ζ_{2,2}    ζ_{3,2}    ζ_{4,2}
      ⋮   ⋮          ⋮          ⋮          ⋮
      1   ζ_{1,16}   ζ_{2,16}   ζ_{3,16}   ζ_{4,16} ]
Next, we conduct a least square regression using Eq. (5.69) to obtain the PCE
coefficients and construct the PCE model as
Similarly, we can go through the same process and construct the second-order
PCE model as
The PDF approximations of the first- and second-order PCEs and direct MCS
with 1,000,000 samples are compared in Fig. 5.13. Observe that the PDF
approximation is improved from the first-order to the second-order through
inclusion of the second-order orthogonal Hermite polynomials in the PCE
model.
Table 5.6 summarizes the probability analysis results for the first- and
second-order PCEs, and compares these with direct MCS. The second-order
PCE produces more accurate results than its first-order counterpart but
requires more computational effort (in terms of original performance function
evaluations). The second-order PCE gives comparable accuracy to direct
5.5 Stochastic Response Surface Methods 137
MCS but is far more efficient, which suggests that a PCE model with a
sufficiently high order can be used as a better alternative to direct MCS for
reliability.
Great attention has been paid to the stochastic collocation method for approximating
a multi-dimensional random function due to its strong mathematical foundation and
its ability to achieve fast convergence for interpolation construction. This method is
another SRSM that approximates a multi-dimensional random function using
function values given at a set of collocation points. In the stochastic collocation
U^i(G) = Σ_{j=1}^{m_i} a_j^i · G(x_j^i)        (5.71)
where i ∈ ℕ is the interpolation level, a_j^i ∈ C([0, 1]) is the jth interpolation nodal
basis function, x_j^i is the jth support node, and m_i is the number of support nodes in
the interpolation level i. Note that we use the superscript i to denote the interpo-
lation level during the development of the stochastic collocation method. Two
widely used nodal basis functions are the piecewise multi-linear basis function and
the Lagrange polynomial. Here, we briefly describe the fundamentals of piecewise
multi-linear basis functions. To achieve faster error decay, the Clenshaw-Curtis grid
with equidistant nodes is often used for piecewise multi-linear basis functions [18].
In the case of a univariate interpolation (N = 1), the support nodes are defined as
m_i = { 1,            if i = 1
        2^{i−1} + 1,  if i > 1

x_j^i = { (j − 1)/(m_i − 1),  for j = 1, …, m_i, if m_i > 1
          0.5,                for j = 1,        if m_i = 1        (5.73)
The resulting set of points fulfills the nesting property X^i ⊂ X^{i+1}, which is very
useful for the hierarchical interpolation scheme detailed later. Then, the univariate
piecewise linear nodal basis functions are defined as

a_1^1(x) = 1 for i = 1, and, for i > 1,

a_j^i(x) = { 1 − (m_i − 1)|x − x_j^i|,  if |x − x_j^i| < 1/(m_i − 1)
             0,                        otherwise        (5.74)
In the multivariate case (N > 1), the full tensor-product interpolation formula is

(U^{i_1} ⊗ ⋯ ⊗ U^{i_N})(G) = Σ_{j_1=1}^{m_1} ⋯ Σ_{j_N=1}^{m_N} (a_{j_1}^{i_1} ⊗ ⋯ ⊗ a_{j_N}^{i_N}) · G(x_{j_1}^{i_1}, …, x_{j_N}^{i_N})        (5.75)
where the superscript ik, k = 1, …, N, denotes the interpolation level along the kth
dimension, U ik are the interpolation functions with the interpolation level ik along
the kth dimension, and the subscript jk, k = 1, …, N, denotes the index of a given
support node in the kth dimension. The number of function evaluations required by
the tensor-product formula reads
M_T = m_1 · m_2 ⋯ m_N        (5.76)
Suppose that we have the same number of collocation points in each dimension,
i.e., m_1 = m_2 = ⋯ = m_N ≡ m, and that the total number of tensor-product collo-
cation points is M_T = m^N. Even if we only have three collocation points (m = 3) in
each dimension, this number (M_T = 3^N) still grows very quickly as the number of
dimensions is increased (e.g., M_T = 3^10 ≈ 6 × 10^4 for N = 10). Thus, we need
more efficient sampling schemes than the tensor-product grid to reduce the amount
of computational effort for multi-dimensional interpolation. The search for such
sampling schemes has resulted in sparse grid methods, the fundamentals of which
are briefly introduced in subsequent sections.
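The univariate building blocks in Eqs. (5.72)–(5.73) and the tensor-product growth in Eq. (5.76) can be checked with the short MATLAB sketch below; the number of levels and the case N = 10, m = 3 are arbitrary illustrative choices.

% Univariate equidistant nodes per interpolation level (Eqs. 5.72-5.73) and
% the growth of a full tensor-product grid with the dimension N.
levels = 4;
nodes  = cell(1, levels);
for i = 1:levels
    if i == 1, mi = 1; else, mi = 2^(i-1) + 1; end
    if mi == 1, nodes{i} = 0.5; else, nodes{i} = (0:mi-1)/(mi - 1); end
end
nested = all(ismember(nodes{2}, nodes{3}));   % nesting property X^2 in X^3 -> true
N = 10;  m = 3;
MT = m^N;                                     % tensor-product size, Eq. (5.76): 59,049
fprintf('level-3 nodes: %s\nnested = %d, tensor-product points (N=10, m=3): %d\n', ...
        mat2str(nodes{3}), nested, MT);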
Smolyak Algorithm: Conventional Sparse Grid
Compared to the classical tensor-product algorithm, the Smolyak algorithm
achieves an order of magnitude reduction in the number of collocation points, while
maintaining the approximation quality of the interpolation by imposing an
inequality constraint on the summation of multi-dimensional indices [16]. This
inequality leads to special linear combinations of tensor-product formulas such that
the interpolation error remains the same as for the tensor-product algorithm.
A_{q,N}(G) = Σ_{q−N+1 ≤ |i| ≤ q} (−1)^{q−|i|} C(N − 1, q − |i|) · (U^{i_1} ⊗ ⋯ ⊗ U^{i_N})(G)        (5.77)

where i = (i_1, …, i_N) is the multi-index and |i| = i_1 + ⋯ + i_N. The formula above
indicates that the Smolyak algorithm builds the multi-dimensional interpolation by
considering one-dimensional functions of interpolation levels i1, …, iN under the
constraint that the sum of these interpolation levels lies within the range [q − N + 1,
q]. Figure 5.14 shows an example of two-dimensional (N = 2) nodes derived from a
sparse grid A_{4,2} with |i| ≤ 4 and from a tensor-product grid based on the same
one-dimensional points. Observe that the number of points in the sparse grid is
significantly smaller than that in the tensor-product grid. The 2D Clenshaw-Curtis
grids for different levels of resolutions specified by different q values are plotted in
Fig. 5.15.
With the incremental interpolant Δ^i = U^i − U^{i−1}, U^0 = 0, the Smolyak formulas
can be equivalently written as

A_{q,N}(G) = Σ_{|i| ≤ q} (Δ^{i_1} ⊗ ⋯ ⊗ Δ^{i_N})(G)
           = A_{q−1,N}(G) + Σ_{|i| = q} (Δ^{i_1} ⊗ ⋯ ⊗ Δ^{i_N})(G)        (5.78)
The formulas above suggest that the Smolyak algorithm improves the interpolation
by utilizing all of the previous interpolation formulas Aq−1,N and the current
incremental interpolant with the order q. If we select the sets of support nodes in a
nested fashion (i.e., Xi Xi+1) to obtain recurring points (e.g., the Clenshaw-Curtis
grid) when extending the interpolation level from i to i + 1, we only need to
compute function values at the differential grids that are unique to Xi+1,
Fig. 5.14 Comparison of a sparse grid and a tensor-product grid (sparse grid for |i| ≤ 4, tensor-product grid based on the same one-dimensional points)
X_Δ^{i+1} = X^{i+1}\X^i. In such cases, to build a sparse multi-dimensional interpolation with
the order q, we only need to compute function values at the nested sparse grid
H_{q,N} = ∪_{|i| ≤ q} ( X_Δ^{i_1} × ⋯ × X_Δ^{i_N} ) = H_{q−1,N} ∪ ΔH_{q,N}
ΔH_{q,N} = ∪_{|i| = q} ( X_Δ^{i_1} × ⋯ × X_Δ^{i_N} )        (5.79)
where ΔHq,N denotes the grid points required to increase an interpolation order from
q − 1 to q.
Although the Smolyak algorithm greatly reduces the number of collocation
points for the multi-dimensional interpolation compared to the tensor-product
algorithms, there is still a possibility that the number of function evaluations can be
further reduced in cases where the performance function exhibits different degrees
of nonlinearity in the stochastic dimensions. To achieve such a reduction, one must
adaptively detect the dimensions with higher degrees of nonlinearity and assign
more collocation points to those dimensions. This can be accomplished by using the
dimension-adaptive tensor-product algorithm, which is detailed in the next
subsection.
Dimension-Adaptive Tensor-Product Algorithm: Generalized Sparse Grid
For a given interpolation level l, the conventional sparse grid requires the index set
I_{l,N} = {i : |i| ≤ l + N} to build the interpolation A(l + N, N). If we loosen the
admissibility condition on the index set, we can construct the index set of the
generalized sparse grid [17]. An index set I is called admissible if for all i 2 I,
i − e_k ∈ I   for 1 ≤ k ≤ N, i_k > 1        (5.80)
Here, ek is the kth unit vector. This admissibility condition still satisfies the tele-
scopic property of the incremental interpolant Δi = Ui − Ui−1. Thus, we can take
advantage of the previous interpolation to construct a better interpolation by just
sampling the differential grids that are unique to the finer interpolation, as shown in
Eqs. (5.78) and (5.79). In each step of the algorithm, an error indicator is assigned
to each multi-index i. The multi-index it with the largest estimated error is selected
for adaptive refinement, since it is possible that a larger error reduction can be
achieved. The admissible indices in the forward neighborhood of it are added to the
index set I. The forward neighborhood of an index i can be defined as
I_F(i) = { i + e_k : 1 ≤ k ≤ N }        (5.81)
In each step, the newly added indices are called active indices and grouped as an
active index set IA; whereas, those indices whose forward neighborhood has been
refined are called old indices and grouped as an old index set IO. The overall index
set I is comprised of the active and old index sets: I = I_A ∪ I_O.
Note that in the dimension-adaptive algorithm, the generalized sparse grid
construction allows for adaptive detection of the important dimensions, and thus a
more efficient refinement compared to the conventional sparse grid interpolation.
However, in engineering practice, not only different dimensions but also two
opposite directions (positive and negative) within one dimension often demonstrate
a large difference in response nonlinearity. In such cases, it is desirable to place
more points in the direction with higher nonlinearity. The dimension-adaptive
algorithm may not be appropriate for this purpose.
Hierarchical Interpolation Scheme
For dimension-adaptive interpolation, the hierarchical interpolation scheme provides
a more convenient way for error estimation than the nodal interpolation scheme [18].
Here, we start by deriving the hierarchical interpolation formulae in the case of the
univariate interpolation, which takes advantage of the nested characteristic of grid
points (i.e., X^i ⊂ X^{i+1}). Recall the incremental interpolant Δ^i = U^i − U^{i−1}. Based on
Eq. (5.71) and U^{i−1}(G) = U^i(U^{i−1}(G)), we can write [18]

Δ^i(G) = U^i(G) − U^i(U^{i−1}(G))
       = Σ_{x_j^i ∈ X^i} a_j^i G(x_j^i) − Σ_{x_j^i ∈ X^i} a_j^i U^{i−1}(G)(x_j^i)
       = Σ_{x_j^i ∈ X^i} a_j^i [ G(x_j^i) − U^{i−1}(G)(x_j^i) ]        (5.82)
Because G(x_j^i) − U^{i−1}(G)(x_j^i) = 0 for all x_j^i ∈ X^{i−1}, Eq. (5.82) can be rewritten as

Δ^i(G) = Σ_{x_j^i ∈ X_Δ^i} a_j^i [ G(x_j^i) − U^{i−1}(G)(x_j^i) ]        (5.83)
By denoting the jth element of X_Δ^i by x_j^i and the number of such differential nodes
by m_Δ^i, Eq. (5.83) can be rewritten as

Δ^i(G) = Σ_{j=1}^{m_Δ^i} a_j^i [ G(x_j^i) − U^{i−1}(G)(x_j^i) ] = Σ_{j=1}^{m_Δ^i} a_j^i w_j^i        (5.85)
Here, wij is defined as the hierarchical surplus, which indicates the interpolation
error of a previous interpolation at the node xij of the current interpolation level
i. The bigger the hierarchical surpluses, the larger the interpolation errors. For
smooth performance functions, the hierarchical surpluses approach zero as the
interpolation level goes to infinity. Therefore, the hierarchical surplus can be used
as a natural candidate for error estimation and control. Figure 5.16 shows the
comparison between the hierarchical and nodal basis functions with a piecewise
linear spline and a Clenshaw-Curtis grid [18, 45]. Figure 5.17 illustrates the
comparison between the hierarchical and nodal interpolation. Based on the Smolyak
formula in Eq. (5.78), a multivariate hierarchical interpolation formula can be
obtained as [18, 45].
Fig. 5.16 Nodal basis functions a_j^3, x_j^3 ∈ X^3 (a) and hierarchical basis functions a_j^i with the support nodes x_j^i ∈ X_Δ^i, i = 1, 2, 3 (b) for the Clenshaw-Curtis grid. Reprinted (adapted) with permission from Ref. [45]
Fig. 5.17 Nodal (a) and hierarchical (b) interpolations in 1D. Reprinted (adapted) with permission from Ref. [45]
A_{q,N}(G) = A_{q−1,N}(G) + Σ_{|i| = q} Σ_j w_j^i · (a_{j_1}^{i_1} ⊗ ⋯ ⊗ a_{j_N}^{i_N})(x)        (5.86)

where the inner sum runs over the nodes of the differential grid X_Δ^{i_1} × ⋯ × X_Δ^{i_N} and
w_j^i are the corresponding hierarchical surpluses.
I_Δ = I_Δ^1 × ⋯ × I_Δ^N        (5.87)

where I_Δ^k = { i_k^+, i_k^− }, 1 ≤ k ≤ N. Here, the forward neighborhood of a
multi-dimensional DI i_d ∈ I_Δ is defined as the N indices { i_d + e_k^{+/−}, 1 ≤ k ≤ N },
where the sign of the kth directional unit vector e_k^{+/−} is the same as that of the kth
element i_d^k of i_d. An example of the tensor-product grid and a DSG in a
two-dimensional space (N = 2) is shown in Fig. 5.18. Observe that the DI in a DSG
divides the original index space into four quadrants; this division allows for an
adaptive refinement of the quadrature points in these quadrants.
Fig. 5.18 Comparison of a tensor-product (tensor) grid and a DSG in a two-dimensional space
Based on the proposed concepts of the DI and DSG, the overall procedure of
ADATP interpolation is briefly summarized in Table 5.7. The relative error indi-
cator used in the interpolation scheme can be defined for a DI i as
e_r(i) = [ 1 / ((G_max − G_min) M_i) ] Σ_j | w_j^i |        (5.88)
Table 5.7 Procedure for ADATP interpolation. Reprinted (adapted) with permission from Ref.
[45]
Step 1 Set an initial interpolation level l (q − N) = 0; set the initial old index set IO = Ø and
the initial active index set IA = {i}, where the initial active DI i = (1, …, 1) is the
center point (0.5, …, 0.5); set an initial relative error indicator er(i) = 1
Step 2 Select a trial index set IT (from IA) with the error indicator greater than a relative
error threshold value eC; move the active index set IA to the old index set IO. If
IT = Ø, go to Step 7
Step 3 Select and remove the trial index it with the largest error indicator from IT; if none,
go to Step 6. If the number of the collocation points M exceeds the maximum
number Mmax, go to Step 7
Step 4 Generate the forward neighborhood IF of it and add IF to the active index set IA
Step 5 Compute the hierarchical surplus of each new added point based on the collocation
points in the old index set and compute the error indicator of each active index. Go
to Step 3
Step 6 Set an interpolation level l = l + 1 and go to Step 2
Step 7 Construct an explicit interpolation Ĝ of the performance function G
description of the ADATP method refers to the DI. Under the scheme of asym-
metric sampling, it is expected that the error decay is at least as fast as that of the
DATP interpolation.
Once the asymmetric dimension-adaptive sampling procedure is completed, an
approximate function Ĝ of the original performance function G can be obtained by
interpolation using the hierarchical basis functions at the collocation points. Thus,
any probabilistic characteristics of G(x), including statistical moments, reliability,
and PDF, can be easily estimated by performing MCS, as described in Sect. 5.3.1.
Example 5.5 This example utilizes the V6 gasoline engine problem intro-
duced by Lee [46]. The performance function considered in this example is
the power loss (PL) due to the friction between the piston ring and the
cylinder liner, oil consumption, blow-by, and liner wear rate. A ring/liner
subassembly simulation model was used to compute the PL. The simulation
model has four input parameters: ring surface roughness x1, liner surface
roughness x2, liner Young's modulus x3, and liner hardness x4. Of the four
inputs, the first two, ring surface roughness x1 and liner surface
roughness x2, were treated as random inputs following normal distributions
with means of 4.0 and 6.119 µm, respectively, and with unit variance. The other
two inputs, liner Young's modulus x3 and liner hardness x4, were treated as
deterministic inputs fixed at 80 GPa and 240 BHV, respectively. It has been
shown in [46] that the PL has a bimodal PDF.
Compare the accuracy of the UDR, PCE, and ADATP methods in
reproducing the PDF of the PL, and the accuracy of the FORM, UDR, PCE, and
ADATP methods in estimating the reliability defined as P(PL ≤ 0.3).
Fig. 5.19 PDF approximations for the V6 engine example. Reprinted (adapted) with
permission from Ref. [45]
Table 5.8 Probability analysis results for the V6 engine example. Reprinted (adapted) with permission from Ref. [45]

                  ADATP     MCS                  PCE (p = 25)   UDR (20N + 1)   FORM
Mean (kW)         0.3935    0.3935               0.3934         0.3935          –
Std. dev. (kW)    0.0311    0.0310               0.0311         0.0314          –
Skewness          −0.6062   −0.5883              −0.5742        −0.5393         –
Kurtosis          3.0567    3.0828               3.0566         3.0974          –
P(PL ≤ 0.3)       0.0055    0.0054 (±0.0005ᵃ)    0.0057         0.0048          0.0057
No. FE            72        100,000              625            41              15
ᵃ Error bounds computed with a 95% confidence level
Solution
To predict the bimodal shape of the PDF, the ADATP method uses
eC = 0.005, Mmax = 70, and cubic Lagrange splines [47] as the hierarchical
basis functions. Figure 5.19 shows the PDF approximations of the 25th order
PCE with a fully tensorized Gauss–Hermite quadrature (mI = 25), the UDR
method, the ADATP method, and MCS. Both the ADATP and the PCE
methods provide reasonably accurate approximations of the irregularly
shaped PDF, while the UDR method fails to represent the irregular shape of
this PDF. The probability analysis results shown in Table 5.8 suggest that the
number of function evaluations of the ADATP method is much smaller than
that of the PCE method with a fully tensorized Gaussian quadrature. In this
example, the FORM requires the smallest number of function evaluations
while still producing a good reliability estimate. The small error produced by
FORM is due to the nonlinearity of the power loss function. However, FORM
cannot be used for cases that require the construction of a complete PDF and
subsequent uncertainty propagations.
5.6 Exercises
5:1 Consider the cantilever beam-bar system in Problem 4.1 (see Fig. 4.9) in
Chap. 4. Suppose that a failure mode consists of two failure events: the
formation of a hinge at the fixed point of the beam (event E1), followed by the
formation of another hinge at the midpoint of the beam (event E3). The two
safety events can be expressed as:
Table 5.10 Statistical information for the random variables in Problem 5.2
Random variables X [lb] Y [lb] S [psi] w [in] t [in]
Distribution Normal Normal Normal Normal Normal
Mean 500 1000 400,000 2 1
Standard deviation 100 100 20,000 0.02 0.01
5:3 Consider the following simply supported beam subject to a uniform load, as
illustrated in Fig. 5.20. Suppose L = 5 m, and two random variables EI (X1)
and w (X2) are independent and follow normal distributions with means and
standard deviations summarized in Table 5.11.
The maximum deflection of the beam is given by

Y(X_1, X_2) = \frac{5 X_2 L^4}{384 X_1}

where X1 denotes EI and X2 denotes w.
% A 99-LINE UDR-BASED SRSM CODE WRITTEN BY HU C., WANG P., AND YOUN B.D. %
function UDR_RS()
clear all; close all;
u = [0.4 0.4];   %% Mean vector of random variables
s = [0.01 0.01]; %% Standard deviation vector
% ... (intermediate lines of the 99-line code, which generate the UDR sample
% points and loop over them, are not reproduced in this excerpt) ...
xx(k) = input(k,kk);             %% coordinate k of UDR sample point kk
if isequal(k,1) && isequal(xx,u) %% Avoid re-evaluating mean value
    output(k,kk) = findresponse(xx);
    gg = output(k,kk);
elseif ~isequal(k,1) && isequal(xx,u)
    output(k,kk) = gg;
else
    output(k,kk) = findresponse(xx);
end
end
end
%=================== Define Performance Function =======================%
function response = findresponse(xx)
if isvector(xx) == 1
    response = 0.75*exp(-0.25*(9*xx(1)-2)^2-0.25*(9*xx(2)-2)^2)...
        +0.75*exp(-(9*xx(1)+1)^2/49-(9*xx(2)+1)/10)...
        +0.50*exp(-0.25*(9*xx(1)-7)^2-0.25*(9*xx(2)-3)^2)...
        -0.20*exp(-(9*xx(1)-4)^2-(9*xx(2)-7)^2) - 0.6;
else
    response = 0.75*exp(-0.25*(9*xx(1,:)-2).^2-0.25*(9*xx(2,:)-2).^2)...
        +0.75*exp(-(9*xx(1,:)+1).^2/49-(9*xx(2,:)+1)/10)...
        +0.50*exp(-0.25*(9*xx(1,:)-7).^2-0.25*(9*xx(2,:)-3).^2)...
        -0.20*exp(-(9*xx(1,:)-4).^2-(9*xx(2,:)-7).^2) - 0.6;
end
References
1. Haldar, A., & Mahadevan, S. (2000). Probability, reliability, and statistical methods in
engineering design. New York: Wiley.
2. Hasofer, A. M., & Lind, N. C. (1974). Exact and invariant second-moment code format.
Journal of Engineering Mechanics Division ASCE, 100, 111–121.
3. Breitung, K. (1984). Asymptotic approximations for multinormal integrals. Journal of
Engineering Mechanics, ASCE, 110(3), 357–366.
4. Tvedt, L. (1984). Two second-order approximations to the failure probability. Section on
structural reliability. Hovik, Norway: A/S Veritas Research.
5. Rubinstein, R. Y. (1981). Simulation and the Monte Carlo method. New York: Wiley.
6. Fu, G., & Moses, F. (1988). Importance sampling in structural system reliability. In
Proceedings of ASCE Joint Specialty Conference on Probabilistic Methods, Blacksburg, VA
(pp. 340–343).
7. Au, S. K., & Beck, J. L. (1999). A new adaptive importance sampling scheme for reliability
calculations. Structural Safety, 21(2), 135–158.
8. Hurtado, J. E. (2007). Filtered importance sampling with support vector margin: A powerful
method for structural reliability analysis. Structural Safety, 29(1), 2–15.
9. Naess, A., Leira, B. J., & Batsevych, O. (2009). System reliability analysis by enhanced
Monte Carlo simulation. Structural Safety, 31(5), 349–355.
10. Rahman, S., & Xu, H. (2004). A univariate dimension-reduction method for
multi-dimensional integration in stochastic mechanics. Probabilistic Engineering
Mechanics, 19, 393–408.
11. Xu, H., & Rahman, S. (2004). A generalized dimension-reduction method for
multi-dimensional integration in stochastic mechanics. International Journal of Numerical
Methods in Engineering, 61, 1992–2019.
12. Youn, B. D., Zhimin, X., & Wang, P. (2007). Eigenvector Dimension Reduction
(EDR) method for sensitivity-free probability analysis. Structural and Multidisciplinary
Optimization, 37, 13–28.
13. Ghanem, R. G., & Spanos, P. D. (1991). Stochastic finite elements: A spectral approach. New
York: Springer.
14. Wiener, N. (1938). The homogeneous chaos. American Journal of Mathematics, 60(4), 897–
936.
15. Xiu, D., & Karniadakis, G. E. (2002). The Wiener-Askey polynomial chaos for stochastic
differential equations. SIAM Journal on Scientific Computing, 24(2), 619–644.
16. Smolyak, S. (1963). Quadrature and interpolation formulas for tensor product of certain
classes of functions. Soviet Mathematics—Doklady, 4, 240–243.
17. Gerstner, T., & Griebel, M. (2003). Dimension-adaptive tensor-product quadrature.
Computing, 71(1), 65–87.
18. Klimke, A. (2006). Uncertainty modeling using fuzzy arithmetic and sparse grids (Ph.D.
thesis). Universität Stuttgart, Shaker Verlag, Aachen.
19. Ganapathysubramanian, B., & Zabaras, N. (2007). Sparse grid collocation schemes for
stochastic natural convection problems. Journal of Computational Physics, 225(1), 652–685.
20. Law, A. M., & Kelton, W. D. (1982). Simulation modeling and analysis. New York:
McGraw-Hill.
21. Rabitz, H., Alis, O. F., Shorter, J., & Shim, K. (1999). Efficient input–output model
representations. Computer Physics Communications, 117(1–2), 11–20.
22. Xu, H., & Rahman, S. (2005). Decomposition methods for structural reliability analysis.
Probabilistic Engineering Mechanics, 20, 239–250.
23. Rabitz, H., & Alis, O. F. (1999). General foundations of high dimensional model
representations. Journal of Mathematical Chemistry, 25(2–3), 197–233.
24. Alis, O. F., & Rabitz, H. (2001). Efficient implementation of high dimensional model
representations. Journal of Mathematical Chemistry, 29(2), 127–142.
25. Li, G., Rosenthal, C., & Rabitz, H. (2001). High dimensional model representations. Journal
of Physical Chemistry A, 105, 7765–7777.
26. Li, G., Wang, S. W., & Rabitz, H. (2001). High dimensional model representations generated
from low dimensional data samples—I: Mp-Cut-HDMR. Journal of Mathematical Chemistry,
30(1), 1–30.
27. Sobol, I. M. (2003). Theorems and examples on high dimensional model representations.
Reliability Engineering and System Safety, 79(2), 187–193.
28. Griebel, M., & Holtz, M. (2010). Dimension-wise integration of high-dimensional functions
with applications to finance. Journal of Complexity, 26(5), 455–489.
29. Kuo, F. Y., Sloan, I. H., Wasilkowski, G. W., & Wozniakowski, H. (2010). On
decompositions of multivariate functions. Mathematics of Computation, 79, 953–966.
30. Youn, B. D., Choi, K. K., & Yi, K. (2005). Reliability-based robust design optimization using
the performance moment integration method and case study of engine gasket-sealing problem.
In Proceedings of SAE 2005 World Congress, Detroit, MI, United States.
31. Youn, B. D., Choi, K. K., & Yi, K. (2005). Performance Moment Integration (PMI) method
for quality assessment in reliability-based robust design optimization. Mechanics Based
Design of Structures and Machines, 33, 185–213.
32. Lee, S. H., Chen, W., & Kwak, B. M. (2009). Robust design with arbitrary distributions using
Gauss-type quadrature formula. Structural and Multidisciplinary Optimization, 39(3), 227–
243.
33. Cameron, R. H., & Martin, W. T. (1947). The orthogonal development of nonlinear
functionals in series of Fourier–Hermite functionals. Annals of Mathematics, 48, 385–392.
34. Le Maître, O. P., Knio, O. M., Najm, H. N., & Ghanem, R. G. (2001). A stochastic projection
method for fluid flow—I. Basic formulation. Journal of Computational Physics, 173, 481–
511.
35. Le Maître, O. P., Reagan, M., Najm, H. N., Ghanem, R. G., & Knio, O. M. (2002).
A stochastic projection method for fluid flow—II. Random process. Journal of Computational
Physics, 181, 9–44.
36. Field, R. V. (2002). Numerical methods to estimate the coefficients of the polynomial chaos
expansion. In Proceedings of the 15th ASCE Engineering Mechanics Conference.
37. Xiu, D. (2009). Fast numerical methods for stochastic computations: A review.
Communications in Computational Physics, 5, 242–272.
38. Gerstner, T., & Griebel, M. (1998). Numerical integration using sparse grids. Numerical
Algorithms, 18(3), 209–232.
39. Ma, X., & Zabaras, N. (2009). An adaptive hierarchical sparse grid collocation algorithm for
the solution of stochastic differential equations. Journal of Computational Physics, 228,
3084–3113.
40. Berveiller, M. (2005). Stochastic finite elements: Intrusive and non intrusive methods for
reliability analysis (Ph.D. thesis). Universite’ Blaise Pascal, Clermont-Ferrand.
41. Sudret, B. (2008). Global sensitivity analysis using polynomial chaos expansions. Reliability
Engineering & System Safety, 93, 964–979.
42. Paffrath, M., & Wever, U. (2007). Adapted polynomial chaos expansion for failure detection.
Journal of Computational Physics, 226, 263–281.
43. Creveling, C. M. (1997). Tolerance design: A handbook for developing optimal specification.
Cambridge, MA: Addison-Wesley.
44. Wu, C. C., Chen, Z., & Tang, G. R. (1998). Component tolerance design for minimum quality
loss and manufacturing cost. Computers in Industry, 35, 223–232.
45. Hu, C., & Youn, B. D. (2011). An asymmetric dimension-adaptive tensor-product method for
reliability analysis. Structural Safety, 33(3), 218–231.
46. Lee, S. H., & Chen, W. (2009). A comparative study of uncertainty propagation methods for
black-box-type problems. Structural and Multidisciplinary Optimization, 37(3), 239–253.
47. Kvasov, B. I. (2000). Methods of shape-preserving spline approximation. Singapore: World
Scientific Publications Co., Inc.
48. Abramowitz, M., & Stegun, I. A. (1972). Handbook of mathematical functions (9th ed.). New
York: Dover Publications, Inc.
Chapter 6
Time-Dependent Reliability Analysis
in Design
briefly discussed, and related references are provided for further study. Section 6.6
provides concluding thoughts. Exercise problems are provided in Sect. 6.7.
For static time-independent reliability analysis, the limit state function G(X) is
generally used, where the vector X represents random input variables with a joint
probability density function (PDF) fX(x). The probability of failure can be defined
based on the limit state function as
P_f = P\left( G(\mathbf{X}) > 0 \right) = \int_{G(\mathbf{x}) > 0} f_{\mathbf{X}}(\mathbf{x}) \, d\mathbf{x} \quad (6.1)
where S0 represents the marginal value. For example, for a roller clutch, the per-
formance function S(X, t) could be defined as the hoop stress generated in the clutch
operation, while the limit state function G(X, t) can be defined accordingly as the
difference between the threshold value of the hoop stress R and S(X, t). By setting
G(X, t) ≥ 0, we constrain the hoop stress generated in the clutch operation from
going beyond the threshold. If tl is the lifetime of interest, the probability of failure
within the lifetime [0, tl] can be described based on the stress strength reliability
model as
performance approaches use the extreme value of the performance function under
consideration, and a failure occurs if the extreme value over a designed time
interval is greater than a given threshold value. The first-passage approaches con-
sider the first time instant when the performance of interest exceeds or falls below a
threshold; this requires calculation of the “outcrossing rate” as the likelihood that
the performance exceeds or falls below the threshold. In recent literature, a com-
posite limit state (CLS) based method has also been developed to compute the
cumulative probability of failure based on the Monte Carlo simulation (MCS). This
approach constructs a global CLS to transform time-dependent reliability analysis
into a time-independent analysis by combining all instantaneous limit states of each
time interval in a series. In the rest of this chapter, these different methods will be
introduced.
In this section, the Monte Carlo simulation (MCS) method for time-dependent
reliability analysis is first introduced; an example is then used to demonstrate the
concept.
where t represents the time variable that varies within [0, 5], and X1 and X2 are
normally distributed random variables: X1 ∼ N(3.5, 0.3²) and X2 ∼ N(3.5, 0.3²).
The time-dependent probability of failure, P(G(X, t)>0), can then be calculated as
detailed below.
Following the procedure shown in Fig. 6.1, the following steps can be used to estimate the time-dependent reliability of this example using the MCS method.
Fig. 6.1 Procedure for time-dependent reliability analysis using the MCS method
Fig. 6.2 Generating random sample points for X1 and X2 from their PDFs
We first generate random sample points of X1 and X2 by drawing samples from their probability density functions (PDFs). Here, a total of 10,000 samples are generated for each of X1 and X2. Figure 6.2 shows the random sample points generated for X1 and X2.
The limit state function G(X, t), as shown in Eq. (6.4), is then evaluated at the
random sample points generated in Step 1. As G(X, t) is time-dependent, evaluation
of the limit state function at a given sample point x* will generally provide a
one-dimensional function G(x*, t).
Figure 6.3a shows the evaluation of G(X, t) over time at several sample points,
including the point x = [3.5, 3.5].
Considering the failure event as defined in Eq. (6.3), a sample point can be classified as a failure sample point if the limit state function G(X, t) goes beyond zero within the specified time interval. As shown in Fig. 6.3b, the limit state function has been evaluated at four different sample points over time. In the figure, the two red curves indicate two failure sample points, whereas the two blue curves correspond to two safe samples. Following the classification procedure shown in
Fig. 6.3 Evaluation of system performance functions over time (a), and determination of failure
sample points (b)
Fig. 6.1, all Monte Carlo samples can be accordingly classified into either failure
samples or safe samples, as seen in the classification results shown in Fig. 6.4.
Based on the classification of the sample points, 81.60% of the sample points are classified as safe within the time interval [0, 5]. Thus, based on the MCS method, the time-dependent reliability is estimated as 81.60%.
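To make the above procedure concrete, the following MATLAB sketch illustrates the sampling, evaluation, and classification steps of Fig. 6.1. Because Eq. (6.4) is not reproduced here, a hypothetical limit state function Ghyp is used in its place, so the resulting reliability value will not match the 81.60% reported above; only the distributions X1, X2 ∼ N(3.5, 0.3²) and the time interval [0, 5] follow the example.

% MCS sketch for time-dependent reliability (procedure of Fig. 6.1);
% Ghyp is a hypothetical stand-in for the example's limit state, Eq. (6.4).
rng(0);
N  = 10000;                                 % number of Monte Carlo samples
X1 = 3.5 + 0.3*randn(N,1);                  % X1 ~ N(3.5, 0.3^2)
X2 = 3.5 + 0.3*randn(N,1);                  % X2 ~ N(3.5, 0.3^2)
tGrid = linspace(0, 5, 101);                % discretized time grid on [0, 5]
Ghyp  = @(x1, x2, t) (x1 + x2)*t/10 - 4;    % hypothetical limit state (failure: G > 0)
fail  = false(N,1);
for k = 1:numel(tGrid)
    % a sample is a failure sample if G exceeds zero at any time instant
    fail = fail | (Ghyp(X1, X2, tGrid(k)) > 0);
end
R = 1 - mean(fail);                         % time-dependent reliability estimate
fprintf('Estimated time-dependent reliability R = %.4f\n', R);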
This section presents the details about one of the extreme value-based methods for
time-dependent reliability analysis, namely, the nested extreme response surface
(NERS) approach [10–13]. Section 6.3.1 introduces the concept of the nested time
prediction model (NTPM). Section 6.3.2 details how the NERS approach can be
used to efficiently construct the NTPM. A mathematical example follows in Sect. 6.3.3.
For a system of interest, the performance function S(X, t), as shown in Eq. (6.2), may
change over time due to time-dependent loading conditions and/or component dete-
rioration. Figure 6.5 shows the random realizations of two different types of system
responses, where the solid and dashed lines represent monotonically and
non-monotonically increasing performance, respectively. If S(x, t) increases or
decreases monotonically over time, the extreme response will generally occur at the
time interval boundary, where the probability of failure will also approach its maxi-
mum value. For this type of time-dependent reliability constraint, the reliability
analysis needs to be carried out only at the time interval boundary, and the optimum
design derived from the reliability-based design optimization (RBDO) can guarantee
the reliability requirements being satisfied over the entire time domain of interest.
However, the situation is more complicated when the system response S(X, t) is a
non-monotonic function, as shown in Fig. 6.5. In this case, it is critical that the
reliability analysis is carried out at the instantaneous time when the extreme response
of the performance function is obtained.
The time that leads to the extreme response of the system performance function
varies with different designs x; thus, a response surface of the time with respect to
system design variables can be determined as
T = f(\mathbf{X}) : \left\{ \max_{t} S(\mathbf{x}, t), \ \forall \mathbf{x} \in \mathbf{X} \right\} \quad (6.5)
into a time-independent one by estimating the extreme time responses for any given
system design.
Although the NTPM facilitates time-dependent reliability analysis, it can be very
challenging to efficiently develop a high-fidelity time-prediction model. First,
analytical forms of time-dependent limit states are usually not available in practical
design applications; consequently, NTPM must be developed based on limited
samples. Second, because the samples for developing NTPM require extreme time
responses over the design space, it is of vital importance that these responses be
efficiently extracted to make the design process computationally affordable. Third,
NTPM must be adaptive in performing two different roles: making predictions of
extreme time responses for reliability analysis, and including extra sample points to
improve the model itself at necessary regions during the iterative design process.
The following section presents the NERS methodology, which addresses the three
aforementioned challenges.
In this subsection, we introduce one of the extreme value-based methods that can be
used to tackle time-dependent reliability analysis and design problems, namely the
nested extreme response surface (NERS) approach. The key to this approach is to
effectively build the NTPM in the design space of interest, which can then be used
to predict the time when the system response will approach its extreme value [10–
13]. The NERS methodology is comprised of three major techniques within three
consecutive steps: (1) efficient global optimization (EGO) for extreme time
response identification, (2) construction of a kriging-based NTPM, and (3) adaptive
time response prediction and model maturation. The first step, EGO, is employed to
Fig. 6.7 The procedure of time-dependent reliability analysis using the NERS approach
efficiently extract a certain amount of extreme time response samples, which are
then used for the development of NTPM in the second step. Once the kriging-based
NTPM for extreme time responses is established, adaptive response prediction and
model maturation mechanisms are used to assure the prediction accuracy and
efficiency by autonomously enrolling new sample points when needed during the
analysis process. The NERS methodology is outlined in the flowchart shown in
Fig. 6.7, and the three aforementioned key techniques are explained in detail in the
remainder of this section.
A. Efficient Global Optimization for Extreme Time Response Identification
For reliability analysis with time-dependent system responses, it is critical to
efficiently compute the extreme responses of the limit state function and effectively
locate the corresponding time when the system response approaches its extreme
value. For a given system design, the response of the limit state function is
time-dependent and could be either a monotonic or non-monotonic
one-dimensional function with respect to time. Here, the Efficient Global
Optimization (EGO) technique [17] can be employed to efficiently locate the
extreme system response and the corresponding time when the extreme response is
approached, mainly because it is capable of searching for the global optimum when
dealing with a non-monotonic limit state, while at the same time assuring excellent
computational efficiency. In this subsection, we focus on introducing the applica-
tion of the EGO technique for extreme time response identification. Discussion of
the EGO technique itself is omitted, because more detailed information about this
technique can be obtained from references [18–20].
In order to find the global optimum time that leads to the extreme response of the
limit state function, the EGO technique generates a one-dimensional stochastic
process model based on existing sample responses over time. Stochastic process
models have been widely used for function approximation; more information on
these models can be found in the references [21, 22]. In this study, the response of
the limit state function over time for a particular design point is expressed by a
one-dimensional stochastic process model in EGO with a constant global mean as

F(t) = \mu + \epsilon(t) \quad (6.6)

where μ is the global model representing the function mean, and ε(t) is a stochastic process with zero mean and variance σε². The covariance between ε(t) at two different points ti and tj is defined as Cov(ε(ti), ε(tj)) = σε² Rc(ti, tj), in which the correlation function is given by
R_c(t_i, t_j) = \exp\left( -a \left| t_i - t_j \right|^{b} \right) \quad (6.7)
where F = (F(t1), F(t2), …, F(tk)) denotes the vector of sample responses of the limit state function, and Rc is the correlation matrix whose (i, j) entry is given by Eq. (6.7). After
developing the initial one-dimensional stochastic process model, EGO updates this
model iteratively by continuously searching for the most useful sample point that
ensures a maximum improvement in accuracy until the convergence criterion is
satisfied. To update the stochastic process model, the EGO employs the expected
improvement [23] metric to quantify the potential contribution of a new sample
point to the existing response surface; the sample point that gives the largest
expected improvement value is chosen at the next iteration. In what follows, the
expected improvement metric will be introduced briefly and the procedure of
employing the EGO technique for extreme time response identification will be
summarized. A mathematical example is employed to demonstrate the extreme time
response identification using the EGO technique.
Let us consider a continuous function F(t) over time t that represents a limit state
function over time for a given system design point in the design space. Here, we
employ an expected improvement metric to determine the global minimum of F(t).
Due to limited samples of F(t), the initial stochastic process model may introduce
large model uncertainties, and consequently, the function approximation, denoted
by f(t), could be substantially biased compared with the real function F(t). Due to
the uncertainty involved in this model, in EGO, the function approximation of f(t) at
time t is treated as a normal random variable whose mean is given by the approximated response Fr(t) and whose standard deviation is the standard error e(t)
determined from the stochastic process model. With these notations, the improvement at time t can be defined as

I(t) = \max\left( F_{\min} - f(t), \, 0 \right) \quad (6.8)
where Fmin indicates the approximated global minimum value at the current EGO
iteration. By taking the expectation of the right-hand side of Eq. (6.8), the expected improvement at any given time t can be presented as [17]

E[I(t)] = \left( F_{\min} - F_r(t) \right) \Phi\!\left( \frac{F_{\min} - F_r(t)}{e(t)} \right) + e(t) \, \phi\!\left( \frac{F_{\min} - F_r(t)}{e(t)} \right) \quad (6.10)
where Φ(·) and φ(·) are the cumulative distribution function and the probability density function of the standard Gaussian distribution, respectively. A larger
expected improvement at time t means a greater probability of achieving a better
global minimum approximation. Thus, a new sample should be evaluated at the
particular time ti, where the maximum expected improvement value is obtained to
update the stochastic process model. With the updated model, a new global mini-
mum approximation for F(t) can be obtained. The same process can be repeated
iteratively by evaluating a new sample at time ti, which provides the maximum
expected improvement value and updates the stochastic process model for the new
global minimum approximation until the maximum expected improvement is small enough, i.e., less than a critical value Ic (in this study, Ic = 1% × |Fmin|, that is, 1% of the absolute value of the current best global minimum approximation). The pro-
cedure for employing the EGO technique for extreme time response identification
can be briefly summarized in five steps, as shown in Table 6.1.
A mathematical example is employed here to demonstrate the accuracy and
efficacy of the EGO for extraction of the extreme responses of the limit state
Table 6.1 Procedure for EGO for extreme time response identification
Steps Procedure
Step 1: Identify a set of initial sample times ti and evaluate the responses of the limit state
function F(ti) for i = 1, 2, …, k.
Step 2: Develop a stochastic process model for F(t) with existing sample points, and
approximate the global minimum, Fmin, and the time tm.
Step 3: Determine time ti with maximum expected improvements, E[I(t)].
Step 4: Compare max{E[I(t)]} with Ic: If max{E[I(t)]} ≤ Ic, STOP and report tm and Fmin;
Else, go to Step 5.
Step 5: Evaluate the response at ti, and repeat Steps 2–4.
function. Assume that a time-dependent limit state function for a particular point of
the design space is provided by
The objective here is to identify the extreme response of F(t) (the global minimum)
and pinpoint the corresponding time t within the time interval [1, 5]. Figure 6.8
shows the limit state function with respect to time, in which the global minimum
occurs at time t = 4.5021 with F(4.5021) = −3.5152; whereas, the local minimum is
located at t = 1.9199 with F(1.9199) = 0.8472.
Following the procedures outlined in Table 6.1, the performance function is first
evaluated at the initial samples t1, …, t5 = [1, 2, 3, 4, 5], and the obtained limit state function values are F(t1), …, F(t5) = [10.992, 0.924, 7.776, 1.607, 9.600]. With these
initial sample points, a one-dimensional stochastic process model can be built to
approximate the limit state function F(t), and the global optimum can be approx-
imated as F(2) = 0.924. As indicated by Step 3 in Table 6.1, the expected
improvement is calculated based on Eq. (6.10) throughout the time interval [1, 5],
and the maximum expected improvement can be obtained as max{E[I(ti)]} =
5.2655, where ti = 2.4572. Because max{E[I(ti)]} > Ic, where Ic = 0.00001 here, the
limit state function will be evaluated at the new sample point ti = 2.4572, which
results in F(2.4572) = 3.6114. Figure 6.9a shows the above-discussed first EGO
iteration for the extreme response identification, in which the top figure represents
the current realization of the stochastic process model for F(t), and the bottom one
indicates the curve of the expected improvements over t. In the next iteration, by
adding a new sample point to the existing ones at t1, …, t5, a new stochastic process
model with better accuracy can be built for the approximation of F(t) over time and
the identification of the extreme response of the limit state function (the global
minimum).
Fig. 6.9 a First EGO iteration for extreme response identification; b The eighth EGO iteration for
extreme response identification
The procedures shown in Table 6.1 can be repeated until the convergence cri-
terion is satisfied and the extreme response of the limit state function is identified
with a desired accuracy level. Figure 6.9b shows the eighth EGO iteration for the
extreme response identification. After a total of eight iterations, the expected
improvement of including a new sample point for the response surface is clearly
small enough and the convergence criterion is satisfied. Table 6.2 details all the
EGO iterations for the extreme response identification of this mathematical
example. As shown in this table, the accuracy of the estimated global minimum is
improved after involving a new sample during the EGO process; thus, the minimum
Fmin= −3.5152 in iteration 8 is more accurate than the result Fmin= −3.4991 in
iteration 7.
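A compact sketch of one run of the Table 6.1 procedure is given below. MATLAB's fitrgp (Statistics and Machine Learning Toolbox) is used here as a stand-in for the one-dimensional stochastic process model, and F(t) is a hypothetical multimodal limit state history, not the function of this example; the expected improvement follows Eq. (6.10) and the 1% × |Fmin| stopping rule.

% EGO-style extreme time response identification (Table 6.1), using fitrgp as
% a stand-in stochastic process model and a hypothetical F(t) on [1, 5].
rng(3);
F  = @(t) sin(2*t).*(t - 3).^2 - 2;           % hypothetical limit state history
ts = (1:5)';  Fs = F(ts);                     % Step 1: initial sample times/responses
for iter = 1:20
    mdl   = fitrgp(ts, Fs);                   % Step 2: 1-D stochastic process model
    tGrid = linspace(1, 5, 500)';
    [Fr, e] = predict(mdl, tGrid);            % approximated response and std. error
    Fmin  = min(Fs);
    u  = (Fmin - Fr)./max(e, eps);
    EI = (Fmin - Fr).*normcdf(u) + e.*normpdf(u);   % expected improvement, Eq. (6.10)
    [EImax, idx] = max(EI);                   % Step 3: time of maximum EI
    if EImax <= 0.01*abs(Fmin), break; end    % Step 4: stop when max EI <= Ic
    ts = [ts; tGrid(idx)];  Fs = [Fs; F(tGrid(idx))];   % Step 5: evaluate and repeat
end
[Fbest, ib] = min(Fs);
fprintf('Estimated global minimum F = %.4f at t = %.4f after %d iterations\n', Fbest, ts(ib), iter);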
B. Nested Time Prediction Model
This section presents the procedure for developing a nested time prediction model
using kriging. After repeating the EGO process outlined in subsection A for different
system sample points in the design space, a set of data can be obtained, including
initial sample points in the design space X, and the corresponding times T when the
system responses approach their extreme values at these sample points. To build the
NTPM using the NERS approach, design points are randomly generated in the
design space based on the random properties of the design variables. To balance the
accuracy and efficiency of the NTPM, initially 10 × (n − 1) samples are suggested to
build the kriging-based NTPM for n-dimensional problems (n > 1). The accuracy
and efficiency of NTPM is controlled by an adaptive response prediction and model
maturation (ARPMM) mechanism, which will be detailed in the next subsection.
The objective here is to develop a prediction model to estimate the time that leads to
the extreme performances of the limit state function for any given system design in
the design space. For this purpose, the kriging technique is employed, and a kriging
model is constructed based on the sample dataset obtained during the EGO process.
It is noteworthy that different response surface approaches, such as the simple linear
regression model or artificial neural networks [24–26], could be applied here for
development of the NTPM. In this study, kriging is used because it performs well for
modeling nonlinear relationships between the extreme time responses with respect to
system design variables.
Kriging is considered to be powerful and flexible for developing surrogate
models among many widely used metamodeling techniques [20, 27]. One of the
distinctive advantages of kriging is that it can provide not only prediction of
extreme time responses at any design point but it can also define the uncertainties,
such as the mean square errors associated with the prediction. Considering a limit
state function with nd random input variables, a kriging time prediction model can
be developed with n sample points denoted by (xi, ti), in which x_i = (x_i^1, …, x_i^{nd}), i = 1, …, n, are the sample inputs, and ti is the time when the limit state function approaches
the extreme value for a given xi. In the kriging model, time responses are assumed
to be generated from the model

t(\mathbf{x}) = H(\mathbf{x}) + Z(\mathbf{x})
where H(x), as a polynomial function of x, is the global model that represents the
function mean, and Z(x) is a Gaussian stochastic process with zero mean and
variance σ². As indicated by the studies in references [20, 28], a constant global
mean for H(x) is usually sufficient in most engineering problems; it is also much
less expensive and computationally more convenient. Thus, we use a constant
global mean μ for the polynomial term H(x), and accordingly, the kriging time prediction model can be expressed as

t(\mathbf{x}) = \mu + Z(\mathbf{x})
where Rc(xi, xj) represents an n × n symmetric correlation matrix, and the (i,
j) entry of this correlation matrix is a function of the spatial distance between two
sample points xi and xj, which is expressed as
R_c(\mathbf{x}_i, \mathbf{x}_j) = \exp\left( -\sum_{p=1}^{nd} a_p \left| x_i^p - x_j^p \right|^{b_p} \right) \quad (6.15)
where xi and xj denote two sample points, |·| is the absolute value operator, and ap
and bp are hyperparameters of the kriging model that need to be determined. In this
equation, ap is a positive weight factor related to each design variable, and bp is a
non-negative power factor with a value usually within the range [0, 2] [17, 29].
Note that other than the most commonly used Gaussian functions, other options
are available to define the correlation matrix Rc(xi, xj) and derive the covariance
function S(x) [16, 20, 30–32]. With the n sample points (xi, ti), i = 1, …, n, for the kriging time prediction model, the likelihood function of the model hyperparameters can be given as
\text{Likelihood} = -\frac{1}{2}\left[ n \ln(2\pi) + n \ln \sigma^2 + \ln\left|\mathbf{R}_c\right| \right] - \frac{1}{2\sigma^2} (\mathbf{t} - \mathbf{A}\mu)^{T} \mathbf{R}_c^{-1} (\mathbf{t} - \mathbf{A}\mu) \quad (6.16)
In this equation, we can solve for the values of μ and σ² by maximizing the likelihood function in closed form as

\mu = \left( \mathbf{A}^{T} \mathbf{R}_c^{-1} \mathbf{A} \right)^{-1} \mathbf{A}^{T} \mathbf{R}_c^{-1} \mathbf{t} \quad (6.17)

\sigma^2 = \frac{ (\mathbf{t} - \mathbf{A}\mu)^{T} \mathbf{R}_c^{-1} (\mathbf{t} - \mathbf{A}\mu) }{ n } \quad (6.18)
where A is a matrix of basis functions for the global model. In this study, A is an
n × 1 vector of ones, since only the constant global mean μ is considered for the polynomial term H(x). Substituting Eqs. (6.17) and (6.18) into Eq. (6.16), the likelihood function is transformed into a concentrated likelihood function, which
depends only upon the hyperparameters ap and bp for any p within [1, nd]. Then, ap
and bp can be obtained by maximizing the concentrated likelihood function, and
thereafter the correlation matrix Rc can be computed. With the kriging time pre-
diction model, the extreme time response for any given new point x′ can be esti-
mated as

\hat{t}(\mathbf{x}') = \mu + \mathbf{r}(\mathbf{x}')^{T} \mathbf{R}_c^{-1} \left( \mathbf{t} - \mathbf{A}\mu \right) \quad (6.19)
where r(x′) is the correlation vector between x′ and the sampled points x1, …, xn, in
which the ith element of r is given by ri(x′) = Rc(x′, xi).
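The following self-contained MATLAB sketch walks through Eqs. (6.15)–(6.20) on synthetic data. The hyperparameters are simply fixed at ap = 1 and bp = 2 rather than obtained by maximizing the concentrated likelihood, and the extreme-time responses are generated by an arbitrary smooth function, so the listing illustrates the algebra of the kriging NTPM rather than the full NERS implementation.

% Minimal kriging (NTPM-style) calculation following Eqs. (6.15)-(6.20);
% hyperparameters a_p, b_p are fixed (assumed) and the data are synthetic.
rng(2);
n = 15;  nd = 2;
X = 2 + 2*rand(n, nd);                          % sample design points x_i
t = 2 + sin(X(:,1)) + 0.3*X(:,2);               % hypothetical extreme-time responses
a = ones(1, nd);  b = 2*ones(1, nd);            % assumed hyperparameters a_p, b_p
corrfun = @(xi, xj) exp(-sum(a.*abs(xi - xj).^b));   % correlation function, Eq. (6.15)
Rc = zeros(n);
for i = 1:n
    for j = 1:n
        Rc(i,j) = corrfun(X(i,:), X(j,:));
    end
end
Rc = Rc + 1e-10*eye(n);                         % small nugget for numerical stability
A  = ones(n, 1);                                % constant-mean basis matrix
Ri = Rc \ eye(n);                               % Rc^{-1}
mu = (A'*Ri*A) \ (A'*Ri*t);                     % Eq. (6.17)
s2 = (t - A*mu)'*Ri*(t - A*mu)/n;               % Eq. (6.18)
xnew = [3.1 2.7];                               % new design point x'
r = zeros(n, 1);
for i = 1:n
    r(i) = corrfun(xnew, X(i,:));               % correlation vector r(x')
end
tHat = mu + r'*Ri*(t - A*mu);                   % predicted extreme time, Eq. (6.19)
eMSE = s2*(1 - r'*Ri*r + (1 - A'*Ri*r)^2/(A'*Ri*A));  % mean square error, Eq. (6.20)
fprintf('Predicted extreme time t_hat = %.4f with MSE = %.3e\n', tHat, eMSE);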
C. Adaptive Response Prediction and Model Maturation
Model prediction accuracy is of vital importance when employing the nested
time prediction model for design. Thus, during the design process, a mechanism is
needed for model maturation that will automatically enroll new sample points to
improve the accuracy of the nested time prediction model when the accuracy
condition is not satisfied. This section presents an adaptive response prediction and
model maturation (ARPMM) mechanism based on the mean square error e(x) of the
current best prediction. Figure 6.10 shows the flowchart of the developed
mechanism.
Before predicting the time response of a new design point x using the latest
update of the NTPM, the ARPMM mechanism can be employed by first calculating
the current mean square error e(x) of the best prediction as
" #
ð1 AT R1 rÞ2
eðxÞ ¼ r2 1 r T R1 r þ ð6:20Þ
AT R1 A
Fig. 6.10 Flowchart of the adaptive response prediction and model maturation (ARPMM) mechanism
The relative error of the prediction is then defined as

\xi(\mathbf{x}) = \frac{ e(\mathbf{x}) }{ \mu } \quad (6.21)
Prediction of a time response t′ using the NTPM for a new design point x is accepted only if the relative error ξ(x) is less than a user-defined threshold ξt. To balance a smooth design process and a desired prediction accuracy, we suggest a value of ξt within the range [10⁻³, 10⁻²]. Once the prediction at this particular design point x is accepted, the time response t′ of the extreme performance is estimated using Eq. (6.19) and returned to the time-dependent reliability analysis process. If the
relative error is larger than the threshold, then x will be enrolled as a new sample
input, and the EGO process, as discussed in subsection A, will be employed to
extract the true time response when the limit state function approaches its extreme
performance for x. With the new design point x and the true time response for x, the
NTPM will be updated, as discussed in subsection B. The procedure for the ARPMM
mechanism of NTPM is provided in Table 6.3. Through the developed ARPMM
mechanism, the NTPM can be updated adaptively during the time-dependent reli-
ability analysis process to guarantee accuracy and simultaneously maintain effi-
ciency. Note that the ARPMM mechanism automates the improvement of the kriging model during the design process; in rare cases, stability issues induced by singular (or near-singular) correlation matrices may occur when several closely located design points are used to seed the kriging model. Thus, we suggest that an extra step be included in the ARPMM process to check for singularity after new sample points are added to improve the prediction accuracy of the kriging model.
Table 6.3 The procedure of developing the nested time prediction model using ARPMM
Steps Procedure
Step 1: Identify an initial set of design points, X, and extract time responses T when the
limit state function approaches its extreme values at X, correspondingly.
Step 2: Develop the nested time prediction model (NTPM) using existing data set D = [X, T].
Step 3: For a new design point x, calculate the prediction error, ξ(x), using the latest NTPM.
Step 4: Compare the prediction error with the error threshold, ξt:
• If ξ(x) < ξt, estimate t′(x) at the new point x and return to reliability analysis;
• If ξ(x) ≥ ξt, determine the extreme time response t′ of x using EGO, add (x, t′) to D,
then go to Step 2.
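A minimal sketch of the accept-or-enroll logic of Table 6.3 is shown below. It uses MATLAB's fitrgp as a stand-in kriging NTPM and a hypothetical function trueExtremeTime in place of the EGO extreme-time search; the threshold ξt and all other settings are illustrative only.

% ARPMM-style decision loop (Table 6.3): accept the NTPM prediction when the
% relative error is small, otherwise enroll the point and rebuild the model.
% fitrgp stands in for the kriging NTPM; trueExtremeTime is hypothetical.
rng(1);
trueExtremeTime = @(x) 2.5 + 0.8*sin(x(:,1)) + 0.5*x(:,2).^2;   % stand-in for EGO
X = 3.5 + 0.3*randn(10, 2);            % Step 1: initial design points
T = trueExtremeTime(X);                %         and their extreme-time responses
xiT = 1e-2;                            % relative-error threshold (Sect. 6.3.2C)
for trial = 1:20
    mdl  = fitrgp(X, T, 'BasisFunction', 'constant');   % Step 2: build/update NTPM
    xNew = 3.5 + 0.3*randn(1, 2);                        % new design point
    [tHat, tSd] = predict(mdl, xNew);                    % prediction and std. error
    xi = tSd^2/mean(T);                % Step 3: xi = e(x)/mu (Eq. 6.21), with mu
                                       % approximated by the sample mean of T here
    if xi < xiT                        % Step 4: accept the prediction
        fprintf('accept: t'' = %.4f (xi = %.2e)\n', tHat, xi);
    else                               % otherwise enroll the point and rebuild
        X = [X; xNew];  T = [T; trueExtremeTime(xNew)];
        fprintf('enroll new sample (xi = %.2e); NTPM updated\n', xi);
    end
end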
In this section, the same mathematical example as used in Sect. 6.2.2 is employed
to demonstrate time-dependent reliability analysis using the NERS approach. In the
example, the time-dependent limit state function G(X, t) is given by
where t represents the time variable that varies within [0, 5], and X1 and X2 are normally distributed random variables: X1 ∼ Normal(5.8, 0.5²) and X2 ∼ Normal(2.2, 0.5²). Figure 6.11 shows the failure surface of the instantaneous time-dependent limit states as it changes within the time interval [0, 6], in which the limit state functions at
different time nodes equal zero. As a benchmark solution, we employed Monte Carlo
simulation to calculate the reliability, with an obtained reliability of 0.9626. In the
MCS, first, 100,000 samples were generated for each input variable, and then, the
time variable was discretized evenly into 100 time nodes within the interval [0, 6].
The limit state function was evaluated for all sample points at each time node, and the
reliability was then estimated by counting the number of safe samples accordingly.
A sample point was considered a safe sample if the minimum of the limit state
function values over the 100 time nodes was larger than zero. In what follows, the
reliability analysis for this case study is carried out using the NERS approach.
The NERS approach, which converts the time-dependent reliability analysis to a
time-independent analysis, begins with the development of the NTPM. As shown in
Table 6.4, an initial set of eight design points can be sampled randomly in the
design space, based on the randomness of the design variables X1 and X2. With the
initial set of design points, the EGO technique can be employed for identification of
the extreme responses (maximum) of the limit state and relative times when the
limit state approaches its extreme values. The results are also shown in the last two
columns of Table 6.4. For example, for the design point [3.4134, 3.2655], the limit
state function approaches its extreme response −11.0843 when time t = 1.3574, as
shown in the first row of the table. Using these eight initial sample points, denoted
by (xi, ti) for i = 1, 2, …, 8, by following the procedure discussed in Sect. 6.3.2B, a
kriging-based NTPM can be developed. The unknown parameters of the kriging
model are estimated, and the NTPM is obtained by maximizing the concentrated
likelihood function, as shown in Eq. (6.16). Figure 6.12 shows the developed
NTPM for this case study.
During the reliability analysis process, the ARPMM mechanism is used to
update the NTPM as necessary and predict the time responses for new design
points. For the ARPMM mechanism, the mean square error e(x) is employed as the
decision-making metric. When the NTPM is used to predict the time when the limit
state function approaches its extreme value for the new design point x, if the relative
error ξ(x), based on e(x), is greater than ξt, the ARPMM mechanism will trigger the
EGO process for identification of the true time response t′ for x. After the EGO
process, the new design point x and the corresponding time response t′ will be
included as a new sample point to D, and the initial NTPM will be updated with this
new point.
After developing the NTPM, the time-dependent reliability analysis can be
converted to a time-independent one. Commonly used reliability analysis tools,
such as the FORM, can be employed for this purpose. By applying the FORM, the
time-independent limit state function is linearized at the most probable point
(MPP), and the reliability index can be calculated in the standard normal space as
the distance from the MPP to the origin, while the HL-RF algorithm [33] can be
used for the MPP search in FORM. As the FORM linearizes the limit state function
for the reliability calculation, error will be introduced due to this linearization. Thus,
MCS is also employed with NTPM to study the reliability analysis errors intro-
duced by FORM and by NTPM. Table 6.5 shows the results of reliability analysis
with the NERS approach.
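For reference, the HL-RF iteration [33] mentioned above can be sketched in a few lines. The limit state g(u) below is a hypothetical, already time-independent function in standard normal space (for NERS it would be the limit state evaluated at the NTPM-predicted extreme time), and its gradient is supplied analytically.

% HL-RF iteration for the MPP search in FORM, on a hypothetical limit state
% g(u) in standard normal space; the search starts at the origin (mean point).
g     = @(u) 4 - u(1) - u(2) + 0.1*u(1)^2;     % hypothetical limit state
gradg = @(u) [-1 + 0.2*u(1); -1];              % its gradient
u = [0; 0];
for iter = 1:100
    gv = g(u);  gg = gradg(u);
    uNew = ((gg'*u - gv)/(gg'*gg))*gg;         % HL-RF update (projection step)
    if norm(uNew - u) < 1e-8, u = uNew; break; end
    u = uNew;
end
beta = norm(u);                                % reliability index: distance to origin
Pf   = 0.5*erfc(beta/sqrt(2));                 % Phi(-beta)
fprintf('MPP = (%.4f, %.4f), beta = %.4f, FORM Pf = %.4e\n', u(1), u(2), beta, Pf);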
The composite limit state (CLS) approach [34] discretizes the continuous time
variable into a finite number of time intervals and consequently time is treated as a
constant value within each time interval. The time-dependent reliability analysis can
then be converted into reliability analysis of a serially connected system where the
limit state function defined for each time interval is treated as a system component.
Let time tl denote the designed life of interest, which is discretized into N time
intervals using a fixed time step Δt. Let En = {x | G(x, tn) ≤ 0} denote the instantaneous failure event at the discrete time tn, and let ∪En, for n = 1, 2, …, N, be the composite limit state defined as the union of all instantaneous failure events
defined for each discretized time interval. The cumulative probability of failure can
then be approximated by
P_f(0, t_l) = P\left( \bigcup_{n=0}^{N} \left\{ G(\mathbf{X}, t_n) < 0 \right\}, \ t_n \in [0, t_l] \right) \quad (6.23)
where failure occurs if G(X, tn) < 0 for any n = 1, 2, …, N. Although the CLS
approach converts the time-dependent reliability analysis to a time-independent
one, determining the CLS itself is difficult and computationally expensive because
it requires the comparison of all instantaneous failure events for each design point.
The discretization of time will greatly simplify the time-dependent reliability
analysis because it converts the time-dependent analysis problem into several
time-independent reliability problems. However, in addition to the several-fold increase in computational cost, this method can also introduce error when the time-dependent limit state is highly nonlinear within the different time intervals.
In this section, the same mathematical example as used in Sect. 6.2.2 is employed
to demonstrate time-dependent reliability analysis using the CLS approach. By
using the CLS method, the time within [0, 5] is discretized into 5 intervals, and
within each time interval, the performance function is considered time-independent,
as shown in Fig. 6.13.
After discretizing the time variable into time intervals, the time-dependent per-
formance function can be accordingly converted into time-independent ones in
different time intervals. With the time-independent performance function in each
time interval, reliability analysis can be carried out using existing reliability analysis
methods, such as MCS or FORM.
In this example, MCS is used to compute the reliability in each time interval, which is then used to obtain the time-dependent reliability estimate. Table 6.6 lists the reliability values estimated for these time intervals. The time-dependent reliability can accordingly be approximated from the reliabilities of the individual time intervals as:
R = \prod_{i=1}^{N} R_i = 0.7413
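The calculation above can be sketched as follows; the limit state is again a hypothetical stand-in for the example's G(X, t), time is treated as constant within each of the five intervals (the right endpoint is used here as the representative time), and the interval reliabilities are combined as a product, mirroring the treatment in this example.

% CLS-style estimate: interval reliabilities by MCS with time held constant in
% each interval, combined as a product; Ghyp is a hypothetical limit state.
rng(4);
N  = 100000;
X1 = 3.5 + 0.3*randn(N,1);  X2 = 3.5 + 0.3*randn(N,1);
Ghyp = @(x1, x2, t) (x1 + x2)*t/10 - 4;     % hypothetical (failure: G > 0)
tEdges = 0:1:5;                             % five time intervals on [0, 5]
R = 1;
for i = 1:numel(tEdges)-1
    tRep = tEdges(i+1);                     % representative time for interval i
    Ri = mean(Ghyp(X1, X2, tRep) <= 0);     % interval reliability by MCS
    R  = R*Ri;                              % product of interval reliabilities
end
fprintf('CLS-style time-dependent reliability estimate R = %.4f\n', R);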
Outcrossing rate based methods relate the probability of failure to the mean number
of outcrossings of the time-dependent limit state function through the limit state
surface. Outcrossing rate based methods generally perform time-dependent reliability analysis either through direct estimation, when the outcrossings may be considered independent and Poisson distributed, or by estimating an upper bound in more general cases.
In the following section, the outcrossing rate is introduced and time-dependent
reliability analysis that employs the outcrossing rate is presented. A case study is
used to demonstrate the method.
The approach based on the outcrossing rate calculates the probability of failure
based on the expected mean number of outcrossings of the random process through
the defined failure surface [35]. The instantaneous outcrossing rate at time s is then
defined as
Let N(0, tl) denote the number of outcrossings of zero value from the safe
domain to the failure domain within [0, tl]. Basic probability theory shows that
N(0, tl) follows a binomial distribution. When the probability of outcrossing is very small, the probability of an outcrossing per unit time is approximately equal to the mean number of outcrossings per unit time (the outcrossing rate). Because the binomial distribution converges to the Poisson distribution when
the time period is sufficiently long or the dependence between crossings is negli-
gible, the outcrossings are assumed to be statistically independent [36]. With this
assumption, the outcrossing rate becomes the first-time crossing rate, or the failure
rate. Then, the probability of failure can be estimated from the upcrossing rate. The
probability of failure defined in Eq. (6.3) also reads as
Equation (6.25) can be interpreted as follows: failure within the time interval [0, tl] corresponds either to failure at the initial instant t = 0 or to a later outcrossing of the limit state surface, given that the system is in the safe domain at t = 0. It is reported that the following upper bound on Pf(0, tl) is available [35]:

P_f(0, t_l) \le P_f(0, 0) + E\left[ N(0, t_l) \right] \quad (6.26)

where the expected number of outcrossings within [0, tl] is

E\left[ N(0, t_l) \right] = \int_{0}^{t_l} \nu^{+}(t) \, dt \quad (6.27)
approximating the limit state surface by the hyperplane α(s + Δs)·u + β(s + Δs) = 0 in the standard normal space.
• Denote the correlation between the two events {G(X, s) > 0} and {G(X, s + Δs) ≤ 0} by
In this section, the same mathematical example as used in Sects. 6.2.2 and 6.3.3 is
employed to demonstrate time-dependent reliability analysis using the PHI2
method. In the example, a time-dependent limit state function G(X, t) is given by
where t represents the time variable that varies within [0, 5]. The random variables
X1 and X2 are normally distributed random variables: X1 ∼ Normal(3.5, 0.32²) and X2 ∼ Normal(3.5, 0.32²). While computing the time-dependent reliability of this example problem using the PHI2 method, six different time increments have been
used. The results, as compared with the MCS results in Sect. 6.2.2, are summarized
in Table 6.7.
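As a numerical illustration of the outcrossing-rate idea (not of the PHI2 formulas themselves), the sketch below estimates the instantaneous outcrossing rate by sampling, integrates it over [0, tl] as in Eq. (6.27), and forms the upper bound of Eq. (6.26); the limit state is again a hypothetical stand-in with failure defined as G > 0.

% Sampling-based estimate of the outcrossing rate and the upper bound of
% Eq. (6.26); Ghyp is a hypothetical limit state, not the PHI2 formulation.
rng(5);
N  = 100000;  dt = 0.05;  tl = 5;
X1 = 3.5 + 0.3*randn(N,1);  X2 = 3.5 + 0.3*randn(N,1);
Ghyp  = @(x1, x2, t) (x1 + x2)*t/10 - 4;
tGrid = 0:dt:(tl - dt);
nu = zeros(size(tGrid));
for k = 1:numel(tGrid)
    safeNow  = Ghyp(X1, X2, tGrid(k))      <= 0;   % in the safe domain at t
    failNext = Ghyp(X1, X2, tGrid(k) + dt) >  0;   % in the failure domain at t + dt
    nu(k) = mean(safeNow & failNext)/dt;           % outcrossing-rate estimate
end
Pf0  = mean(Ghyp(X1, X2, 0) > 0);           % failure probability at t = 0
EN   = sum(nu)*dt;                          % E[N(0, tl)], Eq. (6.27)
PfUB = Pf0 + EN;                            % upper bound of Eq. (6.26)
fprintf('Pf(0) = %.4f, E[N(0,tl)] = %.4f, upper bound on Pf = %.4f\n', Pf0, EN, PfUB);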
As one of the outcrossing rate based methods, the PHI2 method enables the use of
classical time-independent reliability methods, such as FORM, for solving the
time-dependent problem. The disadvantages of PHI2 were that only a bound of the
system reliability could be obtained, and error could rise because of the nonlinearity
of limit states due to the use of FORM. Sudret [40] developed a new formula based
on the original PHI2 method to stabilize the step-size effect in calculating the
time-dependent reliability, denoted as the PHI2+ method. Zhang and Du [41] and
Du [42] proposed a mean value first-passage method to perform time-dependent
reliability analysis for the function generator mechanism. In this approach, the
analytical equations were derived for the up-crossing and down-crossing rates first,
and a numerical procedure was then proposed to integrate these two rates in order to
compute the time-dependent reliability. The assumption of this approach was that
the motion error of the mechanism was a non-stationary Gaussian process. Son and
Savage [43] tracked the time-variant limit state in standard normal space to compute
the system reliability. For this approach, time was divided into several intervals, and
the incremental failure probabilities were calculated for all of them. The probability
of failure estimate was then obtained through the summation of all incremental
failure probabilities. This approach simplified time-dependent reliability analysis by
discretizing time into several time intervals and further assuming independency
between these intervals.
6.6 Conclusion
the nested extreme response surface (NERS) approach. The principles of composite
limit state based methods and outcrossing rate based methods have been briefly
discussed, with examples to demonstrate the analysis processes.
6.7 Exercise Problems
where t represents the time variable that varies within [0, 5], and X1 and
X2 are normally distributed random variables: X1 ∼ Normal(3.5, 0.3²) and X2 ∼ Normal(3.5, 0.3²). Calculate the time-dependent probability of failure, P(G(X, t) > 0). Following the procedure shown in Fig. 6.1, the following
steps can be used to estimate the time-dependent reliability of this example.
a. Compute the time-dependent reliability using the MCS method.
b. Compute the time-dependent reliability using the NERS method.
c. Compute the time-dependent reliability using the CLS method with 10 time
intervals evenly distributed.
d. Compute the time-dependent reliability using the outcrossing rate method.
2. Consider a time-dependent limit state function G(X, t), given by
G(\mathbf{X}, t) = 1 - \frac{ \left( X_1 + X_2 t - 10 \right)^2 }{30} - \frac{ \left( 0.2 t^2 + X_1 t - X_2 - 16 \right)^2 }{120}
where t represents the time variable that varies within [0, 5], and X1 and X2 are
normally distributed random variables: X1 ∼ Normal(3.0, 0.5²) and X2 ∼ Normal(3.0, 0.5²). Calculate the time-dependent probability of failure, P(G(X, t) > 0).
a. Compute the time-dependent reliability using the MCS method.
b. Compute the time-dependent reliability using the NERS method.
c. Compute the time-dependent reliability using the CLS method with 10 time
intervals evenly distributed.
d. Compute the time-dependent reliability using the outcrossing rate method.
3. For the two-slider crank mechanism shown in Fig. 6.14, the time-dependent
limit state function can be given by
where

\Delta s_{\text{actual}} = R_1 \cos(\theta - \theta_0) + \sqrt{R_2^2 - R_1^2 \sin^2(\theta - \theta_0)} - R_3 \cos(\theta_1 + \theta_0 - \theta - \delta_0) - \sqrt{R_4^2 - R_3^2 \sin^2(\theta_1 + \theta_0 - \theta - \delta_0)}

\Delta s_{\text{desired}} = 108 \cos(\theta - \theta_0) + \sqrt{211^2 - 108^2 \sin^2(\theta - \theta_0)} - 100 \cos(\theta_1 + \theta_0 - \theta - \delta_0) - \sqrt{213^2 - 100^2 \sin^2(\theta_1 + \theta_0 - \theta - \delta_0)}
The random variables and parameters in the performance function are given in
the table below.
c. Compute the time-dependent reliability using the CLS method with 10 time
intervals evenly distributed.
d. Compute the time-dependent reliability using the outcrossing rate method.
References
1. Currin, C., Mitchell, T., Morris, M., & Ylvisaker, D. (1991). Bayesian prediction of
deterministic functions, with applications to the design and analysis of computer experiments.
Journal of the American Statistical Association, 86(416), 953–963.
2. Li, J., Chen, J., & Fan, W. (2007). The equivalent extreme-value event and evaluation of the
structural system reliability. Structural Safety, 29(2), 112–131.
3. Chen, J. B., & Li, J. (2007). The extreme value distribution and dynamic reliability analysis of
nonlinear structures with uncertain parameters. Structural Safety, 29(2), 77–93.
4. Li, J., & Mourelatos, Z. P. (2009). Time-dependent reliability estimation for dynamic
problems using a Niching genetic algorithm. Journal of Mechanical Design, 131(7), 071009.
5. Lutes, L. D., & Sarkani, S. (2009). Reliability analysis of systems subject to first-passage
failure (NASA Technical Report No. NASA/CR-2009-215782).
6. Kuschel, N., & Rackwitz, R. (2000). Optimal design under time-variant reliability constraints.
Structural Safety, 22(2), 113–127.
7. Li, C., & Der Kiureghian, A. (1995). Mean out-crossing rate of nonlinear response to
stochastic input. In Proceedings of ICASP-7, Balkema, Rotterdam (pp. 295–302).
8. Schrupp, K., & Rackwitz, R. (1988). Out-crossing rates of marked Poisson cluster processes
in structural reliability. Applied Mathematical Modelling, 12(5), 482–490.
9. Breitung, K. (1994). Asymptotic approximations for the crossing rates of Poisson square
waves (pp. 75–75). NIST Special Publication SP.
10. Wang, Z., & Wang, P. (2013). A new approach for reliability analysis with time-variant
performance characteristics. Reliability Engineering and System Safety, 115, 70–81.
11. Wang, Z., & Wang, P. (2012). A nested response surface approach for time-dependent
reliability-based design optimization. Journal of Mechanical Design, 134(12), 121007(14).
12. Wang, P., Wang, Z., & Almaktoom, A. T. (2014). Dynamic reliability-based robust design
optimization with time-variant probabilistic constraints. Engineering Optimization, 46(6),
784–809.
13. Hu, Z., & Mahadevan, S. (2016). A single-loop kriging surrogate modeling for
time-dependent reliability analysis. Journal of Mechanical Design, 138(6), 061406(10).
14. Xu, H., & Rahman, S. (2005). Decomposition methods for structural reliability analysis.
Probabilistic Engineering Mechanics, 20(3), 239–250.
15. Youn, B. D., & Xi, Z. (2009). Reliability-based robust design optimization using the
Eigenvector Dimension Reduction (EDR) method. Structural and Multidisciplinary
Optimization, 37(5), 475–492.
16. Xu, H., & Rahman, S. (2004). A generalized dimension-reduction method for multidimen-
sional integration in stochastic mechanics. International Journal for Numerical Methods in
Engineering, 61(12), 1992–2019.
17. Jones, D. R., Schonlau, M., & Welch, W. J. (1998). Efficient global optimization of expensive
black-box functions. Journal of Global Optimization, 13(4), 455–492.
18. Schonlau, M. (1997). Computer experiments and global optimization (Ph.D. Dissertation).
University of Waterloo, Waterloo, Ontario, Canada.
19. Stuckman, B. E. (1988). A global search method for optimizing nonlinear systems. IEEE
Transactions on Systems, Man and Cybernetics, 18(6), 965–977.
20. Žilinskas, A. (1992). A review of statistical models for global optimization. Journal of Global
Optimization, 2(2), 145–153.
21. Koehler, J., & Owen, A. (1996). Computer experiments. In S. Ghosh & C. R. Rao (Eds.),
Handbook of statistics, 13: Design and analysis of experiments (pp. 261–308). Amsterdam:
Elsevier.
22. Sacks, J., Welch, W. J., Mitchell, T. J., & Wynn, H. P. (1989). Design and analysis of
computer experiments. Statistical Science, 4(4), 409–423.
23. Mockus, J., Tiesis, V., & Zilinskas, A. (1978). The application of Bayesian methods for
seeking the extreme. In L. C. W. Dixon & G. P. Szego (Eds.), Towards global optimization
(Vol. 2, pp. 117–129). Amsterdam, The Netherlands: Elsevier.
24. Haftka, R. T., & Watson, L. T. (1999). Response surface models combining linear and Euler
aerodynamics for supersonic transport design. Journal of Aircraft, 36(1), 75–86.
25. Madsen, J. I., Shyy, W., & Haftka, R. T. (2000). Response surface techniques for diffuser
shape optimization. AIAA Journal, 38(9), 1512–1518.
26. Welch, W. J., Buck, R. J., Sacks, J., Wynn, H. P., Mitchell, T. J., & Morris, M. D. (1992).
Screening, predicting, and computer experiments. Technometrics, 34(1), 15–25.
27. Wang, G. G., Dong, Z., & Aitchison, P. (2001). Adaptive response surface method—A global
optimization scheme for approximation-based design problems. Engineering Optimization,
33(6), 707–733.
28. Keane, A. J., & Nair, P. B. (2005). Computational approaches for aerospace design (p. 582).
Wiley: West Sussex.
29. Simpson, T. W., Mauery, T. M., Korte, J. J., & Mistree, F. (1998). Comparison of response
surface and kriging models for multidisciplinary design optimization. AIAA paper 98,
4758(7).
30. Paciorek, C. J. (2003). Nonstationary Gaussian processes for regression and spatial
modelling (Ph.D. dissertation). Carnegie Mellon University, Pittsburgh, PA.
31. Farhang-Mehr, A., & Azarm, S. (2005). Bayesian metamodeling of engineering design
simulations: A sequential approach with adaptation to irregularities in the response behavior.
International Journal for Numerical Methods in Engineering, 62(15), 2104–2126.
32. Qin, S., & Cui, W. (2003). Effect of corrosion models on the time-dependent reliability of
steel plated elements. Marine Structures, 16(1), 15–34.
33. Madsen, H. O., Krenk, S., & Lind, N. C. (2006). Methods of structural safety. USA: Dover
Publications.
34. Singh, A., Mourelatos, Z. P., & Li, J. (2010). Design for lifecycle cost using time-dependent
reliability. Journal of Mechanical Design, 132(9), 091008.
35. Hagen, O., & Tvedt, L. (1991). Vector process out-crossing as parallel system sensitivity
measure. Journal of Engineering Mechanics, 117(10), 2201–2220.
36. Rackwitz, R. (1998). Computational techniques in stationary and non-stationary load
combination—A review and some extensions. Journal of Structural Engineering, 25(1),
1–20.
37. Breitung, K. (1988). Asymptotic crossing rates for stationary Gaussian vector processes.
Stochastic Processes and Their Applications, 29(2), 195–207.
38. Belyaev, Y. K. (1968). On the number of exits across the boundary of a region by a vector
stochastic process. Theory of Probability & Its Applications, 13, 320–324.
39. Andrieu-Renaud, C., Sudret, B., & Lemaire, M. (2004). The PHI2 method: A way to compute
time-variant reliability. Reliability Engineering & System Safety, 84(1), 75–86.
40. Sudret, B. (2008). Analytical derivation of the out-crossing rate in time-variant reliability
problems. Structure and Infrastructure Engineering, 4(5), 353–362.
41. Zhang, J., & Du, X. (2011). Time-dependent reliability analysis for function generator
mechanisms. Journal of Mechanical Design, 133(3), 031005(9).
42. Du, X. (2012). Toward time-dependent robustness metrics. Journal of Mechanical Design,
134(1), 011004(8).
43. Son, Y. K., & Savage, G. J. (2007). Set theoretic formulation of performance reliability of
multiple response time-variant systems due to degradations in system components. Quality
and Reliability Engineering International, 23(2), 171–188.
Chapter 7
Reliability-Based Design Optimization
It has been widely recognized that engineering design should account for the
stochastic nature of design variables and parameters in engineered systems.
Reliability-based design optimization (RBDO) integrates the techniques of relia-
bility analysis and design optimization and offers probabilistic approaches to
engineering design [1–11]. RBDO attempts to find the optimum design that min-
imizes a cost and satisfies an allocated reliability target with respect to system
performance function(s), while accounting for various sources of uncertainty (e.g.,
material properties, geometric tolerances, and loading conditions). In general, a
RBDO problem for an engineered system can be formulated as follows:
Minimize   f(d)
Subject to R = Pr[G_j(X, Θ; d) ≤ 0] ≥ Φ(β_t) = R_t,   j = 1, …, Nc        (7.1)
           d_i^L ≤ d_i ≤ d_i^U,   i = 1, …, Nd
where the objective function is often the cost (e.g., price, volume, and mass) of the system, the random design vector X = (X1, …, X_Nd)^T and the random parameter vector Θ = (θ1, …, θ_Nr)^T, Nd and Nr are the number of design and parameter variables, respectively, d = (d1, …, d_Nd)^T = μ(X) is the mean design vector (or the design variable set), R is the reliability level of a given system design d, G_j is the performance function of the jth design constraint, for j = 1, …, Nc, Nc is the number of constraints, Φ is the cumulative distribution function of the standard normal distribution, R_t is the target reliability level, which corresponds to a target reliability index β_t, and d_i^L and d_i^U are, respectively, the lower and upper bounds on d_i, for i = 1, …, Nd. The concept of RBDO and its comparison to deterministic
design optimization (DDO) is illustrated with a two-dimensional design example
in Fig. 7.1. In this example, the engineered system under design is the lower
control arm of an army ground vehicle. The control arm serves as a connection
between the wheels and the main body of the vehicle. The majority of dynamic
loading in the suspension system is transmitted through the control arm, making it
susceptible to fatigue failure. Here, a computer simulation model is built to predict
the fatigue lives of the nodes on the surface of the control arm, and
simulation-based design is used to optimize the thicknesses of two components, X1
and X2, with an aim to minimize the design cost and satisfy two fatigue design
constraints.
The design constraints are defined in terms of the fatigue lives at two hotspots,
and are graphically represented as two failure surfaces, G1(X1, X2) = 0 and G2(X1,
X2) = 0, that separate the safe regions from the failure regions. For example, at any
design point below the failure surface G1 = 0, the fatigue life at the 1st hotspot is
longer than the specification limit (i.e., G1 < 0) and at any point above the surface,
the fatigue life is shorter than the limit (i.e., G1 > 0). Thus, on or below the surface
is the safe region for the 1st constraint (i.e., G1 ≤ 0) and above the surface is the
failure region (i.e., G1 > 0). Similarly, on or to the left of the failure surface G2 = 0
is the safe region for the 2nd design constraint (i.e., G2 ≤ 0) and to the right is the
failure region (i.e., G2 > 0). Therefore, any design point in the joint safe region is
safe with respect to both constraints (i.e., G1 ≤ 0 and G2 ≤ 0). This joint region is
treated as the system safe (or feasible) region in this design example.
Fig. 7.1 Schematic of RBDO and its comparison with DDO in a two-dimensional design space
The concentric ellipses centering at either optimum design point μ(X) represent the contours of the
joint PDF of the two random variables X1 and X2. The small dots that scatter around
the optimum design point are the design realizations that are randomly generated
based on the joint PDF. Two important observations can be made from the figure:
• Starting with an initial design in the safe region, DDO and RBDO both attempt
to find an optimum design solution that minimizes the objective function and
satisfies the design constraints. In this case, the objective function decreases
along the upper-right direction in the design space, i.e., the farther towards the
upper right that the design point is located, the smaller the corresponding
objective. Thus, both design approaches move the design point as far toward the
upper-right as possible while keeping the design point in the safe region.
• DDO and RBDO differ intrinsically in the way they deal with the design con-
straints, and this difference leads to two different optimal design solutions (see
the deterministic optimum and probabilistic optimum in Fig. 7.1). DDO con-
siders two deterministic constraints, G1(d1, d2) ≤ 0 and G2(d1, d2) ≤ 0, that do
not account for the uncertainty in the thicknesses of the components, while
RBDO addresses two probabilistic constraints (or fatigue reliabilities), Pr(G1(X1,
X2) ≤ 0) ≥ Rt and Pr(G2(X1, X2) ≤ 0) ≥ Rt, that explicitly account for the
uncertainty. Consequently, DDO pushes the optimum design point onto the
boundary of the safe region, leaving little room for accommodating uncertainty
in the design variables (depicted by red dots around the deterministic optimum
design). By doing so, DDO finds the optimum design point that minimizes the
objective function while satisfying the deterministic constraints. However, many
random design realizations (or actual control arm units) around the deterministic
optimum fall out of the safe region and are deemed unreliable. RBDO pushes
the deterministic optimum design back to the safe region in order to create a
safety margin that accommodates the uncertainty in the design variables. Due to
the safety margin around the probabilistic optimum, this design solution is more
reliable in that most of the random design realizations (blue dots) are located
within the safe region.
In its search for the probabilistic optimum, RBDO must evaluate the feasibility
of probabilistic constraints at a candidate design point through reliability analysis
under uncertainty. Intuitively, RBDO needs to determine whether the safety margin
shown in Fig. 7.1 is large enough to satisfy both probabilistic constraints. During
the past two decades, many attempts have been made to develop efficient strategies
to perform feasibility identification of the probabilistic constraints. These strategies
can be broadly categorized as: double-loop RBDO [1–5], decoupled, sequential
single-loop RBDO (or decoupled RBDO) [6], integrated single-loop RBDO (or
single-loop RBDO) [7–11], and metamodel-based RBDO [12]. The rest of this
chapter introduces these strategies as well as discusses an emerging design topic,
RBDO under time-dependent uncertainty.
The double-loop RBDO strategies often consist of a nested structure of outer and
inner loops, where the outer loop performs design optimization in the original
design variable space (X-space), and the inner loop performs reliability analysis at a
given design point in the transformed standard normal space (U-space). The two
steps (design optimization and reliability analysis) are iteratively repeated until the
optimum design of an engineered system is found that minimizes the cost and meets
the probabilistic constraints.
The jth probabilistic design constraint in the RBDO formulation shown in
Eq. (7.1) can be expressed in terms of the CDF of the jth performance function as
Pr[G_j(X) ≤ 0] = F_{Gj}(0) ≥ Φ(β_t)        (7.2)
where F_{Gj}(·) is the CDF of G_j, and its value at G_j = 0 is the reliability of the jth
design constraint, expressed as
F_{Gj}(0) = ∫_{−∞}^{0} f_{Gj}(g_j) dg_j = ∫_{Ω_Sj} f_X(x) dx        (7.3)
Here f_X(x) is the joint PDF of all the random variables, and Ω_Sj is the safety region
defined as Ω_Sj = {x : G_j(x) ≤ 0}. For notational convenience, we assume the new
random vector consists of both design and parameter vectors, i.e., X = [X^T, Θ^T]^T.
The evaluation of Eq. (7.3) is essentially reliability analysis under time-independent
uncertainty, which has been extensively discussed in Chap. 5. Among the various
methods for reliability analysis, an approximate probability integration method is
known to provide efficient solutions; this is the first-order reliability method
(FORM). Recall from Sect. 5.3 that reliability analysis in FORM requires a
transformation T of the original random variables X to the standard normal random
variables U. Correspondingly, the performance function G(X) can be mapped from
the original X-space onto the transformed U-space, i.e., G[T(X)] ≡ G(U).
There are in general three double-loop approaches to RBDO, and these
approaches further express the probabilistic constraint in Eq. (7.2) through inverse
transformations or using approximate statistical moments:
RIA:  G_j^r = β_t − Φ^{−1}(F_{Gj}(0)) = β_t − β_{s_j} ≤ 0        (7.4)
PMA:  G_j^p = F_{Gj}^{−1}(Φ(β_t)) ≤ 0        (7.5)
AMA:  G_j^m = G_j(μ) + t σ_{Gj} ≤ 0        (7.6)
where β_{s_j}, G_j^p, and G_j^m are, respectively, the safety reliability index, the probabilistic
performance measure, and the probabilistic constraint in AMA for the jth design
constraint, t is a target reliability coefficient, and σ_{Gj} is the second statistical
moment of the performance function Gj. Equation (7.4) uses the reliability index to
describe the probabilistic constraint in Eq. (7.1); this RBDO approach is called
reliability index approach (RIA) [13]. Similarly, Eq. (7.5) replaces the probabilistic
constraint in Eq. (7.1) with the probabilistic performance measure; this approach is
known as performance measure approach (PMA). Because the (often) approximate
first and second statistical moments in Eq. (7.6) are used to describe the proba-
bilistic constraint in the RBDO formulation, this approach is termed the approxi-
mate moment approach (AMA), which originated from the robust design
optimization concept [14, 15]. In what follows, these RBDO approaches will be
introduced, and a focus will be put on RIA and PMA, which are more widely used
in system design than AMA.
The RBDO problem in Eq. (7.1) can be redefined using RIA in Eq. (7.4) as
Minimize   f(d)
Subject to G_j^r = β_t − β_{s_j} ≤ 0,   j = 1, …, Nc        (7.7)
           d_i^L ≤ d_i ≤ d_i^U,   i = 1, …, Nd
The reliability index in the probabilistic constraint in RIA can be evaluated using
FORM, which solves an optimization problem in the transformed U-space (see
Sect. 5.3.1). In FORM, the equality constraint is the failure surface Gj(U) = 0. The
point on the failure surface that has the shortest distance to the origin is called the
most probable point (MPP) u_j*, and the first-order estimate of the reliability index is
defined as the distance between the MPP and the origin, β_{s,FORM_j} = ‖u_j*‖. Because of its
simplicity and efficiency, the HL–RF method described in Sect. 5.3.2 is often
employed to search for the MPP for reliability analysis in RIA.
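To make the HL-RF iteration concrete, the following minimal MATLAB sketch (written in the style of the 99-line code in the Appendix) searches for the MPP of a generic performance function supplied, together with its gradient in the U-space, as function handles. The function name and arguments are illustrative assumptions, not part of the 99-line code.

function [u, beta] = hlrf_mpp(Gfun, dGfun, n, tol, maxit)
% HL-RF iteration sketch: Gfun(u) returns G(u); dGfun(u) returns its gradient (row vector)
u = zeros(1, n);                           % start the search at the mean point (origin of U-space)
for k = 1:maxit
    g  = Gfun(u);                          % performance function value at the current point
    dg = dGfun(u);                         % gradient of G at the current point
    u_new = ((dg*u' - g)/(dg*dg')) * dg;   % HL-RF update: MPP of the linearized limit state
    if norm(u_new - u) < tol, u = u_new; break; end
    u = u_new;
end
beta = norm(u);                            % first-order estimate of the reliability index
end

For instance, for the first constraint of the 99-line code with mean design d = (5, 5) and standard deviations of 0.3, one may call hlrf_mpp(@(u) 1-(0.3*u(1)+5)^2*(0.3*u(2)+5)/20, @(u) [-(0.3*u(1)+5)*(0.3*u(2)+5)*0.3/10, -(0.3*u(1)+5)^2*0.3/20], 2, 1e-6, 100).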
The MPP search space in RIA is illustrated over a two-dimensional design space
in Fig. 7.2, where the first-order reliability indices in Eq. (7.4) are β_{s,FORM_j} = β_j(x_j*; d_k) = ‖T(x_j*)‖, j = 1, 2, at the kth design iteration. Reliability analysis in RIA is
carried out by determining the minimum distance between the mean design point d_k
and the MPP x_j* on the failure surface G_j(X) = 0, j = 1, 2. A comparison of the two
probabilistic constraints in Fig. 7.2 suggests that the first constraint is slightly
Fig. 7.2 MPP search space of RIA in a two-dimensional design space (X1: manufacturing tolerance; X2: operational factor; the joint safe region is G1 ≤ 0 and G2 ≤ 0)
violated, i.e., β1(x1*; dk) < βt, and the second is largely inactive, i.e., β2(x2*; dk) > βt.
Consequently, the MPP search space (see the smaller circle) for the first constraint
is smaller than the search space (see the larger circle) for the second constraint. It
has been reported in [1, 3] that the size of the MPP search space in reliability
analysis could affect the efficiency of the MPP search but may not be a crucial
factor. Rather, it was found that PMA with the spherical equality constraint (see
Eq. (7.9) in Sect. 7.2.2) is often easier to solve than RIA with an often complicated
constraint (i.e., Gj(X) = 0). In other words, it is easier to minimize a complex cost
function subject to a simple constraint function (PMA) than to minimize a simple
cost function subject to a complicated constraint function (RIA).
Similarly, the RBDO problem in Eq. (7.1) can be redefined using PMA in Eq. (7.5) as
Minimize   f(d)
Subject to G_j^p = F_{Gj}^{−1}(Φ(β_t)) ≤ 0,   j = 1, …, Nc        (7.8)
           d_i^L ≤ d_i ≤ d_i^U,   i = 1, …, Nd
In PMA, the probabilistic performance measure is obtained by solving the inverse reliability analysis (MPP search) problem
Maximize   G_j(U)
Subject to ‖U‖ = β_t        (7.9)
The point on the target reliability surface ‖U‖ = β_t with the maximum value of the
performance function is called the MPP u_j* with the prescribed reliability ‖u_j*‖ = β_t.
Then, the probabilistic performance measure is defined as G_j^{p,FORM} = G_j(u_j*).
Unlike RIA, the MPP search in PMA requires only the direction vector u_j*/‖u_j*‖
given the spherical equality constraint ‖U‖ = β_t. Three numerical methods for PMA
can be used to perform the MPP search in Eq. (7.9): the advanced mean value
(AMV) method [15], the conjugate mean value (CMV) method [3, 17], and the
hybrid mean value (HMV) method [3].
The MPP search space in PMA is illustrated over a two-dimensional design
space in Fig. 7.3, where the first-order performance measures in Eq. (7.5) are
G_j^{p,FORM} = g_j(x_j*), j = 1, 2. Reliability analysis in PMA is carried out by determining
the maximum performance value G_j(x_j*) on the explicit sphere of the target reliability
β_j(X; d_k) = β_t, j = 1, 2. Although the two probabilistic constraints in Fig. 7.3
significantly differ in terms of feasibility, they share the same MPP search space
(see the circle in Fig. 7.3). As mentioned, it is often more efficient to perform the
MPP search in PMA with the spherical equality constraint, than to perform RIA
with a complicated constraint (i.e., Gj(X) = 0).
AMV for MPP Search in PMA
The advanced mean value (AMV) method is, in general, well-suited for solving the
optimization problem in Eq. (7.9) due to its simplicity and efficiency [16]. The
first-order AMV method starts the MPP search with the initial MPP estimate
expressed as follows [16]:
u^(0) = β_t n^(0) = β_t ∇_U G(u = 0) / ‖∇_U G(u = 0)‖        (7.10)
Fig. 7.3 MPP search space of PMA in a two-dimensional design space (X1: manufacturing tolerance; X2: operational factor; both MPP searches share the same sphere β1(X; dk) = β2(X; dk) = βt)
To maximize the objective function G(U) in Eq. (7.9), the AMV method first uses
the normalized steepest descent direction n(0) obtained at the origin u = 0 in the U-
space. Note that u = 0 corresponds to the mean values of X. In subsequent steps, the
method iteratively updates the search direction at the current iteration (k + 1) as the
steepest descent direction n(k) at the MPP u(k) obtained at the previous iteration
k. The iterative algorithm works according to the following:
u^(1) = β_t n^(0) = β_t ∇_U G(u^(0)) / ‖∇_U G(u^(0))‖ ,   u^(k+1) = β_t n^(k) = β_t ∇_U G(u^(k)) / ‖∇_U G(u^(k))‖        (7.11)
It was reported in [3] that the AMV method often behaves well for a convex
performance function, but may exhibit instability and inefficiency for a concave
performance function due to the sole use of the gradient information at the previous
MPP.
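The AMV update in Eq. (7.11) can be sketched in a few lines of MATLAB. The function name and arguments below are illustrative (Gfun and dGfun return G(u) and its gradient in the U-space), and the simple convergence check is an assumption of this sketch rather than part of the methods discussed in this chapter.

function [u, Gp] = amv_mpp(Gfun, dGfun, n, bt, tol, maxit)
% AMV iteration sketch for the PMA MPP search in Eq. (7.9)
u = zeros(1, n);                     % start at the mean point; Eq. (7.10) uses the gradient here
for k = 1:maxit
    dg    = dGfun(u);                % gradient of G at the current point
    u_new = bt * dg / norm(dg);      % Eq. (7.11): move onto the beta_t sphere along the gradient
    if norm(u_new - u) < tol, u = u_new; break; end
    u = u_new;
end
Gp = Gfun(u);                        % first-order probabilistic performance measure
end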
CMV for MPP Search in PMA
When dealing with a concave performance function, the AMV method tends to
show slow convergence or even divergence. This numerical difficulty can be
addressed by employing an alternative MPP search method, the conjugate mean
value (CMV) method, which updates the search direction through a combined use
of the steepest descent directions at the three previous iterations [3, 17]. The update
of the search direction at the current iteration (k + 1), k ≥ 2, is performed according
to the following:

u^(k+1) = β_t ( n^(k) + n^(k−1) + n^(k−2) ) / ‖ n^(k) + n^(k−1) + n^(k−2) ‖        (7.12)

where

n^(k) = ∇_U G(u^(k)) / ‖∇_U G(u^(k))‖        (7.13)
It can be observed from the above equations that the conjugate steepest descent
direction is a weighted sum of the previous three consecutive steepest descent
directions. This way of updating the search direction improves the rate of con-
vergence and the stability over the AMV method for concave performance
functions.
HMV for MPP Search in PMA
Although the CMV method works well on concave functions, the method is often
less efficient than the AMV method for convex functions. To combine the strengths
of the two methods, the hybrid mean value (HMV) method was developed and
shown to attain both stability and efficiency in the MPP search in PMA [3].
The HMV method first determines the type (i.e., convex or concave) of a
performance function based on the steepest descent directions at three most recent
iterations, and then adaptively selects one of the two algorithms, AMV or CMV, for
the MPP search. A more detailed description of the method can be found in [3, 18].
The RBDO problem in Eq. (7.1) can be redefined using AMA in Eq. (7.6) as
Minimize   f(d)
Subject to G_j^m = G_j(μ) + k σ_{Gj} ≤ 0,   j = 1, …, Nc        (7.14)
           d_i^L ≤ d_i ≤ d_i^U,   i = 1, …, Nd
As described in Sect. 5.2, the second statistical moment (standard deviation) of the
performance function can be approximated using the first-order Taylor series
expansion at the mean values μ_X as

G_j(X) ≈ G_j(μ_X) + Σ_{i=1}^{N} ∂G_j(μ_X)/∂X_i · (X_i − μ_{X_i})        (7.15)
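As a small illustration of Eq. (7.15), the MATLAB sketch below approximates the first two moments of the first constraint of the 99-line RBDO code in the Appendix; the mean design point and standard deviations used here are illustrative assumptions.

mu  = [5, 5];  sd = [0.3, 0.3];               % illustrative mean design point and std. devs.
G   = @(x) 1 - x(1)^2*x(2)/20;                % first constraint of the 99-line code
dG  = [-mu(1)*mu(2)/10, -mu(1)^2/20];         % gradient of G evaluated at the mean point
muG = G(mu);                                  % first-order approximation of the mean of G
sdG = sqrt(sum((dG.*sd).^2));                 % first-order approximation of the std. dev. of G
Gm  = muG + 3*sdG;                            % AMA constraint value of Eq. (7.14) with k = 3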
Several observations can be made when comparing RIA and PMA. First, as noted above, it is easier and more stable to perform the MPP search with the simple spherical constraint in PMA than with a complicated constraint function using RIA [1, 3, 17]. Second, the nonlinearity of
the PMA reliability constraint is less dependent on probabilistic model types, such
as normal, lognormal, Weibull, Gumbel, and uniform distributions, than the RIA
reliability constraint [17]. Thus, RIA tends to diverge for distributions other than
the normal distribution, whereas PMA converges well for all types of distributions.
Here, we also list some observations on AMA from an earlier study [18]. First,
without knowing the output probabilistic distribution type, a reliability requirement
is directly assigned by the first two moments of the performance function.
Therefore, a non-normal and skewed output distribution with even a small variation
produces a large error when estimating the reliability. Second, another numerical
error can be generated when estimating the first two moments based on a Taylor
series expansion at the mean values of the random variables. Third, AMA involves
intensive computations because it requires the second- or higher-order sensitivity of
the performance function to evaluate the sensitivity of the probabilistic constraint,
whereas PMA and RIA require only the first-order sensitivity.
Table 7.1 summarizes the comparisons of these double-loop approaches in terms
of several numerical attributes. As shown in Table 7.1, PMA is in general more
desirable than RIA and AMA for RBDO from several numerical perspectives.
where u*_{Gj(U)=0} is the MPP in the U-space, which may be estimated by the HL-RF
method (see Sect. 7.2.1). Using the transformation U = T(X; d), Eq. (7.16) can be
rewritten as
∂β_{s,FORM_j}/∂d_i = [ T(X; d)^T / β_{s,FORM_j} ] · ∂T(X; d)/∂d_i |_{X = x*_{Gj(X)=0}}        (7.17)
Similarly, the sensitivity of the probabilistic performance measure in PMA with respect to d_i can be expressed as

∂G_j^{p,FORM}/∂d_i = ∂G_j(U)/∂d_i |_{U = u*_{βj = βt}}        (7.18)
where u*_{βj(U)=βt} is the MPP in the U-space, which may be estimated by the AMV,
CMV, or HMV method (see Sect. 7.2.2). Using the transformation U = T(X; d),
Eq. (7.18) can be rewritten as

∂G_j^{p,FORM}/∂d_i = ∂G_j(T(X; d))/∂d_i |_{X = x*_{βj = βt}}        (7.19)
In this process, residual stress due to the mismatch of the thermal expansion
coefficients of two layered plates could result in failures of the component,
Fig. 7.4 Target bonding process and FE model for Problem 7.2 (layered plates: adherent 1, adhesive, and adherent 2, subjected to forces F1 and F2)
Minimize   Q = μ_r + σ_r
Subject to R_j = P(G_j(X) ≤ 0) ≥ Φ(β_t),   j = 1, 2
           2000 ≤ X1 ≤ 10000,   1000 ≤ X2 ≤ 5000,   1 ≤ X3 ≤ 5
Table 7.2 Statistical properties of design variables in layered plate bonding model for
Problem 7.2
Design variable Distribution type Mean Std. dev.
X1 Normal 4000 400
X2 Normal 2000 200
X3 Normal 1 0.1
200 7 Reliability-Based Design Optimization
where μ_r and σ_r are the mean and standard deviation of the residual stress,
G1(X) is the instantaneous stress, G2(X) is the edge displacement, and the
target reliability index β_t = 3.
The eigenvector dimension reduction (EDR) method [19], as a variant of
univariate dimension reduction (UDR) introduced in Sect. 5.5.1, is carried
out to evaluate the quality (= mean + standard deviation) of residual stress
and the reliabilities of two constraints. The sampling scheme for the EDR
method is adaptively chosen in the RBRDO process to tackle the high non-
linearity in system responses. First, RBRDO starts with a 2Nd + 1 sampling
scheme for the EDR method. Then, once a relaxed convergence criterion
(ε ≤ 0.1) is satisfied, the RBRDO process switches to the 4N + 1 sampling scheme. In this
example, the standard deviation at the fourth design iteration is quite small
but this estimation is not accurate enough because of highly nonlinear
responses. Therefore, after the fourth design iteration, RBRDO is performed
with the 4N + 1 sampling scheme to enhance accuracy of the quality and
reliability estimates. Sequential quadratic programming (SQP) is used as a
design optimizer to solve the RBRDO problem. Table 7.3 shows the design
history of this problem. After eight design iterations, an optimum design is
found where X2 is close to the upper bound. The EDR method requires in
total 87 function evaluations for RBRDO. MCS with 1000 random samples is
used to confirm the EDR results at the optimum design. It is found that the
EDR estimates for the mean (lr) and standard deviation (rr) of the residual
stress at the optimum design are very close to those using MCS. The overall
quality is drastically improved by 38%.
Fig. 7.5 Nested double-loop of RBDO: for each new design produced by the optimization loop, a reliability analysis loop is performed for every constraint (constraint 1 through constraint n) until the optimal design is reached. Reprinted (adapted) with permission from Ref. [20]
One strategy is to decouple the two loops of the nested structure: the outer
loop for deterministic design optimization and the inner loop for reliability analysis.
The two separated loops are performed sequentially until the design optimization
converges (see Fig. 7.7). Compared to the double-loop RBDO, which conducts the
reliability analysis for all design changes in the outer loop, the decoupled RBDO
conducts the reliability analysis only once after the deterministic optimum design
from the outer loop is achieved. That is, the outer loop may have several iterations
but it does not call the inner loop each time. This reduces the number of the
reliability analyses, which accounts for a majority of the computational cost. In the
Fig. 7.7 Decoupled RBDO: in each iteration, the optimization loop is followed by the reliability analyses (RL) of the limit states (1st LS through nth LS), and the sequence is repeated till convergence is obtained
Minimize_d   f(d)
Subject to   G_j(d, μ_X − s_j^(k+1)) ≤ 0,   j = 1, 2, …, Nc        (7.20)
where s_j^(k+1) is the shifting vector for the jth design constraint at the (k + 1)th
design iteration.
Another strategy to decouple the nested loop is to eliminate the inner loop for
reliability analysis by approximating the probabilistic constraints as deterministic
ones. Once the probabilistic constraints are approximated into deterministic ones, a
simple deterministic design optimization can be conducted without additional
reliability analysis (see Fig. 7.8).
There are two major approaches to approximating the probabilistic constraints:
single-loop single vector (SLSV) and Karush-Kuhn-Tucker (KKT) optimality
condition. The SLSV approach leverages the sensitivity of the design variables to
remove reliability analysis by finding the MPP iteratively. The main drawback of
SLSV is that the active constraints should be identified in advance. The second
approach with the KKT condition treats the inner loop of reliability analysis as
the equality constraints in the outer design optimization loop. The single-loop
formulation using the KKT optimality condition is given as follows
Minimize_d   f(d)
Subject to   G_P ≅ G(d, X(u^t)) ≤ 0
where        u^t ≅ β_t â^t        (7.21)
             â^t = − ∇_X G(d, X(u)) J_{X,u} / ‖ ∇_X G(d, X(u)) J_{X,u} ‖ |_{u = ũ}
where β_t is the target reliability index, J_{X,u} is the Jacobian matrix of the transformation, and â^t is
the negative normalized gradient vector of the performance function G(·) evaluated
at the approximate location ũ for the performance function value, which is the
solution of the KKT optimality condition.
Fig. 7.8 Single-loop RBDO: a single optimization loop over a simple (approximated) optimization model directly yields the optimal design
Instead of searching for the exact MPP at each design iteration, the single-loop
approach obtains an approximate location ũ_j for the performance function value by
solving the system of equations given by the KKT condition. In Fig. 7.9, the dashed
line G = G_P^t is the approximate limit-state function obtained from the conversion of the
probabilistic constraint. This approach may require high computational resources in
order to handle a large number of design variables and calculate second-order
derivatives.
As the single-loop approaches do not require reliability analysis throughout the
optimization process (see Fig. 7.8), they can reduce computational costs signifi-
cantly. However, these approaches can also produce an infeasible design for highly
nonlinear design problems in which the accuracy of the approximation is not
guaranteed.
This section briefly reviews the use of the direct Monte Carlo simulation
(MCS) based on surrogate models for reliability analysis. As discussed in Sect. 4.1,
the probability of failure given a performance function G(X) can be expressed as
P_f = P(G(X) > 0) = ∫_{G(x)>0} f_X(x) dx        (7.22)
where fX(x) is the joint PDF of the system random inputs X. The direct MCS first
draws from the distribution fX(x) a large number of random samples, and then
evaluates the performance function G(X) at these random samples to estimate the
probability of failure:
P_f = P(G(X) > 0) = ∫ I_f(x) f_X(x) dx = E[I_f(x)]        (7.23)
where E[.] is the expectation operator, and If(x) represents an indicator function,
defined as
I_f(x) = { 1, if G(x) > 0;   0, otherwise }        (7.24)
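A minimal MATLAB sketch of Eqs. (7.23) and (7.24) is given below; it assumes independent normal inputs with illustrative statistics and uses the first constraint of the 99-line RBDO code in the Appendix as the performance function.

N   = 1e6;                                   % number of Monte Carlo samples
mu  = [5, 5];  sd = [0.3, 0.3];              % illustrative input means and standard deviations
X   = mu + sd .* randn(N, 2);                % random samples of X = [X1, X2]
G   = 1 - X(:,1).^2 .* X(:,2) / 20;          % performance function values (failure when G > 0)
If  = (G > 0);                               % indicator function of Eq. (7.24)
Pf  = mean(If);                              % probability of failure estimate, Eq. (7.23)
se  = sqrt(Pf*(1 - Pf)/N);                   % sampling standard error of the estimate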
This section introduces the surrogate modeling with adaptive sequential sampling
(SMASS) approach. Section 7.5.2.1 first presents the Kriging-based surrogate
modeling, whereas Sect. 7.5.2.2 introduces the sampling scheme for initial surro-
gate model development. Section 7.5.2.3 then presents a new classification confi-
dence value (CCV)-based adaptive sequential sampling technique for the updating
of Kriging surrogate models; Sect. 7.5.2.4 summarizes the procedure of the
SMASS approach.
The Kriging model represents the response as G_K(x) = μ + Z(x), where μ is the mean response and Z(x) is a Gaussian stochastic process with mean equal to zero and variance equal to σ²; G_K(X) is the Kriging-predicted response as a function of X. The covariance between two input points x_i and x_j is expressed as Cov[Z(x_i), Z(x_j)] = σ² Corr(x_i, x_j), where Corr(·) is the correlation function, and a_p and b_p are parameters of the Kriging model. With n observations, G_tr = [G(x1), …, G(xn)], at training samples X_tr = [x1, …, xn], the log-likelihood function of the Kriging model can be expressed as
Likelihood = −(1/2)[ n ln(2π) + n ln σ² + ln|R| ] − (1/(2σ²)) (G_tr − Aμ)^T R^{−1} (G_tr − Aμ)        (7.28)
With the Kriging model, the response at any given point x′ can be estimated as Ĝ_K(x′) = μ̂ + r^T R^{−1}(G_tr − A μ̂), where r is the correlation vector between x′ and the training points X_tr = [x1, …, xn], and the ith element of r is given by r(i) = Corr(x′, x_i). The mean square error e(x′) can be estimated by

e(x′) = σ²[ 1 − r^T R^{−1} r + (1 − A^T R^{−1} r)² / (A^T R^{−1} A) ]        (7.32)
With an initial set of sample points XE and system responses YE, a Kriging model
M (GK) can be constructed accordingly. However, this Kriging model usually has
a low fidelity, and thus needs to be updated. This subsection introduces a new
confidence-based adaptive sampling scheme for sequential updating of the
Kriging models.
The prediction of the response at point xi from a Kriging model can be con-
sidered as a random variable that follows a normal distribution. For any given sample
point xi, based on the Kriging prediction of its response, GK(xi), it can be
accordingly classified as a sampling point in the failure region or safe region. With
this classification, all Monte Carlo sample points can be accordingly categorized
into two classes, as shown in Fig. 7.10, the failure class and the safe class, where
the failure class includes all sample points at which the predicted responses GK(xi)
> 0, and the safe class includes those at which G_K(x_i) ≤ 0. Because the Kriging prediction
can be considered a random variable, the classification of a sample point
becomes probabilistic. Here, we define the probability of having a correct classification as the classification confidence value (CCV). For a sample point classified into the failure class, the CCV can be calculated as

CCV(x_i) = P(G(x_i) > 0) = ∫_{0}^{∞} 1/√(2π e(x_i)) · exp{ −[y − G_K(x_i)]² / (2 e(x_i)) } dy,   for all i where G_K(x_i) > 0        (7.33)
where G_K(x_i) and e(x_i) are the predicted response at point x_i and the mean square
error of the prediction, respectively. Similarly, for sample points in the safe
class, the CCV value indicates the probability that the sample point is in the safe
region, which can accordingly be calculated as the area under the normal probability
density function over the interval (−∞, 0):
CCV(x_i) = P(G(x_i) ≤ 0) = ∫_{−∞}^{0} 1/√(2π e(x_i)) · exp{ −[y − G_K(x_i)]² / (2 e(x_i)) } dy,   for all i where G_K(x_i) ≤ 0        (7.34)
Based upon the definition, it is clear from Fig. 7.10 that the CCV should be a
positive value within (0.5, 1), where a higher value indicates higher classification
confidence. Combining Eqs. (7.33) and (7.34), the CCV of the sample point xi can
be generally calculated as
CCV(x_i) = Φ( |G_K(x_i)| / √(e(x_i)) ),   i = 1, 2, …, n        (7.35)
where G_K(x_i) and e(x_i) are the Kriging-predicted response at point x_i and the mean square error of the
prediction, respectively. These values can be obtained directly from the constructed
Kriging model. By using Eq. (7.35), the failure potentials of the Monte Carlo samples
can be calculated based on their Kriging-predicted means and standard deviations
of the responses.
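A compact MATLAB sketch of Eq. (7.35) follows; the predicted responses and mean square errors are illustrative placeholders that, in practice, would come from the constructed Kriging model.

GK  = [ 0.8; -0.2; 0.05; -1.5];        % illustrative Kriging-predicted responses at MC samples
e   = [0.04;  0.01; 0.02;  0.09];      % illustrative mean square errors of the predictions
CCV = normcdf(abs(GK) ./ sqrt(e));     % classification confidence values, Eq. (7.35)
[CCVmin, idx] = min(CCV);              % the sample with the lowest classification confidence
% idx identifies the next sample to evaluate and add to the training set (Sect. 7.5.2.3)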
To improve the fidelity of the Kriging model, especially for problems with multiple disjointed failure regions, sample points must be chosen from the different disjointed failure regions during the sequential Kriging model updating process, because such sample points carry more information about the system performance function in those particular failure regions and are thus more valuable. Therefore, in the developed SMASS approach, the sample point is selected based upon a sampling rule that improves the classification confidence values of the Kriging model: the sample point with the minimum CCV, x*, is selected in each Kriging model updating iteration, and the corresponding performance value y* is evaluated. The selected sample x* and its actual response value y* are then added into XE and YE, respectively, and the Kriging model is updated accordingly with the new sample point added. To prevent the same sample point from being used repeatedly in different updating iterations, the selected sample point x* is excluded from the original Monte Carlo samples in each updating iteration. The updated Kriging model is then used to predict the responses of the Monte Carlo samples again. This search and update process is carried out iteratively and is terminated when the minimum CCV value reaches a predefined threshold, CCV_t. This stopping rule is defined as

min_i { CCV_i } ≥ CCV_t        (7.36)
where CCVi is the CCV value for the sample point xi, and CCVt is the predefined
CCV threshold. To ensure a good balance of accuracy and efficiency, it is suggested
that a value of CCVt defined between [0.95, 1) is desirable; in this study, 0.95 has
been used for the CCVt. In implementation of the SMASS approach, it is suggested
that the minimum CCVi in Eq. (7.36) can often be replaced by the average mini-
mum CCV value obtained at the last few updating iterations (e.g., the last five
iterations), in order to ensure a more robust convergence.
Sect. 7.5.2.3, is used as the performance metric of the Kriging model M. (3) Update
the Kriging model iteratively by adding sample points into XE and YE using the
CCV-based sampling scheme. The updated Kriging model is used to predict the
responses for a new set of Monte Carlo samples for the probability of failure
estimation.
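The overall SMASS procedure can be sketched as follows in MATLAB, using fitrgp as a stand-in Kriging model. The input distribution, initial design range, candidate-pool size, and iteration limit below are illustrative assumptions rather than the settings used in the case studies; the performance function follows Eq. (7.46).

rng(1);                                               % for repeatability
Gfun = @(x) (x(:,1).^2 + 4).*(x(:,2) - 1)/20 - sin(2.5*x(:,1)) - 2;   % Eq. (7.46)
Xall = randn(1e5, 2);                                 % Monte Carlo samples (standard normal inputs assumed)
Xmcs = Xall;                                          % candidate pool for sequential sampling
Xtr  = lhsdesign(15, 2)*8 - 4;                        % 15 initial LHS training points on [-4, 4]^2
Ytr  = Gfun(Xtr);
CCVt = 0.95;                                          % target classification confidence value
for it = 1:200
    mdl        = fitrgp(Xtr, Ytr);                    % (re)build the surrogate model
    [GK, ysd]  = predict(mdl, Xmcs);                  % predicted responses and their std. devs.
    CCV        = normcdf(abs(GK) ./ max(ysd, eps));   % classification confidence values, Eq. (7.35)
    [cmin, id] = min(CCV);
    if cmin >= CCVt, break; end                       % stopping rule, Eq. (7.36)
    Xtr = [Xtr; Xmcs(id,:)];  Ytr = [Ytr; Gfun(Xmcs(id,:))];
    Xmcs(id,:) = [];                                  % do not reuse the selected sample
end
Pf = mean(predict(mdl, Xall) > 0);                    % probability of failure from the final surrogate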
Although the analytical form of the reliability sensitivity in Eq. (7.38) can be derived by
differentiating the failure-probability integral over R^{nr} with respect to the design variables, it
cannot be used to compute the sensitivity when all of the samples in MCS are identified
as safe. If I_f equals 0 for all N samples, Eq. (7.38) becomes
∂P_f/∂d_i = ∂/∂d_i ∫_{R^{nr}} I_f(x) f_X(x) dx = ∫_{R^{nr}} I_f(x) ∂f_X(x)/∂d_i dx = 0        (7.39)
Although zero estimation of the sensitivity based on MCS samples may not lead
to a divergence of the RBDO process, it could result in substantially more design
iterations since the new design point in the subsequent design iteration will be
affected by this sensitivity estimation. This is especially true for a high reliability
target scenario in RBDO, as the sensitivity estimated using the MCS samples based
on Eq. (7.38) will frequently be zero. In some extreme cases, the non-smooth
sensitivity estimation will substantially increase the total number of design itera-
tions, and could also make the RBDO process fail to converge to the optimum
design. To alleviate such a difficulty, a new way to calculate smooth sensitivity of
reliability without extra computational cost is presented here [32]. Defined as the
integration of the probability density function of the system input variables over the
safe region (G(X) ≤ 0), the reliability has a monotonic one-to-one mapping
relationship to the ratio of the mean to the standard deviation of the performance function,
which can be expressed as
R = P(G(X) ≤ 0) = ∫_{G(x)≤0} f_X(x) dx ≈ Φ( −μ_{G(x)} / σ_{G(x)} ) ∝ −μ_{G(x)} / σ_{G(x)}
  = − ∫_{R^{nr}} G(x) f_X(x) dx / √( ∫_{R^{nr}} [G(x) − μ_{G(x)}]² f_X(x) dx )        (7.40)
where μ_{G(x)} and σ_{G(x)} are the mean and standard deviation of the performance function
G(X) given the random input x. It should be noticed that the failure probability is
computed by the integration of the probability density function over all of the
failure region, and can be determined by the randomness properties of the input x and
the performance function G(x). Thus, the probability of failure is a function of the
design variable d. In this equation, the approximate equality becomes an equality if
the random response G(x) follows the Gaussian distribution given the randomness
of input x. The sensitivity of reliability with respect to the design variable d can
then be approximated as
∂R/∂d ∝ ∇_d( −μ_{G(x)} / σ_{G(x)} )        (7.41)
Note that the right part of Eq. (7.41) only provides an estimated sensitivity vector
that is proportional to the true design sensitivity. Thus, the sensitivity information in
Eq. (7.41) can be normalized and derived as
∂R/∂d ≈ a · ∇_d( −μ_{G(x)}/σ_{G(x)} ) / ‖ ∇_d( −μ_{G(x)}/σ_{G(x)} ) ‖        (7.42)
Denote the normalized sensitivity vector in Eq. (7.42) by S_R and the proportional coefficient by a, so that ∂R/∂d ≈ a S_R. Let R_i and d_i be the estimated reliability and the design point at
the ith iteration. Also let ak and ak+1 be the proportional coefficients for kth and
(k + 1)th iterations. With these notations, the proportional coefficient ak+1 can be
updated by
Rk þ 1 Rk
ðd 0 ; if jRk þ 1 Rk j [ Ca
ak þ 1 ¼ k þ 1 dk ÞðSRk Þ ð7:45Þ
ak ; otherwise
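The smooth sensitivity idea can be illustrated with the short MATLAB sketch below, which estimates the normalized gradient of −μ_G/σ_G by finite differences with common random numbers. The performance function, design point, and step size are illustrative; in the approach described above, the gradient is obtained from the existing MCS samples without additional function evaluations, so the finite-difference step here is for illustration only.

G  = @(x) 1 - x(:,1).^2 .* x(:,2) / 20;          % illustrative performance function (failure: G > 0)
sd = [0.3, 0.3];  N = 1e5;  h = 1e-3;            % input std. devs., sample size, FD step
U  = randn(N, 2);                                % common random numbers shared by all designs
ratio = @(d) -mean(G(d + sd.*U)) / std(G(d + sd.*U));   % -mu_G/sigma_G at design d, Eq. (7.40)
d0 = [5, 5];  g = zeros(1, 2);
for i = 1:2
    dp = d0;  dp(i) = dp(i) + h;
    g(i) = (ratio(dp) - ratio(d0)) / h;          % finite-difference component of the gradient
end
S_R = g / norm(g);                               % normalized sensitivity direction, Eq. (7.42)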
Two case studies are employed in this section to demonstrate the proposed
approach to reliability analysis of problems with disjointed active failure regions.
The performance function of the mathematical example is defined as

G(X) = (X_1² + 4)(X_2 − 1)/20 − sin(2.5 X_1) − 2        (7.46)
The contour of the limit state function G(X) = 0 in Fig. 7.12 shows three disjointed
active failure regions, denoted in the figure by failure region 1, failure region 2, and
failure region 3, respectively. The probability of failure analysis of this mathe-
matical example is conducted with the direct MCS with a very large number of
sample points in order to come up with a benchmark probability of failure value so
that the accuracy of other reliability analysis methods, including the developed
SMASS approach, can be compared. With a sample size of 10⁸, direct MCS produces
an estimated probability of failure of 2.56 × 10⁻⁴.
The FORM is also implemented for the case study. As shown in Fig. 7.12, after
a total of 84 iterations, the MPP search converges to the MPP, [−2.895363,
2.911457], which gives an estimated reliability index of 3.7943. Accordingly, the
probability of failure can be estimated by the FORM as 7.4 × 10⁻⁵. For each
iteration of MPP search, there are three function evaluations for calculating the
performance function value and its first-order gradient information with respect to
the input variables X. Therefore, the FORM requires a total of 252 function
evaluations. It can be seen from Fig. 7.12 that the FORM method can only find one
MPP in one failure region while ignoring all other potential failure regions. This
results in substantial errors in the probability of failure estimation, as compared with
the MCS estimate.
Fig. 7.12 Contour of the limit state function G(X) = 0 and the MPP found by the FORM
state function, as shown by the black solid line in the figure, respectively. The
approximate limit state function is very accurate, especially at those critical failure
surfaces from multiple disjointed active failure regions. Figure 7.15 shows the
convergence history of the minimum CCV during the Kriging model updating
process. With a total number of 38 function evaluations, the developed SMASS
approach provides a probability of failure estimate of 2.55 × 10⁻⁴. From the relative
error comparison, the developed SMASS approach is able to provide the most
accurate probability of failure estimation, compared with the FORM, DRM, and the
simple Kriging method. In addition, the simple Kriging method and developed
SMASS approach are significantly more accurate than FORM and the DRM.
Furthermore, due to the novel adaptive sequential sampling mechanism, the developed SMASS approach is more accurate than simple Kriging, with a much smaller number of function evaluations. The quantitative results for the comparison of different probability of failure analysis methods employed in this mathematical example are summarized in Table 7.4.
Table 7.4 Reliability analysis results by various approaches for the mathematical example
Approach Probability of failure Error (%) Number of function evaluations
MCS 2.56 × 10⁻⁴ N/A 10⁸
FORM 7.4 × 10⁻⁵ 71.09 252
DRM 2.78 × 10⁻⁵ 89.24 5
Kriging 2.37 × 10⁻⁴ 7.42 150
SMASS 2.55 × 10⁻⁴ 0.39 38
In this case study, a vibration absorber problem [33] is employed and the proba-
bility of failure analysis is carried out using the developed SMASS approach in
comparison with FORM, DRM, and the simple Kriging method. A tuned damper
system that includes a main system and a vibrational absorber is shown in Fig. 7.16.
The main system is attached to the ground or a surface by a spring and a damper,
and the absorber is attached to the main system by a spring only. The main system
is subject to a harmonic force F(t) = cos(ωt). The purpose of the absorber is to
reduce or eliminate the vibrational amplitude of the main system. This type of
problem often occurs when there is a need to reduce or eliminate the seismic effect
on civil structures.
For the given vibration absorber system, as shown in Fig. 7.16, the normalized
amplitude y of the main system can be calculated as
y = √{ (1 − 1/β₂²)² / ( [1 − R/β₁² − 1/β₁² − 1/β₂² + 1/(β₁²β₂²)]² + 4ζ²[1/β₁ − 1/(β₁β₂²)]² ) }        (7.47)
where R is the ratio of the absorber’s mass to the main system’s mass, ζ is the
damping ratio of the main system, β₁ is the ratio of the natural frequency of the main
system to the harmonic force frequency, and β₂ is the ratio of the natural frequency
of the absorber to the harmonic force frequency. In this case study, R and ζ are set
as constants with R = 0.01 and ζ = 0.01, whereas β₁ and β₂ are considered to be
random variables that follow normal distributions, with β₁ ~ N(1, 0.025²) and
β₂ ~ N(1, 0.025²), respectively.
For this case study, a system failure is considered to occur when the normalized
amplitude y exceeds a critical value of 28; the limit state function is therefore defined as

G(β₁, β₂) = y(β₁, β₂) − 28        (7.48)

with failure corresponding to G > 0.
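Using the amplitude expression in Eq. (7.47) and the statistics stated above, the limit state and a direct MCS estimate of the failure probability can be sketched in MATLAB as follows; the sample size is an illustrative assumption (the benchmark in Table 7.5 uses 10⁸ samples), so the estimate will only be roughly comparable.

R = 0.01;  zeta = 0.01;                               % mass ratio and damping ratio
yamp = @(b1, b2) sqrt( (1 - 1./b2.^2).^2 ./ ...
    ( (1 - R./b1.^2 - 1./b1.^2 - 1./b2.^2 + 1./(b1.^2.*b2.^2)).^2 ...
      + 4*zeta^2*(1./b1 - 1./(b1.*b2.^2)).^2 ) );     % normalized amplitude, Eq. (7.47)
N  = 1e6;
b1 = 1 + 0.025*randn(N, 1);                           % beta1 ~ N(1, 0.025^2)
b2 = 1 + 0.025*randn(N, 1);                           % beta2 ~ N(1, 0.025^2)
G  = yamp(b1, b2) - 28;                               % limit state of Eq. (7.48): failure when G > 0
Pf = mean(G > 0);                                     % direct MCS failure probability estimate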
Fig. 7.18 Disjointed active failure regions by the limit state function and the MPP
Similar to the first case study, the Kriging surrogate modeling approach without
adaptive sequential sampling, referred to as simple Kriging, has also been employed
for the case study. In order to construct a relatively accurate Kriging surrogate
model, a total number of 200 training sample points have been used. Figure 7.20
shows an approximated limit state function using a Kriging model constructed with
200 training sample points, as compared with the true limit state function. In order
to avoid the randomness effect of the training sample points generated by the Latin
Hypercube Sampling (LHS), the Kriging surrogate model has been repeatedly
developed 100 times, and the estimated probability of failure on average is 1.163 × 10⁻²
based upon these 100 Kriging surrogate models.
Fig. 7.20 Predicted limit state function by simple Kriging constructed with 200 sample points
The developed SMASS approach is applied to the case study. In this case study,
it is implemented first with 15 initial LHS sample points, as denoted by the blue
circle points in Fig. 7.14. Meanwhile, a total of 10⁵ Monte Carlo sample points,
XMCS, are also generated in order to identify best sample points for the sequential
updating of the Kriging model. After 53 iterations of updating of the Kriging
model, the minimum CCV value for all the XMCS sample points satisfies the target
classification confidence value, CCVt, which has been set as 0.95 in this case study.
The 53 sequentially sampled points for the updating of the Kriging model are
shown in Fig. 7.21 with stars. From the identified sequential sampling points,
clearly the developed SMASS approach is able to locate most of the sample points
to the disjointed active failure regions, in order to enhance the fidelity of the
Kriging model. With a total of 103 evaluations of the limit state function, G(X), the
approximated limit state function generated by the developed Kriging model, as
shown by the red dash line in the figure, has a very good match with the true limit
state function, as shown by the black solid line in the figure, respectively. The
approximate limit state function is very accurate, especially at those critical failure
surfaces from multiple disjointed active failure regions. Figure 7.22 shows the
convergence history of the minimum CCV during the Kriging model updating
process. With a total number of 103 function evaluations, the developed SMASS
approach provides a probability of failure estimate of 1.033 × 10⁻². From the
relative error comparison, the developed SMASS approach is able to provide the
most accurate probability of failure estimation, compared with the FORM, DRM,
and the simple Kriging method.
Fig. 7.22 History of the minimum CCV during the iterative Kriging model updating process
Due to the novel adaptive sequential sampling mechanism, the developed SMASS approach is more accurate than the simple
Kriging with a smaller number of function evaluations. The quantitative results for
the comparison of different probability of failure analysis methods employed in this
mathematical example are summarized in Table 7.5.
Table 7.5 Reliability analysis results by various approaches for the vibration absorber example
Approach Probability of failure Error (%) Number of function evaluations
MCS 1.024 × 10⁻² N/A 1 × 10⁸
FORM 6.87 × 10⁻³ 32.92 99
DRM 0 100.00 5
Simple Kriging 1.163 × 10⁻² 13.57 200
SMASS 1.033 × 10⁻² 0.88 103
7.6 Exercises
7:1 Consider a mathematical problem for design optimization. The problem has
two design variables X = [X1, X2]T and involves three performance functions
(the same constraints used in the 99-line RBDO code in the Appendix) that are defined as follows:

G1(X) = 1 − X1² X2 / 20
G2(X) = 1 − (X1 + X2 − 5)²/30 − (X1 − X2 − 12)²/120
G3(X) = 1 − 80/(X1² + 8X2 + 5)

The deterministic design optimization problem is formulated as

Minimize   f(d) = d1 + d2
Subject to G_j(d) ≤ 0,   j = 1, 2, 3        (7.49)
           1 ≤ d1 ≤ 10,   1 ≤ d2 ≤ 10
The RBDO problem with a target reliability of 99.87% (i.e., a target reliability
index of bt = 3.0) is formulated as follows
Minimize   f(d) = d1 + d2
Subject to Pr[G_j(X; d) ≤ 0] ≥ Φ(β_t),   j = 1, 2, 3        (7.50)
           0 ≤ d1 ≤ 10,   0 ≤ d2 ≤ 10
7:2 Solve the RBDO problem in Problem 7.1 using RIA with the HL-RF method
(i.e., nm = 2 in the 99-line RBDO code). In this exercise, assume that X2 is a
non-normally distributed random variable and follows a lognormal distribu-
tion with the same mean and standard deviation as defined in the code. (Hint:
Refer to the transformation between the X-space and the U-space for a log-
normal distribution specified in Table 5.1, and consider adding the following
lines of code for the 1st constraint).
①: DXDU(1) = stdx(1);
sigmaL = sqrt(log(1+(stdx(2)/x(2))^2));
muL = log(x(2))-0.5*sigmaL^2;
DXDU(2) = exp(muL + sigmaL*u(2))*sigmaL;
dbeta = u./(beta*DXDU);
②: x(1) = u(1).*stdx(1)+d(1);
sigmaL = sqrt(log(1+(stdx(2)/d(2))^2));
muL = log(d(2))-0.5*sigmaL^2;
x(2) = exp(muL + sigmaL*u(2));
③: DXDU = x(2)*sigmaL;
GCeq(2) = -x(1)^2/20*DXDU;
7:3 Consider a vehicle side-impact problem for design optimization. The opti-
mization task is to minimize the vehicle weight while meeting the side impact
top safety-rating criteria shown in Table 7.6 [34]. There are nine design
parameters used in the design optimization of vehicle side impact. The design
variables are the thickness (X1–X7) and material properties (X8, X9) of critical
parts, as shown in Table 7.7. The two (non-design) random parameters are
barrier height and hitting position (X10, X11), which can vary from −30 to
30 mm according to the physical test.
Table 7.7 Statistical properties of random design and parameter variables (X10 and X11 both have
0 means) for Problem 7.3
Random variable Distribution type Std. dev. dL d dU
X1 (mm) Normal 0.050 0.500 1.000 1.500
X2 (mm) Normal 0.050 0.500 1.000 1.500
X3 (mm) Normal 0.050 0.500 1.000 1.500
X4 (mm) Normal 0.050 0.500 1.000 1.500
X5 (mm) Normal 0.050 0.500 1.000 1.500
X6 (mm) Normal 0.050 0.500 1.000 1.500
X7 (mm) Normal 0.050 0.500 1.000 1.500
X8 (GPa) Lognormal 0.006 0.192 0.300 0.345
X9 (GPa) Lognormal 0.006 0.192 0.300 0.345
X10 (mm) Normal 10.0 X10 and X11 are not design
X11 (mm) Normal 10.0 variables
The response surfaces for ten performance measures are constructed from a
vehicle side-impact model as {Gj ≤ gj, j = 1, 2, …, 10}, where the perfor-
mance limits gj form a vector g = [1, 32, 32, 32, 0.32, 0.32, 0.32, 4, 9.9,
15.7]T. The response surfaces of the vehicle weight f and the performance
measures Gj are defined in [34]. The deterministic design optimization problem is formulated as
Minimize   f(d)
Subject to G_j(d) ≤ g_j,   j = 1, …, 10        (7.51)
           d_i^L ≤ d_i ≤ d_i^U,   i = 1, …, 9

The corresponding RBDO problem is formulated as

Minimize   f(d)
Subject to Pr[G_j(X; d) ≤ g_j] ≥ 99.87%,   j = 1, …, 10        (7.52)
           d_i^L ≤ d_i ≤ d_i^U,   i = 1, …, 9
(1) Solve the deterministic design optimization problem in Eq. (7.51) using
the ‘fmincon’ function in MATLAB. Start the design optimization from
the initial design (d1–d7 = 1.000, d8 = d9 = 0.300).
(2) Solve the RBDO problem in Eq. (7.52) using PMA with the AMV method
by modifying the MATLAB code in the Appendix. Start the design
optimization from both the initial design (d1–d7 = 1.000, d8 = d9 = 0.300)
and deterministic optimum design obtained from (1).
%%%%%%%%%% A 99 LINE RBDO CODE WRITTEN BY WANG P.F. & YOUN B.D. %%%%%%%%
function RBDO()
clear all; close all; clc;
global nc nd nm bt stdx Iters Cost
nm=2; nc=3; nd=2; bt=norminv(0.99,0,1);
x0=[5,5]; stdx=[0.3,0.3]; lb=[0,0]; ub=[10,10];
xp=x0; Iters=0;
options = optimset('GradConstr','on','GradObj','on','LargeScale','off');
[x,fval]=fmincon(@Costfun,x0,[],[],[],[],lb,ub,@frelcon,options)
%==================== Obj. Function ==============================%
function [f,g]= Costfun(x)
f=x(1)+x(2);
g=[1 1];
Cost=f;
end
if iter ==1
sign = -ceq/abs(ceq);
elseif iter>1
Dif=abs(U(iter-1,:)*U(iter,:)' - 1);
end
end
beta = sign*norm(u);
dbeta = -u./(beta*stdx);
end
%========================== Constraint Fun. ==========================%
function [ceq,GCeq]=cons(u,d,kc)
x = u.*stdx+d;
if kc == 1
ceq=1-x(1)^2*x(2)/20;
GCeq(1)=-x(1)*x(2)/10*stdx(1);
GCeq(2)=-x(1)^2/20*stdx(2);
elseif kc == 2
ceq=1-(x(1)+x(2)-5)^2/30-(x(1)-x(2)-12)^2/120;
GCeq(1)=(-(x(1)+x(2)-5)/15-(x(1)-x(2)-12)/60)*stdx(1);
GCeq(2)=(-(x(1)+x(2)-5)/15+(x(1)-x(2)-12)/60)*stdx(2);
elseif kc == 3
ceq=1-80/(x(1)^2+8*x(2)+5);
GCeq(1)=x(1)*160*stdx(1)/((x(1)^2+8*x(2)+5))^2;
GCeq(2)=80*8*stdx(2)/((x(1)^2+8*x(2)+5))^2;
end
end
function SHOW(Iters,x,c,GC)%==== Display the Iteration Information====%
fprintf(1,'\n********** Iter.%d ***********\n',Iters);
disp(['Des.: ' sprintf('%6.4f ',x)]);
disp(['Obj.: ' sprintf('%6.4f',Cost)]);
if nm==1
disp(['Cons.: ' sprintf('%6.4f ',c)]);
elseif nm==2
disp(['Index.: ' sprintf('%6.4f ',bt-c)]);
end
disp(['Sens.: ' sprintf('%6.4f ',GC)]);
fprintf('\n\n')
end
end
References
1. Tu, J., Choi, K. K., & Park, Y. H. (1999). A new study on reliability-based design
optimization. Journal of Mechanical Design, Transactions of the ASME, 121(4), 557–564.
2. Youn, B. D., Choi, K. K., & Du, L. (2004). Enriched Performance Measure Approach (PMA+)
and its numerical method for reliability-based design optimization. AIAA Journal, 43,
874–884.
3. Youn, B. D., Choi, K. K., & Park, Y. H. (2003). Hybrid analysis method for reliability-based
design optimization. Journal of Mechanical Design, 125(2), 221–232.
4. Chiralaksanakul, A., & Mahadevan, S. (2005). First-order approximation methods in
reliability-based design optimization. Journal of Mechanical Design, 127(5), 851–857.
5. Noh, Y., Choi, K., & Du, L. (2008). Reliability-based design optimization of problems with
correlated input variables using a Gaussian Copula. Structural and Multidisciplinary
Optimization, 38(1), 1–16.
6. Du, X. P., & Chen, W. (2004). Sequential optimization and reliability assessment method for
efficient probabilistic design. Journal of Mechanical Design, 126(2), 225–233.
7. Liang, J. H., Mourelatos, Z. P., & Nikolaidis, E. (2007). A single-loop approach for system
reliability-based design optimization. Journal of Mechanical Design, 129(12), 1215–1224.
8. Nguyen, T. H., Song, J., & Paulino, G. H. (2010). Single-loop system reliability-based design
optimization using matrix-based system reliability method: Theory and applications. ASME
Journal of Mechanical Design, 132, 011005-1–11.
9. Thanedar, P. B., & Kodiyalam, S. (1992). Structural optimization using probabilistic
constraints. Journal of Structural Optimization, 4, 236–240.
10. Chen, X., Hasselman, T. K., & Neill, D. J. (1997, April). Reliability-based structural design
optimization for practical applications. AIAA Paper, 97-1403.
11. Wang, L., & Kodiyalam, S. (2002). An efficient method for probabilistic and robust design
with non-normal distributions. In 43rd AIAA Structures, Structural Dynamics, and Materials
Conference, April 2002.
12. Rahman, S. (2009). Stochastic sensitivity analysis by dimensional decomposition and score
functions. Probabilistic Engineering Mechanics, 24(3), 278–287.
13. Yu, X., Chang, K. H., & Choi, K. K. (1998). Probabilistic structural durability prediction.
AIAA Journal, 36(4), 628–637.
14. Putko, M. M., Newman, P. A., Taylor, A. C., III, & Green, L. L. (2002). Approach for
uncertainty propagation and robust design in CFD using sensitivity derivatives. Journal of
Fluids Engineering, 124(1), 60–69.
15. Koch, P. N., Yang, R.-J., & Gu, L. (2004). Design for six sigma through robust optimization.
Structural and Multidisciplinary Optimization, 26(3–4), 235–248.
16. Wu, Y. T., Millwater, H. R., & Cruse, T. A. (1990). Advanced probabilistic structural analysis
method for implicit performance functions. AIAA Journal, 28(9), 1663–1669.
17. Youn, B. D., & Choi, K. K. (2004). An investigation of nonlinearity of reliability-based
design optimization approaches. Journal of Mechanical Design, 126(3), 403–411.
18. Youn, B. D., & Choi, K. K. (2004). Selecting probabilistic approaches for reliability-based
design optimization. AIAA Journal, 42(1), 124–131.
19. Youn, B. D., Xi, Z., & Wang, P. (2008). Eigenvector Dimension-Reduction (EDR) method
for sensitivity-free uncertainty quantification. Structural and Multidisciplinary Optimization,
37(1), 13–28.
20. Shan, S., & Wang, G. G. (2008). Reliable design space and complete single-loop
reliability-based design optimization. Reliability Engineering & System Safety, 93(8),
1218–1230.
21. Zou, T., & Mahadevan, S. (2006). A direct decoupling approach for efficient reliability-based
design optimization. Structural and Multidisciplinary Optimization, 31(3), 190–200.
22. Nguyen, T. H., Song, J., & Paulino, G. H. (2011). Single-loop system reliability-based
topology optimization considering statistical dependence between limit-states. Structural and
Multidisciplinary Optimization, 44(5), 593–611.
23. Rubinstein, R. Y., & Kroese, D. P. (2011). Simulation and the Monte Carlo method (Vol.
707). New York: Wiley.
24. Stein, M. (1987). Large sample properties of simulations using latin hypercube sampling.
Technometrics, 29(2), 143–151.
25. Goel, T., Haftka, R. T., & Shyy, W. (2008). Error measures for noise-free surrogate
approximations. AIAA Paper, 2008-901.
26. Dey, A., & Mahadevan, S. (1998). Ductile structural system reliability analysis using
importance sampling. Structure Safety, 20(2), 137–154.
27. Martino, L., Elvira, V., Luengo, D., & Corander, J. (2015). An adaptive population
importance sampler: Learning from uncertainty. IEEE Transactions on Signal Processing, 63
(16), 4422–4437.
28. Beachkofski, B., & Grandhi, R. (2002). Improved distributed hypercube sampling. In 43rd
AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Conference.
April, 2002.
29. Leary, S., Bhaskar, A., & Keane, A. (2003). Optimal Orthogonal-array-based latin
hypercubes. Journal of Applied Statistics, 30(5), 585–598.
30. Joseph, V. R., & Hung, Y. (2008). Orthogonal-maximin latin hypercube designs. Statistica
Sinica, 18, 171–186.
31. Deutsch, J. L., & Deutsch, C. V. (2012). Latin hypercube sampling with multidimensional
uniformity. Journal of Statistical Planning and Inference, 142, 763–772.
32. Wang, Z., & Wang, P. (2014). A maximum confidence enhancement based sequential
sampling scheme for simulation-based design. Journal of Mechanical Design, 136(2),
021006.
33. Kuczera, R. C., & Mourelatos, Z. P. (2009). On estimating the reliability of multiple failure
region problems using approximate metamodels. Journal of Mechanical Design, 131(12),
121003–121013.
34. Youn, B. D., Choi, K. K., Gu, L., & Yang, R.-J. (2004). Reliability-based design optimization
for crashworthiness of side impact. Journal of Structural and Multidisciplinary Optimization,
26(3–4), 272–283.
Chapter 8
Time-Dependent Reliability Analysis
in Operation: Prognostics and Health
Management
Over the past few decades, rapid adoption of sensing, computing, and communi-
cations technologies has created one of the key capabilities of modern engineered
systems: the ability—at a low cost—to gather, store, and process large volumes of
sensor data from an engineered system during operation. These sensor data may
contain rich information about a system’s behavior under both healthy and
degraded conditions. A critical question now is how to leverage the new sensor
information, which may be continuously or periodically collected, to assess the
current health condition (health reasoning) and predict imminent failures (health
prognostics) of the operating system over its life cycle. This health information can
provide a timely warning about potential failures and potentially open a window of
opportunity for implementing measures to avert these failures. This chapter presents
techniques and approaches that enable (i) design of sensor networks (SNs) for
health reasoning, (ii) extraction of health-relevant information from sensor signals
and assessment of a system’s health condition, and (iii) prediction of a system’s
remaining useful life (RUL).
Fig. 8.1 Life PDFs versus lifetime (hrs): unit-independent life PDFs in population-wise reliability analysis (ALT) versus unit-dependent life PDFs in unit-wise reliability analysis (PHM), illustrated for four units
potential risk. In other words, we may need to perform unit-wise reliability analysis
that estimates the time-dependent reliability of a particular unit. Figure 8.1 provides
a graphical illustration of the difference between population- and unit-wise relia-
bility analyses.
Recent decades have seen a growing interest in moving from traditional non-
destructive testing (NDT) to nondestructive evaluation (NDE) and structural health
monitoring (SHM), and towards automated data analytics for prognostics and health
management (PHM) [1], as shown in Fig. 8.2. Among these major enabling
technologies for unit-wise, time-dependent reliability analysis, PHM has recently
emerged as a key technology that uses data analytics to assess the current health
condition of an engineered system (health reasoning) and predict when and how the
system is likely to fail (health prognostics) throughout the system’s lifetime. The
need for PHM is also being driven by an increased demand for condition-based
maintenance and life extension of high-value engineered systems like bridges and
energy infrastructure (e.g., nuclear power plants, wind turbines, and pipelines).
In general, PHM consists of four basic functions: health sensing, health rea-
soning, health prognostics, and health management (see Fig. 8.3). A brief
description of each function is given here:
Fig. 8.2 Evolution of key enabling technologies for unit-wise, time-dependent reliability analysis
As mentioned in Sect. 8.1, the health sensing function of PHM aims at acquiring
sensor signals from an engineered system through in-situ monitoring techniques
and ensuring a high likelihood of damage detection by designing an optimal SN.
The effectiveness of PHM in failure prevention and reliability improvement relies
greatly on the usefulness and completeness of health-relevant information conveyed
by the sensor signals. These measurable physical quantities can be classified into
two major categories: environmental signals (e.g., temperature, pressure, and
humidity) and operating signals (e.g., voltage, current, vibration, and power). In
order to identify an appropriate set of sensing quantities, we can first conduct
failure modes and effects analysis (FMEA) to determine critical failure modes and
their effects, and then identify measurable quantities that may be affected by these
modes and/or effects. Potential failure modes and the corresponding sensing
quantities of several engineered systems are shown in Fig. 8.5.
Identifying appropriate sensing quantities (or selecting appropriate sensor
types) is one important aspect of the health sensing function. In a broader sense,
there may be interest in designing an optimal SN with high detectability, while
accounting for various sources of uncertainty (e.g., material properties, geometric
tolerances and loading conditions). Sections 8.2.1–8.2.3 present a
Fig. 8.5 Potential failures and sensing quantities of selected engineered systems
By definition, the ith diagonal term in the PoD matrix represents the conditional
probability of correct detection for the ith health state; this diagonal term defines the
detectability Di of the ith system health state HSi. With the detectability defined, the
SN design optimization problem can be formulated as

Minimize    C
subject to  D_i(X_T, X_N, X_{Loc}, X_s) \ge D_i^t,   i = 1, 2, \ldots, N_{HS}        (8.3)
where C is the cost involved (calculated as the product of the number of sensors and
the sum of the sensor material and installation costs), XT is a vector of the binary
decision variables for selection of the types of sensing devices, XN is a vector
consisting of the number of each selected type of sensing devices, XLoc is a 3-D
vector of the location of each sensing device, Xs is a vector of the sensing control
parameters, NHS is the total number of predefined health states for the engineered
system, Di is the detectability of the SN for the ith predefined health state, which is
a function of the design variables XT, XN, XLoc, and Xs, and Dti is the target SN
detectability for the ith predefined health state. Note that the formulation of the SN
design optimization problem bears a resemblance to that of the reliability-based
design optimization problem [6, 7] with the exception that the former uses the
detectability as the constraint and the latter uses the reliability as the constraint.
The SN design optimization problem in Eq. (8.3) contains discrete decision
variables for the selection of sensing devices, integer variables for the number of
selected sensing devices, as well as continuous variables for the sensor locations.
Thus, the optimization problem is formulated as a mixed-integer nonlinear pro-
gramming (MINLP) problem [8], and heuristic algorithms, such as genetic algo-
rithms (GAs), can be used as the optimizer for the optimization process. In this
textbook, the GA is employed for the example problem that will be detailed in the
subsequent section. More alternative algorithms for solving the MINLP problem
can be found in references [8, 9].
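To make the structure of Eq. (8.3) more concrete, the short MATLAB sketch below scores candidate designs with a penalized fitness of the kind a mixed-integer GA would minimize; here only the sensor count is scanned, and the detectability surrogate, cost coefficient, and target values are illustrative assumptions rather than the simulation-based detectability analysis used in this chapter.

% Minimal sketch (illustrative data): penalized fitness for the SN design MINLP of Eq. (8.3).
nHS      = 9;                           % number of predefined health states
targetD  = 0.95 * ones(1, nHS);         % target detectability for each health state
unitCost = 1;                           % material + installation cost per sensor

% Hypothetical detectability surrogate: more sensors -> higher detectability
detectability = @(nSensors) 1 - exp(-0.35 * nSensors) * ones(1, nHS);

% Penalized fitness: a GA (or any mixed-integer optimizer) would minimize this scalar
fitness = @(nSensors) nSensors * unitCost ...
          + 1e6 * sum(max(targetD - detectability(nSensors), 0));

% Quick scan over candidate sensor counts (a GA would also search types and locations)
for n = 1:15
    fprintf('n = %2d  fitness = %.2f\n', n, fitness(n));
end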
The flowchart of the SN design optimization process is shown in Fig. 8.6 [4, 5]. As
shown in the figure, the process starts from an initial SN design and goes into the
design optimization subroutine (the grey box on the right-hand side), which will
carry out the SN cost analysis, call the performance analysis subroutine (the grey
box on the left-hand side) to evaluate the performance of the SN in its current
design, and execute the optimizer to generate a new SN design if the optimality
condition is not met. In the performance analysis subroutine, the detectability
analysis (as discussed in the previous section) will be carried out. Before solving the
optimization problem, valid system simulation models have to be built and com-
puter simulations have to be accomplished so that the training and testing data sets
for each predefined health state are available.
It is interesting to note that this design optimization procedure bears a striking
resemblance to the RBDO procedure. The two key elements in the SN design, the
detectability analysis and the design optimization, can be equivalently mapped to
the two key elements in the RBDO, the reliability analysis and design optimization.
This observation can be generalized to other design optimization problems: many of
them share the same basic structure, in which the optimization routine changes the
design based on cost (e.g., product volume, number of sensors) and constraint (e.g.,
reliability, detectability) analyses. The two key ele-
ments in SN design, the detectability analysis and the design optimization, will be
discussed in detail in subsequent sections.
Fig. 8.6 Flowchart of the SN design optimization process: the performance analysis subroutine (structural analysis, simulated probabilistic performance) supports the design optimization subroutine (optimizer, optimality check, new SN design, final design)
LCCs. Due to the difficulties associated with direct sensing inside trans-
formers, the data that are most often used for both diagnosis and prognosis of
transformers are obtained through indirect measurements. This example aims
to design an optimum SN on the front wall surface of a power transformer.
The measurements of the transformer’s vibration responses induced by the
magnetic field loading enable the detection of mechanical failures of the
winding support joints inside the transformer.
Description of the Example
In this example, a loosening of the winding support joint is considered to be
the failure mode. Detection of the failure will be realized by collecting the
vibration signal, induced by magnetic field loading with a fixed frequency on
the power transformer core, using an optimally designed SN on the external
surface of the transformer. The validated finite element (FE) model of a
power transformer was created in ANSYS 10, as shown in Fig. 8.7, where
one exterior wall is removed to make the interior structure visible. Figure 8.8
shows 12 simplified winding support joints, 4 for each winding. The trans-
former is fixed at the bottom surface and a vibration load with a frequency of
120 Hz is applied to the transformer core. The joint loosening was realized by
Fig. 8.7 An FE model of a power transformer (without the covering wall). Reprinted
(adapted) with permission from Ref. [4]
Fig. 8.8 Winding support joints and their numbering. Reprinted (adapted) with permission
from Ref. [4]
reducing the stiffness of the joint itself. Different combinations of the loos-
ening joints will be treated as different health states of the power transformer;
these will be detailed in the next subsection.
The uncertainties in this example are modeled as random parameters with
corresponding statistical distributions, as listed in Table 8.3. These uncer-
tainties include the material properties (e.g., Young’s modulus, densities, and
Poisson ratios) for support joints and windings, as well as other parts in the
power transformer system. In addition, geometric parameters are also con-
sidered as random variables. These uncertainties will be propagated into the
structural vibration responses and will be accounted for when designing the
optimum SN.
Health States
For the purpose of demonstrating the proposed SN design methodology, 9
representative health states (see Table 8.4) were selected from all possible
Table 8.3 Random properties of the power transformer. Reprinted (adapted) with permission from Ref. [4]

Random variable   Physical meaning                        Randomness (cm, g, degree)
X1                Wall thickness                          N(3, 0.06²)
X2                Angular width of support joints         N(15, 0.3²)
X3                Height of support joints                N(6, 0.12²)
X4                Young's modulus of support joint        N(2e12, (4e10)²)
X5                Young's modulus of loosening joints     N(2e10, (4e8)²)
X6                Young's modulus of winding              N(1.28e12, (3e10)²)
X7                Poisson ratio of joints                 N(0.27, 0.0054²)
X8                Poisson ratio of winding                N(0.34, 0.0068²)
X9                Density of joints                       N(7.85, 0.157²)
X10               Density of windings                     N(8.96, 0.179²)
Fig. 8.9 Stress contour of the winding supports for a healthy-state power transformer.
Reprinted (adapted) with permission from Ref. [4]
Fig. 8.10 Detectability with an optimum design and detectability with different numbers of
sensors. Reprinted (adapted) with permission from Ref. [4]
The vibration amplitude of each node on the surface of the covering wall was
used as the simulated sensor (accelerometer) output. Thus, the design vari-
ables in this example include: (i) the total number of accelerometers, (ii) the
location of each accelerometer, and (iii) the direction (X or Z) of each
accelerometer.
Results and Discussion
The SN design problem in this example was solved using the genetic algo-
rithm. Figure 8.10 shows the detectability for each of the 9 health states at the
optimum SN design, and the detectability for each different number of total
sensors. Using a target detectability of 0.95, we obtained the optimum SN
design on the outer wall surface (140 cm × 90 cm) with a total of 9 sensors.
The results of this example suggest that the proposed SN design framework is
capable of solving SN design problems for complicated engineered systems
with multiple system health states and a wide variety of system input
uncertainties.
The effectiveness of PHM for failure prevention and reliability improvement relies
significantly on the usefulness and completeness of health-relevant information
conveyed by the sensor signals. Advances in wireless communications and
low-power electronics have allowed the deployment of wireless sensor networks
(WSNs) for PHM. However, because the powering of wireless sensors still relies on
chemical batteries, the limited lifespan of chemical batteries makes it difficult to use
wireless sensors, especially when replacement is needed in inaccessible or remote
locations. Furthermore, according to the U.S. Department of Energy [10], estimated
battery replacement costs of $80–$500 (including labor) exceed the price of the
sensor. This battery issue that affects wireless sensors used in health sensing is
prompting research interest in developing a self-powered solution.
Energy harvesting has received much attention as an alternative solution to
possibly eliminate the replacement cost of the chemical batteries in wireless sen-
sors. Energy harvesting technology converts ambient, otherwise wasted, energy
sources into electric power that can be used for operating wireless sensors.
Figure 8.11 shows the concept of energy harvesting for self-powered wireless
sensors.
Energy Conversion (Transduction) Mechanisms
Examples of ambient, otherwise wasted, energy sources include light, fluid flow,
temperature difference, and vibration. A solar cell can convert the energy from light
directly into electric power. This transduction mechanism is called the photovoltaic
effect. While solar cells can produce relatively high power density, their use is
limited in dim light conditions and they are unsuitable where light is not
accessible.
As shown in Fig. 8.11, thermal energy, such as temperature difference, can be
converted into electric power using the thermoelectric transduction mechanism. The
thermoelectric effect was discovered by the Baltic German physicist, Thomas
Johann Seebeck. Thermoelectricity refers to the direct conversion of temperature
differences to electric voltage, and vice versa. The output voltage generated by the
thermoelectric effect is proportional to the temperature difference between the
junctions of dissimilar conductors.
Likewise, vibration energy, one widely available ambient energy source, can be
converted into electric power using piezoelectric, electromagnetic, electrostatic,
and/or magnetostrictive transduction mechanisms. Among vibration-based energy
harvesting technologies, piezoelectric energy harvesting has been preferred due to
its high energy density and easy installation. In 1880, the French physicists Jacques
and Pierre Curie first discovered piezoelectricity. The prefix 'piezo' comes from
the Greek 'piezein', which means to press or squeeze. In this method, as shown
in the second column in Fig. 8.11, a piezoelectric material produces electric
polarization in response to mechanical strain. This transduction mechanism is called
the direct piezoelectric effect. The amount of output voltage generated by the
piezoelectric material is proportional to the mechanical strain.
Among the aforementioned energy conversion mechanisms, piezoelectric energy
harvesting could be a very attractive solution for powering wireless sensors,
because engineered systems usually induce vibrations in operation. To help the
readers better understand how to realize self-powered wireless sensors for PHM, the
following subsection provides a brief overview of the key issues in piezoelectric
energy harvesting.
Key Issues in Piezoelectric Energy Harvesting
Research in piezoelectric energy harvesting can be broken into four key issues,
specifically: (i) development of materials, (ii) modeling and analysis, (iii) mechan-
ics-based design, and (iv) circuit design. To successfully realize self-powered
wireless sensors using piezoelectric energy harvesting, it is necessary to thoroughly
understand the key issues and make connections between them.
• Development of materials: Piezoelectric materials include lead zirconate titanate
(PZT), zinc oxide (ZnO), polyvinylidene difluoride (PVDF), lead magnesium
niobate-lead titanate (PMN-PT), and polypropylene (PP) polymer. Material issues
are central to improving the mechanical and electrical properties of piezoelectric
materials. For example,
piezoelectric ceramics, such as PZT, have a high piezoelectric and dielectric
constant, but are inherently brittle and less durable. Meanwhile, piezoelectric
polymers, such as PVDF, have high flexibility but low electromechanical
coupling. For this reason, many material scientists have been devoted to
developing flexible as well as electromechanically efficient piezoelectric mate-
rials based on nanotechnology.
• Modeling and analysis: Prior to designing the piezoelectric energy harvester and
selecting the best sites for installation, it is essential to make a preliminary
estimate of the output power based on the vibration data acquired from the
engineered system. This issue drives current research interest in developing an
electromechanically-coupled model with high predictive capability, based on
rigorous theories and mechanics. Many research efforts have been made to
advance an analytical model (e.g., lumped-parameter, Rayleigh-Ritz method,
and distributed-parameter) that can describe the physics of the electromechan-
ical behavior of the piezoelectric energy harvester. Since the commercialized
Fig. 8.12 Piezoelectric energy harvesting skin for powering wireless sensors
(ID: 48, 52, and 40) were connected to the other platform (AmbioMote24-B).
An oscilloscope (LT354 M from LeCroy) was used to measure the output
voltage in real time. While the outdoor condensing unit is in operation, an
output voltage of 4–5 V (peak-to-peak) was measured and all five wireless
sensor signals were successfully transmitted to the laptop computers in
real time.
The primary tasks of the health reasoning function are extraction of health-relevant
system information (or features) from raw sensor signals and early detection of
faults based on the extracted health-relevant information. These tasks can be
accomplished by (i) continuously or periodically monitoring the operation of an
engineered system, (ii) detecting and diagnosing abnormal conditions or faults of
the system using feature extraction and health classification techniques, and
(iii) assessing the significance of the detected faults. The procedure for executing
the health reasoning function is shown in Fig. 8.15. The process involves the
following key steps:
• Signal preprocessing: The aims of signal preprocessing are to isolate specific
signals of interest, filter out noise and outliers, and/or achieve a normalized scale.
Signals that are preprocessed are expected to achieve better accuracy in fault
detection and classification than those that are not preprocessed. The applicability
of one specific signal preprocessing technique depends on the kinds of signals
that need to be preprocessed, how noisy they are, how many outliers they contain,
and what techniques will be used in the subsequent processing steps. Some of the
most important techniques include time synchronous averaging, resampling,
signal filtering, and various averaging techniques.
• Feature extraction: This step extracts health-relevant features from raw or pre-
processed sensor measurements acquired from continuous or periodic sensing of
an engineered system. Inherent in this feature extraction step is the condensing
of raw sensor data. Commonly used feature extraction techniques include time
domain analysis (e.g., statistical moment calculation), frequency domain anal-
ysis (e.g., FFT), and time-frequency domain analysis (e.g., wavelet transform).
• Feature selection: Feature selection aims at selecting an optimum subset of
features that minimize redundancy and focus on features of maximum relevance
to the health states of the system. Both non-adaptive and adaptive approaches
can be used to select features that are capable of discriminating measurements
that belong to different health states.
• Fault detection and classification (health diagnostics): This step involves (i) fault
detection that determines whether some type of fault has occurred, and (ii) fault
classification that identifies to which of a set of health states (defined based on
fault type and location) a new measurement belongs. An additional process is
often needed to quantify the severity of a detected fault (e.g., the size of a crack
on a plate, the loss of power from a battery) in the form of a normalized health
measure, or health index. This additional process yields a quantitative measure
of the fault and is particularly useful for health prognostics.
Together, fault detection and classification are commonly called health diagnostics.
In most engineered systems equipped with the capability of health diagnostics, fault
detection runs continuously, while fault classification is triggered only upon the
detection of a fault. In other systems, fault detection and classification may run in
parallel and be performed simultaneously.
Prior to feature extraction, raw sensor signals are often preprocessed to isolate
specific signals of interest, to remove noise and outliers, and/or to achieve a nor-
malized scale. Signal preprocessing is essential to ensuring good accuracy in fault
detection and classification. Several important techniques for signal preprocessing
are discussed in the following sections.
8.3.1.1 Resampling
Resampling is a signal preprocessing technique that changes the sampling rate of the
raw sensor signals. Since most signals are acquired at a pre-determined sampling
rate, they may require resampling to suit the characteristics of the signal being
analyzed. Reducing the sampling rate by an integer factor is called downsampling;
whereas increasing the sampling rate by an integer factor is called upsampling. Note
that downsampling should be preceded by an anti-aliasing (low-pass) filter because
the sampling rate reduction would otherwise alias high-frequency components into
lower frequencies. Combining downsampling and upsampling, sampling rate
conversion by a noninteger factor can be achieved. For example, the rotating speed of
a rotor system fluctuates while the sampling rate is fixed. Thus, the raw vibration
signals have a different number of points per rotation of the rotor, which increases
uncertainty in the analysis procedure. This uncertainty can be reduced by resampling
the signals into a fixed number of points per cycle with respect to the tachometer
signals.
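The MATLAB sketch below illustrates this kind of tachometer-referenced (angular) resampling on a simulated signal; the sampling rate, speed profile, and choice of 256 points per revolution are illustrative assumptions.

% Minimal sketch: resample a vibration signal to a fixed number of points per
% revolution using the shaft angle derived from a fluctuating speed profile.
fs   = 20e3;                           % sampling rate (Hz)
t    = (0:1/fs:2)';                    % 2 s of data
f0   = 20 + 2*sin(2*pi*0.5*t);         % fluctuating shaft speed (rev/s)
phi  = cumtrapz(t, f0);                % shaft angle in revolutions
x    = sin(2*pi*32*phi) + 0.1*randn(size(t));   % gear-mesh-like vibration + noise

nppr = 256;                            % desired points per revolution
nrev = floor(phi(end));                % complete revolutions available
phiU = (0:1/nppr:nrev - 1/nppr)';      % uniform angular grid
xAng = interp1(phi, x, phiU, 'linear');   % angle-domain (order-tracked) signal

% xAng now has exactly nppr samples per revolution and can be segmented and
% ensemble-averaged (TSA) revolution by revolution.
xTSA = mean(reshape(xAng, nppr, nrev), 2);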
Fig. 8.16 Procedure for conventional TSA. Reprinted (adapted) with permission from Ref. [14]
Fig. 8.17 Vibration modulation due to the revolution of the planet gears of a planetary gearbox
interest. On the other hand, the noise term converges to zero as a considerable
number of segments accumulates, as shown in the lower right of Fig. 8.16.
TSA needs to be revised for use in more complex gearboxes, such as planetary
gearboxes in which planet gears revolve around a sun gear (see Fig. 8.17). Because a
typical sensor is fixed at a gearbox housing, the measured vibration signal is mod-
ulated as a function of the distance from the fixed sensor to the revolving planet gears.
For such systems, TSA has been developed with a window function [14]. The
window function extracts the vibration signals as the planet gears approach the
sensor so that the extracted signals can be used for ensemble averaging, as shown in
Fig. 8.16. This process enables the vibration signals that are not of interest to be
ignored, while increasing the signal-to-noise (S/N) ratio. Typical mathematically
defined window functions, such as Tukey window and Hann window, have a
bell-shape to highlight the instances in which the planet gears pass the sensor and
reduce the signal amplitude when the planet gears are located far from the sensor.
Extracted vibration signals from the bell-shaped window function would have high
similarity and could serve as good sources for the ensemble averaging process
shown in Fig. 8.16. Autocorrelation-based TSA (ATSA) was developed [14] as a
more physics-oriented approach. The autocorrelation function, which is a measure
of similarity, increases as the planet gears approach the sensor and decreases as the
planet gear recedes from the sensor. In ATSA, the autocorrelation function is used
to quantify the similarity of the measured signal as a function of the distance from
the fixed sensor to the revolving planet gears; this function is then used to design
the shape of the window function.
8.3.1.3 Filtering
In filtering, the frequency components of an input signal X(jω) are modified by the
frequency characteristics of the filter H(jω). A sound equalizer is one of the most popular applications of a filter that
modifies the frequency-shape of an input signal. If an equalizing filter has a high
value for low frequencies and a low value for moderate or high frequencies, the
filter will enhance the bass sound.
In engineered systems, frequency-selective filters and linear finite impulse
response (FIR) and infinite impulse response (IIR) filters are the most widely used
filters. Frequency-selective filters have unit values for the desired narrow-range
frequencies and zeros for the other frequencies, as shown in Fig. 8.19, where a
low-pass filter, a band-pass filter, and a high-pass filter are presented. For example,
in a multi-stage rotating system, a low-pass filter, a band-pass filter, and a high-pass
filter can be used to selectively analyze the health state of the low-speed shaft,
middle-speed shaft, and high-speed shaft, respectively. On the other hand, a
low-pass filter and a high-pass filter can be used to filter out high-frequency noise
components and low-frequency undesired modulating components, respectively.
Other filtering methods, such as the moving and exponential average smoothing
techniques, can also be used to filter out noise and outliers in raw sensor signals.
These moving and exponential average smoothing techniques are special cases of
the well-known linear finite impulse response (FIR) and infinite impulse response
(IIR) filters, respectively, and are popular due to their ease of implementation.
Linear filters estimate the current data point by taking a weighted sum of the current
and previous measurements in a window of a finite or infinite length.
Mathematically, the FIR filter can be represented as [16]
\hat{x}_k = \sum_{i=1}^{I} w_i \, x_{k-i+1}        (8.4)
where I is the length of the filter, and {wi} is the sequence of weights that define the
characteristics of the filter and sum to unity. Note that when all weights are equal,
the FIR filter reduces to the mean or average filter. IIR filters are linear filters with
infinite filter lengths. An important IIR filter is the so-called exponentially weighted
moving average, which filters the current measurement by exponentially averaging
it with all previous measurements [16]
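The MATLAB sketch below applies Eq. (8.4) with equal weights (an average filter) and, for comparison, an exponentially weighted moving average implemented as an IIR filter; the window length and forgetting factor are illustrative choices.

% Minimal sketch: FIR moving-average filter (Eq. 8.4 with equal weights) and an
% exponentially weighted moving average (an IIR filter) on a drifting noisy signal.
x = cumsum(0.02*ones(500,1)) + 0.3*randn(500,1);   % drifting signal + noise

% FIR: equal weights w_i = 1/I over a window of length I
I = 20;
w = ones(I,1)/I;
xFIR = filter(w, 1, x);              % x_hat(k) = sum_i w_i * x(k-i+1)

% IIR: exponentially weighted moving average
lambda = 0.1;                        % forgetting factor (0 < lambda <= 1)
xEWMA = filter(lambda, [1, -(1-lambda)], x);   % x_hat(k) = lambda*x(k) + (1-lambda)*x_hat(k-1)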
As multiple sensor signals may have significantly different scales, the process of
normalizing the sensor signals is important to ensuring robust fault detection and
classification. An example of data normalization on a single sensor signal is shown
in Fig. 8.20, where the raw measurements acquired by a sensor are normalized
roughly between 0 and 1.
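A minimal min-max normalization sketch is shown below; reusing the extremes of a baseline (training) record for later test data is an illustrative convention, not a prescription from this chapter.

% Minimal sketch: min-max normalization of a raw sensor signal to roughly [0, 1].
xTrain = 3 + 0.5*randn(1000,1);               % baseline record of one sensor channel
xmin = min(xTrain);  xmax = max(xTrain);

minmaxScale = @(x) (x - xmin) ./ (xmax - xmin);
xTrainN = minmaxScale(xTrain);                % lies in [0, 1]
xTestN  = minmaxScale(3.2 + 0.5*randn(200,1));   % may slightly exceed [0, 1]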
Fig. 8.23b can be obtained. RMS signals are usually used to detect changes in
machine vibrations.
Fig. 8.23 Sample sinusoid signal (a) and RMS signal (b)
Fig. 8.24 Life-cycle evolution of vibration spectra (a) and RMS (b) with an inner race defect on a
rotating bearing. Reprinted (adapted) with permission from Ref. [20]
Fig. 8.25 Time domain signal (a) and frequency domain signal (b)
done by building an FFT block diagram using the MATLAB® DSP System
Toolbox.
% Construct a test signal: three sinusoids (50, 120, and 200 Hz) sampled at 1 kHz
t = 0:0.001:0.6;
x = sin(2*pi*50*t) + sin(2*pi*120*t) + sin(2*pi*200*t);
y = x + randn(size(t));          % add zero-mean Gaussian noise
figure(1)
subplot(2,1,1)
plot(1000*t(1:50), y(1:50))      % first 50 ms of the noisy time-domain signal
xlabel('Time (Milli-Seconds)')
ylabel('Signal with Random Noise')
subplot(2,1,2)
Y = fft(y, 512);                 % 512-point FFT
Fy = Y.*conj(Y)/512;             % power spectrum estimate
f = 1000*(0:256)/512;            % frequency axis (Hz) up to the Nyquist frequency
plot(f, Fy(1:257))
xlabel('Frequency (Hz)')
ylabel('Frequency Content of Signal')
As mentioned earlier, we can extract a large number of features (in the time,
frequency, and time-frequency domains) from raw sensor signals. Among these
features, only a subset of the features is relevant and should be used to build a
health diagnostic model. Thus, we need to select the most relevant and unique
features, while removing most irrelevant and redundant features from the data to
improve the performance of the diagnostic model.
Feature Selection Using Non-adaptive Approaches
One method for selecting features for fault identification is to apply engineered
flaws, similar to the ones expected in actual operating conditions, to systems and
develop an initial understanding of the features that are sensitive to the expected
fault. The flawed system can be used to identify the features that are sensitive
enough to distinguish between the fault-free and faulty system. The use of
analytical tools, such as experimentally validated finite element models, can be a
great asset in this process. In many cases, analytical tools are used to per-
form numerical experiments where flaws are introduced through computer
simulation.
Damage accumulation testing, during which significant structural components of
the system under study are degraded by subjecting them to realistic loading con-
ditions, can also be used to identify appropriate features. This process may involve
induced-damage testing or accelerated degradation testing (e.g., fatigue testing,
corrosion growth, and temperature cycling) to acquire feature data under certain
types of damage states in an accelerated fashion. Insight into the appropriate fea-
tures can be gained from several types of analytical and experimental studies, as
described above, and is usually the result of information obtained from some
combination of these studies.
Feature Selection Using Adaptive Approaches
In addition to the non-adaptive approaches mentioned above, adaptive approa-
ches can also be used for selecting relevant features. As shown in Fig. 8.26, an
adaptive approach typically consists of four major components, namely (i) feature
subset generation, (ii) performance evaluation, (iii) stopping criteria check, and
(iv) online testing. In the training phase, a certain search strategy generates can-
didate feature subsets, of which each subset is evaluated according to a diagnostic
performance measure and compared with the previous best one with respect to this
measure. A new, better subset replaces the previous best subset. This is repeated
Fig. 8.26 Adaptive feature selection: in the training phase, feature subset generation and performance evaluation are repeated on training data until the stopping criterion is met; in the testing phase, the diagnostic model is trained with the best subset and evaluated on testing data
until a stopping criterion is satisfied. In the testing phase, the performance of the
selected subset of features is evaluated with testing data not used in the feature
selection process.
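As a concrete illustration of this adaptive loop, the MATLAB sketch below performs a sequential forward search scored by the hold-out accuracy of a nearest-class-mean classifier; the synthetic data, the classifier, and the stopping rule are illustrative stand-ins for the diagnostic performance measure described above.

% Minimal sketch of an adaptive (wrapper) feature selection loop (cf. Fig. 8.26).
rng(1);
nPerClass = 60;  nFeat = 8;
X = [randn(nPerClass, nFeat); randn(nPerClass, nFeat) + [1.5 1.0 zeros(1, nFeat-2)]];
y = [ones(nPerClass,1); 2*ones(nPerClass,1)];      % two health states

idxTrain = [1:40, nPerClass+(1:40)];               % training split
idxTest  = setdiff(1:2*nPerClass, idxTrain);       % testing split
accuracy = @(S) nearest_mean_acc(X(idxTrain,S), y(idxTrain), X(idxTest,S), y(idxTest));

selected = [];  bestAcc = 0;
for k = 1:nFeat                                    % try to add one feature per pass
    candAcc = -inf(1, nFeat);
    for f = setdiff(1:nFeat, selected)
        candAcc(f) = accuracy([selected f]);
    end
    [accK, fBest] = max(candAcc);
    if accK <= bestAcc, break; end                 % stop when no improvement
    selected = [selected fBest];  bestAcc = accK;
end
fprintf('Selected features: %s (accuracy %.2f)\n', mat2str(selected), bestAcc);

function acc = nearest_mean_acc(Xtr, ytr, Xte, yte)
    % classify each test sample by the nearest class mean in the chosen feature subspace
    classes = unique(ytr);
    mu = zeros(numel(classes), size(Xtr,2));
    d  = zeros(size(Xte,1), numel(classes));
    for c = 1:numel(classes)
        mu(c,:) = mean(Xtr(ytr == classes(c), :), 1);
        d(:,c)  = sum((Xte - mu(c,:)).^2, 2);
    end
    [~, idx] = min(d, [], 2);
    acc = mean(classes(idx) == yte);
end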
Feature Selection Using Deep Learning
One of the main issues with PHM of complex engineered systems is the lack of
labeled data (i.e., data acquired from an operating system whose health state is
known) as well as the cost of labeling unlabeled data (e.g., by performing additional
diagnostics to assess the health of an operating system). Thus, there has been
interest in exploring the use of unlabeled data as a way to improve prediction
accuracy in fault diagnostics and failure prognostics. The availability of large
volumes of this kind of data in many complex systems makes it an appealing source
of information. Recently, deep learning methods have made notable advances in the
fields of speech recognition [26, 27], computer vision [28, 29], and natural language
processing [30, 31]. The unique ability of deep learning to automate learning of
high-level, complex features from large volumes of unlabeled data makes it
attractive for porting to the feature extraction/selection toolbox of a PHM practi-
tioner. In particular, the practitioner could investigate the use of deep belief net-
works (DBNs) [32, 33], built as a stack of Restricted Boltzmann Machines (RBMs)
on top of each other (see Fig. 8.27), to address the challenges of feature discovery
when dealing with large amounts of unlabeled monitoring and inspection data.
In a DBN, the features are learned in a layer-by-layer manner, and the features
learned by one RBM layer become the input data for training the next RBM layer.
This hierarchical multi-level learning extracts more abstract and complex
features at a higher level, based on the less abstract features/data in the lower
level(s) of the learning hierarchy. The bottom-layer RBM is trained with the prepro-
cessed monitoring and inspection data, and the activation probabilities of hidden
units are treated as the input data for training the upper-layer RBMs. Once the
network is trained, the top layer’s output becomes highly representative of deep
Fig. 8.27 A deep belief network built by stacking RBMs: the input (bottom) layer feeds RBM 1 with weights W1, whose hidden layer feeds RBM 2 with weights W2
features (see Fig. 8.27) that can be used for fault diagnostics and failure prognos-
tics. For the purpose of fault diagnostics, conventional classification models (see
Sect. 8.3.4) can be trained, leveraging the deep features, with the aid of small
amounts of labeled data.
Fault diagnosis aims at determining the fault type, location, and severity based on the
extracted features. This task can be treated as a classification problem. The algorithms
used in health classification usually fall into two categories: supervised classification
and unsupervised classification. These two categories are illustrated in Fig. 8.28.
When labeled data are available from both the fault-free and faulty systems, the sta-
tistical pattern recognition algorithms fall into the general category referred to as
supervised classification. Note that supervised classification belongs to a broader
category, supervised learning, which also includes supervised regression (often useful
Fig. 8.28 Supervised classification (a), in which labeled data in the feature space (X1, X2) form Classes I and II, versus unsupervised classification (b), in which unlabeled data are grouped into Clusters I and II
S_b = \sum_{j=1}^{M_c} m_j (\mu_j - \mu)(\mu_j - \mu)^T        (8.7)

S_w = \sum_{i=1}^{m} (x_i - \mu_{c_i})(x_i - \mu_{c_i})^T        (8.8)
Let w represent the projection vector that maps the original N-dimensional features
onto a one-dimensional space (or a line). The projected features can be expressed as
y_i = w^T x_i, for i = 1, …, m. Then, multi-class LDA can be formulated as an opti-
mization problem in search of the w that maximizes the ratio of the between-class
separation to the within-class separation, as

\hat{w} = \arg\max_{w} \frac{w^T S_b w}{w^T S_w w}        (8.9)
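The MATLAB sketch below assembles Sb and Sw per Eqs. (8.7) and (8.8) for synthetic three-class data and obtains the projection vector of Eq. (8.9) as the leading eigenvector of Sw^(-1) Sb; the data are purely illustrative.

% Minimal sketch: multi-class LDA projection to one dimension (Eqs. 8.7-8.9).
rng(2);
mu0 = [0 0; 2.5 0.5; 1.0 2.5];                   % class means (3 classes)
X = [];  y = [];
for c = 1:3
    X = [X; randn(50, 2)*0.6 + mu0(c,:)];        % 50 samples per class
    y = [y; c*ones(50,1)];
end

muAll = mean(X, 1);
Sb = zeros(2);  Sw = zeros(2);
for c = 1:3
    Xc  = X(y == c, :);
    muc = mean(Xc, 1);
    Sb  = Sb + size(Xc,1) * (muc - muAll)' * (muc - muAll);   % Eq. (8.7)
    Sw  = Sw + (Xc - muc)' * (Xc - muc);                      % Eq. (8.8)
end

% Eq. (8.9): w maximizes (w'*Sb*w)/(w'*Sw*w) -> leading eigenvector of Sw\Sb
[V, D] = eig(Sw \ Sb);
[~, imax] = max(real(diag(D)));
w = real(V(:, imax));
yProj = X * w;        % one-dimensional projected features y_i = w' * x_i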
diagnostics problem. The number of hidden layers and the number of neurons vary
depending on the complexity of the problem. The model input is fed through the
input layer of the network and is connected to the hidden layers by synaptic weights.
Each network layer is connected to the next layer through synaptic weights in a
hierarchical form. The training of the neural network aims at learning the relationship
between the input layer and the output layer by adjusting the weights and bias values
of each neuron in the network for each training pattern. The BNN model is trained by
optimizing the synaptic weights and biases of all neurons until the maximum number
of epochs is reached. The number of epochs is defined as the number of times a
training algorithm uses the entire training data set. The trained BNN diagnostics
model provides classification classes as an outcome when sensor data are provided
as an input to the model.
Support Vector Machine
In addition to network-based learning techniques like BNN, kernel-based machine
learning techniques can also be used as member algorithms for health diagnostics.
Support vector machine (SVM) is one of the most popular kernel-based machine
learning techniques for classification. The following section briefly introduces SVM
for classification.
With the organized input data {(x1, c1), (x2, c2), …, (xm, cm)}, SVM constructs
the optimal separating hyper-plane that maximizes the margin between the sepa-
rating hyper-plane and the data [37, 41–52]. Without any loss of generality, con-
sider a two-class case for which the optimal separating hyper-plane and the
maximum margin are shown in Fig. 8.30. For the linearly separable, two-class
SVM shown in Fig. 8.30, the optimal hyper-plane separating the data can be
expressed as
y(x) = w^T x + b = 0        (8.10)
Fig. 8.30 Optimal separating hyper-plane for a linearly separable two-class SVM, showing the margin and the support vectors
minimize    \frac{1}{2} w^T w + C \sum_{i=1}^{m} \xi_i
subject to  y_i (w^T x_i + b) \ge 1 - \xi_i,   \xi_i \ge 0,   i = 1, 2, \ldots, m        (8.11)
where the regularization parameter C specifies the error penalty and ξi is a slack
variable defining the error. If the Lagrange multipliers αi are introduced, the opti-
mization problem in Eq. (8.11) is transformed into a dual quadratic optimization
problem and expressed as
maximize    L_D = \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j x_i^T x_j
subject to  \sum_{i=1}^{m} \alpha_i y_i = 0,   0 \le \alpha_i \le C,   i = 1, 2, \ldots, m        (8.12)
After solving the optimization problem shown above, the solution of w can be
expressed as
w = \sum_{i=1}^{m} \alpha_i y_i x_i        (8.13)
During the test phase, we determine on which side of the separating hyper-plane
a test instance x lies and assign the corresponding class. The decision function can
be expressed mathematically as sgn(wTx + b). Thus, the diagnostic SVM results
provide different classification classes as a solution when a set of preprocessed
sensor data is provided as an input.
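For readers who want a runnable illustration without a QP solver, the MATLAB sketch below trains a linear soft-margin SVM by sub-gradient descent on the primal objective of Eq. (8.11); this is a deliberately simplified alternative to solving the dual in Eq. (8.12), and the data, penalty C, and step size are illustrative assumptions.

% Minimal sketch: linear soft-margin SVM trained by sub-gradient descent on the
% primal objective 0.5*w'*w + C*sum(hinge losses) of Eq. (8.11).
rng(3);
X = [randn(50,2) - 1.5; randn(50,2) + 1.5];     % two roughly separable classes
y = [-ones(50,1); ones(50,1)];                  % labels in {-1, +1}
C = 10;  eta = 1e-3;  nIter = 5000;

w = zeros(2,1);  b = 0;
for it = 1:nIter
    margins = y .* (X*w + b);
    viol = margins < 1;                         % samples inside the margin
    gw = w - C * (X(viol,:)' * y(viol));        % sub-gradient w.r.t. w
    gb =    - C * sum(y(viol));                 % sub-gradient w.r.t. b
    w = w - eta * gw;
    b = b - eta * gb;
end
yPred = sign(X*w + b);                          % decision rule sgn(w'*x + b)
fprintf('Training accuracy: %.2f\n', mean(yPred == y));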
Unsupervised Fault Classification
Mahalanobis Distance
Unsupervised statistical inference can be used for classifying health-relevant input
features into different HSs based on their relative statistical distances. The
Mahalanobis distance classifier is one of these classification techniques. In statistics,
the MD is a distance measure based on the correlations between variables, by which
different patterns can be identified and analyzed. The MD gauges the similarity of an
unknown sample set to a known one. Unlike the Euclidean distance method, the MD
considers the correlations of the data set and is scale-invariant. The MD measure
shows the degree of dissimilarity between a measured data point xf and a reference
training set with mean vector μ and covariance matrix S, as shown in Eq. (8.14).

D(x_f) = \sqrt{(x_f - \mu)^T S^{-1} (x_f - \mu)}        (8.14)

where xf = (x1, x2, …, xF)^T is an F-dimensional data vector, and μ and S are
respectively the mean vector and covariance matrix of the reference training data
set. The MD is often used to detect outliers within multi-dimensional input samples,
especially in the development of linear regression models. Due to its straightfor-
wardness and ease of implementation, the MD model has been used for health
diagnostics. MD health diagnostics considers the correlation between different input
variables and determines the system HS based on the minimum MD values of the
testing sample, compared to training samples from different HSs. Note that, in
Sect. 8.2.1, MD is employed to classify different HSs for the development of sensor
networks for health monitoring.
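The MATLAB sketch below classifies a test sample by computing the MD of Eq. (8.14) to each health state's training set and choosing the smallest distance; the two synthetic health states are illustrative.

% Minimal sketch: Mahalanobis-distance health classification (Eq. 8.14).
rng(4);
HS{1} = randn(200, 3);                          % healthy-state training features
HS{2} = randn(200, 3) + [2 0 1];                % faulty-state training features
xTest = [1.8, 0.2, 0.9];                        % new measurement to classify

d = zeros(1, numel(HS));
for k = 1:numel(HS)
    mu = mean(HS{k}, 1);
    S  = cov(HS{k});
    d(k) = sqrt((xTest - mu) / S * (xTest - mu)');   % (x - mu) * inv(S) * (x - mu)'
end
[~, hsHat] = min(d);                            % assign the state with the smallest MD
fprintf('Assigned health state: HS%d (MD values: %s)\n', hsHat, mat2str(d, 3));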
Self-organizing Maps
The methodologies discussed above were different machine learning processes
where the target HS classes are known. If the different health conditions, and their
functional relationships with the system input parameters, are not clearly known,
possible health conditions of the system can then be determined using an unsu-
pervised learning process that segregates the data based on the possible health
conditions. The self-organizing map (SOM) is a type of artificial neural network
that is trained using unsupervised learning to produce a two-dimensional discretized
representation of the input space of the training samples. The SOM uses a neigh-
borhood function to preserve the topological properties of the input space and
determine the closest unit distance to the input vector [17]; this is then used to
construct class boundaries graphically on a two-dimensional map. The SOM
training utilizes competitive learning. When a training example is fed to the SOM,
its Euclidean distance to all weight vectors is computed and the neuron with the
weight vector most similar to the input vector x will be identified as the best
matching unit (BMU). The weights of the BMU and neurons close to it in the SOM
lattice are adjusted towards the input vector. Moreover, the magnitude of the change
decreases with time and distance from the BMU. The weight vectors of the BMU
and its topological neighbors are fine-tuned to move closer to the input vector space
[36]. The learning rule for updating a weight vector w can be expressed as

w_i(t+1) = w_i(t) + \alpha(t) \, h(n_{BMU}, n_i, t) \, [x(t) - w_i(t)]        (8.15)
where wi(t + 1) is the updated weight vector, wi(t) is the weight vector from the
previous iteration, a(t) is the monotonically decreasing learning coefficient with
0 < a < 1, and h(nBMU, ni, t) is the neighborhood function that decreases mono-
tonically with an increase in the distance between the BMU nBMU and the neuron ni
in the lattice. The Gaussian function is a common choice for the neighborhood
function. Regardless of the function form, the h(nBMU, ni, t) decreases over the
training time. During the training process, one sample pattern is chosen from the
input data X arbitrarily, and the distance between the sample point and the initial
weight vector of the SOM is determined using the distance measure. Thus, through
the learning process, the input data are organized into different HS clusters, and any
overlap between the clusters is treated as misclassification.
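The MATLAB sketch below trains a small SOM lattice on synthetic two-dimensional data using the update rule described above; the lattice size, learning-rate schedule, and neighborhood width are illustrative choices.

% Minimal sketch of SOM training: a 6-by-6 lattice learns a 2-D input space.
rng(5);
X = [randn(300,2)*0.4; randn(300,2)*0.4 + 3];   % two clusters of input data
gridSize = 6;  nNodes = gridSize^2;  nEpochs = 20;

[gx, gy] = meshgrid(1:gridSize);                % lattice coordinates of each node
nodePos = [gx(:), gy(:)];
W = rand(nNodes, 2) * 3;                        % initial weight vectors

for epoch = 1:nEpochs
    alpha = 0.5 * (1 - epoch/nEpochs);                   % decreasing learning rate
    sigma = 3 * exp(-epoch/(nEpochs/2)) + 0.5;           % shrinking neighborhood width
    for n = randperm(size(X,1))
        x = X(n, :);
        [~, bmu] = min(sum((W - x).^2, 2));              % best matching unit
        distLat = sum((nodePos - nodePos(bmu,:)).^2, 2); % squared lattice distance
        h = exp(-distLat / (2*sigma^2));                 % Gaussian neighborhood function
        W = W + alpha * h .* (x - W);                    % learning rule w <- w + a*h*(x - w)
    end
end
% After training, each input can be assigned to its BMU, and BMU clusters on the
% lattice delineate candidate health-state regions.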
Health Index
Successful implementation of health prognostics (see Sect. 8.4) often requires the
derivation of a single health measure that quantifies the health condition of an
operating system. This health measure is called a health index. In general, health
indices can be categorized into two types: (i) a Physics Health Index (PHI) or (ii) a
Virtual Health Index (VHI).
Physics Health Index (PHI)
A PHI uses a dominant physical signal as a direct health measure and is thus
applicable only if sensor signals are directly related to physics-of-failure. In the
literature, most engineering applications of health prognostics are based on various
PHIs, such as the battery impedance [47], the magnitude of the vibration signal
[48], and the radio frequency (RF) impedance [49]. However, the application of a
PHI is limited to cases where sensor signals directly related to physics-of-failure are
available. Mapping a multitude of heterogeneous sensor signals to a dominant
physical signal is becoming increasingly difficult with the growing complexity of
engineered systems and sensor networks.
Virtual Health Index (VHI)
A VHI is applicable even if sensor signals are not directly related to system
physics-of-failure. VHIs have a potential to overcome the limitation of PHIs
described above. Multi-dimensional sensor signals can be transformed into a
one-dimensional VHI using advanced data processing techniques, such as weighted
averaging methods [43], the Mahalanobis distance [44], flux-based methods [45], or
a linear data transformation method [46]. Let’s consider the linear data transfor-
mation method. Suppose there are two multi-dimensional sensor data sets that
represent the system failed and system healthy states: an M0 × D matrix Q0 and an
M1 × D matrix Q1, respectively. M0 and M1 are the data sizes for the system failed and
system healthy states, respectively, and D is the dimension of each data set. With
these two data matrices, a transformation matrix T can be obtained to transform the
multi-dimensional sensor signals into a one-dimensional VHI as
T = (Q^T Q)^{-1} Q^T S_{off}        (8.16)
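A minimal MATLAB sketch of this transformation is given below; the stacking of Q from the failed and healthy data sets and the choice of target VHI values (0 for the failed state, 1 for the healthy state) in Soff are assumptions made for illustration, based on the description above.

% Minimal sketch of the linear VHI transformation of Eq. (8.16).
rng(6);
D  = 5;  M0 = 100;  M1 = 100;
Q0 = randn(M0, D) + 2;            % sensor data at the failed state  (M0-by-D)
Q1 = randn(M1, D);                % sensor data at the healthy state (M1-by-D)

Q    = [Q0; Q1];
Soff = [zeros(M0,1); ones(M1,1)];          % assumed target VHI: 0 = failed, 1 = healthy

T = (Q'*Q) \ (Q'*Soff);                    % Eq. (8.16): T = (Q'Q)^(-1) Q' Soff
vhi = Q * T;                               % one-dimensional virtual health index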
Upon the detection and classification of a fault via the health reasoning function, the
health prognostics function predicts the time remaining before the fault progresses
to an unacceptable level, in other words, the remaining useful life (RUL).
Figure 8.31 shows a typical paradigm of the health prognostics function, which first
utilizes the sensor signal to produce the system degradation signal through signal
processing and then leverages the degradation signal to perform diagnostics of the
system’s current health condition and further predict the system’s RUL and
reliability.
In general, two categories of approaches have been developed that enable
continuous updating of system degradation and RUL distribution: (i) model-based
approaches, and (ii) data-driven approaches. These two approaches are graphically
compared in Fig. 8.32. The application of general, model-based prognostic
approaches relies on the understanding of system physics-of-failure (PoF) and
underlying system degradation models. The basic idea is to identify the parameters
of the PoF-based degradation model in the online process. As practical engineered
systems generally consist of multiple components with multiple failure modes,
understanding all potential physics-of-failure and their interactions in a complex
system is almost impossible. In such cases, the data-driven approaches for system
Fig. 8.31 A typical paradigm of the health prognostics function: predicted reliability versus time, from the current time T to the designed life Td
Fig. 8.32 Model-based prognostic approaches (a), which use loading and response signals, versus data-driven prognostic approaches (b), which use training and testing signals
health prognostics are more desirable; these data-driven approaches are mainly
based on massive sensor data with a lessened requirement for knowledge of
inherent system failure mechanisms. Data-driven prognostic approaches generally
require sensor feature extraction and statistical pattern recognition for the offline
training process, and interpolation, extrapolation, or machine learning for online life
prediction.
where S(ti) represents the degradation signal at time ti; S0 is a known constant; d, a,
and b are stochastic model parameters representing the uncertainty of generator
operating conditions, and e is a random error term modeling possible sensor noise
that follows a zero-mean Gaussian distribution with standard deviation r.
Non-iterative Bayesian Updating with Particle Filter
To make the discussion more concrete, consider a dynamic nonlinear discrete-time
system described by a state-space model. In this context, a simplified state-space
model is defined as
Transition: xi ¼ f ðxi1 Þ þ ui
ð8:18Þ
Measurement: yi ¼ gð xi Þ þ vi
where xi is the vector of (hidden) system states at time ti = iΔt, Δt is a fixed time
step between two adjacent measurement points, and i is the index of the mea-
surement time step; yi is the vector of system observations (or mea-
surements); and ui is the vector of process noise for the states; vi is the vector of
measurement noise; and f() and g() are the state transition and measurement
functions, respectively. With the system defined, we aim to infer the system states
x from the noisy observations y.
p(x_i | y_{1:i}) \approx \sum_{j=1}^{N_P} w_i^j \, \delta(x_i - x_i^j)        (8.19)
where {x_i^j, j = 1, …, NP} and {w_i^j, j = 1, …, NP} are the particles and weights estimated at the ith
measurement time step, respectively; NP is the number of particles; and δ(·) is the
Dirac delta function. The standard particle filter algorithm follows a standard
procedure of sequential importance sampling and resampling (SISR) to recursively
update the particles and their associated weights [73]:
(1) Initialization (i = 0)
    For j = 1, 2, …, NP, randomly draw state samples x_0^j from the prior distribution p(x_0).
(2) For i = 1, 2, …
    (a) Importance sampling
        For j = 1, 2, …, NP, randomly draw samples from the proposed importance density, x_i^j ~ q(x_i | x_{0:i-1}^j, y_{1:i}). The standard SISR particle filter employs the so-called transition prior distribution q(x_i | x_{0:i-1}^j, y_{1:i}) = p(x_i | x_{i-1}^j).
        For j = 1, 2, …, NP, evaluate the importance weights

w_i^j = w_{i-1}^j \, \frac{p(y_i | x_i^j) \, p(x_i^j | x_{i-1}^j)}{q(x_i^j | x_{0:i-1}^j, y_{1:i})}        (8.20)

        and normalize them as

\tilde{w}_i^j = w_i^j \left[ \sum_{j=1}^{N_P} w_i^j \right]^{-1}        (8.21)
where Ci is the capacity of the cell at the ith cycle, C0 is the initial capacity, a is the
coefficient of the exponential component of capacity fade, k is the exponential
capacity fade rate, b is the coefficient of the linear component of capacity fade, and
ci is the normalized capacity at the ith cycle. Researchers have reported that the
exponential function captures the active material loss [76] and that the hybrid of
linear and exponential functions provides a good fit to three years’ cycling data [77].
Here, we treat the normalized capacity c and the capacity fade rates a,
k, and b as the state variables, i.e., x = [c, a, k, b]^T. The system transition and
measurement functions can then be written as [75].
Here, yi is the capacity measurement (or estimate) at the ith cycle; and u, r1, r2,
r3, and v are the Gaussian noise variables with zero means.
To perform the RUL prediction, we learn and track the capacity fade behavior of
the cell at every charge/discharge cycle. The learning and tracking are done by
updating the parameters a, k, and b of the capacity fade model in Eq. (8.22) with
the capacity measurement via the use of particle filter. After the resampling step of
particle filter at the ith cycle, the posterior probability distribution of the normalized
capacity is approximated as
p(c_i | y_{1:i}) \approx \frac{1}{N_P} \sum_{j=1}^{N_P} \delta(c_i - c_i^j)        (8.24)
where c_i^j is the jth resampled particle of ci. The normalized capacity l cycles in the
future can be predicted by extrapolating the capacity fade model with the updated
parameters, expressed as
p(c_{i+l} | y_{1:i}) \approx \frac{1}{N_P} \sum_{j=1}^{N_P} \delta(c_{i+l} - c_{i+l}^j)        (8.25)
where
c_{i+l}^j = 1 - a_i^j \left[ 1 - \exp\!\left( -k_i^j (i+l) \right) \right] - b_i^j (i+l)        (8.26)
Here, we define the failure threshold as a normalized capacity of 78.5% (i.e., a total capacity fade of 0.215). Then the
RUL (in cycles) can be obtained for each particle as the number of cycles between
the current cycle i and the end-of-life (EOL) cycle
L_i^j = \mathrm{root}_t \left\{ a_i^j \left[ 1 - \exp\!\left( -k_i^j t \right) \right] + b_i^j t = 0.215 \right\} - i        (8.27)
Finally, the RUL distribution can be built based on these particles, expressed as
p(L_i | y_{1:i}) \approx \frac{1}{N_P} \sum_{j=1}^{N_P} \delta(L_i - L_i^j)        (8.28)
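The MATLAB sketch below walks through this particle-filter RUL procedure end to end on simulated capacity data; the fade-model sign conventions, parameter priors, noise levels, and random-walk transition are illustrative assumptions rather than values from the cited studies.

% Minimal sketch of particle-filter-based RUL prediction with the hybrid
% exponential + linear capacity-fade model c_i = 1 - a*(1 - exp(-k*i)) - b*i.
rng(7);
fade = @(a, k, b, i) a.*(1 - exp(-k.*i)) + b.*i;            % total capacity fade
cycles = 1:300;
yMeas  = 1 - fade(0.05, 0.01, 4e-4, cycles) + 0.002*randn(size(cycles));

NP = 1000;  sigV = 0.002;                                   % particles, meas. noise std
a = 0.05*exp(0.2*randn(NP,1));                              % positive parameter priors
k = 0.01*exp(0.3*randn(NP,1));
b = 4e-4*exp(0.3*randn(NP,1));

for i = cycles
    a = a.*exp(0.01*randn(NP,1));                           % random-walk transition
    k = k.*exp(0.01*randn(NP,1));                           %   (kept positive by the
    b = b.*exp(0.01*randn(NP,1));                           %    multiplicative form)
    c = 1 - fade(a, k, b, i);                               % predicted capacity
    w = exp(-0.5*((yMeas(i) - c)/sigV).^2) + realmin;       % likelihood p(y_i | x_i)
    w = w/sum(w);  cw = cumsum(w);  cw(end) = 1;            % normalized weights
    idx = arrayfun(@(u) find(cw >= u, 1), rand(NP,1));      % multinomial resampling
    a = a(idx);  k = k(idx);  b = b(idx);
end

% RUL per particle: cycles until total fade reaches 0.215 (EOL at 78.5% capacity)
tAhead = (cycles(end)+1):5000;
RUL = nan(NP,1);
for j = 1:NP
    hit = find(fade(a(j), k(j), b(j), tAhead) >= 0.215, 1);
    if ~isempty(hit), RUL(j) = tAhead(hit) - cycles(end); end
end
RUL = sort(RUL(~isnan(RUL)));                               % empirical RUL distribution
fprintf('Median RUL: %d cycles, 90%% interval [%d, %d]\n', ...
        RUL(round(0.5*end)), RUL(round(0.05*end)), RUL(round(0.95*end)));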
1. The term “online” indicates a state where a (testing) system unit is operating in the field and its RUL is unknown and needs to be predicted.
2. The term “offline” indicates a state where a (training) system unit is operating in the lab or field and often runs to failure (thus, its RUL at any time is known) prior to the operation of any system units online.
Minimize    SSD = \sum_{j=1}^{M_s} \left[ h_r(t_j) - h_p(t_j + T_0) \right]^2
subject to  T_0 \in [0, L - \Delta t]        (8.29)
where hr(tj) and hp(tj) are the online and offline health index data at tj, respectively;
Ms is the length of the online health index data; T0 is the time-scale initial health
condition; Dt is the time span (= tMs − t1) of the online health index data; and L is
the time span of a predictive health degradation curve, i.e., the life span of an offline
system unit. Once T0 is determined from the optimization (Eq. 8.29), the projected
RUL of an online system unit based on a given predictive health degradation curve
can be calculated as
RUL = L - \Delta t - T_0        (8.30)
L = \frac{1}{W} \sum_{i=1}^{K} W_i L_i, \quad \text{where} \quad W = \sum_{i=1}^{K} W_i        (8.31)
where Li is the projected RUL on the ith offline predictive health degradation curve
and Wi is the ith similarity weight. A similarity weight Wi can be defined as the
inverse of the corresponding SSDi, i.e., Wi = (SSDi)^(-1). This definition ensures that a
greater similarity gives a greater weight.
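The MATLAB sketch below implements Eqs. (8.29)-(8.31) on synthetic offline degradation curves and a synthetic online record; the curve shapes and the grid search over T0 are illustrative choices.

% Minimal sketch of similarity-based RUL prediction (Eqs. 8.29-8.31).
rng(8);
K = 5;  L = 300;  t = 1:L;
hOff = zeros(K, L);
for i = 1:K                                          % offline run-to-failure curves
    hOff(i,:) = 1 - (t/L).^(1.5 + 0.4*rand);         % health index: 1 (new) -> 0 (failed)
end

dT  = 60;                                            % length of the online record
hOn = 1 - ((80 + (1:dT))/L).^1.7 + 0.01*randn(1, dT);   % online unit, true age unknown

Lproj = zeros(K,1);  W = zeros(K,1);
for i = 1:K
    ssdBest = inf;  t0Best = 0;
    for T0 = 0:(L - dT)                              % Eq. (8.29): best time shift T0
        ssd = sum((hOn - hOff(i, T0+1:T0+dT)).^2);
        if ssd < ssdBest, ssdBest = ssd; t0Best = T0; end
    end
    Lproj(i) = L - dT - t0Best;                      % Eq. (8.30): projected RUL
    W(i)     = 1/ssdBest;                            % similarity weight
end
RULhat = sum(W.*Lproj)/sum(W);                       % Eq. (8.31): weighted-sum RUL
fprintf('Predicted RUL: %.0f cycles\n', RULhat);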
Extrapolation-Based Approach
Unlike the interpolation-based approach, the extrapolation-based approach employs
the training data set not for comparison with the testing data set but rather for
obtaining prior distributions of the degradation model parameters. The testing data
set is then used to update these prior distributions. An RUL estimate can be
obtained by extrapolating the updated degradation model to a predefined failure
threshold (see Fig. 8.36). Bayesian linear regression, Kalman filter, or particle filter
can be employed for construction and updating of the degradation model.
Machine Learning-Based Approach
In contrast to the interpolation- or extrapolation-based approaches, the machine
learning-based approach does not involve any visible manipulation of the offline
and online data; rather, it requires the training of a prognostics model using the
offline data. One such model is the recurrent neural network (RNN) model, which is
capable of learning nonlinear dynamic temporal behavior due to the use of an
internal state and feedback. A first-order, simple RNN is an example of a multi-layer
perceptron (MLP) with feedback connections (see Fig. 8.37). The network is
composed of four layers, namely, the input layer I, recurrent layer R, context layer
C, and output layer O. Units of the input layer and the recurrent layer are fully
connected through the weights WRI, while units of the recurrent layer and output
layer are fully connected through the weights WOR. Through the recurrent weights
WRC, the time delay connections link current recurrent units R(t) with the context
units C(t) holding recurrent units R(t−1) in the previous time step. The net input of
the ith recurrent unit can be computed as
\tilde{R}_i^{(t)} = \sum_j W_{ij}^{RI} I_j^{(t)} + \sum_j W_{ij}^{RC} R_j^{(t-1)}        (8.32)
Fig. 8.36 Extrapolation-based RUL prediction: the normalized health index of the online data (spanning Δt) is extrapolated over the cycle index to obtain the RUL estimate L̂i
Fig. 8.37 Simplified (a) and more detailed representation (b) of Elman’s simple RNN. Reprinted
(adapted) with permission from Ref. [79]
Given the logistic sigmoid function as the activation function f, the output activity
of the ith recurrent unit can then be computed as
R_i^{(t)} = f\!\left( \tilde{R}_i^{(t)} \right) = \left[ 1 + \exp\!\left( -\tilde{R}_i^{(t)} \right) \right]^{-1}        (8.33)
The net input and output activity of the ith output unit can be computed, respec-
tively, as
\tilde{O}_i^{(t)} = \sum_j W_{ij}^{OR} R_j^{(t)}        (8.34)
and
O_i^{(t)} = f\!\left( \tilde{O}_i^{(t)} \right) = \left[ 1 + \exp\!\left( -\tilde{O}_i^{(t)} \right) \right]^{-1}        (8.35)
In health prognostics, the inputs to the RNN are the normalized sensor data set
QN and the outputs are the RULs associated with the data set. The RNN training
process calculates the gradients of network weights with respect to the network
performance and updates the network weights in search of the optimum weights
with the minimum prediction error.
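To make the notation of Eqs. (8.32)-(8.35) concrete, the MATLAB sketch below runs a forward pass of an Elman-type RNN over a random input sequence; the layer sizes and weights are placeholders, and the training step (gradient computation and weight updates) is omitted.

% Minimal sketch: forward pass of an Elman-type simple RNN per Eqs. (8.32)-(8.35).
rng(9);
nI = 4;  nR = 6;  nO = 1;               % input, recurrent, and output layer sizes
WRI = 0.5*randn(nR, nI);                % input -> recurrent weights
WRC = 0.5*randn(nR, nR);                % context -> recurrent weights
WOR = 0.5*randn(nO, nR);                % recurrent -> output weights
sigm = @(z) 1./(1 + exp(-z));           % logistic sigmoid activation f

T = 20;
I = randn(nI, T);                       % normalized sensor inputs over T time steps
R = zeros(nR, 1);                       % context units hold R from the previous step
O = zeros(nO, T);
for t = 1:T
    Rtilde = WRI*I(:,t) + WRC*R;        % Eq. (8.32): net input of the recurrent units
    R = sigm(Rtilde);                   % Eq. (8.33): recurrent activations
    O(:,t) = sigm(WOR*R);               % Eqs. (8.34)-(8.35): output (e.g., scaled RUL)
end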
Other machine learning techniques that can be used for health prognostics
include artificial intelligence-based techniques, feed-forward neural networks, the
decision tree method, support vector machine (SVM), relevance vector machine
(RVM), k-nearest neighbor (KNN) regression, fuzzy logic, and others.
Fig. 8.38 The three main PHM functions for NASA aircraft engine prognostics
variations that are unknown. The 21 sensor signals were obtained from six
different operation regimes. The whole data set was divided into training and
testing subsets, each of which consists of 218 engine units.
Health Reasoning Function
To account for the different initial degradation conditions, an adjusted cycle
index is proposed as Cadj = C − Cf, where C is the operational cycle of the
training data for an engine unit and Cf is the cycle-to-failure of an engine unit.
A cycle index of 0 indicates engine unit failure, whereas negative cycle indices
occur prior to failure. Among the 21 sensor signals, some contain little or no
degradation information for the engine units, while others do. To improve the RUL prediction accuracy
and efficiency, seven relevant signals (2, 3, 4, 7, 11, 12, and 15) were selected
by screening all 21 sensor signals according to the degradation behaviors.
Based on the seven sensor signals, a normalized health index was con-
structed to represent the health degradation process of the engine. This nor-
malization process is realized by using a linear transformation with the sensor
data representing system failure and system healthy states. The dots in
Fig. 8.40 represent the normalized health index data obtained from the
training data set of an offline engine unit.
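The linear transformation can be sketched as follows, assuming (for illustration only) that the failure-state sensor data are mapped near 0 and the healthy-state data near 1, with the seven selected signals fused through least-squares weights. The data and the direction of the mapping are assumptions, not the values used in this example.

% Minimal sketch of a linear health-index transformation (illustrative).
% S_fail and S_healthy stand for sensor data at system failure and at the
% healthy state; both are synthetic placeholders with seven signals each.
S_fail    = randn(30, 7) + 2;            % placeholder failure-state sensor data
S_healthy = randn(30, 7);                % placeholder healthy-state sensor data

A = [S_fail; S_healthy];                 % stacked sensor data
b = [zeros(size(S_fail,1),1); ones(size(S_healthy,1),1)];  % target index values
w = A \ b;                               % least-squares linear transformation

S_unit = randn(100, 7);                  % sensor history of one offline engine unit
h      = S_unit * w;                     % normalized health index trajectory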
The randomness of the health index data is mainly due to the measurement
noise from the signals. Thus, a stochastic regression technique, namely relevance
vector machine (RVM) regression, can be used to model the virtual health index (VHI) data
in a stochastic manner. The RVM is a Bayesian representation of a gener-
alized sparse linear model, which shares the same functional form as the
support vector machine (SVM). In this example, the linear spline kernel
function was used as a basis function for the RVM. The RVM was used to
build the predictive health degradation curves (hpi (t), i = 1, …, 218) for 218
offline engine units. The regression model gives both the mean and the
variation of the predictive health degradation curve, as shown in Fig. 8.40.
These predictive health degradation curves for the offline units altogether
construct the background health knowledge, which characterizes the system
degradation behavior. Later, this background knowledge can be used for
modeling the predictive RUL distributions of online engine units.
Degradation curves built for the offline units are exemplified in Fig. 8.41.
Health Prognostics Function
The online prediction process employed the testing data set obtained from
218 online system units. As explained earlier, the optimum fitting was
employed to determine a time-scale initial health degradation state (T0) with
the training data set for an online engine unit, while minimizing the SSE
between the online health data h(tj) and the predictive health degradation data
hp(tj), as shown in Fig. 8.42. It should be noted that the offline learning
process generates different predictive health degradation curves from
K identical offline units. Repeating this process provided different projected
RULs (RULi for i = 1, …, 218) on different predictive health degradation
curves. The projected RULs can be used to predict the RUL of an online unit
through a weighted-sum formulation.
From the 218 offline engine units, the same number of predictive health
degradation curves and projected RULs was obtained for each online engine
unit. Likewise, the same number of similarity weights was sought for each
online engine unit, based on the inverse of the SSE. A weighted-sum
Fig. 8.43 Predicted RUL histograms with true RULs for (a) units 1 and 2, and (b) units 3 and 4
formulation was then used to predict the RUL for each online engine unit as a
function of the projected RULs, while considering only the 50 largest similarity
weights. Note that hpi(ti) are stochastically modeled using RVM
regression. Thus, the resulting similarity weights were modeled in a statistical
manner, as was the RUL of the online unit. Using the mean and covariance
matrices of the relevance vector coefficients for the RVM regression, the
random samples of the coefficients result in the random samples of the
similarity weights for the projected RULs of the engine unit. The randomness
of the similarity weights and projected RULs is then propagated to the pre-
dictive RUL of the engine unit through the weighted-sum formulation.
Figure 8.43 shows the RUL histogram and the true value with the testing data
set for the first four online engine units.
[Fig. 8.44: uncertainty propagation from the raw sensory signal, to the probability density of the virtual health index, to the RUL predictions at 40, 80, and 120 cycles]
estimate the total uncertainties in future states and in the RUL. Figure 8.44 shows
uncertainty propagation from the PHM function at one level to the one at the next
level. An arbitrary assignment of the distribution type to RUL is often erroneous;
thus, the true probability distribution of the RUL needs to be estimated through
rigorous uncertainty propagation of the various sources of uncertainty through the
models and algorithms. Such a distribution may not share similarity with any of the
commonly used distribution types. Uncertainty propagation is particularly important
and challenging in the context of prognostics, since the focus is on predicting the
future, unknown behavior of a system.
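As a minimal illustration of such propagation, the sketch below samples uncertain parameters of an assumed exponential degradation model, propagates each sample to its threshold-crossing time, and keeps the resulting RUL samples as an empirical distribution rather than assigning a distribution type. The model form, parameter statistics, and threshold are placeholders.

% Minimal sketch: Monte Carlo propagation of parameter uncertainty to the RUL.
% Assumed degradation h(t) = a*exp(b*t) with uncertain (a, b) and threshold h_f.
N   = 5000;
a   = 0.05 + 0.005*randn(N,1);           % sampled model parameters (assumed priors)
b   = 0.02 + 0.002*randn(N,1);
h_f = 1.0;
t_now = 60;                              % current cycle

t_fail = log(h_f./a)./b;                 % threshold-crossing time for each sample
RUL    = t_fail - t_now;
RUL    = RUL(RUL > 0 & isfinite(RUL));   % keep physically meaningful samples

histogram(RUL, 50);                      % empirical RUL distribution (no assumed type)
xlabel('RUL (cycles)'); ylabel('Count');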
Uncertainty Management
The third activity is uncertainty management, and it has two primary focuses. First,
it focuses on reducing the uncertainty in the predicted RUL and increasing the
confidence in condition-based maintenance during real-time operation. This can be
done by partitioning uncertainty to quantify the contributions of uncertainties from
individual sources to the uncertainty in the predicted RUL. For example, if sensor
noise and bias are identified to be significant contributors to the uncertainty in the
RUL prediction, a better RUL prediction (with less uncertainty) can be achieved by
improving the quality of the sensors. The second focus of uncertainty management
is on addressing how uncertainty-related information can assist in making decisions
about when and how to maintain the system. Since the uncertainty in the RUL
prediction cannot be eliminated, it is important to take into account the uncertainty
and make optimum decisions under acceptable risk.
[Figure: vibration-based health index of the electric cooling fan units. Reprinted (adapted) with permission from Ref. [79]]
The vibration-based health index is highly random and non-monotonic, but grad-
ually increases as the bearing in the fan degrades over time. For the electric cooling
fan prognostics, the first 20 fan units are employed for the training dataset in the
offline training (Step 1) process, while the rest are used to produce the testing
dataset in the online prediction (Step 2) process.
In this demonstration, the vibration-based health index data of the 20 training
and 24 testing fan units are pre-computed and stored in the MATLAB .mat files,
DuFan.mat and DuFan_test.mat. Step 0.1 in the main code SBI_FanDemo loads
these pre-computed training and testing data sets.
Health Prognostics
RUL prediction with similarity-based interpolation involves two sequential
steps, offline training and online prediction. In offline training, we have the
degradation data from the 20 offline units. An offline unit is tested to failure in the
lab. So we have a complete degradation trajectory from beginning of life (BOL) to
end of life (EOL). We overlay the degradation data from the 20 units on the same
graph (“Offline data” in Fig. 8.46). The intent of offline training is to construct a
fitted curve to represent the trajectory.
In this demonstration, machine learning-based regression (i.e., RVM regression)
is used to fit a degradation curve to the data for each offline unit. By doing so, we
transform these offline data into 20 degradation curves, as shown in Fig. 8.46
(“Background degradation curves”). These curves will be used to predict the RUL
of an online unit. The lines of code for Step 1 in the MATLAB main code
SBI_FanDemo.m implement this curve fitting on all 20 offline training units with
RVM regression.
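As a rough stand-in for the RVM regression used in SBI_FanDemo.m, the sketch below fits one background degradation curve per offline unit with ordinary polynomial least squares; the synthetic data generated here merely mimic the structure of the DuFan.mat training set, and the polynomial order is an arbitrary choice.

% Minimal sketch of Step 1 (offline training) with polynomial least squares
% standing in for RVM regression; all data below are synthetic placeholders.
nTrain = 20;  deg = 3;
D_deg  = cell(nTrain,1);  curves = cell(nTrain,1);
for ki = 1:nTrain
    x = (1:100+randi(50))';                            % cycles to failure (synthetic)
    y = 0.1 + 0.9*(x/x(end)).^2 + 0.03*randn(size(x)); % rising health index (synthetic)
    D_deg{ki}  = [x, y];                               % [cycle index, health index]
    curves{ki} = polyfit(x, y, deg);                   % background degradation curve
end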
Fig. 8.46 Offline training (Step 1) for electric cooling fan prognostics
Unlike an offline unit, an online unit is operating in the field and has not failed
yet. So we only get partial degradation data (see the blue dots in Fig. 8.47) from the
online unit. Now, we have two pieces of information: (1) the degradation curves
from the offline units; and (2) the partial degradation data from the online unit. The
objective is to predict the RUL of this online unit.
Online RUL prediction using similarity-based interpolation is implemented in
the MATLAB supporting code Life_Pred_SBI_FanDemo.m. In the implementa-
tion, an outer “for” loop is used to run RUL prediction on all testing units (e.g., 24
testing fan units in this demonstration) and an inner “for” loop is used to run RUL
prediction of each testing unit based on all training units (e.g., 20 training fan units
in this demonstration).
% Run an outer for loop to predict RULs for all testing units
for ku = 1:nTest
fprintf('\n\n Currently making prediction for testing unit %d \n\n', ku);
% Run an inner for loop over all training units
for ki = 1:nTrain
…
end
…
end
Two steps are involved in RUL prediction of a testing unit. First, we predict the
RUL of the online unit based on the 1st offline unit. The prediction involves the
optimization process in Eq. (8.29). In this process, we move the online data along
the time axis to find the best match with the degradation curve. After the optimum
match is found, we will have a predicted life and the sum of squared errors (SSE) of this match.
Repeating this process for all the other offline units, we can get 20 life predictions.
The inner “for” loop in the MATLAB supporting code Life_Pred_SBI_FanDemo.m
implements this first step.
Fig. 8.47 Online prediction (Step 2) for electric cooling fan prognostics
for ki = 1:nTrain
% Predict RUL of testing unit ku using data from training unit ki
L1 = size(D_deg_test{ku},1); % Length of testing data
L2 = size(D_deg{ki},1); % Length of training data
Wei_Rul = [];
X = D_deg{ki}(:,1); % Extract cycle index data of training
% unit ki
W = MUW{ki}; % Extract RVM model weights of training
% unit ki
if L1 < L2
ndif = L2-L1;
% Use matrix operation to accelerate computation
xpool = (1-L2):1:0; % Define adjusted cycle index of
% training unit ki
% Construct design matrix Phi and calculate prediction by RVM
% regression model
Phi_pred = Kernel_Linearspline_FanDemo(xpool,X);
T_pred = Phi_pred*W;
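% (The lines that populate Wei_Rul are not reproduced in this excerpt; as
% described in the text, they slide the online (testing) data along the offline
% degradation curve, compute the sum of squared errors for each candidate
% alignment, and store the candidate life/error pairs in Wei_Rul.)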
if isempty(Wei_Rul)==0
% Identify RUL that produces the best match between online data
% (testing unit ku) and offline degradation curve (training
% unit ki)
L_Pred = min(Wei_Rul(:,1));
Pred_Results(ki,:) = Wei_Rul(Wei_Rul(:,1)==L_Pred,:);
clear Wei_Rul
else
Pred_Results(ki,:) = [-1,-1];
end
end
RUL_Prediction{ku} = Pred_Results;
Next, we aggregate all the 20 predictions in a weighted-sum form (see Eq. 8.31).
This gives the final predicted RUL. The weight is inversely proportional to the error
of the match. That is, a larger weight will be assigned if the degradation curve of a
certain offline unit shows better agreement with the online data. The lines of code
right below the inner “for” loop in the MATLAB supporting code
Life_Pred_SBI_FanDemo.m implement this second step.
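A rough sketch of what such a weighted-sum aggregation might look like is given below; it assumes (for illustration) that each row of RUL_Prediction{ku} stores the match error in the first column and the candidate life prediction in the second, with [-1, -1] marking training units for which no valid match was found. The actual supporting code may organize these quantities differently.

% Rough sketch of the weighted-sum aggregation (illustrative, not the
% supporting code itself); column layout of RUL_Prediction{ku} is assumed.
P     = RUL_Prediction{ku};
valid = P(:,1) >= 0;                       % discard invalid matches
err   = P(valid, 1);                       % match errors
L     = P(valid, 2);                       % candidate life predictions

w = 1./max(err, eps);                      % similarity weight: inverse of the SSE
w = w / sum(w);                            % normalize the weights
L_final = w' * L;                          % weighted-sum life prediction (cf. Eq. 8.31)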
[Figure: RUL (cycle) versus unit ID (sorted) for the testing fan units]
8.6 Exercises
8.1 Consider a basic model of an isotropic rotor with a rubbing condition. Assume
that the rotor with a lumped mass system is supported by bearings, which can
be modeled as springs and dampers. The motion behavior of the rotor system
can be ideally modeled using a second-order ordinary differential equation as
respectively. Assume that the present day is day 100, and the threshold of
the frequency feature is 0.9. Use a single-term exponential function to
build a prognosis model.
C_i = a_1 \exp(k_1 i) + a_2 \exp(k_2 i)
where Ci is the capacity (Ah) at the ith cycle, a1 and a2 are the coefficients of
the exponential components of capacity fade, and k1 and k2 are the exponential
capacity fade rates. Suppose the model parameters a1, k1, a2, and k2 are
independent random variables whose statistical information is summarized in
Table 8.8.
(1) Generate a synthetic data set of capacity fade from 10 cells through the
following steps:
Step 1: Generate 10 sets of random realizations of the model parameters
based on their distributions.
Table 8.8 Statistical information of the parameters of the capacity fade model for Problem 8.2
Parameter a1 k1 a2 k2
Distribution Normal Normal Normal Normal
Mean (fitted value [83]) −9.860E−07 5.752E−02 8.983E−01 −8.340E−04
Standard deviation 9.860E−08 5.752E−03 8.983E−02 8.340E−05
Fig. 8.50 Synthetic data of capacity fade from two cells. For ease of visualization, capacity measurements are plotted every 5 cycles for both cells
Step 2: Produce a capacity vs. cycle trajectory from cycle 1 to cycle 300
for each set of parameters.
Step 3: Corrupt each trajectory by adding white Gaussian noise with a
mean of 0 Ah and a standard deviation of 0.005 Ah.
An example synthetic data set of capacity fade from 2 cells is shown in
Fig. 8.50, where the normalized capacity at the ith cycle, ci, is computed
as ci = Ci/C0, with C0 being the initial capacity. (A minimal MATLAB
sketch of Steps 1–3 is given at the end of this exercise.)
(2) Here, treat the capacity C as the state variable and a1, k1, a2, and k2 as
the model parameters. The system transition and measurement functions
can then be written as
Transition:   C_i = a_{1,i-1} \exp(k_{1,i-1}\, i) + a_{2,i-1} \exp(k_{2,i-1}\, i) + u_i,
              a_{1,i} = a_{1,i-1} + r_{1,i},   k_{1,i} = k_{1,i-1} + r_{2,i},
              a_{2,i} = a_{2,i-1} + r_{3,i},   k_{2,i} = k_{2,i-1} + r_{4,i}
Measurement:  y_i = C_i + v_i
Here, yi is the capacity measurement at the ith cycle; u, r1, r2, r3, r4, and
v are the Gaussian noise variables with zero means. Use the standard
particle filter method to predict the RUL of each cell at cycles 50 and
150. Compare the prediction accuracy at these two cycles for each cell.
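As referenced at the end of part (1), a minimal MATLAB sketch of the synthetic data generation in Steps 1–3 is given below; it uses the parameter statistics of Table 8.8 and the stated noise level, and it takes the first-cycle capacity as C0 for normalization, which is an assumption for illustration.

% Minimal sketch of the synthetic capacity-fade data generation in part (1).
nCells = 10;  nCycles = 300;
mu  = [-9.860e-07, 5.752e-02, 8.983e-01, -8.340e-04];   % means of a1, k1, a2, k2 (Table 8.8)
sig = [ 9.860e-08, 5.752e-03, 8.983e-02,  8.340e-05];   % standard deviations (Table 8.8)

i = 1:nCycles;                                          % cycle index
C = zeros(nCells, nCycles);
for n = 1:nCells
    p = mu + sig.*randn(1,4);                           % Step 1: sample a1, k1, a2, k2
    C(n,:) = p(1)*exp(p(2)*i) + p(3)*exp(p(4)*i);       % Step 2: capacity trajectory
    C(n,:) = C(n,:) + 0.005*randn(1,nCycles);           % Step 3: add measurement noise
end
c = C ./ C(:,1);                                        % normalized capacity ci = Ci/C0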
References
1. Bond, L. J. (2015). From NDT to prognostics: Advanced technologies for improved quality,
safety and reliability, Invited Keynote. In 12th Far East NDT Forum, Zhuhai, China, May
29–31, 2015.
2. Bond, L. J., Doctor, S. R., Jarrell, D. B., & Bond, J. W. D. (2008). Improved economics of
nuclear plant life management. In Proceedings of the 2nd IAEA International Symposium on
Nuclear Power Plant Life Management, International Atomic Energy Agency, Shanghai,
China, IAEA Paper IAEA-CN-155-008KS.
3. Global industry estimates based on “Industrial Internet: Pushing the Boundaries of Minds &
Machines”. November 26, 2012.
4. Wang, P., Youn, B. D., & Hu, C. (2014). A probabilistic detectability-based sensor network
design method for system health monitoring and prognostics. Journal of Intelligent Material
Systems and Structures. https://doi.org/10.1177/1045389X14541496.
5. Wang, P., Wang, Z., Youn, B. D., & Lee, S. (2015). Reliability-based robust design of smart
sensing systems for failure diagnostics using piezoelectric materials. Computers & Structures,
156, 110–121.
6. Youn, B. D., & Xi, Z. (2009). Reliability-based robust design optimization using the
eigenvector dimension reduction (EDR) method. Structural and Multidisciplinary
Optimization, 37(5), 475–492.
7. Youn, B. D., Choi, K. K., Du, L., & Gorsich, D. (2007). Integration of possibility-based
optimization and robust design for epistemic uncertainty. ASME Journal of Mechanical
Design, 129(8).
8. Adjiman, C. S., Androulakis, I. P., & Floudas, C. A. (2000). Global optimization of
mixed-integer nonlinear problems. AIChE Journal, 46(9), 1769–1797.
9. Wei, J., & Realff, J. (2004). Sample average approximation methods for stochastic MINLPs.
Computers & Chemical Engineering, 28(3), 333–346.
10. New technology captures freely available vibration energy to power wireless sensor. https://
energy.gov/eere/amo/vibration-power-harvesting. Accessed February 25, 2018.
11. McFadden, P. D. (1987). A revised model for the extraction of periodic waveforms by time
domain averaging. Mechanical Systems and Signal Processing, 1, 83–95.
12. Jardine, A. K. S., Lin, D., & Banjevic, D. (2006). A review on machinery diagnostics and
prognostics implementing condition-based maintenance. Mechanical Systems and Signal
Processing, 20, 1483–1510.
13. Bechhoefer, E., & Kingsley, M. (2009). A review of time synchronous average algorithms. In
Annual Conference of the Prognostics and Health Management Society, San Diego, CA.
14. Ha, J. M., Youn, B. D., Oh, H., Han, B., & Jung, Y. (2016). Autocorrelation-based time
synchronous averaging for health monitoring of planetary gearboxes in wind turbines.
Mechanical Systems and Signal Processing, 70–71, 161–175.
15. McFadden, P. D. (1989). Interpolation techniques for time domain averaging of gear
vibration. Mechanical Systems and Signal Processing, 3, 87–97.
16. Strum, R. D., & Kirk, D. E. (1989). First principles of discrete systems and digital signal
processing. Reading, MA: Addison-Wesley.
17. Yin, L., Yang, M., Gabbouj, M., & Neuvo, Y. (1996). Weighted median filters: A tutorial.
IEEE Transactions on Circuits and Systems, 40, 157–192.
18. Ganguli, R. (2002). Noise and outlier removal from jet engine health monitoring signals using
weighted FIR median hybrid filters. Mechanical Systems and Signal Processing, 16(6),
867–978.
19. Neerjarvi, J., Varri, A., Fotopoulos, S., & Neuvo, Y. (1993). Weighted FMH filters. Signal
Processing, 31, 181–190.
20. Hu, C., Youn B. D., Kim, T. J., & Wang, P. (2015). Semi-supervised learning with co-training
for data-driven prognostics. Mechanical Systems and Signal Processing, 62–63, 75–90.
21. Sejdić, E., Djurović, I., & Jiang, J. (2009). Time-frequency feature representation using
energy concentration: An overview of recent advances. Digital Signal Processing, 19(1),
153–183.
22. Gröchenig, K. (2001). Foundations of time-frequency analysis. Boston: Birkhäuser.
23. Mallat, S. G. (1999). A wavelet tour of signal process (2nd ed.). San Diego: Academic Press.
24. Daubechies, I. (1992). Ten lectures on wavelets. Philadelphia: Society for Industrial and
Applied Mathematics.
25. Cohen, L. (1995). Time-frequency analysis. Englewood Cliffs, NJ: Prentice Hall.
26. Dahl, G., Ranzato, M., Mohamed, A.-R., & Hinton, G. E. (2010). Phone recognition with the
mean-covariance restricted Boltzmann machine. In Advances in neural information processing
systems (pp. 469–477). New York: Curran Associates, Inc.
27. Hinton, G., Deng, L., Yu, D., Mohamed, A.-R., Jaitly, N., Senior, A., et al. (2012). Deep
neural networks for acoustic modeling in speech recognition: The shared views of four
research groups. IEEE Signal Processing Magazine, 29(6), 82–97.
28. Hinton, G. E., Osindero, S., & Teh, Y.-W. (2006). A fast learning algorithm for deep belief
nets. Neural Computation, 18(7), 1527–1554.
29. Krizhevsky, A., Sutskever, I., & Hinton, G. (2012). ImageNet classification with deep
convolutional neural networks. In Advances in neural information processing systems (Vol.
25, pp. 1106–1114). New York: Curran Associates, Inc.
30. Mikolov, T., Deoras, A., Kombrink, S., Burget, L., & Cernocký, J. (2011). Empirical
evaluation and combination of advanced language modeling techniques. In INTERSPEECH,
ISCA (pp. 605–608).
31. Socher, R., Huang, E. H., Pennin, J., Manning, C. D., & Ng, A. (2011) Dynamic pooling and
unfolding recursive autoencoders for paraphrase detection. In Advances in neural information
processing systems (pp. 801–809). New York: Curran Associates, Inc.
32. Hinton, G. E., Osindero, S., & Teh, Y. (2006). A fast learning algorithm for deep belief nets.
Neural Computation, 18, 1527–1554.
33. Bengio, Y. (2009). Learning deep architectures for AI. Foundations and Trends in Machine
Learning, 2(1).
34. Fisher, R. A. (1938). The statistical utilization of multiple measurements. Annals of Eugenics,
8, 376–386.
35. Fukunaga, K. (1990). Introduction to statistical pattern recognition (2nd ed.). USA:
Academic Press.
36. Huang, R., Xi, L., Li, X., Richard Liu, C., Qiu, H., & Lee, J. (2007). Residual life predictions
for ball bearings based on self-organizing map and back propagation neural network methods.
Mechanical Systems and Signal Processing, 21, 193–207.
37. Samanta, B. (2004). Gear fault detection using artificial neural networks and support vector
machines with genetic algorithms. Mechanical Systems and Signal Processing, 18, 625–644.
38. Srinivasan, S., Kanagasabapathy, P., & Selvaganesan, N. (2007). Fault diagnosis in deaerator
using neural networks. Iranian Journal of Electrical and Computer Engineering, 6, 62.
39. Saxena, A., & Saad, A. (2007). Evolving an artificial neural network classifier for condition
monitoring of rotating mechanical systems. Applied Soft Computing, 7, 441–454.
40. Yang, B. S., Hwang, W. W., Kim, D. J., & Chit Tan, A. (2005). Condition classification of
small reciprocating compressor for refrigerators using artificial neural networks and support
vector machines. Mechanical Systems and Signal Processing, 19, 371–390.
41. Saimurugan, M., Ramachandran, K. I., Sugumaran, V., & Sakthivel, N. R. (2011). Multi
component fault diagnosis of rotational mechanical system based on decision tree and support
vector machine. Expert Systems with Applications, 38(4), 3819–3826.
42. Ge, M., Du, R., Zhang, G., & Xu, Y. (2004). Fault diagnosis using support vector machine
with an application in sheet metal stamping operations. Mechanical Systems and Signal
Processing, 18, 143–159.
43. Xue, F., Bonissone, P., Varma, A., Yang, W., Eklund, N., & Goebel, K. (2008). An
instance-based method for remaining useful life estimation for aircraft engines. Journal of
Failure Analysis and Prevention, 8(2), 199–206.
44. Nie, L., Azarian, M. H., Keimasi, M., & Pecht, M. (2007). Prognostics of ceramic capacitor
temperature-humidity-bias reliability using Mahalanobis distance analysis. Circuit World,
33(3), 21–28.
45. Baurle, R. A., & Gaffney, R. L. (2008). Extraction of one-dimensional flow properties from
multidimensional data sets. Journal of Propulsion and Power, 24(24), 704–714.
46. Wang, T., Yu, J., Siegel, D., & Lee, J. (2008). A similarity-based prognostics approach for
remaining useful life estimation of engineered systems. In International Conference on
Prognostics and Health Management, Denver, CO, October 6–9, 2008.
47. Saha, B., Goebel, K, Poll, S., & Christophersen, J. (2009). Prognostics methods for battery
health monitoring using a Bayesian framework. IEEE Transaction on Instrumentation and
Measurement, 58(2), 291–296.
48. Gebraeel, N. Z., Lawley, M. A., Li, R., & Ryan, J. K. (2005). Residual-life distributions from
component degradation signals: A Bayesian approach. IIE Transactions on Reliability, 37(6),
543–557.
49. Kwon, D., Azarian, M., & Pecht, M. (2008). Detection of solder joint degradation using RF
impedance analysis. In IEEE Electronic Components and Technology Conference, Lake
Buena Vista, FL, 27–30 May (pp. 606–610).
50. Abbasion, S., Rafsanjani, A., Farshidianfar, A., & Irani, N. (2007). Rolling element bearings
multi-fault classification based on the wavelet denoising and support vector machine.
Mechanical Systems and Signal Processing, 21, 2933–2945.
51. Sun, J., Rahman, M., Wong, Y., & Hong, G. (2004). Multiclassification of tool wear with
support vector machine by manufacturing loss consideration. International Journal of
Machine Tools and Manufacture, 44, 1179–1187.
52. Geramifard, O., Xu, T. X., Pang, C., Zhou, J., & Li, X. (2010). Data-driven approaches in
health condition monitoring—A comparative study. In 8th IEEE International Conference on
Control and Automation (ICCA) (pp. 1618–1622).
53. Ramadass, P., Haran, B., Gomadam, P. M., White, R., & Popov, B. N. (2004). Development
of first principles capacity fade model for Li-ion cells. Journal of the Electrochemical Society,
151, A196.
54. Santhanagopalan, S., Zhang, Q., Kumaresan, K., & White, R. E. (2008). Parameter estimation
and life modeling of lithium-ion cells. Journal of the Electrochemical Society, 155(4),
A345–A353.
55. Yang, L., Agyakwa, P., & Johnson, C. M. (2013). Physics-of-failure lifetime prediction
models for wire bond interconnects in power electronic modules. IEEE Transactions on
Device and Materials Reliability, 13(1), 9–17.
56. Shao, J., Zeng, C., & Wang, Y. (2010). Research progress on physics-of-failure based fatigue
stress-damage model of solderjoints in electronic packing. In Proceedings of Prognostics and
Health Management Conference, January 10–12, 2010 (pp. 1–6).
57. Gordon, N. J., Salmond, D. J., & Smith, A. F. M. (1993). Novel approach to nonlinear/
non-Gaussian Bayesian state estimation. Radar and Signal Processing, IEE Proceedings,
F140(2), 107–113.
58. Kitagawa, G. (1996). Monte Carlo filter and smoother for non-Gaussian nonlinear state space
models. Journal of Computational and Graphical Statistics, 5(1), 1–25.
59. Arulampalam, M., Maskell, S., Gordon, N., & Clapp, T. (2002). A tutorial on particle filters
for online nonlinear/non-Gaussian Bayesian tracking. IEEE Transactions on Signal
Processing, 50(2), 174–189.
60. Tavare, S., Balding, D., Griffiths, R., & Donnelly, P. (1997). Inferring coalescence times from
DNA sequence data. Genetics, 145, 505–518.
61. Weiss, G., & von Haeseler, A. (1998). Inference of population history using a likelihood
approach. Genetics, 149, 1539–1546.
62. Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., & Teller, E. (1953).
Equation of state calculations by fast computing machines. Journal of Chemical Physics,
21(6), 1087–1092.
63. Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their
applications. Biometrika, 57(1), 97–109.
64. Geman, S., & Geman, D. (1984). Stochastic relaxation, Gibbs distributions, and the Bayesian
restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6(6),
721–741.
65. Liu, J. S. (1994). The collapsed Gibbs sampler in Bayesian computations with applications to
a gene regulation problem. Journal of the American Statistical Association, 89(427),
958–966.
66. Tierney, L., & Kadane, J. B. (1986). Accurate approximations for posterior moments and
marginal distributions. Journal of the American Statistical Association, 81, 82–86.
67. Tierney, L., Kass, R. E., & Kadane, J. B. (1989). Approximate marginal densities of nonlinear
functions. Biometrika, 76(3), 425–433.
68. Azevedo-Filho, A., & Shachter, R. (1994). Laplace’s method approximations for probabilistic
inference in belief networks with continuous variables. In R. Mantaras & D. Poole (Eds.),
Uncertainty in artificial intelligence. San Francisco, CA: Morgan Kauffman.
69. Kalman, R. E. (1960). A new approach to linear filtering and prediction problems. Journal of
Basic Engineering, 82(1), 35–45.
70. Jazwinski, A. H. (1970). Stochastic processes and filtering theory. San Diego, CA: Academic.
71. Sorenson, H. W. (Ed.). (1985). Kalman filtering: Theory and application. Piscataway, NJ:
IEEE.
72. Julier, S. J., & Uhlmann, J. K. (2004). Unscented filtering and nonlinear estimation.
Proceedings of IEEE, 92, 401–422.
73. Arulampalam, S., Maskell, S., Gordon, N., & Clapp, T. (2002). A tutorial on particle filters for
on-line non-linear/non-Gaussian Bayesian tracking. IEEE Transaction on Signal Processing,
50(2), 174–188.
74. Cappe, O., Godsill, S. J., & Moulines, E. (2007). An overview of existing methods and recent
advances in sequential Monte Carlo. IEEE Proceedings, 95(5), 899–924.
75. Hu, C., Jain, G., Tamirisa, P., & Gorka, T. (2014). Method for estimating capacity and
predicting remaining useful life of lithium-ion battery. Applied Energy, 126, 182–189.
76. Honkura, K., Takahashi, K., & Horiba, T. (2011). Capacity-fading prediction of lithium-ion
batteries based on discharge curves analysis. Journal of Power Sources, 196(23), 10141–
10147.
77. Brown, J., Scott, E., Schmidt, C., & Howard, W. (2006). A practical longevity model for
lithium-ion batteries: De-coupling the time and cycle-dependence of capacity fade. In 208th
ECS Meeting, Abstract #239.
78. Wang, P., Youn, B. D., & Hu, C. (2012). A generic probabilistic framework for structural
health prognostic and uncertainty management. Mechanical Systems and Signal Processing,
28, 622–637.
79. Hu, C., Youn, B. D., Wang, P., & Yoon, J. T. (2012). Ensemble of data-driven prognostic
algorithms for robust prediction of remaining useful life. Reliability Engineering & System
Safety, 103, 120–135.
80. Orchard, M., Kacprzynski, G., Goebel, K., Saha, B., & Vachtsevanos, G. (2008). Advances in
uncertainty representation and management for particle filtering applied to prognostics. In
International Conference on Prognostics and Health Management, 2008. PHM 2008,
October 2008 (pp. 1–6).
81. Tang, L., Kacprzynski, G., Goebel, K., & Vachtsevanos, G. (2009). Methodologies for
uncertainty management in prognostics. In 2009 IEEE Aerospace Conference, March 2009
(pp. 1–12).
82. Sankararaman, S., & Goebel, K. (2015). Uncertainty in prognostics and systems health
management. International Journal of Prognostics and Health Management, 6, Special Issue
on Uncertainty in Prognostics and Health Management.
83. He, W., Williard, N., Osterman, M., & Pecht, M. (2011). Prognostics of lithium-ion batteries
based on Dempster-Shafer theory and the Bayesian Monte Carlo method. Journal of Power
Sources, 196(23), 10314–10321.
Chapter 9
Case Studies: Prognostics and Health
Management (PHM)
Steam turbines in power plants are large and complex mechanical rotating systems.
Generally, rigid couplings are used to connect three to five stages of shafts, as
shown in Fig. 9.1. Steam turbine systems are typically composed of a high-pressure
(HP) turbine, an intermediate-pressure (IP) turbine, two low-pressure (LP) turbines,
a generator, and an exciter. Each shaft is supported by two journal bearings. An oil
film between the journal bearing and the turbine shaft prevents direct contact
between the rotor and the stator. These bearings ensure the turbine system operates
steadily. Although turbines are designed to operate in a stable condition, various
uncertainties exist, such as operation uncertainty, manufacturing variability, and
installation uncertainty. In adverse conditions, various anomaly states can be found
in a turbine system, for example unbalance, misalignment, rubbing, and oil whirl,
among others. The work described here as a PHM case study examined a number of
steam turbine systems from four power plant sites in Korea.
[Fig. 9.1: steam turbine shaft train with HIP, LP-A, and LP-B stages supported by journal bearings]
Vibration signals are commonly used for health diagnostics and prognostics of
steam turbines. Even under normal conditions, a bit of unbalance is present in the
turbine rotors, resulting in a certain level of vibration when rotating. As anomaly
states begin developing, the level of vibration increases. Sophisticated analysis of
vibration signals is essential for robust diagnostics and prognostics of the turbine
rotors. In addition to vibration sensors, temperature and pressure signals can also be
acquired for PHM of turbine rotors. Data acquisition must be carefully designed to
account for the data-sampling rate and frequency, logging power and storage, and
other factors.
Proximity sensors are widely used for condition monitoring of journal bearing
rotor systems in turbines. The sensors directly measure gap displacement signals
between the turbine shaft and the sensor. These measurements are represented by
vibration signals that are acquired via the AC component of the signals. The DC
component of the signals represents the absolute position of the turbine shaft
centerline. Since the gap information provides information about the dynamic
behavior of the turbine system, the state of the turbine can be accurately determined
by using vibration signals to analyze the behavior of the turbines [1, 2].
To robustly detect potential anomaly states, the number and location of prox-
imity sensors must be carefully considered. Adding more sensors increases the
data analysis load and the required data storage capacity. In addition, the
high-temperature environment in steam turbines limits sensor placement. Sensors
are often placed between the coupling and the bearing seal to limit the effect of the
high-temperature steam. Considering these practical aspects, the sensors are typi-
cally placed at two positions for each turbine stage, adjacent to the journal bearings,
as illustrated in Fig. 9.2. For example, in a three-stage steam turbine, vibration signals
are acquired from six different axial positions. For each axial location, two prox-
imity sensors are placed orthogonally to obtain the orbit trace of the turbine cen-
terline. In total, twelve proximity sensors are used for PHM in a three-stage steam
turbine.
Other signal measurement specifications—period, sampling rate, and duration—
must also be carefully determined when designing the sensing function. The period
between the signal measurements should be short enough to detect abrupt degra-
dation while still being long enough to minimize the burden of data storage. Signals
should be measured when an event (e.g., anomaly state, high vibration) is detected.
Fig. 9.2 Axial locations of proximity sensors. Reprinted (adapted) with permission from Ref. [3]
The objective of the reasoning function is to extract the health data of steam
turbines using measured vibration signals. This requires some key reasoning steps,
including: preprocessing, feature extraction, feature selection, and classification.
These are the steps required for the supervised machine learning method.
Preprocessing
Vibration signals must be processed before signal features are extracted. The
rotational speed of a steam turbine varies within 3600 ± 20 rpm; however, larger variations
can occur due to uncertainties. These variations in rpm may lead to inconsistent
reasoning results because the fixed sampling rate is likely to yield different sampled
data at given time intervals. This uncertainty can be controlled by applying phase
synchronized resampling, also known as angular resampling, to the acquired
vibration signals, as shown in Fig. 9.3 [4, 5]. Using the tachometer peaks as the
starting point of a revolution, the signals can be resampled to have an equal number
Fig. 9.3 Phase-synchronized resampling of vibration signals: (a) tachometer signal, (b) raw signal, and (c) resampled signal
of points per revolution. The resampled vibration signals will then give consistent
results, despite rpm variations.
Feature Extraction
Next, the resampled vibration signals are used to extract candidate features.
Based on the information from complete revolutions, candidate features can be
extracted with minimal noise in the order domain. Since steam turbines rotate
mostly at a steady state, time- and frequency-domain features can be used.
Candidate features include eight time-domain features and eleven frequency–do-
main features. The extracted features are presented in Tables 9.1 and 9.2 [6, 7].
Among the eight time-domain features, max, mean, and root-mean-square
(RMS) are related to the energy of the vibration. Skewness and kurtosis, which are
the third and fourth statistical moments, respectively, represent the statistical
characteristics. The last three features—crest factor, shape factor, and impulse
ðN1Þs4
t6 Crest factor Max
RMS
t7 Shape factor RMS
Mean
t8 Impulse factor Max
Mean
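For illustration, the eight time-domain features can be computed from a single vibration record as sketched below; the signal is a synthetic placeholder, and the use of absolute values in the max and mean features is an assumption about the definitions in Table 9.1.

% Minimal sketch: the eight time-domain features for one vibration record.
x   = sin(2*pi*60*(0:1/10240:1)) + 0.1*randn(1,10241);   % synthetic vibration signal
N   = numel(x);
xm  = mean(x);
s   = std(x);                                            % sample standard deviation

f_max      = max(abs(x));
f_mean     = mean(abs(x));
f_rms      = sqrt(mean(x.^2));
f_skewness = sum((x - xm).^3) / ((N-1)*s^3);             % third statistical moment
f_kurtosis = sum((x - xm).^4) / ((N-1)*s^4);             % fourth statistical moment
f_crest    = f_max / f_rms;                              % crest factor
f_shape    = f_rms / f_mean;                             % shape factor
f_impulse  = f_max / f_mean;                             % impulse factor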
Table 9.2 Frequency-domain features (excerpt: f3–f11)
f3 (RVF): [ ∫ (f − FC)^2 s(f) df / ∫ s(f) df ]^{1/2}
f4 (0.5×/1×): [ s(f_{0.5×}) / s(f_{1×}) ]^{1/2}
f5 (2×/1×): [ s(f_{2×}) / s(f_{1×}) ]^{1/2}
f6 ((1×–10×)/1×): Σ_{n=1}^{10} [s(f_{n×})]^{1/2} / [s(f_{1×})]^{1/2}
f7 ((0–0.39×)/1×): [ ∫_{0}^{0.39×} s(f) df ]^{1/2} / [s(f_{1×})]^{1/2}
f8 ((0.4×–0.49×)/1×): [ ∫_{0.4×}^{0.49×} s(f) df ]^{1/2} / [s(f_{1×})]^{1/2}
f9 ((0.51×–0.99×)/1×): [ ∫_{0.51×}^{0.99×} s(f) df ]^{1/2} / [s(f_{1×})]^{1/2}
f10 ((3×–5×)/1×): [ ∫_{3×}^{5×} s(f) df ]^{1/2} / [s(f_{1×})]^{1/2}
f11 ((3×, 5×, 7×, 9×)/1×): Σ_{n=1}^{4} [s(f_{(2n+1)×})]^{1/2} / [s(f_{1×})]^{1/2}
Here s(f) denotes the vibration spectrum, FC the frequency center, and 1× the shaft rotating frequency.
robust regardless of directionality. Researchers have also reported that the perfor-
mance of the diagnosis can be enhanced by using the ODR method without extra
sensors [8].
Feature Selection and Classification
The feature selection process is of great importance when the number of features is
large. This process determines the optimal feature subset from a set of candidate
features. Each feature possesses a different separation ability to distinguish the health
states of the steam turbine. To obtain robust reasoning for steam turbines, a genetic
algorithm is integrated with different separability measures, such as probability-of-
separation (PoS) or correlation coefficient [3, 9]. A genetic algorithm can be used to
randomly generate subsets of features, and the degrees of separability can be measured
for the subsets using either the PoS or a correlation coefficient method. Subset
generation is repeated until a predefined criterion is satisfied. Note that PoS is a
separability measure that quantifies the degree of separation between two classes.
Once the optimal feature subset is determined, classification continues using the
support vector machine (SVM) method to minimize structural risk [10, 11].
Through an optimization process, hyper-planes that separate multiple states are
trained using the data from known states. Using the trained classifier, the unknown
states of a turbine can be predicted.
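The GA-based subset search and the PoS measure themselves are not reproduced here; the sketch below illustrates the same wrapper idea using a random subset search scored by a simple Fisher-type separability measure, followed by SVM training with MATLAB's fitcsvm. The data, subset-generation scheme, and separability score are placeholders.

% Minimal sketch of wrapper-style feature selection followed by SVM classification.
rng(1);
X = [randn(100,19); randn(100,19) + 0.8];      % 200 samples x 19 candidate features
y = [zeros(100,1); ones(100,1)];               % health-state labels (2 states)

bestScore = -inf;  bestSet = [];
for iter = 1:200                               % random subset generation
    subset = find(rand(1,19) < 0.3);           % candidate feature subset
    if isempty(subset), continue; end
    m0 = mean(X(y==0,subset)); m1 = mean(X(y==1,subset));
    v0 = var(X(y==0,subset));  v1 = var(X(y==1,subset));
    score = mean((m0 - m1).^2 ./ (v0 + v1));   % simple separability measure
    if score > bestScore, bestScore = score; bestSet = subset; end
end

mdl  = fitcsvm(X(:,bestSet), y);               % train an SVM on the selected subset
yhat = predict(mdl, X(:,bestSet));             % classify (here, on the training data)
fprintf('Training accuracy: %.2f\n', mean(yhat == y));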
The prognostics function predicts the remaining useful life (RUL) of a steam tur-
bine. To develop the prognostics function, a health index is defined for each
anomaly state, as shown in Table 9.3. Vpp is the peak-to-peak value of the vibration
signal, clearance denotes the clearance of an oil seal in a steam turbine, and f8
indicates one of the frequency-domain features shown in Table 9.2. Each health
index, defined as a function of the extracted features, considers certain character-
istics of the anomaly conditions. The indices are normalized between zero and one.
A simple example is an unbalanced state in a turbine. The health index of unbalance
is the peak-to-peak level divided by the clearance level. After the health indices are
defined, the RUL can be calculated by tracing the trend of the indices, as presented
in Fig. 9.4. The blue points in the figure are used to establish the RUL model. The
model is used for estimating the RUL of the steam turbine, as shown by the green
dotted line.
An accurately calculated RUL provides the basis for condition-based monitoring
(CBM) of steam turbines. CBM of steam turbines helps steam turbine operators
schedule maintenance actions based on the analyzed condition of the turbines.
Operators can minimize unnecessary maintenance actions and can prevent catas-
trophic turbine failures by reliably predicting turbines’ health conditions. It has
been reported in [12] that CBM can substantially save on overall operation and
maintenance (O&M) costs.
The drivetrain of a wind turbine consists primarily of the main bearing, drive shaft,
gearbox, and generator, as shown in Fig. 9.5. The main bearing supports the blades,
and the gearbox connected to the drive shaft increases the rotating speed. Rotating
energy from the high-speed shaft (i.e., the output shaft of the gearbox) is transferred
to the generator. Wind turbines generally operate in harsh environmental condi-
tions, such as random and non-stationary wind profiles. Moreover, offshore wind
turbines are exposed to highly varying sea gusts with high salt concentrations.
Thus, the rotating parts in the drivetrain of a wind turbine are prone to a variety of
mechanical failures, such as fatigue, wear, and corrosion. Among the drive-train
components of a wind turbine, the gearbox is known to have the highest risk due to
its potential for an excessively long downtime and its expensive replacement cost.
This section discusses the four main PHM functions for wind turbine gearboxes.
Wind turbines are equipped with two kinds of data acquisition (DAQ) systems: SCADA
(Supervisory Control and Data Acquisition) and CMS (Condition Monitoring
System). The main purpose of both SCADA and CMS is to provide data for
integrated performance and health management in wind turbines.
Although SCADA was originally designed for performance management, recent
studies have shown that SCADA data can also serve as a precursor to represent the
health state of a wind turbine; data can also be used to enhance the performance of
PHM [13].
SCADA basically measures four kinds of signals: environmental data
(e.g., wind speed and direction), operational data (e.g., power and rotor speed),
control data (e.g., yaw and pitch control), and response data (e.g., temperatures
of the main bearing and gearbox shaft). Data collected at a very low frequency, for
example once every 10 min, make it possible to evaluate the performance behavior
of a wind turbine. Some failure types of the gearbox affect the overall system
performance, thus leading to anomaly conditions such as increased temperature of
the lubrication oil or unexpected speed fluctuations due to irregular load conditions.
Thus, the use of SCADA data as a means for performing PHM is an emerging
trend.
CMS requires various kinds of signal analysis, such as vibration analysis, oil
debris analysis, and noise analysis, to enhance PHM capabilities. Among the var-
ious analysis methods available, a recent study reported that typical mechanical
failures can be detected most sensitively using vibration analysis [14]. DNV GL, one
of the most important organizations for certification of wind turbines, enacted a
regulation that every system should acquire high-frequency vibration signals (in
addition to SCADA) for the purpose of health assessment in wind turbines.
DNV GL certification established a guideline regarding the necessary number of
vibration sensors that should be installed on the drivetrain of a wind turbine.
According to the certification, at least four vibration sensors, with a sampling rate of
more than 10 kHz, should be installed on each gearbox for condition monitoring
purposes. A typical wind turbine gearbox consists of multiple stages of gear sets,
including at least one planetary stage. Vibration sensors should be mounted near
locations where a high load is applied. In general, it is suggested that sensors be
positioned at each gear stage of the gearbox. When a planetary gearbox is of interest
for PHM, DNV GL certification suggests placing the sensors in “the area of the ring
gear of the planetary gear stage” and at “the level of the sun gear in the 1st stage of
the planetary gear stage” where vibration signals from gear meshing can be
effectively captured.
Recent studies showed that the integrated measurement of SCADA signals and
vibration signals can enhance the performance of PHM for wind turbine gearboxes.
For example, to reduce the effects of speed variations within the system, several
commercial CMSs are attempting to analyze the vibration signals only when a wind
turbine operates under nominally constant rotational speed and torque. This can be
achieved by adaptively measuring vibration signals, while continuously monitoring
the performance of wind turbines via SCADA signals.
Preprocessing
Wind turbine gearboxes consist of multiple stages of gear sets and many bearings.
The effects of the gearbox components are mixed together, and are measured using
an accelerometer. Thus, to enhance fault detectability, preprocessing techniques
should be used on vibration signals. As discussed in Sect. 8.3.1 of Chap. 8, one of
the most widely used preprocessing techniques is time synchronous averaging
(TSA). TSA serves as a signal separator that facilitates selective analysis of a
particular vibration signal that is estimated to originate from a single component of
interest in the gearbox [15]. Figure 9.6 illustrates the TSA procedure. In TSA, the
signal is separated by gear rotating frequency and ensemble averaged. The deter-
ministic signal, which is synchronously regular with the gear rotation, remains.
However, irregular or noise signals asymptotically converge to zero as the number
of averages in the ensemble increases. Using the signal processed with TSA,
additional analysis can be performed by defining residual signals (RES) and dif-
ference signals (DIF) that represent the energy of sidebands and noise, respectively.
RES can be calculated from TSA by filtering out regular components, including the
fundamental gear mesh frequency and its harmonics. Information about pure
sidebands can be observed from RES signals. DIF is calculated from RES by
filtering out sidebands. In the normal state, DIF should be ideally a white Gaussian
noise because there are no regular or irregular meshing components in the fre-
quency domain. When a fault occurs, an increase in the energy of sidebands and
unexpected frequency components can be detected well using RES and DIF.
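A minimal sketch of TSA is given below, assuming the tachometer provides the sample index at the start of each revolution; the signal, sampling rate, and revolution length are placeholders.

% Minimal sketch of time synchronous averaging (TSA); data are placeholders.
fs   = 20480;                                 % sampling rate (Hz), assumed
x    = randn(1, fs*10);                       % vibration signal (placeholder)
revIdx = 1:1024:numel(x);                     % start index of each revolution (from tacho)
nPts = 1024;                                  % points per revolution after resampling

nRev = numel(revIdx) - 1;
tsa  = zeros(1, nPts);
for r = 1:nRev
    seg  = x(revIdx(r):revIdx(r+1));          % one revolution of the raw signal
    segR = interp1(1:numel(seg), seg, linspace(1, numel(seg), nPts));  % angular resampling
    tsa  = tsa + segR / nRev;                 % ensemble average over revolutions
end
% The synchronous (gear-related) content remains in tsa; asynchronous noise
% averages toward zero as the number of revolutions increases.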
Feature Extraction
Two feature extraction techniques are discussed here, based on: (a) SCADA and
(b) vibration. First, the SCADA-based technique can be performed by monitoring
power quality and temperature trends. Both the data-driven approach and the
model-based approach have four essential steps: (1) define a data-driven or
physics-based model that estimates the output signal (i.e., power or temperature) of
a gearbox based on the input signals (e.g., speed, torque, and power) under the
normal state, (2) continuously measure the input signals and calculate the estimated
output signals using the established model (while simultaneously collecting the
measured output signals), (3) calculate residuals, which are defined as the difference
between the estimated and the measured output signals, and (4) correlate the
residuals and the health state. As an example, Fig. 9.7 illustrates the health rea-
soning process of a wind turbine gearbox where the gearbox oil outlet temperature
was selected as the reference SCADA data. Applicable modeling techniques
include physics-based modeling techniques, such as temperature modeling using a
heat generation model, and data-driven modeling techniques, such as principal
component analysis (PCA), neural networks (NN), auto-associative kernel regres-
sion (AAKR), and the nonlinear state estimate technique (NSET).
Fig. 9.7 Health reasoning for a wind turbine gearbox using temperature monitoring
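As a minimal sketch of this residual-monitoring idea, the code below uses an ordinary linear regression of the gearbox oil temperature on a few SCADA inputs as a stand-in for the PCA/NN/AAKR/NSET models mentioned above; all data, variable names, and the 3-sigma alarm threshold are placeholders.

% Minimal sketch of SCADA-based residual monitoring with a linear-regression
% stand-in for the normal-behavior model; all data are synthetic placeholders.
n = 2000;
speed  = 10 + 2*randn(n,1);                   % rotor speed
power  = 1500 + 300*randn(n,1);               % output power
torque = power ./ max(speed, 1);              % crude torque proxy
Tgear  = 40 + 0.01*power + 0.5*speed + randn(n,1);   % measured oil temperature

X = [ones(n,1), speed, power, torque];        % (1) normal-state model inputs
idxTrain = 1:1000;                            %     fit the model on healthy data
beta = X(idxTrain,:) \ Tgear(idxTrain);

That = X*beta;                                % (2) estimated output for all samples
res  = Tgear - That;                          % (3) residuals
thr  = 3*std(res(idxTrain));                  % (4) flag residuals outside a 3-sigma band
alarm = abs(res) > thr;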
features, thus enabling timely and preventive maintenance and decision making
(e.g., CBM [17]).
For effective maintenance decisions, it is of great importance to understand the
lead-time needed for maintenance of a target system. Wind turbine maintenance
cannot occur instantly after predicting an anomaly state of the turbines due to
numerous practical difficulties, such as remote location of the turbines, part pro-
curement and supply, availability of maintenance crews and facilities, site acces-
sibility, etc. This problem is even worse when wind turbines are located offshore.
Thus, the lead-time to prepare for suggested maintenance must be carefully ana-
lyzed. Moreover, cost analysis becomes an important element when dealing with a
large-scale wind farm comprised of a fleet of wind turbines. Because even a single
maintenance action is accompanied by large expenses for a sea vessel with a crane,
crews, skilled technicians, and so on, maintenance schedules should be optimized to
minimize the overall operation and maintenance (O&M) cost. Thus, the health
management function should make optimal maintenance decisions while account-
ing for the lead-time and cost structure. Additional details on these considerations
can be found in [18], which contains a general theoretical background, and in [19]
which examines applications in wind farms.
where I0 and f are the current amplitude and AC frequency, respectively. The core
vibration is generated by a phenomenon called magnetostriction, and its accelera-
tion ac is proportional to the square of loading voltage U with an amplitude U0 as
Both vibration sources have twice the AC frequency as the fundamental frequency.
It is known that the core vibration has harmonic frequency components as well, due
to magnetization hysteresis and the complexity of the core structure [21]. Thus,
transformer vibration is a key physical quantity that can be used for PHM of power
transformers.
In order to measure the vibration of a transformer, acceleration sensors are often
installed on the outer surface of the transformer (see Fig. 9.8). Vibration is prop-
agated into the sensors through the insulating oil inside the transformer. These
sensors cannot be installed inside the transformer due to the insulating oil and high
electro-magnetic field.
As shown in Fig. 9.8, transformer vibration is measured at numerous locations,
specifically 32–162 locations, depending on the transformer’s size and type.
Measurement can be problematic due to (1) costly sensor installation and mainte-
nance, (2) management and prohibitively large processing times due to the amount
of data, and (3) acquisition of unnecessary data. Thus, sensor networks must be
optimally designed for cost-effective PHM. Measured or simulated vibration data
can be used to optimize design of the sensor network. Decision variables include
the type, number, and locations of sensors [22]. It is not easy for the
simulation-based approach to take into account various uncertainty sources (e.g.,
operating conditions, maintenance, and manufacturing). Thus, six optimal sensor
positions were first found using measured vibration data. Given the fundamental and
harmonic frequencies found as features of the vibration signals in the transformers, the
six optimally placed sensors show feature behavior equivalent to that of the full
sensor set, as presented in Fig. 9.9.
Two health measures are defined here: the fundamental frequency and the harmonic
frequency. They are referred to as the fundamental health measure (FHM) and the
harmonic health measure (HHM), respectively. As addressed in Sect. 9.3.1, they
play an important role in assessing the health condition of the core and the winding.
Any mechanical failure in the core can increase the vibration energy in both the
fundamental and harmonic frequencies, whereas a failure in the winding can only affect
the fundamental frequency. Mathematically, the two health measures, FHM and
HHM, can be expressed as
where S_i^{fund} and S_i^{harm} are the spectral responses of the vibration signals at the
fundamental (120 Hz) and harmonic (240 Hz) frequencies measured at the ith
sensor, and {k} is the sensor set obtained from the sensor position optimization
process. Figure 9.10 shows the two health measures for three groups of power
transformers. The different groups are defined in terms of age: Group A is 21 years
old, Group B is 14 years old, and Group C is 6 years old. From the spread of the
two health measures, it can be inferred that (1) a faulty winding in Group A leads to
a high FHM, (2) a faulty core in Group B leads to a high HHM, and (3) the low
values of both health measures in Group C indicate no fault findings. Later, it was confirmed that
the transformers in Groups A and B had been replaced according to field experts’
decisions, and the expected mechanical faults of these replaced transformers were
also observed.
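Because the defining equations of FHM and HHM are not reproduced in this excerpt, the sketch below simply averages the spectral amplitudes at 120 Hz and 240 Hz over the optimized sensor set as one plausible form of the two measures; the signals, sampling rate, and aggregation are assumptions.

% Minimal sketch of fundamental/harmonic health measures from sensor spectra;
% the aggregation over sensors (a simple mean) is an assumption.
fs = 3840;  N = fs;                            % 1-s record, assumed sampling rate
t  = (0:N-1)/fs;
k  = 6;                                        % size of the optimized sensor set
acc = repmat(0.5*sin(2*pi*120*t)' + 0.1*sin(2*pi*240*t)', 1, k) + 0.05*randn(N,k);

S  = abs(fft(acc))/N;                          % amplitude spectra, one column per sensor
f  = (0:N-1)*fs/N;
[~, i120] = min(abs(f - 120));                 % spectral bins at 120 Hz and 240 Hz
[~, i240] = min(abs(f - 240));

FHM = mean(S(i120, :));                        % fundamental health measure (assumed form)
HHM = mean(S(i240, :));                        % harmonic health measure (assumed form)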
This subsection briefly discusses strategies for predicting the RUL of power
transformers in light of potential mechanical failures. Determining the threshold for
mechanical failure of a transformer is an important step for the health prognostics
task. This study defines the threshold based upon historical data and experts’
opinion. A health degradation model can be developed using the health data
acquired from many transformers over different operational periods. The RUL is then
estimated by extrapolating the health degradation model to the failure threshold.
Power generators are key components in power plants. Power generators convert
kinetic energy into electrical energy. A power generator generally consists of a
stator and a rotor, as shown in Fig. 9.12a. Reliable operation of power generators is
essential because unexpected breakdowns of a generator can lead to power plant
shutdown and substantial related economic and societal losses. Typically, the stator
winding, which is composed of slender copper strands, is one of the most vul-
nerable locations of the power generator. In an anomaly condition, the exterior
insulation of the stator winding can deteriorate due to moisture from inner coolant
channels, as shown in Fig. 9.12b. This water absorption in stator winding insulation
is an indirect cause of catastrophic failure, as shown in Fig. 9.12c. This section
describes how a smart health reasoning system for power generator windings can
mitigate downtime due to moisture absorption.
Fig. 9.12 a Structure of a power generator, b cross-sectional view of a winding, and c catastrophic
failure of a power generator
In a water-cooled power generator, coolant water flows into the water channels of
the winding. Sometimes, leakage occurs and water is absorbed into and remains in
the winding insulation. Leakage can be caused by various operational stresses, such
as mechanical vibration, thermal shock, and crevice corrosion. The water that
remains in the insulation degrades the winding insulation [24], which can cause the
insulation to break down and ultimately cause the power generator to fail. For this
reason, electric companies or manufacturing companies assess the health condition
of the winding insulation using water absorption detectors. A water absorption
detector infers the presence of water in the insulation by measuring capacitance of
the insulation [25]. Because the relative static permittivity (or the dielectric con-
stant) of water is higher than that of mica (which is what is generally used as the
insulation material), wet insulation has a higher capacitance C, based upon the
following equation:
C = \varepsilon_r \varepsilon_0 \frac{A}{t}     (9.5)
where A is the measurement area, t is the distance between the plates, ε0 is the
electric constant (ε0 ≈ 8.854 pF/m), and εr is the relative static permittivity of the
material.
Capacitance measurements as health data provide valuable information that can
be used to infer the amount of water absorption in a stator winding. Thus,
health-relevant information about the winding can be extracted from capacitance
data. The power generators employed in the study described here have forty-two
windings and are water-cooled. As shown in Fig. 9.13, the assembly slot for both
the top and bottom groups contains ten measurement points. Each measurement
point can be modeled as a random variable, X1–X10.
The capacitance data acquired at these measurement points were modeled with
statistically correlated random variables, Xi. One way to measure the correlation
between two random variables is to use the Pearson product-moment correlation
Fig. 9.13 Structure diagram of a water-cooled power generator with a 2-path system. Reprinted
(adapted) with permission from Ref. [26]
coefficient. Table 9.4 summarizes the correlation coefficients for the ten random
variables in a matrix form. The highlighted values in this table are the correlation
coefficients between the measurement variables within the same group (e.g., the
CET group). One can observe two features from the highlighted values: (1) the correlations are statistically positive, and (2) the degree of correlation is higher within the same group. These features indicate that the two or three capacitance measurements from the same group tend to be nearly linearly dependent. Based upon the
measurement location and the correlation features, the measurement points with
high correlation can be conceived as individual data groups, such as CET, CEB,
TET, and TEB. This implies that one entire dataset for ten random variables would
be split into four groups, each of which consists of two or three random variables.
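The grouping step can be reproduced with a few lines of code. The sketch below computes the Pearson product-moment correlation matrix from a matrix of capacitance measurements and groups variables whose pairwise correlation exceeds a threshold; the synthetic block-correlated data and the threshold value are assumptions used only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic capacitance-like data with four correlated blocks (3, 2, 3, 2 variables),
# standing in for the measured CET/CEB/TET/TEB groups; values are illustrative only.
n_bars = 42
blocks = [3, 2, 3, 2]
cols = []
for size in blocks:
    shared = rng.normal(size=(n_bars, 1))                 # common factor within a group
    cols.append(shared + 0.8 * rng.normal(size=(n_bars, size)))
data = np.hstack(cols)                                    # shape (42, 10): X1..X10

corr = np.corrcoef(data, rowvar=False)                    # 10 x 10 Pearson correlation matrix

# Greedily group variables whose correlation with the group's seed variable exceeds
# a threshold; the value 0.3 is an assumed cut-off chosen only for illustration.
threshold, groups, assigned = 0.3, [], set()
for i in range(corr.shape[0]):
    if i in assigned:
        continue
    group = [i] + [j for j in range(i + 1, corr.shape[0])
                   if j not in assigned and corr[i, j] > threshold]
    assigned.update(group)
    groups.append(group)
print(groups)   # expected: [[0, 1, 2], [3, 4], [5, 6, 7], [8, 9]]
```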
Although the capacitance data are relevant to the health condition of the stator
winding, the high dimensionality and non-linearity of the data make it difficult to
infer the health condition easily and precisely. To address this situation, this section
introduces the definition of a new health index, namely Directional Mahalanobis
Distance (DMD). Traditional Mahalanobis distance (MD) is a relative health mea-
sure that quantifies the deviation of a measured data point from a clustered data
center, which is generally the population mean (μ) of a dataset [27]. MD reduces multi-dimensional data (X) to a one-dimensional distance measure while taking into account the statistical correlation between the random variables, as shown in Fig. 9.14.
As compared to the Euclidean distance, the MD measure possesses a few unique
advantages: (1) MD transforms a high-dimensional dataset that is complicated to handle into a one-dimensional measure that is easy to comprehend and quick to compute. (2) MD is robust to differing scales of the measurements, as MD
values are calculated after normalizing the data. (3) By taking into account the
correlation of the dataset, MD is sensitive to inter-variable changes in multivariate
measurements. However, MD has its own limitation in that it is a
direction-independent health measure in the random capacitance space. In other
words, two capacitance measurements with the same MD values, but one with a
higher capacitance value and the other with a lower value, are treated equally,
although they most likely imply two different levels of moisture absorption. Thus,
Directional Mahalanobis Distance (DMD) is proposed to overcome this limitation
of MD. DMD can be expressed as follows:
$$\mathrm{DMD}_{\mathbf{X}_i} = \sqrt{\left(\tilde{\mathbf{X}}_i - \boldsymbol{\mu}\right)^{T}\boldsymbol{\Sigma}^{-1}\left(\tilde{\mathbf{X}}_i - \boldsymbol{\mu}\right)} \qquad (9.6)$$

$$\tilde{X}_{n,i} = \begin{cases} X_{n,i}, & \text{if } X_{n,i} > \mu_n \\ \mu_n, & \text{otherwise} \end{cases} \qquad (9.7)$$
Table 9.4 Correlation coefficient matrix (symmetric) for ten random variables. Reprinted (adapted) with permission from Ref. [26]

                  X1       X2       X3       X4       X5       X6       X7       X8       X9       X10
CET  TOP (X1)   1
     OUT (X2)   0.4761   1
     IN  (X3)   0.4194   0.5503   1
CEB  OUT (X4)   0.0849   0.1572   0.1354   1
     IN  (X5)  −0.039    0.1686   0.0765   0.3445   1
TET  TOP (X6)   0.3341   0.1553   0.1868   0.0343  −0.052    1
     OUT (X7)   0.1972   0.2506   0.2729   0.0879   0.0171   0.4377   1
     IN  (X8)   0.2295   0.1423   0.3296   0.0082   0.0457   0.4269   0.4900   1
TEB  OUT (X9)   0.0438  −0.128   −0.097    0.0186  −0.114    0.0887  −0.010   −0.003    1
     IN  (X10)  0.0354  −0.040   −0.004    0.0457   0.0870  −0.048    0.1084   0.0215   0.3385   1
Fig. 9.14 Healthy and faulty points located in the a original space and b transformed space.
Reprinted (adapted) with permission from Ref. [28]
where Xn,i denotes the raw capacitance data at the nth measurement location of the ith bar unit, μn is the mean of the capacitance data at the nth measurement location, and X̃n,i denotes the processed capacitance data. Through the projection process in Fig. 9.15b, data lying in the clearly healthy direction (i.e., below the mean) are ignored in the subsequent transformation. This data projection underscores the need to consider direction in the health reasoning process. It gives the proposed index its unique capability of using both the distance and the degradation direction as a health measure, as shown in Fig. 9.15c.
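A minimal sketch of the DMD computation in Eqs. 9.6 and 9.7 is shown below, assuming the mean vector and covariance matrix are estimated from a set of healthy reference measurements (the data here are synthetic).

```python
import numpy as np

def directional_mahalanobis_distance(x, mu, cov):
    """DMD of one measurement vector x (Eqs. 9.6-9.7).

    Values below the mean are projected onto the mean (Eq. 9.7) so that only
    deviations in the degradation direction (higher capacitance) contribute.
    """
    x_tilde = np.where(x > mu, x, mu)                      # Eq. 9.7: one-sided projection
    diff = x_tilde - mu
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))  # Eq. 9.6

# Synthetic healthy reference data (illustrative only).
rng = np.random.default_rng(1)
healthy = rng.normal(loc=100.0, scale=2.0, size=(200, 3))
mu, cov = healthy.mean(axis=0), np.cov(healthy, rowvar=False)

print(directional_mahalanobis_distance(np.array([104.0, 103.0, 105.0]), mu, cov))  # wet-like point
print(directional_mahalanobis_distance(np.array([96.0, 97.0, 95.0]), mu, cov))     # below mean -> 0
```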
Based upon the maintenance strategies for the stator winding and field experts’
opinions, three health grades were proposed, as summarized in Table 9.5: (1) faulty
condition (or the presence of water absorption), (2) warning condition (or close to
water absorption), and (3) healthy condition (or no water absorption).
Figure 9.16 shows the scatter plot of DMD against the operating period. The circles
mark the data obtained from the faulty (water absorbed) windings. In particular, two
data points marked with red circles represent two failed windings that caught fire in
2008. Most of the circled data points were classified as “faulty” or “warning.” This
indicates that the proposed health grade system properly defines the health condi-
tion of the generator stator windings as it relates to water absorption. The circles in
the healthy zone are maintenance cases due to reasons other than water absorption.
Fig. 9.15 Scatter plots a before the projection, b after projection, and c after transformation.
Reprinted (adapted) with permission from Ref. [28]
The proposed health grade system and the suggested maintenance actions are
expected to make it possible to carry out condition-based system maintenance. By
providing power plant maintenance personnel with a quantitatively established
maintenance guideline, the facility maintenance process would become much more
systematic and objective.
9.5 Lithium-Ion Batteries
In a Li-ion battery cell, the SOC quantifies the remaining charge of a cell relative to
its fully charged capacity. The SOC of a cell changes very rapidly and, depending
on the use condition, may traverse the entire range 100–0% within minutes.
Capacity is an important SOH indicator [31, 32] that determines the maximum
amount of charge that a fully charged battery can deliver. In contrast to the rapidly
varying behavior of the SOC, the cell capacity tends to vary more slowly; it
typically decreases 1.0% or less in a month with regular use. Given a discrete-time
dynamic model that describes the electrical behavior of a cell and knowledge of the
measured electrical signals, the health reasoning function aims to estimate the SOC
and the capacity of the cell in a dynamic environment at every charge/discharge
cycle. Using this information, it predicts how long the cell is expected to last before
the capacity fade reaches an unacceptable level. The subsequent sections describe
one case study that demonstrates this process.
where OCV is the open circuit voltage, i is the current, Rs is the series resistance, Vd
is the diffusion voltage, and k is the index of the measurement time step. Since there
is a strong correlation between the SOC and OCV, the SOC can be estimated from
the OCV of the cell. The state transition equation of the diffusion voltage can be
expressed as
$$V_{d,k+1} = V_{d,k} + \left( i_k - \frac{V_{d,k}}{R_d} \right)\frac{\Delta t}{C_d} \qquad (9.9)$$
Fig. 9.17 A Li-ion battery equivalent circuit model (or lumped parameter model): open circuit voltage (OCV), series resistance (Rs), diffusion resistance (Rd), and diffusion capacitance (Cd). Reprinted (adapted) with permission from Ref. [33]
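The lumped-parameter model in Fig. 9.17 can be simulated with a few lines of code. The sketch below assumes the commonly used terminal-voltage form V_k = OCV(SOC_k) − i_k·Rs − V_{d,k} (with discharge current positive) together with the diffusion-voltage update of Eq. 9.9; the OCV curve and parameter values are illustrative assumptions, not the values identified in the study.

```python
import numpy as np

def simulate_cell(i, dt, capacity_As, soc0=1.0, rs=0.05, rd=0.02, cd=2000.0):
    """Simulate terminal voltage of a simple OCV-Rs-(Rd||Cd) cell model.

    Assumed form: V_k = OCV(SOC_k) - i_k*Rs - Vd_k, with Vd updated per Eq. 9.9.
    i: array of currents in A (discharge positive); dt: time step in seconds.
    """
    ocv = lambda soc: 3.0 + 1.2 * soc     # illustrative linear OCV-SOC curve
    soc, vd = soc0, 0.0
    v_out = []
    for ik in i:
        v_out.append(ocv(soc) - ik * rs - vd)
        vd += (ik - vd / rd) * dt / cd    # Eq. 9.9 (Euler update of the diffusion voltage)
        soc -= ik * dt / capacity_As      # coulomb counting for the SOC
    return np.array(v_out)

# 1C discharge of an assumed 2 Ah (7200 As) cell for one hour at 1 s resolution.
v = simulate_cell(i=np.full(3600, 2.0), dt=1.0, capacity_As=7200.0)
print(v[:3], v[-3:])
```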
The SOC can be estimated using various approaches, such as the extended/unscented Kalman filter [31, 32, 34, 35] and the coulomb counting technique [36].
Capacity Estimation with State Projection
Recent literature reports a variety of approaches for estimating the capacity and/
or internal resistance of Li-ion batteries. In general, these approaches can be cat-
egorized into (1) adaptive filtering approaches [31, 32, 34–42], (2) coulomb
counting approaches [36, 43], (3) neural network approaches [44–46], and
(4) kernel regression approaches [47–50]. In what follows, we elaborate on a
capacity estimation method that utilizes the SOC estimates before and after the state
projection to estimate the capacity. Based on a capacity estimate Ck, the state
projection scheme projects the SOC through a time span LΔt, expressed as [35]
$$\mathrm{SOC}_{k+L} = \mathrm{SOC}_k + \frac{\int_{t_k}^{t_{k+L}} i(t)\, dt}{C_k} \qquad (9.11)$$
At the start of every state projection (i.e., at the time tk), an accurate SOC estimate is
needed. This estimate will then be projected through the projection time span
LΔt according to the state projection equation in Eq. 9.11. Upon the completion of
every state projection (i.e., at the time tk+L), we also need to have an accurate SOC
estimate to complete the capacity estimation. It is important to note that accuracy in
the SOC estimation is a key factor that affects accuracy in the capacity estimation.
In applications where the SOC estimates contain large measurement or estimation
noise, the state projection expressed by Eq. 9.11 will result in inaccurate and biased
capacity estimates, as also noted in [39]. In order to maintain an acceptable level of
accuracy in capacity estimation in the presence of inaccurate SOC estimates, a large
cumulated charge (i.e., the numerator in Eq. 9.11) is needed to compensate for the
inaccuracy in the SOC estimation.
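The following sketch illustrates the capacity estimate obtained by rearranging the state projection equation: the cumulated charge over the projection span divided by the change in SOC. The current profile and SOC estimates are synthetic stand-ins for the filter outputs used in the study.

```python
import numpy as np

def estimate_capacity(current, dt, soc_start, soc_end):
    """Capacity estimate obtained by rearranging the state projection equation (Eq. 9.11).

    current: sampled current over [t_k, t_{k+L}] in A (positive for charge, negative for
    discharge, matching the sign convention implied by Eq. 9.11); dt: sampling interval in s.
    Returns capacity in ampere-seconds.
    """
    cumulated_charge = np.sum(current) * dt   # numerator: integral of i(t) dt (coulomb counting)
    delta_soc = soc_end - soc_start           # must be large enough to suppress SOC-estimation noise
    return cumulated_charge / delta_soc

# Synthetic example: an assumed 2 Ah cell discharged at 2 A for 30 min while the
# SOC estimate drops from 0.95 to 0.45; the result recovers 7200 As = 2.0 Ah.
current = np.full(1800, -2.0)
print(estimate_capacity(current, dt=1.0, soc_start=0.95, soc_end=0.45) / 3600.0, "Ah")
```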
Li-ion cells for this case study were constructed in hermetically sealed prismatic
cases between 2002 and 2012 and subjected to full depth of discharge cycling with a
nominal weekly discharge rate (C/168 discharge) at 37 °C [33]. The cycling data from
four 2002 cells was used to verify the effectiveness of the method for capacity
Fig. 9.18 Effect of capacity on state projection (assuming a constant current discharge). Reprinted
(adapted) with permission from Ref. [33]
estimation. The cell discharge capacity was estimated based on the state projection
scheme. In this study, an unknown SOC (or 1 − DOD) at a specific OCV level was
approximated based on the cubic spline interpolation with a set of known OCV and
SOC values (see the measurement points and interpolated curve in Fig. 9.19a). As
shown in Fig. 9.19a, the state projection zone spans an OCV range of 4.0–3.7 V. In
Fig. 9.19b, the net flow charge in this state projection zone was plotted as a function of
cell discharge capacity for four test cells (cells 1–4) at eight different cycles spanning
the whole 10-year test duration. The graph shows that the net flow charge is a linear
Fig. 9.19 a Plot of OCV as a function of DOD with state projection zone and b plot of
normalized net flow discharge as a function of normalized discharge capacity. Reprinted (adapted)
with permission from Ref. [33]
function of the cell discharge capacity. This observation suggests that a linear model
can be generated to relate the capacity to the current integration. In fact, this linear
model is exactly the one given in Eq. 9.11. With the SOCs at 4.0 and 3.7 V, derived
based on the OCV-SOC relationship, and the net flow charge calculated by coulomb
counting, the cell discharge capacity can be computed based on Eq. 9.11.
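A sketch of the OCV-based SOC lookup used at the boundaries of the projection zone is given below; it fits a cubic spline to a set of known OCV-SOC pairs and evaluates the SOC at 4.0 and 3.7 V. The tabulated OCV-SOC points and the net flow charge are illustrative assumptions, not the measured values of the test cells.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Illustrative OCV-SOC pairs (OCV strictly increasing); the study used measured points.
soc_pts = np.array([0.0, 0.2, 0.4, 0.6, 0.8, 1.0])
ocv_pts = np.array([3.2, 3.55, 3.68, 3.78, 3.95, 4.15])

# Build SOC = f(OCV) by cubic spline interpolation with OCV as the abscissa.
soc_of_ocv = CubicSpline(ocv_pts, soc_pts)

soc_hi, soc_lo = soc_of_ocv(4.0), soc_of_ocv(3.7)   # SOC at the projection-zone boundaries
net_flow_charge_As = 2500.0   # assumed coulomb-counted charge over the projection zone, in As
capacity_As = net_flow_charge_As / (soc_hi - soc_lo)
print(float(soc_hi), float(soc_lo), capacity_As / 3600.0, "Ah")
```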
The capacity estimation results for the first two cells (i.e., Cells 1 and 2) are
shown in Fig. 9.20. This figure shows that the capacity estimation method closely
tracks the capacity fade trend throughout the cycling test for both cells. Table 9.6
summarizes the capacity estimation errors for the four cells. Here, the root mean
square (RMS) and the maximum errors are formulated as
$$\mathrm{RMS\ error} = \sqrt{\frac{1}{N_C}\sum_{i=1}^{N_C}\left(\Delta C_i - \Delta\hat{C}_i\right)^{2}}, \qquad \mathrm{Max\ error} = \max_{1\le i\le N_C}\left|\Delta C_i - \Delta\hat{C}_i\right| \qquad (9.12)$$

where N_C is the number of charge/discharge cycles, and ΔC_i and ΔĈ_i are respectively the measured and estimated normalized capacities at the ith cycle. Observe that the RMS error is less than 1% for any of the four cells and the maximum
error is less than 3%. The results suggest that the state projection method is capable
of producing accurate and robust capacity estimation in the presence of cell-to-cell
manufacturing variability.
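For completeness, the error metrics of Eq. 9.12 can be computed directly from paired measured and estimated capacity series, as in the short sketch below (the arrays are placeholders, not the study's data).

```python
import numpy as np

def capacity_errors(measured, estimated):
    """RMS and maximum capacity estimation errors (Eq. 9.12), in the units of the inputs."""
    resid = np.asarray(measured) - np.asarray(estimated)
    return np.sqrt(np.mean(resid**2)), np.max(np.abs(resid))

# Placeholder normalized-capacity series (%); the study used one pair per cycle.
rms, mx = capacity_errors([92.1, 90.0, 88.2, 86.5], [92.4, 89.6, 88.9, 86.1])
print(f"RMS error: {rms:.2f}%, max error: {mx:.2f}%")
```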
Fig. 9.20 Capacity estimation results of a cell 1 and b cell 2; results are plotted every 50 cycles
for ease of visualization. Reprinted (adapted) with permission from Ref. [33]
Table 9.6 Capacity estimation results for the 4 test cells. Reprinted (adapted) with permission
from Ref. [33]
Cell Cell 1 Cell 2 Cell 3 Cell 4
RMS error (%) 0.52 0.51 0.88 0.52
Maximum error (%) 2.38 2.91 2.10 2.90
An RUL estimate for Li-ion batteries refers to the available service time or number
of charge/discharge cycles left before the performance of the system degrades to an
unacceptable level. Research on battery health prognostics has to date been conducted mainly by researchers in the prognostics and health management (PHM) community.
For example, a Bayesian framework with a particle filter was proposed for prog-
nostics (i.e., RUL prediction) of Li-ion batteries based on impedance measurements
[51]. In order to eliminate the need for prognostics to rely on impedance mea-
surement equipment, researchers have developed various model-based approaches
that predict RUL by extrapolating a capacity fade model [52–55]. In addition,
particle filters (or the sequential Monte Carlo methods) described in Sect. 8.4.1 can
be used for online updating of a capacity fade model for lithium-ion batteries and
for prediction of RUL using the updated model [33].
The RUL is used as the relevant metric for determining the state of life (SOL) of
Li-ion batteries. Based on the capacity estimates obtained from the state projection
scheme, the Gauss-Hermite particle filter (GHPF) technique is used to project the capacity fade trend to the end of life (EOL) value for the RUL prediction [33]. Here the EOL value is defined as 78.5% of the beginning-of-life (BOL) discharge capacity. Figure 9.21a shows the
capacity tracking and RUL prediction from the GHPF at cycle 200 (or 3.1 years).
The figure shows that the predicted PDF of the life provides a slightly conservative
solution and includes the true EOL cycle (i.e., 650 cycles or approximately
9.4 years). Figure 9.21b plots the RUL predictions from the GHPF at multiple
cycles throughout the lifetime. The graph shows that, as we keep updating the RUL
distribution throughout the battery lifetime, the prediction tends to converge to the
true value as the battery approaches its EOL cycle.
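The sketch below illustrates the flavor of the RUL prediction step with a deliberately simplified Monte Carlo extrapolation of an exponential capacity fade model; it is a stand-in for the Gauss-Hermite particle filter used in the study, and the model form, parameter distributions, and the 78.5%-of-BOL EOL threshold are used only for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simplified capacity fade model C(k) = a * exp(-b * k), in % of BOL capacity.
# The parameter samples play the role of the posterior particles of a particle filter.
a = rng.normal(100.0, 0.5, size=5000)      # initial capacity (% of BOL), assumed
b = rng.normal(3.5e-4, 3e-5, size=5000)    # fade rate per cycle, assumed

eol_threshold = 78.5                        # % of BOL discharge capacity
current_cycle = 200

# Cycle at which each sampled trajectory crosses the EOL threshold, then the RUL.
eol_cycle = np.log(a / eol_threshold) / b
rul_samples = eol_cycle - current_cycle

print(f"median RUL: {np.median(rul_samples):.0f} cycles, "
      f"90% interval: [{np.percentile(rul_samples, 5):.0f}, {np.percentile(rul_samples, 95):.0f}]")
```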
Fig. 9.21 RUL prediction results of cell 1; Figure a plots the capacity tracking and RUL
prediction provided by the GHPF at cycle 200 (results are plotted every 20 cycles for ease of
visualization) and b plots the RUL predictions from the GHPF at multiple cycles throughout the
lifetime. Reprinted (adapted) with permission from Ref. [33]
9.6 Fuel Cells
Although fuel cells are promising candidates for future power generation, commercialization of fuel cells has been limited by safety concerns. Thus,
accurate evaluation of the state of health (SOH) of fuel cells is necessary to enable
condition-based maintenance to prevent impending catastrophic failures of the fuel
cells. In this case study, the fuel cell is modeled by an equivalent circuit model
(ECM) whose parameters are the health indicators. The reasoning function deals
with the task of estimating the ECM parameters. The reasoning function can be
attained by either measuring or estimating the impedance spectrum of the fuel cell.
In the prognostic function, the ECM parameters are estimated for a future time with
the help of voltage estimation.
This section provides a basic overview of fuel cells. The basic structure of a fuel cell contains an anode, a cathode, and an electrolyte, as shown in Fig. 9.22. Hydrogen is supplied from the anode side; it reacts with oxygen coming from the cathode side, and this reaction produces water. The overall reaction is made up of two half-reactions:
$$\mathrm{H}_2 \rightarrow 2\mathrm{H}^{+} + 2e^{-}$$
$$\tfrac{1}{2}\,\mathrm{O}_2 + 2\mathrm{H}^{+} + 2e^{-} \rightarrow \mathrm{H}_2\mathrm{O} \qquad (9.13)$$
The first reaction, separation of a hydrogen molecule into hydrogen ions and
electrons, occurs at the interface between the anode and the electrolyte. The
hydrogen ions then move to the cathode through the electrolyte to complete the
reaction. Since the electrolyte only allows the ions to pass, the electrons transfer
through an external wire and provide energy to the load on the way to the cathode.
This process continues as long as the hydrogen fuel supply remains.
During operation of a fuel cell, individual components of the fuel cell are subject
to degradation. The degradation rate accelerates during particular working steps, such as the transport of the reactants and charge transfer, and it also varies with the rate of the electrochemical reactions. Variation in the degradation rates of fuel cells arises from the different degradation behavior of each component of the fuel cell: the membrane, catalyst layer, gas diffusion layer, and bipolar plates. For example, the membrane can undergo mechanical, thermal, and electrochemical degradation: non-uniform stresses during assembly and cyclic hydration induce mechanical stress on the membrane; increased operating temperatures caused by reactant crossover through pinholes or perforations decompose the membrane; and radicals generated by undesirable side reactions attack the membrane. In the electrocatalyst layer, detachment and dissolution of the catalyst and growth of the catalyst particles reduce the catalytic area and, thus, reduce the
catalyst activity. Also, corrosion can occur in the gas diffusion layer and bipolar
plates. Degradation of individual components, of course, occurs in a combined
manner. The electrochemical side reaction weakens the mechanical strength of the
membrane. Mechanical degradation brings about local pinholes and results in
thermal degradation due to the crossover of the reactant, which again enhances the
side reaction rate. This complex degradation must be stopped before it results in a catastrophe, such as an explosion. Prevention starts with knowing the current status of the fuel cell and, furthermore, with predicting its future status.
Mass transport is modeled using the Warburg element, ZW. Measuring the impedance spectrum and fitting the ECM gives the parameters of the ECM; these parameters indicate the main problem that the fuel cell is experiencing and how severely the fuel cell suffers from it.
In spite of the robust features of the EIS method, obtaining the impedance
spectrum of a fuel cell requires an expensive impedance analyzer and stable
measurement conditions. These requirements make it difficult to apply EIS online. Thus, broadband current interruption techniques have been
developed to overcome the shortcomings of the EIS method. These methods utilize
current interrupting waveforms, such as pseudo-random binary sequence (PRBS)
and multi-sine signals, imposed on the operating current. They estimate the
impedance spectrum by analyzing the voltage response to the interrupting current.
These methods reduce the measurement time and extract information similar to that
found through EIS measurements.
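As a rough illustration of the broadband approach, the sketch below estimates an impedance spectrum as the ratio of the voltage and current spectra obtained from a PRBS-like current perturbation. The simulated "cell" is a simple R-RC circuit used only as a stand-in; real measurements, windowing, and averaging are omitted for brevity, and all parameter values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
fs, n = 1000.0, 4096                                  # sampling rate (Hz) and record length (assumed)
i_prbs = np.where(rng.random(n) > 0.5, 0.2, -0.2)     # PRBS-like current perturbation (A)

# Stand-in "true" cell: R_s in series with an R-C pair (assumed values).
rs, rct, cdl = 0.02, 0.05, 1.0
freqs = np.fft.rfftfreq(n, d=1.0 / fs)
z_true = rs + rct / (1 + 1j * 2 * np.pi * freqs * rct * cdl)

# Voltage response synthesized in the frequency domain, plus measurement noise.
v_spec = np.fft.rfft(i_prbs) * z_true
v_meas = np.fft.irfft(v_spec, n) + rng.normal(0, 1e-4, n)

# Impedance estimate: ratio of voltage and current spectra (DC bin skipped).
z_est = np.fft.rfft(v_meas)[1:] / np.fft.rfft(i_prbs)[1:]
print(freqs[1], z_est[0], z_true[1])                  # low-frequency estimate vs. truth
```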
The health indicator defined and estimated in the reasoning stage can be extrapo-
lated to predict the future status of the fuel cell. This section explains the process for
life prognosis using predictions of the health parameters of the equivalent circuit.
One way of predicting the health parameters is to relate them to a measurable physical property of the fuel cell and to infer the future health parameters from predictions of that property [56]. In
fuel cells, voltage is a well-known lumped health indicator, and it can be related to
the health parameters of the equivalent circuit. An example is shown in Fig. 9.24.
Figure 9.24a shows the equivalent circuit of the fuel cell and its parameters,
Fig. 9.24b shows the voltage degradation curve, and Fig. 9.24c depicts the relation
between one of the model parameters and the voltage. From this relation, and with
the help of the voltage prediction, the health parameters for future status can be
estimated.
The voltage is modeled through two degradation models: reversible and irre-
versible. The reversible degradation is temporary and recoverable. An example of a
reversible degradation is water clogging in flow channels. Irreversible degradation
Fig. 9.24 a Equivalent circuit model, b measured voltage, c relationship between the voltage and
ECM parameter. Reprinted (adapted) with permission from Ref. [56]
is permanent damage to the fuel cell, such as a melted membrane. Reversible and
irreversible degradation are modeled by exponential and linear models, respec-
tively, as shown in Fig. 9.25.
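A minimal sketch of the two-part voltage degradation model described above is given below: an exponential term for the recoverable (reversible) part and a linear term for the irreversible part. The functional forms follow the text, while the parameter values and the least-squares fit are illustrative assumptions rather than the identified model of Ref. [56].

```python
import numpy as np
from scipy.optimize import curve_fit

def voltage_degradation(t, v0, a, tau, b):
    """Cell voltage vs. time: reversible (exponential) + irreversible (linear) degradation."""
    return v0 - a * (1.0 - np.exp(-t / tau)) - b * t

# Synthetic "measured" voltage trace (hours) with noise, for illustration only.
t = np.linspace(0, 1000, 200)
rng = np.random.default_rng(3)
v_meas = voltage_degradation(t, 0.68, 0.02, 150.0, 2e-5) + rng.normal(0, 5e-4, t.size)

# Fit the model, then extrapolate the voltage to a future time of interest.
popt, _ = curve_fit(voltage_degradation, t, v_meas, p0=[0.68, 0.01, 100.0, 1e-5])
print("predicted voltage at 1500 h:", voltage_degradation(1500.0, *popt))
```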
Fig. 9.26 Predicted impedance spectrum results of FC2 at a 830 h and b 1016 h. Reprinted (adapted) with permission from Ref. [56]
Next, the predicted voltage is used for estimating the health parameters. The
reconstructed impedance spectrum using the estimated parameters is compared with
the measured impedance spectrum and the comparison results are shown in
Fig. 9.26.
9.7 Pipelines
Pipelines are essential infrastructures in our modern society. Pipelines play crucial
roles in transporting various fluids, such as oil, water, and gas, from storage to
users. Leaks in pipelines result in economic, environmental, and social problems.
Thus, leaks must be detected accurately and repaired quickly to prevent these problems. However, due to the characteristics of pipelines, detecting leaks is not easy. For example, accessibility of the pipeline is generally limited due to
installation conditions, which involve long distances that often include under-
ground, underwater, or alpine locations. To overcome these challenges, various leak
detection techniques, such as ground penetrating radar (GPR) [57], leak noise
correlators (LNC) [58], and pig-mounted acoustic (PMA) sensing [59], have been
developed over many years. In this section, a real-time and remote monitoring
detection method is introduced. This method uses a time-domain reflectometry
(TDR) technique that can stochastically detect multiple leaks using forward and
inverse models [60]. This method was validated through a case study that was
performed in a water distribution system. The water distribution system is the most
typical pipeline system. This TDR-based method is expected to be applicable not only to water pipelines but also to other pipelines in general.
TDR was originally proposed as a method for locating faults on a transmission line,
such as electrical open, short, or chafe. The principle of TDR is similar to RADAR,
which finds the location of an object by measuring a reflected radio wave. Likewise,
a TDR device propagates an incident pulse along the transmission line. When the
pulse meets any fault on the transmission line, the pulse is reflected at that location
and returns to the device. Thus, by measuring the travel time of the pulse from departure to arrival, the fault location can be estimated by multiplying the travel time by the propagation velocity of the pulse. The reflection is caused by an impedance mismatch in the transmission line, and the change of impedance is caused by a fault, such as a short, open, or chafe of the transmission line. The shape and degree of reflection are represented by the reflection coefficient, Γ, as shown in Fig. 9.27. The reflection coefficient, Γ, is expressed as:

$$\Gamma = \frac{Z_F - Z_0}{Z_F + Z_0} \qquad (9.14)$$
where Z0 is the characteristic impedance of the transmission line and ZF is the fault
impedance. If Γ is less than zero, the wave shape of the reflected pulse is inverted relative to the incident pulse, which indicates an electrical short. If Γ is 1, the wave shape of the reflected pulse is the same as that of the incident pulse, which indicates an electrical open condition, as shown in Fig. 9.28.
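The two relationships used above, the reflection coefficient of Eq. 9.14 and the travel-time-to-distance conversion, can be sketched as follows; the propagation velocity and the round-trip assumption (distance = velocity × travel time / 2) are stated explicitly because the text gives only the general idea.

```python
def reflection_coefficient(z_fault: complex, z0: float = 50.0) -> complex:
    """Reflection coefficient Gamma = (Z_F - Z_0) / (Z_F + Z_0), Eq. 9.14."""
    return (z_fault - z0) / (z_fault + z0)

def fault_distance(travel_time_s: float, velocity_m_per_s: float) -> float:
    """Distance to the fault, assuming the measured time covers the round trip."""
    return velocity_m_per_s * travel_time_s / 2.0

print(reflection_coefficient(0.0))      # short:  Gamma = -1 -> inverted reflected pulse
print(reflection_coefficient(1e9))      # open:   Gamma -> +1 -> same-polarity reflected pulse
# Example: 80 ns round-trip travel time at an assumed 2e8 m/s propagation velocity.
print(fault_distance(80e-9, 2e8), "m")  # -> 8.0 m
```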
Similarly, using these electric characteristics, TDR-based leak detection methods
have been developed by correlating water leakage with an electric short in a
transmission line. To generate an electric short at the leak position, a leak detector
is used, as shown in Fig. 9.29. The device is connected to a transmission line
attached to the pipeline. The leak detector consists of two copper plates and a
plastic case with holes. As leaking water contacts the copper plates, a pulse signal is
reflected due to the resulting electric short.
Fig. 9.29 Concept of pipeline leak detection using a TDR and leak detectors
Even with the measured TDR signals, analysis of the reflected pulse signal for
accurate leak detection is not a simple matter. This is because ambient noise and
overlapped pulses can result from multiple leaks. Thus, both forward and inverse
models can be applied to accurately interpret the measured TDR signal. This
model-based leak detection method consists of three steps: (1) emulating possible
TDR signals through the forward model, (2) estimating the correlation of the
emulated signal with the measured one, and (3) determining the number of leaks
and their locations, as shown in Fig. 9.30. The first step is to simulate the TDR
signals for all possible leak situations using a forward model. The second step is to
calculate the correlation between the simulated and measured TDR signals using a
likelihood metric. Third, Bayesian inference is used to determine the number of leaks and their locations as those whose simulated signal gives the maximum likelihood.
The forward model produces simulated TDR signals as a function of model input
parameters, such as the physical properties of the transmission line and leak
information (e.g., the number of leaks and their locations). The forward model is
generally derived using RLCG circuit theory or S-parameters. It is known that the
S-parameter model is computationally far more efficient than the RLCG signal
modeling method.
The inverse model employs Bayesian inference, which infers the posterior
distributions of the model parameters from the prior distributions. The prior dis-
tribution of a leak’s location assumes a uniform distribution. Then, for all possible
leak situations, the likelihood function in Eq. 9.15 can be calculated by comparing
the simulated TDR signals (v_m) with the measured TDR signal (y) [61]. Bayesian inference is employed to determine the number of leaks and their locations as the candidate leak configuration θ whose simulated signal yields the highest value of Pr(θ|y):

$$\Pr(\mathbf{y}\mid\theta) = \left(2\pi\sigma_M\right)^{-m/2}\exp\!\left(-\frac{1}{2\sigma_M}\left\|\mathbf{y}-\mathbf{v}_m\right\|^{2}\right) \qquad (9.15)$$
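The sketch below mirrors the three-step procedure with a toy forward model: candidate leak configurations are enumerated, each is scored with the Gaussian likelihood of Eq. 9.15, and a uniform prior makes the best-scoring candidate the posterior maximizer. The pulse-shaped forward model, the noise level σ_M, and the candidate grid are illustrative assumptions, not the S-parameter model used in the study.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 500)                   # position along a 10 m line (m)

def forward_model(leak_locations):
    """Toy TDR forward model: one Gaussian-shaped reflection per leak (stand-in for S-parameters)."""
    return sum(0.3 * np.exp(-((x - loc) / 0.15) ** 2) for loc in leak_locations)

def log_likelihood(y, v_m, sigma_m=1e-3):
    """Gaussian log-likelihood of the measured signal y given a simulated signal v_m (Eq. 9.15)."""
    return -0.5 * len(y) * np.log(2 * np.pi * sigma_m) - np.sum((y - v_m) ** 2) / (2 * sigma_m)

# "Measured" signal with three leaks at 5.5, 6, and 8 m, plus noise (synthetic).
y = forward_model([5.5, 6.0, 8.0]) + rng.normal(0, 0.02, x.size)

# Enumerate candidate configurations on a coarse grid (1 to 3 leaks) and score them.
grid = [round(v, 1) for v in np.arange(0.5, 10.0, 0.5)]
candidates = [c for k in (1, 2, 3) for c in combinations(grid, k)]
log_post = np.array([log_likelihood(y, forward_model(c)) for c in candidates])  # uniform prior

best = candidates[int(np.argmax(log_post))]
print("most probable leak configuration:", best)   # expected near (5.5, 6.0, 8.0)
```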
Lab-scale experiments were carried out for validation of this leak detection
system. The test bed consisted of a 10 m pipeline, a transmission line, a leak
detector, and a TDR device, as shown in Fig. 9.31a. The experiment was conducted
with three water leaks present, specifically at 5.5, 6, and 8 m. Figure 9.31b displays
the signal acquired from the experiment. As shown in the figure, the leaks and their locations cannot be easily identified from the raw signal.
Figure 9.31c shows the marginal PDFs of the leak locations obtained from the Bayesian inference; the three leaks and their locations were correctly identified.
9.8 Summary
The four core functions of PHM have been successfully applied to engineering applications. This chapter presented several case studies that illustrate successful PHM
practices: (1) steam turbine rotors, (2) wind turbine gearboxes, (3) the core and
Fig. 9.31 a Test bed, b measured TDR signal, and c result of the Bayesian inference
windings in power transformers, (4) power generator stator windings, (5) lithium-ion
batteries, (6) fuel cells, and (7) water pipelines. These examples provide useful
findings about the four core functions of PHM technology, contemporary technology
trends, and industrial values. PHM offers great economic value to various industries
through condition-based maintenance (CBM), which helps operators schedule
maintenance actions based on the analyzed conditions of engineered systems.
Operators can minimize unnecessary maintenance actions and can prevent catas-
trophic system failures by reliably predicting a system’s health condition. CBM can
substantially save on overall operation and maintenance (O&M) costs.
References
1. Gupta, K. (1997). Vibration—A tool for machine diagnostics and condition monitoring.
Sadhana, 22, 393–410.
2. Lin, J., & Qu, L. S. (2000). Feature extraction based on Morlet wavelet and its application for
mechanical fault diagnosis. Journal of Sound and Vibration, 234, 135–148.
3. Jeon, B., Jung, J., Youn, B., Kim, Y.-W., & Bae, Y.-C. (2015). Datum unit optimization for
robustness of a journal bearing diagnosis system. International Journal of Precision
Engineering and Manufacturing, 16, 2411–2425.
4. Bonnardot, F., El Badaoui, M., Randall, R., Daniere, J., & Guillet, F. (2005). Use of the
acceleration signal of a gearbox in order to perform angular resampling (with limited speed
fluctuation). Mechanical Systems and Signal Processing, 19, 766–785.
5. Villa, L. F., Reñones, A., Perán, J. R., & de Miguel, L. J. (2011). Angular resampling for
vibration analysis in wind turbines under non-linear speed fluctuation. Mechanical Systems
and Signal Processing, 25, 2157–2168.
6. Han, T., Yang, B.-S., Choi, W.-H., & Kim, J.-S. (2006). Fault diagnosis system of induction
motors based on neural network and genetic algorithm using stator current signals.
International Journal of Rotating Machinery, 2006.
7. Yang, B.-S., & Kim, K. J. (2006). Application of Dempster-Shafer theory in fault diagnosis of
induction motors using vibration and current signals. Mechanical Systems and Signal
Processing, 20, 403–420.
8. Jung, J.H., Jeon, B.C., Youn, B.D., Kim, M., Kim, D. & Kim, Y., (2017). Omnidirectional
regeneration (ODR) of proximity sensor signals for robust diagnosis of journal bearing
systems. Mechanical Systems and Signal Processing, 90, 189–207.
9. Guo, B., Damper, R. I., Gunn, S. R., & Nelson, J. D. B. (2008). A fast separability-based
feature-selection method for high-dimensional remotely sensed image classification. Pattern
Recognition, 41, 1653–1662.
10. Abbasion, S., Rafsanjani, A., Farshidianfar, A., & Irani, N. (2007). Rolling element bearings
multi-fault classification based on the wavelet denoising and support vector machine.
Mechanical Systems and Signal Processing, 21, 2933–2945.
11. Widodo, A., & Yang, B.-S. (2007). Support vector machine in machine condition monitoring
and fault diagnosis. Mechanical Systems and Signal Processing, 21, 2560–2574.
12. Heng, A., Zhang, S., Tan, A. C. C., & Mathew, J. (2009). Rotating machinery prognostics:
State of the art, challenges and opportunities. Mechanical Systems and Signal Processing, 23,
724–739.
13. Zaher, A., McArthur, S., Infield, D., & Patel, Y. (2009). Online wind turbine fault detection
through automated SCADA data analysis. Wind Energy, 12, 574–593.
14. Tchakoua, P., Wamkeue, R., Ouhrouche, M., Slaoui-Hasnaoui, F., Tameghe, T. A., & Ekemb,
G. (2014). Wind turbine condition monitoring: State-of-the-art review, new trends, and future
challenges. Energies, 7, 2595–2630.
15. Samuel, P. D., Conroy, J. K., & Pines, D. J. (2004). Planetary transmission diagnostics. NASA
CR, 213068.
16. Lebold, M., McClintic, K., Campbell, R., Byington, C., & Maynard, K. (2000). Review of
vibration analysis methods for gearbox diagnostics and prognostics. In Proceedings of the
54th Meeting of the Society for Machinery Failure Prevention Technology (p. 16).
17. Lau, B. C. P., Ma, E. W. M., & Pecht, M. (2012). Review of offshore wind turbine failures
and fault prognostic methods. In 2012 IEEE Conference on Prognostics and System Health
Management (PHM) (pp. 1–5).
18. Feldman, K., Jazouli, T., & Sandborn, P. A. (2009). A methodology for determining the
return on investment associated with prognostics and health management. IEEE Transactions
on Reliability, 58, 305–316.
19. Nilsson, J., & Bertling, L. (2007). Maintenance management of wind power systems using
condition monitoring systems—Life cycle cost analysis for two case studies. IEEE
Transactions on Energy Conversion, 22, 223–229.
20. Hu, C., Wang, P., Youn, B. D., Lee, W.-R., & Yoon, J. T. (2012). Copula-based statistical
health grade system against mechanical faults of power transformers. IEEE Transactions on
Power Delivery, 27, 1809–1819.
21. Shengchang, J., Yongfen, L., & Yanming, L. (2006). Research on extraction technique of
transformer core fundamental frequency vibration based on OLCM. IEEE Transactions on
Power Delivery, 21, 1981–1988.
22. Wang, P., Youn, B. D., Hu, C., Ha, J. M., & Jeon, B. (2014). A probabilistic
detectability-based sensor network design method for system health monitoring and
prognostics. Journal of Intelligent Material Systems and Structures, 1045389X14541496.
23. Hu, C., Youn, B. D., Wang, P., & Yoon, J. T. (2012). Ensemble of data-driven prognostic
algorithms for robust prediction of remaining useful life. Reliability Engineering & System
Safety, 103, 120–135.
24. Inoue, Y., Hasegawa, H., Sekito, S., Sotodate, M., Shimada, H., & Okamoto, T. (2003).
Technology for detecting wet bars in water-cooled stator windings of turbine generators. In
Electric Machines and Drives Conference, 2003. IEMDC’03. IEEE International (pp. 1337–
1343).
25. Kim, H. S., Bae, Y. C., & Kee, C. D. (2008). Wet bar detection by using water absorption
detector. Journal of Mechanical Science and Technology, 22, 1163–1173.
26. Park, K. M., Youn, B. D., Yoon, J. T., Hu, C., Kim, H. S., & Bae, Y. C. (2013). Health
diagnostics of water-cooled power generator stator windings using a Directional Mahalanobis
Distance (DMD). In 2013 IEEE Conference on Prognostics and Health Management (PHM)
(pp. 1–8).
27. Wang, Y., Miao, Q., & Pecht, M. (2011). Health monitoring of hard disk drive based on
Mahalanobis distance. In Prognostics and System Health Management Conference
(PHM-Shenzhen), 2011 (pp. 1–8).
28. Youn, B. D., Park, K. M., Hu, C., Yoon, J. T., Kim, H. S., Jang, B. C., et al. (2015). Statistical
health reasoning of water-cooled power generator stator bars against moisture absorption.
IEEE Transactions on Energy Conversion, 30(4), 1376–1385.
29. Arora, P., White, R. E., & Doyle, M. (1998). Capacity fade mechanisms and side reactions in
lithium-ion batteries. Journal of the Electrochemical Society, 145, 3647–3667.
30. Vetter, J., Novák, P., Wagner, M., Veit, C., Möller, K.-C., Besenhard, J., et al. (2005). Ageing
mechanisms in lithium-ion batteries. Journal of Power Sources, 147, 269–281.
31. Plett, G. L. (2004). Extended Kalman filtering for battery management systems of LiPB-based
HEV battery packs: Part 3. State and parameter estimation. Journal of Power Sources, 134,
277–292.
32. Plett, G. L. (2006). Sigma-point Kalman filtering for battery management systems of
LiPB-based HEV battery packs: Part 2: Simultaneous state and parameter estimation. Journal
of Power Sources, 161, 1369–1384.
33. Hu, C., Jain, G., Tamirisa, P., & Gorka, T. (2014). Method for estimating capacity and
predicting remaining useful life of lithium-ion battery. Applied Energy, 126, 182–189.
34. Lee, S., Kim, J., Lee, J., & Cho, B. (2008). State-of-charge and capacity estimation of
lithium-ion battery using a new open-circuit voltage versus state-of-charge. Journal of Power
Sources, 185, 1367–1373.
35. Hu, C., Youn, B. D., & Chung, J. (2012). A multiscale framework with extended Kalman
filter for lithium-ion battery SOC and capacity estimation. Applied Energy, 92, 694–704.
36. Ng, K. S., Moo, C.-S., Chen, Y.-P., & Hsieh, Y.-C. (2009). Enhanced coulomb counting
method for estimating state-of-charge and state-of-health of lithium-ion batteries. Applied
Energy, 86, 1506–1511.
37. Chiang, Y.-H., Sean, W.-Y., & Ke, J.-C. (2011). Online estimation of internal resistance and
open-circuit voltage of lithium-ion batteries in electric vehicles. Journal of Power Sources,
196, 3921–3932.
38. He, W., Williard, N., Chen, C., & Pecht, M. (2013). State of charge estimation for electric
vehicle batteries using unscented Kalman filtering. Microelectronics Reliability, 53, 840–847.
39. Plett, G. L. (2011). Recursive approximate weighted total least squares estimation of battery
cell total capacity. Journal of Power Sources, 196, 2319–2331.
40. Schmidt, A. P., Bitzer, M., Imre, Á. W., & Guzzella, L. (2010). Model-based distinction and
quantification of capacity loss and rate capability fade in Li-ion batteries. Journal of Power
Sources, 195, 7634–7638.
41. Verbrugge, M. (2007). Adaptive, multi-parameter battery state estimator with optimized
time-weighting factors. Journal of Applied Electrochemistry, 37, 605–616.
42. Xiong, R., Sun, F., Chen, Z., & He, H. (2014). A data-driven multi-scale extended Kalman
filtering based parameter and state estimation approach of lithium-ion polymer battery in
electric vehicles. Applied Energy, 113, 463–476.
43. Waag, W., & Sauer, D. U. (2013). Adaptive estimation of the electromotive force of the
lithium-ion battery after current interruption for an accurate state-of-charge and capacity
determination. Applied Energy, 111, 416–427.
44. Bai, G., Wang, P., Hu, C., & Pecht, M. (2014). A generic model-free approach for lithium-ion
battery health management. Applied Energy, 135, 247–260.
45. Kim, J., Lee, S., & Cho, B. (2012). Complementary cooperation algorithm based on DEKF
combined with pattern recognition for SOC/capacity estimation and SOH prediction. IEEE
Transactions on Power Electronics, 27, 436–451.
46. Eddahech, A., Briat, O., Bertrand, N., Delétage, J.-Y., & Vinassa, J.-M. (2012). Behavior and
state-of-health monitoring of Li-ion batteries using impedance spectroscopy and recurrent
neural networks. International Journal of Electrical Power & Energy Systems, 42, 487–494.
47. Pattipati, B., Sankavaram, C., & Pattipati, K. (2011). System identification and estimation
framework for pivotal automotive battery management system characteristics. IEEE
Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 41,
869–884.
48. Nuhic, A., Terzimehic, T., Soczka-Guth, T., Buchholz, M., & Dietmayer, K. (2013). Health
diagnosis and remaining useful life prognostics of lithium-ion batteries using data-driven
methods. Journal of Power Sources, 239, 680–688.
49. Widodo, A., Shim, M.-C., Caesarendra, W., & Yang, B.-S. (2011). Intelligent prognostics for
battery health monitoring based on sample entropy. Expert Systems with Applications, 38,
11763–11769.
50. Hu, C., Jain, G., Zhang, P., Schmidt, C., Gomadam, P., & Gorka, T. (2014). Data-driven
method based on particle swarm optimization and k-nearest neighbor regression for
estimating capacity of lithium-ion battery. Applied Energy, 129, 49–55.
51. Saha, B., Goebel, K., Poll, S., & Christophersen, J. (2009). Prognostics methods for battery
health monitoring using a Bayesian framework. IEEE Transactions on Instrumentation and
Measurement, 58, 291–296.
52. Ng, S. S., Xing, Y., & Tsui, K. L. (2014). A naive Bayes model for robust remaining useful
life prediction of lithium-ion battery. Applied Energy, 118, 114–123.
53. Wang, D., Miao, Q., & Pecht, M. (2013). Prognostics of lithium-ion batteries based on
relevance vectors and a conditional three-parameter capacity degradation model. Journal of
Power Sources, 239, 253–264.
54. Liu, J., Saxena, A., Goebel, K., Saha, B., & Wang, W. (2010). An adaptive recurrent neural
network for remaining useful life prediction of lithium-ion batteries. DTIC Document 2010.
55. Saha, B., & Goebel, K. (2009). Modeling Li-ion battery capacity depletion in a particle
filtering framework. In Proceedings of the Annual Conference of the Prognostics and Health
Management Society, 2009 (pp. 2909–2924).
56. Kim, T., Oh, H., Kim, H., & Youn, B. D. (2016). An online-applicable model for predicting
health degradation of PEM fuel cells with root cause analysis. IEEE Transactions on
Industrial Electronics, 63(11), 7094–7103.
57. Demirci, S., Yigit, E., Eskidemir, I. H., & Ozdemir, C. (2012). Ground penetrating radar
imaging of water leaks from buried pipes based on back-projection method. NDT and E
International, 47, 35–42.
58. Gao, Y., Brennan, M., Joseph, P., Muggleton, J., & Hunaidi, O. (2004). A model of the
correlation function of leak noise in buried plastic pipes. Journal of Sound and Vibration,
277, 133–148.
59. McNulty, J. (2001). An acoustic-based system for detecting, locating and sizing leaks in water
pipelines. In Proceedings of the 4th International Conference on Water Pipeline Systems:
Managing Pipeline Assets in an Evolving Market, York, UK, 2001.
60. Kim, T., Woo, S., Youn, B. D., & Huh, Y. C. (2015). TDR-based pipe leakage detection and
location using Bayesian inference. In 2015 IEEE Conference on Prognostics and Health
Management (PHM) (pp. 1–5).
61. Schuet, S., Timucin, D., & Wheeler, K. (2011). A model-based probabilistic inversion
framework for characterizing wire fault detection using TDR. IEEE Transactions on
Instrumentation and Measurement, 60, 1654–1663.