6. Process or Product Monitoring and Control (PMC)
1. Introduction [6.1.]
1. How did Statistical Quality Control Begin? [6.1.1.]
2. What are Process Control Techniques? [6.1.2.]
3. What is Process Control? [6.1.3.]
4. What to do if the process is "Out of Control"? [6.1.4.]
5. What to do if "In Control" but Unacceptable? [6.1.5.]
6. What is Process Capability? [6.1.6.]
5. Tutorials [6.5.]
1. What do we mean by "Normal" data? [6.5.1.]
2. What do we do when data are "Non-normal"? [6.5.2.]
3. Elements of Matrix Algebra [6.5.3.]
1. Numerical Examples [6.5.3.1.]
2. Determinant and Eigenstructure [6.5.3.2.]
4. Elements of Multivariate Analysis [6.5.4.]
1. Mean Vector and Covariance Matrix [6.5.4.1.]
2. The Multivariate Normal Distribution [6.5.4.2.]
3. Hotelling's T squared [6.5.4.3.]
1. T2 Chart for Subgroup Averages -- Phase I [6.5.4.3.1.]
2. T2 Chart for Subgroup Averages -- Phase II [6.5.4.3.2.]
3. Chart for Individual Observations -- Phase I [6.5.4.3.3.]
4. Chart for Individual Observations -- Phase II [6.5.4.3.4.]
5. Charts for Controlling Multivariate Variability [6.5.4.3.5.]
6. Constructing Multivariate Charts [6.5.4.3.6.]
5. Principal Components [6.5.5.]
1. Properties of Principal Components [6.5.5.1.]
2. Numerical Example [6.5.5.2.]
7. References [6.7.]
6.1. Introduction
Contents of Section
This section discusses the basic concepts of statistical process control, quality control, and process capability.

The concept of quality control in manufacturing was first advanced by Walter Shewhart
The first to apply the newly discovered statistical methods to the problem of quality control was Walter A. Shewhart of the Bell Telephone Laboratories. He issued a memorandum on May 16, 1924 that featured a sketch of a modern control chart.

Shewhart kept improving and working on this scheme, and in 1931 he published a book on statistical quality control, "Economic Control of Quality of Manufactured Product", published by Van Nostrand in New York. This book set the tone for subsequent applications of statistical methods to process control.
Contributions of Dodge and Romig to sampling inspection
Two other Bell Labs statisticians, H. F. Dodge and H. G. Romig, spearheaded efforts in applying statistical theory to sampling inspection. The work of these three pioneers constitutes much of what nowadays comprises the theory of statistical quality control. There is much more to say about the history of statistical quality control, and the interested reader is invited to peruse one or more of the references. A very good summary of the historical background of SQC is found in chapter 1 of "Quality Control and Industrial Statistics" by Acheson J. Duncan. See also Juran (1997).
Typical process control techniques
There are many ways to implement process control. Key monitoring and investigating tools include:
● Histograms
● Check Sheets
● Pareto Charts
● Cause and Effect Diagrams
● Defect Concentration Diagrams
● Scatter Diagrams
● Control Charts
All these are described in Montgomery (2000). This chapter will focus (Section 3) on control chart methods, specifically:
● Classical Shewhart control charts
● Cumulative Sum (CUSUM) charts
● Exponentially Weighted Moving Average (EWMA) charts
● Multivariate control charts
Tools of statistical quality control
Several techniques can be used to investigate the product for defects or defective pieces after all processing is complete. Typical tools of SQC (described in section 2) are:
● Lot Acceptance sampling plans
● Skip lot sampling plans
● Military (MIL) Standard sampling plans

Process must be stable
Note that the process must be stable before it can be centered at a target value or its overall variation can be reduced.
A process capability index uses both the process variability and the process specifications to determine whether the process is "capable"
We are often required to compare the output of a stable process with the process specifications and make a statement about how well the process meets specification. To do this we compare the natural variability of a stable process with the process specification limits.

A capable process is one where almost all the measurements fall inside the specification limits. This can be represented pictorially by the plot below:

There are several statistics that can be used to measure the capability of a process: Cp, Cpk, and Cpm.
Most capability index estimates are valid only if the sample size used is "large enough". Large enough is generally thought to be about 50 independent data values.

The Cp, Cpk, and Cpm statistics assume that the population of data values is normally distributed. Assuming a two-sided specification, if μ and σ are the mean and standard deviation, respectively, of the normal data and USL, LSL, and T are the upper specification limit, lower specification limit, and target value, respectively, then the population capability indices are defined as follows:
Definitions of various process capability indices

    Cp  = (USL - LSL) / (6σ)

    Cpk = min[ (USL - μ)/(3σ),  (μ - LSL)/(3σ) ]

    Cpm = (USL - LSL) / ( 6·sqrt( σ² + (μ - T)² ) )
Sample estimates of capability indices
Sample estimators for these indices are given below. (Estimators are indicated with a "hat" over them.) They are obtained by replacing μ and σ with the sample mean x-bar and sample standard deviation s:

    Cp-hat  = (USL - LSL) / (6s)
    Cpk-hat = min[ (USL - x-bar)/(3s),  (x-bar - LSL)/(3s) ]
    Cpm-hat = (USL - LSL) / ( 6·sqrt( s² + (x-bar - T)² ) )

The estimator for Cpk can also be expressed as Cpk = Cp(1 - k), where k is a scaled distance between the midpoint of the specification range, m, and the process mean, μ.

Denote the midpoint of the specification range by m = (USL + LSL)/2. The distance between the process mean, μ, and the optimum, which is m, is μ - m. The scaled distance is

    k = |m - μ| / ( (USL - LSL)/2 )

(the absolute value takes care of the case when μ falls below m). To determine the estimated value, k-hat, we estimate μ by x-bar. Note that k-hat ≤ 1.

The estimator for the Cp index, adjusted by the k factor, is

    Cpk-hat = Cp-hat · (1 - k-hat)
Plot showing Cp for varying process widths
To get an idea of the value of the Cp statistic for varying process widths, consider the following plot.

Translating capability into "rejects"

    USL - LSL    6σ       8σ        10σ       12σ
    Cp           1.00     1.33      1.66      2.00
    Rejects      0.27%    64 ppm    0.6 ppm   2 ppb

where ppm = parts per million and ppb = parts per billion. Note that the reject figures are based on the assumption that the distribution is centered at μ.
We have discussed the situation with two spec. limits, the USL and LSL. This is known as the bilateral or two-sided case. There are many cases where only the lower or upper specification is used. Using one spec limit is called unilateral or one-sided. The corresponding capability indices are

One-sided specifications and the corresponding capability indices

    Cpu = (USL - μ) / (3σ)     and     Cpl = (μ - LSL) / (3σ)

where μ and σ are the process mean and standard deviation, respectively.

Estimators of Cpu and Cpl are obtained by replacing μ and σ by x-bar and s, respectively. The following relationship holds

    Cp = (Cpu + Cpl) / 2.

This can be represented pictorially by
Confidence intervals for indices
Assuming normally distributed process data, the distribution of the sample Cp-hat follows from a Chi-square distribution, and Cpu-hat and Cpl-hat have distributions related to the non-central t distribution. Fortunately, approximate confidence limits related to the normal distribution have been derived. Various approximations to the distribution of Cpk-hat have been proposed, including those given by Bissell (1990), and we will use a normal approximation.

The resulting formulas for confidence limits are given below:

100(1-α)% Confidence Limits for Cp

    Cp-hat · sqrt( χ²(α/2, ν)/ν )  ≤  Cp  ≤  Cp-hat · sqrt( χ²(1-α/2, ν)/ν )

where
    ν = n - 1 degrees of freedom
Confidence Intervals for Cpu and Cpl
Approximate 100(1-α)% confidence limits for Cpu with sample size n are:

    Cpu-hat  ±  z(1-α/2) · sqrt( 1/(9n) + Cpu-hat²/(2(n-1)) )

with z denoting the percent point function of the standard normal distribution. If is not known, set it to .

Limits for Cpl are obtained by replacing Cpu-hat by Cpl-hat.
Confidence Interval for Cpk
Zhang et al. (1990) derived the exact variance for the estimator of Cpk as well as an approximation for large n. The reference paper is Zhang, Stenback and Wardrop (1990), "Interval Estimation of the Process Capability Index", Communications in Statistics: Theory and Methods, 19(21), 4455-4470.

The variance is obtained as follows:

Let

Then

where
It is important to note that the sample size should be at least 25 before these approximations are valid. In general, however, we need n ≥ 100 for capability studies. Another point to observe is that the sampling variation of estimated capability indices is not negligible, because the indices themselves are random quantities.
An example
For a certain process the USL = 20 and the LSL = 8. The observed process average is x-bar = 16, and the standard deviation is s = 2. From this we obtain

    Cp-hat = (USL - LSL)/(6s) = (20 - 8)/12 = 1.0

This means that the process is capable as long as it is located at the midpoint, m = (USL + LSL)/2 = 14.

But it is not, since x-bar = 16. The k-hat factor is found by

    k-hat = |m - x-bar| / ( (USL - LSL)/2 ) = 2/6 = 0.3333

and

    Cpk-hat = Cp-hat · (1 - k-hat) = 0.6667.

We would like to have Cpk-hat at least 1.0, so this is not a good process. If possible, reduce the variability or center the process, or both. We can compute Cpu-hat and Cpl-hat:

    Cpu-hat = (USL - x-bar)/(3s) = (20 - 16)/6 = 0.6667
    Cpl-hat = (x-bar - LSL)/(3s) = (16 - 8)/6  = 1.3333

From this we see that Cpu-hat, which is the smallest of the above indices, is 0.6667. Note that the formula Cpk-hat = Cp-hat(1 - k-hat) is the algebraic equivalent of the min{Cpu-hat, Cpl-hat} definition.
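The calculations above are easy to reproduce in software. The following is a minimal sketch (not from the Handbook) that computes the indices from summary statistics; the function name and argument names are illustrative choices.

    # Minimal sketch: capability indices from summary statistics.
    def capability_indices(usl, lsl, xbar, s, target=None):
        """Return Cp, Cpu, Cpl, Cpk (and Cpm if a target is given)."""
        cp = (usl - lsl) / (6.0 * s)
        cpu = (usl - xbar) / (3.0 * s)
        cpl = (xbar - lsl) / (3.0 * s)
        cpk = min(cpu, cpl)                  # algebraically equal to Cp * (1 - k)
        out = {"Cp": cp, "Cpu": cpu, "Cpl": cpl, "Cpk": cpk}
        if target is not None:
            out["Cpm"] = (usl - lsl) / (6.0 * (s**2 + (xbar - target)**2) ** 0.5)
        return out

    print(capability_indices(usl=20, lsl=8, xbar=16, s=2))
    # Cp = 1.0, Cpu = 0.667, Cpl = 1.333, Cpk = 0.667 (rounded), matching the example.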
What you can do with non-normal data
The indices that we considered thus far are based on normality of the process distribution. This poses a problem when the process distribution is not normal. Without going into the specifics, we can list some remedies.

1. Transform the data so that they become approximately normal. A popular transformation is the Box-Cox transformation.

2. Use or develop another set of indices that apply to nonnormal distributions. One statistic is called Cnpk (for non-parametric Cpk). Its estimator is calculated by

       Cnpk-hat = min[ (USL - median)/(p(0.995) - median),  (median - LSL)/(median - p(0.005)) ]

   where p(0.995) is the 99.5th percentile of the data and p(0.005) is the 0.5th percentile of the data.

   For additional information on nonnormal distributions, see Johnson and Kotz (1993).

There is, of course, much more that can be said about the case of nonnormal data. However, if a Box-Cox transformation can be successfully performed, one is encouraged to use it.
Definition of Lot Acceptance Sampling
Dodge reasoned that a sample should be picked at random from the lot, and on the basis of the information yielded by the sample, a decision should be made regarding the disposition of the lot. In general, the decision is either to accept or reject the lot. This process is called Lot Acceptance Sampling or just Acceptance Sampling.

Acceptance Quality Control and Acceptance Sampling
It was pointed out by Harold Dodge in 1969 that Acceptance Quality Control is not the same as Acceptance Sampling. The latter depends on specific sampling plans, which when implemented indicate the conditions for acceptance or rejection of the immediate lot that is being inspected. The former may be implemented in the form of an Acceptance Control Chart. The control limits for the Acceptance Control Chart are computed using the specification limits and the standard deviation of what is being monitored (see Ryan, 2000 for details).
Definitions of basic Acceptance Sampling terms
Deriving a plan, within one of the categories listed above, is discussed in the pages that follow. All derivations depend on the properties you want the plan to have. These are described using the following terms:
● Acceptable Quality Level (AQL): The AQL is a percent defective that is the base line requirement for the quality of the producer's product. The producer would like to design a sampling plan such that there is a high probability of accepting a lot that has a defect level less than or equal to the AQL.
● Lot Tolerance Percent Defective (LTPD): The LTPD is a designated high defect level that would be unacceptable to the consumer. The consumer would like the sampling plan to have a low probability of accepting a lot with a defect level as high as the LTPD.
● Type I Error (Producer's Risk): This is the probability, for a given (n,c) sampling plan, of rejecting a lot that has a defect level equal to the AQL. The producer suffers when this occurs, because a lot with acceptable quality was rejected. The symbol α is commonly used for the Type I error, and typical values for α range from 0.2 to 0.01.
● Type II Error (Consumer's Risk): This is the probability, for a given (n,c) sampling plan, of accepting a lot with a defect level equal to the LTPD. The consumer suffers when this occurs, because a lot with unacceptable quality was accepted. The symbol β is commonly used for the Type II error, and typical values for β range from 0.2 to 0.01.
● Operating Characteristic (OC) Curve: This curve plots the probability of accepting the lot (Y-axis) versus the lot fraction or percent defectives (X-axis). The OC curve is the primary tool for displaying and investigating the properties of a LASP.
● Average Outgoing Quality (AOQ): A common procedure, when sampling and testing is non-destructive, is to 100% inspect rejected lots and replace all defectives with good units. In this case, all rejected lots are made perfect and the only defects left are those in lots that were accepted. AOQ's refer to the long term defect level for this combined LASP and 100% inspection of rejected lots process. If all lots come in with a defect level of exactly p, and the OC curve for the chosen (n,c) LASP indicates a probability pa of accepting such a lot, over the long run the AOQ can easily be shown to be:

      AOQ = pa · p · (N - n) / N

  where N is the lot size.
The final choice is a tradeoff decision
Making a final choice between single or multiple sampling plans that have acceptable properties is a matter of deciding whether the average sampling savings gained by the various multiple sampling plans justifies the additional complexity of these plans and the uncertainty of not knowing how much sampling and inspection will be done on a day-by-day basis.
AQL is the foundation of the standard
The foundation of the Standard is the acceptable quality level, or AQL. In the following scenario, a certain military agency, called the Consumer from here on, wants to purchase a particular product from a supplier, called the Producer from here on.

In applying the Mil. Std. 105D it is expected that there is perfect agreement between Producer and Consumer regarding what the AQL is for a given product characteristic. It is understood by both parties that the Producer will be submitting for inspection a number of lots whose quality level is typically as good as specified by the Consumer. Continued quality is assured by the acceptance or rejection of lots following a particular sampling plan and also by providing for a shift to another, tighter sampling plan when there is evidence that the Producer's product does not meet the agreed-upon AQL.
Standard offers three types of sampling plans
Mil. Std. 105E offers three types of sampling plans: single, double, and multiple plans. The choice is, in general, up to the inspectors.

Because of the three possible selections, the standard does not give a sample size, but rather a sample size code letter. This, together with the decision on the type of plan, yields the specific sampling plan to be used.

Steps in the standard
The steps in the use of the standard can be summarized as follows:
1. Decide on the AQL.
2. Decide on the inspection level.
3. Determine the lot size.
4. Enter the table to find the sample size code letter.
5. Decide on the type of sampling to be used.
6. Enter the proper table to find the plan to be used.
7. Begin with normal inspection, following the switching rules and the rule for stopping the inspection (if needed).

Additional information
There is much more that can be said about Mil. Std. 105E (and 105D). The interested reader is referred to references such as Montgomery (2000), Schilling (tables 11-2 to 11-17), and Duncan (pages 214-248). There is also (currently) a web site developed by Galit Shmueli that will develop sampling plans interactively with the user, according to Military Standard 105E (ANSI/ASQC Z1.4, ISO 2859) Tables.
Number of defectives is approximately binomial
It is instructive to show how the points on this curve are obtained, once we have a sampling plan (n,c) -- later we will demonstrate how a sampling plan (n,c) is obtained.

We assume that the lot size N is very large compared to the sample size n, so that removing the sample doesn't significantly change the remainder of the lot, no matter how many defects are in the sample. Then the distribution of the number of defectives, d, in a random sample of n items is approximately binomial with parameters n and p, where p is the fraction of defectives per lot.

The binomial distribution
The probability of observing exactly d defectives is given by

    P(d) = [ n! / (d!(n-d)!) ] · p^d · (1-p)^(n-d)
Sample table for Pa, Pd using the binomial distribution
Using this formula with n = 52, c = 3, and p = .01, .02, ..., .12 we find

    Pa      Pd
    .998    .01
    .980    .02
    .930    .03
    .845    .04
    .739    .05
    .620    .06
    .502    .07
    .394    .08
    .300    .09
    .223    .10
    .162    .11
    .115    .12
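The table is just the cumulative binomial probability P(d ≤ c) evaluated over a grid of p values. A short sketch (not part of the Handbook) that regenerates it with SciPy:

    # Sketch: OC-curve points for the single sampling plan n = 52, c = 3.
    from scipy.stats import binom

    n, c = 52, 3
    for p in [i / 100 for i in range(1, 13)]:
        pa = binom.cdf(c, n, p)        # P(accept) = P(d <= c)
        print(f"{pa:.3f}   {p:.2f}")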
Equations for calculating a sampling plan with a given OC curve
In order to design a sampling plan with a specified OC curve one needs two designated points. Let us design a sampling plan such that the probability of acceptance is 1-α for lots with fraction defective p1 and the probability of acceptance is β for lots with fraction defective p2. Typical choices for these points are: p1 is the AQL, p2 is the LTPD, and α, β are the Producer's Risk (Type I error) and Consumer's Risk (Type II error), respectively.
Sample table of AOQ versus p
Setting p = .01, .02, ..., .12, we can generate the following table:

    AOQ      p
    .0010    .01
    .0196    .02
    .0278    .03
    .0338    .04
    .0369    .05
    .0372    .06
    .0351    .07
    .0315    .08
    .0270    .09
    .0223    .10
    .0178    .11
    .0138    .12
Interpretation of AOQ plot
From examining this curve we observe that when the incoming quality is very good (very small fraction of defectives coming in), the outgoing quality is also very good (very small fraction of defectives going out). When the incoming lot quality is very bad, most of the lots are rejected and then inspected. The "duds" are eliminated or replaced by good ones, so that the quality of the outgoing lots, the AOQ, becomes very good. In between these extremes, the AOQ rises, reaches a maximum, and then drops.

The maximum ordinate on the AOQ curve represents the worst possible quality that results from the rectifying inspection program. It is called the average outgoing quality limit (AOQL).

From the table we see that the AOQL = 0.0372 at p = .06 for the above example.

One final remark: if N >> n, then AOQ ≈ pa·p.
Sample table of ATI versus p
Setting p = .01, .02, ..., .14 generates the following table:

    ATI      p
    70       .01
    253      .02
    753      .03
    1584     .04
    2655     .05
    3836     .06
    5007     .07
    6083     .08
    7012     .09
    7779     .10
    8388     .11
    8854     .12
    9201     .13
    9453     .14

Plot of ATI versus p
A plot of ATI versus p, the Incoming Lot Quality (ILQ), is given below.
Design of a double sampling plan
The parameters required to construct the OC curve are similar to the single sample case. The two points of interest are (p1, 1-α) and (p2, β), where p1 is the lot fraction defective for plan 1 and p2 is the lot fraction defective for plan 2. As far as the respective sample sizes are concerned, the second sample size must be equal to, or an even multiple of, the first sample size.

There exist a variety of tables that assist the user in constructing double and multiple sampling plans. The index to these tables is the p2/p1 ratio, where p2 > p1. One set of tables, taken from the Army Chemical Corps Engineering Agency for α = .05 and β = .10, is given below:

Tables for n1 = n2

    R = p2/p1   accept numbers c1, c2   approximation values of pn1 for P = .95 and P = .10

Example
The left column holds α constant at 0.05 (P = 0.95 = 1 - α) and the right column holds β constant at 0.10 (P = 0.10). Then, holding α constant we find pn1 = 1.16, so n1 = 1.16/p1 = 116. And, holding β constant we find pn1 = 5.39, so n1 = 5.39/p2 = 108. Thus the desired sampling plan is

    n1 = 108    c1 = 2    n2 = 108    c2 = 4

If we opt for n2 = 2n1 and follow the same procedure using the appropriate table, the plan is:

    n1 = 77    c1 = 1    n2 = 154    c2 = 4

The first plan needs fewer samples if the number of defectives in sample 1 is greater than 2, while the second plan needs fewer samples if the number of defectives in sample 1 is less than 2.
Construction of the ASN curve
Since when using a double sampling plan the sample size depends on whether or not a second sample is required, an important consideration for this kind of sampling is the Average Sample Number (ASN) curve. This curve plots the ASN versus p', the true fraction defective in an incoming lot.

We will illustrate how to calculate the ASN curve with an example. Consider a double-sampling plan n1 = 50, c1 = 2, n2 = 100, c2 = 6, where n1 is the sample size for plan 1, with acceptance number c1, and n2, c2 are the sample size and acceptance number, respectively, for plan 2.

Let p' = .06. Then the probability of acceptance on the first sample, which is the chance of getting two or fewer defectives, is .416 (using binomial tables). The probability of rejection on the first sample, which is the chance of getting more than six defectives, is (1 - .971) = .029. The probability of making a decision on the first sample is therefore .445, the sum of .416 and .029. With complete inspection of the second sample, the average sample size is equal to the size of the first sample times the probability that there will be only one sample, plus the size of the combined samples times the probability that a second sample will be necessary. For the sampling plan under consideration, the ASN with complete inspection of the second sample for a p' of .06 is

    50(.445) + 150(.555) = 106

The general formula for the average sample number curve of a double-sampling plan with complete inspection of the second sample is

    ASN = n1·P1 + (n1 + n2)·(1 - P1) = n1 + n2·(1 - P1)

where P1 is the probability of a decision on the first sample. The graph below shows a plot of the ASN versus p'.
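A minimal sketch (not from the Handbook) of this ASN calculation, with the decision-on-first-sample probability computed from the binomial distribution:

    # Sketch: ASN for the double sampling plan n1 = 50, c1 = 2, n2 = 100, c2 = 6.
    from scipy.stats import binom

    def asn(p, n1=50, c1=2, n2=100, c2=6):
        # Decision on the first sample: accept if d1 <= c1, reject if d1 > c2.
        p1 = binom.cdf(c1, n1, p) + (1.0 - binom.cdf(c2, n1, p))
        return n1 * p1 + (n1 + n2) * (1.0 - p1)   # complete inspection of sample 2

    print(asn(0.06))   # about 105.5; the worked example rounds it to 106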
The ASN curve for a double sampling plan
Procedure for multiple sampling
The procedure commences with taking a random sample of size n1 from a large lot of size N and counting the number of defectives, d1.

    If d1 ≤ a1, the lot is accepted.
    If d1 ≥ r1, the lot is rejected.
    If a1 < d1 < r1, another sample is taken.

If subsequent samples are required, the first sample procedure is repeated sample by sample. For each sample, the total number of defectives found at any stage, say stage i, is

    D_i = d1 + d2 + ... + di
Item-by-item and group sequential sampling
The sequence can be one sample at a time, in which case the sampling process is usually called item-by-item sequential sampling. One can also select sample sizes greater than one, in which case the process is referred to as group sequential sampling. Item-by-item is more popular, so we concentrate on it. The operation of such a plan is illustrated below:

Diagram of item-by-item sampling

Equations for the limit lines
The equations for the two limit lines are functions of the parameters p1, α, p2, and β.

where

Instead of using the graph to determine the fate of the lot, one can resort to generating tables (with the help of a computer program).
    n inspected  accept  reject      n inspected  accept  reject
    1            x       x           14           x       2
    2            x       2           15           x       2
    3            x       2           16           x       3
    4            x       2           17           x       3
    5            x       2           18           x       3
    6            x       2           19           x       3
    7            x       2           20           x       3
    8            x       2           21           x       3
    9            x       2           22           x       3
    10           x       2           23           x       3
    11           x       2           24           0       3
    12           x       2           25           0       3
    13           x       2           26           0       3
The f and i parameters
The parameters f and i are essential to calculating the probability of acceptance for a skip-lot sampling plan. In this scheme, i, called the clearance number, is a positive integer and the sampling fraction f is such that 0 < f < 1. Hence, when f = 1 there is no longer skip-lot sampling. The calculation of the acceptance probability for the skip-lot sampling plan is performed via the following formula

ASN of skip-lot sampling plan
An important property of skip-lot sampling plans is the average sample number (ASN). The ASN of a skip-lot sampling plan is

    ASN(skip-lot) = F · ASN(reference)

where F is defined by

Therefore, since 0 < F < 1, it follows that the ASN of skip-lot sampling is smaller than the ASN of the reference sampling plan.

In summary, skip-lot sampling is preferred when the quality of the submitted lots is excellent and the supplier can demonstrate a proven track record.
Chart demonstrating the basis of the control chart
Why control charts "work"
The control limits as pictured in the graph might be 0.001 probability limits. If so, and if chance causes alone were present, the probability of a point falling above the upper limit would be one out of a thousand, and similarly, a point falling below the lower limit would be one out of a thousand. We would be searching for an assignable cause if a point were to fall outside these limits. Where we put these limits determines the risk of undertaking such a search when in reality there is no assignable cause for variation.

Since two out of a thousand is a very small risk, the 0.001 limits may be said to give practical assurance that, if a point falls outside these limits, the variation was caused by an assignable cause. It must be noted that two out of one thousand is a purely arbitrary number. There is no reason why it could not have been set to one out of a hundred or even larger. The decision would depend on the amount of risk the management of the quality control program is willing to take. In general (in the world of quality control) it is customary to use limits that approximate the 0.002 standard.

Letting X denote the value of a process characteristic, if the system of chance causes generates a variation in X that follows the normal distribution, the 0.001 probability limits will be very close to the 3σ limits. From normal tables we glean that the tail area beyond 3σ in one direction is 0.00135, or in both directions 0.0027. For normal distributions, therefore, the 3σ limits are the practical equivalent of 0.001 probability limits.
Strategies for dealing with out-of-control findings
If a data point falls outside the control limits, we assume that the process is probably out of control and that an investigation is warranted to find and eliminate the cause or causes.

Does this mean that when all points fall within the limits, the process is in control? Not necessarily. If the plot looks non-random, that is, if the points exhibit some form of systematic behavior, there is still something wrong. For example, if the first 25 of 30 points fall above the center line and the last 5 fall below the center line, we would wish to know why this is so. Statistical methods to detect sequences or nonrandom patterns can be applied to the interpretation of control charts. To be sure, "in control" implies that all points are between the control limits and they form a random pattern.
c4 factor
The sample standard deviation s is not an unbiased estimator of σ; its expected value is c4·σ, where the constant c4 depends only on the sample size n:

    c4 = sqrt( 2/(n-1) ) · Γ(n/2) / Γ((n-1)/2)

With this definition the reader should have no problem verifying that the c4 factor for n = 10 is .9727.

Mean and standard deviation of the estimators
So the mean or expected value of the sample standard deviation is c4·σ. The standard deviation of the sample standard deviation is

    σ_s = σ · sqrt(1 - c4²)
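A small sketch (not from the Handbook) that evaluates c4 and the relative standard deviation of s for a given subgroup size, using SciPy's log-gamma to avoid overflow:

    # Sketch: the c4 unbiasing constant and sd of s (as a multiple of sigma).
    import math
    from scipy.special import gammaln

    def c4(n):
        return math.sqrt(2.0 / (n - 1)) * math.exp(gammaln(n / 2) - gammaln((n - 1) / 2))

    n = 10
    print(round(c4(n), 4))                 # 0.9727, as quoted above
    print(math.sqrt(1 - c4(n) ** 2))       # multiply by sigma to get sd of s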
WECO rules based on probabilities
The WECO rules are based on probability. We know that, for a normal distribution, the probability of encountering a point outside ±3σ is 0.3%. This is a rare event. Therefore, if we observe a point outside the control limits, we conclude the process has shifted and is unstable. Similarly, we can identify other events that are equally rare and use them as flags for instability. The probability of observing two points out of three in a row between 2σ and 3σ and the probability of observing four points out of five in a row between 1σ and 2σ are also about 0.3%.
WECO rules increase false alarms
Note: While the WECO rules increase a Shewhart chart's sensitivity to trends or drifts in the mean, there is a severe downside to adding the WECO rules to an ordinary Shewhart control chart that the user should understand. When following the standard Shewhart "out of control" rule (i.e., signal if and only if you see a point beyond the plus or minus 3 sigma control limits) you will have "false alarms" every 371 points on the average (see the description of Average Run Length or ARL on the next page). Adding the WECO rules increases the frequency of false alarms to about once in every 91.75 points, on the average (see Champ and Woodall, 1987). The user has to decide whether this price is worth paying (some users add the WECO rules, but take them "less seriously" in terms of the effort put into troubleshooting activities when out-of-control signals occur).

With this background, the next page will describe how to construct Shewhart variables control charts.
X-bar and S Shewhart Control Charts
We begin with x-bar and s charts. We should use the s chart first to determine if the distribution for the process characteristic is stable.

Let us consider the case where we have to estimate σ by analyzing past data. Suppose we have m preliminary samples at our disposal, each of size n, and let si be the standard deviation of the ith sample. Then the average of the m standard deviations is

    s-bar = (s1 + s2 + ... + sm) / m
X-bar and R control charts
If the sample size is relatively small (say equal to or less than 10), we can use the range instead of the standard deviation of a sample to construct control charts on x-bar and the range, R. The range of a sample is simply the difference between the largest and smallest observations.

There is a statistical relationship (Patnaik, 1946) between the mean range for data from a normal distribution and σ, the standard deviation of that distribution. This relationship depends only on the sample size, n. The mean of R is d2·σ, where the value of d2 is also a function of n. An estimator of σ is therefore R-bar/d2.

Armed with this background we can now develop the x-bar and R control chart.

Let R1, R2, ..., Rk be the ranges of k samples. The average range is

    R-bar = (R1 + R2 + ... + Rk) / k
The R chart

R control charts
This chart controls the process variability, since the sample range is related to the process standard deviation. The center line of the R chart is the average range.

To compute the control limits we need an estimate of the true, but unknown, standard deviation σ. This can be found from the distribution of W = R/σ (assuming that the items that we measure follow a normal distribution). The standard deviation of W is d3, which is a known function of the sample size, n. It is tabulated in many textbooks on statistical quality control.

Therefore, since R = W·σ, the standard deviation of R is σ_R = d3·σ. But since the true σ is unknown, we may estimate σ_R by

    σ-hat_R = d3 · R-bar / d2

As was the case with the control chart parameters for the subgroup averages, defining another set of factors will ease the computations, namely:

    D3 = 1 - 3·d3/d2    and    D4 = 1 + 3·d3/d2.

These yield

    UCL = D4 · R-bar,    LCL = D3 · R-bar
Efficiency of R versus S

    n     Relative Efficiency
    2     1.000
    3     0.992
    4     0.975
    5     0.955
    6     0.930
    10    0.850

A typical sample size is 4 or 5, so not much is lost by using the range for such sample sizes.
    MR_i = | x_i - x_(i-1) |

which is the absolute value of the first difference (i.e., the difference between two consecutive data points) of the data. Analogous to the Shewhart control chart, one can plot both the data (which are the individuals) and the moving range.

Individuals control limits for an observation
For the control chart for individual measurements, the lines plotted are:

    UCL = x-bar + 3 · MR-bar / 1.128
    Center Line = x-bar
    LCL = x-bar - 3 · MR-bar / 1.128

where x-bar is the average of all the individuals and MR-bar is the average of all the moving ranges of two observations. Keep in mind that either or both averages may be replaced by a standard or target, if available. (Note that 1.128 is the value of d2 for n = 2.)
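A brief sketch (not from the Handbook) of these individuals and moving range chart limits; the flow-rate values below are made up for illustration, since the example's ten batch readings are not reproduced here.

    # Sketch: individuals (X) and moving range (MR) chart limits.
    import numpy as np

    def individuals_limits(x):
        x = np.asarray(x, dtype=float)
        mr = np.abs(np.diff(x))              # moving ranges of two observations
        xbar, mrbar = x.mean(), mr.mean()
        sigma_hat = mrbar / 1.128            # d2 for n = 2
        return {"center": xbar,
                "UCL": xbar + 3 * sigma_hat,
                "LCL": xbar - 3 * sigma_hat,
                "MR UCL": 3.267 * mrbar}     # D4 for n = 2

    print(individuals_limits([49.6, 47.6, 49.9, 51.3, 47.8, 51.2, 52.6, 52.4, 53.6, 52.1]))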
Example of moving range
The following example illustrates the control chart for individual observations. A new process was studied in order to monitor flow rate. The first 10 batches resulted in

Example of individuals chart
The process is in control, since none of the plotted points fall outside either the UCL or LCL.

Alternative for constructing individuals control chart
Note: Another way to construct the individuals chart is by using the standard deviation. Then we can obtain the chart from

    x-bar ± 3 · s / c4

It is preferable to have the limits computed this way for the start of Phase 2.
Definition of cumulative sum
The cumulative sum plotted at sample m is

    S_m = sum over i = 1, ..., m of (x_i - mu-hat_0)

where mu-hat_0 is the estimate of the in-control mean.

Sample V-Mask demonstrating an out-of-control process
Interpretation In the diagram above, the V-Mask shows an out of control situation
of the V-Mask because of the point that lies above the upper arm. By sliding the
on the plot V-Mask backwards so that the origin point covers other cumulative
sum data points, we can determine the first point that signaled an
out-of-control situation. This is useful for diagnosing what might have
caused the process to go out of control.
From the diagram it is clear that the behavior of the V-Mask is
determined by the distance k (which is the slope of the lower arm) and
the rise distance h. These are the design parameters of the V-Mask.
Note that we could also specify d and the vertex angle (or, as is more common in the literature, θ = half the vertex angle) as the design parameters, and we would end up with the same V-Mask.
In practice, designing and manually constructing a V-Mask is a
complicated procedure. A cusum spreadsheet style procedure shown
below is more practical, unless you have statistical software that
automates the V-Mask methodology. Before describing the spreadsheet
approach, we will look briefly at an example of a software V-Mask.
JMP example of V-Mask
An example will be used to illustrate how to construct and apply a V-Mask procedure using JMP. The 20 data points

    324.925, 324.675, 324.725, 324.350, 325.350, 325.225, 324.125,
    324.525, 325.225, 324.600, 324.625, 325.150, 328.325, 327.250,
    327.825, 328.500, 326.675, 327.775, 326.875, 328.350

are each the average of samples of size 4 taken from a process that has an estimated mean of 325. Based on process data, the process standard deviation is 1.27, and therefore the sample means used in the cusum procedure have a standard deviation of 1.27/√4 = 0.635.
After inputting the 20 sample means and selecting "control charts"
from the pull down "Graph" menu, JMP displays a "Control Charts"
screen and a "CUSUM Charts" screen. Since each sample mean is a
separate "data point", we choose a constant sample size of 1. We also
choose the option for a two sided Cusum plot shown in terms of the
original data.
JMP allows us a choice of either designing via the method using h and k or using an alpha and beta design approach. For the latter approach we must specify
● α, the probability of a false alarm, i.e., concluding that a shift in the process has occurred, while in fact it did not
● β, the probability of not detecting that a shift in the process mean has, in fact, occurred
● δ (delta), the amount of shift in the process mean that we wish to detect, expressed as a multiple of the standard deviation of the data points (which are the sample means).
Note: Technically, alpha and beta are calculated in terms of one
sequential trial where we monitor Sm until we have either an
out-of-control signal or Sm returns to the starting point (and the
monitoring begins, in effect, all over again).
JMP output from CUSUM procedure
When we click on chart we see the V-Mask placed over the last data point. The mask clearly indicates an out-of-control situation.

We next "grab" the V-Mask and move it back to the first point that indicated the process was out of control. This is point number 14, as shown below.

JMP CUSUM chart after moving V-Mask to first out-of-control point
A spreadsheet approach to cusum monitoring
Most users of cusum procedures prefer tabular charts over the V-Mask. The V-Mask is actually a carry-over from the pre-computer era. The tabular method can be quickly implemented by standard spreadsheet software.
To generate the tabular form we use the h and k parameters expressed in the original data units. It is also possible to use sigma units.

The following quantities are calculated:

    Shi(i) = max(0, Shi(i-1) + xi - target - k)
    Slo(i) = max(0, Slo(i-1) + target - k - xi)
Example of spreadsheet calculations
We will construct a cusum tabular chart for the example described above. For this example, the JMP parameter table gave h = 4.1959 and k = .3175. Using these design values, the tabular form of the example uses target = 325, h = 4.1959, k = 0.3175, with columns

    Group | x | x - 325 | x - 325 - k | Shi | 325 - k - x | Slo | Cusum

If the distance between a plotted point and the lowest previous point is equal to or greater than h, one concludes that the process mean has shifted (increased).
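A minimal sketch (not from the Handbook) of this tabular cusum for the 20 sample means above, using the target, h, and k values quoted:

    # Sketch: tabular (spreadsheet) cusum with target = 325, k = 0.3175, h = 4.1959.
    data = [324.925, 324.675, 324.725, 324.350, 325.350, 325.225, 324.125,
            324.525, 325.225, 324.600, 324.625, 325.150, 328.325, 327.250,
            327.825, 328.500, 326.675, 327.775, 326.875, 328.350]
    target, k, h = 325.0, 0.3175, 4.1959

    shi = slo = 0.0
    for i, x in enumerate(data, start=1):
        shi = max(0.0, shi + x - target - k)     # accumulates upward shifts
        slo = max(0.0, slo + target - k - x)     # accumulates downward shifts
        if shi > h or slo > h:
            print("out-of-control signal at point", i)   # first signal: point 14
            break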
h is the decision limit
Hence, h is referred to as the decision limit. Thus the sample size n, reference value k, and decision limit h are the parameters required for operating a one-sided CUSUM chart. If one has to control both positive and negative deviations, as is usually the case, two one-sided charts are used, with respective values k1, k2 (k1 > k2) and respective decision limits h and -h.
Standardizing shift in mean and decision limit
The shift in the mean can be expressed as - k. If we are dealing with normally distributed measurements, we can standardize this shift by
Determination of the ARL, given h and k
The average run length (ARL) at a given quality level is the average number of samples (subgroups) taken before an action signal is given. The standardized parameters ks and hs, together with the sample size n, are usually selected to yield approximate ARL's L0 and L1 at acceptable and rejectable quality levels μ0 and μ1, respectively. We would like to see a high ARL, L0, when the process is on target (i.e., in control), and a low ARL, L1, when the process mean shifts to an unsatisfactory level.

In order to determine the parameters of a CUSUM chart, the acceptable and rejectable quality levels along with the desired respective ARL's are usually specified. The design parameters can then be obtained in a number of ways. Unfortunately, the calculations of the ARL for CUSUM charts are quite involved.

There are several nomographs available from different sources that can be utilized to find the ARL's when the standardized h and k are given. Some of the nomographs solve the unpleasant integral equations that form the basis of the exact solutions, using an approximation of Systems of Linear Algebraic Equations (SLAE). This Handbook used a computer program that furnished the required ARL's given the standardized h and k. An example is given below:
Example of finding ARL's given the standardized h and k

    mean shift (k = .5)   ARL, h = 4   ARL, h = 5   Shewhart
    0                     336          930          371.00
    .25                   74.2         140          281.14
    .5                    26.6         30.0         155.22
    .75                   13.3         17.0         81.22
    1.00                  8.38         10.4         44.0
    1.50                  4.75         5.75         14.97
    2.00                  3.34         4.01         6.30
    2.50                  2.62         3.11         3.24
    3.00                  2.19         2.57         2.00
    4.00                  1.71         2.01         1.19
Using the table
If k = .5, then the shift of the mean (in multiples of the standard deviation of the mean) is obtained by adding .5 to the first column. For example, to detect a mean shift of 1 sigma at h = 4, the ARL = 8.38 (at the first column entry of .5).

The last column of the table contains the ARL's for a Shewhart control chart at selected mean shifts. The ARL for the Shewhart chart is 1/p, where p is the probability for a point to fall outside established control limits. Thus, for 3-sigma control limits and assuming normality, the probability of exceeding the upper control limit is .00135 and of falling below the lower control limit is also .00135, and their sum is .0027. (These numbers come from standard normal distribution tables or computer programs, setting z = 3.) Then the ARL = 1/.0027 = 370.37. This says that when a process is in control one expects an out-of-control signal (false alarm) every 371 runs, on the average.
ARL if a 1 sigma shift has occurred
When the mean shifts up by 1 sigma, the distance between the upper control limit and the shifted mean is 2 sigma (instead of 3 sigma). Entering normal distribution tables with z = 2 yields a probability of p = .02275 of exceeding this value. The distance between the shifted mean and the lower limit is now 4 sigma, and the probability of falling below -4 sigma is only .000032 and can be ignored. The ARL is 1/.02275 = 43.96.
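A quick sketch (not from the Handbook) reproducing these Shewhart ARL figures, where the out-of-limit probability is evaluated under a mean shift of a given number of standard deviations:

    # Sketch: Shewhart ARL = 1/p for a shift of `shift` sigma with 3-sigma limits.
    from scipy.stats import norm

    def shewhart_arl(shift):
        p = norm.sf(3 - shift) + norm.cdf(-3 - shift)   # above UCL + below LCL
        return 1.0 / p

    print(round(shewhart_arl(0), 2))   # 370.37, the in-control false-alarm rate
    print(round(shewhart_arl(1), 2))   # about 44, as computed above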
Shewhart is better for detecting large shifts, CUSUM is faster for small shifts
The conclusion can be drawn that the Shewhart chart is superior for detecting large shifts and the CUSUM scheme is faster for small shifts. The break-even point is a function of h, as the table shows.
Comparison of Shewhart control chart and EWMA control chart techniques
For the Shewhart chart control technique, the decision regarding the state of control of the process at any time, t, depends solely on the most recent measurement from the process and, of course, the degree of 'trueness' of the estimates of the control limits from historical data. For the EWMA control technique, the decision depends on the EWMA statistic, which is an exponentially weighted average of all prior data, including the most recent measurement.

By the choice of weighting factor, λ, the EWMA control procedure can be made sensitive to a small or gradual drift in the process, whereas the Shewhart control procedure can only react when the last data point is outside a control limit.
Choice of weighting factor
The parameter λ determines the rate at which 'older' data enter into the calculation of the EWMA statistic. A value of λ = 1 implies that only the most recent measurement influences the EWMA (which degrades to a Shewhart chart). Thus, a large value of λ gives more weight to recent data and less weight to older data; a small value of λ gives more weight to older data. The value of λ is usually set between 0.2 and 0.3 (Hunter), although this choice is somewhat arbitrary. Lucas and Saccucci (1990) give tables that help the user select λ.
Definition of control limits for EWMA
The center line for the control chart is the target value or EWMA0. The control limits are:

    UCL = EWMA0 + k·s_ewma
    LCL = EWMA0 - k·s_ewma

where the factor k is either set equal to 3 or chosen using the Lucas and Saccucci (1990) tables, and s_ewma, the (asymptotic) standard deviation of the EWMA statistic, is s·sqrt(λ/(2-λ)) with s the standard deviation calculated from the historical data. The data are assumed to be independent, and these tables also assume a normal population.

As with all control procedures, the EWMA procedure depends on a database of measurements that are truly representative of the process. Once the mean value and standard deviation have been calculated from this database, the process can enter the monitoring stage, provided the process was in control when the data were collected. If not, then the usual Phase 1 work would have to be completed first.
Sample data
Consider the following data consisting of 20 points, where points 1 - 10 are on the top row from left to right and points 11 - 20 are on the bottom row from left to right:

EWMA statistics for sample data
These data represent control measurements from the process which is to be monitored using the EWMA control chart technique. The corresponding EWMA statistics that are computed from this data set are:

Interpretation of EWMA control chart
The red dots are the raw data; the jagged line is the EWMA statistic over time. The chart tells us that the process is in control because all EWMAt lie between the control limits. However, there seems to be a trend upwards for the last 5 periods.
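A compact sketch (not from the Handbook) of the EWMA monitoring calculation, EWMA_t = λ·y_t + (1-λ)·EWMA_(t-1), with the asymptotic control limits given above. The data, target, and historical standard deviation used here are illustrative assumptions only.

    # Sketch: EWMA statistics and control limits for a monitored series.
    import numpy as np

    def ewma_chart(y, target, s, lam=0.3, k=3):
        half_width = k * s * np.sqrt(lam / (2 - lam))   # asymptotic limit half-width
        ewma, stats = target, []
        for value in y:
            ewma = lam * value + (1 - lam) * ewma
            stats.append(ewma)
        return np.array(stats), target + half_width, target - half_width

    y = [52.0, 47.0, 53.0, 49.3, 50.1, 47.0, 51.0, 50.1, 51.2, 50.5,
         49.6, 47.6, 49.9, 51.3, 47.8, 51.2, 52.6, 52.4, 53.6, 52.1]
    stats, ucl, lcl = ewma_chart(y, target=50, s=2.05)
    print(round(ucl, 2), round(lcl, 2))
    print(bool(np.any((stats > ucl) | (stats < lcl))))   # any out-of-control points?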
There is another chart which handles defects per unit, called the u chart
(for unit). This applies when we wish to work with the average number
of nonconformities per unit of product.
For additional references, see Woodall (1997), which reviews papers showing examples of attribute control charting, including examples from semiconductor manufacturing such as those examining the spatial dependence of defects.
The inspection unit
Before the control chart parameters are defined there is one more definition: the inspection unit. We shall count the number of defects that occur in a so-called inspection unit. More often than not, an inspection unit is a single unit or item of product; for example, a wafer. However, sometimes the inspection unit could consist of five wafers, or ten wafers, and so on. The size of the inspection units may depend on the recording facility, measuring equipment, operators, etc.

Suppose that defects occur in a given inspection unit according to the Poisson distribution, with parameter c (often denoted by np or the Greek letter λ). In other words
Control charts for counts, using the Poisson distribution

    P(x) = e^(-c) · c^x / x!

where x is the number of defects and c > 0 is the parameter of the Poisson distribution. It is known that both the mean and the variance of this distribution are equal to c. Then the k-sigma control chart is

    UCL = c + k·sqrt(c)
    Center Line = c
    LCL = c - k·sqrt(c)

If the LCL comes out negative, then there is no lower control limit. This control scheme assumes that a standard value for c is available. If this is not the case, then c may be estimated as the average of the number of defects in a preliminary sample of inspection units, call it c-bar. Usually k is set to 3 by many practitioners.
Control chart example using counts
An example may help to illustrate the construction of control limits for counts data. We are inspecting 25 successive wafers, each containing 100 chips. Here the wafer is the inspection unit. The observed numbers of defects are

    wafer  defects   wafer  defects   wafer  defects   wafer  defects   wafer  defects
    1      16        6      20        11     19        16     13        21     11
    2      14        7      10        12     17        17     14        22     19
    3      28        8      12        13     14        18     16        23     16
    4      16        9      10        14     16        19     11        24     31
    5      12        10     17        15     15        20     20        25     13

From these data, c-bar = 400/25 = 16, so the 3-sigma limits are UCL = 16 + 3·4 = 28 and LCL = 16 - 3·4 = 4.
Normal approximation to Poisson is adequate when the mean of the Poisson is at least 5
We have seen that the 3-sigma limits for a c chart, where c represents the number of nonconformities, are given by

    c-bar ± 3·sqrt(c-bar)

where it is assumed that the normal approximation to the Poisson distribution holds, hence the symmetry of the control limits. It is shown in the literature that the normal approximation to the Poisson is adequate when the mean of the Poisson is at least 5. When applied to the c chart this implies that the mean of the defects should be at least 5. This requirement will often be met in practice, but still, when the mean is smaller than 9 (solving the above inequality) there will be no lower control limit.

Let the mean be 10. Then the lower control limit = 0.513. However, P(c = 0) = .000045, using the Poisson formula. This is only 1/30 of the assumed area of .00135. So one has to raise the lower limit so as to get as close as possible to .00135. From Poisson tables or computer software we find that P(1) = .0005 and P(2) = .0027, so the lower limit should actually be 2 or 3.
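A small sketch (not from the Handbook) checking these lower-tail Poisson probabilities for a mean of 10:

    # Sketch: exact lower-tail Poisson probabilities for a c chart with mean 10.
    from math import sqrt
    from scipy.stats import poisson

    cbar = 10
    print(round(cbar - 3 * sqrt(cbar), 3))   # 0.513, the normal-approximation LCL
    print(poisson.pmf(0, cbar))              # P(c = 0)  = 0.0000454
    print(poisson.cdf(1, cbar))              # P(c <= 1) = 0.0005
    print(poisson.cdf(2, cbar))              # P(c <= 2) = 0.0028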
on the original scale, but they require special tables to obtain the
limits. Of course, software might be used instead.
Warning for highly skewed distributions
Note: In general, it is not a good idea to use 3-sigma limits for distributions that are highly skewed (see Ryan and Schwertman (1997) for more about the possibly extreme consequences of doing this).
The binomial distribution model for number of defectives in a sample
The binomial distribution is the appropriate model for the number of defectives, D, in a sample of n units when each unit independently has probability p of being nonconforming:

    P(D = x) = [ n! / (x!(n-x)!) ] · p^x · (1-p)^(n-x),   x = 0, 1, ..., n

The mean of D is np and the variance is np(1-p). The sample proportion nonconforming is the ratio of the number of nonconforming units in the sample, D, to the sample size n,

    p-hat = D / n

and the mean and variance of p-hat are p and p(1-p)/n, respectively.

p control charts for lot proportion defective
If the true fraction nonconforming p is known (or a standard value is given), then the center line and control limits of the fraction nonconforming control chart are

    UCL = p + 3·sqrt( p(1-p)/n )
    Center Line = p
    LCL = p - 3·sqrt( p(1-p)/n )
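A minimal sketch (not from the Handbook) of these p-chart limits for a known or standard fraction nonconforming p and subgroup size n; the p and n values shown are hypothetical.

    # Sketch: p-chart center line and control limits.
    from math import sqrt

    def p_chart_limits(p, n, k=3):
        half_width = k * sqrt(p * (1 - p) / n)
        # Negative lower limits are set to 0, and upper limits are capped at 1.
        return max(0.0, p - half_width), p, min(1.0, p + half_width)

    print(p_chart_limits(p=0.10, n=50))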
Hotelling charts for both means and dispersion
As in the univariate case, when data are grouped, the T² chart can be paired with a chart that displays a measure of variability within the subgroups for all the analyzed characteristics. The combined T² and dispersion charts are thus a multivariate counterpart of the univariate X-bar and S (or X-bar and R) charts.

Additional discussion
For more details and examples see the next page and also Tutorials, section 5, subsections 4.3, 4.3.1 and 4.3.2. An introduction to elements of multivariate analysis is also given in the Tutorials.

The constant c is the sample size from which the covariance matrix was estimated.
Mean and Covariance matrices
Let X1, ..., Xn be n p-dimensional vectors of observations that are sampled independently from Np(m, Σ) with p < n - 1, where Σ is the covariance matrix of X. The observed mean vector x-bar and the sample dispersion matrix S estimate m and Σ, respectively.

Additional discussion
See Tutorials (section 5), subsections 4.3, 4.3.1 and 4.3.2 for more details and examples. An introduction to elements of multivariate analysis is also given in the Tutorials.
Another way to monitor multivariate data: Principal Components control charts
Another way to analyze the data is to use principal components. For each multivariate measurement (or observation), the principal components are linear combinations of the standardized p variables (to standardize, subtract their respective targets and divide by their standard deviations). The principal components have two important advantages:
1. the new variables are uncorrelated (or almost), and
2. very often, a few (sometimes 1 or 2) principal components may capture most of the variability in the data so that we do not have to use all of the p principal components for control.

Additional discussion
More details and examples are given in the Tutorials (section 5).
The multivariate EWMA vector is defined by the recursion

    Zi = Λ·Xi + (I - Λ)·Z(i-1)

where Zi is the ith EWMA vector, Xi is the ith observation vector, i = 1, 2, ..., n, Z0 is the vector of variable values from the historical data, Λ = diag(λ1, λ2, ..., λp) is a diagonal matrix with λ1, λ2, ..., λp on the main diagonal, and p is the number of variables, that is, the number of elements in each vector.

Illustration of multivariate EWMA
The following illustration may clarify this. There are p variables and each variable contains n observations. The input data matrix looks like:
Simplification
It has been shown (Lowry et al., 1992) that the (k,l)th element of the covariance matrix of the ith EWMA, Σ_Zi, is

Table for selected values of λ and i
The following table gives the values of (1-λ)^(2i) for selected values of λ and i.

    1-λ \ 2i    4      6      8      10     12     20     30     40     50
    .9          .656   .531   .430   .349   .282   .122   .042   .015   .005
    .8          .410   .262   .168   .107   .069   .012   .001   .000   .000
    .7          .240   .118   .058   .028   .014   .001   .000   .000   .000
    .6          .130   .047   .017   .006   .002   .000   .000   .000   .000
    .5          .063   .016   .004   .001   .000   .000   .000   .000   .000
    .4          .026   .004   .001   .000   .000   .000   .000   .000   .000
    .3          .008   .001   .000   .000   .000   .000   .000   .000   .000
    .2          .002   .000   .000   .000   .000   .000   .000   .000   .000
    .1          .000   .000   .000   .000   .000   .000   .000   .000   .000
*****************************************************
* Multi-Variate EWMA Control Chart *
*****************************************************
The UCL = 5.938 for α = .05. Smaller choices of α are also used.
● Sales Forecasting
● Budgetary Analysis
● Yield Projections
● Inventory Studies
● Workload Projections
● Utility Studies
● Census Analysis
There are many methods used to model and forecast time series
Techniques: The fitting of time series models can be an ambitious undertaking. There are many methods of model fitting, including the following:
● Box-Jenkins ARIMA models
● Box-Jenkins Multivariate Models
● Holt-Winters Exponential Smoothing (single, double, triple)
The user's application and preference will decide the selection of the appropriate technique. It is beyond the realm and intention of the authors of this handbook to cover all these methods. The overview presented here will start by looking at some basic smoothing techniques:
● Averaging Methods
Taking averages is the simplest way to smooth data
We will first investigate some averaging methods, such as the "simple" average of all past data.

A manager of a warehouse wants to know how much a typical supplier delivers in 1000 dollar units. He/she takes a sample of 12 suppliers, at random, obtaining the following results:

    Supplier  Amount    Supplier  Amount
    1         9         7         11
    2         8         8         7
    3         9         9         13
    4         12        10        9
    5         9         11        11
    6         12        12        10

The computed mean or average of the data = 10. The manager decides to use this as the estimate for the expenditure of a typical supplier.

Is this a good or bad estimate?
The error for each supplier is the amount delivered minus the estimate of 10, and the squared error is the error squared:

    Supplier   $    Error   Error Squared
    1          9    -1      1
    2          8    -2      4
    3          9    -1      1
    4          12    2      4
    5          9    -1      1
    6          12    2      4
    7          11    1      1
    8          7    -3      9
    9          13    3      9
    10         9    -1      1
    11         11    1      1
    12         10    0      0
Table of MSE results for example using different estimates
So how good was the estimator for the amount spent for each supplier? Let us compare the estimate (10) with the following estimates: 7, 9, and 12. That is, we estimate that each supplier will spend $7, or $9, or $12.

Performing the same calculations we arrive at:

    Estimator    7      9     10    12
    SSE          144    48    36    84
    MSE          12     4     3     7

The estimator with the smallest MSE is the best. It can be shown mathematically that the estimator that minimizes the MSE for a set of random data is the mean.
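A tiny sketch (not from the Handbook) reproducing the SSE/MSE comparison above from the 12 supplier amounts:

    # Sketch: SSE and MSE of several constant estimators for the supplier data.
    amounts = [9, 8, 9, 12, 9, 12, 11, 7, 13, 9, 11, 10]

    for estimate in (7, 9, 10, 12):
        sse = sum((y - estimate) ** 2 for y in amounts)
        print(estimate, sse, sse / len(amounts))   # estimator, SSE, MSE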
Table showing squared error for the mean for sample data
Next we will examine the mean to see how well it predicts net income over time.

The next table gives the income before taxes of a PC manufacturer between 1985 and 1994.

    Year   $ (millions)   Mean     Error    Squared Error
    1985   46.163         48.776   -2.613   6.828
    1986   46.998         48.776   -1.778   3.161
    1987   47.816         48.776   -0.960   0.922
    1988   48.311         48.776   -0.465   0.216
    1989   48.758         48.776   -0.018   0.000
    1990   49.164         48.776    0.388   0.151
    1991   49.548         48.776    0.772   0.596
    1992   49.915         48.776    1.139   1.297
    1993   50.315         48.776    1.539   2.369
    1994   50.768         48.776    1.992   3.968
The mean is not a good estimator when there are trends
The question arises: can we use the mean to forecast income if we suspect a trend? A look at the graph below shows clearly that we should not do this.
Moving average example
The next table summarizes the process, which is referred to as Moving Averaging. The general expression for the moving average is

    Mt = [ Xt + Xt-1 + ... + Xt-N+1 ] / N

Results for a moving average of size 3:

    Supplier   $     MA       Error    Error squared
    1          9
    2          8
    3          9     8.667    0.333    0.111
    4          12    9.667    2.333    5.444
    5          9     10.000  -1.000    1.000
    6          12    11.000   1.000    1.000
    7          11    10.667   0.333    0.111
    8          7     10.000  -3.000    9.000
    9          13    10.333   2.667    7.111
    10         9     9.667   -0.667    0.444
    11         11    11.000   0        0
    12         10    10.000   0        0
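A short sketch (not from the Handbook) computing the 3-period moving average and its mean squared error for the supplier data, matching the table above:

    # Sketch: N-period moving average and its MSE.
    amounts = [9, 8, 9, 12, 9, 12, 11, 7, 13, 9, 11, 10]
    N = 3

    errors = []
    for t in range(N - 1, len(amounts)):
        ma = sum(amounts[t - N + 1 : t + 1]) / N
        errors.append(amounts[t] - ma)
        print(t + 1, round(ma, 3), round(errors[-1], 3))

    print("MSE =", sum(e * e for e in errors) / len(errors))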
A centered moving average of size 4 (the average of two adjacent 4-term moving averages) gives, for the same data:

    Supplier   $     Centered MA
    1          9
    2          8
    3          9     9.5
    4          12    10.0
    5          9     10.75
    6          12
    7          11
The first forecast is very important
The initial EWMA plays an important role in computing all the subsequent EWMA's. Setting S2 to y1 is one method of initialization. Another way is to set it to the target of the process.

Still another possibility would be to average the first four or five observations.

It can also be shown that the smaller the value of α, the more important is the selection of the initial EWMA. The user would be wise to try a few methods (assuming that the software has them available) before finalizing the settings.
Expand basic equation
Let us expand the basic equation by first substituting for St-1 in the basic equation to obtain

    St = α·y(t-1) + (1-α)·[ α·y(t-2) + (1-α)·S(t-2) ]
       = α·y(t-1) + α·(1-α)·y(t-2) + (1-α)²·S(t-2)

Summation formula for basic equation
By substituting for St-2, then for St-3, and so forth, until we reach S2 (which is just y1), it can be shown that the expanding equation can be written as:

    St = α · sum over i = 1, ..., t-2 of (1-α)^(i-1)·y(t-i)  +  (1-α)^(t-2)·S2

Expanded equation for S5
For example, the expanded equation for the smoothed value S5 is:

    S5 = α·[ y4 + (1-α)·y3 + (1-α)²·y2 ] + (1-α)³·S2

From the last formula we can see that the summation term shows that the contribution to the smoothed value St becomes less at each consecutive time period.
Example for α = .3
Let α = .3. Observe that the weights α(1-α)^t decrease exponentially (geometrically) with time.

    Value      weight
    last y1    .2100
    y2         .1470
    y3         .1029
    y4         .0720

How do you choose the weight parameter?
The speed at which the older responses are dampened (smoothed) is a function of the value of α. When α is close to 1, dampening is quick, and when α is close to 0, dampening is slow. This is illustrated in the table below:

    ---------------> towards past observations
    α    (1-α)    (1-α)²    (1-α)³    (1-α)⁴

We choose the best value for α, that is, the value which results in the smallest MSE.
Example
Let us illustrate this principle with an example. Consider the following data set consisting of 12 observations taken over time:

    Time   yt    S (α=.1)   Error    Error squared
    1      71
    2      70    71.00      -1.00    1.00
    3      69    70.90      -1.90    3.61
    4      68    70.71      -2.71    7.34
    5      64    70.44      -6.44    41.47
    6      65    69.80      -4.80    23.04
    7      72    69.32       2.68    7.18
    8      78    69.58       8.42    70.90
    9      75    70.43       4.57    20.88
    10     75    70.88       4.12    16.97
    11     75    71.29       3.71    13.76
    12     70    71.67      -1.67    2.79

The sum of the squared errors (SSE) = 208.94. The mean of the squared errors (MSE) is SSE/11 = 19.0.
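A compact sketch (not from the Handbook) of single exponential smoothing that reproduces the α = 0.1 column and its MSE, and can be rerun for other values of α:

    # Sketch: single exponential smoothing and its MSE (S2 initialized to y1).
    def ses_mse(y, alpha):
        s = y[0]
        sse, count = 0.0, 0
        for obs in y[1:]:
            err = obs - s
            sse += err * err
            count += 1
            s = alpha * obs + (1 - alpha) * s
        return sse / count

    y = [71, 70, 69, 68, 64, 65, 72, 78, 75, 75, 75, 70]
    print(round(ses_mse(y, 0.1), 2))   # about 19.0, as in the table above
    print(round(ses_mse(y, 0.5), 2))   # smaller than for alpha = 0.1 (see next paragraph)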
Calculate MSE for different values of α
The MSE was again calculated for α = .5 and turned out to be 16.29, so in this case we would prefer an α of .5. Can we do better? We could apply the proven trial-and-error method. This is an iterative procedure beginning with a range of α between .1 and .9. We determine the best initial choice for α and then search in a narrower interval around that value. We could repeat this perhaps one more time to find the best α to 3 decimal places.

Nonlinear optimizers can be used
But there are better search methods, such as the Marquardt procedure. This is a nonlinear optimizer that minimizes the sum of squares of residuals. In general, most well designed statistical software programs should be able to find the value of α that minimizes the MSE.
Sample plot showing smoothed data for 2 values of α

In other words, the new forecast is the old one plus an adjustment for the error that occurred in the last forecast.
Bootstrapping of Forecasts

Bootstrapping forecasts
What happens if you wish to forecast from some origin, usually the last data point, and no actual observations are available? In this situation we have to modify the formula to become:

    S(t+1) = α·y(origin) + (1-α)·S(t)

where y(origin) remains fixed at the last observed data point. This technique is known as bootstrapping.

Example of Bootstrapping

Example
The last data point in the previous example was 70 and its forecast (smoothed value S) was 71.7. Since we do have the data point and the forecast available, we can calculate the next forecast using the regular formula

    S(t+1) = 0.1·(70) + 0.9·(71.7) = 71.5  (rounded)

Table comparing two methods
The following table displays the comparison between the two methods:

    Period   Bootstrap forecast   Data   Single Smoothing Forecast
    13       71.50                75     71.5
    14       71.35                75     71.9
    15       71.21                74     72.2
    16       71.09                78     72.4
    17       70.98                86     73.0
Sample data set with trend
Let us demonstrate this with the following data set, smoothed with an α of 0.3:

    Data    Fit
    6.4
    5.6     6.4
    7.8     6.2
    8.8     6.7
    11.0    7.3
    11.6    8.4
    16.7    9.4
    15.3    11.6
    21.6    12.7
    22.4    15.4

Note that the current value of the series is used to calculate its smoothed value replacement in double exponential smoothing.
Initial Values
Several As in the case for single smoothing, there are a variety of schemes to set
methods to initial values for St and bt in double smoothing.
choose the
initial S1 is in general set to y1. Here are three suggestions for b1:
values b1 = y2 - y1
b1 = [(y2 - y1) + (y3 - y2) + (y4 - y3)]/3
b1 = (yn - y1)/(n - 1)
Comments
Meaning of The first smoothing equation adjusts St directly for the trend of the
the previous period, bt-1, by adding it to the last smoothed value, St-1. This
smoothing helps to eliminate the lag and brings St to the appropriate base of the
equations
current value.
The second smoothing equation then updates the trend, which is
expressed as the difference between the last two values. The equation is
similar to the basic form of single smoothing, but here applied to the
updating of the trend.
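A minimal Python sketch of these two update equations (the function name and the values α = 0.3 and γ = 0.3 are ours, purely for illustration; the initial values follow the first suggestion above, S1 = y1 and b1 = y2 - y1):

def double_smooth(y, alpha, gamma):
    """Double exponential smoothing: returns smoothed levels and m-step forecasts."""
    s, b = y[0], y[1] - y[0]            # S1 = y1, b1 = y2 - y1
    levels = [s]
    for obs in y[1:]:
        s_prev = s
        s = alpha * obs + (1 - alpha) * (s + b)      # level, adjusted for the trend
        b = gamma * (s - s_prev) + (1 - gamma) * b   # updated trend
        levels.append(s)
    forecasts = [s + m * b for m in range(1, 6)]     # m periods ahead: S + m*b
    return levels, forecasts

data = [6.4, 5.6, 7.8, 8.8, 11.0, 11.6, 16.7, 15.3, 21.6, 22.4]
levels, forecasts = double_smooth(data, alpha=0.3, gamma=0.3)
print([round(v, 1) for v in forecasts])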
Non-linear The values for α and γ can be obtained via non-linear optimization
optimization techniques, such as the Marquardt Algorithm.
techniques
can be used
Example
Comparison of Forecasts
Table To see how each method predicts the future, we computed the first five
showing forecasts from the last observation as follows:
single and
double          Period    Single    Double
exponential       11       22.4      25.8
smoothing         12       22.4      28.7
forecasts         13       22.4      31.7
                  14       22.4      34.6
                  15       22.4      37.6
Plot A plot of these results (using the forecasted double smoothing values) is
comparing very enlightening.
single and
double
exponential
smoothing
forecasts
This graph indicates that double smoothing follows the data much more closely
than single smoothing. Furthermore, for forecasting, single smoothing
cannot do better than projecting a straight horizontal line, which is not
very likely to occur in reality. So in this case double smoothing is
preferred.
To handle In this case double smoothing will not work. We now introduce a third
seasonality, equation to take care of seasonality (sometimes called periodicity). The
we have to resulting set of equations is called the "Holt-Winters" (HW) method after
add a third the names of the inventors.
parameter
The basic equations for their method are given by:
    St = α yt / It-L + (1-α)(St-1 + bt-1)        OVERALL SMOOTHING
    bt = γ (St - St-1) + (1-γ) bt-1              TREND SMOOTHING
    It = β yt / St + (1-β) It-L                  SEASONAL SMOOTHING
    Ft+m = (St + m bt) It-L+m                    FORECAST
where
● y is the observation
● S is the smoothed observation
● b is the trend factor
● I is the seasonal index
● F is the forecast at m periods ahead
● t is an index denoting a time period
and α, β, and γ are constants that must be estimated in such a way that the
MSE of the error is minimized. This is best left to a good software package.
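As one possibility (our choice; the handbook does not tie the method to a particular package), statsmodels' ExponentialSmoothing estimates the three smoothing constants by minimizing the squared one-step errors. The data below are the six years of quarterly sales used in the example that follows; an additive trend and a multiplicative seasonal component match the form of the equations above.

from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Six years of quarterly sales, read year by year from the table in the example.
sales = [362, 385, 432, 341,  382, 409, 498, 387,  473, 513, 582, 474,
         544, 582, 681, 557,  628, 707, 773, 592,  627, 725, 854, 661]

fit = ExponentialSmoothing(sales, trend="add", seasonal="mul",
                           seasonal_periods=4).fit()
print(fit.params)        # includes the estimated smoothing constants
print(fit.forecast(4))   # forecasts for the next four quarters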
Complete To initialize the HW method we need at least one complete season's data to
season determine initial estimates of the seasonal indices I t-L.
needed
L periods A complete season's data consists of L periods. And we need to estimate the
in a season trend factor from one period to the next. To accomplish this, it is advisable
to use two complete seasons; that is, 2L periods.
How to get The general formula to estimate the initial trend is given by
initial
estimates       b1 = (1/L) [ (yL+1 - y1)/L + (yL+2 - y2)/L + ... + (yL+L - yL)/L ]
for trend
and
seasonality
parameters
As we will see in the example, we work with data that consist of 6 years
with 4 periods (that is, 4 quarters) per year. Then L = 4.
Step 3: Step 3: Now the seasonal indices are formed by computing the average of
form each row. Thus the initial seasonal indices (symbolically) are:
seasonal I1 = ( y1/A1 + y5/A2 + y9/A3 + y13/A4 + y17/A5 + y21/A6)/6
indices I2 = ( y2/A1 + y6/A2 + y10/A3 + y14/A4 + y18/A5 + y22/A6)/6
I3 = ( y3/A1 + y7/A2 + y11/A3 + y15/A4 + y19/A5 + y23/A6)/6
I4 = ( y4/A1 + y8/A2 + y12/A3 + y16/A4 + y20/A5 + y24/A6)/6
We now know the algebra behind the computation of the initial estimates.
The next page contains an example of triple exponential smoothing.
Plot of raw data with single, double, and triple exponential forecasts
Computation The data set consists of quarterly sales data. The season is 1 year and
of initial since there are 4 quarters per year, L = 4. Using the formula we obtain:
trend
Table of
initial
seasonal            Year:    1      2      3      4      5      6
indices        Quarter 1    362    382    473    544    628    627
               Quarter 2    385    409    513    582    707    725
               Quarter 3    432    498    582    681    773    854
               Quarter 4    341    387    474    557    592    661
               Average      380    419    510.5  591    675    716.75
In this example we used the full 6 years of data. Other schemes may use
only 3, or some other number of years. There are also a number of ways
to compute initial estimates.
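The same arithmetic can be written as a short Python sketch (variable names are ours): the yearly averages are computed first, then each initial seasonal index is the average of the observation-to-yearly-average ratios for that quarter.

quarters = [  # rows = quarters 1-4, columns = years 1-6 (from the table above)
    [362, 382, 473, 544, 628, 627],
    [385, 409, 513, 582, 707, 725],
    [432, 498, 582, 681, 773, 854],
    [341, 387, 474, 557, 592, 661],
]

A = [sum(quarters[q][j] for q in range(4)) / 4 for j in range(6)]
print(A)   # [380.0, 419.0, 510.5, 591.0, 675.0, 716.75], as in the table

I = [sum(quarters[q][j] / A[j] for j in range(6)) / 6 for q in range(4)]
print([round(i, 3) for i in I])   # initial seasonal indices I1..I4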
Data Each line contains the CO2 concentration (mixing ratio in dry air,
expressed in the WMO X85 mole fraction scale, maintained by the Scripps
Institution of Oceanography). In addition, it contains the year, month, and
a numeric value for the combined month and year. This combined date is
useful for plotting purposes.
6.4.4.2. Stationarity
Stationarity A common assumption in many time series techniques is that the
data are stationary.
A stationary process has the property that the mean, variance and
autocorrelation structure do not change over time. Stationarity can
be defined in precise mathematical terms, but for our purpose we
mean a flat looking series, without trend, constant variance over
time, a constant autocorrelation structure over time and no periodic
fluctuations (seasonality).
The differenced data will contain one less point than the
original data. Although you can difference the data more than
once, one difference is usually sufficient.
2. If the data contain a trend, we can fit some type of curve to
the data and then model the residuals from that fit. Since the
purpose of the fit is to simply remove long term trend, a
simple fit, such as a straight line, is typically used.
3. For non-constant variance, taking the logarithm or square root
of the series may stabilize the variance. For negative data, you
can add a suitable constant to make all the data positive before
applying the transformation. This constant can then be
subtracted from the model to obtain predicted (i.e., the fitted)
values and forecasts for future points.
The above techniques are intended to generate series with constant location and scale.
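A small Python sketch of the three approaches (the series y is illustrative, not one of the handbook data sets): differencing, removing a fitted straight line, and a log transform for non-constant variance.

import numpy as np

y = np.array([10.2, 11.1, 12.3, 12.9, 14.2, 15.8, 16.4, 18.1])  # illustrative series
t = np.arange(len(y))

diffed = np.diff(y)                      # 1. differencing (one fewer point)

slope, intercept = np.polyfit(t, y, 1)   # 2. fit a straight line ...
detrended = y - (intercept + slope * t)  #    ... and model the residuals

logged = np.log(y)                       # 3. log transform for non-constant variance
                                         #    (add a constant first if any values <= 0)
print(diffed, detrended.round(2), logged.round(3), sep="\n")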
Example The following plots are from a data set of monthly CO2
concentrations.
Run Sequence
Plot
The initial run sequence plot of the data indicates a rising trend. A
visual inspection of this plot indicates that a simple linear fit should
be sufficient to remove this upward trend.
This plot also shows periodical behavior. This is discussed in the
next section.
Linear Trend
Removed
This plot contains the residuals from a linear fit to the original data.
After removing the linear trend, the run sequence plot indicates that
the data have a constant location and variance, although the pattern
of the residuals shows that the data depart from the model in a
systematic way.
6.4.4.3. Seasonality
Seasonality Many time series display seasonality. By seasonality, we mean periodic
fluctuations. For example, retail sales tend to peak for the Christmas
season and then decline after the holidays. So time series of retail sales
will typically show increasing sales from September through December
and declining sales in January and February.
Seasonality is quite common in economic time series. It is less common
in engineering and scientific data.
If seasonality is present, it must be incorporated into the time series
model. In this section, we discuss techniques for detecting seasonality.
We defer modeling of seasonality until later sections.
Both the seasonal subseries plot and the box plot assume that the
seasonal periods are known. In most cases, the analyst will in fact know
this. For example, for monthly data, the period is 12 since there are 12
months in a year. However, if the period is not known, the
autocorrelation plot can help. If there is significant seasonality, the
autocorrelation plot should show spikes at lags equal to the period. For
example, for monthly data, if there is a seasonality effect, we would
expect to see significant peaks at lag 12, 24, 36, and so on (although the
intensity may decrease the further out we go).
Example The following plots are from a data set of southern oscillations for
without predicting El Niño.
Seasonality
Run
Sequence
Plot
Seasonal
Subseries
Plot
The means for each month are relatively close and show no obvious
pattern.
Box Plot
Example The following plots are from a data set of monthly CO2 concentrations.
with A linear trend has been removed from these data.
Seasonality
Run
Sequence
Plot
Seasonal
Subseries
Plot
The seasonal subseries plot shows the seasonal pattern more clearly.
Box Plot
As with the seasonal subseries plot, the seasonal pattern is quite evident
in the box plot.
Sample Plot
This plot allows you to detect both between-group and within-group
patterns.
If there is a large number of observations, then a box plot may be
preferable.
Questions The seasonal subseries plot can provide answers to the following
questions:
1. Do the data exhibit a seasonal pattern?
2. What is the nature of the seasonality?
3. Is there a within-group pattern (e.g., do January and July exhibit
similar patterns)?
4. Are there any outliers once seasonality has been accounted for?
Software Seasonal subseries plots are available in a few general purpose statistical
software programs. They are available in Dataplot. It may be possible to
write macros to generate this plot in most statistical software programs
that do not provide it directly.
Trend, One approach is to decompose the time series into a trend, seasonal,
Seasonal, and residual component.
Residual
Decompositions Triple exponential smoothing is an example of this approach. Another
example, called seasonal loess, is based on locally weighted least
squares and is discussed by Cleveland (1993). We do not discuss
seasonal loess in this handbook.
    Xt = μ + At - θ1 At-1 - θ2 At-2 - ... - θq At-q
where Xt is the time series, μ is the mean of the series, At-i are white
noise terms, and θ1, ... , θq are the parameters of the model. The value of q
is called the order of the MA model.
That is, a moving average model is conceptually a linear regression of
the current value of the series against the white noise or random
shocks of one or more prior values of the series. The random shocks
at each point are assumed to come from the same distribution,
typically a normal distribution, with location at zero and constant
scale. The distinction in this model is that these random shocks are
propagated to future values of the time series. Fitting the MA
estimates is more complicated than with AR models because the error
terms are not observable. This means that iterative non-linear fitting
procedures need to be used in place of linear least squares. MA
models also have a less obvious interpretation than AR models.
Sometimes the ACF and PACF will suggest that a MA model would
be a better model choice and sometimes both AR and MA terms
should be used in the same model (see Section 6.4.4.5).
Note, however, that the error terms after the model is fit should be
independent and follow the standard assumptions for a univariate
process.
Box-Jenkins Box and Jenkins popularized an approach that combines the moving
Approach average and the autoregressive approaches in the book "Time Series
Analysis: Forecasting and Control" (Box, Jenkins, and Reinsel,
1994).
Although both autoregressive and moving average approaches were
already known (and were originally investigated by Yule), the
contribution of Box and Jenkins was in developing a systematic
methodology for identifying and estimating models that could
incorporate both approaches. This makes Box-Jenkins models a
powerful class of models. The next several sections will discuss these
models in detail.
    Xt = μ + φ1 Xt-1 + ... + φp Xt-p + At - θ1 At-1 - ... - θq At-q
where the terms in the equation have the same meaning as given for the
AR and MA models.
Stages in There are three primary stages in building a Box-Jenkins time series
Box-Jenkins model.
Modeling 1. Model Identification
2. Model Estimation
3. Model Validation
Detecting Stationarity can be assessed from a run sequence plot. The run
stationarity sequence plot should show constant location and scale. It can also be
detected from an autocorrelation plot. Specifically, non-stationarity is
often indicated by an autocorrelation plot with very slow decay.
Identify p and q Once stationarity and seasonality have been addressed, the next step
is to identify the order (i.e., the p and q) of the autoregressive and
moving average terms.
Autocorrelation The primary tools for doing this are the autocorrelation plot and the
and Partial partial autocorrelation plot. The sample autocorrelation plot and the
Autocorrelation sample partial autocorrelation plot are compared to the theoretical
Plots behavior of these plots when the order is known.
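As one way to produce these plots (our choice; the handbook's own plots are generated with Dataplot), statsmodels provides plot_acf and plot_pacf. The series below is synthetic and only illustrates the mechanics.

import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

rng = np.random.default_rng(0)
series = np.cumsum(rng.normal(size=200))   # illustrative non-stationary series
diffed = np.diff(series)                   # difference once to obtain stationarity

fig, axes = plt.subplots(2, 1, figsize=(7, 6))
plot_acf(diffed, lags=36, ax=axes[0])      # a cut-off here suggests the MA order q
plot_pacf(diffed, lags=36, ax=axes[1])     # a cut-off here suggests the AR order p
plt.tight_layout()
plt.show()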
Examples We show a typical series of plots for performing the initial model
identification for
1. the southern oscillations data and
2. the CO2 monthly concentrations data.
Run Sequence
Plot
Seasonal
Subseries Plot
Autocorrelation
Plot
Partial
Autocorrelation
Plot
Run Sequence
Plot
The initial run sequence plot of the data indicates a rising trend. A
visual inspection of this plot indicates that a simple linear fit should
be sufficient to remove this upward trend.
Linear Trend
Removed
This plot contains the residuals from a linear fit to the original data.
After removing the linear trend, the run sequence plot indicates that
the data have a constant location and variance, which implies
stationarity.
However, the plot does show seasonality. We generate an
autocorrelation plot to help determine the period followed by a
seasonal subseries plot.
Autocorrelation
Plot
Seasonal
Subseries Plot
Autocorrelation
Plot for
Seasonally
Differenced
Data
Partial
Autocorrelation
Plot of
Seasonally
Differenced
Data
remaining seasonality.
In summary, our initial attempt would be to fit an AR(2) model with a
seasonal AR(12) term on the data with a linear trend line removed.
We could try the model both with and without seasonal differencing
applied. Model validation should be performed before accepting this
as a final model.
Sample Plot
Questions The partial autocorrelation plot can help provide answers to the
following questions:
1. Is an AR model appropriate for the data?
2. If an AR model is appropriate, what order should we use?
Case Study The partial autocorrelation plot is demonstrated in the Negiz data case
study.
4-Plot of As discussed in the EDA chapter, one way to assess if the residuals
Residuals from the Box-Jenkins model follow the assumptions is to generate a
4-plot of the residuals and an autocorrelation plot of the residuals. One
could also look at the value of the Box-Ljung (1978) statistic.
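A sketch of this residual check with statsmodels (one possible tool, not the handbook's software); the data are synthetic and only illustrate the mechanics. Large p-values are consistent with independent residuals.

import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.stats.diagnostic import acorr_ljungbox

rng = np.random.default_rng(1)
y = np.cumsum(rng.normal(size=300))              # illustrative series

resid = ARIMA(y, order=(2, 1, 0)).fit().resid    # residuals from a fitted model
print(acorr_ljungbox(resid, lags=[12, 24]))      # Ljung-Box statistic and p-values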
Output from other software programs will be similar, but not identical.
Model
Identification
Section
With the SEMSTAT program, you start by entering a valid file name or you can select a
file extension to search for files of particular interest. In this program, if you press the
enter key, ALL file names in the directory are displayed.
Enter FILESPEC or EXTENSION (1-3 letters): To quit, press F10.
? bookf.bj
MAX MIN MEAN VARIANCE NO. DATA
80.0000 23.0000 51.7086 141.8238 70
Do you wish to make transformations? y/n n
Input order of difference or 0: 0
Input period of seasonality (2-12) or 0: 0
Model
Fitting Enter FILESPEC or EXTENSION (1-3 letters): To quit, press F10.
Section
? bookf.bj
MAX MIN MEAN VARIANCE NO. DATA
80.0000 23.0000 51.7086 141.8238 70
Do you wish to make transformations? y/n n
Input order of difference or 0: 0
Input NUMBER of AR terms: 2
Input NUMBER of MA terms: 0
Input period of seasonality (2-12) or 0: 0
*********** OUTPUT SECTION ***********
AR estimates with Standard Errors
Phi 1 : -0.3397 0.1224
Phi 2 : 0.1904 0.1223
Forecasting
Section
---------------------------------------------------
FORECASTING SECTION
---------------------------------------------------
Analyzing If you observe very large autocorrelations at lags spaced n periods apart, for
Autocorrelation example at lags 12 and 24, then there is evidence of periodicity. That effect
Plot for should be removed, since the objective of the identification stage is to reduce
Seasonality the autocorrelations throughout. So if simple differencing was not enough,
try seasonal differencing at a selected period. In the above case, the period is
12. It could, of course, be any value, such as 4 or 6.
The number of seasonal terms is rarely more than 1. If you know the shape of
your forecast function, or you wish to assign a particular shape to the forecast
function, you can select the appropriate number of terms for seasonal AR or
seasonal MA models.
The book by Box and Jenkins, Time Series Analysis Forecasting and Control
(the later edition is Box, Jenkins and Reinsel, 1994) has a discussion on these
forecast functions on pages 326 - 328. Again, if you have only a faint notion,
but you do know that there was a trend upwards before differencing, pick a
seasonal MA term and see what comes out in the diagnostics.
The results after taking a seasonal difference look good!
Model Fitting Now we can proceed to the estimation, diagnostics and forecasting routines.
Section The following program is again executed from a menu and issues the
following flow of output:
Enter FILESPEC or EXTENSION (1-3 letters):
To quit press F10.
? bookg.bj
MAX MIN MEAN VARIANCE NO. DATA
622.0000 104.0000 280.2986 14391.9170 144
Do you wish to make transformations? y/n    y   (we selected a square root
                                                transformation because a closer
                                                inspection of the plot revealed
                                                increasing variances over time)
Statistics of Transformed series:
Mean: 5.542 Variance 0.195
Input order of difference or 0:              1
Input NUMBER of AR terms:                    (blank -- defaults to 0)
Input NUMBER of MA terms:                    1
Input period of seasonality (2-12) or 0:     12
Input order of seasonal difference or 0:     1
Input NUMBER of seasonal AR terms:           (blank -- defaults to 0)
Input NUMBER of seasonal MA terms:           1
Statistics of Differenced series:
Forecasting Defaults are obtained by pressing the enter key, without input.
Section Default for number of periods ahead from last period = 6.
Default for the confidence band around the forecast = 90%.
Next Period Lower Forecast Upper
145 423.4257 450.1975 478.6620
146 382.9274 411.6180 442.4583
147 407.2839 441.9742 479.6191
148 437.8781 479.2293 524.4855
149 444.3902 490.1471 540.6153
150 491.0981 545.5740 606.0927
151 583.6627 652.7856 730.0948
152 553.5620 623.0632 701.2905
153 458.0291 518.6510 587.2965
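For comparison, the same kind of seasonal model, an ARIMA(0,1,1)x(0,1,1)12 on square-root-transformed data, can be sketched with statsmodels' SARIMAX (our choice of tool, not part of the SEMSTAT session above). The monthly series here is synthetic so the sketch runs on its own.

import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

rng = np.random.default_rng(2)
t = np.arange(144)
series = 120 + 2.0 * t + 20 * np.sin(2 * np.pi * t / 12) \
         + rng.normal(scale=5, size=144)          # illustrative monthly data

transformed = np.sqrt(series)                     # variance-stabilizing transform
model = SARIMAX(transformed, order=(0, 1, 1), seasonal_order=(0, 1, 1, 12))
result = model.fit(disp=False)

print(result.summary())
print(result.get_forecast(6).summary_frame(alpha=0.10))   # 6 ahead, 90% limits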
Interesting There are a few interesting properties associated with the phi or AR
properties of parameter matrices. Consider the following example for a bivariate
parameter series with n =2, p = 2, and q = 0. The ARMAV(2,0) model is:
matrices
Without loss of generality, assume that the X series is the input and the Y series
is the output, and that the mean vector μ = (0,0).
Therefore, transform the observations by subtracting their respective averages.
Diagonal The diagonal terms of each Phi matrix are the scalar estimates for each series,
terms of in this case:
Phi matrix
φ1.11, φ2.11 for the input series X, and
φ1.22, φ2.22 for the output series Y.
Transfer The lower off-diagonal elements represent the influence of the input on the
mechanism output.
This is called the "transfer" mechanism or transfer-function model as
discussed by Box and Jenkins in Chapter 11. The terms here correspond to
their terms.
The upper off-diagonal terms represent the influence of the output on the
input.
Feedback This is called "feedback". The presence of feedback can also be seen as a high
value for a coefficient in the correlation matrix of the residuals. A "true"
transfer model exists when there is no feedback.
This can be seen by expressing the matrix form into scalar form:
Delay Finally, delay or "dead" time can be measured by studying the lower
off-diagonal elements again.
If, for example, φ1.21 is non-significant, the delay is 1 time period.
Plots of The plots of the input and output series are displayed below.
input and
output
series
-------------------------------------------------------------------------------
Statistics on the Residuals
MEANS
-0.0000 0.0000
COVARIANCE MATRIX
0.01307 -0.00118
-0.00118 0.06444
CORRELATION MATRIX
1.0000 -0.0407
-0.0407 1.0000
----------------------------------------------------------------------
--------------------------------------------------------
FORECASTING SECTION
--------------------------------------------------------
The forecasting method is an extension of the model and follows the
theory outlined in the previous section. Based on the estimated variances
and number
of forecasts we can compute the forecasts and their confidence limits.
The user, in this software, is able to choose how many forecasts to
obtain, and at what confidence levels.
Defaults are obtained by pressing the enter key, without input.
Default for number of periods ahead from last period = 6.
Default for the confidence band around the forecast = 90%.
How many periods ahead to forecast? 6
Enter confidence level for the forecast limits : .90:
SERIES: 1
6.5. Tutorials
Tutorial 1. What do we mean by "Normal" data?
contents 2. What do we do when data are "Non-normal"?
3. Elements of Matrix Algebra
1. Numerical Examples
2. Determinant and Eigenstructure
4. Elements of Multivariate Analysis
1. Mean vector and Covariance Matrix
2. The Multivariate Normal Distribution
3. Hotelling's T2
1. Example of Hotelling's T2 Test
2. Example 1 (continued)
3. Example 2 (multiple groups)
4. Hotelling's T2 Chart
5. Principal Components
1. Properties of Principal Components
2. Numerical Example
Normal
probability
distribution
Parameters The parameters of the normal distribution are the mean μ and the
of normal standard deviation σ (or the variance σ²). A special notation is
distribution employed to indicate that X is normally distributed with these
parameters, namely
X ~ N(μ, σ) or X ~ N(μ, σ²).
Tables for the Tables of the cumulative standard normal distribution are given in
cumulative every statistics textbook and in the handbook. A rich variety of
standard approximations can be found in the literature on numerical methods.
normal
distribution For example, if μ = 0 and σ = 1 then the area under the curve from μ -
1σ to μ + 1σ is the area from 0 - 1 to 0 + 1, which is 0.6827. Since
most standard normal tables give area to the left of the lookup value,
they will have for z = 1 an area of .8413 and for z = -1 an area of .1587.
By subtraction we obtain the area between -1 and +1 to be .8413 -
.1587 = .6826.
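The same lookup can be done with software instead of a table; for example, with scipy (our choice of tool):

from scipy.stats import norm

left_of_plus1 = norm.cdf(1.0)     # 0.8413
left_of_minus1 = norm.cdf(-1.0)   # 0.1587
print(round(left_of_plus1, 4), round(left_of_minus1, 4),
      round(left_of_plus1 - left_of_minus1, 4))   # 0.8413 0.1587 0.6827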
The Box-Cox
Transformation
Given the vector of data observations x = x1, x2, ...xn, one way to select the
power λ is to use the λ that maximizes the logarithm of the likelihood
function
The logarithm
of the
likelihood
function where
Confidence In addition, a confidence bound (based on the likelihood ratio statistic) can
bound for λ be constructed for λ as follows: A set of λ values that represent an
approximate 100(1-α)% confidence bound for λ is formed from those λ
that satisfy
Example of the To illustrate the procedure, we used the data from Johnson and Wichern's
Box-Cox textbook (Prentice Hall 1988), Example 4.14. The observations are
scheme microwave radiation measurements.
Table of The values of the log-likelihood function obtained by varying λ from -2.0
log-likelihood to 2.0 are given below.
values for
various values      λ        LLF        λ        LLF        λ        LLF
of λ
                  -2.0      7.1146    -0.6     89.0587     0.7    103.0322
                  -1.9     14.1877    -0.5     92.7855     0.8    101.3254
                  -1.8     21.1356    -0.4     96.0974     0.9     99.3403
                  -1.7     27.9468    -0.3     98.9722     1.0     97.1030
                  -1.6     34.6082    -0.2    101.3923     1.1     94.6372
                  -1.5     41.1054    -0.1    103.3457     1.2     91.9643
                  -1.4     47.4229     0.0    104.8276     1.3     89.1034
                  -1.3     53.5432     0.1    105.8406     1.4     86.0714
                  -1.2     59.4474     0.2    106.3947     1.5     82.8832
                  -1.1     65.1147     0.3    106.5069     1.6     79.5521
                  -0.9     75.6471     0.4    106.1994     1.7     76.0896
                  -0.8     80.4625     0.5    105.4985     1.8     72.5061
                  -0.7     84.9421     0.6    104.4330     1.9     68.8106
The Box-Cox transform is also discussed in Chapter 1 under the Box-Cox
Linearity Plot and the Box-Cox Normality Plot. The Box-Cox normality
plot discussion provides a graphical method for choosing λ to transform a
data set to normality. The criterion used to choose λ for the Box-Cox
linearity plot is the value of λ that maximizes the correlation between the
transformed x-values and the y-values; for the Box-Cox normality plot, it is
the value of λ that maximizes the correlation coefficient of the normal
probability plot of the transformed data.
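A sketch of the same computation with scipy (our choice; the original example was worked with other software). The data vector here is an illustrative stand-in, not the Johnson and Wichern radiation measurements; boxcox_llf evaluates the log-likelihood for each trial λ.

import numpy as np
from scipy import stats

data = np.array([0.15, 0.09, 0.18, 0.10, 0.05, 0.12, 0.08,
                 0.05, 0.08, 0.10, 0.07, 0.02, 0.01, 0.10])   # illustrative

lambdas = np.arange(-2.0, 2.01, 0.1)
llf = [stats.boxcox_llf(lmb, data) for lmb in lambdas]
print(round(lambdas[int(np.argmax(llf))], 1))   # grid value with the largest LLF

_, lam_mle = stats.boxcox(data)                 # maximum-likelihood lambda
print(round(lam_mle, 3))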
Basic Vectors and matrices are arrays of numbers. The algebra for symbolic
definitions operations on them is different from the algebra for operations on
and scalars, or single numbers. For example there is no division in matrix
operations of algebra, although there is an operation called "multiplying by an
matrix inverse". It is possible to express the exact equivalent of matrix algebra
algebra - equations in terms of scalar algebra expressions, but the results look
needed for rather messy.
multivariate
analysis It can be said that the matrix algebra notation is shorthand for the
corresponding scalar longhand.
Sum of two The sum of two vectors (say, a and b) is the vector of sums of
vectors corresponding elements.
Sample matrices If
then
Matrix addition,
subtraction, and
multiplication
and
Multiply matrix To multiply a matrix by a given scalar, each element of the matrix
by a scalar is multiplied by that scalar.
Identity matrix To augment the notion of the inverse of a matrix, A-1 (A inverse) we
notice the following relation
A-1A = A A-1 = I
I is a matrix of form
Definition of the
characteristic
equation for 2x2
matrix
Definition of A matrix with rows and columns exchanged in this manner is called the
Transpose transpose of the original matrix.
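The operations described above are illustrated in the short numpy sketch below (the matrices are arbitrary examples):

import numpy as np

A = np.array([[4.0, 2.0],
              [1.0, 3.0]])
B = np.array([[1.0, 0.5],
              [2.0, 1.0]])

print(A + B)                 # element-wise sum
print(3 * A)                 # each element multiplied by the scalar
A_inv = np.linalg.inv(A)
print(A_inv @ A)             # A^-1 A = I (up to rounding)
print(np.eye(2))             # the identity matrix I
print(A.T)                   # rows and columns exchanged: the transpose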
Definition of The mean vector consists of the means of each variable and the
mean vector variance-covariance matrix consists of the variances of the variables
and along the main diagonal and the covariances between each pair of
variance- variables in the other matrix positions.
covariance
matrix The formula for computing the covariance of the variables X and Y is
    cov(X,Y) = Σ (Xi - x̄)(Yi - ȳ) / (n - 1)
where the mean vector contains the arithmetic averages of the three
variables and the (unbiased) variance-covariance matrix S is calculated
by
Centroid, The mean vector is often referred to as the centroid and the
dispersion variance-covariance matrix as the dispersion or dispersion matrix. Also,
matrix the terms variance-covariance matrix and covariance matrix are used
interchangeably.
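A short numpy sketch of these definitions (the data matrix is an arbitrary illustration, not the handbook's example values):

import numpy as np

X = np.array([[4.0, 2.0, 0.60],
              [4.2, 2.1, 0.59],
              [3.9, 2.0, 0.58],
              [4.3, 2.1, 0.62],
              [4.1, 2.2, 0.63]])

mean_vector = X.mean(axis=0)              # centroid
S = np.cov(X, rowvar=False, ddof=1)       # variances on the diagonal,
                                          # covariances off the diagonal
print(mean_vector)
print(S)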
where m = (m1, ..., mp) is the vector of means and Σ is the variance-covariance
matrix of the multivariate normal distribution. The shortcut notation for this density
is
Univariate When p = 1, the one-dimensional vector X = X1 has the normal distribution with
normal mean m and variance σ²
distribution
so that
with
Result does Although this result applies to hypothesis testing, it does not apply
not apply directly to multivariate Shewhart-type charts (for which there is no
directly to μ0), although the result might be used as an approximation when a
multivariate large sample is used and data are in subgroups, with the upper control
Shewhart-type limit (UCL) of a chart based on the approximation.
charts
Selection of Three-sigma units are generally not used with multivariate charts,
different however, which makes the selection of different control limit forms for
control limit each Phase (based on the relevant distribution theory), a natural
forms for choice.
each Phase
Obtaining the Each of the p overall means, one for each variable i = 1, 2, ..., p, is obtained the
overall means same way as with an x̄ chart, namely, by taking k subgroups of size n and
computing the average of the k subgroup averages.
Here x̄il is used to denote the average for the lth subgroup of the ith
variable. That is,
with xilr denoting the rth observation (out of n) for the ith variable in
the lth subgroup.
Estimating The variances and covariances are similarly averaged over the
the variances subgroups. Specifically, the sij elements of the variance-covariance
and matrix S are obtained as
covariances
Compare T2 As with an x̄ chart (or any other chart), the k subgroups would be
against tested for control by computing k values of T2 and comparing each
control against the UCL. If any value falls above the UCL (there is no lower
values control limit), the corresponding subgroup would be investigated.
Formula for Each of the k values of T² given in the equation above would be
the upper compared with the upper control limit
control limit
    UCL = [ p(k-1)(n-1) / (kn - k - p + 1) ] · F(α; p, kn-k-p+1)
Lower A lower control limit is generally not used in multivariate control chart
control limits applications, although some control chart methods do utilize a LCL.
Although a small value for T² might seem desirable, a value that is
very small would likely indicate a problem of some type, as we would
not expect every element of the subgroup mean vector to be virtually
equal to every element of the vector of overall means.
Delete As with any Phase I control chart procedure, if there are any points that
out-of-control plot above the UCL and can be identified as corresponding to
points once out-of-control conditions that have been corrected, the point(s) should
cause be deleted and the UCL recomputed. The remaining points would then
discovered be compared with the new UCL and the process continued as long as
and corrected necessary, remembering that points should be deleted only if their
correspondence with out-of-control conditions can be identified and the
cause(s) of the condition(s) were removed.
Illustration To illustrate, assume that a subgroups had been discarded (with possibly a = 0) so
that k - a subgroups are used in obtaining and . We shall let these two values
be represented by and to distinguish them from the original values, and
, before any subgroups are deleted. Future values to be plotted on the
multivariate chart would then be obtained from
Phase II
control
limits
with a denoting the number of the original subgroups that are deleted before
computing and . Notice that the equation for the control limits for Phase II
given here does not reduce to the equation for the control limits for Phase I when a
= 0, nor should we expect it to since the Phase I UCL is used when testing for
control of the entire set of subgroups that is used in computing and .
Delete As in the case when subgroups are used, if any points plot outside these
points if control limits and special cause(s) that were subsequently removed can
special be identified, the point(s) would be deleted and the control limits
cause(s) are recomputed, making the appropriate adjustments on the degrees of
identified freedom, and re-testing the remaining points against the new limits.
and
corrected
Further The control limit expressions given in this section and the immediately
Information preceding sections are given in Ryan (2000, Chapter 9).
Inverse While these principal factors represent or replace one or more of the
transformation original variables, it should be noted that they are not just a one-to-one
not possible transformation, so inverse transformations are not possible.
Original data To shed a light on the structure of principal components analysis, let
matrix us consider a multivariate data matrix X, with n rows and p columns.
The p elements of each row are scores or measurements on a subject
such as height, weight and age.
Linear Next, standardize the X matrix so that each column mean is 0 and
function that each column variance is 1. Call this matrix Z. Each column is a vector
maximizes variable, zi, i = 1, . . . , p. The main idea behind principal component
variance analysis is to derive a linear function y for each of the vector variables
zi. This linear function possesses an extremely important property;
namely, its variance is maximized.
Number of At this juncture you may be tempted to say: "so what?". To answer
parameters to this let us look at the intercorrelations among the elements of a vector
estimate variable. The number of parameters to be estimated for a p-element
increases variable is
rapidly as p ● p means
increases
● p variances
● (p² - p)/2 covariances
for a total of 2p + (p² - p)/2 parameters.
Orthogonal To produce a transformation vector for y for which the elements are
transformations uncorrelated is the same as saying that we want V such that Dy is a
simplify things diagonal matrix. That is, all the off-diagonal elements of Dy must be
zero. This is called an orthogonalizing transformation.
Infinite number There are an infinite number of values for V that will produce a
of values for V diagonal Dy for any correlation matrix R. Thus the mathematical
problem "find a unique V such that Dy is diagonal" cannot be solved
as it stands. A number of famous statisticians such as Karl Pearson
and Harold Hotelling pondered this problem and suggested a
"variance maximizing" solution.
Constrain v to The constraint on the numbers in v1 is that the sum of the squares of
generate a the coefficients equals 1. Expressed mathematically, we wish to
unique solution maximize
    (1/N) Σ y1i²   subject to   v1'v1 = 1,
where
y1i = v1' zi
The eigenstructure
Lagrange Let
multiplier
approach
    φ1 = v1' R v1 - λ1 (v1'v1 - 1),
introducing the restriction on v1 via the Lagrange multiplier λ1. It can be
shown (T.W. Anderson, 1958, page 347, theorem 8) that the vector of
partial derivatives is
    ∂φ1/∂v1 = 2 R v1 - 2 λ1 v1
and setting this equal to zero, dividing out 2 and factoring gives
    (R - λ1 I) v1 = 0.
Largest Specifically, the largest eigenvalue, λ1, and its associated vector, v1,
eigenvalue are required. Solving for this eigenvalue and vector is another
mammoth numerical task that can realistically only be performed by
a computer. In general, software is involved and the algorithms are
complex.
Remaining p After obtaining the first eigenvalue, the process is repeated until all p
eigenvalues eigenvalues are computed.
Principal Factors
Scale to zero It was mentioned before that it is helpful to scale any transformation
means and unit y of a vector variable z so that its elements have zero means and unit
variances variances. Such a standardized transformation is called a factoring of
z, or of R, and each linear component of the transformation is called
a factor.
Deriving unit Now, the principal components already have zero means, but their
variances for variances are not 1; in fact, they are the eigenvalues, comprising the
principal diagonal elements of L. It is possible to derive the principal factor
components with unit variance from the principal component as follows
where
B = VL -1/2
B matrix The matrix B is then the matrix of factor score coefficients for
principal factors.
Dimensionality The number of eigenvalues, N, used in the final set determines the
of the set of dimensionality of the set of factor scores. For example, if the original
factor scores test consisted of 8 measurements on 100 subjects, and we extract 2
eigenvalues, the set of factor scores is a matrix of 100 rows by 2
columns.
Factor Structure
The SS' is the source of the "explained" correlations among the variables.
communality Its diagonal is called "the communality".
Rotation
Factor analysis If this correlation matrix, i.e., the factor structure matrix, does not
help much in the interpretation, it is possible to rotate the axis of the
principal components. This may result in the polarization of the
correlation coefficients. Some practitioners refer to rotation after
generating the factor structure as factor analysis.
Communality
Formula for A measure of how well the selected factors (principal components)
communality "explain" the variance of each of the variables is given by a statistic
statistic called communality. This is defined by
Explanation of That is: the square of the correlation of variable k with factor i gives
communality the part of the variance accounted for by that factor. The sum of these
statistic squares for n factors is the communality, or explained variance, for
that variable (row).
Main steps to In summary, here are the main steps to obtain the eigenstructure for a
obtaining correlation matrix.
eigenstructure 1. Compute R, the correlation matrix of the original data. R is
for a also the correlation matrix of the standardized data.
correlation
2. Obtain the characteristic equation of R which is a polynomial
matrix
of degree p (the number of variables), obtained from
expanding the determinant of |R - λI| = 0 and solving for the
roots λi, that is: λ1, λ2, ... , λp.
3. Then solve for the columns of the V matrix, (v1, v2, ..., vp). The
roots, λi, are called the eigenvalues (or latent values). The
columns of V are called the eigenvectors.
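These steps can be carried out numerically; the numpy sketch below (with an arbitrary illustrative data matrix) computes R and its eigenstructure directly rather than expanding the characteristic polynomial by hand.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))              # 10 observations on 3 variables

R = np.corrcoef(X, rowvar=False)          # step 1: correlation matrix
eigenvalues, V = np.linalg.eigh(R)        # steps 2-3: roots of |R - lambda*I| = 0
                                          # and the corresponding eigenvectors
order = np.argsort(eigenvalues)[::-1]     # sort from largest to smallest
eigenvalues, V = eigenvalues[order], V[:, order]

print(eigenvalues, eigenvalues.sum())     # the eigenvalues sum to p = trace(R)
print(V)                                  # columns are the eigenvectors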
Sample data Let us analyze the following 3-variate dataset with 10 observations. Each
set observation consists of 3 measurements on a wafer: thickness, horizontal
displacement and vertical displacement.
Solve for the Next solve for the roots of R, using software
roots of R
               eigenvalue    cumulative proportion
       1         1.769              .590
       2          .927              .899
       3          .304             1.000
Notice that
● Each eigenvalue satisfies |R - λI| = 0.
● The sum of the eigenvalues = 3 = p, which is equal to the trace of R (i.e., the
sum of the main diagonal elements).
● The determinant of R is the product of the eigenvalues.
● The product is λ1 x λ2 x λ3 = .499.
Compute the Substituting the first eigenvalue of 1.769 and R in the appropriate equation we
first column obtain
of the V
matrix
This is the matrix expression for 3 homogeneous equations with 3 unknowns and
yields the first column of V: .64 .69 -.34 (again, a computerized solution is
indispensable).
Compute the Repeating this procedure for the other 2 eigenvalues yields the matrix V
remaining
columns of
the V matrix
Notice that if you multiply V by its transpose, the result is an identity matrix,
V'V=I.
Compute the Now form the matrix L1/2, which is a diagonal matrix whose elements are the
L1/2 matrix square roots of the eigenvalues of R. Then obtain S, the factor structure, using S =
V L1/2
So, for example, .91 is the correlation between variable 2 and the first principal
component.
Compute the Next compute the communality, using the first two eigenvalues only
communality
This means that the first two principal components "explain" 86.62% of the first
variable, 84.20 % of the second variable, and 98.76% of the third.
Compute the The coefficient matrix, B, is formed using the reciprocals of the diagonals of L1/2
coefficient
matrix
Compute the Finally, we can compute the factor scores from ZB, where Z is X converted to
principal standard score form. These columns are the principal factors.
factors
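The remaining matrix algebra, again as a numpy sketch on an arbitrary illustrative data matrix: the factor structure S = V L^(1/2), the communalities from the first two factors, the coefficient matrix B = V L^(-1/2), and the factor scores ZB.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))

Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardized data
R = np.corrcoef(X, rowvar=False)
eigenvalues, V = np.linalg.eigh(R)
order = np.argsort(eigenvalues)[::-1]
L, V = np.diag(eigenvalues[order]), V[:, order]

S = V @ np.sqrt(L)                    # factor structure: correlations between
                                      # variables and principal components
communality = (S[:, :2] ** 2).sum(axis=1)   # explained variance, first 2 factors
B = V @ np.linalg.inv(np.sqrt(L))     # factor score coefficients
scores = Z @ B                        # the principal factors
print(S.round(2), communality.round(3), scores.round(2), sep="\n\n")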
Principal These factors can be plotted against the indices, which could be times. If time is
factors used, the resulting plot is an example of a principal factors control chart.
control
chart
Semiconductor One of the assumptions in using classical Shewhart SPC charts is that the only
processing source of variation is from part to part (or within subgroup variation). This is
creates the case for most continuous processing situations. However, many of today's
multiple processing situations have different sources of variation. The semiconductor
sources of industry is one of the areas where the processing creates multiple sources of
variability to variation.
monitor
In semiconductor processing, the basic experimental unit is a silicon wafer.
Operations are performed on the wafer, but individual wafers can be grouped
multiple ways. In the diffusion area, up to 150 wafers are processed in one
time in a diffusion tube. In the etch area, single wafers are processed
individually. In the lithography area, the light exposure is done on sub-areas of
the wafer. There are many times during the production of a computer chip
where the experimental unit varies and thus there are different sources of
variation in this batch processing environment.
The following is a case study of a lithography process. Five sites are measured
on each wafer, three wafers are measured in a cassette (typically a grouping of
24 - 25 wafers) and thirty cassettes of wafers are used in the study. The width
of a line is the measurement under study. There are two line width variables.
The first is the original data and the second has been cleaned up somewhat.
This case study uses the raw data. The entire data table is 450 rows long with
six columns.
Case study
data: wafer
line width
measurements Cassette  Wafer  Site  Raw Line Width  Sequence  Cleaned Line Width
=====================================================
1 1 Top 3.199275 1 3.197275
1 1 Lef 2.253081 2 2.249081
1 1 Cen 2.074308 3 2.068308
1 1 Rgt 2.418206 4 2.410206
1 1 Bot 2.393732 5 2.383732
1 2 Top 2.654947 6 2.642947
1 2 Lef 2.003234 7 1.989234
1 2 Cen 1.861268 8 1.845268
1 2 Rgt 2.136102 9 2.118102
1 2 Bot 1.976495 10 1.956495
1 3 Top 2.887053 11 2.865053
1 3 Lef 2.061239 12 2.037239
1 3 Cen 1.625191 13 1.599191
1 3 Rgt 2.304313 14 2.276313
1 3 Bot 2.233187 15 2.203187
2 1 Top 3.160233 16 3.128233
2 1 Lef 2.518913 17 2.484913
2 1 Cen 2.072211 18 2.036211
2 1 Rgt 2.287210 19 2.249210
2 1 Bot 2.120452 20 2.080452
2 2 Top 2.063058 21 2.021058
2 2 Lef 2.217220 22 2.173220
2 2 Cen 1.472945 23 1.426945
2 2 Rgt 1.684581 24 1.636581
2 2 Bot 1.900688 25 1.850688
2 3 Top 2.346254 26 2.294254
2 3 Lef 2.172825 27 2.118825
2 3 Cen 1.536538 28 1.480538
2 3 Rgt 1.966630 29 1.908630
2 3 Bot 2.251576 30 2.191576
3 1 Top 2.198141 31 2.136141
3 1 Lef 1.728784 32 1.664784
3 1 Cen 1.357348 33 1.291348
3 1 Rgt 1.673159 34 1.605159
3 1 Bot 1.429586 35 1.359586
3 2 Top 2.231291 36 2.159291
3 2 Lef 1.561993 37 1.487993
3 2 Cen 1.520104 38 1.444104
3 2 Rgt 2.066068 39 1.988068
4-Plot of
Data
Run
Sequence
Plot of Data
Numerical
Summary
SUMMARY
***********************************************************************
* LOCATION MEASURES * DISPERSION MEASURES
*
***********************************************************************
* MIDRANGE = 0.2957607E+01 * RANGE = 0.4422122E+01
*
* MEAN = 0.2532284E+01 * STAND. DEV. = 0.6937559E+00
*
* MIDMEAN = 0.2393183E+01 * AV. AB. DEV. = 0.5482042E+00
*
* MEDIAN = 0.2453337E+01 * MINIMUM = 0.7465460E+00
*
* = * LOWER QUART. = 0.2046285E+01
*
* = * LOWER HINGE = 0.2048139E+01
*
* = * UPPER HINGE = 0.2971948E+01
*
* = * UPPER QUART. = 0.2987150E+01
*
* = * MAXIMUM = 0.5168668E+01
*
***********************************************************************
* RANDOMNESS MEASURES * DISTRIBUTIONAL MEASURES
*
***********************************************************************
* AUTOCO COEF = 0.6072572E+00 * ST. 3RD MOM. = 0.4527434E+00
*
* = 0.0000000E+00 * ST. 4TH MOM. = 0.3382735E+01
*
* = 0.0000000E+00 * ST. WILK-SHA = 0.6957975E+01
*
* = * UNIFORM PPCC = 0.9681802E+00
*
* = * NORMAL PPCC = 0.9935199E+00
*
* = * TUK -.5 PPCC = 0.8528156E+00
*
* = * CAUCHY PPCC = 0.5245036E+00
*
***********************************************************************
This summary generates a variety of statistics. In this case, we are primarily interested in
the mean and standard deviation. From this summary, we see that the mean is 2.53 and
the standard deviation is 0.69.
Plot response The next step is to plot the response against each individual factor. For
against comparison, we generate both a scatter plot and a box plot of the data.
individual The scatter plot shows more detail. However, comparisons are usually
factors easier to see with the box plot, particularly as the number of data points
and groups becomes larger.
Scatter plot
of width
versus
cassette
Box plot of
width versus
cassette
Interpretation We can make the following conclusions based on the above scatter and
box plots.
1. There is considerable variation in the location for the various
cassettes. The medians vary from about 1.7 to 4.
2. There is also some variation in the scale.
3. There are a number of outliers.
Scatter plot
of width
versus wafer
Box plot of
width versus
wafer
Interpretation We can make the following conclusions based on the above scatter and
box plots.
1. The locations for the 3 wafers are relatively constant.
2. The scales for the 3 wafers are relatively constant.
3. There are a few outliers on the high side.
4. It is reasonable to treat the wafer factor as homogeneous.
Scatter plot
of width
versus site
Box plot of
width versus
site
Interpretation We can make the following conclusions based on the above scatter and
box plots.
1. There is some variation in location based on site. The center site
in particular has a lower median.
2. The scales are relatively constant across sites.
3. There are a few outliers.
Dex mean We can use the dex mean plot and the dex standard deviation plot to
and sd plots show the factor means and standard deviations together for better
comparison.
Dex mean
plot
Dex sd plot
Summary The above graphs show that there are differences between the lots and
the sites.
There are various ways we can create subgroups of this dataset: each
lot could be a subgroup, each wafer could be a subgroup, or each site
measured could be a subgroup (with only one data value in each
subgroup).
Recall that for a classical Shewhart Means chart, the average within
subgroup standard deviation is used to calculate the control limits for
the Means chart. However, on the means chart you are monitoring the
subgroup mean-to-mean variation. There is no problem if you are in a
continuous processing situation - this becomes an issue if you are
operating in a batch processing environment.
We will look at various control charts based on different subgroupings
next.
Site as The first pair of control charts use the site as the subgroup. However,
subgroup since site has a subgroup size of one we use the control charts for
individual measurements. A moving average and a moving range chart
are shown.
Moving
average
control chart
Moving
range control
chart
Wafer as The next pair of control charts use the wafer as the subgroup. In this
subgroup case, that results in a subgroup size of 5. A mean and a standard
deviation control chart are shown.
Mean control
chart
SD control
chart
Note that there is no LCL here because of the small subgroup size.
Cassette as The next pair of control charts use the cassette as the subgroup. In this
subgroup case, that results in a subgroup size of 15. A mean and a standard
deviation control chart are shown.
Mean control
chart
SD control
chart
Interpretation Which of these subgroupings of the data is correct? As you can see,
each subgrouping produces a different chart. Part of the answer lies in
the manufacturing requirements for this process. Another aspect that
can be statistically determined is the magnitude of each of the sources
of variation. In order to understand our data structure and how much
variation each of our sources contribute, we need to perform a variance
component analysis. The variance component analysis for this data set
is shown below.
Equating If your software does not generate the variance components directly,
mean squares they can be computed from a standard analysis of variance output by
with expected equating mean squares (MSS) to expected mean squares (EMS), as
values sketched below.
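A sketch of that calculation for this nested design (sites within wafers within cassettes). The expected-mean-square coefficients are the standard ones for a balanced nested random-effects model; the mean-square values below are placeholders, not the JMP output.

# For a balanced design with w wafers per cassette and s sites per wafer:
#   EMS(site)     = var_site
#   EMS(wafer)    = var_site + s*var_wafer
#   EMS(cassette) = var_site + s*var_wafer + s*w*var_cassette
s, w = 5, 3

ms_cassette, ms_wafer, ms_site = 5.0, 0.60, 0.20     # placeholder MS values

var_site = ms_site
var_wafer = (ms_wafer - ms_site) / s
var_cassette = (ms_cassette - ms_wafer) / (s * w)

total = var_site + var_wafer + var_cassette
for name, v in [("cassette", var_cassette), ("wafer", var_wafer), ("site", var_site)]:
    print(name, round(v, 4), f"{100 * v / total:.0f}% of total")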
JMP ANOVA Below we show SAS JMP 4 output for this dataset that gives the SS,
output MSS, and components of variance (the model entered into JMP is a
nested, random factors model). The EMS table contains the
coefficients needed to write the equations setting MSS values equal to
their EMS's. This is further described below.
Variance From the ANOVA table, labelled "Tests wrt to Random Effects" in the
Components JMP output, we can make the following variance component
Estimation calculations:
Chart only Another solution would be to have one chart on the largest source of
most variation. This would mean we would have one set of charts that
important monitor the lot-to-lot variation. From a manufacturing standpoint, this
source of would be unacceptable.
variation
Use boxplot We could create a non-standard chart that would plot all the individual
type chart data values and group them together in a boxplot type format by lot. The
control limits could be generated to monitor the individual data values
while the lot-to-lot variation would be monitored by the patterns of the
groupings. This would take special programming and management
intervention to implement non-standard charts in most floor shop control
systems.
Alternate A commonly applied solution is the first option; have multiple charts on
form for this process. When creating the control limits for the lot means, care
mean must be taken to use the lot-to-lot variation instead of the within lot
control variation. The resulting control charts are: the standard
chart individuals/moving range charts (as seen previously), and a control chart
on the lot means that is different from the previous lot means chart. This
new chart uses the lot-to-lot variation to calculate control limits instead
of the average within-lot standard deviation. The accompanying
standard deviation chart is the same as seen previously.
Mean
control
chart using
lot-to-lot
variation
The control limits labeled with "UCL" and "LCL" are the standard
control limits. The control limits labeled with "UCL: LL" and "LCL:
LL" are based on the lot-to-lot variation.
Click on the links below to start Dataplot and run this case study yourself.
Each step may use results from previous steps, so please be patient. Wait
until the software verifies that the current step is complete before clicking
on the next step. The links in the right-hand column connect you with more
detailed information about each analysis step from the case study description.

1. Numerical summary of WIDTH.          1. The summary shows the mean line width
                                           is 2.53 and the standard deviation
                                           of the line width is 0.69.

4. Subgroup analysis.

7. Generate an analysis of variance.    7. The analysis of variance and
   This is not currently implemented       components of variance calculations
   in DATAPLOT for nested datasets.        show that cassette to cassette
                                           variation is 54% of the total and
                                           site to site variation is 36% of
                                           the total.

8. Generate a mean control chart        8. The mean control chart shows one
   using lot-to-lot variation.             point that is on the boundary of
                                           being out of control.
Data These data were collected from an aerosol mini-spray dryer device. The
Collection purpose of this device is to convert a slurry stream into deposited
particles in a drying chamber. The device injects the slurry at high
speed. The slurry is pulverized as it enters the drying chamber when it
comes into contact with a hot gas stream at low humidity. The liquid
contained in the pulverized slurry particles is vaporized, then
transferred to the hot gas stream leaving behind dried small-sized
particles.
The response variable is particle size, which is collected equidistant in
time. There are a variety of associated variables that may affect the
injection process itself and hence the size and quality of the deposited
particles. For this case study, we restrict our analysis to the response
variable.
Aerosol The data set consists of particle sizes collected over time. The basic
Particle Size distributional properties of this process are of interest in terms of
Dynamic distributional shape, constancy of size, and variation in size. In
Modeling addition, this time series may be examined for autocorrelation structure
and Control to determine a prediction model of particle size as a function of
time--such a model is frequently autoregressive in nature. Such a
high-quality prediction equation would be essential as a first step in
developing a predictor-corrective recursive feedback mechanism which
would serve as the core in developing and implementing real-time
dynamic corrective algorithms. The net effect of such algorithms is, of
course, a particle size distribution that is much less variable, much
more stable in nature, and of much higher quality. All of this results in
final ceramic mold products that are more uniform and predictable
across a wide range of important performance characteristics.
For the purposes of this case study, we restrict the analysis to
determining an appropriate Box-Jenkins model of the particle size.
Case study
data 115.36539
114.63150
114.63150
116.09940
116.34400
116.09940
116.34400
116.83331
116.34400
116.83331
117.32260
117.07800
117.32260
117.32260
117.81200
117.56730
118.30130
117.81200
118.30130
117.81200
118.30130
118.30130
118.54590
118.30130
117.07800
116.09940
118.30130
118.79060
118.05661
118.30130
118.54590
118.30130
118.54590
118.05661
118.30130
118.54590
118.30130
118.30130
118.30130
118.30130
118.05661
118.30130
117.81200
118.30130
117.32260
117.32260
117.56730
117.81200
117.56730
117.81200
117.81200
117.32260
116.34400
116.58870
116.83331
116.58870
116.83331
116.83331
117.32260
116.34400
116.09940
115.61010
115.61010
115.61010
115.36539
115.12080
115.61010
115.85471
115.36539
115.36539
115.36539
115.12080
114.87611
114.87611
115.12080
114.87611
114.87611
114.63150
114.63150
114.14220
114.38680
114.14220
114.63150
114.87611
114.38680
114.87611
114.63150
114.14220
114.14220
113.89750
114.14220
113.89750
113.65289
113.65289
113.40820
113.40820
112.91890
113.40820
112.91890
113.40820
113.89750
113.40820
113.65289
113.89750
113.65289
113.65289
113.89750
113.65289
113.16360
114.14220
114.38680
113.65289
113.89750
113.89750
113.40820
113.65289
113.89750
113.65289
113.65289
114.14220
114.38680
114.63150
115.61010
115.12080
114.63150
114.38680
113.65289
113.40820
113.40820
113.16360
113.16360
113.16360
113.16360
113.16360
112.42960
113.40820
113.40820
113.16360
113.16360
113.16360
113.16360
111.20631
112.67420
112.91890
112.67420
112.91890
113.16360
112.91890
112.67420
112.91890
112.67420
112.91890
113.16360
112.67420
112.67420
112.91890
113.16360
112.67420
112.91890
111.20631
113.40820
112.91890
112.67420
113.16360
113.65289
113.40820
114.14220
114.87611
114.87611
116.09940
116.34400
116.58870
116.09940
116.34400
116.83331
117.07800
117.07800
116.58870
116.83331
116.58870
116.34400
116.83331
116.83331
117.07800
116.58870
116.58870
117.32260
116.83331
118.79060
116.83331
117.07800
116.58870
116.83331
116.34400
116.58870
116.34400
116.34400
116.34400
116.09940
116.09940
116.34400
115.85471
115.85471
115.85471
115.61010
115.61010
115.61010
115.36539
115.12080
115.61010
115.85471
115.12080
115.12080
114.87611
114.87611
114.38680
114.14220
114.14220
114.38680
114.14220
114.38680
114.38680
114.38680
114.38680
114.38680
114.14220
113.89750
114.14220
113.65289
113.16360
112.91890
112.67420
112.42960
112.42960
112.42960
112.18491
112.18491
112.42960
112.18491
112.42960
111.69560
112.42960
112.42960
111.69560
111.94030
112.18491
112.18491
112.18491
111.94030
111.69560
111.94030
111.94030
112.42960
112.18491
112.18491
111.94030
112.18491
112.18491
111.20631
111.69560
111.69560
111.69560
111.94030
111.94030
112.18491
111.69560
112.18491
111.94030
111.69560
112.18491
110.96170
111.69560
111.20631
111.20631
111.45100
110.22771
109.98310
110.22771
110.71700
110.22771
111.20631
111.45100
111.69560
112.18491
112.18491
112.18491
112.42960
112.67420
112.18491
112.42960
112.18491
112.91890
112.18491
112.42960
111.20631
112.42960
112.42960
112.42960
112.42960
113.16360
112.18491
112.91890
112.91890
112.67420
112.42960
112.42960
112.42960
112.91890
113.16360
112.67420
113.16360
112.91890
112.42960
112.67420
112.91890
112.18491
112.91890
113.16360
112.91890
112.91890
112.91890
112.67420
112.42960
112.42960
113.16360
112.91890
112.67420
113.16360
112.91890
113.16360
112.91890
112.67420
112.91890
112.67420
112.91890
112.91890
112.91890
113.16360
112.91890
112.91890
112.18491
112.42960
112.42960
112.18491
112.91890
112.67420
112.42960
112.42960
112.18491
112.42960
112.67420
112.42960
112.42960
112.18491
112.67420
112.42960
112.42960
112.67420
112.42960
112.42960
112.42960
112.67420
112.91890
113.40820
113.40820
113.40820
112.91890
112.67420
112.67420
112.91890
113.65289
113.89750
114.38680
114.87611
114.87611
115.12080
115.61010
115.36539
115.61010
115.85471
116.09940
116.83331
116.34400
116.58870
116.58870
116.34400
116.83331
116.83331
116.83331
117.32260
116.83331
117.32260
117.56730
117.32260
117.07800
117.32260
117.81200
117.81200
117.81200
118.54590
118.05661
118.05661
117.56730
117.32260
117.81200
118.30130
118.05661
118.54590
118.05661
118.30130
118.05661
118.30130
118.30130
118.30130
118.05661
117.81200
117.32260
118.30130
118.30130
117.81200
117.07800
118.05661
117.81200
117.56730
117.32260
117.32260
117.81200
117.32260
117.81200
117.07800
117.32260
116.83331
117.07800
116.83331
116.83331
117.07800
115.12080
116.58870
116.58870
116.34400
115.85471
116.34400
116.34400
115.85471
116.58870
116.34400
115.61010
115.85471
115.61010
115.85471
115.12080
115.61010
115.61010
115.85471
115.61010
115.36539
114.87611
114.87611
114.63150
114.87611
115.12080
114.63150
114.87611
115.12080
114.63150
114.38680
114.38680
114.87611
114.63150
114.63150
114.63150
114.63150
114.63150
114.14220
113.65289
113.65289
113.89750
113.65289
113.40820
113.40820
113.89750
113.89750
113.89750
113.65289
113.65289
113.89750
113.40820
113.40820
113.65289
113.89750
113.89750
114.14220
113.65289
113.40820
113.40820
113.65289
113.40820
114.14220
113.89750
114.14220
113.65289
113.65289
113.65289
113.89750
113.16360
113.16360
113.89750
113.65289
113.16360
113.65289
113.40820
112.91890
113.16360
113.16360
113.40820
113.40820
113.65289
113.16360
113.40820
113.16360
113.16360
112.91890
112.91890
112.91890
113.65289
113.65289
113.16360
112.91890
112.67420
113.16360
112.91890
112.67420
112.91890
112.91890
112.91890
111.20631
112.91890
113.16360
112.42960
112.67420
113.16360
112.42960
112.67420
112.91890
112.67420
111.20631
112.42960
112.67420
112.42960
113.16360
112.91890
112.67420
112.91890
112.42960
112.67420
112.18491
112.91890
112.42960
112.18491
Run Sequence Plot

Interpretation of the Run Sequence Plot
We can make the following conclusions from the run sequence plot.
1. The data show strong and positive autocorrelation.
2. There does not seem to be a significant trend or any obvious seasonal pattern in the data.
The next step is to examine the sample autocorrelations using the
autocorrelation plot.
Autocorrelation Plot
Because the data are so strongly autocorrelated, the series is differenced before a model is identified; the next plots examine the differenced data.

Run Sequence Plot of Differenced Data
Interpretation of the Run Sequence Plot
The run sequence plot of the differenced data shows that the mean of the differenced data is around zero, and that the differenced data are less autocorrelated than the original data.
The next step is to examine the sample autocorrelations of the differenced data.
Autocorrelation Plot of the Differenced Data

Partial Autocorrelation Plot of the Differenced Data
Interpretation of the Partial Autocorrelation Plot of the Differenced Data
The partial autocorrelation plot of the differenced data, with 95% confidence bands, shows that only the partial autocorrelations at the first and second lags are significant. This suggests an AR(2) model for the differenced data.
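As an aside, the identification steps above can be reproduced outside of Dataplot. The following Python sketch (not part of the original analysis) differences the series and draws the run sequence, autocorrelation, and partial autocorrelation plots with statsmodels; the file name raw_series.txt holding the 559 raw observations is an assumption.

    import numpy as np
    import matplotlib.pyplot as plt
    from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

    # Load the raw series; "raw_series.txt" is a hypothetical file name
    # for the observations listed above.
    x = np.loadtxt("raw_series.txt")

    # First difference: Y_t = X_t - X_{t-1}
    y = np.diff(x)

    fig, axes = plt.subplots(2, 2, figsize=(10, 7))
    axes[0, 0].plot(x)
    axes[0, 0].set_title("Run sequence plot, raw data")
    axes[0, 1].plot(y)
    axes[0, 1].set_title("Run sequence plot, differenced data")
    plot_acf(y, lags=25, ax=axes[1, 0], title="ACF of differenced data")
    plot_pacf(y, lags=25, ax=axes[1, 1], title="PACF of differenced data")
    plt.tight_layout()
    plt.show()

If the identification above holds, only the partial autocorrelations at lags 1 and 2 should stand out from the confidence band.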
Dataplot ARMA Output for the AR(2) Model
Based on the differenced data, Dataplot generated the following estimation output for the AR(2) model:

 #############################################################
 # NONLINEAR LEAST SQUARES ESTIMATION FOR THE PARAMETERS OF  #
 # AN ARIMA MODEL USING BACKFORECASTS                        #
 #############################################################

 MODEL SPECIFICATION

   FACTOR   (P    D    Q)    S
        1    2    1    0     1

                                          PARAMETER         STEP SIZE FOR
               PARAMETER DESCRIPTION   STARTING VALUES      APPROXIMATING
 INDEX   TYPE            ORDER  FIXED       (PAR)           DERIVATIVE (STP)

     1   AR (FACTOR 1)     1     NO     0.10000000E+00      0.77167549E-06
     2   AR (FACTOR 1)     2     NO     0.10000000E+00      0.77168311E-06
     3   MU               ###    NO     0.00000000E+00      0.80630875E-06

 MAXIMUM SCALED RELATIVE CHANGE IN THE PARAMETERS (STOPP)                0.1000E-09
 MAXIMUM SCALED RELATIVE CHANGE IN THE RESIDUAL SUM OF SQUARES (STOPSS)  0.1489E-07

 NONDEFAULT VALUES....

 ESTIMATES FROM LEAST SQUARES FIT
                                                        APPROXIMATE 95% CONFIDENCE LIMITS
 PARAMETER     ESTIMATE       (APPROX. ST. DEV.)  T VALUE       LOWER             UPPER

 FACTOR 1
   AR  1    -0.40604575E+00     0.41885445E-01     -9.69   -0.47505616E+00   -0.33703534E+00
   AR  2    -0.16414479E+00     0.41836922E-01     -3.92   -0.23307525E+00   -0.95214321E-01
   MU  ##   -0.52091780E-02     0.11972592E-01     -0.44   -0.24935207E-01    0.14516851E-01
Interpretation of Output
The first section of the output identifies the model and shows the starting values for the fit. This output is primarily useful for verifying that the model and starting values were correctly entered.
The section labeled "ESTIMATES FROM LEAST SQUARES FIT" gives the parameter estimates, standard errors of the estimates, and 95% confidence limits for the parameters. A confidence interval that contains zero indicates that the parameter is not statistically significant and could probably be dropped from the model.
The model for the differenced data, Yt, is an AR(2) model:

    Y_t = -0.4060 Y_{t-1} - 0.1641 Y_{t-2} + a_t

with a residual standard deviation of 0.44 (the estimated mean term is not statistically different from zero and is omitted here).

It is often more convenient to express the model in terms of the original data, Xt, rather than the differenced data. From the definition of the difference, Yt = Xt - Xt-1, we can make the appropriate substitutions into the above equation and collect terms to obtain:

    X_t = 0.594 X_{t-1} + 0.242 X_{t-2} + 0.164 X_{t-3} + a_t
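The original analysis uses Dataplot's backforecasting least squares routine; a rough equivalent in Python, fitting the same ARIMA(2,1,0) structure with statsmodels (maximum likelihood, so the estimates will differ slightly), is sketched below. The file and variable names are assumptions.

    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA

    # x holds the raw (undifferenced) series; "raw_series.txt" is a
    # hypothetical file name.
    x = np.loadtxt("raw_series.txt")

    # ARIMA(2,1,0): first difference the data, then fit an AR(2) model,
    # matching the model identified from the partial autocorrelation plot.
    ar2_fit = ARIMA(x, order=(2, 1, 0)).fit()

    # Parameter estimates, standard errors, and confidence intervals,
    # analogous to the "ESTIMATES FROM LEAST SQUARES FIT" section above.
    print(ar2_fit.summary())
    print(ar2_fit.conf_int())   # 95% confidence limits for each parameter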
Dataplot ARMA Output for the MA(1) Model
Alternatively, based on the differenced data, Dataplot generated the following estimation output for an MA(1) model:

 #############################################################
 # NONLINEAR LEAST SQUARES ESTIMATION FOR THE PARAMETERS OF  #
 # AN ARIMA MODEL USING BACKFORECASTS                        #
 #############################################################

 MODEL SPECIFICATION

   FACTOR   (P    D    Q)    S
        1    0    1    1     1

                                          PARAMETER         STEP SIZE FOR
               PARAMETER DESCRIPTION   STARTING VALUES      APPROXIMATING
 INDEX   TYPE            ORDER  FIXED       (PAR)           DERIVATIVE (STP)

     1   MU               ###    NO     0.00000000E+00      0.20630657E-05
     2   MA (FACTOR 1)     1     NO     0.10000000E+00      0.34498203E-07

 NONDEFAULT VALUES....

 ESTIMATES FROM LEAST SQUARES FIT
                                                        APPROXIMATE 95% CONFIDENCE LIMITS
 PARAMETER     ESTIMATE       (APPROX. ST. DEV.)  T VALUE       LOWER             UPPER

 FACTOR 1
   MU  ##   -0.51160754E-02     0.11431230E-01     -0.45   -0.23950101E-01    0.13717950E-01
   MA  1     0.39275694E+00     0.39028474E-01     10.06    0.32845386E+00    0.45706001E+00
Interpretation of the Output
The fitted model is an ARIMA(0,1,1) model; that is, the differenced data, Yt, follow an MA(1) model:

    Y_t = a_t - 0.3928 a_{t-1}

with a residual standard deviation of 0.44 (the estimated mean term is again not statistically different from zero and is omitted here).
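For completeness, the competing ARIMA(0,1,1) model can be fit the same way. This statsmodels sketch is again only an illustration of the step, not the handbook's Dataplot run; the file name is an assumption.

    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA

    x = np.loadtxt("raw_series.txt")           # hypothetical file with the raw series

    # ARIMA(0,1,1): first difference the data, then fit an MA(1) model.
    ma1_fit = ARIMA(x, order=(0, 1, 1)).fit()
    print(ma1_fit.summary())                   # MA coefficient should be near the 0.39 reported above

Both candidate fits can then be compared through the residual diagnostics described next.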
4-Plot of Residuals from ARIMA(2,1,0) Model
The 4-plot is a convenient graphical technique for model validation in that it tests the assumptions for the residuals on a single graph.
Interpretation of the 4-Plot
We can make the following conclusions based on the above 4-plot.
1. The run sequence plot shows that the residuals do not violate the assumption of constant location and scale. It also shows that most of the residuals are in the range (-1, 1).
2. The lag plot indicates that the residuals are not autocorrelated at lag 1.
3. The histogram and normal probability plot indicate that the normal distribution provides an adequate fit for this model.
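Dataplot's 4-plot combines a run sequence plot, lag plot, histogram, and normal probability plot of the residuals. A rough Python equivalent of that graphic is sketched below (matplotlib/SciPy, not the original Dataplot command; the file and variable names are assumptions).

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy import stats
    from statsmodels.tsa.arima.model import ARIMA

    x = np.loadtxt("raw_series.txt")                     # hypothetical file with the raw series
    resid = ARIMA(x, order=(2, 1, 0)).fit().resid[1:]    # drop the start-up residual from differencing

    fig, ax = plt.subplots(2, 2, figsize=(10, 7))
    ax[0, 0].plot(resid)                                  # run sequence plot: constant location/scale?
    ax[0, 0].set_title("Run sequence plot")
    ax[0, 1].plot(resid[:-1], resid[1:], ".")             # lag plot: structure indicates autocorrelation
    ax[0, 1].set_title("Lag plot")
    ax[1, 0].hist(resid, bins=30)                         # histogram: roughly bell-shaped?
    ax[1, 0].set_title("Histogram")
    stats.probplot(resid, dist="norm", plot=ax[1, 1])     # normal probability plot: roughly linear?
    ax[1, 1].set_title("Normal probability plot")
    plt.tight_layout()
    plt.show()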
Autocorrelation Plot of Residuals from ARIMA(2,1,0) Model
In addition, the autocorrelation plot of the residuals from the ARIMA(2,1,0) model was generated.
Interpretation of the Autocorrelation Plot
The autocorrelation plot shows that for the first 25 lags, all sample autocorrelations except those at lags 7 and 18 fall inside the 95% confidence bounds, indicating that the residuals appear to be random.
Ljung-Box Test for Randomness for the ARIMA(2,1,0) Model
Instead of checking the autocorrelations of the residuals individually, portmanteau tests such as the test proposed by Ljung and Box (1978) can be used. In this example, the Ljung-Box test indicates that the residuals are random at the 95% confidence level and thus the model is appropriate. Dataplot generated the following output for the Ljung-Box test.
LJUNG-BOX TEST FOR RANDOMNESS
1. STATISTICS:
NUMBER OF OBSERVATIONS = 559
LAG TESTED = 24
LAG 1 AUTOCORRELATION = -0.1012441E-02
LAG 2 AUTOCORRELATION = 0.6160716E-02
LAG 3 AUTOCORRELATION = 0.5182213E-02
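The Ljung-Box statistic itself is straightforward to reproduce. The following sketch applies statsmodels' acorr_ljungbox to the ARIMA(2,1,0) residuals; it is an illustration with assumed file and variable names rather than the Dataplot macro.

    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA
    from statsmodels.stats.diagnostic import acorr_ljungbox

    x = np.loadtxt("raw_series.txt")                     # hypothetical file with the raw series
    resid = ARIMA(x, order=(2, 1, 0)).fit().resid[1:]    # residuals, dropping the start-up value

    # Ljung-Box test at 24 lags, as in the Dataplot output above.  The result
    # contains the test statistic and its p-value; a large p-value is
    # consistent with random (uncorrelated) residuals.
    print(acorr_ljungbox(resid, lags=[24]))

statsmodels also accepts a model_df argument that reduces the degrees of freedom by the number of estimated ARMA parameters, a refinement some implementations of the test apply.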
4-Plot of Residuals from ARIMA(0,1,1) Model
The 4-plot is a convenient graphical technique for model validation in that it tests the assumptions for the residuals on a single graph.
Interpretation of the 4-Plot from the ARIMA(0,1,1) Model
We can make the following conclusions based on the above 4-plot.
1. The run sequence plot shows that the residuals do not violate the assumption of constant location and scale. It also shows that most of the residuals are in the range (-1, 1).
2. The lag plot indicates that the residuals are not autocorrelated at lag 1.
3. The histogram and normal probability plot indicate that the normal
distribution provides an adequate fit for this model.
This 4-plot of the residuals indicates that the fitted model is an adequate
model for these data.
Autocorrelation Plot of Residuals from ARIMA(0,1,1) Model
The autocorrelation plot of the residuals from the ARIMA(0,1,1) model was generated.
Interpretation of the Autocorrelation Plot
Similar to the result for the ARIMA(2,1,0) model, the plot shows that for the first 25 lags, all sample autocorrelations except those at lags 7 and 18 fall inside the 95% confidence bounds, indicating that the residuals appear to be random.
Ljung-Box Test for Randomness of the Residuals for the ARIMA(0,1,1) Model
The Ljung and Box test is also applied to the residuals from the ARIMA(0,1,1) model. The test indicates that the residuals are random at the 99% confidence level, but not at the 95% level.
Dataplot generated the following output for the Ljung-Box test.

LJUNG-BOX TEST FOR RANDOMNESS
1. STATISTICS:
NUMBER OF OBSERVATIONS = 559
LAG TESTED = 24
LAG 1 AUTOCORRELATION = -0.1280136E-01
LAG 2 AUTOCORRELATION = -0.3764571E-02
LAG 3 AUTOCORRELATION = 0.7015200E-01
99 % POINT = 42.97982
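The "99 % POINT" in this output is the 0.99 quantile of a chi-square distribution with 24 degrees of freedom (the number of lags tested), against which the test statistic is compared. A one-line check in Python:

    from scipy.stats import chi2

    # 0.99 quantile of a chi-square distribution with 24 degrees of freedom,
    # matching the "99 % POINT = 42.97982" line in the Dataplot output.
    print(chi2.ppf(0.99, 24))   # 42.9798...

Since the residuals pass at the 99% level but not at the 95% level, the test statistic (not shown above) must lie between the 95% point, chi2.ppf(0.95, 24), which is about 36.42, and the 99% point of 42.98.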
6.7. References
Selected References
Army Chemical Corps (1953). Master Sampling Plans for Single, Duplicate, Double and
Multiple Sampling, Manual No. 2.
Bissell, A. F. (1990). "How Reliable is Your Capability Index?", Applied Statistics, 39,
331-340.
Champ, C.W., and Woodall, W.H. (1987). "Exact Results for Shewhart Control Charts
with Supplementary Runs Rules", Technometrics, 29, 393-399.
Duncan, A. J. (1986). Quality Control and Industrial Statistics, 5th ed., Irwin,
Homewood, IL.
Hotelling, H. (1947). Multivariate Quality Control. In C. Eisenhart, M. W. Hastay, and
W. A. Wallis, eds. Techniques of Statistical Analysis. New York: McGraw-Hill.
Juran, J. M. (1997). "Early SQC: A Historical Supplement", Quality Progress, 30(9),
73-81.
Kotz, S. and Johnson, N. L. (1992). Process Capability Indices, Chapman & Hall,
London.
Ljung, G. M. and Box, G. E. P. (1978). "On a Measure of Lack of Fit in Time Series
Models", Biometrika, 65, 297-303.
Lowry, C. A., Woodall, W. H., Champ, C. W., and Rigdon, S. E. (1992). "A Multivariate
Exponentially Weighted Moving Average Chart", Technometrics, 34, 46-53.
Lucas, J. M. and Saccucci, M. S. (1990). "Exponentially Weighted Moving Average
Control Schemes: Properties and Enhancements", Technometrics, 32, 1-29.
Montgomery, D. C. (2000). Introduction to Statistical Quality Control, 4th ed., Wiley,
New York, NY.
Ott, E. R. and Schilling, E. G. (1990). Process Quality Control, 2nd ed., McGraw-Hill,
New York, NY.
Quesenberry, C. P. (1993). "The Effect of Sample Size on Estimated Limits for X-bar and X
Control Charts", Journal of Quality Technology, 25(4), 237-247.
Ryan, T.P. (2000). Statistical Methods for Quality Improvement, 2nd ed., Wiley, New
York, NY.
Ryan, T. P. and Schwertman, N. C. (1997). "Optimal limits for attributes control charts",
Journal of Quality Technology, 29 (1), 86-98.
Schilling, E. G. (1982). Acceptance Sampling in Quality Control, Marcel Dekker, New
York, NY.
Tracy, N. D., Young, J. C. and Mason, R. L. (1992). "Multivariate Control Charts for
Individual Observations", Journal of Quality Technology, 24(2), 88-95.
Woodall, W. H. (1997). "Control Charting Based on Attribute Data: Bibliography and
Review", Journal of Quality Technology, 29, 172-183.
Woodall, W. H. and Adams, B. M. (1993). "The Statistical Design of CUSUM Charts",
Quality Engineering, 5(4), 559-570.
Zhang, Stenback, and Wardrop (1990). "Interval Estimation of the Process Capability
Index", Communications in Statistics: Theory and Methods, 19(21), 4455-4470.
Statistical Analysis
Anderson, T. W. (1984). An Introduction to Multivariate Statistical Analysis, 2nd ed., Wiley,
New York, NY.
Johnson, R. A. and Wichern, D. W. (1998). Applied Multivariate Statistical Analysis,
4th ed., Prentice Hall, Upper Saddle River, NJ.