


194 IEEE TRANSACTIONS ON SEMICONDUCTOR MANUFACTURING, VOL. 23, NO. 2, MAY 2010

Large-Scale Semiconductor Process Fault Detection Using a Fast Pattern Recognition-Based Method
Qinghua Peter He, Member, IEEE, and Jin Wang, Member, IEEE

Manuscript received May 31, 2009; revised November 30, 2009; accepted December 15, 2009. First published January 26, 2010; current version published May 05, 2010. This work was supported by the NSF under Grants CTS-0853748 and CTS-0853983.
Q. P. He is with the Department of Chemical Engineering, Tuskegee University, Tuskegee, AL 36088 USA (e-mail: qhe@tuskegee.edu).
J. Wang is with the Department of Chemical Engineering, Auburn University, Auburn, AL 36849 USA (e-mail: wang@auburn.edu).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TSM.2010.2041289
0894-6507/$26.00 © 2010 IEEE

Abstract—Fault detection and classification (FDC) has been recognized as an integral component of the advanced process control (APC) framework in the semiconductor industry, as it helps to improve overall equipment efficiency (OEE). However, some unique characteristics of semiconductor manufacturing processes have posed challenges for FDC applications, such as nonlinearity in most batch processes, and multimodal batch trajectories due to product mix. To explicitly account for these unique characteristics, a pattern recognition based fault detection method utilizing the k-nearest-neighbor rule (FD-kNN) was previously developed. In FD-kNN, historical data are used directly as the reference of normal process operation to determine whether a new measurement is a fault. Therefore, for processes with a large number of variables, it can be computation and storage intensive, and may be difficult to use for online process monitoring. To address this difficulty, we propose a fast pattern recognition based fault detection method, termed principal component-based kNN (PC-kNN), which takes advantage of both principal component analysis (PCA) for dimensionality reduction and FD-kNN for nonlinearity and multimode handling. Two simulation examples and an industrial example are used to demonstrate the performance of the proposed PC-kNN method in fault detection.

Index Terms—Fault detection, k-nearest neighbor rule, pattern recognition, principal component analysis (PCA).

I. INTRODUCTION

WITH the massive amount of trace or machine data available in today’s semiconductor industry, fault detection has been used to reduce wafer scrap, increase equipment uptime and reduce the usage of test wafers [1]–[8]. Among available fault detection methods, multivariate statistical fault detection methods, such as principal component analysis (PCA) and partial least squares (PLS), have drawn increasing interest recently for semiconductor manufacturing process monitoring [9]–[13]. PCA- and PLS-based methods have been tremendously successful in monitoring continuous processes in the chemical and petrochemical industries. In addition, their application to traditional chemical batch processes has been studied in the last decade [14]–[18]. Although most semiconductor manufacturing processes are batch processes, some unique process characteristics, such as multimodal batch trajectories and nonlinearity, have posed difficulties for fault detection that cannot be readily addressed by these multivariate statistical methods. In our previous work [19], a pattern recognition based fault detection method using the k-nearest-neighbor (kNN) rule [20] (FD-kNN) was developed to explicitly account for the above-mentioned characteristics of semiconductor processes. However, the drawback associated with the FD-kNN method is that it can be computation and storage intensive for large-scale processes, where the number of features or variables can easily exceed thousands after batch unfolding. This drawback may prevent it from being implemented for online process monitoring, especially when thousands or tens of thousands of such FDC models are running concurrently due to high-mix production [21]. For example, by 2006 there were more than 7000 active FDC models at IBM [6] and over 30 000 models at Intel [7]. To reduce computation complexity and storage/memory requirements, in this paper we propose a fast pattern recognition based fault detection method, which is termed the principal component-based kNN method (PC-kNN). The basic idea of the proposed PC-kNN method involves two steps. First, PCA is applied to the original process data to capture key process features; second, FD-kNN is applied to the principal subspace obtained by PCA to perform fault detection. In this way, we can significantly reduce the computation load and data storage requirement, without losing the advantages of the FD-kNN method in handling multimode data and process nonlinearity.

The remainder of the paper is organized as follows. In Section II, we give brief reviews of the relevant methods applied in this study. Section III presents the proposed fault detection method, PC-kNN. Simulation examples to illustrate the fault detection capability of the proposed method are presented in Section IV. Section V compares PCA, FD-kNN, and PC-kNN using an industrial example, and Section VI gives conclusions and some discussion.

II. METHODS

In this section, we briefly review relevant methods: PCA and FD-kNN. Throughout this paper, a scalar is denoted by an italic lower-case character $x$, a vector by a bold lower-case character $\mathbf{x}$, a matrix by a bold upper-case character $\mathbf{X}$, and a three-way array by an underlined bold upper-case character $\underline{\mathbf{X}}$. These notations are consistent with notations used by others in the statistical process control literature (see e.g., [22]).

A. PCA

PCA and its application in process monitoring have been studied extensively, and a brief review is given here. Generally

Authorized licensed use limited to: National Taiwan University. Downloaded on August 03,2010 at 15:11:32 UTC from IEEE Xplore. Restrictions apply.
HE AND WANG: LARGE-SCALE SEMICONDUCTOR PROCESS FAULT DETECTION 195

speaking, the raw data for a batch process is a 3-D array $\underline{\mathbf{X}}$ with unequal dimensions in batch duration. Certain data preprocessing steps are needed to transform the raw data into a two-dimensional matrix before PCA can be applied. Let $\mathbf{X} \in \mathbb{R}^{n \times m}$ denote the process data matrix after preprocessing, with $n$ samples (rows) and $m$ variables (columns). $\mathbf{X}$ is first scaled to zero mean for covariance-based PCA and further to unit variance for correlation-based PCA. By either the NIPALS [23] or a singular value decomposition (SVD) algorithm, the scaled matrix $\mathbf{X}$ is decomposed as follows:

$$\mathbf{X} = \mathbf{T}\mathbf{P}^{T} + \tilde{\mathbf{X}} \qquad (1)$$

where $\mathbf{T} \in \mathbb{R}^{n \times l}$ and $\mathbf{P} \in \mathbb{R}^{m \times l}$ are the score and loading matrices, respectively, and $\tilde{\mathbf{X}}$ is the residual. The PCA projection reduces the original set of $m$ variables to $l$ principal components (PC’s). For fault detection on a new sample vector $\mathbf{x}$, the squared prediction error (SPE) and the Hotelling’s $T^2$ are often used. The SPE statistic indicates how well each sample conforms to the model, measured by the projection of the sample vector on the residual space (RS):

$$\mathrm{SPE} = \|\tilde{\mathbf{x}}\|^{2} = \|(\mathbf{I} - \mathbf{P}\mathbf{P}^{T})\mathbf{x}\|^{2} \qquad (2)$$

The Hotelling’s $T^2$ is a measure of the variation in principal component space (PCS):

$$T^{2} = \mathbf{x}^{T}\mathbf{P}\boldsymbol{\Lambda}^{-1}\mathbf{P}^{T}\mathbf{x} \qquad (3)$$

where $\boldsymbol{\Lambda}$ is the diagonal matrix of the $l$ largest eigenvalues of the sample covariance matrix of $\mathbf{X}$. The $T^2$ statistic forms a hyper-ellipsoid, which represents the joint limits of variations that can be explained by a set of common causes.

The thresholds of SPE and $T^2$ can be determined either theoretically [24], [25] or empirically [18], [26].

B. FD-kNN

The basic idea of the FD-kNN method [4], [19] is that the trajectory of a normal sample is similar to that of normal samples in the training data, while the trajectory of a fault sample must exhibit some deviation from that of normal training samples. In FD-kNN, this idea is implemented through evaluating the kNN distance, which is defined as the average squared distance between a sample and its $k$ nearest neighbors in the training data set. Specifically, the kNN distance of a faulty sample must be greater than that of a normal sample. Therefore, the FD-kNN method consists of two steps: 1) model building, which determines the threshold of the kNN distance for normal samples, and 2) fault detection, which calculates a new sample’s kNN distance and compares it with the predetermined threshold to perform fault detection. The details are as follows:

• Step 1: model building
For each sample in the training data set,
1) Find its $k$ nearest neighbors in the training data set.
2) Calculate the kNN distance as defined by

$$D_{i}^{2} = \frac{1}{k}\sum_{j=1}^{k} d_{ij}^{2} \qquad (4)$$

where $d_{ij}^{2}$ denotes the squared Euclidean distance from sample $i$ to its $j$-th nearest neighbor.
3) Determine the threshold of the kNN distance, $D_{\alpha}^{2}$, for fault detection. Similar to multivariate SPM methods, the threshold can be determined in two ways. One is based on the assumption that $D_{i}^{2}$ follows a noncentral $\chi^{2}$ distribution. The other is through the calibration or use of testing data under normal operation conditions [9], [27].

• Step 2: fault detection
For an incoming unclassified sample $\mathbf{x}$,
1) Identify $\mathbf{x}$’s $k$ nearest neighbors in the training data set.
2) Calculate $\mathbf{x}$’s kNN distance $D_{x}^{2}$ using (4).
3) Compare $D_{x}^{2}$ against the threshold $D_{\alpha}^{2}$. If $D_{x}^{2} \le D_{\alpha}^{2}$, it is classified as a normal sample. Otherwise, it is detected as a fault.

Details on the FD-kNN method, such as the choice of $k$ and determination of $D_{\alpha}^{2}$, can be found in [4], [19].

III. PC-KNN

A disadvantage of the FD-kNN method is that it requires considerable memory resources for large systems, as all of the training data must be retained to compute $D_{x}^{2}$ for each incoming sample $\mathbf{x}$. In addition, the computational complexity is directly related to the dimensionality of the data. Consequently, there is a practical upper limit to both the number of training samples and the number of variables that can be processed if the algorithm is implemented online with thousands or tens of thousands of models running concurrently. Although there are algorithmic techniques that can reduce the computational burden, the implementation of these techniques is nontrivial [20]. To reduce the memory requirement and computation time of the FD-kNN method while still keeping its advantage of handling nonlinear and multimodal data, we propose a fast pattern recognition based fault detection method, termed PC-kNN, which combines the strengths of PCA and FD-kNN. In PC-kNN, we first make use of the dimensionality reduction and information preserving properties of PCA to extract the principal component subspace (PCS) that contains key information of the data set. Then we apply FD-kNN to the PCS for fault detection, which usually has a much lower dimension compared to the original data space. For a training data set $\mathbf{X} \in \mathbb{R}^{n \times m}$ with $n$ samples and $m$ variables, $l$ principal components which span the PCS are extracted by (1). In the model building phase, we apply the FD-kNN algorithm to the score matrix $\mathbf{T}$ corresponding to the features preserved in PCS.

The PC-kNN distance of sample $i$ is defined as

$$D_{i}^{2} = \frac{1}{k}\sum_{j=1}^{k} d_{ij}^{2} \qquad (5)$$

where $d_{ij}^{2}$ denotes the squared Euclidean distance from sample $i$ to its $j$-th nearest neighbor in the PCS, i.e., from $\mathbf{t}_{i}$ to its $j$-th nearest neighbor in $\mathbf{T}$. Similar to [19], the distribution of PC-kNN distances can be approximated by a noncentral $\chi^{2}$ distribution and a threshold $D_{\alpha}^{2}$ can be determined theoretically or


empirically. For a test sample $\mathbf{x}$, its scores $\mathbf{t}$ are obtained by projecting $\mathbf{x}$ onto the loading matrix $\mathbf{P}$ according to $\mathbf{t} = \mathbf{P}^{T}\mathbf{x}$. Its $k$ nearest neighbors in the PCS can be found by computing distances between $\mathbf{t}$ and the rows of $\mathbf{T}$, and its PC-kNN squared distance $D_{x}^{2}$ is calculated using (5). $D_{x}^{2}$ is then compared with the threshold $D_{\alpha}^{2}$ to determine whether $\mathbf{x}$ is a fault. By doing so, the proposed PC-kNN method can significantly reduce the computation time and memory requirement without sacrificing fault detection capability.

Fig. 1. Nonlinear case. (a) The scatter plot of the first two dominant mode variables. (b) The score plot when all variables are projected onto the first two PC’s.

Fig. 2. Fault detection—nonlinear case. (a) SPE chart. (b) $T^2$ chart.

Fig. 3. Fault detection—nonlinear case. (a) The FD-kNN method. (b) The proposed PC-kNN method.
It is worth noting that PC-kNN works in PCS similar to the PCA $T^2$ index except that PC-kNN handles nonlinearity and multimodal distribution in the scores. Therefore, we expect that PC-kNN would perform essentially the same as $T^2$ in the linear, unimodal cases while PC-kNN would outperform $T^2$ in the presence of nonlinearity or multimodal distribution in the scores. Because PC-kNN makes no use of residuals, any fault that occurs in the residual subspace only will not be detected by PC-kNN. However, the fact that a fault cannot be detected by the PCA $T^2$ index does not necessarily mean that the fault occurs in the residual subspace only. This is because the PCA $T^2$ index uses a hyperellipsoid in the PCS to represent the normal process operation region, which can only capture the linear part of the process behavior. Therefore, many faults will go undetected by the PCA $T^2$ index if the process is not a linear process. On the other hand, in the proposed PC-kNN method, we use the scores of all training data in the PCS to represent the normal process operation, which preserves the process nonlinearity or multimodal distribution in the PCS. As PCA extracts the directions that correspond to the largest variations of the data set, generally speaking the dominant features such as nonlinearity or multimodal distribution will be captured by PCS. As illustrated by two simulated examples in Section IV and one industrial case study in Section V, most faults, if not all, will result in changes in PCS.

IV. ILLUSTRATIVE EXAMPLES

In this section, two simulation examples each with 12 variables are given to illustrate how the proposed PC-kNN fault detection method works in the presence of nonlinearity and multimodal distribution. In both cases, it is assumed that the first two variables ($x_1$ and $x_2$) are dominant modes while the others are essentially white noise after subtracting setpoints. The performance of three fault detection methods, i.e., PCA, FD-kNN and PC-kNN, is compared using the two examples.

A. Nonlinear Case

The first simulation example is a nonlinear case with the following dominant process mode:

[…] (6)

where […] is a white noise sequence with variance […]. Variables $x_3$ to $x_{12}$ are white noise sequences with variances ranging from […] to […].

A total of 600 normal runs are simulated, with 500 normal runs for training and 100 normal runs for validation. In addition, 5 faulty runs are simulated. Fig. 1(a) shows the scatter plot of the training, validation and fault samples in the dominant subspace, i.e., the plane spanned by variables $x_1$ and $x_2$. If PCA is applied to the data set with all 12 variables, the score plot of the first two PC’s is shown in Fig. 1(b). By comparing the two figures, we can see that the process dominant mode and the process nonlinearity are preserved during PCA projection.

PCA is first applied to detect the faults in the data set. With the confidence level at 99%, the detection results with 2 PC’s are shown in Fig. 2, where the SPE and $T^2$ indices for training, validation, and fault samples are plotted. Fig. 2 shows that PCA does not perform well and only fault 3 is detected by the $T^2$ index. This result is expected because the PCA model does not capture the nonlinear behavior of the process. By assuming a linear process, the thresholds of SPE and $T^2$ are high in order to cover the normal operation region, which greatly increases the chance of type II errors for nonlinear processes.

Next, FD-kNN is applied to the nonlinear data set. The number of nearest neighbors, $k$, is set to be 3. The detection result is shown in Fig. 3(a). By capturing the process nonlinearity, FD-kNN successfully detects all 5 faults.

Finally we apply PC-kNN to perform fault detection. Specifically, FD-kNN is applied to the scores obtained earlier from


PCA based on 2 PC’s as shown in Fig. 1(b). The number of nearest neighbors is set to 3. The detection result is shown in Fig. 3(b). Note that because PCA effectively removes noise by extracting principal components, the kNN distances of faults in PC-kNN show more distinctive deviation from normal samples compared to kNN distances of faults in FD-kNN.

This example confirms that the PCS captures the dominant process features, whether they are linear or nonlinear. Therefore, PC-kNN is able to handle nonlinearity as long as that nonlinearity is the dominant process feature, so that it is captured in the PCS.

Fig. 4. Multimodal case. (a) The scatter plot of the first two dominant mode variables. (b) The score plot when all variables are projected onto the first two PC’s.

Fig. 5. Fault detection—multimodal case. (a) SPE chart. (b) $T^2$ chart.

Fig. 6. Fault detection—multimodal case. (a) The FD-kNN method. (b) The proposed PC-kNN method.
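The qualitative behavior of the nonlinear example (a fault that the $T^2$ index misses while the kNN distance in the score space flags it) can be reproduced with a small simulation. The sketch below does not use the paper's simulation settings: the parabola $x_2 = x_1^2 + e$, the noise levels, the single hypothetical fault placed inside the bowl of the parabola, and the empirical 99% limits are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, n_pc, k = 500, 12, 2, 3

# Assumed dominant mode (illustrative only): x2 = x1^2 + noise.
x1 = rng.uniform(-2.0, 2.0, n)
x2 = x1 ** 2 + rng.normal(0.0, 0.1, n)
noise = rng.normal(0.0, 0.2, (n, m - 2))   # x3..x12: low-variance white noise
X = np.column_stack([x1, x2, noise])

# Covariance-based PCA via SVD of the mean-centered data.
mean = X.mean(axis=0)
Xc = X - mean
_, s, Vt = np.linalg.svd(Xc, full_matrices=False)
P = Vt[:n_pc].T                   # loadings
T = Xc @ P                        # training scores
lam = s[:n_pc] ** 2 / (n - 1)     # n_pc largest eigenvalues of the sample covariance

def t2(scores):
    """Hotelling's T^2 computed from scores: sum of t_i^2 / lambda_i."""
    return (scores ** 2 / lam).sum(axis=1)

def knn_d2(query, ref, k, skip_self=False):
    """Average squared distance to the k nearest reference points."""
    d2 = ((query[:, None, :] - ref[None, :, :]) ** 2).sum(axis=2)
    if skip_self:
        np.fill_diagonal(d2, np.inf)
    return np.sort(d2, axis=1)[:, :k].mean(axis=1)

# Empirical 99% limits from the training data (assumption).
t2_limit = np.quantile(t2(T), 0.99)
d2_limit = np.quantile(knn_d2(T, T, k, skip_self=True), 0.99)

# Hypothetical fault: inside the bowl of the parabola, far from every normal sample.
fault = np.zeros((1, m))
fault[0, 1] = 3.0
t_fault = (fault - mean) @ P
t2_fault = t2(t_fault)[0]
d2_fault = knn_d2(t_fault, T, k)[0]

print(f"T^2 of fault:          {t2_fault:.2f}  (limit {t2_limit:.2f})")
print(f"kNN distance of fault: {d2_fault:.2f}  (limit {d2_limit:.2f})")
```

The fault sits well inside the ellipse that $T^2$ draws around the scores, so it stays below the $T^2$ limit, yet it has no close neighbors on the curved normal region, so its kNN distance is far above the kNN limit.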

B. Multimodal Case

The second simulation example is a multimodal case where the process is performed on two tools with different process gains and an offset exists between the tools. Again, the process is mainly characterized by two variables ($x_1$ and $x_2$) out of 12 variables:

Tool A: […]
Tool B: […] (7)

where […] and […] are white noise sequences with variance 0.25. Variables $x_3$ to $x_{12}$ are white noise sequences with variances ranging from 0.04 to 1.

Three hundred normal runs are conducted on each tool so that a total of 600 normal data points are collected. Five hundred normal runs are randomly selected from the two tools for training, the remaining 100 normal runs are used for validation, and 5 faults are introduced. The data set, including the training, validation and fault samples, is visualized in Fig. 4(a), in the plane spanned by $x_1$ and $x_2$. If PCA is applied to the data set with all 12 variables, the score plot of the first two PC’s is shown in Fig. 4(b). By comparing the two figures, we can see that the process dominant mode and the multimodal distribution of samples are captured in the PCS.

Fault detection results using the PCA SPE and $T^2$ charts with 2 PC’s are shown in Fig. 5(a) and Fig. 5(b). We see that only faults 2 and 5 are detected by the $T^2$ index, because the bimodally distributed data set violates the unimodal distribution assumption that PCA is based upon.

Next, FD-kNN is applied to the multimodal data set. The number of nearest neighbors, $k$, is set to be 3 as in the previous example. The result is shown in Fig. 6(a), where all faults are detected. PC-kNN is then applied to the scores obtained in PCA based on 2 PC’s with $k = 3$, and the result is shown in Fig. 6(b), where all 5 faults are also successfully detected.

Again, because the PCA model only extracts linear features from process data, the multimodal environment imposes limitations on PCA and makes it less effective. As the bimodal distribution is the dominant feature of the process (which is also a type of nonlinearity), it is captured in the PCS as shown in Fig. 4(b), which is why PC-kNN successfully detects all five faults.

V. INDUSTRIAL EXAMPLES

In this section, a benchmark industrial example is used to compare different fault detection methods. The data set is collected from an Al stack etch process performed on a commercial scale Lam 9600 plasma etch tool at Texas Instruments Inc. [9], [28]. The data consist of 108 normal wafers taken during 3 experiments and 21 wafers with intentionally induced faults taken during the same experiments. Due to a large amount of missing data in two batches, only 107 normal wafers and 20 wafers with faults are used in this study. A more detailed description of the faults can be found in [9]. Similar to [9], only the samples collected from steps 4 and 5 are used in this work, and 19 nonsetpoint process variables are included for fault detection.

As mentioned earlier, there are unique characteristics associated with semiconductor processes. These characteristics are also noted in this data set, as illustrated using two examples (variables EndPt A and TCP Load) in Fig. 7.

• Unequal batch and step duration: Like many other batch processes, in the etch data set, different batches have different durations, and the batch durations range from 95 to 112 sec for the normal batches. In addition, batches of equal length may not follow exactly the same time trajectory, as the duration of each specific step may vary from


batch to batch and time stamps of the step onset are not synchronized. In the etch data set, the duration of step 4, which is one of the main etch steps, varies from 44 to 52 sec.

• Process shift and drift: In the semiconductor industry, process shift and drift are primarily due to the following reasons: preventive and corrective maintenance, differences in the incoming materials, aging of the etcher over a clean cycle, and drift in the process monitoring sensors. For the etch data set, the three experiments were carried out several weeks apart. Due to the process shift and drift, data from different experiments have different means and somewhat different covariance structures [9].

Fig. 7. Process shift and drift exhibited in the data set. (a) Endpoint A. (b) TCP load.

A. Data Preprocessing

Data preprocessing is an important aspect of multivariate statistical analysis and can have a significant impact on the overall sensitivity and robustness of the method [9]. For batch process monitoring, data preprocessing usually includes trajectory shift, trajectory alignment, data filtering, unfolding and scaling. Other preprocessing steps may be needed depending on the specific process. Although all these data preprocessing techniques are powerful for improving the effectiveness of the fault detection method, they are not desirable in an automated manufacturing environment. This is because most of the preprocessing steps (e.g., dynamic time warping (DTW) and filtering) are process specific and require human interaction, which is difficult to automate.

One goal of this work is to maximize the level of automation in fault detection in order to suit industrial applications. Therefore, we compare fault detection methods with minimum data preprocessing in this work. The first step of data preprocessing is to obtain equal length batch records. In this step, the initial 5 sample points are removed to eliminate the possible effect of initial fluctuation in the tool and sensors, and a total of 85 sample points are kept to accommodate shorter batches in the data set. This is done for all training, validation and test data. Once equal length batch records are obtained, the second step is to scale each variable to zero mean and unit variance for each wafer in the training data set and scale the validation and test data accordingly using the mean and variance values obtained from the training data. Note that the above two-step data preprocessing can be done automatically in the production environment. The data are further unfolded batch-wise to obtain a two-dimensional array.

B. Fault Detection

Here, three methods, i.e., PCA, FD-kNN, and PC-kNN, are applied to the preprocessed data sets for fault detection.

When PCA is applied, all training data are used together to estimate the data covariance matrix, making no distinction among data from different experiments/groups. A total of 3 PC’s are selected to build the PCA model based on the criterion that the SPE and $T^2$ values of the validation data are at the same levels as those of the training data. Fig. 8(a) shows the fault detection result based on the SPE index and Fig. 8(b) shows the fault detection result based on the $T^2$ index. Note that fault 12 is detected by SPE but not shown in Fig. 8(a) because its SPE index is several orders of magnitude larger than those of the other samples. The SPE and $T^2$ charts together detect 13 faults out of 20 faults. Detailed fault detection results are listed in Table I. The low fault detection accuracy of PCA in this case can be explained by the multimodal distribution of the batch trajectories due to process shifts. Note that in typical manufacturing environments, process shifts are frequent and inevitable due to a variety of reasons such as preventive maintenance (PM) and part replacement.

Fig. 8. Fault detection results based on PCA. (a) SPE chart. (b) $T^2$ chart.

Next, FD-kNN and PC-kNN are applied to the etch data set. The number of nearest neighbors is set to 3 for both methods and 3 PC’s are used for PC-kNN. The detection results are shown in Fig. 9 and Fig. 10(a). To illustrate where each sample’s neighbors are located, the mapping of the three nearest neighbors for each training sample obtained in PC-kNN is plotted in Fig. 10(b). The x axis is the index for each training sample, and the y axis shows the indices of its three nearest neighbors. Fig. 10(b) shows that wafers are grouped in three clusters, which indicates the same multimodal characteristic shown in Fig. 7. Fig. 10(b) also shows that each wafer finds its three nearest neighbors in its own group, which confirms that the multimodal characteristic is captured in the principal component subspace, and that the PC-kNN method, similar to FD-kNN, naturally handles multimodal distribution by focusing on local neighborhoods. The detailed fault detection results from PCA, FD-kNN, and PC-kNN are shown in Table I, where FD-kNN and PC-kNN each detect 16 faults out of 20, although the faults detected are not exactly the same ones.

The industrial example demonstrates that both FD-kNN and PC-kNN are capable of handling multimodal data without additional data preprocessing. Note that if further data preprocessing is performed to eliminate the multimodal distribution, PCA performs similarly to FD-kNN [19].
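The minimal preprocessing described for the etch data (drop the first 5 sample points, keep 85 points per batch, unfold batch-wise, scale with training statistics) can be sketched as follows. This is an illustration only: the helper names `trim_and_unfold`, `fit_scaler` and `apply_scaler` are hypothetical, the batches are random placeholders rather than the etch data, and autoscaling each unfolded column with the training mean and standard deviation is one possible reading of the scaling step, taken here as an assumption.

```python
import numpy as np

def trim_and_unfold(batches, skip=5, keep=85):
    """Cut unequal-length batches to a common window and unfold batch-wise.

    batches: list of arrays of shape (duration_i, n_vars).
    Returns an array of shape (n_batches, keep * n_vars), one row per wafer.
    """
    rows = []
    for b in batches:
        window = b[skip:skip + keep]        # drop start-up samples, cut to common length
        if window.shape[0] < keep:
            raise ValueError("batch too short for the requested window")
        rows.append(window.reshape(-1))     # time-major flattening of (keep, n_vars)
    return np.vstack(rows)

def fit_scaler(X_train):
    """Column means and standard deviations from the training data only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0, ddof=1)
    std[std == 0] = 1.0                     # guard against constant columns
    return mean, std

def apply_scaler(X, mean, std):
    """Scale any data set (training, validation, test) with the training statistics."""
    return (X - mean) / std

# Placeholder batches: 19 variables, durations between 95 and 112 samples (assumed).
rng = np.random.default_rng(1)
durations = rng.integers(95, 113, size=20)
batches = [rng.normal(size=(d, 19)) for d in durations]

X_train = trim_and_unfold(batches[:15])
X_test = trim_and_unfold(batches[15:])
mean, std = fit_scaler(X_train)
Z_train = apply_scaler(X_train, mean, std)
Z_test = apply_scaler(X_test, mean, std)

print(X_train.shape)   # (15, 1615)
```

With 85 retained sample points and 19 variables, each wafer unfolds into 85 × 19 = 1615 features; keeping 3 principal component scores instead of all 1615 features corresponds to roughly 0.19% of the per-sample storage.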


TABLE I
FAULTS DETECTED BY DIFFERENT METHODS

Fig. 9. Fault detection using FD-kNN.

Fig. 10. PC-kNN. (a) Fault detection. (b) Mapping of the nearest neighbors for $k = 3$.

TABLE II
COMPARISON OF COMPUTATION SPEED

C. Comparison of Storage and Computation

The saving in storage and computation time by PC-kNN compared to FD-kNN depends on the number of features that are studied. The larger the number of features, the bigger the savings in storage and computation. In the TI example, there are 1615 features after unfolding. If only three principal component features are stored, the model storage of PC-kNN is about 0.2% of that of FD-kNN. In terms of computation, for the TI example, PC-kNN detects faults similar to those of FD-kNN but with a computation speed more than two orders of magnitude faster than FD-kNN in model building and fault detection, as shown in Table II. All computations were carried out using a laptop with a dual core processor at 1.2 GHz and 1.5 GB RAM. It is worth noting that the industrial example in this work is relatively small. In the case of thousands of large-scale applications running concurrently, the consideration of storage and computation is more important.

VI. CONCLUSION AND DISCUSSION

In this paper, a fast pattern recognition based fault detection method is proposed. The new method, PC-kNN, explicitly accounts for some unique characteristics of most semiconductor processes, namely nonlinearity and multimodal trajectories. Compared to FD-kNN, a fault detection method that uses the kNN rule in the high-dimensional original variable space, PC-kNN reduces computational complexity and memory requirements by applying the kNN rule to the low-dimensional principal component subspace.

Because the PC-kNN method makes no assumption about the linearity of the scores and it detects abnormality based on local neighborhoods, it naturally handles process nonlinearity and multimodal distributions. In addition, because the impact of noise is reduced by PCA, PC-kNN sometimes outperforms FD-kNN in fault detection. Similar to FD-kNN, the choice of $k$ for PC-kNN is not critical. In general, larger values of $k$ reduce the effect of noise on the fault detection, but make boundaries between normal and fault batches less distinct. A practical approach is to try several different values of $k$ on historical data and choose the one that gives the best cross-validation results. It is worth noting that PC-kNN works in the principal component subspace similar to the PCA $T^2$ index except that PC-kNN handles nonlinearity and multimodal distribution in the scores. Therefore, PC-kNN performs essentially the same as PCA $T^2$ in the linear, unimodal cases while PC-kNN outperforms PCA $T^2$


in the presence of nonlinearity or multimodal distribution in the scores. Because PC-kNN makes no use of residuals, any fault that occurs in the residual subspace only will not be detected by PC-kNN. However, as demonstrated by the three examples in this work, most faults, if not all, will incur changes to the principal component subspace. Some of them were not detected by the PCA $T^2$ index but most of them were detected by PC-kNN. Nevertheless, one possible improvement is to join faults detected by PC-kNN and faults detected by SPE so that the full space is covered in fault detection.

The next logical step following the detection of a fault is fault diagnosis. Although fault diagnosis is not the focus of this paper, PC-kNN does have a fault diagnosis capability similar to the contribution plot of PCA. Like the PCA contribution plot, where the summation of SPE is broken down into each element, the PC-kNN distance can be broken down into elements corresponding to each score variable, and then each element is related back to the original variables through the PCA model.

Given the large numbers of equipment and recipe contexts that are found in today’s semiconductor processing facilities, the idea of monitoring all of these contexts with a single model is very appealing. Since PC-kNN handles multimodal data, one area of our future work is to explore the potential of deploying a single PC-kNN model for the whole process rather than potentially hundreds of context-specific PCA models.

REFERENCES

[1] A. Ison and C. J. Spanos, “Robust fault detection and fault classification of semiconductor manufacturing equipment,” in Proc. Int. Symp. Semicond. Manuf., Tokyo, Japan, Oct. 1996.
[2] Q. P. He, “Novel multivariate fault detection methods using Mahalanobis distance,” in Proc. AEC/APC Symp. XVII, Indian Wells, CA, Sep. 2005.
[3] J. Wang and Q. P. He, “A pattern matching approach for fast disturbance detection and classification using Bayesian statistics,” in Proc.
[13] J. Wong, “Batch PLS analysis and FDC process control of within lot SiON gate oxide thickness variation in sub-nanometer range,” in Proc. AEC/APC Symp. XVIII, Westminster, CO, Sep. 2006.
[14] T. Kourti, P. Nomikos, and J. F. MacGregor, “Analysis, monitoring, and fault diagnosis of batch processes using multi-block and multi-way PLS,” J. Process. Contr., vol. 5, pp. 277–284, 1995.
[15] P. Nomikos and J. MacGregor, “Multi-way partial least squares in monitoring batch processes,” Chemometr. Intell. Lab. Syst., vol. 30, pp. 97–108, 1995.
[16] S. J. Qin, S. Valle-Cervantes, and M. Piovoso, “On unifying multiblock analysis with applications to decentralized process monitoring,” J. Chemometr., vol. 15, pp. 715–742, 2001.
[17] J. A. Westerhuis, T. Kourti, and J. MacGregor, “Comparing alternative approaches for multivariate statistical analysis of batch process data,” J. Chemometr., vol. 13, pp. 397–413, 1999.
[18] Q. P. He, J. Wang, and S. J. Qin, “A new fault diagnosis method using fault directions in Fisher discriminant analysis,” AIChE J., vol. 51, no. 2, pp. 555–571, 2005.
[19] Q. P. He and J. Wang, “Fault detection using k-nearest-neighbor rule for semiconductor manufacturing processes,” IEEE Trans. Semicond. Manuf., vol. 20, no. 4, pp. 345–354, 2007.
[20] R. Duda, P. Hart, and D. Stork, Pattern Classification, 2nd ed. New York: Wiley, 2001.
[21] J. Wang, Q. P. He, and T. F. Edgar, “State estimation in high-mix semiconductor manufacturing,” J. Process Contr., vol. 19, pp. 443–456, 2009.
[22] A. K. Smilde, “Comments on three-way analyses used for batch process data,” J. Chemometr., vol. 15, no. 11, pp. 19–27, 2001.
[23] S. Wold, K. Esbensen, and P. Geladi, “Principal component analysis,” Chemometr. Intell. Lab. Syst., vol. 2, pp. 37–52, 1987.
[24] J. E. Jackson and G. Mudholkar, “Control procedures for residuals associated with principal component analysis,” Technometr., vol. 21, pp. 341–349, 1979.
[25] S. J. Qin, “Statistical process monitoring: Basics and beyond,” J. Chemometr., vol. 17, pp. 480–502, 2003.
[26] L. H. Chiang, E. L. Russell, and R. D. Braatz, Fault Detection and Diagnosis in Industrial Systems. London, U.K.: Springer, 2001.
[27] E. L. Russell, L. H. Chiang, and R. D. Braatz, “Fault detection in industrial processes using canonical variate analysis and dynamic principal component analysis,” Chemometr. Intell. Lab. Syst., vol. 51, pp. 81–93, 2000.
[28] B. M. Wise, Metal Etch Data for Fault Detection Evaluation, 1999. [Online]. Available: http://software.eigenvector.com/Data/Etch/index.html
AEC/APC Symp. XVII, Indian Wells, CA, Sep. 2005.
[4] Q. P. He and J. Wang, “A multivariate fault detection method using Qinghua Peter He (S’01–M’05) received the B.S.
k-nearest-neighbor rule,” in Proc. AEC/APC Symp. XVIII, Westminster, degree in chemical engineering from Tsinghua
CO, 2006. University, Beijing, China, in 1996, and the M.S.
[5] J. Wang and Q. P. He, “A Bayesian approach for disturbance detection and Ph.D. degrees in chemical engineering, in 2002
and classification and its application to state estimation in run-to-run and 2005, respectively, from the University of Texas,
control,” IEEE Trans. Semicond. Manuf., vol. 20, no. 2, pp. 126–136, Austin.
2007. He is currently an Assistant Professor at Tuskegee
[6] T. Adamson, G. Moore, M. Passow, J. Wong, and Y. Xu, “Strategies for University, Tuskegee, AL. His research interests are
successfully implementing fab-wide FDC methodologies in semicon- in the general areas of process modeling, monitoring,
ductor manufacturing,” in Proc. AEC/APC Symp. XVIII, Westminster, optimization and control, with special interests in the
CO, Sep. 2006. modeling and optimization, fault detection and clas-
[7] T. Moore, B. Harner, G. Kestner, C. Baab, and J. Stanchfield, “Intel’s sification of batch processes such as semiconductor manufacturing and pharma-
FDC proliferation in 300 mm HVM: Progress and lessons learned,” in ceutical processes. He is also interested in molecular dynamic simulation and
Proc. AEC/APC Symp. XVIII, Westminster, CO, Sep. 2006. Monte Carlo simulation of micro/nanoelectronic and biological systems. He has
[8] C. A. Bode, J. Wang, Q. P. He, and T. F. Edgar, “Run-to-run control and more than three years of experience in semiconductor and chemical industries.
state estimation in semiconductor manufacturing,” Ann. Rev. Contr. ,
vol. 31, no. 2, pp. 241–253, 2007.
[9] B. M. Wise, N. B. Gallagher, S. W. Butler, D. White, and G. G.
Barna, “A comparison of principal component analysis, multiway Jin Wang (S’01–M’04) received the B.S. degree in
principal component analysis, trilinear decomposition and parallel chemical engineering from Tsinghua University, Bei-
factor analysis for fault detection in a semiconductor etch process,” J. jing, China, in 1994. She received the M.S. and Ph.D.
Chemometr., vol. 13, pp. 379–396, 1999. degrees in chemical engineering from the University
[10] B. M. Wise, N. B. Gallagher, and E. B. Martin, “Application of of Texas, t Austin, in 2001 and 2004, respectively.
PARAFAC2 to fault detection and diagnosis in semiconductor etch,” From 2002 to 2006, she was a Process De-
J. Chemometr., vol. 15, pp. 285–298, 2001. velopment Engineer and later a Senior Process
[11] G. Cherry, R. Good, and S. J. Qin, “Semiconductor process monitoring Development Engineer with Advanced Micro De-
and fault detection with recursive multiway PCA based on a combined vices, Inc. Since 2006 she has been with Auburn
index,” in Proc. AEC/APC Symp. XIV, Salt Lake City, UT, Sep. 2002. University, Auburn, AL, as an Assistant Professor.
[12] H. H. Yue and M. Tomoyasu, “Weighted principal component analysis Her research interests include system identification,
and its applications to improve FDC performance,” in Proc. 43rd IEEE semiconductor process modeling and control, fault detection and classification,
Conf. Decision Contr., Atlantis, Paradise Island, Bahamas, Dec. 2004, control performance monitoring, and systems biology. She holds ten U.S.
pp. 4262–4267. patents.
