Large-Scale Semiconductor Process Fault Detection Using A Fast Pattern Recognition-Based Method
IEEE TRANSACTIONS ON SEMICONDUCTOR MANUFACTURING, VOL. 23, NO. 2, MAY 2010
Abstract—Fault detection and classification (FDC) has been recognized as an integral component of the advanced process control (APC) framework in the semiconductor industry, as it helps to improve overall equipment efficiency (OEE). However, some unique characteristics of semiconductor manufacturing processes have posed challenges for FDC applications, such as nonlinearity in most batch processes, and multimodal batch trajectories due to product mix. To explicitly account for these unique characteristics, a pattern recognition based fault detection method utilizing the k-nearest-neighbor rule (FD-kNN) was previously developed. In FD-kNN, historical data are used directly as the reference of normal process operation to determine whether a new measurement is a fault. Therefore, for processes with a large number of variables, it can be computation and storage intensive, and may be difficult to use for online process monitoring. To address this difficulty, we propose a fast pattern recognition based fault detection method, termed principal component-based kNN (PC-kNN), which takes advantage of both principal component analysis (PCA) for dimensionality reduction and FD-kNN for nonlinearity and multimode handling. Two simulation examples and an industrial example are used to demonstrate the performance of the proposed PC-kNN method in fault detection.
Index Terms—Fault detection, k-nearest neighbor rule, pattern recognition, principal component analysis (PCA).
Manuscript received May 31, 2009; revised November 30, 2009; accepted December 15, 2009. First published January 26, 2010; current version published May 05, 2010. This work was supported by the NSF under Grants CTS-0853748 and CTS-0853983.
Q. P. He is with the Department of Chemical Engineering, Tuskegee University, Tuskegee, AL 36088 USA (e-mail: qhe@tuskegee.edu).
J. Wang is with the Department of Chemical Engineering, Auburn University, Auburn, AL 36849 USA (e-mail: wang@auburn.edu).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TSM.2010.2041289
0894-6507/$26.00 © 2010 IEEE
I. INTRODUCTION
WITH the massive amount of trace or machine data available in today’s semiconductor industry, fault detection has been used to reduce wafer scrap, increase equipment uptime and reduce the usage of test wafers [1]–[8]. Among available fault detection methods, multivariate statistical fault detection methods, such as principal component analysis (PCA) and partial least squares (PLS), have drawn increasing interest recently for semiconductor manufacturing process monitoring [9]–[13]. PCA- and PLS-based methods have been tremendously successful in monitoring continuous processes in the chemical and petrochemical industries. In addition, their application to traditional chemical batch processes has been studied in the last decade [14]–[18]. Although most semiconductor manufacturing processes are batch processes, some unique process characteristics, such as multimodal batch trajectories and nonlinearity, have posed difficulties for fault detection that cannot be readily addressed by these multivariate statistical methods. In our previous work [19], a pattern recognition based fault detection method using the k-nearest-neighbor (kNN) rule [20] (FD-kNN) was developed to explicitly account for the above-mentioned characteristics of semiconductor processes. However, the drawback associated with the FD-kNN method is that it can be computation and storage intensive for large-scale processes, where the number of features or variables can easily exceed thousands after batch unfolding. This drawback may prevent it from being implemented for online process monitoring, especially when thousands or tens of thousands of such FDC models are running concurrently due to high-mix production [21]. For example, by 2006 there were more than 7000 active FDC models at IBM [6] and over 30 000 models at Intel [7]. To reduce computation complexity and storage/memory requirements, in this paper we propose a fast pattern recognition based fault detection method, which is termed the principal component-based kNN method (PC-kNN). The basic idea of the proposed PC-kNN method involves two steps. In the first step, PCA is applied to the original process data to capture key process features; in the second step, FD-kNN is applied to the principal subspace obtained by PCA to perform fault detection. In this way, we can significantly reduce the computation load and data storage requirement, without losing the advantages of the FD-kNN method in handling multimode data and process nonlinearity.
The remainder of the paper is organized as follows. In Section II, we give brief reviews of the relevant methods applied in this study. Section III presents the proposed fault detection method, PC-kNN. Simulation examples to illustrate the fault detection capability of the proposed method are presented in Section IV. Section V compares PCA, FD-kNN, and PC-kNN using an industrial example, and Section VI gives conclusions and some discussions.
II. METHODS
In this section, we briefly review relevant methods: PCA and FD-kNN. Throughout this paper, a scalar is denoted by an italic lower-case character $x$, a vector by a bold lower-case character $\mathbf{x}$, a matrix by a bold upper-case character $\mathbf{X}$ and a three-way array by an underlined bold upper-case character $\underline{\mathbf{X}}$. These notations are consistent with notations used by others in the statistical process control literature (see e.g., [22]).
A. PCA
PCA and its application in process monitoring have been studied extensively, and a brief review is given here. Generally
speaking, the raw data for a batch process is a 3-D array $\underline{\mathbf{X}}$ with unequal dimensions in batch duration. Certain data preprocessing steps are needed to transform the raw data into a two-dimensional matrix before PCA can be applied. Let $\mathbf{X} \in \mathbb{R}^{n \times m}$ denote the process data matrix after preprocessing, with $n$ samples (rows) and $m$ variables (columns). $\mathbf{X}$ is first scaled to zero mean for covariance-based PCA and further to unit variance for correlation-based PCA. By either the NIPALS [23] or a singular value decomposition (SVD) algorithm, the scaled matrix $\mathbf{X}$ is decomposed as follows:
$$\mathbf{X} = \mathbf{T}\mathbf{P}^{T} + \tilde{\mathbf{X}} \qquad (1)$$
where $\mathbf{T} \in \mathbb{R}^{n \times l}$ and $\mathbf{P} \in \mathbb{R}^{m \times l}$ are the score and loading matrices, respectively, and $\tilde{\mathbf{X}}$ is the residual matrix. The PCA projection reduces the original set of $m$ variables to $l$ principal components (PC’s). For fault detection on a new sample vector $\mathbf{x}$, the squared prediction error (SPE) and the Hotelling’s $T^2$ are often used. The SPE statistic indicates how well each sample conforms to the model, measured by the projection of the sample vector on the residual space (RS):
$$\mathrm{SPE} = \|(\mathbf{I} - \mathbf{P}\mathbf{P}^{T})\mathbf{x}\|^{2} \qquad (2)$$
The Hotelling’s $T^2$ is a measure of the variation in principal component space (PCS):
$$T^{2} = \mathbf{x}^{T}\mathbf{P}\boldsymbol{\Lambda}^{-1}\mathbf{P}^{T}\mathbf{x} \qquad (3)$$
where $\boldsymbol{\Lambda}$ is the diagonal matrix of the $l$ largest eigenvalues of the covariance matrix of the scaled $\mathbf{X}$. The $T^2$ statistic forms a hyper-ellipsoid, which represents the joint limits of variations that can be explained by a set of common causes.
The thresholds of SPE and $T^2$ can be determined either theoretically [24], [25] or empirically [18], [26].
B. FD-kNN
The basic idea of the FD-kNN method [4], [19] is that the trajectory of a normal sample is similar to that of normal samples in the training data, while the trajectory of a fault sample must exhibit some deviation from that of normal training samples. In FD-kNN, this idea is implemented through evaluating the kNN distance, which is defined by the average squared distance between a sample and its $k$ nearest neighbors in the training data set. Specifically, the kNN distance of a faulty sample must be greater than that of a normal sample. Therefore, the FD-kNN method consists of two steps: 1) model building, which determines the threshold of the kNN distance for normal samples, and 2) fault detection, which calculates a new sample’s kNN distance and compares it with the predetermined threshold to perform fault detection. The details are as follows:
• Step 1: model building
For each sample in the training data set,
1) Find its $k$ nearest neighbors in the training data set.
2) Calculate the kNN distance as defined by
$$D_{i}^{2} = \frac{1}{k}\sum_{j=1}^{k} d_{ij}^{2} \qquad (4)$$
where $d_{ij}^{2}$ denotes squared Euclidean distance from sample $i$ to its $j$-th nearest neighbor.
3) Determine the threshold of kNN distance, $D_{\alpha}^{2}$, for fault detection.
Similar to multivariate SPM methods, the threshold $D_{\alpha}^{2}$ can be determined in two ways. One is based on the assumption that $D_{i}^{2}$ follows a noncentral $\chi^2$ distribution. The other is through the calibration or use of testing data under normal operation conditions [9], [27].
• Step 2: fault detection
For an incoming unclassified sample $\mathbf{x}$,
1) Identify $\mathbf{x}$’s $k$ nearest neighbors in the training data set.
2) Calculate $\mathbf{x}$’s kNN distance $D_{x}^{2}$ using (4).
3) Compare $D_{x}^{2}$ against the threshold $D_{\alpha}^{2}$. If $D_{x}^{2} \le D_{\alpha}^{2}$, it is classified as a normal sample. Otherwise, it is detected as a fault.
Details on the FD-kNN method, such as the choice of $k$ and determination of $D_{\alpha}^{2}$, can be found in [4], [19].
III. PC-KNN
A disadvantage of the FD-kNN method is that it requires considerable memory resources for large systems, as all of the training data must be retained to compute $D_{x}^{2}$ for each incoming sample $\mathbf{x}$. In addition, the computational complexity is directly related to the dimensionality of the data. Consequently, there is a practical upper limit to both the number of training samples and the number of variables that can be processed if the algorithm is implemented online with thousands or tens of thousands of models running concurrently. Although there are algorithmic techniques that can reduce the computational burden, the implementation of these techniques is nontrivial [20]. To reduce the memory requirement and computation time of the FD-kNN method while still keeping its advantage of handling nonlinear and multimodal data, we propose a fast pattern recognition based fault detection method, termed PC-kNN, which combines the strengths of PCA and FD-kNN. In PC-kNN, we first make use of the dimensionality reduction and information preserving properties of PCA to extract the principal component subspace (PCS) that contains key information of the data set. Then we apply FD-kNN to the PCS, which usually has a much lower dimension than the original data space, for fault detection. For a training data set $\mathbf{X} \in \mathbb{R}^{n \times m}$ with $n$ samples and $m$ variables, $l$ principal components which span the PCS are extracted by (1). In the model building phase, we apply the FD-kNN algorithm to the score matrix $\mathbf{T}$ corresponding to the features preserved in PCS.
The PC-kNN distance of sample $i$ is defined as
$$D_{\mathrm{PC},i}^{2} = \frac{1}{k}\sum_{j=1}^{k} d_{\mathrm{PC},ij}^{2} \qquad (5)$$
where $d_{\mathrm{PC},ij}^{2}$ denotes squared Euclidean distance from sample $i$ to its $j$-th nearest neighbor in the PCS, i.e., from $\mathbf{t}_{i}$ to its $j$-th nearest neighbor in $\mathbf{T}$. Similar to [19], the distribution of PC-kNN distances can be approximated by a noncentral $\chi^2$ distribution and a threshold $D_{\mathrm{PC},\alpha}^{2}$ can be determined theoretically or
196 IEEE TRANSACTIONS ON SEMICONDUCTOR MANUFACTURING, VOL. 23, NO. 2, MAY 2010
Fig. 1. Nonlinear case. (a) The scatter plot of the first two dominant mode variables. (b) The score plot when all variables are projected onto the first two PC’s.
Fig. 2. Fault detection—nonlinear case. (a) SPE chart. (b) $T^2$ chart.
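The two phases of PC-kNN described in Section III (model building on the PCA scores, then fault detection on a projected new sample) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names `fit_pc_knn` and `pc_knn_distance`, the synthetic data, the use of correlation-based scaling, a brute-force neighbor search, and an empirical 99% quantile as the threshold.

```python
import numpy as np

def fit_pc_knn(X, n_pc, k=3, alpha=0.99):
    """Model building: PCA by SVD, then kNN distances of the training scores."""
    mean, std = X.mean(axis=0), X.std(axis=0, ddof=1)
    Xs = (X - mean) / std                         # unit variance: correlation-based PCA
    _, _, Vt = np.linalg.svd(Xs, full_matrices=False)
    P = Vt[:n_pc].T                               # loadings, shape (m, l)
    T = Xs @ P                                    # scores, shape (n, l)
    # Brute-force squared Euclidean distances between all pairs of scores.
    d2 = ((T[:, None, :] - T[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)                  # a sample is not its own neighbor
    D2 = np.sort(d2, axis=1)[:, :k].mean(axis=1)  # average squared distance to k neighbors
    limit = np.quantile(D2, alpha)                # empirical threshold (assumed choice)
    return {"mean": mean, "std": std, "P": P, "T": T, "k": k, "limit": limit}

def pc_knn_distance(model, x):
    """Fault detection: project a new sample and compute its kNN distance in the PCS."""
    t = ((x - model["mean"]) / model["std"]) @ model["P"]
    d2 = ((model["T"] - t) ** 2).sum(axis=1)
    return float(np.sort(d2)[: model["k"]].mean())

# Hypothetical data: 200 samples, 10 variables driven by 2 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 10)) + 0.1 * rng.normal(size=(200, 10))
model = fit_pc_knn(X, n_pc=2)

x_normal = X.mean(axis=0)                         # center of the training cloud
x_fault = model["mean"] + model["std"] * 50.0 * model["P"][:, 0]  # far along the first PC
for name, x in [("normal", x_normal), ("fault", x_fault)]:
    d = pc_knn_distance(model, x)
    print(name, d, "fault" if d > model["limit"] else "normal")
```

Only the score matrix, the loadings and the scaling vectors are kept in the model, which is the storage saving the method aims for. A fault that lies purely in the residual subspace would project close to the normal scores and would not be flagged by this sketch.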
Fig. 4. Multimodal case. (a) The scatter plot of the first two dominant mode variables. (b) The score plot when all variables are projected onto the first two PC’s.
Fig. 5. Fault detection—multimodal case. (a) SPE chart. (b) $T^2$ chart.
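The kind of behavior shown for the multimodal case can be reproduced qualitatively with a small Python sketch. The tool gains (1.0 and 2.0), the offset (30), the position of the fault and the empirical 99% limits are illustrative assumptions, not the values used in the paper, and all 600 simulated runs are used for training to keep the sketch short. The sketch places a hypothetical fault midway between the two tool clusters and compares Hotelling’s $T^2$ with the kNN distance computed in the two-PC score space.

```python
import numpy as np

rng = np.random.default_rng(1)
n_per_tool, n_noise = 300, 10

def simulate_tool(gain, offset):
    """Runs from one tool: 2 dominant variables plus 10 white-noise variables."""
    u = rng.normal(0.0, 1.0, n_per_tool)
    x1 = u + rng.normal(0.0, 0.5, n_per_tool)              # noise variance 0.25
    x2 = gain * u + offset + rng.normal(0.0, 0.5, n_per_tool)
    noise_sd = np.sqrt(np.linspace(0.04, 1.0, n_noise))    # variances from 0.04 to 1
    noise = rng.normal(0.0, 1.0, (n_per_tool, n_noise)) * noise_sd
    return np.column_stack([x1, x2, noise])

# Assumed gains and offset for the two tools (hypothetical values).
X = np.vstack([simulate_tool(1.0, 0.0), simulate_tool(2.0, 30.0)])
mean = X.mean(axis=0)
Xc = X - mean                                              # covariance-based PCA
_, s, Vt = np.linalg.svd(Xc, full_matrices=False)
P = Vt[:2].T                                               # loadings of the first 2 PCs
eigvals = s[:2] ** 2 / (len(X) - 1)                        # variances of the 2 scores
T = Xc @ P

def t2(t):
    """Hotelling's T^2 of a score vector."""
    return float(np.sum(t ** 2 / eigvals))

def knn_d2(t, scores, k=3, skip_self=False):
    """Average squared distance from t to its k nearest neighbors in the scores."""
    d2 = np.sort(((scores - t) ** 2).sum(axis=1))
    start = 1 if skip_self else 0                          # drop the zero self-distance
    return float(d2[start:start + k].mean())

t2_limit = np.quantile([t2(t) for t in T], 0.99)
knn_limit = np.quantile([knn_d2(t, T, skip_self=True) for t in T], 0.99)

x_fault = np.zeros(12)
x_fault[1] = 15.0                                          # midway between the two clusters
t_fault = (x_fault - mean) @ P
print("T2 flags fault:    ", t2(t_fault) > t2_limit)
print("PC-kNN flags fault:", knn_d2(t_fault, T) > knn_limit)
```

Because the fault sits near the overall mean of the bimodal data, its $T^2$ value is small, whereas its distance to the nearest normal scores is large. This is one simple way to see why a neighbor-based index can succeed where an index built on a unimodal assumption does not.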
B. Multimodal Case
The second simulation example is a multimodal case where the process is performed on two tools with different process gains and an offset exists between the tools. Again, the process is mainly characterized by two variables out of 12 variables:
Tool A: [equation not recoverable]
Tool B: [equation not recoverable] (7)
where the two noise terms are white noise sequences with variance 0.25. The remaining ten variables are white noise sequences with variance ranging from 0.04 to 1.
Three hundred normal runs are conducted on each tool, so that a total of 600 normal data points are collected. Five hundred normal runs are randomly selected from the two tools for training, the remaining 100 normal runs are used for validation, and 5 faults are introduced. The data set, including the training, validation and fault samples, is visualized in Fig. 4(a), in the plane spanned by the two dominant variables. If PCA is applied to the data set with all 12 variables, the score plot of the first two PC’s is shown in Fig. 4(b). By comparing the two figures, we can see that the process dominant mode and the multimodal distribution of samples are captured in the PCS.
Fault detection results using the PCA SPE and $T^2$ charts with 2 PC’s are shown in Fig. 5(a) and Fig. 5(b). We see that only faults 2 and 5 are detected by the $T^2$ index, because the bimodally distributed data set violates the unimodal distribution assumption that PCA is based upon.
Next, FD-kNN is applied to the multimodal data set. The number of nearest neighbors, $k$, is set to be 3 as in the previous example. The result is shown in Fig. 6(a), where all faults are detected. PC-kNN is then applied to the scores obtained in PCA based on 2 PC’s, and the result is shown in Fig. 6(b), where all 5 faults are also successfully detected.
Again, because the PCA model only extracts linear features from process data, the multimodal environment imposes limitations on PCA and makes it less effective. As the bimodal distribution is the dominant feature of the process (which is also a type of nonlinearity), it is captured in the PCS as shown in Fig. 4(b), which is why PC-kNN successfully detects all five faults.
V. INDUSTRIAL EXAMPLES
In this section, a benchmark industrial example is used to compare different fault detection methods. The data set is collected from an Al stack etch process performed on a commercial scale Lam 9600 plasma etch tool at Texas Instruments Inc. [9], [28]. The data consists of 108 normal wafers taken during 3 experiments and 21 wafers with intentionally induced faults taken during the same experiments. Due to a large amount of missing data in two batches, only 107 normal wafers and 20 wafers with faults are used in this study. A more detailed description of the faults can be found in [9]. Similar to [9], only the samples collected from steps 4 and 5 are used in this work, and 19 nonsetpoint process variables are included for fault detection.
As mentioned above, there are unique characteristics associated with semiconductor processes. These characteristics are also noted in this data set, as illustrated using two examples (variables EndPt A and TCP Load) in Fig. 7.
• Unequal batch and step duration: Like many other batch processes, in the etch data set, different batches have different durations, and the batch durations range from 95 to 112 s for the normal batches. In addition, batches of equal length may not follow exactly the same time trajectory, as the duration of each specific step may vary from
TABLE I
FAULT DETECTED BY DIFFERENT METHODS
Fig. 10. PC-kNN. (a) Fault detection. (b) Mapping of the nearest neighbors for $k = 3$.
TABLE II
COMPARISON OF COMPUTATION SPEED
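As a rough complement to the speed comparison, the storage and per-sample cost of the two kNN-based methods can be estimated under the assumption of a brute-force neighbor search. The search strategy is an assumption of this sketch, as are the symbols: $n$ training samples, $m$ variables and $l$ retained PC’s.

```latex
\begin{aligned}
\text{FD-kNN:}\quad & \text{storage} \approx nm, &
  \text{cost per new sample} &= O(nm),\\
\text{PC-kNN:}\quad & \text{storage} \approx nl + ml, &
  \text{cost per new sample} &= O(ml + nl),\\
\text{ratio:}\quad & \frac{nm}{l\,(n+m)} = \frac{1}{\,l/m + l/n\,}. &&
\end{aligned}
```

For FD-kNN the full training matrix is kept and each of the $n$ distances is computed in $m$ dimensions. For PC-kNN only the scores ($nl$) and loadings ($ml$) are kept, and a new sample costs one projection ($ml$) plus $n$ distances in $l$ dimensions. The reduction is therefore large whenever $l$ is much smaller than both $m$ and $n$; measured timings will also depend on the implementation.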
in the presence of nonlinearity or multimodal distribution in the scores. Because PC-kNN makes no use of residuals, any fault that occurs in the residual subspace only will not be detected by PC-kNN. However, as demonstrated by the three examples in this work, most faults, if not all, will incur changes to the principal component subspace. Some of them were not detected by the PCA $T^2$ index but most of them were detected by PC-kNN. Nevertheless, one possible improvement is to join faults detected by PC-kNN and faults detected by SPE so that the full space is covered in fault detection.
The next logical step following the detection of a fault is fault diagnosis. Although fault diagnosis is not the focus of this paper, PC-kNN does have a fault diagnosis capability similar to the contribution plot of PCA. Like the PCA contribution plot, where the summation of SPE is broken down into each element, the PC-kNN distance can be broken down into elements corresponding to each score variable, and then each element is related back to the original variables through the PCA model.
Given the large numbers of equipment and recipe contexts that are found in today’s semiconductor processing facilities, the idea of monitoring all of these contexts with a single model is very appealing. Since PC-kNN handles multimodal data, one area of our future work is to explore the potential of deploying a single PC-kNN model for the whole process rather than potentially hundreds of context-specific PCA models.
REFERENCES
[1] A. Ison and C. J. Spanos, “Robust fault detection and fault classification of semiconductor manufacturing equipment,” in Proc. Int. Symp. Semicond. Manuf., Tokyo, Japan, Oct. 1996.
[2] Q. P. He, “Novel multivariate fault detection methods using Mahalanobis distance,” in Proc. AEC/APC Symp. XVII, Indian Wells, CA, Sep. 2005.
[3] J. Wang and Q. P. He, “A pattern matching approach for fast disturbance detection and classification using Bayesian statistics,” in Proc. AEC/APC Symp. XVII, Indian Wells, CA, Sep. 2005.
[4] Q. P. He and J. Wang, “A multivariate fault detection method using k-nearest-neighbor rule,” in Proc. AEC/APC Symp. XVIII, Westminster, CO, 2006.
[5] J. Wang and Q. P. He, “A Bayesian approach for disturbance detection and classification and its application to state estimation in run-to-run control,” IEEE Trans. Semicond. Manuf., vol. 20, no. 2, pp. 126–136, 2007.
[6] T. Adamson, G. Moore, M. Passow, J. Wong, and Y. Xu, “Strategies for successfully implementing fab-wide FDC methodologies in semiconductor manufacturing,” in Proc. AEC/APC Symp. XVIII, Westminster, CO, Sep. 2006.
[7] T. Moore, B. Harner, G. Kestner, C. Baab, and J. Stanchfield, “Intel’s FDC proliferation in 300 mm HVM: Progress and lessons learned,” in Proc. AEC/APC Symp. XVIII, Westminster, CO, Sep. 2006.
[8] C. A. Bode, J. Wang, Q. P. He, and T. F. Edgar, “Run-to-run control and state estimation in semiconductor manufacturing,” Ann. Rev. Contr., vol. 31, no. 2, pp. 241–253, 2007.
[9] B. M. Wise, N. B. Gallagher, S. W. Butler, D. White, and G. G. Barna, “A comparison of principal component analysis, multiway principal component analysis, trilinear decomposition and parallel factor analysis for fault detection in a semiconductor etch process,” J. Chemometr., vol. 13, pp. 379–396, 1999.
[10] B. M. Wise, N. B. Gallagher, and E. B. Martin, “Application of PARAFAC2 to fault detection and diagnosis in semiconductor etch,” J. Chemometr., vol. 15, pp. 285–298, 2001.
[11] G. Cherry, R. Good, and S. J. Qin, “Semiconductor process monitoring and fault detection with recursive multiway PCA based on a combined index,” in Proc. AEC/APC Symp. XIV, Salt Lake City, UT, Sep. 2002.
[12] H. H. Yue and M. Tomoyasu, “Weighted principal component analysis and its applications to improve FDC performance,” in Proc. 43rd IEEE Conf. Decision Contr., Atlantis, Paradise Island, Bahamas, Dec. 2004, pp. 4262–4267.
[13] J. Wong, “Batch PLS analysis and FDC process control of within lot SiON gate oxide thickness variation in sub-nanometer range,” in Proc. AEC/APC Symp. XVIII, Westminster, CO, Sep. 2006.
[14] T. Kourti, P. Nomikos, and J. F. MacGregor, “Analysis, monitoring, and fault diagnosis of batch processes using multi-block and multi-way PLS,” J. Process. Contr., vol. 5, pp. 277–284, 1995.
[15] P. Nomikos and J. MacGregor, “Multi-way partial least squares in monitoring batch processes,” Chemometr. Intell. Lab. Syst., vol. 30, pp. 97–108, 1995.
[16] S. J. Qin, S. Valle-Cervantes, and M. Piovoso, “On unifying multi-block analysis with applications to decentralized process monitoring,” J. Chemometr., vol. 15, pp. 715–742, 2001.
[17] J. A. Westerhuis, T. Kourti, and J. MacGregor, “Comparing alternative approaches for multivariate statistical analysis of batch process data,” J. Chemometr., vol. 13, pp. 397–413, 1999.
[18] Q. P. He, J. Wang, and S. J. Qin, “A new fault diagnosis method using fault directions in Fisher discriminant analysis,” AIChE J., vol. 51, no. 2, pp. 555–571, 2005.
[19] Q. P. He and J. Wang, “Fault detection using k-nearest-neighbor rule for semiconductor manufacturing processes,” IEEE Trans. Semicond. Manuf., vol. 20, no. 4, pp. 345–354, 2007.
[20] R. Duda, P. Hart, and D. Stork, Pattern Classification, 2nd ed. New York: Wiley, 2001.
[21] J. Wang, Q. P. He, and T. F. Edgar, “State estimation in high-mix semiconductor manufacturing,” J. Process Contr., vol. 19, pp. 443–456, 2009.
[22] A. K. Smilde, “Comments on three-way analyses used for batch process data,” J. Chemometr., vol. 15, no. 11, pp. 19–27, 2001.
[23] S. Wold, K. Esbensen, and P. Geladi, “Principal component analysis,” Chemometr. Intell. Lab. Syst., vol. 2, pp. 37–52, 1987.
[24] J. E. Jackson and G. Mudholkar, “Control procedures for residuals associated with principal component analysis,” Technometr., vol. 21, pp. 341–349, 1979.
[25] S. J. Qin, “Statistical process monitoring: Basics and beyond,” J. Chemometr., vol. 17, pp. 480–502, 2003.
[26] L. H. Chiang, E. L. Russell, and R. D. Braatz, Fault Detection and Diagnosis in Industrial Systems. London, U.K.: Springer, 2001.
[27] E. L. Russell, L. H. Chiang, and R. D. Braatz, “Fault detection in industrial processes using canonical variate analysis and dynamic principal component analysis,” Chemometr. Intell. Lab. Syst., vol. 51, pp. 81–93, 2000.
[28] B. M. Wise, Metal Etch Data for Fault Detection Evaluation, 1999 [Online]. Available: http://software.eigenvector.com/Data/Etch/index.html
Qinghua Peter He (S’01–M’05) received the B.S. degree in chemical engineering from Tsinghua University, Beijing, China, in 1996, and the M.S. and Ph.D. degrees in chemical engineering, in 2002 and 2005, respectively, from the University of Texas, Austin.
He is currently an Assistant Professor at Tuskegee University, Tuskegee, AL. His research interests are in the general areas of process modeling, monitoring, optimization and control, with special interests in the modeling and optimization, fault detection and classification of batch processes such as semiconductor manufacturing and pharmaceutical processes. He is also interested in molecular dynamic simulation and Monte Carlo simulation of micro/nanoelectronic and biological systems. He has more than three years of experience in semiconductor and chemical industries.
Jin Wang (S’01–M’04) received the B.S. degree in chemical engineering from Tsinghua University, Beijing, China, in 1994. She received the M.S. and Ph.D. degrees in chemical engineering from the University of Texas at Austin, in 2001 and 2004, respectively.
From 2002 to 2006, she was a Process Development Engineer and later a Senior Process Development Engineer with Advanced Micro Devices, Inc. Since 2006 she has been with Auburn University, Auburn, AL, as an Assistant Professor. Her research interests include system identification, semiconductor process modeling and control, fault detection and classification, control performance monitoring, and systems biology. She holds ten U.S. patents.