Effective Variational-Autoencoder-Based Generative Models For Highly Imbalanced Fault Detection Data in Semiconductor Manufacturing
Abstract—In current semiconductor manufacturing, the limited raw trace data available for defective wafers make fault detection (FD) tasks extremely difficult because of the resulting data imbalance in wafer classification. To mitigate this problem, this paper proposes using a variational autoencoder (VAE) as a data augmentation strategy for resolving the data imbalance of temporal raw trace data. A VAE is first trained on the few available defective samples. By extracting the latent variables that characterize the distribution of the defective samples, we make use of the statistical randomness of the latent variables to generate synthesized defective samples via the decoder of the trained VAE. Two data representation and VAE modeling strategies are investigated: using a concatenation of multiple raw trace data, and using individual raw trace data, as the input of the VAE during the training stage. A real-data plasma enhanced chemical vapor deposition (PECVD) process with only a few defective samples is used to illustrate the performance enhancement in wafer classification arising from the proposed data augmentation framework. Based on computational comparisons among well-known classification models, the proposed generative VAE model via the individual strategy enables the adaptive boosting (AdaBoost) classifier to achieve perfect performance in every metric when over-sampling ratios of 80% and 100% are adopted.

Index Terms—Variational autoencoder (VAE), data augmentation, wafer classification, semiconductor manufacturing, fault detection.

Manuscript received 21 June 2022; revised 14 November 2022; accepted 16 January 2023. Date of publication 3 February 2023; date of current version 5 May 2023. This work was supported in part by the Ministry of Science and Technology, Taiwan, under Grant MOST-111-2221-E-027-070-MY3. (Corresponding author: Shu-Kai S. Fan.)
Shu-Kai S. Fan is with the Department of Industrial Engineering and Management, National Taipei University of Technology, Taipei 106344, Taiwan (e-mail: morrisfan@ntut.edu.tw).
Du-Ming Tsai is with the Department of Industrial Engineering and Management, Yuan Ze University, Taoyuan 32003, Taiwan.
Pei-Chi Yeh is with the Department of Industrial Engineering and Management, National Taipei University of Technology, Taipei 106344, Taiwan, and also with the RD Process Center, Taiwan Semiconductor Manufacturing Company Ltd., Hsinchu 308, Taiwan.
Color versions of one or more figures in this article are available at https://doi.org/10.1109/TSM.2023.3238555.
Digital Object Identifier 10.1109/TSM.2023.3238555

I. INTRODUCTION
The semiconductor industry has long been a core driving force behind the development of innovative high-tech applications and consumer electronics. In semiconductor manufacturing, fault detection and classification (FDC) plays a central role in the paradigm of extended advanced process control (eAPC) [1], [2], [3]. The construction of an effective wafer classification model promises immediate action that helps the process engineer with FDC tasks before metrology data are taken. In the recent literature, defective wafer detection and classification have been extensively addressed by investigating the sensor data of the process equipment in semiconductor fabrication plants [4], [5], [6], [7], [8], [9]. In common semiconductor practice, the sensor data is also known as the raw trace data of status variable identification (SVID). The raw trace data contains a wide variety of temporal, time-indexed sequences of measurements collected in situ from the sensors installed in the process equipment, where each sensor corresponds to a specific SVID. Technically, an SVID can be either quantitative or qualitative, depending on the purpose of the sensor design. Typically, qualitative SVIDs are related to the wafer count, time stamp, chip tally, etc.; quantitative SVIDs are related to the processing variables in the tool, which can be measured in the metric system, like chamber inner heater zone power, chamber pressure reading, chamber outer heater current, temperature power, etc. In short, FD is performed at an early stage of the processing steps, attempting to detect defective wafers (or process excursions) without recourse to the metrology system and to prevent defective wafers from going downstream in the pipeline.

A. Literature Review
Process engineers used to conduct FD tasks using univariate statistics (also known as UVAs) such as mean, standard deviation, range, maximum, minimum, slope, skewness, kurtosis, and coefficient of variation, among others, applied to each individual pre-defined processing step. However, if non-key SVIDs and/or non-key processing steps are selected for monitoring, an unexpectedly high false positive rate (i.e., type I error) or false negative rate (i.e., type II error) will result [10]. To address this, the entire nonlinear SVID profile of the raw trace data is monitored instead, in order to safeguard against yield loss [6], [7], [8], [9]. In the statistical process control (SPC) literature, profile monitoring provides an alternative for the FD tasks in the electronics manufacturing industry [11], [12], [13].
As the manufacturing management technology (MMT) in wafer fabrication advances, a high yield of 95% or even beyond has now become standard practice. Machine and deep learning modeling often suffers from class imbalance in FD tasks. The class imbalance arises from the
© 2023 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: National Cheng Kung Univ.. Downloaded on December 05,2023 at 16:24:18 UTC from IEEE Xplore. Restrictions apply.
Fig. 9. Deep network structure of the VAE model via the individual strategy.
Fig. 10. Generated temporal profiles of abnormal wafers for SVIDs 25-28
via the VAE models.
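As a rough illustration of how synthetic profiles such as those in Fig. 10 can be drawn from a trained VAE, the sketch below follows the standard VAE formulation of [22]: latent vectors are sampled from a standard normal prior and passed through a decoder. Everything concrete here is an assumption made for illustration only. The decoder is a toy stand-in (one affine layer with fixed random weights followed by tanh), not the trained network of Fig. 9, and the latent dimension and profile length are arbitrary. The reparameterization and KL helpers show the usual training-time ingredients; they may differ in detail from the paper's own equations, which are not reproduced in this excerpt.

```python
import numpy as np


def reparameterize(mu, log_var, rng):
    """Training-time sampling: z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(sigma^2)) || N(0, I)), one value per row."""
    return -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var), axis=-1)


def make_toy_decoder(latent_dim, profile_len, rng):
    """Hypothetical stand-in for a trained decoder: affine layer + tanh."""
    weights = rng.standard_normal((latent_dim, profile_len))
    bias = rng.standard_normal(profile_len)
    return lambda z: np.tanh(z @ weights + bias)


def generate_profiles(decoder, n_samples, latent_dim, rng):
    """Generation step: sample z from the N(0, I) prior and decode it."""
    z = rng.standard_normal((n_samples, latent_dim))
    return decoder(z)


rng = np.random.default_rng(0)
LATENT_DIM, PROFILE_LEN = 2, 50  # assumed sizes, not taken from the paper
decoder = make_toy_decoder(LATENT_DIM, PROFILE_LEN, rng)
synthetic = generate_profiles(decoder, n_samples=5, latent_dim=LATENT_DIM, rng=rng)
print(synthetic.shape)  # (5, 50): five synthetic profiles of length 50
```

In a real pipeline the decoder weights would come from training on the few defective profiles, and the number of generated samples would be set by the chosen OS ratio.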
concatenation strategy is not recommended for the proposed generative VAE model that is used to characterize the raw trace data of PECVD addressed in this paper.
For random forests, the individual strategy in combination with the proposed VAE model boosts classification accuracy the most among the compared data augmentation strategies, but appears insensitive to increasing OS ratios. Similarly, the other three strategies appear not to benefit from data augmentation, which could be partly due to the bootstrap aggregating technique already embedded in random forests.
It is worth noting that AdaBoost performs perfectly in accuracy, precision, recall and F1-score when the individual strategy in the proposed VAE model is applied to the data augmentation with OS ratios of 80% and 100% (see Fig. 12). When the OS ratio is less than or equal to 60%, none of the four data augmentation methods helps AdaBoost improve classification accuracy considerably. The great success of the individual strategy in AdaBoost can be attributed to two reasons: (i) appropriate characterization of FD data by using the individual strategy, and (ii) a sequential ensemble structure that successively refits weak classifiers to differently weighted instances of the training dataset. By doing so, subsequent classifiers focus on relatively hard-to-classify instances. As is apparent from Fig. 12, heavier data augmentation, such as OS ratios of 80% and 100%, facilitates this adaptiveness in dealing with additional minority cases.
For bagging and XGBoost, the individual strategy in combination with the proposed VAE model attains the highest accuracy (0.9793 and 0.9944, respectively) when an OS ratio of 60% is used. XGBoost returns the highest accuracy of 0.9944, nearly matching the perfect accuracy of AdaBoost at OS ratios of 80% and 100%. Unlike AdaBoost, the success of XGBoost rests on a parallel boosting mechanism within a single tree under an optimized distributed gradient-boosting framework.
Taking a close look at Figs. 11–14, the SMOTE and bootstrap methods do not provide noticeable improvement in accuracy even when the OS ratio rises. The major difficulty may arise from the fact that these two methods do not take the distribution of the temporal data into account, so the synthesized temporal profile data cannot resemble the original series well in some localities. Compared with the individual strategy used for the proposed generative model, the concatenation strategy embedded in a single VAE model cannot deal adequately with the non-commensurability in scale between SVIDs, and therefore yields augmented data inferior to those of the individual strategy. Such inferiority in data augmentation leads to unstable improvement in accuracy despite an increasing OS ratio.

D. Classification Performance of AdaBoost and XGBoost Using the Individual Strategy in the Proposed VAE Model
Based on the comparison results in Section III-C, the two boosting algorithms, AdaBoost and XGBoost, produce the best classification accuracy with the assistance of the proposed VAE model under the individual strategy. Full details of classification performance, including accuracy, precision, recall, F1-score and FNR, are now reported in Tables III and IV. The best augmentation ratios for AdaBoost and XGBoost are indicated in bold face.
In Table III, the performance of AdaBoost improves in every evaluation indicator almost linearly as the number of minority cases increases. In spite of perfect precision (100%) and very high accuracy (98.21% and 97.93%), AdaBoost still returns FNRs of 16.45% and 19.03% when the minority class is augmented to 266 and 399 cases. As mentioned earlier, such high FNRs are totally unacceptable in semiconductor practice. In this regard, at least 532 minority cases are necessary to achieve world-class production yields. As can clearly be seen from Table III, a recall significantly lower than the perfect precision implies that the trained model identifies the raw trace data of abnormal wafers very decisively, but only for a subset of the defective patterns in the temporal raw trace data. With the aid of augmented raw trace data of defective wafers during the training stage, the machine learning model enhances its detection ability by learning more varied raw trace patterns of defective wafers. The gap between recall and precision begins to narrow as the OS ratio increases.
In Table IV, the performance of XGBoost peaks in every evaluation indicator as the number of minority cases increases to 399. At that point, an FNR of 5.16% is achieved, which is barely acceptable in practice. Surprisingly, the performance of every indicator declines afterwards, which remains an open question for further scrutiny. On the whole, AdaBoost slightly outperforms XGBoost for the PECVD process investigated here. Another potential research opportunity is to confirm whether the discrepancy in performance arises from the difference between the sequential and parallel learning structures of the boosting frameworks.
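The relationship between the indicators discussed in this subsection can be made concrete with a small calculation. The sketch below computes accuracy, precision, recall, F1-score and FNR from confusion-matrix counts, treating the defective wafer as the positive class (an assumption). The counts in the usage lines are hypothetical and are not taken from Table III or Table IV. They show how a classifier with no false positives, and hence perfect precision, can still miss a share of defective wafers, since FNR = 1 - recall.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return common indicators from confusion-matrix counts.

    Positive class = defective wafer (assumed convention).
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    fnr = fn / (fn + tp) if (fn + tp) else 0.0  # equals 1 - recall
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "fnr": fnr,
    }


# Hypothetical counts: no false alarms, but 20 of 100 defective wafers missed.
metrics = classification_metrics(tp=80, fp=0, fn=20, tn=900)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these illustrative counts the accuracy is high because normal wafers dominate, while one in five defective wafers still goes undetected, which is why FNR is reported alongside accuracy and precision.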
TABLE IV
PERFORMANCE EVALUATION OF XGBOOST USING THE INDIVIDUAL STRATEGY

All the detailed performance evaluation reports for the other data augmentation methods (SMOTE, bootstrap and the concatenation strategy), the other classifiers (random forests and bagging), and the other indicators (F0.5-score, F2-score and specificity) are available upon request. In an earlier experiment, a Gaussian mixture model (GMM) was used to replace the latent vector in (1) and Fig. 3. However, the VAE-GMM model does not perform as competitively as the original VAE model. The corresponding experimental results are also available upon request.
Section I states that one-class models may suffer from a high false positive rate. To verify this claim under the PECVD data configuration investigated in this paper, OCSVM models using the polynomial and radial basis function kernels were evaluated and yielded a false positive rate of approximately 0.5. This additional computational result reinforces the contribution of the proposed VAE-based generative model for highly imbalanced fault detection data. Similar outcomes of high false positive rates were also reported in [28].

IV. CONCLUSION
This paper investigates an important problem in the routine FD tasks of current semiconductor manufacturing: the class imbalance in wafer classification. To resolve this difficulty, a new VAE-based model with two different data augmentation strategies is proposed. A real-data PECVD process in semiconductor manufacturing is used to illustrate the proposed generative model. For various OS ratios, the decoder in the trained VAE model is utilized to generate synthesized profile data of abnormal wafers for the purpose of data augmentation. To verify the effectiveness of the proposed VAE model, four different machine learning models are compared in classification performance under four different data augmentation strategies, i.e., SMOTE, bootstrap, and the individual and concatenation strategies used in the VAE model. Based on a comprehensive computational study, the proposed VAE model coupled with the individual strategy outperforms the other data augmentation methods in every tested classifier. In particular, AdaBoost in combination with the proposed VAE model and the individual strategy delivers perfect classification performance (i.e., 100% in accuracy, precision, recall and F1-score; 0 in FNR) when OS ratios of 80% and 100% are selected. The major contribution of the proposed generative VAE model with the individual strategy is to provide practitioners with a viable raw trace data augmentation tool for ordinary semiconductor manufacturing practice. An immediate extension of the current work for future research would be applying the proposed generative model to other important processing steps in semiconductor manufacturing, such as chemical mechanical planarization, etching, and ion implantation, since it is of practical relevance in FD. How to determine an optimum OS ratio automatically and adaptively in the advanced process control paradigm warrants further scrutiny [27]. Although excellent performance is demonstrated in this paper, additional on-site confirmation experiments are still required to further validate the effectiveness of the proposed generative model.

REFERENCES
[1] F. Zhu et al., “Methodology for important sensor screening for fault detection and classification in semiconductor manufacturing,” IEEE Trans. Semicond. Manuf., vol. 34, no. 1, pp. 65–73, Feb. 2021.
[2] S. Yasuda, T. Tanaka, M. Kitabata, and Y. Jisaki, “Chamber and recipe-independent FDC indicator in high-mix semiconductor manufacturing,” IEEE Trans. Semicond. Manuf., vol. 34, no. 3, pp. 301–306, Aug. 2021.
[3] D. H. Kim and S. J. Hong, “Use of plasma information in machine-learning-based fault detection and classification for advanced equipment control,” IEEE Trans. Semicond. Manuf., vol. 34, no. 3, pp. 408–419, Aug. 2021.
[4] H. Lee, Y. Kim, and C. O. Kim, “A deep learning model for robust wafer fault monitoring with sensor measurement noise,” IEEE Trans. Semicond. Manuf., vol. 30, no. 1, pp. 23–31, Feb. 2017.
[5] J. Jang, B. W. Min, and C. O. Kim, “Denoised residual trace analysis for monitoring semiconductor process faults,” IEEE Trans. Semicond. Manuf., vol. 32, no. 3, pp. 293–301, Aug. 2019.
[6] E. Kim, S. Cho, B. Lee, and M. Cho, “Fault detection and diagnosis using self-attentive convolutional neural networks for variable-length sensor data in semiconductor manufacturing,” IEEE Trans. Semicond. Manuf., vol. 32, no. 3, pp. 302–309, Aug. 2019.
[7] S.-K. S. Fan, D.-M. Tsai, F. He, J.-Y. Huang, and C.-H. Jen, “Key parameter identification and defective wafer detection of semiconductor manufacturing processes using image processing techniques,” IEEE Trans. Semicond. Manuf., vol. 32, no. 4, pp. 544–552, Nov. 2019.
[8] S.-K. S. Fan, C.-Y. Hsu, D.-M. Tsai, F. He, and C.-C. Cheng, “Data-driven approach for fault detection and diagnostic in semiconductor manufacturing,” IEEE Trans. Autom. Sci. Eng., vol. 17, no. 4, pp. 1925–1936, Oct. 2020.
[9] S.-K. S. Fan, C.-Y. Hsu, C.-H. Jen, K.-L. Chen, and L.-T. Juan, “Defective wafer detection using a denoising autoencoder for semiconductor manufacturing processes,” Adv. Eng. Inform., vol. 46, Oct. 2020, Art. no. 101166.
[10] S.-K. S. Fan, S.-C. Lin, and P.-F. Tsai, “Wafer fault detection and key step identification for semiconductor manufacturing using principal component analysis, AdaBoost and decision tree,” J. Ind. Prod. Eng., vol. 33, no. 3, pp. 151–168, Jun. 2016.
[11] S.-K. S. Fan, N.-C. Yao, Y.-J. Chang, and C.-H. Jen, “Statistical monitoring of nonlinear profiles by using piecewise linear approximation,” J. Process Control, vol. 21, no. 8, pp. 1217–1229, Sep. 2011.
[12] S.-K. S. Fan, Y.-J. Chang, and N. Aidara, “Nonlinear profile monitoring of reflow process data based on the sum of sine functions,” Qual. Rel. Eng. Int., vol. 29, no. 5, pp. 743–758, Jul. 2013.
[13] S.-K. S. Fan, C.-H. Jen, and J.-X. Lee, “Profile monitoring for autocorrelated reflow processes with small samples,” Processes, vol. 7, no. 2, p. 104, Jan. 2019.
[14] T. Lee, K. B. Lee, and C. O. Kim, “Performance of machine learning algorithms for class-imbalanced process fault detection problems,” IEEE Trans. Semicond. Manuf., vol. 29, no. 4, pp. 436–445, Nov. 2016.
[15] X. Jiang and Z. Ge, “Data augmentation classifier for imbalanced fault classification,” IEEE Trans. Autom. Sci. Eng., vol. 18, no. 3, pp. 1206–1217, Jul. 2021.
[16] X. Yuan, C. Ou, Y. Wang, C. Yang, and W. Gui, “A layer-wise data augmentation strategy for deep learning networks and its soft sensor application in an industrial hydrocracking process,” IEEE Trans. Neural Netw. Learn. Syst., vol. 32, no. 8, pp. 3296–3305, Aug. 2021.
[17] M. Saqlain, Q. Abbas, and J. Y. Lee, “A deep convolutional neural network for wafer defect identification on an imbalanced dataset in semiconductor manufacturing processes,” IEEE Trans. Semicond. Manuf., vol. 33, no. 3, pp. 436–444, Aug. 2020.
[18] Y. Hyun and H. Kim, “Memory-augmented convolutional neural networks with triplet loss for imbalanced wafer defect pattern classification,” IEEE Trans. Semicond. Manuf., vol. 33, no. 4, pp. 622–634, Nov. 2020.
[19] S. Wang, Z. Zhong, Y. Zhao, and L. Zuo, “A variational autoencoder enhanced deep learning model for wafer defect imbalanced classification,” IEEE Trans. Compon. Packag. Manuf. Technol., vol. 11, no. 12, pp. 2055–2060, Dec. 2021, doi: 10.1109/TCPMT.2021.3126083.
[20] J. Yu and J. Liu, “Multiple granularities generative adversarial network for recognition of wafer map defects,” IEEE Trans. Ind. Informat., vol. 18, no. 3, pp. 1674–1683, Mar. 2022, doi: 10.1109/TII.2021.3092372.
[21] Y. Lu, Y.-M. Cheung, and Y. Y. Tang, “Bayes imbalance impact index: A measure of class imbalanced data set for classification problem,” IEEE Trans. Neural Netw. Learn. Syst., vol. 31, no. 9, pp. 3525–3539, Sep. 2020.
[22] D. P. Kingma and M. Welling, “Auto-encoding variational Bayes,” 2013, arXiv:1312.6114.
[23] D. P. Kingma and M. Welling, “An introduction to variational autoencoders,” Found. Trends Mach. Learn., vol. 12, no. 4, pp. 307–392, Nov. 2019.
[24] S. Bond-Taylor, A. Leach, Y. Long, and C. G. Willcocks, “Deep generative modelling: A comparative review of VAEs, GANs, normalizing flows, energy-based and autoregressive models,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 11, pp. 7327–7347, Nov. 2022, doi: 10.1109/TPAMI.2021.3116668.
[25] F. Pedregosa et al., “Scikit-learn: Machine learning in Python,” J. Mach. Learn. Res., vol. 12, pp. 2825–2830, Oct. 2011.
[26] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “SMOTE: Synthetic minority over-sampling technique,” J. Artif. Intell. Res., vol. 16, pp. 321–357, Jan. 2002.
[27] S.-K. S. Fan, C.-W. Cheng, and D.-M. Tsai, “Fault diagnosis of wafer acceptance test and chip probing between front-end-of-line and back-end-of-line processes,” IEEE Trans. Autom. Sci. Eng., vol. 19, no. 4, pp. 3068–3082, Oct. 2022, doi: 10.1109/TASE.2021.3106011.
[28] S.-K. S. Fan, D.-M. Tsai, C.-H. Jen, C.-Y. Hsu, F. He, and L.-T. Juan, “Data visualization of anomaly detection in semiconductor processing tools,” IEEE Trans. Semicond. Manuf., vol. 35, no. 2, pp. 186–197, May 2022.