
Anomaly Detection For Industrial Surface Inspection Application in Maintenance of Aircraft Components


Available online at www.sciencedirect.com

ScienceDirect
Procedia CIRP 107 (2022) 246–251
www.elsevier.com/locate/procedia

55th CIRP Conference on Manufacturing Systems

Anomaly detection for industrial surface inspection: application in maintenance of aircraft components
Falko Kähler a,*, Ole Schmedemann a, Thorsten Schüppstuhl a
a Hamburg University of Technology, TUHH, Institute of Aircraft Production Technology, Denickestr. 17, 21073 Hamburg, Germany
* Corresponding author. Tel.: +49-40-42878-3479 ; fax: +49-40-42731-4551. E-mail address: f.kaehler@tuhh.de

Abstract
Surface defects on aircraft landing gear components represent a deviation from a normal state. Visual inspection is a safety-critical but recurring task, which motivates its automation through machine vision. Various rarely occurring faults make the acquisition of appropriate training data cumbersome, which represents a major challenge for artificial intelligence-based optical inspection. In this paper, we apply an anomaly detection approach based on a convolutional autoencoder for defect detection during inspection to counter the challenge of lacking and biased training data. Results indicate the potential of this approach to assist the inspector, but improvements are required before deployment.
© 2022 The Authors. Published by Elsevier B.V.
This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0)
Peer-review under responsibility of the International Programme committee of the 55th CIRP Conference on Manufacturing Systems
Keywords: optical inspection; anomaly detection; surface defects; machine vision

1. Introduction

Visual inspection is a repetitive task in product lifecycles. During maintenance of aircraft landing gears, the complete surface of components must be inspected reliably for surface defects to ensure flight safety. Small and hard-to-identify defects like pitting corrosion require skilled and experienced workers, who are prone to error and human factors such as monotony or unsteady concentration [9, 16]. Additional drawbacks of visual inspection, such as component areas that are hard to access, motivate the use of imaging sensors and automatic evaluation (machine vision) using artificial intelligence (AI), which is successfully deployed in a variety of applications.

AI-based machine vision relies on sufficient training data that encompasses all possible types, shapes and locations of defects. Rarely occurring defect types and variations as well as missing or incomplete documentation make the creation of comprehensive datasets tedious, time-consuming and costly, representing a major challenge for machine vision deployment. Approaches to generate synthetic training data have the potential to deal with this challenge [4, 10, 17], but require expert knowledge and carry the risk of an unintentional domain gap. However, defect-free samples are easy to acquire in large quantities. This motivates training AI-based approaches on only one class, the normal/defect-free samples. Ideally, the AI models learn the structure of the normal class and subsequently identify anomalous samples when they sufficiently deviate from the learned normal state.

The goal of this work is to develop and implement an AI-based approach for detecting corrosion surface defects on landing gear components in image data based on one-class classification. After a review of related work and relevant aspects, an anomaly detection approach is selected and applied. Subsequently, results are discussed and the achieved performance is evaluated regarding deployment on the given use case.

Nomenclature

A     image A (input)
a     threshold parameter
B     image B (reconstruction)
i     pixel coordinate
j     pixel coordinate
nF    number of filters
T     threshold
µ     mean value
σ     standard deviation

2212-8271 © 2022 The Authors. Published by Elsevier B.V.


10.1016/j.procir.2022.05.197

This is a resupply of March 2023 as the template used in the publication of the original article contained errors. The content of the article has remained unaffected.

2. Related work

2.1. Automated optical inspection

Application of automated optical inspection is often motivated by the demand for increased productivity and reduction of errors and costs. A broad field of application is the detection of defects, which aims to identify the specific class and location of a defect [11]. Most applications use an imaging system (image sensor and lighting) to capture the surface of the object, with downstream software for image evaluation. The software consists of an inspection algorithm to extract features of the image and classify it into non-defect or defect [9]. In recent years, inspection algorithms are more and more based on artificial intelligence, which has improved the performance of various computer vision tasks. Automated optical inspection in order to detect surface defects is applied in manufacturing quality control, such as of metal [11, 15], ceramics or textiles [8], optical elements [16] or electronics [9].

Another wide field is inspection during maintenance (as in our use case), where much research has been conducted to date. For instance, AI-based inspection is increasingly entering aircraft maintenance. Inspecting the fuselage for corrosion is a vital task, for which Brandoli et al. [5] applied an image-based deep learning method for corrosion identification and achieved promising results and high performance. The authors countered the shortage of defective images by employing transfer learning, but stated their method is expected to improve with more data. Taheritanjani et al. [19] applied supervised and unsupervised AI methods on real image data of aircraft engine fasteners and achieved an accuracy and recall of 0.99 using a ResNet101-based supervised method, while unsupervised methods like support vector machines or autoencoders achieved significantly lower performance. The authors stated the main drawbacks of supervised methods are tedious data collection as well as the lacking generalizability when introducing unknown defects. Other researchers used AI-based approaches on endoscopic images for defect detection in aircraft engines [18, 21]. Shen et al. [18], for instance, successfully implemented supervised learning for detecting cracks and burns, but the amount of available training data remains a bottleneck.

2.2. Anomaly detection

Classification between defect-free and defective instances while training only on defect-free/normal data instances is often referred to as a one-class classification or anomaly detection problem [6, 7]. Various semi-/unsupervised approaches have been developed to counter the data shortage issue. In recent years, approaches mostly based on a Generative Adversarial Network (GAN) or Convolutional Autoencoder (CAE) have been developed to detect defects in image data. However, compared to traditional classification approaches, the specific type of defect cannot be determined. An et al. [3] proposed a Variational Autoencoder (VAE), which differs from conventional autoencoders in that it delivers a reconstruction probability instead of a reconstruction error, which does not require a specific threshold for classification. According to their study, the proposed method outperformed autoencoder or principal component analysis. Tsai et al. [20] developed a convolutional autoencoder (CAE) approach for defect detection and applied it successfully on a variety of material surfaces. Their CAE with regularizations outperformed conventional CAE as well as VAE. Lehr et al. [14] compared a CAE with pre-trained and fine-tuned convolutional neural networks (ResNet-18 based) on their own data as well as on the MVTec dataset. They observed the CAE performed better at detecting defective images than defect-free ones on their own dataset. However, the CAE achieved lower accuracy than the supervised methods. GAN-based approaches have received increasing attention from researchers in recent years [1, 2, 12]. For instance, Lai et al. [12] used a pretrained GAN to generate defect-free images based on their training data. As the proposed GAN failed to generate defective samples, they were able to identify defects in textured images effectively.

3. Use case analysis

According to [6, 7], different aspects of anomaly detection (see fig. 1) have to be discussed to select appropriate anomaly detection methods. Regarding input data, a distinction can be made between sequential data (e.g. video, speech, text) and non-sequential data (e.g. images). This work focuses on imaging sensors to capture surface defects, so non-sequential image data is used. Due to lacking defect data, only one-class data of defect-free images is available for training. Therefore, only semi-supervised methods are considered, which only train on one class (one-class classification). Next, anomalies can be categorized into three types:

• Point anomalies, where a data instance significantly deviates from the other instances
• Contextual anomalies, where a data instance is considered anomalous in a specific context
• Collective anomalies, where single data instances appear to be normal, but anomalous in a group

Fig. 1. Aspects of anomaly detection.

Pitting corrosion defects on landing gears can be considered point anomalies since they do not have a specific context between each other as their occurrence is random. They are individually considered anomalous. The output of deep anomaly detection methods can either be an anomaly score or a class label. An anomaly score quantifies the outlierness of a data instance. The score can be ranked and a domain-specific threshold (decision score) determined by an expert can be applied to identify anomalies [6]. As our goal is to detect surface defects, we aim for a label output. Anomaly scores, however, may yield useful information about the defectiveness.
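The step from an anomaly score to a class label can be sketched in a few lines of Python. This is only an illustration: the function name `scores_to_labels`, the example scores and the threshold value are assumptions made here and are not taken from this work. The flag `higher_is_anomalous` reflects that error-type scores grow with outlierness, while similarity-type scores shrink.

```python
from typing import List, Sequence


def scores_to_labels(scores: Sequence[float], threshold: float,
                     higher_is_anomalous: bool = True) -> List[str]:
    """Turn anomaly scores into class labels using a fixed threshold."""
    labels = []
    for score in scores:
        if higher_is_anomalous:
            anomalous = score > threshold   # error-type score
        else:
            anomalous = score < threshold   # similarity-type score
        labels.append("defective" if anomalous else "defect-free")
    return labels


# Hypothetical error-type scores and threshold, chosen only for illustration.
example_scores = [0.02, 0.15, 0.04]
print(scores_to_labels(example_scores, threshold=0.1))
```

Keeping the raw scores alongside the labels preserves the ranking information mentioned above, which a label alone discards.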

4. Approach

After reviewing aspects of anomaly detection in section 3, a Convolutional Autoencoder (CAE) approach has been selected. An autoencoder consists of an encoder and decoder. The encoder compresses the input to a lower-dimensional space. This compressed representation is passed to the decoder, which reconstructs it back to the input dimension. For CAE, the encoder consists of a sequence of convolution and downsampling layers to compress an image, while the decoder involves a series of deconvolution and upsampling layers for reconstruction [20]. Compared to VAE- or GAN-based approaches, CAE are considered relatively straightforward to train.

As a generative method, the CAE delivers an image output which can be compared with the input. The similarity or reconstruction error between input and output characterizes the autoencoder's performance and is considered as anomaly score. Since the CAE will be trained only on reconstructing defect-free images, a high similarity between input and reconstructed output and therefore a low reconstruction error is expected. For defective images, a higher reconstruction error is estimated, which deviates significantly from normal instances. We propose a threshold method based on the reconstruction error to identify normal and anomalous instances. Different metrics for calculating the similarity of input and reconstruction, namely mean squared error (MSE), structural similarity index (SSIM) and signal-to-reconstruction-error-ratio (SRE) [13], are considered.

Fig. 2. Setup for image acquisition.

Fig. 3. Dataset samples.
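The reconstruction-error scoring of this section can be sketched as follows, using only the Python standard library and treating an image as a flat sequence of normalized pixel values. Several points are assumptions rather than details given in this work: SRE is taken as 10·log10(µ_A² / MSE(A, B)), the form commonly stated following [13]; the threshold uses the mean-plus-scaled-standard-deviation form T = µ + a·σ from section 6.2 with the population standard deviation; and all function names are hypothetical. SSIM is omitted because it needs a windowed implementation.

```python
import math
from statistics import mean, pstdev
from typing import Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between input pixels a and reconstruction pixels b."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def sre(a: Sequence[float], b: Sequence[float]) -> float:
    """Signal-to-reconstruction-error ratio in dB (assumed form, see lead-in)."""
    err = mse(a, b)
    if err == 0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(mean(a) ** 2 / err)


def threshold(train_scores: Sequence[float], a_param: float = 0.0) -> float:
    """T = mu + a * sigma over the scores of the defect-free training images."""
    return mean(train_scores) + a_param * pstdev(train_scores)


# Toy 4-pixel "image" and its reconstruction (illustrative values only).
image = [0.5, 0.5, 0.5, 0.5]
reconstruction = [0.4, 0.6, 0.5, 0.5]
print(mse(image, reconstruction), sre(image, reconstruction))
```

With MSE a larger value indicates a worse reconstruction, whereas with SRE a smaller value does, so the comparison against T has to be flipped depending on the metric.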
5. Implementation

5.1. Data acquisition

Real images of a landing gear component surface have been acquired for training and testing. Figure 2 depicts the acquisition setup, which consists of a grayscale camera focused perpendicularly on the component surface. The component itself can be rotated while the camera can slide parallel to the component rotation axis. Due to the varying outer component contour, the camera was manually focused. An LED ring light ensured adequate lighting conditions. In total, 600 non-defect and 300 defect images were taken. The images were resized to 144x144 pixels and normalized. Dataset samples¹ are shown in fig. 3. Corrosion is visible as dark areas, which clearly separate from the (rather noisy) metal texture.

¹ The dataset can be requested from the corresponding author.

5.2. Autoencoder architecture

The CAE was implemented using Python and Keras/TensorFlow. The encoder consists of the input and a convolution layer to compress the input image. The decoder applies a transposed convolution and a convolution layer to reconstruct the input from the compressed representation. The width of the autoencoder is varied by modifying the number of filters (nF) of the convolution/convolution transpose layers between 16, 32 and 64. The Adam (short for Adaptive Moment Estimation) optimizer was applied. Details of the architecture can be found in table 1.

Table 1. Detailed parameters of the CAE.

          Layer                  Filters      Padding  Activation
Encoder   Input
          Convolution            nF x (3,3)   same     relu
Decoder   Convolution transpose  nF x (3,3)   same     relu
          Convolution            1 x (3,3)    same     sigmoid
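As a rough cross-check of how shallow the network in table 1 is, the number of trainable parameters can be worked out by hand. The sketch below assumes a single-channel grayscale input, one bias term per filter (the Keras default) and no further trainable layers; none of this is stated explicitly in the paper, and the function names are hypothetical.

```python
def conv_params(in_channels: int, filters: int, kernel=(3, 3)) -> int:
    """Kernel weights plus one bias per filter for a 2D (transposed) convolution."""
    kernel_h, kernel_w = kernel
    return kernel_h * kernel_w * in_channels * filters + filters


def cae_param_count(n_f: int, in_channels: int = 1) -> int:
    """Trainable parameters of the three layers listed in table 1."""
    encoder_conv = conv_params(in_channels, n_f)   # Convolution, nF x (3,3)
    decoder_conv_t = conv_params(n_f, n_f)         # Convolution transpose, nF x (3,3)
    decoder_out = conv_params(n_f, 1)              # Convolution, 1 x (3,3)
    return encoder_conv + decoder_conv_t + decoder_out


for n_f in (16, 32, 64):
    print(n_f, cae_param_count(n_f))
```

Under these assumptions even the widest variant stays below 40,000 parameters, most of them in the transposed convolution, whose size grows quadratically with nF.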


5.3. Training

The CAE models were trained from scratch using the data described in section 5.1. 500 randomly selected defect-free samples were used for training, and the remaining 100 defect-free and all 300 defective samples were used for subsequent testing. During training, an 85/15 training/validation split was applied. The training data was shuffled after each epoch. Training parameters were chosen based on best practices from the literature. A constant batch size of 16 and a learning rate of 0.001 were used. All models were trained for 25 epochs and saved after each epoch for evaluation.
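The resulting sample counts can be reproduced with a short sketch. The random seed, the way validation samples are drawn and the function name `split_indices` are assumptions; the paper only states the sizes and the 85/15 ratio.

```python
import math
import random


def split_indices(n_defect_free: int = 600, n_train: int = 500,
                  val_fraction: float = 0.15, seed: int = 0):
    """Split defect-free image indices into training, validation and test sets."""
    rng = random.Random(seed)          # fixed seed, chosen here for repeatability
    indices = list(range(n_defect_free))
    rng.shuffle(indices)
    train_pool, test_ok = indices[:n_train], indices[n_train:]
    n_val = round(n_train * val_fraction)
    val, train = train_pool[:n_val], train_pool[n_val:]
    return train, val, test_ok


train, val, test_ok = split_indices()
steps_per_epoch = math.ceil(len(train) / 16)   # batch size of 16
print(len(train), len(val), len(test_ok), steps_per_epoch)
```

The 100 held-out defect-free indices are combined with all 300 defective images to form the test set of 400 images.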

6. Results

6.1. Selection of autoencoder architecture and metric

The CAE models were trained and their performance (balanced accuracy, precision and recall) on the test samples was calculated. The similarity measures were compared to determine the best metric for normal/defective classification. As the majority of defect-free training samples would be classified correctly using the mean reconstruction loss, it is used as initial threshold T for classification of test samples, as a significantly differing loss is expected for defective samples. For MSE, losses greater than the threshold are considered anomalous (less error equals greater similarity), while for SSIM and SRE, values lower than the threshold are considered anomalous (higher ratio equals greater similarity).

Figure 4 shows the performance of each training configuration over the number of epochs. Despite the shallow CAE architecture, promising but not yet sufficient results are achieved at this stage. Best performance is achieved for nF = 64 after one epoch and nF = 16 after 3 epochs using the SRE metric. For MSE, individual performance values reached similar levels. SSIM lags behind and is not considered further in this work. Table 2 shows the respective confusion matrices for nF = 64 and nF = 16.

Fig. 4. Performance of MSE, SSIM and SRE metric with varying number of filters.

Table 2. Confusion matrix for nF = 64 after 1 epoch (top) and nF = 16 after 3 epochs (bottom).

nF = 64                    Predicted label
True label      defective  defect-free  Total
defective       261        39           300
defect-free     41         59           100
Total           302        98           400

nF = 16                    Predicted label
True label      defective  defect-free  Total
defective       261        39           300
defect-free     45         55           100
Total           306        94           400

Almost half of the defect-free samples (41 and 45 of 100, respectively) were falsely classified, indicating the initial threshold is not suitable for evaluating defect-free samples. In contrast, defective samples were classified significantly better. Since all defects must be reliably detected when inspecting safety-critical components, a high recall is desired and a small number of false positive detections may be accepted as a trade-off. Due to the overall highest recall rates, the SRE metric and nF = 64 are selected for further investigations and optimizations such as threshold adjustment.

6.2. Threshold adjustment

The decision threshold is important for the model performance. In section 6.1, the mean training reconstruction error was used as initial threshold to determine a suitable similarity metric. However, tuning the threshold may reinforce the desired behavior for the specific use case. We define the threshold as T = µ + a · σ, where the initial mean reconstruction error µ is adjusted by the product of parameter a and the standard deviation σ of the training reconstruction errors. The wide CAE model (nF = 64) after one epoch is used for further evaluation. Parameter a is varied between -2 and 2 and the performance is evaluated. As depicted in figure 5, decent tradeoffs of all three


performances are achieved in the range 0 ≤ a ≤ 0.5. One can also notice the contrary behavior of precision and recall, as a high recall leads to more defect classifications, but also to more incorrect defect classifications, decreasing the overall precision. As the slopes of precision and recall indicate, a high increase of recall can be achieved while only slightly decreasing precision. Since inspection is a safety-critical task, all defects must be detected, so a high recall rate is desired. When adjusting the threshold to a = 1.21, a maximum recall rate of 1 is achieved, but balanced accuracy and precision decrease to 0.56 and 0.77 respectively, according to the corresponding confusion matrix in table 3 (recall 300/300, precision 300/388, balanced accuracy (1 + 0.12)/2). The performance decreased drastically, with only 12 of 100 defect-free samples classified correctly, resulting only in a 12 % reduction of images to be evaluated manually. The false alarm rate would be extremely high in real deployment, as defect-free samples are more frequent than defective ones. In this state, it is evident that a threshold adjustment does not lead to satisfactory results and the approach does not yet yield sufficient benefit for the inspector.

Fig. 5. Threshold adjustment.

Table 3. Confusion matrix (nF = 64, epoch = 1, a = 1.21).

                           Predicted label
True label      defective  defect-free  Total
defective       300        0            300
defect-free     88         12           100
Total           388        12           400

6.3. Failure investigation

In order to identify causes for the low performance, we investigated the reconstructions of the CAE. Figure 6 shows data samples from the test set, both normal and defective. As the images indicate, the CAE successfully reconstructs input from both normal and defective samples and visual differences are not as evident as expected. The reconstructions appear blurry, probably causing a distorted similarity measurement. Brightness differences can be noticed between input and reconstruction for both normal and defective samples. We conclude the CAE (with the elaborated architecture) has not learned properly on the given data. The training data was analyzed for suspected issues and bias. Some blurry and probably misleading samples were noticed. For instance, original image 1 in figure 6 shows ambiguous dark spots, which were not considered defective during data acquisition. We investigated falsely classified defect-free images (fig. 7), which reinforced this suspicion. As the images were normalized during data preparation, dark spots (see red boxes in fig. 7) become more present. These spots have similarities with the actual corrosion defects and might mislead the CAE, preventing it from learning relevant class features and resulting in a hazy demarcation line between normal and defective.

Fig. 6. Image reconstructions.

Fig. 7. Falsely classified defect-free samples.

6.4. Discussion

As previously elaborated, the CAE reconstructs defect-free as well as defective images successfully, which influences the performance negatively. On the one hand, threshold adjustment according to section 6.2 can increase the desired behavior of the CAE to detect all defects, but on the other hand it will drastically increase the false positive classifications and, in contrast


to previous works elaborated in section 2.2, does not lead to a satisfactory performance for autonomous deployment. The approach can be considered as a starting point for machine-vision-based inspection and has the potential to reduce the number of images to be evaluated manually and the time spent for visual inspection when the false alarm rate is at an acceptably low level. To achieve this, the training data could be enhanced and expanded. On the one hand, this is accomplished by careful data acquisition to describe the normal, defect-free state as precisely and unambiguously as possible to separate anomalies clearly. Deeper information content such as colored images may support this separation. On the other hand, during deployment the data itself could be expanded by new data annotated by the still-needed inspector in aircraft landing gear maintenance. This enables further algorithm training and improvements, such as inclusion of new or time-varying normal states.

7. Conclusion

This work investigates a method to detect surface defects in image data of an aircraft landing gear component. To counter the shortage of defective samples, we pursued a one-class anomaly detection approach based on a convolutional autoencoder (CAE). The CAE is expected to successfully reconstruct normal/defect-free images, but fail on anomalous/defective samples. The similarity between input and reconstructed output is calculated and compared with a threshold to identify defective samples. Results showed the implemented CAE reconstructs normal as well as defective inputs successfully, which affects the performance negatively. Several metrics for evaluating the similarity and reconstruction error have been investigated, with signal-to-reconstruction-error (SRE) proving to be the most effective metric to differentiate between normal and defective samples. However, due to successful reconstruction of defective samples, a clear threshold could not be determined and a satisfactory performance is not achieved. The approach can be considered as a starting point for machine-vision-based surface inspection. An adjusted CAE architecture, higher quality training data as well as data gathered during deployment and probably additional information such as color might increase the performance and have the potential to reduce the number of images to be evaluated manually.

Acknowledgements

Research was funded by the German Federal Ministry for Economics and Climate Action under the Program LuFo V-3.

References

[1] Akcay, S., Atapour-Abarghouei, A., Breckon, T.P., 2018. Ganomaly: Semi-supervised anomaly detection via adversarial training.
[2] Akçay, S., Atapour-Abarghouei, A., Breckon, T.P., 2019. Skip-ganomaly: Skip connected and adversarially trained encoder-decoder anomaly detection.
[3] An, J., Cho, S., 2015. Variational autoencoder based anomaly detection.
[4] Bath, L., Schmedemann, O., Schüppstuhl, T., 2021. Automatisierung in der industriellen Endoskopie/development of new means regarding sensor positioning and measurement data evaluation – automation of industrial endoscopy. wt Werkstattstechnik online 111, 644–649. doi:10.37544/1436-4980-2021-09-70.
[5] Brandoli, B., de Geus, A.R., Souza, J.R., Spadon, G., Soares, A., Rodrigues, J.F., Komorowski, J., Matwin, S., 2021. Aircraft fuselage corrosion detection using artificial intelligence. Sensors (Basel, Switzerland) 21. doi:10.3390/s21124026.
[6] Chalapathy, R., Chawla, S., 2019. Deep learning for anomaly detection: A survey. URL: http://arxiv.org/pdf/1901.03407v2.
[7] Chandola, V., Banerjee, A., Kumar, V., 2009. Anomaly detection. ACM Computing Surveys 41, 1–58. doi:10.1145/1541880.1541882.
[8] Czimmermann, T., Ciuti, G., Milazzo, M., Chiurazzi, M., Roccella, S., Oddo, C.M., Dario, P., 2020. Visual-based defect detection and classification approaches for industrial applications-a survey. Sensors (Basel, Switzerland) 20. doi:10.3390/s20051459.
[9] Ebayyeh, A.A.R.M.A., Mousavi, A., 2020. A review and analysis of automatic optical inspection and quality monitoring methods in electronics industry. IEEE Access 8, 183192–183271. doi:10.1109/ACCESS.2020.3029127.
[10] Gutierrez, P., Luschkova, M., Cordier, A., Shukor, M., Schappert, M., Dahmen, T., 2021. Synthetic training data generation for deep learning based quality inspection. doi:10.1117/12.2586824.
[11] He, Y., Song, K., Meng, Q., Yan, Y., 2020. An end-to-end steel surface defect detection approach via fusing multiple hierarchical features. IEEE Transactions on Instrumentation and Measurement 69, 1493–1504. doi:10.1109/TIM.2019.2915404.
[12] Lai, Y.T., Hu, J.S., Tsai, Y.H., Chiu, W.Y., 09.07.2018 - 12.07.2018. Industrial anomaly detection and one-class classification using generative adversarial networks, in: 2018 IEEE/ASME International Conference on Advanced Intelligent Mechatronics (AIM), IEEE. pp. 1444–1449. doi:10.1109/AIM.2018.8452228.
[13] Lanaras, C., Bioucas-Dias, J., Galliani, S., Baltsavias, E., Schindler, K., 2018. Super-resolution of sentinel-2 images: Learning a globally applicable deep neural network. ISPRS Journal of Photogrammetry and Remote Sensing 146, 305–319. doi:10.1016/j.isprsjprs.2018.09.018.
[14] Lehr, J., Sargsyan, A., Pape, M., Philipps, J., Krüger, J., 08.09.2020 - 11.09.2020. Automated optical inspection using anomaly detection and unsupervised defect clustering, in: 2020 25th IEEE International Conference on Emerging Technologies and Factory Automation (ETFA), IEEE. pp. 1235–1238. doi:10.1109/ETFA46521.2020.9212172.
[15] Luo, Q., Fang, X., Liu, L., Yang, C., Sun, Y., 2020. Automated visual defect detection for flat steel surface: A survey. IEEE Transactions on Instrumentation and Measurement 69, 626–644. doi:10.1109/TIM.2019.2963555.
[16] Schöch, A., Perez, P., Linz-Dittrich, S., Bach, C., Ziolek, C., 2017. Automated surface inspection of small customer-specific optical elements. tm - Technisches Messen 84, 502–511. doi:10.1515/teme-2017-0012.
[17] Schoepflin, D., Holst, D., Gomse, M., Schüppstuhl, T., 2021. Synthetic training data generation for visual object identification on load carriers, pp. 1257–1262. doi:10.1016/j.procir.2021.11.211.
[18] Shen, Z., Wan, X., Ye, F., Guan, X., Liu, S. Deep learning based framework for automatic damage detection in aircraft engine borescope inspection, 1005–1010. doi:10.1109/ICCNC.2019.8685593.
[19] Taheritanjani, S., Schoenfeld, R., Bruegge, B., 2019. Automatic damage detection of fasteners in overhaul processes, in: 2019 IEEE 15th International Conference on Automation Science and Engineering (CASE), IEEE. pp. 1289–1295. doi:10.1109/COASE.2019.8843049.
[20] Tsai, D.M., Jen, P.H., 2021. Autoencoder-based anomaly detection for surface defect inspection. Advanced Engineering Informatics 48, 101272. doi:10.1016/j.aei.2021.101272.
[21] Wong, C.Y., Seshadri, P., Parks, G.T., 2021. Automatic borescope damage assessments for gas turbine blades via deep learning 142, 1097. doi:10.2514/6.2021-1488.
