Anomaly Detection For Industrial Surface Inspection Application in Maintenance of Aircraft Components
ScienceDirect
Procedia CIRP 107 (2022) 246–251
www.elsevier.com/locate/procedia
Abstract
Surface defects on aircraft landing gear components represent a deviation from a normal state. Visual inspection is a safety-critical but recurring
task that is a candidate for automation through machine vision. Various rarely occurring faults make the acquisition of appropriate training data cumbersome,
which represents a major challenge for artificial intelligence-based optical inspection. In this paper, we apply an anomaly detection approach
based on a convolutional autoencoder for defect detection during inspection to counter the challenge of lacking and biased training data. Results
indicate the potential of this approach to assist the inspector, but improvements are required before deployment.
© 2022 The Authors. Published by Elsevier B.V.
This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0)
Peer-review under responsibility of the International Programme committee of the 55th CIRP Conference on Manufacturing Systems
Keywords: optical inspection; anomaly detection; surface defects; machine vision
This is a resupply of March 2023 as the template used in the publication of the original article contained errors. The content of the article has remained unaffected.
Falko Kähler et al. / Procedia CIRP 107 (2022) 246–251 247
2. Related work
2.1. Automated optical inspection
The application of automated optical inspection is often motivated by the demand for increased productivity and the reduction of errors and costs. A broad field of application is the detection of defects, which aims to identify the specific class and location of a defect [11]. Most applications use an imaging system (image sensor and lighting) to capture the surface of the object, together with software for image evaluation. The software consists of an inspection algorithm that extracts features from the image and classifies it as non-defective or defective [9]. In recent years, inspection algorithms have increasingly been based on artificial intelligence, which has improved the performance of various computer vision tasks. Automated optical inspection for detecting surface defects is applied in manufacturing quality control, for example of metal [11, 15], ceramics or textiles [8], optical elements [16] or electronics [9].
Another wide field is inspection during maintenance (as in our use case), in which much research has been conducted to date. For instance, AI-based inspection is increasingly entering aircraft maintenance. Inspecting the fuselage for corrosion is a vital task, for which Brandoli et al. [5] applied an image-based deep learning method for corrosion identification and achieved promising results with high performance. The authors countered the shortage of defective images by employing transfer learning, but stated that their method is expected to improve with more data. Taheritanjani et al. [19] applied supervised and unsupervised AI methods to real image data of aircraft engine fasteners and achieved an accuracy and recall of 0.99 using a ResNet101-based supervised method, while unsupervised methods such as support vector machines or autoencoders achieved significantly lower performance. The authors stated that the main drawbacks of supervised methods are tedious data collection and the lack of generalizability when unknown defects are introduced. Other research used AI-based approaches on endoscopic images for defect detection in aircraft engines [18, 21]. Shen et al. [18], for instance, successfully implemented supervised learning for detecting cracks and burns, but the amount of available training data remains a bottleneck.
2.2. Anomaly detection
Classification between defect-free and defective instances while training only on defect-free/normal data instances is often referred to as a one-class classification or anomaly detection problem [6, 7]. Various semi-/unsupervised approaches have been developed to counter the data shortage issue. In recent years, approaches mostly based on a Generative Adversarial Network (GAN) or a Convolutional Autoencoder (CAE) have been developed to detect defects in image data. However, in contrast to traditional classification approaches, the specific type of defect cannot be determined. An et al. [3] proposed a Variational Autoencoder (VAE)-based approach, which differs from conventional autoencoders in that it delivers a reconstruction probability instead of a reconstruction error and therefore does not require a specific threshold for classification. According to their study, the proposed method outperformed autoencoder- and principal component analysis-based methods. Tsai et al. [20] developed a convolutional autoencoder (CAE) approach for defect detection and applied it successfully to a variety of material surfaces. Their CAE with regularization outperformed a conventional CAE as well as a VAE. Lehr et al. [14] compared a CAE with pretrained and fine-tuned convolutional neural networks (ResNet-18 based) on their own data as well as on the MVTec dataset. They observed that, on their own dataset, the CAE performed better at detecting defective images than at recognizing defect-free ones. However, the CAE achieved lower accuracy than the supervised methods. GAN-based approaches have received increasing attention from researchers in recent years [1, 2, 12]. For instance, Lai et al. [12] used a pretrained GAN to generate defect-free images based on their training data. Because the GAN failed to generate defective samples, they were able to identify defects in textured images effectively.
3. Use case analysis
Fig. 1. Aspects of anomaly detection.
According to [6, 7], different aspects of anomaly detection (see fig. 1) have to be discussed to select appropriate anomaly detection methods. Regarding input data, a distinction can be made between sequential data (e.g. video, speech, text) and non-sequential data (e.g. images). This work focuses on imaging sensors to capture surface defects, so non-sequential image data is used. Due to the lack of defect data, only one class of data, namely defect-free images, is available for training. Therefore, only semi-supervised methods that train on a single class (one-class classification) are considered. Next, anomalies can be categorized into three types:
• Point anomalies, where a data instance significantly deviates from the other instances
• Contextual anomalies, where a data instance is considered anomalous in a specific context
• Collective anomalies, where single data instances appear to be normal, but are anomalous as a group
Pitting corrosion defects on landing gears can be considered point anomalies, since they have no specific context with respect to each other and occur randomly. Each of them is considered anomalous individually. Output of deep anomaly detection
4. Approach
Fig. 2. Setup for image acquisition.
After reviewing aspects of anomaly detection in section 3, a
Convolutional Autoencoder (CAE) approach has been selected.
An autoencoder consists of an encoder and a decoder. The encoder
compresses the input to a lower-dimensional space. This
compressed representation is passed to the decoder, which re-
constructs it back to the input dimension. For CAE, the encoder
consists of a sequence of convolution and downsampling layers
to compress an image, while the decoder involves a series of
deconvolution and upsampling layers for reconstruction [20].
Compared to VAE- or GAN-based approaches, CAEs are considered
relatively straightforward to train.
As a generative method, the CAE delivers an image output that can be compared with the input. The similarity or reconstruction error between input and output characterizes the autoencoder's performance and is used as the anomaly score. Since the CAE is trained only to reconstruct defect-free images, a high similarity between input and reconstructed output, and therefore a low reconstruction error, is expected for them. For defective images, a higher reconstruction error is expected, which deviates significantly from that of normal instances. We propose a threshold method based on the reconstruction error to distinguish normal from anomalous instances. Different metrics for calculating the similarity of input and reconstruction are considered, namely mean squared error (MSE), structural similarity index (SSIM) and signal-to-reconstruction-error ratio (SRE) [13].
Fig. 3. Dataset samples.
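The scoring and thresholding idea described above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' code: `mse`, `sre`, `fit_threshold` and `is_defective` are hypothetical names, SRE follows our reading of the definition in [13] (squared mean intensity of the input over the mean squared reconstruction error, in dB), the standard deviation is the population one, and the threshold rule T = µ + a · σ is the one stated in section 6.2. Because a larger SRE means a better reconstruction, the sketch flags SRE scores below T; how the paper handles this direction is an assumption here. SSIM is left out; scikit-image, for example, provides it as `skimage.metrics.structural_similarity`.

```python
import numpy as np


def mse(x: np.ndarray, x_hat: np.ndarray) -> float:
    """Mean squared error between an input image and its reconstruction.

    Larger values mean a worse reconstruction, i.e. a more anomalous input.
    """
    return float(np.mean((x - x_hat) ** 2))


def sre(x: np.ndarray, x_hat: np.ndarray, eps: float = 1e-12) -> float:
    """Signal-to-reconstruction-error ratio in dB (assumed definition):
    10 * log10(mean(x)**2 / mean((x_hat - x)**2)).

    Larger values mean a better reconstruction; eps avoids division by zero.
    """
    return float(10.0 * np.log10(np.mean(x) ** 2 / (np.mean((x_hat - x) ** 2) + eps)))


def fit_threshold(train_scores, a: float = 0.0) -> float:
    """Threshold T = mu + a * sigma over training scores; a = 0 gives the mean."""
    scores = np.asarray(train_scores, dtype=float)
    return float(scores.mean() + a * scores.std())  # population std (ddof=0), an assumption


def is_defective(score: float, threshold: float, higher_is_worse: bool = True) -> bool:
    """Flag an image as defective; the comparison flips for similarity scores such as SRE."""
    return score > threshold if higher_is_worse else score < threshold


# Toy usage with a synthetic 144x144 "image" and a slightly brighter "reconstruction".
x = np.full((144, 144), 0.5)
x_hat = x + 0.01
print(mse(x, x_hat))  # approximately 1e-4
print(sre(x, x_hat))  # approximately 33.98 dB
```

With a = 0 the threshold reduces to the mean training score, which corresponds to the initial threshold used in section 6.1; sweeping `a` reproduces the kind of adjustment described in section 6.2.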
5. Implementation
5.1. Data acquisition
Real images of a landing gear component surface have been acquired for training and testing. Figure 2 depicts the acquisition setup, which consists of a grayscale camera focusing perpendicularly on the component surface. The component itself can be rotated, while the camera can slide parallel to the component's rotation axis. Due to the varying outer contour of the component, the camera was focused manually. An LED ring light ensured adequate lighting conditions. In total, 600 non-defect and 300 defect images were taken. The images were resized to 144x144 pixels and normalized. Dataset samples¹ are shown in fig. 3. Corrosion is visible as dark areas, which stand out clearly from the (rather noisy) metal texture.
¹ The dataset can be requested from the corresponding author.
5.2. Autoencoder architecture
The CAE was implemented using Python and Keras/TensorFlow. The encoder consists of the input and a convolution layer to compress the input image. The decoder applies a transposed convolution and a convolution layer to reconstruct the input from the compressed representation. The width of the autoencoder is varied by setting the number of filters (nF) of the convolution/transposed convolution layers to 16, 32 or 64. The Adam (short for Adaptive Moment Estimation) optimizer was applied. Details of the architecture can be found in table 1.
Table 1. Detailed parameters of the CAE.
Part      Layer                   Filters      Padding   Activation
Encoder   Input
          Convolution             nF x (3,3)   same      relu
Decoder   Convolution transpose   nF x (3,3)   same      relu
          Convolution             1 x (3,3)    same      sigmoid
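Table 1 leaves some details open, such as the stride, the use of bias terms and the exact input shape. The following plain-Python sketch assumes stride 1, biases enabled and a 144x144x1 input (the grayscale images resized in section 5.1), and derives output shapes and trainable-parameter counts for the three filter widths. In Keras, these layers would correspond to `Conv2D` and `Conv2DTranspose` with `padding="same"`; the helper names `conv_params` and `cae_summary` are hypothetical.

```python
# Hypothetical sketch: output shapes and trainable parameters of the Table 1 CAE.
# Assumptions not stated in Table 1: stride 1, bias terms enabled, 144x144x1 input.


def conv_params(kernel: int, in_ch: int, out_ch: int) -> int:
    """kernel*kernel*in_ch*out_ch weights plus one bias per output channel.

    The same count applies to a 2D convolution and a 2D transposed convolution.
    """
    return kernel * kernel * in_ch * out_ch + out_ch


def cae_summary(n_f: int, size: int = 144, channels: int = 1):
    """Return (layer, output shape, parameter count) for each trainable layer."""
    return [
        ("encoder convolution", (size, size, n_f), conv_params(3, channels, n_f)),
        ("decoder transposed convolution", (size, size, n_f), conv_params(3, n_f, n_f)),
        ("decoder output convolution", (size, size, channels), conv_params(3, n_f, channels)),
    ]


for n_f in (16, 32, 64):
    layers = cae_summary(n_f)
    total = sum(params for _, _, params in layers)
    h, w, c = layers[0][1]
    print(f"nF={n_f}: encoder output {(h, w, c)} ({h * w * c} values), {total} parameters")
```

Under these assumptions the totals are 2,625, 9,857 and 38,145 trainable parameters for nF = 16, 32 and 64. The encoder output would then hold 144 × 144 × nF values, i.e. nF times as many as the input image, so Table 1 on its own does not imply a spatial bottleneck; if the implementation used strides or pooling, the shapes and counts would differ.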
5.3. Training
The CAE models were trained from scratch using the data described in section 5.1. 500 randomly selected defect-free samples were used for training, and the remaining 100 defect-free and all 300 defective samples were used for subsequent testing. During training, an 85/15 training/validation split was applied. The training data was shuffled after each epoch. Training parameters were chosen based on best practices from the literature: a constant batch size of 16 and a learning rate of 0.001. All models were trained for 25 epochs and saved after each epoch for evaluation.
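The data split and batching described above can be sketched with NumPy as follows. Only the counts (600/300 images, 500 training images, 85/15 split, batch size 16) come from the text; the image indexing, the random seed and the helper `epoch_batches` are hypothetical.

```python
import numpy as np

# Hypothetical indexing: images 0..599 are defect-free, 600..899 are defective.
N_NORMAL, N_DEFECT = 600, 300
rng = np.random.default_rng(seed=0)  # arbitrary seed for reproducibility

normal_idx = np.arange(N_NORMAL)
defect_idx = np.arange(N_NORMAL, N_NORMAL + N_DEFECT)

# 500 random defect-free images for training; the remaining 100 defect-free
# images plus all 300 defective images form the test set.
shuffled = rng.permutation(normal_idx)
train_pool, test_normal = shuffled[:500], shuffled[500:]
test_idx = np.concatenate([test_normal, defect_idx])

# 85/15 training/validation split of the 500 training images.
n_fit = int(round(0.85 * len(train_pool)))
fit_idx, val_idx = train_pool[:n_fit], train_pool[n_fit:]

BATCH_SIZE = 16


def epoch_batches(indices: np.ndarray, batch_size: int, rng: np.random.Generator):
    """Yield index batches from a fresh permutation, i.e. reshuffle once per epoch."""
    order = rng.permutation(indices)
    for start in range(0, len(order), batch_size):
        yield order[start:start + batch_size]


batches = list(epoch_batches(fit_idx, BATCH_SIZE, rng))
print(len(fit_idx), len(val_idx), len(test_idx), len(batches))  # 425 75 400 27
```

Calling `epoch_batches` once per epoch yields a new order each time, so 25 epochs would mean 25 calls. In Keras, comparable settings could be passed as `optimizer=tf.keras.optimizers.Adam(learning_rate=0.001)` in `compile` and `batch_size=16, epochs=25, shuffle=True` in `fit`, with a `tf.keras.callbacks.ModelCheckpoint` callback for saving after each epoch; the text does not state which calls were actually used.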
6. Results
Table 2. Confusion matrix for nF = 64 after 1 epoch (top) and nF = 16 after 3 epochs (bottom).
nF = 64                 Predicted defective   Predicted defect-free   Total
True label defective    261                   39                      300
True label defect-free  41                    59                      100
Total                   302                   98                      400
nF = 16                 Predicted defective   Predicted defect-free   Total
True label defective    261                   39                      300
True label defect-free  45                    55                      100
Total                   306                   94                      400
Roughly 40 to 45% of the defect-free samples (41 and 45 out of 100) were falsely classified, indicating that the initial threshold is not suitable for evaluating defect-free samples. In contrast, defective samples were classified significantly better (261 out of 300 in both cases). Since all defects must be reliably detected when inspecting safety-critical components, a high recall is desired, and a small number of false positive detections may be accepted as a trade-off. Due to the overall highest recall rates, the SRE metric and nF = 64 are selected for further investigations and optimizations such as threshold adjustment.
6.2. Threshold adjustment
The decision threshold is important for the model performance. In section 6.1, the mean training reconstruction error was used as the initial threshold to determine a suitable similarity metric. However, tuning the threshold may reinforce the desired behavior for the specific use case. We define the threshold as T = µ + a · σ, where the initial mean reconstruction error µ is adjusted by the product of the parameter a and the standard deviation σ of the training reconstruction errors. The wide CAE model (nF = 64) after one epoch is used for further evaluation. Parameter a is varied between -2 and 2 and the performance is evaluated. As depicted in figure 5, decent tradeoffs of all three
In order to identify causes for the low performance, we investigated the reconstructions of the CAE. Figure 6 shows data samples from the test set, both normal and defective. As the images indicate, the CAE successfully reconstructs inputs from both normal and defective samples, and visual differences are not as evident as expected. The reconstructions appear blurry, probably causing a distorted similarity measurement. Brightness differences can be noticed between input and reconstruction for both normal and defective samples. We conclude that the CAE (with the described architecture) has not learned properly on the given data. The training data was analyzed for sus-
Fig. 7. Falsely classified defect-free samples.
6.4. Discussion
As previously described, the CAE reconstructs defect-free as well as defective images successfully, which influences the performance negatively. On the one hand, threshold adjustment according to section 6.2 can strengthen the desired behavior of the CAE to detect all defects, but on the other hand it will drastically increase the false positive classifications and, in contrast
to previous works discussed in section 2.2, does not lead to a satisfactory performance for autonomous deployment. The approach can be considered a starting point for machine-vision-based inspection and has the potential to reduce the number of images to be evaluated manually and the time spent on visual inspection, provided the false alarm rate is at an acceptably low level. To achieve this, the training data could be enhanced and expanded. On the one hand, this can be accomplished by careful data acquisition that describes the normal, defect-free state as precisely and unambiguously as possible, so that anomalies are clearly separated. Richer information content, such as color images, may support this separation. On the other hand, during deployment the data itself could be expanded with new data annotated by the inspector, who is still needed in aircraft landing gear maintenance. This enables further algorithm training and improvements, such as the inclusion of new normal states or normal states that vary over time.
7. Conclusion
This work investigates a method to detect surface defects in image data of an aircraft landing gear component. To counter the shortage of defective samples, we pursued a one-class anomaly detection approach based on a convolutional autoencoder (CAE). The CAE is expected to successfully reconstruct normal/defect-free images, but to fail on anomalous/defective samples. The similarity between input and reconstructed output is calculated and compared with a threshold to identify defective samples. Results showed that the implemented CAE reconstructs normal as well as defective inputs successfully, which affects the performance negatively. Several metrics for evaluating the similarity and reconstruction error have been investigated, with the signal-to-reconstruction-error ratio (SRE) proving to be the most effective metric to differentiate between normal and defective samples. However, due to the successful reconstruction of defective samples, a clear threshold could not be determined and a satisfactory performance was not achieved. The approach can be considered a starting point for machine-vision-based surface inspection. An adjusted CAE architecture, higher-quality training data, data gathered during deployment and possibly additional information such as color might increase the performance and yield the potential to reduce the number of images to be evaluated manually.
Acknowledgements
Research was funded by the German Federal Ministry for Economics and Climate Action under the Program LuFo V-3.
References
[1] Akcay, S., Atapour-Abarghouei, A., Breckon, T.P., 2018. Ganomaly: Semi-supervised anomaly detection via adversarial training.
[2] Akçay, S., Atapour-Abarghouei, A., Breckon, T.P., 2019. Skip-ganomaly: Skip connected and adversarially trained encoder-decoder anomaly detection.
[3] An, J., Cho, S., 2015. Variational autoencoder based anomaly detection.
[4] Bath, L., Schmedemann, O., Schüppstuhl, T., 2021. Automatisierung in der industriellen Endoskopie/development of new means regarding sensor positioning and measurement data evaluation – automation of industrial endoscopy. wt Werkstattstechnik online 111, 644–649. doi:10.37544/1436-4980-2021-09-70.
[5] Brandoli, B., de Geus, A.R., Souza, J.R., Spadon, G., Soares, A., Rodrigues, J.F., Komorowski, J., Matwin, S., 2021. Aircraft fuselage corrosion detection using artificial intelligence. Sensors (Basel, Switzerland) 21. doi:10.3390/s21124026.
[6] Chalapathy, R., Chawla, S., 2019. Deep learning for anomaly detection: A survey. URL: http://arxiv.org/pdf/1901.03407v2.
[7] Chandola, V., Banerjee, A., Kumar, V., 2009. Anomaly detection. ACM Computing Surveys 41, 1–58. doi:10.1145/1541880.1541882.
[8] Czimmermann, T., Ciuti, G., Milazzo, M., Chiurazzi, M., Roccella, S., Oddo, C.M., Dario, P., 2020. Visual-based defect detection and classification approaches for industrial applications-a survey. Sensors (Basel, Switzerland) 20. doi:10.3390/s20051459.
[9] Ebayyeh, A.A.R.M.A., Mousavi, A., 2020. A review and analysis of automatic optical inspection and quality monitoring methods in electronics industry. IEEE Access 8, 183192–183271. doi:10.1109/ACCESS.2020.3029127.
[10] Gutierrez, P., Luschkova, M., Cordier, A., Shukor, M., Schappert, M., Dahmen, T., 2021. Synthetic training data generation for deep learning based quality inspection. doi:10.1117/12.2586824.
[11] He, Y., Song, K., Meng, Q., Yan, Y., 2020. An end-to-end steel surface defect detection approach via fusing multiple hierarchical features. IEEE Transactions on Instrumentation and Measurement 69, 1493–1504. doi:10.1109/TIM.2019.2915404.
[12] Lai, Y.T., Hu, J.S., Tsai, Y.H., Chiu, W.Y., 2018. Industrial anomaly detection and one-class classification using generative adversarial networks, in: 2018 IEEE/ASME International Conference on Advanced Intelligent Mechatronics (AIM), IEEE. pp. 1444–1449. doi:10.1109/AIM.2018.8452228.
[13] Lanaras, C., Bioucas-Dias, J., Galliani, S., Baltsavias, E., Schindler, K., 2018. Super-resolution of sentinel-2 images: Learning a globally applicable deep neural network. ISPRS Journal of Photogrammetry and Remote Sensing 146, 305–319. doi:10.1016/j.isprsjprs.2018.09.018.
[14] Lehr, J., Sargsyan, A., Pape, M., Philipps, J., Krüger, J., 2020. Automated optical inspection using anomaly detection and unsupervised defect clustering, in: 2020 25th IEEE International Conference on Emerging Technologies and Factory Automation (ETFA), IEEE. pp. 1235–1238. doi:10.1109/ETFA46521.2020.9212172.
[15] Luo, Q., Fang, X., Liu, L., Yang, C., Sun, Y., 2020. Automated visual defect detection for flat steel surface: A survey. IEEE Transactions on Instrumentation and Measurement 69, 626–644. doi:10.1109/TIM.2019.2963555.
[16] Schöch, A., Perez, P., Linz-Dittrich, S., Bach, C., Ziolek, C., 2017. Automated surface inspection of small customer-specific optical elements. tm - Technisches Messen 84, 502–511. doi:10.1515/teme-2017-0012.
[17] Schoepflin, D., Holst, D., Gomse, M., Schüppstuhl, T., 2021. Synthetic training data generation for visual object identification on load carriers, pp. 1257–1262. doi:10.1016/j.procir.2021.11.211.
[18] Shen, Z., Wan, X., Ye, F., Guan, X., Liu, S. Deep learning based framework for automatic damage detection in aircraft engine borescope inspection, pp. 1005–1010. doi:10.1109/ICCNC.2019.8685593.
[19] Taheritanjani, S., Schoenfeld, R., Bruegge, B., 2019. Automatic damage detection of fasteners in overhaul processes, in: 2019 IEEE 15th International Conference on Automation Science and Engineering (CASE), IEEE. pp. 1289–1295. doi:10.1109/COASE.2019.8843049.
[20] Tsai, D.M., Jen, P.H., 2021. Autoencoder-based anomaly detection for surface defect inspection. Advanced Engineering Informatics 48, 101272. doi:10.1016/j.aei.2021.101272.
[21] Wong, C.Y., Seshadri, P., Parks, G.T., 2021. Automatic borescope damage assessments for gas turbine blades via deep learning 142, 1097. doi:10.2514/6.2021-1488.