Anomaly Detection For Industrial Surface Inspection: Application in Maintenance of Aircraft Components
Procedia CIRP 107 (2022) 246–251
Surface defects on aircraft landing gear components represent a deviation from a normal state. Visual inspection is a safety-critical but recurring task that is a candidate for automation through machine vision. Various rarely occurring faults make the acquisition of appropriate training data cumbersome, which represents a major challenge for artificial intelligence-based optical inspection. In this paper, we apply an anomaly detection approach based on a convolutional autoencoder for defect detection during inspection to counter the challenge of scarce and biased training data. Results indicated the potential of this approach to assist the inspector, but improvements are required before deployment.
Keywords: optical inspection; anomaly detection; surface defects; machine vision
2. Related work

2.1. Automated optical inspection

Application of automated optical inspection is often motivated by the demand for increased productivity and reduced errors and costs. A broad field of application is defect detection, which aims to identify the specific class and location of a defect [11]. Most applications use an imaging system (image sensor and lighting) to capture the surface of the object, with downstream software for image evaluation. The software consists of an inspection algorithm that extracts features from the image and classifies it as non-defective or defective [9]. In recent years, inspection algorithms have increasingly been based on artificial intelligence, which has improved the performance of various computer vision tasks. Automated optical inspection for detecting surface defects is applied in manufacturing quality control, for example for metal [11, 15], ceramics or textiles [8], optical elements [16] or electronics [9].

Another wide field, and the one relevant to our use case, is inspection during maintenance, where a considerable amount of research has been conducted to date. For instance, AI-based inspection is increasingly entering aircraft maintenance. Inspecting the fuselage for corrosion is a vital task, for which Brandoli et al. [5] applied an image-based deep learning method for corrosion identification and achieved promising results and high performance. The authors addressed the shortage of defective images by employing transfer learning, but stated that their method is expected to improve with more data. Taheritanjani et al. [19] applied supervised and unsupervised AI methods to real image data of aircraft engine fasteners and achieved an accuracy and recall of 0.99 using a ResNet101-based supervised method, while unsupervised methods such as support vector machines or autoencoders achieved significantly lower performance. The authors stated that the main drawbacks of supervised methods are tedious data collection and the lack of generalizability when unknown defects are introduced. Other research applied AI-based approaches to endoscopic images for defect detection in aircraft engines [18, 21]. Shen et al. [18], for instance, successfully implemented supervised learning for detecting cracks and burns, but the amount of available training data remains a bottleneck.

2.2. Anomaly detection

Classification into defect-free and defective while training only on defect-free (normal) data instances is often referred to as one-class classification or an anomaly detection problem [6, 7]. Various semi-supervised and unsupervised approaches have been developed to counter the data shortage issue. In recent years, approaches based mostly on a Generative Adversarial Network (GAN) or a Convolutional Autoencoder (CAE) have been developed to detect defects in image data. However, in contrast to traditional classification approaches, they cannot determine the specific type of defect. An et al. [3] proposed a Variational Autoencoder (VAE), which differs from conventional autoencoders in that it delivers a reconstruction probability instead of a reconstruction error, which does not require a specific threshold for classification. According to their study, the proposed method outperformed autoencoder and principal component analysis based methods. Tsai et al. [20] developed a convolutional autoencoder (CAE) approach for defect detection and applied it successfully to a variety of material surfaces. Their CAE with regularizations outperformed a conventional CAE as well as a VAE. Lehr et al. [14] compared a CAE with pretrained and fine-tuned convolutional neural networks (ResNet-18 based) on their own data as well as on the MVTec dataset. On their own dataset, they observed that the CAE performed better at detecting defective images than defect-free ones. However, the CAE achieved lower accuracy than the supervised methods. GAN-based approaches have received increasing attention from researchers in recent years [1, 2, 12]. For instance, Lai et al. [12] used a pretrained GAN to generate defect-free images based on their training data. Since the proposed GAN failed to generate defective samples, they were able to identify defects in textured images effectively.

3. Use case analysis

According to [6, 7], different aspects of anomaly detection (see fig. 1) have to be discussed to select appropriate anomaly detection methods. Regarding input data, a distinction can be made between sequential data (e.g. video, speech, text) and non-sequential data (e.g. images). This work focuses on imaging sensors to capture surface defects, so non-sequential image data is used. Due to the lack of defect data, only one class of data, namely defect-free images, is available for training. Therefore, only semi-supervised methods that train on a single class (one-class classification) are considered. Next, anomalies can be categorized into three types:

• Point anomalies, where a data instance significantly deviates from the other instances
• Contextual anomalies, where a data instance is considered anomalous in a specific context
• Collective anomalies, where single data instances appear to be normal, but are anomalous as a group

Fig. 1. Aspects of anomaly detection.

Pitting corrosion defects on landing gears can be considered point anomalies, since they have no specific context relative to each other; their occurrence is random. They are individually considered anomalous. Output of deep anomaly detec-
4. Approach

After reviewing the aspects of anomaly detection in section 3, a Convolutional Autoencoder (CAE) approach was selected. An autoencoder consists of an encoder and a decoder. The encoder compresses the input into a lower-dimensional space. This compressed representation is passed to the decoder, which reconstructs it back to the input dimension. For a CAE, the encoder consists of a sequence of convolution and downsampling layers to compress an image, while the decoder involves a series of deconvolution and upsampling layers for reconstruction [20]. Compared to VAE- or GAN-based approaches, CAEs are considered relatively straightforward to train.

As a generative method, the CAE delivers an image output that can be compared with the input. The similarity or reconstruction error between input and output characterizes the autoencoder's performance and is used as the anomaly score. Since the CAE is trained only to reconstruct defect-free images, a high similarity between input and reconstructed output, and therefore a low reconstruction error, is expected for such images. For defective images, a higher reconstruction error is expected, which deviates significantly from that of normal instances. We propose a threshold method based on the reconstruction error to distinguish normal from anomalous instances. Different metrics for calculating the similarity of input and reconstruction, namely mean squared error (MSE), structural similarity index (SSIM) and signal-to-reconstruction-error ratio (SRE) [13], are consid-

Fig. 2. Setup for image acquisition.

Fig. 3. Dataset samples.
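The three similarity measures can be sketched in plain Python as follows. This is a minimal illustration, not the authors' code: images are assumed to be flat lists of grayscale values in [0, 1], SSIM is simplified to a single window over the whole image (the standard SSIM uses a sliding window), and SRE is assumed to follow the common definition 10 · log10(mean(x)² / MSE). Because MSE grows with dissimilarity while SSIM and SRE shrink, the hypothetical helper `anomaly_score` maps all three to "higher = more anomalous" so that a single threshold rule can be applied.

```python
import math

# Minimal, dependency-free sketches of MSE, SSIM and SRE.
# Images are flat lists of grayscale floats in [0, 1].
# Assumptions (not from the paper): SSIM is evaluated once over the whole
# image instead of with a sliding window, and SRE uses the common
# definition 10 * log10(mean(x)^2 / MSE(x, y)).


def mse(x, y):
    """Mean squared error: 0 for identical images, larger = less similar."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def global_ssim(x, y, data_range=1.0):
    """Single-window SSIM over the whole image: 1 = identical."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    c1 = (0.01 * data_range) ** 2  # usual SSIM stabilising constants
    c2 = (0.03 * data_range) ** 2
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    )


def sre(x, y):
    """Signal-to-reconstruction-error ratio in dB: larger = more similar."""
    err = mse(x, y)
    if err == 0:
        return math.inf
    mean_x = sum(x) / len(x)
    if mean_x == 0:
        return -math.inf
    return 10 * math.log10(mean_x ** 2 / err)


def anomaly_score(x, y, metric="mse"):
    """Hypothetical helper: map each metric to 'higher = more anomalous'."""
    if metric == "mse":
        return mse(x, y)
    if metric == "ssim":
        return 1.0 - global_ssim(x, y)
    if metric == "sre":
        return -sre(x, y)
    raise ValueError(f"unknown metric: {metric}")


# Usage: a flat 4x4 image whose reconstruction misses one dark pixel.
x = [0.5] * 16
y = [0.5] * 15 + [0.1]
print(mse(x, y))          # about 0.01
print(sre(x, y))          # about 13.98 dB
print(global_ssim(x, x))  # 1.0
```

In practice a windowed SSIM implementation (for example from an image processing library) would replace `global_ssim`; the single-window version only keeps the sketch self-contained.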
5. Implementation

5.1. Data acquisition

Real images of a landing gear component surface were acquired for training and testing. Figure 2 depicts the acquisition setup, which consists of a grayscale camera focusing perpendicularly on the component surface. The component itself can be rotated, while the camera can slide parallel to the component's rotation axis. Due to the varying outer contour of the component, the camera was focused manually. An LED ring light ensured adequate lighting conditions. In total, 600 defect-free and 300 defective images were taken. The images were resized to 144x144 pixels and normalized. Dataset samples¹ are shown in fig. 3. Corrosion is visible as dark areas, which clearly stand out from the (rather noisy) metal texture.

¹ The dataset can be requested from the corresponding author.

5.2. Autoencoder architecture

The CAE was implemented in Python using Keras/TensorFlow. The encoder consists of the input and a convolution layer to compress the input image. The decoder applies a transposed convolution and a convolution layer to reconstruct the input from the compressed representation. The width of the autoencoder is varied by setting the number of filters (nF) of the convolution and transposed convolution layers to 16, 32 or 64. The Adam (short for Adaptive Moment Estimation) optimizer was applied. Details of the architecture can be found in table 1.

Table 1. Detailed parameters of the CAE.

| Part | Layer | Filters | Padding | Activation |
|---|---|---|---|---|
| Encoder | Input | | | |
| | Convolution | nF x (3,3) | same | relu |
| Decoder | Convolution transpose | nF x (3,3) | same | relu |
| | Convolution | 1 x (3,3) | same | sigmoid |
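To get a feel for the size of the Table 1 models, the trainable parameters can be tallied per width nF. This is a sketch under stated assumptions, since Table 1 does not list them: one grayscale input channel, 144x144 input, stride 1 for every layer (the Keras default) and one bias per filter. The helper names `conv_params` and `cae_layers` are hypothetical. Under the stride-1 assumption, 'same' padding keeps every feature map at the full 144x144 resolution.

```python
# Hypothetical helpers that tally the layers of Table 1 for each width nF.
# Assumptions (not stated in Table 1): one grayscale input channel,
# 144x144 input, stride 1 for every layer (the Keras default) and a bias
# term per filter. With 'same' padding and stride 1, every feature map
# keeps the 144x144 spatial size.

KERNEL = 3  # all layers in Table 1 use 3x3 kernels
SIZE = 144  # input images were resized to 144x144 pixels


def conv_params(in_ch, out_ch, k=KERNEL):
    """Weights (k*k*in*out) plus one bias per output channel.

    The same count applies to a transposed convolution.
    """
    return k * k * in_ch * out_ch + out_ch


def cae_layers(nf, channels=1):
    """Return (layer, output shape, parameter count) for the Table 1 stack."""
    return [
        ("Encoder: Convolution", (SIZE, SIZE, nf), conv_params(channels, nf)),
        ("Decoder: Convolution transpose", (SIZE, SIZE, nf), conv_params(nf, nf)),
        ("Decoder: Convolution", (SIZE, SIZE, channels), conv_params(nf, channels)),
    ]


for nf in (16, 32, 64):
    total = sum(params for _, _, params in cae_layers(nf))
    print(f"nF = {nf}: {total} trainable parameters")
# nF = 16: 2625 trainable parameters
# nF = 32: 9857 trainable parameters
# nF = 64: 38145 trainable parameters
```

Most of the parameters sit in the nF x nF transposed convolution, so doubling nF roughly quadruples the model size.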
5.3. Training
The CAE models were trained from scratch using the data described in section 5.1. 500 randomly selected defect-free samples were used for training, and the remaining 100 defect-free and all 300 defective samples were used for subsequent testing. During training, an 85/15 training/validation split was applied. The training data is shuffled after each epoch. Training parameters were chosen based on best practices from the literature: a constant batch size of 16 and a learning rate of 0.001. All models were trained for 25 epochs and saved after each epoch for evaluation.
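The data split and per-epoch shuffling described above can be sketched as follows. The image IDs and function names are hypothetical, the seed is fixed only for reproducibility, and two details are assumptions because the text does not specify them: the 85/15 split is drawn from the 500 training images, and every epoch reshuffles the training split before cutting it into batches of 16.

```python
import random

# A minimal sketch of the split described above, using hypothetical image IDs
# and a fixed seed. Assumptions: the 85/15 split is drawn from the 500 training
# images, and each epoch reshuffles the training split before batching.


def split_dataset(normal_ids, defect_ids, n_train=500, val_frac=0.15, seed=0):
    rng = random.Random(seed)
    normals = list(normal_ids)
    rng.shuffle(normals)
    train_pool, test_normal = normals[:n_train], normals[n_train:]
    n_val = round(len(train_pool) * val_frac)
    val, train = train_pool[:n_val], train_pool[n_val:]
    # Test set pairs: label 0 = defect-free, label 1 = defective
    test = [(i, 0) for i in test_normal] + [(i, 1) for i in defect_ids]
    return train, val, test


def epoch_batches(train, batch_size=16, seed=0):
    """Yield one list of batches per epoch, reshuffling before every epoch."""
    rng = random.Random(seed)
    data = list(train)
    while True:
        rng.shuffle(data)
        yield [data[i:i + batch_size] for i in range(0, len(data), batch_size)]


normal_ids = [f"normal_{i:03d}" for i in range(600)]  # hypothetical IDs
defect_ids = [f"defect_{i:03d}" for i in range(300)]
train, val, test = split_dataset(normal_ids, defect_ids)
print(len(train), len(val), len(test))  # 425 75 400

batches = next(epoch_batches(train))
print(len(batches), len(batches[-1]))  # 27 9
```

With 425 training images and a batch size of 16, each of the 25 epochs therefore consists of 26 full batches and one final batch of 9 images.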
6. Results
Table 2. Confusion matrix for nF = 64 after 1 epoch (top) and nF = 16 after 3 epochs (bottom).

| nF = 64 | Predicted: defective | Predicted: defect-free | Total |
|---|---|---|---|
| True: defective | 261 | 39 | 300 |
| True: defect-free | 41 | 59 | 100 |
| Total | 302 | 98 | 400 |

| nF = 16 | Predicted: defective | Predicted: defect-free | Total |
|---|---|---|---|
| True: defective | 261 | 39 | 300 |
| True: defect-free | 45 | 55 | 100 |
| Total | 306 | 94 | 400 |

Roughly half of the defect-free samples were falsely classified (41 and 45 of 100 for the two models in table 2), indicating that the initial threshold is not suitable for evaluating defect-free samples. In contrast, defective samples were classified significantly better (261 of 300 for both models). Since all defects must be reliably detected when inspecting safety-critical components, a high recall is desired, and a small number of false positive detections may be accepted as a trade-off. Due to the overall highest recall rates, the SRE metric and nF = 64 are selected for further investigation and optimization, such as threshold adjustment.

6.2. Threshold adjustment

The decision threshold is important for model performance. In section 6.1, the mean training reconstruction error was used as the initial threshold to determine a suitable similarity metric. However, tuning the threshold may reinforce the desired behavior for the specific use case. We define the threshold as T = µ + a · σ, where the initial threshold, the mean training reconstruction error µ, is adjusted by the product of the parameter a and the standard deviation σ of the training reconstruction errors. The wide CAE model (nF = 64) after one epoch is used for further evaluation. The parameter a is varied between -2 and 2 and the performance is evaluated. As depicted in figure 5, decent tradeoffs of all three
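The threshold rule T = µ + a · σ and the resulting recall and false positive rate can be sketched as below. Assumptions: scores are "higher = more anomalous" (an SRE score would be negated first), σ is taken as the population standard deviation since the text does not say which estimator was used, and the training errors and test scores in the sweep are hypothetical values for illustration, not results from the paper. The worked example uses the nF = 64 counts from table 2.

```python
import statistics

# A minimal sketch of the rule T = mu + a * sigma from section 6.2.
# Assumptions: scores are "higher = more anomalous" (negate SRE first),
# sigma is the population standard deviation, and the sweep data below
# are hypothetical values chosen only for illustration.


def threshold(train_errors, a=0.0):
    mu = statistics.fmean(train_errors)
    sigma = statistics.pstdev(train_errors)
    return mu + a * sigma


def confusion(scores, labels, t):
    """Count (tp, fn, fp, tn); label 1 = defective, flagged if score > t."""
    tp = fn = fp = tn = 0
    for s, y in zip(scores, labels):
        pred = s > t
        if y == 1 and pred:
            tp += 1
        elif y == 1:
            fn += 1
        elif pred:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def rates(tp, fn, fp, tn):
    recall = tp / (tp + fn)
    fpr = fp / (fp + tn)  # share of defect-free samples flagged as defective
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, fpr, precision


# Worked example with the nF = 64 counts from table 2
print(rates(261, 39, 41, 59))  # recall 0.87, FPR 0.41, precision about 0.864

# Hypothetical sweep of a over [-2, 2]
train_errors = [0.010, 0.012, 0.011, 0.009, 0.013]
test_scores = [0.011, 0.014, 0.009, 0.016, 0.015, 0.020, 0.012, 0.025]
test_labels = [0, 0, 0, 0, 1, 1, 1, 1]
for a in (-2, -1, 0, 1, 2):
    t = threshold(train_errors, a)
    recall, fpr, _ = rates(*confusion(test_scores, test_labels, t))
    print(f"a = {a:+d}: T = {t:.4f}, recall = {recall:.2f}, FPR = {fpr:.2f}")
```

Because σ is non-negative, lowering a can only lower T, so recall and the false positive rate can only stay the same or rise together; this is the trade-off the parameter a controls.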
In order to identify the causes of the low performance, we investigated the reconstructions of the CAE. Figure 6 shows data samples from the test set, both normal and defective. As the images indicate, the CAE successfully reconstructs the input for both normal and defective samples, and visual differences are not as evident as expected. The reconstructions appear blurry, which probably distorts the similarity measurement. Brightness differences between input and reconstruction can be noticed for both normal and defective samples. We conclude that the CAE (with the described architecture) has not learned properly from the given data. The training data was analyzed for sus-

Fig. 7. Falsely classified defect-free samples.

6.4. Discussion

As previously described, the CAE reconstructs defect-free as well as defective images successfully, which affects the performance negatively. On the one hand, threshold adjustment according to section 6.2 can reinforce the desired behavior of the CAE to detect all defects, but on the other hand it drastically increases the number of false positive classifications and, in contrast
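How a brightness shift combined with partially reconstructed defects can invert an MSE-based ranking can be illustrated with two hypothetical cases. The numbers are invented for illustration and are not measurements from the paper: a flat 20x20 patch at intensity 0.6, a defect-free input whose reconstruction is globally 0.12 brighter, and a defective input with a dark 4x4 spot (0.1) that the reconstruction largely reproduces, only blurred to 0.35.

```python
# Hypothetical numbers, not measurements from the paper: a flat 20x20
# grayscale patch (400 pixels, flattened row by row) with intensity 0.6.


def mse(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


W = 20
N = W * W
normal = [0.6] * N

# Case A: defect-free input, reconstruction is globally 0.12 brighter.
recon_a = [v + 0.12 for v in normal]

# Case B: input with a dark 4x4 spot (16 pixels at 0.1) in the top-left
# corner; the reconstruction reproduces the spot, only blurred to 0.35.
spot = {r * W + c for r in range(4) for c in range(4)}
defective = [0.1 if i in spot else 0.6 for i in range(N)]
recon_b = [0.35 if i in spot else 0.6 for i in range(N)]

err_a = mse(normal, recon_a)     # about 0.12**2 = 0.0144
err_b = mse(defective, recon_b)  # about 16 * 0.25**2 / 400 = 0.0025
print(err_a > err_b)  # True: the defect-free sample looks more anomalous
```

In this toy setting, any threshold between the two errors flags the defect-free image and passes the defective one, which matches the pattern of false positives and missed defects described above.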
