
NDT&E International 138 (2023) 102885
Contents lists available at ScienceDirect
NDT and E International

journal homepage: www.elsevier.com/locate/ndteint
Learning defects from aircraft NDT data

Navya Prakash a,∗, Dorothea Nieberl b, Monika Mayer b, Alfons Schuster b
a Marine Perception (MAP), German Research Center for Artificial Intelligence (DFKI) GmbH, Oldenburg, Lower Saxony, Germany
b Center for Lightweight Production Technology (ZLP), German Aerospace Center (DLR), Augsburg, Bavaria, Germany
ARTICLE INFO
Keywords: NDT; NDE 4.0; Aircraft production; Quality control; Machine learning; POD

ABSTRACT
Non-destructive evaluation of aircraft production is being optimised and digitalised with Industry 4.0. Aircraft structures produced from fibre metal laminate are traditionally inspected using water-coupled ultrasound scans and evaluated manually. This article proposes Machine Learning models to examine defects in ultrasonic scans of A380 aircraft components. The proposed approach embeds image feature extraction methods in classifiers to learn the defects in the scan images. The proposed algorithm is evaluated by benchmarking the embedded classifiers and is further assessed with an industry-based certification process. The HoG-Linear SVM classifier outperformed SURF-Decision Fine Tree in detecting potential defects. The certification process uses the Probability of Detection function, substantiating that the HoG-Linear SVM classifier detects minor defects. The experimental trials show that the proposed method can help examiners in the quality control and assurance of aircraft production, thus contributing significantly to non-destructive evaluation 4.0.
∗ Corresponding author.
E-mail address: navya.prakash@dfki.de (N. Prakash).
https://doi.org/10.1016/j.ndteint.2023.102885
Received 6 September 2022; Received in revised form 13 May 2023; Accepted 21 May 2023
Available online 26 May 2023
0963-8695/© 2023 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

1. Introduction

Ultrasonic Testing (UT) is a typical Non-destructive Testing (NDT) method for examining structural components in aircraft production. Manufacturing aircraft made of Fibre Metal Laminates (FML) involves cascaded steps such as the placement of aluminium, glass prepreg, adhesive, doublers and stringers, followed by vacuum bagging and curing in an autoclave [1]. Quality Control (QC) is first performed on the layup of the component (without stringers) after curing, and the quality assessment is evaluated visually [2]. This manual examination of anomalies is very time-consuming. In addition, [3] conducted NDT inspection of Glass Reinforced (GLARE®) FML of the A380 using a manual UT phased array; that approach lacked the capacity to handle large volumes of data as well as evaluation software. Non-Destructive Evaluation (NDE) 4.0 therefore helps streamline processes, increase quality and lower costs in aircraft production through automated Quality Assurance (QA) [4].

Traditionally, the quality control of FML is performed by an experienced examiner after the final production of an aircraft structure [5]. With Machine Learning (ML) techniques, however, defects can be identified instantaneously to assist the examiner [6,7]. The primary motivation was therefore to develop automated QA in aircraft production by implementing a Machine Learning algorithm. The quality analysis process in the proposed method consists of pre-analysing the acquired sensor data to classify features as defects or good quality. The proposed approach reduces the examiner’s workload, expensive repairs and manufacturing waste. The proposed work can be vital in automated offline-QA to scrutinise FML aircraft production [8] and can be adapted to other aircraft materials such as aluminium, thermoplastic fibre [9] and Carbon Fibre Reinforced Plastic (CFRP) [10]. The proposed research aims to understand and prepare ultrasonic scans of aircraft FML (raw data) provided by the aircraft industry and to pre-process these data (convert raw data to images) to establish the feasibility of the proposed method. Additionally, it aims to implement embedded Machine Learning classifiers with image feature extraction techniques to achieve the best defect detection rate, and to interpret an industry-based certification process to evaluate this approach.

The remainder of the paper is structured as follows: Section 2 describes the proposed Machine Learning model and its pipeline. Section 3 presents the proposed model’s data interpretations with experimental results, and Section 4 discusses the industry-based inferences used to evaluate the proposed approach. Finally, Section 5 summarises the proposed method and explores the scope for further improvements.

2. Learning defects

Machine Learning is a subset of Artificial Intelligence (AI) that deals with data acquired from sensors to learn the data-generating distribution. There are three primary techniques: supervised learning – data needs to be labelled (each data point tagged as belonging to a particular class) for training, mainly used for classification (predicts
discrete labels) and regression (predicts a continuous quantity). Next, unsupervised learning requires no data labels for training; dimensionality reduction and clustering are its two main methodologies. Third is reinforcement learning, in which the agent (the model being trained) sends an action (a move causing a change) to the environment (real or virtual world), and in return the environment sends the state and its reward (an evaluation of the action, either positive or negative) to the agent; real-time decision-making and gaming models are typical examples. Additionally, semi-supervised learning combines supervised and unsupervised learning methodologies.

Supervised learning examples [11] include Support Vector Machine (SVM) [12], Decision Trees [13,14], Random Forest (RF) [15], 𝑘-Nearest Neighbour (𝑘-NN) [16], Naïve Bayes [17], Linear Discriminant Analysis (LDA) [18] and Logistic Regression [19]. Fuzzy C-means (FCM) [20], 𝑘-means [21] and Principal Component Analysis (PCA) [22] are a few state-of-the-art unsupervised learning techniques. SVM predicts classes using an optimal hyperplane that maximises the margin between the classes. Decision Trees predict a class by learning decision rules from the data features of that class. Random Forest combines the outcomes of multiple Decision Trees into a single prediction. 𝑘-Nearest Neighbour predicts a class from the 𝑘 nearest data points. Naïve Bayes classifies data points by their probability, applying Bayes’ theorem. Using Fisher’s algorithm, LDA finds a linear combination of data features that characterises different classes. Logistic regression estimates the probability of an event occurring, such as voted or not voted, based on the data variables. Fuzzy C-means is similar to 𝑘-means but is a soft clustering method, in which a data point can belong to one or more clusters. 𝑘-means is a hard clustering method that partitions data points into 𝑘 clusters, each point belonging to the cluster with the nearest mean. PCA reduces data dimensionality and increases interpretability with minimal information loss.

Deep Learning is a subset of Machine Learning applied to images, videos, text and other data formats. It comprises multi-layer Artificial Neural Networks (ANN) [23]. A Deep Neural Network (DNN) has many hidden layers to perform classification and regression. State-of-the-art neural networks include Radial Basis Function (RBF) networks [24], Autoencoders (AEC) [25], Multi-Layer Perceptron (MLP) [26], VGG-F [27], Fast R-CNN [28], ResNet v2 [29] and Transformer [30].

A Machine Learning algorithm often includes a feature extraction process, chosen according to the input data type, to improve its performance [31]. A few state-of-the-art image feature extraction methods are Local Binary Patterns (LBP) [32], Maximally Stable Extremal Regions (MSER) [33], KAZE [34], Speeded Up Robust Features (SURF) [35] and Histogram of Oriented Gradients (HoG) [36]. LBP labels the pixels of an image by thresholding each pixel’s neighbourhood, resulting in a binary number that encodes local texture information. For blob detection, the MSER method uses co-variant regions in corresponding grey-level cells in images. KAZE works on a non-linear scale space and uses determinants of the Hessian matrix with a local difference binary descriptor to detect multi-scale corner features from the scale space. SURF detects interest points and their local neighbourhoods for matching, finds features in the Gaussian scale space, can distinguish between background and foreground features in an image, finds blob features and is partially inspired by the Scale-Invariant Feature Transform (SIFT) [37]. HoG is a feature descriptor that describes image features by counting the occurrences of gradient orientations in localised parts of an image; it encodes local shape information.

Previous research applying Machine Learning to NDT data defect analysis is summarised as follows. [38] analysed UT C-mode scanning acoustic microscopy (C-SAM) data of integrated circuits, using the Mumford–Shah model for grayscale image processing and SVM for defect classification, with an 80% recognition rate; this technique needs more training data to improve classification accuracy. The research of [39] implemented crack shape estimation with height, length and depth parameters using Eddy Current testing and SVM for regression (SVR) with an RBF kernel in conductive materials. It achieved a maximum error of 0.3 mm in defect length, but height and depth detection needed more training. Subsequently, [40] used an ANN, MLP (with back-propagation) and SVR (with RBF kernel) for crack defect classification. SVR outperformed MLP with a maximum error of 0.8 mm on a 5 mm crack length, but the height and depth parameters needed more SVR model tuning. Dynamic PCA, 𝑘-NN, MLP, RBF and SVM were implemented by [41] to estimate defect depth in infrared NDT of CFRP composite material. MLP outperformed RBF and SVM for complex composites, whereas dynamic PCA and 𝑘-NN could estimate defect depth and the detection limit for classifiers on plane composites. [42] detected defects in NDT data of oil and gas pipelines using LDA, MLP, SVR, RBF, PCA and 𝑘-NN; SVR outperformed all other methods with 98.28% accuracy. SVM and ANN were trained by [43] with NDT rail data for real-time defect processing, and SVM outperformed ANN with 97% accuracy. For fabric defect image analysis, [44] implemented AdaBoost [45] and HoG for feature extraction with SVM for classification. This method identified most defects with few false rejections. SVM and ANN were used by [46] to classify NDT data of construction structures, with Fast Fourier Transforms (FFT) and RBF for feature extraction; SVM outperformed ANN with 93% accuracy.

Aerospace structure defects were classified by their shapes (Shape Geometric Descriptor (SGD)) in the research of [48], using J48 Decision Tree [47], MLP and Naïve Bayes classifiers with Content-Based Image Retrieval (CBIR) and SGD for feature extraction. MLP outperformed J48 Decision Trees (96%) and Naïve Bayes (95%) with 98% accuracy. Another study [49] trained J48 Decision Trees and Random Forest to determine weld quality in NDT data of Shielded Metal Arc Welding (SMAW) of carbon steel plates; Random Forest outperformed J48 Decision Trees (70.78%) with 88.69% accuracy. NDT aircraft defects were diagnosed automatically by [50] using SVM, with SURF, AlexNet [51] and the VGG-F Deep Neural Network as feature extraction methods. SVM achieved the highest accuracy of 96% with SURF for Region of Interest (RoI) selection. Mobile phone panel surface defects were inspected by [52] with LBP and HoG feature extractors that trained Naïve Bayes and SVM. The HoG-SVM classifier outperformed the other feature extractor and Naïve Bayes combinations with >90% average accuracy. Random Forest with RoI classified defects on alloys and achieved >90% accuracy [53].

An Aeronautics Engine Radiographic Testing Inspection System Net (AE-RTISNet) with Fast R-CNN was developed to inspect defects in aeronautical engines [54]. It uses RoI as a feature extractor and obtained a mean average precision (mAP) of 90%, compared with YOLO [55]. Aluminium Conductor Composite Core (ACCC) NDT X-ray images were analysed for defects using Inception ResNet v2 [56]. This Deep Neural Network achieved 97.01% accuracy after data augmentation, compared with Res2Net-18 (96.28%) and ResNet-v2-50 (96.15%). Random Forest, RBF-SVM and a hidden Markov model (HMM) [57] were implemented by [58], with autoencoders, FFT, low-pass filtering and PCA for feature extraction, to measure defects in aerospace CFRP and aluminium plates. AEC-PCA outperformed all other classifiers with >0.9 clustering scores. Convolutional Neural Networks (CNN) were used to determine aerospace NDT defects with spot classifiers in the research of [59], and the Indirect spot CNN classifier outperformed the Direct spot CNN classifier with 98% accuracy. Another CNN approach [60] was developed to detect defects in NDT data of stainless steel and of joints welded by Gas Tungsten Arc Welding (GTAW) or Shielded Metal Arc Welding (SMAW). This CNN resembles VGG-16 and achieved a Probability of Detection (POD) of $a_{90/95} = 2.1$ mm, where $a$ is the defect size and 90/95 denotes 90% POD at a 95% confidence level. An ANN was developed by [61] to monitor defects in NDT data of mechanical, aerospace and civil structures made of aluminium and magnesium alloys and achieved >95% precision.
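The LBP operator described in this section can be made concrete with a short sketch. The following is a minimal NumPy implementation of the basic 3 × 3 LBP code and its normalised histogram, written for illustration only: the function names and the clockwise bit order are assumptions, and it omits the uniform patterns and cell-wise histograms that the LBP features used in this work rely on, so it is not the authors’ implementation.

```python
import numpy as np

# Neighbour offsets (row, col) for a 3x3 LBP, ordered clockwise from the
# top-left pixel; the position in this list is the bit index in the code.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
           (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_codes(gray: np.ndarray) -> np.ndarray:
    """Return the 8-bit LBP code of every interior pixel of a grayscale image.

    Each neighbour whose intensity is >= the centre pixel sets one bit, so the
    thresholded neighbourhood becomes a binary number encoding local texture.
    """
    g = np.asarray(gray, dtype=np.int32)
    if g.ndim != 2 or min(g.shape) < 3:
        raise ValueError("expected a 2-D grayscale image of at least 3x3 pixels")
    h, w = g.shape
    centre = g[1:h - 1, 1:w - 1]
    codes = np.zeros_like(centre)
    for bit, (dy, dx) in enumerate(OFFSETS):
        # Shifted view of the image holding this neighbour for every centre pixel.
        neighbour = g[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neighbour >= centre).astype(np.int32) << bit
    return codes


def lbp_histogram(gray: np.ndarray) -> np.ndarray:
    """Normalised 256-bin histogram of LBP codes (a simple texture descriptor)."""
    codes = lbp_codes(gray)
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()
```

For instance, a 3 × 3 patch in which only the right-hand neighbour is brighter than the centre yields the code 8 (bit 3 set), while a perfectly flat patch yields 255 because every neighbour equals the centre.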
Table 1
A brief literature survey (ordered by publication year).

| Source | NDT data | Feature extraction | Machine learning | Performance analysis |
|---|---|---|---|---|
| Zhang et al. (2005) [38] | Integrated Circuits | Mumford–Shah model | SVM | Recognition rate: 80% |
| Bernieri et al. (2006) [39] | Conductive materials | RoI | SVM regression (SVR) with RBF | Maximum error rate (length): 0.3 mm |
| Bernieri et al. (2008) [40] | Conductive materials | RoI | ANN-MLP (reference) and SVR with RBF | SVR: maximum error rate (length) of 0.8 mm; SVR outperformed MLP |
| Benítez et al. (2009) [41] | CFRP structure | RoI | Dynamic PCA, 𝑘-NN, MLP, RBF and SVM | MLP outperformed RBF and SVM |
| Khodayari-Rostamabad et al. (2009) [42] | Oil, gas pipelines | PCA | 𝑘-NN, SVR, RBF, LDA, MLP | Accuracy: SVR - 98.28% |
| Wei & Cheng-Tong (2009) [43] | Rail flaws | RoI | SVM, ANN | Accuracy: SVM - 97% |
| Shumin et al. (2011) [44] | Fabric | HoG, AdaBoost | SVM | Detection rate: SVM - high, less false rejections |
| Saechai et al. (2012) [46] | Construction cement structure | FFT, RBF | SVM, ANN | Accuracy: SVM - 93% |
| D’Angelo & Rampone (2015) [48] | Aerospace structure | SGD, CBIR | J48 Decision Trees, Multilayer Perceptron (MLP) and Naïve Bayes | Accuracy: MLP - 98% |
| Sumesh et al. (2015) [49] | SMAW Carbon Steel plates | Statistical approach | J48 Decision Trees, Random Forest | Accuracy: Random Forest - 88.69% |
| Internal Study: Schmidt, T et al. (2015) [10] | CFRP C-scans | Measured values of all sections, mean or variance, gradient histograms | SVM, Random Forest | AUC: Gradient histogram-SVM - 0.987 |
| Malekzadeh et al. (2017) [50] | Aircraft surface | LBP, RGB and HSV histograms, AlexNet, VGG-F DNN, SURF | SVM | Accuracy: SVM-SURF - 96% |
| Huang et al. (2017) [52] | Mobilephone Panel | LBP, HoG | Naïve Bayes, SVM | Average accuracy: HoG-SVM - >90% |
| Internal Study: University of Augsburg [64] | GLARE®-NDT C-scan images | Laplace filter, material thickness, edge information | CNN-ASPP, SGD, softmax | High exclusion rate of manual inspection for component area - 97.36% |
| Shipway et al. (2019) [53] | Titanium alloy plates | RoI | Decision Trees, Random Forest | Accuracy: >90% |
| Chen & Juang (2020) [54] | Aeronautical engine | RoI | Fast R-CNN, YOLO | mAP: Fast R-CNN - 90% |
| Hu et al. (2021) [56] | Aluminium conductor composite core | Image normalisation | Inception ResNet v2, ResNet-18, ResNet-v2-50 | Accuracy: Inception ResNet v2 - 97.01% |
| Kraljevski et al. (2021) [58] | Sensor network signals of aluminium and CFRP plates | FFT, low-pass filtering, PCA | AEC, HMM, RBF-SVM, Random Forest | Clustering score: AEC-PCA - >0.9 |
| Niccolai et al. (2021) [59] | Aerospace structures | RoI | Direct and Indirect spot CNN | Accuracy: Indirect spot CNN - 98% |
| Siljama et al. (2021) [60] | Stainless steel | Normalisation | CNN | POD: a90/95 = 2.1 mm |
| Fakih et al. (2022) [61] | Aerospace/mechanical/civil structures | Geometric constraints, Approximate Bayesian computation | ANN | Precision: ANN - >95% |
| Le et al. (2022) [62] | Aircraft structure | PCA | SVM, Naïve Bayes, 𝑘-NN, Random Forest, Logistic Regression | Average accuracy: SVM - 89.48% |
| Risheh et al. (2022) [63] | Steel structures | RoI, threshold selection, image segmentation, Canny edge detection | 𝑘-means clustering | Defects detected accurately |
edge detection
Aircraft structure corrosion was analysed in [62] using NDT data, with PCA for feature extraction and SVM, Naïve Bayes, Random Forest, 𝑘-NN and Logistic Regression models. SVM outperformed all other models with 89.48% average accuracy. 𝑘-means clustering was applied by [63] to NDT data of steel structures to determine defects, using RoI, thresholds, image segmentation and Canny edge detection techniques. This method does not need training and can detect defects accurately in smaller datasets.

Further, [10] performed an automated evaluation of NDT data of CFRP components with discontinuities such as delaminations, layer porosity, volume porosity and foreign bodies. These CFRP C-scans were converted to .png images using the ULTIS® NDT Kit software and used to train Machine Learning classifiers. The positive class had 37 annotated discontinuities (18 delaminations and 19 porosities) and comprised 222 training samples in total. Gradient histograms for feature extraction were combined with SVM and Random Forest to classify the discontinuities. The gradient histogram-SVM had the highest AUC of 0.987 at a 10% FP rate, but the gradient histogram-Random Forest classifier had a lower FP rate for the positive class. In contrast, the gradient histogram-Random Forest classifier achieved lower confidence than the gradient histogram-SVM. More training data with positive class samples are required to increase the classification rate.

Subsequently, [64] detected anomalies using a Deep Learning technique with the same GLARE® NDT dataset used in the proposed model.
Fig. 1. A380 FML panels [1,2,65,66].

Fig. 2. Sample NDT data with defects (magnified).
Fig. 3. Defect categories [1,2,65,66].

The NDT scans were converted to grayscale images with Python programming. These images were pre-processed using a Laplace filter to extract local material thickness and edge information as features, which helps differentiate faulty and splice regions. These features trained a Deep Learning architecture with six initial CNN layers, one Atrous Spatial Pyramid Pooling (ASPP) layer that supports faulty-pixel classification, a further CNN layer and a final Upsampling layer. Stochastic Gradient Descent (SGD) was used as the learning method and Softmax cross entropy as the error function. This classifier achieved a high average exclusion rate (of manual inspection) of 97.36% for the component area on the test data; the number of training steps was inversely proportional to the True Positive rate. The disadvantages of this classifier are that the exclusion rate varies with the component type and that it has a higher False Positive rate. The classifier determines non-faulty regions instead of differentiating faults, and it displays additional faults even in non-faulty regions. The method needs more training data for faulty regions to improve its performance before it can be used in real-time offline-QA of aircraft production.

The proposed research aims to develop an automated evaluation of aircraft NDT data, i.e., an offline-QA to help human examiners. Learning defects from aircraft production involves data acquisition, pre-processing, Machine Learning training, predictions and determining the model’s confidence. Choosing an appropriate Machine Learning algorithm can seem complicated because many supervised and unsupervised algorithms use different learning strategies. The choice depends on the quantity of data, the data type, the insights sought and how the model’s evaluation results will be used. Highly flexible models tend to overfit data by modelling minor variations that could be noise. Simple models are easier to interpret but might have lower accuracy. Therefore, choosing a suitable algorithm requires trading off model speed, accuracy and complexity against one another. In contrast to the literature survey (Table 1), the proposed work combines state-of-the-art Machine Learning classifiers with distinct image feature extraction methods to detect two classes (binary classification), defects and good components, in the aircraft ultrasonic-scan imageset.

2.1. NDT dataset

GLARE® [67,68] is a new FML class used to produce A380 aircraft structures. The A380 comprises components 15.1, 18.1, 18.14, 18.16 and 18.17, as shown in Fig. 1.

The NDT inspection technique for the FML of the A380 is explained in the Airbus Test Method for inspection processes (AITM) AITM6-4001 (confidential). The aircraft production company, Premium AEROTECH GmbH (PAG), followed the signal analysis requirements of AITM6-4001 and generated inspection reports. These inspection reports classified defects according to AITM6-4001 and provided ground truth values (C-scans) for automated evaluation. The data collected from NDT inspection reports are plotted on a plane view of the component as images, known as C-scans (process described in AITM6-4001). Fig. 2 shows a sample C-scan with denoted defects.

In the proposed approach, the NDT ultrasonic inspection report of the FML A380 contains C-scans of each aircraft component. These scans (.xml files – raw dataset) were analysed using the quality software ULTIS®-TESTIA (NDT Kit). The experts at DLR-ZLP denoted the defects in the raw dataset with the help of PAG inspection reports and visualised them using this software, forming the ground truth data for this research. The NDT Kit creates three files, .nkc, .nkd and .nkz, for each C-scan. The .nkc file holds the original C-scan data in two blocks: the first block is the file header, written in ASCII format (indications and values), whose length in bytes is defined by the data offset field; the second block contains the physical data written in binary format. The .nkd file contains defect information such as file name, defect surface (mm²), outline surface (mm²), outline length (mm) and comments. Any other information is stored in the .nkz file.

In the proposed approach, the defect classes of the C-scans are categorised as porosity (Fig. 2), fold, twist, overlap, gap and foreign body, as illustrated in Fig. 3. There were 343 data samples, of which 99 contained at least one defect, as illustrated in Fig. 4. Fig. 4 shows that the minimum number of defects in an image of a component is one and the maximum is 15. Most defects belong to the porosity category (the distribution over the different defect types is confidential). The proposed method pre-processes these 343 data samples for further processing. The number of data samples used in this study is limited by the industrial aircraft production rate.

Fig. 4. Distribution of defects from NDT data.
Fig. 5. Input image in RGB format and its corresponding grayscale image. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

2.2. Machine learning model

The proposed model comprises training and evaluation (Section 4) processes. Preparation for the training process includes three primary
steps: pre-processing, processing and post-processing of data. Pre-processing involves converting the C-scans to a Machine Learning compatible format. ULTIS® enables storing the complete C-scan information as an image. All 343 data samples were manually converted to 8-bit .jpg images, forming an imageset of defective and non-defective parts. Next, pre-processing involves labelling the .jpg images to prepare for model training. For labelling, all images were relabelled using MATLAB’s Image Labeler application. The app was loaded consecutively with 99 defect images containing 208 defects and 244 non-defective images. It stores the Region of Interest (ROI) labels (rectangle – position, pixel area) and Scene Labels (defect and good). The ROIs for the defect scene label are the rectangles around the individual defects, as shown in Fig. 2, and the ROI for a good scene label is the entire image. These labelled images are stored for model training with ‘defect’ and ‘good’ classes. As there are two categories, the proposed model is a binary classifier; ‘defect’ is the positive class, as the model aims to determine defects in the images, and ‘good’ is the negative class.

$$ d_s = \sqrt{\mathit{length} \times \mathit{breadth}} \qquad (1) $$

where $d_s$ is the defect size in pixels (px), and length and breadth are the dimensions of the rectangular defect label.

Further, pre-processing includes calculating the defect size ($d_s$) in the image labels. $d_s$ is defined as the square root of the defect area, as in Eq. (1). The defect area is obtained from the rectangular image label dimensions (length, breadth). A square root over the defect area is used for two reasons: to standardise all the defect data, and because most defects are not frame-filling, i.e., the defect pixel area is not equal to the rectangular label area (for example, twist, fold and pores; Fig. 3). The minimum defect size encountered in the image labels is 6 px and the maximum is 383 px. According to the defect size, all 208 defects were cropped to their equivalent defect label size and stored as the labelled defect (positive) imageset. The 244 labelled non-defective images formed the good (negative) imageset.

The processing step comprises feature extraction and training. It involves training the proposed Machine Learning model with a feature set from the training imageset (positive and negative imagesets) and the class labels defect and good. The feature set is obtained from different image feature extraction techniques: LBP, MSER, KAZE, SURF and HoG. Each feature extractor has a bag-of-features to store its features. Each bag-of-features (feature set) is input to each state-of-the-art Machine Learning model for binary classification: SVM, Decision Trees, Random Forest, 𝑘-NN and Naïve Bayes. MATLAB’s Classification Learner application was loaded with the training set (a feature set and class labels). During training, the Cross Validation (CV) [69,70] technique is applied to the training set to guard against overfitting (model overtraining) and underfitting (insufficient model training), to observe the model’s reaction to a similar independent dataset and to estimate the prediction error. CV consists of exhaustive methods, which evaluate all possible splits of the data into training and validation sets, and non-exhaustive methods, which evaluate only a subset of such splits. The 𝑘-fold and hold-out techniques are examples of non-exhaustive methods used to validate Machine Learning models. The hold-out approach randomly splits the data once, allocating more samples to training than to validation. The 𝑘-fold method randomly partitions the training set into 𝑘 equal subsets; one subset forms the validation set and the remaining (𝑘 − 1) subsets are used for training. The cross-validation process is repeated 𝑘 times so that each of the 𝑘 subsets is used exactly once for validation. The average accuracy over all 𝑘 folds estimates the model’s ability to predict new data. 10-fold CV (𝑘 = 10) is used to validate the proposed model. 𝑘-fold CV suits the proposed model because the training set is small, and it helps guard against overfitting.

Lastly, in post-processing, the benchmark predictions of the different state-of-the-art Machine Learning binary classifiers, each embedded with a distinct image feature extraction technique, are stored for further model evaluation. The binary classifiers with high confidence scores are recommended (Section 3) for NDE 4.0 (Section 4).

2.3. Model pipeline

The proposed Machine Learning model pipeline comprises two training steps: feature extraction and classification (including validation). The in-built functions of MATLAB were used with the Classification Learner App for the proposed model. The algorithm for the proposed model is as follows:

(1) Feature Extraction: input the positive imageset (208 cropped defect images) and the negative imageset (244 good images) of RGB (truecolour) images (as shown in Fig. 5) as an image datastore to form a training imageset. A datastore can manage large image collections and increases the processing rate.
      trainingImds = imageDatastore(folderPath)
  (a) The labelled images of both classes have their features extracted using custom extractors as follows:
    (i) Convert all input RGB images to grayscale (Fig. 5) for LBP, MSER, KAZE and SURF feature extraction (HoG can extract features from both RGB and grayscale images):
          grayscaleImage = rgb2gray(RGBImage)
    (ii) Extract the LBP (Fig. 6) and HoG (Fig. 7) features of each input image:
          features = extractLBPFeatures(grayscaleImage)
          features = extractHOGFeatures(RGBImage)
    (iii) For MSER (Fig. 8), KAZE (Fig. 9) and SURF (Fig. 10) features of each input grayscale image, detect regions and extract features from each of these regions:
          regions = detectMSERFeatures(grayscaleImage)
          regions = detectKAZEFeatures(grayscaleImage)
          regions = detectSURFFeatures(grayscaleImage)
          features = extractFeatures(grayscaleImage, regions)
    (iv) Construct a bag of visual words for each custom feature extractor:
          bag = bagOfFeatures(trainingImds, 'CustomExtractor', extractorFcn)
    (v) Load the scene data as an encoded bag-of-features from each custom extractor and the training imageset.
    (vi) Load all labels of the training imageset as scene labels, an attribute of the scene data; the label names ‘defect’ and ‘good’ are stored as scene type.
(2) Training: open the Classification Learner App and load the scene data and scene type.
  (a) Select all scene data as predictors.
  (b) Simultaneously apply 10-fold Cross-Validation.
  (c) Start the session and store the validation results (Section 3).
  (d) In the Classification Learner App, use parallel computing to train all available Machine Learning classifiers at once.
  (e) Store all trained classifiers for further analysis (Sections 3, 4).

Due to data confidentiality, only part of the data from the A380 component is visualised in Fig. 5 (a cropped, smaller section of a good part); the input RGB image is converted to grayscale for all feature extraction processes except HoG.

Fig. 6. LBP features.

Fig. 6 represents the LBP feature graph of encoded local texture information in binary format extracted from Fig. 5. The LBP feature extractor partitions the grayscale image into non-overlapping cells. The histogram bins represent the number of features from each cell in the grayscale image, and the number of bins depends on the number of neighbours of each cell. The uniform feature set values of each cell (local texture information) are plotted with LBP histogram bins, and each histogram describes an LBP feature.

Fig. 7. HoG features.

Fig. 7 illustrates HoG features (zoomed-in, marked in white) extracted from an RGB input image (Fig. 5) converted to a binary image. This binary image is decomposed into small, square cells, and a histogram of oriented gradients is computed in each cell. The result is then normalised using a block-wise pattern, and a descriptor is returned for each cell.

Fig. 8. MSER features.

Fig. 8 shows MSER feature extraction (zoomed-in) for Fig. 5. From the grayscale image, co-variant regions (MSER regions, coloured) are extracted by checking the variation of the region area between different intensity thresholds. The ellipses (marked in black) and centroids (marked with black plus signs) of the MSER regions are stable connected components of the grayscale image.

Fig. 9 displays KAZE features (zoomed-in) from Fig. 5. KAZE points (marked with blue ellipses and black plus signs) are obtained from the grayscale image by using non-linear diffusion to construct a scale space and then detecting multi-scale corner features in that scale space.

Fig. 10 shows SURF points (zoomed-in, marked in black) extracted from Fig. 5. These SURF points are obtained using a Hessian blob detector, and their feature vectors are computed from Haar wavelet responses of the grayscale image.

During the training process, HoG extracted 34,596 features from each image, giving 422 × 34,596 feature vectors, from which the strongest features of each class were selected. These strongest HoG feature vectors created a bag-of-features with 500 clusters. SURF extracted 12,093 features (total 422 × 12,093), and the strongest features from each class formed 50 bag-of-features clusters. MSER extracted 10,644 features with 500 bag-of-features clusters. KAZE extracted 9124 features with 500 clusters, and LBP extracted 420 features with 302 bag-of-features clusters. Overall, HoG produces the most feature vectors in this setup, and more features are required to train Machine Learning classifiers to obtain better prediction results.

The classifiers trained in the proposed method from the Classification Learner App include 𝑘-NN – fine, medium, coarse, cosine, cubic,
weighted; Decision Trees – fine, medium, coarse; Random Forest – ensemble boosted trees, ensemble bagged trees, ensemble subspace discriminant, ensemble subspace 𝑘-NN, ensemble RUS boosted trees; SVM – linear, quadratic, cubic, fine Gaussian, medium Gaussian, coarse Gaussian; and Naïve Bayes. The performance of all these classifiers with the image feature extraction methods is discussed in Section 3.
Fig. 9. KAZE features.
Fig. 10. SURF features.
3. Experimental result and discussion
The proposed Machine Learning model is evaluated using metrics such as accuracy, precision, recall, F1-score, the Receiver Operating Characteristic curve with Area Under the Curve (ROC-AUC) [71], 𝑘-fold Cross-Validation and POD certification. The classifier's confidence is assessed from the numbers of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN).
Table 2
Possibilities of predictions.
Type             Predicted   Actual value
True positive    Defect      Defect
True negative    Good        Good
False positive   Defect      Good
False negative   Good        Defect
From Table 2, a prediction is a TP or TN when the predicted and actual values are the same: TP is a defect classified into the defect class, and TN is a good part classified into the good class. An FP or FN occurs when the predicted and actual values differ: FP is a prediction of a defect where the actual part is a good aircraft part, and FN is the reverse. A matrix representation of all these values forms a confusion matrix.
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{2}$$
The accuracy of the Machine Learning model is the ratio of correct predictions to the total number of predictions (Eq. (2)).
$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{3}$$
Precision is the ratio of correctly predicted defects to all positive (defect) predictions made by the trained Machine Learning model (Eq. (3)).
$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{4}$$
Recall, or sensitivity, is the ratio of correctly predicted defects to all actual positive instances in the test data (Eq. (4)).
$$\mathrm{F1\text{-}score} = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{5}$$
$$\mathrm{Specificity} = \frac{TN}{TN + FP} \tag{6}$$
$$\mathrm{FPR} = 1 - \mathrm{Specificity} = \frac{FP}{TN + FP} \tag{7}$$
The F1-score is the harmonic mean of precision and recall (Eq. (5)). The Receiver Operating Characteristic (ROC) is a probability curve [72], and the Area Under the ROC Curve (AUC) represents the degree of separability. ROC-AUC evaluates the trained classifier's performance in distinguishing the ‘defect’ and ‘good’ classes using the True Positive Rate (TPR; recall or sensitivity) and the False Positive Rate (FPR). The FPR is calculated from the specificity (Eq. (6)) of the trained model using Eq. (7). The ROC curve is plotted with FPR on the x-axis against TPR on the y-axis. The higher the AUC value, the better the trained model separates defective from good aircraft structures.
3.1. Cumulative models
Figs. 11–14 illustrate the analysis used to select the most accurate combinations of Machine Learning classifiers and image feature extraction methods.
Fig. 11. 𝑘-NN accuracy.
From Fig. 11, LBP-Fine 𝑘-NN has the highest accuracy of 59.3% and LBP-Coarse 𝑘-NN the lowest with 55%. MSER-Cosine 𝑘-NN has the highest accuracy of 92.4% and MSER-Coarse 𝑘-NN the lowest with 45%. KAZE-Cosine 𝑘-NN has a high accuracy of 80.5% and KAZE-Coarse 𝑘-NN a low of 55.2%. SURF-Fine 𝑘-NN has the highest accuracy of 95.2% and SURF-Cubic 𝑘-NN the lowest with 87.4%. HoG-Cosine 𝑘-NN has 90% accuracy and HoG-Coarse 𝑘-NN a low of 55.5%.
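The evaluation metrics in Eqs. (2)–(7) reduce to simple ratios of the four confusion-matrix counts listed in Table 2. The minimal Python sketch below evaluates them for one binary classifier, with ‘defect’ as the positive class. The names `ConfusionCounts` and `evaluation_metrics` are hypothetical, the zero-denominator guard is an added assumption rather than part of the equations, and the counts in the usage line are invented for illustration and are not results from this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Confusion-matrix counts; 'defect' is the positive class, 'good' the negative."""
    tp: int  # defect predicted as defect
    tn: int  # good predicted as good
    fp: int  # good predicted as defect
    fn: int  # defect predicted as good


def _ratio(num: float, den: float) -> float:
    """num / den, returning 0.0 for an empty denominator (illustrative guard)."""
    return num / den if den else 0.0


def evaluation_metrics(c: ConfusionCounts) -> dict:
    """Evaluate Eqs. (2)-(7) for one binary classifier."""
    precision = _ratio(c.tp, c.tp + c.fp)  # Eq. (3)
    recall = _ratio(c.tp, c.tp + c.fn)     # Eq. (4), also the TPR
    return {
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.tn + c.fp + c.fn),  # Eq. (2)
        "precision": precision,
        "recall": recall,
        "f1": _ratio(2 * precision * recall, precision + recall),    # Eq. (5)
        "specificity": _ratio(c.tn, c.tn + c.fp),                    # Eq. (6)
        "fpr": _ratio(c.fp, c.tn + c.fp),                            # Eq. (7), equals 1 - specificity
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not results from this study).
    counts = ConfusionCounts(tp=90, tn=95, fp=5, fn=10)
    for name, value in evaluation_metrics(counts).items():
        print(f"{name:12s} {value:.4f}")
```

With these made-up counts, the F1-score equals 2·TP / (2·TP + FP + FN) = 180/195. This is an algebraically equivalent form of Eq. (5) and a useful cross-check when the counts are known.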
Fig. 12. Decision Tree accuracy.
Fig. 13. Random Forest accuracy.
Fig. 14. SVM accuracy.
Table 3
Consolidated accuracy chart.
Feature extraction   Classifiers                       Accuracy (%)
LBP                  Linear SVM                        66.4
                     Ensemble Subspace Discriminant    66.4
                     Fine 𝑘-NN                         59.3
                     Naïve Bayes                       55
MSER                 Quadratic SVM                     92.6
                     Ensemble Bagged Trees             91.9
                     Cosine 𝑘-NN                       92.4
                     Naïve Bayes                       57.6
KAZE                 Linear SVM                        92.1
                     Ensemble Subspace Discriminant    94.2
                     Cosine 𝑘-NN                       80.5
                     Naïve Bayes                       94.3
SURF                 Linear SVM                        96.9
                     Decision Fine Tree                97.9
                     Fine 𝑘-NN                         95.2
                     Naïve Bayes                       59.5
HoG                  Linear SVM                        99
                     Ensemble RUS Boosted Trees        93.5
                     Cosine 𝑘-NN                       90
                     Naïve Bayes                       56
Fig. 12 shows the performance analysis of the Decision Trees. The combination LBP-Decision Fine Tree has 64.3% accuracy and LBP-Decision Coarse Tree a lower 56.4%. MSER-Decision Fine Tree has a high accuracy of 90.7% and MSER-Decision Coarse Tree a low accuracy of 80.7%. KAZE-Decision Fine Tree and KAZE-Decision Medium Tree have a similarly high accuracy of 91%, and KAZE-Decision Coarse Tree has a low accuracy of 88.6%. SURF-Decision Fine Tree has the highest accuracy of 97.9%, and SURF-Decision Medium Tree and SURF-Decision Coarse Tree have an accuracy of 97%. HoG-Decision Fine Tree gained a high accuracy of 92.14% and HoG-Decision Coarse Tree a low accuracy of 82.1%.
Table 4
Evaluation chart.
Classifiers      Accuracy (%)   Recall   Precision   F1-score
HoG-SVM          99             0.9919   0.9880      0.984
SURF-Fine Tree   97.9           0.9839   0.97        0.97
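Section 3 describes ROC-AUC as the degree of separability between the ‘defect’ and ‘good’ classes. The hedged sketch below illustrates this idea; it is not a reproduction of how the Classification Learner App computes AUC internally. It builds ROC points by lowering a decision threshold over classifier scores and integrates them with the trapezoidal rule. It then cross-checks the result against the equivalent rank interpretation of AUC: the probability that a randomly chosen defect sample receives a higher score than a randomly chosen good one, with ties counted as one half. All function names are hypothetical, and the scores in the usage line are made up.

```python
from typing import List, Sequence, Tuple


def _split(scores: Sequence[float], labels: Sequence[int]):
    """Separate scores of positive ('defect' = 1) and negative ('good' = 0) samples."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one 'defect' and one 'good' sample")
    return pos, neg


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """(FPR, TPR) pairs obtained by lowering the decision threshold step by step."""
    pos, neg = _split(scores, labels)
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tpr = sum(s >= t for s in pos) / len(pos)  # recall at this threshold, Eq. (4)
        fpr = sum(s >= t for s in neg) / len(neg)  # Eq. (7)
        points.append((fpr, tpr))
    return points


def trapezoid_auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under a piecewise-linear ROC curve (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def rank_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as P(score of a random defect > score of a random good part); ties count 1/2."""
    pos, neg = _split(scores, labels)
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Made-up classifier scores for illustration only.
    scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
    labels = [1, 1, 0, 1, 0, 1, 0, 0]
    print(trapezoid_auc(roc_points(scores, labels)), rank_auc(scores, labels))
```

Under this interpretation, an AUC of 1.00 means that every defect sample is scored above every good sample. Lowering the threshold raises the TPR only at the cost of a higher FPR, which is the trade-off later used to calibrate the FN rate.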
Fig. 13 demonstrates the detection rate of the Random Forest (Ensemble Trees) classifiers. LBP-Ensemble Subspace Discriminant gained 66.4% accuracy and LBP-Ensemble Subspace 𝑘-NN a low accuracy of 49.5%. MSER-Ensemble Boosted Trees has a high accuracy of 91.9% and MSER-Ensemble Bagged Trees a low accuracy of 56.9%. KAZE-Ensemble Subspace Discriminant and KAZE-Ensemble Subspace 𝑘-NN achieved the highest accuracy of 94.2%, but KAZE-Ensemble Boosted Trees has a low accuracy of 55%. SURF-Ensemble Bagged Trees, SURF-Ensemble Subspace Discriminant and SURF-Ensemble Subspace 𝑘-NN have the same high accuracy of around 95.6%, with a low accuracy of 55% for SURF-Ensemble Bagged Trees. HoG-Ensemble RUS Boosted Trees gained 93.5% accuracy and HoG-Ensemble Bagged Trees a low of 59%.
Fig. 14 illustrates the detection rate of the SVM classifiers. LBP-Linear SVM, LBP-Quadratic SVM, LBP-Cubic SVM, LBP-Fine Gaussian SVM and LBP-Medium Gaussian SVM have a similarly high accuracy of 66.4%; LBP-Coarse Gaussian SVM has a low accuracy of 55%. MSER-Quadratic SVM and MSER-Cubic SVM have a similarly high accuracy of 92.6%, with a low of 69.8% for MSER-Fine Gaussian SVM. KAZE-Linear SVM gained a high accuracy of 92.1% and KAZE-Coarse Gaussian SVM a low of 63.6%. SURF-Linear SVM, SURF-Quadratic SVM, SURF-Cubic SVM, SURF-Medium Gaussian SVM and SURF-Coarse Gaussian SVM have a matching high accuracy of around 96%, while SURF-Fine Gaussian SVM achieved a lower accuracy of 90.7%. The highest accuracy is gained by HoG-Linear SVM with 99% and the lowest by HoG-Fine Gaussian SVM with 72.6%.
LBP had the lowest feature extraction performance with all classifiers compared to MSER, KAZE, SURF and HoG. MSER performed second worst, followed by KAZE. The choice of feature extraction method therefore influences the classifiers' performance. SURF and HoG were selected as the best feature extraction methods to combine with classifiers in order to avoid false negatives. Naïve Bayes (Table 3) and 𝑘-NN were not suitable for most feature extraction methods and were therefore eliminated from the further evaluation process.
3.2. Best performing models
The highest accuracies from all classifiers are consolidated in Table 3. The embedded classifiers HoG-Linear SVM and SURF-Decision Fine Tree achieved the highest accuracies of 99% and 97.9%, respectively. Therefore, these two classifiers are further evaluated with the recall, precision and F1-score metrics shown in Table 4. HoG-Linear SVM gains the highest F1-score of 0.984, compared to the SURF-Decision Fine
Tree F1-score of 0.97. On the test data, the trained HoG-Linear SVM model correctly predicts 98.4% of the defect samples, whereas SURF-Decision Fine Tree predicts fewer defects correctly.
The selection of the best-fitting model takes into account factors such as low FN, high recall, precision and F1-score. Apart from accuracy, the confusion matrix and the ROC-AUC curve help to calculate these influencing scores and to calibrate the model. The confusion matrices of HoG-Linear SVM (Fig. 15) and SURF-Decision Fine Tree (Fig. 16) reveal the lowest FN rates: the former has 2% for the positive class and zero for the negative class, while the latter has FN rates of 13% and 6% for the positive and negative classes, respectively.
The TP rate of HoG-Linear SVM is 98% for the positive class and 100% for the negative class, while the TP rate of SURF-Decision Fine Tree is 87% for the positive class and 94% for the negative class.
Fig. 15. HoG-Linear SVM confusion matrix.
Fig. 16. SURF-Decision Fine Tree confusion matrix.
Fig. 17. HoG-Linear SVM ROC-AUC curve.
Fig. 18. SURF-Decision Fine Tree ROC-AUC curve.
The ROC-AUC curves give AUC = 1.00 for HoG-Linear SVM, with prediction probabilities of zero for the negative class and 0.98 for the positive class, as shown in Fig. 17. SURF-Decision Fine Tree has AUC = 0.92, with prediction probabilities of 0.06 for the negative class and 0.87 for the positive class, as shown in Fig. 18.
After assessing all the evaluation metrics in Table 3 and Table 4, the best-performing embedded Machine Learning classifiers are HoG-Linear SVM and SURF-Decision Fine Tree. Both have negligible FN, leading to Recall ≈ 1.00, as well as high precision and F1-score. The strict requirement for the proposed model is to achieve a 100% TP rate and a zero FN rate on the prediction data. The FN rate is essential for calibrating the proposed model, and the ROC-AUC curves help with this calibration. The decision threshold on the ROC curve, which separates the positive and negative classes, is selected to obtain a significantly lower or zero FN rate and the maximum TP rate in the prediction process. From Figs. 17 and 18, as the TP rate increases, the FP rate also increases. If the AUC of HoG-Linear SVM decreases below 1.00, or that of SURF-Decision Fine Tree below 0.92, their FN rate increases. A non-zero FP rate is acceptable for real-time use of the proposed system, because an experienced examiner can scrutinise FPs visually, but the FN rate must not increase because of the risk involved in the industrial offline-QA of aircraft production.
As SVM is primarily a binary classifier and HoG-Linear SVM performed best on the prediction data, it is selected as the predominant classifier for the proposed approach and is further evaluated with the POD certification process. SURF-Decision Fine Tree can be an option for multi-class classification.
3.3. Comparison and constraints
The proposed HoG-Linear SVM classifier performs better than the approaches in [10,64]. However, it has some constraints: the Linear SVM classifier is a black box, as the path to its predictions is unknown. The Decision Fine Tree, in contrast, is a grey box, as its prediction path is returned as a binary tree split into branching nodes based on input data values.
A binary tree resulting from one of the proposed prediction analyses is illustrated in Fig. 19. This binary tree starts at the root and has two branches at each node; the nodes contain the conditions for the predictions. This tree has four and three levels, with the leaf nodes holding the predicted classes, thus explicitly demonstrating the prediction analysis.
The SURF-Decision Fine Tree can be feasible for real-time offline-QA in aircraft industries for NDE 4.0 and inline-QA, but it could become complicated with a large number of Decision Trees and branches. The HoG-Linear SVM analysis from the proposed prediction dataset may not reflect accurate performance in real-time industrial offline-QA because of a lack of additional positive class training data from each
aircraft component. The FNs arose from the ‘fold’ defect type, owing to the small fold size and the few fold samples. More training data can increase the performance of the proposed HoG-Linear SVM classifier.
Fig. 19. SURF-Decision Fine Tree Classification Tree Viewer.
Another constraint is the data loss caused by storing NDT scans as .jpg images in the pre-processing phase. The .jpg format compresses images, but the raw data can instead be converted to .bmp images using the same ULTIS® NDT Kit software, as .bmp is an uncompressed, high-quality raster file format. To determine the data loss, a signal analysis was carried out on a set of five .jpg and five .bmp images from each aircraft component. Peak Signal-to-Noise Ratio (PSNR), Mean Squared Error (MSE) and the Structural Similarity Index Measure (SSIM) [73] are commonly used to quantify data loss; for this signal analysis, SSIM is preferred for measuring image quality. The .bmp (reference) and .jpg (compared) images of a component are used to obtain the local and global SSIM values and the SSIM maps. An SSIM value closer to 1 signifies better image quality relative to the reference.
Fig. 20 shows a sample local SSIM map of a component. The dark pixels are small local SSIM values; these regions correspond to areas where the .jpg image noticeably differs from the .bmp image. The bright pixels represent large local SSIM values; these regions correspond to uniform regions of the .bmp image, where data compression has less impact on the .jpg image. The global SSIM values are 0.9962 for component 15.1, 0.9953 for 18.1, 0.9940 for 18.14, 0.9944 for 18.16 and 0.9868 for 18.17.
Fig. 20. Local SSIM Map with Global SSIM value 0.9868 of a component.
$$\mathrm{Data\ loss} = 1 - \mathrm{SSIM\ value} \tag{8}$$
The data loss is calculated as in Eq. (8). The worst-case data loss is 1.32% and the average data loss is 0.66%. These .bmp images were used to train and test the HoG-Linear SVM classifier, and the data loss was observed to have no influence on its performance.
4. Certification
The reliability of NDE is defined as the probability of detecting a defect in datasets of different defect sizes during the evaluation process [74]. Quality assessment of NDE reliability is essential for aircraft structural management [75]. Certification is a statistical validation process for inspecting the reliability of NDE approaches with POD analysis [76,77]. The proposed certification process is an automatic error detection (intended for Ultrasonic Testing) that verifies whether the proposed Machine Learning classifier can help an examiner in QA. This process involves acquiring the bare scan of NDT data in the squirter or X-ray systems and manually converting the NDT scans to images using ULTIS® for the Machine Learning process. The evaluation is the human investigation of the scanned image to find defects in the scan data; in the Machine Learning process, the trained classifier accomplishes the prediction. This qualification is based on the POD concept to find defect sizes reliably.
In general, POD is translated into the reliability of finding a given defect size in px (the minimum size to be detected). The minimum size contains the POD knowledge and is implemented with the 29/29 method: there are 29 defects of the minimum size to be detected, and the method has to detect all 29 reliably without missing one. Thus, the defect size in px, $a_{90/95}$, automatically fulfils the POD criterion of 0.9 at a 95% confidence level without dealing with the POD concept. The disadvantage of the 29/29 method is that the POD requirement is fulfilled, but the test model's reliability is unknown.
The certification process is mainly used to address risks and challenges: the software being a black box has to be noted, training an algorithm is crucial and requires expertise, and the Machine Learning model must remain reliable for new datasets, types of defects, different NDT testing methods and feature extraction techniques.
Recent NDE 4.0 research has evaluated Machine Learning models for NDT data using POD [60,62,78,79]. A possible certification process for the proposed HoG-Linear SVM model includes evaluation of the NDT data by the NDT-test engineer and by the algorithm for the predictions of this model, or collecting feedback from the engineers on the quality of the algorithm's results; the evaluation leads to further training of the algorithm and is repeated often. It can verify the model's
reliability when encountering new defect datasets and when implementing distinct feature extraction or validation methods. The confidence and adoption reliability of the HoG-Linear SVM model for offline-QA in the aircraft industry are analysed with a POD function.
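The 29/29 rule described above has a short binomial justification that explains why the number is 29. As a sketch, assume that each flaw in a size interval is detected independently with the same probability. If the true POD were only 0.90, the chance of detecting all n flaws would be 0.90^n, and 29 is the smallest n for which this falls to or below 5%. Equivalently, 29 hits out of 29 give an exact one-sided 95% lower confidence bound on POD just above 0.90. The independence assumption and the function names are illustrative; they are not part of the certification procedure described here.

```python
def prob_all_detected(pod: float, n: int) -> float:
    """Chance that all n flaws are found if each is detected independently with probability pod."""
    return pod ** n


def min_flaws_for_zero_miss_demo(pod: float = 0.90, confidence: float = 0.95) -> int:
    """Smallest n for which n hits out of n demonstrate POD >= pod at the given confidence.

    If the true POD were only `pod`, an all-hit result would occur with
    probability pod**n; that probability must not exceed alpha = 1 - confidence.
    """
    alpha = 1.0 - confidence
    n = 1
    while prob_all_detected(pod, n) > alpha:
        n += 1
    return n


def pod_lower_bound_all_hits(n: int, confidence: float = 0.95) -> float:
    """One-sided lower confidence bound on POD after n hits in n trials.

    For an all-hit outcome, the exact (Clopper-Pearson) bound solves p**n = alpha.
    """
    return (1.0 - confidence) ** (1.0 / n)


if __name__ == "__main__":
    n = min_flaws_for_zero_miss_demo()
    print(n, round(prob_all_detected(0.90, n), 4), round(pod_lower_bound_all_hits(n), 4))
```

Running it gives n = 29, with 0.90^29 ≈ 0.047 and a lower bound of about 0.902. With only 28 flaws, 0.90^28 ≈ 0.052 still exceeds 5%, so a single flaw fewer would not demonstrate the 90/95 criterion.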
4.1. Probability of detection
A POD is a function of the defect size; it evaluates the smallest flaw size and combines its quantitative and qualitative parameters [80]. The 90/95 defect size is used as a reference: defects of this size are detected with a probability of 90% at a 95% confidence level [81]. Two methods to determine POD are Hit/Miss data for binary data and signal response data for continuous data. In POD Hit/Miss analysis, a detected defect is a hit and a failure to detect it is a miss.
$$\mathrm{POD} = \frac{TP}{TP + FN} \tag{9}$$
$$\mathrm{Hit}: \; a > a_{\mathrm{largest}} \tag{10}$$
$$\mathrm{Miss}: \; a < a_{\mathrm{smallest}} \tag{11}$$
The POD is calculated using Eq. (9), where TP is a hit and FN is a miss. The Hit/Miss data have a defect size range in px, from $a_{\mathrm{smallest}}$ (the minimum defect size of 6 px) to $a_{\mathrm{largest}}$ (the maximum defect size of 383 px), to determine the substantial uncertainty of whether the proposed HoG-Linear SVM model detects the defect or not. Hit/Miss data suit the proposed model, as SVM performs better as a binary classifier, whereas SURF-Decision Fine Tree could perform better as a multi-class classifier. A Hit is recorded if the inspection system detects a defect of size $a$ satisfying Eq. (10), and a Miss is recorded if the inspection system does not detect a defect of size $a$ satisfying Eq. (11). For Hit/Miss data, having a vast number of the smallest or largest defects will not help to gain information on the POD($a$) function that fits the data. The information required for estimating the POD($a$) function has to be maximised, so the defect sizes are uniformly distributed between the smallest and largest defect sizes of interest using the 29/29 method. The POD is calculated with new defects and helps to measure the performance gap of the proposed HoG-Linear SVM model, using the defect area in px as the defect size parameter. The overall defect range to be investigated is 6 px to 383 px, and the interval width required within this range is 5 px.
In the 29/29 method, a minimum sample of 29 defects in each defect width interval is necessary. Therefore, a new defect dataset was formed for the POD($a$) function by combining the existing defect samples with artificially created ones to generate more data. The artificial defects were constructed using the image augmentation methods of rotation, skewing and mirroring. The smallest defect size in the positive imageset is 6 px and the largest is 383 px. A total of 29 artificial defects was fabricated in each defect size interval (5 px) to generate more defects, creating 29 × 5 = 145 defect samples. These 145 artificial defects were combined with the 189 + 16 = 205 existing defects of the positive imageset, so a total of 350 (145 + 205) defects of different sizes (at considerable cost) are used to create a POD($a$) function. From the existing negative imageset (233 + 7 = 240 images), images were cropped to match the same defect width interval (5 px) to obtain 350 specimens. These 350 control specimens are randomly mixed with the 350 defect specimens. The trained HoG-Linear SVM classifier must detect all 29 defects in a defect width interval to achieve 90% POD at a 95% confidence level.
The trained HoG-Linear SVM classifier was used in the prediction process to classify these 700 specimens, and all Hits and Misses were recorded to plot their probability, as shown in Fig. 21. The proposed POD function is improved in advance by the HoG image feature extraction method, which includes denoising and feature vectors [82]. The performance gap is calculated from the POD function (Fig. 21) as the difference between the 90/95 size and the smallest pixel size: 20 − 6 = 14 px. Thus, a minor defect of size 20 px can be identified as TP, achieving a POD of 90/95. The Misses included more of the ‘fold’ defect type smaller than 8 px. The performance gap of the proposed POD(20) can be reduced with better quality of NDT scan perception. Since the PAG testers only flag defects larger than a specific size, there might be more detectable defects in the data samples, but no test reports exist for annotating them.
Fig. 21. POD curve.
The factors influencing the proposed POD(20) are the NDT scan image resolution (better image quality is required) and defect frame-filling (not all defect samples are frame-filling, but the control specimens were frame-filling). Because of this frame-filling issue, POD(20) indicates that at least 20 px must be present in an image with any defect size and 8-bit resolution (256 px).
Evaluation of the data by a tester is time-consuming and carries the probability of missing defects. The HoG-Linear SVM model can save time and reduce the frequency of misses by highlighting areas of interest to the examiner. This model predicts defects based on pixel-by-pixel scans and executes instantly.
5. Conclusion
Offline-Quality Assessment for NDT-FML of A380 aircraft structures has been analysed to determine defects in Ultrasonic Testing scan images with the state-of-the-art Machine Learning algorithms SVM and Decision Trees. These models are embedded with the distinct image feature extraction techniques SURF and HoG. The combinations HoG-Linear SVM (F1-score = 0.984, ROC-AUC = 1.00) and SURF-Decision Fine Tree (F1-score = 0.97, ROC-AUC = 0.92) outperformed all other models. The HoG-Linear SVM was further evaluated in the certification process with the POD function, which showed that it can determine a defect size of 20 px in images. The HoG-Linear SVM has a performance gap of 14 px that can be reduced with more defect samples for training and with evaluation by industry partners for production.
Special note
The corresponding author conducted this research work with ZLP-DLR, Augsburg and Informatik-University of Bonn, Bonn, Germany in 2019.
Funding
Center for Lightweight Production Technology (ZLP), German Aerospace Center (DLR), Augsburg, Bavaria, Germany.
Preparation of this article – DFKI acknowledges financial support by the Lower Saxony Ministry for Science and Culture (MWK) through “Niedersachsen Vorab” (ZN3480).
CRediT authorship contribution statement
Navya Prakash: Methodology, Software, Validation, Formal analysis, Writing – original draft, Visualization. Dorothea Nieberl: Conceptualization, Data curation, Writing – review & editing, Supervision, Project administration. Monika Mayer: Conceptualization, Data curation, Writing – review & editing, Supervision, Project administration. Alfons Schuster: Writing – review & editing, Supervision.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Data availability
The data that has been used is confidential.
Acknowledgement
We thank Prof. Dr. Jens Lehmann (Informatik-University of Bonn) for supervising this research (2019).
References
[1] Ucan H, Scheller J, Nguyen C, Nieberl D, Mayer M, et al. Automated, quality assured and high volume oriented production of fibre metal laminates (FML) for the next generation of passenger aircraft fuselage shells. Sci Eng Compos Mater 2019;26:502–8. http://dx.doi.org/10.1515/secm-2019-0031, https://elib.dlr.de/129574/.
[2] Apmann H, Mayer M, et al. Verfahren der INLINE-qualitätssicherung und der zerstörungsfreien prüfung innerhalb der fertigungslinie von faser-metall-laminaten. In: DLR congress (DLRK) conference - FML. 2017, https://elib.dlr.de/117260/.
[3] Bisle W, Meier T, Mueller S, Rueckert S. In-service inspection concept of GLARE® – an example for the use of new UT array inspection systems, ECNDT. 2006, https://www.ndt.net/search/docs.php3?id=3540.
[4] Vrana J, Singh R. NDE 4.0 - a design thinking perspective. J Nondestruct Eval 2021;24. http://dx.doi.org/10.1007/s10921-020-00735-9.
[5] Schmidt T, Dutta S. Automation in production integrated NDT using thermography. In: International symposium on NDT in aerospace. 2012, https://www.ndt.net/article/aero2012/papers/we3b1.pdf.
[6] Wunderlich C, Tschöpe C, Duckhorn F. Advanced methods in NDE using machine learning approaches. In: AIP conference proceedings 1949-020022. 2018, http://dx.doi.org/10.1063/1.5031519.
[7] Ren I, Zahiri F, et al. A deep ensemble classifier for surface defect detection in aircraft visual inspection. Smart Sustain Manuf Syst 2020;4(1):20200031. http://dx.doi.org/10.1520/SSMS20200031.
[8] Nieberl D, Mayer M, Stefani T, Willmeroth M. Automated manufacturing of large fibre-metal-laminate parts. In: European conference on composite materials. 2018, https://elib.dlr.de/124296/.
[9] Schuster A, Mayer M, Willmeroth M, Brandt L, Kupke M. Inline quality control for thermoplastic automated fibre placement. In: Procedia manufacturing, Vol. 51. Elsevier, FAIM; 2021, p. 505–11. http://dx.doi.org/10.1016/j.promfg.2020.10.071.
[10] Internal Study, Schmidt T, Mayer M, Rainer L, Kupke M. Pilotstudie automatisierte auswertung von NDT daten. DLR-IB 435-2015/32. 43 S, DLR-Interner Bericht, Unpublished https://elib.dlr.de/101533/.
[11] Caruana R, Niculescu-Mizil A. An empirical comparison of supervised learning algorithms. In: International conference on machine learning. 2006, p. 161–8. http://dx.doi.org/10.1145/1143844.1143865.
[12] Cortes C, Vapnik V. Support-vector networks. Mach Learn 1995;20:273–97. http://dx.doi.org/10.1007/BF00994018.
[13] Breiman L. Bagging predictors. Mach Learn 1996;24(2):123–40. http://dx.doi.org/10.1007/BF00058655.
[14] Rokach L, Maimon O. Decision trees. In: Data mining and knowledge discovery handbook. Springer; 2005, p. 165–92. http://dx.doi.org/10.1007/0-387-25465-X_9.
[15] Breiman L. Random forests. Mach Learn 2001;45:5–32. http://dx.doi.org/10.1023/A:1010933404324.
[16] Fix E, Hodges Jr JL. Discriminatory analysis - nonparametric discrimination: consistency properties. Technical Report 4, USAF School of Aviation Medicine; 1951, https://apps.dtic.mil/sti/pdfs/ADA800276.pdf.
[17] Bayes T. An essay towards solving a problem in the doctrine of chances. In: FRS communicated by Mr. Price in a letter to John Canton, A.M. FRS. 1763, https://royalsocietypublishing.org/doi/pdf/10.1098/rstl.1763.0053.
[18] Fisher RA. The use of multiple measurements in taxonomic problems. Ann Eugen 1936;7(2):179–88. http://dx.doi.org/10.1111/j.1469-1809.1936.tb02137.x.
[19] Cramer JS. The Origins of Logistic Regression. Tinbergen institute working paper No. 2002-119/4, 2002, http://dx.doi.org/10.2139/ssrn.360300.
[20] Bezdek JC, Ehrlich R, Full W. FCM: The fuzzy c-means clustering algorithm. Comput Geosci 1984;10(2–3):191–203. http://dx.doi.org/10.1016/0098-3004(84)90020-7.
[21] Faber V. Clustering and the continuous K-means algorithm. Los Alamos Sci 1994;22:138–44, https://www.cs.kent.edu/zwang/schedule/lj9.pdf.
[22] Jolliffe IT. Principal Component Analysis. Springer Series in Statistics; 2002, http://cda.psych.uiuc.edu/statistical_learning_course/Jolliffe%20I.%20Principal%20Component%20Analysis%20(2ed.,%20Springer,%202002)(518s)_MVsa_.pdf.
[23] McCulloch WS, Pitts W. A logical calculus of the ideas immanent in nervous activity. Bull Math Biophys 1943;5:115–33. http://dx.doi.org/10.1007/BF02478259.
[24] Broomhead DS, Lowe D. Radial basis functions, multi-variable functional interpolation and adaptive networks. In: Royal signals and radar establishment malvern (United Kingdom). Complex Systems Publications, Inc.; 1988, p. 321–55, RSRE-MEMO-4148 https://apps.dtic.mil/sti/citations/ADA196234.
[25] Kramer MA. Nonlinear principal component analysis using autoassociative neural networks. AIChE J 1991;37(2):11, https://people.engr.tamu.edu/rgutier/web_courses/cpsc636_s10/kramer1991nonlinearPCA.pdf.
[26] Van Der Malsburg C. Frank Rosenblatt: Principles of neurodynamics: Perceptrons and the theory of brain mechanisms. In: Brain theory. Springer Berlin Heidelberg; 1986, p. 245–8. http://dx.doi.org/10.1007/978-3-642-70911-1_20.
[27] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. In: International conference on learning representations (ICLR). 2015, http://dx.doi.org/10.48550/arXiv.1409.1556.
[28] Girshick R. Fast R-CNN. In: IEEE international conference on computer vision. ICCV, 2015, p. 1440–8. http://dx.doi.org/10.1109/ICCV.2015.169.
[29] Szegedy C, Ioffe S, Vanhoucke V, Alemi A. Inception-v4, inception-ResNet and the impact of residual connections on learning. 2016, http://dx.doi.org/10.48550/arXiv.1602.07261.
[30] Vaswani A, Shazeer N, et al. Attention is all you need. Neural Information Processing Systems 2017. http://dx.doi.org/10.48550/arXiv.1706.03762, https://dl.acm.org/doi/10.5555/3295222.3295349.
[31] Medjahed SA. A comparative study of feature extraction methods in images classification. IJIGSP 2015;7(3):16–23. http://dx.doi.org/10.5815/ijigsp.2015.03.03.
[32] Ojala T, Pietikainen M, Maenpaa T. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans Pattern Anal Mach Intell 2002;24(7):971–87. http://dx.doi.org/10.1109/TPAMI.2002.1017623.
[33] Matas J, Chum O, et al. Robust wide baseline stereo from maximally Stable Extremal Regions. Image Vis Comput 2004;22(10):761–7. http://dx.doi.org/10.1016/j.imavis.2004.02.006.
[34] Alcantarilla PF, Bartoli A, Davison AJ. KAZE features. In: Computer Vision - ECCV 2012. Springer Berlin Heidelberg; 2012, p. 214–27. http://dx.doi.org/10.1007/978-3-642-33783-3_16.
[35] Bay H, Tuytelaars T, Van Gool L. SURF: Speeded up robust features. In: Computer vision – ECCV 2006, Vol. 3951. Springer Berlin Heidelberg; 2006, p. 404–17. http://dx.doi.org/10.1007/11744023_32.
[36] Dalal N, Triggs B. Histograms of oriented gradients for human detection. In: IEEE computer society conference on computer vision and pattern recognition (CVPR’05). 2005, http://dx.doi.org/10.1109/cvpr.2005.177.
[37] Lowe DG. Distinctive image features from scale-invariant keypoints. Int J Comput Vis 2004;60:91–110. http://dx.doi.org/10.1023/B:VISI.0000029664.99615.94.
[38] Zhang YL, Guo N, et al. Automated defect recognition of C-SAM images in IC packaging using support vector machines. Int J Adv Manuf Technol 2005;25(11–12):1191–6. http://dx.doi.org/10.1007/s00170-003-1942-1.
[39] Bernieri A, Ferrigno L, et al. An SVM approach to crack shape reconstruction in eddy current testing. In: IEEE instrumentation and measurement technology conference proceedings. 2006, p. 2121–6. http://dx.doi.org/10.1109/IMTC.2006.328502.
[40] Bernieri A, Ferrigno L, et al. Crack shape reconstruction in eddy current testing using machine learning systems for regression. IEEE Trans Instrum Meas 2008;57(9):1958–68. http://dx.doi.org/10.1109/TIM.2008.919011.
[41] Benítez HD, Loaiza H, et al. Defect characterization in infrared non-destructive testing with learning machines. NDT E Int 2009;42(7):630–43. http://dx.doi.org/10.1016/j.ndteint.2009.05.004.
[42] Khodayari-Rostamabad A, Reilly JP, et al. Machine learning techniques for the analysis of magnetic flux leakage images in pipeline inspection. IEEE Trans Magn 2009;45(8):3073–84. http://dx.doi.org/10.1109/TMAG.2009.2020160.
[43] Wei H, C-Tong L. Automatic real time SVM based ultrasonic rail flaw detection and classification system. J Graduate Sch Chin Acad Sci 2009;26(4):517–21, http://journal.ucas.ac.cn/EN/Y2009/V26/I4/517.
[44] Shumin D, Zhoufeng L, Chunlei L. Adaboost learning for fabric defect detection based on HOG and SVM. In: International conference on multimedia technology. IEEE; 2011, p. 2903–6. http://dx.doi.org/10.1109/ICMT.2011.6001937.
[45] Freund Y, Schapire RE. A short introduction to boosting. J Japan Soc Artif Intell 1999;14(5):771–80, https://cseweb.ucsd.edu/yfreund/papers/IntroToBoosting.pdf.
[46] Saechai S, Kongprawechnon W, Sahamitmongkol R. Test system for defect detection in construction materials with ultrasonic waves by support vector machine and neural network. In: SCIS-ISIS. 2012, p. 1034–9. http://dx.doi.org/10.1109/SCIS-ISIS.2012.6505090.
[47] Salzberg SL. Book review: C4.5: Programs for machine learning by J. Ross Quinlan. Morgan Kaufmann Publishers, Inc., 1993. Mach Learn 1994;16(3):235–40. http://dx.doi.org/10.1007/BF00993309.
[48] D’Angelo G, Rampone S. Shape-based defect classification for non destructive testing. IEEE Metrol Aerospace (MetroAeroSpace) 2015;406–10. http://dx.doi.org/10.1109/MetroAeroSpace.2015.7180691.
[49] Sumesh A, Rameshkumar K, et al. Use of machine learning algorithms for weld quality monitoring using acoustic signature. Procedia Comput Sci 2015;50:316–22. http://dx.doi.org/10.1016/j.procs.2015.04.042.
[50] Malekzadeh T, Abdollahzadeh M, et al. Aircraft fuselage defect detection using deep neural networks. In: The IEEE global conference on signal and information processing. 2017, http://dx.doi.org/10.48550/arXiv.1712.09213, arXiv:1712.09213.
[51] Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. Commun ACM 2017;60(6):84–90. http://dx.doi.org/10.1145/3065386.
[52] Huang H, Hu C, et al. Surface defects detection for mobilephone panel workpieces based on machine vision and machine learning. In: IEEE international conference on information and automation. ICIA, 2017, p. 370–5. http://dx.doi.org/10.1109/ICInfA.2017.8078936.
[53] Shipway NJ, Huthwaite P, et al. Performance based modifications of random forest to perform automated defect detection for fluorescent penetrant inspection. J Nondestruct Eval 2019;38(2):37. http://dx.doi.org/10.1007/s10921-019-0574-9.
[54] Chen Z-H, Juang J-C. AE-rtisnet: Aeronautics engine radiographic testing inspection system net with an Improved Fast Region-based convolutional neural network framework. Appl Sci 2020;10(23):8718. http://dx.doi.org/10.3390/app10238718.
[55] Redmon J, Divvala S, et al. You only look once: Unified, real-time object detection. 2016, http://dx.doi.org/10.48550/arXiv.1506.02640.
[56] Hu Y, Wang J, et al. Automatic defect detection from X-ray scans for aluminium conductor composite core wire based on classification neutral network. NDT E Int 2021;124:102549. http://dx.doi.org/10.1016/j.ndteint.2021.102549.
[57] Rabiner LR, Juang BH. An introduction to hidden Markov models. IEEE ASSP Mag 1986;12. http://dx.doi.org/10.1109/MASSP.1986.1165342.
[58] Kraljevski I, Duckhorn F, et al. Machine learning for anomaly assessment in sensor networks for NDT in aerospace. IEEE Sens J 2021;21(9):11000–8. http://dx.doi.org/10.1109/JSEN.2021.3062941.
[59] Niccolai A, Caputo D, et al. Machine learning-based detection technique for NDT in industrial manufacturing. In: Mathematics, Vol. 9 (11). MDPI; 2021, p. 1251. http://dx.doi.org/10.3390/math9111251.
[60] Siljama O, Koskinen T, et al. Automated flaw detection in multi-channel phased array ultrasonic data using machine learning. J Nondestruct Eval 2021;40(3):67. http://dx.doi.org/10.1007/s10921-021-00796-4.
[61] Fakih MA, Chiachío M, et al. A Bayesian approach for damage assessment in welded structures using Lamb-wave surrogate models and minimal sensing. NDT E Int 2022;128:102626. http://dx.doi.org/10.1016/j.ndteint.2022.102626.
[62] Le M, Luong VS, et al. Auto-detection of hidden corrosion in an aircraft structure by electromagnetic testing: A machine-learning approach. Appl Sci 2022;12(10):5175. MDPI. http://dx.doi.org/10.3390/app12105175.
[63] Risheh A, Tavakolian P, et al. Infrared computer vision in non-destructive imaging: Sharp delineation of subsurface defect boundaries in enhanced truncated correlation photothermal coherence tomography images using K-means clustering. NDT E Int 2022;125:102568. http://dx.doi.org/10.1016/j.ndteint.2021.102568.
[64] Internal Study: University of Augsburg. Detection of anomalies in ultrasonic images of fibre-metal-laminate skin fields. DLR Augsburg (unpublished).
[65] Ucan H, Apmann H, Grassel G, Krombholz C, Fortkamp K, Nieberl D, Ehmke F, Nguyen C, Akin D. Produktionstechnologien für Leichtbaustrukturen aus Faser-Metall-Laminaten im Flugzeugrumpf [Production technologies for lightweight structures made of fibre metal laminates in the aircraft fuselage]. Deutscher Luft- und Raumfahrtkongress; 2017, https://elib.dlr.de/114906/, https://www.researchgate.net/publication/321964549.
[66] Zapp P, Pantelelis N, Ucan H. The way to decrease the curing time by 50% in the manufacturing of structural components using the example of FML fuselage panels. In: SAMPE Europe conference. 2019, https://elib.dlr.de/130943/.
[67] Wanhill RJH. GLARE: A versatile fibre metal laminate (FML) concept. 2017, http://dx.doi.org/10.1007/978-981-10-2134-3_13.
[68] Etr HE, Korkmaz ME, Gupta MK, Gunay M, Xu J. A state-of-the-art review on mechanical characteristics of different fibre metal laminates for aerospace and structural application. In: International journal of advanced manufacturing technology, Vol. 123. Springer; 2022, p. 2965–91. http://dx.doi.org/10.1007/s00170-022-10277-1.
[69] Stone M. Cross-validatory choice and assessment of statistical predictions. J R Stat Soc Ser B Stat Methodol 1973;36(2):111–33. http://dx.doi.org/10.1111/j.2517-6161.1974.tb00994.x.
[70] Berrar D. Cross-validation. In: Encyclopedia of bioinformatics and computational biology, Vol. 1. Elsevier; 2018, p. 542–5. http://dx.doi.org/10.1016/B978-0-12-809633-8.20349-X.
[71] Jarvis R, Cawley P, Nagy PB. Performance evaluation of a magnetic field measurement NDE technique using a model assisted probability of detection framework. NDT E Int 2017;91:61–70. http://dx.doi.org/10.1016/j.ndteint.2017.06.006.
[72] Fawcett T. An introduction to ROC analysis. Pattern Recognit Lett 2006;27(8):861–74. http://dx.doi.org/10.1016/j.patrec.2005.10.010.
[73] Wang Z, Bovik AC, Sheikh HR, Simoncelli EP. Image quality assessment: From error visibility to structural similarity. IEEE Trans Image Process 2004;13(4):600–12. http://dx.doi.org/10.1109/TIP.2003.819861.
[74] Georgiou GA. PoD curves, their derivation, applications and limitations. Insight 2006;49:409–14. http://dx.doi.org/10.1784/insi.2007.49.7.409.
[75] Harding CA, Hugo GR. Guidelines for interpretation of published data on probability of detection for non-destructive testing. 2011, p. 31, https://apps.dtic.mil/sti/pdfs/ADA398282.pdf.
[76] Matzkanin GA, Yolken HT. Probability of Detection (POD) for nondestructive evaluation (NDE). Defense Technical Information Center; 2001, http://dx.doi.org/10.21236/ADA398282.
[77] Sause MGR, Jasiuniene E. Structural health monitoring damage detection systems for aerospace. Cham: Springer International Publishing; 2021, http://dx.doi.org/10.1007/978-3-030-72192-3.
[78] Zolfaghari A, Kolahan F. Reliability and sensitivity of visible liquid penetrant NDT for inspection of welded components. Mater Test 2017;59(3):290–4. http://dx.doi.org/10.3139/120.111000.
[79] Tschöke K, et al. Feasibility of model-assisted probability of detection principles for structural health monitoring systems based on guided waves for fibre-reinforced composites. IEEE Trans Ultrason Ferroelectr Freq Control 2021;68(10):3156–73. http://dx.doi.org/10.1109/TUFFC.2021.3084898.
[80] Silva RR da, Padua GX de. Nondestructive inspection reliability: State of the art. In: Nondestructive testing methods and new applications. InTech; 2012, http://dx.doi.org/10.5772/37112.
[81] Schnars U, Kück A. Application of POD analysis at Airbus. In: 4th European-American workshop on reliability of NDE. 2009, https://www.ndt.net/?id=8320.
[82] Topp M, Strothmann L. How can NDT 4.0 improve the probability of detection (POD)? e-J Nondestruct Test (NDT) 2021;26(4). https://www.ndt.net/?id=26013.

Main

Uploaded by

Copyright:

Available Formats

Main

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Main

Uploaded by

Copyright:

Available Formats
