By:
Mohale Molefe
213538364
Supervisor:
Prof Jules-Raymond Tapamo
The research presented in this dissertation was conducted at the University of KwaZulu
Natal under the supervision of Prof. Jules-Raymond Tapamo. I hereby declare that
all the materials used in this dissertation are my own original work except where an
acknowledgement is made in the form of a reference. The work contained herein has not
been submitted in part or whole for a degree at any other university.
Mohale Molefe
June 2021
Declaration 1: Supervisor
Jules-Raymond Tapamo
June 2021
Declaration 2: Plagiarism
2. The work presented in this dissertation has not been submitted to UKZN or
another university for purposes of obtaining an academic qualification, whether
by myself or any other person.
3. This dissertation does not contain another person’s data, pictures, graphs or
other information unless specifically acknowledged as being sourced from other
persons.
4. The research does not contain another person’s writings unless specifically ac-
knowledged as being sourced from other researchers. Where other written
sources have been quoted, then:
(a) Their words have been re-written but general information attributed to
them has been referenced.
(b) Where their exact words have been used, then their writings have been
placed in italics and inside quotation marks and referenced.
5. This dissertation does not contain texts, graphics or tables copied and pasted from the internet, unless specifically acknowledged, with the source detailed in the dissertation and in the References section.
Mohale Molefe
June 2021
Declaration 3: Publications
I, Mohale Molefe, declare that the following publications came out of this disserta-
tion.
Mohale Molefe
June 2021
Acknowledgement
I am also thankful to my mentor and colleague Dr. Joshua Maumela, whose knowledge and expertise were invaluable throughout this research.
I would also like to thank Thato Mahlatji, Refilwe Ndlela and Siboniso Vilakazi,
who assisted with providing the dataset used to conduct experiments in this research
work.
Lastly, I would like to acknowledge the Transnet radiography specialists who categorised the different thermite welding defects in the obtained dataset.
Abstract
Defects can form during the thermite welding process that joins two sections of rail, and the welded joints must therefore be inspected for quality purposes. The most commonly used non-destructive inspection method is Radiography Testing. However, the detection and classification of the various defects visible in the generated radiography images remain a costly, lengthy and subjective process, as they are conducted manually by trained experts. It has been shown that most rail breaks occur due to a crack initiated at a weld joint defect that was not detected. The railway industry therefore has a strong demand for an automated detection and classification model. This work presents a
method based on image processing and machine learning techniques to automatically
detect and classify welding defects. Radiography images are first enhanced using the
Contrast Limited Adaptive Histogram Equalisation method; thereafter, the Chan-Vese
Active Contour Model is applied to the enhanced images to segment and extract the
weld joint as the Region of Interest from the image background. A comparative in-
vestigation between the Local Binary Patterns descriptor and the Bag of Visual Words
approach with Speeded Up Robust Features descriptor was carried out for extracting
features in the weld joint images. The effectiveness of the aforementioned feature
extractors was evaluated using the Support Vector Machines, K-Nearest Neighbours
and Naive Bayes classifiers. This study’s experimental results showed that the Bag
of Visual Words approach when used with the Support Vector Machines classifier,
achieves the best overall classification accuracy of 94.66%. The proposed method can be extended to other industries where Radiography Testing is used as the inspection tool.
Contents
1 General Introduction
1.1 Introduction
1.2 Problem Statement
1.3 Motivation
1.4 Main Aim and Specific Objectives
1.5 Study Limitations
1.6 Research Contributions
1.7 Dissertation Outline
2 Literature Review
2.1 Introduction
2.2 An Overview of Machine Learning
2.3 Application of Machine Learning in Railway
2.3.1 Shallow Learning-based Algorithms
2.3.2 Deep Learning-based Algorithms
2.4 Classification of Defects in Radiography Images
2.4.1 Image Enhancement
2.4.2 Image Segmentation and Weld Joint Extraction
2.4.3 Feature Extraction and Classification
2.5 Conclusion
3 Materials and Methods
3.3.4 Post-Processing using Morphological Operations
3.3.5 Algorithm for Weld Joint Segmentation and RoI extraction
3.4 Feature Extraction
3.4.1 Feature Extraction using Local Binary Patterns
3.4.2 Feature Extraction using Speeded Up Robust Features
3.4.3 Image Representation using Bag of Visual Words
3.5 Feature Classification
3.5.1 Feature Classification using Support Vector Machines
3.5.2 Feature Classification using the K-Nearest Neighbors
3.5.3 Feature Classification using Naive Bayes
3.6 Evaluation Methods
3.6.1 Confusion Matrix
3.6.2 K Fold Cross-Validation
3.7 Conclusion
Bibliography
List of Figures
4.1 Defect-less
4.2 Wormholes
4.3 Shrinkage cavities
4.4 Inclusions
4.5 Image Enhancement using CLAHE at varying clip factor values
4.6 Weld joint RoI steps
4.7 Weld joint extraction in a defect-less class
4.8 Weld joint extraction in Wormholes class
4.9 Weld joint extraction in shrinkage cavities class
4.10 Weld joint extraction in Inclusions class
4.11 Segmentation accuracy for each class
4.12 Classification accuracy of the K-NN classifier at varying LBP cell size parameter
4.13 Classification accuracy of the SVM classifier at varying LBP cell size parameter
4.14 Classification accuracy of the Naive Bayes classifier at varying LBP cell size parameter
4.15 Classification accuracy of the K-NN classifier at varying codebook size parameter
4.16 Classification accuracy of the SVM classifier at varying codebook size parameter
4.17 Classification accuracy of the Naive Bayes classifier at varying codebook size parameter
List of Tables
4.1 Confusion matrix using LBP and 5-NN at [6 14] cell size
4.2 Confusion matrix using LBP and 5-NN at [12 28] cell size
4.3 Confusion matrix using LBP and 1-NN at [30 70] cell size
4.4 Confusion matrix using LBP and 3-NN at [60 140] cell size
4.5 Confusion matrix using LBP and SVM (σ = 4) at [6 14] cell size
4.6 Confusion matrix using LBP and SVM (σ = 0.25) at [12 28] cell size
4.7 Confusion matrix using LBP and SVM (σ = 0.5) at [30 70] cell size
4.8 Confusion matrix using LBP and SVM (σ = 0.5) at [60 140] cell size
4.9 Confusion matrix using LBP and Naive Bayes at [6 14] cell size
4.10 Confusion matrix using LBP and Naive Bayes at [12 28] cell size
4.11 Confusion matrix using LBP and Naive Bayes at [30 70] cell size
4.12 Confusion matrix using LBP and Naive Bayes at [60 140] cell size
4.13 Highest classification accuracy by each classifier for LBP features
4.14 Confusion matrix using BoSURF and 3-NN at 200 codewords
4.15 Confusion matrix using BoSURF and 5-NN at 800 codewords
4.16 Confusion matrix using BoSURF and 5-NN at 1400 codewords
4.17 Confusion matrix using BoSURF and 3-NN at 2000 codewords
4.18 Confusion matrix using BoSURF and SVM (σ = 0.5) at 200 codewords
4.19 Confusion matrix using BoSURF and SVM (σ = 4) at 800 codewords
4.20 Confusion matrix using BoSURF and SVM (σ = 8) at 1400 codewords
4.21 Confusion matrix using BoSURF and SVM (σ = 0.25) at 2000 codewords
4.22 Confusion matrix using BoSURF and Naive Bayes at 200 codewords
4.23 Confusion matrix using BoSURF and Naive Bayes at 800 codewords
4.24 Confusion matrix using BoSURF and Naive Bayes at 1400 codewords
4.25 Confusion matrix using BoSURF and Naive Bayes at 2000 codewords
4.26 Highest classification accuracy by each classifier for Bag of SURF (BoSURF) features
4.27 Highest classification accuracy achieved for LBP and BoSURF
List of Algorithms
Abbreviations
RoI Region of Interest
RT Radiography Testing
Chapter 1
General Introduction
1.1 Introduction
Railway transportation refers to the movement of passengers, as well as of various commodities and goods traded as cargo and freight, from one destination to another using wheeled vehicles designed to run on rails. The South African railway industry
is owned and managed by Transnet Freight Rail (TFR), which is one of six divisions
of Transnet Ltd. TFR maintains an extensive rail network across South Africa that
connects with other rail networks in the Sub-Saharan region, and its rail network infrastructure represents approximately 80% of the total rail infrastructure in Africa. The rail-
way infrastructure is a complex and multi-purpose system that involves earthwork,
tunnels, bridges, and a track structure. Each infrastructure system serves a specific
purpose of assuring safe and reliable train transportation. Thus, proper maintenance
planning and reliable infrastructure monitoring technologies are of great importance.
Track structure is the most fundamental part of the railway infrastructure, and its
primary purpose is to serve as a guideway for the train wheels and absorb dynamic
stresses induced by the train motion. As illustrated in Figure 1.1, track structure
comprises components such as rails, sleepers, ballast, and fasteners. The most critical and maintenance-demanding component of the track structure is the rails. Unlike
other components, rails are manufactured in sections and are joined together to form
a continuous railway line during the installation process. The sections of the rails are
usually welded together to form Continuously Welded Rails (CWR); the type of weld-
ing method used by TFR and other railway industries [8, 9] is thermite welding.
Figure 1.2: Thermite weld process and the weld joint formed
A wide range of Non-Destructive Testing (NDT) techniques has been used to in-
spect the weld joints for possible defects that could have occurred during the welding
process. These include acoustic emission, eddy current, ultrasonic testing, and radio-
graphy testing. Radiography Testing (RT) is a commonly used NDT method across
many railway industries [10]. RT has several advantages compared to other NDT
methods as it allows radiography experts to examine and visualize defects from the
generated images. The role of the radiography expert is to detect, classify, and accept
or reject the weld, based on the type of defect detected and the applicable radiogra-
phy standards.
Five different types of welding defects can be identified in the images produced by the RT method. These are: lack of fusion, shrinkage cavities, inclusions, wormholes, and porosity. Lack of fusion defects occur when there is insufficient fusion between the weld joint and
the parent rails during the welding operation. Shrinkage cavities are voids formed
during the solidification of molten iron due to shrinkages; they are usually located in
the upper web area of the rails. Porosity and wormholes are voids which are caused
by gas entrapment. Porosity is spherical, and wormholes are elongated. Inclusions
occur due to the presence of foreign materials and are irregular in shape. Figure 1.3 illustrates some of the defects revealed by the RT method.
One of many examples of a reported train derailment in TFR due to thermite weld
defect was that of the coal line during the 2016/2017 financial year. Failure analysis
showed that the rail break occurred due to a crack that initiated at the weld joint. Fig-
ure 1.4 illustrates the derailment site on the coal line. Before the derailment, the coal
line reported 32 rail breaks, 18 train cancellations, and 6520 minutes of delays; 58% of the rail breaks were on or adjacent to a weld joint [2]. These statistics indicate that
the manual detection and classification of welding defects using human expertise is
unacceptable. Thus, there is a need to develop an automatic defect detection and
classification system that will address the shortcoming of the manual process.
1.3 Motivation
Railway transportation plays a significant role in the development of the South African economy and in industrial growth. Failures such as rail breaks due to thermite weld defects
are directly linked to train derailments. At present, the detection and classification
of thermite weld defects are performed manually by a trained radiography expert.
However, manual inspection is problematic due to its low efficiency, lack of objectiv-
ity, high false alarm rate, and lengthy turnaround time. Additionally, the results are
entirely dependent on the capability of the inspector to detect, classify, and assess
the criticality of defects. Thus, there is a demand for the development of an efficient
and accurate system that can detect and classify thermite weld defects automatically.
The development of an automated system will assure that defects are detected and
classified immediately after the weld joints are tested. This will significantly reduce
the turnaround time and allow the maintenance teams to immediately remove weld joints that pose a risk of rail breaks.
Over the years, computer vision technologies have been studied for various appli-
cations in the railway industry. Some of the successful applications of computer
vision in the railway industry have been the development of the automatic classifica-
tion of rail surface defects and railway fastener monitoring systems [11, 12, 13, 14].
However, the development of a computer vision-based system for the detection and
classification of thermite weld defects remains an area that has not been explored.
This work is an attempt to automate the process by using image processing and ma-
chine learning techniques.
1.4 Main Aim and Specific Objectives

The main aim of this research is to develop an automated method, based on image processing and machine learning, for the detection and classification of thermite weld defects in radiography images. The specific objectives are to:

• Perform a literature review on existing techniques for the detection and classification of welding defects in radiography images.

• Develop an algorithm to segment and extract the weld joint as the Region of Interest.

• Design and implement an efficient model for thermite weld joint defect classification.
1.7 Dissertation Outline

The remainder of this dissertation is organised as follows:

• Chapter two presents the literature review on the current image processing and machine learning methods used to detect and classify defects in the railway and radiography industries. This includes methods based on deep learning and shallow learning algorithms.
• Chapter three discusses the material and methods for classification of thermite
weld defects. It divides the methods into thermite weld image enhancement,
weld joint extraction, feature extraction and feature classification.
• Chapter four provides the full experimental results and discussion obtained
from various feature extraction and classification algorithms.
• Chapter five concludes the dissertation and provides the recommendation for
future work.
Chapter 2
Literature Review
2.1 Introduction
Image processing and machine learning methods have enabled many railway practi-
tioners to benefit from a wide variety of applications over the past years. The possi-
bility of collecting data such as rail surface defects, missing fasteners, track geometry
and welding defects has proven beneficial for efficient railway transportation and
the development of predictive maintenance models. The use of machine learning
methods in the railway industry gained popularity with the introduction of the first-generation rail monitoring systems by Trascino et al. [15]. These systems collected and stored various types of rail defect data, which were later reviewed by trained personnel to make decisions. However, these systems did not incorpo-
rate automated detection and classification of rail defects in the captured data. As
faster computing hardware and software became available, several researchers started introducing image processing and machine learning frameworks with high automation capabili-
ties. Therefore, this chapter reviews some of the successful applications of machine
learning in the railway and radiography industries. Section 2.2 presents an overview
of machine learning. Section 2.3 discusses the recent applications of machine learn-
ing in the railway industry. Finally, Section 2.4 outlines some image processing and
machine learning methods for extracting and classifying welding defects in radiogra-
phy images. Section 2.5 concludes the chapter.
2.2 An Overview of Machine Learning
The main difference between shallow and deep learning algorithms lies in their
level of representation. Shallow algorithms use manually designed features and al-
gorithms such as the Support Vector Machines (SVM) [18], K-Nearest Neighbours
(K-NN) [19] and Random Forest [20] to train a shallow classifier. Deep learning
algorithms, on the other hand, learn the features directly from raw data [21]. Shal-
low learning-based methods use manually designed or hand-crafted features to train
shallow algorithms for learning the function that maps the predictive variables to
the target variables. Additionally, this set of algorithms takes structured data, in the form of feature vectors, as input. For example, the SVM uses the feature vectors representing different samples in a high-dimensional feature space to learn a hyperplane that best separates samples of two different classes. An unknown sample's feature vector is assigned a class label based on its position relative to the
constructed hyperplane. A K-NN algorithm stores the feature vectors as part of the
training process; an unknown sample is assigned a class label by a majority vote among the feature vectors nearest to it.
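To make this shallow-learning workflow concrete, the sketch below trains the two classifiers just described on pre-extracted feature vectors. It is a minimal illustration, not the experimental setup of this dissertation: Python with scikit-learn is assumed, and the feature matrix X and labels y are randomly generated placeholders.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Placeholder data: 200 hand-crafted feature vectors and four class labels.
rng = np.random.default_rng(0)
X = rng.random((200, 64))
y = rng.integers(0, 4, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# SVM: learns a separating hyperplane (here with an RBF kernel).
svm = SVC(kernel="rbf", gamma="scale").fit(X_train, y_train)

# K-NN: stores the training vectors and labels an unknown sample by a
# majority vote among its k nearest neighbours.
knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)

print("SVM accuracy:", svm.score(X_test, y_test))
print("K-NN accuracy:", knn.score(X_test, y_test))
```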
The main difference between the standard neural network and CNN is that the con-
volutional layers replace the fully connected layers in the standard neural network.
In contrast to fully connected layers, the neurons in convolutional layers are not con-
nected to all the neurons in the previous and next layers; instead, a small set of kernel weights is shared across all spatial positions of the input. This weight sharing is an important property of CNNs, as it greatly reduces the number of trainable parameters. A related property is that the weights learned on a previous task can be reused in solving a new task; this is known as transfer learning.
A similar approach to the one presented in [22] was proposed by Yue et al. [23].
The authors combined geometric features with grey-level features to describe three surface defects, namely scale peeling, crack stripping and tread crack. The multiclass classification of these defects was achieved using the AdaBoost classifier. According to the authors, the combination of geometric and grey-level features allows for the detection of complex and randomly shaped defect regions, which is an advantage of their approach.
In [24], a method for detection and classification of images containing the scouring
rail surface defects and defect-less images was proposed. Several feature extraction
algorithms were used in the experiments; these included Principal Component Analysis
(PCA), Kernel Principal Component Analysis (KPCA), Singular Value Decomposition
(SVD) and Histogram Match (HM). The comparative analysis was achieved using the
Random Forest as a classification algorithm. The experimental results conducted
showed that the PCA gives higher classification accuracy while the HM achieved
faster feature extraction and training time. This method was not effective as for-
eign objects on acquired images were detected as defects.
A particular rail surface defect type called squats is usually caused by the Rolling
Contact Fatigue (RCF). Gao et al. [27] made use of three different data sources
consisting of ultrasonic, eddy current and rail surface images to detect squats more
reliably. Features extracted from three data sources were grouped using a cluster-
ing algorithm and fed into the SVM classifier trained to detect squats. This method
lacked accuracy, and the feature extraction process was slow. Jiang et al. [28] com-
bined the laser ultrasonic technology and hybrid intelligent method to achieve fast
classification and evaluation of RCF in different depths. The ultrasonic scanning sys-
tems collected data samples from different locations of the defects. Their method
used Wavelet Packet Transform (WPT) to decompose the signal of the defects in dif-
ferent depths, KPCA to eliminate the redundancy of the original feature set, and the SVM
classifier to classify defects in different depths.
Zhang et al. [29] proposed an automatic railway visual detection system that de-
tects surface defects like squats, spalling and flaking. The authors used the vertical
projection and grey contrast algorithms to extract the rail from the background im-
age. A curvature filter was also applied on the extracted rails to eliminate the noise
and keep only the essential details. The detection of surface defects was achieved
using the Gaussian Mixture Model (GMM) as a segmentation technique based on the
Markov Random Field (MRF). This method allowed for the extraction of the rail even in challenging backgrounds. However, its main limitation is the assumption that the pixels in an image are independently distributed and that the prior distribution of the GMM is independent of the spatial relationship between the pixels and their neighbours. Thus, the GMM was more susceptible to noise and illumination
changes.
Grace and Rao [30] performed an analytical study of real-time rail surface defect
prediction using three machine learning classification algorithms, namely Neural Networks, Decision Trees and Random Forest. The algorithms were trained and vali-
dated using the dataset collected by the Train Recording Car (TRC). The experimental
results showed that the Decision Trees classifier outperforms the other classification
algorithms to classify low-risk and high-risk surface defects. Even though the classi-
fication accuracy achieved is impressive, the downside of this study lies in the defect detection stage: some defects could not be detected at high TRC speeds, so improvements are needed.
A method for inspecting weld defects in welded rails was proposed by Nunez et al.
[31] based on the Axle Box Acceleration (ABA) measurements data. ABA measures
the vibrations induced by the wheel-rail interaction and indicates an irregularity in the rail based on the measured wheel-rail interaction data. The authors used a Hilbert-based approach to process, detect and assess the quality of the weld based on
numerous registered dynamic responses in ABA. The obtained results were, however,
dependent solely on the dynamic responses from ABA. Furthermore, the acceleration
data used corresponded widely to the vibrations induced by the track irregularities.
An improved method to predict weld defects and classify the track conditions from
the predicted results was proposed by Yao and Tao [32]. The authors extracted a wide
range of features, including manufacturing technologies of welds, related materials,
influential environment factors, and welding engineers’ marks. These features were
then used to train the machine learning classification algorithms including the SVM,
Random Forest and Logistic Regression. However, their method does not detect and classify different rail welding defects; it only detects the presence of welding defects on rails, and based on the predicted results the track is classified as safe or at risk.
Shang et al. [34] proposed a two-stage method for rail inspection using image
segmentation and CNN. Their method was designed specifically for two objectives;
to extract the rail surface from the background and classify the rail as defective or
defect-less. The rail surface was extracted using the Canny edge operator to detect
edges as the boundaries. Subsequently, the rail classification as defective or defect-
less was achieved using CNN based on the inception-v3 pre-trained model. This
method achieved great classification results; however, it was implemented for a bi-
nary classification task. Additionally, the Canny edge operator did not guarantee
successful detection of edges in every image.
Roohi et al. [35] developed a Deep Convolutional Neural Networks (DCNN) frame-
work to automatically detect and classify four classes of rail defects, namely, weld-
ing defects, light squats, moderate squats, and severe squats. The authors claimed
that feature extraction using DCNN is more robust and accurate than the traditional
feature extraction methods used on a large dataset. Their framework comprised
three convolutional layers, three max-pooling layers, and three fully-connected lay-
ers. Subsequently, the hyperbolic tangent (Tanh) function and the rectified linear
unit (ReLU) were used as activation functions. The classification accuracy achieved
was impressive but could be better with hyperparameter tuning of parameters such
as the learning rate and optimiser. Furthermore, their framework does not detect
and classify different types of welding defects.
The method proposed by Jamshid et al. [36] detected squats and predicted their growth based on the video images and ultrasonic measurement data. The ultrasonic
measurement data was used to derive the general characteristics of the squats, and
the video image data was used to analyse the growth of the visual length of defects.
As an improvement to their previous method in [26], where the SVM classifier was used to classify fastener defects, Gibert et al. [11] trained a CNN pipeline based
on five convolutional layers to classify the condition of the fastener as good, miss-
ing, or defective. To make their pipeline more robust against unusual situations, the
authors used image augmentation and resampling techniques to add more “hard to classify” images to their training dataset.
Yanan et al. [37] developed a rail surface defect detection method using the YOLO-
v3 deep learning network. Grey scale input images were initially divided into equal
cells, and within each cell, the height, width and centre coordinates of the defects were calculated using the dimensional clustering method. The authors further used
a logistic regression algorithm to calculate the bounding box score; meanwhile, the
predictions of defect class that the bounding box contains were achieved using the
binary cross-entropy loss function. However, the classification results were not im-
pressive, and the learning rate was set to a high value. A high learning rate allows a model to learn faster at the cost of converging to a sub-optimal solution.
Recurrent Neural Networks (RNN) are another example of deep learning algorithms
commonly used for sequential and time-series tasks. Long Short-Term Memory (LSTM)
networks are a particular case of RNN, and they can handle the vanishing gradient
problem of the standard RNN. Xu et al. [38] developed an LSTM model to detect and
classify defective and non-defective rail surface based on the ultrasonic measurement
data. The pulse sequence from the ultrasonic data was interpreted as the sequential
task in the LSTM architecture. The LSTM memory cell was used to establish the sur-
face defect classification pipeline.
Song et al. [39] conducted a comparative study to detect and classify the severity of
rail shelling defects. The dataset used to conduct the experiment included images of
four levels of rail shelling defects ranging from low risk to high risk. The authors com-
pared two pre-trained CNN models, namely the Residual Neural Networks (ResNet)
and the VGG-16 network, as well as the approaches based on the manually extracted
features, including the HoG descriptor with SVM classifier. The authors presented the
results in terms of computation cost and classification accuracy. Their experimental
results showed that the ResNet model takes less computational cost and achieves the
highest overall classification accuracy. Table 2.1 illustrates an overview of the current
publications of rail defect classification using machine learning techniques.
Table 2.1: Recent publications on rail defect classification using machine learning
techniques
As shown in Table 2.1, most researchers have applied machine learning tech-
niques specifically on the detection and classification of rail surface defects. Although
the condition monitoring of weld joint based on machine learning has been studied
by several researchers [34, 40, 41], a limited amount of research work can be found
on the specific subject of the detection and classification of rail thermite weld de-
fects using image processing and machine learning. Furthermore, Table 2.1 shows that deep learning has been the most extensively used technique for detecting and classifying rail defects; however, deep learning algorithms require an extensive amount of data. Given that the dataset available to this work is limited, the following section reviews recent applications of image processing and shallow learning techniques for the detection and classification of welding defects in radiography images from other industries that also rely on radiography testing.
2.4 Classification of Defects in Radiography Images

2.4.1 Image Enhancement
In the research work conducted by Roumen et al. [42] for automatic detection of
defects in radiography images, the authors used a two-dimensional adaptive filter and a two-dimensional linear filter for noise suppression and correction of an uneven background. Their method enhances the images at a lower computational cost than other Fast Fourier Transform-based filters. Mohamad and Halim [43] applied a circular
average filter and logarithmic intensity contrast to enhance radiography images con-
taining inclusions and porosity defects. A comparative study by Maher et al. [44]
compared the ideal, Butterworth and Gaussian high pass filters for noise removal and
enhancement of radiography images consisting of the lack of penetration and poros-
ity defects. Through experimentation, the authors proved that Gaussian high pass
filter provides a smooth transition between various bands of pixels intensity values.
Radiography images usually have low contrast and, in most cases, improvement is
achieved by manipulating the contrast of the image. The original radiography im-
age has its grey level distribution highly skewed to the darker side of its histogram.
Tokhy et al. [46] applied contrast stretching and normalisation algorithms to en-
hance radiography images containing welding defects. In this method, the images
were first normalised using low and high threshold values, then the value closest to
the maximum and minimum values was computed and finally, the contrast stretch-
ing algorithm was applied according to the determined range of contrast values. In
the study proposed by Abidin et al. [47], three pre-processing techniques for image
enhancement and noise removal were applied, these included noise removal by the
median filter, image enhancement by contrast stretching and image sharpening with
Laplacian filter.
In the research work proposed by Zahran et al. [48], the authors suggested using
the median filter and Wiener filter to remove noise prior to enhancing radiography
images with Global Histogram Equalisation (GHE). However, enhancing images us-
ing GHE as outlined in [49] is not ideal since it assumes no illumination changes
in foreground and background image objects. Additionally, for images where there
is a change in illumination, the GHE mapping gives unwanted results such as over
enhancement of intensity values with high probability values.
Another comparative study compared contrast stretching, GHE, and Adaptive Histogram Equalisation (AHE) for the enhancement of radiography im-
ages. The authors presented the results in terms of peak signal to noise ratio and
mean squared error. The experimental results proved that AHE outperforms the GHE
while the contrast stretching achieved the worst results. However, Zhihong et al.
[49] states that AHE techniques tend to over enhance local noise content since en-
hancement is carried out in local image regions.
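Because plain AHE can over-amplify local noise, its contrast-limited variant, CLAHE, clips each local histogram before equalisation; CLAHE is the enhancement method adopted later in this work. The following is a minimal sketch assuming Python with OpenCV; the file name, clip limit and tile size are illustrative placeholders rather than the settings used in the experiments.

```python
import cv2

# Load a radiograph as 8-bit grayscale (the path is a placeholder).
img = cv2.imread("weld_radiograph.png", cv2.IMREAD_GRAYSCALE)

# CLAHE: equalise the histogram over small tiles, clipping each local
# histogram at clipLimit so that noise is not over-amplified.
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
enhanced = clahe.apply(img)

cv2.imwrite("weld_clahe.png", enhanced)
```

The clipLimit argument plays the role of the clip factor whose effect on enhancement is examined in Chapter 4 (Figure 4.5).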
2.4.2 Image Segmentation and Weld Joint Extraction

In edge-based segmentation, detected edges are connected around the RoI, and together they define the weld joint boundary. Most edge-based segmen-
tation techniques rely on the computation of first and second-order image operators
for edge identification. For instance, the work proposed by Carasco and Merry [55]
relied on the Canny edge operator to detect edges and segment the weld joint image
consisting of steel manufacturing welding defects. The segmented weld joint images
were compared to an ideal binary image developed manually by experts. Mirzaei et
al. [56] compared the Sobel, Canny and Gaussian filter for segmentation of weld
joint on the welding images database. Although there was no significant difference
in the results obtained, the Gaussian filter yielded better detection of edges.
According to [56], first-order derivative operators such as the Sobel edge detector
are sensitive to noise and double edge formation. Thus, additional processing tech-
niques are required for effective edge detection. As outlined in [57], the Canny edge
operator achieves good signal to noise ratio compared to first-order derivative oper-
ators. Additionally, non-maxima suppression means the weak edges are eliminated,
and thus the formation of double edges is minimised. However, the Canny edge oper-
ator requires much time to run due to complex computation. Another segmentation
technique which relies on the detection of edges is the edge-based Active Contour
Models (ACM).
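The contrast between first-order operators and the Canny detector can be illustrated with a short sketch, assumed here in Python with OpenCV; the image path is a placeholder and the Canny thresholds are illustrative values that must be tuned per dataset.

```python
import cv2
import numpy as np

img = cv2.imread("weld_radiograph.png", cv2.IMREAD_GRAYSCALE)  # placeholder

# First-order Sobel gradients: sensitive to noise and prone to thick or
# double edges, so further processing is usually required.
gx = cv2.Sobel(img, cv2.CV_64F, 1, 0, ksize=3)
gy = cv2.Sobel(img, cv2.CV_64F, 0, 1, ksize=3)
sobel_edges = cv2.convertScaleAbs(np.hypot(gx, gy))

# Canny: smoothing, gradient computation, non-maxima suppression and
# hysteresis thresholding yield thin, connected edges.
canny_edges = cv2.Canny(img, threshold1=50, threshold2=150)
```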
Image segmentation using ACM is one of the most successful and widely used techniques in image processing. ACM provides an attractive method of segmenting images since it always produces sub-regions with continuous boundaries, contrary to the first and second-order derivative operators, which often produce discontinuous boundaries. The original ACM, known as the snake model, was formulated by Kass et al. [58]. It is a method of surrounding the RoI boundary in an image with a ‘snake-like’ closed contour; the closed contour then dynamically adapts to the edges of the RoI in the image under the influence of internal forces, image forces and external constraint forces.
In their work for unsupervised welding defect classification based on Gaussian mix-
ture models (GMM) and exact shape parameters, Nacereddine et al. [59] used the snake ACM for two primary objectives: weld joint segmentation and defect segmentation.
The authors outlined the advantages of snake ACM models in terms of segmenting
objects with irregular shapes. Another unsupervised classification of welding defects
based on GMM and shape parameters proposed by Zhang et al. [29] made use of
snake ACM for weld joint segmentation. Image denoising techniques, including the
curvature filter, were initially applied to original images to minimise noise. Despite
being a widely used image segmentation method, the snake ACM is sensitive to the initial contour position and shape. For example, an initial contour should be positioned near the RoI to minimise the computation time. Another significant disadvantage of the snake model is its inability to handle changes in topology.
In contrast to the edge-based segmentation methods where edges are first identified,
region-based segmentation takes the opposite approach, beginning from the inside of the region and growing outward until the weld joint boundaries are encountered.
The region-based segmentation techniques are considered to be more advantageous
than edge-based since they consider regions area rather than local properties such
as gradients. Thresholding is one of the most straightforward and fundamental region-based segmentation techniques in image processing [60, 61, 62]. Thresholding
relies on a fundamental fact that the dynamic range of pixel values between the RoI
and the background is different. The output of this segmentation technique is a bi-
nary image with RoI represented by a white region and the background represented
by a dark region.
Mouhmadi et al. [63] extracted the weld joint from the background images contain-
ing welding defects in steel pipes using global thresholding. The authors addressed
the issue of low contrast and noise by applying image enhancement techniques to the
acquired images. Several other applications of global thresholding for weld joint ex-
traction in radiography images can be found in [64, 65]. However, the most common
disadvantage of global thresholding methods for weld joint segmentation is that these
methods assumes the acquired images only consist of a bimodal histogram. There-
fore such methods are generally not ideal for images with a non-uniform background
where there are variations in illumination.
Most researchers have focused their attention on the local thresholding techniques for
weld joint segmentation. In local thresholding, an image is divided into sub-regions,
and within each sub-region, a fixed value for separating the foreground and back-
ground is determined. The study conducted by Nacereddine et al. [59] compared
the global thresholding to the local thresholding techniques for weld joint segmen-
tation of radiography images. The results obtained from their study indicated that
global thresholding yields good results on images with good contrast. For images
with non-uniform background, local thresholding was recommended.
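A minimal sketch of the two thresholding strategies, assuming Python with OpenCV and a placeholder image path, is shown below: Otsu's method picks one global threshold from the histogram, while adaptive thresholding computes a threshold per neighbourhood.

```python
import cv2

img = cv2.imread("weld_radiograph.png", cv2.IMREAD_GRAYSCALE)  # placeholder

# Global thresholding: one Otsu threshold for the whole image; adequate
# only when the grey-level histogram is roughly bimodal.
_, global_bw = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Local (adaptive) thresholding: a separate threshold per 51x51
# neighbourhood; more robust to uneven illumination in radiographs.
local_bw = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                 cv2.THRESH_BINARY, 51, 2)
```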
The notable disadvantage of thresholding is that the obtained binary image cannot
be exploited immediately because of superfluous information that must be removed.
Thus, the post-processing step is usually used after thresholding. The method pro-
posed by Mahmoudi et al. [66], which is an improvement to their previous work in [62], relied on local thresholding for weld joint segmentation. Morphological oper-
ations were used as a post-processing technique to remove residual spots and to fill
holes from the thresholded images. In [67], the method based on multiscale mor-
phology was applied to weld joint images segmented by the iterative Otsu threshold
algorithm.
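The post-processing chain just described can be sketched as follows; Python with OpenCV and SciPy is assumed, and the image path and kernel size are illustrative placeholders.

```python
import cv2
import numpy as np
from scipy.ndimage import binary_fill_holes

img = cv2.imread("weld_radiograph.png", cv2.IMREAD_GRAYSCALE)  # placeholder
_, bw = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
opened = cv2.morphologyEx(bw, cv2.MORPH_OPEN, kernel)      # remove small spots
closed = cv2.morphologyEx(opened, cv2.MORPH_CLOSE, kernel) # bridge small gaps

# Fill any remaining holes inside the segmented weld joint region.
filled = binary_fill_holes(closed > 0).astype(np.uint8) * 255
```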
In the Chan-Vese model, the mean intensities of the pixels inside and outside the curve are used to approximate the image by a smooth two-region representation. The Chan-Vese model is commonly known as the ACM without edges because it can detect objects whose boundaries are not represented by image gradients. Just like the edge-based ACM, this model min-
imises the energy until the desired boundaries are reached. However, the stopping
term is not necessarily dependent on the gradient information. The main advantage
of the Chan-Vese ACM is that contours can be split or merged together depending on
the topology changes.
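A minimal sketch of Chan-Vese segmentation follows, assuming Python with scikit-image; the image path and the length-penalty weight mu are placeholders, and this is an illustration rather than the implementation used in this work.

```python
from skimage import io, img_as_float
from skimage.segmentation import chan_vese

# Load the enhanced radiograph as a float grayscale image (placeholder path).
img = img_as_float(io.imread("weld_clahe.png", as_gray=True))

# Chan-Vese "ACM without edges": evolve a level-set contour so that the
# mean intensities inside and outside the contour best approximate the
# image; no gradient-based stopping term is required.
mask = chan_vese(img, mu=0.25)  # mu weighs the contour-length penalty

print(f"Segmented region covers {mask.mean():.1%} of the image")
```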
Gharsallah and Braiek [70] used a level set ACM to segment welding defects in im-
ages characterised by uneven illumination and low contrast. The authors exploited
the saliency map as a feature representing image pixels embedded into a region en-
ergy minimisation function. The saliency map is said to be able to represent small de-
fects even in images with low contrast. The results obtained by the authors indicated that the level set ACM is more robust for segmenting images with challenging contrast and background, and performs better than edge-based segmentation
methods. Boutiche et al. [71] segmented weld joint and welding defects using the
Chan-Vese ACM, curve evolution and binary level set methods. Their method aimed
to extract defects in radiography images with uneven illumination and calculate the
defect parameters for another application.
2.4.3 Feature Extraction and Classification
Low-level visual feature extraction techniques extract visual properties from
certain regions of the image via pixel-level operation. The extracted features are
commonly referred to as global or local, according to the relative area of those re-
gions. A global feature is computed by considering the entire image, and it reflects
the global characteristics of the image. In contrast, a local feature is computed over
a small region of the image. This section reviews the welding defects detection and
classification methods based on global and local feature extraction techniques.
The methods based on geometric features and texture features have been the most
commonly used global feature extractors for extracting defect features in radiogra-
phy images. The geometric features describe the shape, size, location, and intensity
information of the welding defects, while texture features provide important visual
information. Mekhalfa et al. [72] applied the SVM and Multi-Layer Perceptron (MLP) classifiers to four weld-
ing defect types, namely solid inclusions, porosity, lack of penetration, and crack.
The authors first applied histogram equalisation techniques to improve the images’
quality before extracting a set of geometric features derived from the geometrical
defect parameters. In their study, the SVM provided higher accuracy and a faster computational time than the MLP classifier. The work proposed by Valavanis and
Kosmopoulos [73] made use of geometric and texture features for a multiclass weld-
ing defect classification pipeline. The authors compared the SVM, Neural networks
and K-NN classifiers; the SVM classifier achieved the highest overall classification ac-
curacy in this work.
A method for automatic detection of weld defects from radiography images based
on the SVM classifier was proposed by Shao et al. [74]. Three types of global fea-
tures were extracted, including defect area, average grey scale difference and grey
scale standard deviation. These extracted features were then used as inputs to a clas-
sifier to distinguish non-defective images from defective ones. The results showed that the proposed method could reduce the rate of undetected defects and false alarms compared to tra-
ditional defect detection methods. The method proposed by Hassan and Awan [75]
classified the welding defects using the geometric features and Artificial Neural net-
work (ANN). The extracted geometric features include the defect area, major axis,
minor axis, solidity and perimeter. The initial step involved enhancing the image
contrast using the histogram equalisation before segmenting the weld defect region
using global thresholding and morphological operations.
Silva et al. [76] extracted four geometric features, including position, aspect ratio,
and the roundness of various types of welding defects. Classification of the extracted
features was achieved using the ANN classifier. Their method proved that the quality
of the extracted features is more important than the quantity of the features. Her-
nandez et al. [77] extracted features describing the defect size, defect shape, defect
location, and intensity information. These features are similar to the features extracted in [75]. The defects classified included inclusions, porosity and longitudinal cracks. The classification of the extracted features was achieved using the adaptive network-based fuzzy inference system (ANFIS). A similar method based on geometric features and ANFIS to classify five types of welding defects was proposed in [78].
Global features are attractive because they produce a very compact representation
of images where each image corresponds to a single point in a high dimensional fea-
ture space. Furthermore, global features require less computational cost compared to
the requirements of local features. However, global features are not invariant to sig-
nificant image transformations and are sensitive to clutter and occlusion. As a result,
it is either assumed that an image contains only a single object or that good segmen-
tation of the object from the background is available. The approach to overcoming
these limitations, as stated by Ibrahim et al. [79], is to segment images into several regions, with each region representing a single object. However, image segmentation is a challenging task that requires a high-level understanding of the image content.
The limitations of global features are overcome by local features, which capture interesting characteristics of the image content despite significant changes in illumination, occlusion, viewpoint and clutter, without requiring the image to be segmented. A local feature is computed over a relatively small region of the image. It is defined as a
pattern in an image that differs from its immediate neighbourhood. A local feature
in an image content can be points, edges, corners or small image patches [67]. Two
types of local descriptors are found in the literature, keypoint based descriptors and
grid sampling-based descriptors. Keypoints are points such as corners and blobs, and
their shape, scale and position are found using a feature detector. On the other hand,
Grid sampling descriptors consist of patches of fixed size and shape placed on a reg-
ular grid across an image. This section reviews the classification of welding defects
conducted by several researchers based on local feature extraction techniques.
The Histogram of Oriented Gradients (HoG) descriptor is a grid sampling-based descriptor which has been very successful in facial recognition tasks [80, 81, 82]; it is invariant to illumination changes. Gao
et al. [83] proposed a method for automatic detection and classification of welding
defects in heating panels. The authors used the HoG descriptor for feature extraction
while the kernel-based SVM was used as a classifier. However, their method was not
suitable for rotated images. Liu et al. [84] proposed a more elaborate weld
defect classification pipeline based on CNN with three fully connected layers. Fea-
tures in the second fully connected layer were extracted using the HoG descriptor.
The authors then used ensemble methods to classify features in the last fully con-
nected layer of the CNN. The pipeline gave good accuracy at the expense of large
data requirements.
Local Binary Patterns (LBP) descriptor is one of the widely used grid sampling-based
descriptors for extracting local features in images. It is invariant to illumination
changes and rotation. The work proposed by Moghaddam et al. [85] compared the
linear SVM and the K-NN classifiers to classify weld defects features extracted using
the LBP descriptor. The authors considered three types of welding defects: lack of
penetration, lack of fusion and external undercut. The first step was to improve the
contrast of images using the two-dimensional filter before performing feature extrac-
tion. The K-NN outperformed the linear SVM classifier by a significant margin in
terms of classification accuracy.
Mery et al. [86] conducted an empirical study to automatically detect weld de-
fects in a large dataset of automotive radiography images. The authors compared
24 computer vision techniques, including deep learning, sparse representation and
local descriptors. The experiments conducted by the authors showed that the best
performance is achieved by the combination of the LBP descriptor and a linear SVM
classifier. Moreover, the application of the LBP descriptor and the SVM classifier to
detect and classify weld defects in radiography images can be found in [87, 88].
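As an illustration of the LBP descriptor discussed above, the sketch below computes uniform LBP codes and a normalised histogram; Python with scikit-image is assumed, and the image path and neighbourhood parameters are illustrative. In practice, as with the cell sizes compared in Chapter 4, the image is divided into cells and the per-cell histograms are concatenated into one feature vector.

```python
import numpy as np
from skimage import io
from skimage.util import img_as_ubyte
from skimage.feature import local_binary_pattern

img = img_as_ubyte(io.imread("weld_joint_roi.png", as_gray=True))  # placeholder

# Uniform LBP: each pixel's P neighbours on a circle of radius R are
# thresholded against the centre pixel and encoded as a binary pattern.
P, R = 8, 1
lbp = local_binary_pattern(img, P, R, method="uniform")

# Texture descriptor: the normalised histogram of LBP codes
# (the "uniform" mapping yields P + 2 distinct codes).
n_bins = P + 2
hist, _ = np.histogram(lbp, bins=n_bins, range=(0, n_bins), density=True)
```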
Feature extraction using keypoint-based descriptors involves two main steps: keypoint detection and keypoint description. Keypoint detection aims to find interesting information, or keypoints, in the image that are invariant to a wide variety of image transformations.
A method based on the Scale-Invariant Feature Transform (SIFT) descriptor and the SVM classifier was proposed in [5] to detect and classify steel defects. Due to the multiple features generated per image, the authors suggested feature reduction by a voting strategy based on one-versus-all multiple classifiers. The SIFT descriptor yields a 128-dimensional feature vector describing the local content around every detected keypoint. Keypoints in SIFT are detected using the Difference of Gaussians (DoG) generated from an image pyramid.
On the other hand, the SURF descriptor was implemented to improve SIFT in terms
of reduced feature vector length and faster detection of keypoints [89]. Kalai et al.
[6] detected and classified the slag inclusions, porosity, lack of fusion and incomplete
penetration defects on steel welding images. Features of these defects were extracted
using the SURF descriptor, while an Auto-Encoder Classifier (AEC) was employed for classification. The AEC was analysed for weld image classification using different numbers of neurons in the hidden layers.
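Keypoint detection and description can be sketched as follows in Python with OpenCV. Note that SURF is patented and only present in opencv-contrib builds with the non-free modules enabled, so the sketch falls back to SIFT, whose API is comparable, when SURF is unavailable; the image path and Hessian threshold are placeholders.

```python
import cv2

img = cv2.imread("weld_joint_roi.png", cv2.IMREAD_GRAYSCALE)  # placeholder

# SURF requires a non-free opencv-contrib build; fall back to SIFT
# (bundled with modern OpenCV) if it is unavailable.
try:
    detector = cv2.xfeatures2d.SURF_create(hessianThreshold=400)  # 64-D
except (AttributeError, cv2.error):
    detector = cv2.SIFT_create()  # 128-D

keypoints, descriptors = detector.detectAndCompute(img, None)
print(len(keypoints), "keypoints,", descriptors.shape[1], "dims each")
```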
Despite achieving great results in many applications and being robust and invariant to many image transformations, keypoint-based descriptors such as SIFT and SURF represent a single image by many feature vectors. This yields a high-
dimensional feature space; thus, the computational cost is high, and the classification
results are affected by outliers. This is because keypoint vectors could be classified
as belonging to a different class label even though they came from the same image.
2.5 Conclusion
In Section 2.2, an overview of machine learning techniques was introduced, and it
was noted that machine learning techniques could be divided into supervised learn-
ing, unsupervised learning and reinforcement learning. Supervised learning learns
from the labelled dataset, while unsupervised learning finds interesting character-
istics in the unlabelled dataset. In reinforcement learning, the algorithm is provided with a score that indicates how good or bad its predictions are. It was further noted that machine learning algorithms could be divided
into shallow learning algorithms and deep learning algorithms. Deep learning algo-
rithms learn the features directly from raw data, while shallow learning algorithms
rely on manually extracted features.
In Section 2.3, the recent applications of machine learning algorithms for the de-
tection and classification of rail defects were investigated. The applications included
methods based on shallow learning algorithms and deep learning algorithms, and the
summary of the results obtained were presented in Table 2.1. It was observed that
both shallow and deep learning algorithms had been used widely for the detection
and classification of rail surface defects and very little work was found on the rail
thermite welding defects. Thus, a further investigation of some related work in other
industries that use radiography to detect and classify welding defects was conducted
in Section 2.4. Given that the dataset presented in this work is limited, Section 2.4 only investigated the methods based on image processing and shallow learning techniques.
The techniques investigated for weld joint extraction included edge-based segmen-
tation techniques and region-based segmentation techniques. It was observed that
edge-based segmentation methods require the computation of edges in images using
the first and second-order derivative operators. However, this could be a challenge
for cases where the detection of edges in images is not feasible. The region-based seg-
mentation on the other hand like the thresholding technique only uses the statistical
information of background and foreground objects for segmentation. Furthermore,
thresholding was found not suitable for segmenting images with uneven background.
Another ACM-based segmentation method investigated was the Chan-Vese ACM; this method was found to perform better than edge-based segmentation methods since it can segment images that undergo topological changes.
Two feature extraction methods were investigated; these were the global feature
extraction methods and local feature extraction methods. In this work, the feature
extraction method must at least satisfy the following image transformation requirements: invariance to illumination changes and rotation.
Two keypoint-based descriptors were also investigated: the SIFT and SURF descriptors. Both meet the requirements and are also invariant to scale; however, their major disadvantage, as found in the literature, is that they both represent an image by many feature vectors, and this is not ideal for training a classifier. Thus, in this work, a novel mid-level image representation method that aims to combine the keypoint-based features into a global image representation is proposed. The method is based on the Bag of Visual Words (BoVW) mid-level image representation approach. The SURF descriptor will be used for this purpose since it has a lower-dimensional feature vector and it detects keypoints much faster than the SIFT
descriptor. For feature classification, the SVM, K-NN and Naive Bayes classification
algorithms will be used for experiments.
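Pulling the pieces together, the following is a hedged end-to-end sketch of a BoVW pipeline of the kind proposed here: keypoint descriptors from all training images are clustered into a codebook, each image becomes a normalised codeword histogram, and an SVM is trained on those histograms. Python with OpenCV and scikit-learn is assumed; the file paths, labels and codebook size are placeholders rather than the dissertation's actual dataset or settings.

```python
import numpy as np
import cv2
from sklearn.cluster import KMeans
from sklearn.svm import SVC

# Placeholder dataset: paths to weld joint RoI images and class labels.
train_paths = ["weld_001.png", "weld_002.png"]  # hypothetical file names
train_labels = [0, 1]                            # hypothetical labels
K = 200  # codebook size; 200-2000 codewords are compared in Chapter 4

try:
    detector = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
except (AttributeError, cv2.error):
    detector = cv2.SIFT_create()  # fallback when SURF is unavailable

def descriptors_of(path):
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    _, desc = detector.detectAndCompute(img, None)
    return desc  # one row per detected keypoint

# 1) Codebook: cluster all training descriptors into K visual words.
all_desc = np.vstack([descriptors_of(p) for p in train_paths])
codebook = KMeans(n_clusters=K, random_state=0).fit(all_desc)

# 2) Mid-level representation: a normalised codeword histogram per image.
def bovw_histogram(path):
    words = codebook.predict(descriptors_of(path))
    hist = np.bincount(words, minlength=K).astype(float)
    return hist / hist.sum()

# 3) Train a shallow classifier on the fixed-length histograms.
X = np.array([bovw_histogram(p) for p in train_paths])
clf = SVC(kernel="rbf").fit(X, train_labels)
```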
Chapter 3
Materials and Methods
3.1 Introduction
This chapter's main objective is to provide a mathematical formulation of the methods used in this work to detect and classify thermite weld defects. As depicted in Figure
3.1, the methods are divided into thermite weld image enhancement, weld joint Re-
gion of Interest (RoI) extraction, feature extraction and feature classification. Images
are initially enhanced to improve their quality; thereafter, the weld joint is extracted
from the background of the enhanced images. Feature extraction is performed on the
weld joint to extract defect features. The extracted features are then used to train
and validate a classification algorithm. This chapter is structured as follows. First,
image enhancement and weld joint RoI methods are presented in Sections 3.2 and
3.3, respectively. Then the feature extraction methods to extract defect features are
outlined in Section 3.4. After that, the feature classification methods to classify the
considered defects are discussed in Section 3.5, and finally, Section 3.6 discusses the
evaluation methods. Section 3.7 concludes the chapter.
3.2 Image Enhancement

The collected thermite weld images are characterised by a low dynamic range of pixel intensity values, with the pixels skewed to either the right or the left of the histogram.
Thus, image enhancement techniques are required to improve the image quality such
that the dynamic range of pixels is evenly distributed across the entire histogram. As
discussed in Chapter 2, several image enhancement techniques have been used in the
literature to enhance radiography images. These are divided into contrast stretching
and Histogram equalisation. Contrast stretching enhances the quality of an image
by increasing the dynamic range of the pixels. It takes the narrow range of intensity
values in the normalised input image and produces a wide range of intensity values
in the processed image. The disadvantage of the contrast stretching technique is that
it is only confined to a linear transform function for mapping input values to out-
put values. Furthermore, it is based on point processing, and it does not consider
the overall appearance of the image [47]. Histogram equalisation offers more ad-
vantages than contrast stretching since the global appearance of the image can be
enhanced by manipulating its histogram. Therefore, histogram equalisation is used
in this work to determine a function that transforms an original image into an en-
hanced image. Histogram equalisation techniques can be divided into global-based
and adaptive-based approaches.
Global Histogram Equalisation (GHE) maps the histogram of an image I onto its entire range of grey values {g0, ..., gL−1} by using the Cumulative Distribution Function (CDF) as the transform function. The transform function T(gk) is defined using the function cdf(gk) as:

T(gk) = g0 + (gL−1 − g0) · cdf(gk)    (3.3)

Then the output image of the GHE is denoted by GI = (GI(x, y)), where GI(x, y) is expressed as:

GI(x, y) = T(I(x, y))    (3.4)
Figure 3.2: Image enhancement: (a) Original image and (b) Image enhanced using CLAHE.
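To make the enhancement step concrete, the following is a minimal MATLAB sketch of CLAHE using the built-in adapthisteq function; the file name, tile layout and clip limit are illustrative assumptions rather than the exact settings of this work.

% CLAHE enhancement sketch (Image Processing Toolbox); parameter values
% below are assumptions for illustration only.
I = imread('weld_joint.png');                 % hypothetical radiography image
if size(I, 3) == 3
    I = rgb2gray(I);                          % CLAHE operates on one channel
end
J = adapthisteq(I, 'NumTiles', [8 8], ...     % contextual regions (tiles)
                   'ClipLimit', 0.01);        % histogram clip factor
figure; imshowpair(I, J, 'montage');          % original vs enhanced image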
3.3 Region of Interest Extraction
Image segmentation using the Active Contour Model (ACM) is one of the most successful and widely used segmentation techniques for a variety of tasks in image processing [93, 94, 95]. The ACM provides an efficient way of using an energy function to drive the contour towards the object boundaries.
Parameterised Approaches
The snake model represents the contour as a parameterised curve C(s), s ∈ [0, 1], and evolves it by minimising an energy function composed of internal and external terms:

E(C) = ∫₀¹ (Eint(C(s)) + Eext(C(s))) ds    (3.6)

Where Eint represents the internal energy of the contour and Eext represents the external forces. Internal energy encourages the contour to conform to a known shape preference; it serves to impose piecewise smoothness constraints [96]. The internal energy at some point C(s) on the curve is defined as:

Eint(C(s)) = α(s)|C′(s)|² + β(s)|C″(s)|²    (3.7)

Where C′(s) is the first-order derivative, which makes the contour act as a membrane (elasticity), and C″(s) is the second-order derivative, which allows the contour to act as a thin plate (rigidity). α(s) and β(s) are user-defined parameters that control the relative importance of C′(s) and C″(s), respectively.
External forces attract the contour towards image features such as edges, lines and
texture. They can be interpreted as a gravitational pull towards edges in an image.
At a contour location C(s) in image I, the external force is calculated as:

Eext = −∫₀¹ ||∇I(C(s))||² ds    (3.8)
The snake model provides an attractive method of segmenting images since it produces sub-regions with continuous boundaries, contrary to the first and second-order derivative operators, which often produce discontinuous boundaries [97]. Despite being a widely used image segmentation method [98, 99, 100], the snake model is sensitive to the initial contour position and shape. For example, an initial contour should be positioned near the RoI to minimise the computational cost. Another significant disadvantage of the snake model is its inability to handle changes in topology [101].
Level Set Approaches
Level set ACMs were first introduced by Osher and Sethian [102]. The difference between the parametric ACM and the level set ACM is that the latter implements the contour via a variational level set method. The contour is represented implicitly by a function φ(x, y) called the level set function, where (x, y) is the pixel location in the image domain Ω. The contour C is defined as those pixels in Ω where the level set function is zero, expressed as:

C = {(x, y) ∈ Ω | φ(x, y) = 0}    (3.9)
The level set function can be interpreted as the distance function with respect to
the contour C. It is positive outside the contour, zero at the contour location and
negative inside the contour. Given that the contour C moves with speed F in the
normal direction, then the level set function φ(x, y) must satisfy the following level
set equation:
∂φ(x, y)/∂t = F |∇φ(x, y)|    (3.10)
Two alternative approaches for level set segmentation exist: the Geodesic ACM and the Chan-Vese ACM. In the Geodesic ACM, the gradient descent equation providing the speed F is derived in terms of the contour and then implemented using the level set equation; this construction yields the level set counterpart of the snake model. The energy function which must be minimised is defined as:
E(C) = ∫ g(C) dC    (3.11)
The above equation is minimum at the edges of the object and g is an edge indicator
function defined as:
g(I(x, y)) = 1 / (1 + |∇Iσ(x, y)|)    (3.12)
Where Iσ (x, y) is the smoothed image representing the spatial scale where the gradi-
ent is computed. The gradient descent equation providing the speed of the contour
in the normal direction is given as:
dC/dt = gκn + (∇g · n)n    (3.13)
Where κ is the local curvature of C and n is the outer normal. Implementing Equation
3.13 in terms of the level set Equation 3.10 gives the level set equation for Geodesic
ACM defined as:
∂φ(x, y)/∂t = g(I(x, y)) |∇φ(x, y)| div(∇φ(x, y)/|∇φ(x, y)|) + ∇g(I(x, y)) · ∇φ(x, y)    (3.14)
Another segmentation method based on the level set approach is the Chan-Vese ACM, proposed by Chan and Vese [69] as a level set formulation of the Mumford-Shah [103] segmentation model. The Mumford-Shah model can detect contours without relying on gradient information; for instance, objects with very smooth edges or non-connected edges can be segmented. Given an image with two regions Ω1 and Ω2 representing the foreground and background objects, respectively, the Heaviside step function is defined as:

H(φ(x, y)) = { 1, if φ(x, y) ≥ 0 ((x, y) ∈ Ω1)
             { 0, otherwise ((x, y) ∈ Ω2)    (3.15)
The Chan-Vese ACM is based on the Mumford-Shah model, and it segments the image by using the grey scale intensity information within regions as opposed to using the edge information. The Mumford-Shah energy function Ecv which must be minimised is defined as:

Ecv(h1, h2) = ∫_Ω1 (I(x, y) − h1)² dxdy + ∫_Ω2 (I(x, y) − h2)² dxdy + v|∂Ω1|    (3.16)

Where h1 and h2 are the mean intensity values inside and outside the contour, respectively; these are updated at each iteration, and v|∂Ω1| is the length of the boundary, used as a regularising term.
which is used as a regularising term. The Chan-Vese ACM in terms of the level set
function φ can be written as:
E(h1, h2, φ) = ∫_Ω ((I(x, y) − h1)² − (I(x, y) − h2)²) H(φ(x, y)) dxdy + ∫_Ω (I(x, y) − h2)² dxdy + v ∫_Ω |∇H(φ(x, y))| dxdy    (3.17)
The mean intensity values h1 and h2 inside and outside the evolving contour, respec-
tively are defined as:

h1(φ) = (∫_Ω I(x, y) H(φ(x, y)) dxdy) / (∫_Ω H(φ(x, y)) dxdy)    (3.18)

h2(φ) = (∫_Ω I(x, y) (1 − H(φ(x, y))) dxdy) / (∫_Ω (1 − H(φ(x, y))) dxdy)    (3.19)
The local minimisation of the Chan-Vese energy function is done by gradient descent. It is assumed that the Heaviside function is slightly smoothed to make it differentiable; its derivative is the smoothed delta function dH(φ)/dφ = δ(φ). The gradient descent equation is calculated as:

∂φ/∂t = δ(φ) [v div(∇φ/|∇φ|) + (I(x, y) − h2)² − (I(x, y) − h1)²]    (3.20)
Similar to the Geodesic ACM, the Chan-Vese ACM minimises the energy function until the RoI object boundaries are reached. However, this is achieved independently of the gradient information, relying instead on the statistical information of the background and foreground (RoI) regions. This allows the Chan-Vese ACM to segment even images characterised by noise and smooth edges. Another significant advantage of the Chan-Vese ACM is that contours can be broken into parts or joined together depending on the topology of the level set function. For these reasons, the Chan-Vese ACM is used in this work to segment and extract the weld joint as the RoI from the background of the thermite weld images.
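As a hedged illustration of this step, the sketch below uses MATLAB's built-in activecontour function with the 'Chan-Vese' method to segment an enhanced image and crop the largest region as the RoI; the initial mask, the iteration count and the file name are assumptions.

% Chan-Vese segmentation and RoI cropping sketch; mask placement and the
% 300-iteration budget are illustrative assumptions.
I = imread('weld_joint.png');                  % hypothetical enhanced image
if size(I, 3) == 3, I = rgb2gray(I); end
mask = false(size(I));
mask(20:end-20, 20:end-20) = true;             % initial contour near the RoI
bw = activecontour(I, mask, 300, 'Chan-Vese'); % evolve the level set function
stats = regionprops(bw, 'BoundingBox', 'Area');
[~, idx] = max([stats.Area]);                  % keep the largest segmented region
roi = imcrop(I, stats(idx).BoundingBox);       % cropped weld joint RoI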
Figure 3.3: Image segmentation: (a) Application of Chan-Vese ACM and (b) segmented image.

Figure 3.4: RoI extraction: (a) Image with bounding box and (b) Cropped image.
The mathematical morphology operation used in this work is dilation. Dilation adds pixels to the boundaries of objects in an image. In this work, dilation is used to add foreground pixels such that dark regions in the weld joint are eliminated (see Figure 3.5c). Dilation causes the white region (foreground pixels) to grow in size; thus, dark regions become smaller and smaller. Dilation takes two parameters as inputs: the first parameter is the input image to dilate, and the second parameter is the kernel. Let A be the set of input image coordinates, B the set of kernel coordinates and Bx the translation of B such that its origin is at x. Then the dilation of A by B is the set of all points x such that the intersection of A with Bx is not empty. This is mathematically defined as [105]:

A ⊕ B = {x | Bx ∩ A ≠ ∅}    (3.21)
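A short sketch of Equation 3.21 using imdilate is given below; the disk-shaped structuring element and its radius are assumptions, since the text does not fix the kernel.

% Dilation post-processing sketch; the structuring element is an assumption.
bw = imread('segmented_weld.png') > 0;   % hypothetical binary segmentation A
se = strel('disk', 5);                   % kernel B: disk of radius 5 pixels
bwDilated = imdilate(bw, se);            % A (+) B: grows the foreground so that
                                         % small dark regions are filled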
3.4 Feature Extraction
Global features are attractive because they produce a compact representation of images, where each image corresponds to a single point in a high dimensional feature space. Furthermore, global features are faster to compute and require less computational cost than local features. Some of the commonly used global feature extraction methods include the Histogram of Oriented Gradients (HoG) descriptor and the Grey Level Co-occurrence Matrix (GLCM).
Two local feature extraction techniques are empirically compared for extracting defect features in the weld joint images: the Local Binary Patterns (LBP) descriptor and the Speeded-Up Robust Features (SURF) descriptor. In this work, the feature extractor is required to extract illumination and rotation invariant features, and both the LBP and SURF descriptors meet these requirements. It should be mentioned that the output of the SURF descriptor is a set of highly discriminative keypoints, where many keypoint descriptor vectors represent each image; this makes it challenging to train a classifier, as the classification results will be impacted by outliers and the computational cost of training will be high. To address these challenges, the Bag of Visual Words (BoVW) approach is used in this work to cluster the keypoints into groups called visual words or codewords; every weld joint image is then represented by a global vector that counts the number of occurrences of each codeword in the image.
For a centre pixel c with P circular symmetric neighbourhood pixels at radius R, the LBP code is computed as:

LBP(R,P) = Σ_{i=0}^{P−1} S(gi − gc) 2^i    (3.22)

Where gc represents the grey intensity value of c and gi represents the grey intensity values of the circular symmetric neighbourhood pixels of c. The sign function S() ensures that the LBP descriptor is invariant to illumination change; it is defined as:

S(x) = { 1, if x ≥ 0
       { 0, if x < 0    (3.23)
The binary number generated is converted into a decimal number which forms the
LBP code for the given centre pixel. Figure 3.6 illustrates the generation of the LBP
code for the centre pixel highlighted by a red colour.
Assuming that the cell in Figure 3.6 has dimensions N × M, the LBP code is computed for every pixel, and the cell is characterised by the distribution of codes, representing the cell as the LBP histogram vector defined as:

z(k) = Σ_{i=1}^{N} Σ_{j=1}^{M} g(LBP(R,P)(i, j) − k)    (3.24)
The original LBP descriptor of Equation 3.22 is not invariant to rotation, as it produces 2^P different binary codes from the neighbouring pixels; for a rotated image, each neighbourhood pixel moves accordingly along the circle's perimeter, yielding a different LBP value. A rotation invariant LBP descriptor is achieved by grouping together the LBP patterns that are rotated versions of the same pattern. The rotation invariant LBP descriptor is formally defined as:

LBP^ri_(R,P) = min{ROR(LBP(R,P), i) | i = 0, 1, ..., P − 1}    (3.25)

Where the function ROR(x, i) performs a circular bitwise right shift of i steps on the pattern binary string x; the minimum of the resulting values is then selected. Keeping only the rotation invariant patterns reduces the feature dimensionality, but the number of LBP codes still increases drastically with an increase in P.
The extended LBP descriptor [109] uses uniform patterns to reduce the number of
LBP codes. Uniform patterns have been experimentally shown to occur more fre-
quently in texture images than non-uniform patterns. A pattern is said to be uniform
if it contains at most two bitwise transitions from 0 to 1 or vice versa when the binary
string is considered circular. For example, 0001000 is a uniform pattern because it
has two transitions, while 0101010 is not a uniform pattern because it has 6 transi-
tions. To distinguish between the uniform and non-uniform patterns, the uniformity measure U is introduced; it counts the number of spatial transitions (bitwise 0/1 changes) between successive bits in the circular representation of the pattern binary code. U is defined as:

U(LBP(R,P)) = |S(g_{P−1} − gc) − S(g0 − gc)| + Σ_{i=1}^{P−1} |S(gi − gc) − S(g_{i−1} − gc)|    (3.26)
All patterns with U > 2 (more than two spatial transitions) are called non-uniform; otherwise, patterns are called uniform. The modified rotation invariant uniform LBP descriptor is then defined as:

LBP^riu_(R,P) = { Σ_{i=0}^{P−1} S(gi − gc), if U(LBP(R,P)) ≤ 2
                { P + 1,                    otherwise    (3.27)
Parameter Selection
The LBP descriptor has several parameters, some of which require fine-tuning to achieve the best feature classification results. Parameters such as the number of neighbours, the radius and the cell size are usually the main parameters optimised for best results, depending on the type of LBP descriptor used [110, 111, 112]. In this work, the rotation invariant uniform LBP descriptor is used; thus the number of neighbours and the radius are kept at (8, 1), since the uniform patterns are found to occur most frequently at this combination [109]. Keeping the number of neighbours at 8 also avoids a long feature vector. The LBP parameter optimised here is the cell size. Table 3.1 lists the LBP parameters used in this work.
Table 3.1: LBP parameters used in this work

Parameter            | Value
---------------------|----------------------------------
Number of neighbours | 8
Radius               | 1
Cell size            | Optimal cell size to be evaluated
Number of bins       | 59
Normalisation        | L2-norm
Algorithm 3 gives the steps involved in extracting features from weld joint images using the LBP descriptor.
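A minimal MATLAB counterpart of Algorithm 3 could look as follows, assuming the Computer Vision Toolbox; extractLBPFeatures with 'Upright' set to false computes rotation invariant patterns, and the file name and cell size shown are assumptions (the cell size is just one of the values evaluated in this work).

% Rotation invariant uniform LBP extraction sketch.
roi = imread('weld_roi.png');                 % hypothetical cropped weld joint
if size(roi, 3) == 3, roi = rgb2gray(roi); end
lbpVec = extractLBPFeatures(roi, ...
    'NumNeighbors', 8, 'Radius', 1, ...       % (P, R) = (8, 1) as in Table 3.1
    'Upright', false, ...                     % rotation invariant patterns
    'CellSize', [6 14], ...                   % spatial scale under evaluation
    'Normalization', 'L2');                   % L2-normalised histogram vector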
Keypoint Detection
The SURF descriptor uses the Hessian matrix to determine the location and scale of the potential keypoints. For a given pixel (x, y) in image I, the Hessian matrix is calculated as:

H(I(x, y)) = [ ∂²I/∂x²    ∂²I/∂x∂y
               ∂²I/∂x∂y   ∂²I/∂y²  ]    (3.28)

In scale space, the matrix is built at a point c and scale σ from Lxx(c, σ), Lyy(c, σ) and Lxy(c, σ), which are the convolutions of the Gaussian second-order derivatives with image I at point c in the x, y and xy directions, respectively. The determinant of the Hessian matrix at this location is then approximated as [113]:

det(Happrox) = Dxx Dyy − (0.9 Dxy)²    (3.30)
Where Dxx , Dyy and Dxy are the approximations of the Gaussian second-order deriva-
tives in x, y and xy directions respectively. The SURF algorithm uses responses of box
filters to approximate these three derivatives in respective directions. Three such box
filters are depicted in Figure 3.8.
Figure 3.8: Three box filter approximations of the second-order derivatives of Gaussian filters [3]
The box filter responses are computed using the integral images. In an integral
image, the value of any pixel (x, y) is the sum of all the pixel values above and to the
left of the same pixel location in the original image. The integral image Iint computed
from image I can be calculated as [114]:
Iint(x, y) = Σ_{x′≤x, y′≤y} I(x′, y′)    (3.31)
The concept of integral images allows for quick and efficient computation of box filter responses. For example, the response of the Dxx filter in Figure 3.8 is calculated by first computing the area enclosed by region A'B'C'D' and then subtracting the area enclosed by region ABCD. These area calculations can be carried out efficiently using integral images.
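The following short sketch illustrates Equation 3.31 and the four-corner box sum it enables; the image file and the box coordinates are assumptions.

% Integral image via a double cumulative sum (Equation 3.31).
I = double(imread('weld_roi.png'));          % hypothetical grayscale image
Iint = cumsum(cumsum(I, 1), 2);              % Iint(x,y) = sum of I(x',y'), x'<=x, y'<=y
% The sum over rows r1..r2 and columns c1..c2 (indices assumed > 1 and
% within the image) reduces to four look-ups:
r1 = 10; r2 = 40; c1 = 15; c2 = 60;          % an arbitrary illustrative box
boxSum = Iint(r2, c2) - Iint(r1-1, c2) - Iint(r2, c1-1) + Iint(r1-1, c1-1);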
Figure 3.9: The scale, the filter sizes and octaves in SURF [4]
Keypoint Localisation
The determinant of the Hessian matrix determines the potential keypoints, but some of them are weak and need to be eliminated; this is done in the keypoint localisation step.
Keypoint localisation is achieved in three respective stages; in the first stage, all
keypoints within an octave are tested against a fixed threshold value. Keypoints
above the threshold value are accepted and passed on to the second stage. Keypoints
below the threshold are discarded. The second stage is non-maximum suppression
in a 3 × 3 × 3 neighbourhood. In this stage, every keypoint is compared to its 26
neighbouring pixels, 9 in the scale below and above it and 8 in the current scale. A
keypoint is considered a strong keypoint if its value is higher or lower than all its
neighbours (see Figure 3.10).
The last step of keypoint localisation is to interpolate the nearby data to determine the position and scale of keypoints to sub-pixel accuracy. This is achieved by fitting a 3D quadratic function around the neighbourhood of each local extremum, whose peak value is selected as the sub-pixel and sub-scale location. The function is approximated by the Taylor expansion of the scale-space function D(x, y, σ) with the keypoint as the origin:

D(z) = D + (∂Dᵀ/∂z) z + (1/2) zᵀ (∂²D/∂z²) z    (3.32)
Where D and its derivatives are evaluated at the keypoint candidate z0 = [x0 , y0 , σ0 ]T
and offset point z = [δx, δy, δσ]T . The location of the extrema is then evaluated by
setting the derivative of Equation 3.32 to 0, yielding:
ẑ = −(∂²D/∂z²)⁻¹ (∂D/∂z)    (3.33)
Orientation Assignment
The output from the previous step is a set of scale-invariant keypoints localised to sub-pixel accuracy in terms of (x, y, σ). The orientation assignment step aims to achieve rotation invariant keypoints by assigning a reproducible orientation to each of them. This is done in two steps: first, a circular region of radius 6σ is taken around every keypoint, and within this region, Haar wavelet responses of size 4σ in the x and y directions are calculated. The obtained responses are then weighted using a Gaussian kernel centred at every keypoint and plotted as vector points in the x and y coordinates. In the second step, a window of angular size π/3 is rotated around a keypoint; the responses inside this window are summed up, and the most dominant result is assigned as the orientation of the keypoint. The orientation assignment step can be understood by referring to Figure 3.11.
Keypoint Description
This step constructs a square region centred at the keypoint and oriented along the dominant orientation. This region is divided into 4 × 4 sub-regions, and for each sub-region, Haar wavelet responses are calculated at 5 × 5 regularly spaced sample points. The x and y wavelet responses, denoted by dx and dy respectively, are calculated and summed up to give the first entries of the feature vector. Absolute values of the responses, |dx| and |dy|, are also added to the feature vector to capture information on the polarity of the intensity changes. Thus, for each sub-region, a four-dimensional vector is obtained:

v = (Σ dx, Σ dy, Σ |dx|, Σ |dy|)    (3.34)

Since there is a total of 16 sub-regions within the square region, the SURF descriptor of every keypoint is therefore a 64-dimensional feature vector.
Algorithm 4 presents the steps used to extract features using the SURF descriptor.
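As a hedged companion to Algorithm 4, the sketch below detects and describes SURF keypoints with the Computer Vision Toolbox; the MetricThreshold value mirrors the minHessian of 500 chosen later in the Parameter Selection step, and the file name is an assumption.

% SURF keypoint detection and 64-D description sketch.
roi = imread('weld_roi.png');                     % hypothetical weld joint RoI
if size(roi, 3) == 3, roi = rgb2gray(roi); end
points = detectSURFFeatures(roi, ...
    'MetricThreshold', 500);                      % Hessian-determinant threshold
[descriptors, validPoints] = extractFeatures(roi, points);
% descriptors is M-by-64: one row per localised, oriented keypoint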
Codebook Construction
In the codebook construction step, all the keypoint descriptor vectors from the training dataset are clustered together, and each cluster represents a codeword. Let V = {vj | j = 1, 2, ..., N} be a set of unordered SURF keypoint descriptors extracted from the training dataset, where vj ∈ IR^D is a keypoint descriptor vector and N is the total number of keypoint descriptors. In this work, the K-means clustering algorithm is used to construct the codebook. This is done by clustering the N keypoint descriptor vectors into K clusters. The output from K-means clustering is then a codebook defined as C = {ck | k = 1, 2, ..., K}, where ck ∈ IR^D is the mean vector of the k-th cluster.
Coding
The coding step aims to represent every image in the dataset in terms of the codebook elements (codewords). The coding step can be modelled using the function f defined as:

f : IR^D → IR^K, vj ↦ βj    (3.35)

where βj = (βk,j | k = 1, ..., K) maps a descriptor vector vj onto the closest codeword ck in the codebook according to the following hard coding equation:

βk,j = { 1, if k = arg min_{k∈{1,...,K}} ||vj − ck||₂²
       { 0, otherwise    (3.36)
Pooling
The final step in the BoVW approach is to construct a vector z that provides a global description of an image; this vector is a count of how many times each codeword appears in a given image. The idea of the pooling step is to sum the elements of the encoded descriptor vectors over all keypoints in an image. Thus, given an image with a total of n descriptors, the k-th component of vector z is calculated as:

zk = Σ_{j=1}^{n} βk,j    (3.37)
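The three BoSURF steps can be sketched together as follows; the random matrices stand in for real SURF descriptors, and the codebook size shown is the smallest of the values tried in this work.

% BoSURF sketch: K-means codebook (codebook construction), hard coding
% (Equation 3.36) and sum pooling (Equation 3.37). Random data stands in
% for real SURF descriptors.
allDescriptors = rand(5000, 64);             % stacked training descriptors
D = rand(120, 64);                           % descriptors of one image
K = 200;                                     % codebook size (one value tried)
[~, codebook] = kmeans(allDescriptors, K);   % codewords = cluster centroids
idx = knnsearch(codebook, D);                % nearest codeword per keypoint
z = accumarray(idx, 1, [K 1])';              % 1-by-K codeword count histogram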
Parameter Selection
Two important parameters of the BoVW approach with the SURF descriptor (hereafter referred to as the "BoSURF approach") require fine-tuning for optimal classification results: one in the keypoint detection and description step and one in the codebook construction step. In the keypoint detection and description step, the SURF descriptor computes the keypoints using the determinant of the Hessian matrix and then removes some by thresholding against a fixed Hessian determinant value minHessian. Though minHessian is based on heuristics, the optimal value has been found to be between 400 and 800 in several research studies [116, 117]; therefore, a minHessian of 500 is used in this work. The other parameter of interest is in the codebook construction step, where keypoints are clustered into K clusters and each cluster represents a codeword. The question of which K to use is also based on heuristics; therefore, different values of K ranging from 200 to 2000 are tested in this work for optimal results.
The steps for representing the weld joint images using the BoSURF approach are
illustrated by Algorithm 5.
3.5 Feature Classification
Three classification algorithms deemed effective for modelling small datasets are used in this work to address the aforementioned challenges [118, 119, 120]. These are the Support Vector Machines (SVM), the K-Nearest Neighbours (K-NN) and the Naive Bayes classifiers. This section therefore provides a detailed explanation of the mathematical approach and algorithm implementation of the considered classification algorithms for classifying thermite weld defects.
Mathematical Approach
For a binary classification task, let ((v1, y1), ..., (vn, yn)) be the training dataset, where vi are the feature vectors representing the samples and yi ∈ {−1, +1} are the corresponding class labels. With reference to Figure 3.13, the SVM is a learning algorithm that attempts to find the hyperplane that separates the positive samples (+1 labelled) from the negative samples (−1 labelled) with the largest margin, where w is a vector constrained to be perpendicular to the hyperplane, b is the bias term, and b/||w|| is the perpendicular distance from the origin to the hyperplane. The margin of the hyperplane is defined as the shortest distance between the positive and negative samples closest to it, which are known as the support vectors. For all the samples in the training dataset, the following constraints must be satisfied:

w · vi + b ≥ +1, for yi = +1    (3.38)
w · vi + b ≤ −1, for yi = −1    (3.39)

These can be combined into a single constraint:

yi (w · vi + b) − 1 ≥ 0, ∀i    (3.40)
Figure 3.13: A hyperplane that separates the negative and positive samples [7]
By referring to Figure 3.13, the margin which must then be maximised can be computed as the distance between the H1 and H2 planes:

d = |1 − b|/||w|| − |−1 − b|/||w|| = 2/||w||    (3.41)
Thus, maximising the margin for an optimal separating hyperplane is equivalent to solving the primal optimisation problem:

min (1/2)||w||²  subject to  yi (w · vi + b) ≥ +1 ∀i    (3.42)
To find the maxima or minima of the function without handling the constraints directly, the Lagrangian formulation is used. It introduces a new Lagrange multiplier αi for each constraint, and the minimisation problem of Equation 3.42 becomes:

L = (1/2)||w||² − Σ_{i=1}^{l} αi yi (w · vi + b) + Σ_{i=1}^{l} αi    (3.43)
Taking the partial derivatives of Equation 3.43 with respect to the vector w and the bias b yields:

w = Σ_i αi yi vi    (3.44)

Σ_i αi yi = 0    (3.45)
The expression of Equation 3.44 defines the vector w as the linear sum of some of the
samples in the dataset. Substituting Equation 3.44 and Equation 3.45 into Equation
3.43 gives the formulation of the dual SVM defined as:
L = Σ_i αi − (1/2) Σ_i Σ_j αi αj yi yj (vi · vj)  subject to  Σ_i αi yi = 0 and αi ≥ 0    (3.46)
By solving the dual optimisation problem, the coefficients αi are found. The samples
with αi > 0 are called the support vectors, and they lie on H1 and H2 hyperplanes.
Only the support vectors affect the solution of the SVM problem; hence only the
support vectors are needed to express the solution of the vector w. The decision rule
for the classification of a new, unseen sample represented by the vector z is therefore defined as:

f(z) = wᵀz + b    (3.47)
Which is equivalent to:

f(z) = Σ_{i=1}^{M} yi αi (vi · z) + b    (3.48)
The predicted class label of vector z is then determined by the sign of the decision function stated above. The formulation of the SVM classifier discussed so far assumes the training samples are linearly separable. In real-life classification tasks, however, the data is characterised by the presence of noise and outliers; thus, the data samples cannot always be separated linearly. The soft margin SVM tackles this problem by introducing slack variables ξi, which allow some samples to lie amongst samples of the opposite class. The primal optimisation problem of Equation 3.42, taking ξi into account, is then defined as:

min (1/2)||w||² + C Σ_i ξi  subject to  yi (w · vi + b) ≥ 1 − ξi, ξi ≥ 0 ∀i    (3.49)
Where C is a parameter that controls the misclassification error. Applying the Lagrangian formulation to Equation 3.49 and taking the partial derivatives with respect to w and b yields the dual formulation:

L = Σ_i αi − (1/2) Σ_i Σ_j αi αj yi yj (vi · vj)  subject to  Σ_i αi yi = 0 and 0 ≤ αi ≤ C    (3.50)
The formulation of the non-linear SVM is possible for cases where the samples are non-linearly separable. The main idea is to transform the samples into a high dimensional feature space χ where they can easily be separated. The transformation requires the dot product between any pair of samples to be computed in χ (i.e. φ(vi) · φ(vj)). This transformation is computationally expensive; thus, kernel functions are used. A kernel function K that corresponds to the dot product in χ is defined as K(vi, vj) = φ(vi) · φ(vj); thus, only K is needed for computing the dot product without mapping into the high dimensional feature space. The dual problem of Equation 3.50 then becomes:

L = Σ_i αi − (1/2) Σ_i Σ_j αi αj yi yj K(vi, vj)  subject to  Σ_i αi yi = 0 and 0 ≤ αi ≤ C    (3.51)
Some of the commonly used kernel functions are the linear, polynomial and Radial Basis Function (RBF) kernels. In many classification tasks, the linear and polynomial kernels have been found to require less computational cost, but they usually achieve lower classification accuracy compared to the RBF kernel [121, 122, 123]. The RBF is used as the kernel function in this work, and it is defined as:

K(vi, vj) = exp(−||vi − vj||² / (2σ²))    (3.52)
To achieve multi-class classification of thermite weld defects, the one-vs-one SVM classifier is used, where one pair of classes is trained at a time. Thus, a total of D(D − 1)/2 classifiers are obtained, where D is the total number of classes. An unknown feature vector is assigned a class label based on the majority vote.
Algorithm 6 gives the steps involved in the classification of weld joint images using
SVM.
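A hedged MATLAB sketch of this classifier follows; fitcecoc with 'onevsone' coding trains the D(D−1)/2 pairwise RBF SVMs described above, while the stand-in data, the kernel scale value and its exact correspondence to σ are assumptions.

% One-vs-one multi-class SVM with an RBF kernel (Statistics and Machine
% Learning Toolbox); X and Y are stand-ins for the real feature vectors.
X = rand(240, 1400); Y = repelem((1:4)', 60);   % 4 classes, 60 samples each
t = templateSVM('KernelFunction', 'rbf', ...
                'KernelScale', 4);              % kernel width (one value tried)
model = fitcecoc(X, Y, 'Learners', t, ...
                 'Coding', 'onevsone');         % 4*3/2 = 6 pairwise classifiers
pred = predict(model, rand(1, 1400));           % majority-vote class label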
Mathematical Approach
To explain the workings of K-NN, let ((v1, y1), ..., (vn, yn)) be the weld joint training dataset, where vi are feature vectors representing the training samples in a high dimensional feature space IR^m and yi are the class labels of the samples. The training phase of K-NN simply stores the training samples; when a query sample represented by the vector z from the validation data is presented, the distance between z and every training sample is calculated. The distance measure used in this work is the Euclidean distance, which for any two vectors vi and vj is defined as:
Euclidean distance, for any two vectors vi and vj it can be defined as:
v
u m
uX
d(vi , vj ) = t (ar (vi ) − ar (vj ))2 (3.53)
r=1
Then, the k samples (v1, v2, ..., vk) which are nearest to z are used to assign the class label of z according to:

y(z) ← arg max_{c∈C} Σ_{i=1}^{k} δ(c, y(vi))    (3.54)
Where y(z) is the class of sample z, c ∈ C is a class label, and δ(c, y(vi)) is equal to 1 if c is equal to y(vi) and 0 otherwise. One obvious disadvantage of assigning the class label based on the unweighted majority vote is that the nearest k samples may vary widely in their distance, while the closest neighbours more reliably indicate the class label of the query sample. For this reason, the weighted K-NN is used in this work.
In weighted K-NN, the contribution of each of the k nearest samples is weighted according to its distance from the query sample z, thus giving greater weight to the closest neighbours. By weighting the vote of each nearest sample, Equation 3.54 becomes:

y(z) ← arg max_{c∈C} Σ_{i=1}^{k} wi δ(c, y(vi))    (3.55)
Where wi is the weighting function. In this work, samples are weighted according to
their inverse squared distance from z defined as:
wi = 1 / d(z, vi)²    (3.56)
Algorithm 7 gives the steps involved in the classification of weld joint images using
the K-NN classifier.
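A minimal sketch of the weighted K-NN of Equations 3.53 to 3.56 is shown below; the stand-in data and the choice of k = 5 are assumptions.

% Weighted K-NN sketch: Euclidean distance (Equation 3.53) with inverse
% squared-distance vote weighting (Equation 3.56).
X = rand(240, 1400); Y = repelem((1:4)', 60);   % stand-in training data
model = fitcknn(X, Y, 'NumNeighbors', 5, ...    % k = 5 (one value tried)
    'Distance', 'euclidean', ...
    'DistanceWeight', 'squaredinverse');        % w_i = 1/d(z, v_i)^2
pred = predict(model, rand(1, 1400));           % label of a query sample z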
Mathematical Approach
The Naive Bayes classifier is based on Bayes' theorem. Given a sample represented by the feature vector z, the posterior probability that z belongs to class yi is:

P(yi|z) = P(z|yi) P(yi) / P(z)    (3.57)
Where P(yi|z) is the probability that sample z belongs to class yi, P(z|yi) is the probability of generating sample z given class yi, P(yi) is the prior probability of class yi, and P(z) is the probability of sample z occurring. Modelling P(z|yi) directly is impractical, given that z is a vector in a high dimensional feature space. Thus, in Naive Bayes, the individual components zk are assumed to be conditionally independent given the class. The numerator of Equation 3.57 then becomes:

P(z|yi) P(yi) = P(z1|yi) · P(z2|yi) · ... · P(zm|yi) · P(yi) = Π_{k=1}^{m} P(zk|yi) P(yi)    (3.58)
P(z) is the same for all the classes, and it does not affect the decision. Thus, the class posterior of Equation 3.57 is proportional to:

P(yi|z) ∝ Π_{k=1}^{m} P(zk|yi) P(yi)    (3.59)
P(yi) is the class prior probability. Given N feature vectors in the training dataset, of which Ni belong to class yi, the prior probability is calculated as:

P(yi) = Ni / N    (3.60)
To assign a class label to an unknown sample, the value of Equation 3.59 is computed for each class, and the class for which this value is maximal is selected. The predicted class y for sample z is thus computed as:

y ← arg max_{yi} Π_{k=1}^{m} P(zk|yi) P(yi)    (3.61)
Algorithm 8 presents the steps involved for the classification of weld joint images
using the Naive Bayes classifier.
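A minimal Naive Bayes sketch paralleling Algorithm 8 follows; note that fitcnb's default per-feature Gaussian density is an assumption of this sketch, since the text does not fix the form of P(zk|yi).

% Naive Bayes sketch (Equations 3.57 to 3.61); random stand-in data, and a
% Gaussian model for each P(z_k | y_i) is assumed (MATLAB's default).
X = rand(240, 1400); Y = repelem((1:4)', 60);       % stand-in training data
model = fitcnb(X, Y);                               % per-feature class densities
[pred, posterior] = predict(model, rand(1, 1400));  % arg max of Equation 3.61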
3.6 Evaluation Methods
Four performance measures can be computed from the confusion matrix, namely the average accuracy, the error rate, precision and recall. The average accuracy is calculated as the total number of correctly recognised validation examples divided by the total number of examples in the validation dataset. Precision is calculated as the total number of correctly recognised positive examples divided by the number of examples labelled by the classifier as positive. Recall is calculated as the total number of correctly recognised positive examples divided by the total number of positive examples in the validation dataset. Equations 3.62 to 3.65 define these performance measures for a multi-class classification task, where the performance for a single class Ci is described by tpi, fni, fpi and tni, and l is the total number of classes in the validation dataset.
Average accuracy = (1/l) Σ_{i=1}^{l} (tpi + tni) / (tpi + fni + fpi + tni)    (3.62)

Error rate = 1 − (1/l) Σ_{i=1}^{l} (tpi + tni) / (tpi + fni + fpi + tni)    (3.63)

Precision = Σ_{i=1}^{l} tpi / Σ_{i=1}^{l} (tpi + fpi)    (3.64)

Recall = Σ_{i=1}^{l} tpi / Σ_{i=1}^{l} (tpi + fni)    (3.65)
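The sketch below computes the measures of Equations 3.62 to 3.65 from a multi-class confusion matrix; the label vectors are illustrative stand-ins.

% Confusion-matrix based measures (Equations 3.62 to 3.65).
yTrue = repelem((1:4)', 15);                 % stand-in validation labels
yPred = yTrue; yPred(1:5) = 2;               % five illustrative misclassifications
C = confusionmat(yTrue, yPred);              % C(i,j): class i predicted as class j
n = sum(C(:));
tp = diag(C);                                % true positives per class
fp = sum(C, 1)' - tp;                        % false positives per class
fn = sum(C, 2) - tp;                         % false negatives per class
tn = n - tp - fp - fn;                       % true negatives per class
avgAccuracy = mean((tp + tn) ./ (tp + fn + fp + tn));   % Equation 3.62
errorRate = 1 - avgAccuracy;                            % Equation 3.63
precision = sum(tp) / sum(tp + fp);                     % Equation 3.64
recall = sum(tp) / sum(tp + fn);                        % Equation 3.65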
3.7 Conclusion
This chapter has introduced the mathematical approaches and algorithms used in
this work to classify defects in thermite weld images. The methods were presented
in terms of image enhancement, RoI extraction, feature extraction and feature clas-
sification. Thermite weld image enhancement was carried out using the CLAHE technique, and the weld joint RoI extraction was achieved using the Chan-Vese ACM.
For feature extraction, two techniques were proposed for comparison: the LBP de-
scriptor and the BoVW approach with SURF descriptor (BoSURF). It was further out-
lined that specific parameters require fine-tuning for optimal results. In this work,
the LBP cell size parameter on the LBP descriptor and the codebook size on the Bo-
SURF approach are fine-tuned. Subsequently, the performance of feature extraction
techniques was evaluated using the SVM, K-NN and Naive Bayes classifiers. The next
chapter presents the experimental results and discussion.
Chapter 4
Experimental Results and Discussion
4.1 Introduction
This chapter provides a detailed presentation of the results obtained from conducting the experiments using the methods presented in Chapter 3. It is structured as follows. The dataset for the experiments is described in Section 4.2, followed by the presentation of the results obtained from the image enhancement and weld joint RoI extraction algorithms in Sections 4.3 and 4.4, respectively. Then, the classification results obtained with the Local Binary Patterns (LBP) descriptor are presented in Section 4.5; the classification results obtained with the Bag of Speeded Up Robust Features (BoSURF) approach are also presented in Section 4.5. The classification for the above-mentioned feature extraction techniques is achieved using the Support Vector Machines (SVM), the K-Nearest Neighbours (K-NN) and the Naive Bayes classifiers. The results obtained are then empirically compared in Section 4.6 to select the best combination of feature extractor and classifier for automatic detection and classification of thermite weld defects. Section 4.7 concludes the chapter. All experiments were conducted on a 64-bit MSI machine powered by an Nvidia GeForce graphics card, with 24 cores and 32 GB of RAM. The source code was implemented using the Matlab R2019b software under the school licence.
validation purposes. Figures 4.1 to 4.4 depict some of the sample images from each
class.
4.3 Image Enhancement
The image enhancement technique used in this work is the Contrast Limited Adaptive Histogram Equalisation (CLAHE) technique, and it was applied to every weld joint image using the steps explained in Algorithm 1. As mentioned in Section 3.2, the CLAHE technique overcomes the noise enhancement artefact introduced by the Adaptive Histogram Equalisation (AHE) technique by clipping the histogram before using the Cumulative Distribution Function (CDF) as the transform function.
Figure 4.5: Image Enhancement using CLAHE at varying clip factor values
4.4 Weld Joint Extraction
Figures 4.7 to 4.10 depict the weld joint RoI extraction for sample images in each class. It can be observed in Figure 4.8 that some images containing wormhole defects needed to be post-processed in order to achieve accurate segmentation and weld joint extraction. This is understandable, since wormhole defects are characterised by multiple "worm-like" dark patterns introduced by gas entrapment during the thermite welding process. On the contrary, shrinkage cavities and inclusion defects were easily segmented, as they are mostly characterised by a single shape representing a defect. Shrinkage cavities usually appear as a straight line (see Figure 4.9), and they are caused by a poor pre-heating temperature of the rail ends during thermite welding. In comparison, inclusions are irregular in shape (see Figure 4.10) and are caused by the presence of foreign objects. The post-processing technique employed in this work to remove residual spots on images segmented by the proposed Chan-Vese ACM is based on the dilation discussed in Section 3.3.4. Figure 4.11 shows the segmentation accuracy of the proposed Chan-Vese ACM on the thermite weld images for each class. An image is considered successfully segmented if there are no dark spots on the segmented weld joint RoI after post-processing. The proposed method achieved a segmentation accuracy of 100% on images belonging to the defect-less and shrinkage cavities classes. Furthermore, a segmentation accuracy of 97% was achieved on images belonging to the inclusions class, while the lowest segmentation accuracy of 83% was achieved on images belonging to the wormholes class. On average, the proposed Chan-Vese ACM achieved a segmentation accuracy of 95%.
4.5 Feature Classification
Feature classification using the K-NN classifier was achieved using Algorithm 7 presented in the previous chapter. As mentioned in Section 3.5, the value of the K parameter in the K-NN classifier can have a significant impact on the classification accuracy: it controls the number of training feature vectors considered when assigning a class label to an unknown feature vector. A smaller K value makes the classifier sensitive to outliers, while a higher value means the neighbourhood includes too many vectors from other classes. Therefore, different values of K ranging from 1 to 7 (K = 1, 3, 5, 7) were tested at every LBP cell size parameter to obtain the highest classification results at each cell size. Tables 4.1 to 4.4 show the best confusion matrix results obtained by the K-NN classifier at the optimal (but different) value of K for each LBP cell size parameter. The results are based on the 5-fold cross-validation method, where in each model, 240 feature vectors (60 per class) were used to train the K-NN classifier and 60 feature vectors (15 per class) were used for validation purposes.
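The 5-fold protocol described above can be sketched as follows; the feature matrix, labels and K-NN settings are stand-in assumptions, and cvpartition reproduces the stratified 240/60 split per fold.

% Stratified 5-fold cross-validation sketch (300 samples, 75 per class,
% giving 240 training and 60 validation vectors per fold).
X = rand(300, 944); Y = repelem((1:4)', 75);    % stand-in features and labels
cv = cvpartition(Y, 'KFold', 5);                % stratified 5-fold partition
acc = zeros(cv.NumTestSets, 1);
for f = 1:cv.NumTestSets
    mdl = fitcknn(X(training(cv, f), :), Y(training(cv, f)), 'NumNeighbors', 5);
    acc(f) = mean(predict(mdl, X(test(cv, f), :)) == Y(test(cv, f)));
end
meanAccuracy = mean(acc);                       % accuracy averaged over folds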
Table 4.1: Confusion matrix using LBP and 5-NN at [6 14] cell size
Table 4.2: Confusion matrix using LBP and 5-NN at [12 28] cell size
Table 4.3: Confusion matrix using LBP and 1-NN at [30 70] cell size
Table 4.4: Confusion matrix using LBP and 3-NN at [60 140] cell size
The average classification accuracy achieved by the K-NN classifier (at the optimal K value) for each LBP cell size was calculated from the obtained confusion matrix results using Equation 3.62, taking the mean of the per-class values. In each class, precision is calculated as the ratio of feature vectors correctly classified as belonging to the class (true positives) to the actual number of feature vectors in the class (true positives and false negatives). Figure 4.12 illustrates the highest classification accuracy achieved at the optimal K value of the K-NN classifier for each LBP cell size parameter. It can be observed that the highest overall classification accuracy of 94% was achieved at the optimal K value and LBP cell size parameter of 5 and [6 14], respectively. Additionally, the lowest classification accuracy of 90.67% was achieved at the K value and LBP cell size parameter of 3 and [60 140], respectively. The accuracy slightly decreases with an increase in the cell size parameter. This slight decrease indicates that the K-NN classifier generally provides better classification performance at a smaller spatial scale of the LBP descriptor. Furthermore, it should be noted from the confusion matrix results that the classes which contribute to the decrease in the classification accuracy at increasing cell size are the inclusions and shrinkage cavities. Therefore, the LBP cell size parameter has been found to have an impact on the classification accuracy achieved by the K-NN classifier.
Figure 4.12: Classification accuracy of the K-NN classifier at varying LBP cell size
parameter
Feature classification using the SVM classifier was achieved using Algorithm 6 presented in the previous chapter. The non-linear SVM with the Radial Basis Function (RBF) kernel was used. As already mentioned in Section 3.5, the kernel width σ can significantly impact the classification results. The σ parameter in the RBF kernel determines the reach of a single training feature vector: a very high σ value means the training feature vectors have a far reach, while a very low σ value means they have a close reach. Consequently, lower σ values yield a highly flexed decision boundary that depends only on the feature vectors closest to it, ignoring feature vectors further away, whereas higher σ values yield a smoother decision boundary that also considers feature vectors far from it. Thus, to prevent the formation of a decision boundary that is either highly flexed or nearly linear, different σ values ranging from 2⁻⁴ to 2⁴ (σ = 2⁻⁴, 2⁻³, 2⁻², 2⁻¹, 2¹, 2², 2³, 2⁴) were tested to obtain the optimal value of σ at each LBP cell size parameter. Tables 4.5 to 4.8 show the best confusion matrix results obtained by the SVM classifier at the optimal (but different) σ value for each LBP cell size, based on the 5-fold cross-validation method.
Table 4.5: Confusion matrix using LBP and SVM (σ = 4) at [6 14] cell size
Table 4.6: Confusion matrix using LBP and SVM (σ=0.25) at [12 28] cell size
Table 4.7: Confusion matrix using LBP and SVM (σ=0.5) at [30 70] cell size
Table 4.8: Confusion matrix using LBP and SVM (σ=0.5) at [60 140] cell size
The average classification accuracy achieved by the SVM classifier (at the optimal σ value) for each LBP cell size parameter was calculated from the confusion matrix results using Equation 3.62. Figure 4.13 depicts the highest classification accuracy achieved by the SVM classifier at the optimal σ value for each LBP cell size parameter. The highest overall classification accuracy achieved by the SVM classifier is 93.33%, obtained at a σ value of 4 and the [6 14] LBP cell size parameter. Furthermore, the lowest classification accuracy of 91.67% was achieved at a σ value of 0.5 and an LBP cell size parameter of [60 140]. Similar to the results obtained by the K-NN classifier, there is a slight decrease in the classification accuracy obtained by the SVM classifier with increasing LBP cell size parameter; likewise, the LBP cell size parameter has been experimentally found to impact the classification results obtained by the SVM classifier. The advantage of a small LBP cell size parameter is that features can be extracted in very small local regions of an image, making it possible to detect features that could not be detected at a larger spatial scale. On the downside, a small LBP cell size yields a longer feature vector, which greatly increases the computational cost.
Figure 4.13: Classification accuracy of the SVM classifier at varying LBP cell size
parameter
Feature classification using the Naive Bayes classifier was achieved using Algorithm
8. The Naive Bayes classifier is simple, fast and known to perform effectively on
a limited dataset. Furthermore, the Naive Bayes classifier requires less parameter
tuning than other classifiers such as the SVM and K-NN. Tables 4.9 to 4.12 depict the
confusion matrix results obtained by the Naive Bayes classifier at each LBP cell size
parameter.
Table 4.9: Confusion matrix using LBP and Naive Bayes at [6 14] cell size
Table 4.10: Confusion matrix using LBP and Naive Bayes at [12 28] cell size
Table 4.11: Confusion matrix using LBP and Naive Bayes at [30 70] cell size
Table 4.12: Confusion matrix using LBP and Naive Bayes at [60 140] cell size
Figure 4.14 shows the average classification accuracy achieved by the Naive Bayes
classifier at varying LBP cell size parameters. The accuracy was calculated from the
confusion matrix results using Equation 3.62. Contrary to the classification accura-
cies obtained by the K-NN and SVM classifiers, the classification accuracy achieved by
the Naive Bayes classifier increases with an increase in the cell size parameter. This
increase indicates that the Naive Bayes classifier generalises better on the feature
vectors extracted at a large LBP spatial scale. However, there is a slight decrease in
the classification accuracy after [30 70] cell size. The highest classification accuracy
obtained by the Naive Bayes classifier is 85.66%, and it was achieved at [30 70] LBP
cell size parameter.
Figure 4.14: Classification accuracy of the Naive Bayes classifier at varying LBP cell
size parameter
Table 4.13: Highest classification accuracy by each classifier for LBP features
Feature classification of the BoSURF features using the K-NN classifier was achieved using Algorithm 7 presented in the previous chapter. Different values of K ranging from 1 to 7 (K = 1, 3, 5, 7) were tested to find the optimal value of K at each codebook size. Tables 4.14 to 4.17 show the best confusion matrix results obtained using the K-NN classifier at the optimal (but different) value of K for each codebook size parameter, based on the 5-fold cross-validation method. In each model, 240 feature vectors (60 per class) were used to train the classifier and 60 feature vectors (15 per class) were used to validate the classifier.
Table 4.14: Confusion matrix using BoSURF and 3-NN at 200 codewords
Table 4.15: Confusion matrix using BoSURF and 5-NN at 800 codewords
Table 4.16: Confusion matrix using BoSURF and 5-NN at 1400 codewords
Table 4.17: Confusion matrix using BoSURF and 3-NN at 2000 codewords
Figure 4.15: Classification accuracy of the K-NN classifier at varying codebook size
parameter
Figure 4.15 shows the average classification accuracy achieved at the optimal K value of the K-NN classifier with an increasing number of codewords. The accuracy was calculated from the confusion matrix results using Equation 3.62. A significant increase in the classification accuracy can be observed for the first 800 codewords; afterwards, there is only a slight and less significant increase in the classification accuracy over the remaining codewords. The highest classification accuracy achieved by the K-NN (K = 5) classifier is 90.66%, obtained at 1400 codewords.
Feature classification using the SVM classifier was performed using Algorithm 6 presented in the previous chapter. The non-linear SVM with the RBF kernel was used. Different σ values (σ = 2⁻⁴, 2⁻³, 2⁻², 2⁻¹, 2¹, 2², 2³, 2⁴) were tested at each codebook size in order to obtain the optimal value of σ. Tables 4.18 to 4.21 show the best confusion matrix results obtained by the SVM classifier at the optimal (but different) σ value for each codebook size parameter, based on the 5-fold cross-validation method.
Table 4.18: Confusion matrix using BoSURF and SVM (σ = 0.5) at 200 codewords
Table 4.19: Confusion matrix using BoSURF and SVM (σ = 4) at 800 codewords
Table 4.20: Confusion matrix using BoSURF and SVM (σ = 8) at 1400 codewords
Table 4.21: Confusion matrix using BoSURF and SVM (σ = 0.25) at 2000 codewords
Figure 4.16 shows the average classification accuracy achieved by the SVM classifier (at the optimal σ value) for each codebook size parameter. The accuracy was calculated from the obtained confusion matrix results using Equation 3.62. It can be observed that the classification accuracy increases with the codebook size from 600 to 1400 codewords; after that, the classification accuracy decreases approximately linearly as the number of codewords increases. The highest classification accuracy achieved by the SVM classifier (at σ = 8) for classifying the BoSURF features is 94.66%, obtained at the optimal codebook size of 1400 codewords. Conversely, the lowest classification accuracy achieved by the SVM classifier was obtained at 600 codewords.
Figure 4.16: Classification accuracy of the SVM classifier at varying codebook size
parameter
Feature classification using the Naive Bayes classifier was achieved according to the steps listed in Algorithm 8. Tables 4.22 to 4.25 depict the confusion matrix results obtained by the Naive Bayes classifier at varying codebook size parameters.
Table 4.22: Confusion matrix using BoSURF and Naive Bayes at 200 codewords
Table 4.23: Confusion matrix using BoSURF and Naive Bayes at 800 codewords
Table 4.24: Confusion matrix using BoSURF and Naive Bayes at 1400 codewords
Table 4.25: Confusion matrix using BoSURF and Naive Bayes at 2000 codewords
Figure 4.17 shows the average classification accuracy achieved by the Naive Bayes
classifier at varying codebook size parameter. The accuracy was calculated from the
confusion matrix results using Equation 3.62. There is an increase in the classification
accuracy for the initial 1200 codewords. After that, the codebook size parameter
has a less significant impact on the classification accuracy. The highest classification
accuracy achieved by the Naive Bayes classifier is 88.33%, and it was obtained at the
optimal codebook size parameter of 1200 codewords.
Figure 4.17: Classification accuracy of the Naive Bayes classifier at varying codebook
size parameter
Table 4.26: Highest classification accuracy by each classifier for BoSURF features
Table 4.27: Highest classification accuracy achieved for LBP and BoSURF
4.7 Conclusion
This chapter has presented the experimental results of the methods introduced in
Chapter 3. The chapter first introduced the dataset used to conduct the experiments.
This was followed by applying the CLAHE technique to improve the quality of every image. The weld joint was then extracted as the RoI from the background of each enhanced image, with the Chan-Vese ACM used as the segmentation method. Two feature extraction methods, namely the LBP descriptor and the BoSURF approach, were applied to each weld joint image to represent the weld joint as a feature vector.
performance of the feature extraction methods was evaluated using the three classifi-
cation algorithms, namely the K-NN, SVM and Naive Bayes. Hyperparameter tuning
was performed on the feature extraction and classification algorithms to identify the
optimal parameters for best classification results. The experimental results indicated
that the best method for detecting and classifying thermite weld defects is obtained
by combining the BoSURF approach as a feature extractor and the SVM as a classifier.
Chapter 5
Conclusion and Future Work
An automated thermite weld defect detection and classification method has been developed based on image processing and machine learning techniques. Due to the nature of the obtained thermite weld radiography images, four steps were proposed: image enhancement, image segmentation, feature extraction, and feature classification. The collected images were characterised by poor contrast; therefore, image enhancement techniques were required to improve the image quality and defect visibility. According to the literature study, the CLAHE technique provides better enhancement results on radiography images than other histogram equalisation techniques. Thus, the collected images were enhanced using the CLAHE technique, and the image quality was improved.
An algorithm was developed and applied to the enhanced images to extract the weld joint (RoI) from the image background. The literature study indicated that segmentation methods such as thresholding and the Hough transform are effective for a variety of segmentation tasks; however, the collected images contained an irregularly shaped weld joint and a complex image background. This motivated the use of the Chan-Vese ACM, which segments images based on the statistical information of regions rather than on edge information.
Feature extraction techniques were then applied to the weld joint images to represent
every weld joint as a feature vector. The literature categorised these techniques into
local and global feature extractors. Local feature extractors were found to have more
advantages than global feature extractors as they are invariant to significant image
transformations such as rotation, viewpoint and illumination changes. Therefore,
two local feature extraction techniques, namely the LBP descriptor and the SURF
descriptor, were independently applied on the weld joint images to represent every
image as a feature vector.
The SURF descriptor first detects the scale-invariant keypoints before computing a descriptor vector for each keypoint in the image. This means that a single image is represented by many feature vectors for training a classifier, and the computational cost demands are therefore extensively high. To address this challenge, the BoVW (BoSURF) approach was used to create a codebook in a completely unsupervised manner from the unlabelled SURF descriptor vectors. Weld joint images were thereafter represented by a single histogram vector that counts how many times each codeword appears in the image. The K-means clustering algorithm was used to create the visual codewords.
The performance of the two feature extractors was compared using three classifiers,
namely the K-NN, SVM and Naive Bayes. These three classifiers were selected due to
their effectiveness in modelling a small dataset. Some parameters of the feature ex-
tractors and classifiers were fine-tuned to evaluate their impact on the classification
performance and to select the best classification results. For feature extractors, these
parameters were the LBP cell size and the codebook size on the LBP descriptor and
BoSURF approach, respectively. For classifiers, the parameters were the K value and
the σ value on the K-NN and SVM classifiers, respectively.
Limited research work exists in the literature on the specific objective of detecting and classifying thermite weld defects in welded rails using image processing and machine learning techniques. Thus, the results obtained in this work can be used as a baseline for further research studies and improvements on the topic at hand. For future work, the following recommendations are made:
1. Image datasets of other thermite weld defect types should be collected to implement a robust method that can detect and classify any thermite weld defect.
2. More thermite weld images should be collected and made publicly available so that the methods proposed in this work can be compared to state-of-the-art methods based on Deep Learning approaches.
Bibliography
[1] R. Ndlela, "XRS-4 portable X-ray machine operation," Transnet Ltd, Johannesburg, 2019.
[3] G. Yang, D. Li, G. Ru, J. Cao, and W. Jin, “Body height estimation system
based on binocular vision,” International Journal of Online Engineering (iJOE),
vol. 14, p. 177, 04 2018.
[4] X. Wang, B. Zhou, J. Ji, and B. Pu, “Recognition and distance estimation of an
irregular object in package sorting line based on monocular vision,” Interna-
tional Journal of Advanced Robotic Systems, vol. 16, p. 172988141982721, 02
2019.
[5] B. Suvdaa, J. Ahn, and J. Ko, “Steel surface defects detection and classification
using sift and voting strategy,” 2012.
[6] K. Selvi and D. JohnAravindar, “An industrial inspection approach for weld
defects using machine learning algorithm,” 2019.
[7] K. Huang, H. Yang, I. King, and M. Lyu, Local Learning vs. Global Learning: An Introduction to Maxi-Min Margin Machine, vol. 117, pp. 625–626.
[8] Y. Min, B. Xiao, J. Dang, B. Yue, and T. Cheng, “Real time detection system for
rail surface defects based on machine vision,” EURASIP Journal on Image and
Video Processing, vol. 2018, p. 3, 12 2018.
[11] X. Gibert, V. M. Patel, and R. Chellappa, “Deep multitask learning for rail-
way track inspection,” IEEE Transactions on Intelligent Transportation Systems,
vol. 18, no. 1, pp. 153–164, 2017.
[14] C. Tastimur, M. Karakose, A. Erhan, and A. Ilhan, “Rail defect detection with
real time image processing technique,” 07 2016, pp. 411–415.
[21] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, pp.
436–44, 05 2015.
[23] B. Yue, Y. Wang, Y. Min, Z. Zhang, W. Wang, and J. Yong, “Rail surface defect
recognition method based on adaboost multi-classifier combination,” in 2019
Asia-Pacific Signal and Information Processing Association Annual Summit and
Conference (APSIPA ASC), 2019, pp. 391–396.
[24] Y. Santur, M. Karaköse, and E. Akin, “Random forest based diagnosis approach
for rail fault inspection in railways,” in 2016 National Conference on Electrical,
Electronics and Biomedical Engineering (ELECO), 2016, pp. 745–750.
[25] B. Xiao, Y. Min, and H. Ma, “Detection of rail fastener based on wavelet
decomposition and pca,” in 2017 2nd International Conference on Information
Technology and Management Engineering (ITME 2017), 2017, pp. 163–168.
[26] X. Gibert, V. M. Patel, and R. Chellappa, “Robust fastener detection for au-
tonomous visual railway track inspection,” in 2015 IEEE Winter Conference on
Applications of Computer Vision, 2015, pp. 694–701.
[27] S. Gao, T. Szugs, and R. Ahlbrink, “Use of combined railway inspection data
sources for characterization of rolling contact fatigue,” 06 2018.
[29] H. Zhang, X. Jin, Q. M. J. Wu, Y. Wang, Z. He, and Y. Yang, “Automatic visual
detection system of railway surface defects with curvature filter and improved
gaussian mixture model,” IEEE Transactions on Instrumentation and Measure-
ment, vol. 67, no. 7, pp. 1593–1608, 2018.
[30] K. G. Mercy and S. K. Srinivasa Rao, “A framework for rail surface defect pre-
diction using machine learning algorithms,” in 2018 International Conference
on Inventive Research in Computing Applications (ICIRCA), 2018, pp. 972–977.
[32] N. Yao, Y. Jia, and K. Tao, “Rail weld defect prediction and related condition-
based maintenance,” IEEE Access, vol. 8, pp. 103 746–103 758, 2020.
[33] D. Soukup and R. Huber-Mörk, “Convolutional neural networks for steel sur-
face defect detection from photometric stereo images,” 12 2014.
[34] L. Shang, Q. Yang, J. Wang, S. Li, and W. Lei, “Detection of rail surface defects
based on cnn image recognition and classification,” in 2018 20th International
Conference on Advanced Communication Technology (ICACT), 2018, pp. 45–51.
[37] S. Yanan, Z. Hui, L. Li, and Z. Hang, “Rail surface defect detection method
based on yolov3 deep learning networks,” in 2018 Chinese Automation
Congress (CAC), 2018, pp. 1563–1568.
[38] Q. Xu, Q. Zhao, G. Yu, L. Wang, and T. Shen, “Rail defect detection method
based on recurrent neural network,” in 2020 39th Chinese Control Conference
(CCC), 2020, pp. 6486–6490.
[50] S. Attia, “Enhancement of chest x-ray images for diagnosis purposes,” Journal
of Natural Sciences Research, vol. 6, pp. 43–47, 2016.
[51] C. Dang, J. Gao, Z. Wang, F. Chen, and Y. Xiao, “Multi-step radiographic image
enhancement conforming to weld defect segmentation,” IET Image Processing,
vol. 9, pp. 943–950, 2015.
[53] Y. Zhang, L. Zhang, B. Dai, B. Chen, and Y. Li, “Welding defect detection based
on local image enhancement,” IET Image Processing, vol. 13, 09 2019.
[54] W. Hou, D. Zhang, Y. Wei, J. Guo, and X. Zhang, “Review on com-
puter aided weld defect detection from radiography images,” Applied Sciences,
vol. 10, p. 1878, 2020.
[67] T. Tong, Y. Cai, and D. Sun, “Defects detection of weld image based on math-
ematical morphology and thresholding segmentation,” in 2012 8th Interna-
tional Conference on Wireless Communications, Networking and Mobile Com-
puting, 2012, pp. 1–4.
[68] L. Yu, Q. Wang, L. Wu, and J. Xie, “Mumford-shah model with fast
algorithm on lattice,” in 2006 IEEE International Conference on Acoustics Speech
and Signal Processing Proceedings, vol. 2, 2006, pp. II–II.
[69] T. F. Chan and L. A. Vese, “A level set algorithm for minimizing the mumford-
shah functional in image processing,” in Proceedings IEEE Workshop on Varia-
tional and Level Set Methods in Computer Vision, 2001, pp. 161–168.
[74] J. Shao, H. Shi, D. Du, L. Wang, and H. Cao, “Automatic weld defect detec-
tion in real-time x-ray images based on support vector machine,” in 2011 4th
International Congress on Image and Signal Processing, vol. 4, 2011, pp. 1842–
1846.
[75] J. Hassan, A. M. Awan, and A. Jalil, “Welding defect detection and classi-
fication using geometric features,” in 2012 10th International Conference on
Frontiers of Information Technology, 2012, pp. 139–144.
[79] A. S. Ibrahim, A. E. Youssef, and A. L. Abbott, “Global vs. local features for gen-
der identification using arabic and english handwriting,” in 2014 IEEE Inter-
national Symposium on Signal Processing and Information Technology (ISSPIT),
2014, pp. 000 155–000 160.
[80] C. Shu, X. Ding, and C. Fang, “Histogram of the oriented gradient for face
recognition,” Tsinghua Science and Technology, vol. 16, no. 2, pp. 216–224,
2011.
[81] H. Yang and X. A. Wang, “Cascade face detection based on histograms of ori-
ented gradients and support vector machine,” in 2015 10th International Con-
ference on P2P, Parallel, Grid, Cloud and Internet Computing (3PGCIC), 2015,
pp. 766–770.
[83] E. Gao, Q. Gao, and J. Chen, “The welding region extraction technology based
on hog and svm,” 01 2017.
[86] D. Mery and C. Arteta, “Automatic defect recognition in x-ray testing using
computer vision,” in 2017 IEEE Winter Conference on Applications of Computer
Vision (WACV), 2017, pp. 1026–1035.
[88] Y. Kai, D. Qian, T. Sun, M. Zhang, and S. Zhang, “Weld defect detection based
on completed local ternary patterns,” 12 2017, pp. 6–14.
[89] H. Bay, T. Tuytelaars, and L. Van Gool, “Surf: Speeded up robust features,”
vol. 3951, 07 2006, pp. 404–417.
[90] N. Kong, “A literature review on histogram equalization and its variations for
digital image enhancement,” International Journal of Innovation, Management
and Technology, 2013.
[94] Y. Tian, M.-q. Zhou, Z.-k. Wu, and X.-c. Wang, “A region-based active contour
model for image segmentation,” in 2009 International Conference on Compu-
tational Intelligence and Security, vol. 1, 2009, pp. 376–380.
[95] L. Cai and Y. Wang, “A phase-based active contour model for segmentation of
breast ultrasound images,” in 2013 6th International Conference on Biomedical
Engineering and Informatics, 2013, pp. 91–95.
[96] L. Ballerini, Genetic snakes: Active contour models by genetic algorithms, 01 2007,
vol. 8, pp. 177–194.
[98] M. Ben Gharsallah and E. Ben Braiek, “Image segmentation for defect detec-
tion based on level set active contour combined with saliency map,” in 2015
16th International Conference on Sciences and Techniques of Automatic Control
and Computer Engineering (STA), 2015, pp. 388–392.
[99] P. Bumrungkun, “Defect detection in textile fabrics with snake active contour
and support vector machines,” Journal of Physics: Conference Series, vol. 1195,
p. 012006, 04 2019.
[101] A. Kaushik, C. Prakash, and S. Mathpal, “Edge detection and level set
active contour model for the segmentation of cavity present in dental x-ray
images,” International Journal of Computer Applications, vol. 96, 06 2014.
[111] P. Král and L. Lenc, Novel Texture Descriptor Family for Face Recognition,
05 2019, pp. 37–47.
[113] D. Agnew, “Efficient use of the hessian matrix for circuit optimization,” IEEE
Transactions on Circuits and Systems, vol. 25, no. 8, pp. 600–608, 1978.
[114] D. Bradley and G. Roth, “Adaptive thresholding using the integral image,”
Journal of Graphics Tools, vol. 12, pp. 13–21, 01 2007.
[115] E. Oyallon and J. Rabin, “An analysis of the surf method,” Image Processing On
Line, vol. 5, pp. 176–218, 07 2015.
[116] P. Pui and J. Minoi, Keypoint Descriptors in SIFT and SURF for Face Feature
Extractions, 02 2018, pp. 64–73.
[121] I. Shadeed, J. Alwan, and D. Abd, “The effect of gamma value on support
vector machine performance with different kernels,” International Journal of
Electrical and Computer Engineering (IJECE), vol. 10, p. 5497, 10 2020.
[124] R. Timofte, T. Tuytelaars, and L. Van Gool, “Naive bayes image classification:
Beyond nearest neighbors,” vol. 7724, 11 2012, pp. 689–703.