ABSTRACT Advances in technologies such as e-commerce and financial technology (FinTech)
applications have sparked an increase in the number of online card transactions that occur on a daily basis.
As a result, there has been a spike in credit card fraud that affects card issuing companies, merchants,
and banks. It is therefore essential to develop mechanisms that ensure the security and integrity of credit
card transactions. In this research, we implement a machine learning (ML) based framework for credit
card fraud detection using a real-world imbalanced dataset that was generated from European credit
cardholders. To solve the issue of class imbalance, we re-sampled the dataset using the Synthetic Minority
Over-sampling TEchnique (SMOTE). This framework was evaluated using the following ML methods:
Support Vector Machine (SVM), Logistic Regression (LR), Random Forest (RF), Extreme Gradient Boosting
(XGBoost), Decision Tree (DT), and Extra Tree (ET). These ML algorithms were coupled with the Adaptive
Boosting (AdaBoost) technique to increase their quality of classification. The models were evaluated using
the accuracy, the recall, the precision, the Matthews Correlation Coefficient (MCC), and the Area Under the
Curve (AUC). Moreover, the proposed framework was implemented on a highly skewed synthetic credit card
fraud dataset to further validate the results that were obtained in this research. The experimental outcomes
demonstrated that using AdaBoost has a positive impact on the performance of the proposed methods.
Further, the results obtained by the boosted models were superior to those of existing methods.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see
165286 VOLUME 9, 2021
E. Ileberi et al.: Performance Evaluation of ML Methods for Credit Card Fraud Detection Using SMOTE and AdaBoost
Tree (DT). These ML methods were evaluated individually in terms of their effectiveness and classification quality. Additionally, the Adaptive Boosting (AdaBoost) algorithm was paired with each method to increase its robustness. The main contribution of this paper is a comparative analysis of several ML methods on a publicly available dataset that contains real-world card transactions. Moreover, this research investigates AdaBoost to increase the quality of classification on a highly skewed credit card fraud dataset. The major contributions of this research work can be summarized as follows:
• We propose a credit card fraud detection framework that is scalable.
• We implement the SMOTE technique in order to solve the issue of class imbalance that is found in credit card fraud datasets.
• We pair the AdaBoost method with several ML methods to increase the performance of the proposed framework. Moreover, we conduct a comparative analysis using the following metrics: accuracy, recall, precision, Matthews Correlation Coefficient (MCC), and Area Under the Curve (AUC).
• We implement the proposed credit card fraud detection framework on a highly imbalanced synthetic dataset to validate its effectiveness.

The rest of the paper is organized as follows. Section 2 provides a literature review of previous work that used ML for credit card fraud detection. Section 3 provides a background of the ML methods that were used in this paper. In Section 4, we conduct the experiments. Section 5 presents the implementation of the proposed framework on a synthetic credit card fraud dataset. Section 6 concludes the research.

II. RELATED WORK
This section provides a literature review of previous research that used ML techniques for credit card fraud detection.

Khatri et al. [9] implemented several ML algorithms for credit card fraud detection. In this research, the authors implemented the following methods: Decision Tree (DT), k-Nearest Neighbor (kNN), Logistic Regression (LR), Random Forest (RF), and Naive Bayes (NB). To evaluate the ML-based credit card fraud detection models, the researchers used a dataset that was generated from European cardholders in 2013 [25]. Moreover, the authors considered the sensitivity and the precision as the main performance metrics. The results showed that the kNN algorithm achieved the best results with a precision of 91.11% and a sensitivity of 81.19%.

Rajora et al. [10] conducted a comparative study of ML methods for credit card fraud detection using the European cardholders dataset. Some of the methods that were investigated include the RF and the kNN methods. The authors considered the accuracy and the area under the curve (AUC) as the main performance metrics. The results demonstrated that the RF algorithm achieved an accuracy of 94.9% and an AUC of 0.94. In contrast, the kNN obtained an accuracy of 93.2% and an AUC of 0.93. Although these results are promising, this research did not investigate the class imbalance issue that exists in the dataset that was used.

Trivedi et al. [11] proposed an efficient credit card fraud detection engine using ML methods. In this research, the authors considered many supervised ML techniques including Gradient Boosting (GB) and Random Forest (RF). The authors evaluated these methods using the European cardholders dataset. The performance metrics used to assess the effectiveness of the proposed approaches include the accuracy and the precision. The outcome of the experiments showed that the GB obtained an accuracy of 94.01% and a precision of 93.99%. On the other hand, the RF achieved an accuracy of 94.00% and a precision of 95.98%.

Tanouz et al. [12] presented a credit card fraud detection framework using ML algorithms. In this research, the authors used the European cardholders dataset to assess the performance of the proposed methods. Moreover, the authors implemented an under-sampling technique to solve the issue of class imbalance that exists in the dataset that was used. The ML methods considered in this work include the RF and LR. The researchers used the accuracy as the main performance metric. The results demonstrated that the RF approach achieved a fraud detection accuracy of 91.24%. In contrast, the LR method obtained an accuracy of 95.16%. Furthermore, the authors computed the confusion matrix to assess whether these proposed methods performed optimally for the positive and negative classes. The results showed that the class imbalance issue that exists in the European cardholders dataset requires further investigation.

Riffi et al. [13] implemented a credit card fraud detection engine using the Extreme Learning Machine (ELM) and Multilayer Perceptron (MLP) algorithms. Both the ELM and MLP are artificial neural networks (ANNs); however, they differ in terms of internal architecture. In this research, the authors used the European cardholders dataset that was generated in 2013. The authors used the fraud detection accuracy as the main performance metric. The results demonstrated that the MLP method achieved an accuracy of 97.84%. In contrast, the ELM attained a credit card fraud detection accuracy of 95.46%. This work concluded that the MLP outperformed the ELM; however, the ELM is less complex in comparison to the MLP.

Randhawa et al. [14] proposed a credit card fraud detection engine using Adaptive Boosting (AdaBoost) and Majority Voting (MV) methods. In this research, the authors used the European cardholders dataset. Moreover, the authors considered the AdaBoost method in conjunction with ML methods such as the Support Vector Machine (SVM). In the experiments, the accuracy and the Matthews Correlation Coefficient (MCC) were considered as the main performance metrics. The results demonstrated that the AdaBoost-SVM achieved an accuracy of 99.959% and an MCC of 0.044.
Amount. All the features within the dataset are numerical. The class (label) is represented by the last column, whereby the value of 0 represents a legitimate transaction and the value of 1 is a fraudulent activity. The attributes V1 to V28 do not have specific feature names due to data security and integrity reasons. The names of the features were withheld to protect the identity and types of transactions conducted by the cardholders. This dataset has been used in [9]–[14].

The Synthetic Minority over-sampling TEchnique (SMOTE) is among the most dominant techniques that are used to address the issue of class imbalance that is found in datasets such as the ones used to build credit card fraud detection ML-based models [5]. The SMOTE method generates samples of a specific class by connecting a data point with its k-nearest neighbours. The SMOTE method generates synthetic data points that are not a direct replica of the minority class instance. This is done to avoid the phenomenon of over-fitting during the training process. Algorithm 1 depicts the pseudocode implementation of the SMOTE technique [6] that was used in this research. Algorithm 2 describes the pseudocode implementation of the SMOTE method on the credit card dataset that is used in this research by using the Imblearn library [7].

Algorithm 1 SMOTE (T, N, k)
Input: T, the total number of instances in the minority class; N, the percentage (amount of SMOTE); k, the number of neighbours.
Output: (N/100) ∗ T, the newly created synthetic data points
if N < 100 then
    Randomize the T minority class data points.
    T = (N/100) ∗ T
    N = 100
end if
N = int(N/100)
num_attrs, the number of attributes
k, the number of nearest neighbours
sample, the array of original minority class data points
new_index, keeps tabs on the number of synthetic data points that were generated. It is initialized with 0.
synthetic_array, an array to keep synthetic data points
for t in range(1 to T) do
    Calculate the k nearest neighbours for t and save the indices in nn_array
    Populate(N, t, nn_array) (this is a function that computes synthetic samples)
end for
Populate(N, t, nn_array)
while N ≠ 0 do
    Randomly select a number rn between 1 and k
    for at in range(1 to num_attrs) do
        Calculate the difference: δ = sample[nn_array[rn]][at] - sample[t][at]
        Compute the gap: gap = random(0, 1), a random number between 0 and 1
        synthetic_array[new_index][at] = sample[t][at] + gap ∗ δ
    end for
    Increment the new index: new_index++
    N = N - 1
end while

Algorithm 2 SMOTE Implementation - Credit Card Fraud Dataset
Start
Input: Credit card fraud dataset (DF) containing minority class data points
Output: An oversampled dataset: X_res, the input data, and y_res, the target
Import the SMOTE module from imblearn [7]
Import pandas (pd) from pandas [8]
Read DF into a pd dataframe
Separate the dataframe into input data, X, and target data, y
Instantiate a SMOTE instance as sm = SMOTE(m : r), where m is the minority class and r the ratio.
Fit the SMOTE instance as follows: X_res, y_res = sm.fit_resample(X, y)
End

D. EXPERIMENTAL SETUP
The classification experiments were conducted on Google Colab [26]. The Google Compute Engine (GCE) had the following specifications: Intel(R) Xeon(R), 2 cores, 2.30 GHz. The ML models were implemented using the Scikit-Learn ML framework [27].

The credit card fraud dataset that is used in this research contains traces of legitimate and fraudulent transactions that are labeled as 0s and 1s, respectively. Therefore, we have framed this ML task as a binary classification task. Such problems are evaluated using performance metrics that include the accuracy (AC), the recall (RC), and the precision (PR). The mathematical formulation of these indicators is as follows:
• False Positive (FP): legitimate transactions that are incorrectly labeled as fraudulent.
• False Negative (FN): fraudulent transactions that are incorrectly classified as legitimate transactions.
• True Positive (TP): fraudulent activities that are accurately flagged as fraudulent.
• True Negative (TN): genuine transactions that are correctly classified as genuine.

$$AC = \frac{TN + TP}{TP + TN + FN + FP} \tag{3}$$

$$PR = \frac{TP}{TP + FP} \tag{4}$$

$$RC = \frac{TP}{TP + FN} \tag{5}$$

Furthermore, the European cardholders dataset is highly imbalanced. Therefore, considering the AC, PR, and the RC metrics is not enough to assess the performance of our
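
As a minimal sketch of Equations (3)-(5), assuming the four confusion-matrix counts are already available, the three metrics could be computed as shown below. The function name `classification_metrics` and the example counts are hypothetical and are not taken from the experiments in this paper. The example only illustrates how a classifier that misses most fraudulent transactions can still report a high accuracy on imbalanced data.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (accuracy, precision, recall) from confusion-matrix counts.

    Follows Equations (3)-(5); a ratio with a zero denominator is
    reported as 0.0 (an assumption made for this sketch).
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical counts: 10,000 transactions, 20 of them fraudulent,
# of which the classifier detects only 5.
ac, pr, rc = classification_metrics(tp=5, tn=9978, fp=2, fn=15)
print(f"AC={ac:.4f} PR={pr:.4f} RC={rc:.4f}")
```

With these illustrative counts the accuracy is above 99% even though only a quarter of the fraudulent transactions are recovered, which is the situation the recall is meant to expose.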
between the RCs and PRs that were obtained by all the models before and after the application of the SMOTE-AdaBoost. Additionally, Figure 14 shows the comparison of the MCCs before and after the implementation of the SMOTE-AdaBoost.

G. EXPERIMENTS VALIDATION
In this section, experiments are conducted on a synthetic credit card fraud dataset which is publicly available [32]. This dataset includes 24,357,143 genuine credit card transactions and 29,757 fraudulent ones. Moreover, the dataset contains the following features F = {User, Card, Year, Month, Time, Day, Amount, Use Chip, Merchant Name, Merchant City, Merchant State, MCC, Zip, Errors, Is Fraud}, where Is Fraud represents the class. These attributes are listed in Table 4. The experiments were conducted using the following models: DT, RF, ET, XGB, LR. All these models were adaptively boosted (using AdaBoost). The results are listed in Table 5. The model that performed best in comparison to the other models is the ET-AdaBoost with an accuracy of 99.99%, a recall of 99.99%, a precision of 99.99% and
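
To illustrate how skewed this validation dataset is, the following sketch uses only the two class counts quoted above (24,357,143 genuine and 29,757 fraudulent transactions). The baseline that labels every transaction as genuine is a hypothetical reference point introduced here for illustration; it is not one of the models evaluated in this paper.

```python
GENUINE = 24_357_143  # genuine transactions reported for the synthetic dataset
FRAUD = 29_757        # fraudulent transactions reported for the synthetic dataset

total = GENUINE + FRAUD
fraud_rate = FRAUD / total

# Hypothetical baseline: predict "genuine" for every transaction,
# so no fraud is ever flagged (TP = FP = 0).
tp, fp = 0, 0
tn, fn = GENUINE, FRAUD
baseline_accuracy = (tp + tn) / (tp + tn + fp + fn)
baseline_recall = tp / (tp + fn)

print(f"fraud rate: {fraud_rate:.4%}")
print(f"baseline accuracy: {baseline_accuracy:.4%}, recall: {baseline_recall:.0%}")
```

Fraudulent transactions make up roughly 0.12% of the records, so this trivial baseline already reaches an accuracy of about 99.88% while detecting no fraud at all. High accuracy figures on this dataset are therefore best read together with the recall and the precision.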
EMMANUEL ILEBERI received the M.Sc. degree in telecommunications systems and computer network engineering from the Belarusian State University of Informatics and Radioelectronics, Minsk, Belarus, in 2018. He is currently pursuing the Ph.D. degree in electrical and electronic engineering with the University of Johannesburg, South Africa. His research interests include machine learning and credit card fraud detection.

ZENGHUI WANG (Member, IEEE) received the B.Eng. degree in automation from the Naval Aviation Engineering Academy, China, in 2002, and the Ph.D. degree in control theory and control engineering from Nankai University, China, in 2007.
He is currently a Professor with the Department of Electrical and Mining Engineering, University of South Africa (UNISA), South Africa. His research interests include industry 4.0, control theory and control engineering, engineering optimization, image/video processing, artificial intelligence, and chaos.