Credit Card
ABSTRACT Advances in technologies such as e-commerce and financial technology (FinTech)
applications have sparked an increase in the number of online card transactions that occur on a daily basis.
As a result, there has been a spike in credit card fraud that affects card issuing companies, merchants,
and banks. It is therefore essential to develop mechanisms that ensure the security and integrity of credit
card transactions. In this research, we implement a machine learning (ML) based framework for credit
card fraud detection using a real-world imbalanced dataset that was generated from European credit
cardholders. To solve the issue of class imbalance, we re-sampled the dataset using the Synthetic Minority
Over-sampling TEchnique (SMOTE). This framework was evaluated using the following ML methods:
Support Vector Machine (SVM), Logistic Regression (LR), Random Forest (RF), Extreme Gradient Boosting
(XGBoost), Decision Tree (DT), and Extra Tree (ET). These ML algorithms were coupled with the Adaptive
Boosting (AdaBoost) technique to increase their quality of classification. The models were evaluated using
the accuracy, the recall, the precision, the Matthews Correlation Coefficient (MCC), and the Area Under the
Curve (AUC). Moreover, the proposed framework was implemented on a highly skewed synthetic credit card
fraud dataset to further validate the results that were obtained in this research. The experimental outcomes
demonstrated that using AdaBoost has a positive impact on the performance of the proposed methods.
Further, the results obtained by the boosted models were superior to those of existing methods.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
165286 VOLUME 9, 2021
E. Ileberi et al.: Performance Evaluation of ML Methods for Credit Card Fraud Detection Using SMOTE and AdaBoost
Tree (DT). These ML methods were evaluated individually in terms of their effectiveness and
classification quality. Additionally, the Adaptive Boosting (AdaBoost) algorithm was paired with each
method to increase its robustness. The main contribution of this paper is a comparative analysis of
several ML methods on a publicly available dataset that contains real-world card transactions.
Moreover, this research investigates the use of AdaBoost to increase the quality of classification on
a highly skewed credit card fraud dataset. The major contributions of this research work can be
summarized as follows:
• We propose a credit card fraud detection framework that is scalable.
• We implement the SMOTE technique in order to solve the issue of class imbalance that is found in
credit card fraud datasets.
• We pair the AdaBoost method with several ML methods to increase the performance of the proposed
framework. Moreover, we conduct a comparative analysis using the following metrics: accuracy, recall,
precision, Matthews Correlation Coefficient (MCC), and Area Under the Curve (AUC).
• We implement the proposed credit card fraud detection framework on a highly imbalanced synthetic
dataset to validate its effectiveness.
The rest of the paper is organized as follows. Section 2 provides a literature review of previous work
that used ML for credit card fraud detection. Section 3 provides a background of the ML methods that
were used in this paper. In Section 4, we conduct the experiments. Section 5 presents the
implementation of the proposed framework on a synthetic credit card fraud dataset. Section 6 concludes
the research.

II. RELATED WORK
This section provides a literature review of previous research that used ML techniques for credit card
fraud detection.
Khatri et al. [9] implemented several ML algorithms for credit card fraud detection. In this research,
the authors implemented the following methods: Decision Tree (DT), k-Nearest Neighbor (kNN), Logistic
Regression (LR), Random Forest (RF), and Naive Bayes (NB). To evaluate the ML-based credit card fraud
detection models, the researchers used a dataset that was generated from European cardholders in
2013 [25]. Moreover, the authors considered the sensitivity and the precision as the main performance
metrics. The results showed that the kNN algorithm achieved the best results, with a precision of
91.11% and a sensitivity of 81.19%.
Rajora et al. [10] conducted a comparative study of ML methods for credit card fraud detection using
the European cardholders dataset. The methods that were investigated include the RF and the kNN
methods. The authors considered the accuracy and the area under the curve (AUC) as the main
performance metrics. The results demonstrated that the RF algorithm achieved an accuracy of 94.9% and
an AUC of 0.94. In contrast, the kNN obtained an accuracy of 93.2% and an AUC of 0.93. Although these
results are promising, this research did not investigate the class imbalance issue that exists in the
dataset that was used.
Trivedi et al. [11] proposed an efficient credit card fraud detection engine using ML methods. In this
research, the authors considered many supervised ML techniques, including Gradient Boosting (GB) and
Random Forest (RF). The authors evaluated these methods using the European cardholders dataset. The
performance metrics used to assess the effectiveness of the proposed approaches include the accuracy
and the precision. The outcome of the experiments showed that the GB obtained an accuracy of 94.01%
and a precision of 93.99%. On the other hand, the RF achieved an accuracy of 94.00% and a precision
of 95.98%.
Tanouz et al. [12] presented a credit card fraud detection framework using ML algorithms. In this
research, the authors used the European cardholders dataset to assess the performance of the proposed
methods. Moreover, the authors implemented an under-sampling technique to solve the issue of class
imbalance that exists in the dataset that was used. The ML methods considered in this work include
the RF and LR. The researchers used the accuracy as the main performance metric. The results
demonstrated that the RF approach achieved a fraud detection accuracy of 91.24%. In contrast, the LR
method obtained an accuracy of 95.16%. Furthermore, the authors computed the confusion matrix to
assess whether the proposed methods performed optimally for the positive and negative classes. The
results showed that the class imbalance issue that exists in the European cardholders dataset
requires further investigation.
Riffi et al. [13] implemented a credit card fraud detection engine using the Extreme Learning Machine
(ELM) and Multilayer Perceptron (MLP) algorithms. Both the ELM and the MLP are artificial neural
networks (ANNs); however, they differ in terms of internal architecture. In this research, the authors
used the European cardholders dataset that was generated in 2013. The authors used the fraud
detection accuracy as the main performance metric. The results demonstrated that the MLP method
achieved an accuracy of 97.84%. In contrast, the ELM attained a credit card fraud detection accuracy
of 95.46%. This work concluded that the MLP outperformed the ELM; however, the ELM is less complex in
comparison to the MLP.
Randhawa et al. [14] proposed a credit card fraud detection engine using Adaptive Boosting (AdaBoost)
and Majority Voting (MV) methods. In this research, the authors used the European cardholders dataset.
Moreover, the authors considered the AdaBoost method in conjunction with ML methods such as the
Support Vector Machine (SVM). In the experiments, the accuracy and the Matthews Correlation
Coefficient (MCC) were considered as the main performance metrics. The results demonstrated that the
AdaBoost-SVM achieved an accuracy of 99.959% and an MCC of 0.044.
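Because AdaBoost recurs both in the work of Randhawa et al. [14] and in the framework proposed here,
the boosting idea can be illustrated with a short sketch. The code below is a minimal, assumption-laden
version of discrete AdaBoost with one-feature threshold classifiers (decision stumps) written in plain
Python. It is not the configuration used in the experiments, which rely on Scikit-Learn. The function
names, the toy data, and the -1/+1 label encoding (legitimate = -1, fraudulent = +1) are all
hypothetical choices made for illustration.

```python
import math


def stump_predict(stump, row):
    """Predict +1 or -1 with a one-feature threshold rule."""
    feature, threshold, polarity = stump
    return polarity if row[feature] >= threshold else -polarity


def train_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error."""
    best, best_err = None, float("inf")
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            for polarity in (1, -1):
                stump = (feature, threshold, polarity)
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(stump, row) != yi)
                if err < best_err:
                    best, best_err = stump, err
    return best, best_err


def adaboost_fit(X, y, n_rounds=10):
    """Return a list of (alpha, stump) pairs; labels in y must be -1 or +1."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform sample weights
    ensemble = []
    for _ in range(n_rounds):
        stump, err = train_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)  # weight of this weak learner
        ensemble.append((alpha, stump))
        # Misclassified samples gain weight, correctly classified ones lose weight.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def adaboost_predict(ensemble, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


# Toy one-feature data that no single threshold can separate.
X_toy = [[0], [1], [2], [3], [4], [5]]
y_toy = [1, 1, -1, -1, 1, 1]
model = adaboost_fit(X_toy, y_toy, n_rounds=3)
print([adaboost_predict(model, row) for row in X_toy])
```

The toy labels form a pattern that a single stump cannot fit, whereas the weighted vote of a few
stumps can, which is the sense in which boosting strengthens a weak base classifier.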
Amount. All the features within the dataset are numerical. The class (label) is represented by the
last column, whereby the value of 0 represents a legitimate transaction and the value of 1 represents
a fraudulent activity. The attributes V1 to V28 do not have specific feature names due to data
security and integrity reasons. The names of the features were withheld to protect the identity and
types of transactions conducted by the cardholders. This dataset has been used in [9]–[14].

C. SMOTE APPLIED TO CREDIT CARD FRAUD DATASET
The Synthetic Minority Over-sampling TEchnique (SMOTE) is among the most dominant techniques used to
address the issue of class imbalance that is found in datasets such as the ones used to build ML-based
credit card fraud detection models [5]. The SMOTE method generates samples of a specific class by
connecting a data point with its k-nearest neighbours. The synthetic data points that it generates are
not direct replicas of minority class instances. This is done to avoid the phenomenon of over-fitting
during the training process. Algorithm 1 depicts the pseudo code implementation of the SMOTE
technique [6] that was used in this research. Algorithm 2 describes the pseudo code implementation of
the SMOTE method on the credit card dataset that is used in this research by using the Imblearn
library [7].

Algorithm 1 SMOTE (T, N, k)
1: Input T, the total number of instances in the minority class; N, the percentage (amount of SMOTE);
   k, the number of neighbours.
2: Output (N/100) ∗ T, the newly created synthetic data points
3: if N < 100 then
4:   Randomize the order of the T minority class data points.
5:   T = (N/100) ∗ T
6:   N = 100
7: end if
8: N = int(N/100)
9: num_attrs, the number of attributes
10: k, the number of nearest neighbours
11: sample, an array holding the original minority class data points
12: new_index, keeps tabs on the number of synthetic data points that were generated. It is
    initialized with 0.
13: synthetic_array, an array to keep synthetic data points
14: for t in range(1 to T) do
15:   Calculate the k nearest neighbours for t and save the indices in nn_array
16:   Populate(N, t, nn_array) (this is a function that computes synthetic samples)
17: end for
18: function Populate(N, t, nn_array)
19: while N ≠ 0 do
20:   Randomly select a number rn between 1 and k
21:   for at in range(1 to num_attrs) do
22:     Calculate the difference: δ = sample[nn_array[rn]][at] − sample[t][at]
23:     Compute the gap: gap = random(0, 1), a random number between 0 and 1.
24:     synthetic_array[new_index][at] = sample[t][at] + gap ∗ δ
25:   end for
26:   Increment the new index: new_index++
27:   N = N − 1
28: end while

Algorithm 2 SMOTE Implementation - Credit Card Fraud Dataset
1: Start
2: Input Credit card fraud dataset (DF) containing minority class data points
3: Output An oversampled dataset: Xres, the input data, and yres, the target
4: Import the SMOTE module from imblearn [7]
5: Import pandas as pd [8]
6: Read DF into a pd dataframe
7: Separate the dataframe into input data, X, and target data, y
8: Instantiate a SMOTE instance as sm = SMOTE(m : r), where m is the minority class and r the ratio.
9: Fit the SMOTE instance as follows: Xres, yres = sm.fit_resample(X, y)
10: End

D. EXPERIMENTAL SETUP
The classification experiments were conducted on Google Colab [26]. The Google Compute Engine (GCE)
had the following specifications: Intel(R) Xeon(R), 2 Cores, 2.30 GHz. The ML models were implemented
using the Scikit-Learn ML framework [27].

E. PERFORMANCE METRICS
The credit card fraud dataset that is used in this research contains traces of legitimate and
fraudulent transactions that are labeled as 0s and 1s, respectively. Therefore, we have framed this
ML task as a binary classification task. Such problems are evaluated using performance metrics that
include the accuracy (AC), the recall (RC), and the precision (PR). The mathematical formulation of
these indicators is as follows:
• False Positive (FP): legitimate transactions that are incorrectly labeled as fraudulent.
• False Negative (FN): fraudulent transactions that are incorrectly classified as legitimate
transactions.
• True Positive (TP): fraudulent activities that are accurately flagged as fraudulent.
• True Negative (TN): genuine transactions that are correctly classified as genuine.

$$AC = \frac{TN + TP}{TP + TN + FN + FP} \tag{3}$$

$$PR = \frac{TP}{TP + FP} \tag{4}$$

$$RC = \frac{TP}{TP + FN} \tag{5}$$

Furthermore, the European cardholders dataset is highly imbalanced. Therefore, considering the AC, PR,
and RC metrics is not enough to assess the performance of our
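Equations (3) to (5) can be evaluated directly from the four confusion-matrix counts. The sketch
below is a minimal illustration under stated assumptions rather than the evaluation code used in the
experiments: the function name and the transaction counts are hypothetical, the MCC expression is the
standard textbook definition (it is not reproduced in this excerpt), and reporting 0.0 when a
denominator is empty is a convention chosen here.

```python
import math


def confusion_metrics(tp, tn, fp, fn):
    """Return AC, PR, RC (Eqs. 3 to 5) and MCC from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    # Assumed convention: a ratio with an empty denominator is reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"AC": accuracy, "PR": precision, "RC": recall, "MCC": mcc}


# Hypothetical counts: 100 frauds among 100,000 transactions.
# A model that labels every transaction as legitimate:
always_legit = confusion_metrics(tp=0, tn=99_900, fp=0, fn=100)
# A model that catches 80 of the 100 frauds with 20 false alarms:
decent_model = confusion_metrics(tp=80, tn=99_880, fp=20, fn=20)

print(always_legit)
print(decent_model)
```

With these illustrative counts the two models differ by less than 0.1 percentage points in accuracy,
while their recall and MCC differ sharply, which is why accuracy on its own says little on a highly
imbalanced dataset.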
between the RCs and PRs that were obtained by all the models before and after the application of the
SMOTE-AdaBoost. Additionally, Figure 14 shows the comparison of the MCCs before and after the
implementation of the SMOTE-AdaBoost.

G. EXPERIMENTS VALIDATION
In this section, experiments are conducted on a synthetic credit card fraud dataset which is publicly
available [32]. This dataset includes 24357143 genuine credit card transactions and 29757 fraudulent
ones. Moreover, the dataset contains the following features: F = {User, Card, Year, Month, Time, Day,
Amount, Use Chip, Merchant Name, Merchant City, Merchant State, MCC, Zip, Errors, Is Fraud}, where
Is Fraud represents the class. These attributes are listed in Table 4. The experiments were conducted
using the following models: DT, RF, ET, XGB, LR. All these models were adaptively boosted (using
AdaBoost). The results are listed in Table 5. The model that performed best in comparison to the other
models is the ET-AdaBoost, with an accuracy of 99.99%, a recall of 99.99%, a precision of 99.99%, and
an MCC of 0.99. This pattern can also be observed in the outcomes obtained by the DT-AdaBoost and
RF-AdaBoost. These results demonstrated that using the SMOTE method on CCF data in conjunction with
AdaBoost on the classification models has a positive impact on the overall performance of a CCF
detection engine. Additionally, Fig. 15 depicts the ROC curves of each proposed model, and the results
show that the DT, RF, ET, and XGB obtained an AUC of 1. In contrast, the LR achieved an AUC of 0.66.
These results validate the MCC values listed in Table 5.

FIGURE 15. ROC: DT, RF, ET, XGB, LR.

V. CONCLUSION
This paper implemented several ML algorithms for credit card fraud detection using the European credit
card fraud dataset that was generated in September 2013. The ML methods proposed in this work included
the DT, RF, ET, XGB, LR and SVM. Additionally, each of the proposed algorithms was paired with the
AdaBoost technique to increase the quality of classification and to deal with the issue of class
imbalance that is present in the European credit card fraud dataset. Further, a comparative analysis
was conducted between the methods presented in this work and existing credit card fraud detection
frameworks. For instance, the DT-AdaBoost, RF-AdaBoost, ET-AdaBoost, and XGB-AdaBoost achieved
accuracies of 99.67%, 99.95%, 99.98%, and 99.98%, respectively. In terms of the quality of
classification, the ET-AdaBoost obtained an MCC of 0.99 and the XGB-AdaBoost achieved an MCC of 0.99.
These outcomes demonstrated that using the AdaBoost algorithm has a positive impact on the proposed ML
methods. Moreover, the framework proposed in this research was validated using a highly skewed
synthetic credit card fraud dataset and the results were optimal. For instance, the ET-AdaBoost
obtained an accuracy of 99.99% and an MCC of 0.99. Moreover, the XGB-AdaBoost, DT-AdaBoost,
ET-AdaBoost, and RF-AdaBoost attained an AUC value of 1. In future work, we intend to test and
validate the proposed framework on additional credit card fraud datasets that will be sourced from
financial institutions.

REFERENCES
[1] A. Thennakoon, C. Bhagyani, S. Premadasa, S. Mihiranga, and N. Kuruwitaarachchi, "Real-time credit card fraud detection using machine learning," in Proc. 9th Int. Conf. Cloud Comput., Data Sci. Eng. (Confluence), Jan. 2019, pp. 488–493.
[2] S. P. Maniraj, A. Saini, S. Ahmed, and S. Sarkar, "Credit card fraud detection using machine learning and data science," Int. J. Res. Appl. Sci. Eng. Technol., vol. 8, no. 9, pp. 3788–3792, Jul. 2021.
[3] The Nilson Report. Accessed: Sep. 27, 2021. [Online]. Available: https://www.nilsonreport.com/upload/content_promo/The_Nilson_Report_10-17-2016.pdf
[4] The Nilson Report. Accessed: Sep. 27, 2021. [Online]. Available: https://nilsonreport.com/content_promo.php?id_promo=16
[5] D. Elreedy and A. F. Atiya, "A comprehensive analysis of synthetic minority oversampling technique (SMOTE) for handling class imbalance," Inf. Sci., vol. 505, pp. 32–64, Dec. 2019.
[6] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, "SMOTE: Synthetic minority over-sampling technique," J. Artif. Intell. Res., vol. 16, no. 1, pp. 321–357, 2002.
[7] Imbalanced Learn. Accessed: Sep. 27, 2021. [Online]. Available: https://imbalanced-learn.org/stable/
[8] Pandas. Accessed: Sep. 27, 2021. [Online]. Available: https://pandas.pydata.org/
[9] S. Khatri, A. Arora, and A. P. Agrawal, "Supervised machine learning algorithms for credit card fraud detection: A comparison," in Proc. 10th Int. Conf. Cloud Comput., Data Sci. Eng. (Confluence), Jan. 2020, pp. 680–683.
[10] S. Rajora, D. L. Li, C. Jha, N. Bharill, O. P. Patel, S. Joshi, D. Puthal, and M. Prasad, "A comparative study of machine learning techniques for credit card fraud detection based on time variance," in Proc. IEEE Symp. Comput. Intell. (SSCI), Nov. 2018, pp. 1958–1963.
[11] N. K. Trivedi, S. Simaiya, U. K. Lilhore, and S. K. Sharma, "An efficient credit card fraud detection model based on machine learning methods," Int. J. Adv. Sci. Technol., vol. 29, no. 5, pp. 3414–3424, 2020.
[12] R. Sailusha, V. Gnaneswar, R. Ramesh, and G. R. Rao, "Credit card fraud detection using machine learning," in Proc. 4th Int. Conf. Intell. Comput. Control Syst. (ICICCS), May 2020, pp. 967–972.
[13] F. Z. El Hlouli, J. Riffi, M. A. Mahraz, A. El Yahyaouy, and H. Tairi, "Credit card fraud detection based on multilayer perceptron and extreme learning machine architectures," in Proc. Int. Conf. Intell. Syst. Comput. Vis. (ISCV), Jun. 2020, pp. 1–5.
[14] K. Randhawa, C. K. Loo, M. Seera, C. P. Lim, and A. K. Nandi, "Credit card fraud detection using AdaBoost and majority voting," IEEE Access, vol. 6, pp. 14277–14284, 2018.
[15] R. E. Schapire, "Explaining adaboost," in Empirical Inference. Berlin, Germany: Springer, 2013, pp. 37–52.
[16] X. Li, L. Wang, and E. Sung, "AdaBoost with SVM-based component classifiers," Eng. Appl. Artif. Intell., vol. 21, no. 5, pp. 785–795, Aug. 2008.
[17] K. Kirasich, T. Smith, and B. Sadler, "Random forest vs logistic regression: Binary classification for heterogeneous datasets," SMU Data Sci. Rev., vol. 1, no. 3, p. 9, 2018.
[18] J. Feng, H. Xu, S. Mannor, and S. Yan, "Robust logistic regression and classification," in Proc. Adv. Neural Inf. Process. Syst., vol. 27, 2014, pp. 253–261.
[19] C.-C. Chern, Y.-J. Chen, and B. Hsiao, "Decision tree–based classifier in providing telehealth service," BMC Med. Informat. Decis. Making, vol. 19, no. 1, pp. 1–15, Dec. 2019.
[20] T. Hengl, M. Nussbaum, M. N. Wright, G. B. M. Heuvelink, and B. Gräler, "Random forest as a generic framework for predictive modeling of spatial and spatio-temporal variables," PeerJ, vol. 6, p. e5518, Aug. 2018.
[21] D. A. Pisner and D. M. Schnyer, "Support vector machine," in Machine Learning. New York, NY, USA: Academic, 2020, pp. 101–121.
[22] A. Tharwat, "Parameter investigation of support vector machine classifier with kernel functions," Knowl. Inf. Syst., vol. 61, no. 3, pp. 1269–1302, Dec. 2019.
[23] Ensemble Trees. Accessed: Sep. 27, 2021. [Online]. Available: https://scikit-learn.org/stable/modules/classes.html#module-sklearn.ensemble
[24] T. T. Wong and P. Y. Yeh, "Reliable accuracy estimates from k-fold cross validation," IEEE Trans. Knowl. Data Eng., vol. 32, no. 8, pp. 1586–1594, Apr. 2019.
[25] Credit Card Fraud Detection. Accessed: Sep. 27, 2021. [Online]. Available: https://www.kaggle.com/mlg-ulb/creditcardfraud
[26] Google Colab. Accessed: Sep. 27, 2021. [Online]. Available: https://colab.research.google.com/
[27] Scikit-learn: Machine Learning in Python. Accessed: Sep. 27, 2021. [Online]. Available: https://scikit-learn.org/stable/
[28] D. Chicco and G. Jurman, "The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation," BMC Genomics, vol. 21, no. 1, pp. 1–13, 2020.
[29] S. Boughorbel, F. Jarray, and M. El-Anbari, "Optimal classifier for imbalanced data using Matthews correlation coefficient metric," PLoS ONE, vol. 12, no. 6, Jun. 2017, Art. no. e0177678.
[30] M. Norton and S. Uryasev, "Maximization of AUC and buffered AUC in binary classification," Math. Program., vol. 174, no. 1, pp. 575–612, 2019.
[31] A. Luque, A. Carrasco, A. Martín, and A. de las Heras, "The impact of class imbalance in classification performance metrics based on the binary confusion matrix," Pattern Recognit., vol. 91, pp. 216–231, Oct. 2019.
[32] E. R. Altman, "Synthesizing credit card transactions," 2019, arXiv:1910.03033.

EMMANUEL ILEBERI received the M.Sc. degree in telecommunications systems and computer network
engineering from the Belarusian State University of Informatics and Radioelectronics, Minsk, Belarus,
in 2018. He is currently pursuing the Ph.D. degree in electrical and electronic engineering with the
University of Johannesburg, South Africa. His research interests include machine learning and credit
card fraud detection.

YANXIA SUN (Senior Member, IEEE) received the D.Tech. degree in electrical engineering from the
Tshwane University of Technology, South Africa, and the Ph.D. degree in computer science from
University Paris-EST, France, in 2012. She is currently working as a Professor with the Department of
Electrical and Electronic Engineering Science, University of Johannesburg, South Africa. Her research
interests include renewable energy, evolutionary optimization, neural networks, nonlinear dynamics,
and control systems.

ZENGHUI WANG (Member, IEEE) received the B.Eng. degree in automation from the Naval Aviation
Engineering Academy, China, in 2002, and the Ph.D. degree in control theory and control engineering
from Nankai University, China, in 2007. He is currently a Professor with the Department of Electrical
and Mining Engineering, University of South Africa (UNISA), South Africa. His research interests
include industry 4.0, control theory and control engineering, engineering optimization, image/video
processing, artificial intelligence, and chaos.