
ML Project Report

The document describes a mini project on credit card fraud detection using machine learning. It presents the abstract, introduction and index sections. Specifically: 1) It introduces the problem of credit card fraud and how machine learning can help detect fraudulent transactions faster and more accurately. 2) The introduction provides background on credit cards and the types of fraud that can occur, such as card theft, counterfeit cards, and online fraud. 3) The index previews that the project will include chapters on literature review, system design, implementation, results, and conclusions.

BHARATI VIDYAPEETH (DEEMED TO BE UNIVERSITY)

COLLEGE OF ENGINEERING

DEPARTMENT OF ENGINEERING & TECHNOLOGY OFFCAMPUS,


KHARGHAR, NAVI MUMBAI,410210

Mini Project Report


On
Credit Card Fraud Detection
Subject: Machine Learning

Presented By

Roll No. Name PRN

44 Sharda Verma 2043110212

51 Arpit Singh 2143110186

67 Jaydip Singh 19431100218

38 Aryan Srivastava 2043110204

Signature of Internal Examiner


1|Page
BVDU-DET-NM/CSBS/2022-23/ML-miniproject
BHARATI VIDYAPEETH (DEEMED TO BE UNIVERSITY)
COLLEGE OF ENGINEERING

DEPARTMENT OF ENGINEERING & TECHNOLOGY OFFCAMPUS,


KHARGHAR, NAVI MUMBAI,410210

This is to certify that the project entitled “Credit Card Fraud Detection”, which is being submitted
herewith for the award of B.Tech. (CSBS Department), is the result of the work completed by Sharda Verma
under my supervision and guidance within the four walls of the institute, and that the same has not been
submitted elsewhere for the award of any degree.

Guide
DEPARTMENT OF ENGINEERING & TECHNOLOGY OFFCAMPUS,
KHARGHAR, NAVI MUMBAI

(Head of Department)
DEPARTMENT OF ENGINEERING & TECHNOLOGY OFFCAMPUS,
KHARGHAR, NAVI MUMBAI

Principal
DEPARTMENT OF ENGINEERING & TECHNOLOGY OFFCAMPUS,
KHARGHAR, NAVI MUMBAI

BHARATI VIDYAPEETH (DEEMED TO BE UNIVERSITY)
COLLEGE OF ENGINEERING

DEPARTMENT OF ENGINEERING & TECHNOLOGY OFFCAMPUS,


KHARGHAR, NAVI MUMBAI,410210

This is to certify that the project entitled “Credit Card Fraud Detection”, which is being submitted
herewith for the award of B.Tech. (CSBS Department), is the result of the work completed by Arpit Singh
under my supervision and guidance within the four walls of the institute, and that the same has not been
submitted elsewhere for the award of any degree.

Guide
DEPARTMENT OF ENGINEERING & TECHNOLOGY OFFCAMPUS,
KHARGHAR, NAVI MUMBAI

(Head of Department)
DEPARTMENT OF ENGINEERING & TECHNOLOGY OFFCAMPUS,
KHARGHAR, NAVI MUMBAI

Principal
DEPARTMENT OF ENGINEERING & TECHNOLOGY OFFCAMPUS,
KHARGHAR, NAVI MUMBAI

BHARATI VIDYAPEETH (DEEMED TO BE UNIVERSITY)
COLLEGE OF ENGINEERING

DEPARTMENT OF ENGINEERING & TECHNOLOGY OFFCAMPUS,


KHARGHAR, NAVI MUMBAI,410210

This is to certify that the project entitled “Credit Card Fraud Detection”, which is being submitted
herewith for the award of B.Tech. (CSBS Department), is the result of the work completed by Jaydip Singh
under my supervision and guidance within the four walls of the institute, and that the same has not been
submitted elsewhere for the award of any degree.

Guide
DEPARTMENT OF ENGINEERING & TECHNOLOGY OFFCAMPUS,
KHARGHAR, NAVI MUMBAI

(Head of Department)
DEPARTMENT OF ENGINEERING & TECHNOLOGY OFFCAMPUS,
KHARGHAR, NAVI MUMBAI

Principal
DEPARTMENT OF ENGINEERING & TECHNOLOGY OFFCAMPUS,
KHARGHAR, NAVI MUMBAI

BHARATI VIDYAPEETH (DEEMED TO BE UNIVERSITY)
COLLEGE OF ENGINEERING

DEPARTMENT OF ENGINEERING & TECHNOLOGY OFFCAMPUS,


KHARGHAR, NAVI MUMBAI,410210

This is to certify that the project entitled “Credit Card Fraud Detection”, which is being submitted
herewith for the award of B.Tech. (CSBS Department), is the result of the work completed by Aryan Srivastava
under my supervision and guidance within the four walls of the institute, and that the same has not been
submitted elsewhere for the award of any degree.

Guide
DEPARTMENT OF ENGINEERING & TECHNOLOGY OFFCAMPUS,
KHARGHAR, NAVI MUMBAI

(Head of Department)
DEPARTMENT OF ENGINEERING & TECHNOLOGY OFFCAMPUS,
KHARGHAR, NAVI MUMBAI

Principal
DEPARTMENT OF ENGINEERING & TECHNOLOGY OFFCAMPUS,
KHARGHAR, NAVI MUMBAI

Abstract

As we move towards a digital world, cybersecurity is becoming a crucial part of our lives. In digital
security, the main challenge is to detect abnormal activity. When purchasing products online, a good
number of people prefer credit cards. The credit limit on a credit card lets us make purchases even
when we do not have the amount at that time. On the other hand, this feature is misused by cyber
attackers. To tackle this kind of problem, we need a system that tracks the pattern of all
transactions and aborts a transaction if its pattern is found to be abnormal.

In this project, we build a machine learning model that can detect whether a credit card transaction
is legitimate or fraudulent. Using machine learning, credit card fraud detection can become easier
and more efficient. Our model will be able to recognize unusual and fraudulent credit card
transactions. The first step involves collecting and sorting raw data, which is then used to train
the model to predict the probability of fraud. The model is intended to provide benefits such as
faster detection, higher accuracy and improved efficiency on larger data, so that it is user-friendly
and effective.

Index

Chapter No.   Title

1             Introduction
2             Literature Survey
3             System Design
4             Implementation
5             Results
6             Conclusion
              References

Chapter 1
Introduction

A credit card is a card issued to a customer (the cardholder), usually allowing them to purchase
goods and services within a credit limit or to withdraw cash in advance. A credit card gives the
cardholder the advantage of time: the amount can be repaid later within a prescribed period, by
carrying it over to the next billing cycle.
Financial institutions often provide customers with cards that make their lives convenient, since
they can shop without carrying cash. Compared with debit cards, credit cards are also beneficial to
consumers because they offer protection for purchased goods that might be damaged, lost or even
stolen. Customers are required to verify the transaction with the merchant before carrying out any
transaction using their credit card.
Despite the several benefits that credit cards provide to consumers, they are also associated with
problems such as security and fraud. Credit card fraud is a challenge that banks and financial
institutions face. It occurs when unauthorized individuals use credit cards to gain money or
property by fraudulent means.
A significant amount can be withdrawn in a short period, without the owner’s knowledge and with
little risk to the fraudster. Fraudsters always try to make every fraudulent transaction look
legitimate, which makes fraud detection a very challenging task.

Credit card information is vulnerable to theft via unsecured online platforms and web pages.
Fraudsters can obtain users' credit and debit card numbers illegitimately, without their consent or
knowledge.
These frauds can be classified as:
• Credit Card Frauds: Online and Offline
• Card Theft
• Account Bankruptcy
• Device Intrusion
• Application Fraud
• Counterfeit Card
• Telecommunication Fraud

Machine learning is effective in determining which transactions are fraudulent and which are
legitimate. One of the main challenges associated with detection techniques is the barrier to
exchanging ideas related to fraud detection. With various frauds, mostly credit card frauds, often in
the news over the past few years, fraud is at the top of mind for much of the world’s population.
Credit card datasets are highly imbalanced because there are far more legitimate transactions than
fraudulent ones. Even so, there are chances for thieves to misuse credit cards. Many machine
learning techniques exist to address this problem.

Some of the currently used approaches to detection of such fraud are:


• Artificial Neural Network
• Fuzzy Logic
• Genetic Algorithm
• Logistic Regression
• Decision tree
• Support Vector Machines
• Bayesian Networks
• Hidden Markov Model
• K-Nearest Neighbour

Chapter 2

Literature Survey

[1] Title: Jiang, Changjun et al. “Credit Card Fraud Detection: A Novel Approach Using Aggregation
    Strategy and Feedback Mechanism.” IEEE Internet of Things Journal 5 (2018): 3637-3647.
    Methods: Machine learning to detect credit card fraud.
    Advantages: The majority of voting methods achieve good accuracy rates in detecting credit card
    fraud.
    Limitations: The precision value achieved is lower than that of other algorithms.
    Results: The precision value achieved is lower than that of other algorithms.
    Remarks: The model can be built on another dataset and can be deployed on the frontend of a
    website.

[2] Title: Pumsirirat, A. and Yan, L. (2018). Credit Card Fraud Detection using Deep Learning based
    on Auto-Encoder and Restricted Boltzmann Machine. International Journal of Advanced Computer
    Science and Applications, 9(1).
    Methods: Deep learning topologies for the detection of fraud in online money transactions.
    Advantages: The proposed model outperformed and prevented fraud in any online transaction
    through credit cards.
    Limitations: There is a need to improve the accuracy of the proposed algorithm.
    Results: There is a need to improve the accuracy of the proposed algorithm.
    Remarks: Not very user friendly; the visualization of the data can be improved.

[3] Title: Mohammed, Emad, and Behrouz Far. “Supervised Machine Learning Algorithms for Credit Card
    Fraudulent Transaction Detection: A Comparative Study.” IEEE Annals of the History of
    Computing, IEEE, 1 July 2018.
    Methods: The B2C dataset for the identification and detection of fraud from credit cards.
    Advantages: The proposed random forests provide good results on the small dataset.
    Limitations: Problems like imbalanced data make it less effective than on any other dataset.
    Results: Problems like imbalanced data make it less effective than on any other dataset.
    Remarks: Need to quantify and address the fairness risks.

[4] Title: Randhawa, Kuldeep, et al. “Credit Card Fraud Detection Using AdaBoost and Majority
    Voting.” IEEE Access, vol. 6, 2018, pp. 14277–14284.
    Methods: A deep autoencoder is used to extract the best characteristics of the information from
    the credit card transaction.
    Advantages: The classification is performed on the best-extracted features, due to which it
    gains high accuracy; the low variance is noticeable.
    Limitations: Only a small number of variations are used to examine the results of the proposed
    approach.
    Results: Only a small number of variations are used to examine the results of the proposed
    approach.
    Remarks: Consider using other predictor models.

[5] Title: Roy, Abhimanyu, et al. “Deep Learning Detecting Fraud in Credit Card Transactions.” 2018
    Systems and Information Engineering Design Symposium (SIEDS), 2018.
    Methods: The performances of several algorithms were evaluated when they were applied to credit
    card fraud data that is highly skewed.
    Advantages: In comparison to naïve Bayes and logistic regression approaches, the performance of
    k-NN is better.
    Limitations: Limited parameters are used to test the performance level.
    Results: Limited parameters are used to test the performance level.
    Remarks: Consider a scalable dataset and apply the CF process again.

[6] Title: Xuan, Shiyang, et al. “Random Forest for Credit Card Fraud Detection.” 2018 IEEE 15th
    International Conference on Networking, Sensing and Control (ICNSC), 2018.
    Methods: A novel data mining layer of defense is proposed using two algorithms named Communal
    Detection and Spike Detection.
    Advantages: There is a large moving window, a higher number of attributes and a number of link
    types available which can be searched by the CD and SD algorithms.
    Limitations: The evaluation of the results is not done properly, and also presided information
    is given about result analysis.
    Results: The evaluation of the results is not done properly, and also presided information is
    given about result analysis.
    Remarks: Needs to be implemented in a global networking application and to explore a global
    dataset.

[7] Title: Awoyemi, John O., et al. “Credit Card Fraud Detection Using Machine Learning Techniques:
    A Comparative Analysis.” 2017 International Conference on Computing Networking and Informatics
    (ICCNI), 2017.
    Methods: Several methods are integrated to provide a secure mechanism.
    Advantages: The normal usage pattern of clients, depending upon their past activities, is
    identified by applying any of these methods.
    Limitations: The proposed algorithm achieves high performance in terms of execution time, but
    the accuracy factor gets compromised.
    Results: The proposed algorithm achieves high performance in terms of execution time, but the
    accuracy factor gets compromised.
    Remarks: Needs some more classification.

[8] Title: Melo-Acosta, German E., et al. “Fraud Detection in Big Data Using Supervised and Semi-
    Supervised Learning Techniques.” 2017 IEEE Colombian Conference on Communications and Computing
    (COLCOM), 2017.
    Methods: A “No Cash” mobile application is proposed.
    Advantages: The fraud activities are minimized using this proposed application.
    Limitations: The expense history and any unwanted costs need to be minimized.
    Results: The expense history and any unwanted costs need to be minimized.
    Remarks: Difficulty of showing the problem to the network.

[9] Title: “Survey Paper on Credit Card Fraud Detection by Suman”, Research Scholar, GJUS&T Hisar
    HCE, Sonepat published by International Journal of Advanced Research in Computer Engineering &
    Technology (IJARCET) Volume 3 Issue 3, March 2014.
    Methods: Principal component analysis (PCA) is applied to real data to propose a novel approach.
    Advantages: The performance of various methods was evaluated using certain performance metrics,
    which showed the proposed approach’s efficiency against others.
    Limitations: The precision value and execution time are not as per the demand.
    Results: The precision value and execution time are not as per the demand.
    Remarks: This method has many layers, making it difficult.

[10] Title: “Research on Credit Card Fraud Detection Model Based on Distance Sum – by Wen-Fang YU
    and Na Wang” published by 2009 International Joint Conference on Artificial Intelligence.
    Methods: A variant of the traditional iterative amplitude adjusted Fourier transform (IAAFT) and
    the iterative surrogate signals on graph algorithms (ISSG) are proposed.
    Advantages: Real data was used to evaluate the performance of the proposed method, which showed
    the efficiency of the proposed approach.
    Limitations: Precision was low as compared to other algorithms.
    Results: Precision was low as compared to other algorithms.
    Remarks: Needs to implement the system to understand the user needs or to help users clarify
    their needs.

[11] Title: V. N. Dornadula and S. Geetha, “Credit Card Fraud Detection using Machine Learning
    Algorithms,” Procedia Computer Sci., vol. 165, pp. 631–641, 2019.
    Methods: Decision Tree approach is used.
    Advantages: High adaptability, which aids in considering all potential solutions to a problem.
    There is minimal need for data cleaning.
    Limitations: This method has many layers, making it difficult. It may have an overfitting issue,
    which the RF algorithm can resolve.
    Results: The duration of the network is unknown (high processing time for large neural
    networks).
    Remarks: The interaction between a user and an item may consist of an explicit.

[12] Title: B. Wickramanayake, D. K. Geeganage, C. Ouyang, and Y. Xu, “A survey of online card
    payment fraud detection using data mining-based methods,” arXiv, 2020.
    Methods: Logistic Regression method used.
    Advantages: Easier to implement and interpret, and very efficient to train. It makes no
    assumptions about distributions of classes in feature space.
    Limitations: The non-linear issue cannot be fixed with logistic regression because it has a
    linear decision surface.
    Results: The expense history and any unwanted costs need to be minimized.
    Remarks: The fraud activities are minimized using this proposed method.

[13] Title: R. Sailusha, V. Gnaneswar, R. Ramesh, and G. Ramakoteswara Rao, “Credit Card Fraud
    Detection Using Machine Learning,” Proc. Int. Conf. Intell. Comput. Control Syst. ICICCS 2020,
    no. Iciccs, pp. 1264–1270, 2020.
    Methods: K-Nearest Neighbors algorithm.
    Advantages: It is straightforward to implement. Speed of detection is good. If the training data
    is huge, it may be more efficient.
    Limitations: The computation cost is high because of calculating the distance between the data
    points for all the training samples.
    Results: The proposed algorithm achieves high performance in terms of execution time, but the
    accuracy factor gets compromised.
    Remarks: Limited parameters are used to test the performance level.

[14] Title: A. RB and S. K. KR, “Credit Card Fraud Detection Using Artificial Neural Network,” Glob.
    Transitions Proc., pp. 0–8, 2021.
    Methods: Artificial Neural Networks.
    Advantages: Storing information on the entire network. Ability to work with incomplete data.
    Limitations: The unexplained demeanor of the network.
    Results: There is a need to improve the accuracy of the proposed algorithm.
    Remarks: The proposed neural network provides good results on the small dataset.

[15] Title: D. D. Borse, P. S. H. Patil, and S. Dhotre, “Credit Card Fraud Detection Using Naïve
    Bayes and C4,” vol. 10, no. 1, pp. 423–429, 2021.
    Methods: K-means Clustering analysis is done.
    Advantages: Efficient and quick. Repeated technique. Works on categorized digital data.
    Limitations: Lots of recurrences. Have to select our own k value. Must understand the case of
    our data well.
    Results: The classification is performed on the best-extracted features.
    Remarks: Only a small number of variations are used to examine the results of the proposed
    approach.

Chapter 3

System Design

The centralized approach is one of the commonly adopted methods for credit card fraud detection. A
fraud detection system (FDS) becomes inefficient when only limited datasets are available and the
detection period is limited. Banks and other financial centers cannot share their data on a central
server due to GDPR. Users’ privacy can still be compromised even if the "anonymized" dataset is kept
locally on servers, as it could be reverse-engineered.
The approach the model follows uses machine learning algorithms to detect anomalous activities,
called outliers. The basic architecture can be represented with the following figure:

Fig. 1 Model Architecture

When looked at in detail on a larger scale along with real life elements, the full architecture diagram
can be represented as follows:

Fig. 2 Detailed structure of the model

First, we obtained our dataset from Kaggle, a data analysis website which provides datasets. The
dataset has 31 columns, of which 28 are named V1-V28 to protect sensitive data. The other columns
are Time, Amount and Class. Time is the time elapsed between the first transaction in the dataset
and the given transaction. Amount is the amount of money transacted. Class 0 represents a valid
transaction and 1 represents a fraudulent one.

We plot different graphs to check for inconsistencies in the dataset and to visually comprehend it:

Fig. 3 Inconsistency Graph

This graph shows that the number of fraudulent transactions is much lower than the legitimate ones.

Fig. 4 Distribution of Time Feature

This graph shows the times at which transactions were made over two days. The fewest transactions
were made during the night and the most during the day.

Fig. 5 Distribution of Monetary Value Feature


This graph represents the amount that was transacted. The majority of transactions are relatively
small and only a handful come close to the maximum transacted amount.
After checking this dataset, we plot a histogram for every column. This gives a graphical
representation of the dataset, which can be used to verify that there are no missing values. It
ensures that no missing-value imputation is required and that the machine learning algorithms can
process the dataset smoothly.
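
The missing-value check described above can also be done numerically. The following is a minimal
sketch, assuming pandas is available; the small DataFrame is a hypothetical stand-in with a few of
the dataset's column names, not the actual Kaggle file.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real file: a few rows with similar columns.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Time": np.arange(6, dtype=float),
    "V1": rng.normal(size=6),
    "Amount": [12.5, 3.0, 250.0, 8.75, 40.0, 1.0],
    "Class": [0, 0, 0, 1, 0, 0],
})

# Count missing entries per column; all zeros means no imputation is needed.
missing_per_column = df.isnull().sum()
has_missing = bool(missing_per_column.any())

print(missing_per_column)
print("imputation needed:", has_missing)
```

On the real data, the same two calls would be applied to the DataFrame loaded from the dataset file.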

Fig. 6 Working of the Model


After this analysis, we plot a heatmap to get a colored representation of the data and to study the
correlation between our predictor variables and the class variable. This heatmap is shown below:

Fig. 7 Heatmap Representing the Correlation Predicting Variables and Class Variables
The dataset is now formatted and processed. The Time and Amount columns are standardized, and the
Class column is removed to ensure fairness of evaluation. The data is processed by a set of
algorithms from modules. The following module diagram explains how these algorithms work together.
The data is fit to a model and the following outlier detection modules are applied to it:
• Local Outlier Factor
• Isolation Forest Algorithm
These algorithms are part of sklearn. The ensemble module in the sklearn package includes
ensemble-based methods and functions for classification, regression and outlier detection. This
free and open-source Python library is built on the NumPy, SciPy and matplotlib modules and provides
many simple and efficient tools for data analysis and machine learning. It features various
classification, clustering and regression algorithms and is designed to interoperate with the
numerical and scientific libraries.
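
As an illustration of the two outlier detection modules named above, the following minimal sketch
standardizes a small synthetic dataset and applies Isolation Forest and Local Outlier Factor from
scikit-learn. The synthetic points, the `contamination` value and `n_neighbors=20` are assumptions
chosen for the example; they are not the report's data or settings.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
# Synthetic stand-in: 500 "normal" points near the origin plus 5 far-away points.
normal = rng.normal(loc=0.0, scale=1.0, size=(500, 2))
outliers = rng.normal(loc=8.0, scale=0.5, size=(5, 2))
X = np.vstack([normal, outliers])

# Standardize the features (the report standardizes Time and Amount).
X_scaled = StandardScaler().fit_transform(X)

contamination = 0.01  # assumed share of outliers; a tuning choice

# Both estimators return +1 for inliers and -1 for outliers.
iso = IsolationForest(contamination=contamination, random_state=42)
iso_labels = iso.fit_predict(X_scaled)

lof = LocalOutlierFactor(n_neighbors=20, contamination=contamination)
lof_labels = lof.fit_predict(X_scaled)

print("Isolation Forest flagged:", int((iso_labels == -1).sum()))
print("Local Outlier Factor flagged:", int((lof_labels == -1).sum()))
```

Both estimators are unsupervised, so the class labels are not used during fitting; they would only
be needed afterwards to score the flagged rows.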
We have used the Jupyter Notebook platform to write a Python program that demonstrates the approach
this paper suggests. This program can also be executed in the cloud using the Google Colab platform,
which supports Python notebook files.
Technology and language used for the project:
Languages:
1. Python: Version 3.10.2

Tools with their versions:

1. Operating System: Windows


2. Editors and IDE:
a. Jupyter Notebook: Version 6.5.1
b. Google Colab
3. Dataset: Card Holder id, Transaction id, Amount, Time, Label
4. Libraries and Framework:
a. NumPy: Version 1.23.1
b. SciPy: Version 1.9.3
c. Matplotlib: Version 3.6.2
d. Sklearn: Version 0.2

Chapter 4

Implementation

Card transactions are often unfamiliar when compared with the previous transactions made by the
customer. This unfamiliarity is a very difficult problem in the real world and is known as the
concept drift problem. Concept drift can be described as a variable that changes over time and in
unforeseen ways. Such variables cause a high imbalance in the data. Table 1 shows the basic features
that are captured when any transaction is made.

Attribute Name    Description

Transaction id    Identification number of a transaction
Cardholder id     Unique identification number given to the cardholder
Amount            Amount transferred or credited in a particular transaction by the customer
Time              Details like time and date, to identify when the transaction was made
Label             To specify whether the transaction is genuine or fraudulent

Table 1: Raw features of credit card transactions

We have considered a dataset which contains real bank transactions made by European cardholders in
the year 2013. For security reasons, the actual variables are not shared; instead, PCA-transformed
versions of them are provided. As a result, we find 30 feature columns (Time, Amount and V1-V28) and
1 final class column, which matches the 31 columns described in Chapter 3.

Importing the Necessary Libraries

Importing the Dataset

Data Processing & Understanding

Imbalance in the Data

Only 0.17% of all the transactions are fraudulent. The data is highly imbalanced.
We first apply our models without balancing the data. If we do not get good accuracy, we can then
find a way to balance the dataset.
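
To make the imbalance concrete, the following minimal sketch computes the fraud share from a label
column and the accuracy of a trivial model that predicts every transaction as legitimate. The label
vector is hypothetical (10,000 transactions, 17 of them fraudulent) and is chosen only to reproduce
the 0.17% share quoted above.

```python
import pandas as pd

# Hypothetical labels: 10,000 transactions, 17 of them fraudulent (0.17%).
labels = pd.Series([0] * 9983 + [1] * 17, name="Class")

counts = labels.value_counts()
fraud_fraction = counts.loc[1] / len(labels)

# A model that always predicts "legitimate" is right on every non-fraud row.
baseline_accuracy = counts.loc[0] / len(labels)

print(counts)
print(f"fraud share: {fraud_fraction:.2%}")
print(f"accuracy of always predicting legitimate: {baseline_accuracy:.2%}")
```

This is why accuracy alone says little on such data: a model that never flags fraud already scores
very highly.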

Printing the amount details for Fraudulent Transaction

Printing the amount details for Normal Transaction

As we can clearly see from this, the average transaction amount is higher for the fraudulent
transactions. This makes the problem crucial to deal with.

Plotting the Correlation Matrix

The correlation matrix graphically gives us an idea of how features correlate with each other and
can help us predict which features are most relevant for the prediction.

In the heatmap, we can clearly see that most of the features do not correlate with other features,
but some features have either a positive or a negative correlation with each other. For example, V2
and V5 are highly negatively correlated with the feature called Amount.
We also see some correlation between V20 and Amount. This gives us a deeper understanding of the
data available to us.
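
The numbers behind such a heatmap come from a pairwise correlation matrix. The following minimal
sketch computes one with pandas on synthetic data; the column names are borrowed from the dataset,
but the values are generated here, and the negative relation between V2 and Amount is built in
purely for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 1000
v2 = rng.normal(size=n)
df = pd.DataFrame({
    "V2": v2,
    "V20": rng.normal(size=n),
    # Amount is constructed to fall as V2 rises, for illustration only.
    "Amount": 100.0 - 30.0 * v2 + rng.normal(scale=5.0, size=n),
})

# Pearson correlation between every pair of columns.
corr = df.corr()
print(corr.round(2))
```

The resulting matrix can be passed to a plotting function such as seaborn's `heatmap` to obtain the
colored view.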

Separating the X and the Y values
Dividing the data into input parameters (X) and output values (Y)

Training and Testing Data Bifurcation


We divide the dataset into two main groups: one for training the model and the other for testing our
trained model’s performance.
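
A minimal sketch of such a split, assuming scikit-learn's `train_test_split`. The data is
synthetic, and the 80/20 ratio and `stratify=y` (which keeps the fraud share equal in both groups)
are assumptions for the example rather than settings confirmed by the report.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Synthetic stand-in: 1,000 rows with 4 features, 20 of them labeled as fraud.
X = rng.normal(size=(1000, 4))
y = np.array([0] * 980 + [1] * 20)

# Hold out 20% for testing; stratify so both parts keep the same fraud share.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

print(X_train.shape, X_test.shape)
print("fraud rows in train/test:", int(y_train.sum()), int(y_test.sum()))
```

Without stratification, a small test set drawn from highly imbalanced data can end up with very few
fraud rows, which makes recall estimates unstable.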

Building a Random Forest Model using scikit learn
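
The original code for this step is not reproduced in the text. As a hedged illustration only, a
random forest can be fit with scikit-learn roughly as follows; the synthetic imbalanced data from
`make_classification` and all parameter values are assumptions for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data: roughly 5% of rows belong to the positive class.
X, y = make_classification(
    n_samples=2000, n_features=10, n_informative=5,
    weights=[0.95, 0.05], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Fit an ensemble of 100 decision trees and predict on the held-out rows.
rfc = RandomForestClassifier(n_estimators=100, random_state=0)
rfc.fit(X_train, y_train)
y_pred = rfc.predict(X_test)

test_accuracy = rfc.score(X_test, y_test)
print("test accuracy:", test_accuracy)
```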

Building all kinds of evaluating parameters

Visualizing the Confusion Matrix

Chapter 5

Results

In our proposed system we use the following formulae for evaluation. Accuracy and precision alone
are not good parameters for evaluating a model on imbalanced data, although they are commonly
treated as the base parameters for evaluating any model.

The Matthews Correlation Coefficient (MCC) is a machine learning measure which is used to check the
quality of binary (two-class) classifiers. It takes into account all the true and false positives
and negatives, which is why it is generally regarded as a balanced measure that can be used even if
the classes are of very different sizes.
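
For reference, with TP, TN, FP and FN denoting the four confusion-matrix counts, the standard
definitions are accuracy = (TP + TN) / (TP + TN + FP + FN), precision = TP / (TP + FP), recall =
TP / (TP + FN) and MCC = (TP·TN − FP·FN) / sqrt((TP + FP)(TP + FN)(TN + FP)(TN + FN)). The following
minimal sketch evaluates them on hypothetical counts (not the report's results) and cross-checks the
MCC against scikit-learn's `matthews_corrcoef`.

```python
from math import sqrt
from sklearn.metrics import matthews_corrcoef

# Hypothetical confusion-matrix counts for an imbalanced test set.
tp, fp, fn, tn = 8, 2, 9, 9981

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
mcc = (tp * tn - fp * fn) / sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)

# Cross-check against scikit-learn by rebuilding label vectors from the counts.
y_true = [1] * tp + [0] * fp + [1] * fn + [0] * tn
y_pred = [1] * tp + [1] * fp + [0] * fn + [0] * tn
assert abs(mcc - matthews_corrcoef(y_true, y_pred)) < 1e-9

print(f"accuracy={accuracy:.4f} precision={precision:.2f} "
      f"recall={recall:.2f} mcc={mcc:.2f}")
```

In this example the accuracy is very high even though more than half of the fraud cases are missed;
recall and MCC expose that gap.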

Comparison with other algorithms without dealing with the imbalance of the data.

As you can see, with the Random Forest method we get a better result even for recall, which is the
trickiest part.

Our credit card fraud detection model achieved the highest accuracy. This number should not be
surprising, as our data was heavily skewed towards one class. The good thing we noticed from the
confusion matrix is that our model is not overfitted.

The only catch is the data that we received for model training. The data features are
PCA-transformed versions of the originals. If the actual features follow a similar pattern, then the
model should perform well.

Chapter 6

Conclusion

Credit card fraud is without a doubt an act of criminal dishonesty. This report has listed the most
common methods of fraud along with their detection methods and reviewed recent findings in this
field. It has also explained in detail how machine learning can be applied to get better results in
fraud detection, along with the algorithm, pseudocode, an explanation of its implementation and
experimentation results.
While the algorithm does reach over 99.6% accuracy, its precision remains at only 28% when a tenth
of the dataset is taken into consideration. However, when the entire dataset is fed into the
algorithm, the precision rises to 33%. This high accuracy is to be expected due to the huge
imbalance between the number of valid and the number of fraudulent transactions.
In this project, we developed a novel method for fraud detection, in which customers are grouped
based on their transactions and behavioral patterns are extracted to develop a profile for every
cardholder. Different classifiers are then applied to three different groups, and rating scores are
generated for every type of classifier. These dynamic changes in parameters allow the system to
adapt to new cardholders' transaction behaviors in a timely manner. This is followed by a feedback
mechanism to address the problem of concept drift.
We observed that the Matthews Correlation Coefficient was the better parameter for dealing with an
imbalanced dataset, although MCC was not the only solution.
We finally observed that logistic regression, decision tree and random forest are the algorithms
that gave better results.

References

[1] Jiang, Changjun et al. “Credit Card Fraud Detection: A Novel Approach Using Aggregation
Strategy and Feedback Mechanism.” IEEE Internet of Things Journal 5 (2018): 3637-3647.
[2] Pumsirirat, A. and Yan, L. (2018). Credit Card Fraud Detection using Deep Learning based on
Auto-Encoder and Restricted Boltzmann Machine. International Journal of Advanced Computer
Science and Applications, 9(1).
[3] Mohammed, Emad, and Behrouz Far. “Supervised Machine Learning Algorithms for Credit Card
Fraudulent Transaction Detection: A Comparative Study.” IEEE Annals of the History of Computing,
IEEE, 1 July 2018.
[4] Randhawa, Kuldeep, et al. “Credit Card Fraud Detection Using AdaBoost and Majority Voting.”
IEEE Access, vol. 6, 2018, pp. 14277–14284.
[5] Roy, Abhimanyu, et al. “Deep Learning Detecting Fraud in Credit Card Transactions.” 2018
Systems and Information Engineering Design Symposium (SIEDS), 2018.
[6] Xuan, Shiyang, et al. “Random Forest for Credit Card Fraud Detection.” 2018 IEEE 15th
International Conference on Networking, Sensing and Control (ICNSC), 2018.
[7] Awoyemi, John O., et al. “Credit Card Fraud Detection Using Machine Learning Techniques: A
Comparative Analysis.” 2017 International Conference on Computing Networking and Informatics
(ICCNI), 2017.
[8] Melo-Acosta, German E., et al. “Fraud Detection in Big Data Using Supervised and Semi-
Supervised Learning Techniques.” 2017 IEEE Colombian Conference on Communications and
Computing (COLCOM), 2017.
[9] “Survey Paper on Credit Card Fraud Detection by Suman”, Research Scholar, GJUS&T Hisar
HCE, Sonepat published by International Journal of Advanced Research in Computer Engineering &
Technology (IJARCET) Volume 3 Issue 3, March 2014.
[10] “Research on Credit Card Fraud Detection Model Based on Distance Sum – by Wen-Fang YU
and Na Wang” published by 2009 International Joint Conference on Artificial Intelligence.
[11] “Credit Card Fraud Detection through Parenclitic Network Analysis by Massimiliano Zanin,
Miguel Romance, Regino Criado, and Santiago Moral” published by Hindawi Complexity Volume
2018, Article ID 5764370, 9 pages
[12] “Credit Card Fraud Detection: A Realistic Modeling and a Novel Learning Strategy” published
by IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 29,
NO. 8, AUGUST 2018
[13] “Credit Card Fraud Detection-by Ishu Trivedi, Monika, Mrigya, Mridushi” published by
International Journal of Advanced Research in Computer and Communication Engineering Vol. 5,
Issue 1, January 2016.
[14] David J. Weston, David J. Hand, M. Adams, Whitrow and Piotr Juszczak “Plastic Card Fraud
Detection using Peer Group Analysis” Springer, Issue 2008.
[15] “Credit Card Fraud Detection Based on Transaction Behaviour -by John Richard D. Kho, Larry
A. Vea” published by Proc. of the 2017 IEEE Region 10 Conference (TENCON), Malaysia,
November 5-8, 2017.
