ML Project Report

BHARATI VIDYAPEETH (DEEMED TO BE UNIVERSITY)

COLLEGE OF ENGINEERING

DEPARTMENT OF ENGINEERING & TECHNOLOGY OFFCAMPUS,

KHARGHAR, NAVI MUMBAI,410210

Mini Project Report

On
Credit Card Fraud Detection
Subject: Machine Learning

Presented By

Roll No.   Name               PRN
44         Sharda Verma       2043110212
51         Arpit Singh        2143110186
67         Jaydip Singh       19431100218
38         Aryan Srivastava   2043110204

Signature of Internal Examiner

1|Page
BVDU-DET-NM/CSBS/2022-23/ML-miniproject

This is to certify that the project entitled “Credit Card Fraud Detection”, which is being submitted
herewith for the award of B.Tech., CSBS Department, is the result of the work completed by Sharda Verma
under my supervision and guidance within the four walls of the institute, and that the same has not been
submitted elsewhere for the award of any degree.

Guide                                   (Head of Department)
DEPARTMENT OF ENGINEERING &             DEPARTMENT OF ENGINEERING &
TECHNOLOGY OFFCAMPUS,                   TECHNOLOGY OFFCAMPUS,
KHARGHAR, NAVI MUMBAI                   KHARGHAR, NAVI MUMBAI

Principal
DEPARTMENT OF ENGINEERING &
TECHNOLOGY OFFCAMPUS,
KHARGHAR, NAVI MUMBAI


This is to certify that the project entitled “Credit Card Fraud Detection”, which is being submitted
herewith for the award of B.Tech., CSBS Department, is the result of the work completed by Arpit Singh
under my supervision and guidance within the four walls of the institute, and that the same has not been
submitted elsewhere for the award of any degree.

Guide                                   (Head of Department)
DEPARTMENT OF ENGINEERING &             DEPARTMENT OF ENGINEERING &
TECHNOLOGY OFFCAMPUS,                   TECHNOLOGY OFFCAMPUS,
KHARGHAR, NAVI MUMBAI                   KHARGHAR, NAVI MUMBAI

Principal
DEPARTMENT OF ENGINEERING &
TECHNOLOGY OFFCAMPUS,
KHARGHAR, NAVI MUMBAI


This is to certify that the project entitled “Credit Card Fraud Detection”, which is being submitted
herewith for the award of B.Tech., CSBS Department, is the result of the work completed by Jaydip Singh
under my supervision and guidance within the four walls of the institute, and that the same has not been
submitted elsewhere for the award of any degree.

Guide                                   (Head of Department)
DEPARTMENT OF ENGINEERING &             DEPARTMENT OF ENGINEERING &
TECHNOLOGY OFFCAMPUS,                   TECHNOLOGY OFFCAMPUS,
KHARGHAR, NAVI MUMBAI                   KHARGHAR, NAVI MUMBAI

Principal
DEPARTMENT OF ENGINEERING &
TECHNOLOGY OFFCAMPUS,
KHARGHAR, NAVI MUMBAI


This is to certify that the project entitled “Credit Card Fraud Detection”, which is being submitted
herewith for the award of B.Tech., CSBS Department, is the result of the work completed by Aryan Srivastava
under my supervision and guidance within the four walls of the institute, and that the same has not been
submitted elsewhere for the award of any degree.

Guide                                   (Head of Department)
DEPARTMENT OF ENGINEERING &             DEPARTMENT OF ENGINEERING &
TECHNOLOGY OFFCAMPUS,                   TECHNOLOGY OFFCAMPUS,
KHARGHAR, NAVI MUMBAI                   KHARGHAR, NAVI MUMBAI

Principal
DEPARTMENT OF ENGINEERING &
TECHNOLOGY OFFCAMPUS,
KHARGHAR, NAVI MUMBAI

Abstract

As we move towards a digital world, cybersecurity is becoming a crucial part of our lives.
When we talk about security in digital life, the main challenge is to find abnormal activity.
When making a transaction to purchase a product online, a good number of people
prefer credit cards. The credit limit on a credit card sometimes helps us make purchases even if
we don’t have the amount at that time. On the other hand, this feature is misused by cyber
attackers. To tackle this kind of problem, we need a system that can track the pattern of all
transactions and abort a transaction if its pattern is found to be abnormal.

In this project, we build a machine learning model that can detect whether a credit card
transaction is legitimate or fraudulent. Using machine learning, credit card fraud detection can
become easier and more efficient. Our model is designed to recognize unusual and fraudulent
credit card transactions. The first and foremost step involves collecting and sorting raw data,
which is then used to train the model to predict the probability of fraud. The model is intended
to provide benefits such as faster detection, higher accuracy and improved efficiency with
larger data, so that it is user-friendly and effective.

Index

Chapter No. Title Page No.

1 Introduction 8

2 Literature Survey 10

3 System Design 17

4 Implementation 22

5 Result 29

6 Conclusion 31

References 32

Chapter 1
Introduction

A credit card generally refers to a card that is assigned to a customer (the cardholder), usually allowing
them to purchase goods and services within a credit limit or to withdraw cash in advance. A credit card
gives the cardholder the advantage of time: the amount can be repaid later, within a prescribed period,
by carrying it over to the next billing cycle.
Financial institutions often provide customers with cards that make their lives convenient, as they can go
shopping without carrying cash. Apart from debit cards, credit cards are also beneficial to
consumers because they offer protection for purchased goods that might be damaged, lost or even
stolen. Customers are required to verify the transaction with the merchant before carrying out any
transaction using their credit card.
Despite the several benefits that credit cards provide to consumers, they are also associated with
problems such as security and fraud. Credit card fraud is a challenge that banks and
financial institutions are facing. It occurs when unauthorized individuals use credit cards to gain
money or property by fraudulent means.
A significant amount can be withdrawn in a short period without the owner’s knowledge and with
little risk to the fraudster. Fraudsters always try to make every fraudulent transaction look legitimate,
which makes fraud detection a very challenging task.

Credit card information is susceptible to theft via unsecured online platforms and web pages.
Fraudsters can obtain users’ credit and debit card numbers illegitimately, without
their consent or knowledge.
These frauds can be classified as:
• Credit Card Frauds: Online and Offline
• Card Theft
• Account Bankruptcy
• Device Intrusion
• Application Fraud
• Counterfeit Card
• Telecommunication Fraud

Machine learning is effective in determining which transactions are fraudulent and which are
legitimate. One of the main challenges associated with detection techniques is the barrier to
exchanging ideas related to fraud detection. With different frauds, mostly credit card frauds, often in
the news over the past few years, fraud is at the top of mind for most of the world’s population. A credit
card dataset is highly imbalanced because there are far more legitimate transactions than
fraudulent ones. Even so, there are chances for thieves to misuse credit cards. There are
many machine learning techniques to overcome this problem.

Some of the currently used approaches to detection of such fraud are:

• Artificial Neural Network
• Fuzzy Logic
• Genetic Algorithm
• Logistic Regression
• Decision tree
• Support Vector Machines
• Bayesian Networks
• Hidden Markov Model
• K-Nearest Neighbour

Chapter 2

Literature Survey

Each entry below lists the Citation, Title, Methods, Advantages, Limitations, Results and Remarks for one surveyed paper.

[1] Title: Jiang, Changjun et al. “Credit Card Fraud Detection: A Novel Approach Using Aggregation Strategy and Feedback Mechanism.” IEEE Internet of Things Journal 5 (2018): 3637-3647.
Methods: Machine learning to detect credit card fraud.
Advantages: Majority voting methods achieve good accuracy rates in detecting credit card fraud.
Limitations: The precision value achieved is lower than that of other algorithms.
Results: The precision value achieved is lower than that of other algorithms.
Remarks: The model can be built on another dataset and can be deployed on the frontend of a website.

[2] Title: Pumsirirat, A. and Yan, L. (2018). Credit Card Fraud Detection using Deep Learning based on Auto-Encoder and Restricted Boltzmann Machine. International Journal of Advanced Computer Science and Applications, 9(1).
Methods: Deep learning topologies for the detection of fraud in online money transactions.
Advantages: The proposed model outperformed and prevented fraud in any online transaction through credit cards.
Limitations: There is a need to improve the accuracy of the proposed algorithm.
Results: There is a need to improve the accuracy of the proposed algorithm.
Remarks: Not very user friendly; the visualization of the data can be improved.

[3] Title: Mohammed, Emad, and Behrouz Far. “Supervised Machine Learning Algorithms for Credit Card Fraudulent Transaction Detection: A Comparative Study.” IEEE Annals of the History of Computing, IEEE, 1 July 2018.
Methods: The B2C dataset for the identification and detection of fraud from credit cards.
Advantages: The proposed random forests provide good results on the small dataset.
Limitations: Problems like imbalanced data make it less effective than any other dataset.
Results: Problems like imbalanced data make it less effective than any other dataset.
Remarks: Need to quantify and address the fairness risks.

[4] Title: Randhawa, Kuldeep, et al. “Credit Card Fraud Detection Using AdaBoost and Majority Voting.” IEEE Access, vol. 6, 2018, pp. 14277–14284.
Methods: A deep autoencoder is used to extract the best characteristics of the information from the credit card transaction.
Advantages: The classification is performed on the best-extracted features, due to which it gains high accuracy; the low variance is noticeable.
Limitations: Only a small number of variations are used to examine the results of the proposed approach.
Results: Only a small number of variations are used to examine the results of the proposed approach.
Remarks: Consider using other predictor models.

[5] Title: Roy, Abhimanyu, et al. “Deep Learning Detecting Fraud in Credit Card Transactions.” 2018 Systems and Information Engineering Design Symposium (SIEDS), 2018.
Methods: The performances of several algorithms were evaluated when they were applied to credit card fraud data that is highly skewed.
Advantages: In comparison to naïve Bayes and logistic regression approaches, the performance of k-NN is better.
Limitations: Limited parameters are used to test the performance level.
Results: Limited parameters are used to test the performance level.
Remarks: Consider a scalable dataset and apply the CF process again.

[6] Title: Xuan, Shiyang, et al. “Random Forest for Credit Card Fraud Detection.” 2018 IEEE 15th International Conference on Networking, Sensing and Control (ICNSC), 2018.
Methods: A novel data mining layer of defense is proposed using two algorithms named Communal Detection and Spike Detection.
Advantages: There is a large moving window, a higher number of attributes and a number of link types available which can be searched by the CD and SD algorithms.
Limitations: The evaluation of the results is not done properly and also presided information is given about result analysis.
Results: The evaluation of the results is not done properly and also presided information is given about result analysis.
Remarks: Needs to be implemented in a global networking application and explored on a global dataset.

[7] Title: Awoyemi, John O., et al. “Credit Card Fraud Detection Using Machine Learning Techniques: A Comparative Analysis.” 2017 International Conference on Computing Networking and Informatics (ICCNI), 2017.
Methods: Several methods are integrated to provide a secure mechanism.
Advantages: The normal usage pattern of clients, depending upon their past activities, is identified by applying any of these methods.
Limitations: The proposed algorithm achieves high performance in terms of execution time, but the accuracy factor gets compromised.
Results: The proposed algorithm achieves high performance in terms of execution time, but the accuracy factor gets compromised.
Remarks: Needs some more classification.

[8] Title: Melo-Acosta, German E., et al. “Fraud Detection in Big Data Using Supervised and Semi-Supervised Learning Techniques.” 2017 IEEE Colombian Conference on Communications and Computing (COLCOM), 2017.
Methods: A “No Cash” mobile application is proposed.
Advantages: The fraud activities are minimized using this proposed application.
Limitations: The expense history and any unwanted costs need to be minimized.
Results: The expense history and any unwanted costs need to be minimized.
Remarks: Difficulty of showing the problem to the network.

[9] Title: “Survey Paper on Credit Card Fraud Detection by Suman”, Research Scholar, GJUS&T Hisar HCE, Sonepat published by International Journal of Advanced Research in Computer Engineering & Technology (IJARCET) Volume 3 Issue 3, March 2014.
Methods: Principal component analysis (PCA) is applied to real data to propose a novel approach.
Advantages: The performance of various methods was evaluated using certain performance metrics, which showed the proposed approach’s efficiency against others.
Limitations: The precision value and execution time are not as per the demand.
Results: The precision value and execution time are not as per the demand.
Remarks: This method has many layers, making it difficult.

[10] Title: “Research on Credit Card Fraud Detection Model Based on Distance Sum – by Wen-Fang YU and Na Wang” published by 2009 International Joint Conference on Artificial Intelligence.
Methods: A variant of the traditional iterative amplitude adjusted Fourier transform (IAAFT) and the iterative surrogate signals on graph algorithms (ISSG) are proposed.
Advantages: Real data was used to evaluate the performance of the proposed method, which showed the efficiency of the proposed approach.
Limitations: Precision was low as compared to other algorithms.
Results: Precision was low as compared to other algorithms.
Remarks: Needs to implement the system to understand the user needs or to help users clarify their needs.

[11] Title: V. N. Dornadula and S. Geetha, “Credit Card Fraud Detection using Machine Learning Algorithms,” Procedia Computer Sci., vol. 165, pp. 631–641, 2019.
Methods: The Decision Tree approach is used.
Advantages: High adaptability, which aids in considering all potential solutions to a problem. There is minimal need for data cleaning.
Limitations: This method has many layers, making it difficult. It may have an overfitting issue, which the RF algorithm can resolve.
Results: The duration of the network is unknown (high processing time for large neural networks).
Remarks: The interaction between a user and an item may consist of an explicit.

[12] Title: B. Wickramanayake, D. K. Geeganage, C. Ouyang, and Y. Xu, “A survey of online card payment fraud detection using data mining-based methods,” arXiv, 2020.
Methods: The Logistic Regression method is used.
Advantages: Easier to implement and interpret, and very efficient to train. It makes no assumptions about distributions of classes in feature space.
Limitations: Non-linear problems cannot be solved with logistic regression because it has a linear decision surface.
Results: The expense history and any unwanted costs need to be minimized.
Remarks: The fraud activities are minimized using this proposed method.

[13] Title: R. Sailusha, V. Gnaneswar, R. Ramesh, and G. Ramakoteswara Rao, “Credit Card Fraud Detection Using Machine Learning,” Proc. Int. Conf. Intell. Comput. Control Syst. ICICCS 2020, no. Iciccs, pp. 1264–1270, 2020.
Methods: K-Nearest Neighbors algorithm.
Advantages: It is straightforward to implement. Speed of detection is good. If the training data is huge, it may be more efficient.
Limitations: The computation cost is high because the distance between the data points must be calculated for all the training samples.
Results: The proposed algorithm achieves high performance in terms of execution time, but the accuracy factor gets compromised.
Remarks: Limited parameters are used to test the performance level.

[14] Title: A. RB and S. K. KR, “Credit Card Fraud Detection Using Artificial Neural Network,” Glob. Transitions Proc., pp. 0–8, 2021.
Methods: Artificial Neural Networks.
Advantages: Storing information on the entire network. Ability to work with incomplete data.
Limitations: The unexplained behavior of the network.
Results: There is a need to improve the accuracy of the proposed algorithm.
Remarks: The proposed neural network provides good results on the small dataset.

[15] Title: D. D. Borse, P. S. H. Patil, and S. Dhotre, “Credit Card Fraud Detection Using Naïve Bayes and C4,” vol. 10, no.1, pp. 423–429, 2021.
Methods: K-means clustering analysis is done.
Advantages: Efficient and quick. Repeated technique. Works on categorized digital data.
Limitations: Lots of recurrences. We have to select our own k value. We must understand the case of our data well.
Results: The classification is performed on the best-extracted features.
Remarks: Only a small number of variations are used to examine the results of the proposed approach.

Chapter 3

System Design

The centralized approach is one of the commonly adopted methods for credit card fraud detection. A
fraud detection system (FDS) becomes inefficient when only limited datasets are available and the
detection period is limited. Banks and other financial centers cannot share their data on a central server
due to GDPR. Users’ privacy can still be compromised even if the "anonymized" dataset is kept locally on
servers, as it could be reverse-engineered.
The approach the model follows uses machine learning algorithms to detect
anomalous activities, called outliers. The basic architecture can be represented with the
following figure:

Fig. 1 Model Architecture

When looked at in detail on a larger scale along with real life elements, the full architecture diagram
can be represented as follows:

Fig. 2 Detailed structure of the model

First of all, we obtained our dataset from Kaggle, a data analysis website which provides datasets.
The dataset has 31 columns, of which 28 are named V1 to V28 to protect sensitive
data. The other columns represent Time, Amount and Class. Time shows the time elapsed between the
first transaction in the dataset and the given transaction. Amount is the amount of money transacted.
Class 0 represents a valid transaction and 1 represents a fraudulent one.

We plot different graphs to check for inconsistencies in the dataset and to visually comprehend it:

Fig. 3 Inconsistency Graph

This graph shows that the number of fraudulent transactions is much lower than the legitimate ones.

Fig. 4 Distribution of Time Feature

This graph shows the times at which transactions were made within two days. It can be seen that the
fewest transactions were made at night and the most during the day.

Fig. 5 Distribution of Monetary Value Feature

This graph represents the amount that was transacted. A majority of transactions are relatively small
and only a handful of them come close to the maximum transacted amount.
After checking this dataset, we plot a histogram for every column. This gives a graphical
representation of the dataset which can be used to verify that there are no missing values in it.
This ensures that we do not require any missing value imputation and that the machine
learning algorithms can process the dataset smoothly.
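
Besides reading the histograms, the absence of missing values can also be checked numerically. The following is a minimal sketch, assuming the data is held in a pandas DataFrame; the helper name missing_value_report and the four-row sample frame are hypothetical and only mimic a few of the dataset's columns.

```python
import numpy as np
import pandas as pd


def missing_value_report(df: pd.DataFrame) -> pd.Series:
    """Return the number of missing entries per column, largest first."""
    return df.isna().sum().sort_values(ascending=False)


# Hypothetical sample with one deliberately missing value in V1.
sample = pd.DataFrame({
    "Time": [0.0, 1.0, 2.0, 4.0],
    "V1": [-1.36, 1.19, np.nan, -0.97],
    "Amount": [149.62, 2.69, 378.66, 123.50],
    "Class": [0, 0, 1, 0],
})

report = missing_value_report(sample)
print(report)

# Imputation is only needed if at least one column has a missing entry.
needs_imputation = bool(report.sum() > 0)
print("Imputation needed:", needs_imputation)
```

On the real dataset, a report that is zero for every column would confirm that no imputation step is required.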

Fig. 6 Working of the Model

After this analysis, we plot a heatmap to get a colored representation of the data and to study the
correlation between our predicting variables and the class variable. This heatmap is shown below:

Fig. 7 Heatmap Representing the Correlation between Predicting Variables and Class Variable
The dataset is now formatted and processed. The Time and Amount columns are standardized, and the
Class column is removed from the model inputs to ensure fairness of evaluation. The data is then
processed by a set of algorithms from library modules; the module diagram explains how these
algorithms work together. The data is fit to a model and the following outlier detection modules are applied to it:
• Local Outlier Factor
• Isolation Forest Algorithm
These algorithms are a part of sklearn (scikit-learn). Isolation Forest belongs to the ensemble module
of the sklearn package, which includes ensemble-based methods and functions for classification,
regression and outlier detection, while Local Outlier Factor belongs to the neighbors module. This
free and open-source Python library is built using the NumPy, SciPy and matplotlib modules and
provides a lot of simple and efficient tools which can be used for data analysis and machine learning.
It features various classification, clustering and regression algorithms and is designed to interoperate
with the numerical and scientific libraries.
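
The two outlier detectors named above can be sketched as follows. This is only an illustration under stated assumptions: the data is synthetic (500 ordinary points plus 10 planted outliers standing in for fraudulent transactions), and the contamination value, n_estimators and n_neighbors settings are assumed for the example rather than taken from the project.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the transactions: 500 ordinary points near the origin
# and 10 far-away points that play the role of fraudulent transactions.
normal = rng.normal(loc=0.0, scale=1.0, size=(500, 2))
planted = rng.uniform(low=6.0, high=9.0, size=(10, 2))
X = np.vstack([normal, planted])
y_true = np.array([0] * 500 + [1] * 10)  # 1 marks a planted outlier

# Standardize the features, as described for the Time and Amount columns.
X_scaled = StandardScaler().fit_transform(X)

# Assumed share of outliers; in practice this has to be estimated.
contamination = len(planted) / len(X)

iso = IsolationForest(n_estimators=100, contamination=contamination, random_state=0)
iso_pred = iso.fit_predict(X_scaled)  # +1 = inlier, -1 = outlier

lof = LocalOutlierFactor(n_neighbors=20, contamination=contamination)
lof_pred = lof.fit_predict(X_scaled)  # same +1 / -1 convention

# Count how many planted outliers each detector flags.
results = {}
for name, pred in [("Isolation Forest", iso_pred), ("Local Outlier Factor", lof_pred)]:
    flagged = (pred == -1).astype(int)
    hits = int(((flagged == 1) & (y_true == 1)).sum())
    results[name] = hits
    print(f"{name}: flagged {int(flagged.sum())} points, {hits} of 10 planted outliers found")
```

Both detectors are unsupervised, so the labels in y_true are used only to count hits afterwards and never during fitting.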
We have used the Jupyter Notebook platform to write a program in Python that demonstrates the approach
this report suggests. This program can also be executed on the cloud using the Google Colab
platform, which supports all Python notebook files.
Technology and language used for the project:
Languages:
1. Python: Version 3.10.2

Tools with their versions:

1. Operating System: Windows

2. Editors and IDE:
a. Jupyter Notebook: Version 6.5.1
b. Google Colab
3. Dataset: Card Holder id, Transaction id, Amount, Time, Label
4. Libraries and Framework:
a. NumPy: Version 1.23.1
b. SciPy: Version 1.9.3
c. Matplotlib: Version 3.6.2
d. Sklearn: Version 0.2

Chapter 4

Implementation

A card transaction can always look unfamiliar when compared with the customer's previous transactions.
This unfamiliarity is a very difficult problem in the real world and is known as the concept drift problem.
Concept drift can be described as a variable which changes over time and in unforeseen ways. Such
variables cause a high imbalance in the data. Table 1 shows the basic features that are captured when any
transaction is made.

Attribute Name    Description
Transaction id    Identification number of a transaction
Cardholder id     Unique identification number given to the cardholder
Amount            Amount transferred or credited in a particular transaction by the customer
Time              Details like time and date, to identify when the transaction was made
Label             To specify whether the transaction is genuine or fraudulent

Table 1: Raw features of credit card transactions

We have considered a dataset which contains real bank transactions made by European
cardholders in the year 2013. As a security concern, the actual variables are not shared;
instead, they are provided as PCA-transformed versions. As a result, we can find 30 feature columns
(Time, Amount and V1 to V28) and 1 final class column.

Importing the Necessary Libraries

Importing the Dataset

Data Processing & Understanding

Imbalance in the data

Only 0.17% of all the transactions are fraudulent, so the data is highly imbalanced.
We first apply our models without balancing the data. If we do not get good accuracy, we can then
find a way to balance the dataset.
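
To see why accuracy alone needs care at this level of imbalance, consider a trivial "model" that labels every transaction as valid. The sketch below uses hypothetical counts (170 frauds in 100,000 transactions) chosen only to match the 0.17% share; they are not the real row counts of the dataset.

```python
from collections import Counter

# Hypothetical label column: 170 frauds in 100,000 transactions (0.17%).
labels = [1] * 170 + [0] * 99_830

counts = Counter(labels)
fraud_share = counts[1] / len(labels)
print(f"Fraud share: {fraud_share:.2%}")

# A "model" that always predicts the majority class (valid = 0).
always_valid = [0] * len(labels)

accuracy = sum(p == t for p, t in zip(always_valid, labels)) / len(labels)
recall = sum(p == 1 and t == 1 for p, t in zip(always_valid, labels)) / counts[1]

print(f"Accuracy of always predicting valid: {accuracy:.2%}")
print(f"Recall on the fraud class: {recall:.0%}")
```

The baseline reaches 99.83% accuracy while catching no fraud at all, so any trained model has to be judged against that figure and on recall and precision as well.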

Printing the amount details for Fraudulent Transaction

Printing the amount details for Normal Transaction

As we can clearly notice from this, the average transaction amount is higher for the fraudulent transactions.
This makes the problem crucial to deal with.
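
A comparison of this kind can be produced with a grouped summary. The sketch below assumes a pandas DataFrame with Amount and Class columns; the seven rows are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical transactions: Class 1 = fraudulent, Class 0 = valid.
df = pd.DataFrame({
    "Amount": [12.5, 40.0, 7.9, 99.0, 250.0, 1.0, 310.0],
    "Class": [0, 0, 0, 0, 1, 1, 0],
})

# Count, mean and maximum of Amount for each class.
summary = df.groupby("Class")["Amount"].agg(["count", "mean", "max"])
print(summary)

fraud_mean = summary.loc[1, "mean"]
valid_mean = summary.loc[0, "mean"]
print("Fraud mean is higher:", fraud_mean > valid_mean)
```

Calling describe() on each group instead of agg() would give the full set of statistics (quartiles, standard deviation) in the same layout.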

Plotting the Correlation Matrix

The correlation matrix gives us a graphical idea of how features correlate with each other and can
help us predict which features are most relevant for the prediction.

In the heatmap, we can clearly see that most of the features do not correlate with other features, but
some features have either a positive or a negative correlation with each other. For
example, V2 and V5 are highly negatively correlated with the feature called Amount.
We also see some correlation between V20 and Amount. This gives us a deeper understanding of the data
available to us.
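
Each cell of such a heatmap is a Pearson correlation coefficient between two columns. As a minimal sketch of what one cell contains, the function below computes the coefficient by hand; the three short lists are hypothetical and only illustrate a strongly negative and a zero correlation.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical columns, not real dataset values.
amount = [1.0, 2.0, 3.0, 4.0, 5.0]
falling = [10.0, 8.0, 6.0, 4.0, 2.0]   # decreases as amount increases
unrelated = [1.0, -1.0, 0.0, -1.0, 1.0]  # no linear relationship with amount

r_falling = pearson(amount, falling)
r_unrelated = pearson(amount, unrelated)
print(f"amount vs falling:   {r_falling:+.2f}")
print(f"amount vs unrelated: {r_unrelated:+.2f}")
```

A value close to -1 corresponds to the strongly negative cells in the heatmap, and a value close to 0 to the many cells that show no correlation.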

Separating the X and the Y values
Dividing the data into input parameters and output values

Training and Testing Data Bifurcation

We divide the dataset into two main groups: one for training the model and the other for
testing our trained model’s performance.
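
A minimal sketch of this split, using train_test_split from scikit-learn on synthetic data. The 80/20 ratio, the stratify option and the random seed are assumptions made for the example; the report does not state which settings were used.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Synthetic stand-in: 1000 transactions with 5 features, 20 of them fraudulent.
X = rng.normal(size=(1000, 5))   # input parameters
y = np.array([1] * 20 + [0] * 980)  # output values (the Class column)

# stratify=y keeps the fraud share the same in both groups, which matters
# when the positive class is this rare.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

print("Training rows:", len(X_train), "frauds:", int(y_train.sum()))
print("Testing rows: ", len(X_test), "frauds:", int(y_test.sum()))
```

Without stratification, a random split of such imbalanced data can leave the test group with very few or even no fraudulent rows.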

Building a Random Forest Model using scikit learn
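
A minimal sketch of this step with scikit-learn's RandomForestClassifier. The data is generated with make_classification as an imbalanced stand-in for the real transactions, and the number of trees, the split ratio and the seeds are assumed values, not the project's actual settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data: roughly 5% of the rows belong to class 1.
X, y = make_classification(
    n_samples=2000, n_features=10, n_informative=5,
    weights=[0.95, 0.05], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Fit the forest on the training group only.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# Hard class predictions and the estimated probability of class 1.
y_pred = clf.predict(X_test)
fraud_probability = clf.predict_proba(X_test)[:, 1]

test_accuracy = float((y_pred == y_test).mean())
print(f"Test accuracy: {test_accuracy:.3f}")
```

The probabilities in fraud_probability can be thresholded at a value other than 0.5 if missing a fraud is considered more costly than a false alarm.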

Building all kinds of evaluating parameters
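
The usual evaluation parameters for a binary classifier can be computed directly from the true and predicted labels. The sketch below is written in plain Python so that each definition is visible; the function name binary_metrics and the ten example labels are hypothetical.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for labels where 1 = fraudulent."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))

    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: 4 frauds, of which 3 are caught, plus 1 false alarm.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]

metrics = binary_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

scikit-learn offers the same quantities as accuracy_score, precision_score, recall_score and f1_score in its metrics module.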

Visualizing the Confusion Matrix
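
As a sketch of what the confusion matrix contains, the code below counts the four outcomes and prints them as a small text table instead of a plot. The helper names and the eight example labels are hypothetical; rows are the actual class and columns the predicted class.

```python
def confusion_matrix_2x2(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for labels 0 (valid) and 1 (fraudulent)."""
    matrix = [[0, 0], [0, 0]]
    for actual, predicted in zip(y_true, y_pred):
        matrix[actual][predicted] += 1
    return matrix


def render(matrix, names=("Valid", "Fraud")):
    """Format the matrix as aligned text with row and column labels."""
    lines = [f"{'':>14}{'Pred ' + names[0]:>12}{'Pred ' + names[1]:>12}"]
    for name, row in zip(names, matrix):
        lines.append(f"{'Actual ' + name:>14}{row[0]:>12}{row[1]:>12}")
    return "\n".join(lines)


# Hypothetical labels for eight transactions.
y_true = [0, 0, 0, 0, 1, 1, 1, 0]
y_pred = [0, 0, 1, 0, 1, 0, 1, 0]

matrix = confusion_matrix_2x2(y_true, y_pred)
print(render(matrix))
```

The off-diagonal cells are the two kinds of error: valid transactions flagged as fraud (top right) and frauds that were missed (bottom left).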

Chapter 5

Results

In our proposed system we use the following formulae for evaluation. Accuracy and precision on their
own are not good parameters for evaluating a model on imbalanced data, but they are still commonly
treated as the base parameters for evaluating any model.

The Matthews Correlation Coefficient (MCC) is a machine learning measure which is used to check
the quality of binary (two-class) classifiers. It takes into account true and false positives and
negatives, which is why it is generally regarded as a balanced measure which can be used even if the
classes are of very different sizes.
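
The standard definition is MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), where TP, TN, FP and FN are the four cells of the confusion matrix. The sketch below implements that formula; the counts used in the three examples are hypothetical, and returning 0 when the denominator is zero is a common convention rather than part of the formula.

```python
from math import sqrt


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Convention: undefined cases (an empty row or column) score 0.
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical test set: 17 frauds among 10,000 transactions.
always_valid = mcc(tp=0, tn=9983, fp=0, fn=17)   # never flags anything
perfect = mcc(tp=17, tn=9983, fp=0, fn=0)        # no mistakes
realistic = mcc(tp=12, tn=9980, fp=3, fn=5)      # a few errors of each kind

print(f"Always valid: {always_valid:.2f}")
print(f"Perfect:      {perfect:.2f}")
print(f"Realistic:    {realistic:.2f}")
```

The classifier that never flags fraud scores 0 on MCC even though its accuracy on these counts would be 99.83%, which is the property that makes the measure useful here.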

Comparison with other algorithms without dealing with the imbalance of the data.

As you can see, with the Random Forest method we get a better result even for recall,
which is the trickiest part.

We obtained the highest accuracy with our credit card fraud detection model. This number should not be
surprising, as our data was heavily imbalanced towards one class. The good thing that we have noticed
from the confusion matrix is that our model is not overfitted.

The only catch here is the data that we have received for model training. The data features are
PCA-transformed versions of the original features. If the actual features follow a similar pattern,
then we are doing great.

Chapter 6

Conclusion

Credit card fraud is without a doubt an act of criminal dishonesty. This report has listed the most
common methods of fraud along with their detection methods and reviewed recent findings in this field.
It has also explained in detail how machine learning can be applied to get better results in
fraud detection, along with the algorithm, pseudocode, an explanation of its implementation and the
experimentation results.
While the algorithm does reach over 99.6% accuracy, its precision remains only at 28% when a tenth
of the data set is taken into consideration. However, when the entire dataset is fed into the algorithm,
the precision rises to 33%. This high accuracy is to be expected due to the huge imbalance
between the number of valid and the number of fraudulent transactions.
In this project, we developed a novel method for fraud detection in which customers are grouped based
on their transactions and behavioral patterns are extracted to develop a profile for every cardholder.
Different classifiers are then applied to three different groups, and rating scores are generated for
every type of classifier. These dynamic changes in parameters lead the system to adapt to new
cardholders' transaction behaviors in a timely manner. This is followed by a feedback mechanism to
solve the problem of concept drift.
We observed that the Matthews Correlation Coefficient was the better parameter for dealing with an
imbalanced dataset, although MCC is not the only solution.
We finally observed that logistic regression, decision tree and random forest are the algorithms that
gave better results.

References

[1] Jiang, Changjun et al. “Credit Card Fraud Detection: A Novel Approach Using Aggregation
Strategy and Feedback Mechanism.” IEEE Internet of Things Journal 5 (2018): 3637-3647.
[2] Pumsirirat, A. and Yan, L. (2018). Credit Card Fraud Detection using Deep Learning based on
Auto-Encoder and Restricted Boltzmann Machine. International Journal of Advanced Computer
Science and Applications, 9(1).
[3] Mohammed, Emad, and Behrouz Far. “Supervised Machine Learning Algorithms for Credit Card
Fraudulent Transaction Detection: A Comparative Study.” IEEE Annals of the History of Computing,
IEEE, 1 July 2018.
[4] Randhawa, Kuldeep, et al. “Credit Card Fraud Detection Using AdaBoost and Majority Voting.”
IEEE Access, vol. 6, 2018, pp. 14277–14284.
[5] Roy, Abhimanyu, et al. “Deep Learning Detecting Fraud in Credit Card Transactions.” 2018
Systems and Information Engineering Design Symposium (SIEDS), 2018.
[6] Xuan, Shiyang, et al. “Random Forest for Credit Card Fraud Detection.” 2018 IEEE 15th
International Conference on Networking, Sensing and Control (ICNSC), 2018.
[7] Awoyemi, John O., et al. “Credit Card Fraud Detection Using Machine Learning Techniques: A
Comparative Analysis.” 2017 International Conference on Computing Networking and Informatics
(ICCNI), 2017.
[8] Melo-Acosta, German E., et al. “Fraud Detection in Big Data Using Supervised and Semi-
Supervised Learning Techniques.” 2017 IEEE Colombian Conference on Communications and
Computing (COLCOM), 2017.
[9] “Survey Paper on Credit Card Fraud Detection by Suman”, Research Scholar, GJUS&T Hisar
HCE, Sonepat published by International Journal of Advanced Research in Computer Engineering &
Technology (IJARCET) Volume 3 Issue 3, March 2014.
[10] “Research on Credit Card Fraud Detection Model Based on Distance Sum – by Wen-Fang YU
and Na Wang” published by 2009 International Joint Conference on Artificial Intelligence.
[11] “Credit Card Fraud Detection through Parenclitic Network Analysis by Massimiliano Zanin,
Miguel Romance, Regino Criado, and Santiago Moral” published by Hindawi Complexity Volume
2018, Article ID 5764370, 9 pages
[12] “Credit Card Fraud Detection: A Realistic Modeling and a Novel Learning Strategy” published
by IEEE Transactions on Neural Networks and Learning Systems, Vol. 29,
No. 8, August 2018
[13] “Credit Card Fraud Detection-by Ishu Trivedi, Monika, Mrigya, Mridushi” published by
International Journal of Advanced Research in Computer and Communication Engineering Vol. 5,
Issue 1, January 2016.
[14] David J. Weston, David J. Hand, M. Adams, Whitrow and Piotr Juszczak “Plastic Card Fraud
Detection using Peer Group Analysis” Springer, Issue 2008.
[15] Kho, John Richard D., and Larry A. Vea. “Credit Card Fraud Detection Based on Transaction
Behaviour.” Proc. of the 2017 IEEE Region 10 Conference (TENCON), Malaysia, November 5-8, 2017.

