Project Report 5
DECLARATION
I, Kaushal Kumar, Roll No. 2022002825, a second-year MBA student of UNIVERSITY SCHOOL OF MANAGEMENT, hereby declare that the project titled "Detection of Credit Card Fraud Transactions Using Machine Learning Algorithms", covering the detection of fraudulent credit card transactions using machine learning algorithms and the various controlling techniques used, and submitted in partial fulfilment of the requirements for the degree, is my own work. The information and data given in the report are authentic and true to the best of my knowledge.
I also declare that this project has not been submitted earlier for the award of any other degree or diploma.
Kaushal Kumar
LIST OF FIGURES

Figure No.    Name of the Figure         Page No.
Fig 4.2.3     Decision Tree Algorithm    22
Fig 5.1       System Architecture        26
Fig 5.3       Activity Diagram           28
Fig 2         Dataset                    49
Fig 3         Data set reading code      50
CHAPTER-1
INTRODUCTION
1.1 Overview
The credit card is one of the most popular modes of payment. As the number of credit card users rises worldwide, identity theft and fraud are also increasing. In a virtual card purchase, only the card information is required, such as the card number, expiration date, and security code; such purchases are normally made on the Internet or over the telephone. To commit fraud in these types of purchases, a person simply needs to know the card details. Since payment for online purchases is mostly made by credit card, the details of the credit card must be kept private: to secure credit card privacy, the details should not be leaked. Card details can be stolen through phishing websites, stolen or lost cards, counterfeit cards, theft of card details, intercepted cards, and so on; for security purposes, these must be guarded against. In online fraud, the transaction is made remotely and only the card's details are needed. A simple way to detect this type of fraud is to analyze the spending pattern on every card and to flag any variation from the "usual" spending pattern. Fraud detection based on analyzing a cardholder's existing purchase data is one of the best ways to reduce the rate of successful credit card frauds. Because real transaction data sets are rarely made public and results are often not disclosed, fraud cases must be detected from the available data, known as logged data and user behavior. At present, fraud detection is implemented by a number of methods such as data mining, statistics, and artificial intelligence.
1.2 Problem Statement
The cardholder faces a great deal of trouble before an investigation finishes. Because every transaction is maintained in a log, a huge amount of data must be stored. Moreover, with the large number of online purchases made today, we do not know who is actually using the card online; only the IP address is captured for verification purposes. Help from cyber-crime authorities is therefore needed to investigate the fraud.
1.3 Significance and Relevance of Work
Relevance of work includes consideration of all possible ways to provide a solution to the given problem. The proposed solution should satisfy all user requirements and should be flexible enough that future changes can easily be made based on upcoming requirements, such as new machine learning techniques.
There are two important categories of machine learning techniques for identifying frauds in credit card transactions: supervised and unsupervised learning models. In the supervised approach, earlier credit card transactions are labelled as genuine or fraudulent; the scheme then identifies fraudulent transactions from the credit card data.
1.4 Objectives
The objective of the project is to predict fraudulent and non-fraudulent transactions with respect to the time and amount of the transaction, using classification machine learning algorithms such as SVM, Random Forest, and Decision Tree, with the confusion matrix used in evaluating the machine learning models.
1.5 Methodology
First, the dataset is read. Exploratory data analysis is performed on the dataset to clearly understand the statistics of the data. Feature selection is applied, and a machine learning model is developed. The model is trained and tested, and its performance is analyzed using evaluation techniques such as accuracy, the confusion matrix, and precision.
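The methodology just described can be sketched end to end as a short script. The dataset here is a small synthetic stand-in (the project itself reads creditcard.csv, as shown in Chapter 6), so the column names and numbers are illustrative only.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

# Synthetic stand-in for pd.read_csv('creditcard.csv'): numeric features
# plus a binary 'Class' label (0 = genuine, 1 = fraud).
rng = np.random.default_rng(1)
n = 400
data = pd.DataFrame({
    'Amount': rng.normal(50, 20, n),
    'Time': rng.uniform(0, 1000, n),
})
# illustrative labelling rule: unusually large amounts are "fraud"
data['Class'] = (data['Amount'] > 80).astype(int)

print(data.describe())                       # exploratory data analysis

X = data.drop('Class', axis=1)               # features
y = data['Class']                            # label
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1)     # 80% train / 20% test

model = DecisionTreeClassifier(random_state=1).fit(X_train, y_train)
y_pred = model.predict(X_test)
acc = accuracy_score(y_test, y_pred)
print('Accuracy:', acc)
print('Confusion matrix:\n', confusion_matrix(y_test, y_pred))
```

The same skeleton (read, explore, split, train, evaluate) is what Chapter 6 implements on the real data.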
Chapter-1
1. Overview: The overview provides the basic layout of and insight into the proposed work. It briefly states the need for the currently proposed work.
2. Problem statement: A problem statement is a concise description of an issue. We have described the work we are trying to accomplish in this section.
5. Methodology: A methodology is a collection of methods, practices, processes and techniques. In this section we have briefly explained the working of the project.
Chapter-2
1. Literature Survey: The purpose of a literature survey is to gain an understanding of the existing resources on a particular topic or area of study. We have referred to many research papers relevant to our work.
Chapter-3
1. System Requirements and Specifications: System Requirements and Specifications is a document that describes the nature of a project, software, or application. This section contains brief knowledge about the functional and non-functional requirements that are needed to implement the project.
Chapter-4
1. System Analysis: System Analysis describes the existing system and the proposed system of the project, along with their advantages and disadvantages.
Chapter-5
1. System Design: System design describes the project modules and details the Activity Diagram, Use Case Diagram, Data Flow Diagram, and Sequence Diagram of the project.
Chapter-6
1. Implementation: Implementation describes the detailed concepts of the project, the algorithms with their detailed steps, and the code implementing the algorithms.
Chapter-7
1. Testing: This chapter describes the testing process and test cases.
Chapter-8
1. Performance Analysis: Performance Analysis describes the performance of the system in terms of the evaluation metrics used.
Chapter-9
1. Conclusion and Future Enhancement: Conclusion and Future Enhancement gives a brief summary of the project and the enhancements that may be taken up in the future.
CHAPTER-2
LITERATURE SURVEY
2.1 Credit card fraud Detection techniques: Data and Technique oriented perspective
Disadvantages
2.3 Credit card fraud detection using machine learning algorithms
Authors: Vaishnavi Nath Dornadula, Geetha S.
The main aim of the paper is to design and develop a novel fraud detection method for
Streaming Transaction Data, with an objective, to analyze the past transaction details of
the customers and extract the behavioral patterns.
Disadvantages
• Imbalanced Data
2.4 Fraudulent transaction detection in card by applying ensemble machine learning techniques
Authors: Debachudamani Prusti, Santanu Kumar Rath
2.5 Detection of Credit Card Fraud Transactions using Machine Learning Algorithms
and Neural Network
Credit card fraud resulting from misuse of the system is defined as theft or misuse of
one’s credit card information which is used for personal gains without the permission of
the card holder. To detect such frauds, it is important to check the usage patterns of a user
over the past transactions. Comparing the usage pattern and current transaction, we can
classify it as either fraud or a legitimate transaction.
Disadvantages
2.6 Credit card fraud detection using machine learning algorithm and cyber security
Authors: Jiatong Shen
As the algorithms have the same accuracy, the time factor is considered to choose the best algorithm. By considering the time factor, they concluded that the AdaBoost algorithm works well to detect credit card fraud.
Disadvantages
• Accuracy is not perfect
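The AdaBoost approach summarized above can be sketched with scikit-learn's AdaBoostClassifier. The imbalanced synthetic dataset below is only a stand-in for the paper's data, so the score is illustrative.

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Highly imbalanced synthetic stand-in for a card-transaction dataset
# (about 3% "fraud" samples).
X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.97, 0.03], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# AdaBoost: an ensemble of weak learners, each reweighted toward the
# examples the previous learners got wrong.
clf = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
score = clf.score(X_te, y_te)
print('AdaBoost accuracy:', score)
```

Note that on such imbalanced data the raw accuracy is dominated by the majority class, which is exactly the "accuracy is not perfect" caveat the survey raises.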
CHAPTER-3
SYSTEM REQUIREMENTS AND SPECIFICATION
System Requirement Specification (SRS) is a fundamental document which forms the foundation of the software development process. The SRS document describes all the data, functional, and behavioral requirements of the software under production or development. An SRS is basically an organization's understanding (in writing) of a customer or potential client's system requirements and dependencies at a particular point in time, usually prior to any actual design or development work. It is a two-way insurance policy that assures that both the client and the organization understand the other's requirements from that perspective at a given point in time. The SRS also functions as a blueprint for completing a project with as little cost growth as possible. The SRS is often referred to as the "parent" document because all subsequent project management documents, such as design specifications, statements of work, software architecture specifications, testing and validation plans, and documentation plans, are related to it. It is important to note that an SRS contains functional and non-functional requirements only. It does not offer design suggestions, possible solutions to technology or business issues, or any other information beyond the development team's understanding of the customer's system requirements.
Functional Requirements define a function of a software system and how the system must behave when presented with specific inputs or conditions. These may include calculations, data manipulation and processing, and other specific functionality. In this system the following are the functional requirements:
CHAPTER-4
SYSTEM ANALYSIS
Systems analysis is the process by which an individual studies a system so that an information system can be analyzed and modeled and a logical alternative chosen. Systems analysis projects are initiated for three reasons: problems, opportunities, and directives.
• Credit card fraud detection (CCFD) using neural networks has also been proposed. The existing credit card fraud detection system using a neural network follows the whale swarm optimization algorithm to obtain an incentive value.
4.1.1 Limitation
• If the time interval is too short, then Markov models are inappropriate because
the individual displacements are not random, but rather are deterministically
related in time. This example suggests that Markov models are generally
inappropriate over sufficiently short time intervals.
Fig 4.2.2 simplified random algorithm
Decision tree
4.2.1 Advantages
CHAPTER-5
SYSTEM DESIGN
5.1 Project Modules
The entire project is divided into the following modules:
a. Data gathering and pre-processing.
b. Training the model using the following machine learning algorithms:
i. SVM
ii. Random Forest Classifier
iii. Decision Tree
A model is trained on the training dataset with each of these algorithms.
c. The trained models are evaluated with the testing data and the results are reported using accuracy score, precision, and recall. The results are then displayed using various data visualization tools for analysis purposes.
d. The algorithm which provides the better accuracy rate compared to the remaining algorithms is considered the best model.
e. The database is connected to the front end to store the information of the users who are using the system.
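The train-and-compare modules above can be sketched as follows. The synthetic dataset stands in for the project's pre-processed transaction data, so the scores are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the pre-processed transaction data.
X, y = make_classification(n_samples=1000, n_features=8, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

# Module b: train one model per algorithm.
models = {
    'SVM': SVC(),
    'Random Forest': RandomForestClassifier(random_state=42),
    'Decision Tree': DecisionTreeClassifier(random_state=42),
}
# Modules c/d: evaluate each on the test data and pick the best accuracy.
scores = {name: m.fit(X_tr, y_tr).score(X_te, y_te) for name, m in models.items()}
best = max(scores, key=scores.get)
for name, s in scores.items():
    print(f'{name}: {s:.4f}')
print('Best model:', best)
```

The real implementation in Chapter 6 follows the same pattern on creditcard.csv.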
SYSTEM ARCHITECTURE
The main purpose of our project is to build a credit card fraud detection system that protects people from online credit card frauds. The main point of a credit card fraud detection system is to keep our transactions safe and secure. With this system, fraudsters do not have the chance to make multiple transactions on a stolen or counterfeit card before the cardholder becomes aware of the fraudulent activity. The model is then used to identify whether a new transaction is fraudulent or not. Our aim here is to detect 100% of the fraudulent transactions while minimizing the incorrect fraud classifications.
The activity diagram is an important diagram in UML for describing the dynamic aspects of the system. An activity diagram is basically a flowchart representing the flow from one activity to another, where an activity can be described as an operation of the system. The control flow is drawn from one operation to another; this flow can be sequential, branched, or concurrent. Activity diagrams deal with all types of flow control using different elements such as fork, join, etc. The basic purpose of an activity diagram is to capture the dynamic behavior of the system: it shows the message flow from one activity to another, where an activity is a particular operation of the system. Activity diagrams are not only used for visualizing the dynamic nature of a system; they are also used to construct the executable system using forward and reverse engineering techniques. The only thing missing in the activity diagram is the message part.
The sequence diagram represents the flow of messages in the system and is also termed an event diagram. It helps in envisioning several dynamic scenarios. It portrays the communication between any two lifelines as a time-ordered sequence of events in which these lifelines take part at run time. In UML, a lifeline is represented by a vertical dashed line, and messages flow as arrows between the lifelines. It incorporates iterations as well as branching.
Fig 5.5 Data Flow Diagram
CHAPTER-6
IMPLEMENTATION
6.1 Algorithm
Step 1: Import the dataset.
Step 4: Decide the amount of data for training data and testing data.
Step 5: Give 80% of the data for training and the remaining data for testing.
Step 7: Choose the algorithm among the 3 different algorithms and create the model.
Step 11: Compare the algorithms for all the variables and find out the best algorithm.
CODE
Importing Libraries
!pip install tensorflow
# for numerical operations
import numpy as np
# data handling
import pandas as pd
# data visualization
import matplotlib.pyplot as plt
import seaborn as sns
DATA ACQUISITION
data = pd.read_csv('creditcard.csv')
data
DATA ANALYSIS
data.shape
data.info()
data.describe()
sns.countplot(x='Class', data=data)
print("Fraud: ", data.Class.sum() / data.Class.count())
Fraud_class = pd.DataFrame({'Fraud': data['Class']})
valid = data[data['Class'] == 0]
fraud = data[data['Class'] == 1]  # added: 'fraud' is used below but was missing from the listing
fraud.Amount.describe()
plt.figure(figsize=(20, 20))
plt.title('Correlation Matrix', y=1.05, size=15)
sns.heatmap(data.astype(float).corr(), linewidths=0.1, vmax=1.0)
DATA NORMALIZATION
from sklearn.preprocessing import RobustScaler

rs = RobustScaler()
X = data.drop(['Class'], axis=1)
Y = data["Class"]
# scale the unnormalized columns (assumed to be Time and Amount, as in the Kaggle dataset)
X[['Time', 'Amount']] = rs.fit_transform(X[['Time', 'Amount']])
DATA SPLITTING
from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=1)
X_train
X_test
Y_test
plt.ylabel('True class')
plt.xlabel('Predicted class')
plt.show()
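The evaluate() helper invoked in the listing below is not included in the extracted code. A minimal version consistent with the metrics reported in Chapter 8 might look like this; the function name and argument order follow the calls in the listing, and everything else is an assumption.

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

def evaluate(y_pred, y_true):
    """Print the metrics reported in Chapter 8 (assumed helper, not original code)."""
    print('Accuracy :', accuracy_score(y_true, y_pred))
    print('Precision:', precision_score(y_true, y_pred))
    print('Recall   :', recall_score(y_true, y_pred))
    print('F1-Score :', f1_score(y_true, y_pred))
    print('Confusion matrix:\n', confusion_matrix(y_true, y_pred))

# tiny usage example with hand-made labels
evaluate([0, 1, 1, 0], [0, 1, 0, 0])
```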
# Evaluating SVC
# (Y_pred_svm / Y_pred_rf are assumed to come from SVC and Random Forest
#  models trained earlier; that training code is missing from the listing)
evaluate(Y_pred_svm, Y_test)
# Evaluating Random Forest
evaluate(Y_pred_rf, Y_test)
# Decision Tree
from sklearn.tree import DecisionTreeClassifier
dtc = DecisionTreeClassifier()
dtc.fit(X_train, Y_train)
# predictions
CHAPTER-7
TESTING
Testing is a process of executing a program with the intent of finding errors. Testing presents an interesting anomaly for the software engineer. The goal of software testing is to convince the system developer and customers that the software is good enough for operational use. Testing is a process intended to build confidence in the software. It is a set of activities that can be planned in advance and conducted systematically. Software testing is often referred to as verification and validation.
In unit testing we test each module individually and then integrate it with the overall system. Unit testing focuses verification efforts on the smallest unit of software design, the module; this is also known as module testing. Each module of the system is tested separately. This testing is carried out during the programming stage itself. In this testing step, each module is checked to be working satisfactorily with regard to the expected output from the module. There are also validation checks for fields. This makes it very easy to find and debug errors in the system.
Functional tests provide systematic demonstrations that the functions tested are available as specified by the business and technical requirements, system documentation, and user manuals. Functional testing is centered on the following items:
Data can be lost across an interface; one module can have an adverse effect on another; and sub-functions, when combined, may not produce the desired major function. Integration testing is the systematic technique for constructing the program structure while uncovering errors within the interfaces. The testing was done with sample data, and the developed system ran successfully for this sample data. The need for integration testing is to find the overall system performance.
CHAPTER-8
PERFORMANCE ANALYSIS
PERFORMANCE METRICS
The basic performance measures are derived from the confusion matrix. The confusion matrix is a 2-by-2 table containing the four outcomes produced by a binary classifier. Various measures such as sensitivity, specificity, accuracy, and error rate are derived from the confusion matrix.

Accuracy: Accuracy is calculated as the total number of correct predictions (A + B) divided by the total size of the dataset (C + D). It is equal to (1 - error rate).
Accuracy = (A + B) / (C + D)
where
A = True Positives, B = True Negatives, C = Total Positives, D = Total Negatives.

Error rate: The error rate is calculated as the total number of incorrect predictions (E + F) divided by the total size of the dataset (C + D).
Error rate = (E + F) / (C + D)
where
E = False Positives, F = False Negatives, C = Total Positives, D = Total Negatives.

Sensitivity: Sensitivity is calculated as the number of correct positive predictions (A) divided by the total number of positives (C).
Sensitivity = A / C

Specificity: Specificity is calculated as the number of correct negative predictions (B) divided by the total number of negatives (D).
Specificity = B / D
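Using the notation above (A = true positives, B = true negatives, E = false positives, F = false negatives, with C = A + F total positives and D = B + E total negatives), the four measures can be computed directly. The counts below are made up purely for illustration.

```python
# Hypothetical confusion-matrix counts for illustration only.
A = 90    # true positives
B = 880   # true negatives
E = 20    # false positives
F = 10    # false negatives

C = A + F            # total actual positives
D = B + E            # total actual negatives

accuracy    = (A + B) / (C + D)
error_rate  = (E + F) / (C + D)
sensitivity = A / C
specificity = B / D

print(f'Accuracy:    {accuracy:.3f}')    # 970 / 1000
print(f'Error rate:  {error_rate:.3f}')  # 30 / 1000
print(f'Sensitivity: {sensitivity:.3f}') # 90 / 100
print(f'Specificity: {specificity:.3f}') # 880 / 900
```

Note that accuracy and error rate always sum to 1, which matches the "(1 - error rate)" relation stated above.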
DATA ANALYSIS

SVM
Accuracy: 0.9994557775359011
Precision: 0.6781609195402298
Recall: 0.9516129032258065
F1-Score: 0.7919463087248322
Fraud class (classification report): precision 0.68, recall 0.95, F1-score 0.79, support 62
RANDOM FOREST
Accuracy: 0.9995611109160493
Precision: 0.7701149425287356
Recall: 0.9305555555555556
DECISION TREE
Accuracy: 0.9992802219023208
Precision: 0.7241379310344828
Recall: 0.7875
F1-Score: 0.7544910179640718
Normal class (classification report): precision 1.00, recall 1.00, F1-score 1.00, support 56882
CHAPTER-9
CONCLUSION & FUTURE ENHANCEMENT
Nowadays, in the global computing environment, online payments are important, because online payments use only the credential information from the credit card to complete a transaction and then deduct money. For this reason, it is important to find the best solution to detect the maximum number of frauds in online systems.
Accuracy, error rate, sensitivity, and specificity are used to report the performance of the system in detecting credit card fraud. In this project, three machine learning algorithms were developed to detect fraud in the credit card system. To evaluate the algorithms, 80% of the dataset is used for training and 20% for testing and validation. Accuracy, error rate, sensitivity, and specificity are evaluated for different variables for the three algorithms. The accuracy results for SVM, decision tree, and random forest classifier are 99.94%, 99.92%, and 99.95% respectively. The comparative results show that the random forest performs better than the SVM and decision tree techniques.
Future Enhancement
Although the system does not yet detect every fraudulent transaction, we did end up creating a system that can, with enough time and data, get very close to that goal. As with any such project, there is some room for improvement here. The very nature of this project allows multiple algorithms to be integrated together as modules, and their results can be combined to increase the accuracy of the final result. This model can be further improved by adding more algorithms to it; however, the output of these algorithms needs to be in the same format as the others. Once that condition is satisfied, the modules are easy to add, as done in the code. This provides a great degree of modularity and versatility to the project. More room for improvement can be found in the dataset. As demonstrated before, the precision of the algorithms increases when the size of the dataset is increased. Hence, more data will surely make the model more accurate in detecting frauds and reduce the number of false positives.
BIBLIOGRAPHY
B. Meena, I. S. L. Sarwani, S. V. S. S. Lakshmi, "Web Service Mining and its Techniques in Web Mining", IJAEGT, vol. 2, issue 1, pp. 385-389.
F. N. Ogwueleka, "Data Mining Application in Credit Card Fraud Detection System", Journal of Engineering Science and Technology, vol. 6, no. 3, pp. 311-322, 2019.
G. Singh, R. Gupta, A. Rastogi, M. D. S. Chandel, A. Riyaz, "A Machine Learning Approach for Detection of Fraud based on SVM", International Journal of Scientific Engineering and Technology, vol. 1, no. 3, pp. 194-198, 2019, ISSN: 2277-1581.
K. Chaudhary, B. Mallick, "Credit Card Fraud: The study of its impact and detection techniques", International Journal of Computer Science and Network (IJCSN), vol. 1, no. 4, pp. 31-35, 2019, ISSN: 2277-5420.
S. Bhattacharyya, S. Jha, K. Tharakunnel, J. C. Westland, "Data mining for credit card fraud: A comparative study", Decision Support Systems, vol. 50, no. 3, pp. 602-613, 2019.
Y. Sahin, E. Duman, "Detecting credit card fraud by ANN and logistic regression", Innovations in Intelligent Systems and Applications (INISTA) 2018 International Symposium, pp. 315-319, 2018.
APPENDIX
Appendix A: Screen Shots
Fig. 2 Dataset
Fig. 3 Data set reading code
Appendix B: Abbreviations