
Project Report 5


DETECTION OF CREDIT CARD FRAUD TRANSACTION

USING MACHINE LEARNING ALGORITHM


A Project Report Submitted in Partial fulfillment of the requirement for the Award
of the degree
MASTER OF BUSINESS ADMINISTRATION
IN
FACULTY OF COMMERCE & MANAGEMENT
BY
Mr. Kaushal Kumar
{Registration Number:22-UD-493;Roll no:47}
Under the supervision of
Mrs.

UNIVERSITY SCHOOL OF MANAGEMENT


KURUKSHETRA UNIVERSITY, KURUKSHETRA [HARYANA]
ACKNOWLEDGEMENT

This report is the output of a collaborative effort. It would not have been
possible without the help and guidance of some people whom I would like to
acknowledge before I begin.

With profound respect and gratitude, I take this opportunity to convey my
thanks to those who enabled me to complete the course here. I extend my heartfelt
thanks to Prof. Bhanwar Singh for giving me the opportunity to be a part of this
esteemed organization. I am extremely grateful to Prof. Bhanwar Singh, my mentor,
for their cooperation and guidance, which helped me a lot during the internship on
Inventory Management. I have learnt a lot working under them, and I will always be
indebted to them for this value addition.

I would also like to thank the UNIVERSITY SCHOOL OF MANAGEMENT and all the
faculty members of the USM department for their constant cooperation, which has
been a significant factor in the accomplishment of my summer training.

DECLARATION
I, Kaushal Kumar, Roll no: 2022002825, a 2nd-year MBA student of UNIVERSITY SCHOOL OF
MANAGEMENT, hereby declare that the training report titled "Detection of credit card fraud
transaction using machine learning algorithm", which covers how such fraud is detected and the
various controlling techniques used, is submitted in partial fulfillment of the requirements
for the degree. The information and data given in the report are authentic and true to the
best of my knowledge.

I also declare that this project has not been submitted earlier for the award of any other
degree, diploma, or literature.

Kaushal Kumar

LIST OF FIGURES

Figure No.      Name of the Figure                      Page no.
Figure 4.1.1    Fraud and Non Fraud Representation      20
Figure 4.2.1    SVM Representation                      21
Figure 4.2.2    Simplified Random Forest algorithm      22
Figure 4.2.3    Decision tree Algorithm                 22
Figure 5.1      System Architecture                     26
Figure 5.2      Activity Diagram                        27
Figure 5.3      Use Case Diagram                        28
Figure 5.4      Sequence diagram                        29
Figure 5.5      Data Flow diagram                       30
Figure 8.1      Dataset analysis                        42
Figure 1        Correlation Matrix                      49
Figure 2        Dataset                                 49
Figure 3        Data set reading code
Figure 4        Confusion Matrix                        50

CHAPTER-1
INTRODUCTION

1.1 Overview
The credit card is one of the most popular modes of payment. As the number of credit
card users rises worldwide, identity theft and fraud are rising too. A virtual card
purchase requires only the card information, such as the card number, expiration date,
and security code. Such purchases are normally made on the Internet or over the
telephone, so to commit fraud a person simply needs to know the card details. Because
most online purchases are paid for by credit card, these details should be kept private
and must not be leaked. Card details can be stolen through phishing websites, stolen or
lost cards, counterfeit cards, theft of card details, intercepted cards, and so on. In
online fraud, the transaction is made remotely and only the card's details are needed.
A simple way to detect this type of fraud is to analyze the spending pattern on every
card and to look for any deviation from the "usual" pattern. Analyzing the cardholder's
existing purchase data is an effective way to reduce the rate of successful credit card
fraud. Because real data sets are rarely available and results are often not disclosed
to the public, fraud cases have to be detected from the data that is available, namely
the logged transaction data and user behavior. At present, fraud detection is implemented
by a number of methods drawn from data mining, statistics, and artificial intelligence.
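
The idea of comparing a new purchase with the card's "usual" spending pattern can be sketched with a simple z-score. This is only an illustration and not the method used later in this report; the function names, the sample amounts, and the threshold of 3 standard deviations are assumptions made for the example.

```python
from statistics import mean, stdev

def spending_z_score(history, amount):
    """Return how many standard deviations `amount` lies from the card's mean spend."""
    if len(history) < 2:
        raise ValueError("need at least two past transactions")
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # All past amounts are identical: any different amount is maximally unusual.
        return 0.0 if amount == mu else float("inf")
    return (amount - mu) / sigma

def is_unusual(history, amount, threshold=3.0):
    """Flag a transaction whose amount deviates strongly from past behaviour."""
    return abs(spending_z_score(history, amount)) > threshold

# Hypothetical purchase history for one card (amounts in any currency unit).
past_amounts = [20.0, 35.5, 18.0, 42.0, 27.5, 31.0]
print(is_unusual(past_amounts, 30.0))    # close to the usual spend
print(is_unusual(past_amounts, 900.0))   # far above the usual spend
```

A real system would look at more than the amount (time of day, merchant, location), which is why the report moves on to learned classifiers.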

1.2 Problem Statement
The cardholder faces a lot of trouble before a fraud investigation finishes. Because every
transaction is kept in a log, a huge amount of data has to be maintained. In addition, many
purchases are now made online, so we do not know who is actually using the card; only the
IP address is captured for verification purposes. As a result, help from the cyber-crime
cell is needed to investigate the fraud.
1.3 Significance and relevance of work
The relevance of the work lies in considering all the possible ways of solving the given
problem. The proposed solution should satisfy all the user requirements and should be
flexible enough that future changes, such as new machine learning techniques, can easily
be made as requirements evolve.
There are two important categories of machine learning techniques for identifying fraud
in credit card transactions: supervised and unsupervised learning. In the supervised
approach, earlier credit card transactions are labelled as genuine or fraudulent, and the
model then uses this labelled data to identify fraudulent transactions in new credit card data.

1.4 Objectives
The objective of the project is to predict fraudulent and non-fraudulent transactions
with respect to the time and amount of the transaction, using classification machine
learning algorithms such as SVM, Random Forest, and Decision Tree, and to evaluate the
resulting models with a confusion matrix.

1.5 Methodology
First, the dataset is read. Exploratory data analysis is performed on the dataset to
understand its statistics clearly, and feature selection is applied. A machine learning
model is then developed, trained, and tested, and its performance is analyzed using
evaluation measures such as accuracy, the confusion matrix, and precision.

1.6 Organization of the report

Chapter-1
1. Overview: The overview provides the basic layout of, and insight into, the proposed
work. It briefly explains the need for the proposed work.
2. Problem statement: A problem statement is a concise description of an issue to be
addressed or a condition to be improved upon. We have identified the gap that is to be
addressed.
3. Significance and Relevance of Work: We have described the contribution of our work
to society.
4. Objectives: A project objective describes the desired results of the work. In this
section we have described what we are trying to accomplish.
5. Methodology: A methodology is a collection of methods, practices, processes and
techniques. In this section we have briefly explained how the project works.

Chapter-2
1. Literature Survey: The purpose of a literature review is to gain an understanding of
the existing resources on a particular topic or area of study. We have referred to many
research papers relevant to our work.

Chapter-3
1. System Requirements and Specifications: System Requirements and Specifications is a
document that describes the nature of a project, software or application. This section
briefly covers the functional and non-functional requirements that are needed to
implement the project.

Chapter-4
1. System Analysis: System Analysis describes the existing system and the proposed
system in the project, along with their advantages and disadvantages.

Chapter-5
1. System design: System design describes the project modules, Activity Diagram, Use
Case Diagram, Data Flow Diagram, and Sequence Diagram in detail.

Chapter-6
1. Implementation: Implementation describes the detailed concepts of the project, the
algorithm with its detailed steps, and the code for implementing the algorithm.

Chapter-7
1. Testing: Testing describes the following:
a. Methods of testing: This contains information about Unit testing, Validation testing,
Functional testing, Integration testing, and User Acceptance testing.
b. Test Cases: This contains a detailed description of the program test cases.

Chapter-8
1. Performance Analysis: Performance Analysis describes the study of the system in detail.

Chapter-9
1. Conclusion and Future Enhancement: Conclusion and Future Enhancement gives a brief
summary of the project and of the future work that is yet to be determined.


CHAPTER-2

LITERATURE SURVEY

2.1 Credit card fraud Detection techniques: Data and Technique oriented perspective

Authors: Samaneh Sorournejad, Zahra Zojaji, Amir Hassan Monadjemi.

In this paper, after investigating the difficulties of credit card fraud detection, the
authors review the state of the art in credit card fraud detection techniques, datasets
and evaluation criteria.

Disadvantages

• Lack of standard metrics

2.2 Detection of credit card fraud: State of art

Authors: Imane Sadgali, Nawal Sael, Faouzia Benabbou
In this paper, the authors present a state of the art of various techniques of credit card
fraud detection. The purpose of the study is to review the techniques implemented for
credit card fraud detection, analyse their strengths and limitations, and synthesize the
findings in order to identify the techniques and methods that give the best results so far.
Disadvantages
• Lack of adaptability

2.3 Credit card fraud detection using machine learning algorithm
Authors: Vaishnavi Nath Dornadula, Geetha S.
The main aim of the paper is to design and develop a novel fraud detection method for
streaming transaction data, with the objective of analyzing the past transaction details
of the customers and extracting their behavioral patterns.

Disadvantages

• Imbalanced data

2.4 Fraudulent transaction detection in credit card by applying ensemble machine learning techniques
Authors: Debachudamani Prusti, Santanu Kumar Rath

In this study, the application of various classification models is proposed by
implementing machine learning techniques to find out the accuracy and other
performance parameters for identifying fraudulent transactions.
Disadvantages
• Overlapping data

2.5 Detection of Credit Card Fraud Transactions using Machine Learning Algorithms
and Neural Network

Authors: Deepti Dighe, Sneha Patil, Shrikant Kokate

Credit card fraud resulting from misuse of the system is defined as theft or misuse of
one’s credit card information which is used for personal gains without the permission of
the card holder. To detect such frauds, it is important to check the usage patterns of a user
over the past transactions. Comparing the usage pattern and current transaction, we can
classify it as either fraud or a legitimate transaction.

Disadvantages

• Different misclassification importance

2.6 Credit card fraud detection using machine learning algorithm and cyber security
Authors: Jiatong Shen
Because the compared algorithms have the same accuracy, the time factor is considered in
order to choose the best algorithm. Considering the time factor, the authors concluded that
the AdaBoost algorithm works well for detecting credit card fraud.
Disadvantages
• Accuracy is not perfect

CHAPTER -3
SYSTEM REQUIREMENTS AND SPECIFICATION

3.1 System Requirement Specification:

System Requirement Specification (SRS) is a fundamental document, which forms the
foundation of the software development process. The System Requirements Specification
(SRS) document describes all data, functional and behavioral requirements of the
software under production or development. An SRS is basically an organization's
understanding (in writing) of a customer or potential client's system requirements and
dependencies at a particular point in time (usually) prior to any actual design or
development work. It is a two-way insurance policy that assures that both the client and
the organization understand the other's requirements from that perspective at a given
point in time. The SRS also functions as a blueprint for completing a project with as little
cost growth as possible. The SRS is often referred to as the "parent" document because
all subsequent project management documents, such as design specifications, statements
of work, software architecture specifications, testing and validation plans, and
documentation plans, are related to it. It is important to note that an SRS contains
functional and non-functional requirements only. It doesn't offer design suggestions,
possible solutions to technology or business issues, or any other information other than
what the development team understands the customer's system requirements to be.

3.2 Hardware specification


➢ RAM: 4 GB and higher
➢ Processor: Intel i3 and above
➢ Hard Disk: 500 GB minimum

3.3 Software specification

➢ OS: Windows or Linux
➢ Python IDE: Python 2.7.x and above
➢ Jupyter Notebook
➢ Language: Python

3.4 Functional Requirements:

Functional Requirement defines a function of a software system and how the system
must behave when presented with specific inputs or conditions. These may include
calculations, data manipulation and processing, and other specific functionality. In this
system, the functional requirements are as follows:

• Collect the datasets.
• Train the model.
• Predict the results.

3.5 Non-Functional Requirements
• The system should be easy to maintain.
• The system should be compatible with different platforms.
• The system should be fast, as customers always need speed.
• The system should be accessible to online users.
• The system should be easy to learn for both sophisticated and novice users.
• The system should provide easy, navigable and user-friendly interfaces.
• The system should produce reports in different forms such as tables and
graphs for easy visualization by management.
• The system should have a standard graphical user interface that allows for the online
3.6 Performance Requirement:

Performance is measured in terms of the output provided by the application.
Requirement specification plays an important part in the analysis of a system. Only when
the requirement specifications are properly given is it possible to design a system that
will fit into the required environment. It rests largely with the users of the existing
system to give the requirement specifications, because they are the people who finally use
the system. The requirements have to be known during the initial stages so that the system
can be designed according to them. It is very difficult to change the system once it has
been designed; on the other hand, designing a system that does not cater to the
requirements of the user is of no use.

CHAPTER-4

SYSTEM ANALYSIS

Systems analysis is the process by which an individual studies a system such that an
information system can be analyzed, modeled, and a logical alternative can be chosen.
Systems analysis projects are initiated for three reasons: problems, opportunities, and
directives.

4.1 Existing System


• Since credit card fraud detection is a highly researched field, there are many
different algorithms and techniques for performing it.
• One of the earliest systems is a CCFD system using a Markov model. Other existing
algorithms used in credit card fraud detection systems include the cost-sensitive
decision tree (CSDT).
• Credit card fraud detection (CCFD) has also been proposed using neural networks. The
existing credit card fraud detection system using a neural network follows the
whale swarm optimization algorithm to obtain an incentive value.

Figure 4.1.1 fraud and Non Fraud Representation

4.1.1 Limitation

• If the time interval is too short, then Markov models are inappropriate because
the individual displacements are not random, but rather are deterministically
related in time. This example suggests that Markov models are generally
inappropriate over sufficiently short time intervals.

4.2 Proposed System

Support Vector Machine:

SVM works by mapping data to a high-dimensional feature space so that data points can
be categorized, even when the data are not otherwise linearly separable. A separator
between the categories is found, and the data are transformed in such a way that the
separator can be drawn as a hyperplane. The model is then trained, and the
best-performing one is kept.

Fig 4.2.1 SVM Representation
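
The effect of mapping data to a higher-dimensional space can be illustrated with a toy example. The sketch below is not an SVM implementation: the feature map, the four sample points, and the hand-picked hyperplane are assumptions chosen only to show that data which no single threshold can split in one dimension becomes linearly separable after the mapping.

```python
def separable_by_threshold(values, labels):
    """True if some threshold t splits the 1-D values perfectly (either orientation)."""
    for t in sorted(values):
        above = [1 if v > t else 0 for v in values]
        below = [1 - a for a in above]
        if above == labels or below == labels:
            return True
    return False

def feature_map(x):
    """Map a 1-D point to 2-D: (x, x**2)."""
    return (x, x * x)

def linear_classifier(point, w=(0.0, 1.0), b=-2.5):
    """Classify with a hand-picked hyperplane w.z + b = 0 in the mapped space."""
    score = w[0] * point[0] + w[1] * point[1] + b
    return 1 if score > 0 else 0

# Hypothetical data: the outer points belong to class 1, the inner points to class 0.
xs = [-2.0, -1.0, 1.0, 2.0]
labels = [1, 0, 0, 1]

print(separable_by_threshold(xs, labels))                   # no threshold on x works
predictions = [linear_classifier(feature_map(x)) for x in xs]
print(predictions == labels)                                # a hyperplane works after mapping
```

In practice the separating hyperplane is learned from the training data rather than chosen by hand, and kernel functions perform the mapping implicitly.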


Random Forest Classifier
A random forest classifier combines the predictions of many decision trees, each trained
on a bootstrap sample of the training data, and assigns a transaction to the class chosen
by the majority of the trees. The input features are normalized before they are passed to
the classifier.

Fig 4.2.2 Simplified Random Forest algorithm

Decision tree

A decision tree is a type of supervised machine learning used to categorize or make
predictions based on how a previous set of questions was answered. Being a form of
supervised learning, the model is trained and tested on a set of data that contains the
desired categorization.

Fig 4.2.3 Decision tree algorithm
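
To make the "set of questions" concrete, the sketch below shows how a tree could pick one question of the form "is the amount at most t?" by minimizing the weighted Gini impurity of the two resulting groups. The helper names and the six sample transactions are hypothetical; library implementations such as scikit-learn's DecisionTreeClassifier repeat this kind of search over all features and over many levels.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 class labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_threshold(values, labels):
    """Return (threshold, weighted_gini) of the best 'value <= t' question."""
    best = (None, float("inf"))
    # The largest value is skipped because it would leave the right group empty.
    for t in sorted(set(values))[:-1]:
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best

# Hypothetical transaction amounts with fraud labels (1 = fraud).
amounts = [12.0, 25.0, 40.0, 310.0, 450.0, 18.0]
is_fraud = [0, 0, 0, 1, 1, 0]
print(best_threshold(amounts, is_fraud))  # (40.0, 0.0): both groups are pure
```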

4.2.1 Advantages

• Support vector machine works comparatively well when there is a clear margin of
separation between classes.

• SVM is effective in cases where the number of dimensions is larger than the number
of samples.

• Decision trees are simple to understand and to interpret.

• Decision trees require little data preparation.

• The cost of using the tree (i.e., predicting data) is logarithmic in the number of
data points used to train the tree.

• Decision trees are able to handle both numerical and categorical data.

• Random forest classifier can be used to solve regression or classification
problems.

• The random forest algorithm is made up of a collection of decision trees, and
each tree in the ensemble is built from a data sample drawn from the training
set with replacement, called the bootstrap sample.
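
The two ingredients of a random forest named in the last point, bootstrap sampling and combining the trees' answers, can be sketched as follows. The helper names and the toy inputs are assumptions for illustration; the individual trees are left out.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (a bootstrap sample)."""
    return [rng.choice(rows) for _ in rows]

def majority_vote(votes):
    """Return the most common prediction among the trees' votes."""
    return Counter(votes).most_common(1)[0][0]

rng = random.Random(0)            # fixed seed so the sketch is repeatable
training_rows = list(range(10))   # stand-ins for ten training transactions
sample = bootstrap_sample(training_rows, rng)

# The sample has the same size as the training set, but some rows are usually
# repeated and others left out, so each tree sees slightly different data.
print(len(sample), len(set(sample)))

# Five hypothetical trees vote on one transaction (1 = fraud).
print(majority_vote([1, 0, 1, 1, 0]))
```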

CHAPTER -5
SYSTEM DESIGN
5.1 Project Report
The entire project is divided into 3 modules as follows:
1. Data gathering and preprocessing
2. Training the model using the following machine learning algorithms:
   i. SVM
   ii. Random Forest Classifier
   iii. Decision Tree
3. Integrating the final prediction model with the front end

Module 1: Data gathering and data preprocessing

a. A suitable dataset is searched for among the various available ones and finalized.
b. The dataset must be preprocessed before it is used to train the model.
c. In the preprocessing phase, the dataset is cleaned and any redundant values, noisy
data, and null values are removed.
d. The preprocessed data is provided as input to the next module.

Module 2: Training the model

a. The preprocessed data is split into training and testing datasets in an 80:20 ratio,
so that the model is evaluated on data it has not seen and over-fitting or under-fitting
can be detected.
b. A model is trained using the training dataset with each of the following algorithms:
SVM, Random Forest Classifier, and Decision Tree.
c. The trained models are tested with the testing data, and the results are visualized
using bar graphs and scatter plots.
d. The performance of each algorithm is calculated using measures such as accuracy, F1
score, precision, and recall. The results are then displayed using various data
visualization tools for analysis.
e. The algorithm that provides the best accuracy compared with the remaining
algorithms is taken as the final prediction model.
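
Because fraud cases are rare, a purely random 80:20 split can leave very few of them in the test set. One common refinement is a stratified split, which keeps the share of each class the same in both parts. The sketch below is a plain-Python illustration with hypothetical names and toy labels; scikit-learn's train_test_split offers the same behaviour through its `stratify` parameter.

```python
import random

def stratified_split(labels, test_fraction=0.2, seed=1):
    """Return (train_idx, test_idx) keeping each class's share in both parts."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # At least one sample of every class goes to the test part.
        n_test = max(1, round(len(indices) * test_fraction))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Hypothetical labels: 95 genuine transactions and 5 frauds.
labels = [0] * 95 + [1] * 5
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))        # 80 20
print(sum(labels[i] for i in test_idx))     # 1 fraud case in the test part
```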

Module 3: Final prediction model integration with front end

a. The algorithm that has provided the best accuracy rate is taken as the final
prediction model.
b. The model thus made is integrated with the front end.
c. A database is connected to the front end to store the information of the users who use it.

SYSTEM ARCHITECTURE
The main purpose of our project is to make people aware of credit card fraud detection and to
protect them from online credit card fraud. A credit card fraud detection system is necessary
to keep transactions safe and secure. With this system, fraudsters do not have the chance to
make multiple transactions on a stolen or counterfeit card before the cardholder becomes aware
of the fraudulent activity. A model is trained on past transactions and is then used to identify
whether a new transaction is fraudulent or not. Our aim is to detect 100% of the fraudulent
transactions while minimizing incorrect fraud classifications.

Fig 5.1 System Architecture
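
Detecting every fraudulent transaction and keeping incorrect fraud classifications low pull in opposite directions. The sketch below illustrates this with a classifier that outputs a fraud score between 0 and 1: lowering the decision threshold catches more frauds but also flags more genuine transactions. The scores, labels, and function name are hypothetical and are not results from this project.

```python
def recall_and_false_alarms(scores, labels, threshold):
    """Return (recall, false_positive_count) when flagging scores >= threshold."""
    flagged = [s >= threshold for s in scores]
    tp = sum(1 for f, y in zip(flagged, labels) if f and y == 1)
    fp = sum(1 for f, y in zip(flagged, labels) if f and y == 0)
    frauds = sum(labels)
    return (tp / frauds if frauds else 0.0), fp

# Hypothetical fraud scores for eight transactions (label 1 = fraud).
scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.60, 0.80, 0.95]
labels = [0, 0, 0, 1, 0, 0, 1, 1]

for threshold in (0.9, 0.5, 0.3):
    recall, false_alarms = recall_and_false_alarms(scores, labels, threshold)
    print(threshold, round(recall, 2), false_alarms)
```

With these toy numbers, all three frauds are caught only at the lowest threshold, at the cost of two genuine transactions being flagged.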

5.2 Activity diagram

The activity diagram is an important UML diagram for describing the dynamic aspects of a
system. It is basically a flowchart that represents the flow from one activity to another,
where an activity is a particular operation of the system. The control flow is drawn from
one operation to another, and it can be sequential, branched, or concurrent. Activity
diagrams deal with all types of flow control by using elements such as fork and join. The
basic purpose of an activity diagram is to capture the dynamic behavior of the system.
Activity diagrams are used not only for visualizing the dynamic nature of a system but also
for constructing the executable system through forward and reverse engineering techniques.
The only thing missing from the activity diagram is the message part.

Fig 5.2 Activity diagram

5.3 Use case diagram


In UML, use-case diagrams model the behavior of a system and help to capture the
requirements of the system. Use-case diagrams describe the high-level functions and scope of a
system. These diagrams also identify the interactions between the system and its actors. The
use cases and actors in use-case diagrams describe what the system does and how the actors
use it, but not how the system operates internally. Use-case diagrams illustrate and define the
context and requirements of either an entire system or the important parts of the system. You
can model a complex system with a single use-case diagram, or create many use-case diagrams
to model the components of the system. You would typically develop use-case diagrams in the
early phases of a project and refer to them throughout the development process.

Fig 5.3 Use Case Diagram

5.4 Sequence Diagram

The sequence diagram represents the flow of messages in the system and is also termed
an event diagram. It helps in envisioning several dynamic scenarios. It portrays the
communication between any two lifelines as a time-ordered sequence of events in which
these lifelines take part at run time. In UML, each lifeline is represented by a vertical
dashed line, whereas the message flow is represented by horizontal arrows between the
lifelines, ordered from the top to the bottom of the page. The diagram can show
iteration as well as branching.

Fig 5.4 sequence diagram

5.5 Data Flow Diagram


A Data Flow Diagram (DFD) is a traditional visual representation of the information flows
within a system. A neat and clear DFD can depict the right amount of the system requirement
graphically. It can be manual, automated, or a combination of both. It shows how data enters and
leaves the system, what changes the information, and where data is stored. The objective of a
DFD is to show the scope and boundaries of a system as a whole. It may be used as a
communication tool between a system analyst and any person who plays a part in the order that
acts as a starting point for redesigning a system. The DFD is also called a data flow graph or
bubble chart.

Fig 5.5 Data Flow Diagram

CHAPTER-6
IMPLEMENTATION
6.1 Algorithm
Step 1: Import the dataset.

Step 2: Convert the data into data frame format.

Step 3: Do random oversampling using the ROSE package.

Step 4: Decide the amount of data for training and for testing.

Step 5: Give 80% of the data for training and the remaining data for testing.

Step 6: Assign the train dataset to the models.

Step 7: Choose the algorithm among the 3 different algorithms and create the model.

Step 8: Make predictions for the test dataset with each algorithm.

Step 9: Calculate the accuracy of each algorithm.

Step 10: Apply the confusion matrix for each variable.

Step 11: Compare the algorithms for all the variables and find the best algorithm.
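
The random oversampling step can be sketched in Python as follows. ROSE is a package for R, so this is not that package: it is a plain-Python illustration with a hypothetical helper name and toy data, which duplicates randomly chosen fraud rows until both classes are the same size. It is shown on the training data only, on the assumption that the test set should contain no duplicated rows.

```python
import random

def random_oversample(rows, labels, minority=1, seed=0):
    """Duplicate random minority rows until both classes have equal counts."""
    rng = random.Random(seed)
    minority_rows = [r for r, y in zip(rows, labels) if y == minority]
    majority_count = sum(1 for y in labels if y != minority)
    if not minority_rows:
        raise ValueError("no minority samples to oversample")
    extra = max(0, majority_count - len(minority_rows))
    new_rows = list(rows) + [rng.choice(minority_rows) for _ in range(extra)]
    new_labels = list(labels) + [minority] * extra
    return new_rows, new_labels

# Hypothetical training split: four genuine transactions and one fraud.
train_rows = [[10.0], [12.5], [9.0], [11.0], [500.0]]
train_labels = [0, 0, 0, 0, 1]
rows, labels = random_oversample(train_rows, train_labels)
print(labels.count(0), labels.count(1))  # 4 4
```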

CODE
Importing Libraries
!pip install tensorflow

# for numerical operations
import numpy as np

# to store and analyse data in dataframes
import pandas as pd

# data visualization
import matplotlib.pyplot as plt
import seaborn as sns

# python modules for data normalization and splitting
from sklearn.preprocessing import RobustScaler
from sklearn.model_selection import train_test_split

# python modules for creating, training and testing ML algorithms
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

# python modules for the evaluation metrics used in evaluate()
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, classification_report,
                             confusion_matrix)

# python modules for creating, training and testing neural networks
import tensorflow as tf
from tensorflow.keras.models import load_model
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dropout, Dense

DATA ACQUISITION
data = pd.read_csv('creditcard.csv')
data

DATA ANALYSIS
data.shape
data.info()
data.describe()

sns.countplot(x='Class', data=data)

# fraction of transactions labelled as fraud
print("Fraud: ", data.Class.sum() / data.Class.count())

# pie chart of the class distribution
Fraud_class = pd.DataFrame({'Fraud': data['Class']})
Fraud_class['Fraud'].value_counts().plot(kind='pie')

fraud = data[data['Class'] == 1]
valid = data[data['Class'] == 0]
fraud.Amount.describe()

plt.figure(figsize=(20, 20))
plt.title('Correlation Matrix', y=1.05, size=15)
sns.heatmap(data.astype(float).corr(), linewidths=0.1, vmax=1.0,
            square=True, linecolor='white', annot=True)

DATA NORMALIZATION
rs = RobustScaler()
data['Amount'] = rs.fit_transform(data['Amount'].values.reshape(-1, 1))
data['Time'] = rs.fit_transform(data['Time'].values.reshape(-1, 1))
data

CONSIDERING INPUT COLUMNS AND OUTPUT COLUMN
X = data.drop(['Class'], axis=1)
Y = data["Class"]

DATA SPLITTING
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=1)
X_train
X_test
Y_test

def evaluate(Y_test, Y_pred):
    # headline metrics (true labels first, predictions second)
    print("Accuracy: ", accuracy_score(Y_test, Y_pred))
    print("Precision: ", precision_score(Y_test, Y_pred))
    print("Recall: ", recall_score(Y_test, Y_pred))
    print("F1-Score: ", f1_score(Y_test, Y_pred))
    print("AUC score: ", roc_auc_score(Y_test, Y_pred))
    print(classification_report(Y_test, Y_pred, target_names=['Normal', 'Fraud']))

    # confusion matrix drawn as a heatmap
    conf_matrix = confusion_matrix(Y_test, Y_pred)
    plt.figure(figsize=(6, 6))
    sns.heatmap(conf_matrix, xticklabels=['Normal', 'Fraud'],
                yticklabels=['Normal', 'Fraud'], annot=True, fmt="d")
    plt.title("Confusion matrix")
    plt.ylabel('True class')
    plt.xlabel('Predicted class')
    plt.show()

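CREATING ALGORITHMS, TRAINING, TESTING AND EVALUATION
# Creating the support vector classifier (SVC)
svm = SVC()
# Training SVC
svm.fit(X_train, Y_train)
# Testing SVC
Y_pred_svm = svm.predict(X_test)
# Evaluating SVC (true labels first, then predictions)
evaluate(Y_test, Y_pred_svm)

# Random forest model creation, training and prediction
# (these steps are missing from the listing; default settings are assumed)
rf = RandomForestClassifier()
rf.fit(X_train, Y_train)
Y_pred_rf = rf.predict(X_test)
# Evaluation
evaluate(Y_test, Y_pred_rf)

# Decision tree model creation
dtc = DecisionTreeClassifier()
dtc.fit(X_train, Y_train)

# predictions
Y_pred_dt_i = dtc.predict(X_test)
evaluate(Y_test, Y_pred_dt_i)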
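
The evaluate helper takes the true labels first and the predictions second, which matches the (y_true, y_pred) order that scikit-learn's metric functions expect. The order matters: accuracy is symmetric, but precision and recall are not. If the two arguments are passed the other way round, the reported precision and recall trade places, and the "support" column of the classification report counts predicted labels rather than true ones. The plain-Python sketch below, with hypothetical labels and a hypothetical helper name, shows the swap.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Compute (precision, recall) for the positive class from two label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical labels: 3 real frauds, 5 transactions predicted as fraud.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 1, 1, 0, 0]

print(precision_recall(y_true, y_pred))   # correct order: precision 0.4, recall 2/3
print(precision_recall(y_pred, y_true))   # swapped order: the two values trade places
```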

CHAPTER-7
TESTING

Testing is the process of executing a program with the intent of finding errors. Testing
presents an interesting anomaly for software engineering. The goal of software testing
is to convince system developers and customers that the software is good enough for
operational use. Testing is a process intended to build confidence in the software; it is
a set of activities that can be planned in advance and conducted systematically.
Software testing is often referred to as verification & validation.

7.1 Unit Testing

In this testing, each module is tested individually and then integrated with the overall
system. Unit testing focuses verification effort on the smallest unit of software design,
the module, and is therefore also known as module testing. Each module of the system is
tested separately, during the programming stage itself. In this testing step, each module
was found to work satisfactorily with regard to its expected output. There are also
validation checks for some fields. This makes it very easy to find and debug errors in
the system.

7.2 Validation Testing

At the culmination of black box testing, the software is completely assembled as a
package, interfacing errors have been uncovered and corrected, and a final series of
software tests begins. The output displayed or generated by the system under
consideration is tested by asking the user about the format required. Here the output
format considered is the on-screen display. The output format on the screen was found to
be correct, as the format was designed in the system phase according to the user's needs.
For the hard copy also, the output comes out as specified by the user. Hence the output
testing did not result in any correction to the system.

7.2 Functional Testing

Functional tests provide systematic demonstrations that the functions tested are available
as specified by the business and technical requirements, system documentation, and user
manuals. Functional testing is centered on the following items:

Valid Input: identified classes of valid input must be accepted.

Invalid Input: identified classes of invalid input must be rejected.

Functions: identified functions must be exercised.

Output: identified classes of application outputs must be exercised.

Systems/Procedures: interfacing systems or procedures must be invoked.

Organization and preparation of functional tests is focused on requirements, key
functions, or special test cases. Before functional testing is complete, additional tests
are identified and the effective value of current tests is determined.

7.3 Integration Testing

Data can be lost across an interface, one module can have an adverse effect on another,
and sub-functions, when combined, may not produce the desired major function. Integration
testing is systematic testing intended to uncover errors within the interfaces.
The testing was done with sample data, and the developed system ran successfully for
this sample data. The purpose of the integration test is to assess the overall system
performance.

7.4 User acceptance testing


User Acceptance Testing is a critical phase of any project and requires significant
participation by the end user. It also ensures that the system meets the functional
requirements. Some of my friends who tested this module said that it was a user-friendly
application with good processing speed.

CHAPTER-8
PERFORMANCE ANALYSIS

:PERFORMANCE METRICS:

The basic performance measures are derived from the confusion matrix. The confusion
matrix is a 2-by-2 table containing the four outcomes produced by a binary classifier.
Various measures such as sensitivity, specificity, accuracy and error rate are derived from
the confusion matrix.

Accuracy: Accuracy is calculated as the total number of correct predictions (A + B)
divided by the total number of samples in the dataset (C + D). It can also be calculated
as (1 - error rate).
Accuracy = (A + B) / (C + D)
where
A = True Positive     B = True Negative
C = Positive (all actual positives)     D = Negative (all actual negatives)

Error rate: Error rate is calculated as the total number of incorrect predictions (E + F)
divided by the total number of samples in the dataset (C + D).
Error rate = (E + F) / (C + D)
where
E = False Positive     F = False Negative
C = Positive     D = Negative

Sensitivity: Sensitivity is calculated as the number of correct positive predictions (A)
divided by the total number of positives (C).
Sensitivity = A / C

Specificity: Specificity is calculated as the number of correct negative predictions (B)
divided by the total number of negatives (D).
Specificity = B / D
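
The four measures defined above can be computed directly from the confusion-matrix counts. The following is a minimal sketch in plain Python. The function name confusion_metrics and the example counts are assumptions made for illustration; they are not taken from the project code or its results.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Return accuracy, error rate, sensitivity and specificity.

    tp, tn, fp and fn correspond to A, B, E and F in the text.
    """
    positives = tp + fn  # C: all actual positives
    negatives = tn + fp  # D: all actual negatives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    total = positives + negatives
    return {
        "accuracy": (tp + tn) / total,
        "error_rate": (fp + fn) / total,
        "sensitivity": tp / positives,
        "specificity": tn / negatives,
    }


# Hypothetical counts, for illustration only.
metrics = confusion_metrics(tp=90, tn=880, fp=20, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because fraud cases are rare, accuracy is dominated by the Normal class, so sensitivity is usually the more informative of the four measures for this problem.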

DATA ANALYSIS

Fig. 8.1 Dataset analysis

SUPPORT VECTOR MACHINE

Accuracy: 0.9994557775359011

Precision: 0.6781609195402298

Recall: 0.9516129032258065

F1-Score: 0.7919463087248322

AUC score: 0.975560405918703

              precision    recall  f1-score   support

      Normal       1.00      1.00      1.00     56900
       Fraud       0.68      0.95      0.79        62

    accuracy                           1.00     56962
   macro avg       0.84      0.98      0.90     56962
weighted avg       1.00      1.00      1.00     56962

RANDOM FOREST
Accuracy: 0.9995611109160493

Precision: 0.7701149425287356

Recall: 0.9305555555555556

F1-Score: 0.8427672955974842

AUC score: 0.9651019999609383


              precision    recall  f1-score   support

      Normal       1.00      1.00      1.00     56890
       Fraud       0.77      0.93      0.84        72

    accuracy                           1.00     56962
   macro avg       0.89      0.97      0.92     56962
weighted avg       1.00      1.00      1.00     56962

DECISION TREE

Accuracy: 0.9992802219023208

Precision: 0.7241379310344828

Recall: 0.7875

F1-Score: 0.7544910179640718

AUC score: 0.8935390369536936


              precision    recall  f1-score   support

      Normal       1.00      1.00      1.00     56882
       Fraud       0.72      0.79      0.75        80

    accuracy                           1.00     56962
   macro avg       0.86      0.89      0.88     56962
weighted avg       1.00      1.00      1.00     56962
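
The F1-score in each report is the harmonic mean of precision and recall, so the reported figures can be cross-checked against each other. The following small sketch in plain Python uses the Fraud-class precision and recall values printed for the three models. The helper name f1_from is an assumption made for illustration.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall for the Fraud class, copied from the reports above.
reported = {
    "SVM": (0.6781609195402298, 0.9516129032258065),
    "Random Forest": (0.7701149425287356, 0.9305555555555556),
    "Decision Tree": (0.7241379310344828, 0.7875),
}

for model, (precision, recall) in reported.items():
    print(f"{model}: F1 = {f1_from(precision, recall):.4f}")
```

The values computed this way agree with the F1-scores listed for each model.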

CHAPTER-9
CONCLUSION & FUTURE ENHANCEMENT

Online payments are important in today's global computing environment, because they need
only the credential information of a credit card to complete a transaction and deduct
money. For this reason, it is important to find the best solution for detecting the
maximum number of frauds in online systems.
Accuracy, error rate, sensitivity and specificity are used to report how well the system
detects credit card fraud. In this report, three machine learning algorithms are developed
to detect fraud in a credit card system. To evaluate the algorithms, 80% of the dataset is
used for training and 20% is used for testing and validation. The accuracies obtained for
the SVM, decision tree and random forest classifiers are 99.94%, 99.92% and 99.95%
respectively (truncated to two decimal places). The comparative results show that the
random forest performs better than the SVM and decision tree techniques.
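
The 80/20 split mentioned above can be sketched as follows. The report does not state how the split was made, so this plain-Python version is only an illustration. Keeping the fraud ratio equal in both parts (a stratified split) is an assumption, as are the function name stratified_split, the seed and the toy labels.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx), keeping the class ratio in both parts."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels: 990 normal (0) and 10 fraud (1) transactions, hypothetical.
labels = [0] * 990 + [1] * 10
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
print(sum(labels[i] for i in test_idx), "fraud cases in the test part")
```

In scikit-learn the same effect is available through train_test_split with test_size=0.2 and the stratify argument set to the label column.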

Future Enhancement
Although perfect fraud detection was not reached, we did end up creating a system that
can, with enough time and data, get very close to that goal. As with any such project,
there is some room for improvement. The nature of this project allows multiple algorithms
to be integrated as modules, and their results can be combined to increase the accuracy of
the final result. The model can be improved further by adding more algorithms, provided
that the output of each new algorithm is in the same format as the others. Once that
condition is satisfied, the modules are easy to add, as done in the code. This gives the
project a great degree of modularity and versatility. More room for improvement can be
found in the dataset. As demonstrated before, the precision of the algorithms increases
when the size of the dataset is increased. Hence, more data should make the model more
accurate in detecting frauds and reduce the number of false positives.
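
One simple way to combine the results of several modules is majority voting over their predicted labels. The report does not specify the combination rule, so majority voting is an assumption here, and the function name majority_vote and the example predictions are hypothetical. The sketch relies on every model returning its output in the same format, a list of 0 (normal) and 1 (fraud) labels.

```python
def majority_vote(*model_predictions):
    """Combine 0/1 label lists from several models by majority vote."""
    if not model_predictions:
        raise ValueError("at least one prediction list is required")
    length = len(model_predictions[0])
    if any(len(p) != length for p in model_predictions):
        raise ValueError("all models must predict the same transactions")

    combined = []
    for votes in zip(*model_predictions):
        # Flag fraud (1) only when more than half of the models agree.
        combined.append(1 if sum(votes) * 2 > len(votes) else 0)
    return combined


# Hypothetical outputs of three models on five transactions.
svm_pred = [0, 1, 0, 1, 0]
forest_pred = [0, 1, 1, 0, 0]
tree_pred = [0, 0, 1, 1, 1]
print(majority_vote(svm_pred, forest_pred, tree_pred))
```

A new algorithm can be added by passing one more list of labels, as long as it covers the same transactions in the same order.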

APPENDIX
Appendix A: Screen Shots

Fig. 1 Correlation matrix

Fig. 2 Dataset

Fig. 3 Dataset reading mode

Fig. 4 Confusion matrix

Appendix B: Abbreviations

CCFD – Credit Card Fraud Detection
CSDT – Cost Sensitive Decision Tree
ML – Machine Learning
SVM – Support Vector Machine
URL – Uniform Resource Locator
