Vignesh Final Mini Project
SECURITY
MASTER OF SCIENCE
IN
INFORMATION TECHNOLOGY
Submitted by
VIGNESH A M
23MIT061
OCTOBER 2024
SRI KRISHNA ARTS AND SCIENCE COLLEGE
Accredited by NAAC with ‘A’ Grade, Kuniyamuthur,
Coimbatore – 641008
CERTIFICATE
This is to certify that the project report entitled “ANOMALY DETECTION FOR
FINANCIAL TRANSACTION SECURITY” in partial fulfilment of requirements for the
award of the degree of Master of Science in Information Technology is a record of bonafide work
carried out by VIGNESH A M (23MIT061) and that no part of this has been submitted for
the award of any other degree or diploma and the work has not been published in popular
journal or magazine.
This Project Report is submitted for the viva voce conducted on at Sri
Krishna Arts and Science College.
DECLARATION
I hereby declare that the project report entitled “ANOMALY DETECTION FOR FINANCIAL
TRANSACTION SECURITY” submitted in partial fulfilment of the requirements for the award of the
degree of Master of Science in Information Technology is an original work and that it has not
previously formed the basis for the award of any other Degree, Diploma, Associateship, Fellowship
or similar title to any other university or body during the period of my study.
Place: Coimbatore
Date:
VIGNESH A M
(23MIT061)
ACKNOWLEDGEMENT
CEO, Sri Krishna Institutions, Sri Krishna Arts and Science College, Coimbatore.
Principal, Sri Krishna Arts and Science College for giving me this opportunity to
I would like to extend my thanks and unbound sense for the timely help and
Department of IT & Cognitive Systems, Sri Krishna Arts and Science College in
I take this opportunity to thank my parents and friends for their constant
VIGNESH A M
23MIT061
ABSTRACT
TABLE OF CONTENTS
CHAPTER NO.    CHAPTER TITLE
1. INTRODUCTION
1.1 OVERVIEW OF THE PROJECT
1.2 PROBLEM DEFINITION
2. SYSTEM STUDY
2.1 LITERATURE REVIEW
CHAPTER 1
INTRODUCTION
Financial transactions typically follow patterns shaped by the financial history
of the person or business involved. Trained on large amounts of historical data,
deep neural networks (DNNs) can recognize these patterns and identify deviations
from them. Such deviations can indicate potential fraud, such as unusual purchases,
sudden transfers of large sums of money, or transactions from unfamiliar sources.
The ability of DNNs to be retrained as new data arrives is an important advantage
in anomaly detection: as fraudsters develop new techniques, the models can be
updated to detect the new patterns. This adaptability helps the system remain
effective in the face of evolving threats.
1.1 AIM OF THE PROJECT:
The main goal of this project is to harness the power of deep neural networks
(DNNs) to develop a robust anomaly detection system for financial transactions. The
system will be built to analyze large volumes of financial data and identify patterns
that deviate from the norm. These deviations can indicate fraud, such as money
laundering or unauthorized account access. By implementing this DNN-based system,
we aim to enhance the security and efficiency of financial institutions. Early
identification of discrepancies allows early intervention, reducing financial loss
and protecting consumer assets. Furthermore, the system’s ability to learn from new
data will make it more effective against evolving fraudulent techniques.
This work examines the use of deep neural networks (DNNs) for anomaly
detection in financial transactions. DNNs, with their strong pattern recognition
capabilities, are well suited to analyzing the vast amount of data generated by
financial institutions on a daily basis. By learning what typical transactions
look like from historical data, DNNs can flag anomalies that may indicate
fraudulent activity. The goal of this project is to develop a DNN-based system
that can identify suspicious transactions in real time, enabling faster
intervention and protecting financial institutions and their customers from
fraud. The focus of this project will be to build and train a DNN model specifically
designed for anomaly detection in financial transactions. The project will explore
different DNN architectures and training methods to improve the accuracy and
performance of the model. In addition, the project will address the challenges of
interpretability and bias that arise with DNNs. By using suitable interpretability
methods and incorporating fairness considerations throughout the development
process, the project seeks to develop robust and reliable DNN models for real-world
financial applications.
1.3 SCOPE OF THE PROJECT:
The objective of this project is to develop a deep neural network (DNN) model
for anomaly detection in financial transactions. The model will be trained on
historical financial transaction data to learn the typical patterns of legitimate
use. By analyzing incoming transactions in real time, the DNN detects deviations
from these norms that could indicate potential fraud.
The scope of the project covers the entire development life cycle of the DNN.
This includes data acquisition and pre-processing, design and training of the DNN
model, and integration of the model into a real-time anomaly detection system. The
project will also define performance metrics to assess how effectively the DNN
system detects fraudulent transactions.
PURPOSE
CHAPTER 2
SYSTEM STUDY
EXISTING SYSTEM:
These models require extensive feature engineering and may not capture
complex relationships within the data. They also typically lack the depth needed to
recognize sophisticated fraud patterns that deep learning methods can address.
The proposed system for the anomaly detection project in financial transaction
security aims to create a robust, scalable solution capable of identifying fraudulent
activities in real-time. The system will utilize Feedforward Neural Networks (FFNN)
and Deep Neural Networks (DNN) to analyse transaction data effectively. Initially, the
system will gather and preprocess large datasets containing historical transaction
records, ensuring data quality through techniques like normalization and feature
extraction. Key features, such as transaction amount, time of transaction, user location,
and historical spending patterns, will be incorporated to improve model accuracy.
Once the data is prepared, the model will be trained using labeled datasets, allowing it
to learn the characteristics of normal and anomalous transactions. A validation phase
will ensure that the model generalizes well to unseen data. The system will be designed
to operate in a real-time environment, enabling immediate detection and flagging of
suspicious transactions for further investigation. Additionally, a user-friendly dashboard
will provide visual insights into transaction patterns and model performance metrics.
By implementing this system, financial institutions can enhance their fraud detection
capabilities, minimize losses, and bolster customer trust in digital transaction security.
This proactive approach aims to significantly improve the overall security landscape of
financial transactions.
The feasibility study for the anomaly detection project in financial transaction
security evaluates its technical, operational, and economic viability. Technically, using
Feedforward Neural Networks (FFNN) and Deep Neural Networks (DNN) is viable, as
these models can effectively analyze large datasets and complex transaction patterns.
Operationally, integration with existing financial systems is achievable, requiring
collaboration with IT teams for real-time data processing. Economically, although initial
costs for development may be high, the long-term benefits—such as reduced fraud
losses and increased customer trust—justify the investment. Overall, the proposed
system promises significant enhancements in detecting fraudulent transactions.
2.3.2 TECHNICAL FEASIBILITY:
SOFTWARE SPECIFICATION
HARDWARE SPECIFICATION
Processor:
• Intel i7 or AMD Ryzen 7 (or higher) for better performance with deep learning
tasks.
• At least 8 cores recommended.
RAM:
• Minimum 16 GB, ideally 32 GB for handling large datasets and training models
efficiently.
Storage:
• SSD (Solid State Drive) with at least 512 GB for faster data access and
processing.
• Consider an additional HDD for larger datasets.
GPU:
• NVIDIA RTX 2060 or higher (e.g., RTX 3060, 3070, etc.) for accelerated
training of neural networks. CUDA support is essential for GPU acceleration.
NUMPY:
NumPy, short for Numerical Python, is a powerful library that serves as the
backbone for numerical computing in Python. It provides a high-performance
multidimensional array object, along with tools for working with these arrays. NumPy’s
core feature is the ndarray, which enables efficient storage and manipulation of large
datasets. It supports a wide range of mathematical functions, making it ideal for
operations like linear algebra, Fourier transforms, and random number generation.
NumPy’s broadcasting capabilities allow for arithmetic operations between arrays of
different shapes, enhancing flexibility. In the context of data science and machine
learning, NumPy is often used for preprocessing data, performing calculations, and
serving as a foundation for other libraries, such as Pandas and TensorFlow. Its speed
and efficiency make it an essential tool for anyone working with numerical data,
facilitating complex computations with ease and precision.
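As a brief illustration of broadcasting in a preprocessing setting, the sketch below standardizes each column of a small array. The transaction values are made up for illustration and are not taken from the project's dataset.

```python
import numpy as np

# Hypothetical mini-batch: 3 transactions x 2 features
# (amount, seconds since the first transaction)
X = np.array([[10.0, 0.0],
              [20.0, 60.0],
              [30.0, 120.0]])

# Column-wise statistics, each of shape (2,)
mean = X.mean(axis=0)
std = X.std(axis=0)

# Broadcasting stretches the (2,) vectors across all 3 rows of the (3, 2) array
X_scaled = (X - mean) / std

print(X_scaled.shape)
```

No explicit loop over rows is needed: NumPy aligns the trailing dimension of `mean` and `std` with the columns of `X`.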
PANDAS:
Pandas is a powerful data manipulation and analysis library for Python, designed
to make working with structured data intuitive and efficient. At its core, Pandas
introduces two primary data structures: Series (one-dimensional) and DataFrame
(two-dimensional), which allow users to easily handle and analyze tabular data. With
built-in functions for data cleaning, transformation, and aggregation, Pandas simplifies tasks
such as handling missing values, filtering data, and performing group operations. Its
ability to read and write data from various formats, including CSV, Excel, and SQL
databases, makes it highly versatile for data ingestion. In the realm of data analysis and
machine learning, Pandas is frequently used for preprocessing datasets before modeling.
Its seamless integration with NumPy enhances numerical computations, while its rich
visualization capabilities, in conjunction with libraries like Matplotlib and Seaborn,
provide valuable insights into data trends and patterns, making it indispensable for data
scientists and analysts alike.
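The cleaning and grouping operations mentioned above might look like the following minimal sketch. The column names and values are hypothetical and chosen only to show a duplicate row and a missing value being handled.

```python
import pandas as pd

# Hypothetical transaction records; column names are illustrative only
df = pd.DataFrame({
    "user_id": ["u1", "u1", "u2", "u2", "u2"],
    "amount": [25.0, None, 40.0, 40.0, 900.0],
    "type": ["deposit", "withdrawal", "deposit", "deposit", "withdrawal"],
})

# Data cleaning: drop exact duplicate rows, then fill the missing
# amount with the median of the remaining amounts
clean = df.drop_duplicates().copy()
clean["amount"] = clean["amount"].fillna(clean["amount"].median())

# Group operation: per-user transaction count and mean amount
summary = clean.groupby("user_id")["amount"].agg(["count", "mean"])
print(summary)
```

Median imputation is only one possible choice here; dropping incomplete rows is an equally common option when few values are missing.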
DNN:
FFNN:
classification and regression. They are simpler than deep neural networks, making them
easier to implement and train. However, they may struggle with highly complex tasks,
where deeper architectures often yield better performance.
PYTHON:
MACHINE LEARNING:
2.7 METHODOLOGIES
Deep Neural Networks (DNN) consist of multiple hidden layers that capture intricate
features and relationships in data, enhancing the model's ability to detect subtle anomalies in
financial transactions through hierarchical representation learning.
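To make the idea of stacked hidden layers concrete, the sketch below runs a forward pass through a small network in NumPy. The layer sizes, random weights, and input batch are assumptions for illustration; the weights are untrained, so the output scores carry no meaning beyond showing the data flow.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical layer sizes: 30 inputs -> 16 -> 8 hidden units -> 1 output
sizes = [30, 16, 8, 1]
weights = [rng.normal(0.0, 0.1, size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def forward(x):
    """Pass a batch through every hidden layer, then a sigmoid output unit."""
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = relu(a @ W + b)  # each hidden layer re-represents the previous one
    return sigmoid(a @ weights[-1] + biases[-1])

batch = rng.normal(size=(4, 30))  # 4 made-up transactions
scores = forward(batch)
print(scores.shape)
```

Each hidden layer transforms the output of the one before it, which is what allows deeper networks to build up more abstract features from raw transaction attributes.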
CHAPTER 3
SYSTEM DESIGN AND DEVELOPMENT
The dataset for this anomaly detection project in financial transaction security should
consist of various features representing transaction attributes. Key fields may include
transaction ID, timestamp, user ID, transaction amount, transaction type (e.g., withdrawal,
deposit), merchant details, location, and device used. Additionally, it is essential to include
labels indicating whether a transaction is normal or anomalous. The dataset should contain a
diverse range of transactions to capture different spending behaviors and anomalies. A
balanced representation of both legitimate and fraudulent transactions will enhance the model's
ability to learn distinguishing features effectively, improving anomaly detection accuracy.
A use case diagram is a visual representation that illustrates the interactions between
various actors and the system in a project. In the context of this anomaly detection project for
financial transactions, the diagram serves to identify the key functionalities of the system.
A Data Flow Diagram (DFD) visually represents the flow of data within a system. In
this anomaly detection project, it shows how transaction data is input by customers, processed
by the fraud detection system, and outputs flagged transactions to the bank admin for review,
facilitating efficient data handling and security.
3.5 CLASS DIAGRAM
The class diagram shows that the anomaly detection system includes several classes that
represent the different components of the system: the Transaction, User, Feature Engineering,
Machine Learning, Fraud Detector, Fraud Investigation, and Notification classes.
CHAPTER 4
SYSTEM TESTING
TESTING TYPES:
UNIT TESTING:
Unit testing focuses on verifying the functionality of individual components within the
anomaly detection system, such as algorithms for detecting fraudulent transactions and data
processing functions. By testing each unit in isolation, developers can identify and fix bugs
early, improving code quality and facilitating smoother integration into the overall system
architecture.
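A unit test for a single data processing function could be sketched as follows. The `min_max_scale` helper is hypothetical and stands in for any small, isolated component of the system; it is not taken from the project code.

```python
import unittest

def min_max_scale(values):
    """Scale a list of numbers to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

class TestMinMaxScale(unittest.TestCase):
    def test_range(self):
        scaled = min_max_scale([10.0, 20.0, 40.0])
        self.assertEqual(scaled[0], 0.0)
        self.assertEqual(scaled[-1], 1.0)
        self.assertAlmostEqual(scaled[1], 1.0 / 3.0)

    def test_constant_input(self):
        # Edge case: avoid division by zero when all values are equal
        self.assertEqual(min_max_scale([5.0, 5.0]), [0.0, 0.0])

# Run the tests programmatically so the script continues afterwards
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestMinMaxScale)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Testing the edge case separately from the normal case keeps each failure easy to trace back to one behaviour of the function.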
INTEGRATION TESTING:
import unittest
from your_module import AnomalyDetector
from your_module import DataPreprocessor
from your_module import DataGenerator

class TestIntegration(unittest.TestCase):
    def setUp(self):
        self.detector = AnomalyDetector()
        self.preprocessor = DataPreprocessor()
        self.generator = DataGenerator()

    def test_integration(self):
        # Generate and preprocess the data. These two steps are missing from
        # the source listing and are assumed to follow the system test in this chapter.
        data = self.generator.generate_data()
        preprocessed_data = self.preprocessor.preprocess(data)
        # Train the model
        self.detector.train(preprocessed_data)
        # Make predictions
        predictions = self.detector.predict(preprocessed_data)
        # Evaluate the model
        evaluation = self.detector.evaluate(preprocessed_data)
        # Check the results
        self.assertGreater(evaluation, 0)
        self.assertLess(evaluation, 1)
        # Check the predictions: one per input row, each within [0, 1]
        self.assertEqual(len(predictions), len(preprocessed_data))
        self.assertGreaterEqual(min(predictions), 0)
        self.assertLessEqual(max(predictions), 1)
SYSTEM TESTING:
System testing for the anomaly detection project involves evaluating the entire
application to ensure it meets specified requirements. This includes functional testing to verify
accurate anomaly detection, performance testing for responsiveness under load, security testing
to identify vulnerabilities, and user acceptance testing to confirm usability, ultimately ensuring
a robust and reliable system.
import unittest
from your_module import AnomalyDetector
from your_module import DataPreprocessor
from your_module import DataGenerator

class TestSystem(unittest.TestCase):
    def setUp(self):
        self.detector = AnomalyDetector()
        self.preprocessor = DataPreprocessor()
        self.generator = DataGenerator()

    def test_system(self):
        # Load the data
        data = self.generator.generate_data()
        # Preprocess the data
        preprocessed_data = self.preprocessor.preprocess(data)
        # Train the model
        self.detector.train(preprocessed_data)
        # Make predictions
        predictions = self.detector.predict(preprocessed_data)
        # Evaluate the model
        evaluation = self.detector.evaluate(preprocessed_data)
        # Check the results
        self.assertGreater(evaluation, 0)
        self.assertLess(evaluation, 1)
        # Check the predictions: every value must lie within [0, 1]
        self.assertGreaterEqual(min(predictions), 0)
        self.assertLessEqual(max(predictions), 1)
        # Test the anomaly detection: one flag per row, each within [0, 1]
        anomalies = self.detector.detect_anomalies(preprocessed_data)
        self.assertEqual(len(anomalies), len(preprocessed_data))
        self.assertGreaterEqual(min(anomalies), 0)
        self.assertLessEqual(max(anomalies), 1)

if __name__ == '__main__':
    unittest.main()
4.1 INPUT MODEL:
The input model for the anomaly detection system is designed to capture essential
features from financial transaction data, enabling accurate analysis. Key inputs include
transaction ID, timestamp, user ID, transaction amount, transaction type (e.g., withdrawal,
deposit), merchant details, geographical location, and device information used for the
transaction. These features collectively provide a comprehensive view of each transaction,
allowing the system to identify patterns and detect anomalies effectively. To optimize model
performance, the input data must undergo preprocessing.
The anomaly detection algorithm in the financial transaction system employs machine
learning techniques to identify irregular patterns. Initially, the algorithm processes input
features such as transaction amount, timestamp, user ID, and transaction type. Common
algorithms include Isolation Forest, which isolates anomalies by randomly partitioning data,
and One-Class SVM, which learns the boundaries of normal transactions. K-Means Clustering
can also be used to group similar transactions, identifying outliers as anomalies. After training,
the model assigns risk scores to transactions, flagging those exceeding a predefined threshold.
This systematic approach enhances fraud detection, providing accurate alerts for suspicious
activities and improving financial security.
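The risk-scoring step described above can be sketched with scikit-learn's `IsolationForest`. The data, the two feature columns, and the 99th-percentile threshold are assumptions made for illustration, not values from the project.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Made-up data: 200 ordinary transactions (amount, hour of day) plus one extreme one
normal = np.column_stack([rng.normal(50.0, 10.0, 200), rng.normal(14.0, 2.0, 200)])
extreme = np.array([[5000.0, 3.0]])
X = np.vstack([normal, extreme])

model = IsolationForest(n_estimators=100, random_state=0)
model.fit(X)

# score_samples returns higher values for normal points,
# so negate it to obtain a risk score where higher means more anomalous
risk = -model.score_samples(X)

# Flag transactions whose risk exceeds a predefined threshold
# (here an assumed cut-off at the 99th percentile of the scores)
threshold = np.quantile(risk, 0.99)
flagged = np.where(risk > threshold)[0]
print(flagged)
```

In practice the threshold would be tuned against labeled fraud cases, since a percentile cut-off always flags a fixed share of transactions regardless of how many are truly fraudulent.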
CHAPTER 5
To set up the environment for the anomaly detection project, install Python as the
primary programming language, along with essential libraries like Pandas for data
manipulation, Scikit-learn for machine learning, and Matplotlib/Seaborn for data visualization.
Additionally, configure a database (e.g., MySQL or MongoDB) to store transaction data, and
consider using Jupyter Notebook for interactive development and testing.
• Data Acquisition: Gather historical transaction data from relevant sources, such as
bank databases or financial APIs, ensuring the dataset includes both legitimate and
fraudulent transactions for comprehensive analysis.
• Data Cleaning: Remove duplicates, handle missing values, and correct inconsistencies
in the dataset to ensure high-quality data for analysis. This may involve imputing
missing values or removing problematic entries.
• Feature Selection: Identify and select relevant features, such as transaction ID,
timestamp, user ID, transaction amount, transaction type, and merchant details, that
contribute to detecting anomalies.
• Normalization and Encoding: Normalize numerical features to a consistent scale
(e.g., using Min-Max scaling) and encode categorical variables (e.g., using one-hot
encoding) to prepare the data for machine learning algorithms.
• Data Splitting: Divide the pre-processed dataset into training, validation, and testing
subsets to facilitate model training, tuning, and evaluation, ensuring that the model
generalizes well to unseen data.
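The cleaning, encoding, splitting, and normalization steps listed above can be sketched as follows. The records, column names, and the 60/20/20 split ratio are illustrative assumptions rather than the project's actual configuration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Made-up records; the column names are illustrative assumptions
df = pd.DataFrame({
    "amount": [12.0, 250.0, 250.0, None, 40.0, 999.0,
               15.0, 60.0, 75.0, 30.0, 20.0, 45.0],
    "type": ["deposit", "withdrawal", "withdrawal", "deposit", "deposit", "withdrawal",
             "deposit", "deposit", "withdrawal", "deposit", "deposit", "withdrawal"],
    "label": [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
})

# Data cleaning: drop duplicate rows and rows with missing values
df = df.drop_duplicates().dropna().reset_index(drop=True)

# Encoding: one-hot encode the categorical column
X = pd.get_dummies(df[["amount", "type"]], columns=["type"], dtype=float)
y = df["label"]

# Data splitting: 60% training, 20% validation, 20% testing
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

# Normalization: fit Min-Max scaling on the training split only, then reuse it
scaler = MinMaxScaler()
X_train_s = scaler.fit_transform(X_train)
X_val_s = scaler.transform(X_val)
X_test_s = scaler.transform(X_test)

print(X_train_s.shape, X_val_s.shape, X_test_s.shape)
```

Fitting the scaler on the training split alone avoids leaking information from the validation and test data into the model.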
5.2 DATA COLLECTION AND PREPROCESSING:
The dataset contains transactions made by credit cards in September 2013 by European
cardholders. It covers transactions that occurred over two days, with 492 frauds out of
284,807 transactions. The dataset is highly unbalanced: the positive class (frauds)
accounts for 0.172% of all transactions.
It contains only numerical input variables, which are the result of a PCA transformation.
Due to confidentiality issues, the original features and further background information
about the data are not available. Features V1, V2, ..., V28 are the principal components
obtained with PCA; the only features that have not been transformed with PCA are ’Time’
and ’Amount’. Feature ’Time’ contains the seconds elapsed between each transaction and the
first transaction in the dataset. Feature ’Amount’ is the transaction amount; this feature
can be used, for example, for example-dependent cost-sensitive learning. Feature ’Class’ is
the response variable and takes the value 1 in case of fraud and 0 otherwise.
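The degree of class imbalance follows directly from the two counts quoted above. The short calculation below also shows why plain accuracy is a weak metric here: a hypothetical classifier that labels every transaction as legitimate would still score very highly.

```python
# Figures quoted in the dataset description above
n_total = 284_807
n_fraud = 492

fraud_rate = n_fraud / n_total

# Accuracy of a trivial classifier that labels every transaction as legitimate
baseline_accuracy = (n_total - n_fraud) / n_total

print(f"fraud rate: {fraud_rate:.4%}")
print(f"always-legitimate accuracy: {baseline_accuracy:.4%}")
```

Because such a baseline detects no fraud at all, metrics such as precision, recall, and the area under the precision-recall curve are more informative for this dataset.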
5.3 MODEL EVALUATION AND DEPLOYMENT:
In this module, we will evaluate the performance of the trained FFNN model
using metrics such as precision, recall, and F1-score. We will also use techniques
such as confusion matrices and ROC curves to visualize the performance of the
model. If the model performs well, we will deploy it to detect anomalies in real-time
financial transactions. We will also monitor the performance of the model and
retrain it periodically to adapt to changes in the data distribution. Additionally, we
will integrate the model with other systems, such as fraud detection systems, to
improve the overall accuracy of the system.
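Precision, recall, and F1-score can all be derived from the four cells of the confusion matrix. The following sketch computes them by hand on made-up labels; the `classification_metrics` helper is hypothetical and only illustrates the definitions.

```python
def classification_metrics(y_true, y_pred):
    """Return confusion-matrix counts and derived metrics for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    # Guard against division by zero when a class is never predicted or present
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall, "f1": f1}

# Made-up labels: 1 = fraud, 0 = legitimate
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]
y_pred = [0, 0, 0, 1, 0, 0, 1, 1, 0, 0]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Precision measures how many flagged transactions are truly fraudulent, while recall measures how many fraudulent transactions are caught; F1 is their harmonic mean.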
Start by loading the credit card transaction data from the ‘creditcard.csv’
dataset in Google Colab. The dataset is then split into training and testing sets.
Remove any rows with missing data during this process.
Explore the dataset to understand the structure and distribution of the data. Pay attention
to relevant features such as transaction amount and time. Remove any irrelevant or
redundant features to streamline the data.
PREPROCESS THE DATASET:
Preprocess the data to prepare it for training. This may include steps such as scaling,
normalization, and handling missing data. The data should be labeled appropriately, where
fraudulent transactions are labeled as 1 and non-fraudulent transactions as 0.
Train a Deep Neural Network model on the training set. This involves computing the
sigmoid function on the linear combination of features and weights, computing the loss
function using the predicted probabilities and the true labels, computing the gradient of the loss
function with respect to the weights, and updating the weights using a learning rate and the
gradient.
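The training loop described above can be sketched for a single sigmoid unit, which is the simplest case of the procedure; a multi-layer network applies the same idea layer by layer through backpropagation. The toy data, learning rate, and iteration count are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(y, p, eps=1e-12):
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

# Made-up toy data: 6 samples, 2 features, labels 0 or 1
X = np.array([[0.1, 0.2], [0.2, 0.1], [0.3, 0.3],
              [0.8, 0.9], [0.9, 0.7], [0.7, 0.8]])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

w = np.zeros(2)
b = 0.0
learning_rate = 0.5  # illustrative value

losses = []
for _ in range(200):
    p = sigmoid(X @ w + b)              # sigmoid of the linear combination
    losses.append(binary_cross_entropy(y, p))
    grad_w = X.T @ (p - y) / len(y)     # gradient of the loss w.r.t. the weights
    grad_b = float(np.mean(p - y))      # gradient of the loss w.r.t. the bias
    w -= learning_rate * grad_w         # gradient-descent update
    b -= learning_rate * grad_b

print(losses[0], losses[-1])
```

With zero initial weights every prediction starts at 0.5, so the first recorded loss equals ln 2; the loss then falls as the weights are updated.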
Apply the trained Deep Neural Network model to the testing data and evaluate its
performance using metrics such as accuracy, precision, recall, and F1 score.
EXISTING SYSTEM:
One limitation of the existing system is its reliance on predefined rules and thresholds,
which can limit its effectiveness in detecting new or evolving fraud patterns. Additionally,
the existing system may generate false positives or false negatives, which can lead to
unnecessary investigations or missed fraudulent transactions.
COMPARISON GRAPH:
COMPARISON TABLE:
PROPOSED SYSTEM:
(FFNN) to analyze vast amounts of credit card transaction data and uncover complex
patterns indicative of fraudulent activity. The FFNN is trained on a combination of
advanced machine learning techniques, including logistic regression, decision trees, and
random forests, allowing the system to continuously learn and refine its accuracy over
time. By incorporating new data and user feedback into its algorithms, the system can
adapt to evolving fraudulent tactics and improve its detection capabilities. One of the
system’s key advantages is its ability to detect fraud in real-time, enabling swift
intervention to prevent losses and minimize the impact of fraudulent transactions.
Additionally, the system automates a significant portion of the fraud detection process,
reducing the need for manual reviews and investigations, and subsequently decreasing
associated costs.
Furthermore, the FFNN’s ability to learn and generalize from large datasets
enables the system to identify subtle patterns and anomalies that may not be detectable
by traditional rule-based systems. However, a potential challenge lies in the system’s
reliance on substantial volumes of high-quality data to effectively train its FFNN.
In addition, like any detection system, there is a risk of false positives or false
negatives, potentially leading to unnecessary investigations or overlooked fraudulent
activities.
CHAPTER 6
CONCLUSION:
FUTURE ENHANCEMENTS:
Future enhancements for the anomaly detection system in financial transactions can
significantly improve its effectiveness and adaptability to emerging threats. One potential
enhancement is the integration of advanced machine learning techniques, such as deep learning
models like Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs).
These models can capture more complex patterns in transaction data, improving the system's
ability to identify subtle anomalies that traditional algorithms may overlook. Additionally,
incorporating ensemble methods, which combine multiple models to improve prediction
accuracy, could further enhance detection capabilities.
Another important area for enhancement is the expansion of the dataset used for training
the model. By incorporating a more diverse range of transaction scenarios—including different
geographical locations, user behaviours, and transaction types—the model can become more
robust and better equipped to recognize fraudulent activities across various contexts. Regularly
updating the dataset with recent transactions will also help the model adapt to evolving fraud
tactics.
Collaboration with cybersecurity experts can also provide valuable insights into
emerging threats and vulnerabilities. By staying informed about the latest fraud trends and
attack vectors, the system can be proactively updated to mitigate risks effectively. Additionally,
incorporating user feedback loops will help refine the model based on real-world experiences,
making the system more user-centric.
Lastly, enhancing user interfaces for bank administrators will improve usability.
Providing intuitive dashboards, customizable alerts, and detailed visualizations of transaction
patterns will empower users to make informed decisions quickly.
CHAPTER 7
BIBLIOGRAPHY:
• Ahmed, M., Mahmood, A. N., & Hu, J. (2016). A survey of network anomaly detection
techniques. Journal of Network and Computer Applications, 60, 1-22.
https://doi.org/10.1016/j.jnca.2015.09.015
• Chandola, V., Banerjee, A., & Kumar, V. (2009). Anomaly detection: A survey. ACM
Computing Surveys (CSUR), 41(3), 1-58. https://doi.org/10.1145/1541880.1541882
• Iglewicz, B., & Hoaglin, D. C. (1993). How to Detect and Handle Outliers. New York:
Sage Publications.
• Kiran, R. A., & Gohil, H. (2017). A comparative study of machine learning algorithms
for anomaly detection in financial transactions. International Journal of Computer
Applications, 168(2), 1-5. https://doi.org/10.5120/ijca2017915260
• Xia, Y., Wang, L., & Wu, Y. (2015). Financial transaction anomaly detection based on
machine learning algorithms. Journal of Financial Crime, 22(3), 350-363.
https://doi.org/10.1108/JFC-04-2014-0022
• Wu, J., & Zhang, J. (2018). Anomaly detection in financial transactions: A deep
learning approach. Expert Systems with Applications, 95, 174-183.
https://doi.org/10.1016/j.eswa.2017.11.052
• Zhou, Z., & Jiang, Y. (2019). A survey of anomaly detection techniques in financial
transactions. Journal of Financial Crime, 26(2), 336-352.
https://doi.org/10.1108/JFC-10-2018-0106
• Bhatia, A., & Singh, M. (2020). Machine learning for fraud detection: A review.
Journal of King Saud University - Computer and Information Sciences.
https://doi.org/10.1016/j.jksuci.2020.03.005
CHAPTER 8
from google.colab import drive
drive.mount('/content/drive')

import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
import keras
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout

# Note: the statement that loads the dataset into df is missing from this listing.
df.head(1)
df['Class'].unique()  # 0 = no fraud, 1 = fraudulent
print(df.shape)
df.info()  # info() prints its own summary
print(df.describe())
sns.countplot(x='Class', data=df)
plt.show()
# Features are all columns except the last; the label is the last column ('Class')
X = df.iloc[:, :-1].values
y = df.iloc[:, -1].values
print(X.shape)
print(y.shape)
X_train, X_test, Y_train, Y_test = train_test_split(X, y, test_size=0.1, random_state=1)
# Fit the scaler on the training data only, then apply it to the test data
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
print(X_train.shape)
print(X_test.shape)
clf = Sequential([
    Dense(units=16, kernel_initializer='uniform', input_dim=30, activation='relu'),
    Dense(units=18, kernel_initializer='uniform', activation='relu'),
    Dropout(0.25),
    Dense(20, kernel_initializer='uniform', activation='relu'),
    Dense(24, kernel_initializer='uniform', activation='relu'),
    # The output layer is missing from this listing. A single sigmoid unit is
    # assumed here, as required by a binary cross-entropy loss on 0/1 labels.
    Dense(1, kernel_initializer='uniform', activation='sigmoid'),
])
clf.summary()
clf.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
clf.fit(X_train, Y_train, batch_size=15, epochs=2)
score = clf.evaluate(X_test, Y_test, batch_size=128)
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
# The definition of y_pred is missing from this listing. Thresholding the
# predicted probabilities at 0.5 is an assumption, not the author's stated value.
y_pred = (clf.predict(X_test) > 0.5).astype(int).ravel()
print("Confusion Matrix:")
print(confusion_matrix(Y_test, y_pred))
print("\nAccuracy:", accuracy_score(Y_test, y_pred))
# The plotting calls for the precision-recall curve are missing from this listing.
plt.title('Precision-Recall Curve with Threshold')
plt.show()
# Calculating the PR-AUC score with threshold
from sklearn.metrics import average_precision_score
print("\nPR-AUC Score with Threshold:", average_precision_score(Y_test, y_pred))
# Calculating the AUC-PR score with threshold
from sklearn.metrics import auc
# Calculating the area under the ROC curve with threshold
from sklearn.metrics import roc_auc_score
# Calculating the area under the PR curve with threshold
from sklearn.metrics import average_precision_score
print("\nArea under the PR curve with Threshold:", average_precision_score(Y_test, y_pred))
8.2 SAMPLE SCREENSHOTS: