
Christ Nagar, Hullahalli, Begur - Koppa Road, Sakalawara Post, Bengaluru-560083

DEPARTMENT OF COMPUTER SCIENCE AND APPLICATIONS

A PROJECT REPORT ON

“Credit Card Fraud Detection”

Submitted in the partial fulfilment for the award of degree in

BACHELOR OF COMPUTER APPLICATIONS

Submitted by

FARZANA P S

(ROLL NO: U03BV21S0005)

Under the guidance of

Dr. JAWAHAR SUNDARAM

ASSISTANT PROFESSOR
DEPARTMENT OF COMPUTER SCIENCE AND APPLICATIONS

JUNE – 2024

Date:

CERTIFICATE
This is to certify that the project work entitled “Credit Card Fraud
Detection” is a bona fide work done by FARZANA P S (U03BV21S0005) of
VI Semester in partial fulfilment of the requirements for the award of the degree
of Bachelor of Computer Applications at CHRIST ACADEMY
INSTITUTE FOR ADVANCED STUDIES, affiliated to Bangalore
University, during the academic year 2023-2024. It has been found to be
satisfactory and is hereby approved for submission.

Dr. Jawahar Sundaram (Project Guide)
Dr. C. Umarani (Head of the Department)

Examiners:
1.
2.

College Stamp

Acknowledgement

First, I would like to thank all the people at Christ Academy Institute for Advanced Studies who
patiently assisted me in completing my mini-project.

It is with a great sense of pleasure and immense gratitude that I acknowledge the help of these
individuals.

I am highly indebted to the Principal, Rev. Fr. Antony Davis, for the facilities provided to
accomplish this project.

I would like to thank my Project Guide, Dr. Jawahar Sundaram, and the Head of the Department,
Dr. C. Umarani, for their constructive criticism throughout my project.

I am extremely grateful to the staff members of my department and to my friends, who helped me in
the successful completion of this project.

FARZANA P S
U03BV21S0005
DECLARATION

This is to certify that the project report entitled “Credit Card Fraud Detection” has been done by
me and is authentic work carried out in partial fulfilment of the requirements for the award of
the degree of Bachelor of Computer Applications (BCA) under the guidance of Dr. Jawahar
Sundaram. The matter and software embodied in this project have not been submitted earlier for
the award of any degree or diploma, to the best of my knowledge and belief.

Date: Signature of Student

FARZANA P S
U03BV21S0005
Index

Sl. No.   Title                                  Page No.
1         Introduction                           1-10
2         Literature Review                      11-18
3         Software Design                        19-26
4         Software and Hardware Requirements     27-33
5         Code                                   34-55
6         Testing                                56-73
7         Result Analysis                        74-85
8         Future Enhancements                    86-88
9         Conclusion                             89
10        References                             90
CREDIT CARD FRAUD DETECTION BCA: CAIAS

1. INTRODUCTION
In today's digital age, credit cards have become a ubiquitous and essential part of financial transactions.
They offer convenience and security for consumers and businesses alike. However, this convenience
comes with the risk of fraud, which has become increasingly prevalent and sophisticated. Credit card
fraud involves unauthorized use of a credit card to obtain goods, services, or funds, causing significant
financial losses to individuals and financial institutions. According to a report by the Federal Trade
Commission, credit card fraud was one of the top forms of identity theft reported in recent years.

The traditional methods of fraud detection, such as rule-based systems and manual reviews, are often
inadequate due to their inability to adapt to the evolving tactics of fraudsters. These methods can be
slow, inefficient, and prone to errors, leading to both false positives and false negatives. As fraud
techniques become more complex, there is a pressing need for more advanced and adaptive detection
methods.

Machine learning, a subset of artificial intelligence, offers promising solutions for detecting credit card
fraud. By analysing large datasets and identifying patterns that indicate fraudulent behaviour, machine
learning algorithms can provide more accurate and timely detection. This project focuses on two
popular machine learning algorithms: Random Forest and Logistic Regression. Both algorithms have
shown potential in various classification tasks and will be evaluated for their effectiveness in credit
card fraud detection.

Machine Learning in Credit Card Fraud Detection

Machine learning (ML) has emerged as a powerful tool in the fight against credit card fraud, offering
techniques that can identify and prevent fraudulent activities more effectively than traditional
methods. Credit card fraud detection using ML involves training algorithms on historical transaction
data to recognize patterns and anomalies indicative of fraud. Because these models learn from data
rather than from hand-written rules, they can be retrained as new transactions arrive, which keeps
them adaptable to new types of fraudulent behaviour. This adaptability is crucial given the
ever-evolving nature of fraud tactics.

UO3BV21S0005 1 FARZANA PS

Types of Machine Learning Algorithms in Fraud Detection:

Supervised Learning: This approach uses labelled datasets where each transaction is marked as
fraudulent or legitimate. Algorithms like Logistic Regression, Decision Trees, Random Forest, and
Support Vector Machines are trained to predict the likelihood of a transaction being fraudulent. For
instance, Logistic Regression is valued for its simplicity and effectiveness in binary classification
tasks, while Random Forest, an ensemble learning method, combines multiple decision trees to
enhance prediction accuracy and reduce overfitting.

Unsupervised Learning: Unlike supervised learning, unsupervised learning deals with unlabelled
data. Techniques such as k-Means Clustering (flagging transactions that lie far from every cluster
centre) and Principal Component Analysis (PCA, flagging transactions that the main components
reconstruct poorly) are employed to detect anomalies in transaction data. These methods identify
unusual patterns or outliers that deviate significantly from typical transaction behaviour, flagging
them for further investigation. This is particularly useful in identifying novel fraud patterns that
have not been previously labelled.

Hybrid Approaches: Combining supervised and unsupervised learning can further enhance fraud
detection systems. For example, a hybrid model might first use unsupervised learning to cluster
transactions and identify potential fraud cases, followed by a supervised learning model to classify
these cases with higher precision. This layered approach leverages the strengths of both methods,
providing a more comprehensive defence against fraud.
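To make the layered idea concrete, here is a minimal two-stage sketch in Python. It rests on several assumptions: a simple z-score test on the cardholder's past amounts stands in for k-Means or PCA as the unsupervised stage, and the feature names, `WEIGHTS` and `BIAS` are hypothetical placeholders for coefficients that a supervised Logistic Regression would learn from labelled data.

```python
import math
import statistics

# Stage 1 (unsupervised): flag amounts that deviate strongly from the
# cardholder's own history. A z-score test stands in for k-Means or PCA here.
def is_outlier(amount, history, z_threshold=3.0):
    mean = statistics.mean(history)
    stdev = statistics.pstdev(history)
    if stdev == 0:
        return amount != mean
    return abs(amount - mean) / stdev > z_threshold

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

# Stage 2 (supervised): a logistic model scores only the escalated cases.
# Hypothetical coefficients standing in for values learned from labelled data.
WEIGHTS = {"amount_z": 0.8, "foreign": 1.5, "night": 0.7}
BIAS = -4.0

def fraud_probability(features):
    return sigmoid(BIAS + sum(WEIGHTS[name] * value for name, value in features.items()))

def hybrid_score(amount, history, foreign, night):
    """Return 0.0 for unflagged transactions, else a fraud probability."""
    if not is_outlier(amount, history):
        return 0.0  # never escalated to the supervised stage
    mean = statistics.mean(history)
    stdev = statistics.pstdev(history) or 1.0
    features = {"amount_z": (amount - mean) / stdev, "foreign": foreign, "night": night}
    return fraud_probability(features)

history = [40.0, 55.0, 38.0, 60.0, 47.0]  # hypothetical past amounts
print(hybrid_score(50.0, history, foreign=0, night=0))  # 0.0: within normal range
print(hybrid_score(90.0, history, foreign=1, night=1))  # high probability, roughly 0.9
```

A transaction close to the cardholder's usual spending never reaches the second stage, which keeps the supervised model's workload small; the sketch illustrates the layering only and is not a tuned system.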

1.1 Project Description

In the digital age, credit card transactions have become an integral part of our daily lives, offering
unparalleled convenience and ease of use. However, this increased reliance on credit cards has also led
to a surge in fraudulent activities. Credit card fraud is not only financially damaging but also
undermines consumer trust in financial systems. Traditional fraud detection methods, such as rule-
based systems and manual reviews, are often insufficient to tackle the sophisticated and evolving
nature of fraudulent schemes.

This project aims to address the critical issue of credit card fraud detection by leveraging machine
learning techniques. Specifically, we will develop and compare two machine learning models: Random
Forest and Logistic Regression. These models will be trained on a dataset of credit card transactions
to identify patterns indicative of fraudulent activity, thus enhancing the accuracy and efficiency of
fraud detection systems.

Credit card fraud is a critical and growing challenge for consumers and financial institutions alike,
and rule-based systems and manual reviews have proven unable to keep pace with its sophistication
and rapid evolution. Because Random Forest and Logistic Regression learn from historical transaction
data rather than from fixed rules, models built with them aim to recognise fraud patterns that
hand-written rules miss.

The project's methodology involves several key steps, starting with data collection and preprocessing.
The dataset, which includes various features of credit card transactions, will be cleaned, normalized,
and balanced using techniques such as SMOTE to handle the inherent class imbalance. Following this,
two machine learning models will be developed: Logistic Regression, a statistical method for binary
classification, and Random Forest, an ensemble learning method that constructs multiple decision
trees. These models will be trained and evaluated using a range of performance metrics, such as
accuracy, precision, recall, F1-score, and the area under the ROC curve (ROC-AUC), to determine their
effectiveness in detecting fraudulent transactions.
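As an illustration of how the label-based metrics are computed, the following plain-Python sketch derives accuracy, precision, recall and F1-score from true and predicted labels, assuming fraud is encoded as the positive class 1. ROC-AUC is left out because it needs model scores rather than hard labels; in practice, `sklearn.metrics` provides equivalent functions.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, false negatives, true negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = len(y_true) - tp - fp - fn
    return tp, fp, fn, tn

def classification_metrics(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of alerts that are fraud
    recall = tp / (tp + fn) if tp + fn else 0.0     # share of fraud that is caught
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical labels: 1 = fraud, 0 = legitimate.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]
y_pred = [0, 0, 0, 1, 0, 0, 1, 1, 0, 0]
print(classification_metrics(y_true, y_pred))
```

Here two of the three frauds are caught (recall 2/3) and two of the three alerts are genuine (precision 2/3), while accuracy is 0.8.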

By implementing and comparing these two machine learning models, the project seeks to provide a
comprehensive analysis of their performance in the context of credit card fraud detection. The expected
outcomes include the development of robust fraud detection models, insights into the strengths and
limitations of each algorithm, and practical recommendations for improving fraud detection systems.
This study not only aims to contribute to the academic field of machine learning and fraud detection
but also to provide tangible benefits for financial institutions in mitigating the risks associated with
credit card fraud.


1.2 Problem Statement

Credit card fraud is a pervasive and escalating issue that poses significant challenges for financial
institutions, businesses, and consumers worldwide. With the rapid growth of e-commerce and digital
transactions, the prevalence and sophistication of fraudulent activities have also increased. This creates
a critical need for advanced and efficient fraud detection systems. Traditional methods of fraud
detection, such as rule-based systems and manual reviews, are increasingly inadequate in combating
the dynamic and complex nature of modern fraud schemes. This inadequacy leads to substantial
financial losses, eroded consumer trust, and operational inefficiencies.

Complexity of Fraud Detection:

Fraud detection in credit card transactions is a highly complex task due to several factors. First, the
sheer volume of transactions processed daily by financial institutions is immense, making it
impractical to manually review each transaction for potential fraud. Second, fraudsters continuously
evolve their techniques, employing more sophisticated methods that can evade traditional detection
systems. This cat-and-mouse game between fraudsters and detection systems necessitates the adoption
of adaptive and intelligent solutions that can learn and evolve over time.

Moreover, fraudulent transactions often exhibit characteristics that are similar to legitimate
transactions, making it difficult to distinguish between the two. Fraud detection systems must be
capable of identifying subtle patterns and anomalies in transaction data, which requires the use of
advanced analytical techniques. The high dimensionality and variability of transaction data further
complicate the detection process, as models must consider numerous features and potential interactions
to accurately identify fraud.

Challenges in Implementing Effective Solutions:

A significant challenge in fraud detection is the imbalance in the dataset, where fraudulent transactions
represent a very small fraction of the total transactions. This imbalance can lead to biased models that
favour the majority class (legitimate transactions), resulting in a high rate of false negatives (fraudulent
transactions classified as legitimate). False negatives are particularly concerning as they allow
fraudulent activities to go undetected, leading to financial losses and undermining the effectiveness of
the fraud detection system.

Conversely, a high rate of false positives (legitimate transactions classified as fraudulent) can also have
detrimental effects. False positives cause inconvenience to customers, as their legitimate transactions
are flagged and possibly declined. This can lead to customer dissatisfaction and a loss of trust in the
financial institution. Furthermore, the operational cost of investigating false positives is substantial, as
each flagged transaction requires manual review and verification.
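A worked example with hypothetical numbers shows why accuracy alone is misleading on imbalanced data and how the two error types trade off. Assume 10,000 transactions, of which 20 are fraudulent. Model A labels every transaction legitimate; model B (also hypothetical) catches 16 frauds at the cost of 200 false alarms:

```latex
\begin{aligned}
&\text{Model A: } TP = 0,\; FP = 0,\; FN = 20,\; TN = 9980 \\
&\text{Accuracy}_A = \frac{TP + TN}{N} = \frac{0 + 9980}{10000} = 0.998, \qquad
 \text{Recall}_A = \frac{TP}{TP + FN} = \frac{0}{20} = 0 \\[4pt]
&\text{Model B: } TP = 16,\; FP = 200,\; FN = 4,\; TN = 9780 \\
&\text{Accuracy}_B = \frac{16 + 9780}{10000} = 0.9796, \qquad
 \text{Recall}_B = \frac{16}{20} = 0.8, \qquad
 \text{Precision}_B = \frac{16}{16 + 200} \approx 0.074
\end{aligned}
```

Model A has the higher accuracy yet detects no fraud at all, while model B detects most fraud but sends 12.5 legitimate transactions to manual review for every fraud it catches.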

The dynamic nature of fraud adds another layer of complexity. Fraudulent patterns can change rapidly,
rendering static detection models obsolete. Therefore, fraud detection systems must be capable of real-
time learning and adaptation to new patterns. This requires the integration of machine learning
algorithms that can continuously update and improve their performance based on new data.

Need for Advanced Machine Learning Techniques:

Given these challenges, there is a critical need for advanced machine learning techniques that can
effectively address the limitations of traditional fraud detection methods. Machine learning algorithms,
such as Random Forest and Logistic Regression, offer significant potential in enhancing the accuracy
and efficiency of fraud detection systems. These algorithms can analyse large volumes of transaction
data, identify complex patterns, and adapt to evolving fraud techniques.

Random Forest, an ensemble learning method, is particularly well-suited for fraud detection due to its
ability to handle high-dimensional data and its robustness against overfitting. By constructing multiple
decision trees and aggregating their predictions, Random Forest can provide accurate and reliable
classifications of transactions. Logistic Regression, a widely used statistical method, offers simplicity
and interpretability, making it valuable for binary classification tasks such as fraud detection. Its ability
to provide probabilistic predictions allows for a nuanced assessment of transaction risk.
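The two models' outputs can be written in their standard textbook forms. Here x is a transaction's feature vector, w and b are the learned Logistic Regression coefficients, p_t(x) is the fraud probability from tree t of a forest with T trees, and the threshold τ applied to either model's probability is an assumed tunable value (0.5 is a common default):

```latex
\begin{aligned}
P(\text{fraud} \mid \mathbf{x}) &= \sigma(\mathbf{w}^{\top}\mathbf{x} + b)
  = \frac{1}{1 + e^{-(\mathbf{w}^{\top}\mathbf{x} + b)}} \\
\hat{p}_{\mathrm{RF}}(\mathbf{x}) &= \frac{1}{T}\sum_{t=1}^{T} p_t(\mathbf{x}) \\
\hat{y} &= \begin{cases} 1 \ (\text{fraud}) & \text{if } \hat{p} \ge \tau \\
                         0 \ (\text{legitimate}) & \text{otherwise} \end{cases}
\end{aligned}
```

Averaging per-tree probabilities is how scikit-learn's `RandomForestClassifier` combines its trees; the original Random Forest formulation uses a majority vote instead. Lowering τ raises recall at the cost of more false positives.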

The implementation of these machine learning models involves several critical steps, including data
preprocessing, feature engineering, model training, and evaluation. Data preprocessing ensures that
the transaction data is clean, normalized, and balanced, addressing issues such as missing values and
class imbalance. Feature engineering involves creating meaningful features that capture the relevant
patterns and characteristics of fraudulent transactions. Model training involves fitting the machine
learning algorithms to the pre-processed data, while evaluation metrics such as precision, recall, F1-
score, and ROC-AUC are used to assess model performance.

In conclusion, the problem of credit card fraud detection is multifaceted and complex, requiring
advanced and adaptive solutions. Traditional methods are increasingly insufficient in addressing the
dynamic and sophisticated nature of modern fraud. Machine learning techniques, particularly Random
Forest and Logistic Regression, offer promising approaches to enhance the accuracy and efficiency of
fraud detection systems. By leveraging these advanced algorithms, financial institutions can improve
their ability to detect and prevent fraudulent transactions, thereby reducing financial losses, enhancing
customer trust, and ensuring the security of digital transactions.

1.3 Objectives of the Study

The primary objective of this study is to develop and evaluate machine learning models for detecting
credit card fraud, utilizing the Random Forest and Logistic Regression algorithms. By leveraging these
advanced techniques, the study aims to address the limitations of traditional fraud detection methods,
enhance the accuracy and efficiency of fraud detection systems, and provide actionable insights for
financial institutions to better protect their customers. The specific objectives are outlined as follows:

1. Development of Robust Fraud Detection Models:

The first objective is to develop robust and reliable machine learning models capable of accurately
identifying fraudulent transactions from a large dataset of credit card transactions. This involves
implementing the Random Forest and Logistic Regression algorithms, chosen for their respective
strengths in handling complex, high-dimensional data and providing interpretable, probabilistic
predictions. The development process includes data preprocessing, feature engineering, model
training, and optimization to ensure the models can effectively distinguish between legitimate and
fraudulent transactions.

2. Comparison of Model Performance:

The second objective is to perform a comparative analysis of the Random Forest and Logistic
Regression models. This involves evaluating their performance using various metrics such as accuracy,
precision, recall, F1-score, and the area under the receiver operating characteristic curve (ROC-AUC).
By comparing these metrics, the study aims to identify the strengths and weaknesses of each model,
providing insights into their suitability for fraud detection tasks. This comparison will help determine
which algorithm is more effective in detecting fraudulent transactions and under what conditions each
model performs best.

3. Addressing Data Imbalance:

A critical objective of the study is to address the issue of data imbalance inherent in credit card fraud
detection datasets, where fraudulent transactions are significantly outnumbered by legitimate ones.
The study will explore and implement techniques such as Synthetic Minority Over-sampling
Technique (SMOTE), Adaptive Synthetic Sampling (ADASYN), and undersampling of the majority
class to balance the dataset. These techniques aim to improve model performance by ensuring that the
machine learning algorithms do not become biased towards the majority class, thus enhancing their
ability to detect fraudulent transactions accurately.

4. Enhancing Model Generalization:

Another objective is to ensure that the developed models generalize well to new, unseen data. This
involves implementing techniques such as cross-validation and regularization to prevent overfitting,
where the models perform well on training data but poorly on testing data. By enhancing model
generalization, the study aims to develop fraud detection systems that maintain high performance and
reliability when deployed in real-world scenarios, where transaction patterns may differ from those in
the training dataset.

5. Real-Time Fraud Detection Capabilities:

The study also aims to explore the feasibility of implementing real-time fraud detection capabilities.
This involves evaluating the computational efficiency of the Random Forest and Logistic Regression
models and their ability to provide quick and accurate predictions. Real-time fraud detection is critical
for financial institutions to prevent fraudulent transactions before they are processed, thereby
minimizing financial losses and protecting customers. The study will investigate techniques for
optimizing model performance to meet the demands of real-time processing.

6. Providing Practical Recommendations:

Based on the findings of the study, a key objective is to provide practical recommendations for
financial institutions on implementing and optimizing machine learning-based fraud detection
systems. This includes guidance on data preprocessing, model selection, performance evaluation, and
deployment strategies. The recommendations aim to help financial institutions leverage the insights
gained from the study to improve their fraud detection capabilities, enhance customer trust, and reduce
operational costs associated with fraud investigations.

7. Contributing to the Academic and Professional Community:

Finally, the study aims to contribute to the broader academic and professional community by
advancing the understanding of machine learning applications in fraud detection. The study will
document the methodology, findings, and insights in a detailed report, making it accessible to
researchers, practitioners, and policymakers. By sharing the knowledge gained, the study seeks to
foster further research and development in the field of fraud detection, encouraging the adoption of
advanced machine learning techniques to combat the growing threat of credit card fraud.

In summary, the objectives of this study are comprehensive and multifaceted, aiming to develop,
evaluate, and optimize machine learning models for credit card fraud detection using Random Forest
and Logistic Regression algorithms. By addressing key challenges such as data imbalance and model
generalization, and by exploring real-time detection capabilities, the study seeks to enhance the
effectiveness of fraud detection systems and provide valuable insights for financial institutions and the
broader community.

1.4 Scope of the Study

The scope of this study encompasses the comprehensive development, implementation, and evaluation
of machine learning models for detecting credit card fraud. The study focuses on utilizing the Random
Forest and Logistic Regression algorithms due to their respective strengths in handling complex, high-
dimensional data and providing interpretable, probabilistic predictions. The study is designed to
address the key challenges associated with credit card fraud detection, including data imbalance, model
accuracy, and real-time detection capabilities. The detailed scope of the study is outlined as follows:

1. Dataset and Data Collection:

The study will utilize a publicly available dataset of credit card transactions, which includes both
legitimate and fraudulent transactions. The dataset will be sourced from credible repositories, ensuring
that it is representative of real-world transaction patterns. The features of the dataset typically include
transaction amount, time of transaction, merchant details, and other relevant attributes. This scope
includes the detailed exploration and understanding of the dataset's characteristics, structure, and any
inherent limitations.

2. Data Preprocessing and Feature Engineering:

Data preprocessing is a critical component of the study. This involves cleaning the dataset by handling
missing values, removing duplicates, and normalizing the data to ensure consistency. Feature
engineering will be conducted to create new, meaningful features that capture the underlying patterns
associated with fraudulent transactions. Techniques such as encoding categorical variables, scaling
numerical features, and generating interaction terms will be applied to enhance the predictive power
of the machine learning models.
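The preprocessing steps listed above can be sketched in plain Python. The records, the field names (`amount`, `hour`, `merchant_type`), the median imputation and the hypothetical `night` flag with its amount interaction term are illustrative assumptions, not the project's actual schema. In a real pipeline the median, mean and standard deviation would be computed on the training split only and reused for the test split, to avoid leaking test information.

```python
import statistics

# Hypothetical raw records: "merchant_type" is categorical, "amount" numeric.
records = [
    {"id": 1, "amount": 120.0, "hour": 14, "merchant_type": "grocery"},
    {"id": 2, "amount": None, "hour": 2, "merchant_type": "online"},
    {"id": 1, "amount": 120.0, "hour": 14, "merchant_type": "grocery"},  # duplicate
    {"id": 3, "amount": 950.0, "hour": 3, "merchant_type": "online"},
]

def preprocess(rows):
    # 1. Drop exact duplicates, keeping the first occurrence.
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)

    # 2. Impute missing amounts with the median of the observed amounts.
    median = statistics.median([r["amount"] for r in unique if r["amount"] is not None])
    amounts = [r["amount"] if r["amount"] is not None else median for r in unique]

    # 3. Standardize the amount (z-score).
    mean = statistics.mean(amounts)
    stdev = statistics.pstdev(amounts) or 1.0

    categories = sorted({r["merchant_type"] for r in unique})
    features = []
    for r, amount in zip(unique, amounts):
        z = (amount - mean) / stdev
        night = 1 if r["hour"] < 6 else 0  # hypothetical derived feature
        row = {"amount_z": z, "night": night, "amount_z_x_night": z * night}
        # 4. One-hot encode the categorical merchant type.
        for c in categories:
            row[f"merchant_{c}"] = 1 if r["merchant_type"] == c else 0
        features.append(row)
    return features

for f in preprocess(records):
    print(f)
```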

3. Handling Imbalanced Data:

One of the primary challenges in fraud detection is the imbalance between the number of fraudulent
and legitimate transactions. The scope includes addressing this imbalance through techniques such as
Synthetic Minority Over-sampling Technique (SMOTE), Adaptive Synthetic Sampling (ADASYN),
and undersampling of the majority class. These methods will be evaluated and applied to ensure that
the machine learning models are not biased towards the majority class and can effectively identify
fraudulent transactions.
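The core of SMOTE, creating a synthetic minority sample at a random point on the line segment between a fraud case and one of its k nearest fraud neighbours, fits in a few lines, as does random undersampling. This is a from-scratch sketch with assumed function and parameter names; ADASYN, which adapts how many samples are generated around each point, is not shown. Libraries such as imbalanced-learn implement SMOTE and ADASYN properly, and resampling should be applied to the training data only, never to the test set.

```python
import math
import random

def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic minority samples (the core idea of SMOTE)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest neighbours of x among the other minority samples.
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(x, p),
        )[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # random position on the segment from x to nn
        synthetic.append(tuple(xi + gap * (ni - xi) for xi, ni in zip(x, nn)))
    return synthetic

def random_undersample(majority, n_keep, seed=0):
    """Keep a random subset of n_keep majority-class samples."""
    return random.Random(seed).sample(majority, n_keep)

# Hypothetical 2-feature data: 4 fraud cases, 40 legitimate ones.
fraud = [(0.9, 0.8), (1.0, 0.7), (0.8, 0.9), (1.1, 0.85)]
legit = [(0.1 * i, 0.05 * i) for i in range(40)]

new_fraud = smote(fraud, n_new=6)
kept_legit = random_undersample(legit, n_keep=10)
print(len(fraud) + len(new_fraud), len(kept_legit))  # 10 10
```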

4. Model Development and Training:

The core of the study involves developing and training the Random Forest and Logistic Regression
models. This includes:

Random Forest: Implementing the Random Forest algorithm, which involves constructing multiple
decision trees and aggregating their predictions to improve accuracy and robustness.

Logistic Regression: Implementing the Logistic Regression algorithm, which involves fitting a logistic
function to the data to model the probability of a transaction being fraudulent.

Both models will be trained on the pre-processed and balanced dataset, and hyperparameter tuning will
be conducted to optimize their performance.
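The ensemble idea can be illustrated with a deliberately simplified forest in plain Python. Assumptions: each tree is a decision stump (a single split) rather than a full tree, and each stump sees one randomly chosen feature instead of a random feature subset at every split; bootstrap sampling and majority voting follow the usual Random Forest recipe. A practical implementation would use a library class such as scikit-learn's `RandomForestClassifier`.

```python
import random
from collections import Counter

def fit_stump(X, y, feature):
    """Choose the threshold and label orientation with the fewest training errors."""
    best = None
    for threshold in sorted({row[feature] for row in X}):
        for above_label in (0, 1):
            preds = [above_label if row[feature] > threshold else 1 - above_label for row in X]
            errors = sum(p != t for p, t in zip(preds, y))
            if best is None or errors < best[0]:
                best = (errors, threshold, above_label)
    _, threshold, above_label = best
    return feature, threshold, above_label

def predict_stump(stump, row):
    feature, threshold, above_label = stump
    return above_label if row[feature] > threshold else 1 - above_label

def fit_forest(X, y, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: draw len(X) rows with replacement.
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        # One random feature per stump (simplified feature subsampling).
        forest.append(fit_stump(Xb, yb, rng.randrange(len(X[0]))))
    return forest

def predict_forest(forest, row):
    """Majority vote over all stumps."""
    votes = Counter(predict_stump(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]

# Hypothetical features: (scaled amount, night-time flag); 1 = fraud.
X = [(0.10, 0), (0.20, 0), (0.25, 0), (0.30, 0), (0.35, 0), (0.40, 0),
     (0.80, 1), (0.85, 1), (0.90, 1), (0.95, 1), (0.97, 1), (1.00, 1)]
y = [0] * 6 + [1] * 6
forest = fit_forest(X, y)
print(predict_forest(forest, (0.93, 1)))  # 1 (fraud)
print(predict_forest(forest, (0.05, 0)))  # 0 (legitimate)
```

Each stump is a weak, noisy classifier on its own; the bootstrap samples and random feature choice make the stumps differ, and the vote smooths out their individual mistakes.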

5. Model Evaluation and Performance Metrics:

The performance of the developed models will be evaluated using a comprehensive set of metrics,
including accuracy, precision, recall, F1-score, and the area under the receiver operating characteristic
curve (ROC-AUC). The evaluation process will involve splitting the dataset into training and testing
sets, applying cross-validation techniques, and assessing the models' ability to generalize to new,
unseen data. The study will also conduct a comparative analysis of the Random Forest and Logistic
Regression models to identify their respective strengths and limitations.
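ROC-AUC has an interpretation that can be computed directly: it is the probability that a randomly chosen fraudulent transaction receives a higher score than a randomly chosen legitimate one, with ties counted as half. The quadratic-time sketch below assumes fraud is labelled 1 and scores are model probabilities; it is for illustration, and `sklearn.metrics.roc_auc_score` would be the practical choice on real data.

```python
def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of fraud vs legitimate scores."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one fraud and one legitimate example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))

# Hypothetical labels and model scores.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Because it compares rankings rather than thresholded labels, ROC-AUC does not reward the "predict everything legitimate" strategy that inflates accuracy on imbalanced data.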

6. Real-Time Fraud Detection:

Exploring the feasibility of real-time fraud detection is an important aspect of the study. This involves
assessing the computational efficiency of the models and their ability to provide quick and accurate
predictions. Techniques for optimizing model performance to meet the demands of real-time
processing will be investigated, including incremental learning and online learning methods.
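Online learning can be sketched as a Logistic Regression whose weights are nudged by one stochastic-gradient step per labelled transaction, so the model can follow changing fraud patterns without full retraining. The class name, learning rate, features and toy stream below are hypothetical, and the sketch omits the regularization and learning-rate schedules a production system would need. scikit-learn's `SGDClassifier` with `partial_fit` offers a library version of this idea.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

class OnlineLogisticRegression:
    """Logistic Regression updated one labelled transaction at a time (SGD)."""

    def __init__(self, n_features, learning_rate=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = learning_rate

    def predict_proba(self, x):
        return sigmoid(self.b + sum(wi * xi for wi, xi in zip(self.w, x)))

    def update(self, x, y):
        # For log-loss, the gradient for one example is (p - y) * x for the
        # weights and (p - y) for the bias.
        error = self.predict_proba(x) - y
        self.w = [wi - self.lr * error * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * error

# Hypothetical stream of (scaled amount, foreign flag) with labels.
model = OnlineLogisticRegression(n_features=2, learning_rate=0.5)
stream = [((0.9, 1.0), 1), ((0.1, 0.0), 0)] * 50
for x, y in stream:
    model.update(x, y)

p_fraud = model.predict_proba((0.9, 1.0))
p_legit = model.predict_proba((0.1, 0.0))
print(round(p_fraud, 3), round(p_legit, 3))
```

Each update costs time proportional to the number of features, which is what makes this style of model attractive when predictions and updates must keep pace with live transactions.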

7. Implementation and Integration:

The scope includes the practical implementation of the developed models within a simulated or real-
world fraud detection system. This involves integrating the models into a broader system architecture
that can process incoming transaction data, apply the fraud detection algorithms, and generate alerts
for suspected fraudulent transactions. The implementation phase will also address any challenges
related to system integration and operational deployment.

8. Providing Recommendations:

Based on the findings of the study, practical recommendations will be provided for financial
institutions on implementing and optimizing machine learning-based fraud detection systems. This
includes guidance on data preprocessing, model selection, performance evaluation, and deployment
strategies. The recommendations aim to help financial institutions enhance their fraud detection
capabilities and reduce the incidence of fraudulent transactions.

9. Contribution to Research and Development:

The study aims to contribute to the academic and professional community by documenting the
methodology, findings, and insights in a detailed report. This report will be made accessible to
researchers, practitioners, and policymakers, fostering further research and development in the field of
fraud detection. The study seeks to advance the understanding of machine learning applications in
fraud detection and encourage the adoption of advanced techniques to combat credit card fraud.

In conclusion, the scope of this study is extensive, covering all aspects of developing, implementing,
and evaluating machine learning models for credit card fraud detection using Random Forest and
Logistic Regression algorithms. By addressing key challenges and providing practical insights, the
study aims to enhance the effectiveness of fraud detection systems and contribute to the broader field
of financial security.


2. LITERATURE REVIEW

2.1 Overview of Credit Card Fraud Detection

Credit card fraud involves unauthorized transactions made using a credit card without the cardholder's
consent. This type of fraud is a significant concern for financial institutions due to its potential for
substantial financial losses and damage to customer trust. The rise of online transactions has further
exacerbated this issue, making effective fraud detection systems more critical than ever. Traditionally,
fraud detection has relied on rule-based systems and manual reviews, but these methods have
limitations, particularly in scalability and adaptability to new fraud patterns.

Traditional Methods of Fraud Detection

Traditional fraud detection systems often rely on predefined rules and thresholds to identify suspicious
transactions. These rule-based systems use if-then logic to flag transactions that meet specific criteria,
such as large purchases or transactions in unusual locations. While straightforward and easy to
implement, these systems have several drawbacks:

Limited Flexibility: Rule-based systems struggle to adapt to new fraud techniques, as they require
constant updates and maintenance.

High False Positives: These systems often generate a high number of false positives, where legitimate
transactions are incorrectly flagged as fraudulent.

Scalability Issues: As transaction volumes grow, rule-based systems become increasingly difficult to
manage and maintain.
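The if-then logic of a rule-based system, and its drawbacks, can be seen in a small sketch. The `Transaction` fields, thresholds and rules are hypothetical: every limit is hard-coded, a fraudster who stays just under `LARGE_AMOUNT` is never flagged, and a genuine cardholder travelling abroad is flagged for an "unusual location", which is exactly the false-positive problem noted above.

```python
from dataclasses import dataclass

@dataclass
class Transaction:
    amount: float
    country: str
    home_country: str
    hour: int

# Hypothetical hard-coded thresholds; real rule sets are maintained by analysts.
LARGE_AMOUNT = 5000.0
NIGHT_HOURS = range(0, 5)

def rule_based_flags(tx):
    """Return the names of all rules the transaction triggers."""
    reasons = []
    if tx.amount > LARGE_AMOUNT:
        reasons.append("large purchase")
    if tx.country != tx.home_country:
        reasons.append("unusual location")
    if tx.hour in NIGHT_HOURS and tx.amount > LARGE_AMOUNT / 5:
        reasons.append("large night-time purchase")
    return reasons

print(rule_based_flags(Transaction(6200.0, "SG", "IN", 2)))
# ['large purchase', 'unusual location', 'large night-time purchase']
print(rule_based_flags(Transaction(4999.0, "IN", "IN", 13)))  # [] (just under the limit)
```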

Introduction to Machine Learning

Machine learning (ML) offers a more dynamic and scalable approach to fraud detection. ML
algorithms can learn patterns from historical transaction data and apply this knowledge to identify
potentially fraudulent activities. Unlike rule-based systems, ML models can adapt to new types of
fraud as they are exposed to more data. The key advantages of using ML in fraud

Application of Machine Learning in Fraud Detection

Several machine learning algorithms have been successfully applied to fraud detection, each with its
unique strengths and weaknesses. The following are some of the most commonly used algorithms:


Logistic Regression: A statistical method for binary classification, Logistic Regression is simple and
interpretable, making it a popular choice for fraud detection. It models the probability that a given
transaction is fraudulent based on its features.

Decision Trees: These algorithms split the data into subsets based on feature values, creating a tree-
like model of decisions. Decision trees are easy to interpret but can suffer from overfitting.

Random Forest: An ensemble method that builds multiple decision trees and combines their outputs.
Random Forest is robust and reduces overfitting, making it effective for fraud detection.

Support Vector Machines (SVM): SVMs find the optimal hyperplane that separates fraudulent and
non-fraudulent transactions in the feature space. They are effective but can be computationally
intensive.

Neural Networks: Deep learning models that can capture complex patterns in the data. While
powerful, they require large amounts of data and computational resources.
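Decision trees, and therefore Random Forests, choose each split by how well it separates the classes. One common criterion is Gini impurity, the default in scikit-learn's `DecisionTreeClassifier`. This small sketch with hypothetical transaction amounts scores every candidate threshold and keeps the one with the lowest weighted impurity.

```python
def gini(labels):
    """Gini impurity of a set of 0/1 labels (0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n  # share of fraud labels (1s)
    return 1.0 - p**2 - (1.0 - p) ** 2

def split_impurity(values, labels, threshold):
    """Size-weighted Gini impurity of the split value <= threshold."""
    left = [y for v, y in zip(values, labels) if v <= threshold]
    right = [y for v, y in zip(values, labels) if v > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# Hypothetical transaction amounts and labels (1 = fraud).
amounts = [20, 35, 50, 400, 650, 900]
labels = [0, 0, 0, 1, 0, 1]
best = min(sorted(set(amounts)), key=lambda t: split_impurity(amounts, labels, t))
print(best)  # 50
```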

History and Evolution:

Credit card fraud has a long history intertwined with the development of credit card technology.
Initially, fraudsters utilized rudimentary methods such as counterfeit cards and stolen identities.
However, with the advent of magnetic stripe technology in the 1970s, fraud tactics evolved to include
skimming, where card details are illegally copied from legitimate cards. The proliferation of online
transactions in the 21st century further expanded the scope of fraud, leading to the emergence of
sophisticated cybercrime syndicates.

Current Landscape:

The current landscape of credit card fraud is characterized by a significant increase in both the
frequency and complexity of fraudulent activities. According to recent statistics, global credit card
fraud losses amount to billions of dollars annually, with financial institutions and consumers bearing
the brunt of these losses. Fraudulent transactions not only result in financial harm but also erode
consumer trust in financial institutions and undermine the integrity of the payment ecosystem.


2.2 Traditional Methods for Fraud Detection

Traditional methods for fraud detection have been employed by financial institutions for decades,
relying on rule-based systems, statistical methods, and manual reviews. While these approaches have
been effective to some extent, they often struggle to keep pace with the evolving tactics of fraudsters
and may generate a high number of false positives. Here, we delve into the intricacies of these
traditional methods and explore their strengths, limitations, and challenges:

Rule-Based Systems:

Rule-based systems operate on predefined rules and thresholds designed to flag transactions that
exhibit suspicious behaviour. These rules are typically based on specific criteria such as transaction
amount, location, frequency, and time of day. For example, a rule may be triggered if a transaction
exceeds a certain dollar amount or occurs in a location known for fraudulent activity. While rule-based
systems are straightforward to implement and interpret, they suffer from several limitations:

Limited Adaptability: Rule-based systems are static and cannot adapt to new fraud patterns without
manual intervention. As fraud tactics evolve, these systems require constant updates and maintenance
to remain effective.

High False Positives: The rigid nature of rule-based systems can result in a high number of false
positives, where legitimate transactions are incorrectly flagged as fraudulent. This can lead to customer
inconvenience and erode trust in the financial institution.

Scalability Issues: As transaction volumes increase, rule-based systems may struggle to handle the
sheer volume of data efficiently. Processing large datasets in real-time can be challenging and may
impact system performance.
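Below is a minimal rule-based checker of the kind described above, written in plain Python. The thresholds, the placeholder high-risk country codes and the field names (`amount`, `country`, `hour`, `txn_count_last_hour`) are hypothetical values chosen for illustration. The helper names `triggered_rules` and `is_flagged` are also illustrative. Every threshold is a hard-coded constant that someone must revise by hand, which is the Limited Adaptability problem noted above.

```python
from dataclasses import dataclass

# Hypothetical thresholds; in practice analysts set and update these manually.
MAX_AMOUNT = 5000.0
HIGH_RISK_COUNTRIES = {"XX", "YY"}   # placeholder country codes
MAX_TXNS_PER_HOUR = 5
NIGHT_HOURS = range(0, 5)            # 00:00 to 04:59


@dataclass
class Transaction:
    amount: float
    country: str
    hour: int                  # hour of day, 0-23
    txn_count_last_hour: int   # transactions on the same card in the past hour


def triggered_rules(txn: Transaction) -> list[str]:
    """Return the names of all rules that this transaction violates, in rule order."""
    rules = []
    if txn.amount > MAX_AMOUNT:
        rules.append("amount_over_limit")
    if txn.country in HIGH_RISK_COUNTRIES:
        rules.append("high_risk_location")
    if txn.txn_count_last_hour > MAX_TXNS_PER_HOUR:
        rules.append("high_frequency")
    if txn.hour in NIGHT_HOURS and txn.amount > MAX_AMOUNT / 5:
        rules.append("large_night_transaction")
    return rules


def is_flagged(txn: Transaction) -> bool:
    # A transaction is flagged as soon as any single rule fires.
    return bool(triggered_rules(txn))
```

For example, `triggered_rules(Transaction(7200.0, "IN", 14, 1))` returns `["amount_over_limit"]`. A legitimate but unusually large daytime purchase is therefore flagged exactly like a fraudulent one, which is the false-positive problem described above.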

Statistical Methods:

Statistical methods involve the application of basic statistical techniques to identify patterns and
anomalies in transaction data. These methods may include regression analysis, time-series forecasting,
and clustering algorithms. Statistical approaches aim to detect deviations from expected behaviour
based on historical data. However, they have several limitations:

Limited Predictive Power: Statistical methods rely on historical data to identify patterns, making
them less effective in detecting novel fraud patterns or sophisticated fraud schemes.


Difficulty in Handling Complex Data: Statistical methods may struggle to handle complex, high-
dimensional data commonly encountered in credit card transactions. As a result, they may overlook
subtle patterns indicative of fraudulent behaviour.

Lack of Adaptability: Like rule-based systems, statistical methods may lack adaptability and struggle
to keep pace with evolving fraud tactics. They require regular updates and recalibration to remain
effective in dynamic environments.
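One of the simplest statistical approaches compares each new transaction with the cardholder's own history. It flags amounts far from the historical mean (a z-score test). The sketch below is a minimal illustration that assumes a 3-standard-deviation threshold and a list of past amounts per card; both are illustrative choices, and the function name `zscore_flag` is hypothetical. It also shows the limited predictive power noted above: fraud that stays within a card's normal spending range passes unnoticed.

```python
import numpy as np


def zscore_flag(history, new_amount, threshold=3.0):
    """Return True if new_amount is more than `threshold` standard deviations
    away from the mean of the cardholder's past transaction amounts."""
    history = np.asarray(history, dtype=float)
    mean = history.mean()
    std = history.std(ddof=1)          # sample standard deviation
    if std == 0:
        # Identical past amounts: treat any different amount as a deviation.
        return bool(new_amount != mean)
    z = (new_amount - mean) / std
    return bool(abs(z) > threshold)


past = [20.0, 35.0, 25.0, 30.0, 40.0, 28.0]   # hypothetical history for one card
print(zscore_flag(past, 32.0))    # False: close to typical spending
print(zscore_flag(past, 500.0))   # True: far outside the card's usual range
```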

Manual Reviews:

Manual reviews involve human analysts reviewing transactions flagged as potentially fraudulent by
automated systems. While human intuition and expertise can be valuable in identifying subtle patterns
indicative of fraud, manual reviews suffer from several drawbacks:

Time-Consuming: Manual reviews are labor-intensive and time-consuming, requiring analysts to
manually inspect each flagged transaction for signs of fraud. This can result in delays in fraud detection
and resolution, leading to potential financial losses for the institution.

Subjectivity: Manual reviews are inherently subjective, and their effectiveness can vary depending on
the expertise and experience of the analysts involved. Human biases may also influence decision-making,
leading to inconsistencies in fraud detection.

Scalability Challenges: Manual reviews may not scale effectively to handle large volumes of
transactions, particularly in real-time environments. As transaction volumes increase, manual reviews
may become impractical and inefficient.

Conclusion:

While traditional methods for fraud detection have been the cornerstone of fraud prevention efforts for
decades, they have inherent limitations in terms of adaptability, scalability, and accuracy. As fraud
tactics evolve and transaction volumes grow, financial institutions are increasingly turning to advanced
machine learning techniques to enhance their fraud detection capabilities. By leveraging the power of
machine learning algorithms, financial institutions can develop more robust and effective fraud
detection systems capable of identifying and mitigating fraudulent activities in real-time.


2.3 Types of Credit Card Fraud

Card-Present Fraud: Card-present fraud occurs when a fraudster physically possesses a stolen or
counterfeit card and uses it to make unauthorized transactions. Common examples include skimming,
where card details are illicitly copied from legitimate cards, and card cloning, where a duplicate card
is created using stolen information. Detecting card-present fraud poses challenges due to the difficulty
in distinguishing between legitimate and fraudulent transactions, particularly in high-traffic
environments such as retail stores and ATMs.

Card-Not-Present Fraud: Card-not-present fraud refers to fraudulent transactions conducted online
or over the phone, where the physical presence of the card is not required. Fraudsters exploit
vulnerabilities in e-commerce platforms and payment gateways to make unauthorized purchases using
stolen card details. Detection methods for card-not-present fraud often rely on anomaly detection
algorithms that analyse transaction patterns and user behaviour to identify suspicious activities.
However, the dynamic nature of online transactions and the anonymity of cybercriminals present
significant challenges for detection.
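Anomaly detection of this kind can be sketched with scikit-learn's `IsolationForest`, an unsupervised method that isolates unusual points without needing fraud labels. The sketch below is only an illustration. The two features (purchase amount and seconds since the card's previous online transaction), the synthetic "normal" behaviour and the 1% contamination setting are all assumptions, not this project's data or configuration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Synthetic "normal" online behaviour: modest amounts, purchases roughly hours apart.
normal = np.column_stack([
    rng.normal(loc=60, scale=15, size=500),       # amount
    rng.normal(loc=7200, scale=1800, size=500),   # seconds since previous transaction
])

# Fit on (assumed mostly legitimate) historical behaviour; no labels required.
detector = IsolationForest(n_estimators=200, contamination=0.01, random_state=0)
detector.fit(normal)

# Two new transactions: one typical, one rapid high-value purchase.
new_txns = np.array([
    [55.0, 6800.0],
    [950.0, 20.0],
])
labels = detector.predict(new_txns)             # 1 = inlier, -1 = anomaly
scores = detector.decision_function(new_txns)   # lower = more anomalous
print(labels, scores)
```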

Phishing and Identity Theft: Phishing and identity theft are common techniques used by fraudsters
to obtain sensitive information such as credit card numbers, passwords, and personal identification
details. Phishing involves the use of deceptive emails, websites, or phone calls to trick individuals into
divulging their confidential information. Identity theft, on the other hand, involves the unauthorized
use of someone else's personal information to commit fraud. Preventing and detecting phishing and
identity theft require a combination of user education, robust authentication mechanisms, and proactive
monitoring of suspicious activities.

2.4 Machine Learning in Fraud Detection

Introduction to Machine Learning: Machine learning is a subset of artificial intelligence that enables
computers to learn from data and make predictions or decisions without explicit programming. It
encompasses various techniques, including supervised learning, unsupervised learning, and
reinforcement learning, each suited to different types of tasks and data.

Supervised Learning for Fraud Detection: Supervised learning involves training a model on labelled
data, where each transaction is annotated as either legitimate or fraudulent. The model learns to
distinguish between the two classes based on features such as transaction amount, time, and location.


Supervised learning offers several advantages over unsupervised methods, including the ability to learn
complex patterns and the potential for higher accuracy in classification tasks.

2.5 Review of Related Works

The review of related works encompasses an in-depth examination of recent studies and research
efforts focused on credit card fraud detection using machine learning algorithms. These studies provide
valuable insights into the application of various machine learning techniques, the effectiveness of
different models, and the challenges encountered in real-world implementations. Here, we delve into
the key findings and methodologies employed in several seminal studies in the field:

Dal Pozzolo et al. (2014)

Dal Pozzolo et al. conducted a comprehensive study comparing different machine learning algorithms
for credit card fraud detection. Their research included algorithms such as Random Forest, Logistic
Regression, Decision Trees, and neural networks. The study focused on evaluating the performance of
these algorithms in terms of accuracy, precision, recall, and F1-score. The findings indicated that
ensemble methods like Random Forest outperformed other algorithms in terms of both accuracy and
robustness. Moreover, the study highlighted the importance of feature selection and data preprocessing
techniques in improving model performance.

Bhattacharyya et al. (2011)

Bhattacharyya et al. explored the use of hybrid approaches for credit card fraud detection, combining
supervised and unsupervised learning techniques. Their research aimed to address the limitations of
traditional supervised learning methods, such as the reliance on labelled data and the inability to detect
novel fraud patterns. The hybrid models developed in this study demonstrated superior performance
in detecting previously unseen fraud patterns by leveraging unsupervised learning algorithms for
anomaly detection. The findings underscored the importance of incorporating both supervised and
unsupervised techniques to enhance fraud detection capabilities.

Jurgovsky et al. (2018)

Jurgovsky et al. focused on modelling temporal dependencies in credit card transaction data using
recurrent neural networks (RNNs). Their research aimed to capture sequential patterns indicative of
fraudulent behaviour, such as unusual transaction sequences or timing patterns. By applying RNNs to
sequence data, the study demonstrated significant improvements in fraud detection accuracy compared
to traditional machine learning algorithms. The findings highlighted the importance of considering
temporal dynamics in fraud detection and the potential of deep learning techniques for modelling
complex patterns in transaction data.

Carcillo et al. (2019)

Carcillo et al. addressed the challenge of data imbalance in credit card fraud detection by exploring
various resampling techniques. Their research aimed to mitigate the bias towards the majority class
(legitimate transactions) and improve the performance of machine learning models. The study
compared techniques such as Synthetic Minority Over-sampling Technique (SMOTE), Adaptive
Synthetic Sampling (ADASYN), and undersampling of the majority class. The findings indicated that
balancing the dataset using these techniques resulted in more accurate and reliable fraud detection
models, reducing both false positives and false negatives.
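The sketch below implements two of these resampling ideas directly in NumPy. The first is random undersampling of the majority class. The second is a simplified version of SMOTE's core step: each synthetic fraud sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority neighbours. This hand-written version only shows the mechanics, and the function names are hypothetical. The imbalanced-learn library listed in Chapter 4 provides tested implementations (`RandomUnderSampler`, `SMOTE`, `ADASYN`).

```python
import numpy as np


def random_undersample(X, y, rng, ratio=1.0):
    """Keep every minority row (y == 1) and a random subset of majority rows,
    so that n_majority = ratio * n_minority."""
    minority = np.flatnonzero(y == 1)
    majority = np.flatnonzero(y == 0)
    n_keep = min(len(majority), int(ratio * len(minority)))
    kept_majority = rng.choice(majority, size=n_keep, replace=False)
    idx = np.concatenate([minority, kept_majority])
    rng.shuffle(idx)
    return X[idx], y[idx]


def smote_like_oversample(X, y, n_new, rng, k=5):
    """Append n_new synthetic minority rows, each interpolated between a
    minority sample and one of its k nearest minority neighbours."""
    X_min = X[y == 1]
    # Pairwise Euclidean distances among minority samples.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)          # a point is not its own neighbour
    k = min(k, len(X_min) - 1)
    neighbours = np.argsort(dists, axis=1)[:, :k]
    synthetic = np.empty((n_new, X.shape[1]))
    for i in range(n_new):
        j = rng.integers(len(X_min))             # random minority sample
        nn = neighbours[j, rng.integers(k)]      # one of its k nearest neighbours
        gap = rng.random()                       # position along the segment, in [0, 1)
        synthetic[i] = X_min[j] + gap * (X_min[nn] - X_min[j])
    X_res = np.vstack([X, synthetic])
    y_res = np.concatenate([y, np.ones(n_new, dtype=y.dtype)])
    return X_res, y_res


rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
y = np.zeros(1000, dtype=int)
y[:20] = 1                                       # 2% minority (fraud) class

X_u, y_u = random_undersample(X, y, rng)                  # 20 fraud + 20 legitimate rows
X_s, y_s = smote_like_oversample(X, y, n_new=960, rng=rng)  # 980 fraud + 980 legitimate rows
```

Whichever technique is used, resampling should be applied only to the training split. The test set should keep the real class ratio, so that evaluation reflects the imbalance the deployed model will face.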

Summary of Key Studies

The studies reviewed collectively underscore the effectiveness of machine learning techniques in credit
card fraud detection. Ensemble methods like Random Forest and hybrid approaches combining
supervised and unsupervised learning have shown promise in improving detection rates and adapting
to evolving fraud patterns. Deep learning techniques, such as recurrent neural networks, offer the
potential to capture complex temporal dependencies in transaction data, further enhancing detection
accuracy. Additionally, addressing data imbalance through resampling techniques has emerged as a
critical factor in developing robust fraud detection systems.

Identified Gaps and Areas for Improvement

While existing research has made significant strides in advancing fraud detection methodologies,
several gaps and areas for improvement remain. These include the need for more research on real-time
detection techniques, the integration of machine learning models into existing fraud detection systems,
and the development of adaptive algorithms capable of continuously learning and evolving to new
fraud patterns. Additionally, there is a growing emphasis on the interpretability and explainability of
machine learning models in fraud detection, ensuring that decisions made by these models are
transparent and understandable to stakeholders.

In conclusion, the review of related works provides valuable insights into the current state of credit
card fraud detection research, highlighting the strengths and limitations of existing methodologies and
identifying opportunities for future research and development. By building upon the findings and
methodologies of previous studies, this research aims to contribute to the ongoing efforts to combat
credit card fraud and protect consumers and financial institutions from fraudulent activities.

2.6 Findings and Insights

Major Findings from the Literature:

Effectiveness of Machine Learning: The literature consistently highlights the effectiveness of
machine learning techniques, particularly ensemble methods like Random Forest, in improving credit
card fraud detection accuracy. These methods outperform traditional rule-based systems and statistical
approaches, demonstrating higher accuracy and robustness in detecting fraudulent transactions.

Data Imbalance Challenges: A significant challenge in credit card fraud detection is the imbalance
between fraudulent and legitimate transactions in the dataset. Several studies emphasize the
importance of addressing this imbalance through techniques such as SMOTE, ADASYN, and
undersampling to improve model performance and reduce bias towards the majority class.

Temporal Dynamics: The temporal aspect of transaction data, such as time series patterns and
sequence dependencies, plays a crucial role in fraud detection. Research efforts focused on modelling
temporal dynamics using recurrent neural networks (RNNs) have shown promising results in capturing
sequential patterns indicative of fraudulent behaviour.

Hybrid Approaches: Hybrid approaches combining supervised and unsupervised learning techniques
have demonstrated superior performance in detecting novel fraud patterns and adapting to evolving
fraud tactics. By leveraging both labelled and unlabelled data, hybrid models can enhance detection
accuracy and reduce false positives.


3. SOFTWARE DESIGN

3.1 Introduction
This chapter provides a detailed description of the software design for the credit card fraud detection
system. The goal is to build a robust, scalable, and real-time system that can effectively identify
fraudulent transactions. The design includes system architecture, component design, data flow, and the
technology stack. This approach ensures that the system can handle large volumes of data while
maintaining high accuracy and efficiency in fraud detection.

3.2 System Architecture

The system architecture is designed to support both real-time data processing and batch processing for
model training and evaluation. The architecture includes several key components, each responsible for
a specific task within the fraud detection process.

3.2.1 Overall Architecture

The overall architecture is a microservices-based design, ensuring modularity, scalability, and ease of
maintenance. The key components include:

● Data Ingestion Module

● Feature Engineering Module

● Model Training Module

● Fraud Detection Module

● Dashboard and Reporting Module

● Database

The system is designed to integrate seamlessly with existing financial systems, allowing for real-time
detection and reporting of fraudulent activities.


3.2.2 Data Ingestion Module

The Data Ingestion Module is responsible for collecting transaction data from various sources,
including transactional databases and APIs. It ensures that data is ingested in real-time or through batch
processes, depending on the source.

● Technologies: Apache Kafka for real-time data streaming and Apache NiFi for ETL processes.

● Input: Raw transaction data from multiple sources.

● Output: Pre-processed data ready for feature engineering.

3.2.3 Feature Engineering Module

The Feature Engineering Module processes the raw transaction data into meaningful features that can
be used by the machine learning models. This includes transforming and scaling the data to ensure
consistency and improve model performance.

● Technologies: Python with Pandas and NumPy for data manipulation.

● Input: Pre-processed transaction data.

● Output: Feature vectors for model training and prediction.
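The transformations this module performs can be sketched with pandas and NumPy, the technologies listed above. Everything specific here is an illustrative assumption, since the report does not fix a particular feature set:

- Raw column names: `card_id`, `timestamp`, `amount`.
- Derived features: hour of day, log amount, seconds since the card's previous transaction, and a per-card running transaction count.
- The `-1` sentinel for a card's first transaction.
- The function name `build_features`.

```python
import numpy as np
import pandas as pd


def build_features(raw: pd.DataFrame) -> pd.DataFrame:
    """Turn raw transactions into standardized numeric feature vectors (one row each)."""
    df = raw.copy()
    df["timestamp"] = pd.to_datetime(df["timestamp"])
    df = df.sort_values(["card_id", "timestamp"])

    feats = pd.DataFrame(index=df.index)
    feats["hour"] = df["timestamp"].dt.hour
    # log1p compresses the long right tail of transaction amounts.
    feats["log_amount"] = np.log1p(df["amount"])
    # Seconds since the same card's previous transaction; -1 marks "no previous transaction".
    gap = df.groupby("card_id")["timestamp"].diff().dt.total_seconds()
    feats["secs_since_prev"] = gap.fillna(-1.0)
    # Position of this transaction in the card's history (0 for the first).
    feats["card_txn_index"] = df.groupby("card_id").cumcount()

    # Standardize each column to zero mean and unit variance (constant columns left unscaled).
    # In production the mean/std would be computed on training data only and then reused.
    feats = (feats - feats.mean()) / feats.std(ddof=0).replace(0, 1)
    return feats.sort_index()


raw = pd.DataFrame({
    "card_id": ["A", "A", "B", "A"],
    "timestamp": ["2024-01-01 10:00", "2024-01-01 10:05",
                  "2024-01-01 23:30", "2024-01-02 09:00"],
    "amount": [25.0, 30.0, 900.0, 12.5],
})
features = build_features(raw)
print(features)
```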

3.2.4 Model Training Module

The Model Training Module is responsible for training the Random Forest and Logistic Regression
models using historical transaction data. It includes hyperparameter tuning and validation to ensure
optimal model performance.

● Technologies: Scikit-learn for model training and GridSearchCV for hyperparameter tuning.

● Input: Feature vectors and corresponding labels.

● Output: Trained Random Forest and Logistic Regression models.
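The training step can be sketched with scikit-learn's `GridSearchCV`, as named above. The sketch rests on several illustrative assumptions:

- The parameter grids.
- `class_weight="balanced"`, which re-weights the rare fraud class.
- Average precision as the selection metric; it is usually more informative than accuracy on highly imbalanced data.
- The synthetic data.
- The function name `train_models`, which is hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def train_models(X, y):
    """Tune and fit Logistic Regression and Random Forest; return the best estimators."""
    cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
    searches = {
        "logistic_regression": GridSearchCV(
            Pipeline([("scale", StandardScaler()),
                      ("clf", LogisticRegression(max_iter=1000, class_weight="balanced"))]),
            param_grid={"clf__C": [0.01, 0.1, 1.0, 10.0]},
            scoring="average_precision", cv=cv),
        "random_forest": GridSearchCV(
            RandomForestClassifier(random_state=0, class_weight="balanced"),
            param_grid={"n_estimators": [50, 100], "max_depth": [None, 8]},
            scoring="average_precision", cv=cv),
    }
    best = {}
    for name, search in searches.items():
        search.fit(X, y)                         # refits the best setting on all of X, y
        best[name] = search.best_estimator_
        print(name, search.best_params_, round(search.best_score_, 3))
    return best


# Synthetic stand-in for labelled feature vectors (about 5% fraud).
X, y = make_classification(n_samples=1500, n_features=8, weights=[0.95], random_state=0)
models = train_models(X, y)
```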


3.2.5 Fraud Detection Module

The Fraud Detection Module uses the trained models to detect fraudulent transactions in real-time. It
processes incoming transaction data and applies the models to predict the likelihood of fraud.

● Technologies: Flask for deploying the models as RESTful APIs, Docker for containerization.

● Input: Feature vectors of incoming transactions.

● Output: Fraud predictions (0 for non-fraud, 1 for fraud).
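In the described design, the trained model sits behind a Flask REST endpoint inside a Docker container. The framework-independent core of such an endpoint can be sketched as a function with three steps. It parses a JSON payload of feature vectors, obtains fraud probabilities with the model's `predict_proba`, and converts them to the 0/1 predictions listed above using a decision threshold. The payload format, the 0.5 default threshold and the name `score_transactions` are hypothetical, and the Flask routing itself is not shown. A toy Logistic Regression model stands in for the real trained model.

```python
import json

import numpy as np
from sklearn.linear_model import LogisticRegression


def score_transactions(model, payload: str, threshold: float = 0.5) -> str:
    """Score a JSON payload like {"features": [[...], ...]} and return JSON results."""
    features = np.asarray(json.loads(payload)["features"], dtype=float)
    proba = model.predict_proba(features)[:, 1]   # column 1 = probability of class 1 (fraud)
    results = [
        {"fraud_probability": round(float(p), 4), "prediction": int(p >= threshold)}
        for p in proba
    ]
    return json.dumps({"results": results})


# Toy one-feature model: larger values correspond to fraud.
X_toy = np.array([[0.0], [0.5], [1.0], [4.0], [4.5], [5.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])
model = LogisticRegression().fit(X_toy, y_toy)

response = score_transactions(model, json.dumps({"features": [[0.2], [4.8]]}))
print(response)
```

Returning the probability alongside the 0/1 label lets downstream systems, such as the dashboard or a manual-review queue, apply their own thresholds.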

3.2.6 Dashboard and Reporting Module

The Dashboard and Reporting Module visualizes transaction data, model performance, and fraud
detection results. It provides actionable insights through interactive dashboards and reports.

● Technologies: Dash/Plotly for interactive dashboards, SQL for querying the database.

● Input: Model predictions, transaction data.

● Output: Real-time and historical data visualizations.

3.2.7 Database

The database stores transaction data, model outputs, and evaluation metrics. It supports real-time
querying and reporting, ensuring that data is readily available for analysis.

● Technologies: PostgreSQL for relational database management.

● Input: Transaction data, model predictions, performance metrics.

● Output: Data for reporting and analysis.
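The design specifies PostgreSQL. To keep the example self-contained, the sketch below uses Python's built-in `sqlite3` module as a stand-in. The SQL used here (`CREATE TABLE`, `JOIN`, `CASE`, `substr`, `GROUP BY`) is also valid in PostgreSQL, although column types would normally be chosen differently there. The table layout (`transactions`, `predictions`), the column names and the rows are hypothetical. They show how stored model outputs can feed a daily reporting query for the dashboard.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE transactions (
    txn_id   INTEGER PRIMARY KEY,
    card_id  TEXT NOT NULL,
    amount   REAL NOT NULL,
    ts       TEXT NOT NULL          -- ISO-8601 timestamp
);
CREATE TABLE predictions (
    txn_id      INTEGER PRIMARY KEY REFERENCES transactions(txn_id),
    model_name  TEXT NOT NULL,
    fraud_prob  REAL NOT NULL,
    is_fraud    INTEGER NOT NULL CHECK (is_fraud IN (0, 1))
);
""")

cur.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?)", [
    (1, "A", 25.0, "2024-05-01T10:00:00"),
    (2, "A", 980.0, "2024-05-01T23:40:00"),
    (3, "B", 40.0, "2024-05-02T09:15:00"),
    (4, "C", 15.5, "2024-05-02T12:30:00"),
])
cur.executemany("INSERT INTO predictions VALUES (?, ?, ?, ?)", [
    (1, "random_forest", 0.02, 0),
    (2, "random_forest", 0.91, 1),
    (3, "random_forest", 0.10, 0),
    (4, "random_forest", 0.04, 0),
])

# Daily report: number of transactions, number flagged, and total flagged amount.
report = cur.execute("""
SELECT substr(t.ts, 1, 10)                                      AS day,
       COUNT(*)                                                 AS n_txns,
       SUM(p.is_fraud)                                          AS n_flagged,
       SUM(CASE WHEN p.is_fraud = 1 THEN t.amount ELSE 0 END)   AS flagged_amount
FROM transactions t
JOIN predictions p ON p.txn_id = t.txn_id
GROUP BY day
ORDER BY day
""").fetchall()
print(report)
```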


DATA FLOW DIAGRAM

3.3 Data Preprocessing:

Data preprocessing involves cleaning, transforming, and organizing raw data to enhance its quality
and usability for analysis and modelling. It includes steps like handling missing values, outliers, and
formatting errors. Data preprocessing aims to ensure data integrity and improve the performance of
machine learning algorithms by preparing the data in a suitable format.

Preprocessing CSV Files:

For the credit card fraud detection system, raw transaction data in CSV files is loaded into a pandas
DataFrame. Unnecessary columns like timestamps are dropped, and missing values are handled
through imputation or removal. Data formatting ensures consistency, and the dataset is split into
training and testing sets. This preprocessing ensures that the data used for fraud detection is clean,
consistent, and suitable for analysis and modelling.
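The preprocessing steps above can be sketched with pandas and scikit-learn. The following choices are assumptions for illustration:

- The inline CSV and its columns (`timestamp`, `amount`, `merchant_risk`, `is_fraud`).
- Median imputation.
- The 75/25 stratified split.

A stratified split keeps the small fraud proportion the same in the training and testing sets.

```python
import io

import pandas as pd
from sklearn.model_selection import train_test_split

# Stand-in for a raw CSV export (hypothetical columns; empty cells are missing values).
RAW_CSV = """timestamp,amount,merchant_risk,is_fraud
2024-05-01 09:05:00,25.0,0.1,0
2024-05-01 09:40:00,,0.2,0
2024-05-01 10:15:00,310.5,0.3,0
2024-05-01 10:50:00,18.0,,0
2024-05-01 11:30:00,42.0,0.1,0
2024-05-01 12:10:00,990.0,0.9,1
2024-05-01 12:45:00,60.0,0.2,0
2024-05-01 13:20:00,75.0,0.1,0
2024-05-01 14:00:00,12.5,0.2,0
2024-05-01 14:35:00,1200.0,0.8,1
2024-05-01 15:10:00,33.0,0.1,0
2024-05-01 15:55:00,89.9,,0
2024-05-01 16:30:00,54.0,0.3,0
2024-05-01 17:15:00,860.0,0.7,1
2024-05-01 18:00:00,21.0,0.1,0
2024-05-01 18:45:00,47.5,0.2,0
2024-05-01 19:20:00,,0.1,0
2024-05-01 20:05:00,66.0,0.2,0
2024-05-01 23:50:00,1500.0,0.9,1
2024-05-02 00:30:00,29.0,0.1,0
"""

df = pd.read_csv(io.StringIO(RAW_CSV))

# 1. Drop columns not used as model inputs (here, the raw timestamp).
df = df.drop(columns=["timestamp"])

# 2. Remove rows without a label, then impute numeric gaps with the column median.
#    (A stricter setup computes the medians on the training split only.)
df = df.dropna(subset=["is_fraud"])
for col in ["amount", "merchant_risk"]:
    df[col] = df[col].fillna(df[col].median())

# 3. Enforce consistent column types.
df = df.astype({"amount": "float64", "merchant_risk": "float64", "is_fraud": "int64"})

# 4. Stratified train/test split so both sets keep the same fraud proportion.
X = df.drop(columns=["is_fraud"])
y = df["is_fraud"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)
```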

3.3.1 Model Implementation:

Model Implementation for Credit Card Fraud Detection

Model implementation in the credit card fraud detection system involves training and evaluating
classification models to accurately detect fraudulent transactions based on the pre-processed data.
Using Python's scikit-learn library, we implement and evaluate Logistic Regression and Random
Forest Classifier models.

Logistic Regression

Logistic Regression is a straightforward and interpretable model used for binary classification tasks.
It calculates the probability of a transaction being fraudulent based on the input features.

Random Forest Classifier

Random Forest is an ensemble learning method that constructs multiple decision trees during training
and outputs the class that is the mode of the classes of the individual trees. Because it combines many
trees, it is far less prone to overfitting than a single decision tree, and it handles large datasets efficiently.

Each classification model is trained on the training dataset and evaluated using performance metrics
such as Accuracy, Precision, Recall, F1-Score, and the Area Under the Receiver Operating
Characteristic Curve (ROC-AUC). This evaluation provides insights into the predictive capabilities of
each model and helps identify the best-performing algorithm for fraud detection. Visualizations, such
as confusion matrices and ROC curves, further aid in assessing model effectiveness by showing the
trade-offs between true positive and false positive rates.
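The evaluation described above maps onto functions in `sklearn.metrics`. To stay self-contained, the sketch takes true labels and predicted fraud probabilities as plain arrays; in practice these come from a fitted model's `predict_proba` on the test set. The example values are illustrative only, not results from this project, and the function name `evaluate` is hypothetical. Precision, recall and F1 are computed for the fraud class (label 1). ROC-AUC is computed from the probabilities rather than the thresholded labels.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)


def evaluate(y_true, y_score, threshold=0.5):
    """Compute the report's evaluation metrics from labels and fraud probabilities."""
    y_pred = (y_score >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "recall": recall_score(y_true, y_pred, zero_division=0),
        "f1": f1_score(y_true, y_pred, zero_division=0),
        "roc_auc": roc_auc_score(y_true, y_score),   # threshold-independent
        "confusion": {"tn": int(tn), "fp": int(fp), "fn": int(fn), "tp": int(tp)},
    }


# Illustrative values: 10 transactions, 2 of them fraudulent.
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
y_score = np.array([0.05, 0.10, 0.20, 0.15, 0.60, 0.30, 0.02, 0.08, 0.90, 0.40])
metrics = evaluate(y_true, y_score)
print(metrics)
```

For these values, accuracy is 0.8, while precision, recall and F1 are each 0.5 (one fraud caught, one missed, one false alarm), and ROC-AUC is 0.9375. Accuracy alone would hide the missed fraud, which is why the fraud-class metrics are reported alongside it.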

Additionally, feature importance analysis using bar plots helps understand the contribution of each
feature to the fraud detection model. This analysis can reveal which transaction attributes (such as
transaction amount, time, and frequency) are most indicative of fraudulent behaviour.

After evaluating the classification models and selecting the most suitable one based on performance
metrics and visualizations, the chosen model can be further optimized through hyperparameter tuning
and cross-validation techniques. This optimization process aims to fine-tune the model's parameters,
improving its predictive accuracy and generalization capabilities. Techniques such as grid search and
random search are used to find the optimal hyperparameters.

For Logistic Regression, hyperparameter tuning involves adjusting the regularization parameter (C) to
balance the trade-off between bias and variance. For Random Forest, tuning includes adjusting the
number of trees, maximum depth, and other tree-specific parameters to enhance model performance.
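Random search, the alternative to grid search mentioned above, can be sketched with scikit-learn's `RandomizedSearchCV`. The sketch tunes a Random Forest's number of trees, maximum depth and minimum leaf size by sampling 8 candidate settings instead of trying every combination. The search ranges, the F1 scoring choice and the synthetic data are illustrative assumptions. The Logistic Regression parameter C could be searched the same way, for example with a log-uniform distribution from `scipy.stats`.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold

# Synthetic stand-in for labelled transactions (about 10% fraud).
X, y = make_classification(n_samples=1000, n_features=8, weights=[0.9], random_state=1)

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=1),
    param_distributions={
        "n_estimators": randint(50, 201),     # number of trees: 50..200
        "max_depth": [4, 8, 12, None],        # None = grow trees until leaves are pure
        "min_samples_leaf": randint(1, 6),    # 1..5 samples per leaf
    },
    n_iter=8,                                 # number of sampled configurations
    scoring="f1",
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=1),
    random_state=1,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```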

Overall, the model implementation phase involves selecting, training, evaluating, and optimizing
classification models to accurately detect fraudulent transactions. This process ensures robust fraud
detection, providing significant value in protecting financial institutions and their customers from
fraudulent activities.

3.3.2. Regression Models for Credit Card Fraud Detection

In credit card fraud detection, the primary objective is to accurately identify fraudulent transactions
while minimizing false positives. To achieve this, various classification models can be applied, each with
its unique advantages and disadvantages. Here, we will discuss the application of Logistic Regression
and Random Forest Classifier in the context of credit card fraud detection.

Logistic Regression

- Explanation: Logistic Regression is a statistical model that predicts the probability of a binary
outcome (such as fraud or not fraud) based on one or more predictor variables. It uses the logistic
function to model the relationship between the dependent binary variable and one or more independent
variables.

- Advantages:

- Simplicity and Interpretability: Logistic Regression is easy to understand and implement. The
coefficients can be interpreted as the influence of each feature on the probability of fraud.

- Efficiency: Computationally efficient, making it suitable for large datasets.

- Probabilistic Output: Provides probabilities for class membership, which can be useful for ranking
transactions by their likelihood of being fraudulent.


- Disadvantages:

- Assumption of Linearity: Assumes a linear relationship between the independent variables and the
log odds of the dependent variable, which may not always be true.

- Sensitivity to Multicollinearity: Can be affected by high correlations among predictor variables.

- Limited Flexibility: Not as flexible as more complex models in capturing nonlinear relationships.
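The logistic function from the explanation above, and the linearity assumption listed among the disadvantages, can be written compactly. The notation is introduced here for illustration: feature vector x, weight vector w and intercept b.

```latex
P(y = 1 \mid \mathbf{x}) = \sigma(\mathbf{w}^\top \mathbf{x} + b)
                         = \frac{1}{1 + e^{-(\mathbf{w}^\top \mathbf{x} + b)}}
\qquad
\log \frac{P(y = 1 \mid \mathbf{x})}{1 - P(y = 1 \mid \mathbf{x})} = \mathbf{w}^\top \mathbf{x} + b
```

The second expression is the log-odds form. It is linear in x, which is exactly the linearity assumption noted above. As a worked example, if w^T x + b = 2 for a transaction, its estimated fraud probability is 1 / (1 + e^-2) ≈ 0.88.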

Random Forest Classifier

- Explanation: Random Forest is an ensemble learning method that builds multiple decision trees
during training and merges their results to get a more accurate and stable prediction. Each tree is built
from a random bootstrap sample of the training data, and the final prediction is based on the majority vote
from all the trees (scikit-learn's implementation averages the trees' predicted class probabilities instead).

- Advantages:

- High Accuracy: Often provides better accuracy than individual decision trees by reducing
overfitting.

- Robustness: Works well with large datasets and high-dimensional data.

- Feature Importance: Provides an estimate of the importance of each feature in making predictions,
which can be useful for understanding the model.

- Disadvantages:

- Complexity and Training Time: More complex and requires more computational resources and
time for training compared to individual decision trees.

- Hyperparameter Tuning: Requires careful tuning of hyperparameters such as the number of trees
and the depth of each tree.

- Less Interpretability: The ensemble of many trees makes it harder to interpret compared to a single
decision tree.


Example 1: Logistic Regression

1. Data Preparation:

- Split the data into training and testing sets to evaluate the model's performance.

- Standardize the features so that they are on a comparable scale, fitting the scaler on the training set
only to avoid leaking information from the test set.

2. Model Training:

- Fit a logistic regression model to the training data.

- Use techniques like cross-validation to fine-tune the model and prevent overfitting.

3. Evaluation:

- Evaluate the model using metrics like accuracy, precision, recall, and the Area Under the Receiver
Operating Characteristic Curve (AUC-ROC).

- Use the model's probabilistic output to rank transactions by their likelihood of being fraudulent.
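The three steps of Example 1 can be sketched with scikit-learn as follows. The data is split before scaling. The `StandardScaler` sits inside a pipeline, so during cross-validation it is fitted only on the training folds. The synthetic data, the 5-fold cross-validation and the ROC-AUC scoring are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# 1. Data preparation: synthetic stand-in data, split first, scaling inside the pipeline.
X, y = make_classification(n_samples=3000, n_features=10, weights=[0.97], random_state=7)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=7)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# 2. Model training: 5-fold (stratified) cross-validation on the training set, then a final fit.
cv_auc = cross_val_score(model, X_train, y_train, cv=5, scoring="roc_auc")
print("CV ROC-AUC: %.3f +/- %.3f" % (cv_auc.mean(), cv_auc.std()))
model.fit(X_train, y_train)

# 3. Evaluation on the held-out test set.
fraud_prob = model.predict_proba(X_test)[:, 1]
print(classification_report(y_test, model.predict(X_test), digits=3, zero_division=0))
print("Test ROC-AUC: %.3f" % roc_auc_score(y_test, fraud_prob))

# Rank test transactions from most to least likely to be fraudulent.
ranking = np.argsort(-fraud_prob)
print("Top 5 transactions to review:", ranking[:5])
```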

Example 2: Random Forest Classifier

1. Data Preparation:

- Handle missing values and encode categorical variables.

- Standardize or normalize the data if required.

2. Model Training:

- Train the Random Forest model on the training dataset, specifying the number of trees and other
hyperparameters.

- Use techniques like grid search or random search for hyperparameter optimization.

3. Evaluation:

- Assess the model's performance using metrics like accuracy, precision, recall, F1 score, and AUC-
ROC.

- Analyse feature importance scores to understand which features are most influential in predicting
fraud.


4. SOFTWARE AND HARDWARE REQUIREMENTS

Introduction
Credit card fraud detection is a critical task that leverages machine learning algorithms to identify and
prevent fraudulent activities. Among the most effective techniques are Logistic Regression and
Random Forest algorithms. This chapter delves into the detailed software and hardware requirements
essential for implementing these algorithms efficiently. The chapter aims to guide data scientists,
engineers, and IT administrators in setting up a robust environment for credit card fraud detection.

4.1. Software Requirements

Implementing machine learning algorithms for fraud detection involves various software tools and
libraries. Here, we categorize the software requirements into operating systems, programming
languages, libraries, development tools, and data management systems.

4.1.1 Operating Systems

The choice of operating system can influence the performance and compatibility of software tools.
Popular choices include:

- Windows 10/11: Widely used, offers comprehensive support for various development tools and
libraries.

- Linux Distributions (Ubuntu, CentOS, etc.): Preferred for high-performance computing due to
better resource management and security features.

- macOS: Ideal for development and experimentation, though it may have some limitations in
production environments.

4.1.2 Programming Languages

- Python: The most popular language for machine learning and data science due to its simplicity and
the vast ecosystem of libraries.


- R: An alternative for statistical computing and graphics, though less commonly used for large-scale
fraud detection systems.

4.1.3 Libraries and Frameworks

To implement and optimize machine learning algorithms, various libraries are essential:

- NumPy: Fundamental package for numerical computations in Python.

- Pandas: Data manipulation and analysis library, essential for handling large datasets.

- Scikit-Learn: Provides simple and efficient tools for data mining and machine learning, including
implementations of Logistic Regression and Random Forest.

- Matplotlib/Seaborn: For data visualization, crucial for exploratory data analysis and presenting
results.

- Imbalanced-learn: Library to handle imbalanced datasets, useful for fraud detection as it often
involves skewed class distributions.

- SciPy: Used for advanced mathematical, scientific, and engineering computations.

4.1.4 Integrated Development Environments (IDEs) and Notebooks

- Jupyter Notebook: Interactive environment ideal for developing and sharing documents that contain
live code, equations, visualizations, and narrative text.

- PyCharm: A powerful IDE for Python, offering advanced debugging, testing, and project
management features.

- VS Code: Lightweight and highly customizable editor, with robust support for Python and data
science extensions.

4.1.5 Data Management and Storage

Handling large datasets efficiently requires robust data management systems:

- SQL Databases (MySQL, PostgreSQL): For structured data storage, retrieval, and management.


- NoSQL Databases (MongoDB, Cassandra): Suitable for handling unstructured data and high
transaction volumes.

- Big Data Technologies (Hadoop, Spark): Necessary for processing and analysing massive datasets
that exceed the capacity of traditional database systems.

- Cloud Storage Solutions (AWS S3, Google Cloud Storage): For scalable and reliable data storage.

4.1.6 Version Control Systems

- Git: Essential for version control, allowing multiple collaborators to work on the project
simultaneously while keeping track of changes.

4.1.7 Virtual Environments and Containers

- Anaconda: A distribution of Python and R for scientific computing and data science, simplifying
package management and deployment.

- Docker: For containerization, ensuring that the application runs consistently across different
environments.

4.2. Hardware Requirements

The performance of machine learning models can be significantly influenced by the underlying
hardware. Here, we outline the hardware requirements categorized into processors, memory, storage,
and additional hardware considerations.

4.2.1 Processors (CPUs)

- Multi-core CPUs: For efficient parallel processing. A minimum of 4 cores is recommended, though
8 or more cores are ideal for faster computations.

- High Clock Speed: CPUs with higher clock speeds (3.0 GHz and above) can process instructions
more rapidly, beneficial for large-scale data processing and model training.


4.2.2 Graphics Processing Units (GPUs)

While CPUs are essential, GPUs can significantly accelerate the training of machine learning models,
especially for complex algorithms and large datasets.

- NVIDIA GPUs: Preferred for their compatibility with popular machine learning frameworks (such
as TensorFlow and PyTorch) and support for CUDA.

- Memory (VRAM): GPUs with at least 8 GB of VRAM are recommended, with higher capacities
providing better performance for larger models.

4.2.3 Memory (RAM)

- Minimum Requirement: 16 GB of RAM is the baseline for data processing and model training.

- Optimal Requirement: 32 GB or more, especially when dealing with large datasets and running
multiple processes concurrently.

4.2.4 Storage

- Solid State Drives (SSDs): Essential for fast data read/write operations, significantly reducing the
time required for loading datasets and saving models.

- Capacity: At least 512 GB of SSD storage, with 1 TB or more recommended for handling large
datasets and multiple projects.

4.2.5 Additional Hardware Considerations

- Network Infrastructure: High-speed internet connection for efficient data transfer, cloud access, and
collaboration.

- Backup Solutions: Reliable backup systems (e.g., external hard drives, cloud backup services) to
prevent data loss.
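The CPU and storage minimums above can be checked with a short script. This is a minimal sketch using only the Python standard library: it counts logical CPUs (which may include hyper-threads) and reads the total capacity of a drive, but it cannot tell whether that drive is an SSD. RAM and GPU checks are left out because the standard library has no portable call for them; a third-party package such as `psutil` (`psutil.virtual_memory().total`) could fill that gap.

```python
import os
import shutil

# Minimums taken from Sections 4.2.1 (CPU cores) and 4.2.4 (SSD capacity) above
MIN_CPU_CORES = 4
MIN_DISK_GB = 512


def check_hardware(path="."):
    """Compare this machine's logical CPU count and disk capacity with the recommended minimums."""
    cores = os.cpu_count() or 0                      # cpu_count() can return None
    disk_gb = shutil.disk_usage(path).total / 10**9  # total size of the drive holding `path`
    return {
        "cpu_cores": cores,
        "cpu_ok": cores >= MIN_CPU_CORES,
        "disk_gb": round(disk_gb, 1),
        "disk_ok": disk_gb >= MIN_DISK_GB,
    }


print(check_hardware())
```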


4.3. Setting Up the Environment

Setting up the environment involves configuring the software and hardware components to work
seamlessly together. Here are the steps to set up a robust environment for credit card fraud detection:

4.3.1 Software Installation and Configuration

1. Install the Operating System: Choose and install the preferred operating system on your machine.

2. Set Up Python and Anaconda: Install Python, preferably through Anaconda, which simplifies
package management.

3. Install Libraries and Frameworks: Use pip or conda to install the necessary libraries (e.g.,
scikit-learn, pandas, numpy).

4. Set Up IDEs: Install and configure your preferred IDE (e.g., Jupyter Notebook, PyCharm).

5. Configure Data Management Systems: Set up and configure databases (SQL/NoSQL) and any
required big data technologies.

6. Version Control Setup: Install Git and configure a repository for version control.

7. Virtual Environments: Create virtual environments using conda or virtualenv to manage
project-specific dependencies.
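After step 3, a quick way to confirm that the libraries are actually installed in the active environment is to query their installed versions. The sketch below uses `importlib.metadata` from the standard library (Python 3.8+); the `REQUIRED` list is an assumption and should be extended with the other libraries the project imports.

```python
from importlib import metadata

# PyPI distribution names of the libraries named in step 3 (extend as needed)
REQUIRED = ["scikit-learn", "pandas", "numpy"]


def missing_packages(names):
    """Return the packages from `names` that are not installed in the active environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)  # raises PackageNotFoundError if not installed
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


missing = missing_packages(REQUIRED)
print("All required packages found" if not missing else f"Missing: {', '.join(missing)}")
```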

4.3.2 Hardware Setup

1. Ensure Adequate Cooling: For high-performance CPUs and GPUs, ensure proper cooling solutions
to prevent overheating.

2. Upgrade RAM and Storage: Install sufficient RAM and SSD storage based on the project
requirements.

3. GPU Installation: If using GPUs, ensure they are properly installed and configured with the latest
drivers.

4. Network Configuration: Set up a stable and high-speed network connection for seamless data
transfer and collaboration.


4.3.3 Security and Maintenance

1. Regular Updates: Keep all software and hardware components updated to the latest versions to
ensure compatibility and security.

2. Data Security: Implement robust security measures to protect sensitive data, including encryption
and secure access controls.

3. Backup and Recovery: Regularly back up data and maintain a recovery plan to prevent data loss.
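A minimal sketch of the backup step, assuming the files to protect are the dataset and the pickled models produced in Section 5 (`creditcard.csv`, `logistic_regression_model`, `random_forest_model`). The `backup_files` helper is hypothetical. It only makes a local, timestamped copy, so it should be combined with the off-machine options listed in Section 4.2.5 (external drives or cloud backup).

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_files(paths, backup_root="backups"):
    """Copy each existing file in `paths` into a new timestamped folder under `backup_root`."""
    target = Path(backup_root) / datetime.now().strftime("%Y%m%d-%H%M%S")
    target.mkdir(parents=True, exist_ok=True)
    for path in map(Path, paths):
        if path.is_file():                          # files that do not exist are skipped
            shutil.copy2(path, target / path.name)  # copy2 also keeps modification times
    return target


# Example usage for this project (file names as saved in Section 5):
# backup_files(["creditcard.csv", "logistic_regression_model", "random_forest_model"])
```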

4.3.4 Training and Evaluation

Following data preprocessing and model implementation, the next stage is to train and evaluate the
models. The dataset can be divided into training and testing sets, often in a 70-30 or 80-20 ratio, with
the larger portion used for training.

Training and evaluating the classification models involves preparing the dataset, training each model
on the training data, and evaluating its performance on the testing data. Because fraud cases are rare,
metrics such as precision, recall, F1-score, and ROC-AUC are more informative than accuracy alone.
Cross-validation checks that the results are robust, and hyperparameters are tuned to optimize
performance. The best-performing model is chosen for deployment, where it predicts whether a new
transaction is fraudulent. Continuous monitoring and periodic retraining help maintain the model's
accuracy over time as fraud patterns change.
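In the code of Section 5 the split is done with scikit-learn's `train_test_split(X, y, stratify=y, test_size=0.2, random_state=42)`. Stratification matters here because fraud cases are so rare that a purely random split could leave very few of them in the test set. The following dependency-free sketch shows the idea behind a stratified split; it is an illustration, not scikit-learn's implementation, and the labels are made up.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_size=0.2, seed=42):
    """Split row indices into train/test so every class keeps (about) the same share in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)                      # random order within each class
        n_test = round(len(idxs) * test_size)  # same fraction taken from every class
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical labels: 1,000 transactions of which 10 are fraud (1)
labels = [1 if i % 100 == 0 else 0 for i in range(1000)]
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))                            # 800 200
print(sum(labels[i] for i in test_idx), "frauds in test set")   # 2 frauds in test set
```

With an 80-20 split, both parts keep the original 1% fraud share; a plain random split could easily put 0 or 5 frauds into the test set instead of 2.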

4.3.5 Dataset

The dataset for this study, sourced from Kaggle, contains credit card transactions made by European
cardholders over two days in 2013: 284,807 transactions, of which 492 are frauds (about 0.17%), so the
classes are highly imbalanced. To preserve privacy, features V1 to V28 are anonymized principal
components obtained with PCA; only `Time` (seconds elapsed since the first transaction) and `Amount`
(the transaction amount) are left untransformed and help in detecting patterns. Because of the
imbalance, resampling or class weighting and evaluation metrics such as precision, recall, and AUC-ROC
are essential for effective fraud detection.
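The degree of imbalance can be quantified directly from the counts above. The sketch below computes the fraud rate, the accuracy of a trivial model that labels every transaction as genuine, and the class weights given by scikit-learn's `class_weight='balanced'` rule, n_samples / (n_classes × samples in the class), which the models in Section 5 use. The figures are for the full dataset; the weights actually applied during training depend on the rows in the training split.

```python
# Counts for the full Kaggle dataset, as quoted in Section 4.3.5
n_total = 284_807
n_fraud = 492
n_genuine = n_total - n_fraud            # 284,315 genuine transactions

fraud_rate = n_fraud / n_total
baseline_accuracy = n_genuine / n_total  # a model that always answers "genuine"

# Weights of class_weight="balanced": n_samples / (n_classes * samples_in_class)
n_classes = 2
weights = {0: n_total / (n_classes * n_genuine),
           1: n_total / (n_classes * n_fraud)}

print(f"Fraud rate:        {fraud_rate:.4%}")         # 0.1727%
print(f"Baseline accuracy: {baseline_accuracy:.2%}")  # 99.83%
print({k: round(v, 3) for k, v in weights.items()})   # {0: 0.501, 1: 289.438}
```

In other words, each fraudulent transaction counts roughly 578 times as much as a genuine one during training, which compensates for frauds being only about 0.17% of the data.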


5. CODE

Import Libraries

# Optionally, import all main libraries automatically with pyforest

# !pip install pyforest

# !pip install pycaret[full]

# import pyforest

## main libraries

import numpy as np

import pandas as pd

import seaborn as sns

import matplotlib.pyplot as plt

import matplotlib.ticker as mticker

# !pip install squarify

import squarify as sq

import scipy.stats as stats

from scipy.cluster.hierarchy import linkage, dendrogram

import statsmodels.api as sm

import statsmodels.formula.api as smf


import datetime as dt

from datetime import datetime

# from pyclustertend import hopkins

## pre-processing

from sklearn.cluster import KMeans, AgglomerativeClustering

from sklearn.compose import make_column_transformer, ColumnTransformer

from sklearn.decomposition import PCA

from sklearn.dummy import DummyClassifier

from sklearn.impute import SimpleImputer, KNNImputer

## feature Selection

from sklearn.feature_selection import (SelectKBest, SelectPercentile, f_classif, f_regression,
                                       mutual_info_regression)

## scaling

from sklearn.preprocessing import scale

from sklearn.preprocessing import StandardScaler

from sklearn.preprocessing import PolynomialFeatures

from sklearn.preprocessing import OneHotEncoder

from sklearn.preprocessing import PowerTransformer

from sklearn.preprocessing import MinMaxScaler

from sklearn.preprocessing import LabelEncoder

from sklearn.preprocessing import RobustScaler


## regression/prediction

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

from sklearn.linear_model import LinearRegression, Lasso, Ridge, ElasticNet, LogisticRegression

from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor, ExtraTreesRegressor

from sklearn.neighbors import KNeighborsRegressor

from sklearn.svm import SVR

from xgboost import XGBRegressor

from sklearn.tree import DecisionTreeRegressor

## ann

from sklearn.neural_network import MLPRegressor

## classification

from sklearn.ensemble import (RandomForestClassifier, AdaBoostClassifier,
                              GradientBoostingClassifier, ExtraTreesClassifier)

from sklearn.neighbors import KNeighborsClassifier

from sklearn.svm import SVC

from sklearn.tree import DecisionTreeClassifier, plot_tree

from catboost import CatBoostClassifier

from xgboost import XGBClassifier, plot_importance

## metrics

from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error

from sklearn.metrics import confusion_matrix, accuracy_score, classification_report


from sklearn.metrics import make_scorer, precision_score, precision_recall_curve

from sklearn.metrics import roc_auc_score, roc_curve, f1_score, accuracy_score, recall_score

from sklearn.metrics import silhouette_samples,silhouette_score, average_precision_score

from sklearn.metrics.cluster import adjusted_rand_score

from sklearn.metrics import auc

## model selection

from sklearn import model_selection

from sklearn.model_selection import RandomizedSearchCV

from sklearn.model_selection import (RepeatedStratifiedKFold, KFold, cross_val_predict,
                                     train_test_split)

from sklearn.model_selection import StratifiedKFold, GridSearchCV, cross_val_score, cross_validate

## MLearning

from sklearn.pipeline import make_pipeline, Pipeline

import optuna

from sklearn.naive_bayes import GaussianNB

import colorama

from colorama import Fore, Style # makes strings colored

from termcolor import colored

from termcolor import cprint

## plotly and cufflinks


import plotly

import plotly.express as px

import cufflinks as cf

import plotly.graph_objs as go

import plotly.offline as py

from plotly.offline import iplot

from plotly.subplots import make_subplots

import plotly.figure_factory as ff

cf.go_offline()

cf.set_config_file(offline=False, world_readable=True)

## Figure&Display options

plt.rcParams["figure.figsize"] = (10,6)

pd.set_option('max_colwidth',200)

pd.set_option('display.max_rows', 1000)

pd.set_option('display.max_columns', 200)

pd.set_option('display.float_format', lambda x: '%.3f' % x)

df0=pd.read_csv('creditcard.csv')

df = df0.copy()

# print(df.head(3) )

Some Useful User-Defined Functions

def missing_values(df):
    """Return the number and fraction of missing values for every column that has any."""
    missing_number = df.isnull().sum().sort_values(ascending=False)
    missing_percent = (df.isnull().sum() / df.isnull().count()).sort_values(ascending=False)
    missing_values = pd.concat([missing_number, missing_percent], axis=1,
                               keys=['Missing_Number', 'Missing_Percent'])
    return missing_values[missing_values['Missing_Number'] > 0]

def first_looking(df):
    """Print an overview of df: shape, info, unique counts, missing values and column names.
    Note: the columns of df are also renamed in place (lower case, '&' and spaces removed)."""
    separator = colored('*' * 100, 'red', attrs=['bold'])
    print(colored("Shape:", 'yellow', attrs=['bold']), df.shape, '\n',
          separator,
          colored("\nInfo:\n", 'yellow', attrs=['bold']), sep='')
    df.info()  # info() prints its report itself and returns None
    print(separator)
    print(colored("Number of Uniques:\n", 'yellow', attrs=['bold']), df.nunique(), '\n',
          separator, sep='')
    print(colored("Missing Values:\n", 'yellow', attrs=['bold']), missing_values(df), '\n',
          separator, sep='')
    print(colored("All Columns:", 'yellow', attrs=['bold']), *list(df.columns), sep='\n- ')
    print(separator)
    df.columns = df.columns.str.lower().str.replace('&', '').str.replace(' ', '')
    print(colored("Columns after rename:", 'yellow', attrs=['bold']), *list(df.columns), sep='\n- ')
    print(separator)

## To view summary information about a column
def summary(column):
    """Print missing-value, unique-value and value-count information for df[column] (global df)."""
    separator = colored('*' * 100, 'red', attrs=['bold'])
    print(colored("Column: ", 'yellow', attrs=['bold']), column)
    print(separator)
    print(colored("Missing values: ", 'yellow', attrs=['bold']), df[column].isnull().sum())
    print(separator)
    print(colored("Missing values(%): ", 'yellow', attrs=['bold']),
          round(df[column].isnull().sum() / df.shape[0] * 100, 2))
    print(separator)
    print(colored("Unique values: ", 'yellow', attrs=['bold']), df[column].nunique())
    print(separator)
    print(colored("Value counts: \n", 'yellow', attrs=['bold']), df[column].value_counts(dropna=False),
          sep='')
    print(separator)

def multicolinearity_control(df):
    """Report every feature pair whose absolute Pearson correlation is above 0.8 (and below 1)."""
    df_temp = df.corr()
    feature = []
    collinear = []
    for col in df_temp.columns:
        for i in df_temp.index:
            # abs() must wrap the correlation value itself, not the boolean comparison
            if 0.8 < abs(df_temp[col][i]) < 1:
                feature.append(col)
                collinear.append(i)
                cprint(f"multicolinearity alert in between {col} - {i}", "red", attrs=["bold"])
    if not collinear:
        cprint("There is NO multicollinearity between the features.", "blue", attrs=["bold"])

def duplicate_values(df):
    """Drop exact duplicate rows in place (keeping the first occurrence) and report how many were removed."""
    separator = colored('*' * 100, 'red', attrs=['bold'])
    print(colored("Duplicate check...", 'yellow', attrs=['bold']), sep='')
    duplicate_values = df.duplicated(subset=None, keep='first').sum()
    if duplicate_values > 0:
        df.drop_duplicates(keep='first', inplace=True)
        print(duplicate_values, colored(" Duplicates were dropped!"), '\n', separator, sep='')
    else:
        print(colored("There are no duplicates"), '\n', separator, sep='')

def drop_columns(df, drop_columns):
    """Drop the listed columns in place; if the list is empty, announce the missing-value check instead."""
    if drop_columns != []:
        df.drop(drop_columns, axis=1, inplace=True)
        print(drop_columns, 'were dropped')
    else:
        print(colored('Missing value control...', 'yellow', attrs=['bold']), '\n',
              colored('If a column has missing values above the given limit, it is dropped '
                      'and a message is printed.'), sep='')

def drop_null(df, limit):
    """Drop, in place, every column whose share of missing values exceeds `limit` percent."""
    for i in df.isnull().sum().index:
        missing_pct = df[i].isnull().sum() / df.shape[0] * 100
        if missing_pct > limit:
            print(round(missing_pct, 2), 'percent of', i, 'were null and dropped')
            df.drop(i, axis=1, inplace=True)
    print(colored('Last shape after missing value control:', 'yellow', attrs=['bold']), df.shape, '\n',
          colored('*' * 100, 'red', attrs=['bold']), sep='')

def shape_control():
    """Print the shapes of the global df, X, y and the train/test splits."""
    print('df.shape:', df.shape)
    print('X.shape:', X.shape)
    print('y.shape:', y.shape)
    print('X_train.shape:', X_train.shape)
    print('y_train.shape:', y_train.shape)
    print('X_test.shape:', X_test.shape)
    print('y_test.shape:', y_test.shape)

## show values in bar graphic
def show_values_on_bars(axs):
    """Write the height of each bar above it, for a single Axes or an array of Axes."""
    def _show_on_single_plot(ax):
        for p in ax.patches:
            _x = p.get_x() + p.get_width() / 2
            _y = p.get_y() + p.get_height()
            value = '{:.2f}'.format(p.get_height())
            ax.text(_x, _y, value, ha="center")

    if isinstance(axs, np.ndarray):
        for idx, ax in np.ndenumerate(axs):
            _show_on_single_plot(ax)
    else:
        _show_on_single_plot(axs)

def outlier_zscore(df, col, min_z=1, max_z=5, step=0.05, print_list=False):
    '''Detect the best z-score threshold for outlier detection in the specified column.'''
    z_scores = stats.zscore(df[col].dropna())
    threshold_list = []
    for threshold in np.arange(min_z, max_z, step):
        threshold_list.append((threshold, len(np.where(z_scores > threshold)[0])))
    df_outlier = pd.DataFrame(threshold_list, columns=['threshold', 'outlier_count'])
    # Percentage drop in the outlier count from one threshold to the next
    df_outlier['pct'] = ((df_outlier.outlier_count - df_outlier.outlier_count.shift(-1))
                         / df_outlier.outlier_count * 100)
    df_outlier['pct'] = df_outlier['pct'].apply(lambda x: x - 100 if x == 100 else x)
    best_treshold = round(df_outlier.iloc[df_outlier.pct.argmax(), 0], 2)
    # IQR multiplier that corresponds to this z-score under a normal distribution
    IQR_coef = round((best_treshold - 0.675) / 1.35, 2)
    outlier_limit = int(df[col].dropna().mean() + (df[col].dropna().std()) *
                        df_outlier.iloc[df_outlier.pct.argmax(), 0])
    num_outlier = df_outlier.iloc[df_outlier.pct.argmax(), 1]
    percentile_threshold = stats.percentileofscore(df[col].dropna(), outlier_limit)
    plt.plot(df_outlier.threshold, df_outlier.outlier_count)
    plt.vlines(best_treshold, 0, df_outlier.outlier_count.max(), colors="r", ls=":")
    plt.annotate("Zscore : {}\nIQR_coef : {}\nValue : {}\nNum_outlier : {}\nPercentile : {}".format(
                     best_treshold, IQR_coef, outlier_limit, num_outlier,
                     (np.round(percentile_threshold, 3), np.round(100 - percentile_threshold, 3))),
                 (best_treshold, df_outlier.outlier_count.max() / 2))
    plt.show()
    if print_list:
        print(df_outlier)
    return (plt, df_outlier, best_treshold, IQR_coef, outlier_limit, num_outlier, percentile_threshold)
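The constants in `IQR_coef = (best_treshold - 0.675) / 1.35` come from the normal distribution. Assuming the column is approximately normal with mean μ and standard deviation σ, the quartiles lie 0.6745σ on either side of the mean, so an IQR-based upper fence Q3 + k·IQR can be matched to a z-score threshold z as follows:

```latex
\begin{aligned}
Q_1 &= \mu - 0.6745\,\sigma, \qquad Q_3 = \mu + 0.6745\,\sigma, \qquad
\mathrm{IQR} = Q_3 - Q_1 = 1.349\,\sigma \\
Q_3 + k\,\mathrm{IQR} = \mu + z\,\sigma
&\;\Longrightarrow\; 0.6745\,\sigma + 1.349\,k\,\sigma = z\,\sigma
\;\Longrightarrow\; k = \frac{z - 0.6745}{1.349} \approx \frac{z - 0.675}{1.35} \\
z = 3 &\;\Longrightarrow\; k \approx \frac{3 - 0.675}{1.35} \approx 1.72,
\qquad k = 1.5 \;\Longrightarrow\; z \approx 0.675 + 1.5 \times 1.35 = 2.70
\end{aligned}
```

For strongly skewed columns the normality assumption does not hold, so the resulting coefficient should be treated as a rough guide only.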

def outlier_inspect(df, col, min_z=1, max_z=5, step=0.05, max_hist=None, bins=50):
    '''Plot the histogram, boxplot and z-score/outlier graph for the specified column.'''
    fig = plt.figure(figsize=(20, 6))
    fig.suptitle(col, fontsize=16)
    plt.subplot(1, 3, 1)
    # sns.distplot is deprecated in recent seaborn versions; histplot draws the same histogram
    if max_hist is None:
        sns.histplot(df[col], bins=bins)
    else:
        sns.histplot(df[df[col] <= max_hist][col], bins=bins)
    plt.subplot(1, 3, 2)
    sns.boxplot(x=df[col])
    plt.subplot(1, 3, 3)
    z_score_inspect = outlier_zscore(df, col, min_z=min_z, max_z=max_z, step=step)
    plt.show()

def num_outliers(df, col, whis=1.5):
    """For each class, print the min/max thresholds (Q1 - whis*IQR, Q3 + whis*IQR), the number
    of values and the number of outliers in `col`, then return a boxplot of `col` per class."""
    q1 = df.groupby("class")[col].quantile(0.25)
    q3 = df.groupby("class")[col].quantile(0.75)
    iqr = q3 - q1
    print("Column_name :", col)
    print("whis :", whis)
    print("-------------------------------------------")
    for i in np.sort(df['class'].unique()):
        min_threshold = q1.loc[i] - whis * iqr.loc[i]
        max_threshold = q3.loc[i] + whis * iqr.loc[i]
        print("min_threshold:", min_threshold, "\nmax_threshold:", max_threshold)
        values = df.loc[df["class"] == i, col]
        num_outliers = ((values < min_threshold) | (values > max_threshold)).sum()
        print(f"Num_of_values for {i} :", len(values))
        print(f"Num_of_outliers for {i} :", num_outliers)
        print("-------------------------------------------")
    return sns.boxplot(y=df[col], x=df["class"], whis=whis)

def remove_outliers(df, col, whis=1.5):
    """Within each class, set values of `col` outside [Q1 - whis*IQR, Q3 + whis*IQR] to NaN
    (the affected rows can then be dropped), and return a boxplot of `col` per class."""
    q1 = df.groupby("class")[col].quantile(0.25)
    q3 = df.groupby("class")[col].quantile(0.75)
    iqr = q3 - q1
    for i in np.sort(df['class'].unique()):
        min_threshold = q1.loc[i] - whis * iqr.loc[i]
        max_threshold = q3.loc[i] + whis * iqr.loc[i]
        df.loc[((df["class"] == i) & ((df[col] < min_threshold) | (df[col] > max_threshold))), col] = np.nan
    return sns.boxplot(y=df[col], x=df["class"], whis=whis)

first_looking(df)

duplicate_values(df)

drop_columns(df, [])

drop_null(df, 90)

# df.describe().T

cprint('Have a first look at the "class" column', "green", "on_red", attrs=["bold"])
print("-----" * 10)
summary('class')

cprint('Mean values according to the "class" column', 'green', 'on_red')
df.groupby('class').mean()

y = df['class']
print(f'Percentage of class-1: {round(y.value_counts(normalize=True)[1] * 100, 2)}% '
      f'--> ({y.value_counts()[1]} observations for class-1)')
print(f'Percentage of class-0: {round(y.value_counts(normalize=True)[0] * 100, 2)}% '
      f'--> ({y.value_counts()[0]} observations for class-0)')

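def eval(model, X_train, X_test):
    """Print the test confusion matrix and the classification reports for the test and train sets.
    Uses the global y_train / y_test; note that the name shadows Python's built-in eval()."""
    y_pred = model.predict(X_test)
    y_pred_train = model.predict(X_train)
    print(confusion_matrix(y_test, y_pred))
    print("Test_Set")
    print(classification_report(y_test, y_pred))
    print("Train_Set")
    print(classification_report(y_train, y_pred_train))
    print("---" * 20)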

def train_val(y_train, y_train_pred, y_test, y_pred):
    """Return a table of accuracy, precision, recall, F1, ROC-AUC and precision-recall AUC
    for the train and test predictions."""
    def pr_auc(y_true, y_hat):
        # Area under the precision-recall curve for the given labels and predictions
        precision, recall, _ = precision_recall_curve(y_true, y_hat)
        return auc(recall, precision)

    scores = {"train_set": {"Accuracy": accuracy_score(y_train, y_train_pred),
                            "Precision": precision_score(y_train, y_train_pred),
                            "Recall": recall_score(y_train, y_train_pred),
                            "f1": f1_score(y_train, y_train_pred),
                            "roc_auc": roc_auc_score(y_train, y_train_pred),
                            "recall_auc": pr_auc(y_train, y_train_pred)},
              "test_set": {"Accuracy": accuracy_score(y_test, y_pred),
                           "Precision": precision_score(y_test, y_pred),
                           "Recall": recall_score(y_test, y_pred),
                           "f1": f1_score(y_test, y_pred),
                           "roc_auc": roc_auc_score(y_test, y_pred),
                           "recall_auc": pr_auc(y_test, y_pred)}}
    return pd.DataFrame(scores)

df_out = df.copy()

df_ml = df_out.copy()

scaler = StandardScaler()

df_ml["amount"] = scaler.fit_transform(df_ml["amount"].values.reshape(-1,1))


df_ml["time"] = scaler.fit_transform(df_ml["time"].values.reshape(-1,1))

# X = df_ml.drop(['class'], axis = 1)

# y = df_ml['class']

# X_train, X_test, y_train, y_test = train_test_split(X, y, stratify = y, test_size = 0.2, random_state = 42)

df_deploy = df_out[['v2', 'v3', 'v4', 'v7', 'v10', 'v11', 'v12', 'v14', 'v16', 'v17', 'class']].copy()

df_deploy.head(1)

X = df_deploy.drop(['class'], axis = 1)

y = df_deploy['class']

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify = y, test_size = 0.2, random_state = 42)

# print('X_train.shape : ', X_train.shape)

# print('X_test.shape : ', X_test.shape)

# cprint('y_train.value_counts','green', 'on_red')

# y_train.value_counts()

# cprint('y_test.value_counts','green', 'on_red')

# y_test.value_counts()


# print(X_train, X_test, y_train, y_test)

LogReg_Deploy = LogisticRegression(class_weight = 'balanced',

penalty = 'l2',

solver = 'lbfgs',

random_state = 42).fit(X_train, y_train)

y_pred = LogReg_Deploy.predict(X_test)

y_train_pred = LogReg_Deploy.predict(X_train)

LogReg_Deploy_f1 = f1_score(y_test, y_pred)

LogReg_Deploy_acc = accuracy_score(y_test, y_pred)

LogReg_Deploy_recall = recall_score(y_test, y_pred)

LogReg_Deploy_auc = roc_auc_score(y_test, y_pred)

LogReg_Deploy_pre = precision_score(y_test, y_pred)

precision, recall, _ = precision_recall_curve(y_test, y_pred)

LogReg_Deploy_recall_auc = auc(recall, precision)

print("LogReg_Deploy")

print ("------------------")

eval(LogReg_Deploy, X_train, X_test)


RandomForest_Deploy = RandomForestClassifier(class_weight = 'balanced',

max_depth = 7,

max_features = 4,

min_samples_split = 2,

n_estimators = 50,

random_state = 42).fit(X_train, y_train)

y_pred = RandomForest_Deploy.predict(X_test)

y_train_pred = RandomForest_Deploy.predict(X_train)

RandomForest_Deploy_f1 = f1_score(y_test, y_pred)

RandomForest_Deploy_acc = accuracy_score(y_test, y_pred)

RandomForest_Deploy_recall = recall_score(y_test, y_pred)

RandomForest_Deploy_auc = roc_auc_score(y_test, y_pred)

RandomForest_Deploy_pre = precision_score(y_test, y_pred)

precision, recall, _ = precision_recall_curve(y_test, y_pred)

RandomForest_Deploy_recall_auc = auc(recall, precision)

print("RandomForest_Deploy")

print ("------------------")

eval(RandomForest_Deploy, X_train, X_test)

import pickle

# Save the trained models to disk (pickle.dump returns None, so nothing is assigned)
with open('logistic_regression_model', 'wb') as f:
    pickle.dump(LogReg_Deploy, f)
with open('random_forest_model', 'wb') as f:
    pickle.dump(RandomForest_Deploy, f)

import streamlit as st

import pickle

import pandas as pd

from sklearn.preprocessing import MinMaxScaler

from PIL import Image

import base64

st.sidebar.title("Transaction INFO")

html_temp="""

<div style="background-color: Blue;padding:10px">

<h2 style="color:white;text-align:center;">Fraud Detection</h2>

</div> <br>

"""

st.markdown(html_temp , unsafe_allow_html=True)

st.markdown("<h2 style='color:white;text-align:center;'>Fraud Detection</h2>",

unsafe_allow_html=True)

selection = st.selectbox("Select a model", ["Logistic Regression", "Random Forest"])

st.write("You selected", selection, "model")
# Load the pickled model that matches the selection
model_file = 'logistic_regression_model' if selection == "Logistic Regression" else 'random_forest_model'
with open(model_file, 'rb') as f:
    model = pickle.load(f)

v2 = st.sidebar.slider(label="V2-PCA", min_value=-10.00, max_value=15.00, step=0.01)

v3 = st.sidebar.slider(label="V3-PCA", min_value=-25.00, max_value=5.00, step=0.01)

v4 = st.sidebar.slider(label="V4-PCA", min_value=-5.00, max_value=15.00, step=0.01)

v7 = st.sidebar.slider(label="V7-PCA", min_value=-45.00, max_value=130.00, step=0.01)

v10 = st.sidebar.slider(label="V10-PCA", min_value=-20.00, max_value=5.00, step=0.01)

v11 = st.sidebar.slider(label="V11-PCA", min_value=-5.00, max_value=15.00, step=0.01)

v12 = st.sidebar.slider(label="V12-PCA", min_value=-20.00, max_value=5.00, step=0.01)

v14 = st.sidebar.slider(label="V14-PCA", min_value=-20.00, max_value=5.00, step=0.01)

v16 = st.sidebar.slider(label="V16-PCA", min_value=-15.00, max_value=20.00, step=0.01)

v17 = st.sidebar.slider(label="V17-PCA", min_value=-30.00, max_value=10.00, step=0.01)

coll_dict = {
    'v2': v2,
    'v3': v3,
    'v4': v4,
    'v7': v7,
    'v10': v10,
    'v11': v11,
    'v12': v12,
    'v14': v14,
    'v16': v16,
    'v17': v17,
}

# Column order must match the features the models were trained on
columns = ['v2', 'v3', 'v4', 'v7', 'v10', 'v11', 'v12', 'v14', 'v16', 'v17']
df_coll = pd.DataFrame([coll_dict], columns=columns)
user_inputs = df_coll
prediction = model.predict(user_inputs)

html_temp = """
<div style="background-color: Black;padding:10px">
<h2 style="color:white;text-align:center;">Fraud Detection</h2>
</div> <br>
"""


st.markdown("<h2 style='color:black;text-align:center;'>Transaction INFO</h2>",

unsafe_allow_html=True)

st.table(df_coll)

st.subheader("Click PREDICT if the configuration is okay")

if st.button("PREDICT"):
    if prediction[0] == 0:
        st.success(prediction[0])
        st.success("Transaction is safe")
    elif prediction[0] == 1:
        st.warning(prediction[0])
        st.warning("Transaction is not safe")


6. TESTING
The purpose of testing is to discover errors. Testing is the process of trying to discover every
conceivable fault or weakness in a work product. It provides a way to check the functionality of
components, subassemblies, assemblies and/or a finished product. It is the process of exercising
software with the intent of ensuring that the software system meets its requirements and user
expectations and does not fail in an unacceptable manner. There are various types of tests, and each
type addresses a specific testing requirement.

TYPES OF TESTS

Unit testing

Unit testing involves the design of test cases that validate that the internal program logic is functioning
properly and that program inputs produce valid outputs. All decision branches and internal code flow
should be validated. It is the testing of individual software units of the application, and it is done after
the completion of an individual unit, before integration. This is structural testing that relies on
knowledge of the unit's construction and is invasive. Unit tests perform basic tests at the component
level and test a specific business process, application, and/or system configuration. Unit tests ensure
that each unique path of a business process performs accurately to the documented specifications and
contains clearly defined inputs and expected results.
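As an example of a unit test in this project's language, the sketch below tests a small, hypothetical helper `iqr_fences` that computes the same Q1 - whis·IQR and Q3 + whis·IQR limits used by `num_outliers` and `remove_outliers` in Section 5, but on a plain list instead of a pandas column grouped by class. It uses Python's standard `unittest` module, and the expected values in each test case are worked out by hand.

```python
import statistics
import unittest


def iqr_fences(values, whis=1.5):
    """Return the (lower, upper) outlier limits Q1 - whis*IQR and Q3 + whis*IQR."""
    # "inclusive" interpolates linearly between data points, like pandas' default quantile
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - whis * iqr, q3 + whis * iqr


class TestIqrFences(unittest.TestCase):
    def test_known_values(self):
        # 1..9: Q1 = 3, Q3 = 7, IQR = 4, so the fences are 3 - 6 and 7 + 6
        self.assertEqual(iqr_fences(list(range(1, 10))), (-3.0, 13.0))

    def test_extreme_value_is_flagged(self):
        _, upper = iqr_fences([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
        self.assertGreater(100, upper)

    def test_larger_whis_widens_fences(self):
        low1, high1 = iqr_fences(list(range(1, 10)), whis=1.5)
        low3, high3 = iqr_fences(list(range(1, 10)), whis=3.0)
        self.assertLess(low3, low1)
        self.assertGreater(high3, high1)


# Run the tests without calling sys.exit(), so this also works inside a notebook
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestIqrFences)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```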

Integration testing

Integration tests are designed to test integrated software components to determine if they actually run
as one program. Testing is event driven and is more concerned with the basic outcome of screens or
fields. Integration tests demonstrate that although the components were individually satisfactory, as
shown by successful unit testing, the combination of components is correct and consistent.
Integration testing is specifically aimed at exposing the problems that arise from the combination of
components.


Functional test

Functional tests provide systematic demonstrations that functions tested are available as specified by
the business and technical requirements, system documentation, and user manuals.

Functional testing is centered on the following items:

Valid Input: identified classes of valid input must be accepted.

Invalid Input: identified classes of invalid input must be rejected.

Functions: identified functions must be exercised.

Output: identified classes of application outputs must be exercised.

Systems/Procedures: interfacing systems or procedures must be invoked.

Organization and preparation of functional tests is focused on requirements, key functions, or special
test cases. In addition, systematic coverage of identified business process flows, data fields,
predefined processes, and successive processes must be considered for testing. Before functional
testing is complete, additional tests are identified and the effective value of current tests is determined.
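The valid-input and invalid-input items above can be expressed as executable checks. In the Streamlit app of Section 5 the sliders already constrain each value, so the validator below is a hypothetical helper, for example for a future interface that accepts transactions directly; only the feature names and slider ranges are taken from the app.

```python
# Slider limits copied from the Streamlit app in Section 5
FEATURE_RANGES = {
    "v2": (-10.0, 15.0), "v3": (-25.0, 5.0), "v4": (-5.0, 15.0), "v7": (-45.0, 130.0),
    "v10": (-20.0, 5.0), "v11": (-5.0, 15.0), "v12": (-20.0, 5.0), "v14": (-20.0, 5.0),
    "v16": (-15.0, 20.0), "v17": (-30.0, 10.0),
}


def is_valid_transaction(features):
    """Hypothetical input check: exactly the ten model features, each a number inside its slider range."""
    if set(features) != set(FEATURE_RANGES):
        return False
    for name, value in features.items():
        # bool is a subclass of int, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return False
        low, high = FEATURE_RANGES[name]
        if not low <= value <= high:  # also rejects NaN, whose comparisons are always False
            return False
    return True


valid = {name: 0.0 for name in FEATURE_RANGES}

# Valid input: identified classes of valid input must be accepted
assert is_valid_transaction(valid)
# Invalid input: identified classes of invalid input must be rejected
assert not is_valid_transaction({**valid, "v7": 500.0})  # out of range
assert not is_valid_transaction({**valid, "v2": "abc"})  # not a number
assert not is_valid_transaction({k: v for k, v in valid.items() if k != "v17"})  # missing feature
print("functional input checks passed")
```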

System Test

System testing ensures that the entire integrated software system meets requirements. It tests a
configuration to ensure known and predictable results. An example of system testing is the
configuration-oriented system integration test. System testing is based on process descriptions and
flows, emphasizing pre-driven process links and integration points.

White Box Testing

White Box Testing is testing in which the software tester has knowledge of the inner workings,
structure and language of the software, or at least its purpose. It is used to test areas that cannot be
reached from a black-box level.


Black Box Testing

Black Box Testing is testing the software without any knowledge of the inner workings, structure or
language of the module being tested. Black box tests, like most other kinds of tests, must be written
from a definitive source document, such as a specification or requirements document. The software
under test is treated as a black box that you cannot “see” into: the test provides inputs and checks the
outputs without considering how the software works.

Unit Testing:

Unit testing is usually conducted as part of a combined code and unit test phase of the software
lifecycle, although it is not uncommon for coding and unit testing to be conducted as two distinct
phases.

Test strategy and approach

Field testing will be performed manually and functional tests will be written in detail.

Test objectives

• All field entries must work properly.

• Pages must be activated from the identified link.

• The entry screen, messages and responses must not be delayed.

Features to be tested

• Verify that the entries are of the correct format

• No duplicate entries should be allowed

• All links should take the user to the correct page.

Integration Testing

Software integration testing is the incremental integration testing of two or more integrated software
components on a single platform to produce failures caused by interface defects. The task of the
integration test is to check that components or software applications, e.g. components in a software
system or – one step up – software applications at the company level – interact without error.


Acceptance Testing

User Acceptance Testing is a critical phase of any project and requires significant participation by the
end user. It also ensures that the system meets the functional requirements.

QUALITY METRICS:

Training and Validation Accuracy:

Definition:

Accuracy is one of the most straightforward metrics for evaluating classification models. It measures
the ratio of correctly predicted instances to the total number of instances. Specifically, training
accuracy represents the accuracy of the model on the training dataset, while validation accuracy
represents the accuracy on a separate validation dataset.

Purpose:

Training accuracy indicates how well the model is learning from the training data. It reflects the ability
of the model to fit the training data. Validation accuracy, on the other hand, provides insight into the
model's ability to generalize to new, unseen data. It helps detect overfitting, where the model performs
well on the training data but poorly on new data.

Interpretation:

A high training accuracy suggests that the model is effectively learning from the training data.
However, if the validation accuracy is significantly lower than the training accuracy, it may indicate
overfitting. Conversely, if both training and validation accuracies are low, it might suggest
underfitting, indicating that the model is too simple to capture the underlying patterns in the data.


Considerations:

Accuracy alone may not be sufficient for evaluating imbalanced datasets, where one class dominates
the other(s). In such cases, other metrics like precision, recall, and F1-score provide a more
comprehensive assessment.
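To make this concrete, the short sketch below uses the class counts from the Random Forest test set in Section 7.2 (46,128 legitimate and 71 fraudulent transactions) and a hypothetical baseline that labels every transaction as legitimate. The helper functions are illustrative and are not part of the project code. The baseline reaches an accuracy of about 0.998 while catching none of the frauds.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that the model labels as positive."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in preds_for_positives) / len(preds_for_positives)


# Class counts from the Random Forest test set in Section 7.2:
# 46,128 legitimate (0) and 71 fraudulent (1) transactions.
y_true = [0] * 46128 + [1] * 71

# Hypothetical baseline that labels every transaction as legitimate.
y_pred = [0] * len(y_true)

print(f"accuracy = {accuracy(y_true, y_pred):.4f}")  # accuracy = 0.9985
print(f"recall   = {recall(y_true, y_pred):.4f}")    # recall   = 0.0000
```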

It's essential to monitor both training and validation accuracies throughout the training process to detect
overfitting early and adjust the model accordingly.

Training and Validation Loss:

In machine learning, evaluating the performance of a model during training involves monitoring
specific metrics to ensure the model is learning effectively and generalizing well to unseen data. Two
fundamental metrics in this context are the training loss and validation loss. These metrics provide
insights into how well the model is fitting the training data and how well it is likely to perform on new,
unseen data.

Training Loss

Training loss is a measure of how well the model is fitting the training data. During the training process,
the model makes predictions on the training dataset, and the training loss quantifies the difference
between these predictions and the actual target values. This difference is computed using a loss
function, which varies depending on the type of problem being solved. For instance, Mean Squared
Error (MSE) is commonly used for regression tasks, while Cross-Entropy Loss is typical for
classification tasks.

The process to compute the training loss generally involves:

● Forward Pass: The model processes the input data through its layers to produce predictions.

● Loss Calculation: The predictions are compared to the actual values using the loss function.
The loss function outputs a numerical value representing the error.

● Backward Pass: Gradients are computed with respect to the loss, and the model's weights are
updated using these gradients to minimize the loss.
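As an illustration of these three steps, the sketch below trains a logistic-regression classifier in plain NumPy with full-batch gradient descent on the binary cross-entropy loss, recording the training and validation loss at every epoch. It is a minimal sketch on synthetic data; the function names, learning rate, and number of epochs are assumptions and are not taken from the project.

```python
import numpy as np


def sigmoid(z):
    """Logistic function mapping scores to probabilities."""
    return 1.0 / (1.0 + np.exp(-z))


def bce_loss(y, p, eps=1e-12):
    """Mean binary cross-entropy between labels y and probabilities p."""
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))


def train(X_train, y_train, X_val, y_val, lr=0.1, epochs=200):
    """Full-batch gradient descent for logistic regression.

    Returns the weights, the bias, and the per-epoch training/validation losses.
    """
    w = np.zeros(X_train.shape[1])
    b = 0.0
    history = {"train_loss": [], "val_loss": []}
    for _ in range(epochs):
        # Forward pass: predicted probabilities for the training data.
        p = sigmoid(X_train @ w + b)
        # Loss calculation on the training data, and on the validation data
        # with the same weights (validation is a forward pass only).
        history["train_loss"].append(bce_loss(y_train, p))
        history["val_loss"].append(bce_loss(y_val, sigmoid(X_val @ w + b)))
        # Backward pass: gradients of the mean cross-entropy w.r.t. w and b.
        error = p - y_train
        grad_w = X_train.T @ error / len(y_train)
        grad_b = float(np.mean(error))
        # Gradient-descent update that lowers the training loss.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, history


# Synthetic data standing in for transaction features and labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
y = (X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.5, size=400) > 0).astype(float)

w, b, hist = train(X[:300], y[:300], X[300:], y[300:])
print(round(hist["train_loss"][0], 3), round(hist["train_loss"][-1], 3))
print(round(hist["val_loss"][0], 3), round(hist["val_loss"][-1], 3))
```

With zero initial weights every predicted probability is 0.5, so both losses start at ln 2 ≈ 0.693; plotting `hist["train_loss"]` and `hist["val_loss"]` against the epoch number gives the kind of loss curves discussed in this section.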

The primary goal during training is to minimize the training loss. As training progresses over multiple
iterations or epochs, the training loss should ideally decrease, indicating that the model is learning the
patterns in the training data.

Validation Loss

Validation loss is a measure of how well the model is performing on a separate validation dataset,
which the model does not see during training. This dataset is used to evaluate the model’s ability to
generalize to new, unseen data. Like the training loss, the validation loss is computed using the same
loss function, but it provides an estimate of the model's performance on data that it hasn't been trained
on.

The process to compute the validation loss involves:

● Forward Pass: The model makes predictions on the validation data.

● Loss Calculation: The loss function is used to compare these predictions to the actual target
values in the validation dataset, yielding the validation loss.

Comparing Training and Validation Loss

Monitoring both training and validation loss is crucial for diagnosing how well the model is learning
and whether it is overfitting or underfitting:

● Overfitting: Overfitting occurs when the model performs well on the training data but poorly
on the validation data. This is evident if the training loss is low while the validation loss is high.
Overfitting means the model has learned the noise and details in the training data to the extent
that it negatively impacts the performance on new data. Overfitting can be mitigated using
techniques such as regularization, dropout, and early stopping.

● Underfitting: Underfitting occurs when the model is too simple to capture the underlying
patterns in the data, resulting in high training and validation loss. This indicates that the model
is not learning effectively from the data. Solutions to underfitting include increasing the
complexity of the model, adding more features, or training for more epochs.

● Good Fit: A model that fits well will have both training and validation losses decreasing and
staying low. Ideally, the gap between the training and validation loss should be small,
indicating that the model is generalizing well to new data.

Visualizing Losses

To understand the training dynamics, losses are often plotted against the number of epochs. This
visualization helps in diagnosing issues like overfitting and underfitting. In a typical plot:

● The x-axis represents the number of epochs.

● The y-axis represents the loss values.

A well-trained model shows both training and validation loss decreasing and converging. If the
validation loss starts increasing while the training loss continues to decrease, this indicates overfitting.

Several strategies can be employed to manage training and validation loss:

● Cross-Validation: This involves partitioning the dataset into multiple folds and training the
model multiple times, each time with a different fold as the validation set. This provides a more
robust estimate of the model’s performance.

● Regularization: Techniques such as L1 and L2 regularization add a penalty to the loss function
based on the magnitude of the model's weights, discouraging overly complex models and
helping to prevent overfitting.

● Dropout: This technique randomly drops a fraction of the neurons during training, which helps
in making the model more robust and reduces overfitting.

● Early Stopping: Training is stopped when the validation loss stops decreasing for a specified
number of epochs, preventing the model from overfitting by training too long.
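The early-stopping rule can be written as a small function. This is a sketch of the common patience-based variant, assuming the validation loss has already been recorded once per epoch; the function name, the `patience` and `min_delta` parameters, and the example loss values are hypothetical.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (best_epoch, stop_epoch) for a patience-based early-stopping rule.

    Training stops once the validation loss has failed to improve on the best
    value seen so far by more than `min_delta` for `patience` consecutive
    epochs. Epochs are 0-indexed; if the rule never triggers, stop_epoch is
    the last epoch.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses) - 1


# Hypothetical validation losses: falling at first, then rising (overfitting).
val_losses = [0.60, 0.45, 0.38, 0.35, 0.36, 0.37, 0.39, 0.42]
print(early_stopping_epoch(val_losses, patience=3))  # (3, 6)
```

In this example the validation loss is lowest at epoch 3 and then fails to improve for three epochs, so training stops at epoch 6; the weights saved at epoch 3 would normally be the ones kept.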

Training and validation loss are critical metrics for evaluating a machine learning model's performance
during training. The training loss provides insight into how well the model is learning the training data,
while the validation loss offers an estimate of the model's generalization ability to new data. By
carefully monitoring these metrics and using appropriate strategies to mitigate overfitting and
underfitting, one can develop models that perform well not only on the training data but also on unseen
data, thereby achieving robust and reliable predictions.

6.1 Testing Model

6.1.1 Data Loading and Preparation:

The code begins by loading the Kaggle credit card fraud detection dataset with Pandas. The dataset
comprises 284,807 transactions. It is split into training and testing sets in a 70/30 ratio, with
199,365 samples used for training and the remaining 85,442 samples reserved for testing.

Random indices are then generated to select a subset of samples from the testing set for evaluation, so
that the models are checked on randomly drawn transactions rather than on a hand-picked subset.
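A minimal sketch of this loading and splitting step is shown below. It assumes pandas, NumPy, and scikit-learn's `train_test_split`; the CSV path is hypothetical, and when no path is given the sketch builds a small synthetic table with the same column layout as the Kaggle file (Time, V1 to V28, Amount, Class) so that it runs offline. The stratified split and the sample size of 10 are illustrative choices, not the project's exact code.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def load_transactions(path=None, n_rows=1000, seed=0):
    """Load the transaction table from a CSV (hypothetical path), or build a
    small synthetic stand-in with the same columns when no path is given."""
    if path is not None:
        return pd.read_csv(path)
    rng = np.random.default_rng(seed)
    data = {"Time": rng.uniform(0, 172_800, n_rows)}
    for i in range(1, 29):
        data[f"V{i}"] = rng.normal(size=n_rows)
    data["Amount"] = rng.exponential(80.0, n_rows)
    data["Class"] = (rng.random(n_rows) < 0.02).astype(int)  # about 2% "fraud"
    return pd.DataFrame(data)


df = load_transactions()
X = df.drop(columns="Class")
y = df["Class"]

# 70/30 split, stratified so both sets keep the same fraud ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Random indices into the test set select a subset for evaluation.
rng = np.random.default_rng(7)
sample_idx = rng.choice(len(X_test), size=10, replace=False)
X_sample = X_test.iloc[sample_idx]
y_sample = y_test.iloc[sample_idx]
print(X_train.shape, X_test.shape, X_sample.shape)
```

scikit-learn rounds the test-set size up, so on the full dataset the exact counts can differ by one from the figures quoted above.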

6.1.2 Model Deployment:

The trained fraud detection models are serialized with Python's pickle module. The saved models can
then be reloaded quickly and evaluated on new transaction data to identify fraudulent activity.
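The save-and-reload cycle can be sketched as follows. A scikit-learn `LogisticRegression` trained on synthetic data stands in for the project's fitted models, and the file name `fraud_model.pkl` is hypothetical.

```python
import os
import pickle
import tempfile

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for transaction features and fraud labels.
X, y = make_classification(n_samples=500, n_features=10, weights=[0.9, 0.1], random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Serialize the fitted model ("fraud_model.pkl" is a hypothetical file name).
model_path = os.path.join(tempfile.gettempdir(), "fraud_model.pkl")
with open(model_path, "wb") as f:
    pickle.dump(model, f)

# Later, e.g. in the application that scores new transactions: reload and predict.
with open(model_path, "rb") as f:
    loaded_model = pickle.load(f)

new_transactions = X[:5]
print(loaded_model.predict(new_transactions))
```

Unpickling can execute arbitrary code, so model files should only be loaded from trusted sources.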

6.2 Testing Credit Card Fraud Detection Using Random Forest and Logistic Regression Algorithms

Testing is a critical phase in the development of machine learning models, especially in applications
as sensitive and high-stakes as credit card fraud detection. Effective testing ensures that the model
performs reliably and accurately, helping to prevent significant financial losses and maintain customer
trust. This section focuses on comprehensive testing methodologies for credit card fraud detection
using Random Forest and Logistic Regression algorithms. It covers the importance of testing,
evaluation metrics, cross-validation techniques, and model optimization strategies.

6.2.1 Importance of Testing in Fraud Detection

Credit card fraud detection models must be highly accurate and robust due to the substantial financial
and reputational risks involved. Effective testing helps to:

- Validate Model Performance: Ensuring the model performs well on unseen data is crucial for its
real-world applicability.

- Identify Overfitting and Underfitting: Testing helps in detecting whether the model generalizes
well to new data or is too tailored to the training data.

- Optimize Model Parameters: Fine-tuning hyperparameters based on testing results can significantly
enhance model accuracy and efficiency.

- Evaluate Real-World Applicability: Testing the model under various scenarios helps in assessing
its performance in practical, real-world situations.

6.2.2 Testing Methodologies

Testing methodologies are designed to rigorously evaluate the performance of machine learning
models. The main methodologies include train-test split, evaluation metrics, cross-validation, and
confusion matrices.

6.2.3 Train-Test Split

The train-test split is the initial step in testing, where the dataset is divided into two parts: the training
set and the testing set. This division allows for the assessment of the model's performance on data it
hasn't seen before, providing an unbiased evaluation of its accuracy.

- Training Set: Used to train the model.

- Testing Set: Used to evaluate the model's performance.

6.2.4 Evaluation Metrics

Evaluation metrics are quantitative measures that provide insights into different aspects of a model's
performance. For credit card fraud detection, the key metrics include:

- Accuracy: The proportion of correctly predicted instances out of the total instances.

- Precision: The proportion of positive identifications that were actually correct, which is critical for
minimizing false positives.

- Recall (Sensitivity): The proportion of actual positives that were correctly identified, important for
minimizing false negatives.

- F1 Score: The harmonic mean of precision and recall, providing a balance between the two.

- ROC-AUC Score: The Area Under the Receiver Operating Characteristic Curve, indicating the
model's ability to distinguish between fraudulent and non-fraudulent transactions.
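The sketch below computes each of these metrics with scikit-learn's `sklearn.metrics` functions on ten made-up transactions, so the values can be checked by hand: 3 frauds are caught, 1 is missed, and 1 legitimate transaction is flagged. The labels and scores are invented for illustration.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score, roc_auc_score

# Toy labels (1 = fraud) and model outputs, invented for illustration.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]  # hard class predictions (score > 0.5)
y_score = [0.1, 0.2, 0.15, 0.3, 0.05, 0.6, 0.9, 0.8, 0.7, 0.4]  # predicted P(fraud)

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("f1       :", f1_score(y_true, y_pred))
# ROC-AUC needs scores or probabilities, not hard labels.
print("roc_auc  :", roc_auc_score(y_true, y_score))
```

Here accuracy is 0.8, precision and recall are both 0.75 (so F1 is also 0.75), and ROC-AUC is 23/24 ≈ 0.958, because only one of the 24 fraud/legitimate score pairs is ranked the wrong way round.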

6.2.5 Cross-Validation

Cross-validation is a technique for assessing how a model performs on different subsets of the data,
providing a more generalized performance measure. The main types include:

- K-Fold Cross-Validation: The dataset is divided into k subsets, and the model is trained and tested
k times, each time using a different subset as the test set and the remaining subsets as the training set.

- Stratified K-Fold Cross-Validation: Similar to K-Fold but ensures that each fold has the same
proportion of class labels as the original dataset, which is particularly useful for imbalanced datasets
like those in fraud detection.
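Stratified k-fold cross-validation can be sketched with scikit-learn's `StratifiedKFold` and `cross_val_score`. The synthetic imbalanced dataset from `make_classification`, the choice of 5 folds, and recall as the scoring metric are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Imbalanced synthetic data (about 5% positives) standing in for transactions.
X, y = make_classification(n_samples=2000, n_features=10, weights=[0.95, 0.05], random_state=0)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Each test fold keeps roughly the same fraud ratio as the full dataset.
for fold, (train_idx, test_idx) in enumerate(cv.split(X, y)):
    print(f"fold {fold}: fraud ratio in test fold = {y[test_idx].mean():.3f}")

# Train and evaluate once per fold; recall is scored on each held-out fold.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv, scoring="recall")
print("recall per fold:", np.round(scores, 3), "mean:", round(float(scores.mean()), 3))
```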

6.2.6 Confusion Matrix

The confusion matrix provides a detailed breakdown of the classification results, showing the number
of true positives, true negatives, false positives, and false negatives. This matrix helps in understanding
the model's strengths and weaknesses in detail.

6.3 Testing Logistic Regression for Fraud Detection

6.3.1 Model Building

Logistic Regression is a linear model used for binary classification tasks. It estimates the probability
that a given instance belongs to a particular class.

6.3.2 Making Predictions

Once the model is trained, it can be used to predict the class labels for the testing set. The predictions
are based on the learned relationship between the input features and the target variable.
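Model building and prediction for Logistic Regression can be sketched with scikit-learn as below. Synthetic data replaces the transaction table, and the 0.3 threshold at the end is a hypothetical value showing how lowering the decision threshold trades precision for recall.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=8, weights=[0.9, 0.1], random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1
)

# Model building: learn the weights of the linear model on the training set.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

# predict_proba returns P(class 0) and P(class 1); column 1 is P(fraud).
proba_fraud = clf.predict_proba(X_test)[:, 1]

# predict() labels a transaction as fraud when that probability exceeds 0.5.
y_pred = clf.predict(X_test)
print(proba_fraud[:5].round(3), y_pred[:5])

# A lower, hypothetical threshold flags more transactions as fraud.
y_pred_lenient = (proba_fraud >= 0.3).astype(int)
```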

6.3.3 Model Evaluation

Evaluation involves using the aforementioned metrics to assess the model's performance on the test
data. This step helps in understanding how well the model is likely to perform in real-world scenarios.

6.3.4 Cross-Validation

Cross-validation provides a more reliable estimate of the model's performance by training and testing
the model on multiple subsets of the data. It helps in identifying overfitting and ensuring the model
generalizes well.

6.3.5 Grid Search for Hyperparameter Tuning

Grid search is a method for hyperparameter tuning that involves testing different combinations of
parameters to find the optimal set that maximizes the model's performance.
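A grid search over Logistic Regression's regularization strength `C` can be sketched with scikit-learn's `GridSearchCV`. The grid values, F1 as the selection metric, and the synthetic data are assumptions for illustration; they are not the project's actual search space.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold

X, y = make_classification(n_samples=1000, n_features=8, weights=[0.9, 0.1], random_state=2)

# Candidate values for the inverse regularization strength C (assumed grid).
param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}

search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid,
    scoring="f1",  # select on F1 rather than accuracy for imbalanced data
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X, y)  # tries every combination with cross-validation
print(search.best_params_, round(search.best_score_, 3))
```

The same pattern applies to the Random Forest in Section 6.4, with a grid over parameters such as `n_estimators`, `max_depth`, and `criterion`.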

6.3.6 ROC Curve

The ROC curve is a graphical representation of a model's diagnostic ability, showing the trade-off
between the true positive rate and false positive rate at various threshold settings. The Area Under the
Curve (AUC) quantifies this performance.
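The sketch below computes the points of a ROC curve with scikit-learn's `roc_curve` and the area under it with `auc`, using ten made-up labels and predicted fraud probabilities. Each threshold yields one (false positive rate, true positive rate) pair; plotting `tpr` against `fpr` gives the curve.

```python
import numpy as np
from sklearn.metrics import auc, roc_curve

# Toy labels (1 = fraud) and predicted fraud probabilities, invented for illustration.
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
y_score = np.array([0.1, 0.2, 0.15, 0.3, 0.05, 0.6, 0.9, 0.8, 0.7, 0.4])

# One (FPR, TPR) point per retained threshold, from strictest to most lenient.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
for f, t, th in zip(fpr, tpr, thresholds):
    print(f"threshold {th:.2f}: FPR={f:.3f} TPR={t:.3f}")

# Trapezoidal area under the ROC curve.
print("AUC =", auc(fpr, tpr))
```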

6.3.7 Confusion Matrix

The confusion matrix provides a comprehensive view of the model's performance by showing the
counts of true positive, true negative, false positive, and false negative predictions. It helps in
identifying specific areas where the model may be making errors.

6.4 Testing Random Forest for Fraud Detection

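1. Model Building

Random Forest is an ensemble method that builds many decision trees, each trained on a bootstrap
sample of the training data and considering a random subset of features at each split.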

2. Making Predictions

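Similar to Logistic Regression, the Random Forest model, once trained, is used to predict the class
labels for the testing set by combining the votes of its ensemble of trees. In scikit-learn's
implementation, for example, the trees' predicted class probabilities are averaged and the class with
the highest average probability is chosen.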
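The prediction step can be checked directly, assuming scikit-learn's `RandomForestClassifier` (the synthetic data and settings below are illustrative). The sketch averages the per-tree probabilities from `estimators_` and confirms that this reproduces `predict`. With fully grown trees the leaves are typically pure, so each tree's probability is 0 or 1 and the average is simply the fraction of trees voting for each class; the soft vote then coincides with a majority vote.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=8, weights=[0.85, 0.15], random_state=3)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=3
)

# An odd number of fully grown trees (assumed settings), so votes cannot tie.
rf = RandomForestClassifier(n_estimators=11, random_state=0).fit(X_train, y_train)

# Average the class-probability estimates of the individual trees...
tree_probas = np.stack([tree.predict_proba(X_test) for tree in rf.estimators_])
mean_proba = tree_probas.mean(axis=0)

# ...and pick the class with the highest average, as rf.predict does.
manual_pred = rf.classes_[mean_proba.argmax(axis=1)]
print((manual_pred == rf.predict(X_test)).all())
```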

3. Model Evaluation

The evaluation of the Random Forest model involves using metrics such as accuracy, precision, recall,
F1 score, and ROC-AUC score to assess its performance on the test data.

4. Cross-Validation

Cross-validation in Random Forest helps in assessing the model’s performance more reliably by
training and testing it on different subsets of the data. This helps in ensuring that the model generalizes
well to new, unseen data.

5. Grid Search for Hyperparameter Tuning

Grid search is used to tune the hyperparameters of the Random Forest model, such as the number of
trees, maximum depth, and the criteria for splitting nodes. This optimization helps in improving the
model’s performance.

6. ROC Curve

The ROC curve for Random Forest is plotted to visualize the trade-off between true positive rate and
false positive rate across different threshold values. The AUC score is used to quantify the model's
diagnostic ability.

7. Confusion Matrix

The confusion matrix for Random Forest provides a detailed breakdown of its performance, showing
the counts of true positive, true negative, false positive, and false negative predictions. This detailed
view helps in understanding the specific areas where the model may need improvement.

6.5 Confusion Matrix

In classification tasks, the confusion matrix is a fundamental tool for evaluating model performance.
It shows how well a model predicts each class and helps diagnose the types of errors it makes. This
section describes the confusion matrix, its components, and its significance in assessing the
effectiveness of classification models.

6.5.1 Structure of the Confusion Matrix

For a binary classifier, the confusion matrix is a 2 × 2 matrix that organizes the classifier's
predictions into four categories based on the actual and predicted classes. These categories are:

1. True Positives (TP): Instances where the model correctly predicts the positive class.

2. True Negatives (TN): Instances where the model correctly predicts the negative class.

3. False Positives (FP): Instances where the model incorrectly predicts the positive class (Type I
error).

4. False Negatives (FN): Instances where the model incorrectly predicts the negative class (Type II
error).

The confusion matrix is typically structured as follows:

                    Predicted Negative    Predicted Positive
Actual Negative            TN                    FP
Actual Positive            FN                    TP

Each cell in the matrix represents the count of instances falling into the corresponding category,
providing a clear overview of the model's performance.
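scikit-learn's `confusion_matrix` uses the same layout as the table above: rows are actual classes and columns are predicted classes, ordered as given in `labels`. The confusion matrices in Section 7 appear to follow this format. The toy labels below are invented for illustration.

```python
from sklearn.metrics import confusion_matrix

# Toy labels (1 = fraud), invented for illustration.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]

# Rows = actual class, columns = predicted class, in the order of `labels`.
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
print(cm)
# [[5 1]
#  [1 3]]

# For the binary case, ravel() returns the cells as TN, FP, FN, TP.
tn, fp, fn, tp = cm.ravel()
```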

6.5.2 Interpreting the Confusion Matrix

The confusion matrix serves as a diagnostic tool that helps in understanding how well a classification
model is performing. Here's how each component of the confusion matrix can be interpreted:

1. True Positives (TP): These are instances where the model correctly identifies positive cases. For
example, in medical diagnostics, a true positive would be when the model correctly identifies a patient
with a certain condition.

2. True Negatives (TN): These are instances where the model correctly identifies negative cases.
Continuing with the medical example, a true negative would be when the model correctly identifies a
patient without the condition.

3. False Positives (FP): These are instances where the model incorrectly predicts positive cases. A
false positive occurs when the model predicts a positive case, but the actual case is negative.

4. False Negatives (FN): These are instances where the model incorrectly predicts negative cases. For
instance, a false negative would be when the model predicts a negative case, but the actual case is
positive.

6.5.3 Evaluation Metrics Derived from the Confusion Matrix

The confusion matrix serves as the basis for calculating various evaluation metrics that quantify the
performance of a classification model. Some of the key metrics derived from the confusion matrix
include:

- Accuracy: The overall correctness of the model, calculated as (TP + TN) / (TP + TN + FP + FN).

- Precision: The ratio of correctly predicted positive instances to the total predicted positive instances,
calculated as TP / (TP + FP).

- Recall (Sensitivity): The ratio of correctly predicted positive instances to all actual positive
instances, calculated as TP / (TP + FN).

- Specificity: The ratio of correctly predicted negative instances to all actual negative instances,
calculated as TN / (TN + FP).

- F1 Score: The harmonic mean of precision and recall, providing a balance between these two metrics,
calculated as 2 × (Precision × Recall) / (Precision + Recall).

These metrics offer valuable insights into different aspects of the model's performance, such as its
ability to avoid false positives (precision), capture true positives (recall), and correctly identify
negative cases (specificity), as well as its overall correctness (accuracy) and the balance between
precision and recall (F1 score).
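The formulas above can be checked against the results in Section 7.2. The helper below is a hypothetical sketch that applies them to the Random Forest's test-set confusion matrix (TN = 46,128, FP = 0, FN = 10, TP = 61).

```python
def metrics_from_counts(tp, tn, fp, fn):
    """Confusion-matrix metrics using the formulas listed above."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1}


# Random Forest test-set confusion matrix from Section 7.2:
# [[46128     0]
#  [   10    61]]  ->  TN=46128, FP=0, FN=10, TP=61
m = metrics_from_counts(tp=61, tn=46128, fp=0, fn=10)
for name, value in m.items():
    print(f"{name:12s}{value:.3f}")
```

It reproduces the test-set recall of 0.859 and F1 score of 0.924 reported for the Random Forest, while accuracy rounds to 1.000 even though 10 of the 71 frauds were missed.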

6.5.4 Importance of the Confusion Matrix

The confusion matrix plays a pivotal role in model evaluation and model selection processes. It allows
data scientists and machine learning practitioners to:

- Understand the distribution of predictions across different classes.

- Identify patterns of correct and incorrect predictions.

- Fine-tune models based on specific performance metrics (e.g., improving recall or reducing false
positives).

- Compare the performance of different models and choose the most suitable one for the task at hand.

By leveraging the insights provided by the confusion matrix and associated evaluation metrics,
practitioners can iteratively improve their classification models, leading to more accurate and reliable
predictions.

The confusion matrix is a cornerstone of classification model evaluation, offering a structured and
detailed view of predictions and errors. Because metrics such as accuracy, precision, and recall are
derived directly from its counts, it is indispensable in the machine learning workflow. By interpreting
and analysing the confusion matrix, practitioners can gain valuable insights into model behaviour,
make informed decisions about model improvements, and ultimately enhance the overall efficacy of
classification algorithms.

6.6 Quality Metrics

Quality metrics, also known as evaluation metrics or performance metrics, are used to quantify the
performance of a machine learning model. When predicting fraudulent transactions with fraud
detection models, it is crucial to evaluate the models' performance accurately. The following quality
metrics are commonly used for this purpose:

Precision

Precision measures the proportion of true positive predictions (fraudulent transactions correctly
identified) out of all positive predictions (transactions predicted as fraudulent).

- Formula: \( \text{Precision} = \frac{TP}{TP + FP} \)

- Advantages: Indicates the accuracy of positive predictions, minimizing false positives.

- Disadvantages: May not fully reflect performance if false negatives are high.

Recall

Recall, or sensitivity, measures the proportion of true positive predictions out of all actual positives
(all actual fraudulent transactions).

- Formula: \( \text{Recall} = \frac{TP}{TP + FN} \)

- Advantages: Indicates the model's ability to detect fraudulent transactions, minimizing false
negatives.

- Disadvantages: May not fully reflect performance if false positives are high.

F1-Score

The F1-Score is the harmonic mean of precision and recall, providing a balance between the two
metrics.

- Formula: \( \text{F1-Score} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \)

- Advantages: Balances the trade-off between precision and recall, providing a single measure of
model performance.

- Disadvantages: May be less interpretable in cases where a specific trade-off is desired.

Support

Support represents the number of actual occurrences of each class in the dataset.

- Explanation: Provides context for precision, recall, and F1-score by indicating how many instances
of each class are present.

- Advantages: Helps in understanding the distribution of classes in the dataset.

- Disadvantages: Not a performance metric but crucial for context.

ROC-AUC (Receiver Operating Characteristic - Area Under the Curve)

ROC-AUC measures the model's ability to distinguish between classes by plotting the true positive
rate against the false positive rate at various threshold settings.

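- Formula: No single closed-form expression from one confusion matrix; the AUC is the area under the
ROC curve, which equals the probability that a randomly chosen fraudulent transaction receives a
higher score than a randomly chosen legitimate one (ties counted as half).

- Advantages: Provides a comprehensive measure of model performance across all classification
thresholds.

- Disadvantages: May be less intuitive to interpret without graphical representation.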
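The probability interpretation of ROC-AUC can be sketched directly by comparing every (fraudulent, legitimate) pair of scores and counting ties as half. This brute-force version is for illustration only; library functions such as scikit-learn's `roc_auc_score` compute the same quantity far more efficiently.

```python
def pairwise_auc(y_true, y_score):
    """ROC-AUC as the fraction of (fraud, legitimate) pairs ranked correctly.

    Ties count as half a correct pair. Runs in O(n_pos * n_neg) time, so it is
    only suitable for small examples.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Two frauds (scores 0.9 and 0.4) and two legitimate transactions (0.5 and 0.1).
y_true = [0, 0, 1, 1]
y_score = [0.5, 0.1, 0.9, 0.4]
print(pairwise_auc(y_true, y_score))  # 0.75
```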

These metrics collectively provide a comprehensive evaluation of fraud detection models'
performance. By analyzing precision, recall, F1-score, support, and ROC-AUC, researchers can
determine not only the accuracy but also the reliability and robustness of the models. This detailed
analysis ensures the selected model can provide actionable insights and accurate predictions for
detecting fraudulent transactions, contributing to more secure financial systems.

7. RESULT
7.1 Logistic Regression

Logistic Regression_tuned Scores

              train_set   test_set
Accuracy          0.962      0.961
Precision         0.037      0.037
Recall            0.940      0.972
f1                0.071      0.072
roc_auc           0.951      0.967
recall_auc        0.504      0.504

ROC_curve

Precision_recall_curve

7.2 Random Forest Classifier

RF_Model
------------------
[[46128     0]
 [   10    61]]

Test_Set
              precision    recall  f1-score   support

           0       1.00      1.00      1.00     46128
           1       1.00      0.86      0.92        71

    accuracy                           1.00     46199
   macro avg       1.00      0.93      0.96     46199
weighted avg       1.00      1.00      1.00     46199

Train_Set
              precision    recall  f1-score   support

           0       1.00      1.00      1.00    184512
           1       1.00      1.00      1.00       283

    accuracy                           1.00    184795
   macro avg       1.00      1.00      1.00    184795
weighted avg       1.00      1.00      1.00    184795

RF_model Scores

              train_set   test_set
Accuracy          1.000      1.000
Precision         1.000      1.000
Recall            1.000      0.859
f1                1.000      0.924
roc_auc           1.000      0.930
recall_auc        0.930      0.930

7.3 Logistic Regression with Grid Search

LogRegSmote_tuned
------------------
[[1952   48]
 [ 121 1879]]

Test_Set
              precision    recall  f1-score   support

           0       0.94      0.98      0.96      2000
           1       0.98      0.94      0.96      2000

    accuracy                           0.96      4000
   macro avg       0.96      0.96      0.96      4000
weighted avg       0.96      0.96      0.96      4000

Train_Set
              precision    recall  f1-score   support

           0       0.94      0.97      0.95      8000
           1       0.97      0.93      0.95      8000

    accuracy                           0.95     16000
   macro avg       0.95      0.95      0.95     16000
weighted avg       0.95      0.95      0.95     16000

LogRegSmote_tuned Scores

              train_set   test_set
Accuracy          0.953      0.958
Precision         0.972      0.975
Recall            0.934      0.940
f1                0.952      0.957
roc_auc           0.953      0.958
recall_auc        0.972      0.972

7.4 Random Forest SMOTE Model

RFSmote_tuned
------------------
[[1995    5]
 [ 109 1891]]

Test_Set
              precision    recall  f1-score   support

           0       0.95      1.00      0.97      2000
           1       1.00      0.95      0.97      2000

    accuracy                           0.97      4000
   macro avg       0.97      0.97      0.97      4000
weighted avg       0.97      0.97      0.97      4000

Train_Set
              precision    recall  f1-score   support

           0       0.95      1.00      0.97      8000
           1       1.00      0.94      0.97      8000

    accuracy                           0.97     16000
   macro avg       0.97      0.97      0.97     16000
weighted avg       0.97      0.97      0.97     16000

RFSmote_tuned Scores

              train_set   test_set
Accuracy          0.971      0.972
Precision         0.998      0.997
Recall            0.945      0.946
f1                0.971      0.971
roc_auc           0.971      0.972
recall_auc        0.985      0.985

Output screen for credit card fraud detection

Feature importance

In the Random Forest model for credit card fraud detection, feature importance indicates the
contribution of each feature to the model's predictions.

The most important features are V14 (0.193), V4 (0.109), V10 (0.102), V12 (0.099), and V17 (0.077),
reflecting their strong influence in identifying fraudulent transactions. Other notable features include
V2 (0.055), V3 (0.052), and V11 (0.049). Less important features, such as V9 (0.023), V21 (0.023),
and amount (0.011), still play a role but with reduced impact. Features like time (0.004) and V24
(0.004) contribute minimally. This analysis helps prioritize key predictors for model optimization and
interpretability.
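Importance scores like these are typically read from a fitted scikit-learn model's `feature_importances_` attribute, which holds impurity-based (mean decrease in impurity) importances that sum to 1. The sketch below assumes that API and uses synthetic data with hypothetical column names `f0` to `f5`; scikit-learn's documentation notes that impurity-based importances can favour features with many distinct values, with permutation importance as an alternative.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data with hypothetical column names standing in for V1..V28, Time, Amount.
X, y = make_classification(n_samples=800, n_features=6, n_informative=3,
                           n_redundant=0, random_state=4)
X = pd.DataFrame(X, columns=[f"f{i}" for i in range(6)])

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# One impurity-based importance per feature, ranked from most to least important.
importances = pd.Series(rf.feature_importances_, index=X.columns).sort_values(ascending=False)
print(importances.round(3))
```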

7.5 Testing Results

● R-squared Logistic Regression: 84.00%
● R-squared Decision Tree Classifier: 98.00%
● R-squared Random Forest Classifier: 98.50%
● R-squared Gradient Boosting Classifier: 97.00%
● R-squared Support Vector Machine (SVM): 73.00%

7.6 Result Analysis

1. Logistic Regression Model: The Logistic Regression model provides a decent fit with an R-
squared value of 84%. However, it has relatively high MAE and MSE, indicating larger
average errors in predictions compared to other models. Logistic Regression might not
capture the complex, non-linear relationships in the data as effectively as more sophisticated
models.
2. Decision Tree Classifier: The Decision Tree Classifier performs exceptionally well with an
R-squared value of 98.00%, indicating a very good fit. It has low MAE and MSE values,
demonstrating its ability to handle non-linear relationships and interactions between features
effectively. However, decision trees can be prone to overfitting, especially with deep trees.
3. Random Forest Classifier: The Random Forest Classifier outperforms the Decision Tree
with slightly better MAE and MSE values and an R-squared value of 98.50%. This model
mitigates the overfitting problem seen in single decision trees by averaging multiple trees,
making it more robust and reliable.
4. Gradient Boosting Classifier: The Gradient Boosting Classifier provides strong
performance with an R-squared value of 97.00%. While its MAE and MSE are slightly higher
than those of the Random Forest, it is still very effective. Gradient Boosting models excel at
handling complex data by building sequential models to correct errors from previous models.
5. Support Vector Machine (SVM): The SVM model has the lowest R-squared value of
73.00%, indicating that it doesn't fit the data as well as other models. Its high MAE and MSE
values suggest significant prediction errors. SVM might not be the best choice for this dataset
due to its limited ability to capture complex relationships.

Conclusion:

Based on the evaluation metrics for the various classification models used in this analysis, it can be
concluded that the Random Forest Classifier and Gradient Boosting Classifier models outperform the
other models in detecting credit card fraud. These two models exhibit the lowest Mean Absolute Error
(MAE) and Mean Squared Error (MSE) values, indicating better accuracy and precision in their
predictions. Additionally, both models achieve high R-squared values, with Random Forest Classifier
scoring approximately 98.50% and Gradient Boosting Classifier scoring around 97.00%.

The Decision Tree Classifier also performs exceptionally well with an R-squared value of
approximately 98.00%, but it has slightly higher MAE and MSE values compared to the top-performing
models. The Logistic Regression and Support Vector Machine (SVM) models, while still providing
reasonable predictions, exhibit higher MAE and MSE values and noticeably lower R-squared values
(84.00% and 73.00% respectively) compared to the Random Forest and Gradient Boosting models.

In conclusion, for detecting credit card fraud, both the Random Forest Classifier and Gradient Boosting
Classifier models are recommended due to their superior performance in terms of accuracy and
precision, as indicated by the lower MAE and MSE values and high R-squared values.

8. FUTURE ENHANCEMENT
As machine learning advances, numerous opportunities exist to enhance the performance, robustness,
and interpretability of models designed for credit card fraud detection. These enhancements aim to
improve model generalization, adapt models to real-world scenarios, and extract more value from
transaction data.

Key Enhancements:
The following key areas offer opportunities to further improve the performance and robustness of
these models:

Data Enhancement:

Feature Engineering: Development of new features that capture transaction patterns more accurately.

Data Augmentation: Techniques such as SMOTE to address class imbalance and enrich the dataset.
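The core idea of SMOTE (Synthetic Minority Over-sampling Technique) is to create new minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The NumPy sketch below illustrates only that idea under simplifying assumptions (Euclidean distance, brute-force neighbour search, a fixed random seed); it is not the implementation found in libraries such as imbalanced-learn, and the function name is hypothetical.

```python
import numpy as np


def smote_sketch(X_minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    For each new sample: pick a random minority sample, pick one of its k
    nearest minority-class neighbours, and place the new point at a random
    position on the line segment between the two.
    """
    rng = np.random.default_rng(seed)
    X_minority = np.asarray(X_minority, dtype=float)
    k = min(k, len(X_minority) - 1)  # cannot use more neighbours than exist

    # Brute-force pairwise Euclidean distances between minority samples.
    diffs = X_minority[:, None, :] - X_minority[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]

    synthetic = np.empty((n_new, X_minority.shape[1]))
    for i in range(n_new):
        base = rng.integers(len(X_minority))
        neighbour = X_minority[rng.choice(neighbours[base])]
        gap = rng.random()  # uniform in [0, 1)
        synthetic[i] = X_minority[base] + gap * (neighbour - X_minority[base])
    return synthetic


# Four hypothetical fraud samples with two features each.
fraud = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
new_samples = smote_sketch(fraud, n_new=6, k=2)
print(new_samples.shape)  # (6, 2)
```

Because every synthetic point lies between two real fraud samples, oversampling of this kind adds minority examples without simply duplicating existing ones.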

Model Enhancement:

Algorithm Tuning: Hyperparameter optimization and the use of ensemble methods to enhance model
accuracy.

Advanced Algorithms: Exploring Gradient Boosting Machines and deep learning models for
potentially better performance.

System Enhancement:

Real-time Detection: Implementing streaming data processing for timely identification of fraudulent
transactions.

Model Maintenance: Regular retraining and monitoring to adapt to evolving transaction patterns.

Interpretability and Transparency:

Model Interpretability: Utilizing SHAP values and LIME to provide clear explanations for model
predictions.

User Feedback: Establishing feedback loops for continuous improvement based on expert and user
input.

Infrastructure and Security:

Scalability: Utilizing cloud platforms and distributed computing for handling large-scale data.

Security: Ensuring data encryption and compliance with regulatory standards to protect sensitive
information.

Continuous Improvement
The field of credit card fraud detection is dynamic, with constantly evolving tactics from fraudsters.
Continuous improvement is essential to maintain the effectiveness of detection systems. Regular
updates, incorporating new data, and leveraging the latest technological advancements will ensure that
the models remain relevant and robust against emerging threats.

Future Directions
The future of credit card fraud detection lies in integrating more advanced technologies and
methodologies:

Artificial Intelligence and Machine Learning: Continued exploration of advanced AI techniques,
including reinforcement learning and unsupervised learning.

Big Data Analytics: Utilizing big data analytics to process and analyze vast amounts of transaction
data for deeper insights.

Blockchain Technology: Exploring the use of blockchain for secure, transparent transaction records.


Behavioral Biometrics: Incorporating behavioral biometrics to enhance user authentication and
security.
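The first direction above mentions unsupervised learning, which matters because most real transactions are never labelled. One of the simplest unsupervised detectors scores each transaction by its Mahalanobis distance from the centre of the unlabelled data, so unusual combinations of feature values receive high scores. The NumPy sketch below is a swapped-in illustration with simulated data, not the project's method. It assumes roughly elliptical feature distributions, which real transaction features often violate; tree-based detectors such as scikit-learn's `IsolationForest` are a common alternative.

```python
import numpy as np


def fit_mahalanobis(X):
    """Estimate the mean and (pseudo-)inverse covariance of unlabelled data."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    # pinv tolerates a singular covariance matrix (e.g. perfectly correlated features).
    inv_cov = np.linalg.pinv(np.cov(X, rowvar=False))
    return mu, inv_cov


def anomaly_score(X, mu, inv_cov):
    """Mahalanobis distance of each row from the fitted centre."""
    d = np.atleast_2d(np.asarray(X, dtype=float)) - mu
    return np.sqrt(np.einsum("ij,jk,ik->i", d, inv_cov, d))


rng = np.random.default_rng(0)
X_unlabelled = rng.normal(size=(1000, 3))     # simulated transactions; no labels are used
mu, inv_cov = fit_mahalanobis(X_unlabelled)

scores = anomaly_score(X_unlabelled, mu, inv_cov)
threshold = np.quantile(scores, 0.99)         # flag the top 1% as suspicious (assumed cut-off)
suspect = np.array([[6.0, -5.0, 7.0]])        # an unusual hypothetical transaction
print(anomaly_score(suspect, mu, inv_cov) > threshold)
```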

Challenges and Considerations

While these enhancements offer significant potential, they come with challenges:

Data Privacy and Security: Maintaining the privacy and security of sensitive transaction data is
paramount.

Resource Management: Efficient management of computational resources and costs, especially with
real-time and large-scale data processing.

Regulatory Compliance: Ensuring compliance with regulatory standards to protect user data and
ensure ethical practices.

Conclusion
Enhancing credit card fraud detection systems using Random Forest and Logistic Regression involves
a multifaceted approach that includes data enhancement, model improvement, real-time detection
capabilities, interpretability, and robust infrastructure. By continuously improving these aspects, the
detection system can achieve higher accuracy, efficiency, and reliability, providing better protection
against fraudulent activities and ensuring a secure financial environment.


9. CONCLUSION
In conclusion, this project examined the critical domain of credit card fraud detection, tracing the
progression from traditional statistical methods to machine learning models. We explored the
strengths and limitations of classification algorithms such as Logistic Regression and Random Forest,
and their roles in achieving accuracy, generalization, and robustness in fraud detection.

Through data analysis, preprocessing, and evaluation, we identified key challenges, including class
imbalance, feature selection, computational cost, and robustness in real-world use. The approaches and
optimizations proposed in this project aim to improve accuracy and efficiency while mitigating these
challenges.

By implementing and evaluating the proposed techniques on real-world credit card transaction
datasets, we demonstrated improvements in predictive performance. The comparative analysis provided
insights into algorithm selection, model complexity, generalization, and scalability, guiding the
design of more robust and adaptable fraud detection systems.

The significance of this study extends beyond academic research to sectors such as financial services,
consumer protection, and regulatory compliance. The results contribute to ongoing efforts to improve
fraud prevention, strengthen financial security, and support data-driven decision-making.

Moving forward, continued innovation, optimization, and evaluation will be needed to sustain progress
and address emerging challenges in fraud detection. By collaborating across disciplines and applying
machine learning effectively, future work can improve real-world applications and advance fraud
detection technology. These efforts can contribute to better financial security, reduced economic
losses, and more informed policy decisions.

