Script KHDL

The presentation discusses predicting fraudulent financial transactions using a dataset of 1.75 million simulated transactions. It identifies key characteristics of fraud, such as the prevalence of small, frequent transactions, and evaluates three models, concluding that the Decision Tree model is the most effective with a 99.48% accuracy rate. Future work includes addressing data imbalance and incorporating additional features to improve detection capabilities.

Uploaded by

thutran.31231022807

Intro

Today, I would like to present on the topic "Predicting Fraudulent Financial
Transactions", an important issue in the field of finance and data technology.
Our presentation will outline the report’s key sections, including the dataset
overview, methodology, experiment, results, discussion and conclusions.

Problem Formulation

Financial fraud is becoming increasingly complex, now occurring not only in high-
value transactions but also in frequent, low-value ones, especially during low-
surveillance periods. To investigate this, we analyzed 1.75 million transactions from
simulated users across various terminals, covering the period from January to June
2023. Our goal is to identify key characteristics of fraudulent transactions and
propose an effective prediction method to enhance detection accuracy.

Research questions
In our study, we focus on two key research questions. First, "What characteristics
distinguish fraudulent financial transactions?" Second, "How can financial transaction
fraud be predicted accurately using historical transaction data?"

The data suggest that fraudulent transactions have many possible causes. To improve
accuracy and keep our conclusions focused, we aim to identify the main factors
associated with financial fraud. This is the first research question we want to
explore.

We also want to know whether a model built on past data can still make accurate
predictions when it is applied to new transactions. Answering this tells us how well
the model generalizes to unseen data. That’s why we chose this as our second
research question.

Related work
In “Using machine learning Meta-Classifiers to detect financial frauds”, Achakzai
and Juan (2022) applied machine learning meta-classifiers to financial fraud
detection. They found that meta-classifiers can outperform the best stand-alone
classifiers across different performance metrics and improve predictive performance
over traditional statistical methods.
Methodology
Our study relied on two components: Orange as the analytical tool and data
classification as the analytical technique. Orange is an open-source data mining and
visualization platform that allowed us to analyze data, build models, and visualize
results effectively. Data classification is the process of predicting the class (or
classes) of a given data object based on a predefined classification model. We tested
three models, Logistic Regression, Decision Tree, and Support Vector Machine (SVM),
to predict fraudulent transactions. After evaluating their performance, we selected
the most effective model to improve fraud detection accuracy.
Experiment
Introduction of dataset
We used the "Fraudulent Transaction Detection" dataset, sourced from Kaggle, a
reputable online platform for research and learning. This dataset was used to
address the research questions posed, aiming to predict the likelihood of fraud in
financial transactions and derive the final conclusions for our project. Below is the
data structure of the "Fraudulent Transaction Detection" dataset, including its
attributes, meanings, and roles within this dataset.
From the dataset containing 1.05 million instances, we drew a sample of 7,000
instances. The sample was chosen to remain as representative of the data as possible
while dealing with a severe class imbalance (where fraudulent transactions account
for a significantly smaller proportion than legitimate transactions).
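The script does not say how the 7,000 instances were drawn. As a rough illustration only, one common way to shrink an imbalanced dataset is to sample fraudulent and legitimate rows separately so that the fraud share of the sample is controlled. The function name, the `is_fraud` field, and the 10% fraud share below are all assumptions made for the sketch, not details from the study.

```python
import random

def undersample(rows, label_key, n_total, fraud_share, seed=0):
    """Draw n_total rows with roughly fraud_share of them labeled as fraud (1)."""
    rng = random.Random(seed)
    fraud = [r for r in rows if r[label_key] == 1]
    legit = [r for r in rows if r[label_key] == 0]
    # Cap each class at what is actually available.
    n_fraud = min(len(fraud), round(n_total * fraud_share))
    n_legit = min(len(legit), n_total - n_fraud)
    sample = rng.sample(fraud, n_fraud) + rng.sample(legit, n_legit)
    rng.shuffle(sample)
    return sample

# Toy data: 1% fraud among 10,000 synthetic rows (illustrative values only).
rows = [{"id": i, "is_fraud": 1 if i % 100 == 0 else 0} for i in range(10_000)]
subset = undersample(rows, "is_fraud", n_total=700, fraud_share=0.10)
print(len(subset), sum(r["is_fraud"] for r in subset))
```

Raising the fraud share makes rare cases easier for a model to learn, but the sample then no longer reflects the real fraud rate, which is the trade-off the paragraph above describes.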
Data preprocessing
Since the dataset contained no missing values or noisy data, we retained the original
data without adjustments. Using the Rank widget in Orange, we identified five
variables (CUSTOMER_ID, TX_TIME_SECONDS, TRANSACTION_ID, TX_TIME_DAYS, and
TERMINAL_ID) as having minimal impact on fraud detection. These variables were
removed to improve model efficiency. Additionally, we renamed certain variables with
the Edit Domain widget to make them clearer during analysis.
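For readers who want to reproduce this step outside Orange, the column removal and renaming can be sketched in plain Python. The five dropped names come from the script; the remaining column names (`TX_AMOUNT`, `TX_FRAUD`) and the rename map are hypothetical, since the script does not list which variables were renamed. The sketch does not reproduce the Rank widget's scoring.

```python
# Variables the script reports as having minimal impact on fraud detection.
DROPPED = {
    "CUSTOMER_ID", "TX_TIME_SECONDS", "TRANSACTION_ID",
    "TX_TIME_DAYS", "TERMINAL_ID",
}

# Hypothetical rename map: the script does not say which variables were renamed.
RENAMED = {"TX_AMOUNT": "amount", "TX_FRAUD": "is_fraud"}

def preprocess(row):
    """Drop low-impact columns and apply clearer names to the rest."""
    return {RENAMED.get(k, k): v for k, v in row.items() if k not in DROPPED}

# Illustrative row, not taken from the dataset.
row = {
    "TRANSACTION_ID": 1, "CUSTOMER_ID": 596, "TERMINAL_ID": 3156,
    "TX_AMOUNT": 57.16, "TX_TIME_SECONDS": 31, "TX_TIME_DAYS": 0,
    "TX_FRAUD": 0,
}
clean = preprocess(row)
print(clean)
```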

Research Question 1: What characteristics distinguish fraudulent financial transactions?

To answer this question, we analyzed the dataset to identify key patterns that set
fraudulent transactions apart from legitimate ones.

First, we examined the transaction amounts. While many people assume that
fraudulent transactions are usually high-value, our findings showed otherwise.
Fraudulent transactions appeared across various value ranges, including small and
moderate amounts. This suggests that fraudsters may intentionally conduct low-
value transactions to avoid detection.

Next, we explored fraud scenarios in the dataset. Among the identified fraud cases,
Fraud Scenario 1 was the most common. This scenario often involved a series of
rapid transactions from the same terminal or customer within a short period. This
pattern aligns with the “burst fraud” tactic, where multiple low-value transactions are
processed quickly to bypass security systems.
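The burst pattern described above can be made concrete with a small sliding-window counter. This is a minimal sketch, not the method used in the study: the function name, the 10-minute window, and the threshold of three transactions are assumptions chosen for illustration.

```python
from collections import defaultdict, deque

def flag_bursts(transactions, window_seconds=600, min_count=3):
    """Return IDs of transactions that are at least the min_count-th one
    from the same terminal within window_seconds.

    transactions: iterable of (tx_id, terminal_id, timestamp_seconds).
    """
    recent = defaultdict(deque)  # terminal -> timestamps still inside the window
    flagged = []
    for tx_id, terminal, ts in sorted(transactions, key=lambda t: t[2]):
        q = recent[terminal]
        q.append(ts)
        # Discard timestamps that have fallen out of the window.
        while ts - q[0] > window_seconds:
            q.popleft()
        if len(q) >= min_count:
            flagged.append(tx_id)
    return flagged

# Illustrative transactions: terminal T1 sees three within five minutes.
txs = [
    (1, "T1", 0), (2, "T1", 120), (3, "T2", 130),
    (4, "T1", 300), (5, "T1", 2000), (6, "T2", 2100),
]
print(flag_bursts(txs))  # [4]
```

The same counter could be keyed by customer instead of terminal, and the resulting count could serve as an input feature for a classifier.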

These findings highlight that fraudulent transactions are not limited to large amounts
but can also occur in smaller, frequent transactions. Identifying these patterns is
crucial for developing effective fraud detection models.

Data Classification (RQ2)

Next, we used the Data Sampler widget in Orange to split the dataset into two parts:
70% for model training and 30% for prediction and evaluation. For the model-building
process, we applied three classification techniques: Logistic Regression, Decision
Tree, and Support Vector Machine (SVM). Each model was evaluated using the Test
& Score widget with 5-fold cross-validation to ensure consistent and reliable results.
While Logistic Regression achieved the highest AUC score (0.979), we found that
the Decision Tree model performed better when we looked deeper into the results.
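To show what the 70/30 split and 5-fold cross-validation mean in terms of row counts, here is a plain-Python sketch. It is not Orange's implementation (Orange's widgets handle this internally and can stratify by class, which this sketch does not), and the function names are hypothetical.

```python
import random

def train_test_split(items, train_share=0.7, seed=0):
    """Shuffle a copy of items and cut it into train and test parts."""
    rng = random.Random(seed)
    shuffled = items[:]
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_share)
    return shuffled[:cut], shuffled[cut:]

def k_fold_indices(n, k=5):
    """Yield (train_idx, val_idx) pairs; every index is validated exactly once."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size

# 7,000 sampled instances, as in the script.
train, test = train_test_split(list(range(7000)))
print(len(train), len(test))  # 4900 2100

folds = list(k_fold_indices(len(train), k=5))
print([len(val) for _, val in folds])  # [980, 980, 980, 980, 980]
```

Each model is trained five times on 3,920 rows and validated on the remaining 980, and the scores are averaged, which is why cross-validated results are less sensitive to one lucky or unlucky split.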

Using the Confusion Matrix, we saw that the Decision Tree model had fewer
mistakes in identifying both legitimate and fraudulent transactions. This is important
because missing a fraudulent transaction (a false negative) can lead to serious
financial losses, while wrongly marking a legitimate transaction as fraud (a false
positive) can frustrate customers. Although Logistic Regression scored well overall, it
misclassified more transactions compared to the Decision Tree model.
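The two error types discussed above map directly onto the cells of a confusion matrix. The sketch below uses made-up labels, not the study's data, to show how false negatives and false positives feed into accuracy, precision, and recall.

```python
def confusion_counts(actual, predicted):
    """Return (tp, tn, fp, fn) for binary labels where 1 means fraud."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == 1 and p == 1 for a, p in pairs)
    tn = sum(a == 0 and p == 0 for a, p in pairs)
    fp = sum(a == 0 and p == 1 for a, p in pairs)  # legitimate flagged as fraud
    fn = sum(a == 1 and p == 0 for a, p in pairs)  # fraud that was missed
    return tp, tn, fp, fn

def metrics(tp, tn, fp, fn):
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Illustrative labels only.
actual    = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]

counts = confusion_counts(actual, predicted)
print(counts)            # (2, 6, 1, 1)
print(metrics(*counts))
```

Recall drops with every missed fraud, and precision drops with every false alarm, so comparing both gives a fuller picture than accuracy alone when fraud is rare.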

The ROC Curve also confirmed this. While all three models — Logistic Regression,
Decision Tree, and SVM — showed good performance, the Decision Tree curve was
closest to the top-left corner. This position indicates the best balance between
correctly detecting fraud and avoiding false alarms. As a result, the Decision Tree
method was selected as the most effective model for fraud prediction.
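AUC, the area under the ROC curve, can be read as the probability that a randomly chosen fraudulent transaction receives a higher score than a randomly chosen legitimate one. The following sketch computes it by direct pairwise comparison on made-up scores. It is meant to illustrate the metric, not to reproduce Orange's calculation.

```python
def roc_auc(labels, scores):
    """AUC as the share of (fraud, legitimate) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Illustrative scores only: one fraud case (0.4) is ranked below a legitimate one (0.7).
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
print(round(roc_auc(labels, scores), 3))  # 0.889
```

Because AUC summarizes ranking quality over all thresholds, a model with the highest AUC does not necessarily make the fewest errors at the single threshold used for the confusion matrix.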
Finally, we applied the trained Decision Tree model to the test dataset. Out of 2,100
transactions, the model predicted 272 fraudulent transactions, closely matching the
actual number of 283 fraudulent transactions. This small gap is consistent with the
model’s high accuracy rate of 99.48%, supporting its effectiveness in identifying
fraudulent financial transactions.
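The reported figures can be checked with a short calculation. The script gives only the totals, so the sketch assumes that every one of the 272 predicted frauds was a true fraud and that the only errors were the 11 missed cases. Under that assumption the numbers reproduce the 99.48% accuracy. If some predictions were false positives, the breakdown would differ.

```python
total = 2100            # test transactions (from the script)
actual_fraud = 283      # actual fraudulent transactions (from the script)
predicted_fraud = 272   # transactions predicted as fraud (from the script)

# Assumption (not stated in the script): all predicted frauds are true frauds,
# so the only errors are the frauds the model missed.
false_negatives = actual_fraud - predicted_fraud
false_positives = 0

accuracy = (total - false_negatives - false_positives) / total
recall = predicted_fraud / actual_fraud

print(false_negatives)      # 11
print(f"{accuracy:.2%}")    # 99.48%
print(f"{recall:.2%}")      # 96.11%
```

Under the same assumption, recall on the fraud class is lower than overall accuracy, which is typical when the positive class is a minority.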
Result

Our study successfully identified key patterns in fraudulent transactions and
developed an effective prediction model.

For Research Question 1, we found that fraudulent transactions often involve small,
frequent amounts, not just large values. The most common pattern, Fraud Scenario
1, featured rapid transactions from the same terminal or customer, suggesting
fraudsters may exploit timing strategies to evade detection.

For Research Question 2, while Logistic Regression had the highest AUC score, the
Decision Tree model proved most effective. It accurately predicted 272 fraudulent
cases out of 2,100 transactions, closely matching the actual figure of 283, with an
impressive 99.48% accuracy.

These results highlight that identifying transaction patterns and selecting the right
model greatly improves fraud detection.

Discussion
*Limitation: One issue was data imbalance, meaning there were far fewer
fraudulent transactions than legitimate ones. This may have made it harder for the
model to spot rare fraud cases. Another challenge was that the dataset had limited
details, which could prevent the model from detecting more complex fraud patterns.
*Future Direction:

First, some transactions lacked clear fraud labels. Techniques like clustering or
semi-supervised learning could help identify patterns in these cases.

Second, to address data imbalance, methods such as SMOTE or threshold
adjustments could improve the model’s ability to detect rare fraud cases.

Lastly, adding features like transaction frequency, merchant types, or location data
may help uncover more complex fraud patterns.

These improvements can further enhance fraud detection accuracy.
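Of the future directions above, threshold adjustment is the simplest to illustrate. A classifier usually outputs a fraud score, and a transaction is flagged when the score passes a cut-off. Lowering the cut-off catches more fraud at the cost of more false alarms. The scores, labels, and function names below are invented for the sketch.

```python
def recall_at(threshold, labels, scores):
    """Share of actual fraud cases whose score reaches the threshold."""
    caught = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    return caught / sum(labels)

def false_alarms_at(threshold, labels, scores):
    """Number of legitimate transactions whose score reaches the threshold."""
    return sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)

# Illustrative model scores: 3 fraud cases followed by 5 legitimate ones.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.45, 0.35, 0.4, 0.2, 0.1, 0.05, 0.6]

for t in (0.5, 0.3):
    print(t, round(recall_at(t, labels, scores), 2), false_alarms_at(t, labels, scores))
# 0.5 0.33 1
# 0.3 1.0 2
```

In this toy case, moving the threshold from 0.5 to 0.3 finds all three fraud cases but doubles the false alarms, so the right setting depends on the relative cost of the two errors.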

Conclusion
