Script KHDL

The presentation discusses predicting fraudulent financial transactions using a dataset of 1.75 million simulated transactions. It identifies key characteristics of fraud, such as the prevalence of small, frequent transactions, and evaluates three models, concluding that the Decision Tree model is the most effective with a 99.48% accuracy rate. Future work includes addressing data imbalance and incorporating additional features to improve detection capabilities.

Intro

Today, I would like to present on the topic "Predicting Fraudulent Financial
Transactions", an important issue in the field of finance and data technology.
Our presentation will outline the report’s key sections, including the dataset
overview, methodology, experiment, results, discussion and conclusions.

Problem Formulation

Financial fraud is becoming increasingly complex, now occurring not only in high-
value transactions but also in frequent, low-value ones, especially during low-
surveillance periods. To investigate this, we analyzed 1.75 million transactions from
simulated users across various terminals, covering the period from January to June
2023. Our goal is to identify key characteristics of fraudulent transactions and
propose an effective prediction method to enhance detection accuracy.

Research questions
In our study, we focus on two key research questions. First, "What characteristics
distinguish fraudulent financial transactions?" Second, "How can financial transaction
fraud be predicted accurately using historical transaction data?"

Based on the data, we see that there are many reasons for fraudulent transactions.
To improve accuracy and make our conclusions more focused, we aim to identify the
main factors that lead to financial fraud. This is also the first research question we
want to explore.

We also want to know whether a model built on past data can still make accurate
predictions when it is applied to new transactions. Answering this lets us measure
how well the model works on unseen data, which is why we chose it as our second
research question.

Related work
In “Using machine learning Meta-Classifiers to detect financial frauds”, Achakzai
and Juan (2022) employed machine learning meta-classifiers to detect financial
fraud. They found that meta-classifiers can outperform the best stand-alone
classifiers across different performance metrics and improve predictive performance
over traditional statistical methods.
Methodology
Our study relied on two key components: the Orange application as our analytical
tool and data classification as our analytical technique. Orange is an open-source
data mining and visualization platform that allowed us to analyze data, build
models, and visualize results effectively. Data classification is the process of
predicting the class (or multiple classes) of a given data object based on a
predefined classification model. We tested three classification models to predict
fraudulent transactions: Logistic Regression, Decision Tree, and Support Vector
Machine (SVM). After evaluating their performance, we selected the most effective
model to improve fraud detection accuracy.
Experiment
Introduction of dataset
We utilized the "Fraudulent Transaction Detection" dataset, sourced from Kaggle, a
reputable online platform for research and learning. This dataset was used to
address the research questions posed, aiming to predict the likelihood of fraud in
financial transactions and derive the final conclusions for our project. Below is the
data structure of the "Fraudulent Transaction Detection" dataset, including its
attributes, meanings, and roles within this dataset.
From the dataset containing 1.05 million instances, we reduced the data to 7,000
instances, aiming to minimize the impact on data representativeness while dealing
with a severe class imbalance (fraudulent transactions account for a significantly
smaller proportion than legitimate transactions).
Data preprocessing
Since the dataset contained no missing values or noisy data, we retained the original
data without adjustments. Using the Rank widget in Orange, we identified five
variables (CUSTOMER_ID, TX_TIME_SECONDS, TRANSACTION_ID, TX_TIME_DAYS, and
TERMINAL_ID) as having minimal impact on fraud detection. These variables were
removed to improve model efficiency. Additionally, we used the Edit Domain widget
to rename certain variables for improved clarity during analysis.
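
For readers who want to reproduce the column removal outside Orange, the step can be sketched in plain Python. This is only an illustration, not the workflow we ran: the sample row is made up, and the column names TX_AMOUNT and TX_FRAUD are assumptions about the remaining attributes.

```python
# Columns the script reports as removed after ranking them in Orange.
DROPPED = {"CUSTOMER_ID", "TX_TIME_SECONDS", "TRANSACTION_ID",
           "TX_TIME_DAYS", "TERMINAL_ID"}


def drop_low_impact(rows):
    """Return copies of the rows without the low-impact columns."""
    return [{k: v for k, v in row.items() if k not in DROPPED} for row in rows]


# Hypothetical sample row; TX_AMOUNT and TX_FRAUD are assumed column names.
sample = [{"TRANSACTION_ID": 1, "CUSTOMER_ID": 42, "TERMINAL_ID": 7,
           "TX_TIME_SECONDS": 31, "TX_TIME_DAYS": 0,
           "TX_AMOUNT": 57.16, "TX_FRAUD": 0}]
cleaned = drop_low_impact(sample)
print(cleaned)
```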

Research Question 1: What characteristics distinguish fraudulent financial transactions?

To answer this question, we analyzed the dataset to identify key patterns that set
fraudulent transactions apart from legitimate ones.

First, we examined the transaction amounts. While many people assume that
fraudulent transactions are usually high-value, our findings showed otherwise.
Fraudulent transactions appeared across various value ranges, including small and
moderate amounts. This suggests that fraudsters may intentionally conduct low-
value transactions to avoid detection.

Next, we explored fraud scenarios in the dataset. Among the identified fraud cases,
Fraud Scenario 1 was the most common. This scenario often involved a series of
rapid transactions from the same terminal or customer within a short period. This
pattern aligns with the “burst fraud” tactic, where multiple low-value transactions are
processed quickly to bypass security systems.
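
The burst pattern described above can be made concrete with a small sliding-window check. This is a minimal sketch under our own assumptions: the function name, the 10-minute window, and the threshold of three transactions are hypothetical choices for illustration, not values taken from the dataset.

```python
from collections import defaultdict


def find_bursts(transactions, window_seconds=600, min_count=3):
    """Return terminal IDs with at least min_count transactions in any window.

    transactions: iterable of (terminal_id, timestamp_seconds) pairs.
    """
    by_terminal = defaultdict(list)
    for terminal_id, ts in transactions:
        by_terminal[terminal_id].append(ts)

    flagged = set()
    for terminal_id, times in by_terminal.items():
        times.sort()
        start = 0
        for end in range(len(times)):
            # Shrink the window until it spans at most window_seconds.
            while times[end] - times[start] > window_seconds:
                start += 1
            if end - start + 1 >= min_count:
                flagged.add(terminal_id)
                break
    return flagged


# Hypothetical data: T1 has three transactions within five minutes, T2 does not.
txs = [("T1", 0), ("T1", 120), ("T1", 300),
       ("T2", 0), ("T2", 5000), ("T2", 9000)]
bursts = find_bursts(txs)
print(bursts)  # {'T1'}
```

The same check could be keyed on the customer instead of the terminal by passing customer IDs as the first element of each pair.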

These findings highlight that fraudulent transactions are not limited to large amounts
but can also occur in smaller, frequent transactions. Identifying these patterns is
crucial for developing effective fraud detection models.

Data Classification (RQ2)


Next, we used the Data Sampler widget in Orange to split the dataset into two parts:
70% for model training and 30% for prediction and evaluation. For the model-building
process, we applied three classification techniques: Logistic Regression, Decision
Tree, and Support Vector Machine (SVM). Each model was evaluated using the Test
& Score widget with 5-fold cross-validation to ensure consistent and reliable results.
While Logistic Regression achieved the highest AUC score (0.979), we found that
the Decision Tree model performed better when we looked deeper into the results.
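
To show what the 70/30 split and 5-fold cross-validation mean in terms of row counts, here is a rough sketch in plain Python. It is not how Orange implements these steps; we assume a simple random (non-stratified) split and that cross-validation runs on the training portion, and the helper names are our own.

```python
import random


def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of rows and split it into train and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


def k_fold_indices(n, k=5):
    """Yield (train_idx, val_idx) index lists for k roughly equal folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size


rows = list(range(7000))              # stand-in for the 7,000 sampled instances
train, test = train_test_split(rows)
print(len(train), len(test))          # 4900 2100
folds = list(k_fold_indices(len(train)))
print([len(val) for _, val in folds])  # [980, 980, 980, 980, 980]
```

With 7,000 instances, the 30% portion is exactly the 2,100 test transactions reported later in the script.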

Using the Confusion Matrix, we saw that the Decision Tree model had fewer
mistakes in identifying both legitimate and fraudulent transactions. This is important
because missing a fraudulent transaction (a false negative) can lead to serious
financial losses, while wrongly marking a legitimate transaction as fraud (a false
positive) can frustrate customers. Although Logistic Regression scored well overall, it
misclassified more transactions compared to the Decision Tree model.
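
The two error types in a confusion matrix can be counted directly from the actual and predicted labels. The sketch below uses made-up labels purely for illustration (1 = fraud, 0 = legitimate); it does not reproduce our actual confusion matrix.

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (tp, fp, fn, tn) for binary labels."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1          # fraud correctly flagged
        elif p == positive:
            fp += 1          # legitimate transaction wrongly flagged
        elif a == positive:
            fn += 1          # fraud missed
        else:
            tn += 1          # legitimate transaction correctly passed
    return tp, fp, fn, tn


# Hypothetical labels: one missed fraud and one false alarm.
actual    = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 0, 0, 1, 0]
tp, fp, fn, tn = confusion_counts(actual, predicted)
recall = tp / (tp + fn)      # share of real fraud that was caught
precision = tp / (tp + fp)   # share of fraud alerts that were correct
print(tp, fp, fn, tn)        # 2 1 1 4
```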

The ROC Curve also confirmed this. While all three models — Logistic Regression,
Decision Tree, and SVM — showed good performance, the Decision Tree curve was
closest to the top-left corner. This position indicates the best balance between
correctly detecting fraud and avoiding false alarms. As a result, the Decision Tree
method was selected as the most effective model for fraud prediction.
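
It may help to recall what AUC measures: the probability that a randomly chosen fraudulent transaction receives a higher score than a randomly chosen legitimate one. AUC summarizes the whole ROC curve across all thresholds, while a confusion matrix reflects a single threshold, so a model can have the highest AUC and still misclassify more cases at the threshold actually used. A minimal sketch with made-up scores:

```python
def auc(labels, scores):
    """Pairwise AUC: fraction of (fraud, legitimate) pairs ranked correctly.

    Ties count as half a correct ranking.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical fraud scores; one fraud case (0.4) ranks below a legitimate one (0.5).
labels = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.2, 0.1]
value = auc(labels, scores)
print(round(value, 3))  # 0.833
```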

Finally, we applied the trained Decision Tree model to the test dataset. Out of 2,100
transactions, the model predicted 272 fraudulent transactions, close to the actual
number of 283 fraudulent transactions. The model achieved an accuracy rate of
99.48% on this test set, confirming its effectiveness in identifying fraudulent
financial transactions.
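
As a sanity check on these figures, the arithmetic below shows one reading of the counts that reproduces 99.48%. The script does not give the full confusion matrix, so the assumption that there were no false positives is ours; matching totals alone would not guarantee this accuracy, because false positives and false negatives could offset each other.

```python
total = 2100
actual_fraud = 283
predicted_fraud = 272

# Assumption (not stated in the script): every predicted fraud is a real fraud,
# so the only errors are the fraud cases the model missed.
false_positives = 0
true_positives = predicted_fraud - false_positives
false_negatives = actual_fraud - true_positives
true_negatives = total - actual_fraud - false_positives

accuracy = (true_positives + true_negatives) / total
recall = true_positives / actual_fraud
print(f"accuracy={accuracy:.4f} recall={recall:.4f}")
# accuracy=0.9948 recall=0.9611
```

Under this assumption the model misses 11 of the 283 fraud cases, which is why recall is a useful figure to report alongside accuracy.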
Result

Our study successfully identified key patterns in fraudulent transactions and
developed an effective prediction model.

For Research Question 1, we found that fraudulent transactions often involve small,
frequent amounts, not just large values. The most common pattern, Fraud Scenario
1, featured rapid transactions from the same terminal or customer, suggesting
fraudsters may exploit timing strategies to evade detection.

For Research Question 2, while Logistic Regression had the highest AUC score, the
Decision Tree model proved most effective. On the 2,100 test transactions, it
predicted 272 fraudulent cases, closely matching the actual figure of 283, with
99.48% accuracy.

These results highlight that identifying transaction patterns and selecting the right
model greatly improve fraud detection.

Discussion
*Limitation: One issue was data imbalance, meaning there were far fewer
fraudulent transactions than legitimate ones. This may have made it harder for the
model to spot rare fraud cases. Another challenge was that the dataset had limited
details, which could prevent the model from detecting more complex fraud patterns.
*Future Direction:

First, some transactions lacked clear fraud labels. Techniques like clustering or
semi-supervised learning could help identify patterns in these cases.

Second, to address data imbalance, methods such as SMOTE or threshold
adjustments could improve the model’s ability to detect rare fraud cases.

Lastly, adding features like transaction frequency, merchant types, or location data
may help uncover more complex fraud patterns.

These improvements can further enhance fraud detection accuracy.
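
Of the ideas above, threshold adjustment is the simplest to illustrate. The sketch below uses made-up fraud probabilities and labels, and the helper names are hypothetical; it only shows that lowering the decision threshold can catch more fraud cases, usually at the cost of more false alarms on real data.

```python
def predict_with_threshold(probabilities, threshold=0.5):
    """Label a transaction as fraud (1) when its probability meets the threshold."""
    return [1 if p >= threshold else 0 for p in probabilities]


def recall(actual, predicted):
    """Share of actual fraud cases that were predicted as fraud."""
    caught = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    return caught / sum(actual)


# Hypothetical model outputs: two fraud cases score below the default 0.5.
probs  = [0.9, 0.45, 0.35, 0.2, 0.1, 0.05]
actual = [1,   1,    1,    0,   0,   0]

recall_default = recall(actual, predict_with_threshold(probs, 0.5))
recall_lowered = recall(actual, predict_with_threshold(probs, 0.3))
print(round(recall_default, 2), round(recall_lowered, 2))  # 0.33 1.0
```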

Conclusion
