The document discusses credit card fraud detection. It defines credit card fraud as unauthorized purchases made using someone's credit card or account. Credit card fraud detection models past credit card transactions to identify fraudulent versus legitimate transactions. The model's performance is evaluated based on metrics like true positives, false positives, accuracy, sensitivity, specificity, and precision. The dataset used contains over 284,000 credit card transactions, with variables like amount and time, and a class variable indicating legitimate or fraudulent transactions. An XGBoost model is used for fraud prediction in the user interface. XGBoost is an optimized gradient boosting algorithm that converts weak learners into strong learners through sequential iterations to improve predictions.
Credit card fraud detection
Submitted by :
Shubham Chandel
2. What is credit card fraud?
• Credit card fraud is when someone uses
your credit card or credit account to make
a purchase you didn't authorize. This
activity can happen in different ways: If you
lose your credit card or have it stolen, it
can be used to make purchases or other
transactions, either in person or online.
3. What does credit card fraud detection do?
• The Credit Card Fraud Detection Problem
includes modeling past credit card
transactions with the knowledge of the ones
that turned out to be fraud. This model is
then used to identify whether a new
transaction is fraudulent or not. Our aim
here is to detect as many fraudulent
transactions as possible while minimizing
incorrect fraud classifications.
4. Some important terms:
• True Positive: The fraud cases that the model predicted as ‘fraud.’
• False Positive: The non-fraud cases that the model predicted as ‘fraud.’
• True Negative: The non-fraud cases that the model predicted as ‘non-fraud.’
• False Negative: The fraud cases that the model predicted as ‘non-fraud.’
• Accuracy: The measure of correct predictions made by the model – that is, the ratio
of fraud transactions classified as fraud and non-fraud classified as non-fraud to the
total transactions in the test data.
• Sensitivity: Sensitivity, or True Positive Rate, or Recall, is the ratio of correctly
identified fraud cases to total fraud cases.
• Specificity: Specificity, or True Negative Rate, is the ratio of correctly identified non-
fraud cases to total non-fraud cases.
• Precision: Precision is the ratio of correctly predicted fraud cases to total predicted
fraud cases.
• Confusion matrix: A confusion matrix is a table that is often used to describe the
performance of a classification model (or “classifier”) on a set of test data for which
the true values are known. It allows the visualization of the performance of an algorithm.
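The terms above can be computed directly from predicted and true labels. The following is a minimal sketch, assuming labels are encoded as 1 = fraud and 0 = legitimate; the function names and the toy labels are made up for illustration and are not taken from the project.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN with 1 = fraud and 0 = legitimate."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity and precision as defined above."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        # Guard against empty denominators (e.g. no fraud predicted at all).
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }


# Toy labels (illustrative only): 2 fraud cases among 10 transactions.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
result = metrics(y_true, y_pred)

# A "model" that never predicts fraud, for comparison.
always_legit = metrics(y_true, [0] * len(y_true))
```

On this toy data the never-fraud baseline reaches the same accuracy as the imperfect classifier but has zero sensitivity, which is why accuracy alone is a weak measure on heavily imbalanced fraud data.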
• We gathered the data from
• It has 32 variables and 284807 observations
• Class is used as the factor variable
7. • V1 to V28 are the transaction features
• Amount denotes the amount of the transaction
• Class is the factor variable (0 denotes a legitimate transaction and 1
denotes a fraudulent transaction)
• Time is the time of the transaction
8. Functions used for balancing the dataset
• Random oversampling
Random oversampling duplicates examples from the
minority class in the training dataset and can result in
overfitting for some models.
• Random undersampling
Random undersampling deletes examples from the
majority class and can result in losing information
invaluable to a model.
• Hybrid sampling
Hybrid sampling combines random undersampling and random
oversampling.
• SMOTE
SMOTE synthesizes new examples for the minority class.
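The four balancing approaches can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the functions used in the project: the helper names, the toy rows and the choice of `k` are hypothetical, and `smote_like` is a simplified version of SMOTE that interpolates between a minority row and one of its nearest minority neighbours.

```python
import random


def random_oversample(minority, target, rng):
    """Duplicate randomly chosen minority rows until there are `target` rows."""
    extra = [rng.choice(minority) for _ in range(target - len(minority))]
    return minority + extra


def random_undersample(majority, target, rng):
    """Keep a random subset of `target` majority rows (without replacement)."""
    return rng.sample(majority, target)


def smote_like(minority, n_new, k, rng):
    """Synthesize rows on the line between a minority row and a near neighbour."""
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [r for j, r in enumerate(minority) if j != i]
        # k nearest minority neighbours by squared Euclidean distance.
        neighbours = sorted(
            others, key=lambda r: sum((a - b) ** 2 for a, b in zip(base, r))
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Toy two-feature rows (illustrative only): 8 legitimate, 3 fraud.
majority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (0.5, 1.0),
            (1.2, 0.9), (2.5, 2.0), (0.8, 1.5), (1.9, 1.1)]
minority = [(8.0, 9.0), (9.0, 8.5), (8.5, 9.5)]

over = random_oversample(minority, len(majority), rng)    # 8 fraud rows
under = random_undersample(majority, len(minority), rng)  # 3 legitimate rows

# Hybrid: shrink the majority and grow the minority to a common middle size.
target = (len(minority) + len(majority)) // 2
hybrid = (random_undersample(majority, target, rng)
          + random_oversample(minority, target, rng))

synth = smote_like(minority, n_new=5, k=2, rng=rng)
```

Oversampling only repeats existing fraud rows, whereas the SMOTE-style rows are new points that lie between existing fraud rows. In practice, resampling is normally applied to the training split only, so that the test data keeps its original class balance.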
Let's take a look at the model and try to
understand it…
• Method used in UI for fraud prediction
• XGBoost (Extreme Gradient Boosting) is an
optimized distributed gradient boosting library
• It provides parallel computing, regularization,
built-in cross-validation, missing-value handling,
flexibility, availability, save and reload, tree
12. How does XGBoost work?
• XGBoost belongs to a family of boosting
algorithms that convert weak learners into
strong learners. A weak learner is one which is
slightly better than random guessing.
• Boosting is a sequential process; i.e., trees are
grown using the information from a previously
grown tree one after the other. This process
slowly learns from data and tries to improve its
prediction in subsequent iterations.
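The sequential idea can be illustrated with a small gradient boosting loop. This sketch is plain gradient boosting with squared-error loss on one-split trees (decision stumps), not XGBoost's own regularized algorithm; the function names and the toy data are assumptions made for illustration.

```python
def fit_stump(xs, residuals):
    """Weak learner: the single threshold split that best fits the residuals."""
    best = None
    # Dropping the largest value keeps both sides of every split non-empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def boost(xs, ys, n_rounds, eta):
    """Grow stumps one after another, each fitted to the current residuals."""
    base = sum(ys) / len(ys)          # start from the mean prediction
    preds = [base] * len(xs)
    stumps, history = [], []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        # eta shrinks each stump's contribution (the learning rate).
        preds = [p + eta * predict_stump(stump, x) for p, x in zip(preds, xs)]
        stumps.append(stump)
        history.append(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))
    return base, stumps, history


def predict(base, stumps, eta, x):
    return base + eta * sum(predict_stump(s, x) for s in stumps)


# Toy one-feature regression data (illustrative only).
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.5, 2.0, 2.4, 4.0, 4.2, 5.1, 5.5]
base, stumps, history = boost(xs, ys, n_rounds=20, eta=0.3)
```

Each entry of `history` is the training mean squared error after one more stump has been added. It never increases from one round to the next here, which mirrors the slide's point that the model improves its prediction slowly over subsequent iterations.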
13. Parameters of XGBoost
• nrounds[default=100]
It controls the maximum number of iterations/growth of trees
• eta[default=0.3][range: (0,1)]
It controls the learning rate
• gamma[default=0][range: (0,Inf)]
It sets the minimum loss reduction required to make a split; larger values regularize more and help prevent overfitting
• max_depth[default=6][range: (0,Inf)]
It controls the maximum depth of each tree
• min_child_weight[default=1][range:(0,Inf)]
If a leaf node has a sum of instance weights lower than
min_child_weight, the tree stops splitting at that node.
• subsample[default=1][range: (0,1)]
It controls the fraction of samples (observations) supplied to a tree.
• colsample_bytree[default=1][range: (0,1)]
It controls the fraction of variables (features) supplied to a tree.
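The parameters above can be collected in one place and checked before training. The sketch below uses the defaults listed on the slide; `CHECKS` and `invalid_params` are hypothetical helpers written for illustration, not part of XGBoost. As an assumption, a bound is treated as inclusive wherever the listed default sits on it (for example gamma = 0 and subsample = 1).

```python
# Defaults as listed on the slide.
DEFAULTS = {
    "nrounds": 100,
    "eta": 0.3,
    "gamma": 0,
    "max_depth": 6,
    "min_child_weight": 1,
    "subsample": 1,
    "colsample_bytree": 1,
}

# One range check per parameter (hypothetical helper table).
CHECKS = {
    "nrounds": lambda v: isinstance(v, int) and v >= 1,
    "eta": lambda v: 0 < v <= 1,
    "gamma": lambda v: v >= 0,
    "max_depth": lambda v: isinstance(v, int) and v >= 1,
    "min_child_weight": lambda v: v >= 0,
    "subsample": lambda v: 0 < v <= 1,
    "colsample_bytree": lambda v: 0 < v <= 1,
}


def invalid_params(params):
    """Return the sorted names of parameters that fall outside their range."""
    return sorted(
        name for name, value in params.items()
        if name in CHECKS and not CHECKS[name](value)
    )


# A more conservative setting: smaller steps, shallower trees, and only
# part of the rows and columns shown to each tree.
tuned = {**DEFAULTS, "eta": 0.1, "max_depth": 4,
         "subsample": 0.8, "colsample_bytree": 0.8}

problems_default = invalid_params(DEFAULTS)
problems_tuned = invalid_params(tuned)
problems_bad = invalid_params({**DEFAULTS, "eta": 1.5, "subsample": 0})
```

The name `nrounds` is the one used by the R interface. In the XGBoost Python package the number of rounds is passed as `num_boost_round` to `xgboost.train`, while the other entries go into the parameter dictionary.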