CREDIT CARD FRAUD DETECTION
Submitted by: Vineeta and Shubham Chandel
What is credit card fraud?
• Credit card fraud is when someone uses your credit card or credit account to make a purchase you didn't authorize. This can happen in different ways: if you lose your credit card or have it stolen, it can be used to make purchases or other transactions, either in person or online.
What does credit card fraud detection do?
• The credit card fraud detection problem involves modeling past credit card transactions with the knowledge of which ones turned out to be fraudulent. The model is then used to identify whether a new transaction is fraudulent or not. Our aim here is to detect as many of the fraudulent transactions as possible while minimizing incorrect fraud classifications.
Some important terms:
• True Positive: The fraud cases that the model predicted as ‘fraud.’
• False Positive: The non-fraud cases that the model predicted as ‘fraud.’
• True Negative: The non-fraud cases that the model predicted as ‘non-fraud.’
• False Negative: The fraud cases that the model predicted as ‘non-fraud.’
• Accuracy: The measure of correct predictions made by the model – that is, the ratio
of fraud transactions classified as fraud and non-fraud classified as non-fraud to the
total transactions in the test data.
• Sensitivity: Sensitivity, or True Positive Rate, or Recall, is the ratio of correctly
identified fraud cases to total fraud cases.
• Specificity: Specificity, or True Negative Rate, is the ratio of correctly identified non-
fraud cases to total non-fraud cases.
• Precision: Precision is the ratio of correctly predicted fraud cases to total predicted
fraud cases.
• Confusion matrix: A confusion matrix is a table that is often used to describe the
performance of a classification model (or “classifier”) on a set of test data for which
the true values are known. It allows the visualization of the performance of an
algorithm.
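A minimal sketch of these terms in code (not part of the original slides), using scikit-learn's confusion_matrix on illustrative toy labels:

```python
from sklearn.metrics import confusion_matrix

# Toy labels standing in for test-set results; 1 = fraud, 0 = non-fraud.
y_true = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 0, 1, 0, 1, 0, 1, 0, 1, 0]

# For binary labels, ravel() returns counts in the order TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy    = (tp + tn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)   # recall / true positive rate
specificity = tn / (tn + fp)   # true negative rate
precision   = tp / (tp + fp)

print(tp, fp, tn, fn)                                  # 3 1 5 1
print(accuracy, sensitivity, specificity, precision)   # 0.8 0.75 0.833... 0.75
```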
ABOUT DATASET
• We gathered the data from Kaggle (https://www.kaggle.com/mlg-ulb/creditcardfraud)
• It has 31 variables and 284,807 observations
• Class is used as the factor variable
Let's take a look at the data set
• V1 to V28 are PCA-transformed features of the transactions
• Amount denotes the amount of the transaction
• Class is the factor variable (0 denotes a legitimate transaction and 1 denotes a fraudulent one)
• Time is the time of the transaction
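As a quick illustration (assuming the CSV has been downloaded from Kaggle as creditcard.csv), loading the data with pandas shows how skewed the classes are:

```python
import pandas as pd

df = pd.read_csv("creditcard.csv")   # columns: Time, V1..V28, Amount, Class

print(df.shape)                      # (284807, 31)
print(df["Class"].value_counts())    # 0 = legit, 1 = fraud
print(df["Class"].mean())            # fraud fraction, roughly 0.0017
```

The severe imbalance (far fewer fraud rows than legitimate ones) is what motivates the balancing techniques on the next slide.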
Functions used for balancing the dataset
• Random over sampling
Random oversampling duplicates examples from the
minority class in the training dataset and can result in
overfitting for some models.
• Random under sampling
Random undersampling deletes examples from the
majority class and can result in the loss of information
that is valuable to a model.
• Hybrid Sampling
A combination of random under sampling and random
over sampling.
• SMOTE
SMOTE (Synthetic Minority Oversampling Technique) synthesizes new examples for the minority class rather than duplicating existing ones.
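A sketch of the four approaches using the imbalanced-learn package (the slides do not name a library; imbalanced-learn and the synthetic data below are assumptions for illustration):

```python
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import RandomOverSampler, SMOTE
from imblearn.under_sampling import RandomUnderSampler

# Toy imbalanced data standing in for the credit-card training split.
X, y = make_classification(n_samples=10_000, weights=[0.99], random_state=42)
print("original:", Counter(y))

# Random oversampling: duplicate minority-class rows.
X_o, y_o = RandomOverSampler(random_state=42).fit_resample(X, y)

# Random undersampling: drop majority-class rows.
X_u, y_u = RandomUnderSampler(random_state=42).fit_resample(X, y)

# SMOTE: synthesize new minority-class rows by interpolating neighbours.
X_s, y_s = SMOTE(random_state=42).fit_resample(X, y)

# Hybrid sampling: partial oversampling followed by undersampling,
# so neither step has to be as extreme on its own.
X_t, y_t = RandomOverSampler(sampling_strategy=0.1, random_state=42).fit_resample(X, y)
X_h, y_h = RandomUnderSampler(sampling_strategy=0.5, random_state=42).fit_resample(X_t, y_t)

print("over:", Counter(y_o), " under:", Counter(y_u))
print("smote:", Counter(y_s), " hybrid:", Counter(y_h))
```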
Let's take a look at the model and try to understand it…
BEGINNING WITH THE UI
XGBOOST
• Method used in UI for fraud prediction
• XGBoost (Extreme Gradient Boosting) is an
optimized distributed gradient boosting
library.
• It provides parallel computing, regularization, built-in cross-validation, handling of missing values, flexibility, availability across platforms, saving and reloading of models, and tree pruning.
How does XGBOOST work?
• XGBoost belongs to a family of boosting
algorithms that convert weak learners into
strong learners. A weak learner is one which is
slightly better than random guessing.
• Boosting is a sequential process: trees are grown one after another, each using information from the previously grown trees. The model thus learns from the data gradually and improves its predictions in subsequent iterations.
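As a rough illustration of that sequential idea (a sketch, not the project's code), each new tree below is fit to the residual errors left by the trees before it:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate = 0.3            # plays the role of eta
prediction = np.zeros_like(y)
trees = []

for _ in range(50):            # plays the role of nrounds
    residual = y - prediction  # what the earlier trees got wrong
    stump = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    prediction += learning_rate * stump.predict(X)  # small corrective step
    trees.append(stump)

print("final training MSE:", np.mean((y - prediction) ** 2))
```

Each weak tree only nudges the prediction, but the accumulated sequence becomes a strong learner.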
Parameters of XGBOOST
• nrounds [default=100]
Controls the maximum number of boosting iterations, i.e. the number of trees grown.
• eta [default=0.3] [range: (0,1)]
Controls the learning rate, i.e. how much each tree contributes.
• gamma [default=0] [range: (0,Inf)]
Controls regularization: the minimum loss reduction required to make a split, which prevents overfitting.
• max_depth [default=6] [range: (0,Inf)]
Controls the maximum depth of each tree.
• min_child_weight [default=1] [range: (0,Inf)]
If a leaf node's sum of instance weights falls below min_child_weight, tree splitting stops.
• subsample [default=1] [range: (0,1)]
Controls the fraction of observations (rows) sampled for each tree.
• colsample_bytree [default=1] [range: (0,1)]
Controls the fraction of variables (columns) supplied to each tree.
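A sketch of how these parameters map onto the xgboost Python API (the slides' nrounds corresponds to num_boost_round here; the synthetic data is purely illustrative):

```python
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Toy imbalanced data standing in for the (balanced) training set.
X, y = make_classification(n_samples=5_000, weights=[0.98], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

params = {
    "objective": "binary:logistic",
    "eta": 0.3,                # learning rate
    "gamma": 0,                # minimum loss reduction to split
    "max_depth": 6,            # maximum depth of each tree
    "min_child_weight": 1,     # minimum instance weight in a leaf
    "subsample": 1,            # fraction of rows per tree
    "colsample_bytree": 1,     # fraction of columns per tree
}

dtrain = xgb.DMatrix(X_tr, label=y_tr)
dtest  = xgb.DMatrix(X_te, label=y_te)

model = xgb.train(params, dtrain, num_boost_round=100)  # nrounds
pred = (model.predict(dtest) > 0.5).astype(int)         # predicted class
print("test accuracy:", (pred == y_te).mean())
```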
LET’S LOOK AT UI
• THANK YOU