
Credit_Card_fraud_detection Using ML - Jupyter Notebook2

The document outlines a Jupyter Notebook for credit card fraud detection using machine learning techniques. It includes steps for data loading, preprocessing, and statistical analysis of a dataset containing valid and fraudulent transactions. The notebook also addresses class imbalance by sampling valid transactions to create a balanced dataset for further analysis.

Uploaded by

balajikukkapalli

6/17/23, 8:13 AM Credit_Card_fraud_detection using ML - Jupyter Notebook

In [24]:
!pip install scikit-learn

Requirement already satisfied: scikit-learn in c:\python39\lib\site-packages (1.2.2)
Requirement already satisfied: numpy>=1.17.3 in c:\python39\lib\site-packages (from scikit-learn) (1.24.1)
Requirement already satisfied: scipy>=1.3.2 in c:\python39\lib\site-packages (from scikit-learn) (1.10.1)
Requirement already satisfied: joblib>=1.1.1 in c:\python39\lib\site-packages (from scikit-learn) (1.2.0)
Requirement already satisfied: threadpoolctl>=2.0.0 in c:\python39\lib\site-packages (from scikit-learn) (3.1.0)

In [2]:

#IMPORT THE REQUIRED LIBRARIES

In [1]:

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler as SS

In [2]:
#LOAD THE DATASET USING PANDAS LIBRARY

In [3]:
dataset=pd.read_csv('creditcard.csv')

In [4]:
#VIEWING THE DATASET USING head() and tail()


In [5]:
dataset.head()

Out[5]:

   Time        V1        V2        V3        V4        V5        V6        V7        V8  ...
0   0.0 -1.359807 -0.072781  2.536347  1.378155 -0.338321  0.462388  0.239599  0.098698  ...
1   0.0  1.191857  0.266151  0.166480  0.448154  0.060018 -0.082361 -0.078803  0.085102  ...
2   1.0 -1.358354 -1.340163  1.773209  0.379780 -0.503198  1.800499  0.791461  0.247676  ...
3   1.0 -0.966272 -0.185226  1.792993 -0.863291 -0.010309  1.247203  0.237609  0.377436  ...
4   2.0 -1.158233  0.877737  1.548718  0.403034 -0.407193  0.095921  0.592941 -0.270533  ...

5 rows × 31 columns

In [6]:
dataset.tail()

Out[6]:

            Time         V1         V2        V3        V4        V5        V6        V7  ...
284802  172786.0 -11.881118  10.071785 -9.834783 -2.066656 -5.364473 -2.606837 -4.918215  ...
284803  172787.0  -0.732789  -0.055080  2.035030 -0.738589  0.868229  1.058415  0.024330  ...
284804  172788.0   1.919565  -0.301254 -3.249640 -0.557828  2.630515  3.031260 -0.296827  ...
284805  172788.0  -0.240440   0.530483  0.702510  0.689799 -0.377961  0.623708 -0.686180  ...
284806  172792.0  -0.533413  -0.189733  0.703337 -0.506271 -0.012546 -0.649617  1.577006  ...

5 rows × 31 columns

In [7]:
#VIEW THE SHAPE OF THE DATASET

In [14]:
dataset.shape

Out[14]:

(284807, 31)

In [ ]:
#DROPPING THE TIME COLUMN


In [8]:
dataset=dataset.drop(['Time'],axis=1)

In [9]:
dataset.shape

Out[9]:

(284807, 30)

In [10]:
#CHECKING FOR NULL VALUES IN THE DATASET

In [11]:
dataset.isnull().sum()

Out[11]:

V1 0
V2 0
V3 0
V4 0
V5 0
V6 0
V7 0
V8 0
V9 0
V10 0
V11 0
V12 0
V13 0
V14 0
V15 0
V16 0
V17 0
V18 0
V19 0
V20 0
V21 0
V22 0
V23 0
V24 0
V25 0
V26 0
V27 0
V28 0
Amount 0
Class 0
dtype: int64


In [12]:
#INFORMATION ABOUT DATASET FEATURES

In [13]:
dataset.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 284807 entries, 0 to 284806
Data columns (total 30 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 V1 284807 non-null float64
1 V2 284807 non-null float64
2 V3 284807 non-null float64
3 V4 284807 non-null float64
4 V5 284807 non-null float64
5 V6 284807 non-null float64
6 V7 284807 non-null float64
7 V8 284807 non-null float64
8 V9 284807 non-null float64
9 V10 284807 non-null float64
10 V11 284807 non-null float64
11 V12 284807 non-null float64
12 V13 284807 non-null float64
13 V14 284807 non-null float64
14 V15 284807 non-null float64
15 V16 284807 non-null float64
16 V17 284807 non-null float64
17 V18 284807 non-null float64
18 V19 284807 non-null float64
19 V20 284807 non-null float64
20 V21 284807 non-null float64
21 V22 284807 non-null float64
22 V23 284807 non-null float64
23 V24 284807 non-null float64
24 V25 284807 non-null float64
25 V26 284807 non-null float64
26 V27 284807 non-null float64
27 V28 284807 non-null float64
28 Amount 284807 non-null float64
29 Class 284807 non-null int64
dtypes: float64(29), int64(1)
memory usage: 65.2 MB

In [14]:
#STATISTICAL MEASURES OF THE DATASET FEATURES


In [15]:
dataset.describe()

Out[15]:

                 V1            V2            V3            V4            V5            V6
count  2.848070e+05  2.848070e+05  2.848070e+05  2.848070e+05  2.848070e+05  2.848070e+05
mean   1.168375e-15  3.416908e-16 -1.379537e-15  2.074095e-15  9.604066e-16  1.487313e-15
std    1.958696e+00  1.651309e+00  1.516255e+00  1.415869e+00  1.380247e+00  1.332271e+00
min   -5.640751e+01 -7.271573e+01 -4.832559e+01 -5.683171e+00 -1.137433e+02 -2.616051e+01
25%   -9.203734e-01 -5.985499e-01 -8.903648e-01 -8.486401e-01 -6.915971e-01 -7.682956e-01
50%    1.810880e-02  6.548556e-02  1.798463e-01 -1.984653e-02 -5.433583e-02 -2.741871e-01
75%    1.315642e+00  8.037239e-01  1.027196e+00  7.433413e-01  6.119264e-01  3.985649e-01
max    2.454930e+00  2.205773e+01  9.382558e+00  1.687534e+01  3.480167e+01  7.330163e+01

8 rows × 30 columns

In [16]:
#COUNTING THE TARGET VALUES OF VALID AND FRAUD TRANSACTIONS

In [17]:

dataset['Class'].value_counts()

Out[17]:

0 284315
1 492
Name: Class, dtype: int64
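
The two counts above imply a heavily imbalanced dataset. As a quick arithmetic check (using only the printed counts; the variable names are illustrative), a hypothetical model that labels every transaction as valid would already score above 99.8% accuracy on the full data, which is why plain accuracy on the unbalanced data says little about fraud detection.

```python
# Counts taken from the value_counts() output above.
n_valid = 284315
n_fraud = 492
n_total = n_valid + n_fraud

# Share of fraudulent transactions in the full dataset.
fraud_rate = n_fraud / n_total

# Accuracy of a hypothetical baseline that always predicts "valid" (Class 0).
baseline_accuracy = n_valid / n_total

print(f"fraud rate: {fraud_rate:.4%}")
print(f"always-valid baseline accuracy: {baseline_accuracy:.4%}")
```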

In [18]:
#SEPARATING VALID AND FRAUD TRANSACTIONS FOR ANALYSIS

In [20]:
real=dataset[dataset.Class==0]
fraud=dataset[dataset.Class==1]

In [21]:
#VIEW THE SHAPES OF VALID AND FRAUD TRANSACTIONS DATA

In [22]:

print(real.shape,fraud.shape)

(284315, 30) (492, 30)


In [23]:
#STATISTICAL MEASURES ON VALID AND FRAUD DATA - AMOUNT COLUMN

In [24]:
real.Amount.describe()

Out[24]:

count 284315.000000
mean 88.291022
std 250.105092
min 0.000000
25% 5.650000
50% 22.000000
75% 77.050000
max 25691.160000
Name: Amount, dtype: float64

In [25]:

fraud.Amount.describe()

Out[25]:

count 492.000000
mean 122.211321
std 256.683288
min 0.000000
25% 1.000000
50% 9.250000
75% 105.890000
max 2125.870000
Name: Amount, dtype: float64

In [26]:

#BALANCING THE DATASET BY UNDERSAMPLING VALID TRANSACTIONS TO MATCH THE FRAUD COUNT

In [27]:
real_trans=real.sample(n=492)

In [28]:
#SHAPE OF CONVERTED VALID TRANSACTIONS DATA

In [29]:

print(real_trans.shape)

(492, 30)
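
Because `sample(n=492)` draws rows at random, each rerun of the notebook selects different valid transactions and the later scores shift slightly. A minimal sketch of a reproducible draw follows. It uses a small synthetic frame rather than creditcard.csv, and the seed value 42 and the variable names are arbitrary assumptions, not taken from the notebook.

```python
import pandas as pd

# Small synthetic stand-in for the real data: 20 valid rows, 4 fraud rows.
toy = pd.DataFrame({
    "Amount": [float(i) for i in range(24)],
    "Class": [0] * 20 + [1] * 4,
})

valid_rows = toy[toy["Class"] == 0]
fraud_rows = toy[toy["Class"] == 1]

# random_state fixes the random draw so reruns select the same rows.
# The value 42 is an arbitrary choice for illustration.
valid_sample = valid_rows.sample(n=len(fraud_rows), random_state=42)

# Stack the undersampled valid rows on top of all fraud rows.
balanced = pd.concat([valid_sample, fraud_rows], axis=0)
print(balanced["Class"].value_counts())
```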


In [30]:
#CONCATENATING THE NEW VALID TRANSACTIONS DATA AND FRAUD DATA TO FORM NEW DATASET

In [31]:
new_dataset=pd.concat([real_trans,fraud],axis=0)

In [32]:
new_dataset.shape

Out[32]:

(984, 30)


In [33]:
print(new_dataset)

V1 V2 V3 V4 V5 V6 V7 \
12944 1.244831 -0.323488 0.669388 -0.759828 -0.533733 0.174007 -0.728925
197932 -2.553810 -0.083991 -1.936271 -1.333610 1.151432 -1.256007 0.601789
118594 1.274797 0.134625 0.175297 0.316727 -0.072007 -0.208877 -0.098056
62022 -2.494865 2.322834 -0.348792 0.227640 -2.164643 -0.456763 -0.424845
33852 1.234848 0.693741 -0.570387 0.812455 0.402725 -0.960529 0.412969
... ... ... ... ... ... ... ...
279863 -1.927883 1.125653 -4.518331 1.749293 -1.566487 -2.010494 -0.882850
280143 1.378559 1.289381 -5.004247 1.411850 0.442581 -1.326536 -1.413170
280149 -0.676143 1.126366 -2.213700 0.468308 -1.120541 -0.003346 -2.234739
281144 -3.113832 0.585864 -5.399730 1.817092 -0.840618 -2.943548 -2.208002
281674 1.991976 0.158476 -2.583441 0.408670 1.151147 -0.096695 0.223050

V8 V9 V10 ... V21 V22 V23 \
12944 0.059987 3.121899 -1.505427 ... -0.180848 0.003462 -0.111213
197932 0.309391 -0.044965 -0.907717 ... 0.094768 1.383915 0.429534
118594 0.027570 -0.069581 0.121424 ... -0.266693 -0.832113 0.028693
62022 1.691762 0.115487 -0.478802 ... 0.006396 0.136100 0.061866
33852 -0.246613 -0.349505 -0.791390 ... -0.112713 -0.174721 -0.067972
... ... ... ... ... ... ... ...
279863 0.697211 -2.064945 -5.587794 ... 0.778584 -0.319189 0.639419
280143 0.248525 -1.127396 -3.232153 ... 0.370612 0.028234 -0.145640
280149 1.210158 -0.652250 -3.463891 ... 0.751826 0.834108 0.190944
281144 1.058733 -1.632333 -5.245984 ... 0.583276 -0.269209 -0.456108
281674 -0.068384 0.577829 -0.888722 ... -0.164350 -0.295135 -0.072173

V24 V25 V26 V27 V28 Amount Class
12944 -0.809456 0.520126 -0.644821 0.081947 0.019252 11.85 0
197932 0.319040 1.896602 0.245503 0.091951 -0.242711 33.24 0
118594 -0.518209 0.277836 0.130337 -0.037812 -0.000764 1.98 0
62022 0.430796 0.096930 0.392947 0.259631 0.084802 119.99 0
33852 -0.005488 0.537779 0.394634 -0.006489 0.042001 0.76 0
... ... ... ... ... ... ... ...
279863 -0.294885 0.537503 0.788395 0.292680 0.147968 390.00 1
280143 -0.081049 0.521875 0.739467 0.389152 0.186637 0.76 1
280149 0.032070 -0.739695 0.471111 0.385107 0.194361 77.89 1
281144 -0.183659 -0.328168 0.606116 0.884876 -0.253700 245.00 1
281674 -0.450261 0.313267 -0.289617 0.002988 -0.015309 42.53 1

[984 rows x 30 columns]

In [34]:

#DIVIDING THE DATA INTO DEPENDENT AND INDEPENDENT VARIABLES

In [35]:
X=new_dataset.iloc[:,:-1]
y=new_dataset['Class']


In [36]:
print(X)
print(y)

V1 V2 V3 V4 V5 V6 V7 \
12944 1.244831 -0.323488 0.669388 -0.759828 -0.533733 0.174007 -0.728925
197932 -2.553810 -0.083991 -1.936271 -1.333610 1.151432 -1.256007 0.601789
118594 1.274797 0.134625 0.175297 0.316727 -0.072007 -0.208877 -0.098056
62022 -2.494865 2.322834 -0.348792 0.227640 -2.164643 -0.456763 -0.424845
33852 1.234848 0.693741 -0.570387 0.812455 0.402725 -0.960529 0.412969
... ... ... ... ... ... ... ...
279863 -1.927883 1.125653 -4.518331 1.749293 -1.566487 -2.010494 -0.882850
280143 1.378559 1.289381 -5.004247 1.411850 0.442581 -1.326536 -1.413170
280149 -0.676143 1.126366 -2.213700 0.468308 -1.120541 -0.003346 -2.234739
281144 -3.113832 0.585864 -5.399730 1.817092 -0.840618 -2.943548 -2.208002
281674 1.991976 0.158476 -2.583441 0.408670 1.151147 -0.096695 0.223050

V8 V9 V10 ... V20 V21 V22 \
12944 0.059987 3.121899 -1.505427 ... -0.136351 -0.180848 0.003462
197932 0.309391 -0.044965 -0.907717 ... -0.714412 0.094768 1.383915
118594 0.027570 -0.069581 0.121424 ... -0.092439 -0.266693 -0.832113
62022 1.691762 0.115487 -0.478802 ... -0.124247 0.006396 0.136100
33852 -0.246613 -0.349505 -0.791390 ... -0.026455 -0.112713 -0.174721
... ... ... ... ... ... ... ...
279863 0.697211 -2.064945 -5.587794 ... 1.252967 0.778584 -0.319189
280143 0.248525 -1.127396 -3.232153 ... 0.226138 0.370612 0.028234
280149 1.210158 -0.652250 -3.463891 ... 0.247968 0.751826 0.834108
281144 1.058733 -1.632333 -5.245984 ... 0.306271 0.583276 -0.269209
281674 -0.068384 0.577829 -0.888722 ... -0.017652 -0.164350 -0.295135

V23 V24 V25 V26 V27 V28 Amount
12944 -0.111213 -0.809456 0.520126 -0.644821 0.081947 0.019252 11.85
197932 0.429534 0.319040 1.896602 0.245503 0.091951 -0.242711 33.24
118594 0.028693 -0.518209 0.277836 0.130337 -0.037812 -0.000764 1.98
62022 0.061866 0.430796 0.096930 0.392947 0.259631 0.084802 119.99
33852 -0.067972 -0.005488 0.537779 0.394634 -0.006489 0.042001 0.76
... ... ... ... ... ... ... ...
279863 0.639419 -0.294885 0.537503 0.788395 0.292680 0.147968 390.00
280143 -0.145640 -0.081049 0.521875 0.739467 0.389152 0.186637 0.76
280149 0.190944 0.032070 -0.739695 0.471111 0.385107 0.194361 77.89
281144 -0.456108 -0.183659 -0.328168 0.606116 0.884876 -0.253700 245.00
281674 -0.072173 -0.450261 0.313267 -0.289617 0.002988 -0.015309 42.53

[984 rows x 29 columns]

12944 0
197932 0
118594 0
62022 0
33852 0
..
279863 1
280143 1
280149 1
281144 1
281674 1
Name: Class, Length: 984, dtype: int64


In [37]:
#DATA STANDARDIZATION USING THE StandardScaler CLASS

In [38]:
#data standardization - rescaling each feature to zero mean and unit variance (values are not bounded to the range -1 to 1)

In [39]:
scaler=SS()
#fit_transform() fits the scaler and transforms X in one step, so a separate fit() call is not needed
standard_X=scaler.fit_transform(X)

In [40]:

print(standard_X)

[[ 0.64436566 -0.56000255  0.678745   ... -0.01354975 -0.02895029
  -0.42613389]
 [-0.0366462  -0.49589664  0.25840802 ... -0.00373262 -0.55314014
  -0.32615045]
 [ 0.64973775 -0.43738005  0.59903972 ... -0.13106416 -0.06900215
  -0.47226931]
 ...
 [ 0.29997779 -0.1719224   0.21365418 ...  0.2839302   0.32144471
  -0.1174426 ]
 [-0.13704557 -0.31659764 -0.30030643 ...  0.77433309 -0.57513009
   0.66368098]
 [ 0.77831208 -0.43099595  0.15400869 ... -0.09102908 -0.09810702
  -0.28272613]]

In [41]:
#SPLITTING THE DATASET INTO TRAINING AND TESTING

In [42]:
x_train,x_test,y_train,y_test=train_test_split(standard_X,y,test_size=0.1,stratify=y,random_

In [43]:
print(x_train.shape,x_test.shape,y_train.shape,y_test.shape)

(885, 29) (99, 29) (885,) (99,)
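
In the cells above the scaler is fitted on all 984 rows before the split, so the test rows contribute to the scaling statistics. One common alternative is to split first and let a scikit-learn `Pipeline` fit the scaler on the training rows only. The sketch below is an illustration under stated assumptions, not the notebook's method: `make_classification` stands in for creditcard.csv, and `random_state=0`, `max_iter=1000`, and all variable names are choices made for this example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for the 984-row, 29-feature balanced dataset.
X_demo, y_demo = make_classification(
    n_samples=984, n_features=29, n_informative=10, random_state=0
)

# Split first, so the test rows never influence the scaler.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.1, stratify=y_demo, random_state=0
)

# The pipeline fits StandardScaler on X_tr only, then trains the classifier.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_tr, y_tr)

print(X_tr.shape, X_te.shape)
print("test accuracy:", model.score(X_te, y_te))
```

At prediction time the same pipeline object applies the training-set scaling automatically, so there is no separate `transform` step to forget.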

In [44]:
#MODEL TRAINING - LOGISTIC REGRESSION


In [45]:
classifier=LogisticRegression()
classifier.fit(x_train,y_train)

Out[45]:

LogisticRegression()

In [46]:

#CHECKING THE ACCURACY ON TRAINING DATA

In [47]:
train_pred=classifier.predict(x_train)
#accuracy_score expects the true labels first, then the predictions
train_acc_score=accuracy_score(y_train,train_pred)
print("The Accuracy on training data is :",train_acc_score)

The Accuracy on training data is : 0.9468926553672317

In [48]:
#PREDICT THE OUTPUTS USING TEST DATA ON THE MODEL

In [49]:
y_pred=classifier.predict(x_test)

In [50]:
print(y_pred)

[1 0 0 0 0 0 1 1 1 0 1 0 1 0 1 1 1 0 0 1 1 1 1 0 0 1 0 1 1 0 1 0 1 1 0 0 1
0 0 1 1 0 1 1 0 1 0 1 0 1 1 1 1 0 0 1 0 1 0 0 0 0 1 1 0 0 0 1 1 0 0 0 0 0
0 0 1 0 0 0 1 0 1 1 0 1 0 0 1 1 1 0 0 0 0 1 1 0 1]

In [51]:

print(y_test)

198868 1
107690 0
217010 0
269315 0
146418 0
..
221515 0
30100 1
123238 1
166719 0
262560 1
Name: Class, Length: 99, dtype: int64


In [52]:
#CHECKING THE ACCURACY OF PREDICTED OUTPUTS BY THE MODEL

In [53]:
#accuracy_score expects the true labels first, then the predictions
accuracy=accuracy_score(y_test,y_pred)
print("The Accuracy Score of the model is : ",accuracy)

The Accuracy Score of the model is : 0.9595959595959596
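
Accuracy alone does not show which kind of error the model makes. For fraud detection, missed frauds (false negatives) and false alarms (false positives) usually matter separately. The sketch below shows how a confusion matrix, precision, and recall could be computed with `sklearn.metrics`. The two short label lists are made up for illustration and are not the notebook's `y_test` and `y_pred`.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Made-up labels for illustration: 1 = fraud, 0 = valid.
y_true_demo = [1, 0, 0, 1, 1, 0, 1, 0]
y_pred_demo = [1, 0, 1, 1, 0, 0, 1, 0]

# Rows are true classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
cm = confusion_matrix(y_true_demo, y_pred_demo)
tn, fp, fn, tp = cm.ravel()

# Precision: of the transactions flagged as fraud, how many really were fraud.
precision = precision_score(y_true_demo, y_pred_demo)
# Recall: of the real frauds, how many the model caught.
recall = recall_score(y_true_demo, y_pred_demo)

print(cm)
print("precision:", precision, "recall:", recall)
```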

In [56]:

#LET US PREDICT THE LABELS OF UNKNOWN DATA IN THE DATASET

In [54]:

input_data=[-1.359807134,-0.072781173,2.536346738,1.378155224,-0.33832077,
0.462387778,0.239598554,0.098697901,0.3637869,0.090794172,-0.551599533,
-0.617800856,-0.991389847,-0.311169354,1.468176972,-0.470400525,
0.207971242,0.02579058,0.40399296,0.251412098,-0.018306778,0.277837576,
-0.11047391,0.066928075,0.128539358,-0.189114844,0.133558377,-0.21053053,149.62]

In [55]:
#CHANGING THIS INPUT DATA INTO NUMPY ARRAY

In [56]:
input_data_as_numpy_array = np.asarray(input_data)
input_data_reshaped=input_data_as_numpy_array.reshape(1,-1)

In [57]:
#STANDARDIZE THE RESHAPED ARRAY DATA

In [58]:
stand_input_data=scaler.transform(input_data_reshaped)
print(stand_input_data)

[[ 0.17741197 -0.49289614  0.97991693 -0.27390802  0.29401914  0.69373145
   0.52055522 -0.03069531  0.69723632  0.65705465 -0.86706227  0.55238137
  -0.90664344  0.67685441  1.49001676  0.46569015  0.58728046  0.46185805
   0.05315731  0.08446081 -0.13288618  0.23950209 -0.08900368  0.2410082
   0.17515525 -0.43271835  0.03709494 -0.48874704  0.2178455 ]]

C:\python39\lib\site-packages\sklearn\base.py:439: UserWarning: X does not have valid feature names, but StandardScaler was fitted with feature names
warnings.warn(
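
The warning above appears because the scaler was fitted on a DataFrame with column names but is then given a bare NumPy array. One way to avoid it, sketched here on a tiny made-up frame, is to wrap the new row in a DataFrame that reuses the training column names. The two-column frame and the names `train_df`, `scaler_demo`, and `new_row` are assumptions for illustration only.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Tiny made-up training frame with named columns.
train_df = pd.DataFrame({"V1": [-1.0, 0.0, 1.0], "Amount": [10.0, 20.0, 30.0]})
scaler_demo = StandardScaler().fit(train_df)

# Wrap the new row in a DataFrame that reuses the training column names,
# so the scaler can match features by name.
new_row = pd.DataFrame([[0.5, 25.0]], columns=train_df.columns)
scaled_row = scaler_demo.transform(new_row)

print(scaled_row)
```

In the notebook's setting the equivalent would be building the row with `columns=X.columns`, assuming `X` is still the DataFrame the scaler was fitted on.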

In [59]:
#PREDICT THE LABEL OF THIS UNKNOWN DATA


In [60]:
prediction_label=classifier.predict(stand_input_data)

In [61]:
#predict() returns a one-element array; compare its first element with the fraud label 1
if prediction_label[0]==1:
    print("Fraud Transaction..............👎")
else:
    print("Valid Transaction...............👍")

Valid Transaction...............👍

In [62]:

#-----------LET US FIT THE RandomForestClassifier ON THE SAME DATA AND CHECK THE ACCURACY AN

In [63]:

rfc=RandomForestClassifier()
rfc.fit(x_train,y_train)
pred_rfc=rfc.predict(x_test)
print("Predicted labels using RFC\n",pred_rfc)

Predicted labels using RFC
 [1 0 0 0 0 0 1 1 1 0 1 0 1 0 1 1 1 0 0 1 1 1 1 0 0 1 0 1 1 0 1 0 1 1 0 0 1
 0 0 1 1 0 1 1 0 1 0 1 0 1 1 1 1 0 0 1 0 1 0 0 0 0 1 1 0 0 0 1 1 0 0 0 0 0
 0 0 1 0 0 0 1 0 1 1 0 1 0 0 1 1 1 0 0 0 0 1 1 0 1]

In [64]:
#accuracy_score expects the true labels first, then the predictions
acc_score_rfc=accuracy_score(y_test,pred_rfc)
print("Accuracy Score using RFC is :",acc_score_rfc)

Accuracy Score using RFC is : 0.9595959595959596
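
Both models are scored on a single 99-row test split, so a difference of one or two predictions moves the accuracy by a percentage point or more. Stratified k-fold cross-validation gives a steadier comparison. The sketch below is illustrative only: it runs on synthetic data from `make_classification`, and the fold count, `n_estimators=50`, and the seeds are assumptions rather than settings from the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the balanced 984-row dataset.
X_cv, y_cv = make_classification(
    n_samples=984, n_features=29, n_informative=10, random_state=0
)

# Five stratified folds keep the class ratio the same in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

candidates = {
    "logistic regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "random forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

cv_results = {}
for name, estimator in candidates.items():
    # One accuracy value per fold.
    scores = cross_val_score(estimator, X_cv, y_cv, cv=cv, scoring="accuracy")
    cv_results[name] = scores
    print(f"{name}: mean={scores.mean():.3f} std={scores.std():.3f}")
```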

In [65]:
prediction_label_rfc=rfc.predict(stand_input_data)

In [66]:
#predict() returns a one-element array; compare its first element with the fraud label 1
if prediction_label_rfc[0]==1:
    print("Fraud Transaction..............👎")
else:
    print("Valid Transaction...............👍")

Valid Transaction...............👍
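
The hard label printed above hides how confident the model is. Both `LogisticRegression` and `RandomForestClassifier` expose `predict_proba`, which returns one probability per class. A minimal sketch on made-up data follows. The two Gaussian clusters, the sample point, and the 0.5 cut-off are assumptions for illustration, and a real fraud system might choose a different threshold.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up data: class 0 centred at 0, class 1 centred at 2, three features.
rng = np.random.default_rng(0)
X_valid = rng.normal(0.0, 1.0, size=(100, 3))
X_fraud = rng.normal(2.0, 1.0, size=(100, 3))
X_p = np.vstack([X_valid, X_fraud])
y_p = np.array([0] * 100 + [1] * 100)

clf = LogisticRegression().fit(X_p, y_p)

# One new sample, shaped (1, n_features) as predict_proba expects.
sample = np.array([[2.0, 2.0, 2.0]])
proba = clf.predict_proba(sample)

# Columns follow clf.classes_, so column 1 is the probability of class 1.
fraud_probability = proba[0, 1]
label = int(fraud_probability >= 0.5)  # 0.5 is an illustrative cut-off

print("P(fraud) =", round(float(fraud_probability), 3), "label =", label)
```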

In [ ]:
