Ppa Final Project
ANALYTICS
Submitted To:
Prof. Alekh Gour
Submitted By:
Bhavika Garodia (B2019011)
Ishika Thukral (B2019021)
Krishna Kumar (B2019022)
Pranav Vijay Kulkarni (B2019035)
Soumyadeep Halder (B2019052)
INTRODUCTION
The objective is to identify the credit risk of customers before granting
them loans.
The bank has provided data for assessing the risk involved in granting
loans to its customers.
PROBLEM STATEMENT
The bank needs to predict whether a customer will default on
payments toward the credit he/she has obtained.
(This prediction would help the bank minimize the chances of default.)
DATASET LAYOUT
VARIABLE NAME    VARIABLE DEFINITION
Age              Age of customer
Ed               Education qualification
TRANSFORMATIONS IN THE DATASET
The dependent variable "Default" contained NA values, so we omitted
the rows with NA values from the dataset.
The dataset provided ratios rather than absolute amounts, so we
calculated the Debt and Credit values from the Debt-to-Income ratio,
the Credit-to-Debt ratio and the income values provided in the dataset.
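The two transformations above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the column names (income, debt_to_income, credit_to_debt, default) and the sample values are hypothetical, the ratios are assumed to be fractions rather than percentages, and credit is assumed to be recovered as debt multiplied by the credit-to-debt ratio.

```python
# Hypothetical column names and made-up sample values; the real dataset may differ.
rows = [
    {"income": 50.0, "debt_to_income": 0.10, "credit_to_debt": 0.40, "default": 0},
    {"income": 30.0, "debt_to_income": 0.20, "credit_to_debt": 0.50, "default": None},  # NA target
    {"income": 80.0, "debt_to_income": 0.05, "credit_to_debt": 0.25, "default": 1},
]


def prepare(rows):
    """Drop rows whose 'default' value is missing and derive absolute amounts."""
    cleaned = []
    for row in rows:
        if row["default"] is None:
            continue  # omit rows with an NA dependent variable
        out = dict(row)
        # Assumption: ratios are fractions (0.10 means 10%), not percentages.
        out["debt"] = row["income"] * row["debt_to_income"]
        # Assumption: credit = debt * credit-to-debt ratio.
        out["credit"] = out["debt"] * row["credit_to_debt"]
        cleaned.append(out)
    return cleaned


prepared = prepare(rows)
```

If the ratios in the real data are stored as percentages, each product would need to be divided by 100.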
ASSIGNING WEIGHTS TO VARIABLES
Age was one of the main factors we considered when deciding whether to
provide credit to a customer.
The Age values were numerical, ranging from 20 to 56 years.
To make predictions by age group, we segmented age into the groups
20-30, 30-40, 40-50 and 50-60 and assigned them weights of 1, 2, 3 and 4
respectively.
Similarly, we assigned weights to the address and employ variables. These
variables were segmented into 0-5, 5-10, 10-15, 15-20, 20-25, 25-30 and
30-35 and assigned weights of 1, 2, 3, 4, 5, 6 and 7 respectively.
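The weighting scheme above amounts to equal-width binning. A minimal Python sketch follows; the function names are hypothetical, and because the slides do not say which group a boundary value such as 30 belongs to, the sketch assumes half-open bins (30 falls in 30-40) and clamps out-of-range values to the first or last bin.

```python
def bin_weight(value, width, n_bins, start=0):
    """Return the 1-based weight of the equal-width bin containing value.

    Assumption: bins are half-open [lower, upper); values outside the
    covered range are clamped to the first or last bin.
    """
    index = int((value - start) // width)
    index = max(0, min(index, n_bins - 1))
    return index + 1


def age_weight(age):
    # Groups 20-30, 30-40, 40-50, 50-60 map to weights 1..4.
    return bin_weight(age, width=10, n_bins=4, start=20)


def tenure_weight(years):
    # Groups 0-5, 5-10, ..., 30-35 map to weights 1..7 (address and employ).
    return bin_weight(years, width=5, n_bins=7, start=0)
```

For example, under these assumptions `age_weight(56)` gives 4 and `tenure_weight(5)` gives 2.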
FINAL TOUCH TO THE DATASET
CORRELATION
How is the correlation?
LOGISTIC REGRESSION
The data was split into training and testing datasets, in 70:30 ratio.
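A 70:30 split like the one described can be sketched as follows. This is an assumption-laden illustration, not the method actually used: the function name, the fixed seed and the use of a simple random shuffle (rather than, say, a stratified split) are all choices made here for the example.

```python
import random


def train_test_split(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of rows and split it into training and testing parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Stand-in data: 100 row identifiers instead of real customer records.
train_rows, test_rows = train_test_split(list(range(100)))
```

Fixing the seed keeps the split reproducible, so the logistic regression and both decision trees can be compared on the same testing rows.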
DECISION TREE USING GINI
DECISION TREE USING INFORMATION GAIN
Comparing GINI and Information gain
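The two split criteria differ only in the impurity measure used to score a candidate split: Gini impurity in one case and entropy (whose reduction is the information gain) in the other. The sketch below shows the standard definitions on a toy label list; the function names and the sample labels are hypothetical and are not taken from the project.

```python
import math


def class_proportions(labels):
    """Return the share of each class among the labels of one tree node."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return [count / len(labels) for count in counts.values()]


def gini_impurity(labels):
    # Gini = 1 - sum(p_k^2)
    return 1.0 - sum(p * p for p in class_proportions(labels))


def entropy(labels):
    # Entropy = -sum(p_k * log2(p_k))
    return -sum(p * math.log2(p) for p in class_proportions(labels))


def split_gain(parent, left, right, impurity):
    """Impurity of the parent minus the size-weighted impurity of the children."""
    n = len(parent)
    weighted = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(parent) - weighted
```

Calling `split_gain` with `entropy` gives the information gain of a split; calling it with `gini_impurity` gives the corresponding Gini reduction.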
RESULTS
The model depends on factors such as totaldebt, employ.1, credit,
address.1 and age.1.
Regression Equation:
Y = -3.15 + (0.616 x age.1) + (1.73 x employ.1) + (0.55 x address.1) + (0.0017 x totaldebt)
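To turn the regression equation into a probability, the usual logistic regression step is to pass Y through the logistic function. The sketch below assumes that Y is the log-odds of the modeled outcome, which the slides do not state explicitly; the coefficients are copied from the equation, while the function names and the sample customer are hypothetical.

```python
import math

# Intercept and coefficients copied from the regression equation.
INTERCEPT = -3.15
COEFFICIENTS = {
    "age.1": 0.616,
    "employ.1": 1.73,
    "address.1": 0.55,
    "totaldebt": 0.0017,
}


def linear_score(customer):
    """Compute Y for a customer given as a dict of predictor values."""
    return INTERCEPT + sum(coef * customer[name] for name, coef in COEFFICIENTS.items())


def probability(customer):
    """Assumption: Y is a log-odds, so p = 1 / (1 + exp(-Y))."""
    return 1.0 / (1.0 + math.exp(-linear_score(customer)))


# Hypothetical customer: weight 1 for age, employ and address, zero total debt.
sample = {"age.1": 1, "employ.1": 1, "address.1": 1, "totaldebt": 0.0}
y = linear_score(sample)
p = probability(sample)
```

For this sample, Y works out to -0.254, which corresponds to a probability of roughly 0.44 under the log-odds assumption.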
CONCLUSION
Based on the model we have created, the probability of default is highest for
customers with the following profile:
Customers in the 20 to 30 age group.
Customers who are employed and have 0 to 5 years of experience.
Customers who have not stayed at the same address for more than 5 years.
The probability also depends on the customer's total debt (that is, debt +
otherdebt).
The bank must be cautious when providing loans/credit to customers with
the above profile.
Thank You!