Ppa Final Project

CREDIT RISK ANALYTICS
Submitted To:
Prof. Alekh Gour
Submitted By:
Bhavika Garodia (B2019011)
Ishika Thukral (B2019021)
Krishna Kumar (B2019022)
Pranav Vijay Kulkarni (B2019035)
Soumyadeep Halder (B2019052)
INTRODUCTION
• Identify the credit risk of customers in order to decide whether to grant them loans.
• The bank has provided data for assessing the risk involved in granting loans to customers.
• Based on the data provided, we need to design a model that predicts whether a customer will default.
PROBLEM STATEMENT
The bank needs to predict whether a customer will default on payments towards the credit he or she has obtained.
(This prediction would help the bank minimize the chances of default.)
DATASET LAYOUT
VARIABLE NAME   VARIABLE DEFINITION
Age             Age of customer
Ed              Education qualification
Employ          Tenure with current employer (in years)
Address         Number of years at the same address
Income          Customer income
Debtinc         Debt-to-income ratio
Creddebt        Credit-to-debt ratio
Othdebt         Other debts
Default         Customer defaulted in the past (1 = defaulted, 0 = never defaulted)
TRANSFORMATIONS IN THE DATASET
There were NA values in the dependent variable “Default”, so we omitted those rows from the dataset.
The data was not in a normalized form (only the ratios were provided in the dataset), so we calculated the debt and credit values from the debt-to-income and credit-to-debt ratios, using the income values provided in the dataset.
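A minimal Python sketch of these two transformations follows. It assumes each customer is a dict keyed by the lowercase variable names from the dataset layout, that a missing Default is stored as None, and that Debtinc and Creddebt are plain ratios rather than percentages; the function name `clean_and_derive` and the derived keys `debt` and `credit` are hypothetical, and the two sample customers are made up.

```python
from typing import Any

Row = dict[str, Any]


def clean_and_derive(rows: list[Row]) -> list[Row]:
    """Drop rows with a missing 'default' and add derived 'debt' and 'credit' values.

    Assumption: 'debtinc' = debt / income and 'creddebt' = credit / debt, both as
    plain ratios; if debtinc were stored as a percentage it would need dividing by 100.
    """
    cleaned: list[Row] = []
    for row in rows:
        # Omit customers whose dependent variable is NA (stored here as None).
        if row.get("default") is None:
            continue
        derived = dict(row)  # copy so the input rows are not mutated
        # Debt = (debt-to-income ratio) x income
        derived["debt"] = row["debtinc"] * row["income"]
        # Credit = (credit-to-debt ratio) x debt
        derived["credit"] = row["creddebt"] * derived["debt"]
        cleaned.append(derived)
    return cleaned


customers = [
    {"income": 50.0, "debtinc": 0.2, "creddebt": 0.5, "default": 1},
    {"income": 40.0, "debtinc": 0.1, "creddebt": 2.0, "default": None},
]
print(clean_and_derive(customers))
```

Only the first customer survives, with a derived debt of 10 and credit of 5; the second is dropped because its Default value is missing.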
ASSIGNING WEIGHTS TO VARIABLES
Age was one of the main factors we took into consideration when providing credit to customers.
The age values were numeric and ranged from 20 to 56 years.
To make predictions by age group, we segmented age into the groups 20-30, 30-40, 40-50 and 50-60 and assigned them weights of 1, 2, 3 and 4 respectively.
We assigned weights to the address and employ variables in the same way: these variables were segmented into 0-5, 5-10, 10-15, 15-20, 20-25, 25-30 and 30-35 years and assigned weights of 1, 2, 3, 4, 5, 6 and 7 respectively.
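The weighting above is equal-width binning with 1-based bin numbers. A minimal sketch, assuming each group includes its lower bound and excludes its upper bound (the report does not say which group a boundary value such as age 30 belongs to); the helper names `assign_weight`, `age_weight` and `years_weight` are hypothetical.

```python
def assign_weight(value: float, start: float, width: float, n_bins: int) -> int:
    """Return the 1-based weight of the equal-width bin that contains value.

    Assumption: bins are left-closed, right-open, [start, start + width), ...,
    so a boundary value such as age 30 falls into the higher group.
    """
    index = int((value - start) // width)
    if not 0 <= index < n_bins:
        raise ValueError(f"{value} is outside the binned range")
    return index + 1


def age_weight(age: float) -> int:
    # 20-30 -> 1, 30-40 -> 2, 40-50 -> 3, 50-60 -> 4
    return assign_weight(age, start=20, width=10, n_bins=4)


def years_weight(years: float) -> int:
    # Used for both Employ and Address: 0-5 -> 1, 5-10 -> 2, ..., 30-35 -> 7
    return assign_weight(years, start=0, width=5, n_bins=7)


print(age_weight(25), age_weight(56), years_weight(3), years_weight(31))
```

Values outside the stated ranges raise an error instead of silently receiving a weight, which makes unexpected data easy to spot.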
FINAL TOUCHES TO THE DATASET
Once the weights were assigned, we converted the age, education, employment and address variables to the factor data type.
We then created dummy variables for the education, age, employment and address variables.
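Creating dummy variables means replacing each factor with one 0/1 indicator per level. A minimal sketch, assuming indicators are named `<variable>.<level>` as in the variable names used later in this report (age.1, employ.1, address.1); reading age.1 as the indicator for the weight-1 group (20-30 years) is an inference from those names, not something the report states. The function `make_dummies` and the sample rows are hypothetical.

```python
from typing import Any

Row = dict[str, Any]


def make_dummies(rows: list[Row], columns: list[str]) -> list[Row]:
    """Replace each categorical column with one 0/1 indicator per observed level.

    Indicator names follow the pattern '<column>.<level>' (e.g. 'age.1').
    """
    # Collect the distinct levels of each factor, in sorted order.
    levels = {col: sorted({row[col] for row in rows}) for col in columns}
    encoded: list[Row] = []
    for row in rows:
        # Keep every non-factor column unchanged.
        new_row = {k: v for k, v in row.items() if k not in columns}
        for col in columns:
            for level in levels[col]:
                new_row[f"{col}.{level}"] = 1 if row[col] == level else 0
        encoded.append(new_row)
    return encoded


rows = [{"age": 1, "employ": 2}, {"age": 3, "employ": 1}]
print(make_dummies(rows, ["age", "employ"]))
```

This keeps an indicator for every level. When such dummies enter a regression, one level per factor is normally left out as the reference category to avoid perfect collinearity with the intercept.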
CORRELATION
How is the correlation?
LOGISTIC REGRESSION
The data was split into training and testing datasets in a 70:30 ratio.
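The report does not say how the 70:30 split was drawn. A minimal sketch, assuming a simple random split with a fixed seed; `train_test_split` here is a hand-written helper (not scikit-learn's function of the same name), and the seed of 42 is arbitrary.

```python
import random
from typing import Any, Sequence


def train_test_split(rows: Sequence[Any], train_fraction: float = 0.7, seed: int = 42):
    """Shuffle the rows and split them into training and testing lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed => reproducible split
    # round() rather than int(): 700 * 0.7 evaluates to 489.99999999999994.
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


train, test = train_test_split(range(700))
print(len(train), len(test))
```

With 700 rows this gives 490 training and 210 testing rows. A training set of 490 rows is also what a null deviance on 489 degrees of freedom (n - 1) implies.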
The model was built on the training dataset. After the insignificant variables were removed, age.1, employ.1, address.1 and totaldebt were the only significant variables determining whether a customer would default.
The residual deviance was 460.08 on 485 degrees of freedom.
The null deviance was 562.85 on 489 degrees of freedom.
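The two deviances can be compared directly: their difference is the likelihood-ratio chi-square statistic for the four predictors, and 1 - (residual / null) is McFadden's pseudo-R² for ungrouped 0/1 data. A minimal sketch using only the numbers reported above; the helper names are hypothetical, and the closed-form chi-square tail used here is valid only for an even number of degrees of freedom.

```python
import math


def deviance_summary(null_dev: float, null_df: int, resid_dev: float, resid_df: int):
    """Compare a fitted logistic model with the intercept-only (null) model."""
    lr_stat = null_dev - resid_dev          # likelihood-ratio chi-square statistic
    df = null_df - resid_df                 # number of predictor coefficients
    mcfadden_r2 = 1 - resid_dev / null_dev  # valid for ungrouped 0/1 responses
    return lr_stat, df, mcfadden_r2


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-square probability P(X > x), closed form for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2
    # P(X > x) = exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


lr, df, r2 = deviance_summary(562.85, 489, 460.08, 485)
p_value = chi2_sf_even_df(lr, df)
print(f"LR = {lr:.2f} on {df} df, p = {p_value:.2e}, McFadden R^2 = {r2:.3f}")
```

The deviance drops by 102.77 on 4 degrees of freedom, so the four retained variables improve on the intercept-only model far beyond any usual significance level, while the pseudo-R² of roughly 0.18 shows that much of the variation remains unexplained.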

REGRESSION
• The model was tested on the testing dataset, and confusion matrices were built for different probability cut-off values; the maximum accuracy was obtained at a cut-off of 0.4.
• The prediction accuracy was 78.09%.
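Choosing the cut-off amounts to building a confusion matrix for each candidate value and keeping the one with the highest accuracy. A minimal sketch, assuming a customer is classified as a defaulter when the predicted probability is at least the cut-off; the report does not list the cut-offs it tried, so the grid below is an assumption, and the probabilities and labels are made-up toy data.

```python
def confusion_matrix(probs, labels, cutoff):
    """Return (tp, fp, tn, fn) when predicting default if probability >= cutoff."""
    tp = fp = tn = fn = 0
    for p, y in zip(probs, labels):
        predicted = 1 if p >= cutoff else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def accuracy(probs, labels, cutoff):
    """Share of correct predictions: (tp + tn) / total."""
    tp, fp, tn, fn = confusion_matrix(probs, labels, cutoff)
    return (tp + tn) / (tp + fp + tn + fn)


def best_cutoff(probs, labels, cutoffs):
    """Pick the cut-off with the highest accuracy (the first one wins on ties)."""
    return max(cutoffs, key=lambda c: accuracy(probs, labels, c))


probs = [0.10, 0.35, 0.45, 0.80, 0.38]  # toy predicted probabilities
labels = [0, 0, 1, 1, 0]                # toy actual outcomes (1 = default)
cutoffs = [0.3, 0.4, 0.5]
print(best_cutoff(probs, labels, cutoffs))
```

Accuracy treats both kinds of error equally. For a bank, missing a defaulter (a false negative) usually costs more than refusing a good customer, so the confusion matrix itself is worth reading alongside the accuracy figure.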
DECISION TREE USING GINI
DECISION TREE USING INFORMATION GAIN
COMPARING GINI AND INFORMATION GAIN
Both models achieve an accuracy of 77.14%.
Given the complexity of the resulting trees, it is preferable to choose the decision tree obtained with Gini.
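The two trees differ only in the impurity measure used to choose each split: Gini impurity or entropy, whose decrease is the information gain. A minimal sketch of both criteria; the report does not show its tree code or its actual splits, so the helper names and the toy split below are hypothetical.

```python
import math
from collections import Counter
from typing import Callable, Hashable, Sequence


def gini(labels: Sequence[Hashable]) -> float:
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy in bits: -sum p * log2(p)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def impurity_decrease(parent, children, impurity: Callable) -> float:
    """Parent impurity minus the size-weighted impurity of the child nodes.

    With impurity=entropy this is the information gain of the split.
    """
    n = len(parent)
    weighted = sum(len(child) / n * impurity(child) for child in children)
    return impurity(parent) - weighted


parent = [0, 0, 0, 1, 1, 1]      # toy node: 3 non-defaulters, 3 defaulters
split = ([0, 0, 0, 1], [1, 1])   # toy candidate split
print(impurity_decrease(parent, split, gini), impurity_decrease(parent, split, entropy))
```

The two measures usually rank candidate splits similarly, which is consistent with both trees reaching the same accuracy; the choice between them then rests on secondary criteria such as tree size.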
RESULTS
The results show that the model depends on factors such as totaldebt, employ.1, credit, address.1 and age.1.
Using the cut-off values for these variables specified in the decision tree, we can predict whether a customer will default.
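Regression Equation (Y is the log-odds of default):
$$Y = -3.15 + 0.616\,\text{age.1} + 1.73\,\text{employ.1} + 0.55\,\text{address.1} + 0.0017\,\text{totaldebt}$$
The predicted probability of default is $P = 1/(1 + e^{-Y})$.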
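In a logistic regression the linear score Y is the log-odds of default, so the logistic function turns it into a probability that can be compared with the 0.4 cut-off reported earlier. A minimal sketch using the coefficients from the equation; the units of totaldebt are not stated, the helper names are hypothetical, and the two example customers (with debt 5 and other debt 3) are made up.

```python
import math

# Coefficients copied from the regression equation.
INTERCEPT = -3.15
COEFFICIENTS = {"age.1": 0.616, "employ.1": 1.73, "address.1": 0.55, "totaldebt": 0.0017}


def total_debt(debt: float, othdebt: float) -> float:
    """Total debt as described in the conclusion: debt + other debt."""
    return debt + othdebt


def default_probability(features: dict[str, float]) -> float:
    """Turn the linear score Y (log-odds) into a probability with the logistic function."""
    y = INTERCEPT + sum(coef * features[name] for name, coef in COEFFICIENTS.items())
    return 1.0 / (1.0 + math.exp(-y))


def predicts_default(features: dict[str, float], cutoff: float = 0.4) -> bool:
    # 0.4 is the cut-off at which the report found the highest test accuracy.
    return default_probability(features) >= cutoff


young_new_hire = {"age.1": 1, "employ.1": 1, "address.1": 1, "totaldebt": total_debt(5.0, 3.0)}
older_settled = {"age.1": 0, "employ.1": 0, "address.1": 0, "totaldebt": total_debt(5.0, 3.0)}
print(default_probability(young_new_hire), predicts_default(young_new_hire))
print(default_probability(older_settled), predicts_default(older_settled))
```

With the same total debt, the customer who falls into all three high-risk groups scores a probability of about 0.44 and is flagged, while the customer in none of them scores about 0.04 and is not.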
CONCLUSION
Based on the model we created, the probability of default is highest for customers with the following profiles:
• customers aged 20 to 30 years;
• customers who are employed and have 0 to 5 years of experience;
• customers who have not lived at the same address for more than 5 years;
• customers with a higher total debt (that is, debt + other debt).
The bank must be cautious when providing loans or credit to customers with the above profiles.
Thank You!