Retail Credit Scoring
Group-7
Section A
DAUR Assignment 3
Method:
- To simplify the calculations and clean the data, we removed several fields from the data frame
that are unlikely to contribute to the analysis.
- We used logistic regression to build the classification model for defaulters
(only two classes: defaulters and non-defaulters).
- After building the model, we evaluated it using a validation-set (hold-out) approach: the model
was fitted on the training data and tested on the test data (since the data is to be split into
two halves, we used an equal number of samples, 14,453, for the training and test sets).
- We then used a classification tree approach to classify the data.
- For both methods, we built a confusion matrix and computed the correct and incorrect
classification rates as percentages (a minimal sketch of this workflow follows this list).
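A minimal sketch of this workflow (assuming the data frame cr with a 0/1 DefaulterFlag response,
a DefaulterType column excluded from the predictors, and an illustrative 0.5 cutoff; the full
script below uses ROC-based cutoffs instead):
set.seed(1)
train <- sample(1:nrow(cr), floor(nrow(cr) / 2))           # 50/50 split into training and test halves
fit <- glm(DefaulterFlag ~ . - DefaulterType,              # logistic regression on the training half
           data = cr, subset = train, family = binomial)
probs <- predict(fit, cr[-train, ], type = "response")     # P(Y=1|X) on the test half
pred <- ifelse(probs > 0.5, 1, 0)                          # illustrative 0.5 cutoff
table(Predicted = pred, Actual = cr$DefaulterFlag[-train]) # confusion matrix
100 * mean(pred == cr$DefaulterFlag[-train])               # % correctly classified
100 * mean(pred != cr$DefaulterFlag[-train])               # % misclassified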
Analysis
Since monthly income in thousands ("MTHINCTH") turns out to be insignificant in the model, we
removed this field and ran the logistic regression once more without it (see the sketch below).
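A minimal sketch of the refit, assuming mod_1 is the full logistic regression model fitted on all
predictors (except DefaulterType):
# Drop the insignificant monthly-income field and refit the logistic regression
mod_2 <- update(mod_1, . ~ . - MTHINCTH)
summary(mod_2)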
Output (figure omitted)
ROC plot (figure omitted)
Output (figure omitted)
Tree (figure omitted)
library(ISLR)
library(tree)
# Read the retail credit data
cr<-read.csv("Downloads/DAUR/Retail credit.csv",header=T)
attach(cr)
names(cr)
dim(cr)
# Full logistic regression model (the fit itself was not shown in the original script; this
# reconstruction uses the same formula as the training-set model further below)
mod_1 <- glm(DefaulterFlag ~ . - DefaulterType, data = cr, family = binomial)
summary(mod_1)
# Refit without MTHINCTH, which was insignificant in mod_1
mod_2 <- update(mod_1, . ~ . - MTHINCTH)
summary(mod_2)
require(MASS)
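# Odds ratios and 95% confidence intervals for the mod_2 coefficients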
exp(cbind(Odds_Ratio=coef(mod_2), confint(mod_2)))
# Using the "predict()" function to obtain the probabilities of the form "P(Y=1|X)"
# The "type=response" ensures the output of the form "P(Y=1|X)", rather than other information
such as the logit
mod_2.probs=predict(mod_2,type="response")
library(Epi)
# ROC Plot
library(pROC)
R=roc(DefaulterFlag,mod_2.probs)
plot(R,col="blue",legacy.axes = TRUE)
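# The cutoff 0.6845515 applied below is presumably an optimal threshold from the ROC analysis;
# one common way to obtain such a cutoff (an assumption, not part of the original script) is
# Youden's index:
coords(R, x = "best", ret = "threshold", best.method = "youden")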
mod_2.pred=rep("No",28906)
mod_2.pred[mod_2.probs>.6845515]="Yes"
mod_train.predict=ifelse(mod_2.pred=="Yes",1,0)
# Creating Confusion Matrix to check how many observations are correctly or incorrectly classified
table(mod_train.predict,DefaulterFlag)
# Calculating the fraction of observations for which the prediction was correct
mean(mod_train.predict==DefaulterFlag)
mean(mod_train.predict!=DefaulterFlag)
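# Hosmer-Lemeshow goodness-of-fit test for the fitted logistic regression
# (default_new recodes DefaulterFlag as 0/1; the "Yes" condition may need adjusting if the flag
# is already coded numerically)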
default_new<- ifelse(DefaulterFlag=="Yes", 1, 0)
library(ResourceSelection)
hoslem.test(default_new, fitted(mod_2))
set.seed(1)
train=sample(1:28906,14453)
# Training Data
cr_train=cr[train,]
dim(cr_train)
# Test Data
cr_test=cr[-train,]
dim(cr_test)
df_train=DefaulterFlag[train]
df_test=DefaulterFlag[-train]
# Fitting a new logistic regression model based on the training data set
mod_train=glm(DefaulterFlag~.-DefaulterType, data=cr, subset=train, family=binomial)
# Predicting "P(Y=1|X)" for the training data set based on the fitted logistic regression model
mod_probs_train=predict(mod_train,cr_test,type="response")
names(cr_train)
dim(cr_test)
names(cr_test)
# ROC Plot (the predictions are for the test set, so they are paired with df_test)
R=roc(df_test,mod_probs_train)
plot(R,col="blue",legacy.axes = TRUE)
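# As before, the cutoff 0.644594 applied below is presumably an optimal threshold from this ROC
# curve (e.g. coords(R, "best", ret = "threshold", best.method = "youden") - an assumption)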
mod_pred_train=rep("No",14453)
mod_pred_train[mod_probs_train>.644594]="Yes"
mod_train.predict=ifelse(mod_pred_train=="Yes",1,0)
# Creating Confusion Matrix to check how many observations are correctly or incorrectly classified
table(mod_train.predict,df_test)
# Calculating the fraction of test observations for which the prediction was correct
mean(mod_train.predict==df_test)
mean(mod_train.predict!=df_test)
dim(cr_train)
require(rpart.plot)
require(rpart)
# Fitting the classification tree on the training data (the original fit of "r" was not shown;
# this formula is an assumed reconstruction consistent with the logistic models)
r <- rpart(DefaulterFlag~.-DefaulterType, data=cr_train, method="class")
rpart.plot(r)
p <- predict(r,cr_test,type = "class")
# Confusion Matrix
table(df_test,p)
# Calculating the fraction of test observations for which the prediction was correct
mean(df_test==p)
mean(df_test!=p)