Logistic Regression


PHUONG NGUYEN

LOGISTIC REGRESSION
CONTENT
1. INTRODUCTION

2. LOGISTIC REGRESSION MODEL

3. EVALUATING CLASSIFICATION PERFORMANCE


INTRODUCTION


LOGISTIC RESPONSE FUNCTION
$$p = \frac{1}{1 + e^{-x}}$$
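
As a quick numerical illustration of the logistic response function, here is a minimal sketch in plain Python. The helper name `logistic` and the sample inputs are illustrative choices, not part of the original slides.

```python
import math


def logistic(x: float) -> float:
    """Logistic response function: p = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


# The function maps any real x into the open interval (0, 1).
for x in (-5.0, 0.0, 5.0):
    print(f"x = {x:+.1f}  ->  p = {logistic(x):.4f}")
```

Large negative inputs give values near 0, large positive inputs give values near 1, and x = 0 gives exactly 0.5.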

PROBABILITY

 

$$p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_q x_q)}}$$
ODDS

$$\text{Odds} = \frac{p}{1 - p}$$

$$p = \frac{\text{Odds}}{1 + \text{Odds}} = \frac{1}{1 + \text{Odds}^{-1}}$$
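
The two conversions between probability and odds can be checked numerically. The sketch below uses hypothetical helper names (`odds_from_p`, `p_from_odds`) and an arbitrary example probability.

```python
def odds_from_p(p: float) -> float:
    """Odds = p / (1 - p), defined for 0 <= p < 1."""
    return p / (1.0 - p)


def p_from_odds(odds: float) -> float:
    """p = Odds / (1 + Odds), the inverse of odds_from_p."""
    return odds / (1.0 + odds)


p = 0.8                       # illustrative probability
odds = odds_from_p(p)         # roughly 4, i.e. "4 to 1"
print(f"p = {p}  ->  odds = {odds:.3f}")
print(f"odds = {odds:.3f}  ->  p = {p_from_odds(odds):.3f}")
```

A probability of 0.5 corresponds to odds of 1; probabilities above 0.5 give odds greater than 1.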
LOGIT

$$\text{Odds} = e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_q x_q}$$

$$\ln(\text{Odds}) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_q x_q$$
LOGIT
$$\text{Logit} = \ln\left(\frac{p}{1 - p}\right)$$
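
To make the chain logit → odds → probability concrete, here is a minimal sketch for a single predictor. The coefficient values `beta0` and `beta1` and the input `x1` are hypothetical numbers chosen for illustration; they are not estimates from the slides.

```python
import math


def logit(p: float) -> float:
    """Logit = ln(p / (1 - p)), defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))


# Hypothetical single-predictor model: logit = beta0 + beta1 * x1
beta0, beta1 = -6.0, 0.04     # assumed values, for illustration only
x1 = 100.0

linear_predictor = beta0 + beta1 * x1      # ln(Odds)
odds = math.exp(linear_predictor)          # Odds = e^(beta0 + beta1 * x1)
p = odds / (1.0 + odds)                    # p = Odds / (1 + Odds)

print(f"logit = {linear_predictor:.3f}, odds = {odds:.4f}, p = {p:.4f}")
print(f"logit(p) recovers the linear predictor: {logit(p):.3f}")
```

The model is linear in the logit, so applying `logit` to the resulting probability returns the linear predictor.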
LOGISTIC REGRESSION MODEL


PERSONAL LOAN OFFER
UNIVERSALBANK.CSV



SINGLE PREDICTOR MODEL

 


PYTHON FUNCTIONALITY NEEDED
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression, LogisticRegressionCV
from sklearn.model_selection import train_test_split
import statsmodels.api as sm
from mord import LogisticIT
import matplotlib.pylab as plt
import seaborn as sns
from dmba import classificationSummary, gainsChart, liftChart
from dmba.metric import AIC_score

https://github.com/nnbphuong/datascience4biz/blob/master/Logistic_Regression.ipynb
DATA PREPROCESSING
bank_df = pd.read_csv('UniversalBank.csv')
bank_df.drop(columns=['ID', 'ZIP Code'], inplace=True)
bank_df.columns = [c.replace(' ', '_') for c in bank_df.columns]

# Treat education as categorical, convert to dummy variables
bank_df['Education'] = bank_df['Education'].astype('category')
new_categories = {1: 'Undergrad', 2: 'Graduate', 3: 'Advanced/Professional'}
# assign the result; newer pandas versions no longer accept inplace=True here
bank_df['Education'] = bank_df['Education'].cat.rename_categories(new_categories)
bank_df = pd.get_dummies(bank_df, prefix_sep='_', drop_first=True)

y = bank_df['Personal_Loan']
X = bank_df.drop(columns=['Personal_Loan'])

# partition data: 60% training, 40% validation
train_X, valid_X, train_y, valid_y = train_test_split(X, y, test_size=0.4, random_state=1)
FITTING THE MODEL
# fit a logistic regression; the very large C makes the L2 penalty negligible
logit_reg = LogisticRegression(penalty="l2", C=1e42, solver='liblinear')
logit_reg.fit(train_X, train_y)
print('intercept ', logit_reg.intercept_[0])
print(pd.DataFrame({'coeff': logit_reg.coef_[0]}, index=X.columns).transpose())
print('AIC', AIC_score(valid_y, logit_reg.predict(valid_X), df=len(train_X.columns) + 1))
FITTING THE MODEL OUTPUT
intercept  -12.61895521314035

            Age  Experience    Income    Family     CCAvg  Mortgage
coeff -0.032549     0.03416  0.058824  0.614095  0.240534  0.001012

       Securities_Account  CD_Account    Online  CreditCard
coeff           -1.026191    3.647933 -0.677862    -0.95598

       Education_Graduate  Education_Advanced/Professional
coeff            4.192204                         4.341697

AIC -709.1524769205962
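
Because the model is linear in ln(Odds), exponentiating a coefficient gives the factor by which the odds are multiplied for a one-unit increase in that predictor, holding the others fixed. The sketch below applies this to a few of the coefficients printed above; the dictionary name `coefficients` and the choice of which predictors to show are illustrative.

```python
import math

# A few coefficients copied from the fitted-model output above
coefficients = {
    "Income": 0.058824,
    "Family": 0.614095,
    "CD_Account": 3.647933,
    "Online": -0.677862,
}

# e^beta is the multiplicative change in the odds per one-unit increase
odds_multipliers = {name: math.exp(beta) for name, beta in coefficients.items()}

for name, factor in odds_multipliers.items():
    direction = "increases" if factor > 1 else "decreases"
    print(f"{name:<12} e^beta = {factor:8.4f}  ({direction} the odds)")
```

A positive coefficient yields a factor above 1 (higher odds of accepting the loan), and a negative coefficient yields a factor below 1.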
CONVERTING FROM LOGIT TO PROBABILITY
$$\text{Odds} = e^{\text{logit}} \;\rightarrow\; p = \frac{\text{Odds}}{1 + \text{Odds}}$$
logit_reg_pred = logit_reg.predict(valid_X)
logit_reg_proba = logit_reg.predict_proba(valid_X)
logit_result = pd.DataFrame({'actual': valid_y,
                             'p(0)': [p[0] for p in logit_reg_proba],
                             'p(1)': [p[1] for p in logit_reg_proba],
                             'predicted': logit_reg_pred})

# display four different cases
interestingCases = [2764, 932, 2721, 702]
print(logit_result.loc[interestingCases])

OUTPUT
      actual   p(0)   p(1)  predicted
2764       0  0.976  0.024          0
932        0  0.335  0.665          1
2721       1  0.032  0.968          1
702        1  0.986  0.014          0
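
The predicted class in the output above is consistent with assigning class 1 whenever p(1) is at least 0.5. The following sketch reproduces that rule on the four displayed cases; the `classify` helper and the alternative cutoff of 0.7 are assumptions added for illustration.

```python
# (record id, actual class, p(1)) copied from the output above
cases = [
    (2764, 0, 0.024),
    (932, 0, 0.665),
    (2721, 1, 0.968),
    (702, 1, 0.014),
]


def classify(p1: float, cutoff: float = 0.5) -> int:
    """Assign class 1 when the estimated probability reaches the cutoff."""
    return 1 if p1 >= cutoff else 0


for cutoff in (0.5, 0.7):     # 0.7 is a hypothetical alternative cutoff
    predicted = {rec_id: classify(p1, cutoff) for rec_id, _, p1 in cases}
    print(f"cutoff = {cutoff}: {predicted}")
```

With the 0.5 cutoff, records 932 and 702 are misclassified; raising the cutoff to 0.7 changes the prediction for record 932 only.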
INTERPRETING PROBABILITY AND ODDS

EVALUATING CLASSIFICATION PERFORMANCE
classificationSummary(train_y, logit_reg.predict(train_X))
classificationSummary(valid_y, logit_reg.predict(valid_X))

OUTPUT
Confusion Matrix (Accuracy 0.9080)

       Prediction
Actual    0    1
     0 2632   81
     1  195   92

Confusion Matrix (Accuracy 0.9110)

       Prediction
Actual    0    1
     0 1763   44
     1  134   59
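
Accuracy alone can hide weak performance on the minority class. The sketch below recomputes accuracy from the two confusion matrices above and adds sensitivity, specificity, and precision; the function name `summarize` is hypothetical, and these extra measures are not part of the original output.

```python
def summarize(tn: int, fp: int, fn: int, tp: int) -> dict:
    """Basic classification measures from a 2x2 confusion matrix."""
    total = tn + fp + fn + tp
    return {
        "accuracy": (tn + tp) / total,
        "sensitivity": tp / (tp + fn),   # share of actual 1s predicted as 1
        "specificity": tn / (tn + fp),   # share of actual 0s predicted as 0
        "precision": tp / (tp + fp),     # share of predicted 1s that are actual 1s
    }


# Counts copied from the confusion matrices above (rows: actual, columns: predicted)
training = summarize(tn=2632, fp=81, fn=195, tp=92)
validation = summarize(tn=1763, fp=44, fn=134, tp=59)

for name, result in (("training", training), ("validation", validation)):
    print(name, {k: round(v, 4) for k, v in result.items()})
```

The accuracies match the reported 0.9080 and 0.9110, while sensitivity is far lower because most actual acceptors are classified as 0.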
VARIABLE SELECTION



MODEL SELECTION


SUMMARY
