Logistic Regression
CONTENT
1. INTRODUCTION
INTRODUCTION
LOGISTIC RESPONSE FUNCTION
$$p = \frac{1}{1 + e^{-x}}$$
PROBABILITY
$$p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_q x_q)}}$$
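As a minimal sketch of the formula above, the function below computes p for a single observation. The coefficient and predictor values are made up for illustration; they are not estimates from the UniversalBank data.

```python
import numpy as np

def logistic_response(x):
    """Map any real number to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-x))

def predict_probability(beta0, betas, x):
    """p = 1 / (1 + exp(-(beta0 + beta1*x1 + ... + betaq*xq)))."""
    linear_part = beta0 + np.dot(betas, x)
    return logistic_response(linear_part)

# Hypothetical coefficients and one hypothetical observation (q = 2)
beta0 = -6.0
betas = np.array([0.04, 0.5])
x = np.array([100.0, 2.0])

# Linear part: -6 + 0.04*100 + 0.5*2 = -1, so p = 1 / (1 + e)
p = predict_probability(beta0, betas, x)
print(round(float(p), 4))
```

Whatever the value of the linear part, the result stays between 0 and 1, which is what makes it usable as a probability.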
ODDS
$$\mathrm{Odds} = \frac{p}{1 - p}$$
$$p = \frac{\mathrm{Odds}}{1 + \mathrm{Odds}} = \frac{1}{1 + \mathrm{Odds}^{-1}}$$
LOGIT
$$\ln(\mathrm{Odds}) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_q x_q$$
$$\mathrm{Logit} = \ln\left(\frac{p}{1 - p}\right)$$
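The probability, odds, and logit scales carry the same information and can be converted back and forth. A small sketch of the conversions (the function names are illustrative and do not come from any library):

```python
import math

def odds_from_p(p):
    """Odds = p / (1 - p)."""
    return p / (1.0 - p)

def p_from_odds(odds):
    """p = Odds / (1 + Odds)."""
    return odds / (1.0 + odds)

def logit_from_p(p):
    """Logit = ln(p / (1 - p))."""
    return math.log(odds_from_p(p))

def p_from_logit(logit):
    """Invert the logit: Odds = e^logit, then p = Odds / (1 + Odds)."""
    return p_from_odds(math.exp(logit))

p = 0.8
odds = odds_from_p(p)        # about 4, i.e. odds of 4 to 1
logit = logit_from_p(p)      # ln(4), about 1.386
p_back = p_from_logit(logit) # recovers 0.8 up to rounding
print(odds, logit, p_back)
```

A probability of 0.5 corresponds to odds of 1 and a logit of 0, so positive logits favor class 1 and negative logits favor class 0.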
LOGISTIC REGRESSION MODEL
PERSONAL LOAN OFFER
UNIVERSALBANK.CSV
SINGLE PREDICTOR MODEL
PYTHON FUNCTIONALITY NEEDED
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression, LogisticRegressionCV
from sklearn.model_selection import train_test_split
import statsmodels.api as sm
from mord import LogisticIT
import matplotlib.pylab as plt
import seaborn as sns
from dmba import classificationSummary, gainsChart, liftChart
from dmba.metric import AIC_score
https://github.com/nnbphuong/datascience4biz/blob/master/Logistic_Regression.ipynb
DATA PREPROCESSING
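# load the data and drop the ID and ZIP Code columns
bank_df = pd.read_csv('UniversalBank.csv')
bank_df.drop(columns=['ID', 'ZIP Code'], inplace=True)
# replace spaces in column names with underscores
bank_df.columns = [c.replace(' ', '_') for c in bank_df.columns]
# separate the outcome from the predictors
y = bank_df['Personal_Loan']
X = bank_df.drop(columns=['Personal_Loan'])
# partition data: 60% training, 40% validation
train_X, valid_X, train_y, valid_y = train_test_split(X, y,
                                                      test_size=0.4, random_state=1)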
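The coefficient names Education_Graduate and Education_Advanced/Professional in the fitted model suggest that Education was converted to dummy variables, a step the preprocessing code above does not show. The sketch below illustrates one way this might be done. The toy rows, the 1/2/3 coding of Education, and the label 'Undergrad' for the reference level are assumptions made for illustration only.

```python
import pandas as pd

# Toy stand-in for UniversalBank.csv (values are invented)
bank_df = pd.DataFrame({
    'Income': [49, 34, 180, 100, 45, 129],
    'Education': [1, 1, 2, 3, 2, 3],          # assumed numeric coding
    'Personal_Loan': [0, 0, 1, 1, 0, 1],
})

# Treat Education as categorical and give the levels readable names
labels = {1: 'Undergrad', 2: 'Graduate', 3: 'Advanced/Professional'}
bank_df['Education'] = (bank_df['Education']
                        .astype('category')
                        .cat.rename_categories(labels))

# One dummy column per level except the first, which becomes the reference
bank_df = pd.get_dummies(bank_df, prefix_sep='_', drop_first=True)

y = bank_df['Personal_Loan']
X = bank_df.drop(columns=['Personal_Loan'])
print(list(X.columns))
```

Dropping the first level avoids redundant predictors: each remaining Education coefficient is then read relative to the reference level.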
FITTING THE MODEL
       Education_Graduate  Education_Advanced/Professional
coeff            4.192204                         4.341697
AIC -709.1524769205962
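The code that fits the model and creates the `logit_reg` object is not shown in this section. The following is a minimal sketch of how such a fit might look with scikit-learn, run on synthetic data so that it is self-contained. The data-generating values, the single Income predictor, and the settings `C=1e42` (a very large value that makes the default regularization negligible) and `solver='liblinear'` are assumptions, not necessarily the settings behind the output above.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Synthetic training data: loan acceptance becomes more likely as income rises
rng = np.random.default_rng(1)
n = 500
income = rng.uniform(10, 200, size=n)
true_logit = -6.0 + 0.05 * income            # invented "true" model
loan = rng.binomial(1, 1.0 / (1.0 + np.exp(-true_logit)))

train_X = pd.DataFrame({'Income': income})
train_y = pd.Series(loan, name='Personal_Loan')

# Fit the logistic regression; a huge C effectively switches off the penalty
logit_reg = LogisticRegression(C=1e42, solver='liblinear')
logit_reg.fit(train_X, train_y)

# Report the intercept and one coefficient per predictor
print('intercept', logit_reg.intercept_[0])
print(pd.DataFrame({'coeff': logit_reg.coef_[0]},
                   index=train_X.columns).transpose())
```

With the real data, `train_X` and `train_y` would come from the partition created in the preprocessing step.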
CONVERTING FROM LOGIT TO PROBABILITY
$$\mathrm{Odds} = e^{\mathrm{logit}} \;\rightarrow\; p = \frac{\mathrm{Odds}}{1 + \mathrm{Odds}}$$
logit_reg_pred = logit_reg.predict(valid_X)
logit_reg_proba = logit_reg.predict_proba(valid_X)
logit_result = pd.DataFrame({'actual': valid_y,
                             'p(0)': [p[0] for p in logit_reg_proba],
                             'p(1)': [p[1] for p in logit_reg_proba],
                             'predicted': logit_reg_pred})
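To connect the conversion formula to the library calls, the sketch below checks on synthetic data that exponentiating the logit and applying p = Odds / (1 + Odds) reproduces the class-1 column of `predict_proba` for a binary model. The data and the variable names are invented for the example; `decision_function` returns the fitted logit for each row.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic two-predictor binary classification problem
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
noise = rng.normal(scale=0.5, size=200)
y = (X[:, 0] + 0.5 * X[:, 1] + noise > 0).astype(int)

model = LogisticRegression().fit(X, y)

# Step 1: logit = beta0 + beta1*x1 + beta2*x2 for every row
logit = model.decision_function(X)
# Step 2: Odds = e^logit
odds = np.exp(logit)
# Step 3: p = Odds / (1 + Odds)
p_manual = odds / (1.0 + odds)

# Compare with the probability of class 1 reported by scikit-learn
p_sklearn = model.predict_proba(X)[:, 1]
match = np.allclose(p_manual, p_sklearn)
print(match)
```

The 'predicted' column in the result table corresponds to assigning class 1 when p(1) exceeds the default cutoff of 0.5.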
OUTPUT
      actual   p(0)   p(1)  predicted
2764       0  0.976  0.024          0
932        0  0.335  0.665          1
2721       1  0.032  0.968          1
702        1  0.986  0.014          0
INTERPRETING PROBABILITY AND ODDS
EVALUATING CLASSIFICATION PERFORMANCE
classificationSummary(train_y, logit_reg.predict(train_X))
classificationSummary(valid_y, logit_reg.predict(valid_X))
OUTPUT
Training data:
Confusion Matrix (Accuracy 0.9080)
       Prediction
Actual    0    1
     0 2632   81
     1  195   92
Validation data:
Confusion Matrix (Accuracy 0.9110)
       Prediction
Actual    0    1
     0 1763   44
     1  134   59
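The reported accuracies can be recomputed directly from the confusion matrix counts. The sketch below does this and also computes sensitivity (the share of actual class-1 records predicted as 1), which is not printed in the output but follows from the same counts. The helper function names are illustrative.

```python
import numpy as np

# Rows are actual classes (0, 1); columns are predicted classes (0, 1)
train_cm = np.array([[2632, 81],
                     [195, 92]])
valid_cm = np.array([[1763, 44],
                     [134, 59]])

def accuracy(cm):
    """Correct predictions (the diagonal) divided by all records."""
    return np.trace(cm) / cm.sum()

def sensitivity(cm):
    """Correctly predicted class-1 records divided by all actual class-1 records."""
    return cm[1, 1] / cm[1].sum()

for name, cm in [('training', train_cm), ('validation', valid_cm)]:
    print(name, round(accuracy(cm), 4), round(sensitivity(cm), 4))
```

Accuracy comes out at 0.908 and 0.911, matching the output, while sensitivity is only about 0.32 and 0.31: most actual loan acceptors are classified as 0, so the high accuracy largely reflects the dominant class.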
VARIABLE SELECTION
MODEL SELECTION
SUMMARY