Logistic Regression
Linear Regression Model: the aim of the linear regression model is to find the best fit line, so the predicted value y can vary from -infinity to +infinity.
In the logistic regression model, the word "regression" appears because logistic regression builds on the aim of the linear regression model. The difference is the range: the output of logistic regression ranges from 0 to 1.
P = 1/(1 + e^(-y))
As y → -infinity, e^(-y) → infinity, so P = 1/(1 + infinity) = 0
As y → +infinity, e^(-y) → 0, so P = 1
It is a classification model, not a regression model; the underlying concept is based on linear regression, which is why the suffix "regression" appears. Here also the aim is to create the best fit line, but the dependent variable y is limited to the range 0 to 1.
The decision boundary is placed at the midpoint, which ensures there is no bias toward either category. So with two categories, 0 and 1, the decision boundary is 0.5: if the predicted value is greater than 0.5 the example falls under category 1, and if it is less than 0.5 it falls under category 0.
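A minimal sketch of the sigmoid and the 0.5 decision boundary, using pure NumPy; the scores below are made-up linear outputs:

import numpy as np

def sigmoid(y):
    """Squash any real-valued score y into a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-y))

scores = np.array([-5.0, -0.5, 0.0, 0.5, 5.0])  # hypothetical b0 + b1*x values
probs = sigmoid(scores)
labels = (probs > 0.5).astype(int)              # decision boundary at 0.5

print(probs)   # approx [0.0067 0.3775 0.5 0.6225 0.9933]
print(labels)  # [0 0 0 1 1]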
p = 1/(1 + e^(-y))
p(1 + e^(-y)) = 1
p + p*e^(-y) = 1
p*e^(-y) = 1 - p
e^(-y) = (1 - p)/p
ln(e^(-y)) = ln((1 - p)/p)
-y = ln((1 - p)/p)
y = ln(p/(1 - p))
Hence we have converted the sigmoid function such that it is now expressed as a function of the target variable:
y = ln(p/(1 - p))
p/(1 - p) = odds ratio (the odds of the positive class)
y = ln(odds ratio)
y can be defined as the log of the odds ratio, also called the log-odds or logit function. Equating it to the linear model gives
ln(p/(1 - p)) = b0 + b1*x
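A sketch of fitting ln(p/(1 - p)) = b0 + b1*x with scikit-learn; the one-feature dataset below is made up for illustration:

import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])  # single feature x
y = np.array([0, 0, 0, 1, 1, 1])                          # binary target

model = LogisticRegression()
model.fit(X, y)

b0, b1 = model.intercept_[0], model.coef_[0][0]
p = model.predict_proba([[3.5]])[0, 1]  # P(y = 1 | x = 3.5)
print(b0, b1)
print(p, np.log(p / (1 - p)))           # the log-odds equals b0 + b1*3.5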
Confusion Matrix
Accuracy
Misclassification
TPR
FPR
TNR
Precision
Recall
F1 Score
ROC Curve
AUC
              Actual +         Actual -         Total
Predicted +   True +           False +          Total predicted +
Predicted -   False -          True -           Total predicted -
Total         Total actual +   Total actual -
Y actual    Y predicted
p           p
p           p
n           p
n           n
p           p
p           n
n           n
n           p
p           p
p           n
Total actual positives = 6 and total actual negatives = 4; correctly predicted positives (true positives) = 4.
The confusion matrix is the tabular representation that gives the actual counts versus the predicted counts.
Accuracy = (TP + TN)/Total = 6/10 = 0.6
Misclassification rate = (FP + FN)/Total = 4/10 = 0.4
TPR (Recall) = TP/(TP + FN) = 4/6 = 2/3 = 0.66
FPR = FP/(FP + TN) = 2/4 = 0.5
TNR = TN/(TN + FP) = 2/4 = 0.5
Precision = TP/(TP + FP) = 4/6 = 2/3 = 0.66
F1 score: it measures the combined effect of precision and recall; it is the harmonic mean of precision and recall.
F1 score = 2/(1/precision + 1/recall)
F1 score = 2/(1/0.66 + 1/0.66) = 0.66. It is used when both false positives and false negatives are important parameters for us.
Drawback
Interpretability is difficult: from the F1 score alone we cannot comment individually on the false positives or the false negatives.
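A sketch reproducing the worked example above with scikit-learn's metric functions ('p' = positive, 'n' = negative):

from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

y_actual = ['p', 'p', 'n', 'n', 'p', 'p', 'n', 'n', 'p', 'p']
y_pred   = ['p', 'p', 'p', 'n', 'p', 'n', 'n', 'p', 'p', 'n']

# labels=['n', 'p'] puts negatives first, so ravel() yields TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_actual, y_pred, labels=['n', 'p']).ravel()
print(tp, fp, fn, tn)                                    # 4 2 2 2
print(accuracy_score(y_actual, y_pred))                  # 0.6
print(precision_score(y_actual, y_pred, pos_label='p'))  # 4/6 = 0.66
print(recall_score(y_actual, y_pred, pos_label='p'))     # 4/6 = 0.66
print(f1_score(y_actual, y_pred, pos_label='p'))         # 0.66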
ROC Curve: receiver operating characteristic. It was first used by the receiver operators of military radar in 1941, which is why it is named ROC. The ROC curve is plotted between TPR and FPR.
In a machine learning classification model, ROC helps to analyze the operating characteristics of the model.
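A sketch of plotting the ROC curve (TPR vs FPR) and computing AUC with scikit-learn; the scores below are made-up predicted probabilities:

import matplotlib.pyplot as plt
from sklearn.metrics import roc_auc_score, roc_curve

y_true  = [1, 1, 0, 0, 1, 1, 0, 0, 1, 1]
y_score = [0.90, 0.80, 0.70, 0.30, 0.75, 0.40, 0.20, 0.60, 0.85, 0.45]

fpr, tpr, thresholds = roc_curve(y_true, y_score)
print(roc_auc_score(y_true, y_score))  # area under the ROC curve

plt.plot(fpr, tpr, label='model')
plt.plot([0, 1], [0, 1], linestyle='--', label='chance')  # diagonal baseline
plt.xlabel('FPR')
plt.ylabel('TPR')
plt.legend()
plt.show()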
Bias-variance tradeoff: bias refers to the simplifying assumptions a model makes so that it is easier for the model to learn the target function. Bias can be of two kinds, low and high.
Examples of low-bias models: decision tree, KNN, SVM.
Variance = the change in the estimate of the target function between the training and testing data.
Training set: typically 60% of the data, generally used for training a machine learning model.
Validation set (development set): typically 20% of the data, used to test the quality of the trained model.
The total error has three components:
Bias error
Variance error
Irreducible error
The irreducible error is inherent in the training dataset and we can't do anything about it, but we can reduce the bias and variance errors.
Total error = bias^2 + variance + irreducible error (for squared loss the bias term enters squared)
A model that can maintain the balance between the bias and variance errors is considered the champion model; this is the tradeoff management of the bias and variance errors.
Cross validation (CV): estimates how the model performs on unseen data, i.e. test data that was not used while training the model.
It is the technique in which we define, for each iteration, how many examples to use for the testing dataset and how many for training the model. The process is repeated for all possible combinations of the dataset, as illustrated below for k = 10:
Iteration 1:  test  train train train train train train train train train
Iteration 2:  train test  train train train train train train train train
Iteration 3:  train train test  train train train train train train train
Iteration 4:  train train train test  train train train train train train
Iteration 5:  train train train train test  train train train train train
Iteration 6:  train train train train train test  train train train train
Iteration 7:  train train train train train train test  train train train
Iteration 8:  train train train train train train train test  train train
Iteration 9:  train train train train train train train train test  train
Iteration 10: train train train train train train train train train test
For example, with 4 samples (k = 4): 3 parts are used for training and 1 part for testing in each iteration.
The entire dataset is split into k equal parts; every time, k - 1 parts are used for training and 1 part for testing.
Each time we keep one part as the test set and the remaining k - 1 parts as the training set, and each time the test part shifts, so every part serves as both a testing and a training dataset. The split is done k times in this way; this is k-fold cross validation. Generally the value of k is less than 20, and the most commonly used values are 5 or 10. Finally, CV gives us the mean of the model's accuracy on the unseen data across all folds.
In k-fold CV the test split is not seen by the model during training and the model's performance is tested on it, which gives a better estimate of the predictive power of the model.
LOOCV: leave-one-out cross validation, a particular case of LPOCV (leave-p-out cross validation) with p = 1; it requires less computation time than LPOCV.
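A sketch of k-fold CV and LOOCV with scikit-learn, using the built-in iris dataset for illustration:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# k = 5: each of the 5 equal parts serves once as the test fold
kfold_scores = cross_val_score(model, X, y,
                               cv=KFold(n_splits=5, shuffle=True, random_state=0))
print(kfold_scores.mean())  # CV reports the mean accuracy over the folds

# LOOCV: one example is held out per iteration (n iterations in total)
loocv_scores = cross_val_score(model, X, y, cv=LeaveOneOut())
print(loocv_scores.mean())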
Feature Selection: selection of the relevant features that have a significant impact on the model.
1. RFE (recursive feature elimination): it selects features by recursively considering smaller and smaller sets of features. First the estimator is trained on the initial set of features and the importance of each feature is obtained through a specific attribute; then the features with the lowest feature-importance scores are eliminated, and the procedure is repeated on the remaining features. At the end it returns the best performing features.
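A sketch of RFE with scikit-learn on a made-up classification dataset:

from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=4, random_state=0)

# Drop the least important feature each round until 4 remain
selector = RFE(estimator=LogisticRegression(max_iter=1000),
               n_features_to_select=4, step=1)
selector.fit(X, y)

print(selector.support_)  # boolean mask of the selected features
print(selector.ranking_)  # 1 = selected; larger = eliminated earlier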
2. SelectKBest method: it works by selecting the best features based on univariate statistical tests. The idea here is to calculate some metric between the target and each feature and keep the k best features; we only need to provide the score function that defines our metric.
1. f_classif: selection based on the ANOVA F-value between each feature and the label, for classification tasks.
2. f_regression: selection based on the F-value from univariate regression between each feature and the target, for regression tasks.
3. chi2: chi-square statistics between each non-negative feature and the label, for classification tasks.
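A sketch of SelectKBest with the f_classif score function (swap in f_regression or chi2 as the task requires); the dataset is made up:

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=4, random_state=0)

selector = SelectKBest(score_func=f_classif, k=4)  # keep the 4 best features
X_new = selector.fit_transform(X, y)

print(selector.scores_)  # ANOVA F-value of each feature against the label
print(X_new.shape)       # (200, 4)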
3. Variance threshold: it removes the features that have low variance. Note that it looks only at the features themselves, not at their relationship with the target.
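A sketch of VarianceThreshold with scikit-learn; the tiny matrix below is made up, with a constant first column:

from sklearn.feature_selection import VarianceThreshold

X = [[0, 1, 10],
     [0, 2, 20],
     [0, 3, 30],
     [0, 4, 40]]  # the first column is constant, so its variance is 0

selector = VarianceThreshold(threshold=0.0)  # drop zero-variance features
X_new = selector.fit_transform(X)
print(X_new)  # the constant column is removed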