Logistic Regression

Logistic Regression Model: an ML classification algorithm in which the target variable is discrete or categorical in nature, e.g. yes/no, pass/fail, spam/not spam.

Linear Regression Model: the aim of a linear regression model is to find the best-fit line, so the value of y can vary from -infinity to +infinity.

In logistic regression, the word "regression" appears because the model builds on the same aim as the linear regression model. The difference is the range: the output of logistic regression lies between 0 and 1.

Y = dependent variable. The range y in (-infinity, +infinity) is transformed to (0, 1) with the help of a mathematical function called the sigmoid function:

p = 1/(1 + e^(-y))

How does the sigmoid function help with this transformation?

If y = -infinity, then e^(-y) = infinity, so p = 1/(1 + infinity) = 0.

If y = +infinity, then e^(-y) = 0, so p = 1/(1 + 0) = 1.

Hence the sigmoid function converts the range of y to (0, 1).
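A minimal NumPy sketch of this limiting behaviour (the inputs ±50 stand in for ±infinity):

```python
import numpy as np

def sigmoid(y):
    """Logistic (sigmoid) function: maps any real y into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-y))

# As y -> -infinity, e^(-y) -> infinity, so p -> 0;
# as y -> +infinity, e^(-y) -> 0, so p -> 1.
print(sigmoid(-50))  # ~0.0
print(sigmoid(0))    # 0.5
print(sigmoid(50))   # ~1.0
```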

Logistic regression is a classification model, not a regression model. The underlying concept is based on linear regression, which is why the suffix "regression" appears. Here, too, the aim is to create the best-fit line; the range of the dependent variable y is from 0 to 1.

The decision boundary is placed at the midpoint, which ensures there is no bias between the categories. With two categories, 0 and 1, the decision boundary is 0.5: if the predicted value is greater than 0.5 the example falls under class 1; if it is less than 0.5 it falls under class 0.
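A one-line sketch of this thresholding (the probabilities are made up; an exact 0.5 falls to class 0 here by convention):

```python
import numpy as np

probs = np.array([0.12, 0.47, 0.50, 0.51, 0.93])  # made-up predicted probabilities
labels = (probs > 0.5).astype(int)                # midpoint decision boundary at 0.5
print(labels)                                     # [0 0 0 1 1]
```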

Finding the value of the target variable by inverting the sigmoid function:

p = 1/(1 + e^(-y))

p(1 + e^(-y)) = 1

p + p*e^(-y) = 1

p*e^(-y) = 1 - p

e^(-y) = (1 - p)/p

ln e^(-y) = ln((1 - p)/p)

-y = ln((1 - p)/p)

y = ln(p/(1 - p))

Hence we have inverted the sigmoid function so that y is now expressed as a function of the probability p:

y = ln(p/(1 - p))

p/(1 - p) = odds ratio

y = ln(odds ratio)

y can be defined as the log of the odds ratio, also known as the log-odds or logit function.
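A quick numeric round-trip check of this inversion (NumPy; the value y = 2.0 is arbitrary):

```python
import numpy as np

def sigmoid(y):
    return 1.0 / (1.0 + np.exp(-y))

def log_odds(p):
    # y = ln(p / (1 - p)), the inverse of the sigmoid
    return np.log(p / (1.0 - p))

y = 2.0                 # arbitrary value
p = sigmoid(y)          # ~0.8808
print(log_odds(p))      # prints 2.0, recovering y
```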

In linear regression, y = mx + c = b0 + b1x1 + b2x2 + ... + bnxn + error.

The error term is not present in logistic regression:

y = ln(odds ratio)

ln(p/(1 - p)) = b0 + b1x

For the sigmoid function we get an S-shaped curve in the range 0 to 1.
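A small sketch tying this together: fitting sklearn's LogisticRegression on a toy one-feature dataset (the data below is hypothetical) and confirming that the predicted probability is exactly the sigmoid of b0 + b1x:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical one-feature dataset: hours studied -> pass (1) / fail (0)
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

model = LogisticRegression().fit(X, y)
b0, b1 = model.intercept_[0], model.coef_[0][0]

# The model's probability is sigmoid(b0 + b1*x), i.e. ln(p/(1-p)) = b0 + b1*x
x_new = 4.5
p_manual = 1.0 / (1.0 + np.exp(-(b0 + b1 * x_new)))
p_sklearn = model.predict_proba([[x_new]])[0, 1]
print(p_manual, p_sklearn)  # the two values agree
```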

Types of logistic regression model:

Binary classification: 2 categories

Multi-class classification: more than 2 categories

Evaluation metrics for classification models:

Confusion Matrix

Accuracy

Misclassification

TPR

FPR

TNR

Precision

Recall

F1 Score

ROC Curve

AUC
              Actual +         Actual -         Total
Predicted +   True +           False +          Total predicted +
Predicted -   False -          True -           Total predicted -
Total         Total actual +   Total actual -

Y actual    Y predicted
p           p
p           p
n           p
n           n
p           p
p           n
n           n
n           p
p           p
p           n

Total actual positives = 6; total actual negatives = 4.

Correctly predicted positives = 4; correctly predicted negatives = 2.

The confusion matrix is a tabular representation that compares actual counts with predicted counts:

              Actual +             Actual -             Total
Predicted +   True + = 4           False + = 2          Total predicted + = 6
Predicted -   False - = 2          True - = 2           Total predicted - = 4
Total         Total actual + = 6   Total actual - = 4
Accuracy is the ratio of correct predictions to total predictions.

Total number of correct predictions = true + + true -

Total predictions = sum of all cells

Accuracy = 6/10 = 0.6

Misclassification is defined as the ratio of incorrect predictions to total predictions:

= (false + + false -)/total

= 4/10 = 0.4
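A quick check of these numbers with sklearn, encoding the table above as 1 = p and 0 = n:

```python
from sklearn.metrics import confusion_matrix, accuracy_score

# The 10 examples from the table above ('p' -> 1, 'n' -> 0)
y_actual = [1, 1, 0, 0, 1, 1, 0, 0, 1, 1]
y_pred   = [1, 1, 1, 0, 1, 0, 0, 1, 1, 0]

# sklearn orders the matrix as [[TN, FP], [FN, TP]]
print(confusion_matrix(y_actual, y_pred))   # [[2 2] [2 4]]
acc = accuracy_score(y_actual, y_pred)
print(acc)       # 0.6
print(1 - acc)   # misclassification rate 0.4
```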

True positive rate (TPR), also known as recall:

TPR = true positives / total actual positives

= 4/6 = 2/3 ≈ 0.67

False positive rate (FPR):

FPR = false positives / total actual negatives

= 2/4 = 0.5

True negative rate (TNR) = true negatives / total actual negatives

= 2/4 = 0.5

Precision is defined as the ratio of true positives to total predicted positives:

= 4/6 = 2/3 ≈ 0.67

Precision is also known as the positive predictive value.

F1 score measures the combined effect of precision and recall; it is the harmonic mean of precision and recall:

F1 score = 2/(1/precision + 1/recall)

Here F1 score ≈ 0.67. It is useful when both false positives and false negatives are important parameters for us.
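The same example data gives precision, recall, and F1 directly via sklearn:

```python
from sklearn.metrics import precision_score, recall_score, f1_score

y_actual = [1, 1, 0, 0, 1, 1, 0, 0, 1, 1]
y_pred   = [1, 1, 1, 0, 1, 0, 0, 1, 1, 0]

print(precision_score(y_actual, y_pred))  # 4/6 ≈ 0.667
print(recall_score(y_actual, y_pred))     # TPR = 4/6 ≈ 0.667
print(f1_score(y_actual, y_pred))         # harmonic mean ≈ 0.667
```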

Drawback: interpretability is difficult; from the F1 score alone we cannot comment individually on false positives or false negatives.

ROC curve: receiver operating characteristic. The technique was initially used by receiver operators of military radar around 1941, which is why it is named ROC. The ROC curve is plotted between TPR and FPR.

In a machine learning classification model, the ROC curve helps analyze the operating characteristics of the model.

If TPR < FPR, the model is not doing a great job.

If TPR > FPR, the model is doing a great job.


AUC = area under the curve: the total area under the ROC curve.

The higher the area under the curve, the higher the TPR relative to the FPR, and the better the model is performing.
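A minimal sketch of computing the ROC curve and AUC with sklearn; the scores below are hypothetical predicted probabilities for the 10 examples from the table:

```python
from sklearn.metrics import roc_curve, roc_auc_score

y_actual = [1, 1, 0, 0, 1, 1, 0, 0, 1, 1]
scores   = [0.9, 0.8, 0.7, 0.2, 0.85, 0.4, 0.3, 0.6, 0.75, 0.45]  # hypothetical

fpr, tpr, thresholds = roc_curve(y_actual, scores)  # points of the ROC curve
print(roc_auc_score(y_actual, scores))              # area under the ROC curve
```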

Bias-variance tradeoff: bias refers to the assumptions a model makes so that the target function is easier to learn. Bias can be low or high.

Low bias = fewer assumptions about the form of the target function. Examples: decision tree, KNN, SVM.

High bias = more assumptions about the form of the target function. Examples: linear regression, logistic regression.

Variance = the amount by which the estimate of the target function changes between the training and testing data. Variance can be low or high.

Low variance = small changes to the estimate of the target function. Examples: linear and logistic regression.

High variance = large changes to the estimate of the target function. Examples: decision tree, KNN, SVM.

Divide the data into 3 parts: training set, validation set, test set.

Training set = typically 60% of the data; used to train the machine learning model.

Validation set (also called the development set) = typically 20% of the data; used to test the quality of the trained model.

Test set = typically 20% of the data; used to report the accuracy of the final model.
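A sketch of the 60/20/20 split using two calls to train_test_split (the toy arrays and random_state are assumptions):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(50, 2)   # toy feature matrix
y = np.random.randint(0, 2, 50)     # toy binary target

# First carve off 40%, then split that 40% in half: 60/20/20 overall
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.4, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42)
print(len(X_train), len(X_val), len(X_test))  # 30 10 10
```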

Prediction error has 3 parts:

Bias error

Variance error

Irreducible error

Irreducible error is inherent in the dataset; we cannot do anything about it, but we can reduce the bias and variance errors.

Total error = bias^2 + variance + irreducible error
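For squared error this is the standard decomposition (written here in LaTeX; note the bias term enters squared):

```latex
\mathbb{E}\!\left[(y - \hat{f}(x))^2\right]
  = \mathrm{Bias}\!\left[\hat{f}(x)\right]^2
  + \mathrm{Var}\!\left[\hat{f}(x)\right]
  + \sigma^2_{\text{irreducible}}
```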

The tradeoff is the tension introduced between bias and variance. A model that maintains the balance between bias error and variance error is considered the champion model; the tradeoff is the management of these two errors.

Cross-validation (CV) estimates how the model performs on unseen data, i.e. test data that was not used while training the model.

Common types of cross-validation methods:

1 LpOCV (leave-p-out cross-validation): in each iteration, p examples are held out as the testing dataset and the remaining examples are used to train the model. The process is repeated for all possible combinations of p examples from the dataset.

2 LOOCV (leave-one-out cross-validation): see below.

3 K-fold (the most commonly used): in k-fold we divide our sample into k subsamples.

Illustration for k = 10: the dataset is split into 10 parts; in iteration 1 the first part is the test set and the remaining nine are training sets, in iteration 2 the second part is the test set, and so on, until in iteration 10 the last part is the test set.
For example, with k = 4 there are 4 parts: in each iteration 3 parts are used for training and 1 for testing.

The entire dataset is split into k equal parts; each time, k - 1 parts are used for training and 1 part for testing.

The testing portion shifts its position each iteration.

The final score is the mean of all the accuracies r1, r2, r3, ..., rk.


K-fold is a wonderful technique that helps guard against overfitting. We split the entire dataset into k equal parts (the target variable is withheld for the testing part).

Each time, one part is kept as the test set and the remaining k - 1 parts are used for training, and the testing part shifts position so that each part serves as both a testing and a training dataset. The split is done k times in total; this is k-fold cross-validation. Generally the value of k is less than 20, and the most commonly used values are 5 or 10. Finally, CV gives us the mean of all the model accuracies on unseen data.

In k-fold CV the test split is not seen by the model during training, and the model's performance is measured on it; this gives a more reliable estimate of the model's predictive power.

LOOCV: a particular case of LpOCV with p = 1; it requires less computation time than general LpOCV because there are only n possible splits.
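A short sketch of both, assuming a toy dataset from make_classification and the estimator's default accuracy score:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score

X, y = make_classification(n_samples=100, random_state=42)  # toy dataset
model = LogisticRegression(max_iter=1000)

# k-fold: each of the 5 parts serves once as the test split
scores = cross_val_score(model, X, y, cv=KFold(n_splits=5, shuffle=True, random_state=42))
print(scores.mean())  # CV reports the mean accuracy across folds

# LOOCV: LpOCV with p = 1, i.e. n folds of a single test example each
loo_scores = cross_val_score(model, X, y, cv=LeaveOneOut())
print(loo_scores.mean())
```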

Feature selection: selecting the relevant features that have a significant impact on the model.

Sklearn provides some built-in functions:

1 RFE (recursive feature elimination): it selects features by recursively considering smaller and smaller sets of features. First the estimator is trained on the initial set of features and the importance of each feature is obtained through a specific attribute; the features with the lowest importance scores are eliminated and the procedure is repeated on the remaining set. At the end it returns the best-performing features.
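A minimal RFE sketch (the toy dataset and the choice of keeping 4 features are assumptions for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=10, n_informative=4, random_state=0)

# Recursively fit the estimator and drop the weakest features until 4 remain
selector = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=4)
selector.fit(X, y)
print(selector.support_)   # boolean mask of the selected features
print(selector.ranking_)   # rank 1 = selected
```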

2 SelectKBest: it works by selecting the best features based on univariate statistical tests. The idea is to calculate some metric between the target and each feature and keep the k best features; we only need to provide the score function that defines our metric (a sketch follows the list of score functions below).

Some score functions we can provide:

1 f_classif: selection based on the ANOVA F-value between feature and label, for classification tasks.

2 f_regression: selection based on the F-value between feature and label, for regression tasks.

3 mutual_info_classif: mutual information for a discrete target.

4 mutual_info_regression: mutual information for a continuous target.

5 chi2: chi-squared statistics for non-negative features, for classification tasks.

6 SelectFpr: a related selector (not a score function) that selects features based on the false positive rate.
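A minimal SelectKBest sketch using f_classif (the toy dataset and k = 5 are assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Keep the 5 features with the highest ANOVA F-value against the target
selector = SelectKBest(score_func=f_classif, k=5)
X_new = selector.fit_transform(X, y)
print(X_new.shape)  # (200, 5)
```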

VarianceThreshold: it removes features whose variance falls below a threshold. Note that it looks only at the features themselves, not at their relationship with the target.
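A tiny sketch (the 3-column toy matrix is an assumption; the default threshold of 0 drops constant features):

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

X = np.array([[0, 1, 2],
              [0, 3, 4],
              [0, 5, 6]])   # first column is constant (zero variance)

# Drops every feature whose variance does not exceed the threshold (default 0)
print(VarianceThreshold().fit_transform(X))  # the constant column is removed
```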
