Logistic Regression
Linear Regression Model: the aim of the linear regression model is to find the best fit line, so the predicted value y can vary from -infinity to +infinity.
In the logistic regression model, the word "regression" appears because logistic regression builds on the aim of the linear regression model. The difference is the range: the output of logistic regression ranges from 0 to 1.
P = 1/(1 + e^(-y))
As y → -infinity, e^(-y) → infinity, so P = 1/(1 + infinity) = 0
As y → +infinity, e^(-y) → 0, so P = 1
It is a classification model, not a regression model; the underlying concept is based on linear regression, which is why the suffix "regression" appears. Here also the aim is to create the best fit line, but the dependent variable y is limited to the range 0 to 1.
The decision boundary is placed at the midpoint, which ensures there is no bias toward either category. So with two categories, 0 and 1, the decision boundary is 0.5: if the predicted value is greater than 0.5 the example falls under category 1, and if it is less than 0.5 it falls under category 0.
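A minimal sketch of the sigmoid and the 0.5 decision boundary, using pure NumPy; the scores below are made-up linear outputs:

import numpy as np

def sigmoid(y):
    """Squash any real-valued score y into a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-y))

scores = np.array([-5.0, -0.5, 0.0, 0.5, 5.0])  # hypothetical b0 + b1*x values
probs = sigmoid(scores)
labels = (probs > 0.5).astype(int)              # decision boundary at 0.5

print(probs)   # approx [0.0067 0.3775 0.5 0.6225 0.9933]
print(labels)  # [0 0 0 1 1]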
p = 1/(1 + e^(-y))
p(1 + e^(-y)) = 1
p + p*e^(-y) = 1
p*e^(-y) = 1 - p
e^(-y) = (1 - p)/p
ln(e^(-y)) = ln((1 - p)/p)
-y = ln((1 - p)/p)
y = ln(p/(1 - p))
Hence we have converted the sigmoid function such that it is now expressed as a function of the target variable:
y = ln(p/(1 - p))
p/(1 - p) = odds ratio (the odds of the positive class)
y = ln(odds ratio)
y can be defined as the log of the odds ratio, also called the log-odds or logit function. Equating it to the linear model gives
ln(p/(1 - p)) = b0 + b1*x
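A sketch of fitting ln(p/(1 - p)) = b0 + b1*x with scikit-learn; the one-feature dataset below is made up for illustration:

import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])  # single feature x
y = np.array([0, 0, 0, 1, 1, 1])                          # binary target

model = LogisticRegression()
model.fit(X, y)

b0, b1 = model.intercept_[0], model.coef_[0][0]
p = model.predict_proba([[3.5]])[0, 1]  # P(y = 1 | x = 3.5)
print(b0, b1)
print(p, np.log(p / (1 - p)))           # the log-odds equals b0 + b1*3.5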
Confusion Matrix
Accuracy
Misclassification
TPR
FPR
TNR
Precision
Recall
F1 Score
ROC Curve
AUC
              Actual +         Actual -         Total
Predicted +   True +           False +          Total predicted +
Predicted -   False -          True -           Total predicted -
Total         Total actual +   Total actual -
Y actual    Y predicted
p           p
p           p
n           p
n           n
p           p
p           n
n           n
n           p
p           p
p           n
Total actual positives = 6 and total actual negatives = 4; correctly predicted positives (true positives) = 4.
The confusion matrix is the tabular representation that gives the actual counts versus the predicted counts.
Accuracy = (TP + TN)/Total = 6/10 = 0.6
Misclassification rate = (FP + FN)/Total = 4/10 = 0.4
TPR (Recall) = TP/(TP + FN) = 4/6 = 2/3 = 0.66
FPR = FP/(FP + TN) = 2/4 = 0.5
TNR = TN/(TN + FP) = 2/4 = 0.5
Precision = TP/(TP + FP) = 4/6 = 2/3 = 0.66
F1 score: it measures the combined effect of precision and recall; it is the harmonic mean of precision and recall.
F1 score = 2/(1/precision + 1/recall)
F1 score = 2/(1/0.66 + 1/0.66) = 0.66. It is used when both false positives and false negatives are important parameters for us.
Drawback
Interpretability is difficult: from the F1 score alone we cannot comment individually on the false positives or the false negatives.
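A sketch reproducing the worked example above with scikit-learn's metric functions ('p' = positive, 'n' = negative):

from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

y_actual = ['p', 'p', 'n', 'n', 'p', 'p', 'n', 'n', 'p', 'p']
y_pred   = ['p', 'p', 'p', 'n', 'p', 'n', 'n', 'p', 'p', 'n']

# labels=['n', 'p'] puts negatives first, so ravel() yields TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_actual, y_pred, labels=['n', 'p']).ravel()
print(tp, fp, fn, tn)                                    # 4 2 2 2
print(accuracy_score(y_actual, y_pred))                  # 0.6
print(precision_score(y_actual, y_pred, pos_label='p'))  # 4/6 = 0.66
print(recall_score(y_actual, y_pred, pos_label='p'))     # 4/6 = 0.66
print(f1_score(y_actual, y_pred, pos_label='p'))         # 0.66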
ROC Curve: receiver operating characteristic. It was first used by the receiver operators of military radar in 1941, which is why it is named ROC. The ROC curve is plotted between TPR and FPR.
In a machine learning classification model, ROC helps to analyze the operating characteristics of the model.
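A sketch of plotting the ROC curve (TPR vs FPR) and computing AUC with scikit-learn; the scores below are made-up predicted probabilities:

import matplotlib.pyplot as plt
from sklearn.metrics import roc_auc_score, roc_curve

y_true  = [1, 1, 0, 0, 1, 1, 0, 0, 1, 1]
y_score = [0.90, 0.80, 0.70, 0.30, 0.75, 0.40, 0.20, 0.60, 0.85, 0.45]

fpr, tpr, thresholds = roc_curve(y_true, y_score)
print(roc_auc_score(y_true, y_score))  # area under the ROC curve

plt.plot(fpr, tpr, label='model')
plt.plot([0, 1], [0, 1], linestyle='--', label='chance')  # diagonal baseline
plt.xlabel('FPR')
plt.ylabel('TPR')
plt.legend()
plt.show()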
Bias-variance tradeoff: bias refers to the simplifying assumptions a model makes so that it is easier for the model to learn the target function. Bias can be of two kinds, low and high.
Examples of low-bias models: decision tree, KNN, SVM.
Variance = the change in the estimate of the target function between the training and testing data.
Training set: typically 60% of the data, generally used for training a machine learning model.
Validation set (development set): typically 20% of the data, used to test the quality of the trained model.
The total error has three components:
Bias error
Variance error
Irreducible error
The irreducible error is inherent in the training dataset and we can't do anything about it, but we can reduce the bias and variance errors.
Total error = bias^2 + variance + irreducible error (for squared loss the bias term enters squared)
A model that can maintain the balance between the bias and variance errors is considered the champion model; this is the tradeoff management of the bias and variance errors.
Cross validation (CV): estimates how the model performs on unseen data, i.e. test data that was not used while training the model.
It is the technique in which we define, for each iteration, how many examples to use for the testing dataset and how many for training the model. The process is repeated for all possible combinations of the dataset, as illustrated below for k = 10:
Iteration 1:  test  train train train train train train train train train
Iteration 2:  train test  train train train train train train train train
Iteration 3:  train train test  train train train train train train train
Iteration 4:  train train train test  train train train train train train
Iteration 5:  train train train train test  train train train train train
Iteration 6:  train train train train train test  train train train train
Iteration 7:  train train train train train train test  train train train
Iteration 8:  train train train train train train train test  train train
Iteration 9:  train train train train train train train train test  train
Iteration 10: train train train train train train train train train test
For example, with 4 samples (k = 4): 3 parts are used for training and 1 part for testing in each iteration.
The entire dataset is split into k equal parts; every time, k - 1 parts are used for training and 1 part for testing.
Each time we keep one part as the test set and the remaining k - 1 parts as the training set, and each time the test part shifts, so every part serves as both a testing and a training dataset. The split is done k times in this way; this is k-fold cross validation. Generally the value of k is less than 20, and the most commonly used values are 5 or 10. Finally, CV gives us the mean of the model's accuracy on the unseen data across all folds.
In k-fold CV the test split is not seen by the model during training and the model's performance is tested on it, which gives a better estimate of the predictive power of the model.
LOOCV: leave-one-out cross validation, a particular case of LPOCV (leave-p-out cross validation) with p = 1; it requires less computation time than LPOCV.
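A sketch of k-fold CV and LOOCV with scikit-learn, using the built-in iris dataset for illustration:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# k = 5: each of the 5 equal parts serves once as the test fold
kfold_scores = cross_val_score(model, X, y,
                               cv=KFold(n_splits=5, shuffle=True, random_state=0))
print(kfold_scores.mean())  # CV reports the mean accuracy over the folds

# LOOCV: one example is held out per iteration (n iterations in total)
loocv_scores = cross_val_score(model, X, y, cv=LeaveOneOut())
print(loocv_scores.mean())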
Feature Selection: selection of the relevant features that have a significant impact on the model.
1. RFE (recursive feature elimination): it selects features by recursively considering smaller and smaller sets of features. First the estimator is trained on the initial set of features and the importance of each feature is obtained through a specific attribute; then the features with the lowest feature-importance scores are eliminated, and the procedure is repeated on the remaining features. At the end it returns the best performing features.
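A sketch of RFE with scikit-learn on a made-up classification dataset:

from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=4, random_state=0)

# Drop the least important feature each round until 4 remain
selector = RFE(estimator=LogisticRegression(max_iter=1000),
               n_features_to_select=4, step=1)
selector.fit(X, y)

print(selector.support_)  # boolean mask of the selected features
print(selector.ranking_)  # 1 = selected; larger = eliminated earlier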
2. SelectKBest method: it works by selecting the best features based on univariate statistical tests. The idea here is to calculate some metric between the target and each feature and keep the k best features; we only need to provide the score function that defines our metric.
1. f_classif: selection based on the ANOVA F-value between each feature and the label, for classification tasks.
2. f_regression: selection based on the F-value from univariate regression between each feature and the target, for regression tasks.
3. chi2: chi-square statistics between each non-negative feature and the label, for classification tasks.
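A sketch of SelectKBest with the f_classif score function (swap in f_regression or chi2 as the task requires); the dataset is made up:

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=4, random_state=0)

selector = SelectKBest(score_func=f_classif, k=4)  # keep the 4 best features
X_new = selector.fit_transform(X, y)

print(selector.scores_)  # ANOVA F-value of each feature against the label
print(X_new.shape)       # (200, 4)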
3. Variance threshold: it removes the features that have low variance. Note that it looks only at the features themselves, not at their relationship with the target.
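A sketch of VarianceThreshold with scikit-learn; the tiny matrix below is made up, with a constant first column:

from sklearn.feature_selection import VarianceThreshold

X = [[0, 1, 10],
     [0, 2, 20],
     [0, 3, 30],
     [0, 4, 40]]  # the first column is constant, so its variance is 0

selector = VarianceThreshold(threshold=0.0)  # drop zero-variance features
X_new = selector.fit_transform(X)
print(X_new)  # the constant column is removed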