Logistic Regression
Sargur N. Srihari
University at Buffalo, State University of New York
USA
The likelihood function for the logistic regression model is

    p(\mathbf{t} \mid \mathbf{w}) = \prod_{n=1}^{N} y_n^{t_n} \{1 - y_n\}^{1 - t_n}

where \mathbf{t} = (t_1, \ldots, t_N)^T and y_n = p(C_1 \mid \phi_n).
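A minimal NumPy sketch of this likelihood (t and y are illustrative arrays; in the model, y_n would come from σ(w^T φ_n)):

import numpy as np

# Illustrative targets t_n and model outputs y_n = p(C_1 | phi_n)
t = np.array([1, 0, 1, 1, 0])
y = np.array([0.9, 0.2, 0.7, 0.6, 0.1])

# p(t | w) = prod_n y_n^{t_n} (1 - y_n)^{1 - t_n}
likelihood = np.prod(y**t * (1 - y)**(1 - t))

# The log-likelihood is preferred numerically
log_likelihood = np.sum(t * np.log(y) + (1 - t) * np.log(1 - y))

print(likelihood, log_likelihood)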
Taking the negative logarithm of the likelihood

    p(\mathbf{t} \mid \mathbf{w}) = \prod_{n=1}^{N} y_n^{t_n} \{1 - y_n\}^{1 - t_n}

gives the cross-entropy error function

    E(\mathbf{w}) = -\ln p(\mathbf{t} \mid \mathbf{w}) = -\sum_{n=1}^{N} \{ t_n \ln y_n + (1 - t_n) \ln(1 - y_n) \}
What is Cross-entropy?
• Entropy of p(x) is defined as H(p) = -\sum_x p(x) \log p(x)
• Cross-entropy between p(x) and q(x) is H(p, q) = -\sum_x p(x) \log q(x); the error E(w) above has this form, with the targets t_n playing the role of p and the model outputs y_n the role of q.
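A small NumPy sketch of these two definitions (p and q are illustrative distributions over the same discrete variable):

import numpy as np

p = np.array([0.5, 0.25, 0.25])
q = np.array([0.4, 0.4, 0.2])

# Entropy H(p) = -sum_x p(x) log p(x)
H_p = -np.sum(p * np.log(p))

# Cross-entropy H(p, q) = -sum_x p(x) log q(x); it equals H(p) when q = p
H_pq = -np.sum(p * np.log(q))

print(H_p, H_pq)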
The cross-entropy error function

    E(\mathbf{w}) = -\ln p(\mathbf{t} \mid \mathbf{w}) = -\sum_{n=1}^{N} \{ t_n \ln y_n + (1 - t_n) \ln(1 - y_n) \}

where y_n = \sigma(\mathbf{w}^T \phi_n), has gradient

    \nabla E(\mathbf{w}) = \sum_{n=1}^{N} (y_n - t_n)\, \phi_n
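A minimal NumPy sketch of this error and its gradient (Phi, t, w are illustrative names; rows of Phi are the basis-function vectors φ_n):

import numpy as np

def sigmoid(a):
    return 1 / (1 + np.exp(-a))

# Illustrative design matrix (rows are phi_n), targets and weights
Phi = np.array([[1.0, 0.5], [1.0, -1.2], [1.0, 2.3]])
t = np.array([1.0, 0.0, 1.0])
w = np.zeros(2)

y = sigmoid(Phi @ w)                                   # y_n = sigma(w^T phi_n)
E = -np.sum(t * np.log(y) + (1 - t) * np.log(1 - y))   # cross-entropy error E(w)
grad = Phi.T @ (y - t)                                  # sum_n (y_n - t_n) phi_n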
Finding db and dw
Derivative w.r.t. p → derivative w.r.t. z, via the chain rule through the sigmoid.
https://towardsdatascience.com/logistic-regression-from-very-scratch-ea914961f320
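A worked version of that chain-rule step, in the document's notation (t is the target and p = σ(z) the prediction; the code below names the target array Y):

\frac{\partial L}{\partial p} = -\frac{t}{p} + \frac{1 - t}{1 - p}, \qquad
\frac{\partial p}{\partial z} = p(1 - p), \qquad
\frac{\partial L}{\partial z} = \frac{\partial L}{\partial p}\,\frac{\partial p}{\partial z} = p - t

Since z = \mathbf{w}^T X + b, averaging over the m examples gives dw = \frac{1}{m} X\, dz^T and db = \frac{1}{m}\sum_n dz_n, which is what the code computes.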
Logistic Regression Code in Python
Use scikit-learn to create a dataset:
import sklearn.datasets
import matplotlib.pyplot as plt
import numpy as np

# Two-class dataset; after reshaping, X has shape (2, 500) and Y has shape (1, 500)
X, Y = sklearn.datasets.make_moons(n_samples=500, noise=.2)
X, Y = X.T, Y.reshape(1, Y.shape[0])

epochs = 1000
learningrate = 0.01

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

losstrack = []
m = X.shape[1]                              # number of examples
w = np.random.randn(X.shape[0], 1) * 0.01   # small random initial weights
b = 0

for epoch in range(epochs):
    z = np.dot(w.T, X) + b                  # linear scores
    p = sigmoid(z)                          # predicted probabilities
    # Cross-entropy loss averaged over the m examples
    cost = -np.sum(np.multiply(np.log(p), Y) + np.multiply((1 - Y), np.log(1 - p))) / m
    losstrack.append(np.squeeze(cost))
    dz = p - Y                              # derivative of loss w.r.t. z
    dw = (1 / m) * np.dot(X, dz.T)          # gradient w.r.t. weights
    db = (1 / m) * np.sum(dz)               # gradient w.r.t. bias
    w = w - learningrate * dw               # gradient-descent updates
    b = b - learningrate * db

plt.plot(losstrack)
plt.show()
Prediction: From the code above, you find p. It will be between 0 and 1. Classify a point as class 1 when p ≥ 0.5, otherwise as class 0.
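A minimal continuation of the code above that turns p into hard class labels (the 0.5 threshold is the conventional choice; the accuracy reported here is training accuracy):

# Threshold the probabilities at 0.5 to get class predictions
predictions = (p >= 0.5).astype(int)
accuracy = np.mean(predictions == Y)
print("Training accuracy:", accuracy)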
[Figure: the logistic sigmoid σ(a), and a scatter plot of linearly separable data.]
Gradient of the error function:

    \nabla E = -\sum_{n=1}^{N} \{ t_n - \mathbf{w}^T \phi(\mathbf{x}_n) \}\, \phi(\mathbf{x}_n)^T    (without reg)

    \nabla E = \left[ -\sum_{n=1}^{N} \{ t_n - \mathbf{w}^T \phi(\mathbf{x}_n) \}\, \phi(\mathbf{x}_n)^T \right] + \lambda \mathbf{w}    (with reg)
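A NumPy sketch of the two gradients (Phi, t, w, lam are illustrative names; lam stands for the regularization coefficient λ):

import numpy as np

# Illustrative design matrix (rows are phi(x_n)), targets and weights
Phi = np.array([[1.0, 0.3], [1.0, -0.7], [1.0, 1.5]])
t = np.array([1.0, 0.0, 1.0])
w = np.array([0.1, -0.2])
lam = 0.01                               # regularization coefficient lambda

# -sum_n { t_n - w^T phi(x_n) } phi(x_n)   (without regularization)
grad = -Phi.T @ (t - Phi @ w)

# Same gradient plus the penalty term lambda * w   (with regularization)
grad_reg = grad + lam * w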
Newton-Raphson update: \mathbf{w}^{(new)} = \mathbf{w}^{(old)} - \mathbf{H}^{-1} \nabla E(\mathbf{w})
• Gradient: \nabla E(\mathbf{w}) = \sum_{n=1}^{N} (y_n - t_n)\,\phi_n = \Phi^T(\mathbf{y} - \mathbf{t})
• Hessian: \mathbf{H} = \nabla\nabla E(\mathbf{w}) = \sum_{n=1}^{N} y_n(1 - y_n)\,\phi_n\phi_n^T = \Phi^T \mathbf{R}\,\Phi, where \mathbf{R} is diagonal with R_{nn} = y_n(1 - y_n)
Applying the Newton-Raphson update with this gradient and Hessian gives the iteratively reweighted least squares (IRLS) algorithm.
Number of iterations: 10
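A minimal IRLS sketch based on these formulas (Phi and t are illustrative arrays; the iteration count is fixed rather than tested for convergence):

import numpy as np

def sigmoid(a):
    return 1 / (1 + np.exp(-a))

# Illustrative design matrix (rows are phi_n) and binary targets (not separable)
Phi = np.array([[1.0, 0.5], [1.0, -1.2], [1.0, 2.3], [1.0, -0.4]])
t = np.array([1.0, 0.0, 0.0, 1.0])

w = np.zeros(Phi.shape[1])
for _ in range(10):                      # 10 Newton-Raphson / IRLS iterations
    y = sigmoid(Phi @ w)
    R = np.diag(y * (1 - y))             # R_nn = y_n (1 - y_n)
    grad = Phi.T @ (y - t)               # gradient of the cross-entropy error
    H = Phi.T @ R @ Phi                  # Hessian
    w = w - np.linalg.solve(H, grad)     # w_new = w_old - H^{-1} grad

Note that for linearly separable data the maximum-likelihood weights diverge, so in practice IRLS is combined with regularization or a convergence check.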
[Figure/equation residue: a comparison involving Markov models and logistic regression; an indicator variable has value 1 when y = 1, else 0.]
https://storage.ning.com/topology/rest/1.0/file/get/2408482975?profile=original