
Lecture 3

Logistic Regression

Dr. Le Huu Ton

Hanoi, 2020
Outline

Logistic Regression

Gradient Descent

Newton’s Method

2
Classification

Example:
Given data on the prices of houses with sizes from 25-30 m² and their locations, predict whether a house is in Thanh Xuan district or Hoan Kiem district based on its price.
Price (b.VND)   Location
2.5             Thanh Xuan
3.5             Thanh Xuan
5.6             Hoan Kiem
2.2             Thanh Xuan
6.9             Hoan Kiem
9.6             Hoan Kiem

y ∈ {0, 1}
y = 0 => Thanh Xuan (negative)
y = 1 => Hoan Kiem (positive)

4
Classification

Classification with linear regression???


[Figure: Location (y = 0 or 1) plotted against Price, with a fitted regression line and the 0.5 threshold marked]

5
Classification

Linear regression is not a good choice for a classification problem:
h_\theta(x) = \theta^T x

We need a more suitable hypothesis, one satisfying

0 \le h_\theta(x) \le 1

=> h_\theta(x) = g(\theta^T x), \quad \text{where} \quad g(z) = \frac{1}{1 + e^{-z}}

so that

h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}

g(z): sigmoid function or logistic function

6
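A minimal Python sketch of the sigmoid and the hypothesis (NumPy assumed; the names sigmoid/hypothesis are illustrative, not from the slides):

import numpy as np

def sigmoid(z):
    # Logistic (sigmoid) function g(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, x):
    # h_theta(x) = g(theta^T x); theta and x are 1-D NumPy arrays
    return sigmoid(np.dot(theta, x))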
Classification

Classification => Logistic Regression


[Figure: the logistic function, y = h_\theta(x) plotted against \theta^T x]

7
Classification

Interpretation of Hypothesis output


h_\theta(x) can be interpreted as the probability that the output y = 1 for a given input x.
h_\theta(x) = 0.65 => there is a 65% chance that the house is located in Hoan Kiem district.

8
Classification

Example of Image Classification using Caffe


http://demo.caffe.berkeleyvision.org/

9
Classification

Example:
Calculate the output values with the following coefficients:
θ0 = θ1 = 0
θ0 = 0.5, θ1 = 0.7
Price (b.VND)   Location
2.5             Thanh Xuan
3.5             Thanh Xuan
5.6             Hoan Kiem
2.2             Thanh Xuan
6.9             Hoan Kiem
9.6             Hoan Kiem

10
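One way to check this exercise numerically, as a sketch: it assumes the input is encoded as x = (1, price), so that θ0 acts as the intercept.

import numpy as np

prices = np.array([2.5, 3.5, 5.6, 2.2, 6.9, 9.6])
X = np.column_stack([np.ones_like(prices), prices])      # each row is x = (1, price)

for theta in (np.array([0.0, 0.0]), np.array([0.5, 0.7])):
    h = 1.0 / (1.0 + np.exp(-X @ theta))                 # h_theta(x) for every house
    print(theta, np.round(h, 3))

With θ0 = θ1 = 0, every house gets h_\theta(x) = 0.5.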
Classification

Decision Boundary

11
Classification

Decision Boundary

12
Classification

Decision Boundary

13
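The decision-boundary figures from slides 11-13 are not reproduced in this text version. The idea they illustrate: with the usual 0.5 threshold, the model predicts y = 1 when h_\theta(x) \ge 0.5, which is exactly when \theta^T x \ge 0. A minimal sketch (the helper name and threshold parameter are illustrative):

import numpy as np

def predict(theta, X, threshold=0.5):
    # Predict y = 1 when h_theta(x) >= threshold; for threshold 0.5 this is theta^T x >= 0
    h = 1.0 / (1.0 + np.exp(-X @ theta))
    return (h >= threshold).astype(int)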
Classification

Cost Function
Linear Regression:
E(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2

with

h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}

14
Classification

Logistic Regression Cost function

E(h_\theta(x), y) = \begin{cases} -\log(h_\theta(x)) & \text{if } y = 1 \\ -\log(1 - h_\theta(x)) & \text{if } y = 0 \end{cases}

15
Classification

Logistic Regression Cost function

E(h_\theta(x), y) = \begin{cases} -\log(h_\theta(x)) & \text{if } y = 1 \\ -\log(1 - h_\theta(x)) & \text{if } y = 0 \end{cases}

E(h_\theta(x), y) = -y \log(h_\theta(x)) - (1 - y) \log(1 - h_\theta(x))

16
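A short sketch of this cost function in Python (vectorized over the training set; X is assumed to be the m x n design matrix and y the 0/1 label vector):

import numpy as np

def cost(theta, X, y):
    # E(theta) = -(1/m) * sum[ y*log(h) + (1 - y)*log(1 - h) ]
    h = 1.0 / (1.0 + np.exp(-X @ theta))
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))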
Outline

Logistic Regression

Gradient Descent

Newton’s Method

2
Classification

Gradient Descent for logistic regression:

Given the cost function

E(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right]

update θ until convergence:

\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} E(\theta)

18
Classification

Exercise:

Calculate \frac{\partial}{\partial \theta_j} E(\theta)

Hints:

\frac{\partial}{\partial z} g(z) = \frac{\partial}{\partial z} \frac{1}{1 + e^{-z}} = \frac{e^{-z}}{(1 + e^{-z})^2} = \frac{1}{1 + e^{-z}} \left( 1 - \frac{1}{1 + e^{-z}} \right) = g(z)\bigl(1 - g(z)\bigr)

\frac{\partial}{\partial z} \log(z) = \frac{1}{z}, \qquad \frac{\partial}{\partial z} f(g(z)) = \frac{\partial f(g)}{\partial g} \cdot \frac{\partial g}{\partial z}
19
Classification

Solution:

\frac{\partial}{\partial \theta_j} E(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}

20
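Putting the update rule and this gradient together, a sketch of batch gradient descent in Python (the function name, alpha and n_iters defaults are illustrative choices, not fixed by the slides):

import numpy as np

def batch_gradient_descent(X, y, alpha=0.001, n_iters=1000):
    # X: (m, n) design matrix with a leading column of ones, y: (m,) labels in {0, 1}
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_iters):
        h = 1.0 / (1.0 + np.exp(-X @ theta))
        grad = X.T @ (h - y) / m          # (1/m) * sum_i (h_theta(x_i) - y_i) * x_i
        theta = theta - alpha * grad      # theta_j := theta_j - alpha * dE/dtheta_j
    return theta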
Classification

Exercise:
Starting with θ0 and θ1 equal to 0 and α = 0.001, calculate the values of the coefficients after the first iteration of batch gradient descent.

Price   Location       Output Value
2.5     Thanh Xuan     0
3.5     Thanh Xuan     0
5.6     Hoan Kiem      1
2.2     Thanh Xuan     0
6.9     Hoan Kiem      1
9.6     Hoan Kiem      1

21
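A numeric check of the first iteration, as a sketch (it encodes the table with x = (1, price) and Hoan Kiem = 1):

import numpy as np

prices = np.array([2.5, 3.5, 5.6, 2.2, 6.9, 9.6])
y = np.array([0, 0, 1, 0, 1, 1])
X = np.column_stack([np.ones_like(prices), prices])

theta = np.zeros(2)
alpha = 0.001
h = 1.0 / (1.0 + np.exp(-X @ theta))            # every h is 0.5 when theta = 0
theta = theta - alpha * X.T @ (h - y) / len(y)  # one batch update
print(theta)

Note that with three positive and three negative examples, the θ0 component of the gradient is zero at θ = 0, so only θ1 moves in this first step.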
Homework

Exercise:
Starting with θ0 and θ1 equal to 0 and α = 0.0001, calculate the values of the coefficients after the first iteration of gradient descent using the batch, stochastic, and mini-batch learning algorithms (the three update schemes are sketched after this slide).

Price   Location       Output Value
2.5     Thanh Xuan     0
3.5     Thanh Xuan     0
5.6     Hoan Kiem      1
2.2     Thanh Xuan     1
6.9     Hoan Kiem      0
9.6     Hoan Kiem      1

22
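The numbers are left as homework; the sketch below only illustrates how the three update schemes differ over one pass through the data (function names and the batch size are illustrative assumptions):

import numpy as np

def batch_step(theta, X, y, alpha):
    # Batch: one update using all m examples
    h = 1.0 / (1.0 + np.exp(-X @ theta))
    return theta - alpha * X.T @ (h - y) / len(y)

def stochastic_pass(theta, X, y, alpha):
    # Stochastic: one update per training example
    for xi, yi in zip(X, y):
        h = 1.0 / (1.0 + np.exp(-xi @ theta))
        theta = theta - alpha * (h - yi) * xi
    return theta

def minibatch_pass(theta, X, y, alpha, batch_size=2):
    # Mini-batch: one update per small batch of examples
    for start in range(0, len(y), batch_size):
        Xb, yb = X[start:start + batch_size], y[start:start + batch_size]
        h = 1.0 / (1.0 + np.exp(-Xb @ theta))
        theta = theta - alpha * Xb.T @ (h - yb) / len(yb)
    return theta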
Outline

Logistic Regression

Gradient Descent

Newton’s Method

2
Newton’s Method

Logistic Regression: minimize the cost function

E(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right]

Gradient Descent: step by step, modify the coefficients θ so that each modification reduces the cost function:

\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} E(\theta)

Newton's method shares the same idea as the normal equation (linear regression): find the coefficients θ such that

\frac{\partial}{\partial \theta} E(\theta) = J(\theta) = 0

24
Newton’s Method

[Figure: J(θ) = E'(θ) plotted against θ, with successive Newton iterates θ3, θ2, θ1, θ0 approaching the root]
25
Newton’s Method

Start with a random value of the coefficient θ0, then update θ step by step until E'(θ) reaches 0, i.e. E(θ) reaches its minimum:

While J(θ) != 0
{
  - Calculate the tangent line of J(θ) at θt
  - Find the intersection of the tangent line with the θ axis, called θt+1
  - Update θt to θt+1
}

26
Newton’s Method

[Figure: tangent line of J(θ) at θ0 crossing the θ axis at θ1, with iterates θ3, θ2, θ1, θ0]

J(\theta^0) = J'(\theta^0) \cdot \Delta \;\Rightarrow\; \Delta = \frac{J(\theta^0)}{J'(\theta^0)}

\theta^1 = \theta^0 - \frac{J(\theta^0)}{J'(\theta^0)}

27
Newton’s Method

Start with a random value of the coefficient θ0, then update θ step by step:

While J(θ) != 0
{
  - Calculate the tangent line of J(θ) at θt
  - Find the intersection of the tangent line with the θ axis, called θt+1
  - Update θt to θt+1:

    \theta^{t+1} = \theta^t - \frac{J(\theta^t)}{J'(\theta^t)} = \theta^t - \frac{E'(\theta^t)}{E''(\theta^t)}
}

28
Newton’s Method

\theta^{t+1} = \theta^t - \frac{J(\theta^t)}{J'(\theta^t)} = \theta^t - \frac{E'(\theta^t)}{E''(\theta^t)}

\theta := \theta - H^{-1} \nabla E

where H is the Hessian matrix and \nabla E is the vector of first derivatives (the gradient):

H = \begin{bmatrix} H_{00} & H_{01} & \dots & H_{0n} \\ H_{10} & H_{11} & \dots & H_{1n} \\ \vdots & \vdots & \ddots & \vdots \\ H_{n0} & H_{n1} & \dots & H_{nn} \end{bmatrix}, \quad H_{ij} = \frac{\partial^2 E}{\partial \theta_i \partial \theta_j}, \qquad \nabla E = \begin{bmatrix} \frac{\partial E}{\partial \theta_0} \\ \vdots \\ \frac{\partial E}{\partial \theta_n} \end{bmatrix}
29
Newton’s Method


\nabla E = \begin{bmatrix} \frac{\partial E}{\partial \theta_0} \\ \vdots \\ \frac{\partial E}{\partial \theta_n} \end{bmatrix} = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}

H = \frac{1}{m} \sum_{i=1}^{m} \left[ h_\theta(x^{(i)}) \left( 1 - h_\theta(x^{(i)}) \right) x^{(i)} (x^{(i)})^T \right]
30
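Combining the update θ := θ - H^{-1} ∇E with these two formulas, a sketch of Newton's method for logistic regression in Python (the function name and iteration count are illustrative assumptions):

import numpy as np

def newtons_method(X, y, n_iters=10):
    # X: (m, n) design matrix with a leading column of ones, y: (m,) labels in {0, 1}
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_iters):
        h = 1.0 / (1.0 + np.exp(-X @ theta))
        grad = X.T @ (h - y) / m                   # gradient: (1/m) sum (h - y) x
        H = (X.T * (h * (1 - h))) @ X / m          # Hessian: (1/m) sum h(1-h) x x^T
        theta = theta - np.linalg.solve(H, grad)   # theta := theta - H^{-1} grad
    return theta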
Newton’s Method

Which is the best option for checking whether Newton's method has converged?

1. Plot h_\theta(x) as a function of x and check if it fits the data well.
2. Plot E(θ) as a function of θ and check if it has reached a minimum.
3. Plot θ as a function of the number of iterations and check if it has stopped decreasing (or decreases only by a tiny amount per iteration).
4. Plot E(θ) as a function of the number of iterations and check if it has stopped decreasing (or decreases only by a tiny amount per iteration).

31
Newton’s Method

Newton's Method vs Gradient Descent

                           Gradient Descent              Newton's Method
Implementation             Simpler                       More complex
Need to choose parameter   Yes (learning rate α)         No
Convergence speed          Needs more iterations         Needs fewer iterations
Cost per iteration         Cheap, O(n)                   More expensive, O(n^3)
                           (n: number of features)
Application                Use when n is large           Use when n is small
                           (n > 1000)

32
Newton’s Method

Exercise:
Given the following data, compute the Hessian matrix and the derivative (gradient) vector at θ0 = θ1 = 0.

Price (b.VND)   Location
2.5             Thanh Xuan
3               Thanh Xuan
6               Hoan Kiem
2               Thanh Xuan
7               Hoan Kiem
10              Hoan Kiem

33
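A numeric check for this exercise, as a sketch (x = (1, price), Hoan Kiem encoded as 1; at θ = 0 every h_\theta(x) is 0.5, so h(1 - h) = 0.25 for every example):

import numpy as np

prices = np.array([2.5, 3.0, 6.0, 2.0, 7.0, 10.0])
y = np.array([0, 0, 1, 0, 1, 1])
X = np.column_stack([np.ones_like(prices), prices])

theta = np.zeros(2)
h = 1.0 / (1.0 + np.exp(-X @ theta))
grad = X.T @ (h - y) / len(y)             # derivative (gradient) vector at theta = 0
H = (X.T * (h * (1 - h))) @ X / len(y)    # Hessian matrix at theta = 0
print(grad)
print(H)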
References

http://openclassroom.stanford.edu/MainFolder/CoursePage.php?course=MachineLearning

Andrew Ng Slides:
https://datajobs.com/data-science-repo/Generalized-Linear-Models-[Andrew-Ng].pdf

34
