
Lecture 3

Logistic Regression

Dr. Le Huu Ton

Hanoi, 2020
Outline

Logistic Regression

Gradient Descent

Newton’s Method

2
Classification

Example:
Given data on the prices of houses with sizes from 25-30 m² and their locations, predict whether a house is in Thanh Xuan district or Hoan Kiem district based on its price.
Price (b.VND)   Location
2.5             Thanh Xuan
3.5             Thanh Xuan
5.6             Hoan Kiem
2.2             Thanh Xuan
6.9             Hoan Kiem
9.6             Hoan Kiem

y ∈ {0, 1}
y = 0 => Thanh Xuan (negative)
y = 1 => Hoan Kiem (positive)

4
Classification

Classification with linear regression???


[Figure: Location (y = 0 or 1) plotted against Price, with a fitted regression line and the 0.5 threshold marked]

5
Classification

Linear regression is not a good choice for a classification problem:
h_\theta(x) = \theta^T x

We need a more suitable hypothesis, one satisfying

0 \le h_\theta(x) \le 1

=> h_\theta(x) = g(\theta^T x), \quad \text{where} \quad g(z) = \frac{1}{1 + e^{-z}}

so that

h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}

g(z): sigmoid function or logistic function

6
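A minimal Python sketch of the sigmoid and the hypothesis (NumPy assumed; the names sigmoid/hypothesis are illustrative, not from the slides):

import numpy as np

def sigmoid(z):
    # Logistic (sigmoid) function g(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, x):
    # h_theta(x) = g(theta^T x); theta and x are 1-D NumPy arrays
    return sigmoid(np.dot(theta, x))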
Classification

Classification => Logistic Regression


[Figure: the logistic function, y = h_\theta(x) plotted against \theta^T x]

7
Classification

Interpretation of Hypothesis output


h_\theta(x) can be interpreted as the probability that the output y = 1 for a given input x.
h_\theta(x) = 0.65 => there is a 65% chance that the house is located in Hoan Kiem district.

8
Classification

Example of Image Classification using Caffe


http://demo.caffe.berkeleyvision.org/

9
Classification

Example:
Calculate the output values with the following coefficients:
θ0 = θ1 = 0
θ0 = 0.5, θ1 = 0.7
Price (b.VND)   Location
2.5             Thanh Xuan
3.5             Thanh Xuan
5.6             Hoan Kiem
2.2             Thanh Xuan
6.9             Hoan Kiem
9.6             Hoan Kiem

10
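One way to check this exercise numerically, as a sketch: it assumes the input is encoded as x = (1, price), so that θ0 acts as the intercept.

import numpy as np

prices = np.array([2.5, 3.5, 5.6, 2.2, 6.9, 9.6])
X = np.column_stack([np.ones_like(prices), prices])      # each row is x = (1, price)

for theta in (np.array([0.0, 0.0]), np.array([0.5, 0.7])):
    h = 1.0 / (1.0 + np.exp(-X @ theta))                 # h_theta(x) for every house
    print(theta, np.round(h, 3))

With θ0 = θ1 = 0, every house gets h_\theta(x) = 0.5.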
Classification

Decision Boundary

11
Classification

Decision Boundary

12
Classification

Decision Boundary

13
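The decision-boundary figures from slides 11-13 are not reproduced in this text version. The idea they illustrate: with the usual 0.5 threshold, the model predicts y = 1 when h_\theta(x) \ge 0.5, which is exactly when \theta^T x \ge 0. A minimal sketch (the helper name and threshold parameter are illustrative):

import numpy as np

def predict(theta, X, threshold=0.5):
    # Predict y = 1 when h_theta(x) >= threshold; for threshold 0.5 this is theta^T x >= 0
    h = 1.0 / (1.0 + np.exp(-X @ theta))
    return (h >= threshold).astype(int)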
Classification

Cost Function
Linear Regression:
E(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2

with

h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}

14
Classification

Logistic Regression Cost function

E(h_\theta(x), y) = \begin{cases} -\log(h_\theta(x)) & \text{if } y = 1 \\ -\log(1 - h_\theta(x)) & \text{if } y = 0 \end{cases}

15
Classification

Logistic Regression Cost function

E(h_\theta(x), y) = \begin{cases} -\log(h_\theta(x)) & \text{if } y = 1 \\ -\log(1 - h_\theta(x)) & \text{if } y = 0 \end{cases}

E(h_\theta(x), y) = -y \log(h_\theta(x)) - (1 - y) \log(1 - h_\theta(x))

16
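A short sketch of this cost function in Python (vectorized over the training set; X is assumed to be the m x n design matrix and y the 0/1 label vector):

import numpy as np

def cost(theta, X, y):
    # E(theta) = -(1/m) * sum[ y*log(h) + (1 - y)*log(1 - h) ]
    h = 1.0 / (1.0 + np.exp(-X @ theta))
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))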
Outline

Logistic Regression

Gradient Descent

Newton’s Method

2
Classification

Gradient Descent for logistic regression:

Given the cost function

E(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right]

update θ until convergence:

\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} E(\theta)

18
Classification

Exercise:

Calculate \frac{\partial}{\partial \theta_j} E(\theta)

Hints:

\frac{\partial}{\partial z} g(z) = \frac{\partial}{\partial z} \frac{1}{1 + e^{-z}} = \frac{e^{-z}}{(1 + e^{-z})^2} = \frac{1}{1 + e^{-z}} \left( 1 - \frac{1}{1 + e^{-z}} \right) = g(z)\bigl(1 - g(z)\bigr)

\frac{\partial}{\partial z} \log(z) = \frac{1}{z}, \qquad \frac{\partial}{\partial z} f(g(z)) = \frac{\partial f(g)}{\partial g} \cdot \frac{\partial g}{\partial z}
19
Classification

Solution:

\frac{\partial}{\partial \theta_j} E(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}

20
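Putting the update rule and this gradient together, a sketch of batch gradient descent in Python (the function name, alpha and n_iters defaults are illustrative choices, not fixed by the slides):

import numpy as np

def batch_gradient_descent(X, y, alpha=0.001, n_iters=1000):
    # X: (m, n) design matrix with a leading column of ones, y: (m,) labels in {0, 1}
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_iters):
        h = 1.0 / (1.0 + np.exp(-X @ theta))
        grad = X.T @ (h - y) / m          # (1/m) * sum_i (h_theta(x_i) - y_i) * x_i
        theta = theta - alpha * grad      # theta_j := theta_j - alpha * dE/dtheta_j
    return theta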
Classification

Exercise:
Starting with θ0 and θ1 equal to 0 and α = 0.001, calculate the values of the coefficients after the first iteration of batch gradient descent.

Price   Location       Output Value
2.5     Thanh Xuan     0
3.5     Thanh Xuan     0
5.6     Hoan Kiem      1
2.2     Thanh Xuan     0
6.9     Hoan Kiem      1
9.6     Hoan Kiem      1

21
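A numeric check of the first iteration, as a sketch (it encodes the table with x = (1, price) and Hoan Kiem = 1):

import numpy as np

prices = np.array([2.5, 3.5, 5.6, 2.2, 6.9, 9.6])
y = np.array([0, 0, 1, 0, 1, 1])
X = np.column_stack([np.ones_like(prices), prices])

theta = np.zeros(2)
alpha = 0.001
h = 1.0 / (1.0 + np.exp(-X @ theta))            # every h is 0.5 when theta = 0
theta = theta - alpha * X.T @ (h - y) / len(y)  # one batch update
print(theta)

Note that with three positive and three negative examples, the θ0 component of the gradient is zero at θ = 0, so only θ1 moves in this first step.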
Homework

Exercise:
Starting with θ0 and θ1 equal to 0 and α = 0.0001, calculate the values of the coefficients after the first iteration of gradient descent using the batch, stochastic, and mini-batch learning algorithms (the three update schemes are sketched after this slide).

Price   Location       Output Value
2.5     Thanh Xuan     0
3.5     Thanh Xuan     0
5.6     Hoan Kiem      1
2.2     Thanh Xuan     1
6.9     Hoan Kiem      0
9.6     Hoan Kiem      1

22
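The numbers are left as homework; the sketch below only illustrates how the three update schemes differ over one pass through the data (function names and the batch size are illustrative assumptions):

import numpy as np

def batch_step(theta, X, y, alpha):
    # Batch: one update using all m examples
    h = 1.0 / (1.0 + np.exp(-X @ theta))
    return theta - alpha * X.T @ (h - y) / len(y)

def stochastic_pass(theta, X, y, alpha):
    # Stochastic: one update per training example
    for xi, yi in zip(X, y):
        h = 1.0 / (1.0 + np.exp(-xi @ theta))
        theta = theta - alpha * (h - yi) * xi
    return theta

def minibatch_pass(theta, X, y, alpha, batch_size=2):
    # Mini-batch: one update per small batch of examples
    for start in range(0, len(y), batch_size):
        Xb, yb = X[start:start + batch_size], y[start:start + batch_size]
        h = 1.0 / (1.0 + np.exp(-Xb @ theta))
        theta = theta - alpha * Xb.T @ (h - yb) / len(yb)
    return theta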
Outline

Logistic Regression

Gradient Descent

Newton’s Method

2
Newton’s Method

Logistic Regression: minimize the cost function

E(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right]

Gradient Descent: step by step, modify the coefficients θ so that each modification reduces the cost function:

\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} E(\theta)

Newton's method shares the same idea as the normal equation (linear regression): find the coefficients θ such that

\frac{\partial}{\partial \theta} E(\theta) = J(\theta) = 0

24
Newton’s Method

[Figure: J(θ) = E'(θ) plotted against θ, with successive Newton iterates θ3, θ2, θ1, θ0 approaching the root]
25
Newton’s Method

Start with a random value of the coefficient θ0, then update θ step by step until E'(θ) reaches 0, i.e. E(θ) reaches its minimum:

While J(θ) != 0
{
  - Calculate the tangent line of J(θ) at θt
  - Find the intersection of the tangent line with the θ axis, called θt+1
  - Update θt to θt+1
}

26
Newton’s Method

[Figure: tangent line of J(θ) at θ0 crossing the θ axis at θ1, with iterates θ3, θ2, θ1, θ0]

J(\theta^0) = J'(\theta^0) \cdot \Delta \;\Rightarrow\; \Delta = \frac{J(\theta^0)}{J'(\theta^0)}

\theta^1 = \theta^0 - \frac{J(\theta^0)}{J'(\theta^0)}

27
Newton’s Method

Start with a random value of the coefficient θ0, then update θ step by step:

While J(θ) != 0
{
  - Calculate the tangent line of J(θ) at θt
  - Find the intersection of the tangent line with the θ axis, called θt+1
  - Update θt to θt+1:

    \theta^{t+1} = \theta^t - \frac{J(\theta^t)}{J'(\theta^t)} = \theta^t - \frac{E'(\theta^t)}{E''(\theta^t)}
}

28
Newton’s Method

\theta^{t+1} = \theta^t - \frac{J(\theta^t)}{J'(\theta^t)} = \theta^t - \frac{E'(\theta^t)}{E''(\theta^t)}

\theta := \theta - H^{-1} \nabla E

where H is the Hessian matrix and \nabla E is the vector of first derivatives (the gradient):

H = \begin{bmatrix} H_{00} & H_{01} & \dots & H_{0n} \\ H_{10} & H_{11} & \dots & H_{1n} \\ \vdots & \vdots & \ddots & \vdots \\ H_{n0} & H_{n1} & \dots & H_{nn} \end{bmatrix}, \quad H_{ij} = \frac{\partial^2 E}{\partial \theta_i \partial \theta_j}, \qquad \nabla E = \begin{bmatrix} \frac{\partial E}{\partial \theta_0} \\ \vdots \\ \frac{\partial E}{\partial \theta_n} \end{bmatrix}
29
Newton’s Method


\nabla E = \begin{bmatrix} \frac{\partial E}{\partial \theta_0} \\ \vdots \\ \frac{\partial E}{\partial \theta_n} \end{bmatrix} = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}

H = \frac{1}{m} \sum_{i=1}^{m} \left[ h_\theta(x^{(i)}) \left( 1 - h_\theta(x^{(i)}) \right) x^{(i)} (x^{(i)})^T \right]
30
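Combining the update θ := θ - H^{-1} ∇E with these two formulas, a sketch of Newton's method for logistic regression in Python (the function name and iteration count are illustrative assumptions):

import numpy as np

def newtons_method(X, y, n_iters=10):
    # X: (m, n) design matrix with a leading column of ones, y: (m,) labels in {0, 1}
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_iters):
        h = 1.0 / (1.0 + np.exp(-X @ theta))
        grad = X.T @ (h - y) / m                   # gradient: (1/m) sum (h - y) x
        H = (X.T * (h * (1 - h))) @ X / m          # Hessian: (1/m) sum h(1-h) x x^T
        theta = theta - np.linalg.solve(H, grad)   # theta := theta - H^{-1} grad
    return theta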
Newton’s Method

Which is the best option for checking whether Newton's method has converged?

1. Plot h_\theta(x) as a function of x and check if it fits the data well.
2. Plot E(θ) as a function of θ and check if it has reached a minimum.
3. Plot θ as a function of the number of iterations and check if it has stopped decreasing (or decreases only by a tiny amount per iteration).
4. Plot E(θ) as a function of the number of iterations and check if it has stopped decreasing (or decreases only by a tiny amount per iteration).

31
Newton’s Method

Newton's Method vs Gradient Descent

                           Gradient Descent              Newton's Method
Implementation             Simpler                       More complex
Need to choose parameter   Yes (learning rate α)         No
Convergence speed          Needs more iterations         Needs fewer iterations
Cost per iteration         Cheap, O(n)                   More expensive, O(n^3)
                           (n: number of features)
Application                Use when n is large           Use when n is small
                           (n > 1000)

32
Newton’s Method

Exercise:
Given the following data, compute the Hessian matrix and the derivative (gradient) vector at θ0 = θ1 = 0.

Price (b.VND)   Location
2.5             Thanh Xuan
3               Thanh Xuan
6               Hoan Kiem
2               Thanh Xuan
7               Hoan Kiem
10              Hoan Kiem

33
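A numeric check for this exercise, as a sketch (x = (1, price), Hoan Kiem encoded as 1; at θ = 0 every h_\theta(x) is 0.5, so h(1 - h) = 0.25 for every example):

import numpy as np

prices = np.array([2.5, 3.0, 6.0, 2.0, 7.0, 10.0])
y = np.array([0, 0, 1, 0, 1, 1])
X = np.column_stack([np.ones_like(prices), prices])

theta = np.zeros(2)
h = 1.0 / (1.0 + np.exp(-X @ theta))
grad = X.T @ (h - y) / len(y)             # derivative (gradient) vector at theta = 0
H = (X.T * (h * (1 - h))) @ X / len(y)    # Hessian matrix at theta = 0
print(grad)
print(H)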
References

http://openclassroom.stanford.edu/MainFolder/CoursePage.php?course=MachineLearning

Andrew Ng Slides:
https://datajobs.com/data-science-repo/Generalized-Linear-Models-[Andrew-Ng].pdf

34
