
Student Name: Nguyen Dinh Hoang

Student ID: 20021361

Machine Learning (INT3405E)


Homework 02

Question 1
Learning XOR function - 10pts
If we fit the dataset with a logistic regression model, what is the result of the gradient descent
training: Convergence or Non-Convergence?

1. Explain your choice

2. Could you verify your answer with some experimental results?

1. The result of the gradient descent training is Non-Convergence. The XOR data is not
linearly separable, so no line can be found that divides the class-0 points from the class-1
points, and the problem therefore has no solution.

2. Logistic Regression only produces linear decision boundaries (contours).

It is easy to see that we cannot find a line that divides the 0 and 1 regions.
Try with 2-D data:
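A minimal sketch of such an experiment, assuming NumPy (the dataset encoding, learning rate, and iteration count are illustrative choices):

import numpy as np

# XOR dataset: 2-D inputs and their labels
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
w = rng.normal(size=2)   # weights
w0 = rng.normal()        # bias
lr = 0.5

for step in range(10000):
    p = sigmoid(X @ w + w0)
    grad_w = X.T @ (y - p)      # gradient of the log-likelihood w.r.t. w
    grad_w0 = np.sum(y - p)     # gradient w.r.t. the bias
    w += lr * grad_w            # gradient ascent on the log-likelihood
    w0 += lr * grad_w0

print(sigmoid(X @ w + w0))  # predictions stay close to 0.5 for every point: no separating line exists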

Question 2
1. Apply Newton's method to train the Logistic Regression model (write out the concrete
update formulas for the parameter set w, w0).

2. Plot the learning path of Newton's method and of the gradient descent method (as in the
slides) on the AND dataset and the XOR dataset. You should test with several initial values
of the parameter set w, w0.

We have the log-likelihood function:

\ell(\theta) = \sum_{i=1}^{n} \left[ y_i \log(h_\theta(x_i)) + (1 - y_i) \log(1 - h_\theta(x_i)) \right]

Newton's method proceeds as follows:

1. Find the tangent line to f(x) at the point (x_n, y_n):

   y = f'(x_n)(x - x_n) + f(x_n)    (1)

2. Find the x-intercept x_{n+1} of the tangent line:

   0 = f'(x_n)(x_{n+1} - x_n) + f(x_n)    (2)

   -f(x_n) = f'(x_n)(x_{n+1} - x_n)    (3)

   x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}    (4)

3. Find the y value at the x-intercept:

   y_{n+1} = f(x_{n+1})    (5)

IF y_{n+1} - y_n ≈ 0, return y_{n+1}

ELSE set x_n = x_{n+1} and y_n = y_{n+1}, then return to equation (1).
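A minimal sketch of this scalar iteration in Python, assuming f and its derivative f_prime are supplied as functions (the names and tolerance are illustrative):

def newton_root(f, f_prime, x0, tol=1e-7, max_iter=50):
    # Iterate x_{n+1} = x_n - f(x_n) / f'(x_n) until f(x) barely changes
    x = x0
    y = f(x)
    for _ in range(max_iter):
        x_next = x - f(x) / f_prime(x)
        y_next = f(x_next)
        if abs(y_next - y) < tol:   # stopping rule from step 3 above
            return x_next
        x, y = x_next, y_next
    return x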

Apply to Logistic Regression:


\hat{x}_{n+1} = \hat{x}_n - \frac{f(\hat{x}_n)}{\nabla f(\hat{x}_n)}

\theta_{n+1} = \theta_n + H_{\ell(\hat{\theta})}^{-1} \nabla \ell(\theta)

The gradient of \ell(\theta) is:

\nabla \ell(\theta) = \begin{pmatrix} \sum_{i=1}^{n} (y_i - h_\theta(x_i))\, x_i \\ \sum_{i=1}^{n} (y_i - h_\theta(x_i)) \end{pmatrix}

And the Hessian matrix is:

H_{\ell(\theta)} = \begin{pmatrix} \sum_{i=1}^{n} h_\theta(x_i)(1 - h_\theta(x_i))\, x_i^2 & \sum_{i=1}^{n} h_\theta(x_i)(1 - h_\theta(x_i))\, x_i \\ \sum_{i=1}^{n} h_\theta(x_i)(1 - h_\theta(x_i))\, x_i & \sum_{i=1}^{n} h_\theta(x_i)(1 - h_\theta(x_i)) \end{pmatrix}

Note: h_\theta(x) = \frac{1}{1 + e^{-z}} and z = \theta_1 x + \theta_2.
We have:

Python code:
import numpy as np

def newtons_method(x, y):
    # Random initialisation of the two parameters
    theta_1 = np.random.rand()
    theta_2 = np.random.rand()
    delta_l = np.inf
    l = log_likelihood(x, y, theta_1, theta_2)
    max_iterations = 15
    i = 0
    while abs(delta_l) > 1e-7 and i < max_iterations:
        i += 1
        g = gradient(x, y, theta_1, theta_2)       # 2x1 gradient vector
        hess = hessian(x, y, theta_1, theta_2)     # 2x2 Hessian matrix
        H_inv = np.linalg.inv(hess)
        delta = np.dot(H_inv, g)                   # Newton step: H^{-1} * gradient
        delta_theta_1 = delta[0][0]
        delta_theta_2 = delta[1][0]

        # Update step
        theta_1 += delta_theta_1
        theta_2 += delta_theta_2

        # Update the log-likelihood at each iteration
        l_new = log_likelihood(x, y, theta_1, theta_2)
        delta_l = l - l_new
        l = l_new
    return np.array([theta_1, theta_2])
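The function above relies on log_likelihood, gradient, and hessian helpers that are not shown; a possible sketch of them, following the gradient and Hessian formulas above (x and y are assumed to be 1-D NumPy arrays, and the 2x1 gradient shape is chosen so that the delta[0][0] indexing works):

def h(x, theta_1, theta_2):
    # Sigmoid hypothesis h_theta(x) with z = theta_1 * x + theta_2
    return 1.0 / (1.0 + np.exp(-(theta_1 * x + theta_2)))

def log_likelihood(x, y, theta_1, theta_2):
    p = h(x, theta_1, theta_2)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def gradient(x, y, theta_1, theta_2):
    p = h(x, theta_1, theta_2)
    # 2x1 column vector: [sum (y - p) * x, sum (y - p)]
    return np.array([[np.sum((y - p) * x)],
                     [np.sum(y - p)]])

def hessian(x, y, theta_1, theta_2):
    p = h(x, theta_1, theta_2)
    w = p * (1 - p)
    # 2x2 matrix of weighted sums, matching the Hessian written above
    return np.array([[np.sum(w * x * x), np.sum(w * x)],
                     [np.sum(w * x),     np.sum(w)]])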

AND dataset:
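The learning-path plots can be produced by storing each (theta_1, theta_2) pair during training and plotting the recorded sequence; a minimal matplotlib sketch (the function name and labels are illustrative):

import matplotlib.pyplot as plt

def plot_learning_path(path, label):
    # path: array of shape (n_steps, 2) holding (theta_1, theta_2) after every update,
    # e.g. collected by appending inside the while loop of newtons_method above
    plt.plot(path[:, 0], path[:, 1], marker='o', label=label)
    plt.xlabel('theta_1')
    plt.ylabel('theta_2')
    plt.legend()
    plt.show()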
