06-Logistic Regression-V Machine Learning

The document discusses logistic regression, an algorithm for classification. It introduces linear classification and decision boundaries. For logistic regression, the linear function is passed through a logistic function that outputs values between 0 and 1, interpreted as probabilities. This softens the hard threshold of linear classification. The parameters θ control the shape and location of the logistic curve. The goal is to fit the parameters to minimize loss on training data, using gradient descent on the cost function.

CS 4104

APPLIED MACHINE LEARNING

Dr. Hashim Yasin


National University of Computer
and Emerging Sciences,
Faisalabad, Pakistan.
LOGISTIC REGRESSION
Logistic Regression

 Linear functions can be used to do classification as well as regression.


Logistic Regression

 Given these training data, the task of classification is to learn a hypothesis ℎ.
 A decision boundary is a line (or a surface, in higher dimensions) that separates the two classes.
 A linear decision boundary is called a linear separator, and data that admit such a separator are called linearly separable.
Logistic Regression

 In this case, the linear separator is
𝑥2 = 1.7𝑥1 − 4.9
or, equivalently,
−𝑥2 + 1.7𝑥1 − 4.9 = 0


Logistic Regression

 Class A lies to the right of this line, with higher values of 𝑥1 and lower values of 𝑥2:
−𝑥2 + 1.7𝑥1 − 4.9 > 0
 For Class B, the inequality is reversed:
−𝑥2 + 1.7𝑥1 − 4.9 < 0
 The classification hypothesis can be written as
ℎ𝛉(𝐱) = 1 if 𝛉.𝐱 ≥ 0, and 0 otherwise
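 As a quick worked check, a point is classified by the sign of −𝑥2 + 1.7𝑥1 − 4.9. A minimal sketch in Python (the sample points are illustrative, not from the slides):

def side_of_separator(x1, x2):
    # Positive -> Class A (right of the line), negative -> Class B.
    return -x2 + 1.7 * x1 - 4.9

print(side_of_separator(6.0, 2.0))  # 3.3 > 0 -> Class A
print(side_of_separator(2.0, 3.0))  # -4.5 < 0 -> Class B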


Logistic Regression

 Alternatively, we can think of ℎ as the result of passing the linear function 𝛉.𝐱 through a threshold function:
ℎ𝛉(𝐱) = threshold(𝛉.𝐱)
threshold(𝛉.𝐱) = 1 if 𝛉.𝐱 ≥ 0, and 0 otherwise
 Equivalently, with an explicit threshold 𝑡:
if ∑ᵢ₌₁ᵐ 𝜃𝑖𝑥𝑖 > 𝑡, output = 1; else output = 0
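 A minimal sketch of this hard-threshold classifier in Python (the weights and input below are illustrative assumptions, not from the slides):

import numpy as np

def threshold_classify(theta, x, t=0.0):
    # Hard-threshold linear classifier: 1 if theta . x > t, else 0.
    return 1 if np.dot(theta, x) > t else 0

theta = np.array([1.7, -1.0])   # weights for x1 and x2
x = np.array([6.0, 2.0])        # an example input
print(threshold_classify(theta, x, t=4.9))  # 1: the point is in Class A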


Logistic Regression

[Figures: decision boundaries of the threshold classifier (if ∑ᵢ 𝜃𝑖𝑥𝑖 > 𝑡, output = 1; else output = 0) for three parameter settings]
 𝜃1 = 1, 𝜃2 = 0.2, 𝑡 = 0.05
 𝜃1 = 2.1, 𝜃2 = 0.2, 𝑡 = 0.05
 𝜃1 = −0.8, 𝜃2 = 0.03, 𝑡 = 0.05


Logistic Regression

 The hypothesis ℎ(𝐱) is not differentiable.
 It is in fact a discontinuous function of its inputs and its weights.
 As a result, learning becomes a very unpredictable adventure.
 The linear classifier always announces a completely confident prediction of 1 or 0, even for examples that are very close to the boundary.


Logistic Regression

 All of these issues can be resolved, to a large extent, by softening the threshold function.
 In logistic regression, we approximate the threshold with a continuous, differentiable function that generates output between 0 and 1:
0 ≤ ℎ𝛉(𝐱) ≤ 1


Logistic Regression

 The logistic function is
𝑔(𝑧) = 𝑒^𝑧 / (1 + 𝑒^𝑧) = 1 / (1 + 𝑒^(−𝑧))
 Passing the linear function through it gives the hypothesis
ℎ𝛉(𝐱) = 1 / (1 + 𝑒^(−𝛉.𝐱)), where 𝑧 = 𝛉ᵀ𝐱 = 𝛉.𝐱
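 A minimal sketch of the logistic hypothesis in Python (the parameter values and sample input are illustrative assumptions):

import numpy as np

def logistic(z):
    # Logistic (sigmoid) function g(z) = 1 / (1 + e^-z).
    return 1.0 / (1.0 + np.exp(-z))

def h(theta, x):
    # Logistic regression hypothesis g(theta . x), a value in (0, 1).
    return logistic(np.dot(theta, x))

theta = np.array([-4.9, 1.7, -1.0])  # [theta0, theta1, theta2]
x = np.array([1.0, 6.0, 2.0])        # x0 = 1 absorbs the intercept
print(h(theta, x))                   # ~0.96, interpreted as P(class = 1)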


Logistic Regression

[Figures: (a) the hard threshold function with 0/1 output; (b) the logistic function]
Logistic Regression

[Figure: plot of a logistic regression hypothesis]


Logistic Regression

 In one dimension, with 𝑧 = 𝜃0 + 𝜃1𝑥:
𝑔(𝑧) = 𝑒^𝑧 / (1 + 𝑒^𝑧) = 𝑒^(𝜃0+𝜃1𝑥) / (1 + 𝑒^(𝜃0+𝜃1𝑥))


Logistic Regression

 The parameters control the shape and location of the logistic/sigmoid curve:
 𝜃0 … controls the location of the midpoint
 𝜃1 … controls the slope of the rise
𝑔(𝑥) = 𝑒^(𝜃0+𝜃1𝑥) / (1 + 𝑒^(𝜃0+𝜃1𝑥))
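 A small sketch of these two effects (the parameter values are illustrative assumptions): the midpoint, where 𝑔 = 0.5, sits at 𝑥 = −𝜃0/𝜃1, and a larger |𝜃1| gives a steeper rise.

import numpy as np

def g(x, theta0, theta1):
    # 1-D logistic curve g(x) = 1 / (1 + e^-(theta0 + theta1*x)).
    return 1.0 / (1.0 + np.exp(-(theta0 + theta1 * x)))

# theta0 shifts the midpoint: g = 0.5 exactly where theta0 + theta1*x = 0.
print(g(2.0, theta0=-2.0, theta1=1.0))   # 0.5 at x = -theta0/theta1 = 2
# theta1 controls steepness: compare values just past the midpoint.
print(g(2.5, theta0=-2.0, theta1=1.0))   # ~0.62, gentle rise
print(g(2.5, theta0=-10.0, theta1=5.0))  # ~0.92, steep rise (same midpoint)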


Logistic Regression

 Notice that the output, being a number between 0 and 1, can be interpreted as the probability of belonging to the class labeled 1.
 If ℎ(𝐱) ≥ 0.5, the output is "1"; otherwise (ℎ(𝐱) < 0.5), the output is "0".
 The process of fitting the parameters of this model to minimize loss on a data set is called logistic regression.
 There is NO easy closed-form solution for this model, but the gradient descent computation is straightforward.
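 A minimal sketch of this decision rule in Python (function names are illustrative):

import numpy as np

def predict(theta, x):
    # Output 1 when h_theta(x) >= 0.5, else 0.
    prob = 1.0 / (1.0 + np.exp(-np.dot(theta, x)))
    return 1 if prob >= 0.5 else 0

 Note that ℎ𝛉(𝐱) ≥ 0.5 exactly when 𝛉.𝐱 ≥ 0, so the 0.5 cutoff reproduces the hard classifier's boundary, while the value of ℎ conveys confidence.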


Cost Function

 We fit 𝛉 by minimizing the squared-error loss 𝐿𝑜𝑠𝑠(𝛉) = (𝑦 − ℎ𝛉(𝐱))², where ℎ𝛉(𝐱) = 1 / (1 + 𝑒^(−𝛉.𝐱)).
 The partial derivative with respect to each 𝜃𝑖 follows from the chain rule:
∂𝑔(𝑓(𝑥))/∂𝑥 = 𝑔′(𝑓(𝑥)) × ∂𝑓(𝑥)/∂𝑥
 Applying it step by step:
∂𝐿𝑜𝑠𝑠(𝛉)/∂𝜃𝑖 = ∂(𝑦 − ℎ𝛉(𝐱))²/∂𝜃𝑖
= 2(𝑦 − ℎ𝛉(𝐱)) × ∂(𝑦 − ℎ𝛉(𝐱))/∂𝜃𝑖
= −2(𝑦 − ℎ𝛉(𝐱)) × 𝑔′(𝛉.𝐱) × ∂(𝛉.𝐱)/∂𝜃𝑖
= −2(𝑦 − ℎ𝛉(𝐱)) × 𝑔′(𝛉.𝐱) × 𝑥𝑖
Cost Function

 The derivative 𝑔′ of the logistic function satisfies 𝑔′(𝑧) = 𝑔(𝑧)(1 − 𝑔(𝑧)).
 So, we have
𝑔′(𝛉.𝐱) = 𝑔(𝛉.𝐱)(1 − 𝑔(𝛉.𝐱)) = ℎ𝛉(𝐱)(1 − ℎ𝛉(𝐱))
 So the weight update for minimizing the loss is
𝜃𝑖 ← 𝜃𝑖 + 𝜂(𝑦 − ℎ𝛉(𝐱)) × ℎ𝛉(𝐱)(1 − ℎ𝛉(𝐱)) × 𝑥𝑖
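 A quick numerical check of the identity 𝑔′(𝑧) = 𝑔(𝑧)(1 − 𝑔(𝑧)) (a minimal sketch; the test value of 𝑧 is arbitrary):

import numpy as np

def g(z):
    return 1.0 / (1.0 + np.exp(-z))

z, eps = 0.7, 1e-6
numeric = (g(z + eps) - g(z - eps)) / (2 * eps)  # central finite difference
analytic = g(z) * (1 - g(z))                     # the identity g' = g(1 - g)
print(numeric, analytic)                         # both ~0.2217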


Gradient Descent Search

𝛉 ← any point in the parameter space
loop until convergence do
    for each 𝜃𝑖 in 𝛉 do
        𝜃𝑖 ← 𝜃𝑖 + 𝜂(𝑦 − ℎ𝛉(𝐱)) × ℎ𝛉(𝐱)(1 − ℎ𝛉(𝐱)) × 𝑥𝑖

 The parameter 𝜂 is the step size, or learning rate.
 It can be a fixed constant, or it can decay over time as the learning process proceeds.
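 A minimal runnable sketch of this loop in Python (the toy data, fixed iteration count, and learning rate are illustrative assumptions; the slides' convergence test is left abstract):

import numpy as np

def h(theta, x):
    # Logistic hypothesis g(theta . x).
    return 1.0 / (1.0 + np.exp(-np.dot(theta, x)))

def fit_logistic(X, y, eta=0.5, iters=1000):
    # Per-example updates using the rule from the slides:
    # theta_i <- theta_i + eta * (y - h) * h * (1 - h) * x_i
    theta = np.zeros(X.shape[1])  # start from some point in parameter space
    for _ in range(iters):
        for x, target in zip(X, y):
            pred = h(theta, x)
            theta += eta * (target - pred) * pred * (1 - pred) * x
    return theta

# Toy linearly separable data; first column is the intercept feature x0 = 1.
X = np.array([[1, 0.5], [1, 1.0], [1, 3.0], [1, 4.0]], dtype=float)
y = np.array([0, 0, 1, 1], dtype=float)
theta = fit_logistic(X, y)
print([round(h(theta, x), 2) for x in X])  # low for class 0, high for class 1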
Gradient Descent…Example

 Find the 𝑥 that minimizes
𝑓(𝑥) = 1.2(𝑥 − 2)² + 3.2
 How?
 Take the derivative, set it equal to zero, and solve for 𝑥.
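 Working this out: 𝑓′(𝑥) = 2.4(𝑥 − 2), and setting 2.4(𝑥 − 2) = 0 gives 𝑥 = 2, with minimum value 𝑓(2) = 3.2. A small sketch reaching the same answer by gradient descent (the starting point and step size are arbitrary choices):

def f_prime(x):
    # Derivative of f(x) = 1.2*(x - 2)**2 + 3.2.
    return 2.4 * (x - 2)

x, eta = 10.0, 0.1
for _ in range(100):
    x = x - eta * f_prime(x)  # step downhill along the gradient
print(x)                      # ~2.0, the analytic minimizer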


Acknowledgement

 Tom Mitchell, Russell & Norvig, Andrew Ng, Alpaydin, and Ch. Eick.

