
Lecture 8: Logistic Regression
Outline

1. The classification problem

2. Why not linear regression?

3. Logistic regression formulation

4. Logistic regression cost function

5. Worked examples

The classification problem
The linear regression model discussed in the previous lesson assumes that the
response variable 𝑦 is quantitative (metric)
• in many situations, the response variable is instead qualitative (categorical)

Qualitative variables take values in an unordered set 𝒞 = {cat₁, …, cat_K}, such as:
• eye color ∈ {"brown", "blue", "green"}
• email ∈ {"spam", "not spam"}

Metric data:
• Describe a quantity
• An ordering is defined
• A distance is defined

Categorical data:
• Describe membership in categories
• It is not meaningful to apply an ordering
• It is not meaningful to compute distances
The classification problem
The process of estimating categorical outcomes using a set of regressors 𝝋 is called
classification

Estimating a categorical response for an observation 𝝋 can be referred to as classifying
that observation, since it involves assigning the observation to a category, or class

Often we are more interested in estimating the probabilities that 𝝋 belongs to each
category in 𝒞

The most probable category is then chosen as the class for the observation 𝝋
Examples of classification problems
• A person arrives at the emergency room with a set of symptoms that could possibly
  be attributed to one of three medical conditions
  Which of the three conditions does the individual have?

• An online banking system manages transactions, storing the user's IP address, past
  transaction history, and so forth
  Is the transaction fraudulent or not?

• A biologist collects DNA sequence data for a number of patients with and without
  a given disease
  Which DNA mutations are deleterious (disease-causing) and which are not?
Example: cat vs dog classification
Suppose that we measure the weight and height of some dogs and cats

We want to learn a classifier function 𝑓(⋅) that can tell us if a given input vector
𝝋 = (𝜑₁, 𝜑₂)ᵀ is a dog or a cat
• 𝜑₁: weight [kg]
• 𝜑₂: height [cm]

[Figure: cats and dogs plotted in the (weight [kg], height [cm]) plane, separated by the classifier function 𝑓(⋅)]

QUIZ: The point shown in the figure is classified by the model as a ?
The classification problem

QUIZ: Consider a company that produces sliding gates. The gates can have four
weights {300 kg, 400 kg, 500 kg, 600 kg}. We want to detect the weight of the
gate. This is a:

☐ A regression problem

☐ A classification problem

☐ Both a regression and a classification problem
Why not linear regression?
Suppose that we are trying to estimate the medical condition of a patient in the
emergency room based on her symptoms

There are three possibilities: stroke, drug overdose and epileptic seizure
We could consider encoding these values as a quantitative response variable, 𝑦, as

$$y = \begin{cases} 1 & \text{if stroke} \\ 2 & \text{if drug overdose} \\ 3 & \text{if epileptic seizure} \end{cases}$$

However, we are implicitly saying that the «difference» between drug overdose and
stroke is the same as the «difference» between epileptic seizure and drug
overdose, which does not make much sense

Why not linear regression?
We can also change the encoding to

$$y = \begin{cases} 1 & \text{if epileptic seizure} \\ 2 & \text{if stroke} \\ 3 & \text{if drug overdose} \end{cases}$$

This would imply a totally different relationship among the three conditions
• each of these codings would produce fundamentally different linear models…
• …that would ultimately lead to different sets of estimates on test observations

In general, there is no natural way to convert a qualitative response variable with more
than two levels into a quantitative response that is ready for linear regression

Why not linear regression?
With two levels, the situation is better. For instance, perhaps there are only two
possibilities for the patient’s medical condition: stroke and drug overdose

$$y = \begin{cases} 0 & \text{if stroke} \\ 1 & \text{if drug overdose} \end{cases}$$

We can fit a linear regression to this binary response, and classify as drug overdose if
$\hat{y} > 0.5$ and stroke otherwise, interpreting $\hat{y}$ as a probability of drug overdose

However, if we use linear regression, some of our estimates might be outside the [0, 1]
interval, which does not make sense as a probability. There is nothing that "saturates" the
output between 0 and 1; this is exactly what the logistic function (sigmoid), introduced next, provides.

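To make this concrete, here is a minimal sketch (not from the original slides, with made-up one-dimensional data) showing that an ordinary least-squares fit to a 0/1 response produces estimates below 0 and above 1 at the extremes:

% Sketch: least squares on a binary response (synthetic data)
phi = (1:10)';                   % a single regressor
y   = [0 0 0 0 0 1 1 1 1 1]';    % binary response
Phi = [ones(10,1) phi];          % add the intercept column
theta_ls = Phi \ y;              % least-squares fit
y_hat = Phi * theta_ls;          % "probability" estimates
disp([phi y y_hat])              % y_hat is below 0 for small phi and above 1 for large phi
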
Logistic regression
Purpose: estimate the probability that a set of input regressors 𝝋 ∈ ℝ^{d×1} belongs to one
of two classes 𝑦 ∈ {0, 1}

Define the linear combination quantity

$$a = \sum_{j=0}^{d-1} \varphi_j \cdot \theta_j = \boldsymbol{\varphi}^T \boldsymbol{\theta}$$

The function 𝑠(𝑎) is the logistic function (sigmoid):

$$s(a) = \frac{1}{1 + e^{-a}} = \frac{e^{a}}{1 + e^{a}}$$

• 𝑎 ≫ 0 ⇒ 𝑠(𝑎) ≈ 1
• 𝑎 ≪ 0 ⇒ 𝑠(𝑎) ≈ 0

[Figure: the logistic function, saturating at 0 and 1 and crossing 0.5 at 𝑎 = 0]

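A minimal MATLAB implementation of the logistic function (a helper of our own; the code later in these slides assumes a function with this name exists on the path, e.g. in a file sigmoid.m):

function s = sigmoid(a)
% SIGMOID  Logistic function s(a) = 1/(1 + exp(-a)), applied element-wise
    s = 1 ./ (1 + exp(-a));
end

For example, sigmoid(0) returns 0.5, while sigmoid(10) ≈ 1 and sigmoid(-10) ≈ 0, matching the saturation behaviour sketched above.
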
Logistic regression
Purpose: estimate the probability that a set of input regressors 𝝋 ∈ ℝ^{d×1} belongs to one
of two classes 𝑦 ∈ {0, 1}

$$P(y = 1 \mid \boldsymbol{\varphi}) = s(a) = s(\boldsymbol{\varphi}^T \boldsymbol{\theta}) = \frac{1}{1 + e^{-\boldsymbol{\varphi}^T \boldsymbol{\theta}}}$$

The output of 𝑠(𝝋ᵀ𝜽) is interpreted as a probability:

• 𝝋ᵀ𝜽 ≫ 0 ⇒ 𝑠(𝝋ᵀ𝜽) ≫ 0.5 ⇒ 𝑃(𝑦 = 1 ∣ 𝝋) ≈ 1 ⇒ 𝝋 is classified to class 1

• 𝝋ᵀ𝜽 ≪ 0 ⇒ 𝑠(𝝋ᵀ𝜽) ≪ 0.5 ⇒ 𝑃(𝑦 = 1 ∣ 𝝋) ≈ 0 ⇒ 𝝋 is classified to class 0

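As a small sketch of how this is evaluated for a single observation (numbers made up, using the sigmoid helper above):

% Probability that phi belongs to class 1, and the resulting class label
theta = [-1; 0.5; 0.5];          % example parameters (made up)
phi   = [1; 2; 3];               % one observation; the leading 1 multiplies the intercept theta_0
p1    = sigmoid(phi' * theta);   % P(y = 1 | phi), here about 0.82
y_hat = double(p1 >= 0.5);       % classify to class 1 when the probability is at least 0.5
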
Logistic regression cost function
Suppose we have at our disposal a dataset 𝒟 = {(𝝋(1), 𝑦(1)), …, (𝝋(𝑁), 𝑦(𝑁))}, where 𝝋(𝑖) ∈ ℝ^{d×1}
and 𝑦(𝑖) ∈ {0, 1}, 𝑖 = 1, …, 𝑁, i.i.d.

Estimate a logistic regression model

$$P(y(i) = 1 \mid \boldsymbol{\varphi}(i)) = \frac{1}{1 + e^{-\boldsymbol{\varphi}(i)^T \boldsymbol{\theta}}} \equiv \pi(i)$$

The logistic regression cost function 𝐽(𝜽) is defined as:

$$J(\boldsymbol{\theta}) = -\sum_{i=1}^{N} \Big[ y(i) \cdot \ln \pi(i) + \big(1 - y(i)\big) \cdot \ln\big(1 - \pi(i)\big) \Big]$$

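A vectorized sketch of evaluating J(𝜽) on a tiny synthetic dataset (data and parameters made up; Phi stacks the 𝝋(i)ᵀ as rows, and the sigmoid helper defined earlier is reused):

% Sketch: evaluating the logistic regression cost on a tiny synthetic dataset
Phi   = [1 2.0; 1 3.5; 1 5.0; 1 6.5];   % N = 4 observations, d = 2 (intercept + one regressor)
y     = [0; 0; 1; 1];                   % labels
theta = [-4; 1];                        % candidate parameter vector (made up)
pi_s  = sigmoid(Phi * theta);           % pi(i) = P(y(i) = 1 | phi(i)), i = 1..N
J     = -( y' * log(pi_s) + (1 - y)' * log(1 - pi_s) )   % scalar cost, here about 0.99
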
Logistic regression cost function
QUIZ: In the logistic regression cost function, where are the parameters 𝜽 that we
want to estimate?
$$J(\boldsymbol{\theta}) = -\sum_{i=1}^{N} \Big[ y(i) \cdot \ln \pi(i) + \big(1 - y(i)\big) \cdot \ln\big(1 - \pi(i)\big) \Big]$$

☐ In the 𝑦(𝑖) terms

☐ In the ln terms

☐ In the 𝜋(𝑖) terms
Logistic regression cost function
Cost function interpretation
Suppose there is only one datum 𝒟 = {(𝝋, 𝑦)}

$$\Rightarrow J(\boldsymbol{\theta}) = \begin{cases} -\ln \pi & \text{if } y = 1 \\ -\ln(1 - \pi) & \text{if } y = 0 \end{cases}$$

Case 𝑦 = 1
• 𝐽(𝜽) ≈ 0 if 𝑦 = 1 and 𝜋 ≈ 1
• 𝐽(𝜽) ≈ +∞ if 𝑦 = 1 and 𝜋 ≈ 0

[Figure: plot of 𝐽(𝜽) = −ln 𝜋 as a function of 𝜋]
Logistic regression cost function
Cost function interpretation
Suppose there is only one datum 𝒟 = {(𝝋, 𝑦)}

$$\Rightarrow J(\boldsymbol{\theta}) = \begin{cases} -\ln \pi & \text{if } y = 1 \\ -\ln(1 - \pi) & \text{if } y = 0 \end{cases}$$

Case 𝑦 = 0
• 𝐽(𝜽) ≈ 0 if 𝑦 = 0 and 𝜋 ≈ 0
• 𝐽(𝜽) ≈ +∞ if 𝑦 = 0 and 𝜋 ≈ 1

[Figure: plot of 𝐽(𝜽) = −ln(1 − 𝜋) as a function of 𝜋]

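A quick numerical illustration of both cases (probability values chosen arbitrarily):

% Cost contribution of a single datum for a few predicted probabilities pi
p = [0.99 0.5 0.01];          % predicted probabilities pi
cost_y1 = -log(p)             % if y = 1: about [0.01 0.69 4.61], small when pi is near 1
cost_y0 = -log(1 - p)         % if y = 0: about [4.61 0.69 0.01], small when pi is near 0
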
IN-DEPTH ANALYSIS
Computation of the minimum of 𝐽(𝜽)
We have to compute the gradient of 𝐽(𝜽) with respect to 𝜽 ∈ ℝ^{d×1}. First, compute the
derivative of $s(a) = \frac{1}{1 + e^{-a}}$:

$$\frac{\partial s(a)}{\partial a} = \frac{\partial}{\partial a}\big(1 + e^{-a}\big)^{-1} = -\big(1 + e^{-a}\big)^{-2}\cdot\big(-e^{-a}\big) = \frac{e^{-a}}{\big(1 + e^{-a}\big)^{2}}
= \frac{1}{1 + e^{-a}}\cdot\frac{1 + e^{-a} - 1}{1 + e^{-a}} = \frac{1}{1 + e^{-a}}\cdot\left(1 - \frac{1}{1 + e^{-a}}\right) = s(a)\cdot\big(1 - s(a)\big)$$

In the case where $a = \boldsymbol{\varphi}^T\boldsymbol{\theta}$, by the chain rule we have that

$$\underbrace{\nabla_{\boldsymbol{\theta}}\, s(\boldsymbol{\varphi}^T\boldsymbol{\theta})}_{d\times 1} = \underbrace{\boldsymbol{\varphi}}_{d\times 1}\cdot\underbrace{s(\boldsymbol{\varphi}^T\boldsymbol{\theta})}_{1\times 1}\cdot\underbrace{\big(1 - s(\boldsymbol{\varphi}^T\boldsymbol{\theta})\big)}_{1\times 1} = \boldsymbol{\varphi}\cdot\pi\cdot(1-\pi)$$

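A quick finite-difference check of the identity s'(a) = s(a)(1 − s(a)) (a sketch, reusing the sigmoid helper defined earlier):

% Numerical check of ds/da = s(a)*(1 - s(a))
a  = -3:0.5:3;
h  = 1e-6;
ds_numeric  = (sigmoid(a + h) - sigmoid(a - h)) / (2*h);   % central finite differences
ds_analytic = sigmoid(a) .* (1 - sigmoid(a));
max(abs(ds_numeric - ds_analytic))                         % close to zero (around 1e-10)
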
IN-DEPTH ANALYSIS
Computation of the minimum of 𝐽(𝜽)
We can now compute the gradient of 𝐽(𝜽), where

$$J(\boldsymbol{\theta}) = -\sum_{i=1}^{N} \Big[ y(i)\,\ln \pi(i) + \big(1 - y(i)\big)\ln\big(1 - \pi(i)\big) \Big], \qquad \pi(i) = \frac{1}{1 + e^{-\boldsymbol{\varphi}(i)^T\boldsymbol{\theta}}}$$

Using $\nabla_{\boldsymbol{\theta}}\,\pi(i) = \boldsymbol{\varphi}(i)\,\pi(i)\big(1 - \pi(i)\big)$ from the previous slide:

$$\begin{aligned}
\underbrace{\nabla_{\boldsymbol{\theta}} J(\boldsymbol{\theta})}_{d\times 1}
&= -\sum_{i=1}^{N}\left[ y(i)\,\frac{\nabla_{\boldsymbol{\theta}}\pi(i)}{\pi(i)} + \big(1-y(i)\big)\,\frac{-\nabla_{\boldsymbol{\theta}}\pi(i)}{1-\pi(i)} \right]
 = -\sum_{i=1}^{N}\left[ y(i)\,\frac{\boldsymbol{\varphi}(i)\,\pi(i)\big(1-\pi(i)\big)}{\pi(i)} + \big(1-y(i)\big)\,\frac{-\boldsymbol{\varphi}(i)\,\pi(i)\big(1-\pi(i)\big)}{1-\pi(i)} \right] \\
&= \sum_{i=1}^{N}\Big[ -y(i)\,\boldsymbol{\varphi}(i)\big(1-\pi(i)\big) - \big(1-y(i)\big)\big(-\boldsymbol{\varphi}(i)\,\pi(i)\big) \Big]
 = \sum_{i=1}^{N}\Big[ \boldsymbol{\varphi}(i)\cdot\big(-y(i) + y(i)\pi(i)\big) + \boldsymbol{\varphi}(i)\cdot\big(\pi(i) - y(i)\pi(i)\big) \Big] \\
&= \sum_{i=1}^{N} \boldsymbol{\varphi}(i)\cdot\big(-y(i) + y(i)\pi(i) - y(i)\pi(i) + \pi(i)\big)
 = \sum_{i=1}^{N} \underbrace{\boldsymbol{\varphi}(i)}_{d\times 1}\cdot\underbrace{\big(\pi(i) - y(i)\big)}_{1\times 1}
\end{aligned}$$

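In matrix form, if Phi is the N×d matrix whose i-th row is 𝝋(i)ᵀ, this gradient is Phiᵀ(π − y). A sketch that checks the analytic expression against finite differences (synthetic data, reusing the sigmoid helper):

% Analytic gradient Phi'*(pi_s - y) checked against finite differences
Phi   = [1 2.0; 1 3.5; 1 5.0; 1 6.5];   y = [0; 0; 1; 1];   theta = [-4; 1];
costJ = @(t) -( y' * log(sigmoid(Phi*t)) + (1 - y)' * log(1 - sigmoid(Phi*t)) );
grad  = Phi' * (sigmoid(Phi*theta) - y);              % analytic gradient, d x 1
h = 1e-6;   grad_num = zeros(size(theta));
for j = 1:numel(theta)
    e = zeros(size(theta));   e(j) = h;
    grad_num(j) = (costJ(theta + e) - costJ(theta - e)) / (2*h);   % finite differences
end
max(abs(grad - grad_num))                             % close to zero
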
Gradient descent
It can be shown that:
• The cost function 𝐽(𝜽) is convex and admits a unique minimum
• The equations obtained by setting ∇_𝜽 𝐽(𝜽) = 𝟎 are nonlinear in 𝜽, and it is not possible to
  find a solution in closed form
✓ For this reason, we need to resort to iterative optimization algorithms

Use gradient descent:

$$\underbrace{\hat{\boldsymbol{\theta}}(k+1)}_{d\times 1} = \underbrace{\hat{\boldsymbol{\theta}}(k)}_{d\times 1} - \underbrace{\alpha}_{1\times 1}\cdot\underbrace{\nabla J(\boldsymbol{\theta})\big|_{\boldsymbol{\theta}=\hat{\boldsymbol{\theta}}(k)}}_{d\times 1}, \qquad \alpha \in \mathbb{R}_{>0}: \text{learning rate}$$

Gradient descent
$$J(\boldsymbol{\theta}) = -\sum_{i=1}^{N} \Big[ y(i) \cdot \ln \pi(i) + \big(1 - y(i)\big) \cdot \ln\big(1 - \pi(i)\big) \Big]$$

Repeat {

$$\theta_0 := \theta_0 - \alpha \cdot \sum_{i=1}^{N} \big(\pi(i) - y(i)\big)$$

$$\theta_1 := \theta_1 - \alpha \cdot \sum_{i=1}^{N} \big(\pi(i) - y(i)\big)\cdot\varphi_1(i)$$

$$\vdots$$

$$\theta_{d-1} := \theta_{d-1} - \alpha \cdot \sum_{i=1}^{N} \big(\pi(i) - y(i)\big)\cdot\varphi_{d-1}(i)$$

}

(here 𝜑₀(i) = 1 is the intercept regressor; all components are updated simultaneously, i.e. in vector form 𝜽 := 𝜽 − α Σᵢ 𝝋(i)(π(i) − y(i)), as derived on the previous slide)

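A minimal vectorized gradient-descent loop implementing these updates (learning rate, iteration count and data are illustrative choices, not from the slides; sigmoid as defined earlier):

% Batch gradient descent for logistic regression (all theta_j updated at once)
Phi   = [1 2.0; 1 3.5; 1 3.0; 1 6.5];   % synthetic data with a column of ones for the intercept
y     = [0; 0; 1; 1];
theta = zeros(2, 1);                    % initial parameters
alpha = 0.05;                           % learning rate
for k = 1:10000
    pi_s  = sigmoid(Phi * theta);               % pi(i) at the current theta
    theta = theta - alpha * Phi' * (pi_s - y);  % theta <- theta - alpha * grad J(theta)
end
theta                                   % estimated parameters
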
Logistic regression recap
The logistic regression model, despite its name, is not used for regression, but for
classification

Once the model estimates the probability of a class, we can assign a point to that
class if the probability is above a threshold (usually 0.5)

The function that we are now trying to estimate is 𝑓(𝝋) = 𝑃(𝑦 = 1 ∣ 𝝋)

Logistic regression models 𝑓 with

$$s(\boldsymbol{\varphi}^T\boldsymbol{\theta}) = \frac{1}{1 + e^{-\boldsymbol{\varphi}^T\boldsymbol{\theta}}}$$

The point 𝝋 can then be classified to class 𝑦 = 1 if 𝑠(𝝋ᵀ𝜽) ≥ 0.5

Logistic regression recap
The classification boundary found by logistic regression is linear

In fact, classifying with the rule

𝑦 = 1 if 𝑠(𝝋ᵀ𝜽) ≥ 0.5

is the same as saying

𝑦 = 1 if 𝝋ᵀ𝜽 ≥ 0

[Figure: the linear classifier separating Cats from Dogs in the (weight [kg], height [cm]) plane]
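
For a model with two regressors plus an intercept, this linear boundary can be drawn explicitly: θ₀ + θ₁φ₁ + θ₂φ₂ = 0, i.e. φ₂ = −(θ₀ + θ₁φ₁)/θ₂. A sketch with made-up parameters:

% Decision boundary of a 2-regressor logistic model: theta0 + theta1*phi1 + theta2*phi2 = 0
theta = [-25; 0.2; 0.2];                          % made-up parameters [theta0; theta1; theta2]
phi1  = linspace(0, 60, 100);                     % weight axis
phi2  = -(theta(1) + theta(2)*phi1) / theta(3);   % height values on the boundary line
plot(phi1, phi2); xlabel('weight [kg]'); ylabel('height [cm]');
% points with [1 phi1 phi2]*theta >= 0 (one side of this line) are classified as y = 1
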
Students admissions classification
We want to estimate if a student will get admitted to a university given the results on two
exams (𝜑₁, 𝜑₂)

• The training set consists of 𝑁 = 100 students with 𝜑₁(𝑖), 𝜑₂(𝑖) and 𝑦(𝑖) ∈ {0, 1}, for 𝑖 = 1, …, 𝑁
• Φ ∈ ℝ^{100×3}
• 𝒚 ∈ ℝ^{100×1}
• 𝜽 ∈ ℝ^{3×1}

% Read data from file: one row per student, columns = exam 1 score, exam 2 score, admitted (0/1)
data = load('studentsdata.csv');
Phi = data(:, [1, 2]); y = data(:, 3);

% Set up the data matrix appropriately: add ones for the intercept term
[N, m] = size(Phi); d = m + 1;
Phi = [ones(N, 1) Phi];

% Initialize fitting parameters
initial_theta = zeros(d, 1);

% Cost and gradient at theta = initial_theta (sigmoid as defined earlier)
theta = initial_theta;
pi_s = sigmoid(Phi*theta);
J = -( y'*log(pi_s) + (1-y)'*log(1-pi_s) );
grad = Phi'*(pi_s - y);

Embed the computation of J and grad in a function and pass that function to an
optimization algorithm that iteratively computes the gradient

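One way to do that final step (a sketch: the file name costFunction.m is our choice, and fminunc requires the Optimization Toolbox; the gradient-descent loop shown earlier would work as an alternative):

% In a separate file costFunction.m:
function [J, grad] = costFunction(theta, Phi, y)
% COSTFUNCTION  Logistic regression cost and gradient (as computed above)
    pi_s = sigmoid(Phi * theta);
    J    = -( y' * log(pi_s) + (1 - y)' * log(1 - pi_s) );
    grad = Phi' * (pi_s - y);
end

% In the main script: let the optimizer drive the iterations
options = optimset('GradObj', 'on', 'MaxIter', 400);
[theta, J_min] = fminunc(@(t) costFunction(t, Phi, y), initial_theta, options);
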
The Framingham Heart Study
In the late 1940s, the U.S. Government set out to better understand cardiovascular disease

Plan: track a large cohort of initially healthy patients over time

The city of Framingham (MA) was selected as the site for the study in 1948
• Appropriate size
• Stable population
• Cooperative doctors and residents

A total of 5209 patients aged 30-59 were enrolled. They had to take a survey and an
exam every 2 years:
• Physical characteristics and behavioral characteristics
• Test results
The Framingham Heart Study
We will build models using the Framingham data to estimate and prevent heart disease

We will estimate the 10-year risk of Coronary Heart Disease (CHD)
• CHD is a disease of the blood vessels supplying the heart

Heart disease has been the leading cause of death worldwide since 1921:
• 7.3 million people died from CHD in 2008
• Since 1950, age-adjusted death rates have declined 60%
The Framingham Heart Study
Demographic risk factors
• male: sex of patient
• age: age in years at first examination
• education: some high school (1), high school (2), some college (3), college (4)

Behavioral risk factors
• currentSmoker: 0/1
• cigsPerDay: cigarettes per day

Medical history risk factors
• BPmeds: on blood pressure medication at time of first examination
• prevalentStroke: previously had a stroke
• prevalentHyp: currently hypertensive
• Diabetes: currently has diabetes
The Framingham Heart Study
Risk factors from first examination
• totChol: Total cholesterol (mg/dL)
• sysBP: Systolic blood pressure
• diaBP: Diastolic blood pressure
• BMI: Body Mass Index (kg/m# )
• heartRate: Heart rate (beats/minute)
• glucose: Blood glucose level (mg/dL)

Use logistic regression to estimate whether or not a patient experienced CHD within
10 years of first examination

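A sketch of how such a model could be fitted in MATLAB, assuming the Framingham data is available as a CSV file with the risk factors above as columns plus a 0/1 outcome column (the file name, the outcome column name TenYearCHD, the choice of regressors and the use of the Statistics and Machine Learning Toolbox's fitglm are all assumptions, not part of the slides):

% Sketch: 10-year CHD risk via logistic regression (column names assumed)
T   = readtable('framingham.csv');
mdl = fitglm(T, 'TenYearCHD ~ male + age + cigsPerDay + totChol + sysBP + glucose', ...
             'Distribution', 'binomial', 'Link', 'logit');
disp(mdl.Coefficients)              % estimated coefficients and their significance
p10   = predict(mdl, T);            % estimated 10-year CHD probabilities
y_hat = double(p10 >= 0.5);         % classify with a 0.5 threshold
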
The Framingham Heart Study

[Figure: most critical identified risk factors]
