05 Logistic Regression

CS 480/680

Introduction to Machine Learning


Lecture 5
Logistic Regression and Numerical Optimization
Kathryn Simone
24 September 2024

PAGE 1
Will a shelter cat get adopted within the next 30 days?

PAGE 2
Source: Humane Society of Kitchener Waterloo Stratford Perth (Accessed 21/09/2024)
The cat adoption dataset
Each row is a cat; the columns are its attributes, followed by the outcome (label).

Age (years)   Playfulness (a.u.)   Adopted?
0.3           5                    Yes
6             1                    No
1             9                    Yes
9             7                    Yes
0.2           3                    Yes

PAGE 3
Exploring the cat adoption dataset

PAGE 4
Knowledge of the chances of an event guides decision-making
Consider and compare:

Prediction A:
A cat will not get adopted within 30 days.
- Model has binary output
- Classification task

Prediction B:
The probability that a cat will get adopted
within 30 days is 5%.
- Model has continuous output
- Regression task used for classification
- Can prioritize efforts (marketing campaigns,
waived/adjusted fees, etc.) and justify decisions

PAGE 5
Key Questions
I. What is logistic regression?

II. How do we estimate the parameters?

III. How can we handle the multiclass case?

PAGE 6
How to model the probability of an outcome?

In linear regression, we assumed a hypothesis class of linear predictors, h(x) = 〈w,x〉 + b.

[Figure: linear regression fit to a binary outcome. Figure labels: “Probability” of Default, Outcome]

PAGE 8
Figure adapted from Introduction to Statistical Learning, Section 4.3
Hypothesis class for logistic regression

[Figure: sigmoid curve. Vertical axis: h(x), ticks at 0 and 0.5. Horizontal axis: 〈w,x〉, tick at 0]
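The curve in this figure can be reproduced with a few lines of code. The following is a minimal sketch, assuming the hypothesis is the logistic sigmoid applied to the linear predictor 〈w,x〉; the function names `sigmoid` and `h` and the example weights are illustrative choices, not taken from the slides.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Logistic sigmoid 1 / (1 + exp(-z)), arranged to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def h(w: Sequence[float], x: Sequence[float]) -> float:
    """Hypothesis: the sigmoid of the linear predictor <w, x>."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))


# Hypothetical weights; at <w, x> = 0 the output is exactly 0.5.
print(h([1.0, -2.0], [0.0, 0.0]))  # 0.5
```

The output always lies between 0 and 1, which is what allows it to be read as a probability.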

PAGE 9
Figure adapted from Understanding Machine Learning, Section 9.3
Recall: Perceptron and the class of halfspaces

PAGE 10
Compare to Perceptron and class of halfspaces
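One way to make the comparison concrete: thresholding the logistic output at 0.5 gives the same labels as the sign of 〈w,x〉, and a different threshold shifts the boundary to 〈w,x〉 = log(t / (1 - t)). The sketch below assumes labels in {-1, +1} for both predictors; the weights, points, and function names are hypothetical.

```python
import math
from typing import Sequence


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def halfspace_predict(w: Sequence[float], x: Sequence[float]) -> int:
    """Halfspace predictor: the sign of <w, x>."""
    return 1 if dot(w, x) >= 0 else -1


def logistic_predict(w: Sequence[float], x: Sequence[float], threshold: float = 0.5) -> int:
    """Threshold the sigmoid output; p >= t is equivalent to <w, x> >= log(t / (1 - t))."""
    p = 1.0 / (1.0 + math.exp(-dot(w, x)))
    return 1 if p >= threshold else -1


w = [1.0, -1.0]  # hypothetical weights
points = [[2.0, 0.5], [0.5, 2.0], [3.0, 1.0], [1.0, 1.5]]

# At threshold 0.5 the two predictors agree on these points.
same = all(halfspace_predict(w, x) == logistic_predict(w, x) for x in points)

# A stricter threshold moves the boundary: no point here reaches p >= 0.9.
shifted = [logistic_predict(w, x, threshold=0.9) for x in points]
print(same, shifted)
```

The difference is in what each model outputs before thresholding: the halfspace gives only a side of the boundary, while the logistic model also gives a graded confidence.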

PAGE 11
The logistic model for probability of an outcome
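Written out, the model can be stated as follows. This is a sketch in assumed notation (labels y in {0, 1}, sigmoid written as sigma, no separate bias term); the slide's own symbols may differ.

```latex
\begin{align*}
P(y = 1 \mid x; w) &= \sigma(\langle w, x \rangle) = \frac{1}{1 + e^{-\langle w, x \rangle}}, \\
P(y = 0 \mid x; w) &= 1 - \sigma(\langle w, x \rangle), \\
\log \frac{P(y = 1 \mid x; w)}{P(y = 0 \mid x; w)} &= \langle w, x \rangle .
\end{align*}
```

The last line says the log-odds are linear in x, which is the sense in which logistic regression remains a linear model.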

PAGE 12
Monotonicity contributes to interpretability
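A small numerical check may help here: because the sigmoid is monotone, a positive coefficient always raises the predicted probability, and a one-unit increase in a feature multiplies the odds by exp(coefficient). The coefficients below are hypothetical values chosen for illustration, not fitted to the cat adoption data.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def odds(p: float) -> float:
    """Odds in favour of the event: p / (1 - p)."""
    return p / (1.0 - p)


# Hypothetical coefficients (assumed for illustration only).
w_age, w_play, b = -0.4, 0.3, 0.5


def p_adopt(age: float, playfulness: float) -> float:
    return sigmoid(w_age * age + w_play * playfulness + b)


p1 = p_adopt(2.0, 5.0)
p2 = p_adopt(2.0, 6.0)  # same cat, playfulness raised by one unit

ratio = odds(p2) / odds(p1)
print(p2 > p1)                  # True: positive coefficient, higher probability
print(ratio, math.exp(w_play))  # the two values agree up to floating-point error
```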

Discussion: Logistic Regression in the Credit Industry


(2nd Order Solutions on medium.com)
PAGE 13
Key Questions
I. What is logistic regression?

II. How do we estimate the parameters?

III. How can we handle the multiclass case?

PAGE 14
Interpreting h(x) as a probability requires
a stochastic model of the outcome

PAGE 15
Recall and apply the Bernoulli random variable
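As a reminder, a Bernoulli random variable takes the value 1 with probability p and 0 otherwise. The statement below is a sketch in assumed notation, with the success probability tied to the model output as p = h_w(x).

```latex
\[
P(Y = y) = p^{\,y} (1 - p)^{1 - y}, \qquad y \in \{0, 1\}, \qquad p = h_w(x) = \sigma(\langle w, x \rangle).
\]
```

Setting y = 1 recovers p and setting y = 0 recovers 1 - p, so the single expression covers both outcomes.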

PAGE 16
Deriving the likelihood function starting with the Bernoulli RV
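A sketch of the step this slide names, assuming m independent training pairs (x_i, y_i) with y_i in {0, 1} and h_w(x) = sigma(<w, x>); the notation is assumed rather than copied from the slide.

```latex
\[
L(w) = \prod_{i=1}^{m} P(y_i \mid x_i; w)
     = \prod_{i=1}^{m} h_w(x_i)^{\,y_i} \bigl(1 - h_w(x_i)\bigr)^{1 - y_i}.
\]
```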

PAGE 17
Deriving the log-likelihood function for logistic regression (1/2)

Full derivation at the end of this deck, if interested

PAGE 18
Deriving the log-likelihood function for logistic regression (2/2)
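The end point of the derivation can be sketched as follows, assuming labels y_i in {0, 1}, m independent samples, and h_w(x) = sigma(<w, x>). The second line uses log sigma(z) = z - log(1 + e^z) and log(1 - sigma(z)) = -log(1 + e^z).

```latex
\begin{align*}
\ell(w) = \log L(w)
  &= \sum_{i=1}^{m} \Bigl[ y_i \log h_w(x_i) + (1 - y_i) \log\bigl(1 - h_w(x_i)\bigr) \Bigr] \\
  &= \sum_{i=1}^{m} \Bigl[ y_i \langle w, x_i \rangle - \log\bigl(1 + e^{\langle w, x_i \rangle}\bigr) \Bigr].
\end{align*}
```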

PAGE 19
The logistic regression objective and cross-entropy loss
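The objective can be sketched in code as the mean negative log-likelihood. This assumes labels in {0, 1} and averaging over the samples (a sum would give the same minimizer); the leading 1 in each row is an assumed bias feature, and the rows reuse the five cats from the dataset slide.

```python
import math
from typing import Sequence

# Rows: [1 (assumed bias feature), age in years, playfulness]; label 1 = adopted.
X = [[1.0, 0.3, 5.0], [1.0, 6.0, 1.0], [1.0, 1.0, 9.0], [1.0, 9.0, 7.0], [1.0, 0.2, 3.0]]
y = [1, 0, 1, 1, 1]


def cross_entropy(w: Sequence[float], X: Sequence[Sequence[float]], y: Sequence[int]) -> float:
    """Mean of -[y log h + (1 - y) log(1 - h)], with h = sigmoid(<w, x>)."""
    total = 0.0
    for xi, yi in zip(X, y):
        z = sum(wj * xj for wj, xj in zip(w, xi))
        # log(1 + exp(z)) computed stably as max(z, 0) + log1p(exp(-|z|))
        softplus = max(z, 0.0) + math.log1p(math.exp(-abs(z)))
        total += softplus - yi * z
    return total / len(y)


# With w = 0 every prediction is 0.5, so the loss is log(2), about 0.693.
print(cross_entropy([0.0, 0.0, 0.0], X, y))
```

Writing the per-sample loss as softplus(z) - y z avoids taking the logarithm of a probability that has rounded to 0 or 1.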

PAGE 20
Proof of convexity: Probabilistic Machine Learning, Section 10.2.3.4
Gradient descent for numerical optimization
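A minimal sketch of batch gradient descent on the mean cross-entropy loss, under stated assumptions: labels in {0, 1}, an assumed bias feature of 1 in each row, a fixed step size, and a fixed number of iterations. The five rows are the cats from the dataset slide; they are linearly separable, so the weights would keep growing if the loop ran indefinitely, and the run is for illustration only.

```python
import math
from typing import List, Sequence

# Rows: [1 (assumed bias feature), age in years, playfulness]; label 1 = adopted.
X = [[1.0, 0.3, 5.0], [1.0, 6.0, 1.0], [1.0, 1.0, 9.0], [1.0, 9.0, 7.0], [1.0, 0.2, 3.0]]
y = [1, 0, 1, 1, 1]


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wj * xj for wj, xj in zip(w, x))


def sigmoid(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def loss(w: Sequence[float]) -> float:
    """Mean cross-entropy, using a stable form of log(1 + exp(z))."""
    total = 0.0
    for xi, yi in zip(X, y):
        z = dot(w, xi)
        total += max(z, 0.0) + math.log1p(math.exp(-abs(z))) - yi * z
    return total / len(y)


def gradient(w: Sequence[float]) -> List[float]:
    """Gradient of the mean loss: (1/m) * sum_i (sigmoid(<w, x_i>) - y_i) * x_i."""
    g = [0.0] * len(w)
    for xi, yi in zip(X, y):
        err = sigmoid(dot(w, xi)) - yi
        for j, xij in enumerate(xi):
            g[j] += err * xij / len(y)
    return g


def gradient_descent(w0: Sequence[float], step: float = 0.05, iters: int = 200) -> List[float]:
    w = list(w0)
    for _ in range(iters):
        g = gradient(w)
        w = [wj - step * gj for wj, gj in zip(w, g)]  # move against the gradient
    return w


w_start = [0.0, 0.0, 0.0]
w_fit = gradient_descent(w_start)
print(loss(w_start), loss(w_fit))  # the second value is smaller
```

The step size of 0.05 is an assumed value, small enough for this data that each step reduces the loss; in practice it is tuned or chosen by a line search.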

PAGE 21
Understanding Machine Learning, Section 14.1, 14.3
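The stochastic variant replaces the full gradient with the gradient of a single randomly chosen example at each step. The sketch below assumes labels in {0, 1}, an assumed bias feature, a constant step size, and uniform sampling with a fixed seed; these choices are illustrative, not taken from the slides.

```python
import math
import random
from typing import List, Sequence

# Rows: [1 (assumed bias feature), age in years, playfulness]; label 1 = adopted.
X = [[1.0, 0.3, 5.0], [1.0, 6.0, 1.0], [1.0, 1.0, 9.0], [1.0, 9.0, 7.0], [1.0, 0.2, 3.0]]
y = [1, 0, 1, 1, 1]


def sigmoid(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def mean_loss(w: Sequence[float]) -> float:
    total = 0.0
    for xi, yi in zip(X, y):
        z = sum(wj * xj for wj, xj in zip(w, xi))
        total += max(z, 0.0) + math.log1p(math.exp(-abs(z))) - yi * z
    return total / len(y)


def sgd(w0: Sequence[float], step: float = 0.01, iters: int = 3000, seed: int = 0) -> List[float]:
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    w = list(w0)
    for _ in range(iters):
        i = rng.randrange(len(y))  # pick one training example uniformly at random
        xi, yi = X[i], y[i]
        err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi
        # Single-example gradient is err * x_i; step against it.
        w = [wj - step * err * xj for wj, xj in zip(w, xi)]
    return w


w_sgd = sgd([0.0, 0.0, 0.0])
print(mean_loss([0.0, 0.0, 0.0]), mean_loss(w_sgd))
```

Each update costs one example instead of the whole dataset, at the price of a noisier path toward the minimum.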
Another approach: Newton’s method

[Figure: successive iterates w0, w1, w2]
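The iterates in the figure can be reproduced with a one-dimensional sketch of Newton's method for finding a root of a function f, assuming the derivative is available in closed form. The function name `newton_root` and the test function w^2 - 2 are illustrative choices.

```python
from typing import Callable


def newton_root(
    f: Callable[[float], float],
    df: Callable[[float], float],
    w0: float,
    tol: float = 1e-12,
    max_iter: int = 50,
) -> float:
    """Repeat w <- w - f(w) / f'(w) until the step is smaller than tol."""
    w = w0
    for _ in range(max_iter):
        step = f(w) / df(w)
        w -= step
        if abs(step) < tol:
            break
    return w


# Example: the positive root of f(w) = w^2 - 2 is sqrt(2).
root = newton_root(lambda w: w * w - 2.0, lambda w: 2.0 * w, w0=1.0)
print(root)
```

Starting from w0 = 1 the iterates are 1.5, then about 1.4167, and they settle on the square root of 2 within a handful of steps.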

PAGE 22
Deriving the update for Newton’s method

[Figure: tangent to f at w0. Labels: f(w0), w0, w1, Δ0]
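The update can be sketched from the tangent-line picture: approximate f near w_0 by its tangent and take the point where the tangent crosses zero as the next iterate. Reading the figure's label as the step between w_0 and w_1 is an assumption about the slide.

```latex
\begin{align*}
f(w) &\approx f(w_0) + f'(w_0)\,(w - w_0), \\
0 &= f(w_0) + f'(w_0)\,(w_1 - w_0)
  \quad\Longrightarrow\quad
  w_1 = w_0 - \frac{f(w_0)}{f'(w_0)}, \\
\Delta_0 &= w_1 - w_0 = -\frac{f(w_0)}{f'(w_0)}, \qquad
w_{t+1} = w_t - \frac{f(w_t)}{f'(w_t)} .
\end{align*}
```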

PAGE 23
Application of Newton’s method to loss function minimization
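To minimize a loss, Newton's method is applied to the gradient: the update becomes w <- w - H^{-1} g, where g is the gradient and H the Hessian of the loss. The sketch below does this for logistic regression with one feature plus an intercept, so the 2x2 system can be solved in closed form. The data are hypothetical and deliberately not separable, so a finite minimizer exists.

```python
import math
from typing import Tuple

# Hypothetical 1-D data (assumed for illustration); the classes overlap.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
ys = [0, 0, 1, 0, 1, 1]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def grad_and_hessian(b: float, a: float) -> Tuple[Tuple[float, float], Tuple[float, float, float]]:
    """Gradient and Hessian of the mean cross-entropy for p = sigmoid(b + a * x)."""
    m = len(xs)
    g0 = g1 = h00 = h01 = h11 = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b + a * x)
        r = p - y            # residual, appears in the gradient
        s = p * (1.0 - p)    # curvature weight, appears in the Hessian
        g0 += r / m
        g1 += r * x / m
        h00 += s / m
        h01 += s * x / m
        h11 += s * x * x / m
    return (g0, g1), (h00, h01, h11)


def newton(b: float = 0.0, a: float = 0.0, iters: int = 10) -> Tuple[float, float]:
    for _ in range(iters):
        (g0, g1), (h00, h01, h11) = grad_and_hessian(b, a)
        det = h00 * h11 - h01 * h01
        # Solve H d = g using the closed-form inverse of a symmetric 2x2 matrix.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b, a = b - d0, a - d1
    return b, a


b_hat, a_hat = newton()
(g0, g1), _ = grad_and_hessian(b_hat, a_hat)
print(b_hat, a_hat, g0, g1)  # the gradient is numerically zero at the solution
```

Compared with gradient descent, there is no step size to choose and far fewer iterations are needed, but each iteration requires forming and solving with the Hessian.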

PAGE 24
Key Questions
I. What is logistic regression?

II. How do we estimate the parameters?

III. How can we handle the multiclass case?

PAGE 25
Generalizing to the multiclass setting
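One common way to write the generalization, sketched here in assumed notation: with K classes and one weight vector w_k per class, the class probabilities are given by the softmax of the K linear predictors.

```latex
\[
P(y = k \mid x; w_1, \dots, w_K)
  = \frac{e^{\langle w_k, x \rangle}}{\sum_{j=1}^{K} e^{\langle w_j, x \rangle}},
  \qquad k = 1, \dots, K .
\]
```

For K = 2 this reduces to the sigmoid model with w = w_1 - w_2, since dividing numerator and denominator by the first exponential leaves 1 / (1 + e^{-<w_1 - w_2, x>}).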

PAGE 26
Figures: scikit-learn
Architectural interpretation of logistic regression

PAGE 27
Logistic regression

PAGE 28
Multinomial regression

k=1 k=2 k=3

PAGE 29
Multinomial regression

Output Layer: 0.2 3.0 1.5

Probabilities (%): 5 78 17
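The step from output-layer values to probabilities can be checked numerically, assuming the mapping is the standard softmax with base e. Subtracting the maximum before exponentiating is a common stability trick and does not change the result; the function name `softmax` is an illustrative choice.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Exponentiate and normalize; the max is subtracted first for numerical stability."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([0.2, 3.0, 1.5])
print([round(100 * p, 1) for p in probs])  # approximately [4.7, 77.9, 17.4]
```

These values are close to the percentages listed on the slide, and they sum to 100% by construction.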

PAGE 31
Now that we’re at the end of the lecture,
you should be able to…
★ Recommend and justify application of logistic regression in appropriate real-world
scenarios, as an alternative to linear regression and binary classification.
★ Explain the logistic regression hypothesis class using correct terminology,
including conditional probability, sigmoid function, and linear predictor.
★ Sketch the decision boundary of a logistic regression predictor in a
low-dimensional setting for different thresholds and parameters.
★ Defend the cross-entropy loss function used in logistic regression.
★ Explain the parametrization and hypothesis class of multinomial regression with
reference to the softmax function.
★ Implement and apply iterative optimization algorithms including gradient
descent, stochastic gradient descent, and the Newton-Raphson method.
★ Interpret the meaning of coefficients of a learned logistic regression model.

PAGE 32
PAGE 33