05 Logistic Regression
Will a shelter cat get adopted within the next 30 days?
PAGE 2
Source: Humane Society of Kitchener Waterloo Stratford Perth (Accessed 21/09/2024)
The cat adoption dataset
Columns: attributes, then outcome/label. Rows: cats.
Attribute 1 | Attribute 2 | Outcome/Label
0.3 | 5 | Yes
6 | 1 | No
1 | 9 | Yes
9 | 7 | Yes
0.2 | 3 | Yes
PAGE 3
Exploring the cat adoption dataset
PAGE 4
Knowledge of the chances of an event guides decision-making
Consider and compare:
Prediction A:
A cat will not get adopted within 30 days.
- Model has binary output
- Classification task
Prediction B:
The probability that a cat will get adopted
within 30 days is 5%.
- Model has continuous output
- Regression task used for classification
- Can prioritize efforts (marketing campaigns,
waived/adjusted fees, etc.) and justify decisions
PAGE 5
Key Questions
I. What is logistic regression?
PAGE 6
How to model the probability of an outcome?
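In linear regression, we assumed a hypothesis class of the form:
h(x) = 〈w,x〉 (with any bias term absorbed into w)
[Figure: linear fit to a binary outcome; axis labels: “Probability” of Default, Outcome]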
PAGE 8
Figure adapted from An Introduction to Statistical Learning, Section 4.3
Hypothesis class for logistic regression
[Figure: h(x) plotted against 〈w,x〉; tick marks at h(x) = 0.5 and 〈w,x〉 = 0]
PAGE 9
Figure adapted from Understanding Machine Learning, Section 9.3
Recall: Perceptron and the class of halfspaces
PAGE 10
Compare to Perceptron and class of halfspaces
PAGE 11
The logistic model for probability of an outcome
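A minimal sketch of the logistic model, assuming the hypothesis takes the standard form h(x) = sigmoid(〈w,x〉) and is read as an estimate of P(y = 1 | x). The weights and inputs below are hypothetical values chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real z into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(w, X):
    """Estimated P(y = 1 | x) for each row x of X, i.e. sigmoid(<w, x>)."""
    return sigmoid(X @ w)

# Hypothetical weights and inputs (not taken from the lecture).
w = np.array([0.5, -1.0])
X = np.array([[0.0, 0.0],
              [2.0, 3.0],
              [4.0, 0.5]])

probs = predict_proba(w, X)

# Thresholding the probability at 0.5 gives the same labels as the
# halfspace rule <w, x> >= 0, because sigmoid(0) = 0.5 and sigmoid is increasing.
labels = (probs >= 0.5).astype(int)
```

Choosing a threshold other than 0.5 shifts the decision boundary to a parallel hyperplane 〈w,x〉 = c, where c is the log-odds of the chosen threshold.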
PAGE 12
Monotonicity contributes to interpretability
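One way to make the monotonicity argument concrete, assuming the sigmoid form of the hypothesis, is to invert the sigmoid and look at the log-odds:

```latex
\begin{align*}
h_w(x) &= \sigma(\langle w, x\rangle) = \frac{1}{1 + e^{-\langle w, x\rangle}}, \\
\log\frac{h_w(x)}{1 - h_w(x)} &= \langle w, x\rangle .
\end{align*}
```

Because the sigmoid is strictly increasing, raising feature x_j by one unit (all else fixed) adds w_j to the log-odds and multiplies the odds by e^{w_j}. The sign of w_j therefore gives the direction in which the predicted probability moves.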
PAGE 14
Interpreting h(x) as a probability requires
a stochastic model of the outcome
PAGE 15
Recall and apply the Bernoulli random variable
PAGE 16
Deriving the likelihood function starting with the Bernoulli RV
PAGE 17
Deriving the log-likelihood function for logistic regression (1/2)
PAGE 18
Deriving the log-likelihood function for logistic regression (2/2)
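A sketch of the derivation in compact form, assuming labels y_i in {0, 1}, m independent samples, and h_w(x) = σ(〈w,x〉) interpreted as P(y = 1 | x). The symbols m and the script L for the likelihood are notation chosen here, not necessarily the slides' own.

```latex
\begin{align*}
P(y \mid x; w) &= h_w(x)^{y}\,\bigl(1 - h_w(x)\bigr)^{1-y}, \qquad y \in \{0, 1\}, \\
\mathcal{L}(w) &= \prod_{i=1}^{m} h_w(x_i)^{y_i}\,\bigl(1 - h_w(x_i)\bigr)^{1-y_i}, \\
\ell(w) = \log \mathcal{L}(w) &= \sum_{i=1}^{m} \Bigl[\, y_i \log h_w(x_i) + (1 - y_i)\log\bigl(1 - h_w(x_i)\bigr) \Bigr].
\end{align*}
```

The first line is the Bernoulli probability mass function with parameter h_w(x); the product follows from independence, and the logarithm turns the product into a sum.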
PAGE 19
The logistic regression objective and cross-entropy loss
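A minimal sketch of the cross-entropy loss as the negative average log-likelihood, assuming labels in {0, 1} and h(x) = sigmoid(〈w,x〉). The function names and the toy data are illustrative assumptions. The stable form uses the identity -[y log σ(z) + (1 - y) log(1 - σ(z))] = log(1 + e^z) - y z.

```python
import numpy as np

def cross_entropy_naive(w, X, y):
    """Direct transcription: -mean(y*log(h) + (1-y)*log(1-h))."""
    h = 1.0 / (1.0 + np.exp(-(X @ w)))
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

def cross_entropy(w, X, y):
    """Numerically stable equivalent: mean(log(1 + exp(z)) - y*z), z = Xw."""
    z = X @ w
    return np.mean(np.logaddexp(0.0, z) - y * z)

# Hypothetical toy data: last column is a constant 1 acting as a bias feature.
X_toy = np.array([[0.5, 1.0],
                  [2.0, 1.0],
                  [-1.0, 1.0]])
y_toy = np.array([1.0, 0.0, 1.0])

loss_at_zero = cross_entropy(np.zeros(2), X_toy, y_toy)   # every h = 0.5
w_toy = np.array([-0.7, 0.3])
loss_stable = cross_entropy(w_toy, X_toy, y_toy)
loss_naive = cross_entropy_naive(w_toy, X_toy, y_toy)
```

With w = 0 every predicted probability is 0.5, so the loss equals log 2 regardless of the labels, which is a handy sanity check for an implementation.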
PAGE 20
Proof of convexity: Probabilistic Machine Learning, Section 10.2.3.4
Gradient descent for numerical optimization
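A minimal sketch of batch gradient descent on the cross-entropy loss, assuming labels in {0, 1} so that the gradient is Xᵀ(sigmoid(Xw) - y) / m. The data reuse the five rows of the cat adoption table with an added constant bias column; the step size and iteration count are illustrative choices, not values from the lecture.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy(w, X, y):
    z = X @ w
    return np.mean(np.logaddexp(0.0, z) - y * z)

def gradient(w, X, y):
    """Gradient of the mean cross-entropy loss with respect to w."""
    return X.T @ (sigmoid(X @ w) - y) / len(y)

def gradient_descent(X, y, step=0.05, n_iters=200):
    """Fixed-step gradient descent starting from w = 0; returns w and loss history."""
    w = np.zeros(X.shape[1])
    losses = [cross_entropy(w, X, y)]
    for _ in range(n_iters):
        w = w - step * gradient(w, X, y)
        losses.append(cross_entropy(w, X, y))
    return w, losses

# Two attributes per cat plus a constant 1 for the bias; Yes -> 1, No -> 0.
X = np.array([[0.3, 5.0, 1.0],
              [6.0, 1.0, 1.0],
              [1.0, 9.0, 1.0],
              [9.0, 7.0, 1.0],
              [0.2, 3.0, 1.0]])
y = np.array([1.0, 0.0, 1.0, 1.0, 1.0])

w, losses = gradient_descent(X, y)
```

The step size is kept small relative to the scale of the features so that each iteration lowers the loss. Stochastic gradient descent would replace the full-batch gradient with the gradient on one sample (or a mini-batch) per update.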
PAGE 21
Understanding Machine Learning, Section 14.1, 14.3
Another approach: Newton’s method
[Figure: successive Newton iterates w0, w1, w2]
PAGE 22
Deriving the update for Newton’s method
[Figure: one Newton step from w0 to w1, showing f(w0) and the step Δ0]
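A sketch of the one-dimensional update, assuming the figure shows the usual root-finding picture: approximate f by its tangent line at w_0 and step to where that line crosses zero.

```latex
\begin{align*}
0 &= f(w_0) + f'(w_0)\,\Delta_0 , \\
\Delta_0 &= -\frac{f(w_0)}{f'(w_0)} , \\
w_1 &= w_0 + \Delta_0 = w_0 - \frac{f(w_0)}{f'(w_0)} .
\end{align*}
```

Repeating the same step from w_1 gives w_2, and so on. The update requires f'(w_t) to be nonzero at each iterate.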
PAGE 23
Application of Newton’s method to loss function minimization
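For minimization, Newton's method is applied to the stationarity condition ∇L(w) = 0, which gives the update w ← w - H⁻¹∇L(w) with H the Hessian of the loss. Below is a minimal sketch for the mean cross-entropy loss, assuming labels in {0, 1}; the small dataset is hypothetical and deliberately not linearly separable, so that a finite minimizer exists.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def newton_logistic(X, y, n_iters=10):
    """Newton's method for logistic regression, starting from w = 0."""
    m, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        p = sigmoid(X @ w)
        grad = X.T @ (p - y) / m                      # gradient of the loss
        hess = (X * (p * (1 - p))[:, None]).T @ X / m  # X^T diag(p(1-p)) X / m
        w = w - np.linalg.solve(hess, grad)            # solve H d = grad, step by -d
    return w

# Hypothetical 1-D data with a bias column; labels overlap around x = 2..3.
X_nt = np.array([[1.0, 0.0],
                 [1.0, 1.0],
                 [1.0, 2.0],
                 [1.0, 3.0],
                 [1.0, 4.0],
                 [1.0, 5.0]])
y_nt = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])

w_nt = newton_logistic(X_nt, y_nt)
final_grad = X_nt.T @ (sigmoid(X_nt @ w_nt) - y_nt) / len(y_nt)
```

Solving the linear system with `np.linalg.solve` avoids forming the inverse Hessian explicitly. On linearly separable data the weights grow without bound and the Hessian becomes ill-conditioned, so a regularizer or step-size control is usually added in practice.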
PAGE 24
Key Questions
I. What is logistic regression?
PAGE 25
Generalizing to the multiclass setting
PAGE 26
Figures: scikit-learn
Architectural interpretation of logistic regression
PAGE 27
Logistic regression
PAGE 28
Multinomial regression
PAGE 29
Multinomial regression
PAGE 30
Multinomial regression
Probabilities (%): 5 77 17
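A minimal sketch of how multinomial regression turns one linear score per class into class probabilities with the softmax function. The weight matrix and input are hypothetical, so the output is not meant to reproduce the percentages shown on the slide.

```python
import numpy as np

def softmax(z):
    """Softmax over the last axis; subtracting the max avoids overflow in exp."""
    z = np.asarray(z, dtype=float)
    shifted = z - np.max(z, axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=-1, keepdims=True)

# Hypothetical parameters: one weight vector (row) per class, 3 classes, 2 features.
W = np.array([[0.2, -0.5],
              [1.0, 0.3],
              [-0.4, 0.8]])
x = np.array([1.5, 0.5])

scores = W @ x                 # one linear score <w_k, x> per class
probs = softmax(scores)        # non-negative, sums to 1
predicted_class = int(np.argmax(probs))

# With two classes and scores (z, 0), softmax reduces to the sigmoid of z.
two_class = softmax(np.array([1.2, 0.0]))
sigmoid_of_z = 1.0 / (1.0 + np.exp(-1.2))
```

The last two lines illustrate that binary logistic regression is the two-class special case of this parametrization.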
PAGE 31
Now that we’re at the end of the lecture,
you should be able to…
★ Recommend and justify application of logistic regression in appropriate real-world
scenarios, as an alternative to linear regression and binary classification.
★ Explain the logistic regression hypothesis class using correct terminology,
including conditional probability, sigmoid function, and linear predictor.
★ Sketch the decision boundary of a logistic regression predictor in a
low-dimensional setting for different thresholds and parameters.
★ Defend the cross-entropy loss function used in logistic regression.
★ Explain the parametrization and hypothesis class of multinomial regression with
reference to the softmax function.
★ Implement and apply iterative optimization algorithms including gradient
descent, stochastic gradient descent, and the Newton-Raphson method.
★ Interpret the meaning of coefficients of a learned logistic regression model.
PAGE 32