practicalMachineLearning_lecture3
Daniel Andrade
Check Updated Schedule of Presentations
• Check whether your name is listed in the schedule of the following pages.
• Check the day of your presentation and start preparing.
Schedule of Presentations (1/2)
• Lecture 4 (December 12th):
• Section 6.2 in “An Introduction to Statistical Learning”
Students in charge: m242663, m242232
Either as a team, or split the content at Subsection “Comparing the Lasso and Ridge Regression”, page 245 (book page, not PDF page number)
• Lecture 9 (January 9th):
https://d2l.ai/chapter_attention-mechanisms-and-transformers/index.html Section 11
Students in charge: m245073, m232259, m242619, m235482
Either as a team or separate content: 11.1, 11.2 and 11.3, 11.4 and 11.5, 11.6 and 11.7~11.9
Recall from Lecture 1
• Goal: find the parameter θ that minimizes the expected loss

𝔼[ℓ(f_θ(X), Y)] ,

where the expectation is with respect to p(y, x), i.e. the joint density of Y and X, and ℓ(ŷ, y) is some loss function, where ŷ is the prediction of our model and y is the true value.
• Gradient descent update:

θ^(t+1) := θ^(t) − η ( ∂/∂θ 𝔼[ℓ(f_θ^(t)(X), Y)] ) ,

where θ^(t) is the parameter θ at step t, and η is called the learning rate.
θ^(0) is set to some random value.
If η is small enough, then (in most situations) θ^(t) converges to a stationary point. (Hopefully to a good local minimum. If the objective function is convex, then to a global minimum.)
(*) A few remaining parameters nevertheless need to be set manually; these parameters are often called hyper-parameters.
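To make the update rule concrete, here is a minimal sketch of vanilla gradient descent on a toy one-dimensional objective. The objective g(θ) = (θ − 3)², the starting value, the learning rate, and the number of steps are all arbitrary choices for this illustration:

import torch

# minimize g(theta) = (theta - 3)^2, whose unique (global) minimum is at theta = 3
theta = torch.tensor(0.0, requires_grad=True)   # theta^(0): some starting value
eta = 0.1                                       # learning rate (a hyper-parameter)

for t in range(100):
    loss = (theta - 3.0) ** 2      # objective evaluated at theta^(t)
    loss.backward()                # compute d loss / d theta
    with torch.no_grad():
        theta -= eta * theta.grad  # theta^(t+1) := theta^(t) - eta * gradient
    theta.grad.zero_()             # reset the gradient for the next step

print(theta.item())                # close to 3.0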
Expectations are estimated using data D
• In general, we do not know p(y, x).
• But assuming D = {(y1, x1), (y2, x2), …, (yn, xn)} are iid samples from p(y, x),
we have the following unbiased estimates:
𝔼[ℓ(f_θ(X), Y)] ≈ (1/n) ∑_{i=1}^{n} ℓ(f_θ(x_i), y_i) , and

∂/∂θ 𝔼[ℓ(f_θ(X), Y)] ≈ (1/n) ∑_{i=1}^{n} ∂/∂θ ℓ(f_θ(x_i), y_i) .
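As a small illustration of these estimates, the sketch below creates a hypothetical data set (the one-parameter model f_θ(x) = θ·x and all numbers are made up for the example) and lets autograd produce the empirical gradient estimate:

import torch

torch.manual_seed(0)

# hypothetical iid data set D = {(y_i, x_i)}, i = 1..n
n = 100
x = torch.randn(n)
y = 2.0 * x + 0.1 * torch.randn(n)              # noisy targets

theta = torch.tensor(0.5, requires_grad=True)   # toy model: f_theta(x) = theta * x

# empirical estimate of E[l(f_theta(X), Y)] with the squared loss
loss = ((theta * x - y) ** 2).mean()            # (1/n) sum_i l(f_theta(x_i), y_i)

# empirical estimate of d/dtheta E[l(f_theta(X), Y)]
loss.backward()
print(loss.item(), theta.grad.item())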
Commonly used loss functions for training
• For regression, the mean squared error (MSE) loss ℓ(ŷ, y) = (ŷ − y)^2 can also be used for training.
• For classification, the 0-1 loss cannot be used, since the gradient with respect to θ is 0 almost everywhere. Instead, a popular surrogate loss is the cross-entropy (CE) loss (*):

ℓ(p, y) = − log p_y ,

where y ∈ {1, 2, …, k} is the true label and the vector p = (p_1, p_2, …, p_k) contains in position j the predicted probability of class j.
(*) Strictly speaking, the CE loss, as defined e.g. in PyTorch, uses the logits as input.
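A small sketch of this definition, using made-up logits for a single example with k = 3 classes, and a comparison against PyTorch's logit-based CE loss:

import torch
import torch.nn.functional as F

z = torch.tensor([[1.0, 0.5, 2.0]])   # hypothetical logits, shape (1, k) with k = 3
y = torch.tensor([2])                 # true label (0-based here, as in PyTorch)

p = torch.softmax(z, dim=1)           # predicted class probabilities p = (p_1, ..., p_k)
ce_by_hand = -torch.log(p[0, y[0]])   # l(p, y) = -log p_y

ce_pytorch = F.cross_entropy(z, y)    # PyTorch's CE loss takes the logits z, not p
print(ce_by_hand.item(), ce_pytorch.item())   # the two values agree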
Gradient of 0-1 Loss with respect to θ is 0 almost everywhere
[Figure: the 0-1 loss I(ŷ_θ ≠ y) plotted as a function of θ; the curve is piecewise constant, so its gradient with respect to θ is 0 almost everywhere.]
Equivalence to Maximum Likelihood Estimation - Regression

argmin_θ (1/n) ∑_{i=1}^{n} (f_θ(x_i) − y_i)^2 = argmax_θ ∏_{i=1}^{n} N(y_i | f_θ(x_i), σ^2)

i.e. minimizing the MSE loss is equivalent to maximizing the likelihood of the data under a Gaussian model N(y | f_θ(x), σ^2) with fixed variance σ^2.
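One way to see this equivalence numerically: the Gaussian log-likelihood is a decreasing affine function of the MSE, so both sides have the same optimizer in θ. The sketch below checks this identity on made-up predictions and targets (all values, including σ² = 1, are arbitrary):

import math
import torch

torch.manual_seed(0)

pred = torch.randn(5)       # hypothetical predictions f_theta(x_i)
y = torch.randn(5)          # hypothetical targets y_i
sigma2 = 1.0                # fixed noise variance sigma^2
n = y.numel()

mse = ((pred - y) ** 2).mean()

# sum_i log N(y_i | f_theta(x_i), sigma^2)
log_lik = torch.distributions.Normal(pred, sigma2 ** 0.5).log_prob(y).sum()

# identity: log-likelihood = -(n/2) log(2*pi*sigma^2) - n/(2*sigma^2) * MSE
print(log_lik.item())
print((-(n / 2) * math.log(2 * math.pi * sigma2) - n / (2 * sigma2) * mse).item())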
Equivalence to Maximum Likelihood Estimation - Classification
• Using the likelihood function Cat(y | p_1(x), p_2(x), …, p_k(x)), we have the following equivalence

argmin_θ (1/n) ∑_{i=1}^{n} − log p_{θ,y_i}(x_i) = argmax_θ ∏_{i=1}^{n} p_{θ,y_i}(x_i)

(left-hand side: minimizing the CE loss; right-hand side: maximizing the likelihood of the data)

Recall that

softmax(z_1, z_2, …, z_k)_j := e^{z_j} / ∑_{l=1}^{k} e^{z_l}
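A short numerical sketch of this relation, with made-up logits and labels: the mean CE loss equals −(1/n) times the log of the product of the predicted probabilities of the true labels, so minimizing one maximizes the other.

import torch

torch.manual_seed(0)

z = torch.randn(4, 3)               # hypothetical logits for n = 4 examples, k = 3 classes
y = torch.tensor([0, 2, 1, 2])      # hypothetical true labels (0-based, as in PyTorch)

# softmax(z_1, ..., z_k)_j = e^{z_j} / sum_l e^{z_l}, applied row-wise
p = torch.exp(z) / torch.exp(z).sum(dim=1, keepdim=True)

p_true = p[torch.arange(4), y]          # p_{theta, y_i}(x_i) for each example
mean_ce = (-torch.log(p_true)).mean()   # (1/n) sum_i -log p_{theta, y_i}(x_i)
likelihood = p_true.prod()              # prod_i p_{theta, y_i}(x_i)

# monotone relation: mean CE = -(1/n) log(likelihood)
print(mean_ce.item(), (-likelihood.log() / 4).item())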
Parameter Estimation with PyTorch
• Vanilla Gradient Descent is available as torch.optim.SGD
• Don’t forget to call optimizer.zero_grad()
A minimal example:
import torch

loss_fn = torch.nn.CrossEntropyLoss()
lr = 0.1  # learning rate eta (value chosen arbitrarily here)
for t in range(EPOCHS):
    pred = simpleModel(X)    # forward pass: logits
    loss = loss_fn(pred, y)  # CE loss, averaged over the data
    # clear "grad" (corresponds to optimizer.zero_grad())
    for param in simpleModel.parameters():
        if param.grad is not None:
            param.grad = torch.zeros_like(param.grad)
    loss.backward()          # compute the gradients
    with torch.no_grad():    # vanilla gradient descent step
        for param in simpleModel.parameters():
            param -= lr * param.grad
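For comparison, a sketch of the same loop written with torch.optim.SGD and optimizer.zero_grad(), as mentioned above; it reuses the names simpleModel, X, y, EPOCHS from the example, which are assumed to be defined elsewhere, and the learning rate value is arbitrary:

optimizer = torch.optim.SGD(simpleModel.parameters(), lr=0.1)
loss_fn = torch.nn.CrossEntropyLoss()

for t in range(EPOCHS):
    pred = simpleModel(X)     # forward pass: logits
    loss = loss_fn(pred, y)
    optimizer.zero_grad()     # clear the accumulated gradients
    loss.backward()           # compute the gradients
    optimizer.step()          # theta^(t+1) := theta^(t) - eta * gradient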