
SMAI-M20-L09: Aspects of Supervised Learning

C. V. Jawahar

IIIT Hyderabad

August 28, 2020

Class Review (L09)

1 Consider a matrix A of size m × n. How is the rank of A related to m and n?


2 A and B are two independent events such that P(A) = 0.4 and P(A ∩ B̄) = 0.2. Then find P(A ∩ B).
3 If A is an n × n matrix with every pair of distinct columns orthogonal, i.e., a_i · a_j = 0 ∀ i ≠ j, and ||a_i|| = 1.
4 The product of the eigenvalues of a real square matrix is known as?
5 X ∼ N(0, 1), Y ∼ N(1, 1) and Z = X + Y. Then?
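For item 5, a quick numerical check (a minimal sketch; it assumes the question asks for the distribution of Z and that X and Y are independent, which the slide does not state):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=1_000_000)  # X ~ N(0, 1)
y = rng.normal(1.0, 1.0, size=1_000_000)  # Y ~ N(1, 1)
z = x + y

# If X and Y are independent, Z ~ N(0 + 1, 1 + 1) = N(1, 2).
print(z.mean(), z.var())  # roughly 1.0 and 2.0
```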

Recap:
Problem Space:
    Learn a function y = f(W, x) from the data
        for classification
        for regression
    Learn useful features
Supervised Learning:
    Notion of Training and Testing
    Notion of Loss Function and Optimization
    Need for Generalization and the Worry of Overfitting
Classification Algorithms:
    Nearest Neighbour Algorithm (see the sketch below)
    Linear Classification; Linear Regression
    Decide ω1 if P(ω1|x) ≥ P(ω2|x), else ω2
    Performance Metrics
Mathematical Foundations: Linear Algebra, Probability, Optimization
    SVD, Eigendecomposition
    MLE
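A minimal sketch of the Nearest Neighbour rule from the recap (the toy data and function name are illustrative, not from the lecture):

```python
import numpy as np

def nearest_neighbour_predict(X_train, y_train, x):
    """Return the label of the closest training point (1-NN, Euclidean distance)."""
    dists = np.linalg.norm(X_train - x, axis=1)
    return y_train[np.argmin(dists)]

# Toy example: two classes in 2-D.
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
print(nearest_neighbour_predict(X_train, y_train, np.array([0.2, 0.1])))  # -> 0
```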
This Lecture Session:

Micro-Lecture Videos
1 Minimum Error Classification
    The best we can ever achieve.
    Q: Even deep learning cannot do better. Sad, isn't it? :-)
2 Model Complexity and Occam's Razor
    A simple, yet good, model
    New keywords: Regularization, Model Complexity
3 Validation Error, K-Fold and LOO
    An estimate of the test error.
    Q: How do we prefer one of two solutions (say NN with K = 3 vs. K = 5)? Finding the right hyperparameters.
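For item 3, a minimal sketch of K-fold validation used to compare two hyperparameter settings of a k-NN classifier (pure numpy; all names are illustrative, not the course's prescribed code). Setting n_folds equal to the number of samples gives leave-one-out (LOO):

```python
import numpy as np

def kfold_error(X, y, k_neighbours, n_folds=5, seed=0):
    """Average validation error of a k-NN classifier over n_folds folds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    folds = np.array_split(idx, n_folds)          # n_folds == len(X) gives LOO
    errors = []
    for f in range(n_folds):
        val = folds[f]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != f])
        wrong = 0
        for i in val:
            d = np.linalg.norm(X[train] - X[i], axis=1)
            nearest = train[np.argsort(d)[:k_neighbours]]
            wrong += (np.bincount(y[nearest]).argmax() != y[i])   # majority vote
        errors.append(wrong / len(val))
    return float(np.mean(errors))

# Prefer the setting with the lower estimated (validation) error, e.g. K=3 vs K=5:
# print(kfold_error(X, y, 3), kfold_error(X, y, 5))
```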

Questions? Comments?

Discussion Point - I

In the context of regression (assume x ∈ R^1, i.e., only one feature):


1 We know how to fit a line passing through the origin with a model y = w^T x
2 We know how to fit a line, even if it is not passing through the origin, with a model y = w^T x', where x' is defined as x' = [x, 1]^T
3 How do we model the problem of fitting a quadratic (say a parabola) given a set of points? What is x? Is there a closed-form expression?
Discussion Point - II
1 Can we guess/compute/complete the missing elements of the matrix:

       [  7   ?   ? ]
       [  ?   8   ? ]
       [  ?  12   6 ]
       [  ?   ?   2 ]
       [ 21   6   ? ]

if we know that this is a rank-1 matrix (i.e., every row is a multiple of every other row)? [1]
2 If A is an m × n matrix and A_k is its nearest rank-k matrix, i.e., A_k = arg min_{B: rank(B) = k} ||A − B||_F^2, then A_k can be computed using the SVD as
       A_k = U_k D_k V_k^T
   Details: [2]
[1] Read later: https://web.stanford.edu/class/cs168/l/l9.pdf
[2] Read later: https://courses.cs.washington.edu/courses/cse521/16sp/521-lecture-9.pdf
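A minimal sketch of point 2 above using numpy's SVD (the matrix here is a toy rank-1 example built as an outer product, not the partially observed matrix from point 1):

```python
import numpy as np

def rank_k_approx(A, k):
    """Best rank-k approximation in Frobenius norm: A_k = U_k D_k V_k^T."""
    U, d, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(d[:k]) @ Vt[:k, :]

# Toy example: an exactly rank-1 matrix plus a small perturbation.
rng = np.random.default_rng(0)
A = np.outer([1.0, 2.0, 3.0, 4.0, 5.0], [7.0, 2.0, 1.0]) + 0.01 * rng.standard_normal((5, 3))
A1 = rank_k_approx(A, 1)
print(np.linalg.norm(A - A1, 'fro'))   # small residual: A is close to rank 1
```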
Discussion Point - III

Consider the binary classification problem where both class-conditional densities are univariate Gaussian (assume µ1 ≤ µ2 and equal class priors), i.e., p(x|ωi) = N(µi, σi^2). The optimal decision is: "Decide ω1 if x ≤ θ, else ω2".

1 When σ1 = σ2 = σ, show that the optimal threshold (i.e., θ) is the midpoint of the means. Ans:
       (1/(√(2π) σ)) exp(−(θ − µ1)²/(2σ²)) = (1/(√(2π) σ)) exp(−(θ − µ2)²/(2σ²)),
   or θ = (µ1 + µ2)/2
2 If σ1 ≠ σ2, what will θ be? Can we get a closed-form expression for θ? (For convenience, discard the normalizing term, as in class.)
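A minimal numerical sketch for both parts (illustrative means and variances only; the unequal-variance case follows the slide's hint and discards the normalizing terms, so the threshold condition reduces to equal exponents, a quadratic in θ):

```python
import numpy as np

def threshold_equal_var(mu1, mu2):
    """Part 1: with sigma1 == sigma2, the optimal threshold is the midpoint of the means."""
    return 0.5 * (mu1 + mu2)

def threshold_unequal_var(mu1, sigma1, mu2, sigma2):
    """Part 2 (normalizers discarded): solve (theta-mu1)^2/sigma1^2 = (theta-mu2)^2/sigma2^2."""
    a = sigma2**2 - sigma1**2
    b = -2.0 * (mu1 * sigma2**2 - mu2 * sigma1**2)
    c = mu1**2 * sigma2**2 - mu2**2 * sigma1**2
    roots = np.roots([a, b, c])                       # quadratic in theta; both roots are real
    # Keep the root lying between the two means -- the usable decision threshold.
    return [r for r in roots if min(mu1, mu2) <= r <= max(mu1, mu2)]

print(threshold_equal_var(0.0, 2.0))                  # 1.0
print(threshold_unequal_var(0.0, 1.0, 2.0, 2.0))      # ~0.667, pulled toward the tighter class
```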

What Next? (next three)

Applications of SVD and Eigendecomposition


More Insights into Supervised Learning
Bayesian View and Optimal Classification
Practical Issues in Optimization
Choice of Loss Functions
