EUC1502 Module2 Machine Learning
Introduction to
machine learning
econometrics
THE CONTRACTOR IS ACTING UNDER A FRAMEWORK CONTRACT CONCLUDED WITH THE COMMISSION
Eurostat
Computational issues
Introduction
n observations
Disadvantages:
▪ The validation estimates can be highly variable, depending
on which observations are included in the training set
The validation set approach
Disadvantages:
▪ The validation estimates can be highly variable, depending
on which observations are included in the training set
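The variability mentioned above can be illustrated with a small sketch. It assumes a hypothetical toy data set (y = 2 + 3x + noise), a 50/50 random split, and a one-predictor least squares line as the model; none of these choices come from the slides.

```python
import random

def fit_line(x, y):
    """Least squares intercept and slope for one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def validation_mse(x, y, seed):
    """Split the data in half at random, fit on one half, score on the other."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    half = len(idx) // 2
    train, valid = idx[:half], idx[half:]
    a, b = fit_line([x[i] for i in train], [y[i] for i in train])
    errs = [(y[i] - (a + b * x[i])) ** 2 for i in valid]
    return sum(errs) / len(errs)

# Hypothetical toy data (assumption, not from the slides)
rng = random.Random(0)
x = [rng.uniform(0, 10) for _ in range(40)]
y = [2 + 3 * xi + rng.gauss(0, 2) for xi in x]

# Ten different random splits give ten different validation estimates
estimates = [validation_mse(x, y, seed) for seed in range(10)]
print(min(estimates), max(estimates))
```

The spread between the smallest and largest estimate comes only from which observations land in the training half.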
Cross-validation
▪ K-Fold cross-validation
Cross-validation: leave-one-out cross-validation
[Figure: each split uses 1 observation for validation and the remaining n-1 observations for training]
Cross-validation: leave-one-out cross-validation
Advantages:
Disadvantage:
▪ It can be very time-consuming if n is large
Cross-validation: K-Fold cross-validation
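[Figure: example with k=5; in each split one fold (fold k) is held out for validation and the other k-1 folds are used for training]
If k = n, K-fold cross-validation is LOOCV.
Usually k = 5 or k = 10.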
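A minimal sketch of K-fold cross-validation follows. The helper names (`k_fold_splits`, `cv_mse`), the toy data, and the one-predictor least squares model are assumptions made for illustration. Setting k = n reproduces LOOCV.

```python
import random

def k_fold_splits(n, k, seed=0):
    """Return k (train_idx, valid_idx) pairs that partition indices 0..n-1."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[j::k] for j in range(k)]  # k folds of nearly equal size
    splits = []
    for j in range(k):
        valid = folds[j]
        train = [i for m, fold in enumerate(folds) if m != j for i in fold]
        splits.append((train, valid))
    return splits

def fit_line(x, y):
    """Least squares intercept and slope for one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def cv_mse(x, y, k, seed=0):
    """Average of the k validation MSEs."""
    fold_mses = []
    for train, valid in k_fold_splits(len(x), k, seed):
        a, b = fit_line([x[i] for i in train], [y[i] for i in train])
        errs = [(y[i] - (a + b * x[i])) ** 2 for i in valid]
        fold_mses.append(sum(errs) / len(errs))
    return sum(fold_mses) / k

# Hypothetical toy data (assumption, not from the slides)
rng = random.Random(0)
x = [rng.uniform(0, 10) for _ in range(30)]
y = [2 + 3 * xi + rng.gauss(0, 2) for xi in x]

print("5-fold CV MSE:", cv_mse(x, y, k=5))
print("LOOCV MSE:   ", cv_mse(x, y, k=len(x)))  # k = n
```

LOOCV fits the model n times, which is where the cost for large n comes from; k = 5 or k = 10 needs only 5 or 10 fits.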
Cross-validation: K-Fold cross-validation
Advantages:
Assessing model fitting
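$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{f}(x_i)\right)^2$$
where $\hat{f}(x_i)$ is the prediction that $\hat{f}$ gives on $x_i$.
Computed on the training data, this is the training MSE.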
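The MSE formula translates directly into code. The sketch below uses made-up observed values and predictions purely to show the arithmetic.

```python
def mse(y, y_hat):
    """Mean squared error: average of squared differences y_i - f_hat(x_i)."""
    n = len(y)
    return sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat)) / n

# Hypothetical observed values and model predictions (assumption)
y_obs = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 5.0]

# Squared errors are 0, 0 and 4, so the MSE is 4/3
print(mse(y_obs, y_pred))
```

Applying the same function to observations that were not used for fitting gives the test MSE.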
Assessing model fitting
Example
[Figure: example data plotted against X, with the X axis running from 0 to 100]
Assessing model fitting
[Figure: MSE plotted against flexibility (flexibility axis ticks at 2, 5, 10, 20); annotation: value that test MSE can get]
Remarks:
Replication or resampling: Bootstrap
$z_i = (x_i, y_i)$
$B$ = number of bootstrapped samples
$S(Z^{*b})$ = quantity of interest
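A minimal bootstrap sketch, using the notation above: each bootstrap sample Z*b is drawn with replacement from the pairs z_i, and the quantity of interest S is recomputed on it. Taking S to be the least squares slope, the toy data, and B = 200 are assumptions made for illustration.

```python
import random
import statistics

def slope(z):
    """Quantity of interest S(Z): least squares slope of y on x."""
    n = len(z)
    mx = sum(x for x, _ in z) / n
    my = sum(y for _, y in z) / n
    sxx = sum((x - mx) ** 2 for x, _ in z)
    sxy = sum((x - mx) * (y - my) for x, y in z)
    return sxy / sxx

def bootstrap(z, stat, B=200, seed=0):
    """Return [S(Z*1), ..., S(Z*B)] for B samples drawn with replacement."""
    rng = random.Random(seed)
    n = len(z)
    values = []
    for _ in range(B):
        z_star = [z[rng.randrange(n)] for _ in range(n)]  # same size as Z
        values.append(stat(z_star))
    return values

# Hypothetical toy data z_i = (x_i, y_i) (assumption, not from the slides)
rng = random.Random(0)
z = []
for _ in range(30):
    xi = rng.uniform(0, 10)
    z.append((xi, 2 + 3 * xi + rng.gauss(0, 2)))

slopes = bootstrap(z, slope, B=200)
se = statistics.stdev(slopes)  # bootstrap estimate of the standard error of S
print(slope(z), se)
```

The spread of the B recomputed values serves as an estimate of the sampling variability of S.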
Replication or resampling: Bagging
$$\hat{f}_{bag}(x) = \frac{1}{B}\sum_{b=1}^{B}\hat{f}^{*b}(x)$$
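The bagging average can be sketched as follows. The base learner (a one-nearest-neighbour regressor, chosen because it has high variance), the toy data, and B = 50 are assumptions made for illustration.

```python
import math
import random

def fit_1nn(z):
    """Base learner: predict the y of the training point with the nearest x."""
    def predict(x0):
        return min(z, key=lambda p: abs(p[0] - x0))[1]
    return predict

def bagged_predictor(z, fit, B=50, seed=0):
    """Fit one model per bootstrap sample and average their predictions."""
    rng = random.Random(seed)
    n = len(z)
    models = []
    for _ in range(B):
        z_star = [z[rng.randrange(n)] for _ in range(n)]
        models.append(fit(z_star))  # f_hat^{*b}
    def f_bag(x0):
        return sum(m(x0) for m in models) / B
    return f_bag

# Hypothetical toy data: noisy sine curve (assumption, not from the slides)
rng = random.Random(0)
z = []
for _ in range(40):
    xi = rng.uniform(0, 6)
    z.append((xi, math.sin(xi) + rng.gauss(0, 0.3)))

f_bag = bagged_predictor(z, fit_1nn, B=50)
print(f_bag(1.5), math.sin(1.5))
```

Averaging over bootstrap fits smooths the jumpy single-model predictions, which is the variance reduction bagging aims for.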
Replication or resampling: Bumping
The best model is from the bootstrap sample $\hat{b}$ where:
$$\hat{b} = \arg\min_{b}\sum_{i=1}^{N}\left(y_i - \hat{f}^{*b}(x_i)\right)^2$$
The model predictions are $\hat{f}^{*\hat{b}}(x)$
Remark:
▪ The original training sample is included in the bootstrapped
samples, so the model could pick it if it has the lowest
training error
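Bumping can be sketched as below: fit one model per bootstrap sample, score every model on the original training data, and keep the one with the lowest training error. The crude learner (`fit_median_stump`, which always splits at the sample median), the toy data, and B = 20 are hypothetical choices made so that bootstrap perturbations can actually find a better fit.

```python
import random
import statistics

def fit_median_stump(z):
    """Crude learner: split at the sample median of x, predict the mean y per side."""
    split = statistics.median(x for x, _ in z)
    left = [y for x, y in z if x <= split]
    right = [y for x, y in z if x > split]
    mean_all = sum(y for _, y in z) / len(z)
    left_mean = sum(left) / len(left) if left else mean_all
    right_mean = sum(right) / len(right) if right else mean_all
    return lambda x0: left_mean if x0 <= split else right_mean

def training_error(model, z):
    """Sum of squared errors on the original training sample."""
    return sum((y - model(x)) ** 2 for x, y in z)

def bump(z, fit, B=20, seed=0):
    rng = random.Random(seed)
    n = len(z)
    samples = [z]  # the original training sample is one of the candidates
    for _ in range(B):
        samples.append([z[rng.randrange(n)] for _ in range(n)])
    models = [fit(s) for s in samples]
    errors = [training_error(m, z) for m in models]
    b_hat = min(range(len(models)), key=errors.__getitem__)
    return models[b_hat], b_hat, errors

# Hypothetical toy data: a step at x = 3, away from the median (assumption)
rng = random.Random(0)
z = []
for _ in range(40):
    xi = rng.uniform(0, 10)
    z.append((xi, (1.0 if xi > 3 else 0.0) + rng.gauss(0, 0.1)))

best_model, b_hat, errors = bump(z, fit_median_stump)
print(b_hat, errors[0], errors[b_hat])
```

Because index 0 is the fit on the original sample, the selected model can never have a higher training error than that fit.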
Machine learning linear estimation
Introduction
Remarks:
▪ Prediction accuracy:
➢ low bias
➢ If n ≫ p also low variance
However:
➢ If n is not much larger than p: high variance
➢ If p > n: the method cannot be used
Introduction
▪ Model interpretability:
➢ The model sometimes includes irrelevant variables
Shrinkage methods
▪ Ridge regression
▪ Lasso
Shrinkage methods – Ridge regression
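Tuning parameter: $\lambda \ge 0$
Shrinkage penalty: $\lambda \sum_{j=1}^{p} \beta_j^2$
If:
$\lambda = 0$: the shrinkage penalty is null and $\hat{\beta}^R = \hat{\beta}$
$\lambda \to \infty$: the shrinkage penalty grows and $\hat{\beta}^R \to 0$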
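The two limiting cases can be checked numerically in the simplest setting. The sketch below assumes a single centered predictor and no intercept, where the ridge estimate has the closed form sum(x*y) / (sum(x^2) + lambda); the data values are made up.

```python
def ridge_slope(x, y, lam):
    """Ridge estimate for one centered predictor without intercept.

    Minimises sum((y_i - beta * x_i)^2) + lam * beta^2.
    """
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy / (sxx + lam)

# Hypothetical centered data (assumption, not from the slides)
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [-3.9, -2.1, 0.2, 1.8, 4.0]

# lam = 0 gives the least squares estimate; larger lam shrinks it towards 0
for lam in [0, 1, 10, 100, 1e6]:
    print(lam, ridge_slope(x, y, lam))
```

With several predictors the same idea holds coefficient vector-wise, but the closed form then involves a matrix inverse, which this sketch deliberately avoids.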
Shrinkage methods – Ridge regression
Advantages:
▪ The variance decreases as 𝜆 increases (but the bias increases)
▪ It can be used when p > n
[Figure: curves plotted against 𝜆 on a log-scale axis from 1e−01 to 1e+03]
Disadvantage:
Shrinkage methods – Lasso regression
ℓ1 penalty: $\lambda \sum_{j=1}^{p} |\beta_j|$
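The effect of the ℓ1 penalty can be seen in the one-predictor case, where the lasso estimate has a closed form (soft-thresholding). The sketch assumes a single centered predictor, no intercept, and the objective sum((y_i - beta * x_i)^2) + lambda * |beta|; the data values are made up.

```python
def lasso_slope(x, y, lam):
    """Lasso estimate for one centered predictor without intercept.

    Soft-thresholding: shrink |sum(x*y)| by lam / 2, then divide by sum(x^2).
    """
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    shrunk = max(abs(sxy) - lam / 2, 0.0)
    sign = 1.0 if sxy >= 0 else -1.0
    return sign * shrunk / sxx

# Hypothetical centered data (assumption, not from the slides)
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [-3.9, -2.1, 0.2, 1.8, 4.0]

# Unlike ridge, the estimate becomes exactly 0 once lam is large enough
for lam in [0, 10, 20, 40, 100]:
    print(lam, lasso_slope(x, y, lam))
```

Setting coefficients exactly to zero is what lets the lasso drop variables from the model.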
Shrinkage methods – Selection of 𝜆
Steps:
▪ Define a grid of values for 𝜆
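A sketch of one common way to continue from the grid follows. Only the first step is given here, so the remaining steps in the code (compute a cross-validation error for each grid value, pick the value with the smallest error, refit on all data) are an assumption, as are the one-predictor ridge model without intercept, the toy data, and the grid itself.

```python
import random

def ridge_slope(x, y, lam):
    """One-predictor ridge estimate without intercept (data assumed centered)."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy / (sxx + lam)

def cv_error(x, y, lam, k=5, seed=0):
    """K-fold cross-validation MSE for a given value of lam."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[j::k] for j in range(k)]
    total = 0.0
    for j in range(k):
        held_out = set(folds[j])
        train = [i for i in idx if i not in held_out]
        beta = ridge_slope([x[i] for i in train], [y[i] for i in train], lam)
        total += sum((y[i] - beta * x[i]) ** 2 for i in folds[j]) / len(folds[j])
    return total / k

# Hypothetical toy data, roughly centered (assumption, not from the slides)
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(50)]
y = [1.5 * xi + rng.gauss(0, 1) for xi in x]

grid = [10 ** e for e in range(-3, 4)]             # grid of values for lambda
errors = {lam: cv_error(x, y, lam) for lam in grid}  # CV error per value
best_lam = min(grid, key=errors.get)               # smallest CV error
beta_final = ridge_slope(x, y, best_lam)           # refit on all observations
print(best_lam, beta_final)
```

The same folds are reused for every grid value (fixed seed), so differences in CV error reflect 𝜆 rather than the random split.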