Intro To Statistical Learning
Juwa Nyirenda
juwa.nyirenda@uct.ac.za
Please read the course outline to familiarise yourself with the content of
the course and how it is assessed. The course outline can be accessed by
clicking on the first Vula tab on the left margin of the course Vula page.
Quiz 1 to be submitted by 23h55 on Monday, 26 April 2021
Quiz 2 to be submitted by 23h55 on Monday, 3 May 2021
Assignment 1 will be given out on Monday, 3 May 2021. Due date 23h55
on Monday, 10 May 2021.
Catch up meetings on MS Teams on Fridays between 16h00 and 18h00.
Admin
Relevant chapters:
Chapter 2
Chapter 3
Chapter 6 (6.1, 6.2)
Chapter 4
Chapter 8
Download pdf
James, G., Witten, D., Hastie, T. and Tibshirani, R., 2013. An Introduction to Statistical Learning (Vol. 112). New York: Springer.
ML/SL/Data Science/AI/Deep Learning/Analytics?!
All of these concepts are interrelated and have become buzzwords
over the past several years.
Slide credit: M. Varughese
Data Rich Environments in Industry: Spam filters, Chatbots,
Recommender Systems, Self-driving cars, Smart Policing,
Healthcare, etc.
Slide credit: gizmodo.com, rcdroarena.com, www.wiseyak.com
Applications in Computer Vision
Applications in NLP
Course Goals
Unsupervised Learning
[Figure: scatter plot of observations. Axes: Input dim. 1 (horizontal), Input dim. 2 (vertical).]
Supervised Learning
[Figure: scatter plot of observations. Axes: Input dim. 1 (horizontal), Input dim. 2 (vertical).]
Supervised Examples: Advertising
[Figure: scatter plots of the Advertising data. Panels: Sales against TV, Sales against Radio, Sales against Newspaper, and Radio against Newspaper.]
> # Load the Income dataset and store it in a data frame called dat:
> dat <- read.table('Income.txt', header = TRUE)
> # First 10 rows of the Income data:
> head(dat, 10)
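The loading step can be tried without the course file. The sketch below is only an illustration: the column names (Education, Seniority, Income) and all values in `toy` are assumptions made up for the example, not the contents of Income.txt.

```r
# Stand-in data (assumption): made-up rows, NOT the real Income.txt.
toy <- data.frame(Education = c(10, 12, 14, 16, 18, 20),
                  Seniority = c(40, 120, 60, 150, 90, 170),
                  Income    = c(22, 35, 41, 66, 70, 95))

# Write the stand-in data to a temporary file so the example is self-contained.
path <- tempfile(fileext = ".txt")
write.table(toy, path, row.names = FALSE, quote = FALSE)

# header = TRUE tells read.table that the first line holds the column names.
dat <- read.table(path, header = TRUE)

head(dat, 3)  # first three rows
str(dat)      # column names and types
dim(dat)      # number of rows and columns
```

Spelling out `header = TRUE` rather than `h = TRUE` avoids relying on R's partial argument matching.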
[Figure: scatter plots of the Income data. Panels: Income against Education and Income against Seniority.]
3. Term originates from a Monty Python sketch.
Variables: Measurement Scales
4. Synonym for input variables. Used interchangeably.
Supervised Examples: Mortality
        TV Radio Newspaper
[1,] 230.1  37.8      69.2
[2,]  44.5  39.3      45.1
[3,]  17.2  45.9      69.3
[4,] 151.5  41.3      58.5
[5,] 180.8  10.8      58.4
Each row of this matrix now (say, [3,]) corresponds to the feature set for
an observation in the vector Sales (say, Sales[3]).
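The row-to-response correspondence can be sketched in R as follows. The feature values are the five rows shown above; the `Sales` values are placeholders chosen for illustration only and are an assumption, since the slides do not list them.

```r
# Feature matrix: the five rows shown above, one column per advertising medium.
X <- matrix(c(230.1, 37.8, 69.2,
               44.5, 39.3, 45.1,
               17.2, 45.9, 69.3,
              151.5, 41.3, 58.5,
              180.8, 10.8, 58.4),
            ncol = 3, byrow = TRUE,
            dimnames = list(NULL, c("TV", "Radio", "Newspaper")))

# Response vector: placeholder values for illustration only (assumption).
Sales <- c(22.1, 10.4, 9.3, 18.5, 12.9)

# Row i of X holds the features of observation i; Sales[i] is its response.
i <- 3
obs <- list(features = X[i, ], response = Sales[i])
print(obs)
```

Keeping the features in a matrix and the response in a separate vector of the same length means the index `i` always refers to the same observation in both.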
Statistical Learning: Regression
[Figure: four panels of simulated regression data, Y against X on [0, 1]. Legend entries across the panels: Data, True f, Model, Sys. Comp., Error. One observation is labelled (x25, y25).]
What we posit.
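The extracted slide does not preserve the formula itself. Assuming the slides follow the notation of the course textbook (James et al., 2013, Chapter 2), the posited relationship can be written as:

```latex
Y = f(X) + \varepsilon
```

Here f is the fixed but unknown systematic component (the "Sys. Comp." entry in the figure legend) and ε is a random error term with mean zero that is independent of X.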
[Figure: Y against X, showing Data, Model, and Error.]
[Figure: scatter plots of observations in the (X1, X2) plane; both axes run from −1.0 to 1.0.]
f(·) vs f̂(·)
Why estimate f?
[Figure: interpretability (vertical axis, Low to High) against flexibility (horizontal axis, Low to High). Methods shown: Subset Selection, Lasso, Least Squares, Bagging, Boosting.]
[Figure: two panels. Sales against TV, showing Data, the fitted f̂, and Residuals; Error against Iteration.]
[Figure: scatter plots of the Advertising data. Panels: Sales against TV, Sales against Radio, Sales against Newspaper, and Radio against Newspaper.]
But what about the joint relationship? That is, we know that more
than one factor contributes to income.
So we include more predictors: e.g., for a given combination of
Education and Seniority, what Income can we expect to observe?
Let's just extend our equation:
Y = β0 + β1 × X1 + β2 × X2 + ε, (6)
where Y denotes income, X1 denotes years of education, X2
denotes seniority, ε is the error term, and {β0, β1, β2} are the
coefficients of the linear equation. I.e.,
Income ≈ β0 + β1 × Education + β2 × Seniority. (7)
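Fitting the two-predictor linear model can be sketched in R with `lm()`. This is a minimal illustration on simulated data: the coefficient values in `beta`, the noise level, and the variable ranges are all assumptions made for the example and are not estimates from the course's Income data.

```r
set.seed(1)
n <- 200

# Simulated predictors (assumed ranges, for illustration only).
Education <- runif(n, 10, 22)
Seniority <- runif(n, 20, 180)

# Assumed "true" coefficients beta0, beta1, beta2 used to generate the data.
beta <- c(-50, 5.9, 0.17)
Income <- beta[1] + beta[2] * Education + beta[3] * Seniority +
  rnorm(n, sd = 5)
sim <- data.frame(Income, Education, Seniority)

# Least-squares fit of Income on Education and Seniority.
fit <- lm(Income ~ Education + Seniority, data = sim)
beta_hat <- coef(fit)
print(round(beta_hat, 2))

# Expected Income for a given combination of Education and Seniority.
new_obs <- data.frame(Education = 16, Seniority = 100)
pred <- predict(fit, newdata = new_obs)
print(pred)
```

Because the data are generated from known coefficients, the estimates in `beta_hat` can be compared directly with `beta` to see how well the fit recovers them.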
Parametric Methods: Income Revisited
[Figure: two 3-D plots of Income against Years of Education and Seniority.]
[Figure: scatter plots of the Income data. Panels: Income against Education and Income against Seniority.]
[Figure: two panels of Income against Education, annotated with the predictions ŷ = 35.25 and ŷ = 29.08.]
Non-Parametric Methods: Income Revisited
[Figure: two 3-D plots of Income against Education and Seniority.]
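The slides do not say which non-parametric method produced the fits shown. As one common example, and purely as an assumption, a k-nearest-neighbour average predicts Income at a point by averaging the responses of the k closest observations. The data below are simulated and the helper `knn_predict` is hypothetical, written only for this sketch.

```r
set.seed(2)
n <- 30

# Simulated data (assumption): a non-linear relationship plus noise.
Education <- runif(n, 10, 22)
Income <- 20 + 60 / (1 + exp(-(Education - 16))) + rnorm(n, sd = 4)

# Hypothetical helper: average the responses of the k observations
# whose x values are closest to the query point x0.
knn_predict <- function(x0, x, y, k = 5) {
  nearest <- order(abs(x - x0))[seq_len(k)]
  mean(y[nearest])
}

# Prediction at Education = 12 using the 5 nearest neighbours.
y_hat <- knn_predict(12, Education, Income, k = 5)
print(y_hat)

# With k = n every observation is a neighbour, so the prediction
# is just the overall mean of Income.
print(knn_predict(12, Education, Income, k = n))
```

No functional form for f is assumed here; the choice of k controls how flexible the fit is, with small k following the data closely and large k smoothing towards the overall mean.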
f̂ for
Circles Revisited
Some Closing Remarks