
Supervised Learning Part 1: Intro to Statistical Learning

Juwa Nyirenda
juwa.nyirenda@uct.ac.za

University of Cape Town


Slide Credit: Slides adapted from those by Dr Etienne Pienaar

April 20, 2021


Admin

Please read the course outline to familiarise yourself with the content of
the course and how it is assessed. The course outline can be accessed by
clicking on the first Vula tab on the left margin of the course Vula page.
Quiz 1 to be submitted by 23h55 on Monday, 26 April 2021
Quiz 2 to be submitted by 23h55 on Monday, 3 May 2021
Assignment 1 will be given out on Monday, 3 May 2021. Due date 23h55
on Monday, 10 May 2021.
Catch up meetings on MS Teams on Fridays between 16h00 and 18h00.
Admin

Relevant chapters:
Chapter 2
Chapter 3
Chapter 6 (6.1, 6.2)
Chapter 4
Chapter 8
James, G., Witten, D., Hastie, T. and Tibshirani, R.,
2013. An introduction to statistical learning
(Vol. 112).
New York: Springer.
ML/SL/Data Science/AI/Deep Learning/Analytics?!

All of these concepts are interrelated, and have become buzzwords over the past several years.
Data Rich Environments in Industry: Tech/Banking/Insurance1

Netflix prize: Based on ratings of 18,000 movies by 400,000
Netflix customers, predict customers' ratings of other movies.

(source: Holloway, 2010)

1
Slide credit: M. Varughese
Data Rich Environments in Industry: Spam filters, Chatbots,
Recommender Systems, Self-driving cars, Smart Policing,
Healthcare, etc.2

2
Slide credit: gizmodo.com, rcdroarena.com, www.wiseyak.com
Applications in Computer Vision
Applications in NLP
Course Goals

By the end of the course, you should be able to:


Understand how various statistical learning algorithms work
Implement them on your own
Look at a real world problem and identify if a statistical learning
algorithm is an appropriate solution
If so, identify what types of algorithms might be applicable
Feel inspired to work on and learn more about statistical learning.
Skills: Paradigms for Statistical Learning
We bisect 'Statistical Learning' into two broad paradigms of learning,
namely:
Supervised Learning
We are interested in the relationship between a set of predictor
variables and a measurable outcome.
Data consists of input variables and output variables.
Examples of models that fit within this paradigm are: Linear and
Logistic Regression, Decision/Regression Trees, Neural Networks,
Support Vector Machines.
Unsupervised Learning
We are interested in intrinsic
patterns/clusters/features/partitions/groupings in a set of
observations.
Data simply consists of a number of measurements.
Examples of such methods include Principal Components Analysis
(PCA), Clustering, and Self-Organising Maps (SOMs).
Unsupervised Learning Example

[Figure: unsupervised learning example; unlabelled observations plotted against Input dim. 1 and Input dim. 2.]
Supervised Learning

[Figure: supervised learning example; observations plotted against Input dim. 1 and Input dim. 2.]
Supervised Examples: Advertising

Advertising data: Sales and expenditure (000's) on TV, Radio, and
Newspaper advertising for 200 markets.
Can we predict Sales (output variable) based on advertising expenditure
(input variables/predictors)?
> # Load advertising dataset. Store in data frame called dat:
> dat = read.table('Advertising.txt', h = TRUE)
> # First 10 rows of Advertising data:
> head(dat, 10)

TV Radio Newspaper Sales


1 230.1 37.8 69.2 22.1
2 44.5 39.3 45.1 10.4
3 17.2 45.9 69.3 9.3
4 151.5 41.3 58.5 18.5
5 180.8 10.8 58.4 12.9
6 8.7 48.9 75.0 7.2
7 57.5 32.8 23.5 11.8
8 120.2 19.6 11.6 13.2
9 8.6 2.1 1.0 4.8
10 199.8 2.6 21.2 10.6
Supervised Examples: Advertising

[Figure: pairwise scatter plots of Sales against the predictor variables TV, Radio, and Newspaper.]

Sales vs. various predictor variables.


Supervised Examples: Income

Income data: Income and years of Education and Seniority for 30
individuals.
Can we predict Income (outcome) based on years of Education and
Seniority?

> # Load Income dataset and store in data frame called dat:
> dat = read.table('Income.txt', h = TRUE)
> # First 10 rows of Income data:
> head(dat, 10)

Education Seniority Income


1 21.58621 113.10345 99.91717
2 18.27586 119.31034 92.57913
3 12.06897 100.68966 34.67873
4 17.03448 187.58621 78.70281
5 19.93103 20.00000 68.00992
6 18.27586 26.20690 71.50449
7 19.93103 150.34483 87.97047
8 21.17241 82.06897 79.81103
9 20.34483 88.27586 90.00633
10 10.00000 113.10345 45.65553
Supervised Examples: Income

[Figure: Income plotted against Education and against Seniority.]

Income vs. various predictor variables.


Supervised Examples: Email Classification

Can we detect whether an email is spam3?

We might use the relative frequency of occurrence of
characters/words/misspellings... (a short sketch of computing such a feature follows the table).
type free your our mail order dollar ! (
email 0.000 0.000 0.270 0.550 0.000 0.000 0.000 0.549
email 0.000 1.780 0.890 0.000 0.000 0.000 0.000 0.298
email 0.000 0.000 0.000 1.960 0.000 0.000 0.000 0.373
email 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
email 0.000 0.000 0.600 0.000 0.100 0.000 0.000 0.049
spam 0.000 2.830 0.940 0.940 0.000 0.000 0.000 0.000
spam 1.050 2.100 0.000 0.000 0.000 0.182 0.365 0.365
email 1.380 1.380 0.000 0.690 0.000 0.000 2.378 0.000
email 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.098
email 0.930 0.000 0.000 0.000 0.000 0.000 0.000 0.163

3
Term originates from a Monty Python sketch*
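
As a rough sketch of how such a feature could be computed (this is not how the table above was built; the message vector below is invented for illustration), one can count the relative frequency of a chosen word in each message:

> # Hypothetical illustration: percentage frequency of the word "free" per message.
> emails <- c("Claim your free prize now", "Meeting moved to the free slot at noon")
> words  <- strsplit(tolower(emails), "[^a-z]+")   # split each message into words
> sapply(words, function(w) 100 * sum(w == "free") / length(w))
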
Variables: Measurement Scales

Both predictor and output variables can be defined as either qualitative
or quantitative.
Supervised learning problems where the output variables are quantitative
in nature are referred to as regression tasks.
Problems where the output variables are qualitative in nature are
referred to as classification tasks.
The particular methods one chooses to perform these tasks are often
informed by the measurement scale of the output, i.e., qualitative vs.
quantitative.
The set of predictor variables is usually a combination of both
quantitative and qualitative measurements for both regression and
classification problems. Indeed, one often spends some time
encoding/engineering the feature set4 in order to improve model
performance or interpretation (a short encoding sketch follows).

4
Synonym for input variables. Used interchangeably.
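
As a small illustration of the kind of feature encoding mentioned above (a sketch only; the qualitative variable here is invented), a categorical predictor can be expanded into dummy/indicator columns before modelling:

> # Hypothetical example: encode a qualitative predictor as indicator (dummy) columns.
> region <- factor(c("North", "South", "South", "West"))   # invented qualitative feature
> X_enc  <- model.matrix(~ region)   # "North" is absorbed into the baseline/intercept
> head(X_enc)
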
Supervised Examples: Mortality

But this should be obvious, right?


Consider the following: Does smoking status affect mortality?
age smoking deaths person_years
1 35_44 smoker 32 52407
2 45_54 smoker 104 43248
3 55_64 smoker 206 28612
4 65_74 smoker 186 12663
5 75_84 smoker 102 5317
6 35_44 non-smoker 2 18790
7 45_54 non-smoker 12 10673
8 55_64 non-smoker 28 5710
9 65_74 non-smoker 28 2585
10 75_84 non-smoker 31 1462
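
The raw counts alone can mislead because the age groups differ in exposure. A quick sketch of an age-specific comparison, assuming the table above is stored in a data frame called mort (the name is an assumption), is:

> # Deaths per 1,000 person-years, by age band and smoking status
> # (assumes the table above sits in a data frame called 'mort'):
> mort$rate <- 1000 * mort$deaths / mort$person_years
> xtabs(rate ~ age + smoking, data = mort)
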
Variables: Notation

We use Y to notionally represent responses/output variables/target
variables.
For an observed outcome of Y, we use y. Furthermore, we often use an
index subscript, say yi, to denote the i-th observation, i.e., a particular
instance of an outcome.
{y1, y2, ..., yN} thus represents a set of N observations.
For a given Y we also observe a vector of p features
X = (X1, X2, ..., Xp).
We may refer to an observed outcome of this feature vector using
x = (x1, x2, ..., xp).
So for each observation, we have a pair (y, x).
For example, for our first observation of the advertising data, we have
y = 22.1 and x = (230.1, 37.8, 69.2).
Variables: R

In computing environments, 'variables' are objects which contain data.
Often, the relationship between the mathematical variables and those in
the workspace becomes opaque.
Typically, in R we work with vectors and matrices.
Sales may denote a collection of observations {y1, y2, ..., yN}.
Sales is thus a (column) vector of observations which can be indexed:
> Sales[1:5] # Vector

[1] 22.1 10.4 9.3 18.5 12.9

We like to assign variable names that correspond to the output and
features of the data when we build models, for interpretive purposes.
Hint: When writing pseudo-code, make sure to annotate the
mathematical counterparts so you know what you are doing.
Variables: R

Since we usually have multiple features for every response, we tend to
work with collections of vectors (one for each feature).
Mathematically, these features are usually joined in the form of a matrix.
This can be achieved quite naturally in the computing environment:
> X = cbind(TV, Radio, Newspaper) # Matrix: features by column
> head(X, 5)

TV Radio Newspaper
[1,] 230.1 37.8 69.2
[2,] 44.5 39.3 45.1
[3,] 17.2 45.9 69.3
[4,] 151.5 41.3 58.5
[5,] 180.8 10.8 58.4

Each row of this matrix now (say, [3,]) corresponds to the feature set for
an observation in the vector Sales (say, Sales[3]).
Statistical Learning: Regression

One way to describe the relationship between responses and
predictors is via the equation:

    Y = f(X) + ε,                                    (1)

where f(X) is the systematic component and ε is the random component.
f(.) is some fixed, but unknown function.
ε is a random error term, independent of X, with mean 0.
We may subsequently posit a model for f, say fˆ, which estimates f.
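
To make the decomposition concrete, here is a small simulation sketch (purely illustrative: the choice of f and the noise level are arbitrary) that generates data as a systematic component plus random error:

> # Illustrative simulation of Y = f(X) + error for an arbitrary 'true' f:
> set.seed(1)
> x   <- runif(100)                       # predictor values
> f_x <- 4 + 2 * sin(2 * pi * x)          # systematic component
> eps <- rnorm(100, mean = 0, sd = 0.5)   # random error, independent of X, mean 0
> y   <- f_x + eps                        # what we actually observe
> plot(x, y); lines(sort(x), f_x[order(x)], lwd = 2)   # data vs. the true f
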
Statistical Learning: Regression

[Figure: two panels, "Data" (the observations, with the point (x25, y25) highlighted) and "Systematic Component" (the true f), plotted over X and Y.]

What we have vs. what is.


Statistical Learning: Regression

[Figure: two panels, "Random Component" (error, data, and systematic component) and "Data" (true f, data, and a posited model), plotted over X and Y.]

What we posit.
Statistical Learning: Regression

[Figure: the data, a fitted model, and the resulting errors, plotted over X and Y.]

How we use what we have to get what we posit.


Statistical Learning: Classification

Compute f(X) at each point (X1, X2) on a lattice (grid).
Dashed line is where f(X) = 0.5.
[Figure: labelled observations plotted over the inputs X1 and X2.]

Classification on a lattice over the input space.


Where is the response variable in this graph?
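
A rough sketch of how one might compute a fitted classifier over such a lattice in R (the logistic regression here is just a stand-in for f, and dat2, with columns X1, X2 and a binary class, is an assumed data frame):

> # Hypothetical sketch: evaluate a fitted classifier on a grid of (X1, X2) values.
> fit  <- glm(class ~ X1 + X2, data = dat2, family = binomial)
> grid <- expand.grid(X1 = seq(-1, 1, length = 100),
+                     X2 = seq(-1, 1, length = 100))
> grid$p <- predict(fit, newdata = grid, type = "response")
> # Dashed contour at p = 0.5 marks the decision boundary:
> contour(unique(grid$X1), unique(grid$X2),
+         matrix(grid$p, nrow = 100), levels = 0.5, lty = 2)
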
Statistical Learning: Classification

[Figure: the true f over the input space (X1, X2) alongside the observed data.]

f(.) vs. the data.


Statistical Learning: Classification

[Figure: the true f compared with the estimated fˆ over the input space (X1, X2).]

f(.) vs. fˆ(.)
Why estimate f ?

Prediction: When we have X, and our assumption that
mean(ε) = 0 is valid, then we can predict the associated outcome
Y using:

    Ŷ = fˆ(X).                                       (2)

Classification: the boundary defined by fˆ(X) can be used to decide an
appropriate class.
Inference: We wish to infer aspects of the relationship between X
and Y from the data. The goal here is different, although we often
recover predictions from such studies.
Which predictors are associated with the response?
What is the relationship between the predictors and response?
Methods for finding f

[Figure: methods arranged by trade-off, from high interpretability/low flexibility to low interpretability/high flexibility: Subset Selection and the Lasso; Least Squares; Generalized Additive Models and Trees; Bagging and Boosting; Support Vector Machines.]

Flexibility/Complexity vs. Interpretability.


Parametric Methods: Advertising Revisited

Let's formulate an hypothesis on the structure of f.

If we consider predicting Sales using expenditure on TV only, we'd
have:

    Y = β0 + β1 × X + ε,                             (3)

where Y denotes sales, X denotes expenditure, and {β0, β1} are
coefficients of the linear equation.
In less formal notation, we posit:

    Sales ≈ β0 + β1 × TV.                            (4)
Parametric Methods: Advertising Revisited

We thus have an equation which posits the relationship between TV and
Sales.
We need to find the parameters, (β0, β1), such that the equation best
represents the data in some sense.
One possible way to do this is to minimise the distance between
predictions under this model and the observed data, i.e.:

    MSQE = Ave((Y − Ŷ)²)
         = (1/N) Σ_{i=1}^{N} (yi − ŷi)²              (5)
         = (1/N) Σ_{i=1}^{N} (yi − (β0 + β1 xi))².

Once we have done so, we have estimates (β̂0 , β̂1 ).
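
In R, minimising (5) over (β0, β1) is exactly what least squares via lm() does; a quick sketch using the dat data frame loaded earlier:

> # Fit Sales on TV by least squares, which minimises the average squared error in (5):
> fit <- lm(Sales ~ TV, data = dat)
> coef(fit)                # estimates of beta_0 (intercept) and beta_1 (slope)
> mean(residuals(fit)^2)   # the minimised MSQE on the training data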


Parametric Methods: Advertising Revisited

By minimising the discrepancies between the model equation and
the observations, we get:

[Figure: left panel shows the fitted line fˆ through the Sales vs. TV data, with residuals marked; right panel tracks the error over the iterations of the fitting procedure.]

We can repeat this for other predictors as well.


Parametric Methods: Advertising Revisited

[Figure: scatter plots of Sales against TV, Radio, and Newspaper, each with a fitted linear regression line.]

Linear fit of Sales vs. various predictor variables.


Parametric Methods: Income Revisited

But what about the joint relationship? That is, we know that more
than one factor contributes to income.
So we include more predictors: e.g., for a given combination of
Education and Seniority, what Income can we expect to observe?
Let's just extend our equation:

    Y = β0 + β1 × X1 + β2 × X2 + ε,                  (6)

where Y denotes income, X1 denotes years of education, X2
denotes seniority, and {β0, β1, β2} are coefficients of the linear
equation. I.e.,

    Income ≈ β0 + β1 × Education + β2 × Seniority.   (7)
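
A sketch of fitting Equation (7) by least squares, assuming the Income data frame dat loaded earlier:

> # Fit the plane in Equation (7) by least squares (dat as loaded from Income.txt):
> fit <- lm(Income ~ Education + Seniority, data = dat)
> coef(fit)   # estimates of beta_0, beta_1 and beta_2
> # Expected Income at a chosen combination of the predictors:
> predict(fit, newdata = data.frame(Education = 16, Seniority = 100))
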
Parametric Methods: Income Revisited

[Figure: 3D surface of Income over Years of Education and Seniority, with the observed data points.]

f(X1, X2) vs. observations.


Parametric Methods: Income Revisited

[Figure: fitted regression plane over Years of Education and Seniority.]

Parametric equation for a plane as fˆ(X1, X2) (Equation 7).


Parametric Methods: Income Revisited
[Figure: Income plotted against Education and against Seniority.]

Income vs. various predictor variables.


Non-Parametric Methods

Consider a more mechanical hypothesis on the structure of f.

Draw a set of predictors. Pick the K observations that are 'closest' to X
and average the responses from those examples.
Mathematically, we have:

    ŷi = fˆ(xi) = Ave(yj | xj ∈ NK(xi))              (8)

for every i = 1, 2, ..., N.
The similarity argument is simple, and doesn't require us to find any
parameters (ish).
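
A bare-bones sketch of Equation (8) for a single query point (illustrative only; in practice one would typically use a package such as FNN or caret):

> # Minimal k-NN regression for one query point x0, following Equation (8):
> knn_predict <- function(x0, x, y, K = 3) {
+   nbrs <- order(abs(x - x0))[1:K]   # indices of the K observations closest to x0
+   mean(y[nbrs])                     # average of their responses
+ }
> # E.g., predict Income at Education = 16 using the Income data loaded earlier:
> knn_predict(x0 = 16, x = dat$Education, y = dat$Income, K = 3)
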
Non-Parametric Methods: Income Revisited

[Figure: k-NN fits of Income vs. Education for k = 3 and k = 10; predictions of ŷ = 35.25 and ŷ = 29.08 at a query point are marked.]
Non-Parametric Methods: Income Revisited

[Figure: k-NN regression surfaces of Income over Education and Seniority for k = 5 and k = 15.]

fˆ for k = 5 and k = 15.

Circles Revisited
Some Closing Remarks

Which approach is better?
For a given approach, which parameters are 'best'?
We never have f(.), so how do we even know we are close to
the true pattern?
What insights do these models give about the true process?
