Machine Learning and Pattern Recognition Week 2 Error Bars
It’s good practice to give some indication of uncertainty or expected variability in experimental results. You will need to report experimental results in your coursework. Many of you will also write up experimental results in dissertations this year, and you will want to know how seriously to take numbers that you measure in your future work.
We will discuss some different “standard deviations” that you might see reported, or want
to report, including “standard errors on the mean”.
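The estimators x̄ and σ̂² referred to below are defined earlier in the full note and do not appear in this extract. As a reminder of the standard definitions being assumed here, for independent samples x₁, …, x_N:
\[
\bar{x} = \frac{1}{N}\sum_{n=1}^{N} x_n,
\qquad
\hat{\sigma}^2 = \frac{1}{N-1}\sum_{n=1}^{N}\left(x_n - \bar{x}\right)^2.
\]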
The (N − 1) in the estimator for the variance, rather than N (Bessel’s correction), is a small detail you don’t need to worry about for this course.1
The estimator x̄ is itself a random variable: if we gathered a second dataset and computed its mean in the same way, we would get a different x̄. For some datasets x̄ will be bigger than the underlying true mean µ; for others it will be smaller. The mean of x̄ is the correct answer µ. That is, x̄ is an unbiased estimator.
Using the rules of expectations and variances (see the note in the background section), we
can estimate the variance of x̄. We assume here that the observations are independent:
\begin{align}
\mathrm{var}[\bar{x}] &= \frac{1}{N^2}\sum_{n=1}^{N} \mathrm{var}[x_n] \tag{3}\\
&= \frac{1}{N^2}\,N\sigma^2 \;=\; \sigma^2/N \;\approx\; \hat{\sigma}^2/N. \tag{4}
\end{align}
A “typical” deviation from the mean is given by the standard deviation (not the variance).
So we write:
\[
\mu = \bar{x} \pm \hat{\sigma}/\sqrt{N}, \tag{5}
\]
to give an indication of how precisely we think we have measured the mean of the distribu-
tion with our N samples. Some papers might report ± two standard deviations.
1. The (N − 1) normalization makes the variance estimator unbiased and is what the Matlab/Octave var function does by default. NumPy’s np.var requires the option ddof=1 to get the unbiased estimator. However, if N is small enough that this difference matters, you need to be more careful about the statistics than we are in this note.
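To make equations (3)–(5) concrete, here is a minimal NumPy sketch. The data are synthetic draws from a distribution with a known mean, purely for illustration; the code computes x̄, the Bessel-corrected σ̂, and the standard error σ̂/√N:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000
true_mu = 2.0                                     # known only because the data are synthetic
x = rng.normal(loc=true_mu, scale=3.0, size=N)    # N independent samples

x_bar = np.mean(x)                  # estimate of the mean, x̄
sigma_hat = np.std(x, ddof=1)       # sample std with Bessel's correction (ddof=1)
std_err = sigma_hat / np.sqrt(N)    # standard error on the mean, as in equation (5)

print(f"mean = {x_bar:.3f} +/- {std_err:.3f}   (true mean is {true_mu})")
```

Running this with different seeds shows x̄ wobbling around the true mean by roughly one standard error, which is exactly what the error bar is meant to convey.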
3 Reliability of a method
A standard error on the test set loss indicates how much the future performance of a particular fitted model might deviate from the performance we have estimated. It doesn’t tell us whether the machine learning method would work well in future if training were run again to create a new model.
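As a concrete sketch of this first kind of error bar: given the per-test-case losses of one fitted model (the array below contains made-up illustrative numbers), the standard error on the mean test loss is computed just as above:

```python
import numpy as np

# Hypothetical per-test-case losses for one fitted model (made-up numbers).
per_example_losses = np.array([0.31, 0.07, 0.55, 0.12, 0.48, 0.09, 0.33, 0.21])

mean_loss = per_example_losses.mean()
std_err = per_example_losses.std(ddof=1) / np.sqrt(per_example_losses.size)

# This error bar reflects the finite test set only: it quantifies uncertainty in
# this particular model's estimated generalization loss, not how a refit would behave.
print(f"test loss = {mean_loss:.3f} +/- {std_err:.3f}")
```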
Readers of a paper may also want to know how variable the performance of a model can be across different fits, that is, how robust the method is. The fitted models could vary for multiple reasons: we might gather new data; some machine learning methods depend on random choices; somewhat horrifyingly, even machine learning code that uses no random numbers and runs on the same data often gives different results!2 To summarize one of these effects, we could report the standard deviation of the models’ performances (not a standard error on the mean) to indicate how much a future fit will typically vary from the average performance when something is changed.
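One possible way to report this is sketched below, assuming a stand-in routine train_and_evaluate (not from the note) that fits the model with a given random seed and returns its test-set loss; the fake losses it returns here are for illustration only:

```python
import numpy as np

def train_and_evaluate(seed):
    """Stand-in for: fit the model using this random seed, return its test-set loss."""
    rng = np.random.default_rng(seed)
    return 0.30 + 0.02 * rng.standard_normal()   # fake loss, for illustration only

losses = np.array([train_and_evaluate(seed) for seed in range(10)])

# Standard deviation across fits (NOT a standard error on the mean):
# roughly how far a single future fit will typically land from the average fit.
print(f"loss over {losses.size} fits: "
      f"mean {losses.mean():.3f}, std {losses.std(ddof=1):.3f}")
```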
Important: Papers are sometimes not clear on what their “error bars” are reporting. Sometimes they show the standard deviation of results under different conditions; other times they show a standard error indicating uncertainty in the estimated generalization error due to a finite test set. Always be clear about precisely what standard deviation/error you are reporting and why.
Here the δ’s are the per-test-case differences in loss between two models A and B evaluated on the same test cases, with the sign chosen so that a positive δ means A did better on that case. If the mean of the δ’s is several standard errors greater than zero, we would report that A is the better model. (Non-examinable: you could perform a paired t-test if you wanted to turn this idea into a formal hypothesis test.)
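A minimal sketch of this comparison, assuming the two models have been evaluated on the same test cases so that their per-case losses line up (the loss arrays below are hypothetical, and the optional t-test uses SciPy if it is available):

```python
import numpy as np
from scipy import stats   # only needed for the optional paired t-test below

# Hypothetical per-test-case losses for models A and B on the same test cases.
loss_A = np.array([0.21, 0.35, 0.10, 0.44, 0.28, 0.19, 0.37, 0.25])
loss_B = np.array([0.29, 0.41, 0.12, 0.47, 0.36, 0.22, 0.35, 0.31])

deltas = loss_B - loss_A                         # positive delta: A did better on that case
mean_delta = deltas.mean()
std_err = deltas.std(ddof=1) / np.sqrt(deltas.size)
print(f"mean delta = {mean_delta:.3f} +/- {std_err:.3f}")

# Non-examinable: the corresponding formal paired t-test.
t_stat, p_value = stats.ttest_rel(loss_B, loss_A)
print(f"paired t-test: t = {t_stat:.2f}, p = {p_value:.3f}")
```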