Data Analysis For Physics Laboratory: Standard Errors
These notes cover the most important points from the PHYS 10181B Data Analysis
course. They are intended as an aide-memoire, and do not replace the e-learning module
lecture notes. Further material can be found in textbooks, such as Practical Physics
by G.L. Squires.
It is good laboratory practice, which helps with data analysis, to keep good records of
everything that could possibly be relevant, recording all of your readings in ink, directly
into a neat laboratory notebook.
Standard errors
When we measure some quantity, our measured value, x, is very unlikely to be exactly
equal to the true value, X, of that quantity. It is useful to have a good estimate of how
close to the true value any given measurement is likely to be. This is called the Standard
Error. Although we use the word error, this does not imply that the measurement has
been done badly, or incorrectly. It is better to think of the standard error as a descrip-
tion of the uncertainty inherent in the measurement. The value of the standard error
does not just apply to our one measurement, but will be the same for any measurement
of the same quantity using the same apparatus and technique.
Uncertainties don't represent absolute limits: for example, a result x ± σ implies 68.3%
confidence that the true value lies in the range x − σ to x + σ, and 95.5% confidence that
the true value lies in the range x − 2σ to x + 2σ.
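These confidence levels can be checked numerically: the fraction of a Gaussian distribution lying within ±kσ of its mean is erf(k/√2). A short Python sketch (standard library only, not part of the course material):

```python
import math

# Fraction of a Gaussian distribution lying within +/- k standard
# deviations of its mean: P(|x - X| < k*sigma) = erf(k / sqrt(2)).
def gaussian_coverage(k):
    return math.erf(k / math.sqrt(2))

print(f"within 1 sigma: {gaussian_coverage(1):.4f}")  # about 68.3%
print(f"within 2 sigma: {gaussian_coverage(2):.4f}")  # about 95.5%
```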
There are two important types of errors: random errors due to factors such as the
intrinsic accuracy of the apparatus; and systematic errors, which cause measurements
to deviate from true values in a systematic way.
Random errors cause repeated measurements of the same quantity to be scattered
according to a Gaussian distribution:

f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-X)^2}{2\sigma^2}\right).
The quantity X is the mean value of the Gaussian distribution and σ is the standard
deviation of this distribution. It represents the standard error on a single measurement, i.e.
it is related to the precision of the apparatus and the care with which the measurements
are made.
Although we cannot measure X or σ directly, we can use our value of x̄ as our best
estimate of X, and a good estimate of σ may be obtained from the standard deviation, s,
of the finite sample of n measurements, using:
\sigma \simeq \sqrt{\frac{n}{n-1}}\, s .
In practice, the number of measurements, n, is usually large enough that √(n/(n−1))
can be taken as unity. In other words, the standard deviation of the finite sample of
measurements is a very good approximation to the standard deviation of the underlying
(infinite) Gaussian distribution.
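As a quick sanity check, this short sketch (hypothetical sample sizes) shows how rapidly √(n/(n−1)) approaches unity:

```python
import math

# The finite-sample correction factor sqrt(n/(n-1)) for a few sample sizes:
# even for modest n it is already close to 1.
for n in (5, 10, 50, 100):
    print(f"n = {n:3d}:  sqrt(n/(n-1)) = {math.sqrt(n / (n - 1)):.4f}")
```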
The standard error on our average value, x̄, is given by the standard deviation of the
distribution (i.e. the standard error on an individual measurement) divided by the square
root of the number of measurements in our sample:
\sigma_m = \frac{\sigma}{\sqrt{n}} = \frac{s}{\sqrt{n-1}} .
From such a set of measurements, the quoted result would therefore be x̄ ± σm .
From the point of view of the design of an experiment involving repeated measurement,
the final precision therefore depends on both the precision of each measurement and the
number of measurements taken.
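As an illustration of these formulae, the following Python sketch (with a hypothetical set of readings) computes x̄, s, σ and σm for a small sample:

```python
import math

# Hypothetical repeated readings of the same quantity (e.g. a period in s).
readings = [9.78, 9.82, 9.80, 9.85, 9.79, 9.81, 9.83, 9.80]
n = len(readings)

# Sample mean, our best estimate of the true value X.
mean = sum(readings) / n

# s: standard deviation of the finite sample (rms deviation from the mean).
s = math.sqrt(sum((x - mean) ** 2 for x in readings) / n)

# Best estimate of sigma, applying the sqrt(n/(n-1)) correction.
sigma = math.sqrt(n / (n - 1)) * s

# Standard error on the mean: sigma/sqrt(n), which equals s/sqrt(n-1).
sigma_m = sigma / math.sqrt(n)

print(f"x = {mean:.3f} +/- {sigma_m:.3f}")
```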
The number of degrees of freedom, N_dof, is equal to the number of data points, N,
minus the number of free parameters in the fit. A fit to a straight line, y = mx + c, has
two free parameters, m and c.
For a good fit, the value of chi-squared will be approximately equal to the number of
degrees of freedom. As a rule of thumb, values of chi-squared in the range from half to
twice N_dof may be regarded as indicating an acceptable fit if N_dof is less than about
20. For larger N_dof, values of χ²/N_dof in the range 1 − √(8/N_dof) to 1 + √(8/N_dof)
would be acceptable. A good fit tells us that this particular set of data points could
reasonably have been measured from a physical system governed by this function. Note
that there may also be other functions or parameter values which give a good fit.
Values outside of these acceptable ranges usually imply that something is wrong, either
with the uncertainties σi, where an under-estimate would give a high chi-squared and an
over-estimate would give a low chi-squared, or with the assumed function f(x). Always
look at a plot of your data points and best-fit function to help you to interpret your
calculated value of chi-squared sensibly.
Often, the normalised value of chi-squared (χ²/N_dof) is used; this is referred to as the
reduced chi-squared, or χ²r.
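As a sketch of how χ² and N_dof are computed for a straight-line fit (the data points, uncertainties and best-fit parameters below are all hypothetical, and m and c are assumed to have been fitted already):

```python
# Hypothetical data points (x_i, y_i) with uncertainties sigma_i.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
sigmas = [0.2, 0.2, 0.2, 0.2, 0.2]
m, c = 2.0, 0.0  # assumed best-fit straight-line parameters

# chi-squared: sum over points of ((y_i - f(x_i)) / sigma_i)^2.
chi2 = sum(((y - (m * x + c)) / s) ** 2 for x, y, s in zip(xs, ys, sigmas))

# N_dof = number of points minus the two free parameters, m and c.
n_dof = len(xs) - 2

# Rule of thumb (N_dof < 20): chi2 between 0.5*N_dof and 2*N_dof is acceptable.
print(f"chi2 = {chi2:.2f}, N_dof = {n_dof}, reduced chi2 = {chi2 / n_dof:.2f}")
```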
Combining errors

1. When independent quantities are added or subtracted, the absolute errors are
combined in quadrature. For example, if f = x + y + z, then

\sigma_f^2 = \sigma_x^2 + \sigma_y^2 + \sigma_z^2 .
2. When independent quantities are multiplied or divided, then the fractional (or
percentage) errors are combined in quadrature. For example, if f = xy/z, then
\left(\frac{\sigma_f}{f}\right)^2 = \left(\frac{\sigma_x}{x}\right)^2 + \left(\frac{\sigma_y}{y}\right)^2 + \left(\frac{\sigma_z}{z}\right)^2 .
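For example, this Python sketch (with hypothetical measured values and errors) propagates the fractional errors in quadrature for f = xy/z:

```python
import math

# Hypothetical measured values and their standard errors.
x, sig_x = 4.0, 0.1
y, sig_y = 3.0, 0.2
z, sig_z = 2.0, 0.05

f = x * y / z

# For products and quotients, the fractional errors combine in quadrature.
frac = math.sqrt((sig_x / x) ** 2 + (sig_y / y) ** 2 + (sig_z / z) ** 2)
sig_f = f * frac

print(f"f = {f:.2f} +/- {sig_f:.2f}")
```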
The shape of the Poisson distribution is not symmetric; however, as the value of the
mean, μ, increases, it becomes very like the Gaussian distribution, and for values of μ
above about 10 it is reasonable to treat it as a (discrete) Gaussian distribution. In an
experiment where N counts are recorded, a statistical error of √N may be assigned, to
allow for the random fluctuations in the counting rate. So our best estimate of μ is
N ± √N.
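A short sketch of this counting-error rule, with a hypothetical number of counts; note how the fractional error √N/N = 1/√N shrinks as more counts are accumulated:

```python
import math

# Hypothetical counting experiment: N recorded counts give mu ~ N +/- sqrt(N).
N = 400
err = math.sqrt(N)
frac = err / N  # fractional error 1/sqrt(N)

print(f"mu = {N} +/- {err:.0f}  (fractional error {frac:.1%})")
```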