Treatment of Bias in Estimating Measurement Uncertainty: Gregory E. O'Donnell and D. Brynn Hibbert

Treatment of bias in estimating measurement uncertainty
Gregory E. O'Donnell(ab) and D. Brynn Hibbert*(b)
Received 24th September 2004, Accepted 15th February 2005
First published as an Advance Article on the web 8th March 2005
DOI: 10.1039/b414843f
Bias in an analytical measurement should be estimated and corrected for, but this is not always
done. As an alternative to correction, there are a number of methods that increase the expanded
uncertainty to take account of bias. All sensible combinations of correcting or enlarging
uncertainty for bias, whether considered significant or not, were modeled by a Latin hypercube
simulation of 125,000 iterations for a range of bias values. The fraction of results for which
the result and its expanded uncertainty contained the true value of a simulated test measurand was
used to assess the different methods. The strategy of estimating the bias and always correcting is
consistently the best throughout the range of biases. Expansion of the uncertainty when the
bias is considered significant is best done by SUMU_Max:
$U(C_{\text{test result}}) = k\,u_c(C_{\text{test result}}) + |\delta_{\text{run}}|$,
where $k$ is the coverage factor (= 2 for a 95% confidence interval), $u_c$ is the combined standard
uncertainty of the measurement and $\delta_{\text{run}}$ is the run bias.
Introduction
Uncertainties exist in every analytical measurement whether
they are acknowledged or not. A truly informed decision,
based on a measurement result, can only be made with
consideration of the measurement uncertainty. The measurement
uncertainty of a result is also important for the
comparison of subsamples and laboratories, and for the comparison
of the test result with a specification limit. The Guide to the
Expression of Uncertainty in Measurement (GUM) [1] gives
scientists a theoretical approach to the estimation of uncertainty
and has developed a formal definition of uncertainty
of measurement, in conjunction with ISO in ref. 2, as "a
parameter, associated with the result of a measurement,
that characterizes the dispersion of the values that could
reasonably be attributed to the measurand". The measurand
is the quantity being measured, and most analysts would
consider that their primary objective is to obtain a test result
that is as close to the true value of the measurand as possible.
An analyst produces a test result based on the analytical procedure
employed. The true value of a sample is unknown; one can only
estimate a value, and the range in which the true value might lie,
by good method validation and quality control procedures.
Therefore, the expression of the uncertainty of an analytical
result defines an interval where the true value of the measurand
is expected to lie with a stated level of confidence. The aim of
this paper is to illustrate how the treatment of bias influences
the uncertainty interval and the probability of the true value
occurring within such an interval.
Components of measurement uncertainty
Measurement uncertainty has been considered to have two
components, arising from random effects (random error) and
systematic effects. The former may be estimated by determining
the precision, measured as a standard deviation, of
measurement results under repeatability conditions. The trueness
of an observed value can be assessed by comparison to
an accepted reference value, preferably embodied in a certified
reference material (CRM). A correction to the observed value
is made when the bias is considered significant. The uncertainty
of such a correction is then included in the measurement
uncertainty calculation. In the alternative, bottom-up approach each
contribution to bias is determined as a variance, and these
are summed together with the precision estimation in an
uncertainty budget. The difference between the two approaches
is, therefore, in the treatment of bias.
Definitions of bias
The term bias has been defined [3] as the difference between the
expectation of the test results and an accepted reference value.
The term test result here refers to a single observation or the
mean of a set of observations of the particular measurand.
Bias has a currency for a particular time and applies to the
results of a single laboratory or a group of laboratories [4,5].
Thus, different types of bias become apparent, namely method
bias, laboratory bias and run bias.
Method bias is the difference between the expectation of test
results obtained from all laboratories using that method and
an accepted reference value [6]. The determination of method
bias is usually achieved by a number of laboratories participating
in a collaborative trial in which a certified reference
material (CRM) is analyzed. Ideally the CRM used should be
of the same matrix. When a CRM is not available, a
standard material purchased from a reputable supplier will
need to suffice. This will have a larger uncertainty than a CRM
and give traceability to that standard material only. Walker
and Lumley [7] have noted that the use of a CRM to establish a
reference value is equivalent to participating in an interlaboratory
or collaborative trial because the reference value of
a reputable CRM has usually been established by such a
study. The bias of a method is therefore obtained by comparing
the mean of all the means of the laboratories' test
results to the accepted reference value. This represents full
reproducibility conditions.
*b.hibbert@unsw.edu.au
PAPER www.rsc.org/analyst | The Analyst
This journal is © The Royal Society of Chemistry 2005. Analyst, 2005, 130, 721–729
Published on 08 March 2005. Downloaded by Universidade Tecnica de Lisboa (UTL) on 08/02/2014 16:44:44.
The laboratory bias is the difference between the expectation
of the test results from a particular laboratory and an
accepted reference value [3]. The expectation of the test results
has been interpreted in ref. 8 to be the mean of a sufficiently
large number of results. Hence, the laboratory bias is the
difference between the mean of the means of the test results
of individual analytical runs and the accepted reference
value. The individual analytical runs would be performed
at different times, with different analysts and, if possible,
on different instruments. This is sometimes referred to as
intermediate, intralaboratory or within-laboratory reproducibility
conditions [9].
It then follows that the run bias ($\delta_{\text{run}}$) would be defined as
the difference between the mean of a small number of test
results from a single run and the accepted reference value,
determined under repeatability conditions. Run bias is the result
of uncontrolled factors remaining constant for the period of
time of that batch of samples and having equal effects on all
samples in that batch [10]. With many analytical methods, the
method bias and the laboratory bias tend to be reasonably
consistent over time if they have been determined with enough
data; the run bias, however, will tend to vary between
batches of samples. An illustration of the different types of bias
can be seen in Fig. 1. If we take the viewpoint of a client of a
commercial laboratory who submits samples regularly to that
laboratory for analysis, it is logical that the client would
require these results to be comparable from run to run. The
only way this could be achieved is by recognizing that the run
bias is the bias of a particular analysis and, moreover, that it
should be estimated for each analysis batch. We note the use of
the symbol $\delta_{\text{run}}$ for the component of the total bias attributable
to the run in ref. 11 (as distinct from laboratory and method
components). This usage should not be confused with the
symbol here, which refers to the actual bias of a particular
analytical run. Some sectors of analytical chemistry take the
view that the bias of an analysis is the laboratory bias [7,12,13]
and use the laboratory bias to correct the measurement result
(if a correction is applied). This is an erroneous correction to
use in most cases. However, it may be argued that the use of
laboratory bias is better than no estimate at all. In practice,
such a correction is rarely applied. This is because the laboratory
bias distribution usually has a substantial standard deviation
and a correction is only deemed necessary when the bias is
considered statistically significant.
Decisions for dealing with run bias
There are four major decisions in dealing with bias and
these are schematically illustrated as a decision tree diagram
in Scheme 1. These are described here followed by a
computer simulation to determine the effectiveness of the
treatment to provide a measurement result and expanded
uncertainty that encompasses the true value with the stated
probability.
Fig. 1 Schematic of different levels of bias and accompanying dispersion of results. Run bias and repeatability standard deviation contribute to
the standard deviation of the laboratory result (intra-laboratory or intermediate reproducibility), and this and laboratory bias contribute to the
method reproducibility. The accepted reference value (ARV) is the result obtained with zero bias.
Decision 1: measure bias. The first decision to be made is
whether it is necessary to estimate bias at all. Many sectors
of analytical chemistry use empirical methods to give comparability
of analytical results when trueness cannot be
achieved by any practical means. The results of such analyses
are dependent on the method used and are not related to the
true value. The results are traceable to that method only.
They are, ipso facto, free of method bias but may indeed suffer
from laboratory and run bias. This bias can only be determined
if a reference material specific for that particular
method is available. If it is available and the bias is determined,
then one can proceed to determine if the bias is significant. If
the method of interest is not an empirical method but rather a
rational method [14], and the bias has not been measured, then
the method has not been adequately validated and should not
be used. If the bias has not been determined then the reported
amount concentration of the test material (the test result,
$C_{\text{Test Result}}$) is the value obtained by the measurement
procedure, $C_{\text{obs}}$ (or $\bar{C}_{\text{obs}}$ if the results of a number of analyses have
been averaged).
$$C_{\text{Test Result}} = C_{\text{obs}} \tag{1}$$
The combined uncertainty of the test result is then the
combined uncertainty of the measurement, which is estimated
by bottom-up methods explained in GUM [1].
$$u_c(C_{\text{Test Result}}) = u_c(C_{\text{obs}}) \tag{2}$$
Once the combined uncertainty has been calculated, the
expanded uncertainty ($U$) can be obtained by multiplying it
by the coverage factor $k$, which is often taken as 2 to give
approximately a 95% probability of the location of the true
value when the expanded uncertainty is added to and subtracted
from the mean.
$$U = k\,u_c(C_{\text{Test Result}}) \tag{3}$$
The run bias of a batch of analyses can be determined by
taking the average of a small number ($p$) of observations of a
CRM ($\bar{C}_{\text{obs,CRM}}$) and comparing it to its accepted reference
value ($C_{\text{CRM}}$). As the run bias is determined each time a
batch of analyses is performed, the uncertainty of
the run bias is equal to the combination of the uncertainty
of the analysis and the uncertainty of the reference material.
The run bias is thus determined on one day, in
a short period of time, by one analyst, on one instrument.
These are repeatability conditions and the run bias is equal to
$$\delta_{\text{run}} = \bar{C}_{\text{obs,CRM}} - C_{\text{CRM}} \tag{4}$$
and the uncertainty of the run bias is
$$u(\delta_{\text{run}}) = \sqrt{\left(\frac{s_{r,\text{CRM}}}{\sqrt{p}}\right)^{2} + u^{2}(C_{\text{CRM}})} \tag{5}$$
where $s_{r,\text{CRM}}$ is the repeatability standard deviation of the
analyses of the CRM and $u(C_{\text{CRM}})$ is the uncertainty of the
value of the CRM.
Scheme 1 Decisions (labeled in bold face) that can be made with respect to bias and its correction, or otherwise. Outcomes (labeled in italics) give
the reported test result ($C_{\text{Test Result}}$) and expanded uncertainty, for which $k$ is 2 for approximately 95% coverage. "U = Enlargement method" refers
to one of the strategies described in the text to expand the uncertainty when bias is not corrected for.
The value of $p$ used as the denominator in eqn. (5) is the
number of analyses of the CRM performed to determine the
run bias. While it would be best to establish the repeatability
on the day of the analysis, time and cost restraints usually do
not allow this to be done with much rigor. Therefore, the
repeatability determined in the method validation process is
often used as a reasonable estimate of $s_{r,\text{CRM}}$ in eqn. (5) above.
To increase the precision of the estimate it can be calculated
as a pooled standard deviation from a number of different
runs, and is written $s_{p,r,\text{CRM}}$. In a commercial working
laboratory where time and cost restraints restrict the number
of analyses of the CRM on the day to a single analysis,
the run bias is simply
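$$\delta_{\text{run}} = C_{\text{obs,CRM}} - C_{\text{CRM}} \tag{6}$$
and the uncertainty of the run bias is
$$u(\delta_{\text{run}}) = \sqrt{s_{p,r,\text{CRM}}^{2} + u^{2}(C_{\text{CRM}})} \tag{7}$$
with the $s_{p,r,\text{CRM}}$ term being taken from the method validation
process as above.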
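As an illustration of eqns. (4) to (7), the following is a minimal Python sketch rather than code from the original work. The function names and the three CRM replicate values are hypothetical; the values 4.0 and 1.0 are those later listed for the simulation model. Setting p = 1 in the second function reproduces eqn. (7).

```python
import math
from statistics import mean


def run_bias(crm_observations, crm_reference):
    """Eqn. (4)/(6): mean of the CRM observations minus the accepted reference value."""
    return mean(crm_observations) - crm_reference


def run_bias_uncertainty(s_r_crm, p, u_crm):
    """Eqn. (5): repeatability of the mean of p CRM results combined with u(C_CRM).

    With p = 1 and a pooled standard deviation this is eqn. (7).
    """
    return math.sqrt((s_r_crm / math.sqrt(p)) ** 2 + u_crm ** 2)


# Hypothetical replicate observations of a CRM whose reference value is 100.
observations = [103.0, 98.0, 105.0]
delta_run = run_bias(observations, crm_reference=100.0)

# Single-analysis case (p = 1) with s = 4.0 and u(C_CRM) = 1.0.
u_delta_single = run_bias_uncertainty(s_r_crm=4.0, p=1, u_crm=1.0)
# Averaging three CRM analyses reduces the repeatability contribution.
u_delta_triplicate = run_bias_uncertainty(s_r_crm=4.0, p=3, u_crm=1.0)
```

The sketch makes the role of p explicit: only the repeatability term shrinks with more CRM analyses, while the uncertainty of the reference value remains as a floor.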
Once bias is measured it is usual to test the magnitude
of the bias for statistical significance. A test is usually
performed because it has been decided to take some action
(correct the result or enlarge the uncertainty) when bias is
significantly different from zero, but to ignore non-significant
bias. Bias is generally considered significant when it is greater
than the expanded uncertainty of the measurement of the bias,
$$\delta_{\text{run}} > k\,u(\delta_{\text{run}}) \tag{8}$$
where $u(\delta_{\text{run}})$ is given by eqn. (5).
The magnitude of the uncertainty of the bias is largely
dependent on $p$, the number of repeats used to determine the
bias. Therefore the significance test should be performed at
$p - 1$ degrees of freedom with the null hypothesis $H_0$: $\delta_{\text{run}} = 0$,
and $k$ should take the value $t_{0.050,\,p-1}$. In practice $k$ is usually
taken as 2, which is the Student-t value for 60 degrees of
freedom and so approximates $t_{0.050,\infty}$.
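The significance test of eqn. (8) can be sketched in Python as below. This is an illustrative sketch only: the function names are hypothetical, the comparison is applied to the absolute value of the bias (an assumption made here so that negative biases are handled), and the small lookup table holds standard two-tailed Student-t values at the 0.05 level for a few degrees of freedom, falling back to k = 2 otherwise.

```python
# Standard two-tailed Student-t critical values at the 0.05 level,
# keyed by degrees of freedom (only a few entries, for illustration).
T_0050 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571, 10: 2.228, 60: 2.000}


def coverage_factor(p, default_k=2.0):
    """Return t(0.050, p - 1) when tabulated above, otherwise the usual k = 2."""
    return T_0050.get(p - 1, default_k)


def bias_is_significant(delta_run, u_delta_run, k=2.0):
    """Eqn. (8), applied to the magnitude of the bias (assumption of this sketch)."""
    return abs(delta_run) > k * u_delta_run


# With u(delta_run) = 4.12 and k = 2 the threshold is 8.24.
flag_small = bias_is_significant(8.0, 4.12)
flag_large = bias_is_significant(9.0, 4.12)
flag_negative = bias_is_significant(-9.0, 4.12)
```

The table shows why the choice of k matters when p is small: a bias estimated from three CRM analyses would need to exceed 4.303 standard uncertainties, not 2, to be judged significant at the same level.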
Decision 2: is the bias statistically significant? If the run bias
is determined to be not significant then there are three options:
to correct the observed value for the non-significant bias
anyway (Outcome 2.A), to account for the non-significant
bias by enlarging the uncertainty interval (Outcome 2.B), or to
report the test result without correction for bias and report
the uncertainty including the uncertainty associated with the
zero correction (Outcome 2.C). These options are explained
in more detail below. If a correction is made (Outcome 2.A),
then the test result is equal to the bias subtracted from the
observed value.
$$C_{\text{Test Result}} = C_{\text{obs,Sample}} - \delta_{\text{run}} \tag{9}$$
A standard deviation of the observed value is obtained from
a pooled standard deviation of a number of real test samples
which are known to show a positive result for the analyte of
interest. The sample results should be of a similar magnitude;
if not, all standard deviations should be converted to
relative standard deviations (RSD) and a pooled RSD
calculated. These samples can be analyzed on different days,
by different analysts, and on different equipment if possible.
They should also represent the variability of the matrix to be
analyzed, and possess the inhomogeneity of the samples that is
likely to be encountered in this analysis. Whether the
measurement uncertainty of the sample includes the uncertainty
of primary sampling, or only that of any subsampling that
occurred once the sample arrived in the laboratory, depends on
the definition of the measurand [15]. If different types of samples
are analyzed by the current method then a separate pooled
standard deviation for each type of sample may be appropriate.
Consequently, the combined uncertainty of the test
result can be expressed as
$$u_c(C_{\text{Test Result}}) = \sqrt{\left(\frac{s_{p,r,\text{Sample}}}{\sqrt{n}}\right)^{2} + u^{2}(\delta_{\text{run}})} \tag{10}$$
where $n$ represents the number of analyses of a test sample.
The expanded uncertainty can again be obtained by multiplying
the combined uncertainty by the coverage factor $k$.
$$U = k\,u_c(C_{\text{Test Result}}) \tag{11}$$
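Eqns. (9) to (11) chain together into a corrected result with its expanded uncertainty. The Python sketch below is illustrative only; the function names and the observed value of 105 with a measured run bias of 3 are hypothetical, while the standard deviations (6.0 for the sample, 4.0 and 1.0 for the CRM terms, n = p = 1) are those of the simulation model described later.

```python
import math


def corrected_result(c_obs_sample, delta_run):
    """Eqn. (9): subtract the measured run bias from the observed value."""
    return c_obs_sample - delta_run


def combined_uncertainty(s_p_r_sample, n, u_delta_run):
    """Eqn. (10): precision of the mean of n sample results combined with u(delta_run)."""
    return math.sqrt((s_p_r_sample / math.sqrt(n)) ** 2 + u_delta_run ** 2)


def expanded_uncertainty(u_c, k=2.0):
    """Eqn. (11): coverage factor times the combined standard uncertainty."""
    return k * u_c


u_delta = math.sqrt(4.0 ** 2 + 1.0 ** 2)            # eqn. (7), p = 1
c_test = corrected_result(105.0, 3.0)                # hypothetical observation and bias
u_c = combined_uncertainty(6.0, n=1, u_delta_run=u_delta)
U = expanded_uncertainty(u_c, k=2.0)
interval = (c_test - U, c_test + U)
```

With these inputs the combined uncertainty is the square root of 36 + 17 = 53, so the correction for bias adds its own uncertainty to the reported interval even though it removes the offset.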
When a test result is not corrected for a run bias (Decision
2.3), then the uncertainty range should be increased to take
into account the offset of the result (Outcome 2.B). The
uncertainty range should at least be increased to include the
true value with the stated probability. This can be done by
enlarging the uncertainty interval by one of the enlargement
methods. These methods are explained below, and are referred
to as the method of Barwick and Ellison, the RSSu method,
the RSSU method, the SUMU method and the SUMU_Max
method, and are expressed in eqns. (13)–(20) respectively.
When these methods are employed, the expanded uncertainty
is expressed in Scheme 1 in a general form as
$$U = \text{enlargement method} \tag{12}$$
A common occurrence of significant run bias is in the area
of trace analysis and is due to the loss of analyte recovered,
termed apparent recovery in a recent IUPAC recommendation [16].
Apparent recovery is the ratio of the observed value to
the expected reference value.
$$\bar{R} = \frac{\bar{C}_{\text{obs,CRM}}}{C_{\text{CRM}}} \tag{13}$$
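Barwick and Ellison [17,18] evaluated the uncertainty of the
apparent recovery, which is a quotient, as
$$u(\bar{R})' = \bar{R}\,\sqrt{\frac{1}{p}\left(\frac{s_{\text{obs,CRM}}}{\bar{C}_{\text{obs,CRM}}}\right)^{2} + \left(\frac{u(C_{\text{CRM}})}{C_{\text{CRM}}}\right)^{2}} \tag{14}$$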
They concluded that when the test result is not corrected for
the loss of the apparent recovery, then the uncertainty interval
should be increased according to
$$u(\bar{R}) = \sqrt{\left(\frac{|1-\bar{R}|}{k}\right)^{2} + u(\bar{R})'^{2}} \tag{15}$$
The uncertainty of the final test result can be determined by
combining the uncertainty of the measurement result with
the uncertainty of the apparent recovery, giving the expanded
uncertainty
$$U(C_{\text{Test Result}}) = k \times C_{\text{Test Result}}\,\sqrt{\frac{1}{n}\left(\frac{s_{\text{obs,Sample}}}{C_{\text{obs,Sample}}}\right)^{2} + \left(\frac{u(\bar{R})}{\bar{R}}\right)^{2}} \tag{16}$$
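The recovery-based enlargement of eqns. (13) to (16) can be sketched in Python as follows. This is a hedged illustration, not the authors' or Barwick and Ellison's code; the function and argument names are hypothetical, and the numerical inputs (a CRM of value 100 observed at 90) are invented for the example.

```python
import math


def recovery(c_obs_crm_mean, c_crm):
    """Eqn. (13): apparent recovery as observed over reference value."""
    return c_obs_crm_mean / c_crm


def u_recovery_measured(r, s_obs_crm, c_obs_crm_mean, p, u_crm, c_crm):
    """Eqn. (14): uncertainty of the measured recovery (a quotient)."""
    return r * math.sqrt((s_obs_crm / c_obs_crm_mean) ** 2 / p + (u_crm / c_crm) ** 2)


def u_recovery_enlarged(r, u_r_measured, k=2.0):
    """Eqn. (15): enlarge for an uncorrected recovery that differs from 1."""
    return math.sqrt((abs(1.0 - r) / k) ** 2 + u_r_measured ** 2)


def expanded_uncertainty_recovery(c_test, s_obs_sample, c_obs_sample, n, u_r, r, k=2.0):
    """Eqn. (16): expanded uncertainty combining sample precision and recovery."""
    rel = math.sqrt((s_obs_sample / c_obs_sample) ** 2 / n + (u_r / r) ** 2)
    return k * c_test * rel


# Hypothetical example: CRM value 100 +/- 1.0 observed once as 90 with s = 4.0.
r = recovery(90.0, 100.0)
u_r_meas = u_recovery_measured(r, s_obs_crm=4.0, c_obs_crm_mean=90.0, p=1, u_crm=1.0, c_crm=100.0)
u_r = u_recovery_enlarged(r, u_r_meas, k=2.0)
U = expanded_uncertainty_recovery(c_test=90.0, s_obs_sample=6.0, c_obs_sample=90.0,
                                  n=1, u_r=u_r, r=r, k=2.0)
```

Note that the shortfall in recovery enters eqn. (15) divided by k, so after multiplication by k in eqn. (16) it contributes roughly the size of the bias itself, added in quadrature.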
A number of other approaches to dealing with uncorrected
significant bias are used by different sectors of analytical
chemistry. The first approach adds the bias to the combined
uncertainty by using the root sum of squares (RSS)
method [19,20].
$$\text{RSSu}: \quad U(C_{\text{Test Result}}) = k\,\sqrt{u_c^{2}(C_{\text{Test Result}}) + \delta_{\text{run}}^{2}} \tag{17}$$
The second approach also uses the RSS method, this time to combine
the bias with the expanded uncertainty. This is the
method put forward by the APHA [21].
$$\text{RSSU}: \quad U(C_{\text{Test Result}}) = \sqrt{k^{2}\,u_c^{2}(C_{\text{Test Result}}) + \delta_{\text{run}}^{2}} \tag{18}$$
The next method simply adds the bias to the expanded
uncertainty [19,20].
$$\text{SUMU}: \quad U(C_{\text{Test Result}}) = \max\bigl(k\,u_c(C_{\text{Test Result}}) \pm \delta_{\text{run}},\ 0\bigr) \tag{19}$$
When the absolute value of the run bias is larger than the
expanded uncertainty then one of the enlarged uncertainties,
$U_{+}$ or $U_{-}$, will be a negative value. Under that circumstance,
the negative limit is equated to zero.
The SUMU approach has the disadvantage of producing
asymmetrical uncertainty limits. This drawback can be
overcome by using the SUMU_Max method. This method
adds the absolute value of the run bias to the expanded
uncertainty [19,20].
$$\text{SUMU}_{\text{Max}}: \quad U(C_{\text{Test Result}}) = k\,u_c(C_{\text{Test Result}}) + |\delta_{\text{run}}| \tag{20}$$
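The four enlargement rules of eqns. (17) to (20) differ only in how the bias is combined with the combined uncertainty, which a short Python sketch makes easy to compare. The function names are hypothetical. For SUMU, the assignment of signs to the upper and lower limits is an assumption of this sketch (bias taken as observed minus reference, so a positive bias widens the lower limit); the text above does not fix that convention.

```python
import math


def rss_small_u(u_c, delta_run, k=2.0):
    """RSSu, eqn. (17): bias combined with the standard uncertainty, then expanded."""
    return k * math.sqrt(u_c ** 2 + delta_run ** 2)


def rss_big_u(u_c, delta_run, k=2.0):
    """RSSU, eqn. (18): bias combined with the expanded uncertainty."""
    return math.sqrt((k * u_c) ** 2 + delta_run ** 2)


def sumu(u_c, delta_run, k=2.0):
    """SUMU, eqn. (19): asymmetric limits (U_plus, U_minus), negative limits set to zero.

    Sign convention is an assumption: the interval is taken as
    [C - U_minus, C + U_plus] with delta_run = observed - reference.
    """
    u_plus = max(k * u_c - delta_run, 0.0)
    u_minus = max(k * u_c + delta_run, 0.0)
    return u_plus, u_minus


def sumu_max(u_c, delta_run, k=2.0):
    """SUMU_Max, eqn. (20): symmetric limits using the absolute bias."""
    return k * u_c + abs(delta_run)


# Simple illustrative numbers: u_c = 3, bias = 4, k = 2.
u_rss_small = rss_small_u(3.0, 4.0)
u_rss_big = rss_big_u(3.0, 4.0)
u_sumu = sumu(3.0, 4.0)
u_sumu_clipped = sumu(3.0, 8.0)     # bias larger than k * u_c
u_sumu_max = sumu_max(3.0, -4.0)
```

For any non-zero bias RSSU gives the narrowest symmetric interval of the three symmetric rules, which is consistent with it being the first to lose coverage as the bias grows.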
If the outcome of Decision 2.3 was not to account for bias in
the test result (Outcome 2.C), then the test result is equal to the
observed value. When the bias is measured there is an
uncertainty in that measurement. If the measurement reveals
that the bias is not significant, then there is still uncertainty in
that finding and a possibility that bias does exist. If one
nevertheless proceeds as if the bias does not exist (often called a
zero correction), the uncertainty of that decision
still needs to be taken into consideration. Ignoring
bias and its uncertainty in this way yields a test result that is
erroneous and an uncertainty interval that may not include the
true value.
Decision 3: is the method fit for purpose? If the run bias is
significant then one should always answer the question, "Is the
method fit for purpose?" That is to say, does the method give a
result to the level of quality expected of it? If the answer to this
question is no, then the method should not be used until
improvements are made which enable the method to attain
the level of quality expected. If the method is fit for purpose
and does have significant run bias then one needs to decide whether to
correct the result for the bias, to account for the bias in the
uncertainty by using an enlargement method, or to present the
result uncorrected. It may be thought a rare occurrence
that a test result is reported uncorrected when it has known
significant bias. However, in many multi-analyte analyses, not all of
the analytes may be recovered adequately enough to
be considered to have non-significant bias, in accordance with
the criterion expressed in eqn. (8), and they may indeed not be
corrected for that bias. The guidelines set out for that type of
analysis may well acknowledge that for a number of analytes,
significant bias is present and acceptable. However, it is rarely
stipulated that a correction should be performed [13]. If one
corrects the test result for the bias, then the uncertainty of the
test result is the combination of the uncertainty of the
measurement and the uncertainty of the bias, as seen in
eqns. (9)–(11). If the test result is not corrected for bias, the
bias can be accounted for in the uncertainty. The interval can
be enlarged by one of the enlargement methods so that it has a
higher probability of containing the true value. If the result
is not corrected for significant run bias then the result is in
appreciable error.
In the next section we shall describe Monte Carlo simula-
tions that investigate which of the methods do lead to results
and confidence intervals that are consistent with their stated
aims.
Experimental
Many of the decisions on the treatment of bias have a
discontinuity at the point at which the measured bias is
deemed to be significant, and thus have no analytical solution
to the probability distribution. Maroto [20] has shown the use of
Monte Carlo simulations to be a very powerful way to test
the methods. We have adopted the same overall approach, but
use Latin Hypercubes for speed, and cover a wider range of
decision combinations and treatments. In brief, the true value
of a measurand, the measurement repeatability, the uncertainty
of the quantity value of the CRM, and the number of
measurements of the CRM are chosen for the experiments.
Then simulations, assuming normality of distribution of
$C_{\text{CRM}}$ and $C_{\text{obs}}$, are performed for a range of run biases of
the measurement, in which the quantity value of a CRM
is measured with $\mu = C_{\text{CRM}} + \delta_{\text{run}}$ and $\sigma = u(\delta_{\text{run}})$, as given
by eqns. (4) and (5), and then the test result is obtained in a
second simulation with a particular value of $\delta_{\text{run}}$ being
applied in the prescribed treatment (correction or expansion
of uncertainty). This yields a distribution of test results that
can be compared with the true value, and the fraction of
expanded uncertainty ranges that encompass the true value is
observed. If the treatments are well designed, this fraction
should be near 95% for any value of the bias. Scheme 2 shows
this approach for the case in which there is significant bias for
the particular value simulated and then the simulated $C_{\text{obs}}$ is
either corrected for this bias or the uncertainty is expanded
using the value of the bias.
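The simulation procedure just described can be sketched in pure Python. This is an illustrative reconstruction under stated assumptions, not the @RISK model used in this work: the Latin hypercube sampler is a simple hand-written one, the sample observation is assumed to be drawn from a normal distribution centred on the true value plus the run bias with the sample repeatability as its standard deviation, the significance test is applied to the absolute measured bias, only the correction treatments are covered, and 20,000 iterations are used for speed. All function and parameter names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def latin_hypercube_normals(n, dims, rng):
    """Return n points in `dims` dimensions; each coordinate is a stratified N(0, 1) sample."""
    std_normal = NormalDist()
    eps = 1e-12  # keeps inv_cdf away from exactly 0 or 1
    columns = []
    for _ in range(dims):
        # One uniform draw inside each of n equal-probability strata, in random order.
        u = [(i + rng.random()) / n for i in range(n)]
        rng.shuffle(u)
        columns.append([std_normal.inv_cdf(min(max(x, eps), 1.0 - eps)) for x in u])
    return list(zip(*columns))


def coverage(delta_run, treatment, n_iter=20000, seed=1,
             c_true=100.0, s_sample=6.0, s_crm=4.0, u_crm=1.0, k=2.0):
    """Fraction of simulated results whose interval (result +/- U) contains c_true."""
    if treatment not in ("always", "never", "if_significant"):
        raise ValueError("unknown treatment: " + treatment)
    rng = random.Random(seed)
    u_delta = math.hypot(s_crm, u_crm)      # eqn. (7), p = 1
    u_c = math.hypot(s_sample, u_delta)     # eqn. (10), n = 1
    hits = 0
    for z_bias, z_obs in latin_hypercube_normals(n_iter, 2, rng):
        d_hat = delta_run + u_delta * z_bias             # measured run bias
        c_obs = c_true + delta_run + s_sample * z_obs    # biased sample observation
        significant = abs(d_hat) > k * u_delta           # eqn. (8) on the magnitude
        if treatment == "always" or (treatment == "if_significant" and significant):
            result = c_obs - d_hat                       # eqn. (9)
        else:
            result = c_obs                               # eqn. (1)
        if abs(result - c_true) <= k * u_c:
            hits += 1
    return hits / n_iter
```

Calling `coverage(b, "always")` for a range of biases b should stay close to the nominal 95.45% for k = 2, while `coverage(b, "never")` falls away as b grows; the enlargement methods could be added as further treatments that change the interval half-width instead of the result.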
The simulations were performed on a WinTel personal
computer running @RISK 3.5 (Palisade software, USA) as an
add-in to Microsoft Excel (Office 2000, Microsoft Inc, Seattle,
USA). The characteristics of the system were chosen to give
non-negligible contributions from measurement uncertainty
and the uncertainty of the CRM. A range of bias values
was chosen to span non-significant to significant bias. Table 1
contains the parameters of the model.
For cases in which there is an analytical solution, i.e. when
the procedure is continuous across the level of significance (e.g.
always correct, never correct), the simulations were checked by
a calculation involving the normal distribution. The fraction of
measurements of a value $C_{\text{true}}$ with run bias $\delta_{\text{run}}$ and
combined uncertainty $u_c$ for which the result plus and minus
an expanded uncertainty $U$ includes the true value is given by
$$\int_{C_{\text{true}}-U}^{C_{\text{true}}+U} f\left(x,\ C_{\text{true}}+\delta_{\text{run}},\ u_c\right)\,\mathrm{d}x \tag{21}$$
where $f(x,\mu,\sigma)$ is the probability density function of the normal
distribution with mean $\mu$ and standard deviation $\sigma$.
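The integral in eqn. (21) is a difference of two normal cumulative probabilities, so it can be evaluated directly. The sketch below uses Python's standard library; the function name is hypothetical, and the argument is called `sigma` rather than `u_c` so that the spread of the reported result and the half-width U can be set independently (an assumption of this sketch, useful for the uncorrected case).

```python
from statistics import NormalDist


def analytical_coverage(delta_run, sigma, U, c_true=100.0):
    """Eqn. (21): normal probability mass between c_true - U and c_true + U."""
    dist = NormalDist(mu=c_true + delta_run, sigma=sigma)
    return dist.cdf(c_true + U) - dist.cdf(c_true - U)


u_c = 53 ** 0.5   # sqrt(6.0**2 + 4.0**2 + 1.0**2) from the model parameters

# Always correct: no residual bias, spread u_c, half-width 2 * u_c.
always = analytical_coverage(0.0, sigma=u_c, U=2 * u_c)

# Never correct at zero bias: spread of C_obs alone (6.0), same half-width.
never_zero = analytical_coverage(0.0, sigma=6.0, U=2 * u_c)

# Never correct with a run bias of 20: the interval is no longer centred.
never_biased = analytical_coverage(20.0, sigma=6.0, U=2 * u_c)
```

The first value is the familiar two-sigma probability of about 95.45%, and the second shows why an uncorrected result at zero bias is over-covered when the interval still carries the uncertainty of the bias.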
Results
Five scenarios have been considered that are most likely to
occur.
1. Always correct for bias
The most obvious scenario would be to always correct for
bias whether it is significant or not. This is equivalent to a
combination of outcomes 2.A and 3.A on the decision chart in
Scheme 1. The observed value is always corrected for the bias
using eqn. (9) and the expanded uncertainty interval (eqn. (11))
is the coverage factor multiplied by the combined uncertainty
(eqn. (10)). For a coverage factor of 2 the probability of the
true value occurring within the uncertainty limits is 95.45%
across all levels of bias and the resulting mean percentage from
the simulation can be seen in Fig. 2 as a line which closely
approximates this value.
2. Never correct for bias
The antithesis of the first position would be to never correct
for bias. This would give an outcome as seen in Scheme 1 as a
combination of 2.C and 3.C. The test result is simply the
observed value as expressed in eqn. (1). The uncertainty of the
test result is equivalent to the uncertainty of the measurement
as expressed in eqn. (2) and eqn. (3). The probability of the true
value occurring inside the interval of the expanded uncertainty
starts at 98.5%, higher than the expected value of 95.45%. This
is because the uncertainty of the test result (eqn. (10) and
eqn. (11)), as shown by the curve, includes the uncertainty of
bias but the distribution of the observed value, $C_{\text{obs}}$, which
the uncertainty interval is encompassing, does not. This
means that the uncertainty interval is slightly larger than a
corresponding 95.45% interval of the distribution of $C_{\text{obs}}$.
As the bias increases the probability quickly decreases, as
shown in Fig. 2. This is not surprising, for as the bias increases
and the uncertainty interval remains constant, there is
increasingly less chance of the true value occurring inside
the interval.
Scheme 2 Detection of significant bias (a) followed by correction (b) or enlargement of the expanded uncertainty (c).
Table 1 Parameters of the simulation model
Simulation parameter                                        Value
Accepted reference value (true value) of sample             100
$\delta_{\text{run}}$                                       30 in increments of 1
$u(C_{\text{CRM}})$                                         1.0
$s_{p,r,\text{CRM}}$                                        4.0
$p$ (for bias estimation)                                   1
$u(\delta_{\text{run}})$                                    $\sqrt{4^{2}+1^{2}} = 4.12$
$s_{p,r,\text{sample}}$                                     6.0
$n$ (for sample estimation)                                 1
$k$                                                         2
Number of iterations of Latin Hypercube simulation
for each $\delta_{\text{run}}$ and sample                   125,000
3. Non-significant bias uncorrected/significant bias corrected
The intermediate positions between these two extreme cases
would be to leave the observed value uncorrected for
non-significant bias and to correct the observed value, or enlarge
the uncertainty, for significant bias. The former situation
would be a combination of the outcomes seen in Scheme 1
as 2.C and 3.A: to disregard any non-significant bias and to
correct the test result when the bias becomes significant. The
results of the simulation show a curve that starts just below
97.5%, which again is above the expected value of 95.45%
probability. This again is due to the uncertainty of the test
result including the uncertainty of the bias and the distribution
of the observed value not including it. From this value of
97.5% the curve then decreases in probability until significant
bias is encountered. The observed value is then corrected
for the bias, and the probability of the uncertainty interval
containing the true value then increases until it reaches the
expected probability of 95.45% (see Fig. 2).
4. Non-significant bias uncorrected/significant bias enlarged
A combination of outcomes 2.C and 3.B illustrates the situation
when non-significant bias is disregarded and the uncertainty
range is enlarged to compensate for the bias when it
becomes significant (see Fig. 3). Again the curves started
higher than expected, because the uncertainty of the test
result included the uncertainty of the bias when the distribution
of the observed value did not. The method of SUMU gave
the most unacceptable probability of all the enlargement
methods. It shows a curve that starts at 97.5% and slowly
reduces in probability to zero where the significant bias ratio,
that is, the ratio of the bias to the expanded uncertainty of the bias,
was marginally higher than 2. The RSSU and Barwick and
Ellison methods start at zero bias with a probability of 98.6%,
decrease to 91% at significant bias, and continue to decline
at slightly different rates to approximately 80% and 83%
respectively at a significant bias ratio of 2.0. The SUMU_Max
enlargement method mirrors the RSSu curve and both curves
decrease slightly to 92% probability just after significant bias is
encountered. The curves then increase, due to the interval being
enlarged, to the expected probability at a significant bias ratio of
1.75, above which the curves depart from each other, with
RSSu increasing to approximately 2% above SUMU_Max at
a significant bias ratio of 3.0. The RSSu and SUMU_Max curves at
high bias, above a significant bias ratio of 1.75, overestimate the
uncertainty interval slightly.
5. Always enlarge for bias
The other scenario not yet considered is to enlarge the uncertainty
range to account for the bias at all levels of bias. With
this scenario, as expected, all the curves start at a probability
above the expected value, except the SUMU method, which
performs poorly as seen in Fig. 4. From an expected value of
95.45% at zero bias, the probability quickly decreases. The
RSSU and the Barwick and Ellison curves mirror each other,
with the latter curve giving a slightly higher probability. They
both gradually decrease to 92% and 93.5% at significant bias
and continue to decrease to 80% and 83.5% at 2.0 × significant
bias respectively. The RSSu method at zero bias starts at
approximately 3% above the expected value and then declines
to approximately 1% above the expected value at significant
bias, and as bias increases the uncertainty interval is
overestimated to a small extent. The SUMU_Max method starts at
zero bias at approximately 4% above the expected value and
declines slightly as bias increases to approximately 97.7%. It is
not unexpected that enlarging the uncertainty interval when
the bias is non-significant would result in a higher probability
of the true value being contained inside the limits, and this can
be seen clearly in Fig. 4.
Fig. 2 To correct or not correct for bias. This graph shows the
probability of the true value occurring within the uncertainty limits
when the bias is corrected (circles), when it is not corrected for at all
(triangles), and when it is corrected for only when the bias is significant
(squares). Lines are drawn indicating the level at which the bias is
considered significant and the expected probability for k = 2 (95.45%).
Fig. 3 Enlarge when significant. This graph shows the probability of
the true value occurring within the uncertainty interval when the
uncertainty interval is increased by an enlargement method when bias
is significant. The enlargement methods are SUMU (diamonds), RSSU
(circles), Barwick & Ellison (triangles), SUMU_Max (crosses), RSSu
(squares). Lines are drawn indicating the level at which the bias is
considered significant and the expected probability for k = 2 (95.45%).
Discussion
The estimation of the uncertainty of an analytical test result
needs only to consider the precision of the whole analytical
procedure and the uncertainty of the correction for bias. These
are best estimated separately as repeatabilities of a sample and
CRM as in eqns. (5), (10) and (11). As accreditation bodies
and method providers already stipulate that a bias estimate
(usually as recovery) be determined with each run, then it is a
simple matter to utilise this data without a great deal of extra
effort needed by the laboratory. Estimates of uncertainty
based on intermediate reproducibility can give an adequate
estimate of the uncertainty of the test result; however, an
estimate of the bias and its uncertainty still needs to be
determined to verify whether the bias is significant or not. Estimates
based on full reproducibility, such as from interlaboratory
collaborative trials, should only be considered when it is clear
that all participants are following the same analytical method
rigorously in unbiased situations.
The significance of bias can be determined by the com-
parison in eqn. (8). Bias is often hidden by the significance test
being based on reproducibility data. This is because between-run
or between-laboratory reproducibility is perforce larger
than within-run repeatability and would thus, in accordance
with eqn. (8), incorrectly render the bias non-significant. A
second question may be raised about the level of significance.
Testing at the 95% level means that the probability of the null
hypothesis (bias is zero) has to fall to 5% before the bias is
considered significant. In situations where there is likely to be a
small bias there may be a case to reject the null hypothesis at a
much greater probability (say 10 or 20%).
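Both points can be illustrated with a short sketch. The form of eqn. (8) is not reproduced in this section, so the sketch assumes that the test compares the absolute bias with a multiple of its standard uncertainty, using a two-sided normal critical value; the function name `bias_is_significant` and all numerical values are hypothetical.

```python
from statistics import NormalDist


def bias_is_significant(d_run, u_bias, alpha=0.05):
    """Two-sided test of the null hypothesis that the bias is zero.

    Assumed criterion: |d_run| / u_bias exceeds the normal critical value
    for significance level alpha (about 1.96 for alpha = 0.05).
    """
    critical = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return abs(d_run) / u_bias > critical


d_run = 0.3              # hypothetical run bias
u_repeatability = 0.1    # hypothetical uncertainty from within-run repeatability
u_reproducibility = 0.2  # hypothetical, larger uncertainty from reproducibility

# The same bias is flagged with the repeatability-based uncertainty ...
print(bias_is_significant(d_run, u_repeatability))
# ... but hidden when the larger reproducibility-based value is used,
print(bias_is_significant(d_run, u_reproducibility))
# unless the null hypothesis is rejected at a less stringent level.
print(bias_is_significant(d_run, u_reproducibility, alpha=0.20))
```

With these illustrative numbers the ratio of bias to uncertainty falls from 3.0 to 1.5 when reproducibility data are used, which is enough to change the outcome of the test at the 5% level but not at the 20% level.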
Metrological traceability and therefore comparability of test
results within a laboratory and between laboratories can only
be achieved if one recognizes that the run bias is the bias of the
analysis as performed and a traceable correction for that bias
is carried out. The run bias of an analysis tends to be due to
uncontrolled factors that remain constant for the period of the
analysis and have equal effects on all samples in that batch. If
the bias varies from sample to sample then it could be said that
the method is not very robust. Efforts should be made, usually
in the method validation stage, to rid the method of
uncontrolled factors. This is usually done by the introduction
of an internal standard or surrogate. If this cannot be achieved
then the fitness for purpose of the method needs to be
addressed. If bias is seen to vary in repeated analysis of a
sample, this can be attributed to the sample or the method. If
the within-sample bias is due to the sample, this is usually
associated with the heterogeneity of the sample. Efforts should
be made to employ a sampling strategy that results in a
homogeneous sample. If this is not possible then the number of
replicate analyses of the sample should be increased to give a
more reliable result. If the within-sample bias is due to the
method, then this variability is accounted for in the repeatability
of the sample and again can be reduced by increasing
the number of replicates of the sample. The simulation reveals
that the best way to deal with bias is to always correct for it. If
correction for bias is not allowed, then enlarging the
uncertainty by the SUMU_Max method gives the best compromise
across all levels of bias.
Conclusions
Testing an average of the bias under reproducibility conditions
for significance may render the bias to be acceptable, that is
non-significant, when in fact on many occasions it is not. The
bias should be included in the test result by making a
correction for the bias based on the magnitude of the effect
in that particular run, and the uncertainty of the correction
should be included in the uncertainty of the test result. If
the observed value is not corrected for the bias then the
uncertainty of the test result should be enlarged by one of
the enlargement methods to at least include the true value.
The best enlargement method to use, as illustrated in this
paper by simulations, is the SUMU_Max method, which adds
the absolute value of the bias to the expanded uncertainty. Not
allowing for bias, either by correction of the test result or by an
enlargement of the uncertainty of the result, and not including
the uncertainty of the estimation of bias, leaves both the test
result and the uncertainty invalid.
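The two ways of reporting a result recommended above can be sketched as follows. The sketch assumes an additive run bias defined as observed value minus reference value, and assumes that the uncertainty of the correction is combined with that of the result as a root sum of squares; the function names and the numerical values are hypothetical.

```python
import math


def report_corrected(observed, d_run, u_result, u_bias, k=2.0):
    """Correct the result for the run bias and include the uncertainty
    of the correction in the combined standard uncertainty."""
    corrected = observed - d_run
    u_c = math.sqrt(u_result ** 2 + u_bias ** 2)
    return corrected, k * u_c


def report_sumu_max(observed, d_run, u_c, k=2.0):
    """Leave the result uncorrected and enlarge the expanded uncertainty:
    U = k * u_c + |d_run| (SUMU_Max)."""
    return observed, k * u_c + abs(d_run)


# Hypothetical run: observed 10.0, run bias 0.3, u(result) 0.2, u(bias) 0.1
value, U = report_corrected(10.0, 0.3, 0.2, 0.1)
print(f"corrected:  {value:.2f} +/- {U:.2f}")

value, U = report_sumu_max(10.0, 0.3, 0.2)
print(f"SUMU_Max:   {value:.2f} +/- {U:.2f}")
```

For these numbers the corrected result carries the narrower interval, while the SUMU_Max interval is wider because the whole of the absolute bias is added to the expanded uncertainty.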
Gregory E. O'Donnell (a, b) and D. Brynn Hibbert* (b)
(a) WorkCover NSW, Laboratory Services Unit, 5a Pioneer Ave,
Thornleigh, NSW, 2120, Australia.
E-mail: greg.odonnell@workcover.nsw.gov.au; Fax: +61 2 9980 6849;
Tel: +61 2 9473 4005
(b) School of Chemistry, University of New South Wales, Sydney, NSW,
2052, Australia. E-mail: b.hibbert@unsw.edu.au; Fax: +61 2 9385 4713;
Tel: +61 2 9385 6141
References
1 ISO, BIPM, IEC, IFCC, IUPAC, IUPAP, and OIML, Guide to
the Expression of Uncertainty in Measurement, ISO, Geneva,
Switzerland, 1993.
2 ISO, International vocabulary of basic and general terms in
metrology, ISO, Geneva, Switzerland, 1993.
3 ISO.3534-1, Statistics - Vocabulary and symbols - Part 1, ISO,
Geneva, Switzerland, 1993.
4 M. Thompson and R. Wood, Pure Appl. Chem., 1995, 67, 649.
5 M. Thompson, S. L. R. Ellison and R. Wood, Pure Appl. Chem.,
2002, 74, 835.
Fig. 4 Always enlarge. This graph shows the probability of the true
value occurring within the uncertainty interval when the uncertainty
interval is increased by an enlargement method over all levels of bias.
The enlargement methods are SUMU (diamonds), RSSU (circles),
Barwick & Ellison (triangles), SUMU_Max (crosses), RSSu (squares).
Lines are drawn indicating the level at which the bias is considered
significant and the expected probability for k = 2 (95.45%).
6 ISO.5725-1, Accuracy (trueness and precision) of measurement
methods and results Part 1: General principles and definitions, ISO,
Geneva, Switzerland, 1998.
7 R. Walker and I. Lumley, Trends Anal. Chem., 1999, 18, 594.
8 D. L. Massart, B. G. M. Vandeginste, L. M. C. Buydens,
S. D. Jong, P. J. Lewi and J. Smeyers-Verbeke, Handbook of
Chemometrics and Qualimetrics: Part A, Elsevier Science B.V.,
Amsterdam, 1997.
9 E. Hund, D. L. Massart and J. Smeyers-Verbeke, Trends Anal.
Chem., 2001, 20, 394.
10 R. Song, E. Kennedy and D. Bartley, Anal. Chem., 2001, 73, 310.
11 Analytical Methods Committee, Analyst, 1995, 120, 2303.
12 A. Maroto, R. Boque, J. Riu and F. X. Rius, Trends Anal. Chem.,
1999, 18, 577.
13 A. Ambrus, Accredit. Qual. Assur., 2004, 9, 288.
14 M. Thompson, J. Environ. Monit., 1999, 1, 19N.
15 M. Ramsey, VAM Bulletin, 2004, autumn 9.
16 D. T. Burns, K. Danzer and A. Townshend, Pure Appl. Chem.,
2002, 74, 2201.
17 V. J. Barwick and S. L. R. Ellison, Analyst, 1999, 124, 981.
18 V. J. Barwick and S. L. R. Ellison, Accredit. Qual. Assur., 2000, 5, 47.
19 S. D. Phillips, K. R. Eberhardt and B. Parry, J. Res. Natl. Inst.
Stand. Technol., 1997, 102, 577.
20 A. Maroto, R. Boque, J. Riu and F. X. Rius, Accredit. Qual.
Assur., 2002, 7, 90.
21 M. A. H. Franson, American Public Health Association. Standard
methods for examination of water and wastewater, Washington, DC,
1989.