Chemometrics Lecture Note
2 Cr. hrs
shimeles.addisu1@gmail.com,
shimeles.addisu@ju.edu.et
Brief Introduction to Chemometrics
1. Analytical problems
Analytical chemists face both qualitative and quantitative
problems.
Modern analytical chemistry is predominantly a quantitative
science: a quantitative result is much more valuable than a
qualitative one.
For Example:
It may be useful to have detected boron in a water
sample, but it is much more useful to be able to say how
much boron is present.
Errors in quantitative analysis
Measurements invariably involve errors and
uncertainties.
It is impossible to perform a chemical analysis
that is totally free of errors or uncertainties.
We can only hope to minimize errors and
estimate their size with acceptable accuracy.
Types of error
Experimental scientists make a fundamental distinction
between three types of error.
These are known as gross, random and systematic
errors.
Figure: Absolute error in the micro-Kjeldahl determination of nitrogen.
Systematic (or determinate) error:
causes the mean of a data set to differ from the accepted value.
Sources of Systematic Errors
Gross errors
Gross errors are so serious that there is no alternative to
abandoning the experiment and making a completely fresh
start.
Gross errors lead to outliers, results that appear to differ
significantly from all other data in a set of replicate
measurements.
Examples:
a complete instrument breakdown,
accidentally dropping or discarding a crucial sample,
discovering during the course of the experiment that a pure
reagent was badly contaminated.
Reproducibility and Repeatability
Suppose a student is asked to do five replicate titrations in
rapid succession.
The same set of solutions and the same glassware would be
used throughout, the same preparation of indicator would be
added to each titration flask, and the temperature, humidity
and other laboratory conditions would remain much the same.
In such cases the precision measured would be the within-
run precision: this is called the repeatability.
Suppose, however, that for some reason the titrations were
performed by different staff on five different occasions in different
laboratories, using different pieces of glassware and different
batches of indicator.
It would not be surprising to find a greater spread of the results in
this case.
The resulting data would reflect the between-run precision of the
method, i.e. its reproducibility.
Statistics of repeated measurements
Mean and standard deviation
Consider the results of five replicate titrations done by four
students.
A more useful measure, which utilizes all the values, is the
standard deviation, s, which is defined as:
s = √[ Σi (xi − x̄)² / (n − 1) ]
Example:
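The calculation of x̄ and s can be sketched in Python with the standard library. The titration values below are purely illustrative, since the slide's own table of results is not reproduced in the text:

```python
import statistics

# Illustrative replicate titration results (mL); hypothetical values,
# not those from the slide's table.
results = [10.08, 10.11, 10.09, 10.10, 10.12]

mean = statistics.mean(results)   # x-bar = sum(xi) / n
s = statistics.stdev(results)     # sample standard deviation, (n - 1) denominator

print(f"mean = {mean:.3f} mL, s = {s:.4f} mL")
```

Note that `statistics.stdev` uses the (n − 1) denominator from the definition above; `statistics.pstdev` would use n instead.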
The distribution of repeated measurements
Let’s consider the following table.
Suppose it shows 50 replicate determinations of the level of nitrate
ion in a particular water specimen.
The above table can be summarized in a frequency table (Table
below).
This table shows that the value 0.46 μg mL-1 appears once, the
value 0.47 μg mL-1 appears three times, and so on.
The distribution of the results can most easily be appreciated by
drawing a histogram.
The mathematical model usually used is the normal or Gaussian
distribution, which is described by the equation:
y = exp[ −(x − μ)² / 2σ² ] / ( σ√(2π) )
where x is the measured value, y the frequency with which it occurs,
μ the population mean and σ the population standard deviation.
The normal distribution has the following properties
This would mean that, if the nitrate ion concentrations (in μg mL-1)
are normally distributed:
about 68% should lie in the range 0.483–0.517,
about 95% in the range 0.467–0.533 and
99.7% in the range 0.450–0.550
In fact 33 out of the 50 results (66%) lie between 0.483 and 0.517,
49 (98%) between 0.467 and 0.533, and all the results between
0.450 and 0.550, so the agreement with theory is fairly good.
Figure: Properties of the normal distribution: (i) approximately 68% of values lie
within ±1s of the mean; (ii) approximately 95% of values lie within ±2s of the mean;
(iii) approximately 99.7% of values lie within ±3s of the mean.
For a normal distribution with known mean, μ, and standard
deviation, σ, the exact proportion of values which lie within any
interval can be found from tables, provided that the values are
first standardized so as to give z-values.
This is done by expressing any value of x in terms of its
deviation from the mean in units of the standard deviation, σ.
That is:
z = (x − μ) / σ
Example:
If repeated values of a titration are normally distributed with mean
10.15 mL and standard deviation 0.02 mL, find the proportion of
measurements which lie between 10.12 mL and 10.20 mL.
Solution: z = (10.12 − 10.15)/0.02 = −1.5 and z = (10.20 − 10.15)/0.02 = 2.5.
From the table, F(2.5) = 0.9938 and F(−1.5) = 1 − F(1.5) = 1 − 0.9332 = 0.0668.
The required proportion is therefore 0.9938 − 0.0668 = 0.9270.
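The worked example can be checked with Python's `statistics.NormalDist`, which performs the z-standardization internally:

```python
from statistics import NormalDist

# Titration results: mean 10.15 mL, standard deviation 0.02 mL
dist = NormalDist(mu=10.15, sigma=0.02)

# Proportion of measurements between 10.12 mL and 10.20 mL:
# F(z = 2.5) - F(z = -1.5) = 0.9938 - 0.0668 = 0.9270
proportion = dist.cdf(10.20) - dist.cdf(10.12)
print(f"{proportion:.4f}")   # → 0.9270
```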
The sampling distribution of the mean, showing the range within which
95% of sample means lie.
(The exact value 1.96 has been used in this equation rather than the
approximate value, 2. The table can be used to check that the proportion
of values between z = −1.96 and z = 1.96 is indeed 0.95.)
Example:
Calculate the 95% and 99% confidence limits of the mean for the
nitrate ion concentration measurements in Table below
Solution:
From the ranges quoted earlier (which correspond to x̄ ± s, x̄ ± 2s and
x̄ ± 3s), the mean and standard deviation of the 50 results are 0.500 and
0.0165 μg mL-1 respectively, so:
95% confidence limits: 0.500 ± 1.96 × 0.0165/√50 = 0.500 ± 0.0046 μg mL-1
99% confidence limits: 0.500 ± 2.58 × 0.0165/√50 = 0.500 ± 0.0060 μg mL-1
Confidence limits of the mean for small samples
For small samples the confidence limits of the mean are given by
x̄ ± t(n−1) · s/√n, where the value of t depends on the number of
degrees of freedom, n − 1, and on the confidence level required.
Example:
The sodium ion level in a urine specimen was measured using an
ion-selective electrode. The following values were obtained: 102, 97,
99, 98, 101, 106 mM. What are the 95% and 99% confidence limits
for the sodium ion concentration?
Solution:
The mean and standard deviation of these values are 100.5 mM
and 3.27 mM respectively.
There are six measurements and therefore five degrees of
freedom.
From the t-table, the value of t5 for calculating the 95% confidence
limits is 2.57.
The 95% confidence limits of the mean are therefore:
x̄ ± t·s/√n = 100.5 ± 2.57 × 3.27/√6 = 100.5 ± 3.4 mM
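The whole calculation can be sketched in Python; the t-value 2.57 is taken from the t-table, as above:

```python
import math
import statistics

values = [102, 97, 99, 98, 101, 106]   # sodium ion levels, mM

mean = statistics.mean(values)         # 100.5 mM
s = statistics.stdev(values)           # 3.27 mM
n = len(values)

t95 = 2.57   # t-table value for n - 1 = 5 degrees of freedom, P = 0.05
half_width = t95 * s / math.sqrt(n)
print(f"95% CI: {mean:.1f} +/- {half_width:.1f} mM")
```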
Presentation of results
Since errors are involved in all quantitative results, a result should be
reported together with an estimate of its error.
Less commonly, the standard error of the mean is sometimes quoted
instead of the standard deviation.
In practice it is usual to quote as significant figures
all the digits which are certain, plus the first uncertain
one.
Significance tests
One of the most important properties of an analytical method
is that it should be free from bias, so that the value it gives for
the amount of the analyte should be the true value.
This property may be tested by applying the method to a
standard test portion containing a known amount of analyte.
Even if there are no systematic errors, random errors make it
most unlikely that the measured amount will exactly equal the
known amount in the standard.
To decide whether the difference between the measured and
standard amounts can be accounted for by random errors a
statistical test known as a significance test can be used.
Comparison of an experimental mean with a known value
In making a significance test we are testing the truth of a
hypothesis which is known as a null hypothesis, often
denoted by H0.
The term null is used to imply that there is no difference
between the observed and known values apart from that due
to random variation.
The null hypothesis is rejected if the probability of such a
difference occurring by chance is less than 1 in 20 (i.e. 0.05 or
5%).
If |t| (the absolute calculated value of t) exceeds a certain critical
value then the null hypothesis is rejected.
The critical value of t for a given significance level can be found
from the t-table.
Example:
In a new method for determining selenourea in water the following
values were obtained for tap water samples spiked with 50 ng mL-1
of selenourea:
50.4, 50.7, 49.1, 49.0, 51.1 ng mL-1
Is there any evidence of systematic error?
Solution:
The mean of these values is 50.06 and the standard deviation is 0.956.
Adopting the null hypothesis that there is no systematic error, i.e. μ = 50:
t = (x̄ − μ)√n / s = (50.06 − 50) × √5 / 0.956 = 0.14
Since |t| = 0.14 is well below the critical value t4 = 2.78 (P = 0.05),
the null hypothesis is retained: there is no evidence of systematic error.
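A minimal Python sketch of this one-sample t calculation:

```python
import math
import statistics

values = [50.4, 50.7, 49.1, 49.0, 51.1]   # ng/mL, spiked at 50 ng/mL
mu0 = 50.0

mean = statistics.mean(values)   # 50.06
s = statistics.stdev(values)     # 0.956
t = (mean - mu0) * math.sqrt(len(values)) / s
print(f"t = {t:.2f}")            # → t = 0.14, well below t4 = 2.78 (P = 0.05)
```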
If these standard deviations are not significantly
different (the F-test described later provides a method of testing
this assumption), a pooled estimate, s, of the standard deviation
can first be calculated using the equation:
s² = [ (n1 − 1)s1² + (n2 − 1)s2² ] / (n1 + n2 − 2)
To decide whether the difference between the two
means is significant, i.e. to test the null
hypothesis, H0: μ1 = μ2, the statistic t is then
calculated from:
t = (x̄1 − x̄2) / [ s √(1/n1 + 1/n2) ]
which has n1 + n2 − 2 degrees of freedom.
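A sketch of the pooled two-sample t calculation; the two data sets below are hypothetical, since the slides do not include a worked data table at this point:

```python
import math
import statistics

def pooled_t(x, y):
    """Two-sample t with pooled standard deviation (assumes similar variances)."""
    n1, n2 = len(x), len(y)
    s1, s2 = statistics.stdev(x), statistics.stdev(y)
    # pooled variance: s^2 = [(n1-1)s1^2 + (n2-1)s2^2] / (n1 + n2 - 2)
    s = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    t = (statistics.mean(x) - statistics.mean(y)) / (s * math.sqrt(1/n1 + 1/n2))
    return t, n1 + n2 - 2   # t statistic and its degrees of freedom

# Hypothetical replicate results from two methods (not from the slides):
t, dof = pooled_t([10.1, 10.3, 10.2], [10.4, 10.5, 10.6])
print(f"t = {t:.2f} with {dof} d.f.")
```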
Paired t-test
Two methods of analysis are compared by applying
both of them to the same set of test materials, which
contain different amounts of analyte.
To test whether n paired results are drawn from the
same population, that is H0: μd = 0, we calculate the
t-statistic from the equation:
t = x̄d √n / sd
where x̄d and sd are the mean and standard deviation of the
differences, d, between the paired values.
Table below shows the results of determining the
paracetamol concentration (% m/m) in tablets by two
different methods.
Tablets from ten different batches were analyzed to see
whether the results obtained by the two methods differed.
Each batch is thus characterized by a pair of
measurements, one value for each method.
Differences between the tablets, differences between the
methods and random measurement errors contribute to
the variation between the measurements.
Test whether there is a significant difference between the
results obtained by the two methods in Table above:
The differences between the pairs of values (subtracting
the second value from the first value in each case) are:
These differences have mean x̄d and standard deviation sd.
Substituting in the equation above, with n = 10, gives t = 0.88.
The critical value is t9 = 2.26 (P = 0.05).
Since the calculated value of |t| is less than this, the null
hypothesis is retained: the methods do not give
significantly different results for the paracetamol
concentration.
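The paired t statistic t = x̄d√n/sd can be sketched as follows; the paired values are hypothetical, as the slide's paracetamol table itself is not reproduced in the text:

```python
import math
import statistics

def paired_t(a, b):
    """Paired t-test statistic: t = d_bar * sqrt(n) / s_d."""
    d = [x - y for x, y in zip(a, b)]
    return statistics.mean(d) * math.sqrt(len(d)) / statistics.stdev(d)

# Hypothetical paired results from two methods on the same batches:
method1 = [84.6, 84.1, 84.7, 84.3, 84.5]
method2 = [84.3, 84.2, 84.5, 84.0, 84.4]
t = paired_t(method1, method2)
# compare |t| with the critical t(n-1) value at P = 0.05
```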
F-test for the comparison of standard deviations
This is a test designed to indicate whether there is a significant
difference between two methods based on their standard
deviations.
For example, the results of two different analytical methods
or results from two different laboratories.
The F-test uses the ratio of the two sample variances, i.e. the
ratio of the squares of the standard deviations. It is calculated
from the equation:
F = s1² / s2²
where s1² > s2², so that F ≥ 1.
There are two different numbers of degrees of freedom, ν1 = n1 − 1 and ν2 = n2 − 1.
If the calculated value of F exceeds a certain critical value
(obtained from tables), then the null hypothesis is rejected.
Example:
A proposed method for the determination of the chemical oxygen
demand of wastewater was compared with the standard (mercury
salt) method. The following results were obtained for a sewage
effluent sample:
The critical value is F7,7 = 3.787 (P = 0.05), where the first and
second subscripts indicate the number of degrees of freedom of
the numerator and denominator respectively.
Since the calculated value of F (4.8) exceeds this, the null
hypothesis of equal variances is rejected.
The variance of the standard method is significantly greater than
that of the proposed method at the 5% probability level, i.e. the
proposed method is more precise.
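A minimal sketch of the F calculation; the replicate values are hypothetical, since the slide's chemical-oxygen-demand data table is not reproduced here:

```python
import statistics

def f_ratio(x, y):
    """F statistic: larger sample variance over smaller, so F >= 1."""
    v1, v2 = statistics.variance(x), statistics.variance(y)
    return max(v1, v2) / min(v1, v2)

# Hypothetical replicate results for two methods:
F = f_ratio([72, 75, 70, 74, 71], [72, 73, 72, 71, 72])
# compare with the tabulated critical F for (n1 - 1, n2 - 1) degrees of freedom
```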
Outliers
Every experimentalist is familiar with the situation in which one
(or possibly more than one) measurement in a set of results
appears to differ unexpectedly from the others.
In some cases the suspect result may be attributed to a human
error.
For example, suppose the following results were given for a titration:
12.12, 12.15, 12.13, 13.14, 12.12 mL
The fourth value is suspiciously large compared with the others.
Should such suspect values be retained, or should they be
rejected as outliers?
Example:
The following data were reported for a chloride analysis of a
sample: 103, 106, 107 and 114 meq/L. One value appears suspect.
Determine whether it can be ascribed to accidental error at the
95% confidence level.
Solution:
The suspect result is 114 meq/L.
Q = |114 − 107| / (114 − 103) = 7/11 = 0.64
The tabulated value for four observations is 0.829. Since the
calculated Q is less than the tabulated Q, the suspected number
should not be rejected.
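The Q calculation can be sketched for this data set:

```python
def dixon_q(data):
    """Dixon's Q for the most extreme value: gap to nearest neighbour / range."""
    xs = sorted(data)
    q_low = (xs[1] - xs[0]) / (xs[-1] - xs[0])     # test the lowest value
    q_high = (xs[-1] - xs[-2]) / (xs[-1] - xs[0])  # test the highest value
    return max(q_low, q_high)

Q = dixon_q([103, 106, 107, 114])
print(f"Q = {Q:.2f}")   # → Q = 0.64, below Q_crit = 0.829 for n = 4, so retain
```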
Grubbs’ test
This test compares the deviation of the suspect value from the
sample mean with the standard deviation of the sample. The
suspect value is naturally the value that is furthest away from the
mean.
In order to use Grubbs’ test for an outlier, i.e. to test the null
hypothesis, H0, that all measurements come from the same
population, the statistic G is calculated:
G = |suspect value − x̄| / s
Note that the mean, x̄, and the standard deviation, s, are calculated
with the suspect value included, as H0 presumes that there are no
outliers.
The critical values of G for P = 0.05 are given in the G-table.
If the calculated value of G exceeds the critical value, the
suspect value is rejected.
The values given are for a two-sided test, which is appropriate
when it is not known in advance at which extreme of the data
range an outlier may occur.
From the G-table, for sample size 4, the critical value of G is 1.481 (P = 0.05).
Since the calculated value of G does not exceed 1.481, the suspect
measurement should be retained.
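Assuming the same chloride data as in the Q-test example (an assumption, as the slide does not restate the values), the G calculation can be sketched as:

```python
import statistics

def grubbs_g(data):
    """Grubbs' statistic: |suspect - mean| / s, suspect = value furthest from mean."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)   # suspect value included, as H0 requires
    return max(abs(x - mean) for x in data) / s

G = grubbs_g([103, 106, 107, 114])
print(f"G = {G:.2f}")   # → G = 1.40, below the critical 1.481 for n = 4, so retain
```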
Analysis of variance
In analytical work there are often more than two means
to be compared.
Let’s consider the following situations:
comparing the mean concentration of protein in solution for
samples stored under different conditions
comparing the mean results obtained for the concentration
of an analyte by several different methods
comparing the mean titration results obtained by several
different experimentalists using the same apparatus
In all these examples there are two possible sources of
variation.
a. due to the random error in measurement.
b. due to what is known as a controlled or fixed-
effect factor.
For the examples above, the controlled factors are:
the conditions under which the solution was stored
the method of analysis used and
the experimentalist carrying out the titration
Analysis of variance (ANOVA) is a statistical
technique which can be used to separate and estimate
the different causes of variation.
For the examples above, it can be used to separate any
variation which is caused by changing the controlled
factor from the variation due to random error.
It can test whether altering the controlled factor leads to
a significant difference between the mean values
obtained.
ANOVA can also be used in situations where there is
more than one source of random variation.
Comparison of several means
Table below shows the results obtained in an investigation
into the stability of a fluorescent reagent stored under
different conditions.
Within-sample variation
For each sample the sum of the squared deviations from the
sample mean, Σi (xi − x̄)², is calculated:
Sample A (values 102, 100, 101; mean 101): deviations 1, −1, 0; squared deviations 1, 1, 0; sum = 2
Sample B (values 101, 101, 104; mean 102): deviations −1, −1, 2; squared deviations 1, 1, 4; sum = 6
Sample C (values 97, 95, 99; mean 97): deviations 0, −2, 2; squared deviations 0, 4, 4; sum = 8
Sample D (values 90, 92, 94; mean 92): deviations −2, 0, 2; squared deviations 4, 0, 4; sum = 8
Within-sample estimate of variance (mean square) = (2 + 6 + 8 + 8)/8 = 3
Degrees of freedom = 12 − 4 = 8
Between-sample variation (sample means 101, 102, 97, 92; grand mean 98):
deviations from the grand mean 3, 4, −1, −6; squared deviations 9, 16, 1, 36; sum = 62
Within-sample mean square = 3 with 8 d.f
Between-sample mean square = 62 with 3 d.f
If the null hypothesis is correct, these two estimates of σ02
should not differ significantly.
If it is incorrect, the between-sample estimate of σ02 will be
greater than the within-sample estimate because of between-
sample variation.
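The whole one-way ANOVA calculation for the four samples can be sketched as:

```python
import statistics

# The four samples (A-D) from the fluorescence-stability table
samples = [
    [102, 100, 101],   # A
    [101, 101, 104],   # B
    [97, 95, 99],      # C
    [90, 92, 94],      # D
]

h = len(samples)       # number of samples
n = len(samples[0])    # replicates per sample
N = h * n

# Within-sample mean square: pooled sum of squared deviations / (N - h)
within_ss = sum(sum((x - statistics.mean(s))**2 for x in s) for s in samples)
ms_within = within_ss / (N - h)      # = 24 / 8 = 3

# Between-sample mean square: n * sum of squared deviations of the
# sample means about the grand mean / (h - 1)
means = [statistics.mean(s) for s in samples]
grand = statistics.mean(means)
ms_between = n * sum((m - grand)**2 for m in means) / (h - 1)   # = 3 * 62/3 = 62

F = ms_between / ms_within           # compare with the critical F(3, 8)
print(ms_within, ms_between, round(F, 1))   # → 3.0 62.0 20.7
```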
The null hypothesis is that there is no difference in
reliability.
Assuming that the workers use the laboratory for an
equal length of time, we would thus expect the same
number of breakages by each worker.
Since the total number of breakages is 61, the expected
number of breakages per worker is 61/4 = 15.25.
Solution
The quality of analytical measurements
Sampling
In most analyses we rely on chemical samples to give us
information about a whole object.
Unless the sampling stages of an analysis are considered
carefully, the statistical methods may be invalidated.
For example it is not possible to analyze all the water in a stream
for a toxic pollutant.
The sample studied must be taken in a way that ensures as far as
possible that it is truly representative of the whole object.
To illustrate some aspects of sampling we can study the
situation in which we have a large batch of tablets and wish to
obtain an estimate for the mean weight of a tablet.
Rather than weigh all the tablets, we take a few of them (say
ten) and weigh each one.
In this example the batch of tablets forms the population and the
ten weighed tablets form a sample from this population.
Therefore,
Sampling is the process by which a sample population is
reduced in size to an amount of homogeneous material that can
be conveniently handled in the laboratory and whose
composition is representative of the population.
Obtaining A Representative Sample
The sampling process must ensure that the items chosen are
representative of the bulk of material or population.
The items chosen for analysis are often called sampling units or
sampling increments.
For example, our population might be 100 coins, and we might wish to
know the average concentration of lead in the collection of coins.
Our sample is to be composed of five coins.
Each coin is a sampling unit or an increment.
In the statistical sense, the sample corresponds to several small parts
taken from different parts of the bulk material.
To avoid confusion, chemists usually call the collection of sampling
units or increments the gross sample.
Separation and estimation of variances using ANOVA
Table below shows the results of the purity testing of a
barrelful of sodium chloride.
Five sample increments (A–E) were taken from different
parts of the barrel chosen at random, and four replicate
analyses were performed on each sample.
There are two possible sources of variation:
① due to the random error in the measurement of purity,
given by the measurement variance, σ02
② due to real variations in the sodium chloride purity at
different points in the barrel, given by the sampling
variance, σ12.
A test should be carried out to see whether σ12 differs
significantly from 0.
This is done by comparing the within- and between-
sample mean squares: if they do not differ significantly
then σ1² = 0 and both mean squares estimate σ0².
The one-way ANOVA shows that the between-sample mean
square is greater than the within-sample mean square; the F-test
shows that this difference is highly significant, i.e. σ1²
does differ significantly from 0.
Introduction to quality control methods
If a laboratory is to produce analytical results of a quality
that is acceptable to its clients, and allow it to perform well
in proficiency tests or method performance studies,
its results should show excellent consistency from
day to day.
Checking for such consistency is complicated by the
inevitable occurrence of random errors, so several statistical
techniques have been developed to show whether or not
time-dependent trends are occurring in the results,
alongside the random errors.
These are referred to as quality control methods.
Analytical quality control (AQC): refers to all those
processes and procedures designed to ensure that the
results of laboratory analysis are consistent,
comparable, accurate and within specified limits of
precision.
Suppose that a laboratory uses a chromatographic
method for determining the level of a pesticide in fruits.
The results may be used to determine whether a large
batch of fruit is acceptable or not, and their quality is
thus of great importance.
The performance of the method will be checked at
regular intervals by applying it, with a small number of
replicate analyses, to a standard reference material
(SRM), in which the pesticide level is certified by a
regulatory authority.
A standard reference material is a highly purified
compound that is well characterized.
The quality and purity of reference standards are crucial
to determining scientifically valid results for many
analytical methods.
Alternatively an internal quality control (IQC) standard of
known composition and high stability can be used.
IQC is a valuable technique to ensure that the results
produced from any assay are reliable and reproducible.
IQC ensures that factors determining the magnitude of
uncertainty do not change during the routine use of an
analytical method over long periods of time.
IQC is conducted by inserting one or more control
materials into every run of analysis.
The control materials are treated by an analytical
procedure identical to that performed on the test
materials.
In practice z = 1.96 is often rounded to 2 for 95% confidence
limits and z = 2.97 is rounded to 3 for 99.7% confidence.
Figure: Shewhart chart for mean values.
These factors take values depending on the sample size, n.
The relevant equations are:
The ARL can be reduced significantly by using a
different type of control chart, a CUSUM (cumulative
sum) chart.
A CUSUM chart is designed to detect small shifts in the
process mean: it plots the cumulative sum of the
deviations from a target value for individual
measurements or subgroup means.
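A minimal sketch of the CUSUM idea; the measurements and target value below are illustrative only:

```python
# Cumulative sum of deviations from a target value.
def cusum(measurements, target):
    total, sums = 0.0, []
    for x in measurements:
        total += x - target   # running sum of deviations from the target
        sums.append(total)
    return sums

# A steady drift above the target shows up as a steadily rising CUSUM:
print(cusum([80, 79, 81, 82, 83, 83], target=80))
```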
5. Calibration methods: regression and correlation
Calibration graphs in instrumental analysis
The analyst takes a series of samples in which the
concentration of the analyte is known.
These calibration standards are measured in the
analytical instrument under the same conditions as
those subsequently used for the test (the ‘unknown’)
samples.
The results are used to plot a calibration graph, which is
then used to determine the analyte concentrations in
test samples by interpolation.
Figure: Calibration procedure in instrumental analysis, using a reagent
blank and a set of five standards: ○ calibration points; ● test sample.
This general procedure raises several important statistical
questions:
Is the calibration graph linear? If it is a curve, what is the form of the
curve?
Since each of the points on the calibration graph is subject to errors,
what is the best straight line (or curve) through these points?
Assuming that the calibration plot is actually linear, what are the errors
and confidence limits for the slope and the intercept of the line?
When the calibration plot is used for the analysis of a test sample,
what are the errors and confidence limits for the determined
concentration?
What is the limit of detection of the method? That is, what is the least
concentration of the analyte that can be detected with a
predetermined level of confidence?
The calibration curve is always plotted with the instrument
signals on the vertical (y) axis and the standard
concentrations on the horizontal (x) axis.
This is because it is assumed that:
all the errors are in the y-values and that the standard
concentrations (x-values) are error-free.
the y-values obtained have a normal (Gaussian) error
distribution, and that the magnitude of the random errors
in the y-values is independent of the analyte
concentration.
The straight-line calibration graphs take the algebraic form:
y = bx + a
where b is the slope of the line and a its intercept on the y-axis.
The product–moment correlation coefficient
Is the calibration graph linear?
A common method of estimating how well the
experimental points fit a straight line is to calculate the
product–moment correlation coefficient, r, often referred
to simply as the correlation coefficient:
r = Σi (xi − x̄)(yi − ȳ) / √[ Σi (xi − x̄)² · Σi (yi − ȳ)² ]
The numerator divided by n, i.e. Σi (xi − x̄)(yi − ȳ)/n, is called
the covariance of the two variables x and y.
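The calculation of r can be sketched as follows; the calibration data are hypothetical:

```python
import math

def pearson_r(x, y):
    """Product-moment correlation coefficient r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx)**2 for xi in x)
    syy = sum((yi - my)**2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical calibration data: concentration vs. instrument signal
conc = [0, 2, 4, 6, 8, 10]
signal = [0.1, 3.8, 8.0, 12.1, 16.0, 20.2]
r = pearson_r(conc, signal)   # close to +1 for a nearly linear calibration
```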
Figure: The product–moment
correlation coefficient, r.
A value of r = −1 describes perfect negative correlation (negative
slope), whereas r = +1 describes perfect positive
correlation (positive slope).
The line of regression of y on x
The least-squares straight line y = bx + a is given by:
slope: b = Σi (xi − x̄)(yi − ȳ) / Σi (xi − x̄)²
intercept: a = ȳ − b x̄
Example:
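A minimal sketch of the least-squares slope and intercept; the demo data are exactly linear (y = 2x + 1), so the answer is known in advance:

```python
def least_squares(x, y):
    """Slope b and intercept a of the regression line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx)**2 for xi in x)
    a = my - b * mx
    return b, a

# Exactly linear demo data: y = 2x + 1, so b = 2 and a = 1
b, a = least_squares([0, 1, 2, 3], [1, 3, 5, 7])
print(b, a)   # → 2.0 1.0
```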
Errors in the slope and Intercept of the regression line
The line of regression calculated is used to estimate:
the concentrations of test samples by interpolation
the limit of detection of the analytical procedure
The random errors in the values for the slope and intercept
are therefore important and should be calculated.
They are estimated from the random errors in the y-direction,
measured by the statistic:
s(y/x) = √[ Σi (yi − ŷi)² / (n − 2) ]
where the ŷi are the fitted y-values on the calculated regression
line (so the yi − ŷi are the y-residuals).
Figure: The y-residuals of a regression line.
Provided with a value for s(y/x) we can now calculate sb and
sa, the standard deviations for the slope (b) and the
intercept (a). These are given by:
sb = s(y/x) / √[ Σi (xi − x̄)² ]
sa = s(y/x) √[ Σi xi² / ( n Σi (xi − x̄)² ) ]
Example
Calculate the standard deviations and confidence limits of the
slope and intercept of the regression line calculated in above
example.
Limits of detection
The limit of detection (LOD) of an analyte is the
concentration which gives an instrument signal (y)
significantly different from the ‘blank’ or ‘background’
signal.
LOD can be calculated as the analyte concentration
giving a signal equal to the blank signal, yB, plus three
standard deviations of the blank, sB:
Curve A represents the normal
distribution of measured values
of the blank signal.
Consider a point y = P towards the upper
edge of this distribution; we might claim that a signal greater
than this is unlikely to be due to the blank.
a signal less than P would be assumed to indicate a blank
sample.
For a sample giving an average signal P, 50% of the observed
signals will be less than this, since the signal will have a normal
distribution (of the same shape as that for the blank) extending
below P (curve B).
The probability of concluding that this sample does not differ
from the blank when in fact it does is therefore 50%.
Point P, which has been called the limit of decision, is thus
unsatisfactory as a limit of detection, since it solves the first of
the problems mentioned above, but not the second.
A more suitable point is at y = Q, such that Q is twice as far as P
from yB.
If the distance from yB to Q is 3.28 times the
standard deviation of the blank, sB, then the probability of each
of the two kinds of error occurring is only 5%.
If the distance from yB to Q is only 3sB, the probability of each
error is about 7%: many analysts would consider that this is a
reasonable definition of a limit of detection.
Limit of quantitation (LOQ)
The limit of quantitation (or limit of determination) is
regarded as the lower limit for precise quantitative
measurements, as opposed to qualitative detection. It is
commonly defined as the analyte concentration giving a signal:
LOQ = yB + 10sB
The method of standard additions
The complication of matching the matrix of the standards
to that of the sample can be avoided by conducting the
standardization in the sample. This is known as the
method of standard additions.
The signal is plotted on the y-axis and the x-axis is
graduated in terms of the amounts of analyte added.
The regression line is calculated in the normal way, but
space is provided for it to be extrapolated to the point on
the x-axis at which y = 0.
This negative intercept on the x-axis corresponds to the
amount of the analyte in the test sample.
Weighted regression lines
In any calibration analysis the overall random error of
the result will arise from a combination of the error
contributions from the several stages of the analysis.
When the y-direction error in a regression calculation
gets larger as the concentration increases, a weighted
regression should be used.
In a weighted linear regression, each xy-pair’s
contribution to the regression line is proportional to the
precision of yi (i.e. inversely proportional to its variance):
the more precise the value of y, the greater
its contribution to the regression.
If the individual points are denoted by (x1, y1), (x2, y2), etc. as
usual, and the corresponding standard deviations are s1, s2,
etc., then the individual weights, w1, w2, etc., are given by:
wi = si⁻² / ( Σi si⁻² / n )
The slope and the intercept of the weighted regression line are
then calculated using the weighted centroid coordinates:
ȳw = Σi wi yi / n  and  x̄w = Σi wi xi / n
Example:
Calculate the unweighted and weighted regression lines for
the following calibration data. For each line calculate also
the concentrations of test samples with absorbances of
0.100 and 0.600.
The slope and the intercept of the unweighted regression line are
calculated as follows:
slope: b = Σi (xi − x̄)(yi − ȳ) / Σi (xi − x̄)²
intercept: a = ȳ − b x̄
Slope = 0.0725
Intercept = 0.0133
The weighted regression line can be calculated as:
In the absence of a suitable computer program it is usual to
set up a table as follows.
ȳw = 0.1558/6 = 0.0260
x̄w = 1.372/6 = 0.229
These values for aw and bw can be used to show that absorbance
values of 0.100 and 0.600 correspond to concentrations of 1.23
and 8.01 μgmL-1 respectively.
The median: initial data analysis
Mean or average is used as the ‘measure of central
tendency’ or ‘measure of location’ of a set of results
when the (symmetrical) normal distribution is assumed,
but in non-parametric statistics, the median is usually
used instead.
The sign test
The sign test is amongst the simplest of all non-
parametric statistical methods
The sign test is used to test hypotheses concerning the
median of a continuous distribution.
Let’s use the symbol θ to represent the median of the
distribution.
Remember that in the case of a normal distribution the
mean is equal to the median and so the sign test can
be used to test hypotheses concerning the mean of a
normal distribution.
① Form null and alternative hypotheses and choose a degree of
confidence.
The null hypothesis is that the median of our population
distribution is equal to a specified median value, and the
alternative hypothesis is that it is different.
The chosen degree of confidence determines the significance
level, which will be used when deciding whether or not to
reject the null hypothesis.
② Compute a test statistic.
We count how many of the sample values are greater than or
less than the specified median value.
We then compute the probability of getting this number (or a
more unlikely one) if the specified median was correct.
This probability is our test statistic.
③ Compare the test statistic to a critical value.
For the sign test, our critical value is the chosen significance
level.
Therefore, if the probability is less than or equal to the
significance level, then we reject the null hypothesis.
Example:
Professor A wanted to test whether the contaminant levels in the drug
were below the government guideline of 50. Suppose that
Professor A has now produced a new batch of the drug and
noticed that the new contaminant level data do not appear to be
normally distributed. In this case, Professor A would need to use
a non-parametric hypothesis test.
The contaminant level data resulting from Professor A’s new
production of the drug are as follows:
45.344, 48.655, 36.199, 54.881, 49.287, 49.336, 53.492,
40.702, 46.318, 31.303
To perform the sign test, we first form our null and alternative
hypotheses.
We are interested in whether the sample median is less than 50.
Our null hypothesis is that the median is not less than 50.
The alternative hypothesis is that the median is less than 50.
Next, we compute our test statistic.
To do this, we determine whether each sample value is
greater than or less than the specified median of 50.
Denoting values greater than 50 by “+” and those less
than 50 by “-”, we have: (- - - + - - + - - -). Counting
these up, we have two pluses and eight minuses.
If the null hypothesis were true (i.e. the median was not
less than 50), what would be the probability that we would
get this result (or a more unlikely one) by chance?
To answer this question, we must introduce the binomial
distribution.
In probability theory, the binomial distribution indicates
the probability of the number of “successes” in n trials,
each of which has a probability p of “success”.
The binomial distribution specifies that the probability of r
successes in n trials, with a probability p of success in a single
trial, is:
P(r) = nCr · p^r · (1 − p)^(n − r)
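For the example above (two "+" signs out of n = 10, with p = 0.5 under the null hypothesis), the sign-test probability can be computed directly from the binomial formula:

```python
from math import comb

# Probability of 2 or fewer pluses out of 10 under H0 (p = 0.5):
# the one-tailed sign-test probability.
n, r = 10, 2
p = sum(comb(n, k) for k in range(r + 1)) / 2**n
print(f"P = {p:.4f}")   # → P = 0.0547, greater than 0.05, so retain H0
```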
The Wald–Wolfowitz runs test
The Wald–Wolfowitz runs test is a non-parametric method used
when the assumptions required by a parametric test do not hold.
In some instances we are interested not merely in whether
observations generate positive or negative signs, but also in
whether these signs occur in a random sequence.
If a straight line is a good fit to a set of calibration points,
positive and negative residuals will occur more or less at
random.
If the fit is poor, however, we may instead see a sequence of
positive signs, followed by a sequence of negative signs, and then
another sequence of positive signs. Such sequences of identical
signs are technically known as runs.
The Wald–Wolfowitz method tests whether the number of runs
is small enough for the null hypothesis of a random
distribution of signs to be rejected.
The number of runs in the experimental data is compared
with the numbers in the Wald–Wolfowitz runs test table, which
refers to the P = 0.05 probability level.
The table is entered by using the appropriate values for N, the
number of +ve signs, and M, the number of -ve signs.
If the experimental number of runs is smaller than the
tabulated value, then the null hypothesis can be rejected.
The Wilcoxon signed rank test
A disadvantage of the sign test is that it uses so little of the
information provided.
Important advances were made by Wilcoxon, and his
signed rank test has several applications. Its
mechanism is best illustrated by an example.
Example:
The blood lead levels (in pg mL-1) of seven children
were found to be 104, 79, 98, 150, 87, 136 and 101.
Could such data come from a population, assumed to
be symmetrical, with a median (mean) of 95 pg mL-1?
The first step of the Wilcoxon signed rank test is to calculate
the differences between each measurement and the reference
value.
On subtraction of the reference concentration (95) the
data give values of:
9, -16, 3, 55, -8, 41, 6
The absolute differences are then arranged in order of
magnitude, ignoring their signs:
3, 6, 8, 9, 16, 41, 55
The numbers are then ranked: in this process they keep
their signs but are assigned numbers indicating their
order (or rank):
1, 2, -3, 4, -5, 6, 7
The next step is to calculate the sums of the +ve and -ve ranks.
+ve rank sum = 1 + 2 + 4 + 6 + 7 = 20
-ve rank sum = 3 + 5 = 8
As a check, the two sums together should equal
n(n + 1)/2 = 7(7 + 1)/2 = 28, and indeed 20 + 8 = 28.
The lower of these two figures (8) is taken as the test statistic.
For n = 7, the test statistic must be less than or equal to 2
before the null hypothesis – that the data do come from a
population of median (mean) 95 – can be rejected at a
significance level of P = 0.05.
Since the test statistic is 8, which is greater than 2, the null
hypothesis must be retained.
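The calculation above can be reproduced with a short Python sketch (an illustration added here, valid because this example has no tied absolute differences):

```python
# Wilcoxon signed rank statistic for the blood-lead example.
data = [104, 79, 98, 150, 87, 136, 101]
ref = 95
diffs = [x - ref for x in data]          # 9, -16, 3, 55, -8, 41, 6
ordered = sorted(diffs, key=abs)         # order by absolute magnitude

# Ranks 1..n are assigned in order of magnitude; each rank keeps
# the sign of its difference.
pos = sum(r for r, d in enumerate(ordered, 1) if d > 0)
neg = sum(r for r, d in enumerate(ordered, 1) if d < 0)
print(pos, neg, min(pos, neg))           # 20 8 8 -> 8 > 2, retain H0
```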
Example:
The following table gives the percentage concentration of zinc,
determined by two different methods, for each of eight
samples of health food.
If there is no systematic difference between the two methods,
then we would expect that the differences between the results
for each sample, i.e. (titration result ̶ spectrometry result),
should be symmetrically distributed about zero.
The signed differences are:
Sample   EDTA titration   Atomic spectrometry   Signed diff.
1        7.2              7.6                   -0.4
2        6.1              6.8                   -0.7
3        5.2              4.6                   +0.6
4        5.9              5.7                   +0.2
5        9.0              9.7                   -0.7
6        8.5              8.7                   -0.2
7        6.6              7.0                   -0.4
8        4.4              4.7                   -0.3
Arranging these values in order of magnitude while retaining their
signs, we have:
-0.2, 0.2, -0.3, -0.4, -0.4, 0.6, -0.7, -0.7
The ranking of these results presents an obvious difficulty, that of
tied ranks.
There are two results with the numerical value 0.2, two with a
numerical value of 0.4, and two with a numerical value of 0.7.
This problem is resolved by giving the tied values average ranks,
with appropriate signs.
Thus the ranking for the present data is:
Sample   EDTA titration   Atomic spectrometry   Signed diff.   Tied rank
6        8.5              8.7                   -0.2           -1.5
4        5.9              5.7                   +0.2           +1.5
8        4.4              4.7                   -0.3           -3
1        7.2              7.6                   -0.4           -4.5
7        6.6              7.0                   -0.4           -4.5
3        5.2              4.6                   +0.6           +6
2        6.1              6.8                   -0.7           -7.5
5        9.0              9.7                   -0.7           -7.5
The sum of these ranks, ignoring signs, is 36, which equals the
sum of the first eight integers (for the first n integers the sum
is n(n + 1)/2), so the ranking is correct.
The sum of the positive ranks is 7.5
The sum of the negative ranks is 28.5
Therefore, the test statistic is 7.5
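The tie-averaged ranking can be reproduced with a short Python sketch (the ranking logic is an illustration added here, not from the original notes):

```python
# Wilcoxon signed rank with average ranks for ties (zinc example).
diffs = [-0.4, -0.7, 0.6, 0.2, -0.7, -0.2, -0.4, -0.3]
ordered = sorted(diffs, key=abs)
abs_vals = [abs(d) for d in ordered]

# Tied absolute values share the average of the ranks they occupy.
ranks = [sum(j for j, w in enumerate(abs_vals, 1) if w == v)
         / abs_vals.count(v) for v in abs_vals]

pos = sum(r for r, d in zip(ranks, ordered) if d > 0)
neg = sum(r for r, d in zip(ranks, ordered) if d < 0)
print(pos, neg, min(pos, neg))   # 7.5 28.5 7.5
```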
For n = 8, the test statistic must be less than or equal to 3
before the null hypothesis can be rejected at a significance
level of P = 0.05.
Since the test statistic is 7.5, the null hypothesis must be
retained: there is no evidence of a systematic difference
between the two methods.
The Mann–Whitney U test
To compute our test statistic, we start off by pooling both
samples (which are of sizes nc and nt for the control and
test data, respectively) into a single large sample.
We rank the data values in the pooled sample from 1 to nc + nt
and then calculate the sums of the ranks from each
individual sample.
We use Rt to denote the sum of the ranks of the test sample
and Rc the sum of the ranks of the control sample.
Ut and Uc are calculated using the following formulae:
Ut = ncnt + nt(nt + 1)/2 - Rt
Uc = ncnt + nc(nc + 1)/2 - Rc
The lower value of Ut or Uc is then compared with a critical
value from a table.
If the calculated value is lower than the critical value, then we
reject the null hypothesis.
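Assuming the common form of the U formulae, U = ncnt + n(n + 1)/2 - R, the computation can be sketched as follows; the helper function and the height data are hypothetical illustrations, not values from these notes:

```python
# Mann-Whitney U via pooled average ranks (illustrative sketch).
def mann_whitney_u(control, test):
    pooled = sorted(control + test)

    def avg_rank(v):
        # Tied values share the average of the ranks they occupy.
        idx = [i for i, x in enumerate(pooled, 1) if x == v]
        return sum(idx) / len(idx)

    n_c, n_t = len(control), len(test)
    r_c = sum(avg_rank(v) for v in control)   # rank sum, control
    r_t = sum(avg_rank(v) for v in test)      # rank sum, test
    u_c = n_c * n_t + n_c * (n_c + 1) / 2 - r_c
    u_t = n_c * n_t + n_t * (n_t + 1) / 2 - r_t
    return u_c, u_t

# Hypothetical heights (cm): 6 control subjects, 7 test subjects.
u_c, u_t = mann_whitney_u([170, 168, 175, 172, 169, 174],
                          [165, 166, 171, 163, 167, 173, 164])
print(u_c, u_t, u_c + u_t)   # check: u_c + u_t = 6 * 7 = 42
```

The lower of the two U values is then compared with the tabulated critical value.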
Example:
Consider a study investigating potential links between diet and
physical development. Height and weight data have been gathered
from two cohorts: one of subjects who suffered from
malnutrition in childhood (cohort A) and one of subjects who did
not (cohort B).
The research team wishes to determine if the heights of
the subjects in cohorts A and B are different.
Based on other findings of the study, the team has a
good reason to doubt that all the populations from
which these samples were drawn are normally
distributed.
Therefore, as we have unpaired data (i.e. the subjects
in cohorts A and B are different), we will use the Mann–
Whitney U test.
To perform the test, both samples are ranked together from
lowest to highest.
Next, we sum the ranks for cohorts A and B.
Denoting cohort B as the control sample and cohort A as
the test sample, we obtain Uc = 27.5 and Ut = 14.5.
As a check:
Uc + Ut = nc × nt
Uc + Ut = 27.5 + 14.5 = 42
nc × nt = 6 × 7 = 42
We take Ut = 14.5, the lower of these two values, as our test
statistic.
From the table, the critical value for nc = 6 and nt = 7 at
P = 0.05 is 6.
As 14.5 is not less than 6, we cannot reject the null
hypothesis, meaning that the team cannot conclude with 95%
confidence that the heights of the two cohorts are
different.