
CHAPTER 7: STATISTICAL DATA TREATMENT AND EVALUATION
Most common applications of statistical tests to the
treatment of analytical results:

1. Defining a numerical interval around the mean of a set of
replicate analytical results within which the population
mean can be expected to lie with a certain probability.
This interval is called the confidence interval (CI). The
interval is related to the standard deviation of the mean.
2. Determining the number of replicate measurements
required to ensure that an experimental mean falls within
a certain range with a given level of probability.
3. Estimating the probability that (a) an experimental mean
and a true value or (b) two experimental means are
different; that is, whether the difference is real or simply
the result of random error. This test is particularly
important for discovering systematic errors in a method
and determining whether two samples come from the
same source.
4. Determining at a given probability level whether the
precision of two sets of measurements differs.
5. Comparing the means of more than two samples to
determine whether differences in the means are real or the
result of random error. This process is known as analysis of
variance.
6. Deciding with a certain probability whether an apparent
outlier in a set of replicate measurements is the result of a
gross error and can thus be rejected or whether it is a
legitimate part of the population that must be retained in
calculating the mean of the set.
CONFIDENCE INTERVALS

The confidence interval for the mean is the range of values within which the population mean μ is expected to lie with a certain probability.

The confidence level is the probability that the true mean lies within a certain interval. It is often expressed as a percentage.

If s is a good approximation of σ, the confidence interval can be significantly narrower than if the estimate of σ is based on only a few measurement values.

The probability that a result is outside the confidence interval is often called the significance level.
• Finding the Confidence Interval when σ is known or s → σ

CI for μ = x ± zσ (for a single measurement)

z = (x − μ)/σ

CI for μ = x̄ ± zσ/√N (for N measurements)
Determine the 80% and 95% confidence interval for (a) the first entry
(1108 mg/L glucose) and (b) the mean value (1100.3 mg/L) for
month 1 in the example. Assume that in each part, s = 19 is a good
estimate of σ.
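A minimal Python sketch of these z-based intervals (CI = x ± zσ for a single value, CI = x̄ ± zσ/√N for a mean). The z values 1.28 (80%) and 1.96 (95%) are standard two-tailed values; the replicate count N = 7 for the month-1 mean is an assumption made only for illustration, since the original data table is not reproduced in this excerpt.

```python
import math

s = 19.0                              # mg/L; assumed to be a good estimate of sigma (s -> sigma)
z_values = {0.80: 1.28, 0.95: 1.96}   # two-tailed z for 80% and 95% confidence

# (a) single measurement: CI = x +/- z*sigma
x_single = 1108.0
for level, z in z_values.items():
    print(f"(a) {level:.0%} CI: {x_single:.1f} +/- {z * s:.1f} mg/L glucose")

# (b) mean of N replicates: CI = x_bar +/- z*sigma/sqrt(N)
x_mean, N = 1100.3, 7                 # N = 7 is assumed for month 1 (not given in this excerpt)
for level, z in z_values.items():
    print(f"(b) {level:.0%} CI: {x_mean:.1f} +/- {z * s / math.sqrt(N):.1f} mg/L glucose")
```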
How many replicate measurements in month 1 are needed to
decrease the 95% confidence interval to 1100.3  10.0 mg/L
of glucose?
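One way to answer this, sketched in Python under the same s → σ assumption: solve 1.96σ/√N ≤ 10.0 for N and round up.

```python
import math

sigma = 19.0       # mg/L; s taken as a good estimate of sigma
z95 = 1.96         # two-tailed z for the 95% confidence level
half_width = 10.0  # desired half-width of the confidence interval, mg/L

# From z*sigma/sqrt(N) <= half_width  =>  N >= (z*sigma/half_width)**2
N_required = math.ceil((z95 * sigma / half_width) ** 2)
print(N_required)  # 14 replicate measurements
```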
It is essential to keep in mind at all times that confidence intervals apply only in the absence of bias and only if we can assume that s is a good approximation of σ (s → σ).

• Finding the Confidence Interval When σ Is Unknown

CI for μ = x ± ts (for a single measurement)

CI for μ = x̄ ± ts/√N (for N measurements)
A chemist obtained the following data for the alcohol content of a
sample of blood: % C2H4OH: 0.084, 0.089, and 0.079. Calculate
the 95% confidence interval for the mean assuming (a) the three
results obtained are the only indication of the precision of the
method and (b) from previous experience on hundreds of
samples, we know that the standard deviation of the method is
0.005% C2H4OH and is a good estimate of σ.
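A short Python sketch of both cases; scipy is assumed to be available and is used only to look up the critical t and z values.

```python
import math
import statistics
from scipy import stats

data = [0.084, 0.089, 0.079]     # % C2H4OH
N = len(data)
mean = statistics.mean(data)     # 0.084
s = statistics.stdev(data)       # 0.005 (sample standard deviation, 2 degrees of freedom)

# (a) sigma unknown: t-based interval with N - 1 degrees of freedom
t_crit = stats.t.ppf(0.975, df=N - 1)       # ~4.30 for 95% confidence
print(f"(a) {mean:.3f} +/- {t_crit * s / math.sqrt(N):.4f} % C2H4OH")

# (b) sigma known (0.005%) from hundreds of samples: z-based interval
sigma = 0.005
z_crit = stats.norm.ppf(0.975)              # ~1.96
print(f"(b) {mean:.3f} +/- {z_crit * sigma / math.sqrt(N):.4f} % C2H4OH")
```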
STATISTICAL AIDS TO HYPOTHESIS TESTING

Null hypothesis – postulates that two or more observed quantities are the same.

Specific examples of hypothesis tests that chemists often use include the comparison of (1) the mean of an experimental data set with what is believed to be the true value; (2) the mean to a predicted or cutoff (threshold) value; and (3) the means or the standard deviations from two or more sets of data. The sections that follow consider some of the methods for making these comparisons.

When expressed as a fraction, the significance level is often given the symbol α. The confidence level (CL) is related to α on a percentage basis by CL = (1 − α) × 100%.
• Comparing an Experimental Mean with a Known
Value
Hypothesis test – used to draw conclusions about the population mean μ and its nearness to the known value, which we call μ0

H0: μ = μ0
Ha: μ > μ0, or μ < μ0, or μ ≠ μ0

For tests concerning one or two means, the test statistic can be the z statistic if we have a large number of measurements or if we know σ. Alternatively, we must use the t statistic for small numbers of measurements with unknown σ. When in doubt, the t statistic should be used.
Large Sample z Test

1. State the null hypothesis: H0: μ = μ0

2. Form the test statistic:


z = (x̄ − μ0)/(σ/√N)

3. State the alternative hypothesis, Ha, and determine the


rejection region

For Ha: μ ≠ μ0, reject H0 if z ≥ zcrit or if z ≤ −zcrit (two-tailed test)

For Ha: μ > μ0, reject H0 if z ≥ zcrit
For Ha: μ < μ0, reject H0 if z ≤ −zcrit
A class of 30 students determined the activation energy of a
chemical reaction to be 27.7 kcal/mol (mean value) with a
standard deviation of 5.2 kcal/mol. Are the data in agreement
with the literature value of 30.8 kcal/mol at (1) the 95%
confidence level and (2) the 99% confidence level? Estimate
the probability of obtaining a mean equal to the literature
value.
H0: μ = 30.8 kcal/mol
Ha: μ ≠ 30.8 kcal/mol
For the 95% confidence level, zcrit = 1.96
For the 99% confidence level, zcrit = 2.58

z = (27.7 − 30.8)/(5.2/√30) = −3.26

Since z < −1.96 (95% CL) and z < −2.58 (99% CL), we reject the null hypothesis.
Conclusion: The student mean is actually different from the
literature value and not just the result of random error.
Small Sample t Test

1. State the null hypothesis: H0: μ = μ0

2. Form the test statistic:


t = (x̄ − μ0)/(s/√N)

3. State the alternative hypothesis, Ha, and determine the


rejection region

For Ha: μ ≠ μ0, reject H0 if t ≥ tcrit or if t ≤ −tcrit (two-tailed test)

For Ha: μ > μ0, reject H0 if t ≥ tcrit
For Ha: μ < μ0, reject H0 if t ≤ −tcrit
Bias = B - o

In testing for bias, we do not know initially whether the difference between the experimental mean and the accepted value is due to random error or to an actual systematic error. The t test is used to determine the significance of the difference.
A new procedure for the rapid determination of the percentage of
sulfur in kerosenes was tested on a sample known from its
method of preparation to contain 0.123% (μ0 = 0.123%) S. The
results were % S = 0.112, 0.118, 0.115, and 0.119. Do the data
indicate that there is a bias in the method at the 95% confidence
level?
H0: μ = 0.123% S
Ha: μ ≠ 0.123% S

t = (x̄ − μ0)/(s/√N) = (0.116 − 0.123)/(0.0032/√4) = −4.375
tcrit (95% CL) = 3.18 and tcrit (99% CL) = 5.84

➢ |t| > tcrit (at 95% CL). Therefore, there is a significant difference at the 95% confidence level, and thus bias in the method.

➢ |t| < tcrit (at 99% CL). Therefore, there is no significant difference at the 99% confidence level (the null hypothesis is accepted).
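A Python sketch of this one-sample t test; scipy is assumed to be available. The computed t is about −4.43 rather than −4.375 because s is not rounded to 0.0032 before use; the conclusions are unchanged.

```python
import math
import statistics
from scipy import stats

data = [0.112, 0.118, 0.115, 0.119]   # % S
mu0 = 0.123                           # known value from the method of preparation
N = len(data)
x_bar = statistics.mean(data)         # 0.116
s = statistics.stdev(data)            # ~0.00316

t = (x_bar - mu0) / (s / math.sqrt(N))

for conf in (0.95, 0.99):
    t_crit = stats.t.ppf(1 - (1 - conf) / 2, df=N - 1)   # 3.18 and 5.84
    print(conf, "bias indicated" if abs(t) > t_crit else "no significant difference")
```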
CL = (1 − α) × 100%

➢ The significance level α (0.05 or 0.01) is the probability of making an error by rejecting the null hypothesis when it is actually true.
If it were confirmed by further experiments that the method always gives low results, we would say that the method had a negative bias.
• Comparison of Two Experimental Means

- whether a difference in the means of two sets of data is real or the result of random error
- results are used to determine whether two analytical methods give the same values or whether two analysts using the same methods obtain the same means

The t Test for Differences in Means


N1 replicate analyses by analyst 1 yielded x̄1
N2 replicate analyses by analyst 2 yielded x̄2

H0: μ1 = μ2
Ha: μ1 ≠ μ2 (two-tailed test)
    μ1 < μ2 or μ1 > μ2 (one-tailed test)

Test statistic (using the pooled standard deviation spooled):
t = (x̄1 − x̄2) / (spooled √(1/N1 + 1/N2))

The number of degrees of freedom for finding the critical value of t is N1 + N2 − 2.
If |t| < tcrit, the null hypothesis is accepted, and no significant difference between the means has been demonstrated.
If |t| > tcrit, there is a significant difference between the means.

Two barrels of wine were analyzed for their alcohol content to determine whether they were from different sources. On the basis of six analyses, the average content of the first barrel was established to be 12.61% ethanol. Four analyses of the second barrel gave a mean of 12.53% alcohol. The 10 analyses yielded a pooled standard deviation spooled of 0.070%. Do the data indicate a difference between the wines?

H0: μ1 = μ2
Ha: μ1 ≠ μ2
Degrees of freedom = 10-2 = 8

For 95% CL, tcrit = 2.31

Since |t| = 1.771 < 2.31, the null hypothesis is accepted: no difference between the wines has been demonstrated.
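The t value of 1.771 quoted above follows from the pooled two-sample statistic; a minimal Python sketch using only the summary values given in the problem (scipy supplies the critical t).

```python
import math
from scipy import stats

x1, N1 = 12.61, 6     # mean % ethanol and number of analyses, barrel 1
x2, N2 = 12.53, 4     # mean % ethanol and number of analyses, barrel 2
s_pooled = 0.070      # pooled standard deviation of all 10 analyses

t = (x1 - x2) / (s_pooled * math.sqrt(1 / N1 + 1 / N2))   # ~1.77
df = N1 + N2 - 2                                          # 8
t_crit = stats.t.ppf(0.975, df)                           # ~2.31 at the 95% level

print(f"t = {t:.3f}, t_crit = {t_crit:.2f}")
print("reject H0" if abs(t) > t_crit else "retain H0: no difference demonstrated")
```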

Paired Data

H0: μd = Δ0, where Δ0 is a specific value for the difference to be tested, often zero
Ha: μd ≠ Δ0, or μd > Δ0, or μd < Δ0

t = (d̄ − Δ0)/(sd/√N)

A new automated procedure for determining glucose in serum (Method A) is to be compared with the established method (Method B). Both methods are performed on serum from the same six patients to eliminate patient-to-patient variability. Do the following results confirm a difference in the two methods at the 95% confidence level?
N=6
Σdi = 16 + 9 + 25 + 5 + 22 + 11 = 88
Σdi² = 1592
d̄ = 88/6 = 14.67

sd = √[(Σdi² − (Σdi)²/N)/(N − 1)] = √[(1592 − 88²/6)/(6 − 1)] = 7.76

t = 14.67/(7.76/√6) = 4.628
tcrit (95% CL) = 2.57, degrees of freedom = 5
Since t > tcrit , therefore, the two methods give different
results (null hypothesis is rejected)
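A sketch of the paired t test in Python, using the six differences (Method A − Method B) listed above; scipy is assumed to be available for the critical value.

```python
import math
import statistics
from scipy import stats

d = [16, 9, 25, 5, 22, 11]       # differences di between the two methods (from the sums above)
N = len(d)
d_bar = statistics.mean(d)       # 14.67
s_d = statistics.stdev(d)        # 7.76

t = d_bar / (s_d / math.sqrt(N))         # ~4.63 (testing delta_0 = 0)
t_crit = stats.t.ppf(0.975, df=N - 1)    # 2.57

print(f"t = {t:.2f}, t_crit = {t_crit:.2f}")
print("reject H0: the methods differ" if abs(t) > t_crit else "retain H0")
```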
Errors in Hypothesis Testing

o type I error – occurs when H0 is rejected although it is actually true (a false positive)
  To minimize this type of error, use a smaller α.
o type II error – occurs when H0 is accepted although it is actually false (a false negative)
  To minimize this type of error, use a larger α.

• Comparison of Precision

F test – compares the variances (or standard deviations) of two populations

H0: σ1² = σ2²
Ha: σ1² ≠ σ2² (two-tailed test)
    σ1² > σ2² or σ1² < σ2² (one-tailed test)

F = s1²/s2²   (the larger variance always appears in the numerator)

A standard method for the determination of the carbon monoxide (CO) level in gaseous mixtures is known from many hundreds of measurements to have a standard deviation of 0.21 ppm CO. A modification of the method yields a value for s of 0.15 ppm CO for a pooled data set with 12 degrees of freedom. A second modification, also based on 12 degrees of freedom, has a standard deviation of 0.12 ppm CO. Is either modification significantly more precise than the original?

Fcrit = 2.30

F1 = sstd²/s1² = (0.21)²/(0.15)² = 1.96
F1 < Fcrit, so the null hypothesis is accepted (no improvement in precision).

F2 = sstd²/s2² = (0.21)²/(0.12)² = 3.06
F2 > Fcrit, so the null hypothesis is rejected (the second modification is significantly more precise than the original).
Is the precision of the second modification significantly better than that of the first?

F = s1²/s2² = (0.15)²/(0.12)² = 1.56,   Fcrit = 2.69
Since F < 2.69, we must accept Ho and conclude that
the two methods give equivalent precision.
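A Python sketch of these F comparisons; scipy is assumed to be available, and a very large number of degrees of freedom stands in for the "many hundreds of measurements" behind the standard method, which reproduces the tabulated Fcrit of about 2.30.

```python
from scipy import stats

s_std, df_std = 0.21, 10_000          # standard method; huge d.f. approximates "many hundreds"
modifications = {"modification 1": 0.15, "modification 2": 0.12}
df_mod = 12

# Is either modification more precise than the standard method? (larger variance on top)
for name, s_mod in modifications.items():
    F = s_std**2 / s_mod**2
    F_crit = stats.f.ppf(0.95, df_std, df_mod)   # ~2.30
    print(name, round(F, 2), "more precise" if F > F_crit else "no improvement shown")

# Is modification 2 significantly more precise than modification 1?
F = 0.15**2 / 0.12**2                            # 1.56
F_crit = stats.f.ppf(0.95, 12, 12)               # ~2.69
print("mod 2 vs mod 1:", round(F, 2), "different" if F > F_crit else "equivalent precision")
```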

• Detection of Gross Error

When a set of data contains an outlying result that appears to differ excessively from the average, the decision must be made whether to retain or reject it.
No universal rule can be invoked to settle the question of retention or rejection.
Rejecting Data

When one value in a set of results is much larger or smaller than the others, decide whether to retain or reject the questionable value.

Q-Test

Qexp = |Xq − Xn| / w

where Xq = questionable value
      Xn = nearest numerical value
      w = range

Xq is rejected if Qexp > Qt; Xq is accepted if Qexp ≤ Qt


The analysis of a calcite sample yielded CaO
percentages of 55.95, 56.00, 56.04, 56.08, and
56.23. The last value appears anomalous; should it be
retained or rejected at the 95% confidence level?

Qexp = |56.23 − 56.08| / (56.23 − 55.95) = 0.54

Qt (95% CL) = 0.71
Since Qexp < Qt, we retain the outlier at the 95% confidence level.
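A small Python sketch of this Q test; the critical value 0.71 for five observations at the 95% confidence level is taken from the Q table, as above.

```python
data = sorted([55.95, 56.00, 56.04, 56.08, 56.23])   # % CaO

x_q = data[-1]                  # questionable value (the largest result here)
x_n = data[-2]                  # nearest numerical value
w = data[-1] - data[0]          # range of the data set

Q_exp = abs(x_q - x_n) / w      # 0.15 / 0.28 = 0.54
Q_crit = 0.71                   # tabulated Q for N = 5 at the 95% confidence level

print(f"Q_exp = {Q_exp:.2f}")
print("reject the outlier" if Q_exp > Q_crit else "retain the outlier")
```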
Recommendations for Treating Outliers

1. Reexamine carefully all data relating to the outlying result to see if a gross error could have affected its value. This recommendation demands a properly kept laboratory notebook containing careful notations of all observations.

2. If possible, estimate the precision that can be reasonably expected from the procedure to be sure that the outlying result actually is questionable.

3. Repeat the analysis if sufficient sample and time are available. Agreement between the newly acquired data and those of the original set that appear to be valid will lend weight to the notion that the outlying result should be rejected. Furthermore, if retention is still indicated, the questionable result will have a small effect on the mean of the larger set of data.
4. If more data cannot be secured, apply the Q test to
the existing set to see if the doubtful result should be
retained or rejected on statistical grounds.

5. If the Q test indicates retention, consider reporting the median of the set rather than the mean. The median has the great virtue of allowing inclusion of all data in a set without undue influence from an outlying value. In addition, the median of a normally distributed set containing three measurements provides a better estimate of the correct value than the mean of the set after the outlying value has been discarded.
