
Lecture 1 - Data Analysis

The document discusses the principles of analytical chemistry, focusing on statistical data treatment, types of analysis (qualitative and quantitative), and the importance of accuracy and precision in measurements. It outlines the general analytical problem, types of errors in experimental data, and methods for statistical treatment of errors, including mean, median, standard deviation, and relative standard deviation. The document emphasizes the significance of understanding random and systematic errors in achieving reliable analytical results.

Uploaded by cmy00912
Copyright © All Rights Reserved

CHEM2004

Principles of Analytical Chemistry

Statistical Data Treatment

Dr. Lo Pik Kwan (Peggy)


Associate Professor
G6635, City University of Hong Kong
peggylo@cityu.edu.hk

Analytical Chemistry
“ANALYTICAL CHEMISTRY IS ANALYTICAL METHODOLOGY, NOT SPECTROMETERS, POLAROGRAPHS, ELECTRON MICROPROBES, ETC. IT IS EXPERIMENTATION, OBSERVATION, DEVELOPING FACTS, AND DRAWING CONCLUSIONS.”

West PW, in Analytical Chemistry (46) 1974

[Diagram of related terms: solving a problem; identification; determination or assay; analysis; quantitation; analyte; validation; methods or protocols; techniques]
Analytical Chemistry
Two Types of Analysis
• Qualitative Analysis: answers the question “what is it, in chemical terms?”, i.e. identifies what materials are present in a sample.
Examples include detecting metals in groundwater or fish: which metals are present?

• Quantitative Analysis: answers the question “how much is there?”, i.e. determines how much of each material is present in a sample.
Examples include measuring the quantities of metals in fish (trace amounts, or enough to harm you?)
Applications of Analytical Chemistry

The General Analytical Problem

Select sample
→ Extract analyte(s) from matrix
→ Separate analytes
→ Detect, identify and quantify analytes
→ Determine reliability and significance of results

Statistical Data Treatment
Errors & Statistical Data Treatment

Mean: defined as follows:

$$\bar{x} = \frac{\sum_{i=1}^{N} x_i}{N}$$

where $x_i$ = individual values of x and
N = number of replicate measurements

Median
The middle result when the data are arranged in order of increasing or decreasing value.

Odd number of results: the middle value of the set of data
Even number of results: the mean of the middle pair
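The two definitions above can be sketched in Python as follows. This is a minimal illustration rather than part of the original lecture: the function names are hypothetical, and the data are the six Fe(III) results from the illustration that follows.

```python
def mean(values):
    """Sample mean: the sum of the individual values x_i divided by N."""
    return sum(values) / len(values)


def median(values):
    """Middle result of the ordered data; mean of the middle pair when N is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd N: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even N: mean of the middle pair


fe_ppm = [19.4, 19.5, 19.6, 19.8, 20.1, 20.3]  # six Fe(III) results, ppm
print(f"mean = {mean(fe_ppm):.2f} ppm, median = {median(fe_ppm):.2f} ppm")
```

With an even number of results (N = 6) the median is the mean of 19.6 and 19.8. Python's standard `statistics.mean` and `statistics.median` return the same values.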
Precision
Relates to the reproducibility of results:
- the closeness of results to others obtained in exactly the same way.
How similar are values obtained in exactly the same way?

A useful measure of this is the deviation from the mean:

$$d_i = |x_i - \bar{x}|$$

Accuracy

Measures the agreement between the experimental mean and the true value.
Measures of accuracy:

Absolute error: $E = x_i - x_t$ (where $x_t$ = true or accepted value)

Relative error:

$$E_r = \frac{x_i - x_t}{x_t} \times 100\%$$

(the latter is more useful in practice)
Illustration of “Mean” and “Median”
Results of 6 determinations of the Fe(III) content of a solution known to contain 20 ppm:

19.4 19.5 19.6 19.8 20.1 20.3

Accepted value: 20.00 ppm
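Applying the accuracy measures to these data is a short calculation. The sketch below is an illustration, not the lecture's own worked answer: it compares the mean of the six results with the accepted value of 20.00 ppm, and the helper names are hypothetical.

```python
def absolute_error(x, x_true):
    """E = x - x_t; the sign shows whether the result is high or low."""
    return x - x_true


def relative_error_percent(x, x_true):
    """E_r = (x - x_t) / x_t * 100%."""
    return (x - x_true) / x_true * 100


fe_ppm = [19.4, 19.5, 19.6, 19.8, 20.1, 20.3]
accepted_ppm = 20.00

x_bar = sum(fe_ppm) / len(fe_ppm)  # about 19.78 ppm
e_abs = absolute_error(x_bar, accepted_ppm)
e_rel = relative_error_percent(x_bar, accepted_ppm)
print(f"mean = {x_bar:.2f} ppm, E = {e_abs:+.2f} ppm, Er = {e_rel:+.1f}%")
```

The mean is about 0.22 ppm (roughly 1.1%) below the accepted value.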
Illustrating the difference between “accuracy” and “precision”

[Figure, four panels: low accuracy, low precision; low accuracy, high precision; high accuracy, low precision; high accuracy, high precision]

Sample Standard Deviation, s

For small samples of data, i.e. small N, the equation for s is:

$$s = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}}$$

Alternative expression for s (suitable for calculators):

$$s = \sqrt{\frac{\sum_{i=1}^{N} x_i^2 - \frac{\left(\sum_{i=1}^{N} x_i\right)^2}{N}}{N - 1}}$$

Note: NEVER round off figures before the end of the calculation.
Standard Deviation of a Sample
Reproducibility of a method for determining selenium in foods: 9 measurements were made on a single batch of brown rice.

Sample   Selenium content, xi (µg/g)   xi²
1        0.07                          0.0049
2        0.07                          0.0049
3        0.08                          0.0064
4        0.07                          0.0049
5        0.07                          0.0049
6        0.08                          0.0064
7        0.08                          0.0064
8        0.09                          0.0081
9        0.08                          0.0064
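Both expressions for s can be checked against the selenium data in a few lines of Python. The function names below are hypothetical; the sketch simply confirms that the defining formula and the calculator formula give the same value when no intermediate rounding is done.

```python
import math


def s_definition(values):
    """s = sqrt( sum((x_i - mean)^2) / (N - 1) )."""
    n = len(values)
    x_bar = sum(values) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))


def s_calculator(values):
    """s = sqrt( (sum(x_i^2) - (sum x_i)^2 / N) / (N - 1) )."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))


selenium = [0.07, 0.07, 0.08, 0.07, 0.07, 0.08, 0.08, 0.09, 0.08]  # µg/g
print(f"s (definition) = {s_definition(selenium):.4f}")
print(f"s (calculator) = {s_calculator(selenium):.4f}")
```

Here Σxi = 0.69 and Σxi² = 0.0533, so s = √((0.0533 − 0.69²/9)/8) ≈ 0.0071 µg/g. The calculator form subtracts two nearly equal numbers, so it loses precision when the values are large relative to their spread; this is one reason not to round intermediate figures.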
Relative standard deviation (RSD)

$$\mathrm{RSD} = \frac{s}{\bar{x}} \times 100\% \quad \text{(the coefficient of variation)}$$

$$\mathrm{RSD} = \frac{s}{\bar{x}} \times 1000\ \text{ppt} \quad \text{(ppt = parts per thousand)}$$

Spread or Range (w)

The difference between the largest value and the smallest value in a set of data.
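As a sketch, the RSD (in % and in ppt) and the range can be computed for the selenium results with Python's standard `statistics` module; the variable names are illustrative.

```python
import statistics

selenium = [0.07, 0.07, 0.08, 0.07, 0.07, 0.08, 0.08, 0.09, 0.08]  # µg/g

x_bar = statistics.mean(selenium)
s = statistics.stdev(selenium)  # sample standard deviation (N - 1 divisor)

rsd_percent = s / x_bar * 100      # coefficient of variation, %
rsd_ppt = s / x_bar * 1000         # parts per thousand
w = max(selenium) - min(selenium)  # spread or range

print(f"RSD = {rsd_percent:.1f}% = {rsd_ppt:.0f} ppt, w = {w:.2f} µg/g")
```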

Knowing precisely ≠ Knowing accurately

[Figure: results obtained by four analysts for benzyl isothiourea hydrochloride and nicotinic acid (structures shown)]

Analyst 1: precise, accurate
Analyst 2: imprecise, accurate
Analyst 3: precise, inaccurate
Analyst 4: imprecise, inaccurate
Types of Error in Experimental Data
Three types:

(1) Random (indeterminate) error
• causes data to be scattered approximately symmetrically about a mean value
• reflected by precision; dealt with statistically
Random error for analysts 1 and 3 is less than that for analysts 2 and 4.

(2) Systematic (determinate) error
• causes the mean of a data set to differ from the accepted value
• affects accuracy
Systematic error for analysts 1 and 2 is less than that for analysts 3 and 4.

(3) Gross errors
• occur only occasionally, are large, and may cause a result to be either high or low
• often the product of human error
Sources of Systematic Error
1. Instrument Error
•Nonideal instrument behaviour
•Faulty calibrations
•Use under inappropriate conditions

2. Method Error
•Nonideal chemical or physical behaviour of analytical systems

3. Personal Error
•Carelessness
•Inattention
•Personal limitation of the experimenter


Systematic errors can be
• constant (e.g. an error in a burette reading; less important for larger values of the reading), or
• proportional (e.g. the presence of a given proportion of an interfering impurity in the sample; equally significant for all values of the measurement).

Minimise instrument errors by careful recalibration and good maintenance of equipment.

Minimise personal errors by care and self-discipline.

Method errors are the most difficult, because the “true” value may not be known.
Three approaches to minimise them:
• analysis of certified standards
• use of 2 or more independent methods
• analysis of blanks
Statistical Treatment of Random Errors
There is always a large number of small, random errors in making any measurement.

These can be small changes in temperature or pressure, random responses of electronic detectors (“noise”), etc.

Suppose there are 4 small random errors possible.
Assume all are equally likely, and that each causes an error of ±U in the reading.
The possible combinations of errors are as follows:

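Combination of Random Errors

Total error +4U (1 combination; relative frequency 1/16 = 0.0625):
+U+U+U+U

Total error +2U (4 combinations; relative frequency 4/16 = 0.250):
-U+U+U+U, +U-U+U+U, +U+U-U+U, +U+U+U-U

Total error 0 (6 combinations; relative frequency 6/16 = 0.375):
-U-U+U+U, -U+U-U+U, -U+U+U-U, +U-U-U+U, +U-U+U-U, +U+U-U-U

Total error -2U (4 combinations; relative frequency 4/16 = 0.250):
+U-U-U-U, -U+U-U-U, -U-U+U-U, -U-U-U+U

Total error -4U (1 combination; relative frequency 1/16 = 0.0625):
-U-U-U-U

The frequency distribution is shown in graphical form below.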
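The table can be reproduced by brute-force enumeration. The sketch below assumes, as the slide does, that each of n independent errors is equally likely to be +U or −U; it counts every sign combination and tallies the total error in units of U. `error_distribution` is a hypothetical name.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def error_distribution(n_errors):
    """Map total error (in units of U) -> (number of combinations, relative frequency)."""
    totals = Counter(sum(signs) for signs in product((+1, -1), repeat=n_errors))
    n_combinations = 2 ** n_errors
    return {
        total: (count, Fraction(count, n_combinations))
        for total, count in sorted(totals.items(), reverse=True)
    }


for total, (count, freq) in error_distribution(4).items():
    print(f"{total:+d}U  {count}  {float(freq):.4f}")
```

Calling `error_distribution(10)` gives the binomial counts for 10 uncertainties (252 of the 1024 combinations give zero total error); as the number of uncertainties grows, the distribution approaches the Gaussian curve.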


Frequency Distribution for Measurements Containing Random Errors

[Figure, three panels: 4 random uncertainties; 10 random uncertainties; a very large number of random uncertainties]

With a very large number of random uncertainties the distribution is a Gaussian or normal error curve, symmetrical about the mean.

Calibration of a 10 mL Pipette

1. A small flask and stopper were weighed (M1).
2. 10 mL of water were transferred to the flask with the pipette.
3. The flask was stoppered.
4. The flask, the stopper and the water were weighed again (M2).
5. The temperature of the water was also measured to find its density.
6. The mass of water = M2 − M1
7. Volume delivered by the pipette = (M2 − M1)/density
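Steps 6 and 7 amount to one line of arithmetic. The sketch below uses hypothetical masses and an approximate density of water at 20 °C (about 0.9982 g/mL); it ignores the buoyancy correction that a rigorous calibration would also apply.

```python
def delivered_volume_ml(m1_g, m2_g, water_density_g_per_ml):
    """Volume delivered by the pipette = (M2 - M1) / density of water."""
    mass_water_g = m2_g - m1_g  # step 6: mass of water delivered
    return mass_water_g / water_density_g_per_ml  # step 7


# Hypothetical weighings, for illustration only (not data from the lecture)
m1 = 25.000   # flask + stopper, g
m2 = 34.964   # flask + stopper + water, g
rho_20c = 0.9982  # approximate density of water at 20 °C, g/mL (no buoyancy correction)

print(f"Volume delivered = {delivered_volume_ml(m1, m2, rho_20c):.3f} mL")
```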
Replicate Data on the Calibration of a 10 mL Pipette

No.  Vol (mL)   No.  Vol (mL)   No.  Vol (mL)
1    9.988      18   9.975      35   9.976
2    9.973      19   9.980      36   9.990
3    9.986      20   9.994      37   9.988
4    9.980      21   9.992      38   9.971
5    9.975      22   9.984      39   9.986
6    9.982      23   9.981      40   9.978
7    9.986      24   9.987      41   9.986
8    9.982      25   9.978      42   9.982
9    9.981      26   9.983      43   9.977
10   9.990      27   9.982      44   9.977
11   9.980      28   9.991      45   9.986
12   9.989      29   9.981      46   9.978
13   9.978      30   9.969      47   9.983
14   9.971      31   9.985      48   9.980
15   9.982      32   9.977      49   9.983
16   9.983      33   9.976      50   9.979
17   9.988      34   9.983

Mean volume 9.982 mL      Median volume 9.982 mL
Spread/range 0.025 mL     Standard deviation 0.0056 mL
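The summary statistics can be reproduced from the 50 replicate volumes with Python's `statistics` module. This is an illustrative check, not part of the original handout; the list is transcribed from the table in order 1 to 50.

```python
import statistics

# Volumes (mL) delivered in replicates 1-50, transcribed from the table
volumes_ml = [
    9.988, 9.973, 9.986, 9.980, 9.975, 9.982, 9.986, 9.982, 9.981, 9.990,
    9.980, 9.989, 9.978, 9.971, 9.982, 9.983, 9.988, 9.975, 9.980, 9.994,
    9.992, 9.984, 9.981, 9.987, 9.978, 9.983, 9.982, 9.991, 9.981, 9.969,
    9.985, 9.977, 9.976, 9.983, 9.976, 9.990, 9.988, 9.971, 9.986, 9.978,
    9.986, 9.982, 9.977, 9.977, 9.986, 9.978, 9.983, 9.980, 9.983, 9.979,
]

mean_v = statistics.mean(volumes_ml)
median_v = statistics.median(volumes_ml)
spread_v = max(volumes_ml) - min(volumes_ml)
s_v = statistics.stdev(volumes_ml)  # N - 1 = 49 degrees of freedom

print(f"mean = {mean_v:.3f} mL, median = {median_v:.3f} mL")
print(f"spread = {spread_v:.3f} mL, s = {s_v:.4f} mL")
```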

Calibration data in graphical form

[Figure: A = histogram of the experimental results; B = Gaussian curve with the same mean value, the same precision (see later) and the same area under the curve as the histogram]
Statistical Treatment of Random Error

Population vs Sample


Main properties of the Gaussian curve:

Sample mean ($\bar{x}$): the mean of a limited sample (small N) drawn from a population of data:

$$\bar{x} = \frac{\sum_{i=1}^{N} x_i}{N}$$

Population mean (μ): defined in the same way, but with N → ∞. In the absence of systematic error, μ is the true value (the maximum of the Gaussian curve).

$$\mu = \frac{\sum_{i=1}^{N} x_i}{N} \quad (N \to \infty)$$

More often than not, particularly when N is small, $\bar{x}$ differs from μ, because a small sample of data does not exactly represent its population.
Remember, the sample mean ($\bar{x}$) is defined for small values of N.
(Sample mean ≈ population mean when N > 20)
Population Standard Deviation (σ) and Sample Standard Deviation (s)
The equation for σ must be modified for small samples of data, i.e. small N:

For a population of data:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$

For sample data:

$$s = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}}$$

Two differences compared with the equation for σ:
• use the sample mean instead of the population mean
• use the degrees of freedom, N − 1, instead of N
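Python's `statistics` module provides both divisors: `pstdev` divides by N (the population form, here computed about the data's own mean, since μ is unknown), while `stdev` divides by N − 1 (the sample form). The sketch below applies both to the selenium data and shows how the ratio √(N/(N − 1)) between them shrinks as N grows.

```python
import math
import statistics

selenium = [0.07, 0.07, 0.08, 0.07, 0.07, 0.08, 0.08, 0.09, 0.08]  # µg/g

s_n = statistics.pstdev(selenium)   # divisor N
s_n1 = statistics.stdev(selenium)   # divisor N - 1
print(f"divisor N: {s_n:.4f}, divisor N-1: {s_n1:.4f}")

# The ratio of the two estimates depends only on N
for n in (3, 9, 20, 50):
    print(f"N = {n:2d}: ratio = {math.sqrt(n / (n - 1)):.3f}")
```

For N = 3 the N − 1 divisor gives a value about 22% larger; by N = 50 the difference is about 1%, so the choice of divisor matters less and less as N increases.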

Effects of N on the reliability of s

- When N > 20, s ≈ σ.

- The rapid improvement in the reliability of s as N increases makes it feasible to obtain a good approximation of σ when the method of measurement is not excessively time-consuming and when an adequate supply of sample is available.
Properties of the Normal Error Curve
The distribution of errors for a particular population of data is given by two population parameters, μ and σ.

The population mean μ expresses the magnitude of the quantity being measured; the standard deviation σ expresses the scatter and is therefore an index of precision.

[Figure: two Gaussian curves with two different standard deviations, σA and σB (= 2σA)]

[Figure: general Gaussian curve plotted in units of z, where z = (x − μ)/σ, i.e. the deviation of a datum from the mean in units of standard deviation. This plot can be used for data with any given mean and any standard deviation.]

Standard Error of a Mean

The standard error of the mean, $s_m$, is defined as follows:

$$s_m = \frac{s}{\sqrt{N}}$$

As N → ∞, $\bar{x}$ → μ.

$s_m$ is a measure of how close your sample mean $\bar{x}$ is likely to be to the true population mean μ. It takes into account both the value of s and the sample size N.
Note: s quantifies scatter/spread, i.e. how much the values vary from one another.
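A minimal sketch of $s_m$ for the selenium data, assuming the sample standard deviation from `statistics.stdev`; the function name is hypothetical.

```python
import math
import statistics


def standard_error_of_mean(values):
    """s_m = s / sqrt(N): shrinks as more replicates are averaged."""
    return statistics.stdev(values) / math.sqrt(len(values))


selenium = [0.07, 0.07, 0.08, 0.07, 0.07, 0.08, 0.08, 0.09, 0.08]  # µg/g
print(f"s_m = {standard_error_of_mean(selenium):.4f} µg/g")
```

Because $s_m$ falls only as 1/√N, quadrupling the number of measurements halves the standard error.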
Pooled Data

When several small sets of data have the same sources of random error (i.e. the same type of measurement but different samples), the standard deviations of the individual data sets may be pooled to determine the standard deviation of the analysis method more accurately.
Suppose that there are t small sets of data, comprising N1, N2, ..., Nt measurements.
The equation for the resulting pooled sample standard deviation is:

$$s_{\text{pooled}} = \sqrt{\frac{\sum_{i=1}^{N_1} (x_i - \bar{x}_1)^2 + \sum_{j=1}^{N_2} (x_j - \bar{x}_2)^2 + \sum_{k=1}^{N_3} (x_k - \bar{x}_3)^2 + \cdots}{N_1 + N_2 + N_3 + \cdots - t}}$$

(Note: one degree of freedom is lost for each set of data)

Pooled Standard Deviation
Analysis of 6 bottles of wine for residual sugar.

Bottle  Sugar % (w/v)  No. of obs.  Deviations from mean          Σ(xi − x̄)²  s
1       0.94           3            0.05, 0.10, 0.08              0.0189      0.097
2       1.08           4            0.06, 0.05, 0.09, 0.06        0.0178      0.077
3       1.20           5            0.05, 0.12, 0.07, 0.00, 0.08  0.0282      0.084
4       0.67           4            0.05, 0.10, 0.06, 0.09        0.0242      0.090
5       0.83           3            0.07, 0.09, 0.10              0.0230      0.107
6       0.76           4            0.06, 0.12, 0.04, 0.03        0.0205      0.083
Total                                                             0.1326
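The pooled standard deviation for the wine data can be sketched as below. The function is hypothetical; it takes, for each bottle, the deviations from that bottle's own mean (as listed in the table) and subtracts one degree of freedom per set, giving 23 − 6 = 17 degrees of freedom here.

```python
import math


def pooled_std(deviation_sets):
    """Pooled s from per-set deviations: sqrt(total squared deviations / (total N - number of sets))."""
    ss_total = sum(d * d for devs in deviation_sets for d in devs)  # squaring removes any sign
    n_total = sum(len(devs) for devs in deviation_sets)
    return math.sqrt(ss_total / (n_total - len(deviation_sets)))


wine_deviations = [
    [0.05, 0.10, 0.08],
    [0.06, 0.05, 0.09, 0.06],
    [0.05, 0.12, 0.07, 0.00, 0.08],
    [0.05, 0.10, 0.06, 0.09],
    [0.07, 0.09, 0.10],
    [0.06, 0.12, 0.04, 0.03],
]

s_pooled = pooled_std(wine_deviations)  # sqrt(0.1326 / 17)
print(f"s_pooled = {s_pooled:.3f} % (w/v)")
```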

Two alternative measures of the precision of a set of results:

VARIANCE: the square of the standard deviation:

$$s^2 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}$$

COEFFICIENT OF VARIATION (CV)
(or RELATIVE STANDARD DEVIATION):
Divide the standard deviation by the mean value and express as a percentage:

$$\mathrm{CV} = \frac{s}{\bar{x}} \times 100\%$$
How can we relate the observed mean value $\bar{x}$ to the true value μ?

For a single measurement: CL for μ = x ± zσ

For the sample mean of N measurements ($\bar{x}$), the equivalent expression is:

$$\text{CL for } \mu = \bar{x} \pm \frac{z\sigma}{\sqrt{N}}$$
Confidence Limits when σ is known
Atomic absorption analysis for copper concentration in aircraft engine oil gave a mean value of 8.53 µg Cu/mL. Pooled results of many analyses showed s → σ = 0.32 µg Cu/mL.
Find the CL for μ at the 90% and 99% confidence levels if the above result were based on (a) 1, (b) 4, (c) 16 measurements.
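A sketch of the Cu calculation. The z values used here, 1.64 (90%) and 2.58 (99%), are the standard two-sided values, matching the ∞ row of the t table in this lecture; the function name is hypothetical.

```python
import math

Z = {90: 1.64, 99: 2.58}  # two-sided z values for the confidence levels used


def half_width_known_sigma(sigma, n, level):
    """Half-width of the CL for mu: z * sigma / sqrt(N)."""
    return Z[level] * sigma / math.sqrt(n)


x_bar, sigma = 8.53, 0.32  # µg Cu/mL
for n in (1, 4, 16):
    for level in (90, 99):
        h = half_width_known_sigma(sigma, n, level)
        print(f"N = {n:2d}, {level}%: {x_bar} ± {h:.2f} µg Cu/mL")
```

Each fourfold increase in N halves the interval: at 99% it shrinks from ±0.83 (N = 1) to ±0.41 (N = 4) and ±0.21 (N = 16) µg Cu/mL.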

If we have no information on σ, and only have a value for s, the confidence interval is larger, i.e. there is greater uncertainty.
Instead of z, it is necessary to use the parameter t, defined as follows:

$$t = \frac{x - \mu}{s}$$

i.e. just like z, but using s instead of σ.

By analogy we have:

$$\text{CL for } \mu = \bar{x} \pm \frac{ts}{\sqrt{N}}$$

(where $\bar{x}$ = sample mean for N measurements)

Values of t are tabulated below.
Values of t for various levels of probability

Degrees of freedom (N − 1)   80%    90%    95%    99%
1                            3.08   6.31   12.7   63.7
2                            1.89   2.92   4.30   9.92
3                            1.64   2.35   3.18   5.84
4                            1.53   2.13   2.78   4.60
5                            1.48   2.02   2.57   4.03
6                            1.44   1.94   2.45   3.71
7                            1.42   1.90   2.36   3.50
8                            1.40   1.86   2.31   3.36
9                            1.38   1.83   2.26   3.25
19                           1.33   1.73   2.10   2.88
59                           1.30   1.67   2.00   2.66
∞                            1.29   1.64   1.96   2.58

Note: (1) As (N − 1) → ∞, t → z.
(2) For all values of (N − 1) < ∞, t > z, i.e. greater uncertainty.

Confidence Limits where σ is not known

Analysis of an insecticide gave the following values for the % of the chemical lindane: 7.47, 6.98, 7.27. Calculate the CL for the mean value at the 90% confidence level.

xi (%)   xi²
7.47     55.8009
6.98     48.7204
7.27     52.8529
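A sketch of this calculation, assuming two-sided t values copied from the t table in this lecture (degrees of freedom 1 to 9 only); `confidence_limits` is a hypothetical helper, and the printed result is this sketch's arithmetic rather than an answer given on the slide.

```python
import math

# Two-sided t values from the lecture's table, keyed by degrees of freedom (N - 1)
T_TABLE = {
    1: {90: 6.31, 95: 12.7, 99: 63.7},
    2: {90: 2.92, 95: 4.30, 99: 9.92},
    3: {90: 2.35, 95: 3.18, 99: 5.84},
    4: {90: 2.13, 95: 2.78, 99: 4.60},
    5: {90: 2.02, 95: 2.57, 99: 4.03},
    6: {90: 1.94, 95: 2.45, 99: 3.71},
    7: {90: 1.90, 95: 2.36, 99: 3.50},
    8: {90: 1.86, 95: 2.31, 99: 3.36},
    9: {90: 1.83, 95: 2.26, 99: 3.25},
}


def confidence_limits(values, level):
    """Return (mean, half-width), where half-width = t * s / sqrt(N) with N - 1 dof."""
    n = len(values)
    x_bar = sum(values) / n
    s = math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))
    t = T_TABLE[n - 1][level]
    return x_bar, t * s / math.sqrt(n)


lindane = [7.47, 6.98, 7.27]  # % lindane
x_bar, h = confidence_limits(lindane, 90)
print(f"CL for mu = {x_bar:.2f} ± {h:.2f} %")
```

With s ≈ 0.246% and t = 2.92 (2 degrees of freedom), the 90% limits come out as 7.24 ± 0.42% lindane.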
Testing a Hypothesis
If the experimental value differs from the true value, is the difference due to a systematic error (bias) in the method, or simply due to random error?

NULL HYPOTHESIS
--- the two values are the same

ALTERNATIVE HYPOTHESIS
--- the two values are different

$$\bar{x} - x_t = \pm \frac{ts}{\sqrt{N}}$$

If $|\bar{x} - x_t|$ is greater than $ts/\sqrt{N}$ at the desired confidence level, the null hypothesis is rejected
-----> the two values are not the same
-----> evidence for systematic error

Detection of Systematic Error (Bias)

A standard material known to contain 38.9% Hg was analysed by atomic absorption spectroscopy. The results were 38.9%, 37.4% and 37.1%. At the 95% confidence level, is there any evidence for a systematic error in the method?
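A sketch of this test, assuming t = 4.30 for 2 degrees of freedom at the 95% level (from the t table); `bias_test` is a hypothetical helper, and the conclusion printed is this sketch's own arithmetic.

```python
import math


def bias_test(values, x_true, t_crit):
    """Compare |mean - x_true| with t * s / sqrt(N); True means bias is indicated."""
    n = len(values)
    x_bar = sum(values) / n
    s = math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))
    critical = t_crit * s / math.sqrt(n)
    return x_bar - x_true, critical, abs(x_bar - x_true) > critical


diff, crit, biased = bias_test([38.9, 37.4, 37.1], 38.9, t_crit=4.30)  # 95%, 2 dof
print(f"mean - xt = {diff:.2f}%, ts/sqrt(N) = {crit:.2f}%, systematic error indicated: {biased}")
```

Since |x̄ − xt| = 1.10% is smaller than ts/√N ≈ 2.39%, the null hypothesis is not rejected at the 95% level: these three results give no evidence of systematic error.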
Are two sets of measurements significantly different?

Suppose two samples are analysed under identical conditions.

Sample 1 → $\bar{x}_1$ from N1 replicate analyses
Sample 2 → $\bar{x}_2$ from N2 replicate analyses

Are these significantly different?

Using the definition of the pooled standard deviation, the equation for testing a hypothesis can be re-arranged (with N1 + N2 − 2 degrees of freedom):

$$\bar{x}_1 - \bar{x}_2 = \pm t s_{\text{pooled}} \sqrt{\frac{N_1 + N_2}{N_1 N_2}}$$

Only if the difference between the two means is greater than the term on the right-hand side can we conclude that there is a systematic error.

Test for significant difference between two sets of data

Two different methods for the analysis of boron in plant samples gave the following results (µg/g):
(spectrophotometry) $\bar{x}_1$ = 28.0 µg/g
(fluorimetry) $\bar{x}_2$ = 26.25 µg/g

Each is based on 5 replicate measurements.

At the 99% confidence level, are the mean values significantly different?
Given s_pooled = 0.267.
There are 8 degrees of freedom, therefore (from the table) t = 3.36 (99% level).
The level for rejecting the null hypothesis is

$$\pm t s_{\text{pooled}} \sqrt{\frac{N_1 + N_2}{N_1 N_2}} = \pm (3.36)(0.267)\sqrt{\frac{10}{25}} = \pm 0.5674$$

i.e. ±0.57 µg/g.
But $\bar{x}_1 - \bar{x}_2$ = 28.0 − 26.25 = 1.75 µg/g,

$$\text{i.e. } \bar{x}_1 - \bar{x}_2 > t s_{\text{pooled}} \sqrt{\frac{N_1 + N_2}{N_1 N_2}}$$

Therefore, at this confidence level, there is a significant difference, and there must be a systematic error in at least one of the methods of analysis.
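The same comparison as a short Python sketch, reusing the values given above (s_pooled = 0.267, t = 3.36 for 8 degrees of freedom at 99%); `means_differ` is a hypothetical helper.

```python
import math


def means_differ(x1, x2, n1, n2, s_pooled, t_crit):
    """Compare |x1 - x2| with t * s_pooled * sqrt((N1 + N2) / (N1 * N2))."""
    critical = t_crit * s_pooled * math.sqrt((n1 + n2) / (n1 * n2))
    difference = abs(x1 - x2)
    return difference, critical, difference > critical


diff, crit, significant = means_differ(28.0, 26.25, 5, 5, s_pooled=0.267, t_crit=3.36)
print(f"|x1 - x2| = {diff:.2f}, critical = ±{crit:.2f}, significant: {significant}")
```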
Detection of Gross Errors

A set of results may contain an outlying result, i.e. one out of line with the others.
Should it be retained or rejected?
There is no universal criterion for deciding this.
One rule that can give guidance is the Q test.

Consider a set of results. The parameter Q_exp is defined as follows:

$$Q_{\text{exp}} = \frac{|x_q - x_n|}{w}$$

where x_q = questionable result
x_n = its nearest neighbour
w = spread of the entire set

Q_exp is then compared to a set of critical values Q_crit:

Q_crit (reject if Q_exp > Q_crit)

No. of observations   90%     95%     99% confidence level
3                     0.941   0.970   0.994
4                     0.765   0.829   0.926
5                     0.642   0.710   0.821
6                     0.560   0.625   0.740
7                     0.507   0.568   0.680
8                     0.468   0.526   0.634
9                     0.437   0.493   0.598
10                    0.412   0.466   0.568

Rejection of the outlier is recommended if Q_exp > Q_crit for the desired confidence level.

Notes:
1. The higher the confidence level, the less likely it is that rejection will be recommended.
2. Rejection of outliers can have a marked effect on the mean and standard deviation, especially when there are only a few data points. Always try to obtain more data.
3. If outliers are to be retained, it is often better to report the median value rather than the mean.
Q Test for Rejection of Outliers at the 95% Level

The following values were obtained for the concentration of nitrite ions in a sample of river water: 0.403, 0.410, 0.401, 0.380 mg/L. Should the last reading be rejected at the 95% level?

Q Test for Rejection of Outliers at the 95% Level

Suppose 3 further measurements are taken, giving a total set of values: 0.403, 0.410, 0.401, 0.380, 0.400, 0.413, 0.411 mg/L. Should 0.380 still be retained at the 95% level?
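Both nitrite questions can be worked with a small Q-test sketch. The helper `q_test` is hypothetical; it uses the 95% Q_crit column from the table in this lecture and takes the nearest neighbour as the closest other value to the questionable result.

```python
# Q_crit at the 95% confidence level, from the lecture's table (3-10 observations)
Q_CRIT_95 = {3: 0.970, 4: 0.829, 5: 0.710, 6: 0.625,
             7: 0.568, 8: 0.526, 9: 0.493, 10: 0.466}


def q_test(values, suspect, q_crit=Q_CRIT_95):
    """Return (Q_exp, reject?) for the questionable result `suspect`."""
    w = max(values) - min(values)  # spread of the entire set
    others = list(values)
    others.remove(suspect)  # drop one copy of the questionable value
    nearest = min(others, key=lambda x: abs(x - suspect))  # nearest neighbour
    q_exp = abs(suspect - nearest) / w
    return q_exp, q_exp > q_crit[len(values)]


first = [0.403, 0.410, 0.401, 0.380]      # mg/L
extended = first + [0.400, 0.413, 0.411]  # after 3 further measurements

for data in (first, extended):
    q_exp, reject = q_test(data, 0.380)
    print(f"N = {len(data)}: Q_exp = {q_exp:.3f}, Q_crit = {Q_CRIT_95[len(data)]}, reject: {reject}")
```

By this sketch, with four results Q_exp = 0.021/0.030 = 0.700 < 0.829, so 0.380 is retained; with seven results Q_exp = 0.020/0.033 ≈ 0.606 > 0.568, so rejection of 0.380 is recommended at the 95% level.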
