Lecture 2-Data Analysis - Part 1
Chapter 2
Statistics in Analytical Chemistry
Introduction
• All measurements involve errors and uncertainties.
• Example: Errors involved in a titration
chem.uiuc.edu
chem-ilp.net
Important terms
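• Mean: $\bar{X}$ is the numerical average: $\bar{X} = \frac{\sum_{i=1}^{N} X_i}{N}$, where $X_i$ are the individual measurements and N is the number of measurements.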
Important terms
• Median:
– Xmed is the middle value when the data are ordered from the smallest to
the largest value.
– Odd number of measurements: the median is the middle value.
– Even number of measurements: the median is the average of the (n/2)th and
the (n/2 + 1)th ordered values, where n is the number of measurements.
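A minimal sketch of both terms, using Python's standard `statistics` module. The six values are the replicate results that appear in the relative-error example of this lecture; the variable names are illustrative only.

```python
from statistics import mean, median

# Six replicate results reused from the relative-error example in this lecture.
results = [19.4, 19.5, 19.6, 19.8, 20.1, 20.3]

x_bar = mean(results)    # sum of the values divided by N
x_med = median(results)  # N is even, so the two middle ordered values are averaged

print(f"mean = {x_bar:.1f}, median = {x_med:.1f}")
```

With N = 6 the median is the average of the 3rd and 4th ordered values (19.6 and 19.8), while the mean rounds to 19.8.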
Important terms
• Precision: refers to the closeness of results
obtained from replicate measurements made in exactly the same way →
describes reproducibility.
Significant figures
• Significant figures: the number of digits reported in a
measurement reflects the accuracy of the measurement and
the precision of the measuring device.
• Significant figures are all certain figures plus one extra figure
having some uncertainty.
• Example:
Significant figures
• Rule 1: Disregard all initial zeros; all remaining digits, including
terminal zeros and zeros between nonzero digits, are
significant.
Significant figures
• Rule 2: For addition and subtraction, the term with the fewest
digits to the right of the decimal point sets the number of decimal places in the result.
• Examples:
1.362 + 3.111 = 4.473
22.989 770 + 35.453 = 58.442 770 → 58.443 (the final digits 770 are not significant; rounding up)
• Rule for rounding to drop all insignificant digits: round up for digits ≥ 5,
round down for digits < 5.
• Exercises:
1) Rounding to 3 significant figures: 0.135 2; 0.0216 74
2) Write answer with the correct number of digits: 12.3 – 1.63 =;
1.021 + 1.63 =
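A minimal sketch of Rule 2 and the rounding rule, assuming the numbers are supplied as strings so that their decimal places are preserved. The helper names `decimal_places` and `add_with_sig_figs` are hypothetical; `ROUND_HALF_UP` from the standard `decimal` module matches the "round up for digits ≥ 5" convention stated above.

```python
from decimal import Decimal, ROUND_HALF_UP


def decimal_places(text: str) -> int:
    """Number of digits to the right of the decimal point in a numeric string."""
    exponent = Decimal(text).as_tuple().exponent
    return max(0, -exponent)


def add_with_sig_figs(*terms: str) -> Decimal:
    """Add the terms, then keep only as many decimals as the least precise term."""
    places = min(decimal_places(t) for t in terms)
    total = sum(Decimal(t) for t in terms)
    quantum = Decimal(1).scaleb(-places)  # e.g. 0.001 for three decimal places
    return total.quantize(quantum, rounding=ROUND_HALF_UP)


# The second worked example from this slide: 22.989770 + 35.453
print(add_with_sig_figs("22.989770", "35.453"))
```

Strings are used instead of floats because a float cannot record how many decimals were actually written down.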
Significant figures
• Rule 3: For multiplication and division, the factor with the fewest
significant digits determines the number of significant digits in the result.
• Examples:
• Exercise:
Write answer with the correct number of digits: 4.34 × 9.2 = 39.928
Significant figures
• Rule 4:
– Number of digits in mantissa of log x = number of significant figures in
x
• Example:
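A minimal sketch of Rule 4. The helper name `log10_with_sig_figs` and the input value 339 (three significant figures) are illustrative assumptions, not taken from the slide.

```python
import math


def log10_with_sig_figs(x: float, sig_figs: int) -> str:
    """Format log10(x) so that the mantissa (the digits after the decimal
    point) has as many digits as x has significant figures."""
    return f"{math.log10(x):.{sig_figs}f}"


# 339 has three significant figures, so the mantissa keeps three digits.
print(log10_with_sig_figs(339, 3))
```

The digit before the decimal point (the characteristic) only locates the decimal point of x, so it does not count toward the significant figures.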
Errors
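• Relative error Er: is a more useful quantity than the absolute
error. It is the difference between the measured value $x_i$ and the true value $x_t$, divided by the true value:
$E_r = \frac{x_i - x_t}{x_t}$ or, in percent, $E_r = \frac{x_i - x_t}{x_t} \times 100\%$
• Example: 1st: 19.4; 2nd: 19.5; 3rd: 19.6; 4th: 19.8; 5th: 20.1; 6th: 20.3
Mean = 19.8
Relative error for the mean (true value 20.0):
$E_r = \frac{19.8 - 20.0}{20.0} \times 100\% = -1\%$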
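A minimal sketch of the relative-error calculation, dividing by the true value as in the usual definition. The true value 20.0 is read from the worked calculation above; the function name `relative_error_percent` is hypothetical.

```python
def relative_error_percent(measured: float, true_value: float) -> float:
    """Relative error in percent: (measured - true) / true * 100."""
    return (measured - true_value) / true_value * 100.0


results = [19.4, 19.5, 19.6, 19.8, 20.1, 20.3]
mean_result = sum(results) / len(results)

# True value of 20.0 taken from the worked example.
e_r = relative_error_percent(mean_result, 20.0)
print(f"Er = {e_r:.0f}%")
```

The negative sign shows that the mean lies below the true value.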
Errors
• Results can be precise without being accurate or accurate
without being precise.
Fundamentals of analytical chemistry, Skoog, D. A
Errors
• Every measurement has some uncertainty, called
experimental error.
• Experimental error is classified as systematic or random.
• Systematic errors:
– Also called determinate errors; they arise from a flaw in equipment or in the
design of an experiment. If you conduct the experiment again in
exactly the same manner, the error is reproducible.
– In principle, systematic error can be discovered and corrected.
Random errors
• Random errors:
– Also called indeterminate errors; they arise from uncontrolled variables in
the measurement.
– Random error has an equal chance of being positive or negative.
– It is always present and is very difficult to correct.
– Example: Reading a scale
Gross errors
• Gross errors:
– Gross errors differ from indeterminate and determinate errors. They
usually occur only occasionally, are often large and may cause a result
to be either high or low.
Statistical treatment of random errors
• Random or indeterminate errors exist in every measurement.
Statistical treatment of random errors
• Distribution of random errors:
– Example: Calibration of a 10 mL pipet, replicated 50 times.
Properties of Gaussian curve
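• Difference between the sample mean $\bar{X}$ and the population mean $\mu$:
$\bar{X} = \frac{\sum_{i=1}^{N} X_i}{N}$ (N = number of measurements in the sample)
$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$ (N = total number of measurements in the population)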
Properties of Gaussian curve
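• Population standard deviation $\sigma$: is a measure of the precision
of a population of data.
$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$, where N represents the number of data points that make up the population.
• If $z = \frac{X - \mu}{\sigma}$, the Gaussian curve is $y = \frac{1}{\sigma\sqrt{2\pi}} e^{-z^2/2}$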
Properties of Gaussian curve
• Area under the Gaussian curve: the area between a pair of limits gives
the probability that a measured value falls between those limits.
– Example: calculate the probability of a measured value lying within ± σ of the mean.
$\text{area} = \int_{\mu-\sigma}^{\mu+\sigma} \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2/2\sigma^2}\,dx = \int_{-1}^{+1} \frac{1}{\sqrt{2\pi}} e^{-z^2/2}\,dz = 0.683$
– ~ 68.3% of the values will lie within ± σ (z = ± 1).
– ~ 99.7% of the values will lie within ± 3σ (z = ± 3).
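The areas quoted above can be checked numerically. A minimal sketch, using the identity that the area of the standard Gaussian between z = −k and z = +k equals erf(k/√2); the function name `area_within` is hypothetical.

```python
import math


def area_within(k: float) -> float:
    """Area under the standard Gaussian curve between z = -k and z = +k."""
    return math.erf(k / math.sqrt(2.0))


for k in (1, 2, 3):
    print(f"within ±{k}σ: {area_within(k):.3f}")
```

This reproduces 0.683 for ± σ and 0.997 for ± 3σ; the intermediate case ± 2σ gives about 0.954.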
Properties of Gaussian curve
• The area under the entire Gaussian curve = 1 → 100% of the values
making up the population will lie within ±∞.
Sample standard deviation
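• Sample standard deviation s (absolute standard deviation):
$s = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N-1}} = \sqrt{\frac{\sum_{i=1}^{N} d_i^2}{N-1}}$, where $d_i = X_i - \bar{X}$
$s = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N-1}} = \sqrt{\frac{\sum_{i=1}^{N} X_i^2 - \frac{\left(\sum_{i=1}^{N} X_i\right)^2}{N}}{N-1}}$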
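A minimal sketch comparing the two forms of s with the standard library's `statistics.stdev`, which also uses N − 1 in the denominator. The data are the six replicate results from the relative-error example; the function names are hypothetical.

```python
import math
from statistics import stdev


def s_from_deviations(xs):
    """s from the deviations d_i = X_i - mean."""
    n = len(xs)
    x_bar = sum(xs) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))


def s_from_sums(xs):
    """s from the sum of squares and the square of the sum."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    return math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))


results = [19.4, 19.5, 19.6, 19.8, 20.1, 20.3]
s1 = s_from_deviations(results)
s2 = s_from_sums(results)
print(f"s = {s1:.3f} (deviations), {s2:.3f} (sums), {stdev(results):.3f} (library)")
```

The second form subtracts two large, nearly equal numbers, so it should be evaluated without rounding the intermediate sums.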
Sample standard deviation
• Sample standard deviation s:
– Example:
Sample standard deviation
• Pooling data to increase the reliability of s:
– The pooled estimate of σ, $s_{pooled}$, is a weighted average of the individual
estimates.
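A minimal sketch of pooling, assuming the usual definition: the squared deviations of every data set from its own mean are summed and divided by the total number of measurements minus the number of data sets. The function name `pooled_std` and the two small data sets are hypothetical, not from the lecture.

```python
import math


def pooled_std(*data_sets):
    """Pooled standard deviation of several data sets (assumed definition:
    total sum of squared deviations / (total N - number of sets))."""
    total_ss = 0.0
    total_n = 0
    for xs in data_sets:
        x_bar = sum(xs) / len(xs)
        total_ss += sum((x - x_bar) ** 2 for x in xs)
        total_n += len(xs)
    degrees_of_freedom = total_n - len(data_sets)  # one lost per set mean
    return math.sqrt(total_ss / degrees_of_freedom)


# Hypothetical replicate results for two different samples.
set_a = [10.1, 10.3, 10.2]
set_b = [15.4, 15.6]
print(f"s_pooled = {pooled_std(set_a, set_b):.3f}")
```

Each set is compared with its own mean, so samples with different concentrations can still be pooled as long as the method's precision is the same for all of them.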
Sample standard deviation
• Variance ($s^2$): can be used to describe the precision of the
data.
$s^2 = \frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N-1} = \frac{\sum_{i=1}^{N} d_i^2}{N-1}$
• Relative standard deviation (RSD): $RSD = \frac{s}{\bar{X}}$
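A minimal sketch of the variance and RSD, using the standard `statistics` module (its `variance` and `stdev` both divide by N − 1). The data are the six replicate results from the relative-error example.

```python
from statistics import mean, stdev, variance

results = [19.4, 19.5, 19.6, 19.8, 20.1, 20.3]

s = stdev(results)        # sample standard deviation
s2 = variance(results)    # sample variance, equal to s squared
rsd = s / mean(results)   # relative standard deviation as a fraction

print(f"s^2 = {s2:.3f}, RSD = {rsd:.4f} ({rsd * 100:.1f}%)")
```

Multiplying the RSD by 100 expresses it as a percentage of the mean.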
Error propagation
• Addition/Subtraction:
If $y = a + b - c$, then $s_y = \sqrt{s_a^2 + s_b^2 + s_c^2}$
• Example:
Standard deviation of the result:
• Multiplication/Division:
If $y = a \times b / c$, then $\frac{s_y}{y} = \sqrt{\left(\frac{s_a}{a}\right)^2 + \left(\frac{s_b}{b}\right)^2 + \left(\frac{s_c}{c}\right)^2}$
• Example:
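A minimal sketch of both propagation rules. The values of a, b, c and their standard deviations are illustrative assumptions, not recovered from the slides; the function names are hypothetical.

```python
import math


def s_sum(*abs_sds):
    """Sums and differences: absolute standard deviations add in quadrature."""
    return math.sqrt(sum(s ** 2 for s in abs_sds))


def rel_s_product(*value_sd_pairs):
    """Products and quotients: relative standard deviations add in quadrature.
    Returns s_y / y."""
    return math.sqrt(sum((s / v) ** 2 for v, s in value_sd_pairs))


# Illustrative values (assumed): each quantity with its standard deviation.
a, s_a = 0.50, 0.02
b, s_b = 4.10, 0.03
c, s_c = 1.97, 0.05

y_sum = a + b - c
s_y_sum = s_sum(s_a, s_b, s_c)

y_prod = a * b / c
rel_prod = rel_s_product((a, s_a), (b, s_b), (c, s_c))
s_y_prod = y_prod * rel_prod  # convert the relative value back to absolute

print(f"a + b - c = {y_sum:.2f} ± {s_y_sum:.2f}")
print(f"a * b / c = {y_prod:.2f} ± {s_y_prod:.2f}")
```

In both cases the largest individual uncertainty dominates the result, because the terms are squared before they are summed.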
Error propagation
• Exponential:
If $y = a^x$, then $\frac{s_y}{y} = x\,\frac{s_a}{a}$ (the exponent x can be considered
free of uncertainty).
• Example:
Error propagation
• Logarithm and antilogarithm:
If $y = \log x$, then $s_y = \frac{1}{\ln 10}\,\frac{s_x}{x} \approx 0.434\,29\,\frac{s_x}{x}$
If $y = 10^x$, then $\frac{s_y}{y} = (\ln 10)\,s_x \approx 2.302\,6\,s_x$
• Examples:
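A minimal sketch of the exponent, logarithm and antilogarithm rules. All numerical values are illustrative assumptions, and the function names are hypothetical.

```python
import math


def rel_s_power(a: float, s_a: float, x: float) -> float:
    """y = a**x with an exact exponent: returns s_y / y = |x| * s_a / a.
    abs() keeps the result non-negative for negative exponents."""
    return abs(x) * s_a / a


def s_log10(x: float, s_x: float) -> float:
    """y = log10(x): returns the absolute standard deviation s_y."""
    return s_x / (x * math.log(10))


def rel_s_antilog10(s_x: float) -> float:
    """y = 10**x: returns the relative standard deviation s_y / y."""
    return math.log(10) * s_x


# Illustrative values (assumed).
print(f"power:   s_y/y = {rel_s_power(4.0, 0.2, 3):.3f}")
print(f"log:     s_y   = {s_log10(2.00e-4, 0.02e-4):.4f}")
print(f"antilog: s_y/y = {rel_s_antilog10(0.003):.4f}")
```

Note the asymmetry: a relative uncertainty in x becomes an absolute uncertainty in log x, and an absolute uncertainty in x becomes a relative uncertainty in 10^x.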
Confidence intervals (CI)
• Confidence interval for the mean is the range of values within
which the population mean µ is expected to lie with a certain
probability.
• Example: 99% probable that the true population mean for a
set of calcium measurements lies in the interval 7.25% ±
0.15% Ca. Thus, the mean should lie in the interval from
7.10% to 7.40% Ca with 99% probability.
CI when σ is known or s is a good approximation of σ
CI when σ is unknown
• Often, limitations in time or in the amount of available sample
prevent us from assuming that s is a good estimate of σ.
• Use the statistical parameter t (Student’s t), which is defined in
exactly the same way as z except that s is substituted for σ.
• For a single measurement with result x: $t = \frac{x - \mu}{s}$
• For the mean of N measurements: $t = \frac{\bar{x} - \mu}{s/\sqrt{N}}$
• CI for the mean of N replicate measurements: $\text{CI for } \mu = \bar{x} \pm \frac{ts}{\sqrt{N}}$
Note: t depends on the desired confidence level and the number of degrees
of freedom (N-1) in the calculation of s.
CI when σ is unknown
• Values of t at different degrees of freedom and confidence
levels:
CI when σ is unknown
• Example 1: A chemist obtained the following data for the
alcohol content of a sample of blood: % C2H5OH: 0.084, 0.089,
and 0.079. Calculate the 95% confidence interval for the mean,
assuming:
(a) The three results obtained are the only indication of the precision of
the method.
(b) From previous experience on hundreds of samples, we know that the
standard deviation of the method, s = 0.005% C2H5OH, is a good
estimate of σ.
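A minimal sketch of how both parts of Example 1 could be worked. Two values are assumptions supplied here rather than read from the slides: t = 4.303 (Student's t for 95% confidence and 2 degrees of freedom) and z = 1.96 (95% confidence when σ is known). Part (b) also assumes the usual form of the interval for known σ, mean ± zσ/√N.

```python
import math
from statistics import mean, stdev

T_95_DOF2 = 4.303  # assumed: Student's t, 95% confidence, N - 1 = 2
Z_95 = 1.96        # assumed: z for 95% confidence when sigma is known

data = [0.084, 0.089, 0.079]  # % C2H5OH, from Example 1
n = len(data)
x_bar = mean(data)

# (a) s is estimated from the three results only, so t is used.
s = stdev(data)
half_width_a = T_95_DOF2 * s / math.sqrt(n)

# (b) s = 0.005% is taken as a good estimate of sigma, so z is used.
sigma = 0.005
half_width_b = Z_95 * sigma / math.sqrt(n)

print(f"(a) 95% CI = {x_bar:.3f} ± {half_width_a:.3f} % C2H5OH")
print(f"(b) 95% CI = {x_bar:.3f} ± {half_width_b:.3f} % C2H5OH")
```

The interval in (a) is about twice as wide as in (b), even though s happens to equal 0.005% in both cases, because t for 2 degrees of freedom is much larger than z.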