Module 1A Notes Introduction To Statistical Analysis For Chemistry Students

This document discusses types of errors in statistical analysis and chemistry measurements. It defines three main types of errors: 1) Random errors, which cause scattered data and affect precision. They result from small unpredictable factors. 2) Systematic errors, which cause data means to differ from true values and affect accuracy. They can arise from instrument flaws, imperfect methods, or personal limitations. 3) Gross errors, which are rare but large and can create outlier data points. The document also discusses ways to recognize and adjust for systematic errors, including analyzing standard samples. Personal errors can be minimized through care and self-discipline.

Module 1 – INTRODUCTION TO STATISTICAL ANALYSIS FOR CHEMISTRY STUDENTS

Definition of Terms

deviation – the difference between a value in a frequency distribution and a fixed number such as the mean.
normalized equation – an equation rescaled to simplify the expression. A variable is normalized by first subtracting the minimum value from the variable, then subtracting the minimum value from the maximum value, and finally dividing the first result by the second, that is, (x − x_min)/(x_max − x_min).
calibration – in measurement technology and metrology, the comparison of measurement values delivered by a device under test with those of a calibration standard of known accuracy (Wikipedia).
frequency – the number of occurrences of a repeating event per unit of time.
normal distribution function – a probability function that describes how the values of a variable are distributed. It is a symmetric distribution where most of the observations cluster around the central peak and the probabilities for values further away from the mean taper off equally in both directions.
degrees of freedom – the number of measurements minus 1 (N − 1).

Precision and Accuracy

Mean, arithmetic mean, and average (x̄) are synonyms.

$\bar{x} = \frac{\sum_{i=1}^{N} x_i}{N}$

where x_i represents the individual values of x making up a set of N replicate measurements.

Median is the middle result when replicate data are arranged in order of size.
• For an odd number of data points, the median is the middle value. For an even number, the mean of the middle pair is used.
• The mean of two or more measurements is their average value.
• The median is used advantageously when a set of data contains an outlier, a result that differs significantly from others in the set.

Precision is determined by repeating the measurement on replicate samples, and describes the reproducibility of measurements, that is, the closeness of results to each other.

Precision is a function of the deviation from the mean, d_i, or just the deviation, which is defined as

$d_i = |x_i - \bar{x}|$

Three terms describe the precision of a set of replicate data: standard deviation, variance, and coefficient of variation.

Accuracy indicates the closeness of the measurement to its true or accepted value and is expressed by the error.

The absolute error, E, in the measurement of a quantity x_i is given by the equation

$E = x_i - x_t$

where x_t is the true, or accepted, value of the quantity. Note that the sign is retained when stating the error.

Relative error is expressed in parts per thousand (ppt).

$E_r = \frac{x_i - x_t}{x_t} \times 1000\ \text{ppt}$

Classifying Experimental Errors

The graph suggests that chemical analyses are affected by at least two types of errors.

• One type, called random (or indeterminate) error, causes data to be scattered more or less symmetrically around a mean value.
• Scattering of data (caused by the random error) for analysts 1 and 3 is significantly less than that for analysts 2 and 4.
• A second type of error, called systematic (or determinate) error, causes the mean of a set of data to differ from the true value (bias).
• The results of analysts 1 and 2 show less systematic error than the data of analysts 3 and 4.
• Random, or indeterminate, errors affect the precision of measurement.
• Systematic, or determinate, errors affect the accuracy of results.
• A third type of error is gross error. Gross errors usually occur only occasionally, are often large, and may cause outliers, results that appear to differ markedly from all other data in a set of replicate measurements.

How Do Systematic Errors Arise?

• Three types of systematic errors:
– Instrument errors are caused by imperfections in measuring devices and instabilities in their components.
– Method errors arise from non-ideal chemical or physical behavior of analytical systems.
– Personal errors result from the carelessness, inattention, or personal limitations of the experimenter.
• Instrument Errors
– Systematic error exists in all measuring devices. For example, pipets, burets, and volumetric flasks may hold or deliver volumes slightly different from those indicated by their graduations.
– These differences arise from using glassware at a temperature that differs significantly from the calibration temperature, from distortions in container walls due to heating while drying, from errors in the original calibration, or from contaminants or scratches on the inner surfaces of the containers.
– Electronic instruments are subject to instrumental systematic errors. For example, errors emerge as the voltage of a battery-operated power supply decreases with use.
– Instrument errors are detectable and correctable. Periodic calibration eliminates most systematic errors of this type.
• Method Errors
– The non-ideal chemical or physical behavior of the reagents and reactions upon which an analysis is based often introduces systematic method errors.
– Such sources of non-ideality include the slowness and incompleteness of reactions, the instability of species, the non-specificity of most reagents, and possible interference.
– Errors inherent in a method are often difficult to detect and are thus the most serious of the three types of systematic error.
• One or more of the following steps can be used to recognize and adjust for a systematic error in an analytical method.
– Analyzing standard reference materials (SRMs), commonly known as check samples, is the best way to estimate the bias of an analytical method.
– If standard samples are not available, a second independent and reliable analytical method can be used in parallel with the method being evaluated. A statistical test must be used to determine whether any difference is a result of random errors in the two methods or due to bias in the method under study.
– Blank determinations are useful for detecting certain types of constant errors. They reveal such errors and allow the data to be corrected. In a blank determination, or blank, all steps of the analysis are performed without the sample. The results from the blank are then applied as a correction to the sample measurements.
• Personal Errors
– Personal errors arise from the personal limitations of an analyst.
– Measurements require personal judgments. An analyst who is insensitive to color changes tends to use excess reagent in a volumetric analysis. Physical disabilities are often sources of personal determinate errors.
– Prejudice is a universal source of personal error. Number bias is another source of personal error that varies considerably from person to person. Color blindness amplifies personal errors in a volumetric analysis.
– Most personal errors can be minimized by care and self-discipline.

The Nature of Random Errors
Examples of causes of random errors are:
• Different weights for every positioning of the body on the weighing scale.
• Measuring height is affected by minor posture changes.
• Measuring wind velocity depends on the height and time at which a measurement is taken. Multiple readings must be taken and averaged because gusts and changes in direction affect the value.
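
One common way to picture random error is as the net effect of several small contributions, each equally likely to be positive or negative. The following is a minimal sketch under that assumption, using four contributions of the same hypothetical magnitude U; the names U, combinations, and frequency are illustrative and do not come from the module. It enumerates every combination and counts how often each net deviation occurs.

```python
from collections import Counter
from itertools import product

U = 1.0  # hypothetical magnitude of each individual error (arbitrary units)

# Assumption: each of the four errors is either -U or +U with equal likelihood.
combinations = list(product((-U, +U), repeat=4))

# Tally how many combinations produce each net deviation from the mean.
frequency = Counter(sum(combo) for combo in combinations)

for deviation in sorted(frequency):
    ways = frequency[deviation]
    share = ways / len(combinations)
    print(f"net deviation {deviation:+.0f}U: {ways} way(s), relative frequency {share:.4f}")
```

Zero net deviation arises in the most ways (6 of 16), while the extremes of ±4U arise in only one way each. This is the bell-shaped pattern that becomes smoother as the number of contributions grows.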
Random errors are caused by many small but uncontrollable variables that accumulate.
– The table below gives all the possible ways in which four errors can combine to give the indicated deviations from the mean value.
– A sufficiently large number of measurements can be expected to have a frequency distribution. The plot of this distribution is called a Gaussian curve or a normal error curve, where the most frequent occurrence is zero deviation from the mean.
– The spread of data results directly from an accumulation of all random uncertainties in the experiment. The spread in a set of replicate measurements can be defined as the difference between the highest and lowest result.

Common Parameters to Express Reliability of Data

Statistical methods allow us to characterize data and to make objective and intelligent decisions about data quality and interpretation.
Gaussian curves can be described by the normalized equation below, which contains two parameters, the population mean, µ, and the population standard deviation, σ. Data values such as x and y are variables.

$y = \frac{e^{-(x - \mu)^2 / 2\sigma^2}}{\sigma\sqrt{2\pi}}$

A population mean, µ, is

$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$

where N approaches infinity, while a sample mean, x̄, is

$\bar{x} = \frac{\sum_{i=1}^{N} x_i}{N}$

where N is a finite number of measurements. The mean is an average of a group characteristic (an item of interest).
Important note: The difference between µ and x̄ decreases rapidly as N increases and is usually negligible once N exceeds about 30 measurements.

The population standard deviation, σ, is a measure of the precision or scatter of a population of data, which is given by the equation

$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$

where N is the number of data points making up the population.

– The two populations of data in the figure above differ only in their standard deviations.
– The standard deviation for the data set yielding the broader but lower curve B is twice that for the measurements yielding curve A, which means the precision of the data leading to curve A is twice as good as that of the data represented by curve B.

In chemical analysis settings, we rarely have the luxury of taking hundreds of measurements to characterize a single system; it is usually not a practical use of time and resources. Usually, we are dealing with a much smaller data set (generally N < 30). As a result, σ must be replaced by the sample standard deviation, s, which estimates σ and is given by the equation

$s = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}}$

or

$s = \sqrt{\frac{\sum_{i=1}^{N} d_i^2}{N - 1}}$

where the quantity N − 1 is called the number of degrees of freedom.

– The number of degrees of freedom indicates the number of independent results that enter into the computation of the standard deviation.
– When N − 1 is used instead of N, s is said to be an unbiased estimator of the population standard deviation.

The sample variance, s², is an estimate of the population variance, σ².
Chemists frequently quote standard deviations in relative rather than absolute terms. We calculate the relative standard deviation by dividing the standard deviation by the mean of the data set.

Relative standard deviation:

$RSD = \frac{s}{\bar{x}}$

Coefficient of variation:

$CV = \frac{s}{\bar{x}} \times 100\%$

Example 1
The following results were obtained in the replicate determination of the lead content of a blood sample: 0.752, 0.756, 0.752, 0.751, and 0.760 ppm Pb. Calculate the mean and the standard deviation of this data set (Answer: x̄ = 0.754 ppm; s = 0.00377 ≈ 0.004 ppm Pb).

The sample standard deviation describes the degree to which a single measurement within the sample differs from the sample mean (e.g. 0.754 ppm ± 0.004 ppm). Another parameter, the standard error of the mean, s_m, is used to estimate how far the sample mean is likely to be from the population mean and is given by the equation

$s_m = \frac{s}{\sqrt{N}}$

It can be seen from the formula that the standard error of the mean, s_m, decreases as N increases. This parameter tells us how the mean varies with different experiments measuring the same quantity. Thus, if the effect of random changes is significant, then s_m will be higher. If there is no change in the points as experiments are repeated, then s_m is zero.

Pooling of Data to Improve the Reliability of s

The rapid improvement in the reliability of s with increases in N makes it feasible to obtain a good approximation of σ when the method of measurement is not excessively time consuming and when an adequate supply of sample is available.
To obtain a pooled estimate of the standard deviation, s_pooled, deviations from the mean for each subset are squared; the squares of all subsets are then summed and divided by an appropriate number of degrees of freedom.

$s_{pooled} = \sqrt{\frac{\sum_{i=1}^{N_1} (x_i - \bar{x}_1)^2 + \sum_{j=1}^{N_2} (x_j - \bar{x}_2)^2 + \cdots}{N_1 + N_2 + \cdots - N_t}}$

where N_1 is the number of results in set 1, N_2 is the number in set 2, and so forth. The term N_t is the number of data sets that are pooled.

Example 2
The mercury in samples of seven fish taken from Chesapeake Bay was determined by a method based on the absorption of radiation by gaseous elemental mercury. Calculate a pooled estimate of the standard deviation for the method. Refer to the attachment “Module1Exercise2” for data and calculation.

Combining Uncertainties in Calculations – Propagation of Uncertainty
We often need to estimate the standard deviation of a result that has been computed from two or more experimental data points, each of which has a known sample standard deviation.

Example 3
Calculate the standard deviation of the result. Refer to the attachment “Module1Exercise3” for the Example 3 calculation.

Confidence limits
Confidence limits are used to define a numerical interval (confidence interval) around the mean x̄ of a set of replicate analytical results within which the population mean µ can be expected to lie with a certain probability.

$\text{CL for } \mu = \bar{x} \pm \frac{z\sigma}{\sqrt{N}}$ (used when s is a good estimate of σ)

Exercise 4
Determine the 80% and 95% confidence intervals for (a) the first entry (1100.8 mg/L glucose) in Example 6-2 of the reference book and (b) the mean value (1100.3 mg/L) for month 1 in the example. Assume that in each part, s = 19 is a good estimate of σ.

$\text{CL for } \mu = \bar{x} \pm \frac{ts}{\sqrt{N}}$ (used when σ is unknown)
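
The confidence limit expression for an unknown σ can be evaluated numerically. The following is a minimal sketch that reuses the lead data from Example 1. It first reproduces the mean and sample standard deviation, then computes the standard error of the mean and a 95% confidence interval. The t value of 2.776 (95% confidence, N − 1 = 4 degrees of freedom) is an assumption taken from a standard Student's t table, and the variable names are illustrative.

```python
import math
import statistics

# Replicate lead results from Example 1, in ppm Pb.
results = [0.752, 0.756, 0.752, 0.751, 0.760]

n = len(results)
mean = statistics.mean(results)
s = statistics.stdev(results)   # sample standard deviation (N - 1 in the denominator)
s_m = s / math.sqrt(n)          # standard error of the mean

# Assumption: Student's t for 95% confidence and 4 degrees of freedom,
# read from a standard t table rather than computed here.
T_95_4DOF = 2.776

half_width = T_95_4DOF * s_m    # the "t s / sqrt(N)" term of the confidence limit

print(f"mean = {mean:.3f} ppm Pb")
print(f"s    = {s:.5f} ppm Pb")
print(f"s_m  = {s_m:.5f} ppm Pb")
print(f"95% CL for the mean = {mean:.3f} ± {half_width:.3f} ppm Pb")
```

The first two printed values agree with the answer quoted in Example 1 (x̄ = 0.754 ppm, s ≈ 0.00377 ppm). When s is a good estimate of σ, the same calculation applies with z in place of t.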
Exercise 5
A chemist obtained the following data for the alcohol content of a sample of blood; % C2H5OH: 0.084, 0.089, and 0.079. Calculate the 95% confidence interval for the mean assuming the three results obtained are the only indication of the precision of the method.

Refer to the attachment “Module1Exercise4&5” to answer exercises 4 and 5.

Detecting Gross Errors

A data point that differs excessively from the mean in a data set is termed an outlier. When a set of data contains an outlier, a decision must be made whether to retain or reject it. The Q test is a simple and widely used statistical test for making this decision.

$Q_{exp} = \frac{|x_q - x_n|}{w} = \frac{|x_q - x_n|}{|x_{high} - x_{low}|}$

where x_q is the questionable result, x_n is its nearest neighbor, and w is the spread of the entire data set. This ratio is then compared with rejection values Q_crit found in the table below. If Q_exp is greater than Q_crit, the questionable result can be rejected with the indicated degree of confidence.

Problem exercises:

1. Tom conducted an experiment using the GENSYS-20 and his results were consistently off from the actual absorbance for the wavelength. Is this a systematic or random error?
2. Claire decided to time her dog's lap times with a stopwatch. Her results varied over 10 trials. Why is this so?
3. A researcher measures the length of a particular steel bolt to be 24.35 cm. If the accepted value for the length of this steel bolt is 24.20 cm, what is the percent error of the researcher's measurement?
4. A spectrophotometer gives absorbance readings that are consistently higher than the actual absorbance of the materials being analyzed. What kind of systematic error is this?
5. An electronic balance lacks the ability to read a measured quantity as zero, so researchers must weigh by difference to determine more accurately the mass of a material. What is this inability to read zero called?
6. Susan measures the weight of a standard paper clip to be 0.97 grams. The accepted value for the mass of a paper clip is 1.05 grams. What is the percent error of Susan's measurement? Do you notice any peculiar differences between this percent error and the percent error found in problem 3?
7. If you had a beaker and some graphite, how would you weigh the exact amount of graphite using the weighing by difference procedure?

ANSWERS TO PROBLEM EXERCISES:

1. Since Tom must rely on the machine for an absorbance reading and it provides consistently different measurements, this is an example of systematic error.
2. The majority of Claire's variation in time can likely be attributed to random error such as fatigue after multiple laps, inconsistency in swimming form, slightly off timing in starting and stopping the stopwatch, or countless other small factors that alter lap times. To a much smaller extent, the stopwatch itself may have errors in keeping time, resulting in systematic error.
3. The researcher's percent error is about 0.62%.
4. This is known as multiplier or scale factor error.
5. This is called an offset or zero setting error.
6. Susan's percent error is -7.62%. This percent error is negative because the measured value falls below the accepted value. In problem 3, the percent error was positive because the measured value was higher than the accepted value.
7. You would first weigh the beaker itself. Then add the graphite to the beaker and weigh it again. Finally, subtract the weight of the beaker from the weight of the graphite plus the beaker.

LINKS for Lecture videos:
Module 1A Part 1
https://youtu.be/SUBnxinRM74
Module 1A Part 2
https://youtu.be/YTufn8Go7PY
Module 1B Part 1
https://www.youtube.com/watch?v=BKlB_iB4wp4
Module 1B Part 2
https://www.youtube.com/watch?v=4y2cjJ8Jpsg
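
As a closing illustration of the Q test described under Detecting Gross Errors, the following is a minimal sketch applied to the lead data from Example 1, treating the highest result (0.760 ppm) as the questionable one. The function name q_experimental is hypothetical. The critical value 0.710 (95% confidence, five observations) is an assumption taken from commonly tabulated Q values and should be checked against the table used in the course.

```python
def q_experimental(values, questionable):
    """Return Q_exp = |x_q - x_n| / |x_high - x_low| for one questionable result."""
    ordered = sorted(values)
    spread = ordered[-1] - ordered[0]          # w, the spread of the whole set
    position = ordered.index(questionable)
    # The nearest neighbor is whichever adjacent value in the sorted data is closer.
    neighbors = [ordered[i] for i in (position - 1, position + 1)
                 if 0 <= i < len(ordered)]
    gap = min(abs(questionable - x) for x in neighbors)
    return gap / spread


# Replicate lead results from Example 1, in ppm Pb.
results = [0.752, 0.756, 0.752, 0.751, 0.760]

# Assumption: tabulated rejection value for N = 5 at the 95% confidence level.
Q_CRIT_95_N5 = 0.710

q_exp = q_experimental(results, 0.760)
reject = q_exp > Q_CRIT_95_N5

print(f"Q_exp = {q_exp:.3f}, Q_crit = {Q_CRIT_95_N5}, reject = {reject}")
```

Here Q_exp is 0.004/0.009, or about 0.44, which is below the assumed Q_crit, so the 0.760 ppm result would be retained. The sketch does not guard against a data set whose values are all identical, where the spread is zero.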
