
One Dimensional Statistics

This document discusses various statistical concepts including levels of uncertainty, random and systematic errors, populations and samples, normal distributions, z-scores, t-tests, and ANOVA. Key topics covered include defining probability based on measurement error and random variables, the differences between populations and samples, calculating means, standard deviations, and combining uncertainties when working with multiple measured values.

Uploaded by

dhbash ALKALI
Copyright
© All Rights Reserved

One-dimensional statistics

Prof Veruscha Fester


7 October 2020
The challenge of statistics is to:
• Define the levels of uncertainty, based on measurement error and on probabilities related to randomly distributed values.
• Such errors are called random errors. They differ from systematic errors, which result from some bias in the measurement technique (e.g. a calibration error).
• The main difference between a population and a sample has to do with how observations are assigned to the data set:
• A population includes all of the elements of a set of data.
• A sample consists of one or more observations drawn from the population.
Introduction
• A research paper reports a distance measurement of 10.5 m.
• The implication is that:
• The measurement is accurate to ±0.05 m, i.e. the true value lies within 10.5 ± 0.05 m;
• The measurement instrument has been calibrated;
• The measurement instrument is capable of resolving measurements to this accuracy;
• A measurement of 10.7 m is significantly different from the result stated.
• Plotting a histogram of the same quantity measured several times shows the slightly different results that are recorded.
• Reporting a single measurement therefore carries inherent information about the accuracy of both the measurement and the measurement system.
5% Probability estimate
• This is a common measure in statistics – a 'rule of thumb'.
• 5% of all the measured values will lie outside a range of values centred on the mean value.
• 95% of the measurements will lie within this range.
• This probability value is a measure of the random, symmetrical distribution of measured values about the mean value.
• Assuming a normal (random) distribution about the mean value (µ), less than 5% of the measurements will lie outside the range of ± two standard deviations (σ) from the mean.
• On average, about 2.5% will have values greater than µ + 2σ and about 2.5% will have values smaller than µ − 2σ.
• In situations where a 5% probability of error is unacceptably large, a smaller probability might be mandated.
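The 'less than 5%' figure can be checked numerically. The sketch below assumes an ideal normal distribution and uses the standard identity P(|X − µ| > kσ) = erfc(k/√2); the helper name `fraction_outside` is illustrative.

```python
import math

def fraction_outside(k: float) -> float:
    """Fraction of a normal distribution lying more than k standard
    deviations from the mean (both tails combined)."""
    return math.erfc(k / math.sqrt(2.0))

for k in (1.0, 1.96, 2.0, 3.0):
    two_tail = fraction_outside(k)
    print(f"±{k}σ: {two_tail:.4f} outside, {two_tail / 2:.4f} in each tail")
```

For k = 2 this gives about 0.046 (roughly 2.3% in each tail), which is why the text says "less than 5%"; the exact 5% boundary (2.5% per tail) sits at about ±1.96σ.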
Normal distribution curve
Acceptable probabilities and risk
An explosives company has a detonation device for large-scale mining operations. A 5% probability of detonation failure is mandated by the mine operator. There is then a greater than 50% chance (in fact about 99%, since 1 − 0.95¹⁰⁰ ≈ 0.994 if the charges fail independently) that one or more charges will remain undetonated in a set of 100 explosive charges. This is likely to be unacceptable.
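The detonation figure follows from treating the 100 charges as independent, each failing with probability 0.05 (an assumption; the slide does not state it). The chance that at least one of n items fails is then 1 − (1 − p)ⁿ; the function name is illustrative.

```python
def prob_at_least_one_failure(p_fail: float, n: int) -> float:
    """Probability that at least one of n independent items fails,
    when each item fails with probability p_fail."""
    return 1.0 - (1.0 - p_fail) ** n

print(f"{prob_at_least_one_failure(0.05, 100):.4f}")
```

With p = 0.05 and n = 100 the result is about 0.994, far above the 50% quoted.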

An artificial heart pump is used to replace the natural human heart. If there is a 5% chance that it will fail in the first 2 years of service, this would be regarded as unacceptable by most people in the community.
Calculating the mean
Calculating the standard deviation
• The standard deviation is the square root of the variance
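A minimal sketch of both calculations, using the standard definitions x̄ = Σxᵢ/n and σ = √(Σ(xᵢ − x̄)²/n). Because the slides do not say which convention they intend, the sketch also shows the sample standard deviation, which divides by n − 1. The readings are hypothetical, and Python's `statistics` module is used only as a cross-check.

```python
import math
import statistics

def mean(values):
    """Arithmetic mean: sum of the values divided by their count."""
    return sum(values) / len(values)

def std_dev(values, sample=True):
    """Square root of the variance.

    sample=True divides the squared deviations by n - 1 (sample estimate);
    sample=False divides by n (population standard deviation)."""
    m = mean(values)
    squared_dev = sum((x - m) ** 2 for x in values)
    dof = len(values) - 1 if sample else len(values)
    return math.sqrt(squared_dev / dof)

# Hypothetical repeat readings of a distance, in metres
readings = [10.48, 10.52, 10.50, 10.47, 10.53]
print(mean(readings), std_dev(readings), std_dev(readings, sample=False))

# Cross-check against the standard library
assert math.isclose(std_dev(readings), statistics.stdev(readings))
assert math.isclose(std_dev(readings, sample=False), statistics.pstdev(readings))
```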
Z-score
• Simply put, a z-score (also called a standard score) gives you an idea of how far from the mean a data point is. More technically, it is a measure of how many standard deviations below or above the population mean a raw score is.
• The Z Score Formula: One Sample
• The basic z-score formula for a single observation x is:
• z = (x – μ) / σ
• For example, let's say you have a test score of 190. The test has a mean (μ) of 150 and a standard deviation (σ) of 25. Assuming a normal distribution, your z-score would be:
• z = (x – μ) / σ
• = (190 – 150) / 25 = 1.6.
• The z-score tells you how many standard deviations from the mean your score is. In this example, your score is 1.6 standard deviations above the mean.
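The worked example translates directly into code; `z_score` is an illustrative helper name, not a library function.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations by which x lies above (+) or below (-)
    the mean mu."""
    return (x - mu) / sigma

print(z_score(190, 150, 25))
```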
Z scores and Standard Deviations

• Technically, a z-score is the number of standard deviations from the mean value of the reference population (a population whose known values have been recorded, such as the charts of people's weights that the CDC compiles). For example:
• A z-score of 1 is 1 standard deviation above the mean.
• A score of 2 is 2 standard deviations above the mean.
• A score of -1.8 is 1.8 standard deviations below the mean.
• A z-score tells you where the score lies on a normal distribution curve. A z-score of zero tells you the value is exactly average, while a score of +3 tells you that the value is much higher than average.
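To see where a z-score places a value on the normal curve, it can be converted to the fraction of the distribution lying below it, using the identity Φ(z) = ½ erfc(−z/√2). This sketch assumes a normally distributed reference population; `normal_cdf` is an illustrative name.

```python
import math

def normal_cdf(z: float) -> float:
    """Fraction of a standard normal distribution lying below z."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

for z in (-1.8, 0.0, 1.0, 1.6, 2.0, 3.0):
    print(f"z = {z:+.1f}: {100 * normal_cdf(z):.1f}% of values lie below")
```

A z-score of 0 sits at 50%, while +3 sits above about 99.9% of the population, which is what "much higher than average" means quantitatively.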
Z Score Formula: Standard Error of the Mean

• When you have multiple samples and want to describe the standard deviation of those sample means (the standard error), you would use this z-score formula:
z = (x̄ – μ) / (σ / √n)
where x̄ is the sample mean and n is the sample size. This z-score tells you how many standard errors there are between the sample mean and the population mean.
• Example problem: In general, the mean weight of women is 65 kg with a standard deviation of 3.5 kg. What is the probability of finding a random sample of 50 women with a mean weight of 70 kg or more, assuming the weights are normally distributed?
• z = (x̄ – μ) / (σ / √n)
• = (70 – 65) / (3.5/√50) = 5 / 0.495 = 10.1
• A sample mean 10.1 standard errors above the population mean lies so far into the tail of the normal distribution that the probability is effectively zero.
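The same calculation as a sketch, including the upper-tail probability P(Z > z) = ½ erfc(z/√2) that answers the question; it assumes normally distributed weights, as the example does, and the function names are illustrative.

```python
import math

def z_of_sample_mean(sample_mean, mu, sigma, n):
    """Number of standard errors between a sample mean and the population mean."""
    standard_error = sigma / math.sqrt(n)
    return (sample_mean - mu) / standard_error

def upper_tail(z):
    """P(Z > z) for a standard normal variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

z = z_of_sample_mean(70, 65, 3.5, 50)
print(f"z = {z:.1f}, P(mean >= 70 kg) = {upper_tail(z):.1e}")
```

The tail probability comes out many orders of magnitude below 10⁻²⁰, confirming that such a sample is practically impossible under the stated assumptions.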
Where the normal distribution cannot be used for probability calculations
• The mean value is close to zero and negative values are not possible in the data set.
• The distribution is skewed about the mean. This is quantified numerically by the skewness of the population.
• The µ ± 2σ range of values does not contain about 95% of the probability. This is characterised numerically by the kurtosis of the population.
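Skewness and kurtosis can be estimated from data with the moment definitions: skewness = m₃/m₂^(3/2) and excess kurtosis = m₄/m₂² − 3, where mₖ is the k-th central moment. This sketch uses the simple (biased) moment estimators; statistical packages often apply small-sample corrections, so their values may differ slightly. The data sets are hypothetical.

```python
def central_moment(values, k):
    """k-th central moment: mean of (x - mean)**k."""
    m = sum(values) / len(values)
    return sum((x - m) ** k for x in values) / len(values)

def skewness(values):
    """Moment coefficient of skewness; 0 for a symmetric data set."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Kurtosis minus 3, so that a normal distribution scores about 0."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0

symmetric = [1, 2, 3, 4, 5]
right_skewed = [0.1, 0.2, 0.2, 0.3, 0.4, 2.5]  # hypothetical, bounded below by zero
print(skewness(symmetric), excess_kurtosis(symmetric))
print(skewness(right_skewed), excess_kurtosis(right_skewed))
```

The second data set mimics the first case on the slide: values close to zero that cannot go negative, giving a long right tail and positive skewness.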
Combining errors and uncertainties (1)
• Once several parameters, and their associated errors, have been determined experimentally using the 5% probability concept, some additional mathematical processing might be required.
• In this processing the different parameters and their associated errors are combined to calculate the final value of interest and its associated error.
• There are some simple rules for combining errors, which are based on the least-squares error analysis used to calculate the mean value.
Combining errors and uncertainties (2)
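• If two values, $x_i \pm 2\sigma_i$ and $x_j \pm 2\sigma_j$, are to be added or subtracted:
$$y = (x_i \pm x_j) \pm 2\sqrt{\sigma_i^2 + \sigma_j^2}$$
• All units must be the same.
• If two values, $x_i \pm 2\sigma_i$ and $x_j \pm 2\sigma_j$, are divided or multiplied:
$$y = \frac{x_i}{x_j} \pm 2\,\frac{x_i}{x_j}\sqrt{\left(\frac{\sigma_i}{x_i}\right)^2 + \left(\frac{\sigma_j}{x_j}\right)^2}$$
• For a product, $x_i/x_j$ is replaced by $x_i x_j$; the relative-error term under the square root is the same.
• The units of y, $x_i$ and $x_j$ do not have to be identical.
• These expressions combine the errors as a root-sum-square (RMS-based analysis) and assume that the errors in $x_i$ and $x_j$ are independent; they should be used when combining data with their associated errors.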
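The two rules can be sketched as small helper functions (illustrative names). They take standard deviations σ and form the ±2σ interval at the end, matching the convention above; the input values are hypothetical and the errors are assumed independent.

```python
import math

def combine_sum(sigma_i: float, sigma_j: float) -> float:
    """Standard deviation of x_i + x_j or x_i - x_j (independent errors)."""
    return math.sqrt(sigma_i ** 2 + sigma_j ** 2)

def combine_ratio(x_i, sigma_i, x_j, sigma_j):
    """Value and standard deviation of x_i / x_j (independent errors).
    The relative error is the root-sum-square of the relative errors."""
    y = x_i / x_j
    rel = math.sqrt((sigma_i / x_i) ** 2 + (sigma_j / x_j) ** 2)
    return y, abs(y) * rel

# Hypothetical inputs: x_i = 10.0 (sigma 0.3), x_j = 5.0 (sigma 0.2)
s_sum = combine_sum(0.3, 0.2)
y, s_ratio = combine_ratio(10.0, 0.3, 5.0, 0.2)
print(f"x_i + x_j = {10.0 + 5.0} ± {2 * s_sum:.2f} (±2σ)")
print(f"x_i / x_j = {y} ± {2 * s_ratio:.2f} (±2σ)")
```

For the product x_i x_j the same relative error applies, so the ±2σ interval would be 50 ± 5.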
Propagation of errors
Example of error calculation (A)

The diameter D of a pipe of length L (error ∆L) containing M_W kg of water (error ∆M_W), with water density ρ_W, is

So, applying (A), the error in the calculated D is then
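A minimal sketch of this example under stated assumptions: the pipe is completely full, so that M_W = ρ_W πD²L/4 and hence D = √(4M_W/(πρ_W L)); ρ_W is treated as exact; and the errors in M_W and L are independent and combined as in the multiplication/division rule. Because D depends on the square root of M_W/L, its relative error is half that of M_W/L. The function names and numbers are hypothetical.

```python
import math

def pipe_diameter(mass_w, rho_w, length):
    """Diameter of a completely full pipe: M_W = rho_W * (pi * D**2 / 4) * L."""
    return math.sqrt(4.0 * mass_w / (math.pi * rho_w * length))

def pipe_diameter_error(mass_w, d_mass, length, d_length, rho_w):
    """D and its error, with rho_W treated as exact.

    D is proportional to (M_W / L)**0.5, so its relative error is half the
    root-sum-square of the relative errors in M_W and L."""
    d = pipe_diameter(mass_w, rho_w, length)
    rel = 0.5 * math.sqrt((d_mass / mass_w) ** 2 + (d_length / length) ** 2)
    return d, d * rel

# Hypothetical measurements: 15.708 ± 0.05 kg of water in a 2.000 ± 0.005 m pipe
D, dD = pipe_diameter_error(15.708, 0.05, 2.000, 0.005, rho_w=1000.0)
print(f"D = {D * 1000:.1f} mm ± {dD * 1000:.2f} mm")
```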

Always check for dominant error(s)


• For example, if Z = A × B × C × D, with A known to 5% and B, C and D known to 1%, the combined error is √(5² + 1² + 1² + 1²) ≈ 5.3%, i.e. effectively 5%, and there is no need to do the full analysis.
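The dominance check can be sketched by combining the relative errors of independent factors as a root-sum-square, as in the multiplication rule for combining errors; `product_relative_error` is an illustrative name.

```python
import math

def product_relative_error(*relative_errors):
    """Relative error of a product or quotient of independent factors,
    combined as a root-sum-square."""
    return math.sqrt(sum(r ** 2 for r in relative_errors))

# Z = A x B x C x D with A known to 5 % and B, C, D known to 1 %
total = product_relative_error(0.05, 0.01, 0.01, 0.01)
print(f"combined: {100 * total:.2f} %, A alone: 5.00 %")
```

The three 1% terms raise the total by only about 0.3 percentage points, so the 5% term dominates.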
Student’s t-test
• In some experimental investigations it is important to know whether two samples are likely to have been selected from the same global population.
• This type of one-dimensional question can be addressed using a t-test (also known as Student's t-test).
• The test is most applicable when the standard deviations are large in comparison to the likely changes or differences between the two mean values.
Two types of t-test
• Paired t-test
• One in which the same population is tested twice to determine whether there has been a change in the overall population.
• It is a method of determining whether there is a statistically significant change in the population after an intervention.
• A simple mean and standard deviation calculation will not show a significant change if the change is likely to be much smaller than the standard deviation of the population.
• Unpaired t-test
• One in which two different populations are measured to determine whether there is a difference between the two populations.
• In this case the two populations are unrelated and the number of samples can differ between the two sample sets.
• In MS Excel the t-test can be evaluated with the T.TEST function (TTEST in older versions), whose type argument selects a paired test (1) or an unpaired test (2 for equal variances, 3 for unequal variances).
• In MATLAB the functions are ttest and ttest2 for the paired and unpaired data sets respectively.
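The Excel and MATLAB functions above return a p-value; the t statistic behind it can be sketched in a few lines. This sketch computes only the t statistic and degrees of freedom (a p-value needs the t distribution's CDF, which Python's standard library does not provide). The unpaired version assumes equal population variances (pooled variance), and all data are hypothetical.

```python
import math
import statistics

def paired_t(before, after):
    """t statistic and degrees of freedom for a paired t-test
    (the same items measured twice)."""
    diffs = [b - a for a, b in zip(before, after)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1

def unpaired_t(x, y):
    """t statistic and degrees of freedom for an unpaired (two-sample)
    t-test assuming equal population variances (pooled variance)."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * statistics.variance(x)
                  + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    se = math.sqrt(pooled_var * (1 / nx + 1 / ny))
    return (statistics.mean(x) - statistics.mean(y)) / se, nx + ny - 2

# Hypothetical data: four items weighed before and after an intervention
t_p, df_p = paired_t([10, 12, 11, 13], [11, 13, 13, 14])
# Hypothetical data: two unrelated groups
t_u, df_u = unpaired_t([1, 2, 3], [4, 5, 6])
print(t_p, df_p)
print(t_u, df_u)
```

|t| is then compared with the two-tailed critical value of Student's t distribution for the given degrees of freedom (3.182 for 3 degrees of freedom at the 5% level). In Python, `scipy.stats.ttest_rel` (paired) and `scipy.stats.ttest_ind` (unpaired) return the p-value directly.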
Examples for applying t-tests
Paired t-test
A group of 20 aluminum poles of different sizes is weighed immediately following manufacture. The same poles are weighed after 6 months' exposure to the environment. The paired t-test will indicate whether there has been a statistically significant change in the weight of the poles.

A group of 25 employees is weighed. A fitness trainer is asked to work with the group to lose weight. Six months later the same group of people is weighed again. The paired t-test will provide a probability estimate of whether a significant change in weight has taken place.

A simple mean and standard deviation calculation will not show a significant change if the change in weight is likely to be much smaller than the range of weights in the population.

Unpaired t-test
Concrete strength tests on high-rise buildings in Ghana need to meet an international specification. A set of measurements from a sample in England was used for comparison. Is there a statistically significant difference between the two sets of measurements? The question can be resolved using an unpaired t-test.

Populations of 25 ten-year-old girls in Sweden and 30 ten-year-old girls in Denmark are weighed. The objective is to see whether there is a difference in weight between the two populations. The probability that there is a difference between the two populations can be established using an unpaired t-test.
ANOVA statistics
We looked at one-dimensional statistical methods first.
• Then two different populations were compared using the t-test.
• If there are more than two dependent data sets, these techniques are inadequate.
• If many repeat measurements are made of a number of members of the population, ANOVA (analysis of variance) statistical methods allow the calculation of probability estimates for three or more data sets.
• As for the t-test, this method can determine statistically significant differences when the standard deviations in the parameters are much larger than the difference between the populations.
• The ANOVA test can be evaluated using:
• MS Excel: the Analysis ToolPak tools Anova: Two-Factor With Replication and Anova: Two-Factor Without Replication (Anova: Single Factor is also available)
• MATLAB: the functions anova1 and anova2

Populations of 25 ten-year-old girls in Sweden and 30 ten-year-old girls in Denmark are weighed, and their height is also measured every year over 20 years. The objective is to see whether the relationship between height and weight differs between the two populations over time. The probability that there is a difference between the two populations tracked over the years can be established using a two-factor ANOVA test.
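As a minimal sketch of what anova1 or Excel's single-factor tool computes, the one-way ANOVA F statistic compares the variance between group means with the variance within groups; a large F suggests a real difference between the groups. The two-factor case in the example follows the same idea but splits the variance across two factors (and their interaction), which this sketch does not attempt. The function name and group data are hypothetical.

```python
def one_way_anova_f(groups):
    """F statistic and degrees of freedom for a one-way (single-factor) ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    # Between-group sum of squares: spread of the group means around the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of each value around its own group mean
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

f, df_b, df_w = one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f, df_b, df_w)
```

The p-value follows from the F distribution with (df_between, df_within) degrees of freedom; in Python, `scipy.stats.f_oneway` computes both F and the p-value.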
Exercise 1: Instrumentation & Calibration
Review specification sheets for 3 experimental instruments that you
will use in your research project. Briefly summarise the following user
requirements:
• Dynamic range
• Sensitivity
• Linearity
• Calibration requirements
• Calibration procedure
Exercise 2: Review of statistical analysis in a journal article
Review a published journal article in your engineering discipline
which includes a statistical analysis.
• Write a brief report on the statistical analysis.
• Can you suggest an improved statistical analysis?
• Suggest some additional parameters that might have been measured
during the data acquisition stage.
• Explain how you would analyse the complete data set, including the additional measurements.
