Chapter 4 Notes
Probability and
Statistics
True Value
The true value is represented by $x' = \bar{x} \pm u_x$, where $\bar{x}$ is the sample mean and $u_x$ is the uncertainty interval.
Plotting Histograms
The abscissa is divided into K small intervals between the minimum and maximum measured values of x.
Let the number of times, $n_j$, that a measured value falls within an interval defined by $x - \delta x \le x_i < x + \delta x$ be plotted on the ordinate.
For small N, K should be chosen so that $n_j \ge 5$ for at least one interval.
Plotting Histograms
$K = 1.87(N-1)^{0.40} + 1 = 7$
Cell range:
min = 0.68 ; max = 1.34
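The interval estimate and the counting step can be sketched in Python as follows. This is only an illustration: the 20 data values are hypothetical (chosen to span 0.68 to 1.34, not the measurements behind the slide), and the helper names `num_intervals` and `histogram_counts` are invented for this example.

```python
def num_intervals(n_points):
    """Estimate the number of intervals, K = 1.87 (N - 1)^0.40 + 1."""
    return round(1.87 * (n_points - 1) ** 0.40 + 1)


def histogram_counts(data, k):
    """Count how many values fall in each of k equal-width intervals.

    Intervals are half-open, [lower, upper), except that the maximum
    value is placed in the last interval so every point is counted.
    """
    lo, hi = min(data), max(data)
    if hi == lo:
        raise ValueError("all values are identical; no intervals to form")
    width = (hi - lo) / k
    counts = [0] * k
    for x in data:
        j = min(int((x - lo) / width), k - 1)
        counts[j] += 1
    return counts


# Hypothetical data set of N = 20 readings (illustrative values only).
data = [0.68, 0.85, 0.91, 0.94, 0.96, 0.98, 0.99, 1.00, 1.01, 1.02,
        1.03, 1.04, 1.05, 1.07, 1.09, 1.11, 1.14, 1.19, 1.25, 1.34]

k = num_intervals(len(data))
counts = histogram_counts(data, k)
print("K =", k)
print("n_j =", counts)
```

If no interval reaches $n_j \ge 5$, a smaller K would be tried.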
Infinite Statistics
The most common distribution is the normal, or Gaussian, distribution. It predicts that the scatter in the data will be distributed symmetrically about the central tendency (the “bell curve”).
It commonly describes measured variables such as length, temperature, pressure, and velocity when the operating conditions are held fixed.
Gaussian Distribution
Pdf for Gaussian:
$p(x) = \frac{1}{\sigma (2\pi)^{1/2}} \exp\left[ -\frac{1}{2} \frac{(x - x')^2}{\sigma^2} \right]$
$x'$ = true mean of x ; $\sigma^2$ = true variance of x
Gaussian Distribution
Simplification of integration:
$P(x' - \delta x \le x \le x' + \delta x) = \int_{x' - \delta x}^{x' + \delta x} p(x)\,dx$
P(x) = probability ; p(x) = pdf
Table 4.3
Solutions to this integral are given in Table 4.3 in terms of the standardized variable $z = (x - x')/\sigma$.
For z = 1, 68.27% of observations lie within one standard deviation of x'.
For z = 2, 95.45%.
For z = 3, 99.73%.
The values of z in Table 4.3 can be used to predict the probability of a particular value occurring in an infinite data set.
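The tabulated percentages can be checked with a short Python sketch. It evaluates the Gaussian pdf, obtains the probability within ±z from the error function, and repeats the calculation by direct trapezoidal integration of p(x). The function names are illustrative, and the integration step count is an arbitrary choice.

```python
import math


def gaussian_pdf(x, mean, sigma):
    """Gaussian probability density with true mean `mean` and std dev `sigma`."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-0.5 * ((x - mean) / sigma) ** 2)


def prob_within(z):
    """P(-z <= Z <= z) for the standardized variable, via the error function."""
    return math.erf(z / math.sqrt(2.0))


def prob_within_numeric(z, steps=20000):
    """Same probability by trapezoidal integration of p(x) from -z to +z."""
    h = 2.0 * z / steps
    total = 0.5 * (gaussian_pdf(-z, 0.0, 1.0) + gaussian_pdf(z, 0.0, 1.0))
    for i in range(1, steps):
        total += gaussian_pdf(-z + i * h, 0.0, 1.0)
    return total * h


for z in (1, 2, 3):
    print(z, round(100 * prob_within(z), 2), round(100 * prob_within_numeric(z), 2))
```

Both columns should agree with the 68.27%, 95.45%, and 99.73% values quoted for z = 1, 2, and 3.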
The Z Table
Finite Statistics
When N ≪ ∞, we do not have a true representation of the population. Instead we have finite statistics, which describe the sample and estimate the population.
Sample mean (probable estimate of true mean):
$\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i$
Sample variance (measurement of precision):
$s_x^2 = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2$
t-estimate
For a finite data set, we use the t estimator instead of z, which was used for infinite statistics.
One can state that
$x_i = \bar{x} \pm t_{\nu,P}\, s_x$ (P%)
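A minimal Python sketch of these finite statistics follows. The ten readings are hypothetical. The t value is not computed: it is the standard t-table entry for ν = N − 1 = 9 degrees of freedom at 95%, entered by hand.

```python
import statistics

# Hypothetical sample of N = 10 readings (illustrative values only).
sample = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.9, 10.1, 10.0, 10.0]

n = len(sample)
x_bar = statistics.mean(sample)      # sample mean
s_x2 = statistics.variance(sample)   # sample variance, divisor N - 1
s_x = statistics.stdev(sample)       # sample standard deviation

# Student's t for nu = 9 at P = 95%, read from a standard t table.
t_9_95 = 2.262

# Interval in which a single measurement x_i is expected to fall (95%).
lower = x_bar - t_9_95 * s_x
upper = x_bar + t_9_95 * s_x
print(f"x_bar = {x_bar:.3f}, s_x = {s_x:.4f}")
print(f"x_i within [{lower:.3f}, {upper:.3f}] (95%)")
```

For a different sample size, the t value has to be looked up again for the new ν.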
Standard Deviation of Means
If we measure a variable N times under fixed
conditions, and replicate this M times, we will end
up with slightly different means for each
replication.
Variation depends on
1. The sample variance $s_x^2$
2. The sample size N
$s_{\bar{x}} = s_x / N^{1/2}$
The difference increases as $s_x^2$ increases and decreases with $N^{1/2}$.
The expected variance of the means can be estimated from a single finite data set.
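The idea can be illustrated with a small simulation, sketched here in Python. The population parameters (mean 10, standard deviation 2), the sizes N = 25 and M = 200, and the random seed are all arbitrary assumptions made for the illustration. The spread of the M replicate means is compared with the estimate $s_x / N^{1/2}$ taken from a single data set.

```python
import math
import random
import statistics


def std_of_means(sample):
    """Estimate the standard deviation of the means from one data set."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


random.seed(0)  # fixed seed so the run is repeatable
TRUE_MEAN, TRUE_SIGMA = 10.0, 2.0  # assumed population, for illustration only
N, M = 25, 200

replications = [
    [random.gauss(TRUE_MEAN, TRUE_SIGMA) for _ in range(N)]
    for _ in range(M)
]

# Spread actually observed among the M replicate means.
means = [statistics.mean(rep) for rep in replications]
observed_spread = statistics.stdev(means)

# Estimate of that spread from the first data set alone.
single_set_estimate = std_of_means(replications[0])

print(f"observed spread of means: {observed_spread:.3f}")
print(f"single-set estimate:      {single_set_estimate:.3f}")
print(f"sigma / sqrt(N):          {TRUE_SIGMA / math.sqrt(N):.3f}")
```

The three printed numbers should be close to one another but not identical, since each is itself an estimate from finite data.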
Pooled Statistics
Samples that are grouped in a manner so as
to determine a common set of statistics are
called pooled.
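Pooled mean of x:
$\langle \bar{x} \rangle = \frac{1}{MN} \sum_{j=1}^{M} \sum_{i=1}^{N} x_{ij}$
Pooled variance of x:
$\langle s_x^2 \rangle = \frac{1}{M(N-1)} \sum_{j=1}^{M} \sum_{i=1}^{N} (x_{ij} - \bar{x}_j)^2 = \frac{1}{M} \sum_{j=1}^{M} s_{x_j}^2$
(with ν = M(N−1) degrees of freedom)
Pooled standard deviation of means:
$\langle s_{\bar{x}} \rangle = \langle s_x \rangle / (MN)^{1/2}$, where $\langle s_x \rangle = \langle s_x^2 \rangle^{1/2}$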
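The pooled quantities can be computed as in the following Python sketch, which assumes M replications of equal size N. The data values are hypothetical.

```python
import math
import statistics

# Hypothetical data: M = 3 replications of N = 4 readings each.
replications = [
    [10.1, 9.9, 10.0, 10.2],
    [10.3, 10.0, 10.1, 10.2],
    [9.8, 10.0, 9.9, 10.1],
]
m = len(replications)
n = len(replications[0])

# Pooled mean: average over all M * N readings.
pooled_mean = sum(sum(rep) for rep in replications) / (m * n)

# Pooled variance: average of the M sample variances (equal N assumed).
pooled_var = sum(statistics.variance(rep) for rep in replications) / m
pooled_std = math.sqrt(pooled_var)
dof = m * (n - 1)  # degrees of freedom

# Pooled standard deviation of the means.
pooled_std_of_means = pooled_std / math.sqrt(m * n)

print(f"pooled mean = {pooled_mean:.4f}")
print(f"pooled std = {pooled_std:.4f} (nu = {dof})")
print(f"pooled std of means = {pooled_std_of_means:.4f}")
```

Averaging the sample variances is equivalent to the double-sum form only when every replication has the same N.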
Precision Interval in Sample
Variance
•The precision interval for the sample variance can be formulated by the probability statement:
$P(\chi^2_{1-\alpha/2} \le \chi^2 \le \chi^2_{\alpha/2}) = 1 - \alpha$
with a probability of $P(\chi^2) = 1 - \alpha$ ; α = level of significance
Combining with $\chi^2 = \nu s_x^2 / \sigma^2$:
$P[\nu s_x^2 / \chi^2_{\alpha/2} \le \sigma^2 \le \nu s_x^2 / \chi^2_{1-\alpha/2}] = 1 - \alpha$
For the 95% precision interval by which $s_x^2$ estimates $\sigma^2$:
$\nu s_x^2 / \chi^2_{0.025} \le \sigma^2 \le \nu s_x^2 / \chi^2_{0.975}$ (95%)
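As a worked illustration, the 95% interval can be evaluated in Python. The sample variance below is hypothetical, and the two χ² values are standard table entries for ν = 19, entered by hand rather than computed. The function name is illustrative.

```python
def variance_interval(s_x2, nu, chi2_upper_tail, chi2_lower_tail):
    """Precision interval for the true variance sigma^2.

    chi2_upper_tail is the chi-squared value at alpha/2 (the larger one),
    chi2_lower_tail is the value at 1 - alpha/2 (the smaller one).
    """
    return nu * s_x2 / chi2_upper_tail, nu * s_x2 / chi2_lower_tail


nu = 19            # degrees of freedom, N - 1 for N = 20
s_x2 = 0.04        # hypothetical sample variance
chi2_025 = 32.852  # chi-squared table value, nu = 19, alpha = 0.025
chi2_975 = 8.907   # chi-squared table value, nu = 19, alpha = 0.975

low, high = variance_interval(s_x2, nu, chi2_025, chi2_975)
print(f"{low:.4f} <= sigma^2 <= {high:.4f} (95%)")
```

The interval is not symmetric about $s_x^2$ because the χ² distribution is skewed.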
Values for $\chi^2_\alpha$
Goodness of Fit Test
Regression Analysis
Regression analysis is used to establish a
functional relationship between the
independent and dependent variables. It is
assumed that the relationship can be
described by either a polynomial or a
Fourier series.
It is also assumed that the variation in the
dependent variable is normally distributed
about a fixed value of the independent
variable.
Regression Analysis
Such behavior is illustrated in the previous slide by considering the dependent variable $y_{i,j}$, consisting of N measurements, i = 1, 2, …, N, of y at each of n values of the independent variable, $x_j$, j = 1, 2, …, n.
This behavior is most common during
calibrations, where the input to the measuring
system, x, is held nominally fixed while the
measurement of y occurs.
Regression Analysis
Repeated measurements of y will yield a normal distribution with variance $s_y^2(x_j)$ about some mean value, $\bar{y}(x_j)$.
Linear Polynomials
For linear polynomials, a correlation coefficient, r, can be found by:
$r = \sqrt{1 - \frac{s_{yx}^2}{s_y^2}}$
Linear Polynomials
The correlation coefficient represents a
quantitative measure of the linear
association between x and y.
It is bounded by ±1, which represents perfect correlation; the sign indicates whether y increases or decreases with x.
For $0.9 < |r| \le 1$, a linear regression can be considered a reliable relation between y and x.
Variance
$s_y^2 = \frac{1}{N-1} \sum_{i=1}^{N} (y_i - \bar{y})^2$
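The correlation coefficient for a straight-line fit can be sketched in Python as below. Two points are assumptions not stated in these notes: $s_{yx}^2$ is taken as the sum of squared residuals about the fitted line divided by N − 2, and the sign of r is taken from the sign of the fitted slope. The data and function names are hypothetical.

```python
import math


def linear_fit(x, y):
    """Least-squares straight line y_c = a0 + a1 * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    a1 = sxy / sxx
    a0 = y_mean - a1 * x_mean
    return a0, a1


def correlation_coefficient(x, y):
    """r = sqrt(1 - s_yx^2 / s_y^2), signed by the slope of the fit."""
    n = len(x)
    a0, a1 = linear_fit(x, y)
    y_mean = sum(y) / n
    # Assumed definition: residual variance with N - 2 degrees of freedom.
    s_yx2 = sum((yi - (a0 + a1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    s_y2 = sum((yi - y_mean) ** 2 for yi in y) / (n - 1)
    # Clamp at zero so a very poor fit cannot give a negative square root.
    magnitude = math.sqrt(max(0.0, 1.0 - s_yx2 / s_y2))
    return math.copysign(magnitude, a1)


# Hypothetical calibration data (illustrative values only).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.2, 4.1, 5.3]
r = correlation_coefficient(x, y)
print(f"r = {r:.4f}")
```

Because of the N − 2 and N − 1 divisors, this value can differ slightly from the Pearson coefficient reported by statistics libraries.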
Data Outlier Detection
It is not uncommon for a data set to contain one
or more spurious data points that appear to be out
of range of expected trends.
Data that lie outside the probability of normal
variation can bias the sample mean and variance.
Statistical methods and histograms are helpful to
identify outliers.
If outliers are removed, the statistics of the data set must be recomputed.
Method 1
Method 1 (Three Sigma Test)
– Use area under Pdf to find basis for outlier
detection
• Calculate data set statistics
Method 2
Method 2 (modified three sigma test for
large data sets)
– Estimate the sample mean and standard
deviation
– Compute modified z variable for each data point: $z_0 = (x_i - \bar{x}) / s_x$
– The probability that x lies outside the one-sided range from 0 to $z_0$ is $0.5 - P(z_0)$, where $P(z_0)$ is found from the one-sided z chart.
– For N data points, if $N[0.5 - P(z_0)] \le 0.1$, the data point is an outlier (see Example 4.11).
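Method 2 can be sketched in Python as follows. The one-sided area P(z0) is computed from the error function instead of being read from the z chart, and the magnitude of the deviation is used so that z0 is non-negative. Both choices are assumptions of this sketch, and the data set is hypothetical.

```python
import math
import statistics


def one_sided_p(z0):
    """Area under the standard normal pdf between 0 and z0."""
    return 0.5 * math.erf(z0 / math.sqrt(2.0))


def find_outliers(data, threshold=0.1):
    """Return the points for which N * [0.5 - P(z0)] <= threshold."""
    n = len(data)
    x_bar = statistics.mean(data)
    s_x = statistics.stdev(data)
    outliers = []
    for x in data:
        z0 = abs(x - x_bar) / s_x  # magnitude of the deviation (assumption)
        if n * (0.5 - one_sided_p(z0)) <= threshold:
            outliers.append(x)
    return outliers


# Hypothetical data set with one suspicious reading.
data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.9, 10.1, 10.0, 15.0]
flagged = find_outliers(data)
print("flagged:", flagged)

# After removal, the statistics are recomputed on the remaining points.
cleaned = [x for x in data if x not in flagged]
print("mean:", statistics.mean(cleaned), "std:", statistics.stdev(cleaned))
```

Re-running the test on the cleaned data is a useful check, since removing a point changes both the mean and the standard deviation.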
Number of Measurements
Required
How many observations are required to
estimate the true mean x’, with acceptable
precision?
Earlier we saw that precision interval is a
quantified measure of precision error in the
estimate of true mean.
$x' = \bar{x} \pm t_{\nu,P}\, s_{\bar{x}}$ (P%), where $s_{\bar{x}} = s_x / N^{1/2}$
Based on sample mean and its precision interval:
$x' = \bar{x} \pm t_{\nu,95}\, s_x / N^{1/2}$ (95%)
Number of Measurements
Required:
95% confidence interval:
$CI = \pm t_{\nu,95}\, s_x / N^{1/2}$ (95%)
Number of Measurements
Required:
Precision interval is two-sided about the
mean.
$\bar{x} - t_{\nu,95}\, s_x / N^{1/2} \le x' < \bar{x} + t_{\nu,95}\, s_x / N^{1/2}$
One-sided precision value:
$d = CI/2 = t_{\nu,95}\, s_x / N^{1/2}$
Then required measurements:
$N \approx (t_{\nu,95}\, s_x / d)^2$ (95%)
This is an approximation because a value of $s_x$ must be assumed.
Number of Measurements
Required:
The approximation serves as a reminder that this expression is based on an assumed value for $s_x$.
The accuracy of equation 4.43 will depend on how well the assumed value for $s_x$ approximates σ.
The obvious deficiency in this method is that an estimate for the sample variance is needed.
One way around this is to make a preliminary small number of measurements, $N_1$, to obtain an estimate of the sample variance, $s_1^2$, to be expected.
Number of Measurements
Required:
The total number of measurements, $N_T$, will be estimated by:
$N_T \approx \left( \frac{t_{N_1-1,95}\, s_1}{d} \right)^2$ (95%)
This establishes that NT – N1 additional
measurements will be required.
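The two-step estimate can be sketched in Python as follows. The preliminary sample size, its standard deviation, and the target precision d are hypothetical, and the t value is the standard table entry for ν = N1 − 1 = 9 at 95%, entered by hand. Rounding up to a whole number of measurements is a choice made for this sketch.

```python
import math


def required_measurements(t_value, s_x, d):
    """N ~ (t * s_x / d)^2, rounded up to a whole number of measurements."""
    return math.ceil((t_value * s_x / d) ** 2)


n1 = 10         # preliminary number of measurements (hypothetical)
s1 = 0.5        # standard deviation from the preliminary set (hypothetical)
d = 0.2         # desired one-sided precision value (hypothetical)
t_9_95 = 2.262  # Student's t for nu = N1 - 1 = 9 at 95%, from a t table

n_total = required_measurements(t_9_95, s1, d)
n_additional = max(0, n_total - n1)
print(f"N_T = {n_total}, additional measurements = {n_additional}")
```

Halving d roughly quadruples the required number of measurements, since N scales with $1/d^2$.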