
Chapter 4

Probability and
Statistics

Figliola and Beasley (1999)

Probability and Statistics


Engineering measurements taken repeatedly under seemingly ideal conditions will normally show variability. Sources include:

• Measurement system
  – Resolution
  – Repeatability
• Measurement procedure and technique
  – Repeatability

Probability and Statistics


• Measured variable
  – Temporal variation
  – Spatial variation

We want:
1. A single representative value that best characterizes the average of the data set.
2. A measure of the variation in a measured data set.

True Value
The true value is represented by x' = x̄ ± u_x

x̄ = most probable estimate of x'
u_x = confidence interval at probability P%, based on estimates of precision and bias error

Probability Density Function


Central Tendency – says that there is one central value about which all other values tend to be scattered.
Probability Density – the frequency with which the measured variable takes on a value within a given interval.

The region where observations tend to gather is around the central value.

Plotting Histograms
• The abscissa is divided between the minimum and maximum measured values of x into K small intervals.
• Let the number of times, n_j, that a measured value falls within an interval defined by x − δx ≤ x < x + δx be plotted on the ordinate.
• For small N, K should be chosen so that n_j ≥ 5 for at least one interval.

Plotting Histograms

• For N > 40: K = 1.87(N − 1)^0.40 + 1

• The histogram displays the central tendency and density.
• If the ordinate is normalized as n_j/N, a frequency distribution results.

Worked example:
K = 1.87(N − 1)^0.40 + 1 = 7
Cell range: min = 0.68 ; max = 1.34
Cell width: 2δx = (1.34 − 0.68)/7 ≈ 0.10
Each cell spans x − δx ≤ x < x + δx.

Note: the total frequency equals 100%, and the total number of occurrences equals N.
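The cell calculations above can be sketched in Python. The sample size N = 20 is a hypothetical value chosen because it reproduces the K = 7 of this example; the min/max are the values above:

```python
def suggest_k(n):
    # K = 1.87*(N - 1)**0.40 + 1, rounded to the nearest integer
    # (the slides give this rule for N > 40; applied here illustratively)
    return int(round(1.87 * (n - 1) ** 0.40 + 1))

def cell_width(lo, hi, k):
    # the measured range is split into K equal cells of width 2*dx
    return (hi - lo) / k

k = suggest_k(20)                 # hypothetical N = 20 gives K = 7
w = cell_width(0.68, 1.34, k)     # ~0.094, rounded to 0.10 in the notes
```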

Probability Density Function


P(x) comes from the frequency distribution:
p(x) = lim_{N→∞, δx→0} n_j / (N(2δx))
– Defines the probability that a measured variable might assume a particular value on any given observation, and also provides the central tendency.
– The shape depends on the variable in consideration and its natural circumstances/processes.
– Plot histograms, compare to common distributions, and then fit the parameters.
– Unifit is a good PC-based distribution-fitting software package.

Infinite Statistics
The most common distribution is the normal or Gaussian distribution ("bell curve"), which predicts that measured values will be scattered symmetrically about the central tendency. Variables such as length, temperature, pressure, and velocity, measured under fixed operating conditions, tend to follow it.

Gaussian Distribution
• pdf for the Gaussian distribution:
p(x) = [1 / (σ(2π)^1/2)] exp[−(1/2)((x − x')² / σ²)]
x' = true mean of x ; σ² = true variance of x

• Used to find/predict the probability that a future measurement falls within some interval: the probability P(x) is the area under p(x) between x' ± δx.

Gaussian Distribution
• Simplification of the integration:
P(x' − δx ≤ x ≤ x' + δx) = ∫_{x'−δx}^{x'+δx} p(x) dx        P(x) = probability ; p(x) = pdf

With β = (x − x')/σ and z₁ = (x₁ − x')/σ, so that dx = σ dβ:
P(−z₁ ≤ β ≤ z₁) = [1 / (2π)^1/2] ∫_{−z₁}^{z₁} e^{−β²/2} dβ

β = standardized normal variate ; z₁ = interval on p(x)

Table 4.3
Solutions are tabulated in Table 4.3.
For z = 1, 68.27% of observations lie within 1 standard deviation of x'.
For z = 2, 95.45% ; for z = 3, 99.73%.
The values of z in Table 4.3 can be used to predict the probability of a unique value occurring in an infinite data set.
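These tabulated probabilities can be reproduced from the error function, since P(−z₁ ≤ β ≤ z₁) = erf(z₁/√2); a minimal check in Python:

```python
import math

def p_within(z):
    # two-sided Gaussian probability P(-z <= beta <= z) = erf(z / sqrt(2))
    return math.erf(z / math.sqrt(2))

for z in (1, 2, 3):
    print(z, round(100 * p_within(z), 2))   # 68.27, 95.45, 99.73
```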

The Z Table

Finite Statistics
When N << ∞, we do not have a true representation of the population. We have finite statistics, which describe the sample and estimate the population.

Sample mean (probable estimate of the true mean):
x̄ = (1/N) Σ_{i=1}^{N} x_i

Sample variance (measurement of precision):
s_x² = [1/(N − 1)] Σ_{i=1}^{N} (x_i − x̄)²

Finite Statistics (cont'd)

Deviation of x_i: x_i − x̄
Sample standard deviation: s_x = (s_x²)^1/2
Degrees of freedom: the number of samples less the number of central tendency measurements, ν = N − 1.
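A direct transcription of these finite-statistics formulas (the readings are hypothetical):

```python
import math

def sample_stats(xs):
    n = len(xs)
    x_bar = sum(xs) / n                                  # sample mean
    s2 = sum((x - x_bar) ** 2 for x in xs) / (n - 1)     # sample variance, nu = N - 1
    return x_bar, s2, math.sqrt(s2)                      # sqrt(s2) = sample std deviation

x_bar, s2, s = sample_stats([0.98, 1.02, 1.01, 0.97, 1.05])
```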

t-estimate
For a finite data set, we use the t-estimate instead of z, which was used for the infinite case. One can state that (P%):
x_i = x̄ ± t_{ν,P} s_x

± t_{ν,P} s_x represents the precision interval

P = probability
ν = degrees of freedom
x̄ = sample mean
Table 4.4 gives the t distribution.
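A sketch of the t-based precision interval. The t-values below are standard two-sided 95% entries of the Student-t table (as in Table 4.4); the data are hypothetical:

```python
import math

# Two-sided 95% t-values for a few degrees of freedom (standard t table)
T95 = {4: 2.770, 9: 2.262, 19: 2.093}

def precision_interval_95(xs):
    n = len(xs)
    x_bar = sum(xs) / n
    s = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    half = T95[n - 1] * s              # t_{nu,95} * s_x, with nu = N - 1
    return x_bar, half                 # x_i = x_bar +/- half (95%)

x_bar, half = precision_interval_95([0.98, 1.02, 1.01, 0.97, 1.05])
```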

Standard Deviation of Means
If we measure a variable N times under fixed
conditions, and replicate this M times, we will end
up with slightly different means for each
replication.

It can be shown that, regardless of the form of the pdf of the individual replications, the mean values themselves will be normally distributed.

The variation in the means depends on:
1. The sample variance s_x²
2. The sample size N

s_x̄ = s_x / N^1/2

The scatter among means increases as s_x² increases and decreases with N^1/2. The expected variance of the means can be estimated from a single finite data set.
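The relation s_x̄ = s_x / N^1/2 in code form:

```python
import math

def std_of_means(s_x, n):
    # s_xbar = s_x / sqrt(N): expected scatter of the sample means
    return s_x / math.sqrt(n)

# quadrupling N halves the scatter of the means
```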

Pooled Statistics
• Samples that are grouped in a manner so as to determine a common set of statistics are called pooled.

• Suppose we have M replicates of variable x, each with N repeated measurements, producing the data set x_ij ; i = 1 to N ; j = 1 to M.

• Pooled mean of x:
⟨x̄⟩ = [1/(MN)] Σ_{j=1}^{M} Σ_{i=1}^{N} x_ij

• Pooled standard deviation of x:
⟨s_x⟩ = {[1/(M(N − 1))] Σ_{j=1}^{M} Σ_{i=1}^{N} (x_ij − x̄_j)²}^1/2 = [(1/M) Σ_{j=1}^{M} s_xj²]^1/2
(with ν = M(N − 1) degrees of freedom)

• Pooled standard deviation of the means:
⟨s_x̄⟩ = ⟨s_x⟩ / (MN)^1/2
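A sketch of the pooled formulas for M replicates of N readings each (the toy data are hypothetical):

```python
import math

def pooled_stats(replicates):
    # replicates: M lists of N readings each
    m, n = len(replicates), len(replicates[0])
    mean = sum(x for rep in replicates for x in rep) / (m * n)   # pooled mean
    ss = 0.0
    for rep in replicates:
        rep_mean = sum(rep) / n
        ss += sum((x - rep_mean) ** 2 for x in rep)
    s_pooled = math.sqrt(ss / (m * (n - 1)))        # pooled std dev, nu = M(N-1)
    s_means = s_pooled / math.sqrt(m * n)           # pooled std dev of the means
    return mean, s_pooled, s_means

mean, s_pooled, s_means = pooled_stats([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]])
```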

Chi Squared Distribution


• Estimates the precision with which s_x² predicts σ².
  – s_x² = sample variance ; σ² = population variance
• If we plotted s_x² for many sets having N samples each, we would generate the pdf for p(χ²) (chi-squared).
• For a normal distribution:
χ² = ν s_x² / σ² , where ν = N − 1

Precision Interval in Sample Variance
• The precision interval for the sample variance can be formulated by the probability statement:
P(χ²_{1−α/2} ≤ χ² ≤ χ²_{α/2}) = 1 − α
with probability P(χ²) = 1 − α ; α = level of significance.
Combining:
P[ν s_x² / χ²_{α/2} ≤ σ² ≤ ν s_x² / χ²_{1−α/2}] = 1 − α
For the 95% precision interval by which s_x² estimates σ²:
ν s_x² / χ²_{0.025} ≤ σ² ≤ ν s_x² / χ²_{0.975}
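A numerical sketch of this interval. The χ² percentiles are standard table entries for ν = 9 (χ²_{0.025} ≈ 19.02, χ²_{0.975} ≈ 2.70); the sample variance is hypothetical:

```python
# 95% precision interval on sigma^2 from a sample variance with nu = 9
nu = 9
s2 = 4.0                          # hypothetical sample variance
chi2_025, chi2_975 = 19.02, 2.70  # standard chi-squared table values for nu = 9

sigma2_lo = nu * s2 / chi2_025    # lower bound on the population variance
sigma2_hi = nu * s2 / chi2_975    # upper bound on the population variance
```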

Precision Interval in Sample Variance
• The χ² distribution estimates the discrepancy expected as a result of random chance.
• Values for χ²_α are tabulated in Table 4.5 as a function of the degrees of freedom.
• The P(χ²) value equals the area under p(χ²) measured from the left, and the α value is the area measured from the right.
• The total area under p(χ²) is equal to unity.

Values for χ²_α

Goodness of Fit Test


• We can use a chi-squared test to determine how good a fit our selected pdf is to the actual distribution of the data.
• The χ² test gives us a measure of the error between the variation in the data set and the variation predicted by the assumed pdf.
• Construct a histogram of the data and a histogram from the predicted pdf, where n_j is the actual and n'_j the predicted number of occurrences per cell. Then:
χ² = Σ_{j=1}^{K} (n_j − n'_j)² / n'_j

For a given number of degrees of freedom, the better fit gives the lower χ².
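The test statistic itself is a one-liner (the observed/expected counts below are hypothetical):

```python
def chi_squared(observed, expected):
    # chi^2 = sum over cells of (n_j - n'_j)^2 / n'_j
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

chi2 = chi_squared([8, 12, 10], [10, 10, 10])
```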

Goodness of Fit Test


• The χ²_α table, given in the previous slide, can be interpreted as a measure of the discrepancy expected as a result of random chance.
• For example, a value for α of 0.95 implies that 95% of the discrepancy between the histogram and the assumed distribution is due to random variation only.

Goodness of Fit Test

• With P(χ²) = 1 − α, this leaves only 5% of the discrepancy caused by a systematic tendency, such as a different distribution.
• In general, a P(χ²) < 0.05 indicates a very strong measure of a good fit to the assumed distribution, an unequivocal result.

Regression Analysis
• Regression analysis is used to establish a functional relationship between the independent and dependent variables. It is assumed that the relationship can be described by either a polynomial or a Fourier series.
• It is also assumed that the variation in the dependent variable is normally distributed about a fixed value of the independent variable.

Regression Analysis
• Such behavior is illustrated in the previous slide by considering the dependent variable y_ij, consisting of N measurements, i = 1, 2, …, N, of y at each of n values of the independent variable x_j, j = 1, 2, …, n.
• This behavior is most common during calibrations, where the input to the measuring system, x, is held nominally fixed while the measurement of y occurs.

Regression Analysis
• Repeated measurements of y will yield a normal distribution with variance s_y²(x_j) about some mean value ȳ(x_j).

mth Order Polynomials


• (least squares regression)

• y_c = a₀ + a₁x + a₂x² + … + a_m x^m gives the estimate of y at each value of the independent variable x.

• The standard deviation of the deviation of each point is given by:
s_yx = [Σ_{i=1}^{N} (y_i − y_ci)² / ν]^1/2

d.o.f.: ν = N − (m + 1) ; N = number of observations, m = order of the polynomial

Linear Polynomials
• For linear polynomials, a correlation coefficient, r, can be found by:
r = ±(1 − s_yx² / s_y²)^1/2

Linear Polynomials
• The correlation coefficient represents a quantitative measure of the linear association between x and y.
• It is bounded by ±1, which represents perfect correlation; the sign indicates whether y increases or decreases with x.
• For ±0.9 < r < ±1, a linear regression can be considered a reliable relation between y and x.

Variance
s_y² = [1/(N − 1)] Σ_{i=1}^{N} (y_i − ȳ)²

A value of ±0.4 < r ≤ ±1 is often considered a good fit between x and y. r² is also often used, but neither r nor r² is a true measure of the precision of y_i; s_yx should be used for precision.
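A self-contained least-squares sketch computing a₀, a₁, r, and s_yx for a linear fit (the data are hypothetical; an exact line is used so the expected values are obvious):

```python
import math

def linear_fit(xs, ys):
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    a1 = sxy / sxx                              # slope
    a0 = y_bar - a1 * x_bar                     # intercept
    resid = sum((y - (a0 + a1 * x)) ** 2 for x, y in zip(xs, ys))
    s_yx = math.sqrt(resid / (n - 2))           # standard error of fit, nu = N - (m+1)
    s_y2 = sum((y - y_bar) ** 2 for y in ys) / (n - 1)
    r = math.copysign(math.sqrt(max(0.0, 1 - s_yx ** 2 / s_y2)), a1)
    return a0, a1, r, s_yx

a0, a1, r, s_yx = linear_fit([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```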

Data Outlier Detection
• It is not uncommon for a data set to contain one or more spurious data points that appear to be out of the range of expected trends.
• Data that lie outside the probability of normal variation can bias the sample mean and variance.
• Statistical methods and histograms are helpful for identifying outliers.
• If outliers are removed, the statistics of the data set need to be re-evaluated.

Method 1
• Method 1 (Three Sigma Test)
  – Use the area under the pdf as the basis for outlier detection:
    • Calculate the data set statistics.
    • Discard all points outside x̄ ± t_{ν,99.8} s_x.
    • Good for N > 10, easy to program.

Method 2
• Method 2 (modified three sigma test for large data sets)
  – Estimate the sample mean and standard deviation.
  – Compute the modified z variable for each data point: z₀ = |x_i − x̄| / s_x
  – The probability that x lies outside the one-sided range from 0 to z₀ is 0.5 − P(z₀), where P(z₀) is found from the one-sided z chart.
  – For N data points, if N[0.5 − P(z₀)] ≤ 0.1, the data point is an outlier. See Example 4.11.
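A sketch of Method 2 using `statistics.NormalDist` for the one-sided normal integral (the data set is hypothetical, with one obvious outlier):

```python
from statistics import NormalDist

def modified_three_sigma_outliers(xs):
    n = len(xs)
    x_bar = sum(xs) / n
    s = (sum((x - x_bar) ** 2 for x in xs) / (n - 1)) ** 0.5
    flagged = []
    for x in xs:
        z0 = abs(x - x_bar) / s                 # modified z variable
        p = NormalDist().cdf(z0) - 0.5          # one-sided area from 0 to z0
        if n * (0.5 - p) <= 0.1:                # outlier criterion
            flagged.append(x)
    return flagged

data = [9.9, 10.0, 10.1, 10.0, 9.9, 10.1, 10.0, 9.9, 10.1, 13.0]
```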

Number of Measurements
Required
• How many observations are required to estimate the true mean x' with acceptable precision?
• Earlier we saw that the precision interval is a quantified measure of the precision error in the estimate of the true mean:
x' = x̄ ± t_{ν,P} s_x̄ , where s_x̄ = s_x / N^1/2
Based on the sample mean and its precision interval:
x' = x̄ ± t_{ν,95} s_x / N^1/2   (95%)

Number of Measurements
Required:
• 95% confidence interval:
CI = ± t_{ν,95} s_x / N^1/2   (95%)

To evaluate this, we must have an estimate of s_x based on previous test data, experience, or manufacturer specifications.
s_x̄ = standard deviation of the means
s_x = standard deviation of the N samples

Number of Measurements
Required:
• The precision interval is two-sided about the mean:
x̄ − t_{ν,95} s_x / N^1/2 ≤ x' ≤ x̄ + t_{ν,95} s_x / N^1/2
• One-sided precision value:
d = CI/2 = t_{ν,95} s_x / N^1/2
• Then the required number of measurements is:
N ≈ (t_{ν,95} s_x / d)²   (95%)
This is an approximation because it relies on an assumed value of s_x.

Number of Measurements
Required:
• The approximation serves as a reminder that this expression is based on an assumed value for s_x.
• The accuracy of equation 4.43 will depend on how well the assumed value for s_x approximates σ.
• The obvious deficiency in this method is that an estimate of the sample variance is needed.
• One way around this is to make a preliminary small number of measurements, N₁, to obtain an estimate of the sample variance, s₁, to be expected.

Number of Measurements
Required:
• The total number of measurements, N_T, is then estimated by:
N_T ≈ (t_{N₁−1,95} s₁ / d)²   (95%)
• This establishes that N_T − N₁ additional measurements will be required.
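The two-step estimate in code form. The t-value for ν = N₁ − 1 = 9 is the standard two-sided 95% entry, 2.262; s₁ and d are hypothetical:

```python
import math

def required_measurements(t95, s1, d):
    # N_T ~ (t_{N1-1,95} * s1 / d)^2  (95%); round up to a whole measurement
    return math.ceil((t95 * s1 / d) ** 2)

# e.g. a preliminary N1 = 10 readings give s1 = 0.5; we want d = 0.2
n_total = required_measurements(2.262, 0.5, 0.2)
```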
