
Chapter 4

Probability and
Statistics

Figliola and Beasley (1999)

Probability and Statistics


Engineering measurements taken repeatedly under seemingly ideal conditions will normally show variability. Sources include:

• Measurement system
  – Resolution
  – Repeatability
• Measurement procedure and technique
  – Repeatability

Probability and Statistics


• Measured variable
  – Temporal variation
  – Spatial variation

We want:
1. A single representative value that best characterizes the average of the data set.
2. A measure of the variation in a measured data set.

True Value
The true value is represented by x' = x̄ ± u_x

x̄ = most probable estimate of x'
u_x = confidence interval at probability P%, based on estimates of precision and bias error

Probability Density Function


Central Tendency – says that there is one central value about which all other values tend to be scattered.
Probability Density – the frequency with which the measured variable takes on a value within a given interval.

The region where observations tend to gather is around the central value.

Plotting Histograms
• The abscissa is divided between the minimum and maximum measured values of x into K small intervals.
• Let the number of times, n_j, that a measured value falls within an interval defined by x − δx ≤ x < x + δx be plotted on the ordinate.
• For small N, K should be chosen so that n_j ≥ 5 for at least one interval.

Plotting Histograms

• For N > 40: K = 1.87(N − 1)^0.40 + 1

• The histogram displays the central tendency and density.
• If the ordinate is normalized as n_j/N, a frequency distribution results.

Worked example:
K = 1.87(N − 1)^0.40 + 1 = 7
Cell range: min = 0.68 ; max = 1.34
Cell width: 2δx = (1.34 − 0.68)/7 ≈ 0.10
Each cell spans x − δx ≤ x < x + δx.

Note: the total frequency equals 100%, and the total number of occurrences equals N.
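The cell calculations above can be sketched in Python. The sample size N = 20 is a hypothetical value chosen because it reproduces the K = 7 of this example; the min/max are the values above:

```python
def suggest_k(n):
    # K = 1.87*(N - 1)**0.40 + 1, rounded to the nearest integer
    # (the slides give this rule for N > 40; applied here illustratively)
    return int(round(1.87 * (n - 1) ** 0.40 + 1))

def cell_width(lo, hi, k):
    # the measured range is split into K equal cells of width 2*dx
    return (hi - lo) / k

k = suggest_k(20)                 # hypothetical N = 20 gives K = 7
w = cell_width(0.68, 1.34, k)     # ~0.094, rounded to 0.10 in the notes
```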

Probability Density Function


P(x) comes from the frequency distribution:
p(x) = lim_{N→∞, δx→0} n_j / (N(2δx))
– Defines the probability that a measured variable might assume a particular value on any given observation, and also provides the central tendency.
– The shape depends on the variable in consideration and its natural circumstances/processes.
– Plot histograms, compare to common distributions, and then fit the parameters.
– Unifit is a good PC-based distribution-fitting software package.

Infinite Statistics
The most common distribution is the normal or Gaussian distribution ("bell curve"), which predicts that measured values will be scattered symmetrically about the central tendency. Variables such as length, temperature, pressure, and velocity, measured under fixed operating conditions, tend to follow it.

Gaussian Distribution
• pdf for the Gaussian distribution:
p(x) = [1 / (σ(2π)^1/2)] exp[−(1/2)((x − x')² / σ²)]
x' = true mean of x ; σ² = true variance of x

• Used to find/predict the probability that a future measurement falls within some interval: the probability P(x) is the area under p(x) between x' ± δx.

Gaussian Distribution
• Simplification of the integration:
P(x' − δx ≤ x ≤ x' + δx) = ∫_{x'−δx}^{x'+δx} p(x) dx        P(x) = probability ; p(x) = pdf

With β = (x − x')/σ and z₁ = (x₁ − x')/σ, so that dx = σ dβ:
P(−z₁ ≤ β ≤ z₁) = [1 / (2π)^1/2] ∫_{−z₁}^{z₁} e^{−β²/2} dβ

β = standardized normal variate ; z₁ = interval on p(x)

Table 4.3
Solutions are tabulated in Table 4.3.
For z = 1, 68.27% of observations lie within 1 standard deviation of x'.
For z = 2, 95.45% ; for z = 3, 99.73%.
The values of z in Table 4.3 can be used to predict the probability of a unique value occurring in an infinite data set.
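These tabulated probabilities can be reproduced from the error function, since P(−z₁ ≤ β ≤ z₁) = erf(z₁/√2); a minimal check in Python:

```python
import math

def p_within(z):
    # two-sided Gaussian probability P(-z <= beta <= z) = erf(z / sqrt(2))
    return math.erf(z / math.sqrt(2))

for z in (1, 2, 3):
    print(z, round(100 * p_within(z), 2))   # 68.27, 95.45, 99.73
```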

The Z Table

Finite Statistics
When N << ∞, we do not have a true representation of the population. We have finite statistics, which describe the sample and estimate the population.

Sample mean (probable estimate of the true mean):
x̄ = (1/N) Σ_{i=1}^{N} x_i

Sample variance (measurement of precision):
s_x² = [1/(N − 1)] Σ_{i=1}^{N} (x_i − x̄)²

Finite Statistics (cont'd)

Deviation of x_i: x_i − x̄
Sample standard deviation: s_x = (s_x²)^1/2
Degrees of freedom: the number of samples less the number of central tendency measurements, ν = N − 1.
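A direct transcription of these finite-statistics formulas (the readings are hypothetical):

```python
import math

def sample_stats(xs):
    n = len(xs)
    x_bar = sum(xs) / n                                  # sample mean
    s2 = sum((x - x_bar) ** 2 for x in xs) / (n - 1)     # sample variance, nu = N - 1
    return x_bar, s2, math.sqrt(s2)                      # sqrt(s2) = sample std deviation

x_bar, s2, s = sample_stats([0.98, 1.02, 1.01, 0.97, 1.05])
```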

t-estimate
For a finite data set, we use the t-estimate instead of z, which was used for the infinite case. One can state that (P%):
x_i = x̄ ± t_{ν,P} s_x

± t_{ν,P} s_x represents the precision interval

P = probability
ν = degrees of freedom
x̄ = sample mean
Table 4.4 gives the t distribution.
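A sketch of the t-based precision interval. The t-values below are standard two-sided 95% entries of the Student-t table (as in Table 4.4); the data are hypothetical:

```python
import math

# Two-sided 95% t-values for a few degrees of freedom (standard t table)
T95 = {4: 2.770, 9: 2.262, 19: 2.093}

def precision_interval_95(xs):
    n = len(xs)
    x_bar = sum(xs) / n
    s = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    half = T95[n - 1] * s              # t_{nu,95} * s_x, with nu = N - 1
    return x_bar, half                 # x_i = x_bar +/- half (95%)

x_bar, half = precision_interval_95([0.98, 1.02, 1.01, 0.97, 1.05])
```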

Standard Deviation of Means
If we measure a variable N times under fixed
conditions, and replicate this M times, we will end
up with slightly different means for each
replication.

It can be shown that, regardless of the form of the pdf of the individual replications, the mean values themselves will be normally distributed.

The variation in the means depends on:
1. The sample variance s_x²
2. The sample size N

s_x̄ = s_x / N^1/2

The scatter among means increases as s_x² increases and decreases with N^1/2. The expected variance of the means can be estimated from a single finite data set.
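The relation s_x̄ = s_x / N^1/2 in code form:

```python
import math

def std_of_means(s_x, n):
    # s_xbar = s_x / sqrt(N): expected scatter of the sample means
    return s_x / math.sqrt(n)

# quadrupling N halves the scatter of the means
```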

Pooled Statistics
• Samples that are grouped in a manner so as to determine a common set of statistics are called pooled.

• Suppose we have M replicates of variable x, each with N repeated measurements, producing the data set x_ij ; i = 1 to N ; j = 1 to M.

• Pooled mean of x:
⟨x̄⟩ = [1/(MN)] Σ_{j=1}^{M} Σ_{i=1}^{N} x_ij

• Pooled standard deviation of x:
⟨s_x⟩ = {[1/(M(N − 1))] Σ_{j=1}^{M} Σ_{i=1}^{N} (x_ij − x̄_j)²}^1/2 = [(1/M) Σ_{j=1}^{M} s_xj²]^1/2
(with ν = M(N − 1) degrees of freedom)

• Pooled standard deviation of the means:
⟨s_x̄⟩ = ⟨s_x⟩ / (MN)^1/2
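A sketch of the pooled formulas for M replicates of N readings each (the toy data are hypothetical):

```python
import math

def pooled_stats(replicates):
    # replicates: M lists of N readings each
    m, n = len(replicates), len(replicates[0])
    mean = sum(x for rep in replicates for x in rep) / (m * n)   # pooled mean
    ss = 0.0
    for rep in replicates:
        rep_mean = sum(rep) / n
        ss += sum((x - rep_mean) ** 2 for x in rep)
    s_pooled = math.sqrt(ss / (m * (n - 1)))        # pooled std dev, nu = M(N-1)
    s_means = s_pooled / math.sqrt(m * n)           # pooled std dev of the means
    return mean, s_pooled, s_means

mean, s_pooled, s_means = pooled_stats([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]])
```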

Chi Squared Distribution


• Estimates the precision with which s_x² predicts σ².
  – s_x² = sample variance ; σ² = population variance
• If we plotted s_x² for many sets having N samples each, we would generate the pdf for p(χ²) (chi-squared).
• For a normal distribution:
χ² = ν s_x² / σ² , where ν = N − 1

Precision Interval in Sample Variance
• The precision interval for the sample variance can be formulated by the probability statement:
P(χ²_{1−α/2} ≤ χ² ≤ χ²_{α/2}) = 1 − α
with probability P(χ²) = 1 − α ; α = level of significance.
Combining:
P[ν s_x² / χ²_{α/2} ≤ σ² ≤ ν s_x² / χ²_{1−α/2}] = 1 − α
For the 95% precision interval by which s_x² estimates σ²:
ν s_x² / χ²_{0.025} ≤ σ² ≤ ν s_x² / χ²_{0.975}
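A numerical sketch of this interval. The χ² percentiles are standard table entries for ν = 9 (χ²_{0.025} ≈ 19.02, χ²_{0.975} ≈ 2.70); the sample variance is hypothetical:

```python
# 95% precision interval on sigma^2 from a sample variance with nu = 9
nu = 9
s2 = 4.0                          # hypothetical sample variance
chi2_025, chi2_975 = 19.02, 2.70  # standard chi-squared table values for nu = 9

sigma2_lo = nu * s2 / chi2_025    # lower bound on the population variance
sigma2_hi = nu * s2 / chi2_975    # upper bound on the population variance
```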

Precision Interval in Sample Variance
• The χ² distribution estimates the discrepancy expected as a result of random chance.
• Values for χ²_α are tabulated in Table 4.5 as a function of the degrees of freedom.
• The P(χ²) value equals the area under p(χ²) measured from the left, and the α value is the area measured from the right.
• The total area under p(χ²) is equal to unity.

Values for χ²_α

Goodness of Fit Test


• We can use a chi-squared test to determine how good a fit our selected pdf is to the actual distribution of the data.
• The χ² test gives us a measure of the error between the variation in the data set and the variation predicted by the assumed pdf.
• Construct a histogram of the data and a histogram from the predicted pdf, where n_j is the actual and n'_j the predicted number of occurrences per cell. Then:
χ² = Σ_{j=1}^{K} (n_j − n'_j)² / n'_j

For a given number of degrees of freedom, the better fit gives the lower χ².
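The test statistic itself is a one-liner (the observed/expected counts below are hypothetical):

```python
def chi_squared(observed, expected):
    # chi^2 = sum over cells of (n_j - n'_j)^2 / n'_j
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

chi2 = chi_squared([8, 12, 10], [10, 10, 10])
```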

Goodness of Fit Test


• The χ²_α table, given in the previous slide, can be interpreted as a measure of the discrepancy expected as a result of random chance.
• For example, a value for α of 0.95 implies that 95% of the discrepancy between the histogram and the assumed distribution is due to random variation only.

Goodness of Fit Test

• With P(χ²) = 1 − α, this leaves only 5% of the discrepancy caused by a systematic tendency, such as a different distribution.
• In general, a P(χ²) < 0.05 indicates a very strong measure of a good fit to the assumed distribution, an unequivocal result.

Regression Analysis
• Regression analysis is used to establish a functional relationship between the independent and dependent variables. It is assumed that the relationship can be described by either a polynomial or a Fourier series.
• It is also assumed that the variation in the dependent variable is normally distributed about a fixed value of the independent variable.

Regression Analysis
• Such behavior is illustrated in the previous slide by considering the dependent variable y_ij, consisting of N measurements, i = 1, 2, …, N, of y at each of n values of the independent variable x_j, j = 1, 2, …, n.
• This behavior is most common during calibrations, where the input to the measuring system, x, is held nominally fixed while the measurement of y occurs.

Regression Analysis
• Repeated measurements of y will yield a normal distribution with variance s_y²(x_j) about some mean value ȳ(x_j).

mth Order Polynomials


• (least squares regression)

• y_c = a₀ + a₁x + a₂x² + … + a_m x^m gives the estimate of y at each value of the independent variable x.

• The standard deviation of the deviation of each point is given by:
s_yx = [Σ_{i=1}^{N} (y_i − y_ci)² / ν]^1/2

d.o.f.: ν = N − (m + 1) ; N = number of observations, m = order of the polynomial

Linear Polynomials
• For linear polynomials, a correlation coefficient, r, can be found by:
r = ±(1 − s_yx² / s_y²)^1/2

Linear Polynomials
• The correlation coefficient represents a quantitative measure of the linear association between x and y.
• It is bounded by ±1, which represents perfect correlation; the sign indicates whether y increases or decreases with x.
• For ±0.9 < r < ±1, a linear regression can be considered a reliable relation between y and x.

Variance
s_y² = [1/(N − 1)] Σ_{i=1}^{N} (y_i − ȳ)²

A value of ±0.4 < r ≤ ±1 is often considered a good fit between x and y. r² is also often used, but neither r nor r² is a true measure of the precision of y_i; s_yx should be used for precision.
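A self-contained least-squares sketch computing a₀, a₁, r, and s_yx for a linear fit (the data are hypothetical; an exact line is used so the expected values are obvious):

```python
import math

def linear_fit(xs, ys):
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    a1 = sxy / sxx                              # slope
    a0 = y_bar - a1 * x_bar                     # intercept
    resid = sum((y - (a0 + a1 * x)) ** 2 for x, y in zip(xs, ys))
    s_yx = math.sqrt(resid / (n - 2))           # standard error of fit, nu = N - (m+1)
    s_y2 = sum((y - y_bar) ** 2 for y in ys) / (n - 1)
    r = math.copysign(math.sqrt(max(0.0, 1 - s_yx ** 2 / s_y2)), a1)
    return a0, a1, r, s_yx

a0, a1, r, s_yx = linear_fit([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```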

Data Outlier Detection
• It is not uncommon for a data set to contain one or more spurious data points that appear to be out of the range of expected trends.
• Data that lie outside the probability of normal variation can bias the sample mean and variance.
• Statistical methods and histograms are helpful for identifying outliers.
• If outliers are removed, the statistics of the data set need to be re-evaluated.

Method 1
• Method 1 (Three Sigma Test)
  – Use the area under the pdf as the basis for outlier detection:
    • Calculate the data set statistics.
    • Discard all points outside x̄ ± t_{ν,99.8} s_x.
    • Good for N > 10, easy to program.

Method 2
• Method 2 (modified three sigma test for large data sets)
  – Estimate the sample mean and standard deviation.
  – Compute the modified z variable for each data point: z₀ = |x_i − x̄| / s_x
  – The probability that x lies outside the one-sided range from 0 to z₀ is 0.5 − P(z₀), where P(z₀) is found from the one-sided z chart.
  – For N data points, if N[0.5 − P(z₀)] ≤ 0.1, the data point is an outlier. See Example 4.11.
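A sketch of Method 2 using `statistics.NormalDist` for the one-sided normal integral (the data set is hypothetical, with one obvious outlier):

```python
from statistics import NormalDist

def modified_three_sigma_outliers(xs):
    n = len(xs)
    x_bar = sum(xs) / n
    s = (sum((x - x_bar) ** 2 for x in xs) / (n - 1)) ** 0.5
    flagged = []
    for x in xs:
        z0 = abs(x - x_bar) / s                 # modified z variable
        p = NormalDist().cdf(z0) - 0.5          # one-sided area from 0 to z0
        if n * (0.5 - p) <= 0.1:                # outlier criterion
            flagged.append(x)
    return flagged

data = [9.9, 10.0, 10.1, 10.0, 9.9, 10.1, 10.0, 9.9, 10.1, 13.0]
```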

Number of Measurements
Required
• How many observations are required to estimate the true mean x' with acceptable precision?
• Earlier we saw that the precision interval is a quantified measure of the precision error in the estimate of the true mean:
x' = x̄ ± t_{ν,P} s_x̄ , where s_x̄ = s_x / N^1/2
Based on the sample mean and its precision interval:
x' = x̄ ± t_{ν,95} s_x / N^1/2   (95%)

Number of Measurements
Required:
• 95% confidence interval:
CI = ± t_{ν,95} s_x / N^1/2   (95%)

To evaluate this, we must have an estimate of s_x based on previous test data, experience, or manufacturer specifications.
s_x̄ = standard deviation of the means
s_x = standard deviation of the N samples

Number of Measurements
Required:
• The precision interval is two-sided about the mean:
x̄ − t_{ν,95} s_x / N^1/2 ≤ x' ≤ x̄ + t_{ν,95} s_x / N^1/2
• One-sided precision value:
d = CI/2 = t_{ν,95} s_x / N^1/2
• Then the required number of measurements is:
N ≈ (t_{ν,95} s_x / d)²   (95%)
This is an approximation because it relies on an assumed value of s_x.

Number of Measurements
Required:
• The approximation serves as a reminder that this expression is based on an assumed value for s_x.
• The accuracy of equation 4.43 will depend on how well the assumed value for s_x approximates σ.
• The obvious deficiency in this method is that an estimate of the sample variance is needed.
• One way around this is to make a preliminary small number of measurements, N₁, to obtain an estimate of the sample variance, s₁, to be expected.

Number of Measurements
Required:
• The total number of measurements, N_T, is then estimated by:
N_T ≈ (t_{N₁−1,95} s₁ / d)²   (95%)
• This establishes that N_T − N₁ additional measurements will be required.
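The two-step estimate in code form. The t-value for ν = N₁ − 1 = 9 is the standard two-sided 95% entry, 2.262; s₁ and d are hypothetical:

```python
import math

def required_measurements(t95, s1, d):
    # N_T ~ (t_{N1-1,95} * s1 / d)^2  (95%); round up to a whole measurement
    return math.ceil((t95 * s1 / d) ** 2)

# e.g. a preliminary N1 = 10 readings give s1 = 0.5; we want d = 0.2
n_total = required_measurements(2.262, 0.5, 0.2)
```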
