Random Variables
[Figure: histogram of Strength, X (MPa); horizontal axis from 20.0 to 37.5 MPa in steps of 2.5, vertical axis from 0 to 8.]
Data entries 16 to 25 (strength in MPa):
16   27
17   25
18   29
19   31
20   28
21   28.5
22   29.5
23   27
24   25.5
25   28.5
The actual strength is a random variable X, whose chance
of occurrence can be estimated from the histogram.
Hence, a random variable (r.v.):
• identifies events in numerical terms
• permits convenient analytical description & graphical
display of events and their probabilities
• can be discrete (e.g. dice) or continuous (e.g. strength)
CDF: $F_X(x_i) = P(X \le x_i) = \sum_{\text{all } x_j \le x_i} p_X(x_j)$
x_i   p_X(x_i)   F_X(x_i)
0     0.008      0.008
1     0.096      0.008 + 0.096 = 0.104
2     0.384      0.008 + 0.096 + 0.384 = 0.488
3     0.512      0.008 + 0.096 + 0.384 + 0.512 = 1
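The running sums in the table can be reproduced with a short sketch. It assumes only the PMF values listed above; the variable names are illustrative.

```python
from itertools import accumulate

# PMF of X (number of bulldozers), taken from the table above.
pmf = {0: 0.008, 1: 0.096, 2: 0.384, 3: 0.512}

# The CDF at x_i is the running sum of the PMF over all values <= x_i.
xs = sorted(pmf)
cdf = dict(zip(xs, accumulate(pmf[x] for x in xs)))

for x in xs:
    print(f"x = {x}: p_X = {pmf[x]:.3f}, F_X = {cdf[x]:.3f}")
```

The last CDF value should come out as 1 (up to floating-point rounding), which is a quick check that the PMF is complete.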
[Figure: cumulative value $F_X(x)$ (vertical axis, 0.0 to 1.1) plotted against No. of Bulldozers, X (horizontal axis, -1 to 4).]
$F_X(x) = P(X \le x) = \int_{-\infty}^{x} f_X(u)\,du$
Note: if the derivative of $F_X(x)$ exists, then $f_X(x) = \dfrac{dF_X(x)}{dx}$
$P(x < X \le x + dx) = f_X(x)\,dx$
[Figure: curve $f_X(x)$ with the area $f_X(x)\,dx$ marked.]
Probability density function, $f_X(x)$
• $f_X(b) = \dfrac{dF_X(b)}{dx}$: the value represents the gradient of the CDF
• Cannot be negative!
• Area under curve = 1
• Area to the left of $a$ = $P(X < a)$
• Units of 1/X, e.g. if X ~ metres, $f_X(x)$ ~ m$^{-1}$
Integrate the PDF to obtain the CDF; differentiate the CDF to obtain the PDF.
Cumulative distribution function, $F_X(x)$
• Ranges from 0 to 1
• Monotonically increasing (it never decreases)
• Dimensionless (no units)
• $F_X(a) = P(X < a)$
• Gradient given by the PDF
X (MPa)        No. of specimens   Fraction of specimens
20.0 - 22.4    2                  0.08
22.5 - 24.9    1                  0.04
25.0 - 27.4    5                  0.2
27.5 - 29.9    12                 0.48
30.0 - 32.4    3                  0.12
Total          25                 1
[Figure: PDF and cumulative distribution plotted against Strength, X (MPa), from 20.0 to 37.5 MPa.]
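As a rough sketch, the fractions in the table are each count divided by the total of 25 specimens. Dividing the fraction by the class width then gives a density ordinate in MPa$^{-1}$. The class width of 2.5 MPa is read off the class boundaries, and scaling by it is an assumption about how the PDF axis of the histogram was drawn.

```python
# Counts keyed by lower class boundary (MPa), as listed in the table above.
counts = {20.0: 2, 22.5: 1, 25.0: 5, 27.5: 12, 30.0: 3}
total_specimens = 25  # the "Total" row of the table
bin_width = 2.5       # MPa; assumed from the spacing of the class boundaries

# Fraction of specimens per class, and the corresponding density ordinate.
fractions = {lo: n / total_specimens for lo, n in counts.items()}
density = {lo: frac / bin_width for lo, frac in fractions.items()}

for lo in counts:
    print(f"{lo:4.1f} MPa: fraction = {fractions[lo]:.2f}, density = {density[lo]:.3f} per MPa")
```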
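$f_X(x) = c$ for $0 \le x \le 100$, and 0 elsewhere
Since $F_X(\infty) = 1$, $\int_0^{100} f_X(x)\,dx = 100c = 1$
This gives $c = 0.01$ (units = km$^{-1}$)
CDF (here $u$ is the variable of integration and $x$ is the upper limit of the integral):
$F_X(x) = \int_0^x 0.01\,du = [0.01u]_0^x = 0.01x$, for $0 \le x \le 100$
$F_X(x) = 0$, for $x < 0$
$F_X(x) = 1$, for $x > 100$
$P(20 < X \le 35) = F_X(35) - F_X(20) = 0.15$
[Figure: $f_X(x)$ and $F_X(x)$ plotted against $x$ from 0 to 100.]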
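The piecewise CDF of this uniform example can be checked with a minimal sketch. The function name `uniform_cdf` and its default limits are illustrative, not part of the slides.

```python
def uniform_cdf(x, lower=0.0, upper=100.0):
    """CDF of a uniform r.v. on [lower, upper]; the density is c = 1 / (upper - lower)."""
    c = 1.0 / (upper - lower)
    if x < lower:
        return 0.0
    if x > upper:
        return 1.0
    return c * (x - lower)

# P(20 < X <= 35) = F_X(35) - F_X(20)
p = uniform_cdf(35) - uniform_cdf(20)
print(round(p, 2))
```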
3.2 MAIN DESCRIPTORS OF A RANDOM VARIABLE
If we have a set of data for X (e.g. strength of concrete),
how do we describe the PDF? Two possible ways are:
(a) fit a curve that can be described by an algebraic equation;
(b) describe the shape of distribution based solely on the data
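$\bar{x} = \dfrac{1}{N}\sum_{i=1}^{N} x_i = \sum_{i=1}^{N} x_i \left(\dfrac{1}{N}\right)$
$\mu_X = \sum_{i=1}^{N} x_i\,p_X(x_i)$ or $\int_{-\infty}^{\infty} x\,f_X(x)\,dx$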
Mathematical Expectation
(generalization of weighted average)
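In the last slide, the mean or expected value of X ($\mu_X$) is given by
$E[X] = \sum_{i=1}^{N} x_i\,p_X(x_i)$ or $\int_{-\infty}^{\infty} x\,f_X(x)\,dx$
More generally, for a function $g(X)$:
$E[g(X)] = \sum_{i=1}^{N} g(x_i)\,p_X(x_i)$ or $\int_{-\infty}^{\infty} g(x)\,f_X(x)\,dx$
When $g(X) = X^n$:
$E[X^n] = \sum_{i=1}^{N} x_i^n\,p_X(x_i)$ or $\int_{-\infty}^{\infty} x^n f_X(x)\,dx$
This is known as the n-th moment of X (power of X = n).
For the mean, n = 1.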
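For a discrete r.v., the n-th moment is a weighted sum, which the sketch below evaluates for the bulldozer PMF tabulated earlier. The helper name `moment` is illustrative.

```python
def moment(pmf, n):
    """n-th moment E[X^n] of a discrete r.v. given as {x_i: p_X(x_i)}."""
    return sum((x ** n) * p for x, p in pmf.items())

# PMF of the number of bulldozers, from the earlier table.
pmf = {0: 0.008, 1: 0.096, 2: 0.384, 3: 0.512}

mean = moment(pmf, 1)           # first moment, E[X]
second_moment = moment(pmf, 2)  # second moment, E[X^2]
print(f"E[X] = {mean:.3f}, E[X^2] = {second_moment:.3f}")
```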
Variance & Standard Deviation (measure of dispersion)
The mean gives us the “centre of gravity” of the data
distribution. It does not indicate whether the data is
concentrated or spread out. $\mu_X = E[X]$
The deviation of point i from the mean is $x_i - \mu_X$.
Hence, a measure of spread is $E[(X - \mu_X)^2]$, known as the variance.
$\sigma_X^2 = E[(X - \mu_X)^2] = \sum_{i=1}^{N} (x_i - \mu_X)^2\,p_X(x_i)$
or $\int_{-\infty}^{\infty} (x - \mu_X)^2 f_X(x)\,dx$
Note that this is also known as the second central moment of X. It
is the moment of “inertia” of the data about the mean.
The standard deviation, $\sigma_X$, is the square root of the variance.
$\sigma_X^2 = E[X^2] - \mu_X^2$
where $E[X^2] = \sum_{i=1}^{N} (x_i)^2\,p_X(x_i)$ or $\int_{-\infty}^{\infty} x^2 f_X(x)\,dx$
$\sigma_X^2 = E[(X - \mu_X)^2] = E[X^2 - 2\mu_X X + \mu_X^2]$
$= E[X^2] - E[2\mu_X X] + E[\mu_X^2] = E[X^2] - 2\mu_X E[X] + \mu_X^2$
$= E[X^2] - \mu_X^2$ (never negative)
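The two expressions for the variance can be compared numerically. The sketch below reuses the bulldozer PMF from the earlier table as sample input; both routes should agree up to floating-point rounding.

```python
import math

# PMF of the number of bulldozers, from the earlier table.
pmf = {0: 0.008, 1: 0.096, 2: 0.384, 3: 0.512}

mean = sum(x * p for x, p in pmf.items())

# Definition: second central moment, E[(X - mu_X)^2].
var_definition = sum((x - mean) ** 2 * p for x, p in pmf.items())

# Shortcut: E[X^2] - mu_X^2.
var_shortcut = sum(x ** 2 * p for x, p in pmf.items()) - mean ** 2

std = math.sqrt(var_shortcut)
print(f"variance (definition) = {var_definition:.4f}")
print(f"variance (shortcut)   = {var_shortcut:.4f}")
print(f"standard deviation    = {std:.4f}")
```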
[Figure: examples of probability density functions. Source: https://en.wikipedia.org/wiki/Probability_density_function]
[Figure: Singapore gross monthly income (value shown: 5070). Source: http://stats.mom.gov.sg/Pages/Income-Summary-Table.aspx]
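[Figure: $f_T(t)$ plotted against $t$, for $t$ from 0 to 4.]
Method 1 (direct integration, with $f_T(t) = \lambda e^{-\lambda t}$ for $t \ge 0$ and $\mu_T = 1/\lambda$)
$\sigma_T^2 = \int_0^{\infty} \left(t - \dfrac{1}{\lambda}\right)^2 \lambda e^{-\lambda t}\,dt = \dfrac{1}{\lambda^2}$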
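Method 2 (Note: Method 2 is often easier!)
$\sigma_T^2 = E[T^2] - \mu_T^2 = \int_0^{\infty} t^2 f_T(t)\,dt - \left(\dfrac{1}{\lambda}\right)^2$
$= \int_0^{\infty} t^2 \lambda e^{-\lambda t}\,dt - \dfrac{1}{\lambda^2} = \dfrac{2}{\lambda^2} - \dfrac{1}{\lambda^2} = \dfrac{1}{\lambda^2}$
Coefficient of Variation: COV $= \dfrac{\sigma_T}{\mu_T} = \dfrac{1/\lambda}{1/\lambda} = 1$
Mode: Mode = 0
Median: $t_m$ is solved from $F_T(t_m) = 0.5$,
where $F_T(t) = \int_0^{t} \lambda e^{-\lambda u}\,du = \left[-e^{-\lambda u}\right]_0^{t} = 1 - e^{-\lambda t}$
$1 - \exp(-\lambda t_m) = 0.5$, hence $t_m = \ln(2)/\lambda$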
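These closed-form results for the exponential distribution can be checked numerically. The sketch below writes the rate parameter as `lam`, with a value of 0.5 chosen purely for illustration. It uses a simple midpoint rule rather than any particular library routine, and cuts the infinite upper limit off where the tail is negligible.

```python
import math

lam = 0.5  # hypothetical rate parameter, chosen only for illustration

def pdf(t):
    """Exponential density f_T(t) = lam * exp(-lam * t) for t >= 0."""
    return lam * math.exp(-lam * t)

def integrate(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

upper = 50.0 / lam  # beyond this point the tail is of order e^-50

mean = integrate(lambda t: t * pdf(t), 0.0, upper)               # expect 1/lam
second_moment = integrate(lambda t: t ** 2 * pdf(t), 0.0, upper) # expect 2/lam^2
variance = second_moment - mean ** 2                             # Method 2
cov = math.sqrt(variance) / mean                                 # expect 1

median = math.log(2) / lam
cdf_at_median = 1.0 - math.exp(-lam * median)                    # expect 0.5

print(f"mean = {mean:.4f}, variance = {variance:.4f}, COV = {cov:.4f}")
print(f"median = {median:.4f}, F_T(median) = {cdf_at_median:.4f}")
```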
Appendix
Formula for integrating by parts
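$\int u\,dv = uv - \int v\,du$
For $\int_0^{\infty} t\,\lambda e^{-\lambda t}\,dt$, take $u = t$ and $dv = \lambda e^{-\lambda t}\,dt$, so that $v = -e^{-\lambda t}$:
$\int_0^{\infty} t\,\lambda e^{-\lambda t}\,dt = \left[-t e^{-\lambda t}\right]_0^{\infty} + \int_0^{\infty} e^{-\lambda t}\,dt = 0 + \left[-\dfrac{e^{-\lambda t}}{\lambda}\right]_0^{\infty} = \dfrac{1}{\lambda}$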
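The same formula gives the second moment $E[T^2]$ used in Method 2. The worked steps below are a sketch that writes the rate parameter of the exponential density as $\lambda$ and reuses the result $\int_0^{\infty} t\,\lambda e^{-\lambda t}\,dt = 1/\lambda$.

```latex
\begin{aligned}
E[T^2] &= \int_0^{\infty} t^2\,\lambda e^{-\lambda t}\,dt,
  \qquad u = t^2,\quad dv = \lambda e^{-\lambda t}\,dt,\quad v = -e^{-\lambda t} \\
&= \left[-t^2 e^{-\lambda t}\right]_0^{\infty} + \int_0^{\infty} 2t\,e^{-\lambda t}\,dt \\
&= 0 + \frac{2}{\lambda}\int_0^{\infty} t\,\lambda e^{-\lambda t}\,dt \\
&= \frac{2}{\lambda}\cdot\frac{1}{\lambda} = \frac{2}{\lambda^2}
\end{aligned}
```

Subtracting $\mu_T^2 = 1/\lambda^2$ then leaves $\sigma_T^2 = 1/\lambda^2$.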