
Random Variables

The document discusses random variables and probability distributions. Some key points:
- A random variable is a variable whose value is subject to variations due to chance. It allows events to be described numerically and probabilities to be calculated.
- The probability distribution of a random variable describes the probabilities associated with all possible values of the variable. It can be discrete or continuous.
- Examples of discrete and continuous random variables and their probability mass functions and cumulative distribution functions are provided.
- For a continuous random variable, the probability density function describes the relative likelihood of the variable taking on a given value.

Uploaded by Chong Ray Jie
Copyright © All Rights Reserved

3. MODELS OF RANDOM PHENOMENA


3.1 CONCEPT OF RANDOM VARIABLE
Consider throwing a die, whose possible outcomes are 1, 2, 3, 4, 5 and 6.
If we throw a die, we are not sure exactly which of these numbers will appear on top.
Let X denote the number that will appear on top.
We call X a random variable because its actual value is not known in advance; each possible value is associated with some degree of chance.
As another example, consider the strength of concrete, which is estimated from a limited sample of specimens:

Specimen  Strength, X (MPa)
1   28
2   26.5
3   20
4   22
5   27.5
6   28.5
7   30
8   28
9   35.5
10  23.5
11  27.5
12  29.5
13  32
14  29
15  33.5
16  27
17  25
18  29
19  31
20  28
21  28.5
22  29.5
23  27
24  25.5
25  28.5

Strength, X (MPa)  No. of specimens  Fraction of specimens
20.0 - 22.4   2    0.08
22.5 - 24.9   1    0.04
25.0 - 27.4   5    0.20
27.5 - 29.9   12   0.48
30.0 - 32.4   3    0.12
32.5 - 34.9   1    0.04
35.0 - 37.4   1    0.04
Total         25   1.00

[Figure: histogram of no. of specimens vs. strength, X (MPa), over the intervals 20.0 to 37.5]
The actual strength is a random variable X, whose chance of taking values in a given range can be estimated from the histogram.
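The grouped table can be rebuilt from the 25 raw strengths. The sketch below assumes equal 2.5 MPa intervals starting at 20.0 MPa (as in the table) and assigns each value to an interval by integer division; the function name `frequency_table` is ours, for illustration only.

```python
import math

# Strengths (MPa) of the 25 concrete specimens, copied from the table above
strengths = [28, 26.5, 20, 22, 27.5, 28.5, 30, 28, 35.5, 23.5,
             27.5, 29.5, 32, 29, 33.5, 27, 25, 29, 31, 28,
             28.5, 29.5, 27, 25.5, 28.5]

LOWER = 20.0   # lower edge of the first interval (MPa), assumed from the table
WIDTH = 2.5    # interval width (MPa)
N_BINS = 7     # intervals 20.0-22.4, 22.5-24.9, ..., 35.0-37.4

def frequency_table(data, lower=LOWER, width=WIDTH, n_bins=N_BINS):
    """Count specimens per interval and convert the counts to fractions."""
    counts = [0] * n_bins
    for x in data:
        k = math.floor((x - lower) / width)  # index of the interval containing x
        counts[k] += 1
    n = len(data)
    fractions = [c / n for c in counts]
    return counts, fractions

counts, fractions = frequency_table(strengths)
for k, (c, f) in enumerate(zip(counts, fractions)):
    lo = LOWER + k * WIDTH
    print(f"{lo:4.1f} - {lo + WIDTH - 0.1:4.1f}  {c:2d}  {f:.2f}")
```

The fractions are the chances of occurrence read off the histogram; they sum to 1 because every specimen falls in exactly one interval.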
Hence, a random variable (r.v.):
• identifies events in numerical terms
• permits convenient analytical description and graphical display of events and their probabilities
• can be discrete (e.g. dice) or continuous (e.g. strength)

Probability Distribution of a Random Variable


Since the value of a r.v. represents an event, a r.v. can assume a specific value with an associated chance or probability.
A probability distribution is a rule describing the probability associated with every value of a random variable.

For a r.v. X, its probability distribution can be described by its cumulative distribution function (CDF):
$$F_X(x) = P(X \le x) \quad \text{for all } x$$
(Convention: denote a r.v. with a capital letter and its value with the corresponding lower-case letter.)
From the axioms of probability, $F_X(-\infty) = 0$, $F_X(+\infty) = 1$, and $F_X(x)$ is non-decreasing and right-continuous in x.

Discrete Random Variable


If X can take only discrete values, each with a nonzero probability, X is a discrete r.v.
The probability $P(X = x_i)$ is denoted as $p_X(x_i)$, which is known as the probability mass function (PMF).
The CDF is
$$F_X(x) = P(X \le x) = \sum_{\text{all } x_i \le x} p_X(x_i)$$
Example 3.1 - Bulldozers
A contractor purchases 3 bulldozers. Each has an 80% chance of not breaking down within 6 months.
X = no. of bulldozers that are functional after 6 months
Sample space and probabilities (G = functional, B = broken down):
X  Sample space     P(X = xi)
3  GGG              0.512
2  GGB, GBG, BGG    0.384
1  GBB, BGB, BBG    0.096
0  BBB              0.008
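Assuming the bulldozers break down independently of one another,
$$p_X(3) = P(GGG) = P(G)\,P(G)\,P(G) = 0.8^3 = 0.512$$
$$p_X(2) = P(GGB \cup GBG \cup BGG) = P(GGB) + P(GBG) + P(BGG) = [0.8 \times 0.8 \times 0.2] \times 3 = 0.384$$
(the three outcomes are mutually exclusive, so their probabilities add).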

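The PMF above can be reproduced by enumerating all 2³ outcomes of the sample space. This is a minimal sketch that, like the hand calculation, assumes the bulldozers break down independently; `bulldozer_pmf` is a hypothetical helper name.

```python
from itertools import product

P_GOOD = 0.8  # probability that a bulldozer is still functional after 6 months

def bulldozer_pmf(n=3, p=P_GOOD):
    """PMF of X = number of functional bulldozers, by enumerating the sample space.

    Assumes the bulldozers break down independently of one another.
    """
    pmf = {x: 0.0 for x in range(n + 1)}
    for outcome in product("GB", repeat=n):      # e.g. ('G', 'G', 'B')
        prob = 1.0
        for s in outcome:
            prob *= p if s == "G" else 1 - p      # independence: multiply probabilities
        pmf[outcome.count("G")] += prob           # X = number of G's; outcomes are disjoint
    return pmf

pmf = bulldozer_pmf()
for x in sorted(pmf, reverse=True):
    print(x, round(pmf[x], 3))
```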
CDF:
$$F_X(x) = P(X \le x) = \sum_{\text{all } x_i \le x} p_X(x_i)$$

xi pX(xi) FX(xi)
0 0.008 0.008
1 0.096 0.008+0.096=0.104
2 0.384 0.008+0.096+0.384=0.488
3 0.512 0.008+0.096+0.384+0.512=1
[Figure: cumulative value FX(x) plotted against no. of bulldozers, X, for x from −1 to 4]
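The FX(xi) column is a running sum of the PMF. The sketch below computes it with `itertools.accumulate` and also evaluates FX(x) between the xi, which shows that the CDF of a discrete r.v. stays flat between the possible values; the function name `F` is only illustrative.

```python
from itertools import accumulate

# PMF of the number of functional bulldozers (table above)
xs = [0, 1, 2, 3]
pmf = [0.008, 0.096, 0.384, 0.512]

# F_X(x_i) = sum of p_X(x_j) over all x_j <= x_i
cdf = list(accumulate(pmf))

def F(x):
    """CDF P(X <= x) for any real x: flat between the possible values x_i."""
    total = 0.0
    for xi, p in zip(xs, pmf):
        if xi <= x:
            total += p
    return total

for xi, Fi in zip(xs, cdf):
    print(xi, round(Fi, 3))
print(F(1.5))  # same as F_X(1): no probability lies between 1 and 2
```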
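For a continuous r.v., probabilities are described by a probability density function (PDF) $f_X(x)$, and the CDF is
$$F_X(x) = P(X \le x) = \int_{-\infty}^{x} f_X(x')\,dx'$$
Note: if the derivative of $F_X(x)$ exists, then
$$f_X(x) = \frac{dF_X(x)}{dx}, \qquad P(x < X \le x + dx) = f_X(x)\,dx$$

[Figure: PDF fX(x) with the area P(X < a) to the left of a and a strip of width dx, above the CDF FX(x) with FX(a) = P(X < a) marked; integrate the PDF to get the CDF, differentiate the CDF to get the PDF]

Probability density function fX(x):
• cannot be negative
• area under curve = 1
• has units of 1/X, e.g. if X is in metres, fX(x) is in m⁻¹
• its value is the gradient of the CDF, e.g. $f_X(b) = dF_X(b)/dx$
• the area to the left of a equals P(X < a)
Cumulative distribution function FX(x):
• ranges from 0 to 1
• monotonically non-decreasing
• dimensionless (no units)
• gradient given by the PDF
• $F_X(a) = P(X < a)$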
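A small numerical sketch of the integrate/differentiate relationship between the PDF and CDF. The PDF $f_X(x) = 2x$ on [0, 1] is a hypothetical example chosen only because its CDF, $x^2$, is easy to check by hand; the trapezoid rule and central difference are our choice of numerical method, not something prescribed by the notes.

```python
def pdf(x):
    """Hypothetical PDF for illustration only: f_X(x) = 2x on [0, 1], zero elsewhere."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0

def cdf_by_integration(x, n=1000):
    """F_X(x): integrate the PDF from 0 (below which it is zero) up to x, trapezoid rule."""
    if x <= 0.0:
        return 0.0
    upper = min(x, 1.0)          # the PDF is zero beyond x = 1
    h = upper / n
    total = 0.5 * (pdf(0.0) + pdf(upper))
    for k in range(1, n):
        total += pdf(k * h)
    return total * h

def pdf_by_differentiation(x, h=1e-5):
    """f_X(x) = dF_X(x)/dx, approximated by a central difference."""
    return (cdf_by_integration(x + h) - cdf_by_integration(x - h)) / (2.0 * h)

print(cdf_by_integration(0.5))      # compare with 0.5**2 = 0.25
print(cdf_by_integration(2.0))      # total area under the PDF, compare with 1
print(pdf_by_differentiation(0.5))  # compare with 2 * 0.5 = 1
```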
Concrete specimens: the fraction of specimens in each strength interval (grouped data in the table above) gives an estimate of the shape of the PDF. Strictly, dividing each fraction by the 2.5 MPa interval width gives a density in MPa⁻¹, so that the total area under the histogram is 1.

[Figure: PDF estimate, fraction of specimens (0.00 to 0.20) vs. strength, X (MPa), from 20.0 to 37.5]
Cumulative distribution function FX(x) of the concrete strength, from the cumulative fraction of specimens:
FX(19) = 0.0, FX(25) = 0.12, FX(30) = 0.80, FX(37.5) = 1.0

[Figure: cumulative fraction of specimens vs. strength, X (MPa), from 20.0 to 37.5]
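The annotated CDF values follow from cumulative counts of the grouped data at the interval edges. This sketch works with the grouped table only (not the raw strengths), so it is defined at the edges; `F_at_edge` is a hypothetical name.

```python
from itertools import accumulate

# Grouped concrete data: interval edges (MPa) and counts per interval
edges = [20.0, 22.5, 25.0, 27.5, 30.0, 32.5, 35.0, 37.5]
counts = [2, 1, 5, 12, 3, 1, 1]
n = sum(counts)  # 25 specimens

# Fraction of specimens below each edge: 0 at the first edge, 1 at the last
cum_fraction = [0.0] + [c / n for c in accumulate(counts)]

def F_at_edge(x):
    """Cumulative fraction below x; 0 before the first edge, 1 after the last.

    Between the first and last edge, x must be one of the interval edges.
    """
    if x < edges[0]:
        return 0.0
    if x >= edges[-1]:
        return 1.0
    return cum_fraction[edges.index(x)]

for x in (19.0, 25.0, 30.0, 37.5):
    print(x, F_at_edge(x))
```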

Example 3.2 – Accidents along a highway


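Traffic conditions along a 100 km highway are constant, so the likelihood of an accident is uniform along its length. Let X = location where an accident occurs. What are the PDF and CDF of X?

PDF:
$$f_X(x) = \begin{cases} c, & 0 \le x \le 100 \\ 0, & \text{elsewhere} \end{cases}$$
Since $F_X(\infty) = 1$, $\int_0^{100} f_X(x)\,dx = 100c = 1$. This gives $c = 0.01$ (units = km⁻¹).

CDF (x' is the variable of integration; x is the upper limit of the integral):
$$F_X(x) = \int_0^{x} 0.01\,dx' = \left[0.01x'\right]_0^{x} = 0.01x, \qquad 0 \le x \le 100$$
with $F_X(x) = 0$ for $x < 0$ and $F_X(x) = 1$ for $x > 100$.

[Figure: fX(x) constant at c for 0 ≤ x ≤ 100 km; FX(x) rising linearly from 0 to 1 over the same range]

$$P(20 < X \le 35) = F_X(35) - F_X(20) = 0.35 - 0.20 = 0.15$$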
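The uniform PDF and CDF of Example 3.2 translate directly into two small functions. This is a sketch; the constant names `L` and `C` are ours.

```python
L = 100.0     # highway length (km)
C = 1.0 / L   # density constant c = 0.01 per km, from "area under PDF = 1"

def f_X(x):
    """Uniform PDF of the accident location along the highway."""
    return C if 0.0 <= x <= L else 0.0

def F_X(x):
    """Uniform CDF: 0 below 0 km, 0.01x on [0, 100] km, 1 above 100 km."""
    if x < 0.0:
        return 0.0
    if x > L:
        return 1.0
    return C * x

print(F_X(35) - F_X(20))  # P(20 < X <= 35), approximately 0.15
```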
3.2 MAIN DESCRIPTORS OF A RANDOM VARIABLE
If we have a set of data for X (e.g. strength of concrete), how do we describe the PDF? Two possible ways are:
(a) fit a curve that can be described by an algebraic equation;
(b) describe the shape of the distribution based solely on the data.

Let us examine (b) first. Some descriptors are:
• the range of values of X
• the value of x at which the PDF is maximum (i.e. where most of the data occur), known as the mode
• the value of x that divides the data into two halves, known as the median
• some weighted value of the data (e.g. the centre of gravity of the data, i.e. the mean value)
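For the 25 concrete strengths, these descriptors can be computed with Python's standard `statistics` module (`multimode` requires Python 3.8 or later). Note that the mode here is taken over repeated raw values; for continuous data it is more usual to read the mode from the tallest histogram bar.

```python
import statistics

# Strengths (MPa) of the 25 concrete specimens from the table in Section 3.1
strengths = [28, 26.5, 20, 22, 27.5, 28.5, 30, 28, 35.5, 23.5,
             27.5, 29.5, 32, 29, 33.5, 27, 25, 29, 31, 28,
             28.5, 29.5, 27, 25.5, 28.5]

print("range  :", min(strengths), "to", max(strengths))
print("mode(s):", statistics.multimode(strengths))  # several values may tie
print("median :", statistics.median(strengths))     # middle (13th) sorted value
print("mean   :", statistics.mean(strengths))       # centre of gravity of the data
```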

In engineering, we collect data by random sampling.
Assume that each sampled point has an equal chance of occurring relative to the other points sampled.
Hence, if N points are sampled, with values xi (i = 1, 2, …, N), then the probability assigned to each value xi is 1/N.
We can compute the average (or sample mean) of a set of data as
$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i = \sum_{i=1}^{N} x_i\left(\frac{1}{N}\right)$$

In the above equation, xi are random samples


Because different values of a random variable X have different probabilities pX(xi) or probability densities fX(x)dx, the weighted average is of special interest; it is the mean (or expected value) of the r.v., denoted as μX or E[X]:
$$\mu_X = E[X] = \sum_{\text{all } x_i} x_i\,p_X(x_i) \quad \text{or} \quad \int_{-\infty}^{\infty} x\,f_X(x)\,dx$$
In the above equation, xi are NOT random samples: xi is varied systematically over all possible values, from −∞ to ∞.
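The difference between the two kinds of xi can be seen by comparing the expected value of a die (values varied systematically, each weighted by its probability) with the sample mean of simulated throws (random samples, each weighted 1/N). This sketch assumes a fair die, i.e. pX(x) = 1/6 for each face, and fixes the random seed so the run is repeatable.

```python
import random
from fractions import Fraction

# Expected value: x_i varied systematically over all outcomes of an (assumed fair) die
faces = range(1, 7)
expected = sum(Fraction(x) * Fraction(1, 6) for x in faces)  # exact, as a fraction

# Sample mean: x_i are random samples, each weighted 1/N
rng = random.Random(42)  # fixed seed so the run is repeatable
N = 100_000
rolls = [rng.randint(1, 6) for _ in range(N)]
sample_mean = sum(rolls) / N

print("E[X] =", expected, "=", float(expected))
print("sample mean of", N, "rolls =", sample_mean)
```

With many throws the sample mean settles close to E[X], but for any finite N it is itself a random quantity.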

Mathematical Expectation
(generalization of weighted average)
In the last slide, the mean or expected value of X (μX) is given by
$$E[X] = \sum_{\text{all } x_i} x_i\,p_X(x_i) \quad \text{or} \quad \int_{-\infty}^{\infty} x\,f_X(x)\,dx$$
In general, the expected value of a function g(X) is defined as
$$E[g(X)] = \sum_{\text{all } x_i} g(x_i)\,p_X(x_i) \quad \text{or} \quad \int_{-\infty}^{\infty} g(x)\,f_X(x)\,dx$$
When g(X) = Xⁿ,
$$E[X^n] = \sum_{\text{all } x_i} x_i^n\,p_X(x_i) \quad \text{or} \quad \int_{-\infty}^{\infty} x^n f_X(x)\,dx$$
This is known as the n-th moment of X (power of X = n). For the mean, n = 1.
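The definition of E[g(X)] for a discrete r.v. is a one-line weighted sum. This sketch uses exact fractions and an assumed fair die as the example PMF; `expectation` and `moment` are illustrative helper names.

```python
from fractions import Fraction

def expectation(g, pmf):
    """E[g(X)] = sum of g(x_i) * p_X(x_i) over all values x_i of a discrete r.v."""
    return sum(g(x) * p for x, p in pmf.items())

def moment(n, pmf):
    """n-th moment E[X^n], i.e. g(X) = X**n."""
    return expectation(lambda x: x ** n, pmf)

# Assumed example: fair die, p_X(x) = 1/6 for x = 1..6
die = {x: Fraction(1, 6) for x in range(1, 7)}

print(moment(1, die))  # 7/2, the mean (n = 1)
print(moment(2, die))  # 91/6
```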
Variance & Standard Deviation (measures of dispersion)
The mean gives us the "centre of gravity" of the data distribution. It does not indicate whether the data are concentrated or spread out.
With μX = E[X], the deviation of point i from the mean is xi − μX. Hence, a measure of spread is E[(X − μX)²], known as the variance:
$$\sigma_X^2 = E[(X-\mu_X)^2] = \sum_{\text{all } x_i} (x_i-\mu_X)^2\,p_X(x_i) \quad \text{or} \quad \int_{-\infty}^{\infty} (x-\mu_X)^2 f_X(x)\,dx$$
Note that this is also known as the second central moment of X. It is the moment of "inertia" of the data about the mean.
The standard deviation, σX, is the square root of the variance.

Useful formula: variance = mean-square − square of mean
$$\sigma_X^2 = E[X^2] - \mu_X^2$$
where
$$E[X^2] = \sum_{\text{all } x_i} x_i^2\,p_X(x_i) \quad \text{or} \quad \int_{-\infty}^{\infty} x^2 f_X(x)\,dx$$
Proof, using E[X] = μX and the linearity of expectation:
$$\sigma_X^2 = E[(X-\mu_X)^2] = E[X^2 - 2X\mu_X + \mu_X^2] = E[X^2] - E[2X\mu_X] + E[\mu_X^2] = E[X^2] - 2\mu_X E[X] + \mu_X^2 = E[X^2] - \mu_X^2$$
(the variance is never negative)

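The definition and the shortcut formula must give the same variance. This sketch checks that exactly with fractions, again assuming a fair die as the example PMF.

```python
from fractions import Fraction

die = {x: Fraction(1, 6) for x in range(1, 7)}  # assumed example: fair die
mu = sum(x * p for x, p in die.items())

# Definition: second central moment
var_def = sum((x - mu) ** 2 * p for x, p in die.items())
# Shortcut: mean-square minus square of mean
var_short = sum(x ** 2 * p for x, p in die.items()) - mu ** 2

print(var_def, var_short)  # 35/12 35/12
```

The shortcut avoids subtracting μX from every value, which is why it is usually easier by hand.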
Coefficient of Variation (δX)
- non-dimensional measure of the dispersion of data relative to the central value
- the standard deviation normalized by the mean: δX = σX/μX
- has no meaning if μX = 0
Measures of central tendency
Mode (xmode) is the "most likely" value, corresponding to the maximum value of the PDF; it is possible to have multiple modes.
Median (denoted as xm) is the value such that there is a 50% chance of X being above or below it, i.e. FX(xm) = 0.5.
Mean (or expectation) is the most common, but not the only, measure of "central tendency".

https://en.wikipedia.org/wiki/Probability_density_function

[Figure: Singapore gross monthly income distribution, with the value 5070 marked]
http://stats.mom.gov.sg/Pages/Income-Summary-Table.aspx

Describing the shape of a distribution
One way to describe data is by the mean (or mode or median) and the variance (or standard deviation or coefficient of variation).
To enhance the description, higher moments can be used.
For example, the third central moment tells us about the symmetry of the distribution (or data) about the mean. The third central moment is given by
$$E[(X-\mu_X)^3] = \sum_{\text{all } x_i} (x_i-\mu_X)^3\,p_X(x_i) \quad \text{or} \quad \int_{-\infty}^{\infty} (x-\mu_X)^3 f_X(x)\,dx$$
The non-dimensional form is known as the skewness coefficient (or simply skewness) and is defined as θX = E[(X − μX)³]/σX³.
A symmetric distribution has skewness = 0, but skewness = 0 does not necessarily imply symmetry!
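To see why zero skewness does not imply symmetry, consider a hypothetical PMF (constructed for illustration, not from the notes) with values −2, 1, 3 and probabilities 0.4, 0.5, 0.1. It is clearly not symmetric about its mean, yet its third central moment is exactly zero.

```python
from fractions import Fraction

# Hypothetical, clearly asymmetric PMF chosen so that the third central moment vanishes
pmf = {-2: Fraction(2, 5), 1: Fraction(1, 2), 3: Fraction(1, 10)}

mu = sum(x * p for x, p in pmf.items())
var = sum((x - mu) ** 2 * p for x, p in pmf.items())
third_central = sum((x - mu) ** 3 * p for x, p in pmf.items())
skew = float(third_central) / float(var) ** 1.5

print("mean =", mu)                             # 0
print("variance =", var)                        # 3
print("third central moment =", third_central)  # 0
print("skewness =", skew)                       # 0.0
```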
Example 3.3 - Bulldozer (moments of distribution)
xi pX(xi) FX(xi)
0 0.008 0.008
1 0.096 0.008+0.096=0.104
2 0.384 0.008+0.096+0.384=0.488
3 0.512 0.008+0.096+0.384+0.512=1
$$\mu_X = E[X] = \sum x_i\,p_X(x_i) = 0(0.008) + 1(0.096) + 2(0.384) + 3(0.512) = 2.4$$
(The expected value of a discrete r.v. is not necessarily a possible value of the r.v.)
$$\operatorname{var}(X) = \sigma_X^2 = E[X^2] - \mu_X^2$$
$$E[X^2] = \sum x_i^2\,p_X(x_i) = 0^2(0.008) + 1^2(0.096) + 2^2(0.384) + 3^2(0.512) = 6.24$$
$$\sigma_X^2 = 6.24 - 2.4^2 = 0.48$$
Standard deviation: $\sigma_X = \sqrt{0.48} = 0.69$
Coefficient of variation (C.O.V.): $\delta_X = 0.69/2.4 = 0.29$
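The descriptors of Example 3.3 can be computed directly from the PMF. As an extra beyond the notes, the sketch also evaluates the skewness coefficient θX from the third central moment defined earlier; it comes out negative, consistent with most of the probability sitting at the upper values 2 and 3.

```python
import math

pmf = {0: 0.008, 1: 0.096, 2: 0.384, 3: 0.512}  # bulldozer PMF (Example 3.1)

mean = sum(x * p for x, p in pmf.items())
mean_sq = sum(x ** 2 * p for x, p in pmf.items())
var = mean_sq - mean ** 2            # variance = mean-square - square of mean
std = math.sqrt(var)
cov = std / mean                     # coefficient of variation
skew = sum((x - mean) ** 3 * p for x, p in pmf.items()) / std ** 3

print(f"mean={mean:.2f} E[X^2]={mean_sq:.2f} var={var:.2f} "
      f"std={std:.2f} cov={cov:.2f} skew={skew:.3f}")
```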

Example 3.4 - Moments of continuous distribution
T = useful life (in hours) of welding machines, with PDF given by the exponential distribution
$$f_T(t) = \lambda e^{-\lambda t} \quad \text{for } t \ge 0, \qquad f_T(t) = 0 \quad \text{for } t < 0$$
where λ = constant parameter.

[Figure: PDF fT(t) of the exponential distribution, plotted for t from 0 to 4]

Mean life of welding machine:
$$\mu_T = E[T] = \int_0^{\infty} t\,f_T(t)\,dt = \int_0^{\infty} t\,\lambda e^{-\lambda t}\,dt = \frac{1}{\lambda}$$
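Variance of life
Method 1:
$$\operatorname{var}(T) = \sigma_T^2 = \int_0^{\infty} (t-\mu_T)^2 f_T(t)\,dt = \int_0^{\infty} \left(t - \frac{1}{\lambda}\right)^2 \lambda e^{-\lambda t}\,dt = \frac{1}{\lambda^2}$$
Method 2 (Note: Method 2 is often easier!):
$$\sigma_T^2 = E[T^2] - \mu_T^2 = \int_0^{\infty} t^2 f_T(t)\,dt - \frac{1}{\lambda^2} = \int_0^{\infty} t^2\,\lambda e^{-\lambda t}\,dt - \frac{1}{\lambda^2} = \frac{2}{\lambda^2} - \frac{1}{\lambda^2} = \frac{1}{\lambda^2}$$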
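Coefficient of variation: $\text{COV} = \delta_T = \dfrac{\sigma_T}{\mu_T} = \dfrac{1/\lambda}{1/\lambda} = 1$
Mode: $t_{mode} = 0$ (the PDF $\lambda e^{-\lambda t}$ is largest at t = 0)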
Median, $t_m$, is solved from $F_T(t_m) = 0.5$, where
$$F_T(t) = \int_0^{t} \lambda e^{-\lambda \tau}\,d\tau = \left[-e^{-\lambda \tau}\right]_0^{t} = 1 - e^{-\lambda t}$$
$1 - \exp(-\lambda t_m) = 0.5$, hence $t_m = \ln(2)/\lambda$

Skewness:
$$\theta_T = \frac{E[(T-\mu_T)^3]}{\sigma_T^3} = \lambda^3 \int_0^{\infty} \left(t - \frac{1}{\lambda}\right)^3 \lambda e^{-\lambda t}\,dt = 2$$

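The closed-form results of Example 3.4 (mean 1/λ, variance 1/λ² by both methods, COV = 1, skewness = 2) can be checked by numerical integration. This is a sketch with a hypothetical λ = 0.5; it uses composite Simpson's rule (our choice) and truncates the integrals at t = 60/λ, where the remaining tail is negligible.

```python
import math

LAM = 0.5  # hypothetical lambda, for illustration only

def f_T(t, lam=LAM):
    """Exponential PDF."""
    return lam * math.exp(-lam * t) if t >= 0 else 0.0

def integrate(g, a, b, n=20_000):
    """Composite Simpson's rule on [a, b]; n must be even."""
    h = (b - a) / n
    s = g(a) + g(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * g(a + k * h)
    return s * h / 3

UPPER = 60.0 / LAM  # truncation point; the tail beyond it contributes about e^-60

mean = integrate(lambda t: t * f_T(t), 0.0, UPPER)
var = integrate(lambda t: (t - mean) ** 2 * f_T(t), 0.0, UPPER)     # Method 1
var2 = integrate(lambda t: t ** 2 * f_T(t), 0.0, UPPER) - mean ** 2  # Method 2
std = math.sqrt(var)
skew = integrate(lambda t: (t - mean) ** 3 * f_T(t), 0.0, UPPER) / std ** 3

print(mean, 1 / LAM)           # mean vs 1/lambda
print(var, var2, 1 / LAM ** 2)  # both methods vs 1/lambda^2
print(std / mean)              # COV, close to 1
print(skew)                    # close to 2
```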
Appendix
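Formula for integrating by parts:
$$\int u\,dv = uv - \int v\,du$$
Example: $\int_0^{\infty} t\,\lambda e^{-\lambda t}\,dt$ with $u = t$, $dv = \lambda e^{-\lambda t}\,dt$ (so $du = dt$, $v = -e^{-\lambda t}$):
$$\int_0^{\infty} t\,\lambda e^{-\lambda t}\,dt = \left[-t e^{-\lambda t}\right]_0^{\infty} + \int_0^{\infty} e^{-\lambda t}\,dt = 0 + \left[-\frac{e^{-\lambda t}}{\lambda}\right]_0^{\infty} = \frac{1}{\lambda}$$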

Note: don't worry; complicated integration (e.g. by parts, substitution) will not be tested in the exam. Any necessary formulas will be given.
You only need to know how to integrate:
• Polynomials: e.g. x2, x3
• Exponential function: exp(bx)
• Simple trigo functions: e.g. cos(bx), sin(bx)
