Continuous Random Variables: Probability Density Function (PDF)
A continuous random variable is a random variable that can take values measured on a continuous scale, e.g. weights, strengths, times or lengths. Its probability density function (pdf) f(x) satisfies P(a ≤ X ≤ b) = ∫_a^b f(x) dx, with f(x) ≥ 0 and ∫_{−∞}^{∞} f(x) dx = 1.
Mean of X: μ = E(X) = ∫_{−∞}^{∞} x f(x) dx

Variance of X: σ² = ∫_{−∞}^{∞} (x − μ)² f(x) dx = ∫_{−∞}^{∞} x² f(x) dx − μ²

Note that the mean and variance may not be well defined for distributions with broad tails. The mode is the value of x where f(x) is maximum (which may not be unique). The median is given by the value of x = m where

∫_{−∞}^{m} f(x) dx = 1/2.
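These definitions can be checked numerically. Below is a minimal Python sketch (the helper name mean_var_median and the step count are my own choices, not part of the notes) that estimates the mean, variance and median of a pdf by midpoint-rule integration, tested on the Uniform pdf on [0, 1]:

```python
def mean_var_median(pdf, lo, hi, steps=200_000):
    """Estimate mean, variance and median of a pdf on [lo, hi] by the
    midpoint rule (a sketch; assumes the pdf is ~0 outside [lo, hi])."""
    dx = (hi - lo) / steps
    mean = var = 0.0
    # mean = integral of x f(x) dx
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        mean += x * pdf(x) * dx
    # variance = integral of (x - mean)^2 f(x) dx
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        var += (x - mean) ** 2 * pdf(x) * dx
    # median = smallest m with integral_{lo}^{m} f(x) dx >= 1/2
    cdf, median = 0.0, hi
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        cdf += pdf(x) * dx
        if cdf >= 0.5:
            median = x
            break
    return mean, var, median

# Uniform(0, 1): mean 1/2, variance 1/12, median 1/2
m, v, med = mean_var_median(lambda x: 1.0, 0.0, 1.0)
```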
Statistics and probability: 3-2
Uniform distribution
The continuous random variable X has the Uniform distribution between a and b (with a < b) if

f(x) = 1/(b − a) for a ≤ x ≤ b, and f(x) = 0 otherwise.

X ~ U(a, b), for short. Its mean and variance are μ = (a + b)/2 and σ² = (b − a)²/12.

Proof:

〈X〉 = ∫_a^b x/(b − a) dx = (b² − a²)/(2(b − a)) = (a + b)/2

〈X²〉 = ∫_a^b x²/(b − a) dx = (b³ − a³)/(3(b − a)) = (a² + ab + b²)/3

σ² = 〈X²〉 − 〈X〉² = (a² + ab + b²)/3 − (a + b)²/4 = (b − a)²/12
1) Waiting times from random arrival time until a regular event (see below)
Example: In a hard disk drive, the disk rotates at 7200 rpm. The wait time is defined as the time between the read/write head moving into position and the beginning of the required information appearing under the head.

(a) What is the distribution of the wait time?
(b) Find the mean and standard deviation of the wait time.
(c) Booting a computer requires that 2000 pieces of information are read from random
positions. What is the total expected contribution of the wait time to the boot time,
and rms deviation?
Solution
(a) Rotation time = 60/7200 s = 8.33 ms. The wait time can be anything between 0 and 8.33 ms, and each time in this range is as likely as any other. Therefore the distribution of the wait time is U(0, 8.33 ms) (i.e. a = 0 and b = 8.33 ms).

(b) Mean = (a + b)/2 = 4.17 ms; standard deviation = (b − a)/√12 = 2.41 ms.

(c) Means add, so the expected total wait is 2000 × 4.17 ms ≈ 8.33 s. Variances add, so the rms deviation of the total is √2000 × 2.41 ms ≈ 0.11 s.
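As a check on the arithmetic, the quantities asked for in (b) and (c) can be computed in a few lines of Python (variable names are mine):

```python
import math

T = 60_000 / 7200             # full rotation time in ms (8.33 ms)
mean = T / 2                  # mean of U(0, T)
sd = T / math.sqrt(12)        # standard deviation of U(0, T)

n = 2000                      # independent reads during boot
total_mean = n * mean         # expected total wait: means add
total_sd = math.sqrt(n) * sd  # rms deviation of the total: variances add

print(f"mean wait      = {mean:.2f} ms")
print(f"sd of wait     = {sd:.2f} ms")
print(f"total expected = {total_mean / 1000:.2f} s")
print(f"total rms dev  = {total_sd:.1f} ms")
```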
Exponential distribution
Relation to Poisson distribution: If a Poisson process has constant rate λ, the mean number of occurrences after a time t is λt. The probability of no occurrences in this time is therefore

P(no occurrence by time t) = e^(−λt).

If f(t) is the pdf for the time of the first occurrence, then the probability of no occurrences by time t is also given by

P(no occurrence by time t) = 1 − ∫_0^t f(s) ds.

So, equating the two ways of calculating the probability, we have

1 − ∫_0^t f(s) ds = e^(−λt),

and differentiating with respect to t gives f(t) = λe^(−λt). Hence the time until the first occurrence (and between subsequent occurrences) has the Exponential distribution, parameter λ:

f(t) = λe^(−λt), t ≥ 0.
Mean and variance: integrating by parts,

μ = ∫_0^∞ t λe^(−λt) dt = [−t e^(−λt)]_0^∞ + ∫_0^∞ e^(−λt) dt = 1/λ

〈T²〉 = ∫_0^∞ t² λe^(−λt) dt = [−t² e^(−λt)]_0^∞ + 2 ∫_0^∞ t e^(−λt) dt = 2/λ²

σ² = 〈T²〉 − μ² = 2/λ² − 1/λ² = 1/λ²
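These results can be checked by simulation; a Python sketch using inverse-cdf sampling (the seed, rate and sample size are arbitrary choices):

```python
import math
import random

random.seed(1)
lam = 0.5        # rate parameter lambda
n = 200_000
# Inverse-cdf sampling: if U ~ U(0,1) then -ln(U)/lambda ~ Exponential(lambda)
samples = [-math.log(random.random()) / lam for _ in range(n)]
mean = sum(samples) / n
var = sum((t - mean) ** 2 for t in samples) / n
# Theory: mean = 1/lambda = 2, variance = 1/lambda^2 = 4
print(f"mean = {mean:.3f}, variance = {var:.3f}")
```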
Example: Reliability
The time till failure of an electronic component has an Exponential distribution and it
is known that 10% of components have failed by 1000 hours.
(a) What is the probability that a component is still working after 5000 hours?
(b) Find the mean and standard deviation of the time till failure.
Solution
The failure time T has pdf λe^(−λt), so P(T > t) = ∫_t^∞ λe^(−λs) ds = [−e^(−λs)]_t^∞ = e^(−λt).

Since 10% of components have failed by 1000 hours, P(T > 1000) = e^(−1000λ) = 0.9, giving λ = −ln(0.9)/1000 ≈ 1.05 × 10⁻⁴ per hour.

(a) P(T > 5000) = e^(−5000λ) = (e^(−1000λ))⁵ = 0.9⁵ ≈ 0.59.

(b) Mean = 1/λ ≈ 9491 hours. For the Exponential distribution the standard deviation equals the mean, so σ ≈ 9491 hours too.
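A short Python check of this solution (variable names are illustrative):

```python
import math

# 10% failed by 1000 h, so P(T > 1000) = exp(-1000 * lam) = 0.9
lam = -math.log(0.9) / 1000        # failure rate per hour

p_5000 = math.exp(-5000 * lam)     # (a) P(still working after 5000 h)
mean = 1 / lam                     # (b) mean time to failure
sd = 1 / lam                       # for an Exponential, sd = mean

print(f"lambda = {lam:.3e} per hour")
print(f"P(T > 5000) = {p_5000:.3f}")
print(f"mean = sd = {mean:.0f} hours")
```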
Normal distribution
The continuous random variable X has the Normal distribution, X ~ N(μ, σ²), if the pdf is:

f(x) = 1/(σ√(2π)) e^(−(x−μ)²/(2σ²)), −∞ < x < ∞
Normalization
[non-examinable]
f(x) cannot be integrated analytically for general ranges, but the full range can be integrated as follows. Substituting x → x + μ, define

I = ∫_{−∞}^{∞} e^(−(x−μ)²/(2σ²)) dx = ∫_{−∞}^{∞} e^(−x²/(2σ²)) dx.

Then

I² = ∫_{−∞}^{∞} e^(−x²/(2σ²)) dx ∫_{−∞}^{∞} e^(−y²/(2σ²)) dy = ∫∫ e^(−(x²+y²)/(2σ²)) dx dy.

Changing to polar coordinates, x² + y² = r² and dx dy = r dr dθ, so

I² = ∫_0^{2π} dθ ∫_0^∞ r e^(−r²/(2σ²)) dr = 2π [−σ² e^(−r²/(2σ²))]_0^∞ = 2πσ².

Hence I = σ√(2π), which is why this factor divides the pdf: it makes ∫_{−∞}^{∞} f(x) dx = 1. Similar integrations (by parts) confirm that the parameters are indeed the mean and variance: ∫ x f(x) dx = μ and ∫ (x − μ)² f(x) dx = σ².
1) Quite a few variables, e.g. human height, measurement errors, detector noise.
(Bell-shaped histogram).
Change of variable
If X ~ N(μ, σ²), then Z = (X − μ)/σ ~ N(0, 1). Z is called a standard Normal variate, and its cdf is what the tables give:

Φ(z) = P(Z ≤ z) = ∫_{−∞}^{z} 1/√(2π) e^(−t²/2) dt
Outside of exams this is probably best evaluated using a computer package (e.g.
Maple, Mathematica, Matlab, Excel); for historical reasons you still have to use
tables.
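Outside of exams, for example, the standard Normal cdf Φ can be computed in Python from the error function in the math module, since Φ(z) = (1 + erf(z/√2))/2:

```python
import math

def phi(z):
    """Standard Normal cdf: Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

print(phi(0.5))   # ~0.6915, matching the tables
print(phi(1.5))   # ~0.9332
```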
Example: Using standard Normal tables (on course web page and in exams)
If Z ~ N(0, 1):
(a)
(c) P(Z < 0.5) = Φ(0.5) = 0.6915.

(d) P(0.5 < Z < 1.5)
= Φ(1.5) − Φ(0.5)
= 0.9332 − 0.6915
= 0.2417.
(g) Finding a range of values within which Z lies with probability 0.95: the answer is not unique, but suppose we want an interval which is symmetric about zero, i.e. between −d and d. Then

P(−d ≤ Z ≤ d) = Φ(d) − Φ(−d) = 2Φ(d) − 1 = 0.95,

so Φ(d) = 0.975 and, from the tables, d = 1.96.
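The 1.96 figure can be recovered in Python with statistics.NormalDist (available from Python 3.8), using the inverse cdf:

```python
from statistics import NormalDist

# P(-d <= Z <= d) = 0.95  <=>  Phi(d) = 0.975
d = NormalDist().inv_cdf(0.975)
print(round(d, 2))   # 1.96
```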
Example: The outside diameter, X mm, of a copper pipe is distributed as N(15.00, 0.02²) and the fittings for joining the pipe have inside diameter Y mm, where Y ~ N(15.07, 0.022²).
(iii) Find the probability that a randomly chosen pipe fits into a randomly chosen
fitting (i.e. X < Y).
Solution
(ii) From the previous example, Z lies in (−1.96, 1.96) with probability 0.95,

i.e. X = 15.00 + 0.02Z lies in (15.00 − 1.96 × 0.02, 15.00 + 1.96 × 0.02) = (14.96, 15.04) with probability 0.95.
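Part (iii) can be evaluated with Python's statistics.NormalDist; the sketch below uses the fact (from the next section) that D = Y − X is Normal, with the means subtracted and the variances added:

```python
import math
from statistics import NormalDist

mu_x, sd_x = 15.00, 0.020   # pipe outside diameter   X ~ N(15.00, 0.020^2)
mu_y, sd_y = 15.07, 0.022   # fitting inside diameter Y ~ N(15.07, 0.022^2)

# D = Y - X ~ N(mu_y - mu_x, sd_x^2 + sd_y^2); the pipe fits iff D > 0
mu_d = mu_y - mu_x
sd_d = math.hypot(sd_x, sd_y)      # sqrt(sd_x^2 + sd_y^2)
p_fit = 1 - NormalDist(mu_d, sd_d).cdf(0)
print(f"P(X < Y) = {p_fit:.4f}")   # about 0.991
```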
Remember that means and variances of independent random variables just add. So if X₁, X₂, …, Xₙ are independent and each have a Normal distribution, Xᵢ ~ N(μᵢ, σᵢ²), we can easily calculate the mean and variance of the sum. A special property of the Normal distribution is that the distribution of the sum of Normal variates is also a Normal distribution. So if c₁, c₂, …, cₙ are constants then:

c₁X₁ + c₂X₂ + … + cₙXₙ ~ N(c₁μ₁ + … + cₙμₙ, c₁²σ₁² + … + cₙ²σₙ²)
Proof that the distribution of the sum is Normal is beyond our scope. Useful special cases for two variables are

X₁ + X₂ ~ N(μ₁ + μ₂, σ₁² + σ₂²)
X₁ − X₂ ~ N(μ₁ − μ₂, σ₁² + σ₂²)

If all the X's have the same distribution, i.e. μ₁ = μ₂ = … = μₙ = μ, say, and σ₁² = σ₂² = … = σₙ² = σ², say, then taking all cᵢ = 1/n gives the sample mean:

X̄ = (X₁ + X₂ + … + Xₙ)/n ~ N(μ, σ²/n)
The last result tells you that if you average n identical independent noisy measurements, the error decreases by a factor of √n (the variance goes down as 1/n).
Example: Each of n independent detectors measures a temperature with a Normal error of variance 1 K². How many detectors are needed for the rms error of their average to be 0.1 K?
Answer: We can estimate the temperature by averaging the readings from the n detectors. The variance of this mean will be 1 K²/n, where n is the number of detectors. An rms error of 0.1 K corresponds to a variance of 0.01 K², hence we need n = 100 detectors.
Normal approximations
Central Limit Theorem: If X₁, X₂, … are independent random variables with the same distribution, which has mean μ and variance σ² (both finite), then the sum

∑_{i=1}^{n} Xᵢ tends to the distribution N(nμ, nσ²) as n → ∞.

Hence: The sample mean X̄ₙ = (1/n) ∑_{i=1}^{n} Xᵢ is distributed approximately as N(μ, σ²/n) for large n.
For the approximation to be good, n typically needs to be 30 or more for skewed distributions, but can be quite small for simple symmetric distributions.
The approximation tends to have much better fractional accuracy near the peak than
in the tails: don’t rely on the approximation to estimate the probability of very rare
events.
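A quick simulation illustrates the theorem: averages of n = 30 Uniform(0, 1) samples (a symmetric case, so n = 30 is ample) should have mean ≈ μ = 1/2 and variance ≈ σ²/n = (1/12)/30. A sketch (seed and sample counts are arbitrary):

```python
import random
import statistics

random.seed(0)
n, trials = 30, 20_000
# Each entry is the mean of n Uniform(0,1) draws
means = [sum(random.random() for _ in range(n)) / n for _ in range(trials)]
m = statistics.fmean(means)
v = statistics.pvariance(means)
print(f"mean of sample means = {m:.4f}  (theory: 0.5)")
print(f"variance of sample means = {v:.6f}  (theory: {1/12/n:.6f})")
```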
Example: Average of n samples from a uniform distribution:
[Figure: pdfs of the average of n Uniform samples, approaching the Normal shape as n increases.]
Example: I toss a coin 1000 times, what is the probability that I get more than 550
heads?
Answer: The number of heads X has a Binomial distribution with mean np = 500 and variance np(1 − p) = 250. So the number of heads can be approximated as X ~ N(500, 250). Hence

P(X > 550) ≈ P(Z > (550 − 500)/√250) = P(Z > 3.16) = 1 − Φ(3.16) ≈ 0.0008.
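The approximation can be compared with the exact Binomial tail, which is feasible to sum directly here (a Python sketch; names are mine):

```python
import math
from statistics import NormalDist

n, p = 1000, 0.5
mu, var = n * p, n * p * (1 - p)   # mean 500, variance 250

# Normal approximation: heads ~ N(500, 250)
p_norm = 1 - NormalDist(mu, math.sqrt(var)).cdf(550)

# Exact Binomial tail P(X > 550) for comparison
p_exact = sum(math.comb(n, k) for k in range(551, n + 1)) / 2 ** n

print(f"normal approx: {p_norm:.5f}")
print(f"exact:         {p_exact:.5f}")
```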
The manufacturing of computer chips produces 10% defective chips. 200 chips are
randomly selected from a large production batch. What is the probability that fewer
than 15 are defective?
Solution: The number of defectives X has a Binomial distribution with mean np = 200 × 0.1 = 20 and variance np(1 − p) = 18, so approximately X ~ N(20, 18). Hence

P(X < 15) ≈ Φ((15 − 20)/√18) = Φ(−1.18) ≈ 0.12.

(Using a continuity correction, Φ((14.5 − 20)/√18) = Φ(−1.30) ≈ 0.10, which is closer to the exact Binomial answer.)
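The approximation and the exact Binomial probability can be compared in a few lines of Python (a sketch, stdlib only):

```python
import math
from statistics import NormalDist

n, p = 200, 0.1
mu = n * p                          # 20
sd = math.sqrt(n * p * (1 - p))     # sqrt(18)

# Normal approximation with continuity correction: P(X <= 14) ~ Phi at 14.5
approx = NormalDist(mu, sd).cdf(14.5)
# Exact Binomial: P(X < 15) = sum of pmf over k = 0..14
exact = sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(15))
print(f"approx = {approx:.3f}, exact = {exact:.3f}")
```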
If Y has a Poisson distribution with parameter λ, and λ is large (> 7, say), then Y has approximately a N(λ, λ) distribution.
At a given hospital, patients with a particular virus arrive at an average rate of once
every five days. Pills to treat the virus (one per patient) have to be ordered every 100
days. You are currently out of pills; how many should you order if the probability of
running out is to be less than 0.005?
Solution
Assume the patients arrive independently, so this is a Poisson process with rate 0.2/day. Therefore Y, the number of pills needed in 100 days, has a Poisson distribution with λ = 100 × 0.2 = 20. We need the smallest n with P(Y > n) < 0.005. Using the Normal approximation Y ≈ N(20, 20), this requires n > 20 + 2.576√20 ≈ 31.5, so order 32 pills.
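The exact Poisson calculation is easy to do directly, accumulating the Poisson cdf term by term via the recurrence P(k+1) = P(k)·λ/(k+1) (a sketch; variable names are mine):

```python
import math

lam = 0.2 * 100   # Poisson mean over 100 days = 20

# Smallest k with P(Y > k) < 0.005, i.e. P(Y <= k) >= 0.995
cdf, k, pmf = 0.0, 0, math.exp(-lam)   # pmf starts at P(Y = 0)
while True:
    cdf += pmf
    if cdf >= 0.995:
        break
    pmf *= lam / (k + 1)               # P(k+1) = P(k) * lam / (k+1)
    k += 1
print(f"order {k} pills")
```

Here the exact calculation agrees with the Normal approximation: 32 pills.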
Comment
Let's say the virus is deadly, so we want to make sure the probability of running out is less than 1 in a million, 10⁻⁶. A Normal approximation would put the threshold about 4.75 standard deviations above the mean, so n ≈ 20 + 4.75√20 ≈ 42 pills. But surely getting just a bit above twice the average number of cases is not that unlikely?
Don’t use approximations that are too simple if their failure might be
important! Rare events in particular are often a lot more likely than predicted by
(too-) simple approximations for the probability distribution.