3.2. Poisson random variable. A non-negative integer-valued random variable X is said to
be a Poisson random variable with parameter λ > 0, denoted as X ∼ P(λ), if its probability
mass function is given by
p(i) = P(X = i) = e^{−λ} \frac{λ^i}{i!},   i = 0, 1, . . .
The Poisson random variable has a wide range of applications in diverse areas. For example, the
number of misprints on a page of a book and the number of phone calls received by a call center
both follow a Poisson distribution. The Poisson distribution is actually a limiting case of a binomial
distribution when the number of trials n gets large and the probability of success p is small. To
see this, let X ∼ B(n, p). Set λ = np. Then
P(X = i) = \binom{n}{i} p^i (1 − p)^{n−i} = \frac{n(n − 1) \cdots (n − i + 1)}{n^i} \, \frac{λ^i}{i!} \left(1 − \frac{λ}{n}\right)^{n−i}.
For large n and keeping λ and i fixed, we see that
\frac{n(n − 1) \cdots (n − i + 1)}{n^i} ≈ 1, \qquad \left(1 − \frac{λ}{n}\right)^{−i} ≈ 1, \qquad \left(1 − \frac{λ}{n}\right)^{n} ≈ e^{−λ}.
Therefore, we have
P(X = i) ≈ e^{−λ} \frac{λ^i}{i!}.
Example 3.4. A factory produces nails and packs them in boxes of 400. If the probability that
a nail is substandard is 0.005, find the probability that a box selected at random contains at most
two nails which are substandard.
Solution: Let X denote the number of substandard nails in a box of 400. Then X ∼ B(n, p)
with n = 400 and p = 0.005. Since n is large enough and p is small, we can use the Poisson
approximation and therefore X ∼ P(λ) where λ = 400 × 0.005 = 2. The required probability is
given by
P(X ≤ 2) = e^{−2}\left(1 + \frac{2}{1!} + \frac{2^2}{2!}\right) = 5\,e^{−2}.
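As a quick numerical check of the Poisson approximation used above, the following sketch in Python (assuming SciPy is available; it is not part of these notes) compares the exact binomial probability with its Poisson approximation:

# Sketch: compare the exact binomial P(X <= 2) with its Poisson approximation
# for Example 3.4 (n = 400, p = 0.005, lambda = np = 2). Assumes SciPy is installed.
from scipy.stats import binom, poisson

n, p = 400, 0.005
lam = n * p                          # Poisson parameter

exact = binom.cdf(2, n, p)           # exact binomial probability
approx = poisson.cdf(2, lam)         # Poisson approximation, equals 5*exp(-2)

print(exact, approx)                 # both are close to 5*e^{-2} ≈ 0.6767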
Expected value and variance of Poisson random variable: In view of the relation
between binomial distribution and Poisson distribution, one can expect that expectation and
variance of a Poisson random variable would be np = λ. We now verify this result. For any
X ∼ P(λ),
E[X] = \sum_{i=0}^{∞} i\,p(i) = λ e^{−λ} \sum_{i=1}^{∞} \frac{λ^{i−1}}{(i − 1)!} = λ,

E[X^2] = \sum_{i=0}^{∞} i^2 p(i) = λ e^{−λ} \sum_{i=1}^{∞} i\, \frac{λ^{i−1}}{(i − 1)!} = λ e^{−λ} \sum_{j=0}^{∞} (j + 1) \frac{λ^j}{j!}
       = λ \left[ e^{−λ} \sum_{j=0}^{∞} \frac{j λ^j}{j!} + e^{−λ} \sum_{j=0}^{∞} \frac{λ^j}{j!} \right] = λ(λ + 1),

Var(X) = E[X^2] − (E[X])^2 = λ(λ + 1) − λ^2 = λ.
The moment generating function of X is given by m_X(t) = e^{−λ(1−e^t)} for all t; see Example 2.16.
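These moment formulas can be checked numerically; the sketch below (Python standard library only; the truncation of the series at 200 terms is an arbitrary choice) sums the probability mass function directly:

# Sketch: verify E[X] = Var(X) = lambda for a Poisson random variable by
# summing the pmf over a long truncated range.
import math

lam = 3.7                                    # any positive parameter
pmf = [math.exp(-lam) * lam**i / math.factorial(i) for i in range(200)]

mean = sum(i * pmf[i] for i in range(200))
second_moment = sum(i**2 * pmf[i] for i in range(200))

print(mean, second_moment - mean**2)         # both numerically equal to lam = 3.7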
Example 3.5. The mean number of errors due to a bug occurring in a minute is 0.0001. What
is the probability that there will be no error in 30 minutes? How long would the program need
to run to ensure that there is a 99.95% chance that an error will show up to highlight this bug?
Solution. Let X denote the total number of errors occurring in 30 minutes. Then X ∼ P(λ)
where λ = 30 × 0.0001 = 0.003. The probability that there will be no error in 30 minutes is equal
to
P(X = 0) = e−0.003 .
Suppose that the program needs to run for k minutes. Then P(no occurrence of error in k minutes) =
e^{−(0.0001)k}. Hence, to be 99.95% sure of catching the bug, we need to choose k such that
1 − e^{−(0.0001)k} ≥ 0.9995 ⇔ e^{−(0.0001)k} ≤ 0.0005 =⇒ k ≥ \frac{\ln(2000)}{0.0001} ≈ 76009 minutes.
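The threshold for k can be checked directly; a minimal sketch in Python (standard library only):

# Sketch: smallest run time k (in minutes) with 1 - exp(-0.0001*k) >= 0.9995.
import math

rate = 0.0001
k = math.log(1 / 0.0005) / rate      # = ln(2000)/0.0001
print(k)                             # ≈ 76009 minutes (roughly 53 days)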
3.3. Negative binomial and geometric random variables. A positive, integer-valued random
variable X is said to be a geometric random variable with parameter p, denoted by
X ∼ G(p), if its probability mass function is given by
P(X = n) = (1 − p)^{n−1} p,   n = 1, 2, . . . , for some 0 < p < 1.
X basically represents the number of trials required until a success occurs. One can easily check
that
E[X] = \frac{1}{p}, \qquad E[X^2] = \frac{2 − p}{p^2}, \qquad Var(X) = \frac{1 − p}{p^2}.
Example 3.6. An urn contains 10 white and 4 black balls. Balls are randomly selected, one at
a time, until a black one is obtained. If we assume that each ball selected is replaced before the
next one is drawn, what is the probability that exactly 3 draws are needed ?
Solution: We denote by X the number of draws required to select a black ball. Then X is a
geometric random variable with parameter p = 2/7. Hence the probability that exactly 3 draws are
needed is given by
P(X = 3) = \left(\frac{5}{7}\right)^2 × \frac{2}{7} = \frac{50}{343}.
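A simulation-based sanity check of this answer (a sketch in Python using only the standard library; the number of repetitions is an arbitrary choice):

# Sketch: estimate P(exactly 3 draws are needed) by simulating draws with
# replacement from an urn with 10 white and 4 black balls (p = 2/7).
import random

def draws_until_black():
    n = 0
    while True:
        n += 1
        if random.random() < 4 / 14:          # probability of drawing a black ball
            return n

trials = 200_000
estimate = sum(draws_until_black() == 3 for _ in range(trials)) / trials
print(estimate, 50 / 343)                     # estimate should be close to 50/343 ≈ 0.1458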
A positive, integer-valued random variable X with probability mass function
P(X = n) = \binom{n − 1}{r − 1} p^r (1 − p)^{n−r},   n = r, r + 1, . . . , for some 0 < p < 1,
is said to be a negative binomial random variable with parameters (r, p). It is denoted by
X ∼ BN(r, p). It basically represents the number of trials required until a total of r successes
occurs. Note that a geometric random variable is just a negative binomial with parameters (1, p).
Let us calculate the mean and variance of negative binomial random variable X. Observe that
E[X^k] = \sum_{n=r}^{∞} n^k \binom{n − 1}{r − 1} p^r (1 − p)^{n−r}
       = \frac{1}{p} \sum_{n=r}^{∞} n^{k−1}\, n \binom{n − 1}{r − 1} p^{r+1} (1 − p)^{n−r}
       = \frac{r}{p} \sum_{n=r}^{∞} n^{k−1} \binom{n}{r} p^{r+1} (1 − p)^{n−r}
       = \frac{r}{p} \sum_{m=r+1}^{∞} (m − 1)^{k−1} \binom{m − 1}{r} p^{r+1} (1 − p)^{m−(r+1)}   (3.3)
       = \frac{r}{p} E\big[(Y − 1)^{k−1}\big],   (3.4)
where Y ∼ BN(r + 1, p) and we have used the identity n \binom{n − 1}{r − 1} = r \binom{n}{r}. Hence,
setting k = 1, we have E[X] = \frac{r}{p}. For k = 2, we have
E[X^2] = \frac{r}{p} E[Y − 1] = \frac{r}{p}\left(\frac{1 + r}{p} − 1\right).
Using these estimates, we calculate the variance as
Var(X) = \frac{r}{p}\left(\frac{1 + r}{p} − 1\right) − \frac{r^2}{p^2} = \frac{r(1 − p)}{p^2}.
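These formulas can also be checked by simulation, using the fact that the trial count of a negative binomial random variable is a sum of r independent geometric trial counts. A sketch in Python (standard library only; the sample size is arbitrary):

# Sketch: simulate X ~ BN(r, p) as a sum of r geometric waiting times and
# compare the sample mean/variance with r/p and r(1-p)/p^2.
import random, statistics

def negative_binomial(r, p):
    total = 0
    for _ in range(r):
        n = 1
        while random.random() >= p:           # count trials until a success
            n += 1
        total += n
    return total

r, p = 4, 0.45
samples = [negative_binomial(r, p) for _ in range(100_000)]
print(statistics.mean(samples), r / p)                    # ≈ 8.89
print(statistics.variance(samples), r * (1 - p) / p**2)   # ≈ 10.86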
Example 3.7. Suppose that the Indian football team wins any one game with probability 45% and
the games are independent of one another.
a) What is the probability that it takes 15 games to win their 4th game?
b) What are the expected value and variance of the number of games it will take to win their
40th game?
c) Knowing that they got their 45th win with 4 games remaining in the season, what is the
probability that they do not get their 46th win?
Solution: Let X and Y denote the number of games required to win their 4th and 40th games,
respectively. Then X ∼ BN(4, 0.45) and Y ∼ BN(40, 0.45).
a) The required probability is given by
P(X = 15) = \binom{14}{3} (0.45)^4 (0.55)^{11}.
b) Since Y ∼ BN(40, 0.45), we have E[Y] = \frac{40}{0.45} ≈ 88.9 and Var(Y) = \frac{40 × 0.55}{(0.45)^2} ≈ 108.6.
c) Let Z denote the number of remaining games needed to get their next (46th) win. Then
Z ∼ G(0.45), and the required probability is
P(Z > 4) = (0.55)^4 ≈ 0.0915.
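The numbers in this example can be verified with SciPy (a sketch; note that SciPy's nbinom counts failures before the r-th success rather than total trials, which is that library's convention, not part of these notes):

# Sketch: check parts (a) and (b) of Example 3.7 numerically.
from math import comb
from scipy.stats import nbinom

p = 0.45
prob_a = comb(14, 3) * p**4 * (1 - p)**11      # direct formula for part (a)
print(prob_a, nbinom.pmf(15 - 4, 4, p))        # both ≈ 0.0208

r = 40
print(r / p, r * (1 - p) / p**2)               # part (b): E[Y] ≈ 88.9, Var(Y) ≈ 108.6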
3.4. Uniform random variable. A random variable X is said to have a uniform distribution
on the interval [a, b] with −∞ < a < b < ∞, if its probability density function is given by
f(x) = \frac{1}{b − a} for a ≤ x ≤ b, and f(x) = 0 otherwise.
We write X ∼ U[a, b] if X has a uniform distribution on [a, b]. The distribution function of
X is given by
F(x) = 0 for x < a, \qquad F(x) = \frac{x − a}{b − a} for a ≤ x < b, \qquad F(x) = 1 for b ≤ x.
Lemma 3.1. Let X ∼ N(µ, σ^2) and let Y := aX + b with a ≠ 0 and b ∈ R. Then Y ∼ N(aµ + b, a^2 σ^2).
Proof. For simplicity, let a > 0. A similar proof can be carried out for a < 0. Let F_Y(·) be the
cumulative distribution function of Y. Then
F_Y(x) = P(Y ≤ x) = P\left(X ≤ \frac{x − b}{a}\right) = F_X\left(\frac{x − b}{a}\right),
where F_X(·) is the cumulative distribution function of X. If f_Y is the density function of Y,
then we have
f_Y(x) = \frac{1}{a} f_X\left(\frac{x − b}{a}\right) = \frac{1}{a σ \sqrt{2π}} \exp\left\{ −\frac{(x − (aµ + b))^2}{2 a^2 σ^2} \right\}.
This implies that Y ∼ N(aµ + b, a^2 σ^2).
Remark 3.2. In view of Lemma 3.1, we see that for any X ∼ N(µ, σ^2) the random variable
Z := \frac{X − µ}{σ} is a standard normal random variable.
Example 3.9. For any X ∼ N (3, 9), find i) P(2 < X < 5) and ii) P(|X − 3| > 6).
Solution: In view of Remark 3.2, we need to express the probabilities in terms of a standard
normal random variable and then use Table 1.
i) P(2 < X < 5) = P\left(\frac{2 − 3}{3} < Z < \frac{5 − 3}{3}\right) = Φ(2/3) − Φ(−1/3) = Φ(2/3) + Φ(1/3) − 1 ≈ 0.3779.
ii) P(|X − 3| > 6) = 1 − P(|X − 3| ≤ 6) = 1 − P(−3 ≤ X ≤ 9) = 1 − P(−2 ≤ Z ≤ 2)
   = 1 − [Φ(2) − Φ(−2)] = 2(1 − Φ(2)) ≈ 0.0456.
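The same probabilities can be obtained from SciPy's standard normal CDF instead of Table 1 (a sketch, assuming SciPy is available):

# Sketch: check Example 3.9 with the standard normal CDF.
from scipy.stats import norm

# i) P(2 < X < 5) for X ~ N(3, 9), i.e. sigma = 3
print(norm.cdf(2/3) - norm.cdf(-1/3))          # ≈ 0.378

# ii) P(|X - 3| > 6) = 2(1 - Phi(2))
print(2 * (1 - norm.cdf(2)))                   # ≈ 0.0455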
Table 1. Area Φ(x) under the standard normal curve to the left of x
Example 3.10. Suppose that the life lengths of two electronic devices say, D1 and D2 , have
normal distributions N (40, 36) and N (45, 9), respectively. If a device is to be used for 45 hours,
which device would be preferred?
Solution: We will find which device has the greater probability of a lifetime of more than 45 hours.
We have
P(D_1 > 45) = P\left(Z > \frac{5}{6}\right) = 1 − Φ(5/6) ≈ 0.2005,
P(D_2 > 45) = P(Z > 0) = 1 − Φ(0) = 0.5.
Since P(D2 > 45) > P(D1 > 45), the device D2 will be preferred.
3.5.1. The Normal Approximation to the Binomial Distribution. We have seen that
B(n, p) converges to P(λ = np) when n is large and p is very small. One can ask for the limiting
distribution of B(n, p) when neither p nor q = 1 − p is very small (in particular, when np(1 − p) is
large). The answer is given by the DeMoivre-Laplace limit theorem. We state this theorem
without proof, as it is a special case of the central limit theorem which will be discussed later.
Theorem 3.2 (DeMoivre-Laplace limit theorem). If S_n denotes the number of successes, with
success probability p, that occur when n independent trials are performed, then for any a < b,
P\left(a ≤ \frac{S_n − np}{\sqrt{np(1 − p)}} ≤ b\right) → Φ(b) − Φ(a) \quad as n → ∞.
Note that the normal approximation will, in general, be quite good for values of n satisfying
np(1 − p) ≥ 10.
Example 3.11. A die is tossed 1000 consecutive times. Calculate the probability that the number
4 shows up between 150 and 200 times. What is the probability that the number 4 appears exactly
150 times?
Solution: If X denotes the number of times 4 shows up, then X ∼ B(1000, 1/6). Since np(1 − p) =
\frac{5000}{36} > 10, we can use the normal approximation of the binomial distribution.
P(150 ≤ X ≤ 200) = P\left(\frac{150 − \frac{500}{3}}{\sqrt{1250/9}} ≤ \frac{X − \frac{500}{3}}{\sqrt{1250/9}} ≤ \frac{200 − \frac{500}{3}}{\sqrt{1250/9}}\right)
= P(−1.4142 ≤ Z ≤ 2.8284) = Φ(2.8284) − Φ(−1.4142) ≈ 0.9183.
Since the binomial is a discrete integer-valued random variable, whereas the normal is a continuous
random variable, it is best to write P(X = i) as P(i − \frac{1}{2} ≤ X ≤ i + \frac{1}{2}) before applying the normal
approximation (this is called the continuity correction). Therefore, we have
P(X = 150) = P(149.5 < X < 150.5) = P\left(\frac{149.5 − \frac{500}{3}}{\sqrt{1250/9}} ≤ \frac{X − \frac{500}{3}}{\sqrt{1250/9}} ≤ \frac{150.5 − \frac{500}{3}}{\sqrt{1250/9}}\right)
= P(−1.4566 ≤ Z ≤ −1.3718) = Φ(−1.3718) − Φ(−1.4566) ≈ 0.01319.
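The quality of the normal approximation in this example can be checked against the exact binomial probabilities (a sketch, assuming SciPy is available):

# Sketch: compare the exact binomial probabilities of Example 3.11 with the
# normal approximation (with continuity correction for P(X = 150)).
import math
from scipy.stats import binom, norm

n, p = 1000, 1/6
mu, sigma = n * p, math.sqrt(n * p * (1 - p))

print(binom.cdf(200, n, p) - binom.cdf(149, n, p))                  # exact P(150 <= X <= 200)
print(norm.cdf((200 - mu)/sigma) - norm.cdf((150 - mu)/sigma))      # normal approximation

print(binom.pmf(150, n, p))                                         # exact P(X = 150)
print(norm.cdf((150.5 - mu)/sigma) - norm.cdf((149.5 - mu)/sigma))  # with continuity correction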
3.6. Exponential random variable. A continuous random variable is said to be an exponential
random variable with parameter λ > 0, denoted by Exp(λ), if its pdf is given by
f(x) = λ e^{−λx} for x ≥ 0, and f(x) = 0 for x < 0.
The cdf of an exponential random variable is given by
F_X(x) = P(X ≤ x) = \int_0^x f(y)\,dy = 1 − e^{−λx},   x ≥ 0.
Mean and variance of X ∼ Exp(λ): By using the integration by parts formula, we have, for
n > 0,
E[X^n] = \int_0^{∞} x^n λ e^{−λx}\,dx = −\int_0^{∞} x^n \frac{d}{dx} e^{−λx}\,dx = n \int_0^{∞} e^{−λx} x^{n−1}\,dx
       = \frac{n}{λ} \int_0^{∞} x^{n−1} λ e^{−λx}\,dx = \frac{n}{λ} E[X^{n−1}]
=⇒ E[X] = \frac{1}{λ}, \qquad E[X^2] = \frac{2}{λ} E[X] = \frac{2}{λ^2}.
Hence, we have
Var(X) = \frac{2}{λ^2} − \frac{1}{λ^2} = \frac{1}{λ^2}.
In practice, the exponential distribution often arises as the distribution of the amount of time
until some specific event occurs e.g., the amount of time until an earthquake occurs, a new war
breaks out etc.
Definition 3.4. We say that a non-negative random variable X is memoryless if
P(X > s + t | X > t) = P(X > s) ⇔ P(X > t + s) = P(X > t)P(X > s) ∀ s, t ≥ 0. (3.5)
Exponentially distributed random variables are memoryless. Indeed, for any t, s ≥ 0, one has
P(X > t + s) = e^{−λ(t+s)} = e^{−λt} e^{−λs} = P(X > t)P(X > s).
Moreover, we have the following theorem.
Theorem 3.3. Let X be a random variable such that P(X > 0) > 0. Then X ∼ Exp(λ) if and
only if (3.5) holds.
Example 3.12. Suppose the time of use until a smartphone fails is modelled by an exponential
random variable. The average time until failure is 1000 hours. What is the probability
that the smartphone
i) fails in the first 10 hours?
ii) does not fail in the first 1000 hours?
iii) does not fail in the next 500 hours, knowing that it has already been used for 200 hours?
Solution: Let X be the time in hours until failure. Then X ∼ Exp(λ) for some λ > 0. We are
given that E[X] = 1000; hence λ = \frac{1}{1000}.
i) P(X < 10) = \int_0^{10} \frac{1}{1000} e^{−x/1000}\,dx = 1 − e^{−1/100}.
ii) P(X ≥ 1000) = 1 − P(X ≤ 1000) = e^{−1} ≈ 0.3679.
iii) By the memoryless property, the required probability is
P(X > 500 + 200 | X > 200) = P(X > 500) = e^{−1/2}.
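These three answers, including the memoryless property, can be verified with SciPy's exponential distribution, which is parameterized by scale = 1/λ (a sketch; the parameterization is SciPy's convention, not part of these notes):

# Sketch: check Example 3.12 numerically.
from scipy.stats import expon

X = expon(scale=1000)                        # lambda = 1/1000 per hour

print(X.cdf(10))                             # i)   P(X < 10)    = 1 - e^{-1/100}
print(X.sf(1000))                            # ii)  P(X >= 1000) = e^{-1} ≈ 0.3679
print(X.sf(700) / X.sf(200), X.sf(500))      # iii) memoryless: both equal e^{-1/2} ≈ 0.6065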
3.6.1. Failure rate function: Let X be a positive continuous random variable, interpreted as
a lifetime of some item, with distribution function F (·) and density function f (·). The failure
rate function λ(t) of F is defined by
λ(t) = \frac{f(t)}{\bar F(t)}, \quad where \bar F = 1 − F.
λ(t) can be represented as the conditional probability intensity that a t-unit-old item will fail
i.e.,
P(X ∈ (t, t + dt) | X > t) ≈ \frac{f(t)}{\bar F(t)}\,dt.
Reliability is defined as the probability that a system (component) will function over some
time period t. The function \bar F(t) is known as the reliability function. Sometimes it is denoted by
R(t) := P(X > t).
One can easily check that for an exponentially distributed random variable, the failure rate
function λ(t) is constant and equal to λ. The failure rate function λ(t) uniquely determines the
distribution F. Indeed, since λ(t) = \frac{\frac{d}{dt} F(t)}{1 − F(t)}, we have
\log(1 − F(t)) = −\int_0^t λ(s)\,ds + c.
Since X is a positive random variable, one has F(0) = 0 and hence c = 0. Thus,
F(t) = 1 − \exp\left\{ −\int_0^t λ(s)\,ds \right\}.
For instance, if λ(t) = a + bt, then the distribution function and density function of the random
variable are given by
F(t) = 1 − e^{−(at + \frac{b t^2}{2})}, \qquad f(t) = (a + bt)\,e^{−(at + \frac{b t^2}{2})}, \qquad t ≥ 0.
A design life is defined to be the time to failure t_R that corresponds to a specified reliability
R, i.e., R(t_R) = R.
Example 3.13. Given the failure rate function λ(t) = 5 × 10^{−6} t, where t is measured in operating
hours, what is the design life if a 0.98 reliability is desired?
Solution: The reliability function can be written in terms of the failure rate function as R(t) =
\exp\{ −\int_0^t λ(s)\,ds \}. By the given condition, we have
0.98 = \exp\left\{ −\int_0^t 5 × 10^{−6} s\,ds \right\} =⇒ t = \sqrt{ −\frac{\ln(0.98)}{2.5 × 10^{−6}} } = 89.89 ≈ 90.
Therefore the design life is 90 hours.
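The design life can also be obtained numerically, by integrating the failure rate and inverting R(t) = 0.98; the sketch below (assuming SciPy is available) mirrors the computation above:

# Sketch: design life for the linear failure rate of Example 3.13, found by
# numerically integrating lambda(t) and solving R(t_R) = 0.98.
import math
from scipy.integrate import quad
from scipy.optimize import brentq

lam = lambda t: 5e-6 * t                        # failure rate (per operating hour)
R = lambda t: math.exp(-quad(lam, 0, t)[0])     # reliability function

t_R = brentq(lambda t: R(t) - 0.98, 1e-6, 1e4)  # solve R(t_R) = 0.98
print(t_R)                                      # ≈ 89.9 hours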
3.7. Gamma distribution: The gamma distribution is used to describe the intervals of time
between two consecutive failures of an airplane’s motor or the intervals of time between arrivals
of clients to a queue in a supermarket’s cashier point. A random variable X is said to have a
gamma distribution with positive parameters (α, λ), if its density function is given by
f(x) = \frac{λ e^{−λx} (λx)^{α−1}}{Γ(α)} for x ≥ 0, and f(x) = 0 for x < 0,
where Γ(α) is the gamma function. The parameter α is called the shape parameter while λ is
the scale parameter. We denote gamma distribution as Γ(α, λ).
The distribution function of X ∼ Γ(α, λ) is given by
F_X(x) = \int_0^x \frac{λ e^{−λy} (λy)^{α−1}}{Γ(α)}\,dy = \frac{1}{Γ(α)} \int_0^{λx} y^{α−1} e^{−y}\,dy.
The moment generating function of X ∼ Γ(α, λ) is given by
m_X(t) = \left(\frac{λ}{λ − t}\right)^{α} \quad if t < λ.
Further,
E[X] = \frac{d}{dt} m_X(t)\Big|_{t=0} = \frac{α}{λ}, \qquad E[X^2] = \frac{d^2}{dt^2} m_X(t)\Big|_{t=0} = \frac{α^2 + α}{λ^2}, \qquad Var(X) = \frac{α}{λ^2}.
• Gamma distribution with parameters (1, λ) is nothing but the exponential distribution
with parameter λ.
• Gamma distribution with parameters (\frac{k}{2}, \frac{1}{2}) with k ∈ N is called the chi-square distribution
with k degrees of freedom. We denote the chi-square distribution by X ∼ χ^2_{(k)}.
• For a positive integer α and λ > 0, the gamma distribution Γ(α, λ) is called the Erlang distribution.
Example 3.14. Time spent on a computer (X) is gamma distributed with mean 20 min and
variance 80 min2 . Find P(X < 24) and P(20 < X < 40).
Solution: X ∼ Γ(α, λ). Given that E[X] = 20 and Var(X) = 80 i.e.,
\frac{α}{λ} = 20, \qquad \frac{α}{λ^2} = 80.
Solving these, we have α = 5 and λ = \frac{1}{4}. Therefore,
P(X < 24) = \frac{1}{Γ(5)} \int_0^{6} y^{α−1} e^{−y}\,dy ≈ 0.715,
P(20 < X < 40) = \frac{1}{Γ(5)} \int_5^{10} y^{α−1} e^{−y}\,dy ≈ 0.411.
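The same probabilities follow from SciPy's gamma distribution, which uses shape a = α and scale = 1/λ (a sketch; the parameterization is SciPy's convention, not part of these notes):

# Sketch: check Example 3.14 numerically.
from scipy.stats import gamma

X = gamma(a=5, scale=4)                      # alpha = 5, lambda = 1/4

print(X.mean(), X.var())                     # 20 and 80, matching the given moments
print(X.cdf(24))                             # P(X < 24)      ≈ 0.715
print(X.cdf(40) - X.cdf(20))                 # P(20 < X < 40) ≈ 0.411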
3.8. Weibull distribution: The Weibull distribution is used extensively in reliability and life
data analysis such as situations involving failure times of items. Since this failure time may
be any positive number, the distribution is continuous. It has been used successfully to model
such things as vacuum tube failures and ball bearing failures. A random variable X has a Weibull
distribution with parameters (γ, µ, α) if its density function is given by
f(x) = \frac{γ}{α} \left(\frac{x − µ}{α}\right)^{γ−1} \exp\left\{ −\left(\frac{x − µ}{α}\right)^{γ} \right\} for x ≥ µ, and f(x) = 0 for x < µ,
where γ, α are positive constants. We denote the Weibull distribution as W(γ, µ, α). The value
γ is called the shape parameter, µ is the location parameter and α is the scale parameter.
Example 3.15. The time to failure (in hours) of bearings in a mechanical shaft is satisfactorily
modelled as a Weibull random variable with γ = 0.5, µ = 0 and α = 5000. Determine the
probability that a bearing lasts fewer than 6000 hours and also the mean time to failure.
Solution: If X denotes the time (in hours) to failure of bearings then X ∼ W (0.5, 0, 5000).
Observe that the cumulative distribution function F (x) is given by
F(x) = 1 − e^{−\left(\frac{x}{5000}\right)^{1/2}},   x ≥ 0.
The probability that a bearing lasts fewer than 6000 hours is given by
P(X < 6000) = F(6000) = 1 − e^{−(6/5)^{0.5}} ≈ 0.666.
The expected value of X is
E[X] = \int_0^{∞} x f(x)\,dx = \frac{1}{2} \int_0^{∞} \left(\frac{x}{α}\right)^{0.5} e^{−(x/α)^{0.5}}\,dx = \frac{α}{2} \int_0^{∞} \sqrt{y}\, e^{−\sqrt{y}}\,dy = α \int_0^{∞} y^2 e^{−y}\,dy = α\,Γ(3) = 2α.
Thus, the mean time to failure of a bearing is 2 × 5000 = 10000 hours.
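Both numbers can be confirmed with SciPy's Weibull distribution, where c = γ, loc = µ and scale = α (a sketch; this parameter naming is SciPy's convention):

# Sketch: check Example 3.15 numerically.
from scipy.stats import weibull_min

X = weibull_min(c=0.5, loc=0, scale=5000)

print(X.cdf(6000))                           # P(X < 6000) = 1 - exp(-(6/5)^0.5) ≈ 0.666
print(X.mean())                              # mean time to failure = alpha * Gamma(3) = 10000 hours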
3.9. Beta distribution. A random variable is said to have a beta distribution with parameters
a > 0 and b > 0, denoted as β(a, b), if its density function is given by
f(x) = \frac{1}{B(a, b)} x^{a−1} (1 − x)^{b−1} χ_{(0,1)}(x),
where B(a, b) is the beta function, i.e., B(a, b) = \int_0^1 x^{a−1} (1 − x)^{b−1}\,dx. The beta distribution
can be used to model random phenomena whose set of possible values is some finite interval.
One can easily check that
E[X] = \frac{a}{a + b}, \qquad Var(X) = \frac{ab}{(a + b)^2 (a + b + 1)}.
Example 3.16. During an 8-hour shift, the portion of time Y that a machine is down for
maintenance or repairs has a beta distribution with a = 1 and b = 2. The cost of this downtime,
due to lost production and the cost of maintenance and repair, is given by C = 5 + 20Y + 5Y^2. Find
the mean of C.
Solution: Given that Y ∼ β(1, 2), we have
E[Y] = \frac{1}{3}, \qquad Var(Y) = \frac{2}{9 × 4} = \frac{1}{18} \quad =⇒ \quad E[Y^2] = Var(Y) + (E[Y])^2 = \frac{1}{6}.
Thus, the mean of C is given by
E[C] = 5 + 20 E[Y] + 5 E[Y^2] = \frac{75}{6}.
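A Monte Carlo check of this mean (a sketch in Python using only the standard library; the sample size is an arbitrary choice):

# Sketch: sample Y ~ beta(1, 2) and average the downtime cost C = 5 + 20Y + 5Y^2.
import random

samples = 200_000
total = 0.0
for _ in range(samples):
    y = random.betavariate(1, 2)             # Y ~ beta(a = 1, b = 2)
    total += 5 + 20 * y + 5 * y**2

print(total / samples, 75 / 6)               # sample mean ≈ 12.5 = 75/6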
3.10. Cauchy distribution. A random variable X is said to have a Cauchy distribution with
parameter θ, −∞ < θ < ∞, if its density function is given by
f_X(x) = \frac{1}{π}\, \frac{1}{1 + (x − θ)^2},   −∞ < x < ∞.
Its distribution function is given by
F_X(x) = \frac{1}{π} \int_{−∞}^{x} \frac{1}{1 + (y − θ)^2}\,dy = \frac{1}{2} + \frac{1}{π} \tan^{−1}(x − θ).
One can easily check that E[X^r] does not exist for any r ≥ 1. The characteristic function of X
is given by
φX (t) = exp(iθt − |t|) ∀ t ∈ R.
A Cauchy distribution with parameter θ = 0 is known as standard Cauchy distribution.
Example 3.17. Find the distribution of X := tan(Y) for any Y ∼ U(−\frac{π}{2}, \frac{π}{2}).
Solution: The distribution function of X is given by
F_X(x) = P(tan(Y) ≤ x) = P(Y ≤ \tan^{−1}(x)) = \frac{\tan^{−1}(x) + \frac{π}{2}}{π} = \frac{1}{2} + \frac{1}{π} \tan^{−1}(x).
Hence the density function of X is given by
f_X(x) = \frac{d}{dx} F_X(x) = \frac{1}{π}\, \frac{1}{1 + x^2},   −∞ < x < ∞.
Thus, X has a Cauchy distribution with parameter θ = 0.
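An empirical check of this transformation (a sketch in Python, standard library only; the evaluation point x and the sample size are arbitrary choices):

# Sketch: tan(Y) with Y ~ U(-pi/2, pi/2) should have the standard Cauchy
# distribution; compare an empirical CDF value with F(x) = 1/2 + arctan(x)/pi.
import math, random

x, n = 1.5, 200_000
samples = (math.tan(random.uniform(-math.pi/2, math.pi/2)) for _ in range(n))
empirical = sum(s <= x for s in samples) / n

print(empirical, 0.5 + math.atan(x) / math.pi)   # both ≈ 0.813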