Module 2 (Updated)
2 Probability Distributions
Learning objectives: At the end of this module, the student should be able to:
4. define and illustrate expectations, variance and standard deviation of a random variable
and discuss their properties,
7. discuss raw moments, central moments and moment generating functions (mgf ),
8. write the distributions of discrete and continuous random variables in notation form
indicating the appropriate parameters,
9. state the pmf/pdf, expectation, variance and mgf of random variables with discrete and
continuous distributions, and
10. solve problems in Computer Science involving discrete and continuous distributions.
In the previous module, you learned the constructs of probability: its foundations, its
properties, and how to apply them in solving probabilistic problems. Here, you will learn
different probability distributions, their properties, and how they are used in practice.
The concept of probability distributions always begins with the idea of random variables. You
can take its meaning literally and you would probably be right, but as with every mathematical
concept, we have to define it formally: a random variable X is a function that assigns a real
number X(ω) to each outcome ω of an experiment.
The domain of a random variable is the sample space Ω. Its range can be the set of all
real numbers R, or its subsets such as (0, +∞), the integers Z, or the interval (0, 1), depending
on what possible values the random variable can potentially take.
Once an experiment is completed and the outcome ω is known, the value of the random
variable X(ω) becomes determined.
Example:
3. Consider an experiment of tossing 3 fair coins and counting the number of heads.
Certainly, the same model suits the number of girls in a family with 3 children, the
number of 1s in a random binary code consisting of 3 characters, etc. Let X3 be the
number of heads. Prior to an experiment, its value is not known. All we can say is
that X3 has to be an integer between 0 and 3. Since assuming each value is an event,
P(X3 = 0) = P(three tails) = P({TTT}) = (1/2)(1/2)(1/2) = 1/8
P(X3 = 1) = P({HTT}) + P({THT}) + P({TTH}) = 3/8
P(X3 = 2) = P({HHT}) + P({HTH}) + P({THH}) = 3/8
P(X3 = 3) = P({HHH}) = 1/8.
Summarizing,
x P (X3 = x)
0 1/8
1 3/8
2 3/8
3 1/8
Total 1
This table contains everything that is known about the random variable X3 prior to the
experiment. Generally, before we know the outcome ω, we cannot tell what X equals.
However, we can list all possible values of X and determine the corresponding probabilities.
With this example in mind, we can now formally define a probability distribution.
Example:
1. Fair coin toss experiment. Let X1 be the number of heads in a single toss of a fair coin.
x P (X1 = x)
0 1/2
1 1/2
Total 1
2. Biased coin toss experiment. The coin is biased in such a way that the tail is twice as
likely to occur as the head.
x P (X4 = x)
0 1/3
1 2/3
Total 1
3. Fair die roll. Let X2 be the number of dots on the top face of a fair die.
x P (X2 = x)
1 1/6
2 1/6
3 1/6
4 1/6
5 1/6
6 1/6
Total 1
The examples shown above are actually collections of all probabilities related to a random
variable X, referred to as the distribution of X. Moreover, based on the examples, we can
also ask the question: what is the probability that X is less than or equal to a particular x?
We simply take the "cumulative" sum of all probabilities for values less than or equal to x.
For example, the probability that X2 is less than or equal to 4 is
F(4) := P(X2 ⩽ 4) = 4/6 = 2/3.
x P (X2 ⩽ x)
1 1/6
2 1/3
3 1/2
4 2/3
5 5/6
6 1
For every outcome ω, the variable X takes one and only one value x. This makes the events
{X = x} mutually exclusive and exhaustive, therefore
∑_x P(X = x) = 1.
We can also conclude that the cdf F(x) is a non-decreasing function of x, always between
0 and 1, with
lim_{x→−∞} F(x) = 0 and lim_{x→+∞} F(x) = 1.
Between any two subsequent values of X, F(x) is constant. It jumps by P(X = x) at each
possible value x of X.
Recall that one way to compute the probability of an event is to add the probabilities of
all the outcomes in it. Hence, for any set A,
P(X ∈ A) = ∑_{x∈A} P(X = x).
When A is an interval, its probability can also be computed directly from the cdf F(x); for
example, P(a < X ⩽ b) = F(b) − F(a).
Example: A program consists of two modules. The number of errors X1 in the first module
has the pmf p1(x), and the number of errors X2 in the second module has the pmf p2(x),
independently of X1, where
x p1 (x) p2 (x)
0 0.5 0.7
1 0.3 0.2
2 0.1 0.1
3 0.1 0
Find the pmf of Y = X1 + X2, the total number of errors in the program.
Answer: Since the modules are independent, we add the probabilities of all pairs (x1, x2)
with x1 + x2 = y. For example,
pY(0) = p1(0)p2(0) = (0.5)(0.7) = 0.35
Similarly, pY(1) = 0.31, pY(2) = 0.18, pY(3) = 0.12, pY(4) = 0.03, and pY(5) = 0.01.
Now check:
∑_{y=0}^{5} pY(y) = 0.35 + 0.31 + 0.18 + 0.12 + 0.03 + 0.01 = 1,
thus we probably counted all the possibilities and did not miss any (we just wanted to
emphasize that simply getting ∑ pY(y) = 1 does not guarantee that we made no mistake in
our solution; however, if this equality is not satisfied, we have a mistake for sure).
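The same convolution is easy to verify numerically. The short R sketch below assumes the pmfs p1 and p2 from the table above.
```r
p1 <- c(0.5, 0.3, 0.1, 0.1)   # p1(0), ..., p1(3)
p2 <- c(0.7, 0.2, 0.1, 0.0)   # p2(0), ..., p2(3)
pY <- rep(0, 7)               # Y = X1 + X2 can range over 0, ..., 6
for (x1 in 0:3) {
  for (x2 in 0:3) {
    pY[x1 + x2 + 1] <- pY[x1 + x2 + 1] + p1[x1 + 1] * p2[x2 + 1]
  }
}
round(pY, 2)   # 0.35 0.31 0.18 0.12 0.03 0.01 0.00
sum(pY)        # 1
```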
2. Draw a graph of its probability mass function and cumulative distribution function.
Recall that a random variable is a function whose range is contained in R. Since R contains
other number systems as subsets, some random variables take values only in such a subset,
for example N or N ∪ {0}. There are two classifications of random variables: discrete and
continuous. So far, we have been dealing with discrete random variables.
Definition 2.4. Discrete random variables are random variables whose range is finite
or countable.
In particular, it means that their values can be listed, or arranged in a sequence. Examples
include the number of jobs submitted to a printer, the number of errors, the number of
error-free modules, the number of failed components, and so on. Discrete variables don’t
have to be integers. For example, the proportion of defective components in a lot of 100 can
be 0, 1/100, . . . , 99/100, or 1. This variable assumes 101 different values, so it is discrete,
although not an integer. On the other hand, if the range is uncountable, the random variable
is said to be continuous.
In addition, the name probability mass function is solely for probability functions of
discrete random variables.
Definition 2.5. For a discrete random variable, the probability mass function for x denoted
by fX (x) is defined as fX (x) = P (X = x).
∑_{x=−∞}^{∞} fX(x) = 1.
P(X ⩽ k) = ∑_{x=−∞}^{⌊k⌋} fX(x).
P(a < X ⩽ b) = P(X ⩽ b) − P(X ⩽ a)
= ∑_{x=−∞}^{⌊b⌋} fX(x) − ∑_{x=−∞}^{⌊a⌋} fX(x)
= ∑_{x=a+1}^{⌊b⌋} fX(x)
P(a ⩽ X ⩽ b) = P(X ⩽ b) − P(X ⩽ a − 1).
P(X ⩾ x) = 1 − P(X ⩽ x − 1).
Example:
1. Show that f(x) = (1/2)^x, x = 1, 2, 3, . . . , is a valid pmf, then find P(X ⩽ 10).
Answer: For |r| < 1, the geometric series gives
∑_{k=1}^{∞} r^k = r / (1 − r).
Hence,
∑_{x=1}^{∞} (1/2)^x = (1/2) / (1 − 1/2) = 1.
Therefore, f is a pmf.
Next, to find P(X ⩽ 10), recall that a geometric series with a finite upper limit sums to
∑_{k=1}^{n} r^k = (r − r^(n+1)) / (1 − r).
Hence,
P(X ⩽ 10) = ∑_{x=1}^{10} (1/2)^x = [(1/2) − (1/2)^11] / (1 − 1/2) = 1023/1024.
2. A tetrahedron (4-sided die) is rolled twice. Let X be the larger of the 2 outcomes if
they are different and the common value if they are the same. Find its pmf and its cdf.
Answer: Counting the 16 equally likely ordered outcomes of the two rolls gives the pmf
x f (x)
1 1/16
2 3/16
3 5/16
4 7/16
or, in closed form,
fX(x) = (2x − 1)/16, x = 1, 2, 3, 4.
The cdf is
x F (x)
1 1/16
2 4/16
3 9/16
4 1
or, in closed form,
FX(x) = x²/16, x = 1, 2, 3, 4.
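For a quick numerical check, one can enumerate the 16 equally likely ordered outcomes in R; this is just a sketch of the counting argument above.
```r
outcomes <- expand.grid(first = 1:4, second = 1:4)   # all 16 ordered pairs
x <- pmax(outcomes$first, outcomes$second)           # larger of the two rolls
pmf <- table(x) / 16
pmf           # 1/16, 3/16, 5/16, 7/16
cumsum(pmf)   # 1/16, 4/16, 9/16, 1
```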
Exercise: A fair coin is tossed 3 times. A player wins $1 if the first toss is a head, but loses $1
if the first toss is a tail. Similarly, the player wins $2 if the second toss is a head and loses
$2 if the second toss is a tail, and the same rule applies to the third toss. Let the random
variable X be the total winnings after 3 tosses. Find the pmf and the cdf.
On the other hand, another type of random variable arises when the range is uncountable.
Definition 2.6. Continuous random variables are random variables whose range is
uncountable.
Probability functions of continuous random variables are called probability distribution
functions, or pdfs.
Definition 2.7. For a continuous random variable X, the probability distribution function
fX(x) is the function satisfying P(X ∈ (a, b)) = ∫_a^b fX(x) dx for any interval (a, b).
The definition means that probability for a crv is defined on an interval, not at a point.
The following are the properties of a probability distribution function:
1. The total probability over all non-overlapping intervals of values of X is 1, that is,
∫_{−∞}^{∞} fX(x) dx = 1.
This also means that the entire area under the pdf curve f is 1.
2. P(X ⩽ k) = ∫_{−∞}^{k} fX(x) dx.
3. P(X > x) = 1 − P(X < x) = 1 − ∫_{−∞}^{x} fX(x) dx.
Such random variables assume a whole interval of values. This could be a bounded
interval (a, b), or an unbounded interval such as (a, ∞), (−∞, b), or (−∞, ∞). Sometimes,
it may be a union of several such intervals. Intervals are uncountable, therefore, all values
of a random variable cannot be listed in this case. Examples of continuous variables include
various times (software installation time, code execution time, connection time, waiting
time, lifetime), also physical variables like weight, height, voltage, temperature, distance,
the number of miles per gallon, etc.
Example:
1. For comparison, observe that a long jump is formally a continuous random variable
because an athlete can jump any distance within some range. Results of a high jump,
however, are discrete because the bar can only be placed on a finite number of heights.
2. A job is sent to a printer. Let X be the waiting time before the job starts printing.
With some probability, this job appears first in line and starts printing immediately,
X = 0. It is also possible that the job is first in line but it takes 20 seconds for the
printer to warm up, in which case X = 20. So far, the variable has a discrete behavior
with a positive pmf P(x) at x = 0 and x = 20. However, if there are other jobs in
the queue, then X depends on the time it takes to print them, which is a continuous
random variable. Using popular jargon, besides the "point masses" at x = 0 and x = 20,
the variable has a continuous component, taking values in (0, ∞).
Often we deal with several random variables simultaneously. We may look at the size of a
RAM and the speed of a CPU, the price of a computer and its capacity, temperature and
humidity, technical and artistic performance, etc.
Definition 2.8. If X and Y are random variables, then the pair (X, Y ) is a random vector.
Its distribution is called the joint distribution of X and Y . Individual distributions of X
and Y are then called marginal distributions.
Although we talk about two random variables here, all the concepts extend to a vector
(X1 , X2 , . . . , Xn ) of n components and its joint distribution. Similarly to a single variable,
the joint distribution of a vector is a collection of probabilities for a vector (X, Y ) to take a
value (x, y). Recall that two vectors are equal, (X, Y) = (x, y), if X = x and Y = y. Note that
this "and" means the intersection of events, therefore the joint probability mass function
(jpmf) of X and Y gives the probabilities P(X = x, Y = y) = P({X = x} ∩ {Y = y}).
For ease of notation and without loss of generality, let us discuss joint distributions in
the discrete sense, and we begin with this definition of the joint probability mass function.
Definition 2.9. The joint probability mass function denoted by fX,Y (x, y) is given by
probability values
pij := P(X = xi, Y = yj) ⩾ 0 and ∑_i ∑_j pij = 1.
To illustrate, we can show the jpmf of X and Y by using a table. In the univariate sense,
x     P(X = xi)
x1    f(x1)
x2    f(x2)
...   ...
We know that ∑_i f(xi) = 1. Extending this to the jpmf, we have
            Y
X     y1    y2    ...   yn    ...
x1    p11   p12   ...   p1n   ...
x2    p21   p22   ...   p2n   ...
...   ...   ...   ...   ...   ...
xm    pm1   pm2   ...   pmn   ...
...   ...   ...   ...   ...   ...
Definition 2.10. The joint cumulative distribution function denoted by FX,Y (x, y) is
given by
F(x, y) = ∑_{i: xi ⩽ x} ∑_{j: yj ⩽ y} pij.
The definition is then again an extension from the definition of cdf of the univariate case.
Next, we find what we call the marginal distributions of X and Y from the jpmf fX,Y (x, y).
Definition 2.11. Let F(x, y) and f(x, y) be the jcdf and jpmf of the random variables X
and Y. The marginal pmfs of X and Y are given by
P(X = xi) =: pi+ = ∑_j pij and P(Y = yj) =: p+j = ∑_i pij.
This means that pi+ is simply the sum of the ith row and p+j is the sum of the j th column.
To illustrate, the marginal distribution of X is
x     fX(x) = pi+
x1    p1+
x2    p2+
...   ...
and the marginal distribution of Y is
y     fY(y) = p+j
y1    p+1
y2    p+2
...   ...
Since fX(x) and fY(y) are pmfs themselves, it must be that
∑_i pi+ = ∑_j p+j = 1.
Remark: The joint distribution cannot be computed from marginal distributions because
they carry no information about interrelations between random variables. For example,
marginal distributions cannot tell whether variables X and Y are independent or dependent.
fZ(1) = P(X = 0 ∩ Y = 1) + P(X = 1 ∩ Y = 0)
It is a good check to verify that ∑_z fZ(z) = 1.
Answer: To decide on the independence of X and Y, check if their joint pmf factors into
a product of marginal pmfs. We see that fX,Y(0, 0) = 0.2 indeed equals fX(0)fY(0) =
(0.5)(0.4). Keep checking. Next, fX,Y(0, 1) = 0.2, whereas fX(0)fY(1) = (0.5)(0.3) =
0.15. There is no need to check further. We found a pair of x and y that violates the
formula for independent random variables. Therefore, the numbers of errors in the two
modules are dependent.
Since a jpmf involves two random variables X and Y, whose values correspond to events, it
also makes sense to ask for the conditional distribution of X given Y, or vice versa.
p_{i|Y=yj} := P(X = xi, Y = yj) / P(Y = yj).
Note that this equation is derived from the definition of the conditional probability of A given
B, which is P(A|B) = P(A ∩ B)/P(B).
Example: Two cards are drawn without replacement from an ordinary deck; the random
variable X measures the number of hearts drawn and Y the number of clubs drawn. Find
the joint pmf of X and Y, the conditional distribution of X given Y = 1, and determine
whether X and Y are independent.
Answer:
        Y = 0                                Y = 1                                Y = 2
X = 0   (26C2)(13C0)(13C0)/52C2 = 25/102     (26C1)(13C0)(13C1)/52C2 = 13/51      (26C0)(13C0)(13C2)/52C2 = 1/17
X = 1   (26C1)(13C1)(13C0)/52C2 = 13/51      (26C0)(13C1)(13C1)/52C2 = 13/102     0
X = 2   (26C0)(13C2)(13C0)/52C2 = 1/17       0                                    0
Answer: First, P(Y = 1) = 13/51 + 13/102 + 0 = 13/34. Then
x    p_{i|Y=1}
0    P(X = 0, Y = 1)/P(Y = 1) = (13/51)/(13/34) = 2/3
1    P(X = 1, Y = 1)/P(Y = 1) = (13/102)/(13/34) = 1/3
2    P(X = 2, Y = 1)/P(Y = 1) = 0
Answer: We take p11 . Is it equal to p1+ p+1 ? Since 13/102 ̸= 13/34 × 13/34, then X
and Y are not independent.
Exercise: Consider two random variables X and Y with jpmf given in the table below.
        Y = 0   Y = 1   Y = 2
X = 0   1/6     1/4     1/8
X = 1   1/8     1/6     1/6
1. Find P (X = 0, Y ⩽ 1).
The distribution of a random variable or a random vector, the full collection of related
probabilities, contains the entire information about its behavior. This detailed information
can be summarized in a few vital characteristics describing the average value, the most
likely value of a random variable, its spread, variability, etc. The most commonly used are
the expectation, variance, standard deviation, covariance, and correlation, introduced in this
section.
We know that X can take different values with different probabilities. For this reason,
its average value is not just the average of all its values. Rather, it is a weighted average.
Example:
1. Consider a variable that takes values 0 and 1 with probabilities P (0) = P (1) = 0.5.
That is,
X = 0 with probability 1/2; X = 1 with probability 1/2.
Observing this variable many times, we shall see X = 0 about 50% of times and X = 1
about 50% of times. The average value of X will then be close to 0.5, so it is reasonable
to have E[X] = 0.5.
2. Suppose that P(0) = 0.75 and P(1) = 0.25. Then, in the long run, X equals 1 only
1/4 of the time; otherwise it equals 0. Suppose we earn $1 every time we see X = 1. On
average, we earn $1 every four observations, or $0.25 per observation. Therefore, in
this case E[X] = 0.25.
In a certain sense, expectation is the best forecast of X. The variable itself is random:
it takes different values with different probabilities P(x). At the same time, it has just one
expectation E[X], which is a non-random number.
Definition 2.15 (Mathematical Expectation). Let X be a drv with pmf fX(x), and let h(·)
be a real-valued function. Then the (mathematical) expectation of h(X), denoted by
E[h(X)], is given by
E[h(X)] = ∑_{x=−∞}^{∞} h(x) fX(x).
Remark: Indeed, if h is a one-to-one function, then Y = h(X) takes each value y = h(x) with
probability fX(x), and the formula for E[Y] can be applied directly. If h is not one-to-one,
then some values of h(x) will be repeated in Definition 2.15. However, they are still multiplied
by the corresponding probabilities, and when we add them in Definition 2.15, these probabilities
are also added; thus each value y of h(X) is still multiplied by its total probability fY(y).
We now define special consequences of Definition 2.15.
Definition 2.16. Taking h(X) = X^k gives the kth raw moment of X,
E[X^k] = ∑_{x=−∞}^{∞} x^k fX(x).
Definition 2.17. Setting k = 1 gives the expectation (or mean) of X,
E[X] = ∑_{x=−∞}^{∞} x fX(x).
This is the formula to use to get the expected values in the previous examples.
Definition 2.18. The kth central moment of X is
E[(X − E[X])^k] = ∑_{x=−∞}^{∞} (x − E[X])^k fX(x).
Note that E[X] is not a random variable but a constant. We can set E[X] = µ so that we have
E[(X − µ)^k] = ∑_{x=−∞}^{∞} (x − µ)^k fX(x)
instead.
Definition 2.19. From Definition 2.18, if we set k = 2, then we have the 2nd central moment,
also called the variance of X, given by
Var[X] := E[(X − E[X])²] = ∑_{x=−∞}^{∞} (x − E[X])² fX(x).
Example: Take for example the tetrahedron problem discussed earlier. Let us get the mean
and the variance of the random variable.
E[X] = ∑_{x=1}^{4} x fX(x)
= 1·(1/16) + 2·(3/16) + 3·(5/16) + 4·(7/16)
= 1/16 + 3/8 + 15/16 + 7/4
= 25/8 = 3 1/8.
Answer: We use Definition 2.19 and the answer we got from (1):
Var[X] = (1 − 3 1/8)²(1/16) + (2 − 3 1/8)²(3/16) + (3 − 3 1/8)²(5/16) + (4 − 3 1/8)²(7/16)
= 289/1024 + 243/1024 + 5/1024 + 343/1024
= 880/1024 = 55/64.
Now we discuss relevant properties of expectation. Note that these properties will only
hold true if such expectation E exists.
Property 2.3. If h1 and h2 are functions, then E[h1 (X) + h2 (X)] = E[h1 (X)] + E[h2 (X)]
Example:
1. Let X be a number selected at random from the first 10 positive integers. Assuming
equally likely outcomes, compute E[X(11 − X)].
Answer: Since all outcomes are equally likely to occur, it should be that
f(x) = 1/10, x = 1, 2, . . . , 10.
Then
E[X(11 − X)] = E[11X − X²] = 11E[X] − E[X²].
Hence, we can solve for 11E[X] and E[X²] separately and just get their difference
after.
(a)
E[X] = 1·(1/10) + 2·(1/10) + · · · + 10·(1/10)
= (1 + 2 + · · · + 10)/10
= 5.5
(b)
E[X²] = 1²·(1/10) + 2²·(1/10) + · · · + 10²·(1/10)
= [(10)(11)(21)/6] · (1/10)
= 38.5
Therefore, E[X(11 − X)] = 11(5.5) − 38.5 = 60.5 − 38.5 = 22.
2. Let X be a random variable with pmf
f(x) = (|x| + 1)²/9, x = −1, 0, 1.
Find:
(a) E[X]
(b) E[X²]
(c) E[3X² − 2X + 4]
Property 2.5. Suppose Var[X] = E[(X − µ)²] where E[X] = µ. Then Var[X] = E[X²] −
(E[X])².
Proof.
Var[X] = E[(X − µ)²]
= E[X² − 2µX + µ²]
= E[X²] − 2µE[X] + µ²
= E[X²] − 2µ² + µ²
= E[X²] − µ²
= E[X²] − (E[X])².
Let’s have more properties involving variance. For the following properties, we begin
with two random variables X and Y whose expectations exist and a constant c ∈ R.
Property 2.6. For any constant c, Var[c] = 0.
Proof.
Var[c] = E[(c − E[c])²]
= E[(c − c)²]
= 0.
Property 2.7. Var[cX] = c²Var[X].
Proof.
Var[cX] = E[(cX − E[cX])²]
= E[(cX − cE[X])²]
= E[(c(X − E[X]))²]
= E[c²(X − E[X])²]
= c²E[(X − E[X])²]
= c²Var[X].
The variance Var[X] of a random variable X is often denoted by σ², and its square root σ
is called the standard deviation:
σ =: Std[X] = √Var[X].
If X is measured in some units, then its mean µ has the same measurement unit as X.
Variance σ 2 is measured in squared units, and therefore, it cannot be compared with X or µ.
No matter how funny it sounds, it is rather normal to measure variance of profit in squared
dollars, variance of class enrollment in squared students, and variance of available disk space
in squared gigabytes. When the square root is taken, the resulting standard deviation σ is
again measured in the same units as X. This is the main reason for introducing yet another
measure of variability, σ.
Exercises:
1. There is one error in one of five blocks of a program. To find the error, we test
three randomly selected blocks. Let X be the number of errors in these three blocks.
Compute E[X] and Var[X].
2. Tossing a fair die is an experiment that can result in any integer number from 1 to
6 with equal probabilities. Let X be the number of dots on the top face of a die.
Compute E[X] and Var[X].
3. The number of home runs scored by a certain team in one baseball game is a random
variable with the distribution
x 0 1 2
The team plays 2 games. The number of home runs scored in one game is independent
of the number of home runs in the other game. Let Y be the total number of home
runs. Find E[Y ] and Var[Y ].
4. A computer user tries to recall her password. She knows it can be one of 4 possible
passwords. She tries her passwords until she finds the right one. Let X be the number
of wrong passwords she uses before she finds the right one. Find E[X] and Var[X].
Expectation, variance, and standard deviation characterize the distribution of a single ran-
dom variable. Now we introduce measures of association of two random variables.
The covariance of X and Y is defined as
Cov[X, Y] = E[(X − E[X])(Y − E[Y])] = ∑_x ∑_y f(x, y)(x − E[X])(y − E[Y]).
Moreover, covariance is the expected product of deviations of X and Y from their respective
expectations. If Cov[X, Y ] > 0, then positive deviations X − E[X] are more likely to be
multiplied by positive Y − E[Y ], and negative X − E[X] are more likely to be multiplied by
negative Y − E[Y ]. In short, large X imply large Y , and small X imply small Y .
The following are properties of the covariance.
Property 2.8. Cov[X, X] = Var[X].
Proof.
Cov[X, X] = E[(X − E[X])(X − E[X])]
= E[(X − E[X])²]
= Var[X].
Property 2.9. Cov[X, Y] = E[XY] − E[X]E[Y].
Proof.
Cov[X, Y] = E[(X − E[X])(Y − E[Y])]
= E[XY − X E[Y] − Y E[X] + E[X]E[Y]]
= E[XY] − E[X]E[Y].
Property 2.10. If X and Y are independent, then Cov[X, Y] = 0.
Proof. For independent X and Y, E[XY] = E[X]E[Y], so
Cov[X, Y] = E[XY] − E[X]E[Y]
= E[X]E[Y] − E[X]E[Y]
= 0.
In Property 2.10, we say that X and Y are uncorrelated. We see that independent
variables are always uncorrelated. The reverse is not always true. There exist some variables
that are uncorrelated but not independent.
The correlation coefficient of X and Y is defined as
ρ := Cov[X, Y] / (Std[X] · Std[Y]).
Further, values of ρ near 1 indicate strong positive correlation, values near −1 show
strong negative correlation, and values near 0 show weak correlation or no correlation.
Example: Let us continue the previous example on the numbers of errors in two modules,
and compute
µX = 0.5 and σX² = 0.25.
As a result, we have Var[X] = 0.25, Var[Y] = 2.25 − 1.05² = 1.1475, Std[X] = √0.25 =
0.5, and Std[Y] = √1.1475 = 1.0712. Also,
E[XY] = ∑_x ∑_y x y f(x, y) = (1)(1)(0.1) + (1)(2)(0.1) + (1)(3)(0.1) = 0.6
and
Cov[X, Y] = E[XY] − E[X]E[Y] = 0.6 − (0.5)(1.05) = 0.075,
so that
ρ = Cov[X, Y] / (Std[X] · Std[Y]) = 0.075 / ((0.5)(1.0712)) = 0.14.
Exercises:
Knowing just the expectation and variance, one can find the range of values most likely
taken by a variable. Russian mathematician Pafnuty Chebyshev (1821–1894) showed that
any random variable X with expectation µ = E[X] and variance σ² = Var[X] belongs to
the interval µ ± ε = [µ − ε, µ + ε] with probability of at least 1 − σ²/ε² for any ε > 0.
Theorem 2.1. Let X be a random variable with expectation µ and variance σ². Then for
any ε > 0,
P(|X − µ| ⩾ ε) ⩽ σ²/ε².
Chebyshev’s inequality shows that only a large variance may allow a variable X to differ
significantly from its expectation µ. In this case, the risk of seeing an extremely low or
extremely high value of X increases. For this reason, risk is often measured in terms of a
variance or standard deviation.
There are several ways to prove Chebyshev's inequality, and one of them is via Markov's
inequality.
Theorem 2.2 (Markov’s Inequality). Let X be a nonnegative random variable. Then for
ε > 0,
P(X ⩾ ε) ⩽ E[X]/ε.
Proof. For an integer-valued nonnegative random variable X,
E[X] = ∑_{x=−∞}^{∞} x fX(x)
= ∑_{x=−∞}^{ε−1} x fX(x) + ∑_{x=ε}^{∞} x fX(x)
⩾ ∑_{x=ε}^{∞} x fX(x).
Since x ⩾ ε in the remaining sum,
∑_{x=ε}^{∞} x fX(x) ⩾ ∑_{x=ε}^{∞} ε fX(x) = ε ∑_{x=ε}^{∞} fX(x) = ε · P(X ⩾ ε).
Therefore E[X] ⩾ ε P(X ⩾ ε), which gives P(X ⩾ ε) ⩽ E[X]/ε.
Proof. Let Y = (X − E[X])². Then Y is a nonnegative random variable with expected value
E[Y] = Var[X]. By Markov's Inequality,
P(Y ⩾ ε²) ⩽ E[Y]/ε² = Var[X]/ε².
Since Y ⩾ ε² if and only if |X − µ| ⩾ ε, this is equivalent to
P(|X − µ| ⩾ ε) ⩽ σ²/ε².
Example:
1. Suppose the number of errors in a new software has expectation µ = 20 and a standard
deviation of 2. According to Chebyshev’s Inequality, there are more than 30 errors with
probability
P(X > 30) ⩽ P(|X − 20| > 10) ⩽ (2/10)² = 0.04.
However, if the standard deviation is 5 instead of 2, then the probability of more than
30 errors can only be bounded by (5/10)² = 0.25.
2. Chebyshev’s inequality shows that in general, higher variance implies higher probabili-
ties of large deviations, and this increases the risk for a random variable to take values
far from its expectation. This finds a number of immediate applications. Here we
focus on evaluating risks of financial deals, allocating funds, and constructing optimal
portfolios. This application is intuitively simple. The same methods can be used for
the optimal allocation of computer memory, CPU time, customer support, or other
resources.
As an example, suppose we would like to invest $10, 000 into shares of companies
XX and YY. Shares of XX cost $20 per share. The market analysis shows that their
expected return is $1 per share with a standard deviation of $0.5. Shares of YY cost
$50 per share, with an expected return of $2.50 and a standard deviation of $1 per
share, and returns from the two companies are independent. In order to maximize the
expected return and minimize the risk (standard deviation or variance), is it better
to invest (A) all $10, 000 into XX, (B) all $10, 000 into YY, or (C) $5, 000 in each
company?
Answer: Let X be the actual (random) return from each share of XX, and Y be the
actual return from each share of YY. Compute the expectation and variance of the
return for each of the proposed portfolios (A, B, and C).
(a) At $20 apiece, we can use $10,000 to buy 500 shares of XX, collecting a profit of
A = 500X. Using Property 2.2 and Property 2.7,
E[A] = 500 E[X] = (500)(1) = 500 and Var[A] = 500² Var[X] = 500²(0.5)² = 62,500.
(b) Investing all $10,000 into YY, we buy 10,000/50 = 200 shares of it and collect a
profit of B = 200Y, so that
E[B] = 200 E[Y] = (200)(2.50) = 500 and Var[B] = 200² Var[Y] = 200²(1)² = 40,000.
(c) Investing $5,000 into each company makes a portfolio consisting of 250 shares
of XX and 100 shares of YY; the profit in this case will be C = 250X + 100Y.
Since X and Y are independent, the variances add, so that
E[C] = 250 E[X] + 100 E[Y] = 250 + 250 = 500 and
Var[C] = 250² Var[X] + 100² Var[Y] = 250²(0.5)² + 100²(1)² = 25,625.
The expected return is the same for each of the proposed three portfolios because each
share of each company is expected to return 1/20 or 2.50/50, which is 5%. In terms of
the expected return, all three portfolios are equivalent. Portfolio C, where investment
is split between two companies, has the lowest variance; therefore, it is the least risky.
This supports one of the basic principles in finance: to minimize the risk, diversify the
portfolio.
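A small R sketch of the comparison, assuming the per-share returns and standard deviations quoted above, makes the variance ranking explicit.
```r
EX <- 1;   VarX <- 0.5^2    # XX: expected return and variance per share
EY <- 2.5; VarY <- 1^2      # YY: expected return and variance per share
# (A) 500 shares of XX, (B) 200 shares of YY, (C) 250 of XX and 100 of YY
c(E = 500 * EX,            Var = 500^2 * VarX)                   # 500, 62500
c(E = 200 * EY,            Var = 200^2 * VarY)                   # 500, 40000
c(E = 250 * EX + 100 * EY, Var = 250^2 * VarX + 100^2 * VarY)    # 500, 25625
```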
Next, we introduce the most commonly used families of discrete distributions. Amazingly,
absolutely different phenomena can be adequately described by the same mathematical model,
or family of distributions. For example, as we shall see below, the number of virus attacks,
received e-mails, error messages, network blackouts, telephone calls, traffic accidents,
earthquakes, and so on can all be modeled by the same Poisson family of distributions.
Perhaps the easiest discrete distribution to understand, the discrete uniform distribution
assumes that all outcomes are equally likely to occur. It means that all outcomes have equal
chances of being chosen.
Definition 2.23. Let X represent an outcome of an experiment with n outcomes, all with
equal chances of occurrence. Then X follows a discrete uniform distribution, denoted by
X ∼ DU(n), with pmf given by
fX(x) = 1/n, x = 1, 2, . . . , n.
Example: Suppose we throw an unbiased die. Let X represent the outcome of throwing
the die. Then x = 1, 2, 3, 4, 5, 6 and f (x) = 1/6 for all values of x.
If X ∼ DU(n), does it satisfy ∑_x fX(x) = 1? We need to show that the sum of the
probabilities equals 1:
∑_{x=−∞}^{∞} fX(x) = ∑_{x=1}^{n} 1/n
= 1/n + 1/n + · · · + 1/n   (n terms)
= n(1/n)
= 1.
Hence, if X ∼ DU(n), then ∑_x fX(x) = 1.
Property 2.11. If X ∼ DU(n), then E[X] = (n + 1)/2.
Property 2.12. If X ∼ DU(n), then Var[X] = (n² − 1)/12.
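As a quick check of Properties 2.11 and 2.12 for the fair-die example, one can compute the mean and variance directly in R.
```r
n <- 6
x <- 1:n
EX <- sum(x * (1/n));            EX     # 3.5    = (n + 1)/2
VarX <- sum((x - EX)^2 * (1/n)); VarX   # 2.9167 = (n^2 - 1)/12
```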
Suppose we have an experiment with only two outcomes: success and failure. In that
experiment, you are allowed to do n trials. Let p denote the probability of success and
q := 1 − p the probability of failure. Among the n trials, suppose k are successes, where
k ⩽ n; the remaining n − k trials are then failures. By the Multiplication Principle, the
probability that a particular such sequence occurs is
p^k q^(n−k).
However, there are nCk ways in which the k successes can be arranged among the n trials,
hence the probability that there are k successes after n trials is given by
nCk p^k q^(n−k).
Definition 2.24. Let X represent the number of successes after n trials, where p is the
probability of success and q the probability of failure. Then X follows a binomial
distribution, denoted by X ∼ Bi(n, p), with pmf given by
fX(x) = nCx p^x q^(n−x)
where x = 0, 1, . . . , n.
Example: Suppose an unbiased coin will be tossed 5 times. What is the probability of
having 2 heads?
Answer: The values of the parameters are n = 5 and p = 1/2, and the experiment is
denoted by X ∼ Bi(5, 1/2). Hence,
fX(2) = 5C2 (1/2)² (1/2)³ = 10/32 = 5/16.
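The same value is returned by R's built-in binomial pmf and cdf functions.
```r
dbinom(2, size = 5, prob = 1/2)   # P(X = 2)  = 5/16 = 0.3125
pbinom(2, size = 5, prob = 1/2)   # P(X <= 2) = 0.5, for comparison
```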
Property 2.13. If X ∼ Bi(n, p), then E[X] = np.
Proof.
E[X] = ∑_{x=0}^{n} x · nCx p^x q^(n−x)
= ∑_{x=1}^{n} x · nCx p^x q^(n−x)
= ∑_{x=1}^{n} x · [n!/((n − x)! x!)] p^x q^(n−x)
= ∑_{x=1}^{n} [n!/((n − x)!(x − 1)!)] p^x q^(n−x)
= np ∑_{x=1}^{n} [(n − 1)!/((n − x)!(x − 1)!)] p^(x−1) q^(n−x)
Writing n − x = (n − 1) − (x − 1),
= np ∑_{x=1}^{n} [(n − 1)!/(((n − 1) − (x − 1))! (x − 1)!)] p^(x−1) q^((n−1)−(x−1))
Letting m = n − 1 and y = x − 1,
= np ∑_{y=0}^{m} [m!/((m − y)! y!)] p^y q^(m−y)
= np ∑_{y=0}^{m} mCy p^y q^(m−y).
By the Binomial Theorem, the last sum equals (p + q)^m = [p + (1 − p)]^m = 1^m = 1, therefore
E[X] = np.
Property 2.14. If X ∼ Bi(n, p), then Var[X] = npq.
Proof. Note that Var[X] = E[X²] − (E[X])². Since we already know E[X], what is left for
us to find is E[X²]. Here, we use a trick: we first find E[X(X − 1)] and use it to find E[X²],
since
E[X(X − 1)] = E[X² − X] = E[X²] − E[X], so that E[X²] = E[X(X − 1)] + E[X].
Solving,
E[X(X − 1)] = ∑_{x=0}^{n} x(x − 1) · nCx p^x q^(n−x)
= ∑_{x=2}^{n} x(x − 1) · [n!/((n − x)! x!)] p^x q^(n−x)
= ∑_{x=2}^{n} [n!/((n − x)!(x − 2)!)] p^x q^(n−x)
= n(n − 1)p² ∑_{x=2}^{n} [(n − 2)!/((n − x)!(x − 2)!)] p^(x−2) q^(n−x)
Writing n − x = (n − 2) − (x − 2) and letting m = n − 2, y = x − 2,
= n(n − 1)p² ∑_{y=0}^{m} [m!/((m − y)! y!)] p^y q^(m−y)
= n(n − 1)p² ∑_{y=0}^{m} mCy p^y q^(m−y)
= n(n − 1)p².
Therefore,
Var[X] = E[X(X − 1)] + E[X] − (E[X])²
= n(n − 1)p² + np − (np)²
= np(np − p + 1 − np)
= np(1 − p)
= npq.
Definition 2.25. Let X represent the number of successes after performing the experiment a
single time. Then X is said to follow a Bernoulli distribution, denoted by X ∼ Be(p),
whose pmf is given by
fX(x) = p^x q^(1−x), x = 0, 1.
Answer:
fX(1) = (2/3)^1 (1/3)^(1−1) = 2/3.
Property 2.15. The sum of n independent Bernoulli trials has a binomial distribution, i.e.,
if X1, X2, . . . , Xn is a sequence of independent Be(p) random variables, then
X1 + X2 + · · · + Xn ∼ Bi(n, p).
The next distribution is related to a concept of rare events, or Poissonian events. Essentially
it means that two such events are extremely unlikely to occur simultaneously or within a
very short period of time. Arrivals of jobs, telephone calls, e-mail messages, traffic accidents,
network blackouts, virus attacks, errors in software, floods, and earthquakes are examples of
rare events.
Definition 2.26. Let X represent the number of occurrences per unit of space/time, and let
λ be the average number of occurrences per unit of space/time. Then X follows a Poisson
distribution, denoted by X ∼ Po(λ), with pmf
fX(x) = e^(−λ) λ^x / x!
where x = 0, 1, 2, . . .
To show that X ∼ Po(λ) satisfies ∑_x fX(x) = 1, we first introduce a special series.
Property 2.17. For any real number x,
e^x = ∑_{n=0}^{∞} x^n / n!.
By Property 2.17, ∑_x fX(x) = 1 is satisfied when X ∼ Po(λ), since
∑_{x=0}^{∞} e^(−λ) λ^x / x! = e^(−λ) ∑_{x=0}^{∞} λ^x / x!
= e^(−λ) e^λ
= 1.
Example: Customers of an internet service provider initiate new accounts at the average
rate of 10 accounts per day.
1. What is the probability that more than 8 new accounts will be initiated today?
Answer: New account initiations qualify as rare events because no two customers open
accounts simultaneously. Then the number X of today’s new accounts has Poisson
distribution with parameter λ = 10. Hence,
P(X > 8) = 1 − P(X ⩽ 8) = 1 − 0.3328 = 0.6672.
2. What is the probability that more than 16 accounts will be initiated within 2 days?
Answer: The number of accounts, Y , opened within 2 days does not equal 2X. Rather,
Y is another Poisson random variable whose parameter equals 20. Indeed, the param-
eter is the average number of rare events, which, over the period of two days, doubles
the one-day average. Hence, with λ = 20,
P(Y > 16) = 1 − P(Y ⩽ 16) ≈ 1 − 0.221 = 0.779.
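Both probabilities can be obtained from R's Poisson cdf; this sketch assumes the rates λ = 10 and λ = 20 used above.
```r
1 - ppois(8, lambda = 10)    # P(X > 8)  with X ~ Po(10), about 0.667
1 - ppois(16, lambda = 20)   # P(Y > 16) with Y ~ Po(20), about 0.779
```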
Property 2.18. If X ∼ Po(λ), then E[X] = λ.
Proof.
E[X] = ∑_{x=0}^{∞} x e^(−λ) λ^x / x!
= ∑_{x=1}^{∞} x e^(−λ) λ^x / x!
= λ e^(−λ) ∑_{x=1}^{∞} x λ^(x−1) / x!
= λ e^(−λ) ∑_{x=1}^{∞} λ^(x−1) / (x − 1)!
Letting y = x − 1,
= λ e^(−λ) ∑_{y=0}^{∞} λ^y / y!
= λ e^(−λ) e^λ
= λ.
Property 2.19. If X ∼ Po(λ), then Var[X] = λ.
Proof. The proof is left as an exercise. Hint: solve for E[X(X − 1)] first to get E[X²].
Definition 2.27. Let X be the number of independent Bernoulli trials performed until (and
including) the first success. Then X is said to follow the geometric distribution, denoted by
X ∼ Ge(p), with pmf
fX(x) = q^(x−1) p
where x = 1, 2, . . .
Example: Find the probability that it will take 6 tosses to get a head in a coin toss, where
a head is 5 times less likely to occur than a tail.
Answer: Here p = 1/6, so
fX(6) = (5/6)^5 (1/6) = 3125/46656 ≈ 0.067.
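The same probability can be checked in R. Note that R's geometric functions count the number of failures before the first success, so "first head on toss 6" corresponds to 5 failures.
```r
p <- 1/6
(1 - p)^5 * p        # direct formula: 3125/46656 = 0.067
dgeom(5, prob = p)   # same value from R's parameterization
```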
To get the expectation and the variance of a geometric distribution, we first introduce a
special series.
Property 2.20. For |x| < 1, the function 1/(1 − x)³ can be expanded as the series
1 + 3x + 6x² + 10x³ + . . .
Property 2.21. If X ∼ Ge(p), then E[X] = 1/p.
Proof.
E[X] = ∑_{x=1}^{∞} x q^(x−1) p
= p + 2qp + 3q²p + . . .   (1)
Multiplying (1) by q gives qE[X] = qp + 2q²p + 3q³p + . . . , and subtracting this from (1),
pE[X] = (1 − q)E[X] = p + qp + q²p + q³p + . . .
E[X] = 1 + q + q² + q³ + . . .
= ∑_{x=0}^{∞} q^x   (geometric series)
= 1/(1 − q)
= 1/p.
Property 2.22. If X ∼ Ge(p), then Var[X] = q/p².
Proof.
E[X²] = ∑_{x=1}^{∞} x² q^(x−1) p
= p + 4qp + 9q²p + 16q³p + . . .
= p(1 + 4q + 9q² + 16q³ + . . . )
= p[(1 + 3q + 6q² + 10q³ + . . . ) + (q + 3q² + 6q³ + . . . )]
= p(1 − q)^(−3)(1 + q)
= p · p^(−3)(2 − p)
= (2 − p)/p².
Therefore,
Var[X] = E[X²] − (E[X])² = (2 − p)/p² − 1/p² = (1 − p)/p² = q/p².
In a sequence of independent Bernoulli trials, the number of trials needed to obtain n
successes has a negative binomial distribution.
Definition 2.28. Let X be the number of trials until the nth success. Then X follows a
negative binomial distribution denoted by X ∼ N B(n, p) with pmf
fX(x) = (x−1)C(n−1) p^n q^(x−n)
where x = n, n + 1, n + 2, . . .
Example:
1. Find the probability that it will take 5 tosses to get 2 heads in a coin toss.
Answer:
fX(5) = (5−1)C(2−1) (1/2)² (1/2)³ = 4(1/32) = 1/8.
2. Suppose that, in a recent production, 5% of certain electronic components are defective,
independently of one another. Components are tested one by one until 12 non-defective
components are found. What is the probability that more than 15 components will have
to be tested?
Answer: Let X be the number of components tested until 12 non-defective ones are
found. It is the number of trials needed to see 12 successes, hence X has a negative
binomial distribution with n = 12 and p = 0.95.
We need P(X > 15) = ∑_{x=16}^{∞} fX(x), or 1 − F(15); however, applying the formula for
fX(x) directly is rather cumbersome. What would be a quick solution?
The event {X > 15} means that more than 15 trials are needed to obtain 12 successes,
i.e., there are fewer than 12 non-defective components among the first 15 tested. Hence,
letting Y ∼ Bi(15, 0.95) be the number of non-defective components among the first 15,
P(X > 15) = P(Y < 12) = P(Y ⩽ 11) ≈ 0.0055.
This technique, expressing a probability about one random variable in terms of another
random variable, is rather useful. Soon it will help us relate Gamma and Poisson
distributions and simplify computations significantly.
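Both routes to this probability are available in R; note that R's negative binomial functions count failures before the n-th success rather than total trials, so {X > 15} corresponds to more than 3 failures.
```r
pbinom(11, size = 15, prob = 0.95)       # P(Y <= 11), about 0.0055
1 - pnbinom(3, size = 12, prob = 0.95)   # same probability via the negative binomial
```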
Property 2.23. The sum of n independent geometric random variables has a negative binomial
distribution, i.e., if X1, X2, . . . , Xn is a sequence of independent Ge(p) random variables, then
X1 + X2 + · · · + Xn ∼ NB(n, p).
Consequently, if X ∼ NB(n, p), then
E[X] = E[X1] + · · · + E[Xn]
= 1/p + · · · + 1/p   (n terms)
= n/p.
Definition 2.29. Suppose a sample of n items is selected at random, without replacement,
from a population of M items of which k are of a special kind. Let X be the number of
special items in the sample. Then X is said to follow a hypergeometric distribution with pmf
fX(x) = [kCx · (M−k)C(n−x)] / MCn
for the admissible values of x.
Example: A producer supplies microprocessors to a manufacturer (consumer) of electronic
equipment. The microprocessors are supplied in batches of 50. The consumer regards a
batch as acceptable provided that there are not more than 5 defective microprocessors in
the batch. Rather than test all of the microprocessors in the batch, 10 are selected at
random and tested.
1. Find the probability that out of a sample of 10, d = 0, 1, 2, 3, 4, 5 are defective when
there are actually 5 defective microprocessors in the batch.
Answer: Let X be the number of defectives in the sample of 10. Then
P(X = d) = (45C(10−d) × 5Cd) / 50C10.
Hence,
P(X = 0) = (45C10 × 5C0)/50C10 = 0.311     P(X = 1) = (45C9 × 5C1)/50C10 = 0.431
P(X = 2) = (45C8 × 5C2)/50C10 = 0.210      P(X = 3) = (45C7 × 5C3)/50C10 = 0.044
P(X = 4) = (45C6 × 5C4)/50C10 = 0.004      P(X = 5) = (45C5 × 5C5)/50C10 = 0.0001
2. Suppose that the consumer will accept the batch provided that not more than m
defectives are found in the sample of 10.
(a) Find the probability that the batch is accepted when there are 5 defectives in the
batch.
Answer:
∑_{d=0}^{m} P(X = d) = ∑_{d=0}^{m} (45C(10−d) × 5Cd) / 50C10,   m ⩽ 5
(b) Find the probability that the batch is rejected when there are 3 defectives.
Answer:
1 − ∑_{d=0}^{m} P(X = d) = 1 − ∑_{d=0}^{m} (47C(10−d) × 3Cd) / 50C10,   m ⩽ 3
Exercises:
1. The number of computer shutdowns during any month has a Poisson distribution,
averaging 0.25 shutdowns per month.
(a) What is the probability of at least 3 computer shutdowns during the next year?
(b) During the next year, what is the probability that there are at least 3 months (out of
12) with exactly 1 computer shutdown in each?
2. A lab network consisting of 20 computers was attacked by a computer virus. This virus
enters each computer with probability 0.4, independently of other computers. Find the
probability that it entered at least 10 computers.
3. Suppose a gambler draws a bridge hand (13 cards) at random from an ordinary deck.
(b) If the gambler has k black cards, what is the probability that there are at least 3
red cards?
4. Based on a basketball player's record, the probability that he makes a 3-point shot
is 0.65.
(a) Find the probability that it will take him at most 10 attempts to make 5 3-point
shots.
(b) Find the probability that he can shoot at least 6 3-point shots after 10 shots.
The probability density functions of these distributions are described by formulas that
depend on some parameter values. The expectations and variances of the distributions are
specified in terms of these parameters. The probability values associated with these
continuous distributions are sometimes straightforward to calculate, although some
distributions require the use of a software package.
As in the discrete case, varieties of phenomena can be described by relatively few families
of continuous distributions. Here, we shall discuss Uniform, Exponential, Gamma, and
Normal families.
Uniform distribution plays a unique role in stochastic modeling. A random variable with
any thinkable distribution can be generated from a Uniform random variable. Many com-
puter languages and software are equipped with a random number generator that produces
random variates. Users can convert them into variables with desired distributions and use
them for computer simulation of various events and processes.
Definition 2.30. The uniform distribution has a constant density. On the interval
(a, b) ⊂ R, its density equals
f(x) = 1/(b − a),   a < x < b.
Note that |b − a| has to be a finite number. Hence, there does not exist a uniform
distribution on the entire real line. In other words, if you are asked to choose a random
number from (−∞, ∞), you cannot do it uniformly.
Examples:
2. Let X denote the waiting time at a bus stop, which is uniformly distributed between
1 and 12 minutes. What is the probability of waiting at most 8 minutes? Here
f(x) = 1/(12 − 1) = 1/11,   1 ⩽ x ⩽ 12,
and since the density is constant, the probability is the area of a rectangle:
P(X ⩽ 8) = base · height = (8 − 1)(1/11) = 7/11 ≈ 0.6364.
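The same probability, together with the mean and variance of this uniform distribution, can be obtained in R.
```r
punif(8, min = 1, max = 12)        # P(X <= 8) = 7/11 = 0.6364
c((1 + 12) / 2, (12 - 1)^2 / 12)   # E[X] = 6.5, Var[X] = 10.08
```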
Property 2.26. If X ∼ U(a, b), then E[X] = (a + b)/2 and Var[X] = (b − a)²/12.
Exercises:
(a) E[X]
(c) P (0 ⩽ X ⩽ 4)
2. A new battery supposedly with a charge of 1.5 volts actually has a voltage with a
uniform distribution between 1.43 and 1.60 volts.
(c) What is the probability that a battery has a voltage less than 1.48 volts?
(a) If 20 random numbers are generated, what are the expectation and variance of the
number that lie in each of the four intervals [0.00, 0.30), [0.30, 0.50), [0.50, 0.75),
and [0.75, 1.00)?
(b) What is the probability that exactly five numbers lie in each of the four intervals?
Definition 2.31. The exponential distribution has state space x ⩾ 0 and is often used
to model failure times, waiting times, and interarrival times. If X follows such a distribution,
it is denoted by X ∼ ExP(λ), and it has probability distribution function
f(x) = λe^(−λx)
and cumulative distribution function
F(x) = 1 − e^(−λx).
Property 2.27. If X ∼ ExP(λ), then E[X] = 1/λ and Var[X] = 1/λ².
Example:
1. An engineer examines the edges of steel girders for hairline fractures. The girders are
10 m long, and it is discovered that they have an average of 42 fractures each. If a
girder has 42 fractures, then there are 43 “gaps” between fractures or between the ends
of the girder and the adjacent fractures. The average length of these gaps is therefore
10/43 = 0.23 m. The fractures appear to be randomly spaced on the girders, so the
engineer proposes that the location of fractures on a particular girder can be modeled
by a Poisson process with
λ = 1/0.23 = 4.3.
According to this model, the length of a gap between any two adjacent fractures has
an exponential distribution with λ = 4.3. In this case, the probability that a gap is
less than 10 cm long is
P(X ⩽ 0.10) = 1 − e^(−4.3×0.10) = 1 − e^(−0.43) ≈ 0.35.
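This probability matches R's exponential cdf with rate λ = 4.3.
```r
pexp(0.10, rate = 4.3)   # P(gap < 0.10 m), about 0.349
1 - exp(-4.3 * 0.10)     # same value from the cdf formula
```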
2. The engineer in charge of the car panel manufacturing process pays particular attention
to the arrival of metal sheets at the beginning of the panel construction lines. These
metal sheets are brought one by one from other parts of the factory floor, where they
have been cut into the required sizes. On average, about 96 metal sheets are delivered
to the panel construction lines in 1 hour. The engineer decides to model the arrival of
the metal sheets with a Poisson process. The average waiting time between arrivals is
60/96 = 0.625 minute, so a value of
λ = 1/0.625 = 1.6
is used. This model assumes that the waiting times between arrivals of metal sheets
are independently distributed as exponential distributions with λ = 1.6. For example,
the probability that there is a wait of more than 3 minutes between arrivals is
P(X > 3) = e^(−1.6×3) = e^(−4.8) ≈ 0.008.
Exercise: Suppose that you are waiting for a friend to call you and that the time you wait
in minutes has an exponential distribution with parameter λ = 0.1.
2. What is the probability that you will wait longer than 10 minutes?
3. What is the probability that you will wait less than 5 minutes?
4. Suppose that after 5 minutes you are still waiting for the call. What is the distribution
of your additional waiting time? In this case, what is the probability that your total
waiting time is longer than 15 minutes?
5. Suppose now that the time you wait in minutes for the call has a U (0, 20) distribution.
What is the expectation of your waiting time? If after 5 minutes you are still waiting
for the call, what is the distribution of your additional waiting time?
In this section, we discuss the normal or Gaussian distribution. It is the most important of all
continuous probability distributions and is used extensively as the basis for many statistical
inference methods. Its importance stems from the fact that it is a natural probability
distribution for directly modeling error distributions and many other naturally occurring
phenomena.
Definition 2.32. The normal (or Gaussian) distribution has probability density function
f(x) = [1/(σ√(2π))] exp(−(x − µ)²/(2σ²))
for x ∈ (−∞, ∞), depending upon two parameters, the mean and the variance
of the distribution. The probability density function is a bell-shaped curve that is symmetric
about µ. The notation
X ∼ N (µ, σ 2 )
denotes that the random variable X has a normal distribution with mean µ and variance σ 2 .
In addition, the random variable X can be referred to as being “normally distributed”.
The probability density function of a normal random variable is symmetric about the
value µ and has what is known as a “bell-shaped” curve. The figure above shows the
probability density functions of normal distributions with different values for µ and σ and
notice how the shape changes when the parameters µ and σ change.
Definition 2.33. A normal distribution with mean µ = 0 and variance σ² = 1 is known as the
standard normal distribution. Its probability density function has the notation ϕ(x) and
is given by
ϕ(x) = (1/√(2π)) exp(−x²/2)
for x ∈ (−∞, ∞). The notation Φ(x) is used for the cdf of a standard normal distribution,
which is calculated from the expression
Φ(x) = ∫_{−∞}^{x} ϕ(y) dy.
The symmetry of the standard normal distribution about 0 implies that if the random
variable Z has a standard normal distribution, then
Φ(x) + Φ(−x) = 1.
A very important general result is that if X ∼ N (µ, σ 2 ) then the transformed random
variable
Z = (X − µ)/σ
has a standard normal distribution. This result indicates that any normal distribution can
be related to the standard normal distribution by appropriate scaling and location changes.
Notice that the transformation operates by first subtracting the mean value µ and then by
dividing by the standard deviation σ. The random variable Z is known as the “standardized”
version of the random variable X.
A consequence of this result is that the probability values of any normal distribution
can be related to the probability values of a standard normal distribution and, in particular,
to the cdf Φ(x). For example,
P(a ⩽ X ⩽ b) = P((a − µ)/σ ⩽ (X − µ)/σ ⩽ (b − µ)/σ)
= P((a − µ)/σ ⩽ Z ⩽ (b − µ)/σ)
= Φ((b − µ)/σ) − Φ((a − µ)/σ).
Example: Suppose that X ∼ N(3, 4), so that µ = 3 and σ = 2.
1.
P(X ⩽ 6) = P(−∞ < X ⩽ 6)
= Φ((6 − 3)/2) − Φ(−∞)
= Φ(1.5) − 0
= 0.9332
2.
P(2 ⩽ X ⩽ 5.4) = Φ((5.4 − 3.0)/2.0) − Φ((2.0 − 3.0)/2.0)
= Φ(1.2) − Φ(−0.5)
= 0.8849 − 0.3085
= 0.5764.
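Both computations can be reproduced with R's normal cdf pnorm, either by standardizing first or by passing the mean and standard deviation directly.
```r
pnorm((6 - 3) / 2)                                          # Phi(1.5) = 0.9332
pnorm(6, mean = 3, sd = 2)                                  # same, without standardizing
pnorm(5.4, mean = 3, sd = 2) - pnorm(2, mean = 3, sd = 2)   # 0.5764
```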
Definition 2.34 (Empirical Rule). For a normally distributed random variable:
• There is a probability of about 68% that it takes a value within one standard deviation
of its mean.
• There is a probability of about 95% that it takes a value within two standard deviations
of its mean.
• There is a probability of about 99.7% that it takes a value within three standard
deviations of its mean.
Percentiles of a normal distribution can also be obtained from standard normal percentiles
zα, since
P(X ⩽ µ + σzα) = P(Z ⩽ zα) = 1 − α.
For example, since the 95th percentile of the standard normal distribution is z0.05 = 1.645,
the 95th percentile of a N(3, 4) distribution is
µ + σ z0.05 = 3 + 2(1.645) = 6.29.
Example:
1. A company manufactures concrete blocks that are used for construction purposes.
Suppose that the weights of the individual concrete blocks are normally distributed
with a mean value of µ = 11.0 kg and a standard deviation of σ = 0.3 kg. The
probability that a concrete block weighs less than 10.5 kg is
P(X ⩽ 10.5) = Φ((10.5 − 11.0)/0.3) − Φ(−∞)
= Φ(−1.67) − 0
= 0.0475.
Consequently, only about 1 in 20 concrete blocks weighs less than 10.5 kg.
2. A Wall Street analyst estimates that the annual return from the stock of company A
can be considered to be an observation from a normal distribution with mean µ = 8.0%
and standard deviation σ = 1.5%. The analyst’s investment choices are based upon the
considerations that any return greater than 5% is “satisfactory” and a return greater
than 10% is “excellent”. The probability that company A’s stock will prove to be
“unsatisfactory” is
P(X ⩽ 5.0) = Φ((5.0 − 8.0)/1.5) − Φ(−∞)
= Φ(−2.0) − 0
= 0.0228
and the probability that company A’s stock will prove to be “excellent” is
P(10.0 ⩽ X) = P(10.0 ⩽ X < ∞)
= Φ((∞ − µ)/σ) − Φ((10.0 − µ)/σ)
= Φ(∞) − Φ((10.0 − 8.0)/1.5)
= 1 − Φ(1.33)
= 1 − 0.9082
= 0.0918.
Exercises:
(a) P (X ⩽ 10.34)
(b) P (X ⩾ 11.98)
2. The amount of sugar contained in 1-kg packets is actually normally distributed with a
mean of µ = 1.03 kg and a standard deviation of σ = 0.014 kg.
(b) If an alternative package-filling machine is used for which the weights of the pack-
ets are normally distributed with a mean µ = 1.05 kg and a standard deviation
of σ = 0.016 kg, does this result in an increase or a decrease on the proportion of
underweight packets?
(c) In each case, what is the expected value of the excess package weight above the
advertised level of 1 kg?
Definition 2.35. The gamma distribution has applications in reliability theory and is
also used in the analysis of Poisson processes. The parameters of this distribution are λ and
k, meaning that if X is a random variable with a gamma distribution, then it is denoted by
X ∼ Ga(λ, k). Its probability density function is
f(x) = λ^k x^(k−1) e^(−λx) / Γ(k),   x > 0.
The function Γ(k) is known as the gamma function. It provides the correct scaling to
ensure that the total area under the probability density function is equal to 1.
Γ(k) = ∫_0^∞ x^(k−1) e^(−x) dx.
Some special cases are Γ(1) = 1 and Γ(1/2) = √π, and in general, Γ(k) = (k − 1)Γ(k − 1).
And if k ∈ N, then Γ(k) = (k − 1)!. Also, notice that if k = 1, the gamma distribution
simplifies to the exponential distribution with parameter λ. The expectation and variance
of a gamma distribution are given in the following property.
Property 2.28. If X ∼ Ga(λ, k), then E[X] = k/λ and Var[X] = k/λ².
The parameter k is often referred to as the shape parameter of the gamma distribution,
and λ is referred to as the scale parameter. Another important property of a gamma
distribution with an integer value of the parameter k is that it can be obtained as the sum
of a set of independent exponential random variables.
This property implies that for a Poisson process with parameter λ, the time taken for
k events to occur has a gamma distribution with parameters k and λ, since the time taken
until the first event occurs, and the times between subsequent events, each have independent
exponential distributions with parameter λ.
Examples:
1. Suppose that the random variable X measures the length between one end of a girder
and the fifth fracture along the girder, as shown in the figure below.
If the fracture locations are modeled by a Poisson process, X has a gamma distribution
with parameters k = 5 and λ = 4.3. The expected distance to the fifth fracture is
therefore
E[X] = k/λ = 5/4.3 = 1.16 m.
We use R to show that the 0.05 quantile point of this distribution is x = 0.458m, so
that
F (0.458) = 0.05.
Consequently, the engineer can be 95% sure that the fifth fracture is at least 46 cm
away from the end of the girder. A software package can also be used to calculate the
probability that the fifth fracture is within 1 m of the end of the girder, which is
F (1) = 0.4296.
It is interesting to note that this latter probability can also be obtained using the
Poisson distribution. The number of fractures within a 1-m section of the girder has a
Poisson distribution with mean
λ × 1 = 4.3.
The probability that the fifth fracture is within 1 m of the end of the girder is the
probability that there are at least five fractures within the first 1 m section, which is
therefore
P (Y ⩾ 5) = 0.4296
where Y ∼ P o(4.3).
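The quantile and the two probabilities quoted in this example come from R's gamma and Poisson functions (R parameterizes the gamma distribution by shape k and rate λ).
```r
qgamma(0.05, shape = 5, rate = 4.3)   # 0.05 quantile, about 0.458 m
pgamma(1, shape = 5, rate = 4.3)      # F(1) = 0.4296
1 - ppois(4, lambda = 4.3)            # P(Y >= 5) with Y ~ Po(4.3): also 0.4296
```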
2. Suppose that the engineer in charge of the car panel manufacturing process is interested
in how long it will take for 20 metal sheets to be delivered to the panel construction
lines. Under the Poisson process model, this time X has a gamma distribution with
parameters k = 20 and λ = 1.6. The expected waiting time is consequently
E[X] = k/λ = 20/1.6 = 12.5 minutes,
and the variance is
Var[X] = k/λ² = 20/1.6² = 7.81,
so that the standard deviation is σ = √7.81 = 2.80 minutes. We use R to show that the
0.95 quantile point of this distribution is about 17.4 minutes.
The engineer can therefore be 95% confident that 20 metal sheets will have arrived
within 18 minutes, say. Furthermore, there is a probability of about 0.82 that they
will all arrive within 15 minutes.
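The corresponding R calls, again using shape k = 20 and rate λ = 1.6, give the quoted quantile and probability.
```r
qgamma(0.95, shape = 20, rate = 1.6)   # 0.95 quantile, about 17.4 minutes
pgamma(15, shape = 20, rate = 1.6)     # P(X <= 15), about 0.82
```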
Exercises:
1. A day’s sales in $1000 units at a gas station have a gamma distribution with parameters
k = 5 and λ = 0.9.
(c) What are the upper and lower quartiles of a day’s sales?
(d) What is the probability that a day’s sales are more than $6000?
2. Suppose that the time in minutes taken by a worker on an assembly line to complete
a particular task has a gamma distribution with parameters k = 44 and λ = 0.7.
(a) What are the expectation and standard deviation of the time taken to complete
the task?
(b) Use a software package to find the probability that the task is completed within
an hour.