
Probability Theory and Random Processes

(MA 225)

Class Notes
September – November, 2020

Instructor
Ayon Ganguly
Department of Mathematics
IIT Guwahati
Contents

1 Probability
  1.1 Probability
    1.1.1 Classical Probability
    1.1.2 Countable and Uncountable Sets
    1.1.3 Axiomatic Probability
    1.1.4 Continuity of Probability
  1.2 Conditional Probability
  1.3 Independence

Chapter 1

Probability

1.1 Probability
1.1.1 Classical Probability
As we know, the probability of an event A, denoted by P(A), is defined by

P(A) = (Favourable number of cases to A)/(Total number of cases) = #A/#S,

where S is the set of all possible outcomes. This definition is known as the classical definition of
probability. Note that this definition is meaningful only if the number of elements in S is finite.
As A ⊆ S, A is finite whenever S is finite. Let us consider the following examples.

Example 1.1. Suppose a fair die is rolled. What is the probability of getting a three on the upper
face? It is easy to see that the required probability is 1/6. ||
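The classical definition is just a counting recipe: count the favourable outcomes and divide by the total number of outcomes. The following minimal Python sketch (not part of the original notes; the helper name classical_probability is ours) illustrates this for Example 1.1.

    from fractions import Fraction

    def classical_probability(event, sample_space):
        """Classical probability #A/#S for a finite sample space."""
        favourable = sum(1 for outcome in sample_space if outcome in event)
        return Fraction(favourable, len(sample_space))

    # Example 1.1: a fair die is rolled; A = "a three shows on the upper face".
    S = {1, 2, 3, 4, 5, 6}
    A = {3}
    print(classical_probability(A, S))  # 1/6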

Example 1.2. Consider a target comprising three concentric circles of radii 1/3, 1, and
√3 feet. Consider the event that a shooter hits inside the inner circle and its probability.
Let A be the event that the shooter hits inside the inner circle. Then

S = R^2 and A = {x ∈ R^2 : |x| ≤ 1/3}.

In this case both A and S are infinite, and therefore the probability of A cannot be found using
the classical definition of probability. ||

Here we will try to provide a general definition of probability so that we can apply
the new definition to a larger class of problems, like Example 1.2. Note that probability can
be viewed as a function whose argument is a set and whose output is a real
number. To give the new definition of probability, we will use three basic properties (to
be discussed) of the classical definition of probability, and we will say that a function which
satisfies these three properties is called a probability or a probability function. Of course, we
need to define the domain of the function properly.

Definition 1.1 (Set Function). A function which takes a set as its argument is called a set
function.

1.1.2 Countable and Uncountable Sets
For further discussion, we need the concepts of countable and uncountable sets. The defi-
nitions and some properties of countable and uncountable sets are given in this subsection.
You must have read these concepts in the MA 101 course, and therefore this is a recapitulation.
Definition 1.2. We say that two sets A and B are equivalent if there exists a bijection from
A to B. We denote it by A ∼ B.
Definition 1.3. For any positive integer n, let Jn = {1, 2, . . . , n} and N be the set of all
positive integers (natural numbers). For any set A, we say:
(a) A is finite if A = φ or A ∼ Jn for some n ∈ N. n is said to be the cardinality of A or
number of elements in A.
(b) A is infinite if A is not finite.
(c) A is countable if A ∼ N.
(d) A is at most countable if A is finite or countable.
(e) A is uncountable if A is neither finite nor countable.
Example 1.3. The set of all integers, Z, is countable. Consider the function f : N → Z
given by
f(n) = n/2           if n is even,
f(n) = −(n − 1)/2    if n is odd.
It is easy to see that f (·) is a bijection from N to Z. Therefore, Z is countable. ||
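As a quick sanity check of Example 1.3, the short Python sketch below (our illustration, not part of the notes) evaluates f on the first few natural numbers and shows that distinct inputs give distinct integers 0, ±1, ±2, . . ..

    def f(n):
        """The map from Example 1.3: N = {1, 2, 3, ...} onto Z."""
        return n // 2 if n % 2 == 0 else -(n - 1) // 2

    values = [f(n) for n in range(1, 12)]
    print(values)                           # [0, 1, -1, 2, -2, 3, -3, 4, -4, 5, -5]
    assert len(values) == len(set(values))  # distinct outputs on this range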
Remark 1.1. A finite set cannot be equivalent to any of its proper subsets. However, this is
possible for an infinite set. For example, consider the bijection f : N → 2N defined by
f (n) = 2n.
Here, 2N is a proper subset of N and f (·) is a bijection from N to 2N. Therefore, N and 2N
are equivalent. †
Remark 1.2. If a set is countable, then it can be written as a sequence {xn }n≥1 of distinct
terms. †
Theorem 1.1. Every infinite subset of a countable set A is countable.
Theorem 1.2. Let {En}n≥1 be a sequence of at most countable sets and put S = ∪_{n=1}^∞ En.
Then S is at most countable.
Theorem 1.3. Let A1, A2, . . . , An be at most countable sets. Then Bn = A1 × A2 × · · · × An
is at most countable.
Corollary 1.1. The set of rationals, Q, is countable.
Theorem 1.4. The set, A, of all binary sequences is uncountable.
Corollary 1.2. [0, 1] is uncountable.
Corollary 1.3. R is uncountable.
Corollary 1.4. Qc is uncountable.
Corollary 1.5. Any interval is uncountable.

1.1.3 Axiomatic Probability
To define the probability under more general framework, we need the concepts of random
experiment, sample space, σ-field. These concepts are needed to define the domain of the
probability function adequately.
Definition 1.4 (Random Experiment). An experiment is called a random experiment if it
satisfies the following three properties:
1. All the outcomes of the experiment are known in advance.

2. The outcome of a particular performance of the experiment is not known in advance.

3. The experiment can be repeated under identical conditions.


Note that according to the definition of a random experiment, we know all possible
outcomes beforehand, and hence we can make a list of all possible outcomes. This list is
called the sample space. The third condition in the definition of a random experiment is somewhat
hypothetical in the sense that we will in general assume that the third condition is satisfied
(if it is not too absurd to assume).
Definition 1.5 (Sample Space). The collection of all possible outcomes of a random exper-
iment is called the sample space of the random experiment. It will be denoted by S.
Example 1.4. A toss of a coin is a random experiment as all the conditions of the definition
of random experiment hold true. In this case, the sample space is S = {H, T }. The sample
space is finite in this example. ||
Example 1.5. Tossing a coin until the first head appears is also a random experiment
with sample space S = {H, T H, T T H, . . .}. In this case, the sample space is countably
infinite. ||
Example 1.6. The experiment of measuring the height of a student is a random experiment
with sample space S = (0, ∞). Here the sample space is uncountable. ||
Definition 1.6 (σ-algebra or σ-field). A collection, F, of subsets of S is called a σ-algebra
or a σ-field if it satisfies the following properties:
1. S ∈ F.

2. A ∈ F implies A^c ∈ F.

3. A1, A2, . . . ∈ F implies ∪_{i=1}^∞ Ai ∈ F.

The first condition in the definition of σ-field implies that a σ-field is always non-empty.
The second condition is called closure under complementation. Note that here A^c = S − A,
i.e., the complement is taken with respect to the sample space. The third condition in the
definition of σ-field is called closure under countable union. Thus, by definition, a σ-field is
closed under complementation and countable union.
Definition 1.7 (Event). A set E is said to be an event with respect to a σ-field F if E ∈ F.
We will say “the event E occurs” if the outcome of a performance of the random experiment
is in E.

Example 1.7 (Continuation of Example 1.4). Consider the following three classes of subsets
of S: F1 = {∅, S, {H}, {T}}, F2 = {∅, S}, and F3 = {∅, S, {H}}. Here we will show that
F1 and F2 are σ-fields, but F3 is not a σ-field.
Note that S ∈ F1 and for any A ∈ F1, A^c ∈ F1. Hence, the first two conditions of the
definition of σ-field hold true. For the third condition, let A1, A2, . . . ∈ F1.
Case I: If Ai = S for at least one i ∈ N, then ∪_{i=1}^∞ Ai = S ∈ F1.
Case II: If Ai = ∅ for all i ∈ N, then ∪_{i=1}^∞ Ai = ∅ ∈ F1.
Case III: If Ai = {H} for at least one i ∈ N and the rest of the Ai = ∅, then ∪_{i=1}^∞ Ai = {H} ∈ F1.
Case IV: If Ai = {T} for at least one i ∈ N and the rest of the Ai = ∅, then ∪_{i=1}^∞ Ai = {T} ∈ F1.
Case V: If Ai = {H} for at least one i ∈ N and Aj = {T} for at least one j ∈ N, then ∪_{i=1}^∞ Ai = S ∈ F1.
These cases are exhaustive, and in all of them ∪_{i=1}^∞ Ai ∈ F1. Hence, F1 is a σ-field on
the subsets of S. Note that F1 is the power set of S.
It is easy to see that F2 is a σ-field, and the verification is left as a practice problem. To show that F3
is not a σ-field, we need to show that at least one of the three conditions fails. It is very
easy to check that the second condition fails, as {H} ∈ F3 but {H}^c = {T} ∉ F3. ||
Example 1.8 (Continuation of Example 1.5). Consider F = P(S), the power set of S.
Clearly, S ∈ F. For any A ∈ F, A^c is a subset of S and hence belongs to F. For any
countable collection of sets A1, A2, . . . ∈ F, ∪_{i=1}^∞ Ai is a subset of S and belongs to F.
Hence, F is a σ-field on the subsets of S. ||
Example 1.9 (Continuation of Example 1.6). F = {φ, S, (4, 5), (4, 5)c } is a σ-field. ||
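For a finite sample space, the three conditions of Definition 1.6 can be checked by brute force. The Python sketch below is our illustration (the function name is_sigma_field is ours, and it assumes a finite collection, for which closure under countable unions reduces to closure under unions of subcollections).

    from itertools import chain, combinations

    def is_sigma_field(S, F):
        """Check Definition 1.6 for a finite collection F of subsets of a finite S."""
        F = [frozenset(A) for A in F]
        S = frozenset(S)
        if S not in F:                                  # condition 1
            return False
        if any(S - A not in F for A in F):              # condition 2: closed under complement
            return False
        for r in range(1, len(F) + 1):                  # condition 3 for finite F:
            for sub in combinations(F, r):              # unions of all subcollections
                if frozenset(chain.from_iterable(sub)) not in F:
                    return False
        return True

    S = {'H', 'T'}
    F1 = [set(), {'H', 'T'}, {'H'}, {'T'}]
    F3 = [set(), {'H', 'T'}, {'H'}]
    print(is_sigma_field(S, F1))  # True
    print(is_sigma_field(S, F3))  # False, since {H}^c = {T} is missing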
Remark 1.3. Note that there can be multiple σ-fields on the subsets of a sample space. The power
set of the sample space is always a σ-field, and it is the largest σ-field. On the other hand, {S, ∅}
is also a σ-field, and it is the smallest σ-field. †
Definition 1.8 (Measurable Space). Let S be a sample space of a random experiment and
F be a σ-field on the subsets of S. Then the ordered pair (S, F) is called a measurable space.
Definition 1.9 (Probability). Let (S, F) be a measurable space. A set function P : F → R
is called a probability if
1. P (E) ≥ 0 for all E ∈ F.

2. P (S) = 1.

3. (Countable Additivity) If E1, E2, . . . ∈ F is a sequence of disjoint events (i.e., Ei ∩ Ej = ∅ for all i ≠ j ∈ N), then

   P(∪_{i=1}^∞ Ei) = Σ_{i=1}^∞ P(Ei).

The idea of probability germinated from attempts to predict the outcomes of gambling, where the classical
definition of probability was used. When people tried to give an axiomatic definition
of probability, it was observed that these three properties (mentioned in Definition 1.9)
of the classical definition of probability work well as defining properties. Hence, these three properties are
used. Note that the first and the third axioms (mentioned in Definition 1.9) have good
intuition from the concepts of area of a region or volume of a shape. The area or volume
is always non-negative, and if we have several disjoint regions or shapes, then the combined
area or volume is the sum of the individual areas or volumes, respectively.

Definition 1.10 (Probability Space). Let S be a sample space and F be a σ-field on the
subsets of S. Let P be a probability defined on F. The triplet (S, F, P ) is called a probability
space.

Example 1.10 (Continuation of Example 1.4). Consider the random experiment of tossing
a coin, where the sample space is S = {H, T} and F = P(S), the power set of S. Consider
a function P : F → R defined by

P (S) = 1, P ({H}) = 0.6, P ({T }) = 0.4, and P (∅) = 0.

Here it is very easy to see that the first two axioms of Definition 1.9 hold true. To check
whether the third axiom holds, let us consider the following cases. Note that here we have
to choose the Ei's so that they are disjoint.
Case I: Ei = ∅ for all i ∈ N. Then P(Ei) = 0 for all i ∈ N implies Σ_{i=1}^∞ P(Ei) = 0. On the
other hand, P(∪_{i=1}^∞ Ei) = P(∅) = 0.
Case II: Ei = S if i = i0 for some i0 ∈ N and Ei = ∅ for i ≠ i0. In this case Σ_{i=1}^∞ P(Ei) = 1
and P(∪_{i=1}^∞ Ei) = P(S) = 1.
Case III: Ei = {H} if i = i0 for some i0 ∈ N and Ei = ∅ for i ≠ i0. In this case
Σ_{i=1}^∞ P(Ei) = P({H}) = 0.6 and P(∪_{i=1}^∞ Ei) = P({H}) = 0.6.
Case IV: Ei = {T} if i = i0 for some i0 ∈ N and Ei = ∅ for i ≠ i0. In this case
Σ_{i=1}^∞ P(Ei) = P({T}) = 0.4 and P(∪_{i=1}^∞ Ei) = P({T}) = 0.4.
Case V: Ei = {T} if i = i1, Ei = {H} if i = i2 for some i1 ≠ i2 ∈ N, and Ei = ∅ for
i ≠ i1, i2. In this case Σ_{i=1}^∞ P(Ei) = P({H}) + P({T}) = 1 and P(∪_{i=1}^∞ Ei) = P(S) = 1.
Hence, in every case the third axiom holds, and P(·) is a probability. ||
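Since F is finite in Example 1.10, the countable-additivity check boils down to finite additivity over disjoint events, which can be verified exhaustively. The Python sketch below is our illustration of that check; it is not part of the notes.

    from itertools import combinations

    S = frozenset({'H', 'T'})
    P = {frozenset(): 0.0, frozenset({'H'}): 0.6, frozenset({'T'}): 0.4, S: 1.0}

    # Axioms 1 and 2 of Definition 1.9.
    assert all(p >= 0 for p in P.values()) and P[S] == 1.0

    # Additivity over every pair of disjoint events in F = P(S).
    events = list(P)
    for A, B in combinations(events, 2):
        if A & B == frozenset():                        # disjoint pair
            assert abs(P[A | B] - (P[A] + P[B])) < 1e-12
    print("All additivity checks passed.")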

Example 1.11. Consider a roll of a die. The sample space is S = {1, 2, . . . , 6} and take
F = P(S). Let P(∅) = 0 and P({i}) = 1/6 for i ∈ S. Note that in this case the function P(·)
has not been defined for all the members of F. However, if we assume that P(·) is a probability
defined on the σ-field F, we can uniquely extend P(·) to all other members of F. Let
E ∈ F. As S is a finite set, so is E. Let the cardinality of E be n and the elements of E be
x1 < x2 < . . . < xn. Define Ei = {xi} for i = 1, 2, . . . , n and Ei = ∅ for i > n. Clearly,
the Ei's are disjoint and E = ∪_{i=1}^∞ Ei. Now, if P(·) is a probability, then using the third axiom of
Definition 1.9, P(E) = P(∪_{i=1}^∞ Ei) = Σ_{i=1}^∞ P(Ei) = n/6. ||

Example 1.12. Consider a roll of a die. The sample space is S = {1, 2, . . . , 6} and take
F = P(S). Let P(∅) = 0 and P({i}) = i/21 for i ∈ S. As before, in this case also we can
extend the function P(·) to all of F so that it becomes a probability on F. ||

We have already pointed out that P(S) is a σ-field. Now, a natural question is: if
P(S) is a σ-field, then why do we define σ-fields at all? Why should we not always work with the power set
of the sample space and define the probability on it? We
will try to answer these questions using the next two examples. We will show that the choice of
σ-field is an important issue.
Example 1.13. Let S = {1, 2, . . . , 60} and F = P(S). Let us define P(E) = #E/#S for all
E ∈ F. Note that, as S is a finite set, P(·) satisfies all the axioms of probability. ||

Example 1.14. Now, consider a different problem where S = N and F = P(N). Can we
extend the definition of probability in the previous example to define a probability for this
example? A natural extension is

P(E) = lim sup_{n→∞} Nn(E)/n,     (1.1)

where E ∈ F and Nn(E) is the number of elements of E among the first n natural numbers.
Here we have used lim sup instead of lim to overcome the issue of existence of the limit of Nn(E)/n.
Before answering whether P(·) defined above is a probability, let us see the values of P(·) evaluated
at some specific subsets of S. Let us consider A = {ω ∈ N : ω is a multiple of 3} and
calculate P(A). Note that Nn(A) is the number of multiples of three in the set
Jn = {1, 2, . . . , n}. Thus,

Nn(A)/n = m/(3m)        if n = 3m,
Nn(A)/n = m/(3m + 1)    if n = 3m + 1,
Nn(A)/n = m/(3m + 2)    if n = 3m + 2.

Hence, for all n ∈ N, (n − 2)/(3n) ≤ Nn(A)/n ≤ 1/3, which implies P(A) = 1/3. Similarly, P(B) = 1/4
(why?) for B = {ω ∈ N : ω is a multiple of 4}. Now, consider C = {2}. Then

Nn(C)/n = 0      if n = 1,
Nn(C)/n = 1/n    if n ≥ 2.

Hence, P(C) = 0. Similarly, P(D) = 0 for any singleton set D. However, S = N = ∪_{i∈N} {i}.
Hence, if P satisfies the third axiom, then P(S) = Σ_{i=1}^∞ P({i}) = 0 ≠ 1, which contradicts
the second axiom. Though P(·) as defined in (1.1) gives meaningful values for some sets
like A and B, it does not satisfy all three axioms when it is defined on the power set of
S. ||
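The limiting-frequency idea behind (1.1) is easy to see numerically. The Python sketch below (our illustration) computes Nn(A)/n for A = multiples of 3 and shows the ratio approaching 1/3, in line with Example 1.14.

    def relative_frequency(indicator, n):
        """N_n(E)/n, where indicator(k) tells whether k belongs to E."""
        return sum(1 for k in range(1, n + 1) if indicator(k)) / n

    is_multiple_of_3 = lambda k: k % 3 == 0
    for n in (10, 100, 1000, 10000):
        print(n, relative_frequency(is_multiple_of_3, n))
    # The printed ratios approach 1/3, matching P(A) = 1/3 above.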

Note that we can always define a probability on the power set of a sample space. For
example, let ω0 ∈ S be a fixed element. Define P : P(S) → R by
P(A) = 1 if ω0 ∈ A,  and  P(A) = 0 if ω0 ∉ A.

It is easy to see that P(·) is a probability. However, in practice, a probability is used to model
a practical situation, where the probability may need to satisfy extra conditions other than the
three conditions mentioned in Definition 1.9. The previous example suggests that, depending on
our objective, we may need to choose certain subsets (not all) of
S on which to define a probability P. For example, P(·) defined in (1.1) becomes a probability
on the σ-fields F1 = {S, ∅, A, A^c} or F2 = {S, ∅, A, B, A^c, B^c, A ∩ B, A^c ∩ B, A ∩ B^c, A^c ∩ B^c,
A ∪ B, A ∪ B^c, A^c ∪ B, A^c ∪ B^c, (A ∩ B) ∪ (A^c ∩ B^c), (A^c ∩ B) ∪ (A ∩ B^c)}, where A and
B are as defined in the previous example.
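As a concrete instance of the point-mass construction above, here is a small Python sketch (illustrative only; the name point_mass is ours) of the probability that puts all of its mass on a fixed outcome ω0.

    def point_mass(omega0):
        """Return the set function P with P(A) = 1 if omega0 is in A, else 0."""
        def P(A):
            return 1.0 if omega0 in A else 0.0
        return P

    P = point_mass('H')
    print(P({'H'}), P({'T'}), P({'H', 'T'}), P(set()))  # 1.0 0.0 1.0 0.0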
Next we will see some of the properties of probability. Let us assume that (S, F, P) is
a probability space.

Theorem 1.5. P (∅) = 0.

Proof: Consider Ei = ∅ for all i ∈ N. Clearly, the Ei's are disjoint and ∪_{i=1}^∞ Ei = ∅. Using the
third axiom of Definition 1.9,

P(∅) + P(∅) + · · · = P(∅) =⇒ P(∅) = 0,

since, by the first axiom, P(∅) ≥ 0 (a strictly positive value of P(∅) would make the left-hand side infinite).

Theorem 1.6 (Finite Additivity). If E1, E2, . . . , En are n disjoint events, then

P(∪_{i=1}^n Ei) = Σ_{i=1}^n P(Ei).

Proof: Take Ei = ∅ for i > n in the third axiom to get the required result.

Theorem 1.7 (Monotonicity). P(·) is monotone, i.e., for E1, E2 ∈ F with E1 ⊂ E2,

P(E1) ≤ P(E2).

Proof: Note that E2 = (E2 ∩ E1) ∪ (E2 ∩ E1^c) with (E2 ∩ E1) ∩ (E2 ∩ E1^c) = ∅. Hence, using
finite additivity, P(E2) = P(E2 ∩ E1) + P(E2 ∩ E1^c) = P(E1) + P(E2 ∩ E1^c) ≥ P(E1). Here
the second equality is true as E1 ⊂ E2, and the last inequality is true as P(·) ≥ 0.

The first and second terms on the right-hand side of the decomposition E2 = (E2 ∩ E1) ∪
(E2 ∩ E1^c) can be interpreted as E2 occurring with E1 and E2 occurring without E1, respectively.
This decomposition is quite useful, and we will use it to solve several problems in this
course.

Theorem 1.8. Let A, B ∈ F such that P (B) = 0. Then P (A ∩ B) = 0.

Proof: Note that 0 = P (B) ≥ P (A ∩ B) ≥ 0. Hence P (A ∩ B) = 0.

Corollary 1.6. Let A, B ∈ F with P (B) = 1. Then P (A ∩ B) = P (A).

Proof: As P(B^c) = 0, we have P(A ∩ B^c) = 0. Since P(A) = P(A ∩ B) + P(A ∩ B^c), it follows that P(A ∩ B) = P(A).

Theorem 1.9 (Subtractive Property). P(·) is subtractive, i.e., for E1, E2 ∈ F with E1 ⊂ E2,
P(E2 \ E1) = P(E2) − P(E1).

Proof: As in the proof of Theorem 1.7,

P(E2) = P(E1) + P(E2 ∩ E1^c) =⇒ P(E2 \ E1) = P(E2) − P(E1).

Theorem 1.10. 0 ≤ P (E) ≤ 1.

Proof: For any E ∈ F, ∅ ⊂ E ⊂ S =⇒ 0 ≤ P (E) ≤ 1, using Theorem 1.7.

Theorem 1.11. If E1 , E2 ∈ F, then P (E1 ∪ E2 ) = P (E1 ) + P (E2 ) − P (E1 ∩ E2 ).

Proof: Note that E1 ∪ E2 = E2 ∪ (E1 \ E2). Also, E2 ∩ (E1 \ E2) = ∅. Hence, P(E1 ∪ E2) =
P(E2) + P(E1 \ E2). We have already seen that P(E1) = P(E1 ∩ E2) + P(E1 \ E2). Combining,
we get the required result.

Theorem 1.12. If E1 , E2 ∈ F, then P (E1 ∪ E2 ) ≤ P (E1 ) + P (E2 ).

Proof: Trivial from the last theorem.
Theorem 1.13. If E ∈ F, then P (E c ) = 1 − P (E).
Proof: This is trivial as S = E ∪ E c .
Definition 1.11 (Elementary Event). A singleton event is called an elementary event.
If S is finite and F = P(S), it is sufficient to assign a probability to each elementary event,
in the sense that for any subset E of S we can then calculate P(E). Any E ∈ F can be
written as the union of the elementary events contained in E, i.e., E = ∪_{ω∈E} {ω}. Note that as E
is finite (being a subset of S, which is finite), there are finitely many elementary events
in this expression. Also, elementary events are disjoint. Hence, P(E) = Σ_{ω∈E} P({ω}).
Let S be finite, F = P(S), and let the elementary events be equally likely (i.e., all the
elementary events have the same probability). Let the cardinality of the set S be n. Then S
can be written as {ω1, ω2, . . . , ωn}. Let P({ωi}) = c for all i = 1, 2, . . . , n. Note that
S = ∪_{i=1}^n {ωi} implies that c = 1/n. Now, for any event E, P(E) = #E/n, which is the classical
probability. Hence, the classical definition of probability is a particular case of the axiomatic
definition of probability.
If S is countably infinite and F = P(S), it is still sufficient to assign a probability to
each elementary event. For any E ∈ F, E is at most countable, which means that E can
be expressed as a countable union of elementary events. Therefore, P(E) = Σ_{ω∈E} P({ω}).
However, in this case one cannot assign equal probability to each elementary event without
violating the second axiom in Definition 1.9. Revisit Examples 1.11 and 1.12.
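The rule P(E) = Σ_{ω∈E} P({ω}) is easy to mechanize for an at most countable S. The Python sketch below (our illustration) extends the elementary assignment of Example 1.12, P({i}) = i/21, to arbitrary events.

    from fractions import Fraction

    # Elementary probabilities of Example 1.12: P({i}) = i/21 for i = 1, ..., 6.
    elementary = {i: Fraction(i, 21) for i in range(1, 7)}

    def prob(event):
        """P(E) obtained by summing the elementary probabilities over E."""
        return sum(elementary[w] for w in event)

    print(prob({2, 4, 6}))         # P(even outcome) = 12/21 = 4/7
    print(prob(set(range(1, 7))))  # P(S) = 1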
If S is uncountable and F = P(S), one cannot make an equally likely assignment of positive
probabilities to the elementary events. We can prove this statement by contradiction.
If possible, suppose that an equally likely assignment of positive probability can be done,
i.e., P({ω}) = c > 0 for all ω ∈ S. S can be written as a union of elementary events. However,
there are uncountably many elementary events, and hence it is an uncountable union of sets.
Therefore, the third axiom of probability cannot be used directly to conclude that P(S) > 1.
However, note that there exists a countably infinite subset E of S. Clearly, P(S) ≥ P(E) = Σ_{ω∈E} P({ω}) = ∞. This is
a contradiction to the second axiom of Definition 1.9. Hence, our assumption is wrong.
Indeed, for uncountable S and F = P(S), one cannot assign positive probability to each
elementary event without violating the axiom P(S) = 1. This statement can be proved,
again, by contradiction. If possible, suppose that P({ω}) > 0 for all ω ∈ S. Let us define
the sets An = {ω ∈ S : P({ω}) > 1/n} for n = 1, 2, . . .. The claim is that An is a finite set
for all n = 1, 2, . . .. If not, then An is either countably infinite or uncountable, and in both
cases P(An) would be infinite, which is a contradiction. Hence, An is finite. Now, note that
S = ∪_{n=1}^∞ An. As the An are finite, S is at most countable, which is a contradiction, and therefore
our assumption that P({ω}) > 0 for all ω ∈ S is wrong.

1.1.4 Continuity of Probability


Note that a function f : R → R is said to be continuous at x0 if for every real sequence
{xn}n≥1 converging to x0, the sequence {f(xn)}n≥1 converges to f(x0). If we want to extend
this notion of continuity to probability, the convergence of a sequence of events first needs
to be defined, as the argument of P(·) is an event. Here we will
consider the limits of increasing and decreasing sequences of events.
Definition 1.12 (Increasing Sequence of Events). A sequence, {En}n≥1, of events is said
to be increasing if En ⊆ En+1 for all n = 1, 2, . . ..

Definition 1.13 (Decreasing Sequence of Events). A sequence, {En}n≥1, of events is said
to be decreasing if En+1 ⊆ En for all n = 1, 2, . . ..

Definition 1.14 (Limit of Increasing Sequence of Events). For an increasing sequence,
{En}n≥1, of events, the limit is defined by lim_{n→∞} En = ∪_{n=1}^∞ En.

Definition 1.15 (Limit of Decreasing Sequence of Events). For a decreasing sequence,
{En}n≥1, of events, the limit is defined by lim_{n→∞} En = ∩_{n=1}^∞ En.

Theorem 1.14 (Continuity from below). Let {En}n≥1 be an increasing sequence of events.
Then

P(lim_{n→∞} En) = lim_{n→∞} P(En).

[Figure 1.1: Sequence of events {An}n≥1, showing A1 = E1, A2 = E2 \ E1, A3 = E3 \ E2 as disjoint nested regions.]

Proof: Let us define the following sequence, {An}n≥1, of events:

A1 = E1 and An = En \ En−1 for n = 2, 3, . . . .

See Figure 1.1. Clearly, the An's are disjoint and ∪_{n=1}^∞ An = ∪_{n=1}^∞ En. Also,

P(A1) = P(E1) and P(An) = P(En) − P(En−1) for n = 2, 3, . . . ,

as {En}n≥1 is an increasing sequence of events. Now,

P(lim_{n→∞} En) = P(∪_{n=1}^∞ En) = P(∪_{n=1}^∞ An) = Σ_{n=1}^∞ P(An) = lim_{N→∞} Σ_{n=1}^N P(An) = lim_{N→∞} P(EN).

Here the third equality holds by the third axiom of Definition 1.9, the fourth equality holds by
the definition of convergence of a series, and the last equality holds because the sum Σ_{n=1}^N P(An)
telescopes to P(EN).
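Continuity from below can be illustrated numerically on a countable sample space. The sketch below (our illustration, with the geometric-type assignment P({k}) = 2^(-k) on S = N) uses the increasing events En = {1, . . . , n}; here lim En = N and the computed P(En) climb towards P(N) = 1, as Theorem 1.14 predicts.

    def prob(event):
        """P(E) = sum of 2^(-k) over k in E, for events E contained in {1, 2, ...}."""
        return sum(2.0 ** (-k) for k in event)

    for n in (1, 2, 5, 10, 20):
        E_n = set(range(1, n + 1))   # increasing events E_n = {1, ..., n}
        print(n, prob(E_n))
    # P(E_n) = 1 - 2^(-n) -> 1 = P(lim E_n).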

Theorem 1.15 (Continuity from above). Let {En}n≥1 be a decreasing sequence of events.
Then

P(lim_{n→∞} En) = lim_{n→∞} P(En).

Proof: Let An = En^c for all n = 1, 2, . . .. Clearly, {An}n≥1 is an increasing sequence of
events. Hence,

P(lim_{n→∞} An) = lim_{n→∞} P(An)
=⇒ P(∪_{n=1}^∞ En^c) = lim_{n→∞} P(En^c)
=⇒ P((∩_{n=1}^∞ En)^c) = lim_{n→∞} (1 − P(En))
=⇒ P(lim_{n→∞} En) = lim_{n→∞} P(En).

1.2 Conditional Probability


We use conditional probability when we have some information about the outcome of a
random experiment. Let us consider the following example.

Example 1.15. Suppose a die is thrown twice and that we are interested in the probability
of the event that the sum of the outcomes of the rolls is six. Clearly, the sample space has
36 points and
S = {(n, m) : n, m = 1, 2, . . . , 6} .
As the sample space is finite, let us use the classical definition of probability. The required
probability is 5/36.
Now, assume that you have observed that the first throw results in a 4. We are interested
in the probability of the same event as before, but now we have extra information that the
outcome of the first roll is 4. Note that when we know that the first roll results in a 4, the
sample space changes and the new sample space is

S1 = {(4, m) : m = 1, 2, . . . , 6} ,

which is the event that the first throw is a 4. We need to find the probability of the event
that the sum is 6 in the sample space S1 . There is only one case (4, 2) (in S1 ) which is
favorable to the event of interest and hence the required probability is

1/6 = (1/36)/(6/36) = P(A ∩ H)/P(H),

where A and H are the events that the sum is 6 and that the first roll results in a 4, respectively. ||
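The reduced-sample-space computation of Example 1.15 can be reproduced by direct enumeration. The Python sketch below (our illustration) counts outcomes in S = {1, . . . , 6}² and confirms that P(A|H) = P(A ∩ H)/P(H) = 1/6.

    from fractions import Fraction
    from itertools import product

    S = list(product(range(1, 7), repeat=2))   # all 36 ordered pairs of rolls
    A = {s for s in S if s[0] + s[1] == 6}     # the sum of the two rolls is 6
    H = {s for s in S if s[0] == 4}            # the first roll is 4

    def P(E):
        return Fraction(len(E), len(S))

    print(P(A))             # 5/36, the unconditional probability
    print(P(A & H) / P(H))  # 1/6, the conditional probability P(A|H)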

Once you are given some information or you observe something, the sample space changes.
Conditional probability is a probability on the changed sample space. Motivated by the above
example, the definition of conditional probability is given as follows.

Definition 1.16 (Conditional Probability). Let H be an event with P(H) > 0. For any
arbitrary event A, the conditional probability of A given H is denoted by P(A|H) and defined
by

P(A|H) = P(A ∩ H)/P(H).

Note that to define the conditional probability, the probability of the conditioning event
has to be positive. The probability of the intersection of two events can be expressed in
terms of the conditional probability, and the relationship is given below:

P(A ∩ B) = P(A)P(B|A) if P(A) > 0,  and  P(A ∩ B) = P(B)P(A|B) if P(B) > 0.

Definition 1.17 (Mutually Exclusive Events). A collection of events {E1, E2, . . .} is said to
be mutually exclusive if Ei ∩ Ej = ∅ for all i ≠ j.

Definition 1.18 (Exhaustive Events). A collection of events {E1, E2, . . .} is said to be
exhaustive if P(∪_{i=1}^∞ Ei) = 1.

Thus, if a collection of events {En}n≥1 is such that ∪_{n=1}^∞ En = S, then the collection is
exhaustive.
Theorem 1.16 (Theorem of Total Probability). Let {E1, E2, . . .} be a collection of mutually
exclusive and exhaustive events with P(Ei) > 0 for all i = 1, 2, . . .. Then for any event E,

P(E) = Σ_{i=1}^∞ P(E|Ei)P(Ei).

Proof: Let us denote Ẽi = Ei ∩ E for all i = 1, 2, . . .. Then {Ẽ1, Ẽ2, . . .} is mutually
exclusive. Now, as the Ei's are exhaustive, using Corollary 1.6,

P(E) = P(E ∩ (∪_{i=1}^∞ Ei))
     = P(∪_{i=1}^∞ (E ∩ Ei))
     = Σ_{i=1}^∞ P(E ∩ Ei),   as the Ẽi are mutually exclusive,
     = Σ_{i=1}^∞ P(E|Ei)P(Ei),   as P(Ei) > 0 for all i ∈ N.

[Figure: the sample space (a square) partitioned into E1, . . . , E6, with the shaded event E meeting E1, E2, E3, and E5.]

The theorem of total probability says that the probability of an event can be computed by
summing the probabilities of the pieces into which the event is partitioned. See the above figure, where the
square and the shaded region indicate the sample space and the event E, respectively. In
the figure, {E1, E2, E3, E4, E5, E6} is mutually exclusive and exhaustive. The event E can
be partitioned into E ∩ E1, E ∩ E2, E ∩ E3, and E ∩ E5, and hence the probability of E
can be computed by computing the probabilities of these pieces and then adding them.

Theorem 1.17 (Bayes’ Theorem). Let {E1, E2, . . .} be a collection of mutually exclusive
and exhaustive events with P(Ei) > 0 for all i = 1, 2, . . .. Let E be any event with P(E) > 0.
Then

P(Ei|E) = P(E|Ei)P(Ei) / Σ_{j=1}^∞ P(E|Ej)P(Ej)   for i = 1, 2, . . . .

Proof: Using the definition of conditional probability and the theorem of total probability,
the proof is straightforward.
In the theorem of total probability and Bayes’ theorem, we have considered a countable
collection of events {E1 , E2 , . . .}. However, the theorems hold true even if we have a finite
collection of mutually exclusive and exhaustive events (Why?).
Example 1.16. There are 3 boxes. Box 1 contains 1 white and 4 black balls, Box 2
contains 2 white and 1 black ball, and Box 3 contains 3 white and 3 black balls. First you throw a
fair die. If the outcome is 1, 2, or 3 then Box 1 is chosen, if the outcome is 4 then Box 2 is
chosen, and if the outcome is 5 or 6 then Box 3 is chosen. Finally, you draw a ball at random
from the chosen box. Let W denote the event that the drawn ball is white. Also, let
Bi, i = 1, 2, 3, denote the event that the ith box is selected after the roll of the die. Using
Bayes’ theorem, the (conditional) probability that the ball is from Box 1 given that the chosen
ball is white is

P(B1|W) = P(W|B1)P(B1) / Σ_{i=1}^3 P(W|Bi)P(Bi) = 9/34.

Similarly, given that the drawn ball is white, the probability that the ball is from
Box 2 is P(B2|W) = 5/17. ||
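The arithmetic in Example 1.16 is a direct application of Bayes' theorem with the total-probability denominator, and it is easy to verify exactly. The Python sketch below (our illustration) recomputes P(B1|W) and P(B2|W) with fractions.

    from fractions import Fraction as F

    priors = {1: F(3, 6), 2: F(1, 6), 3: F(2, 6)}           # P(B_i) from the die roll
    white_given_box = {1: F(1, 5), 2: F(2, 3), 3: F(3, 6)}  # P(W|B_i) from the box contents

    def posterior(i):
        """Bayes' theorem: P(B_i | W)."""
        p_w = sum(priors[j] * white_given_box[j] for j in priors)  # P(W) by total probability
        return priors[i] * white_given_box[i] / p_w

    print(posterior(1))  # 9/34
    print(posterior(2))  # 5/17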

1.3 Independence
Observe in the previous example that P (B1 |W ) = 9/34 < 1/2 = P (B1 ), whereas P (B2 |W ) =
5/17 > 1/6 = P (B2 ). Thus the occurrence of one event can make the occurrence of a second
event more or less likely. Also, occurrence of an event may not change the probability of
the occurrence of a second event. For example, suppose a coin is tossed two times. Then the
probability of a head in the second toss does not change if the result of the first toss is a tail.
When the occurrence of one event, say A, reduces the probability of the occurrence of another
event, say B, we say that the events are negatively associated. That means A and B are
negatively associated if P(B|A) < P(B). For the conditional probability P(B|A), P(A) must
be strictly greater than zero. Now, note that P (B|A) < P (B) can be equivalently written
as P (A ∩ B) < P (A)P (B), where we do not need the restriction P (A) > 0. Motivated by
this discussion, we have the following definition.
Definition 1.19. Let A and B be two events. They are said to be
1. negatively associated if P (A ∩ B) < P (A)P (B).

2. positively associated if P (A ∩ B) > P (A)P (B).

3. independent if P (A ∩ B) = P (A)P (B).

Theorem 1.18. If A and B are independent, so are A and B c .

Proof: Note that A = (A ∩ B) ∪ (A ∩ B^c), where A ∩ B and A ∩ B^c are disjoint. Hence,

P(A ∩ B^c) = P(A) − P(A ∩ B)
           = P(A) − P(A)P(B), as A and B are independent events,
           = P(A)P(B^c).

Hence, A and B^c are independent events.

Corollary 1.7. If A and B are independent events, then

1. Ac and B are independent events.

2. Ac and B c are independent events.

Proof: This proof is simple using the previous theorem, and hence left as an exercise.

Example 1.17. Let P(B) = 0. For any event A, 0 ≤ P(A ∩ B) ≤ P(B) = 0. Hence,
P(A ∩ B) = 0. On the other hand, P(A)P(B) = 0. Therefore, A and B are independent.
Now, assume that P(B) = 1. Then for any event A, A and B^c are independent as
P(B^c) = 0. Using the previous theorem, A and B are independent events. In particular, any
event A is independent of S and ∅. ||

We have talked about the independence of two events. A natural question is: can the concept
of independence be extended to more than two events? The answer is yes. However, there
are two types of independence that are of interest for more than two events. We will discuss
these concepts now.

Definition 1.20 (Pairwise Independent). A countable collection of events E1, E2, . . . are
said to be pairwise independent if Ei and Ej are independent for all i ≠ j.

Definition 1.21 (Independent for Finite Collection of Events). A finite collection of events
E1, E2, . . . , En are said to be independent (or mutually independent) if for any sub-collection
En1, . . . , Enk of E1, E2, . . . , En,

P(∩_{i=1}^k Eni) = ∏_{i=1}^k P(Eni).

Definition 1.22 (Independent for Countable Collection of Events). A countable collection
of events E1, E2, . . . are said to be independent (or mutually independent) if any finite
sub-collection is independent.

Suppose that we have three events E1 , E2 , and E3 and we want to check if they are
pairwise independent or not. We need to verify three conditions, viz.,

P (E1 ∩ E2 ) = P (E1 )P (E2 ),


P (E1 ∩ E3 ) = P (E1 )P (E3 ),
P (E2 ∩ E3 ) = P (E2 )P (E3 ).

However, to check if they are independent or not, we need to verify four conditions, viz.,
P (E1 ∩ E2 ) = P (E1 )P (E2 ),
P (E1 ∩ E3 ) = P (E1 )P (E3 ),
P (E2 ∩ E3 ) = P (E2 )P (E3 ),
P (E1 ∩ E2 ∩ E3 ) = P (E1 )P (E2 )P (E3 ).
That means that to check whether three events are independent, we need to verify one extra
condition beyond those needed for pairwise independence.
In general, one needs to verify C(n, 2) = n(n − 1)/2 conditions to check whether a collection of n events is
pairwise independent. To check whether a collection of n events is independent,
2^n − n − 1 conditions need to be verified. Clearly, if a collection of events is independent, then
it is pairwise independent. However, in general, the converse is not true, as illustrated
by the following example.
Example 1.18. Suppose that a coin is tossed twice. The sample space has four points
and is given by S = {HH, HT, T H, T T }. Suppose that all elementary events are equally
likely. That is P (HH) = P (HT ) = P (T H) = P (T T ) = 1/4. Let E1 = {HH, HT },
E2 = {HH, TH} and E3 = {HH, TT}. Clearly, E1, E2, and E3 are the events that the first
toss results in a head, the second toss results in a head, and both tosses have the same outcome,
respectively. It is easy to see that P(E1) = P(E2) = P(E3) = 1/2. Also
P(E1 ∩ E2) = P(HH) = 1/4,
P(E1 ∩ E3) = P(HH) = 1/4,
P(E2 ∩ E3) = P(HH) = 1/4,
P(E1 ∩ E2 ∩ E3) = P(HH) = 1/4.
Thus P (E1 ∩ E2 ) = 1/4 = P (E1 )P (E2 ), P (E1 ∩ E3 ) = 1/4 = P (E1 )P (E3 ), and P (E2 ∩ E3 ) =
1/4 = P (E2 )P (E3 ). This shows that the events E1 , E2 , and E3 are pairwise independent.
However, P(E1 ∩ E2 ∩ E3) = 1/4 ≠ 1/8 = P(E1)P(E2)P(E3). Hence, E1, E2, and E3 are
not independent. ||
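The checks in Example 1.18 are finite, so they can be automated. The Python sketch below (our illustration) enumerates the four equally likely outcomes and verifies that the three pairwise conditions hold while the triple-product condition fails.

    from fractions import Fraction
    from itertools import combinations

    S = ['HH', 'HT', 'TH', 'TT']   # equally likely outcomes of two tosses

    def P(E):
        return Fraction(len(E), len(S))

    E1 = {'HH', 'HT'}   # first toss is a head
    E2 = {'HH', 'TH'}   # second toss is a head
    E3 = {'HH', 'TT'}   # both tosses have the same outcome

    for A, B in combinations([E1, E2, E3], 2):
        assert P(A & B) == P(A) * P(B)                # pairwise independence holds
    print(P(E1 & E2 & E3), P(E1) * P(E2) * P(E3))     # 1/4 versus 1/8: not independent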
Example 1.19. Let a die be rolled twice. The sample space is given by
S = {(i, j) : i = 1, . . . , 6, j = 1, . . . , 6}.
Suppose all elementary events are equally likely, i.e., P({ω}) = 1/36 for all ω ∈ S. Let us
consider the following events:
E1 = 1st roll is 1, 2 or 3,
E2 = 1st roll is 3, 4 or 5,
E3 = Sum of the rolls is 9.
Clearly, P (E1 ) = 1/2, P (E2 ) = 1/2, and P (E3 ) = 1/9. Also, P (E1 ∩ E2 ∩ E3 ) = 1/36 =
P(E1)P(E2)P(E3). However, E1 and E2 are not independent as P(E1 ∩ E2) = 1/6 ≠ 1/4 =
P (E1 )P (E2 ). Thus E1 , E2 , and E3 are not independent, not even pairwise independent.
This example shows that verifying the condition P (E1 ∩ E2 ∩ E3 ) = P (E1 )P (E2 )P (E3 ) is
not enough to check if three events are independent or not. ||

Definition 1.23 (Conditional Independence). Given an event C with P(C) > 0, two events A and B are said
to be conditionally independent given C if P(A ∩ B|C) = P(A|C)P(B|C).

Example 1.20. A box contains two coins: a fair regular coin and one fake two-headed
coin (i.e., P (H) = 1). The regular coin is called Coin 1 and the other is called Coin 2. You
choose a coin at random and toss it twice. Define the following events.

A = First coin toss results in a H.


B = Second coin toss results in a H.
C = Coin 1 (regular) has been selected.

Here P (A|C) = 1/2 = P (B|C), P (A ∩ B|C) = 1/4. Hence, A and B are conditionally
independent given C. As P (A) = 3/4 = P (B) and P (A ∩ B) = 5/8, A and B are not
independent. Thus, the conditional independence does not imply independence in general.
||
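The probabilities quoted in Example 1.20 can be checked by listing the (coin, first toss, second toss) possibilities with their weights: each coin is chosen with probability 1/2, Coin 1 gives four equally likely toss pairs, and Coin 2 always shows heads. The Python sketch below (our illustration) does this bookkeeping and confirms both the conditional independence given C and the failure of unconditional independence.

    from fractions import Fraction as F

    # Joint distribution over (coin, toss1, toss2).
    outcomes = {}
    for t1 in 'HT':
        for t2 in 'HT':
            outcomes[(1, t1, t2)] = F(1, 2) * F(1, 4)   # Coin 1: fair, tosses independent
    outcomes[(2, 'H', 'H')] = F(1, 2)                   # Coin 2: two-headed

    def P(pred):
        return sum(p for w, p in outcomes.items() if pred(w))

    A = lambda w: w[1] == 'H'   # first toss results in H
    B = lambda w: w[2] == 'H'   # second toss results in H
    C = lambda w: w[0] == 1     # the regular coin was selected

    print(P(A), P(B), P(lambda w: A(w) and B(w)))          # 3/4 3/4 5/8
    print(P(lambda w: A(w) and B(w) and C(w)) / P(C))      # P(A ∩ B | C) = 1/4
    print((P(lambda w: A(w) and C(w)) / P(C)) *
          (P(lambda w: B(w) and C(w)) / P(C)))             # P(A|C) P(B|C) = 1/4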

