
Lecture 1: Introduction

January 4, 2019
Sunil Kumar Gauttam
Department of Mathematics, LNMIIT

Probability theory is the branch of mathematics that is concerned with random phenomena.
So it is natural to ask: what is the meaning of a random phenomenon? Many phenomena have
the property that their repeated observation under a specified set of conditions invariably
leads to the same outcome. For example, if a ball initially at rest is dropped from a
height of d meters through an evacuated cylinder, it will invariably reach the ground in
t = √(2d/g) seconds, where g is the gravitational acceleration. There are other phenomena
whose repeated observation under a specified set of conditions does not always lead to the
same outcome. A familiar example of this type is the tossing of a coin. If a coin is tossed
1000 times, the occurrences of heads and tails alternate in a seemingly erratic and
unpredictable manner. It is such phenomena that we think of as being random and which are
the object of our investigation.
At first glance it might seem impossible to make any worthwhile statements about such
random phenomena, but this is not so. Experience has shown that many nondeterministic
phenomena exhibit a "statistical regularity" that makes them subject to study. For example,
if we toss a coin a large number of times, the proportion of heads seems to fluctuate around
1/2 unless the coin is severely unbalanced.
The eighteenth century French naturalist Comte de Buffon tossed a coin 4040 times and
got 2048 heads. The proportion (or relative frequency) of heads in this case is 0.507. J.E.
Kerrich from Britain, recorded 5067 heads in 10000 tosses. Proportion of heads in this case
is 0.5067. Statistician Karl Pearson spent some more time, making 24000 tosses of a coin.
He got 12012 heads, and thus, proportion of heads in this case is 0.5005.
Of course, this proportion of heads depends on the number of tosses and will fluctuate, even
wildly, as the number of tosses increases. But if we let the number of tosses go to infinity,
will the sequence of relative proportions "settle down to a steady value"? Such a question
can never be answered empirically, since by the very nature of a limit we cannot put an end
to the trials. So it is a mathematical idealization (or belief) to assume that such a limit
does exist and is equal to 0.5, and then write P (Head) = 0.5. We think of this limiting
proportion 0.5 as the "probability" that the coin will land heads up in a single toss.
So there is a certain level of belief/assumption when we say that probability of head is 0.5.
Now we turn around the game. More generally the statement that a certain experimental
outcome has probability p can be interpreted as meaning that if the experiment is repeated


a large number of times, that outcome would be observed “about” (approximately) 100p
percent of the time. This interpretation of probabilities is called the relative frequency
interpretation. It is very natural in many applications of probability theory to real world
problems, especially to those involving the physical sciences.
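The relative frequency interpretation is easy to explore numerically. The following sketch (a minimal simulation of my own; the sample sizes and the seed are arbitrary choices, not part of the lecture) tosses a fair coin n times and reports the proportion of heads:

```python
import random

def head_proportion(n, seed=0):
    """Toss a fair coin n times and return the proportion of heads."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    heads = sum(rng.random() < 0.5 for _ in range(n))
    return heads / n

# The proportion fluctuates for small n but settles near 0.5 as n grows,
# mirroring the experiments of Buffon, Kerrich, and Pearson quoted above.
for n in (100, 10_000, 1_000_000):
    print(n, head_proportion(n))
```
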
Let us summarize what we have done: "When we say that the probability of a head is 1/4 while
tossing a coin, we mean that if we toss the coin a large number of times then in approximately
1/4 of the tosses we should get a head." The same interpretation applies when we say that the
probability of an even number is 1/2 when we throw a die.
Question: What interpretation do you make of the following statement: "Modi has a 70% chance
of winning the 2019 Loksabha elections"?
It is obvious that we cannot have a relative frequency interpretation of the above statement.
For the mathematical theory of probability the interpretation of probabilities is irrelevant.
We will use the relative frequency interpretation of probabilities only as an intuitive moti-
vation for the definitions and theorems we will be developing throughout the course.
Lecture 2: Sample Space, Countable & Uncountable Sets
January 7, 2019
Sunil Kumar Gauttam
Department of Mathematics, LNMIIT

Definition 2.1 By a random experiment (or chance experiment), we mean an experiment
(an imaginary thought experiment, which may never actually be performed but can be conceived
of as being performed, or a real physical process) which has multiple outcomes (at least two),
where one does not know in advance which outcome is going to occur unless one performs the
experiment. Also, when the experiment is performed, exactly one out of the several possible
outcomes is produced.

Example 2.2 Tossing a coin and throwing a die are random experiments. Unless you toss the
coin or throw the die, you do not know what is coming up, though we know all possible outcomes
of the random experiment.

Remark 2.3 There is no restriction on what constitutes a random experiment. For example,
it could be a single toss of a coin, or three tosses, or an infinite sequence of tosses. However,
it is important to note that in our mathematical model of randomness, there is only one
random experiment. So, three tosses of a coin constitute a single experiment, rather than
three experiments.

We begin with a model for a random experiment whose performance, results in an idealized
outcome from a family of possible outcomes. The first element of the model is the specifi-
cation of an abstract sample space (Ω) representing the collection of idealized outcomes of
the random experiment. Next comes the identification of a family of events ( F) of interest,
each event represented by an aggregate of elements of the sample space. The final element
of the model is the specification of a consistent scheme of assignation of probabilities (P ) to
events. We consider these elements in turn.

2.1 The Sample Space


Definition 2.4 Sample space of a random experiment is the set of all idealized outcomes of
the random experiment. We denote it by the uppercase Greek letter Ω.

Example 2.5 In random experiment of tossing a coin, the sample space is Ω = {head, tail}
or {H, T }.


Remark 2.6 In the sample space for the random experiment in Example 2.5, one might also
include as possible outcomes "coin stands on the edge" or "coin disappears" (Shaktimaan
threw the coin and it escaped the earth's gravity!!). It is not too serious if we admit
more things into our consideration than can really occur, but we want to make sure that we
do not exclude things that might occur. As a guiding principle, the sample space should have
enough detail to distinguish between all outcomes of interest to the modeler, while avoiding
irrelevant details. In this spirit we have made use of idealized outcomes.

Example 2.7 Toss two coins and note down the number of heads obtained. Here sample
space is Ω = {0, 1, 2}

Example 2.8 In the random experiment of tossing a coin till you get a head, the sample
space is
Ω = {H, T H, T T H, T T T H, · · · }

Example 2.9 Consider throwing a dart on a square target and viewing the point of impact
as the outcome. The sample space is the set of all the points on the square.

The sample space of a random experiment may consist of a finite or an infinite number of
possible outcomes. Infinite sets are further divided into two categories: countably infinite
and uncountable. The sample space in Example 2.8 is countably infinite and the sample
space in Example 2.9 is uncountable.
That is to say, there are two different orders of infinity, and uncountable is a higher order
of infinity compared to countably infinite.

2.1.1 Countable Sets

We want to measure the number of elements in a set. When a set S is finite, it is very clear
to us that we may list the elements as {s_1, s_2, · · · , s_n} for some positive integer n.
When we deal with infinite sets, things are not that straightforward. For example, the set of
natural numbers N "appears" to contain twice as many elements as the set S = {2n | n ∈ N}
(the set of all even positive integers). Can we say that the number of elements in N is 2
times the number of elements in S? If your answer is yes, then are you saying that
∞ < 2 × ∞? So we run into problems.
Now here is the idea due to Cantor. Let us recast the notion of finiteness as follows:
A set S has n elements iff there exists a bijection f : {1, 2, · · · , n} → S.
So finiteness has been recast as a one-to-one correspondence between the set and a subset
of the natural numbers. This idea paves the way to define the following notion.

Definition 2.10 A set S is said to be countably infinite if there exists a function
f : N → S such that f is one-one and onto.

In other words, a set is countably infinite when it can be put into 1-to-1 correspondence with
the set of positive integers. In this case we say that the set S is as infinite as N.

Remark 2.11 Since every countably infinite set is the range of a 1-1 function defined on
N, we may regard every countably infinite set as the range of a sequence of distinct terms
(Note that in general the terms x1 , x2 , x3 , · · · of a sequence need not be distinct). Speaking
more loosely, we may say that the elements of any countably infinite set can be “arranged”
in a sequence of distinct terms.

Example 2.12 Let S be the set of all even positive integers. Then it is countable, via the
correspondence

1   2   3   4   5    6    7    8    9   · · ·
↕   ↕   ↕   ↕   ↕    ↕    ↕    ↕    ↕
2   4   6   8   10   12   14   16   18  · · ·

It is highly counter-intuitive that both sets have the "same order of infinity", though
empirically it appears that the set S has half as many elements as N.

Example 2.13 The set of all integers Z is a countable set:

1   2   3    4   5    6   7    8   9    · · ·
↕   ↕   ↕    ↕   ↕    ↕   ↕    ↕   ↕
0   1   −1   2   −2   3   −3   4   −4   · · ·

Once again, Z and N have the same number of elements in the sense of bijection.
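The zig-zag correspondence above can be written as an explicit bijection f : N → Z; a small sketch (the function name is mine):

```python
def f(n):
    """Bijection from N = {1, 2, 3, ...} onto Z following the listing
    1 -> 0, 2 -> 1, 3 -> -1, 4 -> 2, 5 -> -2, ..."""
    if n == 1:
        return 0
    # even positions give the positives, odd positions the negatives
    return n // 2 if n % 2 == 0 else -(n // 2)

# The first nine values reproduce the table: 0, 1, -1, 2, -2, 3, -3, 4, -4
print([f(n) for n in range(1, 10)])
```
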

Example 2.14 The set of all positive rational numbers is countably infinite.
The following figure gives a schematic representation of listing the positive rationals. In
this figure the first row contains all positive rationals with numerator 1, the second all with
numerator 2, etc.; and the first column contains all with denominator 1, the second all with
denominator 2, and so on. Our listing amounts to traversing this array of numbers as the
arrows indicate, where of course all those numbers already encountered are left out.
2-4 Lecture 2: Sample Space, Countable & Uncountable Sets

We see that every natural number is associated with a unique positive rational number, and
for every positive rational number there exists a natural number. Hence the set of all
positive rational numbers is countable.

Example 2.15 Show that set of all rational numbers is countably infinite.

Solution: In the proof of countability of the positive rationals, we counted as follows:

1   2   3     4     5   6   7     8     9     · · ·
↕   ↕   ↕     ↕     ↕   ↕   ↕     ↕     ↕
1   2   1/2   1/3   3   4   3/2   2/3   1/4   · · ·

Now, what we did for counting the integers, we do the same thing to include the non-positive
rationals as follows:

1   2   3    4     5      6   7    8     9      · · ·
↕   ↕   ↕    ↕     ↕      ↕   ↕    ↕     ↕
0   1   −1   1/2   −1/2   2   −2   1/3   −1/3   · · ·
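The diagonal traversal of the array can be sketched as a generator that walks the anti-diagonals (numerator + denominator constant) and skips fractions already encountered. This is my own sketch; the order within each diagonal differs slightly from the zig-zag in the figure, but the countability argument is the same:

```python
from fractions import Fraction

def positive_rationals():
    """Enumerate every positive rational exactly once by walking the
    numerator/denominator array along anti-diagonals, skipping repeats
    such as 2/2 = 1/1."""
    seen = set()
    s = 2  # s = numerator + denominator, constant along an anti-diagonal
    while True:
        for num in range(1, s):
            q = Fraction(num, s - num)
            if q not in seen:
                seen.add(q)
                yield q
        s += 1

gen = positive_rationals()
first = [next(gen) for _ in range(9)]
print(first)  # the first nine distinct positive rationals in this traversal
```
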

Definition 2.16 An infinite set which is not countably infinite is called uncountable.

Example 2.17 The set of irrational numbers is uncountable. Hence the set of real numbers is
also uncountable. In fact, all the intervals

(a, b), (a, b], [a, b), [a, b]

where a, b ∈ R and a < b, are uncountable.
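The uncountability of (0, 1), asked for in the tutorial, rests on Cantor's diagonal argument; a standard sketch:

```latex
Suppose $(0,1)$ were countable, listed as $x_1, x_2, x_3, \dots$ with
decimal expansions $x_n = 0.d_{n1}\,d_{n2}\,d_{n3}\cdots$. Define
$y = 0.e_1\,e_2\,e_3\cdots$, where
\[
  e_n = \begin{cases} 5, & \text{if } d_{nn} \neq 5, \\
                      6, & \text{if } d_{nn} = 5. \end{cases}
\]
Then $y \in (0,1)$, but $y$ differs from every $x_n$ in the $n$th digit,
so $y$ is not in the list --- a contradiction. Hence $(0,1)$ is uncountable.
```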


Lecture 3: Review of Set Theory and Field of Events
January 8, 2019
Sunil Kumar Gauttam
Department of Mathematics, LNMIIT

Definition 3.1 The elements ω of Ω will be called sample points, each sample point ω being
identified with an idealized outcome of the underlying random experiment.

In Example 2.5, Head and Tail are the sample points, in Example 2.7, 0, 1, 2 are the sample
points.

Definition 3.2 Any subset of a sample space is said to be an event.

Example 3.3 {H}, {H, T } are events corresponding to the sample space in Example 2.5.
{0}, {1, 2}, {0, 2} are events corresponding to the sample space in Example 2.7

Since the empty set is a subset of every set, it too is a valid event; it is called the
impossible event. An event to which no outcome corresponds is referred to as a null event.

Remark 3.4 H is a sample point and {H} is an event; 0 is a sample point and {0} is an event.

Recall that the first two elements of a probability model are the sample space Ω and the
σ-field F. Both are sets. Also, a σ-field is a collection of events (which are subsets of Ω).
So our model makes extensive use of set operations; let us introduce at the outset the
relevant notation and terminology.
A set is a collection of objects, which are the elements of the set. If S is a set and x is
an element of S, we write x ∈ S. If x is not an element of S, we write x ∉ S. A set can have
no elements (for example, the set of all 20 feet tall students in the LNMIIT), in which case
it is called the empty set, denoted by ∅.
In a probability model the sample space Ω will be considered the universal set, which
contains all objects that could conceivably be of interest in a particular context.


3.1 Set Operations


If every element of a set S is also an element of a set T , we say that S is a subset of T ,
and we write S ⊂ T or T ⊃ S. If S ⊂ T and T ⊂ S, then the two sets are equal, and we write
S = T.
Given sets S and T , new sets may be constructed by union, intersection, and set differences.
The union of two sets S and T is the set of all elements that belong to S or T (or both),
and is denoted by S ∪ T . It is trivial to see that S ∪ T = T ∪ S and
S ∪ (T ∪ U ) = (S ∪ T ) ∪ U . The first says that union is a commutative operation and the
second states the associativity of union.
The intersection of two sets S and T is the set of all elements that belong to both S and T ,
and is denoted by S ∩ T . Once again intersection is also commutative and associative.
Because of the associative laws, however, we can write

A ∪ B ∪ C, A ∩ B ∩ C ∩ D

without brackets. But a string of symbols like A ∪ B ∩ C is ambiguous, therefore not defined;
indeed (A ∪ B) ∩ C is not identical with A ∪ (B ∩ C).
The set difference S \ T is the set whose members are those elements of S that are not
contained in T .
Let S ⊂ Ω. The complement of S, with respect to Ω, is the set S^c = {x ∈ Ω | x ∉ S} of all
elements of Ω that do not belong to S. Note that Ω^c = ∅ (with respect to Ω). It is easy to
see that (S^c)^c = S and S ∩ S^c = ∅.

Proposition 3.5 For any three sets A, B, C we have

1. Distributive law of sets:

(a) (A ∪ B) ∩ C = (A ∩ C) ∪ (B ∩ C)
(b) (A ∩ B) ∪ C = (A ∪ C) ∩ (B ∪ C)

2. DeMorgan’s law:

(a) (A ∪ B)c = Ac ∩ B c
(b) (A ∩ B)c = Ac ∪ B c
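These identities are easy to spot-check on concrete sets; a quick sketch (the particular sets and universal set are arbitrary choices of mine):

```python
# Spot-check the distributive and De Morgan laws of Proposition 3.5
Omega = set(range(10))          # universal set, used for complements
A, B, C = {1, 2, 3}, {2, 4, 6}, {3, 6, 9}

comp = lambda S: Omega - S      # complement with respect to Omega

# Distributive laws
assert (A | B) & C == (A & C) | (B & C)
assert (A & B) | C == (A | C) & (B | C)

# De Morgan's laws
assert comp(A | B) == comp(A) & comp(B)
assert comp(A & B) == comp(A) | comp(B)

print("all four identities hold for these sets")
```

Of course, a check on one example is not a proof; the laws hold for arbitrary sets, as can be shown element-by-element.
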

In some cases, we will have to consider the union or the intersection of infinitely many sets.
Lecture 3: Review of Set Theory and Field of Events 3-3

For example, if for every positive integer n, we are given a set S_n, then

∪_{n=1}^∞ S_n = S_1 ∪ S_2 ∪ · · · = {x | x ∈ S_n for at least one n}

∩_{n=1}^∞ S_n = S_1 ∩ S_2 ∩ · · · = {x | x ∈ S_n for all n}

These are also called countable union and countable intersection, respectively. If for every
real number α ∈ (0, 1), we are given a set S_α, then

∪_α S_α = {x | x ∈ S_α for at least one α}   (uncountable union)

∩_α S_α = {x | x ∈ S_α for all α}   (uncountable intersection)

Two sets are said to be disjoint if their intersection is empty. More generally, several sets
(finitely or infinitely many) are said to be disjoint if no two of them have a common element.
Sets and the associated operations are easily visualized in terms of Venn diagrams:

3.2 The algebra (or Field) of events


We suppose now that we are dealing with a fixed, sample space Ω representing a chance
experiment. The events of interest are subsets of the space Ω at hand. In the probabilistic
setting, the special set Ω connotes the certain event, ∅ the impossible event. Probabilistic
language lends flavour to the dry terminology of sets. We say that an event A occurs if the
performance of the random experiment yields an outcome ω ∈ A. If A and B are events
and A ⊂ B then the occurrence of A implies the occurrence of B. If, on the other hand,
A ∩ B = ∅, that is to say, they are disjoint, the occurrence of one precludes the occurrence
of the other and we say that A and B are mutually exclusive.
Given events A and B, it is natural to construct new events of interest by applying various
set operations: union, intersection, set difference. Clearly, it is desirable when discussing the
family F of events to include in F all sets that can be obtained by such natural combinations
of events in the family.
Note that intersections may be written in terms of unions and complements alone. That is
A ∩ B = (Ac ∪ B c )c . Similarly, we may write set difference A \ B = A ∩ B c = (Ac ∪ B)c .
Therefore it will be enough for the family F of events under consideration to be closed under
complements and unions. It is clear that, as long as F is non-empty, then it must contain
both Ω and ∅. Indeed, if A is any event then so is Ac whence so is Ac ∪A = Ω and Ac ∩A = ∅.

Definition 3.6 Let Ω be a nonempty set. A collection F of subsets of Ω is called an algebra
if

(i) Ω ∈ F,
(ii) A ∈ F =⇒ A^c ∈ F (in other words, F is closed under complementation),
(iii) A, B ∈ F =⇒ A ∪ B ∈ F (i.e., F is closed under finite unions).

Remark 3.7 The meaning of the statement A ∈ F =⇒ A^c ∈ F is that if A ∈ F then
A^c ∈ F. It does not mean that F contains the complement of every subset A of Ω. Similar is
the meaning of the statement A, B ∈ F =⇒ A ∪ B ∈ F: if A, B are two sets in F then
their union is also in F. It does not mean that F contains the union of any two subsets A, B
of Ω.

Example 3.8 (i) Let Ω be any nonempty set and let F = {Ω, ∅}.
(ii) Let Ω be any nonempty set, then P(Ω) is an algebra.
(iii) Let Ω = {a, b, c, d}. Consider the classes

F1 = {Ω, ∅, {a}}

and
F2 = {Ω, ∅, {a}, {b, c, d}}.
Then, F2 is an algebra, but F1 is not an algebra, since {a}^c ∉ F1.
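A brute-force check of the three axioms of Definition 3.6 on a small Ω makes the difference between F1 and F2 concrete. This is a sketch of mine (the function name `is_algebra` is not from the notes); it only makes sense for finite Ω:

```python
def is_algebra(Omega, F):
    """Check the three algebra axioms for a family F of subsets of Omega.
    Subsets are converted to frozensets so they can be compared and hashed."""
    F = {frozenset(A) for A in F}
    if frozenset(Omega) not in F:                 # (i) Omega belongs to F
        return False
    for A in F:
        if frozenset(Omega) - A not in F:         # (ii) closed under complement
            return False
    for A in F:
        for B in F:
            if A | B not in F:                    # (iii) closed under union
                return False
    return True

Omega = {"a", "b", "c", "d"}
F1 = [Omega, set(), {"a"}]
F2 = [Omega, set(), {"a"}, {"b", "c", "d"}]
print(is_algebra(Omega, F1), is_algebra(Omega, F2))  # False True
```

The check reports that F1 fails axiom (ii), exactly as argued in Example 3.8 (iii).
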
Lecture 4: σ-Field and Probability Measure
January 9, 2019
Sunil Kumar Gauttam
Department of Mathematics, LNMIIT

Example 4.1 (An algebra or field generated by intervals) Identifying Ω with the right-
closed (or half-closed) unit interval (0, 1], let J be the set of all half-closed subintervals of Ω
of the form (a, b]. Let R(J ) denote the family of all finite unions of the half-closed intervals
in J . The set R(J ) is clearly non-empty (and, in particular, contains (0, 1]), is by definition
closed under finite unions, and, as (a, b]^c = (0, 1] \ (a, b] = (0, a] ∪ (b, 1], it follows that R(J )
is closed under complements as well. The collection R(J ) is hence a field.

When dealing with an infinite sample space, as in Example 4.1, it is natural to consider
countably infinite unions or intersections of events to obtain other events. For example, let
us consider the events (0, 1/2], (0, 2/3], (0, 3/4], · · · , (0, 1 − 1/n], · · · . As this is an
increasing sequence of events, the union of the first n − 1 of these intervals is (0, 1 − 1/n]
(and in particular is contained in J , hence in R(J )). Now observe that

∪_{n=2}^∞ (0, 1 − 1/n] = (0, 1).

Also (0, 1) ∉ R(J ). It follows that the algebra R(J ) is not closed under countable unions:
countably infinite set operations on the members of R(J ) can yield quite natural sets that
are not in it. Examples of this nature suggest that it would be quite embarrassing if the
family of events that we are willing to consider did not make provision for the events which
could be obtained by applying countably infinite unions and intersections.
Bearing in mind this caution, we must entertain countably infinite combinations of events.
In this context the Greek letter σ (pronounced “sigma”) is used universally to indicate a
countable infinity of operations.

Definition 4.2 A collection F of subsets of Ω is called a σ-algebra or σ-field (or sometimes
a Borel field) if

(i) Ω ∈ F,
(ii) A ∈ F =⇒ A^c ∈ F,
(iii) A_1, A_2, · · · ∈ F =⇒ ∪_{n=1}^∞ A_n ∈ F (i.e., F is closed under countable unions).


The classes in Example 3.8 (i), (ii) are σ-algebras. If the sample space Ω is finite, then any
field is a σ-field.
One can ask: what is the σ-field in the setup of Example 4.1 which contains the class J ? It
is not obvious what that should be. Indeed, this is by no means a trivial problem; questions
of this type are in the province of a branch of advanced mathematics called measure theory
and cannot be dealt with at this level. But it is clear that this σ-algebra should contain all
the subintervals of (0, 1] of the form

[a, b], (a, b), and [a, b).

Also it will contain all the sets that can be formed by taking (possibly countably infinite)
unions and intersections of the above subintervals.
So it is natural to ask: if we are not telling what exactly constitutes the σ-field, why
should one talk about it at all? We offer two reasons for doing so. First, the notion of
σ-field allows us to define other concepts in probability theory in a precise manner. Second,
auxiliary quantities (especially random variables) quickly become the dominant theme of
the theory, and the σ-field itself fades into the background.
Hence, from now on, we will assume that the events at hand are elements of some suitably
large σ-algebra of events, without worrying about the specifics of what exactly constitutes
the family.

4.1 The probability measure


So we now begin with an abstract sample space Ω equipped with a σ-algebra F containing
the events of interest to us. The last element in a probability model is a consistent scheme
of assigning probabilities to events.

Definition 4.3 A function P from F to the set of real numbers is called a probability mea-
sure if it satisfies the following properties:

1. (Nonnegativity) P (A) ≥ 0 for all events A.

2. (Normalization) P (Ω) = 1.

3. (Countable Additivity) If A_1, A_2, · · · is a sequence of disjoint events (i.e.,
A_i ∩ A_j = ∅ if i ≠ j), then the probability of their union satisfies

P (∪_{n=1}^∞ A_n) = Σ_{n=1}^∞ P (A_n)

The triple (Ω, F, P ) is called a probability space.


How do these axioms gel with our experience? Our intuitive assignment of probabilities to
results of chance experiments is based on an implicit mathematical idealisation of the notion
of limiting relative frequency. Suppose A is an event. If, in n independent trials (we use the
word “independent” here in the sense that we attach to it in ordinary language) A occurs
m times then it is natural to think of the relative frequency m/n of the occurrence of A as
a measure of its probability. Indeed, we anticipate that in a long run of trials the relative
frequency of occurrence becomes a better and better fit to the “true” underlying probability
of A.
As 0 ≤ m/n ≤ 1, the positivity and normalisation axioms are natural if our intuition for
odds in games of chance is to mean anything at all. The selection of 1 as normalisation
constant is a matter of convention.
Likewise, from the point of view of relative frequencies, if A and B are two mutually
exclusive events and if in n independent trials A occurs m_1 times and B occurs m_2 times,
then the relative frequency of occurrence of either A or B in n trials is
(m_1 + m_2)/n = m_1/n + m_2/n. The probability measure is thus forced to be additive if it is
to be consistent with experience.
As mentioned above, countable additivity is a natural thing to assume in order to compute
the probability of events which arise as disjoint countable unions.
To visualize countable additivity, think of the probability measure as mass: it assigns mass
1 unit to the sample space Ω (think of some physical object), every subset (or piece) of Ω
has mass ≥ 0, and if we have disjoint subsets A_1, A_2, · · · , then the mass of their union is
the sum of the individual masses.
Probability behaves like length, area, and volume. These are natural examples of probability
measures, and all the axioms of probability are natural in these examples.
Lecture 5: Properties of Probability Measure
January 11, 2019
Sunil Kumar Gauttam
Department of Mathematics, LNMIIT

Example 5.1 Consider an experiment of throwing a dart at the unit square target. The
"area" of a set is a natural candidate for a probability measure which satisfies all the
axioms. The importance of the σ-field in probability theory is illustrated by the fact that
only "nice" subsets of the unit square are measurable in terms of area; there exist subsets
that are too jagged, rough, or inaccessible to be assigned an area.

So, from a purely mathematical point of view, we cannot work with the power set (the largest
σ-algebra) in this case. Hence we have to restrict ourselves to the class of subsets for which
we can define area. The collection of those nice sets forms our σ-field.

5.1 Deductions from the Axioms


There are many natural properties of a probability measure which have not been included in
the definition, for the simple reason that they can be derived using the axioms. In this respect
the axioms of a mathematical theory are like the constitution of a government. Unless and
until it is changed or amended, every law must be made to follow from it. In mathematics
we have the added assurance that there are no divergent views as to how the constitution
should be construed.

Theorem 5.2 (Properties of Probability Measure) Let (Ω, F, P ) be a probability
space. Then we have the following:


1. P (∅) = 0.
2. (Finite Additivity) If A and B are two mutually exclusive events, then
P (A ∪ B) = P (A) + P (B).
3. For any event A, P (Ac ) = 1 − P (A).
4. For any two events such that A ⊂ B, we have P (B \ A) = P (B) − P (A).
5. (monotonicity) For any two events such that A ⊂ B, we have P (A) ≤ P (B).
6. P (A) ≤ 1 for any event A.
7. (finite sub-additivity) For any two events A and B, we have P (A ∪ B) ≤ P (A) +
P (B).
8. (Continuity) Let A_n, n ≥ 1 be events.

(a) If A_1 ⊂ A_2 ⊂ · · · , then P (∪_{k=1}^∞ A_k) = lim_{k→∞} P (A_k).

(b) If A_1 ⊃ A_2 ⊃ · · · , then P (∩_{k=1}^∞ A_k) = lim_{k→∞} P (A_k).

Proof:

1. Let A_n = ∅ for all n; then the (A_n)'s are disjoint. Therefore, by countable additivity,

P (∅) = P (∪_{n=1}^∞ A_n) = Σ_{n=1}^∞ P (A_n) = Σ_{n=1}^∞ P (∅)

This is possible iff P (∅) = 0.


2. Set A_1 = A, A_2 = B, A_k = ∅ for k = 3, 4, · · · . Then (A_n) is a sequence of mutually
exclusive events and ∪_{n=1}^∞ A_n = A ∪ B. Therefore, by countable additivity, we get

P (A ∪ B) = P (∪_{n=1}^∞ A_n) = Σ_{n=1}^∞ P (A_n) = P (A) + P (B) + Σ_{n=3}^∞ P (∅) = P (A) + P (B)

3. A ∪ A^c = Ω. Also, A and A^c are mutually exclusive events; hence, using normalization
and finite additivity of the probability measure, we are done.

4. Note that B = A ∪ (B \ A). Now A and B \ A are mutually exclusive. Using finite
additivity, P (B) = P (A) + P (B \ A), which gives P (B \ A) = P (B) − P (A).

5. Using the previous property and non-negativity, we are done.

6. Since A ⊂ Ω, by the normalization axiom and monotonicity we are done.

7. Note that A ∪ B = A ∪ (B \ (A ∩ B)). Now A and B \ (A ∩ B) are mutually exclusive.
Also, from property 4 above, P (B \ (A ∩ B)) = P (B) − P (A ∩ B). So we obtain

P (A ∪ B) = P (A) + P (B) − P (A ∩ B).

Since P (A ∩ B) ≥ 0, non-negativity gives the desired result.



8. (a) Suppose A_1 ⊂ A_2 ⊂ · · · and A := ∪_{k=1}^∞ A_k. Set B_1 = A_1, and for each n ≥ 2,
let B_n denote those points which are in A_n but not in A_{n−1}, i.e., B_n = A_n \ A_{n−1}.
By definition, the sets B_n are disjoint. Also A_n = ∪_{k=1}^n B_k and
A = ∪_{k=1}^∞ B_k = ∪_{k=1}^∞ A_k. Hence

P (A_n) = Σ_{k=1}^n P (B_k)

Since the left side above cannot exceed 1 for all n, and P (B_k) ≥ 0 for all k, the sequence
of partial sums is increasing and bounded above; hence the series on the right side must
converge. Hence we obtain

lim_{n→∞} P (A_n) = lim_{n→∞} Σ_{k=1}^n P (B_k) =: Σ_{k=1}^∞ P (B_k) = P (A).   (5.1)

(b) Now if A_1 ⊃ A_2 ⊃ · · · , then A_1^c ⊂ A_2^c ⊂ · · · . Hence by part (a),

P (∪_{k=1}^∞ A_k^c) = lim_{k→∞} P (A_k^c)

1 − P ((∪_{k=1}^∞ A_k^c)^c) = lim_{k→∞} [1 − P (A_k)]

1 − P (∩_{k=1}^∞ A_k) = 1 − lim_{k→∞} P (A_k)

P (∩_{k=1}^∞ A_k) = lim_{k→∞} P (A_k)
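The identity P (A ∪ B) = P (A) + P (B) − P (A ∩ B), derived in the proof of property 7, is easy to verify under the uniform law on a fair die (the specific events below are my illustration, not from the notes):

```python
from fractions import Fraction

Omega = {1, 2, 3, 4, 5, 6}            # fair die, uniform probability law
P = lambda E: Fraction(len(E), len(Omega))

A = {2, 4, 6}                          # "even outcome"
B = {4, 5, 6}                          # "outcome at least 4"

# Inclusion-exclusion, as in the proof of property 7
assert P(A | B) == P(A) + P(B) - P(A & B)
print(P(A | B))  # 2/3, since A ∪ B = {2, 4, 5, 6}
```
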
MATH-221: Probability and Statistics
Tutorial # 1 (Countable & uncountable sets, field & σ-field, Probability Measure)

1. If A is a finite set and B is a countably infinite set then show that A ∪ B is countably
infinite.

2. If A_1, A_2, · · · are countably infinite sets, then show that ∪_{n=1}^∞ A_n is countably
infinite.

3. Show that open interval (0, 1) is uncountable and hence show that the open interval
(a, b) is uncountable, where a and b be any two real numbers with a < b.
4. Let Ω ≠ ∅ be a finite set and let F be a field of subsets of Ω. Show that F is a σ-field
(or Borel field).
5. Let Ω = (−∞, ∞), the real line. Suppose B is a σ-field (or Borel field or σ-algebra)
which contains all the intervals of the form (−∞, x], x ∈ R. Then show that B also
contains all the intervals of the form
[a, b], (a, b), (a, b], [a, b), where a, b ∈ R, a < b.

6. Let A, B, C be events such that P (A) = 0.7, P (B) = 0.6, P (C) = 0.5, P (A ∩ B) =
0.4, P (A ∩ C) = 0.3, P (C ∩ B) = 0.2 and P (A ∩ B ∩ C) = 0.1. Find P (A ∪ B ∪
C), P (Ac ∩ C) and P (Ac ∩ B c ∩ C c ).
7. Prove or disprove: If P (A ∩ B) = 0 then A and B are mutually exclusive events.
8. Does there exist a probability measure (or function) P such that the events A, B, C
satisfy P (A) = 0.6, P (B) = 0.8, P (C) = 0.7, P (A∩B) = 0.5, P (A∩C) = 0.4, P (C ∩
B) = 0.5 and P (A ∩ B ∩ C) = 0.1 ?
9. For events A_1, A_2, · · · , A_n, show that
P (A_1 ∩ A_2 ∩ · · · ∩ A_n) ≥ Σ_{i=1}^n P (A_i) − n + 1.

10. Let Ω = N. Define a set function P as follows: for A ⊂ Ω,

P (A) = 0 if A is finite, and P (A) = 1 if A is infinite.

Is P a probability measure (or function)?
11. Let A_1, A_2, · · · be a sequence of events. Then show that

P (∪_{n=1}^∞ A_n) ≤ Σ_{n=1}^∞ P (A_n).

12. Let A_n, n ≥ 1 be a sequence of events. Then prove the following:

(a) If A_1 ⊂ A_2 ⊂ · · · , then P (∪_{k=1}^∞ A_k) = lim_{k→∞} P (A_k).

(b) If A_1 ⊃ A_2 ⊃ · · · , then P (∩_{k=1}^∞ A_k) = lim_{k→∞} P (A_k).
Lecture 7: Example of Probability spaces
January 18, 2019
Sunil Kumar Gauttam
Department of Mathematics, LNMIIT

A probability space has three elements: Ω, the sample space; F, the collection of events;
and P, the probability measure. The choice of sample space is relatively easy and clear from
the underlying random experiment. Choosing an appropriate σ-field might be difficult, but
one can assume the existence of a σ-field which contains the events of interest. Defining a
probability function is not at all an easy task, and the definition makes no attempt to tell
what particular function P to choose; it merely requires P to satisfy the axioms. As we
shall see, for a sample space with the same σ-algebra we may define many different
probability measures.
Now we illustrate how to construct probability spaces starting from some common-sense
assumptions about the random experiment.

7.1 Finite Sample Space


If the sample space consists of a finite number of possible outcomes, then we can take any
σ-field F (including the largest one, the power set of the sample space). The probability
measure is specified by the probabilities of each single outcome. Let Ω = {ω_1, ω_2, · · · , ω_n}
for some n ∈ N. Then choose numbers p_1, p_2, · · · , p_n such that

p_i ≥ 0 for each i, and Σ_{i=1}^n p_i = 1.

Define P ({ω_i}) = P (ω_i) = p_i for each i = 1, 2, · · · , n. If A is any event in F, then
P (A) = Σ_{ω∈A} P (ω). Then P is a probability measure. Indeed,

1. P (A) is a sum of P (ω_i)'s such that ω_i ∈ A, and each P (ω_i) = p_i ≥ 0; hence their
sum is also non-negative.

2. P (Ω) = Σ_{i=1}^n P (ω_i) = Σ_{i=1}^n p_i = 1.

3. If A_1, A_2, · · · is a sequence of pairwise disjoint events, then

P (∪_k A_k) = Σ_{ω ∈ ∪_k A_k} P (ω) = P (A_1) + P (A_2) + · · ·
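The construction above can be sketched directly in code: pick weights p_i summing to 1 and define P (A) as the sum of the weights of the outcomes in A. The particular weights below are an arbitrary illustration:

```python
from fractions import Fraction

# A finite sample space with non-uniform outcome probabilities p_i
p = {"w1": Fraction(1, 2), "w2": Fraction(1, 3), "w3": Fraction(1, 6)}
Omega = set(p)

def P(A):
    """P(A) = sum of p_i over the outcomes in A, as in Section 7.1."""
    return sum(p[w] for w in A)

# The three axioms, checked on this example
assert all(P({w}) >= 0 for w in Omega)              # nonnegativity
assert P(Omega) == 1                                # normalization
assert P({"w1", "w3"}) == P({"w1"}) + P({"w3"})     # additivity (disjoint events)
print(P({"w1", "w2"}))  # 5/6
```
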


Example 7.1 In a coin-tossing experiment, let us assign probability p to heads and 1 − p to
tails, where 0 ≤ p ≤ 1. This assignment satisfies all the properties of a probability measure.
Hence, we see that there are infinitely many ways to define a probability law on a sample
space.

7.1.1 Equally likely outcomes or Discrete Uniform Probability law

There is a special probability law when the sample space is finite. It is based on symmetry, which is a very natural thing to assume in the absence of any bias or additional information. If we assume that all the outcomes {ω1 , ω2 , · · · , ωn } are "equally likely", i.e., they have the same chance of occurring, then P (ωi ) = 1/n for each i = 1, 2, · · · , n. This probability assignment is called the discrete uniform probability law. You must realize by now that whatever probability theory you people have done in your earlier classes, it falls into this setup. Recall your definition of the probability of an event A:

P (A) = (Number of outcomes favourable to A) / (Number of all possible outcomes of the experiment).
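The counting recipe above can be sketched in a few lines of Python; the two-dice experiment below is my own stand-in example, not one from the notes:

```python
from fractions import Fraction

# Discrete uniform law on a finite sample space: all 36 ordered
# outcomes of rolling two fair dice are equally likely.
omega = [(i, j) for i in range(1, 7) for j in range(1, 7)]

def prob(event):
    # P(A) = (number of outcomes favourable to A) / (number of all outcomes)
    return Fraction(len(event), len(omega))

# Event A: the sum of the two rolls is 7 (6 favourable outcomes).
A = [w for w in omega if w[0] + w[1] == 7]
print(prob(A))  # 1/6
```

Using `Fraction` keeps the probabilities exact instead of rounding them to floats.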

7.2 Countable Sample Space

We shall show how easy it is to construct probability measures for any countable space Ω = {ω1 , ω2 , · · · }. Once again we can work with any σ-field F (including the largest one, the power set of the sample space). The probability measure is specified by the probabilities of each single outcome. Each sample point ωn has probability pn , subject to the conditions

∀ n : pn ≥ 0, and Σ_{n=1}^∞ pn = 1. (7.1)

In symbols, we write P ({ωn }) = P (ωn ) = pn for all n. Now for any subset A of Ω, we define its probability to be

P (A) := Σ_{ω∈A} P (ω).

Once again it is very easy to verify all the axioms of a probability function.

Example 7.2 In the random experiment of tossing a coin till you get a head, the sample space is

Ω = {H, T H, T T H, T T T H, · · · }.

It is clear that the sample space is countably infinite. We assign probabilities

P (H) = 1/2, P (T H) = 1/2^2 , P (T T H) = 1/2^3 , · · ·

Then it is a valid probability measure. One can define another probability measure:

P (H) = 2/3, P (T H) = 2/3^2 , P (T T H) = 2/3^3 , · · ·

As we know that Σ_{n=1}^∞ 2/3^n = 2 · (1/3)/(1 − 1/3) = 1, it defines a valid candidate for a probability measure.

In fact, if (an )n≥1 is a sequence of non-negative terms, not all zero, such that Σ_{n=1}^∞ an < ∞, then define S = Σ_{n=1}^∞ an , which is a positive real number. Now define the probability of each outcome as follows:

P (ωn ) = an /S, for each n = 1, 2, · · ·

Then we get a probability measure. As a particular example, recall that Σ_{n=1}^∞ 1/n^p converges for every real number p > 1. Let us denote the sum by Sp . Then for every p > 1, we define a probability measure as follows:

P (ωn ) = 1/(Sp n^p ), for each n = 1, 2, · · ·

So we have defined uncountably many probability measures on the same sample space and same σ-field.
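The normalization recipe can be checked numerically. Below is a sketch for the case p = 2, where the sum S_2 = Σ 1/n² = π²/6 happens to be known in closed form:

```python
import math

# Normalize a_n = 1/n^2 into a probability measure on {omega_1, omega_2, ...}
# using S = sum_{n>=1} 1/n^2 = pi^2 / 6.
S = math.pi ** 2 / 6

def p(n):
    # P(omega_n) = a_n / S
    return (1 / n ** 2) / S

# Partial sums of the p(n) climb towards 1, as a probability measure requires.
partial = sum(p(n) for n in range(1, 100_001))
print(partial)
```

The partial sum stops just short of 1 because the tail Σ_{n>100000} 1/n² is still positive.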

Exercise 7.3 Show that all outcomes of a countable sample space can not be equally likely.

Solution: Suppose the contrary, i.e., there exists p ≥ 0 such that P (ωn ) = p for all n = 1, 2, · · · . If p = 0, then by countable additivity P (Ω) = 0 ≠ 1, a contradiction. If p > 0, then

P (Ω) = Σ_{n=1}^∞ P (ωn ) = Σ_{n=1}^∞ p = ∞,

which is again a contradiction.

7.3 Uncountable Sample Space


The situation is more complicated when we deal with an uncountable sample space. It is not possible to define probability for every subset of the sample space and at the same time be consistent with the axioms of a probability function and its consequences. We have already seen an example where we cannot talk about the area of every subset of a unit square.

Example 7.4 Let Ω = [0, ∞). Once again it may not be possible to define probability for every subset of Ω, so we again consider a σ-field which contains all subintervals of Ω. For a subinterval [a, b) define the probability as

P {[a, b)} = e^{−a} − e^{−b} , 0 ≤ a < b ≤ ∞.

Once again it is easy to verify the axioms of a probability function for the class of intervals of the form [a, b).

1. If a < b then −a > −b, and e^x is a strictly increasing function, hence e^{−a} > e^{−b} . Hence we have non-negativity.

2. P (Ω) = e^{−0} − e^{−∞} = 1 − 0 = 1.

3. Let us verify finite additivity:

P {[0, 2)} = P {[0, 1) ∪ [1, 2)}
e^{−0} − e^{−2} = (e^{−0} − e^{−1} ) + (e^{−1} − e^{−2} )

One may jump to the conclusion from Exercise 7.3 (since every uncountable set has a countably infinite subset) that in an uncountable sample space all outcomes cannot be equally likely. But this is not the case; for example, in the above,

P {a} = P (∩_{n=1}^∞ [a, a + 1/n)) = lim_{n→∞} P {[a, a + 1/n)} = lim_{n→∞} (e^{−a} − e^{−a−1/n} ) = 0, ∀a ∈ Ω.

Hence it follows that for 0 ≤ a < b < ∞,

P {[a, b]} = P {(a, b)} = P {(a, b]} = P {[a, b)}.
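The additivity check and the vanishing of singletons in Example 7.4 can both be verified numerically; a small sketch:

```python
import math

# P{[a, b)} = e^{-a} - e^{-b} on Omega = [0, infinity), as in Example 7.4.
def P(a, b):
    return math.exp(-a) - math.exp(-b)

# Finite additivity: [0, 2) = [0, 1) u [1, 2); the e^{-1} terms cancel.
assert abs(P(0, 2) - (P(0, 1) + P(1, 2))) < 1e-12

# P{a} = lim_n P{[a, a + 1/n)} shrinks to 0, so singletons carry no mass.
shrinking = [P(1.0, 1.0 + 1 / n) for n in (1, 10, 100, 1000)]
print(shrinking)
```

Note also that `P(0, math.inf)` evaluates to 1, matching P (Ω) = 1.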
Lecture 8: Conditional Probability
January 21, 2019
Sunil Kumar Gauttam
Department of Mathematics, LNMIIT

We know something about the world, and based on what we know we set up a probability model. Then something happens, and somebody tells us a little more about the world, gives us some new information. This new information, in general, should change our beliefs about what happened or what may happen. So whenever we're given new information, some partial information about the outcome of the experiment, we should revise our beliefs. And conditional probabilities are just the probabilities that apply after the revision of our beliefs, when we're given some information.
Let us take an example from our class only. By looking at the attendance data of Mr. Sanskar Bindal (17UCS141), we assign probability zero to Sanskar being present in the class on a given day. Now if I tell you that tomorrow there is a quiz in class, do you still think that the probability of Sanskar being present tomorrow remains unchanged?
In more precise terms, given an experiment, a corresponding sample space, and a probability
law, suppose that we know that the outcome is within some given event B. We wish to
quantify the likelihood that the outcome also belongs to some other given event A. We thus
seek to construct a new probability law that takes into account the available knowledge:
a probability law that for any event A, specifies the conditional probability of A given B,
denoted by P (A|B). We would like the conditional probabilities P (A|B) of different events
A to constitute a legitimate probability law, which satisfies the probability axioms.

Saying that B has occurred means the outcome already lies in B. So B is our new universe. Now we ask: what is the probability that the outcome is in set A, given it is already in B? Think of probability measure as area or mass. It is clear that it should be the area or mass of A ∩ B. Now we want this new assignment of probability to follow our axioms of probability, one of which is that the probability of the sample space is 1. Since our new universe or sample space is the set B, in order to have P (B|B) = 1 we divide by P (B). This motivates the definition.

Definition 8.1 Let (Ω, F, P ) be a probability space and A ∈ F be such that P (A) > 0. The conditional probability of an event B ∈ F given A, denoted by P (B|A), is defined as

P (B|A) = P (A ∩ B) / P (A).

Define
FA := {B ∩ A|B ∈ F}
So FA is a class (or family or collection or set) of subsets of A.

Exercise 8.2 Show that FA is a σ-field of subsets of A. (Mind you, it is A not Ω.)

Solution:

1. A ∩ A = A hence A ∈ FA

2. Let C ∈ FA , i.e., there is some B ∈ F such that C = A ∩ B. Draw a diagram. Keep in mind that we have to take the complement of C in A (not in Ω), which implies C c = A ∩ B c . Since B c ∈ F, hence C c ∈ FA .

3. Let C1 , C2 , · · · ∈ FA . Then there exist B1 , B2 , · · · ∈ F such that Cn = A ∩ Bn for all n ∈ N. Note that ∪_{n=1}^∞ Bn ∈ F and

∪_{n=1}^∞ Cn = ∪_{n=1}^∞ (A ∩ Bn ) = A ∩ (∪_{n=1}^∞ Bn ) ∈ FA

Define PA on FA as follows:

PA (B) = P (B|A), B ∈ FA .

Then (A, FA , PA ) is a probability space. We are already assuming P (A) > 0, which implies A is non-empty. We have already shown that FA is a σ-field of subsets of A. Now the only thing left is to show that PA is a probability measure.

1. P (B|A) ≥ 0 by definition.

2. PA (A) = P (A|A) = 1.

3. Countable additivity of PA follows from countable additivity of P .

Remark 8.3 Let us also note that since we have P (B|B) = 1, all of the conditional proba-
bility is concentrated on B. Thus, we might as well discard all possible outcomes outside B
and treat the conditional probabilities as a probability law defined on the new universe B.
Since conditional probabilities constitute a legitimate probability law, all general properties of
probability measure remain valid.

Example 8.4 Consider an experiment involving two successive rolls of a fair die. If the
sum of the two rolls is 9, how likely is it that the first roll was a 6?

Solution: Let B be the event that the sum of the two rolls is 9. Then B = {(3, 6), (6, 3), (4, 5), (5, 4)}. Let A be the event that the first roll is 6; then A has 6 elements. Now

P (A|B) = P {(6, 3)} / P (B) = (1/36) / (4/36) = 1/4.

In Example 8.4 we considered, the probability space was specified, and we computed condi-
tional probability. In many problems however, we actually proceed in the opposite direction.
We are given in advance what we want some conditional probabilities to be, and we use this
information and the rules of probabilities to compute the requested probabilities. A typical
example of this situation is the following.

Example 8.5 Suppose that the population of a certain city is 40% male and 60% female.
Suppose also that 50% of the males and 30% of the females smoke. Find the probability that
a smoker is male.

Solution: Let M denote the event that a person selected is a male and let F denote the event
that the person selected is a female. Also let S denote the event that the person selected
smokes. The given information can be expressed in the form P (S|M ) = .5, P (S|F ) =
.3, P (M ) = .4, and P (F ) = .6. The problem is to compute P (M |S). By definition

P (M |S) = P (M ∩ S) / P (S).

Now P (M ∩ S) = P (M )P (S|M ) = (.4)(.5) = 0.20, so the numerator can be computed in terms of the given probabilities. But how do we compute P (S)? Here is a very important technique in probability (known as the total probability theorem, which we will prove today).

This is known as "divide and rule": divide or partition the set S so that it is possible for us to compute the probabilities of the partitioned events. Note that S is the union of the two disjoint sets S ∩ M and S ∩ F . It follows that

P (S) = P (S ∩ M ) + P (S ∩ F ).

Since P (S ∩ F ) = P (F )P (S|F ) = (.6)(.3) = .18, we get P (S) = 0.38. Hence

P (M |S) = 0.20/0.38 ≈ 0.526.
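The arithmetic of Example 8.5 can be written out directly; a minimal sketch using the numbers given in the example:

```python
# Example 8.5's data: P(M), P(F) and the conditional smoking rates.
P_M, P_F = 0.4, 0.6
P_S_given_M, P_S_given_F = 0.5, 0.3

# Total probability theorem: P(S) = P(S|M)P(M) + P(S|F)P(F).
P_S = P_S_given_M * P_M + P_S_given_F * P_F

# Definition of conditional probability: P(M|S) = P(M n S) / P(S).
P_M_given_S = (P_S_given_M * P_M) / P_S
print(P_S, P_M_given_S)
```

The exact answer is 0.20/0.38 = 10/19, which the float computation reproduces up to rounding.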

Generally speaking, most problems of probability have to do with several events and it is
their mutual relation or joint action that must be investigated. The following result is useful
for such situations.

Proposition 8.6 For arbitrary events A1 , A2 , · · · , An , we have

P (A1 ∩ A2 ∩ · · · ∩ An ) = P (A1 )P (A2 |A1 )P (A3 |A1 ∩ A2 ) · · · P (An |A1 ∩ A2 · · · ∩ An−1 ) (8.1)

provided P (A1 ∩ A2 · · · An−1 ) > 0.

Proof: Since

P (A1 ) ≥ P (A1 ∩ A2 ) ≥ · · · ≥ P (A1 ∩ A2 ∩ · · · ∩ An−1 ) > 0,

all the conditional probabilities in (8.1) are well-defined. Now the right side of (8.1) is

P (A1 ) × [P (A2 ∩ A1 )/P (A1 )] × [P (A3 ∩ A2 ∩ A1 )/P (A2 ∩ A1 )] × · · · × [P (A1 ∩ A2 ∩ · · · ∩ An )/P (A1 ∩ A2 ∩ · · · ∩ An−1 )],

which telescopes to P (A1 ∩ A2 ∩ · · · ∩ An ), proving (8.1).
Lecture 9: Total Probability Theorem & Bayes Rule
January 22, 2019
Sunil Kumar Gauttam
Department of Mathematics, LNMIIT

Definition 9.1 A collection {A1 , A2 , · · · , AN } of events is said to be a partition of Ω if

1. The Ai 's are pairwise disjoint.

2. ∪_{i=1}^N Ai = Ω.

If N < ∞ then the partition is said to be a finite partition, and if N = ∞, then it is called a countable partition.

Example 9.2 If Ω = N, then {E, O}, where E is the collection of even numbers and O is the set of odd numbers, is a finite partition. If we take {{1}, {2}, · · · } as the partition, then it is a countable partition.

Now we explore some further applications of conditional probability. The following theorem is often useful for computing the probabilities of various events using a "divide-and-conquer" approach.


Theorem 9.3 (Total Probability Theorem) Let (Ω, F, P ) be a probability space and {A1 , A2 , · · · , AN } be a partition of Ω such that P (Ai ) > 0 for all i. Then for any event B ∈ F,

P (B) = Σ_{i=1}^N P (B|Ai )P (Ai ).

Proof: Event B is decomposed into the disjoint union

B = (A1 ∩ B) ∪ (A2 ∩ B) ∪ · · · ∪ (AN ∩ B).

Now using additivity of the probability measure we have

P (B) = Σ_{i=1}^N P (B ∩ Ai ).

By definition of conditional probability, P (B ∩ Ai ) = P (B|Ai )P (Ai ). Hence we get the theorem.
One of the uses of the theorem is to compute the probability of various events B for which
the conditional probabilities P (B|Ai ) are known or easy to derive. The key is to choose appropriately the partition {A1 , A2 , · · · , AN }, and this choice is often suggested by the structure
of the problem.

Example 9.4 You enter a chess tournament where your probability of winning a game is
0.3 against half the players (call them type 1), 0.4 against a quarter of the players (call them
type 2), and 0.5 against the remaining quarter of the players (call them type 3). You play a
game against a randomly chosen opponent. What is the probability of winning?

Solution: Let Ai be the event of playing with an opponent of type i. We have


P (A1 ) = 0.5, P (A2 ) = 0.25, P (A3 ) = 0.25
Also, let B be the event of winning. We have
P (B|A1 ) = 0.3, P (B|A2 ) = 0.4, P (B|A3 ) = 0.5
Thus, by the total probability theorem, the probability of winning is
P (B) = P (A1 )P (B|A1 ) + P (A2 )P (B|A2 ) + P (A3 )P (B|A3 )
= 0.5 × 0.3 + 0.25 × 0.4 + 0.25 × 0.5
= 0.375.
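Besides the exact computation, Example 9.4 can be sanity-checked by simulation; the Monte Carlo scheme below is my own illustration, not part of the notes:

```python
import random

random.seed(0)

# Exact answer from Example 9.4 via the total probability theorem.
p_type = [0.5, 0.25, 0.25]   # P(A1), P(A2), P(A3)
p_win = [0.3, 0.4, 0.5]      # P(B|A1), P(B|A2), P(B|A3)
exact = sum(pt * pw for pt, pw in zip(p_type, p_win))  # = 0.375

# Monte Carlo check: first draw the opponent's type, then the game outcome.
trials = 200_000
wins = sum(
    random.random() < p_win[random.choices(range(3), weights=p_type)[0]]
    for _ in range(trials)
)
print(exact, wins / trials)
```

With this many trials the empirical frequency lands within about one percentage point of 0.375.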

The total probability theorem is often used in conjunction with the following celebrated
theorem, which relates conditional probabilities of the form P (A|B) with conditional prob-
abilities of the form P (B|A), in which the order of the conditioning is reversed.

Theorem 9.5 (Bayes Theorem) Let {A1 , A2 , · · · , AN } be a partition of the sample space, and assume that P (Ai ) > 0 for all i = 1, 2, · · · , N . Then, for any event B such that P (B) > 0, we have

P (Ai |B) = P (B|Ai )P (Ai ) / Σ_{k=1}^N P (Ak )P (B|Ak ), for each i = 1, 2, · · · , N.

Proof: For fixed i,

P (Ai |B) = P (B ∩ Ai )/P (B) = P (B|Ai )P (Ai )/P (B) = P (B|Ai )P (Ai ) / Σ_{k=1}^N P (Ak )P (B|Ak ),

where the last equality follows from the total probability theorem.


Bayes rule is often used for inference. There are a number of causes that may result in
certain effect. We observe the effect and wish to infer the cause. The events A1 , A2 , · · · , AN
is associated with the causes and event B represents the effect. The probability P (B|Ai ) that
the effect will be observed when the cause Ai is present amounts to a probabilistic model
of the cause-effect relation. Given that effect B has been observed, we wish to evaluate
the probability P (Ai |B) that cause Ai is present. We refer to P (Ai |B) as the posterior
probability of event Ai given the information, to be distinguished from P (Ai ) which we call
the prior probability.
Numerous applications were made in all areas of natural phenomena and human behavior.
Let us look at few examples of the “inference”.

Example 9.6 1. If B is a "body" and the An 's are the several suspects of the murder, then Bayes' theorem will help the jury or court to decide the whodunit.

2. If B is an earthquake and the An 's are the different physical theories to explain it, then the theorem will help the scientists to choose between them.

3. We observe a shade in a person's X-ray (this is event B, the "effect") and we want to estimate the likelihood of three mutually exclusive and collectively exhaustive potential causes: cause 1 (event A1 ) is that there is a malignant tumor, cause 2 (event A2 ) is that there is a nonmalignant tumor, and cause 3 (event A3 ) corresponds to reasons other than a tumor. We assume that we know the probabilities P (Ai ) and P (B|Ai ), i = 1, 2, 3. Given that we see a shade (event B occurs), Bayes' theorem gives the posterior probabilities of the various causes as:

P (Ai |B) = P (B|Ai )P (Ai ) / Σ_{k=1}^3 P (Ak )P (B|Ak ), for each i = 1, 2, 3.

Example 9.7 A test for a certain rare disease is assumed to be correct 95% of the time: if
a person has the disease, the test results are positive with probability 0.95, and if the person
does not have the disease, the test results are negative with probability 0.95. A random person
drawn from a certain population has probability 0.001 of having the disease. Given that the
person just tested positive, what is the probability of having the disease?

Solution: Let A be the event that the person has the disease, and B the event that the test results are positive. Since P (Ac ) = 0.999 and the test is 95% correct, we have P (B c |Ac ) = 0.95, hence P (B|Ac ) = 0.05. The desired probability, P (A|B), is

P (A|B) = P (B|A)P (A) / [P (B|A)P (A) + P (B|Ac )P (Ac )]
= (0.001 × 0.95) / (0.001 × 0.95 + 0.999 × 0.05)
≈ 0.0187.

Note that even though the test was assumed to be fairly accurate, a person who has tested positive is still very unlikely (less than 2%) to have the disease. According to The Economist (February 20th, 1999), 80% of those questioned at a leading American hospital substantially missed the correct answer to a question of this type; most of them thought that the probability that the person has the disease is 0.95.
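The disease-test computation of Example 9.7 as a short sketch, with the same numbers:

```python
# Example 9.7's numbers: rare disease, 95%-correct test.
P_A = 0.001            # prior: person has the disease
P_B_given_A = 0.95     # positive test given disease
P_B_given_Ac = 0.05    # false-positive rate

# Bayes rule, with the total probability theorem in the denominator.
P_B = P_B_given_A * P_A + P_B_given_Ac * (1 - P_A)
posterior = P_B_given_A * P_A / P_B
print(posterior)  # about 0.0187
```

Changing `P_A` shows how strongly the posterior depends on the prior prevalence, which is exactly the point of the example.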

Remark 9.8 The practical utility of Bayes' formula is limited by our usual lack of knowledge of the various a priori probabilities.
Lecture 10: Independence
January 23, 2019
Sunil Kumar Gauttam
Department of Mathematics, LNMIIT

We have introduced the conditional probability P (A|B) to capture the partial information
that event B provides about event A. An interesting and important special case arises when
the occurrence of B provides no such information and does not alter the probability that A
has occurred, i.e. ,
P (A|B) = P (A).
When the above equality holds, we say that A is independent of B. Note that by the definition P (A|B) = P (A ∩ B)/P (B), this is equivalent to

P (A ∩ B) = P (A)P (B).

We adopt this latter relation as the definition of independence because it can be used even
when P (B) = 0, in which case P (A|B) is undefined. The symmetry of this relation also
implies that independence is a symmetric property; that is, if A is independent of B, then B
is independent of A, and we can unambiguously say that A and B are independent events.

Definition 10.1 Events A and B are said to be independent if

P (A ∩ B) = P (A)P (B)

Independence is often easy to grasp intuitively. For example, if the occurrence of two events is governed by distinct and noninteracting physical processes, such events will turn out to be independent.

Example 10.2 We toss a fair coin two times. Then Ω = {HH, HT, T T, T H} and the probability measure is

P (ω) = 1/4, ∀ω ∈ Ω.

Consider the events A = {HH, HT }, B = {HH, T H} and C = {HT, T T }. Clearly P (A) = P (B) = P (C) = 1/2. Event A is "first toss is a head" and event B is "second toss is a head". Physically there is no connection between what happens in the first toss and the second toss, so intuitively it is clear that A and B should be independent. In fact that is the case:

P (A ∩ B) = P (HH) = 1/4 = P (A)P (B).

Whereas event C is "second toss is a tail". Now B and C determine each other in the sense that if B occurs then C cannot occur and vice-versa, so they are not independent. In fact that is the case:

P (C ∩ B) = P (∅) = 0 ≠ P (C)P (B).

On the other hand, A and C should be independent. Indeed that is the case:

P (A ∩ C) = P (HT ) = 1/4 = P (A)P (C).

Note that independence is not easily visualized in terms of the sample space. A common
first thought is that two events are independent if they are disjoint, but in fact the opposite
is true: two disjoint events A and B with P (A) > 0 and P (B) > 0 are never independent,
since their intersection A ∩ B is empty and has probability 0. For example, an event A and
its complement Ac are not independent [unless P (A) = 0 or P (A) = 1] , since knowledge
that A has occurred provides precise information about whether Ac has occurred.
Sometimes the notion of independence does not appear intuitive but the mathematical co-
incidence of equality happens and we have to declare the two events independent.

Example 10.3 Let us consider a random experiment of choosing a real number from (0, 1]. Then Ω = (0, 1] and F is the σ-field which contains all the subintervals of Ω, and the probability measure is defined as the "length" of the set. Consider the events A = (0, 1/2], B = [1/4, 3/4], C = [1/4, 1]. Then P (A) = 0.5, P (B) = 0.5, P (C) = 0.75. Event A is that the chosen number belongs to the interval (0, 0.5], event B is that the chosen number belongs to the interval [0.25, 0.75], and event C is that the chosen number belongs to the interval [0.25, 1].
In this experiment the notion of independence/dependence is not at all intuitive. But A, B are independent, while A, C are dependent and B, C are dependent:

P (A ∩ B) = P {[0.25, 0.5]} = 1/4 = P (A)P (B)
P (A ∩ C) = P {[0.25, 0.5]} = 1/4 ≠ P (A)P (C)
P (B ∩ C) = P {B} = 1/2 ≠ P (B)P (C)

As mentioned earlier, if A and B are independent, the occurrence of B does not provide
any new information on the probability of A occurring. It is then intuitive that the non-
occurrence of B should also provide no information on the probability of A. Indeed, we have
the following proposition.

Proposition 10.4 If A and B are independent events, then the following pairs are also
independent:

(a) A and B c ,

(b) Ac and B,

(c) Ac and B c .

Proof:

(a) We must show that P (A ∩ B c ) = P (A)P (B c ).

P (A ∩ B c ) = P (A \ (A ∩ B)) = P (A) − P (A ∩ B) (∵ A ∩ B ⊂ A)
= P (A) − P (A)P (B) = P (A)[1 − P (B)] = P (A)P (B c )

(b) Let us relabel the event A as event C and B as event D. So events D and C are
independent, hence by part (a) it follows that D and C c are independent. That is to
say B and Ac are independent.

(c) If A and B are independent then Ac and B are independent by part (b). Now let us
relabel Ac as event C and relabel B as event D. Applying part (a) on the pair C and
D, we get independence of C and Dc . But C = Ac and Dc = B c . This completes the
proof.

10.1 Independence of more than two events


We might think that we could say A, B and C are independent if P (A∩B∩C) = P (A)P (B)P (C).
However, this is not the correct condition.

Example 10.5 Consider two rolls of a fair six-sided die, and the following events:

A = {1st roll is 1, 2, or 3},
B = {1st roll is 3, 4, or 5},
C = {sum of the two rolls is 9}.

Then A ∩ B = {(3, i)|i = 1, 2, 3, 4, 5, 6}, A ∩ C = {(3, 6)}, B ∩ C = {(3, 6), (4, 5), (5, 4)}, and

P (A ∩ B) = 6/36 ≠ (18/36) · (18/36) = P (A)P (B)
P (A ∩ C) = 1/36 ≠ (18/36) · (4/36) = P (A)P (C)
P (C ∩ B) = 3/36 ≠ (18/36) · (4/36) = P (B)P (C)

On the other hand,

P (A ∩ B ∩ C) = 1/36 = (1/2) · (1/2) · (1/9) = P (A)P (B)P (C).

So it would be grossly embarrassing if we said that the three events A, B, C are independent when they are not even pairwise (any two at a time) independent.

A second attempt at a definition of independence of A, B and C, in light of the previous example, might be to define A, B and C to be independent if all the pairs are independent, i.e., (A, B), (B, C) and (A, C) are independent.

Example 10.6 Let Ω = {1, 2, 3, 4}. Define

P (i) = 1/4 for i = 1, 2, 3, 4.

Consider the events A = {1, 2}, B = {1, 3} and C = {1, 4}. Then P (A) = P (B) = P (C) = 1/2. Note that A, B, C are pairwise independent:

P (A ∩ B) = P {1} = 1/4 = P (A)P (B)
P (A ∩ C) = P {1} = 1/4 = P (A)P (C)
P (B ∩ C) = P {1} = 1/4 = P (B)P (C)

but P (A ∩ B ∩ C) = P {1} = 1/4 ≠ 1/8 = P (A)P (B)P (C).

The preceding two examples show that mutual (or total) independence of a collection of
events requires an extremely strong condition. The following definition works.
We say three events A1 , A2 and A3 are independent if they satisfy the four conditions
P (A1 ∩ A2 ) = P (A1 )P (A2 )
P (A2 ∩ A3 ) = P (A3 )P (A2 )
P (A1 ∩ A3 ) = P (A1 )P (A3 )
P (A1 ∩ A2 ∩ A3 ) = P (A1 )P (A2 )P (A3 )
The first three conditions simply assert that any two events are independent, a property
known as pairwise independence. But the fourth condition is also important and does not
follow from the first three.
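A quick mechanical check of Example 10.6's point, that the three pairwise conditions can all hold while the fourth fails:

```python
from fractions import Fraction
from itertools import combinations

# Example 10.6: Omega = {1, 2, 3, 4} uniform; A = {1,2}, B = {1,3}, C = {1,4}.
omega = {1, 2, 3, 4}
P = lambda E: Fraction(len(E), len(omega))
A, B, C = {1, 2}, {1, 3}, {1, 4}

# Every pair multiplies correctly ...
for X, Y in combinations([A, B, C], 2):
    assert P(X & Y) == P(X) * P(Y)

# ... but the triple condition fails: 1/4 != 1/8.
print(P(A & B & C), P(A) * P(B) * P(C))  # 1/4 1/8
```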
Lecture 11: Independence of Multiple Events & Counting
January 28, 2019
Sunil Kumar Gauttam
Department of Mathematics, LNMIIT

Example 11.1 Consider the experiment of tossing a fair coin three times. Then Ω = {HHH, HHT, HT H, T HH, T T H, T HT, HT T, T T T }. Each outcome has an equal chance of occurrence. Let Ai be the event that the ith toss is a head, i = 1, 2, 3. Intuitively, A1 , A2 , A3 seem to be independent. Let us verify the same as per Definition 11.2. Note that

A1 = {HHH, HHT, HT H, HT T },
A2 = {HHH, HHT, T HH, T HT },
A3 = {HHH, HT H, T HH, T T H}.

Hence P (Ai ) = 1/2 for each i = 1, 2, 3.

P (A1 ∩ A2 ∩ A3 ) = P (HHH) = 1/8 = (1/2) × (1/2) × (1/2) = P (A1 )P (A2 )P (A3 )
P (A1 ∩ A2 ) = P ({HHT, HHH}) = 1/4 = (1/2) × (1/2) = P (A1 )P (A2 )
P (A2 ∩ A3 ) = P ({HHH, T HH}) = 1/4 = (1/2) × (1/2) = P (A2 )P (A3 )

A similar check works for the remaining pair A1 , A3 .

The definition of independence can be extended to multiple events (more than three).

Definition 11.2 We say that the events A1 , A2 , · · · , An are independent if

P (Ai1 ∩ Ai2 ∩ · · · ∩ Aim ) = P (Ai1 )P (Ai2 ) · · · P (Aim )

for every 2 ≤ m ≤ n, and for every choice of indices 1 ≤ i1 < i2 < · · · < im ≤ n.

For m = 2, we have C(n, 2) conditions to be checked for pairs. For m = 3, we have C(n, 3) conditions to be checked for triples, and so on. Hence checking independence of n events requires checking

C(n, 2) + C(n, 3) + · · · + C(n, n) = (1 + 1)^n − C(n, 0) − C(n, 1) = 2^n − n − 1


non-trivial conditions. Independence places very strong requirements on the interrelationships between events.
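The count 2^n − n − 1 is easy to verify directly; a small sketch:

```python
from math import comb

# Number of equations needed to verify independence of n events:
# sum_{m=2}^{n} C(n, m) = 2^n - n - 1.
def n_conditions(n):
    return sum(comb(n, m) for m in range(2, n + 1))

for n in (2, 3, 5, 10):
    assert n_conditions(n) == 2 ** n - n - 1
print([n_conditions(n) for n in (2, 3, 4, 5)])  # [1, 4, 11, 26]
```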
The notion of independence can be extended easily to a countably infinite collection of events.

Definition 11.3 We say that a countably infinite collection of events {Ai |i ≥ 1} is inde-
pendent if any finite sub-collection of events is independent.

Example 11.4 If events A, B, and C are independent, show that A ∪ B is also independent
of C.

Solution:

P ((A ∪ B) ∩ C) = P {(A ∩ C) ∪ (B ∩ C)} = P (A ∩ C) + P (B ∩ C) − P {(A ∩ C) ∩ (B ∩ C)}


= P (A)P (C) + P (B)P (C) − P (A ∩ B ∩ C) = P (A)P (C) + P (B)P (C) − P (A)P (B)P (C)
= P (C) [P (A) + P (B) − P (A)P (B)] = P (C) [P (A) + P (B) − P (A ∩ B)] = P (C)P (A ∪ B)

Similarly, the event Ac ∩ B is independent of C:

P ((Ac ∩ B) ∩ C) = P (Ac ∩ B ∩ C) = P (B ∩ C) − P (B ∩ C ∩ A)
= P (B)P (C) − P (A)P (B)P (C)
= P (B)P (C) [1 − P (A)] = P (B ∩ C)P (Ac )

The moral of Example 11.4 is that if A, B and C are independent, then any event determined by the events A and B will be independent of the event C.
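The moral can be checked concretely on the three-coin-toss model of Example 11.1; the specific check below is my own illustration:

```python
from fractions import Fraction

# Three fair coin tosses; A_i = "toss i is a head", as in Example 11.1.
omega = [(a, b, c) for a in "HT" for b in "HT" for c in "HT"]
P = lambda E: Fraction(len(E), len(omega))
A = {w for w in omega if w[0] == "H"}
B = {w for w in omega if w[1] == "H"}
C = {w for w in omega if w[2] == "H"}

# A, B, C are independent, and as Example 11.4 asserts,
# the event A union B is then also independent of C.
AuB = A | B
assert P(AuB & C) == P(AuB) * P(C)
print(P(AuB), P(AuB & C))  # 3/4 3/8
```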

11.1 Counting
Now we start a new topic, namely counting. The reason we're going to talk about counting is that there are a lot of probability problems whose solution actually reduces to successfully counting the cardinalities of various sets (or the number of outcomes in various events). We have already seen a context where such counting arises: when the sample space Ω has a finite number of equally likely outcomes, the discrete uniform probability law applies. Then, the probability of any event A is given by
P (A) = (number of elements of A) / (number of elements of Ω)
and involves counting the elements of A and of Ω. Now, today we're going to just touch the surface of this subject. There's a whole field of mathematics called combinatorics, whose practitioners actually spend their whole lives counting more and more complicated sets. We're not going to get anywhere close to the full complexity of the field, but we'll get just enough tools to address problems of the type one encounters in most common situations. Now, if somebody gives you Ω by just giving you a list, and gives you another set A, again as a list, it's easy to count their elements. You just count how many items are on the list. But sometimes the sets are described in some more implicit way, and we may have to do a little bit more work.
The counting principle is based on a divide-and-conquer approach, whereby the counting is broken down into stages. Suppose a process consists of 2 stages, and suppose there are n1 possible results at the first stage. For every possible result at the first stage, there are n2 possible results at the second stage. Then the total number of possible results of the process is n1 n2 . This argument can be extended to any r-step process.

Permutations

Suppose we have 10 students and 10 chairs. We wish to count the total number of arrangements (ordered lists) in which the 10 students can be assigned to the 10 chairs. A permutation is an arrangement of objects in a definite order. The number of different permutations, or ordered lists, of n different objects is n!.

k-permutations

We start with n distinct objects, and let k be some positive integer, with k ≤ n. We wish
to count the number of different ways that we can pick k out of these n objects and arrange
them in a sequence, i.e., the number of distinct k-object sequences. A more concrete way of thinking about this problem: we have 99 students in the B2 section of the P&S course, and I want to form a group of 10 students and put them in a particular order. So how many possible ordered lists can I make that consist of 10 people? By ordered, I mean that we take those 10 people and we say this is the first person in the group, that's the second person in the group, that's the third person in the group, and so on.
So in how many ways can we do this? Out of these n, we want to choose just k of them and put them in slots, one after the other. So we have n choices for who we put as the first person in the group. We can pick anyone and have them be the first person. Then I'm going to choose the second person in the group. I've used up 1 person, so I'm going to have n − 1 choices here. When we are ready to select the last (the kth) object, we have already chosen k − 1 objects, which leaves us with n − (k − 1) choices for the last one. By the Counting Principle, the number of possible sequences, called k-permutations, is

n(n − 1)(n − 2) · · · (n − k + 1) = n!/(n − k)!
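The formula can be cross-checked against brute-force enumeration on a small instance; a sketch:

```python
from itertools import permutations
from math import factorial

# Number of ways to pick k of n objects in order: n! / (n-k)!.
def k_permutations(n, k):
    return factorial(n) // factorial(n - k)

# Cross-check by listing all ordered selections of k = 3 out of n = 5.
n, k = 5, 3
listed = list(permutations(range(n), k))
print(len(listed), k_permutations(n, k))  # 60 60
```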

Combinations

There are n people and we are interested in forming a committee of k. How many different committees are possible? More abstractly, this is the same as the problem of counting the number of k-element subsets of a given n-element set. Notice that forming a combination is different from forming a k-permutation, because in a combination there is no ordering of the selected elements. For example, whereas the 2-permutations of the letters A, B, C, and D are

AB, BA, AC, CA, AD, DA, BC, CB, BD, DB, CD, DC,

the combinations of two out of these four letters are

AB, AC, AD, BC, BD, CD.

Each combination is associated with k! "duplicate" k-permutations, so the number n!/(n − k)! of k-permutations is equal to the number of combinations times k!. Hence the number of possible combinations, which we denote by C(n, k), satisfies

C(n, k) × k! = n!/(n − k)! =⇒ C(n, k) = n!/((n − k)! k!)
Lecture 12: Random Variables
29 January, 2019
Sunil Kumar Gauttam
Department of Mathematics, LNMIIT

In many random experiments the outcomes are numerical, e.g., throwing a die, or when the outcome corresponds to an instrument reading or a stock price. In other random experiments the outcomes may not be numerical, e.g., tossing a coin, or choosing a student randomly from the class. In general, the points of a sample space may be very concrete objects such as apples, molecules, and people. Sometimes we may be interested in certain real-valued (sometimes we may need to consider complex-valued) functions on the sample space. For example, if the experiment is the selection of students (for placement!!) from a given population, we may wish to consider their CGPA.
But not all functions defined on the sample space are useful, in the sense that we may not be able to assign probabilities to all basic events associated with the function. So one needs to restrict to a certain class of functions on the sample space. This motivates us to define random variables.

Definition 12.1 Let (Ω, F) be a measurable space, i.e., Ω is non-empty set and F is a
σ-field on Ω. A function X : Ω → R is said to be a random variable if for each x ∈ R,

{ω ∈ Ω|X(ω) ≤ x} =: {X ≤ x} ∈ F

Remark 12.2 1. The adjective “random” is just to remind us that we are dealing with
a sample space, which is related to something called random phenomena or random
experiment. Once ω is picked, X(ω) is thereby determined and there is nothing vague,
or random about it anymore.

2. Observe that random variables can be defined on a sample space before any probability
is mentioned.

3. When experimenters are concerned with random variables that describe observations,
their main interest is in the probabilities with which the random variables take various
values. For example, let Ω be the set of students in the B2 Section of the P&S class. These
may be labeled as Ω = {ω1 , ω2 , · · · , ω99 }. If we are interested in their attendance
distribution, let A(ω) denote the attendance of ω. Now we are interested in determining
the probability of events like {A ≥ 1} = {ω ∈ Ω|A(ω) ≥ 1}, {A = 0}, etc. For that,
these events should be in the σ-field. This idea suggests the condition in Definition
12.1.


Example 12.3 Let Ω = {H, T }, F = P(Ω). Define X : Ω → R by

X(H) = 1, X(T ) = 0.

Claim 12.4 X is a random variable.


Let x ∈ R be given.

1. If x < 0, then {X ≤ x} = ∅.

2. If 0 ≤ x < 1, then {X ≤ x} = {T }.

3. If 1 ≤ x, then {X ≤ x} = Ω.

Hence for each x ∈ R, {X ≤ x} ∈ F.
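The case analysis above can also be checked computationally. The following sketch (not from the notes) computes the preimage event {X ≤ x} for the coin-toss random variable of Example 12.3:

```python
# Checking Definition 12.1 for Example 12.3: X(H) = 1, X(T) = 0.
omega = {"H", "T"}
X = {"H": 1, "T": 0}

def event_le(x):
    """Return the event {w in omega : X(w) <= x} as a set."""
    return {w for w in omega if X[w] <= x}

assert event_le(-0.5) == set()        # x < 0       -> empty set
assert event_le(0.5) == {"T"}         # 0 <= x < 1  -> {T}
assert event_le(2.0) == {"H", "T"}    # x >= 1      -> Omega
```

Since F = P(Ω), every one of these sets belongs to F, confirming the claim.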

Remark 12.5 If we take F = {∅, Ω} (trivial σ-field), then the function X defined above is
not a random variable.

Example 12.6 Let Ω be a non-empty set. If F = {∅, Ω}, then only constant functions on
Ω are random variables.

Solution: In view of Remark 12.5 it is enough to show that constant functions are random
variables. Let X ≡ c on Ω for some c ∈ R. Then

1. If x < c, then {X ≤ x} = ∅.

2. If c ≤ x, then {X ≤ x} = Ω.

Hence for each x ∈ R, {X ≤ x} ∈ F.

Example 12.7 Let Ω be a non-empty set. If F = P(Ω) (power set of Ω), then any function
on Ω is a random variable.

Solution: Let x ∈ R be given. Then {X ≤ x} is a subset of Ω, hence is in F.

Example 12.8 Let Ω = (0, 1], F = σ-field containing all the subintervals of the form (a, b], 0 ≤ a < b ≤ 1.
Define X : Ω → R by
X(ω) = 3ω + 1.

Claim 12.9 X is a random variable.


Note that 0 < ω ≤ 1, so range of X is (1, 4]. Hence for given x ∈ R,

1. If x ≤ 1, then {X ≤ x} = ∅.

2. If 1 < x ≤ 4, then {X ≤ x} = (0, (x − 1)/3].

3. If 4 < x, then {X ≤ x} = Ω.

Hence for each x ∈ R, {X ≤ x} ∈ F.

If you look at most of the books on probability theory they give the following definition of
random variable.

Definition 12.10 (Engineering or Practical Definition) A random variable is a function from Ω to R (or C).

Remark 12.11 When sample space is finite or countably infinite, any function can be re-
garded as random variable in the sense that we can always take the power set as a σ-field.
Recall that if sample space is finite or countably infinite, we can define probability of any
event, hence Definition 12.10 is perfect in this case.
But certain difficulties arise when the sample space is uncountable and the random variable
has a continuous range of possible values. Then it may not be possible for us to assign a
probability to every event. So we cannot work with the power set; hence we need a class of
events which is rich enough to include all the events of our interest. Precisely, this is the
reason we need Definition 12.1. It is comforting to know that mathematical subtleties of
this type do not arise in most of the physical applications.

Example 12.12 In an experiment involving two rolls of a die, the following are examples
of random variables:

1. The sum of the two rolls. X((i, j)) = i + j




2. The number of sixes in the two rolls:

   X((i, j)) = 0 if i ≠ 6 and j ≠ 6; 1 if exactly one of i or j is 6; 2 if i = j = 6.

3. The second roll raised to the fifth power. X((i, j)) = j 5 .

So this example tells us that on a single sample space we may have several random variables
sitting on it.
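A small sketch (not from the notes) makes this concrete: all three random variables of Example 12.12 are just different functions on the same 36-point sample space.

```python
# Sample space for two rolls of a die: all ordered pairs (i, j).
omega = [(i, j) for i in range(1, 7) for j in range(1, 7)]

def sum_rolls(w):          # X((i, j)) = i + j
    return w[0] + w[1]

def num_sixes(w):          # the number of sixes in the two rolls
    return (w[0] == 6) + (w[1] == 6)

def second_pow5(w):        # the second roll raised to the fifth power, j^5
    return w[1] ** 5

assert len(omega) == 36
assert sum_rolls((3, 4)) == 7
assert num_sixes((6, 2)) == 1
assert second_pow5((1, 2)) == 32
```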

Starting with some random variables, we can at once make new ones by operating on them
in various ways.

Proposition 12.13 Let (Ω, F) be a measurable space.

1. If X and Y are random variables, then so are


X + Y, X − Y, XY, X/Y (Y ≠ 0),

and aX + bY, where a and b are two real numbers.

2. If X is a random variable and f : D ⊂ R → R is a Borel measurable function of one
variable, then f (X) is also a random variable.

3. If X and Y are random variables and f : D ⊂ R2 → R is a Borel measurable function
of two variables, then f (X, Y ) is also a random variable.
Lecture 13: Discrete Random Variables
30 January, 2019
Sunil Kumar Gauttam
Department of Mathematics, LNMIIT

Definition 13.1 Let (Ω, F) be a measurable space and X : Ω → R be a random variable.
The random variable X is called discrete if its range is either a finite or countably infinite
subset of R.

The random variables in Example 12.3 and Example 12.12 can take only finitely many
numerical values, and are therefore discrete, whereas the random variable in Example 12.8
has the range (1, 4], which is an uncountable set, and therefore it is not a discrete random
variable.
If the sample space is finite, then by the definition of a function it follows that any random
variable defined on the sample space necessarily has finite range. Therefore any random
variable on a finite sample space is a discrete random variable.
If the sample space is countably infinite, then any random variable defined on the sample
space has a range which is either finite or countably infinite. Therefore any random variable
on a countably infinite sample space is a discrete random variable.

Example 13.2 In the random experiment of tossing a coin till you get a head, the sample
space is
Ω = {H, T H, T T H, T T T H, · · · }
which is a countably infinite set. Define a random variable X as the number of tosses
required to get the first head; then

X(ω) = 1 if ω = H; 2 if ω = T H; 3 if ω = T T H; and so on.

We may define a random variable Y as follows:

Y (ω) = 1 if ω = H, and Y (ω) = 0 otherwise.

Note that we may have a sample space which is uncountable but we may define discrete
random variables on it.


For example, consider the experiment of choosing a point x from the interval [−1, 1], so
now our Ω = [−1, 1]. The random variable that associates with x the numerical value

sgn(x) = −1 if x < 0; 0 if x = 0; 1 if x > 0

is discrete.
If X is a discrete random variable and g : R → R is a Borel measurable function, then g(X)
is also a discrete random variable.
True/False: If X is not a discrete random variable and g : R → R is a Borel measurable
function, then g(X) is also not discrete.
The statement is False. Let Ω = (−1, 1), X(ω) = ω, and g(x) = −1 if x < 0; 1 if x ≥ 0.
Then g(X) is discrete, since its range is {−1, 1}.
First we focus exclusively on discrete random variables.

13.1 Probability Mass Function


The most important way to characterize a discrete random variable is through the proba-
bilities of the values that it can take. For a discrete random variable X, these are captured
by the probability mass function (pmf for short) of X, denoted fX (·). In particular, if x is
any possible value of X, the probability mass of x, denoted fX (x), is the probability of the
event {X = x} consisting of all outcomes that give rise to a value of X equal to x:

fX (x) = P {X = x}

Definition 13.3 A real-valued function f defined on R by f (x) = P (X = x) is called the
pmf of X.

Example 13.4 Consider the experiment consisting of two independent tosses of a fair coin,
and let X be the number of heads obtained. Then the pmf of X is

fX (x) = 1/4 if x = 0 or x = 2; 1/2 if x = 1; 0 otherwise.

The pmf captures the “point probabilities” or “point masses”, in the sense that P (X = x) is
the mass attached at the point x. We may draw the pmf of X as follows:

[Plot of the pmf: point masses 1/4, 1/2, 1/4 at x = 0, 1, 2.]

Proposition 13.5 (Properties of pmf ) Let f be the pmf of a discrete random variable
X. Then it has the following properties

1. f (x) ≥ 0 for all x ∈ R.


2. {x ∈ R : f (x) > 0} is a finite or countably infinite subset of R.
3. Σ_x f (x) = 1, where the summation is over all x belonging to the range of X.

Proof:

1. Since P {X = x} ≥ 0 for any x.


2. Since range of X is either finite or countably infinite.
3. As x ranges over all possible values of X the events {X = x} are disjoint and form a
partition of the sample space, that is
Ω = ∪_{x ∈ R(X)} {X = x}

Now the result follows from the additivity and normalization axioms of probability
measure.

The pmf contains all the probabilistic information about the random variable, i.e., for a
Borel set S ⊂ R, we can compute the probability of the event {X ∈ S} as
P (X ∈ S) = Σ_{x ∈ S∩R(X)} fX (x) = Σ_{x ∈ S∩R(X)} P {X = x}

For example, if X is the number of heads obtained in two tosses of a fair coin, as above, the
probability of at least one head is
P (X ≥ 1) = Σ_{x=1}^{2} fX (x) = 3/4,
since S = [1, ∞) and R(X) = {0, 1, 2}.
Calculating the pmf of X is conceptually straightforward. For each possible value x of X:

1. Collect all the possible outcomes that give rise to the event {X = x}

2. Add their probabilities to obtain fX (x).
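The two-step recipe above can be sketched in code for the fair-coin example (a sketch, not from the notes, using exact fractions so no rounding intrudes):

```python
from fractions import Fraction
from collections import defaultdict

# Two independent tosses of a fair coin; each outcome has probability 1/4.
outcomes = ["HH", "HT", "TH", "TT"]
p = {w: Fraction(1, 4) for w in outcomes}
X = {w: w.count("H") for w in outcomes}     # X = number of heads

# Steps 1-2: collect the outcomes giving {X = x}, add their probabilities.
pmf = defaultdict(Fraction)
for w in outcomes:
    pmf[X[w]] += p[w]

assert pmf[0] == Fraction(1, 4)
assert pmf[1] == Fraction(1, 2)
assert pmf[2] == Fraction(1, 4)

# P(X >= 1) by summing the pmf over S ∩ R(X) = {1, 2}.
assert sum(pmf[x] for x in (1, 2)) == Fraction(3, 4)
```

This reproduces the pmf of Example 13.4 and the value P (X ≥ 1) = 3/4 computed above.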

Example 13.6 Let Ω = {ω1 , ω2 , ω3 }, F = P(Ω), P (ωi ) = 1/3 for i = 1, 2, 3, and define
random variables X, Y and Z as follows:

X(ω1 ) = 1, X(ω2 ) = 2, X(ω3 ) = 3;


Y (ω1 ) = 2, Y (ω2 ) = 3, Y (ω3 ) = 1;
Z(ω1 ) = 3, Z(ω2 ) = 1, Z(ω3 ) = 2.

Show that X, Y and Z have the same pmf.

Solution: Note that R(X) = R(Y ) = R(Z) = {1, 2, 3}.


P {X = 1} = P (ω1 ) = 1/3, P {X = 2} = P (ω2 ) = 1/3, P {X = 3} = P (ω3 ) = 1/3;
P {Y = 1} = P (ω3 ) = 1/3, P {Y = 2} = P (ω1 ) = 1/3, P {Y = 3} = P (ω2 ) = 1/3;
P {Z = 1} = P (ω2 ) = 1/3, P {Z = 2} = P (ω3 ) = 1/3, P {Z = 3} = P (ω1 ) = 1/3.

The moral of Example 13.6 is that different random variables may have the same pmf.
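A quick computational confirmation of Example 13.6 (a sketch, not from the notes), encoding each random variable as a map from the index i of ωi to its value:

```python
from fractions import Fraction
from collections import Counter

# Equally likely outcomes omega_1, omega_2, omega_3, each with probability 1/3.
p = Fraction(1, 3)
X = {1: 1, 2: 2, 3: 3}     # X(omega_i), keyed by i
Y = {1: 2, 2: 3, 3: 1}
Z = {1: 3, 2: 1, 3: 2}

def pmf(rv):
    """Accumulate P{rv = value} over the outcomes."""
    out = Counter()
    for i, value in rv.items():
        out[value] += p
    return dict(out)

# Three different functions on Omega, yet one common pmf.
assert pmf(X) == pmf(Y) == pmf(Z) == {1: p, 2: p, 3: p}
```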
MATH-221: Probability and Statistics
Tutorial # 2 (Conditioning, Independence, Bayes Rule, Total Probability Theorem)

1. Let Ω = (0, 1] be the sample space and let P (·) be a probability function defined by

   P ((0, x]) = x/2 if 0 < x < 1/2, and P ((0, x]) = x if 1/2 ≤ x ≤ 1.

   Then show that P ({1/2}) = 1/4 and P ({x}) = 0 if x ≠ 1/2.
2. (Conditional version of the total probability theorem) Let {A1 , A2 , · · · , AN }
   be a partition of Ω. Also let B be an event such that P (Ai ∩ B) > 0 for all i. Then
   for any event A, show that P (A|B) = Σ_{i=1}^{N} P (Ai |B)P (A|Ai ∩ B).

3. Let Ω = {(a1 , a2 , · · · , an ) | ai is either 0 or 1 for each i}. For i = 1, 2, · · · , n set
   Ai = {(a1 , a2 , · · · , an ) | ai = 1}. If all the outcomes are equally likely, then show that
   A1 , A2 , · · · , An are independent.
4. Consider the experiment of tossing a coin two times. Let A be the event that first
toss is head and B be the event that second toss is head. Let P be the probability
measure under which all the outcomes are equally likely. Then show that A and B
are independent with respect to P . Let Q be the probability measure such that
Q(HH) = 1/3, Q(HT ) = 1/6, Q(T H) = 1/6, Q(T T ) = 1/3.
Then show that A and B are not independent with respect to Q.
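For Problem 4, the conclusion can be sanity-checked numerically before writing the proof (a sketch only; the problem of course asks for the verification P (A ∩ B) = P (A)P (B) by hand):

```python
from fractions import Fraction as F

outcomes = ["HH", "HT", "TH", "TT"]
A = {"HH", "HT"}            # first toss is head
B = {"HH", "TH"}            # second toss is head

def prob(measure, event):
    """Probability of an event as the sum of its outcome probabilities."""
    return sum(measure[w] for w in event)

P = {w: F(1, 4) for w in outcomes}
Q = {"HH": F(1, 3), "HT": F(1, 6), "TH": F(1, 6), "TT": F(1, 3)}

# Independence means P(A ∩ B) = P(A) P(B).
assert prob(P, A & B) == prob(P, A) * prob(P, B)     # independent under P
assert prob(Q, A & B) != prob(Q, A) * prob(Q, B)     # not independent under Q
```

Note that independence is a property of the probability measure, not of the events alone: the same A and B are independent under P but not under Q.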
5. Three switches connected in parallel operate independently. Each switch remains
   closed with probability p. (a) Find the probability of receiving an input signal
   at the output. (b) Find the probability that switch Si is open, given that an input
   signal is received at the output.
6. An electronic assembly consists of two subsystems, say A and B. From previous
   testing procedures, the following probabilities are assumed to be known: P (A fails) =
   0.20, P (A and B both fail) = 0.15, P (B fails alone) = 0.15. Evaluate the following
   probabilities: (a) P (A fails | B has failed), (b) P (A fails alone | A or B fails).
7. In answering a question on a multiple-choice test, a student either knows the answer
or guesses. Let p be the probability that the student knows the answer and 1 − p
the probability that the student guesses. Assume that a student who guesses at the
answer will be correct with probability 1/m, where m is the number of multiple-
choice alternatives. What is the conditional probability that a student knew the
answer to a question, given that he or she answered it correctly?
8. Let a sample of size 4 be drawn (a) with replacement (b) without replacement from
an urn containing 12 balls of which 8 are white. Find the probability that the ball
drawn on the third draw is white given that the sample contains three white balls.
