
Lecture 1-2


Lecture Notes on Introduction to Probability &

Statistics (MA203)
Dr. Masihuddin
August 5, 2024


1 Introduction
We begin this course with a brief discussion of what “Probability” and “Statistics” are.
Then we review some mathematical foundations that are needed for developing proba-
bility theory.

1.1 Probability Theory


Randomness and uncertainty are inherent in our daily lives and present in every discipline
of science, engineering, and technology. Probability theory provides a mathematical
framework for describing and analyzing random phenomena in the world around us. By
“random phenomena”, we refer to experiments with unpredictable outcomes.
To develop some intuition, let’s examine specific applications of probability. Consider
the terms “randomness” and “probability” in the context of one of the simplest random
experiments: flipping a fair coin.
One way to understand “randomness” is as an expression of our lack of knowledge.
If we knew more about the force used to flip the coin, its initial orientation, the impact
point between the finger and the coin, air turbulence, the table’s surface smoothness, and
the material characteristics of both the coin and the table, we might be able to predict
whether the coin would land heads or tails. However, without this information, we cannot
predict the coin’s outcome. Thus, when we call something random, we acknowledge our
limited knowledge and our inability to be certain about the result.
With a fair coin, if we know nothing about the flip, the probability of it landing
heads is 50%, or 1/2. What does this mean? There are two common interpretations
of “probability.” One is relative frequency: if we flip the coin many times, it will land
heads about half the time. As the number of flips increases, the proportion of heads will
approach 1/2. This understanding aligns with the law of large numbers, which we will
formally discuss later.
Another interpretation is that probability quantifies our subjective belief that some-
thing will happen. For example, predicting the weather involves considering factors like
cloud cover and humidity. Different people may estimate the probability of rain differ-
ently based on these factors. Often, these two interpretations coincide—personal beliefs
about rain may be based on the relative frequency of rain on similar days.
The beauty of probability theory is its applicability regardless of whether we interpret
probability in terms of long-run frequency or degree of belief. Probability theory provides
a robust framework for studying random phenomena, starting with axioms of probability
and building the entire theory through mathematical arguments.
The major content of this course is probability theory, but we will also study a short section on the theory of statistics.

1.1.1 Statistics: Science of Data


Statistics is the science of decision making in the presence of uncertainty. The two major
areas of statistics are Descriptive Statistics and Inferential Statistics.

(a) Descriptive Statistics: Most of the statistical information in newspapers, magazines, company reports, and other publications consists of data that are summarized and presented in a form that is easy for the reader to understand. Such summaries of data, which may be tabular, graphical, or numerical, are referred to as descriptive statistics.
(b) Inferential Statistics: Many situations require information about a large group
of elements (individuals, companies, voters, households, products, customers, and
so on). But, because of time, cost, and other considerations, data can be collected
from only a small portion of the group. The larger group of elements in a particular
study is called the population, and the smaller group is called the sample.

Population
A population is the set of all elements of interest in a particular study.

Sample
A sample is a subset of the population.

The process of conducting a survey to collect data for the entire population is called
a census. The process of conducting a survey to collect data for a sample is called a
sample survey. As one of its major contributions, statistics uses data from a sample
to make estimates and test hypotheses about the characteristics of a population
through a process referred to as statistical inference.

1.2 Foundational concepts of Probability


In our daily life we come across many processes whose nature cannot be predicted in
advance. Such processes are referred to as random processes. The only way to derive
information about random processes is to conduct experiments. Each such experiment
results in an outcome which cannot be predicted beforehand. In fact even if the experi-
ment is repeated under identical conditions, due to presence of factors which are beyond
control, outcomes of the experiment may vary from trial to trial. However we may know
in advance that each outcome of the experiment will result in one of the several given
possibilities. For example, in the cast of a die under a fixed environment the outcome
(number of dots on the upper face of the die) cannot be predicted in advance and it varies
from trial to trial. However we know in advance that the outcome has to be among one of
the numbers 1, 2, ..., 6. Probability theory deals with the modeling and study of random
processes.
Definition 1.1. (i) Random experiment: A random experiment is an experiment
in which:
(a) the set of all possible outcomes of the experiment is known in advance;
(b) the outcome of a particular performance (trial) of the experiment cannot be
predicted in advance;
(c) the experiment can be repeated under identical conditions.

(ii) Sample space: The collection of all possible outcomes of a random experiment is
called the sample space. A sample space will usually be denoted by Ω. ■


Example 1.1. (i) In the random experiment of casting a die one may take the sample
space as Ω = {1, 2, 3, 4, 5, 6}, where i ∈ Ω indicates that the experiment results in i
(i = 1, . . . , 6) dots on the upper face of the die.
(ii) In the random experiment of simultaneously flipping a coin and casting a die one
may take the sample space as
Ω = {H, T } × {1, 2, . . . , 6} = {(r, i) : r ∈ {H, T }, i ∈ {1, 2, . . . , 6}},
where (H, i) ((T, i)) indicates that the flip of the coin resulted in head (tail) on the upper
face and the cast of the die resulted in i (i = 1, 2, . . . , 6) dots on the upper face.
(iii) Consider an experiment where a coin is tossed repeatedly until a head is observed.
In this case the sample space may be taken as Ω = {1, 2, . . .} (or Ω = {T, T H, T T H, . . .}),
where i ∈ Ω (or T T · · · T H ∈ Ω with (i − 1) T s and one H) indicates that the experiment
terminates on the i-th trial with first i − 1 trials resulting in tails on the upper face and
the i-th trial resulting in the head on the upper face.
(iv) In the random experiment of measuring lifetimes (in hours) of a particular brand
of batteries manufactured by a company one may take Ω = [0, 70000], where we have
assumed that no battery lasts for more than 70,000 hours. ■
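To make the first two constructions concrete, here is a small Python sketch (our illustration, not part of the original notes; the names coin, die, and omega are ours) that enumerates the product sample space of Example 1.1(ii):

```python
from itertools import product

# Example 1.1(i): sample space for casting a die once
die = {1, 2, 3, 4, 5, 6}

# Example 1.1(ii): flip a coin and cast a die simultaneously;
# the sample space is the Cartesian product {H, T} x {1, ..., 6}
coin = {"H", "T"}
omega = set(product(coin, die))

print(len(omega))         # 12 outcomes
print(("H", 3) in omega)  # True: head on the coin, 3 dots on the die
```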
Definitions

• Event: Let Ω be the sample space of a random experiment and let E ⊆ Ω. If the outcome of the random experiment is a member of the set E, we say that the event E has occurred.

• Mutually exclusive events: Two events E1 and E2 are said to be mutually exclusive if they cannot occur simultaneously, i.e., if E1 ∩ E2 = ∅, the empty set.
In a random experiment some events may be more likely to occur than the others. For
example, in the cast of a fair die (a die that is not biased towards any particular outcome),
the occurrence of an odd number of dots on the upper face is more likely than the
occurrence of 2 or 4 dots on the upper face. Thus it may be desirable to quantify the
likelihoods of occurrences of various events.
Probability of an event is a numerical measure of chance with which that event occurs.
To assign probabilities to various events associated with a random experiment one may
assign a real number P (E) ∈ [0, 1] to each event E with the interpretation that there is a
(100 × P (E))% chance that the event E will occur and a (100 × (1 − P (E)))% chance that
the event E will not occur. For example if the probability of an event is 0.25 it would
mean that there is a 25% chance that the event will occur and that there is a 75% chance
that the event will not occur. Note that, for any such assignment of probabilities to be
meaningful, one must have P (Ω) = 1. Now we will discuss two methods of assigning
probabilities.

1.3 Classical Definition of Probability


This method of assigning probabilities is used for random experiments which result in a
finite number of equally likely outcomes. Let Ω = {ω1 , . . . , ωn } be a finite sample space
with n ∈ N possible outcomes; here N denotes the set of natural numbers. For E ⊆ Ω,
let |E| denote the number of elements in E. An outcome ω ∈ Ω is said to be favorable to
an event E if ω ∈ E. In the classical method of assigning probabilities, the probability
of an event E is given by


P(E) = (number of outcomes favorable to E)/(total number of outcomes) = |E|/|Ω| = |E|/n.

The classical method satisfies the following intuitive properties:

(i) For any event E, P (E) ≥ 0;

(ii) For mutually exclusive events E1, E2, . . . , En (i.e., Ei ∩ Ej = ∅ whenever i, j ∈ {1, . . . , n}, i ̸= j),

P(E1 ∪ · · · ∪ En) = |E1 ∪ · · · ∪ En|/|Ω| = (|E1| + · · · + |En|)/n = P(E1) + · · · + P(En);

(iii) P(Ω) = |Ω|/|Ω| = 1.
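As an illustration of the classical rule (a sketch of ours, not part of the original notes), the probability of any event in the cast of a fair die can be computed directly as |E|/|Ω|; this also confirms the earlier claim that an odd number of dots is more likely than 2 or 4 dots:

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}  # equally likely outcomes of a fair die

def classical_prob(event):
    """Classical probability: (outcomes favorable to E) / (total outcomes)."""
    return Fraction(len(event & omega), len(omega))

odd = {1, 3, 5}          # event: odd number of dots on the upper face
two_or_four = {2, 4}     # event: 2 or 4 dots on the upper face

print(classical_prob(odd))          # 1/2
print(classical_prob(two_or_four))  # 1/3, indeed less likely than 'odd'
```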

Example 1.2.


Suppose that in a classroom we have 25 students (with registration numbers 1, 2, . . . ,
25) born in the same year having 365 days. Suppose that we want to find the probability
of the event E that they all are born on different days of the year. Here an outcome
consists of a sequence of 25 birthdays. Suppose that all such sequences are equally likely.
Then |Ω| = 365^25, |E| = 365 × 364 × · · · × 341, and P(E) = |E|/|Ω| = (365 × 364 × · · · × 341)/365^25.
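A quick numerical evaluation of this example (our own sketch, not part of the notes) computes P(E) = (365 × 364 × · · · × 341)/365^25 exactly and shows it is roughly 0.43:

```python
from fractions import Fraction

n_days, n_students = 365, 25

# |E| = 365 x 364 x ... x 341 (25 factors, all birthdays distinct)
favorable = 1
for k in range(n_students):
    favorable *= n_days - k

p_all_different = Fraction(favorable, n_days ** n_students)
print(float(p_all_different))  # ~0.4313: less than an even chance
```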
The classical method of assigning probabilities has limited applicability, as it can be used only for random experiments that result in a finite number of equally likely outcomes.

1.4 Relative Frequency Method


Suppose that we have independent repetitions of a random experiment (here independent
repetitions means that the outcome of one trial is not affected by the outcome of another
trial) under identical conditions. Let fN (E) denote the number of times an event E occurs
(also called the frequency of event E in N trials) in the first N trials, and let rN(E) = fN(E)/N
denote the corresponding relative frequency. Using advanced probabilistic arguments
(e.g., using Weak Law of Large Numbers to be discussed later) it can be shown that,
under mild conditions, the relative frequencies stabilize (in certain sense) as N gets large
(i.e., for any event E, limN →∞ rN (E) exists in certain sense). In the relative frequency
method of assigning probabilities the probability of an event E is given by

P(E) = lim_{N→∞} rN(E) = lim_{N→∞} fN(E)/N.

In practice, to assign probability to an event E, the experiment is repeated a large (but fixed) number of times (say N times) and the approximation P(E) ≈ rN(E) is used for assigning probability to event E.
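The stabilization of rN(E) can be observed in a short simulation (a sketch under the assumption of a fair six-sided die, using Python's standard random module; the seed is arbitrary):

```python
import random

random.seed(0)  # arbitrary seed, for reproducibility

def relative_frequency(event, n_trials):
    """Estimate P(event) by the relative frequency f_N(E) / N."""
    hits = sum(random.randint(1, 6) in event for _ in range(n_trials))
    return hits / n_trials

odd = {1, 3, 5}
for n in (100, 10_000, 1_000_000):
    print(n, relative_frequency(odd, n))  # approaches 1/2 as N grows
```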
The relative frequency method also satisfies the following properties:

(i) for any event E, P (E) ≥ 0;


(ii) for mutually exclusive events E1, E2, . . . , En,

P(E1 ∪ · · · ∪ En) = P(E1) + · · · + P(En);

(iii) P (Ω) = 1.

Limitations: Although the relative frequency method seems to have more applicability
than the classical method, this method also has some limitations.

• It is imprecise, as it is based on an approximation (P(E) ≈ rN(E)).

• Another difficulty with the relative frequency method is that it assumes that the exper-
iment can be repeated a large number of times. This may not be always possible
due to budgetary and other constraints (e.g., in predicting the success of a new
space technology it may not be possible to repeat the experiment a large number
of times due to high costs involved).

Some useful remarks

(i) A set E is said to be finite if either E = ϕ (the empty set) or if there exists a one-one
and onto function f : {1, 2, . . . , n} → E (or f : E → {1, 2, . . . , n}) for some natural
number n;

(ii) A set is said to be infinite if it is not finite;

(iii) A set E is said to be countable if either E = ϕ or if there is an onto function f : N → E, where N denotes the set of natural numbers;

(iv) A set is said to be countably infinite if it is countable and infinite;

(v) A set is said to be uncountable if it is not countable;

(vi) A set E is said to be a continuum if there is a one-one and onto function f : R → E (or f : E → R), where R denotes the set of real numbers.

The following results, which can be found in any standard textbook on set theory, provide some of the properties of finite, countable and uncountable sets.
Results:

(i) Any finite set is countable;

(ii) If A is a countable set and B ⊆ A then B is countable;

(iii) Any uncountable set is an infinite set;

(iv) If A is an infinite set and A ⊆ B then B is infinite;

(v) If A is an uncountable set and A ⊆ B then B is uncountable;

(vi) If E is a finite set and F is a set such that there exists a one-one and onto function
f : E → F (or f : F → E) then F is finite;


(vii) If E is a countably infinite (continuum) set and F is a set such that there exists a one-one and onto function f : E → F (or f : F → E) then F is countably infinite (continuum);

(viii) A set E is countable if and only if either E = ϕ or there exists a one-one and onto map f : E → N0, for some N0 ⊆ N;

(ix) A set E is countable if, and only if, either E is finite or there exists a one-one and onto map f : N → E;

(x) A set E is countable if, and only if, either E = ϕ or there exists a one-one map
f : E → N;

(xi) A non-empty countable set E can either be written as E = {ω1, ω2, . . . , ωn}, for some n ∈ N, or as E = {ω1, ω2, . . .};

(xii) The unit interval (0, 1) is uncountable. Hence any interval (a, b), where −∞ < a < b <
∞, is uncountable;

(xiii) N × N is countable (an explicit bijection is sketched after this list);

(xiv) Let Λ be a countable set and let {Aα , α ∈ Λ} be a (countable) collection of countable
sets. Then ∪α∈Λ Aα is countable. In other words, countable union of countable sets
is countable;

(xv) Any continuum set is uncountable.
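As a small illustration of result (xiii) (our own sketch, not part of the original notes), the Cantor pairing function gives a one-one and onto map from N × N to N, where here N = {0, 1, 2, . . .}:

```python
def cantor_pair(m, n):
    """One-one and onto map from N x N to N, with N = {0, 1, 2, ...}."""
    return (m + n) * (m + n + 1) // 2 + n

# Distinct pairs receive distinct codes; along anti-diagonals the codes
# 0, 1, 2, 3, ... are exhausted with no gaps, so the map is onto.
pairs = [(m, n) for m in range(4) for n in range(4)]
codes = sorted(cantor_pair(m, n) for m, n in pairs)
print(codes[:10])  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```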

In advanced courses on probability theory it is shown that in many situations (especially when the sample space Ω is a continuum) it is not possible to assign probabilities to all subsets of Ω such that properties (i)-(iii) of the classical (or relative frequency) method are satisfied. Therefore probabilities are assigned to only certain types of subsets of Ω. Now we will discuss the modern approach to probability theory. We will define the concept of probability for certain types of subsets of Ω using a set of axioms that are consistent with properties (i)-(iii) of the classical (or relative frequency) method. Further, we will also study various properties of probability measures.

1.5 Axiomatic Definition of Probability and Properties of Probability Measure

We first discuss some essential definitions.

(a) Class of sets: A set whose elements are themselves sets is called a class of sets. A class of sets will usually be denoted by script letters A, B, C, . . .. For example, A = {{1, 3}, {2, 5, 7}, {8, 10, 11}};

(b) Set function : Let C be a class of sets. A function µ : C → R is called a set


function. In other words, a real-valued function whose domain is a class of sets is
called a set function.


In many practical situations, we are often interested in finding the probability of some event A (say). If A is an event of interest, then the event A^c (non-occurrence of A) also carries significance, and hence we consider A^c as another event. Similarly, one may also be interested in a “union of events” as an event. However, for technical reasons, we should only consider finite or countably infinite unions. This special structure on the collection of events is usually expressed in terms of a “σ-field” of subsets of Ω: complementation and countable unions of events generate other events. This leads to the following definition.
Definition 1.2. A sigma-field (σ-field) of subsets of Ω is a class F of subsets of Ω satisfying the following properties:

• Ω ∈ F;

• A ∈ F ⇒ A^c = Ω − A ∈ F (closed under complements);

• Ai ∈ F, i = 1, 2, . . . ⇒ ⋃_{i=1}^∞ Ai ∈ F (closed under countably infinite unions).

Some Remarks on σ-fields

(i) We expect the event space to be a σ-field;

(ii) Suppose that F is a σ-field of subsets of Ω. Then,

(a) ∅ ∈ F (since ∅ = Ω^c);

(b) E1, E2, . . . ∈ F ⇒ ⋂_{i=1}^∞ Ei ∈ F (since ⋂_{i=1}^∞ Ei = (⋃_{i=1}^∞ Ei^c)^c);

(c) E, F ∈ F ⇒ E − F = E ∩ F^c ∈ F and E∆F = (E − F) ∪ (F − E) ∈ F;

(d) although the power set of Ω (P(Ω)) is a σ-field of subsets of Ω, in general, a σ-field may not contain all subsets of Ω.
Example
(i) F = {∅, Ω} is a σ-field, called the trivial σ-field;
(ii) Suppose that A ⊆ Ω. Then the smallest σ-field containing the set A is F = {A, A^c, ∅, Ω} (see the sketch after this list);
(iii) Arbitrary intersection of σ-fields is a σ-field (Assignment);
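On a finite Ω, the σ-field generated by given sets, as in example (ii), can be computed mechanically by closing under complementation and union; the brute-force Python sketch below (ours, for illustration only, and feasible only for small finite spaces) recovers {A, A^c, ∅, Ω} from a single set A:

```python
def generate_sigma_field(omega, seeds):
    """Close a collection of subsets of a finite omega under complementation
    and union; on a finite space this yields the sigma-field generated by
    the seed sets (countable unions reduce to finite ones here)."""
    omega = frozenset(omega)
    sets = {frozenset(), omega} | {frozenset(s) for s in seeds}
    changed = True
    while changed:
        changed = False
        for a in list(sets):
            candidates = [omega - a] + [a | b for b in sets]
            for s in candidates:
                if s not in sets:
                    sets.add(s)
                    changed = True
    return sets

A = {1, 3, 5}
field = generate_sigma_field({1, 2, 3, 4, 5, 6}, [A])
print(sorted(sorted(s) for s in field))
# [[], [1, 2, 3, 4, 5, 6], [1, 3, 5], [2, 4, 6]] -- i.e. {∅, Ω, A, A^c}
```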
Borel σ-field: Let Ω = R and let F′ be the class of all open intervals in R. Then B1 = σ(F′) is called the Borel σ-field on R; here σ(F′) denotes the smallest σ-field containing F′ (its existence follows from (iii) above). The Borel σ-field on Rk (denoted by Bk) is the σ-field generated by the class of all open rectangles in Rk.

Borel set: A set B ∈ Bk is called a Borel set in Rk; here Rk = {(x1, . . . , xk) : −∞ < xi < ∞, i = 1, . . . , k} denotes the k-dimensional Euclidean space.

Let Ω be a sample space associated with a random experiment and let F be the event
space (a σ-field of subsets of Ω). Recall that members of F are called events. Now we
provide a mathematical definition of probability based on a set of axioms.

Definition 1.3. (i) Let F be a σ-field of subsets of Ω. A probability function (or a probability measure) is a set function P, defined on F, satisfying the following three axioms:


(a) P(E) ≥ 0 for all E ∈ F (Axiom 1: Non-negativity);

(b) If E1, E2, . . . is a countably infinite collection of mutually exclusive events (i.e., Ei ∈ F, i = 1, 2, . . ., Ei ∩ Ej = ∅, i ̸= j) then

P(⋃_{i=1}^∞ Ei) = Σ_{i=1}^∞ P(Ei); (Axiom 2: Countably infinite additivity)

(c) P(Ω) = 1 (Axiom 3: Probability of the sample space is 1).

(ii) The triplet (Ω, F, P ) is called a probability space.
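For a finite sample space with F = P(Ω), a probability measure is specified by nonnegative weights on the outcomes summing to 1; the sketch below (our illustration, not from the notes) encodes a fair die and checks the three axioms on a disjoint pair of events:

```python
from fractions import Fraction

# Weights on outcomes of a fair die; F is taken to be the power set of Omega
weights = {i: Fraction(1, 6) for i in range(1, 7)}

def P(event):
    """P(E) = sum of the weights of the outcomes in E."""
    return sum(weights[w] for w in event)

omega = set(weights)
odd, even = {1, 3, 5}, {2, 4, 6}

assert all(P({w}) >= 0 for w in omega)    # Axiom 1: non-negativity
assert P(odd | even) == P(odd) + P(even)  # Axiom 2 (for a disjoint pair)
assert P(omega) == 1                      # Axiom 3
print(P({2, 4}))                          # 1/3
```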
