Basic Concepts of Probability
Probability and Independence of Events
By Amitava Bandyopadhyay
Learning Objectives
Understand the concepts of events and
probability of events
Understand the notion of conditional
probabilities and independence of different kinds
Understand the concept of inverse probabilities
and Bayes theorem
Understand specific concepts of lift, support,
sensitivity and specificity
Develop the ability to use these concepts to
formulate business problems and to provide
solutions to them
Events
In business analytics we often study the occurrence of events. A
customer coming into a store may or may not buy some items. In
this case buying is an event. We may be interested in knowing
the amount spent by the customer. In this case the amount spent
(or the range of spend) is an event. A particular machine may or
may not fail in a given time interval. In this case failure of the
machine in the given time interval is an event.
Loosely speaking, an event is something that happens. We attach
a numeric value to the outcome being observed.
An interesting point is that while dealing with events we are
dealing with a special kind of variable. The range of all possible
values of this variable is known in advance but the exact value
that will happen in the next instance is not known. Such a
variable is called a random variable.
Note: An event is a subset of values that the random variable can
assume
Events (Continued)
Consider the case of a customer walking into a retail
store. During her stay in the store, she may or may not
buy. Accordingly, we may define a random variable X
that takes two values: 0 if nothing was bought and 1 if
something was bought. Events of buying or not buying
may then be defined.
Usually events are denoted by capital letters such as A, B, C.
Note: Events are subsets of values of the random
variable. An important assumption is that the
values of the random variable are being
generated under similar conditions. This
assumption is often ignored, which is a hazardous
practice.
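A minimal Python sketch of this idea (the purchase records below are made up for illustration, not data from the slides):

    # Hypothetical purchase records for 10 store visits: 1 = bought, 0 = did not buy
    visits = [1, 0, 0, 1, 1, 0, 1, 0, 0, 1]

    # The event "something was bought" is the subset {1} of the values of X.
    # Its probability is estimated by the relative frequency of 1s.
    p_buy = sum(visits) / len(visits)
    print(p_buy)  # 0.5 for this made-up sample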
Examples
You are working for an automobile company. The company
wants to know how many times an automobile might fail during
the warranty period (say before travelling 10000 miles). The
number of failures is a random variable. Suppose we denote this
random variable by X. An event may be described as X ≤ 5, i.e.
the event that an automobile fails at most 5 times.
Suppose an automobile has been serviced and you want to
know how many miles it will travel before encountering the
next failure. Let X denote the number of miles. Here X is a
random variable. An event may be X ≥ 5000, i.e. the event that the
automobile runs at least 5000 miles before the next failure.
Note
Many real life situations are actually events that describe some
values assumed by a random variable
Number of vehicles sold during a month, number of accidents in a
day, number of customers coming to a retail store in a month,
number of telephone calls made by customers in a day to a call
center: all these are examples of random variables. An event
specifies some values of the random variable.
In business analytics you must always try to identify the
underlying random variable and the events of interest.
Probability of an Event
We will normally use the symbol P(A) to denote the
probability that the event A happens
Let A be the event that a personal loan given to a
particular customer turns out to be bad (cannot be
recovered)
Let P(A) = 0.03
This implies that the past record of the bank shows that 3%
of the personal loans turn out to be bad. (Note that we are
assuming that all personal loans are sanctioned under
more or less similar conditions. If the past records
contain a period of severe recession leading to the loss of
many jobs, the analyst should be careful in estimating the
probability; even dropping that period may be meaningful.)
Example-cum-Exercise
Customers rated their billing experience on a 7-point scale. The observed frequencies are:

Score        1    2    3    4    5    6    7
Frequency    1    3    6   13   72  135  130

Use the table to estimate the probability of each score.
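A quick Python sketch of how the relative frequencies give probability estimates (assuming, as in the conditional-probability exercise later, that the frequencies correspond to experience scores 1 through 7):

    # Frequencies of billing-experience scores 1 through 7 (from the table above)
    freq = {1: 1, 2: 3, 3: 6, 4: 13, 5: 72, 6: 135, 7: 130}
    total = sum(freq.values())   # 360 customers in all

    # P(score = 7): relative frequency of the event {X = 7}
    p_seven = freq[7] / total
    print(round(p_seven, 3))     # about 0.361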
Axioms of Probability
A function P that assigns a real number P(A)
to each event A is a probability
distribution or a probability measure if
it satisfies the following three axioms:
a. P(A) ≥ 0 for every event A
b. P(Ω) = 1, where Ω is the entire sample space
c. If A1, A2, ... are disjoint events, i.e. Ai ∩ Aj = ∅ for i ≠ j,
where ∅ is the empty set, then P(A1 ∪ A2 ∪ ...) = P(A1) + P(A2) + ...
Conditional Probability
The conditional probability of event A given event B,
written as P(A|B), is the relative frequency of A
given that B has happened.
Conditional probability: P(A|B) = P(A ∩ B) / P(B).
In terms of counts, P(A|B) = N_AB / N_B, where N_AB is the number of cases
in which both A and B occur and N_B is the number of cases in which B occurs.
In the table given in the example-cum-exercise
slide, what is the conditional probability that a
customer will rate his billing experience as 7
given that his experience score is > 5?
Suppose a family has three siblings. What is the
conditional probability that the family has three
daughters given that at least two out of the 3 siblings are girls?
Note: P(A|B) is defined only if P(B) > 0, i.e. only if B has a positive chance of happening.
Note that P(A|B) and P(B|A) are not the same.
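A small Python sketch for the two questions above (assuming boys and girls are equally likely and independent across siblings, and that the earlier frequency table corresponds to scores 1 through 7):

    from itertools import product

    # Enumerate the 8 equally likely girl/boy patterns for three siblings
    outcomes = list(product("GB", repeat=3))

    B = [o for o in outcomes if o.count("G") >= 2]       # at least two girls
    A_and_B = [o for o in B if o.count("G") == 3]        # all three are girls

    print(len(A_and_B) / len(B))                          # P(A|B) = N_AB / N_B = 0.25

    # Billing-experience question: P(score = 7 | score > 5)
    freq = {1: 1, 2: 3, 3: 6, 4: 13, 5: 72, 6: 135, 7: 130}
    n_B = freq[6] + freq[7]                               # cases with score > 5
    print(round(freq[7] / n_B, 3))                        # about 0.491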
An Important Point
Note that P(A|B) and P(B|A) are not the same. Consider the following
example.
An epidemiologist wants to assess the impact of smoking on the
incidence of lung cancer. From hospital records she collected data on
100 patients of lung cancer and she also collected data on 300
persons not suffering from lung cancer. She has classified the 400
samples into smokers and non-smokers, and the observations are
summarized below.

                 Lung Cancer
Smoker         Yes       No      Total
Yes             69      137        206
No              31      163        194
Total          100      300        400
Let A be the event that a person has lung cancer and let B be the
event that the person is a smoker. Can you estimate P(A|B) from the
table given above? (Think about how the totals of 100 and 300 arose.)
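A short Python sketch of what the table does and does not allow us to estimate:

    # Counts from the table above
    cancer = {"smoker": 69, "nonsmoker": 31}      # 100 lung-cancer patients
    healthy = {"smoker": 137, "nonsmoker": 163}   # 300 persons without lung cancer

    # P(B | A): probability of being a smoker given lung cancer.
    # This IS estimable, because the 100 cancer patients are a sample from that group.
    print(cancer["smoker"] / sum(cancer.values()))        # 0.69

    # The ratio 69 / (69 + 137) is NOT an estimate of P(A | B): the 100-to-300 split
    # between cancer and non-cancer subjects was fixed by the study design, so the
    # table alone says nothing about how common lung cancer is among smokers.
    print(cancer["smoker"] / (cancer["smoker"] + healthy["smoker"]))  # about 0.335 -- do not use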
Concept of Independence
We say that events A and B are independent in
case P(A|B) = P(A), i.e. the probability of A is not
impacted by the presence of event B
This definition implies that when A and B are
independent, P(A ∩ B) = P(A) · P(B)
Example: Suppose a machine may fail for three
different reasons and suppose these three reasons
happen independently. Let A, B and C denote the
events that reason 1, reason 2 and reason 3 are
present. Then P(A ∩ B ∩ C) = P(A) · P(B) · P(C)
Note: If A and B are independent, then Aᶜ and Bᶜ
are independent. In fact it can be easily shown that
Aᶜ and B, and A and Bᶜ, are also independent.
Examples of Independence
Suppose you are tossing a fair coin. Thus the
probability that a toss results in a head is 0.5.
Assuming that tosses are independent of each
other, what is the chance that 3 tosses will result
in 3 heads?
Suppose a machine has 20 different parts.
Suppose the parts fail independently of each other
and on any given day a part fails with 1% chance
only. Suppose the machine continues to operate if
all parts are operational and fails if one or more
parts fail. What is the chance that the machine will
fail on a randomly selected day?
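Sketch answers in Python, using the multiplication rule for independent events:

    # Three independent tosses of a fair coin: P(3 heads) = 0.5 * 0.5 * 0.5
    p_three_heads = 0.5 ** 3
    print(p_three_heads)                 # 0.125

    # Machine with 20 parts, each failing independently with probability 0.01 on a day.
    # The machine works only if every part works, so
    # P(machine fails) = 1 - P(all 20 parts work) = 1 - 0.99**20
    p_machine_fails = 1 - 0.99 ** 20
    print(round(p_machine_fails, 4))     # about 0.1821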
Mutually Independent
Events
A set of events A1, A2, ..., Ap are said to be
mutually independent in case for all
combinations 1 ≤ i < j < k < ... ≤ p, the
following multiplication rules hold:
P(Ai ∩ Aj) = P(Ai) P(Aj) ....................... (1)
P(Ai ∩ Aj ∩ Ak) = P(Ai) P(Aj) P(Ak) ............ (2)
...
P(A1 ∩ A2 ∩ ... ∩ Ap) = P(A1) P(A2) ... P(Ap) ... (p − 1)
Notes on Mutual
Independence
Mutual independence is a strong
condition.
Even though the condition, consisting of a set of
2^p − p − 1 equations (one for each subset of two or
more events), looks complicated, in most applications
its validity is obvious from the way the events arise
and requires no explicit checking.
Note, however, that the last equation alone does not
imply the other equations; all of them must hold.
Pairwise Independence
When the first equation involving two events
holds good for all possible choices of two events,
the events are said to be pairwise independent
Pairwise independence does not mean mutual
independence. Suppose two fair dice are thrown
and the following three events are defined
A means odd face with first die
B means odd face with second die
C means odd sum
Note that A, B and C are pairwise independent but not
mutually independent
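The claim can be checked by enumerating the 36 equally likely outcomes of the two dice; a Python sketch:

    from itertools import product

    outcomes = list(product(range(1, 7), repeat=2))   # 36 equally likely (die1, die2) pairs

    def prob(event):
        """Relative frequency of an event over the 36 equally likely outcomes."""
        return sum(1 for o in outcomes if event(o)) / len(outcomes)

    A = lambda o: o[0] % 2 == 1              # odd face with the first die
    B = lambda o: o[1] % 2 == 1              # odd face with the second die
    C = lambda o: (o[0] + o[1]) % 2 == 1     # odd sum

    # Pairwise independence: each two-event product rule holds (all equal 0.25)
    print(prob(lambda o: A(o) and B(o)), prob(A) * prob(B))
    print(prob(lambda o: A(o) and C(o)), prob(A) * prob(C))
    print(prob(lambda o: B(o) and C(o)), prob(B) * prob(C))

    # Not mutually independent: P(A and B and C) = 0, while P(A)P(B)P(C) = 0.125
    print(prob(lambda o: A(o) and B(o) and C(o)), prob(A) * prob(B) * prob(C))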
Exercise
In a certain county 60% of registered
voters support party A, 30% support
party B and 10% are independents.
When those voters were asked about
increasing military spending 40% of
supporters of A opposed it, 65% of
supporters of B opposed it and 55%
of the independents opposed it. What
is the probability that a voter
selected randomly in this county
opposes increased military spending?
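A Python sketch of the solution using the law of total probability (weighting each group's conditional probability by the group's share of voters):

    # P(oppose) = sum over groups of P(group) * P(oppose | group)
    p_group = {"A": 0.60, "B": 0.30, "Independent": 0.10}
    p_oppose_given = {"A": 0.40, "B": 0.65, "Independent": 0.55}

    p_oppose = sum(p_group[g] * p_oppose_given[g] for g in p_group)
    print(round(p_oppose, 3))   # 0.49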
Bayes Theorem
Bayes theorem allows us to look at probability from an inverse
perspective
Bayes theorem states that
P(B|A) = P(A|B) P(B) / P(A)
Let B1, B2, ..., Bp be a set of mutually exclusive and collectively
exhaustive events such that P(Bj) > 0 for j = 1, 2, ..., p. In this set up
Bayes theorem may be stated as
P(Bj|A) = P(A|Bj) P(Bj) / Σk P(A|Bk) P(Bk), for j = 1, 2, ..., p
This simple yet intelligent way of looking at probability is often
very effective. We may not be able to find P(Bj|A) directly but it
may be far easier to estimate P(A|Bj).
Construct examples of the previous statement. Recall the example
of smoking and lung cancer. Can you use Bayes theorem to
estimate probability of lung cancer given smoking habit?
Application of Bayes
Theorem
Suppose I divide my email into three categories: A1 =
spam; A2 = administrative and A3 = technical. From
previous experience I find that P(A1) = 0.3; P(A2) = 0.5;
and P(A3) = 0.2. Let B be the event that the email contains
the word "free" and has at least one occurrence of the
character "!". From previous experience I have noted that
P(B|A1) = 0.95, P(B|A2) = 0.005 and P(B|A3) = 0.001. I
receive an email with the word "free" and a "!". What is the
probability that it is spam?
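A Python sketch of the Bayes calculation for this example:

    # Priors and likelihoods from the slide
    p_prior = {"spam": 0.30, "admin": 0.50, "tech": 0.20}
    p_B_given = {"spam": 0.95, "admin": 0.005, "tech": 0.001}

    # Bayes theorem: P(spam | B) = P(B | spam) P(spam) / sum_j P(B | A_j) P(A_j)
    denom = sum(p_B_given[c] * p_prior[c] for c in p_prior)
    p_spam_given_B = p_B_given["spam"] * p_prior["spam"] / denom
    print(round(p_spam_given_B, 4))   # about 0.9906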
Events of Interest
Note that sensitivity and specificity do not give the
probabilities of the events of interest. (Sensitivity is the probability
of a positive test result given that the condition is present; specificity
is the probability of a negative result given that the condition is absent.)
We are actually interested in the positive and negative predictive
values (abbreviated as PPV and NPV respectively), defined as
PPV = P(condition present | test positive) and
NPV = P(condition absent | test negative).
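As an illustration, Bayes theorem turns sensitivity, specificity and prevalence into the predictive values; the numbers below are made up for illustration, not taken from the slides:

    # Illustrative (made-up) values
    sensitivity = 0.90    # P(test positive | condition present)
    specificity = 0.95    # P(test negative | condition absent)
    prevalence  = 0.02    # P(condition present)

    # Bayes theorem gives the predictive values from these three numbers
    p_pos = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
    ppv = sensitivity * prevalence / p_pos
    npv = specificity * (1 - prevalence) / (1 - p_pos)
    print(round(ppv, 3), round(npv, 3))   # about 0.269 and 0.998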
Why is this Important?
Suppose we are trying to develop a classification
model to understand what leads to failure of
vehicles. It is not possible to conduct experiments
where we observe impact of different conditions on
the event of failure of vehicles in a given period of
time. However, whenever vehicles fail, the failures
will be reported. Suppose the conditions are
captured by sensors. Thus we will have data on
conditions given that failure has happened. We can,
therefore, estimate the probability of different
conditions given that failure has happened. From
warranty report data we can also estimate the
unconditional probability of failure. We can,
therefore, use the methodology given above to
classify whether vehicles will fail under given
conditions
Further Insights
Note that the previous discussions show how we can
estimate P(B|A), where B is the failure event, when P(A|B)
and P(B) are known.
Generally we would like to estimate the conditional
probability of failure given many events rather than only one.
Thus we may like to estimate P(B | A1 ∩ A2 ∩ ... ∩ Ak). Note that
using Bayes theorem,
P(B | A1 ∩ A2 ∩ ... ∩ Ak) = P(A1 ∩ A2 ∩ ... ∩ Ak | B) P(B) / P(A1 ∩ A2 ∩ ... ∩ Ak).
Estimating the joint probability P(A1 ∩ A2 ∩ ... ∩ Ak | B) directly is difficult;
the conditional independence assumption introduced next makes this tractable.
Concept of Conditional
Independence
Let A, B and C be three events
A and B are said to be conditionally independent given C, in
case P(A | B ∩ C) = P(A | C)
Conditional independence is often a reasonable assumption as
we show in the subsequent examples
Consider the following events
A = Event that lecture is delivered by Amitava (there are two teachers
Amitava and Boby)
B = Event that lecturer arrives late
C = Event that lecture concerns stat theory (theory and practical are taught)
Suppose Amitava has a higher chance of delivering lecture on stat theory
Suppose Amitava is likelier to be late
Notice that the conditional probability of the lecture being on stat theory given that
the lecturer is Amitava does not depend on whether the lecturer arrives late.
Thus P(C | A ∩ B) = P(C | A), i.e. C and B are conditionally independent given A.
Example
Consider the following problem where data were collected on
customers of a computer store. We need to develop a classification
mechanism so that customers may be classified as buyers or
non-buyers given their profile. We will use the Naïve Bayes
classification methodology to accomplish this objective.
Data Table
Age      Income    Student   Credit Rating   Buys Computer
<= 30    High      No        Fair            No
<= 30    High      No        Excellent       No
31-40    High      No        Fair            Yes
> 40     Medium    No        Fair            Yes
> 40     Low       Yes       Fair            Yes
> 40     Low       Yes       Excellent       No
31-40    Low       Yes       Excellent       Yes
<= 30    Medium    No        Fair            No
<= 30    Low       Yes       Fair            Yes
> 40     Medium    Yes       Fair            Yes
<= 30    Medium    Yes       Excellent       Yes
31-40    Medium    No        Excellent       Yes
31-40    High      Yes       Fair            Yes
> 40     Medium    No        Excellent       No
Classification Mechanism
The classifier aims at developing a method such that
optimal allocation to one of the classes (buys
computer / does not buy computer) is made for any
customer with a given combination of age, income,
status (student or not) and credit rating
Let B be the response variable that takes two values.
B = 0 means the customer does not buy a computer
and B = 1 means s/he buys a computer.
Now P(B = 0 | Age, Income, Status, Credit Rating)
and P(B = 1 | Age, Income, Status, Credit Rating)
need to be compared using the Naïve Bayes approach.
Rather than estimating these probabilities exactly, we compute values
proportional to them: the common denominator P(Age, Income, Status,
Credit Rating) may be ignored, and under the conditional independence
assumption each numerator is P(B) · P(Age|B) · P(Income|B) ·
P(Status|B) · P(Credit Rating|B).
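A minimal Python sketch of this computation on the Data Table above (the customer profile classified at the end is chosen for illustration, and Laplace smoothing is omitted for brevity):

    # Training data from the Data Table: (age, income, student, credit_rating, buys)
    data = [
        ("<=30", "High",   "No",  "Fair",      "No"),
        ("<=30", "High",   "No",  "Excellent", "No"),
        ("31-40","High",   "No",  "Fair",      "Yes"),
        (">40",  "Medium", "No",  "Fair",      "Yes"),
        (">40",  "Low",    "Yes", "Fair",      "Yes"),
        (">40",  "Low",    "Yes", "Excellent", "No"),
        ("31-40","Low",    "Yes", "Excellent", "Yes"),
        ("<=30", "Medium", "No",  "Fair",      "No"),
        ("<=30", "Low",    "Yes", "Fair",      "Yes"),
        (">40",  "Medium", "Yes", "Fair",      "Yes"),
        ("<=30", "Medium", "Yes", "Excellent", "Yes"),
        ("31-40","Medium", "No",  "Excellent", "Yes"),
        ("31-40","High",   "Yes", "Fair",      "Yes"),
        (">40",  "Medium", "No",  "Excellent", "No"),
    ]

    def naive_bayes_score(profile, label):
        """Value proportional to P(B = label | profile) under the
        conditional independence (Naive Bayes) assumption."""
        rows = [r for r in data if r[-1] == label]
        score = len(rows) / len(data)                    # prior P(B = label)
        for j, value in enumerate(profile):              # product of P(attribute | label)
            score *= sum(1 for r in rows if r[j] == value) / len(rows)
        return score

    # Classify a new customer: age <= 30, medium income, student, fair credit rating
    profile = ("<=30", "Medium", "Yes", "Fair")
    scores = {label: naive_bayes_score(profile, label) for label in ("Yes", "No")}
    print(scores)                        # the larger score wins
    print(max(scores, key=scores.get))   # "Yes" for this profile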
Exercise
Develop a classification mechanism
for the IRIS database
Hint: Note that the response variable
has three classes. Also observe that
there are four explanatory variables
Divide the explanatory variables into
certain classes and find the conditional
probabilities of the explanatory variables
given the different values of the
response variable
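One possible starting point (a sketch that assumes scikit-learn is available; the three-way binning of each explanatory variable is an arbitrary illustrative choice, not prescribed by the slides):

    import numpy as np
    from sklearn.datasets import load_iris

    iris = load_iris()
    X, y = iris.data, iris.target            # 4 numeric features, 3 classes

    # Discretize each explanatory variable into 3 classes (low / medium / high)
    bins = [np.quantile(X[:, j], [1/3, 2/3]) for j in range(X.shape[1])]
    Xd = np.column_stack([np.digitize(X[:, j], bins[j]) for j in range(X.shape[1])])

    # Conditional probabilities of each discretized feature value given the class
    for c in np.unique(y):
        rows = Xd[y == c]
        prior = len(rows) / len(Xd)
        cond = [{v: np.mean(rows[:, j] == v) for v in (0, 1, 2)} for j in range(4)]
        print(iris.target_names[c], round(prior, 2), cond[0])   # first feature shown as a sample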
Review Questions
What is a random experiment?
What is an event? What is the meaning of
probability?
Let A and B be two events. What is meant
by A and B are independent?
Define conditional probability. Are P(A / B)
and P(B / A) the same? If not, why not?
Explain the concept of conditional
independence. How is it used for
classification?