Unit 2
• Let's now pretend that our universe involves a research study on humans, and the
event ‘A’ is people in that study who have cancer.
• If our study has 100 people and A has 25 people, the probability of A or P(A) is
25/100.
• Probability is the ratio of the number of favorable outcomes to the total number of
possible outcomes.
• Frequentist approach: here we observe several instances of the event and count the
number of times A was satisfied. Dividing this count by the number of observations gives
an approximation of the probability.
Bayesian Approach:
• The Bayesian approach differs by dictating that probabilities must be discerned
(determined) using theoretical means.
• Using the Bayes approach, we would have to think a bit more critically about
events and why they occur.
Bayesian versus Frequentist
• The important part of the Frequentist approach is the relative frequency.
• The relative frequency of an event is how often an event occurs divided by the
total number of observations.
• Example – marketing stats
• Let's say that you are interested in ascertaining how often a person who visits your
website is likely to return on a later date. This is sometimes called the rate of
repeat visitors.
• We can calculate relative frequency.
– So, in this case, we can take the visitor logs and calculate the relative
frequency of event A (repeat visitors).
– Let's say, of the 1,458 unique visitors in the past week, 452 were repeat
visitors.
– We can calculate this as follows:
– P(A) ≈ RF(A) = 452/1458 ≈ 0.31
• Proof: As we increase the sample size of our relative frequency, the frequency
approaches the actual average (probability) of 5.
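The relative frequency above can be checked with a few lines of Python. This is a minimal sketch: the visitor counts come from the example, while the simulation (its seed, the sample sizes, and treating 0.31 as the "true" probability) is an assumption added only to illustrate how the relative frequency settles down as the sample grows.

```python
import random

# Figures taken from the repeat-visitor example.
unique_visitors = 1458
repeat_visitors = 452

# Relative frequency of event A (repeat visitors).
rf_repeat = repeat_visitors / unique_visitors
print(f"RF(A) = {rf_repeat:.3f}")

# Hypothetical simulation: pretend the true probability is 0.31 and
# watch the relative frequency approach it as the sample size grows.
true_p = 0.31
rng = random.Random(42)  # fixed seed so the run is repeatable


def simulated_relative_frequency(n):
    """Relative frequency of 'repeat visits' in n simulated visits."""
    hits = sum(1 for _ in range(n) if rng.random() < true_p)
    return hits / n


for n in (10, 100, 1_000, 100_000):
    print(n, round(simulated_relative_frequency(n), 3))
```

Small samples can land far from 0.31; the larger ones should sit close to it.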
Why Bayes?
Because Bayes answers the questions we
really care about.
Pr(I have disease | test +) vs Pr(test + | disease)
• Let's say that our Universe is 100 people who showed up for an experiment, in
which a new test for cancer is being developed:
• Here, the red circle, A, represents 25 people who actually have cancer.
• Using the relative frequency approach, we can say that
– P(A) = number of people with cancer/number of people in study,
– that is, 25/100 = ¼ = .25.
This means that there is a 25% chance that someone has cancer.
Compound events
• A second event, called B, as shown, contains the people for whom the test was
positive (it claimed that they had cancer).
• Let's say that this is for 30 people.
– So, P(B) = 30/100 = 3/10 = .3.
– This means that there is a 30% chance that the test said positive for any given
person:
• These are two separate events, but they interact with each other. Namely, they
might intersect or have people in common, as shown here:
• A intersect B, or A ∩ B, is the set of people for whom the test was positive (B)
and who actually do have cancer (A). Let's say that's 20 people.
• The test was positive for these 20 people and they do have cancer, as shown here:
• Suppose we want the probability that someone has cancer or the test came back positive.
– This would be the total sum (or union) of the two events, namely, the sum of
5, 20, and 10, which is 35.
– So, 35/100 people either have cancer or had a positive test outcome.
– That means, P(A or B) = 35/100 = .35 = 35%.
• We have people in the following four different classes:
• Pink: This refers to the people who have cancer and had a negative test
outcome
• Purple (A intersect B): These people have cancer and had a positive test
outcome
• Blue: This refers to the people with no cancer and a positive test outcome
• White: This refers to the people with no cancer and a negative test outcome
• So, effectively, the only times the test was accurate were in the white and purple
regions.
• In the blue and pink regions, the test was incorrect.
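The four regions can be reproduced with Python sets. This is a sketch only: the person IDs and which IDs fall in which event are arbitrary assumptions chosen to match the counts in the example (25 with cancer, 30 positive tests, 20 in both).

```python
# Hypothetical person IDs 0..99; the assignment of IDs to events is
# arbitrary and only reproduces the counts used in the example.
universe = set(range(100))
A = set(range(0, 25))   # 25 people who have cancer
B = set(range(5, 35))   # 30 people whose test was positive

both = A & B                # purple: cancer and positive test
either = A | B              # cancer or positive test (union)
pink = A - B                # cancer, negative test
blue = B - A                # no cancer, positive test
white = universe - either   # no cancer, negative test


def prob(event):
    """Probability of an event as a share of the universe."""
    return len(event) / len(universe)


print(len(pink), len(both), len(blue), len(white))
print(prob(A), prob(B), prob(both), prob(either))
```

The union has 5 + 20 + 10 = 35 members, which matches P(A or B) = .35.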
Conditional Probability
• Select an arbitrary person from this study of 100 people. Assume that their
test result was positive.
• What is the probability of them actually having cancer?
– So the event B has already taken place: their test came back
positive.
• The question now is: what is the probability that they have cancer, that is, P(A),
given that B has occurred?
• This is called the conditional probability of A given B, or P(A|B).
• P(A|B) = P(A ∩ B) / P(B) = .20 / .30 ≈ .67
• There is about a 67% chance that if a test result came back positive, that person had
cancer.
• In reality, this is the main probability that the experimenters want. They want to
know how good the test is at predicting cancer.
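A short sketch of the conditional probability calculation, using the counts from the study. The variable names are illustrative. It also computes the reverse conditional, P(B|A), to show that the two are different numbers.

```python
from fractions import Fraction

# Counts from the study of 100 people.
n_total = 100
n_cancer = 25               # event A
n_positive = 30             # event B
n_cancer_and_positive = 20  # A ∩ B

p_a = Fraction(n_cancer, n_total)
p_b = Fraction(n_positive, n_total)
p_a_and_b = Fraction(n_cancer_and_positive, n_total)

# P(A|B): chance of cancer given a positive test.
p_a_given_b = p_a_and_b / p_b
# P(B|A): chance of a positive test given cancer.
p_b_given_a = p_a_and_b / p_a

print(p_a_given_b, float(p_a_given_b))
print(p_b_given_a, float(p_b_given_a))
```

P(A|B) is 2/3 while P(B|A) is 4/5, so the order of conditioning matters.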
The Rules of Probability
• These rules help us calculate compound probabilities with ease.
The addition rule
• The addition rule is used to calculate the probability of either/or events.
• To calculate
– P(A ∪ B) = P(A or B), we use the following formula:
– P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
• To get the union of the two events, we have to add together the area of the circles
in the universe.
• We subtract P(A and B) because when we add the two circles, we
are adding the area of intersection twice, as shown in the following diagram:
• If A is the event that someone has cancer, and B is that the test result was positive,
we have:
P(A or B) = P(A) + P(B) – P(A and B)
= .25 + .30 - .2 = .35
• This was calculated before visually in the diagram.
Addition Rule of Probability
Addition Rule: If A and B are two events in a probability experiment, then the probability that either
one of the events will occur is:
P(A or B)=P(A)+P(B)−P(A and B)
Venn diagram representation: P(A∪B)=P(A)+P(B)−P(A∩B)
Example: On a six-sided die, each side has a number between 1 and 6. What is the probability of
throwing 3 or 4?
The two outcomes cannot occur together, so P(3 and 4) = 0 and the chance of rolling either 3 or 4 is:
1/6 + 1/6 = 2/6 = 1/3
Addition Rule of Probability
Example:
• If a single card is drawn from a regular pack of cards, what is the probability that the card is either a
queen or spade?
Solution:
Let X be the event of picking a queen and Y be the event of picking a spade.
P(X)=4/52
P(Y)=13/52
The two events are not mutually exclusive, as there is one favorable outcome in which the card can be
both a queen and a spade.
P(X and Y)=1/52
P(X or Y)=P(X)+P(Y)−P(X and Y)=4/52+13/52−1/52=16/52=4/13
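The card example can be verified by listing all 52 cards and counting. This is a sketch; the rank and suit labels are just one possible encoding of a standard deck.

```python
from fractions import Fraction
from itertools import product

# Build a standard 52-card deck as (rank, suit) pairs.
ranks = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
suits = ["spades", "hearts", "diamonds", "clubs"]
deck = list(product(ranks, suits))


def prob(predicate):
    """Share of cards in the deck that satisfy the predicate."""
    return Fraction(sum(1 for card in deck if predicate(card)), len(deck))


p_queen = prob(lambda c: c[0] == "Q")                        # P(X)
p_spade = prob(lambda c: c[1] == "spades")                   # P(Y)
p_both = prob(lambda c: c[0] == "Q" and c[1] == "spades")    # P(X and Y)
p_either = prob(lambda c: c[0] == "Q" or c[1] == "spades")   # P(X or Y)

# The addition rule should agree with the direct count.
print(p_either, p_queen + p_spade - p_both)
```

Both the direct count and the addition rule give 16/52, which reduces to 4/13.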
Independence
• Two events are independent if one event does not affect the outcome of
the other, that is, P(B|A) = P(B) and P(A|B) = P(A).
• If two events are independent, then:
P(A ∩ B) = P(A) · P(B|A) = P(A) · P(B)
Example: Flip a coin and get heads and flip another coin and get tails
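The two-coin example can be checked by enumerating the four equally likely outcomes. This is a minimal sketch that assumes both coins are fair.

```python
from fractions import Fraction
from itertools import product

# All outcomes of flipping two fair coins: (first, second).
outcomes = list(product("HT", repeat=2))


def prob(event):
    """Share of outcomes for which the event holds."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))


p_first_heads = prob(lambda o: o[0] == "H")
p_second_tails = prob(lambda o: o[1] == "T")
p_joint = prob(lambda o: o == ("H", "T"))

# For independent events the joint probability equals the product.
print(p_joint, p_first_heads * p_second_tails)
```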
Complementary events
• The complement of A, written Ā, is the opposite or negation of A.
• For example, if A is the event where someone has cancer, Ā is the event where
someone is cancer free.
• Because A and Ā together cover every outcome, P(Ā) = 1 − P(A).
The Rules of Probability
Complementary events:
P(A) = 1 – (P(2)+P(3))
= 1 – (1/36 + 2/36)
= 1 – (3/36)
= 33/36
≈ .92
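The complement calculation can be reproduced by enumeration. The slide does not restate the problem, so this sketch assumes A is the event that the sum of two fair dice is greater than 3, which matches the values 1/36 and 2/36 used for P(2) and P(3).

```python
from fractions import Fraction
from itertools import product

# All 36 outcomes of rolling two fair dice (assumed setting).
rolls = list(product(range(1, 7), repeat=2))


def prob(event):
    """Share of rolls for which the event holds."""
    return Fraction(sum(1 for r in rolls if event(r)), len(rolls))


p_sum_2 = prob(lambda r: sum(r) == 2)
p_sum_3 = prob(lambda r: sum(r) == 3)

# Complement rule: P(sum > 3) = 1 - (P(sum = 2) + P(sum = 3)).
p_a = 1 - (p_sum_2 + p_sum_3)
# Direct count for comparison.
p_a_direct = prob(lambda r: sum(r) > 3)

print(p_a, float(p_a))
```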
Difference Between Mutually Exclusive and
Independent Events
• Two events are mutually exclusive when they cannot occur at the same time,
whereas two events are independent when the occurrence of one does not
affect the probability of the other.
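• The true positives are the tests correctly predicting positive (cancer) == 20
• The true negatives are the tests correctly predicting negative (no cancer) == 65
• The false positives are the tests incorrectly predicting positive (cancer) == 10
• The false negatives are the tests incorrectly predicting negative (no cancer) == 5
• The first two classes indicate where the test was correct or true.
• The last two classes indicate where the test was incorrect or false.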
• Advanced Probability
– To explore more complicated theorems of probability and how we can
use them in a predictive capacity.
– Bayes theorem and random variables, give rise to common machine
learning algorithms, such as the Naïve Bayes algorithm
Concepts:
• Exhaustive events
• Bayes theorem
• Basic prediction rules
• Random variables
Advanced Probability
• Given a set of events {temperature < 60, temperature > 90}, these events
are not collectively exhaustive because there is a third option that is not
given in this set of events: The temperature could be between 60 and 90.
– However, they are mutually exclusive because both cannot happen
at the same time.
Bayesian Approach:
• When applying Bayes, the following three things are considered along with
how they all interact with each other:
• A prior distribution
• A posterior distribution
• A likelihood
• Basically, we are concerned with finding the posterior. - That's the thing
we want to know.
• Bayes can be interpreted as trying to figure out P(H|D) (the probability that
our hypothesis is correct, given the data).
• P(H) - the probability of the hypothesis before we observe the data, called
the prior probability or just prior
• P(H|D) - what we want to compute, the probability of the hypothesis after
we observe the data, called the posterior
• P(D|H) - the probability of the data under the given hypothesis, called the
likelihood
• P(D) - the probability of the data under any hypothesis (the normalizing constant)
Bayes Theorem
• P(H|D) = P(D|H) · P(H) / P(D)
• Applications: Bayes theorem shows up in a lot of applications, usually
when we need to make fast decisions based on data and probability. Most
recommendation engines, such as Netflix's, use some elements of
Bayesian updating.
Example – Titanic Data
• A very famous dataset involves looking at the survivors of the sinking of
the Titanic in 1912. We will use an application of probability in order to
figure out if there were any demographic features that showed a
relationship to passenger survival.
• Mainly, we are curious to see if we can isolate any features of our dataset
that can tell us more about the types of people who were likely to survive
this disaster.
• Each row represents a single passenger on the ship, and, for now, we are
looking at two specific features: the gender of the individual and whether
or not they survived.
• For example, the first row represents a man who did not survive while the
fourth row (with index 3) represents a woman who did survive.
Python code – To find survival analysis of passengers based on the attribute
import pandas as pd
titanic = pd.read_csv('C:/Users/Nithya/Desktop/titanic.csv') # read in a CSV file
titanic = titanic[['Sex', 'Survived']] # keep only the Sex and Survived columns
titanic.head()
num_rows = float(titanic.shape[0])
print(num_rows)
p_survived = (titanic.Survived==1).sum() / num_rows
print(p_survived)
p_notsurvived = 1 - p_survived
print(p_notsurvived)
p_male = (titanic.Sex=="male").sum() / num_rows
print(p_male)
p_female = 1 - p_male # == .35
print(p_female)
number_of_women = titanic[titanic.Sex=='female'].shape[0]
print(number_of_women)
women_who_lived = titanic[(titanic.Sex=='female') & (titanic.Survived==1)].shape[0]
print(women_who_lived)
p_survived_given_woman = women_who_lived / (number_of_women)
print(p_survived_given_woman)
Medical test example:
P(disease | test +) = (0.01)(0.95) / [ (0.01)(0.95) + (0.99)(0.03) ]
= 0.0095 / (0.0095 + 0.0297) ≈ 0.24
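The medical test calculation is Bayes' theorem with the denominator expanded over "disease" and "no disease". A minimal sketch follows; the function and parameter names (prior, sensitivity, false_positive_rate) are illustrative labels, not taken from the slides.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(disease | test +) via Bayes' theorem.

    prior               -- P(disease)
    sensitivity         -- P(test + | disease)
    false_positive_rate -- P(test + | no disease)
    """
    # P(test +), summed over both hypotheses (the normalizing constant).
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / evidence


# 1% of people have the disease, the test catches 95% of cases,
# and it wrongly flags 3% of healthy people.
p_disease_given_positive = posterior(0.01, 0.95, 0.03)
print(round(p_disease_given_positive, 4))
```

Even with a fairly accurate test, the low prior keeps the posterior near 24%.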
Typical statistics problem: There is a parameter, θ, that we
want to estimate, and we have some data.
• Examples:
One Die- 6 outcomes
One coin- 2 outcomes
One deck of cards- 52 outcomes; 4 Aces, 12
face cards, and 36 non-face cards(ie 2-10);
Basic Probabilities
P(jack, tails) = (4/52)(1/2) = 4/104 ≈ 0.04 = 4%
Compound Event Notations
Compound Events
• When the outcome of one event does not
affect the outcome of a second event, these
are called independent events.
• A random variable is a function that maps each outcome in the sample space
of an event (the set of all possible outcomes) to a numerical value; its
distribution then gives each value a probability (between 0 and 1).
Random Variables
Discrete random variables
• A discrete random variable only takes on a countable number of
possible values.
• Example:
Outcome 1 2 3 4
Probability 0.1 0.3 0.4 0.2
• Solution:
x 1 2 3 4 5 6
p 0.1 0.1 0.1 0.1 0.1 0.5
xp 0.1 0.2 0.3 0.4 0.5 3
• Solution:
x 0 1 2 3 4
p 0.10 0.20 0.30 0.25 0.15
E(X) = Σ [ xi * P(xi) ]
E(X) = 0*0.10 + 1*0.20 + 2*0.30 + 3*0.25 + 4*0.15
= 2.15
Expectation of a random variable – Mean
Example: What is the expected value when we roll a fair die?
• Solution:
• There are six possible outcomes: 1, 2, 3, 4, 5, 6. Each of these has a
probability of 1/6 of occurring. Let X represent the outcome of the
experiment.
• Therefore P(X = 1) = 1/6 (this means that the probability that the
outcome of the experiment is 1 is 1/6)
P(X = 2) = 1/6 (the probability that you throw a 2 is 1/6)
P(X = 3) = 1/6 (the probability that you throw a 3 is 1/6)
P(X = 4) = 1/6 (the probability that you throw a 4 is 1/6)
P(X = 5) = 1/6 (the probability that you throw a 5 is 1/6)
P(X = 6) = 1/6 (the probability that you throw a 6 is 1/6)
• E(X) = 1×P(X = 1) + 2×P(X = 2) + 3×P(X = 3) + 4×P(X=4) + 5×P(X=5) +
6×P(X=6)
• Therefore E(X) = 1/6 + 2/6 + 3/6 + 4/6 + 5/6 + 6/6 = 7/2
• So the expectation is 3.5 .
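The expected value is a probability-weighted sum, which is short to write in Python. This sketch represents a distribution as a dictionary from value to probability (an illustrative choice) and uses exact fractions for the fair die.

```python
from fractions import Fraction


def expected_value(dist):
    """E(X) = sum of x * P(X = x) over all values x."""
    return sum(x * p for x, p in dist.items())


# Fair die: each face 1..6 has probability 1/6.
fair_die = {face: Fraction(1, 6) for face in range(1, 7)}
e_die = expected_value(fair_die)
print(e_die, float(e_die))

# Sanity check: the probabilities of a distribution must sum to 1.
print(sum(fair_die.values()))
```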
Expectation of a random variable
– Variance
• The variance of a discrete random variable X measures the spread, or
variability, of the distribution, and is defined by Var(X) = Σx²p − μ²
• The variance of a random variable tells us something about the spread of
the possible values of the variable. For a discrete random variable X, the
variance of X is written as Var(X).
Var(X) = E[ (X – m)² ] where m is the expected value E(X) (written μ above)
• This can also be written as:
Var(X) = E(X²) – m²
• Example:
x 1 2 3 4 5 6
p 0.1 0.1 0.1 0.1 0.1 0.5
E(X) = 4.5 and E(X²) = 23.5, so Var(X) = 23.5 – 4.5² = 3.25
σ = √Var(X)
The Standard Deviation is 1.803
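The variance example can be reproduced with the shortcut formula Var(X) = E(X²) − (E(X))². This is a sketch; the dictionary representation and the helper names are illustrative.

```python
import math


def expected_value(dist):
    """E(X) for a distribution given as {value: probability}."""
    return sum(x * p for x, p in dist.items())


def variance(dist):
    """Var(X) = E(X^2) - (E(X))^2."""
    mean = expected_value(dist)
    mean_of_squares = sum(x ** 2 * p for x, p in dist.items())
    return mean_of_squares - mean ** 2


# Distribution from the example: faces 1-5 at 0.1 each, face 6 at 0.5.
dist = {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5}

mean = expected_value(dist)
var = variance(dist)
sd = math.sqrt(var)
print(round(mean, 3), round(var, 3), round(sd, 3))
```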
Expectation of a random variable – Mean, Variance
Example:
x 1 2 3 4
p(x) .10 .30 .40 .20
Answer:
E(X) = (.10)(1) + (.30)(2) + (.40)(3) + (.20)(4) = 2.7.
Var(X) = .81.
sd(X) = √Var(X) = 0.9.
Expectation of a random variable – Mean, Variance
Example: The project has a 2% chance of failing completely and a 26% chance of
being a great success! Calculate the expected value of success with its variation. Also
find the chance that our product will have success rate of 3 or higher.
Solution:
E[X] = 0(0.02) + 1(0.07) + 2(0.25) + 3(0.4) + 4(0.26) = 2.81
So, the manager can expect a success score of about 2.81 out of this project.
Variance = V[X] = 0.93, so the standard deviation is √0.93 ≈ 0.97
We could say that our project will have an expected score of 2.81 plus or minus 0.97,
meaning that we can expect something between 1.84 and 3.78.
To find success rate ≥ 3
P(X >= 3) = P(X = 3) + P(X = 4) = .4 + .26 = .66 = 66%
• This means that we have a 66% chance that our product will rate as either a 3 or a 4.
• Another way to calculate this would be to use the complement, as shown here:
P(X >= 3) = 1 – P(X < 3)
P(X < 3) = P(X = 0) + P(X = 1) + P(X = 2) = 0.02 + 0.07 + 0.25 = .34
1 – P(X < 3) = 1 - .34 = .66 = P(X >= 3)
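The project example can be recomputed from the probabilities that appear in the E[X] line (0.02, 0.07, 0.25, 0.4, 0.26 for scores 0 to 4). This is a sketch with illustrative variable names; it reports both the variance and the standard deviation.

```python
import math

# Score -> probability, as used in the E[X] calculation.
scores = {0: 0.02, 1: 0.07, 2: 0.25, 3: 0.40, 4: 0.26}

mean = sum(x * p for x, p in scores.items())
var = sum((x - mean) ** 2 * p for x, p in scores.items())
sd = math.sqrt(var)

# P(X >= 3), computed directly and via the complement.
p_at_least_3 = sum(p for x, p in scores.items() if x >= 3)
p_via_complement = 1 - sum(p for x, p in scores.items() if x < 3)

print(round(mean, 2), round(var, 2), round(sd, 2))
print(round(p_at_least_3, 2), round(p_via_complement, 2))
```

The spread around the mean is measured by the standard deviation, which is the square root of the variance.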
Types of discrete random variable
Types of Discrete Random Variables:
• Binomial
• Geometric
• Poisson
Binomial random variables:
A binomial setting has the following four conditions:
• The possible outcomes are either success or failure
• The outcomes of trials cannot affect the outcome of another trial
• The number of trials was set (a fixed sample size)
• The chance of success of each trial must always be p
x P(X = x)
5 0.102919
6 0.036757
7 0.009002
8 0.001447
9 0.000138
10 5.9E-06
[Figure: probability plotted against value of x, for x = 0 to 10]
Types of discrete random variable
Binomial random variables – Example Problems:
The probability mass function (PMF) for a binomial random variable:
P(X = k) = C(n, k) · p^k · (1 − p)^(n − k), where C(n, k) = n! / (k! (n − k)!)
• From here, we can calculate an expected value and the variance of this variable:
E(X) = np and Var(X) = np(1 − p)
• So, this family can expect to have probably 1 or 2 kids with type O blood!
Binomial random variables
Example – blood types
• What if we want to know the probability that at least 3 of their kids have
type O blood?
Solution:
• To know the probability that at least three of their kids have type O blood,
we can use the following formula for discrete random variables:
• So, there is about a 10% chance that at least three of their kids have type O
blood.
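A sketch of the blood type calculation with the binomial PMF. The slides do not state the family size or the chance of type O blood, so n = 5 children and p = 0.25 are assumptions; they are consistent with the "1 or 2 kids" and "about 10%" results quoted above.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for a binomial random variable with n trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


# Assumed parameters (not given on the slides).
n_children = 5
p_type_o = 0.25

expected = n_children * p_type_o                      # E(X) = np
variance = n_children * p_type_o * (1 - p_type_o)     # Var(X) = np(1 - p)

# P(X >= 3) = P(X = 3) + P(X = 4) + P(X = 5)
p_at_least_3 = sum(
    binomial_pmf(k, n_children, p_type_o) for k in range(3, n_children + 1)
)

print(expected, variance, round(p_at_least_3, 4))
```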
Binomial random variable Applications:
• A binomial random variable is a discrete random variable that
counts the number of successes in a binomial setting.
• p=.09
• q=.91
• n=6
• x=2
Binomial Distribution Problems
p=.20
q= .80
n=18
x=5
Types of discrete random variable
2. Geometric random variables
• It is actually quite similar to the binomial random variable in that we are
concerned with a setting in which a single event is occurring over and over.
• The major difference is that we are not fixing the sample size.
Four conditions:
• The possible outcomes are either success or failure
• The outcomes of trials cannot affect the outcome of another trial
• The number of trials was not set
• The chance of success of each trial must always be p
Note: These are the exact same conditions as a binomial variable, except the third
condition.
• A geometric random variable is a discrete random variable, X, that counts the
number of trials needed to obtain one success.
• The parameters are p = the chance of success of each trial and (1 − p) = the chance
of failure of each trial. The formula for the PMF is as follows:
P(X = x) = (1 − p)^(x − 1) · p
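The geometric PMF says: fail x − 1 times, then succeed once. A minimal sketch, using a fair coin (p = 0.5) as a hypothetical example that is not from the slides:

```python
def geometric_pmf(x, p):
    """P(X = x): the first success occurs on trial x (x = 1, 2, ...)."""
    if x < 1:
        raise ValueError("x must be a positive integer")
    return (1 - p) ** (x - 1) * p


# Hypothetical example: first head appears on the third flip of a fair coin.
p_success = 0.5
p_third_trial = geometric_pmf(3, p_success)
print(p_third_trial)

# The probabilities over x = 1, 2, 3, ... should add up to (almost) 1.
total = sum(geometric_pmf(x, p_success) for x in range(1, 60))
print(total)
```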
Types of discrete random variable
3. Poisson random variables
• This is used when an event that we wish to model has a small probability of
happening and that we wish to count the number of times that the event occurs
in a certain time frame.
• If we have an idea of the average number of occurrences, μ, over a specific period
of time, given from past instances, then the Poisson random variable, denoted by
X = Poi(μ), counts the total number of occurrences of the event during that given
time period.
Examples of Poisson random variables:
• Finding the probability of having a certain number of visitors on your site within an
hour, knowing the past performance of the site
• Estimating the number of car crashes at an intersection based on past police reports
• If we let X = the number of events in a given interval, and the average number of
events per interval is λ, then the probability of observing x events in a
given interval is given by the following formula:
P(X = x) = e^(−λ) · λ^x / x!
Types of discrete random variable
3. Poisson random variables
Example – call center:
The number of calls arriving at your call center follows a Poisson distribution
at the rate of 5 calls/hour. What is the probability that exactly six calls will
come in between 10 and 11 p.m.?
Solution:
• Let X be the number of calls that arrive between 10 and 11 p.m. This is our
Poisson random variable with mean λ = 5.
• The mean is 5 because we are using 5 as our previous expected value of
the number of calls to come in at this time.
P(X = 6) = ( e^(−5) · 5^6 ) / 6!
= 0.146
• This means that there is about a 14.6% chance that exactly six calls will
come between 10 and 11 p.m.
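The call center result can be checked with a direct implementation of the Poisson PMF. This is a minimal sketch; the function name is illustrative.

```python
import math


def poisson_pmf(x, lam):
    """P(X = x) for a Poisson random variable with mean lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


# Mean of 5 calls per hour; probability of exactly 6 calls in the hour.
p_six_calls = poisson_pmf(6, 5)
print(round(p_six_calls, 3))
```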
Random variables
Continuous random variables
• A continuous random variable can take on an infinite number of possible
values
• It can take all possible values between certain limits. Continuous random
variables are usually measurements.
• It can also take integral as well as fractional values.
• Examples include height, weight, the age of a person, the amount of sugar
content in an orange, the time required to run a mile, and the distance
between two cities.
• A continuous random variable does not assign probability to specific values.
Instead, probability is defined over an interval of values, and is represented
by the area under a curve (in advanced mathematics, this is known as an integral).
Random variables
Continuous random variables
Consider the following examples of continuous variables:
• The length of a sales representative's phone call (not the number of calls)
• The actual amount of oil in a drum marked 20 gallons (not the number of oil
drums)
• If X is a continuous random variable, then there is a function, f(x), such that for any
constants a and b:
P(a ≤ X ≤ b) = ∫ f(x) dx, integrated from x = a to x = b
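The "area under the curve" idea can be sketched numerically. The density below is an assumption chosen for illustration (an exponential density with a hypothetical rate of 0.5, for example call length in minutes); it is not taken from the slides. The area is approximated with the trapezoid rule and compared with the known closed form.

```python
import math

RATE = 0.5  # hypothetical rate parameter


def f(x):
    """Assumed density: exponential with rate RATE (zero for x < 0)."""
    return RATE * math.exp(-RATE * x) if x >= 0 else 0.0


def probability_between(a, b, pdf, steps=10_000):
    """Approximate P(a <= X <= b) as the area under pdf (trapezoid rule)."""
    width = (b - a) / steps
    total = 0.5 * (pdf(a) + pdf(b))
    for i in range(1, steps):
        total += pdf(a + i * width)
    return total * width


p_2_to_5 = probability_between(2, 5, f)
exact = math.exp(-RATE * 2) - math.exp(-RATE * 5)  # closed form for this density
print(round(p_2_to_5, 6), round(exact, 6))

# A single point has zero width, so its probability is 0.
print(probability_between(3, 3, f))
```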
x=6
Solution:
= 0.111476736
Examples
Problem 3:
The average sales score by a company is 1.5/day. What is the probability that this company achieves more
than 2 scores in a given day?
Solution:
• It follows a Poisson distribution: mean = λ = 1.5
• For P(x>2), any number of values from 3 onwards could be considered.
• So, we consider x<=2 and use the complement.
P(x>2) = 1 – P(x<=2)
= 1 – [ P(x=0) + P(x=1) + P(x=2) ]
P(x=0) = e^(−1.5) (1.5)^0 / 0!
= (0.2231)(1)/1
= 0.2231
P(x=1) = e^(−1.5) (1.5)^1 / 1!
= 0.2231 * 1.5
= 0.3347
P(x=2) = e^(−1.5) (1.5)^2 / 2!
= (0.2231 * 2.25)/2
= 0.251
P(x>2) = 1 – [ 0.2231 + 0.3347 + 0.251 ]
= 1 – 0.8088
= 0.1912
So, there is a 19.12% chance that the company achieves more than 2 scores.
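The sales example can be checked by summing the Poisson probabilities for 0, 1 and 2 and taking the complement. This is a minimal sketch with illustrative names.

```python
import math


def poisson_pmf(x, lam):
    """P(X = x) for a Poisson random variable with mean lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


lam = 1.5  # average sales score per day

# P(X <= 2) = P(0) + P(1) + P(2)
p_at_most_2 = sum(poisson_pmf(k, lam) for k in range(3))
# Complement rule: P(X > 2) = 1 - P(X <= 2)
p_more_than_2 = 1 - p_at_most_2

for k in range(3):
    print(k, round(poisson_pmf(k, lam), 4))
print(round(p_more_than_2, 4))
```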
Module 2 - Questions
Vectors and Matrices, sets
3. How do we perform additional operations such as the dot product and multiplication by a
scalar?
4. Why is data represented as a ‘vector’ in data science? What is the use of vectors in
data science?
5. List various symbols associated with basic arithmetic operations of vectors in data
science.
6. Define set and set theory. Illustrate set operations with suitable examples. (Any
data will be given to show the operations in Python)
Module 2 - Questions
Bayesian versus Frequentist
1. Differentiate Bayesian versus frequentist approaches
2. Suppose a test is 95% accurate when a disease is present and 97% accurate when
the disease is absent. Suppose that 1% of the population has the disease. What is P
(have the disease | test +)?
3. Let's say that we are interested in ascertaining how often a person who visits a
particular website is likely to return on a later date. Of the 1,458 unique visitors
in the past week, 452 were repeat visitors. Calculate the relative frequency of
repeat visitors.
Module 2 - Questions
Probability
1. Derive the rules of probability in detail with an illustration.
2. Derive Bayes’ theorem for two events (Hypothesis and Data). Also describe any
two of its applications.
3. Let's say 165 people walked in for the study. All 165 people are given the test and
asked if they have cancer (provided through various other means). 50 people were
predicted to have no cancer and did not have it, 100 people were predicted to have
cancer and actually did have it. Formulate the confusion matrix to show the results
of this experiment.
1. List the types of discrete random variables. Identify the conditions in which these
random variables are appropriate and describe them with necessary examples.
Note: Refer to all problems related to all the types of random variables.