Probability Formula Sheet
If A and B are two events in a sample space S, then the probability of the event
A when the event B has already occurred is called the conditional probability of A given B. It is
denoted by P(A|B) and defined as
P(A | B) = P(A ∩ B) / P(B), provided P(B) > 0.
The probability P(A|B) is an updating of P(A) based on the knowledge that event B has
already occurred.
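A minimal Python sketch of this definition, using only the standard library; the two dice events are chosen arbitrarily for illustration:
```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 ordered outcomes of rolling two fair dice.
S = list(product(range(1, 7), repeat=2))

A = {s for s in S if sum(s) == 8}       # event A: the faces sum to 8
B = {s for s in S if s[0] % 2 == 0}     # event B: the first die is even

p = lambda E: Fraction(len(E), len(S))  # equally likely outcomes

# P(A|B) = P(A ∩ B) / P(B)
print(p(A & B) / p(B))                  # 1/6  (3 of the 18 outcomes in B)
```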
Independent events:
A set of events is said to be independent if the occurrence of any one of them does not
depend on the occurrence or non-occurrence of the others. If two events A and B are
independent, then:
P(A ∩ B) = P(A) · P(B)
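The product rule can be checked the same way; here the two events concern different dice, so they are independent (again, events chosen purely for illustration):
```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))   # two fair dice again
p = lambda E: Fraction(len(E), len(S))

A = {s for s in S if s[0] == 6}            # first die shows 6
B = {s for s in S if s[1] == 6}            # second die shows 6

# The product rule P(A ∩ B) = P(A)·P(B) holds for these two events.
print(p(A & B) == p(A) * p(B))             # True: 1/36 == (1/6)·(1/6)
```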
Bayes' theorem:
If E1, E2, E3, ..., En are mutually exclusive and exhaustive events of a random
experiment with P(Ei) ≠ 0 for i = 1 to n, then for any arbitrary event A of the
sample space of that experiment with P(A) > 0, we have
P(Ei | A) = P(Ei) · P(A | Ei) / Σ_{i=1}^{n} P(Ei) · P(A | Ei)
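A small numeric sketch of the theorem; the machine and defect-rate figures below are invented for the example:
```python
from fractions import Fraction

# Hypothetical setup: machines E1, E2, E3 produce 50%, 30%, 20% of the
# output; their defect rates P(A|Ei) are 2%, 3%, 4%. A = "item is defective".
prior      = [Fraction(1, 2),  Fraction(3, 10),  Fraction(1, 5)]
likelihood = [Fraction(1, 50), Fraction(3, 100), Fraction(1, 25)]

# Total probability P(A) = sum of P(Ei)·P(A|Ei) -- the denominator above.
total = sum(pe * pa for pe, pa in zip(prior, likelihood))

# Posterior P(Ei|A) for each machine, given that a defective item was drawn.
posterior = [pe * pa / total for pe, pa in zip(prior, likelihood)]
print(posterior)        # [Fraction(10, 27), Fraction(1, 3), Fraction(8, 27)]
print(sum(posterior))   # 1
```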
Random Variable
A random variable is a function that assigns a real number to every element of the sample
space. Let S be the sample space of an experiment; we assign a specific number to
each outcome of the sample space. A random variable X is a function from S to the set of
real numbers R, i.e. X: S → R.
Ex. Suppose a coin is tossed twice: S = {HH, HT, TH, TT}. Let X represent the number of
heads on the top face. To each sample point we can associate a number: X(HH) = 2,
X(HT) = 1, X(TH) = 1, X(TT) = 0. Thus X is a random variable with range space
RX = {0, 1, 2}.
Types of random variable:
Discrete Random Variable: A random variable which takes a finite or countably infinite
number of values is called a discrete random variable.
Example: the number of alpha particles emitted by a radioactive source.
Continuous Random Variable: A random variable which takes an uncountably infinite
number of values is called a continuous random variable. Example: the length of time during
which a vacuum tube installed in a circuit functions is a continuous RV.
Discrete Probability Distribution
Suppose a discrete variate X is the outcome of some experiment. If the probability that X
takes the value xi is pi, then
P(X = xi) = pi or p(xi) for i = 1, 2, ..., n
where
a. p(xi) ≥ 0
b. Σi p(xi) = 1
The set of values xi together with their probabilities pi, i.e. the pairs (xi, pi), constitutes a
discrete probability distribution of the discrete variate X. The function p is called the
probability mass function (pmf); for discrete variates some texts also call it the probability
density function (pdf).
The Cumulative Distribution Function (CDF) or Distribution Function of a discrete
random variable X is defined by F(x) = P(X ≤ x), where x is any real number (−∞ < x < ∞):
F(u) = Σ_{x ≤ u} p(x)
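A short sketch of conditions (a), (b) and the CDF for the coin-toss variable X defined above:
```python
from fractions import Fraction

# X = number of heads in two tosses of a fair coin (the example above).
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

# Conditions (a) and (b): non-negative probabilities summing to 1.
assert all(px >= 0 for px in pmf.values()) and sum(pmf.values()) == 1

def F(u):
    """CDF: F(u) = P(X <= u), the sum of p(x) over all x <= u."""
    return sum(px for x, px in pmf.items() if x <= u)

print(F(0), F(1), F(2))   # 1/4, 3/4, 1
```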
Expectation
The expectation (mean) of a discrete random variable X is E(X) = Σi xi · p(xi).
Properties:
1. E(a) = a
2. E(aX + b) = aE(X) + b
3. E(X + Y) = E(X) + E(Y)
4. E(XY) = E(X) · E(Y), if X and Y are independent R.V.s
Variance: Var(X) = E(X²) − [E(X)]²
Properties:
1. Var(a) = 0
2. Var(aX + b) = a² · Var(X)
3. Var(X) ≥ 0
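These definitions, and property 2 of the variance, can be verified directly on the coin-toss pmf:
```python
from fractions import Fraction

# Same coin-toss pmf: X = number of heads in two tosses.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

E = lambda g: sum(g(x) * px for x, px in pmf.items())   # E[g(X)]

EX  = E(lambda x: x)                 # E(X) = 1
var = E(lambda x: x**2) - EX**2      # Var(X) = E(X^2) - [E(X)]^2 = 1/2
print(EX, var)

# Property 2: Var(aX + b) = a^2 Var(X), checked with a = 3, b = 7.
lhs = E(lambda x: (3*x + 7)**2) - E(lambda x: 3*x + 7)**2
print(lhs == 9 * var)                # True
```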
Moments
1. The r-th moment of a r.v. X about any point X = A is given by
µr (about X = A) = E[(X − A)^r]
µr (about the mean) = E[(X − x̄)^r]
r = 0 ⇒ µ0 = E(1) = 1
r = 1 ⇒ µ1 = E(X − x̄) = E(X) − E(x̄) = x̄ − x̄ = 0
r = 2 ⇒ µ2 = E[(X − x̄)²] = Var(X)
Moment generating function (mgf):
M_X(t) = 1 + µ1'·t + µ2'·t²/2! + … + µr'·t^r/r! + …
Remark: If the mgf exists for a random variable X, we can obtain all the
moments of X from it. Put plainly, it is one function that generates all the moments of X.
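As an illustration, and assuming the sympy library is available, the raw moments of the coin-toss X fall out of successive derivatives of its mgf at t = 0:
```python
import sympy as sp

t = sp.symbols('t')

# mgf of the coin-toss X: M_X(t) = E(e^{tX}) = 1/4 + (1/2)e^t + (1/4)e^{2t}.
M = sp.Rational(1, 4) + sp.Rational(1, 2)*sp.exp(t) + sp.Rational(1, 4)*sp.exp(2*t)

# The r-th raw moment mu'_r is the r-th derivative of M evaluated at t = 0.
for r in (1, 2, 3):
    print(r, sp.diff(M, t, r).subs(t, 0))   # 1, 3/2, 5/2
```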
Kurtosis
Even if we know the measures of central tendency, dispersion and skewness of a random
variable (or its distribution), we cannot get a complete idea about the distribution. In
order to analyze the distribution more completely, another characteristic, kurtosis, is also
required. Kurtosis describes the convexity of the probability curve of the distribution: using
the coefficient of kurtosis, we can get an idea of the flatness or peakedness of the
probability curve near its top.
The measure of kurtosis is based on the fourth-order central moment µ4. The coefficient of
kurtosis is defined as
β2 = µ4 / µ2²
Note:
1. A curve which is neither flat nor peaked is called a mesokurtic curve, for which β2 = 3.
2. A curve which is flatter than the mesokurtic curve is called a platykurtic curve, for which β2 < 3.
3. A curve which is more peaked than the mesokurtic curve is called a leptokurtic curve, for which β2 > 3.
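A quick computation of β2 for the coin-toss pmf used earlier (it comes out platykurtic):
```python
from fractions import Fraction

# beta_2 = mu_4 / mu_2^2 for the coin-toss pmf.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

mean = sum(x * px for x, px in pmf.items())
mu = lambda r: sum((x - mean)**r * px for x, px in pmf.items())  # r-th central moment

beta2 = mu(4) / mu(2)**2
print(beta2)   # 2, i.e. beta_2 < 3: the curve is platykurtic
```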
Covariance:
If X and Y are two random variables, then the covariance between them is defined as
Cov(X, Y) = E{(X − E(X)) · (Y − E(Y))} = E(XY) − E(X) · E(Y)
Properties:
1. If X and Y are independent random variables, then Cov(X, Y) = 0 (the converse need not hold).
2. Cov(aX + bY, cX + dY) = ac·σX² + bd·σY² + (ad + bc)·Cov(X, Y)
3. Cov(aX, bY) = ab·Cov(X, Y)
4. Var(aX ± bY) = a²·Var(X) + b²·Var(Y) ± 2ab·Cov(X, Y)
5. Var(aX ± bY) = a²·Var(X) + b²·Var(Y), if X and Y are independent random variables
Correlation:
The term correlation refers to the degree of relationship between two or more variables. If
a change in one variable effects a change in the other variable, the variables are said to be
correlated. Let X and Y be any two discrete random variables with standard deviations σX
and σY, respectively. The correlation coefficient of X and Y, denoted Corr(X, Y), is defined
as:
ρXY = Corr(X, Y) = r(X, Y) = Cov(X, Y) / (σX · σY)
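A sketch computing Cov(X, Y) and ρXY for an invented joint pmf on {0, 1} × {0, 1}:
```python
from fractions import Fraction
from math import sqrt

# Illustrative joint pmf of (X, Y); the numbers are invented for the example.
joint = {(0, 0): Fraction(3, 8), (0, 1): Fraction(1, 8),
         (1, 0): Fraction(1, 8), (1, 1): Fraction(3, 8)}

E = lambda g: sum(g(x, y) * p for (x, y), p in joint.items())
EX, EY = E(lambda x, y: x), E(lambda x, y: y)

cov  = E(lambda x, y: x * y) - EX * EY           # E(XY) - E(X)·E(Y) = 1/8
varX = E(lambda x, y: x**2) - EX**2              # 1/4
varY = E(lambda x, y: y**2) - EY**2              # 1/4

rho = float(cov) / (sqrt(varX) * sqrt(varY))     # Corr(X, Y)
print(cov, rho)                                  # 1/8 0.5
```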
Chebyshev's inequality
If X is a RV with mean µ and variance σ² (for a binomial variate, µ = np and σ² = npq),
then for any positive number k,
P{|X − µ| ≥ kσ} ≤ 1/k²
or
P{|X − µ| < kσ} ≥ 1 − 1/k²
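A simulation sketch of the inequality for the sum of 10 fair dice; the experiment and sample size are arbitrary choices:
```python
import random

# Sum of 10 fair dice: mu = 10·3.5 = 35, sigma^2 = 10·(35/12).
mu = 35.0
sigma = (10 * 35 / 12) ** 0.5
k = 2

random.seed(0)
trials = [sum(random.randint(1, 6) for _ in range(10)) for _ in range(100_000)]
freq = sum(abs(x - mu) >= k * sigma for x in trials) / len(trials)

# The observed tail frequency should not exceed the Chebyshev bound 1/k^2.
print(freq, "<=", 1 / k**2)
```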
Binomial Distribution
Suppose X denotes the number of successes in a sequence of n Bernoulli trials and let the
probability of success in each trial be p. Then X is said to follow a Binomial
distribution B(n, p) with parameters n and p if the probability distribution of X is given by
P(X = x) = nCx · p^x · q^(n−x), x = 0, 1, 2, ..., n, where q = 1 − p
Example:
Suppose there are 2000 computer chips in a batch and there is a 2% probability that any
one chip is faulty. Then the number of faulty computer chips in the batch follows a
Binomial distribution with parameters n=2000 and p=2%.
Mean and variance of the Binomial distribution:
E(X) = np
E(X²) = n(n − 1)p² + np
Var(X) = E(X²) − {E(X)}² = n(n − 1)p² + np − n²p² = npq
Moment generating function of the Binomial distribution:
M_X(t) = (q + p·e^t)^n
Skewness: µ3 = npq(q − p); the coefficient of skewness is γ1 = (q − p)/√(npq).
Kurtosis: µ4 = npq[1 + 3pq(n − 2)]; the coefficient of kurtosis is β2 = 3 + (1 − 6pq)/(npq).
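An exact check of E(X) = np and Var(X) = npq, using small illustrative parameters rather than the chip example so the sums stay exact:
```python
from fractions import Fraction
from math import comb

# Small illustrative parameters: n = 10, p = 1/5 (so q = 4/5).
n, p = 10, Fraction(1, 5)
q = 1 - p

pmf = lambda x: comb(n, x) * p**x * q**(n - x)

assert sum(pmf(x) for x in range(n + 1)) == 1
mean = sum(x * pmf(x) for x in range(n + 1))
var  = sum(x**2 * pmf(x) for x in range(n + 1)) - mean**2

print(mean == n * p, var == n * p * q)   # True True
```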
Poisson Distribution
A random variable X follows a Poisson distribution with parameter λ > 0 if
P(X = x) = e^(−λ) · λ^x / x!, x = 0, 1, 2, ...
The Poisson random variable has a tremendous range of applications in diverse areas
because it may be used as an approximation for a binomial random variable with
parameters (n, p) when n is large and p is small enough so that np is of moderate size.
The Poisson distribution is a limiting case of the binomial distribution under the
following conditions:
1. n, the number of trials, is indefinitely large, i.e. n → ∞.
2. p, the constant probability of success in each trial, is indefinitely small, i.e. p → 0.
3. np = λ is finite.
Mean = E ( X ) = λ
Var ( X ) = λ
In general, the (k + 1)-th order central moment of the Poisson distribution satisfies
µ_{k+1} = λ · [ dµ_k/dλ + k·µ_{k−1} ]
Skewness: µ3 = λ; the coefficient of skewness is γ1 = 1/√λ.
Kurtosis: µ4 = λ(3λ + 1); the coefficient of kurtosis is β2 = 3 + 1/λ.
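A sketch of the limiting-case remark above: for the chip-batch example (n = 2000, p = 0.02), the binomial pmf is closely matched by the Poisson pmf with λ = np = 40:
```python
from math import comb, exp, factorial

# The chip-batch example: n = 2000, p = 0.02, so lambda = np = 40.
n, p = 2000, 0.02
lam = n * p

binom_pmf = lambda x: comb(n, x) * p**x * (1 - p)**(n - x)
pois_pmf  = lambda x: exp(-lam) * lam**x / factorial(x)

# Near lambda the two pmfs nearly agree.
for x in (30, 40, 50):
    print(x, binom_pmf(x), pois_pmf(x))
```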
Multinomial Distribution
The multinomial distribution is used to find probabilities in experiments where there are
more than two outcomes. The multinomial distribution arises from an extension of the
binomial experiment to situations where each trial has k ≥ 2 possible outcomes.
Suppose E1, E2, ..., Ek are k mutually exclusive and exhaustive outcomes of a trial with
respective probabilities p1, p2, ..., pk. The probability that E1 occurs n1 times, E2 occurs n2
times, ..., Ek occurs nk times in n independent observations is given by:
P(n1, n2, ..., nk) = [n! / (n1! · n2! · ... · nk!)] · p1^n1 · p2^n2 · ... · pk^nk,
where n1 + n2 + ... + nk = n.
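A sketch evaluating this formula for an invented experiment: a fair die rolled 12 times, asking for every face to appear exactly twice:
```python
from fractions import Fraction
from math import factorial

# n = 12 rolls of a fair die; n1 = n2 = ... = n6 = 2, each pi = 1/6.
counts = [2] * 6
probs  = [Fraction(1, 6)] * 6
n = sum(counts)

prob = Fraction(factorial(n))          # n!
for ni in counts:
    prob /= factorial(ni)              # divide by n1!·n2!·...·nk!
for pi, ni in zip(probs, counts):
    prob *= pi**ni                     # multiply by p1^n1·...·pk^nk

print(prob, float(prob))               # 1925/559872 ≈ 0.00344
```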