Lecture 8

Uploaded by

Diptendra Juin
Data Analytics with Python
Probability Distributions

What is a distribution?
• Describes the ‘shape’ of a batch of numbers
• The characteristics of a distribution can sometimes be defined using a small number of numeric descriptors called ‘parameters’

Why distribution?
• Can serve as a basis for standardized comparison of empirical distributions
• Can help us estimate confidence intervals for inferential statistics
• Form a basis for more advanced statistical methods
– ‘Fit’ between observed distributions and certain theoretical distributions is an assumption of many statistical procedures

Random variable
• A variable which contains the outcomes of a chance experiment
• “Quantifying the outcomes”
• Example: X = 1 for Heads, X = 0 for Tails
• A variable that can take on different values in the population according to some “random” mechanism
• Discrete
– Distinct values, countable
– Example: year
• Continuous
– Example: mass

Probability Distributions
• The probability distribution function or probability density function (PDF) of a random variable X is the set of values taken by that random variable together with their associated probabilities.
• PDF of a discrete r.v. (also known as the probability mass function, PMF):

Example 1: Let the r.v. X be the number of heads obtained in two tosses of a coin.
Sample Space: {HH, HT, TH, TT}

PDF of Discrete r.v.

Number of Heads (X):   0     1     2     sum
PDF (P(X)):            1/4   1/2   1/4   1

[Figure: The PDF of the Number of Heads in Two Tosses of a Coin. Bar chart; x-axis: Number of Heads (0, 1, 2); y-axis: Probability Density; bar heights 0.25, 0.5, 0.25.]

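The table above can be reproduced by enumerating the sample space. This is a minimal sketch, assuming a fair coin so that all four outcomes are equally likely; the names `sample_space` and `pmf` are illustrative and do not come from the lecture.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate the sample space of two tosses: HH, HT, TH, TT.
sample_space = ["".join(t) for t in product("HT", repeat=2)]

# Count how many outcomes give each number of heads.
counts = Counter(outcome.count("H") for outcome in sample_space)

# Assumption: a fair coin, so each outcome has probability 1/4.
pmf = {x: Fraction(n, len(sample_space)) for x, n in sorted(counts.items())}

for x, p in pmf.items():
    print(x, p)  # 0 1/4, then 1 1/2, then 2 1/4
```

Exact fractions are used here so the probabilities match the slide's ¼, ½, ¼ without rounding.
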
Probability Distribution for the Random Variable X
A probability distribution for a discrete random variable X:

x          –8     –3     –1     0      1      4      6
P(X = x)   0.13   0.15   0.17   0.20   0.15   0.11   0.09

Find
a. $P(X \le 0) = 0.65$
b. $P(-3 \le X \le 1) = 0.67$

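Both answers are sums of the table entries whose x falls in the stated range. The sketch below checks them, taking the upper bound in part b to be 1; that bound is an assumption, chosen because it reproduces the stated 0.67. The helper `prob` is hypothetical, not part of the lecture.

```python
from fractions import Fraction

# PMF from the table above, stored as exact fractions of 100.
values = [-8, -3, -1, 0, 1, 4, 6]
percents = [13, 15, 17, 20, 15, 11, 9]
pmf = {x: Fraction(p, 100) for x, p in zip(values, percents)}

def prob(event):
    """Sum P(X = x) over every x for which event(x) is true."""
    return sum(p for x, p in pmf.items() if event(x))

p_a = prob(lambda x: x <= 0)        # part a
p_b = prob(lambda x: -3 <= x <= 1)  # part b (upper bound assumed to be 1)
print(float(p_a), float(p_b))       # 0.65 0.67
```
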
Discrete Distribution -- Example
Distribution of Daily Crises

Number of Crises   Probability
0                  0.37
1                  0.31
2                  0.18
3                  0.09
4                  0.04
5                  0.01

[Figure: bar chart of the distribution; x-axis: Number of Crises (0 to 5); y-axis: Probability (0 to 0.5).]

Requirements for a Discrete Probability Function
• Probabilities are between 0 and 1, inclusive
• Total of all probabilities equals 1

$$0 \le P(X) \le 1 \quad \text{for all } X$$

$$\sum_{\text{over all } x} P(X) = 1$$

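The two requirements translate directly into a check on a table of probabilities. The function `is_valid_pmf` and its tolerance are illustrative choices rather than part of the lecture; the tolerance is there because floating-point sums are rarely exactly 1. The first test case is the daily-crises distribution from the previous slide.

```python
import math

def is_valid_pmf(pmf, tol=1e-9):
    """Return True if every probability lies in [0, 1] and they sum to 1."""
    probs = list(pmf.values())
    in_range = all(0.0 <= p <= 1.0 for p in probs)
    sums_to_one = math.isclose(sum(probs), 1.0, abs_tol=tol)
    return in_range and sums_to_one

crises = {0: 0.37, 1: 0.31, 2: 0.18, 3: 0.09, 4: 0.04, 5: 0.01}
print(is_valid_pmf(crises))            # True
print(is_valid_pmf({0: 0.5, 1: 0.6}))  # False: the total is 1.1
```
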
Cumulative Distribution Function
• The CDF of a random variable X, denoted F(x), is the function that associates each possible value x, or range of possible values, with the probability $P(X \le x)$.
• CDFs always lie between 0 and 1, i.e., $0 \le F(x_i) \le 1$, where $F(x_i)$ is the CDF evaluated at $x_i$.

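For a discrete random variable, the CDF is the running total of the PMF. A small sketch using the two-coin-toss example from earlier in the lecture (fair coin assumed; variable names are illustrative):

```python
from itertools import accumulate

# PMF of the number of heads in two tosses of a fair coin.
xs = [0, 1, 2]
ps = [0.25, 0.5, 0.25]

# F(x) = P(X <= x) is the cumulative sum of the probabilities up to x.
cdf = dict(zip(xs, accumulate(ps)))
print(cdf)  # {0: 0.25, 1: 0.75, 2: 1.0}
```

The last value of the running total is 1, and every value lies between 0 and 1, as the second bullet requires.
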
The Expected Value of X
Let X be a discrete rv with set of possible values D and pmf p(x). The expected value or mean value of X, denoted $E(X)$ or $\mu_X$, is

$$E(X) = \mu_X = \sum_{x \in D} x \cdot p(x)$$

Mean and Variance of a Discrete Random Variable

A probability distribution can be viewed as a loading with the mean equal to the balance point. Parts (a) and (b) illustrate equal means, but Part (a) illustrates a larger variance.

The probability distributions illustrated in Parts (a) and (b) differ even though they have equal means and equal variances.

Example – Expected Value
• Use the data below to find the expected number of credit cards that a customer of a retail outlet will possess.

x = # credit cards

x    P(X = x)
0    0.08
1    0.28
2    0.38
3    0.16
4    0.06
5    0.03
6    0.01

$$E(X) = x_1 p_1 + x_2 p_2 + \dots + x_n p_n$$

$$= 0(.08) + 1(.28) + 2(.38) + 3(.16) + 4(.06) + 5(.03) + 6(.01) = 1.97$$

About 2 credit cards

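The same weighted sum can be written as a one-line function. This is a sketch only; `expected_value` and `credit_cards` are illustrative names, and the probabilities are copied from the table above.

```python
def expected_value(pmf):
    """E(X): the sum of x * p(x) over all possible values x."""
    return sum(x * p for x, p in pmf.items())

credit_cards = {0: 0.08, 1: 0.28, 2: 0.38, 3: 0.16, 4: 0.06, 5: 0.03, 6: 0.01}
mean_cards = expected_value(credit_cards)
print(round(mean_cards, 2))  # 1.97
```
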
The Variance and Standard Deviation

Let X have pmf p(x) and expected value $\mu$. Then the variance of X, denoted $V(X)$ (or $\sigma_X^2$ or $\sigma^2$), is

$$V(X) = \sum_{D} (x - \mu)^2 \cdot p(x) = E[(X - \mu)^2]$$

The standard deviation (SD) of X is

$$\sigma_X = \sqrt{\sigma_X^2}$$

The quiz scores for a particular student are given below:

22, 25, 20, 18, 12, 20, 24, 20, 20, 25, 24, 25, 18

Find the variance and standard deviation.

Value         12    18    20    22    24    25
Frequency     1     2     4     1     2     3
Probability   .08   .15   .31   .08   .15   .23

$$\mu = 21$$

$$V(X) = p_1 (x_1 - \mu)^2 + p_2 (x_2 - \mu)^2 + \dots + p_n (x_n - \mu)^2$$

$$V(X) = .08(12 - 21)^2 + .15(18 - 21)^2 + .31(20 - 21)^2 + .08(22 - 21)^2 + .15(24 - 21)^2 + .23(25 - 21)^2$$

$$V(X) = 13.25$$

$$\sigma = \sqrt{V(X)} = \sqrt{13.25} \approx 3.64$$

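The calculation can be repeated from the raw scores. The sketch below uses the unrounded relative frequencies (1/13, 2/13, and so on) as probabilities, which is an assumption about how one might prefer to compute it. The slide rounds each probability to two decimals first, so its 13.25 and 3.64 differ slightly from the values this code gives (about 13.08 and 3.62).

```python
from collections import Counter
from math import sqrt

scores = [22, 25, 20, 18, 12, 20, 24, 20, 20, 25, 24, 25, 18]
n = len(scores)

# Relative frequencies act as the probabilities p(x).
pmf = {x: c / n for x, c in Counter(scores).items()}

mu = sum(x * p for x, p in pmf.items())
var = sum(p * (x - mu) ** 2 for x, p in pmf.items())
sd = sqrt(var)
print(round(mu, 2), round(var, 2), round(sd, 2))
```

Substituting the rounded probabilities from the table (.08, .15, .31, .08, .15, .23) into the same sum gives the slide's 13.25.
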
Shortcut Formula for Variance

$$V(X) = \sigma^2 = \left[\sum_{D} x^2 \cdot p(x)\right] - \mu^2 = E(X^2) - [E(X)]^2$$

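The shortcut and the definition give the same number. As a quick check, here is a sketch on the two-coin-toss distribution from earlier in the lecture (fair coin assumed; variable names are illustrative), using exact fractions so the comparison is not affected by rounding.

```python
from fractions import Fraction

# Number of heads in two tosses of a fair coin.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

mu = sum(x * p for x, p in pmf.items())

# Definition: V(X) = sum of (x - mu)^2 * p(x).
var_definition = sum((x - mu) ** 2 * p for x, p in pmf.items())

# Shortcut: V(X) = E(X^2) - [E(X)]^2.
e_x2 = sum(x ** 2 * p for x, p in pmf.items())
var_shortcut = e_x2 - mu ** 2

print(var_definition, var_shortcut)  # 1/2 1/2
```
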
Mean of a Discrete Distribution

$$\mu = E(X) = \sum X \cdot P(X)$$

X     P(X)   X·P(X)
-1    .1     -.1
0     .2     .0
1     .4     .4
2     .2     .4
3     .1     .3
             1.0

Variance and Standard Deviation of a Discrete Distribution

$$\sigma^2 = \sum (X - \mu)^2 \cdot P(X) = 1.2$$

$$\sigma = \sqrt{\sigma^2} = \sqrt{1.2} \approx 1.10$$

X     P(X)   X − μ   (X − μ)²   (X − μ)²·P(X)
-1    .1     -2      4          .4
0     .2     -1      1          .2
1     .4     0       0          .0
2     .2     1       1          .2
3     .1     2       4          .4
                                1.2

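The working columns of the table can be rebuilt row by row. A sketch, with the probabilities entered as exact tenths so the totals come out exactly; `rows` and the other names are illustrative.

```python
from fractions import Fraction
from math import sqrt

# X and P(X) from the table above, as exact tenths.
pmf = {-1: Fraction(1, 10), 0: Fraction(2, 10), 1: Fraction(4, 10),
       2: Fraction(2, 10), 3: Fraction(1, 10)}

mu = sum(x * p for x, p in pmf.items())

# Rebuild the working columns: X - mu, (X - mu)^2, (X - mu)^2 * P(X).
rows = [(x, p, x - mu, (x - mu) ** 2, (x - mu) ** 2 * p) for x, p in pmf.items()]

variance = sum(row[-1] for row in rows)  # total of the last column
sd = sqrt(variance)
print(float(mu), float(variance), round(sd, 2))  # 1.0 1.2 1.1
```
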
Mean of the Data Example

$$\mu = E(X) = \sum X \cdot P(X) = 1.15$$

X    P(X)   X·P(X)
0    .37    .00
1    .31    .31
2    .18    .36
3    .09    .27
4    .04    .16
5    .01    .05
            1.15

[Figure: bar chart of the distribution; x-axis: Number (0 to 5); y-axis: Probability (0 to 0.5).]

Properties of Expected Value
1. $E(b) = b$, where b is a constant.
2. $E(X + Y) = E(X) + E(Y)$.
3. $E(X/Y) \ne E(X)/E(Y)$.
4. $E(XY) \ne E(X)E(Y)$ unless they are independent.
5. $E(aX) = aE(X)$, where a is a constant.
6. $E(aX + b) = aE(X) + b$, where a and b are constants.

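These properties can be checked exactly on a small example. The sketch below assumes two independent fair dice X and Y, an example chosen here and not taken from the lecture, and a hypothetical helper `E` that averages any function of (X, Y) over the 36 equally likely pairs.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
# Assumption: two independent fair dice, so each (x, y) pair has probability 1/36.
joint = {(x, y): Fraction(1, 36) for x, y in product(faces, repeat=2)}

def E(g):
    """Expected value of g(X, Y) under the joint distribution."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

ex, ey = E(lambda x, y: x), E(lambda x, y: y)

print(E(lambda x, y: x + y) == ex + ey)           # True  (property 2)
print(E(lambda x, y: Fraction(x, y)) == ex / ey)  # False (property 3)
print(E(lambda x, y: x * y) == ex * ey)           # True  (property 4, independent case)
print(E(lambda x, y: x * x) == ex * ex)           # False (X is not independent of itself)
print(E(lambda x, y: 3 * x + 2) == 3 * ex + 2)    # True  (property 6 with a = 3, b = 2)
```
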
Properties of Variance
1. Var(constant) = 0
2. If X and Y are two independent random variables, then
Var(X + Y) = Var(X) + Var(Y) and
Var(X − Y) = Var(X) + Var(Y)
3. If b is a constant, then Var(b + X) = Var(X)
4. If a is a constant, then Var(aX) = a²Var(X)
5. If a and b are constants, then Var(aX + b) = a²Var(X)
6. If X and Y are two independent random variables and a and b are constants, then Var(aX + bY) = a²Var(X) + b²Var(Y)

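A sketch that verifies several of these identities exactly, assuming X and Y are two independent fair dice (an example chosen for illustration, not from the lecture). The helpers `E` and `Var` and the constants a = 3, b = 2 are likewise hypothetical.

```python
from fractions import Fraction
from itertools import product

# Assumption: X and Y are two independent fair dice (not from the lecture).
joint = {(x, y): Fraction(1, 36) for x, y in product(range(1, 7), repeat=2)}

def E(g):
    """Expected value of g(X, Y) under the joint distribution."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

def Var(g):
    """Variance of g(X, Y) via the shortcut E(g^2) - [E(g)]^2."""
    return E(lambda x, y: g(x, y) ** 2) - E(g) ** 2

var_x = Var(lambda x, y: x)
var_y = Var(lambda x, y: y)
a, b = 3, 2  # arbitrary constants chosen for the check

checks = {
    "Var(constant) = 0": Var(lambda x, y: b) == 0,
    "Var(X + Y) = Var(X) + Var(Y)": Var(lambda x, y: x + y) == var_x + var_y,
    "Var(X - Y) = Var(X) + Var(Y)": Var(lambda x, y: x - y) == var_x + var_y,
    "Var(aX + b) = a^2 Var(X)": Var(lambda x, y: a * x + b) == a ** 2 * var_x,
    "Var(aX + bY) = a^2 Var(X) + b^2 Var(Y)":
        Var(lambda x, y: a * x + b * y) == a ** 2 * var_x + b ** 2 * var_y,
}
for name, ok in checks.items():
    print(name, ok)  # every line ends in True
```

Note that the variance of a difference is the sum of the variances, not the difference: subtracting an independent variable still adds spread.
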
Covariance
Covariance: For two discrete random variables X and Y with $E(X) = \mu_x$ and $E(Y) = \mu_y$, the covariance between X and Y is defined as

$$\mathrm{Cov}(X, Y) = \sigma_{xy} = E[(X - \mu_x)(Y - \mu_y)] = E(XY) - \mu_x \mu_y$$

Covariance
• In general, the covariance between two random variables can be positive or negative.
• If two random variables move in the same direction, the covariance will be positive; if they move in opposite directions, the covariance will be negative.
Properties:
1. If X and Y are independent random variables, their covariance is zero, since E(XY) = E(X)E(Y)
2. Cov(X, X) = Var(X)
3. Cov(Y, Y) = Var(Y)

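To see a negative covariance, take two variables that move in opposite directions. The sketch below assumes two tosses of a fair coin, with X the number of heads and Y the number of tails, so Y = 2 − X. This example and the helper `E` are illustrative choices, not part of the lecture.

```python
from fractions import Fraction
from itertools import product

# Assumption: two tosses of a fair coin, each outcome with probability 1/4.
# X = number of heads, Y = number of tails, so Y = 2 - X.
outcomes = [(t.count("H"), t.count("T")) for t in product("HT", repeat=2)]
p = Fraction(1, len(outcomes))

def E(g):
    """Expected value of g(X, Y) over the equally likely outcomes."""
    return sum(g(x, y) * p for x, y in outcomes)

mu_x, mu_y = E(lambda x, y: x), E(lambda x, y: y)

cov_definition = E(lambda x, y: (x - mu_x) * (y - mu_y))
cov_shortcut = E(lambda x, y: x * y) - mu_x * mu_y
cov_xx = E(lambda x, y: (x - mu_x) ** 2)  # Cov(X, X), which is Var(X)

print(cov_definition, cov_shortcut, cov_xx)  # -1/2 -1/2 1/2
```

The definition and the shortcut E(XY) − E(X)E(Y) agree, and the sign is negative because more heads always means fewer tails.
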
Correlation Coefficient
• The covariance tells the sign but not the magnitude of how strongly the variables are positively or negatively related. The correlation coefficient provides such a measure of how strongly the variables are related to each other.
• For two random variables X and Y with $E(X) = \mu_x$ and $E(Y) = \mu_y$, the correlation coefficient is defined as

$$\rho_{xy} = \frac{\mathrm{Cov}(X, Y)}{\sigma_x \sigma_y} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$$

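A short sketch of the calculation, assuming two tosses of a fair coin with X = 1 if the first toss is a head (0 otherwise) and Y = the total number of heads. The example is chosen here for illustration and does not come from the lecture.

```python
from fractions import Fraction
from itertools import product
from math import sqrt

# Assumption: two tosses of a fair coin, each outcome with probability 1/4.
# X = 1 if the first toss is a head (else 0); Y = total number of heads.
outcomes = [(int(a == "H"), (a + b).count("H")) for a, b in product("HT", repeat=2)]
p = Fraction(1, len(outcomes))

def E(g):
    """Expected value of g(X, Y) over the equally likely outcomes."""
    return sum(g(x, y) * p for x, y in outcomes)

mu_x, mu_y = E(lambda x, y: x), E(lambda x, y: y)

cov = E(lambda x, y: (x - mu_x) * (y - mu_y))
sd_x = sqrt(E(lambda x, y: (x - mu_x) ** 2))
sd_y = sqrt(E(lambda x, y: (y - mu_y) ** 2))

rho = cov / (sd_x * sd_y)
print(cov, round(rho, 4))  # 1/4 0.7071
```

The covariance of 1/4 says only that the relationship is positive; dividing by the two standard deviations rescales it to about 0.71 on the fixed scale from −1 to 1.
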
Thank You