Lec3 IntroToProbabilityAndStatistics
Email: nzabaras@gmail.com
URL: https://www.zabaras.com/
Statistical Computing, University of Notre Dame, Notre Dame, IN, USA (Fall 2018, N. Zabaras)
Contents
The binomial and Bernoulli distributions
Student’s T
Laplace distribution
Gamma distribution
Beta distribution
Pareto distribution
References
• Following closely Chris Bishop's PRML book, Chapter 2
Binary Variables
Consider a coin-flipping experiment with heads = 1 and tails = 0. With $\mu \in [0,1]$:

$p(x=1\mid\mu) = \mu, \qquad p(x=0\mid\mu) = 1-\mu$

$\mathrm{Bern}(x\mid\mu) = \mu^{x}(1-\mu)^{1-x} = \mu^{\mathbb{I}(x=1)}(1-\mu)^{\mathbb{I}(x=0)}$
Bernoulli Distribution
Recall that in general

$\mathbb{E}[f] = \sum_x p(x)f(x), \qquad \mathbb{E}[f] = \int p(x)f(x)\,dx$

$\mathrm{var}[f] = \mathbb{E}[f(x)^2] - \mathbb{E}[f(x)]^2$

For the Bernoulli distribution $\mathrm{Bern}(x\mid\mu) = \mu^{x}(1-\mu)^{1-x}$, we can easily show from the definitions:

$\mathbb{E}[x] = \mu, \qquad \mathrm{var}[x] = \mu(1-\mu)$

$\mathbb{H}[x] = -\sum_{x\in\{0,1\}} p(x\mid\mu)\ln p(x\mid\mu) = -\mu\ln\mu - (1-\mu)\ln(1-\mu)$

For a data set $\mathcal{D} = \{x_1, x_2, \dots, x_N\}$ in which we have $m$ heads ($x=1$) and $N-m$ tails ($x=0$), the likelihood is

$p(\mathcal{D}\mid\mu) = \prod_{n=1}^{N} p(x_n\mid\mu) = \prod_{n=1}^{N}\mu^{x_n}(1-\mu)^{1-x_n} = \mu^{m}(1-\mu)^{N-m}$
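As a quick numerical sketch (in Python rather than the MatLab used in these slides), the Bernoulli pmf and likelihood above can be checked directly; the coin flips below are hypothetical:

```python
import math

def bern_pmf(x, mu):
    # Bern(x|mu) = mu^x * (1 - mu)^(1 - x), for x in {0, 1}
    return mu**x * (1 - mu)**(1 - x)

def bern_log_lik(data, mu):
    # log p(D|mu) = m*log(mu) + (N - m)*log(1 - mu), m = number of heads
    m = sum(data)
    return m * math.log(mu) + (len(data) - m) * math.log(1 - mu)

flips = [1, 0, 1, 1, 0]           # hypothetical flips: m = 3 heads, N = 5
mu_mle = sum(flips) / len(flips)  # MLE is the sample fraction of heads
```

The log-likelihood is maximized at the sample fraction of heads, anticipating the MLE result derived for the multinoulli case later in these slides.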
Binomial Distribution
Consider the discrete random variable $X\in\{0,1,2,\dots,N\}$. The binomial distribution gives the probability of observing $m$ heads in $N$ trials:

$\mathrm{Bin}(m\mid N,\mu) = \binom{N}{m}\mu^{m}(1-\mu)^{N-m}$

[Figure: histograms of $\mathrm{Bin}(m\mid N=10,\mu)$ for two values of $\mu$, $m = 0,\dots,10$.]
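A minimal Python sketch of the binomial pmf (using `math.comb` for the binomial coefficient); the values of $N$ and $\mu$ below are illustrative:

```python
import math

def binom_pmf(m, n, mu):
    # Bin(m|N, mu) = C(N, m) * mu^m * (1 - mu)^(N - m)
    return math.comb(n, m) * mu**m * (1 - mu)**(n - m)

# pmf values for N = 10, mu = 0.25; these sum to one over m = 0..N
probs = [binom_pmf(m, 10, 0.25) for m in range(11)]
```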
Mean,Variance of the Binomial Distribution
Since for independent events the mean of the sum is the sum of the means, and the variance of the sum is the sum of the variances, we obtain

$\mathbb{E}[m] = \sum_{m=0}^{N} m\,\mathrm{Bin}(m\mid N,\mu) = N\mu, \qquad \mathrm{var}[m] = \sum_{m=0}^{N}\left(m - \mathbb{E}[m]\right)^2\mathrm{Bin}(m\mid N,\mu) = N\mu(1-\mu)$
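These moments can be confirmed numerically by summing over the pmf directly; a brief Python check with illustrative $N$ and $\mu$:

```python
import math

def binom_pmf(m, n, mu):
    # Bin(m|N, mu) = C(N, m) * mu^m * (1 - mu)^(N - m)
    return math.comb(n, m) * mu**m * (1 - mu)**(n - m)

n, mu = 10, 0.25
mean = sum(m * binom_pmf(m, n, mu) for m in range(n + 1))              # should equal N*mu
var = sum((m - mean)**2 * binom_pmf(m, n, mu) for m in range(n + 1))   # should equal N*mu*(1-mu)
```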
Binomial Distribution: Normalization
To show that the binomial is correctly normalized, we use the following identity, which can be shown with direct substitution:

$\binom{N}{n} + \binom{N}{n-1} = \binom{N+1}{n} \qquad (*)$

With it one establishes

$\sum_{m=0}^{N}\binom{N}{m}\mu^{m}(1-\mu)^{N-m} = 1$
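Both the identity $(*)$ and the normalization can be spot-checked numerically; a small Python sketch over illustrative ranges:

```python
import math

# Pascal's identity (*): C(N, n) + C(N, n-1) = C(N+1, n)
identity_holds = all(
    math.comb(N, n) + math.comb(N, n - 1) == math.comb(N + 1, n)
    for N in range(1, 12) for n in range(1, N + 1)
)

# normalization: sum_m C(N, m) mu^m (1-mu)^(N-m) = 1
mu, N = 0.3, 12
total = sum(math.comb(N, m) * mu**m * (1 - mu)**(N - m) for m in range(N + 1))
```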
Multinoulli/Categorical Distribution

Consider a discrete variable with $K$ states in one-of-$K$ (one-hot) encoding, $x = (x_1,\dots,x_K)^T$ with $x_k\in\{0,1\}$ and $\sum_k x_k = 1$:

$p(x\mid\boldsymbol{\mu}) = \prod_{k=1}^{K}\mu_k^{x_k}$

where $\boldsymbol{\mu} = (\mu_1,\dots,\mu_K)^T$ with $\mu_k\ge 0$ and $\sum_{k=1}^{K}\mu_k = 1$. The distribution is already normalized:

$\sum_x p(x\mid\boldsymbol{\mu}) = \sum_{k=1}^{K}\mu_k = 1$

For a data set of $N$ observations, the likelihood is $p(\mathcal{D}\mid\boldsymbol{\mu}) = \prod_{k=1}^{K}\mu_k^{m_k}$, where $m_k = \sum_{n=1}^{N} x_{nk}$ is the # of observations of $x_k = 1$.
MLE Estimate: Multinoulli Distribution
To compute the maximum likelihood (MLE) estimate of $\boldsymbol{\mu}$, we maximize an augmented log-likelihood, with a Lagrange multiplier $\lambda$ enforcing the constraint $\sum_k\mu_k = 1$:

$\ln p(\mathcal{D}\mid\boldsymbol{\mu}) + \lambda\left(\sum_{k=1}^{K}\mu_k - 1\right) = \sum_{k=1}^{K} m_k\ln\mu_k + \lambda\left(\sum_{k=1}^{K}\mu_k - 1\right)$

Setting the derivative wrt $\mu_k$ equal to zero:

$\frac{m_k}{\mu_k} + \lambda = 0 \quad\Rightarrow\quad \mu_k = -\frac{m_k}{\lambda}$

Substitution into the constraint gives

$-\sum_{k=1}^{K}\frac{m_k}{\lambda} = 1 \quad\Rightarrow\quad \lambda = -\sum_{k=1}^{K} m_k = -N$

so that

$\mu_k^{ML} = \frac{m_k}{N}$

As expected, this is the fraction in the $N$ observations of $x_k = 1$.
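The closed-form result above is trivial to compute from counts; a Python sketch with hypothetical counts over $K = 3$ states:

```python
counts = [2, 5, 3]                 # hypothetical m_k over K = 3 states, N = 10
N = sum(counts)
mu_mle = [m / N for m in counts]   # mu_k^ML = m_k / N, the fraction of each state
```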
Multinomial Distribution
We can also consider the joint distribution of 𝑚1, … , 𝑚𝐾 in 𝑁
observations conditioned on the parameters 𝝁 =
(𝜇1, … , 𝜇𝐾).
$p(m_1, m_2, \dots, m_K\mid N, \mu_1, \mu_2, \dots, \mu_K) = \frac{N!}{m_1!\,m_2!\cdots m_K!}\,\mu_1^{m_1}\mu_2^{m_2}\cdots\mu_K^{m_K}, \qquad \text{where } \sum_{k=1}^{K} m_k = N$
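A small Python sketch of the multinomial pmf; for $K = 2$ it should reduce to the binomial, which makes a convenient sanity check:

```python
import math

def multinomial_pmf(ms, mus):
    # N! / (m_1! ... m_K!) * prod_k mu_k^{m_k}, with N = sum_k m_k
    n = sum(ms)
    coef = math.factorial(n)
    for m in ms:
        coef //= math.factorial(m)   # stays an exact integer at every step
    p = float(coef)
    for m, mu in zip(ms, mus):
        p *= mu**m
    return p
```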
Example: Biosequence Analysis
Consider a set of DNA sequences, e.g.

cgatacggggtcgaa
caatccgagatcgca
Summary of Discrete Distributions
The multinomial and related discrete distributions are summarized below in a table from Kevin Murphy's textbook.
The Poisson Distribution
We say that $X\in\{0,1,2,3,\dots\}$ has a Poisson distribution with parameter $\lambda > 0$, if its pmf is

$X\sim\mathrm{Poi}(\lambda): \qquad \mathrm{Poi}(x\mid\lambda) = e^{-\lambda}\,\frac{\lambda^{x}}{x!}$
[Figure: Poisson pmfs for two values of $\lambda$, plotted for $x = 0,\dots,30$.]
Use MatLab function poissonPlotDemo from Kevin Murphy's PMTK
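Alternatively, here is a minimal Python sketch of the Poisson pmf; summing over a long truncated range recovers the normalization and the mean $\mathbb{E}[x] = \lambda$ (the value of $\lambda$ is illustrative):

```python
import math

def poisson_pmf(x, lam):
    # Poi(x|lambda) = exp(-lambda) * lambda^x / x!
    return math.exp(-lam) * lam**x / math.factorial(x)

lam = 4.0
total = sum(poisson_pmf(x, lam) for x in range(100))     # ~ 1 (tail beyond 99 is negligible)
mean = sum(x * poisson_pmf(x, lam) for x in range(100))  # ~ lambda
```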
The Empirical Distribution
Given data, 𝒟 = {𝑥1, … , 𝑥𝑁}, we define the empirical
distribution as:
$p_{\rm emp}(A) = \frac{1}{N}\sum_{i=1}^{N}\delta_{x_i}(A), \qquad \text{Dirac measure: } \delta_{x_i}(A) = \begin{cases} 1 & \text{if } x_i\in A\\ 0 & \text{if } x_i\notin A\end{cases}$

We can also associate weights with each sample:

Generalize $\; p_{\rm emp}(x) = \frac{1}{N}\sum_{i=1}^{N}\delta_{x_i}(x)\;$ to $\; p_{\rm emp}(x) = \sum_{i=1}^{N} w_i\,\delta_{x_i}(x), \qquad 0\le w_i\le 1, \quad \sum_{i=1}^{N} w_i = 1$
This corresponds to a histogram with spikes at each
sample point with height equal to the corresponding
weight. This distribution assigns zero weight to any point
not in the dataset.
Note that the “sample mean of 𝑓(𝑥)” is the expectation of
𝑓(𝑥) under the empirical distribution:
$\mathbb{E}[f(x)] = \int f(x)\,p_{\rm emp}(x)\,dx = \frac{1}{N}\sum_{i=1}^{N} f(x_i)$
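This expectation under the empirical distribution (optionally with weights) is one line of Python; the data and weights below are hypothetical:

```python
def emp_expectation(f, xs, ws=None):
    # E_emp[f] = sum_i w_i f(x_i); uniform weights w_i = 1/N by default
    if ws is None:
        ws = [1.0 / len(xs)] * len(xs)
    return sum(w * f(x) for w, x in zip(ws, xs))

xs = [1.0, 2.0, 4.0]
mean = emp_expectation(lambda x: x, xs)                        # sample mean
wmean = emp_expectation(lambda x: x, xs, ws=[0.5, 0.25, 0.25]) # weighted version
```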
Student’s T Distribution
$p(x\mid\mu,\lambda,\nu) = \mathcal{T}(x\mid\mu,\lambda,\nu) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)}\left(\frac{\lambda}{\pi\nu}\right)^{1/2}\left[1 + \frac{\lambda(x-\mu)^2}{\nu}\right]^{-\frac{\nu+1}{2}}$
For 𝝊 → ∞, 𝓣(𝒙|𝝁, 𝝀, 𝝊) Becomes a Gaussian
$\mathcal{T}(x\mid\mu,\lambda,\nu) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)}\left(\frac{\lambda}{\pi\nu}\right)^{1/2}\left[1 + \frac{\lambda(x-\mu)^2}{\nu}\right]^{-\frac{\nu+1}{2}}$

We first write the distribution as follows:

$\mathcal{T}(x\mid\mu,\lambda,\nu) \propto \left[1 + \frac{\lambda(x-\mu)^2}{\nu}\right]^{-\frac{\nu+1}{2}} = \exp\left[-\frac{\nu+1}{2}\ln\left(1 + \frac{\lambda(x-\mu)^2}{\nu}\right)\right]$

Using $\ln(1+z) = z + O(z^2)$ for large $\nu$, the exponent tends to $-\frac{\lambda(x-\mu)^2}{2}$, so the distribution approaches $\mathcal{N}(x\mid\mu,\lambda^{-1})$ up to normalization.
Student’s T Distribution
$\mathcal{T}(x\mid\mu,\lambda,\nu) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)}\left(\frac{\lambda}{\pi\nu}\right)^{1/2}\left[1 + \frac{\lambda(x-\mu)^2}{\nu}\right]^{-\frac{\nu+1}{2}}$

Mean: $\mu$ (for $\nu > 1$); Mode: $\mu$; Var: $\frac{\nu}{\lambda(\nu-2)}$ (for $\nu > 2$). For $\nu\to\infty$, we obtain $\mathcal{N}(\mu,\lambda^{-1})$.

[Figure: Student's t densities for $\mu = 0$, $\lambda = 1$ and $\nu = 0.1, 1.0, 10$, $x\in[-5,5]$.]

MatLab Code
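A Python sketch of this density (using `math.lgamma` to keep the ratio of Gamma functions stable for large $\nu$); for $\nu = 1$ it reduces to the Cauchy density, and for very large $\nu$ it is numerically indistinguishable from the Gaussian:

```python
import math

def student_t_pdf(x, mu, lam, nu):
    # T(x|mu,lambda,nu) = Gamma((nu+1)/2)/Gamma(nu/2) * (lam/(pi*nu))^{1/2}
    #                     * [1 + lam*(x-mu)^2/nu]^{-(nu+1)/2}
    log_c = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
    c = math.exp(log_c) * math.sqrt(lam / (math.pi * nu))
    return c * (1 + lam * (x - mu)**2 / nu) ** (-(nu + 1) / 2)

def gauss_pdf(x, mu, lam):
    # N(x|mu, lambda^{-1}) in precision form
    return math.sqrt(lam / (2 * math.pi)) * math.exp(-0.5 * lam * (x - mu)**2)
```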
Student’s T Vs the Gaussian
We plot the pdfs of the Gaussian, Student's t, and Laplace distributions.

[Figure: probability density functions (linear and log scale) of the Gaussian, Student's t, and Laplace distributions, $x\in[-4,4]$.]

Run MatLab function studentLaplacePdfPlot from Kevin Murphy's PMTK. It is recommended to use $\nu = 4$.
Student’s T Distribution
$p(x\mid\mu,a,b) = \int_0^{\infty}\mathcal{N}\left(x\mid\mu,\tau^{-1}\right)\mathrm{Gamma}(\tau\mid a,b)\,d\tau$

$= \int_0^{\infty}\left(\frac{\tau}{2\pi}\right)^{1/2}\exp\left[-\frac{\tau(x-\mu)^2}{2}\right]\frac{b^{a}}{\Gamma(a)}\,\tau^{a-1}e^{-b\tau}\,d\tau$

Substituting $z = \tau\Delta$ with $\Delta = b + \frac{(x-\mu)^2}{2}$:

$p(x\mid\mu,a,b) = \frac{b^{a}}{\Gamma(a)}\left(\frac{1}{2\pi}\right)^{1/2}\Delta^{-a-\frac{1}{2}}\int_0^{\infty} e^{-z}\,z^{a+\frac{1}{2}-1}\,dz$
Appendix: Student’s T as a Mixture of Gaussians
$p(x\mid\mu,a,b) = \frac{b^{a}}{\Gamma(a)}\left(\frac{1}{2\pi}\right)^{1/2}\left[b + \frac{(x-\mu)^2}{2}\right]^{-a-\frac{1}{2}}\int_0^{\infty} e^{-z}\,z^{a+\frac{1}{2}-1}\,dz$

Recalling the definition of the Gamma function, $\Gamma(a) = \int_0^{\infty} e^{-z}\,z^{a-1}\,dz$:

$p(x\mid\mu,a,b) = \frac{b^{a}}{\Gamma(a)}\left(\frac{1}{2\pi}\right)^{1/2}\left[b + \frac{(x-\mu)^2}{2}\right]^{-a-\frac{1}{2}}\Gamma\left(a+\frac{1}{2}\right)$

It is common to redefine the parameters in this distribution as $\nu = 2a$, $\lambda = a/b$, which gives

$p(x\mid\mu,\lambda,\nu) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)}\left(\frac{\lambda}{\pi\nu}\right)^{1/2}\left[1 + \frac{\lambda(x-\mu)^2}{\nu}\right]^{-\frac{\nu+1}{2}}$
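The scale-mixture identity can be verified numerically: integrating $\mathcal{N}(x\mid\mu,\tau^{-1})\,\mathrm{Gamma}(\tau\mid a,b)$ over $\tau$ should reproduce the Student's t density with $\nu = 2a$, $\lambda = a/b$. A Python sketch using a simple midpoint rule (the grid size and truncation point are pragmatic choices, not part of the derivation):

```python
import math

def student_t_pdf(x, mu, lam, nu):
    # standard parametrization of the Student's t density
    c = math.exp(math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)) * math.sqrt(lam / (math.pi * nu))
    return c * (1 + lam * (x - mu)**2 / nu) ** (-(nu + 1) / 2)

def mixture_pdf(x, mu, a, b, n=100000, upper=60.0):
    # midpoint-rule integration of N(x|mu, 1/tau) * Gamma(tau|a, b) over tau in (0, upper]
    h = upper / n
    total = 0.0
    for i in range(n):
        tau = (i + 0.5) * h
        norm = math.sqrt(tau / (2 * math.pi)) * math.exp(-0.5 * tau * (x - mu)**2)
        gam = b**a / math.gamma(a) * tau**(a - 1) * math.exp(-b * tau)
        total += norm * gam * h
    return total
```

With $a = b = 1$ the mixture should match $\mathcal{T}(x\mid\mu, \lambda = 1, \nu = 2)$ to within the quadrature error.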
Robustness of Student’s T Distribution
The robustness of the t-distribution is illustrated here by comparing the maximum likelihood fits of a Gaussian and a t-distribution (30 data points from the Gaussian are used).
MatLab Code
Robustness of Student’s T Distribution
The earlier simulation is repeated here with the PMTK toolbox.
[Figure: maximum likelihood fits of Gaussian, Student's t, and Laplace densities to the data.]
The Laplace Distribution
Another distribution with heavy tails is the Laplace distribution, also known as the double-sided exponential distribution. It has the following pdf:

$\mathrm{Lap}(x\mid\mu,b) = \frac{1}{2b}\exp\left(-\frac{|x-\mu|}{b}\right)$

Here $\mu$ is a location parameter and $b > 0$ is a scale parameter.
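The pdf is straightforward to code; a brief Python sketch (note the symmetry about $\mu$ and the value $\frac{1}{2b}$ at the peak):

```python
import math

def laplace_pdf(x, mu, b):
    # Lap(x|mu,b) = exp(-|x - mu| / b) / (2b)
    return math.exp(-abs(x - mu) / b) / (2 * b)
```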
Beta Distribution
The Beta($\alpha,\beta$) distribution with $x\in[0,1]$ and $\alpha,\beta > 0$ is defined as follows:

$\mathrm{Beta}(x\mid\alpha,\beta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\,x^{\alpha-1}(1-x)^{\beta-1} = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha,\beta)}, \qquad B(\alpha,\beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)} \;\text{(normalizing factor)}$

$\mathrm{mode}[x] = \frac{\alpha-1}{\alpha+\beta-2}, \qquad \mathbb{E}[x] = \frac{\alpha}{\alpha+\beta}, \qquad \mathrm{var}[x] = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$

[Figure: Beta densities for $(\alpha,\beta) = (0.1, 0.1), (1.0, 1.0), (2.0, 3.0), (8.0, 4.0)$.]
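A Python sketch of the Beta density via `math.gamma`; as checks, $\alpha=\beta=1$ gives the uniform density, and for $\alpha=2,\beta=3$ the density peaks at the mode $(\alpha-1)/(\alpha+\beta-2) = 1/3$:

```python
import math

def beta_pdf(x, a, b):
    # Beta(x|a,b) = Gamma(a+b) / (Gamma(a) * Gamma(b)) * x^(a-1) * (1-x)^(b-1)
    c = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return c * x**(a - 1) * (1 - x)**(b - 1)
```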
Beta Distribution
If 𝛼 = 𝛽 = 1, we obtain a uniform distribution.
Run betaPlotDemo from PMTK.

[Figure: Beta densities for $\alpha,\beta < 1$ and for $\alpha = \beta = 1$ (uniform), $x\in[0,1]$.]
Beta Distribution
[Figure: Beta(0.1,0.1), Beta(1,1), Beta(2,3), and Beta(8,4) densities as functions of $x\in[0,1]$.]
Gamma Function
The Gamma function, $\Gamma(\alpha) = \int_0^{\infty} u^{\alpha-1}e^{-u}\,du$, provides the normalizing factor of the Beta density:

$\mathrm{Beta}(x\mid\alpha,\beta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\,x^{\alpha-1}(1-x)^{\beta-1}$
Beta Distribution: Normalization
Showing that the Beta($\alpha,\beta$) distribution is normalized correctly is a bit tricky. We need to prove that:

$\int_0^1 \mu^{\alpha-1}(1-\mu)^{\beta-1}\,d\mu = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}$

Follow the steps: (a) change the variable $y$ below via $t = x + y$; (b) change the order of integration in the shaded triangular region $\{0\le x\le t\}$; and (c) change $x$ to $\mu$ via $x = t\mu$:

$\Gamma(\alpha)\Gamma(\beta) = \int_0^{\infty} x^{\alpha-1}e^{-x}\,dx\int_0^{\infty} y^{\beta-1}e^{-y}\,dy = \int_0^{\infty} x^{\alpha-1}\left[\int_x^{\infty}(t-x)^{\beta-1}e^{-t}\,dt\right]dx$

$= \int_0^{\infty}\left[\int_0^{t} x^{\alpha-1}(t-x)^{\beta-1}\,dx\right]e^{-t}\,dt = \int_0^{\infty} t^{\alpha+\beta-1}e^{-t}\,dt\int_0^1 \mu^{\alpha-1}(1-\mu)^{\beta-1}\,d\mu$

$= \Gamma(\alpha+\beta)\int_0^1 \mu^{\alpha-1}(1-\mu)^{\beta-1}\,d\mu$
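The normalization can also be confirmed numerically: a midpoint-rule estimate of $\int_0^1 \mu^{\alpha-1}(1-\mu)^{\beta-1}\,d\mu$ should match $\Gamma(\alpha)\Gamma(\beta)/\Gamma(\alpha+\beta)$. A Python sketch with illustrative $\alpha, \beta$:

```python
import math

a, b = 2.0, 3.0
n = 100000
# midpoint rule for int_0^1 mu^(a-1) * (1-mu)^(b-1) d mu
integral = sum(((i + 0.5) / n)**(a - 1) * (1 - (i + 0.5) / n)**(b - 1) for i in range(n)) / n
beta_ab = math.gamma(a) * math.gamma(b) / math.gamma(a + b)   # = B(a, b)
```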
Gamma Distribution- Rate Parametrization
The Gamma distribution, in its rate parametrization, is frequently used as a model for waiting times:

$\mathrm{Gamma}(x\mid a,b) = \frac{b^{a}}{\Gamma(a)}\,x^{a-1}e^{-bx}, \qquad x > 0,$

where $a > 0$ is the shape and $b > 0$ the rate parameter.
Gamma Distribution
Plots of $\mathrm{Gamma}(x\mid a,b) = \frac{b^{a}}{\Gamma(a)}\,x^{a-1}e^{-bx}$ for $b = 1$ and several values of $a$.

[Figure: Gamma densities for $b = 1$, $x\in[0,7]$.]

Run gammaPlotDemo from PMTK.
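A Python sketch of the rate-parametrized Gamma density; setting $a = 1$ should recover the exponential density $\lambda e^{-\lambda x}$ introduced on a later slide:

```python
import math

def gamma_pdf(x, a, b):
    # Gamma(x|a,b) = b^a / Gamma(a) * x^(a-1) * exp(-b*x), with rate b
    return b**a / math.gamma(a) * x**(a - 1) * math.exp(-b * x)
```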
Gamma Distribution
An empirical PDF of rainfall data fitted with a Gamma
distribution.
[Figure: empirical rainfall pdf with Gamma fits by the method of moments (MoM, left) and maximum likelihood (MLE, right).]
Exponential Distribution
This is defined as
$\mathrm{Expon}(x\mid\lambda) = \mathrm{Gamma}(x\mid 1,\lambda) = \lambda e^{-\lambda x}, \qquad x\ge 0$
Here 𝜆 is the rate parameter.
Chi-Squared Distribution
This is defined as
$\chi_{\nu}^{2}(x) = \mathrm{Gamma}\left(x\,\Big|\,\frac{\nu}{2},\frac{1}{2}\right) = \frac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\,x^{\frac{\nu}{2}-1}e^{-x/2}, \qquad x\ge 0$

More precisely, let $Z_i\sim\mathcal{N}(0,1)$ and $S = \sum_{i=1}^{\nu} Z_i^2$; then $S\sim\chi_{\nu}^{2}$.
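Since the chi-squared density is just a Gamma density with $a = \nu/2$, $b = 1/2$, it can be sketched by reusing the Gamma pdf; for $\nu = 2$ it reduces to $\mathrm{Expon}(x\mid 1/2)$, which makes a simple check:

```python
import math

def gamma_pdf(x, a, b):
    # rate-parametrized Gamma density
    return b**a / math.gamma(a) * x**(a - 1) * math.exp(-b * x)

def chi2_pdf(x, nu):
    # chi^2_nu(x) = Gamma(x | nu/2, 1/2)
    return gamma_pdf(x, nu / 2, 0.5)
```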
Inverse Gamma Distribution
This is defined as follows:
If $X\sim\mathrm{Gamma}(a,b)$, then $\frac{1}{X}\sim\mathrm{InvGamma}(a,b)$, where:

$\mathrm{InvGamma}(x\mid a,b) = \frac{b^{a}}{\Gamma(a)}\,x^{-(a+1)}e^{-b/x}, \qquad x > 0$

$a$ is the shape and $b$ the scale parameter.
The Pareto Distribution
Used to model the distribution of quantities that exhibit
long tails (heavy tails)
$\mathrm{Pareto}(x\mid k,m) = k\,m^{k}\,x^{-(k+1)}\,\mathbb{I}(x\ge m)$

This density asserts that $x$ must be greater than some constant $m$, but not too much greater; $k$ controls what is "too much".
Modeling the frequency of words vs. their rank (e.g. “the”,
“of”, etc.) or the wealth of people.*
As 𝑘 → ∞, the distribution approaches 𝛿(𝑥 − 𝑚).
On a log-log scale, the pdf forms a straight line of the form
log 𝑝(𝑥) = 𝑎 log 𝑥 + 𝑐 for some constants 𝑎 and 𝑐 (power
law, Zipf’s law).
* Basis of the distribution: a high proportion of a population has low income and only few have very high incomes.
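A Python sketch of the Pareto density; the second part verifies the power-law property numerically, since on a log-log scale the slope of $\log p(x)$ against $\log x$ equals $-(k+1)$ (the values of $k$ and $m$ are illustrative):

```python
import math

def pareto_pdf(x, k, m):
    # Pareto(x|k,m) = k * m^k * x^{-(k+1)} * I(x >= m)
    return k * m**k * x**(-(k + 1)) if x >= m else 0.0

# log-log linearity: log p(x) = -(k+1) * log x + log(k * m^k)
k, m = 2.0, 1.0
slope = (math.log(pareto_pdf(4.0, k, m)) - math.log(pareto_pdf(2.0, k, m))) \
        / (math.log(4.0) - math.log(2.0))
```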
The Pareto Distribution
Applications: Modeling the frequency of words vs their
rank, distribution of wealth (𝑘 =Pareto Index), etc.
$\mathrm{Pareto}(x\mid k,m) = k\,m^{k}\,x^{-(k+1)}\,\mathbb{I}(x\ge m)$

Mean: $\frac{km}{k-1}$ (if $k > 1$); Mode: $m$; Var: $\frac{m^{2}k}{(k-1)^{2}(k-2)}$ (if $k > 2$)

[Figure: Pareto densities for $(m,k) = (0.01, 0.10), (0.00, 0.50), (1.00, 1.00)$.]
Covariance
Consider two random variables $X, Y:\Omega\to\mathbb{R}$.

$P(X\in A, Y\in B) = P\left(X^{-1}(A)\cap Y^{-1}(B)\right) = \int_A\int_B p(x,y)\,dx\,dy$

$X$ and $Y$ are independent if and only if $p(x,y) = p(x)\,p(y)$.

$\mathrm{cov}(X,Y) = \mathbb{E}\left[(X-\mathbb{E}[X])(Y-\mathbb{E}[Y])\right]$
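The sample version of this covariance (dividing by $N$) is a one-liner; in the hypothetical data below, $y = 2x$, so $\mathrm{cov}(X,Y) = 2\,\mathrm{var}(X)$:

```python
def cov(xs, ys):
    # cov(X,Y) = E[(X - E[X]) * (Y - E[Y])], sample version (divide by N)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]   # y = 2x, perfectly correlated with x
```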
Correlation, Center Normalized Random Variables
Consider two random variables $X, Y:\Omega\to\mathbb{R}$. Define the centered, normalized variables

$\tilde X = \frac{X - \mathbb{E}[X]}{\sqrt{\mathrm{var}[X]}}, \qquad \tilde Y = \frac{Y - \mathbb{E}[Y]}{\sqrt{\mathrm{var}[Y]}},$

so that $\mathbb{E}[\tilde X] = \mathbb{E}[\tilde Y] = 0$ and $\mathrm{var}[\tilde X] = \mathrm{var}[\tilde Y] = 1$.