Common probability distributions

D. Joyce, Clark University


Aug 2006
1 Introduction.
I summarize here some of the more common distributions used in probability and statistics.
Some are more important than others, and not all of them are used in all fields.
I've identified four sources of these distributions, although there are more than these.
- the uniform distributions, either discrete Uniform(n), or continuous Uniform(a, b).
- those that come from the Bernoulli process or sampling with or without replacement:
  Bernoulli(p), Binomial(n, p), Geometric(p), NegativeBinomial(p, r),
  and Hypergeometric(N, M, n).
- those that come from the Poisson process: Poisson($\lambda t$), Exponential($\lambda$),
  Gamma($\lambda, r$), and Beta($\alpha, \beta$).
- those related to the Central Limit Theorem: Normal($\mu, \sigma^2$), ChiSquared($\nu$),
  T($\nu$), and F($\nu_1, \nu_2$).
For each distribution, I give the name of the distribution along with its parameters
and indicate whether it is a discrete distribution or a continuous one. Then I describe an
example interpretation for a random variable $X$ having that distribution. Each discrete
distribution is determined by a probability mass function $f$ which gives the probabilities for
the various outcomes, so that $f(x) = P(X = x)$, the probability that a random variable $X$
with that distribution takes on the value $x$. Each continuous distribution is determined by a
probability density function $f$, which, when integrated from $a$ to $b$, gives you the probability
$P(a \le X \le b)$. Next, I list the mean $\mu = E(X)$ and variance
$\sigma^2 = E((X - \mu)^2) = E(X^2) - \mu^2$
for the distribution, and for most of the distributions I include the moment generating function
$m(t) = E(e^{tX})$. Finally, I indicate how some of the distributions may be used.
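The quantities listed for each distribution can be computed directly from a mass function. The following is a minimal Python sketch, assuming a discrete distribution stored as a dictionary from values to probabilities and taking the moment generating function to be $E(e^{tX})$. The helper names and the three-point example distribution are illustrative assumptions, not part of the text.

```python
import math

def mean(pmf):
    """Mean of a discrete distribution given as {value: probability}."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Variance computed from the definition E((X - mu)^2)."""
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

def variance_shortcut(pmf):
    """Variance computed as E(X^2) - mu^2."""
    mu = mean(pmf)
    return sum(x * x * p for x, p in pmf.items()) - mu ** 2

def mgf(pmf, t):
    """Moment generating function m(t) = E(e^{tX})."""
    return sum(math.exp(t * x) * p for x, p in pmf.items())

# Hypothetical example distribution (not from the text): values 0, 1, 2.
example_pmf = {0: 0.2, 1: 0.5, 2: 0.3}

mu = mean(example_pmf)
var_a = variance(example_pmf)
var_b = variance_shortcut(example_pmf)

# m'(0), approximated by a central difference, should be close to the mean.
h = 1e-5
mgf_slope_at_zero = (mgf(example_pmf, h) - mgf(example_pmf, -h)) / (2 * h)
```

Both variance formulas agree up to rounding, and the slope of $m$ at 0 recovers the mean, which is the sense in which $m$ "generates" moments.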
1.1 Background on gamma and beta functions.
Some of the functions below are described in terms of the gamma and beta functions. These
are, in some sense, continuous versions of the factorial function $n!$ and binomial coefficients
$\binom{n}{k}$, as described below.
The gamma function, $\Gamma(x)$, is defined for positive real numbers $x$ by the integral
$$\Gamma(x) = \int_0^\infty t^{x-1} e^{-t}\,dt.$$
The recursion formula $\Gamma(x+1) = x\,\Gamma(x)$ extends it to every real number $x$ except 0 and
the negative integers. The function generalizes the factorial function in the sense that when
$x$ is a positive integer, $\Gamma(x) = (x-1)!$. This follows from the recursion formula and the
fact that $\Gamma(1) = 1$, both of which can be easily proved by methods of calculus.
The values of the $\Gamma$ function at half integers are important for some of the distributions
mentioned below. A bit of clever work shows that $\Gamma(\tfrac12) = \sqrt{\pi}$, so by the recursion formula,
$$\Gamma\left(n + \tfrac12\right) = \frac{(2n-1)(2n-3)\cdots 3 \cdot 1}{2^n}\,\sqrt{\pi}.$$
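The beta function, $B(\alpha, \beta)$, is a function of two positive real arguments. Like the $\Gamma$
function, it is defined by an integral, but it can also be defined in terms of the $\Gamma$ function:
$$B(\alpha, \beta) = \int_0^1 t^{\alpha-1} (1-t)^{\beta-1}\,dt = \frac{\Gamma(\alpha)\,\Gamma(\beta)}{\Gamma(\alpha+\beta)}.$$
The beta function is related to binomial coefficients, for when $\alpha$ and $\beta$ are positive integers,
$$B(\alpha, \beta) = \frac{(\alpha-1)!\,(\beta-1)!}{(\alpha+\beta-1)!} = \frac{1}{\alpha+\beta-1}\binom{\alpha+\beta-2}{\alpha-1}^{-1}.$$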
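The identities in this section are easy to spot-check numerically. The sketch below uses Python's standard `math` module; the helper names `beta_fn` and `beta_integral`, the midpoint rule, and the particular test values are illustrative assumptions rather than anything prescribed by the text.

```python
import math

def beta_fn(a, b):
    """Beta function via B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b)."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

def beta_integral(a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of t^(a-1) (1-t)^(b-1) over [0, 1]."""
    h = 1.0 / steps
    return h * sum(((i + 0.5) * h) ** (a - 1) * (1 - (i + 0.5) * h) ** (b - 1)
                   for i in range(steps))

# Gamma(x) = (x - 1)! for positive integers x.
factorial_matches = all(math.isclose(math.gamma(k), math.factorial(k - 1))
                        for k in range(1, 10))

# Gamma(n + 1/2) = (2n-1)(2n-3)...3*1 / 2^n * sqrt(pi), here with n = 4.
n = 4
odd_product = math.prod(range(1, 2 * n, 2))   # 1 * 3 * 5 * 7
half_integer_value = odd_product / 2 ** n * math.sqrt(math.pi)

# For positive integers, B(a, b) = (a-1)! (b-1)! / (a+b-1)!.
a, b = 3, 4
beta_from_factorials = (math.factorial(a - 1) * math.factorial(b - 1)
                        / math.factorial(a + b - 1))
```

Comparing `beta_fn(a, b)`, `beta_integral(a, b)` and `beta_from_factorials` gives three routes to the same number, which is a quick way to catch an off-by-one in the factorial form.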
2 The uniform distributions
Uniform distributions come in two kinds, discrete and continuous. They share the property
that all possible values are equally likely.
2.1 Discrete uniform distribution.
Uniform(n). Discrete.
In general, a discrete uniform random variable $X$ can take any finite set as values, but
here I only consider the case when $X$ takes on integer values from 1 to $n$, where the parameter
$n$ is a positive integer. Each value has the same probability, namely $1/n$.
$$f(x) = 1/n, \quad \text{for } x = 1, 2, \ldots, n.$$
$$\mu = \frac{n+1}{2}. \qquad \sigma^2 = \frac{n^2-1}{12}.$$
$$m(t) = \frac{e^t\,(1 - e^{nt})}{n\,(1 - e^t)}.$$
Example. A typical example is tossing a fair cubic die. $n = 6$, $\mu = 3.5$, and $\sigma^2 = \frac{35}{12}$.
2.2 Continuous uniform distribution.
Uniform(a, b). Continuous.
In general, a continuous uniform variable X takes values on a curve, surface, or higher
dimensional region, but here I only consider the case when X takes values in an interval [a, b].
Being uniform, the probability that X lies in a subinterval is proportional to the length of
that subinterval.
$$f(x) = \frac{1}{b-a}, \quad \text{for } x \in [a, b].$$
$$\mu = \frac{a+b}{2}. \qquad \sigma^2 = \frac{(b-a)^2}{12}.$$
$$m(t) = \frac{e^{bt} - e^{at}}{t\,(b-a)}.$$
Example. An example of an approximately uniform distribution on the interval $[-\tfrac12, \tfrac12]$ is
the roundoff error when rounding to the nearest integer. This assumes that the distribution
of numbers being rounded off is fairly smooth and stretches over several integers.
Note. Most computer programming languages have a built-in pseudorandom number generator
which generates numbers $X$ in the unit interval $[0, 1]$. Random number generators for
any other distribution can then be computed by applying the inverse of the cumulative
distribution function for that distribution to $X$.
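As a minimal sketch of the inverse-CDF method in the note above, the following Python code draws from an exponential distribution with rate $\lambda$. It assumes the standard cumulative distribution function $F(x) = 1 - e^{-\lambda x}$, which the text does not state; the function name, seed, and rate are illustrative choices.

```python
import math
import random

def sample_exponential(rate, rng):
    """Draw one Exponential(rate) value by inverting its CDF F(x) = 1 - e^(-rate x)."""
    u = rng.random()                  # uniform on [0, 1)
    return -math.log(1.0 - u) / rate  # F^{-1}(u); 1 - u lies in (0, 1], so the log is defined

rng = random.Random(2006)             # fixed seed so the run is reproducible
rate = 2.0
samples = [sample_exponential(rate, rng) for _ in range(100_000)]
sample_mean = sum(samples) / len(samples)   # should land near 1 / rate
```

The same pattern works for any distribution whose cumulative distribution function can be inverted in closed form; otherwise the inverse has to be approximated numerically.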
3 The Bernoulli process and sampling with or without replacement
A single trial for a Bernoulli process, called a Bernoulli trial, ends with one of two outcomes,
one often called success, the other called failure. Success occurs with probability $p$ while
failure occurs with probability $1 - p$, usually denoted $q$.
The Bernoulli process consists of repeated independent Bernoulli trials with the same
parameter $p$. These trials form a random sample from the Bernoulli population.
You can ask various questions about a Bernoulli process, and the answers to these questions
have various distributions. If you ask how many successes there will be among $n$
Bernoulli trials, then the answer will have a binomial distribution, Binomial(n, p). The
Bernoulli distribution, Bernoulli(p), simply says whether one trial is a success. If you ask
how many trials it will be to get the first success, then the answer will have a geometric
distribution, Geometric(p). If you ask how many trials there will be to get the $r$th success,
then the answer will have a negative binomial distribution, NegativeBinomial(p, r).
Given that there are $M$ successes among $N$ trials, if you ask how many of the first $n$ trials
are successes, then the answer will have a Hypergeometric(N, M, n) distribution.
Sampling with replacement is described below under the binomial distribution, while
sampling without replacement is described under the hypergeometric distribution.
3.1 Bernoulli distribution.
Bernoulli(p). Discrete.
The parameter p is a real number between 0 and 1. The random variable X takes on two
values: 1 (success) or 0 (failure). The probability of success, P(X=1), is the parameter p.
The symbol $q$ is often used for $1 - p$, the probability of failure, $P(X = 0)$.
$$f(0) = 1 - p, \quad f(1) = p.$$
$$\mu = p. \qquad \sigma^2 = p(1-p) = pq.$$
$$m(t) = pe^t + q.$$
Example. If you flip a fair coin, then $p = \tfrac12$, $\mu = p = \tfrac12$, and $\sigma^2 = \tfrac14$.
3.2 Binomial distribution.
Binomial(n, p). Discrete.
Here, X is the sum of n independent Bernoulli trials, each Bernoulli(p), so X = x means
there were x successes among the n trials. (Note that the Bernoulli(p) = Binomial(1, p).)
$$f(x) = \binom{n}{x} p^x (1-p)^{n-x}, \quad \text{for } x = 0, 1, \ldots, n.$$
$$\mu = np. \qquad \sigma^2 = np(1-p) = npq.$$
$$m(t) = (pe^t + q)^n.$$
Sampling with replacement. Sampling with replacement occurs when a set of $N$ elements
has a subset of $M$ preferred elements, and $n$ elements are chosen at random, but the $n$
elements don't have to be distinct. In other words, after one element is chosen, it's put
back in the set so it can be chosen again. Selecting a preferred element is success, and that
happens with probability $p = M/N$, while selecting a nonpreferred element is failure, and
that happens with probability $q = 1 - p = (N - M)/N$. Thus, sampling with replacement is
a Bernoulli process.
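To illustrate, here is a small Python simulation of sampling with replacement, compared against the binomial mass function with $p = M/N$. The population sizes, seed, and number of repetitions are hypothetical values chosen only for this sketch.

```python
import math
import random

def binomial_pmf(x, n, p):
    """f(x) = C(n, x) p^x (1 - p)^(n - x)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

# Hypothetical population: N elements, the first M of which are "preferred".
N, M, n = 50, 20, 10
p = M / N

rng = random.Random(1)   # fixed seed for reproducibility
trials = 20_000
counts = [0] * (n + 1)
for _ in range(trials):
    # Draw n elements with replacement; element i is preferred when i < M.
    successes = sum(1 for _ in range(n) if rng.randrange(N) < M)
    counts[successes] += 1

empirical = [c / trials for c in counts]
exact = [binomial_pmf(x, n, p) for x in range(n + 1)]
max_gap = max(abs(e - f) for e, f in zip(empirical, exact))
```

With this many repetitions the empirical frequencies should sit within a few hundredths of the exact Binomial(n, p) probabilities.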
A bit of history. The first serious development in the theory of probability was in the
1650s when Pascal and Fermat investigated the binomial distribution in the special case
$p = \tfrac12$. Pascal published the resulting theory of binomial coefficients and properties of what
we now call Pascal's triangle.
In the very early 1700s Jacob Bernoulli extended these results to general values of $p$.
3.3 Geometric distribution.
Geometric(p). Discrete.
When independent Bernoulli trials are repeated, each with probability p of success, the
number of trials $X$ it takes to get the first success has a geometric distribution.
$$f(x) = q^{x-1} p, \quad \text{for } x = 1, 2, \ldots.$$
$$\mu = \frac{1}{p}. \qquad \sigma^2 = \frac{1-p}{p^2}.$$
$$m(t) = \frac{pe^t}{1 - qe^t}.$$
3.4 Negative binomial distribution.
NegativeBinomial(p, r). Discrete.
When independent Bernoulli trials are repeated, each with probability p of success, and
$X$ is the trial number when $r$ successes are first achieved, then $X$ has a negative binomial
distribution. Note that Geometric(p) = NegativeBinomial(p, 1).
$$f(x) = \binom{x-1}{r-1} p^r q^{x-r}, \quad \text{for } x = r, r+1, \ldots.$$
$$\mu = \frac{r}{p}. \qquad \sigma^2 = \frac{rq}{p^2}.$$
$$m(t) = \left(\frac{pe^t}{1 - qe^t}\right)^r.$$
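The mean and variance above can be checked by summing the mass function directly. This is a rough Python sketch; the values of $p$ and $r$ and the truncation point of the infinite sum are assumptions made for illustration.

```python
import math

def negative_binomial_pmf(x, p, r):
    """f(x) = C(x-1, r-1) p^r q^(x-r): probability the r-th success is on trial x."""
    q = 1 - p
    return math.comb(x - 1, r - 1) * p ** r * q ** (x - r)

p, r = 0.3, 4
xs = range(r, 400)   # truncate the infinite support; the tail beyond 400 is negligible here
total = sum(negative_binomial_pmf(x, p, r) for x in xs)
mean = sum(x * negative_binomial_pmf(x, p, r) for x in xs)
var = sum((x - mean) ** 2 * negative_binomial_pmf(x, p, r) for x in xs)

# With r = 1 the formula reduces to the geometric mass function q^(x-1) p.
geometric_agrees = all(
    math.isclose(negative_binomial_pmf(x, p, 1), (1 - p) ** (x - 1) * p)
    for x in range(1, 50)
)
```

The sums should reproduce $r/p$ and $rq/p^2$ to within rounding error, and the $r = 1$ case confirms that the geometric distribution is a special case.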
3.5 Hypergeometric distribution.
Hypergeometric(N, M, n). Discrete.
In a Bernoulli process, given that there are M successes among N trials, the number X
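of successes among the first $n$ trials has a Hypergeometric(N, M, n) distribution.
$$f(x) = \frac{\binom{M}{x}\binom{N-M}{n-x}}{\binom{N}{n}}, \quad \text{for } x = 0, 1, \ldots, n.$$
$$\mu = np. \qquad \sigma^2 = \left(\frac{N-n}{N-1}\right) npq.$$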
Sampling without replacement. Sampling with replacement was mentioned above in
the section on the binomial distribution. Sampling without replacement is similar, but once
an element is selected from a set, it is taken out of the set so that it can't be selected again.
Formally, we have a set of $N$ elements with a subset of $M$ preferred elements, and $n$
distinct elements among the $N$ elements are to be chosen at random. The symbol $p$ is often
used for the probability of choosing a preferred element, that is, $p = M/N$, and $q = 1 - p$.
The hypergeometric distribution, described above, answers the question: of the $n$ chosen
elements, how many are preferred?
When $n$ is a small fraction of $N$, then sampling without replacement is almost the same
as sampling with replacement, and the hypergeometric distribution is almost a binomial
distribution.
Many surveys have questions with only two responses, such as yes/no, for/against, or
candidate A/B. These are actually sampling without replacement because the same person
won't be asked to respond to the survey twice. But since only a small portion of the population
will respond, the analysis of surveys can be treated as sampling with replacement rather than
sampling without replacement (which it actually is).
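The claim that the hypergeometric distribution is almost binomial when $n$ is a small fraction of $N$ can be seen numerically. The Python sketch below compares the two mass functions for a small and a large population; the population sizes and the 40% share of preferred elements are hypothetical values.

```python
import math

def hypergeometric_pmf(x, N, M, n):
    """f(x) = C(M, x) C(N-M, n-x) / C(N, n)."""
    return math.comb(M, x) * math.comb(N - M, n - x) / math.comb(N, n)

def binomial_pmf(x, n, p):
    """f(x) = C(n, x) p^x (1 - p)^(n - x)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def max_pmf_gap(N, M, n):
    """Largest pointwise difference between the two mass functions, with p = M / N."""
    p = M / N
    return max(abs(hypergeometric_pmf(x, N, M, n) - binomial_pmf(x, n, p))
               for x in range(n + 1))

# Hypothetical survey: 40% of the population prefers candidate A, n = 10 respondents.
gap_small_population = max_pmf_gap(N=50, M=20, n=10)          # n is 20% of N
gap_large_population = max_pmf_gap(N=50_000, M=20_000, n=10)  # n is 0.02% of N

# Mean and variance in the small-population case, for comparison with
# np and ((N - n) / (N - 1)) npq.
N, M, n = 50, 20, 10
pmf = [hypergeometric_pmf(x, N, M, n) for x in range(n + 1)]
mean = sum(x * f for x, f in enumerate(pmf))
var = sum((x - mean) ** 2 * f for x, f in enumerate(pmf))
```

The gap shrinks sharply as $N$ grows with $n$ fixed, while in the small population the variance is visibly reduced by the factor $(N-n)/(N-1)$.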
4 The Poisson process
A Poisson process is the continuous version of a Bernoulli process. In a Bernoulli process,
time is discrete, and at each time unit there is a certain probability p that success occurs,
the same probability at any given time, and the events at one time instant are independent
of the events at other time instants.
In a Poisson process, time is continuous, and there is a certain rate $\lambda$ of events occurring
per unit time that is the same for any time interval, and events occur independently of each
other. Whereas in a Bernoulli process either zero or one event occurs in a unit time interval,
in a Poisson process any nonnegative whole number of events can occur in unit time.
As in a Bernoulli process, you can ask various questions about a Poisson process, and
the answers will have various distributions. If you ask how many events occur in an interval
of length $t$, then the answer will have a Poisson distribution, Poisson($\lambda t$). If you ask
how long until the first event occurs, then the answer will have an exponential distribution,
Exponential($\lambda$). If you ask how long until the $r$th event, then the answer will have a
gamma distribution, Gamma($\lambda, r$). If you watch until $\alpha + \beta$ events have occurred and
ask what fraction of that time had elapsed when the $\alpha$th event occurred, then the answer will
have a Beta($\alpha, \beta$) distribution.
4.1 Poisson distribution.
Poisson($\lambda t$). Discrete.
When events occur uniformly at random over time at a rate of $\lambda$ events per unit time,
then the random variable $X$ giving the number of events in a time interval of length $t$ has a
Poisson distribution.
Often the Poisson distribution is given just with the parameter $\lambda$, but I find it useful to
incorporate lengths of time $t$ other than requiring that $t = 1$.
$$f(x) = \frac{1}{x!} (\lambda t)^x e^{-\lambda t}, \quad \text{for } x = 0, 1, \ldots.$$
$$\mu = \lambda t. \qquad \sigma^2 = \lambda t.$$
$$m(s) = e^{\lambda t (e^s - 1)}.$$
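A quick numerical check of the Poisson formulas: the Python sketch below sums the mass function to recover the mean, the variance, and the moment generating function at one point. The rate, the interval length, the value of $s$, and the truncation of the infinite sum are assumptions made for the example.

```python
import math

def poisson_pmf(x, rate, t):
    """f(x) = (rate t)^x e^(-rate t) / x!."""
    return (rate * t) ** x * math.exp(-rate * t) / math.factorial(x)

rate, t = 3.0, 1.5   # hypothetical: 3 events per unit time, interval of length 1.5
xs = range(0, 100)   # truncate the infinite support; the tail is negligible here
total = sum(poisson_pmf(x, rate, t) for x in xs)
mean = sum(x * poisson_pmf(x, rate, t) for x in xs)
var = sum((x - mean) ** 2 * poisson_pmf(x, rate, t) for x in xs)

# Moment generating function from the mass function, compared with exp(rate t (e^s - 1)).
s = 0.2
mgf_from_pmf = sum(math.exp(s * x) * poisson_pmf(x, rate, t) for x in xs)
mgf_closed_form = math.exp(rate * t * (math.exp(s) - 1))
```

Both the mean and the variance should come out equal to the product of the rate and the interval length, here 4.5.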
4.2 Exponential distribution.
Exponential($\lambda$). Continuous.
When events occur uniformly at random over time at a rate of $\lambda$ events per unit time,
then the random variable $X$ giving the time to the first event has an exponential distribution.
$$f(x) = \lambda e^{-\lambda x}, \quad \text{for } x \in [0, \infty).$$
$$\mu = 1/\lambda. \qquad \sigma^2 = 1/\lambda^2.$$
$$m(t) = (1 - t/\lambda)^{-1}.$$
4.3 Gamma distribution.
Gamma($\lambda, r$) or Gamma($\alpha, \beta$). Continuous.
In the same Poisson process for the exponential distribution, the gamma distribution gives
the time to the $r$th event. Thus, Exponential($\lambda$) = Gamma($\lambda, 1$).
The gamma distribution also has applications when $r$ is not an integer. For that generality
the factorial function is replaced by the gamma function, $\Gamma(x)$, described above.
There is an alternate parameterization Gamma($\alpha, \beta$) of the family of gamma distributions.
The connection is $\alpha = r$, and $\beta = 1/\lambda$ which is the expected time to the first event in
a Poisson process.
$$f(x) = \frac{1}{\Gamma(r)} \lambda^r x^{r-1} e^{-\lambda x} = \frac{x^{\alpha-1} e^{-x/\beta}}{\beta^\alpha\,\Gamma(\alpha)}, \quad \text{for } x \in [0, \infty).$$
$$\mu = r/\lambda = \alpha\beta. \qquad \sigma^2 = r/\lambda^2 = \alpha\beta^2.$$
$$m(t) = (1 - t/\lambda)^{-r} = (1 - \beta t)^{-\alpha}.$$
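The interpretation of the gamma distribution as the time to the $r$th event can be simulated. The Python sketch below assumes that the gaps between consecutive events of a Poisson process are independent exponential waits with the same rate, and uses the standard library's `random.expovariate`. The rate, $r$, seed, and number of repetitions are illustrative.

```python
import random

rng = random.Random(42)   # fixed seed for reproducibility
rate, r = 2.0, 5          # hypothetical: 2 events per unit time, wait for the 5th event
trials = 50_000

def time_to_rth_event(rate, r, rng):
    """Sum of r independent exponential gaps between consecutive events."""
    return sum(rng.expovariate(rate) for _ in range(r))

waits = [time_to_rth_event(rate, r, rng) for _ in range(trials)]
sample_mean = sum(waits) / trials
sample_var = sum((w - sample_mean) ** 2 for w in waits) / (trials - 1)
```

The sample mean and variance should land close to $r/\lambda = 2.5$ and $r/\lambda^2 = 1.25$ respectively.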
Application to Bayesian statistics. Gamma distributions are used in Bayesian statistics
as conjugate priors for the distributions in the Poisson process. In Gamma($\alpha, \beta$), $\alpha$ counts
the number of occurrences observed while $\beta$ keeps track of the elapsed time.
4.4 Beta distribution.
Beta($\alpha, \beta$). Continuous.
In a Poisson process, if you watch until $\alpha + \beta$ events have occurred, then the fraction of
that time that had elapsed when the $\alpha$th event occurred has a Beta($\alpha, \beta$) distribution.
Note that Beta(1, 1) = Uniform(0, 1).
In the following formula, $B(\alpha, \beta)$ is the beta function described above.
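$$f(x) = \frac{1}{B(\alpha, \beta)} x^{\alpha-1} (1-x)^{\beta-1}, \quad \text{for } 0 \le x \le 1.$$
$$\mu = \frac{\alpha}{\alpha+\beta}. \qquad \sigma^2 = \frac{\alpha\beta}{(\alpha+\beta)^2 (\alpha+\beta+1)}.$$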
Application to Bayesian statistics. Beta distributions are used in Bayesian statistics
as conjugate priors for the distributions in the Bernoulli process. Indeed, that's just what
Thomas Bayes (1702–1761) did. In Beta($\alpha, \beta$), $\alpha$ counts the number of successes observed
while $\beta$ keeps track of the failures observed.
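The bookkeeping role of the two parameters can be sketched in a few lines of Python. This assumes the usual conjugate update, in which each observed success adds 1 to the first parameter and each failure adds 1 to the second; the function names and the data are hypothetical.

```python
from fractions import Fraction

def update_beta(alpha, beta, observations):
    """Return the posterior Beta parameters after 0/1 observations (1 = success)."""
    successes = sum(observations)
    failures = len(observations) - successes
    return alpha + successes, beta + failures

def beta_mean(alpha, beta):
    """Mean of Beta(alpha, beta), namely alpha / (alpha + beta)."""
    return Fraction(alpha, alpha + beta)

prior = (1, 1)                            # Beta(1, 1) = Uniform(0, 1)
data = [1, 0, 1, 1, 0, 1, 1, 1]           # hypothetical outcomes: 6 successes, 2 failures
posterior = update_beta(*prior, data)     # (7, 3)
posterior_mean = beta_mean(*posterior)    # Fraction(7, 10)
```

Starting from the uniform prior, the posterior mean 7/10 sits between the prior mean 1/2 and the observed success rate 6/8.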
5 Distributions related to the central limit theorem
The Central Limit Theorem says sample means and sample sums approach normal distributions
as the sample size approaches infinity.
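A rough simulation makes the statement concrete. The Python sketch below draws many samples from Uniform(0, 1), standardizes each sample mean, and counts how often it falls within 1.96 of zero, which a standard normal variable does about 95% of the time. The sample size, seed, and number of repetitions are arbitrary choices for illustration.

```python
import math
import random

rng = random.Random(7)      # fixed seed for reproducibility
n = 30                      # size of each sample
trials = 20_000             # number of sample means to simulate

mu = 0.5                    # mean of Uniform(0, 1)
sigma = math.sqrt(1 / 12)   # standard deviation of Uniform(0, 1)

def standardized_mean(n, rng):
    """(sample mean - mu) / (sigma / sqrt(n)) for n Uniform(0, 1) draws."""
    xbar = sum(rng.random() for _ in range(n)) / n
    return (xbar - mu) / (sigma / math.sqrt(n))

z = [standardized_mean(n, rng) for _ in range(trials)]
within_1_96 = sum(1 for v in z if abs(v) <= 1.96) / trials
```

Even with samples of size 30 from a distribution that looks nothing like a bell curve, the fraction should come out close to 0.95.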
5.1 Normal distribution.
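Normal($\mu, \sigma^2$). Continuous.
The normal distribution, also called the Gaussian distribution, is ubiquitous in probability
and statistics.
The parameters $\mu$ and $\sigma^2$ are real numbers, $\sigma^2$ being positive with positive square root $\sigma$.
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right), \quad \text{for } x \in \mathbf{R}.$$
The mean is $\mu$ and the variance is $\sigma^2$.
$$m(t) = \exp(\mu t + t^2\sigma^2/2).$$
The standard normal distribution has $\mu = 0$ and $\sigma^2 = 1$. Thus, its density function is
$$f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}.$$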
Applications. Normal distributions are used in statistics to make inferences about the
population mean when the sample size n is large.
Application to Bayesian statistics. Normal distributions are used in Bayesian statistics
as conjugate priors for the family of normal distributions with a known variance.
5.2 $\chi^2$-distribution.
ChiSquared($\nu$). Continuous.
The parameter $\nu$, the number of degrees of freedom, is a positive integer. I generally
use this Greek letter nu for degrees of freedom. This is the distribution for the sum of the
squares of $\nu$ independent standard normal distributions.
A $\chi^2$-distribution is actually a special case of a gamma distribution with a fractional value
for $r$. ChiSquared($\nu$) = Gamma($\lambda, r$) where $\lambda = \tfrac12$ and $r = \tfrac{\nu}{2}$.
$$f(x) = \frac{x^{\nu/2-1} e^{-x/2}}{2^{\nu/2}\,\Gamma(\nu/2)}, \quad \text{for } x \ge 0.$$
$$\mu = \nu. \qquad \sigma^2 = 2\nu.$$
$$m(t) = (1 - 2t)^{-\nu/2}.$$
Applications. $\chi^2$-distributions are used in statistics to make inferences on the population
variance when the population is assumed to be normally distributed.
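The relation between the chi-squared and gamma families can be verified pointwise. The following Python sketch evaluates both density formulas, with rate $\tfrac12$ and $r = \nu/2$ in the gamma density, at a handful of arbitrarily chosen points; the function names are illustrative.

```python
import math

def gamma_pdf(x, rate, r):
    """Gamma density in the rate form: rate^r x^(r-1) e^(-rate x) / Gamma(r)."""
    return rate ** r * x ** (r - 1) * math.exp(-rate * x) / math.gamma(r)

def chi_squared_pdf(x, nu):
    """Chi-squared density: x^(nu/2 - 1) e^(-x/2) / (2^(nu/2) Gamma(nu/2))."""
    return x ** (nu / 2 - 1) * math.exp(-x / 2) / (2 ** (nu / 2) * math.gamma(nu / 2))

# Compare the two densities for 1 to 10 degrees of freedom at several points.
densities_agree = all(
    math.isclose(chi_squared_pdf(x, nu), gamma_pdf(x, 0.5, nu / 2))
    for nu in range(1, 11)
    for x in (0.1, 0.5, 1.0, 2.5, 7.0, 20.0)
)
```

Agreement here is purely algebraic, since $\lambda^r = 2^{-\nu/2}$ when $\lambda = \tfrac12$ and $r = \nu/2$.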
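5.3 Student's T-distribution.
T($\nu$). Continuous.
If $Y$ is Normal(0, 1) and $Z$ is ChiSquared($\nu$) and independent of $Y$, then $X = Y/\sqrt{Z/\nu}$
has a T-distribution with $\nu$ degrees of freedom.
$$f(x) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\pi\nu}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + x^2/\nu\right)^{-(\nu+1)/2}, \quad \text{for } x \in \mathbf{R}.$$
$$\mu = 0 \text{ for } \nu > 1. \qquad \sigma^2 = \frac{\nu}{\nu-2} \text{ for } \nu > 2.$$
Applications. T-distributions are used in statistics to make inferences on the population
mean when the population is assumed to be normally distributed, especially when the
sample is small.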
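5.4 Snedecor-Fisher's F-distribution.
F($\nu_1, \nu_2$). Continuous.
If $Y$ and $Z$ are independent $\chi^2$-random variables with $\nu_1$ and $\nu_2$ degrees of freedom,
respectively, then $X = \dfrac{Y/\nu_1}{Z/\nu_2}$ has an F-distribution with $(\nu_1, \nu_2)$ degrees of freedom.
Note that if $X$ is T($\nu$), then $X^2$ is F($1, \nu$).
$$f(x) = \frac{1}{B\left(\frac{\nu_1}{2}, \frac{\nu_2}{2}\right)} \left(\frac{\nu_1}{\nu_2}\right)^{\nu_1/2} x^{\nu_1/2-1} \left(1 + \frac{\nu_1}{\nu_2}x\right)^{-(\nu_1+\nu_2)/2}, \quad \text{for } x > 0.$$
$$\mu = \frac{\nu_2}{\nu_2-2} \text{ when } \nu_2 > 2.$$
$$\sigma^2 = \frac{2\nu_2^2(\nu_1+\nu_2-2)}{\nu_1(\nu_2-2)^2(\nu_2-4)} \text{ when } \nu_2 > 4.$$
Applications. F-distributions are used in statistics when comparing variances of two populations.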
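The remark that the square of a T-distributed variable has an F-distribution with 1 and $\nu$ degrees of freedom can be checked at the level of densities. The Python sketch below uses the change of variables $y = x^2$, under which the density of $X^2$ at $y > 0$ is the T density at $\sqrt{y}$ divided by $\sqrt{y}$ (the two roots $\pm\sqrt{y}$ contribute equally by symmetry). The function names and test points are illustrative.

```python
import math

def t_pdf(x, nu):
    """T density with nu degrees of freedom."""
    c = math.gamma((nu + 1) / 2) / (math.sqrt(math.pi * nu) * math.gamma(nu / 2))
    return c * (1 + x * x / nu) ** (-(nu + 1) / 2)

def beta_fn(a, b):
    """Beta function via gamma functions."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

def f_pdf(x, nu1, nu2):
    """F density with (nu1, nu2) degrees of freedom, for x > 0."""
    ratio = nu1 / nu2
    return (ratio ** (nu1 / 2) * x ** (nu1 / 2 - 1)
            * (1 + ratio * x) ** (-(nu1 + nu2) / 2) / beta_fn(nu1 / 2, nu2 / 2))

def density_of_t_squared(y, nu):
    """Density of X^2 at y > 0 when X has a T distribution with nu degrees of freedom."""
    return t_pdf(math.sqrt(y), nu) / math.sqrt(y)

squares_agree = all(
    math.isclose(density_of_t_squared(y, nu), f_pdf(y, 1, nu))
    for nu in range(1, 11)
    for y in (0.1, 1.0, 4.0, 9.0)
)
```

The agreement rests on $B(\tfrac12, \tfrac{\nu}{2}) = \sqrt{\pi}\,\Gamma(\tfrac{\nu}{2}) / \Gamma(\tfrac{\nu+1}{2})$, which ties the two normalizing constants together.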
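Table of Discrete and Continuous distributions
(Type: D = discrete, C = continuous.)

| Distribution | Type | Mass/density function $f(x)$ | Mean $\mu$ | Variance $\sigma^2$ |
|---|---|---|---|---|
| Uniform(n) | D | $1/n$, for $x = 1, 2, \ldots, n$ | $(n+1)/2$ | $(n^2-1)/12$ |
| Uniform(a, b) | C | $\frac{1}{b-a}$, for $x \in [a, b]$ | $\frac{a+b}{2}$ | $\frac{(b-a)^2}{12}$ |
| Bernoulli(p) | D | $f(0) = 1-p$, $f(1) = p$ | $p$ | $p(1-p)$ |
| Binomial(n, p) | D | $\binom{n}{x} p^x (1-p)^{n-x}$, for $x = 0, 1, \ldots, n$ | $np$ | $npq$ |
| Geometric(p) | D | $q^{x-1} p$, for $x = 1, 2, \ldots$ | $1/p$ | $(1-p)/p^2$ |
| NegativeBinomial(p, r) | D | $\binom{x-1}{r-1} p^r q^{x-r}$, for $x = r, r+1, \ldots$ | $r/p$ | $r(1-p)/p^2$ |
| Hypergeometric(N, M, n) | D | $\binom{M}{x}\binom{N-M}{n-x} / \binom{N}{n}$, for $x = 0, 1, \ldots, n$ | $np$ | $\left(\frac{N-n}{N-1}\right) np(1-p)$ |
| Poisson($\lambda t$) | D | $\frac{1}{x!} (\lambda t)^x e^{-\lambda t}$, for $x = 0, 1, \ldots$ | $\lambda t$ | $\lambda t$ |
| Exponential($\lambda$) | C | $\lambda e^{-\lambda x}$, for $x \in [0, \infty)$ | $1/\lambda$ | $1/\lambda^2$ |
| Gamma($\lambda, r$) | C | $\frac{1}{\Gamma(r)} \lambda^r x^{r-1} e^{-\lambda x}$, for $x \in [0, \infty)$ | $r/\lambda$ | $r/\lambda^2$ |
| Gamma($\alpha, \beta$) | C | $\frac{x^{\alpha-1} e^{-x/\beta}}{\beta^\alpha\,\Gamma(\alpha)}$, for $x \in [0, \infty)$ | $\alpha\beta$ | $\alpha\beta^2$ |
| Beta($\alpha, \beta$) | C | $\frac{1}{B(\alpha, \beta)} x^{\alpha-1} (1-x)^{\beta-1}$, for $0 \le x \le 1$ | $\frac{\alpha}{\alpha+\beta}$ | $\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$ |
| Normal($\mu, \sigma^2$) | C | $\frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$, for $x \in \mathbf{R}$ | $\mu$ | $\sigma^2$ |
| ChiSquared($\nu$) | C | $\frac{x^{\nu/2-1} e^{-x/2}}{2^{\nu/2}\,\Gamma(\nu/2)}$, for $x \ge 0$ | $\nu$ | $2\nu$ |
| T($\nu$) | C | $\frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\pi\nu}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + x^2/\nu\right)^{-(\nu+1)/2}$, for $x \in \mathbf{R}$ | $0$ | $\nu/(\nu-2)$ |
| F($\nu_1, \nu_2$) | C | $\frac{1}{B\left(\frac{\nu_1}{2}, \frac{\nu_2}{2}\right)} \left(\frac{\nu_1}{\nu_2}\right)^{\nu_1/2} x^{\nu_1/2-1} \left(1 + \frac{\nu_1}{\nu_2}x\right)^{-(\nu_1+\nu_2)/2}$, for $x > 0$ | $\frac{\nu_2}{\nu_2-2}$ | $\frac{2\nu_2^2(\nu_1+\nu_2-2)}{\nu_1(\nu_2-2)^2(\nu_2-4)}$ |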