
Chapter 1

Random variables

1.1 First concepts


Often, the possible outcomes (the sample space Ω) of a random experiment are not numerical values. For example, if a coin is tossed three times in succession,

Ω = {(c, c, c), (c, c, +), (c, +, c), (+, c, c), (c, +, +), (+, c, +), (+, +, c), (+, +, +)},

where c denotes heads and + denotes tails.

In general, it is easier to use numerical values than to work directly with the sample space, which can be very complicated. For example, in the case of the three coin tosses, we might be interested only in the number of heads we got. Thus, we would identify the elementary event (c, c, c) with 3, the event (+, +, +) with 0, the events (c, +, +), (+, c, +) and (+, +, c) with 1, and finally the events (c, c, +), (c, +, c) and (+, c, c) with 2. In this way the concept of a (one-dimensional) random variable arises: a function that assigns a real number to each element ω of the sample space. Thus a random variable (r.v., for short) is a mapping

X : Ω −→ R
ω −→ X(ω).
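To make the idea concrete, here is a small Python sketch (not part of the original notes) that builds the sample space of the three-coin experiment and the map ω ↦ X(ω) counting heads, with 'c' standing for heads and '+' for tails:

from itertools import product

# Sample space of three coin tosses: 'c' = heads, '+' = tails.
omega = list(product("c+", repeat=3))   # 8 elementary events

def X(outcome):
    """The random variable X: number of heads in the outcome."""
    return outcome.count("c")

for w in omega:
    print(w, "->", X(w))    # e.g. ('c', '+', 'c') -> 2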

The rigorous definition of a random variable is more complicated and technical than the objectives of this course require; what matters to us is to have a clear idea of the concept.
Depending on the values that a random variable can take, it is classified into:
• Discrete random variable if it only takes a finite or countable number of values.
Examples:

◦ The number of heads in the three coin tosses.


◦ The number of times we have to roll a die to get the first 6.

• Continuous random variable if it can take an uncountably infinite number of values, usually an interval (for example, [a, b]), a half-line (for example, (a, ∞)) or the entire real line.
Examples:

◦ The time it takes for a light bulb to fail.
◦ A family’s income.

One of the basic tools for studying random variables is the distribution function. Given an r.v. X, its distribution function is the function F : R → [0, 1] that assigns to each value x the probability that the variable takes a value less than or equal to x, that is,

F(x) := P(X ≤ x).

This function has the following properties:


• lim_{x→+∞} F(x) = 1 and lim_{x→−∞} F(x) = 0.
• F(x) is an increasing function: if a ≤ b, then F(a) ≤ F(b).
• F(x) is right-continuous: lim_{x→x_0^+} F(x) = F(x_0).

Note. The distribution function is not always continuous.

We can use the definition of the distribution function to compute probabilities involving the variable X:

• P(X < a) = F(a^−)
• P(X = a) = F(a) − F(a^−)
• P(a < X ≤ b) = F(b) − F(a)
• P(a ≤ X ≤ b) = F(b) − F(a^−)
• P(a ≤ X < b) = F(b^−) − F(a^−)
• P(a < X < b) = F(b^−) − F(a)

where F(a^−) = lim_{x→a^−} F(x).
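As an illustration (a sketch, not from the original text), the following Python snippet implements the distribution function of a fair die roll and checks two of the formulas above; the left limit F(a^−) is approximated by evaluating F just below a.

import math

def F(x):
    """Distribution function of a fair six-sided die: F(x) = P(X <= x)."""
    return min(max(math.floor(x), 0), 6) / 6

def F_minus(a, eps=1e-9):
    """Left limit F(a^-), approximated by evaluating F just below a."""
    return F(a - eps)

print(F(3) - F_minus(3))   # P(X = 3) = F(3) - F(3^-) = 1/6 ≈ 0.1667
print(F(4) - F_minus(2))   # P(2 <= X <= 4) = F(4) - F(2^-) = 3/6 = 0.5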

Next we will study the most important concepts related to the probability distribution of a random variable. We will treat the discrete case and the continuous case separately. For now we give the concepts in general; in the next chapter we will study the most important distributions, those that appear most often in practice.

1.2 Discrete Random Variables


A discrete random variable is determined once we know the probability that the variable takes each of its possible values. The probability mass function is the function that gives, for each value x, its corresponding probability, that is, P(X = x).
If we denote by x_1, x_2, . . . , x_n the different values that the variable X can take, it is very common to use the even shorter notation

p_i = p_X(x_i) = f(x_i) = P(X = x_i).

Example. Suppose we toss three coins and want to calculate the probability mass function of the r.v. X = number of heads in the three tosses.
The set Ω has 8 equiprobable elementary events, so

p_0 = f(0) = P(X = 0) = P((+, +, +)) = 1/8,
p_1 = f(1) = P(X = 1) = P((c, +, +), (+, c, +), (+, +, c)) = 3/8,
p_2 = f(2) = P(X = 2) = P((c, c, +), (+, c, c), (c, +, c)) = 3/8,
p_3 = f(3) = P(X = 3) = P((c, c, c)) = 1/8.

If a discrete random variable takes the values x_1, x_2, . . . , x_n, the probability that, when the random experiment is performed, X takes one of these values is 1, so we have

∑_{i=1}^{n} f(x_i) = ∑_{i=1}^{n} P(X = x_i) = 1,

and the same would hold even if the discrete random variable took an infinite but countable number of values x_1, x_2, . . . , x_n, . . .:

∑_{i=1}^{∞} f(x_i) = ∑_{i=1}^{∞} P(X = x_i) = 1.

On the other hand, given two numbers a < b,

P(a ≤ X ≤ b) = ∑_{a ≤ x_i ≤ b} P(X = x_i) = ∑_{a ≤ x_i ≤ b} f(x_i).
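These formulas can be checked computationally; here is an illustrative sketch (not from the original notes) that enumerates the 8 outcomes of the three-coin experiment, builds the probability function of X = number of heads, and verifies that the probabilities sum to 1 and that P(1 ≤ X ≤ 2) is obtained by summing f(x_i) over the relevant values.

from itertools import product
from collections import Counter

outcomes = list(product("c+", repeat=3))           # 8 equiprobable outcomes
counts = Counter(w.count("c") for w in outcomes)   # value of X -> number of outcomes

f = {x: n / len(outcomes) for x, n in counts.items()}   # probability mass function
print(f)                                            # values 0..3 with probabilities 1/8, 3/8, 3/8, 1/8

print(sum(f.values()))                              # 1.0  (total probability)
print(sum(p for x, p in f.items() if 1 <= x <= 2))  # 0.75 = P(1 <= X <= 2)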

1.3 Continuous Random Variables


For a continuous variable, the number of values it takes is uncountable, and the sums we wrote in the previous section make no sense. In this case, the summation sign is replaced by the integral sign. The density function of a continuous random variable X, denoted by the letter f, is defined as a function that satisfies:

1. f(x) ≥ 0 for all x ∈ R.

2. f is integrable and

∫_{−∞}^{∞} f(x) dx = 1.

3. For any pair of numbers a < b,

P(a ≤ X ≤ b) = ∫_{a}^{b} f(x) dx.

This last property tells us that the probability that a continuous variable takes values in a certain interval [a, b] is calculated as the area between the curve of its density function and the x-axis, between the points a and b. By the same fact, the probability that X is exactly equal to a number a would be the area of the segment joining the point (a, 0) to the point (a, f(a)). Since the area of a segment is 0, we deduce that for every continuous random variable X,

P(X = a) = 0, for any a ∈ R.

[Figure 1.1: Density function of a continuous variable.]
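Property 3 can be seen in action with a small numerical sketch (not from the original notes), using the exponential density f(x) = e^{−x} for x ≥ 0 as an assumed example; P(1 ≤ X ≤ 2) is approximated as the area under the density with a midpoint Riemann sum and compared with the exact value e^{−1} − e^{−2}.

import math

def f(x):
    """Density of an exponential variable with rate 1 (an assumed example)."""
    return math.exp(-x) if x >= 0 else 0.0

def prob(a, b, n=100_000):
    """P(a <= X <= b) as the area under f between a and b (midpoint rule)."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

print(prob(1, 2))                       # ~0.232544
print(math.exp(-1) - math.exp(-2))      # exact value: 0.232544...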

1.4 Expectation and variance of a random variable


The (mathematical) expectation or expected value of a random variable is the concept analogous to the arithmetic mean of descriptive statistics. In fact, its definition corresponds to the limit of the arithmetic means of a very large number of observations of the random variable X, as the number of observations tends to infinity. In inferential statistics, the mathematical expectation is also called the population mean.
For the definition of the expectation, we will treat the discrete case and the continuous case separately.

Discrete case. Let X be a discrete r.v. Its expectation is denoted by µ or E(X) and is defined as

E(X) = ∑_{i∈I} x_i f(x_i) = ∑_{i∈I} x_i P(X = x_i),

where the set of values that X takes can be finite, in which case I = {1, 2, . . . , n}, or countably infinite, in which case I = {1, 2, . . . , n, . . . }, which is equivalent to I = N.

Example. We calculate the expectation of the r.v. X = number of heads in the three coin tosses. Recall that X takes the values 0, 1, 2 and 3 and that the probability function is f(0) = 1/8, f(1) = f(2) = 3/8 and f(3) = 1/8. We have

E(X) = 0 · 1/8 + 1 · 3/8 + 2 · 3/8 + 3 · 1/8 = 3/2 = 1.5.

Continuous case. Let X be a continuous r.v. with density function f. Its expectation is defined by

E(X) = µ = ∫_{−∞}^{∞} x f(x) dx.

We can give the following formulas for calculating the expectation of a certain function of a random variable X. Consider the r.v. g(X), where g : R −→ R is a given function. If X is discrete with probability function f, then

E(g(X)) = ∑_{i∈I} g(x_i) f(x_i).

If X is continuous with density function f, then

E(g(X)) = ∫_{−∞}^{∞} g(x) f(x) dx.

For example, if X is a continuous r.v., then

E(X²) = ∫_{−∞}^{∞} x² f(x) dx.

The variance of an r.v. X is denoted by Var(X) or σ² and is defined as

Var(X) = σ² = E[(X − E(X))²].

Since we are averaging the squared deviations of the r.v. from its expectation, we are measuring the dispersion of the r.v. It is the probability-theory analogue of the variance of descriptive statistics.
The variance can also be calculated as

Var(X) = σ² = E(X²) − (E(X))².

As in descriptive statistics, we define the standard deviation of X as the square root of the variance:

σ = √Var(X).
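Continuing the three-coin example, the following sketch (not in the original notes) computes E(X) and then the variance with both the definition and the shortcut formula, together with the standard deviation; the exact values are 1.5, 0.75 and √0.75 ≈ 0.866.

import math

f = {0: 1/8, 1: 3/8, 2: 3/8, 3: 1/8}    # probability function of X = number of heads

mu = sum(x * p for x, p in f.items())                    # E(X)
var_def = sum((x - mu) ** 2 * p for x, p in f.items())   # E[(X - E(X))^2]
ex2 = sum(x ** 2 * p for x, p in f.items())              # E(X^2)
var_short = ex2 - mu ** 2                                # E(X^2) - (E(X))^2

print(mu)                  # 1.5
print(var_def, var_short)  # 0.75 0.75  -- both formulas agree
print(math.sqrt(var_def))  # standard deviation, ~0.866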

Properties of Expectation and Variance

1. If X is an r.v. and a and b are two constants, it holds that

E(aX + b) = aE(X) + b,
Var(aX + b) = a² Var(X).

2. If X_1, X_2, . . . , X_n are r.v., then

E(X_1 + · · · + X_n) = E(X_1) + · · · + E(X_n).

In general, it is not true that the variance of a sum of r.v. is equal to the sum of their variances. However, there is one important case in which this is true. We consider it below.

1.5 Independent Random Variables


The r.v. X_1, X_2, . . . , X_n are said to be independent if, for any collection of intervals (a_1, b_1), (a_2, b_2), . . . , (a_n, b_n), we have

P((a_1 < X_1 < b_1) ∩ (a_2 < X_2 < b_2) ∩ · · · ∩ (a_n < X_n < b_n))
= P(a_1 < X_1 < b_1) P(a_2 < X_2 < b_2) · · · P(a_n < X_n < b_n).

This means that each variable takes its possible values without being affected by the
values taken by the others.
An important property of independent variables is that if X_1, X_2, . . . , X_n are independent r.v., then

Var(X_1 + X_2 + · · · + X_n) = Var(X_1) + Var(X_2) + · · · + Var(X_n).
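The additivity of the variance for independent variables can be illustrated with a small simulation sketch (not from the original notes): we sum three independent fair-die rolls and compare the sample variance of the sum with 3 times the variance of a single roll, which is 3 · 35/12 = 8.75.

import random
from statistics import pvariance

random.seed(0)
N = 200_000

# N observations of the sum of three independent fair-die rolls
sums = [sum(random.randint(1, 6) for _ in range(3)) for _ in range(N)]

print(pvariance(sums))   # close to 8.75
print(3 * 35 / 12)       # 8.75 = Var(X1) + Var(X2) + Var(X3)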

Chapter 2

Main Distribution Laws for Random Variables

In this chapter we will describe the main probability laws (distributions) that appear in the applications of probability calculus.

2.1 Discrete Variables


2.1.1 Bernoulli distribution
This is the mathematical model of the following situation. We perform a random experiment only once and observe whether a certain event occurs or not, with p being the probability that the event occurs (success) and q = 1 − p the probability that it does not occur (failure).
For example, we randomly select a person from a certain population of individuals and we are interested in whether or not they are a smoker. A Bernoulli r.v. associated with this experiment would be

X = 1 if the individual is a smoker,
X = 0 otherwise.
The probability function of this r.v. is simply

P(X = x) = p if x = 1,   and   P(X = x) = q = 1 − p if x = 0.

In this example p is the population proportion of smokers and q = 1 − p is the population proportion of non-smokers.
Notation. If X has Bernoulli distribution and p is the probability of success, we will
also say that X has Bernoulli distribution of parameter p and we will write
X ∼ B(p).

Expectation and variance. It is easily verified that if X ∼ B(p), then

E(X) = p,     Var(X) = pq = p(1 − p).
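Spelling out this verification with the definitions from Chapter 1 (a quick check, not included in the original text):

E(X) = 0 · q + 1 · p = p,
E(X²) = 0² · q + 1² · p = p,
Var(X) = E(X²) − (E(X))² = p − p² = p(1 − p) = pq.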

2.1.2 Binomial distribution


An r.v. X has binomial distribution of parameters n and p, and we write X ∼ B(n, p) (or X ∼ Binomial(n, p)), if

X = X_1 + X_2 + · · · + X_n,

where the r.v. X_i are independent and

X_i ∼ B(p), ∀ i = 1, . . . , n.

This definition can be interpreted as follows: suppose we perform n independent Bernoulli trials (that is, we assign the value 1 if the i-th trial produces a success and 0 if it produces a failure) and we are interested in X, the total number of successes obtained in the n trials.

Examples:

• Suppose we roll a fair die n times and we are interested in the number of times we get a 6. Here we are performing n independent trials, in each of which we can have a success (a 6 comes up) or a failure (no 6 comes up), and we are interested in the total number of successes in the n rolls. Thus, the r.v. X given by the number of 6s in the n rolls has distribution B(n, 1/6).

• The r.v. from our example of the three coin tosses, X = number of heads obtained. By the same type of reasoning as in the previous example, X ∼ B(3, 1/2).

• We have a population of individuals and select n people in succession, with replacement. We are interested in the number of smokers among the n individuals selected. Since each time we return the selected individual to the population, the successive trials, each consisting of observing whether the selected individual is a smoker (success) or not (failure), are independent; that is, the result of one trial has no influence whatsoever on the results of the following trials. If we did not return the individual to the population, the successive trials would not be independent, since if we have selected a smoker, in the next trial there is one smoker fewer in the population and also one person fewer in total.
Thus, if the selection is made with replacement, the number X of smokers in the n selections has binomial distribution with parameters n (number of selected individuals) and p (probability of success in each trial = proportion of smokers in the population).

It can be shown that the probability function of X is given by:

P(X = k) = \binom{n}{k} p^k q^{n−k},  ∀ k = 0, 1, . . . , n.

The combination number \binom{n}{k} in the above expression is defined as

\binom{n}{k} = n! / ((n − k)! k!) = (n(n − 1)(n − 2) · · · (n − k + 1)) / (k(k − 1)(k − 2) · · · 2 · 1)

and gives the number of ways of choosing a subset of k elements from a set of n elements. It makes sense for k = 0, 1, . . . , n, and, by definition,

\binom{n}{0} = 1.
The factorial of a natural number, k!, is given by the expression

k! = k(k − 1)(k − 2) · · · 2 · 1.

Also, by definition, 0! = 1. The factorial k! is the number of ways in which one can order the elements of a set of k elements.
Expectation and variance. If X ∼ B(n, p), then

E(X) = np,     Var(X) = np(1 − p).

These two properties are easily verified using the fact that X is the sum of n independent r.v. with Bernoulli distribution of parameter p.
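Spelling this out with the properties of Sections 1.4 and 1.5 (a short verification, not in the original text):

E(X) = E(X_1) + · · · + E(X_n) = p + · · · + p = np,
Var(X) = Var(X_1) + · · · + Var(X_n) = pq + · · · + pq = npq = np(1 − p),

where the second line uses the independence of the X_i.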

Example. A student takes an 8-question test in which each question has 3 possible answers, only one of which is correct. Assuming that this student has no idea and answers the questions completely at random, we calculate the probability that he answers exactly 4 questions correctly.
Each time the student answers a question he or she is performing a Bernoulli trial in which success means choosing the correct answer (which has probability p = 1/3) and failure means not getting it right. Successive trials are independent. If we denote by X the number of correct questions out of the 8, we have

X ∼ B(8, 1/3).

We are interested in

P(X = 4) = \binom{8}{4} (1/3)^4 (1 − 1/3)^{8−4} = (8 · 7 · 6 · 5)/(4 · 3 · 2 · 1) · (1/3)^4 · (2/3)^4 ≈ 0.1707.
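A quick numerical check of this computation, as a sketch using only the Python standard library:

from math import comb

n, p, k = 8, 1/3, 4
prob = comb(n, k) * p**k * (1 - p)**(n - k)   # binomial probability P(X = 4)
print(round(prob, 4))                         # 0.1707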

2.1.3 Poisson distribution


An r.v. X has Poisson distribution of parameter λ, where λ is a positive real number, if its probability function is given by

P(X = k) = e^{−λ} λ^k / k!,  for k = 0, 1, 2, . . .

We write X ∼ Poiss(λ).
This distribution can take any value in the set of natural numbers, although the probability of large values of k tends to zero very quickly as k → ∞. It is used
as a model in many practical situations. For example, the number of particles emitted
by a radioactive substance in a certain unit of time fits well into this distribution. It
is also known as the distribution of rare events as it also models infrequent events well.
Examples are the number of monthly suicides in a certain city or the number of accidents
on a certain stretch of road in a given time frame. It is also a good model for counting
plants or animals in a certain region, bacteria per unit volume, and so on.
The Poisson distribution is related to the binomial distribution by the following theorem:

Theorem 1. If X ∼ B(n, p) with n → ∞ and np = λ, where λ is a positive constant (note that, for this to happen, p must tend to 0), then

B(n, p) −→ Poiss(λ).

This theorem tells us that if n is large and p very small, we can set λ = np and approximate the probabilities of X ∼ B(n, p) by the corresponding probabilities of a Poiss(λ). This translates into using the approximate formula

P(X = k) = \binom{n}{k} p^k (1 − p)^{n−k} ≈ e^{−np} (np)^k / k!.

This approximation works well if

n ≥ 20 and p ≤ 0.05,

and it is excellent if

n ≥ 100 and np ≤ 10.

Example. Suppose that in a large batch of items produced by a certain factory the proportion of defective items is p = 0.001, and that 100 items of the production are selected at random. We calculate the probability of getting exactly one defective item, and also the probability of getting none.
As the production is very large, we will use the binomial model for the variable

X = number of defective items among the 100 selected,

regardless of whether the selection was made with or without replacement. Thus X ∼ B(100, 0.001) and therefore

P(X = 1) = \binom{100}{1} (0.001)^1 (0.999)^{99} ≈ 0.0906,
P(X = 0) = \binom{100}{0} (0.001)^0 (0.999)^{100} ≈ 0.9048.

Approximating now by the Poisson of parameter λ = np = 100 × 0.001 = 0.1, we have, for Y ∼ Poiss(0.1),

P(Y = 1) = e^{−0.1} · 0.1^1 / 1! ≈ 0.0905,
P(Y = 0) = e^{−0.1} · 0.1^0 / 0! = e^{−0.1} ≈ 0.9048.
As we can see, the results are very similar. Calculations with the Poisson distribution are generally easier to carry out. In addition, for large n the combinatorial numbers involved in the binomial calculations can be very large. On the other hand, if p is very small its powers are even smaller, so the binomial calculation involves products of very large numbers by very small numbers. This causes accuracy problems on computing machines. Therefore, the option of using the Poisson approximation instead of the binomial distribution is appropriate in these cases.
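The comparison in the example above can be reproduced with a short sketch (Python standard library only):

from math import comb, exp, factorial

n, p = 100, 0.001
lam = n * p                                    # Poisson parameter lambda = np = 0.1

def binom(k):
    """Exact binomial probability P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson(k):
    """Poisson probability P(Y = k) for Y ~ Poiss(lam)."""
    return exp(-lam) * lam**k / factorial(k)

for k in (0, 1):
    print(k, round(binom(k), 4), round(poisson(k), 4))
# 0 0.9048 0.9048
# 1 0.0906 0.0905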

Expectation and variance. If X ∼ Poiss(λ), we have

E(X) = Var(X) = λ.

This tells us that the parameter λ is precisely the population mean of the r.v.
