Random variables
Consider the experiment of tossing three coins, where c denotes heads and + denotes tails. The sample space is
Ω = {(c, c, c), (c, c, +), (c, +, c), (+, c, c), (c, +, +), (+, c, +), (+, +, c), (+, +, +)}.
In general, it is easier to work with numerical values than directly with the sample space, which can be very complicated. For example, in the case of the three-coin toss, we might be interested only in the number of heads we got. Thus, we would identify the elementary event (c, c, c) with 3, the event (+, +, +) with 0, the events (c, +, +), (+, c, +) and (+, +, c) with 1, and finally the events (c, c, +), (c, +, c) and (+, c, c) with 2. In this way the concept of a (one-dimensional) random variable appears: a function that assigns a real number to each element ω of the sample space.
Thus a random variable (r.v., for short) is a map
X : Ω −→ R
ω ↦−→ X(ω).
The rigorous definition of a random variable is more technical than what this course requires; what matters to us is to understand the concept clearly.
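As a concrete illustration, the identification above can be written as a short Python sketch (a minimal example of ours; the names Omega and X are not part of these notes):

```python
from itertools import product

# Sample space of the three-coin toss: 'c' = heads, '+' = tails
Omega = list(product("c+", repeat=3))   # 8 elementary events

# The random variable X assigns to each outcome its number of heads
def X(omega):
    return omega.count("c")

print(X(("c", "c", "c")))  # 3
print(X(("+", "+", "+")))  # 0
print(X(("c", "+", "+")))  # 1
```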
Depending on the values that a random variable can take, it is classified as:
• Discrete random variable, if it takes only a finite or countable number of values (for example, the number of heads in the three-coin toss).
• Continuous random variable, if it can take any value in an interval of the real line. Examples:
◦ The time it takes for a light bulb to fail.
◦ A family’s income.
One of the basic tools when studying random variables is the distribution function. Given an r.v. X, its distribution function is the function F : R → [0, 1] that assigns to each value x the probability that the variable takes a value less than or equal to x, that is,
F(x) := P(X ≤ x).
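For instance, if X is the number obtained when rolling a fair die, then F(2) = P(X ≤ 2) = P(X = 1) + P(X = 2) = 2/6 = 1/3, while F(0.5) = 0 and F(6) = 1.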
Next we will study the most important concepts related to the probability distribution of a random variable. We will treat the discrete case and the continuous case separately. For now we give the concepts in general; later we will study the most important distributions, those that appear most often in practice.

If X is a discrete r.v. taking the values x1, x2, . . . , its probability (mass) function f assigns to each value the probability that X takes it:
pi = pX(xi) = f(xi) = P(X = xi).
Example. Suppose we toss three coins and want to calculate the probability mass function of the r.v. X = number of heads in the three tosses.
The set Ω has 8 equiprobable elementary events, so

p0 = f(0) = P(X = 0) = P((+, +, +)) = 1/8,
p1 = f(1) = P(X = 1) = P((c, +, +), (+, c, +), (+, +, c)) = 3/8,
p2 = f(2) = P(X = 2) = P((c, c, +), (c, +, c), (+, c, c)) = 3/8,
p3 = f(3) = P(X = 3) = P((c, c, c)) = 1/8.
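These values can be checked by direct enumeration. Below is a minimal Python sketch of ours (the names Omega and pmf are not part of the notes) that counts the outcomes and verifies the probabilities 1/8, 3/8, 3/8, 1/8:

```python
from itertools import product
from collections import Counter

# Enumerate the 8 equiprobable outcomes and count heads in each
Omega = list(product("c+", repeat=3))
pmf = Counter(omega.count("c") for omega in Omega)

for k in sorted(pmf):
    print(k, pmf[k] / len(Omega))   # 0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125

# The probabilities add up to 1
assert sum(pmf.values()) == len(Omega)
```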
If a discrete random variable takes the values x1, x2, . . . , xn, the probability that, when the random experiment is performed, X takes one of these values is 1, so we have

∑_{i=1}^{n} f(xi) = ∑_{i=1}^{n} P(X = xi) = 1,

and the same holds even if the discrete random variable takes an infinite but countable number of values x1, x2, . . . , xn, . . . :

∑_{i=1}^{∞} f(xi) = ∑_{i=1}^{∞} P(X = xi) = 1.
If X is a continuous random variable, its distribution is described by a density function f : R −→ R, which satisfies:

1. f(x) ≥ 0 for every x ∈ R.

2. f is integrable and
∫_{−∞}^{∞} f(x) dx = 1.

3. For any a ≤ b,
P(a ≤ X ≤ b) = ∫_{a}^{b} f(x) dx.
This last property tells us that the probability that a continuous variable takes values in a given interval [a, b] is the area between the curve determined by its density function and the x-axis, between the points a and b. As a consequence, the probability that X equals exactly a given number a would be the area of the segment joining the point (a, 0) with the point (a, f(a)). Since the area of a line segment is 0, we deduce that for every continuous random variable X
P(X = a) = 0, for any a ∈ R.
[Figure: density function of a continuous random variable.]
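As an illustration, we can check these properties numerically for a particular density, say f(x) = e^(−x) for x ≥ 0 and 0 otherwise (an exponential density chosen by us for concreteness; scipy is assumed to be available):

```python
import math
from scipy.integrate import quad

# Exponential density: f(x) = e^(-x) for x >= 0, and 0 otherwise
f = lambda x: math.exp(-x) if x >= 0 else 0.0

# f vanishes for x < 0, so the integral over (-inf, inf) equals that over [0, inf)
total, _ = quad(f, 0, math.inf)
p_ab, _ = quad(f, 1, 2)           # P(1 <= X <= 2) as an area

print(round(total, 4))            # ~1.0
print(round(p_ab, 4))             # ~0.2325  (= e^-1 - e^-2)
print(quad(f, 1, 1)[0])           # P(X = 1) = 0.0
```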
Expectation of a random variable

Discrete case. Let X be a discrete r.v. Its expectation is denoted by µ or E(X) and is defined by

E(X) = ∑_{i∈I} xi f(xi) = ∑_{i∈I} xi P(X = xi),
where the set of values that X takes can be finite, and then I = {1, 2, . . . , n}, or countably infinite, and then I = {1, 2, . . . , n, . . . }, which is equivalent to I = N.
Example. We calculate the expectation of the r.v. X = number of heads in the three-coin toss.
Remember that X takes the values 0, 1, 2 and 3 and the probability function is f(0) = 1/8, f(1) = f(2) = 3/8 and f(3) = 1/8. We have

E(X) = 0 · (1/8) + 1 · (3/8) + 2 · (3/8) + 3 · (1/8) = 3/2 = 1.5.
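The same computation can be carried out in a short Python sketch (values and probabilities taken from the example above):

```python
# Values of X and their probabilities from the three-coin example
values = [0, 1, 2, 3]
probs = [1/8, 3/8, 3/8, 1/8]

expectation = sum(x * p for x, p in zip(values, probs))
print(expectation)   # 1.5
```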
Continuous case. Let X be a continuous r.v. with density function f. Its expectation is defined by

E(X) = µ = ∫_{−∞}^{∞} x f(x) dx.
We can give the following formulas for calculating the expectation of a certain function of a random variable X. Consider the r.v. g(X), where g : R −→ R is a given function. If X is discrete with probability function f, then

E(g(X)) = ∑_{i∈I} g(xi) f(xi),

and if X is continuous with density function f, then

E(g(X)) = ∫_{−∞}^{∞} g(x) f(x) dx.
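For instance, with g(x) = x² and the three-coin variable, the discrete formula gives E(X²). A small Python check (our own sketch, reusing the probabilities of the earlier example):

```python
# E(g(X)) for g(x) = x^2, using the probability function of the
# three-coin example: f(0) = 1/8, f(1) = f(2) = 3/8, f(3) = 1/8
values = [0, 1, 2, 3]
probs = [1/8, 3/8, 3/8, 1/8]

g = lambda x: x ** 2
e_g = sum(g(x) * p for x, p in zip(values, probs))
print(e_g)   # 3.0  (= 0/8 + 3/8 + 12/8 + 9/8)
```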
The variance of X, denoted σ² or Var(X), is defined as Var(X) = E((X − µ)²). Since we are averaging the squared deviations of the r.v. from its expectation, we are measuring the dispersion of the r.v. It is the probability-theory analogue of the variance of descriptive statistics.
The variance is also calculated as

Var(X) = σ² = E(X²) − (E(X))².
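As a check with the three-coin variable: E(X²) = 0² · (1/8) + 1² · (3/8) + 2² · (3/8) + 3² · (1/8) = 24/8 = 3, so Var(X) = 3 − 1.5² = 0.75.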
Properties of Expectation and Variance

1. If X is an r.v. and a and b are two constants, it holds that

E(aX + b) = aE(X) + b,
Var(aX + b) = a² Var(X).
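For example, with the three-coin variable X (for which E(X) = 1.5 and Var(X) = 0.75, as computed above), the variable Y = 2X + 1 has E(Y) = 2 · 1.5 + 1 = 4 and Var(Y) = 2² · 0.75 = 3.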
In general, it is not true that the variance of a sum of r.v.'s is equal to the sum of their variances. However, there is one important case in which this is true: that of independent random variables, which we consider below. Independence means that each variable takes its possible values without being affected by the values taken by the others.
An important property of independent variables is that if X1, X2, . . . , Xn are independent r.v.'s, then

Var(X1 + X2 + · · · + Xn) = Var(X1) + Var(X2) + · · · + Var(Xn).
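A quick empirical check (a simulation sketch of ours, not part of the notes): for two independent coin-like variables, the variance of the sum is close to the sum of the variances.

```python
import random

random.seed(0)
N = 100_000

# Two independent 0/1 variables, each equal to 1 with probability 1/2
x = [random.randint(0, 1) for _ in range(N)]
y = [random.randint(0, 1) for _ in range(N)]

def var(sample):
    m = sum(sample) / len(sample)
    return sum((s - m) ** 2 for s in sample) / len(sample)

print(var(x) + var(y))                        # ~0.5
print(var([a + b for a, b in zip(x, y)]))     # ~0.5 as well
```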
Chapter 2
In this chapter we will describe the main probability distributions that appear in applications of probability calculus.
For a Bernoulli variable X ∼ B(p), taking the value 1 (success) with probability p and the value 0 (failure) with probability q = 1 − p, we have E(X) = p and

Var(X) = pq = p(1 − p).

The binomial distribution B(n, p) describes the total number of successes in n independent Bernoulli trials X1, X2, . . . , Xn, each with

Xi ∼ B(p), ∀ i = 1, . . . , n,

so that X = X1 + X2 + · · · + Xn ∼ B(n, p).
Examples:
• Suppose we roll a fair die n times and we are interested in the number of times we get a 6. Here we are carrying out n independent trials, in each of which we can have a success (a 6 comes up) or a failure (a 6 does not come up), and we are interested in the total number of successes in the n rolls. Thus, the r.v. X given by the number of 6's in the n rolls has distribution B(n, 1/6).
• The r.v. from our example of the three-coin toss, X = number of heads obtained. By the same type of reasoning as in the previous example, X ∼ B(3, 1/2).
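A minimal check of the second example (our own sketch; scipy is assumed to be available) reproduces the probabilities 1/8, 3/8, 3/8, 1/8 obtained earlier:

```python
from scipy.stats import binom

# X ~ B(3, 1/2): number of heads in three fair-coin tosses
n, p = 3, 0.5
for k in range(n + 1):
    print(k, binom.pmf(k, n, p))   # 0.125, 0.375, 0.375, 0.125
```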
If X ∼ B(n, p), its probability function is

P(X = k) = \binom{n}{k} p^k (1 − p)^{n−k}, k = 0, 1, . . . , n.

The combination number \binom{n}{k} in the above expression is defined as

\binom{n}{k} = n! / ((n − k)! k!) = n(n − 1)(n − 2) · · · (n − k + 1) / (k(k − 1)(k − 2) · · · 2 · 1)

and gives us the number of ways of choosing subsets of k elements from a set of n elements. It makes sense for k = 0, 1, . . . , n, and, by definition,

\binom{n}{0} = 1.

The factorial of a natural number, k!, is given by the expression

k! = k(k − 1)(k − 2) · · · 2 · 1.

Also, by definition, 0! = 1. It is the number of ways in which one can order the elements of a set of k elements.
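In Python these quantities are available directly through the standard library functions math.comb and math.factorial (a short sketch):

```python
import math

print(math.comb(8, 4))      # 70 = 8! / (4! * 4!)
print(math.factorial(4))    # 24 = 4 * 3 * 2 * 1
print(math.comb(8, 0))      # 1, by definition
```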
Expectation and variance. If X ∼ B(n, p), then

E(X) = np,   Var(X) = np(1 − p) = npq.
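For the three-coin variable X ∼ B(3, 1/2) these formulas give E(X) = 3 · (1/2) = 1.5 and Var(X) = 3 · (1/2) · (1/2) = 0.75, in agreement with the direct computations above.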
Example. A student takes an 8-question test in which each question has 3 possible answers, only one of which is correct. Assuming that this student has no idea and answers the questions completely at random, we calculate the probability that he answers exactly 4 questions correctly.

Each time the student answers a question he is performing a Bernoulli trial in which success is getting the answer right (which has probability p = 1/3) and failure is getting it wrong. Successive trials are independent. If we denote by X the number of correct answers out of 8, we have

X ∼ B(8, 1/3).
We are interested in

P(X = 4) = \binom{8}{4} (1/3)^4 (1 − 1/3)^{8−4} = (8 · 7 · 6 · 5)/(4 · 3 · 2 · 1) · (1/3)^4 (2/3)^4 ≈ 0.1707.
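The same value can be reproduced numerically (a sketch of ours using math.comb; the rounding to four decimals matches the figure above):

```python
import math

# X ~ B(8, 1/3): number of correct answers out of 8
n, p, k = 8, 1/3, 4
prob = math.comb(n, k) * p**k * (1 - p)**(n - k)
print(round(prob, 4))   # 0.1707
```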
Example. Suppose that in a large batch of items produced by a certain factory the proportion of defective items is p = 0.001, and that 100 items are selected at random from the production. We calculate the probability of getting exactly one defective item and also the probability of getting none.

As the batch is very large, we will use the binomial model for the variable

X = number of defective items out of 100,

regardless of whether the selection was made with or without replacement. Thus X ∼ B(100, 0.001) and therefore
P(X = 1) = \binom{100}{1} (0.001)^1 (0.999)^{99} ≈ 0.0906,

P(X = 0) = \binom{100}{0} (0.001)^0 (0.999)^{100} ≈ 0.9048.
Approximating now by the Poisson distribution of parameter λ = np = 100 × 0.001 = 0.1, we have for Y ∼ Poiss(0.1)

P(Y = 1) = e^{−0.1} (0.1)^1 / 1! ≈ 0.0905,

P(Y = 0) = e^{−0.1} (0.1)^0 / 0! = e^{−0.1} ≈ 0.9048.
As we can see, the results are very similar. It is generally easier to perform calculations with the Poisson distribution. In addition, for large n the combinatorial numbers involved in the binomial calculations can be very large. On the other hand, if p is very small, its powers are even smaller, so when we do the calculations with the binomial distribution we multiply very large numbers by very small ones. This causes accuracy problems in calculators and computers. Therefore, using the Poisson approximation instead of the binomial distribution is appropriate in these cases.
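The comparison above is easy to reproduce (a sketch of ours with scipy.stats, assumed to be available):

```python
from scipy.stats import binom, poisson

n, p = 100, 0.001
lam = n * p   # 0.1

for k in (0, 1):
    print(k, binom.pmf(k, n, p), poisson.pmf(k, lam))
# k = 0: ~0.9048 (binomial) vs ~0.9048 (Poisson)
# k = 1: ~0.0906 (binomial) vs ~0.0905 (Poisson)
```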
Expectation and variance. If X ∼ Poiss(λ), then

E(X) = Var(X) = λ.

This tells us that the parameter λ is precisely the population mean of the r.v.