
1. Probability theory

1.1. Space of elementary events, random events
Probability theory is the branch of mathematics concerned with the analysis of random phenomena. The central objects of probability theory are random variables, stochastic processes, and events.
A random experiment is an experiment that produces random outcomes. For
example, throwing a die is a random experiment in which each trial produces a
random outcome from six possible outcomes, i.e., faces with one through six spots.

1 definition. An experiment is any procedure that can be infinitely repeated and has a well-defined set of outcomes.

A trial is a single instantiation of a random experiment. If a die is thrown ten times, there would be ten trials. The key concept to note here is that each trial produces exactly one outcome.
Another term frequently used in probability theory is a random event.

2 definition. A random event in probability theory is any fact that may or may not occur as a result of an experiment with a random outcome.

A random event is a higher-level outcome that may depend on multiple experiments and multiple outcomes of those experiments. For example, consider a game consisting of two random experiments: throwing a die and tossing a coin. A player is to throw the die twice and toss the coin once. A player who gets the face with one spot in both die throws and a head in the coin toss wins the grand prize. In this game, the random event of interest is winning the grand prize. This event occurs if the trials produce the following outcomes: one spot in both die throws and a head in the coin toss. In this example, the event depends on multiple experiments and multiple outcomes.
The simplest result of an experiment is called an elementary event (for instance, the appearance of heads or tails when tossing a coin).

3 definition. An elementary event is a result of an experiment that cannot be split into separate events.

4 definition. The set of all elementary events Ω is called the space of elementary events or sample space.

An example of a sample space Ω and its elementary events: if a coin is tossed twice, Ω = {HH, HT, TH, TT}, with H for heads and T for tails, and the elementary events are {HH}, {HT}, {TH} and {TT}.
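The two-toss sample space above can be enumerated programmatically; a minimal sketch using the standard library (itertools.product builds all ordered pairs of H and T):

```python
from itertools import product

# Sample space for tossing a coin twice: all ordered pairs of H and T.
omega = ["".join(outcome) for outcome in product("HT", repeat=2)]
print(omega)  # ['HH', 'HT', 'TH', 'TT']
```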
There are three types of events: random event, certain event and impossible event.

5 definition. A certain event is an event, which must happen as a result of the
experiment without fail.

Example. When we throw a die, the certain event is "the die lands on one of its faces".

A certain event is denoted by the space of elementary events Ω.

6 definition. An impossible event is an event which cannot happen as a result of the experiment.

Example. When we throw a die, the impossible event is "the die shows the number 7".

An impossible event is denoted by ∅, the empty subset of the space of elementary events.


Since all events are sets, they are usually written as sets (e.g. {1, 2, 3}) and represented graphically using Venn diagrams. Venn diagrams are particularly useful for representing events, because the probability of an event can be identified with the ratio of the area of the event to the area of the sample space.

Figure 1: Venn diagrams

7 definition. Events A and B are called mutually exclusive (A ∩ B = ∅) if their simultaneous occurrence is impossible.

When events are not mutually exclusive (inclusive events, i.e., non-mutually exclusive events), the word "or" allows for the possibility of both events happening.
Events are collectively exhaustive if all the possibilities for outcomes are ex-
hausted, and at least one of those outcomes must occur. For example, there are
theoretically only two possibilities for flipping a coin. Flipping a head and flipping
a tail are collectively exhaustive events. Events can be both mutually exclusive
and collectively exhaustive. In the case of flipping a coin, flipping a head and flipping a tail are also mutually exclusive events. Both outcomes cannot occur for a
single trial (i.e., when a coin is flipped only once).
If several events can happen as a result of an experiment, and none of them is more possible than the others according to objective conditions, then such events are called equally likely events. Examples of equally likely events: drawing a two, an ace or a knave from a pack of cards; the appearance of any number from 1 to 6 when throwing a die; etc.

1.2. Probability measure

A probabilistic model is a mathematical description of an uncertain situation. Its two main ingredients are listed below and are visualized in Fig. 2. Elements of a probabilistic model:

1) The sample space Ω, which is the set of all possible outcomes of an experi-
ment.

2) The probability law (or probability measure), which assigns to a set A of possible outcomes (an event) a nonnegative number P(A) (called the probability of A) that encodes our knowledge or belief about the collective "likelihood" of the elements of A.

Figure 2: Probability model

The probability measure P is a function that assigns to each event a probability between 0 and 1. It must satisfy the probability axioms:

1) P(Ω) = 1;

2) P(A) ≥ 0 for every event A ⊂ Ω;

3) for each pair of mutually exclusive events A, B ⊂ Ω the equality P(A ∪ B) = P(A) + P(B) holds.

1.3. Statistical definition of probability
Intuitively, the probability of an event is supposed to measure the long-term relative frequency of the event. Specifically, suppose that we repeat the experiment indefinitely. (Note that this actually creates a new, compound experiment.) For an event A in the basic experiment, let n(A) denote the number of times A occurred (the frequency of A) in the first N runs. Thus,

P_N(A) = n(A)/N     (1)

is the relative frequency of A in the first N runs. If we have chosen the correct probability measure for the experiment, then in some sense we expect that the relative frequency of each event should converge to the probability of the event:

P_N(A) → P(A), as N → ∞.     (2)

It follows that if we have the data from N runs of the experiment, the observed relative frequency P_N(A) can be used as an approximation for P(A). This approximation is called the statistical definition of probability.
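The convergence in (2) can be illustrated by simulation; a minimal sketch where the event A is "the die shows three spots" (the seed and number of runs are arbitrary choices):

```python
import random

random.seed(1)  # arbitrary seed, for reproducibility only
N = 100_000
# Count how often A = "three spots" occurs in N throws of a fair die.
n_A = sum(1 for _ in range(N) if random.randint(1, 6) == 3)
relative_frequency = n_A / N  # P_N(A), should be close to 1/6 ≈ 0.1667
print(relative_frequency)
```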

1.4. Classical definition of probability


8 definition. Let a space of elementary events Ω be given, and suppose this space consists of N equally likely elementary events, among which there are n events favorable for an event A. Then the number

P(A) = n/N     (3)

is called the probability of the event A.

Basic properties of probability. Let a space of elementary events Ω be given and let probabilities P be defined on the events of Ω. Then:

1) P(∅) = 0;

2) if A ⊂ B ⊂ Ω, then P(A) ≤ P(B);

3) for each A ⊂ Ω the inequality P(A) ≤ 1 holds;

4) for each A ⊂ Ω the equality P(Ā) = 1 − P(A) holds.
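With a finite sample space of equally likely outcomes, the classical formula (3) and the properties above are easy to check in code; a sketch for one throw of a die (Fraction keeps the arithmetic exact):

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}  # equally likely elementary events of a die throw

def prob(event):
    """Classical probability (3): favorable outcomes over total outcomes."""
    return Fraction(len(event & omega), len(omega))

A = {2, 4, 6}                            # "an even number of spots"
print(prob(A))                           # 1/2
print(prob(set()))                       # property 1: P(∅) = 0
print(1 - prob(A) == prob(omega - A))    # property 4: P(Ā) = 1 − P(A)
```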

1.5. Geometric definition of probability


We have a geometric figure Ω, and a geometric figure A that is a subset of Ω. Every point of Ω is equally likely to be chosen. What is the probability that the chosen point will be in the figure A? This probability is equal to the ratio of the areas of the figures:

P(the point is in the figure A) = (area of figure A) / (area of figure Ω).     (4)
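Formula (4) can be checked by sampling uniform random points; a sketch where Ω is the unit square and A is a quarter disc of radius 1, so the exact ratio of areas is π/4 (the figures are chosen for illustration):

```python
import random

random.seed(0)  # arbitrary seed for reproducibility
N = 200_000
# Drop N uniform points in the unit square; count those inside the quarter disc.
hits = sum(1 for _ in range(N)
           if random.random() ** 2 + random.random() ** 2 <= 1.0)
estimate = hits / N  # ≈ area(A) / area(Ω) = π/4 ≈ 0.785
print(estimate)
```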

1.6. Elements of combinatorics
In English we use the word "combination" loosely, without thinking whether the order of things is important. In other words:

1) "My fruit salad is a combination of apples, grapes and bananas." We don't care what order the fruits are in; they could also be "bananas, grapes and apples" or "grapes, apples and bananas"; it's the same fruit salad.

2) "The combination to the safe was 472." Now we do care about the order. "724" would not work, nor would "247". It has to be exactly 4-7-2.

So, in Mathematics we use more precise language:

* If the order doesn’t matter, it is a Combination.

* If the order does matter, it is a Permutation or Arrangement.

1.6.1. Permutation

1. Permutation with Repetition. A permutation with repetitions of n elements, where n1 elements are of category 1, n2 elements are of category 2, ..., nk elements are of category k, is denoted by P(n1, n2, ..., nk), where

P(n1, n2, ..., nk) = n! / (n1! n2! ... nk!),    n1 + n2 + ... + nk = n.

2. Permutation without Repetition. A permutation without repetitions of n different elements is denoted by Pn, where

Pn = n(n − 1) · ... · 2 · 1 = n!.
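Both counts can be computed with math.factorial; a sketch counting the distinct rearrangements of the letters of "MISSISSIPPI" (categories: M×1, I×4, S×4, P×2 — the word is my illustrative choice, not from the text):

```python
from math import factorial

def perm_with_repetition(*counts):
    """P(n1, ..., nk) = n! / (n1! n2! ... nk!), where n = n1 + ... + nk."""
    n = sum(counts)
    denom = 1
    for c in counts:
        denom *= factorial(c)
    return factorial(n) // denom

print(factorial(4))                      # Pn for n = 4: 4! = 24
print(perm_with_repetition(1, 4, 4, 2))  # "MISSISSIPPI": 34650
```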

1.6.2. Arrangement

There are basically two types of arrangements:

* Repetition is Allowed: such as the lock above. It could be ”333”.

* No Repetition: for example the first three people in a running race. You can’t
be first and second.

1. Arrangements with Repetition. These are the easiest to calculate. When you have n things to choose from, you have n choices each time! When choosing r of them, the number of arrangements is n × n × ... (r times). (In other words, there are n possibilities for the first choice, THEN there are n possibilities for the second choice, and so on, multiplying each time.) This is easier to write down using an exponent of r:

Ā_n^r = n^r.

Example. In the lock above, there are 10 numbers to choose from (0, 1, ..., 9) and you choose 3 of them:

10 × 10 × 10 = 10^3 = 1000 permutations.

So, the formula is simply n^r, where n is the number of things to choose from, and you choose r of them (repetition allowed, order matters).
2. Arrangements without Repetition. In this case, you have to reduce
the number of available choices each time. For example, what order could 16 pool
balls be in? After choosing, say, number ”14” you can’t choose it again. So, your
first choice would have 16 possibilities, and your next choice would then have 15
possibilities, then 14, 13, etc. And the total permutations would be:

16 × 15 × 14 × 13 × ... = 20, 922, 789, 888, 000

But maybe you don’t want to choose them all, just 3 of them, so that would be
only:
16 × 15 × 14 = 3, 360
In other words, there are 3,360 different ways that 3 pool balls could be selected
out of 16 balls.
The formula is written:

A_n^r = n! / (n − r)!,

where n is the number of things to choose from, and you choose r of them (no repetition, order matters).
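Python's standard library has this count built in as math.perm; a sketch reproducing the pool-ball numbers above:

```python
from math import perm

# Arrangements without repetition: n! / (n - r)!
print(perm(16, 3))   # choose 3 of 16 balls in order: 16 × 15 × 14 = 3360
print(perm(16, 16))  # order all 16 balls: 16! = 20922789888000
```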

1.6.3. Combinations

There are also two types of combinations (remember the order does not matter now):

* Repetition is Allowed: such as coins in your pocket (5,5,5,10,10)

* No Repetition: such as lottery numbers (2,14,15,27,30,33)

1. Combinations without Repetition. This is how lotteries work. The numbers are drawn one at a time, and if you have the lucky numbers (no matter what order) you win!

C_n^r = n! / (r!(n − r)!).

That formula is so important it is often just written in big parentheses as the binomial coefficient (n choose r), where n is the number of things to choose from, and you choose r of them (no repetition, order doesn't matter).

It is interesting to also note how this formula is nice and symmetrical:

C_n^r = C_n^(n−r).

In other words choosing 3 balls out of 16, or choosing 13 balls out of 16 have the
same number of combinations.
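math.comb computes C_n^r directly; a sketch that also checks the symmetry C_n^r = C_n^(n−r) for the pool-ball example:

```python
from math import comb

print(comb(16, 3))                  # 560 ways to choose 3 balls out of 16
print(comb(16, 3) == comb(16, 13))  # symmetry: choosing 3 = leaving out 13
```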
2. Combinations with Repetition. Let us say there are five flavors of ice
cream: banana, chocolate, lemon, strawberry and vanilla. You can have three
scoops. How many variations will there be?
Let’s use letters for the flavors: {b, c, l, s, v}. Example selections would be

* {c, c, c} (3 scoops of chocolate)

* {b, l, v} (one each of banana, lemon and vanilla)

* {b, v, v} (one of banana, two of vanilla)

(And just to be clear: There are n = 5 things to choose from, and you choose
r = 3 of them. Order does not matter, and you can repeat!)
Think of the ice cream as being in boxes; you could say "move past the first box, then take 3 scoops, then move along 3 more boxes to the end" and you will have 3 scoops of chocolate! So instead of worrying about different flavors, we have a simpler problem to solve: "how many different ways can you arrange arrows and circles?" Notice that there are always 3 circles (3 scoops of ice cream) and 4 arrows (you need to move 4 times to go from the 1st to the 5th container). So (being general here) there are r + (n − 1) positions, and we want to choose r of them to have circles. This is like saying "we have r + (n − 1) pool balls and want to choose r of them". In other words it is now like the pool balls problem, but with slightly changed numbers. And you would write it like this:

C̄_n^r = (r + (n − 1))! / (r!(n − 1)!),

where n is the number of things to choose from, and you choose r of them (repetition allowed, order doesn't matter).
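The formula C̄_n^r = C(r + n − 1, r) can be cross-checked by brute force with itertools.combinations_with_replacement; a sketch for the ice-cream example (n = 5 flavors, r = 3 scoops):

```python
from math import comb
from itertools import combinations_with_replacement

flavors = ["b", "c", "l", "s", "v"]  # banana, chocolate, lemon, strawberry, vanilla
selections = list(combinations_with_replacement(flavors, 3))

print(len(selections))     # brute-force count of multisets of size 3
print(comb(3 + 5 - 1, 3))  # formula: C(r + n − 1, r) = C(7, 3) = 35
```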

1.7. Inclusion-Exclusion Principle


In the set S = {1, 2, ..., 100}, one out of every six numbers is a multiple of 6, so the total number of multiples of 6 in S is [100/6] = [16.666...] = 16. Similarly, the total number of multiples of 7 in S is [100/7] = 14. How many numbers in S are multiples of 6 or 7? The answer is not 14 + 16 = 30. The reason is that the sum 14 + 16 counts twice those numbers that are both a multiple of 6 AND a multiple of 7, i.e., the multiples of 42. How many multiples of 42 were counted twice? As before, we compute [100/42] = 2. Then the correct answer is that there are 14 + 16 − 2 = 28 numbers in S that are multiples of 6 or 7.
How many numbers in S are multiples of 2, 3, or 5? In this case there are 50 multiples of 2, 33 multiples of 3 and 20 multiples of 5. But the answer is clearly not 50 + 33 + 20 = 103. This sum counts twice the numbers that are multiples of both 2 and 3, for instance. We must subtract the 16 multiples of 6, the 10 multiples of 10 and the 6 multiples of 15. It seems as if 50 + 33 + 20 − 16 − 10 − 6 = 71 is the final answer, but it is not! The multiples of 30 were counted 3 times and eliminated 3 times, so they are not accounted for. We have to add back the 3 multiples of 30 to get the correct answer: 50 + 33 + 20 − 16 − 10 − 6 + 3 = 74.
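The multiples-of-2, 3, or 5 count can be verified both by inclusion-exclusion and by direct enumeration; a minimal sketch:

```python
S = range(1, 101)

# Direct count of numbers in S divisible by 2, 3, or 5.
direct = sum(1 for x in S if x % 2 == 0 or x % 3 == 0 or x % 5 == 0)

# Inclusion-exclusion: singles − pairwise overlaps + triple overlap.
ie = (100 // 2 + 100 // 3 + 100 // 5
      - 100 // 6 - 100 // 10 - 100 // 15
      + 100 // 30)

print(direct, ie)  # both 74
```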
The inclusion-exclusion principle tells us how to keep track of what to add and
what to subtract in problems like the above:
Let S be a finite set, and suppose there is a list of r properties that every element of S may or may not have. We call S1 the subset of elements of S that have property 1; S1,2 the subset of elements of S that have properties 1 and 2; etc. Notice that ∪Si is the subset of elements of S that have at least one of the r properties. Then

|∪Si| = Σ_{i=1}^{r} |Si| − Σ_{1≤i<j≤r} |Si,j| + Σ_{1≤i<j<k≤r} |Si,j,k| − ... + (−1)^{r−1} |S1,2,...,r|.

For example, if r = 4, then

|∪Si| = |S1| + |S2| + |S3| + |S4|
        − |S1,2| − |S1,3| − |S1,4| − |S2,3| − |S2,4| − |S3,4|
        + |S1,2,3| + |S1,2,4| + |S1,3,4| + |S2,3,4|
        − |S1,2,3,4|.     (5)

The same principle holds for probabilities:

* For each A, B ⊂ Ω the equality P(A ∪ B) = P(A) + P(B) − P(A ∩ B) holds.

* For each A, B, C ⊂ Ω the equality P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(A ∩ B) − P(A ∩ C) − P(B ∩ C) + P(A ∩ B ∩ C) holds.

* ...

1.8. Conditional probability


In this section we ask and answer the following question. Suppose we assign a sample space to an experiment and then learn that an event B has occurred. How should we change the probabilities of the remaining events? We shall call the new probability for an event A the conditional probability of A given B and denote it by P(A|B). In the "die-toss" example, the probability of event A, "three spots occur", is P(A) = 1/6 on a single toss. But what if we know that event B, "at least three spots occur", occurred? Then there are only four possible outcomes, one of which is A. The probability of A = {3} is 1/4, given that B = {3, 4, 5, 6} occurred.

8
9 definition. The probability of occurrence of an event A given that an event B takes place is called the conditional probability and is calculated by the formula

P(A|B) = P(A ∩ B) / P(B),     (6)

where A, B ⊂ Ω, P(B) ≠ 0.

Example. For the same example as above, we now calculate the probability using the formula:

P(A|B) = P(A ∩ B) / P(B) = (1/6) / (4/6) = 1/4.
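The die example can be computed directly from definition (6); a sketch over the equally likely outcomes, with Fraction for exact arithmetic:

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Classical probability over the equally likely die outcomes."""
    return Fraction(len(event), len(omega))

A = {3}            # "three spots occur"
B = {3, 4, 5, 6}   # "at least three spots occur"
p_A_given_B = prob(A & B) / prob(B)  # definition (6)
print(p_A_given_B)  # 1/4
```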

1.9. The formula of total (complete) probability


10 definition. Let Ω be the sample space. If H1, H2, ..., Hn are mutually exclusive events (Hi ∩ Hj = ∅, i ≠ j) such that

H1 ∪ H2 ∪ ... ∪ Hn = Ω,     (7)

then for any event A the probability P(A) can be calculated using the total probability formula:

P(A) = P(A|H1)P(H1) + P(A|H2)P(H2) + ... + P(A|Hn)P(Hn) = Σ_{i=1}^{n} P(A|Hi)P(Hi).     (8)
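A worked instance of formula (8) under an assumed scenario (the numbers are illustrative, not from the text): two urns are equally likely to be chosen (hypotheses H1, H2); urn 1 contains 3 white and 1 black ball, urn 2 contains 1 white and 3 black; A is "a white ball is drawn".

```python
from fractions import Fraction

# Assumed illustrative scenario, not from the text:
p_H = [Fraction(1, 2), Fraction(1, 2)]          # P(H1), P(H2): urn chosen at random
p_A_given_H = [Fraction(3, 4), Fraction(1, 4)]  # P(A|H1), P(A|H2): white-ball chances

# Total probability formula (8).
p_A = sum(pa * ph for pa, ph in zip(p_A_given_H, p_H))
print(p_A)  # 1/2
```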

1.10. Bayes’ formula


Suppose we have a set of events H1 , H2 ,..., Hm that are pairwise disjoint and such
that
Ω = H1 ∪ H2 ∪ ... ∪ Hm .
We call these events hypotheses. We also have an event A that gives us some
information about which hypothesis is correct. We call this event evidence. Before
we receive the evidence, then, we have a set of prior probabilities P(H1 ), P(H2 ), .
. . , P(Hm ) for the hypotheses. If we know the correct hypothesis, we know the
probability for the evidence. That is, we know P(A|Hi ) for all i. We want to find
the probabilities for the hypotheses given the evidence. That is, we want to find
the conditional probabilities P(Hi |A). These probabilities are called the posterior
probabilities.
To find these probabilities, we write them in the form

P(Hi|A) = P(Hi ∩ A) / P(A).     (9)

We can calculate the numerator from our given information by

P(Hi ∩ A) = P(A|Hi )P(Hi ). (10)

Since one and only one of the events H1 , H2 , . . . , Hm can occur, we can write the
probability of A as

P(A) = P(H1 ∩ A) + P(H2 ∩ A) + ... + P(Hm ∩ A).

Using Equation 10, the above expression can be seen to equal

P(A) = P(A|H1 )P(H1 ) + P(A|H2 )P(H2 ) + ... + P(A|Hm )P(Hm ). (11)

Using 9, 10, and 11 yields Bayes' formula:

P(Hi|A) = P(A|Hi)P(Hi) / Σ_{j=1}^{m} P(A|Hj)P(Hj).     (12)

Although this is a very famous formula, we will rarely use it. If the number of
hypotheses is small, a simple tree measure calculation is easily carried out.
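Continuing the two-urn illustration (hypotheses equally likely, with P(A|H1) = 3/4 and P(A|H2) = 1/4 for drawing a white ball; the scenario is assumed, not from the text), formula (12) gives the posterior probabilities:

```python
from fractions import Fraction

p_H = [Fraction(1, 2), Fraction(1, 2)]          # priors P(Hi), assumed scenario
p_A_given_H = [Fraction(3, 4), Fraction(1, 4)]  # likelihoods P(A|Hi)

# Bayes' formula (12): posterior = likelihood × prior / evidence.
evidence = sum(pa * ph for pa, ph in zip(p_A_given_H, p_H))
posteriors = [pa * ph / evidence for pa, ph in zip(p_A_given_H, p_H)]
print(posteriors)  # [Fraction(3, 4), Fraction(1, 4)]
```

Note that the posteriors automatically sum to 1, since the denominator is exactly the total probability (11) of the evidence A.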

1.11. Independent events


It often happens that the knowledge that a certain event B has occurred has no effect on the probability that some other event A occurs, that is, P(A|B) = P(A). One would expect that in this case the equation P(B|A) = P(B) would also be true. In fact, each equation implies the other. If these equations are true, we might say that A is independent of B. For example, you would not expect the knowledge of the outcome of the first toss of a coin to change the probability that you would assign to the possible outcomes of the second toss; that is, you would not expect the second toss to depend on the first. This idea is formalized in the following definition of independent events.

11 definition. Two events A and B are independent if both A and B have positive
probability and if
P(A|B) = P(A)
and
P(B|A) = P(B)

As noted above, if both P(A) and P(B) are positive, then each of the above equations implies the other, so to see whether two events are independent, only one of these equations must be checked.
The following theorem provides another way to check for independence.

1 proposition. If P(A) > 0 and P(B) > 0, then A and B are independent if and
only if
P(A ∩ B) = P(A)P(B) (13)

Proof. Assume first that A and B are independent. Then P(A|B) = P(A), and so

P(A ∩ B) = P(A|B)P(B) = P(A)P(B).

Assume next that P(A ∩ B) = P(A)P(B). Then

P(A|B) = P(A ∩ B) / P(B) = P(A)P(B) / P(B) = P(A).

Also,

P(B|A) = P(A ∩ B) / P(A) = P(A)P(B) / P(A) = P(B).

Therefore, A and B are independent.

12 definition. A set of events {A1, A2, ..., An} is said to be mutually independent if for any subset {Ai, Aj, ..., Am} of these events we have

P(Ai ∩ Aj ∩ ... ∩ Am) = P(Ai)P(Aj)...P(Am).

It is important to note that the single statement

P(A1 ∩ A2 ∩ ... ∩ An) = P(A1)P(A2)...P(An)

does not by itself imply that the events A1, A2, ..., An are mutually independent.

1.12. Bernoulli trials


Suppose we are doing an experiment. In this experiment the probability that event A will occur is equal to P(A) = p, and the probability that the opposite event Ā will occur is equal to q = 1 − p.

13 definition. Bernoulli trials are trials which satisfy these requirements:

1) in each trial exactly one of two events can occur: event A, or the opposite event Ā;

2) the trials are independent; intuitively, the outcome of one trial has no influence over the outcome of another trial;

3) the probability of event A is the same in all trials and equal to p = P(A) (the probability of the opposite event Ā in all trials is equal to q = P(Ā)).

1.13. Bernoulli formula


To analyze a Bernoulli trials process, we choose as our sample space a binary tree
and assign a probability measure to the paths in this tree. Suppose, for example,
that we have three Bernoulli trials. The possible outcomes are indicated in the tree diagram shown in Figure 3. The probabilities assigned to the branches of the tree represent the probability for each individual trial. Since we have assumed that outcomes on any one trial do not affect those on another, we assign the same
probabilities at each level of the tree. An outcome w for the entire experiment
will be a path through the tree. For example, w3 represents the outcomes SFS.
Our frequency interpretation of probability would lead us to expect a fraction p of
successes on the first experiment; of these, a fraction q of failures on the second; and,
of these, a fraction p of successes on the third experiment. This suggests assigning
probability pqp to the outcome w3 . Thus, the probability that the three events S
on the first trial, F on the second trial, and S on the third trial occur is the product
of the probabilities for the individual events.

Figure 3: Tree diagram of three Bernoulli trials.

We shall be particularly interested in the probability that in n Bernoulli trials there are exactly k successes. We denote this probability by b(n, p, k). Let us calculate the particular value b(3, p, 2) from our tree measure. We see that there are three paths which have exactly two successes and one failure, namely w2, w3, and w5. Each of these paths has the same probability p^2 q. Thus b(3, p, 2) = 3p^2 q.
Considering all possible numbers of successes we have

b(3, p, 0) = q^3,
b(3, p, 1) = 3pq^2,
b(3, p, 2) = 3p^2 q,
b(3, p, 3) = p^3.
We can, in the same manner, carry out a tree measure for n experiments and deter-
mine b(n, p, k) for the general case of n Bernoulli trials.

2 proposition (Bernoulli Formula). Suppose we are doing n independent trials, where each trial has only two possible outcomes, A (success) and Ā (failure), with p = P(A) and q = P(Ā) = 1 − p, where 0 < p < 1. The probability b(n, p, k) of the event which corresponds to k 'successes' in n Bernoulli trials is

b(n, p, k) = C_n^k p^k q^(n−k)  for k = 0, 1, 2, ..., n.     (14)

Conclusions of the Bernoulli formula:

1) the probability that in n independent Bernoulli trials the event A occurs at least k1 and at most k2 times is equal to

P{k1 ≤ k ≤ k2} = Σ_{k=k1}^{k2} C_n^k p^k q^(n−k);     (15)

2) the probability that in n independent Bernoulli trials the event A occurs one or more times is equal to

P{1 ≤ k ≤ n} = 1 − q^n.     (16)

Proof.

P{1 ≤ k ≤ n} = Σ_{k=1}^{n} C_n^k p^k q^(n−k) = Σ_{k=0}^{n} C_n^k p^k q^(n−k) − C_n^0 p^0 q^(n−0) = 1 − q^n.
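Formula (14) and conclusion (16) are easy to evaluate with math.comb; a sketch for n = 3 trials with p = 1/2, reproducing b(3, p, 2) = 3p^2 q:

```python
from math import comb

def b(n, p, k):
    """Bernoulli formula (14): probability of exactly k successes in n trials."""
    q = 1 - p
    return comb(n, k) * p**k * q**(n - k)

p = 0.5
print(b(3, p, 2))  # 3 * p^2 * q = 3 * 0.25 * 0.5 = 0.375

# The probabilities over all k sum to 1.
total = sum(b(3, p, k) for k in range(4))
print(abs(total - 1.0) < 1e-12)

# Conclusion (16): P{at least one success} = 1 − q^n.
at_least_one = sum(b(3, p, k) for k in range(1, 4))
print(abs(at_least_one - (1 - 0.5**3)) < 1e-12)
```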

1.14. Maximum likelihood value of event A


14 definition. The value k = k*, at which P_n(k) = b(n, p, k) takes its largest value, is called the maximum likelihood value (the most probable number of occurrences) of event A.

3 proposition. The maximum likelihood value k = k* of event A in n Bernoulli trials, where 0 < p = P(A) < 1, lies in the interval

np − q ≤ k* ≤ np + p.     (17)
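The bound (17) can be checked numerically against a direct argmax of b(n, p, k); a small sketch with illustrative values n = 10, p = 0.3:

```python
from math import comb

def b(n, p, k):
    """Bernoulli formula: probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.3  # illustrative values
q = 1 - p
# Maximum likelihood value k*: the k that maximizes b(n, p, k).
k_star = max(range(n + 1), key=lambda k: b(n, p, k))
print(k_star)                            # 3
print(n * p - q <= k_star <= n * p + p)  # interval (17) holds
```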
