Chapter 2. Discrete Models: 1 Probability - The Foundation of Statistics
In this chapter, the basics of probability theory are introduced and these ideas will be
illustrated with discrete probability models where the number of random outcomes is
either finite or countably infinite. For example, if we inspect five bridges and note the
number that need repairs, then the possible values noted will be finite: 0, 1, 2, 3, 4,
or 5. Before the inspections are performed, the number of bridges that will need
repair is unknown or random. Probability deals with the problem of quantifying the
likelihood of random outcomes. A set of outcomes is countably infinite if the elements
of the set can be listed in sequence like the counting numbers: 1, 2, 3, 4, . . . . For
example, consider an experiment where the number of radioactive particles emitted
from a radioactive substance in an hour is recorded. Then the possible values for
this recorded number are 0, 1, 2, . . . , with no upper bound. This is a discrete type
experiment. On the other hand, if we measure the weight of an engine prototype, the
measured weight can vary on a continuum of numbers, say from 300 to 350 pounds.
A weight measurement is an example of a continuous variable (which is discussed in
Chapter 3).
Definition. The Sample Space S is the set of all possible outcomes of the experiment.
Probabilities are defined as numbers between zero and one (inclusive) that indicate
how likely an event is to occur. Probabilities near one indicate that the event is very
likely to occur and probabilities near zero indicate that the event is unlikely to occur.
A probability of 0.5 is a 50-50 chance.
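Formally, probabilities satisfy the following three rules:
i. $P(A) \geq 0$ for any event $A$.
ii. $P(S) = 1$.
iii. If the events $A_i$ are pairwise disjoint (or, in the language of probability, mutually
exclusive), then
$P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$.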
In other experiments, all the outcomes are equally likely. For instance, when tossing
a fair die, the sample space S contains 6 equally likely outcomes since the die is
fair. The event A of rolling an even number contains three of these outcomes, so
P (A) = 3/6 = 0.5. If B is the event you roll a multiple of 3, then B = {3, 6} and
P (B) = 2/6 = 1/3. Sometimes, probabilities can be computed by simply knowing
the setup of the experiment. Suppose we receive a shipment of 100 engines and we
know that 10 are defective. Consider an experiment of selecting one of the engines at
random and let A be the event that the selected engine is defective. Then it makes
sense to assign the probability P(A) = 10/100 = 0.1.
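When all outcomes are equally likely, the probability of an event is just the number of outcomes in the event divided by the number of outcomes in the sample space. The short Python sketch below illustrates that rule for the die and the engine shipment; the helper name `prob_equally_likely` is made up for this illustration, and Python is used only as a convenient stand-in for any computing environment.

```python
from fractions import Fraction

def prob_equally_likely(event, sample_space):
    """P(A) = |A| / |S|, valid only when every outcome in S is equally likely."""
    space = set(sample_space)
    favorable = set(event) & space  # ignore anything outside the sample space
    return Fraction(len(favorable), len(space))

die = range(1, 7)
p_even = prob_equally_likely({2, 4, 6}, die)
p_mult3 = prob_equally_likely({y for y in die if y % 3 == 0}, die)

# Engines labeled 0..99, with (by assumption) labels 0..9 being the defective ones.
p_defective = prob_equally_likely(range(10), range(100))

print(p_even, p_mult3, p_defective)  # 1/2 1/3 1/10
```

Exact fractions are used so that the results match the hand computations without rounding.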
Exercise: From the definition of probability, P (S) = 1. Use this fact to prove that
the probability of the null set (or empty set) is zero.
Extensions to the additive law exist for three or more events. For example,
$P(A) = P(A \cap B) + P(A \cap \bar{B})$, where $\bar{B}$ denotes the complement of $B$.
Definition. Two events A and B are mutually exclusive if they cannot both occur
simultaneously, i.e., their intersection is empty: $A \cap B = \emptyset$, in which case $P(A \cap B) = 0$.
3 Conditional Probability
A very important concept in probability is the concept of conditional probability.
Often in practice the outcome of an experiment is uncertain, but we may have some
additional information that helps shed some light on the outcome.
To illustrate, let A be the event that a randomly selected cup of soup will be overfilled
during the production process and suppose P (A) = 0.1. Further, suppose there are
two filling tanks. Let B be the event that the cup of soup was filled by the first tank.
If both tanks fill an equal number of cups, then it makes sense to set $P(B) = 0.5$.
Suppose that $P(A \cap B) = 0.08$. That is, the probability that a randomly selected cup
is overfilled and was filled by the first tank is 0.08.
Consider the following question: Given that the cup was filled by the first tank, what
is the probability it will be overfilled? That is, what is the probability of A given B,
written $P(A \mid B)$?
If we know the cup was filled by the first tank, then we know the event B has
occurred, and therefore, we can reduce our sample space from S to B. The definition
of conditional probability (for $P(B) > 0$) is then
Conditional Probability Formula: $P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$.
In the cup of soup example,
$P(A \mid B) = \dfrac{P(A \cap B)}{P(B)} = \dfrac{0.08}{0.5} = 0.16$.
That is, given that the cup was filled by the first tank, the probability it is overfilled
is 0.16. We can turn the problem around: given that the cup is overfull, what is the
probability that it was filled by the first tank? That is, find
$P(B \mid A) = \dfrac{P(A \cap B)}{P(A)} = \dfrac{0.08}{0.1} = 0.80$.
We know that 50% of the cups are filled by the first tank. However, if we know that
a cup has been overfilled, then there is an 80% chance it was filled by the first tank.
Note that in general $P(A \mid B) \neq P(B \mid A)$.
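The two conditional probabilities in the soup example can be checked with a few lines of Python. This is a minimal sketch, not part of the original notes; the function name `conditional` is an assumption chosen for illustration.

```python
def conditional(p_joint, p_given):
    """P(A|B) = P(A and B) / P(B); only defined when P(B) > 0."""
    if p_given <= 0:
        raise ValueError("the conditioning event must have positive probability")
    return p_joint / p_given

# Soup example: A = cup is overfilled, B = cup was filled by the first tank.
p_a, p_b, p_ab = 0.1, 0.5, 0.08

p_a_given_b = conditional(p_ab, p_b)  # overfilled, given first tank
p_b_given_a = conditional(p_ab, p_a)  # first tank, given overfilled

print(round(p_a_given_b, 2), round(p_b_given_a, 2))  # 0.16 0.8
```

The same joint probability is divided by a different denominator in each case, which is why the two answers differ.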
4 Independence
Another very important concept in statistics is that of independence. Note that in the
previous example the event that a cup is overfilled is not independent of the event that
the cup was filled by the first tank. If knowing that the event B has occurred affects
the likelihood of whether or not event A occurs, then the two events are dependent.
Conversely, if knowing that event B has occurred does not affect the chance of event
A occurring, then we say the events A and B are independent:
$P(A \mid B) = P(A)$, or equivalently, $P(A \cap B) = P(A)P(B)$.
The notion of independence allows us to solve lots of problems. Consider a jet that
has two engines that operate independently. The probability of engine failure for an
engine is 0.01. What is the probability that both engines fail? Let F1 be the event
the first engine fails and F2 be the event the second engine fails. Then the probability
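of both engines failing is $P(F_1 \cap F_2)$. Since the engines operate independently,
$P(F_1 \cap F_2) = P(F_1)P(F_2) = (0.01)(0.01) = 0.0001$.
If the jet can fly as long as one of the engines is operating, what is the probability
that the jet does not crash due to engine failure? That is, find the probability that at
least one engine does not fail:
$P(\bar{F}_1 \cup \bar{F}_2) = 1 - P(F_1 \cap F_2) = 1 - 0.0001 = 0.9999$.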
Example. Suppose there is a 1/100 = 0.01 chance you would win a weekly lottery
if you buy a single ticket. If you play every week for a year (52 weeks), what is the
probability you win at least once?
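Let $W_j$ be the event you win on the jth week, j = 1, 2, . . . , 52. Then $P(W_j) = 0.01$.
Assuming the weekly drawings are independent, the probability of winning at least once is
one minus the probability of losing in all 52 weeks:
$P(\text{at least one win}) = 1 - (1 - 0.01)^{52} = 1 - (0.99)^{52} \approx 0.407$.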
So, there is about a 41% chance of winning at least once during the year.
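The "at least once" calculation relies on the complement rule together with independence. A small Python sketch of that reasoning follows; the function name `prob_at_least_one` is hypothetical, and the weekly drawings are assumed to be independent.

```python
def prob_at_least_one(p_single, n_trials):
    """P(at least one success in n independent trials) = 1 - (1 - p)^n."""
    return 1.0 - (1.0 - p_single) ** n_trials

# Weekly lottery: 0.01 chance per week, played for 52 weeks.
p_win_year = prob_at_least_one(0.01, 52)
print(f"{p_win_year:.3f}")  # about 0.41, as stated in the text

# Jet example: the jet keeps flying if at least one of two engines works,
# where each engine works with probability 1 - 0.01 = 0.99.
p_jet_ok = prob_at_least_one(0.99, 2)
print(f"{p_jet_ok:.4f}")
```

Working with the complement avoids summing the probabilities of winning exactly once, exactly twice, and so on.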
Example (Based on a Car-Talk radio show puzzler on September 21, 2008). A man
has two cars. The first car will start with a probability of 0.8 and the second car will
start with a probability of 0.7. The man applies for a job requiring a car. The boss
says he can only hire someone who is reliable and can be depended on at least 90% of
the time. Because the applicant's cars only start 70% and 80% of the time, the boss
says he will not hire the man. However, the man claims he does meet the reliability
criterion. How is this possible?
Let A denote the event the first car starts with P (A) = 0.8 and let B be the event
the second car starts with probability P (B) = 0.7. Assuming the events A and B
are independent, we have that the probability the man will have a car for work is
$P(A \cup B) = P(A) + P(B) - P(A \cap B) = 0.8 + 0.7 - (0.8)(0.7) = 0.94$, using the
independence of A and B. Thus, the man should be able to get to work 94% of the
time.
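As a quick check of the two-car calculation, the additive law for independent events can be sketched in Python and compared with the complement approach (at least one car starts unless both fail). The function name below is an illustrative assumption.

```python
def prob_union_independent(p_a, p_b):
    """P(A or B) = P(A) + P(B) - P(A)P(B), valid when A and B are independent."""
    return p_a + p_b - p_a * p_b

p_car = prob_union_independent(0.8, 0.7)

# Cross-check with the complement rule: both cars fail with probability (0.2)(0.3).
p_car_alt = 1.0 - (1.0 - 0.8) * (1.0 - 0.7)

print(round(p_car, 2), round(p_car_alt, 2))  # 0.94 0.94
```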
We can denote random variables by other letters besides Y and typically use letters
towards the end of the alphabet (e.g. X, Y , Z).
Some simple examples:
Sample 10 engines from a large shipment and let Y equal the number of defective
engines out of the 10. Then the random variable Y can assume possible values
of 0, 1, . . . , 10. Thus Y varies between these eleven values and the value that
In the cup-a-soup example, let Y equal the weight of soup in a randomly chosen
cup of soup. In this case, Y is a continuous random variable because it can
(theoretically) assume any value in a continuum of values. In such cases, it
does not make sense to assign a non-zero probability to any specific value that
Y can assume because there are an uncountably infinite number of possible
values. Note that even though a random variable is continuous, we are only
able to record its values on a discrete scale (say, to the nearest tenth of a pound).
Inspect cups of soup coming off the assembly line and let X equal the number
of cups inspected until one is overfilled. The possible values that X can assume
are 1, 2, 3, . . . , with no upper limit. Even though there is no upper limit to the
number of values X can assume, X is nonetheless a discrete random variable
because the values X can take can be put into a one-to-one correspondence
with the natural numbers. Random variables that can take arbitrary values in
a continuum cannot be put into a one-to-one correspondence with the natural
numbers.
In each of these examples, it's fairly easy to determine what values the random variable
can assume. What we also need to know is how likely it is that the random variable
assumes these values. That is, we need to know its probability distribution.
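Definition. The cumulative distribution function (cdf) F(y) of a random variable Y is defined
by
$F(y) = P(Y \leq y)$.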
Definition. The Probability Function p(y) for a discrete random variable is defined
by
p(y) = P (Y = y).
Therefore, $0 \leq p(y) \leq 1$ since probabilities must lie between zero and one. Also, if
we sum up all the values of p(y) over all y values, we must get one.
Example. Roll a fair die and let Y equal the face value that comes up. Then we can
express the probability function of Y conveniently in tabular form:
y 1 2 3 4 5 6
p(y) 1/6 1/6 1/6 1/6 1/6 1/6
F (y) 1/6 2/6 3/6 4/6 5/6 1
Note that in this example F(y) = 0 for y < 1 and F(y) = 1 for $y \geq 6$. Note
also that the cdf is defined for all real numbers. For instance, in this example,
$F(2.344) = P(Y \leq 2.344) = P(Y \leq 2) = 2/6$.
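To see how the cdf of a discrete random variable behaves as a step function defined for every real number, here is a small Python sketch for the fair die. The names `pmf` and `cdf` are chosen for this illustration only.

```python
from fractions import Fraction

# Probability function of a fair die: p(y) = 1/6 for y = 1, ..., 6.
pmf = {y: Fraction(1, 6) for y in range(1, 7)}

def cdf(y, pmf):
    """F(y) = P(Y <= y): add up p(k) over all possible values k with k <= y."""
    return sum((p for k, p in pmf.items() if k <= y), Fraction(0))

# The cdf accepts any real number, not just the six face values.
print(cdf(2.344, pmf), cdf(0.5, pmf), cdf(6, pmf))  # 1/3 0 1
```

Note that 2/6 is printed in lowest terms as 1/3. The cdf stays flat between face values and jumps by 1/6 at each of them.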
y 0 1 2 3
p(y) 0.85 0.10 0.04 0.01
Note that the p(y) values sum to one. The expected value of Y is computed as
$E[Y] = \sum_y y\,p(y) = 0(0.85) + 1(0.10) + 2(0.04) + 3(0.01) = 0.21$.
Thus, if you sampled hundreds of compressors, you would expect to see about 0.21 leaks
per compressor on average.
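The expected value is a probability-weighted average of the possible values. A minimal Python sketch using the table above (the dictionary name `pmf_leaks` and the function name are assumptions for illustration):

```python
# Probability function from the table: value y -> p(y).
pmf_leaks = {0: 0.85, 1: 0.10, 2: 0.04, 3: 0.01}

def expected_value(pmf):
    """E[Y] = sum over y of y * p(y)."""
    return sum(y * p for y, p in pmf.items())

total_prob = sum(pmf_leaks.values())  # should be 1 for a valid probability function
mean_leaks = expected_value(pmf_leaks)

print(round(total_prob, 10), round(mean_leaks, 2))  # 1.0 0.21
```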
We saw in the cup-a-soup example that the variation in the process was also very
important. In probability we can formally define the variance of a random variable
which is a measure of how spread out its values are. The variance is denoted by
the Greek letter $\sigma^2$ (sigma-squared). In order to measure variability, a natural
approach is to examine how far a measured variable Y differs from the average value
$\mu = E[Y]$. Now, Y varies according to its probability distribution, so to get an overall measure
of variation, we can look at the average deviation $E(Y - \mu)$, but this quantity is always
zero because the positive deviations from $\mu$ always cancel out the negative deviations
from $\mu$. Instead, we compute the average squared deviation from $\mu$ and call this the
variance:
$\sigma^2 = E[(Y - \mu)^2]$.
In the case of the variance of Y, we simply compute the average value of
$g(Y) = (Y - \mu)^2$.
It is also useful to know that the expectation operator is linear. That is,
E[a + bY ] = a + bE[Y ]
for any two constants a and b. This allows us to provide a convenient formula for the
variance of a random variable:
$\sigma^2 = E[(Y - \mu)^2]$
$= E[Y^2 - 2\mu Y + \mu^2]$
$= E[Y^2] - 2\mu E[Y] + \mu^2$
$= E[Y^2] - 2\mu^2 + \mu^2$
$= E[Y^2] - \mu^2$.
$\sigma^2 = \mathrm{var}(Y) = E[Y^2] - \mu^2$.
y           0      1      2      3
y^2         0      1      4      9
p(y)        0.85   0.10   0.04   0.01
y p(y)      0      0.10   0.08   0.03
y^2 p(y)    0      0.10   0.16   0.09
From the table we compute that $E[Y^2] = 0 + 0.10 + 0.16 + 0.09 = 0.35$ and therefore
the variance of Y is $\sigma^2 = E[Y^2] - \mu^2 = 0.35 - (0.21)^2 = 0.3059$. The standard deviation
is $\sigma = \sqrt{\sigma^2} = \sqrt{0.3059} = 0.5531$.
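The definition of the variance and the shortcut formula should give the same number. The following Python sketch checks this for the table above; the variable names are illustrative assumptions.

```python
import math

pmf_leaks = {0: 0.85, 1: 0.10, 2: 0.04, 3: 0.01}

mu = sum(y * p for y, p in pmf_leaks.items())  # E[Y]
second_moment = sum(y**2 * p for y, p in pmf_leaks.items())  # E[Y^2]

# Variance from the definition: average squared deviation from the mean.
var_definition = sum((y - mu) ** 2 * p for y, p in pmf_leaks.items())

# Variance from the shortcut formula: E[Y^2] - mu^2.
var_shortcut = second_moment - mu**2

sd = math.sqrt(var_shortcut)
print(round(var_definition, 4), round(var_shortcut, 4), round(sd, 4))
```

The shortcut formula is usually more convenient by hand because it needs only the two column sums from the table.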
Now that some of the basics of probability have been introduced, we present one of
the most important discrete probability models: the binomial distribution.
Example. Consider the cup-a-soup example. Suppose that a cup is ok for shipment
if the weight dispensed into it is between 237 and 239. The probability that a cup is within
this specification is 0.80. A change is made to the production process to see if the
proportion of cups that fall in this specified range is increased. In order to test if
the change has improved matters, n = 10 cups are sampled. For each cup we record
whether it is a success (S) (i.e., its weight is between 237 and 239) or a failure (F)
(i.e., if its weight falls outside this range). Let Y denote the number of good cups out
of the n = 10 trials. Then Y is a binomial random variable (assuming the 10 trials
are independent and identical). The question of interest concerns the probability of
success, p, under the changed process. In particular, is p > 0.80: did the change
improve matters?
Let us assume for the sake of argument that p has not changed and that p = 0.80
still. Let us compute the probability function p(y) of Y. Since Y is the number of
successes out of n = 10 trials, Y can assume the values 0, 1, . . . , 10.
One step at a time, we shall compute p(0) = P (Y = 0) first.
$p(0) = P(Y = 0)$
$= P(FFFFFFFFFF)$
$= P(F)P(F)P(F)P(F)P(F)P(F)P(F)P(F)P(F)P(F)$ (by independence)
$= (0.20)^{10}$
$\approx 0.0000001$.
Thus, it is not very likely that all 10 cups would fail to meet specifications if p = 0.80.
Next, consider p(1) = P (Y = 1). One way this can happen is if we get an outcome
such as SF F F F F F F F F . Again, by independence, the probability of this outcome
is
$(0.8)(0.2)(0.2)(0.2)(0.2)(0.2)(0.2)(0.2)(0.2)(0.2) = (0.8)(0.2)^9 = p^y q^{n-y}$,
where p = 0.8, q = 1 - p = 0.2, n = 10, and y = 1.
[Diagram: the labeled successes S2 and S1 placed into two of the n = 10 slots.]
(Recall that 0! = 1.) Thus the total number of ways of placing S1 and S2 into the n
slots is given by
$90 = 10 \cdot 9 = \dfrac{10!}{(10 - 2)!} = \dfrac{n!}{(n - y)!}$
where y = 2. Note that we have artificially labeled the two successes as S1 and S2 .
The 90 possibilities we just counted distinguish the order in which we placed the
two successes. However, we are not interested in the order; the labeling was artificial.
To get the correct number of possibilities we need to divide the 90 by 2 because there
are two ways to rearrange S1 and S2 by simply having them change places. Thus, the
total number of ways of choosing y = 2 slots out of the n = 10 possible slots to place
successes in is
$90/2 = 45 = \dfrac{10 \cdot 9}{2} = \dfrac{n!}{(n - y)!\,y!}$.
The same logic can be applied when y = 3 successes. We can label the three successes
S1, S2 and S3. There are $720 = 10 \cdot 9 \cdot 8 = 10!/(10 - 3)!$ ways of arranging these three
successes into the 10 slots. Once again, we are not interested in distinguishing the
three successes. Thus, the count of 720 is too large by a factor of $3 \cdot 2 \cdot 1 = 6$, the number
of ways of rearranging the three successes. The total number of possibilities is then
$n!/((n - y)!\,y!)$, which is the same formula we derived when y = 2. This formula
is the general formula for all values of y = 0, 1, . . . , n. This expression for counting
the number of combinations of n objects taken y at a time is given by the binomial
coefficient, which is denoted by $\binom{n}{y}$:
$\binom{n}{y} = \dfrac{n!}{y!\,(n - y)!}$ (1)
The binomial coefficient $\binom{n}{y}$ counts the number of ways of choosing y items from a
collection of n items.
Using the binomial coefficient, we can now give the formula for the binomial prob-
ability function on n trials and success probability p: the probability of exactly y
successes out of n trials is given by
$p(y) = \binom{n}{y} p^y q^{n-y}, \quad y = 0, 1, \ldots, n,$ (2)
where q = 1 - p.
Applying the formula for y = 3, 4, . . . , 10, will give the remaining probabilities.
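Formula (2) is easy to evaluate for every value of y at once. The Python sketch below does this for the soup example with n = 10 and p = 0.8, using the standard library function `math.comb` for the binomial coefficient; the function name `binom_pmf` is an assumption made for this illustration.

```python
from math import comb

def binom_pmf(y, n, p):
    """Binomial probability function (2): C(n, y) * p^y * q^(n - y), with q = 1 - p."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)

n, p = 10, 0.8
probs = [binom_pmf(y, n, p) for y in range(n + 1)]

for y, prob in enumerate(probs):
    print(f"p({y}) = {prob:.7f}")

# The probabilities over y = 0, ..., n should add up to one.
print(f"sum = {sum(probs):.7f}")
```

The counting argument in the text corresponds to `comb(10, 2)` being 45 and `comb(10, 3)` being 120.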
Matlab Commands. The cumulative probability function for a binomial random
variable can be computed using the command binocdf in Matlab. For instance,
typing binocdf(2,10,.8) gives the probability $P(Y \leq 2)$ when Y has a binomial
distribution on n = 10 trials and success probability p = 0.8. To compute P(Y = 2)
in Matlab, we can note that
$P(Y = 2) = P(Y \leq 2) - P(Y \leq 1)$
and type
binocdf(2,10,.8)-binocdf(1,10,.8)
in Matlab to get the answer 7.3728e-005.
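For readers without Matlab, the same cumulative probability can be sketched in plain Python by summing formula (2). The function `binom_cdf` below is a hand-written stand-in written for this illustration; it is not Matlab's binocdf itself, only an assumed equivalent for small n.

```python
from math import comb

def binom_cdf(y, n, p):
    """P(Y <= y) for a binomial random variable, by summing the probability function."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(0, y + 1))

# P(Y = 2) as a difference of two cumulative probabilities.
p_eq_2 = binom_cdf(2, 10, 0.8) - binom_cdf(1, 10, 0.8)
print(f"{p_eq_2:.4e}")  # the text reports 7.3728e-005 from Matlab
```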
Figure 1 shows the probability function for the binomial distribution. The left panel
of Figure 1 shows the binomial probability function for n = 10 which is skewed to the
left. The right panel of Figure 1 shows the binomial probability function for n = 100
which looks symmetric and bell-shaped.
The mean of a binomial random variable is
$\mu = np$ (3)
which can be found by computing $\sum_{y=1}^{n} y \binom{n}{y} p^y q^{n-y}$.
Caution: This formula does not apply to other types of random variables.
The formula is quite intuitive. Suppose you are an 80% free-throw shooter in basket-
ball and you take n = 10 shots. How many would you expect to make? The answer
is 80% of 10, or 8. Here's an interesting question: if you are an 80% shooter, are you
more likely to make 6 of 10 shots or make all 10 shots? Just plug the numbers into
(2) to find the answer (most people's intuition is wrong on this one).
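The free-throw question can be settled by evaluating formula (2) at y = 6 and y = 10. The Python sketch below does so and also checks formula (3) numerically; the function name `binom_pmf` is an illustrative assumption.

```python
from math import comb

def binom_pmf(y, n, p):
    """Binomial probability function (2)."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)

n, p = 10, 0.8
p_six = binom_pmf(6, n, p)   # make exactly 6 of 10
p_ten = binom_pmf(10, n, p)  # make all 10

print(round(p_six, 4), round(p_ten, 4))  # 0.0881 0.1074

# Numerical check of the mean formula (3): sum of y * p(y) should equal n * p = 8.
mean_shots = sum(y * binom_pmf(y, n, p) for y in range(n + 1))
print(round(mean_shots, 6))
```

For an 80% shooter, making all 10 shots turns out to be somewhat more likely than making exactly 6.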
The variance of a binomial random variable is
$\sigma^2 = npq = np(1 - p)$.
The binomial coefficient is useful for counting the number of outcomes of experiments
when the total number of outcomes is very large. Here are a couple of common examples.
Example (Poker). A poker hand consists of 5 cards dealt from a standard deck of 52 cards, so there are
$\binom{52}{5} = 2{,}598{,}960$
possible poker hands. To compute the probability of a royal straight flush in poker
(i.e. 10, Jack, Queen, King and Ace all of the same suit), note that there are only 4
possible royal straight flushes, one for each of the four suits (hearts, diamonds, spades,
clubs). Thus,
$P(\text{royal straight flush}) = 4/2{,}598{,}960 \approx 0.0000015$.
Note that this probability computation is based on the assumption that when you
deal 5 cards from a randomly shuffled deck, all 2,598,960 hands are equally likely
to occur.
Example (Super Lotto) Suppose you buy a super lottery ticket where you choose 6
numbers from the set of numbers 1, 2, . . . , 47. Then there are
$\binom{47}{6} = 10{,}737{,}573$
possible combinations. If you buy one ticket, your probability of winning is
$1/10{,}737{,}573 \approx 0.0000000931$. That is, it is very unlikely you will win.
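Both counting examples reduce to a single binomial coefficient, which Python's standard library computes exactly with `math.comb`. The following sketch reproduces the counts and the resulting probabilities, assuming all hands (or all number combinations) are equally likely; the variable names are illustrative.

```python
from math import comb

# Poker: number of 5-card hands from a 52-card deck.
poker_hands = comb(52, 5)
p_royal_flush = 4 / poker_hands  # one royal straight flush per suit

# Super Lotto: number of ways to choose 6 numbers from 47.
lotto_combos = comb(47, 6)
p_lotto_win = 1 / lotto_combos

print(poker_hands, lotto_combos)  # 2598960 10737573
print(f"{p_royal_flush:.3e} {p_lotto_win:.3e}")
```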
One of the main statistical inference procedures is hypothesis testing. The basic
ideas of hypothesis testing are now introduced using the binomial distribution. The
concepts covered here carry over to other statistical models.
Example: 20% of the electrodes produced by a machine are defective and cannot be
used resulting in a waste of time and money. The company is considering purchasing
a new but expensive replacement machine in the hope that the proportion of defective
electrodes will decrease. Before purchasing the machine, the company decides to test
it first by producing n = 100 electrodes with the new machine. Based on the test run
producing n = 100 electrodes, a decision needs to be made: buy the new machine or
stick with the old machine. How should the decision be made?
The decision can be made using hypothesis testing. Out of the n = 100 electrodes pro-
duced by the new machine, let Y denote the number of electrodes that are defective.
From the previous sections of this chapter, we would expect Y to follow a binomial
distribution with n = 100 trials and success probability p. The success probability p
in this problem is an example of a parameter and the problem is that we do not know
the value of p. If the new machine is no better than the old machine, then $p \geq 0.20$
and there is no sense in buying the expensive new machine. If, on the other hand,
p < 0.20, then the defect rate for the new machine is less than that of the old machine
and it may make sense to replace the old machine by the new machine. Suppose the
defect rate for the new machine is the same as the old machine (i.e. p = 0.20). Then
we would expect the number of defective electrodes (out of n = 100) to be around
$\mu = np = 100(0.20) = 20$, plus or minus a standard deviation or two. However, if the
number of defective electrodes is considerably less than 20, then we would conclude
the new machine is better than the old machine.
The logic behind hypothesis testing is as follows. We assume for the sake of argument
that the new machine is no better than the old machine (the status quo) and we call
this the null hypothesis and denote it by H0 . In terms of the defect rate parameter
p for the new machine, the null hypothesis H0 can be written
H0 : p = 0.20.
The null hypothesis is always stated in terms of a model parameter, in this case p,
the defect rate of the new machine. We also set up an alternative hypothesis also
in terms of the model parameter, denoted Ha , which states the research hypothesis:
is the new machine better than the old machine? In terms of the defect rate p for the
new machine, the alternative hypothesis Ha is
Ha : p < 0.20.
The idea now is to run the experiment (i.e. produce n = 100 electrodes with the new
machine) and see if the data from the experiment allow us to reject the null hypothesis
H0 and accept the alternative hypothesis Ha that the new machine is better than the
old machine.
In order to make the decision based on the data, we plug the data into a test statistic.
Test statistics can be quite complicated in practice, but for this example we shall use
a very simple test statistic: let Y = the number of defective electrodes. We shall let
Y be the test statistic.
If the number of defective electrodes Y is small, we will reject the null hypothesis H0
and accept the alternative hypothesis Ha that the new machine has a lower defect
rate. The question is: how small does Y , the number of defective electrodes, have to
be in order to reject H0 and conclude the new machine is better than the old machine?
In order to make this decision, we need a cut-off value for Y so that if Y is less than
or equal to this cut-off value we reject the null hypothesis. Whenever we make a decision there
are two types of errors possible (described below). The cut-off value is determined
by keeping the chance of committing one of these errors small. Here are the definitions
for the two types of errors when making a decision:
Definition. A Type I error occurs if the null hypothesis is rejected when it is true.
Definition. A Type II error occurs if the null hypothesis is accepted when it is false.
In the context of the electrode example, a type I error occurs if we conclude the
new machine works better than the old machine (reject H0 : p = 0.20 and conclude
Ha : p < 0.20) when in fact the new machine is no better than the old machine.
A type I error here would be very bad because an expensive new machine will be
purchased that is no better than the old machine. A type II error would be to claim
the new machine performs the same as the old machine (accept H0 ) when in fact the
new machine has a smaller defect rate. A type II error in this context is also bad,
but committing it means the company would just continue producing electrodes with
the old machine. Often hypothesis tests are set up in such a way that a type I error
is the more serious error.
To help understand the logic behind hypothesis testing, consider an analogy with
a courtroom trial. The defendant on trial is either guilty or not guilty. Evidence
is heard to decide whether to convict or not convict the defendant. To begin, the
defendant is assumed to be innocent and then the data is examined to determine if
we can reject the hypothesis of innocence and convict. Thus, we can set this up as
a hypothesis test:
Null Hypothesis H0 : Innocent
versus the
Alternative Hypothesis Ha : Guilty.
In statistics, the evidence is in the data and we use the data to determine if the
null hypothesis should be rejected or not. In a court trial there are two possible
decisions (convict or not convict) and also two possible errors: type I and type II. In
the trial analogy, a type I error is to reject the assumption of innocence and convict
the defendant when in fact the defendant is innocent. Convicting an innocent person
is considered a very bad thing, and thus we generally need to be convinced beyond
a reasonable doubt that the defendant is guilty. A type II error in the context of a
court case is to let a guilty person go free.
Note that in a court of law, failing to convict the defendant does not necessarily
mean that the defendant is innocent. Failing to convict the defendant could mean
that there was not enough evidence. In the statistical framework, failing to reject
the null hypothesis could result either because the null hypothesis is true or because
there is not enough data (i.e. evidence) to conclude the null hypothesis is false.
Therefore, in practice if the null hypothesis is not rejected, one will typically refrain
from claiming the null hypothesis is true because this could cause a type II error.
Instead, one can say there is insufficient evidence to reject the null hypothesis.
The type II error problem highlights the importance of designing experiments appro-
priately. We want to avoid conducting a costly experiment or survey where we collect
evidence (data) and find out afterwards we cannot reject the null hypothesis simply
due to a lack of evidence. Lack of evidence could be due to insufficient sample size
or a poor experimental design or sampling design. Great care must go into the data
collection process.
Recall that in the electrode example, we need to determine a cut-off value for Y in
order to make a decision. This cut-off value will be chosen to make the probability
of a type I error small since a type I error is considered more serious than a type II
error. The probability of committing a type I error is called the significance level and
is denoted by the Greek letter $\alpha$ (alpha).
Typical values for the significance level are 0.01, 0.05, or 0.10 depending on how
much protection one wants against committing a type I error. The value $\alpha = 0.05$ is
used most frequently. Because the binomial distribution is discrete, it is usually not
possible to set the significance level at exactly some fixed value like $\alpha = 0.05$, as we
shall see. Let c be the cut-off value for Y so that we will reject H0 if we observe a
value $Y = y \leq c$. Let us choose c so that the significance level is $\alpha = 0.05$ (or as close
as possible to 0.05).
Definition: The critical region (or rejection region) of a test is the set of values of
the test statistic that will lead to a decision to reject the null hypothesis.
In the electrode example, the critical region will be of the form $y \leq c$ where y is the
observed number of defective electrodes. If we choose c = 13, then
$\alpha = P(\text{type I error})$
$= P(\text{Rejecting } H_0 \text{ when } H_0 \text{ is true})$
$= P(Y \leq c \text{ when } p = 0.20)$
$= P(Y \leq 13 \text{ when } p = 0.20)$
$= 0.0469$.
n=100
p=0.20
binocdf(13, n,p)
The probability given by Matlab from these commands is 0.0469. From the above
probability computation we see that if the number of defective electrodes out of
n = 100 produced by the new machine is less than or equal to 13, then we will
reject H0 and conclude that the defect rate p of the new machine is less than 0.20. The
probability of making a type I error in this case is only 0.0469. Stated another way,
if the defect rate p for the new machine is the same as the old machine (p = 0.20)
then observing 13 or fewer defects with the new machine out of 100 electrodes is very
unlikely. Figure 2 shows a picture of the binomial distribution when p = 0.20 along
with the critical region. Note that the probabilities p(y) in this figure are essentially
zero once you get more than three standard deviations away from the mean of $\mu = 20$.
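The choice c = 13 can be recovered by a simple search: accumulate the null probabilities p(0), p(1), . . . and stop just before the running total exceeds 0.05. The Python sketch below illustrates this under the null value p = 0.20; the function names are assumptions made for this example, and the search rule (largest c whose probability does not exceed the target) is one reasonable reading of "as close as possible to 0.05".

```python
from math import comb

def binom_pmf(y, n, p):
    """Binomial probability function (2)."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)

def critical_value(n, p0, alpha):
    """Largest c with P(Y <= c | p = p0) <= alpha, or None if no such c exists."""
    cumulative, c = 0.0, None
    for y in range(n + 1):
        cumulative += binom_pmf(y, n, p0)
        if cumulative > alpha:
            break
        c = y
    return c

n, p0 = 100, 0.20
c = critical_value(n, p0, 0.05)

# Actual significance level attained by the rule "reject H0 if Y <= c".
attained_alpha = sum(binom_pmf(y, n, p0) for y in range(c + 1))
print(c, round(attained_alpha, 4))  # the text reports c = 13 with probability 0.0469
```

Because Y is discrete, the attained level jumps from one value of c to the next, so it generally cannot be made exactly 0.05.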
Suppose the test run with the new machine is run and out of the n = 100 electrodes
produced, we observe y = 10 defective electrodes. Since the value y = 10 falls in
the critical region ($y = 10 \leq 13$), we would reject H0 and conclude that the defect
rate p for the new machine is less than 0.20 with a significance level $\alpha = 0.05$. It is
important to state the significance level in your conclusion because this specifies
the strength of the statistical evidence against the null hypothesis. In this example
we are claiming that the new machine is better than the old machine. This could be
an incorrect claim (i.e. a type I error) but the probability of making that error is
only 0.05.
In the electrode example, the hypothesis test was an example of a one-sided test.
That is, we decided to reject H0 for only small values of Y . In other examples where
one wants to determine if the parameter differs from some hypothetical value, then
we would have a two-sided test where we would reject the null hypothesis for either
very large or very small values of the test statistic.
9.1 p-values
In the previous section where hypothesis testing was described, a small probability of a
type I error ($\alpha = 0.05$) was specified which determined the cut-off value for the critical
region. Another common approach to testing a hypothesis is to report the strength
of the evidence against H0 . In the electrode example, observing y = 10 defective
electrodes with the new machine would lead to the rejection of the null hypothesis
using a significance level $\alpha = 0.05$ because y = 10 is in the critical region. In this
section, we ask: How likely is it to observe 10 or fewer defects with the new machine
if the defect rate is the same as the old machine's defect rate? This probability is
known as a p-value. Formally, for this example, the p-value is computed as:
$p\text{-value} = P(Y \leq 10)$ (assuming p = 0.20)
$= 0.0057$.
Figure 2: Null Distribution for the binomial distribution with n = 100, p = 0.20
with cut-off for the critical region. (Plot of p(y), on a scale from 0.00 to 0.10, against
values from 0 to 40; the critical region is marked with $\alpha = 0.0469$.)
If the defect rate for the new machine is the same as the old machine (p = 0.20),
then observing 10 or fewer defective electrodes out of n = 100
is extremely unlikely (the probability is 0.0057). Reporting this p-value is more
informative than performing a test at a fixed significance level because the p-value
tells you exactly the strength of the evidence against H0 .
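The p-value in the electrode example is a lower-tail binomial probability computed under the null hypothesis. A short Python sketch of that calculation follows; `binom_cdf` is a hand-written helper assumed for this illustration rather than a library routine.

```python
from math import comb

def binom_cdf(y, n, p):
    """P(Y <= y) for a binomial random variable with n trials and success probability p."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(0, y + 1))

# Observed y = 10 defective electrodes out of n = 100; null value p = 0.20.
p_value = binom_cdf(10, 100, 0.20)
print(round(p_value, 4))  # the text reports 0.0057

# The fixed-level decision follows from the p-value: reject H0 when it is below alpha.
alpha = 0.05
print("reject H0" if p_value < alpha else "do not reject H0")
```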
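Here is a general definition of a p-value:
Definition. The p-value of a test is the probability, computed assuming the null hypothesis
is true, of observing a value of the test statistic at least as extreme as the one actually
observed.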
Because p-values are probabilities, they range in value between 0 and 1. p-values near
zero are evidence against the null hypothesis. For instance, in the electrode example
above, the p-value of 0.0057 is very small and provides strong evidence against H0.
Small p-values tell us that an observed outcome is very unlikely if the null hypothesis
is true. A rough rule of thumb is that if the p-value is less than 0.01, one has very
strong evidence against H0. If p-value < 0.05, then one has strong evidence against H0.
If 0.05 < p-value < 0.10, then the evidence against H0 is only moderate. Generally,
p-values > 0.10 are not considered as evidence against H0 . Of course, there is some
grey area in interpreting p-values.
Recall that there are two types of errors in hypothesis testing: type I and type II.
In the context of the electrode problem, a type I error is to conclude that the defect
rate for the new machine is lower than that of the old machine when in fact it is not
lower. A type II error is to claim the defect rate for the new machine is the same as
the old machine when in fact the new machine has a lower defect rate. As mentioned
above, it is important to plan experiments and surveys so that you have enough
data (evidence) to reject the null hypothesis when the null hypothesis is false. In
statistical terminology, one wants to plan experiments so that the hypothesis test has
high power.
Definition. The Power of a statistical test is the probability of rejecting the null
hypothesis when the null hypothesis is false.
In the courtroom analogy, low power is similar to little evidence. A guilty defendant
may not be convicted if there is a lack of evidence. In statistics, a false null hypothesis
will not be rejected if there is not enough data. Power computations tend to be a little
complicated and we will not provide one here. However, we can illustrate the problem
with poor power using the electrode example again. Suppose in the electrode example
a test run with the new machine was run that produced only n = 10 electrodes instead
of n = 100. If only y = 1 defective electrode is observed from the n = 10 test run,
then the proportion of defectives is 1/10 or 10%, which is the same proportion as in the
above example (10 out of 100 or 10%). In the large test run (n = 100), observing
ten defective electrodes provided very strong evidence against the null hypothesis
H0 . However, if the smaller test run is made (n = 10), the p-value of the test is
P (Y 1) = 0.3758 which is not a small probability. In other words, if the null
hypothesis is true (i.e. the defect rate is p = 0.20), then it is not unusual that the
number of defective electrodes produced out of ten is less than or equal to one. With
such a large p-value, we cannot conclude the new machine is better than the old
machine (i.e. we cannot reject H0 ). The new machine may indeed be better than
the old machine, but we cannot make that determination based on a test run of only
n = 10 electrodes.
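The two p-values discussed above can be checked with a short computation. The following is a minimal sketch in Python (the helper name `binom_cdf` is our own, not part of the text); it sums the binomial probabilities under the null defect rate p = 0.20 for both test runs.

```python
from math import comb

def binom_cdf(c, n, p):
    """P(Y <= c) for Y ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, y) * p**y * (1 - p)**(n - y) for y in range(c + 1))

# Null hypothesis defect rate from the electrode example.
p0 = 0.20

p_value_small = binom_cdf(1, 10, p0)    # y = 1 defective out of n = 10
p_value_large = binom_cdf(10, 100, p0)  # y = 10 defectives out of n = 100

print(f"n = 10:  p-value = {p_value_small:.4f}")
print(f"n = 100: p-value = {p_value_large:.4f}")
```

Both runs show the same 10% observed defect rate, but only the larger run gives a small p-value.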
When designing an experiment or survey, an important consideration is that your
test should have adequate power to detect differences from the null hypothesis. Required
sample sizes needed for an experiment are determined by specifying ahead of time
the desired power. For instance, requiring a power of 90% is quite common. Higher
power requires a greater sample size. There are many software packages available for
doing sample size computations. For more complicated models, computer simulations
may be needed to determine an adequate sample size to guarantee a high power.
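Although a full treatment of power is beyond our scope, a rough sketch shows how power grows with the sample size. The code below is an illustration only. It assumes a significance level of 0.05 and an alternative defect rate of p = 0.10 (both values are our assumptions, not from the electrode example), and the function names `cutoff` and `power` are hypothetical. For each n it finds the largest cut-off c with P(Y ≤ c) no greater than 0.05 under H0, and then computes the probability of rejecting H0 when the true defect rate is 0.10.

```python
from math import comb

def binom_cdf(c, n, p):
    """P(Y <= c) for Y ~ Binomial(n, p); returns 0 when c < 0."""
    return sum(comb(n, y) * p**y * (1 - p)**(n - y) for y in range(c + 1))

def cutoff(n, p0, alpha):
    """Largest c with P(Y <= c | p0) <= alpha, or -1 if no such c exists."""
    c = -1
    while c + 1 <= n and binom_cdf(c + 1, n, p0) <= alpha:
        c += 1
    return c

def power(n, p0, p_alt, alpha=0.05):
    """P(reject H0) for the test 'reject if Y <= c' when the true rate is p_alt."""
    return binom_cdf(cutoff(n, p0, alpha), n, p_alt)

# p0 = 0.20 is the null defect rate; p_alt = 0.10 is an assumed alternative.
for n in (10, 50, 100, 200):
    print(n, cutoff(n, 0.20, 0.05), round(power(n, 0.20, 0.10), 3))
```

A cut-off of -1 means that no outcome is extreme enough to reject H0 at that sample size, so the test has no power at all.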
In the electrode example, we rejected the null hypothesis if the number of defective
electrodes Y produced by the new machine was small. That is, we rejected H0 for
$Y \le c$, where c is a designated cut-off value. In many applications we may set up a
hypothesis test to reject a null hypothesis if the test statistic is either too large or too
small: in these cases, the test is known as a two-tailed test. The following example
will help illustrate a two-tailed test.
Example. 30% of air tanks begin to leak when the pressure in the tank exceeds a
specific threshold. The company manufacturing the tanks begins using a new valve
produced by a different supplier. Fifty tanks are tested with the new valve to determine
if the proportion of tanks that leak has changed. Let p denote the proportion
of tanks that will leak with the new valve when the pressure exceeds the specific
threshold. The null hypothesis of the test is
H0 : p = 0.30,
which says that the proportion of tanks that leak with the new valve is the same as
with the old valve. We want to determine if the proportion of tanks that leak with
the new valve has changed, so the alternative hypothesis is
Ha : $p \neq 0.30$.
If the observed proportion of tanks out of the n = 50 tested is either much bigger or
much smaller than 0.30, then we will reject H0 and accept Ha . This is an example
of a two-tailed test because we will reject H0 if the observed number of leaking tanks
falls in either the left or right tail of the binomial distribution. Let Y denote the
number of leaking tanks observed from the experiment. The critical region now takes
the form: reject H0 if Y < c1 or Y > c2 . The question again comes down to finding
cut-off values c1 and c2 in order to make a decision to reject H0 or not.
Let us choose a significance level $\alpha = 0.05$. Because we have a two-tailed alternative,
we can split the 0.05 probability in two for the two tails of the binomial distribution:
0.025 for the left tail (small values of Y ) and 0.025 for the right tail (large values of
Y ). Because n is fairly large and p = 0.3 is not too close to zero or one, the binomial
distribution for n = 50 and p = 0.3 will be fairly symmetric and we can use the
empirical rule to get a rough idea of the cut-off values for the critical region. If H0 is
true, from (3), the mean number of tanks with leaks will be $\mu = np = 50(0.3) = 15$ with
standard deviation $\sigma = \sqrt{np(1 - p)} = 3.24$ (which follows from (4)). Approximately
95% of the probability will lie within $\mu \pm 2\sigma = 15 \pm 6.48$, which gives values of 8.52
and 21.48. Let y denote the observed number of leaking tanks out of fifty. Let us
choose cut-off values for our two-tailed critical region as:
Reject H0 if $y \le 8$ or $y \ge 22$.
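The exact significance level for this test can be computed by noting
\begin{align*}
\alpha &= P(\text{Rejecting } H_0 \text{ when } H_0 \text{ is true } (p = 0.3)) \\
&= P(Y \le 8 \text{ or } Y \ge 22) \\
&= P(Y \le 8) + P(Y \ge 22) \\
&= P(Y \le 8) + [1 - P(Y < 22)] \\
&= P(Y \le 8) + 1 - P(Y \le 21)
\end{align*}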
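The last expression can be evaluated directly from the binomial probability function. The sketch below (Python, with a helper `binom_cdf` of our own naming) first reproduces the empirical-rule cut-offs and then computes the exact significance level of the test that rejects H0 when y is at most 8 or at least 22, so that the result can be compared with the nominal 0.05.

```python
from math import comb, sqrt

def binom_cdf(c, n, p):
    """P(Y <= c) for Y ~ Binomial(n, p)."""
    return sum(comb(n, y) * p**y * (1 - p)**(n - y) for y in range(c + 1))

n, p0 = 50, 0.30

# Empirical-rule bounds: mean plus or minus two standard deviations.
mu = n * p0
sigma = sqrt(n * p0 * (1 - p0))
lower, upper = mu - 2 * sigma, mu + 2 * sigma

# Exact significance level: P(Y <= 8) + 1 - P(Y <= 21) under H0.
left_tail = binom_cdf(8, n, p0)
right_tail = 1 - binom_cdf(21, n, p0)
alpha = left_tail + right_tail

print(f"mu = {mu:.2f}, sigma = {sigma:.2f}, bounds = ({lower:.2f}, {upper:.2f})")
print(f"left tail = {left_tail:.4f}, right tail = {right_tail:.4f}, alpha = {alpha:.4f}")
```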
[Figure: binomial probability function p(y) for n = 50, p = 0.3, with the critical regions marked in the two tails.]
A binomial random variable is a discrete random variable that can assume a finite
number of values, namely 0, 1, 2, . . . , n. Another type of discrete random variable that
can take the values 0, 1, 2, . . . , is a Poisson random variable. Consider an engineer whose
job is to troubleshoot problems for customers that have purchased the company's
product. Let the random variable Y denote the number of calls that arrive per hour.
A Poisson distribution often provides a reasonable model for data generated by such
a process. The Poisson distribution is parameterized by a rate parameter $\lambda > 0$ and
the probability function for the Poisson distribution is
$$p(y) = \frac{e^{-\lambda} \lambda^{y}}{y!}, \quad y = 0, 1, 2, \ldots,$$
and zero otherwise. The expected value of a Poisson random variable is $\lambda$. The
Poisson distribution has an interesting property where the variance is equal to the
expected value, i.e. $\mathrm{var}(Y) = \lambda$.
The Poisson distribution is quite useful in practice for a couple of reasons. One
reason is that the Poisson distribution provides a good approximation to the binomial
distribution when the number of trials n is large and the success probability p is small.
In such cases, the binomial distribution is well approximated by a Poisson distribution
with mean $\lambda = np$.
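For instance, a Poisson distribution with mean $\lambda = 2$ assigns probability $e^{-2} 2^{0} / 0! = 0.13533528$ to the value zero.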
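To see how close the approximation can be, the following sketch compares binomial and Poisson probabilities side by side. The values n = 100 and p = 0.02 are illustrative choices of ours (giving a Poisson mean of np = 2), and the function names are hypothetical.

```python
from math import comb, exp, factorial

def binom_pmf(y, n, p):
    """P(Y = y) for Y ~ Binomial(n, p)."""
    return comb(n, y) * p**y * (1 - p)**(n - y)

def poisson_pmf(y, lam):
    """P(Y = y) for Y ~ Poisson(lam)."""
    return exp(-lam) * lam**y / factorial(y)

# Assumed illustration: many trials, small success probability.
n, p = 100, 0.02
lam = n * p

for y in range(6):
    print(y, round(binom_pmf(y, n, p), 5), round(poisson_pmf(y, lam), 5))

# Largest disagreement between the two probability functions over 0..n.
max_diff = max(abs(binom_pmf(y, n, p) - poisson_pmf(y, lam)) for y in range(n + 1))
print("largest difference:", round(max_diff, 5))
```

Making n larger and p smaller while holding np fixed should shrink the largest difference further.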
Another reason the Poisson distribution arises is the Poisson process. Consider
a physical process where a particular type of event occurs (such as a defect in a product
or the emission of a radioactive particle). Let Y (t) denote the number of such events
that occur in a given interval of time [0, t]. In many such processes, the probability
an event occurs in a short interval of time is proportional to the size of the time
interval and the occurrences of events in disjoint time intervals are independent. If
the probability of two or more events occurring in a small interval of time is very
small, then a process satisfying these conditions is called a (homogeneous) Poisson
process. One can show that if Y (t) is the number of occurrences of the event in the
interval [0, t], then $P(Y(t) = k) = e^{-\lambda t} (\lambda t)^{k} / k!$, for $k = 0, 1, 2, \ldots$,
where $\lambda$ is the rate at which events occur. That is, Y (t) has
a Poisson distribution.
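The conditions defining a Poisson process can be mimicked in a small simulation. In the sketch below, the interval [0, t] is cut into many short slices; each slice independently contains an event with probability proportional to its length, and never more than one event. The rate 3, the interval length 2, the number of slices and the function names are all our own illustrative assumptions. If the description above is right, the simulated counts should have mean and variance both close to 3 × 2 = 6.

```python
import random

def simulate_count(rate, t, slices, rng):
    """Count events in [0, t] by splitting it into many short slices.

    Each slice has length dt = t / slices and independently contains an
    event with probability rate * dt (at most one event per slice).
    """
    dt = t / slices
    return sum(1 for _ in range(slices) if rng.random() < rate * dt)

def simulate_many(rate, t, slices=500, reps=2000, seed=1):
    """Repeat the experiment and return the counts with their mean and variance."""
    rng = random.Random(seed)
    counts = [simulate_count(rate, t, slices, rng) for _ in range(reps)]
    mean = sum(counts) / reps
    var = sum((c - mean) ** 2 for c in counts) / (reps - 1)
    return counts, mean, var

# Illustrative values (assumptions): rate of 3 events per unit time, t = 2.
rate, t = 3.0, 2.0
counts, mean, var = simulate_many(rate, t)
print(f"sample mean = {mean:.3f}, sample variance = {var:.3f}, rate * t = {rate * t}")
```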
There are several other well-known discrete probability distributions that are very
useful in practice and we briefly note a few of them here:
Then
$$P(A) = \frac{\text{\# of ways } A \text{ can occur}}{\text{Total \# of outcomes}}.$$
Since we are selecting n = 5 computers from a set of N = 100 computers, the
denominator is $\binom{100}{5}$. As for the numerator, if we select three defective computers,
they were selected from the r = 10 defective computers in the shipment and the
number of ways that can occur is $\binom{10}{3}$. However, we are not done yet: if we
selected y = 3 defective computers, then we must have selected $n - y = 5 - 3 = 2$
non-defective computers from the $N - r = 100 - 10 = 90$ non-defective computers in
the shipment. The number of ways that can occur is $\binom{90}{2}$. Thus,
$$P(A) = \frac{\binom{10}{3} \binom{90}{2}}{\binom{100}{5}} = 0.0063835281.$$
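The counting argument above is easy to check numerically. Here is a minimal sketch using Python's built-in `math.comb` for the binomial coefficients; the function name `hypergeom_pmf` and its argument names are our own, chosen to match the symbols N, r, n and y used in the example.

```python
from math import comb

def hypergeom_pmf(y, N, r, n):
    """P(exactly y defectives) when n items are drawn without replacement
    from N items, of which r are defective."""
    return comb(r, y) * comb(N - r, n - y) / comb(N, n)

# Shipment of N = 100 computers with r = 10 defective; n = 5 are selected.
prob = hypergeom_pmf(3, N=100, r=10, n=5)
print(f"P(A) = {prob:.10f}")  # the text reports 0.0063835281

# The probabilities over all possible y should sum to one.
total = sum(hypergeom_pmf(y, 100, 10, 5) for y in range(6))
print(f"sum over y = 0..5: {total:.10f}")
```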
Suppose you monitor a production process until you find a defective item. If we let
Y denote the number of items monitored until a defective is found, then Y has a
geometric distribution, assuming the trials are independent and the probability an
item is defective does not change throughout the process. The probability function
for the geometric distribution is
$p(y) = (1 - p)^{y - 1} p$ for $y = 1, 2, \ldots$.
Question: Can you derive this probability function based on the description given
above (see Problem 4(d))?
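Without giving away the derivation, the probability function can at least be checked numerically. The sketch below assumes a defect probability of p = 0.1 (an illustrative value of ours) and confirms that the probabilities add up to one. It also computes the mean number of items monitored, which for a geometric distribution is known to equal 1/p. The sums are truncated at y = 499, where the remaining probability is negligible.

```python
def geometric_pmf(y, p):
    """P(Y = y): the first defective item appears on the y-th item monitored."""
    return (1 - p) ** (y - 1) * p

p = 0.1  # assumed defect probability, for illustration only

# Truncated sums; the terms beyond y = 499 are far below rounding error.
total = sum(geometric_pmf(y, p) for y in range(1, 500))
mean = sum(y * geometric_pmf(y, p) for y in range(1, 500))

print(f"sum of probabilities = {total:.6f}")
print(f"mean = {mean:.6f}, 1/p = {1 / p:.6f}")
```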
Problems
1. A company has two pumps, either of which can be used to pump water. The
probability the older pump malfunctions is 0.5 and the probability that the
newer pump malfunctions is 0.3.
a) What is the probability that both pumps fail?
b) What is the probability that at least one of the pumps does not malfunction?
c) What assumption is necessary about how the two pumps work in order to
answer parts (a) and (b)?
2. A gear box is selected at random from a collection of gear boxes that were
manufactured over the last week at a factory. The factory operates with three shifts
(day, early evening, late night). Let A be the event the gear box was manufactured
during the day shift, let B be the event it was manufactured during the
early evening shift and let C denote the event that it was manufactured during
the late night shift. Suppose P (A) = 0.4 and P (B) = 0.3. Find the following:
a) P (C)
b) $P(A \cap B)$. What is the term used to describe the relation between events
A and B?
c) $P(A \cup B)$.
d) P (A|B).
e) Are events A and B independent?
3. A company purchases parts for a product. 80% of the parts are from a Japanese
company and 20% of the parts are from a German company. 5% of the Japanese
parts are defective and 3% of the German parts are defective. A part is selected
at random. Let D be the event the part is defective, let G be the event the
part is from the German company and let J be the event the part is from the
Japanese company. Find the following:
4. The number of defects Y in a paint job on newly manufactured cars has the
following distribution:
y       0     1     2     3
f(y)    0.6   0.3   0.07  ?
5. The fiberglass side of an aircraft has two cracks of sizes 1.1 inches and 1.7
inches in diameter. The probability of detecting the cracks using non-destructive
inspection is 0.3 for the 1.1 inch flaw and 0.4 for the 1.7 inch flaw. An inspector
inspects the side of the aircraft. Let Y denote the number of flaws found.
Assume the event of detecting one of the flaws is independent of whether or not
the other flaw is detected.
a) P (Y = 4)
b) P (Y 4)
c) Use part (b) to compute P (Y 3).
d) E[Y ]
e) the standard deviation of Y .
a) Find E[Y1 ] and E[Y2 ], that is, find the expected number of ovens with
eventual electrical problems at each plant.
b) Find $\sigma_1$ and $\sigma_2$, the standard deviations of Y1 and Y2 at each plant.
c) To answer the question, we could compute $P(Y_1 \ge 60)$ and $P(Y_2 \ge 600)$.
However, a direct computation of these probabilities is tedious (for example,
using (2), $P(Y_2 \ge 600) = p(600) + p(601) + \cdots + p(1000)$). Instead,
use the empirical rule.
How many standard deviations is 60 from the mean of Y1 ?
How many standard deviations is 600 from the mean of Y2 ?
d) Apply the empirical rule to get an estimate of $P(Y_1 \ge 60)$.
e) Apply the empirical rule to get an estimate of $P(Y_2 \ge 600)$.
y 0 1 2 3 4
p(y) 0.7 0.2 0.05 0.03 0.02
10. In the previous problem, a tire is considered suitable for sale if it has no
imperfections. Suppose the morning shift at the plant produces n = 100 tires.
b) What is the probability all 100 tires from the morning shift are suitable
for sale?
c) What is the probability that exactly 95 of the tires from the morning shift
are suitable for sale?
d) What is the expected number of tires from the morning shift that are
suitable for sale?
11. This problem is a continuation of problems 9 and 10. The defect rate on the tires
is considered to be too high. In order to address this problem, the manufacturing
process is changed in the hope of increasing the proportion of tires with zero
imperfections. n = 100 tires from a morning shift are produced under the new
conditions to see if the new conditions will lead to a higher proportion of tires
suitable for sale. The plant manager wants to test if the change has improved
the process. As before, a tire is suitable for sale only if it has no imperfections
in its tread.
a) State the appropriate null and alternative hypotheses in the context of this
problem. Be sure to define the parameter used in the statement of H0 and
Ha .
b) In the context of this problem, describe a type I error.
c) In the context of this problem, describe a type II error.
d) Suppose the plant manager decides to adopt the new (and expensive)
change in the production process if the number of tires suitable for sale
out of the 100 is 74 or greater. You advise the plant manager
that this may not be a wise decision. Using the empirical rule, compute
the approximate significance level of the hypothesis test that rejects the
null hypothesis if 74 or more good tires are produced. That is, what is
the probability of committing a type I error if we reject H0 when the number
of successes Y is greater than or equal to 74?
12. Survey results indicate that 47% of automobile drivers use their seat belts. In
order to obtain a higher rate of seat belt use, a law was passed to require drivers
to wear their seat belt. In order to determine if the new law has increased seat
belt usage, a random sample of n = 50 drivers was observed and it was noted
whether or not each of the drivers was using their seat belt. Let p denote the
proportion of drivers in the population that use their seat belts since the law
was passed. Let Y equal the number of drivers (out of the n = 50 observed)
that were wearing their seat belts.
a) If the goal of the new law is to increase the proportion of drivers that use
their seat belts, set up an appropriate null and alternative hypothesis in
terms of p to test if the new law is working.
b) In plain English, explain what a type I error is in the context of this
problem.
13. An engineering consultant is sent to solve problems for clients. Previous
experience indicates that the consultant is able to successfully solve 75% of the
problems.
14. This is a continuation of problem 13. Suppose the consultant attends a training
class in the hopes of being able to solve a higher proportion of the service call
problems. Let p denote the proportion of calls the consultant can successfully
solve after taking the training course (recall that the proportion of successful
jobs before the training course was 0.75). We want to test if the training course
is successful. Answer the following parts:
a) State the null and alternative hypotheses for this problem in terms of p.
b) In plain English, what does it mean to commit a type I error in the context
of this problem?
c) In plain English, what does it mean to commit a type II error in the context
of this problem?
d) Suppose the consultant's work is logged for a month after the training
class. During this period she had n = 300 service jobs. Let Y denote
the number of successful jobs out of these 300 jobs. If the training course
did not improve her ability to solve problems, what is the expectation and
standard deviation of Y ?
e) Suppose the consultant successfully solved 240 of the 300 jobs during this
month. How likely is it that the consultant would have 240 or more successes
out of n = 300 trials if the training course did not help (i.e. if
p = .75)? Use the empirical rule to approximate this probability.