
STAT1000A - Lecture Notes - Solutions

Chapter 4 PROBABILITY

4.1 Introduction

At times we are interested in how likely it is for a particular occurrence to take place. For example, you
debate the need to carry an umbrella when it looks like there is a chance of rain, or you consider investing in
a particular company if there is a good chance that the share price will increase. This chance, likelihood or
possibility of some event occurring is referred to as a probability. A probability is a numeric value between
0 and 1, inclusive. It gives us an idea of how likely a particular outcome is and plays an important role in the
decision making process in the face of uncertainty.

The objectives of this chapter include:

• Understand the concepts and principles of probability
• Calculate various probabilities

4.2 Chapter Formulae

Basic axiom: $0 \le P(A) \le 1$

Complement rule: $P(\bar{A}) = 1 - P(A)$

Addition rule: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
$P(A \cup B) = P(A) + P(B)$, if and only if A and B are mutually exclusive

Intersection rule: $P(A) = P(A \cap B) + P(A \cap \bar{B})$

De Morgan’s rule: $P(\overline{A \cap B}) = P(\bar{A} \cup \bar{B})$ and $P(\overline{A \cup B}) = P(\bar{A} \cap \bar{B})$

Conditional probability: $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$

Multiplication rule: $P(A \cap B) = P(A) P(B \mid A) = P(B) P(A \mid B)$

Statistical independence: $P(A \cap B) = P(A) P(B)$, if and only if A and B are independent

Bayes’ Theorem with two events A and B: $P(A \mid B) = \frac{P(B \mid A) P(A)}{P(B \mid A) P(A) + P(B \mid \bar{A}) P(\bar{A})}$

Bayes’ Theorem with three events $A_1$, $A_2$ and $A_3$: $P(A_1 \mid B) = \frac{P(B \mid A_1) P(A_1)}{P(B \mid A_1) P(A_1) + P(B \mid A_2) P(A_2) + P(B \mid A_3) P(A_3)}$, where $A_1 \cup A_2 \cup A_3 = S$

4.3 Terminology and Notation

Random experiment: A procedure or process of which the outcome is not known in advance.

Sample space: The set that describes or contains all possible outcomes of a random experiment. It is
denoted by S. The probability of the sample space is $P(S) = 1$.

Event: A subset or portion of the full sample space. It is usually denoted by a capital letter, such as event A,
B, etc. The probability of an event A is denoted as $P(A)$. An event that is sure to occur is the certain event
and its probability is 1. An impossible event is an event that has no chance of occurring and its probability is
0. An event that has only one possible outcome is termed the elementary event. The complement of an event
consists of all outcomes of the sample space, excluding the event itself.

Empty set: The set that does not contain any of the possible outcomes of the sample space. It is denoted by
$\emptyset$, and $P(\emptyset) = 0$. Therefore this is the impossible event.

Exercise 4.1
Consider the experiment of rolling a fair six-sided die. As the outcome of any single roll of the die is not
known in advance, this is a random experiment.

1) The sample space is described by

o What type of event is this and what is its probability?

2) The event of an odd number exceeding 5 is described by

o What type of event is this and what is its probability?

3) The set of even numbers is described by

4) The complement of the event of even numbers is described by

5) The event of even numbers exceeding 4 is described by

o What type of event is this?


4.4 Basic Probability Concepts

Types of probability
Probabilities are defined in three different ways:

1) A priori classical probability.


This probability is based on prior knowledge of the process involved. Therefore, if an experiment has
a number of equally likely outcomes then the probability of an event A is defined as:
$P(A) = \frac{\text{number of outcomes in } A}{\text{total number of outcomes in } S}$

2) Empirical probability.
This probability is based on observed data and not prior knowledge of the process.

3) Subjective probability.
This probability differs from person to person since it is based on a person’s past experience,
personal opinion (“gut feel”) and the analysis of a situation. This type of probability is used in
situations where we cannot use the a priori definition or the empirical definition.
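
The a priori and empirical definitions above are both simple ratios, and a short Python sketch may make the difference concrete. The function names, the die event and the observed counts are illustrative assumptions and do not come from the notes.

```python
from fractions import Fraction

def classical_probability(event, sample_space):
    """A priori probability: outcomes in the event / outcomes in S.

    Assumes every outcome in the sample space is equally likely.
    """
    event = set(event) & set(sample_space)  # ignore anything outside S
    return Fraction(len(event), len(sample_space))

def empirical_probability(count, total):
    """Empirical probability: the observed relative frequency."""
    return Fraction(count, total)

# Hypothetical illustration: a fair six-sided die, event "roll a 1 or a 2".
die = {1, 2, 3, 4, 5, 6}
p_low = classical_probability({1, 2}, die)

# Hypothetical illustration: 30 occurrences observed in 120 trials.
p_observed = empirical_probability(30, 120)
```

Exact fractions are used here so that the results can be compared with hand calculations without rounding error.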

Exercise 4.2
For each of the following random experiments, calculate the required probability and state the type used.

1) If we roll a fair six-sided die once, what is the probability of getting at least a four on the die?

2) In a survey, 52 out of 110 respondents claim that they use Internet banking. What is the probability
that someone uses Internet banking?

3) What is the chance that the traffic lights at Empire Road will be working today?

Basic probability axioms

• $0 \le P(A) \le 1$

• $P(\emptyset) = 0$, the impossible event

• $P(S) = 1$, the certain event

Complement rule
Recall that the complement of an event A consists of all outcomes in S that are not in A. It is denoted as $\bar{A}$.
And since $P(S) = 1$ it follows that:

• $P(\bar{A}) = 1 - P(A)$

• $P(A) = 1 - P(\bar{A})$

Exercise 4.3
Consider an experiment where we roll two fair dice, one blue and one red. Calculate the required
probabilities.

The sample space for this experiment, listing the outcomes as (red, blue), is:

S = { (1,1) (1,2) (1,3) (1,4) (1,5) (1,6)
      (2,1) (2,2) (2,3) (2,4) (2,5) (2,6)
      (3,1) (3,2) (3,3) (3,4) (3,5) (3,6)
      (4,1) (4,2) (4,3) (4,4) (4,5) (4,6)
      (5,1) (5,2) (5,3) (5,4) (5,5) (5,6)
      (6,1) (6,2) (6,3) (6,4) (6,5) (6,6) }

1) The number 3 on the red die.

2) Exactly one 3.

3) At least one 3.

4) A total of 7.

5) A total of at most 7.

6) A total of at least 11.
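
The (red, blue) sample space for two dice is small enough to enumerate by machine, which is a convenient way to check hand counts. The sketch below is one possible approach; the helper `prob` and the "double" event are assumptions chosen for illustration and are not one of the exercise questions.

```python
from fractions import Fraction
from itertools import product

# All (red, blue) outcomes for two fair dice: 36 equally likely pairs.
sample_space = list(product(range(1, 7), repeat=2))

def prob(event):
    """P(event), where event is a true/false test on a (red, blue) outcome."""
    favourable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favourable), len(sample_space))

# Illustrative event: both dice show the same number.
p_double = prob(lambda outcome: outcome[0] == outcome[1])

# Complement rule: P(not a double) = 1 - P(double).
p_not_double = 1 - p_double
```

Any of the events in the exercise can be checked the same way by swapping in a different test, for example one on `outcome[0] + outcome[1]` for events about the total.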

Probability representation
A priori and empirical probabilities can be represented in different formats, namely contingency tables,
Venn diagrams and notation.

Recall from Chapter 1 that we can create a frequency table for a single variable, listing all possible outcomes
of the variable. Such a table can be extended to include a second variable, creating a contingency table.
Contingency tables are very useful to represent probabilities of events in the rows and columns of the table.
To illustrate this, consider the empirical results from a survey of 1000 households (Berenson, 2012). The
households were asked whether they intended or planned to purchase a big screen TV in the next 12 months.
After the 12-month period a follow-up study was conducted for the same sample of households to see
whether they actually purchased the TV.

This experiment consists of the following events:

• Planned to purchase TV (denoted as event $A$)
• Did not plan to purchase TV (denoted as event $\bar{A}$)
• Actually purchased TV (denoted as event $B$)
• Did not actually purchase TV (denoted as event $\bar{B}$)

All four of these events combined form the sample space and the outcomes can be represented in a
contingency table as follows:
                                     Actually purchased ($B$)    Did not actually purchase ($\bar{B}$)    Total
Planned to purchase ($A$)            200                         50                                       250
Did not plan to purchase ($\bar{A}$) 100                         650                                      750
Total                                300                         700                                      1000

Table 4.1: Contingency table for intent to purchase TV study

Contingency tables are not limited to only four events. When constructing a contingency table it is important
to ensure that all events are represented in the appropriate manner in the table. For example, say that we
asked the respondents whether they planned to purchase a TV and they could respond as either “Yes”, “No”
or “Maybe”. Then Table 4.1 would require three rows to capture the three possible responses, together with
the current two columns. The entire sample space must be captured in the contingency table, either in terms of
frequencies or in terms of proportions (probabilities).

A sample space can also be represented using Venn diagrams. A Venn diagram, introduced by the English
mathematician John Venn, is a way of displaying how different sets of objects overlap. For example, say a
sample space consists of three events A, B and C such that A and B overlap, but C does not overlap with
either A or B. The sample space and all the events can be visually represented in a Venn diagram as follows:

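[Venn diagram: the sample space drawn as a rectangle containing two overlapping circles labelled A and B]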

The sample space is represented by a rectangular box and circles are used to describe the events. As this is
only a visual representation, the sizes of the circles need not reflect the probability of each event.
Also note that in this example event C is not drawn as a circle; it is simply the complement of the
union of events A and B.

The final way to represent probabilities is through the use of notation. This is typically done when the given
probability information can be easily substituted into specific probability formulae. This is further discussed
in Section 4.5.

Marginal and joint probabilities


A single or marginal probability refers to the probability of a single event. A joint probability refers to the
probability of an occurrence that involves two or more events.

Consider the a priori experiment of tossing a fair coin. Then the probability of getting a head on a single toss
of the coin is a marginal probability. If we were to toss the coin twice, then the probability of getting a head
on both tosses of the coin is a joint probability. For a contingency table with empirical results, such as Table
4.1, marginal probabilities can be calculated by using the totals in the margins of the table, whereas joint
probabilities are calculated from the information in the individual cells of the table.
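
To illustrate how marginal and joint probabilities are read off a contingency table, the counts of Table 4.1 can be stored in a small Python dictionary. This is only a sketch: the dictionary layout, the labels and the function names are assumptions made for the example.

```python
from fractions import Fraction

# Counts from Table 4.1; keys are (planning status, purchase status).
counts = {
    ("planned", "purchased"): 200,
    ("planned", "not purchased"): 50,
    ("not planned", "purchased"): 100,
    ("not planned", "not purchased"): 650,
}
total = sum(counts.values())

def joint(planned, purchased):
    """Joint probability: one cell of the table divided by the grand total."""
    return Fraction(counts[(planned, purchased)], total)

def marginal_planned(planned):
    """Marginal probability: a row total divided by the grand total."""
    return sum(joint(planned, b) for b in ("purchased", "not purchased"))

# Did not plan to purchase, but actually purchased (joint).
p_joint = joint("not planned", "purchased")

# Did not plan to purchase (marginal).
p_marginal = marginal_planned("not planned")
```

The marginal is computed as a sum of joint probabilities along a row, which mirrors how the totals in the margins of the table are formed.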

Exercise 4.4
For the data in Table 4.1, calculate the following probabilities and state whether these are marginal or joint
probabilities.

1) The probability that a household actually purchases a big screen TV.

2) The probability that a household planned to purchase a big screen TV but eventually did not
purchase one.

Exercise 4.5
Where people turn to for news is different for various age groups. A study investigated the main channel
used to follow the news for three different age groups. Use this data to calculate the probabilities below.

                      Under 36     36 to 50     Over 50      Total
                      years old    years old    years old
Television            107          119          133          359
Radio                 73           102          127          302
National newspaper    75           97           109          281
Local newspaper       52           79           107          238
Internet              95           83           76           254
Total                 402          480          552          1434

1) At most 50 years old.

2) Use television.

3) Do not use any newspaper.

4) Use the Internet and are over 50 years old.

5) Are under 36 years old and use either radio or television.

6) Use television or are over 50 years old.

7) Use any newspaper or are up to 50 years old.

Intersection and union of events
Joint probabilities can be expressed in notation using the concepts of intersection and union of events.
Intersection refers to where events occur together. The intersection of two events A and B is denoted by
$A \cap B$ and describes all the outcomes that are common to both A and B, in other words both A and B
occurred. Union refers to the combination of events. The union of two events A and B is denoted by $A \cup B$
and contains the outcomes of event A, or the outcomes of event B, or the outcomes of both A and B. This
means that either A or B or both events occurred.

We can now express the intersections of all events in Table 4.1 using notation. Each of the joint events
is an intersection between selected events. The union of two events, say A and B, includes cells $A \cap B$,
$A \cap \bar{B}$ and $\bar{A} \cap B$.

                                     Actually purchased ($B$)    Did not actually purchase ($\bar{B}$)
Planned to purchase ($A$)            $A \cap B$                  $A \cap \bar{B}$
Did not plan to purchase ($\bar{A}$) $\bar{A} \cap B$            $\bar{A} \cap \bar{B}$

Exercise 4.6
Consider the contingency table summarising the type of mutual fund and its associated level of risk for 868
mutual funds, and answer the following questions.

                               Risk level
Mutual fund     High (H)    Average (A)    Low (L)    Total
Growth (G)      302         140            22         464
Value (V)       53          171            180        404
Total           355         311            202        868

1) How many elementary events are there in this sample space?

2) How many intersections are there between individual events?

3) What is the probability that a fund is a low risk growth fund?

4) What proportion of funds are either high risk or value funds?

5) What is the chance that a randomly selected fund is average or low risk?

Co-existence of events
When dealing with more than one event, it is important to understand how these events co-exist in the
sample space. In particular, we look at whether events are mutually exclusive, exhaustive and form a
partitioning of the sample space. These are described as follows in terms of either two or three events, but
the definitions can be extended for any number of multiple events.

• Mutually exclusive
Any two events A and B are said to be mutually exclusive or non-overlapping if the intersection of A
and B yields the empty set, i.e. $A \cap B = \emptyset$. Therefore $P(A \cap B) = 0$.

• Exhaustive
Events A, B and C are said to be exhaustive if together they fill up the sample space, i.e.
$A \cup B \cup C = S$ and thus $P(A \cup B \cup C) = 1$.

• Partitioning
If events A, B and C are both mutually exclusive and exhaustive, they form a partitioning of the
sample space.
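
The three definitions above translate directly into set operations. The following Python sketch shows one way to test them; the function names and the die-based events are hypothetical and chosen only to illustrate the definitions.

```python
def mutually_exclusive(*events):
    """True if no two of the given events share an outcome."""
    sets = [set(e) for e in events]
    return all(
        sets[i].isdisjoint(sets[j])
        for i in range(len(sets))
        for j in range(i + 1, len(sets))
    )

def exhaustive(sample_space, *events):
    """True if the union of the events fills up the sample space."""
    return set().union(*events) == set(sample_space)

def is_partition(sample_space, *events):
    """True if the events are both mutually exclusive and exhaustive."""
    return mutually_exclusive(*events) and exhaustive(sample_space, *events)

# Hypothetical events on one roll of a six-sided die.
S = {1, 2, 3, 4, 5, 6}
low, middle, high = {1, 2}, {3, 4}, {5, 6}
even = {2, 4, 6}

partition_check = is_partition(S, low, middle, high)   # disjoint and cover S
overlap_check = mutually_exclusive(low, even)          # both contain 2
```

Note that a collection of events can be exhaustive without being mutually exclusive, and vice versa, so the two checks are kept separate.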

Exercise 4.7
The following table gives the mutual fund data excluding totals. Indicate whether the following events are
mutually exclusive, exhaustive and form a partitioning of the sample space.

                High (H)    Average (A)    Low (L)
Growth (G)      302         140            22
Value (V)       53          171            180

Mutually exclusive Exhaustive Partitioning


G, V
V  A, GL
H, A, L
G  H , G  A, GL,
V H , V  A, V L
V H , V
V H , GA

Range of an intersection or a union


In some situations we are interested in obtaining joint information (intersection or union) of two events, but
only the marginal information is available. In such cases the best we can do is to determine the smallest
(minimum) and largest (maximum) values of an intersection or a union, given the marginal probabilities. We
do this by assessing the possible scenarios that could exist for the joint information, i.e. looking at the
possible ways that events can exist together. For the purpose of this course we will restrict this analysis to
two events.

Consider the following situation: A research company conducts a large nationwide survey on consumer
products. For the sake of confidentiality they only publish selected information about brand usage in the
public domain, such as the marginal probabilities of consuming soft drink A and soft drink B, i.e. $P(A)$ and
$P(B)$. The manufacturer of both soft drinks is interested in the joint usage of both brands as part of a
marketing campaign. However, the research company does not provide information on the probability that
consumers use both brands, therefore the value of $P(A \cap B)$ is unknown to the manufacturer. The
manufacturer can only find the range of possible values of $P(A \cap B)$.

In order to obtain the minimum and maximum of either $P(A \cap B)$ or $P(A \cup B)$ we must evaluate the three
possible scenarios in which the two events can occur:

1) Mutually exclusive

2) Partial overlap

3) Complete overlap

For both union and intersection we will use two of these to find the solution, namely either (1) or (2), and
(3).

Minimum and maximum of an intersection

• Check if scenario (1) is possible
  o Add the two marginal probabilities
    ▪ If the total is less than or equal to one, it is possible that the two events are mutually
      exclusive, hence the minimum intersection is zero
    ▪ If the total is greater than one, the events are not mutually exclusive and will therefore
      overlap, i.e. scenario (2), hence the minimum intersection is the amount by which the total exceeds one
• Determine the value for scenario (3)
  o As one event is completely contained inside the other event, it follows that the maximum intersection is
    equal to the smaller of the two probabilities

Minimum and maximum of a union
• Check if scenario (1) is possible
  o Add the two marginal probabilities
    ▪ If the total is less than or equal to one, it is possible that the two events are mutually
      exclusive, hence the maximum union is the sum of the two probabilities
    ▪ If the total is greater than one, the events are not mutually exclusive and will therefore
      overlap, i.e. scenario (2), hence the maximum union is equal to one
• Determine the value for scenario (3)
  o As one event is completely contained inside the other event, it follows that the minimum union is equal
    to the larger of the two probabilities
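
The two procedures above can be condensed into a pair of small functions. The sketch below is one way to express them in Python; the function names and the marginal probabilities 0.7 and 0.5 are assumptions made for illustration.

```python
from fractions import Fraction

def intersection_bounds(p_a, p_b):
    """Return (minimum, maximum) of P(A and B) given only the marginals."""
    # Scenario (1) or (2): zero if the events can be mutually exclusive,
    # otherwise the amount by which the sum exceeds one.
    minimum = max(0, p_a + p_b - 1)
    # Scenario (3): complete overlap gives the smaller marginal.
    maximum = min(p_a, p_b)
    return minimum, maximum

def union_bounds(p_a, p_b):
    """Return (minimum, maximum) of P(A or B) given only the marginals."""
    # Scenario (3): complete overlap gives the larger marginal.
    minimum = max(p_a, p_b)
    # Scenario (1) or (2): the sum of the marginals, capped at one.
    maximum = min(1, p_a + p_b)
    return minimum, maximum

# Hypothetical marginals: P(A) = 0.7 and P(B) = 0.5, so the sum exceeds one.
p_a, p_b = Fraction(7, 10), Fraction(1, 2)
inter_min, inter_max = intersection_bounds(p_a, p_b)
union_min, union_max = union_bounds(p_a, p_b)
```

With these inputs the events must overlap, so the minimum intersection is positive and the maximum union is capped at one.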

Exercise 4.8
If $P(A) = 0.6$ and $P(B) = 0.3$, find the minimum and maximum values of each of the following events.

1) P  A  B 

2) P  A  B 

3) P  A  B 

4) P  A  B 

4.5 Further Probability Concepts and Rules

Addition rule
The addition rule is a formula that allows us to easily calculate the probability of the union of two events.
• $P(A \cup B) = P(A) + P(B) - P(A \cap B)$

• $P(A \cup B) = P(A) + P(B)$, if and only if A and B are mutually exclusive

[Venn diagrams of events A and B]

Intersection rule
If we have two events A and B, then the events $A \cap B$, $A \cap \bar{B}$, $\bar{A} \cap B$ and $\bar{A} \cap \bar{B}$ are mutually exclusive
and exhaustive, hence they form a partitioning of the sample space.

             $B$                  $\bar{B}$
$A$          $A \cap B$           $A \cap \bar{B}$
$\bar{A}$    $\bar{A} \cap B$     $\bar{A} \cap \bar{B}$

Note that events $A \cap B$ and $A \cap \bar{B}$ combine to form A. And since these two events are mutually exclusive,
the marginal probability of A can be expressed in terms of the intersections with another event and its
complement. This holds for any of the marginal probabilities.
• $P(A) = P(A \cap B) + P(A \cap \bar{B})$

De Morgan’s rule
From the table above, the complement of event $A \cap B$ refers to the combination of the remaining three
events. However, using the addition rule, these three events combined are $\bar{A} \cup \bar{B}$. In a similar way we can
see that $\overline{A \cup B}$ is equal to $\bar{A} \cap \bar{B}$. These relationships are described by De Morgan’s Rule, which states
that for two events A and B:
• $P(\overline{A \cap B}) = P(\bar{A} \cup \bar{B})$

  o The complement of an intersection is the union of complements

• $P(\overline{A \cup B}) = P(\bar{A} \cap \bar{B})$

  o The complement of a union is the intersection of complements
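
The addition rule, the intersection rule and De Morgan’s rule can all be verified on a small sample space using Python sets. The sketch below assumes a fair six-sided die and two hypothetical events; none of these names or values come from the notes.

```python
from fractions import Fraction

S = set(range(1, 7))     # one roll of a fair six-sided die
A = {2, 4, 6}            # hypothetical event: an even number
B = {4, 5, 6}            # hypothetical event: a number above 3

def P(event):
    """Classical probability of an event within S."""
    return Fraction(len(event), len(S))

def complement(event):
    """All outcomes in S that are not in the event."""
    return S - event

# Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
addition_ok = P(A | B) == P(A) + P(B) - P(A & B)

# Intersection rule: P(A) = P(A and B) + P(A and not B)
intersection_ok = P(A) == P(A & B) + P(A & complement(B))

# De Morgan's rule, checked on the sets themselves.
de_morgan_1 = complement(A & B) == complement(A) | complement(B)
de_morgan_2 = complement(A | B) == complement(A) & complement(B)
```

In Python `|` is set union and `&` is set intersection, so each line reads almost exactly like the corresponding formula.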


Exercise 4.9
A sample of 500 shoppers was selected in the Johannesburg area to collect information concerning
consumer behaviour. Among the questions asked was “Do you enjoy shopping for clothing?”. Gender was
also recorded. The results are summarised as follows:

                                       Gender
Enjoy shopping for clothing    Male ($B$)    Female ($\bar{B}$)    Total
Yes ($A$)                      140           160                   300
No ($\bar{A}$)                 120           80                    200
Total                          260           240                   500

Find the probabilities of the following events:

1) P  A  B 

2) P  A  B 

3) P  A  B 

4) P  A  B 

5) P  A  B 

Compare (4) with (5):

6) P  A  B 

7) P  A  B 

Compare (6) with (7):

Conditional probability
Conditional probabilities allow us to find the probability of some event, if we know that some other event
has occurred. For two events A and B, if we know that event B has occurred, the sample space is reduced to
event B. Then the probability that A occurred (given B) is equal to the probability of both A and B occurring,
divided by the probability of B occurring.
• $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$
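
As a quick numerical illustration of the conditional probability formula, the sketch below applies it to the counts of Table 4.1 (A: planned to purchase, B: actually purchased). The function name and the zero-probability guard are assumptions added for the example.

```python
from fractions import Fraction

def conditional(p_joint, p_given):
    """P(X | Y) = P(X and Y) / P(Y), undefined when P(Y) is zero."""
    if p_given == 0:
        raise ValueError("cannot condition on an event with probability zero")
    return p_joint / p_given

# Probabilities taken from Table 4.1.
p_a_and_b = Fraction(200, 1000)   # planned and actually purchased
p_a = Fraction(250, 1000)         # planned to purchase
p_b = Fraction(300, 1000)         # actually purchased

p_b_given_a = conditional(p_a_and_b, p_a)   # purchased, given planned
p_a_given_b = conditional(p_a_and_b, p_b)   # planned, given purchased
```

The two results differ because the reduced sample space differs: 250 households in the first case and 300 in the second.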

Exercise 4.10
Consider the example outlined in Exercise 4.9 and calculate the following probabilities:
1) $P(A \mid B)$

2) $P(B \mid A)$

From the above example we can see that events $A \mid B$ and $B \mid A$ are very different from one another, since
we are conditioning on different knowledge. It is important to note that, in general:
• $P(A \mid B) \ne P(B \mid A)$

Multiplication rule
The conditional probability formula can be used to express an intersection in terms of a marginal and a
conditional probability. This is known as the multiplication rule.
• Since $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$

  o It follows that $P(A \cap B) = P(B) P(A \mid B)$

• Also note that since $P(B \mid A) = \frac{P(A \cap B)}{P(A)}$

  o It follows that $P(A \cap B) = P(A) P(B \mid A)$

Note that the intersection between events can be expressed in terms of two different conditional
probabilities.
Solving conditional probability problems
There are a variety of ways to solve a conditional probability problem, such as constructing a contingency
table or drawing a decision tree. We can construct a contingency table from conditional information by
using the various formulae discussed above. Remember, a contingency table represents the entire sample
space. Therefore all the cells of the table must add up to the total sample size (if counts are used) or the total
probability of 1 (for proportions). A decision tree (or tree diagram) allows us to represent the events of an
experiment as branches of a tree. The first level of branches consists of the unconditional (marginal)
probabilities and subsequent levels of branches consist of the conditional probabilities. For two events A and
B, the tree diagram is structured as follows:

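Unconditional    Conditional               Intersection

$A$              $B \mid A$                $A \cap B$
$A$              $\bar{B} \mid A$          $A \cap \bar{B}$
$\bar{A}$        $B \mid \bar{A}$          $\bar{A} \cap B$
$\bar{A}$        $\bar{B} \mid \bar{A}$    $\bar{A} \cap \bar{B}$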

From the multiplication rule, the products of the respective marginal and conditional probabilities yield the
intersection probabilities of the events.
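
The tree diagram can be mimicked in code by multiplying along each branch. The Python sketch below is a minimal version of that idea; the function name and the input probabilities are hypothetical and are not taken from any exercise.

```python
from fractions import Fraction

def tree_intersections(p_a, p_b_given_a, p_b_given_not_a):
    """Multiply along each branch of a two-level tree (multiplication rule)."""
    p_not_a = 1 - p_a
    return {
        ("A", "B"): p_a * p_b_given_a,
        ("A", "not B"): p_a * (1 - p_b_given_a),
        ("not A", "B"): p_not_a * p_b_given_not_a,
        ("not A", "not B"): p_not_a * (1 - p_b_given_not_a),
    }

# Hypothetical inputs: P(A) = 0.3, P(B | A) = 0.5, P(B | not A) = 0.2.
leaves = tree_intersections(Fraction(3, 10), Fraction(1, 2), Fraction(1, 5))

# The four intersections partition the sample space, so they sum to one.
total = sum(leaves.values())

# Intersection rule: P(B) is the sum of the two leaves that involve B.
p_b = leaves[("A", "B")] + leaves[("not A", "B")]
```

The four leaf values are exactly the cells of the corresponding contingency table, which is why a tree and a table always lead to the same answers.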

Exercise 4.11
In a manufacturing plant machines A and B are used equally often to manufacture a mechanical part. The
probability that machine A produces a defective part is 0.1, while machine B has a 40% chance of producing
a defective part.

1) List all the given events and probabilities in notation.

2) Construct a tree diagram for this example.

3) Construct a contingency table for this example.

4) What is the probability that a randomly selected part was manufactured by machine A and it was
good?

5) What proportion of parts is not defective?

6) A randomly selected part is tested and found to be defective. What is the probability that it was
produced by machine B?

Exercise 4.12
A file contains four unpaid and ten paid accounts and two accounts are randomly chosen without
replacement.

1) Construct a tree diagram for this example.

2) What is the probability that both accounts are paid?

3) What is the probability of having at least one paid account?

Bayes’ Theorem
For decision making we could find a probability of an event by using either the relative frequency of the
occurrence (empirical approach) or subjectivity. Sometimes new information emerges which may impact on
our original calculation. In such cases we would like to revise our probability. For example, a manager
gauges from an interview that a candidate applying for a sales position has a low probability of being
successful at the company. However, if the candidate actually scores very well on the company’s skills
assessment tests, the manager may revise his / her original assessment.

Bayes’ Theorem was developed by Reverend Thomas Bayes in the eighteenth century to revise probability
calculations in light of new information. This theorem is a special application of conditional probabilities.

For two events A and B the conditional probability of A given B is expressed in terms of probabilities
conditioned on event A and its complement:
• $P(A \mid B) = \frac{P(B \mid A) P(A)}{P(B \mid A) P(A) + P(B \mid \bar{A}) P(\bar{A})}$

This formula can be extended to any number of events. For example, consider the three mutually exclusive
events $A_1$, $A_2$ and $A_3$ such that $A_1 \cup A_2 \cup A_3 = S$, and a further event B. Then the probability of
event $A_1$ given event B is:
• $P(A_1 \mid B) = \frac{P(B \mid A_1) P(A_1)}{P(B \mid A_1) P(A_1) + P(B \mid A_2) P(A_2) + P(B \mid A_3) P(A_3)}$
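
Because the numerator of Bayes’ Theorem is always one of the terms in the denominator, the formula is easy to compute for any number of events. The following Python sketch shows this under the assumption that the events partition the sample space; the function name and all input values are hypothetical.

```python
from fractions import Fraction

def bayes_posteriors(priors, likelihoods):
    """Return P(A_i | B) for events A_i that partition the sample space.

    priors[i] is P(A_i) and likelihoods[i] is P(B | A_i).
    """
    if abs(sum(priors) - 1) > 1e-9:
        raise ValueError("priors must sum to 1 for a partition")
    joints = [p * l for p, l in zip(priors, likelihoods)]   # P(B | A_i) P(A_i)
    p_b = sum(joints)                                       # denominator, P(B)
    return [j / p_b for j in joints]

# Hypothetical three-event illustration.
priors = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
likelihoods = [Fraction(1, 10), Fraction(1, 5), Fraction(3, 10)]
posteriors = bayes_posteriors(priors, likelihoods)
```

The revised probabilities always sum to one, which is a useful check when working a problem by hand.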

Exercise 4.13
Refer to Exercise 4.11(6) and use Bayes’ Theorem to calculate the probability that a part, which
we know is defective, was made by machine B. We therefore need to calculate $P(B \mid D)$ using conditional
information that is in terms of which machine was used, i.e. $P(D \mid A)$ and $P(D \mid B)$.

Exercise 4.14
A company has found that 85% of people selected for its trainee program completed the course. Of these,
60% became productive salespeople, compared to only 10% of those trainees who did not complete the
trainee program. If a salesperson that entered the trainee program became productive, what is the probability
that this person completed the program?

1) List all the given events and probabilities in notation.

2) Draw a tree diagram to find this probability.

3) Use Bayes’ Theorem to find this probability.

4) Construct a contingency table to find this probability.

5) Compare the results and comment on your findings.

Statistical independence
Consider the example of tossing a fair coin and then rolling a fair die. The outcome on the coin does not
influence the outcome on the die, in other words the outcomes on the coin and the die are independent of
each other. This notion is referred to as statistical independence.

Two events A and B are statistically independent if and only if the probability of the intersection between A
and B is equal to the product of the two marginal probabilities.
• $P(A \cap B) = P(A) P(B)$

Furthermore, if A and B are independent events then conditioning on either event has no impact. In other
words, the information that B happened does not change the probability of A.
• $P(A \mid B) = \frac{P(A \cap B)}{P(B)} = P(A)$, provided that $P(B) > 0$
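
The product test for independence can be checked by enumeration for the coin-and-die example, and contrasted with the empirical data of Table 4.1. The sketch below is illustrative only; the helper `P` and the event names are assumptions.

```python
from fractions import Fraction
from itertools import product

# Sample space for tossing a fair coin and then rolling a fair die.
S = list(product("HT", range(1, 7)))   # 12 equally likely outcomes

def P(event):
    """Classical probability of an event given as a true/false test."""
    return Fraction(sum(1 for outcome in S if event(outcome)), len(S))

def head(outcome):
    return outcome[0] == "H"

def six(outcome):
    return outcome[1] == 6

# Independent: the joint probability equals the product of the marginals.
coin_die_independent = P(lambda o: head(o) and six(o)) == P(head) * P(six)

# Table 4.1: P(A and B) = 0.2, while P(A) P(B) = 0.25 * 0.3 = 0.075.
tv_independent = Fraction(200, 1000) == Fraction(250, 1000) * Fraction(300, 1000)
```

The second check fails, which suggests that planning to purchase and actually purchasing are not independent in that survey.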

Exercise 4.15
A BCom Accounting graduate intends to write his final board exam in a few months. He estimates that his
chance of passing the exam is 60%, owing to work responsibilities and time constraints. If he fails the exam he
is allowed to rewrite it until he passes. If it can be assumed that the attempts at passing the exam are
independent of one another, what is the probability that he will pass on the fourth try?

4.6 Counting Rules

In finite probability theory we need to know the number of outcomes there would be for a particular event,
as well as the total number of outcomes in the sample space in order to calculate a probability. In some a
priori experiments it is relatively easy to identify and list the entire sample space. However, very often an
experiment has so many outcomes that they are difficult to count without the assistance of
counting rules.

Counting Rule #1:
If an experiment consists of n independent and identical trials and each trial has k possible outcomes, then
the total number of possible outcomes is equal to $k^n$.

Counting Rule #2:
If an experiment consists of n independent trials and each trial has its own set of $k_i$ possible outcomes for
$i = 1, 2, \ldots, n$, then the total number of possible outcomes is equal to $k_1 \times k_2 \times \cdots \times k_n$.

Counting Rule #3:
The total number of different ways that n unique items can be arranged is “n-factorial”, expressed
mathematically as $n! = n \times (n-1) \times (n-2) \times \cdots \times 3 \times 2 \times 1$.

Counting Rule #4:
If n items must be arranged, but some of the items are identical, we need to adjust counting rule #3 to
account for duplicates. Then the total number of different ways that n items can be arranged, of which $n_1$ are
alike, $n_2$ are alike, ..., $n_r$ are alike, is $\frac{n!}{n_1! \times n_2! \times \cdots \times n_r!}$.

Counting Rule #5:
A permutation counts the total number of ways of arranging a subset of r items out of n unique items, $r \le n$.
In other words, we choose r of the n items and place them, or line them up, in a particular order.
Permutations are calculated using factorial. The general formula for “n permutation r” is ${}_nP_r = \frac{n!}{(n-r)!}$.

Counting Rule #6:
A combination counts the total number of ways of selecting a subset of r items out of n unique items, $r \le n$.
In other words, we choose r of the n items and list them, but the order in which they are listed is not
important. As with permutations, combinations are calculated using factorial. The general formula for “n
combination r” is ${}_nC_r = \binom{n}{r} = \frac{n!}{r!\,(n-r)!}$.

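The six counting rules map onto a handful of functions in Python's standard `math` module (`math.perm` and `math.comb` require Python 3.8 or later). The sketch below uses small hypothetical inputs that are not taken from the exercises; the helper `arrangements_with_repeats` is an assumed name for Rule #4.

```python
import math
from collections import Counter

def arrangements_with_repeats(items):
    """Rule #4: n! divided by the factorial of each group of identical items."""
    result = math.factorial(len(items))
    for group_size in Counter(items).values():
        result //= math.factorial(group_size)
    return result

rule1 = 2 ** 3                               # Rule #1: 3 trials, 2 outcomes each
rule2 = 2 * 6 * 4                            # Rule #2: trials with 2, 6 and 4 outcomes
rule3 = math.factorial(4)                    # Rule #3: arrange 4 unique items
rule4 = arrangements_with_repeats("LEVEL")   # Rule #4: 5! / (2! 2! 1!)
rule5 = math.perm(5, 2)                      # Rule #5: ordered choice of 2 from 5
rule6 = math.comb(5, 2)                      # Rule #6: unordered choice of 2 from 5
```

Rules #5 and #6 differ only by the factor r!, since each unordered selection of r items can be lined up in r! different orders.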
Exercise 4.16
We will now look at simple examples of each of the six counting rules above. For each of these experiments,
list the total number of outcomes.

1) Rule #1.
o A fair coin is tossed five times

o An ATM pin number consists of any four digits

2) Rule #2.
o A fair coin is tossed once, followed by one roll of a fair six-sided die

o An auditor wishes to select one account for auditing from each of four sets of accounts
containing four, three, five and four accounts respectively

3) Rule #3.
o How many different arrangements are there of the letters A, B and C?

o In how many different ways can the four Business Statistics lecturers be allocated to the four
different diagonals?

4) Rule #4.
o How many different letter arrangements can be formed using all the letters in the word
PEPPER?

o How many “words” can be created from all the letters in STATISTICIAN?

5) Rule #5.
o How many different three letter “words” can be created from our alphabet?

o The trifecta at the Durban July horse race consists of correctly picking the first three horses in
the sixth race. How many possible trifecta outcomes are there if the sixth race is run by 16
horses?

6) Rule #6.
o How many subsets of two elements can be taken from the set {1, 2, 3, 4}?

o Twelve students applied to be tutors for a specific diagonal, but the lecturer only requires
five. How many different ways are there to randomly select five of the twelve students?

Exercise 4.17

1) One percent of a population has a particular disease. A test is developed to detect the presence of the
disease. Any medical test could result in an incorrect diagnosis. For example, a person could be
healthy but the test indicates they have the disease (false-positive), or they could have the disease but
the test shows they do not (false-negative). This new test gives a false-positive reading 3% of the
time and a false-negative reading 2% of the time. If a person tests positive for the disease, what is the
probability that he / she has the disease?

2) Consider two independent events: C (drink coffee) and D (have a dog). If $P(C) = 0.4$ and
$P(C \cup D) = 0.5$, what is $P(D)$?

3) The Gauteng number plates consist of two letters, followed by a three-digit number, then another two
letters, and the letters “GP”. How many different number plates can be created using this system?

4) A student bought five different textbooks for her first-year courses. In how many ways can she
arrange these books on her bookshelf?

5) A small company with fifteen employees wants to set up a committee of three people to organise a
certain company function. Each of the three people selected will be required to fulfil a specific role
within the committee. If all fifteen employees have an equal chance of being selected, how many
different committees are possible?

6) At a management consulting firm, a team consisting of five consultants must be selected from a
group of seven male and three female consultants to work on a new project. One of the male
consultants, John, has essential experience required for this project and management decided that he
should be guaranteed a place in the team. How many different teams can be formed if John is
definitely chosen and at least one female is chosen for the team?

Chapter 5 DISCRETE AND CONTINUOUS DISTRIBUTIONS

5.1 Introduction

When we collect data using sample information we summarise the distribution of a variable with descriptive
statistics. As a sample is only a subset of the population the exact population distribution is not known,
although it exists. In some empirical cases the population distribution can be established through repeated
estimation from historical data. However, certain random experiments can be completely defined using
classical a priori probabilities, allowing us to construct the population distribution of a particular variable.
Some of these experiments occur often and the probability distributions have been expressed in a general
form. In this course we will cover the Binomial, Poisson, Exponential and Normal distributions.

The objectives of this chapter include:

• Understand what a probability distribution is and the properties it possesses
• Identify and distinguish between discrete and continuous distributions
• Identify special discrete and continuous probability distributions
• Calculate the probabilities and population parameters from probability distributions

5.2 Chapter Formulae


DISCRETE RANDOM VARIABLES

Mean: $\mu = E(X) = \sum x \, p(x)$

Variance: $\operatorname{Var}(X) = E(X^2) - \mu^2$, where $E(X^2) = \sum x^2 p(x)$

PROPERTIES OF SELECTED PROBABILITY DISTRIBUTIONS


Distribution | Function | Range of X | Parameter(s) | Expected value | Variance
Binomial | $p(x) = \binom{n}{x} p^x q^{n-x}$, where $q = 1 - p$ | x = 0, 1, …, n | n, p | np | npq
Poisson | $p(x) = \dfrac{e^{-\lambda} \lambda^x}{x!}$ | x = 0, 1, 2, … | λ | λ | λ
Exponential | $F(x) = 1 - e^{-\lambda x}$ | x ≥ 0 | λ | 1/λ | (1/λ)²
Normal | Standard normal tables | −∞ < x < ∞ | μ, σ² | μ | σ²

5.3 Random Variables

A random variable is a function that assumes a value as a result of an experiment. The function relates a
unique numerical value to every outcome of the experiment. As the outcome of any execution of the
experiment is not known in advance, the value of the random variable will vary from trial to trial. There are
two types of random variables, discrete and continuous. A discrete random variable is a numerical variable
that can take on discrete values, which can be found using counting. A continuous random variable is a
numerical variable that can take on any numerical value that can be found by measuring.

Exercise 5.1
Determine whether the following random variables are discrete or continuous:

Description Type
The number of heads in 3 tosses of a coin
Time taken to toss a coin 3 times
Number of children in a household
Height of a randomly chosen person
Value of money carried by a randomly chosen person

Any random variable has a probability distribution in a population context, which is the collection of all
possible values of the variable and the corresponding probabilities of occurrence. These probabilities act as
weights assigned to the outcomes of the variable: the more likely an outcome, the larger its weight. For a
discrete random variable X the probability distribution
of X is referred to as the probability mass function (p.m.f.). The probability distribution of a continuous
random variable X is called the probability density function (p.d.f.).

With sample data we can calculate various measures of central tendency and variability, producing sample
statistics. The probability distribution for any random variable represents a population of interest. We are
also able to calculate the population parameters from these distributions.

The expectation of a random variable indicates the centre of the distribution, namely the population mean μ.
It is denoted by E(X) and interpreted as “the expected value of X”, or “the mean or average value of X”. The
population variance (σ2) and population standard deviation (σ) are measures used to describe the spread of a
random variable. Numerous other population parameters can be obtained from probability distributions, such
as the mode (value with the largest probability), median (value that divides the total probability in half),
percentiles, IQR, etc.
5.4 Discrete Random Variables

Probability mass function


The probability mass function is the set of all possible outcomes (x) and their corresponding probabilities for
a discrete random variable. The probability that a random variable X takes on a specific value x is given by:
P(X = x) = p(x)

The two definitive properties of a p.m.f. are:


1) Each probability is between 0 and 1 (inclusive), namely 0 ≤ p(x) ≤ 1
2) All the probabilities sum to 1, namely $\sum_{\text{all } x} p(x) = 1$

Both of these properties must hold for a function to be classified as a p.m.f.

The expected value and variance of a discrete random variable are calculated as follows:
 Expected value:
$E(X) = \mu = \sum_{\text{all } x} x \, p(x)$

 Variance:
$\operatorname{Var}(X) = \sigma^2 = E(X^2) - [E(X)]^2 = E(X^2) - \mu^2 = \sum_{\text{all } x} x^2 p(x) - \left[ \sum_{\text{all } x} x \, p(x) \right]^2$
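The two p.m.f. properties and the formulae for the expected value and variance can be sketched in a few lines of Python. This is only an illustration: the function names and the fair-die example are assumptions introduced here, not part of the course material, and exact fractions are used so that the sum-to-one check is not affected by rounding.

```python
from fractions import Fraction

def is_pmf(pmf):
    """Check the two defining properties of a p.m.f. given as {x: p(x)}."""
    each_in_range = all(0 <= p <= 1 for p in pmf.values())
    sums_to_one = sum(pmf.values()) == 1
    return each_in_range and sums_to_one

def expected_value(pmf):
    """E(X) = sum of x * p(x) over all x."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Var(X) = E(X^2) - [E(X)]^2."""
    e_x2 = sum(x**2 * p for x, p in pmf.items())
    return e_x2 - expected_value(pmf) ** 2

# Hypothetical example: the score on one roll of a fair six-sided die.
die = {x: Fraction(1, 6) for x in range(1, 7)}

print(is_pmf(die))          # True
print(expected_value(die))  # 7/2
print(variance(die))        # 35/12
```

The same three functions can be applied to any p.m.f. written as a dictionary of values and probabilities.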

Exercise 5.2
Consider an experiment where a fair coin is tossed three times. The sample space of this experiment can be
completely determined because of this particular experimental design.
S = {TTT, TTH, THT, HTT, HHT, HTH, THH, HHH}

Let the random variable X denote the number of tails in three tosses of this coin. Therefore X is a discrete
random variable that can take on the values x = 0, 1, 2, 3:
 X = 0, if the set {HHH} occurs, namely 1 element of S
 X = 1, if the set {HHT, HTH, THH} occurs, namely 3 elements of S
 X = 2, if the set {TTH, THT, HTT} occurs, namely 3 elements of S
 X = 3, if the set {TTT} occurs, namely 1 element of S

We can now construct a p.m.f. for the number of tails in three tosses of a fair coin. Note that both properties
of a p.m.f. are satisfied.

x       p(x)
0       1/8
1       3/8
2       3/8
3       1/8
TOTAL   1

Use the p.m.f. to calculate the following probabilities and parameters:


1) P(no tails) =

2) P(at least 2 tails) =

3) P(at most 3 tails) =

4) P(1 or 2 tails) =

5) E(X) =

6) Var(X) =

It is important to note that this p.m.f. reflects the population probability distribution of the experiment, and
is not the result of an empirical experiment. If we repeat this experiment 100 times, we will get an
empirical distribution of X, which may differ slightly from the p.m.f. above. However, as the number of
repetitions grows without bound (n → ∞), the empirical distribution converges to the theoretical p.m.f.
In some cases we do not know what the experiment was that generated a given p.m.f. For such situations we
can still use the distribution to calculate probabilities and population parameters.
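
The difference between an empirical distribution and the theoretical p.m.f. can be explored with a small simulation. The sketch below is an assumption-laden illustration: the function name, the seed and the numbers of repetitions are arbitrary choices. It repeats the three-toss experiment and reports the largest gap between the relative frequencies and the theoretical probabilities 1/8, 3/8, 3/8, 1/8.

```python
import random
from collections import Counter

def empirical_pmf(n_repeats, seed=0):
    """Relative frequencies of the number of tails in three fair-coin tosses."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) for reproducibility
    counts = Counter(
        sum(rng.random() < 0.5 for _ in range(3))  # each toss is a tail with probability 0.5
        for _ in range(n_repeats)
    )
    return {x: counts[x] / n_repeats for x in range(4)}

theoretical = {0: 1 / 8, 1: 3 / 8, 2: 3 / 8, 3: 1 / 8}

for n in (100, 10_000, 200_000):
    emp = empirical_pmf(n)
    largest_gap = max(abs(emp[x] - theoretical[x]) for x in range(4))
    print(n, round(largest_gap, 4))
```

The gap typically shrinks as the number of repetitions grows, although it need not decrease at every step.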

Exercise 5.3
Consider the following p.m.f. and answer the questions below:

x p(x) Cumulative probability


−1 0.2
0 0.1
3 0.3
4 0.4
TOTAL 1

1) P(X < 1) =

2) P(X ≥ 3) =

3) P(X ≤ 2) =

4) P(X ≥ 1) =

5) P(0 < X ≤ 5) =

6) Mode(s) =

7) Median =

8) E(X) =

9) Var(X) =

10) Standard deviation =

Exercise 5.4
Verify that the following function is a p.m.f.

$p(x) = \begin{cases} \dfrac{2x - 1}{16} & x = 1, 2, 3, 4 \\ 0 & \text{otherwise} \end{cases}$

Check property (1):

p(1) =

p(2) =

p(3) =

p(4) =

Check property (2):

$\sum_{\text{all } x} p(x) =$

Exercise 5.5
Find the value k such that the following function p(x) is a p.m.f.
$p(x) = \begin{cases} kx^2 & x = -1, 1, 2 \\ 0 & \text{otherwise} \end{cases}$

Exercise 5.6
Consider an investment of R1000 in an oil drilling venture. If oil is struck the venture will yield a gross
profit of R10000. If oil is not struck the investor will lose the R1000 investment. If there is a 10% chance of
striking oil, what is the expected profit (net gain) from investing in the venture? Hint: first construct a p.m.f.
of net gain.

Binomial distribution
A Binomial random variable arises from an experiment that consists of a number of identical
sub-experiments called Bernoulli trials. A Bernoulli trial has only two possible outcomes, a success or a failure.
The Binomial variable counts the number of successes out of n trials. It is a discrete random variable as it
can take on any value 0, 1, ..., n. This distribution is characterised by two parameters, namely the number of
trials (n) and the probability of a success (p), and the p.m.f. has been expressed in a general form. If variable
X is a Binomial random variable with n trials and a probability of success p, it is denoted as: X ~ B(n, p).

An experiment that counts the number of successes in n trials is a Binomial random variable if all of the
following properties are satisfied:
 The experiment consists of n identical trials
 Each trial consists of two possible outcomes: success (S) or failure (F)
o Note: a success is not necessarily a positive event, it refers to the focus of the distribution
 The probability of a success is the same for all trials, namely p
 The n trials are independent

Formulae for X ~ B(n, p):
 $P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}$, for x = 0, 1, …, n
 E(X) = np
 Var(X) = np(1 – p)
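
As a rough check on these formulae, the Binomial p.m.f. can be evaluated directly and its mean and variance compared with np and np(1 − p). The function name and the coin-tossing parameters below are illustrative assumptions, not values taken from the exercises.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Hypothetical example: number of heads in 10 tosses of a fair coin.
n, p = 10, 0.5
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

total = sum(pmf)
mean = sum(x * px for x, px in enumerate(pmf))
var = sum(x**2 * px for x, px in enumerate(pmf)) - mean**2

print(round(total, 10))                  # 1.0
print(round(mean, 10), n * p)            # 5.0 5.0
print(round(var, 10), n * p * (1 - p))   # 2.5 2.5
```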

Exercise 5.7
Let X be the number of sixes counted in four rolls of a fair die.
1) Determine the distribution and parameters of X.

2) E(X) =

3) Var(X) =

4) P(X ≤ 2) =

Exercise 5.8
A multiple choice test consists of eight questions, each of which has four possible answers and only one is
correct. In order to pass a student has to answer at least four questions correctly. Assume that the student
guesses each answer. Let X be the number of questions answered correctly.
1) Determine the distribution and parameters of X.

2) E(X) =

3) Var(X) =
4) P(X = 3) =

5) P(student will pass the test) =

Exercise 5.9
It has been established that 20% of people respond to a certain mailed advertisement. A total of ten such
advertisements have been mailed. Let X be the number of these advertisements that receive a response.
1) Determine the distribution and parameters of X.

2) What is the probability that people responded to only two advertisements?

3) What is the probability that people responded to at least one advertisement?

Exercise 5.10
Consider the following Madam & Eve cartoon. Is Madam being fair? Can we show this statistically?

If Madam is being fair, what is Eve’s chance of guessing correctly at least once in seven years? Assume that
Eve cannot remember what she guesses from one year to another and therefore each number has the same
chance of being chosen in any year.

The fact that the chance of guessing incorrectly for seven years is quite small casts doubt on the assumption
that p = 0.2. Eve may not be guessing randomly, or she may favour numbers that Madam dislikes, or Madam
may not be playing fair.
Poisson distribution
The Binomial random variable counts the number of successes out of n trials. Therefore the total number in
the set (n) is known. Some experiments are such that this upper limit is not known, for example the number
of customers arriving at a store per day. In this case there is no limit to the total number of customers that
could arrive at the store. As a store owner it would be useful to calculate probabilities around customers’
arrivals to ensure that there are enough staff in the store to assist them.

If we can assume that the customers arrive at the store at a constant rate throughout the day, and their
arrivals are independent of one another, a very specific distribution occurs, namely the Poisson distribution.
This is a discrete random variable that can take on any value 0, 1, 2, ..., and counts the number of
occurrences within a given period. In general this distribution counts spontaneous phenomena that occur in
time. Other applications include:
 The number of telephone calls arriving at a switchboard per hour
 The number of insurance claims per month
 The number of potholes in a 5km stretch of road
 Etc.

The Poisson distribution is characterised by the rate of occurrence, lambda (λ), and the p.m.f. has been
expressed in a general form. As the Poisson distribution assumes a constant rate of occurrence, the rate for a
different time period can be calculated directly from the given λ. For example, if λ = 2 per hour it follows that
λ = 1 per half hour, λ = 10 every five hours, etc. If X is a Poisson random variable with rate λ for a given time
period, it is denoted as: X ~ P(λ).

Formulae for X ~ P(λ):
 $P(X = x) = \dfrac{e^{-\lambda} \lambda^x}{x!}$, for x = 0, 1, 2, …
 E(X) = λ
 Var(X) = λ

The Binomial and Poisson distributions are related such that, under certain conditions, the Binomial
distribution can be approximated using the Poisson distribution. The approximation is only effective if the total
number of trials (n) of a Binomial random variable is very large and the probability of success (p) is very small.
We can then use the parameters of the Binomial random variable to find the parameter of the Poisson random
variable, and thus approximate probabilities using the latter distribution. If X ~ B(n, p), with n → ∞ and p → 0,
then X can be approximated as X* ~ P(λ), where λ = np.
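
The quality of this approximation can be inspected numerically. The sketch below assumes hypothetical values n = 2000 and p = 0.001, chosen only to satisfy "large n, small p", and prints the Binomial and Poisson probabilities side by side for the first few values of x.

```python
from math import comb, exp, factorial

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def poisson_pmf(x, lam):
    """P(X* = x) for X* ~ P(lam)."""
    return exp(-lam) * lam**x / factorial(x)

# Hypothetical values: many trials, small probability of success.
n, p = 2000, 0.001
lam = n * p  # parameter of the approximating Poisson distribution

for x in range(5):
    print(x, round(binomial_pmf(x, n, p), 4), round(poisson_pmf(x, lam), 4))
```

Repeating the comparison with a small n or a large p (for example n = 10, p = 0.5) shows how quickly the two distributions drift apart when the conditions are not met.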

Exercise 5.11
At a department store, customers arrive at a rate of twelve per hour. Let X = the number of customers
arriving in an hour.
1) Determine the distribution of X and its parameter.

2) What is the probability that only two people arrive at the store in any given hour?

3) Calculate the probability that more than two people arrive in an hour.

4) What is the probability that fewer than three people arrive at the store in any given ten-minute
period?

Exercise 5.12
On average a typist makes three typing errors every two pages. Assume that the number of typing errors has
a Poisson distribution.
1) Define the random variable X.

2) E(X) =

3) Var(X) =

4) P(X ≥ 2) =

5) What is the probability that a three-page document she typed has no errors?

6) Find the mean and standard deviation of the number of errors in a three-page document.

Exercise 5.13
The average number of motorbike accident fatalities in a year is 10 per 50000 population. Let X be the
number of motorbike accident fatalities that occur in a year for a population of 10000 people.
1) Define the distribution of X and its parameters.

2) P(no fatalities) =

3) P(more than two fatalities) =

4) Use the Poisson distribution to approximate X with X* ~ P(λ) and find the value of lambda.

5) P(X* = 0) =

6) P(X* > 2) =

5.5 Continuous Random Variables

Probability density function


Measurements made on a continuous scale are best represented as grouped continuous data, where the
values of the variable are grouped into class intervals. The frequency distribution can then be shown
graphically as a histogram. Consider the following histogram of the ages of people in a certain population.
The shaded bar denotes the interval [15, 20) and the size of this area relative to the total area of the
histogram reflects the proportion of the population aged 15 to 19. If X measures the age of a randomly
chosen person from this population, we can express this in probability notation as follows:
$P(15 \le X < 20) = \dfrac{\text{number of people aged 15 to 19 years}}{\text{number in population}}$
= proportion of population aged 15 to 19
= relative frequency

[Figure: histogram of ages; horizontal axis from 0 to 55 in steps of 5, vertical axis from 0 to 0.4]

Since the set of possible values of a continuous random variable is represented as intervals we can make the
classes arbitrarily small. For example, we can measure age in days or months rather than completed years.
This leads to a smoothed curve showing the shape of the distribution instead of the rectangular bars of the
histogram. This curve is called the probability density function (p.d.f.) of a random variable X and is
generated using a mathematical function such that the total area under the curve is equal to one. Therefore
any area or portion of this curve represents the probability that an event (interval) occurred.

[Figure: smoothed density curve of ages; horizontal axis from 0 to 60 in steps of 5, vertical axis from 0 to 0.4]

Every continuous random variable has a p.d.f. which can be plotted as a smooth curve. Consider the
following function f(x) and its corresponding plot in Figure 5.1(left).
$f(x) = 1 - \dfrac{x}{2}$ for 0 ≤ x ≤ 2

[Figure 5.1: Probability distribution of f(x) (left); selected areas under the curve (right). Both panels plot f(x), from 0 to 1, against x, from 0 to 2.]

The total area under f(x) can easily be calculated from the formula for the area of a triangle:
$\text{Area} = \tfrac{1}{2} b h = \tfrac{1}{2} (2)(1) = 1$

As such the function f(x) is a p.d.f. and any area under this curve can be calculated, provided that the area is
given as an interval. In Figure 5.1(right) a number of different areas are illustrated, namely: the area less
than 0.5, the area between 0.5 and 1, and the area greater than 1. If we were to consider an exact value of the
random variable X and calculate the probability that the variable is exactly equal to a specific value, for
example P(X = 0.5), this requires us to calculate the area of the line directly above 0.5 in the graph. A line
has no width, so this area is zero.

It therefore follows that P(X = x) = 0 for all possible values of x. This implies that equality in probability
notation for continuous random variables is effectively redundant:
P(a ≤ X ≤ b) = P(a ≤ X < b) + P(X = b)
= P(a ≤ X < b)
= P(a < X ≤ b)
= P(a < X < b)

As with discrete random variables it is possible to derive and calculate population parameters from the p.d.f.
of any continuous random variable. However, these derivations are beyond the scope of this course.

Cumulative distribution function
Unlike discrete random variables we cannot calculate probabilities for continuous random variables directly
from the probability distributions, as this process involves integrating over an area of interest for a given
p.d.f. The integral of a function f (x) yields a specific mathematical expression termed the cumulative
distribution function (c.d.f.). This function represents the area under a curve between −∞ and x and allows us
to calculate probabilities for continuous random variables. It is beyond the scope of this course to derive the
formula of a c.d.f. from a given p.d.f.

Let X = a continuous random variable. Then:


 f(x) = probability density function of X
o Derivative of F(x)
o Function used to generate probability density function
 F(x) = P(X ≤ x) = cumulative distribution function of X
o Integral of f(x)
o Function used to calculate probabilities
o Note: always write required probabilities in the form P(X ≤ x)
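
The relationship between f(x) and F(x) can be checked numerically without doing any calculus by hand. The sketch below takes the triangular density f(x) = 1 − x/2 on [0, 2] from Figure 5.1 and assumes its integral is F(x) = x − x²/4. The midpoint-rule helper and the interval (1.2, 1.5) are illustrative choices, not part of the course material.

```python
def f(x):
    """p.d.f. from the text: f(x) = 1 - x/2 on [0, 2], zero elsewhere."""
    return 1 - x / 2 if 0 <= x <= 2 else 0.0

def F(x):
    """c.d.f. assumed for this density: F(x) = x - x^2/4 on [0, 2]."""
    if x < 0:
        return 0.0
    if x > 2:
        return 1.0
    return x - x**2 / 4

def area_under_f(a, b, steps=10_000):
    """Midpoint-rule approximation to the area under f between a and b."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width

# Illustrative interval (chosen arbitrarily): P(1.2 < X < 1.5)
print(round(area_under_f(1.2, 1.5), 6))  # area found by numerical integration
print(round(F(1.5) - F(1.2), 6))         # the same area from the c.d.f.
print(round(area_under_f(0, 2), 6))      # total area under the curve
```

Subtracting two values of F(x) gives the same area as integrating f(x) over the interval, which is why probabilities are written in the form P(X ≤ x).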

Exercise 5.14
Consider the function
$f(x) = 1 - \dfrac{x}{2}$ for 0 ≤ x ≤ 2

If we integrate this function over the given region it yields:

$F(x) = P(X \le x) = x - \dfrac{x^2}{4}$ for 0 ≤ x ≤ 2

Use F(x) above to calculate the following probabilities:


1) F(1/2) =

2) F(1) =

3) P(1/2 < X < 1) =

Exercise 5.15
Consider the following function and answer the questions below.
0 x0

F ( x)   x3 0  x  1
1 x  1

1) P(1/4 < X < 3/4) =

1
2) P  X  
 2

3) P  X  3 

Exercise 5.16
Let T be the time taken by any shop assistant to complete a half-day stock take. Areas under the density
function of T can be calculated using the following formula:
$F(t) = \begin{cases} 0 & t < 0 \\ \left( \dfrac{t}{6} \right)^6 & 0 \le t \le 6 \\ 1 & t > 6 \end{cases}$

1) What proportion of shop assistants take more than 5 hours to complete the task?

2) What is the chance that a randomly selected shop assistant will take between 4 and 5 hours?

Exponential distribution
An Exponential random variable measures the waiting time until the first occurrence of an event or between
successive occurrences of an event. Examples of these include:
 The waiting time to the arrival of the first customer in a shop
 The waiting time to the first phone call at a call centre
 The waiting time between successive breakdowns of a truck
 The waiting time between two successive lightning strikes
 Etc.

There is a relation between the Poisson and Exponential distributions, namely:


 If the number of occurrences of an event in a time period follows a Poisson distribution, then the
waiting time follows an Exponential distribution
 If the waiting time follows an Exponential distribution, then the number of occurrences of an event
in a time period follows a Poisson distribution

As we are measuring waiting time, it follows that the Exponential distribution is a continuous random
variable. Both the p.d.f. and the c.d.f. have been expressed in a general form. In order to calculate
probabilities we will use the expression for the c.d.f., where:
 P(X ≤ x) = the probability that the waiting time to the first occurrence or between occurrences is at
most x units of time
 P(X > x) = the probability that the waiting time to the first occurrence or between occurrences
exceeds x units of time

The Exponential distribution is characterised by the average number of occurrences in a single time unit,
namely lambda (λ). For example, if X measures the time in minutes until the first event or between events,
then X ~ Exp(λ) where λ is the average rate of occurrence in one minute.

Formulae for X ~ Exp(λ):
 $f(x) = \lambda e^{-\lambda x}$, for x ≥ 0
 $F(x) = P(X \le x) = 1 - e^{-\lambda x}$
 $P(X > x) = e^{-\lambda x}$, using the complement rule
 $E(X) = \dfrac{1}{\lambda}$
 $\operatorname{Var}(X) = \left( \dfrac{1}{\lambda} \right)^2$
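
The Exponential formulae, and their link to the Poisson distribution, can be sketched as follows. The rate of 0.5 occurrences per minute and the function names are assumptions made for illustration only. The key observation is that waiting longer than x time units for the first occurrence is the same event as observing zero occurrences in a period of length x, for which the Poisson rate is λx.

```python
from math import exp

def exp_cdf(x, lam):
    """F(x) = P(X <= x) = 1 - e^(-lam * x) for x >= 0."""
    return 1 - exp(-lam * x) if x >= 0 else 0.0

def exp_survival(x, lam):
    """P(X > x), obtained from the c.d.f. by the complement rule."""
    return 1 - exp_cdf(x, lam)

def poisson_prob_zero(lam_for_period):
    """P(no occurrences) for a Poisson count with the given rate for the period."""
    return exp(-lam_for_period)

# Hypothetical rate: 0.5 occurrences per minute.
lam = 0.5
x = 3  # minutes

print(round(exp_cdf(x, lam), 4))             # P(wait at most 3 minutes)
print(round(exp_survival(x, lam), 4))        # P(wait more than 3 minutes)
print(round(poisson_prob_zero(lam * x), 4))  # P(no occurrences in 3 minutes)
print(1 / lam, (1 / lam) ** 2)               # E(X) and Var(X): 2.0 4.0
```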

Exercise 5.17
At a department store, customers arrive at the rate of twelve per hour according to a Poisson process.
Therefore X ~ P(12) per hour. Let Y = the waiting time in minutes until the first customer arrives or between
two customers’ arrivals.

1) Determine the distribution of Y and its parameter.

2) E(Y) =

3) Var(Y) =

4) What is the probability that the first customer arrives within ten minutes of opening the store?

5) What is the probability that the first customer arrives between the fourth and tenth minute of opening
the store?

6) If a customer just left the store, what is the probability that the next customer will arrive after five
minutes?

Exercise 5.18
Flash thunderstorms occur during a particular season at the average rate of 1.5 per week.

1) Determine the distribution of the number of thunderstorms during a week (X).

2) E(X) =

3) Var(X) =

4) Determine the distribution of the waiting times in days until the first thunderstorm or between
successive thunderstorms (Y).

5) E(Y) =

6) Var(Y) =

7) Standard deviation (Y) =

8) What is the probability that the first thunderstorm occurs within five days?

9) What is the probability that more than three days elapse between successive thunderstorms?

10) What is the probability that there are exactly two days between successive thunderstorms?

11) What is the probability that at least one thunderstorm occurs in a week?

12) What is the probability that three thunderstorms occur in a two-week period?

Normal distribution
The normal distribution, also referred to as the Gaussian distribution, is the most commonly used continuous
distribution in statistics. It is characterised by only two parameters, namely its mean (μ) and its variance (σ²),
and is denoted as X ~ N(μ, σ²). For example, if X denotes the IQ scores of a population and
X ~ N(100, 144), it means that IQ is normally distributed with a mean of 100 and a variance of 144 (and a
standard deviation of 12).

The normal distribution has several important properties:


 It is bell-shaped
 The total area under the curve is equal to one
 It is perfectly symmetric about the mean μ
o The mean, median and mode are all identical
o The area to the left of μ = the area to the right of μ = 0.5
 Its associated random variable X has an infinite range, i.e. −∞ < X < ∞

[Figure: three normal distribution curves, labelled A, B and C]

Exercise 5.19
In the figure above, three different normal distributions are illustrated. Discuss the similarities and
differences between these distributions in terms of their means and variances.
1) A vs. B

2) B vs. C

3) A vs. C

As with any other continuous random variable we can use the c.d.f. to calculate probabilities associated with
areas under the curve. However, the p.d.f. of the normal distribution is computationally intensive to integrate
and express as a c.d.f. Furthermore, this distribution is characterised by its mean and standard deviation,
which implies that an infinite number of possible normal distributions can exist.

To overcome this problem we will use another important property of random variables, namely that any
random variable can be standardised by subtracting its mean and dividing by its standard deviation. If a
random variable X is normally distributed we can standardise X according to this property. The resulting
variable, denoted by Z, will be a normal random variable with a mean of zero and a variance of one (and
consequently a standard deviation of one). The variable Z is termed the standard normal random variable,
and the particular values of Z are called z-scores or z-values, which range from −∞ to ∞. Therefore:
 If X ~ N(μ, σ²)
 Then $Z = \dfrac{X - \mu}{\sigma} \sim N(0, 1)$

This standardisation from any normally distributed variable X to the standard normal variable Z is a one-to-
one transformation. As a result any given area under the X-curve is maintained under the Z-curve.

Area between μ and x under the X-curve = the area between 0 and z under the Z-curve
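
This preservation of area can be verified numerically. The sketch below uses `statistics.NormalDist` from the Python standard library and the IQ example X ~ N(100, 144). The value x = 118 is an arbitrary illustrative choice. Note that `NormalDist` takes the standard deviation as its second argument, not the variance.

```python
from statistics import NormalDist

mu, sigma = 100, 12          # IQ example from the text: X ~ N(100, 144), so sigma = 12
x = 118                      # illustrative value (an assumption)
z = (x - mu) / sigma         # standardise: z = (x - mu) / sigma

X = NormalDist(mu, sigma)    # second argument is the standard deviation
Z = NormalDist(0, 1)         # standard normal distribution

area_x = X.cdf(x) - X.cdf(mu)   # area between mu and x under the X-curve
area_z = Z.cdf(z) - Z.cdf(0)    # area between 0 and z under the Z-curve

print(z)                        # 1.5
print(round(area_x, 4), round(area_z, 4))
```

The two printed areas agree, which is what allows a single table for Z to serve every normal distribution.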

[Figure: the X-curve with μ and x marked, alongside the Z-curve with 0 and z marked. x is mapped on to z = (x − μ)/σ; μ is mapped on to 0.]

The c.d.f. of Z has been computed for a set of z-scores and summarised in a table. This table of standard
normal probabilities gives us the specific areas P(Z ≤ z) = P(Z < z) under the standard normal distribution
curve for a set of z-scores. As the standard normal distribution is centred at zero, Z can take on any negative
or any positive value. The z-scores in the table range from –6 to +6 and are given up to two decimal places.
For each z-score there is a unique cumulative area to the left of the value. Similarly, each cumulative area is
associated with a single z-score.

A portion of the Z-table is given in Table 5.1. In the table the z-scores are listed in the margins, namely the
first row and the first column. The column gives the z-score up to the first decimal value and the row gives
the second decimal value. The body of the table gives the corresponding cumulative areas (from −∞ to z),
with four decimal places.

The Cumulative Standardised Normal Distribution


Entry represents area under the cumulative standardised
normal distribution from −∞ to Z
Z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359
0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753
0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141
0.3 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517
0.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879
0.5 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224
0.6 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549
0.7 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852
0.8 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133
0.9 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389
1.0 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621
Table 5.1: Extract from the standard normal table

For example, to find the area between −∞ and 0.21, i.e. P(Z ≤ 0.21), we look for 0.2 in the first column and
0.01 in the first row. Therefore the area is equal to 0.5832. Similarly, to find the area between −∞ and 1.06,
i.e. P(Z ≤ 1.06), we look for 1.0 in the first column and 0.06 in the first row. This gives the required area
equal to 0.8554. Note that as the z-score increases, the cumulative area increases.

We can also find the z-score for a given cumulative area. For example, if the area between −∞ and z is
0.5040, we need to find that particular value in the body of the table and then determine the corresponding z-
score from the first column and the first row. For this example, the z-score that corresponds to a cumulative
area of 0.5040 is z = 0.01.
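
The table lookups above can be reproduced in software as a cross-check. The sketch below uses `statistics.NormalDist` from the Python standard library: `cdf` plays the role of reading an area from the body of the table, and `inv_cdf` plays the role of the reverse lookup. The rounding mirrors the four-decimal areas and two-decimal z-scores of the table.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Cumulative areas P(Z <= z), rounded to four decimals as in the table
print(round(Z.cdf(0.21), 4))   # 0.5832
print(round(Z.cdf(1.06), 4))   # 0.8554

# Reverse lookup: the z-score for a given cumulative area, rounded to two decimals
print(round(Z.inv_cdf(0.5040), 2))  # 0.01
```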
In the above examples, the exact z-score or cumulative area can be found in the table. As the table is
restricted to two decimal places for z-scores and four decimal places for areas, not all possible values are
given in the table. For example, if we are looking for the cumulative area associated with a z-score of
0.227, we must round this value to two decimal places, i.e. 0.23, which can be found in the table.

The same holds for the areas, i.e. we must round such values to four decimal places. If we cannot find a
certain area in the table, we must find the value which is closest to the one we are looking for. For example,
the cumulative area of 0.79 is not in the table. The closest we can find is 0.7910, which corresponds to a z-
score of 0.81. If there are two areas that are equally close to the one we are looking for, the z-score will be
the average between the two corresponding z-scores. For example, the area 0.51 is exactly halfway between
areas 0.5080 and 0.5120 in the table. Therefore the z-score will be (0.02 + 0.03)/2 = 0.025.

Since the Z-table is in the form P(Z ≤ z) we have to express all areas required in terms of this cumulative
form. There are three different areas that we could be interested in:
1) The area between −∞ and any z, i.e. P(Z ≤ z)

2) The area between any z and +∞, i.e. P(Z > z)

3) The area between any two z-scores, say a and b, for a < b, i.e. P(a < Z < b)
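
The three forms can be written as small helper functions, each expressed in terms of the cumulative area P(Z ≤ z). This is a sketch only: the function names and the z-scores used are assumptions chosen for illustration and are not taken from the exercises.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

def area_left(z):
    """Form 1: P(Z <= z), read directly from the cumulative table."""
    return Z.cdf(z)

def area_right(z):
    """Form 2: P(Z > z) = 1 - P(Z <= z), by the complement rule."""
    return 1 - Z.cdf(z)

def area_between(a, b):
    """Form 3: P(a < Z < b) = P(Z <= b) - P(Z <= a), for a < b."""
    return Z.cdf(b) - Z.cdf(a)

# Illustrative z-scores (assumptions, not taken from the exercises)
print(round(area_left(0.5), 4))
print(round(area_right(0.5), 4))
print(round(area_between(-0.5, 1.5), 4))
# Symmetry gives the same right-hand area: P(Z > z) = P(Z < -z)
print(round(area_left(-0.5), 4))
```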

Exercise 5.20
For each of these forms, find the required probabilities.
1) P(Z ≤ z)
This area is already in the correct form and we can use the table directly.

P  Z  2  P  Z  1.68 

2) P(Z > z)
Re-write using the complement rule, or using the fact that the normal distribution is symmetric.

P  Z  2  P  Z  2 

P  Z  1.68  P  Z  1.68 

3) P(a < Z < b)
The Z-table gives us the cumulative areas up to any point. Therefore we know the size of the area up
to point b, as well as up to point a. We subtract the two areas to find the required probability.

P 1.68  Z  2  P  1.68  Z  2  

Exercise 5.21
For each of the following given areas, find the corresponding z-scores.

1) P  Z  z   0.8461 3) P  Z  z   0.9

2) P  Z  z   0.4960 4) P  Z  z   0.1

Exercise 5.22
If Z ~ N(0, 1), find:
1) The probability that Z is at most 1 =

2) The proportion of the curve that exceeds 1.25 =

3) The area between −0.8 and 0 =

4) The z-score that corresponds to the lowest 83.4% of the distribution =

5) The value of the 35th percentile =

6) The value of z such that the area greater than that value is 0.05 =

The standard normal table is a useful tool that allows us to find probabilities involving any normally
distributed random variable, i.e. X ~ N(μ, σ²). To do this we must first write the required probability in
terms of the original variable X. We then standardise X by subtracting the mean (μ) and dividing by the
standard deviation (σ). This results in the standard normal variable Z, and we can therefore use the Z-table to
calculate the required area.

Exercise 5.23
If X ~ N(10, 16), find:

1) P  X  13 =

2) P  8  X  9.5 =

3) P  X  5.4  =

4) What value of X satisfies P  X  x   0.3944

5) Find the value of a that satisfies P 10  4a  X  10  4a   0.95

Exercise 5.24
The daily price of a certain commodity is approximately normally distributed with a mean of R8 and a
standard deviation of R0.25.
1) Define the variable X and its distribution.

2) What is the probability that the price exceeds R9 on any particular day?

3) On what proportion of days is the price lower than R8.72?

4) What is the probability that the price is between R7.50 and R8.50?

Exercise 5.25
The height a university high-jump student clears each time he attempts a jump is assumed to follow a normal
distribution with a mean of 2m and a standard deviation of 10cm. What height will he clear on the best 10%
of jumps?

To check: find P(X > 2.128); the answer should be approximately 0.1

Exercise 5.26
A new machine, used for filling cans of liquid hairspray, can be set to dispense any given average fill. If the
amount of fill is approximately normally distributed with a standard deviation of 10ml, what mean setting
will ensure that 99% of the cans contain at least 500ml?

Chapter 6 INFERENCE

6.1 Introduction

In many instances it is not possible to collect data for a random variable from all members of a population.
This could be due to time or cost constraints, or because the members are inaccessible. In order to address
this problem we collect information from a representative subset of the population, namely a sample. We
can use the sample data to reach conclusions about the behaviour of a random variable for the whole
population. This process of generalising from a sample to a population is known as inference. Inference is
always done about the population parameters using sample statistics.

The objectives of this chapter include:


 Derive sampling distributions of the mean and a proportion
 Calculate probabilities associated with sampling distributions
 Know how to read the t-distribution table
 Estimate various population parameters
 Calculate confidence intervals
 Perform various hypothesis tests

6.2 Chapter Formulae


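Inference for: | Confidence interval | Test statistic
Means (μ), σ² known | $\bar{x} \pm z_{\alpha/2} \dfrac{\sigma}{\sqrt{n}}$ | $z = \dfrac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} \sim N(0, 1)$
Means (μ), σ² unknown | $\bar{x} \pm t_{n-1;\,\alpha/2} \dfrac{s}{\sqrt{n}}$ | $t = \dfrac{\bar{x} - \mu_0}{s / \sqrt{n}} \sim t_{n-1}$
Means, dependent samples | $\bar{d} \pm t_{n-1;\,\alpha/2} \dfrac{s_d}{\sqrt{n}}$ | $t = \dfrac{\bar{d} - \mu_0}{s_d / \sqrt{n}} \sim t_{n-1}$
Means (μ1 − μ2), independent samples | $(\bar{x}_1 - \bar{x}_2) \pm t_{n_1 + n_2 - 2;\,\alpha/2} \, s_* \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$ | $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - \mu_0}{s_* \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \sim t_{n_1 + n_2 - 2}$
Proportions (p) | $\hat{p} \pm z_{\alpha/2} \sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}}$ | $z = \dfrac{\hat{p} - p_0}{\sqrt{\dfrac{p_0 (1 - p_0)}{n}}} \sim N(0, 1)$

For independent samples, the pooled variance is $s_*^2 = \dfrac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}$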

6.3 Sampling Distribution of the Mean

Consider the example where a consumer goods company is interested in the amount of money that its
customers spend on a particular product during a month, such as tea. It is reasonable to assume that the
majority of households in South Africa will consume at least one brand of tea during any given month, and
that the amount spent on the product will vary from household to household. As such the amount spent on
tea per month by the entire population is a random variable, which means it will have a distribution, an
average value, a median value, a variance and standard deviation, and other parameter values.

Since the amount spent will not necessarily be the same for all households, it is better to capture the
behaviour of the whole population using some sort of summary measure, such as the average amount spent.
However, if the company wants to ask their customers about their tea expenditure it is unrealistic that they
will be able to speak to each and every household in South Africa. Instead a representative sample can be
drawn from this target population and the sampled individuals can provide a great deal of information about
their monthly spending, at a reasonable cost to the company.

The monthly expenditure on tea products of this sample can be easily summarised using the average of the
variable, i.e. the sample mean. In order to infer behaviour from this sample mean to the greater population
we need to be able to say something about the behaviour of such a sample mean for a given sample size. In
particular, we want to be able to derive the distribution of the sample mean so that we can attach some
measure of certainty to our inference process.

Derivation of the sampling distribution of the mean


A company wants to audit the work done by all fifteen of the administrative assistants in their employment.
Each of the assistants was asked to type the same document. Table 6.1 shows the number of errors made by
each assistant.

Assistant Number of errors Assistant Number of errors Assistant Number of errors


A 3 F 1 K 0
B 1 G 2 L 4
C 4 H 3 M 2
D 7 I 5 N 2
E 2 J 0 O 1
Table 6.1: Number of errors per administrative assistant

Let X be defined as the number of errors made by a randomly chosen administrative assistant. X is a random
variable as it assumes a value as a result of the experiment of choosing an administrative assistant at
random, and different administrative assistants will make different numbers of errors. And since X is a
random variable it has a distribution, and hence a density function. This density function can be described by
its measure of central tendency (the mean) and measure of variability (variance).

In this example there are only fifteen administrative assistants at this company, which means that the
information in Table 6.1 is the data for the entire population. Therefore the average number of errors made
by all administrative assistants in this population is denoted by μ and its variance is σ². The mean value is
also referred to as the expected value of variable X, i.e. μ = E(X), and can be calculated directly from the
population data in Table 6.1 as follows:

$\mu = \dfrac{1}{N} \sum_{i=1}^{N} x_i = \dfrac{1}{15} (3 + 1 + \dots + 2 + 1) = \dfrac{37}{15} = 2.467$

Suppose we could not determine what μ is because we were unable to collect data for the whole population.
In such cases we will draw samples, collect the necessary information, and calculate the average value $\bar{x}$
from the sample. Say we randomly draw a sample of size n = 5 from the population in Table 6.1, and we
repeat the process 15 times. For each sample we can then calculate $\bar{x}$. This is illustrated as follows:

Sample    Sample data    Sample mean ($\bar{X}$)

1         3 4 2 2 5      $\bar{x}_1 = 3.2$
2         0 2 1 1 7      $\bar{x}_2 = 2.2$
3         1 3 0 4 2      $\bar{x}_3 = 2.0$
4         4 1 5 4 1      $\bar{x}_4 = 3.0$
5         2 3 0 2 1      $\bar{x}_5 = 1.6$
6         0 3 1 7 3      $\bar{x}_6 = 2.8$
7         2 5 3 0 1      $\bar{x}_7 = 2.2$
8         5 2 1 1 0      $\bar{x}_8 = 1.8$
9         2 0 0 7 3      $\bar{x}_9 = 2.4$
10        4 3 5 0 1      $\bar{x}_{10} = 2.6$
11        1 1 2 4 3      $\bar{x}_{11} = 2.2$
12        0 0 5 3 2      $\bar{x}_{12} = 2.0$
13        3 7 2 0 4      $\bar{x}_{13} = 3.2$
14        2 0 3 1 7      $\bar{x}_{14} = 2.6$
15        1 2 4 3 1      $\bar{x}_{15} = 2.2$

From this table we note that the values of $\bar{X}$ (the column that contains the means from each sample) vary
from sample to sample. Hence $\bar{X}$ is a random variable, which means it has a distribution and consequently
measures of central tendency and variability. If we take an average across all 15 sample means we get:

$\dfrac{3.2 + 2.2 + \dots + 2.6 + 2.2}{15} = 2.4$

This average is very close to the exact population mean value of μ = 2.467. In fact, if we were to take all
possible samples of size 5 from this population and calculate an average we will get the exact value of the
population mean. The variability of these sample means depends on the size of the samples.
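
The claim about all possible samples can be checked by brute force. The sketch below is only an illustration: it assumes that a sample consists of 5 different assistants (sampling without replacement), enumerates every such sample from Table 6.1, and compares the average of all the sample means with μ. The variable names are illustrative and not part of the notes.

```python
from fractions import Fraction
from itertools import combinations

# Number of errors per assistant (A to O), copied from Table 6.1.
errors = [3, 1, 4, 7, 2, 1, 2, 3, 5, 0, 0, 4, 2, 2, 1]


def mean(values):
    """Exact average, kept as a fraction to avoid rounding error."""
    values = list(values)
    return Fraction(sum(values), len(values))


population_mean = mean(errors)

# Assumption: each sample is 5 different assistants (no replacement).
sample_means = [mean(sample) for sample in combinations(errors, 5)]
mean_of_sample_means = mean(sample_means)

print("Number of possible samples:", len(sample_means))
print("Population mean:", float(population_mean))
print("Average of all sample means:", float(mean_of_sample_means))
```

The two averages agree exactly, whereas the 15 samples drawn above only came close to μ.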

In general, if we consider all the values of the variable $\bar{X}$ from all samples of fixed size n, then:
• The average of all values = $\mu_{\bar{X}} = E(\bar{X}) = \mu$
• The variance of all values = $\sigma^2_{\bar{X}} = Var(\bar{X}) = \dfrac{\sigma^2}{n}$

In practice we will not extract all possible samples of size n and perform this experiment. This is a
theoretical exercise that shows that the behaviour of the sample mean can be determined as it has its own
distribution and parameters. We are able to derive these parameters of $\bar{X}$ for a sample of size n as follows:

1) If the parent population X has a normal distribution and the population variance or population
standard deviation is known, then $\bar{X}$ follows a normal distribution, i.e.
o If $X \sim N(\mu, \sigma^2)$ and σ² or σ is known, then $\bar{X} \sim N\left(\mu, \dfrac{\sigma^2}{n}\right)$
2) If the parent population X has a normal distribution and the population variance or population
standard deviation is unknown, then the standardised form of $\bar{X}$ follows a t-distribution, i.e.
o If $X \sim N(\mu, \sigma^2)$ and σ² or σ is unknown, then $\dfrac{\bar{X} - \mu}{s / \sqrt{n}} \sim t_{n-1}$,
where s is the sample standard deviation
3) If the parent population X is not normally distributed, then $\bar{X}$ approximately follows a normal
distribution provided that the sample size is large enough, i.e.
o If $X \sim\ ?(\mu, \sigma^2)$ where $\mu = E(X)$ and $\sigma^2 = Var(X)$, then $\bar{X} \sim N\left(\mu, \dfrac{\sigma^2}{n}\right)$ if $n \ge 30$
o This is known as the central limit theorem (CLT)

It is important to note that the distribution of $\bar{X}$ for a sample is different to the original distribution of X and
probabilities associated with each of these variables are very different.
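
To see how different the two probabilities can be, here is a small sketch for case 1 (normal parent population, σ known). The values of μ, σ, n and the cutoff are hypothetical and chosen only for illustration; they are not taken from any exercise in these notes.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical values, for illustration only.
mu, sigma, n, cutoff = 50.0, 10.0, 25, 53.0

# Distribution of a single observation: X ~ N(mu, sigma^2).
single = NormalDist(mu, sigma)
# Distribution of the sample mean: X-bar ~ N(mu, sigma^2 / n).
sample_mean = NormalDist(mu, sigma / sqrt(n))

p_single = 1 - single.cdf(cutoff)     # P(X > 53)
p_mean = 1 - sample_mean.cdf(cutoff)  # P(X-bar > 53)

print(round(p_single, 4), round(p_mean, 4))
```

Because the standard deviation of the sample mean is σ/√n rather than σ, the same cutoff lies much further out in the tail for the sample mean than for a single observation.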

Exercise 6.1
Suppose that students’ test scores follow a normal distribution with a mean of 65 and a variance of 25.
1) Find the probability that a randomly selected student’s test score exceeds 62.

2) Find the probability that the average test score of a random sample of 16 students exceeds 62.

3) Compare the answers from (1) and (2) and comment on the difference.

Exercise 6.2
IQ scores have a normal distribution with a mean of 100 and a variance of 225. Find the probability that the
average IQ of a random sample of 18 people exceeds 110.

Exercise 6.3
The lifetimes of light bulbs are exponentially distributed with a mean lifetime of 1000 hours. A random
sample of 45 light bulbs is selected and the lifetimes measured. What is the probability that the mean
lifetime of the bulbs in the sample is at most 950 hours?

6.4 Sampling Distribution of a Proportion

Sometimes we are interested in making inference about the proportion of objects in a population that possess
a particular attribute. The population proportion is denoted by p. As before, we will infer the behaviour of
the population proportion (p) using the sample proportion ($\hat{p}$). For example, if we want to find out what
proportion of the population owns a cellphone, we can draw a random sample and calculate the proportion
of the sample that owns a cellphone. This value will tell us something about p. But in order to do this we
need to derive the sampling distribution of a proportion.

If we follow the same approach as with the distribution of $\bar{X}$ for a sample, we can derive the distribution of
the sample proportion ($\hat{p}$) for a sample of size n. If we draw multiple samples of size n and calculate $\hat{p}$ we
will see that $\hat{p}$ will vary from sample to sample. Therefore $\hat{p}$ is a random variable that has a distribution, a
mean, a variance, etc. If the sample is large enough the CLT allows us to derive the distribution and
parameters of a sample proportion, namely:

$\hat{p} \sim N\left(p, \dfrac{p(1 - p)}{n}\right)$ provided that $np \ge 5$ and $nq \ge 5$, where $q = 1 - p$
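
A minimal sketch of this result, assuming the large-sample conditions are read as np ≥ 5 and nq ≥ 5. The function name `sample_proportion_dist` and the values p = 0.4, n = 100 are hypothetical and used only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def sample_proportion_dist(p, n):
    """Approximate distribution of p-hat: N(p, p(1 - p)/n).

    Raises ValueError when the large-sample conditions fail.
    """
    q = 1 - p
    if n * p < 5 or n * q < 5:
        raise ValueError("Normal approximation not justified: need np >= 5 and nq >= 5")
    return NormalDist(p, sqrt(p * q / n))


# Hypothetical population proportion and sample size.
dist = sample_proportion_dist(0.4, 100)

# Probability that more than 45% of the sample has the attribute.
p_above = 1 - dist.cdf(0.45)
print(round(p_above, 4))
```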

Exercise 6.4
Twenty percent of first year Commerce students own laptops. A lecturer randomly selects 35 out of the 1000
first year Commerce students.
1) What is the probability that at least 5% of the students own a laptop?

2) What is the probability that at most 3% of the students own a laptop?

6.5 The t-Distribution

At the beginning of the twentieth century a statistician for Guinness Breweries in Ireland, William S Gosset,
wanted to make inferences about the population mean when σ was unknown. Because Guinness employees
were not permitted to publish research under their own names Gosset adopted the pseudonym “Student”.
The distribution he developed is known as the Student’s t-distribution.

If the random variable X is normally distributed and σ is unknown, $\bar{X}$ has a t-distribution with n – 1 degrees
of freedom, a mean of μ and a variance of $\dfrac{s^2}{n}$. The standardised form of $\bar{X}$ follows a $t_{n-1}$ distribution:

$T = \dfrac{\bar{X} - \mu}{s / \sqrt{n}} \sim t_{n-1}$

This expression has the same form as the Z statistic, except that the sample standard deviation (s) is used to
estimate the unknown population standard deviation (σ). We use n – 1 degrees of freedom for the
t-distribution because s is calculated after $\bar{x}$ has already been estimated from the same sample.

Properties of the t-distribution:

• It is bell shaped
• It is symmetric about zero
• It has heavier tails than the normal curve
• It is identified by the degrees of freedom: 1, 2, 3, ...
• As the degrees of freedom increase, it becomes more like the Z distribution

A portion of the t-tables is given in Table 6.2. This table only gives six upper tail areas for various degrees
of freedom. Therefore this table gives the specific areas $P(T > t)$. The sizes of the upper tail areas are
given in the first row and the degrees of freedom are given in the first column. The t-values that correspond
to the selected upper tail areas are given in the body of the table, to four decimal places. We can use the
t-tables to find the t-value for a given probability, or to find the probability for a given t-value.

Note that the values in the first row, i.e. the upper tail areas, are probabilities. Therefore these values are
bounded to the left by 1 and to the right by 0.

Critical Values of t
For a particular number of degrees of freedom, each entry represents
the critical value of t corresponding to a specified upper-tail area (α).
[Figure: t-distribution curve centred at 0, with the upper-tail area α to the right of t(α, df)]

Degrees of                    Upper-Tail Areas
freedom 0.25 0.10 0.05 0.025 0.01 0.005
1 1.0000 3.0777 6.3138 12.7062 31.8205 63.6567
2 0.8165 1.8856 2.9200 4.3027 6.9646 9.9248
3 0.7649 1.6377 2.3534 3.1824 4.5407 5.8409
4 0.7407 1.5332 2.1318 2.7764 3.7469 4.6041
5 0.7267 1.4759 2.0150 2.5706 3.3649 4.0321
6 0.7176 1.4398 1.9432 2.4469 3.1427 3.7074
7 0.7111 1.4149 1.8946 2.3646 2.9980 3.4995
8 0.7064 1.3968 1.8595 2.3060 2.8965 3.3554
9 0.7027 1.3830 1.8331 2.2622 2.8214 3.2498
10 0.6998 1.3722 1.8125 2.2281 2.7638 3.1693
. . . . . . .
. . . . . . .
. . . . . . .
∞ 0.6745 1.2816 1.6449 1.9600 2.3263 2.5758
Table 6.2: Extract from the t-tables

To find probabilities for the t-distribution using the t-tables we need to first write the required probability in
notation. If it is in the correct form $P(T > t)$ we can use the tables directly. If it is not in the correct form,
we need to re-write it using either the complement rule or the fact that the t-distribution is symmetric.

For example, suppose we have a sample with 10 observations and we need to find $P(T > 1.8331)$. The
required probability is already in the correct form, i.e. greater than some positive t-value. For this example
we have (10 – 1) = 9 degrees of freedom. We therefore only look in the row for 9 degrees of freedom and
look for the value 1.8331, which is located in the column for the upper 0.05 of the distribution. Thus
$P(T > 1.8331) = 0.05$.

As we only have t-values for six selected areas, many other t-values are omitted from the table. In situations
where we need to find the upper tail area for a t-value that is not in the table, we first identify where this
particular t-value lies, i.e. between which two t-values, and then read off the two corresponding areas. For
example, to find $P(T > 1.2345)$ where $T \sim t_4$, we see that 1.2345 lies between $t = 0.7407$ and $t = 1.5332$.
These two t-values correspond to the upper tail areas of 0.25 and 0.1. Therefore we can conclude that
$P(T > 1.2345)$ where $T \sim t_4$ lies between 0.1 and 0.25.
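
The table look-up described above can be sketched in code. The snippet below copies only the rows for 4 and 9 degrees of freedom from Table 6.2; the function name `upper_tail_bounds` is illustrative, and the function assumes a positive t-value.

```python
# Upper-tail areas (first row of Table 6.2) and two rows of the table.
UPPER_TAIL_AREAS = [0.25, 0.10, 0.05, 0.025, 0.01, 0.005]
T_TABLE = {
    4: [0.7407, 1.5332, 2.1318, 2.7764, 3.7469, 4.6041],
    9: [0.7027, 1.3830, 1.8331, 2.2622, 2.8214, 3.2498],
}


def upper_tail_bounds(t_value, df):
    """Return (lower, upper) bounds for P(T > t_value), for t_value > 0.

    If t_value appears in the table, both bounds equal the tabulated area.
    """
    if t_value <= 0:
        raise ValueError("Rewrite the probability so that the t-value is positive")
    lower, upper = 0.0, 0.5  # P(T > t) for t > 0 is always below 0.5
    for area, entry in zip(UPPER_TAIL_AREAS, T_TABLE[df]):
        if t_value == entry:
            return (area, area)
        if t_value > entry:
            upper = area  # a larger t-value leaves a smaller upper-tail area
        else:
            lower = area
            break
    return (lower, upper)


print(upper_tail_bounds(1.8331, 9))
print(upper_tail_bounds(1.2345, 4))
```

The first call reproduces the exact look-up for 9 degrees of freedom, and the second reproduces the bracketing argument for $T \sim t_4$.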

For a large sample size, i.e. large degrees of freedom, the t-distribution behaves like the Z distribution. The
last row in the t-table contains the z-scores for the six upper tail areas from the Z distribution. The infinity
sign for the degrees of freedom does not imply an infinite sample size; it refers to a large sample, and hence
we have the z-scores. This is convenient for finding exact z-scores for these upper tail probabilities.
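
As a quick check of the last row of Table 6.2, the z-scores for the six upper tail areas can be recomputed from the standard normal distribution. This is only a sketch using Python's standard library; the variable names are illustrative.

```python
from statistics import NormalDist

upper_tail_areas = [0.25, 0.10, 0.05, 0.025, 0.01, 0.005]

# The z-score with upper-tail area a is the (1 - a) quantile of N(0, 1).
z_scores = [round(NormalDist().inv_cdf(1 - a), 4) for a in upper_tail_areas]

for area, z in zip(upper_tail_areas, z_scores):
    print(area, z)
```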

Exercise 6.5
Use the t-tables to find the following:
1) P  T  2.7874 for a sample of 26 observations

2) P  T  1.6973 where T ~ t30

3) For a sample of n = 21, what is the probability that T is less than –2.086?

4) Suppose we have a sample of 16 observations, find the value t* such that P  T  t *   0.01

5) Calculate P  T  t   0.75 for n = 40

6) For a sample of size 26, find the value of t such that the sum of the two tail areas is equal to 0.1

7) P  Z  1.2816  =

8) Find z such that P  Z  z   0.05

Exercise 6.6
Suppose that students’ test scores follow a normal distribution with a mean of 65. A random sample of 16
students’ scores yielded a standard deviation of 7. What is the probability that the average test score of this
random sample exceeds 62?

Exercise 6.7
IQ scores have a normal distribution with mean 100. A random sample of 18 people’s IQ scores yielded a
variance of 200. Find the probability that the average IQ of 18 people is below 107.

6.6 Overview of Inference

Statistical inference can be performed in two ways:

• Estimation
o Calculate sample statistics to estimate population parameters using methods like point
estimation and confidence interval estimation
• Hypothesis testing
o A claim or statement is made regarding a population parameter and the claim is tested using
sample evidence

Inference made about population parameters will depend on the nature of the problem statement. In this
course we will only focus on inference about population means and population proportions.

We will address five different problem statements in this chapter, namely:


1) A single population mean ($\mu$) where the population variance is known.
o For example, making inference about the average household tea expenditure if the population
variance is known to be equal to a certain value
2) A single population mean ($\mu$) where the population variance is unknown.
o For example, making inference about the average household tea expenditure if the population
variance is unknown and can only be estimated from the sample
3) The difference between two unrelated or independent means ($\mu_1 - \mu_2$).
o For example, making inference about the difference in average return on an investment when
comparing two different portfolios
o For the purpose of this course, this is done under the assumption that the two population
variances are equal
4) The difference between two related or dependent means ($\mu_d$).
o For example, making inference about the impact of an announcement about a merger on a
particular stock price, i.e. compare the price before and after the announcement
5) A single population proportion ($p$).
o For example, making inference about the proportion of the population that will buy a new
product

6.7 Estimation

A point estimate, such as $\bar{X}$ or $\hat{p}$, is a single number calculated from the sample that estimates the
population parameter. The advantage of a point estimate is that it is a single value that gives an indication of
what the population parameter could be. However, we have no idea of how reliable this estimate is as it is
based on a single sample and the point estimate will be different if we used a different sample. Furthermore,
as we have seen from the sampling distribution of the mean, the variance of $\bar{X}$ depends on the size of the
sample. This means that the sample averages could differ considerably from sample to sample. To solve this
problem we can incorporate this variability by estimating an interval within which the population parameter
will lie. This is called confidence interval (CI) estimation.

Properties of a CI:
• It is a range of values that are plausible estimates of the population parameter
• It consists of an upper and lower bound calculated from the sample
• The centre of the CI is always the sample statistic used to estimate the population parameter
• It allows us to express a certain level of confidence in our estimate, usually 90% or more
• It is referred to as the 100(1 – α)% CI
o α is the probability of making an error, specifically a Type I error
o Common choices for α are 1%, 5% and 10%
o α could also be described as the percentage of samples that yield a 100(1 – α)% CI that does
not include the population parameter of interest

Suppose that we want a 95% CI for a single population mean μ where σ is known. We will then choose the
length of the interval in such a way that we are 95% confident that μ will lie inside this interval. Say we
draw multiple samples of the same size and construct 95% CI’s for each sample. All these intervals will
have the same length and each interval will have the sample mean at the centre.

We therefore fix the interval length so that 95% of samples yield 95% CI’s that contain μ, while 5% of
samples yield 95% CI’s that do not contain μ.
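
This long-run interpretation can be illustrated with a small simulation. The sketch below assumes a hypothetical normal population (μ = 100, σ = 15), draws many samples of size 25, builds a 95% CI from each using the known σ, and counts how often the interval contains μ. All values and names are illustrative.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population and simulation settings.
mu, sigma, n, trials = 100.0, 15.0, 25, 2000

z = NormalDist().inv_cdf(0.975)   # z-score with 0.025 in the upper tail
half_width = z * sigma / sqrt(n)  # same half-length for every sample

hits = 0
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    x_bar = fmean(sample)
    # The interval is centred at the sample mean.
    if x_bar - half_width <= mu <= x_bar + half_width:
        hits += 1

coverage = hits / trials
print("Proportion of intervals containing mu:", coverage)
```

The proportion should be close to 0.95, although it will vary slightly with the seed and the number of trials.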
We use the Z table or the t-table to incorporate α in our CI calculations. The specific table used will depend
on the nature of the problem statement. Since the CI has an upper and lower bound we split the error over the
two tail areas. In particular, we find the z-score or t-value such that the area in each tail is equal to $\dfrac{\alpha}{2}$:

• $z_{\alpha/2}$, where $P(Z > z_{\alpha/2}) = \dfrac{\alpha}{2}$
• $t_{\alpha/2}$, where $P(T > t_{\alpha/2}) = \dfrac{\alpha}{2}$

Formulae for CI’s:


1) If σ or σ² is known, the 100(1 – α)% CI for μ has the form:

$\bar{x} \pm z_{\alpha/2} \dfrac{\sigma}{\sqrt{n}}$

2) If σ or σ² is unknown, the 100(1 – α)% CI for μ has the form:

$\bar{x} \pm t_{\alpha/2;\,n-1} \dfrac{s}{\sqrt{n}}$

• where $T \sim t_{n-1}$

3) When comparing two independent samples, the 100(1 – α)% CI for $\mu_1 - \mu_2$ has the form:

$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2;\,n_1+n_2-2} \; s^{*} \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$

• where $T \sim t_{n_1+n_2-2}$
• $s^{*2} = \dfrac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}$

4) When comparing two dependent samples, the 100(1 – α)% CI for $\mu_d$ has the form:

$\bar{d} \pm t_{\alpha/2;\,n-1} \dfrac{s_d}{\sqrt{n}}$

• where $T \sim t_{n-1}$
• $\bar{d}$ = average of the differences
• $s_d$ = standard deviation of the differences

5) For a single proportion, the 100(1 – α)% CI for p has the form:

$\hat{p} \pm z_{\alpha/2} \sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}}$
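
The single-sample formulae above can be sketched as small helper functions. This is an illustration only: the function names and all input values are hypothetical, and the t-value is passed in by hand because it is read from the t-tables (here $t_{0.025;\,9} = 2.2622$ from Table 6.2).

```python
from math import sqrt
from statistics import NormalDist


def z_critical(confidence):
    """z-score that leaves alpha/2 in the upper tail."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


def ci_mean_sigma_known(x_bar, sigma, n, confidence):
    """Formula 1: x-bar +/- z * sigma / sqrt(n)."""
    margin = z_critical(confidence) * sigma / sqrt(n)
    return (x_bar - margin, x_bar + margin)


def ci_mean_sigma_unknown(x_bar, s, n, t_critical):
    """Formula 2: x-bar +/- t * s / sqrt(n), with t read from the t-tables."""
    margin = t_critical * s / sqrt(n)
    return (x_bar - margin, x_bar + margin)


def ci_proportion(p_hat, n, confidence):
    """Formula 5: p-hat +/- z * sqrt(p-hat * (1 - p-hat) / n)."""
    margin = z_critical(confidence) * sqrt(p_hat * (1 - p_hat) / n)
    return (p_hat - margin, p_hat + margin)


# Hypothetical inputs, for illustration only.
ci_known = ci_mean_sigma_known(x_bar=50, sigma=10, n=25, confidence=0.95)
ci_unknown = ci_mean_sigma_unknown(x_bar=50, s=10, n=10, t_critical=2.2622)
ci_prop = ci_proportion(p_hat=0.4, n=100, confidence=0.95)

print(ci_known)
print(ci_unknown)
print(ci_prop)
```

Each interval is centred at its point estimate, and only the margin of error changes from case to case.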

Exercise 6.8
A random sample of 36 debtors has an average debt of R15600 with a standard deviation of R1950. It is
assumed that debt is normally distributed with a standard deviation of R1500.
1) Identify the parameter of interest.

2) Calculate the 90% CI for the average debt.

3) Calculate the 99% CI for the average debt.

4) Compare these two CI’s and comment on the impact of changing the confidence level.

Exercise 6.9
A random sample of 10 customers’ banking accounts was selected and the following data pertaining to their
monthly service fee (in Rands) was obtained. It is assumed that service fee follows a normal distribution.

20 45 32 150 48 52 70 64 100 120

1) Identify the parameter of interest.

2) Find the point estimate for the average monthly service fee.

3) Calculate the 95% CI for the average monthly service fee.

Exercise 6.10
In an attempt to market itself as the better pizza parlour, a local pizzeria advertises that their delivery time to
a university residence is less than that for another rival pizzeria also located in the vicinity of the residence.
A group of students wanted to verify the claim so they decided to order 10 pizzas from the local pizzeria and
10 pizzas from the rival pizzeria, all at different times in the day. The delivery times (in minutes) are
assumed to be normally distributed, and were recorded as follows:

Local 16.8 11.2 15.4 16.2 17.5 16.4 14 21 13.5 12.5


Rival 22.2 15.2 18.7 17 22.2 20.2 18 19.6 16.4 14

1) Identify the parameter of interest.

2) Summary statistics (for Local – Rival).

3) Find the point estimate for the difference in the mean delivery times.

4) Calculate the 95% CI for the difference in the mean delivery times.

5) What assumptions did we make and were they justified in this case?

6) How would this change if we defined the difference as Rival – Local?

Exercise 6.11
Absenteeism in the workplace is assumed to be normally distributed. A company has found that the rate of
absenteeism is quite high. They have implemented an incentive program that rewards employees for good
attendance. The employees’ absenteeism was noted for a period of six months before the program was
implemented and a period of six months after the program was implemented. The following table reflects
the data collected for 8 randomly chosen employees:

Employee   Before Incentive Program   After Incentive Program   D = B – A   D = A – B
1          2                          1
2          5                          6
3          0                          2
4          1                          1
5          7                          3
6          3                          1
7          2                          1
8          1                          0

1) Identify the parameter of interest.

2) Find the point estimate for the difference in mean days absent (D = B – A).

3) Calculate the 95% CI for the difference in mean days absent (D = B – A).

4) How would this change if we used D = A – B?

Exercise 6.12
The Gauteng traffic department suspects that 20% of all drivers wear seatbelts. At a road block, 165 drivers
of the 550 randomly selected cars were wearing seatbelts.

1) Identify the parameter of interest.

2) Find the point estimate for the proportion of all drivers wearing seatbelts.

3) Calculate the 95% CI for the proportion of all drivers wearing seatbelts.

4) What assumptions did we make and were they justified in this case?

6.8 Hypothesis Testing

Hypothesis testing is a rigid process that enables us to use sample data to test statements made about
unknown population parameters. Since we typically collect data from a random sample and not from an
entire population, we can never prove a statement about a population. The idea with hypothesis testing is to
see how likely the statement is in the face of the sample evidence.

Let’s say we would like to draw inference about a population mean for a variable that is approximately
normally distributed with a known population variance. We can then estimate the population mean by
randomly selecting a sample from this population and calculating the sample average of the variable. In
order to say how good such an estimate is we derive the sampling distribution of the mean (Section 6.3),
which allows us to attach some measure of certainty to our estimate. However, the distribution of $\bar{X}$ for a
sample of size n also requires some assumption about the value of the population mean $\mu$.

For example, say it is assumed that IQ scores are approximately normally distributed with a mean of 100
and a standard deviation of 15. Therefore $X \sim N(100, 15^2)$. If this assumption about the distribution of IQ
scores is valid, then for a sample of n = 25 it follows that $\bar{X} \sim N\left(100, \dfrac{15^2}{25}\right)$.

This implies that any random sample of 25 observations should yield an observed average that is located
around the assumed mean of $\mu = 100$; in other words, the majority of sample means are concentrated around
100. It is less likely to observe a sample mean that is far away from 100, i.e. towards the tail areas of the
distribution. If this happens it seems as if our original assumption about the distribution of IQ scores is
perhaps not a valid assumption. The question is: how far away from 100 must the observed mean be for it to
be considered an unlikely outcome? This is determined through the distribution of our original assumption.
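
To make this concrete, the sketch below uses the IQ example above (μ = 100, σ = 15, n = 25) and finds the range that contains the central 95% of sample means under the original assumption. The choice of 95% and the cutoff of 106 are assumptions made only for illustration.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 100, 15, 25

# Distribution of the sample mean if the assumption is valid:
# X-bar ~ N(100, 15^2 / 25), i.e. a standard deviation of 3.
x_bar_dist = NormalDist(mu, sigma / sqrt(n))

# Central 95% of sample means (the 95% level is an illustrative choice).
lower = x_bar_dist.inv_cdf(0.025)
upper = x_bar_dist.inv_cdf(0.975)

# Probability of observing a sample mean above 106 (illustrative cutoff).
p_above_106 = 1 - x_bar_dist.cdf(106)

print(round(lower, 2), round(upper, 2), round(p_above_106, 4))
```

A sample mean outside this range would be an unlikely outcome if the assumed mean of 100 were correct.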

This is the basis of hypothesis testing. In general we would make a statement about the behaviour of a
population parameter that contradicts what we believe to be true to begin with. We then use sample
information to see if there is some evidence for our statement. It is important to understand that we can
never prove something to be true; we can only prove it to be false.

There are essentially five steps in hypothesis testing. In general terms these steps are the same for all
hypothesis tests. The nature of the problem statement will determine the specifications, distribution and
formula required at each step.

Steps
1) State the null hypothesis (H0) and the alternative hypothesis (H1).
2) Find the critical value for a specific level of significance α.
3) Calculate the test statistic.
4) Make a decision in terms of H0.
5) Reach a conclusion in terms of H1.

The steps defined


1) State the null hypothesis (H0) and the alternative hypothesis (H1).
There are two types of hypotheses and both must be stated for any hypothesis test.

The null hypothesis (H0) is the neutral hypothesis and contains what we assume to be true to begin
with. It always includes an equal sign in its expression. For a one-sample test, H0 is of the form:
o $H_0: \mu = \mu_0$ where $\mu_0$ is the assumed value of the population parameter

The alternative hypothesis (H1) contains the statement to be proved or tested. It can take on one of
three different forms, depending on the particular research question that we wish to address. For
example, H1 for a one-sample test could be formulated as:
o $H_1: \mu > \mu_0$ (one-sided test to the right)

o $H_1: \mu < \mu_0$ (one-sided test to the left)

o $H_1: \mu \ne \mu_0$ (two-sided test)

We will always test a hypothesis under the assumption that H0 is true. If the data contain sufficient
evidence to support H1, then we reject H0.

2) Find the critical value for a specific level of significance α.


Since certainty and complete confidence in a decision are rarely possible in real life, there is always
the possibility of making an error. Such an error could be rejecting the null hypothesis when in fact it
is true. The probability of making such an error is called the significance level associated with the
hypothesis test and is denoted by α. We want to ensure that this error is as small as possible, so α is
usually 0.01, 0.05 or 0.1. The choice of α is fixed for the test before the sample is drawn and depends
on the impact of such an error.

The critical value(s) for a specific hypothesis test depends on the form of the alternative hypothesis
and the level of significance. Since we have derived the distribution of the mean for a given sample
under the null hypothesis, we can find the critical value(s) from the appropriate statistical tables for a
selected level of significance. The critical values are then used to identify the critical or rejection
region(s) of the hypothesis test under the distribution of the null hypothesis. These areas determine
whether we reject H0 or not, given the sample data.

3) Calculate the test statistic.


The test statistic is a value calculated from the observed data and is used to test the particular
hypothesis. For a one-sample test, i.e. inference about a single mean, we can estimate the population
mean for a particular variable using the sample mean. Since the Z- and t-tables give the values of a
distribution in standardised form, it is conventional to first standardise the observed sample mean.
This standardised form of the estimator can then be directly compared to the appropriate distribution.
In other words, we can assess where the test statistic appears under the distribution of H0 and
determine whether this is a likely outcome or not, given our original assumption.

4) Make a decision in terms of H0.


Based on the value of the test statistic and the defined rejection region(s), we will either reject the
null hypothesis or fail to reject the null hypothesis. The decision is made as follows:
o If the value of the test statistic falls inside the rejection region(s), we reject H0 in favour of H1
o If the value of the test statistic falls outside the rejection region(s), we fail to reject H0

Note that this decision is made in terms of H0 only. Recall that we cannot test if something is true,
i.e. we do not test whether H0 is true, we only test whether there is evidence to contradict it.
Therefore, we never say that we “accept H0” to be true.

5) Reach a conclusion in terms of H1.


The main objective of any hypothesis test is to test a particular problem statement. It is therefore
important to relate our findings back to our initial problem statement. Since such a statement is
formulated within the alternative hypothesis we will always conclude in terms of H1. If we reject H0
it implies that we have evidence in the data that contradicts our original assumption about the
distribution of the mean. On the other hand, if we fail to reject H0 we have not proved it to be true,
we simply have not been able to find sufficient evidence in favour of our statement. When we
conclude, we never say that we “Accept” H1, we only state one of the following:
o We either have sufficient evidence in favour of H1 at a given level of significance,
o Or we have insufficient evidence in favour of H1 at a given level of significance

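Testing a hypothesis about μ for a single sample where σ or σ² is known

One-sided test to the right
  Null hypothesis: $H_0: \mu = \mu_0$ OR $H_0: \mu \le \mu_0$
  Alternative hypothesis: $H_1: \mu > \mu_0$
  Critical value: $+z_{\alpha}$
  Rejection region: $z_{calc} > z_{\alpha}$

One-sided test to the left
  Null hypothesis: $H_0: \mu = \mu_0$ OR $H_0: \mu \ge \mu_0$
  Alternative hypothesis: $H_1: \mu < \mu_0$
  Critical value: $-z_{\alpha}$
  Rejection region: $z_{calc} < -z_{\alpha}$

Two-sided test
  Null hypothesis: $H_0: \mu = \mu_0$
  Alternative hypothesis: $H_1: \mu \ne \mu_0$
  Critical values: $-z_{\alpha/2}$ and $+z_{\alpha/2}$
  Rejection regions: $z_{calc} < -z_{\alpha/2}$ or $z_{calc} > z_{\alpha/2}$

Test statistic (all three cases): $z_{calc} = \dfrac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} \sim N(0, 1)$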
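
The five steps for this test can be sketched as follows. The function name `z_test`, the labels for the three forms of H1, and the numbers in the usage example are hypothetical and serve only as an illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_test(x_bar, mu_0, sigma, n, alpha, alternative):
    """One-sample z-test with sigma known.

    alternative is "greater", "less" or "two-sided" (the form of H1).
    Returns (z_calc, critical value, reject H0?).
    """
    # Step 3: standardise the observed sample mean under H0.
    z_calc = (x_bar - mu_0) / (sigma / sqrt(n))

    # Step 2: critical value for the chosen form of H1.
    # Step 4: decide whether z_calc falls in the rejection region.
    if alternative == "greater":
        z_crit = NormalDist().inv_cdf(1 - alpha)
        reject = z_calc > z_crit
    elif alternative == "less":
        z_crit = -NormalDist().inv_cdf(1 - alpha)
        reject = z_calc < z_crit
    elif alternative == "two-sided":
        z_crit = NormalDist().inv_cdf(1 - alpha / 2)
        reject = abs(z_calc) > z_crit
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return z_calc, z_crit, reject


# Hypothetical example: H0: mu = 50 against H1: mu > 50 at alpha = 0.05.
z_calc, z_crit, reject = z_test(x_bar=52, mu_0=50, sigma=6, n=36,
                                alpha=0.05, alternative="greater")
print(round(z_calc, 4), round(z_crit, 4), reject)
```

Step 1 (stating the hypotheses) and step 5 (the conclusion in terms of H1) remain written statements; the code only covers the calculation and the decision.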

Exercise 6.13
Choc Delights manufactures sachets of hot chocolate mix. In the past they have found that the filling
weights follow an approximate normal distribution with a mean filling weight of 20 grams and a standard
deviation of 0.05 grams. An unhappy customer claims that the filling weight is always below 20 grams. The
quality control manager sampled 36 sachets and found the mean filling weight to be 19.98 grams. Test the
customer’s claim using a 5% significance level.

1) Hypotheses

2) Critical value(s) and rejection region(s)

3) Test statistic

4) Decision

5) Conclusion

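Testing a hypothesis about μ for a single sample where σ or σ² is unknown

One-sided test to the right
  Null hypothesis: $H_0: \mu = \mu_0$ OR $H_0: \mu \le \mu_0$
  Alternative hypothesis: $H_1: \mu > \mu_0$
  Critical value: $+t_{n-1;\,\alpha}$
  Rejection region: $t_{calc} > t_{n-1;\,\alpha}$

One-sided test to the left
  Null hypothesis: $H_0: \mu = \mu_0$ OR $H_0: \mu \ge \mu_0$
  Alternative hypothesis: $H_1: \mu < \mu_0$
  Critical value: $-t_{n-1;\,\alpha}$
  Rejection region: $t_{calc} < -t_{n-1;\,\alpha}$

Two-sided test
  Null hypothesis: $H_0: \mu = \mu_0$
  Alternative hypothesis: $H_1: \mu \ne \mu_0$
  Critical values: $-t_{n-1;\,\alpha/2}$ and $+t_{n-1;\,\alpha/2}$
  Rejection regions: $t_{calc} < -t_{n-1;\,\alpha/2}$ or $t_{calc} > t_{n-1;\,\alpha/2}$

Test statistic (all three cases): $t_{calc} = \dfrac{\bar{x} - \mu_0}{s / \sqrt{n}} \sim t_{n-1}$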
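
A minimal sketch of this test statistic, using a hypothetical sample of 10 observations and a one-sided test to the right at α = 0.05. The critical value $t_{9;\,0.05} = 1.8331$ is read from Table 6.2; the data and the value of $\mu_0$ are invented for illustration.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical sample and hypothesised mean (H0: mu = 12, H1: mu > 12).
data = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]
mu_0 = 12

# Critical value t_{9; 0.05} from Table 6.2 (n - 1 = 9 degrees of freedom).
t_critical = 1.8331

n = len(data)
x_bar = mean(data)  # sample mean
s = stdev(data)     # sample standard deviation (divides by n - 1)

t_calc = (x_bar - mu_0) / (s / sqrt(n))
reject_h0 = t_calc > t_critical

print(round(t_calc, 4), reject_h0)
```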

Exercise 6.14
It is believed that the monthly return on an investment is normally distributed with a mean of 1.5%. A
portfolio manager believes that the mean monthly return is more than 1.5%. She collects a random sample of
nine months’ data. Test her claim at a 1% level of significance.

Return (%) –2.0 1.8 –1.5 3.5 –5.3 9.2 6.5 4.4 1.2

1) Hypotheses

2) Critical value(s) and rejection region(s)

3) Test statistic

4) Decision

5) Conclusion

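Testing a hypothesis about the mean difference for two independent samples

One-sided test to the right
  Null hypothesis: $H_0: \mu_1 = \mu_2$ OR $H_0: \mu_1 - \mu_2 = 0$ OR $H_0: \mu_1 \le \mu_2$ OR $H_0: \mu_1 - \mu_2 \le 0$
  Alternative hypothesis: $H_1: \mu_1 > \mu_2$ OR $H_1: \mu_1 - \mu_2 > 0$
  Critical value: $+t_{n_1+n_2-2;\,\alpha}$
  Rejection region: $t_{calc} > t_{n_1+n_2-2;\,\alpha}$

One-sided test to the left
  Null hypothesis: $H_0: \mu_1 = \mu_2$ OR $H_0: \mu_1 - \mu_2 = 0$ OR $H_0: \mu_1 \ge \mu_2$ OR $H_0: \mu_1 - \mu_2 \ge 0$
  Alternative hypothesis: $H_1: \mu_1 < \mu_2$ OR $H_1: \mu_1 - \mu_2 < 0$
  Critical value: $-t_{n_1+n_2-2;\,\alpha}$
  Rejection region: $t_{calc} < -t_{n_1+n_2-2;\,\alpha}$

Two-sided test
  Null hypothesis: $H_0: \mu_1 = \mu_2$ OR $H_0: \mu_1 - \mu_2 = 0$
  Alternative hypothesis: $H_1: \mu_1 \ne \mu_2$ OR $H_1: \mu_1 - \mu_2 \ne 0$
  Critical values: $-t_{n_1+n_2-2;\,\alpha/2}$ and $+t_{n_1+n_2-2;\,\alpha/2}$
  Rejection regions: $t_{calc} < -t_{n_1+n_2-2;\,\alpha/2}$ or $t_{calc} > t_{n_1+n_2-2;\,\alpha/2}$

Test statistic (all three cases): $t_{calc} = \dfrac{(\bar{x}_1 - \bar{x}_2) - 0}{s^{*} \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \sim t_{n_1+n_2-2}$

where $s^{*2} = \dfrac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}$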

Exercise 6.15
The Checkers store is concerned that the number of tills at their Morningside store is not sufficient for their
customers during busy times and as a result, customers have to wait for long periods in the queue for
payment. The management at the Checkers store in Morningside took a random sample of 12 customers
from their store (during a peak shopping period) and recorded each customer’s waiting time. They also
managed to get another similarly sized Checkers store to select 12 customers at random and to record their
waiting times during the same peak shopping period. Test at a 5% significance level whether the mean waiting
times of customers are different at the two stores. It is assumed that waiting time is normally distributed.

Morningside 3 5 2 6 5 4 10 7 5 9 6 6
Other store 3 4 3 5 4 5 7 8 9 7 2 2

1) Hypotheses

2) Critical value(s) and rejection region(s)

3) Test statistic

4) Decision

5) Conclusion
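
The pooled-variance test statistic for the two stores can be verified with a short script. This is only a sketch using Python's standard library; the critical value $t_{22;0.025} = 2.074$ is an assumed t-table reading entered by hand, and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, variance

morningside = [3, 5, 2, 6, 5, 4, 10, 7, 5, 9, 6, 6]
other_store = [3, 4, 3, 5, 4, 5, 7, 8, 9, 7, 2, 2]
n1, n2 = len(morningside), len(other_store)

# Pooled variance s*^2, weighting each sample variance by its degrees of freedom
pooled_var = ((n1 - 1) * variance(morningside)
              + (n2 - 1) * variance(other_store)) / (n1 + n2 - 2)

t_calc = (mean(morningside) - mean(other_store)) / (
    sqrt(pooled_var) * sqrt(1 / n1 + 1 / n2)
)

# Assumed t-table value for 22 degrees of freedom and an upper-tail area of 0.025
t_crit = 2.074
reject = abs(t_calc) > t_crit            # two-tailed test

print(round(pooled_var, 3), round(t_calc, 3), reject)
```

The statistic is roughly 0.795, which lies between the two critical values, so the data do not show a difference in mean waiting times at the 5% level.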

Testing a hypothesis about the mean difference for two dependent samples

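Null hypothesis | $H_0: \mu_d = 0$ OR $H_0: \mu_d \ge 0$ | $H_0: \mu_d = 0$ OR $H_0: \mu_d \le 0$ | $H_0: \mu_d = 0$
Alternative hypothesis | $H_1: \mu_d < 0$ | $H_1: \mu_d > 0$ | $H_1: \mu_d \ne 0$
Rejection region | left tail, $t_{calc} < -t_{n-1;\alpha}$ | right tail, $t_{calc} > t_{n-1;\alpha}$ | both tails, $t_{calc} < -t_{n-1;\alpha/2}$ or $t_{calc} > t_{n-1;\alpha/2}$
Critical value | $-t_{n-1;\alpha}$ | $t_{n-1;\alpha}$ | $-t_{n-1;\alpha/2}$ and $+t_{n-1;\alpha/2}$
Test statistic | $t_{calc} = \dfrac{\bar{d} - 0}{s_d / \sqrt{n}} \sim t_{n-1}$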

Exercise 6.16
A manager at an assembly plant wants to show that the productivity on Mondays is higher than the
productivity on Tuesdays. The manager measures productivity in terms of the number of electronic
components a worker assembles per day. He assumes that the number of components assembled per day is
normally distributed. Five workers were randomly selected and the number of components assembled on
Monday and Tuesday for each of them was recorded. Test the appropriate hypothesis at the 10% level of
significance.

Worker Monday Tuesday D=
A 100 102
B 110 95
C 118 100
D 92 98
E 115 103

1) Hypotheses

2) Critical value(s) and rejection region(s)

3) Test statistic

4) Decision

5) Conclusion
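
A paired test works on the per-worker differences. The sketch below assumes the differences are taken as D = Monday - Tuesday (the direction that matches a claim of higher Monday productivity) and uses an assumed t-table reading of $t_{4;0.10} = 1.533$; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

monday = [100, 110, 118, 92, 115]
tuesday = [102, 95, 100, 98, 103]

# Assumed direction of the differences: D = Monday - Tuesday
d = [m - t for m, t in zip(monday, tuesday)]
n = len(d)

d_bar = mean(d)
s_d = stdev(d)                            # sample standard deviation of the differences
t_calc = (d_bar - 0) / (s_d / sqrt(n))

# Assumed t-table value for 4 degrees of freedom and an upper-tail area of 0.10
t_crit = 1.533
reject = t_calc > t_crit                  # right-tailed test: H1: mu_d > 0

print(d, round(d_bar, 1), round(s_d, 3), round(t_calc, 2), reject)
```

The statistic is roughly 1.54, only just above the critical value, so H0 is rejected at the 10% level but the margin is narrow.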

Testing a hypothesis about p for a single sample

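Null hypothesis | $H_0: p = p_0$ OR $H_0: p \ge p_0$ | $H_0: p = p_0$ OR $H_0: p \le p_0$ | $H_0: p = p_0$
Alternative hypothesis | $H_1: p < p_0$ | $H_1: p > p_0$ | $H_1: p \ne p_0$
Rejection region | left tail, $z_{calc} < -z_{\alpha}$ | right tail, $z_{calc} > z_{\alpha}$ | both tails, $z_{calc} < -z_{\alpha/2}$ or $z_{calc} > z_{\alpha/2}$
Critical value | $-z_{\alpha}$ | $z_{\alpha}$ | $-z_{\alpha/2}$ and $+z_{\alpha/2}$
Test statistic | $z_{calc} = \dfrac{\hat{p} - p_0}{\sqrt{\dfrac{p_0 (1 - p_0)}{n}}} \sim N(0, 1)$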

Exercise 6.17
A new magazine is being launched in Gauteng. The publisher feels that for the magazine to be a success it
would have to be read by more than 15% of the market. She conducts a market survey where respondents
are given a description of the magazine and the approximate cost. In a sample of 400 residents in Gauteng
72 said that they would subscribe to the magazine. Test the publisher's claim that the magazine will be a
success at a 5% level of significance.

1) Hypotheses

2) Critical value(s) and rejection region(s)

3) Test statistic

4) Decision

5) Conclusion
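
The proportion test for the magazine survey can be sketched in a few lines of Python. Only the standard library is used, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n = 400
subscribers = 72
p_0 = 0.15
alpha = 0.05

p_hat = subscribers / n
# The standard error uses p_0 because the statistic is computed under H0
z_calc = (p_hat - p_0) / sqrt(p_0 * (1 - p_0) / n)

z_crit = NormalDist().inv_cdf(1 - alpha)   # +z_alpha for a right-tailed test
reject = z_calc > z_crit

print(p_hat, round(z_calc, 2), round(z_crit, 4), reject)
```

Here the sample proportion is 0.18 and the statistic is about 1.68, slightly above the critical value of about 1.645, so H0 is rejected at the 5% level.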

6.9 Using a Confidence Interval to Test a Hypothesis

Confidence intervals can be used to test a two-sided hypothesis about a population parameter. A CI provides
an interval of plausible estimates of the population parameter. Therefore, with a certain level of confidence,
we can say that any number contained inside the interval could be the true population parameter. If we
therefore assume that a certain value is true to begin with, i.e. the null hypothesis, and that the number is
contained in the CI, we cannot reject the null hypothesis. On the other hand, if the value in the null
hypothesis is not one of the values we estimated using the CI it is reasonable to assume that it is not valid
and the null hypothesis can be rejected at a certain level of significance.

In general:
• If the value in H0 is included in the 100(1 – α)% CI, we fail to reject H0 at the α level of
significance
• If the value in H0 is NOT included in the 100(1 – α)% CI, we reject H0 at the α level of
significance
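
The rule above amounts to a simple membership check. The sketch below illustrates it for a z-based interval for a mean; the function names and all the numbers are hypothetical and do not come from any of the exercises.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(x_bar, sigma, n, alpha):
    """100(1 - alpha)% confidence interval for mu when sigma is known."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin


def reject_h0_using_ci(ci, null_value):
    """Two-sided test: reject H0 only if the H0 value lies outside the CI."""
    lower, upper = ci
    return not (lower <= null_value <= upper)


# Hypothetical sample summary, chosen only for illustration
ci = z_confidence_interval(x_bar=52.0, sigma=4.0, n=16, alpha=0.05)
print([round(limit, 2) for limit in ci])
print(reject_h0_using_ci(ci, null_value=50.0))  # 50 lies just below the lower limit
print(reject_h0_using_ci(ci, null_value=51.0))  # 51 lies inside the interval
```

The interval is roughly (50.04, 53.96), so a hypothesised mean of 50 is rejected at the 5% level while a hypothesised mean of 51 is not.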

Refer to all the examples where we calculated a confidence interval. For each of these, use the CIs to state
and test the following hypotheses.

Exercise 6.18
Consider the debt of 36 debtors, as outlined in Exercise 6.8, and test if the average debt is different from
R16000.
1) The 90% CI for the average debt:

2) Hypothesis test

Exercise 6.19
Consider the data pertaining to the monthly service fee (in Rands) of a random sample of 10 customers from
a certain bank, as outlined in Exercise 6.9. The bank states that the average monthly service fee of their
customers is R50. A customer claims that this is not true.
1) The 95% CI for the average monthly service fee:

2) Hypothesis test

Exercise 6.20
Consider the comparison between two pizza parlours (local vs. rival) in terms of the average delivery times,
as outlined in Exercise 6.10. Test whether there is a difference between the average delivery times of the two
pizzerias.
1) The 95% CI for the difference in the mean delivery times:

2) Hypothesis test

Exercise 6.21
Consider the data comparing the rate of absenteeism before and after implementing an incentive program, as
outlined in Exercise 6.11. Test whether the incentive program had an impact.
1) The 95% CI for the difference in mean days absent (D = B – A):

2) Hypothesis test

Exercise 6.22
Consider the data for the proportion of drivers wearing seatbelts, as outlined in Exercise 6.12. Test if the
percentage of drivers wearing seatbelts is not 20%.
1) The 95% CI for the proportion of all drivers wearing seatbelts:

2) Hypothesis test

Chapter 7 p-VALUES

7.1 Introduction

In the previous chapter we outlined a very structured five-step approach to test hypotheses about population
parameters. In following this approach we had to specify our level of significance prior to performing the
test. The rejection region(s) is then determined based on the chosen α.

Suppose that IQ scores of a certain population are assumed to be normally distributed with a mean of 100
and a variance of 144, but you claim that the mean is actually greater than 100. A random sample of 25
people from this target population yields an average IQ of 104. Do the data support your claim if we test at
the 1% level of significance?

For this one-sample z-test, and α = 0.01, the five steps are as follows:
1) H0: μ = 100 vs. H1: μ > 100
2) The rejection region is the specific area under the z-curve: (2.3262, ∞)
3) The value of the test statistic is equal to 1.67
4) Therefore we fail to reject H0 in favour of H1
5) At the 1% level of significance there is insufficient evidence to support the claim that the average IQ
score exceeds 100

But what if we decided upfront to test this hypothesis at the 5% level of significance? This will change the
critical value and could have an impact on the results. The five steps will then be as follows:
1) H0: μ = 100 vs. H1: μ > 100
2) The rejection region is the specific area under the z-curve: (1.6449, ∞)
3) The value of the test statistic is equal to 1.67
4) Therefore we reject H0 in favour of H1
5) At the 5% level of significance there is sufficient evidence to support the claim that the average IQ
score exceeds 100

For this example, changing the level of significance resulted in different conclusions. This approach does
not allow us to assess the strength of our evidence directly. To find the level of significance at which the
conclusion changes, we would have to redo parts of the hypothesis test for each different α, which is time
consuming and impractical.
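
The IQ example can be reproduced with a short loop over the two levels of significance. This is a sketch using Python's standard library; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu_0 = 100
sigma = sqrt(144)      # the variance is 144, so sigma = 12
n = 25
x_bar = 104

z_calc = (x_bar - mu_0) / (sigma / sqrt(n))

decisions = {}
for alpha in (0.01, 0.05):
    z_crit = NormalDist().inv_cdf(1 - alpha)   # right-tailed critical value
    decisions[alpha] = z_calc > z_crit         # True means H0 is rejected
    print(f"alpha = {alpha}: z_calc = {z_calc:.2f}, "
          f"critical value = {z_crit:.3f}, reject H0 = {decisions[alpha]}")
```

The same test statistic of about 1.67 leads to "fail to reject" at the 1% level and "reject" at the 5% level, because only the critical value changes.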

An alternative approach is to calculate a value called the p-value, which is used to measure the strength of
our evidence and avoid having to choose a specific level of significance. Most statistical software packages
provide p-values as part of the output.

The objectives of this chapter include:
• Calculate p-values for selected hypothesis tests
• Interpret p-values

7.2 Defining p-values

Formally defined, the p-value is the probability of getting a test statistic equal to or more extreme than the
sample result, under the assumption that the null hypothesis is true. It indicates how far out in the tail the
observed value is. The p-value is often referred to as the observed level of significance and is the smallest
level at which H0 can be rejected. If the p-value is small, the test statistic lies far out in the tail in the
direction of H1, so it is likely to fall inside the rejection region for commonly used values of α. Therefore we
are more likely to reject H0 as we have some evidence in favour of H1. If the p-value is large, the test statistic
lies close to what we would expect under H0, so it is unlikely to fall inside the rejection region and we are
more likely to fail to reject H0.

So, instead of specifying α upfront we can calculate the p-value and interpret it. There are two ways of
interpreting a p-value. We can either classify the value according to the strength of the evidence in favour of
H1 or we can compare the value with a chosen level of significance α.

Classifying the p-value:


p-value Conclusion
(0, 0.01) Very strong evidence in favour of H1
(0.01, 0.05) Strong evidence in favour of H1
(0.05, 0.10) Moderate evidence in favour of H1
(0.10, 0.20) Weak evidence in favour of H1
(0.20, 1) Insufficient evidence in favour of H1

Comparing the p-value to α:


Comparison Decision
p-value < α Reject H0 at the α level of significance
p-value ≥ α Fail to reject H0 at the α level of significance

The steps involved in a hypothesis test using p-values are as follows:
1) State the null and alternative hypotheses
2) Calculate the test statistic
3) Calculate the p-value
4) Reach a conclusion
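
The two ways of interpreting a p-value can be written as small helper functions. In this sketch the function names are hypothetical, the interval boundaries are assigned to the higher category (an assumption, since the classification table uses open intervals), and the p-value of 0.03 is made up for illustration.

```python
def classify_p_value(p_value):
    """Strength of evidence in favour of H1, following the classification table."""
    if p_value < 0.01:
        return "very strong evidence in favour of H1"
    if p_value < 0.05:
        return "strong evidence in favour of H1"
    if p_value < 0.10:
        return "moderate evidence in favour of H1"
    if p_value < 0.20:
        return "weak evidence in favour of H1"
    return "insufficient evidence in favour of H1"


def decide(p_value, alpha):
    """Compare the p-value with a chosen level of significance."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


p_value = 0.03  # hypothetical p-value
print(classify_p_value(p_value))
print(decide(p_value, alpha=0.05), "/", decide(p_value, alpha=0.01))
```

A single p-value therefore answers the question for every α at once: 0.03 leads to rejection at the 5% level but not at the 1% level.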

7.3 Calculating p-values

The p-value calculation depends on the form of the alternative hypothesis. To illustrate this we will use
notation for a one-sample z-test as an example, where the test statistic is denoted by zcalc. The p-value is
calculated directly from the z-tables. The notation can easily be adapted for t-tests and the p-value calculated
using the t-tables. Note: for the one-sided tests the area of interest depends on the direction of the
hypothesis and not on the sign of the test statistic value.

Alternative hypothesis | Calculation
$H_1: \mu > \mu_0$ (one-sided test to the right) | p-value $= P(Z > z_{calc})$
$H_1: \mu < \mu_0$ (one-sided test to the left) | p-value $= P(Z < z_{calc})$
$H_1: \mu \ne \mu_0$ (two-sided test) | p-value $= P(Z > |z_{calc}|) + P(Z < -|z_{calc}|)$

For example, suppose that we observe a test statistic value $z_{calc} = 1.67$. The following table shows the area
of interest under the Z-curve and the calculation of the p-value for all three forms of H1.

Alternative hypothesis | Area of interest | p-value calculation
$H_1: \mu > \mu_0$ | area to the right of 1.67 | p-value $= P(Z > 1.67)$
$H_1: \mu < \mu_0$ | area to the left of 1.67 | p-value $= P(Z < 1.67)$
$H_1: \mu \ne \mu_0$ | area to the right of 1.67 plus area to the left of -1.67 | p-value $= P(Z > 1.67) + P(Z < -1.67)$
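
The three calculations can be reproduced without z-tables. The sketch below uses `statistics.NormalDist` from Python's standard library; the function name `z_p_value` and the tail labels are illustrative assumptions.

```python
from statistics import NormalDist


def z_p_value(z_calc, tail):
    """p-value of a z-test; tail is "right", "left" or "two" (hypothetical labels)."""
    std_normal = NormalDist()  # N(0, 1)
    if tail == "right":
        return 1 - std_normal.cdf(z_calc)          # P(Z > z_calc)
    if tail == "left":
        return std_normal.cdf(z_calc)              # P(Z < z_calc)
    return 2 * (1 - std_normal.cdf(abs(z_calc)))   # both tails beyond |z_calc|


for tail in ("right", "left", "two"):
    print(tail, round(z_p_value(1.67, tail), 4))
```

For a test statistic of 1.67 this gives roughly 0.0475 (right), 0.9525 (left) and 0.0949 (two-sided), which shows how strongly the p-value depends on the form of H1.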

Exercise 7.1
For a one-sample z-test, suppose we observe zcalc  1.23 . Identify the areas of interest and calculate the p-
values associated with this test statistic for each of the three different forms of the alternative hypothesis.

Alternative hypothesis | Area of interest | p-value calculation
$H_1: \mu > \mu_0$ | |
$H_1: \mu < \mu_0$ | |
$H_1: \mu \ne \mu_0$ | |

Exercise 7.2
Let us revisit the example of IQ scores of a certain population that are assumed to be normally distributed
with a mean of 100 and a variance of 144 (example at the beginning of this chapter). A random sample of 25
people from this target population yields an average IQ of 104. Use p-values to test the claim that the mean
is actually greater than 100.

1) H0:
H1:

2) Test statistic:

3) p-value =

4) Conclusion:

Exercise 7.3
Recall that when we performed this hypothesis test at the beginning of the chapter we failed to reject H0 at
the 1% level of significance, but we rejected H0 at the 5% level of significance. Compare the conclusions
reached using p-values with these results.

Exercise 7.4
For the IQ data outlined in Exercise 7.2, use p-values to test the hypothesis that the mean IQ is less than 100.
1) H0:
H1:

2) Test statistic =

3) p-value =

4) Conclusion:

Exercise 7.5
For the IQ data outlined in Exercise 7.2, use p-values to test the hypothesis that the mean IQ is not equal to
100.
1) H0:
H1:

2) Test statistic =

3) p-value =

4) Conclusion:

Note that we reach different conclusions with each of the three different forms of the alternative hypothesis.
From a logical perspective this is correct. For example, consider the two separate one-sided tests. It is not
possible to have strong evidence that the mean IQ is greater than 100, and at the same time have strong
evidence that the mean IQ is less than 100. The approach in calculating the respective p-values for the two
different forms reflects what is contained in the data. Therefore it is very important to interpret a p-value in
terms of how the alternative hypothesis is formulated.

We will now consider all the hypothesis test examples from Chapter 6, Section 6.8. Calculate the p-values
for each of these tests and compare these findings with the previous results obtained.

Exercise 7.6
Refer to the Choc Delights example in Exercise 6.13.
1) H0: μ = 20
H1: μ < 20

2) Test statistic: $z_{calc} = -2.4$

3) p-value =

4) Conclusion:

5) Comparison:

Exercise 7.7
Refer to the monthly return on investment example in Exercise 6.14.
1) H0: μ = 1.5
H1: μ > 1.5

2) Test statistic: $t_{calc} = 0.32$

3) p-value =

4) Conclusion:

5) Comparison:

Exercise 7.8
Refer to the Checkers stores (Morningside vs. Other) example in Exercise 6.15.
1) H0: μM = μO
H1: μM ≠ μO

2) Test statistic: $t_{calc} = 0.795$

3) p-value =

4) Conclusion:

5) Comparison:

Exercise 7.9
Refer to the Productivity (Monday vs. Tuesday) example in Exercise 6.16.
1) H0: μd = 0
H1: μd > 0

2) Test statistic: $t_{calc} = 1.54$

3) p-value:

4) Conclusion:

5) Comparison:

Exercise 7.10
Refer to the new magazine subscription example in Exercise 6.17.
1) H0: p = 0.15
H1: p > 0.15

2) Test statistic: $z_{calc} = 1.68$

3) p-value =

4) Conclusion:

5) Comparison:

Chapter 8 CHI-SQUARED TEST OF INDEPENDENCE

8.1 Introduction

In Chapter 6 we looked at testing hypotheses for a single population mean, the difference between
population means, and a single proportion. In all of these tests the population parameters are numerical
summary measures, namely a mean or proportion, and our objective is to test statements about these
parameters using sample information. These particular problem statements require the original observed data
to be numerical so that we can calculate an average, or categorical for which we find the proportion (also a
numerical value).

When analysing data we are often interested in investigating how two variables behave together, i.e.
bivariate analysis. When both variables are numerical we can use correlation or regression to determine the
strength and nature of the relationship. However, if both variables are categorical, neither correlation (or
regression) nor any of the hypothesis tests we have done so far can be used to assess the relationship
between the variables. Such variables include categorical measures, for example gender or marital status, or
numerical variables that have been categorised, such as a person’s age grouped into age categories.

To address this particular problem statement we use the Chi-squared test of independence, denoted as χ².
This is a hypothesis test used to determine if two categorical variables are dependent or related or associated.
For example, we may wish to test whether there is an association between income level and brand
preference, or if the size of the washing machine bought is related to the size of the family.

The Chi-squared test is based on a table of observed counts from bivariate data. This table of observed
counts is called an observed frequency table or a contingency table. It is a two-dimensional table where one
variable is presented in rows and the other variable in columns. The row variable has r categories and the
column variable has c categories. The values of both variables divide the table into r × c non-overlapping
cells, and each item in the sample will fall into one and only one of these cells. Analysis of the contingency
table is based on the number of observations in the different categories, rather than the mean or proportion.

Chi-squared tests are nonparametric or distribution-free in nature. This means that we do not need to make
any assumptions about the form of the population distribution from which the samples are drawn. The tests
that were carried out in Chapter 6 made the assumption that the samples are drawn from a specified
distribution or assumed distribution, such as the normal distribution. Such tests are referred to as parametric
tests.
The objectives of this chapter include:
• Identify the specific hypothesis test
• Know how to read the Chi-squared table
• Perform the Chi-squared hypothesis test
• Interpret the results

8.2 Chapter Formulae

Expected value / frequency | $e_{ij} = \dfrac{(\text{row } i \text{ total})(\text{column } j \text{ total})}{\text{sample size}}$
Chi-squared test statistic | $\chi^2 = \sum_{\text{over all cells}} \dfrac{(O - E)^2}{E}$, where O = Observed frequency and E = Expected frequency

8.3 Steps in the Hypothesis Test

As the χ² test is a hypothesis test we follow the same procedure as with any other hypothesis test. As before
there are five steps in this analysis. The main differences between this test and the previous tests are the
formulation of the hypotheses, the set of tables used to find the rejection region, and the formula for the test
statistic value.

Step 1: State the null and alternative hypotheses


The hypotheses are always of the following form:
H0: The two variables are independent or not related or not associated
H1: The two variables are dependent or related or associated

For a practical application, you are required to write out the names of the two categorical variables. For
example, stating that “Income level and brand preference are independent” instead of “The two variables are
independent”.

Step 2: Find the critical value


The Chi-squared distribution is skewed to the right and the rejection region is always under the right tail area
of the distribution. Under the assumption of the null hypothesis the test statistic follows a $\chi^2$ distribution
with $(r - 1)(c - 1)$ degrees of freedom. We use the Chi-squared table to find the critical value. This table
shows the degrees of freedom in the first column, selected upper-tail probabilities in the first row, and the
Chi-squared values in the body of the table.
TABLE 3: Critical values of χ²
For a particular number of degrees of freedom, each entry represents the
critical value of χ² corresponding to a specified upper-tail area (α)

Degrees of freedom | Upper-tail areas (α): 0.995 | 0.99 | 0.975 | 0.95 | 0.90 | 0.75 | 0.25 | 0.10 | 0.05 | 0.025 | 0.01 | 0.005
1 | | | 0.001 | 0.004 | 0.016 | 0.102 | 1.323 | 2.706 | 3.841 | 5.024 | 6.635 | 7.879
2 | 0.010 | 0.020 | 0.051 | 0.103 | 0.211 | 0.575 | 2.773 | 4.605 | 5.991 | 7.378 | 9.210 | 10.597
3 | 0.072 | 0.115 | 0.216 | 0.352 | 0.584 | 1.213 | 4.108 | 6.251 | 7.815 | 9.348 | 11.345 | 12.838
...
30 | 13.787 | 14.953 | 16.791 | 18.493 | 20.599 | 24.478 | 34.800 | 40.256 | 43.773 | 46.979 | 50.892 | 53.672

The critical value depends on the level of significance (α) and the degrees of freedom:
$\chi^2_{(r-1)(c-1);\alpha}$

Step 3: Calculate the test statistic


The Chi-squared test statistic is calculated by comparing the differences between the observed and expected
frequencies under the assumption that the null hypothesis is true. Each cell in the contingency table contains
an observed frequency, denoted by $o_{ij}$ for row i and column j. The expected frequencies are the
counts that we would expect to get in the contingency table if the two variables are independent. The
expected frequency for row i and column j, denoted by $e_{ij}$, is calculated as follows:

$e_{ij} = \dfrac{(\text{row } i \text{ total})(\text{column } j \text{ total})}{\text{sample size}}$

The formula for the test statistic is:

$\chi^2 = \sum_{\text{for all } i,j} \dfrac{(o_{ij} - e_{ij})^2}{e_{ij}}$

To ensure that the Chi-squared test is valid the expected frequency in each cell must be at least 5. In
practice it is not always possible to satisfy this constraint. At the very least, no more than 20% of cells
should have an expected frequency less than 5, and all expected frequencies should be at least 1.
Alternatively we could group some of the categories of a variable or use algorithms that can account for this.
However, this is beyond the scope of the course.
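
The expected-frequency formula and the relaxed validity rule (no more than 20% of cells below 5, none below 1) can be sketched as follows. The function names and the 2 x 3 table of counts are hypothetical and serve only as an illustration.

```python
def expected_frequencies(observed):
    """e_ij = (row i total)(column j total) / sample size for every cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def expected_counts_ok(expected):
    """Relaxed rule: at most 20% of cells below 5 and every cell at least 1."""
    cells = [e for row in expected for e in row]
    share_below_5 = sum(e < 5 for e in cells) / len(cells)
    return min(cells) >= 1 and share_below_5 <= 0.20


# Hypothetical observed counts (2 rows, 3 columns), for illustration only
observed = [[12, 30, 8],
            [3, 10, 2]]
expected = expected_frequencies(observed)

for row in expected:
    print([round(e, 3) for e in row])
print(expected_counts_ok(expected))
```

In this made-up table two of the six expected frequencies fall below 5, which is more than 20% of the cells, so the check fails and categories would have to be combined before the test is applied.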
Step 4: Make a decision in terms of H0
The calculated test statistic is compared to the critical value and rejection region. If the value falls inside the
rejection region, i.e. the test statistic > critical value, we will reject the null hypothesis in favour of the
alternative. If the value falls outside the rejection region (test statistic < critical value), we fail to reject the
null hypothesis in favour of the alternative.

Step 5: State the conclusion in terms of H1


Based on the decision made in Step 4 we must state the conclusion in terms of the problem statement, which
is contained in the alternative hypothesis. Therefore we need to state whether we have sufficient evidence to
support H1 or not.

Reason why the Chi-squared test works for this hypothesis test
The null hypothesis is the neutral hypothesis and what we assume to be true to begin with. For this test we
assume that the two categorical variables are independent. We then look at the data to see if we can find
evidence to the contrary. In the test statistic calculated under the assumption that H0 is true, we compare
observed and expected frequencies. The latter are calculated based on statistical independence. Recall that if
two events A and B are statistically independent, then $P(A \cap B) = P(A) P(B)$. Therefore, if the levels of
two categorical variables are independent of each other then the probability of any intersection is simply the
product of the two marginal probabilities. This fact forms the basis of the expected frequency calculation.

To illustrate this, let n denote the total sample size:


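$e_{ij} = n \times P(\text{row } i \cap \text{column } j)$
$\quad = n \times P(\text{row } i) \times P(\text{column } j)$
$\quad = n \times \left( \dfrac{\text{row } i \text{ total}}{n} \right) \times \left( \dfrac{\text{column } j \text{ total}}{n} \right)$
$\quad = \dfrac{(\text{row } i \text{ total})(\text{column } j \text{ total})}{n}$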

If the two variables are in fact independent the frequencies that we observe should be very close to the
expected frequencies. As a result the test statistic value will be relatively small and therefore unlikely to fall
inside the rejection region. However, if the variables are dependent, what we observe compared to what we
expect will be notably different. The resulting test statistic will be a much larger value and is more likely to
fall inside the rejection region, leading us to reject the null hypothesis of assumed independence.

Exercise 8.1
An advertising company would like to utilise the Internet for their adverts. To better understand who they
will target through Internet advertisement, they want to know if the intensity of Internet usage is dependent
on whether the user is male or female. They conducted a survey on a random sample of 300 Internet users
and recorded the gender and the usage intensity (Light vs. Heavy). Fifty percent of the users were male, and
200 of the 300 respondents were heavy users. Of the females, 55 were light users. Test the appropriate
hypothesis at the 1% level of significance.

The two variables of interest are both categorical. Based on the information provided we need to construct a
contingency table with observed frequencies:

OBSERVED Male Female TOTAL


Light users
Heavy users
TOTAL

1) State hypotheses

2) Critical value

3) Test statistic
Expected frequencies =

Test statistic =

4) Decision

5) Conclusion
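
One way to check the working for the Internet usage example is to rebuild the contingency table from the description and compute the statistic directly. The sketch below uses plain Python; the variable names are illustrative, and the critical value 6.635 is the Table 3 entry for 1 degree of freedom and an upper-tail area of 0.01.

```python
n = 300
males = females = n // 2                  # fifty percent of the users were male
heavy_total = 200
light_total = n - heavy_total

female_light = 55
female_heavy = females - female_light
male_light = light_total - female_light
male_heavy = males - male_light

# Rows: light users, heavy users; columns: male, female
observed = [[male_light, female_light],
            [male_heavy, female_heavy]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi2_calc = sum((o - e) ** 2 / e
                for obs_row, exp_row in zip(observed, expected)
                for o, e in zip(obs_row, exp_row))

chi2_crit = 6.635                         # Table 3: df = (2 - 1)(2 - 1) = 1, alpha = 0.01
reject = chi2_calc > chi2_crit

print(observed, expected, chi2_calc, reject)
```

The observed counts are 45 and 55 light users and 105 and 95 heavy users, against expected counts of 50 and 100 per cell, which gives a test statistic of 1.5. This is far below the critical value, so H0 is not rejected at the 1% level.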

Exercise 8.2
A certain corporation is interested in determining whether a relationship exists between the commuting time
of its employees and the level of stress-related problems observed on the job. A study of 123 employees
reveals the following:

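OBSERVED | Stress Level: High | Stress Level: Moderate | Stress Level: Low | TOTAL
Commuting Time: Under 15 minutes | 10 | 6 | 19 | 35
Commuting Time: 15 – 45 minutes | 15 | 9 | 29 | 53
Commuting Time: Over 45 minutes | 20 | 7 | 8 | 35
TOTAL | 45 | 22 | 56 | 123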

Test the hypothesis at the 5% level of significance. Round off all intermediate and final calculations to three
decimal places.

1) State hypotheses

2) Critical value

3) Test statistic
Find the missing expected values in the following table:
OBSERVED (EXPECTED) | High | Moderate | Low
Under 15 minutes | 10 ( ) | 6 (6.260) | 19 (15.935)
15 – 45 minutes | 15 ( ) | 9 (9.480) | 29 (24.130)
Over 45 minutes | 20 (12.805) | 7 (6.260) | 8 (15.935)

Test statistic =

4) Decision

5) Conclusion
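
The expected frequencies and the test statistic for the commuting time example can be checked with the sketch below. It works at full precision, so the result may differ slightly in the third decimal from a hand calculation that rounds every intermediate value. The critical value 9.488 for 4 degrees of freedom and α = 0.05 is an assumed reading from a standard Chi-squared table, and the variable names are illustrative.

```python
# Rows: under 15, 15 to 45, over 45 minutes; columns: high, moderate, low stress
observed = [[10, 6, 19],
            [15, 9, 29],
            [20, 7, 8]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

expected = [[r * c / n for c in col_totals] for r in row_totals]

chi2_calc = sum((o - e) ** 2 / e
                for obs_row, exp_row in zip(observed, expected)
                for o, e in zip(obs_row, exp_row))

df = (len(observed) - 1) * (len(observed[0]) - 1)
chi2_crit = 9.488                         # assumed table value for df = 4, alpha = 0.05
reject = chi2_calc > chi2_crit

for row in expected:
    print([round(e, 3) for e in row])
print(df, round(chi2_calc, 3), reject)
```

The two missing expected values come out at about 12.805 and 19.390, and the test statistic is approximately 11.3. This exceeds the critical value, so H0 is rejected at the 5% level.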

8.4 Using p-values

The Chi-squared test can also be performed using p-values. To do this we follow the same approach as
discussed in Chapter 7, namely:
1) State the null and alternative hypotheses.
2) Calculate the test statistic.
3) Calculate the p-value = $P\left( \chi^2_{(r-1)(c-1)} > \chi^2_{calc} \right)$.
4) Reach a conclusion.
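
Chi-squared tables list only a few upper-tail areas, so an exact p-value usually comes from software. The sketch below is one possible standard-library approach and is not the method used in the notes: it relies on the known closed-form expressions for the upper-tail probability of a Chi-squared distribution with a whole number of degrees of freedom. The function name `chi2_upper_tail` is hypothetical. As a sanity check it is evaluated at critical values from Table 3, where the result should be close to the corresponding upper-tail area.

```python
from math import exp, sqrt
from statistics import NormalDist


def chi2_upper_tail(x, df):
    """P(chi-squared with df degrees of freedom > x), for integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term = 1.0
        total = 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return exp(-x / 2) * total
    # Odd df: two-sided normal tail plus a finite correction sum
    z = sqrt(x)
    std_normal = NormalDist()
    tail = 2 * (1 - std_normal.cdf(z))
    term = z                      # x^(1/2) / 1, the r = 1 term
    correction = 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        correction += term
    return tail + 2 * std_normal.pdf(z) * correction


# Critical values from Table 3 with an upper-tail area of 0.05
for df, critical_value in ((1, 3.841), (2, 5.991), (3, 7.815)):
    print(df, round(chi2_upper_tail(critical_value, df), 4))
```

Each printed value should be close to 0.05. Passing a calculated test statistic and its degrees of freedom to the same function gives the p-value required in step 3.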

Exercise 8.3
Use p-values to test the appropriate hypothesis for the Internet advertising example in Exercise 8.1.
Compare the findings from the two different approaches.

1) H0: Gender and the usage intensity are independent


H1: Gender and the usage intensity are dependent

2) Test statistic =

3) p-value =

4) Conclusion:

5) Comparison:

Exercise 8.4
Use p-values to test the appropriate hypothesis for the employee commuting time and stress levels example
in Exercise 8.2. Compare the findings from the two different approaches.

1) H0: Commuting time and stress levels are independent


H1: Commuting time and stress levels are dependent

2) Test statistic =

3) p-value =

4) Conclusion:

5) Comparison:
