Stats ch8
8 THE NORMAL DISTRIBUTION
Objectives
After studying this chapter you should
Statistical tables are available in many books and can also be found
online.
You should note that the accuracy of your solutions will depend on the
tables (or graphic calculator) you are using.
8.0
Introduction
The tallest accurately recorded human being was Robert Wadlow in the
USA. On his death at the age of 22 he was 272 cm (8 feet 11.1 inches)
tall. If you were an architect and you had to design doorways in a
building you would clearly not make them all 9 feet high - most ceilings
are lower than this!
What height should the ceilings be?
In 1980 the Government commissioned a survey, carried out on 10 000
adults in Great Britain. They found that the average height was 167.3 cm
with SD (standard deviation) 9.1 cm. You cannot make a door size that
everyone can fit through but what height of door would 95% of people get
through without stooping? This chapter should help you find the answer.
Activity 1
Data collection
There are many sets of data you could collect from people in your group,
such as heights, weights, length of time breath can be held, etc. However,
you will need about 100 results to do this activity properly so here are a
few suggestions where large quantities of data can be collected quickly.
1.
Lengths of leaves
Evergreen bushes such as laurel are useful - though make sure all the
leaves are from the same year's growth.
2.
3.
Pieces of string
Look at 10 cm on a ruler and then take a ball of string and try
to cut 100 lengths of 10 cm by guessing. Measure the lengths
of all the pieces in mm.
4.
Weights of apples
If anyone has apple trees in their garden they are bound to
have large quantities in the autumn.
5.
6.
Game of bowls
Make a line with a piece of rope on the grass about 20 metres
away. Let everyone have several goes at trying to land a tennis
ball on the line. Measure how far each ball is from the line.
Try at least two of these activities. You will need about 100 results
in all. To look at the data it would help to have a data handling
package on a computer.
8.1
The frog data shown below give the length from top to
tail (in millimetres) of a large group of frogs. The data have been run
through a computer package so you can see some useful facts about
them.
In the computer analysis you will see that most of the frogs are
close to the mean value, with fewer at the extremes. This 'bell-shaped'
pattern of distribution is typical of data which follows a
normal distribution. To obtain a perfectly shaped and symmetrical
distribution you would need to measure thousands of frogs.
Does your data follow a 'bell-shaped' pattern?
You may notice that median ≈ mean ≈ mode, as might be
expected for a symmetrical distribution. From the analysis of data
you also see that the mean is 90.9 mm and the standard deviation is
11.7 mm. Now look at how much of the data is close to the mean,
i.e. within one standard deviation of it. From the stem and leaf
table you can see that 74 frogs have a length within one standard
deviation above the mean and 59 within one SD below the mean.
Altogether, 133 of the 200 frogs are within ± one SD of the mean, which
is 66.5%.
Frog Data
The data below show the length from top to tail in millimetres of a large group of frogs.
83
91
73
87
103
65
95
80
84
90
91
113
85
91
76
83
80
72
82
69
100
103
96
79
97
96
98
84
85
91
86
87
117
94
104
98
78
97
97
77
96
87
104
86
92
96
97
82
107
93
97
91
104
91
101
108
53
110
72
72
105
97
95
94
95
81
102
77
92
87
80
91
95
91
89
68
71
79
91
111
100
101
108
97
89
100
107
118
77
96
96
91
60
95
118
99
102
82
98
90
85
94
79
85
96
102
96
94
88
102
91
105
87
121
98
102
92
91
113
79
90
98
90
104
89
84
115
78
110
75
80
97
81
97
75
74
95
96
81
90
84
97
94
88
78
96
97
86
76
78
104
88
95
88
75
103
92
94
91
109
93
111
73
74
80
101
90
117
100
68
87
90
84
70
98
85
98
90
102
75
120
92
88
87
81
74
95
89
86
62
80
84
95
87
73
86
84
96
92
81
86
82
97
105
120
N     MEAN     MEDIAN   TRMEAN   STDEV    MIN      MAX       Q1       Q3
200   90.905   91.000   90.822   11.701   53.000   121.000   83.250   97.000
[Stem and leaf display of the frog lengths, N = 200]
Activity 2
Apply the same techniques to your own sets of data (i.e. draw up
frequency tables or histograms and calculate means and SDs) and
calculate the percentage which lie within one SD of the mean. If
the data is normally distributed then this should be about 68%.
Similarly, you could look for the amount of data within 2 SDs ,
3 SDs, etc. The table below gives approximately the percentages
to expect.
Number of SDs from the mean     Percentage of data (on each side of the mean)
0 to 1                          34%
1 to 2                          14%
2 to 3                          2%
over 3                          negligible
Note that very few items of data fall beyond three SDs from the
mean.
What is clearly useful is that no matter what size the numbers are,
if data are normally distributed, the proportions within so many
SDs from the mean are always the same.
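You can check these proportions numerically. The short Python sketch below is an illustration only: the function names `share_within` and `normal_share_within` are our own, not part of any statistics package. The first counts the share of a data set lying within k SDs of its mean (useful for Activity 2); the second gives the theoretical share for a normal distribution, using the error function from the standard `math` module.

```python
import math
import statistics

def share_within(data, k):
    """Proportion of the values lying within k SDs of the sample mean."""
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)  # population SD; use stdev() for the n-1 form
    return sum(abs(x - mean) <= k * sd for x in data) / len(data)

def normal_share_within(k):
    """Theoretical proportion of a normal distribution within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))

# Theoretical percentages within 1, 2 and 3 SDs of the mean
for k in (1, 2, 3):
    print(k, round(100 * normal_share_within(k), 1))
```

The theoretical values come out at roughly 68%, 95% and 99.7%, in line with the rounded figures in the table (twice 34%, twice 48% and very nearly all of the data).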
Example
IQ test scores, and the results of many other standard tests, are
designed to be normally distributed with mean 100 and standard
deviation 15.
Therefore statements such as the following can be made:
'68% of all people should achieve an IQ score between
85 and 115.'
'Only 2% of people should have an IQ score less than 70.'
'Only about 1 in 1000 people have an IQ greater than 145.'
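If you have Python available, statements like these can be checked with `NormalDist` from the standard `statistics` module (Python 3.8 or later). This is a sketch under the example's own assumption that IQ scores are exactly normal with mean 100 and SD 15; the variable names are ours.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)

between_85_and_115 = iq.cdf(115) - iq.cdf(85)  # within one SD of the mean
below_70 = iq.cdf(70)                          # more than two SDs below
above_145 = 1 - iq.cdf(145)                    # more than three SDs above

print(f"{between_85_and_115:.4f} {below_70:.4f} {above_145:.5f}")
```

The three results are close to 68%, 2% and 0.1%, so the quoted statements are sensible roundings.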
Exercise 8A
The survey mentioned in the introduction also showed
that the average height of 16-19 year olds was
approximately 169 cm with SD 9 cm.
1. Assuming the data follows a normal distribution,
find:
(a) the percentage of sixth formers taller than
187 cm;
8.2
$$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}z^2}$$
[Figure: graph of f(z) against z]
z     .00      .01      .02      .03      .04      .05
0.0   .50000   .50399   .50798   .51197   .51595   .5199x
0.1   .53983   .54380   .54776   .55172   .55567   .5596x
0.2   .57926   .58317   .58706   .59095   .59483   .5987x
$$\Phi(z) = P(Z < z) = \int_{-\infty}^{z} f(z)\,dz.$$
Tables usually give the area to the left of z and only for values
above zero. This is because symmetry enables you to calculate
all other values.
[Figure: area to the left of z = 1.0, $\Phi(1.0)$ = 84.13%]
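If no tables are to hand, the area to the left of z can be computed directly. The sketch below is one possible approach rather than the method behind any particular set of tables: it uses the standard identity $\Phi(z) = \frac{1}{2}\left(1 + \operatorname{erf}(z/\sqrt{2})\right)$, and the function name `phi` is our own choice.

```python
import math

def phi(z):
    """Standard normal distribution function, P(Z < z)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Rebuild the first rows of a table: each row is z to one decimal place,
# each column adds the second decimal place (.00 to .05).
for row in (0.0, 0.1, 0.2):
    print(f"{row:.1f}", *(f"{phi(row + col / 100):.5f}" for col in range(6)))

# Symmetry: the area to the left of -z equals the area to the right of +z.
print(round(phi(-1.5), 5), round(1 - phi(1.5), 5))
```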
Example
What is the probability of being less than 1.5 SDs below the
mean, i.e. $\Phi(-1.5)$?
Solution
From tables,
$\Phi(+1.5) = 0.93319$
and by symmetry,
$\Phi(-1.5) = 1 - 0.93319 = 0.06681$
Example
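If $Z \sim N(0, 1)$, find
(a) $P(Z > 1.2)$
(b) $P(-2.0 < Z < 2.0)$
(c) $P(-1.2 < Z < 1.0)$
Solution
(a) $P(Z > 1.2) = 1 - \Phi(1.2)$
$= 1 - 0.88493$ (from tables)
$= 0.11507$
(b) $P(-2.0 < Z < 2.0) = \Phi(2.0) - \Phi(-2.0)$
$= 2\Phi(2.0) - 1$
$= 2 \times 0.97725 - 1$
$= 0.9545$
(c) $P(-1.2 < Z < 1.0) = \Phi(1.0) - \Phi(-1.2)$
$= \Phi(1.0) - (1 - \Phi(1.2))$
$= 0.84134 - (1 - 0.88493)$
$= 0.72627$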
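The three answers can be confirmed without tables. The following is a minimal sketch using `NormalDist` from Python's standard `statistics` module; the names `Z`, `p_a`, `p_b` and `p_c` are ours.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, SD 1

p_a = 1 - Z.cdf(1.2)            # P(Z > 1.2)
p_b = Z.cdf(2.0) - Z.cdf(-2.0)  # P(-2.0 < Z < 2.0)
p_c = Z.cdf(1.0) - Z.cdf(-1.2)  # P(-1.2 < Z < 1.0)

print(round(p_a, 5), round(p_b, 4), round(p_c, 5))
```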
You can also use the tables to find the value of a when P( Z > a ) is
a given value and Z ~ N ( 0,1) . This is illustrated in the next
example.
Example
0.90
Solution
(a) Here $\Phi(a) = 0.90$, and from the tables
$a \approx 1.28$
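Reading tables 'backwards' like this corresponds to the inverse of $\Phi$. As a rough check, Python's `NormalDist.inv_cdf` (standard `statistics` module) returns the value of a for a given area to its left; the variable names are ours.

```python
from statistics import NormalDist

Z = NormalDist()

a = Z.inv_cdf(0.90)  # value with 90% of the area to its left
print(round(a, 2))   # agrees with the table value to 2 decimal places

# Going forwards again recovers the area
print(round(Z.cdf(a), 2))
```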
0.25
Exercise 8B
If $Z \sim N(0, 1)$, find
1. $P(Z > 0.82)$
2. $P(Z < 0.82)$
3. $P(Z > -0.82)$
4. $P(Z < -0.82)$
5. $P(-0.82 < Z < 0.82)$
6. $P(-1 < Z < 1)$
7. $P(1 < Z < 1.5)$
8. $P(0 < Z < 2.5)$
9. $P(Z < 1.96)$
10. $P(-1.96 < Z < 1.96)$
8.3
Transformation of normal p.d.f.s
Example
Eggs laid by a particular chicken are known to have lengths
normally distributed, with mean 6 cm and standard deviation
1.4 cm. What is the probability of:
(a) finding an egg bigger than 8 cm in length;
(b) finding an egg smaller than 5 cm in length?
Solution
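(a) $z = \dfrac{8 - 6}{1.4} = 1.429$,
but $\Phi(1.43) = 0.92364$,
so $P(X > 8) = 1 - 0.92364 = 0.07636$.
(b) $z = \dfrac{5 - 6}{1.4} = -0.7143$,
but $\Phi(-0.71) = 1 - \Phi(0.71) = 1 - 0.76115 = 0.23885$,
so $P(X < 5) = 0.23885$.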
$X \sim N(\mu, \sigma^2)$
$$z = \frac{272 - 167.3}{9.1} \approx 11.5.$$
So his height is 11.5 SDs above the mean. The most accurate
tables show that 6 SDs is only exceeded with a probability of
10
Example
If X ~ N ( 4, 9) , find
(a) P( X > 6 )
(b) P( X > 1)
Solution
Now $Z = \dfrac{X - \mu}{\sigma} = \dfrac{X - 4}{3}$,
(a) Hence
$P(X > 6) = 1 - P(X < 6)$
$= 1 - \Phi\left(\dfrac{6 - 4}{3}\right)$
$= 1 - \Phi(0.67)$
$= 1 - 0.74857$
$= 0.25143$
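The standardising step can be reproduced in a few lines. This sketch (our own variable names, standard `statistics` module) also shows that rounding z to two decimal places, as tables require, changes the answer slightly. Note that in $N(4, 9)$ the 9 is the variance, so the SD passed to `NormalDist` is 3.

```python
from statistics import NormalDist

# X ~ N(4, 9): the second parameter is the variance, so the SD is 3.
X = NormalDist(mu=4, sigma=3)

z = (6 - 4) / 3                               # standardised value, about 0.667
p_direct = 1 - X.cdf(6)                       # no rounding of z
p_tables = 1 - NormalDist().cdf(round(z, 2))  # z rounded to 0.67, as in tables

print(round(p_direct, 5), round(p_tables, 5))
```

The table-based value matches the worked solution; the unrounded value differs in the third decimal place, which is the kind of discrepancy to expect between tables and a calculator.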
Exercise 8C
1. If X ~ N ( 200,625) , find
(a) P( X > 250 )
3 1
is
N 6300, 1900
)m s
3 1
8.4
Example
A machine produces bolts which are N ( 4, 0.09) , where
measurements are in mm. Bolts are measured accurately and any
which are smaller than 3.5 mm or bigger than 4.4 mm are rejected.
Out of a batch of 500 bolts how many would be acceptable?
Solution
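$P(X < 4.4) = \Phi\left(\dfrac{4.4 - 4}{0.3}\right) = \Phi(1.33) = 0.90824$
$P(X < 3.5) = \Phi\left(\dfrac{3.5 - 4}{0.3}\right) = \Phi(-1.67) = 0.04746$.
Hence
$P(3.5 < X < 4.4) = 0.90824 - 0.04746$
$= 0.86078$.
The number of acceptable items is therefore
$0.86078 \times 500 = 430.39 \approx 430$ (to the nearest whole number).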
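As a cross-check on the table work, the same calculation can be sketched in Python with the standard `statistics` module. The variable names are ours, and because z is not rounded to two decimal places the result differs very slightly from the table-based one.

```python
from statistics import NormalDist

bolts = NormalDist(mu=4, sigma=0.3)  # variance 0.09, so SD 0.3 (mm)

p_ok = bolts.cdf(4.4) - bolts.cdf(3.5)  # P(3.5 < X < 4.4)
expected_ok = 500 * p_ok                # expected number accepted in a batch

print(round(p_ok, 4), round(expected_ok, 1))
```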
Example
IQ tests are measured on a scale which is N (100, 225) . A woman
wants to form an 'Eggheads Society' which only admits people with
the top 1% of IQ scores. What would she have to set as the cut-off
point in the test to allow this to happen?
Solution
From tables you need to find z such that $\Phi(z) = 0.99$.
This is most easily carried out using a 'percentage points of the
normal distribution' table, which gives the values directly.
Now
$\Phi^{-1}(0.99) = 2.3263 \;\Rightarrow\; \Phi(2.3263) = 0.99$.
(Check this using the usual tables.)
Hence
$x = 100 + 2.3263 \times 15$
$= 134.8945 \approx 134.9$.
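A 'percentage points' table is the inverse of the usual table, and `NormalDist.inv_cdf` in Python's standard `statistics` module plays the same role. The sketch below, with our own variable names, finds the cut-off both ways.

```python
from statistics import NormalDist

# Route 1: find z for the standard normal, then transform back to the IQ scale
z = NormalDist().inv_cdf(0.99)
cutoff_from_z = 100 + z * 15

# Route 2: work directly on the IQ scale, N(100, 15^2)
cutoff_direct = NormalDist(mu=100, sigma=15).inv_cdf(0.99)

print(round(z, 4), round(cutoff_from_z, 1), round(cutoff_direct, 1))
```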
Example
A manufacturer does not know the mean and SD of the diameters
of ball bearings he is producing. However, a sieving system
rejects all bearings larger than 2.4 cm and those under 1.8 cm in
diameter. Out of 1000 ball bearings 8% are rejected as too small
and 5.5% as too big. What are the mean and standard deviation of
the diameters of the ball bearings produced?
Solution
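Assume a normal distribution of diameters, with mean $\mu$ and
standard deviation $\sigma$. Now
$\Phi^{-1}(1 - 0.08) = 1.4$;
so 1.8 cm is 1.4 SDs below the mean. Similarly
$\Phi^{-1}(1 - 0.055) = 1.6$,
so 2.4 cm is 1.6 SDs above the mean. Hence
$\mu + 1.6\sigma = 2.4$
$\mu - 1.4\sigma = 1.8$.
Subtracting,
$3.0\sigma = 0.6$
$\Rightarrow \sigma = 0.2$
$\mu + (1.6 \times 0.2) = 2.4$
$\Rightarrow \mu = 2.4 - (1.6 \times 0.2)$
$= 2.08$.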
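The two simultaneous equations can also be solved numerically. This is a sketch only, with our own variable names, using unrounded z values from `NormalDist.inv_cdf`; the answers therefore agree with the worked solution only to about two decimal places.

```python
from statistics import NormalDist

Z = NormalDist()
z_low = Z.inv_cdf(0.08)        # 8% of diameters fall below 1.8 cm (negative z)
z_high = Z.inv_cdf(1 - 0.055)  # 5.5% of diameters fall above 2.4 cm

# mu + z_low * sigma = 1.8 and mu + z_high * sigma = 2.4;
# subtracting the first from the second eliminates mu.
sigma = (2.4 - 1.8) / (z_high - z_low)
mu = 2.4 - z_high * sigma

print(round(z_low, 2), round(z_high, 2), round(sigma, 2), round(mu, 2))
```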
Exercise 8D
1. Bags of sugar are sold as 1 kg. To ensure bags
are not sold underweight the machine is set to
put a mean weight of 1004 g in each bag. The
manufacturer claims that the process works to a
standard deviation of 2.4 g. What proportion of
bags are underweight?
2. Parts for a machine are acceptable within the
'tolerance' limits of 20.5 to 20.6 mm. From
previous tests it is known that the machine
).
N ( 4.5, 1.0 ) kg. All the aids are tested and any
which are unable to support at least 5 kg are
thrown out.
8.5
$\mu = np = 200 \times 0.4 = 80$
$\sigma^2 = np(1 - p) = 200 \times 0.4 \times 0.6 = 48$
and
$\sigma = 6.93$.
$P(X > 99.5) = 1 - \Phi\left(\dfrac{99.5 - 80}{6.93}\right)$
$= 1 - \Phi(2.81)$
$= 1 - 0.99752$ (from tables)
$= 0.00248$.
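To see how good the approximation is, the normal value can be set beside an exact binomial sum. The sketch below assumes, from the numbers 80, 6.93 and 99.5 used above, that the quantity wanted is P(X ≥ 100) for X ~ B(200, 0.4); that reading is our assumption, as are the variable names.

```python
import math
from statistics import NormalDist

n, p = 200, 0.4
mu = n * p
sigma = math.sqrt(n * p * (1 - p))

# Normal approximation with continuity correction: '100 or more' starts at 99.5
approx = 1 - NormalDist(mu, sigma).cdf(99.5)

# Exact binomial tail, summing P(X = k) for k = 100, ..., 200
exact = sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(100, n + 1))

print(round(approx, 5), round(exact, 5))
```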
Example
Customers arrive at a garage at an average rate of 2 per five
minute period. What is the probability that less than 15 arrive in
a one hour period?
Solution
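The mean number of arrivals in one hour is $12 \times 2 = 24$, so the
number arriving is approximately $N(24, 24)$, with standard
deviation $\sqrt{24} \approx 4.9$.
Hence
$P(\text{less than 15 in an hour}) = \Phi\left(\dfrac{14.5 - 24}{4.9}\right)$
$= \Phi(-1.94)$
$= 1 - 0.97381$
$= 0.02619$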
(Note that 14.5 was used since less than 15 is required.)
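The same comparison can be made for the Poisson case. In this sketch (our variable names; the hourly mean of 24 follows from 2 arrivals per five minutes) the normal approximation with continuity correction is set beside the exact Poisson sum for 0 to 14 arrivals.

```python
import math
from statistics import NormalDist

lam = 2 * 12  # 2 per five minutes, twelve five-minute periods in an hour

# Normal approximation N(lam, lam), with 'less than 15' taken as below 14.5
approx = NormalDist(lam, math.sqrt(lam)).cdf(14.5)

# Exact Poisson probability of 0, 1, ..., 14 arrivals
exact = sum(math.exp(-lam) * lam**k / math.factorial(k) for k in range(15))

print(round(approx, 4), round(exact, 4))
```

The two values are of the same order but not identical, which is a reminder that the normal result is an approximation.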
[Figures: binomial distributions for n = 30, p = 0.8; n = 50, p = 0.2;
n = 50, p = 0.5; and n = 10, p = 0.2]
[Figures: Poisson distributions for λ = 4, λ = 15 and λ = 9]
Activity 3
Check that the diagrams illustrate that
(a) for a binomial distribution, if p is close to 0.5, the normal is
a good approximation even for quite small n. However, if p
is small or large, then a larger value of n will be required
for the approximation to be good;
(If n > 30 , np > 5 , nq > 5 , then this is generally regarded
as a satisfactory set of circumstances to use a normal
approximation.)
(b) for a Poisson distribution, the larger λ is the better the
approximation.
(λ > 20 is usually regarded as a necessary condition to use
a normal approximation.)
Distribution    Approximating distribution       Conditions for approximation
X ~ B(n, p)     X ~ Po(np)
X ~ B(n, p)     X ~ N(np, npq)  (q = 1 - p)      p close to 1/2 and n > 10,
                                                 or p moving away from 1/2 and n > 30
X ~ Po(λ)       X ~ N(λ, λ)                      λ > 20 (say)
Example
If $X \sim B(20, 0.4)$, find $P(6 \le X \le 10)$.
Also find approximations to this probability by using the
(a) normal distribution
(b) Poisson distribution.
Solution
$P(X = 6) = {}^{20}C_6\,(0.6)^{14}(0.4)^6 = 0.1244$
Similarly
$P(X = 7) = 0.1659$
$P(X = 8) = 0.1797$
$P(X = 9) = 0.1597$
$P(X = 10) = 0.1171$
Hence
$P(6 \le X \le 10) = 0.747$ to 3 decimal places.
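(a) $\mu = np = 20 \times 0.4 = 8$
and
$npq = 20 \times 0.4 \times 0.6 = 4.8$
So
$X \sim N(8, 4.8)$ and $Z = \dfrac{X - 8}{\sqrt{4.8}}$,
$P(5.5 < X < 10.5) = \Phi\left(\dfrac{10.5 - 8}{\sqrt{4.8}}\right) - \Phi\left(\dfrac{5.5 - 8}{\sqrt{4.8}}\right)$
$= \Phi(1.141) - \Phi(-1.141)$
$= 2\Phi(1.141) - 1$
$\approx 2 \times 0.87286 - 1$
$= 0.746$ to 3 decimal places.
(Note that this is very close to the value found above.)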
(b) $\lambda = np = 8$
So
$X \sim Po(8)$ and $P(X = x) = \dfrac{e^{-8}\, 8^x}{x!}$
This gives
$P(X = 6) = \dfrac{e^{-8}\, 8^6}{6!} = 0.1221$
Similarly
$P(X = 7) = 0.1396$
$P(X = 8) = 0.1396$
$P(X = 9) = 0.1241$
$P(X = 10) = 0.0993$
Thus
$P(6 \le X \le 10) = 0.625$ to 3 decimal places.
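The three calculations in this example can be put side by side in a short program. This is a sketch with our own variable names, using only Python's standard library; it computes the exact binomial probability, the normal approximation with continuity correction, and the Poisson approximation with mean np.

```python
import math
from statistics import NormalDist

n, p = 20, 0.4
ks = range(6, 11)  # X = 6, 7, 8, 9, 10

# Exact binomial probability
exact = sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in ks)

# (a) Normal approximation N(np, npq) with continuity correction
normal = NormalDist(n * p, math.sqrt(n * p * (1 - p)))
by_normal = normal.cdf(10.5) - normal.cdf(5.5)

# (b) Poisson approximation with mean np
lam = n * p
by_poisson = sum(math.exp(-lam) * lam**k / math.factorial(k) for k in ks)

print(round(exact, 3), round(by_normal, 3), round(by_poisson, 3))
```

Here p = 0.4 is close to 1/2, so the normal approximation does much better than the Poisson one.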
Example
Answer the following questions using, in each case, tables of the
binomial, Poisson or normal distribution according to which you
think is most appropriate.
(a) Cars pass a point on a busy city centre road at an average
rate of 7 per five second interval. What is the probability
that in a particular five second interval the number of cars
passing will be
(i) 7 or less
(ii) exactly 7?
(b) Weather records show that for a certain airport during the
winter months an average of one day in 25 is foggy enough
to prevent landings. What is the probability that in a period
of seven winter days landings are prevented on
(i) 2 or more days?
(ii) no days?
(c) The working lives of a particular brand of electric light
bulb are distributed with mean 1200 hours and standard
deviation 200 hours. What is the probability of a bulb
lasting more than 1150 hours?
(AEB)
Solution
(a) The Poisson distribution is suitable here since the question
concerns a random event that can occur 0, 1, 2, ... times.
The mean value is x = 7, giving, from tables,
(i) P( 7 or less) = 0.5987
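(ii) P(7) = P(7 or less) − P(6 or less)
= 0.5987 − 0.4497
= 0.149.
(b) (ii) $P(\text{no days}) = \left(\dfrac{24}{25}\right)^7 \approx 0.7514$.
(c) $X \sim N(1200, 200^2)$
and
$P(X > 1150) = 1 - \Phi\left(\dfrac{1150 - 1200}{200}\right)$
$= 1 - \Phi(-0.25)$
$= \Phi(0.25)$
$= 0.59871$.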
Exercise 8E
1. The probability of someone smoking is about 0.4.
What is the probability that:
(a) in a group of 50 people more than half of
them smoke;
(b) in a group of 150, less than 50 of them
smoke?
2. It is known nationally that support for the Story
party is 32% from election results. In a survey
carried out on 200 voters what is the probability
that more than 80 of them are Story supporters?
3. A manufacturer knows from experience that his
machines produce defects at a rate of 5%. In a
day's production of 500 items 40 defects are
produced. The Production Manager says this is
not surprising. Is there evidence to support this?
8.6
Activity 4
Generate 10 random numbers and put them straight into the
statistical function of your calculator. Write down $\bar{x}$, the mean
of your sample.
Repeat this 20 times and write down the means of the samples
(remember to clear the statistical memories each time).
Plot these twenty results on normal probability paper and find
the mean and SD of the sample means.
You should find that the twenty values are roughly normal, with
mean, not surprisingly, 0.5 and SD 0.1. The SD has been
decreased by a factor equivalent to the square root of the size of
the sample, i.e. $\sqrt{10} = 3.16$.
This is the basis of a very important theorem, called the Central
Limit Theorem. This says that, irrespective of the original
distribution, sample means are approximately normally distributed about the
original distribution mean with 'standard error' equal to
$\dfrac{\sigma}{\sqrt{n}}$,
$\sigma$ being the original SD and n the sample size. This will be
explained in more detail in the next chapter.
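Activity 4 can also be simulated on a computer. The sketch below assumes the calculator's random numbers are uniform between 0 and 1 (original SD $1/\sqrt{12} \approx 0.289$), and it uses 2000 samples rather than 20 so that the pattern is clearer; the seed and the variable names are arbitrary choices of ours.

```python
import math
import random
import statistics

rng = random.Random(1)  # fixed seed so that the run is repeatable

def sample_mean(n):
    """Mean of n random numbers drawn uniformly from the interval 0 to 1."""
    return statistics.mean([rng.random() for _ in range(n)])

n = 10
means = [sample_mean(n) for _ in range(2000)]

mean_of_means = statistics.mean(means)
sd_of_means = statistics.stdev(means)
standard_error = (1 / math.sqrt(12)) / math.sqrt(n)  # sigma / sqrt(n)

print(round(mean_of_means, 3), round(sd_of_means, 3), round(standard_error, 3))
```

The SD of the sample means should come out close to the standard error of about 0.09, which rounds to the 0.1 quoted above.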
8.7
Miscellaneous Exercises
(AEB)