Representation of Data
JANUARY 2008
1.
Cotinine is a chemical that is made by the body from nicotine which is found in cigarette
smoke. A doctor tested the blood of 12 patients, who claimed to smoke a packet of
cigarettes a day, for cotinine. The results, in appropriate units, are shown below.
Patient              A     B     C     D     E     F     G     H     I     J     K     L
Cotinine level, x    160   390   169   175   125   420   171   250   210   258   186   243
[Σx² = 724 961]
(a) Find the mean and standard deviation of the level of cotinine in a patient's blood.
(4)
(b) Find the median, upper and lower quartiles of these data.
(3)
A doctor suspects that some of his patients have been smoking more than a packet of
cigarettes per day. He decides to use Q3 + 1.5(Q3 - Q1) to determine if any of the cotinine
results are far enough away from the upper quartile to be outliers.
(c) Identify which patient(s) may have been smoking more than a packet of cigarettes a
day. Show your working clearly.
(4)
Research suggests that cotinine levels in the blood form a skewed distribution.
One measure of skewness is found using (Q1 - 2Q2 + Q3) / (Q3 - Q1).
(d) Evaluate this measure and describe the skewness of these data.
(3)
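The calculations asked for in question 1 can be checked with a short sketch like the one below. It assumes the twelve readings belong to patients A to L in the order listed, and it assumes the usual quartile rule for listed data (if n/4 is a whole number k, average the k-th and (k+1)-th values; otherwise round up). The function name `quartile` is illustrative only.

```python
from math import sqrt

# Cotinine levels, assuming patients A to L in the order listed in the question.
levels = {"A": 160, "B": 390, "C": 169, "D": 175, "E": 125, "F": 420,
          "G": 171, "H": 250, "I": 210, "J": 258, "K": 186, "L": 243}

def quartile(sorted_values, quarter):
    """Assumed rule: position = n * quarter / 4; if whole, average that value
    and the next one, otherwise round the position up."""
    k, remainder = divmod(len(sorted_values) * quarter, 4)
    if remainder == 0:
        return (sorted_values[k - 1] + sorted_values[k]) / 2
    return sorted_values[k]  # 0-based index k is the rounded-up position

values = sorted(levels.values())
n = len(values)
mean = sum(values) / n
sd = sqrt(sum(v * v for v in values) / n - mean ** 2)  # divisor n

q1, q2, q3 = (quartile(values, q) for q in (1, 2, 3))
upper_limit = q3 + 1.5 * (q3 - q1)
outliers = sorted(p for p, v in levels.items() if v > upper_limit)
skewness = (q1 - 2 * q2 + q3) / (q3 - q1)

print(mean, round(sd, 2), q1, q2, q3, upper_limit, outliers, round(skewness, 3))
```

A positive value of the skewness measure indicates positive skew; other quartile conventions may give slightly different quartiles.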
JUNE 2008
2.
The ages, in years, of the residents of two hotels are shown in the back-to-back stem and leaf
diagram below.
Key: 8 | 5 | 0 means 58 years in Abbey Hotel and 50 years in Balmoral Hotel

        Abbey Hotel       Balmoral Hotel
 (1)            2 | 0 |
 (4)         9751 | 1 |
 (4)         9831 | 2 | 6            (1)
(11)  99997665332 | 3 | 447          (3)
 (6)       987750 | 4 | 005569       (6)
 (1)            8 | 5 | 000013667    (9)
                  | 6 | 233457       (6)
                  | 7 | 015          (3)
JANUARY 2009
3.
In a shopping survey a random sample of 104 teenagers were asked how many hours, to
the nearest hour, they spent shopping in the last month. The results are summarised in the
table below.
Number of hours    Mid-point    Frequency
0-5                2.75         20
6-7                6.5          16
8-10               9            18
11-15              13           25
16-25              20.5         15
26-50              38           10
A histogram was drawn and the group (8-10) hours was represented by a rectangle that
was 1.5 cm wide and 3 cm high.
(a) Calculate the width and height of the rectangle representing the group (16-25)
hours.
(3)
(b) Use linear interpolation to estimate the median and interquartile range.
(5)
(c) Estimate the mean and standard deviation of the number of hours spent shopping.
(4)
(d) State, giving a reason, the skewness of these data.
(2)
(e) State, giving a reason, which average and measure of dispersion you would recommend
to use to summarise these data.
(2)
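Linear interpolation for the median and quartiles of the shopping data can be sketched as follows. Because hours are recorded to the nearest hour, the class boundaries are assumed to be 0, 5.5, 7.5, 10.5, 15.5, 25.5 and 50.5, and the quartiles are taken at cumulative frequencies n/4, n/2 and 3n/4. The helper name `interpolate` is illustrative only.

```python
# (lower boundary, upper boundary, frequency) for each class.
# Boundaries are an assumption based on "to the nearest hour".
classes = [(0, 5.5, 20), (5.5, 7.5, 16), (7.5, 10.5, 18),
           (10.5, 15.5, 25), (15.5, 25.5, 15), (25.5, 50.5, 10)]

def interpolate(classes, target):
    """Return the value at cumulative frequency `target`, assuming the
    observations are spread evenly within each class."""
    cumulative = 0
    for lower, upper, freq in classes:
        if cumulative + freq >= target:
            return lower + (target - cumulative) / freq * (upper - lower)
        cumulative += freq
    raise ValueError("target exceeds total frequency")

n = sum(freq for _, _, freq in classes)
q1 = interpolate(classes, n / 4)
q2 = interpolate(classes, n / 2)
q3 = interpolate(classes, 3 * n / 4)
iqr = q3 - q1

print(round(q1, 2), round(q2, 2), round(q3, 2), round(iqr, 2))
```

Using (n + 1)/4 and similar positions instead would shift the estimates slightly.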
JUNE 2009
4.
The variable x was measured to the nearest whole number. Forty observations are given in
the table below.
x            10-15    16-18    19
Frequency    15       9        16
A histogram was drawn and the bar representing the 10-15 class has a width of 2 cm and
a height of 5 cm. For the 16-18 class find
(a) the width,
(1)
(b) the height
(2)
of the bar representing this class.
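In a histogram, bar widths scale with class width and bar areas scale with frequency. A minimal sketch of that scaling is given below; the function name `bar_dimensions` is hypothetical, the class widths assume boundaries at 9.5, 15.5 and 18.5, and the frequency of 9 for the 16-18 class follows from the total of 40 observations.

```python
def bar_dimensions(ref_width_units, ref_freq, ref_width_cm, ref_height_cm,
                   width_units, freq):
    """Scale a histogram bar from a reference bar.

    Widths are proportional to class width; areas are proportional to frequency.
    """
    cm_per_unit = ref_width_cm / ref_width_units
    area_per_observation = ref_width_cm * ref_height_cm / ref_freq
    width_cm = width_units * cm_per_unit
    height_cm = freq * area_per_observation / width_cm
    return width_cm, height_cm

# Reference: class 10-15 (width 6, frequency 15) drawn 2 cm wide and 5 cm high.
# Target: class 16-18 (width 3, assumed frequency 9).
width, height = bar_dimensions(6, 15, 2, 5, 3, 9)
print(round(width, 2), round(height, 2))
```

The same function applies to any question of this type once the class boundaries have been decided.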
5.
A researcher measured the foot lengths of a random sample of 120 ten-year-old children.
The lengths are summarised in the table below.
Foot length, l, (cm)    Number of children
10 ≤ l < 12
12 ≤ l < 17             53
17 ≤ l < 19             29
19 ≤ l < 21             15
21 ≤ l < 23             11
23 ≤ l < 25

3(mean - median) / standard deviation
(c) Evaluate this coefficient and comment on the skewness of these data.
(3)
Greg suggests that a normal distribution is a suitable model for the foot lengths of
ten-year-old children.
(d) Using the value found in part (c), comment on Greg's suggestion, giving a reason for
your answer.
(2)
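The coefficient 3(mean - median) / standard deviation can be evaluated with a sketch like the one below. The summary values Σfx = 2055.5 and Σfx² = 36500.25, and the interpolation figures for the median (58 children below 17 cm, 29 in the class 17 ≤ l < 19), are taken from the worked answers later in this document rather than recomputed from the table.

```python
from math import sqrt

n = 120
sum_fx = 2055.5       # Σfx, as quoted in the worked answers
sum_fx2 = 36500.25    # Σfx², as quoted in the worked answers

mean = sum_fx / n
sd = sqrt(sum_fx2 / n - mean ** 2)  # divisor n

# Median by linear interpolation within the class 17 <= l < 19 (width 2).
median = 17 + (n / 2 - 58) / 29 * 2

coefficient = 3 * (mean - median) / sd
print(round(mean, 3), round(median, 4), round(sd, 2), round(coefficient, 5))
```

A coefficient very close to zero suggests little or no skew, which is the kind of evidence part (d) asks about when judging a normal model.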
JANUARY 2010
6.
The 19 employees of a company take an aptitude test. The scores out of 40 are illustrated
in the stem and leaf diagram below.
2|6 means a score of 26

0 | 7          (1)
1 | 88         (2)
2 | 4468       (4)
3 | 2333459    (7)
4 | 00000      (5)
Find
(a) the median score,
(1)
(b) the interquartile range.
(3)
The company director decides that any employees whose scores are so low that they are
outliers will undergo retraining.
An outlier is an observation whose value is less than the lower quartile minus 1.0 times
the interquartile range.
(c) Explain why there is only one employee who will undergo retraining.
(2)
(d) On the graph paper on page 5, draw a box plot to illustrate the employees' scores.
(3)
[Grid for box plot; horizontal axis: Score, 10 to 60]
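A stem and leaf diagram can be expanded back into raw scores before finding the median, quartiles and outlier limit. The sketch below does this for the aptitude test data, assuming the same quartile rule as is usual for listed data (round the position up unless it is a whole number, in which case average two values). The helper name `quartile` is illustrative only.

```python
# Stem and leaf rows from the question: stem -> string of leaves.
rows = {0: "7", 1: "88", 2: "4468", 3: "2333459", 4: "00000"}

# Rebuild the 19 scores: 2|6 means 26, so score = 10 * stem + leaf.
scores = sorted(10 * stem + int(leaf)
                for stem, leaves in rows.items() for leaf in leaves)

def quartile(sorted_values, quarter):
    """Assumed rule: position = n * quarter / 4, rounded up if not whole."""
    k, remainder = divmod(len(sorted_values) * quarter, 4)
    if remainder == 0:
        return (sorted_values[k - 1] + sorted_values[k]) / 2
    return sorted_values[k]

median = quartile(scores, 2)
q1, q3 = quartile(scores, 1), quartile(scores, 3)
iqr = q3 - q1
lower_limit = q1 - 1.0 * iqr          # outlier rule stated in the question
retrain = [s for s in scores if s < lower_limit]

print(median, q1, q3, iqr, lower_limit, retrain)
```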
7.
The birth weights, in kg, of 1500 babies are summarised in the table below.
Weight (kg)    Midpoint, x kg    Frequency, f
0.0-1.0        0.50
1.0-2.0        1.50
2.0-2.5        2.25              60
2.5-3.0                          280
3.0-3.5        3.25              820
3.5-4.0        3.75              320
4.0-5.0        4.50              10
5.0-6.0
JUNE 2010
8.
A teacher selects a random sample of 56 students and records, to the nearest hour, the time
spent watching television in a particular week.
Hours        1-10    11-20    21-25    26-30    31-40    41-59
Frequency            15       11       13
Mid-point    5.5     15.5              28                50
(2)
A histogram was drawn to represent these data. The 11-20 group was represented by a
bar of width 4 cm and height 6 cm.
(b) Find the width and height of the 26-30 group.
(3)
(c) Estimate the mean and standard deviation of the time spent watching television by
these students.
(5)
(d) Use linear interpolation to estimate the median length of time spent watching
television by these students.
(2)
The teacher estimated the lower quartile and the upper quartile of the time spent watching
television to be 15.8 and 29.3 respectively.
(e) State, giving a reason, the skewness of these data.
(2)
JANUARY 2011
9.
Keith records the amount of rainfall, in mm, at his school, each day for a week. The results
are given below.
2.8    5.6    2.3    9.4    0.0    0.5    1.8
Jenny then records the amount of rainfall, x mm, at the school each day for the following
21 days. The results for the 21 days are summarised below.
Σx = 84.6
(a) Calculate the mean amount of rainfall during the whole 28 days.
(2)
Keith realises that he has transposed two of his figures. The number 9.4 should have been
4.9 and the number 0.5 should have been 5.0
Keith corrects these figures.
(b) State, giving your reason, the effect this will have on the mean.
(2)
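The two parts of question 9 come down to combining totals. A minimal sketch, using only the figures given in the question:

```python
keith = [2.8, 5.6, 2.3, 9.4, 0.0, 0.5, 1.8]   # first 7 days, as recorded
jenny_total = 84.6                             # Σx for the following 21 days

# Mean over all 28 days: combine the totals, then divide by 28.
mean_28 = (sum(keith) + jenny_total) / 28

# Corrected figures: 9.4 should be 4.9 and 0.5 should be 5.0.
corrected = [4.9 if v == 9.4 else 5.0 if v == 0.5 else v for v in keith]
change_in_total = sum(corrected) - sum(keith)

print(round(mean_28, 2), round(change_in_total, 10))
```

One value falls by 4.5 and the other rises by 4.5, so the total, and therefore the mean, does not change.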
10. Over a long period of time a small company recorded the amount it received in sales per
month. The results are summarised below.
Amount received in sales (£1000s)
Two lowest values     3, 4
Lower quartile        7
Median                12
Upper quartile        14
Two highest values    20, 25
[Grid for box plot; horizontal axis: Sales (£1000s), 10 to 30]
(b) State the skewness of the distribution of the amount of sales received. Justify your
answer.
(2)
(c) The company claims that for 75% of the months, the amount received per month is
greater than £10 000. Comment on this claim, giving a reason for your answer.
(2)
11. On a randomly chosen day, each of the 32 students in a class recorded the time, t minutes
to the nearest minute, they spent on their homework. The data for the class is summarised
in the following table.
Time, t    Number of students
10-19
20-29
30-39
40-49      11
50-69
70-79
Σt = 1414 and Σt² = 69 378
(b) find the mean and the standard deviation of the times spent by the students on their
homework.
(3)
(c) Comment on the skewness of the distribution of the times spent by the students on
their homework. Give a reason for your answer.
(2)
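When Σt and Σt² are given, the mean and standard deviation follow directly from the sums. The sketch below shows both the divisor n form (σ) and the divisor n - 1 form (s); which of the two a marker expects is an assumption to confirm against the relevant specification.

```python
from math import sqrt

n, sum_t, sum_t2 = 32, 1414, 69378

mean = sum_t / n
variance_n = sum_t2 / n - mean ** 2        # divisor n
sigma = sqrt(variance_n)
s = sqrt(variance_n * n / (n - 1))         # divisor n - 1

print(mean, round(sigma, 1), round(s, 1))
```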
JUNE 2011
12. A class of students had a sudoku competition. The time taken for each student to complete
the sudoku was recorded to the nearest minute and the results are summarised in the table
below.
Time     Mid-point, x    Frequency, f
2-8
9-12
13-15    14
16-18    17
19-22    20.5
23-30    26.5
[Σfx² = 8603.75]
Q2 = 13.0
Q3 = 21.0
(e) Describe, giving a reason, the skewness of the times on this occasion.
(2)
JANUARY 2012
13. The histogram in Figure 1 shows the time, to the nearest minute, that a random sample of
100 motorists were delayed by roadworks on a stretch of motorway.
[Figure 1: histogram; vertical axis: Frequency Density, 0 to 23; horizontal axis: Time (minutes), 0 to 25]
(a) Complete the table.
Delay (minutes)    Number of motorists
4-6
7-8
9                  21
10-12              45
13-15
16-20
(2)
(b) Estimate the number of motorists who were delayed between 8.5 and 13.5 minutes by
the roadworks.
(2)
14. The marks, x, of 45 students randomly selected from those students who sat a mathematics
examination are shown in the stem and leaf diagram below.
Mark               Totals
3 | 699            (3)
4 | 012234         (6)
4 | 56668          (5)
5 | 023344         (6)
5 | 556779         (6)
6 | 000013444      (9)
6 | 556789         (6)
7 | 1233           (4)
Key: 3 | 6 means 36
JUNE 2012
15.
[Figure 2: histogram; vertical axis: Frequency density; horizontal axis: Speed of car, 10 to 50]
A policeman records the speed of the traffic on a busy road with a 30 mph speed limit.
He records the speeds of a sample of 450 cars. The histogram in Figure 2 represents the
results.
(a) Calculate the number of cars that were exceeding the speed limit by at least 5 mph in
the sample.
(4)
(b) Estimate the value of the mean speed of the cars in the sample.
(3)
(c) Estimate, to 1 decimal place, the value of the median speed of the cars in the sample.
(2)
(d) Comment on the shape of the distribution. Give a reason for your answer.
(2)
(e) State, with a reason, whether the estimate of the mean or the median is a better
representation of the average speed of the traffic on the road.
(2)
JANUARY 2013
16. A survey of 100 households gave the following results for weekly income y.
Income y (£)      Mid-point    Frequency f
0 ≤ y < 200       100          12
200 ≤ y < 240     220          28
240 ≤ y < 320     280          22
320 ≤ y < 400     360          18
400 ≤ y < 600     500          12
600 ≤ y < 800     700          8
(Σfy² = 12 452 800)
A histogram was drawn and the class 200 ≤ y < 240 was represented by a rectangle of
width 2 cm and height 7 cm.
(a) Calculate the width and the height of the rectangle representing the class
320 ≤ y < 400.
(3)
(b) Use linear interpolation to estimate the median weekly income to the nearest pound.
(2)
(c) Estimate the mean and the standard deviation of the weekly income for these data.
(4)
One measure of skewness is 3(mean - median) / standard deviation.
(d) Use this measure to calculate the skewness for these data and describe its value.
(2)
Katie suggests using the random variable X which has a normal distribution with mean
320 and standard deviation 150 to model the weekly income for these data.
(e) Find P(240 < X < 400).
(2)
(f) With reference to your calculations in parts (d) and (e) and the data in the table,
comment on Katie's suggestion.
(2)
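The probability in part (e) would normally be read from tables, but it can be checked numerically. The sketch below builds the normal distribution function from `math.erf` in the standard library and compares the model with the proportion of surveyed households in the classes 240 ≤ y < 400. The helper name `normal_cdf` is illustrative only.

```python
from math import erf, sqrt

def normal_cdf(x, mean, sd):
    """P(X <= x) for X ~ N(mean, sd^2), written in terms of the error function."""
    return 0.5 * (1 + erf((x - mean) / (sd * sqrt(2))))

# Katie's model: mean 320, standard deviation 150.
p_model = normal_cdf(400, 320, 150) - normal_cdf(240, 320, 150)

# Proportion of the 100 households with 240 <= y < 400 (frequencies 22 and 18).
p_observed = (22 + 18) / 100

print(round(p_model, 3), p_observed)
```

The two proportions are close, but the skewness found in part (d) also has to be weighed when commenting on the model.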
JUNE 2013
17. The marks of a group of female students in a statistics test are summarised in Figure 1.
[Figure 1: box plot for Females, with space for Males; horizontal axis: Mark, 10 to 100]
(a) Write down the mark which is exceeded by 75% of the female students.
(1)
The marks of a group of male students in the same statistics test are summarised by the
stem and leaf diagram below.
Mark    Totals
1       (1)
2       (1)
3       (3)
4       (6)
5       (9)
6       (6)
7       (3)
8       (1)
9       (1)
(b) Find the median and interquartile range of the marks of the male students.
(3)
An outlier is a mark that is
either more than 1.5 × interquartile range above the upper quartile
or more than 1.5 × interquartile range below the lower quartile.
(c) In the space provided on Figure 1 draw a box plot to represent the marks of the male
students, indicating clearly any outliers.
(5)
(d) Compare and contrast the marks of the male and the female students.
(2)
18. The following table summarises the times, t minutes to the nearest minute, recorded for a
group of students to complete an exam.
Time (minutes) t        11-20    21-25    26-30    31-35    36-45    46-60
Number of students f    62       88       16       13       11       10
[Σft² = 134281.25]

Time (minutes) t        6-15     16-20    21-25    26-30    31-40    41-55
Number of students f    62       88       16       13       11       10
(f) Without further calculations, explain the effect this would have on each of the
estimates found in parts (a), (b), (c) and (d).
(3)
19. An agriculturalist is studying the yields, y kg, from tomato plants. The data from a random
sample of 70 tomato plants are summarised below.
Yield (y kg)    Frequency (f)    Mid-point (x)
0 ≤ y < 5       16               2.5
5 ≤ y < 10      24               7.5
10 ≤ y < 15     14               12.5
15 ≤ y < 25     12               20
25 ≤ y < 35     4                30
(Σfx = 755 and Σfx² = 12037.5)
1.
(a) mean is 2757/12 = 229.75
    sd is √(724961/12 - (229.75)²) = 87.34045
(b) Ordered list is: 125, 160, 169, 171, 175, 186, 210, 243, 250, 258, 390, 420
    Q2 = ½(186 + 210) = 198
(c)
(d)
2.
(a) 50
(b) Q1 = 45, Q2 = 50.5, Q3 = 63
(c) Mean = 1469/28 = 52.464286...
    Sd = √(81213/28 - (1469/28)²) = 12.164... or 12.387216... for divisor n - 1
(d) (52.46... - 50) / sd
(e)
3.
(b) Q2 = 7.5 + (52 - 36)/18 × 3 = 10.2
    Q1 = 5.5 + (26 - 20)/16 × 2 [= 6.25]
    Q3 = 10.5 + (78 - 54)/25 × 5 = 15.3
(c) Σfx = 1333.5, x̄ = 1333.5/104 = AWRT 12.8
    Σfx² = 27254, σ = √(27254/104 - x̄²) = √(262.05 - x̄²) = AWRT 9.88
(d) Q3 - Q2 [= 5.1] > Q2 - Q1 [= 3.9] or Q2 < x̄
4.
(a) 1 (cm)
(b) Therefore frequency of 9 is (10/15) × 9 or 9/1.5
    height = 6 (cm)
5.
(a) Q2 = 17 + (60 - 58)/29 × 2 [= 17.1379...]
(b) Σfx = 2055.5, Σfx² = 36500.25
    Mean = 2055.5/120 = 17.129
    σ = √(36500.25/120 - (2055.5/120)²) = 3.28
(c) 3(17.129 - 17.1379...)/3.28 = -0.00802
    No skew / slight skew
6.
(a) Median is 33
(b)
(c) Q1 - IQR = 24 - 16 = 8
    So 7 is only outlier
(d) Box
    Outlier
    Whisker
7.
(a)
(b)
(c) Mean = 4841/1500 = 3.2273...    awrt 3.23
    Standard deviation = √(15889.5/1500 - (4841/1500)²) = 0.421093... or s = 0.4212337...
(d) Q2 = 3.00 + (403/820) × 0.5 = 3.2457...
(e) Mean (3.23) < Median (3.25)
    Negative skew
8.
(a) 23,
(c) Σfx = 1316.5, x̄ = 1316.5/56 = awrt 23.5
    Σfx² = 37378.25, so σ = √(37378.25/56 - x̄²) = awrt 10.7 (allow s = 10.8)
(d) Q2 = (20.5) + (28 - 21)/11 × 5 = 23.68
(e) Q3 - Q2 = 5.6, Q2 - Q1 = 7.9 [7.9 > 5.6 so] (or x̄ < Q2) negative skew
9.
(a) 2.8 + 5.6 + 2.3 + 9.4 + 0.5 + 1.8 + 84.6 = 107
    mean = 107/28 (awrt 3.8)
(b) One figure is 4.5 under what it should be and the other is 4.5 above what it should be,
    so the total, and hence the mean, is unchanged.
10.
(a) Outliers
    [Box plot; horizontal axis: Sales in £000]
(b) Since Q3 - Q2 < Q2 - Q1 (allow written explanation), negative skew
(c) Not true,
    since the lower quartile is 7000 and therefore 75% are above 7000, not 10 000,
    or 10 is inside the box, or any other sensible comment
11.
(a) Median = 32/2 = 16th term (16.5)
    (x - 39.5)/(49.5 - 39.5) = (16 - 14)/(25 - 14) or x = 39.5 + (2/11) × 10    (awrt 41.3)
(b) Mean = 1414/32 = 44.1875    (awrt 44.2)
    Standard deviation = √(69378/32 - (1414/32)²) = 14.7    (or s = 14.9)
12.
(a) 10.5
(b) (Q2 =) (15.5 +) (½ × 30 - 14)/8 × 3 or (½ × 31 - 14)/8 × 3
    = 15.875 or 16.0625
(c) x̄ = 477.5/30 = 15.9 (15.916...) [Accept 191/12 or 15 11/12]
    σ = √(8603.75/30 - x̄²) = 5.78 (accept s = 5.88)
(d) Since mean and median are similar (or equal or very close) a normal distribution
    may be suitable. [Allow mean or median close to mode/modal class]
(e)
13.
(a) 14, 5
(b) 21 + 45 + 3 = 69
14.
(a) 60
(b) Q1 = 46, Q2 = 56, Q3 = 64
(c) mean = 55.48... or 2497/45    awrt 55.5
    sd = √(143369/45 - (2497/45)²) = 10.342... (s = 10.459...)
(d) Mean < median < mode, or Q2 - Q1 > Q3 - Q2 with or without their numbers, or
    median closer to upper quartile (than lower quartile), or (mean - median)/sd < 0;
    negative skew
(e)
15.
(a) One large square = 450/"22.5" or one small square = 450/"562.5" (o.e. e.g. "562.5"/450)
    One large square = 20 cars or one small square = 0.8 cars or 1 car = 1.25 squares
    No. > 35 mph is: 4.5 × "20" or 112.5 × "0.8" (or equivalent e.g. using fd)
    = 90 (cars)
(b) [x̄ =]
(c) [Q2 =] 20 + (195/240) × 10 (o.e.) [Allow use of (n + 1) giving 195.5 instead of 195]
    = 28.125 [Use of (n + 1) gives 28.145...]    awrt 28.1
(d) Q2 < x̄ [Condone Q2 ≈ x̄]
    So positive skew [so (almost) symmetric]
(e) [If chose skew in (d)] median (Q2),
    since the data is skewed
    or
    median not affected by extreme values
16.
(b) m = (240) + (10/22) × 80 (o.e.) = 276.36... (3040/11)
(c) Σfy = 31600 leading to ȳ = 316
    σ = √(12452800/100 - ȳ²)
(e) z = 80/150
17.
(a) 25
(b) Q2 (or median or m) = 51
    IQR = 63 - 46 = 17 (or Q3 - Q1 = 17)
(c) Outliers given by 46 - 1.5 × 17 = 20.5 or 63 + 1.5 × 17 = 88.5
    Outlier limits are 20.5 and 88.5
    Allow lower whisker to 20.5 and upper whisker to 88.5
    Do not allow a mix of whiskers, e.g. 20.5 and 85
    Do not allow both sets of whiskers
(d) Medians:
    IQR: IQR for females smaller than males. Allow lower/higher but not wider
    Range: Range of females is less than males
    Skewness: Male and female marks are both positively skewed
    Ignore other statements about average, spread, mean, st. dev., variation, outliers etc.
18.
(a) Σft = 4837.5
    Mean = "4837.5"/200 = 24.1875    awrt 24.2 or 387/16
    σ = √(134281.25/200 - (4837.5/200)²) = 9.293...    awrt 9.29 (accept s = 9.32)
(b) Q2 = [20.5] + (100 - 62)/88 × 5    awrt 22.7
(c) Q1 = 10.5 + (50 / 50.25)/62 × 10 [= 18.56] (*)    (n + 1 gives 18.604)
19.
    [Q2 =] (5 +) (19/24) × 5 or (use of (n + 1)) (5 +) (19.5/24) × 5
    = 8.9583... awrt 8.96 or 9.0625 awrt 9.06
(c) [x̄ =] 755/70 or awrt 10.8
    [σ =] √(12037.5/70 - x̄²) = √55.6326... = awrt 7.46
(d) x̄ > Q2
    So positive skew
(e) (25 - "18.3")/10 × 12 (+4) (o.e.)
    = 12.04 so 12 plants