Representation of Data

JANUARY 2008
1.
Cotinine is a chemical that is made by the body from nicotine which is found in cigarette
smoke. A doctor tested the blood of 12 patients, who claimed to smoke a packet of
cigarettes a day, for cotinine. The results, in appropriate units, are shown below.
Patient             A    B    C    D    E    F    G    H    I    J    K    L
Cotinine level, x   160  390  169  175  125  420  171  250  210  258  186  243
[You may use $\sum x^2 = 724\,961$]
(a) Find the mean and standard deviation of the level of cotinine in a patient's blood.
(4)
(b) Find the median, upper and lower quartiles of these data.
(3)
A doctor suspects that some of his patients have been smoking more than a packet of
cigarettes per day. He decides to use $Q_3 + 1.5(Q_3 - Q_1)$ to determine if any of the cotinine
results are far enough away from the upper quartile to be outliers.
(c) Identify which patient(s) may have been smoking more than a packet of cigarettes a
day. Show your working clearly.
(4)
Research suggests that cotinine levels in the blood form a skewed distribution.
One measure of skewness is found using $\frac{Q_1 - 2Q_2 + Q_3}{Q_3 - Q_1}$.
(d) Evaluate this measure and describe the skewness of these data.
(3)
JUNE 2008
2.
The ages, in years, of the residents of two hotels are shown in the back-to-back stem and leaf
diagram below.
         Abbey Hotel   |   | Balmoral Hotel
(1)                  2 | 0 |
(4)               9751 | 1 |
(4)               9831 | 2 | 6          (1)
(11)       99997665332 | 3 | 447        (3)
(6)             987750 | 4 | 005569     (6)
(1)                  8 | 5 | 000013667  (9)
                       | 6 | 233457     (6)
                       | 7 | 015        (3)
Key: 8 | 5 | 0 means 58 years in Abbey Hotel and 50 years in Balmoral Hotel
For the Balmoral Hotel,

(a) write down the mode of the age of the residents,
(1)
(b) find the values of the lower quartile, the median and the upper quartile.
(3)
(c) (i) Find the mean, $\bar{x}$, of the age of the residents.
(ii) Given that $\sum x^2 = 81\,213$, find the standard deviation of the age of the residents.

(4)
One measure of skewness is found using $\frac{\text{mean} - \text{mode}}{\text{standard deviation}}$
(d) Evaluate this measure for the Balmoral Hotel.
(2)
For the Abbey Hotel, the mode is 39, the mean is 33.2, the standard deviation is 12.7 and
the measure of skewness is -0.454
(e) Compare the two age distributions of the residents of each hotel.
(3)
JANUARY 2009
3.
In a shopping survey a random sample of 104 teenagers were asked how many hours, to
the nearest hour, they spent shopping in the last month. The results are summarised in the
table below.
Number of hours   Mid-point   Frequency
0-5               2.75        20
6-7               6.5         16
8-10              9           18
11-15             13          25
16-25             20.5        15
26-50             38          10
A histogram was drawn and the group (8-10) hours was represented by a rectangle that
was 1.5 cm wide and 3 cm high.
(a) Calculate the width and height of the rectangle representing the group (16-25)
hours.
(3)
(b) Use linear interpolation to estimate the median and interquartile range.
(5)
(c) Estimate the mean and standard deviation of the number of hours spent shopping.
(4)
(d) State, giving a reason, the skewness of these data.
(2)
(e) State, giving a reason, which average and measure of dispersion you would recommend
to use to summarise these data.
(2)
JUNE 2009
4.
The variable x was measured to the nearest whole number. Forty observations are given in
the table below.
x           10-15   16-18   19-
Frequency   15      9       16
A histogram was drawn and the bar representing the 10-15 class has a width of 2 cm and
a height of 5 cm. For the 16-18 class find
(a) the width,
(1)
(b) the height
(2)
of the bar representing this class.
5.
A researcher measured the foot lengths of a random sample of 120 ten-year-old children.
The lengths are summarised in the table below.
Foot length, l (cm)   Number of children
10 ≤ l < 12           5
12 ≤ l < 17           53
17 ≤ l < 19           29
19 ≤ l < 21           15
21 ≤ l < 23           11
23 ≤ l < 25           7
(a) Use interpolation to estimate the median of this distribution.

(2)
(b) Calculate estimates for the mean and the standard deviation of these data.
(6)
One measure of skewness is given by
Coefficient of skewness = $\frac{3(\text{mean} - \text{median})}{\text{standard deviation}}$
(c) Evaluate this coefficient and comment on the skewness of these data.
(3)
Greg suggests that a normal distribution is a suitable model for the foot lengths of
ten-year-old children.
(d) Using the value found in part (c), comment on Greg's suggestion, giving a reason for
your answer.
(2)
JANUARY 2010
6.
The 19 employees of a company take an aptitude test. The scores out of 40 are illustrated
in the stem and leaf diagram below.
Score                 Totals
0 | 7                 (1)
1 | 88                (2)
2 | 4468              (4)
3 | 2333459           (7)
4 | 00000             (5)
Key: 2|6 means a score of 26
Find
(a) the median score,
(1)
(b) the interquartile range.
(3)
The company director decides that any employees whose scores are so low that they are
outliers will undergo retraining.
An outlier is an observation whose value is less than the lower quartile minus 1.0 times
the interquartile range.
(c) Explain why there is only one employee who will undergo retraining.
(2)
(d) On the graph paper on page 5, draw a box plot to illustrate the employees' scores.
(3)
[Graph paper with a horizontal axis labelled Score]
7.
The birth weights, in kg, of 1500 babies are summarised in the table below.
Weight (kg)   Midpoint, x kg   Frequency, f
0.0-1.0       0.50             1
1.0-2.0       1.50             6
2.0-2.5       2.25             60
2.5-3.0                        280
3.0-3.5       3.25             820
3.5-4.0       3.75             320
4.0-5.0       4.50             10
5.0-6.0                        3
[You may use $\sum fx = 4841$ and $\sum fx^2 = 15\,889.5$]
(a) Write down the missing midpoints in the table above.

(2)
(b) Calculate an estimate of the mean birth weight.
(2)
(c) Calculate an estimate of the standard deviation of the birth weight.
(3)
(d) Use interpolation to estimate the median birth weight.
(2)
(e) Describe the skewness of the distribution. Give a reason for your answer.
(2)
JUNE 2010
8.
A teacher selects a random sample of 56 students and records, to the nearest hour, the time
spent watching television in a particular week.
Hours       1-10   11-20   21-25   26-30   31-40   41-59
Frequency   6      15      11      13      8       3
Mid-point   5.5    15.5            28              50
(a) Find the mid-points of the 21-25 hour and 31-40 hour groups.
(2)
A histogram was drawn to represent these data. The 11-20 group was represented by a
bar of width 4 cm and height 6 cm.
(b) Find the width and height of the 26-30 group.
(3)
(c) Estimate the mean and standard deviation of the time spent watching television by
these students.
(5)
(d) Use linear interpolation to estimate the median length of time spent watching
television by these students.
(2)
The teacher estimated the lower quartile and the upper quartile of the time spent watching
television to be 15.8 and 29.3 respectively.
(e) State, giving a reason, the skewness of these data.
(2)
JANUARY 2011
9.
Keith records the amount of rainfall, in mm, at his school, each day for a week. The results
are given below.
2.8   5.6   2.3   9.4   0.0   0.5   1.8
Jenny then records the amount of rainfall, x mm, at the school each day for the following
21 days. The results for the 21 days are summarised below.
$\sum x = 84.6$
(a) Calculate the mean amount of rainfall during the whole 28 days.
(2)
Keith realises that he has transposed two of his figures. The number 9.4 should have been
4.9 and the number 0.5 should have been 5.0
Keith corrects these figures.
(b) State, giving your reason, the effect this will have on the mean.
(2)
10. Over a long period of time a small company recorded the amount it received in sales per
month. The results are summarised below.
Amount received in sales (£1000s)
Two lowest values     3, 4
Lower quartile        7
Median                12
Upper quartile        14
Two highest values    20, 25
An outlier is an observation that falls
either 1.5 × interquartile range above the upper quartile
or 1.5 × interquartile range below the lower quartile.
(a) On the graph paper below, draw a box plot to represent these data, indicating clearly
any outliers.
(5)
[Graph paper with a horizontal axis labelled Sales (£1000s)]
(b) State the skewness of the distribution of the amount of sales received. Justify your
answer.
(2)
(c) The company claims that for 75% of the months, the amount received per month is
greater than £10 000. Comment on this claim, giving a reason for your answer.
(2)
11. On a randomly chosen day, each of the 32 students in a class recorded the time, t minutes
to the nearest minute, they spent on their homework. The data for the class is summarised
in the following table.
Time, t   Number of students
10-19
20-29
30-39
40-49     11
50-69
70-79
(a) Use interpolation to estimate the value of the median.

(2)
Given that $\sum t = 1414$ and $\sum t^2 = 69\,378$
(b) find the mean and the standard deviation of the times spent by the students on their
homework.
(3)
(c) Comment on the skewness of the distribution of the times spent by the students on
their homework. Give a reason for your answer.
(2)
JUNE 2011
12. A class of students had a sudoku competition. The time taken for each student to complete
the sudoku was recorded to the nearest minute and the results are summarised in the table
below.
Time      Mid-point, x   Frequency, f
2-8       5
9-12
13-15     14
16-18     17
19-22     20.5
23-30     26.5
(You may use $\sum fx^2 = 8603.75$)
(a) Write down the mid-point for the 9 - 12 interval.

(1)
(b) Use linear interpolation to estimate the median time taken by the students.
(2)
(c) Estimate the mean and standard deviation of the times taken by the students.
(5)
The teacher suggested that a normal distribution could be used to model the times taken
by the students to complete the sudoku.
(d) Give a reason to support the use of a normal distribution in this case.
(1)
On another occasion the teacher calculated the quartiles for the times taken by the students
to complete a different sudoku and found
Q1 = 8.5
Q2 = 13.0
Q3 = 21.0
(e) Describe, giving a reason, the skewness of the times on this occasion.
(2)
JANUARY 2012
13. The histogram in Figure 1 shows the time, to the nearest minute, that a random sample of
100 motorists were delayed by roadworks on a stretch of motorway.
[Histogram: frequency density against time (minutes)]
Figure 1
(a) Complete the table.
Delay (minutes)   Number of motorists
4-6               6
7-8
9                 21
10-12             45
13-15             9
16-20
(2)
(b) Estimate the number of motorists who were delayed between 8.5 and 13.5 minutes by
the roadworks.
(2)
14. The marks, x, of 45 students randomly selected from those students who sat a mathematics
examination are shown in the stem and leaf diagram below.
Mark                  Totals
3 | 699               (3)
4 | 012234            (6)
4 | 56668             (5)
5 | 023344            (6)
5 | 556779            (6)
6 | 000013444         (9)
6 | 556789            (6)
7 | 1233              (4)
Key: 3|6 means 36
(a) Write down the modal mark of these students.

(1)
(b) Find the values of the lower quartile, the median and the upper quartile.
(3)
For these students $\sum x = 2497$ and $\sum x^2 = 143\,369$
(c) Find the mean and the standard deviation of the marks of these students.
(3)
(d) Describe the skewness of the marks of these students, giving a reason for your answer.
(2)
The mean and standard deviation of the marks of all the students who sat the examination
were 55 and 10 respectively. The examiners decided that the total mark of each student
should be scaled by subtracting 5 marks and then reducing the mark by a further 10 %.
(e) Find the mean and standard deviation of the scaled marks of all the students.
(4)
JUNE 2012
15.
[Histogram: frequency density against speed of car]
Figure 2
A policeman records the speed of the traffic on a busy road with a 30 mph speed limit.
He records the speeds of a sample of 450 cars. The histogram in Figure 2 represents the
results.
(a) Calculate the number of cars that were exceeding the speed limit by at least 5 mph in
the sample.
(4)
(b) Estimate the value of the mean speed of the cars in the sample.
(3)
(c) Estimate, to 1 decimal place, the value of the median speed of the cars in the sample.
(2)
(d) Comment on the shape of the distribution. Give a reason for your answer.
(2)
(e) State, with a reason, whether the estimate of the mean or the median is a better
representation of the average speed of the traffic on the road.
(2)
JANUARY 2013
16. A survey of 100 households gave the following results for weekly income y.
Income y (£)     Mid-point   Frequency f
0 ≤ y < 200      100         12
200 ≤ y < 240    220         28
240 ≤ y < 320    280         22
320 ≤ y < 400    360         18
400 ≤ y < 600    500         12
600 ≤ y < 800    700         8
(You may use $\sum fy^2 = 12\,452\,800$)
A histogram was drawn and the class 200 ≤ y < 240 was represented by a rectangle of
width 2 cm and height 7 cm.
(a) Calculate the width and the height of the rectangle representing the class
320 ≤ y < 400
(3)
(b) Use linear interpolation to estimate the median weekly income to the nearest pound.
(2)
(c) Estimate the mean and the standard deviation of the weekly income for these data.
(4)
One measure of skewness is $\frac{3(\text{mean} - \text{median})}{\text{standard deviation}}$.
(d) Use this measure to calculate the skewness for these data and describe its value.
(2)
Katie suggests using the random variable X which has a normal distribution with mean
320 and standard deviation 150 to model the weekly income for these data.
(e) Find P(240 < X < 400).
(2)
(f) With reference to your calculations in parts (d) and (e) and the data in the table,
comment on Katie's suggestion.
(2)
JUNE 2013
17. The marks of a group of female students in a statistics test are summarised in Figure 1.
[Box plot of the females' marks, with space for the males' box plot, on a Mark axis from 10 to 100]
Figure 1
(a) Write down the mark which is exceeded by 75% of the female students.
(1)
The marks of a group of male students in the same statistics test are summarised by the
stem and leaf diagram below.
Mark                  Totals
1 | 4                 (1)
2 | 6                 (1)
3 | 447               (3)
4 | 066778            (6)
5 | 001113677         (9)
6 | 223338            (6)
7 | 008               (3)
8 | 5                 (1)
9 | 0                 (1)
Key: 2|6 means 26
(b) Find the median and interquartile range of the marks of the male students.
(3)
An outlier is a mark that is
either more than 1.5 × interquartile range above the upper quartile
or more than 1.5 × interquartile range below the lower quartile.
(c) In the space provided on Figure 1 draw a box plot to represent the marks of the male
students, indicating clearly any outliers.
(5)
(d) Compare and contrast the marks of the male and the female students.
(2)
18. The following table summarises the times, t minutes to the nearest minute, recorded for a
group of students to complete an exam.
Time (minutes) t       11-20   21-25   26-30   31-35   36-45   46-60
Number of students f   62      88      16      13      11      10
[You may use $\sum ft^2 = 134\,281.25$]
(a) Estimate the mean and standard deviation of these data.

(5)
(b) Use linear interpolation to estimate the value of the median.
(2)
(c) Show that the estimated value of the lower quartile is 18.6 to 3 significant figures.
(1)
(d) Estimate the interquartile range of this distribution.
(2)
(e) Give a reason why the mean and standard deviation are not the most appropriate
summary statistics to use with these data.
(1)
The person timing the exam made an error and each student actually took 5 minutes less
than the times recorded above. The table below summarises the actual times.
Time (minutes) t       6-15    16-20   21-25   26-30   31-40   41-55
Number of students f   62      88      16      13      11      10
(f) Without further calculations, explain the effect this would have on each of the
estimates found in parts (a), (b), (c) and (d).
(3)
19. An agriculturalist is studying the yields, y kg, from tomato plants. The data from a random
sample of 70 tomato plants are summarised below.
Yield (y kg)   Frequency (f)   Yield midpoint (x kg)
0 ≤ y < 5      16              2.5
5 ≤ y < 10     24              7.5
10 ≤ y < 15    14              12.5
15 ≤ y < 25    12              20
25 ≤ y < 35    4               30
(You may use $\sum fx = 755$ and $\sum fx^2 = 12\,037.5$)
A histogram has been drawn to represent these data.

The bar representing the yield 5 ≤ y < 10 has a width of 1.5 cm and a height of 8 cm.
(a) Calculate the width and the height of the bar representing the yield 15 ≤ y < 25
(3)
(b) Use linear interpolation to estimate the median yield of the tomato plants.
(2)
(c) Estimate the mean and the standard deviation of the yields of the tomato plants.
(4)
(d) Describe, giving a reason, the skewness of the data.
(2)
(e) Estimate the number of tomato plants in the sample that have a yield of more than
1 standard deviation above the mean.
(2)
1
(a) mean is $\frac{2757}{12} = 229.75$
sd is $\sqrt{\frac{724961}{12} - (229.75)^2} = 87.34045$
(b) Ordered list is: 125, 160, 169, 171, 175, 186, 210, 243, 250, 258, 390, 420
$Q_2 = \frac{1}{2}(186 + 210) = 198$
$Q_1 = \frac{1}{2}(169 + 171) = 170$
$Q_3 = \frac{1}{2}(250 + 258) = 254$
(c) $Q_3 + 1.5(Q_3 - Q_1) = 254 + 1.5(254 - 170) = 380$
Patients F (420) and B (390) are outliers.
(d) $\frac{Q_1 - 2Q_2 + Q_3}{Q_3 - Q_1} = \frac{170 - 2 \times 198 + 254}{254 - 170} = 0.\dot{3}$
Positive skew
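The working for question 1 can be checked with a short script. This is a minimal sketch, not part of the mark scheme: the helper name `quartile` is hypothetical, and it assumes the convention the answers above appear to use (when n × fraction is a whole number k, average the k-th and (k+1)-th ordered values; otherwise round up).

```python
from math import sqrt

# Cotinine levels for patients A to L, as given in question 1.
levels = {"A": 160, "B": 390, "C": 169, "D": 175, "E": 125, "F": 420,
          "G": 171, "H": 250, "I": 210, "J": 258, "K": 186, "L": 243}


def quartile(sorted_values, fraction):
    """Quartile under the assumed rule: if n * fraction is a whole number k,
    average the k-th and (k+1)-th values; otherwise take the next value up."""
    position = len(sorted_values) * fraction
    k = int(position)
    if position == k:
        return (sorted_values[k - 1] + sorted_values[k]) / 2
    return sorted_values[k]  # the (k+1)-th value in 1-based counting


x = sorted(levels.values())
n = len(x)
mean = sum(x) / n
sd = sqrt(sum(v * v for v in x) / n - mean ** 2)  # divisor n

q1, q2, q3 = (quartile(x, f) for f in (0.25, 0.5, 0.75))
upper_fence = q3 + 1.5 * (q3 - q1)
outliers = sorted(p for p, v in levels.items() if v > upper_fence)
skew = (q1 - 2 * q2 + q3) / (q3 - q1)

print(mean, round(sd, 5), (q1, q2, q3), upper_fence, outliers, round(skew, 3))
```

A positive value of `skew` here corresponds to the "positive skew" conclusion in part (d).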
2
(a) 50
(b) $Q_1 = 45$, $Q_2 = 50.5$, $Q_3 = 63$
(c) Mean = $\frac{1469}{28} = 52.464286\ldots$
Sd = $\sqrt{\frac{81213}{28} - \left(\frac{1469}{28}\right)^2} = 12.164\ldots$ or $12.387216\ldots$ for divisor $n - 1$
(d) $\frac{52.46\ldots - 50}{\text{sd}}$ = awrt 0.20 or 0.21
(e)
1. mode/median/mean Balmoral > mode/median/mean Abbey
2. Balmoral sd < Abbey sd or similar sd or correct comment from their values,
Balmoral range < Abbey range,
Balmoral IQR > Abbey IQR or similar IQR
3. Balmoral positive skew or almost symmetrical AND Abbey negative skew,
Balmoral is less skew than Abbey or correct comment from their value in (d)
4. Balmoral residents generally older than Abbey residents or equivalent.
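As a cross-check on question 2, the Balmoral Hotel ages can be rebuilt from the stem and leaf diagram and the summary values recomputed. This is an illustrative sketch only; the dictionary layout and variable names are assumptions made for the example.

```python
from math import sqrt

# Balmoral Hotel leaves keyed by stem (tens digit), read from the diagram.
balmoral = {2: "6", 3: "447", 4: "005569",
            5: "000013667", 6: "233457", 7: "015"}
ages = [10 * stem + int(leaf)
        for stem, leaves in balmoral.items() for leaf in leaves]

n = len(ages)
sum_x = sum(ages)
sum_x2 = sum(a * a for a in ages)

mean = sum_x / n
sd = sqrt(sum_x2 / n - mean ** 2)      # divisor n
mode = max(set(ages), key=ages.count)  # most frequent age
skew = (mean - mode) / sd              # (mean - mode) / standard deviation

print(n, sum_x, sum_x2, mode, round(mean, 3), round(sd, 3), round(skew, 3))
```

The rebuilt totals agree with the given value of the sum of squares, which supports the reading of the diagram used here.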
3
(a) 8-10 hours: width = 10.5 - 7.5 = 3, represented by 1.5 cm
16-25 hours: width = 25.5 - 15.5 = 10, so represented by 5 cm
8-10 hours: height = fd = 18/3 = 6, represented by 3 cm
16-25 hours: height = fd = 15/10 = 1.5, represented by 0.75 cm
(b) $Q_2 = 7.5 + \frac{(52 - 36)}{18} \times 3 = 10.2$
$Q_1 = 5.5 + \frac{(26 - 20)}{16} \times 2$ [= 6.25 or 6.3] or $5.5 + \frac{(26.25 - 20)}{16} \times 2$ [= 6.3]
$Q_3 = 10.5 + \frac{(78 - 54)}{25} \times 5$ [= 15.3] or $10.5 + \frac{(78.75 - 54)}{25} \times 5$ [= 15.45 or 15.5]
IQR = (15.3 - 6.3) = 9
(c) $\sum fx = 1333.5 \Rightarrow \bar{x} = \frac{1333.5}{104}$ = AWRT 12.8
$\sum fx^2 = 27254 \Rightarrow \sigma_x = \sqrt{\frac{27254}{104} - \bar{x}^2} = \sqrt{262.05 - \bar{x}^2}$ = AWRT 9.88
(d) $Q_3 - Q_2$ [= 5.1] > $Q_2 - Q_1$ [= 3.9] or $Q_2 < \bar{x}$
So data is positively skew
(e) Use median and IQR,
since data is skewed or not affected by extreme values or outliers
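The linear interpolation used for the quartiles in question 3 can be sketched as a small function. The class boundaries below assume times recorded to the nearest hour with the first class starting at 0 (consistent with its mid-point of 2.75); the function name `interpolate` is hypothetical.

```python
# Question 3 classes as (lower boundary, upper boundary, frequency).
classes = [(0, 5.5, 20), (5.5, 7.5, 16), (7.5, 10.5, 18),
           (10.5, 15.5, 25), (15.5, 25.5, 15), (25.5, 50.5, 10)]


def interpolate(classes, target):
    """Value at cumulative frequency `target`, assuming the observations
    are spread evenly across each class."""
    cumulative = 0
    for lower, upper, freq in classes:
        if cumulative + freq >= target:
            return lower + (target - cumulative) / freq * (upper - lower)
        cumulative += freq
    raise ValueError("target exceeds the total frequency")


n = sum(freq for _, _, freq in classes)
q1, q2, q3 = (interpolate(classes, n * f) for f in (0.25, 0.5, 0.75))
iqr = q3 - q1

print(n, round(q1, 2), round(q2, 2), round(q3, 2), round(iqr, 2))
```

This sketch uses n/4, n/2 and 3n/4 as the target positions; replacing `n` with `n + 1` in the targets gives the alternative values the mark scheme also accepts.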
4
(a) 1 (cm)
(b) 10 cm$^2$ represents 15
10/15 cm$^2$ represents 1
Therefore frequency of 9 is $\frac{10}{15} \times 9$ or $\frac{9}{1.5}$
height = 6 (cm)
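The histogram scaling argument (widths proportional to class width, areas proportional to frequency) can be written as one helper. This is a sketch under those two assumptions; `scaled_bar` and its parameter names are invented for the illustration.

```python
def scaled_bar(ref_width_units, ref_freq, ref_width_cm, ref_height_cm,
               width_units, freq):
    """Width and height in cm of a histogram bar, given one reference bar.
    Bar widths scale with class width and bar areas scale with frequency."""
    cm_per_unit = ref_width_cm / ref_width_units
    area_per_item = ref_width_cm * ref_height_cm / ref_freq
    width_cm = width_units * cm_per_unit
    height_cm = freq * area_per_item / width_cm
    return width_cm, height_cm


# Question 4: the 10-15 class (boundaries 9.5 to 15.5, frequency 15) is
# drawn 2 cm by 5 cm; the 16-18 class has width 3 and frequency 9.
q4_width, q4_height = scaled_bar(6, 15, 2, 5, width_units=3, freq=9)

# Question 3: the 8-10 class (width 3, frequency 18) is drawn 1.5 cm by
# 3 cm; the 16-25 class has width 10 and frequency 15.
q3_width, q3_height = scaled_bar(3, 18, 1.5, 3, width_units=10, freq=15)

print(q4_width, q4_height, q3_width, q3_height)
```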
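5
(a) $Q_2 = 17 + \frac{60 - 58}{29} \times 2 = 17.1$ (17.2 if use 60.5)
(b) $\sum fx = 2055.5$, $\sum fx^2 = 36500.25$
Mean = $\frac{2055.5}{120} = 17.129\ldots$
$\sigma = \sqrt{\frac{36500.25}{120} - \left(\frac{2055.5}{120}\right)^2} = 3.28$ (s = 3.294)
(c) $\frac{3(17.129\ldots - 17.1379\ldots)}{3.28} = -0.00802$
No skew / slight skew
(d) The skewness is very small. Possible.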
6
(a) Median is 33
(b) $Q_1 = 24$, $Q_3 = 40$, IQR = 16
(c) $Q_1 - \text{IQR} = 24 - 16 = 8$
So 7 is only outlier
(d) Box plot: box, outlier, whisker (accept either whisker)
7
(a) 2.75 or $2\frac{3}{4}$, 5.5 or 5.50 or $5\frac{1}{2}$
(b) Mean birth weight = $\frac{4841}{1500} = 3.2273\ldots$ awrt 3.23
(c) Standard deviation = $\sqrt{\frac{15889.5}{1500} - \left(\frac{4841}{1500}\right)^2} = 0.421093\ldots$ or s = 0.4212337...
(d) $Q_2 = 3.00 + \frac{403}{820} \times 0.5 = 3.2457\ldots$ (allow 403.5 giving 3.25)
(e) Mean (3.23) < Median (3.25) (or very close)
Negative skew (or symmetrical)
8
(a) 23, 35.5 (may be in the table)
(b) Width of 10 units is 4 cm so width of 5 units is 2 cm
Height = 2.6 × 4 = 10.4 cm
(c) $\sum fx = 1316.5 \Rightarrow \bar{x} = \frac{1316.5}{56}$ = awrt 23.5
$\sum fx^2 = 37378.25$ (can be implied)
So $\sigma = \sqrt{\frac{37378.25}{56} - \bar{x}^2}$ = awrt 10.7 (allow s = 10.8)
(d) $Q_2 = (20.5) + \frac{(28 - 21)}{11} \times 5 = 23.68\ldots$ awrt 23.7 or 23.9
(e) $Q_3 - Q_2 = 5.6$, $Q_2 - Q_1 = 7.9$ (or $\bar{x} < Q_2$)
[7.9 > 5.6 so] negative skew
9.
(a) 2.8 + 5.6 + 2.3 + 9.4 + 0.5 + 1.8 + 84.6 = 107
mean = 107 / 28 (= 3.821 )
(awrt 3.8)
(b) It will have no effect since
one is 4.5 under what it should be and the other is 4.5 above what it should be.
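The argument in question 9(b) rests on the total being unchanged when the two transposed figures are corrected. A minimal numerical check, with variable names chosen only for this example:

```python
# Keith's seven recorded values and Jenny's total for the next 21 days.
keith = [2.8, 5.6, 2.3, 9.4, 0.0, 0.5, 1.8]
jenny_total = 84.6
days = len(keith) + 21

mean_before = (sum(keith) + jenny_total) / days

# Correct the two transposed figures: 9.4 becomes 4.9 and 0.5 becomes 5.0.
corrected = [4.9 if v == 9.4 else 5.0 if v == 0.5 else v for v in keith]
mean_after = (sum(corrected) + jenny_total) / days

print(round(mean_before, 3), round(mean_after, 3))
```

One correction lowers the total by 4.5 and the other raises it by 4.5, so the two means agree up to floating-point rounding.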
10.
(a) Outliers
$14 + 1.5 \times (14 - 7) = 24.5$
$7 - 1.5 \times (14 - 7) = -3.5$
Outlier 25
either upper limit acceptable on diagram
[Box plot drawn on a Sales (£000) axis]
(b) Since $Q_3 - Q_2 < Q_2 - Q_1$. Allow written explanation
negatively skew
(c) not true
since the lower quartile is 7000 and therefore 75% above 7000 not 10000
or 10 is inside the box or any other sensible comment
11.
(a) Median = 32/2 = 16th term (16.5)
$\frac{x - 39.5}{49.5 - 39.5} = \frac{16 - 14}{25 - 14}$ or $x = 39.5 + \frac{2}{11} \times 10$
Median = 41.3 (use of n + 1 gives 41.8) (awrt 41.3)
(b) Mean = $\frac{1414}{32} = 44.1875$ (awrt 44.2)
Standard deviation = $\sqrt{\frac{69378}{32} - \left(\frac{1414}{32}\right)^2} = 14.7$ (or s = 14.9)
(c) mean > median therefore positive skew
12.
(a) 10.5
(b) $(Q_2 =)\ (15.5 +)\ \frac{\frac{1}{2} \times 30 - 14}{8} \times 3$ or $\frac{\frac{1}{2} \times 31 - 14}{8} \times 3$
= 15.875 or 16.0625
(c) $\bar{x} = \frac{477.5}{30} = 15.9$ $(15.91\dot{6})$ [Accept $\frac{191}{12}$ or $15\frac{11}{12}$]
$\sigma = \sqrt{\frac{8603.75}{30} - \bar{x}^2} = 5.78$ (accept s = 5.88)
(d) Since mean and median are similar (or equal or very close) a normal distribution
may be suitable. [Allow mean or median close to mode/modal class]
(e) $Q_3 - Q_2\ (= 8) > (4.5 =)\ Q_2 - Q_1$
Therefore positive skew
13
(a) 14, 5
(b) 21 + 45 + 3 = 69
14
(a) 60
(b) $Q_1 = 46$, $Q_2 = 56$, $Q_3 = 64$
(c) mean = 55.48... or $\frac{2497}{45}$, awrt 55.5
sd = $\sqrt{\frac{143369}{45} - \left(\frac{2497}{45}\right)^2} = 10.342\ldots$ (s = 10.459...)
anything which rounds to 10.3 (or s = 10.5)
(d) Mean < median < mode or $Q_2 - Q_1 > Q_3 - Q_2$ with or without their numbers or
median closer to upper quartile (than lower quartile) or (mean - median)/sd < 0;
negative skew;
(e) mean = $(55 - 5) \times 0.9 = 45$
sd = $10 \times 0.9 = 9$
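The scaling in question 14(e) is a linear transformation of the form y = 0.9(x - 5): the mean follows the whole transformation, while the standard deviation is affected only by the factor 0.9. The sketch below states the rule for the population figures and then checks it numerically on the 45 sampled marks; the function names are hypothetical.

```python
from math import sqrt


def mean_sd(values):
    """Mean and standard deviation (divisor n) of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    return mean, sqrt(sum(v * v for v in values) / n - mean ** 2)


def scale(mark):
    """Subtract 5 marks, then reduce the result by a further 10 %."""
    return 0.9 * (mark - 5)


# Population summary from the question: mean 55, standard deviation 10.
scaled_mean = scale(55)  # the mean follows the full transformation
scaled_sd = 0.9 * 10     # subtracting 5 does not change the spread

# Numerical check on the sampled marks from the stem and leaf diagram.
diagram = [(3, "699"), (4, "012234"), (4, "56668"), (5, "023344"),
           (5, "556779"), (6, "000013444"), (6, "556789"), (7, "1233")]
marks = [10 * stem + int(leaf) for stem, leaves in diagram for leaf in leaves]

m, s = mean_sd(marks)
m_scaled, s_scaled = mean_sd([scale(x) for x in marks])

print(scaled_mean, scaled_sd, round(m, 2), round(s, 2),
      round(m_scaled, 2), round(s_scaled, 2))
```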
15.
(a) One large square = $\frac{450}{\text{"22.5"}}$ or one small square = $\frac{450}{\text{"562.5"}}$ (o.e. e.g. $\frac{\text{"562.5"}}{450}$)
One large square = 20 cars or one small square = 0.8 cars or 1 car = 1.25 squares
No. > 35 mph is: 4.5 × "20" or 112.5 × "0.8" (or equivalent e.g. using fd)
= 90 (cars)
(b) $[\bar{x}] = \frac{30 \times 12.5 + 240 \times 25 + 90 \times 32.5 + 30 \times 37.5 + 60 \times 42.5}{450} = \frac{12975}{450}$
= 28.83... or $\frac{173}{6}$, awrt 28.8
(c) $[Q_2 =]\ 20 + \frac{195}{240} \times 10$ (o.e.) [Allow use of (n + 1) giving 195.5 instead of 195]
= 28.125 [Use of (n + 1) gives 28.145...] awrt 28.1
(d) $Q_2 < \bar{x}$ [Condone $Q_2 \approx \bar{x}$, so (almost) symmetric]
So positive skew
(e) [If chose skew in (d)] median ($Q_2$)
Since the data is skewed
or
median not affected by extreme values
[If chose symmetric in (d)] mean ($\bar{x}$)
Since it uses all the data
16.
(a) Width = 4 (cm)
Area of 14 cm$^2$ represents frequency 28 and area of 4h represents 18
Or $\frac{4h}{18} = \frac{14}{28}$ (o.e.)
h = 2.25 (cm)
(b) $m = (240) + \frac{10}{22} \times 80$ (o.e.)
= 276.36... $\left(\frac{3040}{11}\right)$ ((£)276 < m < (£)276.5)
(c) $\sum fy = 31600$ leading to $\bar{y} = 316$
$\sigma_y = \sqrt{\frac{12452800}{100} - \bar{y}^2} = 157.07\ldots$ (awrt 157) Allow s = 157.86...
(d) Skewness = 0.764... (awrt 0.76 or 0.75)
[If n + 1 used in (b) and m = 278 accept awrt 0.73 or 0.72]
Positive
(e) $z = \pm\frac{80}{150}$
P(240 < X < 400) = 0.40 ~ 0.41
(f) (e) suggests a reasonable fit for this range BUT
(d) since skew it will not be a good fit overall
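The probability in question 16(e) can be confirmed with the standard library's `statistics.NormalDist` (Python 3.8 or later). This sketch only checks the arithmetic; comparing the result with the observed proportion in the same income range is an illustration of the comment in part (f), not a formal goodness-of-fit test.

```python
from statistics import NormalDist

# Katie's proposed model: X ~ N(320, 150^2).
model = NormalDist(mu=320, sigma=150)
p = model.cdf(400) - model.cdf(240)

# Proportion of the 100 surveyed households with 240 <= y < 400
# (class frequencies 22 and 18 from the table in the question).
observed = (22 + 18) / 100

print(round(p, 4), observed)
```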
17.
(a) 25 (allow any x where 24 < x < 26)
(b) $Q_2$ (or median or m) = 51
IQR = 63 - 46 = 17 (or $Q_3 - Q_1 = 17$)
(c) Outliers given by $46 - 1.5 \times 17 = 20.5$ or $63 + 1.5 \times 17 = 88.5$
Outlier limits are 20.5 and 88.5
Allow lower whisker to 20.5 and upper whisker to 88.5
Do not allow a mix of whiskers, e.g. 20.5 and 85
Do not allow both sets of whiskers
(d) Medians:
IQR: IQR for females smaller than males. Allow lower/higher but not wider
Range: Range of females is less than males
Skewness: Male and female marks are both positively skew
Ignore other statements about average, spread, mean, st. dev., variation, outliers etc.
18.
(a) $\sum ft = 4837.5$ (allow 4838 or 4840)
Mean = $\frac{\text{"4837.5"}}{200} = 24.1875$, awrt 24.2 or $\frac{387}{16}$
$\sigma = \sqrt{\frac{134281.25}{200} - \left(\frac{4837.5}{200}\right)^2} = 9.293\ldots$ awrt 9.29 (accept s = 9.32)
(b) $Q_2 = [20.5] + \frac{(100/100.5 - 62)}{88} \times 5 = 22.659\ldots$ awrt 22.7
(c) $Q_1 = 10.5 + \frac{(50/50.25)}{62} \times 10$ [= 18.56] (*) (n + 1 gives 18.604...)
(d) $Q_3 = 25.5$ (Use of n + 1 gives 25.734...)
IQR = 6.9 (Use of n + 1 gives 7.1)
(e) The data is skewed (condone negative skew)
(f) Mean decreases and st. dev. remains the same. [Must mention mean and st. dev.] (from (a))
The median and quartiles would decrease. [Must refer to median and at least $Q_1$.] (from (b), (c))
The IQR would remain unchanged (from (d))
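The effect described in question 18(f) can be seen by recomputing the estimates for both tables. The sketch assumes class boundaries at the half-minute (times to the nearest minute) and uses hypothetical helper names; it illustrates the reasoning and is not a substitute for the "without further calculations" explanation the question asks for.

```python
from math import sqrt

# Recorded times as (lower boundary, upper boundary, frequency).
recorded = [(10.5, 20.5, 62), (20.5, 25.5, 88), (25.5, 30.5, 16),
            (30.5, 35.5, 13), (35.5, 45.5, 11), (45.5, 60.5, 10)]
# Actual times: every boundary is 5 minutes lower, frequencies unchanged.
actual = [(lo - 5, hi - 5, f) for lo, hi, f in recorded]


def grouped_mean_sd(classes):
    """Estimated mean and standard deviation (divisor n) using mid-points."""
    n = sum(f for _, _, f in classes)
    sum_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
    sum_fx2 = sum(f * ((lo + hi) / 2) ** 2 for lo, hi, f in classes)
    mean = sum_fx / n
    return mean, sqrt(sum_fx2 / n - mean ** 2)


def interpolate(classes, target):
    """Value at cumulative frequency `target` by linear interpolation."""
    cumulative = 0
    for lo, hi, f in classes:
        if cumulative + f >= target:
            return lo + (target - cumulative) / f * (hi - lo)
        cumulative += f
    raise ValueError("target exceeds the total frequency")


def summary(classes):
    n = sum(f for _, _, f in classes)
    mean, sd = grouped_mean_sd(classes)
    q1, q2, q3 = (interpolate(classes, n * p) for p in (0.25, 0.5, 0.75))
    return {"mean": mean, "sd": sd, "q1": q1, "median": q2, "iqr": q3 - q1}


before = summary(recorded)
after = summary(actual)
print(before)
print(after)
```

Measures of location (mean, median, quartiles) drop by 5, while measures of spread (standard deviation, IQR) are unchanged.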
19.
(a) Width = 2 × 1.5 = 3 (cm)
Area = 8 × 1.5 = 12 cm$^2$, Frequency = 24 so 1 cm$^2$ = 2 plants (o.e.)
Frequency of 12 corresponds to area of 6 so height = 2 (cm)
(b) $[Q_2 =]\ (5 +)\ \frac{19}{24} \times 5$ or (use of (n + 1)) $(5 +)\ \frac{19.5}{24} \times 5$
= 8.9583... awrt 8.96 or 9.0625 awrt 9.06
(c) $[\bar{x} =]\ \frac{755}{70}$ or awrt 10.8
$[\sigma_x =]\ \sqrt{\frac{12037.5}{70} - \bar{x}^2} = \sqrt{55.6326\ldots}$ = awrt 7.46 (Accept s = awrt 7.51)
(d) $\bar{x} > Q_2$
So positive skew
(e) $\bar{x} + \sigma \approx 18.3$ so number of plants is e.g. $\frac{(25 - \text{"18.3"})}{10} \times 12\ (+4)$ (o.e.)
= 12.04 so 12 plants
