Statistics Tutorial 1
Tutorial 1
CENTRAL TENDENCY AND SPREAD
Measures of central tendency, also known as averages, give a single value that represents the
entire set of data. Data on a variable generally tend to cluster around some central value.
The common measures are:
1. Arithmetic Mean
2. Median
3. Mode
4. Geometric Mean
5. Harmonic Mean
• Discrete data is counted and can take only certain values, e.g., the number of students in
a class or the results of rolling two dice.
• Continuous data is measured and can take any value within a range, e.g., a person’s
height, weight, or time in a race.
Below are definitions and formulae for Ungrouped (Discrete) Data.
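Sample Variance, $s^2$
$s^2 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}$
Sample S.D., s
The sample standard deviation is the square root of the variance.
$s = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}}$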
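The two formulas above can be checked numerically. The following is a minimal Python sketch, assuming the data are held in a plain list; the function names `sample_variance` and `sample_sd` are illustrative and not part of the tutorial.

```python
from math import sqrt


def sample_variance(xs):
    """Sum of squared deviations from the sample mean, divided by N - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two values")
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)


def sample_sd(xs):
    """Square root of the sample variance."""
    return sqrt(sample_variance(xs))
```

The standard library functions `statistics.variance` and `statistics.stdev` use the same N - 1 divisor and can serve as a cross-check.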
Example 1.
Consider the set of test scores, out of 20, for a class of 12 students:
4, 17, 7, 14, 18, 12, 3, 16, 10, 4, 4, 11
We first put them in order:
3, 4, 4, 4, 7, 10, 11, 12, 14, 16, 17, 18
Mean: $\mu = \frac{3+4+4+4+7+10+11+12+14+16+17+18}{12} = \frac{120}{12} = 10$
Mode = 4
Median: 3, 4, 4, 4, 7, 10, 11, 12, 14, 16, 17, 18
$\text{Median} = \frac{10 + 11}{2} = 10.5$
Lower Quartile, Q1: 3, 4, 4 | 4, 7, 10 | 11, 12, 14 | 16, 17, 18
$Q_1 = \frac{4 + 4}{2} = 4$
Upper Quartile, Q3: 3, 4, 4 | 4, 7, 10 | 11, 12, 14 | 16, 17, 18
$Q_3 = \frac{14 + 16}{2} = 15$
Range = 18 – 3
= 15
Interquartile Range = 15 – 4
= 11
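The measures of location and spread found so far in Example 1 can be reproduced with a short Python sketch. It assumes the quartiles are taken as the medians of the lower and upper halves of the ordered data, which is what the worked example does; the helper name `quartiles_by_halves` is illustrative.

```python
from statistics import mean, median, mode

scores = [4, 17, 7, 14, 18, 12, 3, 16, 10, 4, 4, 11]


def quartiles_by_halves(values):
    """Q1 and Q3 as the medians of the lower and upper halves of the sorted data."""
    ordered = sorted(values)
    half = len(ordered) // 2  # the middle value is left out when the count is odd
    return median(ordered[:half]), median(ordered[-half:])


q1, q3 = quartiles_by_halves(scores)
summary = {
    "mean": mean(scores),
    "median": median(scores),
    "mode": mode(scores),
    "q1": q1,
    "q3": q3,
    "range": max(scores) - min(scores),
    "iqr": q3 - q1,
}
print(summary)
```

Other quartile conventions, such as the interpolation methods offered by `statistics.quantiles`, can give slightly different values for the same data.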
Variance, $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{12} = \frac{336}{12} = 28$
x     f    $x - \mu$      $(x - \mu)^2$    $f \times (x - \mu)^2$
3     1    3 - 10 = -7    (-7)^2 = 49      49 × 1 = 49
4     3    -6             36               36 × 3 = 108
7     1    -3             9                9
10    1    0              0                0
11    1    1              1                1
12    1    2              4                4
14    1    4              16               16
16    1    6              36               36
17    1    7              49               49
18    1    8              64               64
Totals: $\sum f = 12$, $\sum_{i=1}^{N} (x_i - \mu)^2 = 336$
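The frequency table above can be built programmatically. This is a sketch only, assuming the scores from Example 1 and using `collections.Counter` to tally the frequencies; the variable names are illustrative.

```python
from collections import Counter

scores = [4, 17, 7, 14, 18, 12, 3, 16, 10, 4, 4, 11]
mu = sum(scores) / len(scores)

freq = Counter(scores)  # maps each distinct score x to its frequency f

# One row per distinct score: (x, f, x - mu, (x - mu)^2, f * (x - mu)^2)
rows = [
    (x, f, x - mu, (x - mu) ** 2, f * (x - mu) ** 2)
    for x, f in sorted(freq.items())
]

total_f = sum(row[1] for row in rows)    # total number of values
total_sq = sum(row[4] for row in rows)   # sum of squared deviations

for row in rows:
    print(row)
print("totals:", total_f, total_sq)
```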
Standard Deviation, $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{12}} = \sqrt{28} = 5.2915\ldots$
Coefficient of Variation $= \frac{\sigma}{\mu} \times 100 = \frac{5.2915}{10} \times 100 = 52.915$
Pearson’s Coefficient of Skewness
$Sk_2 = \frac{3(\bar{x} - Md)}{s}$
$Sk_2 = \frac{3(10 - 10.5)}{5.2915} = -0.2835$
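The measures of spread and skewness from Example 1 can be verified with the sketch below. Note that the worked example divides by N = 12 rather than N - 1, so the code does the same; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, median

scores = [4, 17, 7, 14, 18, 12, 3, 16, 10, 4, 4, 11]

mu = mean(scores)
# Divide by N (not N - 1), matching the worked example
variance = sum((x - mu) ** 2 for x in scores) / len(scores)
sigma = sqrt(variance)
cv = sigma / mu * 100                    # coefficient of variation, in percent
sk2 = 3 * (mu - median(scores)) / sigma  # Pearson's coefficient of skewness

print(variance, round(sigma, 4), round(cv, 3), round(sk2, 4))
```

A negative skewness coefficient here reflects a mean (10) slightly below the median (10.5).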
Below are formulae for Grouped (Continuous) Data. The definitions are omitted.
Term            Formula
Sample Mean     $\bar{x} = \frac{\sum f m_i}{\sum f}$
Where:
• 𝑚𝑖 is the midpoint of the ith group
• 𝑓 is the frequency of the ith group
• ∑ 𝑓 is the total number of values
Estimated Median
$\text{Estimated Median} = L + \frac{\frac{n}{2} - B}{G} \times W$
Where:
• L is the lower class boundary of the group containing
the median
• n is the total number of values
• B is the cumulative frequency of the groups BEFORE
the median group
• G is the frequency of the median group
• W is the group width
Variance, $\sigma^2$
$\sigma^2 = \frac{\sum_{i=1}^{N} f m_i^2}{n} - \bar{x}^2$
Where:
• 𝑚𝑖 is the midpoint of the ith group
• 𝑓 is the frequency of the ith group
• 𝑛 is the total number of values (also ∑ 𝑓)
• 𝑥̅ is the mean of the grouped data
Standard Deviation, $\sigma$
$\sigma = \sqrt{\frac{\sum_{i=1}^{N} f m_i^2}{n} - \bar{x}^2}$
Coefficient of Variation, CV
$CV = \frac{\sigma}{\mu} \times 100$
Length (mm)   Frequency
150 - 154     5
155 - 159     2
160 - 164     6
165 - 169     8
170 - 174     9
175 - 179     11
180 - 184     6
185 - 189     3
Estimated Mean
Length (mm)   Midpoint m   Frequency f   fm
150 - 154     152          5             760
155 - 159     157          2             314
160 - 164     162          6             972
165 - 169     167          8             1336
170 - 174     172          9             1548
175 - 179     177          11            1947
180 - 184     182          6             1092
185 - 189     187          3             561
Totals:                    50            8530
Estimated Mean $= \frac{8530}{50} = 170.6$ mm
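The estimated mean can be recomputed from the class limits and frequencies. The sketch below assumes each class is stored as a (lower limit, upper limit, frequency) tuple copied from the table; that layout and the variable names are illustrative choices.

```python
# (lower limit, upper limit, frequency) for each class, copied from the table
groups = [
    (150, 154, 5), (155, 159, 2), (160, 164, 6), (165, 169, 8),
    (170, 174, 9), (175, 179, 11), (180, 184, 6), (185, 189, 3),
]

midpoints = [(lo + hi) / 2 for lo, hi, _ in groups]  # m for each class
freqs = [f for _, _, f in groups]                    # f for each class

n = sum(freqs)                                          # total number of values
sum_fm = sum(f * m for f, m in zip(freqs, midpoints))   # sum of f * m
estimated_mean = sum_fm / n

print(n, sum_fm, estimated_mean)
```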
Estimated Median
The median is the mean of the 25th and the 26th lengths, so it lies in the 170 - 174 group:
$\text{Estimated Median} = 169.5 + \frac{\frac{50}{2} - 21}{9} \times 5$
$= 169.5 + 2.22\ldots$
$= 171.7$ mm (to 1 decimal place)
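The estimated median calculation can be sketched as a small function. It assumes at least two classes of equal width with whole-number limits, so that each lower class boundary is the lower limit minus 0.5 (as with 169.5 above); the function name and tuple layout are illustrative.

```python
def estimated_median(groups):
    """groups: (lower limit, upper limit, frequency) tuples in ascending order."""
    n = sum(f for _, _, f in groups)
    if n == 0:
        raise ValueError("total frequency must be positive")
    width = groups[1][0] - groups[0][0]  # W: gap between consecutive lower limits
    cumulative = 0                       # B: cumulative frequency before this group
    for lo, _, f in groups:
        if cumulative + f >= n / 2:      # this is the median group
            lower_boundary = lo - 0.5    # L: assumes whole-number class limits
            return lower_boundary + (n / 2 - cumulative) / f * width
        cumulative += f


groups = [
    (150, 154, 5), (155, 159, 2), (160, 164, 6), (165, 169, 8),
    (170, 174, 9), (175, 179, 11), (180, 184, 6), (185, 189, 3),
]
print(round(estimated_median(groups), 1))
```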
Mode
The Modal group is the one with the highest frequency, which is 175 - 179:
$\text{Estimated Mode} = 174.5 + \frac{11 - 9}{(11 - 9) + (11 - 6)} \times 5$
$= 174.5 + 1.42\ldots$
$= 175.9$ mm (to 1 decimal place)
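The estimated mode calculation above can be sketched in the same way. This assumes equal-width classes with whole-number limits, and it treats a missing neighbouring class as having frequency 0, which is an assumption of this sketch rather than something stated in the tutorial. The function name is illustrative.

```python
def estimated_mode(groups):
    """groups: (lower limit, upper limit, frequency) tuples in ascending order."""
    freqs = [f for _, _, f in groups]
    k = freqs.index(max(freqs))  # first class with the highest frequency
    f_mode = freqs[k]
    # Assumption: a missing neighbour counts as frequency 0
    f_prev = freqs[k - 1] if k > 0 else 0
    f_next = freqs[k + 1] if k < len(freqs) - 1 else 0
    width = groups[1][0] - groups[0][0]   # common class width
    lower_boundary = groups[k][0] - 0.5   # assumes whole-number class limits
    return lower_boundary + (f_mode - f_prev) / ((f_mode - f_prev) + (f_mode - f_next)) * width


groups = [
    (150, 154, 5), (155, 159, 2), (160, 164, 6), (165, 169, 8),
    (170, 174, 9), (175, 179, 11), (180, 184, 6), (185, 189, 3),
]
print(round(estimated_mode(groups), 1))
```

The formula is undefined when the modal class and both neighbours share the same frequency, since the denominator is then zero.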
Estimated Variance
$\sigma^2 = \frac{\sum_{i=1}^{N} f m_i^2}{n} - \bar{x}^2$
Length (mm)   Midpoint m   Frequency f   $m^2$     $f m_i^2$
150 - 154     152          5             23104     115520
155 - 159     157          2             24649     49298
160 - 164     162          6             26244     157464
165 - 169     167          8             27889     223112
170 - 174     172          9             29584     266256
175 - 179     177          11            31329     344619
180 - 184     182          6             33124     198744
185 - 189     187          3             34969     104907
Totals:                    50            230892    1459920
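$\sigma^2 = \frac{1459920}{50} - 170.6^2$
$= 29198.4 - 29104.36$
$= 94.04$
Estimated Standard Deviation
$\sigma = \sqrt{\frac{\sum_{i=1}^{N} f m_i^2}{n} - \bar{x}^2}$
$= \sqrt{94.04}$
$= 9.6974$
Estimated Coefficient of Variation
$CV = \frac{9.6974}{170.6} \times 100 = 5.6843$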
Estimated Coefficient of Skewness
$Sk_2 = \frac{3(\bar{x} - Md)}{\sigma}$
$Sk_2 = \frac{3(170.6 - 171.7)}{9.6974}$
$Sk_2 = -0.3403$
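The grouped variance, standard deviation, coefficient of variation and skewness can be checked together. The sketch below assumes the same (lower limit, upper limit, frequency) layout for the classes and uses the rounded median of 171.7 mm from the worked example; the variable names are illustrative.

```python
from math import sqrt

groups = [
    (150, 154, 5), (155, 159, 2), (160, 164, 6), (165, 169, 8),
    (170, 174, 9), (175, 179, 11), (180, 184, 6), (185, 189, 3),
]

midpoints = [(lo + hi) / 2 for lo, hi, _ in groups]
freqs = [f for _, _, f in groups]
n = sum(freqs)

x_bar = sum(f * m for f, m in zip(freqs, midpoints)) / n
sum_fm2 = sum(f * m ** 2 for f, m in zip(freqs, midpoints))

variance = sum_fm2 / n - x_bar ** 2   # sum(f * m^2) / n minus the squared mean
sigma = sqrt(variance)
cv = sigma / x_bar * 100              # coefficient of variation, in percent

median_rounded = 171.7                # estimated median, rounded as in the example
sk2 = 3 * (x_bar - median_rounded) / sigma

print(round(variance, 2), round(sigma, 4), round(cv, 4), round(sk2, 4))
```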
Tutorial Questions
1. The following data represents the daily number of parcels delivered by a courier over a
10-day period.
15 19 17 11 24 23 8 19 21 13
a. Using the parcel delivery data, calculate the following measures of location:
i. Mode [2]
ii. Median [2]
iii. Mean [2]
b. Based on the measures of location calculated in (a.), comment on the skewness of the
parcel delivery data. [2]
c. Calculate the standard deviation of the parcel delivery data. [4]
2. The following data shows the annual salaries (£000) of 10 employees working for the same
company.
16 18 20 58 22 24 20 32 20 10
Using the annual salary data, calculate the following:
a. Mode
b. Median
c. Mean
d. Range [8]
3. The following data shows the number of hours worked at a weekend by twelve factory
workers.
5 10 12 5 6 7 4 7 9 11 5 3
Using this data, determine the:
a. Mode number of hours worked. [2]
b. Median number of hours worked. [2]
c. Mean number of hours worked. [2]
d. Range of the number of hours worked. [2]
e. Standard deviation of the number of hours worked. [2]
4. The following data shows the number of days taken by ten business students to complete
an assignment.
8 9 10 29 11 12 10 16 10 5
a. Calculate the mode, median and mean number of days taken by the ten students to
complete the assignment. [6]
b. Calculate the range and standard deviation of the number of days taken by the ten
students to complete the assignment. [7]
5. The following set of data represents a frequency distribution of the unemployment rates
(%) of 50 countries.
a. Calculate the mean and median of the distribution of the unemployment rates above.
b. Demonstrate that the sample standard deviation is equal to 2.8.
c. Find the coefficient of skewness and comment on the result.
6. The distances travelled each week (in miles) by 150 randomly selected lorry (truck) drivers
in a country are as follows.
Distance (Miles) Frequency
100 but under 200 56
200 but under 300 44
300 but under 400 25
400 but under 500 17
500 but under 600 7
600 but under 700 1