
Statistics Tutorial 1


INSB 2015 – Applied Statistics for Business and Analytics

Tutor: Linda Deonath


Email: l.deonath@fac.gsb.tt

Tutorial 1
CENTRAL TENDENCY AND SPREAD

Measures of central tendency, also known as averages, give a single value that represents the
entire set of data. It is generally observed that the data on a variable tend to cluster around some
central value.

This clustering around some central value is called Central Tendency.

There are various types of average:

1. Arithmetic Mean
2. Median
3. Mode
4. Geometric Mean
5. Harmonic Mean

*We will focus on the first 3.

• Discrete data is counted and can only take certain values, e.g., the number of students in
a class or the results of rolling 2 dice.
• Continuous data is measured and can take any value (within a range), e.g., a person’s
height, weight or time in a race.

Below are definitions and formulae for Ungrouped (Discrete) Data.

Term: Mean, μ or x̄
Definition: The average of all the numbers. It is the sum divided by the count. N signifies the size of the entire population, while n signifies the size of a sample drawn from that population.
Formula: Population Mean: $\mu = \frac{\sum x_i}{N}$
         Sample Mean: $\bar{x} = \frac{\sum x_i}{n}$

Term: Median
Definition: The median is the number such that the number of scores above it and below it is the same. It is the middle value if n is odd, or the average of the two middle values if n is even.
Formula: Position of the median in the ordered data = $\frac{n+1}{2}$

Term: Mode
Definition: The mode is the most frequently occurring value. A set of data can have no mode, two modes (bi-modal) or more than two modes (multi-modal).

Term: Upper Quartile (Q3)
Definition: The Upper Quartile is the middle number between the median and the highest value.

Term: Lower Quartile (Q1)
Definition: The Lower Quartile is the middle number between the median and the lowest value.

Term: Range
Definition: The range of a set of data is the difference between the maximum and minimum values in the set.
Formula: Range = Highest score - Lowest score

Term: Interquartile Range (IQR)
Definition: The interquartile range is the range of the middle 50% of the scores in a distribution.
Formula: IQR = Q3 - Q1

Term: Variance, σ² or s²
Definition: The variance is defined as the average of the squared differences from the mean.
Formula: Population Variance: $\sigma^2 = \frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}$
         Sample Variance: $s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}$

Term: Standard Deviation, σ or s
Definition: The Standard Deviation is a measure of how spread out numbers are. It is the square root of the variance.
Formula: Population S.D.: $\sigma = \sqrt{\frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}}$
         Sample S.D.: $s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}}$

Term: Coefficient of Variation, CV
Definition: The coefficient of variation is a statistical measure of the relative dispersion of data points in a data series around the mean. In finance, the coefficient of variation allows investors to determine how much volatility, or risk, is assumed in comparison to the amount of return expected from investments. The lower the ratio of the standard deviation to mean return, the better the risk-return trade-off.
Formula: Population: $CV = \frac{\sigma}{\mu} \times 100$
         Sample: $CV = \frac{s}{\bar{x}} \times 100$

Term: Pearson’s Coefficient of Skewness, $Sk_2$
Definition: This coefficient finds the skewness in a sample using descriptive statistics. Data can be skewed, meaning it tends to have a long tail on one side or the other. No skewness means that the data is perfectly symmetrical and the mean is exactly at the peak.
* A Normal Distribution is not skewed.
Formula: $Sk_2 = \frac{3(\bar{x} - Md)}{s}$, where Md is the median.
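As a minimal sketch of how the population and sample versions of the variance and standard deviation differ, the Python snippet below implements both directly from their definitions. The function names and the small data list are illustrative assumptions and are not part of the tutorial.

```python
import math

def population_variance(values):
    """Average squared difference from the mean, dividing by N."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

def sample_variance(values):
    """Same sum of squared differences, but dividing by n - 1."""
    x_bar = sum(values) / len(values)
    return sum((x - x_bar) ** 2 for x in values) / (len(values) - 1)

# Hypothetical data, used only to show the effect of the two divisors.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# The standard deviation is the square root of the matching variance.
print(population_variance(data), math.sqrt(population_variance(data)))
print(sample_variance(data), math.sqrt(sample_variance(data)))
```

Because the sample version divides by n - 1 rather than N, it comes out slightly larger whenever the data are not all identical.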

Example 1.
Consider the set of test scores, out of 20, for a class of 12 students:
4, 17, 7, 14, 18, 12, 3, 16, 10, 4, 4, 11
We first put them in order:
3, 4, 4, 4, 7, 10, 11, 12, 14, 16, 17, 18

Number of values, N = 12 (Population)

Mean: $\mu = \frac{3+4+4+4+7+10+11+12+14+16+17+18}{12} = \frac{120}{12}$

$\mu = 10$
Mode = 4
Median: 3, 4, 4, 4, 7, 10, 11, 12, 14, 16, 17, 18 (the two middle values are 10 and 11)
$= \frac{10+11}{2}$
= 10.5
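The mean, mode and median above can be checked with a short Python sketch that uses the standard `statistics` module. The variable names are illustrative; the scores are those of Example 1.

```python
import statistics

# Test scores from Example 1 (order does not matter for these functions).
scores = [4, 17, 7, 14, 18, 12, 3, 16, 10, 4, 4, 11]

mean = statistics.mean(scores)      # sum of the scores divided by the count
mode = statistics.mode(scores)      # most frequently occurring score
median = statistics.median(scores)  # average of the two middle scores (even count)

print(mean, mode, median)
```

For data with more than one mode, `statistics.multimode` (Python 3.8 and later) returns all of them as a list.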
Lower Quartile, Q1: 3, 4, 4 | 4, 7, 10 | 11, 12, 14 | 16, 17, 18
$= \frac{4+4}{2}$
= 4
Upper Quartile, Q3: 3, 4, 4 | 4, 7, 10 | 11, 12, 14 | 16, 17, 18
$= \frac{14+16}{2}$
= 15
Range = 18 – 3
= 15
Interquartile Range = 15 – 4
= 11
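The quartiles in this example are found as the medians of the lower and upper halves of the ordered data. The sketch below follows that convention; the helper name `quartiles_by_halves` is hypothetical. Library routines such as `statistics.quantiles` use other conventions and may give slightly different quartiles.

```python
import statistics

def quartiles_by_halves(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Needs at least two values. For an odd count the middle value
    is left out of both halves.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[len(ordered) - half:]
    return statistics.median(lower), statistics.median(upper)

scores = [4, 17, 7, 14, 18, 12, 3, 16, 10, 4, 4, 11]

q1, q3 = quartiles_by_halves(scores)
data_range = max(scores) - min(scores)  # highest score minus lowest score
iqr = q3 - q1                           # spread of the middle 50%

print(q1, q3, data_range, iqr)
```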
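Variance: $\sigma^2 = \frac{\sum_{i=1}^{N}(x_i - \mu)^2}{12} = \frac{336}{12} = 28$
(The sum of squared differences, 336, is worked out in the table that follows.)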
x     f     x − μ           (x − μ)²       f × (x − μ)²
3     1     3 − 10 = −7     (−7)² = 49     49 × 1 = 49
4     3     −6              36             36 × 3 = 108
7     1     −3              9              9
10    1     0               0              0
11    1     1               1              1
12    1     2               4              4
14    1     4               16             16
16    1     6               36             36
17    1     7               49             49
18    1     8               64             64
Totals: ∑f = 12, $\sum_{i=1}^{N}(x_i - \mu)^2 = 336$

Standard Deviation: $\sigma = \sqrt{\frac{\sum_{i=1}^{N}(x_i - \mu)^2}{12}}$
$= \sqrt{28}$
= 5.2915...
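The frequency-table route to the variance and standard deviation can be sketched in Python as follows. This is an illustration under the assumption that the scores are treated as a population, as in the example; the variable names are not from the tutorial.

```python
import math
from collections import Counter

scores = [4, 17, 7, 14, 18, 12, 3, 16, 10, 4, 4, 11]

n = len(scores)
mu = sum(scores) / n

# Frequency of each distinct score, e.g. the score 4 occurs 3 times.
freq = Counter(scores)

# Sum of f * (x - mu)^2 over the distinct scores (last column of the table).
sum_sq = sum(f * (x - mu) ** 2 for x, f in freq.items())

variance = sum_sq / n        # population variance: divide by N
sd = math.sqrt(variance)     # population standard deviation

print(sum_sq, variance, sd)
```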
Coefficient of Variation $= \frac{\sigma}{\mu} \times 100$
$= \frac{5.2915}{10} \times 100$
= 52.915
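Pearson’s Coefficient of Skewness
$Sk_2 = \frac{3(\bar{x} - Md)}{s}$
Using the population values μ = 10 and σ = 5.2915 in place of x̄ and s:
$Sk_2 = \frac{3(10 - 10.5)}{5.2915}$
= −0.2835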
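As a rough check of the last two results, the sketch below computes the coefficient of variation and Pearson’s coefficient of skewness straight from the raw scores, using the population standard deviation as the example does. The variable names are illustrative.

```python
import statistics

scores = [4, 17, 7, 14, 18, 12, 3, 16, 10, 4, 4, 11]

mu = statistics.mean(scores)        # population mean
sigma = statistics.pstdev(scores)   # population standard deviation
md = statistics.median(scores)      # median

cv = sigma / mu * 100               # relative dispersion, as a percentage
sk2 = 3 * (mu - md) / sigma         # Pearson's coefficient of skewness

print(round(cv, 3), round(sk2, 4))
```

The small negative value of the skewness coefficient reflects a mean that lies slightly below the median.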

Below are formulae for Grouped (Continuous) Data. The definitions are omitted.

Term: Mean, x̄
Formula: Sample Mean: $\bar{x} = \frac{\sum f m_i}{\sum f}$
Where:
• $m_i$ is the midpoint of the ith group
• $f$ is the frequency of the ith group
• $\sum f$ is the total number of values

Term: Estimated Median
Formula: $\text{Estimated Median} = L + \frac{\frac{n}{2} - B}{G} \times W$
Where:
• L is the lower class boundary of the group containing the median
• n is the total number of values
• B is the cumulative frequency of the groups BEFORE the median group
• G is the frequency of the median group
• W is the group width

Term: Estimated Mode
Formula: $\text{Estimated Mode} = L + \frac{f_m - f_{m-1}}{(f_m - f_{m-1}) + (f_m - f_{m+1})} \times W$
Where:
• L is the lower class boundary of the modal group
• $f_{m-1}$ is the frequency of the group BEFORE the modal group
• $f_m$ is the frequency of the MODAL group
• $f_{m+1}$ is the frequency of the group AFTER the modal group
• W is the group width

Term: Variance, σ²
Formula: $\sigma^2 = \frac{\sum_{i=1}^{N} f m_i^2}{n} - \bar{x}^2$
Where:
• $m_i$ is the midpoint of the ith group
• $f$ is the frequency of the ith group
• $n$ is the total number of values (also $\sum f$)
• $\bar{x}$ is the mean of the grouped data

Term: Standard Deviation, σ
Formula: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} f m_i^2}{n} - \bar{x}^2}$

Term: Coefficient of Variation, CV
Formula: $CV = \frac{\sigma}{\mu} \times 100$
Where:
• μ is the mean of the grouped data (x̄ above)

Term: Pearson’s Coefficient of Skewness, $Sk_2$
Formula: $Sk_2 = \frac{3(\bar{x} - Md)}{\sigma}$
Where:
• Md is the median of the grouped data
• $\bar{x}$ is the mean of the grouped data
• σ is the standard deviation of the grouped data
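The grouped variance formula is a computational shortcut. A brief sketch of why it agrees with the "average of the squared differences from the mean" definition, using the same symbols and assuming $n = \sum f$ and $\bar{x} = \frac{\sum f m_i}{n}$:

```latex
\begin{aligned}
\sigma^2 &= \frac{\sum f (m_i - \bar{x})^2}{n} \\
         &= \frac{\sum f m_i^2 - 2\bar{x}\sum f m_i + \bar{x}^2 \sum f}{n} \\
         &= \frac{\sum f m_i^2}{n} - 2\bar{x}\cdot\bar{x} + \bar{x}^2 \\
         &= \frac{\sum f m_i^2}{n} - \bar{x}^2
\end{aligned}
```

The third line uses $\frac{\sum f m_i}{n} = \bar{x}$ and $\frac{\sum f}{n} = 1$.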
Example 2.
Baby carrots were grown in a special type of soil. They were dug up and their lengths measured
(to the nearest mm) and the results were grouped.

Length (mm)    Frequency
150 - 154 5
155 - 159 2
160 - 164 6
165 - 169 8
170 - 174 9
175 - 179 11
180 - 184 6
185 - 189 3

Estimated Mean

Length (mm)    Midpoint m    Frequency f    fm
150 - 154 152 5 760
155 - 159 157 2 314
160 - 164 162 6 972
165 - 169 167 8 1336
170 - 174 172 9 1548
175 - 179 177 11 1947
180 - 184 182 6 1092
185 - 189 187 3 561
Totals: 50 8530

Estimated Mean = $\frac{8530}{50}$ = 170.6 mm
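The estimated mean can be reproduced with a few lines of Python. This is a minimal sketch; the tuple layout `(lower limit, upper limit, frequency)` and the variable names are assumptions made for illustration.

```python
# (lower limit, upper limit, frequency) for each length group, in mm.
groups = [
    (150, 154, 5), (155, 159, 2), (160, 164, 6), (165, 169, 8),
    (170, 174, 9), (175, 179, 11), (180, 184, 6), (185, 189, 3),
]

# Midpoint of each group, e.g. (150 + 154) / 2 = 152.
midpoints = [(lo + hi) / 2 for lo, hi, _ in groups]
freqs = [f for _, _, f in groups]

total_f = sum(freqs)                                  # sum of f
sum_fm = sum(f * m for f, m in zip(freqs, midpoints)) # sum of f * m

estimated_mean = sum_fm / total_f
print(total_f, sum_fm, estimated_mean)
```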
Estimated Median

The Median is the mean of the 25th and the 26th length, so is in the 170 - 174 group:

• L = 169.5 (the lower class boundary of the 170 - 174 group)
• n = 50
• B = 5 + 2 + 6 + 8 = 21
• G = 9
• W = 5

Estimated Median = $169.5 + \frac{\frac{50}{2} - 21}{9} \times 5$
= 169.5 + 2.22...
= 171.7 mm (to 1 decimal)
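The median calculation can be sketched as a small Python function that finds the median group by accumulating frequencies and then applies the formula. The function name and argument layout are hypothetical.

```python
def estimated_median(lower_boundaries, freqs, width):
    """Estimate the median of grouped data as L + ((n/2 - B) / G) * W."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("no data")
    cumulative = 0  # B: cumulative frequency of the groups before this one
    for L, G in zip(lower_boundaries, freqs):
        if cumulative + G >= n / 2:
            # This group contains the median position n/2.
            return L + (n / 2 - cumulative) / G * width
        cumulative += G
    raise ValueError("median group not found")

# Lower class boundaries and frequencies of the carrot length groups.
lower_boundaries = [149.5, 154.5, 159.5, 164.5, 169.5, 174.5, 179.5, 184.5]
freqs = [5, 2, 6, 8, 9, 11, 6, 3]

median = estimated_median(lower_boundaries, freqs, width=5)
print(round(median, 1))
```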

Mode

The Modal group is the one with the highest frequency, which is 175 - 179:

• L = 174.5 (the lower class boundary of the 175 - 179 group)
• $f_{m-1}$ = 9
• $f_m$ = 11
• $f_{m+1}$ = 6
• W = 5

Estimated Mode = $174.5 + \frac{11 - 9}{(11 - 9) + (11 - 6)} \times 5$
= 174.5 + 1.42...
= 175.9 mm (to 1 decimal)
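A possible Python sketch of the estimated mode is given below. The function name is hypothetical. Treating a missing neighbouring group as having frequency 0, and taking the first group when several share the highest frequency, are assumptions made here and are not stated in the tutorial.

```python
def estimated_mode(lower_boundaries, freqs, width):
    """Estimate the mode of grouped data from the modal group and its neighbours."""
    # Index of the modal group (first one if there is a tie).
    m = max(range(len(freqs)), key=lambda i: freqs[i])
    f_m = freqs[m]
    # Assumption: a missing neighbour counts as frequency 0.
    f_prev = freqs[m - 1] if m > 0 else 0
    f_next = freqs[m + 1] if m < len(freqs) - 1 else 0
    # Note: the denominator is zero if all three frequencies are equal.
    denominator = (f_m - f_prev) + (f_m - f_next)
    return lower_boundaries[m] + (f_m - f_prev) / denominator * width

lower_boundaries = [149.5, 154.5, 159.5, 164.5, 169.5, 174.5, 179.5, 184.5]
freqs = [5, 2, 6, 8, 9, 11, 6, 3]

mode = estimated_mode(lower_boundaries, freqs, width=5)
print(round(mode, 1))
```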

Estimated Variance: $\sigma^2 = \frac{\sum_{i=1}^{N} f m_i^2}{n} - \bar{x}^2$

Length (mm)    Midpoint m    Frequency f    m²    fm²
150 - 154 152 5 23104 115520
155 - 159 157 2 24649 49298
160 - 164 162 6 26244 157464
165 - 169 167 8 27889 223112
170 - 174 172 9 29584 266256
175 - 179 177 11 31329 344619
180 - 184 182 6 33124 198744
185 - 189 187 3 34969 104907
Totals: 50 230892 1459920

$\sigma^2 = \frac{1459920}{50} - 170.6^2$

$\sigma^2 = 29198.4 - 29104.36 = 94.04$

Estimated Standard Deviation

$\sigma = \sqrt{\frac{\sum_{i=1}^{N} f m_i^2}{n} - \bar{x}^2}$
$= \sqrt{94.04}$
= 9.6974
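The grouped variance and standard deviation can be checked with the sketch below, which applies the shortcut formula to the midpoints and frequencies from the table. The variable names are illustrative.

```python
import math

# Midpoints and frequencies of the carrot length groups.
midpoints = [152, 157, 162, 167, 172, 177, 182, 187]
freqs = [5, 2, 6, 8, 9, 11, 6, 3]

n = sum(freqs)
mean = sum(f * m for f, m in zip(freqs, midpoints)) / n

# Sum of f * m^2 (the last column of the table).
sum_fm2 = sum(f * m ** 2 for f, m in zip(freqs, midpoints))

variance = sum_fm2 / n - mean ** 2   # shortcut formula for grouped data
sd = math.sqrt(variance)

print(sum_fm2, round(variance, 2), round(sd, 4))
```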

Estimated Coefficient of Variation


$CV = \frac{\sigma}{\mu} \times 100$
$CV = \frac{9.6974}{170.6} \times 100$

CV = 5.6843
Estimated Coefficient of Skewness
$Sk_2 = \frac{3(\bar{x} - Md)}{\sigma}$
$Sk_2 = \frac{3(170.6 - 171.7)}{9.6974}$
$Sk_2 = -0.3403$
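The last two results can be reproduced by plugging the rounded estimates from this example into two small helper functions. The function names are hypothetical; the inputs are the rounded values used in the working above.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation as a percentage of the mean."""
    return sd / mean * 100

def pearson_sk2(mean, median, sd):
    """Pearson's coefficient of skewness: 3 * (mean - median) / sd."""
    return 3 * (mean - median) / sd

# Rounded estimates from Example 2.
mean, median, sd = 170.6, 171.7, 9.6974

cv = coefficient_of_variation(sd, mean)
sk2 = pearson_sk2(mean, median, sd)

print(round(cv, 4), round(sk2, 4))
```

A negative coefficient, as here, indicates that the mean lies below the median, which points to a longer tail on the left (negative skew).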
Tutorial Questions
1. The following data represents the daily number of parcels delivered by a courier over a
10-day period.
15 19 17 11 24 23 8 19 21 13
a. Using the parcel delivery data, calculate the following measures of location:
i. Mode [2]
ii. Median [2]
iii. Mean [2]
b. Based on the measures of location calculated in (a.), comment on the skewness of the
parcel delivery data. [2]
c. Calculate the standard deviation of the parcel delivery data. [4]
2. The following data shows the annual salaries (£000) of 10 employees working for the same
company.
16 18 20 58 22 24 20 32 20 10
Using the annual salary data, calculate the following:
a. Mode
b. Median
c. Mean
d. Range [8]

3. The following data shows the number of hours worked at a weekend by twelve factory
workers.
5 10 12 5 6 7 4 7 9 11 5 3
Using this data, determine the:
a. Mode number of hours worked. [2]
b. Median number of hours worked. [2]
c. Mean number of hours worked. [2]
d. Range of the number of hours worked. [2]
e. Standard deviation of the number of hours worked. [2]

4. The following data shows the number of days taken by ten business students to complete
an assignment.
8 9 10 29 11 12 10 16 10 5
a. Calculate the mode, median and mean number of days taken by the ten students to
complete the assignment. [6]
b. Calculate the range and standard deviation of the number of days taken by the ten
students to complete the assignment. [7]
5. The following set of data represents a frequency distribution of the unemployment rates
(%) of 50 countries.

Unemployment rate (%) Frequency


0 but under 2 5
2 but under 4 10
4 but under 6 15
6 but under 8 10
8 but under 10 6
10 but under 12 4

a. Calculate the mean and median of the distribution of the unemployment rates above.
b. Demonstrate that the sample standard deviation is equal to 2.8.
c. Find the coefficient of skewness and comment on the result.

6. The distances travelled each week (in miles) by 150 randomly selected lorry (truck) drivers
in a country are as follows.
Distance (Miles) Frequency
100 but under 200 56
200 but under 300 44
300 but under 400 25
400 but under 500 17
500 but under 600 7
600 but under 700 1

a. Find the arithmetic mean and the median.


b. Calculate the standard deviation.
c. Calculate the coefficient of skewness and comment on the result.
