
Chapter 2


CHAPTER TWO

SUMMARY STATISTICS

2.1 Introduction
In this chapter, we discuss measures of average (central tendency) and measures of
dispersion (spread). These measures are important for statistical reporting and analysis.
They are purely descriptive and involve no statistical inference, but the information
they provide is important for decision making.

2.2 Measures of Central Tendency (Mean, Median and Mode)


The mean, median and mode are the three major measures of central tendency, sometimes
called measures of average. These measures can be determined for both ungrouped and
grouped data.

2.2.1 Ungrouped data


2.2.1.1 Mean
There are various mean measures in statistics. These include arithmetic mean, geometric
mean and harmonic mean. We define each of these measures below.

Definition 2.1 (Geometric mean)


Given $n$ observations $x_1, x_2, \ldots, x_n$, the geometric mean is given by

$$G.M = \sqrt[n]{x_1 x_2 \cdots x_n}.$$

It is not a commonly used measure of average, but it is still applied in the physical sciences.

Definition 2.2 (Harmonic mean)


Given a set of $n$ observations $x_1, x_2, \ldots, x_n$, the harmonic mean is defined by

$$H = \frac{1}{\dfrac{1}{n} \sum_{i=1}^{n} \dfrac{1}{x_i}}.$$

This is also not a commonly used measure of average.

Definition 2.3 (Arithmetic mean)


The most commonly used measure of average is the arithmetic mean. It is denoted by $\bar{x}$.
The arithmetic mean is regarded as a suitable measure of central tendency when the values
in the data set are symmetric, in other words when the data set contains no extreme values.
Given a set of $n$ observations as above,

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i.$$
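
The three definitions above translate directly into code. The following is a minimal Python sketch, assuming all observations are positive (which the geometric and harmonic means require); the function names and the sample data are illustrative and do not come from the text.

```python
import math

def arithmetic_mean(xs):
    # (1/n) times the sum of the observations
    return sum(xs) / len(xs)

def geometric_mean(xs):
    # n-th root of the product of the observations (assumes every x > 0)
    return math.prod(xs) ** (1 / len(xs))

def harmonic_mean(xs):
    # reciprocal of the arithmetic mean of the reciprocals (assumes every x > 0)
    return 1 / (sum(1 / x for x in xs) / len(xs))

data = [2, 4, 8]  # hypothetical observations
print(arithmetic_mean(data), geometric_mean(data), harmonic_mean(data))
```

For positive observations that are not all equal, the harmonic mean is smaller than the geometric mean, which in turn is smaller than the arithmetic mean.
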
2.2.1.2 Median
Another measure of central tendency is the median. This is defined as the middle value
of the data set when the data are arranged in order (ascending or descending). The
median is a suitable measure of average for data with extreme values. It is also used to
give a general overview of a large mass of data for which computing the arithmetic mean
might be tedious. It can be denoted by $\tilde{x}$.

For an odd number of observations, the median is the observation in position
$\left(\frac{n+1}{2}\right)$th. If there is an even number of observations, it is the
average of the $\left(\frac{n}{2}\right)$th and $\left(\frac{n+2}{2}\right)$th observations.
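
The positional rule above can be sketched in Python as follows. The function name `median` is illustrative; positions in the text are 1-based, so 1 is subtracted to obtain Python's 0-based list index.

```python
def median(xs):
    ordered = sorted(xs)          # arrange the data in ascending order
    n = len(ordered)
    if n % 2 == 1:
        # odd n: the ((n + 1) / 2)th observation
        return ordered[(n + 1) // 2 - 1]
    # even n: average of the (n / 2)th and ((n + 2) / 2)th observations
    lower = ordered[n // 2 - 1]
    upper = ordered[(n + 2) // 2 - 1]
    return (lower + upper) / 2
```
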
2.2.1.3 Mode
This is the value (or values) with the highest frequency in the data set. It can be used to
determine the most favourable output of a certain experiment and to help decide what
measures may be taken from that output. It is commonly denoted by $\hat{x}$.

Example 2.1
Compute mean, median and mode for the following data.
3, 4, 6, 8, 3, 5, 9, 11, 7, 10
Solution
We first arrange the data in ascending order as follows:
3, 3, 4, 5, 6, 7, 8, 9, 10, 11
The arithmetic mean is given by

$$\bar{x} = \frac{\sum x}{n} = \frac{3 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + 11}{10} = \frac{66}{10} = 6.6.$$

The mode is the value with the highest frequency. In this case the highest frequency is 2,
which belongs to the value 3. Therefore, the mode is 3.

There is an even number of observations, $n = 10$. In this case, the median is the average
of the fifth and sixth observations, which are 6 and 7. Hence,

$$\text{Median} = \frac{6 + 7}{2} = 6.5.$$
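
As a cross-check of Example 2.1, the same three quantities can be obtained with Python's standard `statistics` module. This is only a verification sketch; the variable names are illustrative.

```python
import statistics

data = [3, 4, 6, 8, 3, 5, 9, 11, 7, 10]  # data of Example 2.1

mean = statistics.mean(data)
median = statistics.median(data)      # averages the two middle values when n is even
modes = statistics.multimode(data)    # list of all values with the highest frequency

print(mean, median, modes)
```

`multimode` returns a list because a data set may have more than one mode, which matches the "value(s)" wording of the definition.
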

2.2.2 Grouped Data


In this section we discuss the arithmetic mean, median and mode for grouped data.

2.2.2.1 The Arithmetic Mean


Given a grouped frequency distribution, we need to create a column for the class midpoints
or class marks $(x_i)$. The arithmetic mean is then obtained from the general formula

$$\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i},$$

where $k$ is the number of classes and $\sum f_i = n$.

2.2.2.2 The median


To compute the median, we first need to identify the class which contains the median;
we call this the median class. The median can then be computed using the formula

$$\tilde{x} = L + \left(\frac{\frac{N}{2} - C_b}{f_m}\right) h$$

where
$L$ = lower boundary of the median class
$N$ = total number of observations
$C_b$ = cumulative frequency before the median class
$f_m$ = frequency of the median class
$h$ = class width/size of the median class

2.2.2.3 The mode


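The mode $(\hat{x})$ of grouped data can be computed using the formula

$$\hat{x} = L + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) h$$

where $L$ is the lower boundary of the modal class and $h$ is its class width. Suppose that
$f_m$ = frequency of the modal class
$f_b$ = frequency of the class before the modal class
$f_a$ = frequency of the class after the modal class
Then $\Delta_1 = f_m - f_b$ and $\Delta_2 = f_m - f_a$.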

Example 2.2
The heights (in inches) of 100 male students at ABC College were recorded as follows:

Height       60 – 62   63 – 65   66 – 68   69 – 71   72 – 74
Frequency       5        18        42        27         8

Compute (a) Arithmetic mean (b) Median height (c) Mode.

Solution
(a) For the arithmetic mean, we summarize the calculations in the following table.

Class      Class mark (x_i)   Freq. (f_i)   Cum. freq.   f_i x_i
60 – 62          61                5             5          305
63 – 65          64               18            23         1152
66 – 68          67               42            65         2814
69 – 71          70               27            92         1890
72 – 74          73                8           100          584
TOTAL                            100                       6745

$$\text{Mean} = \frac{\sum f_i x_i}{\sum f_i} = \frac{6745}{100} = 67.45$$

(b) The median lies in the class 66 – 68, since the 50th observation falls in this class
(the cumulative frequency is 23 before it and 65 after it). From this class we have
$L = 65.5$, $f_m = 42$, $C_b = 23$, $h = 3$.
Then,

$$\text{Median} = L + \left(\frac{\frac{N}{2} - C_b}{f_m}\right) h = 65.5 + \frac{(50 - 23)(3)}{42} = 67.43$$

(c) The class with the highest frequency is 66 – 68, and thus it is the modal class.
From this class, we find that
$L = 65.5$, $h = 3$, $f_m = 42$, $f_b = 18$, $f_a = 27$,
implying that $\Delta_1 = f_m - f_b = 42 - 18 = 24$ and $\Delta_2 = f_m - f_a = 42 - 27 = 15$.
Therefore,

$$\text{Mode} = L + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) h = 65.5 + \left(\frac{24}{39}\right)(3) = 65.5 + 1.846 = 67.35$$

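The grouped-data formulas for the mean, median and mode can be sketched in Python as below, using the table of Example 2.2. Several details are assumptions made for illustration rather than statements from the text: the function names, the `gap` parameter (class limits are taken to be one unit apart, so each lower boundary is the lower limit minus 0.5), locating the median class as the first class whose cumulative frequency reaches N/2, and treating a missing neighbouring class as having frequency 0 in the mode formula.

```python
# Classes as (lower limit, upper limit, frequency), from Example 2.2.
classes = [(60, 62, 5), (63, 65, 18), (66, 68, 42), (69, 71, 27), (72, 74, 8)]

def grouped_mean(classes):
    n = sum(f for _, _, f in classes)
    # class mark = midpoint of the class limits
    return sum(f * (lo + hi) / 2 for lo, hi, f in classes) / n

def grouped_median(classes, gap=1):
    n = sum(f for _, _, f in classes)
    cum_before = 0                        # C_b
    for lo, hi, f in classes:
        if cum_before + f >= n / 2:       # assumed rule for the median class
            lower_boundary = lo - gap / 2     # L
            width = hi - lo + gap             # h
            return lower_boundary + (n / 2 - cum_before) / f * width
        cum_before += f

def grouped_mode(classes, gap=1):
    freqs = [f for _, _, f in classes]
    m = freqs.index(max(freqs))           # index of the modal class
    lo, hi, f_m = classes[m]
    f_b = freqs[m - 1] if m > 0 else 0                # assumption at the edges
    f_a = freqs[m + 1] if m < len(freqs) - 1 else 0   # assumption at the edges
    d1, d2 = f_m - f_b, f_m - f_a
    return (lo - gap / 2) + d1 / (d1 + d2) * (hi - lo + gap)

print(grouped_mean(classes), grouped_median(classes), grouped_mode(classes))
```

Rounded to two decimal places, the three results agree with the values obtained by hand in Example 2.2.
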
2.3 Measures of Dispersion


Measures of dispersion show how data deviate from a given measure of average
(arithmetic mean or median). These measures include the range, mean absolute deviation,
standard deviation and quartile deviation.
The most commonly used measure of variation is the sample standard deviation, since
the population standard deviation is not easily obtained in practice. However, this
measure is not suitable for data with extreme values. If the data contain some
extreme values, the appropriate measure is the quartile deviation. The mean
absolute deviation is rarely used to compare the variation of two data sets.
The range is a very crude measure of dispersion. It is used just to give a general
overview of the spread of the data.

2.3.1 Ungrouped data


Consider a set of values $X_1, \ldots, X_N$ from a certain population, and
$x_1, x_2, \ldots, x_n$ from a sample of that population. Then,

2.3.1.1 Mean Absolute Deviation


The mean absolute deviation for a population is given by

$$MAD = \frac{\sum |X_i - \bar{X}|}{N}$$

where $\bar{X}$ is the population mean and $N$ is the population size.
Similarly, for a sample we have

$$MAD = \frac{\sum |x_i - \bar{x}|}{n},$$

where $n$ is the sample size.

2.3.1.2 Variance and Standard Deviation


The population variance, $\sigma^2$, is given by the formula

$$\sigma^2 = \frac{\sum (X_i - \bar{X})^2}{N}$$

and the sample variance, denoted by $s^2$, is given by the formula

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}.$$

The standard deviation is defined as the square root of the variance. It is denoted by
$\sigma$ for a population and $s$ for a sample. It is also denoted in general by $SD(X)$.
From the above formulas we have

$$\sigma = \sqrt{\frac{\sum (X_i - \bar{X})^2}{N}} \quad \text{and} \quad s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}.$$
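
The only difference between the population and sample formulas is the divisor ($N$ versus $n - 1$). A minimal Python sketch of the mean absolute deviation, variance and standard deviation follows; the function names, the `sample` flag and the data values are illustrative assumptions, not part of the text.

```python
import math

def mean_absolute_deviation(xs):
    m = sum(xs) / len(xs)
    return sum(abs(x - m) for x in xs) / len(xs)

def variance(xs, sample=True):
    # divisor n - 1 for a sample, N for a population
    m = sum(xs) / len(xs)
    divisor = len(xs) - 1 if sample else len(xs)
    return sum((x - m) ** 2 for x in xs) / divisor

def standard_deviation(xs, sample=True):
    # square root of the corresponding variance
    return math.sqrt(variance(xs, sample))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values with mean 5
print(mean_absolute_deviation(data))
print(variance(data, sample=False), standard_deviation(data, sample=False))
print(variance(data), standard_deviation(data))
```

Because the sample divisor is smaller, the sample variance of a given data set is always slightly larger than the population variance computed from the same values.
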

2.3.1.3 Quartile Deviation

The quartile deviation (Q.D) is given by

$$Q.D = \frac{1}{2}(Q_3 - Q_1),$$

where
$Q_1$ = the first quartile, and
$Q_3$ = the third quartile.
Note that quartiles, like the median, are positional measures, and their positions depend
on the number of observations. For an odd number of observations it is simple to obtain
the quartiles, while for an even number of observations their determination is not
straightforward.
Suppose we have $n = 10$ observations. Then the first quartile takes the position

$$\frac{1}{4}(n+1)\text{th} = \left(\frac{11}{4}\right)\text{th} = 2.75\text{th} = 2\text{nd} + 0.75\,(3\text{rd} - 2\text{nd}).$$

Similarly, the third quartile takes the position

$$\frac{3}{4}(n+1)\text{th} = \left(\frac{33}{4}\right)\text{th} = 8.25\text{th} = 8\text{th} + 0.25\,(9\text{th} - 8\text{th}).$$

Example 2.3
Compute the mean absolute deviation, sample standard deviation and quartile deviation of
the following data: 10, 12, 8, 16, 8, 20, 21, 17

Solution
We first compute the arithmetic mean as

$$\bar{x} = \frac{10 + 12 + 8 + 16 + 8 + 20 + 21 + 17}{8} = \frac{112}{8} = 14$$

Then the data are arranged in order and the various calculations are summarized in the
table below.

x_i     x_i - x̄     |x_i - x̄|     (x_i - x̄)^2
  8       -6            6              36
  8       -6            6              36
 10       -4            4              16
 12       -2            2               4
 16        2            2               4
 17        3            3               9
 20        6            6              36
 21        7            7              49
TOTAL                  36             190

Then,
the mean absolute deviation is given by

$$MAD = \frac{\sum |x_i - \bar{x}|}{n} = \frac{36}{8} = 4.5$$

The sample standard deviation is given by

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{190}{7}} = 5.21$$

We can also obtain the sample variance without first computing the arithmetic
mean, $\bar{x}$. This alternative formula is given by

$$s^2 = \frac{1}{n - 1}\left[\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}\right]$$

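The alternative formula follows from expanding the square in the definition of the sample variance. A short derivation, using only the fact that $\sum x_i = n\bar{x}$, is:

```latex
\begin{align*}
\sum (x_i - \bar{x})^2
  &= \sum x_i^2 - 2\bar{x}\sum x_i + n\bar{x}^2 \\
  &= \sum x_i^2 - 2n\bar{x}^2 + n\bar{x}^2 \\
  &= \sum x_i^2 - n\bar{x}^2
   = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}.
\end{align*}
```

Dividing by $n - 1$ gives the alternative formula. For the data of this example, $\sum x_i^2 = 1758$ and $\left(\sum x_i\right)^2 / n = 112^2 / 8 = 1568$, so the bracket equals $190$, in agreement with the total of the squared deviations in the table.
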
For the quartile deviation, we proceed as follows.

The data are arranged in ascending order as follows: 8, 8, 10, 12, 16, 17, 20, 21
We have an even number of observations, $n = 8$.
Then,

$$\begin{aligned}
Q_1 &= \left(\frac{n+1}{4}\right)\text{th observation} = \left(\frac{9}{4}\right)\text{th observation} \\
    &= 2.25\text{th observation} \\
    &= 2\text{nd} + 0.25\,(3\text{rd} - 2\text{nd}) \\
    &= 8 + 0.25\,(10 - 8) = 8 + 0.5 = 8.5
\end{aligned}$$

$$\begin{aligned}
Q_3 &= \frac{3}{4}(n+1)\text{th observation} = \left(\frac{27}{4}\right)\text{th observation} \\
    &= 6.75\text{th observation} \\
    &= 6\text{th} + 0.75\,(7\text{th} - 6\text{th}) \\
    &= 17 + 0.75\,(20 - 17) = 17 + 2.25 = 19.25
\end{aligned}$$

Then,

$$\text{Quartile deviation} = \frac{1}{2}(Q_3 - Q_1) = \frac{1}{2}(19.25 - 8.50) = \frac{10.75}{2} = 5.375$$

2.3.2 Grouped data


In this section we consider three measures commonly used in statistical analysis and
decision making: the mean absolute deviation, the sample standard deviation and the
quartile deviation.

2.3.2.1 Mean Absolute Deviation


For grouped data, we first compute the class marks $(x_i)$ and the arithmetic mean, $\bar{x}$,
such that

$$\bar{x} = \frac{\sum f_i x_i}{n}, \quad \text{where } n = \sum f_i.$$

Then the mean absolute deviation for the sample is given by

$$MAD = \frac{\sum f_i |x_i - \bar{x}|}{n}$$

2.3.2.2 Sample Standard Deviation


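The sample variance is given by

$$s^2 = \frac{\sum f_i (x_i - \bar{x})^2}{n - 1}$$

and the sample standard deviation, $s$, is its square root.
Alternatively we use

$$s^2 = \frac{1}{n - 1}\left[\sum f_i x_i^2 - \frac{\left(\sum f_i x_i\right)^2}{n}\right]$$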
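
The grouped formulas for the mean absolute deviation and the sample variance can be sketched in Python as below. The function name `grouped_stats` is illustrative; the class marks and frequencies are those of Example 2.2, reused here only to check that the definitional and alternative variance formulas give the same value.

```python
import math

def grouped_stats(marks, freqs):
    n = sum(freqs)
    sum_fx = sum(f * x for f, x in zip(freqs, marks))
    sum_fx2 = sum(f * x * x for f, x in zip(freqs, marks))
    mean = sum_fx / n
    # mean absolute deviation, weighted by class frequency
    mad = sum(f * abs(x - mean) for f, x in zip(freqs, marks)) / n
    # sample variance from the definition
    var_definition = sum(f * (x - mean) ** 2 for f, x in zip(freqs, marks)) / (n - 1)
    # sample variance from the alternative formula
    var_shortcut = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)
    return mean, mad, var_definition, var_shortcut

marks = [61, 64, 67, 70, 73]   # class marks from Example 2.2
freqs = [5, 18, 42, 27, 8]     # class frequencies from Example 2.2
mean, mad, var_definition, var_shortcut = grouped_stats(marks, freqs)
print(mean, mad, var_definition, var_shortcut, math.sqrt(var_definition))
```
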

2.3.2.3 Quartile Deviation

The quartile deviation is again computed as

$$Q.D = \frac{1}{2}(Q_3 - Q_1),$$

but in this case we need to compute $Q_1$ and $Q_3$ using formulas analogous to the formula
for the median. $Q_1$ is called the lower quartile and $Q_3$ is called the upper quartile
of the grouped data. We therefore have

$$Q_1 = L + \left(\frac{\frac{N}{4} - C_b}{f}\right) h$$

where
$L$ = lower boundary of the class which contains the lower quartile
$C_b$ = cumulative frequency before the class which contains the lower quartile
$h$ = the class size
$f$ = the class frequency
Similarly, we define the upper quartile by

$$Q_3 = L + \left(\frac{\frac{3}{4}N - C_b}{f}\right) h$$

where
$L$ = lower boundary of the class which contains the upper quartile
$C_b$ = cumulative frequency before the class which contains the upper quartile
$h$ = the class size
$f$ = the class frequency
Example 2.4
Compute the sample standard deviation and the quartile deviation for the data in the
following table.

Class   15 – 19   20 – 24   25 – 29   30 – 34   35 – 39   40 – 44   45 – 49
Freq       4         5         8         5         8         6         4
Solution
The calculations are summarized in the table below.

Class      Freq (f_i)   Class mark (x_i)   f_i x_i   f_i x_i^2
15 – 19        4              17               68       1156
20 – 24        5              22              110       2420
25 – 29        8              27              216       5832
30 – 34        5              32              160       5120
35 – 39        8              37              296      10952
40 – 44        6              42              252      10584
45 – 49        4              47              188       8836
TOTAL         40                             1290      44900

(a) Sample Standard deviation


Using the summations from the above table, we compute the variance as

$$s^2 = \frac{1}{39}\left[44900 - \frac{1290^2}{40}\right] = 84.55$$

$$\therefore\ s = \sqrt{84.55} = 9.2$$
(b) Quartile deviation
We first compute $Q_1$.
Note that $\frac{N+1}{4} = \frac{41}{4} = 10.25$. Thus the class which contains $Q_1$ is 25 – 29.
From this class, we find that $L = 24.5$, $f = 8$, $C_b = 9$, $h = 5$.
Thus,

$$Q_1 = 24.5 + \frac{5}{8}\left(\frac{40}{4} - 9\right) = 25.125$$

Then we compute $Q_3$ as follows:
$\frac{3}{4}(N+1) = \frac{3}{4}(41) = 30.75$. Hence the class which contains the upper quartile is 40 – 44.
From this class we have $L = 39.5$, $f = 6$, $C_b = 30$, $h = 5$.
Therefore,

$$Q_3 = 39.5 + \frac{5}{6}\left(\frac{3}{4}(40) - 30\right) = 39.5 + \frac{1}{6}(30 - 30)(5) = 39.5 + 0 = 39.5$$

Thus the quartile deviation is

$$Q.D = \frac{1}{2}(39.5 - 25.125) = 7.1875$$
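
The quartile computation of Example 2.4 can be reproduced with the following Python sketch. The function name `grouped_quantile` and the `gap` parameter are illustrative assumptions (class limits are taken to be one unit apart, so each lower boundary is the lower limit minus 0.5). Following the worked solution, the class is located with the position $p(N + 1)$, while the interpolation itself uses $pN$.

```python
# Classes as (lower limit, upper limit, frequency), from Example 2.4.
classes = [(15, 19, 4), (20, 24, 5), (25, 29, 8), (30, 34, 5),
           (35, 39, 8), (40, 44, 6), (45, 49, 4)]

def grouped_quantile(classes, p, gap=1):
    # p = 0.25 for the lower quartile, 0.75 for the upper quartile
    n = sum(f for _, _, f in classes)
    position = p * (n + 1)          # used only to locate the class
    cum_before = 0                  # C_b
    for lo, hi, f in classes:
        if cum_before + f >= position:
            lower_boundary = lo - gap / 2    # L
            width = hi - lo + gap            # h
            return lower_boundary + (p * n - cum_before) / f * width
        cum_before += f
    raise ValueError("position lies beyond the last class")

q1 = grouped_quantile(classes, 0.25)
q3 = grouped_quantile(classes, 0.75)
print(q1, q3, (q3 - q1) / 2)
```
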
