Lesson 3 Edtech
Statistics
Module 10
Lesson 3
Measures of Central Tendency of Grouped Data
Learning Objectives: Upon the successful completion of this lesson, students will be able to:
Identify the properties of central tendency for grouped data
Interpret the differences among the various measures of central tendency, such as the mean, median and mode
Calculate and interpret the values of the mean, median and mode of grouped data
Calculate and interpret the weighted mean
Identify a distribution as symmetrical or skewed
Statistical techniques are used to analyse and present data and to support decision
making under any management method. To turn raw data into meaningful numbers, one
needs an understanding of the statistical measures that boil complicated data down into
a single value that is understandable and relevant. This lesson presents some
important measures to be considered for grouped data.
3.0 Measures of Central Tendency
There are three measures of central tendency: the mode, the median and the
arithmetic mean. The measure that is most appropriate in any particular situation
depends on the level of measurement of the data, the information that one wishes to
communicate about the data and the way the data are concentrated in the data set.
The mode, median and mean each present the data's central value. Under
certain conditions, these three measures can be the same. Most often, they are not.
Although this sounds confusing, the explanation is that each of these summary statistics
has its own definition. To use these measures effectively, then, one must understand the
perspective that each measure presents, as well as the advantages and disadvantages
associated with each measure.
Briefly, for the mode, the distribution's center is the data point or score that
occurs most frequently in the distribution. For the median, the distribution's center is the
value that evenly divides the distribution: it is the value or score below and above which
50% of the observations occur. For the mean, the distribution's center is the balance
point of the distribution, which is based on considering the size or weight of each data
point.
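As a quick illustration of how the three definitions can disagree, the following minimal Python sketch applies the standard library's `statistics` module to a small raw data set. The scores are hypothetical values chosen only for illustration; they do not come from the lesson's tables.

```python
import statistics

# Hypothetical raw scores, chosen only to illustrate the three definitions.
scores = [2, 3, 3, 5, 12]

mode_value = statistics.mode(scores)      # most frequently occurring score
median_value = statistics.median(scores)  # middle score of the sorted data
mean_value = statistics.mean(scores)      # balance point: sum of scores / count

print("mode:", mode_value)
print("median:", median_value)
print("mean:", mean_value)
```

In this sketch the single large score pulls the mean away from the mode and the median, which is one way the three measures come to differ.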
3.1 Mode
The mode of a data set is the value or values occurring most often in the
distribution. The mode identifies the distribution's most prevalent score or scores. For
example, in the data set {1, 1, 2, 3, 4}, the mode is 1 because 1 is the value that occurs
most frequently. In the data set {4, 4, 4, 5, 6, 6, 6}, there are two modes: 4 and 6. In
this example, both 4 and 6 occur three times in the distribution, which is more frequently
than any other value occurs. Because this data set has two modes, it is called bimodal.
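The two data sets above can be checked with a short Python sketch. It relies on `statistics.multimode` from the standard library (Python 3.8 or later), which returns every value tied for the highest count; the wrapper name `modes` is a hypothetical helper introduced here for readability.

```python
from statistics import multimode


def modes(values):
    """Return all values that occur most often, in order of first appearance."""
    return multimode(values)


unimodal = modes([1, 1, 2, 3, 4])          # a single most frequent value
bimodal = modes([4, 4, 4, 5, 6, 6, 6])     # two values tied for most frequent

print(unimodal)
print(bimodal)
print("bimodal" if len(bimodal) == 2 else "not bimodal")
```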
The mode is a crude but simple measure of central tendency. It can be applied to
nominal, ordinal, interval and ratio scale data, but usually it is applied only to nominal
data. In practice, the mode is seldom used with ordinal, interval or ratio scale data, even
though it is permitted.
Advantages and Disadvantages of Mode
Typically a number that actually occurs in the data set
Has the highest probability of occurrence
Applicable to nominal as well as ordinal, interval and ratio scales
Unaffected by extreme scores
But not representative if the distribution is multimodal with peaks far apart
3.2 The Median
The median of a data set is the number that divides the distribution exactly in
half. That is, when the distribution's scores are arranged from smallest to largest, the
median is the midpoint value of the data set. This means that one-half or 50% of the
data scores are larger than the median value and one-half or 50% are smaller than the
median value.
The median is an appropriate measure for ordinal, interval and ratio scale data. It
is inappropriate for nominal scale data, however, because nominal scale data cannot be
ranked. The median is always expressed in the same units as the original data. Further,
unlike the mode, the median always has a single value. That is, a data set cannot have
more than one median.
To calculate the median for a grouped data set, consider this example: suppose
a librarian wishes to determine the median time required for interlibrary loan personnel
to fill requests. To begin the analysis, the librarian examines the number of requests
and then prepares a frequency distribution as shown in Table 3.1.
Each of these 156 records contains the clerk's notation of the number of days
required to fill the request (a designation of zero days means that the request was filled
on the day it was received). Since the frequency distribution does not show the raw data
values for the distribution, the librarian cannot list the actual values included in the class
interval that contains the median. Because of this, the librarian must determine the
median through a method known as interpolation.
In the table, n=156. So, the rank of the median lies midway between the 78th and
79th scores. The frequency distribution shows that this score is contained in the interval
whose real limits are 7.5 to 11.5. The 78th item is the sixth element in the median class
[6=78-72]. Thus, to arrive at the desired 78th rank, the librarian must determine the value
of going six steps into the interval 7.5 to 11.5.
Table 3.1 Summary of Days Required to Fill Interlibrary Loan Requests [n=156]

Class Interval    Frequency (f)    Upper Class Boundary    Cumulative Frequency
0-3               24               3.5                     24
4-7               48               7.5                     72
8-11              40               11.5                    112
12-15             15               15.5                    127
16-19             14               19.5                    141
20-23             9                23.5                    150
24-27             6                27.5                    156
                  n = 156
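The cumulative frequency column and the location of the median class can be recomputed with a minimal Python sketch. The frequencies and upper class boundaries are taken from Table 3.1; the variable names are illustrative assumptions. Recomputing the running totals is also a convenient check on the cumulative column.

```python
from itertools import accumulate

# Class frequencies and upper class boundaries from Table 3.1.
frequencies = [24, 48, 40, 15, 14, 9, 6]
upper_boundaries = [3.5, 7.5, 11.5, 15.5, 19.5, 23.5, 27.5]

cumulative = list(accumulate(frequencies))  # running totals of f
n = cumulative[-1]                          # total number of requests

# The median class is the first class whose cumulative frequency reaches n / 2.
median_index = next(i for i, c in enumerate(cumulative) if c >= n / 2)

print("cumulative frequencies:", cumulative)
print("n:", n)
print("upper boundary of the median class:", upper_boundaries[median_index])
```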
Because the interval contains 40 steps (f = 40), the median must lie 6/40 of
the way from 7.5 to 11.5. The calculations are as follows:
\[
\text{Median} = 7.5 + \frac{6}{40}(11.5 - 7.5) = 7.5 + 0.15(4.0) = 7.5 + 0.6 = 8.1 \text{ days}
\]
A formula shortens the calculations needed to compute the median for grouped data:
\[
\text{Median} = L + \left( \frac{n/2 - F}{f} \right) w
\]
where L= lower class boundary of the median class
n= total number of observations in the data set
F= cumulative frequency up to, but not including, the median class
f= frequency of the median class
w= size (width) of the median class
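The grouped-median formula can be sketched in Python as follows. The function name `grouped_median` is a hypothetical helper, its parameters reuse the symbols L, n, F, f and w defined above, and the input values are those of the interlibrary loan example (L = 7.5, n = 156, F = 72, f = 40, w = 4).

```python
def grouped_median(L, n, F, f, w):
    """Median of grouped data by interpolation: L + ((n / 2 - F) / f) * w."""
    return L + ((n / 2 - F) / f) * w


# Values for the median class 8-11 in the interlibrary loan example.
median_days = grouped_median(L=7.5, n=156, F=72, f=40, w=4.0)

print(round(median_days, 2))  # 8.1 days, matching the interpolation above
```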
Advantages and Disadvantages of Median
Also unaffected by extreme scores
Usually its value actually occurs in the data
But cannot be entered into equations because there is no equation that defines it
And not as stable from sample to sample because it depends upon the number
of scores in the sample.
3.3 The Mean
The arithmetic mean, popularly called the average or the mean, is a measure
familiar to almost everyone. If data are organized in a grouped frequency distribution, the
mean is determined by multiplying the class midpoint of each class interval by the
frequency of observations in that class. These products are then totalled and divided by
the number of observations in the distribution. In mathematical terms, this is expressed
in the formula:
\[
\bar{X} = \frac{\sum_{i=1}^{k} f_i M_i}{n}, \qquad n = \sum_{i=1}^{k} f_i
\]
where f= frequency or number of observations in each class
M= class midpoint for each class interval
k= number of classes
The formula can be illustrated using the book copyright data in Table 3.2,
organized in a grouped frequency distribution of 8 classes (k=8).
Table 3.2 Copyright Dates (n=100)

Interval    Frequency (f)    Midpoint (M)    Frequency x Midpoint (f x M)
66-69       10               67.5            675.0
70-73       13               71.5            929.5
74-77       14               75.5            1,057.0
78-81       15               79.5            1,192.5
82-85       16               83.5            1,336.0
86-89       13               87.5            1,137.5
90-93       11               91.5            1,006.5
94-97       8                95.5            764.0
            n = 100                          Sum = 8,098.0
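The product column and the grouped mean can be recomputed with a minimal Python sketch. The frequencies and midpoints are taken from Table 3.2; the variable names are illustrative assumptions. Each product is simply the class frequency multiplied by the class midpoint, so the printed rows can be compared with the table line by line.

```python
# Class frequencies (f) and class midpoints (M) from Table 3.2.
frequencies = [10, 13, 14, 15, 16, 13, 11, 8]
midpoints = [67.5, 71.5, 75.5, 79.5, 83.5, 87.5, 91.5, 95.5]

products = [f * m for f, m in zip(frequencies, midpoints)]  # f x M for each class
n = sum(frequencies)                                        # total observations
total = sum(products)                                       # sum of f x M
grouped_mean = total / n                                    # sum(f x M) / n

for f, m, p in zip(frequencies, midpoints, products):
    print(f, m, p)
print("n:", n, "sum of products:", total, "mean:", grouped_mean)
```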
From Table 3.2, the librarian sees the following values:
\[
f_1 = 10,\; f_2 = 13,\; f_3 = 14,\; f_4 = 15,\; f_5 = 16,\; f_6 = 13,\; f_7 = 11,\; f_8 = 8
\]
\[
M_1 = 67.5,\; M_2 = 71.5,\; M_3 = 75.5,\; M_4 = 79.5,\; M_5 = 83.5,\; M_6 = 87.5,\; M_7 = 91.5,\; M_8 = 95.5
\]
\[
n = f_1 + f_2 + f_3 + f_4 + f_5 + f_6 + f_7 + f_8 = 10 + 13 + 14 + 15 + 16 + 13 + 11 + 8 = 100
\]
Now, substituting into the formula with k=8,
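\[
\bar{X} = \frac{(10)(67.5) + (13)(71.5) + (14)(75.5) + (15)(79.5) + (16)(83.5) + (13)(87.5) + (11)(91.5) + (8)(95.5)}{100}
\]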