
BST503Lec4 (Autosaved)


Descriptive Statistics

Classification and Tabulation of Data

Presentation of Data
Measure of Central Tendency (MCT)
Measure of Shape
Measure of Dispersion
• One of the important objectives of statistics is to find various
numerical measures which explain the inherent characteristics of a
frequency distribution.
• The first of such measures is the average. Averages are measures
which condense a huge, unwieldy set of numerical data into single
numerical values which represent the entire distribution.
• Measures of central tendency are statistical measures which describe
the position of a distribution.
• They are also called statistics of location, and are the complement of
statistics of dispersion, which provide information concerning the
variability of observations.
• In the univariate context, the mean, median and mode are the most
commonly used measures of central tendency.
• They are computable values of a distribution that describe the behavior of
the center of the distribution.
Objectives of MCT
1. To condense data in a single value which may be used to
represent a whole series involving magnitudes of the same
2. To facilitate comparisons between data.
3. To facilitate the computation of other statistical measures
such as dispersion, skewness, kurtosis etc.

Simpson and Kafka defined it as “A measure of central tendency is a typical value
around which other figures congregate”.
Waugh has expressed “An average stands for
the whole group of which it forms a part yet
represents the whole”.
Requisites of a MCT
• It must be rigidly defined and not left to the mere estimation of the
observer. If the definition is rigid, the computed value of the
average obtained by different persons shall be similar.
• The average must be based upon all values given in the distribution.
If the average is not based on all values, it might not be representative of
the entire group of data.
• It should be easily understood. The average should possess simple
and obvious properties. It should not be too abstract for the common
• It should be capable of being calculated with reasonable care and
• It should be stable and unaffected by sampling fluctuations.
• It should be capable of further algebraic manipulation.
Measures of MCT
1. Arithmetic Mean
Arithmetic mean is a mathematical average and it
is the most popular measure of central tendency. It
is frequently referred to as the ‘mean’. It is obtained by
dividing the sum of the values of all observations in a
series (ƩX) by the number of items (N) constituting
the series.
Thus, the mean of a set of numbers X1, X2, X3,
..., XN is denoted by x̅ and is defined as
$\bar{x} = \dfrac{X_1 + X_2 + X_3 + \dots + X_N}{N} = \dfrac{\sum X}{N}$
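Arithmetic Mean Calculation Methods for
individual observations:
• Direct Method : $\bar{x} = \dfrac{\sum X}{N}$

• Short cut method : $\bar{x} = A + \dfrac{\sum d}{N}$, where A is an assumed mean and d = X − A

• Step deviation Method : $\bar{x} = A + \dfrac{\sum d'}{N} \times i$, where d' = (X − A)/i and i is a common factor

Arithmetic Mean Calculation Methods for
discrete series :
• Direct Method : $\bar{x} = \dfrac{\sum fX}{N}$, where N = ∑f

• Short cut method : $\bar{x} = A + \dfrac{\sum fd}{N}$, where d = X − A

• Step deviation Method : $\bar{x} = A + \dfrac{\sum fd'}{N} \times i$, where d' = (X − A)/i
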
Example : Calculate the Arithmetic Mean
Below is given the Total Milk Yield (kg) in 1st
Lactation of 50 F x H Crossbred cows
Animal No   Milk Yield
1           751.5
2           2098.0
3           951.5
4           1628.0
5           4679.0
6           1240.0
7           1966.0
8           2219.0
9           4121.0
10          1133.0
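Direct Method

Animal No   Milk Yield (X)
1           751.5
2           2098.0
3           951.5
4           1628.0
5           4679.0
6           1240.0
7           1966.0
8           2219.0
9           4121.0
10          1133.0

N = 10      ∑X = 20787
$\bar{x} = \dfrac{\sum X}{N} = \dfrac{20787}{10} = 2078.7$ kg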
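Short cut Method (assumed mean A = 2000)

Animal No   Milk Yield (X)   D = X − A
1           751.5            -1248.5
2           2098.0           98
3           951.5            -1048.5
4           1628.0           -372
5           4679.0           2679
6           1240.0           -760
7           1966.0           -34
8           2219.0           219
9           4121.0           2121
10          1133.0           -867
N = 10                       ∑D = 787
$\bar{x} = A + \dfrac{\sum D}{N} = 2000 + \dfrac{787}{10} = 2078.7$ kg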
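As a quick check of the two methods above, the following Python sketch recomputes the mean of the ten milk yields both directly and from an assumed mean. The value A = 2000 is not stated on the slide; it is inferred from the tabulated deviations (for example 751.5 − 2000 = −1248.5). The variable names are illustrative only.

```python
# Milk yields (kg) of the 10 cows listed in the short cut table
milk_yield = [751.5, 2098.0, 951.5, 1628.0, 4679.0,
              1240.0, 1966.0, 2219.0, 4121.0, 1133.0]
n = len(milk_yield)

# Direct method: mean = sum(X) / N
sum_x = sum(milk_yield)
mean_direct = sum_x / n

# Short cut method: mean = A + sum(D) / N, with D = X - A
A = 2000.0  # assumed mean (inferred from the table, an assumption)
deviations = [x - A for x in milk_yield]
sum_d = sum(deviations)
mean_shortcut = A + sum_d / n

print(f"sum X = {sum_x}, sum D = {sum_d}")
print(f"direct = {mean_direct:.1f} kg, short cut = {mean_shortcut:.1f} kg")
```

Both routes give the same mean, which is the point of the short cut method: the choice of A changes the arithmetic, not the answer.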
Example : Calculate the Arithmetic Mean for
discrete Series
Marks   No. of Students
20      8
30      12
40      20
50      10
60      6
70      4
Direct method for Discrete series

Marks (X)   No. of Students (f)   fX
20          8                     160
30          12                    360
40          20                    800
50          10                    500
60          6                     360
70          4                     280

            N = 60                ∑fX = 2460
$\bar{x} = \dfrac{\sum fX}{N} = \dfrac{2460}{60} = 41$
Short cut method for Discrete series

Marks (X)   No. of Students (f)   d = X − A (A = 40)   fd
20          8                     -20                  -160
30          12                    -10                  -120
40          20                    0                    0
50          10                    10                   100
60          6                     20                   120
70          4                     30                   120

            N = 60                                     ∑fd = 60
$\bar{x} = A + \dfrac{\sum fd}{N} = 40 + \dfrac{60}{60} = 41$
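The discrete-series calculation can be sketched in Python as follows. The marks 30 to 60 are missing from the extracted tables and are inferred here from the fX and d columns (an assumption); A = 40 is the assumed mean given in the short cut table.

```python
# Marks (X) and number of students (f); 30..60 inferred from the fX and d columns
marks = [20, 30, 40, 50, 60, 70]
students = [8, 12, 20, 10, 6, 4]

N = sum(students)                                   # total frequency

# Direct method: mean = sum(fX) / N
sum_fx = sum(f * x for x, f in zip(marks, students))
mean_direct = sum_fx / N

# Short cut method: mean = A + sum(fd) / N, with d = X - A
A = 40                                              # assumed mean from the table
sum_fd = sum(f * (x - A) for x, f in zip(marks, students))
mean_shortcut = A + sum_fd / N

print(f"N = {N}, sum fX = {sum_fx}, sum fd = {sum_fd}")
print(f"direct = {mean_direct}, short cut = {mean_shortcut}")
```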
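AM for Continuous Series
• Obtain the mid point (m) of each class
• Multiply m by the frequency (f) and get the sum of fm
• Divide the sum of fm by the total frequency

Direct Method : $\bar{x} = \dfrac{\sum fm}{N}$
Short cut method : $\bar{x} = A + \dfrac{\sum fd}{N}$, where d = m − A

Marks (X)   No. of Students (f)
0-10        5
10-20       10
20-30       25
30-40       30
40-50       20
50-60       10

            N = 100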
Direct Method for continuous series

Marks (X)   Mid point (m)   No. of Students (f)   fm
0-10        5               5                     25
10-20       15              10                    150
20-30       25              25                    625
30-40       35              30                    1050
40-50       45              20                    900
50-60       55              10                    550

                            N = 100               ∑fm = 3300
$\bar{x} = \dfrac{\sum fm}{N} = \dfrac{3300}{100} = 33$
Short cut Method for continuous series

Marks (X)   Mid point (m)   No. of Students (f)   d = m − A (A = 35)   fd
0-10        5               5                     -30                  -150
10-20       15              10                    -20                  -200
20-30       25              25                    -10                  -250
30-40       35              30                    0                    0
40-50       45              20                    10                   200
50-60       55              10                    20                   200
                            N = 100                                    ∑fd = -200
$\bar{x} = A + \dfrac{\sum fd}{N} = 35 + \dfrac{-200}{100} = 33$
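For a continuous series the only extra step is replacing each class by its mid point. The sketch below does this for the marks distribution and also illustrates the step deviation variant, taking A = 35 and the class width i = 10 as the common factor. The class limits for the middle rows are inferred from the mid points (an assumption), and the names are illustrative.

```python
# Class limits and frequencies of the marks distribution
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
freqs = [5, 10, 25, 30, 20, 10]

mid_points = [(lo + hi) / 2 for lo, hi in classes]   # m = class mid point
N = sum(freqs)

# Direct method: mean = sum(fm) / N
sum_fm = sum(f * m for m, f in zip(mid_points, freqs))
mean_direct = sum_fm / N

# Step deviation method: d' = (m - A) / i, mean = A + (sum(f d') / N) * i
A, i = 35, 10
step_devs = [(m - A) / i for m in mid_points]
sum_fd_step = sum(f * d for d, f in zip(step_devs, freqs))
mean_step = A + sum_fd_step / N * i

print(f"sum fm = {sum_fm}, mean (direct) = {mean_direct}")
print(f"sum fd' = {sum_fd_step}, mean (step deviation) = {mean_step}")
```

Dividing the deviations by the class width keeps the intermediate numbers small, which mattered for hand calculation; the result is unchanged.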
Advantages of Mean:
• It is easy to understand & simple to calculate.
• It is based on all the values.
• It is rigidly defined .
• It is easy to understand the arithmetic average even if some of
the details of the data are lacking.
• It is not based on the position in the series.
Disadvantages of Mean:
• It is affected by extreme values.
• It cannot be calculated for open end classes.
• It cannot be located graphically.
• It may give misleading conclusions.
• It has upward bias.
1. The algebraic sum of the deviations of all the
variates from their arithmetic mean is zero
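This property follows directly from the definition of the mean. A short derivation, using the same symbols as above with an index i added here only for clarity:

```latex
\sum_{i=1}^{N} \left( X_i - \bar{x} \right)
  = \sum_{i=1}^{N} X_i - N\bar{x}
  = \sum_{i=1}^{N} X_i - N \cdot \frac{\sum_{i=1}^{N} X_i}{N}
  = 0
```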
Geometric Mean
• The geometric mean is a type of average , usually used for
growth rates, like population growth or interest rates. While
the arithmetic mean adds items, the geometric
mean multiplies items. Also, you can only get the geometric
mean for positive numbers.
• Used to find the average per cent increase in sales, population,
production etc.
• A geometric mean, unlike an arithmetic mean, tends to
dampen the effect of very high or low values, which might
bias the mean if a straight average (arithmetic mean) were
Geometric Mean
• Geometric mean is a type of mean or average, which
indicates the central tendency or typical value of a set of
numbers by using the product of their values (as opposed to
the arithmetic mean which uses their sum). The geometric
mean is defined as the nth root of the product of n numbers,
i.e., for a set of numbers x1, x2, ..., xn, the geometric mean is
defined as

$GM = \sqrt[n]{x_1 \times x_2 \times x_3 \times \dots \times x_n}$
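Individual Series : $GM = \text{antilog}\left(\dfrac{\sum \log x}{N}\right)$

Discrete Series : $GM = \text{antilog}\left(\dfrac{\sum f \log x}{N}\right)$

Continuous Series : $GM = \text{antilog}\left(\dfrac{\sum f \log m}{N}\right)$, where m is the class mid point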
Geometric mean of discrete series

Example : Calculate geometric mean of the

following data :
x       5 6 7 8 9 10 11
f        2 4 7 10 9 6 2
Solution : Calculation of G.M.

x      log x       f       f log x

5      0.6990    2      1.3980
6      0.7782    4      3.1128
7      0.8451    7      5.9157
8      0.9031   10     9.0310
9      0.9542   9       8.5878
10    1.0000   6       6.0000
11    1.0414   2       2.0828
      N = 40   ∑f log x = 36.1281
$GM = \text{antilog}\left(\dfrac{36.1281}{40}\right) = \text{antilog}(0.9032) \approx 8.00$
Geometric mean of continuous series

Example : Calculate G.M. from the following

data :

X                       f

9.5 – 14.5          10
14.5 – 19.5        15
19.5 – 24.5        17
24.5 – 29.5        25
29.5 – 34.5        18
34.5 – 39.5        12
39.5 – 44.5        8
Geometric mean of continuous series

Solution: Calculation of G.M.


X                     m         log m           f          f log m

9.5 – 14.5         12         1.0792          10         10.7920
14.5 – 19.5        17         1.2304          15        18.4560
19.5 – 24.5        22         1.3424          17        22.8208
24.5 – 29.5        27         1.4314          25        35.7850
29.5 – 34.5        32         1.5051          18        27.0918
34.5 – 39.5        37         1.5682          12        18.8184
39.5 – 44.5        42         1.6232          8          12.9856
                                   N = 105    ∑f log m = 146.7496
$GM = \text{antilog}\left(\dfrac{146.7496}{105}\right) = \text{antilog}(1.3976) \approx 24.98$
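The log-table procedure used in both worked examples can be sketched in Python as below. The helper name `geometric_mean` is hypothetical; it uses base-10 logarithms from the standard `math` module in place of printed log tables, so the results agree with the hand calculation only up to rounding.

```python
import math

def geometric_mean(values, freqs):
    """Frequency-weighted geometric mean: antilog(sum(f * log10(x)) / N)."""
    n = sum(freqs)
    sum_f_log = sum(f * math.log10(x) for x, f in zip(values, freqs))
    return 10 ** (sum_f_log / n)

# Discrete series example
gm_discrete = geometric_mean([5, 6, 7, 8, 9, 10, 11],
                             [2, 4, 7, 10, 9, 6, 2])

# Continuous series example: class mid points stand in for x
mid_points = [12, 17, 22, 27, 32, 37, 42]
gm_continuous = geometric_mean(mid_points, [10, 15, 17, 25, 18, 12, 8])

print(f"GM (discrete)   = {gm_discrete:.2f}")
print(f"GM (continuous) = {gm_continuous:.2f}")
```

Working through logarithms rather than multiplying all values avoids very large products, which is the same reason the hand method uses log tables.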
Practical Exercise 7

Compute the geometric mean of the following distribution:

Marks             0-10   10-20   20-30   30-40
No. of students   5      8       3       4
Harmonic Mean
• The harmonic mean helps to find multiplicative or divisor
relationships between fractions without worrying about
common denominators.
• Harmonic means are often used in averaging things like rates
(e.g., the average travel speed given a duration of several
• It is based on all observations. It is not significantly affected by
fluctuations of sampling. It is capable of algebraic
treatment. It is an appropriate average for averaging ratios
and rates. It does not give much weight to large items.
• Used when there is a need to give greater weightage to
smaller items, and to compute the average of ratios or rates
of given values.
Harmonic Mean
• The harmonic mean is a very specific type of average.
• The harmonic mean of a series of values is the reciprocal of
the arithmetic mean of their reciprocals.
• Most appropriate measure for ratios and rates as it equalizes
the weight of each data point.

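Individual series : $\dfrac{1}{H} = \dfrac{1}{N}\left[\dfrac{1}{x_1} + \dfrac{1}{x_2} + \dfrac{1}{x_3} + \dfrac{1}{x_4} + \dots + \dfrac{1}{x_n}\right]$
Discrete series : $H = \dfrac{N}{\sum (f/x)}$, where N = ∑f
Continuous series : $H = \dfrac{N}{\sum (f/m)}$, where m is the class mid point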
Harmonic mean of discrete series

Example : From the following data compute

the value of the harmonic mean :
x : 5 15 25 35 45
f : 5 15 10 15 5
Solution : Calculation of Harmonic Mean

X     f      1/X       f/X

5     5      0.200     1.000
15   15     0.067     1.005
25   10     0.040     0.400
35   15     0.029     0.435
45   5      0.022     0.110

      ∑f = 50 ∑f/X = 2.95
H.M. = ∑f / ∑(f/X) = 50 / 2.95 = 16.95
Harmonic mean of continuous series
x       f
0 – 10    5
10 – 20  15
20 – 30  10
30 – 40  15
40 – 50  5
Solution : First of all, we shall find out mid points of the various
classes. They are 5, 15, 25, 35 and 45.
Then we will calculate the H.M. by applying the following formula :
H.M. = $\dfrac{N}{\sum (f/m)}$
Calculation of Harmonic Mean

m (Mid Points)   f       1/m      f/m

5                     5       0.200    1.000
15                  15      0.067    1.005
25                  10      0.040    0.400
35                  15      0.029    0.435
45                  5        0.022    0.110
                    ∑f = 50 ∑f/m = 2.95
H.M. = 50 / 2.95 = 16.95
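The harmonic mean calculation for a frequency distribution can be sketched as follows. The helper name `harmonic_mean` is hypothetical. Because the code does not round the reciprocals to three decimals as the table does, it gives about 17.01 rather than 16.95; the gap comes purely from rounding in the hand calculation.

```python
def harmonic_mean(values, freqs):
    """Frequency-weighted harmonic mean: N / sum(f / x)."""
    n = sum(freqs)
    return n / sum(f / x for x, f in zip(values, freqs))

# Values (or class mid points) and frequencies from the worked examples
x = [5, 15, 25, 35, 45]
f = [5, 15, 10, 15, 5]

hm_exact = harmonic_mean(x, f)

# Reproducing the table: reciprocals rounded to 3 decimals before weighting
sum_f_over_x_rounded = sum(fi * round(1 / xi, 3) for xi, fi in zip(x, f))
hm_table = sum(f) / sum_f_over_x_rounded

print(f"H.M. (unrounded)       = {hm_exact:.2f}")
print(f"H.M. (rounded, as table) = {hm_table:.2f}")
```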
Practical Exercise 8
• Calculate the Arithmetic Mean, Geometric Mean, and Harmonic
Mean for the data on Gestation Period of crossbred cows given below:
278 273 285 282 274
289 287 270 277 285
254 283 275 286 276
285 293 283 287 286
272 292 281 285 290
261 282 282 255 283
283 309 281 288 288
283 285 281 272 277
273 285 270 292 253
295 269 288 281 287
Median is the central value of the distribution, or
the value which divides the distribution into two equal
parts, each part containing an equal number of items.
Thus it is the central value of the variable when
the values are arranged in order of magnitude.
Connor has defined it as “The median is that value
of the variable which divides the group into two
equal parts, one part comprising all values
greater, and the other, all values less than the median”.
Calculation of Median : Individual Observation

i. Arrange the data in ascending or descending order.

ii. Apply the formula: Median = size of the $\left(\dfrac{N+1}{2}\right)$th item.

iii. In case of an even number of observations, the median may
be found by averaging the two middle position values.
Calculation of median – Individual
Original   Arranged
32.500     25.990
26.950     26.950
25.990     27.020
28.280     27.200
27.200     28.280
27.020     32.500

N = 6 is even, so we take the average of the 3rd and 4th observations:
Median = (27.020 + 27.200) / 2 = 27.110
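The three steps for an individual series translate directly into a small function. This is a minimal sketch with an illustrative function name; Python's standard `statistics.median` performs the same calculation.

```python
def median_individual(values):
    """Median of ungrouped data: sort, then take the middle item
    (odd N) or the average of the two middle items (even N)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

data = [32.500, 26.950, 25.990, 28.280, 27.200, 27.020]
median_value = median_individual(data)
print(f"Median = {median_value:.3f}")
```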

Calculation of Median : Discrete Series
i. Arrange the data in ascending or descending order.

ii. Calculate the cumulative frequencies.

iii. Apply the formula.

iv. Now look at the cumulative frequency column and find
the total which is either equal to (n+1)/2 or the next higher,
and determine the value of the variable corresponding to it.
This gives the value of the median.
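Calculation of median – Discrete Series

Original data      Arranged in ascending order
X        f         X        f     c.f.
15000    24        15000    24    24
15500    26        15500    26    50
18000    16        16800    20    70
16800    20        17800    30    100
18500    6         18000    16    116
17800    30        18500    6     122

N = 122, so (N+1)/2 = 61.5. The cumulative frequency just greater than 61.5 is 70, so Median = 16800.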
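The cumulative-frequency lookup for a discrete series can be sketched as below. The function name is illustrative; it follows the rule stated in the steps above, returning the first value whose cumulative frequency is equal to or just above (N+1)/2.

```python
from itertools import accumulate

def median_discrete(values, freqs):
    """Median of a discrete frequency distribution via cumulative frequencies."""
    pairs = sorted(zip(values, freqs))            # step i: arrange in ascending order
    n = sum(f for _, f in pairs)
    position = (n + 1) / 2                        # rank of the median item
    running = accumulate(f for _, f in pairs)     # step ii: cumulative frequencies
    for (x, _), cf in zip(pairs, running):
        if cf >= position:                        # first c.f. equal to or above the rank
            return x
    raise ValueError("empty series")

X = [15000, 15500, 18000, 16800, 18500, 17800]
f = [24, 26, 16, 20, 6, 30]
median_x = median_discrete(X, f)
print(f"Median = {median_x}")
```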
Calculation of median – Continuous series

For calculation of the median in a continuous frequency
distribution the following formula will be employed:

$\text{Median} = L_1 + \dfrac{\frac{n}{2} - cf}{f} \times i$

L1 = the lower boundary of the class in which the middle item of the distribution lies
cf = the cumulative frequency of the class preceding the median class
n = the total frequency
f = the frequency of the median class
i = class interval
• Use n/2 as the rank/size of the median in place of (n+1)/2

• Now look at the cumulative frequency column and find the total which is either equal
to n/2 or the next higher, and determine the class corresponding to it. This is the
median class, within which the formula locates the median.
Example: Median of Grouped Data in a
Distribution of Respondents by Age
Age Group   Frequency (f)   Cumulative frequency (cf)
0-20        15              15
20-40       32              47
40-60       54              101
60-80       30              131
80-100      19              150
Total       150
n/2 = 75, so the median class is 40-60 (cf = 101 is the first total above 75).
Median (M) = $40 + \dfrac{75 - 47}{54} \times 20$

= $40 + \dfrac{28}{54} \times 20$

= 40 + 0.52 × 20
= 40 + 10.37
= 50.37
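The interpolation used in this example can be sketched in Python. The function name and argument layout are illustrative assumptions; classes are assumed to be contiguous and of equal width.

```python
def median_grouped(lower_bounds, width, freqs):
    """Median of a continuous series: L1 + ((n/2 - cf) / f) * i."""
    n = sum(freqs)
    half = n / 2
    cf_before = 0                       # cumulative frequency before the current class
    for lower, f in zip(lower_bounds, freqs):
        if cf_before + f >= half:       # this is the median class
            return lower + (half - cf_before) / f * width
        cf_before += f
    raise ValueError("empty distribution")

# Age distribution of respondents from the example
median_age = median_grouped([0, 20, 40, 60, 80], 20, [15, 32, 54, 30, 19])
print(f"Median age = {median_age:.2f}")
```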
Practical Problem 9
• The following data pertain to the number of
members in a family. Find the median size of the
family.
Number of members (x)   1  2  3  4  5   6   7  8  9  10  11  12
Frequency (f)           1  3  5  6  10  13  9  5  3  2   2   1
N = 60, so (N+1)/2 = 30.5. The cumulative frequency just
greater than 30.5 is 38, and the
value of x corresponding to 38 is
6. Hence the median size is 6
members per family.
Practical - 10
• For the frequency distribution of weights of sorghum ear-
heads given in table below. Calculate the median.
Practical Problem 7
An incomplete distribution is given below
x 0-10 10-20 20-30 30-40 40-50 50-60 60-70
f 10 20 ? 40 ? 25 15

a) You are given that the median value is 35 and the total frequency is 170. Find the
missing frequencies.
b) Calculate the AM for the complete table.

Solution (a): Let the missing frequencies be f1 (class 20-30) and f2 (class 40-50).
Total frequency = 170
Sum of the frequencies other than the missing ones = 10 + 20 + 40 + 25 + 15 = 110
f1 + f2 = 170 − 110 = 60

Median = 35 (given), so it must lie in the class 30-40.

L1 = 30; N/2 = 85; cf = (10 + 20 + f1); i = 10; f = 40

$35 = 30 + \dfrac{85 - (30 + f_1)}{40} \times 10$, so $5 = \dfrac{55 - f_1}{4}$, giving f1 = 35 and f2 = 60 − 35 = 25.

Solution (b): step deviation method with A = 35 and i = 10

x       m    d = (m−35)/10   f     fd
0-10    5    -3              10    -30
10-20   15   -2              20    -40
20-30   25   -1              35    -35
30-40   35   0               40    0
40-50   45   1               25    25
50-60   55   2               25    50
60-70   65   3               15    45
                             N = 170   ∑fd = 15

$\bar{x} = A + \dfrac{\sum fd}{N} \times i = 35 + \dfrac{15}{170} \times 10 = 35.88$
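The missing-frequency problem amounts to solving the median formula for f1 and then using the total to get f2. A minimal sketch, with illustrative variable names, that also checks the mean of the completed table:

```python
total = 170                       # given total frequency
median, L1, width, f_median = 35, 30, 10, 40
known = [10, 20, 40, 25, 15]      # frequencies that are given
cf_known = 10 + 20                # cumulative frequency before 30-40, excluding f1

# median = L1 + ((N/2 - (cf_known + f1)) / f_median) * width, solved for f1
f1 = total / 2 - cf_known - (median - L1) * f_median / width
f2 = total - sum(known) - f1

# Arithmetic mean of the completed table (step deviation, A = 35, i = 10)
mid_points = [5, 15, 25, 35, 45, 55, 65]
freqs = [10, 20, f1, 40, f2, 25, 15]
A = 35
sum_fd = sum(f * (m - A) / width for m, f in zip(mid_points, freqs))
mean = A + sum_fd / total * width

print(f"f1 = {f1:.0f}, f2 = {f2:.0f}, sum fd = {sum_fd:.0f}, mean = {mean:.2f}")
```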
Advantages of Median:
• Median can be calculated in all distributions.
• Median can be understood even by common
• Median can be ascertained even with the
extreme items.
• It can be located graphically
• It is most useful when dealing with qualitative data.
Disadvantages of Median:
• It is not based on all the values.
• It is not capable of further mathematical
• It is affected by fluctuations of sampling.
• In case of an even number of values, it may not be a
value from the data.
Related Positional Measures
• Measures of position are techniques that divide a set
of data into equal groups
• These are used to locate the relative position of the
data value in the data set.
• Median belongs to a general class of statistical
descriptions called fractiles.
• A fractile is a value below which lies a given fraction of
a set of data. In the case of the median, this fraction
is one-half (1/2).
• The most common measures of position are
percentiles, deciles, and quartiles.
Related Positional Measures
• A quartile has a fraction one-fourth (1/4). The three quartiles
Q1, Q2 and Q3 are such that 25 percent of the data fall below
Q1, 25 percent fall between Q1 and Q2, 25 percent fall
between Q2 and Q3 and 25 percent fall above Q3. It will be
seen that Q2 is the median.
Computation of Quartiles, Deciles and
Practical 11
Calculate the lower and upper quartiles, third decile and 20th percentile from the
following data

Central Value 2.5 7.5 12.5 17.5 22.5

Frequency 7 18 25 30 20

Class   Frequency (f)   Cumulative frequency (cf)
0-5     7               7
5-10    18              25
10-15   25              50
15-20   30              80
20-25   20              100
N/4 = 25, so Q1 lies in the class 5-10
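Quartiles, deciles and percentiles of grouped data use the same interpolation as the median, with n/2 replaced by the appropriate fraction of n. The sketch below assumes that formula carries over unchanged (the slide's own formula for these measures is not shown in the text), and the function name and its `k, parts` arguments are hypothetical.

```python
def grouped_fractile(k, parts, lower_bounds, width, freqs):
    """k-th fractile out of `parts` equal parts for a continuous series,
    e.g. (1, 4) = Q1, (3, 10) = D3, (20, 100) = P20."""
    n = sum(freqs)
    target = k * n / parts              # rank of the required item
    cf_before = 0
    for lower, f in zip(lower_bounds, freqs):
        if cf_before + f >= target:     # class containing the fractile
            return lower + (target - cf_before) / f * width
        cf_before += f
    raise ValueError("empty distribution")

# Classes 0-5, 5-10, ... are implied by the central values 2.5, 7.5, ...
lower = [0, 5, 10, 15, 20]
freqs = [7, 18, 25, 30, 20]

q1 = grouped_fractile(1, 4, lower, 5, freqs)
q3 = grouped_fractile(3, 4, lower, 5, freqs)
d3 = grouped_fractile(3, 10, lower, 5, freqs)
p20 = grouped_fractile(20, 100, lower, 5, freqs)
print(f"Q1 = {q1:.2f}, Q3 = {q3:.2f}, D3 = {d3:.2f}, P20 = {p20:.2f}")
```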
3. Mode
• It is the simplest measure of central tendency and is easy to derive.
• Croxton and Cowden defined it as “the mode of a distribution is the value
at the point around which the items tend to be most heavily concentrated. It
may be regarded as the most typical of a series of values”.
• The mode is observed rather than computed.
• The mode (Mo) of a distribution of values is the value which occurs most
frequently.

Calculation of Mode for Individual Series

S. No   X        S. No   X
1       10       6       27
2       27       7       20
3       24       8       18
4       12       9       15
5       27       10      30

Frequency of each value:
X    F        X    F
10   1        20   1
12   1        24   1
15   1        27   3
18   1        30   1

The value 27 occurs most often (3 times), so Mode = 27.
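Finding the mode of an individual series is just a tally of how often each value occurs. A minimal sketch using `collections.Counter` from the Python standard library, with illustrative variable names:

```python
from collections import Counter

# The ten observations from the individual series
data = [10, 27, 24, 12, 27, 27, 20, 18, 15, 30]

counts = Counter(data)                     # value -> frequency
highest = max(counts.values())             # largest frequency
modes = sorted(x for x, c in counts.items() if c == highest)

print(f"Frequencies: {dict(sorted(counts.items()))}")
print(f"Mode(s): {modes} (occurring {highest} times)")
```

Returning a list rather than a single value makes ties visible, which is how a bimodal series would show up.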
Calculation of Mode
X    f        X    f
28   10       10   8
29   20       15   12
30   40       20   36
31   65       25   35
32   50       30   28
33   15       35   18
              40   9
• When the mode is determined just by inspection and the difference
between the maximum frequency and the frequency preceding or succeeding it
is very small, it is desirable to
find the mode by preparing a
grouping table and an analysis table.
Grouping and Analysis Table for
Mode evaluation
Calculate Mode for the continuous series

Monthly rent (Rs) Number of Libraries (f)

500-1000 5
1000-1500 10
1500-2000 8
2000-2500 16
2500-3000 14
3000 & Above 12
Total 65
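The modal class is 2000-2500 (highest frequency, 16; preceding frequency 8; succeeding frequency 14; class width 500).

$Z = 2000 + \dfrac{16 - 8}{2 \times 16 - 8 - 14} \times 500$

$Z = 2000 + \dfrac{8}{10} \times 500$

Z = 2000 + 0.8 × 500 = 2000 + 400 = 2400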
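The interpolation for the mode of a continuous series can be sketched as follows. The symbols f1 (modal class frequency), f0 (preceding) and f2 (succeeding) are naming assumptions made here, and the function name is hypothetical; the open-ended class "3000 & Above" is represented only by its lower limit, which does not enter the calculation.

```python
def mode_grouped(lower_bounds, width, freqs):
    """Mode of a continuous series:
    L + (f1 - f0) / (2*f1 - f0 - f2) * i, using the class with the highest frequency."""
    k = max(range(len(freqs)), key=freqs.__getitem__)   # index of the modal class
    f1 = freqs[k]
    f0 = freqs[k - 1] if k > 0 else 0                   # frequency of preceding class
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0      # frequency of succeeding class
    return lower_bounds[k] + (f1 - f0) / (2 * f1 - f0 - f2) * width

# Monthly rent distribution from the example
lower = [500, 1000, 1500, 2000, 2500, 3000]
freqs = [5, 10, 8, 16, 14, 12]

modal_rent = mode_grouped(lower, 500, freqs)
print(f"Mode = {modal_rent:.0f}")
```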

Mode for Bimodal Distribution
• For a bimodal distribution ,
the value of the mode can
not be determined with the
help of the formula given.
• When mode is ill-defined,
mode is calculated on the
Mode = 3 Median – 2 Mean
Locating Mode Graphically
1. Draw a histogram for the data
2. Draw two diagonal lines inside the modal class
bar, each starting from an upper corner of the bar to the upper
corner of the adjacent bar
3. Draw a perpendicular line from the intersection of the two
diagonal lines to the X-axis (horizontal scale), which gives the
modal value
Locating Mode Graphically
• Determine the Modal Weight for the following
Weight f
100-110 4
110-120 6
120-130 20
130-140 32
140 -150 33
150-160 17
160-170 8
170-180 2
Advantages of Mode :
• Mode is readily comprehensible and easily
• It is the best representative of data
• It is not at all affected by extreme values.
• The value of mode can also be determined
• It is usually an actual value of an important
part of the series.
Disadvantages of Mode :
• It is not based on all observations.
• It is not capable of further mathematical
• Mode is affected to a great extent by
sampling fluctuations.
• Choice of grouping has great influence
on the value of mode.
• A measure of central tendency is a measure that
tells us where the middle of a set of data lies.

• Mean is the most common measure of central
tendency. It is simply the sum of the numbers
divided by the number of numbers in a set of
data. This is also known as the average.
Practical - 12
The following data, 44.4, 67.6, 76.2, 64.7, 80.0, 64.2, 75.0, 34.2,
29.2, represent the infection of goats with the viral condition
peste des petits ruminants, expressed as the percentage
morbidity in Indian villages. Calculate the median.
• Median is the number present in the middle
when the numbers in a set of data are arranged
in ascending or descending order. If the number
of numbers in a data set is even, then the
median is the mean of the two middle numbers.
• Mode is the value that occurs most frequently in
a set of data.
