Quartile & Deviation
$$\bar{X} = \frac{f_1 m_1 + f_2 m_2 + \dots + f_k m_k}{f_1 + f_2 + \dots + f_k}$$
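4. If A is any guessed or assumed arithmetic mean
(which may be any number) and if $d_j = X_j - A$ are the
deviations of $X_j$ from A, then the equations for the mean become
$$\bar{X} = A + \frac{\sum d}{n}$$ (ungrouped data)
$$\bar{X} = A + \frac{\sum f d}{n}$$ (grouped data, where $n = \sum f$)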
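The assumed-mean shortcut can be checked numerically. The sketch below is only an illustration: the function name `mean_with_assumed` and both small data sets are hypothetical and do not come from the slides.

```python
def mean_with_assumed(values, assumed, freqs=None):
    """Arithmetic mean via the assumed-mean shortcut: A + sum(f * d) / sum(f)."""
    if freqs is None:
        freqs = [1] * len(values)          # ungrouped data: every frequency is 1
    n = sum(freqs)                         # total frequency
    total_dev = sum(f * (x - assumed) for x, f in zip(values, freqs))
    return assumed + total_dev / n


# Hypothetical ungrouped data with assumed mean A = 5
values = [8, 3, 5, 12, 10]
shortcut_mean = mean_with_assumed(values, assumed=5)
direct_mean = sum(values) / len(values)

# Hypothetical grouped data: class midpoints m and frequencies f, A = 20
midpoints = [10, 20, 30]
freqs = [2, 5, 3]
grouped_mean = mean_with_assumed(midpoints, assumed=20, freqs=freqs)

print(shortcut_mean, direct_mean, grouped_mean)
```

Whatever value is chosen for A, the shortcut should agree with the direct mean; a value of A near the centre of the data simply keeps the deviations small.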
Quantiles:
Percentiles:
The values dividing the data into 100 equal parts are called
percentiles and are denoted by P1, P2, P3, P4, ..., P99.
Problem
The following are the marks obtained by 20 students in a statistics test:
53 74 82 42 39 20 81 68 58 28
67 54 93 70 30 55 36 37 29 61
In order to apply the formulae, we need to arrange the above data in
ascending order, i.e. in the form of an array:
20 28 29 30 36 37 39 42 53 54
55 58 61 67 68 70 74 81 82 93
With n = 20, $Q_1$ is the value of the (n + 1)/4 = 5.25th item.
The value of the 5th item is 36 and that of the 6th item is 37. Thus the
first quartile is the value 0.25 of the way between 36 and 37, which is
36 + 1(0.25) = 36.25. Therefore, $Q_1$ = 36.25. Similarly,
$Q_2$ is the value of the 2(n + 1)/4 = 10.5th item.
The value of the 10th item is 54 and that of the 11th item is 55. Thus the
second quartile is the value 0.5 of the way between 54 and 55. Since the
difference between 54 and 55 is 1, we get 54 + 1(0.5) = 54.5. Hence, $Q_2$ = 54.5.
Likewise,
$Q_3$ is the value of the 3(n + 1)/4 = 15.75th item.
The value of the 15th item is 68 and that of the 16th item is 70.
Thus the third quartile is the value 0.75 of the way between 68
and 70. As the difference between 68 and 70 is 2, the third
quartile is 68 + 2(0.75) = 69.5. Therefore, $Q_3$ = 69.5.
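The worked example can be reproduced with a short sketch. It assumes the k(n + 1)/4 position rule with linear interpolation that the item positions above imply; the function name `quartile_by_position` is hypothetical.

```python
def quartile_by_position(sorted_values, k):
    """k-th quartile from the k(n + 1)/4 position rule with linear interpolation."""
    n = len(sorted_values)
    position = k * (n + 1) / 4           # 1-based position, may be fractional
    whole = int(position)                # e.g. 5 for the 5.25th item
    fraction = position - whole          # e.g. 0.25
    if whole < 1:                        # position falls before the first item
        return sorted_values[0]
    if whole >= n:                       # position falls at or beyond the last item
        return sorted_values[-1]
    lower = sorted_values[whole - 1]     # value of the whole-th item
    upper = sorted_values[whole]         # value of the next item
    return lower + fraction * (upper - lower)


marks = [53, 74, 82, 42, 39, 20, 81, 68, 58, 28,
         67, 54, 93, 70, 30, 55, 36, 37, 29, 61]
array = sorted(marks)                    # arrange in ascending order first

q1 = quartile_by_position(array, 1)
q2 = quartile_by_position(array, 2)
q3 = quartile_by_position(array, 3)
print(q1, q2, q3)
```

Other textbooks and software use slightly different position rules, so quartiles computed elsewhere may differ a little from these values. The standard library's `statistics.quantiles(marks, n=4)` uses, by default, an "exclusive" method that is also based on n + 1 positions, so it is expected to agree here.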
Quartiles for Grouped Data:
The quartiles may be determined from grouped data in the same way as the
median, except that in place of n/2 we use n/4 (for $Q_1$) or 3n/4 (for $Q_3$).
To calculate quartiles from grouped data we first form a cumulative frequency
column. The quartiles are then calculated from the following formulae:
$$Q_1 = l + \frac{h}{f}\left(\frac{n}{4} - C.F\right)$$
$$Q_2 = l + \frac{h}{f}\left(\frac{2n}{4} - C.F\right) = \text{Median}$$
$$Q_3 = l + \frac{h}{f}\left(\frac{3n}{4} - C.F\right)$$
Where,
l = lower class boundary of the class containing $Q_1$ or $Q_3$,
i.e. the class corresponding to the cumulative frequency in which n/4 or 3n/4
lies
h = class interval size of the class containing $Q_1$ or $Q_3$.
f = frequency of the class containing $Q_1$ or $Q_3$.
n = number of values, or the total frequency.
C.F = cumulative frequency of the class preceding the class containing $Q_1$ or $Q_3$.
For Example:
We will calculate the quartiles from the frequency distribution for the
weight of 120 students as given in the following
∑f = n = 120
i. The first quartile $Q_1$ is the value of the (n/4)th, i.e. the 30th, item from the lower
end. From this data, we see that the cumulative frequency of the third class
is 22 and that of the fourth class is 50. Thus $Q_1$ lies in the fourth class, i.e.
140 – 149.
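The step from locating the quartile class to a numerical value can be sketched as follows. This is an illustration under stated assumptions, not the slides' own working: it uses the usual grouped-data formula l + (h/f)(kn/4 − C.F), takes the class boundaries of 140 – 149 to be 139.5 and 149.5 (so h = 10), and derives the class frequency as 50 − 22 = 28 from the two cumulative frequencies quoted above. The function name `grouped_quartile` is hypothetical.

```python
def grouped_quartile(k, n, lower_boundary, width, freq, cum_freq_before):
    """k-th quartile of grouped data: l + (h / f) * (k * n / 4 - C.F)."""
    target = k * n / 4                   # item number to locate, e.g. 30 for Q1
    return lower_boundary + (width / freq) * (target - cum_freq_before)


n = 120                                  # total frequency from the example
cf_third_class = 22                      # cumulative frequency before the Q1 class
cf_fourth_class = 50                     # cumulative frequency of the Q1 class
freq_fourth_class = cf_fourth_class - cf_third_class   # 28 items in 140-149

q1 = grouped_quartile(
    k=1,
    n=n,
    lower_boundary=139.5,                # assumed boundary of class 140-149
    width=10,                            # assumed class interval size
    freq=freq_fourth_class,
    cum_freq_before=cf_third_class,
)
print(round(q1, 2))
```

Under these assumptions the result falls inside the fourth class, as expected. Locating the class for $Q_3$ needs the rest of the frequency table, which is not available here.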
Scores: 50, 65, 85, 95, 45, 55
Minimum: 45, Maximum: 95
Range = 95 − 45 = 50
Advantages
Disadvantages
• It is limited/partial, if extreme scores are not representative
of the sample, but are included among the scores.
• It is not very informative, because it is based only on the
most extreme scores.
• It is severely affected by extreme scores in your data
distribution.
Mean Deviation or average deviation:
The mean deviation gives information about how far the data values
are spread out from the mean value.
It is the average distance of each value from the mean.
The mean deviation of a set of N numbers $X_1, X_2, \dots, X_N$ is
$$MD = \frac{\sum_{j=1}^{N} |X_j - \bar{X}|}{N}$$
where $\bar{X}$ is the arithmetic mean.
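A minimal sketch of the mean deviation, taking the absolute distance of each value from the mean and averaging. The function name `mean_deviation` is hypothetical, and the data simply reuse the six scores from the range example.

```python
def mean_deviation(values):
    """Average absolute distance of the values from their arithmetic mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n


scores = [50, 65, 85, 95, 45, 55]        # scores from the range example
md = mean_deviation(scores)
print(round(md, 2))
```

The absolute value matters: without it the positive and negative deviations from the mean cancel and the sum is always zero.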
Exercise:
$$Q = \frac{1450 - 1100}{2} = \text{Rs. } 175$$
Variance (V): is calculated by dividing the sum of the squared
deviations from the mean by the number of observations. If the
sample size is small, one may use (N – 1) instead of N in the
denominator when calculating the variance and SD.
$$V = \frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N} \quad \text{or} \quad V = \frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N - 1}$$
The higher the value of V, the greater the spread of observations /
cases from the mean. For heterogeneous data, V will be larger.
The square root of V is known as the Standard Deviation (S).
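The two denominators can be compared using Python's standard `statistics` module, where `pvariance` divides by N and `variance` divides by N − 1. The data below reuse the six scores from the range example, purely as an illustration.

```python
import statistics

scores = [50, 65, 85, 95, 45, 55]            # scores from the range example

v_population = statistics.pvariance(scores)  # sum of squared deviations / N
v_sample = statistics.variance(scores)       # sum of squared deviations / (N - 1)
s_population = statistics.pstdev(scores)     # square root of the N-based variance

print(round(v_population, 2), round(v_sample, 2), round(s_population, 2))
```

With only six values the two variances differ noticeably; the gap shrinks as N grows, which is why the choice of denominator matters mostly for small samples.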
Standard Deviation:
$$s = \sqrt{\frac{\sum_{j=1}^{N} (X_j - \bar{X})^2}{N}}$$
where $\bar{X}$ is the arithmetic mean.
Thus s is the root mean square of the deviations from the mean, or the
root-mean-square deviation.
220 students were asked the number of hours per week they spent watching television.
With this information, calculate the mean and standard deviation of hours spent watching
television by the 220 students.
If all values of a data set are the same, the standard deviation is zero (because each value
is equal to the mean).
When analysing normally distributed data, standard deviation can be used in conjunction
with the mean in order to calculate data intervals.
If $\bar{x}$ = mean, S = standard deviation and x = a value in the data set, then
• about 68% of the data lie in the interval: $\bar{x} - S < x < \bar{x} + S$.
• about 95% of the data lie in the interval: $\bar{x} - 2S < x < \bar{x} + 2S$.
• about 99.7% of the data lie in the interval: $\bar{x} - 3S < x < \bar{x} + 3S$.
Assuming the frequency distribution is approximately normal, calculate
the interval within which 95% of the previous example's observations
would be expected to occur.
$\bar{x}$ = 29.82, s = 6.03
Calculate the interval using the following formula: $\bar{x} - 2s < x < \bar{x} + 2s$
29.82 − (2 × 6.03) < x < 29.82 + (2 × 6.03)
29.82 − 12.06 < x < 29.82 + 12.06
17.76 < x < 41.88
This means that there is about a 95% certainty that a student will spend
between 18 hours and 42 hours per week watching television.
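The same interval arithmetic can be written as a small helper. This is a sketch only: the function name `normal_interval` is hypothetical, and it relies on the assumption already made above that the distribution is approximately normal.

```python
def normal_interval(mean, sd, k):
    """Interval mean - k*sd < x < mean + k*sd, returned as (low, high)."""
    return mean - k * sd, mean + k * sd


mean_hours = 29.82                       # mean from the television example
sd_hours = 6.03                          # standard deviation from the same example

low, high = normal_interval(mean_hours, sd_hours, k=2)   # about 95% of the data
print(round(low, 2), round(high, 2))
```

Calling the helper with k=1 or k=3 gives the intervals expected to hold about 68% and about 99.7% of the observations.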
What’s the difference between standard deviation and variance?
• The higher the coefficient of variation (CV), the greater the level of dispersion
around the mean. It is generally expressed as a percentage. Because it has no
units, it allows comparison between distributions of values whose
scales of measurement are not comparable.
$$CV = \frac{s}{\bar{X}} \times 100$$
$$CV = \frac{6.03}{29.82} \times 100 = 20.2$$
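As a quick check of the arithmetic, the coefficient of variation can be computed directly. The function name `coefficient_of_variation` is hypothetical; the inputs are the mean and standard deviation from the television example.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation as a percentage of the mean."""
    return sd / mean * 100


cv = coefficient_of_variation(sd=6.03, mean=29.82)
print(round(cv, 1))
```

Because the units cancel in the ratio, the same function can be used to compare, say, a distribution measured in hours with one measured in rupees.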
To calculate the Variance, take each difference, square it, and then average the result:
Variance
$$\sigma^2 = \frac{206^2 + 76^2 + (-224)^2 + 36^2 + (-94)^2}{5} = \frac{42436 + 5776 + 50176 + 1296 + 8836}{5} = \frac{108520}{5} = 21704$$
So the Variance is 21,704
And the Standard Deviation is just the square root of
Variance, so:
Standard Deviation
σ = √21704
= 147.32...
= 147 (to the nearest mm)
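The calculation above can be reproduced in a few lines. The sketch starts from the five differences from the mean used in the example and divides by the number of values (the N-based variance).

```python
import math

differences = [206, 76, -224, 36, -94]   # each height minus the mean, in mm

squared = [d ** 2 for d in differences]  # square each difference
variance = sum(squared) / len(squared)   # average of the squared differences
std_dev = math.sqrt(variance)            # standard deviation in mm

print(squared, variance, round(std_dev, 2))
```

Note that the differences themselves add up to zero, which is why they are squared before averaging.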
And the good thing about the Standard Deviation is that it is useful. Now we
can show which heights are within one Standard Deviation (147mm) of the
Mean.
So, using the Standard Deviation we have a "standard" way of knowing what
is normal, and what is extra large or extra small.