Chapter 5
UTILIZATION OF DATA
LEARNING OUTCOMES
INTRODUCTION
Statistics is a very important tool in the utilization of assessment data, especially
in describing, analyzing, and interpreting the performance of students in assessment
procedures. Teachers should have the necessary background in the statistical
procedures used in the assessment of student learning in order to give a correct
description and interpretation of the achievement of students in a certain test,
whether it is a classroom assessment conducted by the teacher or a division or
national assessment conducted by the Department of Education.
DEFINITION OF STATISTICS
FREQUENCY DISTRIBUTION
1. Class limits are the groupings or categories defined by the lower and upper limits.
Examples: LL – UL
10 – 14
15 – 19
20 – 24
Lower class limit (LL) represents the smallest number in each group.
Upper class limit (UL) represents the highest number in each group.
2. Class size (c.i) is the width of each class interval.
Examples: LL – UL
10 – 14
15 – 19
20 – 24
Each of these classes has a class size of 5.
3. Class boundaries are the numbers used to separate each category in the
frequency distribution but without the gaps created by the class limits. The scores
of the students are discrete. Add 0.5 to the upper limit to get the upper class
boundary and subtract 0.5 from the lower limit to get the lower class boundary in
each group or category.
Examples: LL – UL    LCB – UCB
10 – 14    9.5 – 14.5
15 – 19    14.5 – 19.5
20 – 24    19.5 – 24.5
4. Class marks are the midpoints of the lower and upper class limits. The formula is
$X_M = \frac{LL + UL}{2}$
Examples: LL – UL    XM
10 – 14    12
15 – 19    17
20 – 24    22
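The definitions of class boundaries, class marks, and class size can be checked with a short Python sketch. The function names are illustrative only and assume discrete whole-number scores, as in the examples above.

```python
def class_boundaries(lower_limit, upper_limit):
    """Return (LCB, UCB): subtract 0.5 from LL and add 0.5 to UL."""
    return lower_limit - 0.5, upper_limit + 0.5


def class_mark(lower_limit, upper_limit):
    """Midpoint of the lower and upper class limits."""
    return (lower_limit + upper_limit) / 2


def class_size(lower_limit, upper_limit):
    """Width of the class, measured between its boundaries."""
    lcb, ucb = class_boundaries(lower_limit, upper_limit)
    return ucb - lcb


for ll, ul in [(10, 14), (15, 19), (20, 24)]:
    print(ll, ul, class_boundaries(ll, ul), class_mark(ll, ul), class_size(ll, ul))
```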
1. Compute the value of the range (R). Range is the difference between the highest
score and the lowest score.
R = HS – LS
2. Determine the class size (c.i). The class size is the quotient when you
divide the range by the desired number of classes or categories. The desired
number of classes is usually 5, 10, or 15, depending on the number of
scores in the distribution. If the desired number of classes is not identified,
find the value of k using the formula k = 1 + 3.3 log n.
$c.i = \frac{R}{\text{desired number of classes}}$ or $c.i = \frac{R}{k}$
3. Set up the class limits of each class or category. Each class is defined by the lower
limit and upper limit. Use the lowest score as the lower limit of the first class.
17 25 30 33 25 45 23 19
27 35 45 48 20 38 39 18
44 22 46 26 36 29 15-LS 21
50-HS 47 34 26 37 25 33 49
22 33 44 38 46 41 37 32
R = HS – LS
= 50 – 15
R = 35
n = 40
Solve the value of k.
k = 1 + 3.3 log n
k = 1 + 3.3 log 40
k = 1 + 3.3 (1.602059991)
k = 1 + 5.286797971
k = 6.286797971
k=6
Find the class size.
$c.i = \frac{R}{k}$
$c.i = \frac{35}{6}$
$c.i = 5.833$
$c.i = 6$
Construct the class limits starting with the lowest score as the lower limit of the
first category. The last category should contain the highest score in the distribution.
Each category (X) should have a class size of 6. Count the number of scores
that fall in each category (f).
X Tally Frequency(f)
15 – 20 //// 4
21 – 26 ///////// 9
27 – 32 /// 3
33 – 38 ////////// 10
39 – 44 //// 4
45 – 50 ////////// 10
n = 40
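The computation of k, the class size, and the class limits can be sketched in Python as follows. Rounding both k and c.i to the nearest whole number is an assumption that matches this example (6.29 becomes 6 and 5.833 becomes 6), and the function names are illustrative.

```python
import math


def number_of_classes(n):
    """k = 1 + 3.3 log n (base-10 logarithm), rounded to a whole number."""
    return round(1 + 3.3 * math.log10(n))


def class_limits(lowest, highest, size):
    """Build (LL, UL) pairs starting at the lowest score until the highest is covered."""
    limits = []
    lower = lowest
    while lower <= highest:
        limits.append((lower, lower + size - 1))
        lower += size
    return limits


n, hs, ls = 40, 50, 15
k = number_of_classes(n)        # 1 + 3.3 log 40 = 6.29, rounded to 6
score_range = hs - ls           # R = HS - LS = 35
ci = round(score_range / k)     # 35 / 6 = 5.833, rounded to 6
print(k, ci, class_limits(ls, hs, ci))
```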
Find the class boundaries and class marks of the given score distribution.
X f Class Boundaries XM
15 – 20 4 14.5 – 20.5 17.5
21 – 26 9 20.5 – 26.5 23.5
27 – 32 3 26.5 – 32.5 29.5
33 – 38 10 32.5 – 38.5 35.5
39 – 44 4 38.5 – 44.5 41.5
45 – 50 10 44.5 – 50.5 47.5
n = 40
A frequency polygon is constructed by plotting the class marks against the class
frequencies. The x-axis corresponds to the class marks and the y-axis corresponds to the
class frequencies. Connect the points consecutively using straight lines. A frequency
polygon is best used in representing continuous data such as the scores of students in a
given test.
X Frequency(f)
15 – 20 4
21 – 26 9
27 – 32 3
33 – 38 10
39 – 44 4
45 – 50 10
n = 40
There are two major concepts in describing the assessed performance of the
group: measures of central tendency and measures of variability. Measures of central
tendency are used to determine the average score of a group of scores, while measures
of variability indicate the spread of scores in the group. These two concepts are very
important and helpful in understanding the performance of the group.
1. $\bar{x} = \frac{\sum x}{n}$
2. $\bar{x} = \frac{\sum fx}{n}$
Example 1: The scores of 15 students in a Mathematics I quiz consisting of 25
items are given below. The highest score is 25 and the lowest score is 10. Here are
the scores: 25, 20, 18, 18, 17, 15, 15, 15, 14, 14, 13, 12, 12, 10, 10. Find the mean
of the following scores.
X (scores)
25
20
18
18
17
15
15
15
14
14
13
12
12
10
10
Ʃx = 228
n = 15
$\bar{x} = \frac{\sum x}{n}$
$\bar{x} = \frac{228}{15} = 15.2$
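As a quick check of Example 1, the same mean can be computed with a few lines of Python. This is only a sketch, and the function name is illustrative.

```python
scores = [25, 20, 18, 18, 17, 15, 15, 15, 14, 14, 13, 12, 12, 10, 10]


def mean(values):
    """Arithmetic mean: the sum of the scores divided by the number of scores."""
    return sum(values) / len(values)


print(sum(scores), len(scores), mean(scores))
```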
Analysis:
Example 2: Find the Grade Point Average (GPA) of Ritz Glenn for the first
semester of the school year 2010 – 2011. Use the table below:
$\bar{x} = \frac{\sum (W_i)(X_i)}{\sum W_i}$
$\bar{x} = \frac{32}{26}$
$\bar{x} = 1.23$
The Grade Point Average of Ritz Glenn for the first semester SY 2010 – 2011 is 1.23.
Grouped data are the data or scores that are arranged in a frequency
distribution. Frequency distribution is the arrangement of scores according to
category of classes including the frequency. Frequency is the number of
observations falling in a category.
For this particular lesson we shall discuss only one formula for solving the mean
of grouped data, which is called the midpoint method. The formula is:
$\bar{x} = \frac{\sum f X_m}{n}$
where x̄ = mean value
1. Find the midpoint or class mark (Xm) of each class or category using the
formula $X_m = \frac{LL + UL}{2}$.
2. Multiply the frequency and the corresponding class mark (fXm).
3. Find the sum of the results in step 2.
4. Solve the mean using the formula $\bar{x} = \frac{\sum f X_m}{n}$.
Example 3: The scores of 40 students in a 60-item science quiz are tabulated
below.
X F Xm 𝐟𝑿𝒎
10 – 14 5 12 60
15 – 19 2 17 34
20 – 24 3 22 66
25 – 29 5 27 135
30 – 34 2 32 64
35 – 39 9 37 333
40 – 44 6 42 252
45 – 49 3 47 141
50 – 54 5 52 260
n = 40 Ʃf𝑋𝑚 = 1 345
$\bar{x} = \frac{\sum f X_m}{n}$
$\bar{x} = \frac{1345}{40}$
$\bar{x} = 33.63$
Analysis:
The mean performance of 40 students in science quiz is 33.63. Those
students who got scores below 33.63 did not perform well in the said examination
while those students who got scores above 33.63 performed well.
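The four steps of the midpoint method can be sketched in Python using the table of Example 3. Representing each class as a (LL, UL, f) tuple is just a convenient assumption for this illustration.

```python
# Each class is (lower limit, upper limit, frequency), copied from Example 3.
classes = [
    (10, 14, 5), (15, 19, 2), (20, 24, 3), (25, 29, 5), (30, 34, 2),
    (35, 39, 9), (40, 44, 6), (45, 49, 3), (50, 54, 5),
]


def grouped_mean(table):
    """Midpoint method: sum of f * Xm divided by n."""
    n = sum(f for _, _, f in table)
    total = sum(f * (ll + ul) / 2 for ll, ul, f in table)  # sum of f * Xm
    return total / n


print(round(grouped_mean(classes), 2))
```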
2. Median
X
(score)
19
17
16
15
10
5
2
Analysis:
The median score is 15. Fifty percent (50%) or three of the scores are above 15
(19,17,16) and 50% or three of the scores are below 15 (10,5,2).
X (score)
30
19
17
16
15
10
5
2
$\tilde{X} = \frac{16 + 15}{2}$
$\tilde{X} = 15.5$
Analysis:
The median score is 15.5, which means that 50% of the scores in the
distribution are lower than 15.5 (15, 10, 5, and 2) and 50% are greater than
15.5 (30, 19, 17, and 16). In other words, four (4) scores are below 15.5 and four (4)
scores are above 15.5.
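The two cases for the median of ungrouped data (an odd and an even number of scores) can be sketched as below. The second list uses the eight scores named in the analysis above. The function name is illustrative.

```python
def median(scores):
    """Middle score for an odd count; average of the two middle scores for an even count."""
    ordered = sorted(scores)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([19, 17, 16, 15, 10, 5, 2]))
print(median([30, 19, 17, 16, 15, 10, 5, 2]))
```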
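Formula:
$\tilde{X} = LB + \left[\frac{\frac{n}{2} - cfp}{f_m}\right] c.i$
where
$\tilde{X}$ = median value
LB = lower boundary of the median class (MC)
n = number of cases
cfp = cumulative frequency before the median class if the scores are
arranged from lowest to highest value
fm = frequency of the median class
c.i = size of the class interval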
Example 3: The scores of 40 students in a 60-item science quiz are tabulated
below. The highest score is 54 and the lowest score is 10.
X F cf<
10 – 14 5 5
15 – 19 2 7
20 – 24 3 10
25 - 29 5 15
30 – 34 2 17 (cfp)
35 – 39 9 (fm) 26
40 - 44 6 32
45 – 49 3 35
50 - 54 5 40
n = 40
Solution:
$\frac{n}{2} = \frac{40}{2} = 20$
The category containing $\frac{n}{2}$ is 35 – 39.
MC = 35 – 39
LL of the MC = 35
𝐿𝐵 = 34.5
cfp = 17
fm = 9
c.i = 5
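$\tilde{X} = LB + \left[\frac{\frac{n}{2} - cfp}{f_m}\right] c.i$
$= 34.5 + \left[\frac{20 - 17}{9}\right] 5$
$= 34.5 + \left[\frac{3}{9}\right] 5$
$= 34.5 + \frac{15}{9}$
$= 34.5 + 1.67$
$\tilde{X} = 36.17$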
Analysis:
The median value is 36.17, which means that 50% or 20 scores are less
than 36.17.
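The grouped-data median formula can be sketched in Python as follows. The median class is taken to be the first class whose cumulative frequency reaches n/2, which agrees with the worked solution. The (LL, UL, f) tuples and the function name are illustrative assumptions.

```python
classes = [
    (10, 14, 5), (15, 19, 2), (20, 24, 3), (25, 29, 5), (30, 34, 2),
    (35, 39, 9), (40, 44, 6), (45, 49, 3), (50, 54, 5),
]


def grouped_median(table, ci):
    """LB + [(n/2 - cfp) / fm] * c.i, with scores arranged from lowest to highest."""
    n = sum(f for _, _, f in table)
    half = n / 2
    cfp = 0  # cumulative frequency before the current class
    for lower_limit, _, fm in table:
        if cfp + fm >= half:          # this is the median class
            lb = lower_limit - 0.5    # lower boundary of the median class
            return lb + (half - cfp) * ci / fm
        cfp += fm
    raise ValueError("empty frequency table")


print(round(grouped_median(classes, ci=5), 2))
```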
3. Mode
Mode is the third measure of central tendency. The mode or the modal score is the
score or scores that occur most often in the distribution. A distribution is classified
as unimodal, bimodal, trimodal, or multimodal. Unimodal is a distribution of scores that
consists of only one mode. Bimodal is a distribution of scores that consists of two modes.
Trimodal is a distribution of scores that consists of three modes, while multimodal is a
distribution of scores that consists of more than two modes.
The score that appeared most in section A is 20; hence, the mode of section A is
20. There is only one mode; therefore, the score distribution is called unimodal. The
modes of section B are 18 and 24, since both 18 and 24 appeared twice. There are two
modes in section B; hence, the distribution is a bimodal distribution. The modes for
section C are 18, 21, and 25. There are three modes for section C; therefore, it is
called a trimodal or multimodal distribution.
In solving the mode value using grouped data, use the formula:
$\hat{X} = LB + \left[\frac{d_1}{d_1 + d_2}\right] c.i$
x f
10 – 14 5
15 – 19 2
20 – 24 3
25 – 29 5
30 – 34 2
35 – 39 9
40 – 44 6
45 – 49 3
50 – 54 5
n = 40
Modal Class = 35 – 39
LL of MC = 35
𝐿𝐵 = 34.5
d1 = 9 – 2 = 7
d2 = 9 – 6 = 3
c.i = 5
$\hat{X} = LB + \left[\frac{d_1}{d_1 + d_2}\right] c.i$
$\hat{X} = 34.5 + \left[\frac{7}{7 + 3}\right] 5$
$= 34.5 + \frac{35}{10}$
$= 34.5 + 3.5$
$= 38$
The mode of the score distribution of 40 students is 38; the scores are most
heavily concentrated around 38, which lies in the modal class 35 – 39.
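A minimal Python sketch of the grouped-data mode formula is given below. Following the worked example, d1 is the modal frequency minus the frequency of the class before it, and d2 is the modal frequency minus the frequency of the class after it. Treating a missing neighbouring class as frequency 0 and picking the first class when frequencies tie are assumptions of this sketch.

```python
classes = [
    (10, 14, 5), (15, 19, 2), (20, 24, 3), (25, 29, 5), (30, 34, 2),
    (35, 39, 9), (40, 44, 6), (45, 49, 3), (50, 54, 5),
]


def grouped_mode(table, ci):
    """LB + [d1 / (d1 + d2)] * c.i for the class with the highest frequency."""
    freqs = [f for _, _, f in table]
    i = freqs.index(max(freqs))                      # modal class (first one if tied)
    before = freqs[i - 1] if i > 0 else 0            # assumed 0 when there is no class before
    after = freqs[i + 1] if i + 1 < len(freqs) else 0
    d1 = freqs[i] - before
    d2 = freqs[i] - after
    lb = table[i][0] - 0.5                           # lower boundary of the modal class
    return lb + d1 * ci / (d1 + d2)


print(grouped_mode(classes, ci=5))
```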
Quantile is a score distribution where the scores are divided into different
equal parts. There are three kinds of quantile. The quartile is a score point that
divides the scores in the distribution into four (4) equal parts. The decile is a score
point that divides the scores in the distribution into ten (10) equal parts. The
percentile is a score point that divides the scores in the distribution into hundred
(100) equal parts.
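$Q_k = \left[\frac{k}{4} n + \left(1 - \frac{k}{4}\right)\right]^{\text{th}}$ score
Where,
Qk = is the indicated quartile
k = 1,2,3
n = number of cases
$D_k = \left[\frac{k}{10} n + \left(1 - \frac{k}{10}\right)\right]^{\text{th}}$ score
Where,
Dk = is the indicated decile
k = 1,2,3,4,5,6,7,8,9
n = number of cases
$P_k = \left[\frac{k}{100} n + \left(1 - \frac{k}{100}\right)\right]^{\text{th}}$ score
Where,
Pk = is the indicated percentile
k = 1,2,3,4,5,.....................97,98,99
n = number of cases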
Example:
Using the given data 6, 8, 10, 12, 12, 14, 15, 16, 20. Find Q1, Q3, D6, D9, P65, P99.
x (score)
6
8
10
12
12
14
15
16
20
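1. Solve the value of Q1.
n = 9
$Q_1 = \left[\frac{1}{4} n + \left(1 - \frac{1}{4}\right)\right]^{\text{th}}$ score
$Q_1 = \left[\frac{1}{4}(9) + \left(1 - \frac{1}{4}\right)\right]^{\text{th}}$ score
$= \left[\frac{9}{4} + \frac{3}{4}\right]^{\text{th}}$ score
$= \left[\frac{12}{4}\right]^{\text{th}}$ score
Q1 = 3rd score
The value of Q1 is 10, which is the 3rd score in the distribution. Therefore,
25% of the scores in the distribution are less than 10.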
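2. Solve the value of Q3.
$Q_3 = \left[\frac{3}{4}(9) + \left(1 - \frac{3}{4}\right)\right]^{\text{th}}$ score
$= \left[\frac{27}{4} + \frac{1}{4}\right]^{\text{th}}$ score
$= \left[\frac{28}{4}\right]^{\text{th}}$ score
Q3 = 7th score
Q3 = 15
Hence, 75% of the scores in the distribution are less than 15.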
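3. Solve the value of D6.
$D_6 = \left[\frac{6}{10} n + \left(1 - \frac{6}{10}\right)\right]^{\text{th}}$ score
$D_6 = \left[\frac{6}{10}(9) + \left(1 - \frac{6}{10}\right)\right]^{\text{th}}$ score
$= \left[\frac{54}{10} + \frac{4}{10}\right]^{\text{th}}$ score
$= \left[\frac{58}{10}\right]^{\text{th}}$ score
D6 = 5.8th score
The value of D6 lies between the 5th and 6th scores. That is, the sum of the 5th
score and 80% of the difference between the 6th and 5th scores.
D6 = 5th score + 0.80 (6th score – 5th score)
D6 = 12 + 0.80 (14 – 12)
D6 = 12 + 1.60
D6 = 13.60
Therefore, 60% of the scores in the distribution are less than 13.60.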
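4. Solve the value of D9.
$D_9 = \left[\frac{9}{10} n + \left(1 - \frac{9}{10}\right)\right]^{\text{th}}$ score
$D_9 = \left[\frac{9}{10}(9) + \left(1 - \frac{9}{10}\right)\right]^{\text{th}}$ score
$= \left[\frac{81}{10} + \frac{1}{10}\right]^{\text{th}}$ score
$= \left[\frac{82}{10}\right]^{\text{th}}$ score
D9 = 8.2th score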
The value of D9 lies within the 8th and 9th scores. That is, the sum of the 8th
score and 20% of the difference between the 9th and 8th scores.
D9 = 8th score + 0.20 (9th score – 8th score)
D9 = 16 + 0.20 (20 – 16)
D9 = 16 + 0.80
D9 = 16.80
Therefore, 90% of the scores in the distribution are less than 16.80.
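5. Solve the value of P65.
$P_{65} = \left[\frac{65}{100} n + \left(1 - \frac{65}{100}\right)\right]^{\text{th}}$ score
$P_{65} = \left[\frac{65}{100}(9) + \left(1 - \frac{65}{100}\right)\right]^{\text{th}}$ score
$= \left[\frac{585}{100} + \frac{35}{100}\right]^{\text{th}}$ score
$= \left[\frac{620}{100}\right]^{\text{th}}$ score
P65 = 6.2th score
Therefore, P65 lies between the 6th and 7th scores. The value of P65 is the sum of
the 6th score and 20% of the difference between the 7th and the 6th scores.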
P65 = 6th score + 0.20(7th score – 6th score)
= 14 + 0.20 (15 – 14)
= 14 + 0.20 (1)
= 14 + 0.20
P65 = 14.20
Therefore, 65% of the scores in the distribution are less than 14.20.
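6. Solve the value of P99.
$P_{99} = \left[\frac{99}{100} n + \left(1 - \frac{99}{100}\right)\right]^{\text{th}}$ score
$P_{99} = \left[\frac{99}{100}(9) + \left(1 - \frac{99}{100}\right)\right]^{\text{th}}$ score
$= \left[\frac{891}{100} + \frac{1}{100}\right]^{\text{th}}$ score
$= \left[\frac{892}{100}\right]^{\text{th}}$ score
P99 = 8.92th score
P99 lies between the 8th and 9th scores. The value of P99 is the sum of the 8th
score and 92% of the difference between the 9th and 8th scores.
P99 = 8th score + 0.92 (9th score – 8th score)
P99 = 16 + 0.92 (20 – 16)
P99 = 16 + 3.68
P99 = 19.68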
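The position rule used in these six solutions, [k/m n + (1 − k/m)]th score with m = 4, 10, or 100, together with the interpolation between neighbouring scores, can be sketched in one Python function. The function and parameter names are illustrative, and the scores are assumed to be sorted from lowest to highest.

```python
scores = [6, 8, 10, 12, 12, 14, 15, 16, 20]


def quantile_ungrouped(sorted_scores, k, parts):
    """Value at the [k/parts * n + (1 - k/parts)]th score, interpolating when needed."""
    n = len(sorted_scores)
    position = (k * n + (parts - k)) / parts     # e.g. 5.8 means the "5.8th score"
    whole = int(position)                        # 5 -> start from the 5th score
    fraction = position - whole                  # 0.8 -> add 80% of the gap to the next score
    lower = sorted_scores[whole - 1]
    if fraction == 0:
        return lower
    upper = sorted_scores[whole]
    return lower + fraction * (upper - lower)


for label, k, parts in [("Q1", 1, 4), ("Q3", 3, 4), ("D6", 6, 10),
                        ("D9", 9, 10), ("P65", 65, 100), ("P99", 99, 100)]:
    print(label, round(quantile_ungrouped(scores, k, parts), 2))
```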
k = 1, 2, and 3
$Q_1 = LB_1 + \left[\frac{\frac{n}{4} - cfp_1}{fq_1}\right] c.i$
$Q_2 = LB_2 + \left[\frac{\frac{2n}{4} - cfp_2}{fq_2}\right] c.i$
$Q_3 = LB_3 + \left[\frac{\frac{3n}{4} - cfp_3}{fq_3}\right] c.i$
Sample Computations of Quartile Using Grouped Data
Example 1: The data for the scores of fifty (50) students in Filipino class are
given below. Solve for the value of Q1.
x f cf<
25 – 32 3 3
33 – 40 7 10
41 – 48 5 15
49 – 56 4 19
57 – 64 12 31
65 – 72 6 37
73 –80 8 45
81 – 88 3 48
89 – 97 2 50
n = 50
Solution:
$\frac{n}{4} = \frac{50}{4} = 12.5$
Q1C = 41 – 48
LL = 41
LB= 40.5
cfp = 10
fq = 5
c.i = 8
$Q_1 = LB_1 + \left[\frac{\frac{n}{4} - cfp_1}{fq_1}\right] c.i$
$Q_1 = 40.5 + \left[\frac{12.5 - 10}{5}\right] 8$
$Q_1 = 40.5 + \left[\frac{2.5}{5}\right] 8$
$Q_1 = 40.5 + \frac{20}{5}$
$Q_1 = 40.5 + 4$
$Q_1 = 44.50$
Therefore, 25% of the scores of 50 students who participated in the test are less than
44.50.
Example 2: The data for the scores of fifty (50) students in Filipino class are
given below. Solve for the value of Q3.
x f cf<
25 – 32 3 3
33 – 40 7 10
41 – 48 5 15
49 – 56 4 19
57 – 64 12 31
65 – 72 6 37
73 – 80 8 45
81 – 88 3 48
89 – 97 2 50
n = 50
Solution:
$\frac{3n}{4} = \frac{3(50)}{4} = 37.5$
Q3C = 73 – 80
LL = 73
LB= 72.5
cfp = 37
fq = 8
c.i = 8
$Q_3 = LB_3 + \left[\frac{\frac{3n}{4} - cfp_3}{fq_3}\right] c.i$
$Q_3 = 72.5 + \left[\frac{37.5 - 37}{8}\right] 8$
$Q_3 = 72.5 + \left[\frac{0.5}{8}\right] 8$
$Q_3 = 72.5 + \frac{4}{8}$
$Q_3 = 72.5 + 0.5$
$Q_3 = 73.00$
Therefore, 75% of the scores in the distribution are less than 73.
The general formula for deciles of grouped data is
$D_k = LB + \left[\frac{\frac{kn}{10} - cfp}{fd}\right] c.i$
where:
Dk = indicated decile
k = 1, 2, 3, 4, 5, 6, 7, 8, 9
LB = lower boundary of the indicated decile class
DC = decile class, the class or category containing $\frac{n}{10}$ for D1,
$\frac{2n}{10}$ for D2, $\frac{3n}{10}$ for D3, ..., $\frac{9n}{10}$ for D9
cfp = cumulative frequency before the indicated decile class when scores are
arranged from lowest to highest
fd = frequency of the indicated decile class
c.i = size of class interval
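$D_1 = LB + \left[\frac{\frac{n}{10} - cfp_1}{fd_1}\right] c.i$
$D_2 = LB + \left[\frac{\frac{2n}{10} - cfp_2}{fd_2}\right] c.i$
$D_3 = LB + \left[\frac{\frac{3n}{10} - cfp_3}{fd_3}\right] c.i$
$D_4 = LB + \left[\frac{\frac{4n}{10} - cfp_4}{fd_4}\right] c.i$
$D_5 = LB + \left[\frac{\frac{5n}{10} - cfp_5}{fd_5}\right] c.i$
$D_6 = LB + \left[\frac{\frac{6n}{10} - cfp_6}{fd_6}\right] c.i$
$D_7 = LB + \left[\frac{\frac{7n}{10} - cfp_7}{fd_7}\right] c.i$
$D_8 = LB + \left[\frac{\frac{8n}{10} - cfp_8}{fd_8}\right] c.i$
$D_9 = LB + \left[\frac{\frac{9n}{10} - cfp_9}{fd_9}\right] c.i$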
Example 3: The data for the scores of fifty (50) students in Filipino class are
given below. Solve the value of D5.
x f cf<
25 – 32 3 3
33 – 40 7 10
41 – 48 5 15
49 – 56 4 19
57 – 64 12 31
65 – 72 6 37
73 – 80 8 45
81 – 88 3 48
89 – 97 2 50
n = 50
Solution:
$\frac{5n}{10} = \frac{5(50)}{10} = \frac{250}{10} = 25$
D5C = 57 – 64
LL = 57
LB = 56.5
cfp = 19
fd = 12
c.i = 8
$D_5 = LB + \left[\frac{\frac{5n}{10} - cfp_5}{fd_5}\right] c.i$
$D_5 = 56.5 + \left[\frac{25 - 19}{12}\right] 8$
$D_5 = 56.5 + \left[\frac{6}{12}\right] 8$
$D_5 = 56.5 + \frac{48}{12}$
$D_5 = 56.5 + 4$
$D_5 = 60.5$
Hence, 50% of the scores of 50 students are less than 60.5.
Example 4: The data for the scores of fifty (50) students in Filipino class are
given below. Solve for the value of D7.
x f cf<
25 – 32 3 3
33 – 40 7 10
41 – 48 5 15
49 – 56 4 19
57 – 64 12 31
65 – 72 6 37
73 – 80 8 45
81 – 88 3 48
89 – 97 2 50
n = 50
Solution:
$\frac{7n}{10} = \frac{7(50)}{10} = \frac{350}{10} = 35$
D7C = 65 – 72
LL = 65
LB = 64.5
cfp= 31
fd= 6
c.i = 8
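$D_7 = LB + \left[\frac{\frac{7n}{10} - cfp_7}{fd_7}\right] c.i$
$D_7 = 64.5 + \left[\frac{35 - 31}{6}\right] 8$
$D_7 = 64.5 + \left[\frac{4}{6}\right] 8$
$D_7 = 64.5 + \frac{32}{6}$
$D_7 = 64.5 + 5.33$
$D_7 = 69.83$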
Therefore, 70% of the scores of students are less than 69.83.
b. Percentiles of Grouped Data
The general formula for percentile using grouped data is
$P_k = LB + \left[\frac{\frac{kn}{100} - cfp}{fd}\right] c.i$
Where,
PC = percentile class containing $\frac{n}{100}$ for P1, $\frac{2n}{100}$ for P2,
$\frac{3n}{100}$ for P3, ..., $\frac{98n}{100}$ for P98, $\frac{99n}{100}$ for P99
cfp = cumulative frequency before the indicated percentile class when scores are
arranged from lowest to highest
To derive the formula in solving the indicated percentile, just change the value of
k to the indicated percentile. There are 99 formulas in solving the percentile. Some of
the formulas for percentile are the following.
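$P_1 = LB + \left[\frac{\frac{(1)n}{100} - cfp}{fd}\right] c.i$
$P_2 = LB + \left[\frac{\frac{(2)n}{100} - cfp}{fd}\right] c.i$
$P_3 = LB + \left[\frac{\frac{(3)n}{100} - cfp}{fd}\right] c.i$
$P_4 = LB + \left[\frac{\frac{(4)n}{100} - cfp}{fd}\right] c.i$
$P_{10} = LB + \left[\frac{\frac{(10)n}{100} - cfp}{fd}\right] c.i$
$P_{20} = LB + \left[\frac{\frac{(20)n}{100} - cfp}{fd}\right] c.i$
$P_{25} = LB + \left[\frac{\frac{(25)n}{100} - cfp}{fd}\right] c.i$
$P_{50} = LB + \left[\frac{\frac{(50)n}{100} - cfp}{fd}\right] c.i$
$P_{75} = LB + \left[\frac{\frac{(75)n}{100} - cfp}{fd}\right] c.i$
$P_{90} = LB + \left[\frac{\frac{(90)n}{100} - cfp}{fd}\right] c.i$
$P_{95} = LB + \left[\frac{\frac{(95)n}{100} - cfp}{fd}\right] c.i$
$P_{99} = LB + \left[\frac{\frac{(99)n}{100} - cfp}{fd}\right] c.i$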
Sample Computations of Percentile Using Grouped Data
Example 5: The data for the scores of fifty (50) students in Filipino class are
given below. Solve the value of P82.
x f cf<
25 – 32 3 3
33 – 40 7 10
41 – 48 5 15
49 – 56 4 19
57 – 64 12 31
65 – 72 6 37
73 – 80 8 45
81 – 88 3 48
89 – 97 2 50
n = 50
Solution:
$\frac{82n}{100} = \frac{82(50)}{100} = \frac{4100}{100} = 41$
P82C = 73 – 80
LL = 73
LB = 72.5
cfp = 37
fd = 8
c.i = 8
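$P_{82} = LB + \left[\frac{\frac{(82)n}{100} - cfp}{fd}\right] c.i$
$P_{82} = 72.5 + \left[\frac{41 - 37}{8}\right] 8$
$P_{82} = 72.5 + \left[\frac{4}{8}\right] 8$
$P_{82} = 72.5 + \frac{32}{8}$
$P_{82} = 72.5 + 4$
$P_{82} = 76.5$
Therefore, 82% of the scores of 50 students are less than 76.5.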
Example 6: The data for the scores of fifty (50) students in Filipino class are
given below. Solve the value of P91.
x f cf<
25 – 32 3 3
33 – 40 7 10
41 – 48 5 15
49 – 56 4 19
57 – 64 12 31
65 – 72 6 37
73 – 80 8 45
81 – 88 3 48
89 – 97 2 50
n = 50
Solution:
$\frac{91n}{100} = \frac{91(50)}{100} = \frac{4550}{100} = 45.50$
P91C = 81 – 88
LL = 81
LB = 80.50
cfp = 45
fd = 3
c.i = 8
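$P_{91} = LB + \left[\frac{\frac{(91)n}{100} - cfp}{fd}\right] c.i$
$P_{91} = 80.50 + \left[\frac{45.50 - 45}{3}\right] 8$
$P_{91} = 80.50 + \left[\frac{0.50}{3}\right] 8$
$P_{91} = 80.50 + \frac{4}{3}$
$P_{91} = 80.50 + 1.33$
$P_{91} = 81.83$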
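Because the quartile, decile, and percentile formulas for grouped data differ only in the divisor (4, 10, or 100), the six worked examples can be reproduced with one Python sketch. The function name and the (LL, UL, f) tuples are illustrative. The class size is passed in as 8 because that is the value used in the solutions above.

```python
# (lower limit, upper limit, frequency) for the fifty Filipino-class scores.
classes = [
    (25, 32, 3), (33, 40, 7), (41, 48, 5), (49, 56, 4), (57, 64, 12),
    (65, 72, 6), (73, 80, 8), (81, 88, 3), (89, 97, 2),
]


def grouped_quantile(table, k, parts, ci):
    """LB + [(k*n/parts - cfp) / f] * c.i, where parts is 4, 10, or 100."""
    n = sum(f for _, _, f in table)
    target = k * n / parts
    cfp = 0  # cumulative frequency before the current class
    for lower_limit, _, f in table:
        if cfp + f >= target:          # class containing the target position
            lb = lower_limit - 0.5
            return lb + (target - cfp) * ci / f
        cfp += f
    raise ValueError("empty frequency table")


for label, k, parts in [("Q1", 1, 4), ("Q3", 3, 4), ("D5", 5, 10),
                        ("D7", 7, 10), ("P82", 82, 100), ("P91", 91, 100)]:
    print(label, round(grouped_quantile(classes, k, parts, ci=8), 2))
```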
Measures of Variation
In the previous section, we discussed measures of central tendency to describe the
distribution of scores. However, a measure of central tendency does not uniquely describe
a distribution, especially if we want to know how close or how far the scores of students
in a certain test are from the average performance of the group. It is for this reason
that we make use of the measures of variation. A measure of variation is a single value that
is used to describe the spread of the scores in a distribution. The term variation is also
known as variability or dispersion. There are two ways of describing the variation of
scores: absolute measures of variation and relative measures of variation.
Before answering such questions, let us first discuss the different types of
measures of variation.
There are four kinds of absolute measures of variation: the range, the inter-quartile
range and quartile deviation, the mean deviation, and the variance and standard deviation.
1. Range
Range (R) is the difference between the highest score and the lowest score in a
distribution. Range is the simplest and the crudest measure of variation, simplest
because we shall only consider the highest score and the lowest score.
a. Range for Ungrouped Data
R = HS – LS
Where,
R = range value
HS = Highest score
LS = Lowest score
Group A Group B
10(LS) 15(LS)
12 16
15 16
17 17
25 17
26 23
28 25
30 26
35(HS) 30(HS)
RA = HS – LS
RA = 35 – 10
RA = 25
RB = HS – LS
RB = 30 - 15
RB = 15
Analysis:
The range of Group A = 25 is greater than the range of Group B = 15. The
implication of this is that scores in group A are more spread out than the scores in
group B or the scores in Group B are less scattered than the scores in group A.
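The comparison of the two groups can be reproduced with a short Python sketch. The helper name is illustrative.

```python
group_a = [10, 12, 15, 17, 25, 26, 28, 30, 35]
group_b = [15, 16, 16, 17, 17, 23, 25, 26, 30]


def score_range(scores):
    """R = HS - LS: highest score minus lowest score."""
    return max(scores) - min(scores)


print(score_range(group_a), score_range(group_b))
```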
Where,
R = range value
Properties of Range
When the range value is large, the scores in the distribution are more dispersed,
widespread or heterogeneous. On the other hand, when the range value is small the
scores in the distribution are less dispersed, less scattered, or homogeneous.
2. Inter-quartile Range (IQR) and Quartile Deviation (QD)
Inter-quartile range is the difference between the third quartile and the first
quartile.
IQR = Q3 – Q1
Quartile deviation indicates the distance we need to go above and below the
median to include the middle 50% of the scores. It is based on the range of the middle
50% of the scores, instead of the entire set.
$QD = \frac{Q_3 - Q_1}{2}$
QD is the quartile deviation value, Q1 is the value of the first quartile, and Q3 is the
value of the third quartile.
a. Quartile Deviation of Ungrouped Data
$QD = \frac{Q_3 - Q_1}{2}$
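$Q_1 = \left[\frac{1}{4} n + \left(1 - \frac{1}{4}\right)\right]^{\text{th}}$ score
$Q_1 = \left[\frac{1}{4}(9) + \left(1 - \frac{1}{4}\right)\right]^{\text{th}}$ score
$Q_1 = \left[\frac{9}{4} + \frac{3}{4}\right]^{\text{th}}$ score
$Q_1 = \left[\frac{12}{4}\right]^{\text{th}}$ score
Q1 = 3rd score
Q1 = 10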
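Solve for Q3.
n = 9
$Q_3 = \left[\frac{3}{4} n + \left(1 - \frac{3}{4}\right)\right]^{\text{th}}$ score
$Q_3 = \left[\frac{3}{4}(9) + \left(1 - \frac{3}{4}\right)\right]^{\text{th}}$ score
$Q_3 = \left[\frac{27}{4} + \frac{1}{4}\right]^{\text{th}}$ score
$Q_3 = \left[\frac{28}{4}\right]^{\text{th}}$ score
Q3 = 7th score
Q3 = 15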
IQR = Q3 – Q1
= 15 – 10
IQR = 5
$QD = \frac{Q_3 - Q_1}{2}$
$QD = \frac{15 - 10}{2}$
$QD = \frac{5}{2}$
$QD = 2.5$
Analysis:
b. Quartile Deviation of Grouped Data
$QD = \frac{Q_3 - Q_1}{2}$
Example 2: The data given below are the scores of fifty (50) students in Filipino
class. Solve for the value of quartile deviation (QD).
X F cf<
25 – 32 3 3
33 – 40 7 10
41 – 48 5 15
49 – 56 4 19
57 – 64 12 31
65 – 72 6 37
73 – 80 8 45
81 – 88 3 48
89 – 97 2 50
n = 50
Solve for the value of Q1.
$\frac{n}{4} = \frac{50}{4} = 12.5$
Q1C = 41 - 48
LL = 41
LB= 40.5
cfp = 10
fq = 5
c.i = 8
$Q_1 = LB_1 + \left[\frac{\frac{n}{4} - cfp_1}{fq_1}\right] c.i$
$Q_1 = 40.5 + \left[\frac{12.5 - 10}{5}\right] 8$
$Q_1 = 40.5 + \left[\frac{2.5}{5}\right] 8$
$Q_1 = 40.5 + \frac{20}{5}$
$Q_1 = 40.5 + 4$
$Q_1 = 44.5$
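Solve for the value of Q3.
$Q_3 = LB_3 + \left[\frac{\frac{3n}{4} - cfp_3}{fq_3}\right] c.i$
$Q_3 = 72.5 + \left[\frac{37.5 - 37}{8}\right] 8$
$Q_3 = 72.5 + \left[\frac{0.5}{8}\right] 8$
$Q_3 = 72.5 + \frac{4}{8}$
$Q_3 = 72.5 + 0.5$
$Q_3 = 73$
Solve for the value of QD.
IQR = Q3 – Q1 = 73 – 44.5 = 28.5
$QD = \frac{Q_3 - Q_1}{2} = \frac{28.5}{2} = 14.25$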
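Once Q1 and Q3 are known, the inter-quartile range and quartile deviation follow directly. The sketch below applies the two formulas to the ungrouped example (Q1 = 10, Q3 = 15) and to the grouped example (Q1 = 44.5, Q3 = 73). The function name is illustrative.

```python
def iqr_and_qd(q1, q3):
    """Return (IQR, QD), where IQR = Q3 - Q1 and QD = (Q3 - Q1) / 2."""
    iqr = q3 - q1
    return iqr, iqr / 2


print(iqr_and_qd(10, 15))      # ungrouped example
print(iqr_and_qd(44.5, 73))    # grouped example
```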
The larger the value of the IQR or QD, the more dispersed the scores are in the
middle 50% of the distribution. On the other hand, if the IQR or QD is small, the scores
are less dispersed in the middle 50% of the distribution. The point of dispersion is the
median value.
When the value of IQR and QD is small, the scores are clustered within the middle
50% of the score distribution. On the other hand, the scores are dispersed in the
middle 50% of the distribution when the value of IQR and QD is large. To determine
whether a distribution is more clustered or more dispersed, you should compare it with
another distribution, since there is no standard for what counts as a small or large
value of IQR and QD.
Mean deviation measures the average deviation of the values from the arithmetic
mean. It gives equal weight to the deviation of every score in the distribution.
$MD = \frac{\sum |x - \bar{x}|}{n}$
Where,
MD = mean deviation value
x = individual score
x̄ = sample mean
n = number of cases
x x-𝐱 /x - 𝐱/
35 13.8 13.8
30 8.8 8.8
26 4.8 4.8
24 2.8 2.8
20 -1.2 1.2
18 -3.2 3.2
18 -3.2 3.2
16 -5.2 5.2
15 -6.2 6.2
10 -11.2 11.2
Ʃx = 212    Ʃ|x − x̄| = 60.4
$\bar{x} = \frac{\sum x}{n}$
$\bar{x} = \frac{212}{10}$
$\bar{x} = 21.2$
$MD = \frac{\sum |x - \bar{x}|}{n}$
$MD = \frac{60.4}{10}$
$MD = 6.04$
Analysis:
The mean deviation of the 10 scores of students is 6.04. This means that, on the
average, the scores deviate from the mean of 21.2 by 6.04.
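The mean deviation of the ten scores can be verified with the Python sketch below. The function name is illustrative.

```python
scores = [35, 30, 26, 24, 20, 18, 18, 16, 15, 10]


def mean_deviation(values):
    """MD = sum of |x - mean| divided by n."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n


print(round(mean_deviation(scores), 2))
```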
$MD = \frac{\sum f |X_m - \bar{x}|}{n}$
Where,
MD = mean deviation
x̄ = mean value
n = number of cases
X    f    Xm    fXm    Xm − x̄    |Xm − x̄|    f|Xm − x̄|
$\bar{x} = \frac{\sum f X_m}{n}$
$\bar{x} = \frac{1345}{40}$
$\bar{x} = 33.63$
$MD = \frac{\sum f |X_m - \bar{x}|}{n}$
$MD = \frac{425.22}{40}$
$MD = 10.63$
Analysis:
The mean deviation of the 40 scores of students is 10.63. This means that, on the
average, the scores deviate from the mean of 33.63 by 10.63.
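The grouped mean deviation can be sketched in Python as below. The frequency table is assumed to be the 40 science scores tabulated earlier in the chapter, which give the same mean of 1345/40. The function name is illustrative.

```python
# (lower limit, upper limit, frequency); assumed to be the science-quiz table.
classes = [
    (10, 14, 5), (15, 19, 2), (20, 24, 3), (25, 29, 5), (30, 34, 2),
    (35, 39, 9), (40, 44, 6), (45, 49, 3), (50, 54, 5),
]


def grouped_mean_deviation(table):
    """MD = sum of f * |Xm - mean| divided by n, using class marks Xm."""
    n = sum(f for _, _, f in table)
    marks = [((ll + ul) / 2, f) for ll, ul, f in table]
    mean = sum(f * xm for xm, f in marks) / n
    return sum(f * abs(xm - mean) for xm, f in marks) / n


print(round(grouped_mean_deviation(classes), 2))
```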
Population Variance
$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
Sample Variance
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
Steps in Solving Variance of Ungrouped Data
x    x − x̄    (x − x̄)²
10    -4.6    21.16
Ʃx = 146    Ʃ(x − x̄)² = 60.40
x̄ = 14.6
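a. Variance of Ungrouped Data
Population Variance of Ungrouped Data
$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
$\sigma^2 = \frac{60.40}{10}$
$\sigma^2 = 6.04$
Sample Variance of Ungrouped Data
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
$s^2 = \frac{60.40}{9}$
$s^2 = 6.71$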
Note: If the standard deviation is already solved, square the value of the standard
deviation to get the variance.
b. Variance of Grouped Data
Population Variance
$\sigma^2 = \frac{\sum f (X_m - \mu)^2}{N}$
Sample Variance
$s^2 = \frac{\sum f (X_m - \bar{x})^2}{n - 1}$
Steps in Solving the Variance of Grouped Data
X    f    Xm    fXm    Xm − x̄    (Xm − x̄)²    f(Xm − x̄)²
Population Variance
$\sigma^2 = \frac{\sum f (X_m - \mu)^2}{N}$
$\sigma^2 = \frac{2606.4}{40}$
$\sigma^2 = 65.16$
Sample Variance
$s^2 = \frac{\sum f (X_m - \bar{x})^2}{n - 1}$
$s^2 = \frac{2606.4}{39}$
$s^2 = 66.83$
Standard deviation is the most important measure of variation. It is the
square root of the variance. It represents the average distance by which the scores
deviate from the mean value.
Note: If the variance is already solved, take the square root of the variance to get
the value of the standard deviation.
Example: Using the data on example 1, solve the population and sample
standard deviation.
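$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$
$\sigma = \sqrt{\frac{60.40}{10}}$
$\sigma = \sqrt{6.04}$
$\sigma = 2.46$
$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
$s = \sqrt{\frac{60.40}{9}}$
$s = \sqrt{6.71}$
$s = 2.59$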
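$\sigma = \sqrt{\frac{\sum f (X_m - \mu)^2}{N}}$
$\sigma = \sqrt{\frac{2606.4}{40}}$
$\sigma = \sqrt{65.16}$
$\sigma = 8.07$
$s = \sqrt{\frac{\sum f (X_m - \bar{x})^2}{n - 1}}$
$s = \sqrt{\frac{2606.4}{39}}$
$s = \sqrt{66.8308}$
$s = 8.18$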
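The population and sample formulas differ only in the divisor (N versus n − 1). The Python sketch below starts from the sums of squared deviations quoted in the solutions (60.40 with 10 scores, and 2606.4 with 40 scores), since the full score tables are not reproduced here. The function names are illustrative.

```python
import math


def sum_of_squares(scores):
    """Sum of squared deviations from the mean, for when raw scores are available."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores)


def variances(ss, n):
    """Return (population variance, sample variance) from a sum of squares."""
    return ss / n, ss / (n - 1)


def standard_deviations(ss, n):
    """Return (population SD, sample SD): square roots of the two variances."""
    pop_var, samp_var = variances(ss, n)
    return math.sqrt(pop_var), math.sqrt(samp_var)


for ss, n in [(60.40, 10), (2606.4, 40)]:
    print([round(v, 2) for v in variances(ss, n)],
          [round(sd, 2) for sd in standard_deviations(ss, n)])
```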
Going back to the diagram presented, let us answer the questions posed using the
concept of mean and standard deviation with the help of the diagram below.
1. What can you observe about the mean and the standard deviation of
the three groups of scores?
Answer: The mean of the three groups of scores is the same which is equal to
18.25 and the standard deviation of section A = 5.15, section B = 6.92, and
section C = 7.63.
2. Which group of students performed well in the class?
Answer: In terms of performance, the three sections of students performed the
same because they have the same mean value of 18.25.
3. Which group of scores is most widespread? Least scattered?
Answer: The standard deviation of section A = 5.15, section B = 6.92 and
section C = 7.63. The scores that are most scattered are those in section C
because they have the largest value of standard deviation, which is equal to
7.63. On the other hand, the least scattered group of scores is in section A,
which has the smallest value of the standard deviation, which is equal to 5.15.
Therefore, the smaller the value of the standard deviation, the closer the
scores are, on the average, to the mean value; the larger the value of the
standard deviation, the more scattered the scores are, on the average, from
the mean value.
Using the diagram, there are more scores that are closer to the mean
value in section A than in section B and section C.
$CV = \left(\frac{s}{\bar{x}}\right) 100\%$
x̄ = mean value
s = standard deviation
Example:
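$CV_A = \left(\frac{s}{\bar{x}}\right) 100\%$
$CV_A = \left(\frac{5.15}{18.25}\right) 100\%$
$CV_A = 28.22\%$
$CV_B = \left(\frac{s}{\bar{x}}\right) 100\%$
$CV_B = \left(\frac{6.92}{18.25}\right) 100\%$
$CV_B = 37.92\%$
$CV_C = \left(\frac{s}{\bar{x}}\right) 100\%$
$CV_C = \left(\frac{7.63}{18.25}\right) 100\%$
$CV_C = 41.81\%$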
Analysis:
The scores in section A are less scattered than the scores in section B and section
C. In other words, scores in section A are more homogeneous than the scores in section
B and section C. Another way to interpret this is that the scores in section C are more
spread out than the scores in section A and section B, or the scores in section C are more
heterogeneous than the scores in section A and section B.
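The three coefficients of variation can be recomputed with the Python sketch below, using the common mean of 18.25 and the three standard deviations given above. The function name is illustrative.

```python
def coefficient_of_variation(s, mean):
    """CV = (s / mean) * 100, expressed as a percentage."""
    return s / mean * 100


for section, s in [("A", 5.15), ("B", 6.92), ("C", 7.63)]:
    print(section, round(coefficient_of_variation(s, 18.25), 2))
```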
Measures of Skewness
$SK = \frac{3(\bar{x} - \tilde{x})}{s}$
Positively skewed or skewed to the right is a distribution where the thin end
tail of the graph goes to the right part of the curve. This happens when most of the
scores of the students are below the mean.
Negatively skewed or skewed to the left is a distribution where the thin end tail of
the graph goes to the left part of the curve. This happens when most of the scores of
the students are above the mean.
A negatively skewed distribution means that the students who took the
examination performed well. Most of the scores are high and there are only a few low
scores. The shape of the score distribution indicates the performance of the students
but not the reasons why most of the students got high scores. The possible reasons
why students got high scores are: the students are smart, there was enough time
to finish the examination, the test was very easy, the instruction was effective, and
the students prepared themselves for the examination.
Example 1: Find the coefficient of skewness of the scores of 40 grade 6 pupils in
a 100-item test in Mathematics if the mean is 82 and the median is 90 with standard
deviation of 15.
Given:
$\bar{x} = 82$
$\tilde{x} = 90$
$s = 15$
$SK = \frac{3(\bar{x} - \tilde{x})}{s}$
$SK = \frac{3(82 - 90)}{15}$
$SK = \frac{3(-8)}{15}$
$SK = \frac{-24}{15}$
$SK = -1.60$
Analysis:
$SK = \frac{3(\bar{x} - \tilde{x})}{s}$
$SK = \frac{3(46 - 40)}{7.5}$
$SK = \frac{3(6)}{7.5}$
$SK = \frac{18}{7.5}$
$SK = 2.40$
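Both skewness computations use the same formula, SK = 3(mean − median) / s, so they can be checked with a small Python sketch. The function name is illustrative. A negative result indicates a negatively skewed distribution and a positive result a positively skewed one.

```python
def skewness(mean, median, s):
    """Coefficient of skewness: 3 * (mean - median) / standard deviation."""
    return 3 * (mean - median) / s


print(round(skewness(82, 90, 15), 2))
print(round(skewness(46, 40, 7.5), 2))
```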
Analysis:
Normal Distribution
Using the figure above, 47.72% or about 48% of the students got a score from 74 to 82.
We can also use the normal curve to determine the percentage of the scores of students
below or above a certain score. 15.86% or about 16% of the students got a score below 70.
This also means that a score of 70 is at the 16th percentile.
About 84.12% or 84% of the scores are below 78. This can be written as P84 = 78.
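These percentages can be approximated with Python's standard library. The sketch assumes that the figure shows a normal curve with a mean of 74 and a standard deviation of 4. These two values are inferred from the percentages quoted above and are not stated in the text. Small differences in the last decimal place come from rounding in printed normal-curve tables.

```python
from statistics import NormalDist

# Assumed parameters, inferred from the quoted percentages: mean 74, SD 4.
curve = NormalDist(mu=74, sigma=4)

below_70 = curve.cdf(70) * 100                       # about 16% of scores
between_74_and_82 = (curve.cdf(82) - curve.cdf(74)) * 100
below_78 = curve.cdf(78) * 100                       # about 84% of scores

print(round(below_70, 2), round(between_74_and_82, 2), round(below_78, 2))
```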
Standard Scores
In this section, we shall discuss the different kinds of converted scores. The
procedures for converting raw scores to standard scores are presented in this section.
There are four (4) types of standard scores: z-score, t-score, standard nine (stanines),
and percentile ranks.
Scores directly obtained from the test are known as actual scores or raw scores.
Such scores cannot be interpreted as low, average, or high on their own. Scores
must be converted or transformed so that they become meaningful and allow some kind
of interpretation and direct comparison of two scores. Consider the two figures
below:
Figure A represents a score distribution with a mean of 50 and a standard
deviation of 10. Figure B represents a score distribution with a mean of 80 and a
standard deviation of 5.
The shape of the two score distributions above is the same, however, the means
and standard deviation are different. This happens because the ranges of the scores
from figure A differ from figure B. In this case, the scores in those figures cannot be
compared because they belong to two different groups.
The raw scores of all students in Business Calculus and Production
Management are very important so that we can get the information that describes
both score distributions. Based on our previous discussion, the mean value and the
standard deviation are necessary to describe the distribution of scores. Let us add the
mean values and standard deviations of the scores of students in Business Calculus and
Production Management as shown:
Business Calculus Production Management
𝑥̅ = 95 𝑥̅ = 80
x = 92 x= 88
s=3 s=4
Ritz Glenn’s score in Business Calculus is three (3) points below the class mean
performance and eight (8) points above the class mean performance in Production
Management. Using the mean value, we can say that Ritz Glenn performed better in
Production Management than in Business Calculus compared with the performance of
the rest of his classmates. How about the standard deviation? The standard
deviation enables us to know what percentage of the scores in the distribution lies
above or below each score.
The shaded area represents the percentage of the scores lower than the score of
Ritz Glenn. In Business Calculus the score of Ritz Glenn is one standard deviation unit
below the mean and in Production Management his score is two standard deviation
units above the mean. To determine the exact percentage of the scores below the score
of Ritz Glenn in Business Calculus and Production Management, use the normal curve
model.
15.86% or approximately 16% of the scores are below the score of Ritz Glenn in
Business Calculus; that is, his score is at the 16th percentile.
1. z-scores
To get more exact information about the performance of Ritz Glenn collect the
raw score, mean and standard deviation and determine how far below or above the
mean in standard deviation units is the obtained raw score.
To determine the exact position of each score in the normal distribution use z-
score formula. The z-score is used to convert a raw score to standard score to
determine how far a raw score lies from the mean in standard deviation units. From
this we can also determine whether an individual student performs well in the
examination compared to the performance of the whole class.
The z-score value indicates the distance between the given raw score and the
mean value in units of the standard deviation. The z-value is positive when the raw
score is above the mean while the z is negative when the raw score is below the mean.
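$z = \frac{x - \mu}{\sigma}$ or $z = \frac{x - \bar{x}}{s}$, where
z = z-value
x = raw score
µ = population mean
σ = population standard deviation
x̄ = sample mean
s = sample standard deviation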
The z-score formula is essential when we compare the performance of a
student in his subjects or the performance of two students who belong to different
groups. It can determine the exact location of a score, whether above or below the
mean, and how many standard deviation units it is from the mean.
Example: Using the data about Ritz Glenn’s scores in Business Calculus and
Production Management, solve the z-score value.
Business Calculus    Production Management
x̄ = 95    x̄ = 80
x = 92    x = 88
s = 3    s = 4
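z-score of Business Calculus (BC)
$z = \frac{x - \bar{x}}{s}$
$Z_{BC} = \frac{92 - 95}{3}$
$Z_{BC} = \frac{-3}{3}$
$Z_{BC} = -1$
z-score of Production Management (PM)
$Z_{PM} = \frac{88 - 80}{4}$
$Z_{PM} = \frac{8}{4}$
$Z_{PM} = +2$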
Analysis:
The score of Ritz Glenn in Business Calculus is one unit standard deviation
below the mean. His score in Production Management is two units standard deviation
above the mean. Therefore, we can conclude that Ritz Glenn performed better in
Production Management than in Business Calculus.
2. T-scores
A z-score can take two kinds of values: positive if the raw score is above the
mean and negative if the raw score is below the mean. To avoid confusion between
negative and positive values, convert raw scores to T-scores instead. The T-score is
another type of standard score in which the mean is 50 and the standard deviation is
10, whereas in the z-score the mean is 0 and the standard deviation is 1. To convert a
raw score to a T-score, first find the z-score equivalent of the raw score and then use
the formula T-score = 10z + 50.
T-scoreBC = 10z + 50
T-scoreBC = 10(−1) + 50
T-scoreBC = −10 + 50
T-scoreBC = 40
T-scorePM = 10z + 50
T-scorePM = 10(2) + 50
T-scorePM = 20 + 50
T-scorePM = 70
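Analysis:
The T-score of Ritz Glenn in Business Calculus is 40, which is one standard
deviation (10 points) below the T-score mean of 50. His T-score in Production
Management is 70, which is two standard deviations above the mean. As with the
z-scores, this shows that he performed better in Production Management than in
Business Calculus.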
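The conversion T-score = 10z + 50 can be checked with a short Python sketch. The function name t_score is chosen only for this illustration, and the z-values are the ones obtained for Ritz Glenn in the z-score example.

```python
def t_score(z):
    """Convert a z-score to a T-score (mean 50, standard deviation 10)."""
    return 10 * z + 50


t_bc = t_score(-1)  # Business Calculus, z = -1
t_pm = t_score(2)   # Production Management, z = +2

print(t_bc, t_pm)  # 40 70
```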
3. Standard Nine
The third type of standard score is the Standard Nine point scale, also known as
stanine; the word comes from sta(ndard) + nine. A stanine is a nine-point grading
scale ranging from 1 to 9, with 1 the lowest and 9 the highest. Stanine grading is
easier to understand than the other standard score models. Stanines 1, 2, and 3 are
interpreted as below average, stanines 4, 5, and 6 as average, and stanines 7, 8, and 9
as above average. Use this graph as a basis for analyzing stanine results.
In the given figure, the stanine scale is centered on the mean of the distribution
(z = 0). The central interval, stanine 5, extends 0.25 standard deviation on either side
of the mean, and every other interval is 0.5 standard deviation wide except for the
two intervals at the tails of the normal curve (stanines 1 and 9), which are open-ended.
The figure below indicates the percentage of scores in each stanine and the
corresponding descriptions.
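The interval widths described above imply z-score cut points at ±0.25, ±0.75, ±1.25, and ±1.75. A minimal Python sketch of the conversion follows. The names STANINE_CUTS, stanine, and describe are assumptions made for this example, and so is the rule that a z-score falling exactly on a cut point goes to the higher stanine.

```python
from bisect import bisect_right

# Upper z-score boundaries of stanines 1 to 8; anything above 1.75 is stanine 9.
STANINE_CUTS = [-1.75, -1.25, -0.75, -0.25, 0.25, 0.75, 1.25, 1.75]


def stanine(z):
    """Return the stanine (1 to 9) of a z-score from a normal distribution."""
    return bisect_right(STANINE_CUTS, z) + 1


def describe(s):
    """Return the descriptive interpretation of a stanine."""
    if s <= 3:
        return "below average"
    if s <= 6:
        return "average"
    return "above average"


for z in (-1.0, 0.0, 2.0):
    s = stanine(z)
    print(z, s, describe(s))
```

With these assumptions, a z-score of −1 falls in stanine 3 (below average), a z-score of 0 in stanine 5 (average), and a z-score of +2 in stanine 9 (above average).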
4. Percentile Rank
Another way of converting a raw score to a standard score is the percentile rank. A
percentile rank indicates the percentage of scores that lie below a given score.
For example, a test score that is greater than 95% of the scores of the examinees is said
to be at the 95th percentile. If the scores are normally distributed, the percentile rank
can be inferred from the standard score. In solving for the percentile rank, use the formula:
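PR = ((CFb + 0.5Fg) / n) × 100
Where,
PR = percentile rank
CFb = cumulative frequency below the given score
Fg = frequency of the given score
n = number of scores in the distribution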
Solving for the percentile rank by hand is tedious and requires a long process. We can
shortcut the solution using the SPSS program or the EXCEL program, which are easier to
use and cheaper than other software.
Example: The table below shows a summary of the scores of 40 students in a 45-
item multiple-choice test. Find the percentile rank of each score in the
distribution.
TS F
45 1
43 2
42 2
41 1
40 1
39 2
37 3
36 2
34 1
33 2
32 2
30 3
29 4
28 1
27 1
25 2
24 1
22 2
21 2
19 1
18 2
16 1
15 1
n = 40
Find the cumulative frequency of the frequency distribution. The third column
represents the cumulative frequency.
TS F CF
45 1 40
43 2 39
42 2 37
41 1 35
40 1 34
39 2 33
37 3 31
36 2 28
34 1 26
33 2 25
32 2 23
30 3 21
29 4 18
28 1 14
27 1 13
25 2 12
24 1 10
22 2 9
21 2 7
19 1 5
18 2 4
16 1 2
15 1 1
n = 40
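The cumulative frequency column can be reproduced with a running total that starts from the lowest score. The sketch below uses the scores and frequencies from the table; the variable names are chosen only for this illustration.

```python
from itertools import accumulate

# Test scores (highest first) and their frequencies, as listed in the table
scores = [45, 43, 42, 41, 40, 39, 37, 36, 34, 33, 32, 30,
          29, 28, 27, 25, 24, 22, 21, 19, 18, 16, 15]
freqs = [1, 2, 2, 1, 1, 2, 3, 2, 1, 2, 2, 3,
         4, 1, 1, 2, 1, 2, 2, 1, 2, 1, 1]

# Accumulate from the lowest score upward, then restore the table's order.
cf_low_to_high = list(accumulate(reversed(freqs)))
cf = list(reversed(cf_low_to_high))

for score, f, c in zip(scores, freqs, cf):
    print(score, f, c)
```

The cumulative frequency of the highest score equals n, the total number of scores, which is a quick check that no frequency was missed.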
Find the percentile rank of each score.
a. Solution:
Score = 45
CFb = 39
Fg = 1
n = 40
PR = ((CFb + 0.5Fg) / n) × 100
PR = ((39 + 0.5(1)) / 40) × 100
PR = ((39 + 0.5) / 40) × 100
PR = (39.5 / 40) × 100
PR = 0.9875 × 100
PR = 98.75
PR ≈ 99
Analysis
A raw score of 45 is equal to a percentile rank of 99. This means that 99% of the
students who took the examination had raw scores equal to or lower than 45. This can
be written as PR99 = 45.
b. Solution:
Score = 43
CFb = 37
Fg = 2
n = 40
PR = ((CFb + 0.5Fg) / n) × 100
PR = ((37 + 0.5(2)) / 40) × 100
PR = ((37 + 1) / 40) × 100
PR = (38 / 40) × 100
PR = 0.95 × 100
PR = 95
Analysis:
A raw score of 43 is equal to a percentile rank of 95. This means that 95% of the
students who took the examination had raw scores equal to or lower than 43. This can
be written also as PR95 = 43.
c. Solution:
Score = 42
CFb = 35
Fg = 2
n = 40
PR = ((CFb + 0.5Fg) / n) × 100
PR = ((35 + 0.5(2)) / 40) × 100
PR = ((35 + 1) / 40) × 100
PR = (36 / 40) × 100
PR = 0.9 × 100
PR = 90
Analysis
A raw score of 42 is equal to a percentile rank of 90. This means that 90% of the
students who took the examination had raw scores equal to or lower than 42. This can
be written also as PR90 = 42.
Note: As an exercise, continue solving for the percentile rank of each score in the
distribution and compare your answers with the percentile rank distribution on the
succeeding page.
When raw scores are converted to percentile ranks, they are put on a scale that
has the same meaning for groups of different sizes and for tests of different
lengths.
TS F CF PR
45 1 40 99
43 2 39 95
42 2 37 90
41 1 35 86
40 1 34 84
39 2 33 80
37 3 31 74
36 2 28 68
34 1 26 64
33 2 25 60
32 2 23 55
30 3 21 49
29 4 18 40
28 1 14 34
27 1 13 31
25 2 12 28
24 1 10 24
22 2 9 20
21 2 7 15
19 1 5 11
18 2 4 8
16 1 2 4
15 1 1 1
n = 40
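The whole percentile rank column can be generated by applying the formula PR = ((CFb + 0.5Fg) / n) × 100 to each score in turn. The sketch below is one possible way to do it. The function name percentile_ranks is an assumption made for this example, and results are rounded half up to the nearest whole number, which is the rounding the table appears to use.

```python
import math

# (test score, frequency) pairs from the distribution above
distribution = [(45, 1), (43, 2), (42, 2), (41, 1), (40, 1), (39, 2),
                (37, 3), (36, 2), (34, 1), (33, 2), (32, 2), (30, 3),
                (29, 4), (28, 1), (27, 1), (25, 2), (24, 1), (22, 2),
                (21, 2), (19, 1), (18, 2), (16, 1), (15, 1)]


def percentile_ranks(dist):
    """Return {score: percentile rank} using PR = (CFb + 0.5 * Fg) / n * 100."""
    n = sum(f for _, f in dist)
    ranks = {}
    cf_below = 0  # cumulative frequency below the current score
    for score, f in sorted(dist):  # lowest score first
        pr = (cf_below + 0.5 * f) * 100 / n
        ranks[score] = math.floor(pr + 0.5)  # round half up (assumed)
        cf_below += f
    return ranks


ranks = percentile_ranks(distribution)
for score, _ in distribution:
    print(score, ranks[score])
```

The printed values can be compared line by line with the PR column of the table above.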
Relationship between Different Standard Scores
This figure shows the relationship between the raw scores and the converted
scores, assuming that the scores are normally distributed. The score distribution
has a mean of 60 and a standard deviation of 5. Using these parameters, consider
a raw score of 75. This raw score lies three standard deviations above the mean,
so it is equivalent to a z-score of 3, a T-score of 80, and a stanine of 9. These
values can be obtained using the processes discussed in the previous
sections.
DESCRIBING RELATIONSHIPS
Correlation
Another statistical method used in analyzing test results is correlation. This
is the tool to use when we want to determine the relationship or association
between the scores of students in two different subjects. Is there a relationship
between the Mathematics scores and the Science scores of 15 students? What type
of linear relationship exists between the two sets of scores? Such questions can be
answered using the concepts of correlation. This section discusses the different
ways of computing the correlation coefficient when raw scores or ordinal-level data
are given. The graphical method of determining the relationship between two groups
of scores, the scattergram, is also discussed in this section, but only for linear
relationships.
Correlation refers to the extent to which two variables, or the distributions of
their scores, are linearly related or associated. The extent of correlation is indicated
numerically by the correlation coefficient (rxy), also known as the Pearson Product
Moment Correlation Coefficient in honor of Karl Pearson, who developed the formula.
The correlation coefficient ranges from −1 to +1. There are three kinds of correlation
based on the correlation coefficient: (1) positive correlation; (2) negative correlation;
and (3) zero correlation. There are two ways of identifying the correlation between
two variables: (1) using the formula; and (2) using a scatter plot or scattergram.
Kinds of Correlation
1. Positive Correlation
High scores in distribution x are associated with high scores in distribution y.
Low scores in distribution x are associated with low scores in distribution y. This
means that as the values of x increase, the values of y also increase, or as the values
of x decrease, the values of y also decrease. The line that best fits the given points
slopes upward to the right, as shown in the scattergram of positive correlation. The
slope of the line is positive.
2. Negative Correlation
High scores in distribution x are associated with low scores in distribution y. Low
scores in distribution x are associated with high scores in distribution y. This means
that as the values of x increase, the values of y decrease, or as the values of x
decrease, the values of y increase. The line that best fits the given points slopes
downward to the right, as shown in the scattergram of negative correlation. The slope
of the line is negative.
3. Zero Correlation
There is no association between the scores in distribution x and the scores in
distribution y. No single line can be drawn that best fits all the points, as shown in
the scattergram of zero correlation. No discernible pattern can be formed.
The formula for computing the correlation coefficient using the Pearson Product
Moment Correlation is:
rxy = [n(Σxy) − (Σx)(Σy)] / √{[n(Σx²) − (Σx)²][n(Σy²) − (Σy)²]}
Scattergram of Correlation
Another way of determining the correlation of pairs of scores is through
graphing. The graphical representation is called a scattergram. Using your knowledge
of graphing ordered pairs in the coordinate plane, graph the scores of 8 students in
mathematics and science.
Mathematics Scores    Science Scores
1 6
2 8
3 10
4 11
5 13
6 16
7 20
8 21
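Analysis:
As math scores increase, science scores also increase. Using the given points in the
coordinate plane, a straight line upward to the right can be drawn that is best
fitted to all the points. Hence, the slope of the line is positive.
2. Scattergram of Negative Correlation
Analysis: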
As math scores increase, science scores decrease. Using the given points in the
coordinate plane, a straight line downward to the right can be drawn that is best
fitted to all the points. Hence, the slope of the line is negative.
3. Scattergram of Zero Correlation
Analysis:
No discernible pattern can be formed from the given set of points. No single line
can be drawn that is best fitted to all the points in the plane.
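As a rough numerical check on a scattergram reading, the slope of the best-fitting line can be computed for the eight pairs of mathematics and science scores tabulated at the start of this section. The sketch below uses the least-squares line, which is an assumption here since the text does not say how the best-fitting line is obtained. All variable names are chosen only for this illustration.

```python
# Scores of the 8 students from the table in this section
math_scores = [1, 2, 3, 4, 5, 6, 7, 8]
science_scores = [6, 8, 10, 11, 13, 16, 20, 21]

n = len(math_scores)
mean_x = sum(math_scores) / n
mean_y = sum(science_scores) / n

# Least-squares slope = sum of cross-deviations / sum of squared x-deviations
sxy = sum((x - mean_x) * (y - mean_y)
          for x, y in zip(math_scores, science_scores))
sxx = sum((x - mean_x) ** 2 for x in math_scores)
slope = sxy / sxx

if slope > 0:
    direction = "upward to the right (positive correlation)"
elif slope < 0:
    direction = "downward to the right (negative correlation)"
else:
    direction = "flat (no linear trend)"

print(round(slope, 2), direction)
```

For these eight pairs the slope is positive, so the line rises to the right.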
Computation of Correlation
rxy = [n(Σxy) − (Σx)(Σy)] / √{[n(Σx²) − (Σx)²][n(Σy²) − (Σy)²]}
Student   Scores in Math (x)   Scores in Science (y)   xy             x²             y²
1         35                   41                      1435           1225           1681
2         15                   25                      375            225            625
3         11                   19                      209            121            361
4         35                   39                      1365           1225           1521
5         45                   40                      1800           2025           1600
6         28                   30                      840            784            900
7         30                   26                      780            900            676
8         15                   23                      345            225            529
9         45                   48                      2160           2025           2304
10        40                   42                      1680           1600           1764
          Σx = 299             Σy = 333                Σxy = 10989    Σx² = 10355    Σy² = 11961
rxy = [n(Σxy) − (Σx)(Σy)] / √{[n(Σx²) − (Σx)²][n(Σy²) − (Σy)²]}
rxy = [(10)(10989) − (299)(333)] / √{[(10)(10355) − (299)²][(10)(11961) − (333)²]}
rxy = (109890 − 99567) / √{[103550 − 89401][119610 − 110889]}
rxy = 10323 / √{[14149][8721]}
rxy = 10323 / √123393429
rxy = 10323 / 11108.25949
rxy = 0.929308503
rxy = 0.93
Analysis:
The value of the correlation coefficient is rxy = 0.93, which means that there is
a very high positive correlation between the scores of 10 students in mathematics
and in science. This means that students who are good in mathematics are also good in
science.
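The hand computation above can be verified with a short Python sketch that applies the same raw-score formula to the ten pairs of scores in the table. The function name pearson_r is an assumption made for this example.

```python
import math

# Scores of the 10 students from the table above
math_scores = [35, 15, 11, 35, 45, 28, 30, 15, 45, 40]
science_scores = [41, 25, 19, 39, 40, 30, 26, 23, 48, 42]


def pearson_r(xs, ys):
    """Pearson product-moment correlation using the raw-score formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) *
                            (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = pearson_r(math_scores, science_scores)
print(round(r, 2))  # 0.93
```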
Another way of finding the correlation between two variables is the Spearman
rho correlation coefficient, denoted by the Greek letter rho (ρ). The Spearman rho
correlation coefficient (ρ) is a measure of correlation used when the given sets of data
are expressed in the ordinal level of measurement (ranks) rather than as raw scores,
as in Pearson r. Spearman rho (ρ) was first derived by the British psychologist Charles
Spearman, and the formula was named Spearman’s rho in his honor.
ρ = 1 − 6ΣD² / [N(N² − 1)], where
ρ = Spearman rho correlation coefficient value
D = difference between a pair of ranks
N = number of students/cases
Rank the scores in mathematics and the scores in science, find the difference
between each pair of ranks, and square the difference. Find the summation of D² and
solve for the value of ρ.
Solution:
ρ = 1 − 6ΣD² / [N(N² − 1)]
ρ = 1 − 6(34) / [10(10² − 1)]
ρ = 1 − 204 / [10(99)]
ρ = 1 − 204 / 990
ρ = 1 − 0.21
ρ = 0.79
Analysis
The ρ value is 0.79, which indicates a high positive correlation between the
mathematics scores and science scores of the ten aspirants in the Gabuyo Scholarship.
Students who are good in mathematics also tend to be good in science.
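The Spearman rho formula can also be sketched in Python. The ranks below are hypothetical: they are not the ranks of the scholarship aspirants, and were chosen only so that ΣD² = 34 with N = 10, matching the figures used in the solution. The function name spearman_rho is likewise an assumption made for this example.

```python
def spearman_rho(ranks_x, ranks_y):
    """Spearman rho from two lists of ranks: 1 - 6*sum(D^2) / (N*(N^2 - 1))."""
    n = len(ranks_x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks_x, ranks_y))
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))


# Hypothetical ranks of 10 students in two subjects (illustration only)
ranks_math = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ranks_science = [4, 2, 3, 1, 7, 6, 5, 10, 9, 8]

sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_math, ranks_science))
rho = spearman_rho(ranks_math, ranks_science)

print(sum_d2, round(rho, 2))  # 34 0.79
```

When several students share the same score, tied ranks are usually averaged before this formula is applied, and the result is then only approximate.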