CHAPTER 5

UTILIZATION OF DATA

LEARNING OUTCOMES

At the end of this chapter, the students should be able to:

1. Apply statistics in research and in any systematic investigation;


2. Construct frequency distribution for a given set of scores;
3. Graph the scores using histogram and frequency distribution;
4. Calculate the mean, median, mode, decile, quartile, and percentile of the
students’ scores;
5. Identify the different properties of the measure of central tendency;
6. Identify the uses of the different measures of variability;
7. Calculate the value and make an analysis of range, mean deviation, quartile
deviation, variance and standard deviation of given scores;
8. Differentiate standard deviation from coefficient of variation;
9. Identify the properties of the different measures of variability;
10. Apply the concept of skewness in identifying the performance of the students;
11. Determine the spread of scores using the measures of variation;
12. Compare the performance of the students using measures of central tendency
and measure of variability;
13. Convert raw scores to standard scores;
14. Determine the relationship of two groups of scores; and
15. Compute the r and ρ values of scores and make an analysis.

INTRODUCTION

Statistics is a very important tool in the utilization of assessment data,
especially in describing, analyzing, and interpreting the performance of students
in assessment procedures. Teachers should have the necessary background in
the statistical procedures used in the assessment of student learning in order to give a
correct description and interpretation of the achievement of students in a
given test, whether it is a classroom assessment conducted by the teacher or a division or
national assessment conducted by the Department of Education.

In this chapter, we shall discuss the important tools in analyzing and
interpreting assessment results. These statistical tools are measures of central
tendency, measures of variation, skewness, correlation, and different types of
converted scores.

DEFINITION OF STATISTICS

Statistics is a branch of science that deals with the collection, presentation,
analysis, and interpretation of quantitative data.

Branches of Statistics

Descriptive Statistics is a method concerned with collecting, describing, and
analyzing a set of data without drawing conclusions (or inferences) about a larger
group.

Inferential Statistics is a branch of statistics concerned with the analysis of a
subset of data leading to predictions or inferences about the entire set of data.

FREQUENCY DISTRIBUTION

Frequency distribution is a tabular arrangement of data into appropriate
categories showing the number of observations in each category or group. There are
two major advantages: (a) it reduces the size of the table; and (b) it makes the
data easier to interpret.

Parts of Frequency Table

1. Class limits are the groupings or categories defined by the lower and upper limits.
   Examples: LL – UL
             10 – 14
             15 – 19
             20 – 24
   The lower class limit (LL) represents the smallest number in each group.
   The upper class limit (UL) represents the highest number in each group.
2. Class size (c.i) is the width of each class interval.
   Examples: LL – UL
             10 – 14
             15 – 19
             20 – 24
   The class size in this score distribution is 5.
3. Class boundaries are the numbers used to separate each category in the
   frequency distribution without the gaps created by the class limits. The scores
   of the students are discrete. Subtract 0.5 from the lower limit to get the lower
   class boundary (LCB) and add 0.5 to the upper limit to get the upper class
   boundary (UCB) of each group or category.
   Examples: LL – UL    LCB – UCB
             10 – 14    9.5 – 14.5
             15 – 19    14.5 – 19.5
             20 – 24    19.5 – 24.5
4. Class marks are the midpoints of the lower and upper class limits. The formula is
   $$X_m = \frac{LL + UL}{2}$$
   Examples: LL – UL    Xm
             10 – 14    12
             15 – 19    17
             20 – 24    22

Steps in Constructing Frequency Distribution

1. Compute the value of the range (R). The range is the difference between the highest
   score (HS) and the lowest score (LS).
   R = HS – LS
2. Determine the class size (c.i). The class size is the quotient when you
   divide the range by the desired number of classes or categories. The desired
   number of classes is usually 5, 10, or 15, depending on the number of
   scores in the distribution. If the desired number of classes is not identified,
   find the value of k using k = 1 + 3.3 log n, where n is the number of scores.
   $$c.i = \frac{R}{\text{desired number of classes}} \quad \text{or} \quad c.i = \frac{R}{k}$$
3. Set up the class limits of each class or category. Each class is defined by the lower
   limit and the upper limit. Use the lowest score as the lower limit of the first class.
4. Set up the class boundaries if needed. Use the formula
   $$\frac{\text{LL of the second class} - \text{UL of the first class}}{2}$$
   Subtract this value from each lower limit and add it to each upper limit.
5. Tally the scores in the appropriate classes.
6. Find the other parts if necessary, such as class marks, among others.

Example: Raw scores of 40 students in a 50-item mathematics quiz. Construct a
frequency distribution following the steps given previously.

17 25 30 33 25 45 23 19
27 35 45 48 20 38 39 18
44 22 46 26 36 29 15-LS 21
50-HS 47 34 26 37 25 33 49
22 33 44 38 46 41 37 32
R = HS – LS
= 50 – 15
R = 35
n = 40
Solve the value of k.
k = 1 + 3.3 log n
k = 1 + 3.3 log 40
k = 1 + 3.3 (1.602059991)
k = 1 + 5.286797971
k = 6.286797971
k ≈ 6

Find the class size.

$$c.i = \frac{R}{k} = \frac{35}{6} = 5.833 \approx 6$$

Construct the class limits starting with the lowest score as the lower limit of the
first category. The last category should contain the highest score in the distribution.
Each category (X) should have a width of 6. Count the number of scores
that fall in each category (f).

X Tally Frequency(f)
15 – 20 //// 4
21 – 26 ///////// 9
27 – 32 /// 3
33 – 38 ////////// 10
39 – 44 //// 4
45 – 50 ////////// 10
n = 40

Find the class boundaries and class marks of the given score distribution.

X          f     Class Boundaries    Xm
15 – 20    4     14.5 – 20.5         17.5
21 – 26    9     20.5 – 26.5         23.5
27 – 32    3     26.5 – 32.5         29.5
33 – 38    10    32.5 – 38.5         35.5
39 – 44    4     38.5 – 44.5         41.5
45 – 50    10    44.5 – 50.5         47.5
n = 40
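
The construction steps above can be sketched in Python. This is only an illustrative sketch under stated assumptions: the function name `build_classes` is hypothetical, the scores are assumed to be whole numbers (so the boundaries sit 0.5 outside the limits), and the class size is rounded up so that the last class is sure to contain the highest score.

```python
import math

def build_classes(lowest, highest, n):
    """Return (k, class_size, rows) for a frequency table.

    Each row is (LL, UL, lower boundary, upper boundary, class mark).
    Assumption: whole-number scores, so boundaries are limits -/+ 0.5.
    """
    score_range = highest - lowest              # R = HS - LS
    k = round(1 + 3.3 * math.log10(n))          # k = 1 + 3.3 log n
    class_size = math.ceil(score_range / k)     # c.i = R / k, rounded up (assumption)
    rows = []
    lower = lowest                              # lowest score starts the first class
    while lower <= highest:
        upper = lower + class_size - 1
        rows.append((lower, upper, lower - 0.5, upper + 0.5, (lower + upper) / 2))
        lower = upper + 1
    return k, class_size, rows

k, class_size, rows = build_classes(lowest=15, highest=50, n=40)
print("k =", k, "c.i =", class_size)
for row in rows:
    print(row)
```

For the quiz data above (LS = 15, HS = 50, n = 40), the sketch gives the same six classes, class boundaries, and class marks as the table.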

Graphical Representation of Scores in Frequency Distribution

The scores expressed in a frequency distribution are more meaningful and easier
to interpret when they are graphed. There are three methods of graphing a frequency
distribution: the bar graph or histogram, the frequency polygon, and the smooth curve.
The histogram and the frequency polygon will be discussed in this section, while
the smooth curve will be discussed later under skewness.

A histogram consists of a set of rectangles having bases on the horizontal axis
which center at the class marks. The base widths correspond to the class size and the
heights of the rectangles correspond to the class frequencies. A histogram is best used
for graphical representation of discrete data or non-continuous data.

A frequency polygon is constructed by plotting the class marks against the class
frequencies. The x-axis corresponds to the class marks and the y-axis corresponds to the
class frequencies. Connect the points consecutively using straight lines. A frequency
polygon is best used in representing continuous data such as the scores of students in a
given test.

Construct a histogram and a frequency polygon using the frequency
distribution of 40 students in a 50-item mathematics quiz previously discussed.

X Frequency(f)
15 – 20 4
21 – 26 9
27 – 32 3
33 – 38 10
39 – 44 4
45 – 50 10
n = 40

DESCRIBING GROUP PERFORMANCE

There are two major concepts in describing the assessed performance of the
group: measures of central tendency and measures of variability. Measures of central
tendency are used to determine the average score of a group of scores, while measures
of variability indicate the spread of scores in the group. These two concepts are very
important and helpful in understanding the performance of the group.

Measures of Central Tendency

A measure of central tendency provides a very convenient way of describing a
set of scores with a single number that describes the performance of the group. It is
also defined as a single value that is used to describe the “center” of the data. It is
thought of as a typical value in a given distribution. There are three commonly used
measures of central tendency: the mean, the median, and the mode. In this section,
we shall discuss how to compute the value and some of the properties of the mean,
median, and mode as applied in a classroom setting.
1. Mean
Mean is the most commonly used measure of the center of data and it is also
referred to as the “arithmetic average.”

Computation of Population Mean

$$\mu = \frac{\sum X}{N} = \frac{x_1 + x_2 + x_3 + \cdots + x_N}{N}$$

Computation of Sample Mean

$$\bar{x} = \frac{\sum X}{n} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n}$$

Computation of the Mean for Ungrouped Data

1. $\bar{x} = \frac{\sum x}{n}$
2. $\bar{x} = \frac{\sum fx}{n}$
Example 1: The scores of 15 students in a 25-item Mathematics I quiz are given
below. The highest score is 25 and the lowest score is 10. Here are the scores:
25, 20, 18, 18, 17, 15, 15, 15, 14, 14, 13, 12, 12, 10, 10. Find the mean of the
scores.

X (scores)
25
20
18
18
17
15
15
15
14
14
13
12
12
10
10

Ʃx = 228
n = 15

$$\bar{x} = \frac{\sum x}{n} = \frac{228}{15} = 15.2$$

Analysis:

The average performance of 15 students who participated in a mathematics
quiz consisting of 25 items is 15.2. The implication of this is that students who got
scores below 15.2 did not perform well in the said examination, while students who got
scores higher than 15.2 performed well compared to the performance of the
whole class.

Example 2: Find the Grade Point Average (GPA) of Ritz Glenn for the first
semester of the school year 2010 – 2011. Use the table below:

Subject     Grade (xi)    Units (wi)     (wi)(xi)
BM 112      1.25          3              3.75
BM 101      1.00          3              3.00
AC 103N     1.25          6              7.50
BEC 111     1.00          3              3.00
MGE 101     1.50          3              4.50
MKM 101     1.25          3              3.75
FM 111      1.50          3              4.50
PEN 2       1.00          2              2.00
                          Ʃ(wi) = 26     Ʃ(wi)(xi) = 32.00

$$\bar{x} = \frac{\sum w_i x_i}{\sum w_i} = \frac{32}{26} = 1.23$$

The Grade Point Average of Ritz Glenn for the first semester SY 2010 – 2011 is 1.23.
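
The two computations above can be checked with a short Python sketch. The function names `mean` and `weighted_mean` are chosen here only for illustration; the data are taken from Example 1 and Example 2.

```python
def mean(scores):
    """Arithmetic mean: sum of the scores divided by the number of scores."""
    return sum(scores) / len(scores)

def weighted_mean(values, weights):
    """Weighted mean: sum of (weight x value) divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Example 1: scores of 15 students
scores = [25, 20, 18, 18, 17, 15, 15, 15, 14, 14, 13, 12, 12, 10, 10]

# Example 2: grades (xi) and units (wi)
grades = [1.25, 1.00, 1.25, 1.00, 1.50, 1.25, 1.50, 1.00]
units = [3, 3, 6, 3, 3, 3, 3, 2]

print(mean(scores))
print(round(weighted_mean(grades, units), 2))
```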

Mean for Grouped Data

Grouped data are the data or scores that are arranged in a frequency
distribution. A frequency distribution is the arrangement of scores according to
categories or classes, including the frequency. Frequency is the number of
observations falling in a category.
For this particular lesson, we shall discuss only one formula for solving the mean
for grouped data, which is called the midpoint method. The formula is:

$$\bar{x} = \frac{\sum f X_m}{n}$$

where x̄ = mean value

f = frequency in each class or category

Xm = midpoint of each class or category

ƩfXm = summation of the products of f and Xm

Steps of Solving Mean for Grouped Data

1. Find the midpoint or class mark (Xm) of each class or category using the
   formula $X_m = \frac{LL + UL}{2}$.
2. Multiply the frequency and the corresponding class mark (fXm).
3. Find the sum of the results in step 2.
4. Solve the mean using the formula $\bar{x} = \frac{\sum f X_m}{n}$.
Example 3: The scores of 40 students in a 60-item science quiz are
tabulated below.

X F Xm 𝐟𝑿𝒎
10 – 14 5 12 60
15 – 19 2 17 34
20 – 24 3 22 66
25 – 29 5 27 135
30 – 34 2 32 64
35 – 39 9 37 333
40 – 44 6 42 252
45 – 49 3 47 141
50 – 54 5 52 260
n = 40 Ʃf𝑋𝑚 = 1 345

$$\bar{x} = \frac{\sum f X_m}{n} = \frac{1345}{40} = 33.63$$

Analysis:

The mean performance of 40 students in the science quiz is 33.63. Those
students who got scores below 33.63 did not perform well in the said examination,
while those students who got scores above 33.63 performed well.
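
The midpoint method can be sketched in Python as follows. The function name `grouped_mean` and the representation of each class as a (lower limit, upper limit, frequency) tuple are assumptions made for this illustration.

```python
def grouped_mean(classes):
    """Mean of grouped data by the midpoint method.

    classes: list of (lower_limit, upper_limit, frequency) tuples.
    """
    n = sum(f for _, _, f in classes)                          # total frequency
    total = sum(f * (ll + ul) / 2 for ll, ul, f in classes)    # sum of f * Xm
    return total / n

# Example 3: scores of 40 students in a science quiz
classes = [(10, 14, 5), (15, 19, 2), (20, 24, 3), (25, 29, 5), (30, 34, 2),
           (35, 39, 9), (40, 44, 6), (45, 49, 3), (50, 54, 5)]

print(grouped_mean(classes))
```

The result is 1345 / 40 = 33.625, which the text rounds to 33.63.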

Properties of the Mean

1. It measures stability. Mean is the most stable among the measures of
central tendency because every score contributes to the value of the mean.
2. The sum of each score’s distance from the mean is zero.
3. It is easily affected by the extreme scores.
4. It may not be an actual score in the distribution.
5. It can be applied to interval level of measurement.
6. It is very easy to compute.

When to Use the Mean

1. Sampling stability is desired.


2. Other measures are to be computed such as standard deviation, coefficient of
variation and skewness.

2. Median

Median is the second measure of central tendency. The median
divides the scores in the distribution into two equal parts. Fifty percent (50%) of the
scores lie below the median value and 50% lie above it. It is also known as the
middle score or the 50th percentile. For classroom purposes, the first thing to do is to
arrange the scores in proper order, that is, from the lowest score
to the highest score or from the highest score to the lowest score. When the number of
cases is odd, the median is the score that has the same number of scores below and
above it. When the number of cases is even, determine the average of the two middlemost
scores that have an equal number of scores below and above them.

Median of Ungrouped Data

1. Arrange the scores (from lowest to highest or highest to lowest).


2. Determine the middle most score in a distribution if n is an odd
number and get the average of the two middle most scores if n is an
even number.

Example 1: Find the median score of 7 students in an English class.

X
(score)
19
17
16
15
10
5
2

Analysis:

The median score is 15. Fifty percent (50%) or three of the scores are above 15
(19,17,16) and 50% or three of the scores are below 15 (10,5,2).

Example 2: Find the median score of 8 students in an English class.

X (score)
30
19
17
16
15
10
5
2

$$\tilde{X} = \frac{16 + 15}{2} = 15.5$$

Analysis:

The median score is 15.5, which means that 50% of the scores in the
distribution are lower than 15.5 (15, 10, 5, and 2) and 50% are greater than
15.5 (30, 19, 17, and 16). Four (4) scores are below 15.5 and four (4)
scores are above 15.5.

Median of Grouped Data

Formula:

$$\tilde{X} = \text{LB} + \left[\frac{\frac{n}{2} - \text{cfp}}{\text{fm}}\right] c.i$$

X̃ = median value

MC = median class, the category containing the (n/2)th score

LB = lower boundary of the median class (MC)

cfp = cumulative frequency before the median class if the scores are
arranged from lowest to highest value

fm = frequency of the median class

c.i = size of the class interval

Steps in Solving Median for Grouped Data

1. Complete the table for cf<.
2. Get n/2 of the scores in the distribution so that you can identify MC.
3. Determine LB, cfp, fm, and c.i.
4. Solve the median using the formula.

Example 3: The scores of 40 students in a 60-item science quiz are
tabulated below. The highest score is 54 and the lowest score is 10.

X F cf<
10 – 14 5 5
15 – 19 2 7
20 – 24 3 10
25 - 29 5 15
30 – 34 2 17 (cfp)
35 – 39 9 (fm) 26
40 - 44 6 32
45 – 49 3 35
50 - 54 5 40
n = 40

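Solution:

$$\frac{n}{2} = \frac{40}{2} = 20$$

The category containing the (n/2)th score is 35 – 39.

MC = 35 – 39
LL of the MC = 35
LB = 34.5
cfp = 17
fm = 9
c.i = 5

$$\begin{aligned}
\tilde{X} &= \text{LB} + \left[\frac{\frac{n}{2} - \text{cfp}}{\text{fm}}\right] c.i \\
&= 34.5 + \left[\frac{20 - 17}{9}\right] 5 \\
&= 34.5 + \left[\frac{3}{9}\right] 5 \\
&= 34.5 + \frac{15}{9} \\
&= 34.5 + 1.67 \\
\tilde{X} &= 36.17
\end{aligned}$$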

Analysis:

The median value is 36.17, which means that 50% or 20 scores are less
than 36.17.
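
The steps for the median of grouped data can be sketched in Python. The function name `grouped_median` and the (lower limit, upper limit, frequency) tuples are assumptions made for this illustration; whole-number scores are assumed, so the lower boundary is the lower limit minus 0.5.

```python
def grouped_median(classes):
    """Median of grouped data: LB + ((n/2 - cfp) / fm) * c.i.

    classes: list of (lower_limit, upper_limit, frequency) tuples,
    arranged from the lowest class to the highest class.
    """
    n = sum(f for _, _, f in classes)
    half = n / 2
    cfp = 0                                   # cumulative frequency before the class
    for ll, ul, fm in classes:
        if cfp + fm >= half:                  # this is the median class
            lb = ll - 0.5                     # lower boundary (whole-number scores)
            class_size = ul - ll + 1          # c.i
            return lb + (half - cfp) / fm * class_size
        cfp += fm
    raise ValueError("the frequency table is empty")

classes = [(10, 14, 5), (15, 19, 2), (20, 24, 3), (25, 29, 5), (30, 34, 2),
           (35, 39, 9), (40, 44, 6), (45, 49, 3), (50, 54, 5)]

print(round(grouped_median(classes), 2))
```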

Properties of the Median

1. It may not be an actual observation in the data set.


2. It can be applied in ordinal level.
3. It is not affected by extreme values because median is a positional measure.

When to Use the Median

1. The exact midpoint of the score distribution is desired.


2. There are extreme scores in the distribution.

3. Mode

Mode is the third measure of central tendency. The mode or the modal score is the
score or scores that occur most often in the distribution. A distribution is classified as
unimodal, bimodal, trimodal, or multimodal. Unimodal is a distribution of scores that
consists of only one mode. Bimodal is a distribution of scores that consists of two modes.
Trimodal is a distribution of scores that consists of three modes, and multimodal is a
distribution of scores that consists of more than two modes.

Example 1. Scores of 10 students of Section A, Section B, and Section C

Scores of Section A Scores of Section B Scores of Section C


25 25 25
24 24 25
24 24 25
20 20 22
20 18 21
20 18 21
16 17 21
12 10 18
10 9 18
7 7 18

The score that appeared most often in Section A is 20; hence, the mode of Section A is
20. There is only one mode; therefore, the score distribution is called unimodal. The
modes of Section B are 18 and 24, since both 18 and 24 appeared twice. There are two modes
in Section B; hence, the distribution is a bimodal distribution. The modes for Section
C are 18, 21, and 25. There are three modes for Section C; therefore, it is called a
trimodal or multimodal distribution.
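
Finding the mode or modes of ungrouped scores can be sketched in Python with the standard library's `collections.Counter`. The function name `modes` is chosen here only for illustration.

```python
from collections import Counter

def modes(scores):
    """Return every score that occurs with the highest frequency, in ascending order."""
    counts = Counter(scores)
    highest = max(counts.values())
    return sorted(score for score, count in counts.items() if count == highest)

section_a = [25, 24, 24, 20, 20, 20, 16, 12, 10, 7]
section_b = [25, 24, 24, 20, 18, 18, 17, 10, 9, 7]
section_c = [25, 25, 25, 22, 21, 21, 21, 18, 18, 18]

print(modes(section_a))   # one mode: unimodal
print(modes(section_b))   # two modes: bimodal
print(modes(section_c))   # three modes: trimodal
```

Note that when every score occurs only once, this sketch returns all the scores; in that case the distribution is usually described as having no mode.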

Mode for Grouped Data

In solving the mode value using grouped data, use the formula:

$$\hat{X} = \text{LB} + \left[\frac{d_1}{d_1 + d_2}\right] c.i$$

LB = lower boundary of the modal class

Modal Class (MC) = the category containing the highest frequency

d1 = difference between the frequency of the modal class and the
frequency above it, when the scores are arranged from lowest to
highest.

d2 = difference between the frequency of the modal class and the
frequency below it, when the scores are arranged from lowest to
highest.

c.i = size of the class interval

Example 2. The scores of 40 students in a 60-item science quiz
are tabulated below.

x f
10 – 14 5
15 – 19 2
20 – 24 3
25 – 29 5
30 – 34 2
35 – 39 9
40 – 44 6
45 – 49 3
50 – 54 5
n = 40

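Modal Class = 35 – 39
LL of MC = 35
LB = 34.5
d1 = 9 – 2 = 7
d2 = 9 – 6 = 3
c.i = 5

$$\begin{aligned}
\hat{X} &= \text{LB} + \left[\frac{d_1}{d_1 + d_2}\right] c.i \\
&= 34.5 + \left[\frac{7}{7 + 3}\right] 5 \\
&= 34.5 + \frac{35}{10} \\
&= 34.5 + 3.5 \\
\hat{X} &= 38
\end{aligned}$$

The mode of the score distribution of the 40 students is 38. This is an estimate
of the point within the modal class (35 – 39) where the scores are most concentrated.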
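
The grouped-data mode formula can be sketched in Python. The function name `grouped_mode` is hypothetical, and two choices are assumptions of this sketch rather than rules stated in the text: when several classes share the highest frequency, the first one is used, and a missing neighboring class is treated as having a frequency of 0.

```python
def grouped_mode(classes):
    """Mode of grouped data: LB + (d1 / (d1 + d2)) * c.i.

    classes: list of (lower_limit, upper_limit, frequency) tuples,
    arranged from the lowest class to the highest class.
    """
    freqs = [f for _, _, f in classes]
    i = freqs.index(max(freqs))                    # first class with the highest frequency
    ll, ul, f = classes[i]
    before = freqs[i - 1] if i > 0 else 0          # frequency of the preceding class
    after = freqs[i + 1] if i < len(freqs) - 1 else 0
    d1 = f - before
    d2 = f - after
    lb = ll - 0.5                                  # lower boundary (whole-number scores)
    return lb + d1 / (d1 + d2) * (ul - ll + 1)

classes = [(10, 14, 5), (15, 19, 2), (20, 24, 3), (25, 29, 5), (30, 34, 2),
           (35, 39, 9), (40, 44, 6), (45, 49, 3), (50, 54, 5)]

print(grouped_mode(classes))
```

If all classes have the same frequency, d1 + d2 is zero and the formula cannot be applied; the sketch does not handle that case.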

Properties of the Mode

1. It can be used when the data are qualitative as well as quantitative.


2. It may not be unique.
3. It is not affected by extreme values.
4. It may not exist.

When to Use the Mode

1. When the “typical” value is desired.


2. When the data set is measured on a nominal scale.
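4. Quantiles

A quantile is a score point that divides the scores in a distribution into
equal parts. There are three kinds of quantiles. A quartile is a score point that
divides the scores in the distribution into four (4) equal parts. A decile is a score point
that divides the scores in the distribution into ten (10) equal parts. A percentile is a
score point that divides the scores in the distribution into one hundred (100) equal parts.

[Figure: Diagram for Quartiles, Deciles, and Percentiles]

Quantiles for Ungrouped Data

$$Q_k = \left[\frac{k}{4}n + \left(1 - \frac{k}{4}\right)\right]^{\text{th}} \text{ score}$$

Where,
Qk is the indicated quartile
k = 1, 2, 3
n = number of cases

$$D_k = \left[\frac{k}{10}n + \left(1 - \frac{k}{10}\right)\right]^{\text{th}} \text{ score}$$

Where,
Dk is the indicated decile
k = 1, 2, 3, 4, 5, 6, 7, 8, 9
n = number of cases

$$P_k = \left[\frac{k}{100}n + \left(1 - \frac{k}{100}\right)\right]^{\text{th}} \text{ score}$$

Where,
Pk is the indicated percentile
k = 1, 2, 3, 4, 5, ..., 97, 98, 99
n = number of cases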

Example:

Using the given data 6, 8, 10, 12, 12, 14, 15, 16, 20. Find Q1, Q3, D6, D9, P65, P99.

x (score)
6
8
10
12
12
14
15
16
20

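1. Solve the value of Q1.
n = 9

$$\begin{aligned}
Q_1 &= \left[\frac{1}{4}n + \left(1 - \frac{1}{4}\right)\right]^{\text{th}} \text{ score} \\
&= \left[\frac{1}{4}(9) + \left(1 - \frac{1}{4}\right)\right]^{\text{th}} \text{ score} \\
&= \left[\frac{9}{4} + \frac{3}{4}\right]^{\text{th}} \text{ score} \\
&= \left[\frac{12}{4}\right]^{\text{th}} \text{ score} \\
Q_1 &= 3^{\text{rd}} \text{ score}
\end{aligned}$$

The value of Q1 is 10, which is the 3rd score in the distribution. Therefore,
25% of the scores are below 10.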


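2. Solve the value of Q3.

$$\begin{aligned}
Q_3 &= \left[\frac{3}{4}n + \left(1 - \frac{3}{4}\right)\right]^{\text{th}} \text{ score} \\
&= \left[\frac{3}{4}(9) + \left(1 - \frac{3}{4}\right)\right]^{\text{th}} \text{ score} \\
&= \left[\frac{27}{4} + \frac{1}{4}\right]^{\text{th}} \text{ score} \\
&= \left[\frac{28}{4}\right]^{\text{th}} \text{ score} \\
Q_3 &= 7^{\text{th}} \text{ score}
\end{aligned}$$

The 7th score in the distribution is 15.

Q3 = 15

Hence, 75% of the scores in the distribution are less than 15.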

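3. Solve the value of D6.

$$\begin{aligned}
D_6 &= \left[\frac{6}{10}n + \left(1 - \frac{6}{10}\right)\right]^{\text{th}} \text{ score} \\
&= \left[\frac{6}{10}(9) + \left(1 - \frac{6}{10}\right)\right]^{\text{th}} \text{ score} \\
&= \left[\frac{54}{10} + \frac{4}{10}\right]^{\text{th}} \text{ score} \\
&= \left[\frac{58}{10}\right]^{\text{th}} \text{ score} \\
D_6 &= 5.8^{\text{th}} \text{ score}
\end{aligned}$$

The value of D6 lies between the 5th and 6th scores. It is the sum of the 5th score
and 80% of the difference between the 6th and 5th scores.

D6 = 5th score + 0.80 (6th score – 5th score)
   = 12 + 0.80 (14 – 12)
   = 12 + 0.80 (2)
   = 12 + 1.60
D6 = 13.60

Therefore, 60% of the scores in the distribution are less than 13.60.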
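4. Solve the value of D9.

$$\begin{aligned}
D_9 &= \left[\frac{9}{10}n + \left(1 - \frac{9}{10}\right)\right]^{\text{th}} \text{ score} \\
&= \left[\frac{9}{10}(9) + \left(1 - \frac{9}{10}\right)\right]^{\text{th}} \text{ score} \\
&= \left[\frac{81}{10} + \frac{1}{10}\right]^{\text{th}} \text{ score} \\
&= \left[\frac{82}{10}\right]^{\text{th}} \text{ score} \\
D_9 &= 8.2^{\text{th}} \text{ score}
\end{aligned}$$

The value of D9 lies between the 8th and 9th scores. It is the sum of the 8th
score and 20% of the difference between the 9th and 8th scores.

D9 = 8th score + 0.20 (9th score – 8th score)
D9 = 16 + 0.20 (20 – 16)
D9 = 16 + 0.80
D9 = 16.80

Therefore, 90% of the scores in the distribution are less than 16.80.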

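5. Solve the value of P65.

$$\begin{aligned}
P_{65} &= \left[\frac{65}{100}n + \left(1 - \frac{65}{100}\right)\right]^{\text{th}} \text{ score} \\
&= \left[\frac{65}{100}(9) + \left(1 - \frac{65}{100}\right)\right]^{\text{th}} \text{ score} \\
&= \left[\frac{585}{100} + \frac{35}{100}\right]^{\text{th}} \text{ score} \\
&= \left[\frac{620}{100}\right]^{\text{th}} \text{ score} \\
P_{65} &= 6.2^{\text{th}} \text{ score}
\end{aligned}$$

Therefore, P65 lies between the 6th and 7th scores. The value of P65 is the sum of
the 6th score and 20% of the difference between the 7th and 6th scores.

P65 = 6th score + 0.20 (7th score – 6th score)
    = 14 + 0.20 (15 – 14)
    = 14 + 0.20 (1)
    = 14 + 0.20
P65 = 14.20

Therefore, 65% of the scores in the distribution are less than 14.20.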
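6. Solve the value of P99.

$$\begin{aligned}
P_{99} &= \left[\frac{99}{100}n + \left(1 - \frac{99}{100}\right)\right]^{\text{th}} \text{ score} \\
&= \left[\frac{99}{100}(9) + \left(1 - \frac{99}{100}\right)\right]^{\text{th}} \text{ score} \\
&= \left[\frac{891}{100} + \frac{1}{100}\right]^{\text{th}} \text{ score} \\
&= \left[\frac{892}{100}\right]^{\text{th}} \text{ score} \\
P_{99} &= 8.92^{\text{th}} \text{ score}
\end{aligned}$$

P99 lies between the 8th and 9th scores. The value of P99 is the sum of the 8th
score and 92% of the difference between the 9th and 8th scores.

P99 = 8th score + 0.92 (9th score – 8th score)
P99 = 16 + 0.92 (20 – 16)
P99 = 16 + 3.68
P99 = 19.68

Hence, 99% of the scores in the distribution are less than 19.68.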
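
The six computations above follow one pattern, which can be sketched in Python. The function name `quantile` and its parameters are hypothetical: `k` is the indicated quantile and `parts` is 4 for quartiles, 10 for deciles, and 100 for percentiles. The position formula is rewritten as (k(n – 1) + parts) / parts, which is algebraically the same as the one used in the text, so that the whole and fractional parts can be taken without rounding error.

```python
def quantile(scores, k, parts):
    """Quantile of ungrouped data using the position
    (k / parts) * n + (1 - k / parts), with linear interpolation
    between neighboring scores."""
    ordered = sorted(scores)
    n = len(ordered)
    # Same position as in the text, kept as an exact integer division:
    whole, remainder = divmod(k * (n - 1) + parts, parts)
    lower = ordered[whole - 1]              # positions are counted from 1
    if remainder == 0 or whole >= n:
        return lower
    upper = ordered[whole]
    return lower + (remainder / parts) * (upper - lower)

scores = [6, 8, 10, 12, 12, 14, 15, 16, 20]

print(quantile(scores, 1, 4))     # Q1
print(quantile(scores, 3, 4))     # Q3
print(quantile(scores, 6, 10))    # D6
print(quantile(scores, 9, 10))    # D9
print(quantile(scores, 65, 100))  # P65
print(quantile(scores, 99, 100))  # P99
```

Other textbooks and software packages use slightly different position formulas, so their results may not match these values exactly.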

Quantiles for Grouped Data

a. Quartile for Grouped Data

The general formula for quartile is

$$Q_k = \text{LB} + \left[\frac{\frac{kn}{4} - \text{cfp}}{\text{fq}}\right] c.i$$

where
Qk = the indicated quartile
k = 1, 2, and 3
LB = lower boundary of the quartile class
cfp = cumulative frequency before the quartile class when scores are arranged
from lowest to highest
fq = frequency of the quartile class
c.i = size of the class interval
QC = quartile class, the class or category containing n/4 for Q1, 2n/4 for Q2,
and 3n/4 for Q3.

$$Q_1 = \text{LB}_1 + \left[\frac{\frac{n}{4} - \text{cfp}_1}{\text{fq}_1}\right] c.i$$

$$Q_2 = \text{LB}_2 + \left[\frac{\frac{2n}{4} - \text{cfp}_2}{\text{fq}_2}\right] c.i$$

$$Q_3 = \text{LB}_3 + \left[\frac{\frac{3n}{4} - \text{cfp}_3}{\text{fq}_3}\right] c.i$$

Sample Computations of Quartile Using Grouped Data

Example 1: The data for the scores of fifty (50) students in Filipino class are
given below. Solve for the value of Q1.

x f cf<
25 – 32 3 3
33 – 40 7 10
41 – 48 5 15
49 – 56 4 19
57 – 64 12 31
65 – 72 6 37
73 –80 8 45
81 – 88 3 48
89 – 97 2 50
n = 50
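Solution:

$$\frac{n}{4} = \frac{50}{4} = 12.5$$

Q1C = 41 – 48
LL = 41
LB = 40.5
cfp = 10
fq = 5
c.i = 8

$$\begin{aligned}
Q_1 &= \text{LB}_1 + \left[\frac{\frac{n}{4} - \text{cfp}_1}{\text{fq}_1}\right] c.i \\
&= 40.5 + \left[\frac{12.5 - 10}{5}\right] 8 \\
&= 40.5 + \left[\frac{2.5}{5}\right] 8 \\
&= 40.5 + \frac{20}{5} \\
&= 40.5 + 4 \\
Q_1 &= 44.50
\end{aligned}$$

Therefore, 25% of the scores of 50 students who participated in the test are less than
44.50.
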
Example 2: The data for the scores of fifty (50) students in Filipino class are
given below. Solve for the value of Q3.
x f cf<
25 – 32 3 3
33 – 40 7 10
41 – 48 5 15
49 – 56 4 19
57 – 64 12 31
65 – 72 6 37
73 – 80 8 45
81 – 88 3 48
89 – 97 2 50
n = 50
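Solution:

$$\frac{3n}{4} = \frac{3(50)}{4} = 37.5$$

Q3C = 73 – 80
LL = 73
LB = 72.5
cfp = 37
fq = 8
c.i = 8

$$\begin{aligned}
Q_3 &= \text{LB}_3 + \left[\frac{\frac{3n}{4} - \text{cfp}_3}{\text{fq}_3}\right] c.i \\
&= 72.5 + \left[\frac{37.5 - 37}{8}\right] 8 \\
&= 72.5 + \left[\frac{0.5}{8}\right] 8 \\
&= 72.5 + \frac{4}{8} \\
&= 72.5 + 0.5 \\
Q_3 &= 73.00
\end{aligned}$$

Therefore, 75% of the scores in the distribution are less than 73.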

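b. Decile for Grouped Data

The general formula for deciles of grouped data is

$$D_k = \text{LB} + \left[\frac{\frac{kn}{10} - \text{cfp}}{\text{fd}}\right] c.i$$

where:
Dk = indicated decile
k = 1, 2, 3, 4, 5, 6, 7, 8, 9
LB = lower boundary of the indicated decile class
DC = decile class, the class or category containing n/10 for D1, 2n/10 for D2,
3n/10 for D3, ..., 9n/10 for D9
cfp = cumulative frequency before the indicated decile class when scores are
arranged from lowest to highest
fd = frequency of the indicated decile class
c.i = size of class interval

$$D_1 = \text{LB} + \left[\frac{\frac{n}{10} - \text{cfp}_1}{\text{fd}_1}\right] c.i$$

$$D_2 = \text{LB} + \left[\frac{\frac{2n}{10} - \text{cfp}_2}{\text{fd}_2}\right] c.i$$

$$D_3 = \text{LB} + \left[\frac{\frac{3n}{10} - \text{cfp}_3}{\text{fd}_3}\right] c.i$$

$$D_4 = \text{LB} + \left[\frac{\frac{4n}{10} - \text{cfp}_4}{\text{fd}_4}\right] c.i$$

$$D_5 = \text{LB} + \left[\frac{\frac{5n}{10} - \text{cfp}_5}{\text{fd}_5}\right] c.i$$

$$D_6 = \text{LB} + \left[\frac{\frac{6n}{10} - \text{cfp}_6}{\text{fd}_6}\right] c.i$$

$$D_7 = \text{LB} + \left[\frac{\frac{7n}{10} - \text{cfp}_7}{\text{fd}_7}\right] c.i$$

$$D_8 = \text{LB} + \left[\frac{\frac{8n}{10} - \text{cfp}_8}{\text{fd}_8}\right] c.i$$

$$D_9 = \text{LB} + \left[\frac{\frac{9n}{10} - \text{cfp}_9}{\text{fd}_9}\right] c.i$$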

Sample Computation of Decile Using Grouped Data

Example 3: The data for the scores of fifty (50) students in Filipino class are
given below. Solve the value of D5.

x f cf<
25 – 32 3 3
33 – 40 7 10
41 – 48 5 15
49 – 56 4 19
57 – 64 12 31
65 – 72 6 37
73 – 80 8 45
81 – 88 3 48
89 – 97 2 50
n = 50
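Solution:

$$\frac{5n}{10} = \frac{5(50)}{10} = \frac{250}{10} = 25$$

D5C = 57 – 64
LL = 57
LB = 56.5
cfp = 19
fd = 12
c.i = 8

$$\begin{aligned}
D_5 &= \text{LB} + \left[\frac{\frac{5n}{10} - \text{cfp}_5}{\text{fd}_5}\right] c.i \\
&= 56.5 + \left[\frac{25 - 19}{12}\right] 8 \\
&= 56.5 + \left[\frac{6}{12}\right] 8 \\
&= 56.5 + \frac{48}{12} \\
&= 56.5 + 4 \\
D_5 &= 60.5
\end{aligned}$$

Hence, 50% of the scores of 50 students are less than 60.5.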

Example 4: The data for the scores of fifty (50) students in Filipino class are
given below. Solve for the value of D7.

x f cf<
25 – 32 3 3
33 – 40 7 10
41 – 48 5 15
49 – 56 4 19
57 – 64 12 31
65 – 72 6 37
73 – 80 8 45
81 – 88 3 48
89 – 97 2 50
n = 50
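Solution:

$$\frac{7n}{10} = \frac{7(50)}{10} = \frac{350}{10} = 35$$

D7C = 65 – 72
LL = 65
LB = 64.5
cfp = 31
fd = 6
c.i = 8

$$\begin{aligned}
D_7 &= \text{LB} + \left[\frac{\frac{7n}{10} - \text{cfp}_7}{\text{fd}_7}\right] c.i \\
&= 64.5 + \left[\frac{35 - 31}{6}\right] 8 \\
&= 64.5 + \left[\frac{4}{6}\right] 8 \\
&= 64.5 + \frac{32}{6} \\
&= 64.5 + 5.33 \\
D_7 &= 69.83
\end{aligned}$$

Therefore, 70% of the scores of students are less than 69.83.
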
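c. Percentiles of Grouped Data

The general formula for percentile using grouped data is

$$P_k = \text{LB} + \left[\frac{\frac{kn}{100} - \text{cfp}}{\text{fd}}\right] c.i$$

Where,

Pk = the indicated percentile
k = 1, 2, 3, 4, ..., 97, 98, 99
LB = lower boundary of the indicated percentile class
PC = percentile class, the class containing n/100 for P1, 2n/100 for P2,
3n/100 for P3, ..., 98n/100 for P98, and 99n/100 for P99
cfp = cumulative frequency before the indicated percentile class when scores are
arranged from lowest to highest
fd = frequency of the indicated percentile class
c.i = size of class interval

To derive the formula for solving the indicated percentile, just change the value of
k to the indicated percentile. There are 99 formulas for solving the percentile. Some of
the formulas for percentile are the following.

$$P_1 = \text{LB} + \left[\frac{\frac{(1)n}{100} - \text{cfp}}{\text{fd}}\right] c.i$$

$$P_2 = \text{LB} + \left[\frac{\frac{(2)n}{100} - \text{cfp}}{\text{fd}}\right] c.i$$

$$P_3 = \text{LB} + \left[\frac{\frac{(3)n}{100} - \text{cfp}}{\text{fd}}\right] c.i$$

$$P_4 = \text{LB} + \left[\frac{\frac{(4)n}{100} - \text{cfp}}{\text{fd}}\right] c.i$$

$$P_{10} = \text{LB} + \left[\frac{\frac{(10)n}{100} - \text{cfp}}{\text{fd}}\right] c.i$$

$$P_{20} = \text{LB} + \left[\frac{\frac{(20)n}{100} - \text{cfp}}{\text{fd}}\right] c.i$$

$$P_{25} = \text{LB} + \left[\frac{\frac{(25)n}{100} - \text{cfp}}{\text{fd}}\right] c.i$$

$$P_{50} = \text{LB} + \left[\frac{\frac{(50)n}{100} - \text{cfp}}{\text{fd}}\right] c.i$$

$$P_{75} = \text{LB} + \left[\frac{\frac{(75)n}{100} - \text{cfp}}{\text{fd}}\right] c.i$$

$$P_{90} = \text{LB} + \left[\frac{\frac{(90)n}{100} - \text{cfp}}{\text{fd}}\right] c.i$$

$$P_{95} = \text{LB} + \left[\frac{\frac{(95)n}{100} - \text{cfp}}{\text{fd}}\right] c.i$$

$$P_{99} = \text{LB} + \left[\frac{\frac{(99)n}{100} - \text{cfp}}{\text{fd}}\right] c.i$$
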
Sample Computations of Percentile Using Grouped Data
Example 5: The data for the scores of fifty (50) students in Filipino class are
given below. Solve the value of P82.
x f cf<
25 – 32 3 3
33 – 40 7 10
41 – 48 5 15
49 – 56 4 19
57 – 64 12 31
65 – 72 6 37
73 – 80 8 45
81 – 88 3 48
89 – 97 2 50
n = 50

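Solution:

$$\frac{82n}{100} = \frac{82(50)}{100} = \frac{4100}{100} = 41$$

P82C = 73 – 80
LL = 73
LB = 72.5
cfp = 37
fd = 8
c.i = 8

$$\begin{aligned}
P_{82} &= \text{LB} + \left[\frac{\frac{(82)n}{100} - \text{cfp}}{\text{fd}}\right] c.i \\
&= 72.5 + \left[\frac{41 - 37}{8}\right] 8 \\
&= 72.5 + \left[\frac{4}{8}\right] 8 \\
&= 72.5 + \frac{32}{8} \\
&= 72.5 + 4 \\
P_{82} &= 76.5
\end{aligned}$$

Therefore, 82% of the scores of 50 students are less than 76.5.
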
Example 6: The data for the scores of fifty (50) students in Filipino class are
given below. Solve the value of P91.

x f cf<
25 – 32 3 3
33 – 40 7 10
41 – 48 5 15
49 – 56 4 19
57 – 64 12 31
65 – 72 6 37
73 – 80 8 45
81 – 88 3 48
89 – 97 2 50
n = 50
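Solution:

$$\frac{91n}{100} = \frac{91(50)}{100} = \frac{4550}{100} = 45.50$$

P91C = 81 – 88
LL = 81
LB = 80.50
cfp = 45
fd = 3
c.i = 8

$$\begin{aligned}
P_{91} &= \text{LB} + \left[\frac{\frac{(91)n}{100} - \text{cfp}}{\text{fd}}\right] c.i \\
&= 80.50 + \left[\frac{45.50 - 45}{3}\right] 8 \\
&= 80.50 + \left[\frac{0.50}{3}\right] 8 \\
&= 80.50 + \frac{4}{3} \\
&= 80.50 + 1.33 \\
P_{91} &= 81.83
\end{aligned}$$

Hence, 91% of the scores of 50 students are less than 81.83.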
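
Since the quartile, decile, and percentile formulas for grouped data differ only in the divisor (4, 10, or 100), they can be sketched as a single Python function. The name `grouped_quantile` and its parameters are hypothetical, and whole-number scores are assumed so that the lower boundary is the lower limit minus 0.5. The class size is taken from each class's own limits.

```python
def grouped_quantile(classes, k, parts):
    """Quantile of grouped data: LB + ((k*n/parts - cfp) / f) * c.i.

    classes: list of (lower_limit, upper_limit, frequency) tuples,
             arranged from the lowest class to the highest class.
    parts:   4 for quartiles, 10 for deciles, 100 for percentiles.
    """
    n = sum(f for _, _, f in classes)
    target = k * n / parts
    cfp = 0                                   # cumulative frequency before the class
    for ll, ul, f in classes:
        if cfp + f >= target:                 # this is the quantile class
            lb = ll - 0.5                     # lower boundary (whole-number scores)
            return lb + (target - cfp) / f * (ul - ll + 1)
        cfp += f
    raise ValueError("the frequency table is empty")

# Scores of fifty (50) students in Filipino class
classes = [(25, 32, 3), (33, 40, 7), (41, 48, 5), (49, 56, 4), (57, 64, 12),
           (65, 72, 6), (73, 80, 8), (81, 88, 3), (89, 97, 2)]

print(grouped_quantile(classes, 1, 4))               # Q1
print(grouped_quantile(classes, 3, 4))               # Q3
print(grouped_quantile(classes, 5, 10))              # D5
print(round(grouped_quantile(classes, 7, 10), 2))    # D7
print(grouped_quantile(classes, 82, 100))            # P82
print(round(grouped_quantile(classes, 91, 100), 2))  # P91
```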

Measures of Variation

In the previous section, we discussed measures of central tendency to describe the
distribution of scores. However, a measure of central tendency does not uniquely describe
a distribution, especially if we want to know how close or how far the
scores of students in a certain test are from the average performance of the group. It is
for this reason that we make use of measures of variation. A measure of variation is a
single value that is used to describe the spread of the scores in a distribution. Variation
is also known as variability or dispersion. There are two ways of describing the variation
of scores: absolute measures of variation and relative measures of variation.

Let us consider the scores of students in three sections of a mathematics class. We
shall consider the spread of scores based on graphical presentation.

Section A Section B Section C


12 12 12
12 12 12
14 12 12
15 13 12
17 13 12
18 14 12
18 17 13
18 20 26
19 20 26
23 28 26
23 28 26
30 30 30
x̅ = 18.25 x̅ = 18.25 x̅ = 18.25
S = 5.15 S = 6.92 S = 7.63
What can you observe about the mean and the standard deviation of the three
groups of scores?

Which group of students performed well in the class?


Which group of scores is most widespread? Less scattered?

Before answering such questions, let us first discuss the different types of
measures of variation.

Types of Absolute Measures of Variation

There are four kinds of absolute measures of variation: the range, the inter-quartile
range and quartile deviation, the mean deviation, and the variance and standard deviation.

1. Range
Range (R) is the difference between the highest score and the lowest score in a
distribution. Range is the simplest and the crudest measure of variation because it
considers only the highest score and the lowest score.
a. Range for Ungrouped Data
R = HS – LS

Where,
R = range value

HS = Highest score

LS = Lowest score

Example 1: Find the range of the two groups of score distribution.

Group A Group B
10(LS) 15(LS)
12 16
15 16
17 17
25 17
26 23
28 25
30 26
35(HS) 30(HS)
RA = HS – LS
RA = 35 – 10
RA = 25
RB = HS – LS
RB = 30 - 15
RB = 15
Analysis:

The range of Group A = 25 is greater than the range of Group B = 15. The
implication of this is that scores in group A are more spread out than the scores in
group B or the scores in Group B are less scattered than the scores in group A.

b. Range for Grouped Data


R = HSUB – LSLB

Where,

R = range value

HSUB = upper boundary of the highest score

LSLB = lower boundary of the lowest score

Example 2: Find the value of the range of the scores of 50 students in a Mathematics
achievement test.
X F
25 – 32 3
33 – 40 7
41 – 48 5
49 – 56 4
57 – 64 12
65 – 72 6
73 – 80 8
81 – 88 3
89 – 97 2
n = 50
LL of the LS = 25
LSLB = 24.5
UL of the HS = 97
HSUB = 97.5
R = HSUB – LSLB
R = 97.5 – 24.5
R = 73
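
Both range formulas can be sketched in Python. The function names `range_ungrouped` and `range_grouped` are hypothetical, and whole-number scores are assumed, so the boundaries lie 0.5 beyond the class limits.

```python
def range_ungrouped(scores):
    """R = HS - LS."""
    return max(scores) - min(scores)

def range_grouped(classes):
    """R = HSUB - LSLB, using boundaries 0.5 beyond the class limits.

    classes: list of (lower_limit, upper_limit, frequency) tuples.
    """
    lslb = min(ll for ll, _, _ in classes) - 0.5   # lower boundary of the lowest class
    hsub = max(ul for _, ul, _ in classes) + 0.5   # upper boundary of the highest class
    return hsub - lslb

group_a = [10, 12, 15, 17, 25, 26, 28, 30, 35]
group_b = [15, 16, 16, 17, 17, 23, 25, 26, 30]
classes = [(25, 32, 3), (33, 40, 7), (41, 48, 5), (49, 56, 4), (57, 64, 12),
           (65, 72, 6), (73, 80, 8), (81, 88, 3), (89, 97, 2)]

print(range_ungrouped(group_a), range_ungrouped(group_b))
print(range_grouped(classes))
```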

Properties of Range

1. It is quick and easy to understand.


2. It is a rough estimation of variation.
3. It is easily affected by the extreme scores.

Interpretation of Range Value

When the range value is large, the scores in the distribution are more dispersed,
widespread or heterogeneous. On the other hand, when the range value is small the
scores in the distribution are less dispersed, less scattered, or homogeneous.
2. Inter-quartile Range (IQR) and Quartile Deviation (QD)
Inter-quartile range is the difference between the third quartile and the first
quartile.
IQR = Q3 – Q1

Properties of Inter-quartile Range

1. Reduces the influence of extreme values.


2. Not as easy to calculate as the range.
3. Only consider the middle 50% of the scores in the distribution.
4. The point of dispersion is the median value.

Quartile deviation indicates the distance we need to go above and below the
median to include the middle 50% of the scores. It is based on the range of the middle
50% of the scores instead of the entire set.

The formula for computing the value of the quartile deviation is

$$QD = \frac{Q_3 - Q_1}{2}$$

where QD is the quartile deviation value, Q1 is the value of the first quartile, and Q3
is the value of the third quartile.

Steps in Solving Quartile Deviation

1. Solve for the value of Q1.
2. Solve for the value of Q3.
3. Solve for the value of QD using the formula $QD = \frac{Q_3 - Q_1}{2}$.

a. Quartile Deviation of Ungrouped Data

$$QD = \frac{Q_3 - Q_1}{2}$$

Example 1: Using the given data 6, 8, 10, 12, 12, 14, 15, 16, 20, find the quartile
deviation.
X(score)
6
8
10
12
12
14
15
16
20
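Solve for Q1.
n = 9

$$\begin{aligned}
Q_1 &= \left[\frac{1}{4}n + \left(1 - \frac{1}{4}\right)\right]^{\text{th}} \text{ score} \\
&= \left[\frac{1}{4}(9) + \left(1 - \frac{1}{4}\right)\right]^{\text{th}} \text{ score} \\
&= \left[\frac{9}{4} + \frac{3}{4}\right]^{\text{th}} \text{ score} \\
&= \left[\frac{12}{4}\right]^{\text{th}} \text{ score} \\
Q_1 &= 3^{\text{rd}} \text{ score} \\
Q_1 &= 10
\end{aligned}$$

Solve for Q3.
n = 9

$$\begin{aligned}
Q_3 &= \left[\frac{3}{4}n + \left(1 - \frac{3}{4}\right)\right]^{\text{th}} \text{ score} \\
&= \left[\frac{3}{4}(9) + \left(1 - \frac{3}{4}\right)\right]^{\text{th}} \text{ score} \\
&= \left[\frac{27}{4} + \frac{1}{4}\right]^{\text{th}} \text{ score} \\
&= \left[\frac{28}{4}\right]^{\text{th}} \text{ score} \\
Q_3 &= 7^{\text{th}} \text{ score} \\
Q_3 &= 15
\end{aligned}$$

IQR = Q3 – Q1
    = 15 – 10
IQR = 5

$$QD = \frac{Q_3 - Q_1}{2} = \frac{15 - 10}{2} = \frac{5}{2} = 2.5$$

Analysis:

The quartile deviation is 2.5, so going about 2.5 points above and below the
median covers the middle 50% of the scores.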

b. Quartile Deviation of Grouped Data

$$QD = \frac{Q_3 - Q_1}{2}$$

Example 2: The data given below are the scores of fifty (50) students in a Filipino
class. Solve for the value of the quartile deviation (QD).

X          F     cf<
25 – 32    3     3
33 – 40    7     10
41 – 48    5     15
49 – 56    4     19
57 – 64    12    31
65 – 72    6     37
73 – 80    8     45
81 – 88    3     48
89 – 97    2     50
n = 50
Solve for the value of Q1.

n/4 = 50/4 = 12.5
Q1C = 41 – 48
LL = 41
LB = 40.5
cfp = 10
fq = 5
c.i = 8

$$Q_1 = LB + \left[\frac{\frac{n}{4} - cfp}{fq}\right](c.i)$$
$$Q_1 = 40.5 + \left[\frac{12.5 - 10}{5}\right](8)$$
$$Q_1 = 40.5 + \left[\frac{2.5}{5}\right](8)$$
$$Q_1 = 40.5 + \frac{20}{5}$$
$$Q_1 = 40.5 + 4$$
$$Q_1 = 44.5$$

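Solve for the value of Q3.

3n/4 = 3(50)/4 = 37.5
Q3C = 73 – 80
LL = 73
LB = 72.5
cfp = 37
fq = 8
c.i = 8

$$Q_3 = LB + \left[\frac{\frac{3n}{4} - cfp}{fq}\right](c.i)$$
$$Q_3 = 72.5 + \left[\frac{37.5 - 37}{8}\right](8)$$
$$Q_3 = 72.5 + \left[\frac{0.5}{8}\right](8)$$
$$Q_3 = 72.5 + \frac{4}{8}$$
$$Q_3 = 72.5 + 0.5$$
$$Q_3 = 73$$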

Solve for the value of QD.

$$QD = \frac{Q_3 - Q_1}{2}$$
$$QD = \frac{73 - 44.5}{2}$$
$$QD = \frac{28.5}{2}$$
$$QD = 14.25$$
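The grouped-data computation can be sketched in Python as follows. The function name `grouped_quartile` and the tuple layout of `classes` are assumptions made for illustration. The class size of 8 is the chapter's c.i value, and the lower boundary is taken as the lower limit minus 0.5, as in the worked solution.

```python
# Each class is (lower limit, upper limit, frequency), taken from the Example 2 table.
classes = [(25, 32, 3), (33, 40, 7), (41, 48, 5), (49, 56, 4), (57, 64, 12),
           (65, 72, 6), (73, 80, 8), (81, 88, 3), (89, 97, 2)]


def grouped_quartile(classes, k, class_size):
    """Q_k = LB + ((k*n/4 - cfp) / fq) * c.i for a frequency table."""
    n = sum(f for _, _, f in classes)
    target = k * n / 4
    cf_before = 0  # cumulative frequency before the quartile class (cfp)
    for lower_limit, _, f in classes:
        if cf_before + f >= target:
            lower_boundary = lower_limit - 0.5
            return lower_boundary + (target - cf_before) / f * class_size
        cf_before += f
    raise ValueError("quartile class not found")


q1 = grouped_quartile(classes, 1, class_size=8)
q3 = grouped_quartile(classes, 3, class_size=8)
qd = (q3 - q1) / 2

print(q1, q3, qd)  # 44.5 73.0 14.25
```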
Interpretation of IQR and QD

The larger the value of the IQR or QD, the more dispersed the scores at the
middle 50% of the distribution. On the other hand, if the IQR or QD is small, the scores
are less dispersed at the middle 50% of the distribution. The point of dispersion is the
median value.

Analysis for Inter-quartile Range and Quartile Deviation

When the value of IQR and QD is small, the scores are clustered within the middle
50% of the score distribution. On the other hand, the scores are dispersed in the
middle 50% of the distribution when the value of IQR and QD is large. To determine which
distribution is more clustered or more dispersed, you should compare it with
another distribution, since there is no standard for what counts as a small or large value
of IQR and QD.

3. Mean Deviation (MD)

Mean deviation measures the average deviation of the values from the arithmetic
mean. It gives equal weight to the deviation of every score in the distribution.

a. Mean Deviation for Ungrouped Data

$$MD = \frac{\sum |x - \bar{x}|}{n}$$

Where,
MD = mean deviation value
x = individual score
x̄ = sample mean
n = number of cases

Steps in Solving Mean Deviation for Ungrouped Data

1. Solve for the mean value.
2. Subtract the mean value from each score.
3. Take the absolute value of the difference in step 2.
4. Solve for the mean deviation using the formula $MD = \frac{\sum |x - \bar{x}|}{n}$.

Example 1: Find the mean deviation of the scores of 10 students in a mathematics
test. The scores are: 35, 30, 26, 24, 20, 18, 18, 16, 15, 10.

x      x − x̄     |x − x̄|
35     13.8      13.8
30     8.8       8.8
26     4.8       4.8
24     2.8       2.8
20     -1.2      1.2
18     -3.2      3.2
18     -3.2      3.2
16     -5.2      5.2
15     -6.2      6.2
10     -11.2     11.2
Ʃx = 212         Ʃ|x − x̄| = 60.4

$$\bar{x} = \frac{\sum x}{n}$$
$$\bar{x} = \frac{212}{10}$$
$$\bar{x} = 21.2$$

$$MD = \frac{\sum |x - \bar{x}|}{n}$$
$$MD = \frac{60.4}{10}$$
$$MD = 6.04$$

Analysis:
The mean deviation of the scores of the 10 students is 6.04. This means that, on the
average, the scores deviate from the mean of 21.2 by 6.04 points.
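The four steps can be reproduced with a few lines of Python. This is only a sketch of the arithmetic; the variable names are illustrative.

```python
scores = [35, 30, 26, 24, 20, 18, 18, 16, 15, 10]
n = len(scores)

mean = sum(scores) / n                              # step 1: 212 / 10
abs_deviations = [abs(x - mean) for x in scores]    # steps 2 and 3
md = sum(abs_deviations) / n                        # step 4

print(f"mean = {mean:.1f}, MD = {md:.2f}")  # mean = 21.2, MD = 6.04
```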

b. Mean Deviation for Grouped Data

$$MD = \frac{\sum f\,|X_m - \bar{x}|}{n}$$

Where,
MD = mean deviation value
f = class frequency
Xm = class mark or midpoint of each category
x̄ = mean value
n = number of cases

Steps in solving Mean Deviation for Grouped Data

1. Solve for the value of the mean.


2. Subtract the mean value from each midpoint or class mark.
3. Take the absolute value of each difference.
4. Multiply the absolute value and the corresponding class frequency.
5. Find the sum of the results in step 4.
6. Solve for the mean deviation using the formula for grouped data.

Example 2: Find the mean deviation of the given scores below.

X          f     Xm     fXm     Xm − x̄     |Xm − x̄|     f|Xm − x̄|
10 – 14    5     12     60      -21.63     21.63        108.15
15 – 19    2     17     34      -16.63     16.63        33.26
20 – 24    3     22     66      -11.63     11.63        34.89
25 – 29    5     27     135     -6.63      6.63         33.15
30 – 34    2     32     64      -1.63      1.63         3.26
35 – 39    9     37     333     3.37       3.37         30.33
40 – 44    6     42     252     8.37       8.37         50.22
45 – 49    3     47     141     13.37      13.37        40.11
50 – 54    5     52     260     18.37      18.37        91.85
n = 40     ƩfXm = 1345          Ʃf|Xm − x̄| = 425.22

$$\bar{x} = \frac{\sum fX_m}{n}$$
$$\bar{x} = \frac{1345}{40}$$
$$\bar{x} = 33.63$$

$$MD = \frac{\sum f\,|X_m - \bar{x}|}{n}$$
$$MD = \frac{425.22}{40}$$
$$MD = 10.63$$

Analysis:

The mean deviation of the scores of the 40 students is 10.63. This means that, on the
average, the scores deviate from the mean of 33.63 by 10.63 points.
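For grouped data, the same idea can be sketched in Python using the class marks and frequencies from the table. The list name `table` is illustrative. The code keeps the unrounded mean of 33.625, while the chapter rounds it to 33.63 before subtracting, so the intermediate sums differ slightly but the final answer agrees to two decimal places.

```python
# (class mark Xm, frequency f) pairs from the Example 2 table
table = [(12, 5), (17, 2), (22, 3), (27, 5), (32, 2),
         (37, 9), (42, 6), (47, 3), (52, 5)]

n = sum(f for _, f in table)                              # 40
mean = sum(f * xm for xm, f in table) / n                 # 1345 / 40 = 33.625
weighted_abs = sum(f * abs(xm - mean) for xm, f in table)
md = weighted_abs / n

print(f"MD = {md:.2f}")  # MD = 10.63
```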

4. Variance and Standard Deviation

Variance is one of the most important measures of variation. It shows variation
about the mean.

Population Variance

$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$

Sample Variance

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

Steps in Solving Variance of Ungrouped Data

1. Solve for the mean value.


2. Subtract the mean value from each score.
3. Square the difference between the mean and each score.
4. Find the sum of the results in step 3.
5. Solve for the population variance or sample variance using the
formula of ungrouped data.
Example 1: Using the data below, find the variance and standard deviation of the
scores of 10 students in a science quiz. Interpret the result.

x      x − x̄     (x − x̄)²
19     4.4       19.36
17     2.4       5.76
16     1.4       1.96
16     1.4       1.96
15     0.4       0.16
14     -0.6      0.36
14     -0.6      0.36
13     -1.6      2.56
12     -2.6      6.76
10     -4.6      21.16
Ʃx = 146         Ʃ(x − x̄)² = 60.40
x̄ = 14.6
a. Variance of Ungrouped Data

Population Variance of Ungrouped Data

$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$
$$\sigma^2 = \frac{60.40}{10}$$
$$\sigma^2 = 6.04$$

Sample Variance of Ungrouped Data

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
$$s^2 = \frac{60.40}{9}$$
$$s^2 = 6.71$$

Note: If the standard deviation is already solved, square the value of the standard
deviation to get the variance.

b. Variance of Grouped Data

Population Variance

$$\sigma^2 = \frac{\sum f(X_m - \mu)^2}{N}$$

Sample Variance

$$s^2 = \frac{\sum f(X_m - \bar{x})^2}{n - 1}$$

Steps in Solving the Variance of Grouped Data

1. Solve for the mean value.


2. Subtract the mean value from each midpoint or class mark.
3. Square the difference between the mean value and midpoint or class mark.
4. Multiply the squared difference and the corresponding class frequency.
5. Find the sum of step 4.
6. Solve the population variance or sample variance using the formula of grouped
data.

Example 2: The score distribution below shows the test results of 40 students in a
Filipino class on a 50-item test. Solve for the variance and standard deviation and
interpret the result.

X          f      Xm       fXm       x̄        Xm − x̄     (Xm − x̄)²     f(Xm − x̄)²
15 – 20    3      17.5     52.5      33.7     -16.2      262.44        787.32
21 – 26    6      23.5     141       33.7     -10.2      104.04        624.24
27 – 32    5      29.5     147.5     33.7     -4.2       17.64         88.2
33 – 38    15     35.5     532.5     33.7     1.8        3.24          48.6
39 – 44    8      41.5     332       33.7     7.8        60.84         486.72
45 – 50    3      47.5     142.5     33.7     13.8       190.44        571.32
n = 40     ƩfXm = 1348     Ʃf(Xm − x̄)² = 2606.4

Population Variance

$$\sigma^2 = \frac{\sum f(X_m - \mu)^2}{N}$$
$$\sigma^2 = \frac{2606.4}{40}$$
$$\sigma^2 = 65.16$$

Sample Variance

$$s^2 = \frac{\sum f(X_m - \bar{x})^2}{n - 1}$$
$$s^2 = \frac{2606.4}{39}$$
$$s^2 = 66.83$$

Standard deviation is the most important measure of variation. It is the
square root of the variance. It can be interpreted as the average distance of the
scores from the mean value.

Sample Standard Deviation

Steps in Solving Standard Deviation of Ungrouped Data

1. Solve for the mean value.


2. Subtract the mean value from each score.
3. Square the difference between the mean and each score.
4. Find the sum of step 3.
5. Solve for the population standard deviation or sample
standard deviation using the formula for ungrouped data.

Note: If the variance is already solved, take the square root of the variance to get
the value of the standard deviation.

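Example: Using the data in Example 1, solve for the population and sample
standard deviation.

Population Standard Deviation

$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$
$$\sigma = \sqrt{\frac{60.40}{10}}$$
$$\sigma = \sqrt{6.04}$$
$$\sigma = 2.46$$

Sample Standard Deviation

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
$$s = \sqrt{\frac{60.40}{9}}$$
$$s = \sqrt{6.71}$$
$$s = 2.59$$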
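Python's standard `statistics` module provides both the population and the sample versions of these measures, which gives a quick check of the hand computation. The score list below is an assumption: the second score of 14 is inferred from the stated totals Ʃx = 146 and Ʃ(x − x̄)² = 60.40 for 10 students.

```python
import statistics

# Ten quiz scores; the second 14 is inferred from the stated totals (assumption).
scores = [19, 17, 16, 16, 15, 14, 14, 13, 12, 10]

pop_var = statistics.pvariance(scores)   # divides by N
samp_var = statistics.variance(scores)   # divides by n - 1
pop_sd = statistics.pstdev(scores)       # square root of the population variance
samp_sd = statistics.stdev(scores)       # square root of the sample variance

print(f"{pop_var:.2f} {samp_var:.2f} {pop_sd:.2f} {samp_sd:.2f}")
# 6.04 6.71 2.46 2.59
```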

Steps in Solving the Standard Deviation of Grouped Data

1. Solve for the mean value.


2. Subtract the mean value from each midpoint or class mark.
3. Square the difference between the mean value and midpoint or class
mark.
4. Multiply the squared difference and the corresponding class
frequency.
5. Find the sum of the results in step 4.
6. Solve for the population standard deviation or sample
standard deviation using the formula for grouped data.

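Population Standard Deviation

$$\sigma = \sqrt{\frac{\sum f(X_m - \mu)^2}{N}}$$
$$\sigma = \sqrt{\frac{2606.4}{40}}$$
$$\sigma = \sqrt{65.16}$$
$$\sigma = 8.07$$

Sample Standard Deviation

$$s = \sqrt{\frac{\sum f(X_m - \bar{x})^2}{n - 1}}$$
$$s = \sqrt{\frac{2606.4}{39}}$$
$$s = \sqrt{66.8308}$$
$$s = 8.18$$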
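The grouped-data variance and standard deviation can be sketched the same way, working from the class marks and frequencies of the Filipino class example. The list name `table` is illustrative.

```python
import math

# (class mark Xm, frequency f) pairs from the grouped-data example
table = [(17.5, 3), (23.5, 6), (29.5, 5), (35.5, 15), (41.5, 8), (47.5, 3)]

n = sum(f for _, f in table)                                # 40
mean = sum(f * xm for xm, f in table) / n                   # 1348 / 40 = 33.7
sum_sq = sum(f * (xm - mean) ** 2 for xm, f in table)       # about 2606.4

pop_var = sum_sq / n            # population variance
samp_var = sum_sq / (n - 1)     # sample variance
pop_sd = math.sqrt(pop_var)
samp_sd = math.sqrt(samp_var)

print(f"{pop_var:.2f} {samp_var:.2f} {pop_sd:.2f} {samp_sd:.2f}")
# 65.16 66.83 8.07 8.18
```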

Interpretation of Standard Deviation

1. If the value of the standard deviation is large, on the average, the scores in
the distribution will be far from the mean. Therefore, the scores are
spread out around the mean value. The distribution is also known as
heterogeneous.
2. If the value of the standard deviation is small, on the average, the scores in the
distribution will be close to the mean. Hence, the scores are less dispersed, or the
scores in the distribution are homogeneous.

Going back to the diagram presented, let us answer the questions posed using the
concept of mean and standard deviation with the help of the diagram below.

1. What can you observe about the mean and the standard deviation of
the three groups of scores?
Answer: The mean of the three groups of scores is the same which is equal to
18.25 and the standard deviation of section A = 5.15, section B = 6.92, and
section C = 7.63.
2. Which group of students performed well in the class?
Answer: In terms of performance, the three sections of students perform the
same because they have the same mean value of 18.25.
3. Which group of scores is most widespread? Less scattered?
Answer: The standard deviation of section A = 5.15, section B = 6.92 and
section C = 7.63. The scores that are most scattered are those in section C
because they have the largest value of standard deviation which is equal to
7.63. On the other hand, the less scattered group of scores is in section A
which has the smallest value of the standard deviation which is equal to 5.15.
Therefore, the smaller the value of the standard deviation on the average the
closer the scores are to the mean value and the larger the value of the
standard deviation on the average makes the scores scattered from the mean
value.
Using the diagram, there are more scores that are closer to the mean
value in section A than in section B and section C.

Properties of Variance and Standard Deviation

1. They are the most commonly used measures of variation, especially in research.
2. They show the variation of the individual scores about the mean.

Relative Measure of Variation

Coefficient of variation shows variation relative to the mean. It is used to
compare two or more groups of distribution of scores. Usually expressed in percent, the
smaller the value of the coefficient of variation the more homogeneous the scores in
that particular group. On the other hand, the higher the value of the coefficient of
variation the more dispersed the scores in that particular distribution.
The formula in computing the coefficient of variation is:

$$CV = \left(\frac{s}{\bar{x}}\right) 100\%$$

x̄ = mean value
s = standard deviation
Example:

Find the coefficient of variation of the given data below:

Section A     Section B     Section C
12            12            12
12            12            12
14            12            12
15            13            12
17            13            12
18            14            12
18            17            13
18            20            26
19            20            26
23            28            26
23            28            23
30            30            30

x̄ = 18.25     x̄ = 18.25     x̄ = 18.25
s = 5.15      s = 6.92      s = 7.63

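$$CV_A = \left(\frac{s}{\bar{x}}\right) 100\%$$
$$CV_A = \left(\frac{5.15}{18.25}\right) 100\%$$
$$CV_A = 28.22\%$$

$$CV_B = \left(\frac{s}{\bar{x}}\right) 100\%$$
$$CV_B = \left(\frac{6.92}{18.25}\right) 100\%$$
$$CV_B = 37.92\%$$

$$CV_C = \left(\frac{s}{\bar{x}}\right) 100\%$$
$$CV_C = \left(\frac{7.63}{18.25}\right) 100\%$$
$$CV_C = 41.81\%$$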

Analysis:

The scores in section A are less scattered than the scores in section B and section
C. In other words, scores in section A are more homogeneous than the scores in section
B and section C. Another way to interpret this is that the scores in section C are more spread
out than the scores in section A and section B, or the scores in section C are more
heterogeneous than the scores in section A and section B.
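As a small illustration, the three coefficients can be computed from the stated means and standard deviations. The function name `coefficient_of_variation` is an assumption made for this sketch.

```python
def coefficient_of_variation(sd, mean):
    """CV = (s / mean) * 100, expressed in percent."""
    return sd / mean * 100


# Summary values as stated in the example: (mean, standard deviation)
sections = {"A": (18.25, 5.15), "B": (18.25, 6.92), "C": (18.25, 7.63)}

cv = {name: coefficient_of_variation(sd, mean)
      for name, (mean, sd) in sections.items()}

for name in sorted(cv):
    print(f"CV {name} = {cv[name]:.2f}%")
# CV A = 28.22%
# CV B = 37.92%
# CV C = 41.81%
```

Because the three sections share the same mean, ranking them by CV gives the same order as ranking them by standard deviation. The CV becomes more informative when the groups being compared have different means.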

Measures of Skewness

A measure of skewness describes the degree of departure of the scores from
symmetry. The skewness coefficient SK can be solved using the formula:

$$SK = \frac{3(\bar{x} - \tilde{x})}{s}$$

where x̄ = mean value, x̃ = median value, and s = standard deviation.

Skewness can be classified according to the skewness coefficient. If SK > 0, the
distribution is positively skewed. When SK < 0, it is negatively skewed.
However, if SK = 0, the distribution is symmetric, as in a normal distribution. The
skewness of a score distribution indicates only the performance of the students, not
the reasons for their performance.

Positively skewed or skewed to the right is a distribution where the thin end
tail of the graph goes to the right part of the curve. This happens when most of the
scores of the students are below the mean.
Negatively skewed or skewed to the left is a distribution where the thin end tail of
the graph goes to the left part of the curve. This happens when most of the scores
obtained by the students are above the mean.

In classroom testing, a positively skewed distribution means that the students
who took the examination did very poorly. Most of the students got a very low score and
only a few students got a high score. A positively skewed distribution tells you only about the
poor performance of the test takers but not the reasons why the students did poorly in
the said examination. Poor performance of the students could be attributed to the
following: ineffective methods of teaching and instruction, students' unpreparedness to
take the examination, very difficult test items, and not enough time to
answer the test items.

Graphical Representation of Negatively Skewed Distribution (Sk < 0)

A negatively skewed distribution means that the students who took the
examination performed well. Most of the scores are high and there are only a few low
scores. The shape of the score distribution indicates the performance of the students
but not the reasons why most of the students got high scores. The possible reasons
why students got high scores are: the group of students is smart, there was enough time
to finish the examination, the test was very easy, the instruction was effective,
and the students prepared themselves for the examination.
Example 1: Find the coefficient of skewness of the scores of 40 grade 6 pupils in
a 100-item test in Mathematics if the mean is 82 and the median is 90 with a standard
deviation of 15.
Given:
x̄ = 82
x̃ = 90
s = 15

$$Sk = \frac{3(\bar{x} - \tilde{x})}{s}$$
$$Sk = \frac{3(82 - 90)}{15}$$
$$Sk = \frac{3(-8)}{15}$$
$$Sk = \frac{-24}{15}$$
$$Sk = -1.60$$

Analysis:

Sk = -1.60. The value of Sk is negative, meaning the score distribution is
negatively skewed. Most of the scores are high, which means that the students
performed very well in the said examination.

Example 2: Find the coefficient of skewness of the scores of 45 grade 6 pupils in
a 50-item test in Biology, if the mean is 46 and the median is 40 with a standard deviation
of 7.5.
Given:
x̄ = 46
x̃ = 40
s = 7.5

$$Sk = \frac{3(\bar{x} - \tilde{x})}{s}$$
$$Sk = \frac{3(46 - 40)}{7.5}$$
$$Sk = \frac{3(6)}{7.5}$$
$$Sk = \frac{18}{7.5}$$
$$Sk = 2.40$$

Analysis:

Sk = 2.40. The value of Sk is positive, meaning the score distribution is
positively skewed. Most of the scores are below the mean. This means that the students
did not perform well in the said examination.
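Both skewness examples follow the same three-input formula, so they can be checked with one small function. The names `skewness_coefficient` and `describe` are illustrative, not from the chapter.

```python
def skewness_coefficient(mean, median, sd):
    """Sk = 3 * (mean - median) / s."""
    return 3 * (mean - median) / sd


def describe(sk):
    """Classify a distribution by the sign of its skewness coefficient."""
    if sk > 0:
        return "positively skewed"
    if sk < 0:
        return "negatively skewed"
    return "symmetric"


sk_math = skewness_coefficient(mean=82, median=90, sd=15)      # Example 1
sk_biology = skewness_coefficient(mean=46, median=40, sd=7.5)  # Example 2

print(f"{sk_math:.2f} {describe(sk_math)}")        # -1.60 negatively skewed
print(f"{sk_biology:.2f} {describe(sk_biology)}")  # 2.40 positively skewed
```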

Normal Distribution

Normal distribution is a special kind of symmetric distribution and it
represents some properties in mathematics. It is very important when comparing
scores and making statistical decisions. It can be determined using the values
of the mean and standard deviation. It is centered at the mean of the variable, and
its variation depends on the value of its standard deviation. The smaller the value of
the standard deviation, the steeper and less dispersed the score distribution.

Properties of Normal Distribution

1. The curve has a single peak, meaning the distribution is unimodal.


2. It is a bell-shaped curve.
3. It is symmetrical to the mean.
4. The end tails of the curve can be extended indefinitely on both sides and are
asymptotic to the horizontal line.
5. The shape of the curve will depend on the value of the mean and
standard deviation.
6. The total area under the curve is 1.0. Hence, the area of the curve in
each side of the mean is 0.5.
7. The probability between two given points in the curve is equal to the
area between the two points.
When you add up the percentages of the area under the curve between three s units
above and three s units below the mean, you come up with about 99.7%. Let us evaluate the
area under the normal curve between the mean and the standard deviation as indicated in the
diagram. The percentage of cases that falls between the mean value and the value of
the mean plus the value of one standard deviation unit in the normal distribution of
scores is 34.13%. And the percentage of cases that falls between the mean value and
the value of the mean minus the value of one standard deviation is 34.13%.

In a score distribution with a mean of 74 and a standard deviation of 4, using the
normal curve model, about 34.13% of the scores in the distribution fall between 74
and 78, as shown in the given illustration.
From the given illustration with a mean of 74 and a standard deviation of 4,
four points are added for each standard deviation unit above the mean (78, 82, 86, 90)
and four points are subtracted for each standard deviation unit
below the mean (70, 66, 62, 58). Approximately 68.26% or 68% of
the scores in the distribution fall between 70 and 78, as shown in the following
figures.
Using the figure above, about 95.44% of the students got scores from 66 to 82.

Using the figure above, 47.72% or about 48% of the students got a score from 74 to 82.

We can also use the normal curve to determine the percentage of the scores of students
below or above a certain score. About 15.86% or 16% of the students got a score below 70. This
can also be stated as: a score of 70 is at the 16th percentile.
About 84.12% or 84% of the scores are below 78. This can be written as P84 = 78.
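These areas can be checked with `statistics.NormalDist` from Python's standard library (Python 3.8 or later), assuming a normal distribution with mean 74 and standard deviation 4. The computed areas differ from the chapter's percentages in the second decimal place because the chapter adds rounded segment values.

```python
from statistics import NormalDist

dist = NormalDist(mu=74, sigma=4)

within_1sd = dist.cdf(78) - dist.cdf(70)    # area from 70 to 78
within_2sd = dist.cdf(82) - dist.cdf(66)    # area from 66 to 82
mean_to_2sd = dist.cdf(82) - dist.cdf(74)   # area from 74 to 82
below_70 = dist.cdf(70)                     # area below 70
below_78 = dist.cdf(78)                     # area below 78

for label, area in [("70 to 78", within_1sd), ("66 to 82", within_2sd),
                    ("74 to 82", mean_to_2sd), ("below 70", below_70),
                    ("below 78", below_78)]:
    print(f"{label}: {area * 100:.1f}%")
# 70 to 78: 68.3%
# 66 to 82: 95.4%
# 74 to 82: 47.7%
# below 70: 15.9%
# below 78: 84.1%
```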

DESCRIBING INDIVIDUAL PERFORMANCE

Standard Scores

In this section, we shall discuss the different kinds of converted scores. The
procedures for converting raw scores to standard scores are presented in this section.
There are four (4) types of standard scores: z-score, t-score, standard nine (stanines),
and percentile ranks.
Scores directly obtained from the test are known as actual scores or raw scores.
Such scores cannot be interpreted as low, average, or high on their own. Scores
must be converted or transformed so that they become meaningful and allow some kind
of interpretation and direct comparison of two scores. Consider the two figures
below:
Figure A represents a score distribution with a mean of 50 and a standard
deviation of 10. Figure B represents a score distribution with a mean of 80 and a standard
deviation of 5.

The shape of the two score distributions above is the same; however, the means
and standard deviations are different. This happens because the range of the scores
in figure A differs from that in figure B. In this case, the scores in those figures cannot be
compared directly because they belong to two different groups.

Example: Ritz Glenn obtained a score of 92 in Business Calculus and 88 in
Production Management. In which subject did he perform well? Are we correct if we say
that Ritz Glenn performed well in Business Calculus? This may be true, but how certain are we?
If we say that he is better in Business Calculus, then we are treating the scores as
percentages. In most cases, scores are converted to percentages before the teacher
returns the test papers to students, but not always. Ritz Glenn's score of 92 in Business
Calculus might mean he answered 92 items correctly out of 100 items, or he answered
92 items correctly out of 92, or it can be interpreted as 92 items correct out of 150
items. The same is true of his Production Management score of 88; he might have
answered 88 items correctly out of 100, or 88 items correctly out of 88, or he
might have answered 88 items correctly out of 150 items. In other words, raw scores
cannot be interpreted directly, so we need additional information about the scores in
the distribution.

The raw scores of all students in Business Calculus and Production
Management are very important so that we can get the information that describes
both score distributions. Based on our previous discussion, the mean value and the
standard deviation are necessary to describe the distribution of scores. Let us add the
mean values and standard deviations of the scores of students in Business Calculus and
Production Management as shown:

Business Calculus     Production Management
x̄ = 95                x̄ = 80
x = 92                x = 88
s = 3                 s = 4

Ritz Glenn's score in Business Calculus is three (3) points below the class mean
performance and eight (8) points above the class mean performance in Production
Management. Using the mean value, we can say that Ritz Glenn performed better in
Production Management than in Business Calculus compared with the performance of
the rest of his classmates. How about the standard deviation? The standard
deviation enables us to determine what percentage of the scores in the distribution lie
above or below a given score.

Assuming that the scores in Business Calculus and Production Management
are normally distributed, let us construct a curve that represents the given data. The
normal curve model is used as a basis to compare distributions with different
means and different standard deviations.

The shaded area represents the percentage of the scores lower than the score of
Ritz Glenn. In Business Calculus, the score of Ritz Glenn is one standard deviation unit
below the mean, and in Production Management his score is two standard deviation
units above the mean. To determine the exact percentage of the scores below the score
of Ritz Glenn in Business Calculus and Production Management, use the normal curve
model.

About 15.86% or approximately 16% of the scores are below the score of Ritz Glenn in
Business Calculus; his score is at the 16th percentile.

About 97.71% or approximately 98% of the students' scores in Production Management
are lower than Ritz Glenn's score, which is at the 98th percentile.

1. z-scores

To get more exact information about the performance of Ritz Glenn, collect the
raw score, mean, and standard deviation, and determine how far below or above the
mean, in standard deviation units, the obtained raw score is.

To determine the exact position of each score in the normal distribution, use the
z-score formula. The z-score is used to convert a raw score to a standard score to
determine how far a raw score lies from the mean in standard deviation units. From
this we can also determine whether an individual student performs well in the
examination compared to the performance of the whole class.

The z-score value indicates the distance between the given raw score and the
mean value in units of the standard deviation. The z-value is positive when the raw
score is above the mean while the z is negative when the raw score is below the mean.

The formula of the z-score is:

$$z = \frac{x - \mu}{\sigma} \quad \text{or} \quad z = \frac{x - \bar{x}}{s}$$

where
z = z-value
x = raw score
s = sample standard deviation
x̄ = sample mean
σ = population standard deviation
µ = population mean

The z-score formula is essential when we compare the performance of a
student in his subjects or the performance of two students who belong to different
groups. It can determine the exact location of a score, whether above or below the
mean, and how many standard deviation units it is from the mean.

Example: Using the data about Ritz Glenn's scores in Business Calculus and
Production Management, solve for the z-score values.

Business Calculus     Production Management
x̄ = 95                x̄ = 80
x = 92                x = 88
s = 3                 s = 4

z-score of Business Calculus (BC)

$$z = \frac{x - \bar{x}}{s}$$
$$z_{BC} = \frac{92 - 95}{3}$$
$$z_{BC} = \frac{-3}{3}$$
$$z_{BC} = -1$$

z-score of Production Management (PM)

$$z_{PM} = \frac{88 - 80}{4}$$
$$z_{PM} = \frac{8}{4}$$
$$z_{PM} = +2$$

Analysis:

The score of Ritz Glenn in Business Calculus is one unit standard deviation
below the mean. His score in Production Management is two units standard deviation
above the mean. Therefore, we can conclude that Ritz Glenn performed better in
Production Management than in Business Calculus.
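The comparison above can be expressed as a short Python sketch. The function name `z_score` is illustrative.

```python
def z_score(raw, mean, sd):
    """Distance of a raw score from the mean, in standard deviation units."""
    return (raw - mean) / sd


z_bc = z_score(raw=92, mean=95, sd=3)   # Business Calculus
z_pm = z_score(raw=88, mean=80, sd=4)   # Production Management

print(z_bc, z_pm)  # -1.0 2.0

# The higher z-score marks the subject with the better relative performance.
better = "Production Management" if z_pm > z_bc else "Business Calculus"
print(better)  # Production Management
```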

2. T-scores

There are two possible signs of a z-score: positive z if the raw score is above the
mean and negative z if the raw score is below the mean. To avoid confusion between
negative and positive values, use the T-score to convert raw scores. The T-score is another type
of standard score where the mean is 50 and the standard deviation is 10. In the z-score,
the mean is 0 and the standard deviation is one (1). To convert a raw score to a T-score,
first find the z-score equivalent of the raw score and then use the formula T-score = 10z +
50.

Business Calculus     Production Management
x = 92                x = 88
x̄ = 95                x̄ = 80
s = 3                 s = 4

From the above discussion, the z-score in Business Calculus is -1 and the z-score in
Production Management is +2. Solve for the T-score equivalents:

T-scoreBC = 10z + 50
T-scoreBC = 10(-1) + 50
T-scoreBC = -10 + 50
T-scoreBC = 40

T-scorePM = 10z + 50
T-scorePM = 10(2) + 50
T-scorePM = 20 + 50
T-scorePM = 70

Analysis:

A z-score of -1 is equivalent to a T-score of 40, and a z-score of +2 is equivalent to a
T-score of 70. The negative value is eliminated in the T-score equivalent. Therefore,
Ritz Glenn performed better in Production Management than in Business Calculus due
to the higher T-score value, which is equal to 70.
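A minimal sketch of the two-step conversion, raw score to z-score and then z-score to T-score, is shown below. The function names are illustrative.

```python
def z_score(raw, mean, sd):
    """z = (x - mean) / s."""
    return (raw - mean) / sd


def t_score(z):
    """T-score = 10z + 50, a scale with mean 50 and standard deviation 10."""
    return 10 * z + 50


t_bc = t_score(z_score(92, 95, 3))   # Business Calculus
t_pm = t_score(z_score(88, 80, 4))   # Production Management

print(t_bc, t_pm)  # 40.0 70.0
```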
Relationship Between z-score and T-score

3. Standard Nine

The third type of standard score is the Standard Nine point scale, which is also
known as stanine; the word comes from sta(ndard) + nine. A stanine is a nine-point grading
scale ranging from 1 to 9, 1 being the lowest and 9 the highest. Stanine grading is easier
to understand than the other standard score models. The descriptive interpretation of
stanines 1, 2, 3 is below average, stanines 4, 5, 6 are interpreted as average, and the
descriptive interpretation of stanines 7, 8, 9 is above average. Use this graph as a basis for
analyzing stanine results.

From the given figure, the central interval (stanine 5) is centered on the mean and
extends 0.25 standard deviation on each side of it. Each of the other intervals is 0.5
standard deviation wide, except for those at the end tails of the normal curve.

Stanines are used to compare two or more distributions of data, particularly test
scores. They are also used to estimate or compute probabilities of events involving
normal distributions, and they facilitate the use of words rather than numbers in
presenting statistical data.

The given figure below indicates the percentage of score in each stanine and the
corresponding descriptions.

Stanine     Percentage of Scores     Description
1           4%                       Very Poor
2           7%                       Poor
3           12%                      Below Average
4           17%                      Slightly Below Average
5           20%                      Average
6           17%                      Slightly Above Average
7           12%                      Considerably Above Average
8           7%                       Superior
9           4%                       Very Superior
Relationship Between Percentile Rank and Normal Curve

4. Percentile Rank

Another way of converting a raw score to a standard score is the percentile rank. A
percentile rank indicates the percentage of scores that lie below a given score.
For example, a test score which is greater than 95% of the scores of the examinees is said to
be at the 95th percentile. If the scores are normally distributed, the percentile rank can be
inferred from the standard score. In solving for the percentile rank, use the formula:

$$PR = \left(\frac{CF_b + 0.5F_g}{n}\right) \times 100$$

Where,

PR = percentile rank

CFb = cumulative frequency below the given score

Fg = frequency of the given score

n = number of scores in the distribution

Solving for the percentile rank by hand is tedious and takes a long time. We can
shorten the solution using the SPSS program or the Excel program, which are easier to use
and cheaper than other software.

Steps in Solving Percentile Rank

1. Arrange the test scores (TS) from highest to lowest.


2. Make a frequency distribution of each score and the number of students
obtaining each score. (F)
3. Find the cumulative frequency (CF) by adding the frequency in each score from
the bottom upward.
4. Find the percentile rank (PR) of each score using the formula; the results are
indicated in column 4.

Example: The table below shows a summary of the scores of 40 students in a 45-item
multiple-choice test. Find the percentile rank of each score in the
distribution.

TS F
45 1
43 2
42 2
41 1
40 1
39 2
37 3
36 2
34 1
33 2
32 2
30 3
29 4
28 1
27 1
25 2
24 1
22 2
21 2
19 1
18 2
16 1
15 1
n = 40
Find the cumulative frequency of the frequency distribution. The third column
represents the cumulative frequency.

TS F CF
45 1 40
43 2 39
42 2 37
41 1 35
40 1 34
39 2 33
37 3 31
36 2 28
34 1 26
33 2 25
32 2 23
30 3 21
29 4 18
28 1 14
27 1 13
25 2 12
24 1 10
22 2 9
21 2 7
19 1 5
18 2 4
16 1 2
15 1 1
n = 40
Find the percentile rank of each score.

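a. Solution:
Score = 45
CFb = 39
Fg = 1
n = 40

$$PR = \left(\frac{CF_b + 0.5F_g}{n}\right) \times 100$$
$$PR = \left(\frac{39 + 0.5(1)}{40}\right) \times 100$$
$$PR = \left(\frac{39 + 0.5}{40}\right) \times 100$$
$$PR = \left(\frac{39.5}{40}\right) \times 100$$
$$PR = 0.9875 \times 100$$
$$PR = 98.75$$
$$PR = 99$$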

Analysis
A raw score of 45 is equal to percentile rank of 99. This means that 99% of the
students who took the examination had raw scores equal to or lower than 45. This can
be written as PR99 = 45.

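b. Solution:
Score = 43
CFb = 37
Fg = 2
n = 40

$$PR = \left(\frac{CF_b + 0.5F_g}{n}\right) \times 100$$
$$PR = \left(\frac{37 + 0.5(2)}{40}\right) \times 100$$
$$PR = \left(\frac{37 + 1}{40}\right) \times 100$$
$$PR = \left(\frac{38}{40}\right) \times 100$$
$$PR = 0.95 \times 100$$
$$PR = 95$$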

Analysis:

A raw score of 43 is equal to a percentile rank of 95. This means that 95% of the
students who took the examination had raw scores equal to or lower than 43. This can
be written also as PR95 = 43.

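c. Solution:
Score = 42
CFb = 35
Fg = 2
n = 40

$$PR = \left(\frac{CF_b + 0.5F_g}{n}\right) \times 100$$
$$PR = \left(\frac{35 + 0.5(2)}{40}\right) \times 100$$
$$PR = \left(\frac{35 + 1}{40}\right) \times 100$$
$$PR = \left(\frac{36}{40}\right) \times 100$$
$$PR = 0.9 \times 100$$
$$PR = 90$$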

Analysis:
A raw score of 42 is equal to a percentile rank of 90. This means that 90% of the
students who took the examination had raw scores equal to or lower than 42. This can
be written also as PR90 = 42.

Note: Continue solving for the percentile rank of each score in the distribution as
an exercise, and compare your answers with the percentile rank distribution on the
succeeding page.
When raw scores are converted to percentile ranks, they are put on
a scale that has the same meaning for groups of different sizes and for tests of different
lengths.

Frequency and percentile rank distribution of a 45-item multiple-choice
test given to 40 students

TS F CF PR
45 1 40 99
43 2 39 95
42 2 37 90
41 1 35 86
40 1 34 84
39 2 33 80
37 3 31 74
36 2 28 68
34 1 26 64
33 2 25 60
32 2 23 55
30 3 21 49
29 4 18 40
28 1 14 34
27 1 13 31
25 2 12 28
24 1 10 24
22 2 9 20
21 2 7 15
19 1 5 11
18 2 4 8
16 1 2 4
15 1 1 1
n = 40
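The whole percentile rank column can be generated with a short Python sketch. The dictionary name `freq` is illustrative, and rounding halves upward with `math.floor(x + 0.5)` is an assumption chosen to match the table (for example, 67.5 becomes 68).

```python
import math

# score -> frequency, from the table of 40 students
freq = {45: 1, 43: 2, 42: 2, 41: 1, 40: 1, 39: 2, 37: 3, 36: 2, 34: 1,
        33: 2, 32: 2, 30: 3, 29: 4, 28: 1, 27: 1, 25: 2, 24: 1, 22: 2,
        21: 2, 19: 1, 18: 2, 16: 1, 15: 1}

n = sum(freq.values())

percentile_rank = {}
cf_below = 0  # cumulative frequency below the current score (CFb)
for score in sorted(freq):  # lowest score first
    f = freq[score]
    exact = (cf_below + 0.5 * f) * 100 / n       # PR before rounding
    percentile_rank[score] = math.floor(exact + 0.5)
    cf_below += f

for score in sorted(freq, reverse=True)[:3]:
    print(score, percentile_rank[score])
# 45 99
# 43 95
# 42 90
```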
Relationship between Different Standard Scores

This figure shows the relationship between the raw scores and the converted
scores, assuming that the scores are normally distributed. The score distribution
has a mean of 60 and a standard deviation of 5. Using these parameters, let us consider
a raw score of 75. This raw score is three standard
deviations above the mean value, which is equivalent to a z-score of 3, a T-score of 80,
and a stanine of 9. This can be verified using the different processes that we have
discussed in the previous sections.

DESCRIBING RELATIONSHIPS

Correlation

Another statistical method used in analyzing test results is correlation. This
is the tool that we are going to utilize if we want to determine the relationship or
association between the scores of students in two different subjects. Is there a
relationship between the Mathematics scores and the Science scores of 15
students?
What type of linear relationship exists between the two sets of scores? Such questions
can be answered using the concepts of correlation. In this section, we discuss the different
ways of computing the correlation coefficient when raw scores and ordinal level of
measurement are given. The graphical method, or scattergram, of determining the
relationship between two groups of scores is also discussed in this section, but it is
limited to linear relationships.

Correlation refers to the extent to which two variables are linearly related
or associated. The extent of correlation is indicated
numerically by the correlation coefficient (rxy). The correlation coefficient (rxy) is also known as
the Pearson Product Moment Correlation Coefficient, in honor of Karl Pearson, who
developed the said formula. The correlation coefficient ranges from -1 to +1. There
are three kinds of correlation based on the correlation coefficient: (1) positive
correlation; (2) negative correlation; and (3) zero correlation. There are two ways of
identifying the correlation between two variables: (1) using the formula; and (2)
using a scatter plot or scattergram.

Kinds of Correlation

1. Positive Correlation
High scores in distribution x are associated with high scores in distribution y.
Low scores in distribution x are associated with low scores in distribution y. This
means that as the value of x increases, the value of y also increases, and as the value
of x decreases, the value of y also decreases. The line that best fits the given points
slopes upward to the right, as shown in the scattergram of positive correlation. The
slope of the line is positive.
2. Negative Correlation
High scores in distribution x are associated with low scores in distribution y. Low
scores in distribution x are associated with high scores in distribution y. This means
that as the values of x increase, the values of y decrease, and as the values of x
decrease, the values of y increase. The line that best fits the given points slopes
downward to the right, as shown in the scattergram of negative correlation. The slope
of the line is negative.
3. Zero Correlation
There is no association between the scores in distribution x and the scores in
distribution y. No single line can be drawn that best fits all the points, as shown in
the scattergram of zero correlation. No discernible pattern can be formed.
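The formula for computing the correlation coefficient using the Pearson Product
Moment Correlation is:

$$r_{xy} = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$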

Example: Find the correlation coefficient of the scores of 10 students in a
mathematics quiz and a science quiz as given below.

Student    Scores in Math (x)    Scores in Science (y)
1          35                    41
2          15                    25
3          11                    19
4          35                    39
5          45                    40
6          28                    30
7          30                    26
8          15                    23
9          45                    48
10         40                    42

Scattergram of Correlation

1. Scattergram of Positive Correlation

Another way of determining the correlation between pairs of scores is by graphing.
The graphical representation is called a scattergram. Using your knowledge of
graphing ordered pairs in the coordinate plane, graph the scores of 8 students in
mathematics and science.

Mathematics Scores    Science Scores
1                     6
2                     8
3                     10
4                     11
5                     13
6                     16
7                     20
8                     21

Analysis:

As the math scores increase, there is a corresponding increase in the science
scores. Using the given points in the coordinate plane, a straight line upward to the
right can be drawn that best fits all the points. Hence, the slope of the line is
positive.

2. Scattergram of Negative Correlation

Graph the scores of 8 students in mathematics and science in the coordinate
plane.

Mathematics Scores    Science Scores
1                     20
2                     17
3                     15
4                     13
5                     12
6                     9
7                     7
8                     4

Analysis:

As math scores increase, science scores decrease. Using the given points in the
coordinate plane, a straight line downward to the right can be drawn that is best
fitted to all the points. Hence, the slope of the line is negative.
3. Scattergram of Zero Correlation

Graph the scores of 11 students in mathematics and science in the coordinate
plane.

Mathematics Scores    Science Scores
3                     17
4                     17
6                     11
7                     4
7                     6
8                     15
10                    12
10                    19
14                    13
16                    7
17                    19

Make a scattergram of the scores of 11 students in mathematics and science.

Analysis:

No discernible pattern can be formed from the given set of points. No single line
can be drawn that best fits all the points in the plane.
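
The three scattergram examples can also be compared numerically. The sketch below is
an illustration added to the exercise: it computes the correlation coefficient of each
data set using the deviation form of the Pearson formula, which is algebraically
equivalent to the raw score formula, and reports only the direction of the
relationship. The function names and the 0.1 cutoff for "close to zero" are
assumptions made for this illustration, not a standard rule.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson r computed from deviations about the two means."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def direction(r, cutoff=0.1):
    """Label the direction of a relationship; the cutoff is an arbitrary assumption."""
    if abs(r) < cutoff:
        return "zero"
    return "positive" if r > 0 else "negative"


# Data from the three scattergram examples
pos_math, pos_sci = [1, 2, 3, 4, 5, 6, 7, 8], [6, 8, 10, 11, 13, 16, 20, 21]
neg_math, neg_sci = [1, 2, 3, 4, 5, 6, 7, 8], [20, 17, 15, 13, 12, 9, 7, 4]
zero_math = [3, 4, 6, 7, 7, 8, 10, 10, 14, 16, 17]
zero_sci = [17, 17, 11, 4, 6, 15, 12, 19, 13, 7, 19]

r_pos = pearson_r(pos_math, pos_sci)
r_neg = pearson_r(neg_math, neg_sci)
r_zero = pearson_r(zero_math, zero_sci)

for label, r in [("first", r_pos), ("second", r_neg), ("third", r_zero)]:
    print(label, round(r, 2), direction(r))
```

The sign of r agrees with the slope of the line described in each analysis: positive
for the first data set, negative for the second, and close to zero for the third.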
Computation of Correlation

Steps in Solving Correlation Coefficient Using Pearson r

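1. Complete the table by adding the xy, x², and y² columns.
2. Find Σx, Σy, Σxy, Σx², and Σy².
3. Compute the correlation coefficient (rxy) using the formula.

$$r_{xy} = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$
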
Student    Scores in Math (x)    Scores in Science (y)    xy       x²       y²
1          35                    41                       1435     1225     1681
2          15                    25                       375      225      625
3          11                    19                       209      121      361
4          35                    39                       1365     1225     1521
5          45                    40                       1800     2025     1600
6          28                    30                       840      784      900
7          30                    26                       780      900      676
8          15                    23                       345      225      529
9          45                    48                       2160     2025     2304
10         40                    42                       1680     1600     1764
Σx = 299   Σy = 333   Σxy = 10989   Σx² = 10355   Σy² = 11961

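$$
\begin{aligned}
r_{xy} &= \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} \\
r_{xy} &= \frac{(10)(10989) - (299)(333)}{\sqrt{\left[(10)(10355) - (299)^2\right]\left[(10)(11961) - (333)^2\right]}} \\
r_{xy} &= \frac{109890 - 99567}{\sqrt{\left[103550 - 89401\right]\left[119610 - 110889\right]}} \\
r_{xy} &= \frac{10323}{\sqrt{[14149][8721]}} \\
r_{xy} &= \frac{10323}{\sqrt{123393429}} \\
r_{xy} &= \frac{10323}{11108.25949} \\
r_{xy} &= 0.929308503 \\
r_{xy} &\approx 0.93
\end{aligned}
$$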

Analysis:

The value of the correlation coefficient is rxy = 0.93, which means that there is
a very high positive correlation between the scores of the 10 students in mathematics
and in science. This means that students who score high in mathematics also tend to
score high in science.
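
The three steps above can be checked with a short Python sketch. It rebuilds the
column sums from the raw scores in the table and then applies the raw score formula.
The variable names are chosen for this illustration only.

```python
from math import sqrt

# Scores of the 10 students, taken from the table
math_scores = [35, 15, 11, 35, 45, 28, 30, 15, 45, 40]
science_scores = [41, 25, 19, 39, 40, 30, 26, 23, 48, 42]

# Steps 1 and 2: build the xy, x squared, and y squared columns and sum them
n = len(math_scores)
sum_x = sum(math_scores)
sum_y = sum(science_scores)
sum_xy = sum(x * y for x, y in zip(math_scores, science_scores))
sum_x2 = sum(x ** 2 for x in math_scores)
sum_y2 = sum(y ** 2 for y in science_scores)

# Step 3: apply the raw score formula for Pearson r
numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r_xy = numerator / denominator

print(sum_x, sum_y, sum_xy, sum_x2, sum_y2)  # 299 333 10989 10355 11961
print(round(r_xy, 2))  # 0.93
```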

Spearman rho Coefficient

Another way of finding the correlation between two variables is the Spearman
rho correlation coefficient, denoted by the Greek letter rho (ρ). The Spearman rho
correlation coefficient (ρ) is a measure of correlation used when the given sets of
data are expressed in ordinal level of measurement (ranks) rather than raw scores as
in Pearson r. Spearman rho (ρ) was first derived by the British psychologist Charles
Spearman, and the formula was named Spearman’s rho in his honor.

$$\rho = 1 - \frac{6\sum D^2}{N(N^2 - 1)}$$

where,
ρ = Spearman rho correlation coefficient value
D = difference between a pair of ranks
N = number of students/cases

Steps in solving Spearman’s rho Correlation Coefficient

1. Rank the scores in each distribution if raw scores are given.
2. Find the difference between each pair of ranks.
3. Square each difference.
4. Find the sum of the squared differences.
5. Solve for the value of Spearman’s rho coefficient using the formula

$$\rho = 1 - \frac{6\sum D^2}{N(N^2 - 1)}$$

Example: Ten (10) aspirants for the Gabuyo Scholarship at YAG University were
ranked according to their mathematics scores and science scores. Solve for the value
of ρ to the nearest hundredth. The data are tabulated below:

Student          Mathematics Score    Science Score
Ritz Glenn       45                   50
James Vincent    47                   45
John Michael     39                   35
Paul John        37                   41
Raphael Carlo    33                   38
John Rey         40                   39
ShejRoi          15                   25
Fitch Peter      46                   49
Kristle Anne     25                   40
Cloe Grace       44                   42

Rank the scores in mathematics and the scores in science, find the difference
between each pair of ranks, and square the difference. Find the summation of D² and
solve for the value of ρ.

Student          Mathematics Rank    Science Rank    D     D²
Ritz Glenn       3                   1               2     4
James Vincent    1                   3               -2    4
John Michael     6                   9               -3    9
Paul John        7                   5               2     4
Raphael Carlo    8                   8               0     0
John Rey         5                   7               -2    4
ShejRoi          10                  10              0     0
Fitch Peter      2                   2               0     0
Kristle Anne     9                   6               3     9
Cloe Grace       4                   4               0     0
ΣD² = 34

Solution:
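$$
\begin{aligned}
\rho &= 1 - \frac{6\sum D^2}{N(N^2 - 1)} \\
\rho &= 1 - \frac{6(34)}{10(10^2 - 1)} \\
\rho &= 1 - \frac{204}{10(99)} \\
\rho &= 1 - \frac{204}{990} \\
\rho &= 1 - 0.21 \\
\rho &= 0.79
\end{aligned}
$$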

Analysis

The ρ value is 0.79, which indicates a high positive correlation between the
mathematics scores and science scores of the ten aspirants for the Gabuyo Scholarship.
Students who rank high in mathematics also tend to rank high in science.
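
The ranking and computation steps can be sketched in Python as follows. This is a
minimal illustration that assumes no two students have the same score in a subject
(tied scores would need averaged ranks, which this sketch does not handle). The
function and variable names are chosen for this illustration only.

```python
def rank_descending(scores):
    """Give rank 1 to the highest score; assumes there are no tied scores."""
    ordered = sorted(scores, reverse=True)
    return [ordered.index(s) + 1 for s in scores]


# Raw scores of the ten aspirants, in the order listed in the table
mathematics = [45, 47, 39, 37, 33, 40, 15, 46, 25, 44]
science = [50, 45, 35, 41, 38, 39, 25, 49, 40, 42]

# Step 1: rank the scores in each subject
math_ranks = rank_descending(mathematics)
science_ranks = rank_descending(science)

# Steps 2 to 4: difference of each pair of ranks, squared, then summed
d_squared = [(m - s) ** 2 for m, s in zip(math_ranks, science_ranks)]
sum_d2 = sum(d_squared)

# Step 5: apply the Spearman rho formula
n = len(mathematics)
rho = 1 - (6 * sum_d2) / (n * (n ** 2 - 1))

print(math_ranks)     # [3, 1, 6, 7, 8, 5, 10, 2, 9, 4]
print(science_ranks)  # [1, 3, 9, 5, 8, 7, 10, 2, 6, 4]
print(sum_d2, round(rho, 2))  # 34 0.79
```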