
Eps 400 New Notes Dec 15-1


EPS 400: EDUCATIONAL STATISTICS AND EVALUATION

PART ONE: EDUCATIONAL STATISTICS


CHAPTER ONE: INTRODUCTION
This chapter introduces you to the basic concepts in educational statistics and evaluation.

1.2 KEY CONCEPTS


A. Statistics- is the science of conducting studies to collect, organize, summarize, analyse,
and interpret information/data.
Types of statistics
i) Descriptive statistics- techniques that organize, summarize, and present a set
of data in attempts to describe a situation.
Elements of Descriptive stats.
a. The population or sample of interest
b. One or more variables to be investigated
c. Tables, graphs, or numerical summary tools
d. Identification of patterns in the data
ii) Inferential statistics- techniques that use sample data to make estimates,
decisions, predictions, or other generalizations about the population from
which the sample was drawn.
Important concepts in inferential statistics
a) Population- the entire group of individuals that a researcher wishes to study.
b) A sample- is a small group selected from a population to participate in a research
study.
Reasons why we sample.
i. It would take too much time to study the entire population.
ii. It would cost too much money to study the entire population.
iii. It might not be possible to identify and reach all members of the population.
iv. If studying an item destroys or uses it up, studying the entire population would leave us with nothing.
c) Parameter- a numerical value that describes a population.
d) Statistic- a numerical value that describes a sample.
e) Sampling error- the discrepancy between a statistic and a parameter.
f) Hypothesis- a tentative, testable statement (an educated guess) about what is likely to happen.
g) Hypothesis testing-A decision making process for evaluating claims about a
population based on information obtained from samples.
B. Data- are the values (measurements or observations) that variables can assume.
i) Data set- A collection of data values.
ii) Data array- an ordered data set.
iii) Datum/ data value- each value in the data set.
iv) Raw data/ raw scores- data collected in their original form/the scores
obtained from tests.

Data can either be quantitative or qualitative. Quantitative data consist of
measurements that are recorded using numbers while qualitative data are
measurements that cannot be measured on a natural numerical scale but can only be
classified into categories.

C. Variable- is a characteristic that can change or assume different values. E.g. height,
weight, intelligence, age, motivation, identity, e.t.c.
Types of variables.
i) Random variables- Variables whose values are determined by chance.
ii) Physical variables- Those measured using tangible/physical instruments e.g.
time, length, height, temperature etc.
iii) Psychological variables- These are intangible attributes of the individual's
mind (e.g. intelligence, motivation) and they are measured indirectly.
iv) Qualitative variables- Variables that differ in distinct quality, characteristic,
or attribute. E.g. gender, religion, country, race, tribe e.t.c
v) Quantitative variables- variables that are numerical in nature and can be
ranked or ordered according to their values. E.G. Height, weight, age, length,
body temperatures, scores in a test e.t.c.
Types of quantitative variables.
a) Discrete variable- a variable that exists in indivisible units. It can only be
counted and expressed using whole numbers E.g. Number of children in a
family, no of telephone calls received, no of students in a class etc.
b) Continuous variable- a variable that can be divided into smaller units
without limit. It can assume an infinite number of values between any two
points on a measurement scale. It is obtained by measuring and often
includes fractions and decimals. E.g. Temperature, time, length etc.

vi) Independent variable- in an experiment, it is the variable that is manipulated by the
researcher in order to study its effects on another variable.
vii) Dependent variable- it is the variable that is observed or measured in an experiment
in order to record the effects of the independent variable.
D. Constant- A variable that does not change or that assumes only one value.
E. Measurement- This is the process of assigning numbers to individuals or objects in a
systematic way to represent their properties.
F. Evaluation- This is the process of making value judgements based on measurements.
G. Test- A systematic procedure for observing a sample of behaviour/psychological
variable. In class, it is a set of questions to which students are expected to supply answers.
H. Test Item- a specific stimulus to which a person responds overtly; a specific question to
which a person supplies an answer in a test.
I. Examination- A collection of tests which measure different traits of the individual in
order to facilitate decision making.

Why teachers need to have a basic knowledge of statistics.

1. Statistics enable teachers to be intelligent consumers of the different kinds of
information they come across in their teaching practice.
2. Knowledge of statistics enables teachers to plan and use appropriate procedures for
conducting and interpreting their own research.
3. Knowledge of statistics enables teachers to describe, classify, organize, summarize,
and present data collected in their practice in ways that are easy to comprehend. E.g.
using graphs, tables, and charts to present test scores.
4. Statistical knowledge helps teachers to construct and standardize tests in education
as well as to use them properly.
5. Statistics help teachers to use statistical reasoning as they conduct evaluation and
measurement in education.
6. Knowledge of statistics enables teachers to understand individual differences among
students, to compare the suitability of one method or technique with another, and to
make predictions about the future performance of students.

LEVELS/SCALES OF MEASUREMENT.
Scales of measurement refer to the ways in which variables/numbers are defined, categorized,
counted, or measured. They present the set of rules upon which measurement is based.

The four types of measurement scales are: Nominal, ordinal, interval, and ratio.

1. Nominal scale of measurement

This classifies data into mutually exclusive (non-overlapping), exhaustive
categories which cannot be arranged in any meaningful order. Numbers in this scale only
label properties and can only be used for identifying, naming, or classifying
variables. The only thing you can do with the numbers at this level is count how many cases fall in each category.

Examples.

Numbers on the back of football players. 1, 2, 10, 32 etc.

Gender. e.g. 1= male; 2=female.

Ethnicity e.g. 1= White; 2= Indian; 3= African;

Your college admission number.

Areas of study.

Nationality.

2. Ordinal scale of measurement

Data measured at this level can be put into categories and these categories can be ordered,
or ranked. In this scale, numbers distinguish between individuals and give merit. We can
count the numbers and show the direction of their difference using the concept of
‘greater’ or ‘less’ than or by giving an ordered series.

However, the numbers are ordered without regard for differences in the distance between
the scores. i.e. An ordinal scale can tell us who is the first, second, and third. However, it
cannot tell us whether the distance between the first- and the second ranked scores is the
same as the distance between the second- and third- ranked scores.

Examples.

Letters used in the grading system.


Classifying people according to their body size e.g. small, medium, or large.

The rating scale that uses descriptive words, poor, good, excellent etc.

3. Interval level of measurement.

This ranks data and uses equal intervals between any two units of measurement. We can
therefore tell precise differences between scores. However, zero is an arbitrary point, i.e. it
only represents another point of measurement in this scale and it is therefore not a
meaningful/true zero. It does not represent absence of the quality being measured or the
absolute lowest point in the scale. It is just a point on the scale and there are numbers
above and below it.

With the numbers obtained on this scale, we can: count, use the concept of greater or less
than; and indicate the precise difference between any two measurements. However, we
cannot make ratio statements (such as 'twice as much') about the scores/numbers on an
interval scale. E.g. 40°C is not twice as hot as 20°C.

Examples

Measurement of:

- Scores on an IQ test.

- Temperature.

- Height relative to sea level. E.g. the Dead Sea shore is the lowest point on land, at about 400 metres below sea level.

4. Ratio level of measurement.

It is similar to the interval scale in that it also represents quantity and has equal intervals
between units of measurement. In addition, this scale has an absolute/true/meaningful
zero. Therefore, you cannot have negative scores on this scale. You can also form
meaningful ratios/comparisons (fractions) between the scores. E.g. Karo (6kg) is twice as
heavy as James (3kg).

NB: Ratio scales are not common in Psychology because many psychological variables
do not permit the measurement of an absolute zero.

Examples.

Height, Weight, Time, GPA, age, number of children, salary, number of customers
served, e.t.c.

What scale of measurement is used in the following?

a. Labelling people as confident or shy?
b. Classifying a sample of students into two groups, males and females?
c. Classifying sportsmen according to high, medium, and low self-esteem?
d. Your current emotional expression?

1.3 Important Statistical Notation.

Notation refers to the use of letters and symbols to represent statistical concepts and
ideas.

Below is the common statistical notation

The letter X- identifies individual scores for a particular variable. When we have multiple
scores per subject, we use X and Y.

When used to head a column, it represents a set of scores.

It is important in statistics to specify how many scores are in a set. We use:

N to represent the total number of scores for a population and n to represent the number
of scores in a sample.

Many of our operations on numbers in statistics involve adding a set of scores.

Summing a set of values has its own notation called summation notation. The Greek
letter sigma, Σ, is used to indicate ‘the sum of’. The expression ΣX is read ‘the sum of the
scores’ and it means add all the scores for variable X. ΣY means add all the scores for
variable Y.

1.4 Order of mathematical operations.

To use and interpret summation notation, you must follow the basic order of operations
required for all mathematical calculation. Below is a list showing the correct order for
performing mathematical operations.

1. Any calculation contained within parentheses is done first.
2. Squaring or raising to other exponents is done second.
3. Multiplying, and dividing are done third, and should be completed in order from left
to right.
4. Summation with the Σ notation is done next.
5. Any additional adding and subtracting is done last and should be completed in order
from left to right.

Example 1.1: Given the following scores, 8, 9, 4, 3, and 1; compute ΣX; ΣX²; (ΣX)²;
Σ(X-1) and Σ(X-1)²

We can use a computational table to help us demonstrate the calculations:

X        X²         (X-1)        (X-1)²
8        64         7            49
9        81         8            64
4        16         3            9
3        9          2            4
1        1          0            0
ΣX=25    ΣX²=171    Σ(X-1)=20    Σ(X-1)²=126

(ΣX)² = 25² = 625, because the summation inside the parentheses is done before squaring.
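
If you have access to a computer, the calculations in Example 1.1 can be checked with a short Python sketch. Python is an assumption here (the notes do not prescribe any software) and the variable names are only illustrative.

```python
# Scores from Example 1.1
X = [8, 9, 4, 3, 1]

sum_x = sum(X)                                    # ΣX: add the scores
sum_x_squared = sum(x ** 2 for x in X)            # ΣX²: square each score, then add
squared_sum_x = sum(X) ** 2                       # (ΣX)²: add first, then square the total
sum_x_minus_1 = sum(x - 1 for x in X)             # Σ(X-1): subtract 1 from each score, then add
sum_x_minus_1_sq = sum((x - 1) ** 2 for x in X)   # Σ(X-1)²: subtract 1, square, then add

print(sum_x, sum_x_squared, squared_sum_x, sum_x_minus_1, sum_x_minus_1_sq)
# prints: 25 171 625 20 126
```

Note how ΣX² (171) and (ΣX)² (625) differ only in the order in which squaring and summing are done.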

TABULATION AND GRAPHICAL REPRESENTATION OF DATA.


A. FREQUENCY DISTRIBUTIONS

A distribution is just a set of scores.

A frequency distribution is an organized tabulation of raw data using the scores and
frequencies.

We have two types of frequency distributions:

a. Ungrouped frequency distribution

It is used for categorical data (nominal or ordinal data) or when we have a very small
range of data.

The regular frequency distribution comprises two columns. The first column lists the
scores (X values) from the highest to the lowest score. Beside each X
value, in a second column, is the frequency- the number of times that particular score
occurred in the data set.

Steps in constructing an ungrouped frequency distribution


i. Make a table with the following four columns: (A) Class, (B) Tally, (C) Frequency,
and (D) Percent.
Class    Tally    Frequency    Percent

ii. List the scores in column A starting with the highest to the lowest.
iii. Tally/sort the data and place the results in column B.
iv. Count the tallies and place the numerical frequency (f) in column C. Always
indicate the Σf at the end of this column. Note: the sum of frequency is the same
as the total number of scores in a data set. Thus Σf=N.
v. Find the percentage of values in each class by using the formula:
%=f/n x 100%. Where f=frequency of the class, n= total number of values.
NB: sometimes, we may be required to compute ΣX for a set of scores that have
been organized into a FD. To do that we multiply each X with its respective f and
then add these values. This ensures that we capture all the scores in a data set.
Thus for a FD, ΣX= Σ(Xf).

As a general rule, a frequency distribution should have a maximum of 10-15 rows to keep
it simple. When scores cover a range wider than this, then it is advisable to use a grouped
FD.
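
The steps for an ungrouped frequency distribution can be sketched in Python as shown below. This is only an illustration: the 20 quiz scores are made up for the example, and the tally column is left out because the computer does the counting.

```python
from collections import Counter

# Hypothetical quiz scores (not from the notes)
scores = [8, 9, 8, 7, 10, 9, 6, 4, 9, 8, 7, 8, 10, 9, 8, 6, 9, 7, 8, 8]

freq = Counter(scores)        # frequency (f) of each score
n = sum(freq.values())        # Σf, which equals the number of scores

print("Class  Frequency  Percent")
for x in sorted(freq, reverse=True):          # list scores from highest to lowest
    f = freq[x]
    print(f"{x:>5}  {f:>9}  {f / n * 100:>6.1f}%")   # % = f/n x 100
print("Σf =", n)

# ΣX from the frequency distribution: multiply each X by its f, then add
sum_x = sum(x * f for x, f in freq.items())
print("ΣX = Σ(Xf) =", sum_x)
```

Multiplying each score by its frequency gives the same total as adding the 20 raw scores one by one.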

b. Grouped frequency distribution

We can get a relatively simple organization of the data by listing groups of scores in the table
rather than individual scores. The class intervals are indicated in one column from the highest
to the lowest and their respective frequencies are given in an adjacent column.

Major concepts related to grouped FD.

I. Range= highest score- lowest score


II. Class interval or class (X)= A group of scores
III. Apparent limits/class limits= the smallest and largest values that can be included in a
class. The lower class limit is the smallest data value that can be included in a class
while the upper class limit is the largest data value that can be included in a class.
The class limits have gaps between them. E.g. if one class ends at 30 and the next starts at 31, there is a gap between 30 and 31.
IV. Real limits/class boundaries- numbers with an additional decimal place, ending in 5,
which are used to separate the classes so that there are no gaps in the FD. You get the
lower boundary (lower real limit) by subtracting .5 from the lower class limit, and the
upper boundary (upper real limit) by adding .5 to the upper class limit.
V. Class width= lower (or upper) class limit of a class minus the lower (upper) class limit
of the class below it.
VI. Mid-point= the score that is at the centre of a class.
VII. Cumulative frequencies (CF)= the total data values accumulated to and including a
specific class. CFB= the total scores up to and below a particular class. CFA= the
total scores up to and above a particular class.

Basic rules for construction of a grouped FD.

I. There should be between 5 and 20 classes.
II. It is preferable for the class width to be an odd number. This ensures that the mid-point
of each class is a whole number, like the raw scores.
III. The classes must be mutually exclusive. i.e. non-overlapping.
IV. The classes must be continuous. There should be no gaps in the FD. Even when a
class has no data values, it must be included.
V. The classes must be exhaustive. i.e. all the data values in the distribution must be
captured by the classes.
VI. The classes must have equal width.
VII. The lower limit of each class must be a multiple of the class interval.
VIII. It must have a clear title.

Steps in the construction of a grouped FD.

a. Determine the range of scores


Range= H-L
Inclusive R= H-L +1 (or real upper limit of the highest score- real lower limit of the
lowest score).
Find the range for the following scores:
62,50,36,30,60,48,60,75,50,25,40,90,45,54,60,78,85,32,36,80, 51,53,54.
b. Determine the possible class interval to use.

Lowest possible class interval (LPCI)= R/15 (round it up). Highest possible class interval (HPCI)= R/12 (round it down).
Always decide on an odd numbered CI.
c. Identify the starting point (the highest class in your distribution). It must contain the
highest value. NB: For all classes, the lower class limit (the number on your left-hand side)
should be a multiple of the CI. List the other classes. NB: The lowest interval must contain the least
data value. Distribute the data into the classes.
d. Give the real limits and mid-points
e. Tally the data
f. Find the numerical frequencies from the tallies. NB: Always find the sum of frequency
at the bottom of this column.
g. Compute the cumulative frequencies (CFB and CFA). Your GFD should have the
following columns:

Class    Real limits    Mid-point    Tally    Frequency    CFB    CFA
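
Steps a to g can be tried out on the 23 scores given under step a. The Python sketch below is one possible way of doing it, not the only one: the names are illustrative, the tally column is skipped, and it assumes whole-number scores so that the real limits are .5 below and above the class limits.

```python
import math

scores = [62, 50, 36, 30, 60, 48, 60, 75, 50, 25, 40, 90, 45, 54, 60,
          78, 85, 32, 36, 80, 51, 53, 54]

# Step a: range
R = max(scores) - min(scores)            # 90 - 25 = 65

# Step b: possible class intervals
lpci = math.ceil(R / 15)                 # 4.33 rounded up   -> 5
hpci = math.floor(R / 12)                # 5.42 rounded down -> 5
ci = 5                                   # odd, and between LPCI and HPCI

# Step c: lower class limits are multiples of the CI, highest class first
top = (max(scores) // ci) * ci           # 90, so the highest class is 90-94
bottom = (min(scores) // ci) * ci        # 25, so the lowest class is 25-29

rows = []
for lower in range(top, bottom - 1, -ci):
    upper = lower + ci - 1
    rows.append({
        "class": f"{lower}-{upper}",
        "real_limits": (lower - 0.5, upper + 0.5),           # step d
        "mid_point": (lower + upper) / 2,                     # step d
        "f": sum(lower <= x <= upper for x in scores),        # steps e and f
    })

# Step g: CFA accumulates from the highest class down, CFB from the lowest class up
running = 0
for row in rows:
    running += row["f"]
    row["cfa"] = running
running = 0
for row in reversed(rows):
    running += row["f"]
    row["cfb"] = running

for row in rows:
    print(row["class"], row["real_limits"], row["mid_point"], row["f"], row["cfb"], row["cfa"])
print("Σf =", sum(row["f"] for row in rows))
```

With a CI of 5 the scores fall into 14 classes, which is within the recommended 5 to 20 classes. Empty classes such as 55-59 are still listed, with a frequency of 0.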

Why we construct FDs.


a. To organize and summarize data in a meaningful, intelligible way.
b. To enable the reader to determine the shape or nature of the distribution.
c. To facilitate the computation of measures of central tendency and variability.
d. To enable the researcher to draw charts and graphs for data presentation.
e. To enable the reader to compare different data sets.
B. Graphical methods for describing data

Reasons why we use graphs and tables in statistics

a. A graph presents statistical data in a quick, easy-to-read-and-interpret pictorial format.
b. To capture the audience’s attention in public speaking or during a presentation.
c. To discuss an issue, reinforce a point, or summarize a situation.
d. To describe and analyze data.
e. The main purpose of any chart is to give a picture of the data which is more difficult to
obtain from a table or a complete listing of the data.
f. To reveal a trend or pattern in a situation over a period of time.

Common graphical procedures

i) Bar graph- a graph that displays data using bars of different heights. The height of
each bar corresponds to the frequency of the respective score or class. In this graph, a
space is left between adjacent bars. It is used for scores measured on a nominal or
ordinal scale.

The undergraduate student population of Kenyatta University may be summarized thus:

Class Number of students


Freshers 12000
Sophomore 11000
Third year 10000
Fourth year 9000

[Figure: bar graph of the number of students (y axis, 0 to 14000) for First year, Second year, Third year, and Fourth year (x axis).]

ii) Histogram- is a graph that represents frequency distribution data by means of
contiguous rectangular bars whose widths span the class boundaries and whose
heights correspond to the class frequencies. Contiguous means that there are no gaps
between adjacent bars in a histogram. We use histograms to represent continuous
(interval or ratio) data.

How to construct a histogram.


1. Step 1
Draw and label the x and y axes.
2. Step 2
Represent the frequency on the y axis and the class boundaries on the x axis.
3. Step 3
Using the frequencies as the heights, draw vertical bars for each class.

iii) Frequency polygon- a graph that displays the data by using lines that connect points
plotted for the frequencies at the midpoints of the classes. The frequencies are
represented by the heights of the points.
Steps in the construction of frequency polygon
1. Step 1:
Find the midpoint of each class.
2. Step 2:
Draw the x and y axes. Label the x axis with the midpoint of each class, and then use a
suitable scale on the y axis for the frequencies.
3. Step 3:
Using the midpoints for the x values and the frequencies as the y values, plot the points.
4. Step 4:
Join the points with straight lines and close the polygon by drawing a line back to the x axis at the
beginning and end of the graph, at the same distance that the previous and next midpoints
would be located.
iv) Cumulative frequency polygon/ The smooth curve/ the ogive- This is a graph that
represents the cumulative frequencies for the classes in a frequency distribution. We
can construct either a less than or a greater than ogive.
Steps in constructing an ogive
1. Step 1:
Find the cumulative frequency for each class.
2. Step 2:
Draw the x and y axes. Label the x axis with the class boundaries. NB: For the
less than ogive, plot each upper class boundary and the corresponding
cumulative frequency (since the CFB represents the number of data values
accumulated up to the upper boundary of each class). For the greater than ogive,
plot the lower class boundaries and the corresponding CFA.
Use an appropriate scale for the y axis to represent the cumulative
frequencies. Do not label the y axis with the numbers in the cumulative frequency
column.
3. Step 3:
Starting with the first class boundary, connect adjacent points with a line segment.
Always close the graph to the class boundary on the x axis at the beginning and at the
end of the graph.
Class    Frequency
10-14    4
15-19    7
20-24    12
25-29    17
30-34    10
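
The points to plot for the two ogives can be worked out from the table above. The Python sketch below is an illustration only (the names are made up, and the scores are assumed to be whole numbers so that the boundaries are .5 away from the class limits). It lists each class boundary with its cumulative frequency.

```python
# (lower class limit, upper class limit, frequency) from the table above
classes = [(10, 14, 4), (15, 19, 7), (20, 24, 12), (25, 29, 17), (30, 34, 10)]
n = sum(f for _, _, f in classes)

# Less than ogive: upper class boundaries against CFB.
# The first point closes the graph on the x axis at the lowest boundary.
less_than = [(classes[0][0] - 0.5, 0)]
cf = 0
for lower, upper, f in classes:
    cf += f
    less_than.append((upper + 0.5, cf))

# Greater than ogive: lower class boundaries against CFA.
# The last point closes the graph on the x axis at the highest boundary.
greater_than = []
remaining = n
for lower, upper, f in classes:
    greater_than.append((lower - 0.5, remaining))
    remaining -= f
greater_than.append((classes[-1][1] + 0.5, 0))

print(less_than)
print(greater_than)
```

The less than ogive rises from 0 to N = 50, while the greater than ogive falls from 50 to 0 over the same boundaries.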

Distribution Shapes

Importance of distribution shape

The shape of a distribution helps us to describe data.

It also determines the appropriate statistical methods to use when analyzing data.

How to analyse distribution shape.

i. We can analyse the shape of a distribution by drawing a histogram or a frequency


polygon.
ii. By computing statistics that describe the shape of a distribution.

The most common shapes

The normal distribution- this is a bell-shaped distribution which has a single peak and
tapers off at either end. It is approximately symmetric; i.e., it is roughly the same on both
sides of a line running through the center. It is obtained in a heterogeneous class, i.e. where
only a few students are very weak, a few very bright, and most of them average. In this
distribution, the scores are evenly distributed on both sides of the mean.

Deviations from the normal distribution.

These are caused by:


a. Having a poor test
b. When the class itself is not normal

They include:

a) Skewness- this is deviation from the normal distribution in terms of symmetry. A


skewed distribution has a ‘tail’ either on one end or the other.
Types of Skewness
i) Positively skewed distribution

A distribution whose peak is to the left and the data values taper/tail off to the
right. To normalize it, you push the tail to the right. This distribution implies that
the majority of the students got low marks. This may arise when:

- The students are very weak.
- The given test is very hard.
- The content is above the level of understanding of the students.
ii) Negatively skewed distribution

A distribution whose peak is to the right and the data values taper off/tail off to the
left. NB: to normalize this distribution, you push the tail to the left. This
distribution indicates that the majority of the learners got high marks in the test.

This may arise when:

- The given test was very simple.


- Cheating.
- The learners are very bright e.g. Alliance Boys.
b) Bimodal distributions
A bimodal distribution has two peaks of the same height. This can be found in
classes with students from different backgrounds.
c) Kurtosis
This refers to the peakedness of a graph.
Types of kurtosis
i) Mesokurtic- this is a bell-shaped distribution of moderate peakedness, i.e. there is a gradual
increase followed by a gradual decrease of frequencies (broad range of
scores). This is the shape of a normal distribution.
ii) Leptokurtic- this is a distribution that is sharply peaked. The range of
scores is very narrow. There is a sudden increase followed by a sudden
decrease of the frequencies, i.e. the majority of the scores are near the mean.
It’s found in a homogeneous class.
iii) Platykurtic distribution- this is a distribution that is flatter than the normal
distribution. In the extreme case it is uniform (flat or rectangular), with the
frequencies of the scores being the same. This
could arise if the class is organized in groups (group work).

MEASURES OF CENTRAL TENDENCY


3.1 Measures of central tendency

Central tendency is a statistical measure that determines a single value which acts as the most

typical score or the best representative score for the entire data set.

The single value is used by researchers to:

a) summarize or describe a large set of data,

b) compare two or more data sets.

The three commonly used measures of central tendency are: mean, median, and mode.

a) The mode

This is the most frequent score in a data set.

- In a frequency distribution, the mode is the score corresponding to the peak or the high

point of the distribution.

- For grouped frequency distribution, mode is the mid-point of the class with the highest

frequency.

NB:

- Can be determined for data measured on any scale of measurement: nominal, ordinal,

interval, or ratio.

Possible scenarios:

i. When one value in the distribution has the highest frequency, that value is the mode.

E.g. 8,11,16,18,11,12,11 mode=11.

ii. When two adjacent scores have the same frequency and the two have the highest

frequency in the distribution. E.g. 0,1,1,2,2,2,3,3,3,4,5. (Plot the data)

Mode is found by finding the average of the two adjacent scores with the highest

frequency. Thus mode= (2+3)/2= 2.5

NB: The peak is somewhere between the two scores.

iii. When in a group of scores, two non-adjacent scores have the same frequency, and this

common frequency is greater than that of any other score in the distribution, then the

distribution has two modes. Such a distribution is called a bi-modal distribution.

E.g. 10,11,11,11, 12,12,13,13,14,14,14,15,17. (Plot the data).

iv. When more than two non-adjacent scores occur more frequently than any other score

in the distribution and with the same frequency, the distribution is said to be

multimodal.

v. All the scores in a distribution may occur with the same frequency e.g. 1,2,3,4,5,6, the

distribution is said to have no mode. NB: You don’t say that the mode is 0. Since in

some data sets, 0 is an actual value. You simply say that the distribution has no mode.
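
The five scenarios can be summarised in a small Python function. This is a sketch that follows the conventions of these notes, which may differ from what a statistics package reports; for instance, averaging two adjacent modes is the rule in scenario ii, not a universal one. The function name `modes` is illustrative.

```python
from collections import Counter

def modes(scores):
    """Return the mode(s) of the scores as a list; an empty list means 'no mode'."""
    counts = Counter(scores)
    top = max(counts.values())
    tied = sorted(x for x, f in counts.items() if f == top)

    # Scenario v: every score occurs equally often, so there is no mode
    if len(counts) > 1 and len(tied) == len(counts):
        return []

    # Scenario ii: two adjacent scores share the highest frequency, so average them
    distinct = sorted(counts)
    if len(tied) == 2 and distinct.index(tied[1]) - distinct.index(tied[0]) == 1:
        return [(tied[0] + tied[1]) / 2]

    # Scenarios i, iii, iv: one mode, two modes (bimodal), or several (multimodal)
    return tied

print(modes([8, 11, 16, 18, 11, 12, 11]))                            # [11]
print(modes([0, 1, 1, 2, 2, 2, 3, 3, 3, 4, 5]))                      # [2.5]
print(modes([10, 11, 11, 11, 12, 12, 13, 13, 14, 14, 14, 15, 17]))   # [11, 14]
print(modes([1, 2, 3, 4, 5, 6]))                                     # []
```

Returning an empty list for 'no mode' avoids the mistake of reporting the mode as 0.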

Limitations of the mode.

i. It is not always unique. A data set can have more than one mode, or have no mode at all.

ii. Mode is greatly affected by chance and has little or no mathematical usefulness.

Strengths of the mode.

i. It is the only measure of central tendency that can be used for data measured on a nominal

scale.

ii. It is used when the most typical score is desired. Hence it is very useful in analyzing

qualitative data.

iii. It is not affected by outliers.

iv. Very easy to compute.

b) Median (MD)

The middle point in a data set that has been ordered. It is only computed for data that can be

ordered in ranks, i.e. ordinal, interval, and ratio data. We usually find the median by a simple

counting procedure:

I. Median for ungrouped data

a. When N is odd, the median will be the middle value. Steps: a) List the values in

order, and b) select the middle value as median. Example: the median for this data

set 10,20,23,24,25,28,30, is 24.

b. When N is even, the median will fall between two data values. Steps a) List the

values in order, b) Locate the middle two scores, c) add the two values and divide

by 2 to obtain the median.

Example: in this data, 10,20,23,24,25,28,30,33, the median is between the scores 24

and 25. To obtain it, we just get the average of the two scores thus: (24+25)/2=

49/2= 24.5.

c. If several scores have the same value as the middle score.

E.g. 6,7,8,11,11,11,12,13,14. Here, N=9. You should circle the repeated value. If

it separates the data into two equal halves (here, three scores below it and three above it), then the value is the median.

If it does not separate the data into two equal parts, e.g. 6,7,8,11,11,11,12,13,14,15

(three scores below it and four above it), then

we should find the median using the exact median method/frequency distribution

data method.
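
The counting procedure for ungrouped data is what the `median` function in Python's standard `statistics` module does, so it can be used to check the examples above. The variable names are illustrative.

```python
from statistics import median

odd_n = [10, 20, 23, 24, 25, 28, 30]                 # N = 7
even_n = [10, 20, 23, 24, 25, 28, 30, 33]            # N = 8
tied_middle = [6, 7, 8, 11, 11, 11, 12, 13, 14, 15]  # N = 10, repeated middle score

print(median(odd_n))         # 24   (the middle value)
print(median(even_n))        # 24.5 (average of the two middle values)
print(median(tied_middle))   # 11.0 (simple counting ignores the ties)
```

For the last data set the simple counting answer is 11, which is why the notes recommend the exact median method when the repeated middle score does not split the data into two equal halves.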

II. Median for frequency distribution data.

a. One of the simplest methods for finding the median is to draw a histogram. The

goal is to draw a vertical line through the distribution so that exactly half of the

boxes are on either side of the line. The median is that score which is at the

vertical line.

b. You could also use the formula for exact median:

$$MD = L + \left( \frac{\frac{N}{2} - cfb}{fw} \right) \times ci$$

Where: L= the lower real limit of the class containing the median score.

N= total number of scores

Cfb= cumulative frequency for the class below the one containing the median

score.

Fw= frequency for the class containing the median score.

Ci= class interval.
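
The exact median formula can be sketched in Python as below. It is a minimal illustration, assuming whole-number scores (so that the lower real limit L is the lower class limit minus .5) and classes listed from the lowest to the highest. The function name and the layout of the input are made up for the example.

```python
def exact_median(classes, ci):
    """classes: list of (lower class limit, frequency), lowest class first."""
    n = sum(f for _, f in classes)
    cfb = 0                                   # cumulative frequency below the current class
    for lower_limit, fw in classes:
        if fw > 0 and cfb + fw >= n / 2:      # this class contains the median score
            L = lower_limit - 0.5             # lower real limit
            return L + (n / 2 - cfb) / fw * ci
        cfb += fw
    raise ValueError("the distribution has no scores")

# Grouped data: classes 10-14, 15-19, 20-24, 25-29, 30-34 with ci = 5
grouped = [(10, 4), (15, 7), (20, 12), (25, 17), (30, 10)]
print(round(exact_median(grouped, ci=5), 2))     # 25.09

# Ungrouped scores 6,7,8,11,11,11,12,13,14,15 treated as classes of width 1
ungrouped = [(6, 1), (7, 1), (8, 1), (11, 3), (12, 1), (13, 1), (14, 1), (15, 1)]
print(round(exact_median(ungrouped, ci=1), 2))   # 11.17
```

In the grouped example N/2 = 25, the median class is 25-29, L = 24.5, cfb = 23 and fw = 17, so MD = 24.5 + (2/17) x 5, which is about 25.09.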

Advantages of the median

a. It is relatively unaffected by outliers. It’s the best MOCT for data containing outliers.

b. The median tends to stay at the centre of the distribution even when the distribution is

‘skewed’. It is the best MOCT for skewed distributions.

c. The median is unique: a distribution can only have one median.

Disadvantage.

a. Cannot be used with nominal data

c) The Mean

- The arithmetic average of all the scores in a data set. It is calculated for numerical values,

usually measured on an interval or ratio scale. Unlike mode and median, it involves all

the scores in a distribution.

To obtain the mean, sum up all scores and then divide by the total number of scores in the set

(N). The formula for the mean is as follows:

M=ΣX/N.
Where:
M= mean of the scores
ΣX= sum of the scores
N= number of scores

Example: find the mean of the following seven scores: 2,3,6,8,9,10,12

M= (2+3+6+8+9+10+12)/7= 50/7= 7.14.

NB: The mean is the score that each individual receives when the total is divided equally

among all N individuals.

Mean for frequency distribution data

Ungrouped:

$$\bar{X} = \frac{\sum fx}{\sum f}$$

Grouped:

$$\bar{X} = \frac{\sum fx}{\sum f}$$

where x is the mid-point.
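
Both versions of the formula can be handled by one small Python function, sketched below. The ungrouped scores and frequencies are made up for illustration; the grouped example uses the mid-points of the classes 10-14, 15-19, 20-24, 25-29 and 30-34 with frequencies 4, 7, 12, 17 and 10.

```python
def fd_mean(xs, fs):
    """Mean from a frequency distribution: Σfx / Σf."""
    return sum(f * x for x, f in zip(xs, fs)) / sum(fs)

# Ungrouped (hypothetical): x is the score itself
ungrouped_mean = fd_mean([10, 9, 8, 7, 6, 4], [2, 5, 7, 3, 2, 1])

# Grouped: x is the mid-point of each class
grouped_mean = fd_mean([12, 17, 22, 27, 32], [4, 7, 12, 17, 10])

print(ungrouped_mean)   # 158 / 20
print(grouped_mean)     # 1210 / 50
```

The grouped mean (24.2 here) is an estimate, because every score in a class is treated as if it were equal to the mid-point of that class.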

Mean of means

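$$\bar{X} = \frac{(x_a \times n_a) + (x_b \times n_b) + (x_c \times n_c)}{n_a + n_b + n_c}$$

where x_a, x_b, and x_c are the means of groups a, b, and c, and n_a, n_b, and n_c are the numbers of scores in those groups.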
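
As an illustration of the mean of means, suppose three streams of a class sat the same test. The stream means and sizes below are hypothetical, and the Python sketch simply applies the formula above.

```python
# Hypothetical streams: (mean score, number of students)
groups = [(60, 10), (70, 20), (80, 30)]

total_of_scores = sum(mean * n for mean, n in groups)   # (60x10) + (70x20) + (80x30) = 4400
total_n = sum(n for _, n in groups)                     # 10 + 20 + 30 = 60
mean_of_means = total_of_scores / total_n

simple_average = sum(mean for mean, _ in groups) / len(groups)

print(round(mean_of_means, 2))   # 73.33
print(simple_average)            # 70.0
```

The two answers differ because the simple average of 60, 70, and 80 ignores the fact that the streams are of different sizes.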

Properties of the Mean


a) A distribution can only have one mean.

b) Its computation involves all the scores in a data set.

c) The mean serves as the balance point of the distribution because the sum of the

distances below the mean is exactly equal to the sum of the distances above the mean.

This makes the mean the most commonly preferred measure of central tendency.

This property gives the mean the following important characteristics.

a. Changing the value of any score in a data set will always change the mean.

I. If we add a constant to one value in the data set.

E.g. 9,3,8,7, and 5. N=5. ∑X=32. M=6.4. Suppose X=3 is changed to 6 (i.e. a constant of 3 is added to it). The new mean will

be M=7.

New mean= old mean + (constant/N).

New mean= 6.4 + 3/5= 6.4 + 0.6= 7.

II. If we add a constant to all the values in the data set

e.g. If we add 3 to all the values in our example data. We would get:

X        X+3
9        12
3        6
8        11
7        10
5        8
∑X=32    ∑(X+3)=47
X̄=6.4    X̄=9.4
Note: The new mean is 9.4 which is the same as 6.4 +3.

Therefore: New mean = old mean + constant.

NB: The sum of squared deviations from the mean is always smaller than the sum of squared

deviations from any other point in the distribution.
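
These properties of the mean can be checked numerically with the example data 9, 3, 8, 7, and 5. The Python sketch below is only a check; the helper names are illustrative.

```python
def mean(scores):
    return sum(scores) / len(scores)

def sum_sq_dev(scores, point):
    """Sum of squared deviations of the scores from a given point."""
    return sum((x - point) ** 2 for x in scores)

X = [9, 3, 8, 7, 5]
M = mean(X)                              # 32 / 5 = 6.4

# I. Add a constant (3) to one value: the score 3 becomes 6
one_changed = [9, 6, 8, 7, 5]
print(mean(one_changed))                 # 7.0, i.e. 6.4 + 3/5

# II. Add a constant (3) to every value
all_shifted = [x + 3 for x in X]
print(mean(all_shifted))                 # 9.4, i.e. 6.4 + 3

# Balance point: deviations below and above the mean cancel out
print(round(sum(x - M for x in X), 10))  # 0.0

# The sum of squared deviations is smallest when taken from the mean
for point in (6, M, 7):
    print(point, round(sum_sq_dev(X, point), 2))
```

The last loop gives 24 for the point 6, about 23.2 for the mean 6.4, and 25 for the point 7, so the mean gives the smallest total.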

Advantages of the mean

• Easy to compute.

• Easy to work with and use in further analysis.

• It is the most representative MOCT.

Disadvantages of the mean

a) It is extremely sensitive to outliers (a few scores that deviate extremely from the other

scores in a set). Outliers tend to pull the mean to the extremes of the distribution. This

is why the mean is pulled towards the tail of a skewed distribution.

b) It is not meaningful for scores measured on a nominal or ordinal scale.

Describing distribution shapes using the MOCT.

The three measures of central tendency systematically relate to one another in a distribution.

a. For normal distributions, the three MOCT are equal and therefore occur at the centre.

b. For positively skewed data, the mean is the largest MOCT, followed by the median, then the mode.

c. For negatively skewed data, the mode is the largest MOCT, followed by the median, and the

mean is the least.

3.2 Measures of dispersion/ measures of variability/ measures of spread

These are statistical measures that tell us how the scores differ from one another. They

indicate how much the scores are spread out or clustered around the central value.

Significance of measures of dispersion

a. They describe the set of scores.

b. They determine how well the average represents the other scores. NB: When a measure of variability (MOV) is small, it indicates that the scores are clustered closely (and therefore the mean is representative of the data); when it is large, it indicates that the scores are widely spread and the mean is less representative of the data.

c. To facilitate comparison.

d. To facilitate the use of other statistical measures.

The most frequently used measures of dispersion are range, interquartile range, variance, and

standard deviation.

a. Range.

The range is the difference between the highest and the lowest scores in the data set. The

formula is:

R=H-L
Where R=range
H= highest score
L= Lowest score
Range for Frequency Distribution Data:

a. Ungrouped: highest value-lowest value.

b. Grouped:

i) Upper real limit of the highest class minus lower real limit of the lowest class.

ii) Mid-point of highest class minus mid-point of lowest class.

Merits of Range

a) Easy to calculate.

b) Easy to understand.

c) Captures the entire distribution.

Demerits of Range

a) Only involves two values and therefore can be highly influenced by outliers.

b) Does not indicate the direction of variability i.e. not an accurate indicator of variability.

c) Cannot be determined for open ended distributions.

d) Cannot be used for calculating other statistical functions of the data.

QUARTILE DEVIATION/ SEMI-INTERQUARTILE RANGE

It is half the difference between the third quartile and the first quartile.

QD is related to the median, i.e. it is half the range of the middle 50% of the data.

What are quartiles?

Quartiles are the three values (Q1, Q2, and Q3) that divide an ordered list of numbers into four equal groups.

NB: 25% of the scores fall below Q1; 50% fall below Q2 (the median); 75% fall below Q3.
For ungrouped data. E.g. 15, 13, 6, 5, 12, 50, 22, 18

To get QD, you must first order the data set.

Steps:

• Arrange the data from lowest to highest.
• Find the median of the values (This is Q2).
• Find the median of the values below Q2. (This is Q1).
• Find the median of the values above Q2. (This is Q3).

Substitute in the formula:

$$QD = \frac{Q_3 - Q_1}{2}$$
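As a rough check, the steps for ungrouped data can be sketched in Python using the example scores. The sketch follows the median-of-halves method described in the steps; statistical software may use other quartile conventions and give slightly different values.

```python
from statistics import median

data = [15, 13, 6, 5, 12, 50, 22, 18]   # example data from the notes

ordered = sorted(data)                  # arrange from lowest to highest
half = len(ordered) // 2

q2 = median(ordered)                    # median of all values
q1 = median(ordered[:half])             # median of the values below Q2
q3 = median(ordered[-half:])            # median of the values above Q2

qd = (q3 - q1) / 2                      # quartile deviation
print(q1, q2, q3, qd)
```

For these scores the ordered list is 5, 6, 12, 13, 15, 18, 22, 50, which gives Q1 = 9, Q2 = 14, Q3 = 20, and QD = 5.5.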
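For grouped data.

$$Q_1 = L + \left(\frac{N/4 - cf_b}{f_w}\right) \times c$$

$$Q_3 = L + \left(\frac{3N/4 - cf_b}{f_w}\right) \times c$$

Where L = lower real limit of the class containing the quartile; cf_b = cumulative frequency below that class; f_w = frequency within that class; c = class width.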

Advantages of SIQ

- Easy to calculate.

- Not influenced by outliers.

- Can be illustrated graphically.

- Can be used for open-ended distributions.

Disadvantages

- Does not represent all the data. i.e. the upper and lower 25% of the data are not used in

the calculation.

- Does not support any other statistical analysis of the data.

Mean Deviation (MD).

Deviation score (D) = the distance of a score from the mean, i.e. $X - \bar{X}$. NB: ∑D = 0.

Mean deviation (MD) = the average of the absolute deviation scores, i.e.

$$MD = \frac{\sum |X - \bar{X}|}{N}$$
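A minimal sketch of the mean deviation follows. It borrows the five scores 9, 3, 8, 7, 5 from the section on the mean purely as illustrative data; the notes do not work this example.

```python
scores = [9, 3, 8, 7, 5]                 # illustrative data (mean = 6.4)
n = len(scores)
mean = sum(scores) / n

deviations = [x - mean for x in scores]  # deviation scores D = X - mean
sum_of_deviations = sum(deviations)      # should be 0, apart from rounding

# Mean deviation: average of the absolute deviation scores.
md = sum(abs(d) for d in deviations) / n
print(round(sum_of_deviations, 10), round(md, 2))
```

The absolute deviations are 2.6, 3.4, 1.6, 0.6, and 1.4, which sum to 9.6, so MD = 9.6/5 = 1.92.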

Advantages of the MD

- Easy to understand.

Disadvantages of the MD

- Does not support more advanced statistical procedures.

Variance.

Variance (S²) indicates the average squared distance of the scores from the mean. The greater the distances, the greater the variance. When two sets of scores have the same mean but different variances, one has a greater spread of scores than the other.

Variance is calculated using the following formula:

$$S^2 = \frac{\sum (X - \bar{X})^2}{N}$$

$$S^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N}$$

$$S^2 = \frac{\sum X^2}{N} - \left(\frac{\sum X}{N}\right)^2$$

Where S² = variance of the scores
N = number of scores
ΣX² = sum of the squared scores
(ΣX)² = square of the sum of the scores

The steps involved in working out the variance using the above formula include:

i. Find the sum of the values (ΣX). Example: For the following eight scores 11, 12, 12, 13, 14, 15, 16, 17, the ΣX is 110.

ii. Square each value and find the sum (ΣX²). Thus the ΣX² for our scores would be: 121+144+144+169+196+225+256+289 = 1544.

iii. Substitute in the formula:

$$S^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N} \quad \text{or} \quad S^2 = \frac{\sum X^2}{N} - \left(\frac{\sum X}{N}\right)^2$$

S² = 1544/8 − (110/8)² = 193 − 189.0625 = 3.9375 ≈ 3.94.

Deviation score method:

$$S^2 = \frac{\sum (x - \bar{x})^2}{n}$$

x̄ = 13.75

X      x − x̄     (x − x̄)²
11     −2.75     7.5625
12     −1.75     3.0625
12     −1.75     3.0625
13     −0.75     0.5625
14     0.25      0.0625
15     1.25      1.5625
16     2.25      5.0625
17     3.25      10.5625

∑(x − x̄)² = sum of squares (SS) = 31.5

Variance = SS/n = 31.5/8 = 3.9375 ≈ 3.94.
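The two methods can be compared with a short sketch on the same eight scores. Both divide by N, as in the notes; the variable names are illustrative.

```python
scores = [11, 12, 12, 13, 14, 15, 16, 17]   # example data from the notes
n = len(scores)
mean = sum(scores) / n                       # 13.75

# Deviation score method: average of the squared deviations.
sum_of_squares = sum((x - mean) ** 2 for x in scores)
var_deviation = sum_of_squares / n

# Raw score method: sum(X^2)/N - (sum(X)/N)^2.
sum_x = sum(scores)
sum_x2 = sum(x * x for x in scores)
var_raw = sum_x2 / n - (sum_x / n) ** 2

sd = var_deviation ** 0.5                    # standard deviation
print(var_deviation, var_raw, round(sd, 2))
```

Note that Python's `statistics.pvariance` also divides by N and should give the same value, whereas `statistics.variance` divides by N − 1 and gives a slightly larger one.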

For grouped data:

$$S^2 = \frac{\sum X^2 f}{N} - \left(\frac{\sum X f}{N}\right)^2$$

Steps:

1. Make a table with the following columns:

Table 1

A       B           C              D                       E     F
Class   Frequency   Midpoint (X)   Midpoint squared (X²)   Xf    X²f

2. Square the midpoint for each class and place the products in column D.

3. Multiply frequency by midpoint for each class, and place the products in column E.

4. Multiply the frequency by the square of the midpoint, and place the products in column F.

5. Substitute in the formula above and solve to get the variance.

6. Take the square root to get the standard deviation:

$$S = \sqrt{\frac{\sum X^2 f}{N} - \left(\frac{\sum X f}{N}\right)^2}$$
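The grouped-data steps can be sketched as follows. The frequency table here is hypothetical (it does not come from the notes) and was chosen only so that the arithmetic is easy to follow.

```python
from math import sqrt

# Hypothetical frequency table: (class, frequency f, midpoint X)
table = [
    ("10-14", 2, 12),
    ("15-19", 5, 17),
    ("20-24", 3, 22),
]

n = sum(f for _, f, _ in table)                # N is the total frequency
sum_xf = sum(f * x for _, f, x in table)       # total of column E (Xf)
sum_x2f = sum(f * x * x for _, f, x in table)  # total of column F (X²f)

variance = sum_x2f / n - (sum_xf / n) ** 2
sd = sqrt(variance)
print(n, sum_xf, sum_x2f, variance, sd)
```

With these made-up figures, ∑Xf = 175 and ∑X²f = 3185, so S² = 318.5 − 17.5² = 12.25 and S = 3.5.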

Advantages of variance.

Weaknesses of the variance

- It does not give a good descriptive measure of dispersion since it is expressed in squared units rather than in the original units of the scores.

C. Standard Deviation

The standard deviation (SD), sometimes represented by the Greek letter σ (sigma), is the

square root of the variance. It usually indicates the extent to which scores vary from the

mean. It is a very common measure in the field of testing and measurement. It is also used in

calculating the Deviation IQ.

$$SD = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N}}$$

In the example for variance, we can compute the standard deviation by finding the square root of the obtained variance. Since S² = 3.94, SD = √3.94 = 1.98.

Note: The greater the standard deviation, the more the scores tend to spread out from the mean. The smaller the standard deviation, the less the scores vary from the mean, i.e. the more homogeneous the class is.

The standard deviation considers all scores in the distribution and it facilitates calculations of

combined standard deviation, which helps to compare two or more distributions of scores. It

also facilitates other statistical computations like correlation and skewness. The standard

deviation provides a unit of measurement for the normal distribution. However, the standard

deviation is very much affected by extreme scores, it is difficult to compute and compare, and

it cannot be calculated for distributions with open classes.

MEASURES OF RELATIONSHIP
Relationship – tendency for two variables to change consistently.

THE CONCEPT OF CORRELATION

It refers to the relationship between two variables.

Questions answered through correlation.

a. Is there a relationship between the two variables?


b. What is the direction of the relationship?
c. What is the magnitude or the strength of the relationship?

Ways of establishing relationships between two variables:

a. Logical examination

This involves examining/observing a pair of data to pick out any pattern of change in the
values.

E.g. (A). 1,2,4,6,7,8. (B). 3,4,6,8,9

However, the relationship between variables is not always easily picked out by mere
observation of the data.

b. By drawing scatter plots/scatter diagrams

Scatterplot is a graph that describes the nature of the relationship between two variables by
plotting them against each other.

Scatter Diagram Procedure

i Collect pairs of data where a relationship is suspected.


ii Draw a graph with the independent variable on the horizontal axis and the dependent
variable on the vertical axis. For each pair of data, put a dot or a symbol where the x-
axis value intersects the y-axis value.
iii Look at the pattern of points to see if a relationship is obvious (NB. If the data clearly
form a line or a curve, then the variables are correlated).

Types of relationships that can be shown using scatter plots.

i. Positive relationship- this is present when the scatter plot indicates that as the values of
one variable increase, the values of the other variable also increase.

[Figure: scatter plot showing a positive relationship]

ii. Negative relationship- this is present when the scatter plot indicates that as the
values of one variable increase, the values of the other variable decrease.

[Figure: scatter plot showing a negative relationship]

iii. No relationship
This is when the scatter plot does not indicate any distinct pattern in the points of the
two variables.

Purposes of scatter plots

- Show the direction of the relationship.
- Show the strength of the relationship, i.e. the amount of scatter or dispersion in the points. The strongest possible relationship is a perfect relationship: a relationship where the actual data points fit perfectly on a straight line.
- Show linearity:
  - Linear relationship: a relationship between two variables where the points on a scatter plot are best fitted/summarized by a straight line.
  - Non-linear relationship: a relationship where the points on a scatter plot are best fitted by a curve.
- Show homoscedasticity/heteroscedasticity: whether the points stay close to the line throughout (homoscedasticity) or lie further away from it in some parts than in others (heteroscedasticity).
- Show the presence of outliers: points far away from the rest.

c. Correlation coefficient (r)

This is a quantitative measure of the relationship between two variables. It ranges from -1
to +1.

Interpreting r

You consider two things:

i. Its sign- it can either be positive or negative.

A positive correlation coefficient- indicates a positive relationship between the two variables, i.e. an increase in one variable is associated with an increase in the other variable.

A negative correlation coefficient- indicates a negative relationship between the two variables, i.e. an increase in one variable is associated with a decrease in the other.

ii. Its size/ magnitude- indicates the strength of the relationship between two variables.

How to describe the correlation.

You use words together with numbers.


0 = no relationship
1 = a perfect correlation
<0.2 = weak
0.3-0.5 = moderate
0.6-0.7 = strong
0.8-0.99 = very strong
NB: These labels describe the size of r regardless of its sign.
Common correlation coefficients

1. Pearson’s product moment correlation coefficient (rxy)


Definitional formula
a. 60,50,54,59,75
b. 71,40,53,59,72

$$r_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$$
E.g.
Table 2
X       Y       (x − x̄)   (y − ȳ)   (x − x̄)²   (y − ȳ)²   (x − x̄)(y − ȳ)
10      9       −6        −4        36         16         24
20      16      4         3         16         9          12
15      13      −1        0         1          0          0
19      17      3         4         9          16         12
16      10      0         −3        0          9          0
∑x=80   ∑y=65                       ∑=62       ∑=50       ∑=48
M=16    M=13

r_xy = 48/√(62 × 50) = 48/55.68 = 0.86.

Computational Formula

$$r_{xy} = \frac{N\sum XY - \sum X \sum Y}{\sqrt{[N\sum X^2 - (\sum X)^2][N\sum Y^2 - (\sum Y)^2]}}$$

Table 3

X       Y       X²         Y²        XY
10      9       100        81        90
20      16      400        256       320
15      13      225        169       195
19      17      361        289       323
16      10      256        100       160
∑X=80   ∑Y=65   ∑X²=1342   ∑Y²=895   ∑XY=1088

$$r_{xy} = \frac{5440 - 5200}{\sqrt{[6710 - 6400][4475 - 4225]}} = \frac{240}{278.39} = 0.86$$
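The sketch below works through both formulas on the data in Tables 2 and 3 to show that they give the same coefficient. The variable names are illustrative only.

```python
from math import sqrt

x = [10, 20, 15, 19, 16]   # X column of Tables 2 and 3
y = [9, 16, 13, 17, 10]    # Y column of Tables 2 and 3
n = len(x)

# Definitional formula: uses deviations from the two means.
mean_x = sum(x) / n
mean_y = sum(y) / n
sum_products = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
ss_x = sum((a - mean_x) ** 2 for a in x)
ss_y = sum((b - mean_y) ** 2 for b in y)
r_definitional = sum_products / sqrt(ss_x * ss_y)

# Computational formula: uses raw sums only.
sum_x, sum_y = sum(x), sum(y)
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)
sum_xy = sum(a * b for a, b in zip(x, y))
r_computational = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

print(round(r_definitional, 2), round(r_computational, 2))
```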

ASSUMPTIONS OF PEARSON'S CORRELATION COEFFICIENT.

There are five assumptions that should be met when using Pearson's correlation:

i. The variables are measured at either interval or ratio scales of measurement.
ii. The variables are approximately normally distributed.
iii. There is a linear relationship between the two variables.
iv. There is homoscedasticity of the data, i.e. the spread of the points around the line of best fit is roughly the same across the range of scores.
v. There are no outliers in the data.

THE SPEARMAN RANK ORDER CORRELATION COEFFICIENT (rs or rho)

A statistic that measures the degree of association between two ordinal variables. It is the equivalent of rxy that is usually used when the assumption of normality is not met by the data. It involves ranking each set of data and then finding the differences between the ranks. rs is computed using these differences. It establishes whether the ranks of the two sets of data are correlated. When both sets have identical ranks, rs = +1. When the two sets have exactly opposite ranks, rs = −1. If there is no relationship between the rankings, rs will be very near 0.

$$r_s = 1 - \frac{6\sum D^2}{n(n^2 - 1)}$$

Where D = the difference between the two ranks of each pair; n = the number of pairs.

Steps.

1. Rank the scores in variable x and variable y.
2. Subtract the rankings (rank x − rank y).
3. Square the differences.
4. Find the sum of the squares.
5. Substitute in the formula:

$$r_s = 1 - \frac{6\sum D^2}{n(n^2 - 1)}$$
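The steps above can be sketched in Python. The notes do not work a Spearman example, so the sketch reuses the X and Y scores from the Pearson tables purely as illustrative data. The helper `ranks` is a hypothetical name and assumes there are no tied scores.

```python
def ranks(values):
    """Rank scores from 1 (lowest) upward; assumes no tied scores."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

x = [10, 20, 15, 19, 16]   # illustrative data (X column of the Pearson tables)
y = [9, 16, 13, 17, 10]    # illustrative data (Y column of the Pearson tables)
n = len(x)

rank_x = ranks(x)                                # step 1
rank_y = ranks(y)
d = [rx - ry for rx, ry in zip(rank_x, rank_y)]  # step 2
d_squared = [diff ** 2 for diff in d]            # step 3
sum_d2 = sum(d_squared)                          # step 4

rs = 1 - (6 * sum_d2) / (n * (n ** 2 - 1))       # step 5
print(rank_x, rank_y, sum_d2, rs)
```

Here the ranks are 1, 5, 2, 4, 3 for X and 1, 4, 3, 5, 2 for Y, so ∑D² = 4 and rs = 1 − 24/120 = 0.8. Ranking from highest to lowest instead gives the same result, provided both variables are ranked the same way.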

ASSUMPTIONS OF THE SPEARMAN RANK ORDER CORRELATION COEFFICIENT.

i. The two variables are measured on at least an ordinal scale (i.e. ordinal, interval, or ratio).
ii. There is a monotonic relationship between the two variables. A monotonic relationship exists when either the variables increase in value together, or as one variable value increases, the other variable value decreases.
iii. The variables need not be normally distributed.

CAUTIONS WHEN INTERPRETING CORRELATION COEFFICIENTS.

1. A correlation is a simple number and must not be reported as a percentage.

E.g. r = 0.5 does not mean that there is a 50% relationship between the two variables.

2. Correlation does not imply causation.

Why?

1. A correlation coefficient is only a measure of relationship. It does not show whether one variable is causing the scores of the other to be as they are.
2. The two variables could be consequences of a common cause but do not cause each other, i.e. the third variable problem. The third variable is called a lurking variable: a hidden variable that causes both X and Y. E.g.
a. The number of churches in Nairobi and alcoholism.
b. The number of photocopiers in KM and retakes in EPS 400.
3. The two variables could have cyclic causation: X causes Y, and Y causes X.
4. There could be indirect causation. E.g. X causes A, and A causes Y.
5. There is no connection between X and Y, and the correlation could be coincidental.

MEASURES OF POSITION/RELATIVE STANDING


These are scores used to locate the relative position of a score in the data set. They mainly compare a student’s performance with that of a reference group, usually one’s classmates.

Common Measures of Position

a. Standard scores
b. Percentiles
c. Deciles
d. Quartiles

Standard scores- the transformation of raw scores to a desired scale with a predetermined
mean and standard deviation.
Types of standard scores

1. Z-scores

A Z-score indicates the relative position of a student in a group by showing how far the raw
score is above or below the mean using the sd as a unit of measurement.

NB: Comparing raw scores may not be possible since the exams may be different, markers,
schools, etc.

It is a score that indicates the number of standard deviations a raw score is above or below the mean in a distribution.

Formula:

$$z = \frac{x - \bar{x}}{sd}$$

Reasons for computing z-scores

1. A z-score value specifies the exact location of a raw score within a distribution.
2. Transforming raw scores into z-scores standardizes a distribution, making it possible to compare scores from different distributions.

NB: Standardization- the process of putting scores on a common scale with the same mean and standard deviation for the purposes of comparing them.

Z-scores and Location.

A raw score does not tell us how that particular score compares with other values in the
distribution.

If the raw score is transformed into a z score, the z-score value indicates exactly where a
score is located relative to all the other scores in the distribution.

When scores are converted to z-scores, we end up with numbers with two important
properties:

a. Either a + or – sign. + indicates that the X value is located above the mean. – indicates
that the X value is located below the mean.
b. The numerical value. This shows the number of std. deviations between the raw score
and the mean.

We should be able to visualize z-scores as locations in a distribution.

Advs of Z-scores

- Puts two or more different distributions on the same scale.
- Enables us to directly compare individual scores from different distributions.
- Allows us to calculate the probability of a score occurring within a normal distribution.

Disadvantages of Z-scores
- The values of z-scores are normally very small. Most of them range from −3 to +3.
- Burdensome since they involve many decimal values and negative numbers. If one forgets to put the − sign, the entire meaning of the score changes.
- The z-score distribution has a mean of 0. This is quite difficult for laymen to interpret.

To overcome these challenges, we convert raw scores into other standard scores.

Raw scores are transformed into standard scores using the following procedure:

i. Convert the raw score into a z-score.
ii. Then use the formula:

Standard score = mean of the standard score scale + (standard deviation of the standard score scale multiplied by the z-score).

i.e. Std. score = X̄ + SD(z-score), where X̄ and SD are the mean and standard deviation of the new scale.

The following are the common standard scores and their respective means and standard
deviations:

    Standard score   Mean   Standard deviation
1   T-score          50     10
2   Stanines         5      2
3   IQ               100    15
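The two-step procedure and the table above can be sketched as below. The raw score of 69 on a test with mean 75 and SD 7.5 is used as an illustrative value, and the function names are hypothetical.

```python
def z_score(raw, mean, sd):
    """Step i: number of standard deviations the raw score is from the mean."""
    return (raw - mean) / sd

def standard_score(z, scale_mean, scale_sd):
    """Step ii: place the z-score on a scale with the given mean and SD."""
    return scale_mean + scale_sd * z

z = z_score(69, 75, 7.5)               # illustrative raw score, mean, and SD

t_score = standard_score(z, 50, 10)    # T-score scale
iq = standard_score(z, 100, 15)        # deviation IQ scale
stanine = standard_score(z, 5, 2)      # stanine scale, before rounding

print(z, t_score, iq, stanine)
```

For this score z = −0.8, which corresponds to a T-score of 42, a deviation IQ of 88, and a stanine value of 3.4. Stanines are usually reported as whole numbers, so a value like 3.4 would be rounded in practice.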

Normalized standard scores- are standard scores based on distributions that were not originally normal, but were transformed into normal distributions.

NB: Normalization is the process of trying to make a distribution of scores as close as possible to a normal distribution. This is attained by smoothing the distribution, stretching it, or condensing irregularities/departures from normality in the raw score distribution.

Examples of Normalized standard scores

Stanines. Mean=5; SD=2.

To transform raw scores into stanines, we use the following procedure:

i. Convert the raw score into a z-score.
ii. Then use the formula: Stanine = 5 + 2(z-score).

NB: Stanines have a range of 1-9.

Z-scores and standardized distributions

When an entire data set is transformed into z-scores, the resulting distribution of z-scores will always have a mean of 0 and a sd of 1. If the original scores were normally distributed, it will be a standard normal distribution.
NB: Standardization is a form of linear transformation of the scores. In a linear
transformation the shape of the original distribution and the location of any individual score
relative to others in the distribution are not affected.

NB: All normally distributed variables can be transformed into standard normal distributions
using the formula for the z- scores.

We need to know the properties of the standard normal distribution and how to locate scores on it.

a) Properties.

NB: We must know these properties in order to solve problems involving distributions that are approximately normal.

Can you remember the properties of the standard normal distribution?

• Bell-shaped
• Mean, median, and mode located at the centre.
• Unimodal
• Symmetric
• The curve is continuous
• The curve never touches the x axis (it is asymptotic).

Statistical Properties
• Has a mean of 0 and sd of 1.
• The total area under the curve is 1 or 100%.
• 50% or 0.5 of the area lies above the mean and 50% below the mean.
• 34.13% of the total area under the curve lies between the mean and one SD on either side of the mean.
• 47.72% of the total area under the curve lies between the mean and two SD on either side of the mean.
• 49.87% of the total area under the curve lies between the mean and three SD on either side of the mean.
• About 68.26% of the total area under the curve lies within one SD of the mean. (i.e. 34.13 × 2).
• About 95.44% of the total area under the curve lies within two SD of the mean. (i.e. 47.72 × 2).
• About 99.74% of the total area under the curve lies within three SD of the mean. (i.e. 49.87 × 2).
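The areas listed above normally come from a standard normal table. As a rough check, they can also be computed with Python's standard library (`statistics.NormalDist`, Python 3.8 or later). The helper name `area_mean_to_z` is illustrative only.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)   # standard normal distribution

def area_mean_to_z(z):
    """Area under the curve between the mean (z = 0) and a positive z."""
    return std_normal.cdf(z) - 0.5

for k in (1, 2, 3):
    one_side = area_mean_to_z(k)
    # Percentage between the mean and k SD, and within k SD on both sides.
    print(k, round(one_side * 100, 2), round(2 * one_side * 100, 2))
```

The one-sided areas agree with the table values 34.13%, 47.72%, and 49.87%. The two-sided figures can differ in the last digit from table-based values such as 68.26%, because the table method rounds each half before doubling it.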

b) Locating scores on the standard Normal Distribution

We locate scores on the standard normal distribution by finding areas under the standard
normal distribution curve.

Steps

i. Draw a picture of the curve.
ii. Shade the desired area.
iii. Get the area from the table.

Possible areas under the normal distribution curve.

• Between 0 and any z score value: look up the z score value to get the area.
• In any tail: Look up the Z score value to get the area. Then subtract area from 0.50.
• Between two Z-score values on the same side of the mean. Look up both z score
values to get the areas, then subtract the smaller area from the larger area.
• Between two z score values on opposite sides of the mean, look up both z-score
values to get the areas, then add the areas.
• To the left of any z value where z is greater than the mean: look up the z value to get
the area, then add 0.50.
• To the right of any z value, where z is less than the mean, look up the value in the
table to get the area, then add 0.50 to the area.
• In any two tails. Look up the z values in the table to get the areas. Subtract both areas
from 0.50, then add the answers.

E.g. In a class of 100 students, the mean of an English test is 75 and the standard deviation is 7.5. How many students scored below a score of 69?

Procedure:

i. Convert the score to a z-score.
ii. Look up the area under the normal distribution curve between the mean and that z-score.
iii. Subtract the area from .50.
iv. Multiply the answer by N.
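The four-step procedure can be followed in a short sketch. It assumes, as the procedure does, that the English scores are normally distributed, and it uses `statistics.NormalDist` in place of the printed table.

```python
from statistics import NormalDist

n_students = 100
mean, sd = 75, 7.5
score = 69

z = (score - mean) / sd                          # step i: z-score
area_to_mean = NormalDist().cdf(abs(z)) - 0.5    # step ii: area between mean and z
tail_area = 0.5 - area_to_mean                   # step iii: area in the lower tail
count_below = tail_area * n_students             # step iv: multiply by N

print(z, round(area_to_mean, 4), round(tail_area, 4), round(count_below))
```

Here z = −0.8, the area between the mean and z is about 0.2881, and the tail area is about 0.2119, so roughly 21 of the 100 students scored below 69.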

How to convert a z-score to the corresponding raw score.

E.g. What is the raw score corresponding to a z-score of 1.96 in the English test?

Procedure.

i. Determine whether the z-score is above or below the mean. Recall: z(sd) is the distance from the mean. From the formula z = (X − X̄)/sd, we find that X = z(sd) + X̄.
ii. The score is 1.96(7.5) above the mean of 75,
i.e. X = 1.96(7.5) + 75.
X = 89.7.
How to find a z-score when we know the proportion under a normal curve.

E.g. What score must a student get in the English test to be among the top 5% of the class?

The top 5% lies in the upper tail, so the area between the mean and the cut-off score is .50 − .05 = .45.

Therefore we look up the z-score value that gives us the proportion .45.

z = 1.65.

Then work out the raw score.

If z = (X − X̄)/sd, then X = z(sd) + X̄.

X = 1.65(7.5) + 75 = 87.38.
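Both conversions (z-score to raw score, and proportion to raw score) can be sketched as follows for the English test with mean 75 and SD 7.5. The call to `NormalDist().inv_cdf` stands in for reading the table backwards; the variable names are illustrative.

```python
from statistics import NormalDist

mean, sd = 75, 7.5

# z-score to raw score: X = z(sd) + mean.
raw_from_z = 1.96 * sd + mean

# Top 5%: 95% of the scores lie below the cut-off.
z_table = 1.65                          # value read from the printed table
cutoff_table = z_table * sd + mean

z_exact = NormalDist().inv_cdf(0.95)    # unrounded z for the same proportion
cutoff_exact = z_exact * sd + mean

print(round(raw_from_z, 1), round(cutoff_table, 2), round(cutoff_exact, 2))
```

The unrounded z is about 1.645, so the computed cut-off is slightly lower than the table-based 87.38. The difference comes only from rounding z to two decimal places.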

Question.

In a class of 240 students, the mean for the end of year Agriculture test was 70 and the
standard deviation was 9. Assuming that the scores were normally distributed:

a) How many students scored between 50 and 75?
b) How many students scored above a score of 69?
c) How many students scored below a score of 40?
d) What were the cut off scores for the middle 50% of the class?
e) Due to limited facilities, the teacher can only promote 85 % of the top students to the next class. What is the minimum score a student had to obtain to be selected for the next class?
C. Percentiles

Percentile- position in hundredths that a score holds in the distribution.

Percentiles indicate what percentage of scores fall below a particular score in a distribution.

Percentile rank- this is the percentage of individual scores in a data set that fall at or below a
given score. It is mainly used in norm-referenced scores.

Percentile point- A score below which a certain percentage of scores fall in a given
distribution.

When we know the mean and sd of a distribution, we can find percentiles using standard
normal distribution tables.

PART TWO: EDUCATIONAL EVALUATION

DEFINITIONS
A. Test- A device or procedure in which a sample of an individual’s behaviour is obtained, evaluated, and scored using standardized procedures.
B. Examination- A collection of tests, which measure different traits of the individual in
order to facilitate decision-making.
C. Measurement- A process of assigning numbers to represent objects, traits, attributes,
or behaviours following a set of rules.
- An educational test is a measurement device and therefore involves rules for
assigning numbers that represent an individual student’s performance.
D. Assessment is the systematic process of collecting and integrating information in a
manner that promotes understanding of students’ characteristics and decision making
in the education process. It is accomplished by use of tests and other techniques.
E. Testing- this is the process of administering, scoring, and interpreting an instrument.
Testing is a component of the assessment process.
F. Psychometrics- the science of psychological measurement.
G. Evaluation is the process of making value judgements about students and the learning
process based on the information gathered through measurements.
There are two major types of evaluation.
Formative Evaluation
This type of evaluation is done during the teaching-learning process with the goal of monitoring students’ learning and providing ongoing, specific feedback to them.
Formative evaluations are generally low stakes, which means that they have low or no point value.
Examples (students may be asked to):
- Draw a concept map in class to represent their understanding of a topic.
- Submit one or two sentences identifying the main point of a lecture.
- Turn in a research proposal for early feedback.
Specific uses of formative evaluation in education.
1. It monitors how well the instructional goals and objectives are being met (by the teacher, learners, and curriculum designers).
2. It determines students’ strengths and weaknesses, i.e. areas mastered and those not mastered. This informs the learning interventions to be put in place.
3. It facilitates learning by allowing learners to master the required skills and knowledge.
4. It motivates students.
5. It facilitates the analysis of the effectiveness of the teacher and learning resources.
Summative evaluation
It is done at the end of an instructional period. It involves comparing students’ performance against some standard or benchmark. It typically involves summarizing student performance, especially through a numerical or letter grade, e.g. A, B, C, D. Summative evaluations are often high stakes. High-stakes tests are those tests that are used in ways that have important consequences for the student. Such tests affect decisions regarding whether the student will be promoted, admitted, or allowed to graduate.
Examples
i. A midterm exam
ii. Final project
iii. A journal paper
iv. End-of-semester exams
Specific uses of Summative Evaluation
a. It guides students’ efforts and activities in subsequent courses (formative use).
b. Provides a summary of students’ achievement/progress to parents.
c. It shows the value or quality of learning outcomes.
d. Enhances accountability.
Participants in the Assessment Process
a. Test developers- people who develop the tests. NB: Not all developers are
professionally trained. They have professional and ethical responsibilities.
b. Test users- people who use the tests. Those who select, administer, score, interpret
and use the results. E.g. teachers, psychologists, counsellors, employers, professional
licensing boards, parents, etc. NB: Not all users are professionally trained. They also
have professional and ethical responsibilities.
c. Test takers- people who take/do/write the tests. They have their rights.
d. Test marketers - Those who market assessment products and services.
e. Those who teach others about the assessment process.
CRITERIA FOR CLASSIFYING ASSESSMENT PROCEDURES
There are four basic ways of classification:
1. Nature of assessment
a. Maximum performance –determines what an individual can do when performing
at his/her best. E.g. aptitude test; achievement tests.
b. Typical performance- determines what an individual will do under normal
conditions. Attitude & interest inventories.
2. Form of assessment-
a. Fixed choice- student selects the response to questions from available options.
E.g. MCQs.
b. Constructed response- student constructs extended response in response to
complex task. E.g. Essays.
c. Performance assessment- Assessment which requires the student to demonstrate
knowledge or skill through activities that are mostly direct, active, and hands-on,
such as giving a speech, performing a task, or producing an artistic product.
3. Uses in the classroom-
a. Placement assessment- used to determine student performance at the beginning
of teaching-learning process.
b. Diagnostic assessment- Assessment carried out to find out the underlying causes
of student’s learning difficulties.
c. Formative vs summative-
4. Method of interpreting results

a. Norm-referenced- tests that measure performance interpretable in terms of a student’s relative standing in some known group.
b. Criterion-referenced- tests that measure performance interpretable in terms of a clearly defined and specific domain of learning.

Similarities
A. Both require specification of the achievement domain to be covered.
B. They use same type of questions.
C. Both require a relevant and representative sample of test items.
D. Both use the same rules of item writing.
E. They are both judged by the same qualities of goodness.
F. They are both useful in educational assessment.

Differences

Norm-referenced | Criterion-referenced
Typically cover a large domain of learning with a few items measuring each specific task. | Typically focus on a specific domain of learning tasks with a relatively large number of items measuring each task.
Emphasize discrimination among learners in terms of relative levels of learning. | Emphasize description of the learning tasks learners can or cannot perform.
Interpretation requires a clearly defined group. | Interpretation requires a clearly defined achievement domain.
Prefer items of average difficulty and typically omit very easy and very hard items. | Match item difficulty to learning tasks, without omitting very hard or very easy items.

Other terms used to describe tests

Individual vs group
Speed vs power
Objective vs subjective
5. Alternative assessment- Non-traditional ways of gathering information about the
student. This may include: portfolios, observations, samples of client’s, or group
projects.

Needs assessment-Inquiry into the current state of knowledge, resources, or practice with the
intention of taking action, making a decision, or providing a service based on the results.

Self-assessment- Personal rating of ability according to specified criteria.

Criteria for classifying tests


(Check Module pg. 106)
Purpose and role of evaluation in Education
i. Student evaluations- enables educators to monitor students’ progress and provide
constructive feedback to them. E.g. they involve assigning students grades to reflect their
academic progress or achievement.
ii. In classroom instruction- provides information that helps teachers to modify and
improve their teaching practices.
• Assessment clarifies the nature of intended learning outcomes-
• Provides short-term goals to work toward
• Provide feedback concerning learners’ mastery
• Helps teachers determine what to teach, how to teach, and how effective their teaching
was, and how to provide feedback to their students.
• Formative assessment- it monitors learners’ progress and helps teachers ‘size up’ their students.
• Placement assessment- at the beginning of instruction (measures entry behaviour).
• Diagnostic assessment- identifies learners’ strengths and weaknesses; and causes of
learning problems. Helps teachers provide information about how to overcome learning
difficulties.
• Summative assessment- measures end-of- course achievement.

iii. Selection, placement, and classification decisions- provides information that helps educators to select, place, and classify students.
• Selection decisions involve some individuals being accepted while others are rejected. E.g. admission to universities, high schools, etc.
• Placement decisions involve situations in which students are assigned to various categories that represent different educational tracks or levels that are ordered in some way.
• Classification decisions refer to situations in which students are assigned to different categories that are not ordered. E.g. classification of special education students as VI, HI, Emotionally Disturbed, etc.
iv. Policy decisions- many administrative decisions are made at different levels. They involve evaluating the curriculum, instructional practices, levels of funding, employee recognition and benefits, as well as accountability.
v. Counselling and guidance decisions- school counsellors promote the self-understanding of students and help students plan for their future.
PREPARATION OF EDUCATIONAL MEASUREMENT OBJECTIVES
Educational objectives- A.k.a Instructional / learning objectives.
Educational goals i.e. what you hope the students will learn or accomplish.
Educational objectives are the foundations of assessment.
ROLE OF OBJECTIVES IN EDUCATION AND EVALUATION
Objectives:
i. Direct the instructional process- by describing the intended results of instruction.
ii. Communicate the intent of instruction to others (students, parents, school personnel,
public, ministry, etc).
iii. Provide a basis for assessing student learning- describing the performance to be
measured.
iv. Make it easier to develop fair, valid, and comprehensive tests.
v. Enhance quality of teaching and learning.
vi. Guide students’ self-assessment of their learning as well as self-management of learning opportunities.

Characteristics of Educational Objectives:


Educational objectives possess three prominent characteristics:
a. Scope- how specific or broad the objective is. There should be a balance between the
broad and narrow objectives. How? Use an intermediate level of specificity or write
broad objectives then break them down into specific ones.
b. Domain - cognitive, affective, or psychomotor domain.
c. Format- behavioural or non-behavioural.
Behavioural objectives specify activities that are observable and measurable.
Non-behavioural objectives specify activities that are unobservable and not directly measurable.

BLOOM’S TAXONOMY
Taxonomy- system of classification.
Bloom’s taxonomy- a system of classification of educational objectives that was developed
by Benjamin Bloom. It gives the cognitive behaviours that can relate to the development of
academic competence among learners.
Learning objective domains
There are three learning objective domains:
1. Cognitive (knowing): Knowledge, reasoning, problem solving.
2. Psychomotor (doing): Communicating, documenting, and assessing.

3. Affective (feeling): Shows sensitivity.
Considering Bloom’s Taxonomy when writing learning objectives ensures they are:
- Demonstrable in a tangible way
- Achievable in the time frame
- Descriptive of essential learning
- Appropriate for the level of learners’ competence.
Criteria for selecting appropriate objectives:
- They should include all important outcomes of the course.
- They must be consistent with the general goals of education.
- They should be consistent with the sound principles of learning.
- They should be realistic in terms of students’ abilities, time, and facilities.
- They should be specific and measurable.
The Cognitive Domain. Six basic objectives are listed in Bloom’s taxonomy of thinking or
cognitive domain.

1. Knowledge: Remembering or recognizing something without necessarily
understanding, using, or changing it.
2. Comprehension: Understanding the material being communicated without
necessarily relating it to anything else.
3. Application: Using a general concept to solve a particular problem.
4. Analysis: Breaking something down into its parts.
5. Synthesis: Creating something new by combining different ideas.
6. Evaluation: Judging the value of materials or methods as they might be applied in
a particular situation.

Table of Specifications (TOS) Aka Test Blueprint

This is a two-way grid/chart which defines the scope and emphasis of a test by relating the
course content to the instructional objectives.

Its columns list the performance objectives at each level of the cognitive domain as described
in Bloom’s Taxonomy. Its rows list the key concepts/content measured by the test.

The TOS should be prepared before the test is constructed, preferably before the actual teaching.

Steps followed in constructing a TOS

a. Choose measurement goals and content domains to be covered.
b. Break down domains into key/fairly independent parts.
c. Construct the two-way chart. Place the content classification on the left-hand side of
the chart and the cognitive domains across the top of the chart.
d. Find the totals or percentages for each content category (placed at the right-hand side
of the chart) and for each cognitive domain (placed at the bottom of the chart).

                              Cognitive Domain
Content      Knowledge  Comprehension  Application  Analysis  Synthesis  Evaluation  Total
Climate          1            2             3           1         4          2         13
Environment      2            3             1           1         2          3         12
Total            3            5             4           2         6          5         25
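
Step (d) above can be checked with a short script. The following is a minimal sketch in Python that recomputes the row totals, column totals, and grand total directly from the cell counts in the chart. The names `LEVELS`, `tos`, and `tos_totals` are illustrative assumptions, not part of the course notes.

```python
# Cognitive levels across the top of the chart (Bloom's cognitive domain).
LEVELS = ["Knowledge", "Comprehension", "Application",
          "Analysis", "Synthesis", "Evaluation"]

# Number of items per content area, listed in the same order as LEVELS.
tos = {
    "Climate":     [1, 2, 3, 1, 4, 2],
    "Environment": [2, 3, 1, 1, 2, 3],
}


def tos_totals(table):
    """Return (row_totals, column_totals, grand_total) for a two-way chart."""
    row_totals = {content: sum(counts) for content, counts in table.items()}
    column_totals = [sum(column) for column in zip(*table.values())]
    grand_total = sum(row_totals.values())
    return row_totals, column_totals, grand_total


row_totals, column_totals, grand_total = tos_totals(tos)

# Report each content area with its share of the whole test.
for content, total in row_totals.items():
    share = 100 * total / grand_total
    print(f"{content}: {total} items ({share:.1f}% of the test)")

# Report the emphasis given to each cognitive level.
for level, total in zip(LEVELS, column_totals):
    print(f"{level}: {total} items")

print("Grand total:", grand_total)
```

Recomputing the totals from the cells is a quick way to confirm that the row totals and column totals agree on the same grand total.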

Importance of the Table of Specifications.

1 It ensures that the test content matches the curriculum content.
2 It helps the teacher to be more organized in teaching, especially in the selection of
educational objectives, assessment procedures, and resources.
3 It encourages the teacher to use items of varying difficulty.
4 It enhances the content validity of a test- guides test developers not to under-test or
over-test any area, or include irrelevant concepts.
5 It informs students of what to expect in tests, thus improving their learning and
revision for tests.

4.1 Qualities of a Good Test


- Reliable
- Valid
- Fair
- Practical value

4.1.1 Reliability

4.1.2 Meaning
- The ability of a test to consistently measure what it is measuring.
- The degree to which test scores are consistent, dependable, and repeatable.
4.1.3 Methods of estimating reliability.
The major methods of determining reliability focus on different types of consistency:
- consistency over a period of time,
- over different forms of the assessment,
- within the assessment itself, and
- over different raters.
The major methods of estimating reliability include:
Test-retest reliability- It is the extent to which a test yields the same scores when it is
administered to the same group of test-takers on two different occasions. Procedure: Give
the same test to the same group of test-takers with a time interval between the tests (usually
a short period of two weeks to a month) and then correlate the two sets of scores.

Equivalent/alternate/parallel forms- this is a measure of equivalence. It involves
administering two forms of a test to the same group of individuals at almost the same time
and then correlating the scores. A high correlation coefficient indicates that the two forms are
measuring the same type of performance. A low correlation coefficient would indicate that
the two versions are not measuring the same thing or that they differ in degree of difficulty.

Test-retest with equivalent forms- This method is a measure of stability and equivalence. It
involves giving two forms of the test to the same group with an increased time interval
between the two forms and then correlating the two sets of scores.
Split-half reliability- this is estimated by administering a test and then dividing it into two
equivalent parts that are scored independently (e.g. odd numbered items vs even numbered
items). The scores on the two parts are then correlated. The correlation
coefficient indicates the degree to which consistent results are obtained from the two parts of
the test and therefore adequacy of content sampling.
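
The split-half procedure can be sketched in a few lines of Python. This is an illustration only: the response data are hypothetical (1 = correct, 0 = wrong), the function names `pearson_r` and `split_half_scores` are assumptions made for this sketch, and the Pearson correlation is used as the correlation coefficient. The same `pearson_r` function could be applied to two sets of scores from the test-retest or equivalent-forms methods.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def split_half_scores(item_scores):
    """Score the odd-numbered and even-numbered items of one student separately."""
    odd = sum(item_scores[0::2])   # items 1, 3, 5, ...
    even = sum(item_scores[1::2])  # items 2, 4, 6, ...
    return odd, even


# Hypothetical responses: five students, six items each.
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 1, 0],
]

halves = [split_half_scores(student) for student in responses]
odd_scores = [odd for odd, _ in halves]
even_scores = [even for _, even in halves]

r_halves = pearson_r(odd_scores, even_scores)
print(f"Correlation between the two halves: {r_halves:.2f}")
```

A coefficient close to +1 suggests that the two halves rank the students consistently. A value near 0 suggests that the halves are not measuring the same thing.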

Inter-rater reliability- this measures consistency of scorers’ judgements when scoring a
test. It is estimated by administering the test once and having two or more individuals
independently score each student’s responses in the test. The scores assigned by the scorers
are then correlated. This method only reflects differences due to the individuals scoring the
test. It is highly applicable for essay tests. A major limitation of the approach is that it is
affected by differences due to raters or scorers.
How to improve the reliability of a test.
i. Standardize the conditions under which the test is taken.
ii. Use a sufficient number of items or tasks in a test.
iii. Clearly explain requirements for responding to test items.
iv. Identify specific criteria in advance for scoring students’ essay items. Score all
students’ responses to a particular item before moving to a second one.
v. Avoid being influenced by your expectations of your students and score tests
anonymously.
Factors influencing reliability coefficients
i. Test length. A longer test is more reliable than a short one.
ii. Speed- Speed tests are not as reliable as power tests. In a speed test not every student
is able to complete all of the items within the given time. In a power test every student
is able to complete all the items.
iii. Group homogeneity- The more heterogeneous the group of students who take the
test, the more reliable the scores are likely to be.
iv. Item difficulty- very easy or very difficult tests have little reliability.
v. Objectivity- Objectively scored tests show higher reliability than subjective tests.
vi. Variation with the testing situation. Errors in the testing situation (e.g., students
misunderstanding or misreading test directions, noise level, distractions, and sickness)
can cause test scores to vary.
vii. Students’ internal factors, e.g. health, motivation, and anxiety.
Validity
The degree to which a test actually measures what it is supposed to measure.
Types
Face validity- whether a test appears superficially to test what it is supposed to measure.
Content validity- a test’s ability to accurately sample the content taught in class and
measure the extent to which learners understand it.
Predictive validity- a test’s ability to gauge future performance.
Construct validity- the extent to which a test accurately measures a characteristic that is
not directly observable.
How to improve validity of a test
- Specify the instructional domains to be used to measure student achievement.
- Select a representative sample of relevant learning tasks.
- Control for extraneous variables.
- Use standardized instructions.
- Arrange test items properly.
- Let the test be of moderate difficulty.
- Have similar scoring procedures.

ITEM FORMAT FOR WRITTEN TESTS
Item format- the form, plan, structure, arrangement, and layout of individual test items.
Two broad types of item formats:
a. Selected response format- test items that require students to pick the correct answer
from the available options. E.g. multiple choice questions, true/false, matching tests.
b. Constructed response format- test items that ask students to create an answer by
writing out information. E.g. fill in the blank spaces, essays, etc.
FACTORS TO CONSIDER WHEN SELECTING TEST FORMAT
i. Purpose of the test.
ii. Time available to prepare and mark the test.
iii. Number of students to be tested.
iv. Physical facilities available for reproducing the test.
v. Teacher’s skill at writing different types of tests.
vi. The subject being tested.
vii. The level of examinees.
1. Selected response items
Multiple choice tests (MCTs) are the most popular. A multiple choice item has two parts: the
stem and the alternatives.
Stem- a question or an incomplete statement.
Alternatives- these are possible answers.
Two types of alternatives:
Answer- the correct alternative
Distracters- incorrect alternatives (they serve to ‘distract’ students who actually do not know
the answer).
Forms of MCT
a. Direct-question type- the stem is a direct question. E.g. Which is the longest river in
Africa?
b. Incomplete sentence type- the stem is an incomplete statement. E.g. The longest river in
Africa is ____________.
Guidelines for writing MCT
a. The item should contain all the information necessary to understand the problem/question
b. Provide between three and five alternatives. This reduces the chance for guessing the
correct answer.
c. Keep the alternatives brief and arrange them in an order that promotes efficient scanning.
d. Avoid negatively stated stems as much as possible
e. There should be only one correct alternative
f. Stems should not have information that suggests the answer
g. All alternatives should be grammatically correct relative to the stem
h. No item should reveal the answer to another item.
i. All distracters should appear plausible.
j. Randomly place the correct answer in different positions.
k. Minimize use of ‘none of the above’, and avoid using ‘all of the above’.
l. Limit use of always and never in the alternatives.
m. Avoid using the exact textbook wording.
n. Organize the test in a logical manner.
o. Carefully consider the number of items in your test.
Advantages of multiple-choice items.
a. Can be used to assess achievement in a wide range of areas.
b. Marking is easy, objective, and reliable.
c. They efficiently sample a wide range of content (wide content coverage).

d. Distracters provide diagnostic information.
e. Can assess both simple and complex learning outcomes.
Disadvantages of multiple-choice items
a. Time consuming to construct
b. Not effective for measuring some educational objectives. E.g. creativity, organization,
problem solving, or verbal skills.
c. Scores can be influenced by reading ability
d. May encourage rote learning and guesswork
e. Does not allow students freedom of expression
2. Constructed-response items.
Short answer and essay questions are the most common.
Essay items- test items that pose a question for the student to respond to in a written
format.
Types of essay items
a. Restricted-response items- highly structured and clearly specify the form and scope
of the student’s response. They typically require students to list, define, describe, or give
reasons. They may specify time and length of the response. E.g. List the three types of
muscle tissue and state four functions of each.
b. Extended-response items- provide more freedom and flexibility in how students can
respond to the item. They do not limit the response in terms of form and scope. They
typically require students to compose, summarize, formulate, compare, interpret, etc.
E.g. Summarize the major forms of validity.
They provide less structure and this promotes creativity, integration, organization,
analysis and synthesis of the information.
Advantages of essay tests
i. They take less time to prepare.
ii. Effective for measuring higher-level cognitive skills.
iii. They largely eliminate guessing.
iv. They effectively assess knowledge of content, grammar, and writing ability and
style.
Disadvantages
i. Tedious to mark and the essays are difficult to score in a reliable way.
ii. Scoring of essays may be influenced by irrelevant characteristics like students’
academic history, handwriting, grammatical errors, bluffing, etc.
iii. May only sample limited content due to the time required to answer.
Guidelines for writing effective essay questions
a. Ask questions in a simple and straightforward manner.
b. Consider carefully the amount of time students will need to answer the essay
questions.
c. Let students respond to the same set of items.
d. Use more restricted-response items than extended-response items.
e. Structure and clarify the task.
f. Limit the use of essay to educational objectives that cannot be assessed using
selected-response items.

ITEM ANALYSIS

This is a set of statistics that can be computed for each test item to assess its quality. The
choice of item analysis method depends on the purpose of the test and the person designing
the analysis.
Purpose of item analysis.
i. Item analysis data provides a basis for class discussion of the test results.
ii. Item analysis data provides a basis for remedial work focused on students’ areas
of weakness.
iii. It can help improve classroom instruction.
iv. It increases teachers’ understanding and skills in test construction.
v. It helps us understand why a test has specific levels of reliability and validity and
why test scores can be used to predict some criteria and not others.
vi. It may also suggest ways of improving the measurement characteristics of a test.
Basic questions in item analysis.
a. How many people chose each response? Recall that there is only one correct answer.
The incorrect responses are called distracters. Therefore, examining the total pattern
of responses to each item of a test is referred to as distracter analysis.
b. How many people answered the item correctly, i.e. were the items of appropriate
difficulty? To answer this question, we conduct an analysis of item difficulty.
c. Are responses to the item related to the responses to other items in the test?
Answering this question entails an analysis of item discrimination.
Item difficulty
In achievement tests, item difficulty, also called the p value, is commonly measured as the
percentage or proportion of students who answered an item correctly. It can be expressed
as a decimal, percentage, or fraction. It ranges from 0 to 1 (for decimal or fraction) or from
0 to 100% (for percentage).
Formula for p.
P value = number of students answering an item correctly / number of test-takers.
$$p = \left(\frac{R_u + R_l}{N_u + N_l}\right) \times 100$$
Where: $R_u$ = number of students in the upper group who got the item right
$R_l$ = number of students in the lower group who got the item right
$N_u$ = number of students in the upper group who actually attempted the question
$N_l$ = number of students in the lower group who actually attempted the question.
How to interpret Item difficulty of Test Scores.
P=0 indicates that all students chose the wrong answers.
P=1 indicates that everyone got the correct answer.
An item with a p value of 0 or 1 is undesirable since it would not show individual
differences among the learners and is of no value from a measurement perspective.
The optimal difficulty index is 0.50. Items with p ranging from 0.30 to 0.70 are
considered good in an MCT.
The item discrimination index
Item discrimination is abbreviated as D. It is simply the difference between the proportion of
high achieving students who got an item right and the proportion of low achieving students who
got the item right. We can compute the discrimination index by subtracting the proportion of
students who got the item correct in the lower group ($R_l/N_l$) from the proportion of students
who got the item correct in the upper group ($R_u/N_u$).

Thus:
$$D = \frac{R_u}{N_u} - \frac{R_l}{N_l}$$
How to interpret D.
D is usually expressed as a decimal and it can range from -1.0 to +1.0.
If it is positive the item has positive discrimination, i.e. a larger proportion of the more
knowledgeable students got the item right than of the weaker students.
If it is 0 the item has zero discrimination. This is possible when a test item is too difficult, too
easy, or ambiguous.
If it is negative the item has negative discrimination, i.e. a larger proportion of the weaker
students got the item right than of the more knowledgeable students.
We can interpret the quality of test items following these guidelines:
D                  Quality of item
0.40 and above     excellent
0.30-0.39          good
0.20-0.29          fair
0.00-0.19          poor
Negative values    poor
Therefore, for items to be considered of good quality, they must have at least a D of 0.30.
However, items with D of 0.20 and above are acceptable in classroom tests for various
reasons.
Items with negative discrimination values should be reviewed. A negative discrimination
value, like a low difficulty value, may have several possible causes, such as a miskeyed
item or an item that is ambiguous.

Example:
A group of 140 examinees responded as shown below in a test item:
               A     B     C     D     Omit   Total
Upper group    3     57    6     4     0      70
Lower group    12    34    14    4     6      70

Find p and D.
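The calculation takes B as the correct answer. Six students in the lower group omitted the
item, so only 70 - 6 = 64 of them actually attempted it.
p = (57 + 34)/(70 + 64) x 100 = 91/134 x 100 = 67.91%.
D = 57/70 - 34/64 = 0.814 - 0.531 = 0.283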
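
The worked example can be verified with a short Python sketch. The function names (`item_difficulty`, `item_discrimination`, `rate_item`) and the choice of B as the keyed answer are assumptions made for illustration. The formulas and the quality bands follow the definitions given in these notes, with students who omitted the item excluded from the number who attempted it.

```python
def item_difficulty(r_upper, r_lower, n_upper, n_lower):
    """Difficulty index p, as a percentage of students who attempted the item."""
    return 100 * (r_upper + r_lower) / (n_upper + n_lower)


def item_discrimination(r_upper, r_lower, n_upper, n_lower):
    """Discrimination index D: upper-group proportion minus lower-group proportion."""
    return r_upper / n_upper - r_lower / n_lower


def rate_item(d):
    """Map D to the quality bands in the guideline table of these notes."""
    if d >= 0.40:
        return "excellent"
    if d >= 0.30:
        return "good"
    if d >= 0.20:
        return "fair"
    return "poor"  # covers 0.00-0.19 and negative values


# Responses to one item, taken from the example table.
upper = {"A": 3, "B": 57, "C": 6, "D": 4, "Omit": 0}
lower = {"A": 12, "B": 34, "C": 14, "D": 4, "Omit": 6}
key = "B"  # assumed keyed answer, as implied by the worked calculation

# Only students who actually attempted the item are counted.
n_upper = sum(upper.values()) - upper["Omit"]
n_lower = sum(lower.values()) - lower["Omit"]

p = item_difficulty(upper[key], lower[key], n_upper, n_lower)
d = item_discrimination(upper[key], lower[key], n_upper, n_lower)

print(f"p = {p:.2f}%")
print(f"D = {d:.3f} ({rate_item(d)})")
```

With these figures the item falls inside the 30% to 70% difficulty range and its D value lands in the "fair" band, so it would be acceptable for a classroom test.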
Evaluating the effectiveness of distracters
We determine how well the distracters function by inspection. Generally, a good
distracter attracts more students from the lower group than the upper group.
Poor distracters attract more students from the upper group than the lower group. This
could be due to ambiguity in the statement of the item.
Distracters that attract no one (unpopular distracters) are also ineffective/poor.
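
The inspection rules above can also be written as a small Python sketch. The function name `rate_distracters` and the label used when both groups choose a distracter equally often are assumptions, since the notes do not say how to treat that case. The response counts are those of the example item, with B assumed to be the keyed answer.

```python
def rate_distracters(upper, lower, key):
    """Rate each incorrect option by comparing upper- and lower-group counts."""
    ratings = {}
    for option in upper:
        if option == key or option == "Omit":
            continue  # only the incorrect alternatives are distracters
        u, l = upper[option], lower[option]
        if u == 0 and l == 0:
            ratings[option] = "ineffective (attracts no one)"
        elif l > u:
            ratings[option] = "good"
        elif u > l:
            ratings[option] = "poor"
        else:
            # Equal counts are not covered by the rules in the notes.
            ratings[option] = "undetermined (equal in both groups)"
    return ratings


upper = {"A": 3, "B": 57, "C": 6, "D": 4, "Omit": 0}
lower = {"A": 12, "B": 34, "C": 14, "D": 4, "Omit": 6}

ratings = rate_distracters(upper, lower, key="B")
for option, rating in ratings.items():
    print(option, "->", rating)
```

In this example, options A and C draw more students from the lower group and so function well, while option D draws the same number from both groups and may deserve a closer look.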
Report cards
The report card is a summary of teachers’ professional judgements about a student’s
achievement which acts as a standard method of communicating students’ academic
progress and grades to parents.
Functions of a report card
a. Details achievement of curriculum expectations for each semester or term
b. Reports the student’s overall development of learning skills and work habits
c. Includes next steps for future student learning

d. Provides an opportunity for parents/guardians to comment on student
achievement, student goals, and shared responsibilities of home and school to
support improved student learning
e. It gives students descriptive feedback in comments.
f. It provides guidance to help students improve their learning.
Types of judgements on report cards
i. Letter grades e.g. A,B,C,D,E, F (sometimes allowing pluses and minuses).
ii. Numerical scores per subject e.g. 91 Mathematics; 70 English, etc.
iii. Pass/fail category in one or more subjects.
iv. Checklists indicating skills and objectives that students have attained (mainly
used for kindergartens and nursery schools).
v. Categories for affective characteristics such as effort, cooperation, and other
appropriate and inappropriate behaviours.
Writing effective report card comments.
i. Effective Report Card comments provide specific details about a student’s
achievement of the overall curriculum expectations.
ii. Report card comments should provide students and parents with personalized, clear,
precise, and meaningful feedback.
iii. Effective comments are written in clear and simple language, using vocabulary that is
easily understood by both students and parents rather than educational terminology
taken directly from the curriculum documents, and they convey a positive tone.
Comments should be error-free and avoid slang or colloquial language.
