
Module in

Assessment in Learning 1

First Semester, SY 2021 - 2022

Palawan State University


Brooke’s Point Campus

WILLIAM M. HERRERA, M.A. Ed.


Author
Copyright Date: September 2021

All rights reserved.


No part of this module may be reproduced in any form without the permission of the author.
Preface

The topics in this course are divided into three modules, each with three lessons, for a total of nine (9) lessons covering all the suggested topics to be discussed in class, given the target learning outcomes of the course.

The first module focuses on the Foundations of Assessment in Learning. This chapter introduces the foundations of all the lessons in this course on assessment. It has three lessons: a) the Basic Concepts and Principles in Assessing Learning, which should be understood well by teachers to inform sound assessment practice; b) the Assessment Purposes, Learning Targets, and Appropriate Methods, which will enable teachers to understand why and what to assess and the appropriate method to use given the purpose and targets of assessment; and c) the Different Classifications of Assessment, which is expected to help teachers rationalize the use of every kind of classroom assessment.

The second module focuses on the Process in the Development and Administration of Tests. This chapter has three lessons as well: first is Planning a Written Test, which should enable teachers to design a good test table of specifications; second is Construction of a Written Test, which sets the guidelines in constructing different test formats; third is Establishing Test Validity and Reliability, which provides the necessary input to ensure that the test constructed measures what it intends to measure and provides reliable results.

The third module focuses on the Organization, Utilization, and Communication of Test Results. It also has three lessons. The first is Organization of Test Data Using Tables and Graphs, which provides teachers the skills in organizing and presenting assessment data for easy interpretation and use to improve teaching and learning. The second is Analysis, Interpretation, and Use of Test Data, which equips teachers with the right statistical measures for analyzing and interpreting assessment data. The last is Grading and Reporting of Test Results, which covers techniques and ethical considerations in communicating assessment data to learners, parents, and other relevant stakeholders.

This compilation of modules has been designed to have the following features: a) Outcome-based; b) PSG-aligned; c) Standards-based; d) 21st Century skills- and strategies-focused; and e) Whole-child sensitive.

Each lesson in the three modules has been designed to follow the UPDATERS Framework, where each letter has the following meaning and feature: U – understand; P – prepare; D – develop; A – apply; T – transfer; E – evaluate; R – reflect; and S – sustain.

It is hoped that the features of this module will make your learning of the first part of the assessment course, that is, Assessment in Learning 1, meaningful, engaging, and challenging. Your learning in this course will be a good foundation for the course Assessment in Learning 2.

The Author
TABLE OF CONTENTS

Title Page

Copyright Page

Preface

Module 1: INTRODUCTION TO ASSESSMENT IN LEARNING

Lesson 1: Basic Concepts and Principles in Assessing Learning
What is assessment in learning?
What are the different measurement frameworks used in assessment?
What are the different types of assessment in learning?
What are the different principles in assessing learning?

Lesson 2: Assessment Purposes, Learning Targets, and Appropriate Methods
What is the purpose of classroom assessment?
What are learning targets?
The Bloom's Taxonomy of Educational Objectives
The Revised Bloom's Taxonomy of Educational Objectives
Types of Learning Targets
Appropriate Methods of Assessment

Lesson 3: Different Classifications of Assessment
What are the different classifications of assessment?
When do we use educational and psychological assessments?
When do we use paper-and-pencil and performance-based types of assessment?
How do we distinguish teacher-made from standardized tests?
What information is sought from achievement and aptitude tests?
How do we differentiate speed from power tests?
How do we differentiate norm-referenced from criterion-referenced tests?
Module II: DEVELOPMENT AND ADMINISTRATION OF TESTS

Lesson 4: Planning a Written Test
Why do we need to define the test objectives or learning outcomes targeted for assessment?
What are the objectives for testing?
What is a table of specifications?
What are the general steps in developing a table of specifications?
What are the different formats of a test table of specifications?

Lesson 5: Construction of Written Tests
What are the general guidelines in choosing the appropriate test format?
What are the major categories and formats of traditional tests?
What are the general guidelines in writing multiple-choice test items?
What are the general guidelines in writing matching-type items?
What are the general guidelines in writing true or false items?
What are the general guidelines in writing short-answer test items?
What are the general guidelines in writing essay tests?
What are the general guidelines in writing problem-solving test items?

Lesson 6: Establishing Test Validity and Reliability
What is test reliability?
What are the different ways to establish test reliability?
What is test validity?
What are the different ways to establish test validity?
How do we determine if an item is easy or difficult?

Module III: ORGANIZATION, UTILIZATION, AND COMMUNICATION OF TEST RESULTS

Lesson 7: Organization of Test Data Using Tables and Graphs
How do we organize and present ungrouped data through tables?
How do we present test data graphically?
Which graph is best?
What are the variations on the shapes of frequency distributions?

Lesson 8: Analysis, Interpretation, and Use of Test Data
What are measures of central tendency?
What are measures of dispersion?
What are measures of position?
What are standard scores?
What are measures of covariability?

Lesson 9: Grading and Reporting of Test Results
What are the purposes of grading and reporting learners' test performance?
What are the different methods in scoring tests or performance tasks?
What are the different types of test scores?
What are the general guidelines in grading tests or performance tasks?
What are the general guidelines in grading essay tests?
What is the new grading system of the Philippine K-12 Program?
How should test results be communicated to different stakeholders?

References
Module 1

Basic Concepts and Principles in Assessing Learning

Source:

Balagtas, D. M. (2020). Assessment in Learning 1. Sampaloc, Manila: Rex Bookstore.

Desired Significant Learning Outcomes:


In this lesson, you are expected to:

1. describe assessment in learning; and


2. demonstrate understanding of the different principles in assessing learning through the preparation of an assessment plan.

Overview

To successfully describe the nature of assessment in learning, develop a concept map of its basic concepts, and document the experiences of teachers who apply its principles, you need to read the following information about the basic concepts, measurement frameworks, and principles in assessing learning. You are expected to read this information before the discussion, analysis, and evaluation when you meet the teacher face-to-face or in your virtual or online classroom. If the information provided in this lesson is not enough, you can search for more information on the internet.

What is Assessment in Learning?

The word assessment is rooted in the Latin word "assidere," which means "to sit beside another." Assessment is generally defined as the process of gathering quantitative and/or qualitative data for the purpose of making decisions.

Assessment in learning can be defined as the systematic and purpose-oriented collection, analysis, and interpretation of evidence of student learning in order to make informed decisions relevant to the learners. It is characterized as a) a process, b) based on specific objectives, and c) drawing from multiple sources.

How is assessment in learning similar to or different from the concepts of measurement and evaluation of learning? Measurement is the process of quantifying the attributes of an object, whereas evaluation is the process of making value judgments on the information collected from measurement based on specified criteria. In the context of assessment in learning, measurement refers to the actual collection of information on student learning through the use of various strategies and tools, while evaluation refers to the actual process of making a decision or judgment on student learning based on the information collected from measurement. Therefore, assessment can be considered an umbrella term encompassing both measurement and evaluation.

Assessment and Testing

The most common form of assessment is testing. In the educational context, testing refers to the use of a test or battery of tests to collect information on student learning over a specific period of time. A test can be categorized as either selected response (e.g., a matching-type test) or constructed response (e.g., an essay or short-answer test). A test can make use of an objective format (multiple choice, enumeration) or a subjective format (essay). To be good and effective, a test should be valid and reliable, have an acceptable level of difficulty, and be able to discriminate between learners with higher and lower ability. Teachers are expected to be competent in the design and development of classroom tests.
Assessment and Grading

A related concept to assessment in learning is grading, which can be defined as the process of assigning value to the performance or achievement of a learner based on specified criteria or standards. Aside from tests, other classroom tasks can serve as bases for grading learners. These include a learner's performance in recitation, seatwork, homework, and projects. Grading is a form of evaluation which provides information on whether a learner passed or failed a subject or a particular assessment task. Teachers are expected to be competent in providing performance feedback and communicating the results of assessment tasks or activities to relevant stakeholders.

What are the different measurement frameworks used in assessment?

The two most common psychometric theories that serve as frameworks for assessment and measurement, especially in the determination of the psychometric characteristics of a measure (e.g., tests, scales), are the classical test theory (CTT) and the item response theory (IRT).

The CTT (also known as true score theory) explains that variations in the performance of examinees on a given measure are due to variations in their abilities, with some degree of error in the measurement caused by internal and external conditions. Hence, the CTT assumes that all measures are imperfect, and the score obtained from a measure could differ from the true score (or true ability) of an examinee.

The CTT provides an estimation of item difficulty based on the frequency or number of examinees who correctly answer a particular item; items answered correctly by fewer examinees are considered more difficult. The CTT also provides an estimation of item discrimination based on whether examinees of higher or lower ability can answer a particular item correctly. If an item is able to distinguish between examinees with higher ability (higher total test score) and lower ability (lower total test score), then the item is considered to have good discrimination. Test reliability can also be estimated using approaches from CTT (e.g., Kuder-Richardson 20, Cronbach's alpha). Item analysis based on CTT has been the dominant approach because of the simplicity of calculating the statistics (item difficulty index, item discrimination index, and item-total correlation).
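
To make these CTT statistics concrete, the short Python sketch below (the score matrix and the top-half/bottom-half split are hypothetical illustrations, not taken from this module) computes the item difficulty index as the proportion of correct answers, the discrimination index as the difference between upper- and lower-group difficulties, and a KR-20 reliability estimate:

```python
# Illustrative CTT item analysis on hypothetical 0/1 (wrong/correct) data.
# Rows are examinees, columns are items.
scores = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]

n_items = len(scores[0])
n_examinees = len(scores)
totals = [sum(row) for row in scores]

# Difficulty index p: proportion of examinees answering the item correctly.
# A lower p means a more difficult item.
difficulty = [sum(row[i] for row in scores) / n_examinees for i in range(n_items)]

# Discrimination index D: p in the upper-scoring group minus p in the
# lower-scoring group (here a simple top-half vs. bottom-half split).
ranked = sorted(scores, key=sum, reverse=True)
half = n_examinees // 2
upper, lower = ranked[:half], ranked[-half:]
discrimination = [
    sum(r[i] for r in upper) / half - sum(r[i] for r in lower) / half
    for i in range(n_items)
]

# KR-20 reliability estimate for dichotomously scored items.
mean_total = sum(totals) / n_examinees
var_total = sum((t - mean_total) ** 2 for t in totals) / (n_examinees - 1)
kr20 = (n_items / (n_items - 1)) * (1 - sum(p * (1 - p) for p in difficulty) / var_total)

print(difficulty)
print(discrimination)
print(round(kr20, 2))
```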

The IRT, on the other hand, analyzes test items by estimating the probability that an examinee answers an item correctly or incorrectly. One of the central differences of IRT from CTT is that in IRT, the characteristics of an item can be estimated independently of the characteristics or ability of the examinees, and vice versa. Aside from item difficulty and item discrimination indices, IRT analysis can provide significantly more information on items and tests, such as fit statistics, the item characteristic curve (ICC), and the test characteristic curve (TCC).
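
The module does not prescribe a particular IRT model, but as a minimal sketch, the one-parameter logistic (Rasch) model expresses the probability of a correct response as a function of examinee ability and item difficulty; plotting it over a range of abilities traces the item characteristic curve:

```python
import math

def rasch_probability(theta: float, b: float) -> float:
    """One-parameter logistic (Rasch) model: the probability that an
    examinee of ability theta answers an item of difficulty b correctly.
    Plotting this over a range of theta values traces the item
    characteristic curve (ICC)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# The item parameter b is fixed regardless of which examinees are tested:
# an easy item (b = -1) vs. a hard item (b = 2) for an average examinee.
print(rasch_probability(theta=0.0, b=-1.0))  # ~0.73
print(rasch_probability(theta=0.0, b=2.0))   # ~0.12
```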

What are the different types of assessment in learning?

Assessment in learning can be of different types. The most common types are formative, summative, diagnostic, and placement. Other experts describe the types of assessment as traditional and authentic.

Formative assessment refers to assessment activities that provide information to both teachers and learners on how they can improve the teaching-learning process. This type of assessment is formative because it is used at the beginning of and during instruction for teachers to assess learners' understanding. The information collected on student learning allows teachers to make adjustments to their instructional process and strategies to facilitate learning. Through performance reports and teacher feedback, formative assessment can also inform learners about their strengths and weaknesses to enable them to take steps to learn better and improve their performance as the class progresses.

Summative assessment refers to assessment activities that aim to determine learners' mastery of content or attainment of learning outcomes. They are supposed to provide information on the quantity or quality of what students have learned or achieved at the end of instruction. While data from summative assessment are typically used for evaluating learners' performance in class, these also provide teachers with information about the effectiveness of their teaching strategies and how they can improve their instruction in the future. Through performance reports and teacher feedback, summative assessment can also inform learners about what they have done well and what they need to improve on in their future classes or subjects.

Placement assessment is usually done at the beginning of the school year to determine what the learners already know or what their needs are, which could inform the design of instruction. Grouping of learners based on the results of placement assessment is usually done before instruction to make it relevant to the needs or entry performance of the learners. The entrance examination given in schools is an example of a placement assessment.

Traditional assessment refers to the use of conventional strategies or tools to provide information about the learning of students. Typically, objective (e.g., multiple choice) and subjective (e.g., essay) paper-and-pencil tests are used. Traditional assessments are often used as bases for evaluating and grading learners. It is viewed as an inauthentic type of assessment.

Authentic assessment refers to the use of assessment strategies or tools that allow learners to perform or create products that are meaningful to them, as they are based on real-world contexts. The authenticity of assessment tasks is best described in terms of degree rather than the presence or absence of authenticity. Hence, an assessment can be more authentic or less authentic compared with other assessments. The most authentic assessments are those that allow performances that most closely resemble real-world tasks or applications in real-world settings or environments.

What are the different principles in assessing learning?

There are many principles in the assessment of learning. Based on different readings and references on these principles, the following may be considered core principles.

1. Assessment should have a clear purpose. Assessment starts with a clear purpose. The methods used in collecting information should be based on this purpose. The interpretation of the data collected should be aligned with the purpose that has been set. This assessment principle is congruent with the outcome-based education (OBE) principles of clarity of focus and design down.

2. Assessment is not an end in itself. Assessment serves as a means to enhance student learning. It is not a simple recording or documentation of what learners know and do not know. Collecting information about student learning, whether formative or summative, should lead to decisions that will allow improvement of the learners.

3. Assessment is an ongoing, continuous, and formative process. Assessment consists of a series of tasks and activities conducted over time. It is not a one-shot activity and should be cumulative. Continuous feedback is an important element of assessment. This assessment principle is congruent with the OBE principle of expanded opportunity.

4. Assessment is learner-centered. Assessment is not about what the teacher does but what the learner can do. Assessment of learners provides teachers with an understanding of how they can improve their teaching, which corresponds to the goal of improving student learning.

5. Assessment is both process- and product-oriented. Assessment gives equal importance to learner performance or product and the process learners engage in to perform or produce the product.

6. Assessment must be comprehensive and holistic. Assessment should be performed using a variety of strategies and tools designed to assess student learning in a holistic way. Assessment should be conducted in multiple periods to assess learning over time. This assessment principle is also congruent with the OBE principle of expanded opportunity.

7. Assessment requires the use of appropriate measures. For assessment to be valid, the assessment tools or measures used must have sound psychometric properties, including, but not limited to, validity and reliability. Appropriate measures also mean that learners must be provided with challenging but age- and context-appropriate assessment tasks. This assessment principle is consistent with the OBE principle of high expectations.

8. Assessment should be as authentic as possible. Assessment tasks or activities should closely, if not fully, approximate real-life situations or experiences. Authenticity of assessment can be thought of as a continuum from least authentic to most authentic, with more authentic tasks expected to be more meaningful for learners.
Activity:

Based on the lesson on the basic concepts and principles in assessment in learning, select five core principles in assessing learning and explain them in relation to your experience with a previous or current teacher in one of your courses/subjects.

Principle: 1. Assessment should be as authentic as possible.

Illustration of Practice: In our practicum course, we were asked to prepare a lesson plan and then execute the plan in front of the students, with my critic teacher around to evaluate my performance. The actual planning of the lesson and its execution in front of the class and the critic teacher is a very authentic way of assessing my ability to design and deliver instruction, rather than being assessed through a demonstration in front of my classmates in the classroom.

Given the example, continue the identification of illustrations of assessment practices guided by the principles discussed in class. Through our Google Classroom or any online platform, share your insights on how your teacher's assessment practices allowed you to improve your learning. Use the format below:

Principle Illustration of Practice


1.

2.

3.

4.
5.

Lesson 2: Assessment Purposes, Learning Targets, and Appropriate Methods

Desired Significant Learning Outcomes

In this lesson, you are expected to:

1. explain the purpose of classroom assessment; and


2. formulate learning targets that match appropriate
assessment methods.

Overview:

To be able to successfully prepare an assessment plan based on learning targets, you need to read the following information about the purposes of assessing learning in the classroom, the basic qualities of effective classroom assessment, learning targets, and the use of appropriate assessment methods. You are expected to read this before the discussion, analysis, and evaluation when you meet the teacher face-to-face in your classroom.

What is the purpose of classroom assessment?


Assessment works best when its purpose is clear. Without a clear purpose, it is difficult to design or plan assessment effectively and efficiently. In classrooms, teachers are expected to know the instructional goals and learning outcomes, which will inform how they design and implement their assessment. In general, the purposes of classroom assessment may be classified in terms of the following:

1. Assessment of Learning. This refers to the use of assessment to determine learners' acquired knowledge and skills from instruction and whether they were able to achieve the curriculum outcomes. It is generally summative in nature.

2. Assessment for Learning. This refers to the use of assessment to identify the needs of learners in order to modify instruction or learning activities in the classroom. It is formative in nature, and it is meant to identify gaps in the learning experiences of learners so that they can be assisted in achieving the curriculum outcomes.

3. Assessment as Learning. This refers to the use of assessment to help learners become self-regulated. It is formative in nature and meant to use assessment tasks, results, and feedback to help learners practice self-regulation and make adjustments to achieve the curriculum outcomes.

As discussed in the previous lesson, assessment serves as the mechanism by which teachers are able to determine whether instruction worked in facilitating the learning of students. Hence, it is very important that assessment is aligned with instruction and the identified learning outcomes for learners. Knowing what will be taught (curriculum content, competency, and performance standards) and how it will be taught (instruction) are as important as knowing what we want from the very start (curriculum outcomes) in determining the specific purpose and strategy for assessment. The alignment is easier if teachers have a clear purpose for why they are performing the assessment. Typically, teachers use classroom assessment for assessment of learning more than for assessment for learning and assessment as learning. Ideally, however, all three purposes of classroom assessment should be used. While it is difficult to perform an assessment with all three purposes in mind, teachers must understand the three purposes of assessment, including knowing when and how to use them.
The Roles of Classroom Assessment in the Teaching-Learning Process

Assessment is an integral part of the instructional process where teachers design and conduct instruction (teaching) so that learners achieve the specific target learning outcomes defined by the curriculum. While the purpose of assessment may be classified as assessment of learning, assessment for learning, and assessment as learning, the specific purpose of an assessment depends on the teacher's objective in collecting and evaluating assessment data from learners. More specific objectives for assessing student learning are congruent to the following roles of classroom assessment in the teaching-learning process: formative, diagnostic, evaluative, facilitative, and motivational, each of which is discussed below.

Formative. Teachers conduct assessment because they want to acquire information on the current status and level of learners' knowledge and skills or competencies. Teachers may need information (e.g., prior knowledge, strengths) about the learners prior to instruction so they can design their instructional plan to better suit the needs of the learners. Teachers may also need information on learners during instruction to allow them to modify instruction or learning activities to help learners achieve the learning outcomes.

Diagnostic. Teachers can use assessment to identify specific learners' weaknesses or difficulties that may affect their achievement of the intended learning outcomes. Identifying these weaknesses allows teachers to focus on specific learning needs and provide opportunities for instructional intervention or remediation inside or outside the classroom.

Evaluative. Teachers conduct assessment to measure learners' performance or achievement for the purpose of making judgments, grading in particular. Teachers need information on whether the learners have met the intended learning outcomes after the instruction is fully implemented. The learners' placement or promotion to the next educational level is informed by the assessment results.

Facilitative. Classroom assessment may affect student learning. On the part of teachers, assessment for learning provides information on students' learning and achievement that teachers can use to improve instruction and the learning experiences of learners. On the part of learners, assessment as learning allows them to monitor, evaluate, and improve their own learning strategies. In both cases, student learning is facilitated.

Motivational. Classroom assessment can serve as a mechanism for learners to be motivated and engaged in learning and achievement in the classroom. Grades, for instance, can motivate and demotivate learners. Focusing on progress, providing effective feedback, innovating assessment tasks, and using scaffolding during assessment activities provide opportunities for assessment to be motivating rather than demotivating.

What are Learning Targets?

Educational Goals, Standards, and Objectives

Before discussing what learning targets are, it is important to first define educational goals, standards, and objectives.

Goals. General statements about desired learner outcomes in a given year or during the duration of a program (e.g., senior high school).

Standards. Specific statements about what learners should know and be capable of doing at a particular grade level, subject, or course. McMillan (2014, p. 31) described four different types of educational standards: (1) content (desired outcomes in a content area), (2) performance (what students do to demonstrate competence), (3) developmental (sequence of growth and change over time), and (4) grade-level (outcomes for a specific grade).

Educational objectives. Specific statements of learner performance at the end of an instructional unit. These are sometimes referred to as behavioral objectives and are typically stated with the use of verbs. The most popular taxonomy of educational objectives is Bloom's Taxonomy of Educational Objectives.
The Bloom's Taxonomy of Educational Objectives

Bloom's Taxonomy consists of three domains: cognitive, affective, and psychomotor. These three domains correspond to the three types of goals that teachers want to assess: knowledge-based goals (cognitive), skills-based goals (psychomotor), and affective goals (affective). Hence, there are three taxonomies that can be used by teachers depending on the goals. Each taxonomy consists of different levels of expertise with varying degrees of complexity. The most popular among the three is the taxonomy for the cognitive domain, also known as Bloom's Taxonomy of Educational Objectives for Knowledge-Based Goals. The taxonomy describes six levels of expertise: knowledge, comprehension, application, analysis, synthesis, and evaluation. Table 2.1 presents the description, illustrative verbs, and a sample objective for each of the six levels.
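
As a quick reference, the sketch below pairs each of the six levels with a few commonly cited illustrative verbs (the verb choices are typical examples, not necessarily the exact entries of Table 2.1):

```python
# The six cognitive levels of the original Bloom's Taxonomy, ordered from
# lowest to highest, with commonly cited illustrative verbs (examples only).
blooms_cognitive_levels = {
    "knowledge":     ["define", "list", "recall", "name"],
    "comprehension": ["explain", "summarize", "paraphrase"],
    "application":   ["apply", "solve", "demonstrate"],
    "analysis":      ["compare", "differentiate", "examine"],
    "synthesis":     ["design", "compose", "construct"],
    "evaluation":    ["judge", "critique", "justify"],
}

for level, verbs in blooms_cognitive_levels.items():
    print(f"{level}: {', '.join(verbs)}")
```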

Bloom's taxonomies of educational objectives provide teachers with a structured guide in formulating more specific learning targets, as they provide an exhaustive list of learning objectives. The taxonomies not only serve as a guide for teachers' instruction but also for their assessment of student learning in the classroom. Thus, it is imperative that teachers identify the levels of expertise that they expect the learners to achieve and demonstrate.
The Revised Bloom's Taxonomy of Educational Objectives

Anderson and Krathwohl proposed a revision of Bloom's Taxonomy in the cognitive domain by introducing a two-dimensional model for writing learning objectives (Anderson and Krathwohl, 2001). The first dimension, the knowledge dimension, includes four types: factual, conceptual, procedural, and metacognitive. The second dimension, the cognitive process dimension, consists of six types: remember, understand, apply, analyze, evaluate, and create. An educational or learning objective formulated from this two-dimensional model contains a noun (type of knowledge) and a verb (type of cognitive process). The Revised Bloom's Taxonomy provides teachers with a more structured and more precise approach in designing and assessing learning objectives.

Below is an example of a learning objective:

Students will be able to differentiate qualitative research and quantitative research.

In the example, differentiate is the verb that represents the type of cognitive process
(in this case, analyze), while qualitative research and quantitative research is the noun
phrase that represents the type of knowledge (in this case, conceptual).

Tables 2.2 and 2.3 present the definition, illustrative verbs, and sample objectives of
the cognitive process dimensions and knowledge dimensions of the Revised Bloom's
Taxonomy.
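
As a small, hypothetical illustration of the two-dimensional model (the encoding below is our own, not the module's; the second and third objectives are invented examples), each objective can be tagged with its cognitive process (verb) and knowledge type (noun):

```python
# Hypothetical encoding of objectives under the Revised Bloom's Taxonomy:
# each objective pairs a cognitive process (verb) with a knowledge type (noun).
objectives = [
    ("analyze", "conceptual",
     "Differentiate qualitative research and quantitative research."),
    ("remember", "factual",
     "List the parts of a plant cell."),
    ("apply", "procedural",
     "Compute the mean of a set of test scores."),
]

for process, knowledge, statement in objectives:
    print(f"[{process} x {knowledge}] {statement}")
```
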
Learning Targets

A learning target is "a statement of student performance for a relatively restricted type of learning outcome that will be achieved in a single lesson or a few days" and contains "both a description of what students should know, understand, and be able to do at the end of instruction and something about the criteria for judging the level of performance demonstrated" (McMillan 2014, p. 43). In other words, learning targets are statements of what learners are supposed to learn and what they can do because of instruction. Compared with educational goals, standards, and objectives, learning targets are the most specific and lead to more specific instructional and assessment activities.

Learning targets should be congruent with the standards prescribed by the program or level and aligned with the instructional or learning objectives of a subject or course. Teachers must inform learners about the learning targets of a lesson prior to classroom instruction. The learning targets should be meaningful for the learners; hence, they must be as clear and as specific as possible. It is suggested that learning targets be stated from the learners' point of view, typically using the phrase "I can..." For example: "I can differentiate between instructional objectives and learning targets."

McMillan (2014, p. 53) proposed five criteria for selecting learning targets: (1) establish the right number of learning targets (Are there too many or too few targets?); (2) establish comprehensive learning targets (Are all important types of learning included?); (3) establish learning targets that reflect school goals and 21st century skills (Do the targets reflect school goals and 21st century knowledge, skills, and dispositions?); (4) establish learning targets that are challenging yet feasible (Will the targets challenge students to do their best work?); and (5) establish learning targets that are consistent with current principles of learning and motivation (Are the targets consistent with research on learning and motivation?).

Types of Learning Targets

Many experts consider four primary types of learning targets: knowledge, reasoning, skill, and product. Table 2.4 summarizes these types of learning targets.
Other experts consider a fifth type of learning target: affect, which refers to affective characteristics that students can develop and demonstrate because of instruction. This includes attitudes, beliefs, interests, and values. Some experts use disposition as an alternative term for affect. The following is an example of an affect or disposition learning target:

I can appreciate the importance of addressing potential ethical issues in the conduct of thesis research.

Appropriate Methods of Assessment

Once the learning targets are identified, appropriate assessment methods can be selected to measure student learning. The match between a learning target and the assessment method used to measure whether students have met the target is very critical. Tables 2.5.1 and 2.5.2 present a matrix of the different types of learning targets and sample assessment methods. There are other types of assessment, and it is up to the teachers to select the method of assessment and design appropriate assessment tasks and activities to measure the identified learning targets.

Lesson 3: Different Classifications of Assessment

In order to plan, create, and select the appropriate kind of assessment, you need to know the characteristics of the different types of assessment according to purpose, function, and the kind of information needed about learners. You are expected to read this before you can create your own illustrative scenario.

What are the different classifications of assessment?

The forms of assessment are classified according to purpose, form, interpretation of learning, function, ability, and kind of learning.

Classification: Types

Purpose: Educational; Psychological
Form: Paper-and-pencil; Performance-based
Function: Teacher-made; Standardized
Kind of Learning: Achievement; Aptitude
Ability: Speed; Power
Interpretation of Learning: Norm-referenced; Criterion-referenced

When do we use educational and psychological assessments?

Educational assessments are used in the school setting for the purpose of tracking the growth of learners and grading their performance. Assessment in the educational setting comes in the form of formative and summative assessment. These work hand in hand to provide information about student learning. Formative assessment is a continuous process of gathering information about student learning at the beginning of, during, and after instruction so that teachers can decide how to improve their instruction until learners are able to meet the learning targets. When the learners are provided with enough scaffolding, as indicated by the formative assessment, then the summative assessment is conducted. The purpose of summative assessment is to determine and record what the learners have learned. On the other hand, the purpose of formative assessment is to track and monitor student learning and their progress toward the learning target. Formative assessment can be any form of assessment (paper-and-pencil or performance-based) that is conducted before, during, and after instruction. Before instruction begins, formative assessment serves as a diagnostic tool to determine whether learners already know about the learning target. More specifically, formative assessment given at the start of the lesson determines the following:

1. What learners know and do not know so that instruction can supplement what learners do not know.
2. Misconceptions of learners so that they can be corrected.
3. Confusions of learners so that they can be clarified.
4. What learners can and cannot do so that enough practice can be given to perform the task.

The information from educational assessment at the beginning of the lesson is used by the teacher to prepare relevant instruction for learners. For example, if the learning target is for learners to determine the by-product of photosynthesis, then the teacher can ask learners if they know what the food of plants is. If incorrect answers are provided, then the teacher can recommend references for them to study. If the learning target is for learners to divide a three-digit number by a two-digit number, then the teacher can start with a three-item exercise on the task to identify who can and cannot perform it. For those who can do the task, the teacher can provide more exercises; for those who cannot, the necessary direct instruction can be provided. At this point of instruction, the results of the assessment are not graded because the information is used by the teacher to prepare relevant ways to teach.

Educational assessment during instruction is done when the teacher stops at certain parts of the teaching episodes to ask learners questions and assign exercises, short essays, board work, and other tasks. If the majority of the learners are still unable to accomplish the task, then the teacher realizes that further instruction is needed by the learners.

When the teacher observes that the majority or all of the learners are able to demonstrate the learning target, then the teacher can conduct the summative assessment. It is best to have a summative assessment for each learning target so that there is evidence that learning has taken place. Both the summative and formative assessments should be aligned to the same learning target; in this case, there should be parallelism between the tasks provided in the formative and summative assessments. When the learners are provided with word problem-solving tasks in the summative assessment, word problem-solving should also be given during the formative assessment, and so on.

Psychological assessments, such as tests and scales, are measures that determine the learners' cognitive and non-cognitive characteristics. Examples of cognitive tests are those that measure ability, aptitude, intelligence, and critical thinking. Affective measures are for personality, motivation, attitude, interest, and disposition. The results of these assessments are used by the school's guidance counselor to perform interventions on the learners' academic, career, and social and emotional development.

When do we use paper-and-pencil and performance-based types of assessment?

Paper-and-pencil assessments are cognitive tasks that require a single correct answer. They usually come in the form of test types such as binary choice (true or false), short answer (identification), matching type, and multiple choice. The items usually pertain to a specific cognitive skill, such as recalling, understanding, applying, analyzing, evaluating, and creating. On the other hand, performance-based assessments require learners to perform tasks, such as giving demonstrations, arriving at a product, showing strategies, and presenting information. The skills applied are usually complex and require integrated skills to arrive at the target response. Examples include writing an essay, reporting in front of the class, reciting a poem, demonstrating how a problem was solved, creating a word problem, reporting the results of an experiment, dance and song performances, painting and drawing, and playing a musical instrument. Performance-based tasks are usually open-ended, and each learner may arrive at various possible responses.

The use of paper-and-pencil and performance-based tasks depends on the nature and content of the learning target.

How do we distinguish teacher-made from standardized tests?

Standardized tests have fixed directions for administering and scoring. They can be purchased with test manuals, booklets, and answer sheets. When these tests were developed, the items were tried out on a large sample of the target groups, called the norm group. The norm group's performance is used as the basis for comparing the results of those who take the test.

Non-standardized or teacher-made tests are usually intended for classroom assessment. They are used for classroom purposes, such as determining whether learners have reached the learning target. They intend to measure behavior (such as learning) in line with the objectives of the course. Examples are quizzes, long tests, and exams. Formative and summative assessments are usually teacher-made tests.

Can a teacher-made test become a standardized test? Yes, as long as it is valid and reliable and has a standard procedure for administering, scoring, and interpreting results.

What information is sought from achievement and aptitude tests?

Achievement tests measure what learners have learned after instruction or after going through a specific curricular program. Achievement tests provide information on what learners can do and have acquired after training and instruction. Achievement is a measure of what a person has learned within or up to a given time (Yaremko et al. 1982). It is a measure of accomplished skills and indicates what a person can do at present (Atkinson 1995). Kimball (1989) explained the traditional and alternative views on the achievement of learners, noting that the greater number of courses taken by learners and their more extensive classroom experience with a subject may give them an advantage. Achievement can be measured by a variety of means. It can be reflected in the final grades of learners within a quarter. A quarterly test composed of several learning targets is also a good way of determining the achievement of learners. It can also be measured using achievement tests, such as the Wide Range Achievement Test, the California Achievement Test, and the Iowa Test of Basic Skills.

According to Lohman (2005), aptitudes are the characteristics that influence a person's behavior and aid goal attainment in a particular situation. Specifically, aptitude refers to the degree of readiness to learn and perform well in a particular situation or domain (Corno et al. 2002). Examples include the ability to comprehend instructions, manage one's time, use previously acquired knowledge appropriately, make good inferences and generalizations, and manage one's emotions. Other developments have also led to the conclusion that assessment of aptitude can go beyond cognitive abilities. An example is the Cognitive Abilities Measurement, which measures working memory capacity, the ability to store old information and process new information, and the speed of an individual in retrieving and processing new information (Kyllonen and Christal 1989). Magno (2009) also created a taxonomy of aptitude test items. The taxonomy provides item writers with a guide on the type of items to be included when building an aptitude test, depending on the skills specified. The taxonomy includes 12 classifications categorized as verbal and nonverbal. The schemes in the verbal category include verbal analogy (drawing sensible conclusions based on given information and statements), syllogism (deductive reasoning), and number or letter series; the nonverbal category is composed of topology, visual discrimination (the ability to detect differences in and classify objects, symbols, or shapes), progressive series, visualization (creating a mental image of an object to arrive at a conclusion), orientation, figure-ground perception (identifying figures from the background), surface development, object assembly (constructing an object from component pieces), and picture completion.

How do we differentiate speed from power tests?

Speed tests consist of easy items that need to be completed within a time limit. Power tests consist of items with increasing levels of difficulty, but the time given is sufficient to complete the whole test. An example of a power test is the one developed by the National Council of Teachers of Mathematics, which determines the ability of examinees to use data to reason and be creative, and to formulate, solve, and reflect critically on the problems provided. An example of a speed test is a typing test in which examinees are required to correctly type as many words as possible within a limited amount of time.

How do we differentiate norm-referenced from criterion-referenced tests?

There are two types of test based on how the scores are
interpreted: norm-referenced and criterion-referenced tests.
Criterion-referenced test has a given set of standards, and the
scores are compared to the given criterion. For example, in a 50-
item test: 40-50 IS very high, 30-39 Is high, 20-29 is average and
10-19 is low, and 0-9 Is very low. One approach in criterion-
referenced interpretation is that the score is compared to a specific
cutoff, An example is the grading in schools where the range of
range of grades 96-100 is highly proficient, 90-95 1S proficient, 80-
89 S nearly proficient, and below 80 Is beginning. The norm-
referenced test interprets results using the distribution of scores of a
sample group. The mean and standard deviations are computed for
the group. The standing of every individual in a norm-referenced
test is based on how far they are from the mean and standard
deviation of the sample. Standardized tests usually interpret scores
using a norm set from a large sample. Having an established norm
for a test means obtaining the normal or average performance in
the distribution of scores. A normal distribution is obtained by
increasing the sample size. A norm is a standard and is based on a
very large group of samples. Norms are reported in the manual of
standardized tests. A normal distribution found in the manual takes
the shape of a bell curve. It shows the number of people within a
range of scores. It also reports the percentage of people with
particular scores. The norm is used to convert a raw score into
standard scores for interpretability.

What is the use of a norm? (1) A norm is the basis of
interpreting a test score. (2) A norm can be used to interpret a
particular score.
Module 2: DEVELOPMENT AND ADMINISTRATION OF TESTS
Lesson 4: Planning a Written Test

Desired Significant Learning Outcomes: In this lesson you are expected to:
1. set appropriate instructional objectives for a written test; and
2. prepare a Table of Specifications for a written test.

To be able to learn or enhance your skills in planning for a good classroom test, you
need to review your knowledge on lesson plan development, constructive alignment, and
different test formats. It is suggested that you read books and other references in print or
online that could help you design a good written test.

Why do you need to define the test objectives or learning outcomes targeted for
assessment?

In designing a well-planned written test, first and foremost, you should be able to
identify the intended learning outcomes in a course, where a written test is an appropriate
method to use. These learning outcomes are knowledge, skills, attitudes, and values that
every student should develop throughout the course. Clear articulation of learning
outcomes is a primary consideration in lesson planning because it serves as the basis for
evaluating the effectiveness of the teaching and learning process determined through
testing or assessment. Learning objectives or outcomes are measurable statements that
articulate, at the beginning of the course, what students should know and be able to do or
value as a result of taking the course. These learning goals provide the rationale for the
curriculum and instruction. They provide teachers the focus and direction on how the
course is to be handled, particularly in terms of
course content, instruction, and assessment. On the other hand, they provide the students
with the reasons and motivation to study and persevere. They give students the
opportunities to be aware of what they need to do to be successful in the course, take
control and ownership of their progress, and focus on what they should be learning.
Setting objectives for assessment is the process of establishing direction to guide both the
teacher in teaching and the
student in learning.

What are the objectives for testing?

In developing a written test, the cognitive behaviors of


learning outcomes are usually targeted. For the cognitive
domain, it is important to identify the levels of behavior
expected from the students. Traditionally, Bloom's
Taxonomy was used to classify learning objectives based on
levels of complexity and specificity of the cognitive
behaviors. With knowledge at the base (i.e., lower order
thinking skill), the categories progress to comprehension,
application, analysis, synthesis, and evaluation. However,
Anderson and Krathwohl, Bloom's student and research
partner, respectively, came up with a revised taxonomy, in
which the nouns used to represent the levels of cognitive
behavior were replaced by verbs, and the synthesis and
evaluation were switched. (Figure 4.1 presents the two
taxonomies.)

Figure 4.1: Taxonomies of Educational Objectives


In developing the cognitive domain of instructional objectives, key verbs can be
used. See Lesson 2 for the sample objectives in the RBT Framework.

What is a table of specifications?

A table of specifications (TOS), sometimes called a test blueprint, is a tool used by
teachers to design a test. It is a table that maps out the test objectives; the contents or topics
covered by the test; the levels of cognitive behavior to be measured; the distribution, number,
placement, and weights of test items; and the test format. It helps ensure
that the course's intended learning outcomes, assessments, and instruction are aligned.

Generally, the TOS is prepared before a test is created. However, it is ideal to
prepare one even before the start of instruction. Teachers need to create a TOS for every
test that they intend to develop. The TOS is important because it does the following:

 Ensures that the instructional objectives and what the test captures match.
 Ensures that the test developer will not overlook details that are considered
essential to
a good test
 Makes developing a test easier and more efficient
 Ensures that the test will sample all important content areas and processes
 Is useful in planning and organizing
 Offers an opportunity for teachers and students to clarify achievement expectations.

What are the general steps in developing a table of specifications?

Learner assessment within the framework of classroom instruction requires planning.

The following are the steps in developing a Table of Specifications (TOS):

1. Determine the objectives of the test. The first step is to identify the test objectives.
This should be based on the instructional objectives. In general, the instructional objectives
or the intended learning outcomes are identified at the start, when the teacher creates the
course syllabus. There are three types of objectives: (1) cognitive, (2) affective, and (3)
psychomotor. Cognitive objectives are designed to increase an individual’s knowledge,
understanding, and awareness. On the other hand, affective objectives aim to change an
individual’s attitude into something desirable, while psychomotor objectives are designed
to build physical or motor skills. When planning for assessment, choose only the objectives
that can be best captured by a written test. There are objectives that are not meant for a
written test. For example, if you test the psychomotor domain, it is better to do a
performance-based assessment. There are also cognitive objectives that are sometimes
better assessed through performance-based assessment. Those that require the
demonstration or creation of something tangible like projects would also be more
appropriately measured by performance-based assessment. For a written test, you can
consider cognitive objectives, ranging from remembering to creating ideas, that could
be measured using common formats for testing, such as multiple choice, alternative
response test, matching type, and even essays or open-ended tests.

2. Determine the coverage of the test. The next step in creating the TOS is to determine
the contents of the test. Only topics or contents that have been discussed in class and are
relevant should be included in the test.

3. Calculate the weight for each topic. Once the test coverage is determined, the weight
of each topic covered in the test is determined. The weight assigned per topic in the test is
based on the relevance and the time spent to cover each topic during instruction. The
percentage of time for a topic in a test is determined by dividing the time spent for that
topic during instruction by the total amount of time spent for all topics covered in the test.
For example, for a test on the Theories of Personality for a General Psychology 101 class,
the teacher spent from half a class session (30 minutes) to 1½ class sessions (90 minutes)
per topic. As such, the weight for each topic is as follows:

Topic                        No. of Sessions      Time Spent    Percentage of Time (Weight)
Theories & Concepts          0.5 class session    30 min        10.0
Psychoanalytic Theories      1.5 class sessions   90 min        30.0
Trait Theories               1 class session      60 min        20.0
Humanistic Theories          0.5 class session    30 min        10.0
Cognitive Theories           0.5 class session    30 min        10.0
Behavioral Theories          0.5 class session    30 min        10.0
Social Learning Theories     0.5 class session    30 min        10.0
Total                        5 class sessions     300 min or 5 hours      100

4. Determine the number of items for the whole test. To determine the number of items
to be included in the test, the amount of time needed to answer the items is considered.
As a general rule, students are given 30-60 seconds for each item in test formats with
choices. For a one-hour class, this means that the test should not exceed 60 items.
However, because you also need to give time for test paper/booklet distribution and giving
instructions, the number of items should be less, maybe just 50 items.

5. Determine the number of items per topic. To determine the number of items to be
included in the test per topic, the weights per topic are considered. Thus, using the examples
above, for a 50-item final test, Theories & Concepts, Humanistic Theories, Cognitive Theories,
Behavioral Theories, and Social Learning Theories will have 5 items each; Trait Theories, 10
items; and Psychoanalytic Theories, 15 items.

Topic                        Percentage of Time (Weight)    No. of Items
Theories & Concepts          10.0                           5
Psychoanalytic Theories      30.0                           15
Trait Theories               20.0                           10
Humanistic Theories          10.0                           5
Cognitive Theories           10.0                           5
Behavioral Theories          10.0                           5
Social Learning Theories     10.0                           5
Total                        100                            50 items
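
To make steps 3 and 5 concrete, here is a minimal Python sketch (an illustration, not part of the original module) that computes the weight of each topic from the instructional time and allocates the 50 items accordingly:

# A sketch of steps 3 and 5: weights from time spent, then items per topic.
# Topic names and minutes follow the Theories of Personality example above.
topics = {
    "Theories & Concepts": 30,
    "Psychoanalytic Theories": 90,
    "Trait Theories": 60,
    "Humanistic Theories": 30,
    "Cognitive Theories": 30,
    "Behavioral Theories": 30,
    "Social Learning Theories": 30,
}
total_time = sum(topics.values())    # 300 minutes
total_items = 50                     # desired length of the test

for topic, minutes in topics.items():
    weight = minutes / total_time * 100        # step 3: percentage of time
    items = round(total_items * weight / 100)  # step 5: items per topic
    print(f"{topic}: {weight:.1f}% -> {items} items")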

What are the different formats of a test table of specifications?

There are three (3) types of TOS: (1) one-way, (2) two-way, and (3) three-way.

1. One-Way TOS. A one-way TOS maps out the content or topic, test objectives, number
of hours spent, and format, number, and placement of items. This type of TOS is easy to
develop and use because it just works around the objectives without considering the
different levels of cognitive behaviors. However, a one-way TOS cannot ensure that all
levels of cognitive behaviors that should have been developed by the course are covered
in the test.

2. Two-Way TOS. A two-way TOS reflects not only the content, time spent, and number
of items but also the levels of cognitive behavior targeted per test content based on the
theory behind cognitive testing. For example, the common framework for testing at
present in the DepEd Classroom Assessment Policy is the Revised Bloom's Taxonomy
(DepEd, 2015). One advantage of this format is that it allows one to see the levels of
cognitive skills and dimensions of knowledge that are emphasized by the test. It also
shows the framework of assessment used in the development of the test. However, this
format is more complex than the one-way format.

3. Three-way TOS. This type of TOS reflects the features of one-way and two-way TOS.
One advantage of this format is that it challenges the test writer to classify objectives
based on the theory behind the assessment. It also shows the variability of thinking skills
targeted by the test. However, it takes much longer to develop this type of TOS.
Activity 4:

Below are sets of competencies targeted for instruction taken from a particular subject
area in the K – 12 curriculum. Check the assessment method appropriate for the given
competencies.

Sample 1 in Mathematics: Check the competencies appropriate for the given test
format/method

Competencies                 Appropriate for    Appropriate for      Appropriate for
                             Objective Test     Constructed Type     Methods Other Than
                             Format             of Test Format       a Written Test
1. Order fractions less than one
2. Construct plane figures using ruler and
compass
3. Identify Cardinal numbers from 9001
through 900,000
4. Solve 2 – 3 step word problems on
decimals involving the four operations
5. Transform a division sentence into
multiplication sentence and vice-versa

Sample 2 in Science: Check the competencies appropriate for the given test
format/method

Competencies                 Appropriate for    Appropriate for      Appropriate for
                             Objective Test     Constructed Type     Methods Other Than
                             Format             of Test Format       a Written Test
1. Infer that the weather changes during the
day and from day to day
2. Practice care and concern for animals
3. Participate in campaigns and activities for
improving/managing one’s environment
4. Compare the ability of land and water to
absorb and release heat
5. Describe the four types of climate in the
Philippines

Sample 3 in Language: Check the competencies appropriate for the given test
format/method

Competencies                 Appropriate for    Appropriate for      Appropriate for
                             Objective Test     Constructed Type     Methods Other Than
                             Format             of Test Format       a Written Test
1. Use words that describe persons,
places, animals, and events
2. Draw conclusions based on picture-
stimuli/passages
3. Write a different story ending
4. Write simple friendly letters observing
the correct format
5. Compose riddles, slogans, and
announcements from the given
stimuli

Lesson 5: CONSTRUCTION OF WRITTEN TEST


Classroom assessments are an integral part of learners' learning. They do more
than just measure learning. They also inform the learners what needs to be learned and to
what extent and how to learn them. They also provide the parents some feedback about
their child's achievement of the desired learning outcomes.
The schools also get to benefit from classroom assessments because learners' test
results can provide them evidence-based data that are useful for instructional planning and
decision-making. As such, it is important that assessment tasks or tests are meaningful
and further promote deep learning, as well as fulfill the criteria and principles of test
construction.
There are many ways by which learners can demonstrate their knowledge and skills
and show evidence of their proficiencies at the end of a lesson, unit, or subject. While
authentic/performance-based assessments have been advocated as the better and more
appropriate methods in assessing learning outcomes, particularly as they assess higher-
level thinking skills, traditional written assessment methods, such as multiple-choice tests,
are also considered as appropriate and efficient classroom assessment tools for some
types of learning targets. This is especially true for large classes and when test results are
needed immediately for some educational decisions. Traditional tests are also deemed
reliable and exhibit excellent content and construct validity.

What are the general guidelines in choosing the appropriate test format?
Not every test is universally valid for every type of learning outcome. For example, if
an intended outcome for a Research Method 1 course is "to design and produce a
research study relevant to one's field of study," you cannot measure this outcome through
a multiple-choice test or a matching-type test.
To guide you in choosing the appropriate test format and designing fair and
appropriate yet challenging tests, you should ask the following important questions:
1. What are the objectives or desired learning outcomes of the subject/unit/lesson
being assessed?
Deciding on what test format to use generally depends on your learning objectives
or the desired learning outcomes of the subject/unit/lesson. Desired learning outcomes
(DLOs) are statements of what learners are expected to do or demonstrate as a result of
engaging in the learning process.

2. What level of thinking is to be assessed (remembering, understanding, applying,
analyzing, evaluating, and creating)? Does the cognitive level of the test question
match your instructional objectives or DLOs?
The level of thinking to be assessed is also an important factor to consider when
designing your test, as this will guide you in choosing the appropriate test format. For
example, if you intend to assess how much your learners are able to identify important
concepts discussed in class (i.e., remembering or understanding level), a selected-
response format such as a multiple-choice test would be appropriate. However, if you
intend to assess how your students will be able to explain and apply in another setting a
concept or framework learned in class (i.e., applying and/or analyzing level), you may
consider giving constructed-response test formats such as essays.
It is important that when constructing classroom assessment tools, all levels of
cognitive behaviors are represented, from Remembering (R), Understanding (U), Applying
(Ap), Analyzing (An), and Evaluating (E) to Creating (C), taking into consideration the
Knowledge Dimensions, i.e., Factual (F), Conceptual (C), Procedural (P), and
Metacognitive (M). You may return to Lesson 2 and Lesson 4 to review the different levels
of Cognitive Behavior and Knowledge Dimensions.
3. Is the test matched or aligned with the course's DLOs and the course contents or
learning
activities?

The assessment tasks should be aligned with the instructional activities and the
DLOs. Thus, it is important that you are clear about what DLOs are to be addressed by
your test and what course activities or tasks are to be implemented to achieve the DLOs.
For example, if you want learners to articulate and justify their stand on ethical
decision-making and social responsibility practices in business (i.e., a DLO), then an essay
test and a class debate are appropriate measures and tasks for this learning outcome. A
multiple-choice test may be used but only if you intend to assess the learners' ability to
recognize what is ethical versus unethical decision-making practice. In the same manner,
matching-type items may be appropriate if you want to know whether your students can
differentiate and match the different approaches or terms to their definitions.

4. Are the test items realistic to the students?


Test items should be meaningful and realistic to the learners. They should be
relevant or related to their everyday experiences. The use of concepts, terms, situations
that have not been discussed in the class or that they have never encountered, read, or
heard about should be minimized or avoided. This is to prevent learners from making wild
guesses, which will undermine your measurement of what they have really learned from
the class.

WHAT ARE THE MAJOR CATEGORIES AND FORMATS OF TRADITIONAL TESTS?


For the purposes of classroom assessment, traditional tests fall into two general
categories: (1) selected-response type, in which learners select the correct response from
the given options, and (2) constructed-response type, in which the learners are asked to
formulate their own answers. The cognitive capabilities required to answer selected-
response items are different from those required by constructed-response item, regardless
of content.
Selected-Response Tests require learners to choose the correct answer or best
alternative from several choices. While they can cover a wide range of learning materials
very efficiently and measure a variety of learning outcomes, they are limited when
assessing learning outcomes that involve more complex and higher level thinking skills.
Selected-response tests include:
 Multiple Choice Test. It is the most commonly used format in formal testing and
typically consists of a stem (problem), one correct or best alternative (correct
answer), and three or more incorrect or inferior alternatives (distractors).
 True-False or Alternative Response Test. It generally consists of a statement that the
learner must judge as true (accurate/correct) or false (inaccurate/incorrect).
 Matching-Type Test. It consists of two sets of items to be matched with each other
based on a specified attribute.
Constructed-Response Tests require learners to supply answers to a given question or
problem. These include:
 Short Answer Test. It consists of open-ended questions or incomplete sentences
that require learners to create an answer for each item, which is typically a single
word or short phrase. This includes the following types:

Completion. It consists of incomplete statements that require the learners to fill in
the blanks with the correct word or phrase.
Identification. It consists of statements that require the learners to identify or recall
the terms/concepts, people, places, or events that are being described.
Enumeration. It requires the learners to list down all possible answers to the
question.
 Essay Test. It consists of problems/questions that require learners to compose or
construct written responses, usually long ones with several paragraphs.
 Problem-Solving Test. It consists of problems/questions that require learners to solve
problems in quantitative or non-quantitative settings using knowledge and skills in
mathematical concepts and procedures, and/or other higher-order cognitive skills (e.g.,
reasoning, analysis, and critical thinking skills).
WHAT ARE THE GENERAL GUIDELINES IN WRITING MULTIPLE-CHOICE TEST
ITEMS?
1. Write items that reflect only one specific content and cognitive processing skills.
Faulty: Which of the following is a type of statistical procedure used to test a
hypothesis regarding significant relationship between variables , particularly
in terms of the extent and direction of association?
a. ANCOVA b. ANOVA c. Correlation d. t-test
Good: Which of the following is an inferential statistical procedure used to test a
hypothesis regarding significant differences between two qualitative
variables?
a. ANCOVA b. ANOVA c. Chi-square d. Mann-Whitney Test
2. Do not lift and use statements from the textbook or other learning materials as test
questions.
3. Keep the vocabulary simple and understandable based on the level of the
learners/examinees.
4. Edit and proofread the items for grammatical and spelling errors before administering
them to the learners.

Stem:

1. Write the directions in the stem in a clear and understandable manner.

Faulty: Read each question and indicate your answer by shading the circle corresponding
to your answer.

Good: This test consists of two parts. Part A is a reading comprehension test and Part B is
a grammar/language test. Each question is a multiple-choice test item with five (5)
options. You are to answer each question but will not be penalized for a wrong
answer or for guessing. You can go back and review your answers during the time
allotted.

2. Write stems that are consistent in form and structure, that is, present items either in
question form or in descriptive or declarative form.

Faulty: (1) Who was the Philippine president during Martial Law?
(2) The first president of the Commonwealth of the Philippines was ______.
Good: (1) Who was the Philippine president during Martial Law?
(2) Who was the first president of the Commonwealth of the Philippines?

3. Word the stem positively and avoid double negatives, such as NOT and EXCEPT, in a
stem. If a negative word is necessary, underline or capitalize the word for emphasis.

Faulty: Which of the following is not a measure of variability?

Good: Which of the following is NOT a measure of variability?

4. Refrain from making the stem too wordy or containing too much information unless the
problem/question requires the facts presented to solve the problem.

Faulty: What does DNA stand for, and what is the organic chemical of complex molecular
structure found in all cells and viruses and codes genetic information for the transmission
of inherited traits?
Good: As a chemical compound, what does DNA stand for?

Options:
1. Provide three (3) to five (5) options per item, with only one being the correct or best
answer/alternative.

2. Write options that are parallel or similar in form and length to avoid giving clues about
the correct answer.

Faulty: What is an ecosystem?

A. It is a community of living organisms in conjunction with the nonliving components of
their environment that interact as a system. These biotic and abiotic components
are linked together through nutrient cycles and energy flows.
B. It is a place on Earth's surface where life dwells.
C. It is an area that one or more individual organisms defend against competition from
other organisms.
D. It is the biotic and abiotic surroundings of an organism or population.
E. It is the largest division of the Earth's surface filled with living organisms.

Good: What is an ecosystem?


A. It is a place on the Earth's surface where life dwells.
B. It is the biotic and abiotic surroundings of an organism or population.
C. It is the largest division of the Earth's surface filled with living organisms.
D. It is a large community of living and non-living organisms in a particular area.
E. It is an area that one or more individual organisms defend against competition from
other organisms.

3. Place options in a logical order (e.g., alphabetical, from shortest to longest).


4. Place the correct response randomly to avoid a discernible pattern of correct answers.
5. Use "None of the above" carefully and only when there is one absolutely correct answer,
such as in spelling or math items.

6. Avoid "All of the above" as an option, especially if it is intended to be the correct answer.
7. Make all options realistic and reasonable.

What are the general guidelines in writing matching-type items?

The matching test item format requires learners to match a word, sentence, or
phrase in one column (i.e., premise) to a corresponding word, sentence, or phrase in a
second column (i.e., response). It is most appropriate when you need to measure the
learners' ability to identify the relationship or association between similar items. It works
best when the course content has many parallel concepts. While the matching-type test
format is generally used for simple recall of information, you can find ways to make it
applicable or useful in assessing higher levels of thinking such as applying and analyzing.
The following are the general guidelines in writing good and effective
matching-type tests:
1. Clearly state in the directions the basis for matching the stimuli with the responses.
2. Ensure that the stimuli are longer and the responses are shorter.
3. For each item, include only topics that are related with one another and share the same
foundation of information.
4. Make the response options short, homogeneous, and arranged in logical order.
5. Include response options that are reasonable and realistic and similar in length and
grammatical form.
6. Provide more response options than the number of stimuli.

What are the general guidelines in writing true or false items?


True or false items are used to measure the learners' ability to identify whether a
statement or proposition is correct/true or incorrect/false. They are best used when the
learners' ability to judge or evaluate is one of the desired learning outcomes of the course.

There are different variations of the true or false items. These include the following:

1. T-F Correction or Modified True-or-False Question. In this format, the statement is
presented with a key word or phrase that is underlined, and the learner has to supply the
correct word or phrase to make the statement true.

2. Yes-No variation. In this format, the learner has to choose yes or no, rather than true or
false.

3. A-B Variation. In this format, the learner has to choose A or B, rather than true or false.

Because true or false test items are prone to guessing, as learners are asked to
choose between two options, utmost care should be exercised in writing true or false
items. The following are the general guidelines in writing true or false items:

1. Include statements that are completely true or completely false.

2. Use simple and easy-to-understand statements.

3. Refrain from using negatives-especially double negatives.

Double negatives are sometimes confusing and could result in wrong answers, not
because the learner does not know the answer but because of how the test items
are presented.

4. Avoid using absolutes such as "always" and "never."

Absolute words such as "always" and "never" restrict possibilities and make a
statement true 100 percent of the time. They are also a hint for a "false"
answer.
5. Express a single idea in each test item.
6. Avoid the use of unfamiliar words or vocabulary.
Students may have a difficult time understanding the statement, especially if the
word has not been discussed in the class. Using unfamiliar words would likely lead
to guessing.

7. Avoid lifting statements from the textbook and other learning materials.

What are the general guidelines in writing short-answer test items?

A short-answer test item requires the learner to answer a question or to finish an
incomplete statement by filling in the blank with the correct word or phrase. While it is most
appropriate when you only intend to assess learners' lower-level thinking, such as their
ability to recall facts learned in class, you can create items that minimize guessing and
avoid giving clues to the correct answer.

The following are the general guidelines in writing good fill-in-the-blank or completion test
items:

1. Omit only significant words from the statement.

2. Do not omit too many words from the statement such that the intended meaning is lost.

3. Avoid obvious clues to the correct response.

4. Be sure that there is only one correct response.

5. Avoid grammatical clues to the correct response.

6. If possible, put the blank at the end of a statement rather than at the beginning.
What are the general guidelines in writing essay tests?
Teachers generally choose and employ essay tests over other forms of assessment
because essay tests require learners to create a response rather than to simply select a
response from among alternatives. They are the preferred form of assessment when
teachers want to measure learners' higher-order thinking skills, particularly their ability to
reason, analyze, synthesize, and evaluate. They also assess learners' writing abilities.
They are most appropriate for assessing learners' (1) understanding of subject-matter
content, (2) ability to reason with their knowledge of the subject, and (3) problem-solving
and decision-making skills because items or situations presented in the test are authentic
or close to real-life experiences.
There are two types of essay test: (1) extended-response essay, and (2) restricted-
response essay.
The following are the general guidelines in constructing good essay questions:
1. Clearly define the intended learning outcome to be assessed by the essay test.
To design effective essay questions or prompts, the specific intended learning outcomes
are identified. Appropriate direct verbs that most closely match the ability that learners
should demonstrate must be used. These include verbs such as compose, analyze,
interpret, explain, and justify, among others.
2. Refrain from using essay test for intended learning outcomes that are better assessed
by other kinds of assessment.
It is important to take into consideration the limitations of essay tests when planning and
deciding what assessment method to employ for an intended learning outcome.
3. Clearly define and situate the task within a problem situation as well as the type of
thinking required to answer the test.
Essay questions or prompts should provide clear and well-defined tasks to the learners.
It is important to carefully choose the directive verb, to write clearly the object or focus
of the directive verb, and to delimit the scope of the task.
4. Present tasks that are fair, reasonable, and realistic to the students
Essay questions should contain tasks or questions that students will be able to do or
address. These include those that are within the level of instruction/training, expertise,
and experience of the students.
5. Be specific in the prompts about the time allotment and criteria for grading the response.

Essay prompts and directions should indicate the approximate time given to the
students to answer the essay questions to guide them on how much time they should
allocate for each item, especially if several essay questions are presented. How the
responses are to be graded or rated should also be clarified to guide the students on
what to include in their responses.

What are the general guidelines in writing problem-solving test items?


Problem-solving test items are used to measure learners' ability to solve problems
that require quantitative knowledge and competencies and/or critical thinking skills. These
items present a problem situation or task that will require learners to demonstrate work
procedures or come up with a correct solution. Full or partial credit can be assigned to the
answers, depending on the answers or solutions required.
There are different variations of the quantitative problem-solving items. These
include the following:
1. One answer choice - This type of question contains four or five options, and students
are required to choose the best answer.
2. All possible answer choices - This type of question has four or five options, and
students are required to choose all of the options that are correct.
3. Type-In answer - This type of question does not provide options to choose from.
Instead, the learners are asked to supply the correct answer. The teacher should inform
the learners at the start how their answers will be rated.
Problem-solving test items are a good test format as they minimize guessing, measure
instructional objectives that focus on higher cognitive levels, and measure an extensive
amount of contents or topics. However, they require more time for teachers to construct,
read, and correct, and are prone to rater bias, especially when scoring rubrics/criteria are
not available. It is therefore important that good quality problem-solving test items are
constructed.
The following are some of the general guidelines in constructing good problem-solving test
items:

1. Identify and explain the problem clearly.


2. Be specific and clear of the type of response required from the students.
3. Specify in the directions the bases for grading students’ answers/procedures.
Lesson 6 – ESTABLISHING TEST VALIDITY AND RELIABILITY

In order to establish the validity and reliability of an assessment tool, you need to
know the different ways of establishing test validity and reliability.

What is test reliability?

Reliability is the consistency of the responses to a measure under three conditions:
(1) when retested on the same person; (2) when retested on the same measure; and (3)
similarity of responses across items that measure the same characteristic. In the first
condition, a consistent response is expected when the test is given to the same participants.
In the second condition, reliability is attained if the responses to the same test, its
equivalent, or another test that measures the same characteristic are consistent when
administered at a different time. In the third condition, there is reliability when the person
responds in the same way or consistently across items that measure the same
characteristic.

There are different factors that affect the reliability of a measure. The reliability of a
measure can be high or low, depending on the following factors:
1. The number of items in a test - the more items a test has, the higher the likelihood of
reliability. The probability of obtaining consistent scores is high because of the large pool
of items.
2. Individual differences of participants - every participant possesses characteristics that
affect their performance in a test, such as fatigue, concentration, innate ability,
perseverance, and motivation. These individual factors change over time and affect the
consistency of the answers in a test.
3. External environment - the external environment may include room temperature, noise
level, depth of instruction, exposure to materials, and quality of instruction, which could
affect changes in the responses of examinees in a test.

What are the different ways to establish test reliability?

There are different ways in determining the reliability of a test. The specific kind of
reliability will depend on the (1) variable you are measuring, (2) type of test, and (3)
number of versions of the test.
The different types of reliability and how they are established are described below.
1. Linear Regression

Linear regression is demonstrated when you have two variables that are measured,
such as two sets of scores in a test taken at two different times by the same participants.
When the two scores are plotted in a graph (with X- and Y-axis), they tend to form a
straight line. The straight line formed for the two sets of scores can produce a linear
regression. When a straight line is formed, we can say that there is a correlation between
the two sets of scores. This correlation is shown in a graph called a scatterplot. Each point
in the scatterplot is a respondent with two scores (one for each test).
2. Computation of Pearson r correlation

The index of the linear regression is called a correlation coefficient. When the points
in a scatterplot tend to fall within the linear line, the correlation is said to be strong. When
the direction of the scatterplot is directly proportional, the correlation coefficient will have a
positive value. If the line is inverse, the correlation coefficient will have a negative value.
The statistical analysis used to determine the correlation coefficient is called the Pearson r.
How the Pearson r is obtained is illustrated below.

Suppose that a teacher gave a 20-item spelling test of two-syllable words on
Monday and Tuesday. The teacher wanted to determine the reliability of the two sets of
scores by computing for the Pearson r.

Formula:

                N(ΣXY) – (ΣX)(ΣY)
r = -------------------------------------------
     √{ [NΣX² - (ΣX)²][NΣY² - (ΣY)²] }
The value of a correlation coefficient does not exceed 1.00 or -1.00. A value of 1.00
or -1.00 indicates perfect correlation. In tests of reliability, though, we aim for a high positive
correlation to mean that there is consistency in the way the students answered the tests
taken.
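
To illustrate the computation, here is a minimal Python sketch of the raw-score formula above. The two score lists are hypothetical stand-ins for the Monday and Tuesday spelling scores; they are not data from the module.

import math

x = [15, 18, 12, 20, 16, 14, 19, 17]   # Monday scores (hypothetical)
y = [14, 19, 11, 20, 15, 15, 18, 16]   # Tuesday scores (hypothetical)
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)

# Raw-score formula for the Pearson r
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
print(f"r = {r:.2f}")   # a value near +1.00 indicates consistent scores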

3. Difference between a positive and a negative correlation

When the value of the correlation coefficient is positive, it means that the higher the
scores in X, the higher the scores in Y. This is called a positive correlation. In the case of
the two spelling scores, a positive correlation is obtained. When the value of the correlation
coefficient is negative, it means that the higher the scores in X, the lower the scores in Y,
and vice versa. This is called a negative correlation. When the same test is administered to
the same group of participants, usually a positive correlation indicates reliability or
consistency of the scores.

4. Determining the strength of a correlation

The strength of the correlation also indicates the strength of the reliability of the test.
This is indicated by the value of the correlation coefficient. The closer the value is to 1.00
or -1.00, the stronger the correlation. Below is the guide:

0.80 – 1.00    very strong relationship
0.60 – 0.79    strong relationship
0.40 – 0.59    substantial/marked relationship
0.20 – 0.39    weak relationship
0.00 – 0.19    negligible relationship
5. Determining the significance of the correlation

The correlation obtained between two variables could be due to chance. In order to
determine if the correlation is not merely due to chance, it is tested for significance. When a
correlation is significant, it means that the probability that the relationship between the two
variables occurred by chance is very small.

In order to determine if a correlation coefficient value is significant, it is compared with a
critical value, the expected value of the correlation coefficient at a given probability level.
When the computed value is greater than the critical value, there is at least a 95% chance
that the two variables are truly correlated, and the correlation is considered significant.

Another statistical analysis used to determine the internal consistency of a test
is Cronbach's alpha. Follow the procedure below to determine internal consistency.

Suppose that five students answered a checklist about their hygiene with a scale of
1 to 5, wherein the following are the corresponding scores:
5 - always, 4 - often, 3 - sometimes, 2 - rarely, 1 - never

The checklist has five items. The teacher wanted to determine if the items have
internal consistency.

Suppose the computed internal consistency of the responses in the checklist is 0.10,
indicating low internal consistency.
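
Since the worked table is not reproduced here, the following Python sketch shows how Cronbach's alpha is usually computed, assuming the standard formula alpha = [k/(k - 1)][1 - (sum of item variances / variance of total scores)]. The 5 x 5 response matrix is hypothetical, not the module's data.

scores = [      # rows = students, columns = the 5 checklist items (1-5 scale)
    [5, 4, 5, 4, 5],
    [4, 4, 4, 3, 4],
    [3, 2, 3, 3, 2],
    [5, 5, 4, 5, 5],
    [2, 3, 2, 2, 3],
]

def variance(values):               # sample variance, n - 1 in the denominator
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

k = len(scores[0])                                   # number of items
item_vars = [variance([row[i] for row in scores]) for i in range(k)]
total_var = variance([sum(row) for row in scores])   # variance of total scores

alpha = k / (k - 1) * (1 - sum(item_vars) / total_var)
print(f"Cronbach's alpha = {alpha:.2f}")   # about 0.95 for this made-up data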

The consistency of ratings can also be obtained using a coefficient of concordance.


The Kendall’s ω coefficient of concordance is used to test the agreement among raters.
Below is a performance task demonstrated by five students rated by three raters.
The rubric used a scale of 1 to 4, wherein 4 is the highest and 1 is the lowest.

Demonstration    Rater 1    Rater 2    Rater 3    Sum of ratings      D        D²
A                   4          4          3             11           2.6      6.76
B                   3          2          3              8          -0.4      0.16
C                   3          4          4             11           2.6      6.76
D                   3          3          2              8          -0.4      0.16
E                   1          1          2              4          -4.4     19.36
                                              Mean = 8.4        ∑D² = 33.2

The scores given by 3 raters are first computed by summing up the total ratings for
each demonstration. The mean is obtained for the sum of ratings (8.4). The mean is
subtracted from each of the sum of ratings (D). Each difference is squared (D²), then the
sum of squares is computed (∑D² = 33.2). The mean and summation of squared difference
is substituted in the Kendall’s formula. In the formula, m is the number of raters.

12ƩD² 12(33.2)
W = -------------------- = ----------------- = 0.37
m²(N)(N² - 1) 3²(5)(5² - 1)

W = coefficient of concordance
D = the difference between the individual sum of ranks of the raters and the
average of the sum of ranks of the object or individuals
m = no. of judges or raters
N = no. of objects or individuals being rated

A Kendall's ω coefficient value of 0.37 indicates the degree of agreement among the three
raters on the five demonstrations. There is only moderate concordance among the three
raters because the value is far from 1.00. Kendall's ω can be interpreted in the same way
as the Pearson r.
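
The computation above can be verified with a short Python sketch (an illustration only), using the same ratings from the three raters:

ratings = {                      # rater 1, rater 2, rater 3
    "A": [4, 4, 3],
    "B": [3, 2, 3],
    "C": [3, 4, 4],
    "D": [3, 3, 2],
    "E": [1, 1, 2],
}
m = 3                            # number of raters
n = len(ratings)                 # number of demonstrations

sums = [sum(r) for r in ratings.values()]   # sum of ratings per demonstration
mean = sum(sums) / n                        # 8.4
ssd = sum((s - mean) ** 2 for s in sums)    # sum of squared differences, 33.2

w = 12 * ssd / (m ** 2 * n * (n ** 2 - 1))
print(f"W = {w:.2f}")                       # 0.37, as computed above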
What is test validity?

A measure is valid when it measures what it is supposed to measure. If a quarterly
exam is valid, then the contents should directly measure the objectives of the curriculum. If
a scale that measures personality is composed of five factors, then the scores on the five
factors should have items that are highly correlated. If an entrance exam is valid, it should
predict students' grades after the first semester.

How to Determine if an Item is Easy or Difficult (ITEM ANALYSIS)

Item analysis procedures allow teachers to discover items that are ambiguous, irrelevant,
too easy or difficult, and non-discriminating. Item analysis also enhances the technical quality
of an examination, facilitates classroom instruction, and identifies the areas of a student's
weakness, providing information for specific remediation. There are 2 important characteristics
of an item: 1) item difficulty; 2) discrimination index. The difficulty of an item or item difficulty is
the number of students who are able to answer the item correctly divided by the total number
of students.

Simplified Item Analysis Procedure


1. Arrange the test scores from the highest to the lowest
2. Get 1/3 of the papers from the highest scores and 1/3 from the lowest scores leaving the
remaining middle group (1/3) intact.
3. Count the # of students in the upper and lower groups, respectively, who chose each
option.
4. Record the frequency from step 3.
5. Estimate the index of difficulty.
6. Estimate the item discriminating power.
7. Determine the effectiveness of the distracters.
Example: Result of item 21 taken by 45 students:

Stdnt Score Ans Stdnt Score Ans Stdnt Score Ans


1 32 1 16 44 3 31 65 2
2 83 5 17 93 3 32 77 3
3 59 5 18 79 1 33 54 4
4 76 4 19 75 2 34 95 3
5 96 3 20 67 5 35 40 3
6 55 1 21 88 3 36 73 5
7 45 4 22 70 4 37 90 3
8 87 1 23 64 3 38 41 1
9 60 3 24 42 3 39 51 3
10 43 2 25 97 3 40 85 2
11 91 3 26 35 1 41 47 4
12 69 3 27 48 2 42 68 3
13 49 1 28 84 4 43 92 3
14 66 1 29 50 5 44 71 1
15 94 3 30 81 5 45 89 3

Correct answer is 3

Step 1:
97 - 3 79 - 1 55 - 1
96 - 3 77 - 3 54 - 4
95 - 3 76 - 4 51 - 3
94 - 3 75 - 2 50 - 5
93 - 3 73 - 5 49 - 1
92 - 3 71 - 1 48 - 2
91 - 3 70 - 4 47 - 4
90 - 3 69 - 3 45 - 4
89 - 3 68 - 3 44 - 3
88 - 3 67 - 5 43 - 2
87 - 1 66 - 1 42 - 3
85 - 2 65 - 2 41 - 1
84 - 4 64 - 3 40 - 3
83 - 5 60 - 3 35 - 1
81 - 5 59 - 5 32 - 1

Item 21:

Options 1 2 3* 4 5
------------------------------------------------------------------------
Upper (15) 1 1 10 1 2
Lower (15) 5 2 4 3 1
------------------------------------------------------------------------
*Correct answer

Index of Difficulty = (Σ X / N) x 100%

= (10 + 4)/30 x 100% = 14/30 x 100 = .47 x 100% = 47%


Index of Discrimination = (RU – RL)/NG
= (10 – 4)/15 = 6/15 = 0.40

RU – right responses of the upper group
RL – right responses of the lower group
NG – number of students in each group
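
A minimal Python sketch (an illustration only) of the two indices, using the counts for item 21 above:

right_upper = 10   # upper group members who chose option 3 (the key)
right_lower = 4    # lower group members who chose option 3
group_size = 15    # number of students in each group

difficulty = (right_upper + right_lower) / (2 * group_size) * 100
discrimination = (right_upper - right_lower) / group_size

print(f"Index of difficulty = {difficulty:.0f}%")          # 47%
print(f"Index of discrimination = {discrimination:.2f}")   # 0.40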

Interpretation of D values:

D value range      Interpretation

-1.00 – -0.60      Questionable item
-0.59 – 0.20       Not Discriminating
0.21 – 0.40        Moderately Discriminating
0.41 – 0.60        Discriminating
0.61 – 1.00        Very Discriminating

Interpretation of Index of Difficulty:

Difficulty Values    Interpretation

0.00 – 0.20          Very Difficult item
0.21 – 0.40          Difficult item
0.41 – 0.60          Moderately Difficult item
0.61 – 0.80          Easy item
0.81 and above       Very Easy item

Decision Table:

Difficulty Level       Discriminating Level           Decision

Difficult              Not Discriminating             Improbable – Discard
                       Moderately Discriminating      May need revision
                       Discriminating                 Accept
Moderately Difficult   Not Discriminating             Needs revision
                       Moderately Discriminating      May need revision
                       Discriminating                 Accept
Easy                   Not Discriminating             Discard
                       Moderately Discriminating      Needs revision
                       Discriminating                 Accept

Note: A good distracter attracts students in the lower group more than in the upper
group
From the example above:
Option 1 – good
Option 2 – good
Option 3 – good
Option 4 – good
Option 5 – poor
Interpreting Test Scores:

Raw scores are the scores obtained when tests are checked. These scores may or may
not represent a student's ability in the subject or capacity to learn it. It is
necessary to weigh the scores, and weighing may be done by either dividing or multiplying
the score on one section of a test by a number calculated to give the desired weight to a
particular exercise. Raw scores remain meaningless unless interpreted. One way of
interpreting scores is by means of transmutation.

TG = (Score/No. of items) x 50 + 50
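
For example, a raw score of 40 on a 50-item test transmutes to (40/50) x 50 + 50 = 90. A short Python sketch (an illustration only):

def transmute(score, items):
    # TG = (Score / No. of items) x 50 + 50
    return score / items * 50 + 50

print(transmute(40, 50))   # 90.0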

When developing a teacher-made test, it is good to have items that are easy,
average, and difficult, with positive discrimination indices. If you are developing a
standardized test, the rule is more stringent as it aims for average items (neither too easy
nor too difficult) whose discrimination index is at least 0.3.
Lesson 7 – Organization of Test Data Using Tables and Graphs
Test data are better appreciated and communicated if they are arranged, organized,
and presented in a clear and concise manner. Good presentation requires designing a
table that can be read easily and quickly. Tables and graphs are common tools that help
readers better understand the test results that are conveyed to concerned groups like
teachers, students, parents, administrators, or researchers, which are used as basis in
developing programs to improve learning of students.

Presentation of Data: (How do we present test data?)

The Textual Form: It is utilized when data to be presented are purely qualitative or when
very few numbers are involved.

The Tabular Form: Statistical table is a more effective device in presenting data. A
statistical table has four essential components:
1. Table heading – shows table # and title
2. The Body – main part of the table (quantitative info)
3. The Stubs - classifications or categories which are presented as values of
a variable.
4. The Boxheads – the captions that appear above the columns

The Graphical or Pictorial Form


1. Line Graph – an effective device used to portray changes in values with respect to
time
2. Bar Graph – shows categorical and chronological comparisons.
a. Vertical bar graph – used for chronological comparisons
b. Horizontal bar graph – used for categorical comparisons
3. Component Bar Chart – graphic device used to show the relative sizes of the
components that make up a total
4. Pie Chart – portrays magnitudes of the component parts of a whole.
5. Pictograph – used to dramatize the differences among a few quantities.
6. Statistical Maps – are used to present quantitative data which describe or classify
geographical
areas.

[Sample charts omitted: a vertical bar graph, a horizontal bar graph, and a line graph, each
plotting three data series, and a pie chart with eight categories.]

FREQUENCY DISTRIBUTION:

Array – arrangement of data according to size or magnitude.


Frequency Distribution (FD) – grouping values into mutually exclusive classes showing
the number of observations occurring in each class in a tabular form. A simple FD has
two columns, one for the class intervals and the other for class frequencies. The class
frequencies (f) indicate the number of observations falling w/in the different class
intervals. Each class interval has two limits, the lower limit and the upper limit; the true
limits are known as the class boundaries.
Interval size (i) – the number of scores included in a class interval, also known as the
width of the interval.
Class mark (Xm) – midpoint of the interval

Ll + Ul
Xm = -----------------
2
Example:
Class interval f Xm Class Boundaries
----------------------------------------------------------------------------
50 – 54 4 52 49.5 – 54.5
45 – 49 7 47 44.5 – 49.5
40 – 44 12 42 39.5 – 44.5
35 – 39 10 37 34.5 – 39.5
30 – 34 9 32 29.5 – 34.5
25 – 29 6 27 24.5 – 29.5
20 – 24 2 22 19.5 – 24.5
--------------------------------------------------------------------------
i= 5 n = 50
Construction of a FD:

1. Determine the Range (R) by taking the difference of the highest score and the lowest
score.
2. Decide on the number of class intervals. Compute the number of intervals, n, by using
the formula: n = 1 + 3.3 logN, where n = No. of class intervals, N = population
3. Divide the range by the desired # of class interval to get the interval size (i). i = R/n
4. Using the lowest score as the lower limit, add (i – 1) to it to obtain the upper limit of the
first class interval.
5. The lower limit of the 2nd interval (and so on) may be obtained by adding the class size
to the lower limit of the first interval.

Example
The following are the scores of the 3rd year BSEd students in Statistics test.

43 58 21 24 31 49 40 51 55 28
50 33 62 30 25 39 59 29 36 42
38 46 42 16 50 41 37 35 40 52
47 35 57 55 36 45 32 45 42 36

Construct a FD for the data

Solution:
R = 62 – 16 = 46
n = 1 + 3.3logN
= 1 + 3.3log40
= 6.29, rounded up to 7 (round up if there is a decimal; if the result is a whole
number, add an extra class to accommodate all the data)

i = R/n = 46 / 6.29 = 7.3 say 7

X Tally f____
58 – 64 lll 3
51 – 57 llll 5
44 – 50 llll – ll 7
37 – 43 llll – llll 10
30 – 36 llll – llll 9
23 – 29 llll 4
16 – 22 ll 2____
i= 7 n = 40
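
The construction steps can also be carried out in a short Python sketch (an illustration only), applied to the same 40 Statistics scores; note that it prints the classes from lowest to highest, while the table above lists them from highest to lowest:

import math

scores = [43, 58, 21, 24, 31, 49, 40, 51, 55, 28,
          50, 33, 62, 30, 25, 39, 59, 29, 36, 42,
          38, 46, 42, 16, 50, 41, 37, 35, 40, 52,
          47, 35, 57, 55, 36, 45, 32, 45, 42, 36]

r = max(scores) - min(scores)                  # step 1: range = 46
n_classes = 1 + 3.3 * math.log10(len(scores))  # step 2: 6.29
i = round(r / n_classes)                       # step 3: interval size = 7

lower = min(scores)                            # step 4: start at lowest score
while lower <= max(scores):
    upper = lower + i - 1
    f = sum(lower <= s <= upper for s in scores)
    print(f"{lower} - {upper}: f = {f}")
    lower += i                                 # step 5: next lower limit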
Derived Frequencies:
1. Relative Frequency Distribution
2. Cumulative Frequency Distribution
3. Cumulative Percentage Frequency Distribution
rf = f / n x 100%
Cpf = cf / n x 100%

X f rf cf< cf> cpf< cpf>


----------------------------------------------------------------------------
50 – 54 4 8 50 4 100 8
45 – 49 7 14 46 11 92 22
40 – 44 12 24 39 23 78 46
35 – 39 10 20 27 33 54 66
30 – 34 9 18 17 42 34 84
25 – 29 6 12 8 48 16 96
20 – 24 2 4 2 50 4 100
--------------------------------------------------------------------------
i= 5 n = 50

Frequency Polygon:

An effective device used for comparative purposes. It is constructed by plotting the
frequencies against the corresponding class marks, connecting successive points by
means of straight lines, and allowing both tails to touch the horizontal axis by adding an
extra class mark at each tail of the distribution.

Activity 7:
Consider the following raw data on an arithmetic test:

56 42 68 56 42 78 54 53 56
55 62 44 48 55 57 37 62 47
66 65 54 72 52 42 68 39 38
50 52 47 62 82 41 48 42 60
28 47 48 56 45 58 70 80 67

a. Construct a FD
b. Determine the class marks, class boundaries,
relative frequency, cf< and cf>, cpf < and cpf>
c. Construct a frequency polygon

Lesson 8 – Analysis, Interpretation, and Use of Data


What are measures of central tendency?

The word "measures of central tendency" means the central location or point of
convergence of a set of values. Test scores have a tendency to converge at a central
value. This value Is the average of the set of scores. In other words, a measure of central
tendency gives a single value that represents a given set of scores. Three commonly-used
measures of central tendency or measures of central location are the mean, the median,
and the mode.

Mean – the center of gravity of the distribution and the most widely used measure. The
mean is sensitive to extreme scores.

The Mean of ungrouped data:

x̅ = Σ X / n sample mean
µ= ΣX/N population mean

Example: The following are the family sizes of a sample of 10 households in a slum area.
2, 3, 4, 2, 5, 3, 7, 4, 3, 7

n = 10, ΣX = 40

x̅ = ΣX / n = 40 / 10 = 4

The Mean of Grouped data:

Midpoint Formula:

x̅ = Σ fXm / n = 1,905 / 50 = 38.1

X f Xm fXm
--------------------------------------------------------
50 – 54 4 52 208
45 – 49 7 47 329
40 – 44 12 42 504
35 – 39 10 37 370
30 – 34 9 32 288
25 – 29 6 27 162
20 – 24 2 22 44
--------------------------------------------------------
i= 5 n = 50 Σ fXm = 1,905
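
A minimal Python sketch (an illustration only) of the midpoint formula, using the same grouped data:

classes = [(50, 54, 4), (45, 49, 7), (40, 44, 12), (35, 39, 10),
           (30, 34, 9), (25, 29, 6), (20, 24, 2)]   # (lower, upper, f)

n = sum(f for _, _, f in classes)                          # 50
sum_fxm = sum(f * (lo + up) / 2 for lo, up, f in classes)  # Σ fXm = 1,905

print(f"Mean = {sum_fxm / n}")   # 38.1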

Median – the score-point which divides the distribution into two equal parts; it is the value
below which 50% of the data lie; it is not sensitive to extreme scores.

Median of Ungrouped data:

Values should be arranged in the order of magnitude, either ascending or
descending order. For data involving an odd number of scores, the median is simply the
middle value.

(n+1) th
mdn = ------ score from the lowest
2
Example 1: Find the median of the following sample data:
6, 15, 8, 42, 18, 24, 23,

n=7

mdn = (7+1 / 2) = 8/2 = 4th score from the lowest

6, 8, 15, 18, 23, 24, 42

mdn = 18

Example 2: Find the median of the ff. data

121, 108, 120, 98, 132, 100, 92, 140, 102, 98

mdn = (10+1) / 2 = 11/2 = 5.5th score

92, 98, 98, 100, 102, 108, 120, 121, 132, 140

mdn = (102 + 108) / 2 = 210/2 = 105

Median of Grouped Data:

n/2 - cfP
mdn = Xlb + ----------- i
fm

Xlb = lower boundary of the median class
fm = frequency of the median class
cfP = cumulative freq. of the preceding class interval
i = interval size
n = total number of observations

Example:
X f cf
--------------------------------------------------------------
50 – 54 4 50
45 – 49 7 46
40 – 44 12 39
35 – 39 10 27
30 – 34 9 17
25 – 29 6 8
20 – 24 2 2
--------------------------------------------------------------
i= 5 n = 50

n/2 = 50/2 = 25

Xlb = 34.5, cfp = 17, fm = 10, i=5

n/2 - cfP
Mdn = Xlb + ----------- i
fm
25 - 17
Mdn = 34.5 + ----------- (5)
10
Mdn = 34.5 + (0.8) 5
Mdn = 34.5 + 4
Mdn = 38.5
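
The same result can be obtained with a short Python sketch (an illustration only); the classes are listed from lowest to highest so the cumulative frequency can be accumulated directly:

classes = [(20, 24, 2), (25, 29, 6), (30, 34, 9), (35, 39, 10),
           (40, 44, 12), (45, 49, 7), (50, 54, 4)]   # (lower, upper, f)
n = sum(f for _, _, f in classes)    # 50
i = 5                                # interval size

cf = 0
for lo, up, f in classes:
    if cf + f >= n / 2:              # the median class (35 - 39)
        xlb = lo - 0.5               # lower boundary = 34.5
        print(f"Median = {xlb + (n / 2 - cf) / f * i}")   # 38.5
        break
    cf += f                          # cumulative frequency of preceding classes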

Mode: The value which is observed to have the highest frequency.

For ungrouped data, the mode is obtained by mere inspection. While it is possible
for a set of values to have no mode, it is also possible for other sets to have more than one
mode.

Example:
4, 5, 8, 8, 8, 9, 12, 12, 15, 19, 20

Mo = 8

Mode of Grouped Data:

Formulas:
1. Mo = 3(Mdn) – 2(Mean)
Mo = 3(38.5) – 2 (38.1)
Mo = 115.5 – 76.2
Mo = 39.3
d1
2. Mo = Xlb + ----------- i
d1 + d 2

Example:
X f modal class: 40 - 44
------------------------------ Xlb = 39.5
50 – 54 4 d1 = 2
45 – 49 7 d2 = 5
40 – 44 12
35 – 39 10 Mo = 39.5 + (2/(2+5)) 5
30 – 34 9 Mo = 39.5 + (2/7) 5
25 – 29 6 Mo = 39.5 + (.286) 5
20 – 24 2 Mo = 39.5 + 1.43
------------------------------ Mo = 40.93
i= 5 n = 50

When are mean, median, and mode appropriately used?

To appreciate the comparison of the three measures of central tendency, a brief
background on levels of measurement is important. We make observations and perform
assessments on many variables related to learning: vocabulary, spelling ability, self-
concept, birth order, socio-economic status, etc. The level of measurement helps you
decide how to interpret the data as measures of these attributes, and this serves as a
guide in determining in part the kind of descriptive statistics to apply in analyzing the test
data.

Scales of Measurement:

1. Nominal Scale – classifies elements into two or more categories or classes ( gender,
religion, etc.)
2. Ordinal Scale – ranks the individuals in terms of the degree to which they possess a
characteristic of interest
3. Interval Scale – in addition to ordering scores from highest to lowest, establishes a
uniform unit in the scale so that any distance between two consecutive
scores is of equal magnitude.
4. Ratio Scale – in addition to being an interval scale, also has an absolute zero in the
scale (height, weight , area, volume, speed, etc.).

Skewness:

A distribution actually takes the form of a bell-shaped or normal curve if the mean,
the median and the mode are equal. It becomes positively skewed if the mean is greater
than the median and negatively skewed if otherwise. As a general rule, the closer the
coefficient of skewness is to zero, the less skewed the distribution will be, and the farther
this coefficient is from zero, the more skewed the distribution will be.

3(x̅ – Mdn.)
Sk = -------------------
S
The performance of the students is said to be satisfactory or very satisfactory when
the curve is negatively skewed while it is unsatisfactory if it is positively skewed.

[Figure: a positively skewed curve and a negatively skewed curve]

The curve is positively skewed if it tails off to the right and negatively skewed if it tails
off to the left.
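
The coefficient of skewness is straightforward to compute once the mean, median, and
sample standard deviation are known. A minimal sketch, using an illustrative set of scores
of our own choosing:

import statistics

scores = [6, 8, 15, 18, 23, 24, 42]

mean = statistics.mean(scores)
mdn = statistics.median(scores)
s = statistics.stdev(scores)        # sample standard deviation

sk = 3 * (mean - mdn) / s
print(round(sk, 2))                 # positive here, so the curve tails off to the right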

Other Measures of Position:

Quartile: Divides the distribution into 4 equal parts.

For ungrouped data, follow the steps:


1. Arrange the scores in decreasing order.
2. From the bottom, find the points below which 25% of the score values and 75%
of the score values fall.
3. Find the average of the two scores in each of these points to determine Q 1 and
Q3 respectively.

Example: Given the following scores, find the 1st and 3rd Quartiles

90, 85, 86, 109, 105, 88, 100, 85, 105, 110, 112, 100

Applying the steps:

112
110
109          Q3 = (109 + 105) ÷ 2 = 107          Upper half
105
105
100
------------------------------------------------------------
100
90
88           Q1 = (88 + 86) ÷ 2 = 87             Lower half
86
85
85
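
The same three steps can be carried out in Python. In this sketch the halves are taken
from the sorted list, and the median of each half gives the quartile, matching the hand
computation above:

import statistics

scores = [90, 85, 86, 109, 105, 88, 100, 85, 105, 110, 112, 100]

ordered = sorted(scores)                  # ascending order
half = len(ordered) // 2
lower_half = ordered[:half]
upper_half = ordered[half:]

q1 = statistics.median(lower_half)        # 87.0
q3 = statistics.median(upper_half)        # 107.0
print(q1, q3)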

For Grouped Data:

n/4 - cfP
Q1 = Xlb + ------------- i
fq

3n/4 - cfP
Q3 = Xlb + ------------- i
fq

Xlb = lower boundary of the quartile class
fq = frequency of the quartile class
cfP = cumulative freq. of the preceding class interval
i = interval size
n = total number of observations
Measures of Variation/Dispersion:

a. Range – difference between the highest value (Hv) and the lowest value (Lv): Range = Hv – Lv


b. Quartile Deviation – half of the inter-quartile range, also called the semi-interquartile
range [QD = (Q3 – Q1) ÷ 2]
c. Mean Deviation – the average of the absolute deviations of the individual scores from the mean.
d. Variance – the average of the squared deviations
e. Standard Deviation – the most reliable measure of variation, obtained by
extracting the square root of the variance.

Variance and Standard Deviation:

Ungrouped Data (Deviation Score Formula):


Σ (X – x̅ )²
(Variance) S² = -------------
n–1

(Standard Dev.) S = √[ Σ (X – x̅ )² ÷ (n – 1) ]

Example:
19,434.60
(Variance) S² = --------------- = 329.4
60 – 1

(Standard Dev.) S = √ 329.4 = 18.15
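
A quick Python check of the deviation score formula (the function is a sketch of ours;
the worked example above supplies only the sum of squared deviations, so we reuse that
figure directly):

import math

def sample_variance(scores):
    # Σ(X - mean)² / (n - 1), the deviation score formula
    mean = sum(scores) / len(scores)
    ss = sum((x - mean) ** 2 for x in scores)
    return ss / (len(scores) - 1)

variance = 19434.60 / (60 - 1)           # sum of squares and n from the example
print(round(variance, 1))                # 329.4
print(round(math.sqrt(variance), 2))     # 18.15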

Coded Formula:
        nΣfX’² - (ΣfX’)²
S² = ---------------------- (i²)
            n(n – 1)

           nΣfX’² - (ΣfX’)²
S = √[ ---------------------- (i²) ]
              n(n – 1)

X            f       X’      fX’     fX’²
90 – 98 3 4 12 48
81 – 89 8 3 24 72
72 – 80 12 2 24 48
63 – 71 11 1 11 11
54 – 62 10 0 0 0
45 – 53 6 -1 -6 6
36 – 44 5 -2 -10 20
27 – 35 3 -3 -9 27
18 – 26 2 -4 -8 32____
i=9 n = 60 Σ fX’ = 38 Σ fX’² = 264

Variance:

60(264) – (38)²
S² = ---------------------- (9²)
60(60 – 1)

15840 – 1444
S² = ------------------- (81)
3540

S² = (14,396 ÷ 3,540) 81
S² = 329.4

Standard Deviation:

S = √329.4
S = 18.15

Standard Scores:

There are many kinds of standard scores. The most useful is the Z-score, which
expresses a raw score in relation to the mean and standard deviation of the distribution.
In other words, it tells how far a raw score is from the mean in standard deviation units.
To transform a raw score into a Z-score, we use the following formula:

Z = (X – x̅ ) ÷ S , where X = raw score

Example:
On the first day of the final examination week, Larry took the tests for three subjects
– mathematics, economics, and philosophy. Although he felt that his preparation was the
same for these three tests, he believed he did a very good job on the philosophy test. The
test results were released 5 days later and Larry gathered the following information:
Math: Mean = 60, S = 8, and his raw score is 70
Economics: Mean = 72, S = 6, and his raw score is 78
Philosophy: Mean = 85, S = 5, and his raw score is 82

On which test did Larry perform best? Worst? (Assume that the three subjects have
the same number of items.)

Math: Z = (70 – 60)/8 = 1.25


Economics Z = (78 – 72)/6 = 1.00
Philosophy Z = (82 – 85)/5 = -0.6

Best in Math: his score is 1.25 standard deviations above the mean under the normal curve.
Worst in Philosophy: while his raw score of 82 is numerically higher than 70 (Math), the
resulting Z-score is -0.6, which means that his score is more than half a standard
deviation below the average performance of the whole class.
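
Larry's three Z-scores reduce to one line of arithmetic each; a minimal sketch:

def z_score(raw, mean, sd):
    return (raw - mean) / sd

print(z_score(70, 60, 8))   # 1.25  (Math)
print(z_score(78, 72, 6))   # 1.0   (Economics)
print(z_score(82, 85, 5))   # -0.6  (Philosophy)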

What is Coefficient of Variation as a measure of relative dispersion?

You should note that the range, semi-interquartile range or quartile deviation, and
standard deviation discussed earlier are expressed in the units of the original scores.
Thus, they are measures of absolute dispersion. Let us say one distribution of test scores
in mathematics has a standard deviation of 10, and another distribution of scores in
science has a standard deviation of 5. If we want to compare the variability of the two
distributions, can we say that the distribution with a standard deviation of 10 has twice
the variability of the one with a standard deviation of 5? Consider another example. One
distribution has a standard deviation of 8 meters, while another has a standard deviation of
₱15.00. Can we say that the latter distribution is more spread out than the former?
Can we compare standard deviations in meters and in pesos? The answer seems obvious.
We cannot conclude anything by direct comparison of measures of absolute dispersion
because they are in different units or of different categories. In the first example, one is
the distribution of mathematics scores while the other is the distribution of science scores.
To make the comparison logical, we need a measure of relative dispersion which is
dimensionless or "unit free." This measure of relative dispersion is also known as the
coefficient of variation. It is simply the ratio of the standard deviation of a distribution
to the mean of the distribution, expressed as a percentage.

cv = (s ÷ x̅ ) 100%

Let’s take the CV of the above tests of Larry:

Math:        cv = 8 / 60 = 0.133 x 100% = 13.3%


Economics:   cv = 6 / 72 = 0.083 x 100% = 8.3%
Philosophy:  cv = 5 / 85 = 0.059 x 100% = 5.9%

Scores in Math are more dispersed than in the other two subjects.
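
The same computation in Python form, looping over the three subjects:

def coefficient_of_variation(sd, mean):
    return sd / mean * 100                     # expressed as a percentage

for subject, sd, mean in [("Math", 8, 60), ("Economics", 6, 72), ("Philosophy", 5, 85)]:
    print(subject, round(coefficient_of_variation(sd, mean), 1), "%")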

The Normal Distribution

The normal distribution is a special kind of symmetrical distribution that is most
frequently used to compare scores. It has been found that when a frequency polygon for a
large distribution of scores on naturally occurring characteristics (IQ, height, income,
test scores, etc.) is drawn as a smooth curve, one curve stands out: the bell-shaped
curve. This curve has a small percentage of observations on both tails and a bigger
percentage on the inner part. The shape of this particular curve is known as the normal
curve, hence the name normal distribution.

It is also called Gaussian distribution, named after Carl Friedrich Gauss. This
distribution has been used as a standard reference for many statistical decisions in the
field of research and evaluation.

In the discussion about normal distribution, the standard deviation becomes more
useful because it is used to determine the percentage of scores that fall within a certain
number of standard deviations from the mean. As a result of many experiments, empirical
rules have been established pertaining to the areas under the normal curve. In
assessment, the area under the curve refers to the number of scores that fall within a
specific number of standard deviations from the mean score. In other words, each portion
under the curve contains a fixed percentage of cases, as follows:

68% of the scores fall between one standard deviation below and above the mean
95% of the scores fall between two standard deviations below and above the mean
99.7% of the scores fall between three standard deviations below and above the mean

Characteristics/properties of the Normal Distribution:

1. The mean, median, and mode are all equal.


2. The curve is symmetrical. As such, the value in a specific area on the left is equal to the
value of its corresponding area on the right.
3. The curve changes from concave to convex and approaches the X-axis, but the tails do
not touch the horizontal axis.
4. The total area under the curve is equal to 1.
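
The empirical percentages above can be recovered from the cumulative distribution
function of the standard normal curve. Python's statistics.NormalDist (available from
Python 3.8) makes this a short check:

from statistics import NormalDist

z = NormalDist()                          # standard normal: mean 0, SD 1

for k in (1, 2, 3):
    share = z.cdf(k) - z.cdf(-k)          # area within k SDs of the mean
    print(f"within ±{k} SD: {share:.2%}")
# within ±1 SD: 68.27%
# within ±2 SD: 95.45%
# within ±3 SD: 99.73%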

Other Standard Scores:

There are two other standard scores aside from the Z-score: the T-score and the
stanine score.
The T-score:

As you have seen in the computation of the Z-score, it can give a negative number,
which simply means the score is below the mean. However, communicating a negative Z-
score as below the mean may not be understandable to others. We would not even say to
students that they got a negative Z-score. A Z-score may also be a repeating or
non-repeating decimal, which may not be comfortable for others. One option is to convert a
Z-score into a T-score, which is a transformed standard score. To do this, the scale is
changed so that the mean of 0 in the Z-score distribution becomes a mean of 50, and the
standard deviation of 1 becomes 10. The corresponding equation is:

T-score = 50 +10z

For example, a z-score of -2 is equivalent to a T-score of 30. That is:

T-score = 50 + 10(Z)
= 50 + 10(-2)
= 50 – 20
= 30

Looking back at the Philosophy score of Larry in our previous example, which
resulted in a z-score of -0.6, T-score equivalent is:

T = 50+ 10(-.6)
= 50 – 6
= 44

T-scores are convenient because scores below 0 and above 100 are virtually
impossible; in fact, 99.7% of the time a T-score will be between 20 and 80, because these
limits are three standard deviations below and above the mean, respectively.
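
The conversion is a single line in code; a minimal sketch:

def t_score(z):
    return 50 + 10 * z

print(t_score(-2))      # 30
print(t_score(-0.6))    # 44.0, Larry's Philosophy T-score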

Stanine Scores.

Another standard score is the stanine, short for "standard nine." With nine in its
name, the scores are on a nine-point scale. In a Z-score distribution, the mean is 0 and
the standard deviation is 1. In the stanine scale, the mean is 5 and the standard deviation
is 2. Each stanine is one-half standard deviation wide. Like the T-score, a stanine score
can be calculated from the Z-score by multiplying the Z-score by 2 and adding 5. That is:

Stanine = 2Z + 5

Going back to our example of Larry’s score in Philosophy that is 82 with a Z-score
of -0.6, its stanine equivalent is:

Stanine = 2(-0.6) + 5
= -1.2 + 5
= 3.8 or 4
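
In code, the conversion needs only a rounding step and a clamp to keep the result on the
1-to-9 scale; the function below is a sketch of ours:

def stanine(z):
    s = round(2 * z + 5)          # 2Z + 5, rounded to the nearest whole number
    return min(max(s, 1), 9)      # stanines never go below 1 or above 9

print(stanine(-0.6))              # 4, as computed above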
On the assumption that stanine scores are normally distributed, the percentages of cases
in each band or range of scores in the scale are as follows:

Stanine Score Percentage of Scores


1 Lowest 4%
2 Next Low 7%
3 Next Low 12%
4 Next Low 17%
5 Middle 20%
6 Next High 17%
7 Next High 12%
8 Next High 7%
9 Highest 4%

With the above percentage distribution of scores in each stanine, you can directly
convert a set of raw scores into stanine scores. Simply arrange the raw scores from lowest
to highest, and with the percentage of scores in each stanine, you can directly assign the
appropriate stanine score to each raw score. On the interpretation of stanine scores, let us
say Kate has a stanine score of 2. We can see that her score falls within the next lowest 7
percent of the scores, roughly between the 4th and 11th percentiles. In the same way, if
John's score is in the 6th stanine, it falls between the 60th and 77th percentiles, simply
because 60 percent of the scores are below the 6th stanine and 23 percent of the scores
are above it. For qualitative description, stanine scores of 1, 2, and 3 are considered below
average; 4, 5, and 6 are average; and 7, 8, and 9 are above average. Thus, you can say that
your score of 86 in English is above average. Similarly, Kate's score is below average while
that of John is average.

[Figure: the normal distribution and the equivalence of the different commonly used
standard scores]
What are measures of covariability?
There are situations when we look at examinees' performance measures and ask
ourselves what could explain such scores. Measures of covariability tell us, to a certain
extent, the relationship between two tests or two factors. Admittedly, the score one gets
may not be due to a single factor alone but to other factors, directly or indirectly
observable, which are also related to one another. This section will be limited to
introducing two scores that are hypothesized to be related to one another.
When we are interested in finding the degree of relationship between two scores,
we are dealing with the correlation between two variables. The statistical measure is the
correlation coefficient, an index number that ranges from -1.0 to +1.0. The value -1.0
indicates a perfect negative correlation, 0.00 no correlation at all, and +1.0 a perfect
positive correlation. Many correlation studies have been conducted in the field of
assessment and research, but the computed coefficients rarely take these exact values;
instead, they fall somewhere in between, closer to either +1.0 or -1.0.

Interpretation of correlation or relationship:


0.80 -1.00 very strong relationship
0.60 - 0.79 strong relationship
0.40 - 0.59 substantial/marked relationship
0.20 - 0.39 weak relationship
0.00 - 0.19 negligible relationship

To measure the relationship between two variables, we use the Pearson Product-Moment
Correlation (r):

                    N(ΣXY) – (ΣX)(ΣY)
r = --------------------------------------------------
      √{ [NΣX² - (ΣX)²] [NΣY² - (ΣY)²] }

This formula was introduced earlier in Lesson 6, when you were taught how to
compute the reliability coefficient of scores. This is the same Pearson r, but this time it
is used to establish the relationship between two sets of data.
The mathematical process gave a correlation coefficient of 0.705 between
performance scores in reading and problem-solving. Based on the interpretation table
above, this coefficient indicates a strong relationship between the two variables.
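
The Pearson r formula translates directly into Python. The data below are hypothetical
paired scores of our own, used only to show the function at work:

import math

def pearson_r(x, y):
    # Pearson Product-Moment Correlation from two lists of raw scores
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

x = [12, 15, 9, 14, 10]            # hypothetical scores on one test
y = [10, 14, 8, 12, 11]            # hypothetical scores on another test
print(round(pearson_r(x, y), 2))   # about 0.88, a very strong positive relationship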

There are some precautions to be observed with regard to the computed r as a
correlation coefficient. First, it should not be interpreted as a percent. Thus, 0.705 should
not be interpreted as 70%. If we want to extend the meaning of 0.705, then we compute for
r², which is called the coefficient of determination. This coefficient explains the percentage
of the variance in one variable that is associated with the other variable. Remember how
variance and standard deviation were explained in the previous sections? Now, with
reference to the two variables indicated in Table 8.3 and the computed r of 0.70
(rounded off to the nearest hundredth), the result is an r² of 0.49, which can be taken as
49%. This can be interpreted to mean that 49% of the variance in problem-solving test
scores is associated with the reading scores. Thus, r² helps explain the variance observed
in the problem-solving scores. The total variance is equal to 1 or, in percent, 100%. If 49%
of the variance observed in problem-solving scores is attributable to reading scores, then
the other 51% of the variance in problem-solving test scores is due to other factors. This
concept is concretized in Figure 8.12. Second, while the correlation coefficient shows the
relationship between two variables, it should not be interpreted as causation. Considering
our example, we cannot say that the scores in the reading test cause 49% of the variance
of the problem-solving test scores. Relationship is different from causation.

[Figure 8.12: two overlapping circles labeled Reading and Problem-Solving, with the 49%
overlap representing their shared variance]

Activity 8:

1. The following is a frequency distribution of year-end examination marks in a certain
secondary school.

Class Interval f
60-64 2
55-59 5
50-54 6
45-49 8
40-44 11
35-39 10
30-34 11
25-29 20
20-24 17
15-19 6
10-14 4
i= n=

a. Compute the mean, median, and mode of the frequency distribution


b. Compare the measures of average in (a). From this comparison, is the distribution
positively skewed, negatively skewed, or symmetrical?
c. Make a sketch of the graph of the frequency distribution. Describe the graph of the
distribution as to its skewness.
d. Find:

(1) Third quartile or the 75th percentile (P75)


(2) First quartile or the 25th percentile (P25)
(3) semi-interquartile range or quartile deviation

2. A common exit examination is given to 400 students in a university. The scores are
normally distributed, and the mean is 78 with a standard deviation of 6. Daniel had a score
of 72 and Jane a score of 84. What are the corresponding Z-scores of Daniel and Jane?
How many students would be expected to score between the scores of Daniel and Jane?
Explain your answer.

3. James obtained a score of 40 in his Mathematics test and 34 in his Reading test. The
class mean score in Mathematics is 45 with a standard deviation of 4 while in Reading, the
mean score is 50 with a standard deviation of 7. On which test did James do better
compared to the rest of the class? Explain your work.

4. Following are sets of scores on two variables: X for reading comprehension and Y for
reasoning skills, administered to a sample of students.

X: 11 9 15 7 5 9 8 4 8 11
Y: 13 8 14 9 8 7 7 5 10 12

a. Compute the Pearson Product-moment correlation for the above data.


b. Describe the direction and strength of the relationship between reading
comprehension and reasoning skills.
c. Compute the Coefficient of Determination. Interpret the results.
Lesson 9: Grading and Reporting

Grading and reporting are fundamental elements in the teaching-learning process.


Assignment of grades represents the teacher's assessment of the learners' performance
on the tests and on the desired learning outcomes as a whole. As such, it is important that
the bases and criteria for grading (i.e., scoring) and reporting test results are clearly
established and articulated from the very start of the course. Grades are symbolic
representations that summarize the quality of learners' work and level of achievement.
Teachers should ensure that grading and reporting of learners' test results are meaningful,
fair, and accurate.

What are the purposes of grading and reporting learners' test performance?

There are various reasons why we assign grades and report learners' test
performance. Grades are alphabetical or numerical symbols/marks that indicate the
degree to which learners are able to achieve the intended learning outcomes. Grades do
not exist in a vacuum but are part of the instructional process and serve as a feedback
loop between the teacher and learners. They are one of the ways to communicate the level
of learning of the learners in specific course content. They give feedback on what specific
topic/s learners have mastered and what they need to focus on more when they review for
summative assessments or final exams. In a way, grades serve as a motivator for learners
to study and do better in the next tests to maintain or improve their final grade.

Grades also give the parents, who have the greatest stake in the learners' education,
information about their children’s achievements. They provide teachers some bases for
improving their teaching and learning practices and for identifying learners who need
further educational intervention. They are also useful to school administrators who want to
evaluate the effectiveness of the instructional programs in developing the needed skills
and competencies of the learners.

What are the different methods in scoring tests or performance tasks?

There are various ways to score and grade results in multiple-choice tests.
Traditionally, the two most commonly used scoring methods are Number Right Scoring
(NR) and Negative Marking (NM).

Number Right Scoring (NR) entails assigning positive values only to correct
answers while giving a score of zero to incorrect answers. The test score is the sum of the
scores for correct responses. One major concern with this scoring method is that learners
may get the correct answer by guessing, thus affecting the test reliability and validity.

Negative Marking (NM) entails assigning positive values to correct answers while
penalizing the learners for incorrect responses (i.e., the right-minus-wrong correction
method). In this model, a fraction of the number of wrong answers is subtracted from the
number of correct answers. Other models for this type of scoring method include (1) giving
a positive score to a correct answer while assigning no mark to omitted items and (2)
rewarding learners for not guessing by awarding points rather than penalizing learners for
incorrect answers. The recommended penalty for an incorrect answer is 1/(n-1), where n
stands for the number of choices.
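
The two conventional methods reduce to simple arithmetic. A minimal sketch (the function
names and the sample figures are ours):

def number_right(correct):
    # NR: only correct answers earn points; wrong and omitted items score zero
    return correct

def negative_marking(correct, wrong, n_choices):
    # NM: deduct 1/(n - 1) of a point for every wrong answer
    return correct - wrong / (n_choices - 1)

# A 40-item test with 4 options per item: 30 right, 8 wrong, 2 omitted
print(number_right(30))               # 30
print(negative_marking(30, 8, 4))     # 30 - 8/3 ≈ 27.33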

Both NR and NM methods of scoring multiple-choice tests are prone to guessing,
which affects test validity and reliability. As a result, nonconventional scoring methods
were introduced, which include: (1) Partial Credit Scoring Methods, (2) Multiple Answers
Scoring Method, (3) Retrospective Correcting for Guessing, and (4) Standard Setting
Scoring Method.

Partial Credit Scoring Methods attempt to determine a learner's degree or level of
knowledge with respect to each response option given. This method of scoring takes into
account learners' partial mastery of knowledge. It acknowledges that, while learners cannot
always recognize the correct response, they can discern that some response options are
clearly incorrect. There are three formats of the partial credit scoring method:

Liberal Multiple-choice Test - It allows learners to select more than one answer to
a question if they feel uncertain which option or alternative is correct.
Elimination Testing (ET) - It instructs learners to cross out all alternatives they
consider to be incorrect.
Confidence Weighting (CW) - It asks learners to indicate what they believe is the
correct answer and how confident they are about their choice.

For this type of scoring, an item can be assigned different scores, depending on the
learners' response.
Multiple Answers Scoring Method allows learners to have multiple answers for
each item. In this method, learners are instructed that each item has at least one correct
answer or are told how many answers are correct. Items can be scored as solved only if all
the correct response options are marked but none of the incorrect ones. Incorrect options
that are marked can lead to negative scores. Thayn (2011) found that multiple-answer and
single-answer items have comparable item difficulty and reliability indices; however, the
multiple-answer method is more difficult to solve, has lower discrimination power, and
takes more time to answer.

Retrospective Correcting for Guessing considers omitted or unanswered items as
incorrect, forcing learners to give an answer for every item even if they do not know the
correct answer. The correction for guessing is implemented later or retroactively. This can
be done by comparing learners' answers to multiple-choice items with their answers on
other test formats, such as a short-answer test.

Standard-Setting entails using standards when scoring multiple-choice items,
particularly standards set through norm-referenced or criterion-referenced assessment.
Standards based on norm-referenced assessment are derived from the test performance
of a certain group of learners, while standards from criterion-referenced assessment are
based on preset standards specified from the very start by the teacher or the school in
general.

For example, for a final examination in algebra, the Mathematics Department can
set the passing score (e.g., the 75th percentile rank) based on the norms derived from the
scores of learners for the past three years. To do this, the department will need to collect
the previous scores of learners on the same or equivalent final exams and apply the
formula for standard scores to compute the percentile ranks for each range of scores. On
the other hand, passing grades/scores are usually set by the department or the school
based on their standards (e.g., A (90-100 percent), B (80-89 percent), C (70-79 percent),
or F (0-69 percent)).

Marking or scoring constructed-response tests, such as essay and performance tests,
requires standardized scoring schemes so that scores are reliable and have the same valid
meaning for all learners. There are four types of rating scales for the assessment of
writing, which can also be applied to other authentic or performance-type assessments.
These four types of scoring are (1) Holistic, (2) Analytic, (3) Primary Trait, and (4) Multiple
Trait scoring.

Holistic Scoring involves giving a single, overall assessment score for an essay,
writing composition, or other performance-type assessment as a whole. Although the
scoring rubric for holistic scoring lays out specific criteria for evaluating a task, raters do
not assign a score for each criterion. Instead, as they read a writing task or observe a
performance task, they balance strengths and weaknesses among the various criteria to
arrive at an overall assessment. Holistic scoring is considered efficient in terms of time
and cost. It also does not penalize poor performance based on only one aspect (e.g.,
content, delivery, organization, vocabulary, or coherence for an oral presentation).
However, holistic scoring does not provide sufficient diagnostic information about the
student's ability, as it does not identify the areas for improvement, and it is difficult to
interpret as it does not detail the basis for evaluation.
Sample of Holistic Rubric for an Oral Presentation
3 – Excellent Speaker
• Included 10 – 12 changes in hand gestures
• No apparent inappropriate facial expressions
• Utilizes proper voice inflection
• Can create proper ambiance for the poem

2 – Good Speaker
• Included 5 – 9 changes in hand gestures
• Few inappropriate facial expressions
• Has some inappropriate voice inflection changes
• Almost creates the proper ambiance
1 – Poor Speaker
• Included 1 – 4 changes in hand gestures
• Lots of inappropriate facial expressions
• Uses a monotone voice
• Cannot create the proper ambiance
Analytic Scoring involves assessing each aspect of a performance task (e.g.,
essay writing, oral presentation, class debate, and research paper) and assigning a score
for each criterion. Sometimes, an overall score is given by averaging the scores in all
criteria. One advantage of analytic scoring is its reliability. It also provides information that
can be used for diagnosis, as it presents learners' strengths and weaknesses in specific
areas, which can eventually serve as the basis for remedial instruction. However, it is more
time-consuming and therefore expensive. It is also prone to the halo effect, wherein scores
in one scale may influence the ratings of others. It is also difficult to create.

Sample of Analytic Rubric for Oral Presentation

------------------------------------------------------------------------------------------
Criteria                          1                   2                    3
------------------------------------------------------------------------------------------
Number of Appropriate             1 - 4               5 - 9                10 - 12
Hand Gestures (x1)

Appropriate Facial                Lots of             Few inappropriate    No apparent
Expression (x1)                   inappropriate       facial expressions   inappropriate
                                  facial expressions                       facial expressions

Voice Inflection (x2)             Monotone voice      Can vary voice       Can easily vary
                                  used                inflection with      voice inflection
                                                      difficulty

Incorporates proper               Recitation          Recitation has       Recitation fully
ambiance through                  contains very       some feelings        captures ambiance
feelings in the voice (x3)        little feeling                           through feelings
                                                                           in the voice
------------------------------------------------------------------------------------------

Primary Trait Scoring focuses on only one aspect or criterion of a task, and a
learner's performance is evaluated on only one trait. This scoring system defines a primary
trait in the task that will then be scored. For example, if a teacher in a political science
class asks his students to write an essay on the advantages and disadvantages of Martial
Law (i.e., the writing task), the basic question addressed in scoring is, "Did the writer
successfully accomplish the purpose of this task?" With this focus, the teacher would
ignore errors in the conventions of written language and instead focus on overall rhetorical
effectiveness. One disadvantage of this scoring scheme is that it is often difficult to focus
exclusively on one trait, such that other traits may be included when scoring. Thus, it is
important that a very detailed scoring guide is used for each specific task.

Multiple-Trait Scoring requires that an essay test or performance task is scored on
more than one aspect, with scoring criteria in place so that they are consistent with the
prompt. Multiple-trait scoring is task-specific, and the features to be scored vary from task
to task, thus requiring separate scores for different criteria. Multiple-trait scoring is similar
to analytic scoring because of its focus on several categories or criteria. However, while
analytic scoring evaluates more traditional and generic dimensions of language production,
multiple-trait scoring focuses on specific features of performance required to fulfill the
given task or tasks. For example, scoring criteria for writing performance may include the
abilities to present an argument clearly, to organize one's thoughts, and to demonstrate
accurate language usage through grammar, punctuation, and spelling.

What are the different types of test scores?

Test scores can take the form of any of the following: (1) raw scores, (2) percentage
scores, and (3) derived scores. Under derived scores are grades that are based on
criterion-referenced and norm-referenced grading systems.

1. Raw Score is simply the number of items answered correctly on a test. A raw score
provides an indication of the variability in the performance of students in the class.
However, a raw score has no meaning unless you know what the test is measuring and
how many items it contains. A raw score also does not mean much because it cannot
be compared with a standard or with the performance of another learner or of the class
as a whole. Raw scores may be useful if everyone knows the test and what it covers,
how many possible right answers there are, and how learners typically do in the test.

2. Percentage score. This refers to the percent of items answered correctly in a test. The
number of items answered correctly is typically converted to percent based on the total
possible score. The percentage score is interpreted as the percent of content, skills, or
knowledge that the learner has a solid grasp of. Just like the raw score, the percentage
score has limitations because there is no way of comparing the percentage correct
obtained in a test with the percentage correct in another test with a different difficulty level.

Percentage score is most appropriate to use in a teacher-made test or criterion-
referenced test. It is appropriate for a teacher-made test that is administered commonly to
a class or to students taking the same course with the same contents or syllabus. In this
way, the students' test performances can be compared with each other in the class or with
their peers in another section. In the same manner, the percentage score is suitable for
subjects wherein a standard has been set.

3. Criterion-Referenced Grading System. This is a grading system wherein learners'
test scores or achievement levels are based on their performance in specified learning
goals, outcomes, and performance standards. Criterion-referenced grades provide a
measure of how well the learners have achieved the preset standards, regardless of
how everyone else does. It is therefore important that the desired outcomes and the
standards that determine proficiency and success are clear to the learners at the very
start. These should be indicated in the course syllabus. Criterion-referenced grading is
premised on the assumption that learners' performance is independent of that of the
other learners in their group/class.

The following are some of the types or criterion-referenced scores or grades:

3.1 Pass or Fail Grade. This type of score is most appropriate if the test or assessment
is used primarily or entirely to make a pass-or-fail decision. In this type of scoring, a
standard or cut-off score is preset, and a learner is given a score of pass if he or
she surpassed the expected level of performance or the cut-off score.
Pass or fail grading has the following advantages: (1) it takes pressure off the
learners in getting a high letter or numerical grade, allowing them to relax while still
getting the needed education; (2) it gives learners a clear-cut idea of their strengths
and weaknesses; and (3) it allows learners to focus on true understanding or
learning of the course content rather than on specific details that will help them
receive a high letter or numerical score.

3.2 Letter Grade. This is one of the most commonly used grading systems. Letter
grades are usually composed of a five-level grading scale labeled from A to E or F,
with A representing the highest level of achievement or performance, and E or F,
the lowest grade, representing a failing grade. These are often used for all forms of
learners' work, such as quizzes, essays, projects, and assignments.

An example of the descriptors for letter grades is presented below.

Letter Grade Interpretation


A Excellent
B Good
C Satisfactory
D Poor
E Unacceptable

3.3 Plus (+) and Minus (-) Letter Grades. This grading provides a more detailed
description of the level of learners' achievement or task/test performance by
dividing each grade category into three levels, such that a grade of A can be
assigned as A+, A, and A-; B as B+, B, and B-; and so on. Plus (+) and minus (-)
grades provide a finer discrimination between achievement or performance levels.
They also increase the accuracy of grades as a reflection of learners' performance,
enhance student motivation (i.e., to get a high A rather than an A-), and discriminate
among performances in a very similar pool of learners, such as those in advanced
courses or star sections. However, the +/- grading system is viewed as unfair,
particularly for learners in the highest category; creates stress for learners; and is
more difficult for teachers, as they need to deal with more grade categories when
grading learners.

Examples of the descriptors for plus (+) and minus (-) letter grades are presented below:

(+)/(-) Letter Grades Interpretation


A+ Excellent
A Superior
A- Very Good
B+ Good
B Very Satisfactory
B- High Average
C+ Average
C Fair
C- Pass
D Conditional
E/F Failed

3.4 Categorical Grades. This system of grading is generally more descriptive than
letter grades, especially if coupled with verbal labels. Verbal labels eliminate the
need for a key or legend to explain what each grade category means.

Examples of categorical grades are:


Exceeding Meeting Approaching Emerging Not
Standards Standards Standards Standards Exceeding
Standards
Advanced Intermediate Basic Novice Below Basic
Exemplary Accomplished Developing Beginning Inadequate
Expert Proficient Competent Apprentice Novice
Master Distinguished Proficient Intermediate Novice

4. Norm-Referenced Grading System. In this method of grading, learners' test scores
are compared with those of their peers. Norm-referenced grading involves rank-ordering
learners and expressing a learner's score in relation to the achievement of the rest of
the group (i.e., class or grade level, school, etc.). The peer group usually serves as the
normative group (e.g., class, age group, year level). Unlike criterion-referenced
scoring, norm-referenced scoring does not tell what the learners actually achieved; it
only indicates the learners' achievement in relation to their peers' performance. Norm-
referenced grading allows teachers to:

(1) compare learners' test performance with that of other learners;


(2) compare learners' performance in one test (subtest) with another test (subtest); and
(3) compare learners' performance in one form of the test with another form of the test
administered at an earlier date.

There are different types of norm-referenced scores.

4.1 Developmental Score. This is a score that has been transformed from a raw score
and reflects the average performance at age and grade levels. There are two kinds of
developmental scores: (1) grade-equivalent and (2) age-equivalent scores.

4.1.1 Grade-Equivalent Score is described as both a growth score and a status score.
The grade equivalent of a given raw score on any test indicates the grade level at
which the typical learner earns this raw score. It describes the test performance of
a learner in terms of a grade level and the months since the beginning of the
school year. A decimal point is used between the grade and month in grade
equivalents. For example, a score of 7.5 means that the learner did as well as a
typical Grade 7 learner taking the test at the end of the fifth month of the school year.

4.1.2 Age-Equivalent Score indicates the age level that is typical of a learner who
obtains such a raw score. It reflects a learner's performance in terms of chronological
age as compared to those in the norm group. Age-equivalent scores are written
with a hyphen between years and months. For example, a learner's score of 11-5
means that his age equivalent is 11 years and 5 months old, indicating a test
performance that is similar to that of 11½ year-olds in the norm group.
4.2 Percentile Ranks. Percentile Rank is useful in cases where comparison between
individual scores relative to their positions in the entire group is a major concern. One
example is the Licensure Examination for Teachers (LET average scores are actually
percentile ranks). An examinee who surpassed 90% of all the examinees gets a score
of 90 and an examinee who belongs to the top 2% gets 98. Percentile ranks are also
valuable tools for the comparison of two or more measurements, each taken from a
different set of data.

4.3 Stanine Score. This system expresses test results in nine equal steps which range
from one (lowest) to nine (highest). Percentile ranks are grouped into stanines, with
the following verbal interpretations:

Description Stanine Percentile Rank


Very High 9 96 and above
Above Average 8 90 – 95
7 77 – 89
Average 6 60 – 76
5 40 – 59
4 23 – 39
Below Average 3 11 – 22
2 4 – 10
Very Low 1 3 and below

4.4 Standard Scores. These are raw scores converted into a common scale of
measurement that provides a meaningful description of the individual scores within the
distribution. Two types of standard scores are the Z-score and the T-score (please
see the previous discussions).

What are the general guidelines in grading tests or performance task?

Utmost care should be observed to ensure that grading practices are equitable, fair,
and meaningful to learners and stakeholders. When constructing a test or performance
task, the methods and criteria for grading learners' responses or answers should be set
and specified. The following are the general guidelines in grading tests or performance
tasks:

1. Stick to the purpose of the assessment. Before coming up with an assessment, it is
important to first determine the purpose of the test. Will the assessment be used for
diagnostic purposes, formative assessment, or summative assessment?

2. Be guided by the desired learning outcomes. The learners should be informed early
on about what is expected of them insofar as learning outcomes are concerned, as well
as how they will be assessed and graded in the test.

3. Develop grading criteria. Grading criteria to be used in traditional tests and
performance tasks should be made clear to the students. A holistic or analytic rubric
can be used to map out the grading criteria.

4. Inform learners what scoring methods are to be used. Learners should be made
aware before the start of testing, whether their test responses are to be scored based
on the number right, negative marking, or through non-conventional scoring methods.
As such, the learners will be guided on how to mark their responses during the test.

5. Decide on what type of test scores to use. It is important that different types of
grading scheme be used for different tests, assignments, or performance tasks.
Learners should also be informed at the start of what grading system is to be used for
a particular test or task.

What are the general guidelines in grading essay test?

Essays require more time to grade than the other types of traditional tests. Grading
essay tests can also be influenced by extraneous factors, such as learners' handwriting
legibility and raters' biases. It is therefore important that you devise essay question
prompts and grading scheme procedures that will minimize the threats to validity and
reliability.

Scoring essay responses can be made more rigorous by developing a scoring


scheme. The following are the general guidelines in scoring essay tests.
1. Identify the criteria for rating the essay. The criteria or standards for evaluating the
essay should be predetermined. Some of the criteria that can be used include content,
organization/format, grammar proficiency, development and support, focus and details,
etc. It is important that the specific standards and criteria included are relevant to the
type of performance task given.

2. Determine the type of rubric to use. There are two basic types of rubric: the holistic
and the analytic scoring systems. Holistic rubrics require evaluating the essay taking
into consideration all the criteria. Only a single score is given based on the overall
judgment of the learner's writing composition. A holistic rubric is viewed as more
convenient for teachers, as it requires fewer areas or aspects of writing to evaluate.
However, it does not provide specific feedback on which course topics/contents or
criteria the students are weak at and need to improve on. On the other hand, the
analytic scoring system requires that the essay be evaluated based on each of the
criteria. It provides useful feedback on the learner's strengths and weaknesses for
each course content area or criterion.

3. Prepare the rubric. In developing a rubric, the skills and competencies related to essay
writing should first be identified. These skills and competencies represent the criteria.
Then, performance benchmarks and point values are determined. Performance
benchmarks can be numerical categories, but the most frequently used are descriptors
with a corresponding rating scale.

4. Evaluate essays anonymously. Checking of essays should be done anonymously. It is
important that the rater does not identify whose paper he/she is rating.

5. Score one essay question at a time. This is to ensure that the same thinking and
standards are applied to all learners in the class. The rater should try to avoid any
distraction or interruption when evaluating the same item.

6. Be conscious of your own biases when evaluating a paper. The rater should not be
affected by learners' handwriting, writing style, length of responses, and other factors.
He/she should stick to the criteria included in the rubric when evaluating the essay.

7. Review initial scores and comments before giving the final rating. This is important
especially for essays that were initially given a barely passing or failing grade.

8. Get two or more raters for essays that are high-stakes, such as those used for
admission, placement, or scholarship screening purposes. The final grade will be the
average of all the ratings given.

9. Write comments next to the learner's responses to provide feedback on how well one
has performed in the essay test.

What is the new grading system of the Philippine K-12 Program?

On April 1, 2015, the Department of Education, through DepEd Order 8, s. 2015,
announced the implementation of a new grading system for all grade levels in public
schools from elementary to Senior High School. Although private schools are not required
to implement the same guidelines, they are encouraged to follow them and are permitted
to modify them in accordance with their institution's Philosophy, Vision, and Mission. The
grading system is described as a standards- and competency-based grading system,
where 60 is the minimum grade needed to pass a specific learning area, which is
transmuted to 75 in the report card. The lowest mark that can appear on the report card is
60 for Quarterly Grades and Final Grades. Grades will be based on the weighted raw
scores of the learners' summative assessments in three components: Written Work (WW),
Performance Task (PT), and Quarterly Assessment (QA).
The steps are: 1) get the total score for each component; 2) convert to percent; 3) convert
the percent to a weighted score (WS); 4) add the weighted scores to get the initial grade;
and 5) transmute the initial grade to the quarter grade (QG).
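
Steps 1 to 4 can be sketched in Python. Note that the actual component weights vary by
subject and grade level under DepEd Order 8, s. 2015, so the weights below are
placeholders only, and step 5 still requires the Order's transmutation table, which is not
reproduced here:

def initial_grade(ww, pt, qa, weights=(0.30, 0.50, 0.20)):
    # ww, pt, qa: (raw_total, highest_possible_score) pairs for Written Work,
    # Performance Task, and Quarterly Assessment
    # weights: placeholder component weights; the real ones depend on the subject
    components = (ww, pt, qa)
    return sum((raw / hps * 100) * w for (raw, hps), w in zip(components, weights))

# Hypothetical raw totals for one quarter
print(round(initial_grade((36, 50), (45, 60), (34, 40)), 2))   # 76.1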
For MAPEH, individual grades are given to each area (i.e., Music, Arts, Physical
Education, and Health). The quarterly grade for MAPEH is the average grade across the
four areas, as follows:

QG for Music + QG for Arts+QG for PE+QG for Health


QG for MAPEH = ---------------------------------------------------------------------------
4

The Final Grade for each subject is then computed by getting the average of the four
quarterly grades, as seen below:

1QG + 2QG + 3QG + 4QG


Final Grade for each Learning Area = ------------------------------------------
4

The General Average, on the other hand, is computed by getting the average of the
Final Grades for all subject areas. Each subject area has equal weight:

Sum of Final Grades of All Learning Areas


General Average = ----------------------------------------------------------------------
Total Number of Learning Areas in a Grade Level

All grades reflected in the report card are reported as whole numbers.

How should test results be communicated to different stakeholders?

Since grades serve as important feedback about learners' level of performance
or achievement, teachers should communicate the test results to learners, their parents,
and other stakeholders. Feedback on how well the learners performed on a test or any
performance task has been proven to help them improve their learning.

If test results are to serve as a mechanism to inform learners on what, where and
how they should improve in their work and learning as a whole, then an effective and
efficient reporting system should be in place. Teachers should come up with guidelines
and processes on how grades are to be communicated and presented to make them clear,
understandable, and relevant to the recipients. Unless the test results are communicated
effectively, the purpose of assessment is not likely to be achieved.

First, the rationale or purpose of the testing and the nature of the tests administered to
the learners should be clearly explained. This is especially true for high-stakes testing,
such as tests used for placement, admission, grade-level promotion, and graduation
decisions, as well as for IQ or psychological testing, which are more likely to be
misinterpreted. It is important to inform the students and their parents that tests are only
one of several tools to assess their performance or achievement and that they are not
evaluated on the basis of one test alone.

Second, the meaning of test scores should be explained. For norm-referenced
testing, terms such as percentile rank, stanine, standard scores, and the like should be
explained clearly. Similarly, the standards or criteria used for criterion-referenced testing
should be thoroughly described and clarified to the recipients.

Finally, learners and parents should be made to understand the meaning or
interpretation of the test scores. Teachers have the best grasp of the learners'
capabilities in school. As such, they are in the best position to explain, especially to the
parents, how far their test scores are from those of their classmates or of learners in other
classes, on what topics, subject areas, or competencies they are good at or should
improve on, and whether or not they are working up to their potential.

References:

1. Balagtas, Marilyn U., David, Adonis P., Golla, Evangeline F., Magno, Carlo P., and
Valladolid, Violeta C. (2020), “Assessment in Learning 1”, 1st Edition, Rex Book
Store, Inc., Sampaloc, Manila, Philippines.
2. Anderson, W. L., and Krathwohl, D. R. (2001). “A Taxonomy for Learning, Teaching, and
Assessing: A revision of Bloom’s Taxonomy of Educational Objectives”. New
York: Longman.
3. Navarro, Rosita L, and Rosita G. Santos, (2012). “Assessment of Learning Outcomes 1,”
2nd Edition, Lorimar Publishing House, Manila, Philippines.
4. Baker, E. L. (1992). "The Role of Domain Specifications in Improving the Technical
Quality of Performance Assessment" (CSE Tech. Rep.). Los Angeles: University of
California, Center for Research on Evaluation, Standards, and Student Testing.
5. Hernon, P. and Dugan, R. (2004). "Outcomes Assessment in Higher Education." Westport:
Libraries Unlimited.
6. Mehrens, W. A. (1992). "Using Performance Assessment for Accountability Purposes,"
Educational Measurement: Issues and Practices.
7. Tuckman, B. (1993). "The Essay Test: A Look at the Advantages and Disadvantages," NASSP
Bulletin.
8. Zaremba, S and Schultz, M. “An Analysis of Traditional Classroom Assessment
Techniques and Discussion,” ED 365404, 1993.
