Module 6
Assessment: Assessment is a systematic procedure for collecting information that can be used to
make inferences about the characteristics of people or objects. It is a process of gathering
information to monitor progress and make educational decisions if necessary. As noted in my
definition of a test, an assessment may include a test, but it also includes methods such as
observations, interviews, behaviour monitoring, etc.
Evaluation: Evaluation is the process of making judgments based on criteria and
evidence. It also refers to the procedures used to determine whether the subject (i.e., the student)
meets preset criteria, such as qualifying for special education services. Evaluation uses assessment
(remember that an assessment may be a test) to make a determination of qualification in
accordance with a predetermined criterion.
Different questioning types in the classroom
Closed questions
There are many advantages to closed questions. They are quick and easy to
respond to and generally reduce confusion. They are also particularly useful for
challenging pupils' memory and recalling facts.
Open questions, by contrast, are useful for critical or creative discussion and for
finding out more information about a concept or lesson.
Probing questions
These questions are useful for gaining clarification and encouraging others to
tell you more information about a subject. Probing questions are usually a
series of questions that dig deeper and provide a fuller picture.
When a teacher wishes to start a new lesson, the teacher can begin with a series of
questions; at the point where the students are no longer able to give answers, the
teacher introduces the new lesson. Bringing the students to this point and then
introducing the concepts or lesson in this way motivates the students to listen to
the class.
Leading questions
It’s important to use leading questions carefully; they can be seen as an unfair
way of getting the answer you want.
Loaded questions
Loaded questions are seemingly straightforward, closed questions with a twist:
they contain an assumption about the respondent. They are popularly used by
examiners during the viva voce of a laboratory or project to trick the interviewee
or student into revealing an understanding of the fundamental concepts of the lab
or project that they would otherwise be unwilling to disclose.
For example, the question: ‘have you stopped copying the answers from the
nearby students?’ assumes the respondent copied more than once. Whether
the student answers yes or no, the student will admit to having copied the
answer at some point.
Of course, the preferred response would be: 'I have never copied answers in my
examination.' But it's not always easy to spot the trap. These questions are quite
rightly seen as manipulative.
Useful for: discovering facts about someone who would otherwise be reluctant
to offer up the information
Funnel questions
When a faculty member wishes to start with a generalized discussion and then move
into the concepts in detail, funnel questions are more suitable. Funnel questions are
very useful when refreshing concepts before an examination or practical.
Funnel questions can also be used to bring the students into a relaxed, attentive
mode: asking students to go into detail about their difficulties in learning or
listening distracts them from their anxiety and gives the faculty member the
information needed to provide them with a solution, which in turn calms them
down and makes them feel that something positive is being done to help them.
Recall questions
To start with preparative assessment, recall questions are very useful for faculty
members. Recall questions require the student to remember the lesson taught in
an earlier class. For example, when a faculty member wishes to start a class by
connecting it to the previous class's concepts and making the students remember
the earlier class, recall questions are suitable for starting the session. This kind of
question is also used with a student who is not listening to the class, to bring back
their attention with simple recall questions.
Rhetorical questions
Rhetorical questions help the students to always remember a concept, formula, or
statement. Rhetorical questions are asked to keep the students engaged in the class
by remembering the lesson. They also help students to think, be creative, and come
up with ideas.
Testing learning is an important part of classroom practice, and questioning is one of the
most common methods of checking learner understanding. Questioning is something teachers
do naturally as part of their daily routine, but developing the skills associated with questioning
techniques presents many challenges for teachers and it is something that is developed over
time. Teachers need to review what is to be learnt in any one teaching and learning session and
plan for the inclusion of questioning accordingly. Teachers must know when to pose open and
closed questions, how to develop a question distribution strategy, and when to use questions to
check learners' knowledge.
So, use the Pose, Pause, Pounce (PPP) technique:
• Pose the question to the whole group.
• Pause, allowing all learners to think of the answer.
• Pounce: name a learner to answer.
• Listen to the answer.
• Reward correct answers.
• Incorrect answers should not be ridiculed either by the teacher or by the remainder of the
group of learners.
• Spread the questions around the class so that all can participate.
The distribution of questions is again very important.
• If teachers work around the class in an obvious systematic order, those who have
answered tend to relax a little, and sometimes ‘switch off’.
• Use a technique which is not obvious.
• Be conscious of the tendency to choose the same learners when asking questions.
• Most teachers tend to concentrate their attention on the same few learners, so deliberately pay
attention to those normally omitted.
Different forms of Assessment
Assessment frames learning, creates learning activity and orients all aspects of learning
behavior. Tests and other assessment procedures can also be classified in terms of their
functional role in classroom instruction. The functional role explains the sequence in which
assessment procedures are likely to be used in the classroom. This kind of sequencing and
categorization continues today. According to David Miller, the classification is:
Placement assessment: To determine student performance at the beginning of instruction
Formative assessment: To monitor learning progress during instruction
Diagnostic assessment: To diagnose learning difficulties during instruction
Summative assessment: To assess achievement at the end of instruction
Placement Assessment
Placement assessment is concerned with the student's entry performance and typically focuses
on questions such as (a) Does the student possess the knowledge and skills needed to begin the
planned instruction? For example, before beginning algebra, the student should have a sufficient
command of essential mathematics concepts. (b) To what extent has the student already
developed the understanding and skills that are the goals of the planned instruction? Sufficient
levels of comprehension and proficiencies might indicate the desirability of skipping certain
units or of being placed in a more advanced course. (c) To what extent do the student's interests,
work habits, and personality characteristics indicate that one mode of instruction might be
better than another (e.g., group instruction versus independent study)? The goal of placement
assessment is to determine for each student the position in the instructional sequence and the
mode of instruction that is most beneficial.
Formative Assessment
Assessment for learning is a formative assessment. Formative assessment is used to monitor
learning progress during instruction. Its purpose is to provide continuous feedback to both
students and teachers concerning learning successes and failures. The wide variety of
information that teachers collect about students’ learning processes provides the basis for
determining what they need to do next to move student learning forward. It provides the basis
for providing descriptive feedback for students and deciding on groupings, instructional
strategies, and resources. The feedback to students provides reinforcement of successful
learning and identifies the specific learning errors and misconceptions that need correction.
Formative assessment depends heavily on specially prepared tests and assessments for each
segment of instruction, that is, unit-wise or chapter-wise. Tests and other types of assessment tasks
used for formative assessment are most frequently teacher made, but customized tests made
available by publishers of textbooks and other instructional materials also can serve this
function. Observational techniques are, of course, also useful in monitoring student progress
and identifying learning errors. Because formative assessment is directed toward improving
learning and instruction, the results are typically not used for assigning course grades.
Diagnostic Assessment
Diagnostic assessment is a highly specialized procedure. It is concerned with the
persistent or recurring learning difficulties that are left unresolved by the standard corrective
prescriptions of formative assessment. If a student continues to experience failure in reading,
mathematics, or other subjects despite the use of prescribed alternative methods of instruction,
then a more detailed diagnosis is indicated. To use a medical analogy, formative assessment
provides first-aid treatment for simple learning problems, and diagnostic assessment searches
for the underlying causes of those problems that do not respond to first-aid treatment. Thus,
diagnostic assessment is much more comprehensive and detailed. It involves the use of
specially prepared diagnostic tests as well as various observational techniques. Serious learning
disabilities are also likely to require the services of educational, counselling, and medical
specialists. The aim of diagnostic assessment is to determine the causes of persistent learning
problems and to formulate a plan for remedial action.
Summative Assessment
The last kind of assessment is called summative assessment; it is also known as
assessment of learning. Summative assessment typically comes at the end of a course of
instruction. It is designed to determine the extent to which the instructional goals have been
achieved and is used primarily for assigning course grades or for certifying student mastery of
the intended learning outcomes. The techniques used in summative assessment are determined
by the instructional goals, but they typically include teacher made achievement tests, ratings
on various types of performance, and assessments of products. These various sources of
information about student achievement may be systematically collected into a portfolio that
may be used to summarize or showcase the student’s accomplishments and progress. Although
the main purpose of summative assessment is grading or the certification of student
achievement, it also provides information for judging the appropriateness of the course
objectives and the effectiveness of the instruction.
So, the sequencing of assessment makes the teaching-learning process more sensible. Among all
the forms of assessment, formative assessment is the most powerful for improving the
teaching-learning process. In formative assessment, questioning the students is an art.
Introduction to Student Assessment and Evaluation
INTRODUCTION
From this point of view, evaluation is defined as a systematic process of determining the extent to
which the learners achieve instructional/training objectives. It may include either a quantitative or
qualitative description of learner behaviour plus a value judgement concerning its worth. It is
imperative that we make judgements based on proper information (qualitative or quantitative)
through suitably designed tools and techniques for the purpose.
Role of Evaluation
Purposes of Evaluation
Evaluation helps the learner to
• know his/her strengths and weaknesses and direct his/her study efforts to
make up for gaps in knowledge and understanding
• compare his/her progress with that of his/her peers and get motivated to
do better
It helps the teacher to
• assess how effective the instructional methods and strategies used are
• detect students' learning difficulties and provide for remedy
• identify individual student differences and suitably adapt teaching
strategies
• grade students
It helps administrators to
• make any structural changes in the system such as providing more resources,
revision of curriculum etc., to improve the system
General Principles
Some general principles that provide direction to the evaluation process are:
• Evaluation is a systematic process to determine the extent to which objectives
are achieved. This means that formulating objectives in clear terms is an
important prerequisite, as that will spell out 'what to evaluate'.
• Evaluation procedures are selected in terms of the purposes to be served. The
question is not 'should this procedure be used?' but rather 'when should this
procedure be used?' A particular procedure is suitable for certain purposes
and not appropriate for others.
• A variety of procedures are needed for evaluation. Tests (different types), self-
report techniques and observation are some of the procedures available.
Appropriate procedures are to be used depending on the nature of objectives
(cognitive, psychomotor, and affective) for ensuring comprehensive
evaluation.
• Knowledge of the limitations as well as the strengths of different evaluation
procedures is needed for their proper use. A teacher/trainer should develop
skills in minimizing errors in evaluation by being able to design and use
different procedures appropriately.
• Evaluation is a means to an end and not an end in itself. Evaluation has to be
looked upon as a process of obtaining reliable information upon which to base
educational decisions (instructional, guidance or administrative). It is not the
end of the teaching learning process.
Criticism 1: Tests Create Anxiety. There is no doubt that anxiety increases during testing.
For most students, it motivates them to perform better. For a few, test anxiety may be so great
that it interferes with test performance. These typically are students who are generally
anxious, and the test simply adds to their already high level of anxiety. A number of steps can
be taken to reduce test anxiety, such as thoroughly preparing for the test, taking practice
exercises, and using liberal time limits. Fortunately, many test publishers in recent years have
provided practice tests and shifted from speed tests to power tests. This should help, but it is
still necessary to observe students carefully during testing and to discount the scores of
overly anxious students.
Criticism 2: Tests Categorize and Label Students. Categorizing and labeling individuals can
be a serious problem, particularly when those labels are used as an excuse for poor student
achievement rather than as a means of providing the extra services and help needed to ensure better
achievement. It is all too easy to place individuals in pigeonholes and apply labels that
determine, at least in part, how they are viewed and treated. Classifying students in terms of
levels of mental ability has probably caused the greatest concern in education. When students
are classified as mentally retarded, for example, it influences how teachers and peers view
them, how they view themselves, and the kind of instructional programs they receive. When
students are mislabeled as mentally retarded, as has been the case with some racial and ethnic
minorities, the problem is compounded. At least some of the support for mainstreaming
handicapped students has come from the desire to avoid the categorizing and labeling that
accompanies special education classes. Classifying students into various types of learning
groups can more efficiently use the teacher's time and the school's resources. However, when
grouping, teachers must consider that tests measure only a limited sample of a student's
abilities and that students are continuously changing and developing. By keeping the
groupings tentative and flexible and regrouping for different subjects (e.g., science and math),
teachers can avoid most of the undesirable features of grouping. It is when the categories are
viewed as rigid and permanent that labeling becomes a serious problem. In such cases, it is
not the test that should be blamed but the user of the test.
Criticism 3: Tests Damage Students' Self-Concepts. This is a concern that requires the
attention of teachers, counselors, and other users of tests. The improper use of tests may
indeed contribute to distorted self-concepts. The stereotyping of students is one misuse of
tests that is likely to have an undesirable influence on a student's self-concept. Another is the
inadequate interpretation of test scores that may cause students to make unwarranted
generalizations from the results. It is certainly discouraging to receive low scores on tests,
and it is easy to see how students might develop a general sense of failure unless the results
are properly interpreted. Low-scoring students need to be made aware that aptitude and
achievement tests are limited measures and that the results can change. In addition, the
possibility of over generalizing from low test scores will be lessened if the student's positive
accomplishments and characteristics are mentioned during the interpretation. When properly
interpreted and used, tests can help students develop a realistic understanding of their
strengths and weaknesses and thereby contribute to improved learning and a positive self-
image.
Criticism 4: Tests Create Self-Fulfilling Prophecies. This criticism has been directed primarily
toward intelligence or scholastic aptitude tests. The argument is that test scores create teacher
expectations concerning the achievement of individual students; the teacher then teaches in
accordance with those expectations, and the students respond by achieving to their expected
level: a self-fulfilling prophecy. Thus, those who are expected to achieve more do achieve
more, and those who are expected to achieve less do achieve less. The belief that teacher
expectations enhance or hinder a student's achievement is widely held, and the role of testing
in creating these expectations is certainly worthy of further research.
In summary, there is some merit in the various criticisms concerning the possible
undesirable effects of tests on students; but more often than not, these criticisms should be
directed at the users of the tests rather than the tests themselves. The same persons who
misuse test results are likely to misuse alternative types of information that are even less
accurate and objective. Thus, the solution is not to stop using tests but to start using tests and
other sources of information more effectively.
Concepts of Educational Testing
Testing is neither assessment nor appraisal, but at the same time it may become a
means of getting the information, data, or evidence needed for assessment and appraisal. Testing
is one of the most significant and usable techniques in any system of examination or
evaluation. It envisages the use of instruments or tools for gathering information or data. In
written examinations, the question paper is one of the most potent tools employed for collecting
and obtaining information about pupils’ achievement.
Learning Theory
Learning theories are conceptual frameworks describing how information is absorbed,
processed and retained during learning. Cognitive, emotional, and environmental influences,
as well as prior experience, all play a part in how understanding, or a world view, is acquired
or changed and knowledge and skills retained.
Behaviorists look at learning as an aspect of conditioning and will advocate a system of
rewards and targets in education. Educators who embrace cognitive theory believe that the
definition of learning as a change in behavior is too narrow and prefer to study the learner
rather than their environment and in particular the complexities of human memory. Those
who advocate constructivism believe that a learner's ability to learn relies to a large extent on
what he already knows and understands, and the acquisition of knowledge should be an
individually tailored process of construction. Transformative learning theory focuses upon
the often necessary change that is required in a learner's preconceptions and world view.
Behaviorism
Behaviorism is a philosophy of learning that only focuses on objectively observable
behaviors and discounts mental activities. Behavior theorists define learning as nothing more
than the acquisition of new behavior. Experiments by behaviorists identify conditioning as a
universal learning process. There are two different types of conditioning, each yielding a
different behavioral pattern: Classic conditioning occurs when a natural reflex responds to a
stimulus.
The most popular example is Pavlov's observation that dogs salivate when they eat or even
see food. Essentially, animals and people are biologically "wired" so that a certain stimulus
will produce a specific response. Behavioral or operant conditioning occurs when a response
to a stimulus is reinforced. Basically, operant conditioning is a simple feedback system: If a
reward or reinforcement follows the response to a stimulus, then the response becomes more
probable in the future. For example, leading behaviorist B.F. Skinner used reinforcement
techniques to teach pigeons to dance and bowl a ball in a mini-alley.
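The feedback loop described above can be sketched in a few lines of code. This is a minimal toy model added here for illustration only; the two-response setup, the learning rate, and the function name are assumptions rather than anything stated in the text. It simply shows a response that is followed by a reward becoming progressively more probable.

import random

def operant_conditioning(trials=1000, learning_rate=0.05, seed=0):
    """Toy model of reinforcement: only response 'A' is rewarded."""
    random.seed(seed)
    p_a = 0.5  # initial probability of emitting response A
    for _ in range(trials):
        response = "A" if random.random() < p_a else "B"
        if response == "A":
            # a reward follows response A, so that response becomes more probable
            p_a += learning_rate * (1.0 - p_a)
    return p_a

print(operant_conditioning())  # close to 1.0: the reinforced response comes to dominate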
How Behaviorism impacts learning:
• Positive and negative reinforcement techniques of Behaviorism can be very effective.
• Teachers use Behaviorism when they reward or punish student behaviours.
Cognitivism
Jean Piaget authored a theory based on the idea that a developing child builds
cognitive structures, mental "maps", for understanding and responding to physical
experiences within their environment. Piaget proposed that a child's cognitive structure
increases in sophistication with development, moving from a few innate reflexes such as
crying and sucking to highly complex mental activities.
The four developmental stages of Piaget's model and the processes by which children
progress through them are described below. The stages are the sensorimotor, preoperational,
concrete operational, and formal operational stages. In the preoperational stage, the child is
not yet able to conceptualize abstractly and needs concrete physical situations. In the concrete
operational stage, as physical experience accumulates, the child starts to conceptualize,
creating logical structures that explain their physical experiences; abstract problem solving is
also possible at this stage (for example, arithmetic equations can be solved with numbers, not
just with objects). In the formal operational stage, the child's cognitive structures are like
those of an adult and include conceptual reasoning. Piaget proposed that during all
development stages, the child experiences their environment using whatever mental maps
they have constructed. If the experience is a repeated one, it fits easily - or is assimilated -
into the child's cognitive structure so that they maintain mental "equilibrium". If the
experience is different or new, the child loses equilibrium, and alters their cognitive structure
to accommodate the new conditions. In this way, the child constructs increasingly complex
cognitive structures.
Constructivism
Constructivism is a philosophy of learning founded on the premise that, by reflecting
on our experiences, we construct our own understanding of the world we live in. Each of us
generates our own "rules" and "mental models," which we use to make sense of our
experiences. Learning, therefore, is simply the process of adjusting our mental models to
accommodate new experiences.
Executing
In executing, a student routinely carries out a procedure when confronted with a
familiar task (i.e., exercise). The familiarity of the situation often provides sufficient clues to
guide the choice of the appropriate procedure to use. Executing is more frequently associated
with the use of skills and algorithms than with techniques and methods (see our discussion of
Procedural knowledge on pages 52-53). Skills and algorithms have two qualities that make
them particularly amenable to executing. First, they consist of a sequence of steps that are
generally followed in a fixed order. Second, when the steps are performed correctly, the end
result is a predetermined answer. An alternative term for executing is carrying out.
Assessment formats: In executing, a student is given a familiar task that can be performed using
a well-known procedure. For example, an execution task is "Solve for x: x^2 + 2x - 3 = 0 using
the technique of completing the square." Students may be asked to supply the answer or, where
appropriate, select from among a set of possible answers. Furthermore, because the emphasis
is on the procedure as well as the answer, students may be required not only to find the answer
but also to show their work.
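For reference, a brief worked version of that execution task (the solution steps below are added for illustration and are not part of the original item):

\[
\begin{aligned}
x^{2} + 2x - 3 &= 0\\
x^{2} + 2x + 1 &= 3 + 1\\
(x + 1)^{2} &= 4\\
x + 1 &= \pm 2\\
x &= 1 \quad\text{or}\quad x = -3.
\end{aligned}
\]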
Implementing
Implementing occurs when a student selects and uses a procedure to perform an
unfamiliar task. Because selection is required, students must possess an understanding of the
type of problem encountered as well as the range of procedures that are available. Thus,
implementing is used in conjunction with other cognitive process categories, such as
Understand and Create. Because the student is faced with an unfamiliar problem, he or she
does not immediately know which of the available procedures to use. Furthermore, no single
procedure may be a "perfect fit" for the problem; some modification in the procedure may be
needed. Implementing is more frequently associated with the use of techniques and methods
than with skills and algorithms. Techniques and methods have two qualities that make them
particularly amenable to implementing. First, the procedure may be more like a "flow chart"
than a fixed sequence; that is, the procedure may have "decision points" built into it (e.g., after
completing Step 3, should I do Step 4A or Step 4B?). Second, there often is no single, fixed
answer that is expected when the procedure is applied correctly.
The notion of no single, fixed answer is especially applicable to objectives that call for
applying conceptual knowledge such as theories, models, and structures, where no procedure
has been developed for the application. Consider an objective such as "The student shall be
able to apply a social psychological theory of crowd behaviour to crowd control." Social
psychological theory is Conceptual not Procedural knowledge. This is clearly an Apply
objective, however, and there is no procedure for making the application. Given that the theory
would very clearly structure and guide the student in the application, this objective is just barely
on the Apply side of Create, but Apply it is. So it would be classified as implementing. To see
why it fits, think of the Apply category as structured along a continuum. It starts with the
narrow, highly structured execute, in which the known Procedural knowledge is applied almost
routinely. It continues through the broad, increasingly unstructured implement, in which, at the
beginning, the procedure must be selected to fit a new situation. In the middle of the category,
the procedure may have to be modified to implement it. At the far end of implementing, where
there is no set Procedural knowledge to modify, a procedure must be manufactured out of
Conceptual knowledge using theories, models, or structures as a guide. So, although Apply is
closely linked to Procedural knowledge, and this linkage carries through most of the category
of Apply, there are some instances in implementing in which one applies Conceptual
knowledge as well. An alternative term for implementing is using.
Generating
Generating involves representing the problem and arriving at alternatives or hypotheses
that meet certain criteria. Often the way a problem is initially represented suggests possible
solutions; however, redefining or coming up with a new representation of the problem may
suggest different solutions. When generating transcends the boundaries or constraints of prior
knowledge and existing theories, it involves divergent thinking and forms the core of what can
be called creative thinking.
Generating is used in a restricted sense here. Understand also requires generative
processes, which we have included in translating, exemplifying, summarizing, inferring,
classifying, comparing, and explaining. However, the goal of Understand is most often
convergent (that is, to arrive at a single meaning). In contrast, the goal of generating within
Create is divergent (that is, to arrive at various possibilities). An alternative term for generating
is hypothesizing.
Planning
Planning involves devising a solution method that meets a problem's criteria, that is,
developing a plan for solving the problem. Planning stops short of carrying out the steps to
create the actual solution for a given problem. In planning, a student may establish sub-goals,
or break a task into subtasks to be performed when solving the problem. Teachers often skip
stating planning objectives, instead stating their objectives in terms of producing, the final stage
of the creative process. When this happens, planning is either assumed or implicit in the
producing objective. In this case, planning is likely to be carried out by the student covertly
during the course of constructing a product (i.e., producing). An alternative term is designing.
Sample objectives and corresponding assessments: In planning, when given a problem
statement, a student develops a solution method. In history, a sample objective could be to be
able to plan research papers on given historical topics. An assessment task asks the student,
prior to writing a research paper on the causes of the Indian Revolution, to submit an outline
of the paper, including the steps he or she intends to follow to conduct the research. In the
natural sciences, a sample objective could be to learn to design studies to test various
hypotheses. An assessment task asks students to plan a way of determining which of three
factors determines the rate of oscillation of a pendulum. In mathematics, an objective could be
to be able to lay out the steps needed to solve geometry problems. An assessment task asks
students to devise a plan for determining the volume of the frustum of a pyramid (a task not
previously considered in class). The plan may involve computing the volume of the large
pyramid, then computing the volume of the small pyramid, and finally subtracting the smaller
volume from the larger.
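As a sketch of the arithmetic such a plan leads to (the dimensions below are assumed purely for illustration and do not come from the text), the frustum's volume is the large pyramid's volume minus the small pyramid's volume. For a square pyramid of base area 36 and height 9, truncated by removing a similar top pyramid of base area 4 and height 3:

\[
V_{\text{frustum}} = \tfrac{1}{3}A_{1}h_{1} - \tfrac{1}{3}A_{2}h_{2}
= \tfrac{1}{3}(36)(9) - \tfrac{1}{3}(4)(3) = 108 - 4 = 104.
\]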
Producing
Producing involves carrying out a plan for solving a given problem that meets certain
specifications. As we noted earlier, objectives within the category Create may or may not
include originality or uniqueness as one of the specifications. So it is with producing objectives.
Producing can require the coordination of the four types of knowledge. An alternative term is
constructing.
Sample objectives and corresponding assessments: In producing, a student is given a functional
description of a goal and must create a product that satisfies the description. It involves carrying
out a solution plan for a given problem. Sample objectives involve producing novel and useful
products that meet certain requirements. In history, an objective could be to learn to write
papers pertaining to particular historical periods that meet specified standards of scholarship.
An assessment task asks students to write a short story that takes place during the Indian
Revolution. In science, an objective could be to learn to design habitats for certain species and
certain purposes. A corresponding assessment task asks students to design the living quarters
of a space station. In all these examples, the specifications become the criteria for evaluating
student performance relative to the objective. These specifications, then, should be included in
a scoring rubric that is given to the students in advance of the assessment.
Assessment formats: A common task for assessing producing is a design task, in which students
are asked to create a product that corresponds to certain specifications. For example, students
may be asked to produce schematic plans for a new institution that include new ways for
students to conveniently store their personal belongings.
Analyse
Analyse involves breaking material into its constituent parts and determining how the
parts are related to one another and to an overall structure. This process category includes the
cognitive processes of differentiating, organizing, and attributing. Objectives classified as
Analyse include learning to determine the relevant or important pieces of a message
(differentiating), the ways in which the pieces of a message are organized (organizing), and the
underlying purpose of the message (attributing). Although learning to Analyze may be viewed
as an end in itself, it is probably more defensible educationally to consider analysis as an
extension of Understanding or as a prelude to Evaluating or Creating. Improving students' skills
in analyzing educational communications is a goal in many fields of study. Teachers of science,
social studies, the humanities, and the arts frequently give "learning to analyze" as one of their
important objectives. They may, for example, wish to develop in their students the ability to:
• distinguish fact from opinion (or reality from fantasy);
• connect conclusions with supporting statements;
• distinguish relevant from extraneous material;
• determine how ideas are related to one another;
• ascertain the unstated assumptions involved in what is said;
• distinguish dominant from subordinate ideas or themes in poetry or music; and
• find evidence in support of the author's purposes.
The process categories of Understand, Analyze, and Evaluate are interrelated and often used
iteratively in performing cognitive tasks. At the same time, however, it is important to maintain
them as separate process categories. A person who understands a communication may not be
able to analyze it well. Similarly, someone who is skillful in analyzing a communication may
evaluate it poorly.
Differentiating
Differentiating involves distinguishing the parts of a whole structure in terms of their
relevance or importance. Differentiating occurs when a student discriminates relevant from
irrelevant information, or important from unimportant information, and then attends to the
relevant or important information. Differentiating is different from the cognitive processes
associated with Understand because it involves structural organization and, in particular,
determining how the parts fit into the overall structure or whole. More specifically,
differentiating differs from comparing in using the larger context to determine what is relevant
or important and what is not. For instance, in differentiating apples and oranges in the context
of fruit, internal seeds are relevant, but color and shape are irrelevant. In comparing, all of these
aspects (i.e., seeds, color, and shape) are relevant. Alternative terms for differentiating are
discriminating, selecting, distinguishing, and focusing.
Sample objectives and corresponding assessments: In the social sciences, an objective could
be to learn to determine the major points in research reports. A corresponding assessment item
requires a student to circle the main points in an archaeological report about an ancient Indian
city (such as when the city began and when it ended, the population of the city over the course
of its existence, the geographic location of the city, the physical buildings in the city, its
economic and cultural function, the social organization of the city, why the city was built and
why it was deserted). Similarly, in the natural sciences, an objective could be to select the main
steps in a written description of how something works. A corresponding assessment item asks
a student to read a chapter in a book that describes lightning formation and then to divide the
process into major steps (including moist air rising to form a cloud, creation of updrafts and
downdrafts inside the cloud, separation of charges within the cloud, movement of a stepped
leader downward from cloud to ground, and creation of a return stroke from ground to cloud).
Finally, in mathematics, an objective could be to distinguish between relevant and irrelevant
numbers in a word problem. An assessment item requires a student to circle the relevant
numbers and cross out the irrelevant numbers in a word problem.
Assessment formats: Differentiating can be assessed with constructed response or selection
tasks. In a constructed response task, a student is given some material and is asked to indicate
which parts are most important or relevant, as in this example: "Write the numbers that are
needed to solve this problem: Pencils come in packages that contain 12 each and cost Rs.2.00
each. John has Rs.5.00 and wishes to buy 24 pencils. How many packages does he need to
buy?" In a selection task, a student is given some material and is asked to choose which parts
are most important or relevant, as in this example: "Which numbers are needed to solve this
problem? Pencils come in packages that contain 12 each and cost Rs.2.00 each. John has
Rs.5.00 and wishes to buy 24 pencils. How many packages does he need to buy? (a) 12,
Rs.2.00, Rs.5.00, 24; (b) 12, Rs.2.00, Rs.5.00; (c) 12, Rs.2.00, 24; (d) 12, 24."
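For reference (this working is added for illustration and is not part of the item), only the package size and the number of pencils wanted are needed; the rupee amounts are distractors, so option (d) is presumably the intended key:

\[
\text{packages needed} = \frac{24 \text{ pencils}}{12 \text{ pencils per package}} = 2.
\]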
Organizing
Organizing involves identifying the elements of a communication or situation and
recognizing how they fit together into a coherent structure. In organizing, a student builds
systematic and coherent connections among pieces of presented information. Organizing
usually occurs in conjunction with differentiating. The student first identifies the relevant or
important elements and then determines the overall structure within which the elements fit.
Organizing can also occur in conjunction with attributing, in which the focus is on determining
the author's intention or point of view. Alternative terms for organizing are structuring,
integrating, finding coherence, outlining, and parsing.
Attributing
Attributing occurs when a student is able to ascertain the point of view, biases, values,
or intention underlying communications. Attributing involves a process of deconstruction, in
which a student determines the intentions of the author of the presented material. In contrast to
interpreting, in which the student seeks to Understand the meaning of the presented material,
attributing involves an extension beyond basic understanding to infer the intention or point of
view underlying the presented material. An alternative term is deconstructing.
Assessment formats: Attributing can be assessed by presenting some written or oral material
and then asking a student to construct or select a description of the author's or speaker's point
of view, intentions, and the like. For example, a constructed response task is "What is the
author's purpose in writing the essay you read on the Amazon rain forests?" A selection version
of this task is "The author's purpose in writing the essay you read is to: (a) provide factual
information about Amazon rain forests, (b) alert the reader to the need to protect rain forests,
(c) demonstrate the economic advantages of developing rain forests, or (d) describe the
consequences to humans if rain forests are developed." Alternatively, students might be asked
to indicate whether the author of the essay would (a) strongly agree, (b) agree, (c) neither agree
nor disagree, (d) disagree, or (e) strongly disagree with several statements. Statements like
"The rainforest is a unique type of ecological system" would follow.
CATEGORIES OF THE COGNITIVE PROCESS DIMENSION
Let us define the cognitive processes within each of the six categories in detail, making
comparisons with other cognitive processes, where appropriate. In addition, we present sample
educational objectives and assessments in various subject areas as well as alternative versions
of assessment tasks. Each illustrative objective in the following material should be read as
though preceded by the phrase "The student is able to ... " or "The student learns to...."
Remember
When the objective of instruction is to promote retention of the presented material in
much the same form as it was taught, the relevant process category is Remember.
Remembering involves retrieving relevant knowledge from long term memory. The two
associated cognitive processes are recognizing and recalling. The relevant knowledge may be
Factual, Conceptual, Procedural, or Meta-cognitive, or some combination of these. To assess
student learning in the simplest process category, the student is given a recognition or recall
task under conditions very similar to those in which he or she learned the material. Little, if
any, extension beyond those conditions is expected. If, for example, a student learned the
English equivalents of 20 Spanish words, then a test of remembering could involve requesting
the student to match the Spanish words in one list with their English equivalents in a second
list (i.e., recognize) or to write the corresponding English word next to each of the Spanish
words presented in the list (i.e., recall).
Remembering knowledge is essential for meaningful learning and problem solving as
that knowledge is used in more complex tasks. For example, knowledge of the correct spelling
of common English words appropriate to a given grade level is necessary if the student is to
master writing an essay. Where teachers concentrate solely on rote learning, teaching and
assessing focus solely on remembering elements or fragments of knowledge, often in isolation
from their context. When teachers focus on meaningful learning, however, remembering
knowledge is integrated within the larger task of constructing new
knowledge or solving new problems.
Recognizing
Recognizing involves retrieving relevant knowledge from long-term memory in order
to compare it with presented information. In recognizing, the student searches long-term
memory for a piece of information that is identical or extremely similar to the presented
information (as represented in working memory). When presented with new information, the
student determines whether that information corresponds to previously learned knowledge,
searching for a match. An alternative term for recognizing is identifying.
Sample objectives and corresponding assessments: In general studies, an objective could be for
students to recognize the correct dates of important events in Indian history. A corresponding
test item is: "True or false: The Declaration of Independence was on August 15, 1947." In
literature, an objective could be to recognize authors of Indian literary works. A corresponding
assessment is a matching test that contains a list of ten authors and a list of slightly more than
ten novels. In mathematics, an objective could be to recognize the numbers of sides in basic
geometric shapes. A corresponding assessment is a multiple choice test with items such as the
following: "How many sides does a pentagon have? (a) four, (b) five, (c) six, (d) seven."
Recalling
Recalling involves retrieving relevant knowledge from long-term memory when given
a prompt to do so. The prompt is often a question. In recalling, a student searches long-term
memory for a piece of information and brings that piece of information to working memory
where it can be processed. An alternative term for recalling is retrieving.
Assessment formats: Assessment tasks for recalling can vary in the number and quality of cues
that students are provided. With low cueing, the student is not given any hints or related
information (such as "What is a meter?"). With high cueing, the student is given several hints
(such as "In the metric system, a meter is a measure of .").
Assessment tasks for recalling can also vary in the amount of embedding, or the extent to which
the items are placed within a larger meaningful context. With low embedding, the recall task
is presented as a single, isolated event, as in the preceding examples. With high embedding,
the recall task is included within the context of a larger problem, such as asking a student to
recall the formula for the area of a circle when solving a word problem that requires that
formula.
CONCEPTUAL KNOWLEDGE
Conceptual knowledge includes knowledge of categories and classifications and the
relationships between and among them: more complex, organized knowledge forms.
Conceptual knowledge includes schemas, mental models, or implicit or explicit theories in
different cognitive psychological models. These schemas, models, and theories represent the
knowledge an individual has about how a particular subject matter is organized and structured,
how the different parts or bits of information are interconnected and interrelated in a more
systematic manner, and how these parts function together. For example, a mental model for
why the seasons occur may include ideas about the earth, the sun, the rotation of the earth
around the sun, and the tilt of the earth toward the sun at different times during the year. These
are not just simple, isolated facts about the earth and sun but rather ideas about the relationships
between them and how they are linked to the seasonal changes. This type of conceptual
knowledge might be one aspect of what is termed "disciplinary knowledge," or the way experts
in the discipline think about a phenomenon, in this case the scientific explanation for the
occurrence of the seasons.
Conceptual knowledge includes three subtypes: knowledge of classifications and
categories, knowledge of principles and generalizations, and knowledge of theories, models,
and structures. Classifications and categories form the basis for principles and generalizations.
These, in turn, form the basis for theories, models, and structures. The three subtypes should
capture a great deal of the knowledge that is generated within all the different disciplines.
Checking
Checking involves testing for internal inconsistencies or fallacies in an operation or a
product. For example, checking occurs when a student tests whether or not a conclusion follows
from its premises, whether data support or disconfirm a hypothesis, or whether presented
material contains parts that contradict one another. When combined with planning (a cognitive
process in the category Create) and implementing (a cognitive process in the category Apply),
checking involves determining how well the plan is working. Alternative terms for checking
are testing, detecting, monitoring, and coordinating.
Sample objectives and corresponding assessments: In checking, students look for internal
inconsistencies. A sample objective in the social sciences could be to learn to detect
inconsistencies in persuasive messages. A corresponding assessment task asks students to
watch a television advertisement for a political candidate and point out any logical flaws in the
persuasive message. A sample objective in the sciences could be to learn to determine whether
a scientist's conclusion follows from the observed data. An assessment task asks a student to
read a report of a chemistry experiment and determine whether or not the conclusion follows
from the results of the experiment.
Assessment formats: Checking tasks can involve operations or products given to the students
or ones created by the students themselves. Checking can also take place within the context of
carrying out a solution to a problem or performing a task, where one is concerned with the
consistency of the actual implementation (e.g., Is this where I should be in light of what I've
done so far?).
Critiquing
Critiquing involves judging a product or operation based on externally imposed criteria
and standards. In critiquing, a student notes the positive and negative features of a product and
makes a judgment based at least partly on those features. Critiquing lies at the core of what has
been called critical thinking. An example of critiquing is judging the merits of a particular
solution to the problem of acid rain in terms of its likely effectiveness and its associated costs
(e.g., requiring all power plants throughout the country to restrict their smokestack emissions
to certain limits). An alternative term is judging.
Sample objectives and corresponding assessments: In critiquing, students judge the merits of a
product or operation based on specified or student-determined criteria and standards. In the
social sciences, an objective could be to learn to evaluate a proposed solution (such as
"eliminate all grading") to a social problem (such as "how to improve K-12 education") in
terms of its likely effectiveness. In the natural sciences, an objective could be to learn to
evaluate the reasonableness of a hypothesis (such as the hypothesis that strawberries are
growing to extraordinary size because of the unusual alignment of the stars). Finally, in
mathematics, an objective could be to learn to judge which of two alternative methods is a more
effective and efficient way of solving given problems (such as judging whether it is better to
find all prime factors of 60 or to produce an algebraic equation to solve the problem "What are
the possible ways you could multiply two whole numbers to get 60?").
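To make the mathematics example concrete, here is a small sketch (purely illustrative; the text prescribes no code, and the function name is an assumption) that enumerates the factor pairs of 60, i.e., the set of answers that either of the two competing methods should produce:

def factor_pairs(n):
    """Return every pair (a, b) of whole numbers with a <= b and a * b == n."""
    pairs = []
    a = 1
    while a * a <= n:
        if n % a == 0:
            pairs.append((a, n // a))
        a += 1
    return pairs

print(factor_pairs(60))
# [(1, 60), (2, 30), (3, 20), (4, 15), (5, 12), (6, 10)]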
Assessment formats: A student may be asked to critique his or her own hypotheses or creations
or those generated by someone else. The critique could be based on positive, negative, or both
kinds of criteria and yield both positive and negative consequences.
Factual Knowledge
Factual knowledge encompasses the basic elements that experts use in communicating
about their academic discipline, understanding it, and organizing it systematically. These
elements are usually serviceable to people who work in the discipline in the very form in which
they are presented; they need little or no alteration from one use or application to another.
Factual knowledge contains the basic elements students must know if they are to be acquainted
with the discipline or to solve any of the problems in it. The elements are usually symbols
associated with some concrete referents, or "strings of symbols" that convey important
information. For the most part, Factual knowledge exists at a relatively low level of abstraction.
Because there is a tremendous wealth of these basic elements, it is almost inconceivable that a
student could learn all of them relevant to a particular subject matter. As our knowledge
increases in engineering and technology, the sciences, and mathematics, even experts in these
fields have difficulty keeping up with all the new elements. Consequently, some selection for
educational purposes is almost always required. For classification purposes, Factual knowledge
may be distinguished from Conceptual knowledge by virtue of its very specificity; that is,
Factual knowledge can be isolated as elements or bits of information that are believed to have
some value in and of themselves. The two subtypes of Factual knowledge are knowledge of
terminology (Aa) and knowledge of specific details and elements (Ab).
Knowledge of terminology
Knowledge of terminology includes knowledge of specific verbal and nonverbal labels
and symbols (e.g., words, numerals, signs, pictures). Each subject matter contains a large
number of labels and symbols, both verbal and nonverbal, that have particular referents. They
are the basic language of the discipline, the shorthand used by experts to express what they
know. In any attempt by experts to communicate with others about phenomena within their
discipline, they find it necessary to use the special labels and symbols they have devised. In
many cases it is impossible for experts to discuss problems in their discipline without making
use of essential terms. Quite literally, they are unable to even think about many of the
phenomena in the discipline unless they use these labels and symbols. The novice learner must
be cognizant of these labels and symbols and learn the generally accepted referents that are
attached to them. As the expert must communicate with these terms, so must those learning the
discipline have a knowledge of the terms and their referents as they attempt to comprehend or
think about the phenomena of the discipline. Here, to a greater extent than in any other category
of knowledge, experts find their own labels and symbols so useful and precise that they are
likely to want the learner to know more than the learner really needs to know or can learn. This
may be especially true in the sciences, where attempts are made to use labels and symbols with
great precision. Scientists find it difficult to express ideas or discuss particular phenomena with
the use of other symbols or with "popular" or "folk knowledge" terms more familiar to a lay
population.
Strategic knowledge
Strategic knowledge is knowledge of the general strategies for learning, thinking, and
problem solving. The strategies in this subtype can be used across many different tasks and
subject matters, rather than being most useful for one particular type of task in one specific
subject area (e.g., solving a quadratic equation or applying Ohm's law).
This subtype of Strategic knowledge includes knowledge of the variety of strategies that
students might use to memorize material, extract meaning from text, or comprehend what they
hear in classrooms or read in books and other course materials. The large number of different
learning strategies can be grouped into three general categories: rehearsal, elaboration, and
organizational (Weinstein and Mayer, 1986). Rehearsal strategies involve repeating words or
terms to be recalled over and over to oneself; they are generally not the most effective strategies
for deeper levels of learning and comprehension. In contrast, elaboration strategies include the
use of various mnemonics for memory tasks as well as techniques such as summarizing,
paraphrasing, and selecting the main idea from texts. Elaboration strategies foster deeper
processing of the material to be learned and result in better comprehension and learning than
do rehearsal strategies. Organizational strategies include various forms of outlining, drawing
"cognitive maps" or concept mapping, and note taking; students transform the material from
one form to another. Organizational strategies usually result in better comprehension and
learning than do rehearsal strategies.
In addition to these general learning strategies, students can have knowledge of various
meta-cognitive strategies that are useful in planning, monitoring, and regulating their cognition.
Students can eventually use these strategies to plan their cognition (e.g., set sub-goals), monitor
their cognition (e.g., ask themselves questions as they read a piece of text, check their answer
to a math problem), and regulate their cognition (e.g., re-read something they don't understand,
go back and "repair'' their calculating mistake in a math problem). Again, in this category we
refer to students' knowledge of these various strategies, not their actual use. Finally, this
subtype of Strategic knowledge includes general strategies for problem solving and thinking
(Baron, 1994; Nickerson, Perkins, and Smith, 1985; Sternberg, 1985). These strategies
represent the various general heuristics students can use to solve problems, particularly ill-
defined problems that have no definitive solution method. Examples of heuristics are means-
ends analysis and working backward from the desired goal state. In addition to problem-solving
strategies, there are general strategies for deductive and inductive thinking, including
evaluating the validity of different logical statements, avoiding circularity in arguments,
making appropriate inferences from different sources of data, and drawing on appropriate
samples to make inferences (i.e., avoiding the availability heuristic, that is, making decisions from
convenient instead of representative samples).
Examples of knowledge about cognitive tasks, including contextual and conditional knowledge
• Knowledge that recall tasks (i.e., short-answer items) generally make more demands on
the individual's memory system than recognition tasks (i.e., multiple-choice items)
• Knowledge that a primary source book may be more difficult to understand than a general
textbook or popular book
• Knowledge that a simple memorization task (e.g., remembering a phone number) may
require only rehearsal
• Knowledge that elaboration strategies like summarizing and paraphrasing can result in
deeper levels of comprehension
• Knowledge that general problem-solving heuristics may be most useful when the
individual lacks relevant subject- or task-specific knowledge or in the absence of
specific Procedural knowledge
• Knowledge of the local and general social, conventional, and cultural norms for how,
when, and why to use different strategies
Self knowledge
Along with knowledge of different strategies and cognitive tasks, Flavell (1979)
proposed that self-knowledge was an important component of meta-cognition. In his model
self-knowledge includes knowledge of one's strengths and weaknesses in relation to cognition
and learning. For example, students who know they generally do better on multiple-choice tests
than on essay tests have some self-knowledge about their test-taking skills. This knowledge
may be useful to students as they study for the two different types of tests. In addition, one hallmark
of experts is that they know when they do not know something, and they then have some
general strategies for finding the needed and appropriate information. Self-awareness of the
breadth and depth of one's own knowledge base is an important aspect of self-knowledge.
Finally, students need to be aware of the different types of general strategies they are likely to
rely on in different situations. An awareness that one tends to over-rely on a particular strategy,
when there may be other more adaptive strategies for the task, could lead to a change in strategy
use.
In addition to knowledge of one's general cognition, individuals have beliefs about their
motivation. Motivation is a complicated and confusing area, with many models and theories
available. Although motivational beliefs are usually not considered in cognitive models, a fairly
substantial body of literature is emerging that shows important links between students'
motivational beliefs and their cognition and learning.
A consensus has emerged, however, around general social cognitive models of motivation that
propose three sets of motivational beliefs (Pintrich and Schunk, 1996). Because these beliefs
are social cognitive in nature, they fit into a taxonomy of knowledge. The first set consists of
self-efficacy beliefs, that is, students' judgments of their capability to accomplish a specific
task. The second set includes beliefs about the goals or reasons students have for pursuing a
specific task (e.g., learning vs. getting a good grade). The third set contains value and interest
beliefs, which represent students' perceptions of their personal interest (liking) for a task as
well as their judgments of how important and useful the task is to them. Just as students need
to develop self-knowledge and awareness about their own knowledge and cognition, they also
need to develop self-knowledge and awareness about their own motivation. Again, awareness
of these different motivational beliefs may enable learners to monitor and regulate their
behaviour in learning situations in a more adaptive manner.
Self-knowledge is an important aspect of Meta-cognitive knowledge, but the accuracy
of self-knowledge seems to be most crucial for learning. We are not advocating that teachers
try to boost students' "self-esteem" (a completely different construct from self-knowledge) by
providing students with positive but false, inaccurate, and misleading feedback about their
academic strengths and weaknesses. It is much more important for students to have accurate
perceptions and judgments of their knowledge base and expertise than to have inflated and
inaccurate self-knowledge. If students are not aware they do not know some aspect of Factual
knowledge or Conceptual knowledge or that they don't know how to do something (Procedural
knowledge), it is unlikely they will make any effort to learn the new material. A hallmark of
experts is that they know what they know and what they do not know, and they do not have
inflated or false impressions of their actual knowledge and abilities. Accordingly, we
emphasize the need for teachers to help students make accurate assessments of their self-
knowledge and not attempt to inflate students' academic self-esteem.
Examples of self-knowledge
• Knowledge that one is knowledgeable in some areas but not in others
• Knowledge that one tends to rely on one type of "cognitive tool" (strategy) in certain
situations
• Knowledge of one's capabilities to perform a particular task that are accurate, not
inflated (e.g., overconfident)
• Knowledge of one's goals for performing a task
• Knowledge of one's judgments about the relative utility value of a task
Procedural knowledge
Procedural knowledge is the "knowledge of how" to do something. The "something"
might range from completing fairly routine exercises to solving novel problems. Procedural
knowledge often takes the form of a series or sequence of steps to be followed. It includes
knowledge of skills, algorithms, techniques, and methods, collectively known as procedures.
Procedural knowledge also includes knowledge of the criteria used to determine when to use
various procedures. In fact, as Bransford, Brown, and Cocking (1999) noted, not only do
experts have a great deal of knowledge about their subject matter, but their knowledge is
"conditionalized" so that they know when and where to use it. Whereas Factual knowledge and
Conceptual knowledge represent the "what" of knowledge, procedural knowledge concerns the
"how." In other words, Procedural knowledge reflects knowledge of different "processes,"
whereas Factual knowledge and Conceptual knowledge deal with what might be termed
"products." It is important to note that Procedural knowledge represents only the knowledge of
these procedures.
In contrast to Meta-cognitive knowledge (which includes knowledge of more general
strategies that cut across subject matters or academic disciplines), Procedural knowledge is
specific or germane to particular subject matters or academic disciplines. Accordingly, we
reserve the term Procedural knowledge for the knowledge of skills, algorithms, techniques, and
methods that are subject specific or discipline specific. In mathematics, for example, there are
algorithms for performing long division, solving quadratic equations, and establishing the
congruence of triangles. In science, there are general methods for designing and performing
experiments. In social studies, there are procedures for reading maps, estimating the age of
physical artifacts, and collecting historical data. In language arts, there are procedures for
spelling words in English and for generating grammatically correct sentences. Because of the
subject-specific nature of these procedures, knowledge of them also reflects specific
disciplinary knowledge or specific disciplinary ways of thinking in contrast to general
strategies for problem solving that can be applied across many disciplines.
Interpreting
Interpreting occurs when a student is able to convert information from one
representational form to another. Interpreting may involve converting words to words (e.g.,
paraphrasing), pictures to words, words to pictures, numbers to words, words to numbers,
musical notes to tones, and the like. Alternative terms are translating, paraphrasing,
representing, and clarifying.
Sample objectives and corresponding assessments: In interpreting, when given information in
one form of representation, a student is able to change it into another form. For example, an
objective could be to learn to paraphrase important functions of compiler phases. A
corresponding assessment item asks a student to restate the function of each compiler phase in his or her own words. In science, an objective
could be to learn to draw pictorial representations of various natural phenomena. A
corresponding assessment item asks a student to draw a series of diagrams illustrating
photosynthesis. In mathematics, a sample objective could be to learn to translate number
sentences expressed in words into algebraic equations expressed in symbols. A corresponding
assessment item asks a student to write an equation that corresponds to the statement "There
are twice as many boys as girls in this class."
Assessment formats: Appropriate test item formats include both constructed response (i.e.,
supply an answer) and selected response (i.e., choose an answer). Information is
presented in one form, and students are asked either to construct or to select the same
information in a different form. For example, a constructed response task is: "Write an
equation that corresponds to the following statement, using T for total cost and B for
number of bundles. The total cost of mailing a package is Rs. 2.00 for the first bundle
plus Rs.1.50 for each additional bundle." A selection version of this task is: "Which
equation corresponds to the following statement, where T stands for total cost and B for
number of bundles?
The total cost of mailing a package is Rs. 2.00 for the first bundle plus Rs.1.50 for each
additional bundle (a) T = Rs.3.50 + B, (b) T = Rs. 2.00 + Rs. 1.50(B), (c) T = Rs. 2.00 +
Rs.1.50(B-1)."
To increase the probability that interpreting rather than remembering is being assessed,
the information included in the assessment task must be new. "New" here means that
students did not encounter it during instruction. Unless this rule is observed, we cannot
ensure that interpreting rather than remembering is being assessed. If the assessment task
is identical to a task or example used during instruction, we are probably assessing
remembering, despite our efforts to the contrary. Although we will not repeat this point
from here on, it applies to each of the process categories and cognitive processes beyond
Remember. If assessment tasks are to tap higher-order cognitive processes, they must
require that students cannot answer them correctly by relying on memory alone.
Exemplifying
Exemplifying occurs when a student gives a specific example or instance of a general
concept or principle. Exemplifying involves identifying the defining features of the general
concept or principle (e.g., an isosceles triangle must have two equal sides) and using these
features to select or construct a specific instance (e.g., being able to select which of three
presented triangles is an isosceles triangle). Alternative terms are illustrating and instantiating.
Assessment formats: Exemplifying tasks can involve the constructed response format in which
the student must create an example or the selected response format in which the student must
select an example from a given set. The science example, "Locate an inorganic compound and
tell why it is inorganic," requires a constructed response. In contrast, the item "Which of these
is an inorganic compound? (a) iron, (b) protein, (c) blood, (d) leaf mold" requires a selected
response.
Classifying
Classifying occurs when a student recognizes that something (e.g., a particular instance
or example) belongs to a certain category (e.g., concept or principle). Classifying involves
detecting relevant features or patterns that "fit" both the specific instance and the concept or
principle. Classifying is a complementary process to exemplifying. Whereas exemplifying
begins with a general concept or principle and requires the student to find a specific instance
or example, classifying begins with a specific instance or example and requires the student to
find a general concept or principle. Alternative terms for classifying are categorizing and
subsuming.
Assessment formats: In constructed response tasks, a student is given an instance and must
produce its related concept or principle. In selected response tasks, a student is given an
instance and must select its concept or principle from a list. In a sorting task, a student is given
a set of instances and must determine which ones belong in a specified category and which
ones do not, or must place each instance into one of multiple categories.
Summarizing
Summarizing occurs when a student suggests a single statement that represents
presented information or abstracts a general theme. Summarizing involves constructing a
representation of the information, such as the meaning of a scene in a play, and abstracting a
summary from it, such as determining a theme or main points. Alternative terms are
generalizing and abstracting.
Inferring
Inferring occurs when a student is able to abstract a general concept or principle from a set of
examples or instances by detecting the pattern or relationship among them.
Sample objectives and corresponding assessments: In inferring, when given a set or series of
examples or instances, a student finds a concept or principle that accounts for them. For
example, In mathematics, an objective could be to learn to infer the relationship expressed as
an equation that represents several observations of values for two variables. An assessment
item asks a student to describe the relationship as an equation involving x and y for situations
in which if x is 1, then y is 0; if x is 2, then y is 3; and if x is 3, then y is 8.
Assessment formats: Three common tasks that require inferring (often along with
implementing) are completion tasks, analogy tasks, and oddity tasks. In completion tasks, a
student is given a series of items and must determine what will come next, as in the number
series example above. In analogy tasks, a student is given an analogy of the form A is to B as
C is to D, such as "nation" is to "president" as "state" is to ____________. The student's task
is to produce or select a term that fits in the blank and completes the analogy (such as
"governor"). In an oddity task, a student is given three or more items and must
determine which does not belong. For example, a student may be given three physics problems,
two involving one principle and another involving a different principle. To focus solely on the
inferring process, the question in each assessment task could be to state the underlying concept
or principle the student is using to arrive at the correct answer.
Comparing
Comparing involves detecting similarities and differences between two or more objects,
events, ideas, problems, or situations, such as determining how a well known event (e.g., a
recent political scandal) is like a less familiar event (e.g., a historical political scandal).
Comparing includes finding one-to-one correspondences between elements and patterns in one
object, event, or idea and those in another object, event, or idea. When used in conjunction with
inferring (e.g., first, abstracting a rule from the more familiar situation) and implementing (e.g.,
second, applying the rule to the less familiar situation), comparing can contribute to reasoning
by analogy. Alternative terms are contrasting, matching, and mapping.
Sample objectives and corresponding assessments: In comparing, when given new information,
a student detects correspondences with more familiar knowledge. For example, in social
studies, an objective could be to understand historical events by comparing them to familiar
situations. A corresponding assessment question is "How is the Indian culture revolution like
a family fight or an argument between friends?" In the natural sciences, a sample objective
could be to learn to compare an electrical circuit to a more familiar system. In assessment, we
ask "How is an electrical circuit like water flowing through a pipe?" Comparing may also
involve determining correspondences between two or more presented objects, events, or ideas.
In mathematics, a sample objective could be to learn to compare structurally similar word
problems. A corresponding assessment question asks a student to tell how a certain mixture
problem is like a certain work problem.
Assessment formats: A major technique for assessing the cognitive process of comparing is
mapping. In mapping, a student must show how each part of one object, idea, problem, or
situation corresponds to (or maps onto) each part of another. For example, a student could be
asked to detail how the battery, wire, and resistor in an electrical circuit are like the pump,
pipes, and pipe constructions in a water flow system, respectively.
Explaining
Explaining occurs when a student is able to construct and use a cause-and effect model
of a system. The model may be derived from a formal theory (as is often the case in the natural
sciences) or may be grounded in research or experience (as is often the case in the social
sciences and humanities). A complete explanation involves constructing a cause-and-effect
model, including each major part in a system or each major event in the chain, and using the
model to determine how a change in one part of the system or one "link" in the chain affects a
change in another part. An alternative term for explaining is constructing a model.
Assessment formats: Several tasks can be aimed at assessing a student's ability to explain,
including reasoning, troubleshooting, redesigning, and predicting. In reasoning tasks, a student
is asked to offer a reason for a given event. For example, "Why does air enter a bicycle tire
pump when you pull up on the handle?" In this case, an answer such as "It is forced in because
the air pressure is less inside the pump than outside" involves finding a principle that accounts
for a given event. In troubleshooting, a student is asked to diagnose what could have gone
wrong in a malfunctioning system. For example, "Suppose you pull up and press down on the
handle of a bicycle tire pump several times but no air comes out. What's wrong?" In this case,
the student must find an explanation for a symptom, such as "There is a hole in the cylinder"
or "A valve is stuck in the open position." In redesigning, a student is asked to change the
system to accomplish some goal. For example, "How could you improve a bicycle tire pump
so that it would be more efficient?" To answer this question, a student must imagine altering
one or more of the components in the system, such as "Put lubricant between the piston and
the cylinder."
In predicting, a student is asked how a change in one part of a system will effect a change in
another part of the system. For example, "What would happen if you increased the diameter
of the cylinder in a bicycle tire pump?" This question requires that the student "operate" the
mental model of the pump to see that the amount of air moving through the pump could be
increased by increasing the diameter of the cylinder.
Assessment Procedures
Assessment methods are the strategies, techniques, tools and instruments for collecting
information to determine the extent to which students demonstrate desired learning outcomes.
Several methods should be used to assess student learning outcomes. Relying on only one
method to provide information about the program will reflect only a part of students'
achievement. Additionally, a student learning outcome may be difficult to assess using only one
method. For each student learning outcome, a combination of direct and indirect assessment
methods should be used. For example, responses from student surveys may be informative,
however, when combined with students’ test results they will be more meaningful, valid, and
reliable.
Principles of Assessment
Assessment will be valid
Assessment will be explicitly designed to measure student achievement of the intended
learning outcomes, and all intended learning outcomes will be summatively assessed. The
processes for the approval of new modules and programmes, and for amending existing
modules and programmes, will ensure that assessment is an integral part of module and
programme design, and the ongoing validity of assessment will be considered through annual
and periodic review.
Assessment will be reliable
To ensure the level of consistency that is necessary for assessment to be reliable, all awards at
the same academic level will be aligned with the institution's generic qualification descriptor,
level descriptor and assessment criteria for that level of award.
Assessment will be equitable
Different assessment methods may be appropriate for different learning styles, and the
institution therefore encourages all programmes to employ (in a way that is consistent with the
intended learning outcomes being assessed) a diversity of assessment methods to allow all
students to demonstrate their knowledge, understanding and skills.
Assessment will be explicit and transparent
Prior to undertaking any assessment task, students will be clearly informed of the purpose and
requirements of the task and will be provided with the specific assessment criteria that will be
used for marking it. Feedback to students will be related to the stated learning outcomes and
specific assessment criteria. Clear information on the policies and processes relating to
assessment will be easily available to all involved in the assessment process.
Assessment will support the student learning process
All assessment tasks influence the way in which students approach their learning, and this will
be taken into account in the design of all assessment tasks.
Assessment will be efficient
Assessment will be efficient for both students and staff such that learning outcomes are not
overly assessed and that knowledge and skills can be sampled.
Direct Method of Assessment
The direct method of assessment provides the exact outcome of the classroom. The evidence from
direct assessment is concrete: it is quantifiable, measurable and visible, and it clearly shows
student learning in a course. It gives faculty members a clear indication of students'
understanding of the subject and of what students can do with that understanding. This method
is the one most commonly used by faculty members. There are different methodologies of direct
assessment:
1. Standardized Examination
2. Quiz
3. Simulations
4. Demonstrations
5. Capstone Projects
6. Portfolios
7. Oral Exams
So, the strength of direct measurement is that faculty members get concrete evidence, from a
sample of work, of what students can do with their learning.
At the same time, direct measurement has its own weakness: some components of the teaching-
learning process cannot be evaluated directly.
Completion and Single Answer Questions
A statement is given in an incomplete form, or a question is posed to the student. The student is
required to complete the sentence by filling in the blank with the correct answer, or to supply the
answer to the question in a single word or phrase.
Example:
What is the atomicity of Hydrogen?
The valency of Carbon is ______.
These types of questions are useful to test a student's
• knowledge of facts, principles, theories
• comprehension of information including interpretation of data, parts
Construction
• Items must be clearly stated so that a single brief answer is possible.
• The question must be direct
Example: What is the SI unit of force?
• The item with blank spaces must make enough sense so that the student knows
what to do
Example: A metal is--------------
The above item does not indicate to the student what is expected of him/her.
• Answer must be related to the main point in the statement
• Place the blank at the end of the statement where possible. The blank may be provided
either at the end or at the beginning of the statement; blanks in the middle of the statement
should be avoided as much as possible.
Example: There are two rational numbers which have themselves as reciprocals. One of them
is 1. The other number is ______.
• Don't make the statement too general.
For example: A circular saw is ------------
Let us look at some examples.
What is a refrigerator?
This is a vague question. Students can answer it in different ways. One answer can be it is a
gadget to store vegetables and prevent them from spoilage. Another answer may be it is a
gadget that works on the principle of refrigeration. Yet another answer could be it is a gadget
which maintains the required temperature for preventing spoiling of food items.
A capacitor __________ DC. A capacitor __________ and __________ electric energy.
These two items violate the rules of construction: more than one answer is possible for both
questions.
The selection question provides students with alternative answers from which to choose the
correct answer. The following are the types of selection items:
• Multiple Choice
• True /false (also called Alternate response)
• Matching
It is important to remember that all assessment methods have their limitations and contain some
bias. A meaningful assessment program would use both direct and indirect assessments from a
variety of sources (students, alumni, faculty, employers, etc.). This use of multiple assessment
methods provides converging evidence of student learning. Indirect methods provide a valuable
supplement to direct methods and are generally a part of a robust assessment program.
The following are examples of direct and indirect assessment methods at the institutional level.
Institutional level, direct methods:
• Performance on tests of writing, critical thinking, or general knowledge
• Rubric scores for class assignments in General Education, interdisciplinary core courses, or
other courses required of all students
• Performance on achievement tests
• Explicit self-reflections on what students have learned related to institutional programs such
as service learning (e.g., asking students to name the three most important things they have
learned in a program)
Institutional level, indirect methods:
• Proportion of upper-level courses compared to the same program at other institutions
• Graduate school placement rates
• Locally developed, commercial, or national surveys of student perceptions or self-report of
activities (e.g., National Survey of Student Engagement)
• Transcript studies that examine patterns and trends of course selection and grading
• Annual reports including institutional benchmarks (e.g., graduation and retention rates, grade
point averages of graduates, etc.)
Short Answer Questions
Short-answer questions are open-ended questions that require students to supply their answer.
They are commonly used in examinations to assess the basic knowledge and understanding
(low cognitive levels) of a topic before more in-depth assessment questions are asked on the
topic.
This is a supply type item where the student is given a clear direction to restrict the answer to
2 or 3 sentences. Questions must be such that answers are possible within the limits of specified
lengths.
Example;
Define Poisson's ratio
List three important uses of poor conductors
What is normalization in database?
Construction
• The question must be simple, clear and unambiguous
• Scope of the answer must be limited by the use of words such as 'list', 'give reasons',
'define', etc.
• Questions must be interpretable in the same way by all students.
Matching Questions
In a matching item, the student is required to match each premise in one column with the correct
response in the other column.
Example:
Column A
1. Best for measuring computational skill
2. Least useful for educational diagnosis
3. Most difficult to score objectively
4. Provides high scores by guessing alone
5. Measures greater variety of learning outcomes
6. Measures learning at recall level
Column B
a. Matching item
b. Multiple Choice
c. True-False item
d. Short answer item
Construction
• Give very clear instructions about how students must write the answers to each item, where
they are to mark their answers.
• The acceptable format for numbering matching questions is to place numbers in front of
the premises on the left place letters in front of the responses on the right
• Keep the lists as short as possible
• Arrange the lists in a logical order. If dates are used it is preferable to put them in a
chronological order.
• Use proper numbering for both columns. Items in Column B may be given letters while
those in Column A may be given numbers.
Example:
Write the quadratic equation. – This expects factual knowledge from the student.
A ball is thrown straight up, from 10 m above the ground, with a velocity of 20 m/s. When
will the ball hit the ground? (Ignore air resistance.) – This question draws on conceptual and
procedural knowledge: the student must apply the quadratic equation to solve the problem. So
the problem-solving method brings out the student's conceptual knowledge and procedural
knowledge.
Example:
In a packet-switching network, packets are routed from source to destination along a
single path having two intermediate nodes. If the message size is 48 bytes and each packet
contains a header of 3 bytes, what is the optimum packet size?
The above question extracts the conceptual knowledge behind packet switching; solving
the problem shows clearly whether the student understands the concept.
In software engineering, problem solving involves splitting large, complex goals into
smaller, simpler ones; thinking of different, possibly parallel, solutions to each one; abstracting
the problem so that the same abstraction can be applied to other, similar issues; learning to use
existing solutions instead of re-inventing the wheel; and thinking in terms of data flow. This
procedure gives the student a clear idea of how to approach a problem in software engineering.
Multiple Choice Test Items
Multiple-choice questions (MCQs) are one of the most widely used formats for
assessment. They are also known to be quite difficult to construct. In a multiple-choice
item, the student is required to select the correct answer to a question from a group of several
alternatives.
An example:
The transfer of heat in a steel bar from one end to the other end is by
a) Conduction
b) Convection
c) Radiation
d) Fusion
In the above example " the transfer of heat from one end of the steel bar to the other end" is the
main question. This is at the top of the item. This is the question to which the student must
select the correct answer. This statement or question is called Stem. The Stem can be either in
the form of a direct question or an incomplete sentence. This acts as a stimulus to evoke the
correct response from students. The alternatives provided as possible answers are called
Options. In the example four options are given. The student has to choose the correct answer
from the options. There may be four or even five options. In the given example, the items at a, b, c
and d are the options.
The correct answer is called the Key. In the example option (a) is the key. Other than the correct
answer are called Distracters. Options b, c, and d are the distracters.
Example:
Voltage drop in a resistor is NOT proportional to
a) current
b) resistance
c) power dissipation
d) physical dimensions or size
(Notice NOT, the negative. It is given in capitals for emphasis; it may also be underlined.)
The stem must be a complete question by itself, not requiring the student to read the options in order
to discover what is being asked.
Example:
When two resistors of value 10 ohms and 30 ohms are connected in series, the net resistance
value will be
a) 3 ohms
b) 20 ohms
c) 40 ohms
d) 300 ohms
In this item, the student can work out the answer without referring to the options, since the
stem is a complete question by itself.
The content of the question must be made clear to avoid confusion. State the stem of the item in a
simple, clear sentence. Use simple language so that students understand the statement without
much difficulty.
Example:
Poor construction
The paucity of plausible, but incorrect statement that can be related to a central idea poses a
problem when constructing which one of the following types of test items?
a) Short answer
b) True- False
c) Multiple choice
d) Essay
Better constructed item
The lack of plausible but incorrect alternatives will cause the greatest difficulty when
constructing
a) Short answer question
b) True-False
c) Multiple Choice item
d) Essay
Put as much of the wording as possible in the stem of the item; anything that needs repeating
in each option should be included in the stem.
Example:
In objective testing the term objective
a) refers to the method of identifying the learning outcomes
b) refers to the method of selecting the test content
c) refers to the method of presenting the problem
d) refers to the method of scoring the answers.
The phrase 'refers to the method' repeats itself in all four options. It must be moved to the
stem. The stem must then read: 'In objective testing the term objective refers to the method of ...'
The options must be closely related to the stem.
Example:
The property of a circuit that tends to oppose a change in current is called
a) Conductance
b) Voltage
c) temperature
d) Inductance
In the above example b and c are not properties of a circuit. These are not good options. Better
options would be to replace b and c by
b) capacitance
c) resistance
The options should be parallel in structure, i.e. they should fit grammatically with the stem.
Grammatical consistency of all options is very important.
Example:
The station where an aircraft is taken for repairs is called an
a) apron
b) hangar
c) tower
d) workshop
In this example only one option fits the grammatical structure of the stem. To improve it, the
stem may end with '... is called' so that the article does not conflict with any of the options.
The item must not contain clues to the student, such as a mixture of singulars and plurals in
the options.
Example:
The direction of propagation of an electromagnetic wave in free space is
a) along the electric field
b) along the magnetic field
c) in the plane of electric and magnetic field
d) perpendicular to the surface containing the two fields.
In the above example the precision and length of the key option d make it stand out from the
rest. To avoid this, comparable phrasing and detail must be included in each of the options.
Example:
An ion is
a) a charged particle
b) an atom which has gained or lost electrons
c) a neutral particle
d) formed in electrolytes
Here the stem is vague and three of the options given are acceptable.
Distracters must be incorrect yet likely to be plausible to weaker students. This means that the
distracters must be believable.
Example:
Waste and overflow fittings for a bathtub are installed
a) before the bathtub is set in place
b) after the bathtub is set in place
c) at the same time as the trap
d) none of the above.
It is unlikely that any student would choose d as the answer, particularly since all the other
options are plausible alternatives.
Another example:
A person invested Rs 500 in a business. He sold goods worth Rs 550 in this business. The %
profit he got was
a) Rs.50
b) Rs.10
c) 50%
d) Rs 550
The items should not be very lengthy or involve lengthy calculations.
Example:
What is the equivalent resistance of a 330 K ohm and a 100 K ohm resistor connected in
parallel?
b) 76.74 K ohms
c) 82.05 K ohms
d) 120 K ohms
e) 430 K ohms
The student has to work through lengthy calculations to arrive at the correct answer. The item
should test the understanding of the principle of resistors in parallel. It is not expected to test
the ability of calculation. The item may be reworded suitably.
The level of information required to reject wrong responses should not be higher than that
required to select the correct response.
Example:
Coulomb is the unit of measurement of
a) inductive reactance
b) electric charge
c) band width
d) trans conductance
Here, to reject options a, c, and d, a higher level of information is required than to select the key.
Hence, a, c, and d are poor distracters. The options for the item may be rewritten to suit the
level of learning under test. The options may be modified as under:
a) resistance
b) charge
c) power
d) potential difference
Advantages of multiple-choice items
• versatility in measuring all levels of cognitive ability
• highly reliable test scores
• scoring efficiency and accuracy
• objective measurement of student achievement or ability
• a wide sampling of content or objectives
• a reduced guessing factor when compared to True-False items
• different response alternatives which can provide diagnostic feedback
Specific Checklists
Constant Alternative (True-False) type
1. Does the item include only one significant idea in each statement?
2. Is the statement so precise that it can be judged unequivocally true or false?
3. Is the statement short and in simple language?
4. Does the item use negative statements sparingly and avoid double negatives?
Multiple Choice
1. Is the stem concise and unambiguous? Is the negative(if unavoidable) emphasized?
2. Is the stem a complete question by itself? Does the item require the student to read the
options to discover what is being asked?
3. Is the content of the question clear?
4. Does the stem include anything that needs to be repeated in every option, within itself?
5. Are the options parallel in content?
6. Are the options parallel in structure?
7. Is the item devoid of any clues such as mix up of singular, plural, precision and length
of key option etc.?
8. Is the key option unarguably correct?
9. Are the distracters plausible?
10. Does the item exclude "all of these"?
11. Is the language used in the item appropriate to the vocabulary of students at this level?
12. Does the item avoid similarity of wording in both stem and the correct answer?
13. Does the item exclude responses that are "all inclusive"?
14. Does the item use an efficient format?
Matching type
1. Does the item include only homogeneous material in the premises?
2. Is the number of responses sufficiently large so that the last of the premises still has
several options to choose from?
3. Does the item specify the basis of matching, type of matching, kind of entry etc?
It is generally agreed that teachers need to evaluate the work of their students and
assess all aspects of their teaching to enhance students’ learning and improve their own
performance. Assessment includes collecting, judging and interpreting information about
students’ performance. It is not a separate add-on activity but an integral part of the learning
and teaching process. Its purpose is to provide reliable information and feedback to improve
and enhance the quality of learning and teaching. Suitable assessment enables
• students to understand their abilities and hence improve their ways of learning;
• teachers to understand the performance of their students so that suitable and timely
measures can be provided; and
• parents to understand the performance of their wards so that they can, in collaboration
with teachers, provide suitable support to help the learning of their wards.
Different modes of assessment serve for different purposes. Assessment for learning,
which is usually formative, focuses on the learning process and learning progress.
Assessment of learning, which is usually summative, focuses on the product of learning. As
both the learning process and the product are important, both modes of assessment are needed.
In summative assessment, faculty members frequently hear complaints from students.
This happens because the examiner/teacher imparts instruction according to what he/she
thinks is appropriate or important. The intended learning outcomes are not stated clearly and
are therefore overlooked. Students get confused, as they are unaware of what is actually expected
of them, and they suffer. Blueprinting in assessment can overcome these issues, if not
completely then to a large extent, and hence makes the assessment more valid. A blueprint is a map
and a specification for an assessment program that ensures that all aspects of the curriculum
and educational domains are covered by assessment programs over a specified period of time.
It is a two-dimensional chart which shows the placement of each question with respect to the
objective and the content area that it tests. In simple terms, the blueprint links assessment to
learning objectives. It also indicates the marks carried by each question. It is useful to prepare
a blueprint so that the test maker knows which question will test which objective and which
content unit, and how many marks it will carry. The blueprint concretizes the design in
operational terms, and all the dimensions of a question (i.e. its objective, its form, the content
area it covers and the marks allotted to it) become clear to the test maker. The blueprint
is also called a Table of Specifications (ToS).
For example, the following table is a ToS for the course Computer Communication and Networks.
Step 1:
Define the following
1. The type of things the student should be able to do (i.e. ABILITIES)
2. The subject matter in which he/she should be able to do them (i.e. CONTENT)
A Table of Specifications is a two-way chart which relates CONTENT and ABILITIES by
assigning suitable weightages for testing purposes.
The blank ToS is a grid with the content down one side and the abilities across the top:
Content (No. and Name of the Unit) versus Abilities: Remember (Recognize, Recall),
Understand (2.1 2.2 2.3 2.4 2.5 2.6 2.7), Apply (3.1 3.2), Higher abilities.
The marks entered across all cells must total 100.
Step 2:
Factors to be considered: the percentage of content imparted in each unit / module.
Unit / Module 1 (Network Design): 18%
Unit / Module 2 (LAN Access Methods and Standards): 15%
Unit / Module 3 (Packet Switching Networks): 18%
Unit / Module 4 (TCP/IP Architecture): 22%
Unit / Module 5 (Advanced Network Architecture and Security Protocols): 27%
Total: 100%
The ability and content details are then entered in the table. After entering the details, you have
to check each cell in which a question can be designed. Finally, the two-dimensional table is
filled with values:
Abilities (columns, in order): Recognize, Recall | 2.1 2.2 2.3 2.4 2.5 2.6 2.7 | 3.1 3.2 | Higher abilities | Total
Content (No. and Name of the Unit):
Network Design: 5 5 | 2 0 0 0 0 2 4 | 0 0 | 0 | 18
LAN Access Methods and Standards: 0 3 | 0 0 0 4 4 0 4 | 0 0 | 0 | 15
Packet Switching Networks: 0 3 | 0 0 0 0 2 4 4 | 5 0 | 0 | 18
TCP/IP Architecture: 0 2 | 0 0 0 0 0 3 12 | 5 0 | 0 | 22
Advanced Network Architecture and Security Protocols: 0 2 | 0 0 0 0 0 5 10 | 5 5 | 0 | 27
Total: 20 (Remember) | 60 (Understand) | 20 (Apply) | 0 (Higher) | 100
Once the values are finalized, the design of the question paper can start. The cornerstone of
classroom assessment practice is the validity of the judgments about students' learning and
knowledge (Wolming & Wikstrom, 2010). A ToS is one tool that teachers can use to support
their professional judgment when creating or selecting tests for use with their students. The
ToS can be used in conjunction with lesson and unit planning to help teachers make clear the
connections between planning, instruction, and assessment.
Analytic Rubric
Introduction
An analytic rubric articulates levels of performance for each criterion to allow the
instructor to assess student performance on each criterion. Thus, using an analytic rubric, the
instructor is able to provide specific feedback on several dimensions of an assignment (e.g.,
thesis, organization, mechanics, etc.) along specific levels of performance.
Advantages
• Provide useful feedback on areas of strength and weakness.
• Each criterion is evaluated specifically.
• Criteria can be weighted to reflect the relative importance of each dimension.
Disadvantages
Analytic or holistic rubrics can be used depending on the purpose of teachers and the
performance expected from students in the assessment of students' writing. However, the
question of which one is more reliable is controversial. Some researchers assert that the analytic
rubric is more reliable than the holistic rubric (e.g. Elbow, 2000; Gunning, 2006).
Template - 1
5 Point Rating Scale
Criteria | Excellent | Very Good | Good | Satisfied | Need to Improve
Template - 2
4 Point Rating Scale
Criteria | Very Good | Good | Satisfied | Need to Improve
StudentName: ______________________________________
CATEGORY: Contributions
4: Routinely provides useful ideas when participating in the group and in classroom discussion. A definite leader who contributes a lot of effort.
3: Usually provides useful ideas when participating in the group and in classroom discussion. A strong group member who tries hard!
2: Sometimes provides useful ideas when participating in the group and in classroom discussion. A satisfactory group member who does what is required.
1: Rarely provides useful ideas when participating in the group and in classroom discussion. May refuse to participate.
CATEGORY: Quality of Work
4: Provides work of the highest quality.
3: Provides high quality work.
2: Provides work that occasionally needs to be checked/redone by other group members to ensure quality.
1: Provides work that usually needs to be checked/redone by others to ensure quality.
CATEGORY: Time-management
4: Routinely uses time well throughout the project to ensure things get done on time. Group does not have to adjust deadlines or work responsibilities because of this person's procrastination.
3: Usually uses time well throughout the project, but may have procrastinated on one thing. Group does not have to adjust deadlines or work responsibilities because of this person's procrastination.
2: Tends to procrastinate, but always gets things done by the deadlines. Group does not have to adjust deadlines or work responsibilities because of this person's procrastination.
1: Rarely gets things done by the deadlines AND group has to adjust deadlines or work responsibilities because of this person's inadequate time management.
CATEGORY: Problem-solving
4: Actively looks for and suggests solutions to problems.
3: Refines solutions suggested by others.
2: Does not suggest or refine solutions, but is willing to try out solutions suggested by others.
1: Does not try to solve problems or help others solve problems. Lets others do the work.
CATEGORY: Attitude
4: Never is publicly critical of the project or the work of others. Always has a positive attitude about the task(s).
3: Rarely is publicly critical of the project or the work of others. Often has a positive attitude about the task(s).
2: Occasionally is publicly critical of the project or the work of other members of the group. Usually has a positive attitude about the task(s).
1: Often is publicly critical of the project or the work of other members of the group. Often has a negative attitude about the task(s).
CATEGORY: Focus on the task
4: Consistently stays focused on the task and what needs to be done. Very self-directed.
3: Focuses on the task and what needs to be done most of the time. Other group members can count on this person.
2: Focuses on the task and what needs to be done some of the time. Other group members must sometimes nag, prod, and remind to keep this person on-task.
1: Rarely focuses on the task and what needs to be done. Lets others do the work.
CATEGORY: Working with Others
4: Almost always listens to, shares with, and supports the efforts of others. Tries to keep people working well together.
3: Usually listens to, shares with, and supports the efforts of others. Does not cause "waves" in the group.
2: Often listens to, shares with, and supports the efforts of others, but sometimes is not a good team member.
1: Rarely listens to, shares with, and supports the efforts of others. Often is not a good team player.
CATEGORY: Monitors Group Effectiveness
4: Routinely monitors the effectiveness of the group, and makes suggestions to make it more effective.
3: Routinely monitors the effectiveness of the group and works to make the group more effective.
2: Occasionally monitors the effectiveness of the group and works to make the group more effective.
1: Rarely monitors the effectiveness of the group and does not work to make it more effective.
When we assess the performance of a student, each criterion can be considered equally or
unequally. In the above example, all the criteria are treated as equal; the importance
of every criterion is the same. A student SSS gets a score as follows.
StudentName: SSS
[The same Collaborative Work Skills rubric grid is repeated here, with the descriptor that
student SSS achieved for each category shown in bold.]
In the rubric evaluation, the bold indicators are the scores for student SSS. In a simple table:
Collaborative Work Skills: in Mechanical Workshop
StudentName: SSS
Contributions: 4
Quality of Work: 3
Time-management: 3
Problem-solving: 4
Attitude: 3
Focus on the task: 3
Working with Others: 4
Monitors Group Effectiveness: 3
Score = (4 + 3 + 3 + 4 + 3 + 3 + 4 + 3) / 32
      = 27 / 32
      = 84.38%
When we consider all the criteria as equal, we get a score of 84.38%. But in some cases
we cannot consider all the criteria as equal, so we have to give a relative importance to each
criterion.
Consider a rubric for evaluating a teacher, with each criterion rated on the same 4/3/2/1 scale:
CATEGORY (rated 4, 3, 2, 1):
• Knowledge in the subject
• Communication Skills
• Dress Code
• Punctuality
• Humor
• Anecdotes in the class
• Friendliness
• Assignment Preparation
In the above table, it is not necessary to give equal importance to all the criteria. In the above
example, assume that the faculty member would like to give relative importance as follows:
CATEGORY: Importance
Knowledge in the subject: 2 marks x indicator rating
Communication Skills: 2 marks x indicator rating
If a teacher is evaluated with the above relative importance, the score is not equal to the normal
score where every criterion is considered equal. For example:
Punctuality: 4 x 1 = 4
Humor: 4 x 1 = 4
Anecdotes in the class: 4 x 2 = 8
In the above sample, the more important competencies carry more weightage. So a
teacher who is merely punctual, friendly and humorous is not treated as equal to a teacher with
knowledge of the subject, communication skills, good anecdotes in the class, and so on.
In this way the performance is assessed more precisely and, in addition, strengths and weaknesses are easily
identified.
What Are the Parts of a Rubric?
Rubrics are composed of four basic parts in which the professor sets out the parameters of
the assignment. The parties and processes involved in making a rubric can and should vary
tremendously, but the basic format remains the same. In its simplest form, the rubric includes
a task description (the assignment), a scale of some sort (levels of achievement, possibly in
the form of grades), the dimensions of the assignment (the criteria), and descriptions of what
constitutes each level of performance.
Title
Task Description
Scale level 1 Scale level 2 Scale level 3
Criterion 1
Criterion 2
Criterion 3
Criterion 4
The rubric grid can be created using a simple Microsoft Word table, using the
"elegant" format found in the "auto format" section. This sample grid shows three scale levels and
four criteria. This is the most common kind of rubric, but it can be further
extended, with clearly labelled criteria, to a maximum of five scale levels and six to seven
criteria. This document will look at the four component parts of the rubric and, using an
oral presentation assignment as an example, develop the above grid part by part until it is a
useful grading tool (a usable rubric) for the professor and a clear indication of expectations
and actual performance for the student.
Part-by-Part Development of a Rubric
Part 1: Task Description
The task description is almost always originally framed by the faculty member and involves a
“performance” of some sort by the student. The task can take the form of a specific
assignment, such as a piece of laboratory work, a paper, a poster, an assignment, or a presentation. The
task can also apply to overall behavior, such as participation, use of proper lab protocols, and
behavioural expectations in the classroom. It is necessary to place the task description, usually
cut and pasted from the syllabus, at the top of the grading rubric, partly to remind ourselves
how the assignment was written as we grade, and to have a handy reference later on when
we may decide to reuse the same rubric.
Task Description: Each student will make a 5-minute presentation on an installation and
configuration of the Web Server for web Technologies Lab. The presentation should include
appropriate photographs, presentations, maps, graphs, simulations, and other visual aids
for the audience.
Criterion 1
Criterion 2
Criterion 3
Criterion 4
More important, however, we find that the task assignment grabs the students’ attention in
a way nothing else can, when placed at the top of what they know will be a grading tool. With
the added reference to their grades, the task assignment and the rubric criteria become more
immediate to students and are more carefully read. Students focus on grades. Sad, but true.
We might as well take advantage of it to communicate our expectations as clearly as possible.
If the assignment is too long to be included in its entirety on the rubric, or if there is some
other reason for not including it there, we put the title of the full assignment at the top of the
rubric: for example, “Rubric for Oral Presentation.” This will at least remind the students that
there is a full description elsewhere, and it will facilitate later reference and analysis for the
professor. Sometimes we go further and add the words “see syllabus” or “see handout.”
Another possibility is to put the larger task description along the side of the rubric. For reading
and grading ease, rubrics should seldom, if ever, be more than one page long. Most rubrics
will contain both a descriptive title and a task description. Figure 1.2 illustrates Part 1 of the
sample rubric with the title and task description highlighted.
Part 2: Scale
The scale describes how well or poorly any given task has been performed and occupies yet
another side of the grid to complete the rubric’s evaluative goal. Terms used to describe the
level of performance should be tactful but clear. In the generic rubric, words such as
“mastery,” “partial mastery,” “progressing,” and “emerging” provide a more positive, active,
verb description of what is expected next from the student and also mitigate the potential
shock of low marks in the lowest levels of the scale. Some professors may prefer to use non-
judgmental, non-competitive language, such as “high level,” “middle level,” and “beginning
level,” whereas others prefer numbers or even grades.
Here are some commonly used labels compiled by Huba and Freed (2000):
• Sophisticated, competent, partly competent, not yet competent (NSF Synthesis
Engineering Education Coalition, 1997)
• Exemplary, proficient, marginal, unacceptable
• Advanced, intermediate high, intermediate, novice (American Council of Teachers of
Foreign Languages, 1986, p. 278)
• Distinguished, proficient, intermediate, novice (Gotcher, 1997)
• Accomplished, average, developing, beginning (College of Education, 1997)
We almost always confine ourselves to three levels of performance when we first construct a
rubric. After the rubric has been used on a real assignment, we often expand that to five. It is
much easier to refine the descriptions of the assignment and create more levels after seeing
what our students actually do. Figure 1.3 presents the Part 2 version of our rubric where the
scale has been highlighted. There is no set formula for the number of levels a rubric scale
should have. Most professors prefer to describe performance clearly at three or, at most, five levels of a scale; five levels is usually enough. The more levels there are, the more difficult
it becomes to differentiate between them and to articulate precisely why one student’s work
falls into the scale level it does. On the other hand, more specific levels make the task clearer
for the student and they reduce the professor’s time needed to furnish detailed grading notes.
Most professors consider three to be the optimum number of levels on a rubric scale.
[Figure 1.3 Part 2: Scales. The grid repeats the task description and the rows for Criteria 1 through 4, now with the three scale-level column headings added.]
If a faculty member chooses to describe only one level, the rubric is called a holistic rubric or a scoring
guide rubric. It usually contains a description of the highest level of performance expected for
each criterion, followed by room for scoring and describing in a “Comments” column just how
far the student has come toward achieving or not achieving that level. Scoring guide rubrics,
however, usually require considerable additional explanation in the form of written notes and
so are more time-consuming than grading with a three-to five-level rubric.
Part 3: Criteria
The criteria of a rubric lay out the parts of the task simply and completely. A rubric can also
clarify for students how their task can be broken down into components and which of those
components are most important. Is it the grammar? The analysis? The factual content? The
research techniques? And how much weight is given to each of these aspects of the
assignment? Although it is not necessary to weight the different criteria differently, adding
points or percentages to each criterion further emphasizes the relative importance of each
aspect of the task. Criteria should actually represent the type of component skills students
must combine in a successful scholarly work, such as the need for a firm grasp of content,
technique, citation, examples, analysis, and a use of language appropriate to the occasion.
When well done, the criteria of a rubric (usually listed along one side of the rubric) will not
only outline these component skills, but after the work is graded, should provide a quick
overview of the student's strengths and weaknesses in each criterion. Criteria need not and
should not include any description of the quality of the performance. "Organization," for
example, is a common criterion, but not “Good Organization.” We leave the question of the
quality of student work within that criterion to the scale and the description of the criterion.
Breaking up the assignment into its distinct criteria leads to a kind of task analysis with the
components of the task clearly identified. Both students and faculty members find this useful.
It tells the student much more than a mere task assignment or a grade reflecting only the
finished product. Together with good descriptions, the criteria of a rubric provide detailed
feedback on specific parts of the assignment and how well or poorly those were carried out.
This is especially useful in assignments such as our oral presentation example in which many
different criteria come into play, as shown in Figure 1.4.
Criteria alone are all-encompassing categories, so for each of the criteria, a rubric
should also contain at the very least a description of the highest level of performance in that
criterion. A rubric that contains only the description of the highest level of performance is
called a scoring guide rubric. Scoring guide rubrics allow for greater flexibility and the personal
touch, but the need to explain in writing where the student has failed to meet the highest
levels of performance does increase the time it takes to grade using scoring guide rubrics. For
most tasks, we prefer to use a rubric that contains at least three scales and a description of
the most common ways in which students fail to meet the highest level of expectations.
[Figure 1.4 Part 3: Criteria. The grid repeats the task description and lists the weighted criteria:]
• Knowledge/understanding: 20% (20 points)
• Thinking/inquiry: 30% (30 points)
• Communication: 20% (20 points)
• Use of visual aids: 20% (20 points)
• Presentation skills: 10% (10 points)
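To make the arithmetic of weighted criteria concrete, here is a minimal Python sketch. The criterion point values come from the list above; the marks awarded to the student are purely hypothetical.

```python
# Criterion weights from the rubric above; the marks awarded are hypothetical.
criteria_points = {
    "Knowledge/understanding": 20,
    "Thinking/inquiry": 30,
    "Communication": 20,
    "Use of visual aids": 20,
    "Presentation skills": 10,
}

awarded = {  # hypothetical marks a grader might give one student
    "Knowledge/understanding": 16,
    "Thinking/inquiry": 24,
    "Communication": 15,
    "Use of visual aids": 18,
    "Presentation skills": 8,
}

total = sum(awarded.values())             # 81
maximum = sum(criteria_points.values())   # 100
print(f"Overall: {total}/{maximum} = {100 * total / maximum:.0f}%")
```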
ANALYSIS OF A QUESTION PAPER
The first step to construct a good question paper is to be able to critically look at the
existing question paper and to identify its strengths and deficiencies. The question paper
is a very important component of the assessment system. Since students are required to demonstrate the performance they have become capable of after undergoing the teaching-learning process, it is necessary that the question paper clearly calls for that same performance. There therefore needs to be a close relationship between the instructional objectives and the question paper: the performances that the question paper asks the student to demonstrate should be the same as those that the curriculum specifies.
Every classroom teacher who prepares students for examinations conducted by bodies outside the institution should be capable of analysing a question paper and specifying its strengths and weaknesses.
Resources needed
1. The question paper that is to be analysed
2. The scheme or pattern of the question paper prescribed by the board or examination system
3. The table of specifications for the question paper, if available
4. The marking or scoring system together with the marks assigned to
individual questions and sub-divisions
5. The curriculum document with objectives, content details and time
allocation
6. The teacher analysing the question paper should have expert knowledge in
the subject area together with the knowledge and skills in construction and
use of achievement tests and examinations.
Qualities of a good paper
Before analysing the question paper in detail, we can briefly discuss some desirable
qualities of a paper. These may be stated as follows:
1. In any examination, the paper should be fair to all students. That is, those who have studied more should get more marks than those who have studied less.
2. The paper should be comprehensive and test or sample the content of the entire curriculum as well as the abilities.
3. Students who have studied all areas of the curriculum should be able to get more marks than those who study only selected portions (because of the open choice given in some question papers).
4. The question and the expected answer should be clear and unambiguous. The language should be easily understandable: if students do not understand what is expected of them, how will they be able to answer?
5. The relative marks for each question and its sub-divisions should be indicated so that students know how to allocate their time while answering.
6. In general, the greater the number of specific questions a paper has, the better its reliability. Likewise, if the questions are objective, the paper and the examination will be more reliable, and scoring is also easier when the questions are objective in nature.
Analysis
A question paper should be analysed at two levels: the micro level, which pertains to the individual questions and items, and the macro level, at which we consider the question paper as a whole. Both analyses are important.
4. The time allocated is appropriate, and the marks assigned to the question are proportionate to the time and performance expected.
5. The difficulty level of the question should be appropriate to the class that uses the paper. A facility value of around 50% or a little more may be recommended (if you have data from item analysis).
7. When checked against other questions and items, the item does not test the same area of content and abilities, and it does not give clues for answering other items.
It is not enough that each question and item is well constructed and measures a specific and important area of content; it must also fit into the question paper appropriately. The paper as a whole measures the achievement of students, so in any analysis the paper should be considered as a whole, more than the individual items and questions. Since the paper cannot measure all areas of the curriculum that is taught, it must necessarily sample content and abilities, and how far it is a representative sample needs to be analysed. So, in analysing the question paper as a whole, we should check the following:
6. A student who has selectively studied only a few areas of content should not be able to get full marks because of the choice offered. A student who has studied more areas of content should get more marks, and this advantage is negated if choice is allowed in the questions. Choice also results in different students attempting different questions, so the uniformity of assessment across all students is lost.
7. Are there any questions which are outside the curriculum and syllabus?
Performance Assessment
Developing Instructions
Because performance tasks often require fairly complex student responses, it is important that your instructions precisely specify the types of responses you are expecting. Because originality and creativity are seen as desirable educational outcomes, performance tasks often give students considerable freedom in how they approach the task. It is the teacher's responsibility to write instructions clearly and precisely so that students do not need to "read the teacher's mind" to know what the teacher expects of them.
Here is a list of questions that assessment professionals recommend you consider when evaluating the quality of your instructions (e.g., Nitko, 2001):
Indicators
A rubric is simply a written guide that helps you score constructed-response assessments. In
discussing the development of scoring rubrics for performance assessments, Popham (1999)
identified three essential tasks that need to be completed, discussed in the following
paragraphs.
Select important criteria that will be considered when evaluating student responses.
Start by selecting the criteria or response characteristics that you will employ when
judging the quality of a student's response. The criteria you are considering when judging the
quality of a student's response should be described in a precise manner so that there is no
confusion about what the rating refers to. It is also highly desirable to select criteria that can be
directly observed and judged. Characteristics such as interest, attitude, and effort are not
directly observable and do not make good bases for evaluation.
Validity
Validity is the adequacy and appropriateness of the interpretations and uses of
assessment results. An evaluation of the validity of the use and interpretation of an assessment
can take many forms. For example, if an assessment is to be used to describe student
achievement, then we should like to be able to interpret the scores as a relevant and
representative sample of the achievement domain to be measured. If the results are to be used
as a measure of students' understanding of mathematical concepts, then we should like our
interpretations to be based on evidence that the scores actually reflect mathematical
understanding and are not distorted by irrelevant factors, such as the reading demands of the
tasks. If the results are to be used to predict students' success in some future activity, then we
should like our interpretations to be based on as good an estimate of future success as possible.
Basically, then, validity is always concerned with the specific use of assessment results and the
soundness and fairness of our proposed interpretations of those results. As we will see later in
this chapter, however, this does not mean that validation procedures can be matched to specific
assessment uses on a one-to-one basis.
Reliability refers to the consistency of assessment results. If we obtain quite similar scores
when the same assessment procedure is used with the same students on two different occasions,
then we also can conclude that our results have a high degree of reliability from one occasion
to another. Similarly, if different teachers independently rate student performances on the same
assessment task and obtain similar ratings, we also can conclude that the results have a high
degree of reliability from one rater to another. Like validity, reliability is intimately related to
the type of interpretation to be made. For some uses, we may be interested in asking how
reliable our assessment results are over a given period of time and, for others, how reliable they
are over different samples of the same behavior. In all instances in which reliability is being
determined, however, we are concerned with the consistency of the results rather than with the
appropriateness of the interpretation made from the results (validity). The relation between
reliability and validity is sometimes confusing to persons who encounter these terms for the
first time. Reliability (consistency) of measurement is needed
to obtain valid results, but we can have reliability without validity. That is, we can have
consistent measures that provide the wrong information or are interpreted inappropriately. The
target-shooting illustration in Figure 2 depicts the concept that reliability is a necessary but not
sufficient condition for validity.
Content Validity
Each item in a test must be a sampling of knowledge or performance that the test is
supposed to measure. Content validity refers to the degree to which the test measures the
content in relation to the objectives spelt out. Content validity is usually associated with
achievement tests. It may be defined as the extent to which a test measures a representative
sample of subject matter content and the behavioural changes under consideration. The focus
of content validity, then, is on the adequacy of the sample and not simply on the appearance of
the test. A test that appears to be a relevant measure, based on superficial examination, is said
to have 'face validity'. Although a test should look like an appropriate tool to obtain the co-
operation of those taking the test, face validity should not be considered as a substitute for
content validity. The test must adequately sample both subject matter content and the major
types of behavioural changes. These must also be properly weighted in terms of their relative
importance. The factors that will affect the validity of a test are
• unclear directions and ambiguous statements in test items
• reading vocabulary and sentence structure too difficult for the student to understand
• inappropriate level of difficulty of the test item for the person being examined; this results in poor discrimination of marks and therefore low reliability
• poorly constructed test items i.e., items with poor lay out or unclear words or figures
• test items inappropriate for the outcomes being measured
• a test that is too short, so that an adequate sample of content and behaviours is not obtained
• improper arrangement of items and identifiable pattern in answers to items in the test.
In short, any defect in constructing items and assembling a test will contribute to the invalidity of the measurement, and care must therefore be taken to prevent this. The content sampled in a test must be a representative sample and should truly measure the achievement of the learners. Using a table of specifications for the test ensures this.
Construct validity:
Construct validity concerns the extent to which a test tells us something about a
meaningful characteristic of the individual. Information about such characteristics (or
"Constructs- as they are sometimes called) may help us understand the student's performance
In various aspects. Common examples of constructs are intelligence, scientific attitude, critical
thinking, reading comprehension, study skills and mathematical aptitude. Construct validity
may be defined as the extent to which test performance can be interpreted in terms of certain
psychological constructs.
For example, in order to understand why a student consistently does well in English but poorly in Mathematics, it may be useful to know something about his general level of intelligence, his verbal ability, his numerical ability and perhaps his attitudes towards the different subjects.
Knowledge of such characteristics can help ensure that each student benefits maximally from
the learning experiences provided.
Construct validity is important in the context of achievement testing also. Achievement is an
important construct in an educational setting. Construct validity here refers to the test's ability
to measure the individual's actual achievement of instructional objectives. If an achievement
test has high construct validity, it should distinguish between students who have achieved at
different levels.
Concurrent Validity
Concurrent validity is a criterion-related validity. It is the extent to which test
performance is related to some other current performance. The concurrent validity of a test
must be considered when one is using the test to distinguish between two or more groups of
individuals whose status on a criterion is different at the time of testing. Tests or inventories used to separate individuals into different academic curricula, different vocational groups, etc., if successful, would be showing concurrent validity.
Concurrent validity may also be of relevant concern In judging achievement tests. In every day
classroom experiences, there frequently are appropriate
contemporary criteria with which achievement test performance must be compared. For
example, test performance in Mathematics should be related to the computational skill exhibited in an Engineering subject (the contemporary criterion). High concurrent validity of the Mathematics test means that those who do well in the test also do well on the criterion of computational skill in the Engineering subject.
Predictive Validity
This is also a criterion-related validity. It is the extent to which test performance is accurate in predicting some future performance, e.g., aptitude testing. Predictive validity is pertinent whenever test results are used to make specific predictions.
Consider for example the problem of selecting students for a course based on an admission
test. The assumption made is that the candidates who do well in the test are likely to succeed
in the course. The admission test should have high predictive validity for this purpose. In order to determine the predictive validity of the test, it is necessary to establish the correlation between the admission test scores and a criterion, viz. course performance. Many examination results are used as predictors of future performance at later stages of education; for example, marks in Mathematics, Physics and Chemistry are used to predict success in Engineering courses.
ESTABLISH THE CHARACTERISTICS OF TEST
Objectivity:
The objectivity of a test refers to the degree to which equally competent scorers give the
same score for the same answer paper. If the test contains objective test items, then the
objectivity of the test will be high. Essay type questions where the scorer has to use his
subjective judgement cannot be highly objective. The scores assigned by different
examiners should not be affected by the personal bias of the scorers. Though objectivity
is a desirable quality, it should not be insisted upon where the other more important
characteristic (namely validity) requires subjective items in the test. Essay tests are less
objective. It is known that no two examiners assign the same score for an essay. In
scoring an essay many extraneous factors come into the picture.
Discrimination:
A good test should be able to distinguish a good student from a poor one. The test should also be able to detect small differences in students' achievement. This is the ability of a test to discriminate, and we can increase the discriminating power of the test by using items that discriminate well as well as items at all levels of difficulty. A test with a larger range of scores will be able to discriminate better.
Comprehensiveness:
A good test measuring student achievement in both subject content and behavioural outcomes must be comprehensive as well as representative in its sampling.
Usability:
A test should be easy to administer, score and interpret in order to be usable. A test which takes minimum student time to administer, and which can be administered without much problem in seating and other arrangements, is preferable to one which requires elaborate precautions to administer. Economy in producing the test (printing) and economy of time in its use are desirable characteristics, and a test which is easy to score after administration is also desirable. In short, if two tests are compared, other things being equal, the test that is easier to design, duplicate, administer and score is preferable.
Relevance:
This relates to the matter of matching the performance measured by the test item or
question to the type of performance specified by the instructional objectives or the
learning outcomes. This is therefore possible only when the curriculum specifies the
intended learning outcomes (objectives) clearly. It is therefore necessary for the test
constructor to use careful judgement in selecting items and questions. If the outcome
calls for supplying the answer then the item should require the student to supply the
answer rather than select an answer. It is more important when higher order abilities are
involved. The items or questions should be exactly matched in performance and in the
level of performance with those indicated in the objectives.
In norm-referenced tests, if the items are all too easy or too difficult, the spread of scores of those taking the test tends to be restricted, i.e. either all will get high scores or all will get low scores. A classroom achievement test should be constructed so that the average score is around 50%.
The test that is administered and scored should be fair to the student. It is desirable that an average student who has learnt the topics taught should be able to do well and pass. The items and questions should measure his learning and should not be twisted or made unnecessarily complicated. The question paper as a whole should be balanced, in that the different areas must be given appropriate weightage based on their importance and the time spent teaching them. The items themselves should check important areas of achievement rather than trivial or obscure areas of the content.
Reliability
Reliability refers to the consistency of measurement, that is, how consistent test scores are from one measurement to another. If a test gives a score now and, when administered again after a short lapse of time without remedial instruction, gives comparable scores, then the test is said to be reliable. This is called test-retest reliability. The type of
test items in a test can affect reliability also. Items which can be differently scored by different
examiners or by the same examiner at different times will contribute to unreliability (i.e. essay
type items). Factors affecting reliability are
• items that are ambiguous, too easy or too difficult will contribute to unreliability since
all those examined will get the same or nearly same scores.
• a longer test is more reliable than a short one as there is greater scope for a larger spread
of scores.
• a test that has a greater spread of scores is more reliable, as it discriminates between high and low achievers.
• Objective tests are more reliable than essay type tests because the subjective judgement
of the scorer does not affect the scores.
Reliability refers to error in measurement. This could be extrinsic error or intrinsic error.
Extrinsic error may be due to:
• test and examination conditions and situations.
• subjectivity in scoring by the scorers (this can be eliminated by using objective items or minimised by having a marking scheme and an examiners' meeting for scoring).
Intrinsic error may be due to:
• the quality of items and questions
• sampling of areas not balanced i.e. it is biased
• time limits set arbitrarily, which are not in keeping with the requirements of the test situation.
The reliability of a test is measured in terms of a reliability coefficient. The following methods are used to find out the reliability of a test.
Test-Retest Method
The test is administered to a group of students for whom the test is constructed. The
same test is administered to the same group of students after a lapse of time. Test administration
must be under similar conditions. This would mean that the students are retested with the same
test. The two sets of scores are compared and the coefficient of correlation is found. If there is a high positive correlation, then the test is considered reliable. However, in this method the effect of learning or unlearning during the time lapse cannot be ruled out.
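As a rough illustration of the test-retest method, the correlation between the two administrations can be computed directly. The sketch below uses hypothetical score lists and Python's standard library (statistics.correlation requires Python 3.10 or later).

```python
import statistics

# Hypothetical scores for the same five students on two administrations of the same test.
first_administration = [62, 75, 58, 90, 70]
second_administration = [65, 72, 60, 88, 74]

# Test-retest reliability estimated as the Pearson correlation between the two sets of scores.
r = statistics.correlation(first_administration, second_administration)
print(round(r, 2))
```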
Criterion-Referenced Test versus Norm-Referenced Test
Different kinds of tests can be conducted in a teaching-learning process and their scores interpreted. Based on the way scores are interpreted, a test can be classified as a norm-referenced test or a criterion-referenced test; the two are differentiated with respect to the ways scores are interpreted and the purposes of the tests. Norm-referenced testing is the process of evaluating or grading the learning of students by ranking them against the performance of their peer group. Criterion-referenced testing is the process of evaluating or grading the learning of students against a set of defined criteria. A norm-referenced test measures how the performance of a particular student, or a group of students taken as one group, compares with the performance of another student or set of students whose scores are given as the norm. A test taker's score is therefore interpreted with reference to the scores of other test takers or groups of test takers. A norm-referenced test tells where a student stands in performance compared to other students, and this position may help the student take some decisions. The quality of norm-referenced tests is usually good because they are developed by experts, piloted, and revised before they are used with students. They are also good for ranking and sorting students for administrative purposes, and they are intended to judge class performance and an institution's accountability for providing learning standards and maintaining the quality of education.
A criterion-referenced test is used to assess whether students pass or fail against a certain criterion. Criterion-referenced testing is thus an approach to evaluation in which a learner's performance is measured with respect to the same criterion across the classroom. A criterion-referenced test is good for measuring specific skills or specific outcomes of a student; it provides the faculty member with a roadmap of how well the students are progressing, and it is good for determining learning progress when students have learning gaps or academic deficits that need to be addressed. In a paper, researcher Bond noted that criterion-referenced tests give direction to teaching and re-teaching: instructors can use the test results to determine how well they are teaching the curriculum and where they are lagging behind.
In a military selection exercise, a criterion was set: climb a wall using a rope and jump to the other side. Consider a wall 10 metres high, with a rope hanging in front of it, and twenty candidates standing in front of the wall. The criterion is that each candidate must use the hanging rope to climb the wall and then jump to the other side. Assume that after the test, some of the candidates did not even climb 50% of the wall, a few climbed 75% of the wall, and only three touched the top of the wall but could not jump to the other side. So, who will be selected for the military? Nobody. This is exactly criterion-referenced: the criterion needs to be met.
Quick Way to Estimate Reliability for Classroom Exams
Saupe (1961) provided a quick method for teachers to calculate the reliability of a classroom exam in an era prior to easy access to calculators or computers. It is appropriate for a test in which each item is given equal weight and each item is scored either right or wrong. First, the standard deviation of the exam is estimated from a simple approximation: the difference between the sum of the top one-sixth of the scores and the sum of the bottom one-sixth of the scores, divided by roughly half the number of students. The reliability is then approximated as 1 - (0.19 × number of items) / SD².
Thus, for example, in a class with 24 student test scores, the top one-sixth of the scores are 98, 92, 87, and 86, while the bottom one-sixth of the scores are 48, 72, 74, and 75. With 25 test items, the calculations are:
SD ≈ (98 + 92 + 87 + 86 - 48 - 72 - 74 - 75) / 11.5 = (363 - 269) / 11.5 = 94 / 11.5 = 8.17
Reliability ≈ 1 - (0.19 × 25) / 8.17² = 1 - 0.07 = 0.93
A reliability coefficient of 0.93 for a classroom test is excellent.
A reliability coefficient of 0.93 for a classroom test is excellent.
Reference: Measurement and Assessment in Education by Reynolds, Livingston and Willson, Second Edition.
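A minimal Python sketch of Saupe's quick estimate follows, reproducing the worked example above. It keeps the same divisor used in that example, (number of students - 1) / 2, for the standard deviation approximation; only the top and bottom sixths of the scores affect the result, so the middle scores shown are placeholders.

```python
def saupe_reliability(scores, n_items):
    """Saupe's (1961) quick reliability estimate for a classroom exam in which
    every item carries equal weight and is scored right or wrong."""
    ordered = sorted(scores, reverse=True)
    sixth = round(len(ordered) / 6)
    # SD approximation: (sum of top sixth - sum of bottom sixth) / ((n - 1) / 2),
    # matching the divisor of 11.5 used in the worked example for 24 students.
    sd = (sum(ordered[:sixth]) - sum(ordered[-sixth:])) / ((len(ordered) - 1) / 2)
    return 1 - (0.19 * n_items) / sd ** 2

# The 24 scores from the example; the 16 middle scores are placeholders since
# only the top and bottom sixths enter the calculation.
scores = [98, 92, 87, 86] + [80] * 16 + [75, 74, 72, 48]
print(round(saupe_reliability(scores, n_items=25), 2))  # about 0.93
```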
Discrimination Index
Probably the most popular method of calculating an index of item discrimination is
based on the difference in performance between two groups. Although there are different ways
of selecting the two groups, they are typically defined in terms of total test performance. One
common approach is to select the top and bottom 27% of test takers in terms of their overall
performance on the test and exclude the middle 46% (Kelley, 1939). Some assessment experts
have suggested using the top and bottom 25%, some the top and bottom 33%, and some the
top and bottom halves. In practice, all of these are probably acceptable (later in this chapter we
will show you a more practical approach that saves both time and effort). The difficulty of the
item is computed for each group separately, and these are labeled PT and PB (T for top, B for
bottom). The difference between PT and PB is the discrimination index, designated D, and is calculated with the following formula (e.g., Johnson, 1951):
D = PT - PB
where
D = discrimination index
PT = proportion of examinees in the top group getting the item correct
PB = proportion of examinees in the bottom group getting the item correct
To illustrate the logic behind this index, consider a classroom test designed to measure
academic achievement in some specified area. If the item is discriminating between students
who know the material and those who do not, then students who are more knowledgeable ie.,
students in the top group) should get the item correct more often than students who are less
knowledgeable (ie. students in the bottom group). For example, if PT= 0. 0.80 indicating 80%
of the students in the top group answered the item correctly) and PB=0.30 (indicating 30% of
the students in the bottom group answered the item correctly), then
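A minimal Python sketch of this calculation follows. It forms the top and bottom groups from total test scores (27% by default, per Kelley) and then applies D = PT - PB; the item responses and total scores used here are hypothetical.

```python
def discrimination_index(item_correct, total_scores, fraction=0.27):
    """Discrimination index D = PT - PB, with top/bottom groups defined by
    overall test score. `item_correct` holds 1/0 for each examinee on the item;
    `total_scores` holds the same examinees' total test scores."""
    n = len(total_scores)
    k = max(1, round(fraction * n))
    order = sorted(range(n), key=lambda i: total_scores[i])  # ascending by total score
    bottom, top = order[:k], order[-k:]
    p_top = sum(item_correct[i] for i in top) / k
    p_bottom = sum(item_correct[i] for i in bottom) / k
    return p_top - p_bottom

# Hypothetical data for 10 examinees: right/wrong on one item and their total scores.
item = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]
totals = [88, 92, 40, 75, 55, 81, 35, 60, 95, 48]
print(discrimination_index(item, totals))
```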
Hopkins (1998) provided guidelines for evaluating items in terms of their D values (see Table
1). According to these guidelines, D values of 0.40 and above are considered excellent, between
0.30 and 0.39 are good, between 0.11 and 0.29 are fair, and between 0.00 and 0.10 are poor.
Items with negative D values are likely mis-keyed or there are other serious problems. Other
testing assessment experts have provided different guidelines, some more rigorous and some
more lenient.
As a general rule, we suggest that items with D values over 0.30 are acceptable (the larger the
better), and items with D values below 0.30 should be carefully reviewed and possibly revised
or deleted. However, this is only a general rule and there are exceptions. For example, most
indexes of item discrimination, including the item discrimination index (D), are biased in favor of items with intermediate difficulty levels. That is, the maximum D value of an item is related to its p value (see Table 2). Items that all test takers either pass or fail (i.e., p values of either 0.0 or 1.0) cannot provide any information about individual differences, and their D values will always be zero. If half of the test takers answered an item correctly and half failed (i.e., a p value of 0.50), then it is possible for the item's D value to be 1.0. This does not mean that all items with p values of 0.50 will have D values of 1.0, but just that such an item can conceivably have a D value of 1.0. As a result of this relationship between p and D, items with excellent discriminative power (i.e., D values of 0.40 and above) will necessarily have p values between 0.20 and 0.80. In testing situations in which it is desirable to have either very easy or very difficult items, D values can be expected to be lower than those normally desired. Additionally, items that measure abilities or objectives that are not emphasized throughout the test may have poor discrimination due to their unique focus. In this situation, if the item measures an important ability or learning objective and is free of technical defects, it should be retained (e.g., Linn & Gronlund, 2000).
Scoring System
It is essential that each candidate's progress be watched carefully and reported as accurately as possible. Scores are also an important means of stimulating, directing and rewarding the efforts of candidates. Scores represent the degree of achievement as precisely as possible under the circumstances; they are necessary, but they must be based on sufficient evidence. Two major types of scoring system have been used for evaluating an item set.
Absolute Scoring System: A scoring system in which a candidate's percent score is independent of any other candidate's scores is called absolute scoring. A teacher must be present to evaluate the item set, yet it is not easy for teachers, whose natural instincts incline them to be helpful guides and counsellors, to stand in judgment over their students. As experts have stated, "It is never difficult to give a good score to a candidate if it is higher than he really expected, but there are likely to be more occasions for disappointment than pleasure in scores." Scoring standards often vary from instructor to instructor and from institution to institution. For such reasons, there is no scoring system available that makes the process of scoring easy and satisfactory; no new scoring system, however cleverly devised and conscientiously followed, can solve the basic problem of scoring.
A distribution is a set of scores; the scores can be obtained from any kind of test conducted. Statistically, the distribution of a data set is a listing or function showing all the possible values of the data and how often they occur. When a distribution of categorical data is organized, it gives the number or percentage of individuals in each group. When a distribution of numerical data is organized, the values are often ordered from smallest to largest, broken into reasonably sized groups, and then put into graphs and charts to examine the shape, centre, and amount of variability in the data.
Measures of central tendency are used to describe the centre of the distribution. Three measures are commonly used: the mean, the median and the mode.
Mean: The "average" number; found by adding all data points and dividing by the number of
data points.
The mean is the usual average, so I'll add and then divide:
(13 + 18 + 13 + 14 + 13 + 16 + 14 + 21 + 13) ÷ 9 = 15
Median: The middle number; found by ordering all data points and picking out the one in the
middle
There are nine numbers in the list, so the middle one will be the (9 + 1) ÷ 2 = 10 ÷ 2 = 5th
number:
13, 13, 13, 13, 14, 14, 16, 18, 21 >>> So the median is 14.
Mode: The most frequent number, that is, the number that occurs the highest number of times. In the list, 13 appears three times, more often than any other value, so the mode is 13.
In addition to these three measures, the range is another measure: the difference between the largest and smallest values. In the list, the largest value is 21 and the smallest is 13, so the range is 21 - 13 = 8.
Standard Deviation (SD): This is one of the measures of the variability of scores. It indicates the spread of scores around the mean score; for the standard normal distribution its value is 1. A greater value of SD indicates a wider spread of scores around the mean. It is defined as the positive square root of the arithmetic mean of the squared deviations of the observations from their arithmetic mean; in short, σ is the root-mean-square deviation from the mean.
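The worked figures above can be checked with Python's standard statistics module; this minimal sketch uses the same nine-score list.

```python
import statistics

scores = [13, 18, 13, 14, 13, 16, 14, 21, 13]

print(statistics.mean(scores))              # 15
print(statistics.median(scores))            # 14
print(statistics.mode(scores))              # 13
print(max(scores) - min(scores))            # range: 8
print(round(statistics.pstdev(scores), 2))  # population SD, about 2.67
```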
[Figure 1: distribution of scores for students A114 to A120.]
Normal Distribution
Many distributions fall on a normal curve, especially when large samples of data are
considered. This is important to understand because if a distribution is normal, there are certain
qualities that are consistent and help in quickly understanding the scores within the distribution.
The mean, median, and mode of a normal distribution are identical and fall exactly in the center
of the curve.
The empirical rule tells you what percentage of your data falls within a certain number
of standard deviations from the mean:
• 68% of the data falls within one standard deviation of the mean.
• 95% of the data falls within two standard deviations of the mean.
• 99.7% of the data falls within three standard deviations of the mean.
Figure 2: The normal distribution and the empirical rule.
The standard deviation controls the spread of the distribution. A smaller standard deviation
indicates that the data is tightly clustered around the mean; the normal distribution will be
taller. A larger standard deviation indicates that the data is spread out around the mean; the
normal distribution will be flatter and wider.
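As an illustration of the empirical rule, the following sketch draws a large sample from a normal distribution (a hypothetical mean of 50 and SD of 10) and counts how much of the data falls within one, two, and three standard deviations of the mean.

```python
import numpy as np

rng = np.random.default_rng(0)
scores = rng.normal(loc=50, scale=10, size=100_000)  # hypothetical exam scores

mean, sd = scores.mean(), scores.std()
for k in (1, 2, 3):
    within = np.mean(np.abs(scores - mean) <= k * sd)
    print(f"within {k} SD: {within:.3f}")  # roughly 0.68, 0.95, 0.997
```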
Skewness: The literal meaning of skewness is 'lack of symmetry'. It is used to study the shape, i.e. the symmetry or asymmetry, of a frequency distribution. In a symmetrical distribution, the frequencies are spread over equal distances on either side of the central value, and both tails (left and right) of the curve are equal in shape and length; the skewness of the normal frequency distribution is zero. In a skewed distribution, the frequency curve is not a symmetric bell-shaped curve but is stretched more to one side than the other, i.e. it has a longer tail on one side (left or right) than on the other. A frequency distribution whose curve has a longer tail towards the right side is said to be positively skewed, and one with a longer tail towards the left side is said to be negatively skewed. Figure 2 shows a symmetrical distribution, while the following Figures 3 and 4 show asymmetrical distributions.
Figure 3: A right (positively) skewed distribution.
For a right (positive) skewed distribution, the mean is typically greater than the median.
Also notice that the tail of the distribution on the right hand (positive) side is longer than
on the left hand side.
Figure 4: A left (negatively) skewed distribution.
A distribution that is skewed left has exactly the opposite characteristics of one that is skewed right: the mean is typically less than the median, and the tail on the left-hand (negative) side is longer than on the right.
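To see positive and negative skewness numerically, here is a small sketch using the moment-based (Fisher-Pearson) skewness coefficient; the two score lists are invented purely for illustration.

```python
import numpy as np

def skewness(values):
    """Moment-based (Fisher-Pearson) skewness: zero for symmetric data,
    positive for a longer right tail, negative for a longer left tail."""
    x = np.asarray(values, dtype=float)
    m, s = x.mean(), x.std()
    return float(np.mean(((x - m) / s) ** 3))

right_skewed = [40, 42, 43, 45, 46, 48, 50, 70, 95]  # a few very high scores
left_skewed = [5, 30, 52, 54, 55, 57, 58, 60, 62]    # a few very low scores

print(round(skewness(right_skewed), 2))  # positive
print(round(skewness(left_skewed), 2))   # negative
```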
Percentile Rank: When different groups take tests, the scores differ from test to test; they have widely different means, standard deviations and distributions. It is therefore useful to have a standard scale to which they can be referred, and one such scale is the percentile rank. The percentile rank of a test score indicates what percentage of the scores falls below the midpoint of that score interval. In calculating the percentile rank of any score, half of the persons receiving that score are considered to have scored below, and half of them above, the midpoint of the score interval. Percentile rank is used to determine the relation between a particular candidate's score and the scores of the other candidates tested in the group. It ranges from 0 to 100 regardless of whether the group as a whole performs well or poorly on the test. Percentile ranks differ from the original or raw test scores: percentile ranks follow a rectangular distribution, while raw scores are often normally distributed. In a normal distribution, the scores are concentrated near the middle, with decreasing score frequencies as one moves out to the high and low extremes; in a rectangular distribution, the score frequencies are uniform all along the scale.
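A minimal sketch of the percentile-rank calculation described above (counting half of the candidates with the same score as falling below it) might look like this; the score list is hypothetical.

```python
def percentile_rank(score, all_scores):
    """Percentile rank of `score` within `all_scores`, counting half of the
    candidates with the same score as below it (midpoint convention)."""
    below = sum(1 for s in all_scores if s < score)
    equal = sum(1 for s in all_scores if s == score)
    return 100.0 * (below + 0.5 * equal) / len(all_scores)

scores = [35, 42, 42, 50, 55, 55, 55, 60, 68, 75]  # hypothetical raw scores
print(percentile_rank(55, scores))  # 55.0: 4 below + half of 3 equal, out of 10
```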
Item Analysis for Constructed-Response Items
Our discussion and example of the calculation of the item difficulty index and the discrimination index used examples that were dichotomously scored (i.e., scored right or wrong: 0 or 1). Although this procedure works fine with selected-response items (e.g., true-false, multiple-choice), you need a slightly different approach with constructed-response items that are scored in a more continuous manner (e.g., an essay item that can receive scores between 1 and 5 depending on quality). To calculate the item difficulty index for a continuously scored constructed-response item, use the following formula (Nitko, 2001):
p = average score on the item / range of possible scores
The range of possible scores is calculated as the maximum possible score on the item minus the minimum possible score on the item. For example, if an item has an average score of 2.7 and is scored on a 1 to 5 scale, the calculation would be:
p = 2.7 / (5 - 1) = 2.7 / 4 = 0.675
Therefore, this item has an item difficulty index of 0.675. This value can be interpreted in the same way as for the dichotomously scored items we discussed.
To calculate the item discrimination index for a continuously scored constructed-response item, you use the following formula (Nitko, 2001):
D = (average score for the top group - average score for the bottom group) / range of possible scores
For example, if the average score for the top group is 4.3, the average score for the bottom group is 1.7, and the item is scored on a 1 to 5 scale, the calculation would be:
D = (4.3 - 1.7) / (5 - 1) = 2.6 / 4 = 0.65
Therefore, this item has an item discrimination index of 0.65. Again, this value can be interpreted in the same way as for the dichotomously scored items we discussed.
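Both constructed-response formulas can be wrapped in small helper functions; this sketch simply reproduces the two worked examples above.

```python
def cr_difficulty(avg_score, min_score, max_score):
    """Difficulty index for a continuously scored (constructed-response) item."""
    return avg_score / (max_score - min_score)

def cr_discrimination(avg_top, avg_bottom, min_score, max_score):
    """Discrimination index for a continuously scored item."""
    return (avg_top - avg_bottom) / (max_score - min_score)

print(cr_difficulty(2.7, 1, 5))           # 0.675, matching the example
print(cr_discrimination(4.3, 1.7, 1, 5))  # 0.65, matching the example
```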
Item Difficulty Index (or Item Difficulty Level)
When evaluating items on ability tests, an important consideration is the difficulty level of the item. Item difficulty is defined as the percentage or proportion of test takers who correctly answer the item. The difficulty level or index is abbreviated as p and calculated with the following formula:
p = number of examinees correctly answering the item / number of examinees
For example, in a class of 30 students, if 20 students answer the item correctly and 10 answer incorrectly, the item difficulty index is 0.67:
p = 20 / 30 = 0.67
In the same class, if ten students answer correctly and 20 answer incorrectly, the item difficulty index is 0.33.
While carrying out the item analysis, if X students did not attempt the question, then X must be subtracted from the number of examinees. For example, in a class of 30 students, 18 students submit the correct answer, 8 students submit a wrong answer, and 4 students do not respond to the question. So p = 18 / (30 - 4) = 0.69.
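A minimal helper for the item difficulty index, including the adjustment for examinees who did not attempt the item, might look like this; it reproduces the two examples above.

```python
def item_difficulty(num_correct, num_examinees, num_omitted=0):
    """Item difficulty index p, excluding examinees who did not attempt the item."""
    return num_correct / (num_examinees - num_omitted)

print(round(item_difficulty(20, 30), 2))     # 0.67
print(round(item_difficulty(18, 30, 4), 2))  # 0.69
```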
The item difficulty index can range from 0.0 to 1.0 with easier items having larger decimal
values and difficult items at lower values. An item answered correctly by all students receives
an item difficulty of 1.0, whereas an item answered incorrectly by all students receives an item difficulty of 0.0. Items with p values of either 1.0 or 0.0 provide no information about
individual differences and are of no value from a measurement perspective. Some test
developers will include one or two items with p values of 1.0 at the beginning of a test to instill
a sense of confidence in test takers. This is a defensible practice from a motivational
perspective, but from a technical perspective these items do not contribute to the measurement
characteristics of the test. Another factor that should be considered about the inclusion of very
easy or very difficult items is the issue of time efficiency. The time students spend answering
ineffective items is largely wasted and could be better spent on items that enhance the
measurement characteristics of the test.
For maximizing variability and reliability, the optimal item difficulty level is 0.50, indicating that 50% of test takers answered the item correctly and 50% answered incorrectly. Based on this statement, you might conclude that it is desirable for all items to have p values of 0.50. One complication, however, is that items on a test are often correlated with each other, which means the measurement process may be confounded if all the items have p values of 0.50. As a result, it is often desirable to select some items with p values below 0.50 and some with values greater than 0.50, but with a mean of 0.50. Aiken (2000) recommends that there be approximately a 0.20 range of these p values around the optimal value. For example, a test developer might select items with difficulty levels ranging from 0.40 to 0.60, with a mean of 0.50.
Another reason why 0.50 is not the optimal difficulty level for every testing situation involves
the influence of guessing. On constructed-response items (e.g., essay and short-answer items)
for which guessing is not a major concern, 0.50 is typically considered the optimal difficulty
level.
In general, items may be evaluated according to their difficulty values as follows:
Difficulty value   Item evaluation
0.20 to 0.30       Most difficult
0.30 to 0.40       Difficult
0.40 to 0.60       Moderately difficult
0.60 to 0.70       Easy
0.70 to 0.80       Most easy
However, with selected-response items (e.g., multiple-choice and true-false items), for which test takers might answer the item correctly simply by guessing, the optimal difficulty level varies. To take the effects of guessing into consideration, the optimal item difficulty level is set higher than for constructed-response items. For example, for multiple-choice items with four options the average p should be approximately 0.74 (Lord, 1952); that is, the test developer might select items with difficulty levels ranging from 0.64 to 0.84, with a mean of approximately 0.74. Table 1 provides the optimal mean p value for selected-response items with varying numbers of alternatives or choices.
TABLE 1: Optimal p values for items with varying numbers of choices
Number of choices   Optimal mean p value
3                   0.77
4                   0.74
5                   0.69
Now, you are aware of the importance of the reliability of measurement. A common question
is "How can I estimate the reliability of scores on my classroom tests?" Most teachers have a
number of options. First, if you use multiple-choice or other tests that can be scored by a
computer scoring program, the score printout will typically report some reliability estimate
(e.g. coefficient alpha or KR-20). If you do not have access to computer scoring, but the items
on a test are of approximately equal difficulty and scored dichotomously (i.e. correct/incorrect),
you can use an internal consistency reliability estimate known as the Kuder-Richardson
formula 21 (KR-21). To calculate KR-21 you need to know only the mean, variance, and
number of items on the test:
KR-21 = 1 - [X(n - X)] / (n × σ²)
where
X = mean of the test scores
σ² = variance of the test scores
n = number of items on the test
For example, with X = 40.15, σ² = 39.71 and n = 50:
KR-21 = 1 - [40.15 × (50 - 40.15)] / (50 × 39.71)
      = 1 - 395.48 / 1985.5
      = 1 - 0.199
      = 0.80
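The KR-21 computation can be expressed as a one-line function; this sketch reproduces the worked example above.

```python
def kr21(mean, variance, n_items):
    """Kuder-Richardson formula 21 reliability estimate.
    Assumes dichotomously scored items of roughly equal difficulty."""
    return 1 - (mean * (n_items - mean)) / (n_items * variance)

print(round(kr21(mean=40.15, variance=39.71, n_items=50), 2))  # about 0.80
```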