ASSIGNMENT No. 1
Q.1 Explain classroom Assessment. Write a note on principles of classroom assessment.
Classroom Assessment Techniques (CATs) are generally simple, non-graded, anonymous, in-class activities
designed to give you and your students useful feedback on the teaching-learning process as it is happening.
Examples of CATs include the following.
The Background Knowledge Probe is a short, simple questionnaire given to students at the start of a
course, or before the introduction of a new unit, lesson or topic. It is designed to uncover students’ pre-
conceptions.
The Minute Paper tests how students are gaining knowledge, or not. The instructor ends class by asking
students to write a brief response to the following questions: “What was the most important thing you
learned during this class?” and “What important question remains unanswered?”
The Muddiest Point is one of the simplest CATs to help assess where students are having difficulties.
The technique consists of asking students to jot down a quick response to one question: “What was the
muddiest point in [the lecture, discussion, homework assignment, film, etc.]?” The term “muddiest”
means “most unclear” or “most confusing.”
The What’s the Principle? CAT is useful in courses requiring problem-solving. After students figure out
what type of problem they are dealing with, they often must decide what principle(s) to apply in order to
solve the problem. This CAT provides students with a few problems and asks them to state the principle
that best applies to each problem.
Defining Features Matrix: Prepare a handout with a matrix of three columns and several rows. At the
top of the first two columns, list two distinct concepts that have potentially confusing similarities (e.g.
hurricanes vs. tornados, Picasso vs. Matisse). In the third column, list the important characteristics of
both concepts in no particular order. Give your students the handout and have them use the matrix to
identify which characteristics belong to each of the two concepts. Collect their responses, and you’ll
quickly find out which characteristics are giving your students the most trouble.
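The matrix also lends itself to quick tallying. Below is a minimal sketch, in Python, of how the collected responses might be scored against an answer key to see which characteristics students most often misplace; the concepts, characteristics, and student answers shown are purely hypothetical.

```python
# Hypothetical answer key for a Defining Features Matrix (hurricanes vs. tornados).
answer_key = {
    "forms over warm ocean water": "hurricane",
    "typically lasts days to weeks": "hurricane",
    "funnel-shaped cloud": "tornado",
    "typically lasts minutes to hours": "tornado",
}

# Each student's completed matrix: characteristic -> concept they assigned it to.
student_responses = [
    {"forms over warm ocean water": "hurricane", "typically lasts days to weeks": "tornado",
     "funnel-shaped cloud": "tornado", "typically lasts minutes to hours": "tornado"},
    {"forms over warm ocean water": "hurricane", "typically lasts days to weeks": "hurricane",
     "funnel-shaped cloud": "hurricane", "typically lasts minutes to hours": "tornado"},
]

# Count how many students misplaced each characteristic.
errors = {feature: 0 for feature in answer_key}
for response in student_responses:
    for feature, correct_concept in answer_key.items():
        if response.get(feature) != correct_concept:
            errors[feature] += 1

# Characteristics misplaced most often are the ones giving students the most trouble.
for feature, count in sorted(errors.items(), key=lambda kv: -kv[1]):
    print(f"{count} student(s) misplaced: {feature}")
```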
CATs can be used to improve the teaching and learning that occurs in a class. More frequent use of CATs can…
Provide just-in-time feedback about the teaching-learning process
Provide information about student learning with less work than traditional assignments (tests, papers,
etc.)
Encourage the view that teaching is an ongoing process of inquiry, experimentation, and reflection
Help students become better monitors of their own learning
Help students feel less anonymous, even in large courses
Provide concrete evidence that the instructor cares about learning
Results from CATs can guide teachers in fine-tuning their teaching strategies to better meet student needs. A
good strategy for using CATs is the following.
1. Decide what you want to assess about your students’ learning from a CAT.
2. Choose a CAT that provides this feedback, is consistent with your teaching style, and can be
implemented easily in your class.
3. Explain the purpose of the activity to students, and then conduct it.
4. After class, review the results, determine what they tell you about your students’ learning, and decide
what changes to make, if any.
5. Let your students know what you learned from the CAT and how you will use this information.
The standard reference on CATs is Classroom Assessment Techniques: A Handbook for College Teachers,
2nd edition, by Thomas A. Angelo and K. Patricia Cross (Jossey-Bass, 1993). This book includes 50 CATs,
indexed in a variety of useful ways. The book is available at the Center for Teaching library. See its ACORN
record for call number and availability.
Focused Listing
Focused Listing is a quick and simple student writing activity.
Muddiest Point
Muddiest Point is a quick and simple technique where students identify a challenging or confusing concept.
One Minute Paper
Minute paper is an introductory technique for a student writing activity.
Think-Pair-Share
Think-Pair-Share is a quick and easy technique that has students working in pairs to answer questions posed by
the instructor.
Concept Mapping
Concept Mapping is an intermediate technique that asks students to create ways of representing and organizing
ideas and concepts.
Jigsaw
Jigsaw is an advanced technique where students teach each other assigned topics.
Memory Matrix
Memory matrix is an intermediate technique that asks students to create a structure for organizing large sets of
information.
Quiz Show
Quiz Show is an intermediate technique that uses a game show format for review sessions.
Q.2 How is a content outline prepared while developing a classroom test?
Basic Steps in Classroom Assessment
1. Determining the purpose of the assessment (pre-test, formative, or summative)
2. Developing the test specifications (this is the table you are creating)
3. Selecting the appropriate assessment tasks (form and type)
Supply types: (1) short answer (2) completion
Selection types: (1) alternate choice (2) matching (3) multiple choice (4) keyed response (5) interpretive
exercise
Essay types: (1) extended response (2) restricted response
Active (evaluates process):
Remember: Bottom line = select the item type that provides the most direct measure of the intended
behavioral objectives
4. Preparing the relevant assessment tasks; the limited number of items should be representative of the
domain
Learning outcomes at the first 3 levels of Bloom's taxonomy are easier to construct items for, so they
usually receive undue emphasis; without the table of specifications, ease of construction becomes the
dominant criterion
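As a concrete illustration, here is a minimal sketch in Python of a table of specifications that crosses content areas with cognitive levels and allocates items to each cell; the content areas, levels, and item counts are hypothetical and would come from your own objectives.

```python
# Hypothetical table of specifications: content areas x cognitive levels -> number of items.
table_of_specifications = {
    "Fractions":   {"Knowledge": 4, "Comprehension": 3, "Application": 3},
    "Decimals":    {"Knowledge": 3, "Comprehension": 3, "Application": 2},
    "Percentages": {"Knowledge": 2, "Comprehension": 3, "Application": 2},
}

# Print the table and check the total test length.
levels = ["Knowledge", "Comprehension", "Application"]
print(f"{'Content area':<14}" + "".join(f"{level:>15}" for level in levels) + f"{'Total':>8}")
grand_total = 0
for area, counts in table_of_specifications.items():
    row_total = sum(counts.values())
    grand_total += row_total
    print(f"{area:<14}" + "".join(f"{counts[level]:>15}" for level in levels) + f"{row_total:>8}")
print(f"Total items on the test: {grand_total}")
```

Laying the plan out this way makes it harder for ease of item construction to become the dominant criterion, because every cell, including the higher cognitive levels, has an explicit item allocation.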
How long should the test be? Long enough to provide an adequate sampling of each behavioral
objective; keep in mind also the limitations of the students (how long can they sit, etc.)
Eliminating irrelevant barriers to performance:
1) Make sure that the students have the prerequisite skills and prior knowledge needed
2) Measure the intended learning outcome, not irrelevant skills (e.g., reading or writing ability)
3) Ambiguity -- again, make sure that you measure your behavioral objectives rather than students' ability to guess what you mean
4) Bias (gender, race, ethnic) -- items should be as free of bias as possible
An educational achievement must lead to a difference in behavior. The person who has achieved more must in
some circumstances behave differently from the person who has achieved less. If such a difference cannot be
observed and verified, no grounds exist for believing that the achievement is important.
Measurement, in its most fundamental form, requires nothing more than the verifiable observation of such a
difference. If person A exhibits to any qualified observer more of a particular trait than person B, then that trait
is measurable. By definition, then, any important achievement is potentially measurable.
Many important educational achievements can be measured quite satisfactorily by means of paper and pencil
tests. But in some cases the achievement is so complex, variable, and conditional that the measurements
obtained are only rough approximations. In other cases the difficulty lies in the attempt to measure something
that has been alleged to exist but that has never been defined specifically. Thus, to say that all important
achievements are potentially measurable is not to say that all those achievements have been clearly identified or
that satisfactory techniques for measuring all of them have been developed.
Achievement, aptitude, and intelligence tests. Achievement tests are often distinguished from aptitude tests
that purport to predict what a person is able to learn or from intelligence tests intended to measure his capacity
for learning. But the distinction between aptitude and achievement is more apparent than real, more a difference
in the use made of the measurements, than in what is being measured. In a very real sense, tests of aptitude and
intelligence are also tests of achievement.
The tasks used to measure a child’s mental age may differ from those used to measure his knowledge of the
facts of addition. The tasks used to assess a youth’s aptitude for the study of a foreign language may differ from
those used to assess his knowledge of English literature. But all of these tasks test achievement; they measure
what a person has learned to know or to do. All learning except the very earliest builds on prior learning. Thus,
what is regarded as achievement in retrospect is regarded as aptitude when looking to the future.
There may well be differences in genetically determined biological equipment for learning among normal
human beings. But no method has yet been discovered for measuring these differences directly. Only if one is
willing to assume that several persons have had identical opportunities, incentives, and other favorable
circumstances for learning (and that is quite an assumption) is it reasonable to use present differences in
achievements as a basis for dependable estimates of corresponding differences in native ability to learn.
Types of tests. Although some achievement testing is done orally, with examinee and examiner face to face,
most of it makes use of written tests. Of these written tests there are two main types: essay and objective. If the
test consists of a relatively small number of questions or directions in response to which the examinee writes a
sentence, a paragraph, or a longer essay of his own composition, the test is usually referred to as an essay test.
Alternatively, if the test consists of a relatively large number of questions or incomplete statements in response
to which the examinee chooses one of several suggested answers, the test is ordinarily referred to as an
objective test.
Objective tests can be scored by clerks or scoring machines. Essay tests must be scored by judges who have
special qualifications and who sometimes are specially trained for the particular scoring process. The scores
obtained from objective tests tend to be more reliable than those obtained from essay tests. That is, independent
scorings of the same answers, or of the same person’s answers to equivalent sets of questions, tend to agree
more closely in the case of objective tests than in the case of essay tests.
There are four major steps in achievement testing: (1) the preparation or selection of the test, (2) the
administration of the test to the examinees, (3) the scoring of the answers given, and (4) the interpretation of the
resulting scores.
Test development. In the United States, and to a lesser extent in other countries, achievement tests have been
developed and are offered for sale by commercial test publishers. Buros (1961) has provided a list of tests in
print and has indicated where they may be obtained. Recent catalogs of tests are available from most of the
publishers listed in that volume.
The achievement tests that most people are familiar with are the standard exams taken by every student in
school. Students are regularly expected to demonstrate their learning and proficiency in a variety of subjects. In
most cases, certain scores on these achievement tests are needed in order to pass a class or continue on to the
next grade level.
The role of achievement tests in education has become much more pronounced since the passage of the 2001
No Child Left Behind Act. This legislation focused on standards-based education, which was used to measure
educational goals and outcomes. While this law was later replaced by the 2015 Every Student Succeeds Act,
achievement testing remains a key element in measuring educational success and plays a role in determining
school funding.
But achievement tests are not just important during the years of K-12 education and college. They can be used
to assess skills when people are trying to learn a new sport. If you were learning dance, martial arts, or some
other specialized athletic skill, an achievement test can be important for determining your current level of
ability and possible need for further training.
Some more examples of achievement tests include:
A math exam covering the latest chapter in your book
A test in your social psychology class
A comprehensive final in your Spanish class
The ACT and SAT exams
A skills demonstration in your martial arts class
Each of these tests is designed to assess how much you know at a specific point in time about a certain topic.
Achievement tests are not used to determine what you are capable of; they are designed to evaluate what you
know and your level of skill at the given moment.
As you can see, achievement tests are widely used in a number of domains, both academic- and career-related.
Students face an array of achievement tests almost every day as they complete their studies at all grade levels,
from pre-K through college. Such tests allow educators and parents to assess how their kids are doing in school,
but also provide feedback to students on their own performance.
Achievement tests are often used in educational and training settings. In schools, for example, achievement
tests are frequently used to determine the level of education for which students might be prepared. Students
might take such a test to determine if they are ready to enter a particular grade level or if they are ready to
pass out of a particular subject or grade level and move on to the next.
Standardized achievement tests are also used extensively in educational settings to determine if students have
met specific learning goals. Each grade level has certain educational expectations, and testing is used to
determine if schools, teachers, and students are meeting those standards.
So how exactly are achievement tests created? In many instances, subject matter experts help determine what
content standards should exist for a certain subject. These standards represent the things that an individual at a
certain skill or grade level should know about a particular subject. Test designers can then use this information
to develop exams that accurately reflect the most important things that a person should know about that topic.
Achievement Tests vs Aptitude Tests
Achievement tests differ in important ways from aptitude tests. An aptitude test is designed to determine your
potential for success in a certain area. For example, a student might take an aptitude test to help
determine which types of career they might be best suited for. An achievement test, on the other hand, would be
designed to determine what a student already knows about a specific subject.
Q.4 Define the advantages and disadvantages of multiple-choice test questions.
Multiple choice items are a common way to measure student understanding and recall. Wisely constructed and
utilized, multiple choice questions will make stronger and more accurate assessments.
At the end of this activity, you will be able to construct multiple choice test items and identify when to use them
in your assessments.
Let's begin by thinking about the advantages and disadvantages of using multiple-choice questions. Knowing
the advantages and disadvantages of using multiple choice questions will help you decide when to use them in
your assessments.
Advantages
Allow for assessment of a wide range of learning objectives
Objective nature limits scoring bias
Students can quickly respond to many items, permitting wide sampling and coverage of content
Difficulty can be manipulated by adjusting similarity of distractors
Efficient to administer and score
Incorrect response patterns can be analyzed
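On the last point above, incorrect response patterns can be tallied with a few lines of code. The sketch below, in Python, counts how often each option was chosen on a single item so you can see which distractors are actually working; the responses and answer key are hypothetical.

```python
from collections import Counter

# Hypothetical responses from 20 students to one multiple-choice item (key: "C").
responses = list("CCBACCDCCBCCACCDCCBC")
key = "C"

counts = Counter(responses)
for option in "ABCD":
    marker = "(correct)" if option == key else ""
    print(f"Option {option}: chosen by {counts.get(option, 0)} students {marker}")

# A distractor that nobody chooses adds nothing to the item and could be revised.
```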
The stem should include any words that would be repeated in each response.
Examples
Examine the examples below and think about the tips you just learned. As you look at each one, think about
whether it is a good example or needs improvement.
As a public health nurse, Susan tries to identify individuals with unrecognized health risk factors or
asymptomatic disease conditions in populations. This type of intervention can best be described as
A. case management
B. health teaching
C. advocacy
D. screening
E. none of the above
This item should be revised. It should not have “none of the above” as a choice if you are asking for the “best”
answer.
Critical pedagogy
A. is an approach to teaching and learning based on feminist ideology that embraces
egalitarianism by identifying and overcoming oppressive practices.
B. is an approach to teaching and learning based on sociopolitical theory that
embraces egalitarianism through overcoming oppressive practices.
C. is an approach to teaching and learning based on how actual day-to-day
teaching/learning is experienced by students and teachers rather than what could
or should be experienced.
D. is an approach to teaching and learning based on increasing awareness of how
dominant patterns of thought permeate modern society and delimit the contextual
lens through which one views the world around them.
This item should be revised because the repetitive wording should be in the stem. So the stem should read
"Critical pedagogy is an approach to teaching and learning based on:"
Katie weighs 11 pounds. She has an order for ampicillin sodium 580 mg IV q 6 hours. What is her daily
dose of ampicillin as ordered?
A. 1160 mg
B. 1740 mg
C. 2320 mg
D. 3480 mg
This example is well written and structured.
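For reference, the arithmetic behind this item is straightforward: an order of 580 mg every 6 hours means four doses per day, so the daily dose as ordered is 580 mg x 4 = 2320 mg (option C); the child's weight is extra information in the stem. A quick check in Python:

```python
dose_mg = 580          # single dose as ordered
interval_hours = 6     # "q 6 hours"
doses_per_day = 24 // interval_hours
daily_dose_mg = dose_mg * doses_per_day
print(daily_dose_mg)   # 2320
```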
The research design that provides the best evidence for a cause-effect relationship is an:
A. experimental design
B. control group
C. quasi-experimental design
D. evidence-based practice
This example contains a grammatical cue and grammatical inconsistency. Additionally, not all distractors are
equally plausible.
The nurse supervisor wrote the following evaluation note: Carol has been a nurse in the post-surgical
unit for 2 years. She has good organizational and clinical skills in managing patient conditions. She has
a holistic grasp of situations and is ready to assume greater responsibilities to further individualize care.
Using the Dreyfus model of skill acquisition, identify the stage that best describes Carol’s performance.
A. Novice
B. Advanced beginner
C. Competent
D. Proficient
E. Expert
This is a good example.
Multiple choice questions are commonly used in assessments because of their objective nature and efficient
administration. To make the most of these advantages, it's important to make sure your questions are well
written.
Q.5 Write a detailed note on different types of reliability of a test.
Reliability is a measure of the consistency of a metric or a method.
Every metric or method we use, including things like methods for uncovering usability problems in an interface
and expert judgment, must be assessed for reliability.
In fact, before you can establish validity, you need to establish reliability.
Here are the four most common ways of measuring reliability for any empirical method or metric:
inter-rater reliability
test-retest reliability
parallel forms reliability
internal consistency reliability
Because reliability comes from a history in educational measurement (think standardized tests), many of the
terms we use to assess reliability come from the testing lexicon. But don’t let bad memories of testing allow you
to dismiss their relevance to measuring the customer experience. These four methods are the most common
ways of measuring reliability for any empirical method or metric.
Inter-Rater Reliability
The extent to which raters or observers respond the same way to a given phenomenon is one measure of
reliability. Where there’s judgment there’s disagreement.
Even highly trained experts disagree among themselves when observing the same phenomenon. Kappa and
the correlation coefficient are two common measures of inter-rater reliability. Some examples include:
Evaluators identifying interface problems
Experts rating the severity of a problem
For example, we found that the average inter-rater reliability of usability experts rating the severity of
usability problems was r = .52. You can also measure intra-rater reliability, whereby you correlate multiple
scores from one observer. In that same study, we found that the average intra-rater reliability when judging
problem severity was r = .58 (which is generally low reliability).
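As a minimal sketch of how inter-rater agreement might be computed, the Python snippet below calculates Cohen's kappa for two hypothetical raters who independently classified the same ten usability problems by severity; the ratings are made up for illustration.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical severity ratings (1 = minor, 2 = moderate, 3 = severe) from two raters
# judging the same ten problems.
rater_a = [1, 2, 3, 2, 1, 3, 2, 2, 1, 3]
rater_b = [1, 2, 3, 3, 1, 3, 2, 1, 1, 2]

kappa = cohen_kappa_score(rater_a, rater_b)
print(f"Cohen's kappa: {kappa:.2f}")
```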
Test-Retest Reliability
Do customers provide the same set of responses when nothing about their experience or their attitudes has
changed? You don’t want your measurement system to fluctuate when all other things are static.
Have a set of participants answer a set of questions (or perform a set of tasks). Later (by at least a few days,
typically), have them answer the same questions again. When you correlate the two sets of measures, look for
very high correlations (r > 0.7) to establish retest reliability.
As you can see, there's some effort and planning involved: you need participants to agree to answer the
same questions twice. Few questionnaires measure test-retest reliability (mostly because of the logistics), but
with the proliferation of online research, we should encourage more of this type of measure.
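A minimal sketch of the retest correlation, assuming hypothetical scores from the same five participants on two occasions about a week apart:

```python
from scipy.stats import pearsonr

# Hypothetical questionnaire scores from the same participants at two points in time.
time_1 = [72, 85, 60, 90, 78]
time_2 = [70, 88, 62, 87, 80]

r, p_value = pearsonr(time_1, time_2)
print(f"Test-retest correlation: r = {r:.2f}")  # look for r > 0.7
```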
Parallel Forms Reliability
Getting the same or very similar results from slight variations on the question or evaluation method also
establishes reliability. One way to achieve this is to have, say, 20 items that measure one construct (satisfaction,
loyalty, usability) and to administer 10 of the items to one group and the other 10 to another group, and then
correlate the results. You’re looking for high correlations and no systematic difference in scores between the
groups.
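The sketch below illustrates the underlying idea with a common variant in which the same respondents answer both ten-item forms and the two form totals are correlated; the item responses are simulated, hypothetical data.

```python
import numpy as np

# Simulate hypothetical item responses (rows = 50 respondents, columns = 20 items on one construct).
rng = np.random.default_rng(0)
true_score = rng.normal(0, 1, size=50)
items = true_score[:, None] + rng.normal(0, 1, size=(50, 20))

# Split the 20 items into two 10-item forms and total each form per respondent.
form_a = items[:, :10].sum(axis=1)
form_b = items[:, 10:].sum(axis=1)

r = np.corrcoef(form_a, form_b)[0, 1]
print(f"Parallel-forms correlation: r = {r:.2f}")
```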
Internal Consistency Reliability
This is by far the most commonly used measure of reliability in applied settings. It’s popular because it’s the
easiest to compute using software—it requires only one sample of data to estimate the internal consistency
reliability. This measure of reliability is described most often using Cronbach’s alpha (sometimes called
coefficient alpha).
It measures how consistently participants respond to one set of items. You can think of it as a sort of average of
the correlations between items. Cronbach’s alpha ranges from 0.0 to 1.0 (a negative alpha means you probably
need to reverse some items). Since the late 1960s, the minimally acceptable measure of reliability has been
0.70; in practice, though, for high-stakes questionnaires, aim for greater than 0.90. For example, the SUS has
a Cronbach’s alpha of 0.92.
The more items you have, the more internally reliable the instrument, so to increase internal consistency
reliability, you would add items to your questionnaire. Since there’s often a strong need to have few items,
however, internal reliability usually suffers. When you have only a few items, and therefore usually lower
internal reliability, having a larger sample size helps offset the loss in reliability.
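A minimal sketch of the computation, assuming a small matrix of hypothetical item responses (rows are respondents, columns are items), using the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of the total score):

```python
import numpy as np

# Hypothetical responses: 6 respondents x 4 items on a 1-5 scale.
scores = np.array([
    [4, 5, 4, 4],
    [3, 3, 2, 3],
    [5, 5, 5, 4],
    [2, 2, 3, 2],
    [4, 4, 4, 5],
    [3, 2, 3, 3],
])

k = scores.shape[1]                                # number of items
item_variances = scores.var(axis=0, ddof=1)        # variance of each item
total_variance = scores.sum(axis=1).var(ddof=1)    # variance of the total score
alpha = (k / (k - 1)) * (1 - item_variances.sum() / total_variance)
print(f"Cronbach's alpha: {alpha:.2f}")
```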
Here are a few things to keep in mind about measuring reliability:
Reliability is the consistency of a measure or method over time.
Reliability is necessary but not sufficient for establishing a method or metric as valid.
There isn't a single measure of reliability; instead, there are four common measures of consistent
responses.
You’ll want to use as many measures of reliability as you can (although in most cases one is
sufficient to understand the reliability of your measurement system).
Even if you can’t collect reliability data, be aware of the ways in which low reliability may affect
the validity of your measures, and ultimately the veracity of your decisions.