
Course: Educational Assessment and Evaluation (8602)

Semester: Autumn, 2020

ASSIGNMENT No. 1
Q.1 Explain classroom assessment. Write a note on the principles of classroom assessment.
Classroom Assessment Techniques (CATs) are generally simple, non-graded, anonymous, in-class activities
designed to give you and your students useful feedback on the teaching-learning process as it is happening.
Examples of CATs include the following.
• The Background Knowledge Probe is a short, simple questionnaire given to students at the start of a course, or before the introduction of a new unit, lesson or topic. It is designed to uncover students’ preconceptions.
• The Minute Paper tests how students are gaining knowledge, or not. The instructor ends class by asking
students to write a brief response to the following questions: “What was the most important thing you
learned during this class?” and “What important question remains unanswered?”
• The Muddiest Point is one of the simplest CATs to help assess where students are having difficulties.
The technique consists of asking students to jot down a quick response to one question: “What was the
muddiest point in [the lecture, discussion, homework assignment, film, etc.]?” The term “muddiest”
means “most unclear” or “most confusing.”
• The What’s the Principle? CAT is useful in courses requiring problem-solving. After students figure out
what type of problem they are dealing with, they often must decide what principle(s) to apply in order to
solve the problem. This CAT provides students with a few problems and asks them to state the principle
that best applies to each problem.
• Defining Features Matrix: Prepare a handout with a matrix of three columns and several rows. At the
top of the first two columns, list two distinct concepts that have potentially confusing similarities (e.g.
hurricanes vs. tornados, Picasso vs. Matisse). In the third column, list the important characteristics of
both concepts in no particular order. Give your students the handout and have them use the matrix to
identify which characteristics belong to each of the two concepts. Collect their responses, and you’ll
quickly find out which characteristics are giving your students the most trouble (see the short tallying sketch below).
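As an illustration only, here is a minimal Python sketch of how responses to a Defining Features Matrix might be tallied. The answer key, the hurricanes-vs-tornadoes characteristics, and the student response are invented for this example and are not part of Angelo and Cross's technique.

```python
# Hypothetical answer key: each characteristic is mapped to the concept it describes.
answer_key = {
    "forms over warm ocean water": "hurricane",
    "lasts for several days": "hurricane",
    "typically a few hundred meters wide": "tornado",
    "rated on the Enhanced Fujita scale": "tornado",
}

# One student's completed matrix (characteristic -> concept they chose).
student_response = {
    "forms over warm ocean water": "hurricane",
    "lasts for several days": "tornado",
    "typically a few hundred meters wide": "tornado",
    "rated on the Enhanced Fujita scale": "tornado",
}

def misclassified(responses, key):
    """Return the characteristics a student assigned to the wrong concept."""
    return [feature for feature, concept in responses.items() if key[feature] != concept]

print(misclassified(student_response, answer_key))  # ['lasts for several days']
```

Collected over a whole class, tallies like this point directly at the characteristics that need re-teaching.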
CATs can be used to improve the teaching and learning that occurs in a class. More frequent use of CATs can…
• Provide just-in-time feedback about the teaching-learning process
• Provide information about student learning with less work than traditional assignments (tests, papers, etc.)
• Encourage the view that teaching is an ongoing process of inquiry, experimentation, and reflection
• Help students become better monitors of their own learning
• Help students feel less anonymous, even in large courses
• Provide concrete evidence that the instructor cares about learning
Results from CATs can guide teachers in fine-tuning their teaching strategies to better meet student needs. A
good strategy for using CATs is the following.


1. Decide what you want to assess about your students’ learning from a CAT.
2. Choose a CAT that provides this feedback, is consistent with your teaching style, and can be
implemented easily in your class.
3. Explain the purpose of the activity to students, and then conduct it.
4. After class, review the results, determine what they tell you about your students’ learning, and decide
what changes to make, if any.
5. Let your students know what you learned from the CAT and how you will use this information.
The standard reference on CATs is Classroom Assessment Techniques: A Handbook for College Teachers,
2nd edition, by Thomas A. Angelo and K. Patricia Cross (Jossey-Bass, 1993). This book includes 50 CATs,
indexed in a variety of useful ways. The book is available at the Center for Teaching library. See its ACORN
record for call number and availability.
Focused Listing
Focused Listing is a quick and simple student writing activity.
Muddiest Point
Muddiest Point is a quick and simple technique where students identify a challenging or confusing concept.
One Minute Paper
Minute paper is an introductory technique for a student writing activity.
Think-Pair-Share
Think-Pair-Share is a quick and easy technique that has students working in pairs to answer questions posed by
the instructor.
Concept Mapping
Concept Mapping is an intermediate technique that asks students to create ways of representing and organizing
ideas and concepts.
Jigsaw
Jigsaw is an advanced technique where students teach each other assigned topics.
Memory Matrix
Memory matrix is an intermediate technique that asks students to create a structure for organizing large sets of
information.
Quiz Show
Quiz Show is an intermediate technique that uses a game show format for review sessions.
Q.2 How is a content outline prepared while developing a classroom test?
Basic Steps in Classroom Assessment
1. Determining the purpose of the assessment (pre-test, formative, or summative)
2. Developing the test specifications (this is the table you are creating)
3. Selecting the appropriate assessment tasks (form and type)


4. Preparing the relevant assessment tasks
5. Assembling the assessment
6. Providing instruction
7. Evaluating the assessment
8. Using the assessment results
1. Determining the purpose of the assessment
Pre-testing
1) Whether students have the prerequisite skills needed for the instruction
2) To what extent students have already achieved the objectives of the planned instruction. Pre-tests are confined to a limited domain, are low in difficulty, serve as a basis for remedial work or for adapting instructional plans, and are not usually different from the post-test (an equivalent form).
During instruction assessment
This is called diagnostic or formative assessment; done about midway through a unit or chapter
1) To monitor learning progress
2) Provide feedback to students and teachers
3) Detect learning errors and diagnose difficulties. Formative assessments typically take the form of practice tests and quizzes, cover a predefined segment of instruction, and sample a limited set of learning outcomes.
End of instruction assessment
This is called summative assessment and measures the extent to which the intended learning outcomes have
been achieved; can serve the same purposes as pre-testing (for the following unit) and formative assessment
2. Developing the specifications for tests and assessments (this is the table you are creating)
Steps:
1) Prepare a list of instructional objectives
2) Outline course content
3) Prepare a two-way table / chart; the table is limited to those objectives that are measurable (an illustrative chart is sketched below)
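As a rough illustration, a two-way table of specifications can be thought of as a grid of content areas against objective levels, with each cell holding the number of items planned. The content areas, objective levels, and item counts below are hypothetical, chosen only to show the idea.

```python
# Hypothetical table of specifications: rows are content areas, columns are
# levels of objectives, and each cell is the number of items planned.
table_of_specifications = {
    "Fractions":   {"Knowledge": 4, "Comprehension": 3, "Application": 3},
    "Decimals":    {"Knowledge": 3, "Comprehension": 3, "Application": 2},
    "Percentages": {"Knowledge": 3, "Comprehension": 2, "Application": 2},
}

# The cell counts should add up to the intended length of the test.
total_items = sum(sum(row.values()) for row in table_of_specifications.values())
print(total_items)  # 25 items planned in this illustrative chart
```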
3. Selecting the appropriate assessment tasks [two forms: objective and performance]
First Form = Objective
Objective items -- highly structured; single right answer; limits type of response student can make; scoring
is quick, easy, and accurate
Supply types

1) Short answer
2) Completion


Selection types: (1) alternate choice (2) matching (3) multiple choice (4) keyed response (5) interpretive
exercise

Second Form = Performance


Performance items -- less structured (the problem can be redefined and the answer organized and presented in the student's own words); scoring is more difficult and less reliable
Essay questions:

1) Extended-response
2) Restricted response
Active (evaluates process):

1) Construction of graphs, diagrams, models


2) Use of equipment or playing an instrument

Product: Report, art work, science project

Remember: Bottom line = select the item type that provides the most direct measure of the intended
behavioral objectives

4. Preparing the relevant assessment tasks; the limited number of items should be representative of the
domain

Learning outcomes at the first 3 levels of Bloom's taxonomy are easier to construct items for, so they
usually receive undue emphasis; without the table of specifications, ease of construction becomes the
dominant criterion
How long should the test be? Long enough to provide an adequate sampling of each behavioral
objective; keep in mind also the limitations of the students (how long can they sit, etc.)
Eliminating irrelevant barriers to performance:

1) Make sure that the students have the prerequisite skills and prior knowledge needed
2) Measure intended learning outcome, not the irrelevant skills (reading or writing ability)
3) Ambiguity -- again, make sure items measure your behavioral objectives rather than students' ability to guess what you mean
4) Bias (gender, race, ethnic) -- items should be as free of bias as possible


General suggestions for writing test items / tasks:

1) Use the table of specifications
2) Write more items than needed
3) Write items well in advance of the testing date
4) Write items so that they call for the performance described in the behavioral objectives
5) Clearly specify the task to be performed
6) Write items at the appropriate reading / writing level (in sub-tests not measuring reading, such as math, science, and social studies, test makers generally write items two years below grade placement to avoid testing reading ability)
7) Make sure the item provides no clue to the answer
8) Make sure the answer is one that experts agree upon
9) Recheck revised items for relevance
Valid Assessment will:
1) Improve student achievement
2) Improve instruction
3) Improve student-teacher relationships
Q.3 What are the types of achievement tests? For what purposes are achievement tests used?
The purpose of achievement testing is to measure some aspect of the intellectual competence of human beings:
what a person has learned to know or to do. Teachers use achievement tests to measure the attainments of their
students. Employers use achievement tests to measure the competence of prospective employees. Professional
associations use achievement tests to exclude unqualified applicants from the practice of the profession. In any
circumstances where it is necessary or useful to distinguish persons of higher from those of lower competence
or attainments, achievement testing is likely to occur.
The varieties of intellectual competence that may be developed by formal education, self-study, or other types
of experience are numerous and diverse. There is a corresponding number and diversity of types of tests used to
measure achievement. In this article attention will be directed mainly toward the measurement of cognitive
achievements by means of paper and pencil tests. The justifications for this limitation are (1) that cognitive
achievements are of central importance to effective human behavior, (2) that the use of paper and pencil tests to
measure these achievements is a comparatively well-developed and effective technique, and (3) that other
aspects of intellectual competence will be discussed in other articles, such as those on motivation, learning,
attitudes, leadership, aesthetics, and personality.
Measurability of achievement. Despite the complexity, intangibility, and delayed fruition of many educational
achievements and despite the relative imprecision of many of the techniques of educational measurement, there
are logical grounds for believing that all important educational achievements can be measured. To be important,
an educational achievement must lead to a difference in behavior. The person who has achieved more must in
some circumstances behave differently from the person who has achieved less. If such a difference cannot be
observed and verified no grounds exist for believing that the achievement is important.
Measurement, in its most fundamental form, requires nothing more than the verifiable observation of such a
difference. If person A exhibits to any qualified observer more of a particular trait than person B, then that trait
is measurable. By definition, then, any important achievement is potentially measurable.
Many important educational achievements can be measured quite satisfactorily by means of paper and pencil
tests. But in some cases the achievement is so complex, variable, and conditional that the measurements
obtained are only rough approximations. In other cases the difficulty lies in the attempt to measure something
that has been alleged to exist but that has never been defined specifically. Thus, to say that all important
achievements are potentially measurable is not to say that all those achievements have been clearly identified or
that satisfactory techniques for measuring all of them have been developed.
Achievement, aptitude, and intelligence tests. Achievement tests are often distinguished from aptitude tests
that purport to predict what a person is able to learn or from intelligence tests intended to measure his capacity
for learning. But the distinction between aptitude and achievement is more apparent than real, more a difference
in the use made of the measurements, than in what is being measured. In a very real sense, tests of aptitude and
intelligence are also tests of achievement.
The tasks used to measure a child’s mental age may differ from those used to measure his knowledge of the
facts of addition. The tasks used to assess a youth’s aptitude for the study of a foreign language may differ from
those used to assess his knowledge of English literature. But all of these tasks test achievement; they measure
what a person has learned to know or to do. All learning except the very earliest builds on prior learning. Thus,
what is regarded as achievement in retrospect is regarded as aptitude when looking to the future.
There may well be differences in genetically determined biological equipment for learning among normal
human beings. But no method has yet been discovered for measuring these differences directly. Only if one is
willing to assume that several persons have had identical opportunities, incentives, and other favorable
circumstances for learning (and that is quite an assumption) is it reasonable to use present differences in
achievements as a basis for dependable estimates of corresponding differences in native ability to learn.
Types of tests. Although some achievement testing is done orally, with examinee and examiner face to face,
most of it makes use of written tests. Of these written tests there are two main types: essay and objective. If the
test consists of a relatively small number of questions or directions in response to which the examinee writes a
sentence, a paragraph, or a longer essay of his own composition, the test is usually referred to as an essay test.
Alternatively, if the test consists of a relatively large number of questions or incomplete statements in response
to which the examinee chooses one of several suggested answers, the test is ordinarily referred to as an
objective test.


Objective tests can be scored by clerks or scoring machines. Essay tests must be scored by judges who have
special qualifications and who sometimes are specially trained for the particular scoring process. The scores
obtained from objective tests tend to be more reliable than those obtained from essay tests. That is, independent
scorings of the same answers, or of the same person’s answers to equivalent sets of questions, tend to agree
more closely in the case of objective tests than in the case of essay tests.
There are four major steps in achievement testing: (1) the preparation or selection of the test, (2) the
administration of the test to the examinees, (3) the scoring of the answers given, and (4) the interpretation of the
resulting scores.
Test development. In the United States, and to a lesser extent in other countries, achievement tests have been
developed and are offered for sale by commercial test publishers. Buros (1961) has provided a list of tests in
print and has indicated where they may be obtained. Recent catalogs of tests are available from most of the
publishers listed in that volume.
The achievement tests that most people are familiar with are the standard exams taken by every student in
school. Students are regularly expected to demonstrate their learning and proficiency in a variety of subjects. In
most cases, certain scores on these achievement tests are needed in order to pass a class or continue on to the
next grade level.
The role of achievement tests in education has become much more pronounced since the passage of the 2001 No Child Left Behind Act. This legislation focused on standards-based education, which was used to measure educational goals and outcomes. While this law was later replaced by the 2015 Every Student Succeeds Act, achievement testing remains a key element in measuring educational success and plays a role in determining school funding.
But achievement tests are not just important during the years of K-12 education and college. They can also be used to assess skills when people are trying to learn a new sport. If you are learning dance, martial arts, or some other specialized athletic skill, an achievement test can help determine your current level of ability and possible need for further training.
Some more examples of achievement tests include:
• A math exam covering the latest chapter in your book
• A test in your social psychology class
• A comprehensive final in your Spanish class
• The ACT and SAT exams
• A skills demonstration in your martial arts class
Each of these tests is designed to assess how much you know at a specific point in time about a certain topic.
Achievement tests are not used to determine what you are capable of; they are designed to evaluate what you
know and your level of skill at the given moment.


As you can see, achievement tests are widely used in a number of domains, both academic- and career-related.
Students face an array of achievement tests almost every day as they complete their studies at all grade levels,
from pre-K through college. Such tests allow educators and parents to assess how their kids are doing in school,
but also provide feedback to students on their own performance.
Achievement tests are often used in educational and training settings. In schools, for example, achievement tests are frequently used to determine the level of education for which students might be prepared. Students might take such a test to determine if they are ready to enter a particular grade level or if they are ready to pass out of a particular subject or grade level and move on to the next.
Standardized achievement tests are also used extensively in educational settings to determine if students have
met specific learning goals. Each grade level has certain educational expectations, and testing is used to
determine if schools, teachers, and students are meeting those standards.
So how exactly are achievement tests created? In many instances, subject matter experts help determine what
content standards should exist for a certain subject. These standards represent the things that an individual at a
certain skill or grade level should know about a particular subject. Test designers can then use this information
to develop exams that accurately reflect the most important things that a person should know about that topic.
Achievement Tests vs Aptitude Tests
Achievement tests differ in important ways from aptitude tests. An aptitude test is designed to determine your
potential for success in a certain area. For example, a student might take an aptitude test to help
determine which types of career they might be best suited for. An achievement test, on the other hand, would be
designed to determine what a student already knows about a specific subject.
Q.4 Describe the advantages and disadvantages of multiple-choice test questions.
Multiple choice items are a common way to measure student understanding and recall. Wisely constructed and
utilized, multiple choice questions will make stronger and more accurate assessments.
At the end of this activity, you will be able to construct multiple choice test items and identify when to use them
in your assessments.
Let's begin by thinking about the advantages and disadvantages of using multiple-choice questions. Knowing
the advantages and disadvantages of using multiple choice questions will help you decide when to use them in
your assessments.
Advantages
• Allow for assessment of a wide range of learning objectives
• Objective nature limits scoring bias
• Students can quickly respond to many items, permitting wide sampling and coverage of content
• Difficulty can be manipulated by adjusting similarity of distractors
• Efficient to administer and score
• Incorrect response patterns can be analyzed
• Less influenced by guessing than true-false


Disadvantages
• Limited feedback to correct errors in student understanding
• Tend to focus on low-level learning objectives
• Results may be biased by reading ability or test-wiseness
• Development of good items is time consuming
• Measuring ability to organize and express ideas is not possible
Multiple choice items consist of a question or incomplete statement (called a stem) followed by 3 to 5 response
options. The correct response is called the key while the incorrect response options are called distractors.
For example: This is the most common type of item used in assessments. It requires students to select one
response from a short list of alternatives. (stem)
1. True-false (distractor)
2. Multiple choice (key)
3. Short answer (distractor)
4. Essay (distractor)
Following these tips will help you develop high quality multiple choice questions for your assessments.
Formatting Tips
• Use 3-5 responses in a vertical list under the stem.
• Put response options in a logical order (chronological, numerical), if there is one, to assist readability.
Writing Tips
• Use clear, precise, simple language so that wording doesn’t affect students’ demonstration of what they know (avoid humor, jargon, cliché).
• Each question should represent a complete thought and be written as a coherent sentence.
• Avoid absolute or vague terminology (all, none, never, always, usually, sometimes).
• Avoid using negatives; if required, highlight them.
• Ensure there is only one interpretation of meaning and one correct or best response.
• The stem should be written so that students would be able to answer the question without looking at the responses.
• All responses should be written clearly and be approximately homogeneous in content, length, and grammar.
• Make distractors plausible and equally attractive to students who do not know the material.
• Ensure stems and responses are independent; don’t supply or clue the answer in a distractor or another question.
• Avoid “all of the above” or “none of the above” when possible, and especially if asking for the best answer.
• Include the bulk of the content in the stem, not in the responses.
• The stem should include any words that would be repeated in each response.
Examples
Examine the examples below and think about the tips you just learned. As you look at each one, consider whether it is a good example or whether it needs improvement.
• As a public health nurse, Susan tries to identify individuals with unrecognized health risk factors or
asymptomatic disease conditions in populations. This type of intervention can best be described as
A. case management
B. health teaching
C. advocacy
D. screening
E. none of the above
This item should be revised. It should not have “none of the above” as a choice if you are asking for the “best”
answer.
• Critical pedagogy
A. is an approach to teaching and learning based on feminist ideology that embraces
egalitarianism by identifying and overcoming oppressive practices.
B. is an approach to teaching and learning based on sociopolitical theory that
embraces egalitarianism through overcoming oppressive practices.
C. is an approach to teaching and learning based on how actual day-to-day
teaching/learning is experienced by students and teachers rather than what could
or should be experienced.
D. is an approach to teaching and learning based on increasing awareness of how
dominant patterns of thought permeate modern society and delimit the contextual
lens through which one views the world around them.
This item should be revised because the repetitive wording should be in the stem. So the stem should read
"Critical pedagogy is an approach to teaching and learning based on:"
• Katie weighs 11 pounds. She has an order for ampicillin sodium 580 mg IV q 6 hours. What is her daily
dose of ampicillin as ordered?
A. 1160 mg
B. 1740 mg
C. 2320 mg
D. 3480 mg
This example is well written and structured.
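(As a quick check of the arithmetic: an order of 580 mg every 6 hours means four doses in 24 hours, so the daily dose as ordered is 580 × 4 = 2,320 mg, i.e. choice C.)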
• The research design that provides the best evidence for a cause-effect relationship is an:
A. experimental design
B. control group
C. quasi-experimental design
D. evidence-based practice
This example contains a grammatical cue and grammatical inconsistency. Additionally, not all distractors are equally plausible.
• The nurse supervisor wrote the following evaluation note: Carol has been a nurse in the post-surgical
unit for 2 years. She has good organizational and clinical skills in managing patient conditions. She has
a holistic grasp of situations and is ready to assume greater responsibilities to further individualize care.
Using the Dreyfus model of skill acquisition, identify the stage that best describes Carol’s performance.
A. Novice
B. Advanced beginner
C. Competent
D. Proficient
E. Expert
This is a good example.
Multiple choice questions are commonly used in assessments because of their objective nature and efficient
administration. To make the most of these advantages, it's important to make sure your questions are well
written.
Q.5 Write a detailed note on the different types of test reliability.
Reliability is a measure of the consistency of a metric or a method.
Every metric or method we use, including things like methods for uncovering usability problems in an interface
and expert judgment, must be assessed for reliability.
In fact, before you can establish validity, you need to establish reliability.
Here are the four most common ways of measuring reliability for any empirical method or metric:
• inter-rater reliability
• test-retest reliability
• parallel forms reliability
• internal consistency reliability
Because reliability comes from a history in educational measurement (think standardized tests), many of the
terms we use to assess reliability come from the testing lexicon. But don’t let bad memories of testing allow you
to dismiss their relevance to measuring the customer experience. These four methods are the most common
ways of measuring reliability for any empirical method or metric.


Inter-Rater Reliability
The extent to which raters or observers respond the same way to a given phenomenon is one measure of
reliability. Where there’s judgment there’s disagreement.
Even highly trained experts disagree among themselves when observing the same phenomenon. Kappa and
the correlation coefficient are two common measures of inter-rater reliability. Some examples include:
• Evaluators identifying interface problems
• Experts rating the severity of a problem
For example, we found that the average inter-rater reliability of usability experts rating the severity of usability problems was r = .52. You can also measure intra-rater reliability, whereby you correlate multiple scores from one observer. In that same study, we found that the average intra-rater reliability when judging problem severity was r = .58 (which is generally low reliability).
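Since kappa is named above as a common measure of inter-rater agreement, here is a minimal Python sketch of Cohen's kappa for two raters; the severity ratings are invented for illustration and do not come from the study cited above.

```python
from collections import Counter

# Hypothetical severity ratings (1 = cosmetic ... 4 = critical) that two
# evaluators assigned to the same ten usability problems.
rater_a = [1, 2, 2, 3, 4, 4, 2, 3, 1, 2]
rater_b = [1, 2, 3, 3, 4, 3, 2, 3, 1, 2]

def cohens_kappa(a, b):
    """Agreement between two raters, corrected for chance agreement."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    counts_a, counts_b = Counter(a), Counter(b)
    expected = sum(counts_a[c] * counts_b.get(c, 0) for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

print(round(cohens_kappa(rater_a, rater_b), 2))  # about 0.73 for this made-up data
```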
Test-Retest Reliability
Do customers provide the same set of responses when nothing about their experience or their attitudes has
changed? You don’t want your measurement system to fluctuate when all other things are static.
Have a set of participants answer a set of questions (or perform a set of tasks). Later (by at least a few days,
typically), have them answer the same questions again. When you correlate the two sets of measures, look for
very high correlations (r > 0.7) to establish retest reliability.
As you can see, there’s some effort and planning involved: you need for participants to agree to answer the
same questions twice. Few questionnaires measure test-retest reliability (mostly because of the logistics), but
with the proliferation of online research, we should encourage more of this type of measure.
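A minimal sketch of the retest correlation, assuming the same questionnaire is administered twice to the same people; the scores below are invented for illustration.

```python
import math

# Hypothetical scores from the same eight participants, collected about a
# week apart with nothing about their experience changed in between.
time_1 = [72, 65, 88, 90, 54, 77, 81, 60]
time_2 = [70, 68, 85, 92, 58, 75, 83, 57]

def pearson_r(x, y):
    """Pearson correlation coefficient between two lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

print(round(pearson_r(time_1, time_2), 2))  # above roughly 0.7 suggests acceptable retest reliability
```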
Parallel Forms Reliability
Getting the same or very similar results from slight variations on the question or evaluation method also
establishes reliability. One way to achieve this is to have, say, 20 items that measure one construct (satisfaction,
loyalty, usability) and to administer 10 of the items to one group and the other 10 to another group, and then
correlate the results. You’re looking for high correlations and no systematic difference in scores between the
groups.
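As a sketch of the idea, assume a variant in which the same participants complete both ten-item halves, so each person has a total on form A and a total on form B that can be correlated; the totals below are invented, and statistics.correlation requires Python 3.10 or later.

```python
from statistics import correlation  # Pearson's r, available in Python 3.10+

# Hypothetical totals for eight participants on the two ten-item halves
# of a 20-item scale (each half scored 10-50).
form_a_totals = [42, 35, 48, 29, 38, 44, 31, 40]
form_b_totals = [40, 37, 47, 31, 36, 45, 33, 41]

# A high correlation with no systematic difference in scores suggests the
# two halves behave as parallel forms of the same construct.
print(round(correlation(form_a_totals, form_b_totals), 2))
```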
Internal Consistency Reliability
This is by far the most commonly used measure of reliability in applied settings. It’s popular because it’s the
easiest to compute using software—it requires only one sample of data to estimate the internal consistency
reliability. This measure of reliability is described most often using Cronbach’s alpha (sometimes called
coefficient alpha).
It measures how consistently participants respond to one set of items. You can think of it as a sort of average of
the correlations between items. Cronbach’s alpha ranges from 0.0 to 1.0 (a negative alpha means you probably
need to reverse some items). Since the late 1960s, the minimally acceptable measure of reliability has been
0.70; in practice, though, for high-stakes questionnaires, aim for greater than 0.90. For example, the SUS has
a Cronbach’s alpha of 0.92.
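For concreteness, here is a minimal sketch of coefficient alpha computed from one sample of item responses, using the standard formula alpha = k/(k-1) × (1 − sum of item variances / variance of total scores); the responses are invented for illustration.

```python
from statistics import pvariance

# Hypothetical responses: five participants answering a four-item scale (1-5).
# Rows are participants, columns are items.
responses = [
    [4, 5, 4, 4],
    [3, 3, 2, 3],
    [5, 5, 4, 5],
    [2, 3, 2, 2],
    [4, 4, 5, 4],
]

def cronbach_alpha(data):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    k = len(data[0])
    item_vars = [pvariance([row[i] for row in data]) for i in range(k)]
    total_var = pvariance([sum(row) for row in data])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

print(round(cronbach_alpha(responses), 2))  # about 0.95 for this made-up data
```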
The more items you have, the more internally reliable the instrument, so to increase internal consistency
reliability, you would add items to your questionnaire. Since there’s often a strong need to have few items,
however, internal reliability usually suffers. When you have only a few items, and therefore usually lower
internal reliability, having a larger sample size helps offset the loss in reliability.
Here are a few things to keep in mind about measuring reliability:
• Reliability is the consistency of a measure or method over time.
• Reliability is necessary but not sufficient for establishing a method or metric as valid.
• There isn’t a single measure of reliability; instead, there are four common measures of consistent responses.
• You’ll want to use as many measures of reliability as you can (although in most cases one is sufficient to understand the reliability of your measurement system).
• Even if you can’t collect reliability data, be aware of the ways in which low reliability may affect the validity of your measures, and ultimately the veracity of your decisions.
