Module 2 - Techniques For Testing
(Coombe, Folse & Hubley, 2007).
Unit 2: Techniques for Testing
Constructing test items and tasks for any type of assessment is an important and
challenging undertaking. Why? Test items are the
foundation of tests and the backbone of most assessment instruments.
With her exam specs firmly in hand, Mrs. Wright starts to prepare her midterm exam.
In doing so, she takes a number of steps to ensure that her students are assessed fairly.
Test items fall into two broad families: objective and subjective. Subjective items call
for longer answers, open responses, and an emphasis on production; they are discussed
later in this unit. We begin with the three most common objective formats used in
English language testing, namely True/False (T/F), multiple choice (MCQ) and
matching.
True/False
True/False questions are second only to MCQs in frequency of use in professionally produced tests and perhaps one of the most popular formats for teacher-produced
tests. Basically, they are a specialized form of the MCQ format in which there are
only two possible alternatives and students must classify their answers into one of two
response categories. The common response categories are: True/False, yes/no,
correct/incorrect, right/wrong or fact/opinion. Because True/False is the most
common response category, these questions are generally referred to as True/False
questions.
True/False questions are typically written as statements, and the student's task is to
decide whether the statements are True or False. They are attractive to many test
developers because they offer several advantages. First of all, when you use this
question type, you can test large amounts of content. Additionally, because
True/False questions are shorter than most other item types, they typically require less
time for students to respond to them. Consequently, more items can be incorporated
into tests than is possible with other item types, which increases reliability. Another
big advantage is that scoring is quick and reliable and can be accomplished efficiently
and accurately.
Despite their advantages, True/False questions have several disadvantages. Perhaps
the one that is the most commonly cited is that they are perceived as easy by both
teachers and students because of the guessing factor that is associated with them.
With only two possible answers, there is always the danger that guessing may distort
or inflate the final mark. To overcome this disadvantage, it is recommended that
teachers utilize a third response category called the Not Given or the Not Enough
Information category. By doing so, the guessing factor goes from 50% to a more
acceptable 33 1/3%. Yet another way to alleviate this problem is by asking students
to correct false statements or to find statements in the text that support either a true or
a false response. These two methods increase the diagnostic value of True/False
questions. A second disadvantage of True/False items is that in order for them to be
reliable, you need to include a sufficient number of them on the test.
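As a rough illustration of the guessing factor, the short Python sketch below simulates a student who guesses blindly on every item, first with two response categories and then with three (the 30-item test length and the number of simulated attempts are arbitrary assumptions):

import random

def expected_guess_score(num_options, num_items=30, trials=10000):
    # Average percentage score of a student who guesses blindly on every item,
    # estimated over many simulated attempts at a test of the same length.
    total = 0.0
    for _ in range(trials):
        key = [random.randrange(num_options) for _ in range(num_items)]
        guesses = [random.randrange(num_options) for _ in range(num_items)]
        total += sum(k == g for k, g in zip(key, guesses)) / num_items
    return 100 * total / trials

print(f"True/False (two categories): {expected_guess_score(2):.1f}%")           # about 50%
print(f"True/False/Not Given (three categories): {expected_guess_score(3):.1f}%")  # about 33%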
Tips for Writing Good True/False Questions:
The following tips can help you write effective True/False questions.
Write items that test meaning rather than trivial detail
True/False items are said to test gist or intensive understanding very well.
Questions should be written at a lower level of language difficulty than
the text.
This is important because you want to ensure that comprehension is based on
understanding of the text and not understanding of the question itself. (This is
particularly important for lower-proficiency learners, especially K-12 learners.)
Have students circle their responses or write out the full words True or False
By doing so, you will avoid getting those Ts that suspiciously look like Fs. This
will facilitate your marking substantially.
Multiple-Choice Questions
MCQs are probably the most commonly used format in professionally developed
tests. They are widely used to assess learning at the recall and comprehension levels.
Although they are more difficult to write than True/False questions, the job becomes
easier with the correct training and a little practice.
MCQs take many forms, but their basic structure is stem and response options. It is
the test taker's task to identify the correct or most appropriate choice. The stem is
usually written as a question (e.g., Where did John go?) or an incomplete statement
(e.g., John _______ to the store). The response options of an MCQ are all the choices
given to the test taker. There are usually four when testing reading, vocabulary and
grammar but three with listening. They are most commonly expressed as A, B, C and
D. One of these response options is the key or correct answer. The others are referred
to as distractors or incorrect response options. The purpose of distractors is to
distract students' attention away from the key if they do not know the correct answer.
The popularity of MCQs is based on a number of advantages associated with this
format. First of all, provided they are written well, they are very reliable because
there is only one answer possible. Another advantage is that they can be useful at
various levels. MCQs can be used to assess language at the elementary level and
content associated with graduate level language education. A third advantage is that
assessment is not affected by the test taker's ability to write, as test takers are only required
to circle the correct response or pencil in a bubble sheet. In addition, MCQs are well-liked by administrators as cost-effective item formats because they can even be
scored by computer if the institution has the correct equipment. This advantage
makes them quite easy to analyze. Finally, students from all parts of the world are
familiar with this format.
As with any item format, there are some distinct disadvantages to using MCQs.
Probably the one that is the most cited by teachers is that MCQs do not lend
themselves to the testing of productive language skills or language as communication.
The ability to test primarily recognition knowledge restricts what can be tested with
this format. Another disadvantage is that MCQs encourage guessing, which can
sometimes have an effect on exam results. A third disadvantage, one that most
teachers do not appreciate, is that it is challenging and time-consuming to write
plausible distractors and produce good items.
Common MCQ Item Violations:
In this section, we will present examples of the most common MCQ item violations
and suggest ways to repair them.
Grammatical Inconsistency
A common mistake when developing MCQs is grammatical inconsistency between
the stem and the response options. Almost always, the stem and the key are
grammatically consistent, but distractors oftentimes do not mesh properly with the
stem.
Jane spent most of the day at the mall but she _________ anything.
A. didn't buy
B. bought
C. not buy
D. many shops
In this item, A is of course the key. Distractor D is grammatically inconsistent with
the other response options: it is a noun phrase while the others are all verb forms. To
fix this item, distractor D should be changed to a verb form such as buying.
Extraneous cues or clues
Cueing can occur in two places on a test: within an item or within the test. An
extraneous clue that occurs within the test is one where students can find the answer
to a question somewhere else on the test paper in another section of the test. Consider
the following cueing violation within an item:
After I've had a big lunch, I only want an ___________ for dinner.
A. pot of soup
B. apple
C. big steak
D. candy bar
The key here is B, as it is the only response option that takes the article 'an' in the stem.
In this item, a student only needs to know the grammatical rules concerning 'a' and 'an'
to figure out the correct response. To fix this item, consider putting 'a/an' in the stem.
3 for 1 Split
This item violation occurs when three response options are parallel and one is not. It is
sometimes called 'odd man out'. The violation varies in its degree of seriousness: it is
serious if the unparallel option is the key. (If the odd option out is not the key, then the
violation is considered less serious.)
The company was in desperate need of more workers so they __________ an
expensive ad in the newspaper.
A. placing
B. to place
C. placement
D. placed
In this item, D is the key. The 3 for 1 split is three verb forms (A, B and D) and one noun
(C). (As 3 for 1 splits go, this is not a terrible one, as the key is not the odd man out.)
Impure items
Impure items are ones that test more than one thing.
I didn't see ______________.
A. had the students gone
B.
C.
D.
This item tests both verb tense and word order. Remember that good items should
test only one concept or point.
Apples and Oranges
An apples and oranges violation is one where two response options have no relation to
the other two. This is often referred to as a 2 for 2 split. There are instances where 2
for 2 splits are acceptable (e.g., the case of opposites or affirmative/negative).
Nowadays people use mobile phones _____________.
A. frequently
B. seldom
C. in their cars
D. for emergency purposes
Response options A and B are adverbs while C and D are prepositional phrases. This item
would be better if all the response options were either adverbs or prepositional
phrases.
Subsuming Response Options
In this item violation, the intended answer and a very good distractor could be correct.
Consider the following sample listening item.
Mary: We need to buy some new lawn furniture for the patio.
Steve: OK. I'll go to the mall tonight. Any special kind you're looking for?
Mary: Something cheap and comfortable. Just don't get anything made of metal, please.
What will Steve buy?
A. outdoor furniture
B. comfortable chairs
C. steel furnishings
Although the key is B, it can be argued that A is also correct on a higher level because
comfortable chairs are a kind of outdoor furniture. It is a sign of a poor item when
two response options can be considered correct. (Notice that for a listening item, there
are only three response options for students to choose from.)
Intersecting items
These are items where some of the response options relate to only one concept or part
of the answer, while another option relates to (intersects with) more than one concept.
The problem is that this intersecting option is almost invariably the correct answer or
key.
According to the article, the best time to take vitamins is _______________.
A. before breakfast
B. on a full stomach
C. with meals
In this question, three options have to do with eating and one with the time of day.
Only one relates to both and it is the correct answer or key.
Unparallel Options
This item violation occurs when the response options are not parallel either in length
or in grammatical consistency.
A related concern when reviewing items is emotionally sensitive content. For a group
of ESL students from Mexico, an item referring to the tsunami is not a problem.
However, for students who might have come from a country hit by the tsunami, such
an item could be too upsetting. Test taking is a very scary experience for most
students. It is important that the content they encounter on tests not add to that fear.
Double Answer or Key
This is the item violation most commonly made by teachers. It occurs when more
than one response option is correct.
The teacher waited in her office until her students __________.
A. came
B. would come
C. come
D. had come
Many would argue that both A and D are correct responses.
No Answer
This is the second most common item violation made by teachers. It occurs when the
author of the test item forgets to include the key among the list of response options.
This most often occurs when the item has gone through various incarnations and
rewrites.
This is the restaurant ___________ I told you about yesterday.
A. what
B. where
C. why
D. how
Giveaway Distractors
This violation occurs when test takers can improve their scores by eliminating absurd
or giveaway distractors.
According to the text, the author of the article comes from __________.
A. Dubai
B. France
C. Buenos Aires
D. Disneyland
Students can easily eliminate D so it is not an effective distractor. It adds nothing to
this question.
Tips for Writing Good Multiple-choice Questions:
Multiple-choice questions are the hardest type of objective question to write for
classroom teachers. Teachers should keep the following guiding principles in mind
when writing MCQs:
The question or task should be clear from the stem of the MCQ.
Teachers should write MCQs that test only one concept.
Take background knowledge into account.
The selection of the correct or best answer should involve interpretation of the
passage/stem, not merely the activation of background knowledge.
As much context as possible should be provided.
The MCQ format is often criticized for its lack of authenticity. Whenever possible,
set items in context.
Randomly assign correct answers.
Don't unconsciously introduce a pattern into the test that will help students who are
guessing or who do not know the answer to get the correct answer. Don't neglect
placing the answer in the A position. Research has shown that this is the position that
is the most neglected: because teachers want students to read through all the response
options before responding, and because students who find the key in the A position
often do not go on to read the other options, teachers unconsciously place the actual
answer further down in the list of response options. One way to randomize answers is
to alphabetize them, as the brief sketch below illustrates. By doing so, the correct
answer will automatically vary in the A, B, C, or D position.
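As a rough illustration of the alphabetizing approach, the short Python sketch below sorts each item's response options alphabetically and reports where the key lands (the two items and their options are invented for illustration only):

# A minimal sketch of the alphabetize-the-options approach to randomizing
# key positions. The items below are invented for illustration only.
items = [
    {"stem": "Jane spent most of the day at the mall but she ____ anything.",
     "options": ["didn't buy", "bought", "not buying", "was buying"],
     "key": "didn't buy"},
    {"stem": "The company placed an expensive ____ in the newspaper.",
     "options": ["ad", "role", "bet", "order"],
     "key": "ad"},
]

for item in items:
    ordered = sorted(item["options"], key=str.lower)      # alphabetical order
    key_letter = "ABCD"[ordered.index(item["key"])]       # where the key lands
    print(item["stem"])
    for letter, option in zip("ABCD", ordered):
        print(f"  {letter}. {option}")
    print(f"  (key: {key_letter})\n")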
Move recurring information in response options to the stem.
If the same words appear in all response options, take these words out of the response
options and put them in the stem.
Avoid writing absurd or giveaway distractors.
Do not waste space by including funny or implausible distractors in your items. All
distractors should appear for a valid pedagogical reason.
Avoid extraneous clues.
Avoid unintentional grammatical, phonological or morphological clues that assist
students in answering an item without having the requisite knowledge or skill being
tested.
Make the stem positive.
Writing the stem in the affirmative tends to make the question more understandable.
Introducing negatives increases the difficulty and discrimination of the question.
However, if you must make it negative, place the negative near the end of the
statement (e.g., 'Which of the following is NOT...' or 'All of the following are
_________ except...').
Make sure all questions are independent of one another.
Avoid sequential items where the successful completion of one question presupposes
a correct answer to the preceding question. This presents students with a double
jeopardy situation. If they answer the first question incorrectly, they automatically
miss the second question, thereby doubly penalizing them.
Use statistics to help you understand your MCQs.
Statistics like item analysis can assist you in your decisions about items. Use
statistics to help you decide whether to accept, discard or revise MCQs. (See chapter
______ for information on item analysis.)
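To make item analysis a little more concrete, here is a minimal Python sketch that computes two common statistics for each MCQ: the facility value (the proportion of students answering it correctly) and a simple discrimination index comparing the top and bottom thirds of the class (the response data below are invented, and real item analysis offers more refined indices):

# A minimal item-analysis sketch. Each row is one student's responses to
# five MCQs, marked 1 (correct) or 0 (incorrect); the data are invented.
responses = [
    [1, 1, 1, 0, 1],
    [1, 1, 1, 1, 0],
    [1, 0, 1, 0, 1],
    [1, 1, 0, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0],
]

num_students = len(responses)
num_items = len(responses[0])
third = max(1, num_students // 3)

# Rank students by total score, then compare the top and bottom thirds.
ranked = sorted(responses, key=sum, reverse=True)
top, bottom = ranked[:third], ranked[-third:]

for item in range(num_items):
    facility = sum(student[item] for student in responses) / num_students
    discrimination = (sum(s[item] for s in top) - sum(s[item] for s in bottom)) / third
    print(f"Item {item + 1}: facility = {facility:.2f}, discrimination = {discrimination:.2f}")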
Main Idea MCQ Format
The testing of the main idea of a text is frequently done via MCQs. The
recommended word count of the paragraph or text itself should be based on course
materials. One standard way to test main idea of a text or a paragraph within a text
employs an MCQ format with the response options written in the following way:
TS (too specific) This distractor focuses on one detail within the text or
paragraph.
OT (off topic) Depending on the level of the students, this distractor is written so
that it reflects an idea that is not developed in the paragraph or text. For more
advanced students, the idea would be related in some way.
Matching:
Another common objective format is matching. Matching is an extended form of
MCQ that draws upon the students' ability to make connections between ideas,
vocabulary, and structure. Matching questions present the students with two columns
of information. Students must then find the matches between the two columns. Items
in the left-hand column are called premises or stems and the items in the right-hand
column are called options. The advantage over MCQs is that the student has more
distractors per item. Additionally, writing items in the matching format is somewhat
easier for teachers than either MCQs or True/False/Not Given.
Tips for Writing Matching Items:
These are some important points to bear in mind when writing matching questions:
There should be more options than premises
Never write items that rely on direct 1-on-1 matching. The consequence of 1-on-1
matching is that if a student gets one item wrong, at least two (but potentially many
more) are wrong by default. By contrast, if the student gets all previous items right,
the last item is a process of elimination freebie.
Number the premises and letter the options
To facilitate marking, number the premise column and letter the option column. Then
have students write the letter of the correct answer in the space provided.
Make options shorter than premises
When developing matching questions, you should write options that are relatively
short. This will reduce the reading load on the part of students.
Options and premises should be related to one central theme
Relating the information in both columns to one central theme will make for a more
coherent test section.
Avoid widows
Widows occur when part of a test section or question spills over onto another page.
Make sure the matching section (and all other sections) on your test is all on the same
page. Students might fail to see and answer items that continue on another page.
Make it clear to students if they can use options more than once
Be sure to explicitly state in the rubric whether options can be used more than once.
If this is not permitted, you might advise students to cross out options they have
already used.
Have students write the letter of the correct answer in a blank provided.
Failure to include this in the rubric may lead students to draw lines between options
and premises, making their answers next to impossible to grade.
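As a rough sketch of how these layout tips fit together, the short Python example below prints a matching section with numbered premises, lettered options, more options than premises, and a blank for the answer letter (the vocabulary premises and options are invented for illustration):

# A minimal layout sketch for a matching section: numbered premises,
# lettered options, more options than premises, and a blank for the letter.
premises = ["enormous", "rapid", "ancient"]                    # invented examples
options = ["very big", "very fast", "very old", "very new", "very small"]

assert len(options) > len(premises), "provide more options than premises"

print("Write the letter of the correct option in the blank provided.\n")
for number, premise in enumerate(premises, start=1):
    print(f"{number}. {premise}  ____")
print()
for letter, option in zip("ABCDEFGH", options):
    print(f"{letter}. {option}")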
Subjective Items
Subjective items often offer better ways of testing language skills than their objective
counterparts. Subjective items allow for much greater freedom and flexibility in the
answers they require. In this section, we will discuss common subjective items used
on language tests. They include cloze and gap fill, short answer and completion
items, and essay questions.
Cloze/Gap Fill Items:
Many teachers don't distinguish between gap fill and cloze tests. However, there are
some important differences between the two. In gap fill questions we normally choose
the words we delete, whereas in cloze tests the words are deleted systematically.
Cloze testing originated in the 1950s as a test of reading comprehension.
Conventional cloze tests involve the removal of words at regular intervals, usually
every 6-8 words and normally no more frequently than every fifth word. The student's task is to
complete the gaps with appropriate fillers. To do this, students have to read around
the gap. More specifically, they must refer to the text on either side of the gap taking
into account meaning and structure to process the answer. Although they remain
primarily a test of reading, cloze formats can test a wide variety of language areas.
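As a rough illustration of fixed-ratio deletion, the Python sketch below blanks out every seventh word of a passage after an intact lead-in and keeps an answer key (the passage, the deletion interval and the length of the lead-in are arbitrary choices for illustration):

# A minimal sketch of conventional cloze construction: after an intact
# lead-in, replace every nth word with a numbered blank and keep the key.
def make_cloze(text, n=7, lead_in=10):
    words = text.split()
    key = []
    for i in range(lead_in, len(words), n):
        key.append(words[i])
        words[i] = f"({len(key)}) ________"
    return " ".join(words), key

passage = ("Cloze testing originated in the 1950s as a test of reading "
           "comprehension. Conventional cloze tests involve the removal of "
           "words at regular intervals, and the student's task is to complete "
           "the gaps with appropriate fillers by reading around each gap.")

cloze_text, answer_key = make_cloze(passage)
print(cloze_text)
print("Key:", answer_key)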
Gap fill items are those where a word or phrase is replaced by a blank in a sentence.
The student's task is to fill in the missing word or phrase. Harrison (1983) identifies
two types of gap fills: function gaps (such as prepositions, articles, conjunctions)
which have only one correct filler and semantic gaps (such as nouns, adjectives,
verbs and adverbs) that can be filled with a number of different alternatives (p. 40).
Tips for Writing Cloze/Gap Fill Items:
Answers should be short and concise
The response that goes in the blank should not be overly long. Make sure there is
enough room in the blank to comfortably write the response.
Provide enough context
There needs to be sufficient context present for students to surmise what goes in the
blank.
Blanks should be of equal length
When putting blanks in your paragraph/text, make sure they are the same length.
Providing blanks that differ in length implies responses of varying lengths.
The main body of the question should precede the blank.
Develop and allow for a list of acceptable responses
When grading cloze/gap fill items, be sure to allow for the possibility of more than
one answer.
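As a small sketch of what allowing several acceptable responses can look like in practice, the Python example below marks each blank against a list of accepted answers (the answer key shown is invented for illustration):

# A minimal marking sketch for gap-fill items that accepts more than one
# answer per blank. The acceptable answers below are invented.
acceptable = {
    1: {"went", "walked", "drove"},   # blank 1: any of these earns the mark
    2: {"an"},                        # blank 2: only one correct filler
}

def mark_blank(blank_number, student_answer):
    # Award one point if the student's answer is on the accepted list.
    return int(student_answer.strip().lower() in acceptable[blank_number])

print(mark_blank(1, "Walked"))   # 1 (accepted alternative)
print(mark_blank(2, "a"))        # 0 (not on the list)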
Student Progress Cards
Oscarsson (1984) describes student progress cards as simple self-assessment tools that
have been used in a variety of educational settings around the world. Quite simply,
student progress cards define short-term functional goals and group these together in
graded blocks at various levels of difficulty. Both students and teachers can
participate in this activity. The students can check off (in the learner column) each
language skill or activity that they are sure of performing successfully. The teacher
can later check off (in the teacher column) the activity once the learner has mastered
it. A sample progress card sets out each objective in a table with three columns:
Objective, Student, and Teacher.
Rating Scales, Checklists, and Questionnaires
A popular technique in the area of self-assessment has been the use of rating scales,
checklists, and questionnaires. These three techniques have been used as a means by
which learners can rate their perceived general language proficiency or ability level.
A great deal of developmental work has been done in this area through the use of
ability statements such as 'I can read and understand newspaper articles intended
for native speakers of the language' (Oscarsson, 1984).
Learner Diaries and Dialog Journals
Learner diaries and dialog journals have been proposed as one way of systematizing
self-assessment for students. Learners should be encouraged to write about what they
learned, their perceived level of mastery over the course content, and what they plan
to do with their acquired skills. These techniques will be discussed in more depth in
Chapter 4.
Videotapes
In today's technological age, no other audiovisual aid can match the potential of the
video recorder. Video can be exploited in a number of ways to encourage self-assessment
in the classroom. For example, students can be videotaped or they can
videotape each other and then assess their language skills. An obvious advantage to
the use of video in self-assessment is that students can assess not only their
communicative or language skills but their paralinguistic (i.e. body language) skills as
well.
Portfolio Assessment
Portfolios are collections (either paper or electronic) assembled by both teacher and
student of representative samples of on-going work over a period of time. The best
portfolios are more than a scrapbook or a folder of 'all my papers'; they contain a
variety of work in various stages and utilize multiple media. Portfolios will be
discussed in more depth in Chapter 4.
Student-designed Tests
A novel approach within alternative assessment is to have students write tests on
course material. This process results in greater learner awareness of course content,
test formats, and test strategies. Student-designed tests are good practice and review
activities that encourage students to take responsibility for their own learning.
Learner-centered Assessment
Learner-centered assessment advocates using input from learners in many areas of
testing. For example, students can select the themes, formats and marking schemes to
be used. Involving learners in aspects of classroom testing reduces test anxiety and
results in greater student motivation.
Projects
Typically, projects are content-based and involve a group of students working
together to find information about a topic. In the process, they use authentic
information sources and have to evaluate what they find. Projects usually culminate
in a final product in which the information is given to others. This product could be a
presentation, a poster, a brochure, a display, a webquest, or one of many other
options. An additional advantage of projects is that they integrate language skills in a
real-life context.
Presentations
Presentations can be an assessment tool in themselves for speaking, but more often
they are integrated into other forms of alternative assessment. Increasingly, students
make use of computer presentation software like Microsoft PowerPoint which helps
them to clarify the organization and sequence of their presentation. Presentations are
another real-life skill that gives learners an opportunity to address some of the sociocultural aspects of communication such as using appropriate register and discourse.
A useful introduction to the world of academic presentations is the 'Icebreaker'
speech. In this activity, students must write out, practice and deliver a 4- to 6-minute
speech about themselves. Information they may want to include: personal
details like age, place of birth, family information, hobbies, and future goals. The
purpose of this presentation is to get students up in front of an audience for the first
time. An added advantage if done at the beginning of the semester is for the teacher
and class to get to know one another.
Ten Things to Remember About Testing Techniques
1.
2. Format should remain the same within one section of the exam
It is confusing to mix formats within the same section of a test.
3. Make sure the item format is correctly matched to the test purpose and course content
Test items should relate to curricular objectives. Before writing test content, teachers
should think about what they are trying to test and match their purpose with the item
format that most closely resembles it.
4. Include items of varying levels of difficulty
Present items of different levels of difficulty throughout the test from easy to difficult.
We recommend the 30/40/30 principle. When constructing a test, try to gear 30% of
the questions to the below-average students in the class and 40% of the questions to
those with mid-range abilities. The remaining 30% of the questions should be
directed toward those students who are above average in their language ability. In
this way, everyone in the class, no matter what their ability level, will have access to
some of the questions.
5. Start with an easy question first
If you follow the 30/40/30 principle mentioned above, start the exam with one of the
questions coming from the easy group. This will relax students and their anxiety
levels should decrease.
6. Avoid ambiguous items, negatives, and, most especially, double negatives
Unless your purpose as a tester is to test ambiguity, ambiguous items should be
avoided at all costs. Sometimes ambiguous language causes students to answer
incorrectly even when they know the answer. Negatives and double negatives are
extremely confusing and should be avoided unless the intention is to test negatives.
Only relevant information should be presented.
7. Avoid race, gender and ethnic background bias
Sometimes test content unfairly biases certain groups. To avoid this, examine your
items carefully. Make sure that at least one other colleague has looked over your
items.
8. Answer keys should be prepared in advance of test administration
Develop keys and model answers at the same time as you develop the test or
assessment task.
9. Employ items that test all levels of thinking
Avoid lifting items verbatim from the test stimulus; such items do not require a great
deal of processing from our students. Try to include items/tasks of varying degrees of
sophistication and levels of thinking. Remember the six levels in Bloom's Taxonomy
(1984): Knowledge, Comprehension, Application, Analysis, Synthesis, and
Evaluation.
10. Give clear instructions to students
When we assess our students, we want to know if they comprehend the questions we
are asking them. Directions that are too elaborate, for example, could impede student
comprehension, thereby skewing test results.
Extension Activities
Test Review Activity
Mr. Knott has just developed his midterm English exam for his integrated skills ESL
class. Take a look at the grammar and vocabulary section. Which aspects of his test
are good? What could be done to improve the test?
Only one of the eleven items in Mr. Knott's grammar and vocabulary section is
reproduced here:

8. Although the proposal has some disadvantages, they are outweighed by the
_______________.
a. advantages
b. negatives
c. drawbacks
d. problem
See the appendix for a critical review of Mr. Knott's grammar and vocabulary test.