Clinical review

ABC of learning and teaching in medicine

Written assessment
Lambert W T Schuwirth, Cees P M van der Vleuten

Choosing the most appropriate type of written examination for a certain purpose is often difficult. This article discusses some general issues of written assessment and then gives an overview of the most commonly used types, together with their major advantages and disadvantages.

Some misconceptions about written assessment may still exist, despite being disproved repeatedly by many scientific studies. Probably the most important misconception is the belief that the format of the question determines what the question actually tests. Multiple choice questions, for example, are often believed to be unsuitable for testing the ability to solve medical problems. The reasoning behind this assumption is that all a student has to do in a multiple choice question is recognise the correct answer, whereas in an open ended question he or she has to generate the answer spontaneously. Research has repeatedly shown, however, that the question’s format is of limited importance and that it is the content of the question that determines almost totally what the question tests.

This does not imply that question formats are always interchangeable: some knowledge cannot be tested with multiple choice questions, and some knowledge is best not tested with open ended questions.

Five criteria can be used to evaluate the advantages and disadvantages of question types: reliability, validity, educational impact, cost effectiveness, and acceptability. Reliability pertains to the accuracy with which a score on a test is determined. Validity refers to whether the question actually tests what it is purported to test.

Educational impact is important because students tend to focus strongly on what they believe will be in the examinations. Therefore they will prepare strategically depending on the question types used. Whether different preparation leads to different types of knowledge is not fully clear, however. When teachers are forced to use a particular question type, they will tend to ask about the themes that can be easily assessed with that question type, and they will neglect the topics for which the question type is less well suited. Therefore, it is wise to vary the question types in different examinations.

Cost effectiveness and acceptability are important because the costs of different examinations have to be taken into account, and even the best designed examination will not survive if it is not accepted by teachers and students.

• A score that a student obtains on a test should indicate the score that this student would obtain in any other given (equally difficult) test in the same field (“parallel test”)
• A test represents at best a sample, selected from a range of possible questions. So if a student passes a particular test one has to be sure that he or she would not have failed a parallel test, and vice versa
• Two factors influence reliability negatively:
  - Sample error: the number of items may be too small to provide a reproducible result
  - Sample too narrow: if the questions focus only on a certain element, the scores cannot generalise to the whole discipline

• The validity of a test is the extent to which it measures what it purports to measure
• Most competencies cannot be observed directly (body length, for example, can be observed directly; intelligence has to be derived from observations). Therefore, in examinations it is important to collect evidence to ensure validity:
  - One simple piece of evidence could be, for example, that experts score higher than students on the test
  - Alternative approaches include (a) an analysis of the distribution of course topics within test elements (a so called blueprint) and (b) an assessment of the soundness of individual test items
• Good validation of tests should use several different pieces of evidence

“True or false” questions

The main advantage of “true or false” questions is their conciseness. A question can be answered quickly by the student, so the test can cover a broad domain. Such questions, however, have two major disadvantages. Firstly, they are quite difficult to construct flawlessly: the statements have to be defensibly true or absolutely false. Teachers must be taught thoroughly how to construct this question type. Secondly, when a student answers a “false” question correctly, we can conclude only that the student knew the statement was false, not that he or she knew the correct fact.

True or false questions are most suitable when the purpose of the question is to test whether students are able to evaluate the correctness of an assumption; in other cases they are best avoided.

“Single, best option” multiple choice questions

Multiple choice questions are well known, and there is extensive experience worldwide in constructing them. Their main advantage is the high reliability per hour of testing, mainly because they are quick to answer, so a broad domain can be covered. They are often easier to construct than true or false questions and are more versatile. If constructed well, multiple choice questions can test more than simple facts. Unfortunately, though, they are often used to test only facts, as teachers often think this is all they are fit for.

Multiple choice questions can be used in any form of testing, except when spontaneous generation of the answer is essential, such as in creativity, hypothesising, and writing skills.

Teachers need to be taught well how to write good multiple choice questions.

Multiple true or false questions

These questions enable the teacher to ask a question to which there is more than one correct answer. Although they take somewhat longer to answer than the previous two types, their reliability per hour of testing time is not much lower. Construction, however, is not easy. It is important to have sufficient distracters (incorrect options) and to find a good balance between the number of correct options and distracters. In addition, it is essential to construct the question so that correct options are defensibly correct and distracters are defensibly incorrect. A further disadvantage is the rather complicated scoring procedure for these questions.

Example of a multiple true or false question
Which of the following drugs belong to the ACE inhibitor group?
(a) atenolol
(b) pindolol
(c) amiloride
(d) furosemide (frusemide)
(e) enalapril
(f) clopamide
(g) epoprostenol
(h) metoprolol
(i) propranolol
(j) triamterene
(k) captopril
(l) verapamil
(m) digoxin
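The article calls the scoring of multiple true or false questions rather complicated but does not prescribe a procedure. As an illustration only, the Python sketch below compares three plausible scoring rules on the ACE inhibitor example, in which enalapril (e) and captopril (k) are the correct options. The rules and the function names are assumptions made for this sketch, not the authors' method.

```python
# Hypothetical scoring rules for a multiple true or false item (illustrative only).

OPTIONS = set("abcdefghijklm")  # options (a) to (m) in the ACE inhibitor example
KEY = {"e", "k"}  # enalapril and captopril are the ACE inhibitors among the options


def per_option_score(selected: set, key: set, options: set) -> float:
    """Treat every option as a separate true/false judgement.

    An option counts as correct if it was selected and is keyed, or was left
    unselected and is a distracter. Returns the fraction judged correctly.
    """
    correct = sum(1 for opt in options if (opt in selected) == (opt in key))
    return correct / len(options)


def all_or_nothing_score(selected: set, key: set) -> float:
    """Full credit only if exactly the keyed options were selected."""
    return 1.0 if selected == key else 0.0


def hits_minus_false_alarms(selected: set, key: set) -> float:
    """Correct selections minus incorrect selections, scaled by the number
    of keyed options and floored at zero."""
    hits = len(selected & key)
    false_alarms = len(selected - key)
    return max(0.0, (hits - false_alarms) / len(key))


# A student who picks enalapril, captopril and (wrongly) atenolol:
answer = {"e", "k", "a"}
print(round(per_option_score(answer, KEY, OPTIONS), 3))  # 0.923 (12 of 13 options right)
print(all_or_nothing_score(answer, KEY))  # 0.0
print(hits_minus_false_alarms(answer, KEY))  # 0.5
```

Under the per-option rule a completely blank answer still scores 11/13, whereas the other two rules give it zero, which is one reason the choice of scoring rule matters for this format.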

“Short answer” open ended questions

Open ended questions are more flexible, in that they can test issues that require, for example, creativity and spontaneity, but they have lower reliability. Because answering open ended questions is much more time consuming than answering multiple choice questions, they are less suitable for broad sampling. They are also expensive to produce and to score. When writing open ended questions it is important to describe clearly how detailed the answer should be, without giving away the answer. A good open ended question should include a detailed answer key for the person marking the paper. Short answer, open ended questions are not suitable for assessing factual knowledge; use multiple choice questions instead.

Short answer, open ended questions should be aimed at the aspects of competence that cannot be tested in any other way.

Open ended questions are perhaps the most widely accepted question type. Their format is commonly believed to be intrinsically superior to a multiple choice format. Much evidence shows, however, that this assumed superiority is limited.

Essays

Essays are ideal for assessing how well students can summarise, hypothesise, find relations, and apply known procedures to new situations. They can also provide an insight into different aspects of writing ability and the ability to process information. Unfortunately, answering them is time consuming, so only a few essays can be set in one examination, which limits their reliability.
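How strongly the number of questions drives reliability can be made concrete with the Spearman-Brown prophecy formula, a standard psychometric result that is not part of the original article. It predicts the reliability of a test lengthened by a factor n with comparable items: ρ′ = nρ / (1 + (n − 1)ρ). The Python sketch below assumes, purely for illustration, an essay examination with a reliability of 0.5; that figure is hypothetical, not a published value.

```python
# Spearman-Brown prophecy formula: predicted reliability when a test is
# lengthened (or shortened) by a factor n using comparable items.


def spearman_brown(reliability: float, n: float) -> float:
    """Predicted reliability of a test n times as long as the original."""
    if not 0 < reliability < 1:
        raise ValueError("reliability must lie strictly between 0 and 1")
    if n <= 0:
        raise ValueError("length factor must be positive")
    return n * reliability / (1 + (n - 1) * reliability)


def length_factor_needed(reliability: float, target: float) -> float:
    """How many times longer the test must be to reach the target reliability
    (the Spearman-Brown formula solved for n)."""
    if not (0 < reliability < 1 and 0 < target < 1):
        raise ValueError("reliabilities must lie strictly between 0 and 1")
    return target * (1 - reliability) / (reliability * (1 - target))


# Hypothetical essay examination with a reliability of 0.5:
for n in (1, 2, 4):
    print(n, round(spearman_brown(0.5, n), 3))
print(round(length_factor_needed(0.5, 0.8), 2))
```

Under these assumptions, doubling the number of comparable essays raises the predicted reliability to about 0.67, and four times as many essays are needed to reach 0.8, which suggests why time consuming formats struggle to reach high reliability within a single examination.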
When constructing essay questions, it is essential to define
the criteria on which the answers will be judged. A common
pitfall is to “over-structure” these criteria in the pursuit of
objectivity, and this often leads to trivialising the questions.
Some structure and criteria are necessary, but too detailed a
structure provides little gain in reliability and a considerable
loss of validity. Essays involve high costs, so they should be used
sparingly and only in cases where short answer, open ended
questions or multiple choice questions are not appropriate.

“Key feature” questions

In a key feature question, a description of a realistic case is followed by a small number of questions that require only essential decisions; these questions may be either multiple choice or open ended, depending on the content of the question. Key feature questions seem to measure problem solving ability validly and have good reliability. In addition, most people involved consider them to be a suitable approach, which makes them more acceptable.

However, the key feature approach is rather new and therefore less well known than the other approaches. Also, construction of the questions is time consuming; inexperienced teachers may need up to three hours to produce a single key feature case with questions. Experienced writers, though, may produce up to four an hour. Nevertheless, these questions are expensive to produce, and large numbers of cases are normally needed to prevent students from memorising cases. Key feature questions are best used for testing the application of knowledge and problem solving in “high stakes” examinations.

“Key feature” questions aim to measure problem solving ability validly without losing too much reliability.

Example of a key feature question
Case
You are a general practitioner. Yesterday you made a house call on Mr Downing. From your history taking and physical examination you diagnosed nephrolithiasis. You gave an intramuscular injection of 100 mg diclofenac, and you left him some diclofenac suppositories. You advised him to take one when in pain but not more than two a day. Today he rings you at 9 am. He still has pain attacks, which respond well to the diclofenac, but since 5 am he has also had a continuous pain in his right side and a fever (38.9°C).
Which of the following is the best next step?
(a) Ask him to wait another day to see how the disease progresses
(b) Prescribe broad spectrum antibiotics
(c) Refer him to hospital for an intravenous pyelogram
(d) Refer him urgently to a urologist

Extended matching questions

The key elements of extended matching questions are a list of options, a “lead-in” question, and some case descriptions or vignettes. Students should understand that an option may be correct for more than one vignette, and some options may not apply to any of the vignettes. Because there are many possible combinations of vignettes and options, the format minimises the recognition effect that occurs in standard multiple choice questions. Also, by using cases instead of facts, the items can be used to test application of knowledge or problem solving ability. They are easier to construct than key feature questions, as many cases can be derived from one set of options. Their reliability has been shown to be good. Scoring of the answers is easy and could be done with a computer.

The format of extended matching questions is still relatively unknown, so teachers need training and practice before they can write these questions. There is a risk that certain themes will be under-represented simply because they do not fit the format. Extended matching questions are best used when large numbers of similar sorts of decisions (for example, relating to diagnosis or ordering of laboratory tests) need testing for different situations.

Example of an extended matching question
(a) Campylobacter jejuni, (b) Candida albicans, (c) Giardia lamblia, (d) Rotavirus, (e) Salmonella typhi, (f) Yersinia enterocolitica, (g) Pseudomonas aeruginosa, (h) Escherichia coli, (i) Helicobacter pylori, (j) Clostridium perfringens, (k) Mycobacterium tuberculosis, (l) Shigella flexneri, (m) Vibrio cholerae, (n) Clostridium difficile, (o) Proteus mirabilis, (p) Tropheryma whippelii
For each of the following cases, select (from the list above) the micro-organism most likely to be responsible:
• A 48 year old man with a chronic complaint of dyspepsia suddenly develops severe abdominal pain. On physical examination there is general tenderness to palpation with rigidity and rebound tenderness. Abdominal radiography shows free air under the diaphragm
• A 45 year old woman is treated with antibiotics for recurring respiratory tract infections. She develops severe abdominal pain with haemorrhagic diarrhoea. Endoscopically a pseudomembranous colitis is seen
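The remark that extended matching answers could be scored by computer can be illustrated with a short Python sketch. It assumes each answer is recorded as the option letter chosen for each vignette, and that the key for the example above is (i) Helicobacter pylori for the first case (perforated peptic ulcer) and (n) Clostridium difficile for the second (pseudomembranous colitis after antibiotics). The function name and the one-mark-per-vignette rule are hypothetical choices, not taken from the article.

```python
# Hypothetical computer scoring of an extended matching item:
# one mark per vignette whose answer matches the keyed option.

VALID_OPTIONS = set("abcdefghijklmnop")  # options (a) to (p) in the example
ANSWER_KEY = {1: "i", 2: "n"}  # vignette number -> keyed option (assumed key)


def score_emq(responses: dict, key: dict) -> int:
    """Count vignettes answered with the keyed option.

    Missing answers score zero; an answer that is not one of the listed
    options raises an error so that data entry mistakes are caught.
    """
    score = 0
    for vignette, keyed_option in key.items():
        answer = responses.get(vignette, "").strip().lower()
        if answer and answer not in VALID_OPTIONS:
            raise ValueError(f"vignette {vignette}: {answer!r} is not a listed option")
        if answer == keyed_option:
            score += 1
    return score


# One student answers (e) for the first case and (N) for the second:
print(score_emq({1: "e", 2: "N"}, ANSWER_KEY))  # 1
```

Because the same option list serves every vignette, adding cases only means extending the key, which fits the point that many cases can be derived from one set of options.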

Choosing the best question type for a particular examination is not simple. A careful balancing of costs and benefits is required. A well designed assessment programme will use different types of question appropriate for the content being tested.

Using only one type of question throughout the whole curriculum is not a valid approach.

Further reading
• Case SM, Swanson DB. Extended-matching items: a practical alternative to free response questions. Teach Learn Med
• Frederiksen N. The real test bias: influences of testing on teaching and learning. Am Psychol 1984;39:193-202.
• Bordage G. An alternative approach to PMPs: the “key-features” concept. In: Hart IR, Harden R, eds. Further developments in assessing clinical competence; proceedings of the second Ottawa conference. Montreal: Can-Heal Publications, 1987:59-75.
• Swanson DB, Norcini JJ, Grosso LJ. Assessment of clinical competence: written and computer-based simulations. Assessment and Evaluation in Higher Education 1987;12:220-46.
• Ward WC. A comparison of free-response and multiple-choice forms of verbal aptitude tests. Applied Psychological Measurement 1982;6:1-11.
• Schuwirth LWT. An approach to the assessment of medical problem solving: computerised case-based testing. Maastricht: Datawyse Publications, 1998. (Thesis from Department of Educational Development and Research, Maastricht University.)

Lambert W T Schuwirth is assistant professor and Cees P M van der Vleuten is professor and chair in the department of educational development and research at the University of Maastricht in the Netherlands.

The ABC of learning and teaching in medicine is edited by Peter Cantillon, senior lecturer in medical informatics and medical education, National University of Ireland, Galway, Republic of Ireland; Linda Hutchinson, director of education and workforce development and consultant paediatrician, University Hospital Lewisham; and Diana F Wood, deputy dean for education and consultant endocrinologist, Barts and the London, Queen Mary’s School of Medicine and Dentistry, Queen Mary, University of London. The series will be published as a book in late spring.

BMJ 2003;326:643–5

