

IBEROAMERICAN JOURNAL OF MEDICINE 04 (2020) 351-359

Journal homepage: www.iberoamericanjm.tk

Review

How to Use and Apply Assessment Tools in Medical Education?


Said Said Elshama a,*

a Department of Forensic Medicine and Clinical Toxicology, College of Medicine, Suez Canal University, Ismailia City, Egypt; College of Medicine, Taif University, Taif, Saudi Arabia

* Corresponding author. E-mail address: saidelshama@yahoo.com

ARTICLE INFO

Article history:
Received 24 July 2020
Received in revised form 01 August 2020
Accepted 10 August 2020

Keywords:
Assessment
Methods
Medical Education

ABSTRACT

Assessment in medical education provides the evidence that learning has taken place and that the learning objectives have been achieved. The assessment program is a measurement tool for evaluating students' progress in knowledge, skills, behaviors, and attitudes. Planning an effective assessment program should therefore be based on instructional objectives, instructional activities, and efficient assessment methods, and a well-designed assessment procedure should be characterized by validity and reliability. There are two methods for interpreting the results of students' performance, norm-referenced and criterion-referenced; the first gives a relative ranking of students, while the second describes the learning tasks that students can and cannot perform. The information gained from the assessment results should be used effectively to evaluate and revise the instructional course for further improvement. Therefore, the reporting of the assessment results to stakeholders should be clear, comprehensive, and understandable, to prevent misinterpretation that may adversely affect students and other stakeholders.

© 2020 The Authors. Published by Iberoamerican Journal of Medicine. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

http://doi.org/10.5281/zenodo.3978444

1. INTRODUCTION

Assessment is a tool for determining the extent to which students have achieved the intended learning outcomes of instruction; it is an integral part of the instruction process. Moreover, a well-designed, well-integrated assessment gives a credible impression of the effectiveness of the instruction process. In addition, student assessment promotes student motivation, the development of self-evaluation, and the retention and transfer of learning [1].

Therefore, the integration of assessment with instruction should rest on essential principles of effective assessment. These principles include clear intended learning outcomes, the use of varied assessment procedures, the relevance of those procedures to instruction, an adequate sample of student performance, fairness of procedures, judgment of successful performance against specific criteria, feedback to students on the strengths and weaknesses of their performance so that they can correct it, comprehensive grading, and a sound reporting system. Thus, the choice of assessment method should rest on using the most efficient and appropriate method for assessing the intended learning outcomes. Notably, the improvement of student learning is the main objective of the assessment program [2].
In this context, planning for student assessment should be based on instructional objectives, instructional activities, and assessment methods. The instructional objectives should describe the intended learning outcomes in performance terms, where that performance is the evidence of student learning at the end of the learning experience. Moreover, the revised Bloom's taxonomy of educational objectives provides a framework for specifying these objectives along two dimensions: the first comprises six cognitive process categories (remember, understand, apply, analyze, evaluate, and create), while the second comprises four knowledge categories (factual, conceptual, procedural, and metacognitive). This taxonomy keeps assessment procedures and instruments aligned with the instructional objectives and activities; harmony between objectives (intended learning outcomes), instructional activities, and assessment is the hallmark of effective planning for student assessment [3]. One way to make this alignment concrete is sketched below.
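To make the two-dimensional classification concrete, here is a minimal Python sketch (an illustration added for this discussion, not part of the original text); the outcome wording and function names are hypothetical.

```python
# Illustrative only: tagging an intended learning outcome with its cell in the
# revised Bloom's taxonomy (cognitive process x knowledge category).
COGNITIVE_PROCESSES = ["remember", "understand", "apply", "analyze", "evaluate", "create"]
KNOWLEDGE_CATEGORIES = ["factual", "conceptual", "procedural", "metacognitive"]

def classify_outcome(outcome: str, process: str, knowledge: str) -> dict:
    """Place one learning outcome in the two-dimensional taxonomy table."""
    assert process in COGNITIVE_PROCESSES, f"unknown process: {process}"
    assert knowledge in KNOWLEDGE_CATEGORIES, f"unknown category: {knowledge}"
    return {"outcome": outcome, "process": process, "knowledge": knowledge}

# A hypothetical outcome pairing the 'apply' process with procedural knowledge.
print(classify_outcome(
    "Interpret arterial blood gas results in an acutely ill patient",
    process="apply",
    knowledge="procedural",
))
```

Once every outcome is tagged this way, assessment items can be written against specific cells of the table rather than against the course content at large.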
Worthwhile, the planning of assessment and the planning of instruction complement each other, so they should be done at the same time in order to answer the questions on which the success of the assessment program depends, such as: To what extent is pretesting needed? What type of assessment should be used during and at the end of instruction? Therefore, the preparation of an achievement test should follow a set of steps that includes specifying the instructional objectives, preparing the test specification, constructing the relevant test items, arranging the test items, preparing clear directions, reviewing and evaluating the assembled test, administering the test, and analyzing the test items [4].

In the related context, assessment types may be classified according to timing. Placement assessment is a test given at the beginning of the course to identify the prerequisite skills necessary for successful instruction; it is a pretest that determines entry level and covers the intended learning outcomes of the planned instruction. Formative assessment (process-focused) is used to monitor learner progress during instruction by identifying the strong and weak points of student performance; it is designed to measure the extent to which learners have mastered the learning outcomes of a limited section of instruction, and its results serve to improve learning. At the end of instruction, the extent of learning outcomes achievement and the terminal performance of students should be measured by summative assessment (outcome-focused); it is a comprehensive method for identifying mastery or assigning grades, and it aims to provide student feedback and an evaluation of the effectiveness of instruction [5].

2. MAJOR TYPES OF ASSESSMENT

Broadly, assessment includes tests and performance assessments: tests of selected response, tests of supply response, and performance assessments that are either restricted or extended.

Selected-response tests measure understanding and thinking skills by having the student choose the correct or best answer (multiple-choice questions (MCQ), true-false, and matching tests). They are commonly used because a large number of selected-response items can be administered to a group of students in a short time and scored rapidly by hand or machine. Their scoring is completely objective, but they are low in realism because the student selects the response from a given set of possible answers and is therefore limited to the listed alternatives. In supply-response tests, on the other hand, the student responds with a word, a short phrase, or a complete essay; these tests require more time for scoring, and their scoring is more subjective, so personal bias can distort the judgment. They are more realistic than the selected type because they allow great freedom of response within a moderate structure [6].

Restricted performance assessments assess performance on a highly structured, limited task (such as writing a brief paragraph on a given topic); like supply-response tests, they are more realistic than the selected type because they allow great freedom of response within a moderate structure. Extended performance assessments, on the other hand, assess comprehensive, less structured performance tasks (such as writing a short story); they are high in realism because they simulate performance in the real world, integrating ideas and skills from different learning sources. Noteworthy, performance assessments are usually time-consuming and depend on the quality of the performance criteria. Moreover, they are applied using a rating scale or a set of scoring rubrics based on subjective judgment [7].

3. TYPES OF TESTS

The MCQ is the most useful selected-response item type; it is designed to measure both simple and complex intended learning outcomes. It consists of a stem (the problem situation) and several options (choices); the stem is a question or an incomplete statement, while the options are several answers (the correct answer and plausible wrong answers, which are called distracters). The best-answer form is another type of multiple-choice item for more complex achievement, in which all options are partially correct but one option is clearly better than the others [8]. A minimal sketch of this item anatomy follows.
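The anatomy just described lends itself to a small data structure. The sketch below is a hedged illustration (not from the article): an item holds a stem, an option list, and the position of the key, and selected-response scoring is a simple objective comparison; the clinical stem and class name are invented.

```python
from dataclasses import dataclass

@dataclass
class MCQItem:
    stem: str              # the problem situation (question or incomplete statement)
    options: list[str]     # the key plus plausible distracters
    key_index: int         # position of the correct (or clearly best) answer

    def score(self, chosen_index: int) -> int:
        """Selected-response scoring is objective: 1 if the key was chosen, else 0."""
        return 1 if chosen_index == self.key_index else 0

item = MCQItem(
    stem="Which acid-base disturbance is most consistent with pH 7.25 and pCO2 60 mmHg?",
    options=["Metabolic acidosis", "Respiratory acidosis",
             "Metabolic alkalosis", "Respiratory alkalosis"],
    key_index=1,
)
print(item.score(chosen_index=1))  # -> 1
```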
To prepare an effective multiple-choice item, the item should be designed to measure one intended learning outcome. Furthermore, the stem should formulate a single clear problem in simple and clear language, carrying as much of the wording as possible so that the same material is not repeated in the options. Moreover, the item stem should be stated in positive form; where negative wording must be used, it should be emphasized by underlining or capitalization, or by placing it near the end of the statement. The intended answer may be correct or clearly best, and all options should be grammatically consistent with the item stem and parallel in form, avoiding verbal clues that give away the correct or incorrect answer, such as similarity of wording between the stem and the correct answer, stereotyped phraseology of the correct answer, greater detail in the correct answer, absolute terms in the distracters ("always", "never", "all", "none"), two mutually inclusive responses, or two responses with the same meaning. Moreover, the length of the correct answer should vary, and its position should vary randomly; the phrase "all of the above" should be avoided as an alternative, and the phrase "none of the above" should be used with extreme caution. In addition, the difficulty of the item is controlled by the complexity of the stem problem or by the homogeneity of the alternatives. Each item should be independent of the other items in the test, apply the normal rules of grammar, and use an efficient item format [9].

In addition, distracters should be plausible and attractive to the uninformed; they should be stated in the students' language, use plausible-sounding words, and be similar to the correct answer in length and complexity of wording. Distracters should represent common misconceptions or errors of students; they should be homogeneous and free of overused extraneous clues. Noteworthy, breaking any one of the above rules may be justified if it improves the effectiveness of the item, according to the experience of the test maker in item writing [10].

The matching-item type is a simple variation of multiple-choice items; it is appropriate to shift to matching items when there are a number of related, similar factors. A matching exercise is a series of stems (premises) and a series of answers (responses) arranged in columns under directions that guide the matching. It should use homogeneous material and a short list of items, with the brief responses on the right. Moreover, the number of responses should be larger or smaller than the number of premises, responses may be used more than once, and responses should be placed in alphabetical or numerical order. Directions should be specific and state the basis for matching, indicating that a response may be used once, more than once, or not at all. Worthwhile, the matching items should be placed on the same page as the responses [11].

The extended matching question (EMQ) differs from the single-best-answer multiple-choice question and is superior to it for assessing students' problem-solving and clinical reasoning skills. It consists of a theme (symptom, diagnosis, treatment), an option list (answers), a lead-in statement (question), and two stems (two clinical problems) [12].

4. HOW TO ASSESS THE PSYCHOMOTOR DOMAIN IN MEDICAL EDUCATION?

The Objective Structured Clinical Examination (OSCE) is the tool generally used to assess the psychomotor domain; it is an examination of competence (content skills, process skills, and clinical management). It is considered the standardized tool for assessing clinical competencies such as history taking, physical examination, and technical procedures. According to Miller's pyramid, the OSCE measures the category called "shows how"; it consists of multiple stations and a wide sampling of clinical and communication skills, with many examiners and patients within a limited time, using a checklist or a global rating scale. Therefore, it has high reliability, because the use of detailed checklists may decrease inter-rater unreliability and thereby reinforce reliability. In addition, the test results rest on direct observation and repeated measurements, which help the examiner assess many different qualitative aspects such as efficiency and the students' skill performance. Moreover, the exam enjoys acceptability because every student performs the same tasks. It is also a valid exam, drawing on content validity (good sampling of skills matched to the learning outcomes), construct validity, and authentic station length [13].

To design a good OSCE, the examined skill types should be determined in alignment with the learning objectives of the module, together with the types of assessment tools (e.g., a checklist). Moreover, the number of stations (10-15 stations), the time per station, and the length of the examination (10 minutes x 10 stations = 100 minutes) should be determined, besides the preparation of resources such as examination rooms, manikins, examiners, patients, and volunteers [14]. Furthermore, the mark scheme should be constructed to discriminate between good and poor performance. In addition, the preparation of instructions is essential for the examiner, the patient, and the student. First, the required task at every station should be outlined exactly for the student, along with marking-scheme instructions for the examiner about the expected actions and performance of the student at every station. Secondly, the approach for the interaction between the patient and the student should be outlined. Finally, the exam should be evaluated after it finishes. Noteworthy, a successful OSCE depends on the availability of facilities such as manikins and other tools, examiners, real patients, actors, technical and administrative teams, and training [15]. The planning arithmetic is sketched below.
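The planning arithmetic above is simple enough to sketch; the code below merely restates the 10 x 10 example and adds one hypothetical variant.

```python
# Illustrative sketch of OSCE circuit planning (values are hypothetical).
def osce_length(n_stations: int, minutes_per_station: int) -> int:
    """Total examination time for one circuit, excluding changeover time."""
    return n_stations * minutes_per_station

print(osce_length(10, 10))  # -> 100 minutes, as in the example above
print(osce_length(15, 8))   # -> 120 minutes: a longer circuit of shorter stations
```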
Last but not least, the use of short stations in the OSCE is a controversial issue: some educators think it undermines the validity of the test. They adopt this view because short stations do not allow the assessment of other aspects of the "shows how" level, such as the students' ability to deal with complicated situations that require the integration of different skills, including decision making, drawing conclusions based on physical examination and investigations, and case management skills. Thus, in this view, the use of short stations is limited to technical skills only. On the other hand, other educators prefer the use of long stations as an alternative, pointing to the limited influence of station length on reliability. Therefore, I think it is best to determine the assessment task with a good balance of content, apart from the controversial views, to ensure the authenticity and efficiency of measurement [16].

5. HOW TO ASSESS THE AFFECTIVE DOMAIN IN MEDICAL EDUCATION?

Worthwhile, performance tasks usually contain knowledge, skill, and affective components; the affective domain describes the learning objectives that address feeling, emotion, and the degree of acceptance or rejection. Moreover, the affective domain has many parameters. Attitude is an important mental parameter of the affective domain; it consists of cognition, affect, behavioral intentions, and evaluation. The second parameter is motivation, which means the initiation, direction, and persistence of human behavior; it also includes the reasons for engaging in a particular behavior, such as basic needs, objects, goals, and desirable ideals. Thirdly, another parameter is self-efficacy, which is a personal perception of one's ability to perform in a particular manner [17].

Thus, the affective domain is difficult to assess because it emphasizes attitudes, feelings, emotions, and values. It should therefore be stated in specific, measurable, observable objectives that can be translated into quantitative terms. The taxonomy of the affective domain accordingly classifies the behavioral objectives into observable behaviors in quantitative terms, such as receiving (accept, attend, recognize), responding (discuss, complete, examine), valuing (accept, seek, defend), organization (discriminate, organize, systematize), and characterization (verify, internalize) [18].

In this context, the assessment of the affective domain depends on many tools that assess attitudes, interests, motivations, and self-efficacy. These tools include self-reports, rating scales, semantic differential scales, the Thurstone scale, and checklists. The self-report is a written reflection by an individual about his attitude or feeling toward an idea, people, or a concept, while rating scales are a number of designed categories for extracting quantitative information, such as the Likert scale and the 1-10 rating scale. Semantic differential (SD) scales assess personal reactions to specific ideas or concepts as ratings on bipolar scales, while the Thurstone scale assesses attitude by determining the favorability of a position on an issue [19]. A small scoring sketch for such rating scales follows.
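As a hedged illustration of how a rating-scale tool turns attitudes into quantitative information, the sketch below averages Likert ratings and reverse-scores negatively worded items; the values and the reverse-scoring rule shown are illustrative assumptions, not the article's prescription.

```python
# Illustrative Likert-scale scoring (invented data).
def likert_score(ratings, reverse_items=frozenset(), scale_max=5):
    """Mean rating across items, reverse-scoring negatively worded items."""
    adjusted = [
        (scale_max + 1 - r) if i in reverse_items else r
        for i, r in enumerate(ratings)
    ]
    return sum(adjusted) / len(adjusted)

# Five items on a 1-5 scale; item index 2 is negatively worded and is reversed.
print(likert_score([4, 5, 2, 4, 3], reverse_items={2}))  # -> 4.0
```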
6. HOW TO ASSESS THE COMPREHENSIVE DOMAIN "COMBINED DOMAINS" IN MEDICAL EDUCATION?

Portfolio-based assessment is a live alternative to traditional high-stakes testing. It is used for both summative and formative assessment, and it has value as a source of self-satisfaction. The portfolio is one of the most useful and popular tools for assessing student performance in undergraduate and postgraduate medical education; it aims to link the objectives of the instructional course with clinical experience that is recorded in a standardized manner to facilitate learning, teaching, and assessment [20].

The portfolio is a systematic, selected, purposeful, and organized collection of student work (materials) that shows the personal ability of each student (evidence of performance) and his professional development, by measuring the growth of knowledge, skills, and attitudes. Therefore, the content of the portfolio (evidence of learning achievement) consists of clinical tutor reports, selected student assignments, a list of attained skills, evidence of communication skills, assessment results, and the reflective diary [21].

In this context, we can divide portfolios into two types: the developmental portfolio and the showcase portfolio. The developmental type is usually used throughout the instructional course (formative) and assesses the student's learning progress, while the showcase type is used at the end of the course (summative) and shows the student's best work samples and final level of performance [22].

In addition, portfolios have many advantages, such as assessing learning progress over time, showcasing the student's best work, and providing greater motivation through comparison between present and past work. Furthermore, their advantages include improving the student's self-assessment skills, promoting reflective learning, accommodating individual differences, connecting theory with practice, supporting communication with students and parents about learning progress, and increasing collaboration between student and teacher. However, for fair judgment, we should remember that portfolios have some disadvantages, such as the time consumed in selecting portfolio entries, revising them periodically, and providing feedback [23].

To plan a portfolio, several steps should be applied: determining the purpose of the portfolio and the types of entries involved, together with the guidelines for selecting and evaluating entries. In addition, the procedures for maintaining and using the portfolio, and the criteria for its evaluation, should be determined. Finally, we should distinguish between evaluating the portfolio as a structure and evaluating the student's performance progress. The structural evaluation of the portfolio depends on its makeup, organization, and content, while the overall evaluation of the student's performance progress, as shown in the portfolio, is determined via a rating scale based on the assessment of the learning outcomes. Thus, holistic rubrics for each area in the portfolio determine the final level of student performance [24].
7. HOW TO DEAL WITH THE ASSESSMENT RESULTS?

Firstly, the assessment results should be summarized concisely into informative data such as tallies, percentages, and qualitative data (themes, grouped listings). Secondly, the assessment results should be shared as a summary or in a brief report accompanied by essential information, such as the criteria identifying a successful student, satisfactory evidence for that success, and the action determined for unsatisfactory results. Moreover, the venues for sharing the assessment results should be determined by choosing one or more channels such as web sites, emails, newsletters, presentations, brochures, posters, or banners [25].

In this context, the reporting of assessment results should be fair, honest, balanced, objective, useful, and documented, with appropriate attribution. It should achieve the most impact by using meaningful, attractive, interesting titles and headings. Furthermore, the report should be short and should cascade from major points to details with informed commentary. In the related context, the grading of results is also an essential element, because it provides effective feedback about the learning process and suggestions for its improvement; properly assigned grades are a valid measure of learner achievement [26].

Noteworthy, performance assessment takes different forms, such as essay tests, ratings, and multiple-choice questions, and it translates student performance into grades that represent the extent or degree of achievement of the intended learning outcomes. Therefore, every medical school should have a clear grading policy for valid judgment. Grading may be divided into two types: absolute grading and relative grading. Absolute grading is based on a comparison between the student's performance and a pre-specified standard of performance, depending on mastery of the learning and the identification of cutoff points, while relative grading depends on a comparison between the student's performance and the performance of the group members, ranking the individual within the group [27]. The contrast is sketched below.
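A minimal sketch of the contrast (illustrative only; the cutoff points and the quartile rule are invented, since the article prescribes neither):

```python
# Absolute grading: compare each student to pre-specified cutoff points.
def absolute_grade(score: float) -> str:
    cutoffs = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]
    for cutoff, letter in cutoffs:
        if score >= cutoff:
            return letter
    return "F"

# Relative grading: compare each student to the group; here, by quartile rank.
def relative_grades(scores: dict[str, float]) -> dict[str, str]:
    ranked = sorted(scores, key=scores.get, reverse=True)
    letters = "ABCD"
    out = {}
    for rank, student in enumerate(ranked):
        quartile = min(rank * 4 // len(ranked), 3)
        out[student] = letters[quartile]
    return out

print(absolute_grade(84))  # -> 'B' against the fixed standard
print(relative_grades({"amir": 91, "lena": 77, "omar": 85, "sara": 62}))
```

Note that the same raw score can earn different letters under the two schemes, which is exactly the distinction drawn above.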
In addition, the validity of the grading system rests on the efficacy and fairness of the assigned grades. Therefore, some guidelines should be applied when designing the grading system. Initially, the students should be made aware of the grading system for course achievement at the beginning of the course, including the components of assessment, the weight of every test grade, and the description of every letter grade. Worthwhile, these guidelines should be written in detail in the study guide of every module. Secondly, grades should be based on student achievement only, without adding extraneous factors such as effort or misbehavior. Thirdly, grades should be based on a variety of valid assessment data covering all learning outcomes, and all of these results should enter the final grade for greater validity. Fourthly, a weighting method should be used for combining the scores into the grade, with the selection of a suitable frame of reference for grading. Finally, borderline cases should be revised by re-examining all the achievement evidence [28].

However, the interpretation of results or test scores is an important step in dealing with assessment results; it is the translation of quantitative data into an equivalent numerical set, a process of score analysis that generates meaningful qualities. Noteworthy, there are different types of scores. The first is the raw score, the number of points received in the test, which has no meaningful interpretation by itself; the second is the scaled score, a transformation of the result onto a consistent scale. In addition, test score interpretation should depend on a referencing framework, a structure for comparing student performance to something external to the assessment itself, such as a comparison of the student's score to a predetermined standard of performance (standard criteria) [29].

Thus, the referencing framework for test score interpretation may be a criterion-referenced framework or a norm-referenced framework. The criterion-referenced framework describes individual performance in the test without reference to the performance of others; the criterion is the domain of performance against which the student's assessment results are referenced. Worthwhile, this interpretation is meaningful if the test is designed specifically for this purpose. Test performance under criterion-referenced assessment can be measured by the speed of performance (task performance within a fixed time), the degree of performance accuracy, the percentage (the proportion of the maximum points gained), such as the percentage of correct answers or the percentage of learning objectives achieved, quality rankings (the quality level of performance, such as an excellent rating of 4 or a good rating of 3), the percentage-correct score (a standard for judging mastery of the learning objectives), and the expectancy table (which interprets a raw score in terms of expected performance) [30].
The norm-referenced framework compares the individual test score with the scores of other students who take the same test. It therefore determines the student's standing in the reference group: the student's score is not treated individually but related to the group. Moreover, norm-referenced scores depend on a mathematical transformation of the raw score, because the raw score in the norm-referenced framework is not valid for interpreting student performance on its own. It should be converted into a derived score, a numerical report of test performance on a score scale. The percentage of the norm group scoring below a particular raw score is the percentile rank; it is different from the percentage of correct answers, which is a criterion-referenced interpretation. Developmental scores or scales are norm-referenced scores that trace the development of students across grade or age levels; the grade equivalent score matches a particular raw score to the grade level of students obtaining it. Standardized scores are norm-referenced scores that transform scores so that performance can be compared across two or more different measures; they divide into linear standard scores and normalized standard scores. The linear standard scores (Z-scores and T-scores) compare two distributions of test performance and maintain the same distribution shape as the corresponding raw scores, while the normalized standard scores (stanines and deviation IQ scores) rely on the properties of the normal distribution for their interpretation and convert the distribution of the raw scores into a normal distribution. Finally, I want to stress that all norm-referenced scores contain error, because no test acts as a perfect measure [31]. The basic derivations are sketched below.
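The derivations named above follow standard formulas: z = (x - mean) / SD, T = 50 + 10z, and the percentile rank is the share of the norm group scoring below a given raw score. The sketch below works one invented example.

```python
from statistics import mean, pstdev

# Hypothetical raw scores for a small norm group.
norm_group = [52, 58, 61, 64, 66, 70, 73, 75, 79, 82]

def z_score(x: float) -> float:
    """Linear standard score: distance from the mean in SD units."""
    return (x - mean(norm_group)) / pstdev(norm_group)

def t_score(x: float) -> float:
    """T-score: a z-score rescaled to mean 50, SD 10."""
    return 50 + 10 * z_score(x)

def percentile_rank(x: float) -> float:
    """Percentage of the norm group scoring below the given raw score."""
    below = sum(1 for s in norm_group if s < x)
    return 100 * below / len(norm_group)

raw = 73
print(round(z_score(raw), 2), round(t_score(raw), 1), percentile_rank(raw))
# -> 0.55 55.5 60.0
```

Note how the percentile rank (60.0) says nothing about how many answers were correct; it only locates the score within the group, which is the distinction drawn above.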
Finally and conclusively, there is no gold standard for standard setting in assessment. As noted above, there are two types of standard-setting methods: the criterion-referenced or absolute method, in which the standard does not depend on the test results (it is independent of them), and the norm-referenced or relative method, in which the standard is based on the test results. The norm-referenced standard is the method of choice for ranking examinees, while the criterion-referenced standard is the most appropriate for establishing whether examinees' mastery of a specific domain meets pre-set requirements. Regrettably, both standard-setting approaches have disadvantages that diminish their credibility, because they lead to widely divergent results on the same test: the criterion-referenced method with a pre-fixed cut-off score leads to large variation in failure rates, while the norm-referenced method leads to large variation in cut-off scores. In addition, criterion-referenced standard-setting procedures require panels to determine a minimum acceptable level per test item; these procedures are time-consuming and costly. Consequently, cut-off scores are often established as a pre-fixed percentage of correct answers, given the inability to convene standard-setting panels regularly. However, merging a pre-fixed cut-off score with a relative point of reference, as a compromise method, may reduce the disadvantages of the conventional criterion- and norm-referenced methods while making optimal use of their advantages [32]. One such compromise is sketched below.
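As a loose, hedged sketch of such a compromise (the 0.6 fraction and the top-5% anchor are illustrative assumptions, not the article's prescription; cf. the best-performing-students method of [32]):

```python
# Illustrative compromise cut-off: a pre-fixed fraction anchored to a relative
# reference point (the score of the best performers) instead of the maximum.
def compromise_cutoff(scores: list[float], fraction: float = 0.6) -> float:
    """Cut-off = fixed fraction of the score achieved by the top performers."""
    ranked = sorted(scores, reverse=True)
    top_index = max(0, round(0.05 * len(ranked)) - 1)  # ~95th-percentile score
    reference = ranked[top_index]
    return fraction * reference

print(compromise_cutoff([88, 81, 76, 74, 70, 66, 63, 59, 55, 48]))  # -> 52.8
```

Because the reference point moves with the cohort's best performance, the failure rate varies less than under a fully fixed cut-off, while the fixed fraction keeps the standard from drifting entirely with the group.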
So, every educational institution should have a vision for the interpretation of its assessment results; this vision should determine the benchmarks or standards on which the interpretation of the assessment results is based. A benchmark or standard may be local, external, internal, value-added, based on historical trends, framed by strengths and weaknesses, or framed by capability and productivity. According to the chosen benchmark or standard, we can compare our students with their peers inside or outside the institution, at a national or international level, and determine the extent of improvement achieved by the students or the educational program, the points of strength and weakness, and the capability and productivity of the students. Some schools adopt standardized achievement tests, which depend on the norm-referenced approach to interpret their results. Such a test compares student performance with that of a representative sample of students in a norm group at a regional or national level; it is designed to determine the achievement of a common set of goals by the students. Some guidelines should be applied when standardized achievement tests are constructed. At first, the test content should draw on many of the textbooks in use, and the test items should be constructed by test experts and subject-matter experts. Moreover, the test items should be selected according to the test specifications and then revised and analyzed for difficulty, using rigorous directions for the test. In addition, the test scores should be interpreted according to the norm-referenced framework, and the test manual should include the procedures for scoring, interpretation, and the use of results. Finally, we can modify a standardized achievement test and interpret its scores according to the criterion-referenced framework if we modify the multiple-choice items and add open-ended performance tasks [33].

Noteworthy, the percentage-correct score is one of the best methods of reporting criterion-referenced test results, because it tells us the percentage of correct answers in the test. The norm-referenced scores, however, have different types that are used with standardized tests, such as percentile ranks, grade equivalent scores, and standard scores. The percentile rank differs from the percentage of correct answers (criterion-referenced) because it indicates the relative position in the group as the percentage of students scoring below it, while grade equivalent scores indicate the relative test performance as a grade level. The standard scores depend on statistics such as the mean and standard deviation of the score set [34].

On the other hand, assessment feedback is important for stakeholders such as students, parents, and the educational authority. Its importance for students and parents lies in determining the level of achievement and the position of students among their peers. In addition, it is also important for governmental educational administrators in evaluating the instruction and the learning process, the extent of learning outcomes achievement, and the success of the educational policy of the medical school. Thus, we should use a detailed reporting system covering performance on the learning outcomes of the course [35].
In the end, the report of results should be comprehensive, well organized and arranged without excessive length or confusing issues, should rate the performance, and should be informative, based on the list of specific learning outcomes. The choice of report format, however, depends on the report material and the audience. We can use a full report as a complete record of assessment activities, or an assessment summary as a note, brochure, or flyer to highlight particular findings or specific issues. The components of the assessment report should include a description of activities, an interpretation of results, and suggestions. Moreover, the audience or stakeholders should be determined before the content, format, and method of reporting the assessment results, because every stakeholder needs a different content and style of report according to his scope, such as an accrediting organization, a higher education commission, a medical education committee, students, and parents. Furthermore, the assessment results may be used for curriculum evaluation and revision, accreditation, or employment. Web reporting is considered easy to access and is used for a wide range of audiences [36].

At last, we would like to mention that communication of the assessment results should be clear, understandable, interesting, explainable, and appropriate for the content. It may use a chart, table, or graph according to the available data. Effective tables and charts should have a meaningful, self-explanatory title and content, with a clear label for every table or chart. Moreover, extensive results should be classified into groups, and it should be easy for readers to detect differences and trends. At the end of this paragraph, we should note that the confidentiality of assessment result reporting is a mark of the participants' credibility in the assessment process [37].
8. HOW TO DESIGN A SUCCESSFUL ASSESSMENT PROGRAM?

Continuing with what we started, we can summarize the ingredients for designing a successful assessment program for the medical student. At first, the rules and procedures of assessment should be clear to the students at the beginning of the module and should be included in the study guide of the module. Secondly, a well-designed assessment procedure characterized by validity and reliability should be used. Validity means the appropriateness and meaningfulness of the inferences extracted from the assessment results for their intended use; it includes content validity, meaning that the assessment is representative of the learning objectives, and the congruence of the assessment instrument with its purpose (construct validity). Moreover, it also includes predictive validity, the ability of the instrument to predict future performance. Reliability, in turn, is the consistency of the assessment results, which can be interpreted in norm-referenced or criterion-referenced terms; it is a necessary prerequisite of a valid test, although a highly reliable test is not necessarily valid. In addition, we can divide reliability into several types. Inter-rater reliability means consistency of the performance rating across different examiners (raters), while inter-case reliability is a measurement of the student's performance from one case to another with consistent variables. Furthermore, test-retest reliability is measured by the correlation of one score with another; it is an indicator of consistency over time. Worthwhile, increasing the testing time and the number of questions are methods for improving examination reliability. In the related context, the acceptability of the instrument to its users determines its usefulness for measuring what it is supposed to measure (face validity); and the utility of an assessment instrument depends on its reliability, validity, educational impact, costs, and the acceptability of the method [38]. A minimal computation of test-retest reliability is sketched below.
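A minimal sketch of that computation (the scores are invented; statistics.correlation requires Python 3.10+):

```python
from statistics import correlation  # Pearson's r; available from Python 3.10

# Hypothetical scores from two administrations of the same test.
first_sitting  = [55, 62, 70, 74, 81, 88]
second_sitting = [58, 60, 73, 75, 79, 90]

# A coefficient near 1 indicates consistency of scores over time.
print(round(correlation(first_sitting, second_sitting), 3))
```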
Thirdly, the choice of an assessment instrument for any examination should depend on the multiple levels of clinical competence suggested by Miller (Miller's pyramid). The MCQ, the essay, and the oral exam are suitable instruments for testing knowledge ("knows"), while clinical-scenario-based MCQs, the oral exam, and extended matching items are suitable instruments for testing understanding and concept building ("knows how"). Moreover, the OSCE and the standardized patient are suitable for testing performance ("shows how"), while the performance log (logbook), the checklist, and the portfolio are suitable for testing performance of the concerned task in a real-life situation ("does"). Thus, one or two assessment instruments should be chosen from each level to reflect the real ability of the examinee [39].

Fourthly, blueprinting should be used to specify the tested objectives and determine their relative weight in the examination. The table of specification is the blueprint of the test; it identifies the types of test items that should be included in the test according to the time spent on each objective and its cognitive level. The summative test should thus be aligned with the studied subject matter and the cognitive processes used during instruction. Worthwhile, the table of specification improves the validity of the test, which rests on the quality of the evidence (test content and response process); the test content is the studied subject matter, while the response process is the kind of thinking required in the test. In addition, there are many approaches to developing and using the table of specification; one of them depends on selecting the tested learning outcomes, placing the learning objectives according to the terms of Bloom's taxonomy in the cognitive domain [40]. A small blueprint is sketched after this paragraph.
what can be said about their strengths and weaknesses? Med Educ.
2004;38(9):974-979. doi: 10.1111/j.1365-2929.2004.01916.x.
7. Nair BR, Parsons K. Performance-based assessment: Innovation in medical
9. CONCLUSIONS education. Arch Med Health Sci. 2014;2:123-5.
8. Schuwirth LW, van der Vleuten CP. ABC of learning and teaching in
Assessment in medical education is a tool to evaluate the medicine: Written assessment. BMJ. 2003;326(7390):643-5. doi:
10.1136/bmj.326.7390.643.
learning process through the student assessment. The
assessment program evaluates the medical student in 9. Palmer EJ, Devitt PG. Assessment of higher order cognitive skills in
undergraduate education: modified essay or multiple choice questions?
different domains such as cognitive, psychomotor, and Research paper. BMC Med Educ. 2007;7:49. doi: 10.1186/1472-6920-7-49.
affective via using tests for the selected response and other
10. Al-Wardy NM. Assessment methods in undergraduate medical education.
for the supply response in addition to the performance Sultan Qaboos Univ Med J. 2010;10(2):203-9.
assessments restricted or extended. So, the planning for a 11. Gibbs T, Brigden D, Hellenberg D. Assessment and evaluation in medical
well-designed assessment program should be based on education, S Afr Fam Pract. 2006;48(1):5-7. doi:
effective ingredients for the success wherein it should be 10.1080/20786204.2006.10873311.
characterized by validity and reliability. Moreover, 12. Wood EJ. What are Extended Matching Sets Questions? Biosci Educ J.
interpretation and reporting of the assessment results to 2003;1(1):1-8. doi: 10.3108/beej.2003.01010002.

stakeholders should be clear, comprehensive, and 13. Carraccio C, Englander R. The objective structured clinical examination:
a step in the direction of competency-based evaluation. Arch Pediatr Adolesc
understandable to enable different stakeholders to evaluate Med. 2000;154(7):736-41. doi: 10.1001/archpedi.154.7.736.
and revise the instructional course effectively for more
14. Zayyan M. Objective structured clinical examination: the assessment of
improvement. choice. Oman Med J. 2011;26(4):219-22. doi: 10.5001/omj.2011.55.
15. Khan A, Ayub M, Shah Z. An audit of the medical students’ perceptions
regarding objective structured clinical examination. Educ Res Int. 2016. doi:
10.1155/2016/4806398.
16. Elshama SS. How to Use Simulation in Medical Education. 1st ed.
Scholars' Press Germany; 2016.
17. Yanofsky SD, Nyquist JG. Using the Affective Domain to Enhance
Teaching of the ACGME Competencies in Anesthesiology Training. J Educ
Perioper Med. 2014;12(1):E055.
18. Lurie SJ, Mooney CJ, Lyness JM. Measurement of the general
competencies of the accreditation council for graduate medical education: a
systematic review. Acad Med. 2009;84(3):301-9. doi:
10.1097/ACM.0b013e3181971f08.
19. Boud D, Falchikov N. Aligning assessment with long-term learning.
Assess Eval High Educ. 2006;31(4):399-413. doi:
10.1080/02602930600679050.
20. Thistlethwaite J. How to keep a portfolio. Clin Teach. 2006;3(2):118-
23. doi: 10.1111/j.1743-498X.2006.00078.x.
21. Haldane T. "Portfolios" as a method of assessment in medical education.
Gastroenterol Hepatol Bed Bench. 2014;7(2):89-93.
22. Roberts C, Newble DI, O'Rourke AJ. Portfolio-based assessments in
medical education: are they valid and reliable for summative purposes? Med
Educ. 2002;36(10):899-900. doi: 10.1046/j.1365-2923.2002.01288.x.
23. Davis MH, Friedman Ben David M, Harden RM, Howie P, Ker J, McGhee
C, et al. Portfolio assessment in medical students' final examinations. Med
Teach. 2001;23(4):357-66. doi: 10.1080/01421590120063349.
24. Jenkins L, Mash B, Derese A. Reliability testing of a portfolio assessment
tool for postgraduate family medicine training in South Africa. Afr J Prim
Health Care Fam Med. 2013;5(1):577. doi: 10.4102/phcfm.v5i1.577.
25. Epstein RM. Assessment in medical education. N Engl J Med. 2007;356(4):387-96. doi: 10.1056/NEJMra054784.

26. McLachlan JC. The relationship between assessment and learning. Med Educ. 2006;40(8):716-7. doi: 10.1111/j.1365-2929.2006.02518.x.

27. Hays R, Gupta TS, Veitch J. The practical value of the standard error of measurement in borderline pass/fail decisions. Med Educ. 2008;42(8):810-5. doi: 10.1111/j.1365-2923.2008.03103.x.

28. Downing SM, Tekian A, Yudkowsky R. Procedures for establishing defensible absolute passing scores on performance examinations in health professions education. Teach Learn Med. 2006;18(1):50-7. doi: 10.1207/s15328015tlm1801_11.

29. Muijtjens AM, Schuwirth LW, Cohen-Schotanus J, Thoben AJ, van der Vleuten CP. Benchmarking by cross-institutional comparison of student achievement in a progress test. Med Educ. 2008;42(1):82-8. doi: 10.1111/j.1365-2923.2007.02896.x.

30. Lok B, McNaught C, Young K. Criterion-referenced and norm-referenced assessments: compatibility and complementarity. Assess Eval High Educ. 2016;41(3):450-65. doi: 10.1080/02602938.2015.1022136.

31. McKinley DW, Norcini JJ. How to set standards on performance-based examinations: AMEE Guide No. 85. Med Teach. 2014;36(2):97-110. doi: 10.3109/0142159X.2013.853119.

32. Cohen-Schotanus J, van der Vleuten CP. A standard setting method with the best performing students as point of reference: practical and affordable. Med Teach. 2010;32(2):154-60. doi: 10.3109/01421590903196979.

33. Allen D, Tanner K. Rubrics: tools for making learning goals and evaluation criteria explicit for both teachers and learners. CBE Life Sci Educ. 2006;5(3):197-203. doi: 10.1187/cbe.06-06-0168.

34. Becker DF, Pomplun MR. Technical reporting and documentation. In: Downing SM, Haladyna TM, editors. Handbook of test development. New York: Routledge; 2006:711-24.

35. Downing SM. Reliability: on the reproducibility of assessment data. Med Educ. 2004;38(9):1006-12. doi: 10.1111/j.1365-2929.2004.01932.x.

36. Wong J, Cheung E. Ethics assessment in medical students. Med Teach. 2003;25(1):5-8. doi: 10.1080/0142159021000061341.

37. Kibble JD. Best practices in summative assessment. Adv Physiol Educ. 2017;41(1):110-9. doi: 10.1152/advan.00116.2016.

38. Downing SM. Validity: on meaningful interpretation of assessment data. Med Educ. 2003;37(9):830-7. doi: 10.1046/j.1365-2923.2003.01594.x.

39. Shumway JM, Harden RM; Association for Medical Education in Europe. AMEE Guide No. 25: The assessment of learning outcomes for the competent and reflective physician. Med Teach. 2003;25(6):569-84. doi: 10.1080/0142159032000151907.

40. Schuwirth LW, van der Vleuten CP. General overview of the theories used in assessment: AMEE Guide No. 57. Med Teach. 2011;33(10):783-97. doi: 10.3109/0142159X.2011.611022.

41. Elfaki OA, Salih KMA. Comparison of Two Standard Setting Methods in a Medical Students MCQs Exam in Internal Medicine. American Journal of Medicine and Medical Sciences. 2015;5(4):164-7.

42. Ben-David MF. AMEE Guide No. 18: Standard setting in student assessment. Med Teach. 2000;22(2):120-30. doi: 10.1080/01421590078526.
