Module 4 - Writing Assessment
The first element of a good writing assessment is the rubric. The rubric is the
instructions for carrying out the writing task. The rubric should include information such
as the procedures for responding, the task format, time allotted for completion of the
test/task, and information about how the test/task will be evaluated. Much of the
information in the rubric should come from the test specification.
Test specifications for a writing test should provide the test writer with details on the
topic, the rhetorical pattern to be tested, the intended audience, how much information
should be included in the rubric, the number of words the student is expected to produce,
and overall weighting (Davidson & Lloyd, 2005).
Good rubrics should:
The second essential part of any test of writing is the writing prompt. Hyland (2003)
defines the prompt as the stimulus the student must respond to (p. 221). Kroll and Reid
(1994:233) identify three main prompt formats: base, framed and text-based. The first
two are the most common in F/SL writing assessment. Base prompts state the entire task
in direct and very simple terms whereas framed prompts present the writer with a
situation that acts as a frame for the interpretation of the task. Text-based prompts
present writers with a text to which they must respond or utilize in their writing.
Consider the following examples:
Base prompts:
Do you favor or oppose a complete ban on smoking? Why? Why not?
Discuss the view that women are better drivers than men.
Many say that money is the root of all evil. Do you agree or disagree with
this statement?
Framed prompts:
On a recent flight back home to the UAE, Emirates Airlines lost your baggage.
Write a complaint letter to Mr. Al-Ahli, the General Manager, telling him about
your problem. Be sure to include the following:
Your flight details
A description of the baggage lost and its contents
What you would like Mr. Al-Ahli to do for you
Text-based prompts:
You have been asked by a youth travel magazine to write an article about things
to see and do in your hometown. Using the attached set of pictures, write a one-page
article on this topic.
You have been put in charge of selecting an appropriate restaurant for your senior
class party. Use the restaurant reviews below to select an appropriate venue and
then write an invitation letter to your fellow classmates persuading them to join
you there.
Criteria of good writing prompts:
A writing prompt defines the writing task for students. It consists of a question or a
statement that students will address in their writing and the conditions under which they
will be asked to write.
Each prompt you use in the assessment of writing should meet the following criteria:
Developing a good writing prompt requires that you use the appropriate signpost term
to match the rhetorical pattern you are using. Some of the most common are listed
below:
Describe: give a detailed account of the features or characteristics of something.
Discuss: consider a topic from more than one point of view, presenting arguments for and against.
Explain: give reasons showing how or why something happens.
Compare: identify the similarities (and differences) between two or more things.
Contrast: identify the differences between two or more things.
Analyze: break a topic down into its component parts and examine each one.
Define: give the precise meaning of a term or concept.
Summarize: present the main points briefly, leaving out details and examples.
Outline: present the main features or general plan of a topic.
Evaluate: make a judgment about the value or effectiveness of something, supported by evidence.
Another essential element of good writing assessment is the expected response. The
expected response is a description of what the teacher intends students to do with the
writing task. Before communicating information on the expected response to students, it
is necessary for the teacher to have a clear picture of what type of response they want the
assessment task to generate.
Finally, whatever way you choose to assess writing, it is recommended that you evaluate
the effectiveness of your writing tasks/tests. According to Hyland (2003), good writing
tasks are likely to produce positive responses to the following questions:
Did the prompt discriminate well among my students?
Were the essays easy to read and evaluate?
Were students able to write to their potential and show what they knew?
Issues in Writing Assessment
Time Allocation
A commonly-asked question by teachers is how much time should students be given to
complete writing tasks. Although timing would depend on whether you are assessing
process or product, a good rule of thumb is provided by Jacobs et al. (1981). In their
research on the Michigan Composition Test, they state (p. 19) that allowing 30
minutes is probably sufficient time for most students to produce an adequate sample of
writing. With process oriented writing or portfolios, much more time should be allocated
for assessment tasks.
Process vs. Product
In recent years, there has been a shift towards focusing on the process of writing rather
than on the written product. Some writing tests have focused on assessing the whole
writing process from brainstorming activities all the way to the final draft (or finished
product). In using this process approach, students usually have to submit their work in a
portfolio that includes all draft material. A more traditional way to assess writing is
through a product approach. This is most frequently accomplished through a timed
essay, which usually occurs at the mid and end point of the semester. In general, it is
recommended that teachers use a combination of the two approaches in their writing
assessment, but the approach ultimately depends on the course objectives.
Test Administration Conditions
In this day and age, technology has the potential to impact writing assessment. In the
move toward more authentic writing assessment, it is being argued that students should
be allowed to use computers, spell and grammar checkers, a thesaurus, and online dictionaries
as these tools would be available to them in real-life contexts.
In parts of the world where writing assessment is taking place electronically, these
technological advances bring several issues to the fore. First of all, when we allow
students to use computers, they have access to tools such as spell and grammar check.
This access could put those who write by hand at a distinct disadvantage. The issue of
skill contamination must also be considered as electronic writing assessment is also a test
of keyboarding and computer skills. Whatever delivery mode you decide to use for your
writing assessments, it is important to be consistent with all students.
Topic Restriction
Topic restriction is a controversial and often heated issue in writing assessment. Topic
restriction is the belief that all students should be asked to write on the same topic with
no alternatives allowed. It is believed by many teachers in the field that students perform
better when they have the opportunity to select the prompt from a variety of alternative
topics. When given a choice, students often select the topic that interests them and one
for which they have background knowledge. The obvious benefit of providing students
with a list of alternatives is that if they do not understand a particular prompt, they will be
able to select another. The major advantage to giving students a choice of writing prompt
is the reduction of student anxiety.
On the other hand, the major disadvantage of providing more than one prompt is that it is
often difficult to write prompts which are at the same level of difficulty. Many testers
feel that it is generally advisable for all students to write on the same topic because
allowing students to choose topics introduces too much variance into the scores.
Moreover, marker consistency may be reduced if all papers read at a single writing
calibration session are not on the same topic. The general consensus within the
language testing community is that all students should write on the same topic, and
preferably on more than one common topic. Research results, however, are mixed on whether students write
better with single or with multiple prompts (Hamp-Lyons, 1990). It is thought that the
performance of students who are given multiple prompts may be less than expected
because students often waste time selecting a topic instead of spending that time writing.
If you do decide to allow students to select a topic from a variety of alternatives, make
sure your alternative topics are the same genre and rhetorical pattern. This practice will
make it easier for you to achieve inter-rater reliability.
Classroom Teacher as Rater
Should classroom teachers mark their own students' papers? Experts disagree here.
Those who are against having teachers mark their own students' papers warn that there is
the possibility that teachers might show bias either for or against a particular student.
Other experts believe that it is the classroom teacher who knows the student best and
should be included as a marker. Double blind marking is the recommended ideal where
no student identifying information appears on the scripts.
Multiple Raters
Do we really need more than one marker for student writing samples? The answer is an
unequivocal yes. All reputable writing assessment programs use more than one rater to
judge essays. In fact, the recommended number is two, with a third in case of extreme
disagreement or discrepancy. Why? It is believed that multiple judgments lead to a final
score that is closer to a true score than any single judgment (Hamp-Lyons, 1990).
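The two-raters-plus-a-third arrangement described above can be sketched in code. This is an illustrative sketch only: the one-point discrepancy threshold and the rule of averaging the two closest ratings are assumptions for the example, not a prescribed standard.

```python
def final_score(r1, r2, r3=None, max_discrepancy=1.0):
    """Combine independent ratings of one essay.

    If the first two raters agree within `max_discrepancy` scale points,
    the final score is their average; otherwise a third rating is required
    and the final score is the average of the two closest ratings.
    """
    if abs(r1 - r2) <= max_discrepancy:
        return (r1 + r2) / 2
    if r3 is None:
        raise ValueError("Third rating required: discrepancy too large")
    # Average the two closest ratings, discarding the outlying one.
    pairs = [(abs(a - b), a, b) for a, b in [(r1, r2), (r1, r3), (r2, r3)]]
    _, a, b = min(pairs)
    return (a + b) / 2

print(final_score(4, 5))        # raters agree, average the two scores
print(final_score(3, 6, 5.5))   # large discrepancy, third rater resolves it
```

A program might instead take the third rater's score as final, or average all three; the key point is that the adjudication rule is agreed in advance.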
Ways to Assess Writing
The F/SL literature generally addresses two types of writing: free writing and guided
writing. The former requires students to read a prompt that poses a situation and write a
planned response based on a combination of background knowledge and knowledge
learned from the course. Guided writing, however, requires students to manipulate
content that is provided in the prompt, usually in the form of a chart or diagram.
Guided Writing
Guided writing is a bridge between objective and subjective formats. This task requires
teachers to be very clear about what they expect students to do. Decide in advance
whether mechanical issues like spelling, punctuation and capitalization matter when the
task focuses on comprehension. Some important points to keep in mind for guided
writing are:
Be clear about the expected form and length of response (e.g., one paragraph, a 250-word
essay, a letter).
If you want particular information included, clearly specify it in the prompt (e.g.,
three causes and effects, two supporting details).
Similarly, specify the discourse pattern(s) the students are expected to use (e.g.,
compare and contrast, cause and effect, description).
Free Writing
All of the above suggestions are particularly germane to free writing. The goal for
teachers is to elicit comparable products from students of different ability levels.
The use of multiple raters is especially important in evaluating free writing. Agree on
grading criteria in advance and calibrate before the actual grading session.
Acquaint students with the marking scheme in advance by using it for teaching,
grading homework and providing feedback. (Remember: in all cases, good
assessment mirrors actual classroom instruction.)
Teach good writing strategies by providing students with enough space for an outline,
a draft, and the finished product.
Self Assessment:
There are two self-assessment techniques that can be used in writing assessment: dialog
journals and learning logs. Dialog journals require students to regularly make entries
addressed to the teacher on topics of their choice. The teacher then writes back,
modeling appropriate language use but not correcting the students' language. Dialog
journals can be in a paper/pencil or electronic format. Students typically write in class
for a five to ten minute period either at the beginning or end of the class. If you want to
use dialog journals in your classes, make sure you don't assess students on language
accuracy. Instead, Peyton and Reed (1990) recommend that you assess students on areas
like topic initiation, elaboration, variety, and use of different genres, expressions of
interests and attitudes, and awareness about the writing process.
Peer Assessment
Peer Assessment is yet another technique that can be used when assessing writing. Peer
assessment involves the students in the evaluation of writing. One of the advantages of
peer assessment is that it eases the marking burden on the teacher. Teachers don't need
to mark every single piece of student writing, but it is important that students get regular
feedback on what they produce. Students can use checklists, scoring rubrics or simple
questions for peer assessment. The major rationale for peer assessment is that when
students learn to evaluate the work of their peers, they are extending their own learning
opportunities.
Portfolio Assessment
Portfolio-based assessment examines multiple pieces of writing written over time under
different constraints rather than an assessment of a single essay written under a specified
time. Many programs are moving toward portfolio assessment as a response to the
reflective, local needs of students and programs.
Definitions vary but the general consensus is that, in simple terms, a portfolio is a
collection of student work. As far as portfolios are defined in writing assessment, a
portfolio is a purposive collection of student writing over time, which shows the stages in
the writing process a text has gone through and thus the stages of the writer's growth.
Increasingly, portfolios are being compiled in a way that allows the student to provide
evidence of self-reflection. Portfolios reflect accomplishment relative to specific
instructional goals or objectives. Key elements of portfolios are student reflection and
self-monitoring. Portfolios can showcase a student's best work or display a collection of
both drafts and final products to demonstrate progress and continued improvement.
Characteristics of a portfolio
Several well-known testers have put forth lists of characteristics that exemplify good
portfolios. For instance, Paulson, Paulson and Meyer (1991) believe that portfolios must
include student participation in four important areas: 1) the selection of portfolio
contents; 2) the guidelines for selection; 3) the criteria for judging merit and 4) evidence
of student reflection.
The element of reflection figures prominently in the portfolio assessment experience. It is
generally recognized that one of the main benefits of portfolio assessment is the
promotion of learner reflection. By having reflection as part of the portfolio process,
students are asked to think about their needs, goals, weaknesses and strengths in language
learning. They are also asked to select their best work and to explain why that particular
work was beneficial to them. Learner reflection allows students to contribute their own
insights about their learning to the assessment process. In our view, Santos (1997) says it
best: "without reflection, the portfolio remains a folder of all my papers."
Marking Procedures for the Assessment of Writing
Reliable writing assessment requires a carefully thought-out set of procedures and a
significant amount of time needs to be devoted to the rating process.
First, a small team of trained and experienced raters needs to select a number of sample
benchmark scripts from completed exam papers. These benchmark scripts need to be
representative of the following levels at minimum:
Clear pass (good piece of writing that is solidly in the A/B range)
Borderline pass (a paper that is on the borderline between pass and fail but shows
enough of the requisite information to be a pass)
Borderline fail (a paper that is on the borderline between pass and fail but does
not have enough of the requisite information to pass)
Clear fail (a below average paper that is clearly in the D/F range)
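The four benchmark levels can be expressed as a simple band classifier. The 100-point scale, pass mark of 60, and five-point borderline band below are hypothetical values chosen for illustration; each program would set its own.

```python
def benchmark_level(score, pass_mark=60, band=5):
    """Classify a script relative to a pass mark (0-100 scale assumed).

    Scripts within `band` points of the pass mark are borderline;
    everything else is a clear pass or a clear fail.
    """
    if score >= pass_mark + band:
        return "clear pass"
    if score >= pass_mark:
        return "borderline pass"
    if score >= pass_mark - band:
        return "borderline fail"
    return "clear fail"

for s in (80, 62, 57, 40):
    print(s, benchmark_level(s))
```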
Once benchmark papers have been selected, the team of experienced raters needs to rate
the scripts using the scoring criteria and agree on a score. It will be helpful to note down
a few of the reasons why the script was rated in such a way. Next, the lead arbitrator
needs to conduct a calibration session (oftentimes referred to as a standardization or
norming session) where the entire pool of raters rate the sample scripts and try to agree
on the scores that each script should receive. In these calibration sessions, teachers
should evaluate and discuss benchmark scripts until they arrive at a consensus score.
These calibration sessions are time consuming and not very popular with groups of
teachers who oftentimes want to get started on the writing marking right away. They can
also get very heated especially when raters of different educational and cultural
backgrounds are involved. Despite these disadvantages, they are an essential component
to standardizing writing scores.
Writing Assessment Scales
An important part of writing assessment deals with selecting the appropriate writing scale.
Selecting the appropriate marking scale depends upon the context in which a teacher works.
This includes the availability of resources, amount of time allocated to getting reliable writing
marks to administration, and the teacher population and management structure of the
institution.
The F/SL assessment literature generally recognizes two different types of writing scales for
assessing student written proficiency: holistic and analytic.
Holistic Marking Scales
Holistic marking is based on the marker's total impression of the essay as a whole. Holistic
marking is variously termed as impressionistic, global or integrative marking. Experts in
holistic marking scales maintain that this type of marking is quick and reliable if 3 to 4
people mark each script. The general rule of thumb for holistic marking is to mark for two
hours and then take a rest, grading no more than 20 scripts per hour. Holistic marking is most
successful using scales of a limited range (e.g., from 0-6).
FL/SL educators have identified a number of advantages to this type of marking. First, it
is reliable if done under no time constraints and if teachers receive adequate training.
Also, this type of marking is generally perceived to be quicker than other types of writing
assessment and enables a large number of scripts to be scored in a short period of time.
Third, since overall writing ability is assessed, students are not disadvantaged by one
lower component such as poor grammar bringing down a score. An additional advantage
is that the scores tend to emphasize the writer's strengths (Cohen, 1994: 315).
Several disadvantages of holistic marking have also been identified. First of all, this type of
marking can be unreliable if marking is done under short time constraints and with
inexperienced, untrained teachers (Heaton, 1990). Secondly, Cohen (1994) has cautioned that
longer essays often tend to receive higher marks. Testers point out that reducing a score to
one figure tends to reduce the reliability of the overall mark. It is also difficult to interpret a
composite score from a holistic mark. The most serious problem associated with holistic
marking is the inability of this type of marking to provide washback. More specifically, when
marks are gathered through a holistic marking scale, no diagnostic information on how those
marks were awarded appears. Thus, testers often find it difficult to justify the rationale for the
mark. Hamp-Lyons (1990) has stated that holistic marking is severely limited in that it does
not provide a profile of the student's writing ability. Finally, since this type of scale looks at
writing as a whole, there is a tendency on the part of the marker to overlook the various
sub-skills that make up writing.
Both the Educational Testing Service (ETS) and the International English Language Testing
System (IELTS) have conducted a tremendous amount of research in the area of holistic
marking.
Analytical Marking Scales
Analytic marking is where raters provide separate assessments for each of a number of
aspects of performance (Hamp-Lyons, 1991). In other words, raters mark selected aspects of
a piece of writing and assign point values to quantifiable criteria. In the literature, analytic
marking has been termed discrete point marking and focused holistic marking. Analytic
marking scales are generally more effective with inexperienced teachers. In addition, these
scales are more reliable for scales with a larger point range.
A number of advantages have been identified with analytic marking. Firstly, unlike holistic
marking, analytical writing scales provide teachers with a "profile" of their students' strengths
and weaknesses in the area of writing. Additionally, this type of marking is very reliable if
done with a population of inexperienced teachers who have had little training and grade under
short time constraints (Heaton, 1990). Finally, training raters is easier because the scales are
more explicit and detailed.
Just as there are advantages to analytic marking, educators point out a number of
disadvantages associated with using this type of scale. Analytic marking is perceived to be
more time consuming because it requires teachers to rate various aspects of a student's essay.
It also necessitates a set of specific criteria to be written and for markers to be trained and
attend frequent calibration sessions. These sessions ensure that inter-marker differences
are reduced, thereby increasing reliability. Also, because teachers look at specific areas in a
given essay, the most common being content, organization, grammar, mechanics and
vocabulary, marks are often lower than for their holistically-marked counterparts.
Perhaps the most well-known analytic writing scale is the ESL Composition Profile
(Jacobs et al., 1981). This scale contains five component skills, each focusing on an
important aspect of composition and weighted according to its approximate importance:
content (30 points), organization (20 points), vocabulary (20 points), language use (25
points) and mechanics (5 points). The total weight for each component is further broken
down into numerical ranges that correspond to four levels, from "very poor" to "excellent
to very good".
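The Profile's weighting scheme amounts to a simple weighted sum. The sketch below assumes raters award each component a score up to its stated maximum; the component maxima are those given above for the ESL Composition Profile.

```python
# Component maxima from the ESL Composition Profile (Jacobs et al., 1981).
PROFILE_WEIGHTS = {
    "content": 30,
    "organization": 20,
    "vocabulary": 20,
    "language_use": 25,
    "mechanics": 5,
}

def profile_total(scores):
    """Sum component scores, checking each stays within its weighted maximum."""
    for component, maximum in PROFILE_WEIGHTS.items():
        value = scores[component]
        if not 0 <= value <= maximum:
            raise ValueError(f"{component} must be between 0 and {maximum}")
    return sum(scores[c] for c in PROFILE_WEIGHTS)

essay = {"content": 24, "organization": 16, "vocabulary": 15,
         "language_use": 20, "mechanics": 4}
print(profile_total(essay))  # 79 out of a possible 100
```

Because each component carries its own maximum, the weighting is built into the scale itself and no separate multiplication step is needed when totalling.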
Responding to Student Writing
Another important aspect of writing marking is providing written feedback to students.
This feedback is essential in that it provides opportunities for students to learn and make
improvements to their writing. Probably the most common type of written teacher
feedback is handwritten comments on the students papers. These comments usually
occur at the end of the paper or in the margins. Some teachers like to use correction
codes to provide formative feedback to students. These simple correction codes facilitate
marking and minimize the amount of red ink on student writing. The sample marking
codes listed below are an example of a common correction code used by teachers.
Advances in technology provide us with
another way of responding to student writing. Electronic feedback is particularly
valuable because it can be used to give a combination of handwritten comments and
correction codes.
Teachers can easily provide commentary and insert corrections through Microsoft Word's
Track Changes facility and through simple-to-use software programs like Markin.
Sample Marking Codes for Writing
sp   Spelling
vt   Verb tense
ww   Wrong word
wv   Wrong verb
Nice idea/content!
Switch placement
New paragraph
I don't understand
Research indicates that teacher written feedback is highly valued by second language
writers (F. Hyland, 1998 as cited in Hyland, 2003) and many students particularly value
feedback on their grammar (Leki, 1990). Although positive remarks are motivating and
highly valued by students, Hyland (2003) points out that too much praise or positive
commentary can be counterproductive.
8. Provide students with diagnostic feedback.
Use writing assessment results to identify what students can and cannot do well and make
sure to provide this information to students. With analytic marking you will have access
to a profile to give students feedback. With holistic marking scales, be sure to take notes
on students' strengths and areas for improvement.
9. Practice blind or double-blind marking.
Mark essays without looking at students' names as the general impression we have of our
students is a potential form of bias. Some teachers mark on the basis of how well they
know the student and his/her abilities. It is not uncommon for a teacher to give a higher
score to a poorly written script of a good or above-average student by rationalizing that
"Mohammed is really a good student even though he didn't show it on this essay. Maybe
he was tired or not feeling well." This is known as the halo effect. Have students put
their names on the back of their papers or issue each student with a candidate number to
prevent this practice.
10. Calibrate and recalibrate.
The best way to achieve inter-rater reliability is to practice. Start early in the academic
year by employing the marking criteria and scale in non-test situations. Make students
aware from the outset of the criteria and expectations for their work. Reliability can be
increased by using multiple marking, which reduces the scope for error that is inherent in
a single score.
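One way to monitor how well calibration is working is to track agreement between pairs of raters across a set of scripts. The percentage-agreement sketch below is just one option; correlation coefficients and kappa statistics are common alternatives. The example scores are hypothetical.

```python
def agreement_rates(rater_a, rater_b, adjacent=1):
    """Exact and adjacent agreement between two raters' scores.

    Exact agreement counts identical scores; adjacent agreement counts
    pairs differing by at most `adjacent` scale points, a common
    statistic reported after calibration sessions.
    """
    pairs = list(zip(rater_a, rater_b))
    exact = sum(a == b for a, b in pairs) / len(pairs)
    adj = sum(abs(a - b) <= adjacent for a, b in pairs) / len(pairs)
    return exact, adj

# Two raters' holistic scores (0-6 scale) for the same four scripts.
exact, adj = agreement_rates([4, 5, 3, 6], [4, 4, 3, 2])
print(exact, adj)
```

A falling agreement rate between calibration sessions is a signal that the group needs to recalibrate before continuing to mark live scripts.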
Extension Activity
Mr. Knott has been asked to contribute a writing prompt for use in the next midterm test.
The course objectives focused on description. Here is the prompt he developed. What
aspects of the prompt are good? What aspects of the prompt might prove problematic?
See the appendix for Mrs. Wright's review of this writing prompt.