Language Testing: Expertise in Evaluating Second Language Compositions
http://ltj.sagepub.com/
Published by:
http://www.sagepublications.com
This study (1) assesses whether raters implicitly distinguish students’ writing
expertise and second language proficiency while evaluating ESL compositions
holistically and (2) seeks to describe the decision-making behaviours used by
experienced and inexperienced raters in this process. The performance of 7
novice and 6 expert ESL teachers was assessed while they evaluated 12
compositions written by adult students with differing levels of ESL proficiency
(intermediate and advanced) and writing expertise (average and professionally
experienced writers) in their mother tongues. Multivariate analyses of rating
scores indicated that both groups’ evaluations distinguished students’ second
language proficiency and writing skills as separate, non-interacting factors.
Descriptive analyses of the raters’ concurrent verbal reports revealed 28
common decision-making behaviours, many of which varied significantly in
use between novice and expert groups. Implications discuss biases in holistic
evaluations of second language compositions, aspects of expertise in this skill,
and potential uses of this research for the training of composition raters and
student-teachers.
I Introduction
Holistic and analytic methods for evaluating compositions have
gained wide acceptance in second language testing and teaching
practices (Canale, 1981; Carroll, 1980; Jacobs, Zinkgraf, Wormuth,
Hartfiel, and Hughey, 1981; Perkins, 1983). But understanding of
these evaluation methods has, in two respects, remained limited.
Firstly, it is not certain that students’ language proficiency can
logically be distinguished from their writing skills, since both factors
necessarily interact in the processes (i.e., for the student) and
products (i.e., for the evaluator) of composition writing in a second
language. For instance, Cumming (1989) found both factors had
significant, but separate, effects on analytic rating scores in three ESL
composition tasks. This suggests such evaluation procedures may
II Approach
Twelve compositions were selected from a pool of 147 composition
examinations administered as a placement test for ESL classes at a
Canadian university. The compositions were selected to represent:
two levels of ESL proficiency (intermediate and advanced); two levels
of writing expertise in students’ mother tongue (average student
writers and those with professional experience writing in their work);
and thirdly the writing of students from different language and
cultural backgrounds (so as to counterbalance this effect on the
ratings).
TOEFL scores and interview assessments, collected in the pro-
cess of placement testing, were used to establish ESL proficiency
groupings. The intermediate group had TOEFL scores between 387
and 457, and had been placed in pre-university ESL classes. The
advanced group’s scores ranged from 537 to 627; they had gained
admission to academic programmes at the university, though they
had opted to take one ESL course. Data on students’ expertise in
writing in their mother tongues were collected using a self-report
instrument (validated in several earlier studies, see Cumming, 1989)
asking for self-assessment of abilities to write in various situations, as
well as self-reports of professional experience writing. Compositions
were selected from extreme ends of the 4-point scale in this instru-
ment. Only students reporting professional experience writing were
included in the higher level; none reported being published authors.
For the 12 compositions chosen, TOEFL scores correlated with
holistic interview ratings (r = .8, p < .001), but not with students’
self-ratings of writing expertise (r = .04). The number of words
III Findings
1 Second language proficiency and writing expertise
MANOVA results indicated that both groups of raters implicitly
distinguished students’ ESL proficiency and writing expertise as
separate factors in their rating of the compositions. The ratings of the
expert group showed significant main effects for ESL proficiency
(F = 28.1, p < .0001) and writing expertise (F = 4.4, p < .04). Similar
results appeared for the novice group as main effects for ESL
proficiency (F = 12.5, p < .001) and writing expertise (F = 10.4,
p < .002). Interestingly, there were no interaction effects between the
main factors for either group, suggesting that both groups of raters
implicitly treated students’ ESL proficiency and writing expertise as
separate, distinct factors in their evaluations. As Figure 1 shows, the
ratings for both groups tended to increase consistently (about 1 point
on the 4-point scale) according to students’ levels of ESL proficiency
or writing expertise.
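
The pattern described above (two main effects and no interaction) can be illustrated with a minimal sketch of a 2 x 2 design. The cell means below are invented for illustration only and are not the study's data; they are chosen simply so that each factor adds about 1 point on the 4-point scale. The function names are likewise hypothetical. The sketch shows only the arithmetic of main effects and the interaction contrast, not the MANOVA itself.

```python
from statistics import mean

# Hypothetical cell means on the 4-point scale (illustrative values only,
# NOT taken from the study). Keys are (ESL proficiency, writing expertise).
cell_means = {
    ("intermediate", "average"): 1.5,
    ("intermediate", "professional"): 2.5,
    ("advanced", "average"): 2.5,
    ("advanced", "professional"): 3.5,
}


def main_effect(cells, factor_index, low, high):
    """Difference between the marginal means of the two levels of one factor."""
    high_mean = mean(v for k, v in cells.items() if k[factor_index] == high)
    low_mean = mean(v for k, v in cells.items() if k[factor_index] == low)
    return high_mean - low_mean


def interaction_contrast(cells):
    """Difference of differences; zero means the two factors act additively."""
    gain_advanced = (cells[("advanced", "professional")]
                     - cells[("advanced", "average")])
    gain_intermediate = (cells[("intermediate", "professional")]
                         - cells[("intermediate", "average")])
    return gain_advanced - gain_intermediate


proficiency_effect = main_effect(cell_means, 0, "intermediate", "advanced")
expertise_effect = main_effect(cell_means, 1, "average", "professional")
interaction = interaction_contrast(cell_means)

print("ESL proficiency main effect:", proficiency_effect)
print("Writing expertise main effect:", expertise_effect)
print("Interaction contrast:", interaction)
```

With these invented values, each factor raises the mean rating by the same amount at either level of the other factor, so the interaction contrast is zero. This is what "separate, non-interacting factors" means in arithmetic terms.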
Univariate tests indicated that the ratings for the novice and expert
groups were, however, significantly different from each other: for
ratings of ’content’ (F = 13.5, p < .0003) and ’rhetorical organiza-
tion’ (F = 13.6, p < .0003), but not for ratings of ’language use’
2 Decision-making behaviours
Impressionistic analyses of the raters’ verbal reports identified 28
decision-making behaviours used to interpret and evaluate the student
Expert 1: Now I’m moving to paper number 61, which is presented in two
paragraphs, about half as long as the longest one so far. Well, a bit longer than
the one I just saw, but not by much./A wonderful job./We’ve got verb tense
problems. We’ve got conditional problems. Fairly basic stuff here./We go to
another paragraph here. Why? /This student has a strong sense of sentence. We
don’t have a lot of run-ons and fragments. Ah, the sentences are very simple. A
lot of, um, simple problems, verbs, conditionals, spelling. This student hasn’t
had very much English language training. She’s probably quite good orally.
But, ah, she hasn’t had a basic grounding in writing, nor in English structure.
Um, the person probably isn’t a very good writer. We have a very kind of oral
kind of writing./She gives us her background. The only reason really, the
student is on topic in that she may be able to get a better job back in Germany,
but, ah, she doesn’t really look at the effect she predicts her studies to have. She
presents us with her history.
Expert 4: Okay, 61. A shortie. Two paragraphs. Here we go./Okay, now, she
understands rhetorical paragraphing. But she doesn’t have a real conclusion
here. Ah, an introduction, that’s basically what? A paragraph, one line, what
she did, ah, in Germany, and why she wants to learn English. And, ah, kind of
muddled that second paragraph./Another word form. ’Quiet’ instead of
’quite’./Now, we’ve got several ideas here, but not well developed. So I think I
can only give her a 2 for rhetorical patterns. Content? A lot of different ideas
that are very nice. But the organization doesn’t allow you to see them. I’m not
sure that those ideas are there. She’s talking about how nice the city is. She’s
talking about money. Ah, she has a lot of friends who speak English. And
English is important. Ah, this is a poor pattern.
Two novices, however, avoided editing strategies altogether. Novice
7, for instance, focused almost exclusively on comprehending the
ideas communicated in the texts, rejecting analyses of the language
features of the compositions for the reason that that would be too
’technical’: ’That is really difficult, this whole business of commu-
nicating. If they get their ideas across, I guess, um, I mean that’s the
primary thing, I would say, if I can understand what they’re trying to
get across. And the rest of it seems very artificial to me, the technical
aspect of it.’ However, as she progressed with the evaluations, she
found she had avoided making distinctions necessary for assessment:
‘I can see my tendency is to give them a similar number straight across
the board. I haven’t made much discrimination between each of, ah,
the language use, content, and rhetorical organization. The fine
tuning in that area, I think, would drive me bananas.’
IV Discussion
The results of the first analysis in this study confirm Cumming’s
Acknowledgements
I greatly appreciated: assistance from Ernest Hall and Catherine
Ostler-Howlett in coding the data; advice from Lee Gunderson on the
statistical analyses; the time and effort volunteered by the 13 teachers
and student-teachers to rate the compositions and report on their
thinking; and partial funding of this research by the Social Sciences
and Humanities Research Council of Canada through the University
of British Columbia.
V References
This may be rewriting his c.v. You know, we may have a language here that is
memorized, that’s not really generated. (Expert 1)
It’s very curious to me. I want to know, I guess the other thing I would like to
know is what are the life experiences of these people? Some of them are dead
give-aways in their writing. Others, um, are not. And I’m tempted, I guess I’m
just very curious to know if it’s their lack of experience, or age, or what have
you, that affects their writing. Because some of these come across as being
much younger. (Novice 7)
Um, ’as it was’? What does he say? What’s the purpose of that? I’m not sure.
(Novice 8)
4. summarize propositions
Okay, the writer deals with, not only, studying both English and French. Um,
the value of studying English and French, then the value of meeting people
from other countries. As a, um, the value in broadening one’s world view.
(Expert 1)
This person has said, what? They introduced themselves, why they’re taking
English, why they’re here, um, what they like about being here. (Novice 10)
What has just crossed my mind, now, as I just look at the way the composition
actually looks is that it’s not broken down into paragraphs at all. (Expert 2)
The conclusion, the person has decided to, ah, slightly alter the topic, or the
conclusion of the topic. (Novice 12)
thing means. And then I can go through and, ah, see what individual correc-
tions to work on. (Novice 10)
10. read to assess specific criteria
But, ah, language use? I’m looking now for just simple subject-verb units to
get a sense of sentence. (Expert 1)
So that’s number 85. Um, language use? Well, I’m looking for what kind of
content words there are that might show a good level of understanding, or use
of difficult words maybe. (Novice 10)
You can’t help but be drawn into the personal thinking in some of these.
They’re fairly heroic characters, some of them. (Novice 7)
17. assess development of topics
He makes abundant applications. His reason for studying, it’s developed here.
Again it’s for the company. (Expert 4)
It seems to be quite logical. It begins with how we should consider the future.
And, um, it begins by telling us about his or her problem in Taiwan. And that
actually follows up to why this person is here. (Novice 8)
18. rate content overall
I’m going to give it a 1 in terms of content. (Expert 2)
Content? It isn’t ineffective, but it’s not effective. So I’d say, it’s a 2. (Novice
12)
For language use, there are some really low level mechanical sorts of problems
in verbs. But she does use a lot of transitions and attempts at coherence.
(Expert 1)
Um, for language use? It’s pretty good for language use. I’d give it a 3 for
language use. (Novice 13)
Well, um, this writer has no concept of, ah, rhetorical organization. He’s put