Speaking Chapter
Cognitive validity
1 author:
John Field
University of Bedfordshire
All content following this page was uploaded by John Field on 24 July 2017.
This chapter considers the cognitive validity of the Speaking tasks which
feature in the Cambridge ESOL suite. ‘Cognitive validity’ is understood here
as the extent to which the tasks in question succeed in eliciting from
candidates a set of processes which resemble those employed in a real-world
speaking event. A second consideration is how finely the relevant processes
are graded across the levels of the suite in terms of the cognitive demands that
they impose upon the candidate.
Previous volumes in this series have considered the cognitive validity of
Cambridge ESOL tests of writing and reading. This chapter can therefore
draw upon the format and approach already established in Chapters 3 of
Shaw and Weir (2007) and Khalifa and Weir (2009). As with the earlier
analyses, a major goal is to propose a cognitive model of the construct in
question, which can serve as a framework for judging the cognitive validity
of any test of skilled performance. The model is not drawn from testing
theory but from independent insights afforded by empirical research into the
psychology of language use.
It needs to be borne in mind that this is the first volume of the series to
tackle the spoken modality. While there are certain parallels in the
processing model presented here – especially with the model of writing proposed by
Shaw and Weir (2007) – there are also marked differences. Perhaps the most
important distinguishing characteristic of the oral skills is that they typically
operate under tight time constraints that are not usually present in writing
and reading. An interactional speaker self-evidently does not have time for
planning and revising in the way that a writer does. Due account will be taken
of this important difference in the discussion that follows.
The chapter falls into three parts. Firstly, there is an explanation of the
general notion of cognitive validity, which retraces some of the points made
in earlier volumes but does so with specific reference to the present exercise.
Next, some background is provided to research findings on the nature of
L1 speaking; and a process account of the skill is proposed, drawing chiefly
upon the model devised by Levelt (1989). The Levelt model is adapted to
provide a five-part framework for an examination of the cognitive validity
of the tests in the Cambridge ESOL suite. The discussion then goes on
to consider two important characteristics of the types of speech elicited by
Examining Speaking
the tasks in the test. Clearly, this entails some consideration of task design,
leading us into a grey area at the interface between cognitive and context
validity. But the goal here is to consider these particular features strictly in
terms of any likely additional cognitive demands which they impose upon
the candidate.
(Levelt 1989: Chapter 1). They also take into account the possible impact
upon a speaker’s attentional resources of affective factors such as tiredness
or anxiety. But their principal focus is on the mental operations which
are engaged by language users under normal circumstances and the way in
which situation-specific information of various kinds can be integrated into
those operations.
It is also incorrect to assert that cognitive psychologists rely upon the
assumption that all language users behave identically. While certain
processing routines may provide the easiest and most efficient routes to language
production and reception, some users (including L2 learners) achieve the
same goals by less direct means. It is evident that individuals, whether
speaking in L1 or L2, vary enormously in the range of vocabulary they command
and in their powers of expression. It is also evident that L2 speakers respond
in very individual ways to the challenges posed by an inadequate lexical or
grammatical repertoire. Similarly, there is no suggestion that test designers
can afford to ignore the influence of factors arising from individual speaker
differences of age, gender, ethnicity, first language background, etc., as
discussed in the previous chapter on test taker characteristics.
Nevertheless, the premise is adopted that underlying the four language
skills are certain established and shared routines which can be traced by
examining and comparing the performance of expert language users. This
assumption is supported by two lines of argument:
In line with the notion of ‘cognition’ that has just been outlined, it will be
assumed in considering the speaking skill that variation between tasks due to
situation or genre falls outside the remit of cognitive validity (see Chapter 4
on context validity). Similarly, the discussion will in the main exclude
consideration of social-affective factors which might influence the content or
delivery of the speaker’s utterances; these features are the concern of social rather
than cognitive psychology and fall under the speaker-specific aspects covered
in Chapter 2. To the extent that the present chapter considers linguistic
difficulty at all, it does so purely in terms of the cognitive effort which the
assembly of a syntactic structure or the retrieval of a piece of vocabulary might
require. More general issues of linguistic accuracy, fluency and complexity
are covered in Chapter 4.
rate is so high.’ All the more reason, then, for those who design tests of
speaking to have a detailed understanding of the nature of the skill and of the
processes that contribute to it.
Extensive empirical research (psychological, neurological and phonetic)
into all aspects of the skill of speaking has enabled commentators to achieve
a fair degree of consensus as to the processes engaged when individuals
assemble a spoken utterance in their first language. Early psycholinguistic
research in the 1960s into producing and analysing spoken language
concerned itself greatly with syntactic structure and with the extent to which the
rules of grammar (particularly Chomskyan grammar) might have psychological
reality (i.e. represent the processes in which a speaker engages when
producing an utterance). These enquiries indicated that the clause was an
important unit of speech assembly but proved otherwise generally
inconclusive. (See Aitchison 2008 for an accessible extended discussion.) It was at
this point that many psycholinguists turned to an evidence-driven approach
to speech production in preference to one that was largely shaped by
established linguistic theory. One line of enquiry built upon emerging evidence
from phonetics in respect of phenomena such as pausing. It became clear that
brief pauses (generally of 0.2 to 1.0 seconds) were necessary for the forward
planning of speech; and the location of these planning pauses was found to
correspond quite consistently with syntactic boundaries, again implicating
the clause as a unit of assembly. One theory (Beattie 1983) suggested that the
length of planning pauses varied according to whether the planning in question
related solely to the form of the next utterance or whether it additionally
anticipated the conceptual content of later utterances.
A second line of enquiry focused especially on the errors made by naturally
performing speakers, known as slips of the tongue. The notion was that
by examining failures of the speech assembly system, one might gain insights
into the cues that speakers were using in aiming for their targets. Slips of the
tongue provided evidence suggesting that a syntactic frame was prepared by
a speaker in advance of lexical items being slotted into it (He found a wife
for his job, substituted for the target He found a job for his wife.) and that
morphological markings were added at quite a late stage (She come backs
tomorrow)1. These findings provided the basis for an early model of speech
production by Garrett (1980, 1988), which drew also upon research into the
speech impairments associated with aphasia. Important in Garrett’s model
were the assumptions a) that a preliminary structural frame is established
into which the outcomes of a parallel lexical search are inserted; and b) that
there is an initial planning phase where the syntactic framework and the links
to lexis are abstract, followed by a phase where they are realised concretely in
terms of word order and phonological word form.
A problem to which Garrett and others gave much thought was the
extent to which syntactic assembly was so closely intertwined with semantic
Levelt (1989, 1999) makes clear that any model of speech production,
whether in L1 or in L2, needs to incorporate a number of stages. Field
(2004:284) identifies them as:
a) a conceptual stage, where the proposition that is to be expressed first
enters the mind of the speaker
b) a syntactic stage, where the speaker chooses an appropriate frame into
which words are to be inserted, and marks parts of it for plural, verb
agreement etc.
c) a lexical stage, where a meaning-driven search of the speaker’s lexicon
or vocabulary store takes place, supported by cues as to the form of the
word (e.g. its first syllable)
d) a phonological stage, where the abstract information assembled so far is
converted into a speech-like form
e) a phonetic stage, where features such as assimilation are introduced,
which reduce articulatory effort; and where the target utterance is
converted into a set of instructions to the articulators
f) an articulatory stage, in which the message is uttered.
It is important to note that the first three of these stages are abstract
and not in verbal form. It is only at stage (d) that linguistic forms become
involved. A model of speaking also needs to allow for:
• a forward planning mechanism at discourse level, which (for example)
marks out in advance which syllable is to carry sentence stress
• a buffer, in which an articulatory plan for the current utterance can be
held while the utterance is actually being produced
• a monitoring mechanism, which enables a speaker to check an utterance
for accuracy, clarity and appropriacy immediately before it is uttered
and almost immediately afterwards.
Figure 3.1 Adapted version of the Levelt model (1989:9), separating levels of
processing from outputs of processing

Level of processing                              Output of processing
GRAMMATICAL ENCODING                             abstract surface structure
  Constructing a syntactic frame
  Forming links to lexical entries
MORPHO-PHONOLOGICAL ENCODING                     phonological plan
  Conversion to linguistic form
PHONETIC ENCODING                                phonetic plan
  Conversion to instructions to articulators
  Cues stored in a speech buffer
ARTICULATION                                     overt speech
  Execution of instructions
SELF-MONITORING                                  self-repair
important to recognise that the model (1989:9, 1999:87) represents two
distinct types of phenomenon: a) the set of processes employed by the speaker
in assembling an utterance; and b) the different forms taken by the message
as it is reshaped by the intending speaker. In the interests of clarity, the two
are displayed here in separate columns. The reader should bear in mind
that what is shown as the output of a given stage also forms the input to the
following one.
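The point that each output doubles as the next input can be made concrete with a small sketch. The Python below is purely illustrative and forms no part of Levelt's account: the names `Stage`, `STAGES` and `input_to` are hypothetical, and only the stages and outputs listed in Figure 3.1 are included.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Stage:
    """One level of processing and the form of the message it hands on."""
    name: str
    output: str


# Stages and outputs as listed in Figure 3.1, in order of processing.
STAGES = [
    Stage("grammatical encoding", "abstract surface structure"),
    Stage("morpho-phonological encoding", "phonological plan"),
    Stage("phonetic encoding", "phonetic plan"),
    Stage("articulation", "overt speech"),
    Stage("self-monitoring", "self-repair"),
]


def input_to(stage_name: str) -> Optional[str]:
    """Return the input of a stage, i.e. the output of the stage before it.

    Returns None for the first listed stage, whose input is not given here.
    """
    names = [s.name for s in STAGES]
    index = names.index(stage_name)  # raises ValueError for unknown names
    return STAGES[index - 1].output if index > 0 else None


for stage in STAGES:
    print(f"{input_to(stage.name)} -> [{stage.name}] -> {stage.output}")
```

Holding the stages in an ordered list, rather than in two separate columns, is simply a convenient way of showing that the right-hand column of one row is the starting point of the next.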
The model shown here lacks an important component of the reading
model proposed by Khalifa and Weir (2009), in that it does not show an
executive mechanism or ‘goal setter’ which controls and directs the attention of
the speaker, takes account of context and prevailing circumstances and
supports decision making. Levelt himself (1989:20–22) acknowledges the role of
such a mechanism in conceptualisation and in self-monitoring but stresses
the fact that elsewhere speaking is heavily dependent upon processes that are
highly automatic and thus not subject to central control.
Nevertheless, to bring the model into line with the Khalifa and Weir
account, it is useful to make clear the information sources upon which the
speaker draws (whether automatically or with a degree of intentionality)
when assembling an utterance. The relative richness or poverty of those
sources is clearly an important factor in shaping the performance of a second
Figure 3.2 Information sources feeding into the phases of the processing system

Rhetoric
Discourse patterns

GRAMMATICAL ENCODING
  Syntax
  Lexical knowledge (lemma*)
  Pragmatic knowledge
  Knowledge of formulaic chunks
  Combinatorial possibilities (syntactic / collocational)
PHONETIC ENCODING
  Syllabary: knowledge of articulatory settings
[ARTICULATION]
* In the 1999 update of his model, Levelt distinguishes between three components in a lexical
entry in the mind storing information about a word. There is a semantic component which
enables a match to be made between a meaning and the target word; a lemma containing
syntactic information about the word (its word class and combinatorial possibilities); and a
lexeme containing information about the word’s phonological form and morphology.
impossible for the articulators to achieve a speech rate of more than about
eight syllables per second (Miller 1951). But there are also psychological
constraints: the human processor (responsible not just for language production
and reception but for handling all kinds of mental task) is characterised
by its limited capacity. What this entails is that an increase in the demands
created by one aspect of a task will limit the performer’s ability to deliver in
other areas. Thus, it is very difficult for a language user to speak and write at
the same time or for an individual to memorise a set of facts while repeating
nonsense words.
The limitations upon what can be held in the mind short-term have two
important implications for any discussion of the role of the second language
speaker:
a) For a low-proficiency L2 speaker, the process of retrieving words
and syntactic patterns is much more effortful than it is for a native
speaker – potentially limiting performance in other areas of the speaking
process, such as the ability to hold long-term plans in the mind.
b) The complexity of the task that is set can affect performance. The more
difficult the task, the more attention the speaker gives to handling it and
the less will be available for the delivery of speech.
So, in any consideration of the processes in which test takers engage, full
account needs to be taken of these cognitive demands upon them.
Conceptualisation
Levelt (1989:107) envisages conceptualisation as entailing two types of
operation:
• macro-planning, in which a set of speech acts is anticipated
• micro-planning, at a more local level relating to the role and form of the
upcoming utterance.
These resemble the subdivisions proposed by Shaw and Weir (2007) in
their framework for the cognitive validation of L2 writing. However,
macro-planning in speaking is much more constrained than it is in writing. The
speaker is under pressure to respond promptly in most speaking contexts,
thus limiting the time available for planning and structuring content. In
addition, there are working memory constraints on how much longer-term
material the L2 speaker can store while at the same time dealing with current
production demands (unless, of course, the speaker has the support of pen
and paper to record their intentions). Micro-planning is much more localised.
It positions the intended utterance in relation to the discourse as a whole
by taking account of knowledge shared with the listener and of the current topic. It also
adds indications of language-specific features such as tense or interrogation
to the proposition that is to be expressed.
Grammatical encoding
As already suggested, Levelt’s original formulation phase can be treated as
falling into two parts. The first entails the construction of a surface
structure, an abstract framework for the sentence to be uttered, based upon a
syntactic pattern. The second converts the framework and the associated lexis
to phonological form; this involves retrieving the appropriate forms from
memory.
Levelt views surface structure as built around the major components of
the idea that the speaker wishes to express. Thus, the proposition I put two
pounds in the meter would, in an English speaker, trigger the valency pattern
associated with the word PUT:
[Agent + PUT + thing put + destination].
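The idea of a frame whose slots are filled before the utterance is linearised can be sketched in a few lines of Python. This is an illustration only, not a claim about how the frame is represented in the mind: the class `PutFrame`, its field names and the `linearise` method are hypothetical labels chosen to mirror the pattern above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PutFrame:
    """Hypothetical container for the valency pattern associated with PUT."""
    agent: str
    thing_put: str
    destination: str

    def linearise(self, verb_form: str = "put") -> str:
        # Word order follows the pattern [Agent + PUT + thing put + destination].
        return " ".join([self.agent, verb_form, self.thing_put, self.destination])


# The proposition from the text, with each slot filled separately.
frame = PutFrame(agent="I", thing_put="two pounds", destination="in the meter")
print(frame.linearise())
```

The sketch makes one point only: the three slots are obligatory for PUT, so a frame cannot be built until all of them have a value, whatever words are eventually retrieved to fill them.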
Syntactic complexity is clearly a factor in the cognitive difficulty of
producing an utterance. Raters readily assume that inability on the part of an
L2 speaker to form a particular structure derives from a lack of linguistic
knowledge; but it may equally well derive from the demands of assembling
the structure and of retaining it in the mind while the utterance is being
produced.
In tests of speaking, however, linguistic content is often expressed not
in terms of syntactic complexity of form but in terms of the language
functions which test takers are required to perform. This makes it simpler to
envisage the transition from the test taker’s initial idea to a template for an
utterance. It can be treated as a matter of mapping from the function that
the test taker wishes to perform to the pattern that best expresses that
function. Discussion of pragmatic language falls mainly within Chapter 4, which
concerns itself with linguistic criteria; but it is important to recognise that
any staging of the functions to be performed is also a staging of cognitive
demands. The issue at stake is not primarily the difficulty of the language
that has to be retrieved, but how easy it is for the test taker to perform
the mapping exercise. Contributory factors might include the frequency and
transparency of the function and the complexity of the form of words that
expresses it.
Phonological encoding
For the second language speaker, the most critical phase of forming an
utterance is the one at which they retrieve phonological forms from memory
in order to give concrete form to what has been planned. Whereas the
process of retrieval in one’s first language is generally rapid, automatic and
accurate, it is likely to be much slower in a second language, especially at
lower levels of proficiency. The search is likely to be more effortful, requiring
higher levels of attention; and the speaker is likely to be less confident of the
outcome.
A widely favoured view of second language acquisition represents it as the
acquisition of a type of expertise, and traces parallels, sometimes sustainable,
sometimes not, with the acquisition of the ability to drive or to play chess
(Anderson 1983). According to this analysis, a speaker’s ability to retrieve L2
word forms from memory develops as a result of increasing familiarity with
the operation. The speaker begins with a retrieval process which is laboured
and heavily controlled in terms of the attention it demands (compare the
careful step-by-step way in which a driver first learns to change gear). By dint
of continued use of the process, what were once separate steps become
combined, and the move from stimulus to output becomes increasingly
automatised (Schneider and Shiffrin 1977) until the speaker can achieve it without
the conscious allocation of attention. The development is known as
proceduralisation. Motivated learners often find a means of assisting it by
rehearsing forms of the spoken language in their minds in anticipation of speech
encounters that may occur. For an application of the notions of automaticity
and control to second language performance see DeKeyser (2001), Robinson
(2003) and Kormos (2006:38–51)2.
Evidence of increasing proceduralisation in L2 speakers is often marked
by a shift towards producing language in chunks (Wray 2002). What begins
as an utterance assembled piece by piece (I + do + not + know + why) later
becomes articulated as a single unit (dunnowhy). It also appears to be stored
in the mind in this form, enabling more rapid retrieval. By taking this path,
L2 speakers align their behaviour with a native speaker’s (and indeed with
their own behaviour in L1). The speed with which native speakers assemble
grammatically correct sentences can only be explained if we recognise that
the operation relies heavily upon stitching together well-established and
often-used combinations of words. Strong evidence for this comes from the
productions of sports commentators, forced to plan their utterances under
extreme time pressures (Kuiper 1996, quoted in Wray 2002):
They’re off and racing now + threading its way through + round the turn
they come
and from the productions of sound-bite politicians (Frost interview with
Tony Blair, quoted in Fairclough 2000:112):
. . . I think it’s sensible + if for example in areas like erm + the constitution or
indeed in respect of erm education it may be + or any of the issues which matter
to the country + you can work with another political party because there are
lots of things we have in common with the Liberal Democrats why not do it.
move from slow and intentional retrieval at lower levels to a high degree of
automatisation at higher levels, and from individually assembled strings of
words to the production of formulaic sequences. These developments make
an important contribution towards a listener’s impression of fluency.
Fluency is a notoriously slippery concept, and attempts to define it have
caused much controversy over the years. As Luoma (2004:88) points out, the
term can be given a wide range of definitions, from narrow specifications
relating to hesitation and speech rate to broader ones that are ‘virtually
synonymous with “speaking proficiency”’. Furthermore, it is not a simple question
of identifying features that are physically present in a speaker’s productions;
there is also the issue of how those features are perceived by a listener. With
this in mind, Lennon (1990) suggests that ratings of fluency in speaking
tests differ from scores based upon quantifiable facets such as accuracy and
appropriacy because they draw purely upon performance phenomena.
However, the assertion perhaps misses the point. Judgements of fluency
are heavily influenced by the ease with which a speaker retrieves and
assembles word forms. Lennon himself describes fluency as ‘an impression on the
listener’s part that the psycholinguistic processes of speech planning and
speech production are functioning easily and effectively’ (1990:391). Schmidt
(1992) goes further and explicitly links fluency with automaticity. It is
certainly possible to identify a number of surface features in the speech of a
test candidate which provide indicators of proceduralisation and thus of the
speaker’s progress towards more automatic retrieval processes. They include
the chunking of words, the distribution of pausing and the average length of
stretches of uninterrupted speech.
Increased chunking of word strings by L2 learners manifests itself in a
number of ways. Firstly, as already noted, there are likely to be gains in
accuracy because a chunk is produced in a way that is pre-constituted. There are
also likely to be improvements in aspects of delivery such as rhythm because
the chunks are stored as phonological wholes; this leads to the impression of
a more native-like command of the mechanics of producing the L2.
Chunking enables several words to be produced as a unit, almost as if
they were a single lexical item. Its effects are thus observable in the speaker’s
length of run – often taken to be the mean number of syllables uttered between
pauses. Raupach (1987, cited in Towell and Hawkins 1994:222) compared
French-acquiring German schoolchildren before and after a period of
residence in France and found increases of up to four syllables in their mean
length of run.
One cannot detach consideration of length of run from consideration of
how frequently the speaker pauses. The more pauses there are, the shorter the
runs are and the more fragmented the discourse is likely to appear, whether
in L1 or L2. But the issue here is not just how often a speaker pauses but also
how long and where. All speakers, however fluent, need to pause at syntactic
Once again, chunking assists the speaker – here because it simplifies the
planning process and reduces the need to revise plans while speaking. The
results are seen in shorter planning pauses and a much lower incidence of
hesitation.
To summarise, discussion of ‘retrieval’ (i.e. converting the abstract output
of the previous stage into linguistic form) has led us to identify a number of
physical characteristics which are associated with progress in the acquisition
of L2 speaking skills and which together form possible indicators of fluency.
They are:
• use of pre-assembled chunks – leading to syntactic accuracy and
native-like rhythmic properties
• length of run
• duration of planning pauses at syntactic boundaries
• frequency of hesitation pauses.
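Three of these indicators lend themselves to simple measurement once a speech sample has been segmented into runs and pauses. The Python sketch below is offered only as an illustration of the arithmetic involved, not as an established scoring procedure. The names `Run`, `Pause` and `fluency_indicators` are hypothetical, and the code assumes, for the sake of the example, that any pause not located at a syntactic boundary counts as a hesitation pause; the sample figures are invented.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Run:
    """A stretch of uninterrupted speech, measured in syllables."""
    syllables: int


@dataclass(frozen=True)
class Pause:
    """A silent pause, with a flag for whether it falls at a syntactic boundary."""
    seconds: float
    at_syntactic_boundary: bool


def fluency_indicators(events: List[Union[Run, Pause]]) -> dict:
    runs = [e for e in events if isinstance(e, Run)]
    pauses = [e for e in events if isinstance(e, Pause)]
    # Assumption for this sketch: boundary pauses are planning pauses,
    # all other pauses are hesitation pauses.
    planning = [p for p in pauses if p.at_syntactic_boundary]
    hesitation = [p for p in pauses if not p.at_syntactic_boundary]
    return {
        # Mean number of syllables uttered between pauses.
        "mean_length_of_run": (
            sum(r.syllables for r in runs) / len(runs) if runs else 0.0
        ),
        # Mean duration of planning pauses, in seconds.
        "mean_planning_pause_s": (
            sum(p.seconds for p in planning) / len(planning) if planning else 0.0
        ),
        # Raw count of hesitation pauses in the sample.
        "hesitation_pauses": len(hesitation),
    }


# Invented sample: four runs separated by two boundary pauses and one hesitation.
sample = [
    Run(6), Pause(0.5, True),
    Run(4), Pause(0.25, False),
    Run(2), Pause(1.0, True),
    Run(8),
]
print(fluency_indicators(sample))
```

The first indicator, use of pre-assembled chunks, is not computed here because identifying a chunk requires a judgement about the speaker's stored repertoire that cannot be read off timing data alone.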
the paired Cambridge ESOL Speaking tests), where the intelligibility of the
test taker to another non-native speaker is also put to the test.
A different concern about the assessment of intelligibility focuses on the
unrepresentative nature of the assessor, usually somebody whose own
phonological representations are finely honed by dint of long exposure to a wide
variety of L2 accents (see Kenworthy 1987: Chapter 2 for an interesting
discussion of this issue). It can be suggested that this very expertise potentially
diminishes the predictive power of the tests since intelligibility to the assessor
does not guarantee intelligibility to the wider L1 public. However, practical
considerations have to prevail here: using ‘lay’ assessors is not a viable
solution, given the potential implications for reliability. The alternative
is to raise awareness of the issue among examiners of speaking
so that they are encouraged to imagine themselves in the position of a
listener with less experience of L2 varieties. This is the practice with
Cambridge ESOL examiners (see Chapters 4 and 5).
Self-monitoring
In the final stage of the speaking process, a speaker assesses how precisely
and effectively each utterance realises the plans that were laid down during its
assembly. Self-monitoring might compare the rhetorical impact of what was
said against the goals of the speaker at the conceptualisation stage. It might
compare the syntactic structure that was actually produced against the frame
selected during formulation. Or it might compare the realisation of a particular
word against the correct form of the word that is stored in memory. For a
skilled speaker, a major concern is whether the utterance is unambiguous and
whether it conveys clearly to the listener the speaker’s pragmatic intentions.
Self-monitoring thus potentially takes place at many different levels of a
message. However, Levelt (1989:463) concludes that it is extremely unlikely
that an L1 speaker can attend to all the levels in the brief time span available
– especially given that the speaker is also intent on completing the utterance
under delivery and on planning the next one. Certain levels might be
prioritised (in much the same way that teachers of writing sometimes choose to
focus only on errors of spelling or of sentence construction). Levelt suggests
that the exact levels that are monitored may reflect the demands of the
prevailing context and that the degree of monitoring may fluctuate during the
course of an extended utterance.
These comments are illuminating when considering the self-monitoring of
L2 speakers. As noted several times in this account, the effort of assembling
speech in a second language makes additional cognitive demands, which
limit the speaker’s performance compared to that in L1. From this, it seems
reasonable to conclude that second language speakers are even more prone
to limit their self-monitoring to specific target areas. A number of researchers
(e.g. Lennon 1984, Poulisse 1993, van Hest 1996) have suggested that they
pay more attention to errors of lexical appropriacy than to errors of
grammatical accuracy, though Kormos (2006:131) remains unconvinced on the
basis of her research with Hungarian learners. It is clearly dangerous to
generalise because factors such as task type and instructional tradition may lead
to variations in the amount of attention allocated to monitoring and in the
importance attributed to accuracy. But it would seem likely that lower
proficiency learners focus attention on linguistic features rather than pragmatic
ones, comparing one or more of their syntax, lexis and pronunciation with
what they perceive to be L2 norms. A mark of increased competence as an L2
speaker would thus be a gradual increment in the extent to which the speaker
heeds the effectiveness with which the message has been conveyed.
A further consideration in both L1 and L2 is the way in which the speaker
handles the recognition that problems of transmission have occurred. A
competent speaker needs to be able to self-repair efficiently and promptly
following certain implicitly recognised norms (Levelt 1989:478–499), which
may or may not be language-specific.
A note on timing
Psycholinguistic accounts of speaking do not always make it clear that the
processes which contribute to planning3 an utterance can occur at three
different points:
a) before the speech event
b) while an interlocutor is speaking
c) immediately before or during the speaker’s turn.
This has consequences for the way one interprets the model of speaking
just outlined. It also serves to distinguish two types of speech event: one
where extensive pre-planning is not possible and one where it is.
The ability to plan at point (a) is restricted in most genres of speech event.
An exception can be found in pre-planned monologues such as lectures, where
the speaker has the opportunity not only of choosing the propositions to be
addressed but also of ordering them and making the connections between
them plain. There is even a possibility of encoding the ideas phonologically
and rehearsing the form of words to be used by turning them over in the mind.
It is only in this type of speech event that speaking comes close to the degree of
planning that is possible in skilled writing (Shaw and Weir 2007).
Most genres of speech event are dialogic, and call for immediate responses
to points made by an interlocutor. Planning therefore has to be reactive and
to take place while the event is ongoing. Ample evidence that a degree of
active planning is undertaken while an interlocutor is speaking (point (b))
can be found in the way that a new turn in natural speech often overlaps with
the preceding one. But much planning also takes place at point (c), while the
speaker is actually engaged in their own turn. As already noted, it occurs during
short planning pauses, usually located at the end of a syntactic structure such
as a clause. During these pauses, speakers face a complex task. They have to
assemble the next utterance by means of grammatical, phonological and
phonetic encoding. But they also have to plan ahead conceptually, taking adequate
account of where the present turn is leading or how the listener has reacted. An
early study of planning pauses (Beattie 1983) identified what appears to be a
regular pattern in which a phase of short pauses for linguistic planning gives
way to a phase of longer pauses where a degree of forward
conceptual planning also takes place. More recent research by Roberts and Kirsner
(2000) appears to confirm the existence of this temporal cycle.
Conceptualisation
As already noted, Levelt (1989:107) envisages conceptualisation as entailing
two types of operation:
a) Provision of ideas – the complexity of the ideas which test takers have to
express and the extent to which the ideas are supplied to them
b) Integrating utterances into a discourse framework – the extent to which
test takers are assessed on their ability to relate utterances to the wider
discourse (including their awareness of information shared with the
interlocutor).
A further factor which plays an important part in assisting conceptualisa-
tion is whether the speaker is given time to pre-plan what to say (in terms of
general ideas, of the links between those ideas or of the actual form of words
to be used) or not. The question of pre-planning is partly a matter of task
design and is therefore discussed later when considering task demands.
Provision of ideas
Retrieving information and generating ideas impose heavy cognitive
demands upon a speaker. If the need for conceptualisation is reduced, then
the task becomes less onerous, allowing more working memory to be allo-
cated to retrieving the relevant linguistic forms. This is one means by which a
designer can adjust test requirements to make allowance for the more effort-
ful processing demands faced by an L2 speaker with limited knowledge and
experience of the target language. Another important consideration in decid-
ing how much support to provide is the need to ensure that a test does not
too heavily reward the candidate’s imagination rather than their language
proficiency.
In the Cambridge ESOL specifications, one can identify two broad deter-
minants of cognitive difficulty. The first lies in the availability of the informa-
tion demanded of the test taker. An emphasis in the early stages upon personal
and everyday information assists test takers because it asks for the retrieval of
information that is conceptually simple and easily accessed. (This is an interest-
ing side benefit of what would appear to be the testers’ main concern: namely,
to grade the difficulty of the language that is needed to achieve the task.)
The task content gradually moves towards more abstract discussion at
CAE and CPE level. The relevant specifications from the Cambridge ESOL
Common Scale for Speaking are as follows:
These availability criteria are (like some others in the Common Scale and
Can Do statements) not very sharply differentiated. But the way in which
they are operationalised in actual Cambridge ESOL tests is well illustrated
in the interview stage of the sample material at the end of this volume. The
questions in PET Part 1 (p. 315) relate to candidates’ names, occupations
and home towns – familiar terrain indeed, and material that is quite easily
pre-rehearsed. At FCE level, the questioning in Part 1 (p. 318) still features
familiar topics (home, local town, school, jobs) but is considerably more
open-ended. In Part 1 of the CAE sample (p. 321), some standard questions
are retained; but there are also wider topics such as future plans, travel and
holidays and personal tastes and habits. The specifications ‘unfamiliar’ and
‘unexpected’ contribute to a substantial hike in cognitive difficulty in Part 1
of the CPE materials (p. 324) where wide-ranging questions cover housing,
the importance of study routines and sports facilities and elicit personal views
on social change, communications, internet shopping and space tourism.
(For a more detailed discussion of the gradation of topic content across the
proficiency levels, see Chapter 4.)
A second consideration is how much support is provided by the test rubric
in the form of ideas that the candidates might wish to express. An obvious
way of grading difficulty in a suite of speaking tests is to gradually reduce this
support as the level of the exam increases. However, that seems not to have
been the policy adopted in the Speaking tests of the Cambridge ESOL suite:
quite detailed written or visual support for conceptualisation is provided in
all five Speaking tests from KET to CPE.
The benefit of this is that it ensures comparability between the perform-
ances of candidates at a given level since the concepts and the areas of lexis
upon which they draw are similar. Importantly, it also avoids the danger of
weighting assessment at the higher levels too heavily in favour of the test
takers’ imagination rather than their language.
Nevertheless, the support given is carefully calibrated in terms of how spe-
cific it is and how complex the cues are. The cognitive demands are ratcheted
up gradually by moving:
• from set questions for interlocutors to looser and more open ones
• from the precise demands of prompt cards to visual cues that require
description and then on to visual cues which need to be compared and
evaluated
• from a single prompt (usually in visual form) to multiple ones
• from single-modality prompts to multi-modality ones which combine
oral rubrics with both visual and written stimuli.
There is thus a cline which begins with simple written prompts which serve
to constrain the form and content of the productions in KET (in the sample
materials, posters advertise an air museum and a bookshop). It moves on
Grammatical encoding
As noted earlier, the linguistic content of speaking tests is often specified
not in terms of grammatical structure but in terms of the language functions
which test takers are required to perform. This makes it easier to describe the
transition from the test taker’s initial idea to a rough template for an utter-
ance. It can be treated as a question of mapping from the function that the test
taker wishes to perform to the pattern that best expresses that function.
The issue under discussion when considering cognitive validity is not the
complexity of the language that has to be retrieved (discussed in Chapter 4)
but how easily the test taker is able to perform the mapping exercise. There
are two ways in which the demands of mapping can be reduced in order to
lighten the cognitive load upon lower level test takers with limited linguistic
resources. One lies in restricting the number of functions that a test taker is
Table 3.2 indicates which new functions are added to the repertoire at each
level of the suite. Rather than following the specifications in the Speaking
Test Features, which indicate a wide range of possible functions at each level,
it is based upon a close reading of the interactive task types with a view to
establishing what are the necessary functional demands which they impose.
Phonological encoding
The discussion of the cognitive framework on pages 80–84 identified a
number of characteristics which are associated with proceduralisation in
the acquisition of L2 speaking skills and which together form indicators of
fluency. They are:
• use of pre-assembled chunks – leading to syntactic accuracy and native-
like rhythmic properties
• length of run
• duration of planning pauses at syntactic boundaries
• frequency of hesitation pauses.
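As a purely illustrative aside, three of the four indicators listed above lend themselves to simple quantification once a sample of speech has been segmented. The sketch below is not drawn from the Cambridge ESOL specifications or from any study cited here. It assumes a hypothetical annotation scheme in which each uninterrupted run is recorded as a syllable count and each pause is marked by hand as falling at a syntactic boundary (treated as a planning pause) or elsewhere (treated as a hesitation pause). The names `Pause` and `fluency_indicators` are invented for the example. The use of pre-assembled chunks is not captured, since it cannot be read off timing data alone.

```python
from dataclasses import dataclass


@dataclass
class Pause:
    duration: float    # length of the silence in seconds
    at_boundary: bool  # True if the pause falls at a syntactic boundary (e.g. end of a clause)


def fluency_indicators(runs, pauses, speaking_time):
    """Summarise a hand-annotated speech sample.

    runs          -- syllable counts, one per uninterrupted run of speech
    pauses        -- list of Pause objects
    speaking_time -- total duration of the sample in seconds
    """
    # Assumption: pauses at syntactic boundaries count as planning pauses,
    # and all other pauses count as hesitation pauses.
    planning = [p.duration for p in pauses if p.at_boundary]
    hesitations = [p for p in pauses if not p.at_boundary]
    return {
        "mean_length_of_run": sum(runs) / len(runs) if runs else 0.0,
        "mean_planning_pause": sum(planning) / len(planning) if planning else 0.0,
        "hesitations_per_minute": (
            len(hesitations) * 60.0 / speaking_time if speaking_time > 0 else 0.0
        ),
    }


# Invented sample: three runs and three pauses in 30 seconds of speech.
sample = fluency_indicators(
    runs=[4, 6, 8],
    pauses=[Pause(0.5, True), Pause(0.7, True), Pause(0.3, False)],
    speaking_time=30.0,
)
print(sample)
```

Measures of this kind describe only the temporal surface of a performance. They say nothing about why a pause occurred, which is the distinction that assessment criteria attempt to draw.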
Of these characteristics, the one most consistently represented in the spec-
ifications and instructions to assessors for the Cambridge ESOL Speaking
tests is hesitation. At the lowest level of the suite, an attempt is made to allow
for the greater cognitive demands of planning when one’s linguistic and pho-
nological resources are limited. KET examiners are advised: ‘Candidates at
this level may need some thinking time before they respond. Be sensitive to
this and do not rush candidates, but do not allow pauses to extend unnatu-
rally’ (ISE:15). In the Common Scale for Speaking, there is an attempt to
define hesitation according to the factors that might be responsible for it:
Self-monitoring
Like articulation, self-monitoring and self-repair are aspects of speaker per-
formance which are difficult to capture in the form of test specifications. The
closest the Common Scale criteria come is in focusing upon the candidate’s
ability to deal with breakdowns of communication as and when they occur
– specifically, the extent to which the candidate relies upon support from the
interlocutor in addressing such problems.
duration of the task or the familiarity of the content and language that has
to be retrieved. The task design determines the amount of time available to
the candidate for the assembly of an utterance. It also specifies the relation-
ship between the examiner and the candidate or between two candidates in
a pair. Here, one consideration is the length of turn required of each party.
The shorter the examiner’s turn, the greater the pressure on the candidate to
formulate and respond. The shorter the turn expected of the candidate, the
more likely they are to be able to construct a grammatically correct response
and/or to base a response upon linguistic forms employed by the interlocu-
tor. An additional factor in an examiner–candidate relationship (and indeed
in a candidate–candidate one) is the predictability of the content. A task
where the topic changes frequently makes much greater demands upon
the speaker-as-listener than one where the line of conversation is relatively
predictable.
Two of the task variables just identified will be considered in the sections
that follow. They are firstly the nature of the interaction that the task requires
and with it the length of turn demanded of the candidate; and secondly the
amount of time available for the assembly of spoken utterances. Each con-
tributes importantly to the cognitive load that is imposed upon a candidate
by a speaking task.
Patterns of interaction
There are several possible interaction formats for a test of speaking (Davies,
Brown, Elder, Hill, Lumley and McNamara 1999:182, Luoma 2004:35–45,
Fulcher 2003:55–57). It can be one-way, with the test taker responding to
a computer screen or to a voice on a CD or DVD in what is usually termed
a SOPI (Simulated Oral Proficiency Interview). This interaction format
is sometimes referred to as indirect and sometimes as semi-direct (Fulcher
2003:190). In this case, the course of the conversation is entirely pre-
determined, and the candidate has no impact upon the direction it takes.
This approach may meet certain practical needs of mass testing (Stansfield
1990) but is difficult to defend in terms of cognitive validity for the following
reasons:
c) The test taker is placed under a time pressure even more extreme than
that which obtains in real life, since a response has to be completed
before the next utterance on the recording.
d) The test taker is limited to a single role, that of respondent.
e) The test taker is unable to demonstrate the ability to seek repair or
clarification in cases of uncertainty.
One can say that, in terms of the real-life cognitive processes engaged, this
type of test might well provide indications as to L2 listening skills but cannot
be said to measure the ability to participate actively in a conversation. As
Fulcher puts it (2003:193) after reviewing the evidence: ‘Given our current
state of knowledge, we can only conclude that, while scores on an indirect
test can be used to predict scores on a direct test, the indirect test is testing
something different from the direct test.’
An alternative use of recording requires the test taker to produce a mono-
logue on a specific question or topic. Again, it cannot be claimed that this
approach replicates the conditions and demands of conversational speech:
there is no interaction and the test taker usually has to be allowed time to
prepare what to say. It does, of course, enable an assessment of the test
taker’s oral presentation skills; but, even so, an important element is absent
in the form of auditors who provide signals of understanding. Furthermore,
the formality of a recording tends to inhibit the test taker from stopping to
rephrase points that they may have made inadequately (Luoma 2004:45).
The part played in normal speaking by retrospective self-monitoring and
repair is likely to be severely reduced. In short, the test does not conform to
the view that most speech events entail a process of co-construction between
interlocutors (McNamara 1997b, Swain 2001).
A second approach is to conduct the speaking test by telephone. Here,
interviewer–test taker interaction is indeed provided for. The interviewer
has the possibility of developing a topic more freely, reacting to initiatives
by the test taker and providing support and back-channelling. However,
this type of test models a single, very specific form of spoken exchange. It
is qualitatively different from a face-to-face engagement in several ways
that directly affect phonological processing. Most obviously, there is the
absence of visual context and of the paralinguistic cues provided by facial
expression and gesture. In processing terms, the consequence is that into-
nation assumes a much more important function for both speaker and
listener. A second drawback is that the physical signal is different since a
phone line employs a reduced frequency band, with the result that acous-
tic cues to certain phonemes are absent because they occur at frequen-
cies above 4000 Hz. An experienced phone user learns to adjust for this
(though the /f/-/s/ distinction in particular remains a problem). However,
the point remains that the low-level acoustic-phonetic processing of a
phone message differs markedly from that of other types of speech event.
It is unsurprising that many speakers find the process of handling a phone
conversation in a second language, whether as initiator or respondent, a
daunting experience.
An approach which requires the physical presence of an interviewer would
thus seem to be preferable in terms of cognitive validity. However, much also
depends upon the role that the interviewer is required to take. One can trace
a continuum in terms of the demands imposed upon the test taker. At one
end, the interviewer’s questions might elicit responses which are formulaic or
which echo the linguistic patterns employed by the questioner. This is necessary
for the purposes of controlling task difficulty at lower levels; but
it detracts from cognitive validity because relatively closed responses of this
kind impose little requirement upon the speaker to engage in the process of
a novo utterance construction. At the other end, the interviewer’s role might
be limited to back-channelling and follow-up questions once a discussion has
been initiated. This not only places a much greater onus upon the candidate,
but aligns the processing that takes place much more closely with that of
normal spoken discourse.
There have also been sociolinguistic concerns (Ross and Berwick 1992,
Young and Milanovic 1992) that the power imbalance between interviewer
and test taker may affect the nature of the communication that takes place (at
its extreme, raising the prospect of a lockstep question and answer pattern).
These concerns have led test designers to introduce a further interaction
format in which test takers communicate with each other in pairs (Taylor
2000b). The approach has been the subject of some criticism (for a review of
the issues, see Fulcher 2003:186–190, Luoma 2004:36–39), which is discussed
in detail in Chapter 4. Clearly, assessing communication between two non-
native speakers can be said to possess strong ecological validity in light of
the widespread use of English as a language of international communication.
(See, for example, the discussion by Canagarajah 2006.) On the other hand,
a cognitive analysis cannot overlook the fact that the format increases the
demands of the test, since the candidate as listener potentially has to deal
with two voices (interviewer and fellow test taker), two varieties of the target
language (one native and one non-native), and even potentially a three-way
interaction.
This last comment draws attention again to the critical role that listening
skills play in what are ostensibly tests of speaking. Clearly, it is impossible to
filter out listening from an interactive communication task, and Chapter 1
noted how the ALTE Can Do statements (Council of Europe 2001) combined
the Listening and Speaking descriptors onto a single scale for Interaction.
Nevertheless, the extent to which achievement in a speaking test is reliant on
the listening skill is an issue in an examination where listening is also tested in
a separate paper. The only way of shifting the balance towards speaking is to
provide a further section of the test in which the candidate speaks for a more
extended period – but does so to a live audience in the form of an examiner
and/or another candidate, who can provide the encouragement and feedback
which would normally be available.
This discussion of patterns of interaction in speaking tests has thus iden-
tified three possible and indeed potentially desirable formats: interviewer–
candidate (I–C), candidate–candidate (C–C) and solo candidate (C). A
fourth scenario might allow for a three-way exchange (I–C–C) in which the
interviewer engages in discussion with two test takers.
As noted, these formats vary in the cognitive demands they make of test
takers. At first glance, it might appear that the most demanding is C. The
reasons lie partly in the difficulties of speaking at length, even in one’s first
language, but also (in an L2 situation) in the reduced opportunity for repair
or support by the interviewer. Fulcher comments (2003:19): ‘If we accept the
view that conversation is co-constructed between participants talking in spe-
cific contexts, our construct definition may have to take into account such
aspects of talk as the degree of interlocutor support.’
However, the fact is that each of the four patterns of interaction poses its
own cognitive challenges:
• I–C may well require a relatively rapid response, particularly when the
interviewer’s turns are short.
• C–C contains a major element of unpredictability (particularly in cases
where the partner test taker has listening comprehension difficulties
and is inclined to go off topic). It also contains an important variable
in the form of the test takers’ familiarity with each other’s L2 variety.
A further complication lies in the extent to which a test taker can
accommodate to and echo the language of their partner. Whereas
the language of the interlocutor can presumably be trusted as a
potential source of linguistic information and emulated accordingly,
fine judgements have to be made as to the extent to which a fellow
learner’s language is to be trusted. This kind of decision clearly imposes
additional cognitive demands.
• So far as I–C–C is concerned, a three-way conversation clearly demands
more complex processing than a two-way exchange. The test taker
not only needs to process input in two different voices but also needs
to keep track of the points-of-view and foregrounded topics being
expressed by two different individuals.
In addition, the cognitive demands made of the test taker may vary
considerably within these various formats. Two important factors briefly
touched upon have been: how much the test taker has to contribute to the
discourse, and the extent to which the test taker has to engage in a novo utter-
ance construction.
Planning time
The amount of time speakers have in order to prepare what to say has an
important impact upon several of the phases of processing. This is true
whether they are performing in a first or a second language.
Pre-planning time clearly assists conceptualisation. The speaker has
greater opportunity to generate ideas that are relevant to the topic to be discussed.
They also have greater opportunity to organise them and to mark
how they are linked conceptually. Pre-planning time also assists grammatical
encoding and retrieval: increasing the likelihood of utterances that are care-
fully formed syntactically and of precision in the choice of lexis. One might
also expect a greater degree of fluency in that retrieval of many of the appro-
priate lexical and syntactic forms can take place in advance of task perform-
ance. Indeed, there are opportunities for rehearsing fully formed utterances
and committing them to long-term memory before delivering them. These
utterance templates go on to assist self-monitoring, in that they provide the
speaker with a concrete target against which to match actual performance. In
short, pre-planning time supports not only the search for ideas to express and
the definition of goals, but also the organisation of information, the precision
of the language used and the awareness of performance errors.
That said, it has to be recognised that most speech events are interactive.
They take place under time pressure and require utterances to be assembled
spontaneously. Only a limited number of contexts allow the speaker the
luxury of planning what to say. They include formal monologue situations
such as making speeches or giving academic presentations; but also situations
where a speaker knows in advance that they will be called upon to express an
extended opinion, report an event, tell a story or outline a set of proposals or
requirements. Applying strict ecological criteria, one might argue that it is
only in conjunction with these types of speaking that it is entirely appropri-
ate for a task to incorporate pre-planning time. This is not just an academic
point. It is clear that the cognitive processing associated with a planned task
is markedly different from that associated with an unplanned one. One can argue
that it results in a different type of discourse. A listener might reasonably
expect a greater degree of coherence and cohesion, less hesitation and rep-
etition, fewer false starts and reformulations; and this in turn might lead to
setting the bar for fluency at a higher level than with an extempore speaker.
The implications for the assessment of planned monologues will be obvious.
The effects of pre-planning an L2 speaking task have been quite widely
discussed in the literature on task-based learning (TBL), as have the effects of
repeating a task. Most of the studies that have explored the impact upon per-
formance (e.g. Bygate 2001, Crookes 1989, Foster and Skehan 1996, Mehnert
1998, Ortega 1999) concur in concluding that preparation or repetition leads
to an increase in fluency and complexity. Bygate (2001) reports unclear
Interaction
Live interaction, termed direct testing, is very much a feature of the approach
to the assessment of speaking favoured by Cambridge ESOL (see Chapter 1 for
discussion of the long tradition of direct speaking assessment by this board).
The suite of Speaking tests embraces all four of the possible formats that have
been identified. The tests provide for an interviewer (referred to as an interlocu-
tor) whose function is to initiate interaction and to keep it going. But they also
include phases of test taker–test taker interaction, thus mitigating the possible
‘power’ effects previously mentioned as associated with the interviewer’s role.
Most tests also make provision for solo presentation, in which (as noted) lis-
tening skills play a minimal part. To support the interlocutor, who may be per-
sonally engaged in the exchanges that take place, a second examiner, known as
an assessor, monitors performance. Saville and Hargreaves (1999) argue that
the use of a range of interactional formats conforms to general principles of test
design, in which tasks are varied so as to elicit different types of language.
Table 3.4 summarises the speaking tasks at different levels of the suite.
It is clear that a range of interaction types is covered, and that they foster
language processing which draws upon both interactive and presentational
skills. An initial ‘interview’ section in all tests features the I–C relationship,
while C–C appears in the specifications as ‘Two-way collaboration’, C as
‘Long turn’ and I–C–C as ‘Three-way discussion’. Test takers are required
to respond to task demands involving negotiation as well as to more formal
questions, both open and closed.
Note: Timings are for a complete task involving two candidates. Timings for FCE, CAE and
CPE separate Part 3 and Part 4. Actual speaking times for long turns per candidate are: PET:
up to 1 minute; FCE: 1 minute (plus 20 secs. peer feedback); CAE: 1 minute (plus 30 secs.
peer feedback); CPE: 2 minutes (plus up to 1 minute peer feedback).
In grading the cognitive demands imposed upon candidates, the test devel-
opers have relied partly upon the duration of the tasks. The overall contact
time for a pair of candidates increases gradually from KET (maximum 10
minutes) to CPE (19 minutes). These figures are only broadly indicative
of the amount of speaking an individual candidate has to do, since there is
inevitably variation in the relative contributions made. A more important
criterion appears to be the way in which the test is distributed between the
various types of interaction. Figure 3.3 provides evidence of how this feature
has been calibrated.
At KET and PET, the Part 1 I–C interview depends quite heavily upon
simple routinised Q&A exchanges. Here are some examples from KET Part 1:
Where do you live / come from?
Do you work or are you a student?
Do you like (studying English)?
Figure 3.3 Interaction formats in Speaking tests: timings for tasks for two
candidates
[Bar chart: minutes (vertical axis) for each examination (KET, PET, FCE, CAE, CPE), with separate bars for the I–C, C–C, C and I–C–C formats]
taker communication by providing for a C–C phase even at the lowest level
in KET – though obviously here in a very controlled form, with interlocutor
prompting.
Overall, then, there has clearly been a principled approach to the grading
of the task demands arising from different interaction types.
Planning time
As noted earlier, all the tests in the ESOL suite, with the exception of KET,
include one task which requires extended individual performance. In cogni-
tive terms, a long turn of this kind places heavier conceptualisation demands
upon the test taker, who has to generate more ideas than when responding
briefly to the comments of others and has to organise them meaningfully. It
might be said to constitute a different type of discourse by virtue of the part
played by forward planning (Brown, Anderson, Shillcock and Yule 1984:16–
18). From the assessor’s point-of-view, coherence and cohesion would then
be expected to be a greater consideration, and hesitation, self-repair and
loosely connected ideas might be regarded unfavourably.
The Instructions to Speaking Examiners (2011) make it clear that certain
assessment criteria are more applicable to long turn tasks that are principally
monologues. Particular attention is given to discourse features. From PET
level up to CPE, these are:
• sustaining a long turn
• coherence and clarity of message
• organisation of language and ideas
• accuracy and appropriacy of linguistic resources.
(ISE:47, 51, 55, 59)
Due account has thus been taken of the part played by ‘macro-planning’
in extended discourse. However, there is an anomaly here. The candidate is
allowed little or no pre-planning time in order to reflect upon the prompts
provided and to undertake the kind of forward thinking that would feature in
some types of longer turn. To be sure, the turn remains a relatively short one at
most levels (indeed, the PET instructions allow the candidate to speak for less
than a minute). But if candidates are to be judged on criteria such as coher-
ence, cohesion and organisation, then the longer 2-minute slot at CPE level
perhaps requires a brief period of preparation to permit macro-planning. It is
true, as noted earlier, that a degree of forward planning can occur alongside
speech assembly, once a monologue is underway. But one wonders whether
in this type of task a speaker needs time to formulate in advance a set of ‘sub-
goals’ or ‘speech act intentions’ (Levelt 1989:109). The requirement that test
takers speak extempore but are assessed in part by criteria based upon pre-
planned speech would seem to break one of Weir’s test performance condi-
tions (1993:39), namely ‘processing under normal time constraints’.
has been given both to task demands and to the types of processing that
can be deemed to be representative of performance at different stages of
proficiency.
The focus of the present chapter has chiefly been on the cognitive demands of
the speaking process, whether in L1 or in L2. The criteria for assessing cogni-
tive validity that have been identified concern the target behaviour of the test
candidate and potentially provide a framework for the cognitive validation
of any test of second language speaking.
A secondary focus has been on two types of task variable present in tests
of speaking: namely the nature of the interaction involved and the availabil-
ity of pre-planning time. Here, the emphasis has been strictly upon the way
in which these variables increase or diminish the cognitive demands upon
a speaker. Clearly, other aspects of task design (including task setting and
linguistic demands) are beyond the remit of this chapter, and will now be
considered in Chapter 4.
Acknowledgements
The author is enormously indebted to Lynda Taylor for her sensitive editing
and her helpful suggestions and feedback. He is also especially grateful to
Kathy Gude, Chair of the Cambridge ESOL Speaking Panel, for percep-
tive comments and advice about the criteria and the instructions given to
examiners.
Notes
1 The examples are from Aitchison (2008:254–5); for a discussion of syntactic
speech errors, see Fromkin 1988.
2 It is curious that a procedural component of this kind does not feature
in some standard models of test performance. The Bachman and Palmer
(1996) model, for example, provides for task-related metacognitive strategies
used by test takers; but does not explicitly recognise the very different type
of highly learned cognitive procedure that enables a candidate to apply
linguistic and pragmatic knowledge with minimal working memory demands.
3 Some caution is needed with the term planning, which is used in a narrow
sense when discussing conceptualisation (macro- vs micro planning) but is also
used more generally to refer to the process of assembling an upcoming piece
of speech.
4 Chair of the Cambridge ESOL Speaking Panel, Kathy Gude, comments
(personal communication) ‘Perhaps it is [because] our assessment of ‘fluency’
is too subjective and not as easily quantified as some of the other assessment
criteria’.
5 Describing the long turn section, the CAE Handbook for Teachers
(2008b:76) asserts: This part tests the candidates’ ability to produce an
extended piece of discourse. On the other hand, the profile also foregrounds