L2 VOCABULARY ACQUISITION,
KNOWLEDGE AND USE
new perspectives on
assessment and corpus analysis
EDITED BY
CAMILLA BARDEL
University of Stockholm
CHRISTINA LINDQVIST
University of Uppsala
BATIA LAUFER
University of Haifa
Editors
Gabriele Pallotti (Series editor),
University of Modena and Reggio Emilia
Foreword
Camilla Bardel, Christina Lindqvist and Batia Laufer
This book revolves around two main themes. One is vocabulary assessment
methods, the other vocabulary use research by means of corpus analysis and
computational linguistics. The chapters are based on individual papers which
were presented either at a workshop at Stockholm University in May 2010, or
at a thematic panel at the 20th Eurosla Conference in Reggio Emilia in
September 2010. We felt that these conference contributions offered some new
insights into L2 vocabulary research and consequently decided to compile them
into a book that would present recent L2 vocabulary research and suggest some
new directions in the field.
Different ways of assessing vocabulary reflect different conceptualizations of
vocabulary knowledge. Vocabulary knowledge can be viewed as the number of
words a person knows (hence, there are tests of vocabulary size, e.g. Nation &
Beglar, 2007), the amount of information a person has about a particular word
(deep knowledge tests measure how well certain words are known, e.g. Wesche
& Paribakht, 1996), how a word associates with other words (e.g. Read, 1993),
and the speed with which words are retrieved (Laufer & Nation, 2001). Lexical
richness in free production has been measured by lexical profiles (e.g. Laufer &
Nation, 1995; Bardel, Gudmundson & Lindqvist, 2012). Some of the chapters
in the book discuss problems of these measurement methods and make sugges-
tions for refinements and additions (Cobb; Gyllstad; Lindqvist et al.).
The introduction of language corpora, corpus analysis techniques and other
computer analyses into second language research has made it possible to conduct
studies on sizeable and varied samples of spontaneous linguistic productions.
Cross-corpora comparisons and new types of analyses can be performed that pro-
vide new insights into lexical knowledge and its development in a second lan-
guage. Some of the chapters of the book reflect these developments in lexical
research. These chapters analyze the vocabulary found in learners’ performance
in speaking (Lindqvist et al.) or in writing (Levitzky-Aviad & Laufer; Tono).
Besides being concerned with these two overarching themes, the chapters also
focus on a number of central issues in vocabulary research. One such issue is the
role of word frequency, which is a recurrent factor when measuring lexical rich-
ness and is discussed from different points of view in some of the chapters
(Cobb; Levitzky-Aviad & Laufer; Lindqvist et al.).
Another central issue is the relationship between knowledge of single words and
multi-word units, which is addressed in detail by Henriksen, who sees colloca-
tional knowledge as part of communicative competence. Even very advanced
learners seem to have difficulty mastering this kind of knowledge fully, as
Levitzky-Aviad and Laufer found: their data show that students improved over
time on measures of single words, but not on multi-word units. Knowledge of
multi-word units is normally
considered to be indicative of deep knowledge, a construct that is discussed
thoroughly in Gyllstad’s chapter.
Yet another fundamental theme in vocabulary acquisition research pertains to
the differences between learning and using oral and written vocabulary. The
studies in this book examine data from written and spoken language, some
focussing on production, some on comprehension. The differences in lexical
sophistication between spoken and written modes are discussed by Lindqvist et
al. and by Milton. Milton also points out that the correlations between vocab-
ulary size scores and listening skills are generally weaker than the correlations
with the written skills of reading and writing, and suggests some possible expla-
nations for this difference. As regards written production, Tono’s chapter
addresses the important issue of vocabulary errors as correlates of proficiency
level, and analyzes the kinds of errors characterizing different proficiency levels
in academic essays.
Below is a brief summary of the chapters.
Henrik Gyllstad, in his chapter Looking at L2 vocabulary knowledge dimensions
from an assessment perspective – challenges and potential solutions, notes how the
recent upsurge of interest in L2 vocabulary and L2 vocabulary assessment has
been followed by a situation where a large number of knowledge constructs are
proposed and investigated. As Gyllstad points out, the development of compet-
ing definitions and perspectives is part and parcel of any flourishing academic
domain, but still, it is a problem if constructs are given very different interpre-
tations from study to study. Taking the fundamental constructs of vocabulary
breadth and depth (Anderson & Freebody, 1981) as a point of departure, and
drawing on some subsequent critical work on their viability and use, Gyllstad
discusses some of the basic assumptions underlying these constructs. In partic-
ular, he emphasizes that empirical data on the learning and assessment of lexi-
cal items larger than single words, e.g. phrasal verbs, collocations and idioms,
raise questions as to where to draw the line between breadth and depth. The
author ends his paper by presenting suggestions for potential remedies.
Multi-word units are further discussed in Birgit Henriksen’s contribution,
Research on L2 learners’ collocational competence and development – a progress
report. According to previous studies, mastery of formulaic sequences – includ-
We would like to express our gratitude to the participants at the two meetings
on vocabulary acquisition held in Stockholm and Reggio Emilia. We also thank
the reviewers of this volume, as well as the series editor Gabriele Pallotti, the edi-
torial assistant, Fabiana Rosi, and the language editor Françoise Thornton-
Smith, who proofread the final version of the manuscript.
February 2013
The heightened interest in L2 vocabulary over the last two or three decades has
brought with it a number of suggestions of how vocabulary knowledge should be
modelled. From a testing and assessment perspective, this paper takes a closer
look at some of these suggestions and attempts to tease out how terms like model,
dimension and construct are used to describe different aspects of vocabulary
knowledge, and how the terms relate to each other. Next, the two widely
assumed dimensions of vocabulary breadth and depth are investigated in terms
of their viability for testing purposes. The paper identifies several challenges in
this regard, among others the questionable assumption that multi-word units
like collocations naturally belong in the depth dimension, and problems that
follow from the complex and often ill-defined nature of the depth dimension.
Suggestions for remedies are provided.
1. Introduction
Ever since Meara (1980) pointed out the then Cinderella-like status of vocabu-
lary some three decades ago, the field of foreign and second language vocabu-
lary (L2)1 has seen a formidable explosion in terms of activity and the number
of studies published. The dramatic yet welcome increase in research on vocab-
ulary over the last 30 years has brought with it an increase also with regard to
terminology. A striking example of the plethora of terms that may exist for a
single concept – some arguably more central in meaning than others – can be
seen in Wray's (2002) account of terms used to describe aspects of formulaicity,
presented as Figure 1. As Wray points out, even though there are clear
cases of conceptual duplication across the terms used, there are also cases of
terms shared across different fields that do not refer to the same thing. Whether
1 Henceforth, the abbreviation L2 will be used to denote both a second and a foreign
language.
Figure 1. Terms used to describe aspects of formulaicity (taken from Wray, 2002: 9).
In the remainder of this paper, I will first take a look at some of the central
terminology used for describing knowledge and abilities in the field of L2
vocabulary acquisition, primarily from a testing and assessment perspective. I
will discuss how the terminology is used, identify potential problems, and sug-
gest remedies to these when possible. I will then discuss the origins and appli-
cations of the influential and widely-used dimensions of vocabulary breadth and
depth, particularly in relation to some of the challenges that researchers face
when using these for assessment purposes. In doing this, I will also propose
remedies to overcome some of the more persistent challenges.
As was pointed out in the previous section, the heightened interest in L2 vocab-
ulary has entailed an increase in the number of constructs that have been pro-
posed and used. Recent examples connected to vocabulary size tests, i.e. tests of
the number of words in a language for which a learner has at least a basic form-
meaning knowledge, are written receptive vocabulary size (Meara & Buxton,
1987), controlled productive vocabulary size (Laufer & Nation, 1999) and
aural receptive vocabulary size (Milton & Hopkins, 2006). These three exam-
ples have a parent construct (‘vocabulary size’) as a common denominator, but
are more specific by adding terms that narrow the construct down even further,
e.g. ‘receptive’, ‘productive’, ‘aural’, and ‘written’. This is obviously a good thing,
as the added specificity makes it clearer what kind of knowledge is targeted.
Interestingly, even though the notion of construct is arguably very central when
describing vocabulary knowledge and its assessment, the term itself is not always
used specifically in the literature. Instead, the term dimension often appears
when L2 vocabulary researchers discuss acquisition and assessment matters.
Here are some examples of ‘dimensions’ proposed in the literature on L2 vocab-
ulary acquisition.
• Henriksen (1999), in describing a model of lexical development:
a) partial to precise knowledge, b) depth of knowledge, and c) receptive
to productive use ability.
• Meara (2005), in describing a model of lexical competence/skill:
a) vocabulary size, b) vocabulary organization, and c) vocabulary acces-
sibility.
• Daller et al. (2007), in describing a learner’s vocabulary knowledge in
“lexical space”:
a) lexical breadth, b) lexical depth, and c) lexical fluency.
The first thing to note about the three proposals is that they all assume three
dimensions, perhaps true to a geometrical definition of space as length, breadth
and depth, or simply in keeping with the proverb that all good things come in
threes. As to the first dimension (a) of the three
models, it could be seen to deal with the same underlying process, namely the
building of a repository of vocabulary items. What is characteristic of this
dimension is that it has more to do with quantity than quality. Learners are
shown to know x number of words, but this knowledge is minimally seen as a
basic form-meaning mapping. Meara’s (2005) vocabulary size and Daller et al.’s
(2007) lexical breadth are very similar in this sense, whereas my understanding
of Henriksen’s (1999) partial to precise knowledge dimension is that she refers to
the development of individual word knowledge, and that she emphasizes that
the acquisition process is not an all-or-nothing activity. There are differences
among authors as regards the second dimension (b), too. Daller et al. see lexical
depth largely from a word knowledge framework perspective. Based on Nation's
(2001) descriptive approach to what aspects are involved in knowing a word (see
Table 2), depth is seen as those aspects that go beyond the basic form-meaning
mapping, e.g. concepts and referents, associations, collocations and
constraints on use. Meara's second dimension is called vocabulary organisation, and
it is conceptually different to that of Daller et al. Meara envisages vocabulary
organisation as the structured, lexical network that makes up a learner’s mental
lexicon. The focus here is on the links between words in this network and on
how, from a more holistic perspective, they can inform us about the network as
a whole. The fundamental difference between these first two approaches will be
further discussed later on in this chapter. Henriksen’s dimension, called depth of
knowledge, may sound closer to that of Daller et al., but in fact she discusses it
more in terms of network building in line with Meara’s conception of vocabu-
lary organisation. When it comes to the third dimension (c), the versions pro-
posed by Daller et al. and Meara are conceptually close. The former call it lexi-
cal fluency and state that it is intended to define “how readily and automatical-
ly a learner is able to use the words they know and the information they have
on the use of these words” (Daller et al., 2007, p. 8). This may involve the speed
and accuracy with which word forms can be recognised receptively or retrieved
for expressing targeted meanings when speaking or writing (productive vocab-
ulary). Meara’s version, called vocabulary accessibility, is said to have to do with
“how easily you can manipulate the words you know” (Meara, 2005, p. 271),
which is likely to imply both receptive and productive aspects, even though
Meara’s development of tests of this dimension has focused largely on receptive
recognition skills. Henriksen’s version is called receptive to productive use ability,
which is argued to be a continuum, describing “levels of access or use ability”
(1999, p. 314). Thus, there is a clear conceptual overlap between the three
different versions, but it is also evident that the authors describe these dimensions
in different ways and propose different ways to operationalise them.
The use of the term dimension raises the question as to what the relation
is between this term and the term construct. It seems that in some cases in the
literature construct and dimension are used more or less synonymously, where-
as in other cases they are used hierarchically in a hyponymic relation, with
dimension as a hypernym and construct as its hyponym. There are also cases
of the converse relation, for example in Henriksen (1999), where construct is
the superordinate (hypernym) term and dimension the subordinate
(hyponym). Another term that is used in this context is model. Hierarchically,
a model can be seen as a set of propositions that clarify how different con-
structs relate to each other. Meara (2005) talks about his three dimensions as
being part of a model of vocabulary skills, while Henriksen (1999) proposes a
model of lexical competence. Daller et al. (2007) do not use the term model
when discussing their multi-dimensional space, but it is interesting to note that
the volume in which their text appears is titled Modelling and Assessing
Vocabulary Knowledge. The terms model, dimension and construct
might be seen as co-existing at different hierarchical levels, albeit with some
restrictions. Thus, I would like to propose that a model may consist of several
dimensions, which in turn may comprise various constructs. A dimension can
also be a construct, so long as the type of knowledge or ability referred to is clear-
ly defined – and by extension – measurable through some sort of test or assess-
ment. If it is not, then the use of dimension rather than construct is more suit-
able. Furthermore, a dimension can consist of several constructs, just as a con-
struct in principle can be divided into two or more ‘sub-constructs’. An exam-
ple of this would be the dimension of vocabulary size, which can also be said
to be a construct. In order to accommodate more detailed descriptions of
vocabulary knowledge, e.g. aural receptive vocabulary size (Milton & Hopkins,
2006) or controlled productive vocabulary size (Laufer & Nation, 1999), it is
possible to treat these as two sub-constructs within the construct (and dimen-
sion) of vocabulary size. From an assessment perspective, researchers ought to
define constructs with precision. One way of doing this is by following
Bachman’s (1990, pp. 40-45) three-stage analysis:
a. the construct needs to be defined theoretically;
b. the construct needs to be defined operationally;
c. procedures must be established for the quantification of observations.
The theoretical definition (a) is a specification of the relevant characteristics of
the ability we intend to measure, and its distinction from other similar con-
structs. If there are several subcomponents to a construct, then the
interrelations between these must be specified. When it comes to the operational defi-
nition of the construct (b), this process involves attempts to make the con-
struct observable. To a great extent, the theoretical definition will govern what
options are available. For example, the theoretical definition of the construct
‘listening comprehension’ suggests an operationalisation as a task in which
information must be decoded aurally in some fashion. With respect to the
third stage (c), our measurement should be quantified on a scale. If applied to
vocabulary depth (see the section below), with many subcomponents argued to
be part of this construct, it is then very important to try to pin down how they
relate to each other. To the best of my knowledge, this has not been done. On
a theoretical level, Schmitt (2010b) has intuitively hypothesized how the dif-
ferent word knowledge aspects of Nation’s (2001) framework (see Table 2)
relate to each other developmentally, but these hypotheses need to be empiri-
cally tested.
Having discussed the use of terminology in L2 vocabulary knowledge mod-
elling, I will now turn to discussing the viability of two of the most influential
dimensions in the field, vocabulary breadth and vocabulary depth, in order to
see if they can be treated as constructs.
[…] assume that, for most purposes, a person has a sufficiently deep understand-
ing of a word if it conveys to him or her all of the distinctions that would be
understood by an ordinary adult under normal circumstances.
These two aspects of vocabulary knowledge have indeed been influential and wide-
ly used. Not surprisingly, though, they have also been the subject of some criticism.
Firstly, as was pointed out by Read in his account of the term depth
(2004), Anderson and Freebody’s definitions leave us with a number of unclear
terms. For example, in relation to “depth”, it is not clear what is meant by “dis-
tinctions”. Also, it raises the question as to what “an ordinary adult” is and
what “normal circumstances” are. My own reading of Anderson and Freebody
(1981) is that what they mean by distinctions when outlining the depth aspect
is in effect meaning distinctions. This is arguably clear in the passage follow-
ing the one where breadth and depth are initially defined (Anderson &
Freebody, 1981, p. 93):
[…] the meaning a young child has for a word is likely to be more global, less
differentiated than that of an older person. With increasing age, the child
makes more and more of the adult distinctions.
Table 1. The application of the term depth in L2 vocabulary acquisition research (based on
Read, 2004: 211-212).
1. Precision of meaning (the difference between having a limited, vague idea of
what a word means and having a much more elaborated and specific knowledge
of its meaning)
2. Comprehensive word knowledge (knowledge of a word, not only its semantic
features but also orthographic, phonological, morphological, syntactic,
collocational and pragmatic characteristics)
3. Network knowledge (the incorporation of the word into a lexical network in
the mental lexicon, together with the ability to link it to – and distinguish it
from – related words)
In Read’s (2004) account of how the term depth had been operationalised up to the early 2000s,
there are three applications of the term. The additional two are seen as points 2
and 3 in Table 1.
It is clear from the above descriptions that it is only the first application
called ‘Precision of meaning’ that is consistent with how Anderson and
Freebody (1981) originally defined depth of word knowledge. The second
operationalisation outlined by Read is that of comprehensive word knowl-
edge. Here, as the name implies, a sizeable number of aspects are involved in
knowing a word. One of the most recent and influential descriptions of such
aspects is that of Nation (2001), shown here as Table 2. It is beyond the scope
of this paper to go into a detailed description of Nation’s framework, but one
thing is relevant. Typically, the aspects called ‘spoken’ and ‘written’ under the
heading ‘Form’, together with ‘form and meaning’ under the heading
‘Meaning’ are seen as breadth aspects, whereas the remaining ones in the table
are usually considered depth aspects. This means that knowledge of word
parts, word associations, grammatical functions and collocations are usually
considered depth of word knowledge aspects, an assumption I will return to
later in this chapter.
Table 2. Description of “what is involved in knowing a word”, from Nation (2001: 27).
i.e. break a leg is non-compositional. However, this sequence can also evoke a
more literal reading, to denote the fracture of a bone that someone might suf-
fer in an accident. In this reading, the sequence would be what Howarth
(1996) refers to as a free combination. Likewise, the sequence break a record
has two possible readings, too. One of them denotes the more literal process
of someone destroying a vinyl record, as played on turntables. This would
then also be called a free combination. However, the other reading would be
called a collocation, since one of the components (words) of the sequence is
used in a figurative, de-lexical, or technical sense, in this case the verb break.
It stands to reason that lexical items like these are very important for second
language learning. The point here is that some of them behave like single
orthographic words – certainly the compound noun, but arguably the phrasal
verb and perhaps the collocation and idiom as well. If this is the case, then
they should be made part of the vocabulary inventory and included in a fre-
quency list where single orthographic words would reside jointly with multi-
word items (see Cobb, this volume and Henriksen, this volume). As a case in
point, Shin and Nation (2008) have presented an analysis, based on the 10-
million-word spoken part of the British National Corpus (BNC), in which as
many as 84 collocations occurred with such high frequency that they would
make it into the top 1,000 single word types of the spoken corpus. It should
be noted here that Shin and Nation’s use of the term collocation mainly
resides in one of two traditions of collocation research, called the frequency-
based tradition, the other being the phraseological tradition (see Nesselhauf,
2004; Gyllstad, 2007; Barfield & Gyllstad, 2009 for accounts of these). The
84 collocations of the first frequency band include for example you know, I
think, and come back. Furthermore, as many as 224 collocations would make
it into the second 1,000 word type band of the corpus (see Table 3). As argued
by Shin and Nation (2008), a large number of collocations would qualify for
inclusion in the most frequent single word bands, if no distinction was made
between single words and collocations. This argument seriously challenges the
construct of vocabulary size.
Table 3. The number of collocations that would potentially qualify into single word frequency
bands of English (table taken from Shin & Nation, 2008: 345).
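The counting logic behind Shin and Nation's figures can be sketched in a few lines of Python: tally single-word types and adjacent two-word sequences over the same token stream, find the frequency of the 1,000th most frequent word type, and keep the candidate collocations that meet that cutoff. This is only an illustration of the frequency-based approach, not their actual method (which involved lemmatisation and grammatical well-formedness criteria); the function name, toy corpus and candidate list are invented.

```python
from collections import Counter

def collocations_in_top_band(tokens, candidates, band_size=1000):
    """Return the candidate two-word collocations whose corpus frequency is
    at least that of the band_size-th most frequent single word type."""
    unigram_freq = Counter(tokens)
    bigram_freq = Counter(zip(tokens, tokens[1:]))
    ranked = unigram_freq.most_common()
    band_size = min(band_size, len(ranked))
    cutoff = ranked[band_size - 1][1]  # frequency of the last word type in the band
    return [c for c in candidates
            if bigram_freq[tuple(c.split())] >= cutoff]
```

Applied to a large spoken corpus, this kind of count is what lets sequences such as you know and I think rank inside the first 1,000-type band.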
If we accept the assumption that lexical items such as collocations are part of
everyone’s vocabulary, then we need to start thinking of ways of incorporating
lexical items larger than single words into measures of vocabulary size. The rea-
son why this has not yet been done is probably because it is fraught with all sorts
of problems. It is very likely that the vocabulary size construct based on single
orthographic words will maintain its validity for years to come because of its
desirable measurement characteristics. However, attempts at creating measures
of vocabulary size where the nature of word usage – as illustrated by Shin and
Nation’s study – is addressed should be well on their way (see e.g. Martinez &
Schmitt, 2012, and chapter by Cobb, this volume).
Another consequence of this discussion is that it is not clear whether col-
locations and collocation knowledge should reside in the vocabulary depth con-
struct. For many researchers who follow Nation’s (2001) descriptive framework
of word knowledge (see Table 2), aspects except for basic form and meaning
knowledge are typically treated as depth components (see e.g. Read, 2000;
Jiang, 2004; Milton, 2009; Schmitt, 2000, 2010a). In my own work on devel-
oping English collocation tests (Gyllstad, 2007, 2009), I have been reluctant to
call my two test formats – COLLEX and COLLMATCH – depth tests. Both
test formats are receptive recognition measures of verb + noun collocations such
as pay a visit, do justice and keep a diary. The reason for my reluctance is that I
have not seen any convincing arguments yet for why they should be measures
of depth. True, if one subscribes to the idea that any test that measures either
form knowledge or form-meaning knowledge of single words is a size test, and
everything else is a depth test, then it follows that collocation tests would be
depth tests. However, I think this is an over-simplification.
This is also clearly connected to the second major challenge to the dichoto-
my breadth/depth: the multi-faceted nature of the depth construct, as it is con-
ventionally used. Typically, the following aspects of word knowledge are listed
under the heading depth, in its comprehensive word knowledge interpretation:
- meaning knowledge beyond the most frequent,
dictionary-based meaning of a word
- word associations
- collocations
- word parts
- grammatical functions
These aspects of depth are quite disparate, which makes the definition of depth
as a single construct and its subsequent operationalisation very difficult. As
Milton (2009) rightly points out, depth has not been sufficiently and unam-
biguously defined (Milton, 2009, p. 150):
The difficulties in measuring qualities, such as depth, start with the defini-
tions of this quality. We lack clear, comprehensive and unambiguous defini-
tions to work with and this challenges the validity of any test that might fall
within this area. […] Without a clear construct, it is impossible to create a
test that can accurately measure a quality whatever that quality is.
I have two additional points to make here. First of all, the coining of depth as
a dimension has been valuable in pushing the thinking and theorizing in the
field forward. However, it only makes sense to call it a dimension; as a con-
struct, it is arguably far too vague and elusive. Secondly, one important
approach to ascertaining the viability of a construct is through empirical inves-
tigation, and the most straightforward way of doing this is through correlation
studies. A considerable number of studies have indeed been carried out to inves-
tigate the relation between breadth and depth (e.g. Qian, 1999; Nurweni &
Read, 1999; Vermeer, 2001; Meara & Wolter, 2004; Wolter, 2005; Gyllstad,
2007). Qian (1999) used the Vocabulary Levels Test (VLT) (Nation, 2001) as a
size measure and found correlations between scores on that test with scores on
the Word Associates Test (WAT) (Read, 1993, 1998) as a depth measure at r =
.82, based on data from 74 L1 Korean and L1 Chinese ESL college and univer-
sity students, predominantly 18-27 year-olds. Nurweni and Read (1999)
administered both a receptive vocabulary size measure and a WAT format depth
measure to 350 L1 Indonesian ESL first-year university students, and they
observed a correlation of r = .62 for the whole group. In a subsequent analysis,
in which the 350 students were subdivided according to scores on a general pro-
ficiency exam, they observed a correlation of r = .81 for high level students
(10% of the whole group); r = .43 for mid level students (42% of the whole
group); and r = .18 for low level students (48% of the whole group). Vermeer
(2001), testing 50 L1 and L2 Dutch kindergarten 5-year-olds, arrived at corre-
lations ranging between r = .70 and .83 between a receptive vocabulary size
measure and an association task depth measure. Meara and Wolter (2004)
found a modest level of correlation between scores on a test of overall vocabu-
lary size and scores on a vocabulary depth test (r < .3), based on data from
147 Japanese learners of English. This depth test, called V_Links, is argued to
be a test of lexical organisation, following the lexical network interpretation of
depth (Read, 2004). The result was taken as support for the view that size and
organisation are “more-or-less independent features of L2 lexicons” (Meara &
Wolter, 2004, p. 93). Wolter (2005), putting different versions of V_Links to
the test, found similarly low, or even inverse (though not significant), correla-
tions with vocabulary size. Wolter concludes that there is evidence to suggest
that vocabulary organisation, as measured by V_Links (versions 2.0 and 4.0),
and vocabulary size may develop orthogonally (2005, p. 208).
On balance then, except for the studies by Meara and Wolter, breadth and
depth seem to correlate highly with each other, which raises questions about
their viability as independent constructs. Based on his own investigations of
breadth and depth, Vermeer concluded that (2001, p. 222):
Breadth and depth are often considered opposites. It is a moot point whether
this opposition is justified. Another assumption is that a deeper knowledge of
words is the consequence of knowing more words, or that, conversely, the
more words someone knows, the finer the networks and the deeper the word
knowledge.
Vermeer’s caveat is thus that one should not assume a priori that breadth and
depth are opposite poles.
In order to illustrate in detail some of the challenges implied by using size
and depth empirically, I will briefly account for a study (taken from Gyllstad,
2007) which aimed at finding validation support for two tests of collocation,
the aforementioned COLLEX and COLLMATCH tests. The purpose was to
see whether the collocation tests gravitated more towards vocabulary size or
vocabulary depth when correlated with tests widely assumed to be size and
depth tests, respectively. Scores from 24 Swedish learners of English on five dif-
ferent tests were gathered. The learners ranged from upper secondary school
students to third term university students. The five tests used are shown in Table
4. The analysis yielded very high correlations between the test scores from
vocabulary size (VLT) and vocabulary depth (WAT) at r = .93. The collocation
tests (COLLEX, COLLMATCH) correlated at r = .90 with vocabulary size
(VLT) and at r = .85-.90 with the vocabulary depth measure (WAT).
Table 4. Tests used in a validation study investigating how collocation knowledge relates to the
vocabulary size and depth constructs (based on Gyllstad, 2007).
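The coefficients reported here and throughout this section are Pearson product-moment correlations over paired learner scores. As a minimal sketch (the score lists in the usage example are invented, not Gyllstad's data):

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two paired score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical scores for two tests taken by the same five learners:
vlt = [55, 60, 72, 80, 91]
wat = [50, 58, 70, 83, 88]
r = pearson_r(vlt, wat)  # a value near 1 means the tests rank learners alike
```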
The question is, what does all this tell us? The collocation tests correlated high-
ly with vocabulary size and almost equally highly with vocabulary depth. At the
same time, the size and depth measures in turn correlated highly with one
another. A common way of interpreting high correlations is to assume that the
variables that are involved are closely related or even the same thing. From a
testing perspective, Norbert Schmitt (personal communication) has argued
that every size test is in fact also a depth test. What he seems to mean
by this is that for any given word in a size test, test-takers must have some sort
of depth of word knowledge of that word in order to fulfill the test task. This
presupposes, of course, a view of depth where word knowledge starts with a
rather incomplete and partial level of knowledge, for example mere form recog-
nition or very tentative and uncertain meaning knowledge. Most researchers,
however, assume that basic form-meaning knowledge is part of the vocabulary
breadth/size knowledge construct, and that depth is what comes beyond this
basic knowledge.
An analysis that could shed light on the potential difference between the
assumed constructs is multiple linear regression (see Bachman, 2004). It would
for example be possible to try to estimate how much of the variation in a set of
reading comprehension scores can be explained by vocabulary size scores. Then,
as a second step, the variable of vocabulary depth would be entered into the
regression model in order to ascertain whether the percentage of explained vari-
ance would increase. If that is the case, then vocabulary depth could be argued
to bring an added, unique contribution to the variance in reading comprehen-
sion scores. As a case in point, Qian (1999) found that his measure of depth of
vocabulary knowledge added a further 11% to the prediction of reading com-
prehension scores, over and above the prediction afforded by vocabulary size. A
final remark that needs to be made here, though, is that we must look critical-
ly at the test instruments themselves. For example, in my own study (Gyllstad,
2007) and several of the studies reported above, including that of Qian (1999),
a version of the Word Associates Test (WAT) (Read, 1993, 1998) was used.
Some of the words featuring in the WAT are fairly low-frequency items, and
vocabulary size is therefore suspected to have a considerable influence on test-
takers’ performance. A closer look at some of the words featured in the specific
WAT test version used in Qian (1999) and Gyllstad (2007) confirms this. For
example, target words like ample, synthetic (both 6K), and fertile (7K), together
with associate words like cautious (5K) and plentiful (8K) are clearly not high-
frequency words. This confounds the two variables and arguably explains at
least part of the observed high correlations between vocabulary size and vocab-
ulary depth scores.
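The hierarchical regression procedure described above can be illustrated with a small calculation. The sketch below uses invented scores for ten hypothetical learners (not data from Qian, 1999, or Gyllstad, 2007), and obtains the second-step R² from the standard closed-form expression for two predictors rather than by fitting a full regression model:

```python
# Sketch of hierarchical regression: does vocabulary depth add explained
# variance in reading comprehension beyond vocabulary size?
# All scores below are invented for illustration only.
import math

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

size    = [41, 55, 38, 62, 47, 70, 52, 44, 66, 59]  # hypothetical size scores
depth   = [30, 42, 28, 50, 33, 58, 40, 35, 52, 44]  # hypothetical depth scores
reading = [48, 60, 44, 71, 50, 80, 58, 49, 74, 63]  # hypothetical reading scores

r_ys = pearson(reading, size)
r_yd = pearson(reading, depth)
r_sd = pearson(size, depth)        # note: the two predictors are correlated

# Step 1: vocabulary size alone
r2_step1 = r_ys ** 2
# Step 2: size + depth (closed-form R^2 for two predictors)
r2_step2 = (r_ys**2 + r_yd**2 - 2 * r_ys * r_yd * r_sd) / (1 - r_sd**2)
delta_r2 = r2_step2 - r2_step1     # unique contribution of depth

print(f"R2, size only:    {r2_step1:.3f}")
print(f"R2, size + depth: {r2_step2:.3f}")
print(f"Added by depth:   {delta_r2:.3f}")
```

Since R² cannot decrease when a predictor is added, the interesting question is whether the increment is large enough, and statistically reliable, to support depth as a separate construct; when the two predictors correlate as highly as in the studies reviewed above, the increment will tend to be small.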
26 Henrik Gyllstad
4. Concluding remarks
Author’s note
I would like to thank two anonymous reviewers, the volume editors and the
series editor for valuable comments and suggestions.
References
Research on L2 learners’ collocational competence and development – a progress report
Birgit Henriksen
1. Introduction
The seminal works by Pawley and Syder (1983), Nattinger and DeCarrico
(1992) and Lewis (1993) have drawn language researchers’ and teachers’ atten-
tion to the frequency and importance of formulaic sequences (FSs), i.e. recur-
ring lexical chunks in language use. A range of different types of FSs have been
identified: idioms (if life deals you lemons, make lemonade), figurative
expressions (to freeze to the spot), pragmatic formulas (have a nice day), discourse
markers (let me see now), lexicalized sentence stems (this means that…), and col-
locations (rough crossing, remotely clear), which are the focus of this article.
Mastery of FSs is a central aspect of communicative competence (Barfield &
Gyllstad, 2009b; Nation, 2001; Schmitt, 2004; Wood, 2010; Wray, 2002),
enabling the native speaker to process language both fluently and idiomatically
(Pawley & Syder, 1983) and to fulfil basic communicative needs (Wray, 2002).
Moreover, memory and the ability to chunk language into units play an important
role in language use and learning (Ellis, 2001; 2003; 2005). Hoey (2005)
has also argued for the facilitating processing effects in terms of lexical priming
for recurrent lexical units.
Mastery of FSs is also important for L2 learners. During the last two
decades, we have witnessed an increasing focus in SLA research and in second
and foreign language teaching publications both on FSs in general and more
specifically on collocations (e.g. Barfield & Gyllstad, 2009a; Granger &
Meunier, 2008; Lewis, 2000; Schmitt, 2004; Wood, 2010). The central role of
FSs in language knowledge and the benefits of mastering language chunks in
relation to fluency and native-like selection are important reasons for focusing
on formulaic language, including collocations (see Nation, 2001, pp. 317-318).
Collocations are frequently recurring two- to three-word syntagmatic units
which can include both lexical and grammatical words, e.g. verb + noun (pay trib-
ute), adjective + noun (hot spice), preposition + noun (on guard) and adjective +
preposition (immune to). Many of the studies on collocations have shown that even
high-level learners seem to experience problems in relation to using and develop-
ing L2 collocational knowledge (e.g. Arnaud & Savignon, 1997; Nesselhauf, 2005;
Revier & Henriksen, 2006). Researchers wanting to explore L2 collocational
knowledge, use and development may however also be faced with a number of seri-
ous challenges (Henriksen & Stenius Stæhr, 2009). The aim of this paper is to pro-
vide a progress report on L2 collocational research to see if we can find empirical
support for the more general claim that collocations are a problem area for L2 lan-
guage learners, and to discuss whether researchers are faced with specific challenges
when describing L2 learners’ collocational development and use.
A number of central issues taken up in the studies will be addressed: how
can collocations be defined? Why do L1 and L2 learners need to develop collo-
cational competence? Do L1 and L2 learners differ in their use and develop-
ment of collocations? Is it problematic if L2 learners’ knowledge and use of col-
locations differ from those of L1 users? Which types of collocations have been
studied and which research instruments have been used? Can specific research
challenges be identified? The final section will outline some of the more gener-
al issues raised by the collocational research reviewed, i.e. issues which should
be taken into consideration in future studies.
Numerous attempts have been made to define and classify different types of FSs,
using a number of criteria (e.g. Boers & Lindstromberg, 2009; Koya, 2005).
Nesselhauf (2005) discusses in detail different potential defining criteria, and
Nation (2001) outlines 10 different scalar
criteria: frequency of co-occurrence, adjacency, grammatical connectedness,
grammatical structure, grammatical uniqueness, grammatical fossilization, col-
locational specialization, lexical fossilization, semantic opaqueness and unique-
ness of meaning. Many researchers place FSs on a continuum with collocations
as an intermediate category (for an alternative classification see Warren, 2005).
Nattinger and DeCarrico (1992) outline three distinguishing criteria between
idioms, collocations and free combinations: flexibility, compositionality and
productivity. Cowie and Howarth (1996) argue that collocations can be distin-
guished from the other types of FSs by being characterized as institutionalized,
memorized, restricted and semantically opaque units. Laufer and Waldman
(2011) use the criteria of restricted co-occurrence and relative transparency of
meaning. Howarth (1998, p. 24) stands out by focusing on the function of col-
locations, defining them as “combinations of words with a syntactic function as
constituents of sentences (such as noun or prepositional phrases or verb and
object constructions).”
An often quoted (e.g. Wray, 2002), but very illustrative example of a collo-
cation is the adjective + noun unit major catastrophe. If we look at other possi-
ble options for adjectives found in a thesaurus, covering more or less the same
semantic content as major, the following near-synonyms will often be listed: big,
large, great, huge, substantial, enormous, vast, gigantic, and colossal. The Oxford
collocations dictionary (Deuter, 2002) offers big, great, and major as preferred
collocates, but none of the other conceivable adjectives. Many of these are
potential options on the reference level, but are less appropriate on the pragmat-
ic level of conventionalized, i.e. standard, language use. Other often cited con-
trastive examples are strong coffee vs. powerful car and blonde hair vs. light paint.
Two major traditions have been adopted in relation to identifying colloca-
tions (see Barfield & Gyllstad, 2009; Granger & Paquot, 2008; Gyllstad, 2007;
Nesselhauf, 2005). Firstly, the frequency-based view which identifies collocations
on the basis of the probability of occurrence of their constituent words, often in
large language corpora. Secondly, the phraseological view which is based on a
syntactic and semantic analysis of the collocational unit, using some of the crite-
ria mentioned above, such as degree of opacity, syntactic structure and substi-
tutability of word elements. The advantage of using the corpus approach is that
it employs objective criteria such as frequency, range and collocational span.
However, a data-driven approach focuses on performance and not competence
(Howarth, 1998) and disregards central questions of memory storage and lan-
guage processing. By not including a semantic analysis, this procedure may lead
to the identification of recurring lexical bundles that native speakers would not
32 Birgit Henriksen
classify as collocational units, i.e. the chunks may have little psycholinguistic
validity for the language users (e.g. and the and of a). On the other hand, the
more subjective phraseological approach only identifies chunks with clear
semantic relations between the constituents, and fails to report the actual fre-
quency of use of the collocations. Some of these collocations may be fairly low
in frequency and may therefore not constitute the most suitable targets for L2
learning and teaching (judicial organ, ruggedly handsome). Many researchers now
apply both procedures, initially identifying the frequently occurring combinations
in a large corpus through statistical measures (see Schmitt, 2010, pp. 124-132
for a detailed presentation) and subsequently including and excluding specific
combinations on the basis of an analysis of the word pairs identified. Using
the computational approach as a starting point makes it possible to distinguish
between collocations of varying frequency of use.
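The statistical measures used in this first, corpus-driven step are typically association scores such as pointwise mutual information (MI) and the t-score (see Schmitt, 2010). A minimal sketch of the computation, using two word pairs discussed in this chapter but with invented corpus frequencies:

```python
# Two common association measures for frequency-based collocation
# identification, computed from raw corpus counts.
# The corpus size and all frequencies below are invented for illustration.
import math

def mi_score(f_xy, f_x, f_y, corpus_size):
    """Pointwise Mutual Information: log2 of observed vs. expected co-occurrence."""
    expected = f_x * f_y / corpus_size
    return math.log2(f_xy / expected)

def t_score(f_xy, f_x, f_y, corpus_size):
    """t-score: observed minus expected co-occurrence, scaled by sqrt(observed)."""
    expected = f_x * f_y / corpus_size
    return (f_xy - expected) / math.sqrt(f_xy)

N = 10_000_000  # hypothetical corpus size in tokens
# word pair:                 f(x,y)  f(x)     f(y)
pairs = {
    ("major", "catastrophe"):    (120, 25_000,   800),
    ("preconceived", "notions"): ( 90,    150, 1_200),
}
for (w1, w2), (fxy, fx, fy) in pairs.items():
    print(f"{w1} {w2}: MI = {mi_score(fxy, fx, fy, N):.2f}, "
          f"t = {t_score(fxy, fx, fy, N):.2f}")
```

With these invented counts the contrast between the measures is visible: MI rewards tightly bound but rare pairs, so preconceived notions scores very high on MI, whereas the t-score rewards raw co-occurrence frequency and ranks the more frequent major catastrophe higher. This is one reason many researchers combine statistical extraction with a subsequent phraseological analysis.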
Following Gyllstad (2007), collocations can be viewed as both 1) lexical
units, i.e. instances of language use which can be identified in written or spo-
ken production and 2) associative mental links between words in language users’
minds. A number of researchers have studied the psycholinguistic validity of FSs
(e.g. Columbus, 2010; Durrant, 2008, 2009; Ellis, Simpson-Vlach, &
Maynard, 2008), suggesting that the different types of units identified
in language data may indeed be seen as independently represented chunks
in the mental lexicon. The question of psycholinguistic validation of FSs,
including collocations, is important in relation to establishing useful inventories
for the learning and teaching of collocations (see e.g. Durrant, 2009).
So far, it has been assumed that collocations are arbitrary structures, i.e. con-
ventionalized combinatory options preferred by native speakers. However, as
pointed out by Boers, Eyckmans, and Stengers (2006) and Boers and
Lindstromberg (2009) this is not the case for all FSs, including collocations; in
other words some collocations are motivated rather than arbitrary. Some colloca-
tions may be semantically motivated and can be traced back to specific etymo-
logical sources (e.g. weeding out), whereas others are formally motivated e.g.
based on alliteration and assonance (tell a tale, say a prayer, seek + solace, solitude,
a solution and support, do + damage, a degree and a doctorate). Arbitrary colloca-
tions can primarily be identified on the basis of frequency of occurrence in the
language input, whereas the motivated collocations can also be identified on the
basis of semantic or formal criteria via analysis (see also Walker, 2011). Based on
a number of experiments (see again Boers et al., 2006 for an overview), Boers and
his colleagues have argued that this difference between arbitrary and motivated
collocations may influence the learnability of different types of collocations and
thus the teaching approaches to be adopted. As discussed, one useful pathway to
acquiring arbitrary collocations may be via rote learning approaches, whereas the
motivated collocations may be learnt through the use of insightful, analytic
Research on L2 learners’ collocational competence and development – a progress report 33
learning approaches, thus enabling L2 learners to benefit from the increased cog-
nitive involvement connected with the processing of these collocations.
Different categories of FSs have been identified. Fewer attempts have been
made to classify collocations systematically into different subcategories. As we
have seen, some collocations are grammatical (sometimes referred to as ‘colliga-
tions’, see Gyllstad, 2007, p. 25), others lexical. Some collocations may differ in
their degree of fixedness, transparency and arbitrariness. The degree of seman-
tic transparency is a central variable used to distinguish between different types
of collocations. If the learner knows the meaning of the two lexical items
included, the collocation major catastrophe is fully transparent, and can there-
fore be understood through a process of decoding the two lexical elements in
their literal sense. This is also the case with a verb + noun collocation like take
the money. Other collocations are less straightforward, being either semi-trans-
parent (take a course) or non-transparent (take sides). The meaning of the semi-
transparent collocation is not decoded as easily as the literal counterpart, but is
on the other hand not likely to be as salient as the non-transparent collocation
which is idiomatic and cannot be understood on the basis of its constituents.
Consequently, it has been argued that primarily the semi-transparent colloca-
tions will cause problems for language learners and should therefore be the main
focus of L2 research and teaching (Nesselhauf, 2003; 2005). Many FSs have
specific pragmatic functions as speech acts, discourse markers or conversational
up-takers, playing an important role in social interaction. However, this is not
the case for most collocations which are composite units (Howarth, 1998) ful-
filling a referential function (e.g. major catastrophe, tell a tale) as syntactic phras-
es. Some of the collocations are semantically motivated; others are formally
motivated, whereas others again seem to be arbitrary combinations which have
become the preferred lexical choice. Finally, many collocations are low in
frequency, especially those that have high mutual semantic coherence (e.g.
preconceived notions). All of these aspects may have an influence on the frequency,
salience and learnability of the individual collocations.
It has been widely argued (e.g. Boers et al., 2006; Boers & Lindstromberg,
2009; Durrant, 2008; Lorenz, 1999) that collocational competence is impor-
tant for language production and reception, enabling both the L1 and L2 lan-
guage user: 1) to make idiomatic choices and come across as native-like; 2) to
process language fluently under real-time conditions (Columbus, 2010; Ellis et
al., 2008); 3) to establish ‘islands of reliability’ (Dechert, 1983; Raupach, 1984)
which enable the language user to channel cognitive energy into more creative
language use.
The results from the L2 studies reviewed here will be discussed in relation to the
four main questions mentioned in the introduction. Due to the number of
studies on collocations, this overview is, however, not exhaustive. For a discus-
sion of some of the studies not included here see Koya (2005) (Japanese stud-
ies), Pei (2008) (Chinese studies), Fan (2009) and Laufer and Waldman (2011).
Finally, it has not been possible to include newer articles published in 2012.
Two types of collocations have been the focus of investigation: lexical col-
locations, i.e. possible syntagmatic combinations between nouns, verbs, adjec-
tives and adverbs (e.g. foul play, take sides, truly happy) and grammatical collo-
cations, i.e. collocations which include prepositions (e.g. hand over to, present
with, important for).
Many researchers have focused on lexical verb+noun collocations (e.g.
Bahns & Eldaw, 1993; Barfield, 2003; Bonk, 2001; Chan & Liou, 2005;
Eyckmans, 2009; Gyllstad, 2007; Howarth, 1996; Koya, 2005; Laufer &
Girsai, 2008; Laufer & Waldman, 2011; Peters, 2009; Revier & Henriksen,
2006), often looking at the restricted, semi-transparent collocations which are
hypothesized to pose a special challenge for language learners (e.g. Nesselhauf,
2003, 2005; Revier, 2009). Another focus area has been the lexical
adjective+noun combination (e.g. Jaén, 2007; Li & Schmitt, 2010; Peters,
2009; Siyanova & Schmitt, 2008). Some researchers delimit their scope of
investigation to one type of collocation; others include two types, whereas oth-
ers include a range of collocational structures in their studies (e.g. Barfield,
2009; Fan, 2009; Fayez-Hussein, 1990; Gitsaki, 1999; Hoffmann & Lehmann,
2000; Groom, 2009; Keshavarz & Salimi, 2007; Prentice, 2010; Skrzypek,
2009; Ying & O’Neill, 2009).
A high correlation is, however, not the same as a causal relation, and a number
of other important factors will also influence L2 learners’ language performance.
As shown, L2 collocational use does deviate from L1 use, both quantitative-
ly and qualitatively. Wray (2002, p. 74) has stressed the need for L2 learners to
master FSs in order to identify with the target language community. However, if
we view L2 use from a lingua franca perspective, native-like attainment and selec-
tion may not necessarily be the goal for L2 development compared for example
to communicative efficiency. Howarth (1998) points out that infelicitous colloca-
tional choices made by L2 learners should in fact be viewed more positively as
instances of risk-taking behaviour, arguing that these are indications that the
interlanguage users are employing various communication strategies (e.g. experi-
mentation, transfer, analogy and repetition) in order to cope communicatively.
The use of FSs, including collocations, is very genre-specific. Mastery of
collocations may be a hallmark of certain types of academic writing which
emphasize clarity, precision and lack of ambiguity (Howarth, 1998). If, as
argued, collocations function as central composite syntactic units for clause level
production, lack of collocational knowledge may be expected to have a negative
effect on L2 performance not just productively for the L2 learner, but also
receptively for the receiver, if central referential units are misunderstood. Apart
from leading to unfortunate misunderstandings, advanced non-native speakers’
collocational deviations may at least signal a lack of academic expertise.
Moreover, the study by Millar (2011) has documented that malformed L2 col-
locations, both in terms of lexical misselection of a constituent and misforma-
tion of the collocation, lead to an increased processing burden for native speak-
ers in terms of slower reading speed. But again, some of the same receptive pro-
cessing effects could also be hypothesized for other aspects of language use, e.g.
heavily accented L2 speech or word stress errors.
Most researchers working with FSs have argued that language users draw
on a large inventory of ready-made FSs to supplement creative language pro-
duction (e.g. Ellis et al., 2008; Erman & Warren, 2000; Hoey, 2005; Pawley &
Syder, 1983) and that this facilitates language processing. Looking at the pro-
cessing advantages of FSs for both native and non-native speakers, the findings
of the earlier experimental studies by Schmitt and his colleagues (Schmitt,
Grandage, & Adolphs, 2004; Schmitt & Underwood, 2004; Underwood,
Schmitt, & Galpin, 2004) are, however, very mixed. In a later study, Conklin
and Schmitt (2008) did find significant processing advantages for FSs in literal
as well as non-literal use for both native and non-native speakers. As discussed
(Columbus, 2010; Weinert, 2010), these mixed results may be due to the meth-
ods employed or the types of FSs tested, influenced by factors such as frequen-
cy, familiarity, recency and context – aspects which may be expected to play a
significant role in a usage-based account of language use and language acquisi-
tion (Weinert, 2010, p. 11). None of these earlier processing studies focuses
directly on collocations, but the recent study by Columbus (2010), which
included restricted collocations, reports faster processing for all three types of
FSs tested over compositional control sentences. The evidence of certain pro-
cessing advantages of FSs – including collocations – seems to be mounting.
4.4. Why do L2 learners have problems in relation to using and developing their col-
locational competence?
It is an underlying assumption in the research literature that the L2 learner –
when developing collocational competence – needs to go through the same
developmental processes described in most single-word vocabulary acquisition
research. This entails that the learner must be able to 1) recognize collocations,
i.e. notice and delineate them in the input; 2) understand the meaning and function
learners experience with collocations may in fact be caused by the use of
communicative approaches to teaching, arguing that a more form-focused approach to
teaching should be adopted.
Some studies have looked at the effect of teaching on L2 learners’ colloca-
tional knowledge, focusing specifically on awareness raising activities. The
Chinese studies on teaching reported by Pei (2008) show positive effects of
teaching collocations to L2 learners. Eyckmans (2009) found that noticing activ-
ities can improve learners’ awareness of syntagmatic links. This result has, how-
ever, been contested in a more recent study of chunk learning (Stengers et al.,
2010) which showed no positive effect of teacher-led noticing activities com-
pared to the control groups. Ying and O’Neill (2009), Peters (2009) and Barfield
(2009) also describe different approaches to collocations in language teaching,
emphasizing the need to raise L2 learners’ awareness of collocations, for example
of the contrastive differences between collocations and the need to draw learners’
attention to the collocations with no direct translation equivalence between the
L1 and the L2 (see also Bahns, 1993). Laufer and Girsai (2008) looked at the
benefits of form-focused instruction, stressing the need to adopt a teaching
approach to collocations based on contrastive analysis and the use of translation.
Webb and Kagimoto (2011) investigated the learning effect of the number of
collocates presented with the node word, the position of the node word in rela-
tion to the collocate and the presentation of synonymous collocations together
in the same teaching set. They found that increasing the number of collocates
presented for the same node word benefited learning, whereas the presentation
of synonymous collocations affected learning negatively. The relative position of the collocational
constituents did not seem to have an effect. Based on a corpus study focusing on
a number of different semantic and pragmatic features of collocations, Walker
(2011) has suggested that the use of concordance data may support learning,
making the process more meaningful and memorable to the learners. In a teach-
ing study, Chan and Liou (2005) did find positive effects of using a concordanc-
ing approach to the teaching of collocations. Handl (2009) has also raised the
issue of presentation of collocations in learner dictionaries in order to help learn-
ers identify the collocations they need. However, L2 learners often have no
knowledge of collocation dictionaries or other potential resources for working
with collocations independently.
An overview is given in Table 1. Again, the list is not exhaustive and does not
include some of the studies reviewed by Pei (2008) and Koya (2005) and some
of the studies mentioned in Fan (2009).
Methodologies and studies

Written and oral on-line tasks
– Written corpora, essays: Chi et al., 1994; Howarth, 1998; Granger, 1998; Gitsaki, 1999; Lorenz, 1999; Kaszubski, 2000; Nesselhauf, 2003; Revier & Henriksen, 2006; Wang & Shaw, 2008; Siyanova & Schmitt, 2008; Bell, 2009; Durrant & Schmitt, 2009; Fan, 2009; Prentice, 2010; Li & Schmitt, 2010; Laufer & Waldman, 2011
– Oral production: Prentice, 2010

Off-line elicitation
– Written translation tasks from L1 to L2: Biskup, 1992; Bahns & Eldaw, 1993; Farghal & Obiedat, 1995; Gitsaki, 1999; Koya, 2005; Webb & Kagimoto, 2011
– Gap fill tasks (cloze tests and fill-in-the-blank tests): Bahns & Eldaw, 1993; Farghal & Obiedat, 1995; Herbst, 1996; Arnaud & Savignon, 1997; Gitsaki, 1999; Shei, 1999; Hoffmann & Lehmann, 2000; Bonk, 2001; Durrant, 2008; Durrant & Schmitt, 2010; Revier, 2009; Prentice, 2010
– Multiple choice tasks, matching and judgement: Fayez-Hussein, 1990; Granger, 1998; Bonk, 2001; Mochizuki, 2002; Honsun, 2005; Gyllstad, 2007; Leśniewska & Witalisz, 2007; Siyanova & Schmitt, 2008
– Recognition task: Barfield, 2003; Gyllstad, 2007
– Association task: Barfield, 2009; Fitzpatrick, 2012

On-line reaction tasks
– Eye movement task: Underwood et al., 2004; Columbus, 2010
– Self-paced reading: Conklin & Schmitt, 2008; Millar, 2011
– Recognition task with reaction time: Siyanova & Schmitt, 2008; Yamashita & Jiang, 2010; Wolter & Gyllstad, 2011
Three general types of elicitation tools have been used (Siyanova & Schmitt,
2008): 1) written on-line tasks, often in the form of essays produced by both
NSs and NNSs and often collected in large data banks; 2) off-line elicitation tools
in the form of productive translation tasks, cloze format tasks and association tasks
as well as receptive multiple-choice and judgement tasks; 3) on-line reaction tasks.
Much of the research conducted is exploratory, and researchers fail to use vali-
dated, standardized elicitation procedures. Some of the newer studies are, how-
ever, aimed at developing and validating instruments for measuring collocation-
al knowledge. Finally, many of the studies focus on the state of the learners’ col-
location knowledge and use, and the studies that look at collocation develop-
ment are primarily cross-sectional.
6. The Need for Following the Development of Individual Learners over Time
Many of the collocational studies are based on L1 and L2 data extracted from
large corpora. As pointed out by Laufer and Waldman (2011), the advantage of
this approach is that large amounts of data can be examined across a variety of
data sources and informant groups (across L2 proficiency levels or L1 vs. L2
data) with the use of concordance software. The disadvantage is, however, that
only very few studies are longitudinal, tracing the same learners over time with
the same tasks. Consequently, we often do not follow the use and development
of collocation knowledge from the perspective of the individual learner.
Granger (2009, p. 65) argues that we “need to abandon the notion of the
generic L2 learner and distinguish between different types of L2 learners and L2
learning situations”, stressing the need to look at variables that influence learn-
er language such as the learner’s L1 (e.g. Wolter & Gyllstad, 2011), degree of
exposure (e.g. Groom, 2009) or proficiency level, as well as factors pertaining to
the task such as medium, genre, or task type (e.g. Forsberg & Fant, 2010). Most
of these factors have tended to be neglected in L2 learner corpus research.
The need to study language development from a usage-based perspective
as it unfolds for the individual learner, the need to take contextual factors into
consideration and the need to allow for inter-learner and intra-learner variation
in the results reported, echo some of the very central assumptions
about language learning outlined by Larsen-Freeman (1997; 2006) in her dis-
cussion of complex, dynamic non-linear models of language development.
According to Larsen-Freeman, we need to abandon the ‘developmental ladder
metaphor’ which views language development as a linear process which pro-
ceeds more or less neatly through a series of stages towards native-like attain-
ment. As argued, the language system adapts to the changing contexts the
learners are exposed to. Adaptation and fluctuation of the system dependent
on the language use conditions of, and the choices made by, the individual
learner should therefore be expected. Moreover, development in one subsys-
tem of language may support or compete with development in another sub-
system. Because language is viewed both as a cognitive and social resource
embedded in a usage-based context, Larsen-Freeman argues that the L2 learn-
ers’ identities, goals and affective states will influence their language use and
consequently their language development.
The conflicting results found in some of the collocation studies reported
earlier as well as the failure to report development over time in some of the stud-
ies may, as is often pointed out by the researchers themselves, be due to differ-
ences in the operationalization of the construct of collocational knowledge, the
collocations targeted or the lack of sensitivity of the elicitation tools employed.
One could, perhaps, also hypothesize that the results are an effect of the quan-
titative approach adopted and the reliance on learner corpus data in many of the
studies. One could speculate whether a research approach which focuses more
on individual learners and their differential development should be adopted to
complement the quantitative approaches employed. Some learners for example
choose to focus on learning new vocabulary items instead of developing depth
of knowledge of already acquired lexical items (Ying & O’Neill, 2009). The ori-
entation of learning resources in this way will most likely have a negative effect
on the learner’s acquisition of collocations, i.e. the competition between these
two lexical ‘subsystems’ will be detrimental to the development of collocational
competence.
L1 language learners develop collocational competence through extended
exposure to their native language in varying contexts and co-texts. Repeated
exposures create and strengthen associative links between the collocational con-
stituents in the language learner’s memory organisation, priming (Hoey, 2005)
the learner to recognize and use the collocations as holistic units. Repeated
exposure to collocations in varying contexts and co-texts is also a prerequisite
for developing collocational competence for the L2 learner.
Words and collocations are by nature carriers of semantic meaning. If we
exclude the most frequent 2000-3000 word families with very high text cover-
age and range, most lower-frequency words are related to specific topics, situa-
tions, genres, contexts and co-texts. Technical and special purpose contexts and
language materials are classic examples of input rich in specialized vocabulary.
The nature of the L2 language learners’ contact with the target language will
naturally influence the lexical items the learner encounters. For FL learners the
selection of lexical items is most often under the control of the teacher and
dependent on the materials introduced in the language classroom and highly
limited by the time allotted to language learning. Additional, self-generated L2
input will often be dependent on the learners’ personal interests and the specific
contexts the learners choose to engage in. We all have stories of learn-
ers who have a personal interest for example in internet role plays or computer
games and therefore have an exceptionally well-developed vocabulary within
these specialized areas. As pointed out by Nation (2001, p. 20) “One person’s
technical vocabulary is another person’s low-frequency word”. Hoey (2005, p.
14) also stresses the uniqueness of the individual learner’s input and the prob-
lems of documenting the learning process.
All these observations are in themselves fairly trivial, but if we link the role
of context and co-text in L2 input to the points raised by Larsen-Freeman (1997;
2006) in relation to how the individual language learners adapt and orient them-
selves to the communicative situations and the needs they experience, the
question of frequency becomes crucial. If we look at the frequency of the
individual collocations in language input, it is clear that a collocation like major
catastrophe is less frequent than the two words that make up the collocational
unit. Or phrased differently, the likelihood of learners encountering the colloca-
tion repeatedly in input is smaller than that of encountering the individual words and is
highly dependent on the type of input the learner encounters. In a small
exploratory case study, Dörnyei, Durow, and Zahran (2004) investigated the
effect of individual learner differences on the acquisition of FSs. Not surprising-
ly, they found that the individual learner’s motivation, active interaction and
social adaptation to the second language situation highly affected the learning
outcome. This result might explain why a larger study of the acquisition of FSs
which was based on whole-sample statistics failed to produce significant results.
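The frequency asymmetry described above (a two-word collocation can never occur more often than either of its constituent words) can be illustrated with a minimal counting sketch. The toy corpus and function name below are invented for this illustration; any corpus tool that counts single words and adjacent word pairs would show the same relation:

```python
from collections import Counter

def unigram_and_bigram_counts(tokens):
    """Count single-word (unigram) and adjacent word-pair (bigram) frequencies."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams

# A tiny invented corpus, for illustration only.
tokens = ("a major catastrophe was avoided but a major storm "
          "caused major damage and another catastrophe").split()

uni, bi = unigram_and_bigram_counts(tokens)

# The collocation is necessarily no more frequent than its parts:
print(uni["major"])                    # 3
print(uni["catastrophe"])              # 2
print(bi[("major", "catastrophe")])    # 1
```

Each occurrence of the bigram entails an occurrence of both constituent words, but not vice versa, which is why repeated exposure to a collocation is so much harder to come by than repeated exposure to its parts.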
Inspired by Larsen-Freeman’s approach, Bell (2009) carried out a longitudi-
nal study, describing “the messy little details” of lexical development which
become apparent when looking more closely at one individual learner. As the
case study shows, the data reveals instances of fluctuation and variability in the
learner’s lexical development similar to the scouting and trailing behaviour
described by Larsen-Freeman. The learning path can be characterized as showing
jagged development and fluctuating patterns of use with structures moving into
prominence and/or disappearing. Moreover, Bell identifies the use of intermedi-
ate structures and results of competing sub-systems. The longitudinal studies by
Barfield (2009) and Li and Schmitt (2010) are examples of case studies which
follow individual learners’ development of collocation knowledge over time. The
in-depth analysis of the individual learners enables Barfield (2009) to describe
how different learners approach the learning task, giving us interesting insights
into how learners handle the challenges they meet and how they choose to organ-
ize their learning in relation to the contexts and needs they experience. Li and
Schmitt (2010) also document in detail the inter- and intra-learner variation in
the development of the four informants followed over a 12-month learning peri-
od. In a more recent study, Fitzpatrick (2012) tracks the changes in vocabulary
knowledge of a single subject in a study abroad context by the use of word asso-
ciation data collected six times over an 8-month period. One of the focus areas
in the study is the syntagmatic responses produced, which give an insight into
the developing productive collocational knowledge of the informant.
It is more than likely that collocational acquisition is much more idiosyn-
cratic in nature and dependent on specific language use situations than single-
word acquisition and therefore calls for more qualitative, case-study, longitudi-
nal research approaches like the studies outlined above. Larsen-Freeman argues
for the need to use both macro- and micro-level perspectives in SLA research in
order to trace both the larger cross-learner patterns of interlanguage develop-
ment and the developmental paths taken by the individual learner. One could
argue that complementary research methodologies may be a fruitful path to
pursue in future collocation research.
7. Rounding off
This research overview has shown that native and non-native speakers do differ
in their use of collocations both quantitatively and qualitatively, and this holds
for advanced L2 learners as well. It has been found that malformed L2 colloca-
tions may have negative effects on the processing speed for the recipients.
Collocations, however, primarily fulfil a referential function, and lack of collo-
cational knowledge therefore might not lead to pragmatic failure in
the same way, i.e. have the same social and interpersonal consequences as infe-
licitous use of some of the other types of FSs. On the other hand, collocations
are conveyers of precise semantic information, so incorrect use of collocations
may potentially lead to misunderstandings, and the failure to use them appro-
priately may signal lack of expertise and knowledge.
The development of collocational knowledge is slow and uneven and pro-
ductive mastery clearly lags behind receptive use. But, as argued by many
researchers, collocations are less frequent than the words that make them up,
and learners therefore mostly lack sufficient exposure to collocations
to create, strengthen and maintain the associative links between the constituents.
Many conflicting findings have also been reported. This may in part be
caused by the lack of clarity and agreement in the research field in relation to
the underlying theoretical assumptions regarding the conceptualization of col-
locational knowledge and development. This naturally affects the type of
research questions asked, the identification and selection of collocations target-
ed for investigation and the research approaches adopted. Moreover, the
methodological problems identified in the review make it difficult to outline
any valid generalizations across the many studies carried out. The findings show
that learning and ability for use are affected by a number of factors pertaining
specifically to the types of collocations targeted, their frequency, degree of
semantic transparency and the context of learning. Researchers are therefore
faced with a number of challenges in relation to language target selection crite-
ria. Moreover, learners’ awareness of collocations, their motivation to focus on
these and the teaching conditions afforded for acquisition to take place differ
immensely, pointing to the need to combine macro-level, quantitative studies
looking at large corpora of L1 and L2 language use and development with
micro-level, qualitative case studies of the collocational competence and acqui-
sitional patterns of the individual language learner.
None of these results is, however, surprising; they match the general SLA
findings for other areas of language use, e.g. single words and other types of FSs.
We therefore need to ask whether and, if so, in which way collocations are rad-
ically different from other types of FSs or single-word items. Are there specific
obstacles related to learning collocations, e.g. factors such as transparency,
saliency or function, which make them more difficult to learn, or is it merely a
matter of lack of exposure due to their low frequency which hinders sufficient
uptake and consolidation? Does the fact that learners often already have knowl-
edge of the individual words that make up collocations hinder or facilitate
learning? Can we transfer our knowledge and assumptions about the knowl-
edge, use and development of single words and FSs to research on collocations
or should other models and approaches be adopted? It has been found that col-
locations are processed holistically as lexical units and that L2 learners tend to
transfer collocational knowledge from their L1, but we still know little about
the types of lexical entries formed for collocations, the links between lexical
entries for single words and collocations, the links between lexical entries in the
L1 and the L2, and the routes the language user takes in processing them. All
these aspects may have an impact on the L2 learners’ knowledge, use and devel-
opment of collocations and are fruitful avenues of research. The newer studies
carried out by Bell (2009), Wolter and Gyllstad (2011) and Fitzpatrick (2012)
for example present some very promising research directions to take, which may
help us find answers to some of these questions.
Acknowledgements
I would like to express my gratitude to the editors, the two anonymous review-
ers and Henrik Gyllstad and Brent Wolter for their comments on the paper.
References
Al-Zahrani, M. S. (1998). Knowledge of English lexical collocations among male Saudi col-
lege students majoring in English at a Saudi university (Doctoral dissertation). Ann Arbor, MI: UMI.
Arnaud, P. J. L., & Savignon, S. J. (1997). Rare words, complex lexical units and the
advanced learner. In J. Coady & T. Huckin (Eds.), Second language vocabulary
acquisition (pp.157-173). Cambridge: Cambridge University Press.
Research on L2 learners’ collocational competence and development – a progress report 51
Dechert, H. W. (1983). How a story is done in a second language. In C. Færch & G. Kasper
(Eds.), Strategies in interlanguage communication (pp. 20-60). London: Longman.
Deuter, M. (2002). The Oxford collocations dictionary. Oxford: Oxford University Press.
Dörnyei, Z., Durow, V., & Zahran, K. (2004). Individual differences and their effects
on formulaic sequence acquisition. In N. Schmitt (Ed.), Formulaic sequences:
Acquisition, processing and use (pp. 87-106). Amsterdam: Benjamins.
Durrant, P. (2008). High frequency collocations and second language learning.
(Unpublished doctoral dissertation). The University of Nottingham, Nottingham.
Durrant, P. (2009). Investigating the viability of a collocation list for students of English
for academic purposes. English for Specific Purposes, 28(3), 157-169.
Durrant, P., & Schmitt, N. (2009). To what extent do native and non-native writers make
use of collocations? International Review of Applied Linguistics, 47(2), 157-177.
Durrant, P., & Schmitt, N. (2010). Adult learners’ retention of collocations from expo-
sure. Second Language Research, 26(2), 163-188.
Ellis, N.C. (2001). Memory for language. In P. Robinson (Ed.), Cognition and second
language instruction (pp. 33-68). Cambridge: Cambridge University Press.
Ellis, N.C. (2003). Constructions, chunking and connectionism: the emergence of sec-
ond language structure. In C. J. Doughty & M. H. Long (Eds.), The handbook of
second language acquisition. Oxford: Blackwell.
Ellis, N. C. (2005). At the interface: Dynamic interactions of explicit and implicit lan-
guage knowledge. Studies in Second Language Acquisition, 27(2), 305-352.
Ellis, N. C., Simpson-Vlach, R., & Maynard, C. (2008). Formulaic language in native
and second-language speakers: Psycholinguistics, corpus linguistics, and TESOL.
TESOL Quarterly, 42(3), 375-396.
Erman, B., & Warren, B. (2000). The idiom principle and the open choice principle.
Text 20(1), 29-62.
Eyckmans, J. (2009). Towards an assessment of learners’ receptive and productive syn-
tagmatic knowledge. In A. Barfield & H. Gyllstad (Eds.), Researching collocations
in another language: Multiple interpretations (pp. 139-152). Basingstoke: Palgrave
Macmillan.
Fan, M. (2009). An exploratory study of collocational use by ESL students: A task-
based approach. System, 37(1), 110-123.
Farghal, M., & Obiedat, H. (1995). Collocations: A neglected variable in EFL.
International Review of Applied Linguistics, 33(4), 315-331.
Fayez-Hussein, R. (1990). Collocations: The missing link in vocabulary acquisition
amongst EFL learners. In J. Fisiak (Ed.), Papers and studies in contrastive linguistics:
The Polish English contrastive project (Vol. 26, pp. 123-136). Poznan: Adam
Mickiewicz University.
Fitzpatrick, T. (2012). Tracking the changes: vocabulary acquisition in the study abroad
context. The Language Learning Journal, 40(1), 81-98.
Forsberg, F., & Fant, L. (2010). Idiomatically speaking: the effects of task variation on
formulaic language in highly proficient users of L2 French and Spanish. In D.
Wood (Ed.), Perspectives on formulaic language: Acquisition and communication (pp.
47-70). London/New York: Continuum.
Gitsaki, C. (1999). Second language lexical acquisition: A study of the development of col-
locational knowledge. San Francisco: International Scholars Publications.
Granger, S. (1998). Prefabricated patterns in advanced EFL writing: Collocations and
formulae. In A. P. Cowie (Ed.), Phraseology: Theory, analysis and application (pp.
145-160). Oxford: Oxford University Press.
Granger, S. (2009). Learner corpora: A window onto the L2 phrasicon. In A. Barfield
& H. Gyllstad (Eds.), Researching collocations in another language: Multiple interpre-
tations (pp. 60-65). Basingstoke: Palgrave Macmillan.
Granger, S., & Meunier, F. (Eds.). (2008). Phraseology. An interdisciplinary perspective.
Amsterdam: Benjamins.
Granger, S., & Paquot, M. (2008). Disentangling the phraseological web. In S.
Granger & F. Meunier (Eds.), Phraseology. An interdisciplinary perspective (pp. 27-
49). Amsterdam: Benjamins.
Groom, N. (2009). Effects of second language immersion on second language collocation-
al development. In A. Barfield & H. Gyllstad (Eds.), Researching collocations in anoth-
er language: Multiple interpretations (pp. 21-33). Basingstoke: Palgrave Macmillan.
Gyllstad, H. (2007). Testing English collocations: Developing receptive tests for use with
advanced Swedish learners. Lund: Lund University.
Handl, S. (2009). Towards collocational webs for presenting collocations in learners’
dictionaries. In A. Barfield & H. Gyllstad (Eds.), Researching collocations in anoth-
er language: Multiple interpretations (pp. 69-85). Basingstoke: Palgrave Macmillan.
Hasselgren, A. (1994). Lexical teddy bears and advanced learners: a study into the ways
Norwegian students cope with English vocabulary. International Journal of Applied
Linguistics, 4(2), 237-258.
Henriksen, B., & Stenius Stæhr, L. (2009). Processes in the development of L2 colloca-
tional knowledge: A challenge for language learners, researchers and teachers. In A.
Barfield & H. Gyllstad (Eds.), Researching collocations in another language: Multiple
interpretations (pp. 224-231). Basingstoke: Palgrave Macmillan.
Herbst, T. (1996). What are collocations: sandy beeches or false teeth? English Studies
77(4), 379-393.
Hoey, M. (2005). Lexical priming: A new theory of words and language. London:
Routledge.
Hoffmann, S., & Lehmann, H. M. (2000). Collocational Evidence from the British
National Corpus. In J. M. Kirk (Ed.), Corpora Galore: Analyses and Techniques in
Describing English. Amsterdam: Rodopi.
Howarth, P. (1996). Phraseology in English academic writing: Some implications for lan-
guage learning and dictionary making. Tübingen: Narr.
Howarth, P. (1998). Phraseology and second language proficiency. Applied Linguistics,
19(1), 24-44.
Jaén, M. M. (2007). A corpus-driven design of a test for assessing the ESL collocational com-
petence of university students. International Journal of English Studies, 7(2), 127-147.
Jiang, J. (2009). Designing pedagogic materials to improve awareness and productive use of
L2 collocations. In A. Barfield & H. Gyllstad (Eds.), Researching collocations in anoth-
er language: Multiple interpretations (pp. 99-113). Basingstoke: Palgrave Macmillan.
54 Birgit Henriksen
Kaszubski, P. (2000). Selected aspects of lexicon, phraseology and style in the writing of
Polish advanced learners of English: A contrastive, corpus-based approach.
Available on-line at http://main.amu.edu.pl/przemka/research.html
Keshavarz, M. H., & Salimi, H. (2007). Collocational competence and cloze test per-
formance: a study of Iranian EFL learners. International Journal of Applied
Linguistics, 17(1), 81-92.
Koya, T. (2005). The acquisition of basic collocations by Japanese learners of English.
(Unpublished doctoral dissertation). Waseda University, Japan. Available on-line at
http://dspace.wul.waseda.ac.jp/dspace/bitstream/2065/5285/3/Honbun-4160.pdf
Larsen-Freeman, D. (1997). Chaos/complexity science and second language acquisi-
tion. Applied Linguistics, 18(2), 141-165.
Larsen-Freeman, D. (2006). The emergence of complexity, fluency, and accuracy in the
oral and written production of five Chinese learners of English. Applied Linguistics,
27(4), 590-619.
Laufer, B., & Girsai, N. (2008). Form-focused instruction in second language vocabu-
lary learning: A case for contrastive analysis and translation. Applied Linguistics,
29(4), 694-716.
Laufer, B., & Waldman, T. (2011). Verb-noun collocations in second-language writing:
A corpus analysis of learners’ English. Language Learning, 61(2), 647-672.
Leśniewska, J., & Witalisz, E. (2007). Cross-linguistic influence and acceptability judg-
ments of L2 and L1 collocations: A study of advanced Polish learners of English.
EUROSLA Yearbook 7, 27-48.
Lewis, M. (1993). The lexical approach. Hove: Language Teaching Publications.
Lewis, M. (Ed.). (2000). Teaching collocation: Further developments in the lexical
approach. Hove: Language Teaching Publications.
Li, J., & Schmitt, N. (2010). The development of collocation use in academic texts by
advanced L2 learners: a multiple case study approach. In D. Wood (Ed.),
Perspectives on formulaic language: Acquisition and communication (pp. 23-46).
London/New York: Continuum.
Lorenz, T. R. (1999). Adjective intensification – learners versus native speakers: A corpus
study of argumentative writing. Amsterdam: Rodopi.
Millar, N. (2011). The processing of malformed formulaic language. Applied Linguistics,
32(2), 129-148.
Mochizuki, M. (2002). Explorations of two aspects of vocabulary knowledge:
Paradigmatic and collocational. Annual Review of English Language Education in
Japan, 13, 121-129.
Nation, P. (2001). Learning vocabulary in another language. Cambridge: Cambridge
University Press.
Nattinger, J. R., & DeCarrico, J. S. (1992). Lexical phrases and language teaching.
Oxford: Oxford University Press.
Nesselhauf, N. (2003). The use of collocations by advanced learners of English and
some implications for teaching. Applied Linguistics, 24(2), 223-242.
Nesselhauf, N. (2005). Collocations in a learner corpus. Studies in Corpus Linguistics
(Vol. 14). Amsterdam: Benjamins.
Pawley, A., & Syder, F. (1983). Two puzzles for linguistic theory: Native-like selection
and native-like fluency. In J. Richards & R. Schmidt (Eds.), Language and commu-
nication (pp. 191-226). London: Longman.
Pei, C. (2008). Review of empirical studies on collocations in the field of SLA. Celea
Journal, 31(6), 72-81.
Peters, E. (2009). Learning collocations through attention-drawing techniques: A qual-
itative and quantitative analysis. In A. Barfield & H. Gyllstad (Eds.), Researching
collocations in another language: Multiple interpretations (pp. 194-207).
Basingstoke: Palgrave Macmillan.
Prentice, J. (2010). På rak sak: Om ordförbindelser och konventionaliserade uttryck
bland unga språkbrukare i flerspråkiga miljöer. Göteborgsstudier i nordisk
språkvetenskap 13. Göteborg: Intellecta Infolog.
Raupach, M. (1984). Formulae in second language speech production. In H. W.
Dechert, D. Möhle & M. Raupach (Eds.), Second language production (pp. 114-
137). Tübingen: Narr.
Revier, R. L. (2009). Evaluating a new test of whole English collocations. In A. Barfield
& H. Gyllstad (Eds.), Researching collocations in another language: Multiple inter-
pretations (pp. 125-138). Basingstoke: Palgrave Macmillan.
Revier, R. L., & Henriksen, B. (2006). Teaching collocations. Pedagogical implications
based on a cross-sectional study of Danish EFL. In M. Bendtsen, M. Björklund,
C. Fant & L. Forsman (Eds.), Språk, lärande och utbilding i sikte (pp. 191-206).
Vasa: Pedagogiska fakulteten, Åbo Akademi.
Schmitt, N. (Ed.). (2004). Formulaic sequences: Acquisition, processing and use. Amsterdam:
Benjamins.
Schmitt, N. (2010). Researching vocabulary. A vocabulary research manual. Basingstoke:
Palgrave Macmillan.
Schmitt, N., Grandage, S., & Adolphs, S. (2004). Are corpus-derived recurrent clusters
psycholinguistically valid? In N. Schmitt (Ed.), Formulaic sequences: Acquisition,
processing and use (pp. 127-149). Amsterdam: Benjamins.
Schmitt, N., & Underwood, G. (2004). Exploring the processing of formulaic
sequences through a self-paced reading task. In N. Schmitt (Ed.), Formulaic
sequences: Acquisition, processing and use (pp. 173-189). Amsterdam: Benjamins.
Shei, C. C. (1999). A brief review of English verb-noun collocation. Available on-line
at http://www.dai.ed.ac.uk/homes/shei/survey.html.
Sinclair, J. M. (1991). Corpus, concordance, collocation. Oxford: Oxford University Press.
Sinclair, J. M. (2004). Trust the Text: Language, corpus and discourse. London: Routledge.
Siyanova, A., & Schmitt, N. (2008). L2 learner production and processing of collocation:
A multi-study perspective. The Canadian Modern Language Review, 64(3), 429-458.
Skrzypek, A. (2009). Phonological short-term memory and L2 collocational develop-
ment in adult learners. EUROSLA Yearbook, 9, 160-184.
Stengers, H. F., Boers, F., Eyckmans, J., & Housen, A. (2010). Does chunking foster
chunk uptake? In S. De Knop, F. Boers & T. De Rycker (Eds.), Fostering language
teaching efficiency through cognitive linguistics (pp. 99-117). Berlin/New York:
Mouton de Gruyter.
Underwood, G., Schmitt, N., & Galpin, A. (2004). The eyes have it: An eye-movement
study into the processing of formulaic sequences. In N. Schmitt (Ed.), Formulaic
sequences: Acquisition, processing and use (pp. 153-172). Amsterdam: Benjamins.
Walker, C. P. (2011). A corpus-based study of the linguistic features and processes which
influence the way collocations are formed: Some implications for the learning of collo-
cations. TESOL Quarterly, 45(2), 291-312.
Wang, Y., & Shaw, P. (2008). Transfer and universality: Collocation use in advanced
Chinese and Swedish learner English. ICAME Journal, 32, 201-232.
Warren, B. (2005). A model of idiomaticity. Nordic Journal of English Studies, 4,
35-54.
Webb, S., & Kagimoto, E. (2011). Learning collocations: Do the number of collocates,
position of the node word, and synonymy affect learning? Applied Linguistics,
32(3), 259-276.
Weinert, R. (2010). Formulaicity and usage-based language: linguistic, psycholinguistic
and acquisitional manifestations. In D. Wood (Ed.), Perspectives on formulaic lan-
guage: Acquisition and communication (pp. 1-20). London/New York: Continuum.
Wolter, B., & Gyllstad, H. (2011). Collocational links in the L2 mental lexicon and the
influence of L1 intralexical knowledge. Applied Linguistics, 32(4), 430-449.
Wood, D. (2010). Perspectives on formulaic language: Acquisition and communication.
London/New York: Continuum.
Wray, A. (2002). Formulaic language and the lexicon. Cambridge: Cambridge University
Press.
Yamashita, J., & Jiang, N. (2010). L1 influence on the acquisition of L2 collocations:
Japanese ESL users and EFL learners acquiring English collocations. TESOL
Quarterly, 44(4), 647-668.
Ying, Y., & O’Neill, M. (2009). Collocation learning through an ‘AWARE’ approach:
Learner perspectives and learning process. In A. Barfield & H. Gyllstad (Eds.),
Researching collocations in another language: Multiple interpretations (pp. 181-193).
Basingstoke: Palgrave Macmillan.
Measuring the contribution of vocabulary
knowledge to proficiency in the four skills
James Milton
Swansea University
This chapter examines the way vocabulary knowledge relates to the ability to
perform communicatively in a foreign language and in particular the ability to
perform in the four language skills of reading, writing, listening and speaking. It
reviews recent research designed to investigate the way vocabulary knowledge
and performance inter-relate. There is a tradition of research which demon-
strates that measures of vocabulary knowledge are particularly good predictors of
performance in the four skills, and recent research suggests that when measures
of different dimensions of vocabulary knowledge are combined this predictive-
ness can be enhanced. Large vocabularies, and speed and depth of vocabulary
knowledge, appear indispensable to the development of good performance in
any language skill and it is now possible to enumerate the scale of vocabulary
that is needed for the CEFR levels of communicative performance.
as Schmitt (2008) notes, the insights gained have failed to make their way into
the mainstream literature on language pedagogy. An example of the prevailing
attitude to vocabulary in pedagogy can be seen in the comment by Harris and
Snow that “few words are retained from those which are ‘learned’ or ‘taught’ by
direct instruction ... [and learners] extend their vocabulary through sub-con-
scious acquisition” (Harris & Snow, 2004, pp. 55-61). With this attitude, the
explicit teaching of vocabulary, and the systematic organisation of vocabulary in
the curriculum, are not priorities.
In academic circles, the place of vocabulary in language learning has been
significantly revised over the last decade and current academic thinking is very
much at odds with much classroom and textbook practice. Far from being an
element which is merely incidental to language learning, current thinking holds
that vocabulary may be crucial to the development of language performance
overall. In a recent version of generative grammar, the Minimalist Program
(Chomsky, 1995), the differences between languages are seen to be mainly lex-
ical in nature and this leads Cook (1998) to suggest that the Minimalist
Program is lexically-driven. The properties of the lexical items shape the sen-
tence rather than lexical items being slotted into pre-existent structures. The
task the language learner faces, therefore, is principally one of learning the
vocabulary of the foreign language. The acquisition of vocabulary items in suf-
ficient quantity triggers the setting of universal grammatical parameters. This
approach is reflected in the Lexical Learning Hypothesis (Ellis, 1997) according
to which vocabulary knowledge is indispensable to the acquisition of grammar.
One of the outcomes of the recent academic interest in vocabulary has been
the development of ways for describing and testing vocabulary knowledge,
which are both principled and systematic. Recently developed methods allow
normalised data to be produced so the growth of a foreign language lexicon over
the course of learning can be modelled. With this information it becomes pos-
sible to measure the contribution of vocabulary knowledge to language devel-
opment and confirm whether the close relationship between vocabulary growth
and language level exists in practice.
tive word knowledge. Some words, it seems, exist in the minds of language
speakers primed for use and can be called to mind in speech or in writing easi-
ly and quickly. Other words are not used in this way but can, nonetheless, be
called to mind for comprehension if they occur in the speech or writing of oth-
ers. Each of these facets of knowledge can contribute to language performance
in its own different way. A language user with extensive knowledge of words in
their phonological form but no knowledge of the written form of words, for
example, has the potential at least to speak and understand speech but no capac-
ity for reading or writing. There is no definitive list of what comprises word
knowledge and even native speakers will not know every facet of every word in
their lexicon. In measuring vocabulary knowledge in order to assess how it
impacts on overall language performance, therefore, decisions have to be made
as to exactly what it is that is being measured.
The nearest thing we have to a definitive list of what it means to know a
word is Nation’s (2001) table, shown in Table 1. This table usefully encapsulates
knowledge of the various forms of a word, the various aspects of meaning a
word can carry with it, and the elements of use which are also part of word
knowledge. Knowledge of form includes not just knowledge of the written and
sound forms of a word but also knowledge of affixation, knowledge of the way
extra parts can be added, or the ways in which a word can change, to reflect
changes in its grammatical function or to add to its meaning. Knowledge of
meaning includes not just knowledge of a core meaning, perhaps a link with a
direct foreign language counterpart, but also the concepts, referents and associ-
ations, which a word may carry with it. Words in different languages often carry
differences in nuances of meaning, which, if a learner is to perform fluently,
may need to be known. And knowledge of use includes knowledge of the gram-
mar of a word but also the way words like to behave in relation to each other.
Some words like to occur in combination with other words, in particular idioms
for example, and some words, like swear words, may be restricted in the occa-
sions where they can be used appropriately, and this knowledge will also be
needed if the language is to be used fluently and skilfully. Each facet of knowl-
edge is sub-divided into receptive and productive knowledge.
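The shape of Nation's framework, as described above, can be sketched as a small data structure. The encoding below is purely illustrative: the aspect labels paraphrase the description in the text rather than reproduce Nation's table verbatim, but the three-way division and the receptive/productive split are as described:

```python
# An illustrative encoding of Nation's (2001) word-knowledge framework.
# Aspect names paraphrase the description in the text; each facet is
# assessed both receptively and productively.
WORD_KNOWLEDGE = {
    "form": ["spoken form", "written form", "word parts"],
    "meaning": ["form-meaning link", "concepts and referents", "associations"],
    "use": ["grammatical functions", "collocations", "constraints on use"],
}

def facets(framework):
    """Expand every aspect into its receptive and productive facets."""
    return [
        (category, aspect, mode)
        for category, aspects in framework.items()
        for aspect in aspects
        for mode in ("receptive", "productive")
    ]

# 3 categories x 3 aspects x 2 modes = 18 distinct facets of knowing a word.
print(len(facets(WORD_KNOWLEDGE)))  # 18
```

Three categories, each with three aspects, each assessed receptively and productively, already yield eighteen distinct facets of knowledge for a single word, which makes concrete why no single test can hope to capture word knowledge in all its diversity.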
This is a very useful and insightful list, and makes apparent just how much
is involved in fully knowing a word. It is also clear that designing a test that can
capture knowledge in all this diversity is scarcely practical. A single test could
not possibly hope to encompass every aspect of knowledge described in this
table. There is a further difficulty inherent in this table in that the various forms
of knowledge are characterised but not precisely defined. In assessing knowledge
of word parts, for example, it is unclear at what point the additions and changes
to a word will form a new word rather than a derived form of an existing one.
Nor is it clear, for example, how frequently a word must co-occur with another for the pairing to count as a collocation.
Table 1. Description of “what is involved in knowing a word”, from Nation (2001: 27).
eign language are linked (e.g. Alderson, 1984; Laufer, 1992; Laufer & Nation,
1999; Qian, 1999; Zimmerman, 2004) and it is the nature and extent of this
link that this chapter intends to make more clear.
The goal for any foreign language learner is to use the language in some way.
This may be for speech and casual conversation, or for translation of texts, or
for study through the medium of the foreign language. It has become a com-
monplace in the assessment of language to consider language in terms of four
separate skills: the receptive skills of reading and listening, and the productive
skills of speaking and writing. In reality, of course, these distinctions are not
so clear and the ability to read and listen fluently requires the learner to active-
ly anticipate the language that is likely to occur and then monitor input to
check that the possibilities which have been created are occurring.
Nonetheless, the distinction is enshrined in formal assessment schemes.
The University of Cambridge Local Examinations Syndicate (UCLES) exams,
such as the International English Language Testing System (IELTS) test,
comprise separate papers for each of these skills, with separate grading
schedules for them. The Council of Europe’s (2001) Common European
Framework of Reference for Languages (CEFR) hierarchy uses both global
descriptors of language performance as a whole (p. 24), and descriptors sepa-
rated into the four skills (pp. 26-27). These descriptors are couched in terms
of language performance rather than in terms of the language knowledge that
is likely to underlie performance. The example below of the CEFR's
global descriptor for performance at C2 level illustrates this (Council of
Europe, 2001, p. 24).
Can understand with ease virtually everything heard or read. Can summarise
information from different spoken and written sources, reconstructing argu-
ments and accounts in a coherent presentation. Can express him/herself
spontaneously, very fluently and precisely, differentiating finer shades of
meaning even in more complex situations.
trol appears to be closer to vocabulary depth in that it refers to the accuracy and
appropriateness of vocabulary selection and use. Table 2 presents the descriptors
for vocabulary range.
Level   Descriptor
C2 Has a good command of a very broad lexical repertoire including idiomatic
expressions and colloquialisms; shows awareness of connotative levels of
meaning.
C1 Has a good command of a broad lexical repertoire allowing gaps to be readily
overcome with circumlocutions; little obvious searching for expressions or
avoidance strategies. Good command of idiomatic expressions and
colloquialisms.
B2 Has a good range of vocabulary for matters connected to his/her field and
most general topics. Can vary formulation to avoid frequent repetition, but
lexical gaps can still cause hesitation and circumlocution.
B1 Has a sufficient vocabulary to express him/herself with some circumlocutions
on most topics pertinent to his/her everyday life such as family, hobbies and
interests, work, travel, and current events. Has sufficient vocabulary to conduct
routine, everyday transactions involving familiar situations and topics.
A2 Has a sufficient vocabulary for the expression of basic communicative needs.
Has a sufficient vocabulary for coping with simple survival needs.
A1 Has a basic vocabulary repertoire of isolated words and phrases related to
particular concrete situations.
fore, also suggests that it might be possible and useful for vocabulary size and
depth measurements to be attached to the different levels.
There is some empirical evidence that links vocabulary breadth measures
with the CEFR language levels. Milton (2010), shown in Table 3, provides EFL
vocabulary sizes (out of the most frequent 5,000 lemmatised words in English)
gained from over 10,000 learners in Greece taking both recognition tests of
their vocabulary size and also formal UCLES exams at levels within the CEFR
framework.
Table 3. Vocabulary size estimates, CEFR levels and formal exams (Milton, 2010, p. 224)
While there is some individual variation around these ranges, Milton is able to
conclude that “the assumption made in the CEFR literature, that as learners
progress through the CEFR levels their foreign language lexicons will increase
in size and complexity, is broadly true” (2010, p. 224). Variation may be
explained by the way vocabulary knowledge and language performance are
imperfectly linked. Learners with the same or similar vocabulary sizes – and
remember these are based on knowledge of the 5,000 most frequent lemmatised
words in English and so are not absolute vocabulary size estimates – may make
different use of this knowledge to communicate more or less successfully.
Milton and Alexiou (2009) report similar vocabulary size measurements for
CEFR levels in French and Greek as foreign languages.
If vocabulary breadth predicts overall language performance well, then it
might be expected that vocabulary breadth will link well also with the four sep-
arate skills. However, there are reasons for thinking that the oral skills, speaking
and listening, will have a different relationship with vocabulary knowledge from
the written skills, writing and reading. Figures for coverage (the proportion of a
corpus provided by words in the corpus arranged in frequency order) in spoken
and written corpora suggest that written text is typically lexically more sophis-
ticated than spoken text. A comparison (Figure 2) of coverage taken from writ-
ten and spoken sub-corpora of the 100 million word British National Corpus
illustrates this (Milton, 2009, p. 58).
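The coverage measure defined above, the cumulative proportion of a corpus accounted for by its words taken in descending frequency order, can be sketched in a few lines of Python. This is a toy illustration with invented counts, not BNC data:

```python
def coverage(counts):
    """Cumulative proportion of corpus tokens covered after each word,
    given token counts sorted in descending frequency order."""
    total = sum(counts)
    covered, out = 0, []
    for c in counts:
        covered += c
        out.append(covered / total)
    return out

def words_needed(counts, threshold):
    """How many top-frequency words are needed to reach a coverage threshold."""
    for i, c in enumerate(coverage(counts), start=1):
        if c >= threshold:
            return i
    return None

# Hypothetical token counts for a tiny seven-word 'corpus'
counts = [500, 300, 100, 60, 25, 10, 5]
print(words_needed(counts, 0.95))  # 4: the top four words cover 96% of tokens
```

Run on real frequency lists, the same calculation reproduces the pattern in Figure 2: a spoken corpus reaches a given coverage threshold with far fewer words than a written one.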
It has been acknowledged for some time that vocabulary knowledge is a good
predictor of general proficiency in a foreign language. However, most research
on the relationship has been conducted with measures of vocabulary size only,
and within the realm of reading skill only (Stæhr, 2008). Generally, such stud-
ies have found strong correlations between receptive vocabulary size tests and
reading comprehension tests, ranging from 0.50 to 0.85, with learners from dif-
ferent proficiency levels (e.g. Laufer, 1992; Qian, 1999; Albrechtsen, Haastrup
& Henriksen, 2008).
A feature of recent work in vocabulary studies has been to try to investigate
more fully the links between lexical knowledge and learner performance, and
investigate the scale of the contribution which vocabulary, in all three of its
dimensions, can make to a variety of communicative skills in foreign language
performance. By extension, such research also tests the credibility of theories
such as the Lexical Learning Hypothesis (Ellis, 1997), and contributes firmer
evidence to the place that vocabulary should have in the structure of the foreign
language learning curriculum, since in this view of learning it is vocabulary
knowledge which drives learning in other aspects of language. However, the
considerations above have suggested that the relationship between vocabulary
knowledge and overall language skill may potentially be difficult to model and
to measure. Different dimensions of vocabulary knowledge might need to be
measured separately and their effects combined if the full nature of the relation-
ship with language skill is to be seen. Further, it might be that the relationship
will vary according to the level of the learner and the skills the learner needs.
The following sections will examine particular pieces of research in this area,
which illustrate the state of our knowledge and from which broader conclusions
can be drawn.
(Coxhead, 2000). However, the academic word level was excluded from Stæhr’s
study as not relevant for low-level learners. The test assesses learners’ receptive
knowledge of word meaning at the 2,000, 3,000, 5,000 and 10,000 word levels,
and the test results can thus give an indication of whether learners master the
first 2,000, 3,000, 5,000 or 10,000 word families in English. Although the VLT
was originally designed as a diagnostic test intended for pedagogical purposes,
researchers (e.g. Read, 2000; Schmitt et al., 2001) acknowledge its use as a means
of giving a good guide to overall vocabulary size. Tests of language skills were
assessed as part of the national school leaving examination. Reading and listening
skills were measured using pencil-and-paper multiple-choice tests. Writing
ability was measured using the scores awarded for an essay task where the partic-
ipants had to write a letter to a job agency applying for a job.
Stæhr’s results indicate a correlation between vocabulary size and reading,
which is comparable with the findings of other research mentioned above and
suggests a strong and statistically significant relationship between the amount of
vocabulary a learner knows in the foreign language and their ability to handle
questions on a text designed to test their ability to fully comprehend the text.
His analysis, using binary logistic regression, shows that as much as 72% of the
variance in the ability to obtain an average score or above in the reading test is
explained by vocabulary size (Nagelkerke R² = 0.722). The results also illuminate
the relationship with other language skills. The correlation between vocab-
ulary size and both writing and listening ability is also statistically significant
and reasonably strong. Stæhr suggests that 52% of the variance in the ability to
obtain an average or above-average writing score is accounted for by vocabulary
size (Nagelkerke R² = 0.524), and that 39% of the variance in the listening
scores, in terms of the ability to score above the mean, is accounted for by the
variance in the vocabulary scores (Nagelkerke R² = 0.388). He interprets this
amount of variance as substantial. Even the contribution
towards listening, the smallest in this study, is considerable, given the fact that
it is explained by one single factor. This confirms the importance of receptive
vocabulary size for learners in all three skills investigated.
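The Nagelkerke R² figures quoted here rescale the Cox & Snell pseudo-R² (which cannot reach 1 for binary outcomes) so that its maximum is 1. A minimal Python sketch of both formulas, computed from the null and fitted model log-likelihoods; the values below are hypothetical, not Stæhr's data:

```python
import math

def cox_snell_r2(ll_null, ll_model, n):
    """Cox & Snell pseudo-R2: 1 - (L_null / L_model)**(2/n),
    computed from log-likelihoods for numerical stability."""
    return 1 - math.exp(2 * (ll_null - ll_model) / n)

def nagelkerke_r2(ll_null, ll_model, n):
    """Nagelkerke R2: Cox & Snell divided by its maximum attainable value."""
    max_r2 = 1 - math.exp(2 * ll_null / n)
    return cox_snell_r2(ll_null, ll_model, n) / max_r2

# Hypothetical log-likelihoods for a fitted binary logistic regression
print(round(nagelkerke_r2(ll_null=-65.0, ll_model=-25.0, n=100), 3))  # 0.757
```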
Stæhr’s findings further indicate the importance of knowing the most fre-
quent 2,000 word families in English in particular and he suggests that knowl-
edge of this vocabulary represents an important threshold for the learners of his
study. Knowledge of this vocabulary is likely to lead to a performance above
average in the listening, reading and writing tests of the national school leaving
exam. The results seem to emphasize that the 2,000 vocabulary level is a crucial
learning goal for low-level EFL learners and suggest that the single dimension
of vocabulary size is a crucial determiner of the ability to perform in the three
foreign language skills tested. The more vocabulary learners know, the better
they are likely to perform through the medium of the foreign language.
Measuring the contribution of vocabulary knowledge to proficiency in the four skills 69
two vocabulary size tests is moderate to poor at 0.41, even if the relationship is
still statistically significant. Interestingly, it appears that elementary level learners
have knowledge predominantly in aural form, while the more advanced learners
tend increasingly to grow lexicons where words appear to be known through
written form only (see also Milton & Hopkins, 2006; Milton & Riordan, 2006).
It seems that vocabulary size can predict oral skills comparably with written skills
provided that vocabulary size is measured appropriately. The correlation between
A-Lex and speaking scores (0.71) is very similar to the correlations observed
between X-Lex and reading and writing scores (0.70 and 0.76).
Regression analysis suggests that vocabulary size can explain broadly simi-
lar amounts of variance in all the four skills. If the relationship is assumed to be
linear, and one should bear in mind that for oral skills in particular this need
not be the case, then between 40% and 60% of variance in sub-skills scores
can be explained through the single variable of vocabulary size. Variance in the
listening sub-test, which involves both reading questions and listening for
answers, is best explained through a combination of the written and aural sub-
scores. Analysis using binary logistic regression, used because the relationship
may not be linear, produces comparable results explaining between 41% and
62% of variance in the ability to score grade 5 or above on the IELTS sub-tests.
The fact that binary logistic regression explains more variance in the speaking
scores (Nagelkerke R² = 0.61, Cox & Snell R² = 0.45) than the linear
regression (Adjusted R² = 0.40) is tentatively suggested by Milton et al. as evidence
that the relationship between vocabulary size and performance in tests of speak-
ing skill is non-linear, although differences in the way these scores are calculat-
ed make this a highly subjective interpretation.
The significance of these results is to confirm the importance of the vocab-
ulary size dimension in all aspects of foreign language performance. Vocabulary
size, calculated appropriately, appears consistently to explain about 50% of vari-
ance in the scores awarded to learners for their performance in the sub-skills of
language, including speaking skills where hitherto the relationship has been
assumed to be less strong. The fact that, as in explaining listening sub-scores,
measurements for different aspects of vocabulary knowledge can be aggregated
to enhance the explanatory power of vocabulary in the four skills suggests that
continuing to investigate the various dimensions of vocabulary knowledge may
yield useful insights.
vocabulary sizes can be linked to language levels such as those presented in the CEFR
and that vocabulary size can be used as a reliable placement measure. The expec-
tation that oral skills would not be so closely linked to vocabulary size has not
emerged in these studies possibly because the measures of skill used relate to
measures such as IELTS scores, which are rather academic and might favour a
more linear relationship than would be the case if the skills were measured in a
non-academic context. Unusually in the spoken register, the skills rewarded in
the IELTS speaking sub-test may benefit from the more extensive use of infre-
quent vocabulary. This conclusion has emerged despite the clear evidence that
in successful language performers words are held predominantly in the written
form and have presumably been learned by reading rather than through oral
interaction.
Stæhr (2008) has remarked that the explanatory power of vocabulary size
in explaining variance in scores on language skills suggests that vocabulary size
may be the determinant factor, pre-eminent among the other factors which may
be at work in performing in and through a foreign language. Schoonen’s find-
ings, however, suggest that this may be an exaggeration, since size and other fac-
tors appear so closely linked and the importance of other variables exceeds that
of vocabulary in his study. Nonetheless, vocabulary knowledge, and vocabulary
size in particular, is clearly a major contributor to success in language per-
formance. It has emerged that knowledge of the most frequent 2,000 words, in
particular, is an important feature in successful communication through a for-
eign language. There is a caveat here, in that the findings suggest that in oral
skills the importance of vocabulary knowledge diminishes with increasing size
rather faster than it does in skills that involve the written word. The reason for
this is worth consideration and the best explanation available is that this is con-
nected with coverage and differences in the way we handle written and spoken
language. Corpora suggest that, in English language for example, the most fre-
quent words in a language are even more frequent in spoken language than in
written language. Adolphs and Schmitt’s (2003) analysis of spoken data in
CANCODE indicates that important coverage thresholds such as the 95% cov-
erage figure for general comprehension might be reached with between 2,000
and 3,000 words; perhaps half the figure needed to reach the same threshold in
written discourse.
The studies by Stæhr (2008), Milton et al. (2010) and Schoonen (2010)
discussed above suggest that, because the dimensions of vocabulary knowledge
are so closely linked, a single measure of vocabulary knowledge is likely, by itself,
to be a good indicator of skill and level in a foreign language. Because vocabu-
lary breadth in English is now easily measurable using reliable tests for which
we have normalised scores, perhaps it is not surprising if vocabulary size or
breadth has become particularly closely associated with performance in the four
skills. It seems from the studies above, however, that other dimensions also con-
tribute to performance, perhaps as much as size, and that a combination of
scores for size and depth, or size and speed, for example, can add up to 10% to
the explanatory power of vocabulary knowledge in skills performance. Very
crudely, the more sophisticated the measures of vocabulary knowledge, the
more they are likely to explain variance in performance in the four skills, up to
the level of around 50%. Beyond that point other factors will be needed to
improve the explanatory power of any model. These could be knowledge fac-
tors, such as grammatical knowledge, or skill factors in the ability that users
have in applying their knowledge when listening, reading, speaking or writing.
This is clearly an avenue for further research.
The studies discussed above also allow us to reconsider the concept of lex-
ical space explained at the outset of the chapter: the idea that learners can be
characterised differently according to the type of knowledge they have of the
words they know in their foreign language, and this can explain how they vary
in performance. One interpretation of why the depth and size dimensions cor-
relate so well is that they are essentially the same dimension, at least until
learners become very knowledgeable and competent and sufficient words are
known for subtlety in choice or combination to become possible (see
Gyllstad, this volume). The convenient rectangular shape in Figure 1 is trans-
formed into something much narrower at the outset of learning where lexical
size is paramount, and becomes wider at the most competent levels where
increased depth becomes a possibility and a potential asset. Collinearity is
noted by Schoonen, who suggests another possibility (Schoonen, personal
correspondence): that there will be an ‘equal’ development in all three dimen-
sions, and all three will be strongly correlated, but that this is probably a spurious
correlation due to language exposure as a common cause. Theoretically, it
remains possible to have uneven profiles, including differences in breadth and
depth, but to evaluate this experimental studies would be required where one
dimension only is trained, for example speed, as in Snellings, Van Gelderen &
De Glopper (2004).
4.5. Vocabulary knowledge, theories of language learning, and implications for pedagogy
At the outset of this chapter I suggested that there was a contradiction
between much pedagogical theory and practice and recent SLA theories, as
regards the importance and relevance of vocabulary knowledge to the process of
acquiring proficiency in a foreign language. Current methods and approaches
to language teaching fail to consider how vocabulary should be systematically
built into the curriculum or suggest that this would not be appropriate assum-
ing that the acquisition of vocabulary is merely incidental to the process of lan-
phonological coding. Learners without this high literacy and who are tied to
phonological decoding may develop more balanced lexicons with orthographic
and phonological word knowledge more equal in size as suggested in Milton
and Hopkins (2006) and Milton and Riordan (2006). However, the price to be
paid for this, perhaps through the slowness of the reading process and the extra
burden on memory, is that the lexicon tends to grow more slowly, limiting com-
municativeness in the written domain.
The research summarised above appears to support theories such as Ellis’s
Lexical Learning Hypothesis. Vocabulary development, however measured,
appears to mesh very closely with other features of language such as grammat-
ical development, and also with overall language ability. Developing learners’
vocabulary knowledge appears to be an integral feature of developing their lan-
guage performance generally. The link has not been established in a strongly
causal sense and while it is not yet clear that the vocabulary knowledge is driv-
ing the other aspects of language development, vocabulary certainly appears to
develop in size and depth alongside every other aspect of language. This very
strongly supports the idea, as in the lexical approach (Lewis & Hill, 1997), that
vocabulary should be built more explicitly into the development of any good
language curriculum. This could be in the form of indicating particular words
to be learned, as in the most frequent words in any language, but it might
imply the introduction of size as a metric into curricula as a means of setting
appropriate targets and monitoring progress without dictating the content of
learning directly.
Even though this may seem quite commonsensical, we have evidence from
the UK that details of vocabulary can be systematically downplayed in formal
curricula in line with methodological approaches such as the
Communicative Approach. Curriculum descriptions for B1 level foreign lan-
guage exams in the UK (e.g. Edexcel, 2003, for French) routinely contain only min-
imal core vocabularies of around 1,000 words, levels of vocabulary which are
incompatible with performance attainment at B1 level observed elsewhere in
Europe (Milton & Alexiou, 2009). We also have evidence that the teaching of
foreign language vocabulary following these curricula rarely extends beyond
1,000 words at B1 level (Milton, 2006; 2008; David, 2008). In other countries
(as indicated in Milton & Alexiou, 2009) CEFR levels have an expectation of
rather greater vocabulary knowledge than in the UK and since it is highly
unlikely that learners can be as communicative with 1,000 words at B1 level as
with the 2,000 or more words required for this level elsewhere in Europe, there
is a clear mismatch in the application of the CEFR levels, which vocabulary size
estimates can demonstrate.
References
Harris, V. & Snow, D. (2004). Classic Pathfinder: Doing it for themselves: focus on learn-
ing strategies and vocabulary building. London: CILT.
Henriksen, B. (1999). Three dimensions of vocabulary development. Studies in Second
Language Acquisition, 21(2), 303-317.
Laufer, B. (1992). How much lexis is necessary for reading comprehension? In P. J. L.
Arnaud & H. Béjoint (Eds.), Vocabulary and applied linguistics (pp. 126-132).
London: Macmillan.
Laufer, B. & Nation, P. (1999). A productive-size test of controlled productive ability.
Language Testing, 16(1), 33-51.
Lewis, M. & Hill, J. (1997). The Lexical Approach: the state of ELT and the way forward.
Boston, Mass.: Heinle.
Lightbown, P. & Spada, N. (2006). How Languages are Learned (3rd Ed). Oxford:
Oxford University Press.
Littlewood, W. (1983). Communicative Language Teaching. Cambridge: Cambridge
University Press.
Meara, P. (1996). The dimensions of lexical competence. In G. Brown, K. Malmkjaer,
& J. Williams (Eds.), Performance and competence in second language acquisition
(pp. 35-53). Cambridge: Cambridge University Press.
Meara, P. & Milton, J. (2003). X_Lex, The Swansea Levels Test. Newbury: Express.
Meara, P. & Wolter, B. (2004). V_Links, beyond vocabulary depth. Angles on the English
Speaking World, 4, 85-96.
Milton, J. (2006). Language Lite: Learning French vocabulary in school. Journal of
French Language Studies 16(2), 187-205.
Milton, J. (2008). French vocabulary breadth among learners in the British school and
university system: comparing knowledge over time. Journal of French Language
Studies, 18(3), 333-348.
Milton, J. (2009). Measuring Second Language Vocabulary Acquisition. Bristol:
Multilingual Matters.
Milton, J. (2010). The development of vocabulary breadth across the CEFR levels. In
I. Vedder, I. Bartning, & M. Martin (Eds.), Communicative proficiency and linguis-
tic development: intersections between SLA and language testing research (pp. 211-
232). Second Language Acquisition and Testing in Europe Monograph Series 1.
Milton, J. & Hopkins, N. (2006). Comparing phonological and orthographic vocabu-
lary size: do vocabulary tests underestimate the knowledge of some learners? The
Canadian Modern Language Review, 63(1), 127-147.
Milton, J. & Riordan, O. (2006). Level and script effects in the phonological and ortho-
graphic vocabulary size of Arabic and Farsi speakers. In P. Davidson, C. Coombe,
D. Lloyd, & D. Palfreyman (Eds.), Teaching and Learning Vocabulary in Another
Language (pp. 122-133). UAE: TESOL Arabia.
Milton, J. & Alexiou, T. (2009). Vocabulary size and the Common European
Framework of Reference for Languages. In B. Richards, H.M. Daller, D.D.
Malvern, P. Meara, J. Milton, & J. Treffers-Daller (Eds.), Vocabulary Studies in First
and Second Language Acquisition (pp. 194-211). Basingstoke: Palgrave Macmillan.
Milton J., Wade, J. & Hopkins, N. (2010). Aural word recognition and oral compe-
tence in a foreign language. In R. Chacón-Beltrán, C. Abello-Contesse, & M.
Torreblanca-López (Eds.), Further insights into non-native vocabulary teaching and
learning (pp. 83-98). Bristol: Multilingual Matters.
Mitchell, R. & Myles, F. (2004). Second Language Learning Theories. London: Hodder
Arnold.
Nation, I. S. P. (2001). Learning Vocabulary in Another Language. Cambridge:
Cambridge University Press.
O’Dell, F. (1997). Incorporating vocabulary into the syllabus. In N. Schmitt & M.
McCarthy (Eds.), Vocabulary: description, acquisition and pedagogy (pp. 258-278).
Cambridge: Cambridge University Press.
Qian, D. D. (1999). Assessing the roles of depth and breadth of vocabulary knowledge in
reading comprehension. The Canadian Modern Language Review, 56(2), 282-307.
Read, J. (2000). Assessing Vocabulary. Cambridge: Cambridge University Press.
Schmitt, N. (2008). Review article: instructed second language vocabulary learning.
Language Teaching Research 12(3), 329-363.
Schmitt, N., Schmitt, D., & Clapham, C. (2001). Developing and exploring the behaviour
of two new versions of the Vocabulary Levels Test. Language Testing, 18(1), 55-88.
Schoonen, R. (2010). The development of lexical proficiency knowledge and skill.
Paper presented at the Copenhagen Symposium on Approaches to the Lexicon,
Copenhagen Business School, 8-10 December 2010. Accessed at
https://conference.cbs.dk/index.php/lexicon/lexicon/schedConf/presentations on 03.03.2011.
Schoonen, R., Van Gelderen, A., Stoel, R., Hulstijn, J., & De Glopper, K. (2011).
Modeling the development of L1 and EFL writing proficiency of secondary-school
students. Language Learning, 61(1), 31-79.
Segalowitz, N. & Hulstijn, J. (2005). Automaticity in bilingualism and second language
learning. In J. F. Kroll & A. M. B. De Groot (Eds.), Handbook of Bilingualism:
Psycholinguistic Approaches (pp. 371-388). Oxford: Oxford University Press.
Snellings, P., Van Gelderen, A., & De Glopper, K. (2004). The effect of enhanced lex-
ical retrieval on L2 writing. Applied Psycholinguistics, 25(2), 175-200.
Stæhr, L. S. (2008). Vocabulary size and the skills of listening, reading and writing.
Language Learning Journal, 36(2), 139-152.
Suárez, A. & Meara, P. (1989). The effects of irregular orthography on the processing
of words in a foreign language. Reading in a Foreign Language, 6(1), 349-356.
Vermeer, A. (2001). Breadth and depth of vocabulary in relation to L1/L2 acquisition
and frequency of input. Applied Psycholinguistics 22(2), 217-234.
Wesche, M. & Paribakht, T. A. (1996). Assessing second language vocabulary knowl-
edge: depth versus breadth. The Canadian Modern Language Review, 53(1), 13-40.
Wilkins, D. A. (1972). Linguistics in Language Teaching. London: Arnold.
Wolter, B. (2005). V_Links: A New Approach to Assessing Depth of Word Knowledge. PhD
Dissertation, University of Wales Swansea.
Zimmerman, K. J. (2004). The role of Vocabulary Size in Assessing Second Language
Proficiency. MA dissertation, Brigham Young University.
FREQUENCY 2.0: Incorporating homoforms
and multiword units in pedagogical frequency lists
Thomas Cobb
Université du Québec à Montréal
1. Introduction
Applying corpus insights to language learning is slow work, with roughly one or
two interesting advances per decade. In terms of lexis and frequency, Tim Johns'
corpus and concordance package MicroConcord became available in 1986,
enabling language practitioners to build concordances and calculate word fre-
quencies in their own texts and compare these to more general word frequen-
cies in the small corpora bundled with the program. In the 1990s, Heatley and
Nation's (1994) Vocabprofile, a computational deployment of West's (1953)
General Service List (GSL) integrated with a series of academic lists, allowed
West’s hand-made General Service List (1953) of 2,000 high-value lexical items
for English teaching made careful distinctions not only between homoforms,
which are clearly different words (money banks and river banks), but also between
main senses of words (cloud banks and river banks). The limitations of this list are
that it is small (2,000 word families), intuitive (with only rudimentary frequen-
cy information), narrowly pedagogical (no vulgarities allowed), and largely inap-
plicable to text creation or modification except through handwork with small
texts. These shortcomings have now been more than compensated for by lists
based not only on huge corpora like the BNC, but also by the systematic inclu-
sion of range (the distribution of items across the BNC’s 100 subdivisions) as a
second consideration in their construction. And yet it is ironic that in the newer
lists, the old distinctions have temporarily been lost between both word senses
and homoforms. Distinguishing word senses may not be crucial to such an enter-
prise, if, as Beretta, Fiorentino and Poeppel (2005) argue, these are normally
computed in real time from a single entry in the mental lexicon. Nation (e.g.,
2001) has long argued for a pedagogy focusing on the “monosemic” concept
underlying the polysemes. Nonetheless, homoforms do pose a problem.
The BNC frequency list produced by Leech et al. (2001), while lemma-
tized for part of speech, does not distinguish between different words that are
merely linked by a common word form. A trip to the Web version of the BNC
(at http://bncweb.lancs.ac.uk/) reveals that the program is able to output lem-
mas (related morphologies of the same word form) but not distinguish homo-
forms. Nor does the newer list by Davies and Gardner (2010) drawing on the
even larger Corpus of Contemporary American English (COCA, 425 million
words, see Figure 1).
The combined meanings of bank shown in Fig. 1 place the word-form at
rank 701 in the frequency list, hence in the first 1,000 words by frequency. But
this placement is almost certainly an artifact of lumping the two banks togeth-
er, as shown by the collocates account, loan, and river in line 3. Bank1 and bank2
are clearly distinct words linked mainly by a resemblance of form (and possibly
a common etymology that few language users would be aware of). The reason
for failure to distinguish between the two banks is, of course, clear. The amount
of textual information that is summarized in a small compilation like Figure 1
is vast (the figure 52,366 at the bottom refers to the number of instances of
bank in the COCA corpus), such that there is no easy way to insert human
judgment into the process. A human investigation of the context for each of
these entries, followed by a count-up, is presumably the only way to tell the dif-
ferent banks apart, and this is an arduous task.
However, with some quick and dirty human-computer cooperation based
on random sampling, this prising apart can be done for many practical purpos-
es. For example, here is a mini-experiment for the word-form bank based on the
50 random non-lemmatized samples offered for free by the BNC website at
http://www.natcorp.ox.ac.uk/. Entering a search for bank reveals that the BNC
contains 17,603 lemmatized instances of this item (all noun forms combined).
Then, eyeballing and counting up the separate meanings from the available 50
random concordance lines over 10 runs, we find a remarkably consistent 43 to
50 lines of money or blood bank and only 5 to 7 of river or cloud bank. Thus
a rough 86% to 96% of the 17,603 uses pertain to money bank, or minimally
15,138 occurrences, so it is probably safe in its first-1,000 position (see Figure
1 for BNC cut-offs). But river bank is instead a medium frequency item (7 uses
in 50, or 14% of the BNC’s 17,603 total occurrences amounts to 2,465 occur-
rences, placing it near the end of the third 1,000 by frequency).
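The scaling arithmetic behind this estimate can be written out explicitly. The sketch below reproduces the back-of-envelope calculation with the figures given in the text; the small differences from the quoted totals come only from rounding:

```python
def estimate_occurrences(total, hits_in_sample, sample_size):
    """Scale a meaning's share of a random concordance sample up to the
    total lemmatized frequency of the word form."""
    return round(total * hits_in_sample / sample_size)

TOTAL = 17603  # lemmatized instances of 'bank' in the BNC, as reported above

money_bank = estimate_occurrences(TOTAL, 43, 50)  # lower bound, 86% of lines
river_bank = estimate_occurrences(TOTAL, 7, 50)   # upper bound, 14% of lines
print(money_bank, river_bank)  # 15139 2464 (the text gives 15,138 and 2,465)
```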
The recent large-corpus based lists also fail to distinguish between multiword
units (MWUs) that are compositional, like a+lot (to build a house on), and ones
that are non-compositional, like a_lot (of money), in the sense that the individual
words do not add up to the accepted meaning of the unit (as suggested in the notation of an
underscore rather than a plus sign). But once again the corpora make it possible
to do so. Passing large corpora through computer programs identifies a wealth of
information about all the ways that words co-occur in more than random
sequences and the extent to which they do so (Sinclair, 1991). In Figure 1, we see
84 Thomas Cobb
COCA’s main collocates of bank, with bullet signs indicating whether each falls
consistently before or after the key word (world• = World Bank, •account = bank
account). What the computer output does not show is that not all collocates are
created equal. In some, the node word and collocate retain their independence (an
international bank), while in others they do not (World Bank, Left Bank, West
Bank). Degree of connectedness can to some extent be predicted by frequency of
found versus predicted co-occurrence in such measures as mutual information or
log-likelihood, as calculated by programs like BNC-Web (which gives internation-
al bank a mutual information (MI) value of 3.04 and West Bank a value of 5.82
or almost double).
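As a rough illustration of the association measure mentioned here, pointwise mutual information compares observed with expected co-occurrence. The sketch below uses invented counts and ignores BNC-Web's window-size and expected-frequency refinements, so it will not reproduce the 3.04 and 5.82 values quoted above:

```python
import math

def mutual_information(co_freq, node_freq, coll_freq, corpus_size):
    """Pointwise MI: log2 of observed over expected co-occurrence.
    Higher values suggest a more tightly connected pair."""
    expected = node_freq * coll_freq / corpus_size
    return math.log2(co_freq / expected)

# Illustrative (invented) counts, not real BNC figures:
loose = mutual_information(30, 20000, 15000, 100_000_000)   # cf. 'international bank'
tight = mutual_information(240, 20000, 3000, 100_000_000)   # cf. 'West Bank'
```

A fused pair like *West Bank* co-occurs far more often than its component frequencies predict, so its MI score comes out well above that of a loose pairing.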
In two BNC-based studies, both again involving computational analysis
with human follow-up, Shin and Nation (2007) and Martinez and Schmitt
(2012) identified unexpectedly large numbers of recurring word strings in the
highest frequency zone of the language. Shin and Nation’s co-occurrences (you
know, I think, a bit) were for the most part compositional items which, if incor-
porated into the existing frequency scheme, would count as first 2,000 items.
There was no proposal actually to incorporate these items into standard fre-
quency lists, but merely to argue for their importance to language learners.
Martinez and Schmitt’s focus, on the other hand, was specifically on high-fre-
quency co-occurrences that they judged to be non-compositional, or idiomatic,
i.e. which have, in specific environments, independent meanings and hence
deserve their own places within standard frequency lists. Using a methodology
to be described below, these researchers identified 505 such MWUs in the first
five thousand-lists of the BNC (or just over 10%), distributed over these lists in
the manner shown in Table 1.
If Martinez and Schmitt’s 505 MWUs were given their rightful places and added to the current
frequency lists, then quite a number of existing items would be displaced from
zone to zone (which are arbitrary divisions in any case). The result would be a
set of lists something like the one imagined in Table 2.
Incorporating these two kinds of information would also have strong effects on
the deployment of frequency information in the profiling of novel texts.
Profiling would no longer be a simple matter of matching a word in a text to its
family headword and thence to its counterpart in a frequency list. Rather, the
profiler would have to interpret both homoforms and MWUs in context, in
order to determine which meaning of a homoform was applicable (bank_1 or
bank_2), and in the case of MWUs whether a particular string was composi-
tional or non-compositional (‘look at all the bugs’, or ‘I don’t like bugs at all’).
It is this incorporation of context that is the qualitative transformation implied
in the term Frequency 2.0.
Frequency profiling up to the present has been based on single word forms. It has
relied on matching stable word frequencies to equivalent word forms in a
given text. The modification proposed here involves not only extensive mod-
ification of the lists, but also a real-time contextual analysis of each potential
homoform or MWU to determine its true identity in a particular text. These
are dealt with in turn.
3.1. Multiwords
Whether for homoforms or MWUs, the first task is to identify the item involved,
assign it to a category (‘money bank’ or ‘river bank’; ‘a lot of money’ or ‘build on
a lot’), calculate the frequency of each in a large corpus, and give each a place in
the standardized vocabulary lists used by course developers, test writers, and
computer programs like Vocabprofile. A methodology for doing this work is
under development in a new crop of student research projects in vocabulary.
Table 3. The highest frequency MWUs from Martinez and Schmitt (2012)
[Columns: rank, MWU, frequency per 100 million words, example, integrated list]
amount of money, then there is a clear similarity between the two, such that
they can be seen as members of a single ‘monoseme’). Thus the exact MWUs
eventually to be integrated into standard frequency schemes remain to be
determined. Nonetheless it seems likely that at least some of Martinez and
Schmitt’s selections are not very controversial (at all, as well as from the first
1,000 list, and as far as and as long as from the second, clearly have both com-
positional and non-compositional meanings). It also seems clear that Martinez
and Schmitt’s basic methodology for determining such items, a large-scale
crunching of matched corpus samples followed by a principled selection by
humans and the calculation of a frequency rating, is likely to prove the best
means of working toward a standard set of MWUs. Following that, the ques-
tion will be how to deploy this information in live Vocabprofiles of novel texts,
and this is a question that can be tackled while the exact target items are not
yet settled.
3.2. Homoforms
The work on homoforms was performed by Kevin Parent in the context of doc-
toral work with Nation. Parent took West’s GSL list of 2,000 high frequency
items as a starting point, on the grounds that most homoforms are found in the
highest frequency zones and also that these would be of greatest pedagogical rel-
evance. Wang and Nation (2004) had already shown that there were only a
handful of such items (about 10) in the 570-word Academic Word List (AWL;
Coxhead, 2000; a compendium of third to sixth thousand level items). In the
GSL, Parent identified 75 items with two or more headwords in the Shorter
Oxford English Dictionary (SOED), a dictionary which marks homoforms
explicitly with separate headwords. For each of these 75 items, he generated 500
random concordance lines from the BNC, and hand-sorted them according to
the SOED’s headwords. He found that for 54 of the 75 items, the commonest
meaning accounted for 90% or more of the 500 lines (surprisingly bank itself
falls into this category, along with bear and bit; the others can be seen in Table
1 in the Appendix). Some of the remaining items whose homoformy is less
skewed are shown in Table 4. Thus, we see in the first row that half of the uses
of miss pertained to loss, or failing to have or to get something, while the other
half occurred in titles (such as Miss Marple).
Some points about Table 4 are in order. First, the items are not lemmatized,
or divided into parts of speech (POS), but are simple counts of word forms.
This is because while the different meanings of a homoform sometimes corre-
spond to a difference in POS (to like somebody vs. look like somebody), some-
times they do not (‘I broke my arms’ vs. ‘I left the arms outside the house’). In
the absence of knowing which of these two types of homoform is predominant
in English, Parent’s decision was to begin the analysis with word forms. Second,
Parent’s analysis was confined to true homoforms. This meant that he did not
include words with plausible etymological relationships (gold bar and drink at
a bar) and words that while undifferentiated in writing are nonetheless differ-
entiated in speech (‘close [shut] the door’ and ‘close [near] to dawn’). The analy-
sis is now being expanded to include all effective homoforms, roughly 100 items
in the highest frequency zones. Third, as shown in Table 4, Parent’s list was also
confined to cases where the least important meaning of a homoform set was
greater than 10% in the BNC. It has often been argued that there is no point
in handling items where one meaning is vastly predominant (e.g., Wang &
Nation, 2004) since the labour to do so would be great and the differences
minor. However, once a methodology for assigning differential frequencies is
developed, it is arguably feasible to deal with a larger number of homographs
and take less frequently used members into account. For example, as already
mentioned the 10% criterion leaves ‘river bank’ lumped with ‘money bank’,
which intuitively seems an inaccuracy, and one that can easily be avoided once
this analysis and technology is in place. A useful target is probably all the homo-
forms in the first 5,000 word families where the less frequent member or mem-
bers account for more than 5% of cases.
Following the calculation of proportions from the 500-word samples,
each item would be tagged (possibly as miss_1 and miss_2) and assigned by
extrapolation its two (or sometimes more) new places in the frequency lists.
The evenly divided miss is currently a first-1,000 item, with 19,010 lemma-
tized occurrences in the BNC (raw information available from BNC-Web,
http://bncweb.lancs.ac.uk/). But if half of these (about 9,505) are appor-
tioned to each meaning of miss, then neither miss_1 nor miss_2 belongs in this
first 1,000 category. As the first row of Table 5 shows, only lemmas occurring
12,639 times or more in the BNC qualify as first 1,000 items. Rather, both
would feature in the second 1,000 zone (between 4,858 and 12,638 occur-
rences). In cases where a meaning distinction corresponds to a POS distinc-
tion, as with miss, then the POS-tagged BNC could provide even more pre-
cise information (in this case that the verb is 10,348 occurrences and the
noun 8,662, both still in the second 1,000). Counts could be refined and cut-
offs change as the proposed amendments are made and items shifted up and
down the scale. List building would ideally be left to an expert in developing
and applying inclusion criteria, with Paul Nation as the obvious candidate
since he has already developed a principled method of balancing frequency
and range, spoken and written data, and corpus as well as pedagogical validi-
ty, into the existing BNC lists.
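The re-assignment logic just described can be sketched in a few lines, using the lemma cut-offs from Table 5 (the helper and its name are illustrative, not part of Range or Vocabprofile, and the K1 boundary is read as inclusive of 12,639):

```python
# BNC 1000-list cut-offs by lemma token count (Table 5).
CUTOFFS = [(12639, "K1"), (4858, "K2"), (2430, "K3"), (1478, "K4"), (980, "K5")]

def k_level(occurrences):
    """Return the thousand-level band for a BNC lemma frequency."""
    for floor, band in CUTOFFS:
        if occurrences >= floor:
            return band
    return "K6+"

# 'miss': 19,010 occurrences split evenly between two meanings.
assert k_level(19010) == "K1"          # undivided 'miss' is a first-1,000 item
assert k_level(19010 // 2) == "K2"     # each apportioned meaning drops to K2
```

The same lookup applied to POS-tagged counts (10,348 verb, 8,662 noun) likewise places both parts of speech in the second 1,000.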
Table 5. BNC’s first five 1000-list cut-offs by token count (for lemmas)
K1 ≥ 12,639
K2 4,858 – 12,638
K3 2,430 – 4,857
K4 1,478 – 2,429
K5 980 – 1,477
Table 6 gives a sense of what this new arrangement would look like. Parent’s
proportions have been multiplied against BNC frequency sums and sorted
according to the cut-offs in Table 5 in order to give a provisional look at the thou-
sand-level re-assignments that could flow from Parent’s data in Table 4. The
thousand (or k) levels in the first column on the left are the current composite
k-levels from the BNC; those in the third and subsequent columns are provi-
sional new k-levels for the independent meanings of the homoform. (These are
highly provisional, since they merely result from multiplying Parent’s per-
centages from 500 lines against BNC word-form totals from the 100-million-word
corpus.) The goal in presenting this data at this point is merely to give a flavour
of the changes being proposed. Also of interest may be any compatibility issues
arising from combining data from several analyses.
Note that the original 1,000-level ratings as presented in Table 6 may not
be identical to those in Nation’s current fourteen 1,000 lists in all cases (spell is
shown as 2k in Table 6, but in Vocabprofile output it is 1k). That is because
Nation’s first two 1,000 levels (1k and 2k) are derived from the spoken part of
the BNC corpus (10 million words, or 10 percent of the full corpus), in order
to ensure for pedagogical reasons that words like hello will appear in the first
1,000 word families. All ratings in Table 6 are based on information from the
unmodified BNC, in an attempt to employ a common scale to think about
moving items between levels.
Table 6 shows provisional list assignments for the 18 items of Parent’s
analysis that would be most likely to affect frequency ratings, in that the less
dominant meaning is nonetheless substantial (between 10% and 50%). As is
shown, only seven items (the top six plus pool) would require shifting the dom-
inant member to a lower frequency zone (e.g., from first thousand to second).
Similarly, in the remainder of the homoforms identified by Parent, the reanaly-
sis proposed here will most often leave the dominant member of a homoform
at its existing level. (The remainder of Parent’s analysis is shown in Table 1 in
the Appendix; further analysis under way as of January 2013.) So is this reanalysis
worth the trouble?
Table 6. Provisional adjustments to frequency ratings for homoforms
Bumping the minor member down a zone could yield rather different text
profiles from those at present. If teachers are looking for texts at a particular
level, say one matched to their learners as a means of building fluency, or ahead
of their learners to build intensive reading skills, then just a few items (band_2
or host_2) can push a short text above or below the 95% (Laufer, 1989) or 98%
known-word comprehension threshold (Nation, 2006). Given the attention paid
in the recent research literature to the 95% vs. 98% difference as a factor in
comprehension (Schmitt et al., 2011), small differences are clearly important.
Similarly when Vocabprofiles are used to assess the lexical richness of student
writing (Laufer & Nation, 1995) or speech (Ovtcharov et al., 2006; Lindqvist,
2010), a small number of lower frequency items can make a large difference to
the lexical richness scores of short texts.
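The sensitivity of short texts to a few re-assigned items can be illustrated with a small coverage calculation. The token counts below are invented for illustration, not data from the studies cited:

```python
def coverage(known_tokens, total_tokens):
    """Percentage of running words assumed known to the learner."""
    return 100 * known_tokens / total_tokens

# A 300-word text: re-assigning just two tokens out of the learner's
# known list shifts the text across the 98% comprehension threshold.
before = coverage(295, 300)   # ~98.3%
after = coverage(293, 300)    # ~97.7%
```

With texts this short, each token is worth a third of a percentage point, which is why a couple of items like *band_2* or *host_2* can flip a text across either threshold.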
To summarize, the resources, methodologies, and motivation for a signifi-
cant upgrade of the Frequency 1.0 scheme are largely in place. These include a
methodology for identifying the main homoforms and MWUs for the pedagog-
ically relevant zones of the BNC, a means of assigning them frequency ratings,
and a first application of this methodology. There is clearly much more to do in
this phase of the project, yet even when this is accomplished there will still be the
matter of deploying this information in the real-time profiling of particular texts.
5. A database of collocates
compositional phrase, the frequent collocates mostly involve words like levels,
times, and costs (thus at all levels, etc.) and as a non-compositional phrase they
largely involve negative quantifiers like none, hardly, and nothing (thus nothing
at all, etc.) and this once again must be hand sorted. A compilation of the most
frequent 50 collocates of at all, sorted into compositional and non-composi-
tional lists that an updated Vocabprofile can use to do its sorting is shown in
Table 3 in the Appendix.
From these diverse sources, a database of collocates for both homoforms
and MWUs can be fashioned.
6. Program function
Figure 3. Database with collocates for two members of the homograph miss
7.2. Context
It is frequently claimed that there are few true synonyms in a language owing to
differences in contexts of use and especially the distinct collocations that differ-
ent senses of words typically enter into (Sinclair, 1991). This claim should be
even more applicable to forms which are not just synonyms but have no related
meaning whatever. However, to date many examples but few proofs have been offered
for this claim, which therefore remains intuitive. The proof of the claim would
be if the collocations that appear to distinguish the meanings of a homoform in
a particular corpus could predict the same distinctions in a novel text or corpus.
7.3. Procedure
The BNC was mined for all collocations with a frequency > 10 for the first three
items from Parent’s selection in Table 6 (miss, yard, and net) and two items
from Martinez and Schmitt’s selection in Table 3 (a lot and at all), in the manner
of the information in Table 2 in the Appendix for bank. For each item, roughly
200 collocations, with some variability in the number, were hand sorted into
those corresponding to each meaning, which in the case of miss was tagged as
miss_1 or miss_2. The collocations were coded in the PERL scripting language
to match text strings within ten words on either side of each test item, including
strings with an unpredicted intervening word (miss train would also match missed
their train). Novel contexts for the five items were obtained by searching a cor-
pus of simplified stories for texts containing both meanings of each of the homo-
forms. For example, Wilde’s The Picture of Dorian Gray (Oxford Bookworms
Series; 10,500 running words; 1,000 headwords) bears three instances of miss
with both parsings represented. All instances were extracted as concordance lines
of roughly 30 words (80 characters on either side of the keyword). These concor-
dance lines served as a greatly truncated ‘text’ that would test the program’s abil-
ity to use context information to disambiguate the homoforms. The next step
was to feed this test text into a computer program that accesses the collocation-
al database. The program breaks a text (in this case, the set of concordance lines
with homographs) into family headwords, identifies the current search term, and
looks for pattern matches in its collocation set. Each time it makes a match it
records the fact and awards a point to the relevant meaning.
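A minimal sketch of this matching-and-scoring step in Python follows. The helper, the token window, and the tiny collocate sets are illustrative assumptions, not the actual PERL implementation; note that simply checking co-presence in a ±10-token window automatically tolerates intervening words (*miss* … *train* matches *missed their train*):

```python
def collocate_hits(tokens, key_index, collocates, span=10):
    """Count how many of a meaning's collocates occur within `span`
    tokens on either side of the keyword at `key_index`."""
    window = tokens[max(0, key_index - span): key_index + span + 1]
    return sum(1 for c in collocates if c in window)

# Hand-sorted collocate sets for the two meanings of 'miss'
# (a tiny invented subset of the real database):
MISS_1 = {"train", "you", "chance"}   # loss sense
MISS_2 = {"Sibyl", "Vane"}            # title sense

tokens = "but wont you miss you train say Dorian Gray".split()
score1 = collocate_hits(tokens, 3, MISS_1)   # 2 ('you', 'train')
score2 = collocate_hits(tokens, 3, MISS_2)   # 0
```

Each hit awards one point to the corresponding meaning, mirroring the scoring record shown in the output below.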
7.4. Results
The collocational information is clearly able to distinguish the two meanings of
the homoform miss. Figure 5 shows the Dorian Gray output for miss, followed
by the record of the decision process.
Figure 5. “miss” in simplified The Picture of Dorian Gray

Parsed concordance
034. omething to say to you.’ That would be lovely. But wont you MISS_1 your train?’ said Dorian Gray, as he went up the step
035. , You look like a prince. I must call you Prince Charming.’ MISS_2 Sibyl knows how to flatter you.’ You dont understand
036. g, Harry. I apologize to you both.’ My dear Dorian, perhaps MISS_2 Vane is ill,’ said Hallward. We will come some other

Program’s reasoning
34. 2 0 miss_1
to you’ That would be love But wont you MISS you train’ say DORIAN Gray as he go up
— miss ‘you MISS’
— miss ‘train’
35. 0 1 miss_2
like a prince I must call you Prince Charming’ MISS Sibyl know how to FLATTER you’ You dont understand
— miss ‘MISS Sibyl’ (CAP)
36. 0 1 miss_2
I apology to you both’ My dear Dorian perhaps MISS Vane be ill’ SAY Hallward We will come some
— miss ‘MISS Vane’ (CAP)
The program’s reasoning as shown in the output is thus: Before starting, the
algorithm reduces all words to familized headwords (e.g., go not went in line 34).
To parse the instance at concordance line 34, a pronoun subject (I|you|he, etc)
before the keyword, and the presence of the high frequency collocate train any-
where in the string, give a score of 2-0 for miss_1 (loss). The challenge point in
this and the many other runs of this experiment is where the meaning of the
homoform changes. This happens in line 35, where there is no match suggesting
miss_1 (loss), and one piece of evidence for miss_2 (title), namely miss followed
by a word with a capital letter, giving a score of 0-1 and a verdict of miss_2. In
line 36, a capital letter is once again the decider, now backed up by the coherent
information assumption. A score of 0-0 would have led to a continuation of the
previous parsing and that would have been correct.
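The decision rule just described, including the fall-back to the previous parsing on a 0-0 tie, can be sketched as follows (a hypothetical reconstruction, not the program's actual code):

```python
def disambiguate(scores_per_line):
    """Pick meaning 1 or 2 for each concordance line from its
    (score1, score2) pair; on a tie, carry forward the previous
    line's parsing (the coherent-information assumption)."""
    verdicts, previous = [], 1          # default to the commoner sense
    for s1, s2 in scores_per_line:
        if s1 > s2:
            previous = 1
        elif s2 > s1:
            previous = 2
        # s1 == s2: keep `previous` unchanged
        verdicts.append(previous)
    return verdicts

# Lines 34-36 of the Dorian Gray run scored 2-0, 0-1, 0-1:
assert disambiguate([(2, 0), (0, 1), (0, 1)]) == [1, 2, 2]
```

The carry-forward clause encodes the observation that adjacent lines of a coherent text tend to keep the same sense, so a scoreless line inherits its neighbour's parsing.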
Similarly, the Bookworms version of Conan Doyle’s Tales of Mystery and
Imagination was found to bear both meanings of at all, and once again the col-
locations were able to distinguish these (Fig. 6), largely through discovering var-
ious quantifiers like few, none, any and if for the non-compositionals and a fol-
lowing the for the compositional (these are underlined in the concordance out-
put for emphasis).
Figure 6. “at all” in simplified Tales of Mystery & Imagination – Bookworm Level 3
020. sons of the richest families of England. There was nothing at_all_1 to stop me now. I spent my money wildly, and passed
021. n and the strange fears I had felt. If I thought about them at_all_1, I used to laugh at myself. My life at Eton lasted f
022. htening, and few people were brave enough to enter the room at_all_1. In this room, against the farthest wall, stood a hu
023. nd held it there for many minutes. There was no life in him at_all_1. Now his eye would not trouble me again. Perhaps you
024. lantern was closed_2, and so no light came out of it, none at_all_1. Then slowly, very slowly, I put my head inside the
025. d it. I started walking around the streets at night looking at_all_2 the cats, to see if I can_1 find another one like Pl
In the five test cases, all significantly longer than the ones shown here, the col-
location database was able to correctly identify the relevant meaning of the sin-
gle word or multiword homoform in at least 95% of cases. Accuracy can be
increased by expanding the size of the database (Fig. 4 is far from an exhaustive
list of all the collocates BNC-Web offers for at all), but at the expense of slowing
the program down and making it less useful for practitioners.
7.5. Discussion
There is thus evidence that collocations can indeed simulate the function of
human judgment in this task and hence that the full database of collocates for
the high frequency homoforms and MWUs is worth building.
Further, it should be noted that the task set to the computer program in
8. Conclusion
The pieces of Frequency 2.0 are at hand and, although hailing from quite dis-
parate quarters, merely require assembly. The most frequent and most pedagogi-
cally relevant homoforms have been identified, separated, and assigned initial fre-
quency ratings, and a methodology is in place to move the analysis down the scale
to the vast number of homoform items in English where the minor member rep-
resents fewer than 5% of occurrences. Refinements there will certainly be, and the
question of what makes an MWU non-compositional will need further thinking,
but the methodology is likely to be something similar to the one proposed here.
Further, while the first round of this work had to be accomplished by humans,
prising apart the banks and at all’s by inspecting samplings of concordance lines,
for subsequent rounds a means is available to automate this task using a comput-
er program in conjunction with a collocational database such that sampling
should not be necessary: within a year or two, the collocational database should
be completed for both the Parent and Martinez items, or principled sub-sets
thereof, and it should be possible to assemble the pieces and create a complete set
of trial lists, incorporating both types of homoforms, as hypothesized in Table 2.
When that happens, an important task will be to establish new cut-offs –
that is, new frequency counts. The alert reader will have noticed that in several
of the analyses above, the original word-form cut-offs were used for proposed
new frequency assignments, whereas in fact, every re-assignment will shift all
the cut-offs. For example, if the first thousand list is defined as every BNC
lemma represented by more than 12,638 occurrences (Table 5), and the non-
compositional meaning of a lot is found to have more occurrences than this,
then it should be included as a first thousand item – and the current last item
will be bumped to the second thousand list.
Also on the to-do list will be to establish a coding format for the different
meanings of homographs (bank_1 and bank_2, or bank_money and bank_river?
and at_all for non-compositional MWUs but plain at and all for composition-
al?); to settle on the exact list of MWUs to include; to settle on the percentage
of main-meaning occurrences (90% or 95%) that makes handling separate
meanings worth program time; and to decide whether to limit the single word
analysis to the first five thousand-word families or to proceed further. Benefits
to be realized will be more accurate Vocabprofiling (extent to be determined),
greater credibility for this methodology within the scientific community, and
more effective language instruction.
References
Aston, G., & Burnard, L. (1998). The BNC handbook: exploring the British National
Corpus with SARA. Edinburgh: Edinburgh University Press.
Bauer, L., & Nation, P. (1993). Word families. International Journal of Lexicography,
6(4), 253-279.
Beglar, D., & Nation, P. (2007). A vocabulary size test. The Language Teacher, 31(7), 9-13.
Beretta, A., Fiorentino, R., & Poeppel, D. (2005). The effects of homonymy and poly-
semy on lexical access: an MEG study. Cognitive Brain Research, 24, 57-65.
Biber, D., Johansson, S., Leech, G., Conrad, S., & Finegan, E. (1999). Longman gram-
mar of spoken and written English. Harlow, UK: Pearson Education.
Cobb, T. (2010). Learning about language and learners from computer programs.
Reading in a Foreign Language, 22(1), 181-200.
Coxhead, A. (2000). A new academic word list. TESOL Quarterly, 34(2), 213-238.
Davies, M., & Gardner, D. (2010). Frequency dictionary of contemporary American
English: Word sketches, collocates, and thematic lists. New York: Routledge.
Davies, M. (2011). Word frequency data from the Corpus of Contemporary American
English (COCA). [Downloaded from http://www.wordfrequency.info on 2012-
07-02.]
Ellis, N. C. (2002). Frequency effects in language processing. Studies in Second Language
Acquisition, 24(02), 143-188.
Ellis, N. C., & Larsen-Freeman, D. (2009). Constructing a second language: Analyses
and computational simulations of the emergence of linguistic constructions from
usage. Language Learning, 59, 90-125.
Grant, L., & Nation, P. (2006). How many idioms are there in English? International
Journal of Applied Linguistics, 151, 1-14.
Heatley, A., & Nation, P. (1994). Range. Victoria University of Wellington, NZ.
[Computer program, available with updates at http://www.vuw.ac.nz/lals/].
Hoey, M. (2005). Lexical priming: A new theory of words and language. London: Taylor
and Francis.
Johns, T. (1986). Micro-concord: A language learner’s research tool. System, 14(2), 151-162.
Laufer, B. (1989). What percentage of text-lexis is essential for comprehension? In C.
Lauren & M. Nordman (Eds.), Special language: From humans thinking to thinking
machines (pp. 316-323). Clevedon, UK: Multilingual Matters.
Laufer, B., & Nation, P. (1995). Vocabulary size and use: Lexical richness in L2 written
production. Applied Linguistics, 16(3), 307-322.
APPENDIX
Table 2. Collocates for two banks, from Just-The-Word database, frequency >10, span=5
word-forms either side, hand-sorted into independent meanings
Money banks
world bank 714 development bank 86 director of bank 51
central bank 690 bank on 84 bank announce 50
bank account 422 bank balance 78 bank credit 50
bank holiday 409 swiss bank 76 bank provide 49
bank manager 298 bank rate 74 private bank 49
national bank 272 major bank 73 money in bank 49
commercial bank 226 bank lend 71 clearing bank 48
european bank 215 state bank 67 international bank 48
merchant bank 201 bank clerk 64 president of bank 48
royal bank 191 bank and company 62 bank offer 47
bank loan 189 British bank 61 bank statement 47
investment bank 165 american bank 57 french bank 45
between bank 142 bank and institution 57 bank official 45
go to bank 117 borrow from bank 55 leave bank 44
midland bank 113 include bank 55 german bank 43
big bank 104 branch of bank 55 reserve bank 43
governor of bank 97 bank or building society 55 clearing bank 40
bank deposit 95 bank hold 53 creditor bank 40
foreign bank 91 bank note 53 bank strip 40
bank and building society 90 japanese bank 52 bank lending 39
large bank 87 data bank 51 bank agree 38
bank pay 38 bank seek 22 accept by bank 14
chairman of bank 38 irish bank 22 deposit in bank 14
work in bank 37 issuing bank 22 make by bank 14
join bank 37 bank interest 22 set up bank 14
bank buy 37 head of bank 22 offer by bank 14
leading bank 37 group of bank 22 owe to bank 14
bank governor 37 Western bank 21 shanghai bank 14
break bank 36 role of bank 21 write to bank 14
bank lending 36 clear bank 20 bank step 14
overseas bank 35 enable bank 20 retail bank 14
bank charge 35 close bank 20 jeff bank 14
bank debt 35 bank operate 20 bank employee 14
allow bank 34 bank raid 20 bank finance 14
have in bank 33 line bank 19 bank funding 14
rob bank 33 sponsor by bank 19 bank customer 14
issue by bank 33 bank charge 19 bank estimate 14
bank issue 33 bank require 19 consortium of bank 14
bank sell 32 trust bank 19 building society and bank 14
bank able 32 bank borrowing 19 bank and government 14
land bank 32 bank corporation 19 receive from bank 13
bank branch 32 bank vault 19 draw on bank 13
loan from bank 32 subsidiary of bank 19 sell to bank 13
way to bank 32 establishment of bank 19 co-op bank 13
northern bank 31 take to bank 18 deposit with bank 13
be bank 30 bank create 18 bank to bank 13
bottle bank 30 asian bank 18 get in bank 12
street bank 30 account with bank 18 hold by bank 12
bank robbery 30 Government and bank 18 pay to bank 12
bank base rate 30 eastern bank 17 take by bank 12
memory bank 29 piggy bank 17 bank assistant 12
put in bank 28 state-owned bank 17 bank guarantee 12
bank cut 28 city bank 17 bank creditor 12
bank staff 28 bank card 17 Balance at bank 12
manager of bank 28 debt to bank 17 currency and bank 12
force bank 26 oblige bank 16 Building society or bank 12
provide by bank 26 approach bank 16 bank and credit 12
Independent bank 26 bank publish 16 bank or company 12
bank report 26 bank deal 16 deposit with bank 11
pay into bank 25 bank overdraft 16 bank grant 11
street bank 25 agreement with bank 16 bank intervene 11
union bank 25 name of bank 16 failed bank 11
bank robber 25 available from bank 16 gene bank 11
account at bank 25 bank and house 16 bank post 11
customer of bank 25 bank up 16 bank operating 11
fund and bank 25 own by bank 15 bank interest rate 11
bank and fund 25 work for bank 15 chair of bank 11
regional bank 24 persuade bank 15 money from bank 11
bank act 22 bank president 15 company and bank 11
bank refuse 22 bank show 15
River banks
west bank 240 steep bank 45 left bank 28
river bank 210 opposite bank 42 east bank 27
along bank 194 west bank 42 left bank 26
south bank 166 top of bank 42 stand on bank 15
far bank 94 grassy bank 41 occupied bank 14
its banks 85 north bank 41 shingle bank 12
down bank 73 sit on bank 30 situate on bank 11
up bank 53 swain bank 30 walk along bank 11
south bank 48 burst bank 28
Table 3. The most frequent collocates of at all, hand-sorted into non-compositional and compositional meanings

Non-Compositional
(anything) at all wrong (no) interest at all at all — (phrase end)
(didn’t) notice at all (no) problem at all at all’ (phrase end)
(didn’t) seem at all (no) reason at all at all possible
(didn’t) sleep at all (no) sense at all at all! (sentence end)
(doesn’t) bother (me) at all (no) sound at all at all. (sentence end)
(doesn’t) exist at all (no) trouble at all at all? (sentence end)
(doesn’t) look at all (not) aimed at all did (not) at all
(don’t care) at all about (not) at all actually hardly at all
(don’t care) at all except (not) at all clear if at all
(don’t care) at all really (not) at all easy mention at all
(don’t see it) at all (not) at all sure never (did it) at all
(don’t) like at all (not) at all surprised no … at all
(don’t) mind at all (not) changed at all nobody at all
(don’t) remember at all (not) doubt (it) at all none at all
(don’t) see at all (not) pleased at all not at all
(no) good at all (not) worried at all nothing at all
(no) harm at all any at all n’t … at all
(no) help at all anything at all scarcely at all
(no) idea at all anywhere at all without (any) at all
Compositional
avoided at all (costs) at all sites at all events
avoid at all (costs) at All Saints at all costs
at all times at all levels at all ages
at all stages at all hours
A new approach to measuring lexical
sophistication in L2 oral production
Christina Lindqvist*, Anna Gudmundson** and Camilla Bardel**
*Uppsala University, **Stockholm University
The aims of this chapter are a) to give a comprehensive description of a new tool
for lexical profiling by reporting how it was developed, and b) to indicate possible
areas of use and future developments of the tool. The tool has been used for meas-
uring the lexical sophistication of Swedish learners of French and Italian. The dif-
ferent steps of development have partly been presented in previous studies (Bardel
& Lindqvist, 2011; Bardel, Gudmundson & Lindqvist, 2012; Lindqvist, Bardel &
Gudmundson, 2011) but are complemented here through a detailed account of
the tool, in order to enable replication and use of the method with other languages.
The outline of this chapter is as follows: first, as a background, we provide a sur-
vey of methods designed to measure lexical richness in L2 production. Then we
discuss the inherent differences between written and spoken language and what
these differences may imply when lexical richness is measured. Next, we present
a new method for analyzing L2 learners’ lexical profiles in oral production data,
giving a detailed technical description of the creation of the tool. We then dis-
cuss the pros and cons of frequency-based measures in general and present our
solutions to some of the problems brought up. Finally, we suggest some poten-
tial areas of use and discuss some possible improvements of the method.
will be repeated more often as compared to low-frequency words, and this ten-
dency will increase the longer the text is. Several measures have been proposed in
order to solve the problem of text length. One example is the index of Guiraud
(Guiraud, 1954), which is a type/token based measure that is supposed to be
independent of text length. The index of Guiraud results from dividing the num-
ber of types by the square root of the number of tokens. For a long text, this pro-
cedure will result in a higher lexical richness than what would have been obtained
with a simple TTR. However, according to Daller, Van Hout and Treffers-Daller
(2003, p. 200), neither TTR nor the index of Guiraud is a valid measure of lex-
ical richness at later stages of L2 acquisition. A development of the Guiraud
index is the advanced Guiraud, which takes frequency into account as a factor (Daller et al.,
2003). Furthermore, Malvern, Richards, Chipere and Durán (2004) have sug-
gested the D measure, which is freely available in CHILDES. D models the
falling TTR curve by calculating TTRs for samples of different text lengths,
ranging from samples of 35 words to samples of 50 words, which are taken ran-
domly from the text. However, in their critical evaluation of D, McCarthy and
Jarvis (2007) conclude that even though the D measure was the most reliable of
those investigated, it still retains a certain degree of sensitivity to text length.
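As an illustration of the measures discussed above, the TTR and the index of Guiraud can be computed as follows. This is our own minimal sketch, not code from any of the cited studies, and the whitespace tokenizer is a deliberate simplification (the studies cited work with lemmatized data):

```python
import math

def _tokens_and_types(text):
    # Naive whitespace tokenization; real studies lemmatize first.
    tokens = text.lower().split()
    return tokens, set(tokens)

def ttr(text):
    # Type/token ratio: tends to fall as the text gets longer,
    # because high-frequency words are repeated more often.
    tokens, types = _tokens_and_types(text)
    return len(types) / len(tokens)

def guiraud(text):
    # Index of Guiraud (1954): types divided by the square root
    # of tokens, intended to compensate for text length.
    tokens, types = _tokens_and_types(text)
    return len(types) / math.sqrt(len(tokens))
```

For the six-token text "the cat sat on the mat" (five types), ttr gives 5/6 ≈ 0.83 and guiraud gives 5/√6 ≈ 2.04.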
Lexical sophistication is defined as the percentage of sophisticated or
advanced words in a text. There are, however, different definitions of sophisti-
cated/advanced vocabulary. Low-frequency words, for instance, are generally
considered to be advanced and sophisticated (Laufer & Nation, 1995; Vermeer,
2004). It has even been suggested that words are learned in rough order of fre-
quency (Cobb & Horst, 2004; Vermeer, 2004). The difficulty of words, as
measured by their frequency, should therefore be taken into account when
measuring the lexical richness of L2 learners. A method which relies on the raw
frequency of words in the target language is the Lexical Frequency Profile, LFP
(Laufer & Nation, 1995). The LFP measures the proportion of high-frequency
words vs. the proportion of low-frequency words in a written text. All the words
are divided into different categories, which have been established on the basis of
frequency bands based on written language corpora (Laufer & Nation, 1995).
Vocabprofile is a program that executes this categorization according to the fol-
lowing frequency bands: the 1000 most frequent word families, the next 1000
most frequent word families, and the Academic Wordlist, which contains the
570 most frequent word families drawn from academic texts (Coxhead, 2000,
see also www.lextutor.ca/vocabprofile). The words that do not appear in any of
these categories end up in the ‘not-in-the-lists’ category.1
1 There is also an updated version of Vocabprofile for English (but not for French),
which distinguishes 20 different frequency bands.
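The categorization performed by Vocabprofile can be sketched as follows. The function and the tiny band lists are ours and purely illustrative; the real program works with 1000-word-family lists, and a profile reports the proportion of tokens per band:

```python
def lexical_frequency_profile(tokens, bands):
    # bands: ordered mapping from band name to a set of known words
    # (stand-ins for the 1st 1000, 2nd 1000 and AWL lists).
    counts = {name: 0 for name in bands}
    counts["not-in-the-lists"] = 0
    for tok in tokens:
        for name, words in bands.items():
            if tok in words:
                counts[name] += 1
                break
        else:  # no band matched
            counts["not-in-the-lists"] += 1
    total = len(tokens)
    return {name: n / total for name, n in counts.items()}
```

Each token is assigned to the first band whose list contains it; everything unmatched falls into the ‘not-in-the-lists’ category, exactly as described above.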
Laufer and Nation (1995) have shown that the LFP measure is able to dis-
tinguish between different proficiency levels. The English version of LFP was
validated by Laufer and Nation. There is also a French version, with the pro-
gram Vocabprofil, likewise based on written data, which was validated in a
study of the oral production of advanced French L2 learners by Ovtcharov,
Cobb and Halter (2006). It is interesting to note that Ovtcharov et al. actually
used oral learner data and ran those against frequency bands based on written
data. Still, they found significant differences between learners at different profi-
ciency levels.
Even though Ovtcharov et al. (2006) were able to validate the French version
of LFP using learners’ oral production data, the appropriateness of compar-
ing learners’ spoken language with written data bases can be questioned.
Lindqvist (2010) used the French version, Vocabprofil, comparing two
groups at different proficiency levels.2 In contrast to Ovtcharov et al. (2006),
she found no significant differences between the two learner groups. She also
conducted a qualitative analysis of the words classified in the not-in-the-lists
category, and found that many words typical in oral French were classified in
this category, such as ben (‘well’), ouais (‘yeah’), rigolo (‘fun’), prof (short for
‘teacher’), sympa (‘nice’), although these are frequent in everyday speech.
Lindqvist suggested that frequency lists based on L1 oral data should be used
when investigating L2 learners’ oral production. This has also been pointed
out by Tidball and Treffers-Daller (2008, p. 311), who call for an oral ver-
sion of the Vocabprofil program, so that oral data can be compared to an oral
data base, which would better reflect the informants’ lexical profile. For
instance, the words ben and ouais are discourse markers that are often found
in spoken language, but not in written production (McCarthy, 1998; Tidball
& Treffers-Daller, 2008), so even if they are produced often by a learner a
comparison to a written data base would give the impression that the learn-
er uses rare words, and the conclusion that the learner in question has an
advanced vocabulary might be wrong. According to McCarthy (1998, p.
122), frequency lists based on spoken language differ from those based on
written sources. Generally, the differences between spoken and written lan-
guage are considerable (see e.g. Linell, 2005, p. 28), something that must
2 The levels of proficiency of the learners were established on the basis of a morpho-
syntactic analysis (cf. Bartning & Schlyter, 2004).
Considering the background described above, and in order to avoid not only a
written language bias (cf. Linell, 2005), but also methodological problems of
validity, we set out to create a new tool for analyzing lexical sophistication in
French and Italian L2, within the on-going project Aspects of the advanced L2
learner’s lexicon.3 We developed a lexical profiler explicitly for the analysis of
spoken language. In order to create frequency bands based on spoken target lan-
guage data, we used the Corpaix corpus for French and the C-Oral-Rom and LIP
corpora for Italian.4 We also developed a program that runs learner data against
the frequency bands. In the following, we will describe the process of creating
the tool.
ent data, usually associated with software to update and query the data” (The
Free On-line Dictionary of Computing: http://foldoc.org/database). When
working with sets of associated tables, i.e. retrieving, organizing, joining, count-
ing and comparing table contents, work is very much facilitated if a query lan-
guage such as SQL can be used.
5 Only tokens that appear ten times or more in the Corpaix corpus were added to the
list created by Véronis.
6 This number has been corrected compared to earlier studies (Bardel,
Gudmundson, & Lindqvist, 2012; Lindqvist, Bardel, & Gudmundson, 2011) in
which the number of lemmas was estimated at 2766, due to a technical error. This
small difference does not have any effect on the division of the lemmas into the
frequency bands.
frequency list based on both LIP and C-Oral-Rom. The final result consists
of a lemma-frequency list composed of 19962 different lemmas based on a
total of 789070 tokens.
When creating the French and Italian frequency bands it was decided to
use the lemma as counting unit instead of the word family, for the following rea-
sons (for a more detailed discussion, see Lindqvist et al., 2011). A word family
can include both derivations and inflected forms of a headword, which implies
that the word family might include quite a high number of forms. For example,
an Italian regular verb has six different forms in the present tense: canto, canti,
canta, cantiamo, cantate, cantano (from inf. cantare). This marking of person is
compounded with marking of tense, aspect and modality (e.g. past tense of sub-
junctive 1st person plural: cantassimo). Hence, Italian has a very rich verb mor-
phology. Furthermore, a word family can also include nouns, adjectives, etc.,
whose relationships with the base are not always very transparent, such as can-
zone (song), cantante (singer) and, possibly, cantautore (a compound of cantante
and autore, singer/songwriter). The fact that a learner uses one particular form
does not necessarily mean that he or she has knowledge of all the related forms
in the word family. This claim is particularly relevant in our research, which
concerns oral production. It is plausible that the learner knows several word
forms that are simply not used in one particular recorded session, which makes
it impossible to draw any conclusions regarding how many forms related to a
specific word family are actually known. Using the lemma as counting unit is
an option that reduces the number of forms attached to a headword, even
though this does not solve the problem completely. In conclusion, the French
and Italian frequency bands described in this paper are different from the ones
elaborated by Laufer and Nation (1995) and Cobb and Horst (2004), which are
based on word families.
A total of 2746 lemmas from the French lemma-frequency list and 3127 lemmas
from the Italian lemma-frequency list were divided into three frequency
bands consisting of about 1000 lemmas each. Hence, band 1 includes the
most frequent 1000 lemmas, band 2 the 2nd 1000 most frequent lemmas and
band 3 the 3rd 1000 most frequent lemmas. The lemmas not appearing in
any of these three bands are categorized as off-list lemmas, i.e. those not
belonging to the most frequent 3000 lemmas in Italian or French. Table 1
shows the frequency distribution of the French frequency bands and table 2
the frequency distribution of the Italian frequency bands.
Table 2. The Italian frequency bands
The tokens included in the French frequency bands (1-3) cover 93.44% of the
total number of tokens included in the Corpaix corpus, and the tokens includ-
ed in the Italian frequency bands (1-3) cover 93.32% of the total number of
tokens included in the Italian corpus, i.e. the combination of LIP and C-Oral-
Rom. As can be seen from the tables above, the number of lemmas included in
the Italian frequency bands is slightly higher than that of the French bands. It
can also be noted that the number of lemmas included in each band within each
language varies between 807 and 986 for French and between 1019 and 1080
for Italian. The reason for this is that the line between two frequency bands must
be drawn where two lemmas differ in frequency; for example, in the French list,
all lemmas from rank 971 to 986 occur 50 times in the corpus, while the lemma
ranked as number 987, journal (newspaper) occurs 49 times. Journal could not
be included in the first frequency band since it would have been necessary to
include all other lemmas that occur 49 times as well. The number of lemmas
included in each band could therefore not be established and decided before-
hand. The aim, however, was to distribute them as evenly as possible. It can be
noted that more than 90% of all tokens that appear in the two corpora belong
to band 1 and that only a small percentage belong to bands 2 and 3. The French
and Italian frequency bands were imported into an SQL data base.
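The division into bands, with the constraint that a boundary may only fall between two lemmas of different frequency, can be sketched as follows. This is our own simplified variant: it always extends a band past the target size until the frequency changes, whereas the actual division aimed at the nearest admissible boundary (hence the band sizes from 807 to 1080 reported above):

```python
def split_into_bands(lemma_freqs, band_size=1000, n_bands=3):
    # lemma_freqs: (lemma, frequency) pairs sorted by descending frequency.
    # Returns (bands, off_list); a boundary is only drawn where two
    # adjacent lemmas differ in frequency.
    bands, current, i = [], [], 0
    while i < len(lemma_freqs) and len(bands) < n_bands:
        lemma, freq = lemma_freqs[i]
        current.append(lemma)
        next_differs = (i + 1 == len(lemma_freqs)
                        or lemma_freqs[i + 1][1] != freq)
        if len(current) >= band_size and next_differs:
            bands.append(current)
            current = []
        i += 1
    if current:  # close a final, possibly short, band
        bands.append(current)
    off_list = [lemma for lemma, _ in lemma_freqs[i:]]
    return bands, off_list
```

Ties are never split across a boundary, so the real band sizes can only approximate band_size.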
The following SQL query can be used to compare French learner data to the
French frequency bands (named ‘corpaixband’).
(1)
SELECT
i.InformantName,
i.LinguisticLevel,
SUM(i.LemmaFreq) AS "number of lemmas",
SUM(CASE WHEN b.band = 1 THEN i.LemmaFreq ELSE 0 END) AS "band 1",
SUM(CASE WHEN b.band = 2 THEN i.LemmaFreq ELSE 0 END) AS "band 2",
SUM(CASE WHEN b.band = 3 THEN i.LemmaFreq ELSE 0 END) AS "band 3",
SUM(CASE WHEN b.band IS NULL THEN i.LemmaFreq ELSE 0 END) AS "offlist"
FROM FrenchInputFile i
LEFT OUTER JOIN corpaixband b ON i.lemma = b.lemma
GROUP BY i.InformantName, i.LinguisticLevel
ORDER BY i.LinguisticLevel
In example (1) above, the content of the column ‘LemmaFreq’ from the
table ‘FrenchInputFile’ is compared to that of ‘corpaixband’, creating an output
file with information about the number of lemmas in the ‘FrenchInputFile’
belonging to band 1, band 2, band 3 and offlist. The result is grouped and
ordered by ‘InformantName’ and ‘LinguisticLevel’ as shown in the figure below.
7 Proficiency level was operationalized as a 1-6 scale based on Bartning & Schlyter’s
(2004) framework, where 6 corresponds to a very advanced level.
Another useful query provides information about the informant’s name, the
lemma, the frequency of the lemma, the linguistic level of the informant, and
the band to which the lemma belongs. The query is shown in example (2) and
it returns an output file represented in figure 5.
(2)
SELECT
i.InformantName,
i.lemma,
i.LemmaFreq,
i.LinguisticLevel,
b.band
FROM FrenchInputFile i
LEFT OUTER JOIN corpaixband b ON i.lemma = b.lemma
As can be seen from the output file in figure 5, the last column indicates the
band to which the lemma belongs. This is useful information when single lem-
mas have to be studied and analyzed.
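Queries like (1) and (2) can be tried out in any relational engine. The following self-contained sketch uses Python's sqlite3 module; the table and column names follow the examples above, but the toy data are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE FrenchInputFile (InformantName TEXT, "
            "LinguisticLevel INTEGER, lemma TEXT, LemmaFreq INTEGER)")
cur.execute("CREATE TABLE corpaixband (lemma TEXT, band INTEGER)")
cur.executemany("INSERT INTO corpaixband VALUES (?, ?)",
                [("être", 1), ("maison", 2), ("journal", 3)])
cur.executemany("INSERT INTO FrenchInputFile VALUES (?, ?, ?, ?)",
                [("Inf1", 4, "être", 12),
                 ("Inf1", 4, "journal", 2),
                 ("Inf1", 4, "ouais", 5)])  # 'ouais' has no band: off-list
rows = cur.execute("""
    SELECT i.InformantName,
           SUM(CASE WHEN b.band = 1 THEN i.LemmaFreq ELSE 0 END),
           SUM(CASE WHEN b.band IS NULL THEN i.LemmaFreq ELSE 0 END)
    FROM FrenchInputFile i
    LEFT OUTER JOIN corpaixband b ON i.lemma = b.lemma
    GROUP BY i.InformantName
""").fetchall()
# rows is [("Inf1", 12, 5)]: 12 band-1 tokens, 5 off-list tokens
```

The LEFT OUTER JOIN keeps learner lemmas with no match in the band table, which is what sends them to the off-list count.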
Two important advantages of the lexical frequency profiling analysis are that
it is able to distinguish between proficiency levels in oral production and that
this measure of lexical richness seems to correlate with the other measures of
proficiency used in our earlier studies. However, there are also some important
drawbacks of this kind of measure in general. Some of them will be discussed
at the end of this paper. There are also problems related to the frequency crite-
rion per se. The method relies exclusively on (low-) frequency as a criterion of
high level proficiency (or difficulty for the learner). Other factors that may have
an impact on learnability (and lexical richness) are cognateness and the role of
teaching materials (cf. Horst & Collins, 2006; Milton, 2007). Horst and
Collins showed that the use of cognates decreased with higher proficiency, sug-
gesting that cognates (although of low frequency) are not indicative of an
advanced vocabulary, in the sense of LFP. As for the role of teaching materials,
Milton has pointed out that words that are introduced early, covering certain
thematic fields, like travelling or eating out, are learned early, even though they
are not used in everyday speech by native speakers, and these words are thus
erroneously classified as advanced vocabulary. These issues were
explored in Bardel and Lindqvist (2011), which led to certain modifications of
the LOPP method. These modifications are described in the following section.
of the methodology used to carry out the teachers’ judgement test can be found
in Bardel et al. (2012).
In order to evaluate the LOPPa tool, data from a previous study carried out
with the LOPPf tool (Lindqvist et al., 2011) were re-analyzed with the LOPPa
tool (Bardel et al., 2012). It was found that the distinction between basic and
advanced words resulted in a higher intra-group homogeneity compared to the
purely frequency based perspective. Thus, by taking cognateness and the notion
of thematic words into consideration, the lexical richness measure improved, an
improvement that was shown by an increased effect size as expressed by eta squared (η²).
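Eta squared is the proportion of the total variance in the scores accounted for by the group factor (SS_between / SS_total). A minimal reminder of the computation, with invented scores rather than our data:

```python
def eta_squared(groups):
    # groups: list of score lists, one per learner group.
    scores = [x for g in groups for x in g]
    grand_mean = sum(scores) / len(scores)
    ss_total = sum((x - grand_mean) ** 2 for x in scores)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    return ss_between / ss_total
```

A larger value means that more of the variation in the lexical measure is explained by group membership, which is how the improvement reported above was quantified.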
On the basis of our research we can claim that there are two main advantages
of lexical frequency profiling analyses: (1) They are able to distinguish
between proficiency levels in oral production. This has been shown both for the
method relying only on frequency (Lindqvist et al., 2011) and for the elaborat-
ed version of the method, which takes cognates and thematic vocabulary into
account (Bardel et al., 2012). (2) LOPPa provides results that seem to correlate
with other measures of proficiency used in our earlier studies (mainly measures
of morpho-syntactic development).
Another advantage that we would like to point out is that it is possible to
conduct both quantitative and qualitative analyses using LOPPa, as opposed to
using formulas of lexical richness, e.g. D or TTR. The procedure of LOPPa is
to first provide a quantitative result, i.e. the division of the lemmas into bands.
In a second phase, it is possible to make an in-depth analysis of the words actu-
ally used, by looking at the lists provided by the program. This is possible for a
whole data set as well as for individual learners. By making such a thorough
analysis it is also possible to continuously improve the method by analyzing the
words that appear in the off-list for instance. It is plausible that new cognates
and words belonging to thematic vocabulary will appear in the off-list when
new data is used in the program. We also believe that the method could be used
for pedagogical purposes, for example in order to assess learners’ lexical richness
in oral production. Teachers could use the basic/advanced word list as a point
of reference in vocabulary teaching. The method is also suitable for self-assess-
ment, if learners are given the possibility to analyze their own production with-
in a specific course component at higher levels of education.
It has to be admitted that there are some limitations to the method at this
stage of our research. One of the limitations concerns the fact that it is oriented
towards learners with Swedish as their L1 and French or Italian as their L2 (and
also taking into account that English is an additional second language for all
learners). This certainly limits the number of potential users. However, given the
detailed description of the elaboration of the method provided in this paper,
it can readily be adapted for use with other languages. Another lim-
itation is that the method is most suitable for oral data. As we have discussed else-
where, it is preferable to compare learner data to the same type of data in the tar-
get language, as word frequency may differ between oral and written language.
There are also some important drawbacks of this kind of measure of lex-
ical richness in general. One is that it only taps formal aspects of word knowl-
edge. Deep knowledge of vocabulary is not accounted for, e.g. use of words with
multiple meanings or use of multi-word units (cf. Nation, 2006; Cobb, this vol-
ume). Another aspect that remains ignored is non-targetlike use of
target language forms. Possible solutions to these problems will be discussed in
the following section.
There are several aspects that must be learned in order to achieve complete
knowledge of a word: form (spoken and written, i.e. pronunciation and
spelling), word structure (morphology), syntactic pattern of the word in a
phrase and sentence, meaning (referential – including multiplicity of meaning
and metaphorical extensions of meaning; affective – the connotation of the
word; pragmatic – the suitability of the word in a particular situation), lexical
relations of the word with other words (e.g. synonymy, antonymy, hyponymy)
and collocations. All these aspects can be more or less well known. The more
advanced a learner, the more aspects of a word are likely to be known, and the
more developed are the different aspects, for example, more meanings of a hom-
ograph are known, more synonyms, more collocations and idiomatic expres-
sions are mastered (Laufer, 1997, p.141).
Qualitative knowledge about the single word is sometimes referred to as
depth. In his attempt to pinpoint what researchers have in mind when investi-
gating depth of knowledge, Read (2004) distinguishes three approaches to
vocabulary learning in the literature: comprehensive word knowledge, precision of
meaning and network knowledge. According to the first approach, depth covers
different types of knowledge of a word, like those indicated by Laufer (1997, p.
141), all of which, if they are fulfilled, can be called comprehensive word knowl-
edge. With precision of meaning, Read (2004, p. 211) refers to “the difference
between having a limited, vague idea of what a word means and having much
more elaborated and specific knowledge of its meaning”. It seems problematic
to establish a criterion for precise knowledge. Typically, the criterion is that of
the adult native speaker. However, as Read (2004, p. 213) points out, “knowl-
fait is an off-list word. Treating these words separately means that the number
of words categorized as highly frequent will rise, although this may not corre-
spond to the frequency of the whole expression in the target language input. In
order to account for the frequency of multi-word units, we would have to find
a way to integrate them in the frequency lists. It is encouraging to see that work
in this direction has started for English (Cobb, this volume; Martinez &
Schmitt, 2012). However, considering our approach in the LOPPa framework,
we find it pertinent to include multi-word units that are cognates (Wolter &
Gyllstad, 2011) or thematic in the basic and the advanced vocabulary lists.
How could this be accomplished within the LOPPa framework? Every
multi-word unit present in the corpus to be analyzed must be tagged as a unit
in order to make it appear as a unit and not as several different words. This
would lead to a non-match with the baseline corpora, if they are not tagged in
exactly the same way, and consequently the multi-word units would end up in
the off-list among the low-frequency advanced words. If the aim is to get a pic-
ture of the role of frequency for vocabulary learning, as in the LFP, one must
make them appear in the frequency bands they actually belong to, and in order
to do this the actual frequency of the multi-word units must be looked up in
the corpora used as baseline data. Of course, the same goes for the multiple
meanings of words. Words occurring in the baseline corpora must be sorted into
frequency bands on the basis of the meaning they have in context.
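Such a tagging step might look as follows; the underscore convention and the greedy longest-match strategy are our own assumptions, and, as noted above, the baseline corpora would need exactly the same treatment:

```python
def tag_multiword_units(tokens, units):
    # Join known multi-word units into single tokens, longest match
    # first, e.g. ['tout', 'à', 'fait'] -> 'tout_à_fait'.
    unit_seqs = sorted((u.split() for u in units), key=len, reverse=True)
    out, i = [], 0
    while i < len(tokens):
        for seq in unit_seqs:
            if tokens[i:i + len(seq)] == seq:
                out.append("_".join(seq))
                i += len(seq)
                break
        else:  # no unit starts at this position
            out.append(tokens[i])
            i += 1
    return out
```

Once both learner data and baseline corpus are tagged this way, a multi-word unit can be assigned to the frequency band its corpus frequency actually warrants.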
Another important aspect, which is not accounted for in lexical profiling
analyses, is the use of words that do not exist in the TL. In fact, non target-like
word forms and non target-like use of words (although correct at the formal
level) represent an important aspect of vocabulary knowledge. Our main focus
thus far has been on the vocabulary use by relatively advanced learners, but ear-
lier research has shown that cross-linguistic influence occurs more frequently at
the earlier stages of development (Lindqvist, 2009; Williams & Hammarberg,
2009 [1998]). It is important to integrate this aspect when analyzing the lexical
profile of learners. Moreover, as noted above, Read (2000) considers that the
proportion of errors is one aspect of lexical richness.
Non target-like use can be instances of code-switching, lexical inventions
or other deviant forms of words in the TL (Bardel & Lindqvist, 2007; Dewaele,
1998; Williams & Hammarberg, 2009 [1998]). Vocabprofile gives the instruc-
tion to remove code-switches and other deviant forms, and this was also done
in the Laufer and Nation (1995) study. We followed this methodology in the
LOPPf/a analyses. The main reason is that, had they been kept, words
belonging to a language other than the TL would end up in the off-list, thus
adding to the proportion of advanced words. However, in our view, code-
switches are also part of the learner’s vocabulary, and have something to say
about the level of vocabulary proficiency. Moreover, the fact that a learner uses
7. Conclusions
As we have shown, several efforts have been made within the project Aspects of
the advanced L2 learner’s lexicon, to create and improve a tool for lexical profil-
ing of Swedish L2 learners’ oral production of French and Italian. In a number
of steps we have improved our original method LOPP, but there are still many
things to develop further. In addition to the ideas put forward in this chapter, given
that the method is now only available to the research group, an important step
forward would be to make the method and the data accessible to other users by
providing a user-friendly interface.
References
Bardel, C., Gudmundson, A., & Lindqvist, C. (2012). Aspects of lexical sophistication
in advanced learners’ oral production: Vocabulary acquisition and use in L2 French
and Italian. Studies in Second Language Acquisition, 34(2), 269-290.
Bardel, C. & Lindqvist, C. (2007). The role of proficiency and psychotypology in cross-
linguistic influence. A study of a multilingual learner of Italian L3. In M. Chini,
P. Desideri, M.E. Favilla & G. Pallotti (Eds.), Atti del XI congresso internazionale
dell’Associazione italiana di linguistica applicata. Napoli 9-10 febbraio 2006 (pp.
123-145). Perugia: Guerra.
Bardel, C. & Lindqvist, C. (2011). Developing a lexical profiler for spoken French and
Italian L2: The role of frequency, cognates and thematic vocabulary. In L. Roberts,
G. Pallotti, & C. Bettoni (Eds.), EUROSLA yearbook 11 (pp. 75-93). Amsterdam:
Benjamins.
Bartning, I. & Schlyter, S. (2004). Itinéraires acquisitionnels et stades de développe-
ment en français L2. Journal of French Language Studies, 14(3), 281-289.
Bensoussan, M. & Laufer, B. (1984). Lexical guessing in context in EFL reading com-
prehension. Journal of Research in Reading, 7(1), 15-32.
Campione, E., Véronis, J., & Deulofeu, J. (2005). The French corpus. In E. Cresti, &
M. Moneglia (Eds.), C-ORAL-ROM: Integrated reference corpora for spoken romance
languages (pp. 111-133). Amsterdam: Benjamins.
Cobb, T. & Horst, M. (2004). Is there room for an academic wordlist in French? In P.
Bogaards, & B. Laufer (Eds.), Vocabulary in a second language: Selection, acquisition,
and testing (pp. 15-38). Amsterdam: Benjamins.
Codd, E. F. (1970). A relational model of data for large shared data banks.
Communications of the ACM, 13(6), 377-387.
Coxhead, A. (2000). A new academic word list. TESOL Quarterly, 34(2), 213-238.
Cresti, E. & Moneglia, M. (2005). C-ORAL-ROM: Integrated reference corpora for spo-
ken romance languages. Amsterdam: Benjamins.
Daller, H., Van Hout, R., & Treffers-Daller, J. (2003). Lexical richness in the sponta-
neous speech of bilinguals. Applied Linguistics, 24(2), 197-222.
De Mauro, T., Mancini, F., Vedovelli, M., & Voghera, M. (1993). Lessico di frequenza
dell’italiano parlato (1st ed.). Milano: Etaslibri.
Dewaele, J. (1998). Lexical inventions: French interlanguage as L2 versus L3. Applied
Linguistics, 19(4), 471-490.
Guiraud, P. (1954). Les caractéristiques statistiques du vocabulaire. Paris: Presses Universitaires
de France.
Horst, M. & Collins, L. (2006). From faible to strong: How does their vocabulary
grow? Canadian Modern Language Review, 63(1), 83-106.
Jones, A., Stephens, R., Plew, R. R., Garrett, B., & Kriegel, A. (2005). SQL functions
programmer’s reference (programmer to programmer). Indianapolis: Wiley Pub.
Laufer, B. (1997). The lexical plight in second language reading: Words you don’t know,
words you think you know, and words you can’t guess. In J. Coady & T. N.
Huckin (Eds.), Second language vocabulary acquisition: A rationale for pedagogy (pp.
20-34). Cambridge: Cambridge University Press.
Laufer, B. & Nation, P. (1995). Vocabulary size and use: Lexical richness in L2 written
production. Applied Linguistics, 16(3), 307-322.
Lindqvist, C. (2009). The use of the L1 and the L2 in French L3: Examining cross-lin-
guistic lexemes in multilingual learners’ oral production. International Journal of
Multilingualism, 6(3), 281-297.
Lindqvist, C. (2010). La richesse lexicale dans la production orale de l’apprenant avancé
de français. Canadian Modern Language Review, 66(3), 393-420.
Lindqvist, C., Bardel, C., & Gudmundson, A. (2011). Lexical richness in the advanced
learner’s oral production of French and Italian L2. IRAL, 49(3), 221-240.
Linell, P. (2005). The written language bias in linguistics. London: Routledge.
Malvern, D. D., Richards, B. J., Chipere, N., & Durán, P. (2004). Lexical diversity and lan-
guage development: Quantification and assessment. Basingstoke: Palgrave Macmillan.
Martinez, R. & Schmitt, N. (2012). A phrasal expression list. Applied Linguistics, 33(3),
299-320.
McCarthy, M. (1998). Spoken language and applied linguistics. Cambridge: Cambridge
University Press.
McCarthy, P. M. & Jarvis, S. (2007). Vocd: A theoretical and empirical evaluation. Language
Testing, 24(4), 459-488.
Meara, P. (2009). Connected words: Word associations and second language vocabulary
acquisition. Amsterdam: Benjamins.
Milton, J. (2007). Lexical profiles, learning styles and the construct validity of lexical
size tests. In H. Daller, J. Milton, & J. Treffers-Daller (Eds.), Modelling and assess-
ing vocabulary knowledge (pp. 47-58). Cambridge: Cambridge University Press.
Nation, P. (2006). How large a vocabulary is needed for reading and listening? The
Canadian Modern Language Review 63(1), 59-82.
Ovtcharov, V., Cobb, T., & Halter, R. (2006). La richesse lexicale des productions orales:
Mesure fiable du niveau de compétence langagière. The Canadian Modern Language
Review, 63(1), 107-125.
Read, J. (2000). Assessing vocabulary. Cambridge: Cambridge University Press.
Read, J. (2004). Research in teaching vocabulary. Annual Review of Applied Linguistics,
24, 146-161.
Schmid, H. (1994). Probabilistic part-of-speech tagging using decision trees.
International Conference on New Methods in Language Processing, Manchester, UK.
Schmid, H. (1995). Improvements in part-of-speech tagging with an application to
German. Proceedings of the ACL SIGDAT-Workshop, Dublin, Ireland. 1-9.
Schneider, S. (2008). BADIP. Retrieved 10/10, 2008, from http://languageserver.uni-
graz.at/badip/badip/home.php
Scott, M. (2004). WordSmith tools version 4. Oxford: Oxford University Press.
Tidball, F., & Treffers-Daller, J. (2008). Analysing lexical richness in French learner lan-
guage: What frequency lists and teacher judgment can tell us about basic and
advanced words. French Language Studies, 18(3), 299-313.
Vermeer, A. (2001). Breadth and depth of vocabulary in relation to L1/L2 acquisition
and frequency of input. Applied Psycholinguistics, 22(2), 217-234.
Vermeer, A. (2004). The relation between lexical richness and vocabulary size in Dutch
L1 and L2 children. In P. Bogaards & B. Laufer (Eds.), Vocabulary in a second language: Selection, acquisition, and testing (pp. 173-189). Amsterdam: Benjamins.
Williams, S. & Hammarberg, B. (2009 [1998]). Language switches in L3 production:
Implications for a polyglot speaking model. In B. Hammarberg (Ed.), Third lan-
guage acquisition (pp. 28-73). Edinburgh: Edinburgh University Press.
Wilton, P. & Colby, J. W. (2005). Beginning SQL. Indianapolis: Wiley.
Wolter, B. & Gyllstad, H. (2011). Collocational links in the L2 mental lexicon and the
influence of L1 intralexical knowledge. Applied Linguistics, 32(4), 430-449.
Wray, A. (2002). Formulaic language and the lexicon. Cambridge: Cambridge University
Press.
Lexical properties in the writing of foreign
language learners over eight years of study:
single words and collocations
Tami Levitzky-Aviad and Batia Laufer
University of Haifa, Israel
Lexical proficiency has been defined and researched in terms of lexical knowl-
edge, use and fluency. Different studies have shown that use of vocabulary in a
foreign language (or L2) develops more slowly than vocabulary knowledge,
either passive or active. However, many studies of free production compared
learners of two or three proficiency levels and examined single words, not multi-
word units, even though the latter are characteristic of idiomatic language, and
should be considered a component of lexical use.
The data for the present study was collected as part of the on-going compilation
of an Israeli learner corpus of written English. The data was analyzed to exam-
ine progress in vocabulary use over 8 years of learning, starting with students at
the end of elementary school (grade 6) and ending with English majors at the
university. The passages were compared on lexical richness – the proportion of
frequent to non-frequent vocabulary, on lexical variation – type token ratio, and
on the number of collocations. A total of 290 essays (200 words each) were ana-
lyzed using the VocabProfile, a software program that calculates the percentage
of a text’s words at different frequency levels and provides the text’s type-token
ratio. Significant increases in the use of infrequent vocabulary and collocations
were found only with the university students. A significant increase in lexical
variation was found at the end of high school. The lack of substantial progress
during school years, on the one hand, and the significant progress during the
one year at university, on the other hand, corroborate previous research. In light
of this limited progress, recommendations are made for further investigations
into the effect of different pedagogical approaches to the teaching of foreign lan-
guage vocabulary.
1. Introduction
The goal of the present study is to examine the development of several ‘active’
lexical dimensions across eight years of learning English. More specifically, the
study aims at investigating developments in active vocabulary knowledge and in
three dimensions of vocabulary use: variation, richness and the use of colloca-
tions. Vocabulary is a clear indicator of how well foreign language (FL) learners
can communicate (Lewis, 1997; Widdowson, 1989). Effective vocabulary use
in writing has been found to have a positive influence on measures of the qual-
ity of writing and on one’s general language level (e.g. Lee, 2003; Llach &
Gallego, 2009; Morris & Cobb, 2004). Also, language learners themselves men-
tion vocabulary as a crucial aspect in writing (Leki & Carson, 1994; Polio &
Glew, 1996). It is therefore not surprising that research interest in the impor-
tance of vocabulary for writing in a foreign language is growing.
To understand the relationship between vocabulary and writing, we will
first explain several key terms in lexical research: lexical knowledge vs. lexical
use; depth, breadth and strength of knowledge; passive and active vocabulary
knowledge; recall and recognition; lexical variation and lexical richness; and col-
locations. We will then refer to available research on vocabulary and writing,
first for single words, then for collocations.
Vocabulary acquisition can be discussed in terms of both ‘lexical knowl-
edge’ and ‘lexical use’. Lexical knowledge is the information about the word that
learners have stored in their mental lexicons, while lexical use is the manifesta-
tion of this knowledge in real-time production (Laufer, 2005; Laufer &
Goldstein, 2004). This distinction implies that lexical knowledge in a foreign
language is typically more advanced than lexical use, because not all words
stored in learners’ mental lexicons are necessarily activated and used in free writ-
ing (Laufer, 1991).
Vocabulary knowledge can be assessed qualitatively, in terms of ‘depth’ of
knowledge, and quantitatively in terms of ‘breadth’ of knowledge and
‘strength’ of knowledge. Depth of knowledge refers to the degree of acquain-
tance with the various form and meaning components of a given lexical entry
(e.g. its morphological structure, its grammatical or lexical patterns, and its
relations with other lexical items) (Richards, 1976). Breadth of knowledge
refers to vocabulary size, i.e. the quantity of lexical entries stored in one’s
mental lexicon. In measuring vocabulary size, a word is considered ‘known’
when the correct meaning is associated with the correct word form. However,
form-meaning associations can take different forms, reflecting different
parameters according to which strength of knowledge is assessed (Laufer,
Elder, Hill, & Congdon, 2004; Laufer & Goldstein, 2004). These parameters
have been defined along the active-passive and recall-recognition distinctions
of meaning-form relationships. (More details on how the distinctions were operationalized are provided in the ‘Measurement tools’ section.) The first
distinction implies that there is a difference in knowledge between people
who can retrieve the FL word form in order to convey a certain meaning
(‘active’ knowledge) and those who cannot do this, but can retrieve the mean-
ing once the FL word is presented to them (‘passive’ knowledge). The second
distinction implies that there is a difference between those who can recall the
form or the meaning of a word and those who cannot do this, but can recog-
nize the form or meaning in a set of options. Four modalities of strength of
knowledge thus emerge from these distinctions: active recall, passive recall,
active recognition and passive recognition. Of these, active recall is the hard-
est to achieve, and therefore represents the strongest degree of knowledge, fol-
lowed by passive recall, active recognition and passive recognition, respective-
ly (Laufer & Goldstein, 2004). In sum, strength of knowledge is a combina-
tion of four aspects of knowledge of meaning that constitute a hierarchy of
difficulty: passive recognition (easiest), active recognition, passive recall, and
active recall (hardest).
Lexical ‘variation’ and lexical ‘richness’ are two quantitative measures of
vocabulary use. Variation (or ‘diversity’) is a measure of the number of different
words (types) used, or, more specifically, the type-token ratio (TTR). ‘Richness’,
on the other hand, is the proportion of low-frequency words in a piece of writ-
ing (Laufer, 1994; Laufer & Nation, 1995).
Phraseological analyses suggest that at least one-third to one-half of lan-
guage is composed of multi-word units (MWU) (Erman & Warren, 2000; Hill,
2000). They are retrieved faster than individual lexical items, indicating perhaps
that certain phrases are stored and retrieved as a whole (Erman, 2007; Schmitt,
Grandage, & Adolphs, 2004; Wray, 2002). There also seems to be a processing
advantage for formulaic sequences, at least in reading (Underwood, Schmitt &
Galphin, 2004). Therefore, a good knowledge of formulaic language is advan-
tageous for language learners and users.
Though there are several kinds of MWUs, we focused on the knowledge
and use of lexical collocations (henceforth, ‘collocations’) as it was shown to be
one possible indicator of native-like competence (Howarth, 1998; Hill, 2000).
We have adopted Nesselhauf’s (2003) definition of collocations as word combinations in which one of the words (the ‘base’ or headword) retains its independent meaning, while the meaning of the other word (the ‘collocate’) is restricted to the specific context, so that the collocate can be used with only some semantically related headwords (though not with all of them). The combinations chosen for
investigation in the present research were thus only MWUs which were found
compatible with this definition. These included examples such as ‘make a deci-
sion’ or ‘heavy rain’, but not combinations such as ‘eat breakfast’ or ‘play ball’.
Active vocabulary has been found to be (i) smaller in size, (ii) develop more
slowly (Laufer, 1998; Laufer & Goldstein, 2004; Nemati, 2010) and (iii) decay
faster (Schneider, Healy, & Bourne, 2002) than passive vocabulary. Accordingly,
as mentioned earlier, the most advanced degree of knowledge has been found to
be active recall, followed by passive recall, active recognition and passive recog-
nition, respectively (Laufer & Goldstein, 2004). Test results on progress in for-
2. The study
1,000 words (Nation, 2006). In the VST, each of these levels is represented by a
sample of 20 words. Hence, the VST tests people’s knowledge of a total of 140 items
which represent the above mentioned 7,000 word families. As part of the VST,
test-takers show their understanding of each English word tested by choosing the correct synonym or definition of the word from four options.
Though based on the VST, the test used for the current study was a bilin-
gual test. Since the groups which were compared included beginners and low
level learners, a bilingual test was considered more appropriate than a monolin-
gual test. Additionally, while the VST tests passive knowledge, or, more specif-
ically, passive recognition (since learners choose the correct paraphrase of the
target item), the test designed for the purpose of the present research tested
active knowledge.
The other test upon which our test was modelled is the CATSS. The spe-
cific feature of CATSS, in addition to testing words at different frequency lev-
els, is that it tests the four modalities of strength of knowledge from strongest
to weakest (see section 1): active recall, passive recall, active recognition and pas-
sive recognition. The test proceeds as follows: In the first modality (active
recall), a prompt appears on screen, which is the L1 translation of the target
word. The first letter of the target English word is also provided and the test-
taker needs to use this letter and type the English equivalent. Words known in
this modality are not tested again in subsequent modalities. Representing the
hardest, hence strongest degree of knowledge, each correct answer accounts for
1 point of the final CATSS score. In the second modality (passive recall), the
English target word appears on screen for the test-taker to translate into the L1.
Words known in this modality are not tested again. Each correct answer
accounts for 0.75 points of the final CATSS score. In the third modality (active
recognition), the test-taker needs to choose the correct English equivalent for
the L1 word out of four English options. Words known at this modality are not
tested again. Each correct answer accounts for 0.5 points of the final CATSS
score. In the last modality (passive recognition) the test-taker needs to choose
the correct L1 equivalent for the English target word out of four L1 options.
Representing the ‘weakest’ degree of knowledge, a correct answer at this modal-
ity receives 0.25 points of the final CATSS score. Words not known in any of
the four modalities receive zero points in the final score. The items tested pro-
ceed from frequent to less frequent. Hence, the final CATSS score has been
claimed to represent both size and strength of knowledge as it takes into account
not only the number of words test-takers know, but also the ‘way’ in which these
words are known (Laufer et al., 2004; Laufer & Goldstein, 2004).
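The weighted scoring scheme described above amounts to simple arithmetic, sketched below (a minimal illustration with names of our own choosing, not the actual CATSS software):

```python
# Sketch of the CATSS weighted scoring scheme described above.
# Each word is credited at the strongest modality in which it is known:
# active recall = 1, passive recall = 0.75, active recognition = 0.5,
# passive recognition = 0.25; a word unknown in all four scores 0.
CATSS_WEIGHTS = {
    "active_recall": 1.0,        # hardest, hence strongest degree of knowledge
    "passive_recall": 0.75,
    "active_recognition": 0.5,
    "passive_recognition": 0.25,
    None: 0.0,                   # not known in any of the four modalities
}

def catss_score(strongest_modalities):
    """Sum the weights of the strongest modality reached for each test item."""
    return sum(CATSS_WEIGHTS[m] for m in strongest_modalities)

# Two words known by active recall, one by passive recall, one not known:
print(catss_score(["active_recall", "active_recall", "passive_recall", None]))  # 2.75
```

Because each word is tested only until it is answered correctly, recording the strongest modality per word is all that is needed to compute the final score.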
Modelled upon CATSS, the test designed for the present study also takes
into account different strength modalities, yet with several modifications. While
CATSS tests both passive and active knowledge, the test in this study tests only
active knowledge (hereafter referred to as ACATSS). Another feature distin-
guishing ACATSS from the original CATSS is that the Hebrew (L1) prompt
words in the ACATSS do not appear in isolation, but rather in between two
asterisks within a Hebrew sentence. The decision to present the word within a
sentence was made so as to avoid ambiguity in cases of polysemy of the Hebrew
words. Such an approach also follows the model used in the VST.
In the ACATSS, the learners’ task is to provide the English equivalent of
the word in asterisks. To do so, the test includes three cycles: two for testing
active recall and one for testing active recognition.
First, the target item is tested for active recall without any cues, to mirror
a real life situation of independent writing. This is demonstrated in the follow-
ing example, where the target word is ‘lake’ and the Hebrew sentence means:
This *lake* is nice. The instructions for the test were given in both English and
Hebrew so that young learners could also clearly understand what they were
expected to do.
Example: cycle 1
Translate the words in *asterisks* into English:
A word known in this cycle is not tested again. If it is not known, it is tested
again in the second cycle. Here too active recall is tested, but now with the first
letter of the English word provided. Whereas in cycle 1 learners may provide a non-target word which nevertheless fits the context, the first letter in cycle 2 limits word choice and directs the learners towards the target word.
Example: cycle 2
Translate the words in *asterisks* into English
(use the first letter of the English word as provided for you):
l
Based on the assumption that words known in active recall would also be
known in active recognition (Laufer et al., 2004; Laufer & Goldstein, 2004),
only words which were not known in either one of the active recall stages are
tested again for active recognition. In this third cycle, learners are presented
with four English words of which they are asked to choose the correct equiva-
lent for the Hebrew word in asterisks. The distracters in the recognition stage
were sampled from the same frequency level as the English target word to elim-
inate the effect that word frequency might have on the choice of the response.
Example: cycle 3
Circle the correct translation for each of the words in *asterisks*:
a. tale b. rhythm c. lake d. lawn
Once all 20 words at one frequency level are tested, the test moves on to the
next frequency level. A word scores 1 point if known in the first cycle (active
recall with no cue), 2/3 if known in the second cycle (active recall with a cue),
1/3 in the third cycle (active recognition) and 0 for lack of any knowledge.
The total score for each frequency level is calculated by adding up the scores
learners receive for the 20 words. The total scores of all seven frequency lev-
els are then summed up to provide one total ACATSS score. As in the VST,
since the 140 words tested in the ACATSS represent a vocabulary size of
7,000 word families, the total ACATSS score can be multiplied by 50 to pro-
vide an indication of active vocabulary size as affected by the strength modal-
ities tested.
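The cycle weighting and the size estimate described above can be sketched as follows (our own illustration, with invented names, not the actual ACATSS software):

```python
from fractions import Fraction

# ACATSS scoring as described above: 1 point for cycle 1 (active recall,
# no cue), 2/3 for cycle 2 (active recall with the first letter), 1/3 for
# cycle 3 (active recognition), 0 when the word is unknown in all cycles.
CYCLE_SCORES = {1: Fraction(1), 2: Fraction(2, 3), 3: Fraction(1, 3), None: Fraction(0)}

def acatss_total(earliest_cycle_passed):
    """earliest_cycle_passed: for each of the 140 items, the first cycle
    (1-3) answered correctly, or None if the item was never answered."""
    return sum(CYCLE_SCORES[c] for c in earliest_cycle_passed)

def estimated_active_size(total_score):
    # The 140 items sample 7,000 word families, so each score point
    # stands for 7000 / 140 = 50 families.
    return float(total_score) * 50

# A learner passing 60 items in cycle 1, 30 in cycle 2, 30 in cycle 3:
total = acatss_total([1] * 60 + [2] * 30 + [3] * 30 + [None] * 20)
print(float(total), estimated_active_size(total))  # 90.0 4500.0
```

Exact fractions avoid the rounding noise that accumulating 2/3 and 1/3 in floating point would introduce over 140 items.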
Three scores were obtained with the VocabProfile. Following the distinc-
tion between the first 2000 words (k1-k2) as the most frequent words and the
beyond-2000 levels (k3-k20) as the low frequency words (Nation & Kyongho,
1995), we first added up the percentages of k3-k20 to obtain the general per-
centage of the low frequency vocabulary in the passages. The score obtained
was thus considered an indication of how ‘rich’ the piece of writing was.
However, since some of the learners whose essays were sampled for the research
were at the very early stages of EFL learning, we also separated the percentages
of the 1st and the 2nd 1000 words. Additionally, the TTR obtained with the
VocabProfile program was taken as a measure of lexical variation in writing. Finally, the total number of different verb-
noun and adjective-noun collocations was used to examine their prevalence in
the written samples.
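The arithmetic behind these scores can be illustrated with a short sketch (our own illustration, not the VocabProfile program itself; the tiny frequency bands are invented):

```python
def band(word, k1, k2):
    """Assign a word token to the k1, k2, or beyond-2000 (k3-k20) band."""
    if word in k1:
        return "k1"
    if word in k2:
        return "k2"
    return "k3-k20"

def lexical_scores(tokens, k1, k2):
    """Return (richness, k2 share, TTR), each as a percentage of tokens."""
    n = len(tokens)
    bands = [band(w, k1, k2) for w in tokens]
    richness = 100 * bands.count("k3-k20") / n  # % of low-frequency tokens
    k2_share = 100 * bands.count("k2") / n      # % of 2nd-1000 tokens
    ttr = 100 * len(set(tokens)) / n            # type-token ratio
    return richness, k2_share, ttr

k1 = {"the", "rain", "was", "and"}   # invented mini frequency bands
k2 = {"heavy"}
text = "the rain was heavy and the wind was fierce".split()
print(lexical_scores(text, k1, k2))
```

In the toy text, "wind" and "fierce" fall outside the two invented bands, so 2 of the 9 tokens count towards richness, while the repeated "the" and "was" lower the type-token ratio.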
Four sets of one-way ANOVAs and post-hoc tests were used to compare
learners at different points of learning on each of the four dimensions of lexical
proficiency: size and strength of active vocabulary knowledge, richness, varia-
tion and the use of collocations.
Pearson correlations were then used to test whether the improvements in
each of the lexical dimensions over the years correlate with each other.
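As a reminder of the statistic involved, a Pearson product-moment correlation can be computed as below (a self-contained sketch with toy data, not the analysis software used in the study):

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two lists of scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy example: two perfectly linearly related sets of gains give r = 1:
print(round(pearson_r([1, 2, 3, 4], [10, 20, 30, 40]), 6))
```

A value near 1 or -1 indicates a strong linear relationship between improvements on the two dimensions; a value near 0 indicates none.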
2.5. Results
Our first research question addressed the developments in each of the dimen-
sions of lexical proficiency. Tables 1-10 show the results for each dimension.
As noted in section 2.2, the written data analyzed in the present study consist-
ed of the 290 passages written by school-aged students in grades six to twelve
and by first year university English majors. However, the ACATSS results were
only obtained for 101 of these students. Thus, tables 1 and 2, showing the results for active knowledge, refer only to students in grades 6, 9 and 12 and the university students at the beginning of their first year in university. Tables 3-10
then show the results for the different measures of vocabulary use in the writ-
ten passages for all the school grades tested and for the university students at the
beginning and at the end of their first year.
2.5.1. RQ 1a: What developments occur in the size and strength of active vocabu-
lary knowledge of English words during the years of formal English learning?
Table 1 presents the means of the raw scores for each of the English learning
stages tested by the ACATSS. Table 2 shows the significance of differences
between the different pairs of learning stages. As noted in section 2.2, only 101
of the 290 students were tested with the ACATSS. Accordingly, the results in
tables 1 and 2 only refer to these students. Table 1 shows that the mean
ACATSS scores increase at each learning stage; table 2 shows that the differences
between all pairs of stages are statistically significant.
2.5.2. RQ 1b: What developments occur in the lexical richness of learners’ written
samples during the years of formal English learning?
Table 3 presents the mean proportions of k3-k20 words in the written samples.
Table 4 shows the significance of differences in these proportions between all of
the different pairs of learning stages. Table 5 presents the mean proportions of
k2 words in the written samples. Table 6 shows the significance of differences
in these proportions between all of the different pairs of learning stages.
Table 3 shows a general increase across the learning stages represented by
school/university years in the mean proportion of k3-k20 words in the written
samples, despite some slight decreases between some of the learning stages (e.g.,
grade 9 – 3.84%, grade 10 – 3.65%). However, as shown in table 4, in school
years all these changes appear to be statistically insignificant. In other words, in
the six years between the end of elementary school (grade 6) and the end of
high-school there are no statistically significant increases in the use of low fre-
quency words of k3-k20. Statistically significant improvements occur between
each of the school grades 6-12 and the English majors at the end of their 1st
year in the English department and between each of the school grades 6-10 and
the English majors at the beginning of their first year. Another significant
improvement occurs in the one year of English studies at the English depart-
ments in the college or university.
Table 3. Mean proportions (in %) of k3-k20 words in the written samples (n=290 learners)
Learning Stage N Min (%) Max (%) Mean (%) SD
Grade 6 15 1.5 5.45 3.24 1.20
Grade 7 21 .99 5.37 2.85 1.11
Grade 8 35 1 6.40 3.28 1.54
Grade 9 30 .98 6.86 3.84 1.62
Grade 10 39 0 8.16 3.65 1.78
Grade 11 36 .51 7.92 4.04 1.80
Grade 12 39 .50 8.54 4.17 1.78
Eng. Majors- beginning 36 1.49 12.75 5.48 2.74
Eng. Majors-end of 1st year 39 .50 16.58 7.75 3.37
Table 5 shows a general increase in the use of k2 words. Table 6 shows that sig-
nificant increases in the use of these words occur already during school years
between each of the grades 6-10 and grade 12. Statistically significant improve-
ments also occur between each of the school grades 6-10 and the two universi-
ty stages.
Table 5. Mean proportions (in %) of k2 words in the written samples (n=290 learners)
Learning Stage N Min (%) Max (%) Mean (%) SD
Grade 6 15 2.5 7.35 4.55 1.40
Grade 7 21 1.46 8.37 4.63 2.06
Grade 8 35 1.95 8.29 5.13 1.83
Grade 9 30 0 10.26 4.82 2.64
Grade 10 39 1.46 9.80 5.34 2.88
Grade 11 36 .50 11.50 5.79 2.99
Grade 12 39 1.99 12.56 7.25 2.58
Eng. Majors- beginning 36 2.49 13.93 7.27 3.18
Eng. Majors-end of 1st year 39 2.42 18.65 7.37 3.22
2.5.3. RQ 1c: What developments occur in the lexical variation in learners’ written
samples during the years of formal English learning?
Table 7 presents the mean type-token ratio reflecting lexical variation, i.e., the
percentage of different words in the text. Table 8 shows the significance of differences between all the different pairs of EFL learning stages in regard to the
type-token ratios.
Table 7 shows a general increase in the type-token ratios in the writing samples,
despite some slight decreases which occasionally occur (e.g., grade 6 – 50.98%,
grade 7 – 49.78%). The only statistically significant differences, however (table
8) are between each of the grades 6-11 and grade 12 and between each of the
grades 6-11 and each of the university stages.
2.5.4. RQ 1d: What developments occur in the use of collocations in the learners’
written samples during the years of formal English learning?
Table 9 presents the raw means of different (not repeated) verb-noun and adjec-
tive-noun collocations found in the learners’ written samples of 200 tokens
each. Table 10 shows the significance of differences between all the different
pairs of EFL learning stages in regard to the use of these collocations.
Table 9 shows a general increase in the use of collocations, despite some
decreases which occur occasionally (e.g., grade 10 – 0.72, grade 11 – 0.42).
However, table 10 demonstrates that the only statistically significant differences
are between each of the school grades (6-12) and the English majors at the end
of their first year and between each of the grades 6-9 and 11 and the English
majors at the beginning of the first year.
Table 9. Raw means of different collocations in the 200-word samples (n=290 learners)
Learning Stage N Min (raw) Max (raw) Mean (raw) SD
Grade 6 15 0 1 0.13 0.35
Grade 7 21 0 2 0.38 0.59
Grade 8 35 0 2 0.23 0.55
Grade 9 30 0 2 0.37 0.61
Grade 10 39 0 5 0.72 1.15
Grade 11 36 0 2 0.42 0.60
Grade 12 39 0 4 0.72 0.94
Eng. Majors- beginning 36 0 7 1.31 1.65
Eng. Majors-end of 1st year 39 0 5 1.56 1.57
Table 11 shows the results of Pearson product moment correlations between the
developments, that is, the mean differences of the various lexical dimensions
over the years. Correlations with the ACATSS were conducted only for the 101
students who took this test. All other correlations were conducted for all 290
students.
Table 11 shows that the improvements in almost all lexical dimensions
over the years correlate significantly with each other. Lack of significant corre-
lation was found only between the results of the progress on the ACATSS and
the progress in the use of collocations.
Table 11. Correlations between the mean differences of the various lexical dimensions

                        Active knowledge    Variation    Richness #1   Richness #2
                        size & strength     (TTR)        (k3-k20)      (k2)
                        (ACATSS)
Variation (TTR)         .380**
Richness #1 (k3-k20)    .207**              .297**
Richness #2 (k2)        .298**              .348**       .316**
Use of collocations     .149                .326**       .222**        .201**

**p<0.01. Correlations with the ACATSS are based on N=101; all other correlations on N=290.
3. Discussion
The main focus of this study was the similarities and differences in the develop-
mental patterns of several dimensions of L2 lexical proficiency over eight years
of study. We will therefore discuss the progress found for each dimension and
compare the development of vocabulary knowledge with that of vocabulary use.
Continuous statistically significant improvements were found in active
knowledge as reflected in the ACATSS scores across all stages of English learn-
ing (see tables 1 and 2). And yet, these significant improvements should also be
considered vis-à-vis what they mean in terms of active vocabulary size and its
growth, and, even more so, in terms of the manifestation of this knowledge in
vocabulary use.
An increase in the size of knowledge suggests that there is an increase in the
amount of low-frequency words learners know. We can therefore expect that at
least those learners who have demonstrated a relatively high command of the
language and are accepted to the English department would also possess knowl-
edge of more lower-frequency words than would the general population of
school-aged students for whom English is not the major area of study. When
multiplying the mean ACATSS score of the first year English majors (see table
1) by 50 to reach the more general estimate of their active vocabulary size (see
section 2.3.1), the figure reached is 2,850 (57 × 50). Hence, despite the statistically significant increase in active vocabulary size from the 12th grade to the beginning of the 1st year in the English department (see table 2), even the
advanced students in the latter group know fewer than 1000 words beyond the
2000 most frequent words in English.
Furthermore, although these figures represent the development in active
knowledge, they do not necessarily reflect a similar vocabulary growth in free
writing. With regards to free writing, the results show a gradual, and some-
times statistically significant, progress in the three dimensions of vocabulary
use we tested: richness, variation and the use of collocations. However, while
active knowledge demonstrated a continuous significant increase throughout
the years, our findings, similar to previous ones (Laufer, 1991; Laufer &
Nation, 1995; Laufer & Paribakht, 1998; Lemmouh, 2010; Leńko-Szymańska, 2002; Muncie, 2002) indicate that six or more years must pass
before students’ ability to put this knowledge into use also significantly
improves. More specifically, a statistically significant improvement in lexical
variation was evident only at the end of high-school (see table 8), whereas sta-
tistically significant improvements in the use of the k3-k20 low-frequency
words were completely lacking during school years and occur only during the
one year of university (see table 4). Lack of significant progress is also evident
in the use of collocations, not only during school years, but also during the one
year of university (see table 10). These results corroborate previous findings
(Laufer & Waldman, 2011; Nesselhauf, 2003; Pawley & Syder, 1983) and pro-
vide a clear indication of the specific difficulty involved in incorporating col-
locations into the writing of even advanced learners. Laufer and Waldman
(2011) explained this difficulty in terms of semantic transparency of colloca-
tions and their difference from L1. As many collocations are easily understood,
they go unnoticed in the input, and as a collocate in an L2 collocation is often
different from L1, learners cannot rely on their L1 and on the knowledge of
the individual words in L2.
The lack of statistically significant improvements in students during the six
earlier school years, as well as the lack of significant progress in the use of col-
locations even during the one advanced year at university, are even more puz-
zling given that richness and variation in vocabulary use can improve even over
the course of a single year at university. Since not all school students eventually
become English majors, some of them may never again study English in a for-
mal setting. It is hard to accept, then, that what school students end up with is
only an active vocabulary size of just over 2000 word families (46 × 50 = 2,300),
and, perhaps, a higher ability to vary the vocabulary they are able to use, with-
out similar increases in the numbers of lower-frequency words or collocations
they use.
A few possible explanations can be provided to account for the discrepan-
cies between vocabulary knowledge and use and for the lack of significant
progress in vocabulary use during earlier school years. One possible assumption
which could have been made is that the nature of vocabulary learning may be
such that active knowledge and use are separate traits of lexical proficiency,
which develop in totally different ways. However, the moderate correlations we
found between vocabulary knowledge and use (see table 11), similar to previous
studies (Laufer & Nation, 1995; Leńko-Szymańska, 2002), point to a different
interpretation of the results. These correlations indicate that, despite the dis-
crepancies between vocabulary knowledge and use, an increase in learners’
active vocabulary knowledge may be moderately reflected in their use of richer
vocabulary. Also, the statistically significant increase in the use of k3-k20 words
during the one year at university suggests that rapid progress in vocabulary use
is possible. Hence, taken together, the significant correlations found between
active vocabulary knowledge and use and the progress in the use of low-frequen-
cy words over the one year of university suggest that the lack of statistically sig-
nificant growth we found in lexical use could be changed.
Therefore, another explanation for the lack of significant progress in
vocabulary use during earlier school years could be the lack of sufficient lan-
guage training and practice during these years, which could result from learn-
ers’ writing strategies, the teaching methods applied and/or the time of expo-
sure to English during school years. Coming up with a word to express a cer-
tain idea in writing requires learners to know more features of that word than
they need when they are asked to provide the word in some controlled setting.
However, due to factors such as the rarity of low frequency words, the arbitrary
nature of collocations or various incongruencies between L1 and L2 colloca-
tions, learners may experience uncertainties regarding the use of such lexical
items and may thus simply refrain from using them (Fan, 2009; Hill, 2000;
Laufer, 1998; Laufer & Waldman, 2011; Nesselhauf, 2003). Instead, they may
resort to using high frequency single words which convey the same, or at least
similar, ideas. This strategy is reinforced by teachers who believe that, for
communication to be effective, it is satisfactory in many cases for foreign
language learners to express their ideas using any appropriate vocabulary.
Unfortunately, such a claim, especially when made by teachers, downplays the
need for sufficient practice of non-basic vocabulary (Laufer, 2005; Nemati,
2010; Milton, this volume) and, consequently, perpetuates stagnation of
vocabulary in free expression. This lack of progress is not something that any
education system should welcome.
To achieve progress, specific and realistic goals need to be set, and effective
teaching methods need to be implemented. Such teaching methods should
involve acknowledging the importance of encouraging FL learners’ use of low-
frequency vocabulary and collocations in their writing. Previous studies have
shown the effectiveness of Form-Focused Instruction (FFI) in activating learn-
ers’ lexical knowledge and putting some of it to use (Laufer, 2005; Laufer, 2010;
Laufer & Girsai, 2008; Lee, 2003; Nesselhauf, 2003; Webb, 2005; Xiao &
McEnery, 2006). Such an approach advocates explicit vocabulary instruction,
either as part of more general communication tasks (Focus on Form-FonF) or
as a goal in itself (Focus on Forms – FonFs). A longitudinal systematic syllabus
of FFI which gradually introduces low-frequency words and collocations and
encourages their use could be a possible solution for enhancing the knowledge
and use of such items at all stages of L2 learning.
Future research could compare the development of EFL vocabulary use in
writing in different educational systems, in different classes or in different con-
trolled experimental conditions. Such comparisons might be useful to show the
effectiveness of different pedagogical approaches for the development of L2
vocabulary use over the years.
Lexical properties in the writing of foreign language learners over eight years of study 145
References
Bahns, J., & Eldaw, M. (1993). Should we teach EFL students collocations? System,
21(1), 101-114.
Cobb, T. (n.d.). Web Vocabprofile: An adaptation of Heatley & Nation’s (1994) Range.
Computer program. Available on-line at http://www.lextutor.ca/vp/
Cobb, T. (2007). The revised frequency lists of k8-k14. Available on-line at
http://www.lextutor.ca/vp/bnc/cobb_6
Coxhead, A. (2000). A New Academic Word List. TESOL Quarterly, 34(2), 213-238.
Davies, M. & Gardner, D. (2010). Word Frequency List of American English. Available on-
line at www.wordfrequency.com
Erman, B. (2007). Cognitive processes as evidence of the idiom principle. International
Journal of Corpus Linguistics, 12(1), 25-53.
Erman, B., & Warren, B. (2000). The idiom principle and the open choice principle.
Text-Interdisciplinary Journal for the Study of Discourse, 20(1), 29-62.
Fan, M. (2009). An exploratory study of collocational use by ESL students: A task-based
approach. System, 37(1), 110-123.
Gitsaki, C. (1999). Second language lexical acquisition: A study of the development of col-
locational knowledge. San Francisco, CA: International Scholars Publications.
Heatley, A. & Nation, P. (1994). Range. Victoria University of Wellington, NZ.
Computer program. Available on-line at http://www.vuw.ac.nz/lals/
Hill, J. (2000). Revising priorities: From grammatical failure to collocational success. In
M. Lewis (Ed.), Teaching Collocation: Further Development in the Lexical Approach
(pp. 47-70). Hove: Language Teaching Publications.
Howarth, P. (1998). The phraseology of learners’ academic writing. In A. P. Cowie
(Ed.), Phraseology: Theory, analysis, and applications (pp. 161-186). Oxford:
Clarendon Press.
Hsu, J. (2007). Lexical collocations and their relation to the online writing of Taiwanese
college English majors and non-English majors. Electronic Journal of Foreign
Language Teaching, 4(2), 192-209.
Kucera, H., & Francis, W. N. (1967). Computational analysis of present-day American
English. Providence, RI: Brown University Press.
Laufer, B. (1991). The development of L2 lexis in the expression of the advanced learn-
er. The Modern Language Journal, 75(4), 440-448.
Laufer, B. (1994). The lexical profile of second language writing: Does it change over
time? RELC Journal, 25 (2), 21-33.
Laufer, B. (1998). The development of passive and active vocabulary in a second lan-
guage: Same or different? Applied Linguistics, 19(2), 255-271.
Laufer, B. (2005). Focus on form in second language vocabulary learning. EUROSLA
Yearbook, 5(1), 223–250.
Laufer, B. (2007). CATSS: The Computer Adaptive Test of Size and Strength. Computer
program. Available on-line at http://hcc.haifa.ac.il/~blaufer/
Laufer, B. (2010). The contribution of dictionary use to the production and retention of
collocations in a second language. International Journal of Lexicography, 24(1), 29-49.
Laufer, B., & Nation, P. (1995). Vocabulary size and use: Lexical richness in L2 written
production. Applied Linguistics, 16(3), 307-322.
Laufer, B., & Paribakht, T. S. (1998). The relationship between passive and active vocab-
ularies: Effects of language learning context. Language Learning, 48 (3), 365-391.
Laufer, B., Elder, C., Hill, K., & Congdon, P. (2004). Size and strength: Do we need
both to measure vocabulary knowledge? Language Testing, 21(2), 202-226.
Laufer, B., & Goldstein, Z. (2004). Testing vocabulary knowledge: Size, strength, and
computer adaptiveness. Language Learning, 54(3), 399-436.
Laufer, B., & Girsai, N. (2008). Form-focused instruction in second language vocabu-
lary learning: A case for contrastive analysis and translation. Applied Linguistics, 29,
694-716.
Laufer, B., & Waldman, T. (2011). Verb-noun collocations in second language writing:
A corpus analysis of learners’ English. Language Learning, 61(2), 647–672.
Lee, S. H. (2003). ESL learners’ vocabulary use in writing and the effects of explicit
vocabulary instruction. System, 31(4), 537-561.
Leki, I., & Carson, J. G. (1994). Students’ perceptions of EAP writing instruction and
writing needs across the disciplines. TESOL Quarterly, 28(1), 81-101.
Lemmouh, Z. (2010). The Relationship among Vocabulary Knowledge, Academic
Achievement and the Lexical Richness in Writing in Swedish University Students of
English. Ph.D. Dissertation, Department of English, Stockholm University.
Leńko-Szymańska, A. (2002). How to trace the growth in learners’ active vocabulary? A
corpus based study. Teaching and Learning by Doing Corpus Analysis: Proceedings of the
Fourth International Conference on Teaching and Language Corpora. Graz (pp. 19-24).
Lewis, M. (1997). Pedagogical implications of the lexical approach. In J. Coady, & T.
Huckin (Eds.), Second language vocabulary acquisition: A rationale for pedagogy (pp.
255-270). Cambridge: Cambridge University Press.
Linnarud, M. (1986). Lexis in composition: A performance analysis of Swedish learn-
ers’ written English. Dissertation Abstracts International. C: European Abstracts, 47
(4), 812.
Llach, M. P. A., & Gallego, M. T. (2009). Examining the relationship between recep-
tive vocabulary size and written skills of primary school learners. ATLANTIS, 31,
129-147.
McIntosh, C., Francis, B., & Poole, R. (Eds.) (2009). The Oxford Collocations
Dictionary. Oxford: Oxford University Press.
Morris, L., & Cobb, T. (2004). Vocabulary profiles as predictors of the academic perform-
ance of teaching English as a second language trainees. System, 32(1), 75-87.
Muncie, J. (2002). Process writing and vocabulary development: Comparing lexical fre-
quency profiles across drafts. System, 30(2), 225-235.
Nation, I.S. P. (2006). How large a vocabulary is needed for reading and listening?
Canadian Modern Language Review/La Revue Canadienne Des Langues Vivantes,
63(1), 59-82.
Nation, I.S.P., & Kyongho, H. (1995). Where would general service vocabulary stop
and special purposes vocabulary begin? System, 23(1), 35-41.
Nation, I.S.P., & Beglar, D. (2007). A vocabulary size test. The Language Teacher, 31(7),
9-13.
Nemati, A. (2010). Active and passive vocabulary knowledge: The effect of years of
instruction. The Asian EFL Journal Quarterly 12(1), 30-46.
Nesselhauf, N. (2003). The use of collocations by advanced learners of English and
some implications for teaching. Applied Linguistics, 24(2), 223-242.
Nesselhauf, N. (2005). Collocations in a learner corpus. Amsterdam: Benjamins.
Pawley, A., & Syder, F. H. (1983). Natural selection in syntax: Notes on adaptive vari-
ation and change in vernacular and literary grammar. Journal of Pragmatics, 7(5),
551-579.
Polio, C., & Glew, M. (1996). ESL writing assessment prompts: How students choose.
Journal of Second Language Writing, 5(1), 35-49.
Richards, J. C. (1976). The role of vocabulary teaching. TESOL Quarterly, 10(1), 77-89.
Schmitt, N., Grandage, S., & Adolphs, S. (2004). Are corpus-derived recurrent clusters
psycholinguistically valid? In Schmitt, N. (ed.), Formulaic Sequences: Acquisition,
Processing and Use (pp. 127-151). Amsterdam: Benjamins.
Schneider, V. I., Healy, A. F., & Bourne L. E. Jr. (2002). What is learned under diffi-
cult conditions is hard to forget: Contextual interference effects in foreign vocab-
ulary acquisition, retention, and transfer. Journal of Memory and Language, 46(2),
419-440.
Summers, D., Mayor, M., & Elston, J. (Eds.), (2006). The Longman Exams Coach.
Essex: Pearson-Longman.
Underwood, G., Schmitt, N., & Galphin, A. (2004). The eyes have it: An eye-move-
ment study into the processing of formulaic sequences. In Schmitt, N. (ed.),
Formulaic Sequences: Acquisition, Processing and Use (pp 153-172). Amsterdam:
Benjamins.
Waldman, T. & Levitzky-Aviad, T. (in preparation). The Israeli Learner Corpus of
Written English (ILcoWE).
Webb, S. (2005). Receptive and productive vocabulary learning: The effects of reading and
writing on word knowledge. Studies in Second Language Acquisition, 27(1), 33-52.
Widdowson, H. G. (1989). Knowledge of language and ability for use. Applied
Linguistics, 10(2), 128-137.
Wray, A. (2002). Formulaic Language and the Lexicon. Cambridge: Cambridge
University Press.
Xiao, R., & McEnery, T. (2006). Collocation, semantic prosody, and near synonymy: A
cross-linguistic perspective. Applied Linguistics, 27(1), 103.
Zhang, X. (1993). English collocations and their effect on the writing of native and non-
native college freshmen. PhD. Dissertation, Indiana University of Pennsylvania.
Automatic extraction of L2 criterial lexico-grammatical features across
pseudo-longitudinal learner corpora: using edit distance and
variability-based neighbour clustering
Yukio Tono
Tokyo University of Foreign Studies
1. Introduction
In SLA, it is becoming increasingly popular to use techniques and resources
developed in the field of corpus linguistics and natural language processing.
The use of learner corpora, systematically sampled collections of learner
speech or writing in a machine-readable format, is rapidly gaining ground
among ELT materials developers, practitioners and SLA researchers (Granger,
1998; Granger, Hung, & Petch-Tyson, 2002). Behind all of this is a growing
awareness that the frequency of items in the input plays an important role in
L1 and L2 acquisition processes (Gries & Divjak, 2012).
According to Goldberg (1995, 2006), the Saussurian concept of a symbolic
unit, that is, a form-meaning pair, applies not only at the level of words
but also to constructions at all levels of linguistic representation, from
morphemes and words to increasingly complex syntactic configurations. This
symbolic unit is acquired through exposure to the
target language in context. I would argue that with the advent of corpus lin-
guistics and natural language processing, SLA researchers should once again
EUROSLA MONOGRAPHS SERIES 2
L2 vocabulary acquisition, knowledge and use, 149-176
Table 1. Possible criterial feature types

Negative grammatical properties of the L2 levels
Definition: Incorrect properties or errors that occur at a certain level or levels, and with a characteristic frequency. Both the presence versus absence of the errors, and the characteristic frequency of error, can be criterial for the given level or levels. E.g. error property P with a characteristic frequency F may be criterial for [B1 and B2].
Example: Errors involving incorrect morphology for determiners, as in Derivation of Determiners (abbreviated DD) She name was Anna (instead of Her name ...), show significant differences in error frequencies that decline from B1 > B2 > C1 > C2.

Positive usage distributions for correct L2 properties
Definition: Positive usage distributions for a correct property of L2 that match the distribution of native-speaking (i.e. L1) users of the L2. The positive usage distribution may be acquired at a certain level, will generally persist at all higher levels, and be criterial for the relevant levels.
Example: The distribution of relative clauses formed on indirect object/oblique positions (e.g. the professor that I gave the book to) relative to relativizations on other clausal positions (subjects and direct objects) appears to approximate that of native speakers at the C levels, but not at earlier levels. Hence this is a positive usage distribution that is criterial for [C1, C2].

Negative usage distributions for correct L2 properties
Definition: Negative usage distributions for a correct property of L2 that do not match the distribution of native-speaking (i.e. L1) users of the L2. The negative usage distribution may occur at a certain level or levels with a characteristic frequency F and be criterial for the relevant level(s).
Example: The distribution of relative clauses formed on indirect object/oblique positions is the negative usage distribution, criterial for B2 and below.
What is unique in the EPP is its corpus-based method of finding ‘criterial fea-
tures’ from learner corpora sampled from the subjects at different CEFR levels.
Salamoura and Saville (2009, p. 34) defined a ‘criterial feature’ as follows:
A ‘criterial feature’ is one whose use varies according to the level achieved and
thus can serve as a basis for the estimation of a language learner’s proficiency
level. So far the various EP research strands have identified the following
kinds of linguistic feature whose use or non-use, accuracy of use or frequen-
cy of use may be criterial: lexical/semantic, morpho-syntactic/syntactic, func-
tional, notional, discourse, and pragmatic.
Hawkins and Buttery (2010), for example, have identified four types of feature
that may be criterial for distinguishing one CEFR level from the others. Table
1 shows the classifications.
The English Profile (EP) researchers have done preliminary studies with
regard to the criterial features, using the Cambridge Learner Corpus (CLC)
(Williams, 2007; Parodi, 2008; Hendriks, 2008; Filipovic, 2009; Hawkins &
Buttery, 2010). The CLC currently comprises approximately 50 million words
of written learner data, roughly half of which is coded for errors. It has also been
parsed using the Robust Accurate Statistical Parser (RASP) (Briscoe, Carroll &
Watson, 2006). Salamoura and Saville (2009) state that the CLC mainly covers
A2 level and above, which is the reason why the EP researchers started to build
a new corpus called the Cambridge English Profile Corpus (CEPC), mainly
focusing on lower-proficiency level students’ writing and speech.
Considering the sheer size of the CLC with error annotations and the
CEFR as a framework, this EP programme seems to create a new research par-
adigm in learner corpus research. Those who are interested in using learner
corpora in SLA research can relate their findings to the EP researchers’ find-
ings in terms of criterial features. Those who are involved in syllabus/materi-
als design will find the RLDs for English very informative once those items
are actually identified. Test developers will make full use of the results of the
EP research for improving their test design and contents.
Some may argue that this whole approach is affected by the ‘comparative fal-
lacy’ (Bley-Vroman, 1983). Bley-Vroman warned that L2 speakers’ interlanguage
systems should be seen as independent of their L1s and target languages and
should thus be studied in their own right. This implies discarding the notion of
‘target-like’ performance. Most learner-corpus-based IL studies rely on the com-
parison between L2 learners and their mother tongues or target-like performance
by native speakers of the target languages. In my opinion, this again depends on
research purposes. If one wishes to describe interim states of IL systems, inde-
pendent of both L1s and target languages, Bley-Vroman’s position makes perfect
sense. However, as Kasper (1997) said, SLA researchers have legitimate and
important interests in assessing learners’ IL knowledge and actions not just as
achievements in their own right but also measured against some kind of standard
(ibid: 310). From pedagogical and assessment viewpoints, there is nothing wrong
with setting native speakers’ well-formed sentences as a goal, because that is the
language taught in the classroom. Therefore, L2 profiling research is worth the
effort, as long as we properly understand its aims.
One of the issues in identifying criterial features is deciding how to
extract errors from learner data and how to judge whether they serve as
criterial features or not. The CLC is manually tagged for errors, but it
would be quite difficult to extract learner errors from generic learner data
without error annotations. This paper has two main purposes: to propose a
new approach to annotating errors semi-automatically, by comparing original
learner data against proofread data using edit distance and automatic POS
tagging; and to judge whether or not those errors can serve as criterial
features, by employing the multivariate techniques of correspondence analysis
and variability-based neighbour clustering. This approach is especially
useful because it provides a set of criterial features for lower proficiency
levels that are not covered by the CLC; it also makes it possible to identify
a set of features for Japanese learners of English in specific L2 contexts,
to suggest an alternative classification of features for all CEFR levels, and
to offer a generic technique for extracting criterial features from any
learner corpus.
2. Method
2 Differences between the two words are positions No. 2, 3, 5 and 7 in the letter sequence
of “sitting”. Thus the distance is 4.
(1) a. Two elements are identified as the same and aligned to each other (“\” path in the matrix)
b. X is aligned to a gap (“|” path)
c. Y is aligned to a gap (“–” path)
Suppose X is the sequence “ABCE” and Y is “ACDE”. The thick black line
in Figure 1 indicates the optimal path for aligning them. There may be more
than one path from the starting point (0,0) to the end point (4,4). A Dynamic
Programming (DP) algorithm checks all available paths from start to end and
calculates the cost of each in order to identify the optimal path.
[Figure 1. Alignment matrix for Sequence X and Sequence Y; the optimal path is shown as a thick black line.]
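The cost calculation just described can be sketched with the standard dynamic-programming recurrence for the Levenshtein distance (a character-level illustration with unit costs; this is my own minimal sketch, not the chapter's actual program):

```python
def levenshtein(x, y):
    """Minimal DP edit distance: cell (i, j) holds the cheapest way to turn
    x[:i] into y[:j] via a match/substitution ("\\" path), a deletion from x
    ("|" path) or an insertion from y ("-" path)."""
    m, n = len(x), len(y)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                              # delete all of x[:i]
    for j in range(n + 1):
        d[0][j] = j                              # insert all of y[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if x[i - 1] == y[j - 1] else 1
            d[i][j] = min(d[i - 1][j - 1] + cost,  # match or substitute
                          d[i - 1][j] + 1,         # delete from x
                          d[i][j - 1] + 1)         # insert from y
    return d[m][n]
```

For the sequences above, levenshtein("ABCE", "ACDE") returns 2: either two substitutions (B→C, C→D) or one deletion plus one insertion.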
In our case, two aligned sequences correspond to two sentences, and the parts
in the sequences (A to E in Figure 1) are actual words in the sentences. Figure
2 shows in matrix form how this algorithm checks the two aligned sentences, an
original sentence (vertical) and its corrected counterpart (horizontal).
In Figure 2, two possible cases of alignment are illustrated; the alignments
are described in (2) and (3).
The alignment result in (2) is better than that in (3) in the sense that miss-
ing items in the sentence pairs (a) and (b) are correctly matched in (2), com-
pared to the results in (3). Each of the paths in Figure 2 shows these alignment
results, with thick black lines showing the case in (2) and dotted lines showing
the case in (3). Each edit distance in (2) and (3) is calculated and the optimal
path (in this case, (2)) produces the highest score. Look at (2) once again. There
are three allowable edit operations in the Levenshtein distance, which are
described in (4):
(4) a. substitution
    b. insertion
    c. deletion

(5) a. substitution → misformation errors
    b. insertion → addition errors
    c. deletion → omission errors
The program retains the best tagged alignment result, the one with the highest total
of individual scores, as the optimal alignment. The three error types are identified automatically based on
the alignment results, and then tagged for each error type: <msf> for misforma-
tion, <add> for addition, and <oms> for omission. Correction candidates are
specified in the case of misformation tags, as in <msf crr= “correct answer”>.
The output of the program is shown in (6):
(6) I eat <add>a</add> bread and <msf crr=fried>flied</msf> <oms>eggs</oms> every morning.
If the alignments are accurate, chances are that surface strategy taxonomy
errors can be extracted fairly accurately and automatically.
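A word-level sketch of this tagging step, using Python's difflib.SequenceMatcher as a stand-in for the edit-distance alignment (the function name and tagging details are mine; the tag inventory follows (6)):

```python
import difflib

def tag_errors(corrected, original):
    """Align a learner sentence against its proofread version and mark
    misformation (<msf>), addition (<add>) and omission (<oms>) errors,
    following the surface strategy taxonomy. SequenceMatcher stands in
    for the Levenshtein alignment described in the chapter."""
    a, b = corrected.split(), original.split()
    out = []
    for op, i1, i2, j1, j2 in difflib.SequenceMatcher(a=a, b=b).get_opcodes():
        if op == "equal":
            out.extend(b[j1:j2])
        elif op == "insert":                  # extra word in the learner text
            out.extend(f"<add>{w}</add>" for w in b[j1:j2])
        elif op == "delete":                  # word missing from the learner text
            out.extend(f"<oms>{w}</oms>" for w in a[i1:i2])
        else:                                 # 'replace': pair words as misformations
            ca, cb = a[i1:i2], b[j1:j2]
            out.extend(f'<msf crr="{c}">{w}</msf>' for c, w in zip(ca, cb))
            out.extend(f"<oms>{w}</oms>" for w in ca[len(cb):])
            out.extend(f"<add>{w}</add>" for w in cb[len(ca):])
    return " ".join(out)
```

Run on the sentence pair behind example (6), this reproduces the tagged output shown there.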
2.3. Procedure
Using the heuristics described in 2.2., the parallel (i.e. original and proofread)
version of the entire JEFLL Corpus was processed for the Levenshtein distance
and then automatically tagged for three types of surface strategy taxonomy
error: omission, addition and misformation. The output of the program was
checked manually, and problematical cases of word order errors were identified
and corrected. In order to capture an overall tendency of extracted errors, all the
tagged surface strategy taxonomy errors were processed for part-of-speech
(POS) information, using an automatic POS tagger. This made it possible to
analyse extracted errors in terms of their parts of speech. At this level, the error
annotation in the corpora is only related to the surface strategy taxonomy errors
and their POS information. I am fully aware of the limitations of dealing with
errors using the surface strategy taxonomy and POS only. Further analysis is needed in
terms of linguistic classification, e.g. agreement errors, tense errors, verb subcat-
egorization errors, among others. Furthermore, a POS tagger developed for
analysing native speakers’ data may not be entirely suitable for interlanguage
data. But I have the following justifications for my approach. First, the main
purpose of this chapter is to propose a method of annotating errors semi-auto-
matically in learner language and not to propose comprehensive criterial fea-
tures from learner data. Using the approach described in this paper, researchers
can work on their learner data and make further analysis of each error type they
are interested in. Second, the overview of POS-related errors based on the sur-
face strategy taxonomy still provides a very interesting summary regarding the
state of ILs at each stage and helps to generate new hypotheses related to differ-
ent aspects of acquisition. For instance, omission errors of determiners are quite
frequent across all the stages of acquisition in the JEFLL Corpus, while the
repertoire of nouns in the lexicon also increases as the level increases. This means
that the use of articles improves for particular noun groups, but knowledge
of the article system is not fully acquired as more lexical items are introduced in
the lexicon. This kind of microscopic analysis can be done for each error type,
but this should be dealt with elsewhere. Third, the automatic annotation described
in this paper can be used to annotate large samples of learner corpora cost-effectively,
and helps to conduct profiling research such as the EPP, providing a
bird’s-eye view of how learner performance changes from one stage to another.
The frequency distributions of the above error types in terms of POSs were
obtained across the school years. Multivariate statistics were used in order to
capture complex relationships between school years and different error types.
Correspondence analysis was used first to obtain biplots between major error
types and school years, which was supplemented by clustering techniques called
“variability-based neighbour clustering (VNC)” (Gries & Stoll, 2008). Both are
techniques of data reduction and summarisation. Correspondence analysis is a
descriptive/exploratory technique designed to analyze simple two-way and
multi-way tables containing some measure of correspondence between the rows
and columns. The results provide information which is similar in nature to that
produced by Factor Analysis techniques, and they allow one to explore the
structure of categorical variables included in the table. Graphical representations
of two variables mapped onto the two extracted dimensions are especially use-
ful in order to see relative proximity of the items in each variable. VNC differs
from standard approaches because it only clusters neighbouring data points,
thus preserving the data points’ temporal sequence. This is important because
the order of school years needs to be taken into account as we cluster linguistic
features characterising each level.
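A greedy sketch of VNC for a single error-frequency series (my own minimal implementation; Gries & Stoll's algorithm and distance measure may differ in detail):

```python
import statistics

def vnc_merge_order(values):
    """Variability-based neighbour clustering (VNC) sketch: repeatedly merge
    the *adjacent* pair of clusters whose union has the smallest standard
    deviation, so the temporal order of the data points is never broken.
    Returns the merge history as (member indices, merged SD) tuples."""
    clusters = [[v] for v in values]
    labels = [[i] for i in range(len(values))]
    history = []
    while len(clusters) > 1:
        best_i, best_sd = None, None
        for i in range(len(clusters) - 1):
            sd = statistics.pstdev(clusters[i] + clusters[i + 1])
            if best_sd is None or sd < best_sd:
                best_i, best_sd = i, sd
        history.append((labels[best_i] + labels[best_i + 1], best_sd))
        clusters[best_i:best_i + 2] = [clusters[best_i] + clusters[best_i + 1]]
        labels[best_i:best_i + 2] = [labels[best_i] + labels[best_i + 1]]
    return history
```

Applied to the noun addition frequencies in Table 3 (Years 7-12), the procedure first joins Years 11-12, then adds Year 10, then joins Years 8-9, leaving Year 7 on its own until the final merge.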
3. Results
3 Precision is defined as a measure of the proportion of selected items that the system
got right: precision = (true positive)/((true positive)+(false positive)). Recall is
defined as the proportion of the target items that the system selected: recall = (true
positive)/((true positive)+(false negative)) (Manning & Schütze, 1999: 268).
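The footnote's definitions can be written out directly (a trivial sketch; F here is the usual harmonic mean of precision and recall, i.e. F1):

```python
def precision(tp, fp):
    """Proportion of selected items that the system got right."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Proportion of the target items that the system selected."""
    return tp / (tp + fn)

def f_measure(p, r):
    """Harmonic mean of precision and recall (F1)."""
    return 2 * p * r / (p + r)
```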
was 179 out of 641 (precision = 72.07%), which shows that alignment of mis-
formation was very difficult in comparison to the other two error types.
Consequently, the F measure was also low (F = 0.8373). The sample output is shown
in (7), where no error was found in the analysed sentence:
(7) <result>
<sentence id= “ns”>
Today I ate bread and milk
</sentence>
<sentence id= “st”>
Today I ate bread and milk
</sentence>
<trial no= “01a”>
Today I ate bread and milk
</trial>
</result>
The first sentence labelled “ns” is the one proofread by a native speaker. The sec-
ond sentence labelled “st” is the student’s original sentence and the third one is
the output of comparing the pair (“ns” and “st”). If there is no error in the sen-
tence, the output is the same as the two sentences above.
The sentences in (8) show the case in which the sentence pair (“ns” and
“st”) has several differences. In the first output labelled “trial No. 01a”, differ-
ences between the pair were identified in terms of omission, addition and mis-
formation (tagged <oms>, <add>, and <msf> respectively) along with suggested
corrections shown in the attribute “crr=”. The edit distance program works in
such a way that the first trial was retained as long as there was no overlapping
word found in the identified error items. If there was any overlapping word, for
example, “breakfast” in the output “01a”, additional analysis was made to re-
classify the two overlapped words into a single case of transposition from one
position to another in a sentence. Thus, in the output “02”, the word “break-
fast” is tagged as <trs_add> for the first one and <trs_oms> for the second one,
showing that these two words both belong to the same misordering error.
(8) <result>
<sentence id= “ns”>
I like breakfast but I don’t eat rice and miso soup for breakfast
</sentence>
<sentence id= “st”>
I like breakfast but I don’t eat in breakfast rise and misosoup
</sentence>
4 Please note, however, that this figure is based on the automatic extraction, whose pre-
cision is roughly 72%.
5 The number of misordering errors has to be interpreted carefully because this feature
was added after the first evaluation was done for the other three types of errors and
the accuracy rate was not checked against manually corrected data.
many misformation and omission errors on verbs. However, verbs behave dif-
ferently from nouns in several respects. First, the number of verb misformation
errors stays almost the same throughout the school years while noun misforma-
tion errors decrease in the first three years. This may be again related to the use
of Japanese words in the compositions. Second, verb omissions are very high in
Year 7, decrease considerably in Year 8 and, after another slight decrease in
Year 9, tend to remain constant; noun omission errors seem to follow a U-shaped
curve, with a high initial proportion gradually shrinking in Years 8 and
9, to then grow again in later years. Verbs are also different from nouns in the
way addition errors occur. While the number of noun addition errors decreases
constantly from Year 7 to 10, verb addition errors increase from Year 7 to 10.
This is mainly due to the increasing overuse of “have” as an auxiliary besides its
use as a lexical verb, as learners experiment with more complex grammatical
constructions.
Determiner errors are especially frequent in the case of omissions. The frequen-
cies of omission errors are five to six times higher than addition errors, which
shows that Japanese-speaking learners of English tend to omit determiners
rather than oversupply them. Error rates remain almost the same throughout
the school years, which shows that determiner omission errors are quite persist-
ent in nature. Prepositions are also problematical and they are frequently omit-
ted. Interestingly, preposition omission errors have a typically U-shaped error
curve, where the errors decrease for the first three years and then increase again
in a later stage. Although their number is relatively small, addition errors of
prepositions also increase steadily as the school year increases. Preposition errors
Table 3. Normalised frequencies of 4 types of errors across school years and POSs (per 10,000 words)
Addition
YEAR DET NOUN PRN ADV ADJ BE VERB PRP MODAL TO CONJ TOTAL
7 28.8 100.8 12.0 13.7 10.0 26.4 18.6 10.2 5.5 6.4 3.5 242.8
8 25.6 67.0 14.4 15.1 9.7 22.6 23.5 19.3 3.4 11.5 3.4 223.5
9 23.7 60.8 12.4 16.3 7.1 20.9 29.0 16.3 5.6 8.6 5.0 214.7
10 32.3 38.6 19.1 35.8 6.8 29.3 78.8 30.4 16.7 11.8 6.0 315.4
11 36.7 41.2 25.4 32.9 11.7 26.6 73.5 33.5 20.3 12.3 7.3 332.3
12 33.6 42.0 25.6 35.8 13.0 28.0 69.5 32.0 18.4 11.7 7.5 329.2
Grand total: 1658.0
Omission
YEAR DET NOUN PRN ADV ADJ BE VERB PRP MODAL TO CONJ TOTAL
7 176.7 283.7 138.2 56.2 79.7 80.4 200.8 126.4 24.8 32.3 23.5 1229.7
8 165.6 188.8 81.8 39.7 47.9 51.0 126.3 97.8 10.2 22.8 12.8 852.7
9 119.8 103.7 53.0 33.6 27.7 40.2 98.6 69.2 9.8 16.7 7.2 588.5
10 193.7 154.2 61.4 51.6 44.0 56.1 102.6 131.2 14.0 32.3 16.1 867.4
11 149.8 145.6 62.3 58.4 42.2 52.3 85.8 125.1 15.4 22.2 14.1 784.2
12 157.9 191.9 67.7 56.2 53.5 47.7 109.6 120.7 14.0 27.0 12.2 870.5
Grand total: 5193.0
Misformation
YEAR DET NOUN PRN ADV ADJ BE VERB PRP MODAL TO CONJ TOTAL
7 46.9 594.8 104.5 62.2 63.6 134.2 223.9 38.3 11.3 7.1 16.2 1309.9
8 45.9 475.0 77.3 75.3 73.5 86.0 207.1 62.5 13.4 14.4 15.0 1153.4
9 44.1 380.4 63.2 69.6 53.2 61.7 200.0 57.2 14.8 10.5 21.6 985.3
10 60.4 391.2 61.1 151.6 79.5 67.5 202.1 95.8 24.0 15.3 34.7 1193.2
11 61.9 345.9 60.9 132.7 66.6 61.6 193.4 79.0 20.2 18.0 31.7 1082.7
12 54.9 383.7 64.7 124.2 76.7 57.9 199.8 78.8 26.0 15.7 26.7 1121.0
Grand total: 6845.6
Misordering
YEAR DET NOUN PRN ADV ADJ BE VERB PRP MODAL TO CONJ TOTAL
7 1.1 14.0 2.9 2.4 4.2 0.4 5.1 1.3 0.4 0.4 0.9 40.2
8 2.6 11.7 2.8 3.4 2.9 1.0 3.6 1.0 0.2 0.8 1.2 39.2
9 1.0 8.5 2.7 2.8 2.3 1.2 2.8 1.0 0.4 0.4 1.1 33.3
10 3.7 12.1 5.1 4.4 2.5 1.6 3.5 4.7 0.5 1.1 2.8 51.9
11 4.2 11.3 3.2 5.0 3.3 1.9 4.9 2.8 0.8 1.0 1.7 51.1
12 3.9 8.8 3.4 4.4 3.5 2.3 4.8 3.0 0.4 0.8 1.7 49.0
Grand total: 264.6
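The per-10,000-word normalisation behind Table 3 is straightforward; a sketch (the token count in the example is invented for illustration):

```python
def per_10k(raw_count, corpus_tokens):
    """Normalise a raw error count to a frequency per 10,000 running words,
    as in Table 3."""
    return raw_count / corpus_tokens * 10_000
```

For instance, 72 errors in a hypothetical 25,000-token subcorpus normalise to 28.8 per 10,000 words.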
will become more frequent as learners learn more prepositions and try to use
them to express more complex ideas in English.
It is noteworthy that errors observed with a frequency analysis based on the
surface strategy taxonomy share some general characteristics, which may point
to broader interlanguage developmental trends. First, omission errors are
more common than additions. Naturally, L2 learners start with simplified structures, which lack required elements such as determiners, prepositions, verbs, and nouns needed to form well-formed sentences. As their proficiency goes up, however, the ratio of addition errors to omission errors becomes higher. This indicates that the more proficient L2 learners become, the greater the variety of language they use; they thus take more risks in expressing themselves, which leads to more errors. This is clearly shown in the increasing frequencies of errors related to verbs, adverbs, adjectives, prepositions, conjunctions and modals (see Table 3). This tendency is closely related to lexical
choice errors with major content words and is known to follow an inverted U-shaped curve (Hawkins & Buttery, 2010): errors of this type continue to increase as learners progress from beginning to intermediate levels and their linguistic repertoire widens, and then decrease or disappear as learners approach near-native proficiency.
In JEFLL, because of the lower proficiency levels, most addition errors continue to grow in number or stay the same throughout the six years.
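The rising addition-to-omission ratio described above can be checked directly against the TOTAL columns of Table 3. A minimal sketch using the Year 8-12 figures reproduced above (the Year 7 addition total falls outside this excerpt):

```python
# TOTAL column of Table 3 for addition and omission errors, Years 8-12.
addition = {8: 223.5, 9: 214.7, 10: 315.4, 11: 332.3, 12: 329.2}
omission = {8: 852.7, 9: 588.5, 10: 867.4, 11: 784.2, 12: 870.5}

# Ratio of addition to omission errors for each school year.
ratios = {year: round(addition[year] / omission[year], 2) for year in addition}
print(ratios)  # → {8: 0.26, 9: 0.36, 10: 0.36, 11: 0.42, 12: 0.38}
```

The ratio climbs from 0.26 in Year 8 to around 0.4 in the senior high years, consistent with the claim that more proficient learners trade omission for addition errors, although the rise is not strictly monotonic year by year.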
The statistics, however, have to be interpreted carefully in the case of mis-
formation errors, given that the identification of misformation errors by edit
distance has lower precision/recall scores in comparison to the other error types.
There is also an influence of the use of Japanese words in the essays, which
boosted the frequencies of noun errors, especially in Year 7.
The same can be said about Year 8 and Year 9. Year 7 stood apart from the other groups, showing
that the group behaved very differently. The positions of POS errors in relation to the school years revealed interesting patterns. Noun errors (NOUN), for example, were plotted close to Year 7 and far from the other error groups. As can be seen from Table 3, noun errors were very frequent in Year 7, mainly because Year 7 students often used Japanese words in their compositions, which were analysed as nouns by the POS tagger. Thus, the high frequency of noun errors reflects the use of Japanese words in the passages.
Another reason why noun errors were located far from the other groups is that their frequencies decreased significantly from Year 7 to 9 before stabilising at the higher levels. On the other hand, verb errors (VERB) and
modal auxiliary errors (MODAL) showed opposite tendencies, with their fre-
quencies continuing to increase toward Year 12. Figure 5 shows the results of
correspondence analysis for omission errors.
The overall picture here is different from that of the addition errors. The relationship between the two variables (POS omission errors × school year), summarised in the biplots in Figure 5, can be interpreted by looking at Table 3 again. The students' groups were not plotted in the order of the school years. Rather, Year 12
was placed toward the centre, and Year 10 and Year 11 were on the rightmost
end. This is partly due to the fact that error frequencies reported in Table 3
suddenly increased in Year 10 after a gradual decrease from Year 7 to 9. It seems
that omission errors did not simply decrease as the school year went up. In
many cases, omission errors decreased in frequency from Year 7 to 9, rose again
in Year 10 and either stayed the same toward Year 12 or fluctuated through the
three years in senior high, which explains why the points for these years do not
follow a straight line from left to right in the biplot. There were also two different groups of POS errors, divided by the origin of the axis. Those placed on the left side of the origin on the first axis (PRN, NOUN, VERB, and ADJ) all shared the tendency that their frequencies in Year 7 were much higher than those of the other errors (ADV, PRP, DET, and TO), whose frequencies were not very high in Year 7 and gradually rose in Years 10-12. The
former group consists of parts of speech that are primary components of constructions and open-class in nature (except for PRN), whereas the latter group belongs to the closed class, whose primary function is to connect components in a sentence. This shows that learners at the beginning stage of acquisition fail
to supply major elements such as verbs or nouns, but these omission errors
tend to decrease as they progress. On the other hand, they will have more
errors on function words such as prepositions, determiners, infinitives, and
adverbs, which help to modify principal elements in a sentence to make it
more complex.
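The biplots discussed above come from correspondence analysis, which decomposes the chi-square residuals of a POS-by-year frequency table by singular value decomposition and plots rows and columns in the same low-dimensional space. A minimal sketch of the core computation, using just three rows (NOUN, ADV, PRP) and three years (7-9) of the omission table above as a toy input, not the full data behind Figure 5:

```python
import numpy as np

# Toy input: omission frequencies for NOUN, ADV and PRP in Years 7-9
# (taken from Table 3 above; the chapter's Figure 5 uses the full table).
F = np.array([[283.7, 188.8, 103.7],   # NOUN
              [56.2,   39.7,  33.6],   # ADV
              [126.4,  97.8,  69.2]])  # PRP

P = F / F.sum()                        # correspondence matrix
r, c = P.sum(axis=1), P.sum(axis=0)    # row and column masses
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))  # standardized residuals
U, sv, Vt = np.linalg.svd(S, full_matrices=False)

# Principal coordinates: POS errors (rows) and years (columns) share the biplot.
row_coords = (U * sv) / np.sqrt(r)[:, None]
col_coords = (Vt.T * sv) / np.sqrt(c)[:, None]
```

Because the residuals are centred, the trivial dimension is already removed and the last singular value is numerically zero; the first one or two dimensions carry the structure shown in such biplots.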
Figure 6 illustrates the way misformation errors occurred and their rela-
tionship with school years.
the syntactic elaboration of sentences, which is shown in the errors of closed-system items such as CONJ, MODAL, PRP and TO.
Dendrograms are best read from the bottom, since they join together groups starting from those having the lowest distance. The distance is represented on the vertical rather than the horizontal axis: a short vertical line represents closely associated points, while a long one represents a greater distance between them. Cluster 1 separates Year 7 from the rest; cluster 2 comprises Year 8 and Year 9, and cluster 3 ranges from Year 10 to Year 12.
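The clustering behind these dendrograms, variability-based neighbour clustering (Gries & Stoll, 2009), is agglomerative clustering with one extra constraint: only chronologically adjacent clusters may merge, so every resulting cluster is a contiguous span of school years. A minimal sketch, using the absolute difference of cluster means as the distance (a simplification of the published measure) and the Year 8-12 noun addition totals from Table 3:

```python
def vnc(years, freqs, n_clusters):
    """Variability-based neighbour clustering: repeatedly merge the pair of
    chronologically adjacent clusters whose mean frequencies are closest."""
    clusters = [[y] for y in years]
    values = [[f] for f in freqs]
    while len(clusters) > n_clusters:
        dists = [abs(sum(values[i]) / len(values[i])
                     - sum(values[i + 1]) / len(values[i + 1]))
                 for i in range(len(clusters) - 1)]
        i = dists.index(min(dists))          # closest adjacent pair
        clusters[i:i + 2] = [clusters[i] + clusters[i + 1]]
        values[i:i + 2] = [values[i] + values[i + 1]]
    return clusters

# Noun addition errors, Years 8-12 (NOUN column of the addition table above).
print(vnc([8, 9, 10, 11, 12], [67.0, 60.8, 38.6, 41.2, 42.0], 2))
# → [[8, 9], [10, 11, 12]]
```

Even this simplified distance recovers the junior-high/senior-high split reported for the addition errors.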
Figure 7. VNC for noun addition errors (LEFT: scree plots; RIGHT: dendrogram)
Figure 8 shows the three clusters, divided by vertical dotted lines. The horizontal lines under the numbers (2) and (3) indicate the mean frequencies observed in the data for the three clusters.
The results were not very useful, even though the dendrograms in Figures 9 and 10 form two clusters anyway, just to give an idea of where the division could be made. Regarding the addition errors in Figure 7, only nouns,
adverbs, verbs, modals and prepositions made two meaningful clusters. Except
for noun addition errors, which produced three clusters due to the effects of the
intensive use of Japanese in Year 7, the first cluster ranges from Year 7 to Year 9,
and the second ranges from Year 10 to Year 12, thus clearly dividing the junior
high group and the senior high group in terms of the error occurrence patterns.
This confirms the findings of the correspondence analysis in Figure 4; without VNC, however, it was difficult to state which POS errors actually contributed to the divisions.
The omission errors show a slightly more complicated picture. As was shown in Figure 5, omission errors tend to decrease from Year 7 to Year 9 and to increase again from Year 10 toward Year 12, because learners took more risks to extend their repertoire of English at later stages, yielding more errors. Learners tended to master the
use of basic lexis and grammar that they had learned at the early stage, but as
they moved onto more advanced stages, they produced different types of
omission errors. In terms of accuracy rates, this is a well-known inverted U-
shaped developmental curve. Among the omission errors, only nouns, pro-
nouns, and verbs seemed to show meaningful clusters. Interestingly, the two
clusters are Year 7 and the rest in most cases. In this connection it is worth recalling the results of the correspondence analysis. Those errors placed on
the left side of the origin for the first axis (PRN, NOUN, VERB, and ADJ)
in Figure 5 nearly correspond to the ones showing meaningful clusters in
Figure 8, namely nouns, verbs, and pronouns. One should bear in mind that
their frequencies in Year 7 were much higher, compared to the other errors
(ADV, PRP, DET, and TO), whose frequencies were not very high in Year 7
and gradually became higher in Year 10 - 12. Therefore, the results of VNC
suggest that three omission errors above all (noun, verb and pronoun) are use-
ful in distinguishing Year 7 from the rest of the groups, while for the other
POS errors the results are not conclusive.
4. Discussion
So far, I have proposed a new way of extracting errors from learner corpora and
judging the status of those extracted errors as criterial features. Edit distance is
a common metric for spotting differences between two strings of characters, used extensively in other areas such as the analysis of DNA sequences. By
extending its use to a comparison of learner production and target-like performance, error candidates can be extracted automatically.
Table 4. Extracted criterial features for the learning stages of Japanese EFL learners
Types POS Criterial for: Mean error freq.
Addition nouns [Year 7] > [Year 8 - 9] > [Year 10 -12] 58.4
adverbs [Year 10 - 12] > [Year 7 - 9] 24.93
verbs [Year 10 - 12] > [Year 7 - 9] 48.81
prepositions [Year 10 - 12] > [Year 7 - 9] 23.62
modals [Year 10 - 12] > [Year 7 - 9] 11.65
Omission nouns [Year 7] > [Year 8] = [Year 10 -12] > [Year 9] 177.98
verbs [Year 7] > [Year 8 - 12] 120.62
pronouns [Year 7] > [Year 8 - 12] 111.73
This paper proposes a formal, methodological procedure for identifying criterial features in
IL development. Using edit distance, possible error candidates are automatical-
ly extracted. Subcategorising those errors by POS can be done by automatic
POS tagging. Variability-based neighbour clustering will make it possible to
aggregate similar groups and cluster variables into meaningful stages of learning.
This procedure can be applied to any kind of learner corpus, provided that parallel versions of the data set are available for computing edit distance. A word of caution is in order here. The approach presented in this paper applies only to extracting surface strategy taxonomy errors. It will not deal with semantic errors such as tense/aspect morphology, for this kind of information is not revealed on the surface. Also, this method is only applicable to “errors” as criterial features; it cannot be used to extract well-formed language features as criteria. This is not a serious limitation of the study, however, because well-formed linguistic features
are usually much easier to extract, using ordinary corpus analysis tools such as
concordancing or n-gram analysis over different sets of learner data. I hasten to
add that VNC can also be used for analysing both errors and non-errors as long
as frequency information is available regarding given linguistic features across
different stages.
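The first step of this procedure, error extraction by edit distance, can be sketched at word level: align a learner sentence with its corrected version via the standard dynamic-programming table, then label each edit operation with a surface strategy category. The example sentences below are hypothetical, and the actual DP matching used for JEFLL (Tono & Mochizuki, 2009) is more elaborate; misordering, in particular, would require transposition handling that this sketch omits.

```python
def classify_errors(learner, target):
    """Align learner vs. corrected sentence word by word (Levenshtein DP)
    and label each difference with a surface strategy taxonomy category."""
    lw, tw = learner.split(), target.split()
    d = [[0] * (len(tw) + 1) for _ in range(len(lw) + 1)]
    for i in range(len(lw) + 1):
        d[i][0] = i
    for j in range(len(tw) + 1):
        d[0][j] = j
    for i in range(1, len(lw) + 1):
        for j in range(1, len(tw) + 1):
            cost = 0 if lw[i - 1] == tw[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,       # extra learner word
                          d[i][j - 1] + 1,       # missing target word
                          d[i - 1][j - 1] + cost)
    errors, i, j = [], len(lw), len(tw)
    while i > 0 or j > 0:  # trace the optimal alignment back
        if (i > 0 and j > 0
                and d[i][j] == d[i - 1][j - 1] + (0 if lw[i - 1] == tw[j - 1] else 1)):
            if lw[i - 1] != tw[j - 1]:
                errors.append(('misformation', lw[i - 1], tw[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            errors.append(('addition', lw[i - 1], None))
            i -= 1
        else:
            errors.append(('omission', None, tw[j - 1]))
            j -= 1
    return list(reversed(errors))

print(classify_errors('I go to school yesterday', 'I went to school yesterday'))
# → [('misformation', 'go', 'went')]
```

Running the extracted operations through a POS tagger then yields the POS-subcategorised error counts analysed above.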
Some final notes are in order with respect to methodological issues. The
detection of misformation errors could be improved. At the moment, the accuracy of misformation detection is sufficiently high for one-to-one lexical mappings; if the mapping is one-to-many or many-to-one, however, the accuracy rate drops sharply. To solve this problem, ontological knowledge such as POS-labelled wordlists will be needed, which goes beyond simple surface character-level similarities. The results of multivariate analysis should also be further interpreted
from both macroscopic and microscopic viewpoints. From a macro perspective, my findings should be related to a much larger framework of criterial features and CEFR levels. If several dozen criterial features were identified, it would be necessary to re-classify them in terms of their relative importance. There are also cases in which a bundle of criterial features will work better than a single feature, so methods have to be proposed for dealing with such possibilities. I should admit that identifying
criterial features is one thing, but constructing the overall framework is quite
another. This whole process of identifying criterial features using learner cor-
pora and constructing the overall theoretical framework based on those criter-
ial features seems to me a very promising research strand, which definitely links
learner corpus research to SLA and English language teaching and assessment
in a meaningful way.
174 Yukio Tono
References
Abe, M. (2003). A corpus-based contrastive analysis of spoken and written learner cor-
pora: the case of Japanese-speaking learners of English. In D. Archer, P. Rayson, A.
Wilson, & T. McEnery (Eds.), Proceedings of the Corpus Linguistics 2003 Conference
(CL 2003) (pp. 1-9). Lancaster University: University Centre for Computer
Corpus Research on Language.
Abe, M. (2004). A corpus-based analysis of interlanguage: errors and English proficien-
cy level of Japanese learners of English. In Y. Tono (Ed.), Handbook of An
International Symposium on Learner Corpora in Asia (pp. 28-32). Tokyo: Showa
Women’s University.
Abe, M. (2005). A comparison of spoken and written learner corpora: analyzing devel-
opmental patterns of grammatical features in Japanese Learners of English. The
Proceedings of the NICT JLE Corpus Symposium (pp. 72-75). Kyoto: National
Institute of Communications Technology.
Abe, M. & Tono, Y. (2005). Variations in L2 spoken and written English: investigating
patterns of grammatical errors across proficiency levels. Proceedings from the Corpus Linguistics Conference Series (Vol. 1, No. 1). Retrieved from http://www.corpus.bham.ac.uk/pclc/index.shtml
Bley-Vroman, R. (1983). The comparative fallacy in interlanguage studies: The case of
systematicity. Language Learning, 33, 1-17.
Briscoe, E., Carroll, J., & Watson, R. (2006). The second release of the RASP System.
Retrieved January 15, 2012, from http://acl.ldc.upenn.edu/P/P06/P06-4020.pdf
Dulay, H., Burt, M., & Krashen, S. (1982). Language Two. Oxford: Oxford University
Press.
Filipovic, L. (2009). English Profile – Interim report. Internal Cambridge ESOL report,
April 2009.
Goldberg, A. E. (1995). Constructions: A Construction Grammar Approach to Argument
Structure. Chicago: University of Chicago Press.
Goldberg, A. E. (2006). Constructions at Work: The Nature of Generalization in Language.
Oxford: Oxford University Press.
Granger, S. (Ed.). (1998). Learner English on Computer. London/New York: Addison
Wesley Longman.
Granger, S., Hung, J. & Petch-Tyson, S. (Eds.). (2002). Computer Learner Corpora, Second
Language Acquisition and Foreign Language Teaching. Amsterdam: Benjamins.
Gries, S. Th. & Divjak, D. (2012). Frequency Effects in Language Learning and
Processing. Berlin: Mouton de Gruyter.
Gries, S. Th. & Stoll, S. (2009). Finding developmental groups in acquisition data: vari-
ability-based neighbor clustering. Journal of Quantitative Linguistics 16(3), 217-
242.
Hawkins, J. A. & Buttery, P. (2010). Criterial features in learner corpora: Theory and
illustrations. English Profile Journal, 1(1), 1-23.
Tono, Y. & Mochizuki, H. (2009). Toward automatic error identification in learner cor-
pora: A DP matching approach. Paper presented at Corpus Linguistics 2009,
Liverpool, UK.
UCLES-RCEAL Funded Research Projects. Retrieved January 15, 2012, from
http://www.englishprofile.org/images/pdf/ucles_rceal_projects.pdf.
Williams, C. (2007). A preliminary study into the verbal subcategorisation frame: Usage in
the CLC. Unpublished manuscript.
About the authors
Anna Gudmundson has a PhD in Italian and does research on L2 and L3 acquisition at the Department of Language Education, Stockholm University, Sweden. Her thesis concerns the acquisition of grammatical gender and number in Italian as a second language. She is currently engaged in research on lexical acquisition and cross-linguistic influence from previously acquired languages.