Textbook Trends in Teaching Language Testing: Alan Davies University of Edinburgh, UK

Language Testing 2008 25 (3) 327–347
The article examines changes in language testing textbooks in English since Lado (1961) and proposes that two trends may be discerned. The first shows how the growing professionalism of the field has required an expansion in teaching materials to meet the need for new training programmes. What the expansion also shows is the desire, again a mark of increasing professionalism, to provide all teaching resources from within the profession, so that for needed skills (e.g. statistics and measurement) it is now less necessary to appeal to outsiders such as statisticians and psychometricians. The second trend explains the need for the profession to expand its view of the skills needed by its members. From Lado onwards, skills were always conjoined with knowledge about language and about testing. More recently, the profession has explicitly declared a concern for principles with regard, for example, to validity and to ethics. The increasing professionalism comes at a cost, and that cost is twofold. First, in-housing all resources means that language testers are increasingly insulated from other potentially rewarding disciplines. Second, the complete resource offerings in the later teaching materials mean that students may be denied empirical encounters with real language learners, spending all (or much) of their training within the resource material. The article also questions how far research has informed the changes in training materials.
Keywords: informed by research, knowledge, language testing textbooks, practical manuals, principles, professionalism, skills, teachers’ resources
In writing about the teaching of language testing, we can make use of any of the materials (printed, audio, video, DVD, etc.) that have been developed. But it will not be very helpful to do so: first, because our critique becomes an indiscriminate survey of the literature and, second, because teaching ceases to be a deliberate, proactive presentation and becomes an exposure to the whole field. It is more useful, both for our understanding of teaching and in order to put limits on the material discussed in this paper, to consider teaching as deliberate pedagogy.
Address for correspondence: Alan Davies, Linguistics and English Language, School of Philosophy, Psychology and Language Sciences, University of Edinburgh, Adam Ferguson Building, George Square, Edinburgh EH8 9LL, UK; email: a.davies@ed.ac.uk
© 2008 SAGE Publications (Los Angeles, London, New Delhi and Singapore) DOI:10.1177/0265532208090156
By ‘deliberate pedagogy’, I mean the work that teachers do in their professional pursuit of teaching: they plan and organize their area of expertise, which may be a language, a science or, in our case, language testing, in order to facilitate learning.
When I look back over the last 50 years, two trends may be discerned. The first trend charts the growing professionalism and expansion of the field alongside the attempt to develop all-in material, thereby relieving the student of the need, in the teaching context, to draw on material outside the textbook. As we shall see, psychometric issues are still very important today – but see Fulcher and Davidson (2007).
The second trend reveals the move from the skills + knowledge approach to the current attempt to take account also of principles. Skills provide the training in necessary and appropriate methodology, including item writing, statistics, test analysis and, increasingly, software programmes for test delivery, analysis and reportage. Knowledge offers relevant background in measurement and language description, as well as in context setting, and may involve an examination of different models of language learning, of language teaching and of language testing, such as communicative language testing, performance testing and, nowadays, socio-cultural theory. Principles concern the proper use of language tests, their fairness and impact, including questions of ethics and professionalism. They thus involve a consideration of the growing professionalism of language testing, of the responsibilities of language testers, of the impact of their work on a range of stakeholders and of the ethical choices they must make. In what follows, I reflect on key publications over the period and later return to a selection of representative texts from those key publications, considering them in terms of the two trends I have adumbrated. These representative texts are British, American and Australian, and they span the whole period under discussion. They are not intended to be any more celebrated than any of the others referred to but are selected as representative largely because they illustrate my argument about the move over these 40 years from skills + knowledge to knowledge-informed skills and then to principles-informed skills.
I First trend: Expansion

A three-way distinction can be made of materials produced for teachers. At the most discursive end we have (1) teachers’ resources, including books, videos, DVDs and computer software. These provide a library for teachers, there to inform them and to be made available, where appropriate, to their students. Next are (2) textbooks, which provide a deliberately pedagogic approach, again aimed largely at teachers and intended to help them professionally. Then at the how-to end we have (3) practical manuals. Sometimes two of these elements may be combined. Robert Lado, whom Wood calls a ‘luminary’ (Wood, 1991, p. 238), gave language testing early credibility. It is instructive to consult the list of references in Language Testing (Lado, 1961). He is even-handed: he cites such influential psychometric texts as Anastasi (1954/1961), Cronbach (1949/1961) and Buros (1959), as well as significant linguistic texts (Bloomfield, 1933; Gleason, 1955; Hockett, 1958; Sapir, 1921; Fries, 1945). There are no references to any other language testing authors, and Lado’s linguistics references are all to theoretical and descriptive linguists, not to applied linguists. It is as though Lado is making his contribution to establishing the field by maintaining that applied linguistics needs language and that language testing needs both applied linguistics and measurement.
Lado’s book is thus firmly in the middle of my three-way distinction, among the textbooks. Lado introduces his book thus: ‘a comprehensive introduction to the construction and use of foreign language tests. It incorporates modern linguistic knowledge into language testing as one of its chief contributions. … The material is primarily intended for teachers of foreign languages and of English as a foreign language’ (Lado, 1961, p. vii). While the book may be a textbook, in my use of the term here, some parts of it approximate a practical manual, notably Part 2: ‘Testing the elements of language’.
There are those who do not value Lado’s contribution to language testing. But that is unjust. The book is a triumph of combining issues. McNamara, 40 years later, writes ‘his recommendations about testing dominated practice for nearly twenty years and are still influential in powerful tests such as TOEFL’ (McNamara, 2000, p. 89). Bernard Spolsky calls Lado’s 1961 volume ‘a pioneering book’ (Spolsky, 1995, p. 353). He praises Lado’s work thus: ‘Lado’s explicit appeal to theory was a crucial step to the professionalization of the field. With Lado, and with the students and colleagues he gathered (in the 1950s) at Michigan, like Harris and Palmer the language testing profession had taken a major first step’ (Spolsky, 1995, p. 150). And he urges us to remember our pioneers, such as Lado: ‘Our field has been remarkably ahistorical: we have too often satisfied ourselves with patricidal fury on a named or unnamed predecessor before launching ourselves into our own rediscovery of a slightly circular wheel of our own’ (Spolsky, 1995, p. 352). No later publication comes near the breadth of Lado (1961), until perhaps Fulcher and Davidson (2007).
What Lado was keenly aware of was that language teachers need to know about language as well as about language testing. Lado’s successors have been less concerned with providing knowledge about language, perhaps because in the last half century applied linguistics has been more widely available. Here is part of Lado’s commentary: ‘As language yields its secrets to linguistic analysis, lexicographic study, and quantitative research, it is more and more feasible to define specifically the task of learning a foreign language. As we identify more precisely the elements and patterns to be acquired by the speakers of a language in learning another, we will be able to test more precisely the progress made by the student under given conditions’ (Lado, 1961, pp. 338–339). Lado may have been over-optimistic about the future of science, but his understanding of what was needed was just.
Later in the 1960s, Harris (1969) and Valette (1967; 2nd ed., 1977) followed Lado’s example, Harris for ESL and Valette for modern foreign languages, each amplifying Lado in a specialist area while at the same time depending on him. Valette (1967, p. v) claimed that her intention was ‘to introduce teachers to a diversity of testing techniques’, while Harris (1969) offered his book as ‘a short concise text on the testing of ESL, a subject about which both classroom teachers and trainers of teachers have shown an increasing concern’ (Harris, 1969, p. vii). Like Valette, Harris modelled himself on Lado. He combined an analytic approach to language and its uses, in such sections as ‘what is meant by reading comprehension’, ‘what is meant by writing’ and ‘what is meant by speaking a second language’, with discussion of test characteristics (reliability, validity, practicality), test construction, test administration and analysis of test results, followed by a separate section on the statistics needed to complete the task. Valette is narrower, understandably so since she offers a range of different language examples, but in the main both the Harris and the Valette are largely concerned with ‘how to’ develop tests, with Harris going further in how to analyse the results.
While Lado combined the resource and the textbook, and Valette and Harris combined the textbook and the practical manual, all three primarily offer a textbook approach. CAL (1961) and Davies (1968), in their provision of historical accounts and views of language testing issues, provided texts which were resource-based and as such offered information and ideas to teachers and graduate students which could be followed up. Applied linguistics in the 1960s was still in its early days, and therefore language teachers (who might or might not follow graduate courses in applied linguistics at some point) were the necessary audience for applied linguistics developments.
In his important CAL paper, J. B. Carroll (1961) does not attempt to offer a language testing blueprint. Instead, he sets out to ‘readdress the attention of the audience to certain basic and fundamental problems and points of view, some of which may have been lost sight of in the heat of enthusiasm for technical detail’ (Carroll, 1961, p. 31). And in his edited volume, Davies (1968) offers a range of views that attempt to bring together the ‘three strands of language testing: language, learning and evaluation’ (Davies, 1968, p. 1), examining ‘the basic disciplines and their relevance to language testing … uses and types of test … the influence of tests on education … the item analysis needed’ (Davies, 1968, p. 13). Again, as with the Carroll paper, the focus here is that of resource (with a glance at textbook material).
Whether we locate them as resource materials or as textbooks, it seems that while the Lado, the Valette and the Harris combine the ‘what’ with the ‘how to’, with Valette and Harris more on the ‘how to’ side, the Carroll and the Davies are both very much on the ‘what’ side.
Through the 1960s and the 1970s, with the publication of the textbooks of Clark (1972) and Allen and Davies (1977), along with the publication of the Peace Corps’ Manual of language testing (published later as Anderson, 1993), a deliberately practical field guide, there was always the recognition that, while these materials provided textbooks and practical manuals, supporting psychometric and statistical back-up was necessary. These supporting texts were not specific to language or applied linguistics but were generic to all testing, such as Cronbach (1949), Anastasi (1954) and so on. And for statistical work there were generic programmes such as SPSS.
The Edinburgh course in applied linguistics appeared in four volumes between 1973 and 1977 (Allen & Corder, 1973, 1974, 1975; Allen & Davies, 1977). Volume 4 (Allen & Davies, 1977), with the title Testing and experimental methods, argued that ideas in applied linguistics needed to be submitted to the rigour of hypothesis and experimentation. Experiments, it was suggested, need tests, while tests are themselves kinds of experiment: ‘This book is an attempt to demonstrate our belief in the importance of this link’ (Allen & Davies, 1977, p. 10).
After the Introduction by Davies, two chapters (by Davies and Ingram) were devoted to testing, two (by Ruth Clark) to experimental design and computation and one to statistical inference. There were also two appendices, on statistical inference and tables. Practical work was provided at the end of each chapter. Ingram (1977) contributed a theoretical chapter on ‘Basic concepts in testing’, while Davies contributed a more practical chapter on ‘The construction of language tests’ (Davies, 1977). The book fits very neatly into our second (textbook) category. The emphasis throughout is very much on the connection between skills and knowledge, both measurement knowledge and language knowledge being presented as part of the skills that the language tester (and researcher) needed to acquire. This volume, The Edinburgh course in applied linguistics, Volume 4, was a serious attempt to locate language testing firmly within applied linguistics and was possibly the first such attempt. All four volumes were much used over the subsequent 10 years to teach applied linguistics and, in the case of Volume 4, to teach both language testing and research methodology.
In the 1980s, we see both an expansion and an enrichment in language testing publications. This development was paralleled in other areas of applied linguistics: an increasing number of research specialists in fields such as second language acquisition and discourse analysis expanded their research base and, necessarily, their teaching provision to take account of the expansion. Thus in language testing we see the explosion of communicative language teaching with Carroll and Hall’s (1985) teachers’ guide, alongside a growth in more general textbooks such as Madsen (1983), Hughes (1989) and the earlier Heaton (1975). B. J. Carroll (1985) promoted communicative language testing procedures. ‘The purpose of this book’, Carroll maintained, ‘is to outline principles and techniques for specifying the communicative needs of a language learner and for assessing his language performance in terms of those needs’ (Carroll, 1985, p. 5). With hindsight it would be more appropriate to place Carroll’s Testing Communicative Performance in our skills + knowledge category. Carroll’s principles are more properly considered knowledge in my sense, with the proviso that this knowledge is quite ideologically driven.
What we also see in the teaching programmes at graduate level is that empirical work was still part of the core requirement, so that students following language testing courses were required to carry out a small-scale testing project. For this they relied on the growing number of textbooks which were beginning to relate the necessary research and analytic techniques to language and applied linguistics. Among these were Hatch and Farhady (1982), followed in a second edition by Hatch and Lazaraton (1991), for statistics and research design, and the Henning (1987), which began to make IRT techniques meaningful to the field of language testing and applied linguistics.
The 1980s also saw the start of the new journal Language Testing, a sure sign of the field’s growing research capacity. This journal was very deliberately not a teaching outlet.
In the 1990s we see the normal academic development of an emerging discipline, now maturing and showing that maturity by publishing research monographs and regular surveys of the field covering both its development and its future trends (Davies, 1982; Skehan, 1988, 1989; Alderson & Banerjee, 2001, 2002). These developments increased the coming together of the contributing disciplines, so that Bachman (1990) and Bachman and Palmer (1996) brought together research design, statistics, computer programmes, test preparation and analyses, while Davies (1990) and Wood (1991) offered critiques of language testing which can at best be regarded as resource materials for teaching but are not easily put directly to use in a training programme.
Since so much writing about language testing, up until the 1990s and perhaps even today, concerns large-scale testing, Genesee and Upshur (1996) was very much to be welcomed, dealing as it did with the very real, and very difficult, context of classroom assessment. Genesee and Upshur termed their book ‘practical’, and it belongs to the practical manual end of my textbook category, concerned primarily with skills and offering the knowledge necessary for employing those skills, but less concerned with principles.
As the field of language testing has grown, as courses have developed at the undergraduate and graduate as well as at the PhD level, and as those courses have specialized (so that there are now a number of master’s degrees in language testing itself, whereas formerly language testing was normally part of degrees in applied linguistics or in applied language studies, TESOL and so forth), different teaching needs have shown themselves. Extended resources deliberately designed for teaching are represented by the publication of self-help teaching materials in the shape of the University of Melbourne video series Mark My Words (1997) and the ILTA Web-based interviews on language testing, Video FAQs (Fulcher & Thrasher, 1999, 2000). A parallel development is represented by the publication of the Dictionary of language testing (Davies, Brown, Elder, Hill, Lumley & McNamara, 1999) and the Encyclopedic dictionary of language testing (Mousavi, 2002). Maturing disciplines display and develop their maturity through the development of teaching materials, such as the videos, and through the defining descriptive work of specific dictionaries. This descriptive work provides the self-help teaching resources that teachers and students rely on. Davies et al. (1999) labelled their Dictionary of Language Testing a segmental dictionary: ‘as such, it retains its professional/vocational/registral association and at the same time its normative/pedagogic purpose’ (Davies, 1996, p. 231).
A more traditional development, also in the 2000s, is represented by the Cambridge University Press (CUP) Language Assessment series which, beginning in 1999, has to date published 10 volumes. Seven are of particular relevance (Douglas, 1999; Alderson, 2000; Read, 2000; Buck, 2001; Cushing, 2002; Purpura, 2004; Luoma, 2004) inasmuch as they replicate within their properly narrow confines the operation of the Bachman (1990) model of language testing. This series is deliberately pedagogic. The Series Editors’ Preface to Alderson (2000) concludes thus: ‘this book offers a principled approach to the design, development and use of reading tests and thus exemplifies the purpose of this series to bring together theory and research in applied linguistics in a way that is useful to language testing practitioners’ (Alderson, 2000, p. xi).
A separate but contemporary development may be found in the work of Pennycook (2001), Shohamy (2001), Hawkey (2006) and McNamara and Roever (2006). Although none of these publications is a textbook, all are being widely used and excerpted in the teaching of language testing and in training programmes. Basing themselves on a teleological foundation, on the judgement of test use (what Bachman and Palmer (1996) term ‘test usefulness’) and on a concern for a professional attachment to ethics (Davies, 1997), they all insist, in somewhat different ways, that test validity must take account of how and where a test is used. Such critiques, based as they are on an essentialist, relativist belief, may or may not be tenable or indeed practical. But there is no doubt that their critical attacks have penetrated into teaching programmes, giving pause to the perhaps overly confident view that a language test is a language test, no matter where or for whom. This critical stand-off is linked also to the social constructivist critiques of positivist philosophies (Lantolf, 2000). In all cases, what we see is a genuine and worthwhile attempt to reflect on what Shohamy (2001) calls the power of tests. Students and teachers who are working in and studying language testing need to know about these critiques so that they are aware that what they are involved in, language testing, has the potential to harm, indeed destroy, people, even though, of course, these critiques may not change what the students and teachers think or do.
II Second trend: Skills, knowledge and principles

The second trend reveals the move from the skills + knowledge approach to the current attempt to take account also of principles. Skills provide the training in necessary and appropriate methodology, including item writing, statistics, test analysis and, increasingly, software programmes for test delivery, analysis and reportage. Knowledge offers relevant background in measurement and language description as well as in context setting. Principles concern the proper use of language tests, their fairness and impact, including questions of ethics and professionalism.
The movement over the last 40-odd years seems to be from skills to skills + knowledge to skills + knowledge + principles. The trend is not consistent but overall seems to hold. We can argue as follows: what a new (applied) activity needs, quickly, is to disseminate skills. But it soon becomes apparent that skills are not sustainable without knowledge, since knowledge provides the context in which skills operate: if skills represent ‘how?’, then knowledge represents ‘what?’. And then over time, as the activity becomes more confident and the profession practising it grows, it is inevitable that the activity itself comes into question. It is questioned externally, of course (but that is an old critique: testing has always had its critics), and now also internally, as the language testing professionals themselves begin to query their own professionalism and their ethical foundations. What then happens is that what had been a skill, such as item writing, incorporates knowledge and so becomes skill + knowledge, since item writing requires understanding of the context and purpose for which the items are being written. Thus a test of LSP necessarily requires that item writers have the relevant knowledge of the language description of their area of special purpose. And further, as knowledge becomes more widely available in the profession, so the need to explain, to justify and to judge becomes important. Thus the concern for, let us say, validity moves from principles to knowledge, as validity itself takes on more than a concern to represent the ideal domain and becomes a recognition of the practical impact of the test in all its singular settings. In its turn, the new principles-informed knowledge is operationalized and incorporated into skills. Skills, meaning techniques and methodologies, are no longer enough on their own; skills + knowledge are inadequate without the addition of principles. For teaching, as for learning, there is a need for careful balancing of the practical (the skills) with the descriptive (the knowledge) and the theoretical (the principles). All are necessary, but any one without the other(s) is likely to be misunderstood and/or trivialized.
The survey of language testing courses reported in Bailey and Brown (1996) has now been updated for this volume by Brown and Bailey, who find that in the 10 years since their 1996 survey little has changed apart from the choice of textbooks. They report ‘the presence of a stable knowledge base that is evolving and expanding rather than shifting radically’ (Brown & Bailey, this volume, p. 371). In 1996, Bailey and Brown reported that ‘there is a great deal of diversity in the sorts of language testing preparation provided to teachers’ (Bailey & Brown, 1996, p. 250). This diversity is revealed in the list of required and optional textbooks supplied to them by the 84 language testing teachers who returned their questionnaires. Bailey and Brown list 32 textbooks, half of which were listed by only one respondent. The most common textbooks were as follows:
Henning, 1987
Madsen, 1983
Hughes, 1989
Bachman, 1990
Oller, 1979
Shohamy, 1985
Bailey and Brown (1996, p. 247) comment that ‘there is a wide range of emphasis, from the very theoretical to the very practical in the assessment preparation language teachers receive’. However, of the six textbooks listed above, the four most commonly used (Henning, Hughes, Bachman and Oller) were very much on the theoretical side. It appears that there was a widely held view that a language testing textbook should be inclusive, combining knowledge and skills. The high ranking given to Oller (1979) confirms that preference for a textbook combining theory and practice; indeed, we might speculate that the attraction of the Oller is that it offered not just knowledge and skills but also broached principles in the discussion of the nature of validity: ‘What is the ultimate criterion of validity for language tests?’ (Oller, 1979, p. 404). Oller’s ideological adherence at that time to his expectancy grammar and to the indivisibility hypothesis (or unifactorial structure of language proficiency) may well have made for a one-sided approach, but principles they are, nonetheless.
In their 2007 survey, reported in this volume, Brown and Bailey noted that 29 textbooks were listed as in use, compared with the 32 in the 1996 survey. They write: ‘Interestingly, only six of the books were common to both studies and of those, four were in new editions, while only two were in their original editions’ (Brown & Bailey, this volume, p. 371).
The six textbooks common to both surveys were as follows:
Hughes (1989, 2002)
Bachman (1990)
Brown (new ed. 2005)
Cohen (1994)
Alderson, Clapham and Wall (1995)
Bachman and Palmer (1996)
(There is, of course, a natural delay before a new textbook is taken up and an existing one laid down.) The 2007 frequency-of-use list placed five of these (Hughes, Bachman, Brown, Alderson et al., Bachman and Palmer) at its head. I noted above that of the six most commonly used textbooks listed in 1996, four were on the theoretical side. These include the Hughes and the Bachman, the only textbooks that appear as most commonly used in both lists. And two of those moving into the top position for the first time in 2007, the Alderson, Clapham and Wall and the Bachman and Palmer, also take a theoretical approach, thereby combining, as I have argued, knowledge and skills. This combination of knowledge and skills is, it would appear, more likely to endure than the somewhat ephemeral practical manuals.
I now consider a small number of celebrated, perhaps iconic, texts published over the last 50 years to illustrate what I regard as the skills, knowledge and principles trend in the concept of the teaching of language testing. What I propose, as foreshadowed earlier, is that over this period there has been an expansion from skills to skills + knowledge and then to skills + knowledge + principles. My illustrative texts are:
Lado (1961)
Allen and Davies (1977)
Hughes (1989)
Bachman (1990)
Alderson, Clapham and Wall (1995)
Bachman and Palmer (1996)
Mark My Words (1997), the ILTA Video FAQs (Fulcher & Thrasher, 1999, 2000) and the Dictionary of language testing (Davies et al., 1999) (these three taken together)
McNamara (2000)
Davidson and Lynch (2002)
Weir (2005)
McNamara and Roever (2006)
Fulcher and Davidson (2007)
While there are indeed texts that deal entirely or perhaps mainly with skills (for instance, Madsen, 1983; Carroll & Hall, 1985; Heaton, 1975), all the examples I want to discuss are more comprehensive. Thus Lado (1961), Allen and Davies (1977) and Hughes (1989) deal with both skills and knowledge.
Lado (1961) begins his book with knowledge: his Part 1 consists of discussions of language, language learning, language testing, variables and strategy of language testing, and critical evaluation of tests. The remaining 90% of the book examines the skills needed for developing tests and for experiments using language tests.
Similarly, Allen and Davies (1977) consider both skills and knowledge, with chapters on ‘Basic concepts in testing’ and on ‘The construction of language tests’. There are then two chapters on experiments, plus a further chapter (and appendices) on the meaning and working of statistics used in experiments and testing.
Hughes (1989; now in its second edition, 2003) again deals with the background knowledge needed in language testing and with the statistical and item writing skills. What is of interest to us in this discussion is that Hughes’s second edition does not move beyond the skills + knowledge position he took up in 1989, in spite of the pressures exercised by discussions of principles in the language testing community. This may suggest that there is less demand among teachers, Hughes’s target audience, for principles than I had assumed.
Bachman and Palmer (1996) and Davidson and Lynch (2002) follow much the same pattern. However, Bachman and Palmer’s examination of the conceptual basis of test development introduces what they term ‘test usefulness’, ‘a kind of metric by which we can evaluate not only the tests that we develop and use, but also all aspects of test development and use’ (Bachman & Palmer, 1996, p. 17). I consider this formulation to be an incorporation of skills + knowledge, such that the knowledge of test development and use now becomes a learnt skill. But the main purpose of their book is, they contend, ‘to enable the reader to become competent in the design, development and use of language tests’ (Bachman & Palmer, 1996, p. 3). That is its primary purpose, and that is why I place the book in the skills + knowledge category.
Davidson and Lynch’s book (2002) also belongs in this category. Its subtitle is ‘A teacher’s guide to writing and using language test specifications’. Davidson and Lynch maintain that existing language testing textbooks assume knowledge of testing, whereas they aim to provide an introduction for newcomers, focussing on test specifications. And so their book offers guidance on the skills needed to write test specifications but also shows how the knowledge behind those specifications can be viewed as, and taught as, skills in their own right.
Bachman (1990), the text on which Bachman and Palmer (1996) is based (as is the Cambridge Language Assessment series; see above), does not deal with the skills of item writing and test analysis. These matters are, after all, taken up in the later Bachman and Palmer (1996). But what Bachman does in his 1990 text is to treat knowledge as a form of skill and at the same time to move on to begin an examination of principles. Hence the discussion of validity as a unitary concept, of the evidential nature of validity and of the consequential and ethical basis of validity. Such discussions look forward to the Bachman and Palmer (1996) concept of test usefulness.
Alderson, Clapham and Wall (1995) is much more rounded and
less practical. ‘Something we do not do in this book is to describe
language testing techniques in detail’ (Alderson, Clapham & Wall,
1995, pp. 2–3). What they do deal with is the examination of valid-
ation in all its aspects. Indeed, short on techniques though this volume
may be, through its in-depth discussion of validity and standards its
scope as a textbook is as wide-ranging as that of Lado’s before them
and that of Fulcher and Davidson (2007) over ten years later.
Alderson, Clapham and Wall (1995) ground much of their discussion
by reference to the work of the UK EFL examination boards. This
has the advantage of realism and makes a powerful argument for the
different concerns of academic language testing on the one hand and
public or institutional (or indeed commercial) language testing on
the other. It is distressing for academics to learn that ‘not all boards
understand what is meant by validation, validity and reliability’
(Alderson et al., 1995, p. 257). But the very context-specificity of the
book and its strong critique – almost ideological – of the boards may
detract from the overall concern of the learner, who is unlikely to
have the same critical view as the authors.
The two video projects, Mark My Words (1997) and the ILTA
Video FAQs (Fulcher & Thrasher, 1999, 2000), are not primarily
concerned with skills. Both deal largely with knowledge. Thus, Mark
My Words has the following topics in its series:
Language proficiency assessment
Principles of test development (‘principles’ is used somewhat
differently in the present article)
Objective and subjective assessment
Stages of test analysis
Performance assessment
Classroom-based assessment
In this video series, knowledge is presented not so much as back-
ground as part of the necessary skills behaviour in developing lan-
guage tests. The Dictionary of Language Testing (Davies et al.,
1999) goes further by incorporating, as is the nature of dictionaries,
topics such as ethics, ethicality and impact and thus taking some
account of the principles of language testing.
Like Alderson, Clapham and Wall (1995), Weir (2005) is less con-
cerned with skills than with knowledge and in particular with valid-
ation. What he very carefully does is to explain that validation
evidence is required to demonstrate validity: in other words, while
others have shaped knowledge into a kind of skill, what Weir does is
to convert principles first into knowledge and then into a skill. No
doubt there is a case for retaining a separation between knowledge
and skills and between principles and skills so that implementing
them requires thought, not just automaticity. But for teaching pur-
poses, which is our concern here, the demonstration of how to make
both knowledge and principles operational, that is skill-like, is peda-
gogically very appealing.
McNamara (2000) and McNamara and Roever (2006) both elabo-
rate the knowledge needed for professional language testers and deal
in some depth with principles. Thus, in both texts there is concern
with ethics and social policy and responsibilities: indeed, the whole
of McNamara and Roever (2006) is concerned, as the title indicates,
with social issues. McNamara and Roever argue that ‘language test-
ing is … ripe for a broader view of assessment and its social aspects
(while) … testers need … to reflect on test use’ (McNamara &
Roever, 2006, p. 8). If McNamara (2000) and McNamara and Roever
(2006) are concerned wholly with knowledge and principles in such
a way that principles become part of the knowledge needed, Fulcher
and Davidson (2007) is yet more all-embracing, offering in one vol-
ume what Bachman (1990) and Bachman and Palmer (1996) offer in
two. Fulcher and Davidson (2007), entitled Language testing and
assessment: An advanced resource book, is part of a series of Applied
Linguistics resource books covering different areas.
Fulcher and Davidson (2007, p. xix) consider that their discussion
‘is set within a new approach that we believe brings together testing
practice, theory, ethics and philosophy. At the heart of our new
approach is the concept of effect-driven testing. This is a view of test
validity that is highly pragmatic. Our emphasis is on the outcome of
testing activities’.
The integrative nature of their text, comprising:
A) Introduction: 10 units dealing with the central concepts of language testing and assessment
B) Extension: readings from books and articles linked to the concepts introduced in Section A
C) Exploration: extended activities building on both A and B
situates the learner within the language testing enterprise. Of all the
texts examined in this paper, Fulcher and Davidson (2007) does
seem to provide the most complete coverage of skills, knowledge
and principles.
The development proposed above can be summarized as follows. In
the 1960s (and earlier) language testing relied on external sources,
particularly psychometric ones (Cronbach, 1949; Anastasi, 1961; Tyler,
1963; Anstey, 1966). From the 1970s onwards, the attempt was
made to nativize the necessary skills and knowledge within the field,
but in separate texts: Hatch and Farhady (1982) and Hatch and
Lazaraton (1991) dealing with statistics and research design; Shohamy
(2001) and, possibly, the external Pennycook (2001) handling critical
approaches; the ILTA Code of Ethics (2000) presenting the profession’s
ethical principles; and Bachman (2004), again dealing with statistics.
Meanwhile, we have the internal sequence discussed above,
from Lado (1961) to Fulcher and Davidson (2007), moving gradually
beyond the skills knowledge scenario to the skills knowledge
principles combination. We present this array in Table 1.
III Conclusion
The development in teaching materials examined in this paper comes
as a result of the increasing professionalism of the field of language
testing. That increasing professionalism has a twofold cost. First,
in-housing all resources means that language testers are
increasingly excluded from other potentially rewarding disciplines.
Second, the complete resource offerings in the later teaching materials
mean that students are over-protected from empirical
encounters with real language learners, spending all (or much of)
their training within the resource material.
This exclusion from external influences leads to an insularity,
a reluctance to take up new ideas, as McNamara and Roever (2006)
argue. They remind us that, in spite of the social turn in the last two
decades, the teaching of language testing is still largely
psychometric: ‘In terms of academic training, we stress the importance of a
well-rounded training for language testers that goes beyond applied
psychometrics … a training that includes a critical view of testing
and social consequences’ (McNamara & Roever, 2006, p. 255).
Table 1 Changes in English language testing textbooks
External Internal Internal Skills Knowledge Principles
separate combined
1960s: Cronbach; Lado x x
Anastasi;
Tyler;
Anstey
1970s: Hatch & Allen & x x
Farhady; Davies
Hatch &
Lazaraton
Hughes x x
Bachman; x x
Alderson, x x
Clapham &
Wall
Bachman & x x
Palmer
Mark My ? x ?
Words
2000s: Code of x x x
Ethics
Shohamy McNamara x x
Pennycook Davidson & x x
Lynch; Weir
Bachman McNamara & x x
Roever
Fulcher & x x x
Davidson

A similar resistance may explain the reluctance to grapple with
recent research findings. Wood (1991) takes an extreme view: ‘it is
clear that innovation is not driven by research … (but) … it is important
to understand how (innovations) happened, and whether they
were actually necessary, if only to appreciate how marginal the part
research evidence plays in the decision’ (Wood, 1991, p. 248). Wood’s
scepticism about the influence of research relates specifically to
innovation, a view he exemplifies in his comments on the English
Language Testing Service test (Wood, 1991, pp. 235–236). In this paper, I have not, in any direct
way, discussed to what extent language testing textbooks make use
of language testing research. This omission is deliberate. It is not, of
course, that there is a dearth of research in language testing: on the
contrary, there is a great deal, reported in journals such as
Language Testing and Language Assessment Quarterly, in
encyclopedias such as Shohamy and Hornberger (2008) and Hinkel
(2005), in the many monographs, notably the Alderson and Bachman
series (see the reference above, p. 334, to the CUP Language
Assessment Series, 2000–2004), and in the regular reports of
Cambridge ESOL and of Educational Testing Service on TOEFL. But
while a textbook is properly informed by research, its primary
purpose is not, as is that of a monograph, to report recent research.
Textbooks consolidate while monographs are dynamic, reporting
developments in current research. There is an inevitable gap, a time
lag between the publication of research and its incorporation in a
textbook, by which time there may be a very different research need,
as, for example, McNamara (2005), Rea-Dickins (2008) and Leung
and Lewkowicz (2008) note in their references to classroom
assessment. McNamara’s labelling of this gap as ‘unbridgeable’
(McNamara, 2005, p. 778) is both apt and mistaken. Yes, there is a gap,
but it is a necessary gap. Worthwhile training needs to be informed
by a mature understanding of research and not by the latest news from
the PhD and the research project.
IV References
Alderson, C. (2000). Assessing reading. Cambridge: Cambridge University
Press.
Alderson, C. & Banerjee, J. (2001). Language testing and assessment (Survey
Article). Language Teaching, 34, 213–236.
Alderson, C. & Banerjee, J. (2002). Language testing and assessment (Survey
Article). Language Teaching, 35, 79–113.
Alderson, C., Clapham, C. & Wall, D. (1995). Language test construction
and evaluation. Cambridge: Cambridge University Press.
Alderson, C. & Bachman, L. F. (Eds.). (2000–2005). The Cambridge lan-
guage assessment series. Cambridge: Cambridge University Press.
Allen, P. & Davies, A. (Eds.). (1977). Testing and experimental methods:
Volume 4. The Edinburgh course in applied linguistics. London: Oxford
University Press.
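Allen, P. & Corder, S. P. (Eds.). (1973). Readings for applied linguistics:
Volume 1. The Edinburgh course in applied linguistics. London: Oxford
University Press.
Allen, P. & Corder, S. P. (Eds.). (1974). Techniques in applied linguistics:
Volume 3. The Edinburgh course in applied linguistics. London: Oxford
University Press.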
Allen, P. & Corder, S. P. (Eds.). (1975). Papers in applied linguistics: Volume
2. The Edinburgh course in applied linguistics. London: Oxford
University Press.
Anastasi, A. (1954). Psychological testing (1st ed.). New York: Macmillan.
Anastasi, A. (1961). Psychological testing (2nd ed.). New York: Macmillan.
Anderson, N. (1993). Handbook for classroom teachers in Peace Corps
language programs. Manual. Washington, DC: The Peace Corps.
Anstey, E. (1966). Psychological tests. London: Macmillan.
Bachman, L. (1990). Fundamental considerations in language testing.
Oxford: Oxford University Press.
Bachman, L. (2004). Statistical analyses for language assessment.
Cambridge: Cambridge University Press (also with Antony Kunnan:
Handbook and CD).
Bachman, L. & Palmer, A. (1996). Language testing in practice. Oxford:
Oxford University Press.
Bailey, K. M. & Brown, J. D. (1996). Language testing courses: What
are they? In A. Cumming & R. Berwick (Eds.), Validation in language
testing (pp. 236–256). Clevedon: Multilingual Matters.
Bloomfield, L. (1933). Language. New York: Henry Holt.
Brown, J. D. & Bailey, K. M. (2008). Language testing courses: What are they
in 2007? Language Testing (this issue).
Buck, G. (2001). Assessing listening. Cambridge: Cambridge University Press.
Buros, O. (1959). Fifth mental measurements yearbook. Highland Park, NJ:
Gryphon Press.
CAL (Center for Applied Linguistics) (1961). Testing the English proficiency
of foreign students. Washington, DC: CAL.
Carroll, B. J. (1985). Testing communicative performance. Oxford: Pergamon.
Carroll, B. J. & Hall, P. (1985). Make your own language tests: A practical
guide to writing language performance tests. Oxford: Pergamon.
Carroll, J. B. (1961). Fundamental considerations in testing for English
language proficiency of foreign students. In CAL Testing the English
proficiency of foreign students (pp. 30–40). Washington, DC: Center for
Applied Linguistics.
Clark, J. (1972). Foreign language testing: Theory and practice. Philadelphia,
PA: Center for Curriculum Inc.
Cobuild (1987). Collins Cobuild English language dictionary. London: Collins.
COE (2000). ILTA Code of Ethics. http://www.iltaonline.com/code.pdf
Cohen, A. D. (1994). Assessing language ability in the classroom (2nd ed.).
New York: Heinle and Heinle.
Cronbach, L. (1949). Essentials of psychological testing (1st ed.). New York:
Harper and Row International.
Cronbach, L. (1961). Essentials of psychological testing (2nd ed.). New York:
Harper and Row International.
Cumming, A. & Berwick, R. (Eds.). (1996). Validation in language testing.
Clevedon: Multilingual Matters.
Cushing, S. W. (2002). Assessing writing. Cambridge: Cambridge University
Press.
Davidson, F. & Lynch, B. (2002). Testcraft: A teacher’s guide to writing and
using language test specifications. New Haven, CT and London: Yale
University Press.
Davies, A. (Ed.). (1968). Language testing symposium: A psycholinguistic
approach. London: Oxford University Press.
Davies, A. (1977). The construction of language tests. In P. Allen & A. Davies
(Eds.), Testing and experimental methods. Volume 4. Edinburgh course in
applied linguistics (pp. 38–104). London: Oxford University Press.
Davies, A. (1982). Language testing survey. Parts 1 and 2. In V. Kinsella (Ed.),
Surveys (pp. 127–159). Cambridge: Cambridge University Press.
Davies, A. (1990). Principles of language testing. Oxford: Oxford University
Press.
Davies, A. (1996). The role of the segmental dictionary in professional
validation: Constructing a dictionary of language testing. In A. Cumming &
R. Berwick (Eds.), Validation in language testing (pp. 222–235).
Clevedon: Multilingual Matters.
Davies, A. (1997). Demands of being professional in language testing.
Language Testing, 14(3), 328–339.
Davies, A., Brown, A., Elder, C., Hill, K., Lumley, T., & McNamara, T.
(1999). Dictionary of language testing. Cambridge: Cambridge
University Press and UCLES.
Douglas, D. (1999). Assessing languages for specific purposes. Cambridge:
Cambridge University Press.
Fries, C. (1945). Teaching and learning English as a foreign language. Ann
Arbor, MI: University of Michigan Press.
Fulcher, G. & Davidson, F. (2007). Language testing and assessment: An
advanced resource book. London: Routledge.
Fulcher, G. & Thrasher, R. (1999/2000). Video FAQs. Introducing topics in
language testing. ILTA (online) http://www.le.ac.uk/education/ilta/faqs/
main.html
Genesee, F. & Upshur, J. (1996). Classroom evaluation in second language
education. Cambridge: Cambridge University Press.
Gleason, H. (1955). An introduction to descriptive linguistics. New York:
Henry Holt.
Harris, D. (1969). Testing English as a second language. New York: McGraw Hill.
Hatch, E. & Lazaraton, A. (1991). The research manual: Design and statis-
tics for applied linguistics. New York: Newbury House.
Hatch, E. & Farhady, H. (1982). Research design and statistics for applied
linguistics. New York: Newbury House.
Hawkey, R. (2006). Impact theory and practice. Studies in language testing.
Cambridge: Cambridge University Press and UCLES.
Heaton, B. (1975). Writing English language tests. London: Longman.
Henning, G. (1987). A guide to language testing. Cambridge, MA: Newbury
House.
Hinkel, E. (Ed.). (2005). Handbook of research in second language teaching
and learning. Mahwah, NJ: Lawrence Erlbaum.
Hockett, C. (1958). A course in modern linguistics. New York: Macmillan.
Hughes, A. (1989). Testing for language teachers (1st ed.). Cambridge:
Cambridge University Press.
Hughes, A. (2003). Testing for language teachers (2nd ed.). Cambridge:
Cambridge University Press.
Ingram, E. (1977). Basic concepts in testing. In J. P. B. Allen & A. Davies
(Eds.), Testing and experimental methods: Volume 4. Edinburgh course
in applied linguistics (pp. 11–37). London: Oxford University Press.
Lado, R. (1961). Language testing. London: Longmans.
Lantolf, J. (Ed.). (2000). Sociocultural theory and second language learning.
Oxford: Oxford University Press.
Leung, C. & Lewkowicz, J. (2008). Assessing second/additional language of
diverse populations. In E. Shohamy & N. H. Hornberger (Eds.),
Encyclopedia of language and education: Volume 7. Language testing
and assessment (2nd ed., pp. 301–317). New York: Springer Science &
Business Media.
Luoma, S. (2004). Assessing speaking. Cambridge: Cambridge University
Press.
Madsen, H. (1983). Techniques in testing. New York and Oxford: Oxford
University Press.
Mark My Words (1997). Video Series. Melbourne: University of Melbourne
Language Testing Research Centre.
McNamara, T. (2000). Language testing. Oxford: Oxford University Press.
McNamara, T. (2005). Introduction to Part VI: Second language testing and
assessment. In E. Hinkel (Ed.), Handbook of research in second
language teaching and learning (pp. 775–778). Mahwah, NJ: Lawrence
Erlbaum.
McNamara, T. & Roever, C. (2006). Language testing: The social dimension.
Malden, MA and Oxford: Blackwell.
Mousavi, A. (2002). An encyclopedic dictionary of language testing (3rd ed.).
Taiwan: Tung Hua.
Oller, J. W., Jr. (1979). Language tests at school. London: Longman.
Pennycook, A. (2001). Critical applied linguistics: A critical introduction.
Mahwah, NJ: Lawrence Erlbaum.
Purpura, J. (2004). Assessing grammar. Cambridge: Cambridge University
Press.
Read, J. (2000). Assessing vocabulary. Cambridge: Cambridge University
Press.
Rea-Dickins, P. (2008). Classroom-based language assessment. In
E. Shohamy & N. H. Hornberger (Eds.), Encyclopedia of language and
education: Volume 7. Language testing and assessment (2nd ed.,
pp. 257–271). New York: Springer Science & Business Media.
Sapir, E. (1921). Language: An introduction to the study of speech. New York:
Harcourt Brace.
Shohamy, E. (1985). A practical handbook in language testing for the second
language teacher. Shaked: Ramat Aviv, Israel.
Shohamy, E. (2001). The power of tests: A critical perspective on the uses of
language tests. Harlow: Pearson.
Shohamy, E. & Hornberger, N. H. (Eds.). (2008). Encyclopedia of language
and education: Volume 7. Language testing and assessment (2nd ed.).
New York: Springer Science & Business Media.
Skehan, P. (1988). Language testing: Part 1. Language Teaching, 21(4),
211–221.
Skehan, P. (1989). Language testing: Part 2. Language Teaching, 22(1), 1–13.
Spolsky, B. (1995). Measured words. Oxford: Oxford University Press.
Tyler, L. (1963). Tests and measurements. Englewood Cliffs, NJ: Prentice-Hall.
Valette, R. (1967). Modern language tests: A handbook (1st ed.). New York:
Harcourt Brace and World.
Valette, R. (1977). Modern language tests: A handbook (2nd ed.). New York:
Harcourt Brace and World.
Weir, C. (2005). Language testing and validation: An evidence-based
approach. Houndmills, Basingstoke: Palgrave Macmillan.
Wood, R. (1991). Assessment and testing. Cambridge: Cambridge University
Press.
