Textbook Trends in Teaching Language Testing: Alan Davies University of Edinburgh, UK
Address for correspondence: Alan Davies, Linguistics and English Language, School of
Philosophy, Psychology and Language Sciences, University of Edinburgh, Adam Ferguson
Building, George Square, Edinburgh EH8 9LL, UK; email: a.davies@ed.ac.uk
© 2008 SAGE Publications (Los Angeles, London, New Delhi and Singapore) DOI:10.1177/0265532208090156
328 Textbook trends in teaching language testing
days and therefore language teachers (who might or might not fol-
low graduate courses in applied linguistics at some point) were the
necessary audience for applied linguistics developments.
In his important CAL paper, J. B. Carroll (1961) does not attempt
to offer a language testing blueprint. Instead, he sets out to ‘readdress
the attention of the audience to certain basic and fundamental prob-
lems and points of view, some of which may have been lost sight of
in the heat of enthusiasm for technical detail’ (Carroll, 1961, p. 31).
And in his edited volume, Davies (1968) offers a range of views that
attempt to bring together the ‘three strands of language testing: lan-
guage, learning and evaluation’ (Davies, 1968, p. 1), examining ‘the
basic disciplines and their relevance to language testing … uses and
types of test … the influence of tests on education … the item analy-
sis needed’ (Davies, 1968, p. 13). Again, as with the Carroll paper,
the focus here is that of resource (with a glance at textbook material).
Whether we locate them as resource materials or as textbooks, it
seems the case that while the Lado and the Valette and the Harris
combine the what with the how to, with Valette and Harris more on
the how to side, the Carroll and the Davies are both very much on
the what side.
Through the 1960s and the 1970s, with the publication of the textbooks
of Clark (1972) and Allen and Davies (1977), along with the
publication of the Peace Corps’ Manual of language testing (published
later as Anderson, 1993), a deliberately practical field-guide,
there was always the recognition that while these materials provided
textbooks and practical manuals, supporting psychometric and
statistical back-ups were necessary. These back-ups were not specific
to language or applied linguistics but were generic to all testing,
such as Cronbach (1949), Anastasi (1954) and so on. And for statistical
work there were generic programmes such as SPSS.
The Edinburgh course in applied linguistics appeared in four vol-
umes between 1973 and 1977 (Allen & Corder, 1973, 1974, 1975;
Allen & Davies, 1977). Volume 4 (Allen & Davies, 1977), with the
title Testing and experimental methods, argued that ideas in applied
linguistics needed to be submitted to the rigour of hypothesis and
experimentation. Experiments, it was suggested, need tests while
tests are themselves kinds of experiment: ‘This book is an attempt to
demonstrate our belief in the importance of this link’ (Allen &
Davies, 1977, p. 10).
After the Introduction by Davies, two chapters (by Davies &
Ingram) were devoted to testing, two (by Ruth Clark) to experimen-
tal design and computation and one to statistical inference. There
332 Textbook trends in teaching language testing
design and the Henning (1987) which began to make IRT techniques
meaningful to the field of language testing and applied linguistics.
The 1980s also saw the start of the new journal Language Testing,
a sure sign of the field’s growing research capacity. This journal was
very deliberately not a teaching outlet.
In the 1990s we see the normal academic development of an
emerging discipline, now maturing and showing that maturity by
publishing research monographs and regular surveys of the field
(Davies, 1982; Skehan, 1988, 1989; Alderson & Banerjee, 2001,
2002) that cover both its development and its future trends. These
developments increased the coming together of the contributing disciplines
so that Bachman (1990) and Bachman and Palmer (1996) brought
together research design, statistics, computer programmes, test
preparation and analyses, while Davies (1990) and Wood (1991)
offered critiques of language testing which can at best be regarded as
resource materials for teaching, but are not easily put directly to use
in a training programme.
Since so much writing about language testing, up until the 1990s,
and perhaps even today, concerns large-scale testing, Genesee and
Upshur (1996) was very much to be welcomed, dealing as it did with
the very real, and very difficult, context of classroom assessment.
Genesee and Upshur termed their book ‘practical’ and it belongs to
the practical manual end of my textbook category, concerned primar-
ily with skills and offering the knowledge necessary for employing
those skills but less concerned with principles.
As the field of language testing has grown, as courses have de-
veloped, at the undergraduate and graduate as well as at the PhD level,
and as those courses have specialized so that now there are a number
of master’s degrees in language testing itself, while formerly these
were normally part of degrees in applied linguistics or in applied lan-
guage studies, TESOL, and so forth, different teaching needs have
shown themselves. Extended resources deliberately designed for
teaching are represented by the publication of self-help teaching
materials in the shape of the University of Melbourne video series
Mark My Words (1997) and the ILTA Web-based interviews on lan-
guage testing Video FAQs (Fulcher & Thrasher, 1999, 2000). A paral-
lel development is represented by the publication of the Dictionary of
language testing (Davies, Brown, Elder, Hill, Lumley & McNamara,
1999) and the Encyclopedic dictionary of language testing (Mousavi,
2002). Maturing disciplines display and develop their maturity
through the development of teaching materials, such as the videos, and
through the defining descriptive work of specific dictionaries. This
334 Textbook trends in teaching language testing
and the theoretical (the principles). All are necessary but one without
the other(s) is likely to be misunderstood and/or trivialized.
The survey of language testing courses, reported in Bailey and
Brown (1996), has now been updated for this volume by Brown and
Bailey, who find that in the 10 years since their 1996 survey little has
changed apart from the choice of textbooks. They report ‘the pres-
ence of a stable knowledge base that is evolving and expanding
rather than shifting radically’ (Brown & Bailey, this volume: p. 371).
In 1996, Bailey and Brown reported that ‘there is a great deal of
diversity in the sorts of language testing preparation provided to
teachers’ (Bailey & Brown, 1996, p. 250). This diversity is revealed
in the list of required and optional textbooks supplied to them by the
84 language testing teachers who returned their questionnaires.
Bailey and Brown list 32 textbooks, half of which were listed by
only one respondent. The most common textbooks were as follows:
Henning, 1987
Madsen, 1983
Hughes, 1989
Bachman, 1990
Oller, 1979
Shohamy, 1985
Bailey and Brown (1996, p. 247) comment that ‘there is a wide range
of emphasis, from the very theoretical to the very practical in the
assessment preparation language teachers receive’. However, of the
six textbooks listed above, the four most commonly used (Henning,
Hughes, Bachman and Oller) were very much on the theoretical side.
It appears that there was a widely held view that a language testing
textbook should be inclusive, combining knowledge and skills. The
high ranking given to Oller (1979) confirms that choice of theory
plus practical textbook; indeed, we might speculate that the attrac-
tion of the Oller is that it offered not just knowledge and skills but
also broached principles in the discussion of the nature of validity:
‘What is the ultimate criterion of validity for language tests?’ (Oller,
1979, p. 404). No doubt Oller’s ideological adherence at that time to
his expectancy grammar and to the indivisibility hypothesis (or uni-
factorial structure of language proficiency) may have made for a
one-sided approach, but principles they are, nonetheless.
In their 2007 Survey, reported in this volume, Brown and Bailey
noted that 29 textbooks were listed as in use, compared with the 32
in the 1996 Survey. They write: ‘Interestingly, only six of the books
were common to both studies and of those, four were in new
editions, while only two were in their original editions’ (Brown &
Bailey, this volume: p. 371).
Alan Davies 337
Lado (1961) begins his book with knowledge: his Part 1 consists
of discussions of language, language learning, language testing, vari-
ables and strategy of language testing, and critical evaluation of
tests. The remaining 90% of the book examines the skills needed for
developing tests and for experiments using language tests.
Similarly, Allen and Davies (1977) consider both skills + knowledge
with chapters on ‘Basic concepts in testing’ and on ‘The construction
of language tests’. There are then two chapters on
experiments plus a further chapter (and appendices) on the meaning
and working of statistics used in experiments and testing.
Hughes (1989, now in its second edition, 2003) again deals with
the background knowledge needed in language testing and with the
statistical and item writing skills. What is of interest to us in this
discussion is that Hughes’s second edition does not move beyond the
skills + knowledge position he took up in 1989 in spite of the
pressures exercised by discussions in the language testing community on
principles. This may suggest that there is less demand among teachers,
Hughes’s target audience, for principles than I had assumed.
Bachman and Palmer (1996) and Davidson and Lynch (2002) follow
much the same pattern. However, Bachman and Palmer’s examination
of the conceptual basis of test development introduces what they term
‘test usefulness’, ‘a kind of metric by which we can evaluate not only
the tests that we develop and use, but also all aspects of test develop-
ment and use’ (Bachman & Palmer, 1996, p. 17). This formulation
I consider to be an incorporation of skills + knowledge such that the
knowledge of test development and use now becomes a learnt skill. But
the main purpose of their book is, they contend, ‘to enable the reader to
become competent in the design, development and use of language
tests’ (Bachman & Palmer, 1996, p. 3). That is its primary purpose and
that is why I place the book in the skills + knowledge category.
Davidson and Lynch’s book (2002) also belongs in this category.
The subtitle is: ‘A teacher’s guide to writing and using language test
specifications’. Davidson and Lynch maintain that existing language
testing textbooks assume knowledge of testing while they aim to
provide an introduction to newcomers, focussing on test specifica-
tions. And so their book offers guidance on the skills needed to write
test specifications but also shows how the knowledge behind those
specifications can be viewed as – and taught as – skills in their own right.
Bachman (1990), the text on which Bachman and Palmer (1996) is
based (as well as the series Cambridge Language Assessment – see
above), does not deal with the skills of item writing and test analysis.
These matters are, after all, taken up in the later Bachman and Palmer
(1996). But what Bachman does in his 1990 text is to treat knowledge
as a form of skill and at the same time to move on to begin an exam-
ination of principles. Hence the discussion of validity as a unitary
concept, of the evidential nature of validity and of the consequential
and ethical basis of validity. Such discussions look forward to the
Bachman and Palmer (1996) concept of test usefulness.
Alderson, Clapham and Wall (1995) is much more rounded and
less practical. ‘Something we do not do in this book is to describe
language testing techniques in detail’ (Alderson, Clapham & Wall,
1995, pp. 2–3). What they do deal with is the examination of valid-
ation in all its aspects. Indeed, short on techniques though this volume
may be, through its in-depth discussion of validity and standards its
scope as a textbook is as wide-ranging as that of Lado’s before them
and that of Fulcher and Davidson (2007) over ten years later.
Alderson, Clapham and Wall (1995) ground much of their discussion
by reference to the work of the UK EFL examination boards. This
has the advantage of realism and makes a powerful argument for the
different concerns of academic language testing on the one hand and
public or institutional (or indeed commercial) language testing on
the other. It is distressing for academics to learn that: ‘not all boards
understand what is meant by validation, validity and reliability’
(Alderson et al., 1995, p. 257). But the very context specificity of the
book and its strong critique – almost ideological – of the boards may
detract from the overall concern of the learner who is unlikely to
have the same critical view as the authors.
The two video projects, Mark My Words (1997) and the ILTA
Video FAQs (Fulcher & Thrasher, 1999, 2000), are not primarily
concerned with skills. Both deal largely with knowledge. Thus, Mark
My Words has the following topics in its series:
Language proficiency assessment
Principles of test development (‘principles’ is used somewhat
differently in the present article)
Objective and subjective assessment
Stages of test analysis
Performance assessment
Classroom-based assessment
In this video series, knowledge is presented not so much as back-
ground as part of the necessary skills behaviour in developing lan-
guage tests. The Dictionary of Language Testing (Davies et al.,
1999) goes further by incorporating, as is the nature of dictionaries,
topics such as ethics, ethicality and impact and thus taking some
account of the principles of language testing.
Like Alderson, Clapham and Wall (1995), Weir (2005) is less con-
cerned with skills than with knowledge and in particular with valid-
ation. What he very carefully does is to explain that validation
evidence is required to demonstrate validity: in other words, while
others have shaped knowledge into a kind of skill, what Weir does is
to convert principles into first knowledge and then into a skill. No
doubt there is a case for retaining a separation between knowledge
and skills and between principles and skills so that implementing
them requires thought, not just automaticity. But for teaching pur-
poses, which is our concern here, the demonstration of how to make
both knowledge and principles operational, that is skill-like, is peda-
gogically very appealing.
McNamara (2000) and McNamara and Roever (2006) both elabo-
rate the knowledge needed for professional language testers and deal
in some depth with principles. Thus, in both texts there is concern
with ethics and social policy and responsibilities: indeed, the whole
of McNamara and Roever (2006) is concerned, as the title indicates,
with social issues. McNamara and Roever argue that ‘language test-
ing is … ripe for a broader view of assessment and its social aspects
(while) … testers need … to reflect on test use’ (McNamara &
Roever, 2006, p. 8). If McNamara (2000) and McNamara and Roever
(2006) are concerned wholly with knowledge and principles in such
a way that principles become part of the knowledge needed, Fulcher
and Davidson (2007) is yet more all-embracing, offering in one vol-
ume what Bachman (1990) and Bachman and Palmer (1996) offer in
two. Fulcher and Davidson (2007) has the title: Language testing and
assessment: An advanced resource book and is part of an Applied
Linguistics series of resource books in different areas.
Fulcher and Davidson (2007, p. xix) consider that their discussion
‘is set within a new approach that we believe brings together testing
practice, theory, ethics and philosophy. At the heart of our new
approach is the concept of effect-driven testing. This is a view of test
validity that is highly pragmatic. Our emphasis is on the outcome of
testing activities’.
The integrative nature of their text, comprising:
A) Introduction: 10 units dealing with the central concepts of language
testing and assessment
B) Extension: readings from books and articles linked to the concepts
introduced in Section A
C) Exploration: extended activities building on both A and B
situates the learner within the language testing enterprise. Of all the
texts examined in this paper, Fulcher and Davidson (2007) does
seem to provide the most complete coverage of skills, knowledge
and principles.
The development proposed above can be summarized as follows: In
the 1960s (and earlier) language testing relied on external sources,
particularly psychometric (Cronbach, 1949; Anastasi, 1961; Tyler,
1963; Anstey, 1966). From the 1970s and onwards, the attempt was
made internally to nativize the necessary skills and knowledge – but
in separate texts, thus Hatch and Farhady (1982) and Hatch and
Lazaraton (1991) dealing with statistics and research design; Shohamy
(2001) and possibly the external Pennycook (2001) to handle critical
approaches, the ILTA Code of Ethics (2000) presenting the profes-
sion’s ethical principles, and Bachman (2004), again dealing with stat-
istics. Meanwhile, we have the internal sequence discussed above
from Lado (1961) to Fulcher and Davidson (2007) moving gradually
beyond the skills + knowledge scenario to the skills + knowledge +
principles combination. We present this array in Table 1.
III Conclusion
The development in teaching materials examined in this paper comes
as a result of the increasing professionalism of the field of language
testing. That increasing professionalism has a twofold cost. First,
in-housing all resources means that language testers are
increasingly excluded from other potentially rewarding disciplines.
Second, the complete resource offerings in the later teaching materials
mean that students are over-protected from exposure to empirical
encounters with real language learners, spending all (or much of)
their training within the resource material.
This exclusion from external influences leads to an insularity,
a reluctance to take up new ideas, as McNamara and Roever (2006)
argue. They remind us that in spite of the social turn in the last two
decades the teaching of language testing is still largely psychomet-
ric: ‘In terms of academic training, we stress the importance of a
well-rounded training for language testers that goes beyond applied
psychometrics … a training that includes a critical view of testing
and social consequences’ (McNamara & Roever, 2006, p. 255).
A similar resistance may explain the reluctance to grapple with
recent research findings. Wood’s (1991) is an extreme view: ‘it is
clear that innovation is not driven by research … (but) … it is import-
ant to understand how (innovations) happened, and whether they
IV References
Alderson, C. (2000). Assessing reading. Cambridge: Cambridge University Press.
Alderson, C. & Banerjee, J. (2001). Language testing and assessment (Survey
Article). Language Teaching, 34, 213–236.
Alderson, C. & Banerjee, J. (2002). Language testing and assessment (Survey
Article). Language Teaching, 35, 79–113.
Alderson, C., Clapham, C. & Wall, D. (1995). Language test construction
and evaluation. Cambridge: Cambridge University Press.
Alderson, C. & Bachman, L. F. (Eds.). (2000–2005). The Cambridge lan-
guage assessment series. Cambridge: Cambridge University Press.
Allen, P. & Davies, A. (Eds.). (1977). Testing and experimental methods:
Volume 4. The Edinburgh course in applied linguistics. London: Oxford
University Press.
Allen, P. & Corder, S. P. (Eds.). (1973). Readings for applied linguistics:
Volume 1. The Edinburgh course in applied linguistics. London: Oxford
University Press.
Allen, P. & Corder, S. P. (Eds.). (1974). Techniques in applied linguistics:
Volume 3. The Edinburgh course in applied linguistics. London: Oxford
University Press.
Allen, P. & Corder, S. P. (Eds.). (1975). Papers in applied linguistics: Volume
2. The Edinburgh course in applied linguistics. London: Oxford
University Press.
Anastasi, A. (1954). Psychological testing (1st ed.). New York: Macmillan.
Anastasi, A. (1961). Psychological testing (2nd ed.). New York: Macmillan.
Anderson, N. (1993). Handbook for classroom teachers in Peace Corps
language programs. Manual. Washington, DC: The Peace Corps.
Anstey, E. (1966). Psychological tests. London: Macmillan.
Valette, R. (1967). Modern language tests: A handbook (1st ed.). New York:
Harcourt Brace and World.
Valette, R. (1977). Modern language tests: A handbook (2nd ed.). New York:
Harcourt Brace and World.
Weir, C. (2005). Language testing and validation: An evidence based
approach. Houndmills, Basingstoke: Palgrave Macmillan.
Wood, R. (1991). Assessment and testing. Cambridge: Cambridge University Press.