State-of-the-Art Review Article
Pronunciation
Jane Setter University of Reading, UK
Jennifer Jenkins King’s College London, UK
j.e.setter@reading.ac.uk, jennifer.jenkins@kcl.ac.uk
This article is organised in five main sections. It begins
by outlining the scope of pronunciation teaching and the
role of pronunciation in our personal and social lives. The
second section surveys the background to pronunciation
teaching from its origins in the early twentieth century to
the present day, and includes a discussion of pronunciation
models and of the role of the first language (L1) in the
acquisition of second language (L2) pronunciation. Then
a third section explores recent research into a range of
aspects involved in the process: the effects of L1 and L2
similarities and differences; the role of intelligibility, accent
attitudes, identity and motivation; the part played by
listening; and the place of pronunciation within discourse.
This section concludes with a discussion of a number of
controversies that have arisen from recent pronunciation
research and of research into the potential for using
computer-based technology in pronunciation teaching. The
fourth section explores a range of socio-political issues that
affect pronunciation teaching when the L2 is learnt as an
international rather than a foreign language, and the fifth
section moves on to consider the implications of all this for
teaching.
1. Introduction
1.1 The scope of pronunciation teaching
Pronunciation involves the production and perception of segmentals (sounds), both alone and in the
stream of speech, where they undergo a number
of modifications and interact with suprasegmental
(prosodic) features, particularly stress and intonation.
Although all these aspects of pronunciation could
be expected to appear on second language (L2)
curricula, there are differences in the degree of
importance attached to pronunciation teaching in
different parts of the world. For example, for the past
three decades, pronunciation in English Language
Teaching (ELT) has tended to be marginalised in
the UK and US, but to be regarded as critical in
many parts of mainland Europe such as Austria and
Russia. On the other hand, if somewhat curiously,
pronunciation is widely accorded far more importance in the UK as far as the teaching of
foreign languages other than English is concerned.
One might well wonder why differences in attitude
towards pronunciation teaching exist, when quite
evidently it ought to be an important aspect within
a teaching and learning context which is communicatively oriented (see Grotjahn’s 1998 review of
pronunciation teaching). Whatever the pedagogic
orientation, however, pronunciation is universally
considered to be a ‘difficult’ aspect of an L2 to teach
and learn – and possibly the most difficult, for various
reasons which will emerge below.
We would like to state that this article focuses
unashamedly on English pronunciation, as this is the
authors’ area of expertise. This should not be taken
to mean, however, that work on the pronunciation
of other languages has either not been undertaken,
or is less important.
1.2 The role of pronunciation
Whether or not pronunciation is accorded a major
role in the L2 classroom, it plays a major role in our
personal and social lives. On the one hand, at the
affective level it is through the way we speak, and
above all, by means of our accent, that we project
our regional, social and ethnic identities. The latter
are deeply-rooted, often from a very early age, and
may prove subconsciously resistant to change even
if on the surface, as language learners, we profess
the desire to acquire a nativelike accent in our L2.
On the other hand, our pronunciation is also a
major factor in our intelligibility to our listeners. The
Jane Setter is a Lecturer in Phonetics at the University
of Reading in the School of Linguistics and Applied
Language Studies, where she is Director of the English
Pronunciation Research Unit. She is co-editor of Daniel
Jones’ English pronouncing dictionary (2003,
Cambridge) with Peter Roach and James Hartman. Jane
is Joint Coordinator of IATEFL’s Pronunciation Special
Interest Group.
Jennifer Jenkins is a Senior Lecturer in the Department
of Educational and Professional Studies at King’s College
London, where she is also Programme Director of the MA
in English Language Teaching and Applied Linguistics.
She has published widely on pronunciation in language
teaching, most notably The phonology of English as
an international language (2000, Oxford University
Press) and has also written an undergraduate coursebook,
World Englishes (2003, Routledge).
Lang. Teach. 38, 1–17. doi:10.1017/S026144480500251X Printed in the United Kingdom
© 2005
Cambridge University Press
pragmatics literature consistently emphasises the role
of the interpretation of meaning in context in
communication breakdown (see e.g., Thomas 1995).
However, when a pronunciation feature impedes the
intelligibility of a word, the likelihood – particularly
in the case of a non-native listener, who tends to
focus on the acoustic signal rather than use contextual
cues to resolve ambiguity – is that communication
will fail even before pragmatic factors enter the
equation (cf. Jenkins, 2000: 80–83). Pronunciation,
then, plays a vital role in successful communication
both productively and receptively. One of the main
problems for L2 learners, however, is that pronunciation tends to operate at a subconscious level, particularly with regard to suprasegmental features, and
so is often not easily amenable to manipulation.
2. The background
2.1 Origins of interest in phonology/
phonetics and pronunciation teaching
Pronunciation has a long and distinguished history in
second language teaching. For, as Seidlhofer (2001:
56) points out, it “stood at the very beginning
of language teaching methodology as a principled,
theoretically-founded discipline, originating with the
late-nineteenth-century Reform Movement”. The
Reform Movement brought together phoneticians
interested in the teaching of pronunciation from a
number of European countries and resulted in the
establishment of pronunciation as a major concern
of second language instruction lasting well into the
second half of the twentieth century, even in the
teaching of English (see Collins and Mees, 1999;
Howatt, 2004). Their collaboration also led to the
founding of the International Phonetic Association
and the development of the International Phonetic
Alphabet (IPA), capable of representing the full
inventory of sounds of all known languages. The
pervasiveness of the IPA in pronunciation teaching
and research is attested by the fact that, over a hundred
years later, it is still the universally acknowledged
system of phonetic transcription.
Although pronunciation teaching suffered a
setback with the advent of Communicative Language
Teaching in the later twentieth century, especially in
the teaching of English, the basic principles of the
Reform Movement, such as the prioritising of the
spoken language over the written, were never altogether lost. And in more recent years, pronunciation
specialists have devised ways of incorporating the
teaching of pronunciation within a communicative
framework, by moving away from the drilling of
discrete language items to communicative activities
in which pronunciation contributes to the meaning
in context. This in turn has led to a much greater
interest in the teaching of suprasegmental aspects
of pronunciation than existed in the earlier years,
itself underpinned by copious research into the communicative
role of pronunciation (see, for example,
Morley (ed.), 1994; Wennerstrom, 2001; and the
discussion of pronunciation within a communicative-discourse paradigm in Section 3.5 below).
2.2 Pronunciation models in research
and teaching
Pronunciation is a matter which needs to be addressed
in the teaching of all languages, as clearly there is little
point in learning a (living) language if one does not
mean to communicate with other speakers of that
language. However, the main body of literature in
this area is on teaching English pronunciation. This
is probably unsurprising given the status of English
world-wide. This article focuses on teaching and
research in the area of English pronunciation, but
many of the issues and concerns which are raised here
can be applied to pronunciation in other languages.
When English pronunciation teaching takes place
in institutions all over the world, the models adopted
are generally derived from what are referred to here
as older varieties of English (OVEs), these being
for the most part British and American English.
These accents are comprehensively described in pronouncing dictionaries (see Roach et al., 2003; Upton
et al., 2001; Wells, 2000) and books on English
phonetics and phonology (see, e.g., Roach, 2000;
Kreidler, 2004) – although some more recently
conceived texts do include other Englishes (see, e.g.
Collins & Mees, 2003; Deterding & Poedjosoedarmo,
1998; McMahon, 2002) – and materials are copious
and readily available. Countries such as Japan, Taiwan,
the Philippines and those in South America tend to
use American English as a model, whereas British
English is found in former colonies and protectorates,
such as Hong Kong, India and certain African
countries, and also in Europe.
This approach to the selection of a model is
intuitive rather than empirical, and can be based
on sociocultural or market-driven choices. OVEs are
regarded as ‘proper English’, and any local variety is
simply not good enough. An example of this way of
thinking can be seen in the case of India; although
Indian English is a recognised nativised variety of
English (NVE), many Indian speakers of English
aspire towards Received Pronunciation (RP), rather
than treating Indian English as a valid model in its
own right (see section 3.3).
For British English, the main and, it must be said,
exhaustively comprehensive reference is Gimson’s
pronunciation of English (2001, Arnold), edited by
Cruttenden, currently in its 6th edition and regularly
updated. Although some writers believe the term
to be outdated (see, for example, Roach, 2000: 3),
Cruttenden continues to use RP for the prestige
accent of English, noting that this term is “the result
of a social judgement rather than an official decision as
to what is ‘correct’ or ‘wrong’” (2001: 79). He goes
on to say, however, that innovation in RP “tends
to be stigmatised” (2001: 79), and it must be said
that deviation from RP norms among announcers on
the BBC can still cause consternation among some
British (and overseas) listeners, to the extent of letters
of complaint appearing in the British press. While the
term RP can bring to mind a radio announcer from
the early 20th Century, Cruttenden gives examples
of changes and recent innovations in RP, and also
mentions features of ‘Estuary English’ as having an
influence, such as vocalisation of ‘dark l’ in e.g. milk,
and use of a glottal stop to replace a /t/ before an
accented vowel or a pause in e.g. not always. It should
be noted that ‘Estuary English’ does not describe any
one single accent of British English, but is rather an
umbrella term covering many accents spoken in the
south-east of England which share some features of
pronunciation, such as those mentioned above (see
Przedlacka, 1999). For American English, Kreidler
(1989) provides a clear description, although it does
not provide detail at the level of Cruttenden (2001),
and is rather more akin to Roach (2000).
In the classroom, the teacher is certainly the main
influence on learners. Classes taking place in monolingual situations will generally have a non-native
speaker (NNS) teacher, and that teacher’s pronunciation will act as a model for students. Some
countries, such as Hong Kong, operate schemes to
employ teachers from OVE backgrounds in order
to provide an OVE model. The strong form of the
argument for use of models based on OVEs goes
something like this: surely speakers need to have
a common pronunciation in order to be able to
understand one another? Monolingual teaching
situations which involve a NNS teacher would seem
to lead to chaos rather than mutual intelligibility.
However, writers on the subject in recent years can
be seen to take a more flexible approach. Roach
(1994), for example, in an address to tertiary teachers
in Singapore, a community of speakers with a recognised NVE, suggests that students will need to
learn a variety which is more intelligible to other
speakers of English as well as using the local variety,
but does not recommend strict adherence to an OVE
model. Taylor (1991), stressing that intelligibility is
interactional in nature, suggests “teaching to a transcription rather than a particular model” (Taylor,
1991: 433), where the transcription represents any
viable accent, and acts as a guide to the necessary
contrasts. Barrera-Pardo & López-Soto (2003) are
more concerned with a mismatch between the model
learners are exposed to in the real world and that
which is the focus of instruction, insisting that, if a
model has to be adopted, it should at the very least
reflect that which is closest to the “real language”
the learners are going to hear outside the classroom
(Barrera-Pardo & López-Soto, 2003: 2839).
In research, NNS English is usually compared with
OVEs, such as in Setter (2003), Pickering (2002),
Low et al. (2000) and Tajima et al. (1997). Similarly,
native speakers (NSs) are very often the listeners in
tests of intelligibility (see, for example, Anderson-Hsieh et al., 1992; Tajima et al., 1997), although
studies which look at the opposite do exist, for
example Derwing & Munro (2001), Derwing et al.
(2002), which look at how intelligible NSs are to
NNSs. It is, of course, necessary to have a point
of reference for such studies, but in future it may
be the case that comparisons are made between
accents/varieties of English which do not involve
OVEs at all. If intelligibility between NSs and NNSs
is a source of data for researchers, intelligibility in
English between NNS groups would seem to provide
endless possibilities for research, and could lead to the
development of teaching materials which are geared
towards particular English communication situations – between Hong Kong and Japanese speakers of
English, perhaps. The scope for study, then, is almost
infinite.
2.3 The role of L1 – transfer
and interlanguage
It was once thought that a straightforward comparison of the features of a learner’s L1 with L2,
a target language, would uncover all the mysteries
of what was difficult in L2, and also what should
be straightforward. This method of comparison,
known as Contrastive Analysis, has some validity for
pronunciation, where the total inventory of sounds
available to a speaker in L1 is sure to have a bearing.
But it is not enough to do a simple comparison of
which sounds constitute phonemes in each language
and whether or not they occur in both to predict
what a learner will or will not be able to pronounce.
The syllable is a unit of immense importance in L1,
and the positions in which sounds occur in syllables
must be taken into account; although the Contrastive
Analysis Hypothesis may have addressed it, this is
often not picked up by teachers when considering
pronunciation difficulties. Many Chinese languages,
for example, allow only a vowel or a nasal consonant
at the end of a syllable, and so the non-nasal single
consonants and consonant clusters which can occur
at the end of English syllables in words such as please,
crisps and films present a difficulty for learners, even if
the sound(s) appear in syllable initial position in their
language(s).
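The point about syllable position can be made concrete with a minimal sketch. It assumes the simplified constraint stated above (a syllable may end only in a vowel or a single nasal consonant); real languages are more varied, and the function name, the broad transcriptions and the extra words sun and see are illustrative assumptions rather than data from any study.

```python
# Simplified L1 coda constraint from the text: a syllable may end in a vowel
# (empty coda) or in exactly one nasal consonant. This is an assumption made
# for illustration, not a description of any particular Chinese language.
NASALS = {"m", "n", "ŋ"}

def coda_allowed_in_l1(coda):
    """Return True if the coda is empty or consists of a single nasal."""
    return len(coda) == 0 or (len(coda) == 1 and coda[0] in NASALS)

# Broad transcriptions of the final consonants of some English words.
# "sun" and "see" are added here only for contrast.
english_codas = {
    "please": ["z"],
    "crisps": ["s", "p", "s"],
    "films": ["l", "m", "z"],
    "sun": ["n"],
    "see": [],
}

# Words whose codas the simplified L1 constraint does not permit.
predicted_difficult = sorted(
    word for word, coda in english_codas.items()
    if not coda_allowed_in_l1(coda)
)
print(predicted_difficult)
```

The check looks only at where a consonant occurs, not at whether it exists in the L1 inventory, which is the distinction the paragraph above draws.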
In a development of the notion that a contrastive
analysis will not account for all learner differences
and difficulties, Corder (1971) and Selinker (1972)
proposed that L2 be regarded as a distinct system,
an interlanguage. This, together with the idea of L1
interference on L2 – where features of the L1 play a
part in the successful acquisition of elements of the
L2 – has given rise to many studies of interlanguage
phonology and the role of L1 in pronunciation. In
1987, Tarone lamented the dearth of studies into the
phonology of interlanguage (Tarone, 1987: 70) – a
reflection, perhaps, of the lack of materials specifically
geared towards pronunciation teaching at that stage –
but this has been rectified in recent years. The New
Sounds conferences, organised by Allan James of the
University of Klagenfurt and Jonathan Leather of
the University of Amsterdam, attract researchers into
interlanguage from all over the world (see for example
Leather & James, 1997; James & Leather, 2002).
New theories in how interlanguages work are being
developed all the time.
Major, for example, has developed what he calls the
Ontogeny Model (see, for example, Major, 1987a,
1987b, 1997, 2002), in which he argues for “an
interrelationship of interference and developmental
factors” in L2 phonological acquisition (Major,
1987a: 102). Major shows how interference is more
prevalent in initial stages of phonological acquisition,
where a learner copes by using a similar L1 phoneme,
but this interference slowly decreases over time, to
be overtaken by developmental factors as learning
takes place. These developmental processes are more
akin to native speaker L1 phonological acquisition
processes as the learner becomes more proficient (and
presumably has sufficient NS input).
Looking at interlanguage from the aspect of
parameter setting, a notion used in the framework
of generative phonological acquisition, Archibald
examines the acquisition of what he calls “new knowledge” (2002: 11). His findings suggest that learners
are actually able to “alter their L1 representations on
the basis of the L2 input” (Archibald, 2002: 20); that
is, parameters which are set for L1 can be altered
if learning of L2 processes takes place, rather than a
speaker setting up a whole new set of parameters for
the phonology of L2. This is clearly a necessity if
learners are to break out of the phonology of their
L1 and pronounce L2 with any accuracy.
Flege, a prolific writer on the subject of interlanguage phonology, is most well known for his
work on the effect of age on the acquisition of
segments, most specifically vowels, in an L2. For some
recent co-authored studies, see Tsukada et al., 2003
and Aoyama et al., 2003. Flege (1995) developed
the Speech Learning Model (SLM), which “leads
to the expectation that subtle differences will exist
between vowels produced by early bilinguals and
L2 monolinguals” (Flege, 2002: 132). Flege asserts
that sounds in the L1 and L2 systems of a bilingual
speaker share what he calls a “common phonological
space” (Flege, 2002: 132), and it is suggested that
they will likely influence and interact with each
other. This sits well with Archibald’s position (see
above). Probably unsurprisingly, Flege’s studies show
that early bilinguals are judged to have more L2-like vowels than late bilinguals, but still “can not be
expected to perform like ‘perfect’ bilinguals” (Flege,
2002: 140), i.e., have a production which is identical
to a NS of the target L2.
3. Recent research into L2
pronunciation acquisition
3.1 L1 and L2: similarities and differences
From a research point of view, there has been quite a
lot of interest in the acquisition of L1 consonants
among English speaking children, but little, by
comparison, on vowels. The earliest consonant
sounds that English L1 children tend to acquire are
the plosives, nasals and fricatives /p b t d k g m n
h s/ and also approximants /j/ and /w/, with the
approximants described as liquids, /l r/, the remaining
fricatives /f v θ ð z ʃ ʒ/ and affricates /tʃ dʒ/ being
acquired later, together with consonant clusters.
Syllables tend to be CV to start with, and partially
or fully reduplicated – for example, mama or babi – and
slowly take on adult characteristics, with patterns such
as voicing of consonants in syllable initial position
and devoicing of those in syllable final position (so
dog would sound like dock and cat like gat), and
simplification of consonant clusters (play might sound
like pay and stop like dop), being common. Intonation
patterns are, interestingly, distinguishable between
very small babies from different L1 backgrounds. So,
what, if any, similarities are there between L1 and L2
phonological acquisition?
Carlisle (2001: 2) tells us that the CV syllable is
recognised as an “absolute universal in the languages
of the world”, and so it is logical that a child will
start with syllables such as mama or babi. From the
perspective of complex syllable onsets and codas,
or consonant clusters at the beginning or end of
a syllable, those which adhere to the Sonority
Sequencing Principle (SSP) (Clements, 1990) are
preferred, with the hierarchy as follows: vowels are
most sonorant, followed by glides (for English, /j w/),
liquids (/l r/), nasals (/m n ŋ/), fricatives (/f v θ ð s z ʃ ʒ
h/), and finally plosives (/p b t d k g/). Voiced sounds
are considered to be more sonorant than voiceless
ones. This means that a syllable beginning pl- (e.g.
play) is preferred to one which begins st- (e.g.
stop), and may explain why NS children retain the
first sound in play but the second in stop; if Universal
Grammar is activated, st- is dispreferred. English
allows a large number of complex syllables, with up to
three consonants allowed in initial position and four
in final position; s-clusters (clusters beginning with
/s/) in particular do not adhere to the SSP. In fact,
these clusters, called “reversals”, are considered to be
a serious departure from the SSP (Carlisle, 2001: 5).
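The hierarchy just described can be expressed as a small check on syllable onsets. The following is a sketch only: the numeric ranks are an illustrative assumption (only their order reflects the hierarchy in the text), the refinement that voiced sounds are more sonorant than voiceless ones is omitted, and the function names are hypothetical.

```python
# Sonority ranks following the hierarchy in the text (higher = more sonorant).
# Vowels, the most sonorant class, are left out because only onsets are checked.
# The numbers themselves are arbitrary; only their relative order matters.
SONORITY = {}
for rank, segments in enumerate(
    [
        "p b t d k g",        # plosives (least sonorant)
        "f v θ ð s z ʃ ʒ h",  # fricatives
        "m n ŋ",              # nasals
        "l r",                # liquids
        "j w",                # glides
    ],
    start=1,
):
    for segment in segments.split():
        SONORITY[segment] = rank

def onset_obeys_ssp(onset):
    """Return True if sonority rises at every step through the onset."""
    ranks = [SONORITY[segment] for segment in onset]
    return all(a < b for a, b in zip(ranks, ranks[1:]))

def is_reversal(onset):
    """Return True if sonority falls at some step within the onset."""
    ranks = [SONORITY[segment] for segment in onset]
    return any(a > b for a, b in zip(ranks, ranks[1:]))

print(onset_obeys_ssp(["p", "l"]))  # onset of "play"
print(onset_obeys_ssp(["s", "t"]))  # onset of "stop"
print(is_reversal(["s", "t"]))
```

On these assumptions, pl- rises from plosive to liquid and so obeys the SSP, whereas st- falls from fricative to plosive and is flagged as a reversal.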
So, how does this relate to L1 and L2 acquisition?
Carlisle’s (2001) survey of studies on interlanguage
and syllable structure universals comes to the conclusion that L1 transfer is, in fact, a stronger influence on the pronunciation of an L2 than the
preference for CV syllables, from which one can
deduce that patterns of phonological acquisition of
L2 are far removed from those of L1 due to the
incontrovertible effect of L1 phonology. However,
where the L2 phonology is dissimilar to that of L1,
it is not impossible that a sequence of acquisition
similar to that of L1 takes place – if it ever does.
Major’s Ontogeny Model (see above) gives us an indication that this can happen under suitable learning
circumstances; see Major (1999) specifically on consonant clusters, and also Hansen (2001) for a discussion of linguistic constraints, including sonority,
on the acquisition of various final consonant combinations among Chinese speakers of English. Peng &
Setter (2000) find a systematic alternation in the
deletion of alveolar plosives in consonant clusters in
English spoken in Hong Kong which is not unlike
patterns found in L1 English speakers.
Another matter which features in L1 acquisition
is the Critical Period Hypothesis (CPH), originally
put forward by Lenneberg (1967). Lenneberg’s initial
suggestion was that, pre-puberty, it is easier to learn
a language, but following puberty the brain “behaves
as if it had become set in its ways” (Lenneberg, 1967:
158), and language learning is much more difficult.
This was attributed to hemispheric specialisation for
language functions, which was thought to have taken
place by puberty, and is a popular excuse for why it
is difficult for L1 English speakers to learn foreign
languages at school. See also Celce-Murcia et al.
(1996: 15–16). Although the strong form has been
discredited for a number of years, the CPH might
explain Flege’s findings (see above) that those who
learn an L2 at an earlier age have a more nativelike pronunciation. Flege, however, in an early paper,
finds the CPH counter-productive for research into
L2 phonology, and concludes that it cannot and
should not account for the differences between adult-child performance, as many other factors play a part
(Flege, 1987: 174). Hiding behind the CPH, says
Flege, means that important questions that need to be
asked about individual L2 learners may not be asked.
In Flege et al. (1999), both the pronunciation and
grammatical structures among Korean L2 speakers
of English are examined, and correlated with their
age of arrival (AOA) in the United States. It is
found that pronunciation is indeed more native-like
as participants’ AOAs decreased; this is attributed to
the possible influence of brain maturation, but is more
likely, say Flege et al., due to “changes in how the L1
and L2 phonological systems interact as the L1 system
develops” (1999: 101). Again, this echoes Major’s
Ontogeny Model (above).
3.2 Research into intelligibility
As mentioned in section 2.2 above, much of
the research into intelligibility has involved testing
whether NNSs are intelligible to NSs, the (rather
arrogant?) premise being, one assumes, that NNSs are
learning English principally in order to communicate
with NSs. Research of this kind includes work by
Tajima et al. (1997), Magen (1998), Nelson (1992),
Tyler (1995), Benrabah (1997), Anderson-Hsieh et al.
(1992), Grosjean & Gee (1987), Munro & Derwing
(1995a, 1995b, 1998), Major et al. (2002), Bürki-Cohen et al. (2001), to name but a few. The impact
of this research has been to show that it is, in
fact, deviance in the pronunciation of suprasegmentals which causes the most difficulty for NS
listeners.
A relatively new approach to the intelligibility of
pronunciation concerns interaction between NNSs.
Here, by definition, the premise cannot be that the
second language is being learned for communication
with its NSs or that intelligibility for and of a NS
listener is paramount. Nor can it be assumed that
pronunciation deviance will have the same effect
in NNS-NNS interaction as it does in NS-NNS
communication. So far the research in this area has
investigated only English, and here the findings are
that segmentals have a far greater role in English as
an International Language than they do in English as
a Foreign Language (see Section 4 below for further
discussion).
3.3 Research into attitude, motivation
and identity
While much of the research into pronunciation focuses on linguistic factors, there is a growing interest
in socio-psychological influences: the role played by
identity, attitudes and motivation in learners’ selection of pronunciation models and goals, and in their
ultimate achievement in relation to their choices.
Pronunciation seems to be particularly bound up
with identity. Our accents are an expression of who
we are or aspire to be, of how we want to be seen
by others, of the social communities with which we
identify or seek membership, and of whom we admire
or ostracise. At the same time, and sometimes in spite
of the latter, our accents are also likely to indicate a
strong, if for some people a subconscious, attachment
to our mother tongue, which Daniels (1997: 82)
describes as “a sort of umbilical cord which ties
us to our mother”. He argues that “whenever we
speak an L2 we cut that cord, perhaps unconsciously
afraid of not being able to find it and tie it up
again when we revert to L1” and observes that
“a possible way of avoiding the cut is to continue
using the sounds, the rhythms and the intonation of
our mother tongue while pretending to speak L2”.
Acquiring an L2 accent, then, may be felt by learners
whether consciously or subconsciously to involve the
development of a new ego and as such, be resisted
because of individual and/or social pressures.
Pronunciation attitudes play an important part
in determining learning choices and outcomes.
Cenoz and Garcia Lecumberri (1999) show how the
perceived difficulty of some NS English accents causes
learners to develop less favourable attitudes towards
these accents. Jarvella et al. (2001) demonstrate that
Danish learners of English are able to distinguish
between different NS English accents and that they
rate them differently in terms of attractiveness, with
English-English accents generally being rated as most
attractive and American-English as least. Dalton-Puffer et al. (1997) also find that English-English (in
this case RP) is the most highly evaluated accent by
Austrian learners, with near-RP their second choice,
and American-English (GA) their third choice, while
the two non-native (Austrian) accents are evaluated
as having very low status. Of the latter two, the one
rated as by far the least attractive is the accent most
often heard in Austria and that spoken by the subjects
themselves. Smit and Dalton (2000) likewise find that
Austrian learners prefer to aim for an NS accent,
while Smit (2002) adds a further dimension to the
complex equation, that of linguistic insecurity. Her
findings may help account for learners’ aspiration
for an NS accent and yet their failure to acquire
one: their feelings of inadequacy pronunciation-wise.
Common to the majority of the accent attitude
research, then, is the subjects’ professed desire to
acquire a prestige NS accent rather than a local or
internationally acceptable accent, even though such
is rarely the learning outcome.
Pronunciation, it seems, is a more sensitive area
of language than the other linguistic levels because
of the way in which it encroaches on identity and
elicits strong attitudes. This in turn may go some
way to explain why, despite a professed desire to
sound ‘nativelike’, the aspiration is rarely achieved by
L2 learners. The socio-psychological research (as well
as the sociolinguistic research in relation to international languages: see Section 4 below) indicates
that pronunciation teachers would do well to replace
the notion of absolute ‘correctness’ with one of
appropriateness (see Seidlhofer, 2001: 57–60). In this
respect, the prevailing concept of ‘accent reduction’,
with its tendency to treat L2 learners as though they
are subjects for speech pathology and to encourage
them to lose all traces of their L1 accent, is being
questioned by those working on the acquisition of
international languages, most notably English as an
International Language (EIL). The concept of ‘accent
addition’, that is, the adding of L2 pronunciation features to learners’ repertoires in accordance with their
needs and preferences is, instead, being promoted as
one more in keeping with current theories of bilingualism (additive rather than subtractive) and
of learner autonomy. Jenkins (2000: 209–210),
for example, proposes five stages of pronunciation
learning, each one involving the addition to learners’
repertoires if they so desire:
• addition of EIL core items (see Section 4) productively and receptively
• addition of a range of NNS English accents to the
learner’s receptive repertoire
• addition of accommodation skills
• addition of non-core items to the learner’s receptive
repertoire
• addition of a range of NS English accents to the
learner’s receptive repertoire.
Those learners who wish to preserve their L1
identity in their L2 but be understood by and understand other NNSs will probably choose as their goal
the first three stages. On the other hand, those who
want also to be able to understand NSs’ pronunciation
will probably aim for all five stages. Whatever their
decision, however, there is no requirement for
learners to lose their L1 accent and, by implication,
their L1 identity.
This all calls into question the traditional distinction between the instrumental and integrative
motivation of Schumann’s acculturation model (see
Schumann 1986). Dörnyei and Csizér (2002: 453),
for example, argue that “World English is turning
into an increasingly international language and it is
therefore rapidly losing its national cultural base while
becoming associated with a global culture”. This,
they believe, “undermines the traditional definition
of integrativeness as it is not clear any more who the
‘L2 speakers’ or the members of the L2 community
are”. For L2 English pronunciation (and the same will
undoubtedly be true of any subsequent international
languages that supersede English), motivation is
no longer a straightforward concept involving the
learner’s orientation to the accent of the language’s
native speaker community. Instead, as Dörnyei and
Csizér imply, it has been complicated by a host of
factors relating to the new international context of
communication. Much more research is needed to
clarify the situation and, in particular, the factors
influencing the ambivalent pronunciation attitudes of
learners of international languages, which Bamgboṣe
(1998: 7) describes in respect of English accents as “a
love-hate relationship”, in the sense that “one does
not wish to sound like a native speaker but still finds
the accent fascinating”.
3.4 Research into listening
Listening is, to some extent, the flip-side of pronunciation. The extent to which one affects the other
should not be underestimated; one needs to be able to
hear a phonemic contrast before one can successfully
produce it, for example.
Field advocates a “signal-based approach” (2003:
332) to listening which involves using bottom-up
processing in listening activities, rather than assuming
enough information can be gained from context.
Drawing learners’ attention to possible problems such
as cross-boundary segmentation, the identification of
weak forms and assimilation in NS speech, Field
addresses an area which is very often neglected in
either listening or pronunciation teaching. Learners,
particularly in initial stages, find sounds more
concrete than higher level units (like phrases or
sentences), and tend to be persuaded by their own
first parse of an utterance, which can result in miscommunication at the global level, and so this
approach seems to be highly sensible. Wilson (2003)
also advocates a bottom-up approach, and suggests
some practical, student-centred activities to improve
listening.
Cauldwell (2002b) looks at suprasegmental aspects
of listening. He introduces something he calls the
“word-crusher”, a double-prominence tone unit in
which words between the first and last stressed items
are “crushed”, or temporally pushed together. In this
paper, Cauldwell sees English as messy, and suggests
activities in which students can practise the blurring
of words in the word-crusher. By understanding how
this works in English, processes of connected speech
can be better modelled by the learner, and therefore
messages better perceived. In another article on
listening, Cauldwell (2002–2003) suggests that more
attention needs to be paid to understanding fast
speech, and that teachers need to be equipped with
the ability and terminology to describe it effectively
to learners. For this purpose, a teacher or teacher
educator could certainly do a lot worse than investing
in a copy of Shockey’s Sound patterns of spoken English
(2003), which is a fully comprehensive guide to
connected speech processes in English.
In another study on suprasegmental issues,
Erickson et al. (1999) show that Japanese listeners
have difficulty perceiving and counting syllables in
English, and attribute this to negative language
transfer at the suprasegmental level, and also, in part,
to the fact that English words are written down using
Japanese katakana, which represents English words
in terms of the Japanese unit of timing, the mora.
Strategies for learning how to predict the number of
syllables in an English word are surely called for.
The above activities are aimed at enabling NNSs
to decode NS speech. Imai et al. (2003) look at the
responses of both Spanish and English speakers of
English, presented with English single-word stimuli
in noise, some of which are Spanish accented and
some not. The English NS group performed best
overall; interestingly, the Spanish listeners performed
better perceiving unaccented speech than Spanish
accented speech, but better than the native English
speakers in perceiving Spanish accented speech.
Major et al. (2002) looked at what happened when
listeners from many different language backgrounds
were asked comprehension questions based on
lectures given in English by NSs and NNSs. It was
found that all groups scored badly when listening to
lectures given by NNSs, that Spanish speakers did much
better when listening to L1 Spanish speakers, and that
Chinese speakers did much worse when listening to
L1 Chinese speakers. It is suggested that using NNS
speech in listening comprehension tests may well
disadvantage listeners from different L1 backgrounds,
or create a bias where the listener and speaker
are both from the same L1 background. Clearly,
there needs to be more work in the classroom on
developing strategies for listening to Englishes other
than those one might be most likely to come across,
although the practicalities of doing so might be
problematic.
In an experiment to find which speaking rates
were preferred by NNS listeners, Derwing &
Munro (2001) use Mandarin and “mixed” groups
of L2 English speakers to rate the speed of spoken
narratives on a scale ranging from “too fast” to
“too slow”. English speech produced by NSs and
Mandarin NNSs was presented in its original format,
and also in computer modified temporal formats,
including an adjustment to the Mean Mandarin rate
and Mean English rate. They found that slowing
down the speech did not generally lead to better
evaluations of preference amongst listeners from any
of the groups, but that the preference among the non-Mandarin speakers was for slightly slower Mandarin-accented English. They conclude that asking learners
to “slow down” may not actually be beneficial.
3.5 Pronunciation research within a
communicative-discourse paradigm
Discourse intonation research began more than three
decades ago with Halliday and the Prague School
(see Halliday, 1970). Since then, the main focus has
been on English, most notably the pioneering work
of Brazil and his colleagues at the University of
Birmingham, although there has also been research
into discourse intonation involving other languages,
such as Moyer (1999) on German. Brazil’s research
was published posthumously (Brazil, 1997) by his colleagues, although it had earlier (1985) been published
as a Birmingham University monograph as publishers
had not at that time appreciated the significance
of Brazil’s contribution to the understanding of the
relationship between intonation and grammatical
meaning on the one hand or the expression of attitude on the other. For more recent publications on
discourse intonation see Wichman (2000), and Chun
(2002) specifically on discourse intonation in L2.
Discourse intonation is an empirically-based
model which is concerned with the communicative
function of intonation rather than the grammatical
and attitudinal functions which are to this day the
concerns of traditional models. Its primary interests
are firstly the establishing of social meanings and
roles through the assignment of prominence, key and
tone choice (with a falling, or ‘proclaiming’, tone
for non-shared and a fall-rise, or ‘referring’ tone for
shared information), and secondly the intonational
mechanisms for controlling conversation, such as
turn-taking and introducing/concluding topics. The
model thus provides teachers and researchers with a
means of analysing speakers’ intonation choices in
authentic speech in a way that traditional models,
based on invented examples and intuition, cannot.
Nevertheless, discourse intonation is only now beginning to be widely taken up in language teaching.
This is to some extent because the earliest teaching
materials to embrace the model (e.g. Bradford, 1988;
Brazil, Coulthard & Johns, 1980) tended to apply the
model in its entirety for productive use. While analysis
and interpretation of intonation choices after the
event was found to be a useful activity, the assessment
of shared/non-shared status and consequent assigning
of tone proved too subconscious and too fleeting
to be conducive to teaching for production. More
recent materials (e.g. Bowler & Cunningham, 1999;
Hancock, 2003; Gilbert, 2001; Levis, 2001), perhaps
for this reason, focus for production more on prominence, where it is easier to apply the ‘rules’ at a
conscious level, and less on tone assignment, which
they tend to treat at a receptive level except in
relation to conversation management where, again,
productive ‘rules’ are more amenable to conscious
manipulation. Wennerstrom (2001, 2003) emphasises
the need to provide learners with authentic conversation data which they can work on at an analytical
level, in effect, becoming discourse analysts, before
they move on to develop their discourse intonation
productive skills.
Pronunciation has also begun to be taught from a
discourse perspective within the lexical approach, an
approach which advocates the teaching of vocabulary
and grammar in lexical phrases rather than as a series
of discrete items (see e.g. Nattinger & DeCarrico,
1992). The potential for the teaching of discourse
intonation within the lexical approach was first
explained in detail by Seidlhofer & Dalton-Puffer
(1995). Subsequently the idea has been taken up
in numerous teaching materials with lexical phrases
being taught complete with their intonation contours
and tone units being introduced by means of the
lexical phrase.
3.6 Controversies in L2 pronunciation
research
English speech rhythm is often described as ‘stress-timed’; in basic terms, this means that the beginning
of each stressed syllable is said to be equidistant in
time from the beginning of the next stressed syllable.
This is in comparison to ‘syllable-timed’ languages
(e.g., Spanish, Cantonese), in which the start of
each individual syllable is said to be equidistant in
time from the start of the next. Instrumental studies
have, in fact, shown that very little difference can be
found between typically ‘stress-timed’ and typically
‘syllable-timed’ languages. Roach (1982) and Dauer
(1983), for example, investigated so-called stress- and
syllable-timed rhythm; both found that the theory
fell down when tested empirically. Cauldwell (2002a)
finds English speech rhythm to be “irrhythmical”.
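To make the notion of isochrony concrete, the following is a minimal illustrative sketch of how equal spacing of stressed-syllable onsets might be checked. It is not the procedure used by Roach (1982) or Dauer (1983); the function name, the choice of the coefficient of variation as the measure, and the onset times are all assumptions made purely for illustration.

```python
from statistics import mean, pstdev
from typing import Sequence


def interval_variability(onsets: Sequence[float]) -> float:
    """Coefficient of variation of the intervals between successive onsets.

    `onsets` holds onset times in seconds, in ascending order. A value of 0
    would indicate perfectly equal spacing (strict isochrony); larger values
    indicate less regular timing.
    """
    intervals = [later - earlier for earlier, later in zip(onsets, onsets[1:])]
    if len(intervals) < 2:
        raise ValueError("at least three onsets are needed")
    return pstdev(intervals) / mean(intervals)


# Hypothetical onset times (seconds) of stressed syllables in one utterance.
stressed_onsets = [0.00, 0.42, 1.05, 1.38, 2.10]
print(round(interval_variability(stressed_onsets), 3))
```

Applying the same function to the onsets of all syllables, and comparing the two figures across languages, would be one rough way of asking the question that the instrumental studies addressed.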
This research, however, has had very little impact
on pronunciation teaching materials. On the whole,
many teachers still believe in stress-timing. But this
may not be a complete misconception. Although it has
been proven that the difference between stress- and
syllable-timing does not have much basis in reality
from a speech production point of view, there is evidence to show that it is important for speech
perception, particularly among speakers of what may
be considered OVEs, such as British and American
English. Cutler (1993), in an article which discusses
the speech segmentation problem in different languages, asserts that rhythm based on word stress is
a key factor in English speech segmentation. For
French listeners, however, the syllable is more salient
in speech segmentation, and for Japanese listeners, it
is the mora. Speakers of different languages use that
language’s approach to linguistic rhythm in order to
segment a stream of speech from another language,
with, for example, French speakers using the syllable
to segment Japanese and English. The fact that
speakers segment a stream of speech differently, using
different, language specific rhythmic rules to do so,
is attributed to how we acquire language as infants,
the suggestion being that the “characteristic rhythmic
pattern of a language is sufficiently salient to assist the
newborn child in segmenting the continuous speech
stream into discrete units” (Cutler, 1993: 455). It
appears to be the case that, once we have acquired
a particular approach as infants, it stays with us as a
strategy for parsing the speech signal; studies of what
are referred to as “maximally competent French-English bilinguals” show that they seem only to have
“one rhythmic segmentation procedure available to
them” (Cutler, 1993: 455).
It is asserted that the appropriate production of
stressed syllables is, therefore, of a high degree of importance in the effective communication of messages in English among speakers of OVEs, and for
some researchers this importance cannot be overemphasised in learner situations. If, as Adams (1979: 87)
suggests, learners “fail to recognise the significance of
the timing of syllables” when producing utterances in
English, and instead “produce an anomalous rhythm
which seriously impairs the total intelligibility of their
utterance”, both parties to the act of communication
will be at a loss to explain what has happened and
what was intended. In short, the communicative
transaction will not be successful. This is a matter
which has not eluded researchers, materials writers
and teachers (see, for example, Anderson-Hsieh et al.,
1992; Anderson-Hsieh & Venkatagiri, 1994; Chela-Flores, 1998; Gilbert, 1984; Taylor, 1981; Wong,
1987), but, claim Anderson-Hsieh & Venkatagiri
(1994), it is something which has, until recently,
been somewhat under-investigated. Taylor comments
that “perhaps the most widely encountered difficulty
among foreign learners of English is rhythm” (1981:
219), a sentiment echoed by Anderson-Hsieh (1992:
51) when she claims that “suprasegmentals often
elude ESL students”.
The difficulty experienced by NNSs of English
in acquiring English speech rhythm can therefore
be considered to have implications for intelligibility.
This is especially so in the light of studies like the
one by Anderson-Hsieh et al. (1994), which asserts
that prosody is the most critical feature in English
pronunciation (1994: 531), and that of Magen (1998).
These two studies, both of which use English-speaking raters to assess the pronunciation performance and intelligibility of the subjects in such areas
as segmentals, syllable structure, vowel quantity and
voicing, provide us with firm evidence that prosodic
and suprasegmental features have a consistently high
influence on the intelligibility of a non-native
speaker’s pronunciation.
So what does happen in other Englishes? Looking
at a variety from South East Asia, Low et al. (2000)
study the temporal features of Singapore English.
Vowel quality and vowel duration in Singapore
English are compared with those of British English using
a measure especially developed by the authors, the
‘Pairwise Variability Index’ or PVI. Their data show
that Singapore English speakers fail to reduce vowels
in weak syllables to the same extent that British
English speakers do, a phenomenon also found in the
English of Hong Kong speakers by Setter (2000,
2003). This can be expected to contribute to the rhythmic differences between British English and the
varieties studied, with the implication that both
Singapore and Hong Kong English will be difficult
for speakers of British English to understand.
Although this is speculation as far as these two studies
are concerned, various psychological studies of
speech perception demonstrate that deviations from
what may be considered normal English stress patterns can indeed cause difficulty in the correct parsing
of a message. Cutler (1984) points out that, in English,
“word stress patterns are an integral part of the
phonological representations of words in the mental
lexicon” (1984: 78), a statement which has far-reaching implications for English speech perception
and production. What this means is that the listener
has a model of any given lexical item held in the
mental lexicon; that model includes its stress pattern.
For the listener’s correct retrieval of a particular item
during the process of speech perception, something
which comes rather close to approximating that
model must be produced by the speaker. If, as
Cutler (1984: 79) asserts, native English speakers draw
“heavily on information about stress pattern” as a
normal and efficient way of understanding speech, it
is crucial that this close approximation to the model
has correct stressing. If this is not achieved, the listener
will at the very best have difficulty reconstructing the
message, or, at worst, not understand it at all.
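The Pairwise Variability Index mentioned above can be illustrated with a short sketch. The formulation below is the normalised version commonly given for the measure, in which the difference between each pair of successive vowel durations is divided by the mean of the pair; whether it matches the exact calculation in Low et al. (2000) is an assumption here, and the function name and the durations are invented for illustration.

```python
from typing import Sequence


def normalised_pvi(durations: Sequence[float]) -> float:
    """Normalised Pairwise Variability Index over successive durations.

    For each adjacent pair, take the absolute difference divided by the
    pair's mean, average these values, and scale by 100.
    """
    if len(durations) < 2:
        raise ValueError("at least two durations are needed")
    total = 0.0
    for first, second in zip(durations, durations[1:]):
        total += abs(first - second) / ((first + second) / 2)
    return 100 * total / (len(durations) - 1)


# Hypothetical vowel durations in milliseconds (not data from any study).
alternating = [140, 50, 150, 45, 130]   # strong/weak alternation
more_even = [110, 95, 115, 100, 105]    # little reduction in weak syllables
print(round(normalised_pvi(alternating), 1))
print(round(normalised_pvi(more_even), 1))
```

On a measure of this kind, a speaker who reduces vowels in weak syllables less would be expected to show a lower value, since successive vowels differ less in duration.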
In a more recent work, Cutler & Norris (1988)
investigate the importance of strong syllables in the
parsing of English. They suggest that lexical access
is initiated by the occurrence of a stressed syllable,
and claim that the high frequency of English content
words starting with a stressed syllable means that
this strategy would work very well in English. The
importance of stressed syllables in spoken word
recognition is also supported by Grosjean and Gee,
who claim that “stressed syllables (and only they) are
used to initiate lexical search” (Grosjean & Gee, 1987:
144). They do not, however, offer much in the way
of empirical evidence.
In fact, if the stressed syllables in a stream of
English speech are incorrectly placed, native speakers
may process the message as something completely
different. Cutler (1984) gives the following as an
example: “[ . . . ] a hearer who heard the word
‘perfectionist’ stressed on the first syllable, with the
second syllable reduced, parsed it as ‘perfect shnist’,
and only became aware of the error when no meaning
could be given to ‘shnist’” (Cutler, 1984: 79). Cutler
(1984) also cites an old study by Bansal (1966),
who presented listeners with English spoken by
Indian speakers. It was found that, if words with an
initial stress were produced with second syllable stress,
‘atmosphere’ was heard as ‘must fear’, ‘yesterday’ as
‘or study’, ‘character’ as ‘director’, and ‘written’ as
‘retain’, and when two-syllable words with stress on
the second syllable were uttered with initial stress,
hearers perceived ‘prefer’ as ‘fearful’, ‘correct’ as
‘carried’, and ‘about’ as ‘come out’ (Cutler, 1984:
79–80).
Although the above work is on word stress and
not speech rhythm in longer stretches of speech, the
point is clearly this: if the normally strong syllables
are weakened and the weak syllables strengthened,
intelligibility is lost, or at least severely impaired.
This strongly supports the use of conventional patterns of English speech rhythm as an essential factor
in the correct parsing of messages in NNS-NS interactions. In order to test the difference made in ease
of perception among NNSs and NSs, Tajima et al.
(1997) recorded phrases spoken in English by a native
American English speaker and a Mandarin Chinese/
Taiwanese speaker of English, and acoustically manipulated each according to the other’s rhythmic patterns
to see whether this had any effect on intelligibility.
It was found that the intelligibility of the Chinese
speaker’s speech among NSs of American English
improved by between 15% and 25% compared with
unaltered speech, and that the American English
speaker’s speech became less intelligible by similar
proportions. This leads them to conclude that “native
listeners’ ability to recognise English phrases is significantly influenced by whether or not the phrases have
appropriate native-like temporal properties” (Tajima
et al., 1997: 17).
Research on perception of stress and rhythm
notwithstanding, Cauldwell, in a version of his
1996 article published on the web, concludes “The
continued presence of the refuted hypothesis, that
has become hard-wired into our thinking, is an
obstacle to progress in understanding the nature
of spontaneous speech: long-refuted, it should be
now discarded. Life without the stress and syllable-timing hypothesis will be more difficult, but it should
make possible real advances in the understanding of
spontaneous speech” (Cauldwell, 2002a: 22). This
conclusion is based on his own research, and, taking
that and the findings of Roach (1982) and Dauer
(1983) into account, he certainly has a point. But
although the influence of research into the reality of
the production of stress- and syllable-timed languages
is growing, it would not be sensible to throw the
baby out with the bathwater and fail to focus on the
importance of appropriate stressing in order to make
messages clear. Many teachers and especially teacher
educators now qualify the claim by referring to stress
timing as only a tendency and as occurring mainly
in more formal speech. Marks (1999: 198) argues,
meanwhile, that the use of rhythmical structures
such as rhymes in the classroom is valid in so far
as it “provides a convenient framework for the perception and production of a number of characteristic features of English pronunciation which are often
found to be problematic for learners: stress/unstress
(and therefore the basis for intonation), vowel
length, vowel reduction, elision, compression, pause
(between adjacent stresses)”. This is a sensible
recommendation that is likely to continue finding
favour with teachers long after they have abandoned
any belief in the existence of stress timing.
Another controversial area, related to speech
rhythm in that it concerns itself with suprasegmental
aspects of pronunciation, and one in which
technology is becoming invaluable in pronunciation
teaching, is that of intonation. In particular, studies
have been carried out on intonation in yes/no
questions, which, conventional wisdom and the
majority of teaching materials tell us, always have
a rising intonation. Both Levis (1999) in respect of
American English and Cauldwell (1999) in respect
of British English have arrived at similar conclusions
about yes/no questions. Both constructed a corpus
of naturally occurring speech samples from native
speakers of the respective varieties of English (as
opposed to the invented examples favoured by earlier
pronunciation researchers) and analysed them for
final pitch direction in yes/no questions. Neither
researcher found that yes/no questions invariably
have a rising tone. As yet, this finding has failed to
have much of an impact on materials writers,
although an understanding of intonation in yes/no
questions does seem to have filtered through to
teachers and teacher educators. Thompson (1995),
for example, suggests a simple binary approach, in
which learners should be encouraged to use a rising
tone if they are genuinely asking a question and
a falling tone if they think they may know the
answer, carrying on to say that “learners should be
exposed to plenty of examples of yes/no questions”
(Thompson, 1995: 240). Computer corpora recorded
from naturally occurring speech could certainly be
used to provide those examples in the development
of listening activities.
3.7 Research into the potential for
technology in pronunciation teaching
When one thinks of using computers in pronunciation teaching, the most obvious use is perhaps
to focus on the identification and production of
individual speech sounds, and this has indeed been
the case. However, in recent years there has been a
greater focus on suprasegmental aspects in materials
produced for students to use on computer platforms,
which clearly reflects the importance placed on these
features in pronunciation teaching text books.
Programmes which deal specifically with segmental issues include SPECO, which combines advanced
speech technology with user-friendly graphics to aid
clinical remediation of children’s speech pathologies.
The programme has obvious applications in the field
of L2 English pronunciation teaching (see Roach,
2002). PRAAT, an application developed for speech
researchers by Paul Boersma and David Weenink of the
University of Amsterdam, which has applications in
speech analysis, synthesis, manipulation and labelling,
among others, and offers a facility for phoneme
identification and discrimination tests, has also been
modified to teach vowel and diphthong production
by means of formant plotting (see Brett, 2002).
The PRAAT programme can be downloaded free
of charge from www.praat.org.
Examples of this kind may lead to the conclusion
that computers are making the teacher redundant, but
this is an over-simplistic view; at best programmes
such as those described can only be used in conjunction with classroom teaching, and recent research
urges us towards the careful evaluation of computer
programmes for teaching pronunciation. PRAAT, for
example, was designed to be used by serious speech
researchers, and computer readouts of formant plots
require a sophisticated level of understanding which
may be lacking in many teachers and learners, or
take too much classroom time to develop. Derwing
et al. (2000) looked at popular automatic speech
recognition (ASR) packages for ESL speech and
found that they are still not able to perform as well
as human listeners listening to non-native speech,
concluding that “the possibilities for using ASR
software in the L2 classroom are intriguing”, but as
yet still possibilities. Anderson-Hsieh (1992) points
out that, useful though it may be, using electronic
feedback “cannot carry out for students all the other
work that goes into acquiring native-like speech”,
although it is useful as an awareness raising tool.
The recent surge of interest in harnessing computers for teaching suprasegmentals has led to the
development of a number of programmes. Kaltenboeck, for example, has developed a CD-ROM
for the teaching of intonation (see Kaltenboeck,
2002). Protea Textware have published three CD-ROMs focusing on connected speech in American
English, Australian English and British English (see
Westwood & Kaufmann, 2002). Cauldwell (2002c)
has published a CD-ROM, Streaming speech, which
deals with a range of aspects of British English
pronunciation. The material on the CD-ROM is
underpinned by extensive research, some snippets
of which feature in pop-ups while the programme
is running. For example, the section which deals
with connected speech processes is informed by
Shockey (2003), the section dealing with units of
speech is based on the research of Brazil (1997),
Halliday (1994) and Tench (1996), and that on
the functions of level tone again links with Brazil
(1997). The student is able to record him or
herself speaking in some sections, and compare this
with an English-speaking model. Fraser’s (2001a)
CD-ROM, Learn to speak clearly in English, is
another which covers different aspects of English
pronunciation. It starts by encouraging the student
to think about communication in general, before
moving on to sections on sentence stress and rhythm,
the role of segmentals and suprasegmentals, and
‘critical listening’. Again, students are able to record
themselves and compare their recording with a speaker on the disk.
For teachers, there is a companion disk (see Fraser,
2001b), which similarly makes use of clever graphics
and comparisons with other culture-specific ideas,
like colour, to demonstrate how speakers of different
languages categorise phonemes differently. Another
interesting feature of the materials discussed in this
paragraph is that they have clearly been designed
with learner autonomy in mind; as Kaltenboeck
(2002: 13) points out, this is particularly relevant to
the acquisition of pronunciation. Students are encouraged to listen to themselves and think more about
what makes a message clear, rather than focussing
on the precise production of individual sounds. This
may well make them more successful in producing
effective communicators than the segmental speech
recognition packages because of the shortcomings
of the latter identified by Derwing et al. and
Anderson-Hsieh (above). Although suprasegmental
materials are still in their infancy, they point to an
important teaching tool for the future, one which
complements rather than supersedes written materials
and classroom teachers.
Dictionaries are another area in which technology
is coming to the fore. Many of the major publishers have started to issue CD-ROMs with their
dictionaries, promoting learner autonomy in pronunciation acquisition. Fortunately, what is now
available on disk is far superior to the stilted speech
of the electronic ‘talking’ dictionaries which first
became available in the late 80s/early 90s. The new
CD-ROMs offer learners a range of features such as
the opportunity to hear words in isolation and, in
some cases, in connected speech. There is also the
possibility of recording and listening to themselves
in order to compare their own pronunciation
with the dictionary version. The only pronouncing
dictionary to currently be accompanied by a CD-ROM is the latest edition of Daniel Jones’ English
pronouncing dictionary (Roach et al., 2003), providing
the learner with a copious amount of information
about American and British English pronunciation.
However, in the current format, only the British
English pronunciation of words can be heard on the
CD-ROM.
Another electronic medium which hardly requires
an introduction is the internet. As with any materials,
however, caution is advised; transcription systems in
particular vary from site to site, and this may be a
cause of confusion for students. British-based and
influenced sites tend to have the most consistency
in symbols used for individual phonemes, although
slight variations do exist. Many sites focus on pronunciation without the use of phonetic symbols, and
these may well be best for students, depending on
their aims. There are sites which test and train English
phonemic transcription (see Tench, 2002; Luscombe,
1996; Cooke et al., no date), allow you to listen and
identify intonation contours (see Maidment, 2000a,
2000b and 2001) or work on minimal pairs (Kelly,
2001), offer pronunciation tips (see Maidment, 1999),
and work with both teachers and students on a variety
of issues (see Fraser, 2000; Widmayer & Gray, 2002).
Widmayer & Gray’s site is particularly good value,
directing teachers and learners to all sorts of resources,
including sites with authentic materials. Here we
have given references to but a few of the many
sites available on-line. It has to be said that these
are all basically listening sites, but, as listening and
pronunciation go hand in hand, awareness-raising of
the kind offered on these sites is an invaluable addition
to pronunciation learning and teaching.
4. Socio-political issues
As was pointed out in section 3.2 above, the vast majority of pronunciation research and classroom teaching is grounded on the premise that learners need
to understand and be understood by native speakers
(NSs) of the language in question. However, for an
increasing number of learners, most particularly in the
case of English but also in the case of other languages
such as Spanish, pronunciation training is needed in
order to facilitate communication with other non-native speakers (NNSs) from different first languages.
A distinction can therefore be made between a
foreign language, where interaction typically takes
place between a NS and a NNS, and an international
language, where interaction is more typically between
a NNS and another NNS.
As far as English is concerned, research into the
learning of the language for international purposes,
i.e. English as an International Language (EIL), has
demonstrated not only the critical part played by pronunciation in maintaining successful communication
between NNSs from different L1s, but also the ways
in which the pronunciation priorities involved in EIL
differ from those of EFL.
The main EIL research approach to have been
adopted to date focuses on the role of pronunciation
in promoting and obstructing intelligibility. Building
on earlier research (Smith, 1992; Smith & Bisazza,
1982; Smith & Nelson, 1985; Smith & Rafiqzad,
1979) in which listeners from a range of L1s were
asked to rate the comprehensibility of speakers
from different L1s, Jenkins (2000, 2002) identifies
a number of pronunciation features which appear to
be crucial, or ‘core’, in safeguarding the intelligibility
of pronunciation for NNS listeners who do not
share the speaker’s L1. Her Lingua Franca Core
targets these core features: consonant sounds other
than the voiceless and voiced dental fricatives /θ/
and /ð/ and dark ‘l’; vowel quantity; word-initial
and word-medial consonant clusters, with deletion
being more problematic than epenthesis (addition);
tonic stress. Meanwhile, the remaining features of
NS English pronunciation (vowel quality; features of
connected speech such as assimilation, elision, weak
forms; word stress; pitch direction) were found in the
research to be unnecessary for intelligibility in EIL
communication contexts and are therefore designated
‘non-core’. Jenkins argues that in cases where these
non-core features are affected by transfer from the
NNS’s first language, the resulting forms should be
described as regional (L2) sociolinguistic variation
rather than pronunciation ‘error’.
Subsequently, Lin’s (2003) research into the simplification of word-initial consonant clusters, building
on Weinberger (1987), has demonstrated that simplification by epenthesis is less
harmful to intelligibility for an NNS listener than
simplification by deletion. By preserving more of the
underlying form, Lin points out, epenthesis limits
ambiguity, whereas consonant deletion leads to non-recoverability and greater ambiguity. Lin’s research
thus supports the Lingua Franca Core claim regarding
consonant clusters. On the other hand, Peng & Ann’s
(2001) research into word stress demonstrates that
there may be common patterns of stress across L2
varieties of English. If further research supports their
finding, the Lingua Franca Core will need to be
modified so as to incorporate word stress, though
with stress patterns being determined by NNSs rather
than by NSs.
Research into pronunciation in EIL contexts has
also begun to show the importance of accommodation. For example, Jenkins’s research draws
on Speech (later ‘Communication’) Accommodation
Theory (Beebe & Giles, 1984; Giles et al., 1991)
to demonstrate that intelligible pronunciation in EIL
communication is not a monolithic construct, but
that it requires constant negotiation and adjustment
in relation to speaker-listener factors specific to the
particular context of the interaction (see Jenkins,
2003).
5. Implications for pronunciation teaching
From a broad point of view, pronunciation needs to
lose its isolated character and be treated pedagogically
as part of communication and discourse. This would
mean focusing on what will help a learner make
meaning in communicative situations at the same
time as learning about other aspects of language in
general language teaching textbooks; pronunciation
practice should be incorporated at as early a stage
as possible. In line with research conducted within
an SLA framework, the notion of the teachability of
various pronunciation features should be taken into
account, along with factors such as age, motivation
and the influence of L1. Aspects which require focus
from the perspective of discourse and communication
include appropriate use of discourse intonation, the
understanding of how sentences break down into
tone units and lexical phrases, the ability to highlight
stressed syllables in a stream of speech, and production
of the segmental elements.
Approaches to pronunciation teaching should also
be willing to adapt, and not continue to be influenced
by old-fashioned notions. Where research identifies
the mythical nature of beliefs, such as rigid stress-timing and the use of specific intonation patterns
on questions, for example, teachers, teacher trainers
and materials developers should be ready to take this
on board and develop curricula which make use of
this information. Also, the notion of ‘error’ needs to
be readdressed in the light of the NVEs which are
emerging, and of EIL. The implications for models
and goals include a change of emphasis from accent
reduction to accent addition, and, in parallel, the
development of accommodation skills, in order to
make spoken messages clearer to all speakers/listeners.
However, a learner should not be discouraged from
using an OVE as a model, if that is what is desired by
the learner.
There should also be an enhanced role for listening
in pronunciation teaching. Learners need to be
exposed not only to OVEs but also to other varieties
of English, particularly those of speakers of local L2
Englishes with whom they are likely to communicate.
Learners need to be trained to be able to pick out the
salient information in a stream of speech, so that they
do not feel left behind, and also need to be introduced
to pitfalls arising from the use of connected speech
features by proficient users of English.
As far as resources are concerned, pronunciation
(and listening) resources should be made more readily
available to teachers and students, and these resources
should be introduced and demonstrated positively
during teacher training, rather than being treated
like poor relatives to general teaching texts and
materials, or, worse still, regarded as rather scary
and too difficult. Updated printed pronunciation
materials which take into account World Englishes
and EIL need to be developed. The extent to which
technology can be exploited is enormous; as with
all materials, teachers should be judicious in what is
actually being taught via computers and the internet,
in order to make sure the materials have taken
research into account and are not just rehashing
old ideas through technological means. Computer
applications have great potential for use in learner
independence and self-access situations; it is the job
of the teacher to be able to evaluate these materials
and ensure the learner has made the best selection for
his/her level and needs.
We have highlighted the need to take research
into account when devising curricula for teaching
pronunciation. Research into pronunciation clearly
needs to go on, and the obvious area is research
into NNS-NNS interaction. As much as possible
of this research should be driven and completed by
teachers, who are in the position to see the difficulties
encountered and use their own research to inform
their teaching. This can only strengthen the position
of the teacher with respect to pronunciation teaching
and learning.
6. Conclusion
To conclude, we would like to state the
obvious: pronunciation is the major contributor to
successful spoken communication, and how anyone
learning a language can expect to be understood
with poor pronunciation skills is beyond our
comprehension. Teachers must take a step back
from current practice and evaluate their own
pronunciation skills and teaching methodologies, and
also have access to current research, so
that they are able to look at how they can improve
not only the communicative skills of their students,
but also their own. The onus is on the teacher
educator, teacher and student to learn to listen,
both to themselves and other speakers, and address
features of their speech which may make it difficult
for communication to take place. If we are going to
use English as a world language, then let’s use it for
mutual understanding.
Bibliography
Adams, C. (1979). English speech rhythm and the foreign learner.
The Hague, Paris and New York: Mouton.
Anderson-Hsieh, J. (1992). Using electronic visual feedback
to teach suprasegmentals. System 20, 1, 51–62.
Anderson-Hsieh, J., Johnson, R. and Koehler, K. (1992).
The relationship between native speaker judgments of nonnative pronunciation and deviance in segmentals, prosody,
and syllable structure. Language Learning 42, 4, 529–
555.
Anderson-Hsieh, J. and Venkatagiri, H. (1994). Syllable
duration and pausing in the speech of Chinese ESL
speakers. TESOL Quarterly 28, 4, 807–812.
Aoyama, K., Flege, J. E., Guion, S. G., Akahane-Yamada, R.
and Yamada, T. (2003). Foreign accent in English words
produced by Japanese children and adults. 15th International
Congress of Phonetic Sciences, Barcelona, Spain, 3201–
3204.
Archibald, J. (2002). Parsing procedures and the question of
full access in L2 phonology. New Sounds 2000: The Fourth
International Symposium on the Acquisition of Second-Language
Speech, University of Amsterdam, 11–21.
Archibald, J. (1997). The relationship between L2 segment
and syllable structure. New Sounds 97: The Third International Symposium on the Acquisition of Second-Language Speech,
University of Klagenfurt, 17–25.
Bamgboṣe, A. (1998). Torn between the norms: innovations
in world Englishes. World Englishes 17, 1, 1–14.
Bansal, R. K. (1966). The Intelligibility of Indian English: PhD
Thesis, London University.
Barrera-Pardo, D. and López-Soto, T. (2003). Language
input and choice of English pronunciation models in local
contexts. 15th International Congress of Phonetic Sciences,
Barcelona, Spain, 2837–2840.
Beebe, L. and Giles, H. (1984). Speech-accommodation theories: a discussion in terms of second-language acquisition.
International Journal of the Sociology of Language 46, 5–32.
Benrabah, M. (1997). Word-stress: a source of unintelligibility in English. IRAL 35, 3, 157–165.
Bond, Z. S. and Fokes, J. (1985). Non-native patterns of
English syllable timing. Journal of Phonetics 13, 407–420.
Bowler, B. and Cunningham, S. (1999). Headway pronunciation course. Oxford: Oxford University Press.
Bradford, B. (1988). Intonation in context. Cambridge:
Cambridge University Press.
Brazil, D. (1997). The communicative value of intonation in
English. Cambridge: Cambridge University Press.
Brazil, D. (1994). Pronunciation for advanced learners of English.
Cambridge: Cambridge University Press.
Brazil, D., Coulthard, M. and Johns, C. (1980). Discourse
intonation and language teaching. London: Longman.
Brett, D. (2002). Improved vowel production with the
PRAAT programme. In D. Teeler (ed.), Talking computers,
7–10.
Brown, A. (1988). The staccato effect in the pronunciation
of English in Malaysia and Singapore. In J. Foley (ed.), New
Englishes: the case of Singapore, 115–147.
Brown, A., Deterding, D. and Low, E. L. (eds.) (2000).
The English language in Singapore: research on pronunciation.
Singapore: Singapore Association for Applied Linguistics.
Bürki-Cohen, J., Miller, J. L. and Eimas, P. D. (2001).
Perceiving non-native speech. Language and Speech 44, 2,
149–169.
Burton, J. and Clennell, C. (eds.) (2003). Interaction and
language learning. Alexandria, VA: TESOL.
Carlisle, R. S. (2002). The acquisition of two and three
member onsets: time III of a longitudinal study. New Sounds
2000: The Fourth International Symposium on the Acquisition
of Second-Language Speech, University of Amsterdam, 42–47.
Carlisle, R. S. (2001). Syllable structure universals and
second language acquisition. International Journal of English
Studies 1, 1, 1–20.
Carlisle, R. S. (1999). The modification of onsets in a
markedness relationship: testing the conformity hypothesis.
In J. Leather (ed.), Phonological issues in language learning, 59–
93.
Carter, R. and Nunan, D. (eds.) (2001). The Cambridge guide
to teaching English to speakers of other languages. Cambridge:
Cambridge University Press.
Cauldwell, R. (2002–2003). Grasping the nettle: the
importance of perception work in listening comprehension.
http://www.developingteachers.com/articles_tchtraining/perception1_richard.htm
Cauldwell, R. (2002a). The functional irrhythmicality of
spontaneous speech: a discourse view of speech rhythms.
http://www.solki.jyu.fi/apples/.
Cauldwell, R. (2002b). Phonology for listening: relishing
the messy. http://www.speechinaction.pwp.blueyonder.co.uk/pdf%20files/Phonology%20for%20Listening_Relishing%20the%20messy.pdf.
Cauldwell, R. (2002c). Streaming speech: listening and pronunciation for advanced learners of English. Birmingham, UK:
Speechinaction.
Cauldwell, R. (1999). Judgements of attitudinal meanings
in isolation and in context. http://www.phon.ucl.ac.uk/home/johnm/cauld.htm
Celce-Murcia, M., Brinton, D. M. and Goodwin, J. M.
(1996). Teaching pronunciation: a reference for teachers of
English to speakers of other languages. Cambridge: Cambridge
University Press.
Cenoz, J. and Garcia Lecumberri, L. (1999). The acquisition of English pronunciation: learners’ views. International Journal of Applied Linguistics 9, 1, 3–17.
Chela-Flores (1998). Teaching English rhythm: from theory to
practice. Caracas, Venezuela: Fondos Editorial Tropykos.
Chun, D. M. (2002). Discourse intonation in L2: from theory and
research to practice. Amsterdam: John Benjamins.
Clements, G. (1990). The role of the sonority cycle in core
syllabification. In J. Kingston and M. Beckman (eds.), Papers
in laboratory phonology I: between the grammar and physics of
speech, 283–333.
Cohen, P. R., Morgan, J. and Pollack, M. E. (eds.) (1990).
Intentions in communication. London: MIT Press.
Collins, B. and Mees, I. M. (2003). Practical phonetics and
phonology: a resource book for students. London and New York:
Routledge.
Collins, B. and Mees, I. M. (1999). The real Professor Higgins.
The life and career of Daniel Jones. Berlin and New York:
Mouton de Gruyter.
Cooke, M., Lecumberri, M. L. G., Maidment, J.
and Ericsson, A. (no date). Web transcription tool.
http://www.wtt.org.uk/index.html.
Corder, S. P. (1971). Language continua and the
interlanguage hypothesis. In S. P. Corder and E. Roulet
(eds.), The notions of simplification, interlanguages, and
pidgins, and their relation to second language pedagogy, 11–
17.
Corder, S. P. and Roulet, E. (eds.) (1971). The notions of
simplification, interlanguages, and pidgins, and their relation to
second language pedagogy. Geneva: Librairie Droz.
Cruttenden, A. (2001). Gimson’s pronunciation of English (6th
edn.). London: Arnold.
Cutler, A. (1993). Segmenting speech in different languages.
The Psychologist 6, 453–455.
Cutler, A. (1984). Stress and accent in language production
and understanding. In D. Gibbon and H. Richter (eds.),
Intonation, accent and rhythm: studies in discourse phonology,
77–90.
Cutler, A. and Norris, D. (1988). The role of strong syllables
in segmentation for lexical access. Journal of Experimental
Psychology: Human Perception and Performance 14, 1, 113–
121.
Dalton-Puffer, C., Kaltenboeck, G. and Smit, U. (1997).
Learner attitudes and L2 pronunciation in Austria. World
Englishes 16, 1, 115–128.
Daniels, H. (1997). Psycholinguistic, psycho-affective and
procedural factors in the acquisition of authentic L2
pronunciation. In A. McLean (ed.), SIG Selections
1997 Special interests in ELT, 80–85.
Dauer, R. M. (1983). Stress timing and syllable timing
reanalysed. Journal of Phonetics 11, 51–62.
Derwing, T. M. and Munro, M. J. (2001). What speaking
rates do non-native listeners prefer? Applied Linguistics 22,
3, 324–337.
Derwing, T. M., Munro, M. J. and Carbonaro, M. (2000).
Does popular speech recognition software work with ESL
speech? TESOL Quarterly 34, 592–603.
Derwing, T. M. and Rossiter, M. J. (2002). ESL learners’
perception of their pronunciation needs and strategies.
System 30, 155–166.
Derwing, T. M., Rossiter, M. J. and Munro, M. J. (2002).
Teaching native speakers to listen to foreign accented
speech. Journal of Multilingual and Multicultural Development
23, 4, 245–259.
Deterding, D. and Poedjosoedarmo, G. R. (1998). The
sounds of English: phonetics and phonology for English teachers
in South East Asia. Singapore: Simon & Schuster (Asia).
Dörnyei, Z. and Csizér, K. (2002). Some dynamics of
language attitudes and motivation: results of a longitudinal
nationwide survey. Applied Linguistics 23, 4, 421–462.
Eckman, F. R., Elreyes, A. and Iverson, G. K. (2001).
Allophonic splits in L2 phonology: the question of
learnability. International Journal of English Studies 1, 1, 21–
52.
Erickson, D., Akahane-Yamada, R., Tajima, K. and
Matsumoto, K. F. (1999). Syllable counting and mora units
in speech perception. ICPhS99, San Francisco, 1479–1482.
Field, J. (2003). Promoting perception: lexical segmentation
in L2 listening. ELT Journal 57, 4, 325–333.
Flege, J. E. (2002). No perfect bilinguals. New Sounds 2000:
The Fourth International Symposium on the Acquisition of
Second-Language Speech, University of Amsterdam, 132–
141.
Flege, J. E. (1997). The role of category formation in
second-language speech learning. New Sounds 97: The
Third International Symposium on the Acquisition of Second-Language Speech, University of Klagenfurt, 79–88.
Flege, J. E. (1995). Second language speech learning: theory,
findings and problems. In W. Strange (ed.), Speech perception
and linguistic experience: theoretical and methodological issues,
233–277.
Flege, J. E. (1987). A critical period for learning to pronounce
foreign languages? Applied Linguistics 8, 2, 162–177.
Flege, J. E., Yeni-Komshian, G. H. and Liu, S. (1999).
Age constraints on second-language acquisition. Journal of
Memory and Language 41, 78–104.
Fraser, H. (2001a). Learn to speak clearly in English. Kingston,
ACT: Catalyst Interactive.
Fraser, H. (2001b). Teaching pronunciation: a guide for teachers
of English as a second language. Fyshwick ACT: Catalyst
Interactive.
Fraser, H. (2000). Teaching pronunciation. http://www-personal.une.edu.au/~hfraser/pronunc.htm.
Gibbon, D. and Richter, H. (eds.) (1984). Intonation, accent
and rhythm: studies in discourse phonology. Berlin and New
York: W. de Gruyter.
Gilbert, J. (2001). Clear speech from the start. Cambridge:
Cambridge University Press.
Gilbert, J. (1984). Clear speech: pronunciation and listening
comprehension in North American English. Cambridge:
Cambridge University Press.
Giles, H., Coupland, N. and Coupland, J. (eds.) (1991).
Contexts of accommodation. Developments in applied sociolinguistics. Cambridge: Cambridge University Press.
Goh, C. (2000). A discourse approach to the description
of intonation in Singapore English. In A. Brown, D.
Deterding and E. L. Low (eds.), The English language in
Singapore: research on pronunciation, 35–45.
Gorsuch, G. J. (2001). Testing textbook theories and tests:
the case of suprasegmentals in a pronunciation textbook.
System 29, 119–136.
Goswami, U. and Bryant, P. (1990). Phonological skills and
learning to read. Hove, East Sussex, UK: Psychology Press.
Grosjean, F. and Gee, J. P. (1987). Prosodic structure and
spoken word recognition. Cognition 25, 135–155.
Grotjahn, R. (1998). Ausspracheunterricht: ausgewählte
Befunde aus der Grundlagenforschung und didaktisch-methodische Implikationen. Zeitschrift für Fremdsprachenforschung 9, 1, 35–83.
Gutiérrez, F. (2001). The acquisition of syllable timing by
native speakers of English. An empirical study. International
Journal of English Studies 1, 1, 93–114.
Halliday, M. A. K. (1994). An Introduction to Functional
Grammar (2nd edn.). London: Edward Arnold.
Halliday, M. A. K. (1970). A course in spoken English:
intonation. Oxford: Oxford University Press.
Hancock, M. (2003). English pronunciation in use. Cambridge:
Cambridge University Press.
Hansen, J. G. (2001). Linguistic constraints on the acquisition
of English syllable codas by native speakers of Mandarin
Chinese. Applied Linguistics 22, 3, 338–365.
Hewings, M. (1995). Tone choice in the English intonation
of non-native speakers. IRAL 33, 3, 251–265.
Howatt, A. P. R. with Widdowson, H. G. (2004). A
history of English language teaching (2nd edn.). Oxford: Oxford
University Press.
Imai, S., Flege, J. E. and Walley, A. (2003). Spoken
word recognition of accented and unaccented speech:
lexical factors affecting native and non-native listeners. 15th
International Congress of Phonetic Sciences, Barcelona, Spain,
845–848.
James, A. and Leather, J. (eds.) (2002). New sounds 2000:
proceedings of the Fourth International Symposium on Second-Language Speech. University of Amsterdam.
Jarvella, R. J., Bang, E., Lykke Jakobsen, A. and Mees,
I. M. (2001). Of mouths and men: non-native listeners’
identification and evaluation of varieties of English.
International Journal of Applied Linguistics 11, 1, 37–56.
Jenkins, J. (2003). Intelligibility in lingua franca discourse.
In J. Burton and C. Clennell (eds.) Interaction and language
learning, 83–97.
Jenkins, J. (2002). A sociolinguistically based, empirically
researched pronunciation syllabus for English as an
international language. Applied Linguistics 23, 1, 83–103.
Jenkins, J. (2000). The phonology of English as an international
language. Oxford: Oxford University Press.
Kaltenboeck, G. (2002). Computer-based intonation
teaching: problems and potential. In D. Teeler (ed.), Talking
computers, 11–17.
Kelly, C. I. (2001). American English pronunciation practice.
http://www.manythings.org/pp/.
Kingston, J. and Beckman, M. (eds.) (1990). Papers in
laboratory phonology I: between the grammar and physics of speech.
Cambridge: Cambridge University Press.
Kreidler, C. W. (2004). The pronunciation of English: a course
book in phonology (2nd edn.). Oxford and New York:
Blackwell.
Leather, J. (ed.) (1999). Phonological issues in language learning.
Malden MA & Oxford: Blackwell.
Leather, J. and James, A. (eds.) (1997). New sounds 97:
proceedings of the Third International Symposium of Second-Language Speech. Klagenfurt: University of Klagenfurt.
Lecumberri, M. L. G. (2001). Native language influence in
learners’ assessment of English focus. International Journal of
English Studies 1, 1, 53–72.
Lehiste, I. (1977). Isochrony reconsidered. Journal of Phonetics
5, 253–263.
Lenneberg, E. H. (1967). Biological foundations of language.
New York: Wiley.
Levis, J. M. (2001). Teaching focus for conversational use.
English Language Teaching Journal 55, 1, 47–54.
Levis, J. M. (1999). Intonation in theory and practice,
revisited. TESOL Quarterly 33, 1, 37–63.
Lin, Y. H. (2003). Interphonology variability: sociolinguistic
factors affecting L2 simplification strategies. Applied
Linguistics 24, 4, 439–464.
Lindfield, K. C., Wingfield, A. and Goodglass, H. (1999).
The contribution of prosody to spoken word recognition.
Applied Psycholinguistics 20, 395–405.
Low, E. L., Grabe, E. and Nolan, F. (2000). Quantitative
characterizations of speech rhythm: syllable-timing in
Singapore English. Language and Speech 43, 4, 377–
401.
Luscombe, S. (1996). On-line phonology course.
http://www.celt.stir.ac.uk/staff/HIGDOX/STEPHEN/PHONO/PHONOLG.HTM.
Magen, H. S. (1998). The perception of foreign-accented
speech. Journal of Phonetics 26, 381–400.
Maidment, J. (2001). Online intonation (OI!). http://www.phon.ucl.ac.uk/home/johnm/oi/oiin.htm.
Maidment, J. (2000a). Plato. http://www.btinternet.com/~eptotd/vm/plato/platmen.htm.
Maidment, J. (2000b). TONI. http://www.btinternet.com/~eptotd/vm/toni/tonimenu.htm.
Maidment, J. (1999). English pronunciation tip of the day.
http://www.phon.ucl.ac.uk/home/johnm/eptotd/tiphome.htm.
Major, R. C. (2002). The ontogeny and phylogeny
of second language phonology. New Sounds 2000:
The Fourth International Symposium on the Acquisition
of Second-Language Speech, University of Amsterdam,
223–230.
Major, R. C. (1999). Chronological and stylistic aspects of
second language acquisition of consonant clusters. In J.
Leather (ed.), Phonological issues in language learning, 123–
151.
Major, R. C. (1997). Further evidence for the similarity
differential rater hypothesis. New Sounds 97: The Third
International Symposium on the Acquisition of Second-Language
Speech, University of Klagenfurt, 215–222.
Major, R. C. (1987a). A model for interlanguage phonology.
In G. Ioup and S. H. Weinberger (eds.), Interlanguage
phonology: the acquisition of a second language sound system,
101–124.
Major, R. C. (1987b). The natural phonology of second
language acquisition. In A. James and J. Leather
(eds.), Sound patterns in second language acquisition, 207–
224.
Major, R. C., Fitzmaurice, S. F., Bunta, F. and
Balasubramanian, C. (2002). The effects of nonnative
accents on listening comprehension: implications for ESL
assessment. TESOL Quarterly 36, 2, 173–190.
Marks, J. (1999). Is stress-timing real? ELT Journal 53, 3,
191–199.
McLean, A. (ed.) (1997). SIG Selections 1997 Special interests
in ELT. Whitstable: IATEFL.
McMahon, A. (2002). An introduction to English phonology.
Edinburgh: Edinburgh University Press.
Miller, M. (1984). On the perception of rhythm. Journal of
Phonetics 12, 75–83.
Mohanan, K. P. (1992). Describing the phonology of non-native varieties of a language. World Englishes 11, 2/3, 111–
128.
Monroy, R. (2001). Profiling the phonological processes
shaping the frozen IL of adult learners of English
as a Foreign Language. Some theoretical implications.
International Journal of English Studies 1, 1, 157–218.
Morley, J. (ed.) (1994). Pronunciation pedagogy and theory. New
views, new directions. Alexandria, VA: TESOL.
Moyer, M. (1999). Ultimate attainment in L2 phonology.
Studies in Second Language Acquisition 21, 1, 81–108.
Munro, M. J. and Derwing, T. M. (1998). The effects of
speaking rate on listener evaluations of native and foreign
accented speech. Language Learning 48, 2, 159–182.
Munro, M. J. and Derwing, T. M. (1995a). Processing time,
accent, and comprehensibility in the perception of native
and foreign-accented speech. Language and Speech 38, 3,
289–306.
Munro, M. J. and Derwing, T. M. (1995b). Foreign accent,
comprehensibility, and intelligibility in the speech of
second language learners. Language Learning 45, 1, 73–97.
Nattinger, J. R. and De Carrico, J. S. (1992). Lexical phrases
and language teaching. Oxford: Oxford University Press.
Nelson, C. (1992). Intelligibility and non-native varieties of
English. In B. B. Kachru, P. Strevens and L. K. Ayers (eds.),
The other tongue: English across cultures, 58–73.
Neri, A., Cucchiarini, C. and Strik, W. (2003). Automatic
speech recognition for second language learning: how and
why it actually works. 15th International Congress of Phonetic
Sciences, Barcelona, Spain, 1157–1160.
Nichols, A. C. (1964). Apparent factors leading to errors in
audition made by foreign students. Speech Monographs 5, 31,
85–91.
Peng, L. and Ann, J. (2001). Stress and duration in three
varieties of English. World Englishes 20, 1, 1–27.
Peng, L. and Setter, J. (2000). The emergence of
systematicity in the English pronunciations of two
Cantonese-speaking adults in Hong Kong. English
World-Wide 20, 1, 81–108.
Pickering, L. (2004). The structure and function of intonational paragraphs in native and nonnative speaker instructional discourse. English for Specific Purposes 23, 19–43.
Pickering, L. (2002). Patterns of intonation in cross-cultural
communication exchange structure in NS & ITA classroom
discourse. The Seventh Annual Conference on Language, Interaction & Culture, University of California, Santa Barbara,
1–7.
Pickering, L. (2001). The role of tone choice in improving
ITA communication in the classroom. TESOL Quarterly
35, 2, 233–255.
Pierrehumbert, J. and Hirschberg, J. (1990). The meaning of
intonational contours in the interpretation of discourse. In
P. R. Cohen, J. Morgan and M. E. Pollack (eds.), Intentions
in communication, 271–311.
Piske, T., Flege, J. E. and Mackay, I. (2002). Factors affecting
degree of global foreign accent in an L2. New Sounds
2000: The Fourth International Symposium on the Acquisition
of Second-Language Speech, University of Amsterdam, 290–
297.
Przedlacka, J. (1999). Estuary English? A sociophonetic study
of teenage speech in the Home Counties. Frankfurt am Main:
Peter Lang.
Roach, P. J. (2002). SPECO: computer-based phonetic
training for children. In D. Teeler (ed.), Talking computers,
23–27.
Roach, P. J. (2000). English phonetics and phonology: a practical
course (3rd edn.). Cambridge: Cambridge University Press.
Roach, P. J. (1994). Working on the model pronunciation.
Unpublished paper, Singapore Tertiary English Teachers’
Society, Singapore.
Roach, P. J. (1982). On the distinction between ‘stress-timed’
and ‘syllable-timed’ languages. In D. Crystal (ed.), Linguistic
controversies, 73–79.
Roach, P. J., Hartman, J. W. and Setter, J. E. (eds.)
(2003). Daniel Jones’ English pronouncing dictionary (16th
edn.). Cambridge: Cambridge University Press.
Schumann, J. H. (1986). Research on the acculturation model
for second language acquisition. Journal of Multilingual and
Multicultural Development 7, 5, 379–392.
Seidlhofer, B. (2001). Pronunciation. In R. Carter and D.
Nunan (eds.), The Cambridge guide to teaching English to
speakers of other languages, 56–65.
Seidlhofer, B. and Dalton-Puffer, C. (1995). Appropriate
units in pronunciation teaching: some programmatic
pointers. International Journal of Applied Linguistics 5, 1, 135–
146.
Selinker, L. (1972). Interlanguage. IRAL 18, 139–152.
Setter, J. E. (2003). A comparison of speech rhythm
in British and Hong Kong English. 15th International
Congress of Phonetic Sciences, Barcelona, Spain, 467–
470.
Setter, J. E. (2000). Rhythm and timing in Hong Kong English.
PhD thesis. Reading: University of Reading.
Setter, J. E. and Deterding, D. (2003). Extra final consonants
in the English of Hong Kong and Singapore. 15th
International Congress of Phonetic Sciences, Barcelona, Spain,
1875–1878.
Shockey, L. (2003). Sound patterns of spoken English. Malden,
MA and Oxford UK: Blackwell.
Smit, U. (2002). The interaction of motivation and
achievement in advanced EFL pronunciation learners.
IRAL 40, 1/2, 89–116.
Smit, U. and Dalton, C. (2000). Motivation in advanced EFL
pronunciation learners. IRAL 38, 3/4, 229–246.
Smith, L. (1992). Spread of English and issues of intelligibility.
In B. B. Kachru (ed.) The other tongue. English across cultures,
2nd ed. Urbana, IL: University of Illinois Press.
Smith, L. and Bisazza, J. (1982). The comprehensibility
of three varieties of English for college students in seven
countries. Language Learning 32, 259–270.
Smith, L. and Nelson, C. (1985). International intelligibility
of English: directions and resources. World Englishes 4, 3,
333–342.
Smith, L. and Rafiqzad, K. (1979). English for cross-cultural
communication: the question of intelligibility. TESOL
Quarterly 13, 3, 371–380.
Stankler, G. (2002). The CD-ROM pronunciation basics.
In D. Teeler (ed.), Talking computers, 5–6.
Strange, W. (ed.) (1995). Speech perception and linguistic experience: theoretical and methodological issues. Timonium MD:
York Press.
Tajima, K., Port, R. and Dalby, J. (1997). Effects of temporal
correction on intelligibility of foreign-accented English.
Journal of Phonetics 25, 10–24.
Tarone, E. E. (1987). The phonology of interlanguage. In G.
Ioup and S. H. Weinberger (eds.), Interlanguage phonology:
the acquisition of a second language sound system, 70–85.
Taylor, D. S. (1991). Who speaks English to whom? The
question of teaching English pronunciation for global
communication. System 19, 4, 425–435.
Taylor, D. S. (1981). Non-native speakers and the rhythm of
English. IRAL 14, 3, 219–226.
Teeler, D. (ed.) (2002). Talking computers. Whitstable:
IATEFL.
Tench, P. (2002). Transcribing English words.
http://www.cf.ac.uk/encap/staff/tench/tswords.html.
Tench, P. (2001). An applied interlanguage experiment into
phonological misperceptions. International Journal of English
Studies 1, 1, 259–278.
Tench, P. (1996). The intonation of English. London & New
York: Cassell.
Thomas, J. (1995). Meaning in interaction. London: Longman.
Thompson, S. (1995). Teaching intonation on questions. ELT
Journal 49, 3, 235–243.
Tsukada, K., Birdsong, D., Bialystok, E., Mack, M.,
Sung, H. and Flege, J. E. (2003). The perception
and production of English /ɛ/ and /æ/ by Korean
children and adults living in North America. 15th
International Congress of Phonetic Sciences, Barcelona, Spain,
1589–1592.
Tyler, A. (1995). The coconstruction of cross-cultural
miscommunication. Studies in Second Language Acquisition
17, 129–152.
Upton, C., Kretzschmar, W. and Konopka, R. (eds.) (2001).
Oxford dictionary of pronunciation for current English. Oxford:
Oxford University Press.
Vaughan-Rees, M. (2002). Pronunciation in CD-ROM
dictionaries: a comparative review. In D. Teeler (ed.),
Talking computers, 28–32.
Weinberger, S. H. (1999). Speech accent archive.
http://classweb.gmu.edu/accent/.
Weinberger, S. H. (1987). The influence of linguistic context
on syllable simplification. In G. Ioup and S. H. Weinberger
(eds.) Interlanguage Phonology. Cambridge, Massachusetts:
Newbury House.
Wells, J. C. (2000). Longman pronunciation dictionary (2nd
edn.). London: Longman.
Wennerstrom, A. (2003). Students as discourse analysts
in the conversation class. In J. Burton and C.
Clennell (eds.) Interaction and Language Learning, 161–
175.
Wennerstrom, A. (2001). The music of everyday speech:
prosody and discourse analysis. Oxford: Oxford University
Press.
Wennerstrom, A. (1998). Intonation as cohesion in academic
discourse: a study of Chinese speakers of English. Studies in
Second Language Acquisition 20, 1–25.
Wennerstrom, A. (1994). Intonational meaning in English
discourse: a study of non-native speakers. Applied Linguistics
15, 4, 399–420.
Westwood, V. W. and Kaufmann, H. (2002). Connected speech.
Hurstbridge VIC: Protea Textware.
Wichmann, A. (2000). Intonation in text and discourse:
beginnings, middles and ends. Harlow: Longman.
Widmayer, S. and Gray, H. (2002). Sounds of English.
http://www.soundsofenglish.org/index.html.
Wilson, M. (2003). Discovery listening: improving perceptual
processing. ELT Journal 57, 4, 335–343.
Wong, R. (1987). Teaching pronunciation: focus on English
rhythm and intonation. Englewood Cliffs, NJ: Prentice Hall.
Young-Scholten, M. (1997). Second language syllable simplification: deviant development or deviant input? New
Sounds 97: The Third International Symposium on the Acquisition of Second-Language Speech, University of Klagenfurt,
351–360.