Marianne Hundt. 2015. World Englishes. In D. Biber & R. Reppen, eds. The
Cambridge Handbook of English Corpus Linguistics. Cambridge: Cambridge
University Press, 381-400. (Final pre-print version)
World Englishes
Marianne Hundt
1 INTRODUCTION
English corpus linguistics was kick-started by the compilation of the machine-readable
Brown corpus of written American English (AmE) in 1961. A parallel British English
(BrE) version was soon to follow. In the 1980s, the Brown-type compilation model
started spreading to other parts of the English-speaking world (India, Australia and New
Zealand).1 While the Brown-type corpora are a useful resource, and their sampling
frame is even used to cover previous stages of World Englishes,2 they are limited with
respect to regional spread and, more importantly, provide evidence only of printed written
language. Corpus linguistics truly went global when, in the late 1980s, Sidney
Greenbaum launched a huge international project that aimed to provide standard one-million-word samples of World Englishes on a hitherto unprecedented scale: the
International Corpus of English, or ICE (Greenbaum 1996). The label ‘standard’ in this
context has two meanings, covering both the variety to be sampled (educated English)
and the principled compilation that would, it was hoped, enable
comparative research across the different Englishes (see section 2.1). The focus in this
chapter will be on World Englishes that are used as first or second language varieties.
1 For an overview of research based on non-ICE corpora, see Fallon (2004) and Nelson (2006).
2 For the extended Brown family covering Britain and the US, see e.g. Hundt and Leech (2012). Sebastian Hoffmann (Trier) is involved in the compilation of a historical corpus for Singapore English modelled on Brown, and Peter Collins (Sydney) is engaged in a similar project for the Philippines.
Discussion of corpus-based research into English as a lingua franca (ELF) and Learner
Englishes can be found in [cross-references to other chapters].3 While some scholars
have compiled their own corpora of World Englishes (e.g. by tapping into archives
found on the world-wide web),4 the focus in this chapter will be on research based on
ICE, as these are the most widely available corpora of ENL and ESL Englishes and are
corpora in the narrower corpus-linguistic sense, i.e. principled,
representative collections of texts (see [cross-reference]).
The original vision was already very ambitious in that the project aimed to include
eighteen sub-corpora (Greenbaum 1996:3). Since then, the corpus has kept growing and
new members keep joining the ICE family of corpora.5 Today, ICE components are
available or under compilation for varieties of English as a first language (ENL)6 like
BrE or New Zealand English (NZE), institutionalised7 second language varieties (ESL)
such as Indian (IndE) and Singapore (SingE) English, but also for countries in which a
standard(izing) variety of English exists alongside a creole (e.g. in Jamaica and the
Bahamas) or where the exact status of English is a matter of debate (e.g. English in
Hong Kong or Malta).8 While the recent and ongoing expansion of ICE allows linguists
interested in World Englishes to include a broad range of data in comparative, cross-varietal research, both the original design and the expansion of the ICE project pose a
number of methodological issues that need to be addressed. The aim of this chapter is to
critically discuss the sampling frame of the corpus and its suitability for research into
the dynamics of English as a global language. In particular, it will address the question
whether the corpus design places certain restrictions on the study of World Englishes.
Section two of this chapter will survey important earlier studies based on this resource
and section three will present a case study on the use of the present perfect in different
Englishes. The chapter will conclude with a short evaluation of the existing resources
and an outlook on recent and future developments.
3 For corpus-based research that bridges the ‘paradigm gap’ between studies on first, second and foreign language varieties, see the papers in Mukherjee and Hundt (2011). For a more detailed discussion of the terminology and classification of different World Englishes, see Mesthrie and Bhatt (2008: 2-13).
4 A publicly available web-based set of World Englishes corpora is provided by Mark Davies at http://corpus2.byu.edu/glowbe/ [last accessed 1 July 2013].
5 For a list of available corpora and those under construction, see http://ice-corpora.net/ice/index.htm [last accessed 16 January 2013].
6 The established acronym is actually derived from ‘English as a Native Language’, but nativeness is a somewhat controversial issue, whereas ‘first language’ is a more neutral term.
2 PREVIOUS ICE-BASED RESEARCH9
ICE corpora sample both written and spoken data, amounting to approximately 400,000
and 600,000 words, respectively. Written texts include both printed and non-printed
texts, and the spoken component contains public, unscripted and scripted speech materials.
These samples are divided across a set of domains (e.g. private, education, legal, media,
business, administration, etc.).10
7 Institutionalised varieties of English are varieties that typically have official status in a country and/or are used in a broad range of intranational domains, such as administration, tertiary education and the media.
8 Note that the original intent was to include only ENL and ESL varieties (Greenbaum, 1996: 3).
9 The studies in this section at times combine ICE with other electronic sources.
The ICE corpora (or parts thereof) have been used individually for the
description of single varieties (e.g. Nelson et al. 2002 or Sedlatschek 2009). Another
strand of research takes a comparative approach, either with a focus on a comparable set
of varieties in a specific region (see the papers in Peters et al. 2009) or aiming at more
global coverage (see the papers in Hundt and Gut 2012). Studies have looked at
grammatical variation (e.g. Zipp forthcoming), lexis (e.g. Skandera 2003) or register
variation in individual varieties (e.g. Balasubramanian 2009, van Rooy et al. 2010). At
approximately one million words each, however, the ICE corpora are obviously too small
for many lexical studies and for research on infrequent grammatical patterns.
Phonetic analyses of ICE corpora are extremely rare (but see, e.g. Rosenfelder 2009),
mostly because the sound files collected for the spoken part of the ICE corpora are not
made publicly available and, with the exception of some sections of ICE-GB (see
Huckvale and Fang 1996) and ICE-NIG (see Wunder et al. 2010), the data have been
transcribed orthographically but have not been systematically aligned with the original
sound files. Finally, ICE corpora have also been used to investigate issues of
typological (e.g. Szmrecsanyi and Kortmann 2011) and sociolinguistic (e.g. Mair 2009)
variation.
10 For details, see http://ice-corpora.net/ice/design.htm [last accessed 1 July 2013].
2.1 ICE as a resource for the study of World Englishes
Any study that uses data from several ICE components relies on the comparability of
these corpora. Thus, corpus comparability is one of the issues that need to be addressed
in a critical evaluation of ICE for the study of World Englishes. It was one of the key
design features that the initiator of the project wanted to achieve:
“The ICE project views as the basis for international comparisons the provision of
parallel corpora that sample English used in the participating countries. For valid
comparative studies the components of ICE need to follow the same design, to date
from the same period, and to be processed and analysed in similar ways” (Greenbaum,
1996:5).
An obvious limitation on the comparability of varieties across different ICE
components comes from the long history of the project. Initially, eighteen varieties were
to be represented and the intention was that the material sampled should have been
produced in the late 1980s and early 1990s. The more recent additions necessarily
sample data from around a decade later, thus introducing a slight diachronic bias into
comparisons. Users also need to be aware that sampling for a single ICE corpus may
occasionally stretch over a considerable time period, which introduces diachronic
variation not only between but also within individual ICE components. The sampling
for ICE-Fiji, for instance, which started back in 2005, included individual texts published
as early as 1990 (Biewer et al. 2010:10) and, for political and practical reasons, still
remains incomplete at the time of writing. This means that within a single ICE
corpus, the time span may occasionally stretch to around 20 years instead of the
five years originally planned (1990-94, see Nelson 1996:28). Mukherjee and Schilk (2012:191) therefore
rightly conclude that “[w]e have to uphold … the general fiction of linguistic stability
over the past 20 years in order to be able to treat ICE components as synchronically
comparable corpora”. But users need to be aware that this is obviously a fiction rather
than a fact. While the diachronic bias inherent in the current set of ICE corpora might
be a relevant factor for some studies, it might be less of a problem for others: the study
of a feature that underwent rapid spread in spoken varieties of English in different parts of
the globe, such as quotative be like, crucially depends on contemporaneous sampling of
the data (see Höhn 2012:268), whereas longer-term changes such as those in the
complementation patterns of verbs might be less affected by the bias inherent in the
data. Mair and Winkle (2012), for instance, found that differences between ENL and
ESL varieties of English were more marked than any signs of ongoing change in their
study of specificational cleft sentences.
While it is sometimes necessary to be aware of the potential diachronic bias introduced
by the data, background information on the precise temporal span sampled in individual
ICE corpora (or components thereof) is often difficult to obtain. The majority of ICE
corpora were released without detailed bibliographical background information on
individual texts included in the corpus or biographical information for the spontaneous
spoken conversations, notable exceptions being ICE-NZ and ICE-IRE. ICE-NZ
includes texts (both written and spoken) from 1990 to 1998 (Vine 1999:8); the earliest
texts in ICE-IRE are from 1990 and the latest (recordings of spoken data) from 2005
(Kallen and Kirk 2008:4, 31); in ICE-CAN, the written texts sampled were produced
between 1988 and 1999 and the spoken between 1985 and 1997, with the bulk of data
having been recorded in the years 1994 and 1995.11 A comparison of these three ICE
members would thus have to take into account that the data were collected between 1985
and 2005. Comparison with more recent ICE members obviously has to take an even
broader time span into account.
11 I am grateful to John Newman and Georgie Columbus for providing me with the metadata for ICE-CAN.
In the sampling of the spoken component, restrictions on speaker age or gender were
usually not imposed, so there is no straightforward way in which the spontaneous spoken
data in ICE could be used for apparent-time studies that would allow linguists to trace
ongoing change in the new Englishes. The ICE corpora aim to be samples of educated
usage and therefore do not attempt balance with respect to speakers’ regional
background, which may, to some extent, result in a fairly homogeneous regional sample
in the spontaneous spoken conversations. ICE-GB’s spoken data, for example, are
largely a sample of educated London English (see Hundt, 2009:461).
While biographical background information is often not available for individual written
texts, it can easily be obtained (and has been collected) from the informants who
contributed to the spoken part of the corpus. If it were more widely available, this
background information would help the research community to interpret the findings in
the right light. One example may serve to illustrate this. A search for university in
ICE-CAN yields (possible) traces of language contact:
(1) But it’s not like that you have a computer desk somewhere on the university
where you can all … (ICE-CAN, S1A-063)
(2) when the kids come from India some of them are into college already aren’t they
or university (ICE-CAN, S1A-057)
The speaker of (1) is a female informant who has French and German as additional
languages and spent seven years at university in Switzerland: the use of both on and the
could have been triggered by the German PP-N collocation an der Universität. The use
of into with college/university in example (2) might have to be attributed to the
speaker’s multilingual background: in addition to Canadian English (CanE) and French,
he speaks Hindi, Gujarati, Nepali and Kannada and spent two years living in India.
While language contact in Canada is an obvious factor to take into account, researchers
using ICE-CAN may not necessarily be aware of the fact that, in individual instances,
contact may go beyond the potential influence of the country’s second official language,
French.
Another problem for comparability arises from the way text types are interpreted in
different cultural environments (see e.g. Biewer et al., 2010 on the difficulties
of sampling texts for the ‘skills and hobbies’ and ‘technology’ sections in the Fijian
context). One more example of this kind comes from the student essay section of
ICE-PHI, where the student clearly perceives the task of writing an academic essay
differently from what one might expect:
(3) Plato would suggest aristocracy.
And Freud would …
Ehehehe …
As for me …
Er …
Argh.
Maths is so much easier.
P.S. I didn’t realize how hard it is to write something that has to do with
Philosophy until now.
Too many thoughts. (ICE-PHI, W1A-001)
While an individual text is unlikely to affect the results of a study, a more systematic
bias in interpreting text categories differently in a regional variety of English will have
an impact, e.g. on variables that are sensitive to ‘formality’. I will return to this issue
below.
Similarly, a closer look at some spontaneous conversations in the ICE corpora may
show that informants at times engage in interview-like behaviour (note that the
contributions of the fieldworker have been marked as extra-corpus material), throwing
some doubt on the ‘naturalness’ of such ‘private’ conversations:12
(4)
<$Z> <X>First of all uhm <,> this isn't the best way to start off but tell
me your full name </X>
<$A> Uhm <,> Tang Wai Ping <,> Ade
<$Z> <X>Tang Wai Ping <{> <[> Ade </[>
How did you get the name <,> Ade </X>
<$A> <,> <[> Yeah </[> </{>Uhm the name is uhm decided by my grandpa
(ICE-HK, S1A-001)
12 Note also that much of the spoken material was collected in university contexts. This has introduced a certain thematic bias into many of the ICE components and led, for instance, to an over-representation of education as a topic in ICE corpora (see Newman and Columbus, 2009).
The ICE corpus was conceived before e-mail communication became one of the most
common forms of written long-distance communication. It is therefore not surprising
that the original corpus design included letters (both social and business) as a text
category to be sampled for the written part of the corpus. Nowadays, e-mail and other
means of electronic written communication have largely replaced letter writing,
especially in the private domain. Accordingly, many ICE corpora
have departed from the original design (which stipulated that e-mail be sampled as a
separate, additional text type) and have (also) sampled e-mails as letters. ICE-CAN, for instance,
includes the whole range from hand-written to typed letters, but also e-mails, whereas
neither ICE-NZ nor ICE-IRE includes any e-mails. With other ICE components
sampling only e-mails (see e.g. Biewer et al. 2010:11), the ‘letters’ category is
likely to exhibit considerable inter-varietal differences, both in the patterns used
and in the level of formality.
Formality is another issue that needs to be considered in the interpretation of findings
from ICE corpora. It is a factor that plays a role at several levels. Firstly, previous
research on individual varieties (e.g. Schneider 2005, Hundt 2006) suggests that there
might be less of a formality gap between written and spoken texts in ESL corpora than
there is in ENL corpora. Zhiming and Huaqing’s (2006) study indicates that even such
broad generalisations are problematic, as a particular feature may point to
regional differences in only one register (e.g. private conversation) rather than
across the spoken medium as a whole. In his multidimensional
analysis of five ICE corpora (GB, India, Hong Kong, the Philippines and Singapore),
Xiao (2009) finds similarities among the ESL varieties with respect to stylistic
parameters, but also differences among them; for example, spoken and written texts are
much closer in ICE-IND than in the other corpora.13 Secondly, investigations into the
stylistic homogeneity and heterogeneity of corpora (notably Biber 1988) have shown
that there may be considerable variation within certain pre-defined text categories, on
the one hand, and more similarities among texts that are grouped into different
categories on the other hand; Sigley (1997:232) therefore cautions us:
“Corpus analysts are recommended to beware of treating the pre-existing text categories
as natural groups, and to consider alternative text groupings which may be more
relevant for their purpose.”
13 Other useful studies that address the complex matter of register variation and corpus comparability are Biber (1993), Gries (2006) and Sigley (2012).
Additional studies on stylistic variation across and within ICE components are therefore
needed as background information for the interpretation of findings on individual
patterns.
Finally, ICE corpora have been used alongside other resources in the description of
World Englishes. Especially for the study of lexico-grammatical variation, the ICE
corpora provide interesting sources for hypothesis building that can then be verified
against larger datasets, usually from less stratified material. Examples of such studies are
Mukherjee and Hoffmann’s (2006) investigation of new ditransitives in Indian English
(e.g. to gift somebody something) or Hoffmann et al.’s (2012) study of light verb
constructions in South Asian Englishes, where the use of the indefinite article is
variable (e.g. to take (a) look at).
2.2 Some findings and research questions
The potential problems with cross-corpus comparability outlined in section 2.1 do not
mean that ICE does not allow for meaningful comparative research. On the contrary,
ICE components (and parts thereof) have been used to investigate various linguistic
features. A recurrent research question that these studies aim to answer concerns
ongoing change and whether a particular variety is more advanced or more conservative
with respect to a particular change. AmE is leading the change towards a greater use of
quasi- or semi-modals like going to and want to, while at the same time being more
advanced in the decline of the core modals (see e.g. Collins and Yao 2012); the continued
increase of the progressive, on the other hand, is spearheaded by Australian English (AusE) and NZE
among the ENL varieties, with some New Englishes showing higher, others showing
lower frequencies of progressives (e.g. Hundt 1998, Collins 2009, Hundt and Vogel
2011).14 Other studies focus on the relative closeness or distance between ENL and ESL
varieties, looking both at how global features are used in local varieties and at evidence
of structural nativization: one reason why New Englishes are less advanced in the
move away from core modals is that would, for instance, has an extended (i.e.
nativized) use (see Deuber et al. 2012). In the following example it has replaced ENL
will; the example at the same time illustrates the extended use of the progressive in a
context where one would expect a non-progressive VP in a variety such as BrE or AmE:
(5) First, I would be explaining about the gender inequality, which often leads to the
high incidence of poverty amongst women, which is what I would be discussing
about in the second part of this essay. (ICE-FJ, W1A-016)
Nativization is an important indicator of how far a ‘new’ English variety has come in its
development along, e.g. the stages suggested in Schneider’s (2007) model of new
dialect evolution.
3 CASE STUDY: THE PRESENT PERFECT
The present perfect (PP) is of interest because it is apparently an example of stable
regional variation in written BrE and AmE (see e.g. Hundt and Smith 2009). At the
same time, perfect constructions serve pragmatic functions in certain text types, as we
will see, and show interesting patterns of nativization in New Englishes. With respect to
methodological issues, the study will compare results from lexical searches with data
derived from syntactically annotated corpora.
14 This kind of regional variation can be observed over and above variation across different modes (i.e. speech and writing) and registers.
For past-time reference, Present-Day English (PDE) has a choice between the PP (I
have seen her) and the simple past tense (SP) (I saw her). The standard account of the
PP in standard (ENL) varieties of English today is that it refers to past events that have
current relevance. Elsness’ (1997) long-term, corpus-based study of BrE and AmE
shows that the PP increases over time but starts decreasing again from the second half
of the eighteenth century, a development which is led by AmE. In the twentieth
century, there is relatively stable variation in the use of the PP, with higher levels found
in BrE than in AmE (see e.g. Hundt and Smith 2009).15 Beyond regional differences
between the two standard northern-hemisphere varieties, previous studies have found
the PP to be particularly frequent in spoken Australian English (AusE) (see Engel and
Ritz 2000 or Elsness 2009:98).
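Frequency comparisons of this kind depend on how PP tokens are retrieved, which is why the lexical searches mentioned at the start of this section are worth examining critically. Purely as an illustration, the following Python sketch shows what a naive lexical query might look like: a present-tense form of HAVE followed, within two words, by a likely past participle. The pattern, the short list of irregular participles and the function name find_pp_candidates are assumptions made for this example, not the query used in any of the studies cited.

```python
import re

# Illustrative assumption: a handful of frequent irregular participles.
# A realistic query would need a much fuller list (or part-of-speech tags).
IRREGULAR_PARTICIPLES = ["been", "done", "gone", "seen", "had", "made", "got", "come", "said"]

# Naive lexical query for present perfect (PP) candidates:
# a present-tense form of HAVE, up to two intervening words (e.g. "already"),
# then a word ending in -ed/-en or one of the irregular participles above.
PP_PATTERN = re.compile(
    r"\b(?:have|has|'ve|haven't|hasn't)\s+"
    r"(?:\w+\s+){0,2}?"
    r"(\w+(?:ed|en)|" + "|".join(IRREGULAR_PARTICIPLES) + r")\b",
    re.IGNORECASE,
)


def find_pp_candidates(text):
    """Return the participle slot of every naive PP match in text."""
    return [m.group(1) for m in PP_PATTERN.finditer(text)]


if __name__ == "__main__":
    for sentence in [
        "I have seen her.",                     # genuine PP: found
        "I saw her yesterday.",                 # simple past: not matched
        "She has already finished the essay.",  # intervening adverb: found
        "They have ten children.",              # false positive ("ten" ends in -en)
        "She's gone home.",                     # false negative (contracted 's)
    ]:
        print(sentence, find_pp_candidates(sentence))
```

Such a query both over- and under-retrieves: it counts have ten children as a perfect and misses contracted she's gone. Results from lexical searches therefore need manual checking, or comparison with data from syntactically annotated corpora.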
As far as functions of the PP are concerned, standard PDE differs from languages such
as German or French, where the perfect has grammaticalized into a form used for
reference to events that are clearly in the past. However, both historical and regional
varieties of English also provide evidence of the occasional narrative use of the PP in
clear past-tense contexts (see e.g. Elsness 1997:292 for historical varieties and Hughes
et al. ⁴2005:12f. for dialects; this use is also attested in AusE, see Engel and Ritz 2000
and Ritz in press).16 Engel and Ritz (2000) show how the PP has a special pragmatic
function in press reportage, where it serves as a framing element at the beginning or end
of an article.
15 Note, however, that Bowie et al. (2013) observe a slight increase in the perfect in their spoken British data.
Elsness (2009) compares data from the Brown family of corpora with ICE data, which
also include non-printed material and texts sampled from a slightly later date. Both
factors are likely to have had an influence on the slightly higher percentages of PPs he
finds in the written parts of ICE-AUS and ICE-NZ. In other words, it is important to
compare datasets that sample the same kinds of text.
Recently, a number of studies have made use of the ICE components to investigate
variation across both ENL and ESL varieties of English in the use of the perfect. There
are studies that look at the text frequency per million words of the PP (e.g. Bowie et al.
2013 or Yao and Collins 2013), but they focus on ENL Englishes only. The following
overview of previous research concentrates on studies that have looked at variation across
different Englishes. These model the variation in terms of variable contexts, i.e. contexts
in which the PP and the SP alternate. Davydova (2011) uses the spoken
components of the Indian, East African and Singaporean ICE corpora and the London-Lund
Corpus (LLC) of spoken BrE to study the use of PP vs. SP in ‘present perfect contexts’, i.e.
only those contexts where the PP could replace the SP (e.g. not in narrative contexts or
with adverbials of definite past time reference like long ago, yesterday, the other day,
in 1900, etc.). She summarizes the procedure for defining these contexts in the following
figure:17
Figure 1: Categorisation of present perfect contexts (Davydova, 2011: 124)
16 A different sub-type of the narrative function is the so-called ‘footballer’s perfect’ (Walker 2008).
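Davydova identified present perfect contexts by reading through the corpus files. Purely to illustrate the exclusion of definite past time adverbials described above, a rough automatic pre-filter might discard clauses that contain one of the adverbials listed in the text. The adverbial list and the function name is_variable_context are assumptions for this sketch, and narrative contexts, the other exclusion criterion, cannot be detected this way.

```python
import re

# Hypothetical pre-filter: the adverbials are the examples given in the text
# (long ago, yesterday, the other day, "in" + a year such as 1900). This is not
# Davydova's procedure, which relied on manual reading of the corpus files.
DEFINITE_PAST_ADVERBIALS = [
    r"\blong ago\b",
    r"\byesterday\b",
    r"\bthe other day\b",
    r"\bin [12][0-9]{3}\b",  # "in" followed by a four-digit year, e.g. "in 1900"
]
DEFINITE_PAST_RE = re.compile("|".join(DEFINITE_PAST_ADVERBIALS), re.IGNORECASE)


def is_variable_context(clause):
    """Return True if no definite past adverbial rules out the PP.

    Narrative contexts, the other exclusion named in the text, are not
    detectable by a lexical check like this one and still need manual coding.
    """
    return DEFINITE_PAST_RE.search(clause) is None


if __name__ == "__main__":
    for clause in ["I have seen her", "I saw her yesterday",
                   "the college was founded in 1900", "Have you eaten?"]:
        print(clause, "->", is_variable_context(clause))
```

A filter like this can only narrow down candidates; the remaining clauses would still have to be read and classified by hand.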
Her data reveal that the proportion of PPs is lower in ESL varieties than in spoken BrE:
Table 1. PP vs. SP in present perfect contexts (raw frequencies and percentages, based
on Davydova 2011:175, 223, 238, 145; infrequent additional forms not included, so
percentages need not add up to 100%)
             ICE-IND      ICE-EAf      ICE-SIN      LLC
perfect      715 (53%)    247 (58%)    532 (56%)    1812 (90%)
preterite    471 (35%)    159 (37%)    350 (36%)    197 (10%)
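Because the percentages in Table 1 are calculated over all variants, the perfect and preterite figures do not always add up to 100%. As a quick arithmetic check, the share of the PP among PP and SP tokens only can be recomputed from the raw frequencies. This short Python sketch does that; the resulting proportions are derived figures, not ones reported by Davydova.

```python
# Raw frequencies from Table 1 (Davydova 2011): (perfect, preterite).
TABLE_1 = {
    "ICE-IND": (715, 471),
    "ICE-EAf": (247, 159),
    "ICE-SIN": (532, 350),
    "LLC": (1812, 197),
}


def pp_share(perfect, preterite):
    """Proportion of PP among PP + SP tokens (other variants ignored)."""
    return perfect / (perfect + preterite)


shares = {corpus: pp_share(*counts) for corpus, counts in TABLE_1.items()}

if __name__ == "__main__":
    for corpus, share in shares.items():
        print(f"{corpus}: {share:.1%}")
```

On this basis, the three ESL components cluster at around 60% PP, against roughly 90% in the LLC. The gap between the ESL varieties and spoken BrE therefore remains when the minor variants are set aside.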
Seoane and Suárez-Gómez (2013) use a similar approach but a slightly different set of
ICE corpora (Hong Kong, Singapore, India, Philippines and GB as a benchmark corpus)
as well as a different method of data retrieval and a different definition of the
17
For a more detailed discussion of this concept, see Davydova (2011: 119-131).
variable. Like Davydova, they focus on spoken data, but while Davydova used both
face-to-face conversations and telephone calls, Seoane and Suárez-Gómez limit their
analysis to private conversations. (The rationale for using spontaneous speech in
both cases is that this is the least monitored kind of data and that, according to Miller
(2004), PP and SP alternate more frequently in this type of language.) Seoane and
Suárez-Gómez restricted their analysis to the ten most frequent verbs in the Asian ICE
corpora, extracting the data automatically, whereas Davydova read through the corpus
files searching for present perfect contexts. Finally, following Huddleston and Pullum
(2002: 143), Seoane and Suárez-Gómez defined the perfect semantically as expressing
events covering “a time span beginning in the past and extending up to now”.
Table 2. Forms expressing perfect meaning (i.e. experiential, recent past, resultative and
persistent situation) in Private Dialogue in Asian varieties of English (based on Seoane and
Suárez-Gómez 2013:9; PP vs. SP only; percentages over all variants)
             ICE-HK         ICE-SIN        ICE-PHI        ICE-IND        ICE-GB
perfect      410 (59.2%)    155 (44.4%)    169 (57.3%)    300 (77.5%)    236 (80.8%)
preterite    204 (29.5%)    174 (49.9%)    121 (41.0%)    70 (18.1%)     48 (16.4%)
This study produces different results from Davydova’s because of its different
methodology, resulting in a more marked divide between IndE and SingE. Both studies
define the variable semantically but apply different retrieval strategies (manual reading
of the corpus files vs. automatic extraction for a restricted set of verbs). The
combination of these differences may well give rise to diverging results despite the fact
that very similar sets of data were used. The difference for BrE, in addition, is most
likely due to the fact that different benchmark corpora were used: LLC was sampled
earlier than ICE-GB and contains more formal conversations than those included in
ICE-GB. Note, however, that while the comparison of the results for LLC and ICE-GB
in Tables 1 and 2 suggests that there has been a decrease in the use of the PP, Bowie et al.
(2013:326) report a slight increase in spoken BrE. This apparently contradictory result
can be explained if we take a closer look at the definition of the variable: the
percentages presented in the tables above compare the use of the PP against
the use of the SP, whereas Bowie et al. look at the text frequency per million words (pmw)
of the PP. Their approach avoids the difficulty of deciding which SP verbs could be
replaced by a PP, but it reports the text frequency of only one construction. Variation
of PPs may actually extend beyond the SP as a variant, as a recent paper by Pfaff et al.
(2013) indicates: they found that the past progressive (e.g. I was just looking at this
picture) is also occasionally used in spontaneous speech to refer to recent events in past
contexts. Measuring the frequency of the PP in terms of text frequency rather than
against alternating constructions avoids the question of syntactic equivalence.
As a benchmark corpus, ICE-GB is the more suitable choice because the texts sampled,
by and large, stem from the same period as those sampled in the other
ICE components. Interestingly, with ICE-GB as a yardstick for comparison and a
differently defined variable, IndE comes closer to BrE in Seoane and Suárez-Gómez’s
investigation than it does in Davydova’s study, and the emerging picture is one of a
gradient rather than an ENL-ESL divide.
Davydova (2011:170, 231, 253) also looked at the PP in past tense contexts; her study
shows that this is actually a rare phenomenon in IndE, EAfE and SingE, corroborating
Balasubramanian’s (2009:92) earlier finding for IndE: on the basis of ICE-IND and a
corpus of contemporary IndE, she found that PPs in clear past tense contexts are
infrequent, at 3.4% and a mere 0.9% of all PPs in IndE speech and writing, respectively.
In addition to the standard variants, i.e. the PP with auxiliary have, the ICE components
also reveal traces of nativization, for instance the pattern without an auxiliary (5) or
with a base form rather than a past participle (6):
(5) That’s why I never looked back and had any regrets for whatever I myself done or
decided upon with my eyes open. (ICE-IND, S1A-038; quoted from Davydova
2011:180)
(6) She has give four exams (ICE-IND, S1A-070, quoted from Seoane and Suárez-Gómez 2013:11)
ICE corpora also yield a minority of instances which combine auxiliary be with a past
participle, which could be retentions of the older be-perfect (7); note, however, that
some of the attested examples involve transitive rather than the historically attested
intransitive verbs, i.e. (8) and (9) are not simply retentions but modern ‘extensions’ of
the be-perfect:18
(7) I said to the receptionist <,> here on the desk <,> is he gone in to visit (ICE-IRE,
S1A-008)
(8) Look I’m I’m almost finished Sacred Hunger [title of a novel; MH] (ICE-HK, S1A-047, quoted from Seoane and Suárez-Gómez 2013:12)
18
Note, however, that IrE also uses the be-perfect with transitive verbs, a feature that
McCafferty (forthcoming) attributes to substrate influence from Irish.
(9) Okay once the pieces are been cut and washed and dried now we connect them
together. (ICE-SIN, S2A-058)
Examples (7)-(9) are all from spoken texts, which are overall a more likely context for
nativized patterns than edited written language. While both Davydova (2011) and
Seoane and Suárez-Gómez (2013) used only spoken data, the case study in the
following section focuses on the use of the PP in the news sections of the ICE corpora.
3.1 Data and methodology
The case study aims to broaden the scope of previous research by including varieties of
English that have not been subjected to comparative research, partly because the
respective ICE components have only recently been made available or are still under
construction. The ENL varieties included are BrE, AmE, CanE, NZE and AusE; ESL
varieties selected are Fiji English (FijE), Philippine English (PhilE), Indian English
(IndE), Sri Lankan English (SLE) and Ghanaian English (GhE). In addition to providing
evidence on the use of the PP in some new ICE corpora, another aim is to illustrate how
different approaches to data retrieval may influence the results. The analyses will be
limited to the newspaper section of ICE, not only because of the obvious time constraints on a
small-scale study and the limited availability of spoken data,19 but also because
newspapers are expected to be maximally comparable across different regional varieties
of English. Moreover, Miller (2004:230) points out that “[i]n formal written English the
Perfect construction is solidly fixed, in frequent use and protected by grammars of
standard English and by editorial practice.” This general tendency might no longer
19
Only written data are currently available for ICE-US, ICE-FJ, ICE-SL and ICE-GH.
apply to newspaper texts, as he observes “… with the intensive use of computers
newspapers are no longer edited as rigorously as they once were” (Miller, 2004:234).
Finally, newspaper articles make it possible to investigate one of the narrative
functions of the PP.
Three different approaches will be used to extract data from the corpus material.
The PP combines a form of the auxiliary have with a past participle. Because of the high text
frequency of auxiliary have and the lack of grammatical annotation, previous studies tended
to rely on a verb-based approach, i.e. they restricted the analysis to a set of frequently
used lexical verbs.20 The ICE corpora are currently being POS-tagged and parsed at the
University of Zurich, using the TreeTagger tagset (Schmid 1994) and a probabilistic
dependency parser (Pro3Gres) developed by Schneider (2008). Eight of the ten ICE
components that form the basis of this study have already been syntactically annotated,
which allows for the automatic retrieval of all PPs and SPs and thereby affords a bird’s
eye view of the frequency of the two kinds of verb phrase.
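The automatic retrieval of PPs and SPs from tagged text can be illustrated with a minimal sketch. It is not the Zurich pipeline, which relies on the Pro3Gres parser, but a plain pattern matcher over (word, tag) pairs. It assumes TreeTagger’s English tags (VHP/VHZ for present-tense have; VVN, VBN and VHN for past participles; VVD, VBD, VHD and VDD for past-tense forms; RB, RBR and RBS for adverbs). The function names and the limit of two intervening adverbs are illustrative choices, not part of the study.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word form, POS tag)

# Assumed TreeTagger English tags; adjust if the annotation uses another tagset.
HAVE_PRESENT = {"VHP", "VHZ"}          # have, has
PARTICIPLE = {"VVN", "VBN", "VHN"}     # past participles of lexical verbs, be, have
PAST = {"VVD", "VBD", "VHD", "VDD"}    # finite past-tense forms
ADVERB = {"RB", "RBR", "RBS"}          # may intervene, e.g. "has just left"
MAX_GAP = 2                            # illustrative limit on intervening adverbs


def next_non_adverb(tokens: List[Token], i: int) -> int:
    """Index of the first token after i, skipping at most MAX_GAP adverbs; -1 at the end."""
    j, skipped = i + 1, 0
    while j < len(tokens) and tokens[j][1] in ADVERB and skipped < MAX_GAP:
        j += 1
        skipped += 1
    return j if j < len(tokens) else -1


def count_pp_sp(tokens: List[Token]) -> Tuple[int, int]:
    """Count present perfects (present-tense have + participle) and simple pasts."""
    pp = sp = 0
    for i, (_, tag) in enumerate(tokens):
        if tag in HAVE_PRESENT:
            j = next_non_adverb(tokens, i)
            if j != -1 and tokens[j][1] in PARTICIPLE:
                pp += 1  # also catches false positives such as "has got (to)"
        elif tag in PAST:
            j = next_non_adverb(tokens, i)
            if tag == "VHD" and j != -1 and tokens[j][1] in PARTICIPLE:
                continue  # "had" + participle is a past perfect, not an SP
            sp += 1
    return pp, sp


# Note: "PP" is TreeTagger's tag for personal pronouns, unrelated to "PP" = present perfect.
tagged = [("Khan", "NP"), ("has", "VHZ"), ("just", "RB"), ("left", "VVN"), (".", "SENT"),
          ("He", "PP"), ("had", "VHD"), ("left", "VVN"), ("and", "CC"),
          ("she", "PP"), ("arrived", "VVD"), ("yesterday", "NN"), (".", "SENT")]
print(count_pp_sp(tagged))  # (1, 1): "has just left" and "arrived"; "had left" is skipped
```

Like any purely automatic count, such a matcher still needs post-editing: it counts have got (to) as a PP and cannot tell whether an SP occurs in a genuine present perfect context.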
In addition, a verb-based approach will be used for an analysis that looks at more
strictly variable contexts (i.e. one that includes only SPs that could be replaced by a PP),
making use of nine high-frequency lexical verbs (come, finish, get, give, go, hear, see,
tell, think). A third approach will look into the co-occurrence of the PP with temporal
adverbs, both those that are typically associated with it (i.e. just, (n)ever, yet) and one that
20
Exceptions are Hundt and Smith (2009), who retrieved PPs from the tagged
version of the Brown-family corpora, and Bowie et al. (2013), who use fuzzy tree
fragments (FTFs) to retrieve their data. Davydova (2011) retrieved her data manually.
prototypically triggers the SP (yesterday). Finally, a qualitative analysis of the opening
sections of articles in ICE-AUS, ICE-NZ, ICE-FJ and ICE-PHI will show whether the
perfect is frequently used with the framing function observed by Engel and Ritz (2000).
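The verb-based approach and the adverb co-occurrence counts can be sketched in a similar way. The sketch is hypothetical throughout: it assumes TreeTagger-style (word, tag, lemma) triples, treats the sentence (closed by the tag SENT) as the co-occurrence window, and recognizes only active PPs and SPs of the nine verbs listed above. The resulting hits would still need manual post-editing, for instance to discard nativized uses of yet in the sense of ‘still’.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (word form, TreeTagger-style tag, lemma)

VERBS = {"come", "finish", "get", "give", "go", "hear", "see", "tell", "think"}
ADVERBS = {"just", "ever", "never", "yet", "yesterday"}
HAVE_PRESENT = {"VHP", "VHZ"}  # assumed tags for "have"/"has"


def split_sentences(tokens: Iterable[Triple]) -> List[List[Triple]]:
    """Split a token stream after each sentence-final tag (SENT)."""
    sentences, current = [], []
    for tok in tokens:
        current.append(tok)
        if tok[1] == "SENT":
            sentences.append(current)
            current = []
    if current:
        sentences.append(current)
    return sentences


def classify(sent: List[Triple], i: int) -> Optional[str]:
    """Return 'PP' or 'SP' if token i is an active form of a target verb, else None."""
    _, tag, lemma = sent[i]
    if lemma not in VERBS:
        return None
    if tag == "VVD":
        return "SP"
    if tag == "VVN":
        j = i - 1
        while j >= 0 and i - j <= 2 and sent[j][1].startswith("RB"):
            j -= 1  # skip up to two adverbs, e.g. "has just come"
        if j >= 0 and sent[j][1] in HAVE_PRESENT:
            return "PP"
    return None


def cooccurrence(tokens: Iterable[Triple]) -> Counter:
    """Tally (construction, adverb) pairs per sentence; adverb is None if absent."""
    tally: Counter = Counter()
    for sent in split_sentences(tokens):
        advs = sorted({w.lower() for w, _, _ in sent if w.lower() in ADVERBS})
        for i in range(len(sent)):
            kind = classify(sent, i)
            if kind is None:
                continue
            for adv in advs or [None]:
                tally[(kind, adv)] += 1
    return tally


sample = [("Fiji", "NP", "Fiji"), ("has", "VHZ", "have"), ("just", "RB", "just"),
          ("come", "VVN", "come"), ("home", "RB", "home"), (".", "SENT", "."),
          ("They", "PP", "they"), ("came", "VVD", "come"), ("yesterday", "NN", "yesterday"),
          (".", "SENT", "."), ("We", "PP", "we"), ("saw", "VVD", "see"), ("it", "PP", "it"),
          (".", "SENT", ".")]
# One PP with "just", one SP with "yesterday", one SP without a target adverb.
print(cooccurrence(sample))
```

Using the whole sentence as the window is a deliberately crude choice; it over-counts when an adverb modifies another verb or another adverb, which is why manual checking of the concordance lines remains necessary.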
As far as the definition of the variable is concerned, the focus is on standard variants of
both the PP and the SP. Occasionally, a perfect with auxiliary be rather than have is
attested even in edited, printed texts from an ESL context:
(10) The game was long been seen as a hobby … . (ICE-GH, W2C-004)
Such non-standard variants are not included in the counts.
In interpreting the overall frequency of simple past tense VPs in ESL varieties, one has
to be aware of the possibility of zero past tense marking on verbs. The
following example comes from ICE-FJ; it is a serendipitous find from a manual post-edit
of co-occurrences with the adverb yesterday. Note that zero past tense marking
(brave) is used alongside regular past tense marking (stood) in this example:
(11) Hundreds of students of a Suva prominent school brave yesterday's heat and
stood in long queues to wait for their turn to pay their fees. (ICE-FJ, W2C-012)
While I did not systematically search for zero past tense marking, the phenomenon
seems to be rare in the newspaper data, so such instances are not included in the counts.
Similarly, PPs with a base form of the participle are not included in the present case study, partly
because PPs were retrieved by searching for the standard past participle and partly
because these nativized patterns, again, appeared to be typical of spoken rather than
written usage. Narrowing down the envelope of variation in this way is unlikely to have
caused a large number of relevant hits to be missed: Suárez-Gómez and Seoane (2013:167)
found only 1.1% zero-marked SPs and 0.6% PPs with a base form in their written Asian
English material.
While zero past tense marking might lead to under-reporting of SPs in automatically
retrieved data, lack of back-shifting to past perfects in reported speech will lead to
over-reporting of SPs:
(12) He said his wife could have been saved if there was someone who knew how to
apply CPR. (ICE-FJ, W2C-014)
Lack of back-shifting also occurred with PPs (and not only in the ESL varieties); these
instances were not included in the counts because they are not part of the variable
context investigated here, i.e. they are not typical PP contexts but variants of the past
perfect:
(13) The Burnaby Lawyer noted that Bourassa has come to B.C. before – the most
recent visit was in April, 1988. (ICE-CAN, W2C-010)
(14) While acknowledging that the Board had not lived up to its role, he emphasised
that since his takeover two years ago, things have turned for the better. (ICE-IND,
W2C-019)
Nativized patterns were also excluded from the co-occurrence data with temporal
adverbs. In IndE, for instance, yet can be used in the sense of ‘still’, as the following
examples show:
(15) The gas emerged from a broken outlet pipe of the tanker and spread in the
nearby village while people were yet asleep. (ICE-IND, W2C-012)
(16) Over a month having lapsed since the poll-day violence, Khan has yet not been
arrested. (ICE-IND, W2C-004)
Finally, instances where the adverb did not modify the VP but another temporal adverb
were also manually excluded from the concordances:
(17) The government just recently moved to dismantle the allocation of the imports
of sugar under the so-called minimum access volume scheme under which a limited
group corner the bulk of the importation. (ICE-PHI, W2C-006)
3.2 Findings
The automatically retrieved data will be presented in two different ways. Figure 2a
shows the relative frequency of PPs and SPs across the parsed datasets. Even though the
results are presented in terms of percentages, it is important to note that, unlike
Tables 1 and 2 above, they do not represent variable contexts of use (i.e. only SPs in
present perfect contexts) but the proportion of PPs among all PPs and SPs. The results
thus simply give an initial indication of the distribution of the two constructions across the different ICE
components. The data were not manually post-edited to exclude false positives, e.g.
instances of have got (to).
Figure 2a. Relative frequency of PP and SP in the press section of ICE corpora (parsed)
[Bar chart, 0-100%; proportion of PPs: ICE-US 11.1%, ICE-CAN 12.2%, ICE-GB 17.1%, ICE-AUS 9.9%, ICE-NZ 11.3%, ICE-IND 14.5%, ICE-PHI 9.3%, ICE-FJ 7.6%]
(ICE-US, N = 1890; ICE-CAN, N = 1928; ICE-GB, N = 1753; ICE-AUS, N = 2248; ICE-NZ, N = 2160;
ICE-IND, N = 1584; ICE-PHI, N = 2029; ICE-FJ, N = 2145)
Figure 2a shows that in news reporting, the SP occurs with a much higher text
frequency than the PP. Note, however, that the relative frequencies presented in Figure
2a are not directly comparable with the proportions reported in Tables 1 and 2, which
only included SPs in present perfect contexts. The bird’s eye view also suggests that the
ENL and ESL varieties do not fall into two neat groups. At 9.3%, PhilE yields an even
lower proportion of PPs than AmE, its historical parent variety. AusE, NZE and CanE
are closer to AmE than to BrE, which has the highest relative frequency of PPs, closely
followed by a historically related ESL variety, IndE. The lowest proportion of PPs is
found in FijE, an ESL variety that is currently undergoing nativization (see e.g. Zipp,
forthcoming).
A somewhat different picture emerges if the frequency of PPs is measured against
corpus size rather than as a proportion of PPs vs. SPs (see Figure 2b): ICE-AUS has
slightly more rather than fewer PPs than ICE-US on this count.
Figure 2b. PPs (frequency pmw) in the press sections of ICE corpora (parsed)21
[Bar chart, 0-16,000 pmw; corpora: ICE-US, ICE-CAN, ICE-GB, ICE-AUS, ICE-NZ, ICE-IND, ICE-PHI, ICE-FJ]
(ICE-US, N = 210; ICE-CAN, N = 235; ICE-GB, N = 299; ICE-AUS, N = 222; ICE-NZ, N = 245;
ICE-IND, N = 230; ICE-PHI, N = 188; ICE-FJ, N = 164)
Text frequency counts of PPs in newspaper texts of ICE thus do not confirm Elsness’s
(2009) findings of a more frequent use of PPs in AusE than in the other ENL varieties.
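The two measures can be contrasted with a short sketch based on the figures reported above. The Ns in the captions of Figures 2a and 2b appear to be, respectively, the PP + SP totals and the raw PP counts (dividing the latter by the former reproduces the percentages in Figure 2a). This reading, and the use of exactly 20,000 words as the section size where footnote 21 gives an approximate figure, are assumptions of the sketch.

```python
# PP counts (Figure 2b) and PP + SP totals (Figure 2a) for the parsed press sections.
PP_COUNTS = {"ICE-US": 210, "ICE-CAN": 235, "ICE-GB": 299, "ICE-AUS": 222,
             "ICE-NZ": 245, "ICE-IND": 230, "ICE-PHI": 188, "ICE-FJ": 164}
PP_SP_TOTALS = {"ICE-US": 1890, "ICE-CAN": 1928, "ICE-GB": 1753, "ICE-AUS": 2248,
                "ICE-NZ": 2160, "ICE-IND": 1584, "ICE-PHI": 2029, "ICE-FJ": 2145}
SECTION_SIZE = 20_000  # assumption: footnote 21 says "approximately 20,000 words"


def per_million(count: int, corpus_words: int) -> float:
    """Text frequency normalized to one million words (pmw)."""
    return count * 1_000_000 / corpus_words


def share_percent(count: int, total: int) -> float:
    """Proportion of PPs among all PPs and SPs, in percent, rounded to one decimal."""
    return round(100 * count / total, 1)


for corpus in PP_COUNTS:
    pmw = per_million(PP_COUNTS[corpus], SECTION_SIZE)
    share = share_percent(PP_COUNTS[corpus], PP_SP_TOTALS[corpus])
    print(f"{corpus:8} {pmw:8.0f} pmw  {share:5.1f}% PP")
```

With sections of (roughly) equal size, the pmw figures simply rescale the raw PP counts. This is why ICE-AUS (222 PPs, about 11,100 pmw) ranks above ICE-US (210 PPs, about 10,500 pmw) on text frequency, while its share of PPs among all PPs and SPs (9.9%) is lower than that of ICE-US (11.1%): the proportion also depends on how many SPs each section contains.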
21
The press sections contain approximately 20,000 words each.
Figure 3a, finally, presents the results from the verb-based approach for a variable
context, i.e. only SPs that could be replaced by a PP. The concordances were manually
post-edited to exclude instances from reported speech with past-tense reporting verbs,
i.e. contexts in which back-shifting rules might apply.
Figure 3a. PP vs. SP with selected verbs
[Bar chart, 0-20%; proportion of PPs: ICE-AUS 7.9%, ICE-CAN 13.3%, ICE-GB 14.3%, ICE-NZ 10.2%, ICE-US 11.7%, ICE-FJ 15.9%, ICE-IND 15.1%, ICE-SL 18.8%, ICE-PHI 11.6%, ICE-GH 10%]
(ICE-AUS, N = 139; ICE-CAN, N = 98; ICE-GB, N = 112; ICE-NZ, N = 98; ICE-US, N = 120; ICE-FJ,
N = 82; ICE-IND, N = 73; ICE-SL, N = 64; ICE-PHI, N = 69; ICE-GH, N = 60)
Even though the variable in Figure 3a is defined differently from that in Figures 2a and
2b, and a verb-based approach is used, the proportion of PPs in
newspaper texts is still much lower than in the spoken data investigated by Davydova
(2011) and Seoane and Suárez-Gómez (2013) (i.e. Tables 1 and 2 above). As in
Figure 2b, ICE-CAN and ICE-GB yield similar proportions of PPs, as do ICE-US and
ICE-PHI. While the ESL varieties FijE, IndE and SLE show a high proportion of PPs,
GhE patterns more closely with PhilE here, albeit at an even lower level of PPs. AusE is
now the variety with the lowest use of PPs. Note, however, that Figure 3a includes both
active and passive VPs. Elsness (2009:98) limits his analysis to active, positive,
declarative and non-progressive contexts, and Figure 3b shows that voice, for instance,
appears to have an effect on the proportion of PPs and SPs, as passives may be avoided
in combination with the perfect, particularly in ESL varieties.
Figure 3b. PP vs. SP with selected verbs (active only)
[Bar chart, 0-20%; proportion of PPs: ICE-AUS 7.9%, ICE-CAN 11.6%, ICE-GB 13%, ICE-NZ 8.8%, ICE-US 11.7%, ICE-FJ 17.7%, ICE-IND 14.7%, ICE-SL 17.5%, ICE-PHI 12.1%, ICE-GH 7.3%]
(ICE-AUS, N = 127; ICE-CAN, N = 86; ICE-GB, N = 100; ICE-NZ, N = 80; ICE-US, N = 120; ICE-FJ,
N = 62; ICE-IND, N = 68; ICE-SL, N = 57; ICE-PHI, N = 66; ICE-GH, N = 55)
With active-only VPs, AusE and NZE show more similar usage of PPs, as do AmE and
CanE; BrE has the highest proportion of PPs amongst the ENL varieties. With the
exception of GhE and American-based PhilE, the ESL varieties show relatively high
proportions of PPs, with IndE showing a slightly higher proportion of PPs than BrE and
the somewhat ‘younger’ ESL varieties in Fiji and Sri Lanka yielding the highest relative
use of PPs. The results of this case study are thus different from those obtained from
spoken data, where ESL varieties yielded overall lower proportions of PPs.
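The Ns behind Figures 3a and 3b are small, so differences of a few percentage points between varieties rest on very few PPs. One standard way of making this visible is to attach a Wilson score interval to each proportion; the sketch below does so. The PP counts are back-calculated from the percentages and Ns in the caption of Figure 3b (e.g. 7.9% of 127 is 10 PPs), so they are an assumption rather than reported figures, and the Wilson interval is a technique chosen here for illustration, not one used in the chapter.

```python
import math
from typing import Tuple

# PP counts back-calculated (assumption) from the percentages and Ns of Figure 3b.
ACTIVE_PP = {"ICE-AUS": (10, 127), "ICE-CAN": (10, 86), "ICE-GB": (13, 100),
             "ICE-NZ": (7, 80), "ICE-US": (14, 120), "ICE-FJ": (11, 62),
             "ICE-IND": (10, 68), "ICE-SL": (10, 57), "ICE-PHI": (8, 66),
             "ICE-GH": (4, 55)}


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


for corpus, (pp, n) in ACTIVE_PP.items():
    low, high = wilson_interval(pp, n)
    print(f"{corpus:8} {100 * pp / n:5.1f}%  95% CI {100 * low:4.1f}-{100 * high:4.1f}%")
```

On these assumptions the intervals are wide: even ICE-AUS (7.9%) and ICE-SL (17.5%), which lie near opposite ends of Figure 3b, have overlapping intervals. Overlap alone does not show that two varieties do not differ, but it suggests treating the rankings in Figures 3a and 3b as tentative.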
As far as co-occurrence with temporal adverbials is concerned, the newspaper section of
ICE is too small to yield conclusive results. However, the figures in Table 3 indicate
some interesting trends.
Table 3. Co-occurrence of PP and SP with adverbials just, (n)ever, yet22
           ICE-AUS  ICE-CAN  ICE-GB  ICE-NZ  ICE-US  ICE-FJ  ICE-IND  ICE-SL  ICE-PHI  ICE-GH
PP : SP    5:2      8:4      7:2     6:2     8:7     2:2     2:1      3:0     1:4      0:3
total      7        12       9       8       15      4       3        3       5        3
As predicted by previous research, SPs are more often found with adverbs expressing
current relevance in AmE and in CanE, but are a real minority variant in the other ENL
varieties. Usage in PhilE, again, shows the historical connection with AmE. Additional
data would be needed to verify whether GhE might also be influenced by AmE or
whether the absence of PPs with adverbs that typically trigger this construction has to
be attributed to the overall low frequency of PPs observed in Figure 3b. A qualitative
analysis of the data shows that the SP is at times a variant of the PP when it co-occurs
22
Again, instances in subordinate clauses with a past tense reporting verb are excluded.
with an adverb such as just (18), whereas some instances are variants of the past perfect
(i.e. with lack of back-shifting), as in (19):
(18) Laced with the victorious Fiji Barbarians players who just returned from
Auckland the side has a full set of arsenal to do the damage in Vanua Levu. (ICE-FJ, W2C-018)
(19) he did not push the players hard in the first run considering the fact that they just
came back from the festive break (ICE-FJ, W2C-020:6:65)
A systematic search for co-occurrence of the PP with a clear past-tense adverb (i.e.
yesterday) did not yield a single instance in any of the press sections of the ten corpora
surveyed for this case study. Likewise, a qualitative analysis of the opening paragraphs
of articles in ICE-AUS, ICE-NZ, ICE-FJ and ICE-PHI did not provide any evidence of
the typical framing function that Engel and Ritz (2000) describe for their spoken radio
data, at least not with adverbs that have a clear past time reference.
3.3 Discussion
The different methods used to retrieve PPs from ICE and the different measures
employed to compare the results make the findings difficult to assess. The data presented
above have shown that the way the envelope of variation is defined affects the resulting
picture of the relation among ENL and ESL varieties: it makes a difference, for
instance, whether the overall text frequency of PPs is compared or whether the variable
is defined more narrowly, e.g. as an alternation between PP and SP in perfect contexts.
We also saw that the results differ slightly depending on whether or not passive VPs are
included in the counts.
Qualitative analyses of the corpus data show that variable use of PPs and SPs is at times
difficult to categorize. In the following example, a PP is used in a clear past time
context, but the choice of the PP itself suggests that the past action has current
relevance:
(20) The new Prime Minister’s policy declaration at the meeting of Kisan
Coordination Committee on December 31 last has given enough indication of
coming change. (ICE-IND, W2C-007)
Occasionally, ESL ICE corpora yield interesting examples of temporal expressions that
typically co-occur with a PP, a present tense or a future time expression. Instead,
what we find in ICE-SL is a SP:
(21) This is the first time Janasansadaya came to Anuradhapura and first a workshop
is held for Maha Sangha. (ICE-SL, W2C-015)
A follow-up search showed that this is not a tense choice regularly attested in any of the
ICE corpora. Previous web-based data (Hundt, 2013:194) show, however, that the
present progressive outnumbers the PP with This is the first time in SingE and IndE.
This brings us to the question of what should be included in the envelope of variation.
The results presented in the previous section were based on a fairly conservative definition
and compared PPs and SPs only. Qualitative analyses of corpus data, however, reveal that
the envelope of variation can be broadened to include not only the nativized patterns
mentioned in section 3.1 but also the present progressive, as the following examples
show:
(22) “Over the past five years, the number of newborns affected is steadily
increasing which corresponds with the increase in females detected with the disease
over the past five years,” she said. (ICE-FJ, W2C-014)
(23) The entire hilly belt of Jammu is experiencing heavy snowfall since early this
morning, reports say today, according to PTI -. (ICE-IND, W2C-009)
(24) "Ever since the Sakvithi and the recent Golden Key crisis erupted, we are now
experiencing increased number of deposits than before, … (ICE-SL, W2C-001)23
These variants have not been discussed in the context of variable use of the PP so far
because they are extensions of the progressive to traditional PP contexts. They are not
limited to ESL varieties but also occasionally attested from ENL contexts (see Hundt
and Vogel 2011:159). This extension of the present progressive to PP contexts might be
fostered by the somewhat more unobtrusive use of the past progressive in this
environment (see Pfaff et al. 2013).
23
The adverb since is also used as a conjunction introducing an adverbial clause of
cause/reason; without ever, the subordinate clause would be ambiguous between the
two readings and this, in turn, might have prompted the use of the present progressive in
this particular example.
4 CONCLUSION AND OUTLOOK
The ICE is an excellent resource for the study of standard(izing) varieties of English
around the world. Due to the history of the project, certain limitations apply with
respect to the diachronic bias inherent in individual components and across regional
varieties. Another limitation concerns the use of spoken material, which is so far only
available in transcription. For some points of fine-grained grammatical analysis (e.g.
final consonant cluster reduction in past tense VPs), availability of the original sound
files would be desirable. In an ideal world, the sound files would be aligned with the
transcription, allowing researchers to target the particular grammatical structure
retrieved from the corpus (this design feature is currently available for ICE-NIG only,
but seems to have been envisaged in the original plan for the ICE corpora, as the sound
samples at the project website (http://ice-corpora.net/ice/sounds.htm) indicate). With a
few exceptions, documentation of existing ICE components remains poor even though
background information is often important for the interpretation of individual structures.
Comparisons between ENL and ESL ICE data need to take the possibility of
localisation of text types into account (e.g. different narrative traditions may
affect the use of tense in fiction texts; see Biewer 2012). Van der Auwera et al. (2012),
for instance, assume that ongoing language change tends to be more advanced in spoken
than in written texts and therefore take differences between speech and writing as a
proxy for ongoing change. The problem with this approach is that it must assume
stylistic differences between speech and writing to be the same across different varieties
of English, and this is not necessarily the case. More research, ideally in combination
with real-time data or apparent-time data from sociolinguistically balanced samples is
needed to verify whether these assumptions are valid.
REFERENCES
Aarts, Bas, Close, Joanne, Leech, Geoffrey and Wallis, Sean (eds.). 2013. The verb
phrase in English. Investigating recent language change with corpora. Cambridge:
Cambridge University Press.
Andersen, Gisle and Bech, Kristin (eds.). 2013. English corpus linguistics: Variation in
time, space and genre. Amsterdam and New York: Rodopi.
Balasubramanian, Chandrika. 2009. Register variation in Indian English. Amsterdam:
Benjamins.
Biber, Douglas. 1988. Variation across speech and writing. Cambridge: Cambridge
University Press.
Biber, Douglas. 1993. Representativeness in corpus design. Literary and Linguistic
Computing 8(4): 243-57.
Biewer, Carolin. 2012. South Pacific Englishes. The dynamics of second-language
varieties of English in Fiji, Samoa and the Cook Islands. Post-doctoral thesis,
University of Zurich.
Biewer, Carolin, Hundt, Marianne and Zipp, Lena. 2010. How to make a Fiji corpus?
Challenges in the compilation of an ESL ICE component. ICAME Journal 34: 5-23.
Bowie, Jill, Wallis, Sean and Aarts, Bas. 2013. The perfect in spoken British English. In
Bas Aarts et al. (eds.), 318-352.
Collins, Peter. 2009. The progressive in English. In Pam Peters et al. (eds.), 115-123.
Collins, Peter and Yao, Xinyue. 2012. Modals and quasi-modals in New Englishes. In
Marianne Hundt and Ulrike Gut (eds.), 35-53.
Davydova, Julia. 2011. The present perfect in non-native Englishes. A corpus-based
study of variation. Berlin: De Gruyter.
Deuber, Dagmar, Biewer, Carolin, Hackert, Stephanie and Hilbert, Michaela. 2012. Will
and would in selected New Englishes: General and variety-specific tendencies. In
Marianne Hundt and Ulrike Gut (eds.), 77-102.
Elsness, Johan. 1997. The perfect and the preterite in contemporary and earlier
English. Berlin and New York: Mouton de Gruyter.
Elsness, Johan. 2009. The perfect and the preterite in Australian and New Zealand
English. In Pam Peters et al. (eds.), 89-114.
Engel, Dulcie M. and Ritz, Marie-Eve. 2000. The use of the present perfect in Australian
English. Australian Journal of Linguistics 20(2): 119-140.
Fallon, Helen. 2004. Comparing World Englishes: A research guide. World Englishes
23(2): 309-16.
Greenbaum, Sidney (ed.). 1996. Comparing English worldwide: The International
Corpus of English. Oxford: Clarendon Press.
Gries, Stefan. 2006. Exploring variability within and between corpora: some
methodological considerations. Corpora 1(2): 109-151.
Höhn, Nicole. 2012. “And they were all like ‘What’s going on?’”: New quotatives in
Jamaican and Irish English. In Marianne Hundt and Ulrike Gut (eds.), 263-289.
Hoffmann, Sebastian, Hundt, Marianne and Mukherjee, Joybrato. 2012. Indian English
– an emerging epicentre? A pilot study on light verbs in web-derived corpora of
South Asian Englishes. Anglia 129(3-4): 258-280.
Hoffmann, Thomas and Siebers, Lucia (eds.). 2009. World Englishes – Problems,
properties and prospects. Amsterdam and Philadelphia: Benjamins.
Huckvale, Mark and Fang, Alex Changyu. 1996. PROSICE: A spoken English database
for prosody research. In Sidney Greenbaum (ed.), 262-279.
Hughes, Arthur, Trudgill, Peter and Watt, Dominic. 2005. English accents and dialects.
An introduction to social and regional varieties of English in the British Isles. Fourth
edition. London: Hodder Arnold.
Hundt, Marianne. 1998. New Zealand English grammar. Fact or fiction? Amsterdam:
Benjamins.
Hundt, Marianne. 2006. “The committee has/have decided ...” On concord patterns with
collective nouns in inner and outer circle varieties of English. Journal of English
Linguistics 34(3): 206-232.
Hundt, Marianne. 2009. Global English – global corpora: Report on a panel discussion
at the 28th ICAME conference. In Antoinette Renouf and Andrew Kehoe (eds.),
Corpus linguistics: Refinements and reassessments. Amsterdam: Rodopi, 451-462.
Hundt, Marianne. 2013. The diversification of English: old, new and emerging
epicentres. In Daniel Schreier and Marianne Hundt (eds.), English as a contact
language. Cambridge: Cambridge University Press, 182-203.
Hundt, Marianne and Gut, Ulrike (eds.). 2012. Mapping unity and diversity world-wide:
Corpus-based studies of New Englishes. Amsterdam: Benjamins.
Hundt, Marianne and Leech, Geoffrey. 2012. ‘Small is beautiful’ – On the value of
standard reference corpora for observing recent grammatical change. In Terttu
Nevalainen and Elizabeth Closs Traugott (eds.), 175-188.
Hundt, Marianne and Smith, Nicholas. 2009. The present perfect in British and
American English: has there been any change, recently? ICAME Journal 33: 45-63.
Hundt, Marianne and Vogel, Katrin. 2011. Overuse of the progressive in ESL and
learner Englishes – fact or fiction? In Joybrato Mukherjee and Marianne Hundt
(eds.), 145-166.
Kallen, Jeffrey L. and Kirk, John M. 2008. ICE Ireland: A user's guide. Belfast: Trinity
College Dublin.
McCafferty, Kevin. Forthcoming. ‘[W]ell are you not got over thinking about going to
ireland yet’: the be-perfect in eighteenth- and nineteenth-century Irish English. To
appear in Marianne Hundt (ed.), Late modern English syntax. Cambridge:
Cambridge University Press.
Mair, Christian. 2009. Corpus linguistics meets sociolinguistics: Studying educated
spoken usage in Jamaica on the basis of the International Corpus of English. In
Thomas Hoffmann and Lucia Siebers (eds.), 39-60.
Mair, Christian and Winkle, Claudia. 2012. Change from to-infinitive to bare infinitive
in specificational cleft sentences: Apparent-time data from World Englishes. In
Marianne Hundt and Ulrike Gut (eds.), 243-262.
Mesthrie, Rajend and Bhatt, Rakesh M. 2008. World Englishes. The study of new
linguistic varieties. Cambridge: Cambridge University Press.
Miller, Jim. 2004. Perfect and resultative constructions in spoken and non-spoken
English. In Olga Fischer, Muriel Norde and Harry Perridon (eds.), Up and down the
cline – the nature of grammaticalization. Amsterdam: Benjamins, 229-246.
Mukherjee, Joybrato and Hoffmann, Sebastian. 2006. Describing verb-complementation
profiles of New Englishes: A pilot study of Indian English. English World-Wide 27:
147-73.
Mukherjee, Joybrato and Hundt, Marianne (eds.). 2011. Exploring second-language
varieties of English and learner Englishes. Bridging a paradigm gap. Amsterdam:
Benjamins.
Mukherjee, Joybrato and Schilk, Marco. 2012. Exploring variation and change in New
Englishes: Looking into the International Corpus of English (ICE) and beyond. In
Terttu Nevalainen and Elizabeth Closs Traugott (eds.), 189-199.
Nelson, Gerald. 1996. The design of the corpus. In Sidney Greenbaum (ed.), 27-35.
Nelson, Gerald. 2006. World Englishes and corpora studies. In Braj B. Kachru, Yamuna
Kachru and Cecil L. Nelson (eds.), The handbook of World Englishes. Oxford:
Blackwell, 733-50.
Nelson, Gerald, Wallis, Sean and Aarts, Bas. 2002. Exploring natural language. Working
with the British component of the International Corpus of English. Amsterdam:
Benjamins.
Nevalainen, Terttu and Traugott, Elizabeth Closs (eds.). 2012. The Oxford handbook of
the history of English. Oxford: Oxford University Press.
Newman, John and Columbus, Georgie. 2009. Education as an over-represented topic in
the ICE corpora? Paper presented at the 15th Conference of the International
Association for World Englishes, Cebu City, Philippines, 22 to 24 October 2009.
Peters, Pam, Collins, Peter and Smith, Adam (eds.). 2009. Comparative studies in
Australian and New Zealand English. Grammar and beyond. Amsterdam:
Benjamins.
Pfaff, Meike, Bergs, Alexander and Hoffmann, Thomas. 2013. ‘I was just reading this
article’ – on the expression of recentness and the English past progressive. In Bas
Aarts et al. (eds.), 217-238.
Ritz, Marie-Eve. In press. Relationship between event, reference time and time of
utterance and the representation of present perfect sentences in Australian English
narratives. Cahiers Chronos.
Rosenfelder, Ingrid. 2009. Rhoticity in educated Jamaican English. In Thomas
Hoffmann and Lucia Siebers (eds.), 61-82.
Schmid, Helmut. 1994. Probabilistic part-of-speech tagging using decision trees. In
Proceedings of International Conference on New Methods in Language Processing,
Manchester, 44-49.
Schneider, Edgar W. 2005. The subjunctive in Philippine English. In Danilo T. Dayag
and J. Stephen Quakenbusch (eds.), Linguistics and language education in the
Philippines and beyond. A Festschrift in Honor of Ma. Lourdes S. Bautista. Manila:
Linguistic Society of the Philippines, 27-40.
Schneider, Edgar W. 2007. Postcolonial English. Cambridge: Cambridge University
Press.
Schneider, Gerold. 2008. Hybrid long-distance functional dependency parsing. Ph.D.
dissertation, University of Zurich.
Sedlatschek, Andreas. 2009. Contemporary Indian English. Variation and change.
Amsterdam: Benjamins.
Seoane, Elena and Suárez-Gómez, Cristina. 2013. The expression of the perfect in East
and South-East Asian Englishes. English World-Wide 34(1): 1-25.
Sigley, Robert J. 1997. Text categories and where you can stick them: A crude
formality index. International Journal of Corpus Linguistics 2(2): 199-237.
Sigley, Robert. 2012. Assessing corpus comparability using a formality index: The case
of the Brown and LOB clones. In Shunji Yamazaki and Robert Sigley (eds.),
Approaching language variation through corpora. A Festschrift in honour of Toshio
Saito. Bern: Peter Lang, 65-114.
Skandera, Paul. 2003. Drawing a map of Africa: Idiom in Kenyan English. Tübingen:
Narr.
Suárez-Gómez, Cristina and Seoane, Elena. 2013. “They have published a new cultural
policy that just come out”: Competing forms in spoken and written New Englishes.
In Gisle Andersen and Kristin Bech (eds.), 163-182.
Szmrecsanyi, Benedikt and Kortmann, Bernd. 2011. Typological profiling. Learner
Englishes versus indigenized L2 varieties of English. In Joybrato Mukherjee and
Marianne Hundt (eds.), 167-187.
van der Auwera, Johan, Noël, Dirk and De Wit, Astrid. 2012. The diverging need(to)’s
of Asian Englishes. In Marianne Hundt and Ulrike Gut (eds.), 55-75.
van Rooy, Bertus, Terblanche, Lize, Haase, Christoph and Schmied, Joseph. 2010.
Register differentiation in East African English. A multidimensional study. English
World-Wide 31(3): 311-49.
Vine, Bernadette. 1999. Guide to The New Zealand Component of the International
Corpus of English. Wellington: School of Linguistics and Applied Language Studies,
Victoria University of Wellington.
Walker, Jim. 2008. The footballer’s perfect – Are footballers leading the way? In Eva
Lavric, Gerhard Pisek, Andrew Skinner and Wolfgang Stadler (eds.), The linguistics
of football. Tübingen: Gunter Narr, 295-303.
Wunder, Eva-Maria, Voormann, Holger and Gut, Ulrike. 2010. The ICE Nigeria corpus
project: Creating an open, rich and accurate corpus. ICAME Journal 34: 78-88.
Xiao, Richard. 2009. Multidimensional analysis and the study of world Englishes.
World Englishes 28(4): 421-450.
Yao, Xinyue and Collins, Peter. 2013. Functional variation in the English present
perfect: a cross-varietal study. In Gisle Andersen and Kristin Bech (eds.), 91-111.
Bao, Zhiming and Hong, Huaqing. 2006. Diglossia and register variation in Singapore
English. World Englishes 25(1): 105-114.
Zipp, Lena. Forthcoming. Educated Fiji English. Lexico-grammar and variety status.
Amsterdam: Benjamins.