World Englishes

Marianne Hundt

This chapter focuses on studies of World Englishes based on the International Corpus of English (Greenbaum 1996). Part one surveys important early studies based on this resource. Recently, the ICE corpora have been used to study unity and diversity across New Englishes (Hundt & Gut 2012). The chapter critically discusses the sampling frame of the corpus and its suitability for research into the dynamics of English as a global language. In particular, it addresses the question of whether the corpus design poses certain restrictions on the study of World Englishes. The chapter contrasts corpus-based and corpus-driven approaches. Another methodological issue discussed is the combination of ICE corpora with supplementary data derived from the internet. The case study looks at the use of present and past perfect constructions in different ICE corpora. The present perfect is of interest because it is apparently an example of stable regional variation in British and American English (Elsness 2009, Hundt & Smith 2009). At the same time, perfect constructions show interesting patterns of nativization in New Englishes, including different discourse-pragmatic functions (e.g. Sharma 2001). With respect to methodological issues, the study compares results from a verb-based analysis with data derived from syntactically annotated corpora. An important aspect to consider is the interaction of regional with register variation.

Marianne Hundt. 2015. World Englishes. In D. Biber & R. Reppen, eds. The Cambridge Handbook of English Corpus Linguistics. Cambridge: Cambridge University Press, 381-400. (Final pre-print version)

1 INTRODUCTION

English corpus linguistics was kick-started by the compilation of the machine-readable Brown corpus of written American English (AmE) in 1961. A parallel British English (BrE) version was soon to follow. In the 1980s, the Brown-type compilation model started spreading to other parts of the English-speaking world (India, Australia and New Zealand).1 The Brown-type corpora are a useful resource, and their sampling frame is even being used to cover earlier stages of World Englishes.2 They are nevertheless limited in regional spread and, more importantly, provide evidence of printed written language only. Corpus linguistics truly went global in the late 1980s. At that time, Sidney Greenbaum launched a huge international project that aimed to provide standard one-million-word samples of World Englishes on a hitherto unprecedented scale: the International Corpus of English, or ICE (Greenbaum 1996). The label 'standard' has two meanings in this context. It covers both the variety that was to be sampled (educated English) and the principled compilation that would, it was hoped, ensure comparative research across the different Englishes (see section 2.1). The focus in this chapter will be on World Englishes that are used as first or second language varieties. Discussion of corpus-based research into English as a lingua franca (ELF) and Learner Englishes can be found in [cross-references to other chapters].3

1 For an overview of research based on non-ICE corpora, see Fallon (2004) and Nelson (2006).
2 For the extended Brown family covering Britain and the US, see e.g. Hundt and Leech (2012). Sebastian Hoffmann (Trier) is involved in the compilation of a historical corpus of Singapore English modelled on Brown, and Peter Collins (Sydney) is engaged in a similar project for the Philippines.
3 For corpus-based research that bridges the 'paradigm gap' between studies of first, second and foreign language varieties, see the papers in Mukherjee and Hundt (2011). For a more detailed discussion of the terminology and classification of different World Englishes, see Mesthrie and Bhatt (2008: 2-13).

Some scholars have compiled their own corpora of World Englishes (e.g. by tapping into archives found on the world-wide web).4 The focus in this chapter will nevertheless be on research based on ICE, for two reasons. First, these are the most widely available corpora of ENL and ESL Englishes. Second, they are corpora in the narrower corpus-linguistic sense, i.e. principled, representative collections of texts (see …). The original vision was already very ambitious, as the project aimed to include eighteen sub-corpora (Greenbaum 1996:3). Since then, the corpus has kept growing, and new members keep joining the ICE family of corpora.5 Today, ICE components are available or under compilation for three kinds of setting:
- varieties of English as a first language (ENL),6 like BrE or New Zealand English (NZE);
- institutionalised7 second-language varieties (ESL), such as Indian (IndE) and Singapore (SingE) English;
- countries in which a standard(izing) variety of English exists alongside a creole (e.g. Jamaica and the Bahamas), or where the exact status of English is a matter of debate (e.g. Hong Kong or Malta).8

4 A publicly available web-based set of World Englishes corpora is provided by Mark Davies at http://corpus2.byu.edu/glowbe/ [last accessed 1 July 2013].
5 For a list of available corpora and those under construction, see http://ice-corpora.net/ice/index.htm [last accessed 16 January 2013].
6 The established acronym is actually derived from 'English as a Native Language', but nativeness is a somewhat controversial issue, whereas 'first language' is a more neutral term.
7 Institutionalised varieties of English are varieties that typically have official status in a country and/or are used in a broad range of intranational domains, such as administration, tertiary education and the media.
8 Note that the original intent was to include ENL and ESL varieties only (Greenbaum 1996:3).

The recent and ongoing expansion of ICE allows linguists interested in World Englishes to include a broad range of data in comparative, cross-varietal research. However, both the original design and the expansion of the ICE project raise a number of methodological issues that need to be addressed. The aim of this chapter is to critically discuss the sampling frame of the corpus and its suitability for research into the dynamics of English as a global language. In particular, it will address the question of whether the corpus design places restrictions on the study of World Englishes. Section two of this chapter will survey important earlier studies based on this resource. Section three will present a case study of the use of the present perfect in different Englishes. The chapter will conclude with a short evaluation of the existing resources and an outlook on recent and future developments.

2 PREVIOUS ICE-BASED RESEARCH9

ICE corpora sample both written and spoken data, amounting to approximately 400,000 and 600,000 words, respectively. The written texts include both printed and non-printed material. The spoken component contains private and public dialogue as well as unscripted and scripted monologue. These samples are divided across a set of domains (e.g. private, education, legal, media, business and administration).10

The ICE corpora (or parts thereof) have been used individually for the description of single varieties (e.g. Nelson et al. 2002 or Sedlatschek 2009). Another strand of research takes a comparative approach. Such studies focus either on a set of comparable varieties in a specific region (see the papers in Peters et al. 2009) or aim at more global coverage (see the papers in Hundt and Gut 2012). Studies have looked at grammatical variation (e.g. Zipp forthcoming), lexis (e.g. Skandera 2003) or register variation in individual varieties (e.g. Balasubramanian 2009, van Rooy et al. 2010). At approximately one million words each, the ICE corpora pose obvious limitations on studies of lexis and of infrequent grammatical patterns.

Phonetic analyses of ICE corpora are extremely rare (but see, e.g., Rosenfelder 2009). There are two main reasons for this. First, the sound files collected for the spoken part of the ICE corpora have not been made publicly available. Second, the data have been transcribed orthographically but have not been systematically aligned with the original sound files. The exceptions are some sections of ICE-GB (see Huckvale and Fang 1996) and ICE-NIG (see Wunder et al. 2010). Finally, ICE corpora have also been used to investigate typological (e.g. Szmrecsanyi and Kortmann 2011) and sociolinguistic (e.g. Mair 2009) variation.

9 The studies in this section at times combine ICE with other electronic sources.
10 For details, see http://ice-corpora.net/ice/design.htm [last accessed 1 July 2013].

2.1 ICE as a resource for the study of World Englishes

Any study that uses data from several ICE components relies on the comparability of these corpora. Corpus comparability is therefore one of the issues that need to be addressed in a critical evaluation of ICE as a resource for the study of World Englishes. It was also one of the key design goals of the project's initiator: "The ICE project views as the basis for international comparisons the provision of parallel corpora that sample English used in the participating countries. For valid comparative studies the components of ICE need to follow the same design, to date from the same period, and to be processed and analysed in similar ways" (Greenbaum 1996:5).

An obvious limitation on the comparability of varieties across different ICE components comes from the long history of the project. Initially, eighteen varieties were to be represented. The material sampled was to have been produced in the late 1980s and early 1990s. The more recent additions necessarily sample data from around a decade later, which introduces a slight diachronic bias into comparisons. Users also need to be aware that sampling for a single ICE corpus may stretch over a considerable period of time. This introduces diachronic variation not only between but also within individual ICE components. The sampling for ICE-Fiji, for instance, started in 2005 but includes individual texts published as early as 1990 (Biewer et al. 2010:10). For political and practical reasons, the corpus still remains incomplete at the time of writing. Within a single ICE corpus, the time span may thus stretch to around 20 years instead of the five years originally planned (1990-94; see Nelson 1996:28). Mukherjee and Schilk (2012:191) therefore rightly conclude that "[w]e have to uphold … the general fiction of linguistic stability over the past 20 years in order to be able to treat ICE components as synchronically comparable corpora". Users nevertheless need to be aware that this is a fiction rather than a fact.
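Where dates for individual texts are available, users can check how far a component spreads in time before comparing it with others. The Python sketch below shows one way of doing this. It makes two assumptions:
- the metadata comes as a hypothetical dictionary that maps ICE-style text IDs to the year in which each text was produced (ICE components do not come with metadata in a shared format);
- the example years are invented for illustration and are not taken from any ICE corpus.

```python
# Planned sampling window for ICE texts: 1990-94 inclusive (Nelson 1996:28).
PLANNED_YEARS = range(1990, 1995)


def sampling_report(metadata):
    """Summarise the temporal spread of one corpus component.

    metadata maps a text ID to the year in which the text was produced
    (or the recording was made). Returns the earliest year, the latest
    year, the span in years, and a sorted list of the IDs of texts that
    fall outside the planned sampling window.
    """
    years = list(metadata.values())
    outside = sorted(text_id for text_id, year in metadata.items()
                     if year not in PLANNED_YEARS)
    return min(years), max(years), max(years) - min(years), outside


# Invented example metadata for illustration only (not real ICE data).
demo = {"W1A-001": 1990, "W2C-004": 1993, "S1A-010": 2005, "S1A-011": 2009}
print(sampling_report(demo))
# (1990, 2009, 19, ['S1A-010', 'S1A-011'])
```

The function only flags texts outside the planned 1990-94 window. Whether such texts matter depends on how sensitive the feature under study is to recent change.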
The diachronic bias inherent in the current set of ICE corpora might be a relevant factor for some studies, but it might be less of a problem for others. Some features have spread rapidly in spoken varieties of English across the globe, such as quotative be like. Studying them crucially depends on contemporaneous sampling of the data (see Höhn 2012:268). Longer-term changes, by contrast, such as those in the complementation patterns of verbs, might be less affected by the bias. In their study of specificational cleft sentences, for instance, Mair and Winkle (2012) found that the differences between ENL and ESL varieties were more marked than any signs of ongoing change.

It is sometimes necessary to take the potential diachronic bias in the data into account. Yet background information on the precise temporal span sampled in individual ICE corpora (or components thereof) is often difficult to obtain. Most ICE corpora were released without two kinds of metadata: detailed bibliographical information on the individual texts, and biographical information on the speakers in the spontaneous spoken conversations. Notable exceptions are ICE-NZ and ICE-IRE.
- ICE-NZ includes texts (both written and spoken) from 1990 to 1998 (Vine 1999:8).
- In ICE-IRE, the earliest texts are from 1990 and the latest (recordings of spoken data) from 2005 (Kallen and Kirk 2008:4, 31).
- In ICE-CAN, the written texts were produced between 1988 and 1999 and the spoken texts between 1985 and 1997; the bulk of the data was recorded in 1994 and 1995.11

A comparison of these three ICE members would thus have to take into account that the data were collected between 1985 and 2005. Comparisons with more recent ICE members have to allow for an even broader time span.

11 I am grateful to John Newman and Georgie Columbus for providing me with the metadata for ICE-CAN.

As a rule, the sampling of the spoken component was not stratified by speaker age or gender. There is therefore no straightforward way of using the spontaneous spoken data in ICE for apparent-time studies that would allow linguists to trace ongoing change in the New Englishes. The ICE corpora aim to sample educated usage and therefore do not attempt to balance speakers' regional backgrounds. To some extent, this may result in a fairly homogeneous regional sample in the spontaneous spoken conversations. ICE-GB's spoken data, for example, are largely a sample of educated London English (see Hundt 2009:461).

Biographical information is often not available for the authors of individual written texts. It can, however, easily be obtained (and has been collected) from the informants who contributed to the spoken part of the corpus. If it were more widely available, this background information would help the research community to interpret findings in the right light. One example may serve to illustrate this. A search for university in ICE-CAN reveals possible traces of language contact:

(1) But it's not like that you have a computer desk somewhere on the university where you can all … (ICE-CAN, S1A-063)
(2) when the kids come from India some of them are into college already aren't they or university (ICE-CAN, S1A-057)

The speaker of (1) is a female informant who has French and German as additional languages and spent seven years at university in Switzerland. Her use of both on and the could have been triggered by the German prepositional collocation an der Universität. The use of into with college/university in example (2) might have to be attributed to the speaker's multilingual background. In addition to Canadian English (CanE) and French, he speaks Hindi, Gujarati, Nepali and Kannada, and he spent two years living in India. Language contact in Canada is an obvious factor to take into account. Researchers using ICE-CAN may not be aware, however, that in individual instances contact may go beyond the potential influence of the country's second official language, French.

Another problem for comparability arises from the way text types are interpreted in different cultural environments. Biewer et al. (2010), for example, describe the difficulties of sampling texts for the 'skills and hobbies' and 'technology' sections in the Fijian context. A further example comes from the student essay section of ICE-PHI. Here, the student clearly perceives the task of writing an academic essay differently from what one might expect:

(3) Plato would suggest aristocracy. And Freud would … Ehehehe … As for me … Er … Argh. Maths is so much easier. P.S. I didn't realize how hard it is to write something that has to do with Philosophy until now. Too many thoughts. (ICE-PHI, W1A-001)

An individual text is unlikely to affect the results of a study. A systematic difference in the way text categories are interpreted in a regional variety of English, however, will have an impact, e.g. on variables that are sensitive to 'formality'. I will return to this issue below. Similarly, a closer look at some spontaneous conversations in the ICE corpora may show that informants at times engage in interview-like behaviour. (Note that the contributions of the fieldworker have been marked as extra-corpus material.) Such behaviour casts some doubt on the 'naturalness' of these 'private' conversations:12

(4) <$Z> <X>First of all uhm <,> this isn't the best way to start off but tell me your full name </X>
<$A> Uhm <,> Tang Wai Ping <,> Ade
<$Z> <X>Tang Wai Ping <{> <[> Ade </[> How did you get the name <,> Ade </X>
<$A> <,> <[> Yeah </[> </{>Uhm the name is uhm decided by my grandpa (ICE-HK, S1A-001)

12 Note also that much of the spoken material was collected in university contexts. This has introduced a certain thematic bias into many ICE components. It has led, for instance, to an over-representation of education as a topic in ICE corpora (see Newman and Columbus 2009).

The ICE corpus was conceived before e-mail became one of the most common forms of written long-distance communication. Accordingly, the original corpus design included letters (both social and business) as a text category to be sampled for the written part of the corpus. Nowadays, e-mail and other means of electronic written communication have largely replaced letter writing, especially in the private domain.
It is therefore not surprising that many ICE corpora have (also) sampled e-mails. In doing so, they have gone against the original design, which stipulated that e-mail be included as a separate, additional text type. The components differ considerably in this respect:
- ICE-CAN includes the whole range from hand-written to typed letters, but also e-mails.
- Neither ICE-NZ nor ICE-IRE includes any e-mails.
- Other ICE components sample only e-mails (see e.g. Biewer et al. 2010:11).
As a result, the 'letters' category is likely to show considerable inter-varietal differences, both in the patterns used and in the level of formality.

Formality is another issue that needs to be considered in the interpretation of findings from ICE corpora. It plays a role at several levels.

Firstly, previous research on individual varieties (e.g. Schneider 2005, Hundt 2006) suggests that the formality gap between written and spoken texts might be smaller in ESL corpora than in ENL corpora. Zhiming and Huaqing's (2006) study indicates that even such broad generalisations are problematic. A particular feature may be indicative of regional differences in only one register (e.g. private conversation) rather than across the spoken medium as a whole. Xiao (2009) carried out a multidimensional analysis of five ICE corpora (GB, India, Hong Kong, the Philippines and Singapore). He finds similarities between the ESL varieties with respect to stylistic parameters. He also finds differences among them, however: spoken and written texts, for example, are much closer in ICE-IND than in the other corpora.13

Secondly, investigations into the stylistic homogeneity and heterogeneity of corpora (notably Biber 1988) have shown two things. There may be considerable variation within certain pre-defined text categories. At the same time, texts that are assigned to different categories may be quite similar. Sigley (1997:232) therefore cautions us: "Corpus analysts are recommended to beware of treating the pre-existing text categories as natural groups, and to consider alternative text groupings which may be more relevant for their purpose." Additional studies on stylistic variation across and within ICE components are therefore needed as background information for the interpretation of findings on individual patterns.

13 Other useful studies that address the complex matter of register variation and corpus comparability are Biber (1993), Gries (2006) and Sigley (2012).

Finally, ICE corpora have been used alongside other resources in the description of World Englishes. Especially for the study of lexico-grammatical variation, the ICE corpora are interesting sources for building hypotheses. These hypotheses can then be verified against larger datasets, usually of less stratified material. Examples of such studies include:
- Mukherjee and Hoffmann's (2006) investigation of new ditransitives in Indian English (e.g. to gift somebody something);
- Hoffmann et al.'s (2012) study of light verb constructions in South Asian Englishes, where the use of the indefinite article is variable (e.g. to take (a) look at).

2.2 Some findings and research questions

The potential problems with cross-corpus comparability outlined in section 2.1 do not mean that ICE cannot be used for meaningful comparative research.
On the contrary, ICE components (and parts thereof) have been used to investigate various linguistic features. A recurrent research question in these studies concerns ongoing change, and whether a particular variety is more advanced or more conservative with respect to that change. Two changes illustrate this:
- AmE is leading the change towards a greater use of quasi- or semi-modals like going to and want to. At the same time, it is more advanced in the decline of the core modals (see e.g. Collins and Yao 2012).
- The continued increase of the progressive, on the other hand, is spearheaded by Australian English (AusE) and NZE among the ENL varieties. Some New Englishes show higher, others lower frequencies of progressives (e.g. Hundt 1998, Collins 2009, Hundt and Vogel 2011).14

Other studies focus on the relative closeness or distance between ENL and ESL varieties. They look both at how global features are used in local varieties and at evidence of structural nativization. New Englishes are less advanced in the move away from the core modals, and one of the reasons is that would shows an extended (i.e. nativized) use (see Deuber et al. 2012). In the following example, would has replaced ENL will. The example also illustrates the extended use of the progressive in a context where a variety such as BrE or AmE would use a non-progressive VP:

(5) First, I would be explaining about the gender inequality, which often leads to the high incidence of poverty amongst women, which is what I would be discussing about in the second part of this essay. (ICE-FJ, W1A-016)

Nativization is an important indicator of how far a 'new' English variety has progressed in its development, e.g. along the stages suggested in Schneider's (2007) model of new dialect evolution.

14 This kind of regional variation can be observed over and above variation across different modes (i.e. speech and writing) and registers.

3 CASE STUDY: THE PRESENT PERFECT

The present perfect (PP) is of interest because it is apparently an example of stable regional variation in written BrE and AmE (see e.g. Hundt and Smith 2009). At the same time, perfect constructions serve pragmatic functions in certain text types, as we will see, and show interesting patterns of nativization in New Englishes. With respect to methodological issues, the study will compare results from lexical searches with data derived from syntactically annotated corpora.

For past-time reference, Present-Day English (PDE) has a choice between the PP (I have seen her) and the simple past tense (SP) (I saw her). The standard account of the PP in standard (ENL) varieties of English today is that it refers to past events that have current relevance. Elsness' (1997) long-term, corpus-based study of BrE and AmE shows that the PP increased over time. From the second half of the eighteenth century, however, it started decreasing again, a development led by AmE. In the twentieth century, there is relatively stable variation in the use of the PP, with higher levels found in BrE than in AmE (see e.g. Hundt and Smith 2009).15 Beyond regional differences between the two standard northern-hemisphere varieties, previous studies have found the PP to be particularly frequent in spoken AusE (see Engel and Ritz 2000 or Elsness 2009:98). As far as the functions of the PP are concerned, standard PDE differs from languages such as German or French. In these languages, the perfect has grammaticalized into a form used for reference to events that are clearly in the past.
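The lexical searches mentioned above typically retrieve a present-tense form of HAVE followed, within a short window, by a past participle. The following Python sketch illustrates such a search under two simplifying assumptions:
- it works on plain, untagged text;
- it recognises participles only by the -ed ending or by membership in a small, hypothetical list of irregular forms.

It illustrates the general technique. It is not the retrieval procedure used in this study.

```python
import re

# Present-tense forms of HAVE. The contraction 's is left out on purpose,
# because it is ambiguous between "has" and "is".
HAVE_PRESENT = {"have", "has", "'ve"}

# Hypothetical, deliberately small list of irregular past participles.
IRREGULAR_PARTICIPLES = {"been", "seen", "done", "gone", "had", "made",
                         "taken", "given", "known", "written", "found",
                         "come", "become"}


def tokenize(text):
    """Lower-case the text and split it into word tokens, keeping 've apart."""
    return re.findall(r"'ve|\w+", text.lower())


def is_participle(token):
    """Crude participle test: a regular -ed form or a listed irregular form."""
    return token.endswith("ed") or token in IRREGULAR_PARTICIPLES


def find_present_perfects(text, max_gap=2):
    """Return (auxiliary, participle) pairs in which a participle follows a
    present-tense form of HAVE with at most max_gap intervening tokens."""
    tokens = tokenize(text)
    hits = []
    for i, token in enumerate(tokens):
        if token not in HAVE_PRESENT:
            continue
        # Inspect the next max_gap + 1 tokens and keep the first participle.
        for j in range(i + 1, min(i + 2 + max_gap, len(tokens))):
            if is_participle(tokens[j]):
                hits.append((token, tokens[j]))
                break
    return hits


print(find_present_perfects("I have seen her. She has never really liked it. They had gone."))
# [('have', 'seen'), ('has', 'liked')]
print(find_present_perfects("She has red hair."))
# [('has', 'red')]  (a false positive)
```

The second call shows why purely lexical searches overgenerate: red is retrieved simply because it ends in -ed. Forms such as she's gone are missed, because 's is ambiguous between has and is. Hits from searches of this kind therefore need manual checking. Comparing them with results from syntactically annotated corpora is one way of gauging how much noise they contain.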
However, both historical and regional varieties of English also provide evidence of the occasional narrative use of the PP in clear past-tense contexts (see e.g. Elsness 1997:292 for historical varieties and Hughes et al. ⁴2005:12f. for dialects; this use is also attested in AusE, see Engel and Ritz 2000 and Ritz in press).16 Engel and Ritz (2000) show how the PP has a special pragmatic function in press reportage, where it serves as a framing element at the beginning or end of an article. Elsness (2009) compares data from the Brown family of corpora with ICE data, which also include non-printed material and texts sampled at a slightly later date. Both factors are likely to have contributed to the slightly higher percentages of PPs he finds in the written parts of ICE-AUS and ICE-NZ. In other words, it is important to compare datasets that sample the same kinds of text.

15 Note, however, that Bowie et al. (2013) observe a slight increase in the perfect in their spoken British data.

Recently, a number of studies have made use of the ICE components to investigate variation in the use of the perfect across both ENL and ESL varieties of English. Some studies look at the text frequency per million words of the PP (e.g. Bowie et al. 2013 or Yao and Collins 2013), but they focus on ENL Englishes only. The following survey of previous research focuses on studies that have compared the perfect across different Englishes. These studies model the variation in terms of variable contexts, i.e. contexts in which the PP and the SP alternate. Davydova (2011) uses the spoken components of ICE-India, ICE-East Africa and ICE-Singapore and the London-Lund Corpus (LLC) of spoken BrE to study the use of PP vs. SP in ‘present perfect contexts’, i.e.
only those contexts where the PP could replace the SP (e.g. not narrative contexts or contexts with adverbials of definite past time reference like long ago, yesterday, the other day, in 1900, etc.). She summarizes the procedure for defining these contexts in the following figure:17

Figure 1: Categorisation of present perfect contexts (Davydova, 2011: 124)

16 A different sub-type of the narrative function is the so-called ‘footballer’s perfect’ (Walker 2008).

17 For a more detailed discussion of this concept, see Davydova (2011: 119-131).

Her data reveal that the proportion of PPs is lower in ESL varieties than in spoken BrE:

Table 1. PP vs. SP in present perfect contexts (raw frequencies and percentages, based on Davydova 2011:175, 223, 238, 145; infrequent additional forms not included)

             ICE-IND      ICE-EAf      ICE-SIN      LLC
perfect      715 (53%)    247 (58%)    532 (56%)    1812 (90%)
preterite    471 (35%)    159 (37%)    350 (36%)    197 (10%)

Seoane and Suárez-Gómez (2013) use a similar approach but a slightly different set of ICE corpora (Hong Kong, Singapore, India, Philippines, and GB as a benchmark corpus), as well as a slightly different methodology of data retrieval and definition of the
variable. Like Davydova, they focus on spoken data, but while Davydova used both face-to-face conversations and telephone calls, Seoane and Suárez-Gómez limit their analysis to private conversations. (The rationale for using spontaneous speech in both cases is that this is the least monitored kind of data and that, according to Miller (2004), PP and SP alternate more frequently in this type of language.) Seoane and Suárez-Gómez restricted their analysis to the ten most frequent verbs in the Asian ICE corpora and extracted the data automatically, whereas Davydova read through the corpus files searching for present perfect contexts. Finally, following Huddleston and Pullum (2002: 143), Seoane and Suárez-Gómez defined the perfect semantically as expressing events covering “a time span beginning in the past and extending up to now”.

Table 2. Forms expressing perfect meaning (i.e. experiential, recent past, resultative and persistent situation) in Private Dialogue in Asian varieties of English (based on Seoane and Suárez-Gómez 2013:9; PP and SP only; percentages over all variants)

             ICE-HK        ICE-SIN       ICE-PHI       ICE-IND       ICE-GB
perfect      410 (59.2%)   155 (44.4%)   169 (57.3%)   300 (77.5%)   236 (80.8%)
preterite    204 (29.5%)   174 (49.9%)   121 (41.0%)   70 (18.1%)    48 (16.4%)

This study produces different results from Davydova’s because of its different methodology, resulting in a more marked divide between IndE and SingE.
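Because the percentages in Tables 1 and 2 are calculated over all variants, including infrequent forms that are not shown, one way to put the two studies on a common footing is to recompute the PP share over PP and SP tokens only. The following is an illustrative recalculation from the raw counts in the two tables, not part of either original study; the dictionary and function names are hypothetical.

```python
# Raw counts (perfect, preterite) restated from Tables 1 and 2.
# The published percentages are computed over all variants, including
# infrequent forms not shown; here only PP and SP tokens are counted.
DAVYDOVA_2011 = {
    "ICE-IND": (715, 471),
    "ICE-EAf": (247, 159),
    "ICE-SIN": (532, 350),
    "LLC": (1812, 197),
}
SEOANE_SUAREZ_GOMEZ_2013 = {
    "ICE-HK": (410, 204),
    "ICE-SIN": (155, 174),
    "ICE-PHI": (169, 121),
    "ICE-IND": (300, 70),
    "ICE-GB": (236, 48),
}


def pp_share(perfect: int, preterite: int) -> float:
    """Percentage of PPs among PP + SP tokens only, rounded to one decimal."""
    return round(100 * perfect / (perfect + preterite), 1)


for study, table in [("Davydova 2011", DAVYDOVA_2011),
                     ("Seoane and Suarez-Gomez 2013", SEOANE_SUAREZ_GOMEZ_2013)]:
    print(study)
    for corpus, (pp, sp) in table.items():
        print(f"  {corpus:<8} {pp_share(pp, sp):5.1f}% PP")
```

On this basis, Davydova’s IndE and SingE data both come out at 60.3% PPs, whereas Seoane and Suárez-Gómez’s yield 81.1% and 47.1%, which makes the divergence between the two studies for these varieties easier to see.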
Both studies use a semantic definition of the variable but apply different retrieval strategies. The combination of these differences may well give rise to diverging results even though very similar sets of data were used. The difference for BrE, in addition, is most likely due to the fact that different benchmark corpora were used: LLC was sampled earlier than ICE-GB and contains more formal conversations than those included in ICE-GB. Note, however, that while the comparison of the results for LLC and ICE-GB in Tables 1 and 2 suggests that there has been a decrease in the use of the PP, Bowie et al. (2013:326) report a slight increase in spoken BrE. This apparent contradiction can be explained if we take a closer look at the definition of the variable: the percentages presented in the tables above compare the use of the PP against the use of the SP, whereas Bowie et al. look at the text frequency per million words (pmw) of the PP. Their approach avoids the difficulty of deciding which SP verbs could be replaced by a PP, but it yields the text frequency of only one construction. Variation of PPs may actually extend beyond the SP as a variant, as a recent paper by Pfaff et al. (2013) indicates: they found that the past progressive (e.g. I was just looking at this
picture) is also occasionally used in spontaneous speech to refer to recent events in past contexts. Measuring the PP in terms of text frequency rather than against alternating constructions avoids the question of syntactic equivalence. As a benchmark corpus, ICE-GB is the more suitable choice because the texts it samples, by and large, stem from the same period as those sampled in the other ICE components. Interestingly, with ICE-GB as a yardstick for comparison and a differently defined variable, IndE comes closer to BrE in Seoane and Suárez-Gómez’s investigation than in Davydova’s study, and the emerging picture is one of a gradient rather than an ENL-ESL divide.

Davydova (2011:170, 231, 253) also looked at the PP in past tense contexts; her study shows that this is actually a rare phenomenon in IndE, EAfE and SingE, corroborating Balasubramanian’s (2009:92) earlier finding for IndE: on the basis of ICE-IND and a corpus of contemporary IndE, she found that PPs in clear past tense contexts are infrequent, at 3.4% and a mere 0.9% of all PPs in IndE speech and writing, respectively.

In addition to the standard variants, i.e. the PP with auxiliary have, the ICE components also reveal traces of nativization, for instance the pattern without an auxiliary (5) or with a base form rather than a past participle (6):

(5) That’s why I never looked back and had any regrets for whatever I myself done or decided upon with my eyes open.
(ICE-IND, S1A-038; quoted from Davydova 2011:180)

(6) She has give four exams (ICE-IND, S1A-070, quoted from Seoane and Suárez-Gómez 2013:11)

ICE corpora also yield a minority of instances that combine auxiliary be with a past participle, which could be retentions of the older be-perfect (7); note, however, that some of the attested examples involve transitive rather than the historically attested intransitive verbs, i.e. (8) and (9) are not simply retentions but modern ‘extensions’ of the be-perfect:18

(7) I said to the receptionist <,> here on the desk <,> is he gone in to visit (ICE-IRE, S1A-008)

(8) Look I’m I’m almost finished Sacred Hunger [title of a novel; MH] (ICE-HK, S1A-047, quoted from Seoane and Suárez-Gómez 2013:12)

(9) Okay once the pieces are been cut and washed and dried now we connect them together. (ICE-SIN, S2A-058)

18 Note, however, that IrE also uses the be-perfect with transitive verbs, a feature that McCafferty (forthcoming) attributes to substrate influence from Irish.

Examples (7)-(9) are all from spoken texts, which are overall a more likely context for nativized patterns than edited written language. While both Davydova (2011) and Seoane and Suárez-Gómez (2013) used only spoken data, the case study in the following section focuses on the use of the PP in the news sections of the ICE corpora.

3.1 Data and methodology

The case study aims to broaden the scope of previous research by including varieties of English that have not yet been subjected to comparative research, partly because the respective ICE components have only recently been made available or are still under construction.
The ENL varieties included are BrE, AmE, CanE, NZE and AusE; the ESL varieties selected are Fiji English (FijE), Philippine English (PhilE), Indian English (IndE), Sri Lankan English (SLE) and Ghanaian English (GhE). In addition to providing evidence on the use of the PP in some new ICE corpora, the study aims to illustrate how different approaches to data retrieval may influence the results. The analyses will be limited to the newspaper section of ICE, not only because of the time constraints on a small-scale study and the limited availability of spoken data,19 but also because newspapers are expected to be maximally comparable across different regional varieties of English. Moreover, Miller (2004:230) points out that “[i]n formal written English the Perfect construction is solidly fixed, in frequent use and protected by grammars of standard English and by editorial practice.” This general tendency might no longer apply to newspaper texts, however, as he observes that “… with the intensive use of computers newspapers are no longer edited as rigorously as they once were” (Miller 2004:234). Finally, newspaper articles make it possible to investigate one of the narrative functions of the PP.

19 Only written data are currently available for ICE-US, ICE-FJ, ICE-SL and ICE-GH.

Three different approaches will be used to extract data from the corpora. The PP combines a form of the auxiliary have with a past participle. Because auxiliary have is highly frequent and the corpora lacked grammatical annotation, previous studies tended to rely on a verb-based approach, i.e.
they restricted the analysis to a set of frequently used lexical verbs.20 The ICE corpora are currently being POS-tagged and parsed at the University of Zurich, using the TreeTagger tagset (Schmid 1994) and a probabilistic dependency parser (Pro3Gres) developed by Schneider (2008). Eight of the ten ICE components that form the basis of this study have been syntactically annotated, which allows for automatic retrieval of all PPs and SPs and thereby affords a bird’s eye view of the frequency of the two kinds of verb phrase. In addition, a verb-based approach will be used for an analysis of more strictly variable contexts (i.e. including only SP verbs that can be replaced by a PP), making use of nine high-frequency lexical verbs (come, finish, get, give, go, hear, see, tell, think). A third approach will look into the co-occurrence of the PP with temporal adverbs: those that are typically associated with it (i.e. just, (n)ever, yet) and one that prototypically triggers the SP (yesterday). Finally, a qualitative analysis of the opening sections of articles in ICE-AUS, ICE-NZ, ICE-FJ and ICE-PHI will show whether the perfect is frequently used with the framing function observed by Engel and Ritz (2000).

20 Exceptions are Hundt and Smith (2009), who retrieved PPs from the tagged versions of the Brown-family corpora, and Bowie et al. (2013), who used fuzzy tree fragments (FTFs) to retrieve their data. Davydova (2011) retrieved her data manually.

As far as the definition of the variable is concerned, the focus is on standard variants of both the PP and the SP.
Occasionally, a perfect with auxiliary be rather than have is attested even in edited, printed texts from an ESL context:

(10) The game was long been seen as a hobby … (ICE-GH, W2C-004)

Such non-standard variants are not included in the counts. In interpreting the overall frequency of simple past tense VPs in ESL varieties, one has to be aware that some verbs may lack overt past tense marking. The following example from ICE-FJ is a serendipitous find from a manual post-edit of co-occurrences with the adverb yesterday. Note that zero past tense marking (brave) is used alongside overt past tense marking (stood) in this example:

(11) Hundreds of students of a Suva prominent school brave yesterday's heat and stood in long queues to wait for their turn to pay their fees. (ICE-FJ, W2C-012)

While I did not systematically search for zero past tense marking, the phenomenon seems to be rare in the newspaper data, so such instances are not included in the counts. Similarly, PPs with a base form of the participle are not included in the present case study, partly because PPs were retrieved by searching for the standard past participle and partly because these nativized patterns, again, appeared to be typical of spoken rather than written usage. Narrowing down the envelope of variation in this way should not have excluded a large number of relevant hits: Suárez-Gómez and Seoane (2013:167) found only 1.1% zero-marked SPs and 0.6% PPs with a base form in their written Asian English material.
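Tag-based retrieval of PPs can be illustrated with a deliberately simple pattern match. The sketch below assumes Penn-Treebank-style tags (VBN for the past participle, RB for adverbs) and looks for a present-tense form of have followed, possibly after a few adverbs, by a past participle. It is not the dependency-based (Pro3Gres) retrieval used in this case study, and the function name and input format are hypothetical. It does, however, show why a search for the standard past participle misses base-form variants such as has give in (6), and why have got (to) turns up as a false positive.

```python
from typing import List, Tuple

# Hypothetical input format: (word, tag) pairs with Penn-Treebank-style tags,
# e.g. VBZ/VBP = present-tense verb, RB = adverb, VBN = past participle.
Token = Tuple[str, str]

# Present-tense forms of auxiliary 'have'. Contracted 's is left out on purpose,
# because it is ambiguous with 'is' and the possessive.
HAVE_FORMS = {"have", "has", "'ve"}


def find_present_perfects(tokens: List[Token], max_adverbs: int = 2) -> List[Tuple[int, int]]:
    """Return (auxiliary index, participle index) pairs for have + past participle.

    Up to `max_adverbs` intervening adverbs are allowed ('has just left',
    'have never seen'). Base-form participles ('has give') are tagged VB,
    not VBN, so they are not matched, and 'have got (to)' is a false positive.
    """
    hits = []
    for i, (word, _tag) in enumerate(tokens):
        if word.lower() not in HAVE_FORMS:
            continue
        j = i + 1
        # Skip over a limited number of intervening adverbs.
        while j < len(tokens) and j - i - 1 < max_adverbs and tokens[j][1].startswith("RB"):
            j += 1
        if j < len(tokens) and tokens[j][1] == "VBN":
            hits.append((i, j))
    return hits


sentence = [("She", "PRP"), ("has", "VBZ"), ("just", "RB"), ("left", "VBN"), (".", ".")]
print(find_present_perfects(sentence))  # [(1, 3)]
```

A real retrieval would also need to handle inverted questions (Has he left?) and manual post-editing of false positives; the sketch only shows the basic logic of a form-based search.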
While zero past tense marking might lead to under-reporting of SPs in automatically retrieved data, lack of back-shifting to past perfects in reported speech will lead to over-reporting of SPs:

(12) He said his wife could have been saved if there was someone who knew how to apply CPR. (ICE-FJ, W2C-014)

Lack of back-shifting also occurred with PPs (and not only in the ESL varieties); these instances were not included in the counts because they are not part of the variable context investigated here, i.e. they are not typical PP contexts but variants of the past perfect:

(13) The Burnaby Lawyer noted that Bourassa has come to B.C. before – the most recent visit was in April, 1988. (ICE-CAN, W2C-010)

(14) While acknowledging that the Board had not lived up to its role, he emphasised that since his takeover two years ago, things have turned for the better. (ICE-IND, W2C-019)

Nativized patterns were also excluded from the co-occurrence data with temporal adverbs. In IndE, for instance, yet can be used in the sense of ‘still’, as the following examples show:

(15) The gas emerged from a broken outlet pipe of the tanker and spread in the nearby village while people were yet asleep. (ICE-IND, W2C-012)

(16) Over a month having lapsed since the poll-day violence, Khan has yet not been arrested.
(ICE-IND, W2C-004)

Finally, instances in which the adverb modified not the VP but another temporal adverb were also manually excluded from the concordances:

(17) The government just recently moved to dismantle the allocation of the imports of sugar under the so-called minimum access volume scheme under which a limited group corner the bulk of the importation. (ICE-PHI, W2C-006)

3.2 Findings

The automatically retrieved data will be presented in two different ways. Figure 2a shows the overall frequency of PPs and SPs across the parsed datasets. Even though the results are presented in terms of percentages, it is important to note that, unlike Tables 1 and 2 above, they do not represent variable contexts of use (i.e. only SPs in present perfect contexts) but the proportion of PPs among all PPs and SPs. The results thus simply give an initial indication of the distribution of the two constructions across the different ICE components. The data were not manually post-edited to exclude false positives, e.g. instances of have got (to).

[Figure 2a: bar chart; PP share: ICE-US 11.1%, ICE-CAN 12.2%, ICE-GB 17.1%, ICE-AUS 9.9%, ICE-NZ 11.3%, ICE-IND 14.5%, ICE-PHI 9.3%, ICE-FJ 7.6%]

Figure 2a. Relative frequency of PP and SP in the press sections of ICE corpora (parsed) (ICE-US, N = 1890; ICE-CAN, N = 1928; ICE-GB, N = 1753; ICE-AUS, N = 2248; ICE-NZ, N = 2160; ICE-IND, N = 1584; ICE-PHI, N = 2029; ICE-FJ, N = 2145)

Figure 2a shows that in news reporting, the SP occurs with a much higher text frequency than the PP. Note, however, that the relative frequencies presented in Figure 2a are not directly comparable with the proportions reported in Tables 1 and 2, which only included SPs in present perfect contexts. The bird’s eye view also suggests that the ENL and ESL varieties do not fall into two neat groups. At 9.3%, PhilE yields an even lower proportion of PPs than AmE, its historical parent variety. AusE, NZE and CanE are closer to AmE than to BrE, which has the highest relative frequency of PPs, closely followed by a historically related ESL variety, IndE. The lowest proportion of PPs is found in FijE, an ESL variety that is currently undergoing nativization (see e.g. Zipp, forthcoming).

A somewhat different picture emerges if the frequency of PPs is measured against corpus size rather than as a proportion of PPs vs. SPs (see Figure 2b): on this count, ICE-AUS has slightly more rather than fewer PPs than ICE-US.

[Figure 2b: bar chart of PP frequencies pmw per corpus]

Figure 2b. PPs (frequency pmw) in the press sections of ICE corpora (parsed)21 (ICE-US, N = 210; ICE-CAN, N = 235; ICE-GB, N = 299; ICE-AUS, N = 222; ICE-NZ, N = 245; ICE-IND, N = 230; ICE-PHI, N = 188; ICE-FJ, N = 164)

Text frequency counts of PPs in the newspaper texts of ICE thus do not confirm Elsness’s (2009) finding of a more frequent use of PPs in AusE than in the other ENL varieties.
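The contrast between Figures 2a and 2b can be reproduced from the counts given in the two captions. The sketch below computes both measures, assuming the approximate size of 20,000 words per press section stated in footnote 21, so the pmw values are only rough; the dictionary and function names are illustrative.

```python
# Counts from the captions of Figures 2a and 2b (press sections, parsed data).
PP_TOKENS = {"ICE-US": 210, "ICE-CAN": 235, "ICE-GB": 299, "ICE-AUS": 222,
             "ICE-NZ": 245, "ICE-IND": 230, "ICE-PHI": 188, "ICE-FJ": 164}
PP_AND_SP_TOKENS = {"ICE-US": 1890, "ICE-CAN": 1928, "ICE-GB": 1753, "ICE-AUS": 2248,
                    "ICE-NZ": 2160, "ICE-IND": 1584, "ICE-PHI": 2029, "ICE-FJ": 2145}
# Assumption: every press section has roughly 20,000 words (footnote 21),
# so the pmw figures are only approximate.
SECTION_WORDS = 20_000


def pp_proportion(corpus: str) -> float:
    """PPs as a percentage of all PPs and SPs (the Figure 2a measure)."""
    return round(100 * PP_TOKENS[corpus] / PP_AND_SP_TOKENS[corpus], 1)


def pp_per_million(corpus: str) -> float:
    """PPs normalized to one million words of running text (the Figure 2b measure)."""
    return PP_TOKENS[corpus] * 1_000_000 / SECTION_WORDS


for corpus in PP_TOKENS:
    print(f"{corpus:<8} {pp_proportion(corpus):5.1f}%  {pp_per_million(corpus):8.0f} pmw")
```

The computed proportions match the labels in Figure 2a, and the two measures rank ICE-AUS and ICE-US in opposite order: 9.9% vs. 11.1% as a share of PPs and SPs, but roughly 11,100 vs. 10,500 PPs pmw.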
21 The press sections contain approximately 20,000 words each.

Figure 3a, finally, presents the results from the verb-based approach with a variable context, i.e. only SPs that could be replaced by a PP. The concordances were manually post-edited to exclude instances from reported speech with past-tense reporting verbs, i.e. contexts in which back-shifting rules might apply.

[Figure 3a: bar chart; PP share: ICE-AUS 7.9%, ICE-CAN 13.3%, ICE-GB 14.3%, ICE-NZ 10.2%, ICE-US 11.7%, ICE-FJ 15.9%, ICE-IND 15.1%, ICE-SL 18.8%, ICE-PHI 11.6%, ICE-GH 10%]

Figure 3a. PP vs. SP with selected verbs (ICE-AUS, N = 139; ICE-CAN, N = 98; ICE-GB, N = 112; ICE-NZ, N = 98; ICE-US, N = 120; ICE-FJ, N = 82; ICE-IND, N = 73; ICE-SL, N = 64; ICE-PHI, N = 69; ICE-GH, N = 60)

Even though the variable is defined differently in Figure 3a than in Figures 2a and 2b, and a verb-based approach is used, the proportion of PPs in newspaper texts is still much lower than in the spoken data investigated by Davydova (2011) and Seoane and Suárez-Gómez (2013) (i.e. Tables 1 and 2 above). As in Figure 2b, ICE-CAN and ICE-GB yield similar proportions of PPs, as do ICE-US and ICE-PHI. While the ESL varieties FijE, IndE and SLE show a high proportion of PPs, GhE patterns more closely with PhilE here, albeit at an even lower level of PPs. AusE is now the variety with the lowest use of PPs. Note, however, that Figure 3a includes both active and passive VPs.
Elsness (2009:98) limits his analysis to active, positive, declarative and non-progressive contexts, and Figure 3b shows that voice, for instance, appears to have an effect on the proportion of PPs and SPs, as passives may be avoided in combination with the perfect, particularly in ESL varieties.

[Figure 3b: bar chart; PP share: ICE-AUS 7.9%, ICE-CAN 11.6%, ICE-GB 13%, ICE-NZ 8.8%, ICE-US 11.7%, ICE-FJ 17.7%, ICE-IND 14.7%, ICE-SL 17.5%, ICE-PHI 12.1%, ICE-GH 7.3%]

Figure 3b. PP vs. SP with selected verbs (active only) (ICE-AUS, N = 127; ICE-CAN, N = 86; ICE-GB, N = 100; ICE-NZ, N = 80; ICE-US, N = 120; ICE-FJ, N = 62; ICE-IND, N = 68; ICE-SL, N = 57; ICE-PHI, N = 66; ICE-GH, N = 55)

With active-only VPs, AusE and NZE show more similar usage of PPs, as do AmE and CanE; BrE has the highest proportion of PPs amongst the ENL varieties. With the exception of GhE and American-based PhilE, the ESL varieties show relatively high proportions of PPs, with IndE showing a slightly higher proportion of PPs than BrE, and the somewhat ‘younger’ ESL varieties in Fiji and Sri Lanka yielding the highest relative use of PPs. The results of this case study thus differ from those obtained from spoken data, where ESL varieties yielded overall lower proportions of PPs.

As far as co-occurrence with temporal adverbials is concerned, the newspaper section of ICE is too small to yield conclusive results. However, the figures in Table 3 indicate some interesting trends.

Table 3. Co-occurrence of PP and SP with the adverbials just, (n)ever, yet22

             PP : SP    total
ICE-AUS      5 : 2      7
ICE-CAN      8 : 4      12
ICE-GB       7 : 2      9
ICE-NZ       6 : 2      8
ICE-US       8 : 7      15
ICE-FJ       2 : 2      4
ICE-IND      2 : 1      3
ICE-SL       3 : 0      3
ICE-PHI      1 : 4      5
ICE-GH       0 : 3      3

22 Again, instances in subordinate clauses with a past tense reporting verb are excluded.

As predicted by previous research, SPs are more often found with adverbs expressing current relevance in AmE and in CanE, but are a real minority variant in the other ENL varieties. Usage in PhilE, again, reflects the historical connection with AmE. Additional data would be needed to verify whether GhE might also be influenced by AmE or whether the absence of PPs with adverbs that typically trigger this construction has to be attributed to the overall low frequency of PPs observed in Figure 3b. A qualitative analysis of the data shows that the SP is at times a variant of the PP when it co-occurs with an adverb such as just (18), whereas some instances are variants of the past perfect (i.e. with lack of back-shifting), as in (19):

(18) Laced with the victorious Fiji Barbarians players who just returned from Auckland the side has a full set of arsenal to do the damage in Vanua Levu. (ICE-FJ, W2C-018)

(19) he did not push the players hard in the first run considering the fact that they just came back from the festive break (ICE-FJ, W2C-020:6:65)

A systematic search for co-occurrence of the PP with a clear past-tense adverb (i.e. yesterday) did not yield a single instance in any of the press sections of the ten corpora surveyed for this case study.
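To illustrate why counts as small as those in Table 3 are inconclusive, one could apply a significance test designed for small samples, such as Fisher’s exact test; the chapter itself does not use this test. The minimal sketch below implements the two-sided version with Python’s standard library (`math.comb`) and compares the ICE-US row (8 PP : 7 SP) with the ICE-GB row (7 PP : 2 SP). The function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the observed
    margins that are no more probable than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def weight(x: int) -> int:
        # Number of ways to get x in the top-left cell with fixed margins
        # (proportional to its hypergeometric probability); integers keep
        # the "no more probable" comparison exact.
        return comb(row1, x) * comb(row2, col1 - x)

    observed = weight(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    extreme = sum(w for w in (weight(x) for x in range(lowest, highest + 1)) if w <= observed)
    return extreme / comb(n, col1)


# Table 3, rows ICE-US (8 PP : 7 SP) and ICE-GB (7 PP : 2 SP)
p_value = fisher_exact_two_sided(8, 7, 7, 2)
print(f"ICE-US vs ICE-GB: two-sided p = {p_value:.2f}")
```

The resulting p-value is about 0.39, so under the usual assumptions of the test even the apparently clear contrast between AmE and BrE in Table 3 could easily arise by chance in samples of this size.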
Likewise, a qualitative analysis of the opening paragraphs of articles in ICE-AUS, ICE-NZ, ICE-FJ and ICE-PHI did not provide any evidence of the typical framing function that Engel and Ritz (2000) describe for their spoken radio data, at least not with adverbs that have a clear past time reference.

3.3 Discussion

The different methods used to retrieve PPs from ICE and the different measures employed make the results difficult to compare directly. The data presented above have shown that the way the envelope of variation is defined affects the picture of the relation between ENL and ESL varieties: it makes a difference, for instance, whether the overall text frequency of PPs is compared or whether the variable is defined more narrowly, e.g. as an alternation between PP and SP in perfect contexts. We also saw that the results differ slightly depending on whether passive VPs are included in the counts.

Qualitative analyses of the corpus data show that variable use of PPs and SPs is at times difficult to categorize. In the following example, a PP is used in a clear past time context, but the choice of the PP itself suggests that the past action has current relevance:

(20) The new Prime Minister’s policy declaration at the meeting of Kisan Coordination Committee on December 31 last has given enough indication of coming change. (ICE-IND, W2C-007)

Occasionally, ESL ICE corpora yield interesting examples of temporal expressions that typically co-occur with a PP, a present tense or a future time expression. Instead, what we find in ICE-SL is an SP:

(21) This is the first time Janasansadaya came to Anuradhapura and first a workshop is held for Maha Sangha.
(ICE-SL, W2C-015)

A follow-up search showed that this tense choice is not regularly attested in any of the ICE corpora. Web-based data from earlier research (Hundt 2013:194) show, however, that the present progressive outnumbers the PP with This is the first time in SingE and IndE. This brings us to the question of what should be included in the envelope of variation.

The results presented in the previous section were based on a fairly conservative definition and compared only PPs and SPs. Qualitative analyses of corpus data, however, reveal that the envelope of variation can be broadened to include not only the nativized patterns mentioned in section 3.1 but also the present progressive, as the following examples show:

(22) “Over the past five years, the number of newborns affected is steadily increasing which corresponds with the increase in females detected with the disease over the past five years," she said. (ICE-FJ, W2C-014)

(23) The entire hilly belt of Jammu is experiencing heavy snowfall since early this morning, reports say today, according to PTI -. (ICE-IND, W2C-009)

(24) "Ever since the Sakvithi and the recent Golden Key crisis erupted, we are now experiencing increased number of deposits than before, … (ICE-SL, W2C-001)23

These variants have so far not been discussed in the context of variable use of the PP because they are extensions of the progressive to traditional PP contexts. They are not limited to ESL varieties but are also occasionally attested in ENL contexts (see Hundt and Vogel 2011:159). This extension of the present progressive to PP contexts might be fostered by the somewhat more unobtrusive use of the past progressive in this environment (see Pfaff et al. 2013).
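The observation that including or excluding passive VPs changes the picture can be made concrete by ranking the corpora under both definitions. The snippet below is an illustrative sketch that simply restates the percentages shown in Figures 3a and 3b; the dictionary and function names are hypothetical.

```python
# PP share (%) with the nine selected verbs, restated from Figures 3a and 3b.
ALL_VPS = {"ICE-AUS": 7.9, "ICE-CAN": 13.3, "ICE-GB": 14.3, "ICE-NZ": 10.2,
           "ICE-US": 11.7, "ICE-FJ": 15.9, "ICE-IND": 15.1, "ICE-SL": 18.8,
           "ICE-PHI": 11.6, "ICE-GH": 10.0}
ACTIVE_ONLY = {"ICE-AUS": 7.9, "ICE-CAN": 11.6, "ICE-GB": 13.0, "ICE-NZ": 8.8,
               "ICE-US": 11.7, "ICE-FJ": 17.7, "ICE-IND": 14.7, "ICE-SL": 17.5,
               "ICE-PHI": 12.1, "ICE-GH": 7.3}


def ranking(shares: dict) -> list:
    """Corpus names ordered from highest to lowest PP share."""
    return sorted(shares, key=shares.get, reverse=True)


for label, shares in [("all VPs", ALL_VPS), ("active only", ACTIVE_ONLY)]:
    print(f"{label:<12}", " > ".join(ranking(shares)))
```

With all VPs, ICE-SL ranks highest and ICE-AUS lowest; with active VPs only, ICE-FJ moves to the top and ICE-GH to the bottom, which illustrates how sensitive such comparisons are to the definition of the variable.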
23 The adverb since is also used as a conjunction introducing an adverbial clause of cause/reason; without ever, the subordinate clause would be ambiguous between the two readings, and this, in turn, might have prompted the use of the present progressive in this particular example.

4 CONCLUSION AND OUTLOOK

ICE is an excellent resource for the study of standard(izing) varieties of English around the world. Due to the history of the project, certain limitations apply with respect to the diachronic bias inherent in individual components and across regional varieties. Another limitation concerns the spoken material, which is so far only available in transcription. For some points of fine-grained grammatical analysis (e.g. final consonant cluster reduction in past tense VPs), access to the original sound files would be desirable. In an ideal world, the sound files would be aligned with the transcription, allowing researchers to target the particular grammatical structure retrieved from the corpus (this design feature is currently available only for ICE-NIG, but seems to have been envisaged in the original plan for the ICE corpora, as the sound samples on the project website (http://ice-corpora.net/ice/sounds.htm) indicate). With a few exceptions, documentation of existing ICE components remains poor, even though background information is often important for the interpretation of individual structures. Comparisons between ENL and ESL ICE data need to take the possible localisation of text types into account (different narrative traditions, for instance, may affect the use of tense in fiction texts, see Biewer 2012). van der Auwera et al.
(2012), for instance, assume that ongoing language change tends to be more advanced in spoken than in written texts and therefore take differences between speech and writing as a proxy for ongoing change. The problem with this approach is that it must assume stylistic differences between speech and writing to be the same across different varieties of English, and this is not necessarily the case. More research, ideally in combination Randi! 2/7/13 09:49 Comment: Ok – but the chapter is not about ICE… MH: For the reasons outlined above, I think it makes sense to have the focus on ICE here and refer to previous reviews of Brown-family based research. Am happy to change, however, if you insist. Marianne Hundt. 2015. World Englishes. In D. Biber & R. Reppen, eds. The 34 Cambridge Handbook of English Corpus Linguistics. Cambridge: Cambridge University Press, 381-400. (Final pre-print version) with real-time data or apparent-time data from sociolinguistically balanced samples is needed to verify whether these assumptions are valid. [7047 words originally; now approx. 8000] REFERENCES Aarts, Bas, Close, Joanne, Leech, Geoffrey and Wallis, Sean (eds.). 2013. The verb phrase in English. Investigating recent language change with corpora. Cambridge: Cambridge University Press. Andersen, Gisle and Bech, Kristin. (eds.). 2013. English corpus linguistics: Variation in time, space and genre. Amsterdam and New York: Rodopi. Balasubramanian, Chandrika. 2009. Register variation in Indian English. Amsterdam: Benjamins. Biber, Douglas. 1988. Variation across speech and writing. Cambridge: Cambridge University Press. Biber, Douglas. 1993. Representativeness in corpus design. Literary and Linguistic Computing 8(4): 243-57. Biewer, Carolin, Hundt, Marianne and Zipp, Lena. 2010. How a Fiji corpus? Challenges in the compilation of an ESL ICE component. ICAME Journal 34: 5-23. Biewer, Carolin. 2012. South Pacific Englishes. 
The dynamics of second-language varieties of English in Fiji, Samoa and the Cook Islands. Post-doctoral thesis, University of Zurich. Bowie, Jill, Wallis, Sean and Aarts, Bas. 2013. The perfect in spoken British English. In Aarts et al., eds., 318-352. Collins, Peter. 2009. The progressive in English. In Pam Peters et al. (eds.), 115-123. Collins, Peter and Yao, Xinyue. 2012. Modals and quasi-modals in New Englishes. In Marianne Hundt and Ulrike Gut (eds.), 35-53. Davydova, Julia. 2011. The present perfect in non-native Englishes. A corpus-based study of variation. Berlin: De Gruyter. Deuber, Dagmar, Biewer, Carolin, Hackert, Stephanie and Hilbert, Michaela. 2012. Will and would in selected New Englishes: General and variety-specific tendencies. In Marianne Hundt and Ulrike Gut (eds.), 77-102. Elsness, Johan. 1997. The perfect and the preterite in contemporary and earlier English. Berlin and New York: Mouton de Gruyter. Elsness, Johan. 2009. The perfect and the preterite in Australian and New Zealand English. In Pam Peters et al. (eds.), 89-114. Engel, Dulcie M. and Marie-Eve Ritz. 2000. The use of the present perfect in Australian English. Australian Journal of Linguistics 20(2): 119-140. Fallon, Helen. 2004. Comparing World Englishes: A research guide. World Englishes 23(2): 309–16. Greenbaum, Sidney (ed.). 1996. Comparing English worldwide: The International Corpus of English. Oxford: Clarendon Press. Gries, Stefan. 2006. Exploring variability within and between corpora: some methodological considerations. Corpora 1(2): 109-151. Marianne Hundt. 2015. World Englishes. In D. Biber & R. Reppen, eds. The 35 Cambridge Handbook of English Corpus Linguistics. Cambridge: Cambridge University Press, 381-400. (Final pre-print version) Höhn, Nicole. 2012. “And they were all like ‘What’s going on?”: New quotatives in Jamaican and Irish English. In Marianne Hundt and Ulrike Gut (eds.), 263-289. Hoffmann, Sebastian, Hundt, Marianne and Mukherjee, Joybrato. 2012. 
Indian English – an emerging epicentre? A pilot study on light verbs in web-derived corpora of South Asian Englishes. Anglia 129(3-4): 258-280. Hoffmann, Thomas and Siebers, Lucia (eds.). World Englishes – Problems, properties and prospects. Amsterdam and Philadelphia: Benjamins. Huckvale, Mark and Fang, Alex Changyu. 1996. PROSICE: A spoken English database for prosody research. In Sidney Greenbaum (ed.), 262-279. Hughes, Arthur, Trudgill, Peter and Watt, Dominic. 2005. English accents and dialects. An introduction to social and regional varieties of English in the British Isles. Fourth edition. London: Hodder Arnold. Hundt, Marianne. 1998. New Zealand English grammar. Fact or fiction? Amsterdam: Benjamins. Hundt, Marianne. 2006. “The committee has/have decided ...” On concord patterns with collective nouns in inner and outer circle varieties of English. Journal of English Linguistics 34(3): 206-232. Hundt, Marianne. 2009. Global English – global corpora: Report on a panel discussion at the 28th ICAME conference. In Antoinette Renouf and Andrew Kehoe (eds.), Corpus linguistics: Refinements and reassessments. Amsterdam: Rodopi, 451-462. Hundt, Marianne and Geoffrey Leech. 2012. ‘Small is beautiful’ – On the value of standard reference corpora for observing recent grammatical change. In Terttu Nevalainen and Elizabeth Closs Traugott (eds.), 175–188. Hundt, Marianne. 2013. The diversification of English: old, new and emerging epicentres. In Daniel Schreier and Marianne Hundt (eds.), English as a contact language. Cambridge: Cambridge University Press, 182-203. Hundt, Marianne and Nicholas Smith. 2009. The present perfect in British and American English: has there been any change, recently? ICAME Journal 33: 45-63. Hundt, Marianne and Katrin Vogel. 2011. Overuse of the progressive in ESL and learner Englishes – fact or fiction? In Joybrato Mukherjee and Marianne Hundt (eds.), 145-166. Hundt, Marianne and Gut, Ulrike (eds.). 2012. 
Mapping unity and diversity world-wide: Corpus-based studies of New Englishes. Amsterdam: Benjamins. Kallen, Jeffrey L. and Kirk, John M. 2008. ICE Ireland: A user's guide. Belfast: Trinity College Dublin. McCafferty, Kevin. Forthcoming. ‘[W]ell are you not got over thinking about going to ireland yet’: the be-perfect in eighteenth- and nineteenth-century Irish English. To appear in Marianne Hundt (ed.), Late modern English syntax. Cambridge: Cambridge University Press. Mair, Christian. 2009. Corpus linguistics meets sociolinguistics: Studying educated spoken usage in Jamaica on the basis of the International Corpus of English. In Thomas Hoffmann and Lucia Siebers (eds.), 39-60. Mair, Christian and Winkle, Claudia. 2012. Change from to-infinitive to bare infinitive in specificational cleft sentences: Apparent-time data from World Englishes. In Marianne Hundt and Ulrike Gut (eds.), 243-262. Mesthrie, Rajend and Bhatt, Rakesh M. 2008. World Englishes. The study of new linguistic varieties. Cambridge: Cambridge University Press. Marianne Hundt. 2015. World Englishes. In D. Biber & R. Reppen, eds. The 36 Cambridge Handbook of English Corpus Linguistics. Cambridge: Cambridge University Press, 381-400. (Final pre-print version) Miller, Jim. 2004. Perfect and resultative constructions in spoken and non-spoken English. In Olga Fischer, Muriel Norde and Harry Perridon (eds.), Up and down the cline – the nature of grammaticalization. Amsterdam: Benjamins, 229-246. Mukherjee, Joybrato and Hoffmann, Sebastian. 2006. Describing verb-complementation profiles of New Englishes: A pilot study of Indian English. English World-Wide 27: 147-73. Mukherjee, Joybrato and Hundt, Marianne. (eds.). 2011. Exploring second-language varieties of English and learner Englishes. Bridging a paradigm gap. Amsterdam: Benjamins. Mukherjee, Joybrato and Schilk, Marco. 2012. Exploring variation and change in New Englishes: Looking into the International Corpus of English (ICE) and beyond. 
In Terttu Nevalainen and Elizabeth Closs Traugott (eds.), 189-199. Nelson, Gerald. 1996. The design of the corpus. In Sidney Greenbaum (ed.), 27–35. Nelson, Gerald. 2006. World Englishes and corpora studies. In Braj B. Kachru, Yamuna Kachru and Cecil L. Nelson (eds.), The handbook of World Englishes. Oxford: Blackwell, 733-50. Nelson, Gerald, Sean Wallis and Bas Aarts. 2002. Exploring natural language. Working with the British component of the International Corpus of English. Amsterdam: Benjamins. Nevalainen, Terttu and Traugott, Elizabeth Closs. (eds.). The Oxford handbook of the history of English. Oxford: Oxford University Press. Newman, John and Columbus, Georgie. 2009. Education as an over-represented topic in the ICE corpora? Paper presented at the 15th Conference of the International Association for World Englishes, Cebu City, Philippines, 22 to 24 October 2009. Peters, Pam, Collins, Peter and Smith, Adam. (eds.). 2009. Comparative studies in Australian and New Zealand English. Grammar and beyond. Amsterdam: Benjamins. Pfaff, Meike, Bergs, Alexander and Hoffmann, Thomas. 2013. ‘I was just reading this article’ – on the expression of recentness and the English past progressive. In Bas Aarts et al. (eds.), 217-238. Ritz, Eve-Marie. In press. Relationship between event, reference time and time of utterance and the representation of present perfect sentences in Australian English narratives. Cahiers Chronos. Rosenfelder, Ingrid. 2009. Rhoticity in educated Jamaican English. In Thomas Hoffmann and Lucia Siebers (eds.), 61–82. Schmid, Helmut. 1994. Probabilistic part-of-speech tagging using decision trees. In Proceedings of International Conference on New Methods in Language Processing, Manchester, 44–49. Schneider, Edgar W. 2005. The subjunctive in Philippine English. In Danilo T. Dayag and J. Stephen Quakenbusch (eds.), Linguistics and language education in the Philippines and beyond. A Festschrift in Honor of Ma. Lourdes S. Bautista. 
Manila: Linguistic Society of the Philippines, 27-40. Schneider, Edgar W. 2007. Postcolonial English. Cambridge: Cambridge University Press. Schneider, Gerold. 2008. Hybrid long-distance functional dependency parsing. Ph.D. dissertation, University of Zurich. Sedlatschek, Andreas. 2009. Contemporary Indian English. Variation and change. Amsterdam: Benjamins. Marianne Hundt. 2015. World Englishes. In D. Biber & R. Reppen, eds. The 37 Cambridge Handbook of English Corpus Linguistics. Cambridge: Cambridge University Press, 381-400. (Final pre-print version) Seoane, Elena and Suárez-Gómez, Cristina. 2013. The expression of the perfect in East and South-East Asian Englishes. English World-Wide 34(1): 1-25. Sigley, Robert J. 1997. Text categories and where you can stick them: A crude formality index. International Journal of Corpus Linguistics 2(2): 199-237. Sigley, Robert. 2012. Assessing corpus comparability using a formality index: The case of the Brown and LOB clones. In Shunji Yamazaki and Robert Sigley (eds.), Approaching language variation through corpora. A Festschrift in honour of Toshio Saito. Bern: Peter Lang, 65-114. Skandera, Paul. 2003. Drawing a map of Africa: Idiom in Kenyan English. Tübingen: Narr. Suárez-Gómez, Cristina and Elena Seoane. 2013. “They have published a new cultural policy that just come out”: Competing forms in spoken and written New Englishes. In Gisle Andersen and Kristin Bech (eds.), 163-182. Szmrecsanyi, Benedikt and Bernd Kortmann. 2011. Typological profiling. Learner Englishes versus indigenized L2 varieties of English. In Joybrato Mukherjee and Marianne Hundt (eds.), 167-187. van der Auwera, Johan, Noël, Dirk and De Wit, Astrid. 2012. The diverging need(to)’s of Asian Englishes. In Marianne Hundt and Ulrike Gut (eds.), 55-75. van Rooy, Bertus, Terblanche, Lize, Haase, Christoph and Schmied, Joseph. 2010. Register differentiation in East African English. A multidimensional study. English World-Wide 31(3): 311-49. Vine, Bernadette. 
1999. Guide to The New Zealand Component of the International Corpus of English. Wellington: School of Linguistics and Applied Language Studies, Victoria University of Wellington. Walker, Jim. 2008. The footballer’s perfect – Are footballers leading the way? In Eva Lavric, Gerhard Pisek, Andrew Skinner and Wolfgang Stadler (eds.), The linguistics of football. Tübingen: Gunter Narr, 295-303. Wunder, Eva-Maria, Voormann, Holger and Gut, Ulrike. 2010. The ICE Nigeria corpus project: Creating an open, rich and accurate corpus. ICAME Journal 34: 78-88. Xiao, Richard. 2009. Multidimensional analysis and the study of world Englishes. World Englishes 28(4): 421-450. Yao, Xinyue and Collins, Peter. 2013. Functional variation in the English present perfect: a cross-varietal study. In Gisle Andersen and Kristin Bech (eds.), 91-111. Zhiming, Bao and Huaqing, Hong. 2006. Diglossia and register variation in Singapore English. World Englishes 25(1): 105-114. Zipp, Lena. Forthcoming. Educated Fiji English. Lexico-grammar and variety status. Amsterdam: Benjamins.