My main research interests are in corpus linguistics and contextual approaches to meaning. In my current work I am particularly interested in the development of tools to support innovative research questions in the Digital Humanities and new developments in Big Data.
In the field of corpus linguistics I have published on the relationship between lexis and grammar and the way in which phraseology contributes to cohesion in texts. A major focus of my research is corpus stylistics, an area of research that employs corpus linguistic methods for the study of literary texts. I have worked extensively on Dickens's fiction and proposed a lexical approach to body language presentation in fiction.
In this paper, we aim to situate corpus linguistic approaches to literary texts within the wider context of digital humanities. With an exploratory case study of gendered body language in children’s literature, we illustrate the relationship between quantitative and qualitative analysis. The case study is focused on female body language descriptions and how the presentation of body language has changed over time. We work with two corpora of children’s literature: 19th century and contemporary fiction. Our analysis confirms the substantial imbalance in the representation of female and male characters that has been identified by earlier studies and also shows a more nuanced picture of emerging subtle changes.
In this paper our focus is on analyzing register variation within fiction, rather than between fiction and other registers. By working with subcorpora that separate text within and outside of quotation marks, we approximate fictional speech and narration. This enables us to identify and compare linguistic features with regard to different situational contexts in the fictional world. We focus in particular on the novels of Charles Dickens and a reference corpus of other 19th-century fiction. Our main method for the register analysis is Multi-dimensional Analysis (MDA), for which we draw on four dimensions in total from two previous MDAs. The linguistic distinctions we identify highlight similarities between fictional speech and involved registers such as face-to-face communication, and between narration and more informational and narrative prose. In addition to the detailed information on register features that characterize speech and narration, the paper raises more general questions about the ability of register studies to deal with situational contexts within fiction.
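The idea of separating text within and outside quotation marks can be illustrated with a minimal Python sketch. It is not the procedure used in the paper or in CLiC. It assumes that direct speech is enclosed in paired double quotation marks (straight or curly), and the function name `split_quotes` is hypothetical. Texts that mark speech with single quotation marks would need more careful handling, because apostrophes use the same character.

```python
import re

# Assumption: speech is marked with paired double quotes, straight or curly.
QUOTE_RE = re.compile(r'"([^"]*)"|“([^”]*)”')


def split_quotes(text):
    """Return (quotes, non_quotes): stretches inside and outside quotation marks."""
    quotes = []
    outside = []
    pos = 0
    for match in QUOTE_RE.finditer(text):
        # Everything between the previous quote and this one is non-quote text.
        outside.append(text[pos:match.start()])
        inner = match.group(1) if match.group(1) is not None else match.group(2)
        quotes.append(inner)
        pos = match.end()
    outside.append(text[pos:])
    # Drop empty stretches and trim surrounding whitespace.
    non_quotes = [chunk.strip() for chunk in outside if chunk.strip()]
    return quotes, non_quotes


if __name__ == "__main__":
    sample = '"Please, sir," replied Oliver, "I want some more."'
    speech, narration = split_quotes(sample)
    print("quotes:    ", speech)
    print("non-quotes:", narration)
```

The two lists returned can then be treated as small "speech" and "narration" subcorpora and compared feature by feature.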
We propose a lexico-grammatical approach to speech in fiction based on the centrality of 'fictional speech-bundles' as the key element of fictional talk. To identify fictional speech-bundles, we use three corpora of 19th-century fiction that are available through the corpus stylistic web application CLiC (Corpus Linguistics in Context). We focus on the 'quotes' subsets of the corpora, i.e. text within quotation marks, which is mostly equivalent to direct speech. These quotes subsets are compared across the fiction corpora and with the spoken component of the British National Corpus 1994. The comparisons illustrate how fictional speech-bundles can be described on a continuum from lexical bundles in real spoken language to repeated sequences of words that are specific to individual fictional characters. Typical functions of fictional speech-bundles are the description of interactions and interpersonal relationships of fictional characters. While our approach crucially depends on an innovative corpus linguistic methodology, it also draws on theoretical insights into spoken grammar and characterisation in fiction in order to question traditional notions of realism and authenticity in fictional speech.
This chapter looks at the corpus tool CLiC, a web application specifically designed for the study of literary texts. It allows students to run concordances or generate keywords, for instance. It gives students the opportunity to work with a corpus of Dickens novels, but also with other 19th-century authors. Unlike more general corpus tools, CLiC enables searches that help to address research questions particular to literary texts. We investigate what kind of corpus exercises can be designed to help students understand the variety of opportunities that corpus approaches to literary texts offer. We deal with issues of frequency, but also with links between concepts in literary linguistics and corpus linguistics, specifically characterization and mind-modelling. We focus on examples from Charles Dickens's Oliver Twist for an illustrative case study.
The Corpus Linguistics Discourse. In honour of Wolfgang Teubert, 2018
In this chapter, we propose a novel theoretical framework for the literary translation of fictional characters. This framework develops the cognitive corpus linguistic notion of mind-modelling to account for process-, product- and function-oriented aspects of literary translation. We use the examples of Alice and the Queen from Alice’s Adventures in Wonderland to compare character cues across the English original and a Czech translation. The character cues we focus on are reporting verbs. Reporting verbs, as part of the presentation of fictional speech, form a central component of narrative fiction and so provide an ideal evidential basis for our theoretical framework. The translation shifts we found through our comparison of source and target text specifically include gendered uses of reporting verbs. By approaching the target text as both a translation and a reading of the text in its own right, we are able to view translation shifts as a reflection of shifts in the mind-modelling of fictional characters.
This paper introduces the web application CLiC, which we developed as part of a research project bringing together insights from both cognitive poetics and corpus stylistics, with Dickens's novels as a case study. CLiC supports the analysis of discourse in narrative fiction with search options that make it possible to focus on stretches of text within and outside quotation marks. We argue that such search options open up novel ways of using concordances to link lexico-grammatical and textual patterns. We focus specifically on patterns for the creation of fictional characters. From a technical point of view, we explain the XML annotation that CLiC works with. Our discussion of textual examples focusses on phrases in fictional speech that illustrate significant differences between text within and outside quotation marks. In terms of theory, we argue that CLiC supports the identification of textual patterns that can provide insights into fictional minds and contribute to the exploration of readerly effects within the wider framework of mind-modelling.
Corpus linguistic techniques and related mathematical analyses have rarely, if ever, been applied to qualitative data collected from the veterinary field. The aim of this study was to explore the use of a combination of corpus linguistic analyses and mathematical methods to investigate a free-text questionnaire dataset collected from 3796 UK veterinarians on evidence-based veterinary medicine, specifically, attitudes towards practice-based research (PBR) and improving the veterinary knowledge base. The corpus methods of key word, concordance and collocate analyses were used to identify patterns of meanings within the free text responses. Key words were determined by comparing the questionnaire data with a wordlist from the British National Corpus (representing general English text) using cross-tabs and log-likelihood comparisons to identify words that occur significantly more frequently in the questionnaire data. Concordance and collocation analyses were used to account for the contextual patterns in which such key words occurred, involving qualitative analysis and Mutual Information Analysis (MI3). Additionally, a mathematical topic modelling approach was used as a comparative analysis; words within the free text responses were grouped into topics based on their weight or importance within each response to find starting points for analysis of textual patterns. Results generated from using both qualitative and quantitative techniques identified that the perceived advantages of taking part in PBR centred on the themes of improving knowledge of both individuals and of the veterinary profession as a whole (illustrated by patterns around the words learning, improving, contributing). Time constraints (lack of time, time issues, time commitments) were the main concern of respondents in relation to taking part in PBR.
Opinions of what vets could do to improve the veterinary knowledge base focussed on the collecting and sharing of information (record, report), particularly recording and discussing clinical cases (interesting cases), and undertaking relevant continuing professional development activities. The approach employed here demonstrated how corpus linguistics and mathematical methods can help to both identify and contextualise relevant linguistic patterns in the questionnaire responses. The results of the study inform those seeking to coordinate PBR initiatives about the motivators of veterinarians to participate in such initiatives and what concerns need to be addressed. The approach used in this study demonstrates a novel way of analysing textual data in veterinary research.
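The log-likelihood comparison behind a key word analysis of this kind can be sketched as follows. This is a minimal illustration of the widely used two-corpus log-likelihood statistic, not the study's actual procedure. The function names, the default threshold of 3.84 (the chi-square critical value for p < 0.05 with one degree of freedom) and the toy counts are all assumptions made for the example.

```python
import math
from collections import Counter


def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Two-corpus log-likelihood for one word (observed vs. expected counts)."""
    combined = freq_study + freq_ref
    total = size_study + size_ref
    expected_study = size_study * combined / total
    expected_ref = size_ref * combined / total
    score = 0.0
    # A zero observed count contributes nothing to the sum.
    if freq_study > 0:
        score += freq_study * math.log(freq_study / expected_study)
    if freq_ref > 0:
        score += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * score


def keywords(study, reference, threshold=3.84):
    """Words relatively more frequent in `study`, ranked by log-likelihood."""
    size_study = sum(study.values())
    size_ref = sum(reference.values())
    results = []
    for word, freq in study.items():
        ref_freq = reference.get(word, 0)
        # Skip words that are not overused relative to the reference corpus.
        if freq / size_study <= ref_freq / size_ref:
            continue
        score = log_likelihood(freq, size_study, ref_freq, size_ref)
        if score >= threshold:
            results.append((word, score))
    return sorted(results, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy counts, not data from the study.
    study = Counter({"time": 50, "the": 100, "cases": 30})
    reference = Counter({"time": 10, "the": 100, "cases": 1, "other": 69})
    for word, score in keywords(study, reference):
        print(f"{word}\t{score:.2f}")
```

Words that pass the threshold are only starting points: as the abstract describes, concordance and collocation analysis are still needed to interpret the contexts in which they occur.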
In this paper, we explore the potential of a corpus approach to study translated cohesion. We use key words as starting points for identifying cohesive networks in Lovecraft's At the Mountains of Madness and discuss how these networks contribute to the construction of literary meanings in the text. We focus on the role of repetition as a key element in establishing cohesive networks between lexical items. We specifically discuss the implications of our method for the analysis of cohesion in translated texts. A comparison of Lovecraft's original novel and a translation into Italian provides us with a nuanced understanding of the complex nature of cohesive networks. Finally, we discuss the broader issue of applying models and methods from corpus linguistics to corpus stylistic analysis.
Phraséologie et stylistique de la langue littéraire / Phraseology and Stylistics of Literary Language. Approches interdisciplinaires / Interdisciplinary Approaches, 2020
The description of body language is an important authorial technique of characterisation. In this chapter, we offer a corpus linguistic approach to the study of body language that enables us to combine detailed qualitative analysis with the observation of more general textual patterns. We take the example of the body part noun eyes to identify patterns of non-verbal communication. Our approach centres on the comparison of collocation across fictional speech, narration and suspensions, as a means to identify local textual functions of eye language. The general principles we demonstrate are applicable beyond the example of eyes to the study of body part nouns more generally. The analysis employs the CorporaCoCo R package and the web application CLiC, and it also makes use of semantic annotation with the USAS tagger.
Rethinking Language, Text and Context. Interdisciplinary Research in Stylistics in Honour of Michael Toolan, 2018
This chapter situates corpus stylistics within wider trends in the digital humanities and emphasises the need for developing tools and visualisation methods tailored to the analysis of literary texts. Using the CLiC web app, the chapter shows how standard corpus linguistic methods can be further developed to better address research questions in literary stylistics. The analysis presents an innovative comparative approach to the identification of speech clusters in an individual fictional text, Dickens’s Great Expectations, as compared to larger corpora containing all of Dickens’s novels and authentic spoken language, respectively. This comparative perspective not only emphasises differences between fictional speech and narration, but also considers overlapping patterns. The chapter links the notions of deviation and norms that are drawn on in literary stylistics to corpus linguistic comparisons of different corpora, with particular emphasis on the fuzzy nature of linguistic categories.