  • I began my academic career in 1989 at the University of Würzburg, Germany, where I studied for an MA, with a one-seme...
  • Geoffrey Leech, Gerry Knowles
In corpus pragmatics, most of the research into speech acts still tends to be limited to working with the original, highly abstract, speech-act taxonomies devised by ordinary language philosophers like Austin and Searle. The aim of this article is to illustrate how the use of such restricted taxonomies may lead to oversimplified or potentially misleading impressions regarding the communicative functions expressed in spoken interaction, and to demonstrate how a more elaborate taxonomy, the DART taxonomy (Weisser, 2018), may help us gain better insights into the pragmatic strategies that occur in dialogues. To this end, I will draw on a small sample of dialogues, both from a task-oriented domain and unconstrained interaction, and contrast selected speech-act categorisations on the basis of Searle’s and the DART taxonomy, demonstrating the advantages that arise from using a more fine-grained taxonomy to describe complex verbal exchanges.
Based on data extracted from the Chinese component of the LINDSEI corpus and its native speaker counterpart LOCNEC, this paper examines the similarities and differences between Chinese English learners and English native speakers in the use of recycling and replacement, two very common forms of ‘self-repair’. The data were analysed with a focus on two aspects: the syntactic class of words both learners and native speakers tend to initiate recycling and replacement in, and which types of syntactic/lexical elements are most frequently repeated. The results of the study indicate that Chinese English learners employ more recycling and replacement than native speakers. The most striking finding is that Chinese English learners utilise more verbs to initiate recycling and as replaced items than native speakers, which has significant implications for both vocabulary and grammar teaching. Another important finding is that both Chinese English learners and native speakers use more word-level recycling than group-level recycling, which partly contradicts the findings of earlier studies. The findings are discussed with reference to morpho-syntactic patterns of English, combined with theories of attention and automaticity in L2.
The main purposes of this chapter are to present a survey of current and developing work in the areas of research and development with respect to integrated spoken and written language resources, and to provide preliminary guidelines for the representation or annotation of dialogue in resources for language engineering (see also Gibbon et al. 1997, pp. 146–172).
This chapter attempts to demonstrate how using semi-automated corpus-annotation techniques could “objectify” the evaluation of agent and caller speech in customer contact call centres. This is achieved, in part, by profiling particular speakers or speaker groups through an analysis and comparison of the speech acts and other linguistic features used by Filipino agents and callers from American and British language backgrounds. Results suggest that pragmatics-related features of call centre discourse may be indicative of a speaker’s or speaker group’s performance and language variety. The discussion here demonstrates that it is in fact possible to profile individual speakers or groups in specific ways, as well as to judge their efficiency as communicators, at least to some extent. At the same time, it is evident that the behaviour of the different agents and callers potentially points towards certain preferences in the two varieties of English investigated in this study.
Recent years, as well as ICAME conferences, have seen a renewed interest in the compilation of next-generation or new ICE sub-corpora (cf. Nelson 2017), possibly also including web-based genres or data from Outer Circle varieties (Edwards 2017). As corpus compilation via the web has today become a much more convenient method than the traditional sampling employed in creating the original ICE corpora, it makes sense to try and compile as much as possible of the materials for new or updated written ICE materials from online sources. Hundt et al. refer to this strategy as using the “‘Web for corpus building’” (2007: 2), and, despite certain issues described there (ibid. 3) and in other publications (e.g. Schäfer & Bildhauer 2013 or Gatto 2014), this method represents a useful way of generating high-quality corpus data via visual inspection. And even though such an approach may stand in contrast with increasingly popular options for retrieving such data fully or semi-automatically using only seed terms, it is more appropriate for creating smaller-scale corpora constructed according to the principles and categories employed in the compilation of the ICE corpora. ICEweb 2 is a tool that makes it easy for the user to create new sub-corpora based on these criteria by automatically creating the relevant data structures (which can also easily be extended to new genres), and providing assistance in constructing and running queries through a number of different search engines to create lists of suitable web page addresses for the user to inspect. Any potential bias introduced by using a single search engine only can be avoided by compiling lists produced by these different engines.
Pages identified in this way can then be downloaded fully automatically, storing the original URL and other meta information, and cleaned up inside the tool prior to converting them to plain text and/or a dedicated form of XML that allows for later pragmatic annotation, similar to the one suggested in Weisser (2017) for the spoken components of ICE. In addition, ICEweb 2 also contains facilities for PoS tagging, concordancing, and n-gram analysis, including adjustable frequency norming, turning it into an all-round tool for working with new ICE data.
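The idea of n-gram analysis with adjustable frequency norming mentioned above can be illustrated with a minimal Python sketch. This is not ICEweb 2's own code: the function names `ngrams` and `normed_frequencies`, the `per` parameter, and the choice to norm by the total token count are all assumptions made purely for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams in a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def normed_frequencies(tokens, n=2, per=1000):
    """Map each n-gram to its frequency per `per` tokens.

    Assumption: norming divides the raw count by the number of tokens
    in the text and multiplies by the norming base `per`.
    """
    if not tokens:
        return {}
    counts = Counter(ngrams(tokens, n))
    return {gram: count / len(tokens) * per for gram, count in counts.items()}


if __name__ == "__main__":
    sample = "the cat sat on the mat the cat".split()
    for gram, freq in sorted(normed_frequencies(sample).items()):
        print(" ".join(gram), freq)
```

Changing `per` (for example to 10,000 or 1,000,000) adjusts the norming base, which is what makes counts from corpora of different sizes comparable.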
This Habilitationsschrift (professorial thesis) details an earlier version of my current approach to the pragmatic annotation and analysis of dialogues. For a more advanced and updated version of this methodology, see my book 'How to Do Corpus Pragmatics on Pragmatically Annotated Data: Speech Acts and Beyond', published with John Benjamins in 2018.
The Dialogue Annotation and Research Tool (DART) is a tool designed to facilitate large-scale corpus-based research into pragmatics-related aspects, including syntax, pragmatics (speech-acts), semantico-pragmatics (Searle’s ‘IFIDS’), as well as other interaction-relevant features. This chapter aims at providing a detailed description and discussion of the DART annotation scheme, which has already been successfully applied to a number of corpora from various domains, and at providing examples of its application and appliability to various types of spoken data, illustrating e.g. the ability to create speaker profiles that allow the researcher to investigate features of (in)directness & politeness, initiative, etc. (cf. Weisser 2016d). In doing so, I shall draw on materials from task-oriented corpora (SPAADIA; SRI Amex), unconstrained dialogue (Switchboard), and ‘English as a lingua franca (ELF)’ data from ICE-HK, at the same time illustrating the advantages of the scheme over existing schemes.
Pragmatics, as a linguistic discipline, broadly deals with the analysis and recognition of meaning in texts. Keywords: corpus linguistics; discourse analysis; interaction; pragmatics
Since the inception of the ICE project in 1990, ICE corpora have been used extensively in the investigation and comparison of varieties of English on different linguistic levels. These levels, however, have so far primarily been restricted to lexis and lexico-grammar, while relatively little has to date been achieved in the investigation of pragmatic strategies used by the speakers in these corpora. One of the main reasons for this shortcoming is a lack of suitable annotation that would make such a detailed pragmatic comparison possible. This paper will propose a suitable model and format for converting and enriching the ICE corpora with different levels of pragmatics-relevant information, and will discuss the issues involved in this endeavour. Finally, to illustrate the feasibility of this aim, the paper will also include a small case study carried out on a number of files, pointing out how the resulting annotations could later be exploited in pragmatics research.
Corpus-based research into pragmatics is suffering from a distinct lack of suitably annotated corpora. This dilemma has so far generally forced researchers in corpus-based pragmatics to focus on well-known fixed expressions (e.g. discourse markers, politeness formulae, etc.) in their research, rather than being able to investigate interaction on the level of speech acts and other pragmatics-relevant features on a larger scale. This article describes a research environment that aims at remedying this problem (currently for English only) by making large-scale annotation of, and research into, speech acts and other linguistic levels possible in an efficient manner, at the same time discussing the difficulties and complexities inherent in such an endeavour. It then goes on to illustrate, in the form of a brief case study, the efficiency of the approach and how the resulting annotations represent an improvement over existing models. The latter includes an illustrative discussion of the p...
Most standard grammars concentrate on describing grammatically well-formed units, such as “normal” declaratives, interrogatives, etc., and only handle non-grammatical units as irregular and somehow deviant from the norm. However, in spoken language, we often encounter especially smaller or fragmentary, non-clause-like textual units that do not easily fit traditional descriptions. The aim of this article is to provide an overview of these non-grammatical units, to describe what their functions are, as well as to explain why they form such a necessary part of spoken interaction.
... Martin Weisser, 2009 Edinburgh University Press Ltd 22 George Square, Edinburgh ... from an array 53 Table 5.1 Essential file handling modes 57 Table 5.2 Basic file tests 65 Table 6.1 Regex match modifiers 77 Table 6.2 Extended regex extensions 78 Table 7.1 SAMPA to IPA ...
... siderable emphasis on investigating features of intonation and cohesion because I be-... selves. 1.4.2. The Foreign Learner and the Issue of National Identity. ... own cultural identity and status. While this may well be the case for some learners ...
This article reports on a pilot project which aims at creating a speech-act annotated training corpus for service dialogue systems. In order to achieve this aim, an annotation tool, which allows us to automate large parts of the annotation, is being developed. This tool converts text-based transcriptions into XML and applies different levels of markup to each dialogue, so that there remains as little post-editing to be done as possible. The project also aims at developing a relatively generic mark-up scheme that may be applied to different domains without needing a large degree of adaptation. This article describes aspects of the grammar ‘controlling/governing’ the tool and how this grammar ‘interacts’ with the general strategies employed in the annotation.
This article describes a new tool designed to facilitate the collection of web-based data for new or existing ICE corpora.
This paper illustrates how the Dialogue Annotation and Research Tool (DART; Version) can be used to annotate and investigate speech acts in large corpora, using a taxonomy of 160+ speech-act categories.
Part-of-Speech (PoS) tagging is still one of the most common and basic operations carried out in order to enrich corpora, but especially smaller projects often need to rely on freeware taggers that are sub-optimal for detailed linguistic research. This paper introduces a tool, the Tagging Optimiser, which enhances the output of such taggers to make the resulting tags more accurate and readable, and to enhance the tagset to render it more useful for finer-grained linguistic analyses.
This paper discusses a methodology and scheme for updating the spoken components of the International Corpus of English (ICE) by changing the existing format to XML and adding pragmatics-relevant information on multiple levels.
Presentation about my Simple Corpus Tool at CoLTA 2015, 16th December 2015
This presentation, given at CL 2015 in Lancaster, introduces the initial stages of the development of the Text Annotation and Research Tool (TART).
Paper presented at Asialex 2015
This chapter, from Aijmer & Rühlemann (2014), Corpus Pragmatics: a Handbook (CUP), presents an overview and some guidelines for annotating speech acts, and compares & evaluates different annotation systems.
For more info on the Handbook, see http://www.cambridge.org/gb/academic/subjects/languages-linguistics/semantics-and-pragmatics/corpus-pragmatics-handbook?format=HB.
Workshop slides from the HAAL 2014 Conference
Invited talk describing the advantages and features of version 3 of the Simple Corpus Tool
This presentation introduces a novel way of combining pragmatics-relevant and error (feature) annotation in order to enhance learner profiling.
Paper presented at CL2017, introducing the new version of DART (ver. 2.0), including a feature comparison with ver. 1 and illustrations of potential usage.
This is the Manual for the Simple Corpus Tool Version 3.0, a new version written in Python & PyQt.
Manual to accompany version 3.0 of the Dialogue Annotation and Research Tool (DART). This new version now recognises 162 speech acts and contains numerous enhancements to the different annotation & analysis modules, including a new pattern counting facility.
The latest version of the DART speech-act taxonomy, which now contains 162 speech-act labels and associated explanations.
Keynote given at the PakTESOL South Punjab Chapter, International Conference on Contemporary Trends in Linguistics, Literature and ELT on 14th December 2022