Refereed journal articles
Internet Histories: Digital Technology, Culture and Society, 2022
Netizens, Michael and Ronda Hauben's foundational treatise on Usenet and the Internet, was first published in print 25 years ago. In this piece, we trace the history and impact of the book and of Usenet itself, contextualising them within the contemporary and modern-day scholarship on virtual communities, online culture, and Internet history. We discuss the Net as a tool of empowerment, and touch on the social, technical, and economic issues related to the maintenance of shared network infrastructures and to the preservation and commodification of Usenet archives. Our interview with Ronda Hauben offers a retrospective look at the development of online communities, their impact, and how they are studied. She recounts her own introduction to the online world, as well as the impetus and writing process for Netizens. She presents Michael Hauben's conception of “netizens” as contributory citizens of the Net (rather than mere users of it) and the “electronic commons” they built up, and argues that this collaborative and collectivist model has been overwhelmed and endangered by the privatisation and commercialisation of the Internet and its communities.
Notes and Queries, Sep 1, 2020
In 1979 and 1980, Word Ways: The Journal of Recreational Linguistics printed a series of articles on the early history, religious symbolism, and cultural significance of the rotas square, an ancient Latin-language palindromic word square. The articles were attributed to Dmitri A. Borgmann, the noted American writer on wordplay and former editor of Word Ways. While they attracted little attention at the time, some 35 years after their publication (and 29 years after Borgmann's death), questions began to be raised about their authorship. There is much internal and external evidence that, taken together, compellingly supports the notion that Borgmann did not write the articles himself. This paper surveys this evidence and solicits help in identifying the articles' original source.
Journal of Open Source Software, 2020
In computer science, a preprocessor (or macro processor) is a tool that programmatically alters its input, typically on the basis of inline annotations, to produce data that serves as input for another program. Preprocessors are used in software development and document processing workflows to translate or extend programming or markup languages, as well as for conditional or pattern-based generation of source code and text. Early preprocessors were relatively simple string replacement tools that were tied to specific programming languages and application domains, and while they have since given rise to more powerful, general-purpose tools, these often require the user to learn and use complex macro languages with their own syntactic conventions. In this paper, we present GPP, an extensible, general-purpose preprocessor whose principal advantage is that its syntax and behaviour can be customized to suit any given preprocessing task. This makes GPP of particular benefit to research applications, where it can be easily adapted for use with novel markup, programming, and control languages.
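The core macro-expansion idea can be illustrated with a toy sketch (the @key@ annotation syntax, macro names, and values here are invented for illustration; GPP's actual macro language is far richer and user-customisable):

```python
import re

# Toy macro table (names and values are invented, not GPP's).
MACROS = {"NAME": "GPP", "VERSION": "2.x"}

def preprocess(text, macros):
    """Replace each @key@ annotation with its expansion; leave
    unknown annotations untouched for the next tool in the pipeline."""
    return re.sub(r"@(\w+)@", lambda m: macros.get(m.group(1), m.group(0)), text)

print(preprocess("This document was built with @NAME@ @VERSION@.", MACROS))
# -> This document was built with GPP 2.x.
```

A real preprocessor would additionally support conditionals, macro definitions in the input itself, and user-configurable delimiter syntax, which is the customisability the abstract highlights.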
Procesamiento del Lenguaje Natural, 2020
Most humour processing systems to date make at best discrete, coarse-grained distinctions between the comical and the conventional, yet such notions are better conceptualized as a broad spectrum. In this paper, we present a probabilistic approach, a variant of Gaussian process preference learning (GPPL), that learns to rank and rate the humorousness of short texts by exploiting human preference judgments and automatically sourced linguistic annotations. We apply our system, which is similar to one that had previously shown good performance on English-language one-liners annotated with pairwise humorousness annotations, to the Spanish-language data set of the HAHA@IberLEF2019 evaluation campaign. We report system performance for the campaign's two subtasks, humour detection and funniness score prediction, and discuss some issues arising from the conversion between the numeric scores used in the HAHA@IberLEF2019 data and the pairwise judgment annotations required for our method.
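The general idea of recovering scalar scores from pairwise preference judgments can be sketched with a simple Bradley-Terry model fitted by gradient ascent (the texts and judgments below are invented; the paper itself uses Gaussian process preference learning, not this model):

```python
import math

texts = ["joke A", "joke B", "joke C"]
# (winner_index, loser_index): a human judged the first text funnier.
pairs = [(0, 1), (0, 2), (1, 2), (0, 1)]

# Fit latent "funniness" scores by gradient ascent on the
# Bradley-Terry log-likelihood with a logistic link.
scores = [0.0] * len(texts)
lr = 0.1
for _ in range(500):
    for w, l in pairs:
        p = 1 / (1 + math.exp(scores[l] - scores[w]))  # P(w beats l)
        scores[w] += lr * (1 - p)
        scores[l] -= lr * (1 - p)

ranking = sorted(range(len(texts)), key=lambda i: -scores[i])
print(ranking)  # -> [0, 1, 2]: joke A ranked funniest
```

The learned scores both rank the items and provide continuous ratings, mirroring the detection/score-prediction split in the HAHA subtasks.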
Lexical polysemy, a fundamental characteristic of all human languages, has long been regarded as a major challenge to machine translation, human–computer interaction, and other applications of computational natural language processing (NLP). Traditional approaches to automatic word sense disambiguation (WSD) rest on the assumption that there exists a single, unambiguous communicative intention underlying every word in a document. However, writers sometimes intend for a word to be interpreted as simultaneously carrying multiple distinct meanings. This deliberate use of lexical ambiguity—i.e., punning—is a particularly common source of humour, and therefore has important implications for how NLP systems process documents and interact with users. In this paper we make a case for research into computational methods for the detection of puns in running text and for the isolation of the intended meanings. We discuss the challenges involved in adapting principles and techniques from WSD to humorously ambiguous text, and outline our plans for evaluating WSD-inspired systems in a dedicated pun identification task. We describe the compilation of a large manually annotated corpus of puns and present an analysis of its properties. While our work is principally concerned with simple puns which are monolexemic and homographic (i.e., exploiting single words which have different meanings but are spelled identically), we touch on the challenges involved in processing other types.
Journal of Educational Computing Research, 2003
Latent semantic analysis (LSA) is an automated, statistical technique for comparing the semantic similarity of words or documents. In this paper, I examine the application of LSA to automated essay scoring. I compare LSA methods to earlier statistical methods for assessing essay quality, and critically review contemporary essay-scoring systems built on LSA, including the Intelligent Essay Assessor, Summary Street, State the Essence, Apex, and Select-a-Kibitzer. Finally, I discuss current avenues of research, including LSA's application to computer-measured readability assessment and to automatic summarization of student essays.
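The LSA pipeline underlying such systems can be sketched in a few lines (toy corpus invented for illustration): build a term-document count matrix, take a truncated SVD, and compare documents by cosine similarity in the reduced latent space.

```python
import numpy as np

# Toy corpus: two topically related sentences and one unrelated one.
docs = ["the cat sat on the mat",
        "the cat lay on the rug",
        "stock prices fell sharply"]
vocab = sorted({w for d in docs for w in d.split()})
# Term-document count matrix: rows are words, columns are documents.
A = np.array([[d.split().count(w) for d in docs] for w in vocab], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2                                    # latent dimensions to keep
doc_vecs = (np.diag(s[:k]) @ Vt[:k]).T   # documents in latent space

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# The two cat sentences are far closer to each other in latent space
# than either is to the finance sentence.
print(cos(doc_vecs[0], doc_vecs[1]), cos(doc_vecs[0], doc_vecs[2]))
```

An essay scorer built on this idea compares a student essay's latent vector against those of reference texts or pre-scored essays.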
International Journal on Artificial Intelligence Tools, 2001
For many years, the non-monotonic reasoning community has focussed on highly expressive logics. Such logics have turned out to be computationally expensive, and have given little support to the practical use of non-monotonic reasoning. In this work we discuss defeasible logic, a less-expressive but more efficient non-monotonic logic. We report on two new implemented systems for defeasible logic: a query answering system employing a backward-chaining approach, and a forward-chaining implementation that computes all conclusions. Our experimental evaluation demonstrates that the systems can deal with large theories (up to hundreds of thousands of rules). We show that defeasible logic has linear complexity, which contrasts markedly with most other non-monotonic logics and helps to explain the impressive experimental results. We believe that defeasible logic, with its efficiency and simplicity, is a good candidate to be used as a modelling language for practical applications, including modelling of regulations and business rules.
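The central conflict-resolution idea of defeasible reasoning can be shown in a single-pass toy sketch (rule names, facts, and encoding are invented; the systems described in the paper implement full defeasible logic, including chaining, strict rules, and defeaters):

```python
# Facts and defeasible rules for the classic penguin example.
facts = {"bird", "penguin"}
# (name, body, head); a head "~x" is the negation of "x".
rules = [("r1", {"bird"}, "flies"),
         ("r2", {"penguin"}, "~flies")]
superior = {("r2", "r1")}  # r2 beats r1 when both apply

def conflicts(a, b):
    return a == "~" + b or b == "~" + a

def conclusions(facts, rules, superior):
    """One pass of defeasible inference: a rule's head is concluded
    if the rule's body holds and no applicable conflicting rule
    beats it under the superiority relation."""
    out = set()
    for name, body, head in rules:
        if not body <= facts:
            continue
        beaten = any(body2 <= facts and conflicts(head, head2)
                     and (name2, name) in superior
                     for name2, body2, head2 in rules if name2 != name)
        if not beaten:
            out.add(head)
    return out

print(conclusions(facts, rules, superior))  # -> {'~flies'}
```

Here r1 ("birds fly") is defeated by the superior rule r2 ("penguins don't"), so only "~flies" is concluded; the linear-time algorithms discussed in the paper generalise this resolution step to whole theories.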
International Journal of Applied Mathematics, 1999
Automatic generation of Bayesian network (BN) structures (directed acyclic graphs) is an important step in experimental study of algorithms for inference in BNs and algorithms for learning BNs from data. Previously known simulation algorithms do not guarantee connectedness of generated structures or even successful generation according to a user specification. We propose a simple, efficient and well-behaved algorithm for automatic generation of BN structures. The performance of the algorithm is demonstrated experimentally.
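One simple way to guarantee both properties the abstract mentions, connectedness and acyclicity, is sketched below (this is an illustrative construction, not the paper's algorithm): link each node to a random earlier node to form a connected backbone, then add extra arcs, always oriented from lower to higher index so no cycle can arise.

```python
import random

def random_connected_dag(n, extra_edges, seed=None):
    """Generate a connected DAG on n nodes with n-1+extra_edges arcs.
    All arcs point from a lower to a higher index, so the graph is
    acyclic; the backbone makes the underlying graph connected."""
    rng = random.Random(seed)
    edges = set()
    for i in range(1, n):                      # spanning-tree backbone
        edges.add((rng.randrange(i), i))
    while len(edges) < n - 1 + extra_edges:    # additional forward arcs
        a, b = rng.sample(range(n), 2)
        edges.add((min(a, b), max(a, b)))
    return sorted(edges)

dag = random_connected_dag(8, extra_edges=5, seed=42)
print(dag)
```

Sampling uniformly over all connected DAGs, or matching a user's degree specification, requires more care, which is the problem the paper addresses.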
Book chapters
La traducción audiovisual a través de la traducción automática y la posedición: prácticas actuales y futuras, 2023
Using Technologies for Creative-Text Translation
We present and evaluate PunCAT, an interactive electronic tool for the translation of puns. Following the strategies known to be applied in pun translation, PunCAT automatically translates each sense of the pun separately; it then allows the user to explore the semantic fields of these translations in order to help construct a plausible target-language solution that maximizes the semantic correspondence to the original. Our evaluation is based on an empirical pilot study in which the participants translated puns from a variety of published sources from English into German, with and without PunCAT. We aimed to answer the following questions: Does the tool support, improve, or constrain the translation process, and if so, in what ways? And what are the tool's main benefits and drawbacks as perceived and described by the participants? Our analysis of the translators' cognitive processes gives us insight into their decision-making strategies and how they interacted with the tool. We find clear evidence that PunCAT effectively supports the translation process in terms of stimulating brainstorming and broadening the translator's pool of solution candidates. We have also identified a number of directions in which the tool could be adapted to better suit translators' work processes.
Handbook of Language and Humor, Feb 2017
Recent Advances in Natural Language Processing III, 2004
We describe a language-neutral automatic summarization system which aims to produce coherent extracts. It builds an initial extract composed solely of topic sentences, and then recursively fills in the topical lacunae by providing linking material between semantically dissimilar sentences. While experiments with human judges did not prove a statistically significant increase in textual coherence with the use of a latent semantic analysis module, we found a strong positive correlation between coherence and overall summary quality.
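The gap-filling step can be sketched as follows (toy sentences and a crude word-overlap measure, invented for illustration; the system described above uses more sophisticated similarity modelling): wherever two adjacent extract sentences are semantically dissimilar, insert the candidate source sentence most connected to both.

```python
def overlap(a, b):
    """Crude word-overlap similarity between two sentences."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(1, min(len(wa), len(wb)))

def smooth_extract(extract, candidates, threshold=0.2):
    """Insert a linking sentence between dissimilar adjacent sentences."""
    out = [extract[0]]
    for nxt in extract[1:]:
        if overlap(out[-1], nxt) < threshold and candidates:
            out.append(max(candidates,
                           key=lambda s: overlap(out[-1], s) + overlap(s, nxt)))
        out.append(nxt)
    return out

extract = ["The volcano erupted violently.",
           "Residents returned home after weeks."]
candidates = ["The eruption forced residents to evacuate.",
              "Local farms grow coffee."]
print(smooth_extract(extract, candidates))
```

Here the evacuation sentence bridges the topical gap between the eruption and the residents' return, which is the kind of linking material the system supplies between topic sentences.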
Articles in refereed proceedings
Experimental IR Meets Multilinguality, Multimodality, and Interaction: Proceedings of the Fourteenth International Conference of the CLEF Association (CLEF 2023), 2023
The goal of the JOKER track series is to bring together linguists, translators, and computer scientists to foster progress on the automatic interpretation, generation, and translation of wordplay. Building on lessons learned from last year's edition, JOKER-2023 held three shared tasks aligned with the human approaches to the translation of wordplay, or more specifically of puns in English, French, and Spanish: detection, location and interpretation, and finally translation. In this paper, we define these three tasks and describe our approaches to corpus creation and evaluation. We then present an overview of the participating systems, including summaries of their approaches and a comparison of their performance. As in JOKER-2022, this year's track also solicited contributions making further use of our data (an “unshared task”), which we also report on.
SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2023
Despite recent advances in information retrieval and natural language processing, rhetorical devices that exploit ambiguity or subvert linguistic rules remain a challenge for such systems. However, corpus-based analysis of wordplay has been a perennial topic of scholarship in the humanities, including literary criticism, language education, and translation studies. The immense data-gathering effort required for these studies points to the need for specialized text retrieval and classification technology, and consequently for appropriate test collections. In this paper, we introduce and analyze a new dataset for research and applications in the retrieval and processing of wordplay. Developed for the JOKER track at CLEF 2023, our annotated corpus extends and improves upon past English wordplay detection datasets in several ways. First, we introduce hundreds of additional positive examples; second, we provide French translations for the examples; and third, we provide negative examples with characteristics closely matching those of the positive examples. This last feature helps ensure that AI models learn to effectively distinguish wordplay from non-wordplay, rather than simply separating texts that differ in length, style, or vocabulary. Our test collection thus represents a step towards wordplay-aware multilingual information retrieval.
Advances in Information Retrieval: 45th European Conference on Information Retrieval, ECIR 2023, Dublin, Ireland, April 2–6, Proceedings, Part III, 2023
Understanding and translating humorous wordplay often requires recognition of implicit cultural references, knowledge of word formation processes, and discernment of double meanings – issues which pose challenges for humans and computers alike. This paper introduces the CLEF 2023 JOKER track, which takes an interdisciplinary approach to the creation of reusable test collections, evaluation metrics, and methods for the automatic processing of wordplay. We describe the track's interconnected shared tasks for the detection, location, interpretation, and translation of puns. We also describe associated data sets and evaluation methodologies, and invite contributions making further use of our data.
Experimental IR Meets Multilinguality, Multimodality, and Interaction: Proceedings of the Thirteenth International Conference of the CLEF Association (CLEF 2022), 2022
While humour and wordplay are among the most intensively studied problems in the field of translation studies, they have been almost completely ignored in machine translation. This is partly because most AI-based translation tools require a quality and quantity of training data (e.g., parallel corpora) that has historically been lacking for humour and wordplay. The goal of the JOKER@CLEF 2022 workshop was to bring together translators and computer scientists to work on an evaluation framework for wordplay, including data and metric development, and to foster work on automatic methods for wordplay translation. To this end, we defined three pilot tasks: (1) classify and explain instances of wordplay, (2) translate single terms containing wordplay, and (3) translate entire phrases containing wordplay (punning jokes). This paper describes and discusses each of these pilot tasks, as well as the participating systems and their results.
Advances in Information Retrieval: 44th European Conference on IR Research, ECIR 2022, Stavanger, Norway, April 10–14, 2022, Proceedings, Part II, 2022
Humour remains one of the most difficult aspects of intercultural communication: understanding humour often requires understanding implicit cultural references and/or double meanings, and this raises the question of the (un)translatability of humour. Wordplay is a common source of humour in literature, journalism, and advertising due to its attention-getting, mnemonic, playful, and subversive character. The translation of humour and wordplay is therefore in high demand. Modern translation depends heavily on technological aids, yet few works have treated the automation of humour and wordplay translation and the creation of humour corpora. The goal of the JOKER workshop is to bring together translators and computer scientists to work on an evaluation framework for creative language, including data and metric development, and to foster work on automatic methods for wordplay translation. We propose three pilot tasks: (1) classify and explain instances of wordplay, (2) translate single words containing wordplay, and (3) translate entire phrases containing wordplay.
Proceedings of the 5th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2021), 2021
In this work, we design an end-to-end model for poetry generation based on conditioned recurrent neural network (RNN) language models whose goal is to learn stylistic features (poem length, sentiment, alliteration, and rhyming) from examples alone. We show this model successfully learns the 'meaning' of length and sentiment, as we can control it to generate longer or shorter as well as more positive or more negative poems. However, the model does not grasp sound phenomena like alliteration and rhyming, but instead exploits low-level statistical cues. Possible reasons include the size of the training data, the relatively low frequency and difficulty of these sublexical phenomena as well as model biases. We show that more recent GPT-2 models also have problems learning sublexical phenomena such as rhyming from examples alone.
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021), 2021
Disagreement between coders is ubiquitous in virtually all datasets annotated with human judgements in both natural language processing and computer vision. However, most supervised machine learning methods assume that a single preferred interpretation exists for each item, which is at best an idealization. The aim of the SemEval-2021 shared task on Learning with Disagreements (Le-wi-Di) was to provide a unified testing framework for methods that learn from data containing multiple, possibly contradictory annotations, covering the best-known disagreement-rich datasets for interpreting language and classifying images. In this paper we describe the shared task and its results.
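The basic idea of preserving disagreement rather than collapsing it can be sketched simply (labels and annotations invented for illustration; the shared task's datasets and methods are of course far richer): keep each item's soft label, the empirical distribution of annotators' judgements, instead of a single majority-vote gold label.

```python
from collections import Counter

def soft_label(annotations):
    """Return the empirical label distribution over annotators'
    judgements, preserving disagreement instead of collapsing it."""
    counts = Counter(annotations)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

print(soft_label(["offensive", "offensive", "not", "offensive"]))
# -> {'offensive': 0.75, 'not': 0.25}
```

A model trained against such distributions (e.g., with cross-entropy to the soft label) can be rewarded for reproducing genuine human disagreement rather than penalised for it.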
Proceedings of the Second Workshop on Human-Informed Translation and Interpreting Technology (HiT-IT 2019), 2019
The translation of wordplay is one of the most extensively researched problems in translation studies, but it has attracted little attention in the fields of natural language processing and machine translation. This is because today's language technologies treat anomalies and ambiguities in the input as things that must be resolved in favour of a single “correct” interpretation, rather than preserved and interpreted in their own right. But if computers cannot yet process such creative language on their own, can they at least provide specialized support to translation professionals? In this paper, I survey the state of the art relevant to computational processing of humorous wordplay and put forth a vision of how existing theories, resources, and technologies could be adapted and extended to support interactive, computer-assisted translation.