Showing 1–50 of 57 results for author: Bontcheva, K

Searching in archive cs.
  1. arXiv:2406.12614  [pdf, other]

    cs.CL cs.LG

    EUvsDisinfo: a Dataset for Multilingual Detection of Pro-Kremlin Disinformation in News Articles

    Authors: João A. Leite, Olesya Razuvayevskaya, Kalina Bontcheva, Carolina Scarton

    Abstract: This work introduces EUvsDisinfo, a multilingual dataset of trustworthy and disinformation articles related to pro-Kremlin themes. It is sourced directly from the debunk articles written by experts leading the EUvsDisinfo project. Our dataset is the largest to-date resource in terms of the overall number of articles and distinct languages. It also provides the largest topical and temporal coverage…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 4 pages, 3 figures, 2 tables

  2. arXiv:2405.00611  [pdf, other]

    cs.CL

    Addressing Topic Granularity and Hallucination in Large Language Models for Topic Modelling

    Authors: Yida Mu, Peizhen Bai, Kalina Bontcheva, Xingyi Song

    Abstract: Large language models (LLMs) with their strong zero-shot topic extraction capabilities offer an alternative to probabilistic topic modelling and closed-set topic classification approaches. As zero-shot topic extractors, LLMs are expected to understand human instructions to generate relevant and non-hallucinated topics based on the given documents. However, LLM-based topic modelling approaches ofte…

    Submitted 1 May, 2024; originally announced May 2024.

  3. arXiv:2403.16248  [pdf, other]

    cs.CL

    Large Language Models Offer an Alternative to the Traditional Approach of Topic Modelling

    Authors: Yida Mu, Chun Dong, Kalina Bontcheva, Xingyi Song

    Abstract: Topic modelling, as a well-established unsupervised technique, has found extensive use in automatically detecting significant topics within a corpus of documents. However, classic topic modelling approaches (e.g., LDA) have certain drawbacks, such as the lack of semantic understanding and the presence of overlapping topics. In this work, we investigate the untapped potential of large language mode…

    Submitted 26 March, 2024; v1 submitted 24 March, 2024; originally announced March 2024.

    Comments: Accepted at LREC-COLING 2024

  4. arXiv:2402.08467  [pdf, other]

    cs.CL

    Lying Blindly: Bypassing ChatGPT's Safeguards to Generate Hard-to-Detect Disinformation Claims at Scale

    Authors: Freddy Heppell, Mehmet E. Bakir, Kalina Bontcheva

    Abstract: As Large Language Models (LLMs) become more proficient, their misuse in large-scale viral disinformation campaigns is a growing concern. This study explores the capability of ChatGPT to generate unconditioned claims about the war in Ukraine, an event beyond its knowledge cutoff, and evaluates whether such claims can be differentiated by human readers and automated tools from human-written ones. We…

    Submitted 13 February, 2024; originally announced February 2024.

  5. arXiv:2311.05265  [pdf, other]

    cs.CL cs.AI cs.LG

    Don't Waste a Single Annotation: Improving Single-Label Classifiers Through Soft Labels

    Authors: Ben Wu, Yue Li, Yida Mu, Carolina Scarton, Kalina Bontcheva, Xingyi Song

    Abstract: In this paper, we address the limitations of the common data annotation and training methods for objective single-label classification tasks. Typically, when annotating such tasks annotators are only asked to provide a single label for each sample and annotator disagreement is discarded when a final hard label is decided through majority voting. We challenge this traditional approach, acknowledgin…

    Submitted 9 November, 2023; originally announced November 2023.

    Comments: Accepted to EMNLP 2023 (Findings)

  6. Analysing State-Backed Propaganda Websites: a New Dataset and Linguistic Study

    Authors: Freddy Heppell, Kalina Bontcheva, Carolina Scarton

    Abstract: This paper analyses two hitherto unstudied sites sharing state-backed disinformation, Reliable Recent News (rrn.world) and WarOnFakes (waronfakes.com), which publish content in Arabic, Chinese, English, French, German, and Spanish. We describe our content acquisition methodology and perform cross-site unsupervised topic clustering on the resulting multilingual dataset. We also perform linguistic a…

    Submitted 21 October, 2023; originally announced October 2023.

    Comments: Accepted to EMNLP 2023 main conference

    Journal ref: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 5729-5741

  7. arXiv:2309.14146  [pdf, other]

    cs.CL

    Examining Temporal Bias in Abusive Language Detection

    Authors: Mali Jin, Yida Mu, Diana Maynard, Kalina Bontcheva

    Abstract: The use of abusive language online has become an increasingly pervasive problem that damages both individuals and society, with effects ranging from psychological harm right through to escalation to real-life violence and even death. Machine learning models have been developed to automatically detect abusive language, but these models can suffer from temporal bias, the phenomenon in which topics,…

    Submitted 25 September, 2023; originally announced September 2023.

  8. arXiv:2309.11576  [pdf, other]

    cs.CL

    Examining the Limitations of Computational Rumor Detection Models Trained on Static Datasets

    Authors: Yida Mu, Xingyi Song, Kalina Bontcheva, Nikolaos Aletras

    Abstract: A crucial aspect of a rumor detection model is its ability to generalize, particularly its ability to detect emerging, previously unknown rumors. Past research has indicated that content-based (i.e., using solely source posts as input) rumor detection models tend to perform less effectively on unseen rumors. At the same time, the potential of context-based models remains largely untapped. The main…

    Submitted 23 March, 2024; v1 submitted 20 September, 2023; originally announced September 2023.

    Comments: Accepted at LREC-COLING 2024

  9. arXiv:2309.07601  [pdf, other]

    cs.CL cs.AI cs.LG

    Detecting Misinformation with LLM-Predicted Credibility Signals and Weak Supervision

    Authors: João A. Leite, Olesya Razuvayevskaya, Kalina Bontcheva, Carolina Scarton

    Abstract: Credibility signals represent a wide range of heuristics that are typically used by journalists and fact-checkers to assess the veracity of online content. Automating the task of credibility signal extraction, however, is very challenging as it requires high-accuracy signal-specific extractors to be trained, while there are currently no sufficiently large datasets annotated with all credibility si…

    Submitted 14 September, 2023; originally announced September 2023.

  10. Comparison between parameter-efficient techniques and full fine-tuning: A case study on multilingual news article classification

    Authors: Olesya Razuvayevskaya, Ben Wu, Joao A. Leite, Freddy Heppell, Ivan Srba, Carolina Scarton, Kalina Bontcheva, Xingyi Song

    Abstract: Adapters and Low-Rank Adaptation (LoRA) are parameter-efficient fine-tuning techniques designed to make the training of language models more efficient. Previous results demonstrated that these methods can even improve performance on some classification tasks. This paper complements the existing research by investigating how these techniques influence the classification performance and computation…

    Submitted 8 April, 2024; v1 submitted 14 August, 2023; originally announced August 2023.

    Journal ref: PLOS ONE 2024

  11. arXiv:2308.05680  [pdf, other]

    cs.CL cs.CY cs.IR cs.LG cs.SI

    Finding Already Debunked Narratives via Multistage Retrieval: Enabling Cross-Lingual, Cross-Dataset and Zero-Shot Learning

    Authors: Iknoor Singh, Carolina Scarton, Xingyi Song, Kalina Bontcheva

    Abstract: The task of retrieving already debunked narratives aims to detect stories that have already been fact-checked. The successful detection of claims that have already been debunked not only reduces the manual efforts of professional fact-checkers but can also contribute to slowing the spread of misinformation. Mainly due to the lack of readily available data, this is an understudied problem, particul…

    Submitted 10 August, 2023; originally announced August 2023.

  12. arXiv:2305.14310  [pdf, ps, other]

    cs.CL

    Navigating Prompt Complexity for Zero-Shot Classification: A Study of Large Language Models in Computational Social Science

    Authors: Yida Mu, Ben P. Wu, William Thorne, Ambrose Robinson, Nikolaos Aletras, Carolina Scarton, Kalina Bontcheva, Xingyi Song

    Abstract: Instruction-tuned Large Language Models (LLMs) have exhibited impressive language understanding and the capacity to generate responses that follow specific prompts. However, due to the computational demands associated with training these models, their applications often adopt a zero-shot setting. In this paper, we evaluate the zero-shot performance of two publicly accessible LLMs, ChatGPT and Open…

    Submitted 24 March, 2024; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: Accepted at LREC-COLING 2024

  13. arXiv:2304.04811  [pdf, other]

    cs.CL

    A Large-Scale Comparative Study of Accurate COVID-19 Information versus Misinformation

    Authors: Yida Mu, Ye Jiang, Freddy Heppell, Iknoor Singh, Carolina Scarton, Kalina Bontcheva, Xingyi Song

    Abstract: The COVID-19 pandemic led to an infodemic where an overwhelming amount of COVID-19 related content was being disseminated at high velocity through social media. This made it challenging for citizens to differentiate between accurate and inaccurate information about COVID-19. This motivated us to carry out a comparative study of the characteristics of COVID-19 misinformation versus those of accurat…

    Submitted 7 May, 2023; v1 submitted 10 April, 2023; originally announced April 2023.

    Comments: ICWSM TrueHealth 2023

  14. arXiv:2304.04806  [pdf, other]

    cs.CL

    Examining Temporalities on Stance Detection towards COVID-19 Vaccination

    Authors: Yida Mu, Mali Jin, Kalina Bontcheva, Xingyi Song

    Abstract: Previous studies have highlighted the importance of vaccination as an effective strategy to control the transmission of the COVID-19 virus. It is crucial for policymakers to have a comprehensive understanding of the public's stance towards vaccination on a large scale. However, attitudes towards COVID-19 vaccination, such as pro-vaccine or vaccine hesitancy, have evolved over time on social media.…

    Submitted 24 March, 2024; v1 submitted 10 April, 2023; originally announced April 2023.

    Comments: Accepted at LREC-COLING 2024

  15. SheffieldVeraAI at SemEval-2023 Task 3: Mono and multilingual approaches for news genre, topic and persuasion technique classification

    Authors: Ben Wu, Olesya Razuvayevskaya, Freddy Heppell, João A. Leite, Carolina Scarton, Kalina Bontcheva, Xingyi Song

    Abstract: This paper describes our approach for SemEval-2023 Task 3: Detecting the category, the framing, and the persuasion techniques in online news in a multi-lingual setup. For Subtask 1 (News Genre), we propose an ensemble of fully trained and adapter mBERT models which was ranked joint-first for German, and had the highest mean rank of multi-language teams. For Subtask 2 (Framing), we achieved first p…

    Submitted 9 May, 2023; v1 submitted 16 March, 2023; originally announced March 2023.

    Journal ref: Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023), pages 1995-2008, Toronto, Canada. Association for Computational Linguistics

  16. arXiv:2302.03147  [pdf, other]

    cs.CL

    It's about Time: Rethinking Evaluation on Rumor Detection Benchmarks using Chronological Splits

    Authors: Yida Mu, Kalina Bontcheva, Nikolaos Aletras

    Abstract: New events emerge over time influencing the topics of rumors in social media. Current rumor detection benchmarks use random splits as training, development and test sets which typically results in topical overlaps. Consequently, models trained on random splits may not perform well on rumor classification on previously unseen topics due to the temporal concept drift. In this paper, we provide a re-…

    Submitted 6 February, 2023; originally announced February 2023.

    Comments: Accepted at EACL 2023 Findings

  17. arXiv:2301.06660  [pdf, other]

    cs.CL

    VaxxHesitancy: A Dataset for Studying Hesitancy towards COVID-19 Vaccination on Twitter

    Authors: Yida Mu, Mali Jin, Charlie Grimshaw, Carolina Scarton, Kalina Bontcheva, Xingyi Song

    Abstract: Vaccine hesitancy has been a common concern, probably since vaccines were created and, with the popularisation of social media, people started to express their concerns about vaccines online alongside those posting pro- and anti-vaccine content. Predictably, since the first mentions of a COVID-19 vaccine, social media users posted about their fears and concerns or about their support and belief in…

    Submitted 15 April, 2023; v1 submitted 16 January, 2023; originally announced January 2023.

    Comments: Accepted at ICWSM 2023

  18. Comparative Analysis of Engagement, Themes, and Causality of Ukraine-Related Debunks and Disinformation

    Authors: Iknoor Singh, Kalina Bontcheva, Xingyi Song, Carolina Scarton

    Abstract: This paper compares quantitatively the spread of Ukraine-related disinformation and its corresponding debunks, first by considering re-tweets, replies, and favourites, which demonstrate that despite platform efforts Ukraine-related disinformation is still spreading wider than its debunks. Next, bidirectional post-hoc analysis is carried out using Granger causality tests, impulse response analysis…

    Submitted 14 December, 2022; originally announced December 2022.

    Comments: Published in International Conference on Social Informatics 2022 (SocInfo 2022)

    Report number: LNCS, volume 13618 (pp. 128–143)

  19. arXiv:2210.09197  [pdf, other]

    cs.CL cs.AI cs.LG

    On the Impact of Temporal Concept Drift on Model Explanations

    Authors: Zhixue Zhao, George Chrysostomou, Kalina Bontcheva, Nikolaos Aletras

    Abstract: Explanation faithfulness of model predictions in natural language processing is typically evaluated on held-out data from the same temporal distribution as the training data (i.e. synchronous settings). While model performance often deteriorates due to temporal variation (i.e. temporal concept drift), it is currently unknown how explanation faithfulness is impacted when the time span of the target…

    Submitted 17 October, 2022; originally announced October 2022.

    Comments: Accepted at EMNLP Findings 2022

  20. arXiv:2207.08522  [pdf, ps, other]

    cs.CL

    Classifying COVID-19 vaccine narratives

    Authors: Yue Li, Carolina Scarton, Xingyi Song, Kalina Bontcheva

    Abstract: Vaccine hesitancy is widespread, despite the government's information campaigns and the efforts of the World Health Organisation (WHO). Categorising the topics within vaccine-related narratives is crucial to understand the concerns expressed in discussions and identify the specific issues that contribute to vaccine hesitancy. This paper addresses the need for monitoring and analysing vaccine narra…

    Submitted 17 November, 2023; v1 submitted 18 July, 2022; originally announced July 2022.

    Comments: In Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing, 2023

  21. arXiv:2107.12303  [pdf, other]

    cs.CY cs.SI

    The False COVID-19 Narratives That Keep Being Debunked: A Spatiotemporal Analysis

    Authors: Iknoor Singh, Kalina Bontcheva, Carolina Scarton

    Abstract: The onset of the COVID-19 pandemic led to a global infodemic that has brought unprecedented challenges for citizens, media, and fact-checkers worldwide. To address this challenge, over a hundred fact-checking initiatives worldwide have been monitoring the information space in their countries and publishing regular debunks of viral false COVID-19 narratives. This study examines the database of the…

    Submitted 24 April, 2024; v1 submitted 26 July, 2021; originally announced July 2021.

  22. arXiv:2106.11702  [pdf, other]

    cs.SI cs.CY cs.LG

    Categorising Fine-to-Coarse Grained Misinformation: An Empirical Study of COVID-19 Infodemic

    Authors: Ye Jiang, Xingyi Song, Carolina Scarton, Ahmet Aker, Kalina Bontcheva

    Abstract: The spread of COVID-19 misinformation over social media has already drawn the attention of many researchers. According to Google Scholar, about 26000 COVID-19 related misinformation studies have been published to date. Most of these studies focus on 1) detecting and/or 2) analysing the characteristics of COVID-19 related misinformation. However, the study of the social behaviours related to misinforma…

    Submitted 8 July, 2021; v1 submitted 22 June, 2021; originally announced June 2021.

  23. arXiv:2103.02917  [pdf, other]

    cs.CY cs.CL

    MP Twitter Engagement and Abuse Post-first COVID-19 Lockdown in the UK: White Paper

    Authors: Tracie Farrell, Mehmet Bakir, Kalina Bontcheva

    Abstract: The UK has had a volatile political environment for some years now, with Brexit and leadership crises marking the past five years. With this work, we wanted to understand more about how the global health emergency, COVID-19, influences the amount, type or topics of abuse that UK politicians receive when engaging with the public. …

    Submitted 8 March, 2021; v1 submitted 4 March, 2021; originally announced March 2021.

  24. Multistage BiCross encoder for multilingual access to COVID-19 health information

    Authors: Iknoor Singh, Carolina Scarton, Kalina Bontcheva

    Abstract: The Coronavirus (COVID-19) pandemic has led to a rapidly growing 'infodemic' of health information online. This has motivated the need for accurate semantic search and retrieval of reliable COVID-19 information across millions of documents, in multiple languages. To address this challenge, this paper proposes a novel high precision and high recall neural Multistage BiCross encoder approach. It is…

    Submitted 26 August, 2021; v1 submitted 8 January, 2021; originally announced January 2021.

    Journal ref: PLOS ONE 2021

  25. arXiv:2010.04543  [pdf, other]

    cs.CL cs.LG cs.SI

    Toxic Language Detection in Social Media for Brazilian Portuguese: New Dataset and Multilingual Analysis

    Authors: João A. Leite, Diego F. Silva, Kalina Bontcheva, Carolina Scarton

    Abstract: Hate speech and toxic comments are a common concern of social media platform users. Although these comments are, fortunately, the minority in these platforms, they are still capable of causing harm. Therefore, identifying these comments is an important task for studying and preventing the proliferation of toxicity in social media. Previous work in automatically detecting toxic comments focus mainl…

    Submitted 9 October, 2020; originally announced October 2020.

    Comments: Accepted to AACL-IJCNLP 2020

  26. arXiv:2010.04532  [pdf, other]

    cs.CL cs.LG

    Measuring What Counts: The case of Rumour Stance Classification

    Authors: Carolina Scarton, Diego F. Silva, Kalina Bontcheva

    Abstract: Stance classification can be a powerful tool for understanding whether and which users believe in online rumours. The task aims to automatically predict the stance of replies towards a given rumour, namely support, deny, question, or comment. Numerous methods have been proposed and their performance compared in the RumourEval shared tasks in 2017 and 2019. Results demonstrated that this is a chall…

    Submitted 9 October, 2020; originally announced October 2020.

    Comments: Accepted to AACL-IJCNLP 2020

  27. arXiv:2008.05261  [pdf, other]

    cs.CY cs.SI

    Vindication, Virtue and Vitriol: A study of online engagement and abuse toward British MPs during the COVID-19 Pandemic

    Authors: Tracie Farrell, Genevieve Gorrell, Kalina Bontcheva

    Abstract: COVID-19 has given rise to malicious content online, including online abuse and hate toward British MPs. In order to understand and contextualise the level of abuse MPs receive, we consider how ministers use social media to communicate about the crisis, and the citizen engagement that this generates. The focus of the paper is on a large-scale, mixed methods study of abusive and antagonistic respon…

    Submitted 12 August, 2020; originally announced August 2020.

    Comments: Under review as of August 2020

  28. arXiv:2006.08363  [pdf, other]

    cs.CY cs.SI

    MP Twitter Abuse in the Age of COVID-19: White Paper

    Authors: Genevieve Gorrell, Tracie Farrell, Kalina Bontcheva

    Abstract: As COVID-19 sweeps the globe, outcomes depend on effective relationships between the public and decision-makers. In the UK there were uncivil tweets to MPs about perceived UK tardiness to go into lockdown. The pandemic has led to increased attention on ministers with a role in the crisis. However, generally this surge has been civil. Prime minister Boris Johnson's severe illness with COVID-19 resu…

    Submitted 10 June, 2020; originally announced June 2020.

  29. arXiv:2006.03354  [pdf, other]

    cs.LG cs.CL cs.SI stat.ML

    Classification Aware Neural Topic Model and its Application on a New COVID-19 Disinformation Corpus

    Authors: Xingyi Song, Johann Petrak, Ye Jiang, Iknoor Singh, Diana Maynard, Kalina Bontcheva

    Abstract: The explosion of disinformation accompanying the COVID-19 pandemic has overloaded fact-checkers and media worldwide, and brought a new major challenge to government responses worldwide. Not only is disinformation creating confusion about medical science amongst citizens, but it is also amplifying distrust in policy makers and governments. To help tackle this, we developed computational methods to…

    Submitted 11 March, 2021; v1 submitted 5 June, 2020; originally announced June 2020.

    Comments: This is arXiv version of "Classification Aware Neural Topic Model for COVID-19 Disinformation Categorisation"

    Journal ref: PLOS ONE 2021

  30. arXiv:2004.08355  [pdf]

    cs.CL cs.AI

    Towards an Interoperable Ecosystem of AI and LT Platforms: A Roadmap for the Implementation of Different Levels of Interoperability

    Authors: Georg Rehm, Dimitrios Galanis, Penny Labropoulou, Stelios Piperidis, Martin Welß, Ricardo Usbeck, Joachim Köhler, Miltos Deligiannis, Katerina Gkirtzou, Johannes Fischer, Christian Chiarcos, Nils Feldhus, Julián Moreno-Schneider, Florian Kintzel, Elena Montiel, Víctor Rodríguez Doncel, John P. McCrae, David Laqua, Irina Patricia Theile, Christian Dittmar, Kalina Bontcheva, Ian Roberts, Andrejs Vasiljevs, Andis Lagzdiņš

    Abstract: With regard to the wider area of AI/LT platform interoperability, we concentrate on two core aspects: (1) cross-platform search and discovery of resources and services; (2) composition of cross-platform service workflows. We devise five different levels (of increasing complexity) of platform interoperability that we suggest to implement in a wider federation of AI/LT platforms. We illustrate the a…

    Submitted 17 April, 2020; originally announced April 2020.

    Comments: Proceedings of the 1st International Workshop on Language Technology Platforms (IWLTP 2020). To appear

  31. arXiv:2003.13833  [pdf]

    cs.CL cs.AI cs.DL

    The European Language Technology Landscape in 2020: Language-Centric and Human-Centric AI for Cross-Cultural Communication in Multilingual Europe

    Authors: Georg Rehm, Katrin Marheinecke, Stefanie Hegele, Stelios Piperidis, Kalina Bontcheva, Jan Hajič, Khalid Choukri, Andrejs Vasiļjevs, Gerhard Backfried, Christoph Prinz, José Manuel Gómez Pérez, Luc Meertens, Paul Lukowicz, Josef van Genabith, Andrea Lösch, Philipp Slusallek, Morten Irgens, Patrick Gatellier, Joachim Köhler, Laure Le Bars, Dimitra Anastasiou, Albina Auksoriūtė, Núria Bel, António Branco, Gerhard Budin, et al. (22 additional authors not shown)

    Abstract: Multilingualism is a cultural cornerstone of Europe and firmly anchored in the European treaties including full language equality. However, language barriers impacting business, cross-lingual and cross-cultural communication are still omnipresent. Language Technologies (LTs) are a powerful means to break down these barriers. While the last decade has seen various initiatives that created a multitu…

    Submitted 30 March, 2020; originally announced March 2020.

    Comments: Proceedings of the 12th Language Resources and Evaluation Conference (LREC 2020). To appear

  32. arXiv:2003.13551  [pdf]

    cs.CL

    European Language Grid: An Overview

    Authors: Georg Rehm, Maria Berger, Ela Elsholz, Stefanie Hegele, Florian Kintzel, Katrin Marheinecke, Stelios Piperidis, Miltos Deligiannis, Dimitris Galanis, Katerina Gkirtzou, Penny Labropoulou, Kalina Bontcheva, David Jones, Ian Roberts, Jan Hajic, Jana Hamrlová, Lukáš Kačena, Khalid Choukri, Victoria Arranz, Andrejs Vasiļjevs, Orians Anvari, Andis Lagzdiņš, Jūlija Meļņika, Gerhard Backfried, Erinç Dikici, et al. (11 additional authors not shown)

    Abstract: With 24 official EU and many additional languages, multilingualism in Europe and an inclusive Digital Single Market can only be enabled through Language Technologies (LTs). European LT business is dominated by hundreds of SMEs and a few large players. Many are world-class, with technologies that outperform the global players. However, European LT business is also fragmented, by nation states, lang…

    Submitted 30 March, 2020; originally announced March 2020.

    Comments: Proceedings of the 12th Language Resources and Evaluation Conference (LREC 2020). To appear

  33. arXiv:2001.08686  [pdf, other]

    cs.CY

    Online Abuse toward Candidates during the UK General Election 2019: Working Paper

    Authors: Genevieve Gorrell, Mehmet E Bakir, Ian Roberts, Mark A Greenwood, Kalina Bontcheva

    Abstract: The 2019 UK general election took place against a background of rising online hostility levels toward politicians and concerns about its impact on democracy. We collected 4.2 million tweets sent to or from election candidates in the six week period spanning from the start of November until shortly after the December 12th election. We found abuse in 4.46% of replies received by candidates, up from…

    Submitted 30 January, 2020; v1 submitted 23 January, 2020; originally announced January 2020.

    Comments: Second draft, January 30th 2020

  34. arXiv:1910.00920  [pdf, other]

    cs.CY

    Race and Religion in Online Abuse towards UK Politicians: Working Paper

    Authors: Genevieve Gorrell, Mehmet E. Bakir, Mark A. Greenwood, Ian Roberts, Kalina Bontcheva

    Abstract: Against a backdrop of tensions related to EU membership, we find levels of online abuse toward UK MPs reach a new high. Race and religion have become pressing topics globally, and in the UK this interacts with "Brexit" and the rise of social media to create a complex social climate in which much can be learned about evolving attitudes. In 8 million tweets by and to UK MPs in the first half of 2019…

    Submitted 24 October, 2019; v1 submitted 2 October, 2019; originally announced October 2019.

    Comments: This is an earlier version of a paper currently under review. This version spans January to June 2019 whereas the paper under review covers January to September 2019, as well as adding further analysis

  35. The evolution of argumentation mining: From models to social media and emerging tools

    Authors: Anastasios Lytos, Thomas Lagkas, Panagiotis Sarigiannidis, Kalina Bontcheva

    Abstract: Argumentation mining is a rising subject in the computational linguistics domain focusing on extracting structured arguments from natural text, often from unstructured or noisy text. The initial approaches to modeling arguments aimed to identify a flawless argument in specific fields (Law, Scientific Papers) serving specific needs (completeness, effectiveness). With the emergence of Web 2.0 and…

    Submitted 4 July, 2019; originally announced July 2019.

    Comments: Journal of Information Processing & Management, Elsevier - Accepted Version

    Journal ref: Information Processing & Management, Volume 56, Issue 6, 2019

  36. arXiv:1904.11230  [pdf, other]

    cs.CY

    Online Abuse of UK MPs from 2015 to 2019: Working Paper

    Authors: Mark A. Greenwood, Mehmet E. Bakir, Genevieve Gorrell, Xingyi Song, Ian Roberts, Kalina Bontcheva

    Abstract: We extend previous work about general election-related abuse of UK MPs with two new time periods, one in late 2018 and the other in early 2019, allowing previous observations to be extended to new data and the impact of key stages in the UK withdrawal from the European Union on patterns of abuse to be explored. The topics that draw abuse evolve over the four time periods are reviewed, with topics…

    Submitted 25 April, 2019; originally announced April 2019.

  37. arXiv:1902.06521  [pdf, other]

    cs.CY

    Local Media and Geo-situated Responses to Brexit: A Quantitative Analysis of Twitter, News and Survey Data

    Authors: Genevieve Gorrell, Mehmet E. Bakir, Luke Temple, Diana Maynard, Paolo Cifariello, Jackie Harrison, J. Miguel Kanai, Kalina Bontcheva

    Abstract: Societal debates and political outcomes are subject to news and social media influences, which are in turn subject to commercial and other forces. Local press are in decline, creating a "news gap". Research shows a contrary relationship between UK regions' economic dependence on EU membership and their voting in the 2016 UK EU membership referendum, raising questions about local awareness. We dr…

    Submitted 23 June, 2020; v1 submitted 18 February, 2019; originally announced February 2019.

  38. arXiv:1902.01752  [pdf, other]

    cs.CY

    Partisanship, Propaganda and Post-Truth Politics: Quantifying Impact in Online Debate

    Authors: Genevieve Gorrell, Mehmet E. Bakir, Ian Roberts, Mark A. Greenwood, Benedetta Iavarone, Kalina Bontcheva

    Abstract: The recent past has highlighted the influential role of social networks and online media in shaping public debate on current affairs and political issues. This paper is focused on studying the role of politically-motivated actors and their strategies for influencing and manipulating public opinion online: partisan media, state-backed propaganda, and post-truth politics. In particular, we present q…

    Submitted 17 February, 2020; v1 submitted 5 February, 2019; originally announced February 2019.

    Comments: This is now published in the Journal of Web Science. Please cite accordingly. https://webscience-journal.net/webscience/article/view/84

    Journal ref: Journal of Web Science 2019(7)

  39. arXiv:1809.06943  [pdf]

    cs.IR cs.SI

    Argumentation Mining: Exploiting Multiple Sources and Background Knowledge

    Authors: Anastasios Lytos, Thomas Lagkas, Panagiotis Sarigiannidis, Kalina Bontcheva

    Abstract: The field of Argumentation Mining has arisen from the need to determine the underlying causes of an expressed opinion and the urgency to develop the established fields of Opinion Mining and Sentiment Analysis. The recent progress in the wider field of Artificial Intelligence in combination with the available data through Social Web has created great potential for every sub-field of Natural Lang…

    Submitted 18 September, 2018; originally announced September 2018.

    Comments: 12th Annual South-East European Doctoral Student Conference (DSC2018), ISBN: 978-960-9416-20-7, pp. 66-74, Thessaloniki, Greece, May 2018

  40. arXiv:1809.06683  [pdf, other]

    cs.CL

    RumourEval 2019: Determining Rumour Veracity and Support for Rumours

    Authors: Genevieve Gorrell, Kalina Bontcheva, Leon Derczynski, Elena Kochkina, Maria Liakata, Arkaitz Zubiaga

    Abstract: This is the proposal for RumourEval-2019, which will run in early 2019 as part of that year's SemEval event. Since the first RumourEval shared task in 2017, interest in automated claim validation has greatly increased, as the dangers of "fake news" have become a mainstream concern. Yet automated support for rumour checking remains in its infancy. For this reason, it is important that a shared task…

    Submitted 18 September, 2018; originally announced September 2018.

  41. arXiv:1804.01498  [pdf, other]

    cs.CY

    Online Abuse of UK MPs in 2015 and 2017: Perpetrators, Targets, and Topics

    Authors: Genevieve Gorrell, Mark Greenwood, Ian Roberts, Diana Maynard, Kalina Bontcheva

    Abstract: Concerns have reached the mainstream about how social media are affecting political outcomes. One trajectory for this is the exposure of politicians to online abuse. In this paper we use 1.4 million tweets from the months before the 2015 and 2017 UK general elections to explore the abuse directed at politicians. This collection allows us to look at abuse broken down by both party and gender and ai…

    Submitted 4 April, 2018; originally announced April 2018.

    Comments: 10 pages, 12 figures

  42. arXiv:1801.09633  [pdf, other]

    cs.CL

    Helping Crisis Responders Find the Informative Needle in the Tweet Haystack

    Authors: Leon Derczynski, Kenny Meesters, Kalina Bontcheva, Diana Maynard

    Abstract: Crisis responders are increasingly using social media, data and other digital sources of information to build a situational understanding of a crisis situation in order to design an effective response. However with the increased availability of such data, the challenge of identifying relevant information from it also increases. This paper presents a successful automatic approach to handling this p…

    Submitted 29 January, 2018; originally announced January 2018.

    Journal ref: Proc. 15th International Conference on Information Systems for Crisis Response and Management (ISCRAM), 2018, pp. 649-662. ISBN 9780692127605

  43. Discourse-Aware Rumour Stance Classification in Social Media Using Sequential Classifiers

    Authors: Arkaitz Zubiaga, Elena Kochkina, Maria Liakata, Rob Procter, Michal Lukasik, Kalina Bontcheva, Trevor Cohn, Isabelle Augenstein

    Abstract: Rumour stance classification, defined as classifying the stance of specific social media posts into one of supporting, denying, querying or commenting on an earlier post, is becoming of increasing interest to researchers. While most previous work has focused on using individual tweets as classifier inputs, here we report on the performance of sequential classifiers that exploit the discourse featu…

    Submitted 6 December, 2017; originally announced December 2017.

    Journal ref: Information Processing & Management, Volume 54, Issue 2, March 2018, Pages 273-290

  44. arXiv:1708.05286  [pdf, other]

    cs.CL

    Simple Open Stance Classification for Rumour Analysis

    Authors: Ahmet Aker, Leon Derczynski, Kalina Bontcheva

    Abstract: Stance classification determines the attitude, or stance, in a (typically short) text. The task has powerful applications, such as the detection of fake news or the automatic extraction of attitudes toward entities or events in the media. This paper describes a surprisingly simple and efficient classification approach to open stance classification in Twitter, for rumour and veracity classification…

    Submitted 14 September, 2017; v1 submitted 17 August, 2017; originally announced August 2017.

    Journal ref: In RANLP 2017

  45. arXiv:1708.04592  [pdf, ps, other]

    cs.CL cs.AI cs.IR

    Gold Standard Online Debates Summaries and First Experiments Towards Automatic Summarization of Online Debate Data

    Authors: Nattapong Sanchan, Ahmet Aker, Kalina Bontcheva

    Abstract: Usage of online textual media is steadily increasing. Daily, more and more news stories, blog posts and scientific articles are added to the online volumes. These are all freely accessible and have been employed extensively in multiple research areas, e.g. automatic text summarization, information retrieval, information extraction, etc. Meanwhile, online debate forums have recently become popular,…

    Submitted 15 August, 2017; originally announced August 2017.

    Comments: accepted and presented at the CICLING 2017 - 18th International Conference on Intelligent Text Processing and Computational Linguistics

  46. arXiv:1708.04587  [pdf, other]

    cs.CL cs.AI cs.IR

    Automatic Summarization of Online Debates

    Authors: Nattapong Sanchan, Ahmet Aker, Kalina Bontcheva

    Abstract: Debate summarization is one of the novel and challenging research areas in automatic text summarization which has been largely unexplored. In this paper, we develop a debate summarization pipeline to summarize key topics which are discussed or argued in the two opposing sides of online debates. We view that the generation of debate summaries can be achieved by clustering, cluster labeling, and vis…

    Submitted 15 August, 2017; originally announced August 2017.

    Comments: Accepted and to be published in Natural Language Processing and Information Retrieval workshop, Recent Advances in Natural Language Processing 2017 (RANLP 2017)

  47. arXiv:1704.05972  [pdf, ps, other]

    cs.CL cs.AI

    SemEval-2017 Task 8: RumourEval: Determining rumour veracity and support for rumours

    Authors: Leon Derczynski, Kalina Bontcheva, Maria Liakata, Rob Procter, Geraldine Wong Sak Hoi, Arkaitz Zubiaga

    Abstract: Media is full of false claims. Even Oxford Dictionaries named "post-truth" as the word of 2016. This makes it more important than ever to build systems that can identify the veracity of a story, and the kind of discourse there is around it. RumourEval is a SemEval shared task that aims to identify and handle rumours and reactions to them, in text. We present an annotation scheme, a large dataset c…

    Submitted 19 April, 2017; originally announced April 2017.

  48. arXiv:1704.00656  [pdf, other]

    cs.CL cs.HC cs.IR cs.SI

    Detection and Resolution of Rumours in Social Media: A Survey

    Authors: Arkaitz Zubiaga, Ahmet Aker, Kalina Bontcheva, Maria Liakata, Rob Procter

    Abstract: Despite the increasing use of social media platforms for information and news gathering, its unmoderated nature often leads to the emergence and spread of rumours, i.e. pieces of information that are unverified at the time of posting. At the same time, the openness of social media platforms provides opportunities to study how users share and discuss rumours, and to explore how natural language pro…

    Submitted 3 April, 2018; v1 submitted 3 April, 2017; originally announced April 2017.

    Comments: ACM Computing Surveys

    Journal ref: ACM Computing Surveys 51, 2, Article 32 (February 2018), 36 pages

  49. arXiv:1701.02877  [pdf, other]

    cs.CL

    Generalisation in Named Entity Recognition: A Quantitative Analysis

    Authors: Isabelle Augenstein, Leon Derczynski, Kalina Bontcheva

    Abstract: Named Entity Recognition (NER) is a key NLP task, which is all the more challenging on Web and user-generated content with their diverse and continuously changing language. This paper aims to quantify how this diversity impacts state-of-the-art NER methods, by measuring named entity (NE) and context variability, feature sparsity, and their effects on precision and recall. In particular, our findin…

    Submitted 7 March, 2017; v1 submitted 11 January, 2017; originally announced January 2017.

    Comments: Preprint, accepted to Computer Speech and Language

  50. arXiv:1609.01962  [pdf, other]

    cs.CL cs.IR cs.SI

    Using Gaussian Processes for Rumour Stance Classification in Social Media

    Authors: Michal Lukasik, Kalina Bontcheva, Trevor Cohn, Arkaitz Zubiaga, Maria Liakata, Rob Procter

    Abstract: Social media tend to be rife with rumours while new reports are released piecemeal during breaking news. Interestingly, one can mine multiple reactions expressed by social media users in those situations, exploring their stance towards rumours, ultimately enabling the flagging of highly disputed rumours as being potentially false. In this work, we set out to develop an automated, supervised classi…

    Submitted 7 September, 2016; originally announced September 2016.