Showing 1–50 of 53 results for author: Greene, D

Searching in archive cs.
  1. arXiv:2408.02253  [pdf, other]

    cs.CL

    Advancing Post-OCR Correction: A Comparative Study of Synthetic Data

    Authors: Shuhao Guan, Derek Greene

    Abstract: This paper explores the application of synthetic data in the post-OCR domain on multiple fronts by conducting experiments to assess the impact of data volume, augmentation, and synthetic data generation methods on model performance. Furthermore, we introduce a novel algorithm that leverages computer vision feature detection algorithms to calculate glyph similarity for constructing post-OCR synthet…

    Submitted 13 August, 2024; v1 submitted 5 August, 2024; originally announced August 2024.

    Comments: ACL 2024 findings

  2. arXiv:2406.04244  [pdf, other]

    cs.CL

    Benchmark Data Contamination of Large Language Models: A Survey

    Authors: Cheng Xu, Shuhao Guan, Derek Greene, M-Tahar Kechadi

    Abstract: The rapid development of Large Language Models (LLMs) like GPT-4, Claude-3, and Gemini has transformed the field of natural language processing. However, it has also resulted in a significant issue known as Benchmark Data Contamination (BDC). This occurs when language models inadvertently incorporate evaluation benchmark information from their training data, leading to inaccurate or unreliable per…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 31 pages, 7 figures, 3 tables

  3. arXiv:2406.03921  [pdf, other]

    cs.SI

    Knowledge Transfer, Knowledge Gaps, and Knowledge Silos in Citation Networks

    Authors: Eoghan Cunningham, Derek Greene

    Abstract: The advancement of science relies on the exchange of ideas across disciplines and the integration of diverse knowledge domains. However, tracking knowledge flows and interdisciplinary integration in rapidly evolving, multidisciplinary fields remains a significant challenge. This work introduces a novel network analysis framework to study the dynamics of knowledge transfer directly from citation da…

    Submitted 6 June, 2024; originally announced June 2024.

  4. arXiv:2403.19011  [pdf, other]

    q-bio.QM cs.LG

    Sequential Inference of Hospitalization Electronic Health Records Using Probabilistic Models

    Authors: Alan D. Kaplan, Priyadip Ray, John D. Greene, Vincent X. Liu

    Abstract: In the dynamic hospital setting, decision support can be a valuable tool for improving patient outcomes. Data-driven inference of future outcomes is challenging in this dynamic setting, where long sequences such as laboratory tests and medications are updated frequently. This is due in part to heterogeneity of data types and mixed-sequence types contained in variable length sequences. In this work…

    Submitted 24 April, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

  5. arXiv:2401.11198  [pdf, other]

    cs.IR

    A Deep Learning Approach for Selective Relevance Feedback

    Authors: Suchana Datta, Debasis Ganguly, Sean MacAvaney, Derek Greene

    Abstract: Pseudo-relevance feedback (PRF) can enhance average retrieval effectiveness over a sufficiently large number of queries. However, PRF often introduces a drift into the original information need, thus hurting the retrieval effectiveness of several queries. While a selective application of PRF can potentially alleviate this issue, previous approaches have largely relied on unsupervised or feature-ba…

    Submitted 20 January, 2024; originally announced January 2024.

  6. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  7. RecPrompt: A Prompt Tuning Framework for News Recommendation Using Large Language Models

    Authors: Dairui Liu, Boming Yang, Honghui Du, Derek Greene, Aonghus Lawlor, Ruihai Dong, Irene Li

    Abstract: In the evolving field of personalized news recommendation, understanding the semantics of the underlying data is crucial. Large Language Models (LLMs) like GPT-4 have shown promising performance in understanding natural language. However, the extent of their applicability in news recommendation systems remains to be validated. This paper introduces RecPrompt, the first framework for news recommend…

    Submitted 9 August, 2024; v1 submitted 16 December, 2023; originally announced December 2023.

    Comments: 8 pages, 3 figures, and 8 tables

  8. arXiv:2309.14984  [pdf, other]

    cs.IR cs.DL

    The Role of Document Embedding in Research Paper Recommender Systems: To Breakdown or to Bolster Disciplinary Borders?

    Authors: Eoghan Cunningham, Derek Greene, Barry Smyth

    Abstract: In the extensive recommender systems literature, novelty and diversity have been identified as key properties of useful recommendations. However, these properties have received limited attention in the specific sub-field of research paper recommender systems. In this work, we argue for the importance of offering novel and diverse research paper recommendations to scientists. This approach aims to…

    Submitted 26 September, 2023; originally announced September 2023.

    Comments: Under Review at Scientometrics

  9. Handwriting Analysis on the Diaries of Rosamond Jacob

    Authors: Sharmistha S. Sawant, Saloni D. Thakare, Derek Greene, Gerardine Meaney, Alan F. Smeaton

    Abstract: Handwriting is an art form that most people learn at an early age. Each person's writing style is unique with small changes as we grow older and as our mood changes. Here we analyse handwritten text in a culturally significant personal diary. We compare changes in handwriting and relate this to the sentiment of the written material and to the topic of diary entries. We identify handwritten text fr…

    Submitted 16 August, 2023; originally announced August 2023.

    Comments: International Conference on Content-based Multimedia Indexing, September 20--22, 2023, Orleans, France

  10. Curatr: A Platform for Semantic Analysis and Curation of Historical Literary Texts

    Authors: Susan Leavy, Gerardine Meaney, Karen Wade, Derek Greene

    Abstract: The increasing availability of digital collections of historical and contemporary literature presents a wealth of possibilities for new research in the humanities. The scale and diversity of such collections, however, present particular challenges in identifying and extracting relevant content. This paper presents Curatr, an online platform for the exploration and curation of literature with machi…

    Submitted 13 June, 2023; originally announced June 2023.

    Comments: 12 pages

    Journal ref: Metadata and Semantic Research (MTSR 2019), Communications in Computer and Information Science, vol 1057. Springer, Cham

  11. Topic-Centric Explanations for News Recommendation

    Authors: Dairui Liu, Derek Greene, Irene Li, Xuefei Jiang, Ruihai Dong

    Abstract: News recommender systems (NRS) have been widely applied for online news websites to help users find relevant articles based on their interests. Recent methods have demonstrated considerable success in terms of recommendation performance. However, the lack of explanation for these recommendations can lead to mistrust among users and lack of acceptance of recommendations. To address this issue, we p…

    Submitted 6 October, 2023; v1 submitted 12 June, 2023; originally announced June 2023.

    Comments: 20 pages

    Journal ref: ACM Trans. Recomm. Syst. 1, 1, Article 1 (January 2024), 26 pages.

  12. arXiv:2304.00310  [pdf, other]

    cs.IR

    On the Feasibility and Robustness of Pointwise Evaluation of Query Performance Prediction

    Authors: Suchana Datta, Debasis Ganguly, Derek Greene, Mandar Mitra

    Abstract: Despite the retrieval effectiveness of queries being mutually independent of one another, the evaluation of query performance prediction (QPP) systems has been carried out by measuring rank correlation over an entire set of queries. Such a listwise approach has a number of disadvantages, notably that it does not support the common requirement of assessing QPP for individual queries. In this paper,…

    Submitted 1 April, 2023; originally announced April 2023.

  13. arXiv:2303.08954  [pdf, other]

    cs.CL

    PRESTO: A Multilingual Dataset for Parsing Realistic Task-Oriented Dialogs

    Authors: Rahul Goel, Waleed Ammar, Aditya Gupta, Siddharth Vashishtha, Motoki Sano, Faiz Surani, Max Chang, HyunJeong Choe, David Greene, Kyle He, Rattima Nitisaroj, Anna Trukhina, Shachi Paul, Pararth Shah, Rushin Shah, Zhou Yu

    Abstract: Research interest in task-oriented dialogs has increased as systems such as Google Assistant, Alexa and Siri have become ubiquitous in everyday life. However, the impact of academic research in this area has been limited by the lack of datasets that realistically capture the wide array of user pain points. To enable research on some of the more challenging aspects of parsing realistic conversation…

    Submitted 16 March, 2023; v1 submitted 15 March, 2023; originally announced March 2023.

    Comments: PRESTO v1 Release

  14. Graph Embedding for Mapping Interdisciplinary Research Networks

    Authors: Eoghan Cunningham, Derek Greene

    Abstract: Representation learning is the first step in automating tasks such as research paper recommendation, classification, and retrieval. Due to the accelerating rate of research publication, together with the recognised benefits of interdisciplinary research, systems that facilitate researchers in discovering and understanding relevant works from beyond their immediate school of knowledge are vital. Th…

    Submitted 20 March, 2023; v1 submitted 3 February, 2023; originally announced February 2023.

  15. arXiv:2212.08733  [pdf, other]

    cs.LG cs.AI cs.CV

    Counterfactual Explanations for Misclassified Images: How Human and Machine Explanations Differ

    Authors: Eoin Delaney, Arjun Pakrashi, Derek Greene, Mark T. Keane

    Abstract: Counterfactual explanations have emerged as a popular solution for the eXplainable AI (XAI) problem of elucidating the predictions of black-box deep-learning systems due to their psychological validity, flexibility across problem domains and proposed legal compliance. While over 100 counterfactual methods exist, claiming to generate plausible explanations akin to those preferred by people, few hav…

    Submitted 16 December, 2022; originally announced December 2022.

  16. arXiv:2206.03159  [pdf, other]

    cs.SI

    The Structure of Interdisciplinary Science: Uncovering and Explaining Roles in Citation Graphs

    Authors: Eoghan Cunningham, Derek Greene

    Abstract: Role discovery is the task of dividing the set of nodes on a graph into classes of structurally similar roles. Modern strategies for role discovery typically rely on graph embedding techniques, which are capable of recognising complex local structures. However, when working with large, real-world networks, it is difficult to interpret or validate a set of roles identified according to these method…

    Submitted 7 June, 2022; originally announced June 2022.

    Comments: submitted to the international conference on complex networks and their applications

  17. arXiv:2204.07292  [pdf, other]

    cs.LG q-bio.QM

    Unsupervised Probabilistic Models for Sequential Electronic Health Records

    Authors: Alan D. Kaplan, John D. Greene, Vincent X. Liu, Priyadip Ray

    Abstract: We develop an unsupervised probabilistic model for heterogeneous Electronic Health Record (EHR) data. Utilizing a mixture model formulation, our approach directly models sequences of arbitrary length, such as medications and laboratory results. This allows for subgrouping and incorporation of the dynamics underlying heterogeneous data types. The model consists of a layered set of latent variables…

    Submitted 31 August, 2022; v1 submitted 14 April, 2022; originally announced April 2022.

  18. arXiv:2203.12504  [pdf, other]

    cs.DL cs.SI

    Author Multidisciplinarity and Disciplinary Roles in Field of Study Networks

    Authors: Eoghan Cunningham, Barry Smyth, Derek Greene

    Abstract: When studying large research corpora, "distant reading" methods are vital to understand the topics and trends in the corresponding research space. In particular, given the recognised benefits of multidisciplinary research, it may be important to map schools or communities of diverse research topics, and to understand the multidisciplinary role that topics play within and between these communities.…

    Submitted 23 March, 2022; originally announced March 2022.

  19. Assessing Network Representations for Identifying Interdisciplinarity

    Authors: Eoghan Cunningham, Derek Greene

    Abstract: Many studies have sought to identify interdisciplinary research as a function of the diversity of disciplines identified in an article's references or citations. However, given the constant evolution of the scientific landscape, disciplinary boundaries are shifting and blurring, making it increasingly difficult to describe research within a strict taxonomy. In this work, we explore the potential f…

    Submitted 8 April, 2022; v1 submitted 23 March, 2022; originally announced March 2022.

  20. A Novel Perspective to Look At Attention: Bi-level Attention-based Explainable Topic Modeling for News Classification

    Authors: Dairui Liu, Derek Greene, Ruihai Dong

    Abstract: Many recent deep learning-based solutions have widely adopted the attention-based mechanism in various tasks of the NLP discipline. However, the inherent characteristics of deep learning models and the flexibility of the attention mechanism increase the models' complexity, thus leading to challenges in model explainability. In this paper, to address this challenge, we propose a novel practical fra…

    Submitted 27 October, 2022; v1 submitted 14 March, 2022; originally announced March 2022.

    Comments: Findings of ACL2022

  21. arXiv:2202.07376  [pdf, other]

    cs.IR

    Deep-QPP: A Pairwise Interaction-based Deep Learning Model for Supervised Query Performance Prediction

    Authors: Suchana Datta, Debasis Ganguly, Derek Greene, Mandar Mitra

    Abstract: Motivated by the recent success of end-to-end deep neural models for ranking tasks, we present here a supervised end-to-end neural approach for query performance prediction (QPP). In contrast to unsupervised approaches that rely on various statistics of document score distributions, our approach is entirely data-driven. Further, in contrast to weakly supervised approaches, our method also does not…

    Submitted 15 February, 2022; originally announced February 2022.

  22. arXiv:2202.06306  [pdf, ps, other]

    cs.IR

    An Analysis of Variations in the Effectiveness of Query Performance Prediction

    Authors: Debasis Ganguly, Suchana Datta, Mandar Mitra, Derek Greene

    Abstract: A query performance predictor estimates the retrieval effectiveness of an IR system for a given query. An important characteristic of QPP evaluation is that, since the ground truth retrieval effectiveness for QPP evaluation can be measured with different metrics, the ground truth itself is not absolute, which is in contrast to other retrieval tasks, such as that of ad-hoc retrieval. Motivated by t…

    Submitted 13 February, 2022; originally announced February 2022.

  23. Collaboration in the Time of COVID: A Scientometric Analysis of Multidisciplinary SARS-CoV-2 Research

    Authors: Eoghan Cunningham, Barry Smyth, Derek Greene

    Abstract: The novel coronavirus SARS-CoV-2 and the COVID-19 illness it causes have inspired unprecedented levels of multidisciplinary research in an effort to address a generational public health challenge. In this work we conduct a scientometric analysis of COVID-19 research, paying particular attention to the nature of collaboration that this pandemic has fostered among different disciplines. Increased mu…

    Submitted 30 August, 2021; originally announced August 2021.

    Comments: Submitted to Humanities and Social Sciences Communications: accepted pending minor revisions

    Journal ref: Humanit Soc Sci Commun 8, 240 (2021)

  24. arXiv:2107.09734  [pdf, other]

    cs.LG cs.AI

    Uncertainty Estimation and Out-of-Distribution Detection for Counterfactual Explanations: Pitfalls and Solutions

    Authors: Eoin Delaney, Derek Greene, Mark T. Keane

    Abstract: Whilst an abundance of techniques have recently been proposed to generate counterfactual explanations for the predictions of opaque black-box systems, markedly less attention has been paid to exploring the uncertainty of these generated explanations. This becomes a critical issue in high-stakes scenarios, where uncertain and misleading explanations could have dire consequences (e.g., medical diagn…

    Submitted 20 July, 2021; originally announced July 2021.

    Journal ref: ICML Workshop on Algorithmic Recourse, July 2021

  25. arXiv:2104.14461  [pdf]

    cs.AI

    Twin Systems for DeepCBR: A Menagerie of Deep Learning and Case-Based Reasoning Pairings for Explanation and Data Augmentation

    Authors: Mark T Keane, Eoin M Kenny, Mohammed Temraz, Derek Greene, Barry Smyth

    Abstract: Recently, it has been proposed that fruitful synergies may exist between Deep Learning (DL) and Case Based Reasoning (CBR); that there are insights to be gained by applying CBR ideas to problems in DL (what could be called DeepCBR). In this paper, we report on a program of research that applies CBR solutions to the problem of Explainable AI (XAI) in the DL. We describe a series of twin-systems pai…

    Submitted 13 June, 2021; v1 submitted 29 April, 2021; originally announced April 2021.

    Comments: 7 pages, 4 figures, 2 tables

    Journal ref: IJCAI-21 Workshop on DL-CBR-AML, July 2021

  26. arXiv:2009.13211  [pdf, other]

    cs.LG stat.ML

    Instance-based Counterfactual Explanations for Time Series Classification

    Authors: Eoin Delaney, Derek Greene, Mark T. Keane

    Abstract: In recent years, there has been a rapidly expanding focus on explaining the predictions made by black-box AI systems that handle image and tabular data. However, considerably less attention has been paid to explaining the predictions of opaque AI systems handling time series data. In this paper, we advance a novel model-agnostic, case-based technique -- Native Guide -- that generates counterfactua…

    Submitted 24 June, 2021; v1 submitted 28 September, 2020; originally announced September 2020.

  27. arXiv:2008.05223  [pdf, other]

    physics.med-ph cs.LG eess.IV

    Bone Segmentation in Contrast Enhanced Whole-Body Computed Tomography

    Authors: Patrick Leydon, Martin O'Connell, Derek Greene, Kathleen M Curran

    Abstract: Segmentation of bone regions allows for enhanced diagnostics, disease characterisation and treatment monitoring in CT imaging. In contrast enhanced whole-body scans accurate automatic segmentation is particularly difficult as low dose whole body protocols reduce image quality and make contrast enhanced regions more difficult to separate when relying on differences in pixel intensities. This paper…

    Submitted 13 August, 2020; v1 submitted 12 August, 2020; originally announced August 2020.

    Comments: 15 pages, 10 figures and 3 tables. Submitted to The Journal of Physics in Medicine and Biology for possible publication

  28. arXiv:2005.06898  [pdf, other]

    cs.CL cs.LG

    Mitigating Gender Bias in Machine Learning Data Sets

    Authors: Susan Leavy, Gerardine Meaney, Karen Wade, Derek Greene

    Abstract: Artificial Intelligence has the capacity to amplify and perpetuate societal biases and presents profound ethical implications for society. Gender bias has been identified in the context of employment advertising and recruitment tools, due to their reliance on underlying language processing and recommendation algorithms. Attempts to address such issues have involved testing learned associations, in…

    Submitted 18 May, 2020; v1 submitted 14 May, 2020; originally announced May 2020.

    Comments: 10 pages, 5 figures, 5 tables. Presented at the Bias2020 workshop (as part of the ECIR Conference) - http://bias.disim.univaq.it

  29. arXiv:1908.05192  [pdf, other]

    cs.SI cs.LG

    Temporal Analysis of Reddit Networks via Role Embeddings

    Authors: Siobhan Grayson, Derek Greene

    Abstract: Inspired by diachronic word analysis from the field of natural language processing, we propose an approach for uncovering temporal insights regarding user roles from social networks using graph embedding methods. Specifically, we apply the role embedding algorithm, struc2vec, to a collection of social networks exhibiting either "loyal" or "vagrant" characteristics derived from the popular online s…

    Submitted 14 August, 2019; originally announced August 2019.

  30. arXiv:1810.05511  [pdf, other]

    cs.SI physics.soc-ph

    Semi-Supervised Overlapping Community Finding based on Label Propagation with Pairwise Constraints

    Authors: Elham Alghamdi, Derek Greene

    Abstract: Algorithms for detecting communities in complex networks are generally unsupervised, relying solely on the structure of the network. However, these methods can often fail to uncover meaningful groupings that reflect the underlying communities in the data, particularly when those structures are highly overlapping. One way to improve the usefulness of these algorithms is by incorporating additional…

    Submitted 21 November, 2018; v1 submitted 12 October, 2018; originally announced October 2018.

    Comments: Fix tables

  31. arXiv:1810.03046  [pdf, other]

    cs.SI cs.CY

    MeetupNet Dublin: Discovering Communities in Dublin's Meetup Network

    Authors: Arjun Pakrashi, Elham Alghamdi, Brian Mac Namee, Derek Greene

    Abstract: Meetup.com is a global online platform which facilitates the organisation of meetups in different parts of the world. A meetup group typically focuses on one specific topic of interest, such as sports, music, language, or technology. However, many users of this platform attend multiple meetups. On this basis, we can construct a co-membership network for a given location. This network encodes how p…

    Submitted 2 November, 2018; v1 submitted 6 October, 2018; originally announced October 2018.

  32. arXiv:1710.05212  [pdf]

    cs.CY

    On Supporting Digital Journalism: Case Studies in Co-Designing Journalistic Tools

    Authors: Georgiana Ifrim, Derek Greene, Mark T. Keane, Claudia Orellana-Rodriguez, Bichen Shi, Gevorg Poghosyan

    Abstract: Since 2013 researchers at University College Dublin in the Insight Centre for Data Analytics have been involved in a significant research programme in digital journalism, specifically targeting tools and social media guidelines to support the work of journalists. Most of this programme was undertaken in collaboration with The Irish Times. This collaboration involved identifying key problems curren…

    Submitted 14 October, 2017; originally announced October 2017.

    Comments: Computation + Journalism Symposium (C+J 2017), October 2017, Northwestern University, Evanston, IL USA

    Journal ref: Computation + Journalism Symposium (C+J 2017), October 2017, Northwestern University, Evanston, IL USA

  33. arXiv:1702.07186  [pdf, other]

    cs.IR cs.CL cs.LG stat.ML

    Stability of Topic Modeling via Matrix Factorization

    Authors: Mark Belford, Brian Mac Namee, Derek Greene

    Abstract: Topic models can provide us with an insight into the underlying latent structure of a large corpus of documents. A range of methods have been proposed in the literature, including probabilistic topic models and techniques based on matrix factorization. However, in both cases, standard implementations rely on stochastic elements in their initialization phase, which can potentially lead to different…

    Submitted 9 September, 2017; v1 submitted 23 February, 2017; originally announced February 2017.

  34. arXiv:1702.06891  [pdf, other]

    cs.CL

    EVE: Explainable Vector Based Embedding Technique Using Wikipedia

    Authors: M. Atif Qureshi, Derek Greene

    Abstract: We present an unsupervised explainable word embedding technique, called EVE, which is built upon the structure of Wikipedia. The proposed model defines the dimensions of a semantic vector representing a word using human-readable labels, thereby making it readily interpretable. Specifically, each vector is constructed using the Wikipedia category graph structure together with the Wikipedia article link st…

    Submitted 22 February, 2017; originally announced February 2017.

  35. arXiv:1607.03055  [pdf, other]

    cs.CL cs.CY

    Exploring the Political Agenda of the European Parliament Using a Dynamic Topic Modeling Approach

    Authors: Derek Greene, James P. Cross

    Abstract: This study analyzes the political agenda of the European Parliament (EP) plenary, how it has evolved over time, and the manner in which Members of the European Parliament (MEPs) have reacted to external and internal stimuli when making plenary speeches. To unveil the plenary agenda and detect latent themes in legislative speeches over time, MEP speech content is analyzed using a new dynamic topic…

    Submitted 11 July, 2016; originally announced July 2016.

    Comments: Long version including appendix. arXiv admin note: substantial text overlap with arXiv:1505.07302

  36. arXiv:1601.02975  [pdf, other]

    cs.CY cs.AI

    Indicators of Good Student Performance in Moodle Activity Data

    Authors: Ewa Młynarska, Derek Greene, Pádraig Cunningham

    Abstract: In this paper we conduct an analysis of Moodle activity data focused on identifying early predictors of good student performance. The analysis shows that three relevant hypotheses are largely supported by the data. These hypotheses are: early submission is a good sign, a high level of activity is predictive of good results and evening activity is even better than daytime activity. We highlight som…

    Submitted 12 January, 2016; originally announced January 2016.

    Comments: Short version

  37. arXiv:1508.01067  [pdf, other]

    cs.CL cs.IR

    Topic Stability over Noisy Sources

    Authors: Jing Su, Oisín Boydell, Derek Greene, Gerard Lynch

    Abstract: Topic modelling techniques such as LDA have recently been applied to speech transcripts and OCR output. These corpora may contain noisy or erroneous texts which may undermine topic stability. Therefore, it is important to know how well a topic modelling algorithm will perform when applied to noisy data. In this paper we show that different types of textual noise will have diverse effects on the st…

    Submitted 5 August, 2015; originally announced August 2015.

  38. arXiv:1505.07302  [pdf, other]

    cs.CL cs.CY

    Unveiling the Political Agenda of the European Parliament Plenary: A Topical Analysis

    Authors: Derek Greene, James P. Cross

    Abstract: This study analyzes political interactions in the European Parliament (EP) by considering how the political agenda of the plenary sessions has evolved over time and the manner in which Members of the European Parliament (MEPs) have reacted to external and internal stimuli when making Parliamentary speeches. It does so by considering the context in which speeches are made, and the content of those…

    Submitted 7 July, 2015; v1 submitted 27 May, 2015; originally announced May 2015.

    Comments: Add link to implementation code on Github

  39. arXiv:1502.04609  [pdf, other]

    cs.IR cs.HC

    TextLuas: Tracking and Visualizing Document and Term Clusters in Dynamic Text Data

    Authors: Derek Greene, Daniel Archambault, Václav Belák, Pádraig Cunningham

    Abstract: For large volumes of text data collected over time, a key knowledge discovery task is identifying and tracking clusters. These clusters may correspond to emerging themes, popular topics, or breaking news stories in a corpus. Therefore, recently there has been increased interest in the problem of clustering dynamic data. However, there exists little support for the interactive exploration of the ou…

    Submitted 3 November, 2014; originally announced February 2015.

    Comments: 21 page version

  40. arXiv:1407.7736  [pdf, ps, other]

    cs.SI cs.CL cs.CY physics.soc-ph

    A Latent Space Analysis of Editor Lifecycles in Wikipedia

    Authors: Xiangju Qin, Derek Greene, Pádraig Cunningham

    Abstract: Collaborations such as Wikipedia are a key part of the value of the modern Internet. At the same time there is concern that these collaborations are threatened by high levels of member turnover. In this paper we borrow ideas from topic analysis to map editor activity on Wikipedia over time into a latent space that offers an insight into the evolving patterns of editor behavior. This latent space repre…

    Submitted 29 July, 2014; originally announced July 2014.

    Comments: 16 pages, In Proc. of 5th International Workshop on Mining Ubiquitous and Social Environments (MUSE) at ECML/PKDD 2014

  41. arXiv:1404.4606  [pdf, other]

    cs.LG cs.CL cs.IR

    How Many Topics? Stability Analysis for Topic Models

    Authors: Derek Greene, Derek O'Callaghan, Pádraig Cunningham

    Abstract: Topic modeling refers to the task of discovering the underlying thematic structure in a text corpus, where the output is commonly presented as a report of the top terms appearing in each topic. Despite the diversity of topic modeling algorithms that have been proposed, a common challenge in successfully applying these techniques is the selection of an appropriate number of topics for a given corpu…

    Submitted 19 June, 2014; v1 submitted 16 April, 2014; originally announced April 2014.

    Comments: Improve readability of plots. Add minor clarifications

  42. arXiv:1403.2923  [pdf, ps, other]

    cs.IR cs.NE

    Adaptive Representations for Tracking Breaking News on Twitter

    Authors: Igor Brigadir, Derek Greene, Pádraig Cunningham

    Abstract: Twitter is often the most up-to-date source for finding and tracking breaking news stories. Therefore, there is considerable interest in developing filters for tweet streams in order to track and summarize stories. This is a non-trivial text analytics task as tweets are short, and standard retrieval methods often fail as stories evolve over time. In this paper we examine the effectiveness of adapt…

    Submitted 28 November, 2014; v1 submitted 12 March, 2014; originally announced March 2014.

    Comments: 8 Page

    ACM Class: I.5.4; I.5.1; H.3.3

  43. arXiv:1401.7535  [pdf, other]

    cs.SI cs.CY physics.soc-ph

    Online Social Media in the Syria Conflict: Encompassing the Extremes and the In-Betweens

    Authors: Derek O'Callaghan, Nico Prucha, Derek Greene, Maura Conway, Joe Carthy, Pádraig Cunningham

    Abstract: The Syria conflict has been described as the most socially mediated in history, with online social media playing a particularly important role. At the same time, the ever-changing landscape of the conflict leads to difficulties in applying analytical approaches taken by other studies of online political activism. Therefore, in this paper, we use an approach that does not require strong prior assum…

    Submitted 13 August, 2014; v1 submitted 29 January, 2014; originally announced January 2014.

    Comments: 8 pages, 3 figures, 3 tables. Minor changes including additional references

  44. arXiv:1308.6149  [pdf, other]

    cs.SI cs.CY physics.soc-ph

    The Extreme Right Filter Bubble

    Authors: Derek O'Callaghan, Derek Greene, Maura Conway, Joe Carthy, Pádraig Cunningham

    Abstract: Due to its status as the most popular video sharing platform, YouTube plays an important role in the online strategy of extreme right groups, where it is often used to host associated content such as music and other propaganda. In this paper, we develop a categorization suitable for the analysis of extreme right channels found on YouTube. By combining this with an NMF-based topic modelling method,…

    Submitted 28 August, 2013; originally announced August 2013.

    Comments: 10 pages, 7 figures

    ACM Class: H.3.3; H.3.5

  45. arXiv:1308.5125  [pdf, other]

    cs.SI physics.soc-ph

    Discovering Latent Patterns from the Analysis of User-Curated Movie Lists

    Authors: Derek Greene, Pádraig Cunningham

    Abstract: User content curation is becoming an important source of preference data, as well as providing information regarding the items being curated. One popular approach involves the creation of lists. On Twitter, these lists might contain accounts relevant to a particular topic, whereas on a community site such as the Internet Movie Database (IMDb), this might take the form of lists of movies sharing co…

    Submitted 23 August, 2013; originally announced August 2013.

    Comments: 13 pages

  46. arXiv:1306.3839  [pdf, other]

    cs.SI physics.soc-ph

    TwitterCrowds: Techniques for Exploring Topic and Sentiment in Microblogging Data

    Authors: Daniel Archambault, Derek Greene, Pádraig Cunningham

    Abstract: Analysts and social scientists in the humanities and industry require techniques to help visualize large quantities of microblogging data. Methods for the automated analysis of large scale social media data (on the order of tens of millions of tweets) are widely available, but few visualization techniques exist to support interactive exploration of the results. In this paper, we present extended d…

    Submitted 17 June, 2013; originally announced June 2013.

    Comments: 19 pages, colour figures

  47. arXiv:1302.1726  [pdf, other]

    cs.SI physics.soc-ph

    Uncovering the Wider Structure of Extreme Right Communities Spanning Popular Online Networks

    Authors: Derek O'Callaghan, Derek Greene, Maura Conway, Joe Carthy, Pádraig Cunningham

    Abstract: Recent years have seen increased interest in the online presence of extreme right groups. Although originally composed of dedicated websites, the online extreme right milieu now spans multiple networks, including popular social media platforms such as Twitter, Facebook and YouTube. Ideally therefore, any contemporary analysis of online extreme right activity requires the consideration of multiple…

    Submitted 16 May, 2013; v1 submitted 7 February, 2013; originally announced February 2013.

    Comments: 10 pages, 11 figures. Due to use of "sigchi" template, minor changes were made to ensure 10 page limit was not exceeded. Minor clarifications in Introduction, Data and Methodology sections

    ACM Class: J.4; H.2.8

  48. arXiv:1301.5809  [pdf, other]

    cs.SI physics.soc-ph

    Producing a Unified Graph Representation from Multiple Social Network Views

    Authors: Derek Greene, Pádraig Cunningham

    Abstract: In many social networks, several different link relations will exist between the same set of users. Additionally, attribute or textual information will be associated with those users, such as demographic details or user-generated content. For many data analysis tasks, such as community finding and data visualisation, the provision of multiple heterogeneous types of user data makes the analysis pro…

    Submitted 18 February, 2013; v1 submitted 24 January, 2013; originally announced January 2013.

    Comments: 13 pages. Clarify notation

  49. arXiv:1207.0017  [pdf, other]

    cs.SI physics.soc-ph

    Identifying Topical Twitter Communities via User List Aggregation

    Authors: Derek Greene, Derek O'Callaghan, Pádraig Cunningham

    Abstract: A particular challenge in the area of social media analysis is how to find communities within a larger network of social interactions. Here a community may be a group of microblogging users who post content on a coherent topic, or who are associated with a specific event or news story. Twitter provides the ability to curate users into lists, corresponding to meaningful topics or themes. Here we de…

    Submitted 29 June, 2012; originally announced July 2012.

  50. arXiv:1206.7050  [pdf, other]

    cs.SI cs.CY physics.soc-ph

    An Analysis of Interactions Within and Between Extreme Right Communities in Social Media

    Authors: Derek O'Callaghan, Derek Greene, Maura Conway, Joe Carthy, Pádraig Cunningham

    Abstract: Many extreme right groups have had an online presence for some time through the use of dedicated websites. This has been accompanied by increased activity in social media platforms in recent years, enabling the dissemination of extreme right content to a wider audience. In this paper, we present an analysis of the activity of a selection of such groups on Twitter, using network representations bas…

    Submitted 27 January, 2014; v1 submitted 29 June, 2012; originally announced June 2012.

    Comments: 20 pages, 6 figures. Additional topic analysis