Search | arXiv e-print repository

Evaluating Ensemble Methods for News Recommender Systems

Abstract: News recommendation is crucial for facilitating individuals' access to articles, particularly amid the increasingly digital landscape of news consumption. Consequently, extensive research is dedicated to News Recommender Systems (NRS) with increasingly sophisticated algorithms. Despite this sustained scholarly inquiry, there exists a notable research gap regarding the potential synergy achievable by amalgamating these algorithms to yield superior outcomes. This paper endeavours to address this gap by demonstrating how ensemble methods can be used to combine many diverse state-of-the-art algorithms to achieve superior results on the Microsoft News dataset (MIND). Additionally, we identify scenarios where ensemble methods fail to improve results and offer explanations for this occurrence. Our findings demonstrate that a combination of NRS algorithms can outperform individual algorithms, provided that the base learners are sufficiently diverse, with improvements of up to 5% observed for an ensemble consisting of a content-based BERT approach and the collaborative filtering LSTUR algorithm. Additionally, our results demonstrate the absence of any improvement when combining insufficiently distinct methods. These findings provide insight into successful approaches of ensemble methods in NRS and advocate for the development of better systems through appropriate ensemble solutions.

Submitted 23 June, 2024; originally announced June 2024.

arXiv:2406.14483 [pdf, other]

Valid Error Bars for Neural Weather Models using Conformal Prediction

Authors: Vignesh Gopakumar, Joel Oskarrson, Ander Gray, Lorenzo Zanisi, Stanislas Pamela, Daniel Giles, Matt Kusner, Marc Deisenroth

Abstract: Neural weather models have shown immense potential as inexpensive and accurate alternatives to physics-based models. However, most models trained to perform weather forecasting do not quantify the uncertainty associated with their forecasts. This limits the trust in the model and the usefulness of the forecasts. In this work we construct and formalise a conformal prediction framework as a post-processing method for estimating this uncertainty. The method is model-agnostic and gives calibrated error bounds for all variables, lead times and spatial locations. No modifications are required to the model and the computational cost is negligible compared to model training. We demonstrate the usefulness of the conformal prediction framework on a limited area neural weather model for the Nordic region. We further explore the advantages of the framework for deterministic and probabilistic models.

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2405.02350 [pdf, ps, other]

What makes Models Compositional? A Theoretical View: With Supplement

Authors: Parikshit Ram, Tim Klinger, Alexander G. Gray

Abstract: Compositionality is thought to be a key component of language, and various compositional benchmarks have been developed to empirically probe the compositional generalization of existing sequence processing models. These benchmarks often highlight failures of existing models, but it is not clear why these models fail in this way. In this paper, we seek to theoretically understand the role the compositional structure of the models plays in these failures and how this structure relates to their expressivity and sample complexity. We propose a general neuro-symbolic definition of compositional functions and their compositional complexity. We then show how various existing general and special purpose sequence processing models (such as recurrent, convolution and attention-based ones) fit this definition and use it to analyze their compositional complexity. Finally, we provide theoretical guarantees for the expressivity and systematic generalization of compositional models that explicitly depend on our proposed definition, and highlight factors which drive poor empirical performance.

Submitted 2 May, 2024; originally announced May 2024.

Comments: Extended version of the original IJCAI 2024 paper with detailed supplementary materials (27 pages, 7 figures)

arXiv:2403.16887 [pdf]

ChatGPT "contamination": estimating the prevalence of LLMs in the scholarly literature

Authors: Andrew Gray

Abstract: The use of ChatGPT and similar Large Language Model (LLM) tools in scholarly communication and academic publishing has been widely discussed since they became easily accessible to a general audience in late 2022. This study uses keywords known to be disproportionately present in LLM-generated text to provide an overall estimate for the prevalence of LLM-assisted writing in the scholarly literature. For the publishing year 2023, it is found that several of those keywords show a distinctive and disproportionate increase in their prevalence, individually and in combination. It is estimated that at least 60,000 papers (slightly over 1% of all articles) were LLM-assisted, though this number could be extended and refined by analysis of other characteristics of the papers or by identification of further indicative keywords.

Submitted 25 March, 2024; originally announced March 2024.

Comments: 12 pages, 6 figures

arXiv:2402.13440 [pdf, other]

A Neuro-Symbolic Approach to Multi-Agent RL for Interpretability and Probabilistic Decision Making

Authors: Chitra Subramanian, Miao Liu, Naweed Khan, Jonathan Lenchner, Aporva Amarnath, Sarathkrishna Swaminathan, Ryan Riegel, Alexander Gray

Abstract: Multi-agent reinforcement learning (MARL) is well-suited for runtime decision-making in optimizing the performance of systems where multiple agents coexist and compete for shared resources. However, applying common deep learning-based MARL solutions to real-world problems suffers from issues of interpretability, sample efficiency, partial observability, etc. To address these challenges, we present an event-driven formulation, where decision-making is handled by distributed co-operative MARL agents using neuro-symbolic methods. The recently introduced neuro-symbolic Logical Neural Networks (LNN) framework serves as a function approximator for the RL, to train a rules-based policy that is both logical and interpretable by construction. To enable decision-making under uncertainty and partial observability, we developed a novel probabilistic neuro-symbolic framework, Probabilistic Logical Neural Networks (PLNN), which combines the capabilities of logical reasoning with probabilistic graphical models. In PLNN, the upward/downward inference strategy, inherited from LNN, is coupled with belief bounds by setting the activation function for the logical operator associated with each neural network node to a probability-respecting generalization of the Fréchet inequalities. These PLNN nodes form the unifying element that combines probabilistic logic and Bayes Nets, permitting inference for variables with unobserved states. We demonstrate our contributions by addressing key MARL challenges for power sharing in a system-on-chip application.

Submitted 20 February, 2024; originally announced February 2024.

ACM Class: I.2.6
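
For readers unfamiliar with the Fréchet inequalities mentioned in the abstract above, the following minimal sketch computes the classical bounds on the probability of a conjunction and a disjunction when only the two marginal probabilities are known. It shows the classical inequalities only, not the generalization used in PLNN, and the function names are hypothetical.

```python
from fractions import Fraction

def frechet_and(p_a, p_b):
    """Lower and upper bounds on P(A and B) given only P(A) and P(B)."""
    return max(0, p_a + p_b - 1), min(p_a, p_b)

def frechet_or(p_a, p_b):
    """Lower and upper bounds on P(A or B) given only P(A) and P(B)."""
    return max(p_a, p_b), min(1, p_a + p_b)

# Exact rational arithmetic keeps the bounds free of rounding noise.
p_a, p_b = Fraction(7, 10), Fraction(6, 10)
print("AND bounds:", frechet_and(p_a, p_b))
print("OR bounds:", frechet_or(p_a, p_b))
```

The bounds hold without any independence assumption, which is why they are a natural fit for belief intervals attached to logical operators.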

arXiv:2311.05967 [pdf, other]

doi 10.1088/1741-4326/ad313a

Plasma Surrogate Modelling using Fourier Neural Operators

Authors: Vignesh Gopakumar, Stanislas Pamela, Lorenzo Zanisi, Zongyi Li, Ander Gray, Daniel Brennand, Nitesh Bhatia, Gregory Stathopoulos, Matt Kusner, Marc Peter Deisenroth, Anima Anandkumar, JOREK Team, MAST Team

Abstract: Predicting plasma evolution within a Tokamak reactor is crucial to realizing the goal of sustainable fusion. Capabilities in forecasting the spatio-temporal evolution of plasma rapidly and accurately allow us to quickly iterate over design and control strategies on current Tokamak devices and future reactors. Modelling plasma evolution using numerical solvers is often expensive, consuming many hours on supercomputers, and hence, we need alternative inexpensive surrogate models. We demonstrate accurate predictions of plasma evolution both in simulation and experimental domains using deep learning-based surrogate modelling tools, viz., Fourier Neural Operators (FNO). We show that FNO has a speedup of six orders of magnitude over traditional solvers in predicting the plasma dynamics simulated from magnetohydrodynamic models, while maintaining a high accuracy (MSE in the normalised domain $\approx 10^{-5}$). Our modified version of the FNO is capable of solving multi-variable Partial Differential Equations (PDE), and can capture the dependence among the different variables in a single model. FNOs can also predict plasma evolution on real-world experimental data observed by the cameras positioned within the MAST Tokamak, i.e., cameras looking across the central solenoid and the divertor in the Tokamak. We show that FNOs are able to accurately forecast the evolution of plasma and have the potential to be deployed for real-time monitoring. We also illustrate their capability in forecasting the plasma shape, the locations of interactions of the plasma with the central solenoid and the divertor for the full (available) duration of the plasma shot within MAST. The FNO offers a viable alternative for surrogate modelling as it is quick to train and infer, and requires fewer data points, while being able to do zero-shot super-resolution and getting high-fidelity solutions.

Submitted 18 June, 2024; v1 submitted 10 November, 2023; originally announced November 2023.

Journal ref: Nucl. Fusion 64 056025 (2024)

arXiv:2309.16467 [pdf, other]

Compositional Program Generation for Few-Shot Systematic Generalization

Authors: Tim Klinger, Luke Liu, Soham Dan, Maxwell Crouse, Parikshit Ram, Alexander Gray

Abstract: Compositional generalization is a key ability of humans that enables us to learn new concepts from only a handful of examples. Neural machine learning models, including the now ubiquitous Transformers, struggle to generalize in this way, and typically require thousands of examples of a concept during training in order to generalize meaningfully. This difference in ability between humans and artificial neural architectures motivates this study on a neuro-symbolic architecture called the Compositional Program Generator (CPG). CPG has three key features: modularity, composition, and abstraction, in the form of grammar rules, that enable it to generalize both systematically to new concepts in a few-shot manner, as well as productively by length on various sequence-to-sequence language tasks. For each input, CPG uses a grammar of the input language and a parser to generate a parse in which each grammar rule is assigned its own unique semantic module, a probabilistic copy or substitution program. Instances with the same parse are always processed with the same composed modules, while those with different parses may be processed with different modules. CPG learns parameters for the modules and is able to learn the semantics for new rules and types incrementally, without forgetting or retraining on rules it's already seen. It achieves perfect generalization on both the SCAN and COGS benchmarks using just 14 examples for SCAN and 22 examples for COGS -- state-of-the-art accuracy with a 1000x improvement in sample efficiency.

Submitted 18 January, 2024; v1 submitted 28 September, 2023; originally announced September 2023.

Comments: 7 pages of text with 1 page of references

arXiv:2308.13292 [pdf, other]

A Bayesian Active Learning Approach to Comparative Judgement

Authors: Andy Gray, Alma Rahat, Tom Crick, Stephen Lindsay

Abstract: Assessment is a crucial part of education. Traditional marking is a source of inconsistencies and unconscious bias, placing a high cognitive load on the assessors. An approach to address these issues is comparative judgement (CJ). In CJ, the assessor is presented with a pair of items and is asked to select the better one. Following a series of comparisons, a rank is derived using a ranking model, for example, the Bradley-Terry model (BTM), based on the results. While CJ is considered a reliable method for marking, there are concerns around transparency, and the ideal number of pairwise comparisons to generate a reliable estimation of the rank order is not known. Additionally, there have been attempts to generate a method of selecting pairs that should be compared next in an informative manner, but some existing methods are known to have created their own bias within results, inflating the reliability metric used. As a result, a random selection approach is usually deployed. We propose a novel Bayesian approach to CJ (BCJ) for determining the ranks of compared items alongside a new way to select the pairs to present to the marker(s) using active learning (AL), addressing the key shortcomings of traditional CJ. Furthermore, we demonstrate how the entire approach may provide transparency by providing the user insights into how it is making its decisions and, at the same time, being more efficient. Results from our experiments confirm that the proposed BCJ combined with entropy-driven AL pair-selection method is superior to other alternatives. We also find that the more comparisons done, the more accurate BCJ becomes, which solves the issue the current method has of the model deteriorating if too many comparisons are performed. As our approach can generate the complete predicted rank distribution for an item, we also show how this can be utilised in devising a predicted grade, guided by the assessor.

Submitted 25 August, 2023; originally announced August 2023.

Comments: 16 pages

arXiv:2307.02689 [pdf, other]

Learning Symbolic Rules over Abstract Meaning Representations for Textual Reinforcement Learning

Authors: Subhajit Chaudhury, Sarathkrishna Swaminathan, Daiki Kimura, Prithviraj Sen, Keerthiram Murugesan, Rosario Uceda-Sosa, Michiaki Tatsubori, Achille Fokoue, Pavan Kapanipathi, Asim Munawar, Alexander Gray

Abstract: Text-based reinforcement learning agents have predominantly been neural network-based models with embeddings-based representation, learning uninterpretable policies that often do not generalize well to unseen games. On the other hand, neuro-symbolic methods, specifically those that leverage an intermediate formal representation, are gaining significant attention in language understanding tasks. This is because of their advantages, which include inherent interpretability, a lower requirement for training data, and generalizability to scenarios with unseen data. Therefore, in this paper, we propose a modular, NEuro-Symbolic Textual Agent (NESTA) that combines a generic semantic parser with a rule induction system to learn abstract interpretable rules as policies. Our experiments on established text-based game benchmarks show that the proposed NESTA method outperforms deep reinforcement learning-based techniques by achieving better generalization to unseen test games and learning from fewer training interactions.

Submitted 5 July, 2023; originally announced July 2023.

Comments: ACL 2023

arXiv:2306.15041 [pdf]

A Comparison of Neuroelectrophysiology Databases

Authors: Priyanka Subash, Alex Gray, Misque Boswell, Samantha L. Cohen, Rachael Garner, Sana Salehi, Calvary Fisher, Samuel Hobel, Satrajit Ghosh, Yaroslav Halchenko, Benjamin Dichter, Russell A. Poldrack, Chris Markiewicz, Dora Hermes, Arnaud Delorme, Scott Makeig, Brendan Behan, Alana Sparks, Stephen R Arnott, Zhengjia Wang, John Magnotti, Michael S. Beauchamp, Nader Pouratian, Arthur W. Toga, Dominique Duncan

Abstract: As data sharing has become more prevalent, three pillars - archives, standards, and analysis tools - have emerged as critical components in facilitating effective data sharing and collaboration. This paper compares four freely available intracranial neuroelectrophysiology data repositories: Data Archive for the BRAIN Initiative (DABI), Distributed Archives for Neurophysiology Data Integration (DANDI), OpenNeuro, and Brain-CODE. The aim of this review is to describe archives that provide researchers with tools to store, share, and reanalyze both human and non-human neurophysiology data based on criteria that are of interest to the neuroscientific community. The Brain Imaging Data Structure (BIDS) and Neurodata Without Borders (NWB) are utilized by these archives to make data more accessible to researchers by implementing a common standard. As the necessity for integrating large-scale analysis into data repository platforms continues to grow within the neuroscientific community, this article will highlight the various analytical and customizable tools developed within the chosen archives that may advance the field of neuroinformatics.

Submitted 30 August, 2023; v1 submitted 26 June, 2023; originally announced June 2023.

Comments: 22 pages, 6 figures, 5 tables

arXiv:2306.13906 [pdf, other]

doi 10.1145/3587102.3588792

Can GPT-4 Support Analysis of Textual Data in Tasks Requiring Highly Specialized Domain Expertise?

Authors: Jaromir Savelka, Kevin D. Ashley, Morgan A. Gray, Hannes Westermann, Huihui Xu

Abstract: We evaluated the capability of generative pre-trained transformers (GPT-4) in analysis of textual data in tasks that require highly specialized domain expertise. Specifically, we focused on the task of analyzing court opinions to interpret legal concepts. We found that GPT-4, prompted with annotation guidelines, performs on par with well-trained law student annotators. We observed that, with a relatively minor decrease in performance, GPT-4 can perform batch predictions leading to significant cost reductions. However, employing chain-of-thought prompting did not lead to noticeably improved performance on this task. Further, we demonstrated how to analyze GPT-4's predictions to identify and mitigate deficiencies in annotation guidelines, and subsequently improve the performance of the model. Finally, we observed that the model is quite brittle, as small formatting-related changes in the prompt had a high impact on the predictions. These findings can be leveraged by researchers and practitioners who engage in semantic/pragmatic annotations of texts in the context of the tasks requiring highly specialized domain expertise.

Submitted 24 June, 2023; originally announced June 2023.

Journal ref: ITiCSE 2023: Proceedings of the 2023 Conference on Innovation and Technology in Computer Science Education V. 1. June 2023. Pages 117 - 123

arXiv:2306.10452 [pdf, other]

MISMATCH: Fine-grained Evaluation of Machine-generated Text with Mismatch Error Types

Authors: Keerthiram Murugesan, Sarathkrishna Swaminathan, Soham Dan, Subhajit Chaudhury, Chulaka Gunasekara, Maxwell Crouse, Diwakar Mahajan, Ibrahim Abdelaziz, Achille Fokoue, Pavan Kapanipathi, Salim Roukos, Alexander Gray

Abstract: With the growing interest in large language models, the need for evaluating the quality of machine text compared to reference (typically human-generated) text has become a focus of attention. Most recent works focus either on task-specific evaluation metrics or study the properties of machine-generated text captured by the existing metrics. In this work, we propose a new evaluation scheme to model human judgments in 7 NLP tasks, based on the fine-grained mismatches between a pair of texts. Inspired by the recent efforts in several NLP tasks for fine-grained evaluation, we introduce a set of 13 mismatch error types such as spatial/geographic errors, entity errors, etc., to guide the model for better prediction of human judgments. We propose a neural framework for evaluating machine texts that uses these mismatch error types as auxiliary tasks and re-purposes the existing single-number evaluation metrics as additional scalar features, in addition to textual features extracted from the machine and reference texts. Our experiments reveal key insights about the existing metrics via the mismatch errors. We show that the mismatch errors between the sentence pairs on the held-out datasets from 7 NLP tasks align well with the human evaluation.

Submitted 17 June, 2023; originally announced June 2023.

Comments: Accepted at ACL 2023 (ACL Findings Long)

arXiv:2306.09525 [pdf, other]

Explaining Legal Concepts with Augmented Large Language Models (GPT-4)

Authors: Jaromir Savelka, Kevin D. Ashley, Morgan A. Gray, Hannes Westermann, Huihui Xu

Abstract: Interpreting the meaning of legal open-textured terms is a key task of legal professionals. An important source for this interpretation is how the term was applied in previous court cases. In this paper, we evaluate the performance of GPT-4 in generating factually accurate, clear and relevant explanations of terms in legislation. We compare the performance of a baseline setup, where GPT-4 is directly asked to explain a legal term, to an augmented approach, where a legal information retrieval module is used to provide relevant context to the model, in the form of sentences from case law. We found that the direct application of GPT-4 yields explanations that appear to be of very high quality on their surface. However, detailed analysis uncovered limitations in terms of the factual accuracy of the explanations. Further, we found that the augmentation leads to improved quality, and appears to eliminate the issue of hallucination, where models invent incorrect statements. These findings open the door to the building of systems that can autonomously retrieve relevant sentences from case law and condense them into a useful explanation for legal scholars, educators or practicing lawyers alike.

Submitted 22 June, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

arXiv:2305.20018 [pdf, other]

Scalable Learning of Latent Language Structure With Logical Offline Cycle Consistency

Authors: Maxwell Crouse, Ramon Astudillo, Tahira Naseem, Subhajit Chaudhury, Pavan Kapanipathi, Salim Roukos, Alexander Gray

Abstract: We introduce Logical Offline Cycle Consistency Optimization (LOCCO), a scalable, semi-supervised method for training a neural semantic parser. Conceptually, LOCCO can be viewed as a form of self-learning where the semantic parser being trained is used to generate annotations for unlabeled text that are then used as new supervision. To increase the quality of annotations, our method utilizes a count-based prior over valid formal meaning representations and a cycle-consistency score produced by a neural text generation model as additional signals. Both the prior and semantic parser are updated in an alternate fashion from full passes over the training data, which can be seen as approximating the marginalization of latent structures through stochastic variational inference. The use of a count-based prior, frozen text generation model, and offline annotation process yields an approach with negligible complexity and latency increases as compared to conventional self-learning. As an added bonus, the annotations produced by LOCCO can be trivially repurposed to train a neural text generation model. We demonstrate the utility of LOCCO on the well-known WebNLG benchmark where we obtain an improvement of 2 points against a self-learning parser under equivalent conditions, an improvement of 1.3 points against the previous state-of-the-art parser, and competitive text generation performance in terms of BLEU score.

Submitted 31 May, 2023; originally announced May 2023.

arXiv:2305.15022 [pdf, other]

Hierarchical clustering with dot products recovers hidden tree structure

Authors: Annie Gray, Alexander Modell, Patrick Rubin-Delanchy, Nick Whiteley

Abstract: In this paper we offer a new perspective on the well established agglomerative clustering algorithm, focusing on recovery of hierarchical structure. We recommend a simple variant of the standard algorithm, in which clusters are merged by maximum average dot product and not, for example, by minimum distance or within-cluster variance. We demonstrate that the tree output by this algorithm provides a bona fide estimate of generative hierarchical structure in data, under a generic probabilistic graphical model. The key technical innovations are to understand how hierarchical information in this model translates into tree geometry which can be recovered from data, and to characterise the benefits of simultaneously growing sample size and data dimension. We demonstrate superior tree recovery performance with real data over existing approaches such as UPGMA, Ward's method, and HDBSCAN.

Submitted 1 March, 2024; v1 submitted 24 May, 2023; originally announced May 2023.
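
The merge rule described in the abstract above (merge the pair of clusters with the maximum average dot product) can be sketched in a few lines. This is a naive illustration rather than the authors' code: it assumes the average is taken over all cross-cluster pairs of points, recomputes every linkage value at each step, and uses hypothetical names (dot, average_dot, dot_product_linkage) and toy data.

```python
from itertools import combinations

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def average_dot(points, cluster_a, cluster_b):
    """Mean dot product over all cross-cluster pairs of points."""
    total = sum(dot(points[i], points[j]) for i in cluster_a for j in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))

def dot_product_linkage(points):
    """Agglomerative clustering that merges by maximum average dot product.

    Clusters are tuples of point indices. Returns the list of merges in
    the order they were performed, which defines the output tree.
    """
    clusters = [(i,) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        a, b = max(
            combinations(clusters, 2),
            key=lambda pair: average_dot(points, pair[0], pair[1]),
        )
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
        merges.append((a, b))
    return merges

# Toy data: two points along the first axis, two along the second.
points = [(2, 0), (2, 1), (0, 1), (0, 2)]
for left, right in dot_product_linkage(points):
    print("merge", left, "+", right)
```

The only change relative to standard average linkage is that similarity (dot product) is maximised instead of distance being minimised; a practical implementation would cache the pairwise dot products rather than recomputing them.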

arXiv:2301.10414 [pdf, other]

Towards a Unification of Logic and Information Theory

Authors: Luis A. Lastras, Barry Trager, Jonathan Lenchner, Wojtek Szpankowski, Chai Wah Wu, Mark Squillante, Alex Gray

Abstract: This article introduces a theory of communication that covers the following generic scenario: Alice knows more than Bob about a certain set of logic propositions and Alice and Bob wish to communicate as efficiently as possible with the shared goal that, following their communication, Bob should be able to deduce a particular logic proposition that Alice knows to be true. We assume that our logic system is propositional logic, and we build on top of one of the legendary works in this area, namely the work of Carnap and Bar-Hillel on a theory of semantic information. Our main contribution is a collection of theorems studying various different assumptions on what Alice and Bob know and what their goal is. These theorems all provide sharp upper and lower bounds phrased in terms of an entropy-like function that we call $Λ$, in reference to its apparent connection to problems of communication involving logic. It turns out that when the goal is to communicate only a portion of the knowledge that Alice possesses, the optimum communication cost is lower than most people seem to assume, yet unavoidably, such optimum communication strategies end up allowing Bob to prove even more things than originally intended. Another interesting outcome is that in some scenarios, Alice need not know the logic statements that Bob knows in order to attain asymptotically the same communication efficiency as if she knew the statement, in a nod to the famous Slepian-Wolf and Wyner-Ziv results from source coding theory. Our work also introduces practical codes, which are comprised of a combination of linear codes and enumerative source codes, which turn out to be asymptotically optimal for some scenarios.

Submitted 16 April, 2024; v1 submitted 25 January, 2023; originally announced January 2023.

arXiv:2301.05131 [pdf, other]

Toward Theoretical Guidance for Two Common Questions in Practical Cross-Validation based Hyperparameter Selection

Authors: Parikshit Ram, Alexander G. Gray, Horst C. Samulowitz, Gregory Bramble

Abstract: We show, to our knowledge, the first theoretical treatments of two common questions in cross-validation based hyperparameter selection: (1) After selecting the best hyperparameter using a held-out set, we train the final model using {\em all} of the training data -- since this may or may not improve future generalization error, should one do this? (2) During optimization such as via SGD (stochastic gradient descent), we must set the optimization tolerance $ρ$ -- since it trades off predictive accuracy with computation cost, how should one set it? Toward these problems, we introduce the {\em hold-in risk} (the error due to not using the whole training data), and the {\em model class mis-specification risk} (the error due to having chosen the wrong model class) in a theoretical view which is simple, general, and suggests heuristics that can be used when faced with a dataset instance. In proof-of-concept studies in synthetic data where theoretical quantities can be controlled, we show that these heuristics can, respectively, (1) always perform at least as well as always performing retraining or never performing retraining, (2) either improve performance or reduce computational overhead by $2\times$ with no loss in predictive performance.

Submitted 12 January, 2023; originally announced January 2023.

Comments: Extended version of the paper appearing at the SIAM International Conference on Data Mining 2023 (SDM23)

arXiv:2208.11665 [pdf, other]

Statistical exploration of the Manifold Hypothesis

Authors: Nick Whiteley, Annie Gray, Patrick Rubin-Delanchy

Abstract: The Manifold Hypothesis is a widely accepted tenet of Machine Learning which asserts that nominally high-dimensional data are in fact concentrated near a low-dimensional manifold, embedded in high-dimensional space. This phenomenon is observed empirically in many real world situations, has led to development of a wide range of statistical methods in the last few decades, and has been suggested as a key factor in the success of modern AI technologies. We show that rich and sometimes intricate manifold structure in data can emerge from a generic and remarkably simple statistical model -- the Latent Metric Model -- via elementary concepts such as latent variables, correlation and stationarity. This establishes a general statistical explanation for why the Manifold Hypothesis seems to hold in so many situations. Informed by the Latent Metric Model we derive procedures to discover and interpret the geometry of high-dimensional data, and explore hypotheses about the data generating mechanism. These procedures operate under minimal assumptions and make use of well known, scaleable graph-analytic algorithms.

Submitted 9 February, 2024; v1 submitted 24 August, 2022; originally announced August 2022.

MSC Class: 62R20; 62R40; 62G05; 62G20; 62R07; 62-08; 62H25; 62H30

arXiv:2204.01805 [pdf]

Using Elo Rating as a Metric for Comparative Judgement in Educational Assessment

Authors: Andy Gray, Alma Rahat, Tom Crick, Stephen Lindsay, Darren Wallace

Abstract: Marking and feedback are essential features of teaching and learning, across the overwhelming majority of educational settings and contexts. However, it can take a great deal of time and effort for teachers to mark assessments, and to provide useful feedback to the students. Furthermore, it also creates a significant cognitive load on the assessors, especially in ensuring fairness and equity. Therefore, an alternative approach to marking called comparative judgement (CJ), inspired by the law of comparative judgment (LCJ), has been proposed in the educational space. This pairwise comparison for as many pairs as possible can then be used to rank all submissions. Studies suggest that CJ is highly reliable and accurate while making it quick for the teachers. Alternative studies have questioned this claim, suggesting that the process can increase bias in the results as the same submission is shown many times to an assessor for increasing reliability. Additionally, studies have also found that CJ can result in the overall marking process taking longer than a more traditional method of marking as information about many pairs must be collected. In this paper, we investigate Elo, which has been extensively used in rating players in zero-sum games such as chess. We experimented on a large-scale Twitter dataset on the topic of a recent major UK political event ("Brexit", the UK's political exit from the European Union) to ask users which tweet they found funnier between a pair selected from ten tweets. Our analysis of the data reveals that the Elo rating is statistically significantly similar to the CJ ranking with a Kendall's tau score of 0.96 and a p-value of 1.5x10^(-5). We finish with an informed discussion regarding the potential wider application of this approach to a range of educational contexts.

Submitted 4 April, 2022; originally announced April 2022.

Comments: 12 pages, 4 figures, one table, pre-review version
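
As an illustration of the Elo idea mentioned in the abstract above, the following is a minimal sketch of the standard chess-style Elo update applied to pairwise "which item won" judgements. It is not the authors' implementation: the function names, the starting rating of 1500, and the K-factor of 32 are assumptions chosen for illustration only.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expected score of A against B (logistic curve, base 10, scale 400)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a, rating_b, a_won, k=32.0):
    """Return the new (rating_a, rating_b) after one pairwise comparison.

    k is the K-factor; 32 is an assumed, commonly used value.
    """
    delta = k * ((1.0 if a_won else 0.0) - expected_score(rating_a, rating_b))
    # Zero-sum update: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


def rank_from_comparisons(items, comparisons, start=1500.0, k=32.0):
    """Rank items from a sequence of (winner, loser) pairs.

    Hypothetical helper: every item starts at the same assumed rating.
    """
    ratings = {item: start for item in items}
    for winner, loser in comparisons:
        ratings[winner], ratings[loser] = update_elo(
            ratings[winner], ratings[loser], True, k
        )
    ordered = sorted(items, key=lambda item: ratings[item], reverse=True)
    return ordered, ratings


if __name__ == "__main__":
    judgements = [("t1", "t2"), ("t1", "t3"), ("t2", "t3")]
    order, ratings = rank_from_comparisons(["t1", "t2", "t3"], judgements)
    print(order, ratings)
```

Because each update is zero-sum, the total rating across all items stays constant, and the resulting order can be compared against another ranking with a rank correlation such as Kendall's tau.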

arXiv:2201.05793 [pdf, other]

A Benchmark for Generalizable and Interpretable Temporal Question Answering over Knowledge Bases

Authors: Sumit Neelam, Udit Sharma, Hima Karanam, Shajith Ikbal, Pavan Kapanipathi, Ibrahim Abdelaziz, Nandana Mihindukulasooriya, Young-Suk Lee, Santosh Srivastava, Cezar Pendus, Saswati Dana, Dinesh Garg, Achille Fokoue, G P Shrivatsa Bhargav, Dinesh Khandelwal, Srinivas Ravishankar, Sairam Gurajada, Maria Chang, Rosario Uceda-Sosa, Salim Roukos, Alexander Gray, Guilherme Lima, Ryan Riegel, Francois Luus, L Venkata Subramaniam

Abstract: Knowledge Base Question Answering (KBQA) tasks that involve complex reasoning are emerging as an important research direction. However, most existing KBQA datasets focus primarily on generic multi-hop reasoning over explicit facts, largely ignoring other reasoning types such as temporal, spatial, and taxonomic reasoning. In this paper, we present a benchmark dataset for temporal reasoning, TempQA-WD, to encourage research in extending the present approaches to target a more challenging set of complex reasoning tasks. Specifically, our benchmark is a temporal question answering dataset with the following advantages: (a) it is based on Wikidata, which is the most frequently curated, openly available knowledge base, (b) it includes intermediate SPARQL queries to facilitate the evaluation of semantic parsing based approaches for KBQA, and (c) it generalizes to multiple knowledge bases: Freebase and Wikidata. The TempQA-WD dataset is available at https://github.com/IBM/tempqa-wd.

Submitted 15 January, 2022; originally announced January 2022.

Comments: 7 pages, 2 figures, 7 tables. arXiv admin note: substantial text overlap with arXiv:2109.13430

arXiv:2112.07051 [pdf]

doi 10.1093/database/baac035

A Simple Standard for Sharing Ontological Mappings (SSSOM)

Authors: Nicolas Matentzoglu, James P. Balhoff, Susan M. Bello, Chris Bizon, Matthew Brush, Tiffany J. Callahan, Christopher G Chute, William D. Duncan, Chris T. Evelo, Davera Gabriel, John Graybeal, Alasdair Gray, Benjamin M. Gyori, Melissa Haendel, Henriette Harmse, Nomi L. Harris, Ian Harrow, Harshad Hegde, Amelia L. Hoyt, Charles T. Hoyt, Dazhi Jiao, Ernesto Jiménez-Ruiz, Simon Jupp, Hyeongsik Kim, Sebastian Koehler , et al. (19 additional authors not shown)

Abstract: Despite progress in the development of standards for describing and exchanging scientific information, the lack of easy-to-use standards for mapping between different representations of the same or similar objects in different databases poses a major impediment to data integration and interoperability. Mappings often lack the metadata needed to be correctly interpreted and applied. For example, are two terms equivalent or merely related? Are they narrow or broad matches? Are they associated in some other way? Such relationships between the mapped terms are often not documented, leading to incorrect assumptions and making them hard to use in scenarios that require a high degree of precision (such as diagnostics or risk prediction). Also, the lack of descriptions of how mappings were done makes it hard to combine and reconcile mappings, particularly curated and automated ones. The Simple Standard for Sharing Ontological Mappings (SSSOM) addresses these problems by: 1. Introducing a machine-readable and extensible vocabulary to describe metadata that makes imprecision, inaccuracy and incompleteness in mappings explicit. 2. Defining an easy to use table-based format that can be integrated into existing data science pipelines without the need to parse or query ontologies, and that integrates seamlessly with Linked Data standards. 3. Implementing open and community-driven collaborative workflows designed to evolve the standard continuously to address changing requirements and mapping practices. 4. Providing reference tools and software libraries for working with the standard. In this paper, we present the SSSOM standard, describe several use cases, and survey some existing work on standardizing the exchange of mappings, with the goal of making mappings Findable, Accessible, Interoperable, and Reusable (FAIR). The SSSOM specification is at http://w3id.org/sssom/spec.

Submitted 13 December, 2021; originally announced December 2021.

Comments: Corresponding author: Christopher J. Mungall <cjmungall@lbl.gov>

arXiv:2112.03324 [pdf, other]

Neuro-Symbolic Inductive Logic Programming with Logical Neural Networks

Authors: Prithviraj Sen, Breno W. S. R. de Carvalho, Ryan Riegel, Alexander Gray

Abstract: Recent work on neuro-symbolic inductive logic programming has led to promising approaches that can learn explanatory rules from noisy, real-world data. While some proposals approximate logical operators with differentiable operators from fuzzy or real-valued logic that are parameter-free thus diminishing their capacity to fit the data, other approaches are only loosely based on logic making it difficult to interpret the learned "rules". In this paper, we propose learning rules with the recently proposed logical neural networks (LNN). Compared to others, LNNs offer strong connection to classical Boolean logic thus allowing for precise interpretation of learned rules while harboring parameters that can be trained with gradient-based optimization to effectively fit the data. We extend LNNs to induce rules in first-order logic. Our experiments on standard benchmarking tasks confirm that LNN rules are highly interpretable and can achieve comparable or higher accuracy due to their flexible parameterization.

Submitted 6 December, 2021; originally announced December 2021.

arXiv:2110.10973 [pdf, other]

LOA: Logical Optimal Actions for Text-based Interaction Games

Authors: Daiki Kimura, Subhajit Chaudhury, Masaki Ono, Michiaki Tatsubori, Don Joven Agravante, Asim Munawar, Akifumi Wachi, Ryosuke Kohita, Alexander Gray

Abstract: We present Logical Optimal Actions (LOA), an action decision architecture of reinforcement learning applications with a neuro-symbolic framework which is a combination of neural network and symbolic knowledge acquisition approach for natural language interaction games. The demonstration for LOA experiments consists of a web-based interactive platform for text-based games and visualization for acquired knowledge for improving interpretability for trained rules. This demonstration also provides a comparison module with other neuro-symbolic approaches as well as non-symbolic state-of-the-art agent models on the same text-based games. Our LOA also provides open-sourced implementation in Python for the reinforcement learning environment to facilitate an experiment for studying neuro-symbolic agents. Code: https://github.com/ibm/loa

Submitted 21 October, 2021; originally announced October 2021.

Comments: ACL-IJCNLP 2021 (demo paper)

arXiv:2110.10963 [pdf, other]

Neuro-Symbolic Reinforcement Learning with First-Order Logic

Authors: Daiki Kimura, Masaki Ono, Subhajit Chaudhury, Ryosuke Kohita, Akifumi Wachi, Don Joven Agravante, Michiaki Tatsubori, Asim Munawar, Alexander Gray

Abstract: Deep reinforcement learning (RL) methods often require many trials before convergence, and no direct interpretability of trained policies is provided. In order to achieve fast convergence and interpretability for the policy in RL, we propose a novel RL method for text-based games with a recent neuro-symbolic framework called Logical Neural Network, which can learn symbolic and interpretable rules in its differentiable network. The method first extracts first-order logical facts from text observations and an external word meaning network (ConceptNet), then trains a policy in the network with directly interpretable logical operators. Our experimental results show that RL training with the proposed method converges significantly faster than other state-of-the-art neuro-symbolic methods in a TextWorld benchmark.

Submitted 21 October, 2021; originally announced October 2021.

Comments: EMNLP 2021 (main conference)

arXiv:2110.01295 [pdf, other]

SPaR.txt, a cheap Shallow Parsing approach for Regulatory texts

Authors: Ruben Kruiper, Ioannis Konstas, Alasdair Gray, Farhad Sadeghineko, Richard Watson, Bimal Kumar

Abstract: Automated Compliance Checking (ACC) systems aim to semantically parse building regulations to a set of rules. However, semantic parsing is known to be hard and requires large amounts of training data. The complexity of creating such training data has led to research that focuses on small sub-tasks, such as shallow parsing or the extraction of a limited subset of rules. This study introduces a shallow parsing task for which training data is relatively cheap to create, with the aim of learning a lexicon for ACC. We annotate a small domain-specific dataset of 200 sentences, SPaR.txt, and train a sequence tagger that achieves 79.93 F1-score on the test set. We then show through manual evaluation that the model identifies most (89.84%) defined terms in a set of building regulation documents, and that both contiguous and discontiguous Multi-Word Expressions (MWE) are discovered with reasonable accuracy (70.3%).

Submitted 4 October, 2021; originally announced October 2021.

Comments: To be published in the NLLP workshop at EMNLP 2021, 9 pages (15 including reference and appendices). For the ScotReg corpus, SPaR.txt dataset and code see: http://github.com/rubenkruiper/SPaR.txt

arXiv:2109.13430 [pdf, other]

SYGMA: System for Generalizable Modular Question Answering Over Knowledge Bases

Authors: Sumit Neelam, Udit Sharma, Hima Karanam, Shajith Ikbal, Pavan Kapanipathi, Ibrahim Abdelaziz, Nandana Mihindukulasooriya, Young-Suk Lee, Santosh Srivastava, Cezar Pendus, Saswati Dana, Dinesh Garg, Achille Fokoue, G P Shrivatsa Bhargav, Dinesh Khandelwal, Srinivas Ravishankar, Sairam Gurajada, Maria Chang, Rosario Uceda-Sosa, Salim Roukos, Alexander Gray, Guilherme Lima, Ryan Riegel, Francois Luus, L Venkata Subramaniam

Abstract: Knowledge Base Question Answering (KBQA) tasks that involve complex reasoning are emerging as an important research direction. However, most KBQA systems struggle with generalizability, particularly on two dimensions: (a) across multiple reasoning types, where both datasets and systems have primarily focused on multi-hop reasoning, and (b) across multiple knowledge bases, where KBQA approaches are specifically tuned to a single knowledge base. In this paper, we present SYGMA, a modular approach facilitating generalizability across multiple knowledge bases and multiple reasoning types. Specifically, SYGMA contains three high level modules: 1) a KB-agnostic question understanding module that is common across KBs, 2) rules to support additional reasoning types, and 3) a KB-specific question mapping and answering module to address the KB-specific aspects of the answer extraction. We demonstrate the effectiveness of our system by evaluating on datasets belonging to two distinct knowledge bases, DBpedia and Wikidata. In addition, to demonstrate extensibility to additional reasoning types, we evaluate on multi-hop reasoning datasets and a new Temporal KBQA benchmark dataset on Wikidata, named TempQA-WD, introduced in this paper. We show that our generalizable approach has better competitive performance on multiple datasets on DBpedia and Wikidata that require both multi-hop and temporal reasoning.

Submitted 27 September, 2021; originally announced September 2021.

arXiv:2109.12240 [pdf, other]

Logical Credal Networks

Authors: Haifeng Qian, Radu Marinescu, Alexander Gray, Debarun Bhattacharjya, Francisco Barahona, Tian Gao, Ryan Riegel, Pravinda Sahu

Abstract: This paper introduces Logical Credal Networks, an expressive probabilistic logic that generalizes many prior models that combine logic and probability. Given imprecise information represented by probability bounds and conditional probability bounds of logic formulas, this logic specifies a set of probability distributions over all interpretations. On the one hand, our approach allows propositional and first-order logic formulas with few restrictions, e.g., without requiring acyclicity. On the other hand, it has a Markov condition similar to Bayesian networks and Markov random fields that is critical in real-world applications. Having both these properties makes this logic unique, and we investigate its performance on maximum a posteriori inference tasks, including solving Mastermind games with uncertainty and detecting credit card fraud. The results show that the proposed method outperforms existing approaches, and its advantage lies in aggregating multiple sources of imprecise information.

Submitted 24 September, 2021; originally announced September 2021.

arXiv:2109.09566 [pdf, other]

Combining Rules and Embeddings via Neuro-Symbolic AI for Knowledge Base Completion

Authors: Prithviraj Sen, Breno W. S. R. Carvalho, Ibrahim Abdelaziz, Pavan Kapanipathi, Francois Luus, Salim Roukos, Alexander Gray

Abstract: Recent interest in Knowledge Base Completion (KBC) has led to a plethora of approaches based on reinforcement learning, inductive logic programming and graph embeddings. In particular, rule-based KBC has led to interpretable rules while being comparable in performance with graph embeddings. Even within rule-based KBC, there exist different approaches that lead to rules of varying quality and previous work has not always been precise in highlighting these differences. Another issue that plagues most rule-based KBC is the non-uniformity of relation paths: some relation sequences occur in very few paths while others appear very frequently. In this paper, we show that not all rule-based KBC models are the same and propose two distinct approaches that learn in one case: 1) a mixture of relations and the other 2) a mixture of paths. When implemented on top of neuro-symbolic AI, which learns rules by extending Boolean logic to real-valued logic, the latter model leads to superior KBC accuracy outperforming state-of-the-art rule-based KBC by 2-10% in terms of mean reciprocal rank. Furthermore, to address the non-uniformity of relation paths, we combine rule-based KBC with graph embeddings thus improving our results even further and achieving the best of both worlds.

Submitted 16 September, 2021; originally announced September 2021.

arXiv:2106.13367 [pdf, other]

SeaNet -- Towards A Knowledge Graph Based Autonomic Management of Software Defined Networks

Authors: Qianru Zhou, Alasdair J. G. Gray, Stephen McLaughlin

Abstract: Automatic network management driven by Artificial Intelligence technologies has been heatedly discussed for decades. However, current reports mainly focus on theoretical proposals and architecture designs; work on practical implementations on real-life networks is yet to appear. This paper presents our effort toward the implementation of a knowledge graph driven approach for autonomic network management in software defined networks (SDNs), termed SeaNet. Driven by the ToCo ontology, SeaNet is reprogrammed based on Mininet (an SDN emulator). It consists of three core components: a knowledge graph generator, a SPARQL engine, and a network management API. The knowledge graph generator represents the knowledge in telecommunication network management tasks as a formally represented, ontology-driven model. Expert experience and network management rules can be formalized into the knowledge graph and automatically inferred by the SPARQL engine, and the network management API is able to package technology-specific details and expose technology-independent interfaces to users. Experiments are carried out to evaluate the proposed work by comparing it with Ryu, a commercial SDN controller implemented in the same language, Python. The evaluation results show that SeaNet is considerably faster than Ryu in most circumstances and that the SeaNet code is significantly more compact. Benefiting from RDF reasoning, SeaNet is able to achieve O(1) time complexity on different scales of the knowledge graph, while a traditional database can achieve O(n log n) at best. With the developed network management API, SeaNet enables researchers to develop semantic-intelligent applications on their own SDNs.

Submitted 27 May, 2022; v1 submitted 24 June, 2021; originally announced June 2021.

arXiv:2106.09795 [pdf, other]

LNN-EL: A Neuro-Symbolic Approach to Short-text Entity Linking

Authors: Hang Jiang, Sairam Gurajada, Qiuhao Lu, Sumit Neelam, Lucian Popa, Prithviraj Sen, Yunyao Li, Alexander Gray

Abstract: Entity linking (EL), the task of disambiguating mentions in text by linking them to entities in a knowledge graph, is crucial for text understanding, question answering or conversational systems. Entity linking on short text (e.g., single sentence or question) poses particular challenges due to limited context. While prior approaches use either heuristics or black-box neural methods, here we propose LNN-EL, a neuro-symbolic approach that combines the advantages of using interpretable rules based on first-order logic with the performance of neural learning. Even though constrained to using rules, LNN-EL performs competitively against SotA black-box neural approaches, with the added benefits of extensibility and transferability. In particular, we show that we can easily blend existing rule templates given by a human expert, with multiple types of features (priors, BERT encodings, box embeddings, etc.), and even scores resulting from previous EL methods, thus improving on such methods. For instance, on the LC-QuAD-1.0 dataset, we show more than 4% increase in F1 score over previous SotA. Finally, we show that the inductive bias offered by using logic results in learned rules that transfer well across datasets, even without fine tuning, while maintaining high accuracy.

Submitted 17 June, 2021; originally announced June 2021.

Comments: Accepted to ACL 2021

arXiv:2106.01260 [pdf, other]

Matrix factorisation and the interpretation of geodesic distance

Authors: Nick Whiteley, Annie Gray, Patrick Rubin-Delanchy

Abstract: Given a graph or similarity matrix, we consider the problem of recovering a notion of true distance between the nodes, and so their true positions. We show that this can be accomplished in two steps: matrix factorisation, followed by nonlinear dimension reduction. This combination is effective because the point cloud obtained in the first step lives close to a manifold in which latent distance is encoded as geodesic distance. Hence, a nonlinear dimension reduction tool, approximating geodesic distance, can recover the latent positions, up to a simple transformation. We give a detailed account of the case where spectral embedding is used, followed by Isomap, and provide encouraging experimental evidence for other combinations of techniques.

Submitted 22 September, 2022; v1 submitted 2 June, 2021; originally announced June 2021.

MSC Class: 62G05; 62H20; 62H12; 62H30

arXiv:2103.02363 [pdf, other]

Reinforcement Learning with External Knowledge by using Logical Neural Networks

Authors: Daiki Kimura, Subhajit Chaudhury, Akifumi Wachi, Ryosuke Kohita, Asim Munawar, Michiaki Tatsubori, Alexander Gray

Abstract: Conventional deep reinforcement learning methods are sample-inefficient and usually require a large number of training trials before convergence. Since such methods operate on an unconstrained action set, they can lead to useless actions. A recent neuro-symbolic framework called Logical Neural Networks (LNNs) can simultaneously provide key properties of both neural networks and symbolic logic. An LNN functions as an end-to-end differentiable network that minimizes a novel contradiction loss to learn interpretable rules. In this paper, we utilize LNNs to define an inference graph using basic logical operations, such as AND and NOT, for faster convergence in reinforcement learning. Specifically, we propose an integrated method that enables model-free reinforcement learning from external knowledge sources in an LNN-based logically constrained framework, such as action shielding and guidance. Our results empirically demonstrate that our method converges faster than a model-free reinforcement learning method that doesn't have such logical constraints.

Submitted 3 March, 2021; originally announced March 2021.

Comments: KBRL Workshop at IJCAI-PRICAI 2020
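
The action shielding mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' LNN-based implementation: the rule is a hand-written Python predicate rather than a learned logical formula, and the names `shield`, `choose_action`, and `door_rule`, as well as the state keys, are hypothetical.

```python
import random


def shield(actions, state, rules):
    """Keep only the actions that every rule allows in this state."""
    return [a for a in actions if all(rule(state, a) for rule in rules)]


def choose_action(q_values, state, rules, epsilon=0.1, rng=random):
    """Epsilon-greedy selection restricted to the shielded action set."""
    allowed = shield(list(q_values), state, rules)
    if not allowed:
        # Fall back to the full action set if the rules forbid everything.
        allowed = list(q_values)
    if rng.random() < epsilon:
        return rng.choice(allowed)
    return max(allowed, key=q_values.get)


def door_rule(state, action):
    """Hypothetical rule: 'open door' needs door_present AND NOT door_open."""
    if action != "open door":
        return True
    return state["door_present"] and not state["door_open"]


q = {"open door": 5.0, "go north": 1.0}
state = {"door_present": False, "door_open": False}
best = choose_action(q, state, [door_rule], epsilon=0.0)
```

Here the shield removes "open door" even though it has the highest value estimate, so the agent never spends a trial on an action the external knowledge marks as useless.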

arXiv:2103.00418 [pdf, other]

Logic Embeddings for Complex Query Answering

Authors: Francois Luus, Prithviraj Sen, Pavan Kapanipathi, Ryan Riegel, Ndivhuwo Makondo, Thabang Lebese, Alexander Gray

Abstract: Answering logical queries over incomplete knowledge bases is challenging because: 1) it calls for implicit link prediction, and 2) brute force answering of existential first-order logic queries is exponential in the number of existential variables. Recent work on query embeddings provides fast querying, but most approaches model set logic with closed regions, so lack negation. Query embeddings that do support negation use densities that suffer drawbacks: they 1) only improvise logic, 2) use expensive distributions, and 3) poorly model answer uncertainty. In this paper, we propose Logic Embeddings, a new approach to embedding complex queries that uses Skolemisation to eliminate existential variables for efficient querying. It supports negation, but improves on density approaches: it 1) integrates well-studied t-norm logic and directly evaluates satisfiability, 2) simplifies modeling with truth values, and 3) models uncertainty with truth bounds. Logic Embeddings are competitively fast and accurate in query answering over large, incomplete knowledge graphs, outperform on negation queries, and in particular, provide improved modeling of answer uncertainty as evidenced by a superior correlation between answer set size and embedding entropy.

Submitted 28 February, 2021; originally announced March 2021.

Comments: IBM Research

arXiv:2012.01707 [pdf, other]

Leveraging Abstract Meaning Representation for Knowledge Base Question Answering

Authors: Pavan Kapanipathi, Ibrahim Abdelaziz, Srinivas Ravishankar, Salim Roukos, Alexander Gray, Ramon Astudillo, Maria Chang, Cristina Cornelio, Saswati Dana, Achille Fokoue, Dinesh Garg, Alfio Gliozzo, Sairam Gurajada, Hima Karanam, Naweed Khan, Dinesh Khandelwal, Young-Suk Lee, Yunyao Li, Francois Luus, Ndivhuwo Makondo, Nandana Mihindukulasooriya, Tahira Naseem, Sumit Neelam, Lucian Popa, Revanth Reddy, et al. (5 additional authors not shown)

Abstract: Knowledge base question answering (KBQA) is an important task in Natural Language Processing. Existing approaches face significant challenges including complex question understanding, necessity for reasoning, and lack of large end-to-end training datasets. In this work, we propose Neuro-Symbolic Question Answering (NSQA), a modular KBQA system that leverages (1) Abstract Meaning Representation (AMR) parses for task-independent question understanding; (2) a simple yet effective graph transformation approach to convert AMR parses into candidate logical queries that are aligned to the KB; (3) a pipeline-based approach which integrates multiple, reusable modules that are trained specifically for their individual tasks (semantic parser, entity and relationship linkers, and neuro-symbolic reasoner) and do not require end-to-end training data. NSQA achieves state-of-the-art performance on two prominent KBQA datasets based on DBpedia (QALD-9 and LC-QuAD 1.0). Furthermore, our analysis emphasizes that AMR is a powerful tool for KBQA systems.

Submitted 2 June, 2021; v1 submitted 3 December, 2020; originally announced December 2020.

Comments: Accepted to Findings of ACL

arXiv:2011.05624 [pdf, other]

SARA -- A Semantic Access Point Resource Allocation Service for Heterogenous Wireless Networks

Authors: Qianru Zhou, Alasdair J. G. Gray, Dimitrios Pezaros, Stephen McLaughlin

Abstract: In this paper, we present SARA, a Semantic Access point Resource Allocation service for heterogeneous wireless networks with various wireless access technologies existing together. By automatically reasoning on the knowledge base of the full system provided by a knowledge based autonomic network management system -- SEANET, SARA selects the access point providing the best quality of service among the different access technologies. Based on an ontology assisted knowledge based system SEANET, SARA can also adapt the access point selection strategy according to customer defined rules automatically. Results of our evaluation based on emulated networks with hybrid access technologies and various scales show that SARA is able to improve the channel condition, in terms of throughput, evidently. Comparisons with current AP selection algorithms demonstrate that SARA outperforms the existing AP selection algorithms. The overhead in terms of time expense is reasonable and is shown to be faster than traditional access point selection approaches.

Submitted 11 November, 2020; originally announced November 2020.

Comments: 2019 IEEE Wireless Day

arXiv:2009.07726 [pdf, other]

doi 10.1007/978-3-030-62419-4_23

Leveraging Semantic Parsing for Relation Linking over Knowledge Bases

Authors: Nandana Mihindukulasooriya, Gaetano Rossiello, Pavan Kapanipathi, Ibrahim Abdelaziz, Srinivas Ravishankar, Mo Yu, Alfio Gliozzo, Salim Roukos, Alexander Gray

Abstract: Knowledgebase question answering systems are heavily dependent on relation extraction and linking modules. However, the task of extracting and linking relations from text to knowledgebases faces two primary challenges: the ambiguity of natural language and lack of training data. To overcome these challenges, we present SLING, a relation linking framework which leverages semantic parsing using Abstract Meaning Representation (AMR) and distant supervision. SLING integrates multiple relation linking approaches that capture complementary signals such as linguistic cues, rich semantic representation, and information from the knowledgebase. The experiments on relation linking using three KBQA datasets (QALD-7, QALD-9, and LC-QuAD 1.0) demonstrate that the proposed approach achieves state-of-the-art performance on all benchmarks.

Submitted 16 September, 2020; originally announced September 2020.

Comments: Accepted at the 19th International Semantic Web Conference (ISWC 2020)

MSC Class: 68T35 ACM Class: I.2.7; I.2.4

arXiv:2008.02429 [pdf, ps, other]

Foundations of Reasoning with Uncertainty via Real-valued Logics

Authors: Ronald Fagin, Ryan Riegel, Alexander Gray

Abstract: Real-valued logics underlie an increasing number of neuro-symbolic approaches, though typically their logical inference capabilities are characterized only qualitatively. We provide foundations for establishing the correctness and power of such systems. We give a sound and strongly complete axiomatization that can be parametrized to cover essentially every real-valued logic, including all the common fuzzy logics. Our class of sentences is very rich, and each describes a set of possible real values for a collection of formulas of the real-valued logic, including which combinations of real values are possible. Strong completeness allows us to derive exactly what information can be inferred about the combinations of real values of a collection of formulas given information about the combinations of real values of several other collections of formulas. We then extend the axiomatization to deal with weighted subformulas. Finally, we give a decision procedure based on linear programming for deciding, for certain real-valued logics and under certain natural assumptions, whether a set of our sentences logically implies another of our sentences.

Submitted 30 August, 2022; v1 submitted 5 August, 2020; originally announced August 2020.

Comments: 12 pages (incl. references). To be submitted to PNAS

arXiv:2006.13155 [pdf, other]

Logical Neural Networks

Authors: Ryan Riegel, Alexander Gray, Francois Luus, Naweed Khan, Ndivhuwo Makondo, Ismail Yunus Akhalwaya, Haifeng Qian, Ronald Fagin, Francisco Barahona, Udit Sharma, Shajith Ikbal, Hima Karanam, Sumit Neelam, Ankita Likhyani, Santosh Srivastava

Abstract: We propose a novel framework seamlessly providing key properties of both neural nets (learning) and symbolic logic (knowledge and reasoning). Every neuron has a meaning as a component of a formula in a weighted real-valued logic, yielding a highly interpretable disentangled representation. Inference is omnidirectional rather than focused on predefined target variables, and corresponds to logical reasoning, including classical first-order logic theorem proving as a special case. The model is end-to-end differentiable, and learning minimizes a novel loss function capturing logical contradiction, yielding resilience to inconsistent knowledge. It also enables the open-world assumption by maintaining bounds on truth values which can have probabilistic semantics, yielding resilience to incomplete knowledge.

Submitted 23 June, 2020; originally announced June 2020.

Comments: 10 pages (incl. references), 38 pages supplementary, 7 figures, 9 tables, 6 algorithms. In submission to NeurIPS 2020
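
To make the idea of bounds on truth values concrete, here is a minimal sketch using plain (unweighted) Łukasiewicz connectives on `(lower, upper)` pairs. This is an assumption for illustration: the abstract describes a weighted real-valued logic and a learned contradiction loss, neither of which is reproduced here, and the function names are hypothetical.

```python
UNKNOWN = (0.0, 1.0)  # open-world default: nothing is known about the truth value


def l_and(a, b):
    """Lukasiewicz conjunction applied to lower and upper bounds."""
    return (max(0.0, a[0] + b[0] - 1.0), max(0.0, a[1] + b[1] - 1.0))


def l_or(a, b):
    """Lukasiewicz disjunction applied to lower and upper bounds."""
    return (min(1.0, a[0] + b[0]), min(1.0, a[1] + b[1]))


def l_not(a):
    """Negation swaps and flips the bounds."""
    return (1.0 - a[1], 1.0 - a[0])


def tighten(a, b):
    """Intersect two bound estimates for the same formula."""
    return (max(a[0], b[0]), min(a[1], b[1]))


def contradiction(a):
    """Amount by which the lower bound exceeds the upper bound (0 if consistent)."""
    return max(0.0, a[0] - a[1])


# A fact believed at least 0.75 true, combined with evidence that it is at most 0.25 true.
clash = tighten((0.75, 1.0), (0.0, 0.25))
```

A formula that stays at `UNKNOWN` is simply undetermined rather than false, which is the open-world behaviour the abstract refers to; crossed bounds such as `clash` signal inconsistent knowledge.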

arXiv:2006.09635 [pdf, other]

Solving Constrained CASH Problems with ADMM

Authors: Parikshit Ram, Sijia Liu, Deepak Vijaykeerthi, Dakuo Wang, Djallel Bouneffouf, Greg Bramble, Horst Samulowitz, Alexander G. Gray

Abstract: The CASH problem has been widely studied in the context of automated configurations of machine learning (ML) pipelines and various solvers and toolkits are available. However, CASH solvers do not directly handle black-box constraints such as fairness, robustness or other domain-specific custom constraints. We present our recent approach [Liu, et al., 2020] that leverages the ADMM optimization framework to decompose CASH into multiple small problems and demonstrate how ADMM facilitates incorporation of black-box constraints.

Submitted 10 July, 2020; v1 submitted 16 June, 2020; originally announced June 2020.

Comments: 7th ICML Workshop on Automated Machine Learning (2020)

arXiv:2006.09167 [pdf, other]

doi 10.1063/5.0018516

Heterogeneous Parallelization and Acceleration of Molecular Dynamics Simulations in GROMACS

Authors: Szilárd Páll, Artem Zhmurov, Paul Bauer, Mark Abraham, Magnus Lundborg, Alan Gray, Berk Hess, Erik Lindahl

Abstract: The introduction of accelerator devices such as graphics processing units (GPUs) has had a profound impact on molecular dynamics simulations and has enabled order-of-magnitude performance advances using commodity hardware. To fully reap these benefits, it has been necessary to reformulate some of the most fundamental algorithms, including the Verlet list, pair searching and cut-offs. Here, we present the heterogeneous parallelization and acceleration design of molecular dynamics implemented in the GROMACS codebase over the last decade. The setup involves a general cluster-based approach to pair lists and non-bonded pair interactions that utilizes both GPUs and CPU SIMD acceleration efficiently, including the ability to load-balance tasks between CPUs and GPUs. The algorithm work efficiency is tuned for each type of hardware, and to use accelerators more efficiently we introduce dual pair lists with rolling pruning updates. Combined with new direct GPU-GPU communication as well as GPU integration, this enables excellent performance from single GPU simulations through strong scaling across multiple GPUs and efficient multi-node parallelization.

Submitted 7 September, 2020; v1 submitted 16 June, 2020; originally announced June 2020.

Comments: The following article has been submitted to the Journal of Chemical Physics

ACM Class: J.2; I.6.3

arXiv:1912.06723 [pdf, other]

doi 10.1145/3377325.3377538

AutoAIViz: Opening the Blackbox of Automated Artificial Intelligence with Conditional Parallel Coordinates

Authors: Daniel Karl I. Weidele, Justin D. Weisz, Eno Oduor, Michael Muller, Josh Andres, Alexander Gray, Dakuo Wang

Abstract: Artificial Intelligence (AI) can now automate the algorithm selection, feature engineering, and hyperparameter tuning steps in a machine learning workflow. Commonly known as AutoML or AutoAI, these technologies aim to relieve data scientists from the tedious manual work. However, today's AutoAI systems often present little or no information about the process of how they select and generate model results. Thus, users often do not understand the process, nor do they trust the outputs. In this short paper, we provide a first user evaluation by 10 data scientists of an experimental system, AutoAIViz, that aims to visualize AutoAI's model generation process. We find that the proposed system helps users to complete the data science tasks, and increases their understanding, toward the goal of increasing trust in the AutoAI system.

Submitted 17 January, 2020; v1 submitted 13 December, 2019; originally announced December 2019.

Comments: 5 pages, 1 figure, IUI2020

arXiv:1910.14436 [pdf, other]

How can AI Automate End-to-End Data Science?

Authors: Charu Aggarwal, Djallel Bouneffouf, Horst Samulowitz, Beat Buesser, Thanh Hoang, Udayan Khurana, Sijia Liu, Tejaswini Pedapati, Parikshit Ram, Ambrish Rawat, Martin Wistuba, Alexander Gray

Abstract: Data science is labor-intensive and human experts are scarce but heavily involved in every aspect of it. This makes data science time consuming and restricted to experts with the resulting quality heavily dependent on their experience and skills. To make data science more accessible and scalable, we need its democratization. Automated Data Science (AutoDS) is aimed towards that goal and is emerging as an important research and business topic. We introduce and define the AutoDS challenge, followed by a proposal of a general AutoDS framework that covers existing approaches but also provides guidance for the development of new methods. We categorize and review the existing literature from multiple aspects of the problem setup and employed techniques. Then we provide several views on how AI could succeed in automating end-to-end AutoDS. We hope this survey can serve as an insightful guideline for the AutoDS field and provide inspiration for future research.

Submitted 22 October, 2019; originally announced October 2019.

arXiv:1909.02309 [pdf, other]

doi 10.1145/3359313

Human-AI Collaboration in Data Science: Exploring Data Scientists' Perceptions of Automated AI

Authors: Dakuo Wang, Justin D. Weisz, Michael Muller, Parikshit Ram, Werner Geyer, Casey Dugan, Yla Tausczik, Horst Samulowitz, Alexander Gray

Abstract: The rapid advancement of artificial intelligence (AI) is changing our lives in many ways. One application domain is data science. New techniques in automating the creation of AI, known as AutoAI or AutoML, aim to automate the work practices of data scientists. AutoAI systems are capable of autonomously ingesting and pre-processing data, engineering new features, and creating and scoring models based on target objectives (e.g. accuracy or run-time efficiency). Though AutoAI is not yet widely adopted, we are interested in understanding how it will impact the practice of data science. We conducted interviews with 20 data scientists who work at a large, multinational technology company and practice data science in various business settings. Our goal is to understand their current work practices and how these practices might change with AutoAI. Reactions were mixed: while informants expressed concerns about the trend of automating their jobs, they also strongly felt it was inevitable. Despite these concerns, they remained optimistic about their future job security due to a view that the future of data science work will be a collaboration between humans and AI systems, in which both automation and human expertise are indispensable.

Submitted 5 September, 2019; originally announced September 2019.

arXiv:1908.06097 [pdf]

Performance report and optimized implementations of Weather & Climate dwarfs on multi-node systems

Authors: Louis Douriez, Alan Gray, David Guibert, Peter Messmer, Erwan Raffin

Abstract: This document is one of the deliverable reports created for the ESCAPE project. ESCAPE stands for Energy-efficient Scalable Algorithms for Weather Prediction at Exascale. The project develops world-class, extreme-scale computing capabilities for European operational numerical weather prediction and future climate models. This is done by identifying Weather & Climate dwarfs which are key patterns in terms of computation and communication (in the spirit of the Berkeley dwarfs). These dwarfs are then optimised for different hardware architectures (single and multi-node) and alternative algorithms are explored. Performance portability is addressed through the use of domain specific languages. Here we summarize the work performed on optimizations of the dwarfs focusing on CPU multi-nodes and multi-GPUs. We limit ourselves to a subset of the dwarf configurations chosen by the consortium. Intra-node optimizations of the dwarfs and energy-specific optimizations have been described in Deliverable D3.3. To cover the important algorithmic motifs we picked dwarfs related to the dynamical core as well as column physics. Specifically, we focused on the formulation relevant to spectral codes like ECMWF's IFS code.
The main findings of this report are: (a) up to 30% performance gain with CPU based multi-node systems compared to the optimized version of dwarfs from task 3.3 (see D3.3), (b) up to 10x performance gain on multiple GPUs from optimizations to keep data resident on the GPU and enable fast inter-GPU communication mechanisms, and (c) multi-GPU systems which feature a high-bandwidth all-to-all interconnect topology with NVLink/NVSwitch hardware are particularly well suited to the algorithms.

Submitted 16 August, 2019; originally announced August 2019.

Comments: 35 pages, 22 figures

ACM Class: D.2.8; G.1.8; G.4

arXiv:1908.06096 [pdf]

Performance report and optimized implementation of Weather & Climate Dwarfs on GPU, MIC and Optalysys Optical Processor

Authors: Cyril Mazauric, Erwan Raffin, Xavier Vigouroux, David Guibert, Alex Macfaden, Jacob Poulsen, Per Berg, Alan Gray, Peter Messmer

Abstract: This document is one of the deliverable reports created for the ESCAPE project. ESCAPE stands for Energy-efficient Scalable Algorithms for Weather Prediction at Exascale. The project develops world-class, extreme-scale computing capabilities for European operational numerical weather prediction and future climate models. This is done by identifying Weather & Climate dwarfs which are key patterns in terms of computation and communication (in the spirit of the Berkeley dwarfs). These dwarfs are then optimised for different hardware architectures (single and multi-node) and alternative algorithms are explored. Performance portability is addressed through the use of domain specific languages. Here we summarize the work performed on optimizations of the dwarfs on CPUs, Xeon Phi, GPUs and on the Optalysys optical processor. We limit ourselves to a subset of the dwarf configurations and to problem sizes small enough to execute on a single node. Also, we use time-to-solution as the main performance metric. Multi-node optimizations of the dwarfs and energy-specific optimizations are beyond the scope of this report and will be described in Deliverable D3.4. To cover the important algorithmic motifs we picked dwarfs related to the dynamical core as well as column physics. Specifically, we focused on the formulation relevant to spectral codes like ECMWF's IFS code.
The main findings of this report are: (a) Acceleration of 1.1x - 2.5x of the dwarfs on CPU based systems using compiler directives, (b) order of magnitude acceleration of the dwarfs on GPUs (23x for spectral transform, 9x for MPDATA) using data locality optimizations and (c) demonstrated feasibility of a spectral transform in a purely optical fashion.

Submitted 16 August, 2019; originally announced August 2019.

Comments: 75 pages, 33 figures

ACM Class: D.2.8; G.1.8; G.4

arXiv:1905.00424 [pdf, other]

An ADMM Based Framework for AutoML Pipeline Configuration

Authors: Sijia Liu, Parikshit Ram, Deepak Vijaykeerthy, Djallel Bouneffouf, Gregory Bramble, Horst Samulowitz, Dakuo Wang, Andrew Conn, Alexander Gray

Abstract: We study the AutoML problem of automatically configuring machine learning pipelines by jointly selecting algorithms and their appropriate hyper-parameters for all steps in supervised learning pipelines. This black-box (gradient-free) optimization with mixed integer & continuous variables is a challenging problem. We propose a novel AutoML scheme by leveraging the alternating direction method of multipliers (ADMM). The proposed framework is able to (i) decompose the optimization problem into easier sub-problems that have a reduced number of variables and circumvent the challenge of mixed variable categories, and (ii) incorporate black-box constraints alongside the black-box optimization objective. We empirically evaluate the flexibility (in utilizing existing AutoML techniques), effectiveness (against open source AutoML toolkits), and unique capability (of executing AutoML with practically motivated black-box constraints) of our proposed scheme on a collection of binary classification data sets from UCI ML & OpenML repositories. We observe that on average our framework provides significant gains in comparison to other AutoML frameworks (Auto-sklearn & TPOT), highlighting the practical advantages of this framework.

Submitted 6 December, 2019; v1 submitted 1 May, 2019; originally announced May 2019.

Journal ref: published at AAAI 2020
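
The variable-splitting idea behind ADMM can be seen on a toy problem. The sketch below is generic scaled-form ADMM for a one-dimensional lasso, minimise 0.5 * (x - a)^2 + lam * |z| subject to x = z; it is not the AutoML decomposition from the paper above, and the function names and the choice of problem are assumptions made purely for illustration.

```python
def soft_threshold(v, k):
    """Proximal operator of k * |.|: shrink v towards zero by k."""
    if v > k:
        return v - k
    if v < -k:
        return v + k
    return 0.0


def admm_scalar_lasso(a, lam, rho=1.0, iters=200):
    """Scaled-form ADMM for: min 0.5*(x - a)**2 + lam*abs(z)  s.t.  x = z."""
    z = 0.0
    u = 0.0  # scaled dual variable
    for _ in range(iters):
        # x-update: smooth quadratic sub-problem, solved in closed form.
        x = (a + rho * (z - u)) / (1.0 + rho)
        # z-update: non-smooth sub-problem, solved by soft-thresholding.
        z = soft_threshold(x + u, lam / rho)
        # Dual update: accumulate the constraint violation x - z.
        u = u + x - z
    return z


solution = admm_scalar_lasso(a=3.0, lam=1.0)
```

Each sub-problem touches only one kind of term, which is the property the abstract relies on when it splits a mixed integer and continuous search into easier pieces. For this toy problem the exact answer is `soft_threshold(a, lam)`, so the iterate can be checked against it.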

arXiv:1903.05372 [pdf, other]

Lost Silence: An emergency response early detection service through continuous processing of telecommunication data streams

Authors: Qianru Zhou, Stephen McLaughlin, Alasdair J. G. Gray, Shangbin Wu, Chengxiang Wang

Abstract: Early detection of significant traumatic events, e.g. a terrorist attack or a ship capsizing, is important to ensure that a prompt emergency response can occur. In the modern world telecommunication systems could play a key role in ensuring a successful emergency response by detecting such incidents through significant changes in calls and access to the networks. In this paper a methodology is illustrated to detect such incidents immediately (with a delay on the order of milliseconds), by processing semantically annotated streams of data in cellular telecommunication systems. In our methodology, live information about the position and status of phones is encoded as RDF streams. We propose an algorithm that processes streams of RDF annotated telecommunication data to detect abnormality. Our approach is exemplified in the context of a passenger cruise ship capsizing. However, the approach is readily translatable to other incidents. Our evaluation results show that with a properly chosen window size, such incidents can be detected efficiently and effectively.

Submitted 13 March, 2019; originally announced March 2019.

Comments: 15 pages, 4 figures, WSP ISWC 2017 conference

Journal ref: ISWC WSP 2017, pp. 33--47
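
The role of the window size mentioned in the abstract above can be sketched with a simple sliding-window counter. This is only an illustration under stated assumptions: the events are plain `(timestamp_ms, status)` tuples rather than RDF streams, the `"lost"` status label and the threshold rule are hypothetical, and the paper's actual detection algorithm is not reproduced.

```python
from collections import deque


def detect_bursts(events, window_ms, threshold):
    """Yield timestamps at which at least `threshold` phones were lost within `window_ms`."""
    window = deque()  # timestamps of recent "lost" events, oldest first
    for ts, status in events:
        if status != "lost":
            continue
        window.append(ts)
        # Drop events that have fallen out of the window.
        while window and ts - window[0] > window_ms:
            window.popleft()
        if len(window) >= threshold:
            yield ts


events = [(0, "lost"), (10, "ok"), (500, "lost"), (520, "lost"), (530, "lost")]
alerts = list(detect_bursts(events, window_ms=100, threshold=3))
```

A window that is too narrow misses a burst spread over a longer interval, while one that is too wide merges unrelated disconnections, which is why the choice of window size matters.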

arXiv:1902.09944 [pdf, ps, other]

Automated Screening for Distress: A Perspective for the Future

Authors: Rajib Rana, Siddique Latif, Raj Gururajan, Anthony Gray, Geraldine Mackenzie, Gerald Humphris, Jeff Dunn

Abstract: Distress is a complex condition which affects a significant percentage of cancer patients and may lead to depression, anxiety, sadness, suicide and other forms of psychological morbidity. Compelling evidence supports screening for distress as a means of facilitating early intervention and subsequent improvements in psychological well-being and overall quality of life. Nevertheless, despite the existence of evidence-based and easily administered screening tools, for example, the Distress Thermometer, routine screening for distress is yet to achieve widespread implementation. Efforts are intensifying to utilise innovative, cost-effective methods now available through emerging technologies in the informatics and computational arenas.

Submitted 27 July, 2020; v1 submitted 22 February, 2019; originally announced February 2019.

Comments: Accepted in European Journal of Cancer Care

arXiv:1609.01479 [pdf, other]

A Lightweight Approach to Performance Portability with targetDP

Authors: Alan Gray, Kevin Stratford

Abstract: Leading HPC systems achieve their status through use of highly parallel devices such as NVIDIA GPUs or Intel Xeon Phi many-core CPUs. The concept of performance portability across such architectures, as well as traditional CPUs, is vital for the application programmer. In this paper we describe targetDP, a lightweight abstraction layer which allows grid-based applications to target data parallel hardware in a platform agnostic manner. We demonstrate the effectiveness of our pragmatic approach by presenting performance results for a complex fluid application (with which the model was co-designed), plus a separate lattice QCD particle physics code. For each application, a single source code base is seen to achieve portable performance, as assessed within the context of the Roofline model. TargetDP can be combined with MPI to allow use on systems containing multiple nodes: we demonstrate this through provision of scaling results on traditional and GPU-accelerated large scale supercomputers.

Submitted 9 November, 2016; v1 submitted 6 September, 2016; originally announced September 2016.

Comments: 11 pages, 5 figures, accepted to the International Journal of High Performance Computing Applications (IJHPCA), acceptance date 27th October 2016
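
The Roofline model used for the assessment in the abstract above bounds attainable performance by the smaller of peak compute and memory bandwidth times arithmetic intensity. A minimal sketch follows; the hardware numbers are made up for illustration and do not come from the paper.

```python
def roofline(peak_gflops, bandwidth_gbs, intensity_flops_per_byte):
    """Attainable GFLOP/s under the Roofline model: min(peak, bandwidth * intensity)."""
    return min(peak_gflops, bandwidth_gbs * intensity_flops_per_byte)


def fraction_of_roofline(measured_gflops, peak_gflops, bandwidth_gbs, intensity):
    """How close a measured kernel gets to its Roofline bound (1.0 = at the roof)."""
    return measured_gflops / roofline(peak_gflops, bandwidth_gbs, intensity)


# Hypothetical device: 1000 GFLOP/s peak, 100 GB/s memory bandwidth.
memory_bound = roofline(1000.0, 100.0, 2.0)    # low intensity: limited by bandwidth
compute_bound = roofline(1000.0, 100.0, 20.0)  # high intensity: limited by peak
```

Comparing the fraction of the roof reached on each architecture, rather than raw speed, is one way to judge whether a single code base achieves portable performance.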

arXiv:1507.07260 [pdf, other]

Reduced-Set Kernel Principal Components Analysis for Improving the Training and Execution Speed of Kernel Machines

Authors: Hassan A. Kingravi, Patricio A. Vela, Alexandar Gray

Abstract: This paper presents a practical, and theoretically well-founded, approach to improve the speed of kernel manifold learning algorithms relying on spectral decomposition. Utilizing recent insights in kernel smoothing and learning with integral operators, we propose Reduced Set KPCA (RSKPCA), which also suggests an easy-to-implement method to remove or replace samples with minimal effect on the empirical operator. A simple data point selection procedure is given to generate a substitute density for the data, with accuracy that is governed by a user-tunable parameter. The effect of the approximation on the quality of the KPCA solution, in terms of spectral and operator errors, can be shown directly in terms of the density estimate error and as a function of the parameter. We show in experiments that RSKPCA can improve both training and evaluation time of KPCA by up to an order of magnitude, and compares favorably to the widely-used Nystrom and density-weighted Nystrom methods.

Submitted 26 July, 2015; originally announced July 2015.

Showing 1–50 of 72 results for author: Gray, A