-
Laboratory-Scale AI: Open-Weight Models are Competitive with ChatGPT Even in Low-Resource Settings
Authors:
Robert Wolfe,
Isaac Slaughter,
Bin Han,
Bingbing Wen,
Yiwei Yang,
Lucas Rosenblatt,
Bernease Herman,
Eva Brown,
Zening Qu,
Nic Weber,
Bill Howe
Abstract:
The rapid proliferation of generative AI has raised questions about the competitiveness of lower-parameter, locally tunable, open-weight models relative to high-parameter, API-guarded, closed-weight models in terms of performance, domain adaptation, cost, and generalization. Centering under-resourced yet risk-intolerant settings in government, research, and healthcare, we see for-profit closed-weight models as incompatible with requirements for transparency, privacy, adaptability, and standards of evidence. Yet the performance penalty in using open-weight models, especially in low-data and low-resource settings, is unclear.
We assess the feasibility of using smaller, open-weight models to replace GPT-4-Turbo in zero-shot, few-shot, and fine-tuned regimes, assuming access to only a single, low-cost GPU. We assess value-sensitive issues around bias, privacy, and abstention on three additional tasks relevant to those topics. We find that with relatively low effort, very low absolute monetary cost, and relatively little data for fine-tuning, small open-weight models can achieve competitive performance in domain-adapted tasks without sacrificing generality. We then run experiments considering practical issues in bias, privacy, and hallucination risk, finding that open models offer several benefits over closed models. We intend this work as a case study in understanding the opportunity cost of reproducibility and transparency over for-profit state-of-the-art zero-shot performance, finding this cost to be marginal under realistic settings.
Submitted 27 May, 2024;
originally announced May 2024.
-
Higher-Rank Irreducible Cartesian Tensors for Equivariant Message Passing
Authors:
Viktor Zaverkin,
Francesco Alesiani,
Takashi Maruyama,
Federico Errica,
Henrik Christiansen,
Makoto Takamoto,
Nicolas Weber,
Mathias Niepert
Abstract:
The ability to perform fast and accurate atomistic simulations is crucial for advancing the chemical sciences. By learning from high-quality data, machine-learned interatomic potentials achieve accuracy on par with ab initio and first-principles methods at a fraction of their computational cost. The success of machine-learned interatomic potentials arises from integrating inductive biases such as equivariance to group actions on an atomic system, e.g., equivariance to rotations and reflections. In particular, the field has notably advanced with the emergence of equivariant message-passing architectures. Most of these models represent an atomic system using spherical tensors, tensor products of which require complicated numerical coefficients and can be computationally demanding. This work introduces higher-rank irreducible Cartesian tensors as an alternative to spherical tensors, addressing the above limitations. We integrate irreducible Cartesian tensor products into message-passing neural networks and prove the equivariance of the resulting layers. Through empirical evaluations on various benchmark data sets, we consistently observe on-par or better performance than that of state-of-the-art spherical models.
Submitted 23 May, 2024;
originally announced May 2024.
-
On the use of associative memory in Hopfield networks designed to solve propositional satisfiability problems
Authors:
Natalya Weber,
Werner Koch,
Ozan Erdem,
Tom Froese
Abstract:
Hopfield networks are an attractive choice for solving many types of computational problems because they provide a biologically plausible mechanism. The Self-Optimization (SO) model adds to the Hopfield network by using a biologically founded Hebbian learning rule, in combination with repeated network resets to arbitrary initial states, for optimizing its own behavior towards some desirable goal state encoded in the network. In order to better understand that process, we demonstrate first that the SO model can solve concrete combinatorial problems in SAT form, using two examples: the Liars problem and the map coloring problem. In addition, we show how under some conditions critical information might get lost forever, with the learned network producing seemingly optimal solutions that are in fact inappropriate for the problem it was tasked to solve. What appears to be an undesirable side effect of the SO model can provide insight into its process for solving intractable problems.
Submitted 4 March, 2024; v1 submitted 31 July, 2023;
originally announced July 2023.
-
Soft-Search: Two Datasets to Study the Identification and Production of Research Software
Authors:
Eva Maxfield Brown,
Lindsey Schwartz,
Richard Lewei Huang,
Nicholas Weber
Abstract:
Software is an important tool for scholarly work, but software produced for research is in many cases not easily identifiable or discoverable. A potential first step in linking research and software is software identification. In this paper we present two datasets to study the identification and production of research software. The first dataset contains almost 1000 human-labeled annotations of software production from National Science Foundation (NSF) awarded research projects. We use this dataset to train models that predict software production. Our second dataset is created by applying the trained predictive models across the abstracts and project outcomes reports for all NSF-funded projects between 2010 and 2023. The result is an inferred dataset of software production for over 150,000 NSF awards. We release the Soft-Search dataset to aid in identifying and understanding research software production: https://github.com/si2-urssi/eager
Submitted 27 February, 2023;
originally announced February 2023.
-
Scaling up the self-optimization model by means of on-the-fly computation of weights
Authors:
Natalya Weber,
Werner Koch,
Tom Froese
Abstract:
The Self-Optimization (SO) model is a useful computational model for investigating self-organization in "soft" Artificial life (ALife) as it has been shown to be general enough to model various complex adaptive systems. So far, existing work has been done on relatively small network sizes, precluding the investigation of novel phenomena that might emerge from the complexity arising from large numbers of nodes interacting in interconnected networks. This work introduces a novel implementation of the SO model that scales as $\mathcal{O}\left(N^{2}\right)$ with respect to the number of nodes $N$, and demonstrates the applicability of the SO model to networks with system sizes several orders of magnitude larger than previously investigated. By removing the prohibitive computational cost of the naive $\mathcal{O}\left(N^{3}\right)$ algorithm, our on-the-fly computation paves the way for investigating substantially larger system sizes, allowing for more variety and complexity in future studies.
Submitted 3 November, 2022;
originally announced November 2022.
-
Extending Open Bandit Pipeline to Simulate Industry Challenges
Authors:
Bram van den Akker,
Niklas Weber,
Felipe Moraes,
Dmitri Goldenberg
Abstract:
Bandit algorithms are often used in the e-commerce industry to train Machine Learning (ML) systems when pre-labeled data is unavailable. However, the industry setting poses various challenges that make implementing bandit algorithms in practice non-trivial. In this paper, we elaborate on the challenges of off-policy optimisation, delayed reward, concept drift, reward design, and business rules constraints that practitioners at Booking.com encounter when applying bandit algorithms. Our main contribution is an extension to the Open Bandit Pipeline (OBP) framework. We provide simulation components for some of the above-mentioned challenges to provide future practitioners, researchers, and educators with a resource to address challenges encountered in the e-commerce industry.
Submitted 9 September, 2022;
originally announced September 2022.
-
Research Software Publication Policy Case Study
Authors:
Nic Weber
Abstract:
Research software is increasingly recognized as a vital component of the scholarly record. Journals offer authors the opportunity to publish research software papers, but often have different requirements for how these publications should be structured and how code should be verified. In this short case study we gather data from 20 Physical Science journals to trace the frequency, quality control, and publishing criteria for software papers. Our goal with the case study is to provide a proof-of-concept for doing descriptive empirical work with software publication policies across numerous domains of science and engineering. In the narrative we therefore provide descriptive statistics showing how these journals differ in criteria required for archiving, linking, verifying, and documenting software as part of a formal publication. The contribution of this preliminary work is twofold: 1. We provide a case study of Physical Science research software publications over time; 2. We demonstrate the use of a new survey method for analyzing research software publication policies. In our conclusion, we describe how comparative research into software publication policies can provide better criteria and requirements for an emerging software publication landscape.
Submitted 10 June, 2022;
originally announced June 2022.
-
Fed-DART and FACT: A solution for Federated Learning in a production environment
Authors:
Nico Weber,
Patrick Holzer,
Tania Jacob,
Enislay Ramentol
Abstract:
Federated Learning as a decentralized artificial intelligence (AI) solution solves a variety of problems in industrial applications. It enables a continuously self-improving AI, which can be deployed everywhere at the edge. However, bringing AI to production to generate a real business impact is a challenging task. Especially in the case of Federated Learning, expertise and resources from multiple domains are required to realize its full potential. With this in mind, we have developed an innovative Federated Learning framework, FACT, based on Fed-DART, enabling an easy and scalable deployment and helping users to fully leverage the potential of their private and decentralized data.
Submitted 23 May, 2022;
originally announced May 2022.
-
Ethics of Open Data
Authors:
Nic Weber,
Brandon Locke
Abstract:
This chapter addresses emergent ethical issues in producing, using, curating, and providing services for open data. Our goal is to provide an introduction to how ethical topics in open data manifest in practical dilemmas for scholarly communications and some approaches to understanding and working through them. We begin with a brief overview of what can be thought of as three basic theories of ethics that intersect with dilemmas in openness, accountability, transparency, and fairness in data: Virtue, Consequential, and Non-consequential ethics. We then map these kinds of ethics to the practical questions that arise in provisioning infrastructures, providing services, and supporting sustainable research in science and scholarship that depends upon open access to data. Throughout, we attempt to offer concrete examples of potential ethical dilemmas facing scholarly communication with respect to open data, and try to make clear what kinds of ethical positions are helpful to practitioners. In doing so, we hope to both clarify the ethical questions facing librarians doing practical work to support open data access, as well as situate current debates in the field with respect to these three kinds of ethics.
Submitted 20 May, 2022;
originally announced May 2022.
-
SOL: Reducing the Maintenance Overhead for Integrating Hardware Support into AI Frameworks
Authors:
Nicolas Weber
Abstract:
The increased interest in Artificial Intelligence (AI) has raised the need for highly optimized and sophisticated AI frameworks. Starting with the Lua-based Torch, many frameworks have emerged over time, such as Theano, Caffe, Chainer, CNTK, MxNet, PyTorch, DL4J, or TensorFlow. All of these provide a high-level scripting API that allows users to easily design neural networks and run these on various kinds of hardware. What the user usually does not see is the considerable effort put into these frameworks to provide peak execution performance. While mainstream CPUs and GPUs have the "luxury" of a widespread user base in the open-source community, less mainstream CPU, GPU or accelerator vendors need to put in considerable effort to get their hardware supported by these frameworks. This includes not only the development of highly efficient compute libraries such as CUDNN, OneDNN or VEDNN but also support for an ever-growing number of simpler compute operations such as summation and multiplication. Each of these frameworks now supports several hundred unique operations, with tensors of various sizes, shapes and data types, which results in thousands of compute kernels required for each device type. And the number of operations keeps increasing.
That is why NEC Laboratories Europe started developing the SOL AI Optimization project years ago, to deliver optimal performance to users while keeping the maintenance burden minimal.
Submitted 19 May, 2022;
originally announced May 2022.
-
Councils in Action: Automating the Curation of Municipal Governance Data for Research
Authors:
Eva Maxfield Brown,
Nicholas Weber
Abstract:
Large-scale comparative research into municipal governance is often prohibitively difficult due to a lack of high-quality data. However, recent advances in speech-to-text algorithms and natural language processing have made it possible to more easily collect and analyze data about municipal governments. In this paper, we introduce an open-source platform, the Council Data Project (CDP), to curate novel datasets for research into municipal governance. The contribution of this work is two-fold: 1. We demonstrate that CDP, as an infrastructure, can be used to assemble reliable comparative data on municipal governance; 2. We provide exploratory analysis of three municipalities to show how CDP data can be used to gain insight into how municipal governments perform over time. We conclude by describing future directions for research on and with CDP such as the development of machine learning models for speaker annotation, outline generation, and named entity recognition for improved linked data.
Submitted 31 August, 2022; v1 submitted 19 April, 2022;
originally announced April 2022.
-
Toward Unsupervised Test Scenario Extraction for Automated Driving Systems from Urban Naturalistic Road Traffic Data
Authors:
Nico Weber,
Christoph Thiem,
Ulrich Konigorski
Abstract:
Scenario-based testing is a promising approach to solve the challenge of proving the safe behavior of vehicles equipped with automated driving systems. Since an infinite number of concrete scenarios can theoretically occur in real-world road traffic, the extraction of scenarios relevant in terms of the safety-related behavior of these systems is a key aspect for their successful verification and validation. Therefore, a method for extracting multimodal urban traffic scenarios from naturalistic road traffic data in an unsupervised manner, minimizing the amount of (potentially biased) prior expert knowledge, is proposed. Rather than an (elaborate) rule-based assignment by extracting concrete scenarios into predefined functional scenarios, the presented method deploys an unsupervised machine learning pipeline. The approach allows exploring the unknown nature of the data and their interpretation as test scenarios that experts could not have anticipated. The method is evaluated for naturalistic road traffic data at urban intersections from the inD and the Silicon Valley Intersections datasets. For this purpose, it is analyzed with which clustering approach (K-Means, hierarchical clustering, and DBSCAN) the scenario extraction method performs best (referring to an elaborate rule-based implementation). Subsequently, using hierarchical clustering the results show both a jump in overall accuracy of around 20% when moving from 4 to 5 clusters and a saturation effect starting at 41 clusters with an overall accuracy of 84%. These observations can be a valuable contribution in the context of the trade-off between the number of functional scenarios (i.e., clustering accuracy) and testing effort. Possible reasons for the observed accuracy variations of different clusters, each with a fixed total number of given clusters, are discussed.
Submitted 21 April, 2023; v1 submitted 14 February, 2022;
originally announced February 2022.
-
Decoupled coordinates for machine learning-based molecular fragment linking
Authors:
Markus Fleck,
Noah Weber,
Christopher Trummer
Abstract:
Recent developments in machine-learning based molecular fragment linking have demonstrated the importance of informing the generation process with structural information specifying the relative orientation of the fragments to be linked. However, such structural information has not yet been provided in the form of a complete relative coordinate system. Mathematical details for a decoupled set of bond lengths, bond angles and torsion angles are elaborated and the coordinate system is demonstrated to be complete. Significant impact on the quality of the generated linkers is demonstrated numerically. The amount of reliable information within the different types of degrees of freedom is investigated. Ablation studies and an information-theoretical analysis are performed. The presented benefits suggest the application of a complete and decoupled relative coordinate system as a standard good practice in linker design.
Submitted 1 November, 2021;
originally announced November 2021.
-
A Needle in a Haystack -- How to Derive Relevant Scenarios for Testing Automated Driving Systems in Urban Areas
Authors:
Nico Weber,
Christoph Thiem,
Ulrich Konigorski
Abstract:
While there has been great progress regarding the technology and its implementation for vehicles equipped with automated driving systems (ADS), the problem of how to prove their safety as a necessary precondition prior to market launch remains unsolved. Scenario-based test approaches are one promising solution; however, there is no commonly accepted way to systematically generate and extract the set of relevant scenarios to be tested to sufficiently capture the real-world traffic dynamics, especially for urban operational design domains. Within the scope of this paper, the overall concept of a novel simulation-based toolchain for the development and testing of ADS-equipped vehicles in urban environments is presented. Based on previous work regarding highway environments, the developed novel enhancements aim at enabling the toolchain to deal with the increased complexity due to the more complex road networks with multi-modal interactions of various traffic participants. Based on derived requirements, a thorough explanation of different modules constituting the toolchain is given, showing first results and identified research gaps, respectively. A closer look is taken at two use cases: First, it is investigated whether the toolchain is capable of serving as a synthetic data source within the development phase of ADS-equipped vehicles to enrich a scenario database in terms of extent, complexity and impacts of different what-if-scenarios for future mixed traffic. Second, it is analyzed how to combine the individual advantages of real recorded data and an agent-based simulation within a so-called adaptive replay-to-sim approach to support the testing phase of an ADS-equipped vehicle. The developed toolchain contributes to the overarching goal of a commonly accepted methodology for the validation and safety proof of ADS-equipped vehicles, especially in urban environments.
Submitted 8 September, 2021;
originally announced September 2021.
-
Toward Generating Sufficiently Valid Test Case Results: A Method for Systematically Assigning Test Cases to Test Bench Configurations in a Scenario-Based Test Approach for Automated Vehicles
Authors:
Markus Steimle,
Nico Weber,
Markus Maurer
Abstract:
To successfully launch automated vehicles into the consumer market, there must be credible proof that the vehicles will operate safely. However, finding a method to validate the vehicles' safe operation is a challenging problem. While scenario-based test approaches seem to be possible solutions, they require execution of a large number of test cases. Several test benches, ranging from actual test vehicles to partly or fully simulated environments, are available to execute these test cases. Each test bench provides different elements, which in turn, have different parameters and parameter ranges. The composition of elements with their specific parameter values at a specific test bench that is used to execute a test case is referred to as a test bench configuration. However, selecting the most suitable test bench configuration is difficult. The selected test bench configuration determines whether the execution of a specific test case provides sufficiently valid test case results with respect to the intended purpose, for example, validating a vehicle's safe operation. The effective and efficient execution of a large number of test cases requires a method for systematically assigning test cases to the most suitable test bench configuration. Based on a proposed method for classifying test bench configurations, we propose and illustrate a method for systematically assigning test cases to test bench configurations in a scenario-based test approach for automated vehicles. This assignment method allows for the effective and efficient execution of a large number of test cases while generating sufficiently valid test case results.
Submitted 20 January, 2022; v1 submitted 7 September, 2021;
originally announced September 2021.
-
HALF: Holistic Auto Machine Learning for FPGAs
Authors:
Jonas Ney,
Dominik Loroch,
Vladimir Rybalkin,
Nico Weber,
Jens Krüger,
Norbert Wehn
Abstract:
Deep Neural Networks (DNNs) are capable of solving complex problems in domains related to embedded systems, such as image and natural language processing. To efficiently implement DNNs on a specific FPGA platform for a given cost criterion, e.g. energy efficiency, an enormous amount of design parameters has to be considered from the topology down to the final hardware implementation. Interdependencies between the different design layers have to be taken into account and explored efficiently, making it hardly possible to find optimized solutions manually. An automatic, holistic design approach can improve the quality of DNN implementations on FPGA significantly. To this end, we present a cross-layer design space exploration methodology. It comprises optimizations starting from a hardware-aware topology search for DNNs down to the final optimized implementation for a given FPGA platform. The methodology is implemented in our Holistic Auto machine Learning for FPGAs (HALF) framework, which combines an evolutionary search algorithm, various optimization steps and a library of parametrizable hardware DNN modules. HALF automates both the exploration process and the implementation of optimized solutions on a target FPGA platform for various applications. We demonstrate the performance of HALF on a medical use case for arrhythmia detection for three different design goals, i.e. low-energy, low-power and high-throughput respectively. Our FPGA implementation outperforms a TensorRT optimized model on an Nvidia Jetson platform in both throughput and energy consumption.
Submitted 20 October, 2021; v1 submitted 28 June, 2021;
originally announced June 2021.
-
Human Schema Curation via Causal Association Rule Mining
Authors:
Noah Weber,
Anton Belyy,
Nils Holzenberger,
Rachel Rudinger,
Benjamin Van Durme
Abstract:
Event schemas are structured knowledge sources defining typical real-world scenarios (e.g., going to an airport). We present a framework for efficient human-in-the-loop construction of a schema library, based on a novel script induction system and a well-crafted interface that allows non-experts to "program" complex event structures. Associated with this work we release a schema library: a machine-readable resource of 232 detailed event schemas, each of which describes a distinct typical scenario in terms of its relevant sub-event structure (what happens in the scenario), participants (who plays a role in the scenario), fine-grained typing of each participant, and the implied relational constraints between them. We make our schema library and the SchemaBlocks interface available online.
Submitted 23 May, 2022; v1 submitted 18 April, 2021;
originally announced April 2021.
-
Addressing Research Software Sustainability via Institutes
Authors:
Daniel S. Katz,
Jeffrey C. Carver,
Neil P. Chue Hong,
Sandra Gesing,
Simon Hettrick,
Tom Honeyman,
Karthik Ram,
Nicholas Weber
Abstract:
Research software is essential to modern research, but it requires ongoing human effort to sustain: to continually adapt to changes in dependencies, to fix bugs, and to add new features. Software sustainability institutes, amongst others, develop, maintain, and disseminate best practices for research software sustainability, and build community around them. These practices can both reduce the amount of effort that is needed and create an environment where the effort is appreciated and rewarded. The UK SSI is such an institute, and the US URSSI and the Australian AuSSI are planning to become institutes. This extended abstract discusses them and the strengths and weaknesses of this approach.
Submitted 5 March, 2021;
originally announced March 2021.
-
Generating Narrative Text in a Switching Dynamical System
Authors:
Noah Weber,
Leena Shekhar,
Heeyoung Kwon,
Niranjan Balasubramanian,
Nathanael Chambers
Abstract:
Early work on narrative modeling used explicit plans and goals to generate stories, but the language generation itself was restricted and inflexible. Modern methods use language models for more robust generation, but often lack an explicit representation of the scaffolding and dynamics that guide a coherent narrative. This paper introduces a new model that integrates explicit narrative structure with neural language models, formalizing narrative modeling as a Switching Linear Dynamical System (SLDS). A SLDS is a dynamical system in which the latent dynamics of the system (i.e. how the state vector transforms over time) is controlled by top-level discrete switching variables. The switching variables represent narrative structure (e.g., sentiment or discourse states), while the latent state vector encodes information on the current state of the narrative. This probabilistic formulation allows us to control generation, and can be learned in a semi-supervised fashion using both labeled and unlabeled data. Additionally, we derive a Gibbs sampler for our model that can fill in arbitrary parts of the narrative, guided by the switching variables. Our filled-in (English language) narratives outperform several baselines on both automatic and human evaluations.
Submitted 7 April, 2020;
originally announced April 2020.
-
Causal Inference of Script Knowledge
Authors:
Noah Weber,
Rachel Rudinger,
Benjamin Van Durme
Abstract:
When does a sequence of events define an everyday scenario and how can this knowledge be induced from text? Prior works in inducing such scripts have relied on, in one form or another, measures of correlation between instances of events in a corpus. We argue from both a conceptual and practical sense that a purely correlation-based approach is insufficient, and instead propose an approach to script induction based on the causal effect between events, formally defined via interventions. Through both human and automatic evaluations, we show that the output of our method based on causal effects better matches the intuition of what a script represents.
Submitted 2 April, 2020;
originally announced April 2020.
-
SOL: Effortless Device Support for AI Frameworks without Source Code Changes
Authors:
Nicolas Weber,
Felipe Huici
Abstract:
Modern high performance computing clusters heavily rely on accelerators to overcome the limited compute power of CPUs. These supercomputers run various applications from different domains such as simulations, numerical applications or artificial intelligence (AI). As a result, vendors need to be able to efficiently run a wide variety of workloads on their hardware. In the AI domain this is in particular exacerbated by the existence of a number of popular frameworks (e.g., PyTorch, TensorFlow, etc.) that have no common code base, and can vary in functionality. The code of these frameworks evolves quickly, making it expensive to keep up with all changes and potentially forcing developers to go through constant rounds of upstreaming. In this paper we explore how to provide hardware support in AI frameworks without changing the framework's source code in order to minimize maintenance overhead. We introduce SOL, an AI acceleration middleware that provides a hardware abstraction layer that allows us to transparently support heterogeneous hardware. As a proof of concept, we implemented SOL for PyTorch with three backends: CPUs, GPUs and vector processors.
Submitted 24 March, 2020;
originally announced March 2020.
-
Hierarchical Quantized Representations for Script Generation
Authors:
Noah Weber,
Leena Shekhar,
Niranjan Balasubramanian,
Nathanael Chambers
Abstract:
Scripts define knowledge about how everyday scenarios (such as going to a restaurant) are expected to unfold. One of the challenges to learning scripts is the hierarchical nature of the knowledge. For example, a suspect arrested might plead innocent or guilty, and a very different track of events is then expected to happen. To capture this type of information, we propose an autoencoder model with a latent space defined by a hierarchy of categorical variables. We utilize a recently proposed vector quantization based approach, which allows continuous embeddings to be associated with each latent variable value. This permits the decoder to softly decide what portions of the latent hierarchy to condition on by attending over the value embeddings for a given setting. Our model effectively encodes and generates scripts, outperforming a recent language modeling-based method on several standard tasks, and allowing the autoencoder model to achieve substantially lower perplexity scores compared to the previous language modeling-based method.
Submitted 28 August, 2018;
originally announced August 2018.
-
The Fine Line between Linguistic Generalization and Failure in Seq2Seq-Attention Models
Authors:
Noah Weber,
Leena Shekhar,
Niranjan Balasubramanian
Abstract:
Seq2Seq based neural architectures have become the go-to architecture to apply to sequence to sequence language tasks. Despite their excellent performance on these tasks, recent work has noted that these models usually do not fully capture the linguistic structure required to generalize beyond the dense sections of the data distribution \cite{ettinger2017towards}, and as such, are likely to fail on samples from the tail end of the distribution (such as inputs that are noisy \citep{belkinovnmtbreak} or of different lengths \citep{bentivoglinmtlength}). In this paper, we look at a model's ability to generalize on a simple symbol rewriting task with a clearly defined structure. We find that the model's ability to generalize this structure beyond the training distribution depends greatly on the chosen random seed, even when performance on the standard test set remains the same. This suggests that a model's ability to capture generalizable structure is highly sensitive. Moreover, this sensitivity may not be apparent when evaluating it on standard test sets.
Submitted 8 May, 2018; v1 submitted 3 May, 2018;
originally announced May 2018.
-
BrainSlug: Transparent Acceleration of Deep Learning Through Depth-First Parallelism
Authors:
Nicolas Weber,
Florian Schmidt,
Mathias Niepert,
Felipe Huici
Abstract:
Neural network frameworks such as PyTorch and TensorFlow are the workhorses of numerous machine learning applications ranging from object recognition to machine translation. While these frameworks are versatile and straightforward to use, the training of and inference in deep neural networks is resource (energy, compute, and memory) intensive. In contrast to recent works focusing on algorithmic enhancements, we introduce BrainSlug, a framework that transparently accelerates neural network workloads by changing the default layer-by-layer processing to a depth-first approach, reducing the amount of data required by the computations and thus improving the performance of the available hardware caches. BrainSlug achieves performance improvements of up to 41.1% on CPUs and 35.7% on GPUs. These optimizations come at zero cost to the user as they do not require hardware changes and only need tiny adjustments to the software.
Submitted 23 April, 2018;
originally announced April 2018.
-
Detail-Preserving Pooling in Deep Networks
Authors:
Faraz Saeedan,
Nicolas Weber,
Michael Goesele,
Stefan Roth
Abstract:
Most convolutional neural networks use some method for gradually downscaling the size of the hidden layers. This is commonly referred to as pooling, and is applied to reduce the number of parameters, improve invariance to certain distortions, and increase the receptive field size. Since pooling by nature is a lossy process, it is crucial that each such layer maintains the portion of the activations that is most important for the network's discriminability. Yet, simple maximization or averaging over blocks, max or average pooling, or plain downsampling in the form of strided convolutions are the standard. In this paper, we aim to leverage recent results on image downscaling for the purposes of deep learning. Inspired by the human visual system, which focuses on local spatial changes, we propose detail-preserving pooling (DPP), an adaptive pooling method that magnifies spatial changes and preserves important structural detail. Importantly, its parameters can be learned jointly with the rest of the network. We analyze some of its theoretical properties and show its empirical benefits on several datasets and networks, where DPP consistently outperforms previous pooling approaches.
Submitted 11 April, 2018;
originally announced April 2018.
-
Controlling Decoding for More Abstractive Summaries with Copy-Based Networks
Authors:
Noah Weber,
Leena Shekhar,
Niranjan Balasubramanian,
Kyunghyun Cho
Abstract:
Attention-based neural abstractive summarization systems equipped with copy mechanisms have shown promising results. Despite this success, it has been noticed that such a system generates a summary by mostly, if not entirely, copying over phrases, sentences, and sometimes multiple consecutive sentences from an input paragraph, effectively performing extractive summarization. In this paper, we verify this behavior using the latest neural abstractive summarization system - a pointer-generator network. We propose a simple baseline method that allows us to control the amount of copying without retraining. Experiments indicate that the method provides a strong baseline for abstractive systems looking to obtain high ROUGE scores while minimizing overlap with the source article, substantially reducing the n-gram overlap with the original article while keeping within 2 points of the original model's ROUGE score.
Submitted 19 March, 2018; v1 submitted 19 March, 2018;
originally announced March 2018.
-
Mining Open Government Data Used in Scientific Research
Authors:
An Yan,
Nicholas Weber
Abstract:
In the following paper, we describe results from mining citations, mentions, and links to open government data (OGD) in peer-reviewed literature. We inductively develop a method for categorizing how OGD are used by different research communities, and provide descriptive statistics about the publication years, publication outlets, and OGD sources. Our results demonstrate that: 1. The use of OGD in research is steadily increasing from 2009 to 2016; 2. Researchers use OGD from 96 different open government data portals, with data.gov.uk and data.gov being the most frequent sources; and, 3. Contrary to previous findings, we provide evidence suggesting that OGD from developing nations, notably India and Kenya, are being frequently used to fuel scientific discoveries. The findings of this paper contribute to ongoing research agendas aimed at tracking the impact of open government data initiatives, and provide an initial description of how open government data are valuable to diverse scientific research communities.
Submitted 24 March, 2018; v1 submitted 8 February, 2018;
originally announced February 2018.
-
Event Representations with Tensor-based Compositions
Authors:
Noah Weber,
Niranjan Balasubramanian,
Nathanael Chambers
Abstract:
Robust and flexible event representations are important to many core areas in language understanding. Scripts were proposed early on as a way of representing sequences of events for such understanding, and have recently attracted renewed attention. However, obtaining effective representations for modeling script-like event sequences is challenging. It requires representations that can capture event-level and scenario-level semantics. We propose a new tensor-based composition method for creating event representations. The method captures more subtle semantic interactions between an event and its entities and yields representations that are effective at multiple event-related tasks. With the continuous representations, we also devise a simple schema generation method which produces better schemas compared to a prior discrete representation based method. Our analysis shows that the tensors capture distinct usages of a predicate even when there are only subtle differences in their surface realizations.
Submitted 20 November, 2017;
originally announced November 2017.
-
Report on the Third Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE3)
Authors:
Daniel S. Katz,
Sou-Cheng T. Choi,
Kyle E. Niemeyer,
James Hetherington,
Frank Löffler,
Dan Gunter,
Ray Idaszak,
Steven R. Brandt,
Mark A. Miller,
Sandra Gesing,
Nick D. Jones,
Nic Weber,
Suresh Marru,
Gabrielle Allen,
Birgit Penzenstadler,
Colin C. Venters,
Ethan Davis,
Lorraine Hwang,
Ilian Todorov,
Abani Patra,
Miguel de Val-Borro
Abstract:
This report records and discusses the Third Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE3). The report includes a description of the keynote presentation of the workshop, which served as an overview of sustainable scientific software. It also summarizes a set of lightning talks in which speakers highlighted to-the-point lessons and challenges pertaining to sustaining scientific software. The final and main contribution of the report is a summary of the discussions, future steps, and future organization for a set of self-organized working groups on topics including developing pathways to funding scientific software; constructing useful common metrics for crediting software stakeholders; identifying principles for sustainable software engineering design; reaching out to research software organizations around the world; and building communities for software sustainability. For each group, we include a point of contact and a landing page that can be used by those who want to join that group's future activities. The main challenge left by the workshop is to see if the groups will execute these activities that they have scheduled, and how the WSSSPE community can encourage this to happen.
Submitted 6 February, 2016;
originally announced February 2016.
-
Niche Modeling: Ecological Metaphors for Sustainable Software in Science
Authors:
Nicholas Weber,
Andrea Thomer,
Michael Twidale
Abstract:
This position paper is aimed at providing some history and provocations for the use of an ecological metaphor to describe software development environments. We do not claim that the ecological metaphor is the best or only way of looking at software - rather we want to ask if it can indeed be a productive and thought provoking one.
Submitted 6 September, 2013;
originally announced September 2013.