-
Nemotron-4 340B Technical Report
Authors:
Nvidia,
Bo Adler,
Niket Agarwal,
Ashwath Aithal,
Dong H. Anh,
Pallab Bhattacharya,
Annika Brundyn,
Jared Casper,
Bryan Catanzaro,
Sharon Clay,
Jonathan Cohen,
Sirshak Das,
Ayush Dattagupta,
Olivier Delalleau,
Leon Derczynski,
Yi Dong,
Daniel Egert,
Ellie Evans,
Aleksander Ficek,
Denys Fridman,
Shaona Ghosh,
Boris Ginsburg,
Igor Gitman,
Tomasz Grzegorzek
, et al. (58 additional authors not shown)
Abstract:
We release the Nemotron-4 340B model family, including Nemotron-4-340B-Base, Nemotron-4-340B-Instruct, and Nemotron-4-340B-Reward. Our models are open access under the NVIDIA Open Model License Agreement, a permissive model license that allows distribution, modification, and use of the models and their outputs. These models perform competitively with open access models on a wide range of evaluation benchmarks, and were sized to fit on a single DGX H100 with 8 GPUs when deployed in FP8 precision. We believe that the community can benefit from these models in various research studies and commercial applications, especially for generating synthetic data to train smaller language models. Notably, over 98% of data used in our model alignment process is synthetically generated, showcasing the effectiveness of these models in generating synthetic data. To further support open research and facilitate model development, we are also open-sourcing the synthetic data generation pipeline used in our model alignment process.
Submitted 17 June, 2024;
originally announced June 2024.
-
CRIS: Collaborative Refinement Integrated with Segmentation for Polyp Segmentation
Authors:
Ankush Gajanan Arudkar,
Bernard J. E. Evans
Abstract:
Accurate detection of colorectal cancer and early prevention heavily rely on precise polyp identification during gastrointestinal colonoscopy. Due to limited data, many current state-of-the-art deep learning methods for polyp segmentation often rely on post-processing of masks to reduce noise and enhance results. In this study, we propose an approach that integrates mask refinement and binary semantic segmentation, leveraging a novel collaborative training strategy that surpasses current widely-used refinement strategies. We demonstrate the superiority of our approach through comprehensive evaluation on established benchmark datasets and its successful application across various medical image segmentation architectures.
Submitted 29 May, 2024;
originally announced May 2024.
-
SciJava Ops: An Improved Algorithms Framework for Fiji and Beyond
Authors:
Gabriel J. Selzer,
Curtis T. Rueden,
Mark C. Hiner,
Edward L. Evans III,
David Kolb,
Marcel Wiedenmann,
Christian Birkhold,
Tim-Oliver Buchholz,
Stefan Helfrich,
Brian Northan,
Alison Walter,
Johannes Schindelin,
Tobias Pietzsch,
Stephan Saalfeld,
Michael R. Berthold,
Kevin W. Eliceiri
Abstract:
Many scientific software platforms provide plugin mechanisms that simplify the integration, deployment, and execution of externally developed functionality. One of the most widely used platforms in the imaging space is Fiji, a popular open-source application for scientific image analysis. Fiji incorporates and builds on the ImageJ and ImageJ2 platforms, which provide a powerful plugin architecture used by thousands of plugins to solve a wide variety of problems. This capability is a major part of Fiji's success, and it has become a widely used biological image analysis tool and a target for new functionality. However, a plugin-based software architecture cannot unify disparate platforms operating on incompatible data structures; interoperability necessitates the creation of adaptation or "bridge" layers to translate data and invoke functionality. As a result, while platforms like Fiji enable a high degree of interconnectivity and extensibility, they were not fundamentally designed to integrate across the many data types, programming languages, and architectural differences of various software platforms. To help address this challenge, we present SciJava Ops, a foundational software library for expressing algorithms as plugins in a unified and extensible way. Continuing the evolution of Fiji's SciJava plugin mechanism, SciJava Ops enables users to harness algorithms from various software platforms within a central execution environment. In addition, SciJava Ops automatically adapts data into the most appropriate structure for each algorithm, allowing users to freely and transparently combine algorithms from otherwise incompatible tools. While SciJava Ops is initially distributed as a Fiji update site, the framework does not require Fiji, ImageJ, or ImageJ2, and would be suitable for integration with additional image analysis platforms.
Submitted 20 May, 2024;
originally announced May 2024.
-
Learning with SASQuaTCh: a Novel Variational Quantum Transformer Architecture with Kernel-Based Self-Attention
Authors:
Ethan N. Evans,
Matthew Cook,
Zachary P. Bradshaw,
Margarite L. LaBorde
Abstract:
The widely popular transformer network popularized by the generative pre-trained transformer (GPT) has a large field of applicability, including predicting text and images, classification, and even predicting solutions to the dynamics of physical systems. In the latter context, the continuous analog of the self-attention mechanism at the heart of transformer networks has been applied to learning the solutions of partial differential equations and reveals a convolution kernel nature that can be exploited by the Fourier transform. It is well known that many quantum algorithms that have provably demonstrated a speedup over classical algorithms utilize the quantum Fourier transform. In this work, we explore quantum circuits that can efficiently express a self-attention mechanism through the perspective of kernel-based operator learning. In this perspective, we are able to represent deep layers of a vision transformer network using simple gate operations and a set of multi-dimensional quantum Fourier transforms. We analyze the computational and parameter complexity of our novel variational quantum circuit, which we call Self-Attention Sequential Quantum Transformer Channel (SASQuaTCh), and demonstrate its utility on simplified classification problems.
Submitted 21 March, 2024;
originally announced March 2024.
-
A Quick Introduction to Quantum Machine Learning for Non-Practitioners
Authors:
Ethan N. Evans,
Dominic Byrne,
Matthew G. Cook
Abstract:
This paper provides an introduction to quantum machine learning, exploring the potential benefits of using quantum computing principles and algorithms that may improve upon classical machine learning approaches. Quantum computing utilizes particles governed by quantum mechanics for computational purposes, leveraging properties like superposition and entanglement for information representation and manipulation. Quantum machine learning applies these principles to enhance classical machine learning models, potentially reducing network size and training time on quantum hardware. The paper covers basic quantum mechanics principles, including superposition, phase space, and entanglement, and introduces the concept of quantum gates that exploit these properties. It also reviews classical deep learning concepts, such as artificial neural networks, gradient descent, and backpropagation, before delving into trainable quantum circuits as neural networks. An example problem demonstrates the potential advantages of quantum neural networks, and the appendices provide detailed derivations. The paper aims to help researchers new to quantum mechanics and machine learning develop their expertise more efficiently.
Submitted 22 February, 2024;
originally announced February 2024.
-
Bluefish: Composing Diagrams with Declarative Relations
Authors:
Josh Pollock,
Catherine Mei,
Grace Huang,
Elliot Evans,
Daniel Jackson,
Arvind Satyanarayan
Abstract:
Diagrams are essential tools for problem-solving and communication as they externalize conceptual structures using spatial relationships. But when picking a diagramming framework, users are faced with a dilemma. They can either use a highly expressive but low-level toolkit, whose API does not match their domain-specific concepts, or select a high-level typology, which offers a recognizable vocabulary but supports a limited range of diagrams. To address this gap, we introduce Bluefish: a diagramming framework inspired by component-based user interface (UI) libraries. Bluefish lets users create diagrams using relations: declarative, composable, and extensible diagram fragments that relax the concept of a UI component. Unlike a component, a relation does not have sole ownership over its children nor does it need to fully specify their layout. To render diagrams, Bluefish extends a traditional tree-based scenegraph to a compound graph that captures both hierarchical and adjacent relationships between nodes. To evaluate our system, we construct a diverse example gallery covering many domains including mathematics, physics, computer science, and even cooking. We show that Bluefish's relations are effective declarative primitives for diagrams. Bluefish is open source, and we aim to shape it into both a usable tool and a research platform.
Submitted 25 July, 2024; v1 submitted 30 June, 2023;
originally announced July 2023.
-
Evaluating the Social Impact of Generative AI Systems in Systems and Society
Authors:
Irene Solaiman,
Zeerak Talat,
William Agnew,
Lama Ahmad,
Dylan Baker,
Su Lin Blodgett,
Canyu Chen,
Hal Daumé III,
Jesse Dodge,
Isabella Duan,
Ellie Evans,
Felix Friedrich,
Avijit Ghosh,
Usman Gohar,
Sara Hooker,
Yacine Jernite,
Ria Kalluri,
Alberto Lusoli,
Alina Leidinger,
Michelle Lin,
Xiuzhu Lin,
Sasha Luccioni,
Jennifer Mickel,
Margaret Mitchell,
Jessica Newman
, et al. (6 additional authors not shown)
Abstract:
Generative AI systems across modalities, ranging from text (including code), image, audio, and video, have broad social impacts, but there is no official standard for means of evaluating those impacts or for which impacts should be evaluated. In this paper, we present a guide that moves toward a standard approach in evaluating a base generative AI system for any modality in two overarching categories: what can be evaluated in a base system independent of context and what can be evaluated in a societal context. Importantly, this refers to base systems that have no predetermined application or deployment context, including a model itself, as well as system components, such as training data. Our framework for a base system defines seven categories of social impact: bias, stereotypes, and representational harms; cultural values and sensitive content; disparate performance; privacy and data protection; financial costs; environmental costs; and data and content moderation labor costs. Suggested methods for evaluation apply to listed generative modalities and analyses of the limitations of existing evaluations serve as a starting point for necessary investment in future evaluations. We offer five overarching categories for what can be evaluated in a broader societal context, each with its own subcategories: trustworthiness and autonomy; inequality, marginalization, and violence; concentration of authority; labor and creativity; and ecosystem and environment. Each subcategory includes recommendations for mitigating harm.
Submitted 28 June, 2024; v1 submitted 9 June, 2023;
originally announced June 2023.
-
Grounding Characters and Places in Narrative Texts
Authors:
Sandeep Soni,
Amanpreet Sihra,
Elizabeth F. Evans,
Matthew Wilkens,
David Bamman
Abstract:
Tracking characters and locations throughout a story can help improve the understanding of its plot structure. Prior research has analyzed characters and locations from text independently without grounding characters to their locations in narrative time. Here, we address this gap by proposing a new spatial relationship categorization task. The objective of the task is to assign a spatial relationship category for every character and location co-mention within a window of text, taking into consideration linguistic context, narrative tense, and temporal scope. To this end, we annotate spatial relationships in approximately 2500 book excerpts and train a model using contextual embeddings as features to predict these relationships. When applied to a set of books, this model allows us to test several hypotheses on mobility and domestic space, revealing that protagonists are more mobile than non-central characters and that women as characters tend to occupy more interior space than men. Overall, our work is the first step towards joint modeling and analysis of characters and places in narrative text.
Submitted 27 May, 2023;
originally announced May 2023.
-
Algorithmic techniques for finding resistance distances on structured graphs
Authors:
E. J. Evans,
A. E. Francis
Abstract:
In this paper we give a survey of methods used to calculate values of resistance distance (also known as effective resistance) in graphs. Resistance distance has played a prominent role not only in circuit theory and chemistry, but also in combinatorial matrix theory and spectral graph theory. Moreover, resistance distance has applications ranging from quantifying biological structures to distributed control systems, network analysis, and power grid systems. In this paper we discuss both exact techniques and approximate techniques, and for each method discussed we provide an illustrative example of the technique. We also present some open questions and conjectures.
Submitted 13 September, 2021; v1 submitted 17 August, 2021;
originally announced August 2021.
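As an illustration of what an exact resistance-distance computation can look like, the sketch below treats every edge as a unit resistor, grounds one node, injects a unit current at the other, and solves the reduced Laplacian system with exact rational arithmetic. This is one standard textbook approach and is not taken from the survey itself; the function name `resistance_distance`, the edge-list input format, and the assumption that the graph is connected are all choices made for this example.

```python
from fractions import Fraction


def resistance_distance(n, edges, i, j):
    """Effective resistance between nodes i and j of a connected graph.

    Assumes nodes are labelled 0..n-1 and every edge is a unit resistor
    (both are assumptions of this sketch).
    """
    if i == j:
        return Fraction(0)
    # Build the graph Laplacian L = D - A.
    lap = [[Fraction(0)] * n for _ in range(n)]
    for u, v in edges:
        lap[u][u] += 1
        lap[v][v] += 1
        lap[u][v] -= 1
        lap[v][u] -= 1
    # Ground node j (drop its row and column) and inject unit current at i.
    keep = [k for k in range(n) if k != j]
    aug = [[lap[r][c] for c in keep] + [Fraction(1 if r == i else 0)]
           for r in keep]
    m = len(keep)
    # Gauss-Jordan elimination on the augmented matrix.
    for col in range(m):
        pivot = next(r for r in range(col, m) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(m):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    # The potential at node i equals the effective resistance to ground j.
    return aug[keep.index(i)][m]


path = [(0, 1), (1, 2)]              # two unit resistors in series
triangle = [(0, 1), (1, 2), (0, 2)]  # one resistor in parallel with two in series
print(resistance_distance(3, path, 0, 2))      # 2
print(resistance_distance(3, triangle, 0, 1))  # 2/3
```

The two small graphs match the familiar series and parallel rules, which makes them a convenient sanity check before trying larger structured graphs.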
-
Stochastic Spatio-Temporal Optimization for Control and Co-Design of Systems in Robotics and Applied Physics
Authors:
Ethan N. Evans,
Andrew P. Kendall,
Evangelos A. Theodorou
Abstract:
Correlated with the trend of increasing degrees of freedom in robotic systems is a similar trend of rising interest in Spatio-Temporal systems described by Partial Differential Equations (PDEs) among the robotics and control communities. These systems often exhibit dramatic under-actuation, high dimensionality, bifurcations, and multimodal instabilities. Their control represents many of the current-day challenges facing the robotics and automation communities. Not only are these systems challenging to control, but the design of their actuation is an NP-hard problem on its own. Recent methods either discretize the space before optimization, or apply tools from linear systems theory under restrictive linearity assumptions in order to arrive at a control solution. This manuscript provides a novel sampling-based stochastic optimization framework based entirely in Hilbert spaces suitable for the general class of semi-linear SPDEs which describes many systems in robotics and applied physics. This framework is utilized for simultaneous policy optimization and actuator co-design optimization. The resulting algorithm is based on variational optimization, and performs joint episodic optimization of the feedback control law and the actuation design over episodes. We study first and second order systems, and in doing so, extend several results to the case of second order SPDEs. Finally, we demonstrate the efficacy of the proposed approach with several simulated experiments on a variety of SPDEs in robotics and applied physics including an infinite degree-of-freedom soft robotic manipulator.
Submitted 17 February, 2021;
originally announced February 2021.
-
Using Social Networks to Improve Group Transition Prediction in Professional Sports
Authors:
Emily J. Evans,
Rebecca Jones,
Joseph Leung,
Benjamin Z. Webb
Abstract:
We examine whether social data can be used to predict how members of Major League Baseball (MLB) and members of the National Basketball Association (NBA) transition between teams during their career. We find that incorporating social data into various machine learning algorithms substantially improves the algorithms' ability to correctly determine these transitions. In particular, we measure how player performance, team fitness, and social data individually and collectively contribute to predicting these transitions. Incorporating individual performance and incorporating team fitness both improve the predictive accuracy of our algorithms. However, this improvement is dwarfed by the improvement seen when we include social data, suggesting that social relationships have a comparatively large effect on player transitions in both MLB and the NBA.
Submitted 1 September, 2020;
originally announced September 2020.
-
An interdisciplinary survey of network similarity methods
Authors:
Emily Evans,
Marissa Graham
Abstract:
Comparative graph and network analysis play an important role in both systems biology and pattern recognition, but existing surveys on the topic have historically ignored or underserved one or the other of these fields. We present an integrative introduction to the key objectives and methods of graph and network comparison in each field, with the intent of remaining accessible to relative novices in order to mitigate the barrier to interdisciplinary idea crossover.
To guide our investigation, and to quantitatively justify our assertions about what the key objectives and methods of each field are, we have constructed a citation network containing 5,793 vertices from the full reference lists of over two hundred relevant papers, which we collected by searching Google Scholar for ten different network comparison-related search terms. We investigate its basic statistics and community structure, and frame our presentation around the papers found to have high importance according to five different standard centrality measures.
Submitted 15 May, 2019;
originally announced May 2019.
-
Ex ante prediction of cascade sizes on networks of agents facing binary outcomes
Authors:
Paul Ormerod,
Ellie Evans
Abstract:
We consider in this paper the potential for ex ante prediction of the cascade size in a model of binary choice with externalities (Schelling 1973, Watts 2002). Agents are connected on a network and can be in one of two states of the world, 0 or 1. Initially, all are in state 0 and a small number of seeds are selected at random to switch to state 1. A simple threshold rule specifies whether other agents switch subsequently. The cascade size (the percolation) is the proportion of all agents which eventually switches to state 1. We select information on the connectivity of the initial seeds, the connectivity of the agents to which they are connected, the thresholds of these latter agents, and the thresholds of the agents to which these are connected. We obtain results for random, small world and scale-free networks with different network parameters and numbers of initial seeds. The results are robust with respect to these factors. We perform least squares regression of the logit transformation of the cascade size (Hosmer and Lemeshow 1989) on these potential explanatory variables. We find considerable explanatory power for the ex ante prediction of cascade sizes. For the random networks, on average 32 per cent of the variance of the cascade sizes is explained, 40 per cent for the small world and 46 per cent for the scale-free. The connectivity variables are hardly ever significant in the regressions, whether relating to the seeds themselves or to the agents connected to the seeds. In contrast, the information on the thresholds of agents contains much more explanatory power. This supports the conjecture of Watts and Dodds (2007) that large cascades are driven by a small mass of easily influenced agents.
Submitted 17 March, 2011;
originally announced March 2011.
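The binary-choice cascade described in the abstract above can be sketched in a few lines. The abstract only says "a simple threshold rule"; the version here assumes the fractional rule common in this literature, where an agent switches to state 1 once the share of its neighbours in state 1 reaches its personal threshold. The helper names (`random_graph`, `run_cascade`, `logit`), the Erdos-Renyi style graph generator, the parameter values, and the clipping constant in `logit` are all hypothetical choices for illustration, not details from the paper.

```python
import math
import random


def random_graph(n, p, rng):
    """Random graph as an adjacency dict (illustrative generator only)."""
    neighbours = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                neighbours[i].add(j)
                neighbours[j].add(i)
    return neighbours


def run_cascade(neighbours, thresholds, seeds):
    """Return the share of agents in state 1 once no further agent switches."""
    state = {node: 0 for node in neighbours}
    for s in seeds:
        state[s] = 1
    changed = True
    while changed:
        changed = False
        for node, nbrs in neighbours.items():
            if state[node] == 1 or not nbrs:
                continue
            # Assumed rule: switch when the active share of neighbours
            # reaches this agent's threshold.
            active_share = sum(state[m] for m in nbrs) / len(nbrs)
            if active_share >= thresholds[node]:
                state[node] = 1
                changed = True
    return sum(state.values()) / len(state)


def logit(p, eps=1e-6):
    """Logit transform, clipped so that p = 0 or p = 1 stays finite."""
    p = min(max(p, eps), 1 - eps)
    return math.log(p / (1 - p))


rng = random.Random(0)
graph = random_graph(200, 0.03, rng)
thresholds = {node: rng.uniform(0.05, 0.5) for node in graph}
seeds = rng.sample(sorted(graph), 3)
size = run_cascade(graph, thresholds, seeds)
print(f"cascade size = {size:.3f}, logit = {logit(size):.3f}")
```

Because switching is irreversible and the rule is monotone, the final set of switched agents does not depend on the order in which agents are visited, so the simple sweep loop is sufficient. Repeating the run over many random seeds and networks would give the cascade sizes whose logit is then regressed on connectivity and threshold variables.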