Search | arXiv e-print repository

Just How Flexible are Neural Networks in Practice?

Authors: Ravid Shwartz-Ziv, Micah Goldblum, Arpit Bansal, C. Bayan Bruss, Yann LeCun, Andrew Gordon Wilson

Abstract: It is widely believed that a neural network can fit a training set containing at least as many samples as it has parameters, underpinning notions of overparameterized and underparameterized models. In practice, however, we only find solutions accessible via our training procedure, including the optimizer and regularizers, limiting flexibility. Moreover, the exact parameterization of the function c… ▽ More It is widely believed that a neural network can fit a training set containing at least as many samples as it has parameters, underpinning notions of overparameterized and underparameterized models. In practice, however, we only find solutions accessible via our training procedure, including the optimizer and regularizers, limiting flexibility. Moreover, the exact parameterization of the function class, built into an architecture, shapes its loss surface and impacts the minima we find. In this work, we examine the ability of neural networks to fit data in practice. Our findings indicate that: (1) standard optimizers find minima where the model can only fit training sets with significantly fewer samples than it has parameters; (2) convolutional networks are more parameter-efficient than MLPs and ViTs, even on randomly labeled data; (3) while stochastic training is thought to have a regularizing effect, SGD actually finds minima that fit more training data than full-batch gradient descent; (4) the difference in capacity to fit correctly labeled and incorrectly labeled samples can be predictive of generalization; (5) ReLU activation functions result in finding minima that fit more data despite being designed to avoid vanishing and exploding gradients in deep architectures. △ Less

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.09177 [pdf, other]

Scalable and Flexible Causal Discovery with an Efficient Test for Adjacency

Authors: Alan Nawzad Amin, Andrew Gordon Wilson

Abstract: To make accurate predictions, understand mechanisms, and design interventions in systems of many variables, we wish to learn causal graphs from large scale data. Unfortunately the space of all possible causal graphs is enormous so scalably and accurately searching for the best fit to the data is a challenge. In principle we could substantially decrease the search space, or learn the graph entirely… ▽ More To make accurate predictions, understand mechanisms, and design interventions in systems of many variables, we wish to learn causal graphs from large scale data. Unfortunately the space of all possible causal graphs is enormous so scalably and accurately searching for the best fit to the data is a challenge. In principle we could substantially decrease the search space, or learn the graph entirely, by testing the conditional independence of variables. However, deciding if two variables are adjacent in a causal graph may require an exponential number of tests. Here we build a scalable and flexible method to evaluate if two variables are adjacent in a causal graph, the Differentiable Adjacency Test (DAT). DAT replaces an exponential number of tests with a provably equivalent relaxed problem. It then solves this problem by training two neural networks. We build a graph learning method based on DAT, DAT-Graph, that can also learn from data with interventions. DAT-Graph can learn graphs of 1000 variables with state of the art accuracy. Using the graph learned by DAT-Graph, we also build models that make much more accurate predictions of the effects of interventions on large scale RNA sequencing data. △ Less

Submitted 18 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

Comments: ICML 2024; Code at https://github.com/AlanNawzadAmin/DAT-graph

arXiv:2406.08391 [pdf, other]

Large Language Models Must Be Taught to Know What They Don't Know

Authors: Sanyam Kapoor, Nate Gruver, Manley Roberts, Katherine Collins, Arka Pal, Umang Bhatt, Adrian Weller, Samuel Dooley, Micah Goldblum, Andrew Gordon Wilson

Abstract: When using large language models (LLMs) in high-stakes applications, we need to know when we can trust their predictions. Some works argue that prompting high-performance LLMs is sufficient to produce calibrated uncertainties, while others introduce sampling methods that can be prohibitively expensive. In this work, we first argue that prompting on its own is insufficient to achieve good calibrati… ▽ More When using large language models (LLMs) in high-stakes applications, we need to know when we can trust their predictions. Some works argue that prompting high-performance LLMs is sufficient to produce calibrated uncertainties, while others introduce sampling methods that can be prohibitively expensive. In this work, we first argue that prompting on its own is insufficient to achieve good calibration and then show that fine-tuning on a small dataset of correct and incorrect answers can create an uncertainty estimate with good generalization and small computational overhead. We show that a thousand graded examples are sufficient to outperform baseline methods and that training through the features of a model is necessary for good performance and tractable for large open-source models when using LoRA. We also investigate the mechanisms that enable reliable LLM uncertainty estimation, finding that many models can be used as general-purpose uncertainty estimators, applicable not just to their own uncertainties but also the uncertainty of other models. Lastly, we show that uncertainty estimates inform human use of LLMs in human-AI collaborative settings through a user study. △ Less

Submitted 12 June, 2024; originally announced June 2024.

Comments: Code available at: https://github.com/activatedgeek/calibration-tuning

arXiv:2406.07337 [pdf, other]

Transferring Knowledge from Large Foundation Models to Small Downstream Models

Authors: Shikai Qiu, Boran Han, Danielle C. Maddix, Shuai Zhang, Yuyang Wang, Andrew Gordon Wilson

Abstract: How do we transfer the relevant knowledge from ever larger foundation models into small, task-specific downstream models that can run at much lower costs? Standard transfer learning using pre-trained weights as the initialization transfers limited information and commits us to often massive pre-trained architectures. This procedure also precludes combining multiple pre-trained models that learn co… ▽ More How do we transfer the relevant knowledge from ever larger foundation models into small, task-specific downstream models that can run at much lower costs? Standard transfer learning using pre-trained weights as the initialization transfers limited information and commits us to often massive pre-trained architectures. This procedure also precludes combining multiple pre-trained models that learn complementary information. To address these shortcomings, we introduce Adaptive Feature Transfer (AFT). Instead of transferring weights, AFT operates purely on features, thereby decoupling the choice of the pre-trained model from the smaller downstream model. Rather than indiscriminately compressing all pre-trained features, AFT adaptively transfers pre-trained features that are most useful for performing the downstream task, using a simple regularization that adds minimal overhead. Across multiple vision, language, and multi-modal datasets, AFT achieves significantly better downstream performance compared to alternatives with a similar computational cost. Furthermore, AFT reliably translates improvement in pre-trained models into improvement in downstream performance, even if the downstream model is over $50\times$ smaller, and can effectively transfer complementary information learned by multiple pre-trained models. △ Less

Submitted 11 June, 2024; originally announced June 2024.

Comments: ICML 2024. Code available at https://github.com/amazon-science/adaptive-feature-transfer

arXiv:2406.06248 [pdf, other]

Compute Better Spent: Replacing Dense Layers with Structured Matrices

Authors: Shikai Qiu, Andres Potapczynski, Marc Finzi, Micah Goldblum, Andrew Gordon Wilson

Abstract: Dense linear layers are the dominant computational bottleneck in foundation models. Identifying more efficient alternatives to dense matrices has enormous potential for building more compute-efficient models, as exemplified by the success of convolutional networks in the image domain. In this work, we systematically explore structured matrices as replacements for dense matrices. We show that diffe… ▽ More Dense linear layers are the dominant computational bottleneck in foundation models. Identifying more efficient alternatives to dense matrices has enormous potential for building more compute-efficient models, as exemplified by the success of convolutional networks in the image domain. In this work, we systematically explore structured matrices as replacements for dense matrices. We show that different structures often require drastically different initialization scales and learning rates, which are crucial to performance, especially as models scale. Using insights from the Maximal Update Parameterization, we determine the optimal scaling for initialization and learning rates of these unconventional layers. Finally, we measure the scaling laws of different structures to compare how quickly their performance improves with compute. We propose a novel matrix family containing Monarch matrices, the Block Tensor-Train (BTT), which we show performs better than dense matrices for the same compute on multiple tasks. On CIFAR-10/100 with augmentation, BTT achieves exponentially lower training loss than dense when training MLPs and ViTs. BTT matches dense ViT-S/32 performance on ImageNet-1k with 3.8 times less compute and is more efficient than dense for training small GPT-2 language models. △ Less

Submitted 10 June, 2024; originally announced June 2024.

Comments: ICML 24. Code available at https://github.com/shikaiqiu/compute-better-spent

arXiv:2405.14812 [pdf, other]

As an AI Language Model, "Yes I Would Recommend Calling the Police'': Norm Inconsistency in LLM Decision-Making

Authors: Shomik Jain, D Calacci, Ashia Wilson

Abstract: We investigate the phenomenon of norm inconsistency: where LLMs apply different norms in similar situations. Specifically, we focus on the high-risk application of deciding whether to call the police in Amazon Ring home surveillance videos. We evaluate the decisions of three state-of-the-art LLMs -- GPT-4, Gemini 1.0, and Claude 3 Sonnet -- in relation to the activities portrayed in the videos, th… ▽ More We investigate the phenomenon of norm inconsistency: where LLMs apply different norms in similar situations. Specifically, we focus on the high-risk application of deciding whether to call the police in Amazon Ring home surveillance videos. We evaluate the decisions of three state-of-the-art LLMs -- GPT-4, Gemini 1.0, and Claude 3 Sonnet -- in relation to the activities portrayed in the videos, the subjects' skin-tone and gender, and the characteristics of the neighborhoods where the videos were recorded. Our analysis reveals significant norm inconsistencies: (1) a discordance between the recommendation to call the police and the actual presence of criminal activity, and (2) biases influenced by the racial demographics of the neighborhoods. These results highlight the arbitrariness of model decisions in the surveillance context and the limitations of current bias detection and mitigation strategies in normative decision-making. △ Less

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.00740 [pdf, other]

Modeling Caption Diversity in Contrastive Vision-Language Pretraining

Authors: Samuel Lavoie, Polina Kirichenko, Mark Ibrahim, Mahmoud Assran, Andrew Gordon Wilson, Aaron Courville, Nicolas Ballas

Abstract: There are a thousand ways to caption an image. Contrastive Language Pretraining (CLIP) on the other hand, works by mapping an image and its caption to a single vector -- limiting how well CLIP-like models can represent the diverse ways to describe an image. In this work, we introduce Llip, Latent Language Image Pretraining, which models the diversity of captions that could match an image. Llip's v… ▽ More There are a thousand ways to caption an image. Contrastive Language Pretraining (CLIP) on the other hand, works by mapping an image and its caption to a single vector -- limiting how well CLIP-like models can represent the diverse ways to describe an image. In this work, we introduce Llip, Latent Language Image Pretraining, which models the diversity of captions that could match an image. Llip's vision encoder outputs a set of visual features that are mixed into a final representation by conditioning on information derived from the text. We show that Llip outperforms non-contextualized baselines like CLIP and SigLIP on a variety of tasks even with large-scale encoders. Llip improves zero-shot classification by an average of 2.9% zero-shot classification benchmarks with a ViT-G/14 encoder. Specifically, Llip attains a zero-shot top-1 accuracy of 83.5% on ImageNet outperforming a similarly sized CLIP by 1.4%. We also demonstrate improvement on zero-shot retrieval on MS-COCO by 6.0%. We provide a comprehensive analysis of the components introduced by the method and demonstrate that Llip leads to richer visual representations. △ Less

Submitted 14 May, 2024; v1 submitted 29 April, 2024; originally announced May 2024.

Comments: 14 pages, 8 figures, 7 tables, to be published at ICML2024

arXiv:2404.14952 [pdf, other]

Leveraging Speech for Gesture Detection in Multimodal Communication

Authors: Esam Ghaleb, Ilya Burenko, Marlou Rasenberg, Wim Pouw, Ivan Toni, Peter Uhrig, Anna Wilson, Judith Holler, Aslı Özyürek, Raquel Fernández

Abstract: Gestures are inherent to human interaction and often complement speech in face-to-face communication, forming a multimodal communication system. An important task in gesture analysis is detecting a gesture's beginning and end. Research on automatic gesture detection has primarily focused on visual and kinematic information to detect a limited set of isolated or silent gestures with low variability… ▽ More Gestures are inherent to human interaction and often complement speech in face-to-face communication, forming a multimodal communication system. An important task in gesture analysis is detecting a gesture's beginning and end. Research on automatic gesture detection has primarily focused on visual and kinematic information to detect a limited set of isolated or silent gestures with low variability, neglecting the integration of speech and vision signals to detect gestures that co-occur with speech. This work addresses this gap by focusing on co-speech gesture detection, emphasising the synchrony between speech and co-speech hand gestures. We address three main challenges: the variability of gesture forms, the temporal misalignment between gesture and speech onsets, and differences in sampling rate between modalities. We investigate extended speech time windows and employ separate backbone models for each modality to address the temporal misalignment and sampling rate differences. We utilize Transformer encoders in cross-modal and early fusion techniques to effectively align and integrate speech and skeletal sequences. The study results show that combining visual and speech information significantly enhances gesture detection performance. Our findings indicate that expanding the speech buffer beyond visual time segments improves performance and that multimodal integration using cross-modal and early fusion techniques outperforms baseline methods using unimodal and late fusion methods. Additionally, we find a correlation between the models' gesture prediction confidence and low-level speech frequency features potentially associated with gestures. Overall, the study provides a better understanding and detection methods for co-speech gestures, facilitating the analysis of multimodal communication. △ Less

Submitted 23 April, 2024; originally announced April 2024.

arXiv:2404.08592 [pdf, other]

Scarce Resource Allocations That Rely On Machine Learning Should Be Randomized

Authors: Shomik Jain, Kathleen Creel, Ashia Wilson

Abstract: Contrary to traditional deterministic notions of algorithmic fairness, this paper argues that fairly allocating scarce resources using machine learning often requires randomness. We address why, when, and how to randomize by proposing stochastic procedures that more adequately account for all of the claims that individuals have to allocations of social goods or opportunities. Contrary to traditional deterministic notions of algorithmic fairness, this paper argues that fairly allocating scarce resources using machine learning often requires randomness. We address why, when, and how to randomize by proposing stochastic procedures that more adequately account for all of the claims that individuals have to allocations of social goods or opportunities. △ Less

Submitted 19 June, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

Comments: To appear in the proceedings of the International Conference on Machine Learning (ICML 2024)

ACM Class: K.4.0

arXiv:2403.16365 [pdf, other]

Generating Potent Poisons and Backdoors from Scratch with Guided Diffusion

Authors: Hossein Souri, Arpit Bansal, Hamid Kazemi, Liam Fowl, Aniruddha Saha, Jonas Geiping, Andrew Gordon Wilson, Rama Chellappa, Tom Goldstein, Micah Goldblum

Abstract: Modern neural networks are often trained on massive datasets that are web scraped with minimal human inspection. As a result of this insecure curation pipeline, an adversary can poison or backdoor the resulting model by uploading malicious data to the internet and waiting for a victim to scrape and train on it. Existing approaches for creating poisons and backdoors start with randomly sampled clea… ▽ More Modern neural networks are often trained on massive datasets that are web scraped with minimal human inspection. As a result of this insecure curation pipeline, an adversary can poison or backdoor the resulting model by uploading malicious data to the internet and waiting for a victim to scrape and train on it. Existing approaches for creating poisons and backdoors start with randomly sampled clean data, called base samples, and then modify those samples to craft poisons. However, some base samples may be significantly more amenable to poisoning than others. As a result, we may be able to craft more potent poisons by carefully choosing the base samples. In this work, we use guided diffusion to synthesize base samples from scratch that lead to significantly more potent poisons and backdoors than previous state-of-the-art attacks. Our Guided Diffusion Poisoning (GDP) base samples can be combined with any downstream poisoning or backdoor attack to boost its effectiveness. Our implementation code is publicly available at: https://github.com/hsouri/GDP . △ Less

Submitted 24 March, 2024; originally announced March 2024.

arXiv:2403.14029 [pdf, other]

Quadcopter Team Configurable Motion Guided by a Quadruped

Authors: Mohammad Ghufran, Sourish Tetakayala, Jack Hughes, Aron Wilson, Hossein Rastgoftar

Abstract: The paper focuses on modeling and experimental evaluation of a quadcopter team configurable coordination guided by a single quadruped robot. We consider the quadcopter team as particles of a two-dimensional deformable body and propose a two-dimensional affine transformation model for safe and collision-free configurable coordination of this heterogeneous robotic system. The proposed affine transfo… ▽ More The paper focuses on modeling and experimental evaluation of a quadcopter team configurable coordination guided by a single quadruped robot. We consider the quadcopter team as particles of a two-dimensional deformable body and propose a two-dimensional affine transformation model for safe and collision-free configurable coordination of this heterogeneous robotic system. The proposed affine transformation is decomposed into translation, that is specified by the quadruped global position, and configurable motion of the quadcopters, which is determined by a nonsingular Jacobian matrix so that the quadcopter team can safely navigate a constrained environment while avoiding collision. We propose two methods to experimentally evaluate the proposed heterogeneous robot coordination model. The first method measures real positions of quadcopters, quadruped, and environmental objects all with respect to the global coordinate system. On the other hand, the second method measures position with respect to the local coordinate system fixed on the dog robot which in turn enables safe planning the Jacobian matrix of the quadcopter team while the world is virtually approached the robotic system. △ Less

Submitted 20 March, 2024; originally announced March 2024.

arXiv:2403.13947 [pdf, other]

BlendScape: Enabling Unified and Personalized Video-Conferencing Environments through Generative AI

Authors: Shwetha Rajaram, Nels Numan, Balasaravanan Thoravi Kumaravel, Nicolai Marquardt, Andrew D. Wilson

Abstract: Today's video-conferencing tools support a rich range of professional and social activities, but their generic, grid-based environments cannot be easily adapted to meet the varying needs of distributed collaborators. To enable end-user customization, we developed BlendScape, a system for meeting participants to compose video-conferencing environments tailored to their collaboration context by leve… ▽ More Today's video-conferencing tools support a rich range of professional and social activities, but their generic, grid-based environments cannot be easily adapted to meet the varying needs of distributed collaborators. To enable end-user customization, we developed BlendScape, a system for meeting participants to compose video-conferencing environments tailored to their collaboration context by leveraging AI image generation techniques. BlendScape supports flexible representations of task spaces by blending users' physical or virtual backgrounds into unified environments and implements multimodal interaction techniques to steer the generation. Through an evaluation with 15 end-users, we investigated their customization preferences for work and social scenarios. Participants could rapidly express their design intentions with BlendScape and envisioned using the system to structure collaboration in future meetings, but experienced challenges with preventing distracting elements. We implement scenarios to demonstrate BlendScape's expressiveness in supporting distributed collaboration techniques from prior work and propose composition techniques to improve the quality of environments. △ Less

Submitted 20 March, 2024; originally announced March 2024.

arXiv:2403.09869 [pdf, other]

Mind the GAP: Improving Robustness to Subpopulation Shifts with Group-Aware Priors

Authors: Tim G. J. Rudner, Ya Shi Zhang, Andrew Gordon Wilson, Julia Kempe

Abstract: Machine learning models often perform poorly under subpopulation shifts in the data distribution. Developing methods that allow machine learning models to better generalize to such shifts is crucial for safe deployment in real-world settings. In this paper, we develop a family of group-aware prior (GAP) distributions over neural network parameters that explicitly favor models that generalize well… ▽ More Machine learning models often perform poorly under subpopulation shifts in the data distribution. Developing methods that allow machine learning models to better generalize to such shifts is crucial for safe deployment in real-world settings. In this paper, we develop a family of group-aware prior (GAP) distributions over neural network parameters that explicitly favor models that generalize well under subpopulation shifts. We design a simple group-aware prior that only requires access to a small set of data with group information and demonstrate that training with this prior yields state-of-the-art performance -- even when only retraining the final layer of a previously trained non-robust model. Group aware-priors are conceptually simple, complementary to existing approaches, such as attribute pseudo labeling and data reweighting, and open up promising new avenues for harnessing Bayesian inference to enable robustness to subpopulation shifts. △ Less

Submitted 14 March, 2024; originally announced March 2024.

Comments: Published in Proceedings of the 27th International Conference on Artificial Intelligence and Statistics (AISTATS 2024)

arXiv:2403.07815 [pdf, other]

Chronos: Learning the Language of Time Series

Authors: Abdul Fatir Ansari, Lorenzo Stella, Caner Turkmen, Xiyuan Zhang, Pedro Mercado, Huibin Shen, Oleksandr Shchur, Syama Sundar Rangapuram, Sebastian Pineda Arango, Shubham Kapoor, Jasper Zschiegner, Danielle C. Maddix, Hao Wang, Michael W. Mahoney, Kari Torkkola, Andrew Gordon Wilson, Michael Bohlke-Schneider, Yuyang Wang

Abstract: We introduce Chronos, a simple yet effective framework for pretrained probabilistic time series models. Chronos tokenizes time series values using scaling and quantization into a fixed vocabulary and trains existing transformer-based language model architectures on these tokenized time series via the cross-entropy loss. We pretrained Chronos models based on the T5 family (ranging from 20M to 710M… ▽ More We introduce Chronos, a simple yet effective framework for pretrained probabilistic time series models. Chronos tokenizes time series values using scaling and quantization into a fixed vocabulary and trains existing transformer-based language model architectures on these tokenized time series via the cross-entropy loss. We pretrained Chronos models based on the T5 family (ranging from 20M to 710M parameters) on a large collection of publicly available datasets, complemented by a synthetic dataset that we generated via Gaussian processes to improve generalization. In a comprehensive benchmark consisting of 42 datasets, and comprising both classical local models and deep learning methods, we show that Chronos models: (a) significantly outperform other methods on datasets that were part of the training corpus; and (b) have comparable and occasionally superior zero-shot performance on new datasets, relative to methods that were trained specifically on them. Our results demonstrate that Chronos models can leverage time series data from diverse domains to improve zero-shot accuracy on unseen forecasting tasks, positioning pretrained models as a viable tool to greatly simplify forecasting pipelines. △ Less

Submitted 2 May, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

Comments: Code and model checkpoints available at https://github.com/amazon-science/chronos-forecasting

arXiv:2403.02695 [pdf, other]

Controllable Prompt Tuning For Balancing Group Distributional Robustness

Authors: Hoang Phan, Andrew Gordon Wilson, Qi Lei

Abstract: Models trained on data composed of different groups or domains can suffer from severe performance degradation under distribution shifts. While recent methods have largely focused on optimizing the worst-group objective, this often comes at the expense of good performance on other groups. To address this problem, we introduce an optimization scheme to achieve good performance across groups and find… ▽ More Models trained on data composed of different groups or domains can suffer from severe performance degradation under distribution shifts. While recent methods have largely focused on optimizing the worst-group objective, this often comes at the expense of good performance on other groups. To address this problem, we introduce an optimization scheme to achieve good performance across groups and find a good solution for all without severely sacrificing performance on any of them. However, directly applying such optimization involves updating the parameters of the entire network, making it both computationally expensive and challenging. Thus, we introduce Controllable Prompt Tuning (CPT), which couples our approach with prompt-tuning techniques. On spurious correlation benchmarks, our procedures achieve state-of-the-art results across both transformer and non-transformer architectures, as well as unimodal and multimodal data, while requiring only 0.4% tunable parameters. △ Less

Submitted 4 June, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

Comments: Proceedings of the 41st International Conference on Machine Learning

arXiv:2402.05980 [pdf, other]

Do Large Code Models Understand Programming Concepts? A Black-box Approach

Authors: Ashish Hooda, Mihai Christodorescu, Miltiadis Allamanis, Aaron Wilson, Kassem Fawaz, Somesh Jha

Abstract: Large Language Models' success on text generation has also made them better at code generation and coding tasks. While a lot of work has demonstrated their remarkable performance on tasks such as code completion and editing, it is still unclear as to why. We help bridge this gap by exploring to what degree auto-regressive models understand the logical constructs of the underlying programs. We prop… ▽ More Large Language Models' success on text generation has also made them better at code generation and coding tasks. While a lot of work has demonstrated their remarkable performance on tasks such as code completion and editing, it is still unclear as to why. We help bridge this gap by exploring to what degree auto-regressive models understand the logical constructs of the underlying programs. We propose Counterfactual Analysis for Programming Concept Predicates (CACP) as a counterfactual testing framework to evaluate whether Large Code Models understand programming concepts. With only black-box access to the model, we use CACP to evaluate ten popular Large Code Models for four different programming concepts. Our findings suggest that current models lack understanding of concepts such as data flow and control flow. △ Less

Submitted 23 February, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

arXiv:2402.04379 [pdf, other]

Fine-Tuned Language Models Generate Stable Inorganic Materials as Text

Authors: Nate Gruver, Anuroop Sriram, Andrea Madotto, Andrew Gordon Wilson, C. Lawrence Zitnick, Zachary Ulissi

Abstract: We propose fine-tuning large language models for generation of stable materials. While unorthodox, fine-tuning large language models on text-encoded atomistic data is simple to implement yet reliable, with around 90% of sampled structures obeying physical constraints on atom positions and charges. Using energy above hull calculations from both learned ML potentials and gold-standard DFT calculatio… ▽ More We propose fine-tuning large language models for generation of stable materials. While unorthodox, fine-tuning large language models on text-encoded atomistic data is simple to implement yet reliable, with around 90% of sampled structures obeying physical constraints on atom positions and charges. Using energy above hull calculations from both learned ML potentials and gold-standard DFT calculations, we show that our strongest model (fine-tuned LLaMA-2 70B) can generate materials predicted to be metastable at about twice the rate (49% vs 28%) of CDVAE, a competing diffusion model. Because of text prompting's inherent flexibility, our models can simultaneously be used for unconditional generation of stable material, infilling of partial structures and text-conditional generation. Finally, we show that language models' ability to capture key symmetries of crystal structures improves with model scale, suggesting that the biases of pretrained LLMs are surprisingly well-suited for atomistic data. △ Less

Submitted 6 February, 2024; originally announced February 2024.

Comments: ICLR 2024. Code available at: https://github.com/facebookresearch/crystal-llm

arXiv:2402.00809 [pdf, other]

Position: Bayesian Deep Learning is Needed in the Age of Large-Scale AI

Authors: Theodore Papamarkou, Maria Skoularidou, Konstantina Palla, Laurence Aitchison, Julyan Arbel, David Dunson, Maurizio Filippone, Vincent Fortuin, Philipp Hennig, José Miguel Hernández-Lobato, Aliaksandr Hubin, Alexander Immer, Theofanis Karaletsos, Mohammad Emtiyaz Khan, Agustinus Kristiadi, Yingzhen Li, Stephan Mandt, Christopher Nemeth, Michael A. Osborne, Tim G. J. Rudner, David Rügamer, Yee Whye Teh, Max Welling, Andrew Gordon Wilson, Ruqi Zhang

Abstract: In the current landscape of deep learning research, there is a predominant emphasis on achieving high predictive accuracy in supervised tasks involving large image and language datasets. However, a broader perspective reveals a multitude of overlooked metrics, tasks, and data types, such as uncertainty, active and continual learning, and scientific data, that demand attention. Bayesian deep learni… ▽ More In the current landscape of deep learning research, there is a predominant emphasis on achieving high predictive accuracy in supervised tasks involving large image and language datasets. However, a broader perspective reveals a multitude of overlooked metrics, tasks, and data types, such as uncertainty, active and continual learning, and scientific data, that demand attention. Bayesian deep learning (BDL) constitutes a promising avenue, offering advantages across these diverse settings. This paper posits that BDL can elevate the capabilities of deep learning. It revisits the strengths of BDL, acknowledges existing challenges, and highlights some exciting research avenues aimed at addressing these obstacles. Looking ahead, the discussion focuses on possible ways to combine large-scale foundation models with BDL to unlock their full potential. △ Less

Submitted 2 June, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

arXiv:2401.10149 [pdf, other]

Multi-Agent Reinforcement Learning for Maritime Operational Technology Cyber Security

Authors: Alec Wilson, Ryan Menzies, Neela Morarji, David Foster, Marco Casassa Mont, Esin Turkbeyler, Lisa Gralewski

Abstract: This paper demonstrates the potential for autonomous cyber defence to be applied on industrial control systems and provides a baseline environment to further explore Multi-Agent Reinforcement Learning's (MARL) application to this problem domain. It introduces a simulation environment, IPMSRL, of a generic Integrated Platform Management System (IPMS) and explores the use of MARL for autonomous cybe… ▽ More This paper demonstrates the potential for autonomous cyber defence to be applied on industrial control systems and provides a baseline environment to further explore Multi-Agent Reinforcement Learning's (MARL) application to this problem domain. It introduces a simulation environment, IPMSRL, of a generic Integrated Platform Management System (IPMS) and explores the use of MARL for autonomous cyber defence decision-making on generic maritime based IPMS Operational Technology (OT). OT cyber defensive actions are less mature than they are for Enterprise IT. This is due to the relatively brittle nature of OT infrastructure originating from the use of legacy systems, design-time engineering assumptions, and lack of full-scale modern security controls. There are many obstacles to be tackled across the cyber landscape due to continually increasing cyber-attack sophistication and the limitations of traditional IT-centric cyber defence solutions. Traditional IT controls are rarely deployed on OT infrastructure, and where they are, some threats aren't fully addressed. In our experiments, a shared critic implementation of Multi Agent Proximal Policy Optimisation (MAPPO) outperformed Independent Proximal Policy Optimisation (IPPO). MAPPO reached an optimal policy (episode outcome mean of 1) after 800K timesteps, whereas IPPO was only able to reach an episode outcome mean of 0.966 after one million timesteps. Hyperparameter tuning greatly improved training performance. Across one million timesteps the tuned hyperparameters reached an optimal policy whereas the default hyperparameters only managed to win sporadically, with most simulations resulting in a draw. We tested a real-world constraint, attack detection alert success, and found that when alert success probability is reduced to 0.75 or 0.9, the MARL defenders were still able to win in over 97.5% or 99.5% of episodes, respectively. △ Less

Submitted 18 January, 2024; originally announced January 2024.

Comments: 13 pages, 7 figures, Proceedings of the Conference on Applied Machine Learning in Information Security 2023 (CAMLIS)

Journal ref: Proceedings of the Conference on Applied Machine Learning in Information Security 2023 (CAMLIS), Arlington VA, USA, October 19-20, 2023, CEUR-WS.org, online CEUR-WS.org/Vol-3652/paper3.pdf

arXiv:2401.04349 [pdf]

WebGPU-SPY: Finding Fingerprints in the Sandbox through GPU Cache Attacks

Authors: Ethan Ferguson, Adam Wilson, Hoda Naghibijouybari

Abstract: Microarchitectural attacks on CPU structures have been studied in native applications, as well as in web browsers. These attacks continue to be a substantial threat to computing systems at all scales. With the proliferation of heterogeneous systems and integration of hardware accelerators in every computing system, modern web browsers provide the support of GPU-based acceleration for the graphic… ▽ More Microarchitectural attacks on CPU structures have been studied in native applications, as well as in web browsers. These attacks continue to be a substantial threat to computing systems at all scales. With the proliferation of heterogeneous systems and integration of hardware accelerators in every computing system, modern web browsers provide the support of GPU-based acceleration for the graphics and rendering processes. Emerging web standards also support the GPU acceleration of general-purpose computation within web browsers. In this paper, we present a new attack vector for microarchitectural attacks in web browsers. We use emerging GPU accelerating APIs in modern browsers (specifically WebGPU) to launch a GPU-based cache side channel attack on the compute stack of the GPU that spies on victim activities on the graphics (rendering) stack of the GPU. Unlike prior works that rely on JavaScript APIs or software interfaces to build timing primitives, we build the timer using GPU hardware resources and develop a cache side channel attack on Intel's integrated GPUs. We leverage the GPU's inherent parallelism at different levels to develop high-resolution parallel attacks. We demonstrate that GPU-based cache attacks can achieve a precision of 90 for website fingerprinting of 100 top websites. We also discuss potential countermeasures against the proposed attack to secure the systems at a critical time when these web standards are being developed and before they are widely deployed. △ Less

Submitted 8 January, 2024; originally announced January 2024.

arXiv:2401.01764 [pdf, other]

Understanding the Detrimental Class-level Effects of Data Augmentation

Authors: Polina Kirichenko, Mark Ibrahim, Randall Balestriero, Diane Bouchacourt, Ramakrishna Vedantam, Hamed Firooz, Andrew Gordon Wilson

Abstract: Data augmentation (DA) encodes invariance and provides implicit regularization critical to a model's performance in image classification tasks. However, while DA improves average accuracy, recent studies have shown that its impact can be highly class dependent: achieving optimal average accuracy comes at the cost of significantly hurting individual class accuracy by as much as 20% on ImageNet. The… ▽ More Data augmentation (DA) encodes invariance and provides implicit regularization critical to a model's performance in image classification tasks. However, while DA improves average accuracy, recent studies have shown that its impact can be highly class dependent: achieving optimal average accuracy comes at the cost of significantly hurting individual class accuracy by as much as 20% on ImageNet. There has been little progress in resolving class-level accuracy drops due to a limited understanding of these effects. In this work, we present a framework for understanding how DA interacts with class-level learning dynamics. Using higher-quality multi-label annotations on ImageNet, we systematically categorize the affected classes and find that the majority are inherently ambiguous, co-occur, or involve fine-grained distinctions, while DA controls the model's bias towards one of the closely related classes. While many of the previously reported performance drops are explained by multi-label annotations, our analysis of class confusions reveals other sources of accuracy degradation. We show that simple class-conditional augmentation strategies informed by our framework improve performance on the negatively affected classes. △ Less

Submitted 7 December, 2023; originally announced January 2024.

Comments: Neural Information Processing Systems (NeurIPS), 2023

arXiv:2312.17174 [pdf, other]

Visual Explanations of Image-Text Representations via Multi-Modal Information Bottleneck Attribution

Authors: Ying Wang, Tim G. J. Rudner, Andrew Gordon Wilson

Abstract: Vision-language pretrained models have seen remarkable success, but their application to safety-critical settings is limited by their lack of interpretability. To improve the interpretability of vision-language models such as CLIP, we propose a multi-modal information bottleneck (M2IB) approach that learns latent representations that compress irrelevant information while preserving relevant visual… ▽ More Vision-language pretrained models have seen remarkable success, but their application to safety-critical settings is limited by their lack of interpretability. To improve the interpretability of vision-language models such as CLIP, we propose a multi-modal information bottleneck (M2IB) approach that learns latent representations that compress irrelevant information while preserving relevant visual and textual features. We demonstrate how M2IB can be applied to attribution analysis of vision-language pretrained models, increasing attribution accuracy and improving the interpretability of such models when applied to safety-critical domains such as healthcare. Crucially, unlike commonly used unimodal attribution methods, M2IB does not require ground truth labels, making it possible to audit representations of vision-language pretrained models when multiple modalities but no ground-truth data is available. Using CLIP as an example, we demonstrate the effectiveness of M2IB attribution and show that it outperforms gradient-based, perturbation-based, and attention-based attribution methods both qualitatively and quantitatively. △ Less

Submitted 22 June, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

Comments: Published in Advances in Neural Information Processing Systems 36 (NeurIPS 2023)

arXiv:2312.17173 [pdf, other]

Non-Vacuous Generalization Bounds for Large Language Models

Authors: Sanae Lotfi, Marc Finzi, Yilun Kuang, Tim G. J. Rudner, Micah Goldblum, Andrew Gordon Wilson

Abstract: Modern language models can contain billions of parameters, raising the question of whether they can generalize beyond the training data or simply regurgitate their training corpora. We provide the first non-vacuous generalization bounds for pretrained large language models (LLMs), indicating that language models are capable of discovering regularities that generalize to unseen data. In particular,… ▽ More Modern language models can contain billions of parameters, raising the question of whether they can generalize beyond the training data or simply regurgitate their training corpora. We provide the first non-vacuous generalization bounds for pretrained large language models (LLMs), indicating that language models are capable of discovering regularities that generalize to unseen data. In particular, we derive a compression bound that is valid for the unbounded log-likelihood loss using prediction smoothing, and we extend the bound to handle subsampling, accelerating bound computation on massive datasets. To achieve the extreme level of compression required for non-vacuous generalization bounds, we devise SubLoRA, a low-dimensional non-linear parameterization. Using this approach, we find that larger models have better generalization bounds and are more compressible than smaller models. △ Less

Submitted 12 February, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

arXiv:2312.17162 [pdf, other]

Function-Space Regularization in Neural Networks: A Probabilistic Perspective

Authors: Tim G. J. Rudner, Sanyam Kapoor, Shikai Qiu, Andrew Gordon Wilson

Abstract: Parameter-space regularization in neural network optimization is a fundamental tool for improving generalization. However, standard parameter-space regularization methods make it challenging to encode explicit preferences about desired predictive functions into neural network training. In this work, we approach regularization in neural networks from a probabilistic perspective and show that by vie… ▽ More Parameter-space regularization in neural network optimization is a fundamental tool for improving generalization. However, standard parameter-space regularization methods make it challenging to encode explicit preferences about desired predictive functions into neural network training. In this work, we approach regularization in neural networks from a probabilistic perspective and show that by viewing parameter-space regularization as specifying an empirical prior distribution over the model parameters, we can derive a probabilistically well-motivated regularization technique that allows explicitly encoding information about desired predictive functions into neural network training. This method -- which we refer to as function-space empirical Bayes (FSEB) -- includes both parameter- and function-space regularization, is mathematically simple, easy to implement, and incurs only minimal computational overhead compared to standard regularization techniques. We evaluate the utility of this regularization technique empirically and demonstrate that the proposed method leads to near-perfect semantic shift detection, highly-calibrated predictive uncertainty estimates, successful task adaption from pre-trained models, and improved generalization under covariate shift. △ Less

Submitted 28 December, 2023; originally announced December 2023.

Comments: Published in Proceedings of the 40th International Conference on Machine Learning (ICML 2023)

arXiv:2312.09323 [pdf, other]

Perspectives on the State and Future of Deep Learning - 2023

Authors: Micah Goldblum, Anima Anandkumar, Richard Baraniuk, Tom Goldstein, Kyunghyun Cho, Zachary C Lipton, Melanie Mitchell, Preetum Nakkiran, Max Welling, Andrew Gordon Wilson

Abstract: The goal of this series is to chronicle opinions and issues in the field of machine learning as they stand today and as they change over time. The plan is to host this survey periodically until the AI singularity paperclip-frenzy-driven doomsday, keeping an updated list of topical questions and interviewing new community members for each edition. In this issue, we probed people's opinions on inter… ▽ More The goal of this series is to chronicle opinions and issues in the field of machine learning as they stand today and as they change over time. The plan is to host this survey periodically until the AI singularity paperclip-frenzy-driven doomsday, keeping an updated list of topical questions and interviewing new community members for each edition. In this issue, we probed people's opinions on interpretable AI, the value of benchmarking in modern NLP, the state of progress towards understanding deep learning, and the future of academia. △ Less

Submitted 18 December, 2023; v1 submitted 7 December, 2023; originally announced December 2023.

arXiv:2312.08823 [pdf, other]

Fast sampling from constrained spaces using the Metropolis-adjusted Mirror Langevin algorithm

Authors: Vishwak Srinivasan, Andre Wibisono, Ashia Wilson

Abstract: We propose a new method called the Metropolis-adjusted Mirror Langevin algorithm for approximate sampling from distributions whose support is a compact and convex set. This algorithm adds an accept-reject filter to the Markov chain induced by a single step of the Mirror Langevin algorithm (Zhang et al., 2020), which is a basic discretisation of the Mirror Langevin dynamics. Due to the inclusion of… ▽ More We propose a new method called the Metropolis-adjusted Mirror Langevin algorithm for approximate sampling from distributions whose support is a compact and convex set. This algorithm adds an accept-reject filter to the Markov chain induced by a single step of the Mirror Langevin algorithm (Zhang et al., 2020), which is a basic discretisation of the Mirror Langevin dynamics. Due to the inclusion of this filter, our method is unbiased relative to the target, while known discretisations of the Mirror Langevin dynamics including the Mirror Langevin algorithm have an asymptotic bias. For this algorithm, we also give upper bounds for the number of iterations taken to mix to a constrained distribution whose potential is relatively smooth, convex, and Lipschitz continuous with respect to a self-concordant mirror function. As a consequence of the reversibility of the Markov chain induced by the inclusion of the Metropolis-Hastings filter, we obtain an exponentially better dependence on the error tolerance for approximate constrained sampling. We also present numerical experiments that corroborate our theoretical findings. △ Less

Submitted 21 June, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

Comments: 49 pages, 6 figures, 2 tables. Shorter version without experiments accepted to COLT 2024

arXiv:2312.05879 [pdf, other]

Wild Motion Unleashed: Markerless 3D Kinematics and Force Estimation in Cheetahs

Authors: Zico da Silva, Stacy Shield, Penny E. Hudson, Alan M. Wilson, Fred Nicolls, Amir Patel

Abstract: The complex dynamics of animal manoeuvrability in the wild is extremely challenging to study. The cheetah ($\textit{Acinonyx jubatus}$) is a perfect example: despite great interest in its unmatched speed and manoeuvrability, obtaining complete whole-body motion data from these animals remains an unsolved problem. This is especially difficult in wild cheetahs, where it is essential that the methods… ▽ More The complex dynamics of animal manoeuvrability in the wild is extremely challenging to study. The cheetah ($\textit{Acinonyx jubatus}$) is a perfect example: despite great interest in its unmatched speed and manoeuvrability, obtaining complete whole-body motion data from these animals remains an unsolved problem. This is especially difficult in wild cheetahs, where it is essential that the methods used are remote and do not constrain the animal's motion. In this work, we use data obtained from cheetahs in the wild to present a trajectory optimisation approach for estimating the 3D kinematics and joint torques of subjects remotely. We call this approach kinetic full trajectory estimation (K-FTE). We validate the method on a dataset comprising synchronised video and force plate data. We are able to reconstruct the 3D kinematics with an average reprojection error of 17.69 pixels (62.94 $\%$ PCK using the nose-to-eye(s) length segment as a threshold), while the estimates produce an average root-mean-square error of 171.3 N ($\approx$ 17.16 $\%$ of peak force during stride) for the estimated ground reaction force when compared against the force plate data. While the joint torques cannot be directly validated against ground truth data, as no such data is available for cheetahs, the estimated torques agree with previous studies of quadrupeds in controlled settings. These results will enable deeper insight into the study of animal locomotion in a more natural environment for both biologists and roboticists. △ Less

Submitted 10 December, 2023; originally announced December 2023.

arXiv:2312.02796 [pdf, other]

Materials Expert-Artificial Intelligence for Materials Discovery

Authors: Yanjun Liu, Milena Jovanovic, Krishnanand Mallayya, Wesley J. Maddox, Andrew Gordon Wilson, Sebastian Klemenz, Leslie M. Schoop, Eun-Ah Kim

Abstract: The advent of material databases provides an unprecedented opportunity to uncover predictive descriptors for emergent material properties from vast data space. However, common reliance on high-throughput ab initio data necessarily inherits limitations of such data: mismatch with experiments. On the other hand, experimental decisions are often guided by an expert's intuition honed from experiences… ▽ More The advent of material databases provides an unprecedented opportunity to uncover predictive descriptors for emergent material properties from vast data space. However, common reliance on high-throughput ab initio data necessarily inherits limitations of such data: mismatch with experiments. On the other hand, experimental decisions are often guided by an expert's intuition honed from experiences that are rarely articulated. We propose using machine learning to "bottle" such operational intuition into quantifiable descriptors using expertly curated measurement-based data. We introduce "Materials Expert-Artificial Intelligence" (ME-AI) to encapsulate and articulate this human intuition. As a first step towards such a program, we focus on the topological semimetal (TSM) among square-net materials as the property inspired by the expert-identified descriptor based on structural information: the tolerance factor. We start by curating a dataset encompassing 12 primary features of 879 square-net materials, using experimental data whenever possible. We then use Dirichlet-based Gaussian process regression using a specialized kernel to reveal composite descriptors for square-net topological semimetals. The ME-AI learned descriptors independently reproduce expert intuition and expand upon it. Specifically, new descriptors point to hypervalency as a critical chemical feature predicting TSM within square-net compounds. Our success with a carefully defined problem points to the "machine bottling human insight" approach as promising for machine learning-aided material discovery. △ Less

Submitted 5 December, 2023; originally announced December 2023.

Comments: 8 pages main text, 4 figs, 8 pages Supplementary material

arXiv:2312.02517 [pdf, other]

Simplifying Neural Network Training Under Class Imbalance

Authors: Ravid Shwartz-Ziv, Micah Goldblum, Yucen Lily Li, C. Bayan Bruss, Andrew Gordon Wilson

Abstract: Real-world datasets are often highly class-imbalanced, which can adversely impact the performance of deep learning models. The majority of research on training neural networks under class imbalance has focused on specialized loss functions, sampling techniques, or two-stage training procedures. Notably, we demonstrate that simply tuning existing components of standard deep learning pipelines, such… ▽ More Real-world datasets are often highly class-imbalanced, which can adversely impact the performance of deep learning models. The majority of research on training neural networks under class imbalance has focused on specialized loss functions, sampling techniques, or two-stage training procedures. Notably, we demonstrate that simply tuning existing components of standard deep learning pipelines, such as the batch size, data augmentation, optimizer, and label smoothing, can achieve state-of-the-art performance without any such specialized class imbalance methods. We also provide key prescriptions and considerations for training under class imbalance, and an understanding of why imbalance methods succeed or fail. △ Less

Submitted 5 December, 2023; originally announced December 2023.

Comments: NeurIPS 2023. Code available at https://github.com/ravidziv/SimplifyingImbalancedTraining

arXiv:2311.15990 [pdf, other]

Should We Learn Most Likely Functions or Parameters?

Authors: Shikai Qiu, Tim G. J. Rudner, Sanyam Kapoor, Andrew Gordon Wilson

Abstract: Standard regularized training procedures correspond to maximizing a posterior distribution over parameters, known as maximum a posteriori (MAP) estimation. However, model parameters are of interest only insomuch as they combine with the functional form of a model to provide a function that can make good predictions. Moreover, the most likely parameters under the parameter posterior do not generall… ▽ More Standard regularized training procedures correspond to maximizing a posterior distribution over parameters, known as maximum a posteriori (MAP) estimation. However, model parameters are of interest only insomuch as they combine with the functional form of a model to provide a function that can make good predictions. Moreover, the most likely parameters under the parameter posterior do not generally correspond to the most likely function induced by the parameter posterior. In fact, we can re-parametrize a model such that any setting of parameters can maximize the parameter posterior. As an alternative, we investigate the benefits and drawbacks of directly estimating the most likely function implied by the model and the data. We show that this procedure leads to pathological solutions when using neural networks and prove conditions under which the procedure is well-behaved, as well as a scalable approximation. Under these conditions, we find that function-space MAP estimation can lead to flatter minima, better generalization, and improved robustness to overfitting. △ Less

Submitted 27 November, 2023; originally announced November 2023.

Comments: NeurIPS 2023. Code available at https://github.com/activatedgeek/function-space-map

arXiv:2311.05877 [pdf, other]

A Performance-Driven Benchmark for Feature Selection in Tabular Deep Learning

Authors: Valeriia Cherepanova, Roman Levin, Gowthami Somepalli, Jonas Geiping, C. Bayan Bruss, Andrew Gordon Wilson, Tom Goldstein, Micah Goldblum

Abstract: Academic tabular benchmarks often contain small sets of curated features. In contrast, data scientists typically collect as many features as possible into their datasets, and even engineer new features from existing ones. To prevent overfitting in subsequent downstream modeling, practitioners commonly use automated feature selection methods that identify a reduced subset of informative features. E… ▽ More Academic tabular benchmarks often contain small sets of curated features. In contrast, data scientists typically collect as many features as possible into their datasets, and even engineer new features from existing ones. To prevent overfitting in subsequent downstream modeling, practitioners commonly use automated feature selection methods that identify a reduced subset of informative features. Existing benchmarks for tabular feature selection consider classical downstream models, toy synthetic datasets, or do not evaluate feature selectors on the basis of downstream performance. Motivated by the increasing popularity of tabular deep learning, we construct a challenging feature selection benchmark evaluated on downstream neural networks including transformers, using real datasets and multiple methods for generating extraneous features. We also propose an input-gradient-based analogue of Lasso for neural networks that outperforms classical feature selection methods on challenging problems such as selecting from corrupted or second-order features. △ Less

Submitted 10 November, 2023; originally announced November 2023.

Journal ref: Conference on Neural Information Processing Systems 2023

arXiv:2310.19909 [pdf, other]

Battle of the Backbones: A Large-Scale Comparison of Pretrained Models across Computer Vision Tasks

Authors: Micah Goldblum, Hossein Souri, Renkun Ni, Manli Shu, Viraj Prabhu, Gowthami Somepalli, Prithvijit Chattopadhyay, Mark Ibrahim, Adrien Bardes, Judy Hoffman, Rama Chellappa, Andrew Gordon Wilson, Tom Goldstein

Abstract: Neural network based computer vision systems are typically built on a backbone, a pretrained or randomly initialized feature extractor. Several years ago, the default option was an ImageNet-trained convolutional neural network. However, the recent past has seen the emergence of countless backbones pretrained using various algorithms and datasets. While this abundance of choice has led to performan… ▽ More Neural network based computer vision systems are typically built on a backbone, a pretrained or randomly initialized feature extractor. Several years ago, the default option was an ImageNet-trained convolutional neural network. However, the recent past has seen the emergence of countless backbones pretrained using various algorithms and datasets. While this abundance of choice has led to performance increases for a range of systems, it is difficult for practitioners to make informed decisions about which backbone to choose. Battle of the Backbones (BoB) makes this choice easier by benchmarking a diverse suite of pretrained models, including vision-language models, those trained via self-supervised learning, and the Stable Diffusion backbone, across a diverse set of computer vision tasks ranging from classification to object detection to OOD generalization and more. Furthermore, BoB sheds light on promising directions for the research community to advance computer vision by illuminating strengths and weakness of existing approaches through a comprehensive analysis conducted on more than 1500 training runs. While vision transformers (ViTs) and self-supervised learning (SSL) are increasingly popular, we find that convolutional neural networks pretrained in a supervised fashion on large training sets still perform best on most tasks among the models we consider. Moreover, in apples-to-apples comparisons on the same architectures and similarly sized pretraining datasets, we find that SSL backbones are highly competitive, indicating that future works should perform SSL pretraining with advanced architectures and larger pretraining datasets. We release the raw results of our experiments along with code that allows researchers to put their own backbones through the gauntlet here: https://github.com/hsouri/Battle-of-the-Backbones △ Less

Submitted 19 November, 2023; v1 submitted 30 October, 2023; originally announced October 2023.

Comments: Accepted to NeurIPS 2023

arXiv:2310.16139 [pdf, other]

Pix2HDR -- A pixel-wise acquisition and deep learning-based synthesis approach for high-speed HDR videos

Authors: Caixin Wang, Jie Zhang, Matthew A. Wilson, Ralph Etienne-Cummings

Abstract: Accurately capturing dynamic scenes with wide-ranging motion and light intensity is crucial for many vision applications. However, acquiring high-speed high dynamic range (HDR) video is challenging because the camera's frame rate restricts its dynamic range. Existing methods sacrifice speed to acquire multi-exposure frames. Yet, misaligned motion in these frames can still pose complications for HD… ▽ More Accurately capturing dynamic scenes with wide-ranging motion and light intensity is crucial for many vision applications. However, acquiring high-speed high dynamic range (HDR) video is challenging because the camera's frame rate restricts its dynamic range. Existing methods sacrifice speed to acquire multi-exposure frames. Yet, misaligned motion in these frames can still pose complications for HDR fusion algorithms, resulting in artifacts. Instead of frame-based exposures, we sample the videos using individual pixels at varying exposures and phase offsets. Implemented on a monochrome pixel-wise programmable image sensor, our sampling pattern simultaneously captures fast motion at a high dynamic range. We then transform pixel-wise outputs into an HDR video using end-to-end learned weights from deep neural networks, achieving high spatiotemporal resolution with minimized motion blurring. We demonstrate aliasing-free HDR video acquisition at 1000 FPS, resolving fast motion under low-light conditions and against bright backgrounds - both challenging conditions for conventional cameras. By combining the versatility of pixel-wise sampling patterns with the strength of deep neural networks at decoding complex scenes, our method greatly enhances the vision system's adaptability and performance in dynamic conditions. △ Less

Submitted 25 April, 2024; v1 submitted 24 October, 2023; originally announced October 2023.

Comments: 17 pages, 18 figures

arXiv:2310.07820 [pdf, other]

Large Language Models Are Zero-Shot Time Series Forecasters

Authors: Nate Gruver, Marc Finzi, Shikai Qiu, Andrew Gordon Wilson

Abstract: By encoding time series as a string of numerical digits, we can frame time series forecasting as next-token prediction in text. Developing this approach, we find that large language models (LLMs) such as GPT-3 and LLaMA-2 can surprisingly zero-shot extrapolate time series at a level comparable to or exceeding the performance of purpose-built time series models trained on the downstream tasks. To f… ▽ More By encoding time series as a string of numerical digits, we can frame time series forecasting as next-token prediction in text. Developing this approach, we find that large language models (LLMs) such as GPT-3 and LLaMA-2 can surprisingly zero-shot extrapolate time series at a level comparable to or exceeding the performance of purpose-built time series models trained on the downstream tasks. To facilitate this performance, we propose procedures for effectively tokenizing time series data and converting discrete distributions over tokens into highly flexible densities over continuous values. We argue the success of LLMs for time series stems from their ability to naturally represent multimodal distributions, in conjunction with biases for simplicity, and repetition, which align with the salient features in many time series, such as repeated seasonal trends. We also show how LLMs can naturally handle missing data without imputation through non-numerical text, accommodate textual side information, and answer questions to help explain predictions. While we find that increasing model size generally improves performance on time series, we show GPT-4 can perform worse than GPT-3 because of how it tokenizes numbers, and poor uncertainty calibration, which is likely the result of alignment interventions such as RLHF. △ Less

Submitted 18 June, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

Comments: NeurIPS 2023. Code available at: https://github.com/ngruver/llmtime

arXiv:2309.09944 [pdf, other]

DiffusionWorldViewer: Exposing and Broadening the Worldview Reflected by Generative Text-to-Image Models

Authors: Zoe De Simone, Angie Boggust, Arvind Satyanarayan, Ashia Wilson

Abstract: Generative text-to-image (TTI) models produce high-quality images from short textual descriptions and are widely used in academic and creative domains. Like humans, TTI models have a worldview, a conception of the world learned from their training data and task that influences the images they generate for a given prompt. However, the worldviews of TTI models are often hidden from users, making it… ▽ More Generative text-to-image (TTI) models produce high-quality images from short textual descriptions and are widely used in academic and creative domains. Like humans, TTI models have a worldview, a conception of the world learned from their training data and task that influences the images they generate for a given prompt. However, the worldviews of TTI models are often hidden from users, making it challenging for users to build intuition about TTI outputs, and they are often misaligned with users' worldviews, resulting in output images that do not match user expectations. In response, we introduce DiffusionWorldViewer, an interactive interface that exposes a TTI model's worldview across output demographics and provides editing tools for aligning output images with user perspectives. In a user study with 18 diverse TTI users, we find that DiffusionWorldViewer helps users represent their varied viewpoints in generated images and challenge the limited worldview reflected in current TTI models. △ Less

Submitted 5 February, 2024; v1 submitted 18 September, 2023; originally announced September 2023.

Comments: 20 pages, 8 figures

arXiv:2309.03060 [pdf, other]

CoLA: Exploiting Compositional Structure for Automatic and Efficient Numerical Linear Algebra

Authors: Andres Potapczynski, Marc Finzi, Geoff Pleiss, Andrew Gordon Wilson

Abstract: Many areas of machine learning and science involve large linear algebra problems, such as eigendecompositions, solving linear systems, computing matrix exponentials, and trace estimation. The matrices involved often have Kronecker, convolutional, block diagonal, sum, or product structure. In this paper, we propose a simple but general framework for large-scale linear algebra problems in machine le… ▽ More Many areas of machine learning and science involve large linear algebra problems, such as eigendecompositions, solving linear systems, computing matrix exponentials, and trace estimation. The matrices involved often have Kronecker, convolutional, block diagonal, sum, or product structure. In this paper, we propose a simple but general framework for large-scale linear algebra problems in machine learning, named CoLA (Compositional Linear Algebra). By combining a linear operator abstraction with compositional dispatch rules, CoLA automatically constructs memory and runtime efficient numerical algorithms. Moreover, CoLA provides memory efficient automatic differentiation, low precision computation, and GPU acceleration in both JAX and PyTorch, while also accommodating new objects, operations, and rules in downstream packages via multiple dispatch. CoLA can accelerate many algebraic operations, while making it easy to prototype matrix structures and algorithms, providing an appealing drop-in tool for virtually any computational effort that requires linear algebra. We showcase its efficacy across a broad range of applications, including partial differential equations, Gaussian processes, equivariant model construction, and unsupervised learning. △ Less

Submitted 29 November, 2023; v1 submitted 6 September, 2023; originally announced September 2023.

Comments: Code available at https://github.com/wilson-labs/cola. NeurIPS 2023

arXiv:2306.13564 [pdf, other]

Estimating Residential Solar Potential Using Aerial Data

Authors: Ross Goroshin, Alex Wilson, Andrew Lamb, Betty Peng, Brandon Ewonus, Cornelius Ratsch, Jordan Raisher, Marisa Leung, Max Burq, Thomas Colthurst, William Rucklidge, Carl Elkin

Abstract: Project Sunroof estimates the solar potential of residential buildings using high quality aerial data. That is, it estimates the potential solar energy (and associated financial savings) that can be captured by buildings if solar panels were to be installed on their roofs. Unfortunately its coverage is limited by the lack of high resolution digital surface map (DSM) data. We present a deep learnin… ▽ More Project Sunroof estimates the solar potential of residential buildings using high quality aerial data. That is, it estimates the potential solar energy (and associated financial savings) that can be captured by buildings if solar panels were to be installed on their roofs. Unfortunately its coverage is limited by the lack of high resolution digital surface map (DSM) data. We present a deep learning approach that bridges this gap by enhancing widely available low-resolution data, thereby dramatically increasing the coverage of Sunroof. We also present some ongoing efforts to potentially improve accuracy even further by replacing certain algorithmic components of the Sunroof processing pipeline with deep learning. △ Less

Submitted 23 June, 2023; originally announced June 2023.

Journal ref: ICLR 2023 - Tackling Climate Change with Machine Learning Workshop

arXiv:2306.11074 [pdf, other]

Simple and Fast Group Robustness by Automatic Feature Reweighting

Authors: Shikai Qiu, Andres Potapczynski, Pavel Izmailov, Andrew Gordon Wilson

Abstract: A major challenge to out-of-distribution generalization is reliance on spurious features -- patterns that are predictive of the class label in the training data distribution, but not causally related to the target. Standard methods for reducing the reliance on spurious features typically assume that we know what the spurious feature is, which is rarely true in the real world. Methods that attempt… ▽ More A major challenge to out-of-distribution generalization is reliance on spurious features -- patterns that are predictive of the class label in the training data distribution, but not causally related to the target. Standard methods for reducing the reliance on spurious features typically assume that we know what the spurious feature is, which is rarely true in the real world. Methods that attempt to alleviate this limitation are complex, hard to tune, and lead to a significant computational overhead compared to standard training. In this paper, we propose Automatic Feature Reweighting (AFR), an extremely simple and fast method for updating the model to reduce the reliance on spurious features. AFR retrains the last layer of a standard ERM-trained base model with a weighted loss that emphasizes the examples where the ERM model predicts poorly, automatically upweighting the minority group without group labels. With this simple procedure, we improve upon the best reported results among competing methods trained without spurious attributes on several vision and natural language classification benchmarks, using only a fraction of their compute. △ Less

Submitted 19 June, 2023; originally announced June 2023.

Comments: ICML 23. Code available at https://github.com/AndPotap/afr

Journal ref: 40th International Conference on Machine Learning 2023

arXiv:2306.07526 [pdf, other]

User-defined Event Sampling and Uncertainty Quantification in Diffusion Models for Physical Dynamical Systems

Authors: Marc Finzi, Anudhyan Boral, Andrew Gordon Wilson, Fei Sha, Leonardo Zepeda-Núñez

Abstract: Diffusion models are a class of probabilistic generative models that have been widely used as a prior for image processing tasks like text conditional generation and inpainting. We demonstrate that these models can be adapted to make predictions and provide uncertainty quantification for chaotic dynamical systems. In these applications, diffusion models can implicitly represent knowledge about out… ▽ More Diffusion models are a class of probabilistic generative models that have been widely used as a prior for image processing tasks like text conditional generation and inpainting. We demonstrate that these models can be adapted to make predictions and provide uncertainty quantification for chaotic dynamical systems. In these applications, diffusion models can implicitly represent knowledge about outliers and extreme events; however, querying that knowledge through conditional sampling or measuring probabilities is surprisingly difficult. Existing methods for conditional sampling at inference time seek mainly to enforce the constraints, which is insufficient to match the statistics of the distribution or compute the probability of the chosen events. To achieve these ends, optimally one would use the conditional score function, but its computation is typically intractable. In this work, we develop a probabilistic approximation scheme for the conditional score function which provably converges to the true distribution as the noise level decreases. With this scheme we are able to sample conditionally on nonlinear userdefined events at inference time, and matches data statistics even when sampling from the tails of the distribution. △ Less

Submitted 12 June, 2023; originally announced June 2023.

Comments: ICML 2023 Conference

arXiv:2306.05854 [pdf, other]

Partitioning Strategies for Distributed SMT Solving

Authors: Amalee Wilson, Andres Noetzli, Andrew Reynolds, Byron Cook, Cesare Tinelli, Clark Barrett

Abstract: For many users of Satisfiability Modulo Theories (SMT) solvers, the solver's performance is the main bottleneck in their application. One promising approach for improving performance is to leverage the increasing availability of parallel and cloud computing. However, despite many efforts, the best parallel approach to date consists of running a portfolio of solvers, meaning that performance is sti… ▽ More For many users of Satisfiability Modulo Theories (SMT) solvers, the solver's performance is the main bottleneck in their application. One promising approach for improving performance is to leverage the increasing availability of parallel and cloud computing. However, despite many efforts, the best parallel approach to date consists of running a portfolio of solvers, meaning that performance is still limited by the best possible sequential performance. In this paper, we revisit divide-and-conquer approaches to parallel SMT, in which a challenging problem is partitioned into several subproblems. We introduce several new partitioning strategies and evaluate their performance, both alone as well as within portfolios, on a large set of difficult SMT benchmarks. We show that hybrid portfolios that include our new strategies can significantly outperform traditional portfolios for parallel SMT. △ Less

Submitted 8 June, 2023; originally announced June 2023.

Comments: Submitted to FMCAD 2023

arXiv:2305.20028 [pdf, other]

A Study of Bayesian Neural Network Surrogates for Bayesian Optimization

Authors: Yucen Lily Li, Tim G. J. Rudner, Andrew Gordon Wilson

Abstract: Bayesian optimization is a highly efficient approach to optimizing objective functions which are expensive to query. These objectives are typically represented by Gaussian process (GP) surrogate models which are easy to optimize and support exact inference. While standard GP surrogates have been well-established in Bayesian optimization, Bayesian neural networks (BNNs) have recently become practic… ▽ More Bayesian optimization is a highly efficient approach to optimizing objective functions which are expensive to query. These objectives are typically represented by Gaussian process (GP) surrogate models which are easy to optimize and support exact inference. While standard GP surrogates have been well-established in Bayesian optimization, Bayesian neural networks (BNNs) have recently become practical function approximators, with many benefits over standard GPs such as the ability to naturally handle non-stationarity and learn representations for high-dimensional data. In this paper, we study BNNs as alternatives to standard GP surrogates for optimization. We consider a variety of approximate inference procedures for finite-width BNNs, including high-quality Hamiltonian Monte Carlo, low-cost stochastic MCMC, and heuristics such as deep ensembles. We also consider infinite-width BNNs, linearized Laplace approximations, and partially stochastic models such as deep kernel learning. We evaluate this collection of surrogate models on diverse problems with varying dimensionality, number of objectives, non-stationarity, and discrete and continuous inputs. We find: (i) the ranking of methods is highly problem dependent, suggesting the need for tailored inductive biases; (ii) HMC is the most successful approximate inference procedure for fully stochastic BNNs; (iii) full stochasticity may be unnecessary as deep kernel learning is relatively competitive; (iv) deep ensembles perform relatively poorly; (v) infinite-width BNNs are particularly promising, especially in high dimensions. △ Less

Submitted 8 May, 2024; v1 submitted 31 May, 2023; originally announced May 2023.

Comments: ICLR 2024. Code available at https://github.com/yucenli/bnn-bo

arXiv:2305.20009 [pdf, other]

Protein Design with Guided Discrete Diffusion

Authors: Nate Gruver, Samuel Stanton, Nathan C. Frey, Tim G. J. Rudner, Isidro Hotzel, Julien Lafrance-Vanasse, Arvind Rajpal, Kyunghyun Cho, Andrew Gordon Wilson

Abstract: A popular approach to protein design is to combine a generative model with a discriminative model for conditional sampling. The generative model samples plausible sequences while the discriminative model guides a search for sequences with high fitness. Given its broad success in conditional sampling, classifier-guided diffusion modeling is a promising foundation for protein design, leading many to… ▽ More A popular approach to protein design is to combine a generative model with a discriminative model for conditional sampling. The generative model samples plausible sequences while the discriminative model guides a search for sequences with high fitness. Given its broad success in conditional sampling, classifier-guided diffusion modeling is a promising foundation for protein design, leading many to develop guided diffusion models for structure with inverse folding to recover sequences. In this work, we propose diffusioN Optimized Sampling (NOS), a guidance method for discrete diffusion models that follows gradients in the hidden states of the denoising network. NOS makes it possible to perform design directly in sequence space, circumventing significant limitations of structure-based methods, including scarce data and challenging inverse design. Moreover, we use NOS to generalize LaMBO, a Bayesian optimization procedure for sequence design that facilitates multiple objectives and edit-based constraints. The resulting method, LaMBO-2, enables discrete diffusions and stronger performance with limited edits through a novel application of saliency maps. We apply LaMBO-2 to a real-world protein design task, optimizing antibodies for higher expression yield and binding affinity to several therapeutic targets under locality and developability constraints, attaining a 99% expression rate and 40% binding rate in exploratory in vitro experiments. △ Less

Submitted 12 December, 2023; v1 submitted 31 May, 2023; originally announced May 2023.

Journal ref: Advances in Neural Information Processing Systems 36, December 10-16, 2023

arXiv:2305.12576 [pdf, other]

Automated Few-shot Classification with Instruction-Finetuned Language Models

Authors: Rami Aly, Xingjian Shi, Kaixiang Lin, Aston Zhang, Andrew Gordon Wilson

Abstract: A particularly successful class of approaches for few-shot learning combines language models with prompts -- hand-crafted task descriptions that complement data samples. However, designing prompts by hand for each task commonly requires domain knowledge and substantial guesswork. We observe, in the context of classification tasks, that instruction finetuned language models exhibit remarkable promp… ▽ More A particularly successful class of approaches for few-shot learning combines language models with prompts -- hand-crafted task descriptions that complement data samples. However, designing prompts by hand for each task commonly requires domain knowledge and substantial guesswork. We observe, in the context of classification tasks, that instruction finetuned language models exhibit remarkable prompt robustness, and we subsequently propose a simple method to eliminate the need for handcrafted prompts, named AuT-Few. This approach consists of (i) a prompt retrieval module that selects suitable task instructions from the instruction-tuning knowledge base, and (ii) the generation of two distinct, semantically meaningful, class descriptions and a selection mechanism via cross-validation. Over $12$ datasets, spanning $8$ classification tasks, we show that AuT-Few outperforms current state-of-the-art few-shot learning methods. Moreover, AuT-Few is the best ranking method across datasets on the RAFT few-shot benchmark. Notably, these results are achieved without task-specific handcrafted prompts on unseen tasks. △ Less

Submitted 21 October, 2023; v1 submitted 21 May, 2023; originally announced May 2023.

Comments: EMNLP2023 Findings

arXiv:2305.08157 [pdf, ps, other]

doi 10.1145/3630106.3658899

Algorithmic Pluralism: A Structural Approach To Equal Opportunity

Authors: Shomik Jain, Vinith Suriyakumar, Kathleen Creel, Ashia Wilson

Abstract: We present a structural approach toward achieving equal opportunity in systems of algorithmic decision-making called algorithmic pluralism. Algorithmic pluralism describes a state of affairs in which no set of algorithms severely limits access to opportunity, allowing individuals the freedom to pursue a diverse range of life paths. To argue for algorithmic pluralism, we adopt Joseph Fishkin's theo… ▽ More We present a structural approach toward achieving equal opportunity in systems of algorithmic decision-making called algorithmic pluralism. Algorithmic pluralism describes a state of affairs in which no set of algorithms severely limits access to opportunity, allowing individuals the freedom to pursue a diverse range of life paths. To argue for algorithmic pluralism, we adopt Joseph Fishkin's theory of bottlenecks, which focuses on the structure of decision-points that determine how opportunities are allocated. The theory contends that each decision-point or bottleneck limits access to opportunities with some degree of severity and legitimacy. We extend Fishkin's structural viewpoint and use it to reframe existing systemic concerns about equal opportunity in algorithmic decision-making, such as patterned inequality and algorithmic monoculture. In proposing algorithmic pluralism, we argue for the urgent priority of alleviating severe bottlenecks in algorithmic decision-making. We contend that there must be a pluralism of opportunity available to many different individuals in order to promote equal opportunity in a systemic way. We further show how this framework has several implications for system design and regulation through current debates about equal opportunity in algorithmic hiring. △ Less

Submitted 15 May, 2024; v1 submitted 14 May, 2023; originally announced May 2023.

Comments: To appear in the proceedings of the ACM Conference on Fairness, Accountability, and Transparency (FAccT 2024)

MSC Class: 68-06 ACM Class: K.4.0

arXiv:2304.14994 [pdf, other]

A Stable and Scalable Method for Solving Initial Value PDEs with Neural Networks

Authors: Marc Finzi, Andres Potapczynski, Matthew Choptuik, Andrew Gordon Wilson

Abstract: Unlike conventional grid and mesh based methods for solving partial differential equations (PDEs), neural networks have the potential to break the curse of dimensionality, providing approximate solutions to problems where using classical solvers is difficult or impossible. While global minimization of the PDE residual over the network parameters works well for boundary value problems, catastrophic… ▽ More Unlike conventional grid and mesh based methods for solving partial differential equations (PDEs), neural networks have the potential to break the curse of dimensionality, providing approximate solutions to problems where using classical solvers is difficult or impossible. While global minimization of the PDE residual over the network parameters works well for boundary value problems, catastrophic forgetting impairs the applicability of this approach to initial value problems (IVPs). In an alternative local-in-time approach, the optimization problem can be converted into an ordinary differential equation (ODE) on the network parameters and the solution propagated forward in time; however, we demonstrate that current methods based on this approach suffer from two key issues. First, following the ODE produces an uncontrolled growth in the conditioning of the problem, ultimately leading to unacceptably large numerical errors. Second, as the ODE methods scale cubically with the number of model parameters, they are restricted to small neural networks, significantly limiting their ability to represent intricate PDE initial conditions and solutions. Building on these insights, we develop Neural IVP, an ODE based IVP solver which prevents the network from getting ill-conditioned and runs in time linear in the number of parameters, enabling us to evolve the dynamics of challenging PDEs with neural networks. △ Less

Submitted 30 August, 2023; v1 submitted 28 April, 2023; originally announced April 2023.

Comments: ICLR 2023. Code available at https://github.com/mfinzi/neural-ivp

arXiv:2304.12210 [pdf, other]

A Cookbook of Self-Supervised Learning

Authors: Randall Balestriero, Mark Ibrahim, Vlad Sobal, Ari Morcos, Shashank Shekhar, Tom Goldstein, Florian Bordes, Adrien Bardes, Gregoire Mialon, Yuandong Tian, Avi Schwarzschild, Andrew Gordon Wilson, Jonas Geiping, Quentin Garrido, Pierre Fernandez, Amir Bar, Hamed Pirsiavash, Yann LeCun, Micah Goldblum

Abstract: Self-supervised learning, dubbed the dark matter of intelligence, is a promising path to advance machine learning. Yet, much like cooking, training SSL methods is a delicate art with a high barrier to entry. While many components are familiar, successfully training a SSL method involves a dizzying set of choices from the pretext tasks to training hyper-parameters. Our goal is to lower the barrier… ▽ More Self-supervised learning, dubbed the dark matter of intelligence, is a promising path to advance machine learning. Yet, much like cooking, training SSL methods is a delicate art with a high barrier to entry. While many components are familiar, successfully training a SSL method involves a dizzying set of choices from the pretext tasks to training hyper-parameters. Our goal is to lower the barrier to entry into SSL research by laying the foundations and latest SSL recipes in the style of a cookbook. We hope to empower the curious researcher to navigate the terrain of methods, understand the role of the various knobs, and gain the know-how required to explore how delicious SSL can be. △ Less

Submitted 28 June, 2023; v1 submitted 24 April, 2023; originally announced April 2023.

arXiv:2304.05366 [pdf, other]

The No Free Lunch Theorem, Kolmogorov Complexity, and the Role of Inductive Biases in Machine Learning

Authors: Micah Goldblum, Marc Finzi, Keefer Rowan, Andrew Gordon Wilson

Abstract: No free lunch theorems for supervised learning state that no learner can solve all problems or that all learners achieve exactly the same accuracy on average over a uniform distribution on learning problems. Accordingly, these theorems are often referenced in support of the notion that individual problems require specially tailored inductive biases. While virtually all uniformly sampled datasets h… ▽ More No free lunch theorems for supervised learning state that no learner can solve all problems or that all learners achieve exactly the same accuracy on average over a uniform distribution on learning problems. Accordingly, these theorems are often referenced in support of the notion that individual problems require specially tailored inductive biases. While virtually all uniformly sampled datasets have high complexity, real-world problems disproportionately generate low-complexity data, and we argue that neural network models share this same preference, formalized using Kolmogorov complexity. Notably, we show that architectures designed for a particular domain, such as computer vision, can compress datasets on a variety of seemingly unrelated domains. Our experiments show that pre-trained and even randomly initialized language models prefer to generate low-complexity sequences. Whereas no free lunch theorems seemingly indicate that individual problems require specialized learners, we explain how tasks that often require human intervention such as picking an appropriately sized model when labeled data is scarce or plentiful can be automated into a single learning algorithm. These observations justify the trend in deep learning of unifying seemingly disparate problems with an increasingly small set of machine learning models. △ Less

Submitted 7 June, 2024; v1 submitted 11 April, 2023; originally announced April 2023.

Comments: Published at the International Conference on Machine Learning (ICML) 2024

arXiv:2303.11765 [pdf, other]

Cable Routing and Assembly using Tactile-driven Motion Primitives

Authors: Achu Wilson, Helen Jiang, Wenzhao Lian, Wenzhen Yuan

Abstract: Manipulating cables is challenging for robots because of the infinite degrees of freedom of the cables and frequent occlusion by the gripper and the environment. These challenges are further complicated by the dexterous nature of the operations required for cable routing and assembly, such as weaving and inserting, hampering common solutions with vision-only sensing. In this paper, we propose to i… ▽ More Manipulating cables is challenging for robots because of the infinite degrees of freedom of the cables and frequent occlusion by the gripper and the environment. These challenges are further complicated by the dexterous nature of the operations required for cable routing and assembly, such as weaving and inserting, hampering common solutions with vision-only sensing. In this paper, we propose to integrate tactile-guided low-level motion control with high-level vision-based task parsing for a challenging task: cable routing and assembly on a reconfigurable task board. Specifically, we build a library of tactile-guided motion primitives using a fingertip GelSight sensor, where each primitive reliably accomplishes an operation such as cable following and weaving. The overall task is inferred via visual perception given a goal configuration image, and then used to generate the primitive sequence. Experiments demonstrate the effectiveness of individual tactile-guided primitives and the integrated end-to-end solution, significantly outperforming the method without tactile sensing. Our reconfigurable task setup and proposed baselines provide a benchmark for future research in cable manipulation. More details and video are presented in \url{https://helennn.github.io/cable-manip/} △ Less

Submitted 21 March, 2023; originally announced March 2023.

arXiv:2303.01513 [pdf]

Safe AI for health and beyond -- Monitoring to transform a health service

Authors: Mahed Abroshan, Michael Burkhart, Oscar Giles, Sam Greenbury, Zoe Kourtzi, Jack Roberts, Mihaela van der Schaar, Jannetta S Steyn, Alan Wilson, May Yong

Abstract: Machine learning techniques are effective for building predictive models because they identify patterns in large datasets. Development of a model for complex real-life problems often stop at the point of publication, proof of concept or when made accessible through some mode of deployment. However, a model in the medical domain risks becoming obsolete as patient demographics, systems and clinical… ▽ More Machine learning techniques are effective for building predictive models because they identify patterns in large datasets. Development of a model for complex real-life problems often stop at the point of publication, proof of concept or when made accessible through some mode of deployment. However, a model in the medical domain risks becoming obsolete as patient demographics, systems and clinical practices change. The maintenance and monitoring of predictive model performance post-publication is crucial to enable their safe and effective long-term use. We will assess the infrastructure required to monitor the outputs of a machine learning algorithm, and present two scenarios with examples of monitoring and updates of models, firstly on a breast cancer prognosis model trained on public longitudinal data, and secondly on a neurodegenerative stratification algorithm that is currently being developed and tested in clinic. △ Less

Submitted 6 June, 2023; v1 submitted 2 March, 2023; originally announced March 2023.

Comments: 12 pages, 3 figures

ACM Class: I.2.1

arXiv:2302.04019 [pdf, other]

Fortuna: A Library for Uncertainty Quantification in Deep Learning

Authors: Gianluca Detommaso, Alberto Gasparin, Michele Donini, Matthias Seeger, Andrew Gordon Wilson, Cedric Archambeau

Abstract: We present Fortuna, an open-source library for uncertainty quantification in deep learning. Fortuna supports a range of calibration techniques, such as conformal prediction that can be applied to any trained neural network to generate reliable uncertainty estimates, and scalable Bayesian inference methods that can be applied to Flax-based deep neural networks trained from scratch for improved unce… ▽ More We present Fortuna, an open-source library for uncertainty quantification in deep learning. Fortuna supports a range of calibration techniques, such as conformal prediction that can be applied to any trained neural network to generate reliable uncertainty estimates, and scalable Bayesian inference methods that can be applied to Flax-based deep neural networks trained from scratch for improved uncertainty quantification and accuracy. By providing a coherent framework for advanced uncertainty quantification methods, Fortuna simplifies the process of benchmarking and helps practitioners build robust AI systems. △ Less

Submitted 8 February, 2023; originally announced February 2023.

Showing 1–50 of 171 results for author: Wilson, A