-
From cryptomarkets to the surface web: Scouting eBay for counterfeits
Authors:
Felix Soldner,
Fabian Plum,
Bennett Kleinberg,
Shane D Johnson
Abstract:
Detecting counterfeits on online marketplaces is challenging, and current methods struggle with the volume of sales on platforms like eBay, while cryptomarkets openly sell counterfeits. Leveraging information from 453 cryptomarket counterfeits, we automated a search for corresponding products on eBay, utilizing image and text similarity metrics. We collected data twice over 4 months to analyze changes, with an average of 159 eBay products per cryptomarket item, totaling 134k products. We found identical products, which would warrant further investigation as to whether they are counterfeits. Results indicate increasing difficulty finding similar products over time, moderated by product type and origin. Future improved versions of the current system could be used to examine possible connections between cryptomarket and surface web listings more closely and could hold practical value in supporting the detection of counterfeits on the surface web.
Submitted 7 June, 2024;
originally announced June 2024.
-
Removing Bias from Maximum Likelihood Estimation with Model Autophagy
Authors:
Paul Mayer,
Lorenzo Luzi,
Ali Siahkoohi,
Don H. Johnson,
Richard G. Baraniuk
Abstract:
We propose autophagy penalized likelihood estimation (PLE), an unbiased alternative to maximum likelihood estimation (MLE) which is more fair and less susceptible to model autophagy disorder (madness). Model autophagy refers to models trained on their own output; PLE ensures the statistics of these outputs coincide with the data statistics. This enables PLE to be statistically unbiased in certain scenarios where MLE is biased. When biased, MLE unfairly penalizes minority classes in unbalanced datasets and exacerbates the recently discovered issue of self-consuming generative modeling. Theoretical and empirical results show that 1) PLE is more fair to minority classes and 2) PLE is more stable in a self-consumed setting. Furthermore, we provide a scalable and portable implementation of PLE with a hypernetwork framework, allowing existing deep learning architectures to be easily trained with PLE. Finally, we show PLE can bridge the gap between Bayesian and frequentist paradigms in statistics.
Submitted 22 May, 2024;
originally announced May 2024.
-
An Eye Gaze Heatmap Analysis of Uncertainty Head-Up Display Designs for Conditional Automated Driving
Authors:
Michael A. Gerber,
Ronald Schroeter,
Daniel Johnson,
Christian P. Janssen,
Andry Rakotonirainy,
Jonny Kuo,
Mike G. Lenne
Abstract:
This paper reports results from a high-fidelity driving simulator study (N=215) about a head-up display (HUD) that conveys a conditional automated vehicle's dynamic "uncertainty" about the current situation while fallback drivers watch entertaining videos. We compared (between-group) three design interventions: display (a bar visualisation of uncertainty close to the video), interruption (interrupting the video during uncertain situations), and combination (a combination of both), against a baseline (video-only). We visualised eye-tracking data to conduct a heatmap analysis of the four groups' gaze behaviour over time. We found interruptions initiated a phase during which participants interleaved their attention between monitoring and entertainment. This improved monitoring behaviour was more pronounced in combination compared to interruption, suggesting pre-warning interruptions have positive effects. The same addition had negative effects without interruptions (comparing baseline & display). Intermittent interruptions may have safety benefits over placing additional peripheral displays without compromising usability.
Submitted 27 February, 2024;
originally announced February 2024.
-
Experts Don't Cheat: Learning What You Don't Know By Predicting Pairs
Authors:
Daniel D. Johnson,
Daniel Tarlow,
David Duvenaud,
Chris J. Maddison
Abstract:
Identifying how much a model ${\widehat{p}}_\theta(Y|X)$ knows about the stochastic real-world process $p(Y|X)$ it was trained on is important to ensure it avoids producing incorrect or "hallucinated" answers or taking unsafe actions. But this is difficult for generative models because probabilistic predictions do not distinguish between per-response noise (aleatoric uncertainty) and lack of knowledge about the process (epistemic uncertainty), and existing epistemic uncertainty quantification techniques tend to be overconfident when the model underfits. We propose a general strategy for teaching a model to both approximate $p(Y|X)$ and also estimate the remaining gaps between ${\widehat{p}}_\theta(Y|X)$ and $p(Y|X)$: train it to predict pairs of independent responses drawn from the true conditional distribution, allow it to "cheat" by observing one response while predicting the other, then measure how much it cheats. Remarkably, we prove that being good at cheating (i.e. cheating whenever it improves your prediction) is equivalent to being second-order calibrated, a principled extension of ordinary calibration that allows us to construct provably-correct frequentist confidence intervals for $p(Y|X)$ and detect incorrect responses with high probability. We demonstrate empirically that our approach accurately estimates how much models don't know across ambiguous image classification, (synthetic) language modeling, and partially-observable navigation tasks, outperforming existing techniques.
Submitted 27 May, 2024; v1 submitted 13 February, 2024;
originally announced February 2024.
-
Towards a Unified Naming Scheme for Thermo-Active Soft Actuators: A Review of Materials, Working Principles, and Applications
Authors:
Trevor Exley,
Emilly Hays,
Daniel Johnson,
Arian Moridani,
Ramya Motati,
Amir Jafari
Abstract:
Soft robotics is a rapidly growing field that spans the fields of chemistry, materials science, and engineering. Due to the diverse background of the field, there have been contrasting naming schemes such as 'intelligent', 'smart' and 'adaptive' materials which add vagueness to the broad innovation among literature. Therefore, a clear, functional and descriptive naming scheme is proposed in which a previously vague name -- Soft Material for Soft Actuators -- can remain clear and concise -- Phase-Change Elastomers for Artificial Muscles. By synthesizing the working principle, material, and application into a naming scheme, the searchability of soft robotics can be enhanced and applied to other fields. The field of thermo-active soft actuators spans multiple domains and requires added clarity. Thermo-active actuators have potential for a variety of applications spanning virtual reality haptics to assistive devices. This review offers a comprehensive guide to selecting the type of thermo-active actuator when one has an application in mind. Additionally, it discusses future directions and improvements that are necessary for implementation.
Submitted 11 December, 2023;
originally announced December 2023.
-
A density estimation perspective on learning from pairwise human preferences
Authors:
Vincent Dumoulin,
Daniel D. Johnson,
Pablo Samuel Castro,
Hugo Larochelle,
Yann Dauphin
Abstract:
Learning from human feedback (LHF) -- and in particular learning from pairwise preferences -- has recently become a crucial ingredient in training large language models (LLMs), and has been the subject of much research. Most recent works frame it as a reinforcement learning problem, where a reward function is learned from pairwise preference data and the LLM is treated as a policy which is adapted to maximize the rewards, often under additional regularization constraints. We propose an alternative interpretation which centers on the generative process for pairwise preferences and treats LHF as a density estimation problem. We provide theoretical and empirical results showing that for a family of generative processes defined via preference behavior distribution equations, training a reward function on pairwise preferences effectively models an annotator's implicit preference distribution. Finally, we discuss and present findings on "annotator misspecification" -- failure cases where wrong modeling assumptions are made about annotator behavior, resulting in poorly-adapted models -- suggesting that approaches that learn from pairwise human preferences could have trouble learning from a population of annotators with diverse viewpoints.
Submitted 10 January, 2024; v1 submitted 23 November, 2023;
originally announced November 2023.
-
Data Discovery for the SDGs: A Systematic Rule-based Approach
Authors:
Yuwei Jiang,
David Johnson
Abstract:
In 2015, the United Nations put forward 17 Sustainable Development Goals (SDGs) to be achieved by 2030, where data has been promoted as a focus to innovating sustainable development and as a means to measuring progress towards achieving the SDGs. In this study, we propose a systematic approach towards discovering data types and sources that can be used for SDG research. The proposed method integrates a systematic mapping approach using manual qualitative coding over a corpus of SDG-related research literature followed by an automated process that applies rules to perform data entity extraction computationally. This approach is exemplified by an analysis of literature relating to SDG 7, the results of which are also presented in this paper. The paper concludes with a discussion of the approach and suggests future work to extend the method with more advanced NLP and machine learning techniques.
Submitted 16 July, 2023;
originally announced July 2023.
-
Automated Artifact Detection in Ultra-widefield Fundus Photography of Patients with Sickle Cell Disease
Authors:
Anqi Feng,
Dimitri Johnson,
Grace R. Reilly,
Loka Thangamathesvaran,
Ann Nampomba,
Mathias Unberath,
Adrienne W. Scott,
Craig Jones
Abstract:
Importance: Ultra-widefield fundus photography (UWF-FP) has shown utility in sickle cell retinopathy screening; however, image artifact may diminish quality and gradeability of images. Objective: To create an automated algorithm for UWF-FP artifact classification. Design: A neural network based automated artifact detection algorithm was designed to identify commonly encountered UWF-FP artifacts in a cross section of patient UWF-FP. A pre-trained ResNet-50 neural network was trained on a subset of the images and the classification accuracy, sensitivity, and specificity were quantified on the hold out test set. Setting: The study is based on patients from a tertiary care hospital site. Participants: There were 243 UWF-FP acquired from patients with sickle cell disease (SCD), and artifact labelling in the following categories was performed: Eyelash Present, Lower Eyelid Obstructing, Upper Eyelid Obstructing, Image Too Dark, Dark Artifact, and Image Not Centered. Results: Overall, the accuracy for each class was Eyelash Present at 83.7%, Lower Eyelid Obstructing at 83.7%, Upper Eyelid Obstructing at 98.0%, Image Too Dark at 77.6%, Dark Artifact at 93.9%, and Image Not Centered at 91.8%. Conclusions and Relevance: This automated algorithm shows promise in identifying common imaging artifacts on a subset of Optos UWF-FP in SCD patients. Further refinement is ongoing with the goal of improving efficiency of tele-retinal screening in sickle cell retinopathy (SCR) by providing a photographer real-time feedback as to the types of artifacts present, and the need for image re-acquisition. This algorithm also may have potential future applicability in other retinal diseases by improving quality and efficiency of image acquisition of UWF-FP.
Submitted 11 July, 2023;
originally announced July 2023.
-
Addressing Discontinuous Root-Finding for Subsequent Differentiability in Machine Learning, Inverse Problems, and Control
Authors:
Daniel Johnson,
Ronald Fedkiw
Abstract:
There are many physical processes that have inherent discontinuities in their mathematical formulations. This paper is motivated by the specific case of collisions between two rigid or deformable bodies and the intrinsic nature of that discontinuity. The impulse response to a collision is discontinuous with the lack of any response when no collision occurs, which causes difficulties for numerical approaches that require differentiability which are typical in machine learning, inverse problems, and control. We theoretically and numerically demonstrate that the derivative of the collision time with respect to the parameters becomes infinite as one approaches the barrier separating colliding from not colliding, and use lifting to complexify the solution space so that solutions on the other side of the barrier are directly attainable as precise values. Subsequently, we mollify the barrier posed by the unbounded derivatives, so that one can tunnel back and forth in a smooth and reliable fashion facilitating the use of standard numerical approaches. Moreover, we illustrate that standard approaches fail in numerous ways mostly due to a lack of understanding of the mathematical nature of the problem (e.g. typical backpropagation utilizes many rules of differentiation, but ignores L'Hopital's rule).
Submitted 21 June, 2023;
originally announced June 2023.
-
Towards Interpretability in Audio and Visual Affective Machine Learning: A Review
Authors:
David S. Johnson,
Olya Hakobyan,
Hanna Drimalla
Abstract:
Machine learning is frequently used in affective computing, but presents challenges due to the opacity of state-of-the-art machine learning methods. Because of the impact affective machine learning systems may have on an individual's life, it is important that models be made transparent to detect and mitigate biased decision making. In this regard, affective machine learning could benefit from the recent advancements in explainable artificial intelligence (XAI) research. We perform a structured literature review to examine the use of interpretability in the context of affective machine learning. We focus on studies using audio, visual, or audiovisual data for model training and identified 29 research articles. Our findings show an emergence of the use of interpretability methods in the last five years. However, their use is currently limited regarding the range of methods used, the depth of evaluations, and the consideration of use-cases. We outline the main gaps in the research and provide recommendations for researchers that aim to implement interpretable methods for affective machine learning.
Submitted 15 June, 2023;
originally announced June 2023.
-
Software-based Automatic Differentiation is Flawed
Authors:
Daniel Johnson,
Trevor Maxfield,
Yongxu Jin,
Ronald Fedkiw
Abstract:
Various software efforts embrace the idea that object oriented programming enables a convenient implementation of the chain rule, facilitating so-called automatic differentiation via backpropagation. Such frameworks have no mechanism for simplifying the expressions (obtained via the chain rule) before evaluating them. As we illustrate below, the resulting errors tend to be unbounded.
Submitted 5 May, 2023;
originally announced May 2023.
-
Efficient Multi-stage Inference on Tabular Data
Authors:
Daniel S Johnson,
Igor L Markov
Abstract:
Many ML applications and products train on medium amounts of input data but get bottlenecked in real-time inference. When implementing ML systems, conventional wisdom favors segregating ML code into services queried by product code via Remote Procedure Call (RPC) APIs. This approach clarifies the overall software architecture and simplifies product code by abstracting away ML internals. However, the separation adds network latency and entails additional CPU overhead. Hence, we simplify inference algorithms and embed them into the product code to reduce network communication. For public datasets and a high-performance real-time platform that deals with tabular data, we show that over half of the inputs are often amenable to such optimization, while the remainder can be handled by the original model. By applying our optimization with AutoML to both training and inference, we reduce inference latency by 1.3x, CPU resources by 30%, and network communication between application front-end and ML back-end by about 50% for a commercial end-to-end ML platform that serves millions of real-time decisions per second.
Submitted 21 July, 2023; v1 submitted 21 March, 2023;
originally announced March 2023.
-
R-U-SURE? Uncertainty-Aware Code Suggestions By Maximizing Utility Across Random User Intents
Authors:
Daniel D. Johnson,
Daniel Tarlow,
Christian Walder
Abstract:
Large language models show impressive results at predicting structured text such as code, but also commonly introduce errors and hallucinations in their output. When used to assist software developers, these models may make mistakes that users must go back and fix, or worse, introduce subtle bugs that users may miss entirely. We propose Randomized Utility-driven Synthesis of Uncertain REgions (R-U-SURE), an approach for building uncertainty-aware suggestions based on a decision-theoretic model of goal-conditioned utility, using random samples from a generative model as a proxy for the unobserved possible intents of the end user. Our technique combines minimum-Bayes-risk decoding, dual decomposition, and decision diagrams in order to efficiently produce structured uncertainty summaries, given only sample access to an arbitrary generative model of code and an optional AST parser. We demonstrate R-U-SURE on three developer-assistance tasks, and show that it can be applied to different user interaction patterns without retraining the model and leads to more accurate uncertainty estimates than token-probability baselines. We also release our implementation as an open-source library at https://github.com/google-research/r_u_sure.
Submitted 28 April, 2023; v1 submitted 1 March, 2023;
originally announced March 2023.
-
Real-Time Traffic End-of-Queue Detection and Tracking in UAV Video
Authors:
Russ Messenger,
Md Zobaer Islam,
Matthew Whitlock,
Erik Spong,
Nate Morton,
Layne Claggett,
Chris Matthews,
Jordan Fox,
Leland Palmer,
Dane C. Johnson,
John F. O'Hara,
Christopher J. Crick,
Jamey D. Jacob,
Sabit Ekin
Abstract:
Highway work zones are susceptible to undue accumulation of motorized vehicles which calls for dynamic work zone warning signs to prevent accidents. The work zone signs are placed according to the location of the end-of-queue of vehicles which usually changes rapidly. The detection of moving objects in video captured by Unmanned Aerial Vehicles (UAV) has been extensively researched so far, and is used in a wide array of applications including traffic monitoring. Unlike the fixed traffic cameras, UAVs can be used to monitor the traffic at work zones in real-time and also in a more cost-effective way. This study presents a method as a proof of concept for detecting End-of-Queue (EOQ) of traffic by processing the real-time video footage of a highway work zone captured by UAV. EOQ is detected in the video by image processing which includes background subtraction and blob detection methods. This dynamic localization of EOQ of vehicles will enable faster and more accurate relocation of work zone warning signs for drivers and thus will reduce work zone fatalities. The method can be applied to detect EOQ of vehicles and notify drivers in any other roads or intersections too where vehicles are rapidly accumulating due to special events, traffic jams, construction, or accidents.
Submitted 31 October, 2023; v1 submitted 9 January, 2023;
originally announced February 2023.
-
Testing Human Ability To Detect Deepfake Images of Human Faces
Authors:
Sergi D. Bray,
Shane D. Johnson,
Bennett Kleinberg
Abstract:
Deepfakes are computationally-created entities that falsely represent reality. They can take image, video, and audio modalities, and pose a threat to many areas of systems and societies, comprising a topic of interest to various aspects of cybersecurity and cybersafety. In 2020 a workshop consulting AI experts from academia, policing, government, the private sector, and state security agencies ranked deepfakes as the most serious AI threat. These experts noted that since fake material can propagate through many uncontrolled routes, changes in citizen behaviour may be the only effective defence. This study aims to assess human ability to identify image deepfakes of human faces (StyleGAN2:FFHQ) from nondeepfake images (FFHQ), and to assess the effectiveness of simple interventions intended to improve detection accuracy. Using an online survey, 280 participants were randomly allocated to one of four groups: a control group, and 3 assistance interventions. Each participant was shown a sequence of 20 images randomly selected from a pool of 50 deepfake and 50 real images of human faces. Participants were asked if each image was AI-generated or not, to report their confidence, and to describe the reasoning behind each response. Overall detection accuracy was only just above chance and none of the interventions significantly improved this. Participants' confidence in their answers was high and unrelated to accuracy. Assessing the results on a per-image basis reveals participants consistently found certain images harder to label correctly, but reported similarly high confidence regardless of the image. Thus, although participant accuracy was 62% overall, this accuracy across images ranged quite evenly between 85% and 30%, with an accuracy of below 50% for one in every five images. We interpret the findings as suggesting that there is a need for an urgent call to action to address this threat.
Submitted 25 May, 2023; v1 submitted 7 December, 2022;
originally announced December 2022.
-
Counterfeits on Darknet Markets: A measurement between Jan-2014 and Sep-2015
Authors:
Felix Soldner,
Bennett Kleinberg,
Shane D Johnson
Abstract:
Counterfeits harm consumers, governments, and intellectual property holders. They accounted for 3.3% of worldwide trades in 2016, having an estimated value of $509 billion in the same year. While estimations are mostly based on border seizures, we examined openly labeled counterfeits on darknet markets, which allowed us to gather and analyze information from a different perspective. Here, we analyzed data from 11 darknet markets for the period between Jan-2014 and Sep-2015. The findings suggest that darknet markets harbor similar counterfeit product types as found in seizures, but that on darknet markets the share of watches is higher and the share of electronics, clothes, shoes, and tobacco is lower. Also, darknet market counterfeits seem to have similar shipping origins as seized goods, with some exceptions, such as a relatively high share (5%) of darknet market counterfeits originating from the US. Lastly, counterfeits on darknet markets tend to have a relatively low price and sales volume. However, based on preliminary estimations, the original products on the surface web seem to be worth a multiple of the prices of the counterfeit counterparts on darknet markets. Gathering insights about counterfeits from darknet markets can be valuable for businesses and authorities and be cost-effective compared to border seizures. Thus, monitoring darknet markets can help us understand the counterfeit landscape better.
Submitted 24 October, 2023; v1 submitted 6 December, 2022;
originally announced December 2022.
-
Redistributor: Transforming Empirical Data Distributions
Authors:
Pavol Harar,
Dennis Elbrächter,
Monika Dörfler,
Kory D. Johnson
Abstract:
We present an algorithm and package, Redistributor, which forces a collection of scalar samples to follow a desired distribution. When given independent and identically distributed samples of some random variable $S$ and the continuous cumulative distribution function of some desired target $T$, it provably produces a consistent estimator of the transformation $R$ which satisfies $R(S)=T$ in distribution. As the distribution of $S$ or $T$ may be unknown, we also include algorithms for efficiently estimating these distributions from samples. This allows for various interesting use cases in image processing, where Redistributor serves as a remarkably simple and easy-to-use tool that is capable of producing visually appealing results. For color correction it outperforms other model-based methods and excels in achieving photorealistic style transfer, surpassing deep learning methods in content preservation. The package is implemented in Python and is optimized to efficiently handle large datasets, making it also suitable as a preprocessing step in machine learning. The source code is available at https://github.com/paloha/redistributor.
Submitted 5 July, 2024; v1 submitted 25 October, 2022;
originally announced October 2022.
-
Contrastive Learning Can Find An Optimal Basis For Approximately View-Invariant Functions
Authors:
Daniel D. Johnson,
Ayoub El Hanchi,
Chris J. Maddison
Abstract:
Contrastive learning is a powerful framework for learning self-supervised representations that generalize well to downstream supervised tasks. We show that multiple existing contrastive learning methods can be reinterpreted as learning kernel functions that approximate a fixed positive-pair kernel. We then prove that a simple representation obtained by combining this kernel with PCA provably minimizes the worst-case approximation error of linear predictors, under a straightforward assumption that positive pairs have similar labels. Our analysis is based on a decomposition of the target function in terms of the eigenfunctions of a positive-pair Markov chain, and a surprising equivalence between these eigenfunctions and the output of Kernel PCA. We give generalization bounds for downstream linear prediction using our Kernel PCA representation, and show empirically on a set of synthetic tasks that applying Kernel PCA to contrastive learning models can indeed approximately recover the Markov chain eigenfunctions, although the accuracy depends on the kernel parameterization as well as on the augmentation strength.
Submitted 14 February, 2023; v1 submitted 4 October, 2022;
originally announced October 2022.
-
A Library for Representing Python Programs as Graphs for Machine Learning
Authors:
David Bieber,
Kensen Shi,
Petros Maniatis,
Charles Sutton,
Vincent Hellendoorn,
Daniel Johnson,
Daniel Tarlow
Abstract:
Graph representations of programs are commonly a central element of machine learning for code research. We introduce an open source Python library python_graphs that applies static analysis to construct graph representations of Python programs suitable for training machine learning models. Our library admits the construction of control-flow graphs, data-flow graphs, and composite "program graphs" that combine control-flow, data-flow, syntactic, and lexical information about a program. We present the capabilities and limitations of the library, perform a case study applying the library to millions of competitive programming submissions, and showcase the library's utility for machine learning research.
Submitted 15 August, 2022;
originally announced August 2022.
-
Autocorrelation, Wigner and Ambiguity Transforms on Polygons for Coherent Radiation Rendering
Authors:
Jacob Mackay,
David Johnson,
Graham Brooker
Abstract:
Simulating the radar illumination of large scenes generally relies on a geometric model of light transport which largely ignores prominent wave effects. This can be remedied through coherence ray-tracing, but this requires the Wigner transform of the aperture. This diffraction function has been historically difficult to generate, and is relevant in the fields of optics, holography, synchrotron-radiation, quantum systems and radar. In this paper we provide the Wigner transform of arbitrary polygons through geometric transforms and the Stokes Fourier transform; and display its use in Monte-Carlo rendering.
Submitted 5 February, 2022;
originally announced February 2022.
-
An Experience Report of Executive-Level Artificial Intelligence Education in the United Arab Emirates
Authors:
David Johnson,
Mohammad Alsharid,
Rasheed El-Bouri,
Nigel Mehdi,
Farah Shamout,
Alexandre Szenicer,
David Toman,
Saqr Binghalib
Abstract:
Teaching artificial intelligence (AI) is challenging. It is a fast-moving field, and it is therefore difficult to keep people updated on the state of the art. Educational offerings for students are ever increasing, beyond university degree programs where AI education traditionally lay. In this paper, we present an experience report of teaching an AI course to business executives in the United Arab Emirates (UAE). Rather than focusing only on theoretical and technical aspects, we developed a course that teaches AI with a view to enabling students to understand how to incorporate it into existing business processes. We present an overview of our course, curriculum and teaching methods, and we discuss our reflections on teaching adult learners and students in the UAE.
Submitted 2 February, 2022;
originally announced February 2022.
-
Learning Generalized Gumbel-max Causal Mechanisms
Authors:
Guy Lorberbom,
Daniel D. Johnson,
Chris J. Maddison,
Daniel Tarlow,
Tamir Hazan
Abstract:
To perform counterfactual reasoning in Structural Causal Models (SCMs), one needs to know the causal mechanisms, which provide factorizations of conditional distributions into noise sources and deterministic functions mapping realizations of noise to samples. Unfortunately, the causal mechanism is not uniquely identified by data that can be gathered by observing and interacting with the world, so there remains the question of how to choose causal mechanisms. In recent work, Oberst & Sontag (2019) propose Gumbel-max SCMs, which use Gumbel-max reparameterizations as the causal mechanism due to an intuitively appealing counterfactual stability property. In this work, we instead argue for choosing a causal mechanism that is best under a quantitative criterion such as minimizing variance when estimating counterfactual treatment effects. We propose a parameterized family of causal mechanisms that generalize Gumbel-max. We show that they can be trained to minimize counterfactual effect variance and other losses on a distribution of queries of interest, yielding lower variance estimates of counterfactual treatment effect than fixed alternatives, also generalizing to queries not seen at training time.
Submitted 11 November, 2021;
originally announced November 2021.
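Illustrative note on the Gumbel-max reparameterization mentioned in the abstract above: a categorical sample can be written as a deterministic function (an argmax) of the log-probabilities plus independent Gumbel noise. The sketch below shows only this standard trick, not the learned mechanisms the paper proposes; the function name `gumbel_max_sample` and the example probabilities are assumptions for illustration.

```python
import numpy as np


def gumbel_max_sample(log_probs, gumbel_noise):
    """Deterministic map from (log-probabilities, noise) to a category index."""
    return np.argmax(log_probs + gumbel_noise, axis=-1)


rng = np.random.default_rng(0)
p = np.array([0.2, 0.5, 0.3])            # illustrative categorical distribution
noise = rng.gumbel(size=(20000, 3))      # one standard Gumbel draw per category
draws = gumbel_max_sample(np.log(p), noise)

# Empirical frequencies should be close to p.
freqs = np.bincount(draws, minlength=3) / draws.size
```

Keeping the noise fixed and changing only `log_probs` is what makes the mechanism usable for counterfactual queries: the same noise realization is replayed under a different intervention.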
-
Parallel Algebraic Effect Handlers
Authors:
Ningning Xie,
Daniel D. Johnson,
Dougal Maclaurin,
Adam Paszke
Abstract:
Algebraic effects and handlers support composable and structured control-flow abstraction. However, existing designs of algebraic effects often require effects to be executed sequentially. This paper studies parallel algebraic effect handlers. In particular, we formalize λp, an untyped lambda calculus which models two key features, effect handlers and parallelizable computations, the latter of which takes the form of a for expression as inspired by the Dex programming language. We present various interesting examples expressible in our calculus, and provide a Haskell implementation. We hope this paper provides a basis for future designs and implementations of parallel algebraic effect handlers.
Submitted 14 October, 2021;
originally announced October 2021.
-
Beyond In-Place Corruption: Insertion and Deletion In Denoising Probabilistic Models
Authors:
Daniel D. Johnson,
Jacob Austin,
Rianne van den Berg,
Daniel Tarlow
Abstract:
Denoising diffusion probabilistic models (DDPMs) have shown impressive results on sequence generation by iteratively corrupting each example and then learning to map corrupted versions back to the original. However, previous work has largely focused on in-place corruption, adding noise to each pixel or token individually while keeping their locations the same. In this work, we consider a broader class of corruption processes and denoising models over sequence data that can insert and delete elements, while still being efficient to train and sample from. We demonstrate that these models outperform standard in-place models on an arithmetic sequence task, and that when trained on the text8 dataset they can be used to fix spelling errors without any fine-tuning.
Submitted 15 July, 2021;
originally announced July 2021.
-
Structured Denoising Diffusion Models in Discrete State-Spaces
Authors:
Jacob Austin,
Daniel D. Johnson,
Jonathan Ho,
Daniel Tarlow,
Rianne van den Berg
Abstract:
Denoising diffusion probabilistic models (DDPMs) (Ho et al. 2020) have shown impressive results on image and waveform generation in continuous state spaces. Here, we introduce Discrete Denoising Diffusion Probabilistic Models (D3PMs), diffusion-like generative models for discrete data that generalize the multinomial diffusion model of Hoogeboom et al. 2021, by going beyond corruption processes with uniform transition probabilities. This includes corruption with transition matrices that mimic Gaussian kernels in continuous space, matrices based on nearest neighbors in embedding space, and matrices that introduce absorbing states. The third allows us to draw a connection between diffusion models and autoregressive and mask-based generative models. We show that the choice of transition matrix is an important design decision that leads to improved results in image and text domains. We also introduce a new loss function that combines the variational lower bound with an auxiliary cross entropy loss. For text, this model class achieves strong results on character-level text generation while scaling to large vocabularies on LM1B. On the image dataset CIFAR-10, our models approach the sample quality and exceed the log-likelihood of the continuous-space DDPM model.
Submitted 22 February, 2023; v1 submitted 7 July, 2021;
originally announced July 2021.
-
Decentralised Intelligence, Surveillance, and Reconnaissance in Unknown Environments with Heterogeneous Multi-Robot Systems
Authors:
Ki Myung Brian Lee,
Felix H. Kong,
Ricardo Cannizzaro,
Jennifer L. Palmer,
David Johnson,
Chanyeol Yoo,
Robert Fitch
Abstract:
We present the design and implementation of a decentralised, heterogeneous multi-robot system for performing intelligence, surveillance and reconnaissance (ISR) in an unknown environment. The team consists of functionally specialised robots that gather information and others that perform a mission-specific task, and is coordinated to achieve simultaneous exploration and exploitation in the unknown environment. We present a practical implementation of such a system, including decentralised inter-robot localisation, mapping, data fusion and coordination. The system is demonstrated in an efficient distributed simulation. We also describe a UAS platform for hardware experiments, and the ongoing progress.
Submitted 16 June, 2021;
originally announced June 2021.
-
An Upper Confidence Bound for Simultaneous Exploration and Exploitation in Heterogeneous Multi-Robot Systems
Authors:
Ki Myung Brian Lee,
Felix H. Kong,
Ricardo Cannizzaro,
Jennifer L. Palmer,
David Johnson,
Chanyeol Yoo,
Robert Fitch
Abstract:
Heterogeneous multi-robot systems are advantageous for operations in unknown environments because functionally specialised robots can gather environmental information, while others perform tasks. We define this decomposition as the scout-task robot architecture and show how it avoids the need to explicitly balance exploration and exploitation by permitting the system to do both simultaneously. The challenge is to guide exploration in a way that improves overall performance for time-limited tasks. We derive a novel upper confidence bound for simultaneous exploration and exploitation based on mutual information and present a general solution for scout-task coordination using decentralised Monte Carlo tree search. We evaluate the performance of our algorithms in a multi-drone surveillance scenario in which scout robots are equipped with low-resolution, long-range sensors and task robots capture detailed information using short-range sensors. The results address a new class of coordination problem for heterogeneous teams that has many practical applications.
Submitted 13 May, 2021;
originally announced May 2021.
-
Efficacy of Images Versus Data Buffers: Optimizing Interactive Applications Utilizing OpenCL for Scientific Visualization
Authors:
Donald W. Johnson,
T. J. Jankun-Kelly
Abstract:
This paper examines an algorithm using dual OpenCL image buffers to optimize data streaming for ensemble processing and visualization. Image buffers were utilized because they allow cached memory access, unlike simple data buffers, which are more commonly used. OpenCL image object performance was improved by allowing upload and mapping into one buffer to occur concurrently with mapping and/or processing of data in another buffer. This technique was applied in an interactive application allowing multiple flood extent maps to be combined into a single image, and allowing users to vary input image sets in real time. The efficiency of this technique was tested by varying both dimensions of input images and number of iterations; computation scaled linearly with number of input images, with best results achieved using ~4k images. Tests were performed to determine the rate at which data could be moved from data buffers to image buffers, examining a large range of possible image buffer dimensions. Additional tests examined kernel runtimes with different image and buffer variants. Limitations of the algorithm and possible applications are discussed.
Submitted 29 April, 2021;
originally announced April 2021.
-
Getting to the Point. Index Sets and Parallelism-Preserving Autodiff for Pointful Array Programming
Authors:
Adam Paszke,
Daniel Johnson,
David Duvenaud,
Dimitrios Vytiniotis,
Alexey Radul,
Matthew Johnson,
Jonathan Ragan-Kelley,
Dougal Maclaurin
Abstract:
We present a novel programming language design that attempts to combine the clarity and safety of high-level functional languages with the efficiency and parallelism of low-level numerical languages. We treat arrays as eagerly-memoized functions on typed index sets, allowing abstract function manipulations, such as currying, to work on arrays. In contrast to composing primitive bulk-array operations, we argue for an explicit nested indexing style that mirrors application of functions to arguments. We also introduce a fine-grained typed effects system which affords concise and automatically-parallelized in-place updates. Specifically, an associative accumulation effect allows reverse-mode automatic differentiation of in-place updates in a way that preserves parallelism. Empirically, we benchmark against the Futhark array programming language, and demonstrate that aggressive inlining and type-driven compilation allow array programs to be written in an expressive, "pointful" style with little performance penalty.
Submitted 12 April, 2021;
originally announced April 2021.
-
DESED-FL and URBAN-FL: Federated Learning Datasets for Sound Event Detection
Authors:
David S. Johnson,
Wolfgang Lorenz,
Michael Taenzer,
Stylianos Mimilakis,
Sascha Grollmisch,
Jakob Abeßer,
Hanna Lukashevich
Abstract:
Research on sound event detection (SED) in environmental settings has seen increased attention in recent years. The large amounts of (private) domestic or urban audio data needed raise significant logistical and privacy concerns. The inherently distributed nature of these tasks makes federated learning (FL) a promising approach to take advantage of large-scale data while mitigating privacy issues. While FL has also seen increased attention recently, to the best of our knowledge there is no research towards FL for SED. To address this gap and foster further research in this field, we create and publish novel FL datasets for SED in domestic and urban environments. Furthermore, we provide baseline results on the datasets in an FL context for three deep neural network architectures. The results indicate that FL is a promising approach for SED, but faces challenges with divergent data distributions inherent to distributed client edge devices.
Submitted 31 May, 2021; v1 submitted 17 February, 2021;
originally announced February 2021.
-
Accelerating computational modeling and design of high-entropy alloys
Authors:
Rahul Singh,
Aayush Sharma,
Prashant Singh,
Ganesh Balasubramanian,
Duane D. Johnson
Abstract:
With huge design spaces for unique chemical and mechanical properties, we remove a roadblock to computational design of high-entropy alloys using a metaheuristic hybrid Cuckoo Search (CS) for "on-the-fly" construction of Super-Cell Random APproximates (SCRAPs) having targeted atomic site and pair probabilities on arbitrary crystal lattices. Our hybrid-CS schema overcomes large, discrete combinatorial optimization by ultrafast global solutions that scale linearly in system size and strongly in parallel, e.g. a 4-element, 128-atom model [a $10^{73+}$ space] is found in seconds -- a reduction of 13,000+ over current strategies. With model-generation eliminated as a bottleneck, computational alloy design can be performed that is currently impossible or impractical. We showcase the method for real alloys with varying short-range order. Being problem-agnostic, our hybrid-CS schema offers numerous applications in diverse fields.
Submitted 22 October, 2020;
originally announced October 2020.
-
Probing for Multilingual Numerical Understanding in Transformer-Based Language Models
Authors:
Devin Johnson,
Denise Mak,
Drew Barker,
Lexi Loessberg-Zahl
Abstract:
Natural language numbers are an example of compositional structures, where larger numbers are composed of operations on smaller numbers. Given that compositional reasoning is a key to natural language understanding, we propose novel multilingual probing tasks tested on DistilBERT, XLM, and BERT to investigate for evidence of compositional reasoning over numerical data in various natural language number systems. By using both grammaticality judgment and value comparison classification tasks in English, Japanese, Danish, and French, we find evidence that the information encoded in these pretrained models' embeddings is sufficient for grammaticality judgments but generally not for value comparisons. We analyze possible reasons for this and discuss how our tasks could be extended in further studies.
Submitted 13 October, 2020;
originally announced October 2020.
-
Learning Graph Structure With A Finite-State Automaton Layer
Authors:
Daniel D. Johnson,
Hugo Larochelle,
Daniel Tarlow
Abstract:
Graph-based neural network models are producing strong results in a number of domains, in part because graphs provide flexibility to encode domain knowledge in the form of relational structure (edges) between nodes in the graph. In practice, edges are used both to represent intrinsic structure (e.g., abstract syntax trees of programs) and more abstract relations that aid reasoning for a downstream task (e.g., results of relevant program analyses). In this work, we study the problem of learning to derive abstract relations from the intrinsic graph structure. Motivated by their power in program analyses, we consider relations defined by paths on the base graph accepted by a finite-state automaton. We show how to learn these relations end-to-end by relaxing the problem into learning finite-state automata policies on a graph-based POMDP and then training these policies using implicit differentiation. The result is a differentiable Graph Finite-State Automaton (GFSA) layer that adds a new edge type (expressed as a weighted adjacency matrix) to a base graph. We demonstrate that this layer can find shortcuts in grid-world graphs and reproduce simple static analyses on Python programs. Additionally, we combine the GFSA layer with a larger graph-based model trained end-to-end on the variable misuse program understanding task, and find that using the GFSA layer leads to better performance than using hand-engineered semantic edges or other baseline methods for adding learned edge types.
Submitted 6 November, 2020; v1 submitted 9 July, 2020;
originally announced July 2020.
-
Coercing Machine Learning to Output Physically Accurate Results
Authors:
Zhenglin Geng,
Dan Johnson,
Ronald Fedkiw
Abstract:
Many machine/deep learning artificial neural networks are trained to simply be interpolation functions that map input variables to output values interpolated from the training data in a linear/nonlinear fashion. Even when the input/output pairs of the training data are physically accurate (e.g. the results of an experiment or numerical simulation), interpolated quantities can deviate quite far from being physically accurate. Although one could project the output of a network into a physically feasible region, such a postprocess is not captured by the energy function minimized when training the network; thus, the final projected result could incorrectly deviate quite far from the training data. We propose folding any such projection or postprocess directly into the network so that the final result is correctly compared to the training data by the energy function. Although we propose a general approach, we illustrate its efficacy on a specific convolutional neural network that takes in human pose parameters (joint rotations) and outputs a prediction of vertex positions representing a triangulated cloth mesh. While the original network outputs vertex positions with erroneously high stretching and compression energies, the new network trained with our physics prior remedies these issues producing highly improved results.
Submitted 22 November, 2019; v1 submitted 21 October, 2019;
originally announced October 2019.
-
Adaptive, Distribution-Free Prediction Intervals for Deep Networks
Authors:
Danijel Kivaranovic,
Kory D. Johnson,
Hannes Leeb
Abstract:
The machine learning literature contains several constructions for prediction intervals that are intuitively reasonable but ultimately ad-hoc in that they do not come with provable performance guarantees. We present methods from the statistics literature that can be used efficiently with neural networks under minimal assumptions with guaranteed performance. We propose a neural network that outputs three values instead of a single point estimate and optimizes a loss function motivated by the standard quantile regression loss. We provide two prediction interval methods with finite sample coverage guarantees solely under the assumption that the observations are independent and identically distributed. The first method leverages the conformal inference framework and provides average coverage. The second method provides a new, stronger guarantee by conditioning on the observed data. Lastly, our loss function does not compromise the predictive accuracy of the network like other prediction interval methods. We demonstrate the ease of use of our procedures as well as their improvements over other methods on both simulated and real data. As most deep networks can easily be modified by our method to output predictions with valid prediction intervals, its use should become standard practice, much like reporting standard errors along with mean estimates.
Submitted 24 February, 2020; v1 submitted 25 May, 2019;
originally announced May 2019.
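Illustrative note on the "standard quantile regression loss" referenced in the abstract above: for quantile level $\tau$, the pinball loss is $\max(\tau (y - q), (\tau - 1)(y - q))$, and its minimizer over constants $q$ is a $\tau$-quantile of the data. The sketch below shows only this standard loss, not the paper's three-output loss; the function name `pinball_loss` and the grid search are assumptions for illustration.

```python
import numpy as np


def pinball_loss(y, q, tau):
    """Mean quantile-regression (pinball) loss at quantile level tau."""
    diff = np.asarray(y, dtype=float) - np.asarray(q, dtype=float)
    # Under-prediction (diff > 0) is weighted by tau,
    # over-prediction (diff < 0) by 1 - tau.
    return float(np.mean(np.maximum(tau * diff, (tau - 1.0) * diff)))


rng = np.random.default_rng(0)
y = rng.normal(size=2000)

# Grid search for the constant prediction that minimizes the loss at tau = 0.9.
grid = np.linspace(-3.0, 3.0, 601)
best = grid[np.argmin([pinball_loss(y, g, 0.9) for g in grid])]
```

The minimizing constant `best` lands close to the empirical 0.9 quantile of `y`, which is why two such outputs (a low and a high quantile) can serve as the endpoints of a prediction interval.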
-
"Why did you do that?": Explaining black box models with Inductive Synthesis
Authors:
Görkem Paçacı,
David Johnson,
Steve McKeever,
Andreas Hamfelt
Abstract:
By their nature, the composition of black box models is opaque. This makes the ability to generate explanations for the response to stimuli challenging. Explaining black box models has become increasingly important given the prevalence of AI and ML systems and the need to build legal and regulatory frameworks around them. Such explanations can also increase trust in these uncertain systems. In our paper we present RICE, a method for generating explanations of the behaviour of black box models by (1) probing a model to extract model output examples using sensitivity analysis; (2) applying CNPInduce, a method for inductive logic program synthesis, to generate logic programs based on critical input-output pairs; and (3) interpreting the target program as a human-readable explanation. We demonstrate the application of our method by generating explanations of an artificial neural network trained to follow simple traffic rules in a hypothetical self-driving car simulation. We conclude with a discussion on the scalability and usability of our approach and its potential applications to explanation-critical scenarios.
Submitted 17 April, 2019;
originally announced April 2019.
-
Physics of eccentric binary black hole mergers: A numerical relativity perspective
Authors:
E. A. Huerta,
Roland Haas,
Sarah Habib,
Anushri Gupta,
Adam Rebei,
Vishnu Chavva,
Daniel Johnson,
Shawn Rosofsky,
Erik Wessel,
Bhanu Agarwal,
Diyu Luo,
Wei Ren
Abstract:
Gravitational wave observations of eccentric binary black hole mergers will provide unequivocal evidence for the formation of these systems through dynamical assembly in dense stellar environments. The study of these astrophysically motivated sources is timely in view of electromagnetic observations, consistent with the existence of stellar mass black holes in the globular cluster M22 and in the Galactic center, and the proven detection capabilities of ground-based gravitational wave detectors. In order to get insights into the physics of these objects in the dynamical, strong-field gravity regime, we present a catalog of 89 numerical relativity waveforms that describe binary systems of non-spinning black holes with mass-ratios $1\leq q \leq 10$, and initial eccentricities as high as $e_0=0.18$ fifteen cycles before merger. We use this catalog to quantify the loss of energy and angular momentum through gravitational radiation, and the astrophysical properties of the black hole remnant, including its final mass and spin, and recoil velocity. We discuss the implications of these results for gravitational wave source modeling, and the design of algorithms to search for and identify eccentric binary black hole mergers in realistic detection scenarios.
Submitted 5 September, 2019; v1 submitted 21 January, 2019;
originally announced January 2019.
-
3D Deep Learning with voxelized atomic configurations for modeling atomistic potentials in complex solid-solution alloys
Authors:
Rahul Singh,
Aayush Sharma,
Onur Rauf Bingol,
Aditya Balu,
Ganesh Balasubramanian,
Duane D. Johnson,
Soumik Sarkar
Abstract:
The need for advanced materials has led to the development of complex, multi-component alloys or solid-solution alloys. These materials have shown exceptional properties such as strength, toughness, ductility, and electrical and electronic properties. Current development of such material systems is hindered by expensive experiments and computationally demanding first-principles simulations. Atomistic simulations can provide reasonable insights into properties in such material systems. However, the issue of designing robust potentials still exists. In this paper, we explore a deep convolutional neural-network based approach to develop the atomistic potential for such complex alloys to investigate materials for insights into controlling properties. In the present work, we propose a voxel representation of the atomic configuration of a cell and design a 3D convolutional neural network to learn the interaction of the atoms. Our results highlight the performance of the 3D convolutional neural network and its efficacy in machine-learning the atomistic potential. We also explore the role of voxel resolution and provide insights into the two bounding box methodologies implemented for voxelization.
Submitted 23 November, 2018;
originally announced November 2018.
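The voxel representation mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the cubic box, the one-channel-per-species layout, and the names `voxelize`, `box_length`, and `resolution` are all assumptions made for illustration, and the two bounding box methodologies of the paper are not reproduced here.

```python
def voxelize(atoms, box_length, resolution):
    """Count atoms per voxel, with one channel per chemical species.

    atoms: list of (species, x, y, z), coordinates assumed in [0, box_length).
    Returns (species_list, grid) where grid[c][i][j][k] is an atom count.
    The cubic box and channel layout are illustrative assumptions.
    """
    species = sorted({a[0] for a in atoms})
    channel = {s: c for c, s in enumerate(species)}
    grid = [
        [[[0] * resolution for _ in range(resolution)] for _ in range(resolution)]
        for _ in species
    ]
    for s, x, y, z in atoms:
        # Map each coordinate to a voxel index, clamping the upper edge.
        i, j, k = (
            min(int(c / box_length * resolution), resolution - 1)
            for c in (x, y, z)
        )
        grid[channel[s]][i][j][k] += 1
    return species, grid


# Hypothetical three-atom cell in a 2.0-wide box, split into 2 x 2 x 2 voxels.
atoms = [("Al", 0.1, 0.1, 0.1), ("Ni", 1.9, 1.9, 1.9), ("Al", 1.1, 0.2, 0.3)]
species, grid = voxelize(atoms, box_length=2.0, resolution=2)
```

A grid of this shape (channels by three spatial axes) is the kind of input a 3D convolutional network typically consumes; raising `resolution` trades memory for spatial detail.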
-
Real-Time Object Pose Estimation with Pose Interpreter Networks
Authors:
Jimmy Wu,
Bolei Zhou,
Rebecca Russell,
Vincent Kee,
Syler Wagner,
Mitchell Hebert,
Antonio Torralba,
David M. S. Johnson
Abstract:
In this work, we introduce pose interpreter networks for 6-DoF object pose estimation. In contrast to other CNN-based approaches to pose estimation that require expensively annotated object pose data, our pose interpreter network is trained entirely on synthetic pose data. We use object masks as an intermediate representation to bridge real and synthetic. We show that when combined with a segmentation model trained on RGB images, our synthetically trained pose interpreter network is able to generalize to real data. Our end-to-end system for object pose estimation runs in real-time (20 Hz) on live RGB data, without using depth information or ICP refinement.
Submitted 3 August, 2018;
originally announced August 2018.
-
Wireless coverage prediction via parametric shortest paths
Authors:
David Applegate,
Aaron Archer,
David S. Johnson,
Evdokia Nikolova,
Mikkel Thorup,
Ger Yang
Abstract:
When deciding where to place access points in a wireless network, it is useful to model the signal propagation loss between a proposed antenna location and the areas it may cover. The indoor dominant path (IDP) model, introduced by Wölfle et al., is shown in the literature to have good validation and generalization error, is faster to compute than competing methods, and is used in commercial software such as WinProp, iBwave Design, and CellTrace. Previously, the algorithms known for computing it involved a worst-case exponential-time tree search, with pruning heuristics to speed it up.
We prove that the IDP model can be reduced to a parametric shortest path computation on a graph derived from the walls in the floorplan. It therefore admits a quasipolynomial-time (i.e., $n^{O(\log n)}$) algorithm. We also give a practical approximation algorithm based on running a small constant number of shortest path computations. Its provable worst-case additive error (in dB) can be made arbitrarily small via appropriate choices of parameters, and is well below 1dB for reasonable choices. We evaluate our approximation algorithm empirically against the exact IDP model, and show that it consistently beats its theoretical worst-case bounds, solving the model exactly (i.e., no error) in the vast majority of cases.
Submitted 16 May, 2018;
originally announced May 2018.
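The idea of approximating a parametric shortest path with a small number of ordinary shortest path computations, as described in the abstract above, can be sketched generically. This is an illustration under stated assumptions, not the authors' algorithm or the IDP model itself: each edge is assumed to carry two costs `(a, b)`, the parameter grid `lambdas` and the `objective` function are hypothetical, and the graph is a toy example.

```python
import heapq


def dijkstra(graph, source, target, lam):
    """Shortest path under the combined edge weight a + lam * b.

    graph: {node: [(neighbor, a, b), ...]} with non-negative a, b.
    Returns the path's totals (sum of a, sum of b), or None if unreachable.
    """
    best = {source: 0.0}
    heap = [(0.0, source, 0.0, 0.0)]
    while heap:
        cost, node, tot_a, tot_b = heapq.heappop(heap)
        if node == target:
            return tot_a, tot_b
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, a, b in graph.get(node, []):
            new_cost = cost + a + lam * b
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(heap, (new_cost, nxt, tot_a + a, tot_b + b))
    return None


def sweep(graph, source, target, lambdas, objective):
    """Run one Dijkstra per parameter value and keep the best candidate."""
    candidates = [dijkstra(graph, source, target, lam) for lam in lambdas]
    candidates = [c for c in candidates if c is not None]
    return min(candidates, key=lambda c: objective(*c)) if candidates else None


# Toy graph: a direct edge versus a two-hop route with a larger second cost.
graph = {
    "s": [("t", 10.0, 0.0), ("m", 2.0, 3.0)],
    "m": [("t", 2.0, 3.0)],
}
# Hypothetical nonlinear objective over the two path totals.
best = sweep(graph, "s", "t", lambdas=[0.0, 2.0], objective=lambda A, B: A + B * B)
```

Each value in `lambdas` yields one candidate path; the nonlinear objective is evaluated only on those candidates, which is why a coarse grid gives an approximation rather than an exact answer.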
-
Eccentric, nonspinning, inspiral, Gaussian-process merger approximant for the detection and characterization of eccentric binary black hole mergers
Authors:
E. A. Huerta,
C. J. Moore,
Prayush Kumar,
Daniel George,
Alvin J. K. Chua,
Roland Haas,
Erik Wessel,
Daniel Johnson,
Derek Glennon,
Adam Rebei,
A. Miguel Holgado,
Jonathan R. Gair,
Harald P. Pfeiffer
Abstract:
We present $\texttt{ENIGMA}$, a time domain, inspiral-merger-ringdown waveform model that describes non-spinning binary black hole systems that evolve on moderately eccentric orbits. The inspiral evolution is described using a consistent combination of post-Newtonian theory, self-force and black hole perturbation theory. Assuming eccentric binaries that circularize prior to coalescence, we smoothly match the eccentric inspiral with a stand-alone, quasi-circular merger, which is constructed using machine learning algorithms that are trained with quasi-circular numerical relativity waveforms. We show that $\texttt{ENIGMA}$ reproduces with excellent accuracy the dynamics of quasi-circular compact binaries. We validate $\texttt{ENIGMA}$ using a set of $\texttt{Einstein Toolkit}$ eccentric numerical relativity waveforms, which describe eccentric binary black hole mergers with mass-ratios between $1 \leq q \leq 5.5$, and eccentricities $e_0 \lesssim 0.2$ ten orbits before merger. We use this model to explore in detail the physics that can be extracted with moderately eccentric, non-spinning binary black hole mergers. We use $\texttt{ENIGMA}$ to show that GW150914, GW151226, GW170104, GW170814 and GW170608 can be effectively recovered with spinning, quasi-circular templates if the eccentricity of these events at a gravitational wave frequency of 10Hz satisfies $e_0\leq \{0.175,\, 0.125,\,0.175,\,0.175,\, 0.125\}$, respectively. We show that if these systems have eccentricities $e_0\sim 0.1$ at a gravitational wave frequency of 10Hz, they can be misclassified as quasi-circular binaries due to parameter space degeneracies between eccentricity and spin corrections. Using our catalog of eccentric numerical relativity simulations, we discuss the importance of including higher-order waveform multipoles in gravitational wave searches of eccentric binary black hole mergers.
Submitted 24 January, 2018; v1 submitted 16 November, 2017;
originally announced November 2017.
-
SegICP-DSR: Dense Semantic Scene Reconstruction and Registration
Authors:
Jay M. Wong,
Syler Wagner,
Connor Lawson,
Vincent Kee,
Mitchell Hebert,
Justin Rooney,
Gian-Luca Mariottini,
Rebecca Russell,
Abraham Schneider,
Rahul Chipalkatty,
David M. S. Johnson
Abstract:
To enable autonomous robotic manipulation in unstructured environments, we present SegICP-DSR, a real-time, dense, semantic scene reconstruction and pose estimation algorithm that achieves mm-level pose accuracy and standard deviation (7.9 mm, σ=7.6 mm and 1.7 deg, σ=0.7 deg) and successfully identified the object pose in 97% of test cases. This represents a 29% increase in accuracy, and a 14% increase in success rate compared to SegICP in cluttered, unstructured environments. The performance increase of SegICP-DSR arises from (1) improved deep semantic segmentation under adversarial training, (2) precise automated calibration of the camera intrinsic and extrinsic parameters, (3) viewpoint specific ray-casting of the model geometry, and (4) dense semantic ElasticFusion point clouds for registration. We benchmark the performance of SegICP-DSR on thousands of pose-annotated video frames and demonstrate its accuracy and efficacy on two tight tolerance grasping and insertion tasks using a KUKA LBR iiwa robotic arm.
Submitted 6 November, 2017;
originally announced November 2017.
-
Mutual Information in Frequency and its Application to Measure Cross-Frequency Coupling in Epilepsy
Authors:
Rakesh Malladi,
Don H Johnson,
Giridhar P Kalamangalam,
Nitin Tandon,
Behnaam Aazhang
Abstract:
We define a metric, mutual information in frequency (MI-in-frequency), to detect and quantify the statistical dependence between different frequency components in the data, referred to as cross-frequency coupling, and apply it to electrophysiological recordings from the brain to infer cross-frequency coupling. The current metrics used to quantify the cross-frequency coupling in neuroscience cannot detect if two frequency components in non-Gaussian brain recordings are statistically independent or not. Our MI-in-frequency metric, based on Shannon's mutual information between the Cramér's representation of stochastic processes, overcomes this shortcoming and can detect statistical dependence in frequency between non-Gaussian signals. We then describe two data-driven estimators of MI-in-frequency: one based on kernel density estimation and the other based on the nearest neighbor algorithm and validate their performance on simulated data. We then use MI-in-frequency to estimate mutual information between two data streams that are dependent across time, without making any parametric model assumptions. Finally, we use the MI-in-frequency metric to investigate the cross-frequency coupling in seizure onset zone from electrocorticographic recordings during seizures. The inferred cross-frequency coupling characteristics are essential to optimize the spatial and spectral parameters of electrical stimulation based treatments of epilepsy.
Submitted 15 March, 2018; v1 submitted 5 November, 2017;
originally announced November 2017.
-
Reconstructing Video from Interferometric Measurements of Time-Varying Sources
Authors:
Katherine L. Bouman,
Michael D. Johnson,
Adrian V. Dalca,
Andrew A. Chael,
Freek Roelofs,
Sheperd S. Doeleman,
William T. Freeman
Abstract:
Very long baseline interferometry (VLBI) makes it possible to recover images of astronomical sources with extremely high angular resolution. Most recently, the Event Horizon Telescope (EHT) has extended VLBI to short millimeter wavelengths with a goal of achieving angular resolution sufficient for imaging the event horizons of nearby supermassive black holes. VLBI provides measurements related to the underlying source image through a sparse set of spatial frequencies. An image can then be recovered from these measurements by making assumptions about the underlying image. One of the most important assumptions made by conventional imaging methods is that over the course of a night's observation the image is static. However, for quickly evolving sources, such as the galactic center's supermassive black hole (Sgr A*) targeted by the EHT, this assumption is violated and these conventional imaging approaches fail. In this work we propose a new way to model VLBI measurements that allows us to recover both the appearance and dynamics of an evolving source by reconstructing a video rather than a static image. By modeling VLBI measurements using a Gaussian Markov Model, we are able to propagate information across observations in time to reconstruct a video, while simultaneously learning about the dynamics of the source's emission region. We demonstrate our proposed Expectation-Maximization (EM) algorithm, StarWarps, on realistic synthetic observations of black holes, and show how it substantially improves results compared to conventional imaging algorithms. Additionally, we demonstrate StarWarps on real VLBI data of the M87 Jet from the VLBA.
Submitted 1 February, 2018; v1 submitted 3 November, 2017;
originally announced November 2017.
-
Python Open Source Waveform Extractor (POWER): An open source, Python package to monitor and post-process numerical relativity simulations
Authors:
Daniel Johnson,
E. A. Huerta,
Roland Haas
Abstract:
Numerical simulations of Einstein's field equations provide unique insights into the physics of compact objects moving at relativistic speeds, and which are driven by strong gravitational interactions. Numerical relativity has played a key role to firmly establish gravitational wave astrophysics as a new field of research, and it is now paving the way to establish whether gravitational wave radiation emitted from compact binary mergers is accompanied by electromagnetic and astro-particle counterparts. As numerical relativity continues to blend in with routine gravitational wave data analyses to validate the discovery of gravitational wave events, it is essential to develop open source tools to streamline these studies. Motivated by our own experience as users and developers of the open source, community software, the Einstein Toolkit, we present an open source, Python package that is ideally suited to monitor and post-process the data products of numerical relativity simulations, and compute the gravitational wave strain at future null infinity in high performance environments. We showcase the application of this new package to post-process a large numerical relativity catalog and extract higher-order waveform modes from numerical relativity simulations of eccentric binary black hole mergers and neutron star mergers. This new software fills a critical void in the arsenal of tools provided by the Einstein Toolkit Consortium to the numerical relativity community.
Submitted 27 November, 2017; v1 submitted 9 August, 2017;
originally announced August 2017.
-
Data-Driven Estimation Of Mutual Information Between Dependent Data
Authors:
Rakesh Malladi,
Don H Johnson,
Behnaam Aazhang
Abstract:
We consider the problem of estimating mutual information between dependent data, an important problem in many science and engineering applications. We propose a data-driven, non-parametric estimator of mutual information in this paper. The main novelty of our solution lies in transforming the data to frequency domain to make the problem tractable. We define a novel metric--mutual information in frequency--to detect and quantify the dependence between two random processes across frequency using Cramér's spectral representation. Our solution calculates mutual information as a function of frequency to estimate the mutual information between the dependent data over time. We validate its performance on linear and nonlinear models. In addition, mutual information in frequency estimated as a part of our solution can also be used to infer cross-frequency coupling in the data.
Submitted 7 March, 2017;
originally announced March 2017.
-
SegICP: Integrated Deep Semantic Segmentation and Pose Estimation
Authors:
Jay M. Wong,
Vincent Kee,
Tiffany Le,
Syler Wagner,
Gian-Luca Mariottini,
Abraham Schneider,
Lei Hamilton,
Rahul Chipalkatty,
Mitchell Hebert,
David M. S. Johnson,
Jimmy Wu,
Bolei Zhou,
Antonio Torralba
Abstract:
Recent robotic manipulation competitions have highlighted that sophisticated robots still struggle to achieve fast and reliable perception of task-relevant objects in complex, realistic scenarios. To improve these systems' perceptive speed and robustness, we present SegICP, a novel integrated solution to object recognition and pose estimation. SegICP couples convolutional neural networks and multi-hypothesis point cloud registration to achieve both robust pixel-wise semantic segmentation as well as accurate and real-time 6-DOF pose estimation for relevant objects. Our architecture achieves 1 cm position error and $<5^\circ$ angle error in real time without an initial seed. We evaluate and benchmark SegICP against an annotated dataset generated by motion capture.
Submitted 5 September, 2017; v1 submitted 5 March, 2017;
originally announced March 2017.
-
Near-Optimal Disjoint-Path Facility Location Through Set Cover by Pairs
Authors:
David S. Johnson,
Lee Breslau,
Ilias Diakonikolas,
Nick Duffield,
Yu Gu,
MohammadTaghi Hajiaghayi,
Howard Karloff,
Mauricio G. C. Resende,
Subhabrata Sen
Abstract:
In this paper we consider two special cases of the "cover-by-pairs" optimization problem that arise when we need to place facilities so that each customer is served by two facilities that reach it by disjoint shortest paths. These problems arise in a network traffic monitoring scheme proposed by Breslau et al. and have potential applications to content distribution. The "set-disjoint" variant applies to networks that use the OSPF routing protocol, and the "path-disjoint" variant applies when MPLS routing is enabled, making better solutions possible at the cost of greater operational expense. Although we can prove that no polynomial-time algorithm can guarantee good solutions for either version, we are able to provide heuristics that do very well in practice on instances with real-world network structure. Fast implementations of the heuristics, made possible by exploiting mathematical observations about the relationship between the network instances and the corresponding instances of the cover-by-pairs problem, allow us to perform an extensive experimental evaluation of the heuristics and what the solutions they produce tell us about the effectiveness of the proposed monitoring scheme. For the set-disjoint variant, we validate our claim of near-optimality via a new lower-bounding integer programming formulation. Although computing this lower bound requires solving the NP-hard Hitting Set problem and can underestimate the optimal value by a linear factor in the worst case, it can be computed quickly by CPLEX, and it equals the optimal solution value for all the instances in our extensive testbed.
Submitted 3 November, 2016;
originally announced November 2016.
-
Strategic Seeding of Rival Opinions
Authors:
Samuel D. Johnson,
Jemin George,
Raissa M. D'Souza
Abstract:
We present a network influence game that models players strategically seeding the opinions of nodes embedded in a social network. A social learning dynamic, whereby nodes repeatedly update their opinions to resemble those of their neighbors, spreads the seeded opinions through the network. After a fixed period of time, the dynamic halts and each player's utility is determined by the relative strength of the opinions held by each node in the network vis-a-vis the other players. We show that the existence of a pure Nash equilibrium cannot be guaranteed in general. However, if the dynamics are allowed to progress for a sufficient amount of time so that a consensus among all of the nodes is obtained, then the existence of a pure Nash equilibrium can be guaranteed. Finding a pure strategy best response is shown to be NP-complete, but the best response can be efficiently approximated to within a (1 - 1/e) factor of optimal by a simple greedy algorithm.
Submitted 22 September, 2016;
originally announced September 2016.
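The greedy approximation mentioned in the abstract above follows a well-known pattern: repeatedly add the seed with the largest marginal gain. The sketch below shows that pattern only. The paper's actual utility function is not given in this listing, so a simple coverage count stands in for it; `greedy_seed`, `reach`, and `coverage` are hypothetical names. The classical (1 - 1/e) guarantee for this pattern applies to monotone submodular utilities under a cardinality constraint, which the stand-in satisfies.

```python
def greedy_seed(candidates, k, utility):
    """Pick up to k seeds, each time adding the largest marginal gain."""
    chosen = []
    for _ in range(k):
        base = utility(chosen)
        best_node, best_gain = None, 0.0
        for node in candidates:
            if node in chosen:
                continue
            gain = utility(chosen + [node]) - base
            if gain > best_gain:
                best_node, best_gain = node, gain
        if best_node is None:
            break  # no remaining candidate improves the utility
        chosen.append(best_node)
    return chosen


# Stand-in utility (an assumption): number of distinct nodes reached by the seeds.
reach = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}}


def coverage(seeds):
    covered = set()
    for s in seeds:
        covered |= reach[s]
    return len(covered)


seeds = greedy_seed(list(reach), k=2, utility=coverage)
```

Each round costs one utility evaluation per remaining candidate, so the whole selection needs on the order of k times the number of candidates evaluations.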
-
Open and Regionalised Spectrum Repositories for Emerging Countries
Authors:
Andrés Arcia-Moret,
Arjuna Sathiaseelan,
Marco Zennaro,
Freddy Rondón,
Ermanno Pietrosemoli,
David Johnson
Abstract:
TV White Spaces have recently been proposed as an alternative to alleviate the spectrum crunch, characterised by the need to reallocate frequency bands to accommodate the ever-growing demand for wireless communications. In this paper, we discuss the motivations and challenges for collecting spectrum measurements in developing regions and discuss a scalable system for communities to gather and provide access to White Spaces information through open and regionalised repositories. We further discuss two relevant aspects. First, we propose a cooperative mechanism for sensing spectrum availability using a detector approach. Second, we propose a strategy (and an architecture) on the database side to implement spectrum governance. Other aspects of the work include discussion of an extensive measurement campaign showing a number of white spaces in developing regions, an overview of our experience on low-cost spectrum analysers, and the architecture of zebra-rfo, an application for processing crowd-sourced spectrum data.
Submitted 17 July, 2016;
originally announced July 2016.