-
Scalable unsupervised alignment of general metric and non-metric structures
Authors:
Sanketh Vedula,
Valentino Maiorca,
Lorenzo Basile,
Francesco Locatello,
Alex Bronstein
Abstract:
Aligning data from different domains is a fundamental problem in machine learning with broad applications across very different areas, most notably aligning experimental readouts in single-cell multiomics. Mathematically, this problem can be formulated as the minimization of disagreement of pair-wise quantities such as distances and is related to the Gromov-Hausdorff and Gromov-Wasserstein distanc…
▽ More
Aligning data from different domains is a fundamental problem in machine learning with broad applications across very different areas, most notably aligning experimental readouts in single-cell multiomics. Mathematically, this problem can be formulated as the minimization of disagreement of pair-wise quantities such as distances and is related to the Gromov-Hausdorff and Gromov-Wasserstein distances. Computationally, it is a quadratic assignment problem (QAP) that is known to be NP-hard. Prior works attempted to solve the QAP directly with entropic or low-rank regularization on the permutation, which is computationally tractable only for modestly-sized inputs, and encode only limited inductive bias related to the domains being aligned. We consider the alignment of metric structures formulated as a discrete Gromov-Wasserstein problem and instead of solving the QAP directly, we propose to learn a related well-scalable linear assignment problem (LAP) whose solution is also a minimizer of the QAP. We also show a flexible extension of the proposed framework to general non-metric dissimilarities through differentiable ranks. We extensively evaluate our approach on synthetic and real datasets from single-cell multiomics and neural latent spaces, achieving state-of-the-art performance while being conceptually and computationally simple.
△ Less
Submitted 19 June, 2024;
originally announced June 2024.
-
A Theoretical Framework for an Efficient Normalizing Flow-Based Solution to the Schrodinger Equation
Authors:
Daniel Freedman,
Eyal Rozenberg,
Alex Bronstein
Abstract:
A central problem in quantum mechanics involves solving the Electronic Schrodinger Equation for a molecule or material. The Variational Monte Carlo approach to this problem approximates a particular variational objective via sampling, and then optimizes this approximated objective over a chosen parameterized family of wavefunctions, known as the ansatz. Recently neural networks have been used as t…
▽ More
A central problem in quantum mechanics involves solving the Electronic Schrodinger Equation for a molecule or material. The Variational Monte Carlo approach to this problem approximates a particular variational objective via sampling, and then optimizes this approximated objective over a chosen parameterized family of wavefunctions, known as the ansatz. Recently neural networks have been used as the ansatz, with accompanying success. However, sampling from such wavefunctions has required the use of a Markov Chain Monte Carlo approach, which is inherently inefficient. In this work, we propose a solution to this problem via an ansatz which is cheap to sample from, yet satisfies the requisite quantum mechanical properties. We prove that a normalizing flow using the following two essential ingredients satisfies our requirements: (a) a base distribution which is constructed from Determinantal Point Processes; (b) flow layers which are equivariant to a particular subgroup of the permutation group. We then show how to construct both continuous and discrete normalizing flows which satisfy the requisite equivariance. We further demonstrate the manner in which the non-smooth nature ("cusps") of the wavefunction may be captured, and how the framework may be generalized to provide induction across multiple molecules. The resulting theoretical framework entails an efficient approach to solving the Electronic Schrodinger Equation.
△ Less
Submitted 28 May, 2024;
originally announced June 2024.
-
Leveraging Latents for Efficient Thermography Classification and Segmentation
Authors:
Tamir Shor,
Chaim Baskin,
Alex Bronstein
Abstract:
Breast cancer is a prominent health concern worldwide, currently being the secondmost common and second-deadliest type of cancer in women. While current breast cancer diagnosis mainly relies on mammography imaging, in recent years the use of thermography for breast cancer imaging has been garnering growing popularity. Thermographic imaging relies on infrared cameras to capture body-emitted heat di…
▽ More
Breast cancer is a prominent health concern worldwide, currently being the secondmost common and second-deadliest type of cancer in women. While current breast cancer diagnosis mainly relies on mammography imaging, in recent years the use of thermography for breast cancer imaging has been garnering growing popularity. Thermographic imaging relies on infrared cameras to capture body-emitted heat distributions. While these heat signatures have proven useful for computer-vision systems for accurate breast cancer segmentation and classification, prior work often relies on handcrafted feature engineering or complex architectures, potentially limiting the comparability and applicability of these methods. In this work, we present a novel algorithm for both breast cancer classification and segmentation. Rather than focusing efforts on manual feature and architecture engineering, our algorithm focuses on leveraging an informative, learned feature space, thus making our solution simpler to use and extend to other frameworks and downstream tasks, as well as more applicable to data-scarce settings. Our classification produces SOTA results, while we are the first work to produce segmentation regions studied in this paper.
△ Less
Submitted 23 June, 2024; v1 submitted 9 April, 2024;
originally announced April 2024.
-
Active propulsion noise shaping for multi-rotor aircraft localization
Authors:
Gabriele Serussi,
Tamir Shor,
Tom Hirshberg,
Chaim Baskin,
Alex Bronstein
Abstract:
Multi-rotor aerial autonomous vehicles (MAVs) primarily rely on vision for navigation purposes. However, visual localization and odometry techniques suffer from poor performance in low or direct sunlight, a limited field of view, and vulnerability to occlusions. Acoustic sensing can serve as a complementary or even alternative modality for vision in many situations, and it also has the added benef…
▽ More
Multi-rotor aerial autonomous vehicles (MAVs) primarily rely on vision for navigation purposes. However, visual localization and odometry techniques suffer from poor performance in low or direct sunlight, a limited field of view, and vulnerability to occlusions. Acoustic sensing can serve as a complementary or even alternative modality for vision in many situations, and it also has the added benefits of lower system cost and energy footprint, which is especially important for micro aircraft. This paper proposes actively controlling and shaping the aircraft propulsion noise generated by the rotors to benefit localization tasks, rather than considering it a harmful nuisance. We present a neural network architecture for selfnoise-based localization in a known environment. We show that training it simultaneously with learning time-varying rotor phase modulation achieves accurate and robust localization. The proposed methods are evaluated using a computationally affordable simulation of MAV rotor noise in 2D acoustic environments that is fitted to real recordings of rotor pressure fields.
△ Less
Submitted 29 February, 2024; v1 submitted 27 February, 2024;
originally announced February 2024.
-
Demystifying Graph Sparsification Algorithms in Graph Properties Preservation
Authors:
Yuhan Chen,
Haojie Ye,
Sanketh Vedula,
Alex Bronstein,
Ronald Dreslinski,
Trevor Mudge,
Nishil Talati
Abstract:
Graph sparsification is a technique that approximates a given graph by a sparse graph with a subset of vertices and/or edges. The goal of an effective sparsification algorithm is to maintain specific graph properties relevant to the downstream task while minimizing the graph's size. Graph algorithms often suffer from long execution time due to the irregularity and the large real-world graph size.…
▽ More
Graph sparsification is a technique that approximates a given graph by a sparse graph with a subset of vertices and/or edges. The goal of an effective sparsification algorithm is to maintain specific graph properties relevant to the downstream task while minimizing the graph's size. Graph algorithms often suffer from long execution time due to the irregularity and the large real-world graph size. Graph sparsification can be applied to greatly reduce the run time of graph algorithms by substituting the full graph with a much smaller sparsified graph, without significantly degrading the output quality. However, the interaction between numerous sparsifiers and graph properties is not widely explored, and the potential of graph sparsification is not fully understood.
In this work, we cover 16 widely-used graph metrics, 12 representative graph sparsification algorithms, and 14 real-world input graphs spanning various categories, exhibiting diverse characteristics, sizes, and densities. We developed a framework to extensively assess the performance of these sparsification algorithms against graph metrics, and provide insights to the results. Our study shows that there is no one sparsifier that performs the best in preserving all graph properties, e.g. sparsifiers that preserve distance-related graph properties (eccentricity) struggle to perform well on Graph Neural Networks (GNN). This paper presents a comprehensive experimental study evaluating the performance of sparsification algorithms in preserving essential graph metrics. The insights inform future research in incorporating matching graph sparsification to graph algorithms to maximize benefits while minimizing quality degradation. Furthermore, we provide a framework to facilitate the future evaluation of evolving sparsification algorithms, graph metrics, and ever-growing graph data.
△ Less
Submitted 20 November, 2023;
originally announced November 2023.
-
Vector Quantile Regression on Manifolds
Authors:
Marco Pegoraro,
Sanketh Vedula,
Aviv A. Rosenberg,
Irene Tallini,
Emanuele Rodolà,
Alex M. Bronstein
Abstract:
Quantile regression (QR) is a statistical tool for distribution-free estimation of conditional quantiles of a target variable given explanatory features. QR is limited by the assumption that the target distribution is univariate and defined on an Euclidean domain. Although the notion of quantiles was recently extended to multi-variate distributions, QR for multi-variate distributions on manifolds…
▽ More
Quantile regression (QR) is a statistical tool for distribution-free estimation of conditional quantiles of a target variable given explanatory features. QR is limited by the assumption that the target distribution is univariate and defined on an Euclidean domain. Although the notion of quantiles was recently extended to multi-variate distributions, QR for multi-variate distributions on manifolds remains underexplored, even though many important applications inherently involve data distributed on, e.g., spheres (climate and geological phenomena), and tori (dihedral angles in proteins). By leveraging optimal transport theory and c-concave functions, we meaningfully define conditional vector quantile functions of high-dimensional variables on manifolds (M-CVQFs). Our approach allows for quantile estimation, regression, and computation of conditional confidence sets and likelihoods. We demonstrate the approach's efficacy and provide insights regarding the meaning of non-Euclidean quantiles through synthetic and real data experiments.
△ Less
Submitted 7 February, 2024; v1 submitted 3 July, 2023;
originally announced July 2023.
-
Designing Nonlinear Photonic Crystals for High-Dimensional Quantum State Engineering
Authors:
Eyal Rozenberg,
Aviv Karnieli,
Ofir Yesharim,
Joshua Foley-Comer,
Sivan Trajtenberg-Mills,
Sarika Mishra,
Shashi Prabhakar,
Ravindra Pratap,
Daniel Freedman,
Alex M. Bronstein,
Ady Arie
Abstract:
We propose a novel, physically-constrained and differentiable approach for the generation of D-dimensional qudit states via spontaneous parametric down-conversion (SPDC) in quantum optics. We circumvent any limitations imposed by the inherently stochastic nature of the physical process and incorporate a set of stochastic dynamical equations governing its evolution under the SPDC Hamiltonian. We de…
▽ More
We propose a novel, physically-constrained and differentiable approach for the generation of D-dimensional qudit states via spontaneous parametric down-conversion (SPDC) in quantum optics. We circumvent any limitations imposed by the inherently stochastic nature of the physical process and incorporate a set of stochastic dynamical equations governing its evolution under the SPDC Hamiltonian. We demonstrate the effectiveness of our model through the design of structured nonlinear photonic crystals (NLPCs) and shaped pump beams; and show, theoretically and experimentally, how to generate maximally entangled states in the spatial degree of freedom. The learning of NLPC structures offers a promising new avenue for shaping and controlling arbitrary quantum states and enables all-optical coherent control of the generated states. We believe that this approach can readily be extended from bulky crystals to thin Metasurfaces and potentially applied to other quantum systems sharing a similar Hamiltonian structures, such as superfluids and superconductors.
△ Less
Submitted 13 April, 2023;
originally announced April 2023.
-
Classifier Robustness Enhancement Via Test-Time Transformation
Authors:
Tsachi Blau,
Roy Ganz,
Chaim Baskin,
Michael Elad,
Alex Bronstein
Abstract:
It has been recently discovered that adversarially trained classifiers exhibit an intriguing property, referred to as perceptually aligned gradients (PAG). PAG implies that the gradients of such classifiers possess a meaningful structure, aligned with human perception. Adversarial training is currently the best-known way to achieve classification robustness under adversarial attacks. The PAG prope…
▽ More
It has been recently discovered that adversarially trained classifiers exhibit an intriguing property, referred to as perceptually aligned gradients (PAG). PAG implies that the gradients of such classifiers possess a meaningful structure, aligned with human perception. Adversarial training is currently the best-known way to achieve classification robustness under adversarial attacks. The PAG property, however, has yet to be leveraged for further improving classifier robustness. In this work, we introduce Classifier Robustness Enhancement Via Test-Time Transformation (TETRA) -- a novel defense method that utilizes PAG, enhancing the performance of trained robust classifiers. Our method operates in two phases. First, it modifies the input image via a designated targeted adversarial attack into each of the dataset's classes. Then, it classifies the input image based on the distance to each of the modified instances, with the assumption that the shortest distance relates to the true class. We show that the proposed method achieves state-of-the-art results and validate our claim through extensive experiments on a variety of defense methods, classifier architectures, and datasets. We also empirically demonstrate that TETRA can boost the accuracy of any differentiable adversarial training classifier across a variety of attacks, including ones unseen at training. Specifically, applying TETRA leads to substantial improvement of up to $+23\%$, $+20\%$, and $+26\%$ on CIFAR10, CIFAR100, and ImageNet, respectively.
△ Less
Submitted 27 March, 2023;
originally announced March 2023.
-
Multi PILOT: Learned Feasible Multiple Acquisition Trajectories for Dynamic MRI
Authors:
Tamir Shor,
Tomer Weiss,
Dor Noti,
Alex Bronstein
Abstract:
Dynamic Magnetic Resonance Imaging (MRI) is known to be a powerful and reliable technique for the dynamic imaging of internal organs and tissues, making it a leading diagnostic tool. A major difficulty in using MRI in this setting is the relatively long acquisition time (and, hence, increased cost) required for imaging in high spatio-temporal resolution, leading to the appearance of related motion…
▽ More
Dynamic Magnetic Resonance Imaging (MRI) is known to be a powerful and reliable technique for the dynamic imaging of internal organs and tissues, making it a leading diagnostic tool. A major difficulty in using MRI in this setting is the relatively long acquisition time (and, hence, increased cost) required for imaging in high spatio-temporal resolution, leading to the appearance of related motion artifacts and decrease in resolution. Compressed Sensing (CS) techniques have become a common tool to reduce MRI acquisition time by subsampling images in the k-space according to some acquisition trajectory. Several studies have particularly focused on applying deep learning techniques to learn these acquisition trajectories in order to attain better image reconstruction, rather than using some predefined set of trajectories. To the best of our knowledge, learning acquisition trajectories has been only explored in the context of static MRI. In this study, we consider acquisition trajectory learning in the dynamic imaging setting. We design an end-to-end pipeline for the joint optimization of multiple per-frame acquisition trajectories along with a reconstruction neural network, and demonstrate improved image reconstruction quality in shorter acquisition times. The code for reproducing all experiments is accessible at https://github.com/tamirshor7/MultiPILOT.
△ Less
Submitted 23 March, 2023; v1 submitted 13 March, 2023;
originally announced March 2023.
-
Spectral Graph Complexity
Authors:
Anton Tsitsulin,
Davide Mottin,
Panagiotis Karras,
Alex Bronstein,
Emmanuel Müller
Abstract:
We introduce a spectral notion of graph complexity derived from the Weyl's law. We experimentally demonstrate its correlation to how well the graph can be embedded in a low-dimensional Euclidean space.
We introduce a spectral notion of graph complexity derived from the Weyl's law. We experimentally demonstrate its correlation to how well the graph can be embedded in a low-dimensional Euclidean space.
△ Less
Submitted 2 November, 2022;
originally announced November 2022.
-
Using Deep Reinforcement Learning for mmWave Real-Time Scheduling
Authors:
Barak Gahtan,
Reuven Cohen,
Alex M. Bronstein,
Gil Kedar
Abstract:
We study the problem of real-time scheduling in a multi-hop millimeter-wave (mmWave) mesh. We develop a model-free deep reinforcement learning algorithm called Adaptive Activator RL (AARL), which determines the subset of mmWave links that should be activated during each time slot and the power level for each link. The most important property of AARL is its ability to make scheduling decisions with…
▽ More
We study the problem of real-time scheduling in a multi-hop millimeter-wave (mmWave) mesh. We develop a model-free deep reinforcement learning algorithm called Adaptive Activator RL (AARL), which determines the subset of mmWave links that should be activated during each time slot and the power level for each link. The most important property of AARL is its ability to make scheduling decisions within the strict time slot constraints of typical 5G mmWave networks. AARL can handle a variety of network topologies, network loads, and interference models, it can also adapt to different workloads. We demonstrate the operation of AARL on several topologies: a small topology with 10 links, a moderately-sized mesh with 48 links, and a large topology with 96 links. We show that for each topology, we compare the throughput obtained by AARL to that of a benchmark algorithm called RPMA (Residual Profit Maximizer Algorithm). The most important advantage of AARL compared to RPMA is that it is much faster and can make the necessary scheduling decisions very rapidly during every time slot, while RPMA cannot. In addition, the quality of the scheduling decisions made by AARL outperforms those made by RPMA.
△ Less
Submitted 18 February, 2023; v1 submitted 4 October, 2022;
originally announced October 2022.
-
Transformer Vs. MLP-Mixer: Exponential Expressive Gap For NLP Problems
Authors:
Dan Navon,
Alex M. Bronstein
Abstract:
Vision-Transformers are widely used in various vision tasks. Meanwhile, there is another line of works starting with the MLP-mixer trying to achieve similar performance using mlp-based architectures. Interestingly, until now those mlp-based architectures have not been adapted for NLP tasks. Additionally, until now, mlp-based architectures have failed to achieve state-of-the-art performance in visi…
▽ More
Vision-Transformers are widely used in various vision tasks. Meanwhile, there is another line of works starting with the MLP-mixer trying to achieve similar performance using mlp-based architectures. Interestingly, until now those mlp-based architectures have not been adapted for NLP tasks. Additionally, until now, mlp-based architectures have failed to achieve state-of-the-art performance in vision tasks. In this paper, we analyze the expressive power of mlp-based architectures in modeling dependencies between multiple different inputs simultaneously, and show an exponential gap between the attention and the mlp-based mechanisms. Our results suggest a theoretical explanation for the mlp inability to compete with attention-based mechanisms in NLP problems, they also suggest that the performance gap in vision tasks may be due to the mlp relative weakness in modeling dependencies between multiple different locations, and that combining smart input permutations with mlp architectures may not be enough to close the performance gap alone.
△ Less
Submitted 17 November, 2022; v1 submitted 17 August, 2022;
originally announced August 2022.
-
Random Search Hyper-Parameter Tuning: Expected Improvement Estimation and the Corresponding Lower Bound
Authors:
Dan Navon,
Alex M. Bronstein
Abstract:
Hyperparameter tuning is a common technique for improving the performance of neural networks. Most techniques for hyperparameter search involve an iterated process where the model is retrained at every iteration. However, the expected accuracy improvement from every additional search iteration, is still unknown. Calculating the expected improvement can help create stopping rules for hyperparameter…
▽ More
Hyperparameter tuning is a common technique for improving the performance of neural networks. Most techniques for hyperparameter search involve an iterated process where the model is retrained at every iteration. However, the expected accuracy improvement from every additional search iteration, is still unknown. Calculating the expected improvement can help create stopping rules for hyperparameter tuning and allow for a wiser allocation of a project's computational budget. In this paper, we establish an empirical estimate for the expected accuracy improvement from an additional iteration of hyperparameter search. Our results hold for any hyperparameter tuning method which is based on random search \cite{bergstra2012random} and samples hyperparameters from a fixed distribution. We bound our estimate with an error of $O\left(\sqrt{\frac{\log k}{k}}\right)$ w.h.p. where $k$ is the current number of iterations. To the best of our knowledge this is the first bound on the expected gain from an additional iteration of hyperparameter search. Finally, we demonstrate that the optimal estimate for the expected accuracy will still have an error of $\frac{1}{k}$.
△ Less
Submitted 17 August, 2022;
originally announced August 2022.
-
Threat Model-Agnostic Adversarial Defense using Diffusion Models
Authors:
Tsachi Blau,
Roy Ganz,
Bahjat Kawar,
Alex Bronstein,
Michael Elad
Abstract:
Deep Neural Networks (DNNs) are highly sensitive to imperceptible malicious perturbations, known as adversarial attacks. Following the discovery of this vulnerability in real-world imaging and vision applications, the associated safety concerns have attracted vast research attention, and many defense techniques have been developed. Most of these defense methods rely on adversarial training (AT) --…
▽ More
Deep Neural Networks (DNNs) are highly sensitive to imperceptible malicious perturbations, known as adversarial attacks. Following the discovery of this vulnerability in real-world imaging and vision applications, the associated safety concerns have attracted vast research attention, and many defense techniques have been developed. Most of these defense methods rely on adversarial training (AT) -- training the classification network on images perturbed according to a specific threat model, which defines the magnitude of the allowed modification. Although AT leads to promising results, training on a specific threat model fails to generalize to other types of perturbations. A different approach utilizes a preprocessing step to remove the adversarial perturbation from the attacked image. In this work, we follow the latter path and aim to develop a technique that leads to robust classifiers across various realizations of threat models. To this end, we harness the recent advances in stochastic generative modeling, and means to leverage these for sampling from conditional distributions. Our defense relies on an addition of Gaussian i.i.d noise to the attacked image, followed by a pretrained diffusion process -- an architecture that performs a stochastic iterative process over a denoising network, yielding a high perceptual quality denoised outcome. The obtained robustness with this stochastic preprocessing step is validated through extensive experiments on the CIFAR-10 dataset, showing that our method outperforms the leading defense methods under various threat models.
△ Less
Submitted 17 July, 2022;
originally announced July 2022.
-
Physical Passive Patch Adversarial Attacks on Visual Odometry Systems
Authors:
Yaniv Nemcovsky,
Matan Jacoby,
Alex M. Bronstein,
Chaim Baskin
Abstract:
Deep neural networks are known to be susceptible to adversarial perturbations -- small perturbations that alter the output of the network and exist under strict norm limitations. While such perturbations are usually discussed as tailored to a specific input, a universal perturbation can be constructed to alter the model's output on a set of inputs. Universal perturbations present a more realistic…
▽ More
Deep neural networks are known to be susceptible to adversarial perturbations -- small perturbations that alter the output of the network and exist under strict norm limitations. While such perturbations are usually discussed as tailored to a specific input, a universal perturbation can be constructed to alter the model's output on a set of inputs. Universal perturbations present a more realistic case of adversarial attacks, as awareness of the model's exact input is not required. In addition, the universal attack setting raises the subject of generalization to unseen data, where given a set of inputs, the universal perturbations aim to alter the model's output on out-of-sample data. In this work, we study physical passive patch adversarial attacks on visual odometry-based autonomous navigation systems. A visual odometry system aims to infer the relative camera motion between two corresponding viewpoints, and is frequently used by vision-based autonomous navigation systems to estimate their state. For such navigation systems, a patch adversarial perturbation poses a severe security issue, as it can be used to mislead a system onto some collision course. To the best of our knowledge, we show for the first time that the error margin of a visual odometry model can be significantly increased by deploying patch adversarial attacks in the scene. We provide evaluation on synthetic closed-loop drone navigation data and demonstrate that a comparable vulnerability exists in real data. A reference implementation of the proposed method and the reported experiments is provided at https://github.com/patchadversarialattacks/patchadversarialattacks.
△ Less
Submitted 4 October, 2022; v1 submitted 11 July, 2022;
originally announced July 2022.
-
Fast Nonlinear Vector Quantile Regression
Authors:
Aviv A. Rosenberg,
Sanketh Vedula,
Yaniv Romano,
Alex M. Bronstein
Abstract:
Quantile regression (QR) is a powerful tool for estimating one or more conditional quantiles of a target variable $\mathrm{Y}$ given explanatory features $\boldsymbol{\mathrm{X}}$. A limitation of QR is that it is only defined for scalar target variables, due to the formulation of its objective function, and since the notion of quantiles has no standard definition for multivariate distributions. R…
▽ More
Quantile regression (QR) is a powerful tool for estimating one or more conditional quantiles of a target variable $\mathrm{Y}$ given explanatory features $\boldsymbol{\mathrm{X}}$. A limitation of QR is that it is only defined for scalar target variables, due to the formulation of its objective function, and since the notion of quantiles has no standard definition for multivariate distributions. Recently, vector quantile regression (VQR) was proposed as an extension of QR for vector-valued target variables, thanks to a meaningful generalization of the notion of quantiles to multivariate distributions via optimal transport. Despite its elegance, VQR is arguably not applicable in practice due to several limitations: (i) it assumes a linear model for the quantiles of the target $\boldsymbol{\mathrm{Y}}$ given the features $\boldsymbol{\mathrm{X}}$; (ii) its exact formulation is intractable even for modestly-sized problems in terms of target dimensions, number of regressed quantile levels, or number of features, and its relaxed dual formulation may violate the monotonicity of the estimated quantiles; (iii) no fast or scalable solvers for VQR currently exist. In this work we fully address these limitations, namely: (i) We extend VQR to the non-linear case, showing substantial improvement over linear VQR; (ii) We propose {vector monotone rearrangement}, a method which ensures the quantile functions estimated by VQR are monotone functions; (iii) We provide fast, GPU-accelerated solvers for linear and nonlinear VQR which maintain a fixed memory footprint, and demonstrate that they scale to millions of samples and thousands of quantile levels; (iv) We release an optimized python package of our solvers as to widespread the use of VQR in real-world applications.
△ Less
Submitted 2 June, 2023; v1 submitted 30 May, 2022;
originally announced May 2022.
-
Towards Predicting Fine Finger Motions from Ultrasound Images via Kinematic Representation
Authors:
Dean Zadok,
Oren Salzman,
Alon Wolf,
Alex M. Bronstein
Abstract:
A central challenge in building robotic prostheses is the creation of a sensor-based system able to read physiological signals from the lower limb and instruct a robotic hand to perform various tasks. Existing systems typically perform discrete gestures such as pointing or grasping, by employing electromyography (EMG) or ultrasound (US) technologies to analyze muscle states. While estimating finge…
▽ More
A central challenge in building robotic prostheses is the creation of a sensor-based system able to read physiological signals from the lower limb and instruct a robotic hand to perform various tasks. Existing systems typically perform discrete gestures such as pointing or grasping, by employing electromyography (EMG) or ultrasound (US) technologies to analyze muscle states. While estimating finger gestures has been done in the past by detecting prominent gestures, we are interested in detection, or inference, done in the context of fine motions that evolve over time. Examples include motions occurring when performing fine and dexterous tasks such as keyboard typing or piano playing. We consider this task as an important step towards higher adoption rates of robotic prostheses among arm amputees, as it has the potential to dramatically increase functionality in performing daily tasks. To this end, we present an end-to-end robotic system, which can successfully infer fine finger motions. This is achieved by modeling the hand as a robotic manipulator and using it as an intermediate representation to encode muscles' dynamics from a sequence of US images. We evaluated our method by collecting data from a group of subjects and demonstrating how it can be used to replay music played or text typed. To the best of our knowledge, this is the first study demonstrating these downstream tasks within an end-to-end system.
△ Less
Submitted 28 September, 2022; v1 submitted 10 February, 2022;
originally announced February 2022.
-
SPDCinv: Inverse Quantum-Optical Design of High-Dimensional Qudits
Authors:
Eyal Rozenberg,
Aviv Karnieli,
Ofir Yesharim,
Joshua Foley-Comer,
Sivan Trajtenberg-Mills,
Daniel Freedman,
Alex M. Bronstein,
Ady Arie
Abstract:
Spontaneous parametric down-conversion in quantum optics is an invaluable resource for the realization of high-dimensional qudits with spatial modes of light. One of the main open challenges is how to directly generate a desirable qudit state in the SPDC process. This problem can be addressed through advanced computational learning methods; however, due to difficulties in modeling the SPDC process…
▽ More
Spontaneous parametric down-conversion in quantum optics is an invaluable resource for the realization of high-dimensional qudits with spatial modes of light. One of the main open challenges is how to directly generate a desirable qudit state in the SPDC process. This problem can be addressed through advanced computational learning methods; however, due to difficulties in modeling the SPDC process by a fully differentiable algorithm that takes into account all interaction effects, progress has been limited. Here, we overcome these limitations and introduce a physically-constrained and differentiable model, validated against experimental results for shaped pump beams and structured crystals, capable of learning every interaction parameter in the process. We avoid any restrictions induced by the stochastic nature of our physical model and integrate the dynamic equations governing the evolution under the SPDC Hamiltonian. We solve the inverse problem of designing a nonlinear quantum optical system that achieves the desired quantum state of down-converted photon pairs. The desired states are defined using either the second-order correlations between different spatial modes or by specifying the required density matrix. By learning nonlinear volume holograms as well as different pump shapes, we successfully show how to generate maximally entangled states. Furthermore, we simulate all-optical coherent control over the generated quantum state by actively changing the profile of the pump beam. Our work can be useful for applications such as novel designs of high-dimensional quantum key distribution and quantum information processing protocols. In addition, our method can be readily applied for controlling other degrees of freedom of light in the SPDC process, such as the spectral and temporal properties, and may even be used in condensed-matter systems having a similar interaction Hamiltonian.
△ Less
Submitted 11 December, 2021;
originally announced December 2021.
-
Delta-GAN-Encoder: Encoding Semantic Changes for Explicit Image Editing, using Few Synthetic Samples
Authors:
Nir Diamant,
Nitsan Sandor,
Alex M Bronstein
Abstract:
Understating and controlling generative models' latent space is a complex task.
In this paper, we propose a novel method for learning to control any desired attribute in a pre-trained GAN's latent space, for the purpose of editing synthesized and real-world data samples accordingly.
We perform Sim2Real learning, relying on minimal samples to achieve an unlimited amount of continuous precise ed…
▽ More
Understating and controlling generative models' latent space is a complex task.
In this paper, we propose a novel method for learning to control any desired attribute in a pre-trained GAN's latent space, for the purpose of editing synthesized and real-world data samples accordingly.
We perform Sim2Real learning, relying on minimal samples to achieve an unlimited amount of continuous precise edits.
We present an Autoencoder-based model that learns to encode the semantics of changes between images as a basis for editing new samples later on, achieving precise desired results - example shown in Fig. 1.
While previous editing methods rely on a known structure of latent spaces (e.g., linearity of some semantics in StyleGAN), our method inherently does not require any structural constraints.
We demonstrate our method in the domain of facial imagery: editing different expressions, poses, and lighting attributes, achieving state-of-the-art results.
△ Less
Submitted 17 November, 2021; v1 submitted 16 November, 2021;
originally announced November 2021.
-
Joint optimization of system design and reconstruction in MIMO radar imaging
Authors:
Tomer Weiss,
Nissim Peretz,
Sanketh Vedula,
Arie Feuer,
Alex Bronstein
Abstract:
Multiple-input multiple-output (MIMO) radar is one of the leading depth sensing modalities. However, the usage of multiple receive channels lead to relative high costs and prevent the penetration of MIMOs in many areas such as the automotive industry. Over the last years, few studies concentrated on designing reduced measurement schemes and image reconstruction schemes for MIMO radars, however the…
▽ More
Multiple-input multiple-output (MIMO) radar is one of the leading depth sensing modalities. However, the usage of multiple receive channels lead to relative high costs and prevent the penetration of MIMOs in many areas such as the automotive industry. Over the last years, few studies concentrated on designing reduced measurement schemes and image reconstruction schemes for MIMO radars, however these problems have been so far addressed separately. On the other hand, recent works in optical computational imaging have demonstrated growing success of simultaneous learning-based design of the acquisition and reconstruction schemes, manifesting significant improvement in the reconstruction quality. Inspired by these successes, in this work, we propose to learn MIMO acquisition parameters in the form of receive (Rx) antenna elements locations jointly with an image neural-network based reconstruction. To this end, we propose an algorithm for training the combined acquisition-reconstruction pipeline end-to-end in a differentiable way. We demonstrate the significance of using our learned acquisition parameters with and without the neural-network reconstruction.
△ Less
Submitted 7 October, 2021;
originally announced October 2021.
-
GRASP: Graph Alignment through Spectral Signatures
Authors:
Judith Hermanns,
Anton Tsitsulin,
Marina Munkhoeva,
Alex Bronstein,
Davide Mottin,
Panagiotis Karras
Abstract:
What is the best way to match the nodes of two graphs? This graph alignment problem generalizes graph isomorphism and arises in applications from social network analysis to bioinformatics. Some solutions assume that auxiliary information on known matches or node or edge attributes is available, or utilize arbitrary graph features. Such methods fare poorly in the pure form of the problem, in which…
▽ More
What is the best way to match the nodes of two graphs? This graph alignment problem generalizes graph isomorphism and arises in applications from social network analysis to bioinformatics. Some solutions assume that auxiliary information on known matches or node or edge attributes is available, or utilize arbitrary graph features. Such methods fare poorly in the pure form of the problem, in which only graph structures are given. Other proposals translate the problem to one of aligning node embeddings, yet, by doing so, provide only a single-scale view of the graph. In this paper, we transfer the shape-analysis concept of functional maps from the continuous to the discrete case, and treat the graph alignment problem as a special case of the problem of finding a mapping between functions on graphs. We present GRASP, a method that first establishes a correspondence between functions derived from Laplacian matrix eigenvectors, which capture multiscale structural characteristics, and then exploits this correspondence to align nodes. Our experimental study, featuring noise levels higher than anything used in previous studies, shows that GRASP outperforms state-of-the-art methods for graph alignment across noise levels and graph types.
△ Less
Submitted 11 June, 2021; v1 submitted 10 June, 2021;
originally announced June 2021.
-
Detector-Free Weakly Supervised Grounding by Separation
Authors:
Assaf Arbelle,
Sivan Doveh,
Amit Alfassy,
Joseph Shtok,
Guy Lev,
Eli Schwartz,
Hilde Kuehne,
Hila Barak Levi,
Prasanna Sattigeri,
Rameswar Panda,
Chun-Fu Chen,
Alex Bronstein,
Kate Saenko,
Shimon Ullman,
Raja Giryes,
Rogerio Feris,
Leonid Karlinsky
Abstract:
Nowadays, there is an abundance of data involving images and surrounding free-form text weakly corresponding to those images. Weakly Supervised phrase-Grounding (WSG) deals with the task of using this data to learn to localize (or to ground) arbitrary text phrases in images without any additional annotations. However, most recent SotA methods for WSG assume the existence of a pre-trained object de…
▽ More
Nowadays, there is an abundance of data involving images and surrounding free-form text weakly corresponding to those images. Weakly Supervised phrase-Grounding (WSG) deals with the task of using this data to learn to localize (or to ground) arbitrary text phrases in images without any additional annotations. However, most recent SotA methods for WSG assume the existence of a pre-trained object detector, relying on it to produce the ROIs for localization. In this work, we focus on the task of Detector-Free WSG (DF-WSG) to solve WSG without relying on a pre-trained detector. We directly learn everything from the images and associated free-form text pairs, thus potentially gaining an advantage on the categories unsupported by the detector. The key idea behind our proposed Grounding by Separation (GbS) method is synthesizing `text to image-regions' associations by random alpha-blending of arbitrary image pairs and using the corresponding texts of the pair as conditions to recover the alpha map from the blended image via a segmentation network. At test time, this allows using the query phrase as a condition for a non-blended query image, thus interpreting the test image as a composition of a region corresponding to the phrase and the complement region. Using this approach we demonstrate a significant accuracy improvement, of up to $8.5\%$ over previous DF-WSG SotA, for a range of benchmarks including Flickr30K, Visual Genome, and ReferIt, as well as a significant complementary improvement (above $7\%$) over the detector-based approaches for WSG.
△ Less
Submitted 20 April, 2021;
originally announced April 2021.
-
Contrast to Divide: Self-Supervised Pre-Training for Learning with Noisy Labels
Authors:
Evgenii Zheltonozhskii,
Chaim Baskin,
Avi Mendelson,
Alex M. Bronstein,
Or Litany
Abstract:
The success of learning with noisy labels (LNL) methods relies heavily on the success of a warm-up stage where standard supervised training is performed using the full (noisy) training set. In this paper, we identify a "warm-up obstacle": the inability of standard warm-up stages to train high quality feature extractors and avert memorization of noisy labels. We propose "Contrast to Divide" (C2D),…
▽ More
The success of learning with noisy labels (LNL) methods relies heavily on the success of a warm-up stage where standard supervised training is performed using the full (noisy) training set. In this paper, we identify a "warm-up obstacle": the inability of standard warm-up stages to train high quality feature extractors and avert memorization of noisy labels. We propose "Contrast to Divide" (C2D), a simple framework that solves this problem by pre-training the feature extractor in a self-supervised fashion. Using self-supervised pre-training boosts the performance of existing LNL approaches by drastically reducing the warm-up stage's susceptibility to noise level, shortening its duration, and improving extracted feature quality. C2D works out of the box with existing methods and demonstrates markedly improved performance, especially in the high noise regime, where we get a boost of more than 27% for CIFAR-100 with 90% noise over the previous state of the art. In real-life noise settings, C2D trained on mini-WebVision outperforms previous works both in WebVision and ImageNet validation sets by 3% top-1 accuracy. We perform an in-depth analysis of the framework, including investigating the performance of different pre-training approaches and estimating the effective upper bound of the LNL performance with semi-supervised learning. Code for reproducing our experiments is available at https://github.com/ContrastToDivide/C2D
△ Less
Submitted 20 October, 2021; v1 submitted 25 March, 2021;
originally announced March 2021.
-
Self-Supervised Classification Network
Authors:
Elad Amrani,
Leonid Karlinsky,
Alex Bronstein
Abstract:
We present Self-Classifier -- a novel self-supervised end-to-end classification learning approach. Self-Classifier learns labels and representations simultaneously in a single-stage end-to-end manner by optimizing for same-class prediction of two augmented views of the same sample. To guarantee non-degenerate solutions (i.e., solutions where all labels are assigned to the same class) we propose a…
▽ More
We present Self-Classifier -- a novel self-supervised end-to-end classification learning approach. Self-Classifier learns labels and representations simultaneously in a single-stage end-to-end manner by optimizing for same-class prediction of two augmented views of the same sample. To guarantee non-degenerate solutions (i.e., solutions where all labels are assigned to the same class) we propose a mathematically motivated variant of the cross-entropy loss that has a uniform prior asserted on the predicted labels. In our theoretical analysis, we prove that degenerate solutions are not in the set of optimal solutions of our approach. Self-Classifier is simple to implement and scalable. Unlike other popular unsupervised classification and contrastive representation learning approaches, it does not require any form of pre-training, expectation-maximization, pseudo-labeling, external clustering, a second network, stop-gradient operation, or negative pairs. Despite its simplicity, our approach sets a new state of the art for unsupervised classification of ImageNet; and even achieves comparable to state-of-the-art results for unsupervised representation learning. Code is available at https://github.com/elad-amrani/self-classifier.
△ Less
Submitted 12 July, 2022; v1 submitted 19 March, 2021;
originally announced March 2021.
-
Inverse Design of Quantum Holograms in Three-Dimensional Nonlinear Photonic Crystals
Authors:
Eyal Rozenberg,
Aviv Karnieli,
Ofir Yesharim,
Sivan Trajtenberg-Mills,
Daniel Freedman,
Alex M. Bronstein,
Ady Arie
Abstract:
We introduce a systematic approach for designing 3D nonlinear photonic crystals and pump beams for generating desired quantum correlations between structured photon-pairs. Our model is fully differentiable, allowing accurate and efficient learning and discovery of novel designs.
We introduce a systematic approach for designing 3D nonlinear photonic crystals and pump beams for generating desired quantum correlations between structured photon-pairs. Our model is fully differentiable, allowing accurate and efficient learning and discovery of novel designs.
△ Less
Submitted 20 February, 2021;
originally announced February 2021.
-
ISP Distillation
Authors:
Eli Schwartz,
Alex Bronstein,
Raja Giryes
Abstract:
Nowadays, many of the images captured are `observed' by machines only and not by humans, e.g., in autonomous systems. High-level machine vision models, such as object recognition or semantic segmentation, assume images are transformed into some canonical image space by the camera \ans{Image Signal Processor (ISP)}. However, the camera ISP is optimized for producing visually pleasing images for hum…
▽ More
Nowadays, many of the images captured are `observed' by machines only and not by humans, e.g., in autonomous systems. High-level machine vision models, such as object recognition or semantic segmentation, assume images are transformed into some canonical image space by the camera \ans{Image Signal Processor (ISP)}. However, the camera ISP is optimized for producing visually pleasing images for human observers and not for machines. Therefore, one may spare the ISP compute time and apply vision models directly to RAW images. Yet, it has been shown that training such models directly on RAW images results in a performance drop. To mitigate this drop, we use a RAW and RGB image pairs dataset, which can be easily acquired with no human labeling. We then train a model that is applied directly to the RAW data by using knowledge distillation such that the model predictions for RAW images will be aligned with the predictions of an off-the-shelf pre-trained model for processed RGB images. Our experiments show that our performance on RAW images for object classification and semantic segmentation is significantly better than models trained on labeled RAW images. It also reasonably matches the predictions of a pre-trained model on processed RGB images, while saving the ISP compute overhead.
△ Less
Submitted 4 May, 2023; v1 submitted 25 January, 2021;
originally announced January 2021.
-
SimuGAN: Unsupervised forward modeling and optimal design of a LIDAR Camera
Authors:
Nir Diamant,
Tal Mund,
Ohad Menashe,
Aviad Zabatani,
Alex M. Bronstein
Abstract:
Energy-saving LIDAR camera for short distances estimates an object's distance using temporally intensity-coded laser light pulses and calculates the maximum correlation with the back-scattered pulse.
Though on low power, the backs-scattered pulse is noisy and unstable, which leads to inaccurate and unreliable depth estimation.
To address this problem, we use GANs (Generative Adversarial Networ…
▽ More
Energy-saving LIDAR camera for short distances estimates an object's distance using temporally intensity-coded laser light pulses and calculates the maximum correlation with the back-scattered pulse.
Though on low power, the backs-scattered pulse is noisy and unstable, which leads to inaccurate and unreliable depth estimation.
To address this problem, we use GANs (Generative Adversarial Networks), which are two neural networks that can learn complicated class distributions through an adversarial process. We learn the LIDAR camera's hidden properties and behavior, creating a novel, fully unsupervised forward model that simulates the camera. Then, we use the model's differentiability to explore the camera parameter space and optimize those parameters in terms of depth, accuracy, and stability. To achieve this goal, we also propose a new custom loss function designated to the back-scattered code distribution's weaknesses and its circular behavior. The results are demonstrated on both synthetic and real data.
△ Less
Submitted 16 December, 2020;
originally announced December 2020.
-
Digital Gimbal: End-to-end Deep Image Stabilization with Learnable Exposure Times
Authors:
Omer Dahary,
Matan Jacoby,
Alex M. Bronstein
Abstract:
Mechanical image stabilization using actuated gimbals enables capturing long-exposure shots without suffering from blur due to camera motion. These devices, however, are often physically cumbersome and expensive, limiting their widespread use. In this work, we propose to digitally emulate a mechanically stabilized system from the input of a fast unstabilized camera. To exploit the trade-off betwee…
▽ More
Mechanical image stabilization using actuated gimbals enables capturing long-exposure shots without suffering from blur due to camera motion. These devices, however, are often physically cumbersome and expensive, limiting their widespread use. In this work, we propose to digitally emulate a mechanically stabilized system from the input of a fast unstabilized camera. To exploit the trade-off between motion blur at long exposures and low SNR at short exposures, we train a CNN that estimates a sharp high-SNR image by aggregating a burst of noisy short-exposure frames, related by unknown motion. We further suggest learning the burst's exposure times in an end-to-end manner, thus balancing the noise and blur across the frames. We demonstrate this method's advantage over the traditional approach of deblurring a single image or denoising a fixed-exposure burst on both synthetic and real data.
△ Less
Submitted 23 November, 2021; v1 submitted 8 December, 2020;
originally announced December 2020.
-
Non-Rigid Puzzles
Authors:
Or Litany,
Emanuele Rodolà,
Alex Bronstein,
Michael Bronstein,
Daniel Cremers
Abstract:
Shape correspondence is a fundamental problem in computer graphics and vision, with applications in various problems including animation, texture mapping, robotic vision, medical imaging, archaeology and many more. In settings where the shapes are allowed to undergo non-rigid deformations and only partial views are available, the problem becomes very challenging. To this end, we present a non-rigi…
▽ More
Shape correspondence is a fundamental problem in computer graphics and vision, with applications in various problems including animation, texture mapping, robotic vision, medical imaging, archaeology and many more. In settings where the shapes are allowed to undergo non-rigid deformations and only partial views are available, the problem becomes very challenging. To this end, we present a non-rigid multi-part shape matching algorithm. We assume to be given a reference shape and its multiple parts undergoing a non-rigid deformation. Each of these query parts can be additionally contaminated by clutter, may overlap with other parts, and there might be missing parts or redundant ones. Our method simultaneously solves for the segmentation of the reference model, and for a dense correspondence to (subsets of) the parts. Experimental results on synthetic as well as real scans demonstrate the effectiveness of our method in dealing with this challenging matching scenario.
△ Less
Submitted 25 November, 2020;
originally announced November 2020.
-
Self-Supervised Learning for Large-Scale Unsupervised Image Clustering
Authors:
Evgenii Zheltonozhskii,
Chaim Baskin,
Alex M. Bronstein,
Avi Mendelson
Abstract:
Unsupervised learning has always been appealing to machine learning researchers and practitioners, allowing them to avoid an expensive and complicated process of labeling the data. However, unsupervised learning of complex data is challenging, and even the best approaches show much weaker performance than their supervised counterparts. Self-supervised deep learning has become a strong instrument f…
▽ More
Unsupervised learning has always been appealing to machine learning researchers and practitioners, allowing them to avoid an expensive and complicated process of labeling the data. However, unsupervised learning of complex data is challenging, and even the best approaches show much weaker performance than their supervised counterparts. Self-supervised deep learning has become a strong instrument for representation learning in computer vision. However, those methods have not been evaluated in a fully unsupervised setting. In this paper, we propose a simple scheme for unsupervised classification based on self-supervised representations. We evaluate the proposed approach with several recent self-supervised methods showing that it achieves competitive results for ImageNet classification (39% accuracy on ImageNet with 1000 clusters and 46% with overclustering). We suggest adding the unsupervised evaluation to a set of standard benchmarks for self-supervised learning. The code is available at https://github.com/Randl/kmeans_selfsuper
△ Less
Submitted 9 November, 2020; v1 submitted 24 August, 2020;
originally announced August 2020.
-
3D FLAT: Feasible Learned Acquisition Trajectories for Accelerated MRI
Authors:
Jonathan Alush-Aben,
Linor Ackerman-Schraier,
Tomer Weiss,
Sanketh Vedula,
Ortal Senouf,
Alex Bronstein
Abstract:
Magnetic Resonance Imaging (MRI) has long been considered to be among the gold standards of today's diagnostic imaging. The most significant drawback of MRI is long acquisition times, prohibiting its use in standard practice for some applications. Compressed sensing (CS) proposes to subsample the k-space (the Fourier domain dual to the physical space of spatial coordinates) leading to significantl…
▽ More
Magnetic Resonance Imaging (MRI) has long been considered to be among the gold standards of today's diagnostic imaging. The most significant drawback of MRI is long acquisition times, prohibiting its use in standard practice for some applications. Compressed sensing (CS) proposes to subsample the k-space (the Fourier domain dual to the physical space of spatial coordinates) leading to significantly accelerated acquisition. However, the benefit of compressed sensing has not been fully exploited; most of the sampling densities obtained through CS do not produce a trajectory that obeys the stringent constraints of the MRI machine imposed in practice. Inspired by recent success of deep learning based approaches for image reconstruction and ideas from computational imaging on learning-based design of imaging systems, we introduce 3D FLAT, a novel protocol for data-driven design of 3D non-Cartesian accelerated trajectories in MRI. Our proposal leverages the entire 3D k-space to simultaneously learn a physically feasible acquisition trajectory with a reconstruction method. Experimental results, performed as a proof-of-concept, suggest that 3D FLAT achieves higher image quality for a given readout time compared to standard trajectories such as radial, stack-of-stars, or 2D learned trajectories (trajectories that evolve only in the 2D plane while fully sampling along the third dimension). Furthermore, we demonstrate evidence supporting the significant benefit of performing MRI acquisitions using non-Cartesian 3D trajectories over 2D non-Cartesian trajectories acquired slice-wise.
△ Less
Submitted 11 August, 2020;
originally announced August 2020.
-
Data-Driven Prediction of Embryo Implantation Probability Using IVF Time-lapse Imaging
Authors:
David H. Silver,
Martin Feder,
Yael Gold-Zamir,
Avital L. Polsky,
Shahar Rosentraub,
Efrat Shachor,
Adi Weinberger,
Pavlo Mazur,
Valery D. Zukin,
Alex M. Bronstein
Abstract:
The process of fertilizing a human egg outside the body in order to help those suffering from infertility to conceive is known as in vitro fertilization (IVF). Despite being the most effective method of assisted reproductive technology (ART), the average success rate of IVF is a mere 20-40%. One step that is critical to the success of the procedure is selecting which embryo to transfer to the pati…
▽ More
The process of fertilizing a human egg outside the body in order to help those suffering from infertility to conceive is known as in vitro fertilization (IVF). Despite being the most effective method of assisted reproductive technology (ART), the average success rate of IVF is a mere 20-40%. One step that is critical to the success of the procedure is selecting which embryo to transfer to the patient, a process typically conducted manually and without any universally accepted and standardized criteria. In this paper we describe a novel data-driven system trained to directly predict embryo implantation probability from embryogenesis time-lapse imaging videos. Using retrospectively collected videos from 272 embryos, we demonstrate that, when compared to an external panel of embryologists, our algorithm results in a 12% increase of positive predictive value and a 29% increase of negative predictive value.
△ Less
Submitted 2 June, 2020; v1 submitted 1 June, 2020;
originally announced June 2020.
-
HCM: Hardware-Aware Complexity Metric for Neural Network Architectures
Authors:
Alex Karbachevsky,
Chaim Baskin,
Evgenii Zheltonozhskii,
Yevgeny Yermolin,
Freddy Gabbay,
Alex M. Bronstein,
Avi Mendelson
Abstract:
Convolutional Neural Networks (CNNs) have become common in many fields including computer vision, speech recognition, and natural language processing. Although CNN hardware accelerators are already included as part of many SoC architectures, the task of achieving high accuracy on resource-restricted devices is still considered challenging, mainly due to the vast number of design parameters that ne…
▽ More
Convolutional Neural Networks (CNNs) have become common in many fields including computer vision, speech recognition, and natural language processing. Although CNN hardware accelerators are already included as part of many SoC architectures, the task of achieving high accuracy on resource-restricted devices is still considered challenging, mainly due to the vast number of design parameters that need to be balanced to achieve an efficient solution. Quantization techniques, when applied to the network parameters, lead to a reduction of power and area and may also change the ratio between communication and computation. As a result, some algorithmic solutions may suffer from lack of memory bandwidth or computational resources and fail to achieve the expected performance due to hardware constraints. Thus, the system designer and the micro-architect need to understand at early development stages the impact of their high-level decisions (e.g., the architecture of the CNN and the amount of bits used to represent its parameters) on the final product (e.g., the expected power saving, area, and accuracy). Unfortunately, existing tools fall short of supporting such decisions.
This paper introduces a hardware-aware complexity metric that aims to assist the system designer of the neural network architectures, through the entire project lifetime (especially at its early stages) by predicting the impact of architectural and micro-architectural decisions on the final product. We demonstrate how the proposed metric can help evaluate different design alternatives of neural network models on resource-restricted devices such as real-time embedded systems, and to avoid making design mistakes at early stages.
△ Less
Submitted 26 April, 2020; v1 submitted 19 April, 2020;
originally announced April 2020.
-
Do We Need Depth in State-Of-The-Art Face Authentication?
Authors:
Amir Livne,
Alex Bronstein,
Ron Kimmel,
Ziv Aviv,
Shahaf Grofit
Abstract:
Some face recognition methods are designed to utilize geometric information extracted from depth sensors to overcome the weaknesses of single-image based recognition technologies. However, the accurate acquisition of the depth profile is an expensive and challenging process. Here, we introduce a novel method that learns to recognize faces from stereo camera systems without the need to explicitly c…
▽ More
Some face recognition methods are designed to utilize geometric information extracted from depth sensors to overcome the weaknesses of single-image based recognition technologies. However, the accurate acquisition of the depth profile is an expensive and challenging process. Here, we introduce a novel method that learns to recognize faces from stereo camera systems without the need to explicitly compute the facial surface or depth map. The raw face stereo images along with the location in the image from which the face is extracted allow the proposed CNN to improve the recognition task while avoiding the need to explicitly handle the geometric structure of the face. This way, we keep the simplicity and cost efficiency of identity authentication from a single image, while enjoying the benefits of geometric data without explicitly reconstructing it. We demonstrate that the suggested method outperforms both existing single-image and explicit depth based methods on large-scale benchmarks, and even capable of recognize spoofing attacks. We also provide an ablation study that shows that the suggested method uses the face locations in the left and right images to encode informative features that improve the overall performance.
△ Less
Submitted 10 November, 2020; v1 submitted 24 March, 2020;
originally announced March 2020.
-
Shape retrieval of non-rigid 3d human models
Authors:
David Pickup,
Xianfang Sun,
Paul L Rosin,
Ralph R Martin,
Z Cheng,
Zhouhui Lian,
Masaki Aono,
A Ben Hamza,
A Bronstein,
M Bronstein,
S Bu,
Umberto Castellani,
S Cheng,
Valeria Garro,
Andrea Giachetti,
Afzal Godil,
Luca Isaia,
J Han,
Henry Johan,
L Lai,
Bo Li,
C Li,
Haisheng Li,
Roee Litman,
X Liu
, et al. (6 additional authors not shown)
Abstract:
3D models of humans are commonly used within computer graphics and vision, and so the ability to distinguish between body shapes is an important shape retrieval problem. We extend our recent paper which provided a benchmark for testing non-rigid 3D shape retrieval algorithms on 3D human models. This benchmark provided a far stricter challenge than previous shape benchmarks. We have added 145 new m…
▽ More
3D models of humans are commonly used within computer graphics and vision, and so the ability to distinguish between body shapes is an important shape retrieval problem. We extend our recent paper which provided a benchmark for testing non-rigid 3D shape retrieval algorithms on 3D human models. This benchmark provided a far stricter challenge than previous shape benchmarks. We have added 145 new models for use as a separate training set, in order to standardise the training data used and provide a fairer comparison. We have also included experiments with the FAUST dataset of human scans. All participants of the previous benchmark study have taken part in the new tests reported here, many providing updated results using the new data. In addition, further participants have also taken part, and we provide extra analysis of the retrieval results. A total of 25 different shape retrieval methods.
△ Less
Submitted 1 March, 2020;
originally announced March 2020.
-
StarNet: towards Weakly Supervised Few-Shot Object Detection
Authors:
Leonid Karlinsky,
Joseph Shtok,
Amit Alfassy,
Moshe Lichtenstein,
Sivan Harary,
Eli Schwartz,
Sivan Doveh,
Prasanna Sattigeri,
Rogerio Feris,
Alexander Bronstein,
Raja Giryes
Abstract:
Few-shot detection and classification have advanced significantly in recent years. Yet, detection approaches require strong annotation (bounding boxes) both for pre-training and for adaptation to novel classes, and classification approaches rarely provide localization of objects in the scene. In this paper, we introduce StarNet - a few-shot model featuring an end-to-end differentiable non-parametr…
▽ More
Few-shot detection and classification have advanced significantly in recent years. Yet, detection approaches require strong annotation (bounding boxes) both for pre-training and for adaptation to novel classes, and classification approaches rarely provide localization of objects in the scene. In this paper, we introduce StarNet - a few-shot model featuring an end-to-end differentiable non-parametric star-model detection and classification head. Through this head, the backbone is meta-trained using only image-level labels to produce good features for jointly localizing and classifying previously unseen categories of few-shot test tasks using a star-model that geometrically matches between the query and support images (to find corresponding object instances). Being a few-shot detector, StarNet does not require any bounding box annotations, neither during pre-training nor for novel classes adaptation. It can thus be applied to the previously unexplored and challenging task of Weakly Supervised Few-Shot Object Detection (WS-FSOD), where it attains significant improvements over the baselines. In addition, StarNet shows significant gains on few-shot classification benchmarks that are less cropped around the objects (where object localization is key).
△ Less
Submitted 17 September, 2020; v1 submitted 15 March, 2020;
originally announced March 2020.
-
Noise Estimation Using Density Estimation for Self-Supervised Multimodal Learning
Authors:
Elad Amrani,
Rami Ben-Ari,
Daniel Rotman,
Alex Bronstein
Abstract:
One of the key factors of enabling machine learning models to comprehend and solve real-world tasks is to leverage multimodal data. Unfortunately, annotation of multimodal data is challenging and expensive. Recently, self-supervised multimodal methods that combine vision and language were proposed to learn multimodal representations without annotation. However, these methods often choose to ignore…
▽ More
One of the key factors of enabling machine learning models to comprehend and solve real-world tasks is to leverage multimodal data. Unfortunately, annotation of multimodal data is challenging and expensive. Recently, self-supervised multimodal methods that combine vision and language were proposed to learn multimodal representations without annotation. However, these methods often choose to ignore the presence of high levels of noise and thus yield sub-optimal results. In this work, we show that the problem of noise estimation for multimodal data can be reduced to a multimodal density estimation task. Using multimodal density estimation, we propose a noise estimation building block for multimodal representation learning that is based strictly on the inherent correlation between different modalities. We demonstrate how our noise estimation can be broadly integrated and achieves comparable results to state-of-the-art performance on five different benchmark datasets for two challenging multimodal tasks: Video Question Answering and Text-To-Video Retrieval. Furthermore, we provide a theoretical probabilistic error bound substantiating our empirical results and analyze failure cases. Code: https://github.com/elad-amrani/ssml.
△ Less
Submitted 10 December, 2020; v1 submitted 6 March, 2020;
originally announced March 2020.
-
Colored Noise Injection for Training Adversarially Robust Neural Networks
Authors:
Evgenii Zheltonozhskii,
Chaim Baskin,
Yaniv Nemcovsky,
Brian Chmiel,
Avi Mendelson,
Alex M. Bronstein
Abstract:
Even though deep learning has shown unmatched performance on various tasks, neural networks have been shown to be vulnerable to small adversarial perturbations of the input that lead to significant performance degradation. In this work we extend the idea of adding white Gaussian noise to the network weights and activations during adversarial training (PNI) to the injection of colored noise for def…
▽ More
Even though deep learning has shown unmatched performance on various tasks, neural networks have been shown to be vulnerable to small adversarial perturbations of the input that lead to significant performance degradation. In this work we extend the idea of adding white Gaussian noise to the network weights and activations during adversarial training (PNI) to the injection of colored noise for defense against common white-box and black-box attacks. We show that our approach outperforms PNI and various previous approaches in terms of adversarial accuracy on CIFAR-10 and CIFAR-100 datasets. In addition, we provide an extensive ablation study of the proposed method justifying the chosen configurations.
△ Less
Submitted 20 March, 2020; v1 submitted 4 March, 2020;
originally announced March 2020.
-
Robust Quantization: One Model to Rule Them All
Authors:
Moran Shkolnik,
Brian Chmiel,
Ron Banner,
Gil Shomron,
Yury Nahshan,
Alex Bronstein,
Uri Weiser
Abstract:
Neural network quantization methods often involve simulating the quantization process during training, making the trained model highly dependent on the target bit-width and precise way quantization is performed. Robust quantization offers an alternative approach with improved tolerance to different classes of data-types and quantization policies. It opens up new exciting applications where the qua…
▽ More
Neural network quantization methods often involve simulating the quantization process during training, making the trained model highly dependent on the target bit-width and precise way quantization is performed. Robust quantization offers an alternative approach with improved tolerance to different classes of data-types and quantization policies. It opens up new exciting applications where the quantization process is not static and can vary to meet different circumstances and implementations. To address this issue, we propose a method that provides intrinsic robustness to the model against a broad range of quantization processes. Our method is motivated by theoretical arguments and enables us to store a single generic model capable of operating at various bit-widths and quantization policies. We validate our method's effectiveness on different ImageNet models.
△ Less
Submitted 22 October, 2020; v1 submitted 18 February, 2020;
originally announced February 2020.
-
MetAdapt: Meta-Learned Task-Adaptive Architecture for Few-Shot Classification
Authors:
Sivan Doveh,
Eli Schwartz,
Chao Xue,
Rogerio Feris,
Alex Bronstein,
Raja Giryes,
Leonid Karlinsky
Abstract:
Few-Shot Learning (FSL) is a topic of rapidly growing interest. Typically, in FSL a model is trained on a dataset consisting of many small tasks (meta-tasks) and learns to adapt to novel tasks that it will encounter during test time. This is also referred to as meta-learning. Another topic closely related to meta-learning with a lot of interest in the community is Neural Architecture Search (NAS),…
▽ More
Few-Shot Learning (FSL) is a topic of rapidly growing interest. Typically, in FSL a model is trained on a dataset consisting of many small tasks (meta-tasks) and learns to adapt to novel tasks that it will encounter during test time. This is also referred to as meta-learning. Another topic closely related to meta-learning with a lot of interest in the community is Neural Architecture Search (NAS), automatically finding optimal architecture instead of engineering it manually. In this work, we combine these two aspects of meta-learning. So far, meta-learning FSL methods have focused on optimizing parameters of pre-defined network architectures, in order to make them easily adaptable to novel tasks. Moreover, it was observed that, in general, larger architectures perform better than smaller ones up to a certain saturation point (where they start to degrade due to over-fitting). However, little attention has been given to explicitly optimizing the architectures for FSL, nor to an adaptation of the architecture at test time to particular novel tasks. In this work, we propose to employ tools inspired by the Differentiable Neural Architecture Search (D-NAS) literature in order to optimize the architecture for FSL without over-fitting. Additionally, to make the architecture task adaptive, we propose the concept of `MetAdapt Controller' modules. These modules are added to the model and are meta-trained to predict the optimal network connections for a given novel task. Using the proposed approach we observe state-of-the-art results on two popular few-shot benchmarks: miniImageNet and FC100.
△ Less
Submitted 9 March, 2020; v1 submitted 1 December, 2019;
originally announced December 2019.
-
Spectral Geometric Matrix Completion
Authors:
Amit Boyarski,
Sanketh Vedula,
Alex Bronstein
Abstract:
Deep Matrix Factorization (DMF) is an emerging approach to the problem of matrix completion. Recent works have established that gradient descent applied to a DMF model induces an implicit regularization on the rank of the recovered matrix. In this work we interpret the DMF model through the lens of spectral geometry. This allows us to incorporate explicit regularization without breaking the DMF st…
▽ More
Deep Matrix Factorization (DMF) is an emerging approach to the problem of matrix completion. Recent works have established that gradient descent applied to a DMF model induces an implicit regularization on the rank of the recovered matrix. In this work we interpret the DMF model through the lens of spectral geometry. This allows us to incorporate explicit regularization without breaking the DMF structure, thus enjoying the best of both worlds. In particular, we focus on matrix completion problems with underlying geometric or topological relations between the rows and/or columns. Such relations are prevalent in matrix completion problems that arise in many applications, such as recommender systems and drug-target interaction. Our contributions enable DMF models to exploit these relations, and make them competitive on real benchmarks, while exhibiting one of the first successful applications of deep linear networks.
△ Less
Submitted 7 June, 2021; v1 submitted 17 November, 2019;
originally announced November 2019.
-
Smoothed Inference for Adversarially-Trained Models
Authors:
Yaniv Nemcovsky,
Evgenii Zheltonozhskii,
Chaim Baskin,
Brian Chmiel,
Maxim Fishman,
Alex M. Bronstein,
Avi Mendelson
Abstract:
Deep neural networks are known to be vulnerable to adversarial attacks. Current methods of defense from such attacks are based on either implicit or explicit regularization, e.g., adversarial training. Randomized smoothing, the averaging of the classifier outputs over a random distribution centered in the sample, has been shown to guarantee the performance of a classifier subject to bounded pertur…
▽ More
Deep neural networks are known to be vulnerable to adversarial attacks. Current methods of defense from such attacks are based on either implicit or explicit regularization, e.g., adversarial training. Randomized smoothing, the averaging of the classifier outputs over a random distribution centered in the sample, has been shown to guarantee the performance of a classifier subject to bounded perturbations of the input. In this work, we study the application of randomized smoothing as a way to improve performance on unperturbed data as well as to increase robustness to adversarial attacks. The proposed technique can be applied on top of any existing adversarial defense, but works particularly well with the randomized approaches. We examine its performance on common white-box (PGD) and black-box (transfer and NAttack) attacks on CIFAR-10 and CIFAR-100, substantially outperforming previous art for most scenarios and comparable on others. For example, we achieve 60.4% accuracy under a PGD attack on CIFAR-10 using ResNet-20, outperforming previous art by 11.7%. Since our method is based on sampling, it lends itself well for trading-off between the model inference complexity and its performance. A reference implementation of the proposed techniques is provided at https://github.com/yanemcovsky/SIAM
△ Less
Submitted 16 March, 2020; v1 submitted 17 November, 2019;
originally announced November 2019.
-
Loss Aware Post-training Quantization
Authors:
Yury Nahshan,
Brian Chmiel,
Chaim Baskin,
Evgenii Zheltonozhskii,
Ron Banner,
Alex M. Bronstein,
Avi Mendelson
Abstract:
Neural network quantization enables the deployment of large models on resource-constrained devices. Current post-training quantization methods fall short in terms of accuracy for INT4 (or lower) but provide reasonable accuracy for INT8 (or above). In this work, we study the effect of quantization on the structure of the loss landscape. Additionally, we show that the structure is flat and separable…
▽ More
Neural network quantization enables the deployment of large models on resource-constrained devices. Current post-training quantization methods fall short in terms of accuracy for INT4 (or lower) but provide reasonable accuracy for INT8 (or above). In this work, we study the effect of quantization on the structure of the loss landscape. Additionally, we show that the structure is flat and separable for mild quantization, enabling straightforward post-training quantization methods to achieve good results. We show that with more aggressive quantization, the loss landscape becomes highly non-separable with steep curvature, making the selection of quantization parameters more challenging. Armed with this understanding, we design a method that quantizes the layer parameters jointly, enabling significant accuracy improvement over current post-training quantization methods. Reference implementation is available at https://github.com/ynahshan/nn-quantization-pytorch/tree/master/lapq
△ Less
Submitted 16 March, 2020; v1 submitted 17 November, 2019;
originally announced November 2019.
-
CAT: Compression-Aware Training for bandwidth reduction
Authors:
Chaim Baskin,
Brian Chmiel,
Evgenii Zheltonozhskii,
Ron Banner,
Alex M. Bronstein,
Avi Mendelson
Abstract:
Convolutional neural networks (CNNs) have become the dominant neural network architecture for solving visual processing tasks. One of the major obstacles hindering the ubiquitous use of CNNs for inference is their relatively high memory bandwidth requirements, which can be a main energy consumer and throughput bottleneck in hardware accelerators. Accordingly, an efficient feature map compression m…
▽ More
Convolutional neural networks (CNNs) have become the dominant neural network architecture for solving visual processing tasks. One of the major obstacles hindering the ubiquitous use of CNNs for inference is their relatively high memory bandwidth requirements, which can be a main energy consumer and throughput bottleneck in hardware accelerators. Accordingly, an efficient feature map compression method can result in substantial performance gains. Inspired by quantization-aware training approaches, we propose a compression-aware training (CAT) method that involves training the model in a way that allows better compression of feature maps during inference. Our method trains the model to achieve low-entropy feature maps, which enables efficient compression at inference time using classical transform coding methods. CAT significantly improves the state-of-the-art results reported for quantization. For example, on ResNet-34 we achieve 73.1% accuracy (0.2% degradation from the baseline) with an average representation of only 1.79 bits per value. Reference implementation accompanies the paper at https://github.com/CAT-teams/CAT
△ Less
Submitted 25 September, 2019;
originally announced September 2019.
-
Localization with Limited Annotation for Chest X-rays
Authors:
Eyal Rozenberg,
Daniel Freedman,
Alex Bronstein
Abstract:
Localization of an object within an image is a common task in medical imaging. Learning to localize or detect objects typically requires the collection of data which has been labelled with bounding boxes or similar annotations, which can be very time consuming and expensive. A technique which could perform such learning with much less annotation would, therefore, be quite valuable. We present such…
▽ More
Localization of an object within an image is a common task in medical imaging. Learning to localize or detect objects typically requires the collection of data which has been labelled with bounding boxes or similar annotations, which can be very time consuming and expensive. A technique which could perform such learning with much less annotation would, therefore, be quite valuable. We present such a technique for localization with limited annotation, in which the number of images with bounding boxes can be a small fraction of the total dataset (e.g. less than 1%); all other images only possess a whole image label and no bounding box. We propose a novel loss function for tackling this problem; the loss is a continuous relaxation of a well-defined discrete formulation of weakly supervised learning and is numerically well-posed. Furthermore, we propose a new architecture which accounts for both patch dependence and shift-invariance, through the inclusion of CRF layers and anti-aliasing filters, respectively. We apply our technique to the localization of thoracic diseases in chest X-ray images and demonstrate state-of-the-art localization performance on the ChestX-ray14 dataset.
△ Less
Submitted 10 October, 2019; v1 submitted 19 September, 2019;
originally announced September 2019.
-
Horizontal Flows and Manifold Stochastics in Geometric Deep Learning
Authors:
Stefan Sommer,
Alex Bronstein
Abstract:
We introduce two constructions in geometric deep learning for 1) transporting orientation-dependent convolutional filters over a manifold in a continuous way and thereby defining a convolution operator that naturally incorporates the rotational effect of holonomy; and 2) allowing efficient evaluation of manifold convolution layers by sampling manifold valued random variables that center around a w…
▽ More
We introduce two constructions in geometric deep learning for 1) transporting orientation-dependent convolutional filters over a manifold in a continuous way and thereby defining a convolution operator that naturally incorporates the rotational effect of holonomy; and 2) allowing efficient evaluation of manifold convolution layers by sampling manifold valued random variables that center around a weighted diffusion mean. Both methods are inspired by stochastics on manifolds and geometric statistics, and provide examples of how stochastic methods -- here horizontal frame bundle flows and non-linear bridge sampling schemes, can be used in geometric deep learning. We outline the theoretical foundation of the two methods, discuss their relation to Euclidean deep networks and existing methodology in geometric deep learning, and establish important properties of the proposed constructions.
△ Less
Submitted 31 May, 2021; v1 submitted 13 September, 2019;
originally announced September 2019.
-
PILOT: Physics-Informed Learned Optimized Trajectories for Accelerated MRI
Authors:
Tomer Weiss,
Ortal Senouf,
Sanketh Vedula,
Oleg Michailovich,
Michael Zibulevsky,
Alex Bronstein
Abstract:
Magnetic Resonance Imaging (MRI) has long been considered to be among "the gold standards" of diagnostic medical imaging. The long acquisition times, however, render MRI prone to motion artifacts, let alone their adverse contribution to the relative high costs of MRI examination. Over the last few decades, multiple studies have focused on the development of both physical and post-processing method…
▽ More
Magnetic Resonance Imaging (MRI) has long been considered to be among "the gold standards" of diagnostic medical imaging. The long acquisition times, however, render MRI prone to motion artifacts, let alone their adverse contribution to the relative high costs of MRI examination. Over the last few decades, multiple studies have focused on the development of both physical and post-processing methods for accelerated acquisition of MRI scans. These two approaches, however, have so far been addressed separately. On the other hand, recent works in optical computational imaging have demonstrated growing success of concurrent learning-based design of data acquisition and image reconstruction schemes. Such schemes have already demonstrated substantial effectiveness, leading to considerably shorter acquisition times and improved quality of image reconstruction. Inspired by this initial success, in this work, we propose a novel approach to the learning of optimal schemes for conjoint acquisition and reconstruction of MRI scans, with the optimization carried out simultaneously with respect to the time-efficiency of data acquisition and the quality of resulting reconstructions. To be of a practical value, the schemes are encoded in the form of general k-space trajectories, whose associated magnetic gradients are constrained to obey a set of predefined hardware requirements (as defined in terms of, e.g., peak currents and maximum slew rates of magnetic gradients). With this proviso in mind, we propose a novel algorithm for the end-to-end training of a combined acquisition-reconstruction pipeline using a deep neural network with differentiable forward- and back-propagation operators. We demonstrate its effectiveness on image reconstruction and image segmentation tasks, reporting substantial improvements in terms of acceleration factors as well as the quality of these tasks.
△ Less
Submitted 13 April, 2021; v1 submitted 12 September, 2019;
originally announced September 2019.
-
Correspondence-Free Region Localization for Partial Shape Similarity via Hamiltonian Spectrum Alignment
Authors:
Arianna Rampini,
Irene Tallini,
Maks Ovsjanikov,
Alex M. Bronstein,
Emanuele Rodolà
Abstract:
We consider the problem of localizing relevant subsets of non-rigid geometric shapes given only a partial 3D query as the input. Such problems arise in several challenging tasks in 3D vision and graphics, including partial shape similarity, retrieval, and non-rigid correspondence. We phrase the problem as one of alignment between short sequences of eigenvalues of basic differential operators, whic…
▽ More
We consider the problem of localizing relevant subsets of non-rigid geometric shapes given only a partial 3D query as the input. Such problems arise in several challenging tasks in 3D vision and graphics, including partial shape similarity, retrieval, and non-rigid correspondence. We phrase the problem as one of alignment between short sequences of eigenvalues of basic differential operators, which are constructed upon a scalar function defined on the 3D surfaces. Our method therefore seeks for a scalar function that entails this alignment. Differently from existing approaches, we do not require solving for a correspondence between the query and the target, therefore greatly simplifying the optimization process; our core technique is also descriptor-free, as it is driven by the geometry of the two objects as encoded in their operator spectra. We further show that our spectral alignment algorithm provides a remarkably simple alternative to the recent shape-from-spectrum reconstruction approaches. For both applications, we demonstrate improvement over the state-of-the-art either in terms of accuracy or computational cost.
△ Less
Submitted 14 June, 2019;
originally announced June 2019.
-
Baby steps towards few-shot learning with multiple semantics
Authors:
Eli Schwartz,
Leonid Karlinsky,
Rogerio Feris,
Raja Giryes,
Alex M. Bronstein
Abstract:
Learning from one or few visual examples is one of the key capabilities of humans since early infancy, but is still a significant challenge for modern AI systems. While considerable progress has been achieved in few-shot learning from a few image examples, much less attention has been given to the verbal descriptions that are usually provided to infants when they are presented with a new object. I…
▽ More
Learning from one or few visual examples is one of the key capabilities of humans since early infancy, but is still a significant challenge for modern AI systems. While considerable progress has been achieved in few-shot learning from a few image examples, much less attention has been given to the verbal descriptions that are usually provided to infants when they are presented with a new object. In this paper, we focus on the role of additional semantics that can significantly facilitate few-shot visual learning. Building upon recent advances in few-shot learning with additional semantic information, we demonstrate that further improvements are possible by combining multiple and richer semantics (category labels, attributes, and natural language descriptions). Using these ideas, we offer the community new results on the popular miniImageNet and CUB few-shot benchmarks, comparing favorably to the previous state-of-the-art results for both visual only and visual plus semantics-based approaches. We also performed an ablation study investigating the components and design choices of our approach.
△ Less
Submitted 22 January, 2020; v1 submitted 5 June, 2019;
originally announced June 2019.
-
The Shape of Data: Intrinsic Distance for Data Distributions
Authors:
Anton Tsitsulin,
Marina Munkhoeva,
Davide Mottin,
Panagiotis Karras,
Alex Bronstein,
Ivan Oseledets,
Emmanuel Müller
Abstract:
The ability to represent and compare machine learning models is crucial in order to quantify subtle model changes, evaluate generative models, and gather insights on neural network architectures. Existing techniques for comparing data distributions focus on global data properties such as mean and covariance; in that sense, they are extrinsic and uni-scale. We develop a first-of-its-kind intrinsic…
▽ More
The ability to represent and compare machine learning models is crucial in order to quantify subtle model changes, evaluate generative models, and gather insights on neural network architectures. Existing techniques for comparing data distributions focus on global data properties such as mean and covariance; in that sense, they are extrinsic and uni-scale. We develop a first-of-its-kind intrinsic and multi-scale method for characterizing and comparing data manifolds, using a lower-bound of the spectral variant of the Gromov-Wasserstein inter-manifold distance, which compares all data moments. In a thorough experimental study, we demonstrate that our method effectively discerns the structure of data manifolds even on unaligned data of different dimensionalities; moreover, we showcase its efficacy in evaluating the quality of generative models.
△ Less
Submitted 15 February, 2020; v1 submitted 27 May, 2019;
originally announced May 2019.