Learning Regularization for Graph Inverse Problems

Authors: Moshe Eliasof, Md Shahriar Rahim Siddiqui, Carola-Bibiane Schönlieb, Eldad Haber

Abstract: In recent years, Graph Neural Networks (GNNs) have been used in applications ranging from drug discovery to network design and social networks. In many applications, some properties of the graph cannot be observed directly; instead, only noisy and indirect measurements of these properties are available. We refer to such scenarios as Graph Inverse Problems (GRIP). In this work, we introduce a framework that leverages GNNs to solve GRIPs. The framework combines likelihood and prior terms, which are used to find a solution that fits the data while adhering to learned prior information. Specifically, we propose to combine recent deep learning techniques developed for inverse problems with GNN architectures to formulate and solve GRIPs. We study our approach on a number of representative problems that demonstrate the effectiveness of the framework.

Submitted 19 August, 2024; originally announced August 2024.
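The likelihood-plus-prior formulation can be illustrated on a tiny graph. A minimal sketch, assuming a linear observation operator that simply samples a subset of nodes and a hand-crafted graph-Laplacian smoothness prior in place of the learned GNN prior (so this is a classical baseline, not the proposed framework). The names `laplacian_apply` and `solve_grip` are hypothetical.

```python
# Minimal sketch (assumption): a hand-crafted graph-Laplacian smoothness prior
# stands in for the learned GNN prior described in the abstract.

def laplacian_apply(edges, x):
    """Return L @ x for an undirected, unweighted graph given as (i, j) edge pairs."""
    out = [0.0] * len(x)
    for i, j in edges:
        diff = x[i] - x[j]
        out[i] += diff
        out[j] -= diff
    return out


def solve_grip(observed, b, n, edges, alpha=1.0, step=0.1, iters=500):
    """Gradient descent on 0.5*||A x - b||^2 + 0.5*alpha * x^T L x.

    A is a selection operator: only nodes listed in `observed` are measured,
    and b[k] is the (noisy) measurement at node observed[k].
    """
    x = [0.0] * n
    for _ in range(iters):
        # gradient of the prior term: alpha * L x
        grad = [alpha * v for v in laplacian_apply(edges, x)]
        # gradient of the data-fit (likelihood) term: A^T (A x - b)
        for k, i in enumerate(observed):
            grad[i] += x[i] - b[k]
        x = [xi - step * gi for xi, gi in zip(x, grad)]
    return x


if __name__ == "__main__":
    path = [(0, 1), (1, 2), (2, 3), (3, 4)]
    print(solve_grip(observed=[0, 4], b=[0.0, 4.0], n=5, edges=path))
```

On a 5-node path with only the end nodes observed as 0 and 4, the minimizer is approximately `[0.67, 1.33, 2.0, 2.67, 3.33]`: the prior fills in the unobserved nodes and also pulls the observed ones slightly toward their neighbours.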

arXiv:2407.00595

Fully invertible hyperbolic neural networks for segmenting large-scale surface and sub-surface data

Authors: Bas Peters, Eldad Haber, Keegan Lensink

Abstract: The large spatial, temporal, and frequency scales of geoscience and remote-sensing datasets cause memory issues when using convolutional neural networks for (sub)surface data segmentation. Recently developed fully reversible, or fully invertible, networks can mostly avoid memory limitations by recomputing the states during the backward pass through the network. This results in a low and fixed memory requirement for storing network states, as opposed to the typical linear memory growth with network depth. This work focuses on a fully invertible network based on the telegraph equation. While reversibility eliminates most of the memory that deep networks spend on storing data (network states), the convolutional kernels can take up most of the memory if fully invertible networks contain multiple invertible pooling/coarsening layers. We address the explosion in the number of convolutional kernels by combining fully invertible networks with layers that store the convolutional kernels directly in a compressed form. A second challenge is that invertible networks output a tensor of the same size as their input. This property prevents the straightforward application of invertible networks to tasks that map between different input and output dimensions, that need outputs with more channels than the input data, or that require outputs at a lower or higher resolution than the input data. However, we show that by employing invertible networks in a non-standard fashion, we can still use them for these tasks.
Examples in hyperspectral land-use classification, airborne geophysical surveying, and seismic imaging illustrate that we can input large data volumes in one chunk and do not need to work on small patches, use dimensionality reduction, or employ methods that classify a patch to a single central pixel.

Submitted 30 June, 2024; originally announced July 2024.

Comments: 22 pages, 13 figures

MSC Class: 86A04
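The memory saving described above comes from being able to recompute earlier states from later ones. A minimal sketch, assuming a plain second-order (leapfrog) discretization without the damping term of a full telegraph equation, and a hypothetical elementwise `tanh` layer in place of a convolution: the forward pass keeps only the last two states, and `inverse` reconstructs the inputs (up to round-off) by running the recurrence backwards. This illustrates the general idea of hyperbolic invertible networks, not the authors' architecture.

```python
import math


def layer(y, w, b):
    """Hypothetical layer nonlinearity f(y) = tanh(w*y + b), elementwise (stands in for a convolution)."""
    return [math.tanh(w * v + b) for v in y]


def forward(y_prev, y_curr, params, h=0.5):
    """Leapfrog recurrence y_{j+1} = 2*y_j - y_{j-1} - h^2 * f_j(y_j).

    Only the two most recent states are kept, so memory does not grow with depth.
    """
    for w, b in params:
        f = layer(y_curr, w, b)
        y_next = [2 * c - p - h * h * fv for p, c, fv in zip(y_prev, y_curr, f)]
        y_prev, y_curr = y_curr, y_next
    return y_prev, y_curr


def inverse(y_prev, y_curr, params, h=0.5):
    """Run the recurrence backwards: y_{j-1} = 2*y_j - y_{j+1} - h^2 * f_j(y_j)."""
    for w, b in reversed(params):
        f = layer(y_prev, w, b)
        y_older = [2 * p - c - h * h * fv for p, c, fv in zip(y_prev, y_curr, f)]
        y_prev, y_curr = y_older, y_prev
    return y_prev, y_curr
```

During backpropagation, an implementation in this style would call the inverse step layer by layer to regenerate each state just before it is needed, instead of storing all of them.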

arXiv:2407.00257

Inverting airborne electromagnetic data with machine learning

Authors: Michael S. McMillan, Bas Peters, Ophir Greif, Paulina Wozniakowska, Eldad Haber

Abstract: This study focuses on inverting time-domain airborne electromagnetic data in 2D by training a neural network to learn the relationship between data and conductivity, thereby removing the need for expensive forward modeling during the inversion process. Instead, the forward modeling is performed in the training stage, where training models are built before 3D forward modeling is used to compute the training data. The method relies on the training data being similar to the chosen field dataset; therefore, the field data were first inverted in 1D to estimate the expected conductivity distribution. With this information, 10,000 training models were built with similar conductivity ranges, and the results show that this provided enough information for the network to produce realistic 2D inversion models over an aquifer-bearing region in California. Once training was completed, the actual inversion took only a matter of seconds on a generic laptop, which means that if future data were collected in this region, they could be inverted in near real time. Better results are expected from increasing the number of training models, and the eventual goal is to extend the method to 3D inversion.

Submitted 28 June, 2024; originally announced July 2024.

Comments: 4 pages, 5 figures, conference submission

MSC Class: 86A22
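The idea of moving all forward modeling into the training stage can be sketched with a toy problem. Assumptions: a hypothetical 1D forward model `simulate` (exponential decay sampled at four times, standing in for a 3D EM simulation), a parameter range chosen in advance (mirroring the role of the preliminary 1D inversion), and a nearest-neighbour lookup in place of the trained neural network. All forward solves happen in `build_training_set`; inverting new data only needs `learned_inverse`.

```python
import math

TIMES = [0.1, 0.2, 0.5, 1.0]  # hypothetical measurement times


def simulate(m):
    """Hypothetical cheap forward model: exponential decay with rate m."""
    return [math.exp(-m * t) for t in TIMES]


def build_training_set(m_min, m_max, n):
    """All forward modeling happens here, once, before any field data are inverted."""
    ms = [m_min + (m_max - m_min) * k / (n - 1) for k in range(n)]
    return [(simulate(m), m) for m in ms]


def learned_inverse(data, training_set):
    """Nearest-neighbour regressor standing in for a trained network: no forward solve at inversion time."""
    def dist2(d):
        return sum((a - b) ** 2 for a, b in zip(d, data))

    _, best_m = min(training_set, key=lambda pair: dist2(pair[0]))
    return best_m
```

As in the abstract, the approach only works if the training range covers the field data: a model outside `[m_min, m_max]` would be mapped to the nearest boundary value.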

arXiv:2406.19253

Advection Augmented Convolutional Neural Networks

Authors: Niloufar Zakariaei, Siddharth Rout, Eldad Haber, Moshe Eliasof

Abstract: Many problems in the physical sciences are characterized by the prediction of space-time sequences. Such problems range from weather prediction to the analysis of disease propagation and video prediction. Modern techniques for solving these problems typically combine a Convolutional Neural Network (CNN) architecture with a time prediction mechanism. However, such approaches often underperform in the long-range propagation of information and lack explainability. In this work, we introduce a physically inspired architecture for solving such problems. Namely, we propose to augment CNNs with advection by designing a novel semi-Lagrangian push operator. We show that, unlike standard convolutional kernels, the proposed operator allows for the non-local transformation of information. We then complement it with reaction and diffusion neural components to form a network that mimics the Reaction-Advection-Diffusion equation in high dimensions. We demonstrate the effectiveness of our network on a number of spatio-temporal datasets.

Submitted 27 June, 2024; originally announced June 2024.
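The non-local character of a semi-Lagrangian push can be seen in one dimension. A minimal sketch, assuming a periodic 1D grid, a given (not learned) velocity per grid point, and linear splitting between the two nearest target cells; the paper's operator acts on images with learned velocities and may differ in detail. Because each value jumps directly to `i + velocity[i] * dt`, information can travel many cells in one step, whereas a 3-point convolution moves it by at most one cell per layer.

```python
import math


def semi_lagrangian_push(u, velocity, dt=1.0):
    """Push each value u[i] to position i + velocity[i]*dt on a periodic 1D grid,
    splitting it between the two nearest grid points with linear weights."""
    n = len(u)
    out = [0.0] * n
    for i, (ui, vi) in enumerate(zip(u, velocity)):
        target = i + vi * dt
        k = math.floor(target)       # left neighbour of the landing point
        frac = target - k            # distance from that neighbour, in [0, 1)
        out[k % n] += (1.0 - frac) * ui
        out[(k + 1) % n] += frac * ui
    return out


if __name__ == "__main__":
    print(semi_lagrangian_push([1.0, 2.0, 3.0, 4.0, 5.0], [2.0] * 5))
```

With a constant integer velocity the push reduces to a cyclic shift (the example prints `[4.0, 5.0, 1.0, 2.0, 3.0]`), and for any velocity field the total sum of `u` is conserved because each value's two weights add up to one.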

arXiv:2406.14003

Deep Optimal Experimental Design for Parameter Estimation Problems

Authors: Md Shahriar Rahim Siddiqui, Arman Rahmim, Eldad Haber

Abstract: Optimal experimental design is a well-studied field in applied science and engineering. Techniques for estimating such a design are commonly used within the framework of parameter estimation. Nonetheless, in recent years, parameter estimation techniques have been changing rapidly as deep learning techniques replace traditional estimation methods. This, in turn, requires adapting optimal experimental design to these new techniques. In this paper, we investigate a new experimental design methodology that uses deep learning. We show that training a network as a Likelihood-Free Estimator can significantly simplify the design process and circumvent the computationally expensive bi-level optimization problem that is inherent in optimal experimental design for nonlinear systems. Furthermore, deep design improves the quality of the recovery process for parameter estimation problems. As a proof of concept, we apply our methodology to two different systems of ordinary differential equations.

Submitted 21 June, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.10871

Graph Neural Reaction Diffusion Models

Authors: Moshe Eliasof, Eldad Haber, Eran Treister

Abstract: The integration of Graph Neural Networks (GNNs) with Neural Ordinary and Partial Differential Equations has been extensively studied in recent years. GNN architectures powered by neural differential equations allow us to reason about their behavior and to develop GNNs with desired properties such as controlled smoothing or energy conservation. In this paper, we take inspiration from Turing instabilities in a Reaction Diffusion (RD) system of partial differential equations and propose a novel family of GNNs based on neural RD systems. We demonstrate that our RDGNN is powerful for modeling various data types, including homophilic, heterophilic, and spatio-temporal datasets. We discuss the theoretical properties of our RDGNN and its implementation, and show that it improves upon or is competitive with state-of-the-art methods.

Submitted 16 June, 2024; originally announced June 2024.

Comments: Accepted at SIAM Journal on Scientific Computing (Submitted for review on 06/2023)
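One explicit step of a neural reaction-diffusion system on a graph can be sketched as follows. Assumptions: two or more feature channels with separate diffusion coefficients (the ingredient behind Turing instabilities), forward-Euler time stepping, the unweighted graph Laplacian, and a user-supplied `reaction` function in place of a learned network. This is not the RDGNN implementation; `rd_step` is a hypothetical name.

```python
def laplacian_apply(edges, x):
    """Return L @ x for an undirected, unweighted graph given as (i, j) edge pairs."""
    out = [0.0] * len(x)
    for i, j in edges:
        diff = x[i] - x[j]
        out[i] += diff
        out[j] -= diff
    return out


def rd_step(x, edges, kappas, reaction, h=0.1):
    """One forward-Euler step of dX_c/dt = -kappa_c * L X_c + reaction(X)_c.

    x: list of per-node feature vectors; kappas: one diffusion coefficient per channel;
    reaction: maps one node's feature vector to a vector of the same length
    (acts pointwise over nodes but may mix channels).
    """
    n_ch = len(kappas)
    # diffuse each channel separately with its own coefficient
    lap = [laplacian_apply(edges, [node[c] for node in x]) for c in range(n_ch)]
    new_x = []
    for i, node in enumerate(x):
        r = reaction(node)
        new_x.append([node[c] - h * kappas[c] * lap[c][i] + h * r[c] for c in range(n_ch)])
    return new_x
```

With the reaction switched off, diffusion only redistributes each channel across nodes, so per-channel sums are conserved; the reaction term is where non-conservative, channel-mixing dynamics enter.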

arXiv:2405.21021

Beyond Conventional Parametric Modeling: Data-Driven Framework for Estimation and Prediction of Time Activity Curves in Dynamic PET Imaging

Authors: Niloufar Zakariaei, Arman Rahmim, Eldad Haber

Abstract: Dynamic Positron Emission Tomography (dPET) imaging and Time-Activity Curve (TAC) analyses are essential for understanding and quantifying the biodistribution of radiopharmaceuticals over time and space. Traditional compartmental modeling, while foundational, often struggles to fully capture the complexities of biological systems, including nonlinear dynamics and variability. This study introduces a data-driven, neural-network-based framework, inspired by Reaction Diffusion systems, designed to address these limitations. Our approach, which adaptively fits TACs from dPET, enables the direct calibration of diffusion coefficients and reaction terms from observed data, offering significant improvements in predictive accuracy and robustness over traditional methods, especially in complex biological scenarios. By more accurately modeling the spatio-temporal dynamics of radiopharmaceuticals, our method advances the modeling of pharmacokinetic and pharmacodynamic processes, enabling new possibilities in quantitative nuclear medicine.

Submitted 31 May, 2024; originally announced May 2024.

arXiv:2405.13220

Paired Autoencoders for Inverse Problems

Authors: Matthias Chung, Emma Hart, Julianne Chung, Bas Peters, Eldad Haber

Abstract: We consider the solution of nonlinear inverse problems where the forward problem is a discretization of a partial differential equation. Such problems are notoriously difficult to solve in practice and require minimizing a combination of a data-fit term and a regularization term. The main computational bottleneck of typical algorithms is the direct estimation of the data misfit. Therefore, likelihood-free approaches have become appealing alternatives. Nonetheless, difficulties in generalization and limitations in accuracy have hindered their broader utility and applicability. In this work, we use a paired autoencoder framework as a likelihood-free estimator for inverse problems. We show that such an architecture allows us to construct a solution efficiently and to overcome some known open problems of likelihood-free estimators. In particular, our framework can assess the quality of the solution and improve on it if needed. We demonstrate the viability of our approach using examples from full waveform inversion and inverse electromagnetic imaging.

Submitted 21 May, 2024; originally announced May 2024.

Comments: 18 pages, 6 figures

arXiv:2404.04874

Graph Neural Networks for Binary Programming

Authors: Moshe Eliasof, Eldad Haber

Abstract: This paper investigates a link between Graph Neural Networks (GNNs) and Binary Programming (BP) problems, laying the groundwork for GNNs to approximate solutions to these computationally challenging problems. By analyzing the sensitivity of BP problems, we are able to frame the solution of BP problems as a heterophilic node classification task. We then propose Binary-Programming GNN (BPGNN), an architecture that integrates graph representation learning techniques with BP-aware features to approximate BP solutions efficiently. Additionally, we introduce a self-supervised data generation mechanism to enable efficient and tractable training data acquisition, even for large-scale BP problems. Experimental evaluations of BPGNN across diverse BP problem sizes show its superior performance compared to exhaustive search and heuristic approaches. Finally, we discuss open challenges in the under-explored field of solving BP problems with GNNs.

Submitted 7 April, 2024; originally announced April 2024.
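The exhaustive-search baseline mentioned above is easy to state for a small instance. A minimal sketch, assuming (as an illustration only) a binary quadratic objective $x^\top Q x + c^\top x$ over $x \in \{0,1\}^n$; the paper's BP formulation may be more general. It enumerates all $2^n$ assignments, which is exact but quickly becomes intractable, which is what motivates learned approximations such as BPGNN (not shown here).

```python
from itertools import product


def bqp_objective(x, Q, c):
    """Hypothetical binary quadratic objective x^T Q x + c^T x."""
    n = len(x)
    quad = sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
    return quad + sum(ci * xi for ci, xi in zip(c, x))


def exhaustive_search(Q, c):
    """Try all 2^n binary vectors and return (best_x, best_value); exact but O(2^n)."""
    candidates = ((x, bqp_objective(x, Q, c)) for x in product((0, 1), repeat=len(c)))
    return min(candidates, key=lambda pair: pair[1])


if __name__ == "__main__":
    Q = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
    c = [-1, -1, -1]
    print(exhaustive_search(Q, c))  # ((1, 0, 1), -2)
```

In this instance the linear terms reward setting bits while the couplings penalize adjacent pairs, so the optimum skips the middle variable.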

arXiv:2402.04653

An Over Complete Deep Learning Method for Inverse Problems

Authors: Moshe Eliasof, Eldad Haber, Eran Treister

Abstract: Obtaining meaningful solutions to inverse problems has been a major challenge with many applications in science and engineering. Recent machine learning techniques based on proximal and diffusion-based methods have shown promising results. However, as we show in this work, they can also face challenges when applied to some exemplary problems. We show that, similarly to previous work on over-complete dictionaries, it is possible to overcome these shortcomings by embedding the solution into higher dimensions. The novelty of the proposed work is that we jointly design and learn the embedding and the regularizer for the embedding vector. We demonstrate the merit of this approach on several exemplary and common inverse problems.

Submitted 7 February, 2024; originally announced February 2024.

arXiv:2401.11074

On The Temporal Domain of Differential Equation Inspired Graph Neural Networks

Authors: Moshe Eliasof, Eldad Haber, Eran Treister, Carola-Bibiane Schönlieb

Abstract: Graph Neural Networks (GNNs) have demonstrated remarkable success in modeling complex relationships in graph-structured data. A recent innovation in this field is the family of Differential Equation-Inspired Graph Neural Networks (DE-GNNs), which leverage principles from continuous dynamical systems to model information flow on graphs with built-in properties such as feature smoothing or preservation. However, existing DE-GNNs rely on first- or second-order temporal dependencies. In this paper, we propose a neural extension of those pre-defined temporal dependencies. We show that our model, called TDE-GNN, can capture a wide range of temporal dynamics that go beyond typical first- or second-order methods, and we provide use cases where existing temporal models are challenged. We demonstrate the benefit of learning the temporal dependencies with our method, rather than using pre-defined temporal dynamics, on several graph benchmarks.

Submitted 19 January, 2024; originally announced January 2024.

Comments: AISTATS 2024
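A fixed first- or second-order temporal dependency can be written as a linear combination of past states plus a forcing term. A minimal sketch, assuming the hypothetical form $x_{k+1} = \sum_j a_j x_{k-j} + h\,F(x_k)$: `coeffs=[1.0]` gives forward Euler, `coeffs=[2.0, -1.0]` gives a two-step dependency, and a learned model would fit the coefficients instead of fixing them. TDE-GNN's actual parameterization may differ; `multistep_rollout` is a hypothetical name.

```python
def multistep_rollout(history, coeffs, force, h, steps):
    """Roll out x_{k+1} = sum_j coeffs[j] * x_{k-j} + h * force(x_k).

    history: past node-feature vectors, oldest first; needs len(coeffs) entries.
    coeffs[0] multiplies the most recent state, coeffs[1] the one before, etc.
    """
    if len(history) < len(coeffs):
        raise ValueError("need at least len(coeffs) past states")
    states = [list(s) for s in history]
    for _ in range(steps):
        past = states[-len(coeffs):][::-1]  # most recent first
        fx = force(states[-1])
        nxt = [
            sum(a * p[i] for a, p in zip(coeffs, past)) + h * fx[i]
            for i in range(len(states[-1]))
        ]
        states.append(nxt)
    return states
```

Both example coefficient sets sum to one, so a constant state with zero forcing stays constant; a learned scheme would typically need a similar consistency constraint.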

arXiv:2401.03827

Supersymmetry, Part I (Theory)

Authors: Ben Allanach, Howard E. Haber

Abstract: This is a review of the theoretical aspects of the supersymmetric extension of the Standard Model of particle physics, extracted from Chapter 88 of the 2023 update of the Review of Particle Physics, which appears in R.L. Workman et al. (Particle Data Group), Prog. Theor. Exp. Phys. 2022, 083C01 (2022) and 2023 update. The companion review, "Supersymmetry, Part II (Experiment)", can be found in Chapter 89 of the Review of Particle Physics (op. cit.).

Submitted 3 February, 2024; v1 submitted 8 January, 2024; originally announced January 2024.

Comments: 46 pages, 1 figure, 1 table; v2: a few new references have been added

arXiv:2312.12969

Explicit form for the most general Lorentz transformation revisited

Authors: Howard E. Haber

Abstract: Explicit formulae for the $4\times 4$ Lorentz transformation matrices corresponding to a pure boost and a pure three-dimensional rotation are very well known. Significantly less well known is the explicit formula for a general Lorentz transformation with arbitrary nonzero boost and rotation parameters. We revisit this more general formula by presenting two different derivations. The first derivation (which is somewhat simpler than previous ones appearing in the literature) evaluates the exponential of a $4\times 4$ real matrix $A$, where $A$ is the product of the diagonal matrix ${\rm diag}(+1, -1, -1, -1)$ and an arbitrary $4\times 4$ real antisymmetric matrix. The formula for $\exp A$ depends only on the eigenvalues of $A$ and makes use of the Lagrange interpolating polynomial. The second derivation exploits the observation that the spinor product $\eta^\dagger\overline{\sigma}^{\mu}\chi$ transforms as a Lorentz four-vector, where $\chi$ and $\eta$ are two-component spinors. The advantage of the latter derivation is that the corresponding formula for a general Lorentz transformation $\Lambda$ reduces to the computation of the trace of a product of $2\times 2$ matrices. Both computations are shown to yield equivalent expressions for $\Lambda$.

Submitted 8 August, 2024; v1 submitted 20 December, 2023; originally announced December 2023.

Comments: 26 pages; v2: typographical errors fixed and a minor improvement of notation is implemented. In addition, the explicit form for a Lorentz transformation in 2+1 spacetime dimensions is provided; v3: further typographical errors fixed and a number of tweaks have been made to improve the presentation
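The first derivation's starting point, $\Lambda = \exp A$ with $A = G K$, $G = {\rm diag}(+1,-1,-1,-1)$ and $K$ real antisymmetric, can be checked numerically. A minimal sketch that evaluates $\exp A$ by a truncated power series (not the paper's Lagrange-interpolation formula) so that the defining property $\Lambda^{\mathsf T} G \Lambda = G$ can be verified. For a pure boost along $x$ with rapidity $\zeta$, set by $K_{01} = -K_{10} = \zeta$, the result has $\Lambda_{00} = \cosh\zeta$ and $\Lambda_{01} = \sinh\zeta$. The helper names are hypothetical.

```python
import math

G = [[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, -1, 0], [0, 0, 0, -1]]


def matmul(X, Y):
    """Product of two 4x4 matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def expm_series(A, terms=40):
    """exp(A) by a truncated Taylor series; adequate for moderate-norm A."""
    result = [[float(i == j) for j in range(4)] for i in range(4)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = matmul(term, A)                      # A^k / (k-1)! ...
        term = [[v / k for v in row] for row in term]  # ... divided by k gives A^k / k!
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    return result


def lorentz_from_antisymmetric(K):
    """Lambda = exp(G K) for a real antisymmetric 4x4 matrix K."""
    return expm_series(matmul(G, K))


if __name__ == "__main__":
    K = [[0.0] * 4 for _ in range(4)]
    K[0][1], K[1][0] = 0.7, -0.7  # boost along x with rapidity 0.7
    L = lorentz_from_antisymmetric(K)
    print(L[0][0], math.cosh(0.7))
```

The Lorentz property follows because $A^{\mathsf T} G + G A = -K G G + G G K = -K + K = 0$ for antisymmetric $K$, so $\exp A$ preserves the metric $G$.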

arXiv:2312.08599

Multi-IRS Aided Mobile Edge Computing for High Reliability and Low Latency Services

Authors: Elie El Haber, Mohamed Elhattab, Chadi Assi, Sanaa Sharafeddine, Kim Khoa Nguyen

Abstract: Although multi-access edge computing (MEC) has enabled computation offloading at the network edge, weak wireless signals in the radio access network, caused by obstacles and high network load, still prevent efficient edge computation offloading, especially for user requests with stringent latency and reliability requirements. Intelligent reflective surfaces (IRS) have recently emerged as a technology capable of enhancing signal quality in the radio access network, where passive reflecting elements can be tuned to improve the uplink or downlink signals. To harness the IRS's potential for enhancing the performance of edge computation offloading, in this paper we study the optimized use of a multi-IRS system together with the design of the offloading (to an edge with multiple MECs) and resource allocation parameters, with the goal of minimizing the devices' energy consumption for 5G services with stringent latency and reliability requirements. After formulating the problem, which is non-convex, we propose a suboptimal solution based on alternating optimization, in which we divide the problem into sub-problems that are solved separately. Specifically, the offloading decision is solved through a matching game algorithm, and the IRS phase shifts and resource allocation are then optimized in an alternating fashion using the Difference of Convex approach. The results demonstrate gains in both energy and network resources and highlight the IRS's influence on the design of the MEC parameters.

Submitted 13 December, 2023; originally announced December 2023.

arXiv:2311.04976

doi 10.1016/j.physletb.2024.138501

Classes of complete dark photon models constrained by Z-Physics

Authors: Miguel P. Bento, Howard E. Haber, João P. Silva

Abstract: Dark matter models that employ a vector portal to a dark sector are usually treated as an effective theory that incorporates kinetic mixing of the photon with a new U(1) gauge boson, with the $Z$ boson integrated out. However, a more complete theory must employ the full SU(2)$_L\times$U(1)$_Y\times$U(1)$_{Y^\prime}$ gauge group, in which kinetic mixing of the $Z$ boson with the new U(1) gauge boson is taken into account. The importance of the more complete analysis is demonstrated by an example where the parameter space of the effective theory that yields the observed dark matter relic density is in conflict with a suitably defined electroweak $\rho$-parameter deduced from a global fit to $Z$ physics data.

Submitted 7 February, 2024; v1 submitted 8 November, 2023; originally announced November 2023.

Comments: 7 pages, 1 figure; v2: clarifying remarks added at the end of Section IV and in Section V, along with additional references in Sections I and V; v3: switched to Elsevier style; v4: minor modifications to the text and references; matches published version

Journal ref: Phys. Lett. B 850 (2024) 138501

arXiv:2307.16092

Feature Transportation Improves Graph Neural Networks

Authors: Moshe Eliasof, Eldad Haber, Eran Treister

Abstract: Graph neural networks (GNNs) have shown remarkable success in learning representations for graph-structured data. However, GNNs still face challenges in modeling complex phenomena that involve feature transportation. In this paper, we propose a novel GNN architecture inspired by Advection-Diffusion-Reaction systems, called ADR-GNN. Advection models feature transportation, diffusion captures the local smoothing of features, and reaction represents the non-linear transformation between feature channels. We provide an analysis of the qualitative behavior of ADR-GNN that shows the benefit of combining advection, diffusion, and reaction. To demonstrate its efficacy, we evaluate ADR-GNN on real-world node classification and spatio-temporal datasets, and show that it improves upon or is competitive with state-of-the-art networks.

Submitted 20 December, 2023; v1 submitted 29 July, 2023; originally announced July 2023.

Comments: AAAI 2024
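Feature transportation can be sketched as a mass-conserving advection step on a graph. A minimal sketch, assuming non-negative velocities on directed edges (given here; in a GNN they would be predicted) and an explicit upwind-style update; this is not ADR-GNN's operator, and `advection_step` is a hypothetical name. Each edge moves a fraction `h * v` of the source node's current value to the target node, so the total over all nodes is conserved.

```python
def advection_step(x, flows, h=0.1):
    """One explicit advection step on a graph.

    flows: dict mapping a directed edge (i, j) to a non-negative velocity v_ij.
    Each edge moves h * v_ij * x[i] from node i to node j (using values from the
    start of the step), so sum(x) is unchanged.
    """
    out = list(x)
    for (i, j), v in flows.items():
        moved = h * v * x[i]
        out[i] -= moved
        out[j] += moved
    return out


if __name__ == "__main__":
    flows = {(0, 1): 1.0, (1, 2): 1.0}  # a directed path 0 -> 1 -> 2
    x = [1.0, 0.0, 0.0]
    for _ in range(2):
        x = advection_step(x, flows, h=0.5)
    print(x)  # [0.25, 0.5, 0.25]
```

Unlike diffusion, which only smooths, this step transports values along edge directions; keeping `h` times each node's total outgoing velocity at most 1 keeps non-negative inputs non-negative.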

arXiv:2306.01836

doi 10.1007/JHEP10(2023)083

Tree-level Unitarity in SU(2)$_L\times$U(1)$_Y \times$U(1)$_{Y'}$ Models

Authors: Miguel P. Bento, Howard E. Haber, João P. Silva

Abstract: In models with a U(1) gauge extension beyond the Standard Model, one can derive sum rules for the couplings of the theory that are a consequence of tree-level unitarity. In this paper, we provide a comprehensive list of coupling sum rules for a general SU(2)$_L\times$U(1)$_Y \times$U(1)$_{Y'}$ gauge theory coupled to an arbitrary set of fermion and scalar multiplets. These results are of particular interest for models of dark matter that employ an extended gauge sector mediated by a new (dark) $Z^\prime$ gauge boson. For the case of a minimal extension of the Standard Model with a U(1)$_{Y'}$ gauge boson, we clarify the definitions of the weak mixing angle and the electroweak $\rho$ parameter. We demonstrate the utility of a generalized $\rho$ parameter (denoted by $\rho^\prime$) whose definition naturally follows from the unitarity sum rules developed in this paper.

Submitted 13 December, 2023; v1 submitted 2 June, 2023; originally announced June 2023.

Comments: 29 pages, 2 figures, 2 tables. Version 3 coincides with the published version. Version 4 adds some clarifying remarks to footnotes 3 and 4 and reorganizes the material of Section 6.3

Journal ref: JHEP 10 (2023) 083
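For orientation, the conventional tree-level definition used in the Standard Model is shown below, where $\theta_W$ is the weak mixing angle. The abstract's point is that once a U(1)$_{Y'}$ boson mixes with the $Z$, both $\theta_W$ and $\rho$ need more careful definitions; the paper's generalized $\rho^\prime$ is not reproduced here.

```latex
\rho \equiv \frac{m_W^2}{m_Z^2 \cos^2\theta_W},
\qquad
\rho = 1 \quad \text{at tree level in the Standard Model (Higgs doublets only).}
```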

arXiv:2304.00015

DRIP: Deep Regularizers for Inverse Problems

Authors: Moshe Eliasof, Eldad Haber, Eran Treister

Abstract: In this paper, we consider inverse problems that are mathematically ill-posed. That is, given some (noisy) data, there is more than one solution that approximately fits the data. In recent years, deep neural techniques have been developed that find the most appropriate solution, in the sense that it incorporates a priori information. However, they suffer from several shortcomings. First, most techniques cannot guarantee that the solution fits the data at inference. Second, while the derivation of these techniques is inspired by the existence of a valid scalar regularization function, such techniques do not rely on such a function in practice, and therefore veer away from classical variational techniques. In this work, we introduce a new family of neural regularizers for the solution of inverse problems. These regularizers are based on a variational formulation and are guaranteed to fit the data. We demonstrate their use on a number of highly ill-posed problems, from image deblurring to limited-angle tomography.

Submitted 25 August, 2023; v1 submitted 30 March, 2023; originally announced April 2023.
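To make "more than one solution fits the data" concrete, consider a hypothetical 1×2 system $x_1 + x_2 = 2$: both `[2, 0]` and `[0, 2]` fit exactly, and adding a scalar regularizer selects one solution. A minimal sketch in which the quadratic Tikhonov term $\tfrac{\alpha}{2}\|x\|^2$ is a classical stand-in for the learned neural regularizer (it is not the DRIP regularizer); gradient descent then converges to $x_1 = x_2 = 2/(2+\alpha)$, close to `[1, 1]` for small $\alpha$.

```python
def residual(A, x, b):
    """Return A @ x - b for a dense matrix given as a list of rows."""
    return [sum(aij * xj for aij, xj in zip(row, x)) - bi for row, bi in zip(A, b)]


def regularized_solve(A, b, alpha, step=0.1, iters=2000):
    """Gradient descent on 0.5*||A x - b||^2 + 0.5*alpha*||x||^2.

    The Tikhonov term is a classical stand-in for a learned scalar regularizer R(x).
    """
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(iters):
        r = residual(A, x, b)
        # gradient = A^T (A x - b) + alpha * x
        grad = [sum(A[i][j] * r[i] for i in range(m)) + alpha * x[j] for j in range(n)]
        x = [xj - step * gj for xj, gj in zip(x, grad)]
    return x


if __name__ == "__main__":
    A, b = [[1.0, 1.0]], [2.0]
    print(residual(A, [2.0, 0.0], b), residual(A, [0.0, 2.0], b))  # both fit exactly
    print(regularized_solve(A, b, alpha=1e-3))
```

As $\alpha \to 0$ the selected solution approaches the minimum-norm one; a learned regularizer would instead encode prior knowledge about which data-consistent solution is plausible.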

arXiv:2303.11404

Semi-Automated Segmentation of Geoscientific Data Using Superpixels

Authors: Conrad P. Koziol, Eldad Haber

Abstract: Geological processes determine the distribution of resources such as critical minerals, water, and geothermal energy. However, direct observation of geology is often prevented by surface cover such as overburden or vegetation. In such cases, remote and in-situ surveys are frequently conducted to collect physical measurements of the earth that are indicative of the geology. Developing a geological segmentation based on these measurements is challenging because individual datasets can differ in their properties (e.g., units, dynamic ranges, textures) and because the data do not uniquely constrain the geology. Further, as the number of datasets grows, the information available to constrain the geology increases while simultaneously becoming harder to make sense of. Inspired by the concept of superpixels, we propose a deep-learning-based approach to segment rasterized survey data into regions with similar characteristics. We demonstrate its use for semi-automated geoscientific mapping with datasets arising from independent sensors and with diverse properties. In addition, we introduce a new loss function for superpixels, including a novel regularization parameter that penalizes image segmentations with superpixels that are not single connected components. This improves the integration of prior knowledge by allowing better control over the number of superpixels generated.

Submitted 20 March, 2023; originally announced March 2023.

Comments: 11 pages, 7 figures
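The abstract mentions penalizing segmentations whose superpixels are not connected. The paper's term is part of a training loss; the sketch below is only a hypothetical, non-differentiable diagnostic for the same property. It counts, for an integer label map, which superpixel labels split into more than one 4-connected region. The function names and the choice of 4-connectivity are assumptions.

```python
from collections import deque

import numpy as np

def count_components(mask):
    """Number of 4-connected components in a boolean 2D array (BFS flood fill)."""
    h, w = mask.shape
    seen = np.zeros_like(mask, dtype=bool)
    comps = 0
    for i in range(h):
        for j in range(w):
            if mask[i, j] and not seen[i, j]:
                comps += 1
                seen[i, j] = True
                queue = deque([(i, j)])
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            queue.append((ny, nx))
    return comps

def disconnected_superpixels(labels):
    """Labels whose pixels form more than one 4-connected region."""
    return [int(k) for k in np.unique(labels) if count_components(labels == k) > 1]
```

A penalty could, for example, be proportional to the length of the returned list; how the paper turns this into a trainable term is not specified in the abstract.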

arXiv:2302.13697 [pdf, other]

Accommodating Hints of New Heavy Scalars in the Framework of the Flavor-Aligned Two-Higgs-Doublet Model

Authors: Joseph M. Connell, Pedro Ferreira, Howard E. Haber

Abstract: Searches for new neutral Higgs bosons of an extended Higgs sector at the LHC can be interpreted in the framework of the two-Higgs-doublet model. By employing generic flavor-aligned Higgs-fermion Yukawa couplings, we propose an analysis that uses experimental data to determine whether flavor alignment is a consequence of a symmetry that is either exact or at most softly broken. We illustrate our proposal in two different scenarios based on a few $3\sigma$ (local) excesses observed by the ATLAS and CMS Collaborations in their searches for heavy scalars. In Scenario 1, an excess of events is interpreted as $A\to ZH\to \ell^+\ell^- b\bar{b}$ (where $\ell=e$ or $\mu$), with the CP-odd and CP-even neutral scalar masses given by $m_A=610$ GeV and $m_H=290$ GeV, respectively. In Scenario 2, an excess of events in the production of $t\bar{t}$ and $\tau^+\tau^-$ final states is interpreted as decays of a CP-odd scalar of mass $m_A=400$ GeV. Scenario 1 is consistent with Type-I Yukawa interactions, which can arise in a 2HDM subject to a softly-broken $\mathbb{Z}_2$ discrete symmetry. Scenario 2 is inconsistent with a symmetry-based flavor alignment, but can be consistent with more general flavor-aligned Higgs-fermion Yukawa couplings.

Submitted 23 September, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

Comments: 49 pages, 19 figures, 15 tables; v2: minor tweaks and some added references; v3: better implementation of the CMS 95% ditau exclusion region and projections of Scenario 2 for Run 3 of the LHC added; v4: minor tweaks and some updated references; version accepted for publication in Physical Review D; v5: two additional references added

Report number: SCIPP-22/02

arXiv:2211.16631 [pdf, other]

Every Node Counts: Improving the Training of Graph Neural Networks on Node Classification

Authors: Moshe Eliasof, Eldad Haber, Eran Treister

Abstract: Graph Neural Networks (GNNs) are prominent in handling sparse and unstructured data efficiently and effectively. Specifically, GNNs were shown to be highly effective for node classification tasks, where labelled information is available for only a fraction of the nodes. Typically, the optimization process, through the objective function, considers only labelled nodes while ignoring the rest. In this paper, we propose novel objective terms for the training of GNNs for node classification, aiming to exploit all the available data and improve accuracy. Our first term seeks to maximize the mutual information between node and label features, considering both labelled and unlabelled nodes in the optimization process. Our second term promotes anisotropic smoothness in the prediction maps. Lastly, we propose a cross-validating gradients approach to enhance the learning from labelled data. Our proposed objectives are general, can be applied to various GNNs, and require no architectural modifications. Extensive experiments demonstrate our approach using popular GNNs like GCN, GAT and GCNII, yielding a consistent and significant accuracy improvement on 10 real-world node classification datasets.

Submitted 29 November, 2022; originally announced November 2022.

arXiv:2211.14302 [pdf, other]

Neural DAEs: Constrained neural networks

Authors: Tue Boesen, Eldad Haber, Uri Michael Ascher

Abstract: This article investigates the effect of explicitly adding auxiliary algebraic trajectory information to neural networks for dynamical systems. We draw inspiration from the field of differential-algebraic equations and differential equations on manifolds and implement related methods in residual neural networks, despite some fundamental scenario differences. Constraint or auxiliary information effects are incorporated through stabilization as well as projection methods, and we show when to use which method based on experiments involving simulations of multi-body pendulums and molecular dynamics scenarios. Several of our methods are easy to implement in existing code and have limited impact on training performance while giving significant boosts in terms of inference.

Submitted 12 March, 2024; v1 submitted 25 November, 2022; originally announced November 2022.

Comments: Extended the paper to PDEs, added a third experiment denoising a vector field and updated the introduction to make the distinction between this work and physics informed neural networks more clear

MSC Class: 70H99; 34A09
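The abstract mentions enforcing auxiliary algebraic information through projection methods. A minimal sketch of the projection idea, assuming a single planar pendulum with rod length `length` and a plain explicit Euler step under gravity standing in for the network's learned update (both are assumptions, not the paper's setup): after each unconstrained step, the position is projected back onto the circle |q| = length and the velocity onto the tangent space of that circle.

```python
import numpy as np

def euler_step(q, v, dt, g=9.81):
    """Unconstrained explicit Euler step under gravity (stand-in for a learned update)."""
    a = np.array([0.0, -g])
    return q + dt * v, v + dt * a

def project(q, v, length):
    """Project position onto |q| = length and velocity onto the tangent space."""
    q_hat = q / np.linalg.norm(q)
    q_proj = length * q_hat
    v_proj = v - np.dot(v, q_hat) * q_hat  # remove the radial velocity component
    return q_proj, v_proj

def simulate(q0, v0, dt, steps, length, use_projection=True):
    """Roll out the dynamics, optionally projecting onto the constraint after each step."""
    q, v = np.asarray(q0, dtype=float), np.asarray(v0, dtype=float)
    for _ in range(steps):
        q, v = euler_step(q, v, dt)
        if use_projection:
            q, v = project(q, v, length)
    return q, v
```

Without the projection the "pendulum" simply falls freely; with it, the rod-length constraint holds to machine precision at every step, which is the kind of inference-time guarantee a projection method buys.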

arXiv:2211.09252 [pdf, other]

doi 10.21468/SciPostPhys.15.5.188

A quantum register using collective excitations in a Bose-Einstein condensate

Authors: Elisha Haber, Zekai Chen, Nicholas P. Bigelow

Abstract: A qubit made up of an ensemble of atoms is attractive due to its resistance to atom losses, and many proposals to realize such a qubit are based on the Rydberg blockade effect. In this work, we instead consider an experimentally feasible protocol to coherently load a spin-dependent optical lattice from a spatially overlapping Bose-Einstein condensate. Identifying each lattice site as a qubit, with an empty or filled site as the qubit basis, we discuss how high-fidelity single-qubit operations, two-qubit gates between arbitrary pairs of qubits, and nondestructive measurements could be performed. In this setup, the effect of atom losses has been mitigated, the atoms never need to be removed from the ground state manifold, and separate storage and computational bases for the qubits are not required, all of which can be significant sources of decoherence in many other types of atomic qubits.

Submitted 6 July, 2023; v1 submitted 16 November, 2022; originally announced November 2022.

Comments: 24+8 pages, 9 figures

Journal ref: SciPost Phys. 15, 188 (2023)

arXiv:2210.00449 [pdf, ps, other]

Higgs Boson Physics -- The View Ahead

Authors: Howard E. Haber

Abstract: Eleven years ago, the Higgs boson was discovered at the LHC. I briefly survey the status of Higgs boson physics today and explore some of the implications for future Higgs studies. Although current experimental measurements are consistent with interpreting the observed Higgs boson as the Higgs boson predicted by the Standard Model of particle physics, it is still possible that the Higgs boson is a member of an extended scalar sector that lies beyond the Standard Model. Nevertheless, an extended Higgs sector is already highly constrained. The Higgs sector can also serve as a portal to new physics beyond the Standard Model. Finally, two Higgs wishlists are assembled, listing topics that merit future study and clarification at the LHC and at future collider facilities now under development.

Submitted 11 October, 2023; v1 submitted 2 October, 2022; originally announced October 2022.

Comments: Original version (v1): 7 pages, 3 figures, contributed to the September 2022 issue of the CERN EP Newsletter of the EP department. Expanded version (v2): 9 pages, 10 figures, contributed to a LHEP Special Issue: Higgs physics, and beyond, after 10 years since the discovery. Corrected expanded version (v3): typographical errors corrected, a few minor modifications and additional references added

arXiv:2208.09433 [pdf, other]

Estimating a potential without the agony of the partition function

Authors: Eldad Haber, Moshe Eliasof, Luis Tenorio

Abstract: Estimating a Gibbs density function given a sample is an important problem in computational statistics and statistical learning. Although the well-established maximum likelihood method is commonly used, it requires the computation of the partition function (i.e., the normalization of the density). This function can be easily calculated for simple low-dimensional problems, but its computation is difficult or even intractable for general densities and high-dimensional problems. In this paper we propose an alternative approach based on Maximum A-Posteriori (MAP) estimators, which we name Maximum Recovery MAP (MR-MAP), to derive estimators that do not require the computation of the partition function, and reformulate the problem as an optimization problem. We further propose a least-action type potential that allows us to quickly solve the optimization problem as a feed-forward hyperbolic neural network. We demonstrate the effectiveness of our methods on some standard data sets.

Submitted 11 March, 2023; v1 submitted 19 August, 2022; originally announced August 2022.

arXiv:2207.07408 [pdf, other]

pathGCN: Learning General Graph Spatial Operators from Paths

Authors: Moshe Eliasof, Eldad Haber, Eran Treister

Abstract: Graph Convolutional Networks (GCNs), like Convolutional Neural Networks (CNNs), are typically based on two main operations: spatial and point-wise convolutions. In the context of GCNs, unlike CNNs, a pre-determined spatial operator based on the graph Laplacian is often chosen, allowing only the point-wise operations to be learnt. However, learning a meaningful spatial operator is critical for developing more expressive GCNs for improved performance. In this paper we propose pathGCN, a novel approach to learn the spatial operator from random paths on the graph. We analyze the convergence of our method and its difference from existing GCNs. Furthermore, we discuss several options for combining our learnt spatial operator with point-wise convolutions. Our extensive experiments on numerous datasets suggest that by properly learning both the spatial and point-wise convolutions, phenomena like over-smoothing can be inherently avoided, and new state-of-the-art performance is achieved.

Submitted 15 July, 2022; originally announced July 2022.

Comments: ICML 2022
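The abstract contrasts pathGCN with GCNs that use a pre-determined, Laplacian-based spatial operator and learn only the point-wise part. For reference, here is a sketch of that fixed baseline as popularized by the original GCN of Kipf and Welling, S = D^{-1/2}(A + I)D^{-1/2}, followed by a layer in which only the point-wise weight matrix `W` would be trained. This is the operator pathGCN replaces, not pathGCN itself; the function names are illustrative.

```python
import numpy as np

def gcn_operator(adj):
    """Fixed spatial operator S = D^{-1/2} (A + I) D^{-1/2}, with self-loops added."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]

def gcn_layer(S, X, W):
    """Spatial mixing with the fixed S, then the learnable point-wise map W and a ReLU."""
    return np.maximum(S @ X @ W, 0.0)
```

Because S is fixed, repeated application acts like a low-pass filter whose top eigenvalue is 1, which is one common explanation of over-smoothing in deep GCNs; learning the spatial operator removes that constraint.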

arXiv:2206.09643 [pdf, ps, other]

doi 10.1103/PhysRevD.106.095038

P-even, CP-violating Signals in Scalar-Mediated Processes

Authors: Howard E. Haber, Venus Keus, Rui Santos

Abstract: Most studies of Higgs sector CP violation focus on the detection of CP-violating neutral Higgs-fermion Yukawa couplings, which yield P-odd, CP-violating phenomena. There is some literature on purely bosonic signatures of Higgs sector CP violation, where the simultaneous observation of three processes (suitably chosen) constitutes a signal of P-even CP violation. However, in the examples previously analyzed, some of the processes are strongly suppressed in the approximate Higgs alignment limit (corresponding to the existence of a Standard Model like Higgs boson as suggested by LHC data), in which case the proposed CP-violating signals are difficult to observe in practice. In this paper, we extend the existing literature by examining processes that do not vanish in the Higgs alignment limit and whose simultaneous observation would provide unambiguous evidence for scalar-mediated P-even CP violation. We assess the discovery potential of such signals at various future multi-TeV lepton (and $\gamma\gamma$) colliders. The potential for detecting loop-induced P-even, CP-violating phenomena is also considered.

Submitted 1 December, 2022; v1 submitted 20 June, 2022; originally announced June 2022.

Comments: 58 pages, 10 figures, 10 tables. One reference added in version 2. New footnote 15 and a number of additional references added in version 3. Clarifications added at the beginning of Section 4 and some references updated in version 4. A few typographical errors corrected in version 5, which now approximately matches the published version

Report number: DIAS-STP-22-02 and SCIPP-22/01

Journal ref: Physical Review D 106, 095038 (2022)

arXiv:2205.07578 [pdf, ps, other]

A natural mechanism for a SM-like Higgs boson in the 2HDM without decoupling

Authors: Howard E. Haber

Abstract: The properties of the Higgs boson discovered at the Large Hadron Collider are very well described by the Standard Model (SM). Thus, any theory that invokes an extended Higgs sector must explain why the neutral scalar observed at the LHC so closely resembles the SM Higgs boson. In this talk, I review the Higgs alignment limit, in which one neutral scalar state of the Higgs sector is SM-like. An approximate Higgs alignment can be achieved "naturally" either via decoupling or via an approximate symmetry. Using the two-Higgs doublet model as a prototype for an extended Higgs sector, I examine the symmetries of the scalar potential and their soft breakings that may be responsible for the SM-like properties of the observed Higgs boson, and I demonstrate how to extend such (softly-broken) symmetries to the Yukawa sector of the model.

Submitted 3 October, 2022; v1 submitted 16 May, 2022; originally announced May 2022.

Comments: 15 pages, 2 figures, 5 tables, contribution to the Proceedings of DISCRETE 2020-2021; version 2 updates several references

arXiv:2203.05006 [pdf, other]

Resource-Efficient Invariant Networks: Exponential Gains by Unrolled Optimization

Authors: Sam Buchanan, Jingkai Yan, Ellie Haber, John Wright

Abstract: Achieving invariance to nuisance transformations is a fundamental challenge in the construction of robust and reliable vision systems. Existing approaches to invariance scale exponentially with the dimension of the family of transformations, making them unable to cope with natural variabilities in visual data such as changes in pose and perspective. We identify a common limitation of these approaches (they rely on sampling to traverse the high-dimensional space of transformations) and propose a new computational primitive for building invariant networks based instead on optimization, which in many scenarios provides a provably more efficient method for high-dimensional exploration than sampling. We provide empirical and theoretical corroboration of the efficiency gains and soundness of our proposed method, and demonstrate its utility in constructing an efficient invariant network for a simple hierarchical object detection task when combined with unrolled optimization. Code for our networks and experiments is available at https://github.com/sdbuch/refine.

Submitted 9 March, 2022; originally announced March 2022.

arXiv:2110.09585 [pdf, other]

A-Optimal Active Learning

Authors: Tue Boesen, Eldad Haber

Abstract: In this work we discuss the problem of active learning. We present an approach that is based on A-optimal experimental design of ill-posed problems and show how one can optimally label a data set by partially probing it, and use it to train a deep network. We present two approaches that make different assumptions on the data set. The first is based on a Bayesian interpretation of the semi-supervised learning problem, in which the graph Laplacian is used for the prior distribution, and the second is based on a frequentist approach that updates the estimation of the bias term based on the recovery of the labels. We demonstrate that this approach can be highly efficient for estimating labels and training a deep network.

Submitted 25 November, 2022; v1 submitted 18 October, 2021; originally announced October 2021.

Comments: 14 pages, submitted to Physica Scripta

MSC Class: 68T05 (primary); 68T07; 62K05
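The first approach in the abstract is Bayesian, with the graph Laplacian defining the prior. A minimal sketch of A-optimal label selection under assumed Gaussian modelling choices: prior precision L + eps*I, noisy observations of the chosen nodes' labels with variance sigma^2, and the A-optimality criterion taken as the trace of the resulting posterior covariance. Greedy one-node-at-a-time selection and the values of `eps` and `sigma` are assumptions for illustration, not necessarily the paper's procedure.

```python
import numpy as np

def posterior_trace(L, chosen, eps=1e-2, sigma=0.1):
    """A-optimality criterion: trace of the Gaussian posterior covariance."""
    n = L.shape[0]
    precision = L + eps * np.eye(n)  # prior precision from the graph Laplacian
    for i in chosen:
        precision[i, i] += 1.0 / sigma**2  # observing node i adds e_i e_i^T / sigma^2
    return np.trace(np.linalg.inv(precision))

def greedy_a_optimal(L, budget, eps=1e-2, sigma=0.1):
    """Greedily pick `budget` nodes whose labels most reduce the posterior trace."""
    chosen = []
    for _ in range(budget):
        candidates = [i for i in range(L.shape[0]) if i not in chosen]
        best = min(candidates, key=lambda i: posterior_trace(L, chosen + [i], eps, sigma))
        chosen.append(best)
    return chosen
```

On a path graph the first pick is the middle node, which matches intuition: pinning the label there minimizes the total uncertainty (roughly the summed effective resistance) over the remaining nodes.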

arXiv:2109.03320 [pdf, other]

A momentum dependent optical lattice induced by artificial gauge potential

Authors: Zekai Chen, Hepeng Yao, Elisha Haber, Nicholas P. Bigelow

Abstract: We propose an experimentally feasible method to generate a one-dimensional optical lattice potential in an ultracold Bose gas system that depends on the transverse momentum of the atoms. The optical lattice is induced by the artificial gauge potential generated by a periodically driven multi-laser Raman process, which depends on the transverse momentum of the atoms. We study the many-body Bose-Hubbard model in an effective 1D case and show that the superfluid to Mott-insulator transition can be controlled by tuning the transverse momentum of the atomic gas. We examine our prediction via a strong-coupling expansion of an effective 1D Bose-Hubbard model and a quantum Monte Carlo calculation, and discuss possible applications of our system.

Submitted 7 September, 2021; originally announced September 2021.

arXiv:2108.01938 [pdf, other]

PDE-GCN: Novel Architectures for Graph Neural Networks Motivated by Partial Differential Equations

Authors: Moshe Eliasof, Eldad Haber, Eran Treister

Abstract: Graph neural networks are increasingly becoming the go-to approach in various fields such as computer vision, computational biology and chemistry, where data are naturally explained by graphs. However, unlike traditional convolutional neural networks, deep graph networks do not necessarily yield better performance than shallow graph networks. This behavior usually stems from the over-smoothing phenomenon. In this work, we propose a family of architectures to control this behavior by design. Our networks are motivated by numerical methods for solving Partial Differential Equations (PDEs) on manifolds, and as such, their behavior can be explained by similar analysis. Moreover, as we demonstrate using an extensive set of experiments, our PDE-motivated networks can generalize and be effective for various types of problems from different fields. Our architectures obtain results better than or on par with the current state of the art for problems that are typically approached using different architectures.

Submitted 26 October, 2021; v1 submitted 4 August, 2021; originally announced August 2021.

Comments: NeurIPS 2021
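The abstract motivates graph architectures by numerical methods for PDEs on manifolds. A minimal sketch of one such idea, a diffusion-type layer built from a discrete graph gradient, under stated assumptions: `G` is the edge-node incidence matrix (one row per edge, +1 and -1 at its endpoints), `G.T` acts as the negative divergence, `K` is a channel-mixing matrix, and one explicit Euler step X <- X - h G^T tanh(G X K) K^T is taken. The specific nonlinearity, step size `h`, and layout are illustrative choices, not the exact PDE-GCN layer.

```python
import numpy as np

def incidence(edges, n):
    """Graph gradient G: one row per edge (i, j) with -1 at node i and +1 at node j."""
    G = np.zeros((len(edges), n))
    for e, (i, j) in enumerate(edges):
        G[e, i], G[e, j] = -1.0, 1.0
    return G

def diffusion_layer(X, G, K, h=0.1):
    """One explicit-Euler step X <- X - h * G^T tanh(G X K) K^T (nodes x channels)."""
    return X - h * G.T @ np.tanh(G @ X @ K) @ K.T
```

Because every row of `G` sums to zero, the update is a pure "divergence of a flux": it redistributes features between neighbouring nodes and conserves each channel's total over the graph, mirroring a conservative diffusion PDE.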

arXiv:2107.11235 [pdf, other]

doi 10.1063/5.0064458

Robust deep learning for emulating turbulent viscosities

Authors: Aakash Patil, Jonathan Viquerat, George El Haber, Elie Hachem

Abstract: From the simplest models to complex deep neural networks, modeling turbulence with machine learning techniques still poses multiple challenges. In this context, the present contribution proposes a robust strategy using patch-based training to learn turbulent viscosity from flow velocities, and demonstrates its efficient use on the Spalart-Allmaras turbulence model. Training datasets are generated for flow past two-dimensional (2D) obstacles at high Reynolds numbers and used to train an auto-encoder type convolutional neural network with local patch inputs. Compared to a standard training technique, patch-based learning not only yields increased accuracy but also reduces the computational cost required for training.

Submitted 1 October, 2021; v1 submitted 23 July, 2021; originally announced July 2021.
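The abstract describes training on local patch inputs rather than whole fields. A minimal sketch of how aligned input/target patches might be cut from 2D fields with NumPy's `sliding_window_view` (NumPy 1.20 or later). The patch size, stride, and the use of a single scalar input field (for example a velocity magnitude) are assumptions for illustration; the paper's actual patching scheme is not given in the abstract.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def extract_patches(field, patch=4, stride=2):
    """Cut a 2D field into overlapping (patch x patch) windows taken every `stride` cells."""
    windows = sliding_window_view(field, (patch, patch))  # shape (H-p+1, W-p+1, p, p)
    return windows[::stride, ::stride].reshape(-1, patch, patch)

def paired_patches(velocity, viscosity, patch=4, stride=2):
    """Aligned input/target patch pairs (velocity field -> turbulent viscosity field)."""
    return (extract_patches(velocity, patch, stride),
            extract_patches(viscosity, patch, stride))
```

Using the same window grid for inputs and targets keeps each velocity patch aligned with the viscosity patch it should predict, and many small patches per simulation give a larger, cheaper-to-process training set than whole fields.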

arXiv:2105.08883 [pdf, other]

Deep Neural Network Accelerated Implicit Filtering

Authors: Brian Irwin, Eldad Haber, Raviv Gal, Avi Ziv

Abstract: In this paper, we illustrate a novel method for solving optimization problems when derivatives are not explicitly available. We show that combining implicit filtering (IF), an existing derivative-free optimization (DFO) method, with a deep neural network global approximator leads to an accelerated DFO method. Derivative-free optimization problems occur in a wide variety of applications, including simulation-based optimization and the optimization of stochastic processes, and naturally arise when the objective function can be viewed as a black box, such as a computer simulation. We highlight the practical value of our method, which we call deep neural network accelerated implicit filtering (DNNAIF), by demonstrating its ability to help solve the coverage directed generation (CDG) problem. Solving the CDG problem is a key part of the design and verification process for new electronic circuits, including the chips that power modern servers and smartphones.

Submitted 18 May, 2021; originally announced May 2021.

Comments: 9 pages, 6 figures
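The abstract builds on implicit filtering (IF). A simplified sketch of plain IF, without the neural-network accelerator: estimate the gradient by central differences on a stencil of scale `h`, run a backtracking (Armijo) line search along the negative stencil gradient, and halve `h` when no sufficient decrease is found. The parameter names and values are assumptions, and production IF codes add further safeguards (for example, stencil-failure tests and quasi-Newton models).

```python
import numpy as np

def stencil_gradient(f, x, h):
    """Central-difference gradient estimate on a stencil of scale h."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2.0 * h)
    return g

def implicit_filtering(f, x0, h=0.5, h_min=1e-6, max_iter=500):
    """Basic implicit filtering: shrink the stencil when the line search fails."""
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    for _ in range(max_iter):
        if h < h_min:
            break
        g = stencil_gradient(f, x, h)
        step, improved = 1.0, False
        while step > 1e-4:  # backtracking line search
            trial = x - step * g
            f_trial = f(trial)
            if f_trial < fx - 1e-4 * step * g.dot(g):  # Armijo sufficient decrease
                x, fx, improved = trial, f_trial, True
                break
            step *= 0.5
        if not improved:
            h *= 0.5  # no progress at this scale: refine the stencil
    return x, fx
```

The large initial stencil lets IF step over small-scale noise in a black-box objective; only when progress stalls does it look at finer scales.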

arXiv:2103.05180 [pdf, other]

An Introduction to Deep Generative Modeling

Authors: Lars Ruthotto, Eldad Haber

Abstract: Deep generative models (DGM) are neural networks with many hidden layers trained to approximate complicated, high-dimensional probability distributions using a large number of samples. When trained successfully, we can use the DGMs to estimate the likelihood of each observation and to create new samples from the underlying distribution. Developing DGMs has become one of the most hotly researched fields in artificial intelligence in recent years. The literature on DGMs has become vast and is growing rapidly. Some advances have even reached the public sphere, for example, the recent successes in generating realistic-looking images, voices, or movies; so-called deep fakes. Despite these successes, several mathematical and practical issues limit the broader use of DGMs: given a specific dataset, it remains challenging to design and train a DGM and even more challenging to find out why a particular model is or is not effective. To help advance the theoretical understanding of DGMs, we introduce DGMs and provide a concise mathematical framework for modeling the three most popular approaches: normalizing flows (NF), variational autoencoders (VAE), and generative adversarial networks (GAN). We illustrate the advantages and disadvantages of these basic approaches using numerical experiments. Our goal is to enable and motivate the reader to contribute to this proliferating research area. Our presentation also emphasizes relations between generative modeling and optimal transport.

Submitted 11 April, 2021; v1 submitted 8 March, 2021; originally announced March 2021.

Comments: 25 pages, 11 figures

MSC Class: 68T07
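Normalizing flows, one of the three approaches named in the abstract, rest on the change-of-variables formula log p_X(x) = log p_Z(f(x)) + log|det df/dx|. A minimal sketch with a single elementwise affine flow f(x) = (x - mu)/sigma and a standard normal base density; `mu` and `sigma` are illustrative parameters, and a practical flow would stack many learned invertible layers. The same object supports both uses mentioned in the abstract: evaluating likelihoods and generating new samples.

```python
import numpy as np

def standard_normal_logpdf(z):
    """Log density of an isotropic standard normal, summed over the last axis."""
    return -0.5 * np.sum(z**2 + np.log(2 * np.pi), axis=-1)

def affine_flow_logpdf(x, mu, sigma):
    """log p(x) = log p_Z(f(x)) + log|det J_f| for f(x) = (x - mu) / sigma."""
    z = (x - mu) / sigma
    log_det = -np.sum(np.log(np.abs(sigma)))  # Jacobian of f is diag(1 / sigma)
    return standard_normal_logpdf(z) + log_det

def affine_flow_sample(n, mu, sigma, rng):
    """Sample by pushing base samples through the inverse map x = mu + sigma * z."""
    z = rng.standard_normal((n, np.size(mu)))
    return mu + sigma * z
```

For this affine map the flow density is exactly a diagonal Gaussian with mean `mu` and standard deviations `sigma`, which makes it a convenient sanity check before moving to nonlinear layers.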

arXiv:2102.07136 [pdf, ps, other]

doi 10.1103/PhysRevD.103.115012

Exceptional regions of the 2HDM parameter space

Authors: Howard E. Haber, Joao P. Silva

Abstract: The exceptional region of the parameter space (ERPS) of the two Higgs doublet model (2HDM) is defined to be the parameter regime where the scalar potential takes on a very special form. In the standard parameterization of the 2HDM scalar potential with squared mass parameters $m_{11}^2$, $m_{22}^2$, $m_{12}^2$, and dimensionless couplings $\lambda_1$, $\lambda_2$, $\ldots,\lambda_7$, the ERPS corresponds to $\lambda_1=\lambda_2$, $\lambda_7=-\lambda_6$, $m_{11}^2=m_{22}^2$ and $m_{12}^2=0$, corresponding to a scalar potential with an enhanced generalized CP symmetry called GCP2. Many special features persist if $\lambda_1=\lambda_2$ and $\lambda_7=-\lambda_6$ are retained while allowing for $m_{11}^2\neq m_{22}^2$ and/or $m_{12}^2\neq 0$, corresponding to a scalar potential with a softly-broken GCP2 symmetry, which we designate as the ERPS4. In this paper, we examine many of the special features of the ERPS4, as well as even more specialized cases within the ERPS4 framework in which additional constraints on the scalar potential parameters are imposed. By surveying the landscape of the ERPS4, we complete the classification of 2HDM scalar potentials that exhibit an exact Higgs alignment (where the tree-level couplings of one neutral scalar coincide with those of the Standard Model Higgs boson), due to a residual symmetry that is unbroken in the vacuum. One surprising aspect of the ERPS4 is the possibility that the scalar sector is CP-conserving despite the presence of a complex parameter of the scalar potential whose complex phase cannot be removed by separate rephasings of the two scalar doublet fields.
The significance of the ERPS4 regime for custodial symmetry is also discussed, and the cases where a custodial symmetric 2HDM scalar potential preserves an exact Higgs alignment are elucidated.

Submitted 29 June, 2022; v1 submitted 14 February, 2021; originally announced February 2021.

Comments: 103 pages, 12 tables, version 2 adds subsections in Sections 5, 6, and 7. Additional references added and a number of clarifications and minor additions included; version 3 closely resembles the published version; version 4 adds some additional cases in Tables 8 and 9; version 5 matches the Erratum published in Phys. Rev. D 105, 119902 (2022)

Report number: CFTP/21-001 and SCIPP-21/01

Journal ref: Phys. Rev. D 103, 115012 (2021)
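The ERPS and ERPS4 are defined in the abstract by explicit relations among the parameters of the standard 2HDM parameterization. As an illustration only, the helper below checks those relations for a parameter set given in that same parameterization, within a numerical tolerance. The dictionary keys are hypothetical names, and `m12sq`, `lam6`, `lam7` are allowed to be complex, as they generally are in the 2HDM scalar potential. The sketch only encodes the quoted relations and does not search over basis changes.

```python
def classify_erps(p, tol=1e-9):
    """Return 'ERPS', 'ERPS4', or None using the parameter relations quoted in the abstract.

    p: dict with hypothetical keys lam1, lam2, lam6, lam7, m11sq, m22sq, m12sq.
    """
    def close(a, b):
        return abs(a - b) <= tol  # abs() also handles complex parameters

    if not (close(p["lam1"], p["lam2"]) and close(p["lam7"], -p["lam6"])):
        return None
    if close(p["m11sq"], p["m22sq"]) and close(p["m12sq"], 0):
        return "ERPS"   # exact GCP2-symmetric scalar potential
    return "ERPS4"      # GCP2 broken only softly by the squared-mass terms
```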

arXiv:2102.03881 [pdf, other]

Mimetic Neural Networks: A unified framework for Protein Design and Folding

Authors: Moshe Eliasof, Tue Boesen, Eldad Haber, Chen Keasar, Eran Treister

Abstract: Recent advancements in machine learning techniques for protein folding motivate better results in its inverse problem, protein design. In this work we introduce a new graph mimetic neural network, MimNet, and show that it is possible to build a reversible architecture that solves the structure and design problems in tandem, allowing protein design to be improved when the structure is better estimated. We use the ProteinNet data set and show that the state-of-the-art results in protein design can be improved, given recent architectures for protein folding.

Submitted 7 February, 2021; originally announced February 2021.

arXiv:2012.15629 [pdf, other]

doi 10.1140/epjc/s10052-021-09198-2

Higgs-mass predictions in the MSSM and beyond

Authors: P. Slavich, S. Heinemeyer, E. Bagnaschi, H. Bahl, M. Goodsell, H. E. Haber, T. Hahn, R. Harlander, W. Hollik, G. Lee, M. Mühlleitner, S. Paßehr, H. Rzehak, D. Stöckinger, A. Voigt, C. E. M. Wagner, G. Weiglein, B. C. Allanach, T. Biekötter, S. Borowka, J. Braathen, M. Carena, T. N. Dao, G. Degrassi, F. Domingo, et al. (14 additional authors not shown)

Abstract: Predictions for the Higgs masses are a distinctive feature of supersymmetric extensions of the Standard Model, where they play a crucial role in constraining the parameter space. The discovery of a Higgs boson and the remarkably precise measurement of its mass at the LHC have spurred new efforts aimed at improving the accuracy of the theoretical predictions for the Higgs masses in supersymmetric models. The "Precision SUSY Higgs Mass Calculation Initiative" (KUTS) was launched in 2014 to provide a forum for discussions between the different groups involved in these efforts. This report aims to present a comprehensive overview of the current status of Higgs-mass calculations in supersymmetric models, to document the many advances that were achieved in recent years and were discussed during the KUTS meetings, and to outline the prospects for future improvements in these calculations.

Submitted 2 February, 2023; v1 submitted 31 December, 2020; originally announced December 2020.

Comments: iv, 79 pages; 5 figures. v2: iv, 99 pages; added appendix on public codes for the Higgs-mass calculation in SUSY models. v3: minor modifications, references updated; matches version published in EPJC. v4: hyperlinks enabled

Report number: DESY 20-229, IFT-UAM/CSIC-20-184, FR-PHENO-2020-021, KA-TP-23-2020, MPP-2020-235, P3H-20-086, TTK-20-53

arXiv:2011.13159 [pdf, other]

doi 10.1007/JHEP05(2021)235

A natural mechanism for approximate Higgs alignment in the 2HDM

Authors: Patrick Draper, Andreas Ekstedt, Howard E. Haber

Abstract: The 2HDM possesses a neutral scalar interaction eigenstate whose tree-level properties coincide with those of the Standard Model (SM) Higgs boson. In light of the LHC Higgs data, which suggest that the observed Higgs boson is SM-like, it follows that the mixing of the SM Higgs interaction eigenstate with the other neutral scalar interaction eigenstates of the 2HDM should be suppressed, corresponding to the so-called Higgs alignment limit. The exact Higgs alignment limit can arise naturally due to a global symmetry of the scalar potential. If this symmetry is softly broken, then the Higgs alignment limit becomes approximate (although still potentially consistent with the current LHC Higgs data). In this paper, we obtain the approximate Higgs alignment suggested by the LHC Higgs data as a consequence of a softly broken global symmetry of the Higgs Lagrangian. However, this can only be accomplished if the Yukawa sector of the theory is extended. We propose an extended 2HDM with vector-like top quark partners, where explicit mass terms in the top sector provide the source of the soft symmetry breaking of a generalized CP symmetry. In this way, we can realize approximate Higgs alignment without a significant fine-tuning of the model parameters. We then explore the implications of the current LHC bounds on vector-like top quark partners for the success of our proposed scenario.

Submitted 14 June, 2021; v1 submitted 26 November, 2020; originally announced November 2020.

Comments: 55 pages, 8 figures, 3 tables, and 3 appendices; version 2 adds references, corrects typographical errors and adds one figure; version 3 updates references

Report number: SCIPP-20/03

Journal ref: Journal of High Energy Physics 05 (2021) 235

arXiv:2010.01275 [pdf, other]

doi 10.1007/s10589-022-00448-x

Secant Penalized BFGS: A Noise Robust Quasi-Newton Method Via Penalizing The Secant Condition

Authors: Brian Irwin, Eldad Haber

Abstract: In this paper, we introduce a new variant of the BFGS method designed to perform well when gradient measurements are corrupted by noise. We show that by treating the secant condition with a penalty method approach motivated by regularized least squares estimation, one can smoothly interpolate between updating the inverse Hessian approximation with the original BFGS update formula and not updating the inverse Hessian approximation. Furthermore, we find the curvature condition is smoothly relaxed as the interpolation moves towards not updating the inverse Hessian approximation, disappearing entirely when the inverse Hessian approximation is not updated. These developments lead to a method we refer to as secant penalized BFGS (SP-BFGS), which allows one to relax the secant condition based on the amount of noise in the gradient measurements. SP-BFGS provides a means of incrementally updating the new inverse Hessian approximation with a controlled amount of bias towards the previous inverse Hessian approximation, which allows one to replace the overwriting nature of the original BFGS update with an averaging nature that resists the destructive effects of noise and can cope with negative curvature measurements. We discuss the theoretical properties of SP-BFGS, including convergence when minimizing strongly convex functions in the presence of uniformly bounded noise.
Finally, we present extensive numerical experiments using over 30 problems from the CUTEst test problem set that demonstrate the superior performance of SP-BFGS compared to BFGS in the presence of both noisy function and gradient evaluations.
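For reference, the original BFGS inverse-Hessian update, which SP-BFGS interpolates away from, is sketched below in plain Python. The name `bfgs_inverse_update` is hypothetical, and the penalized SP-BFGS update itself is not reproduced here. The update enforces the secant condition $H_{k+1} y_k = s_k$ exactly, which is the constraint SP-BFGS relaxes.

```python
def matmul(A, B):
    """Dense matrix product for lists of lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def bfgs_inverse_update(H, s, y):
    """Standard BFGS update of an inverse-Hessian approximation H.

    H_new = (I - rho s y^T) H (I - rho y s^T) + rho s s^T, with rho = 1 / (y^T s).
    s is the step x_{k+1} - x_k and y is the gradient change g_{k+1} - g_k.
    """
    n = len(s)
    sy = sum(si * yi for si, yi in zip(s, y))
    if sy <= 0:
        # Plain BFGS requires the curvature condition y^T s > 0; noisy or
        # negative-curvature measurements violate it.
        raise ValueError("curvature condition y^T s > 0 violated")
    rho = 1.0 / sy
    left = [[(1.0 if i == j else 0.0) - rho * s[i] * y[j] for j in range(n)] for i in range(n)]
    right = [[(1.0 if i == j else 0.0) - rho * y[i] * s[j] for j in range(n)] for i in range(n)]
    core = matmul(matmul(left, H), right)
    return [[core[i][j] + rho * s[i] * s[j] for j in range(n)] for i in range(n)]
```

Because `right` annihilates `y`, the updated matrix maps `y` to `rho * s * (s^T y) = s`. A single noisy `y` can therefore overwrite the curvature information, which is the behavior SP-BFGS replaces with averaging.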

Submitted 10 July, 2021; v1 submitted 3 October, 2020; originally announced October 2020.

Comments: 38 pages, 3 figures; corrected errors, added numerical experiments

MSC Class: 49; 65

arXiv:2009.03990 [pdf, ps, other]

doi 10.1142/S0217751X21300027

A tale of three diagonalizations

Authors: Howard E. Haber

Abstract: In addition to the diagonalization of a normal matrix by a unitary similarity transformation, there are two other types of diagonalization procedures that sometimes arise in quantum theory applications -- the singular value decomposition and the Autonne-Takagi factorization. In these pedagogical notes, we carry out each of these diagonalization procedures for the most general $2\times 2$ matrices for which the corresponding diagonalization is possible and provide explicit analytical results in each of the three cases.
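For orientation, the three procedures can be summarized for a $2\times 2$ matrix as follows. The symbols are ours, and sign and phase conventions differ between references. Here $U$, $L$ and $R$ are unitary:

$$
\begin{aligned}
&\text{normal } A\ (AA^\dagger = A^\dagger A): && U^\dagger A\, U = \mathrm{diag}(a_1, a_2), \quad a_i \in \mathbb{C},\\
&\text{any } M\ \text{(singular value decomposition)}: && L^\dagger M R = \mathrm{diag}(m_1, m_2), \quad m_i \ge 0,\\
&\text{complex symmetric } M = M^{\mathsf T}\ \text{(Autonne-Takagi)}: && U^{\mathsf T} M\, U = \mathrm{diag}(m_1, m_2), \quad m_i \ge 0.
\end{aligned}
$$

In the Autonne-Takagi case, the nonnegative $m_i$ are the singular values of $M$. The difference from the singular value decomposition is that a single unitary matrix, applied through its transpose, suffices.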

Submitted 25 February, 2021; v1 submitted 6 September, 2020; originally announced September 2020.

Comments: 27 pages; version 2 adds a footnote 14 and improves the pdf output by modifying the implementation of hyperref; version 3 corrects an error in eq. (78)

Journal ref: International Journal of Modern Physics A 36 (2021) 2130002

arXiv:2007.03643 [pdf, other]

Segmentation of Pulmonary Opacification in Chest CT Scans of COVID-19 Patients

Authors: Keegan Lensink, Issam Laradji, Marco Law, Paolo Emilio Barbano, Savvas Nicolaou, William Parker, Eldad Haber

Abstract: The Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) has rapidly spread into a global pandemic. A form of pneumonia, presenting as opacities within a patient's lungs, is the most common presentation associated with this virus, and great attention has gone into how these changes relate to patient morbidity and mortality. In this work we provide open source models for the segmentation of patterns of pulmonary opacification on chest Computed Tomography (CT) scans which have been correlated with various stages and severities of infection. We have collected 663 chest CT scans of COVID-19 patients from healthcare centers around the world, and created pixel-wise segmentation labels for nearly 25,000 slices that segment 6 different patterns of pulmonary opacification. We provide open source implementations and pre-trained weights for multiple segmentation models trained on our dataset. Our best model achieves an opacity Intersection-Over-Union score of 0.76 on our test set, demonstrates successful domain adaptation, and predicts the volume of opacification within 1.7% of expert radiologists. Additionally, we present an analysis of the inter-observer variability inherent to this task, and propose methods for appropriate probabilistic approaches.
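The Intersection-Over-Union score reported above compares a predicted mask with a reference mask. Below is a minimal sketch for binary masks stored as nested lists. The function name `iou` is hypothetical, and the abstract does not say how the score is aggregated over slices or scans.

```python
def iou(pred, target):
    """Intersection-over-union of two same-shaped binary masks (nested lists of 0/1)."""
    inter = union = 0
    for prow, trow in zip(pred, target):
        for p, t in zip(prow, trow):
            if p and t:   # pixel marked in both masks
                inter += 1
            if p or t:    # pixel marked in at least one mask
                union += 1
    # Convention assumed here: two empty masks agree perfectly.
    return inter / union if union else 1.0
```

For example, masks that share 2 marked pixels out of 4 marked in either mask score 0.5.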

Submitted 8 July, 2020; v1 submitted 7 July, 2020; originally announced July 2020.

Comments: 9 pages, 5 figures. Fix typo in delimiter between author names in arXiv metadata

arXiv:2003.08466 [pdf, other]

Fully reversible neural networks for large-scale 3D seismic horizon tracking

Authors: Bas Peters, Eldad Haber

Abstract: Tracking a horizon in seismic images or 3D volumes is an integral part of seismic interpretation. The last few decades saw progress in using neural networks for this task, starting from shallow networks for 1D traces, to deeper convolutional neural networks for large 2D images. Because geological structures are intrinsically 3D, we hope to see improved horizon tracking by training networks on 3D seismic data cubes. While there are some 3D convolutional neural networks for various seismic interpretation tasks, they are restricted to shallow networks or relatively small 3D inputs because of memory limitations. The required memory for the network states and weights increases with network depth. We present a fully reversible network for horizon tracking that has a memory requirement that is independent of network depth. To tackle memory issues regarding the network weights, we use layers that train in a factorized form directly. Therefore, we can maintain a large number of network channels while keeping the number of convolutional kernels low. We use the saved memory to increase the input size of the data by an order of magnitude such that the network can better learn from large structures in the data. A field data example verifies that the proposed network structure is suitable for seismic horizon tracking.
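The depth-independent memory requirement comes from the ability to recompute earlier network states from later ones instead of storing them. Below is a minimal sketch that assumes a second-order leapfrog (hyperbolic) update $Y_{j+1} = 2Y_j - Y_{j-1} - h^2 f(Y_j)$ with a toy per-channel layer. This scheme is a common reversible design, but the network in the paper may differ, and all names here (`layer`, `forward`, `backward_states`) are hypothetical.

```python
import math


def layer(y, w):
    # Toy per-channel layer f(Y_j); reversibility holds for any choice of f.
    return [wi * math.tanh(yi) for wi, yi in zip(w, y)]


def forward(y_prev, y_curr, weights, h=0.1):
    """Leapfrog steps Y_{j+1} = 2 Y_j - Y_{j-1} - h^2 f(Y_j); only two states are kept."""
    for w in weights:
        f = layer(y_curr, w)
        y_next = [2 * c - p - h * h * fi for c, p, fi in zip(y_curr, y_prev, f)]
        y_prev, y_curr = y_curr, y_next
    return y_prev, y_curr


def backward_states(y_prev, y_curr, weights, h=0.1):
    """Recover earlier states from the final pair via Y_{j-1} = 2 Y_j - Y_{j+1} - h^2 f(Y_j)."""
    for w in reversed(weights):
        f = layer(y_prev, w)
        y_before = [2 * p - c - h * h * fi for p, c, fi in zip(y_prev, y_curr, f)]
        y_prev, y_curr = y_before, y_prev
    return y_prev, y_curr
```

In a training loop, the backward pass would run this recomputation layer by layer while accumulating gradients. Only two states are held at any time, regardless of how many layers there are.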

Submitted 18 March, 2020; originally announced March 2020.

MSC Class: 68U10 ACM Class: I.4.6

arXiv:2003.07474 [pdf, other]

Fully reversible neural networks for large-scale surface and sub-surface characterization via remote sensing

Authors: Bas Peters, Eldad Haber, Keegan Lensink

Abstract: The large spatial/frequency scale of hyperspectral and airborne magnetic and gravitational data causes memory issues when using convolutional neural networks for (sub-) surface characterization. Recently developed fully reversible networks can mostly avoid memory limitations by virtue of having a low and fixed memory requirement for storing network states, as opposed to the typical linear memory growth with depth. Fully reversible networks enable the training of deep neural networks that take in entire data volumes, and create semantic segmentations in one go. This approach avoids the need to work in small patches or map a data patch to the class of just the central pixel. The cross-entropy loss function requires small modifications to work in conjunction with a fully reversible network and learn from sparsely sampled labels without ever seeing fully labeled ground truth. We show examples from land-use change detection from hyperspectral time-lapse data, and regional aquifer mapping from airborne geophysical and geological data.
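The abstract does not spell out the modification to the cross-entropy loss. One common way to learn from sparsely sampled labels is to evaluate the loss only at labeled pixels. The sketch below assumes that approach and uses a hypothetical `ignore` marker for unlabeled pixels.

```python
import math


def masked_cross_entropy(logits, labels, ignore=-1):
    """Mean cross-entropy over labeled pixels only.

    logits: per-pixel lists of class scores; labels: class index per pixel,
    with `ignore` (a hypothetical marker) for pixels that carry no label.
    """
    total, count = 0.0, 0
    for scores, lab in zip(logits, labels):
        if lab == ignore:
            continue  # unlabeled pixels contribute nothing to the loss or its gradient
        m = max(scores)  # log-sum-exp stabilization
        lse = m + math.log(sum(math.exp(sc - m) for sc in scores))
        total += lse - scores[lab]
        count += 1
    if count == 0:
        raise ValueError("no labeled pixels")
    return total / count
```

The network still predicts every pixel of the volume. Only the loss is restricted to the sampled labels, so no fully labeled ground truth is needed.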

Submitted 16 March, 2020; originally announced March 2020.

MSC Class: 68T45 ACM Class: I.4.6

arXiv:2001.01430 [pdf, ps, other]

doi 10.1103/PhysRevD.101.055023

Basis-independent treatment of the complex 2HDM

Authors: Rafael Boto, Tiago V. Fernandes, Howard E. Haber, Jorge C. Romao, Joao P. Silva

Abstract: The complex 2HDM (C2HDM) is the most general CP-violating two Higgs doublet model that possesses a softly-broken $\mathbb{Z}_2$ symmetry. However, the physical consequences of the model cannot depend on the basis of scalar fields used to define it. Thus, to get a better sense of the significance of the C2HDM parameters, we have analyzed this model by employing a basis-independent formalism. This formalism involves transforming to the Higgs basis (which is defined up to an arbitrary complex phase) and identifying quantities that are invariant with respect to this phase degree of freedom. Using this method, we have obtained the constraints that enforce the softly-broken $\mathbb{Z}_2$ symmetry. One can then relate the C2HDM parameters to basis-independent quantities up to a two-fold ambiguity. We then show how this remaining ambiguity is resolved. We also examine the possibility of spontaneous CP violation when the scalar potential of the C2HDM is explicitly CP-conserving. Basis-independent constraints are presented that govern the presence of spontaneous CP violation.

Submitted 17 March, 2020; v1 submitted 6 January, 2020; originally announced January 2020.

Comments: 65 pages, 3 tables. Appendix E added to version 2, additional references included, and a number of clarifications (and corrections of typographical errors) implemented. A few more references and a number of clarifying footnotes added in version 3 (title change dictated by Physical Review D). Version 4 matches the published version

Report number: CFTP/19-032 and SCIPP-19/01

Journal ref: Phys. Rev. D 101, 055023 (2020)

arXiv:1912.13302 [pdf, ps, other]

doi 10.21468/SciPostPhysLectNotes.21

Useful relations among the generators in the defining and adjoint representations of SU(N)

Authors: Howard E. Haber

Abstract: There are numerous relations among the generators in the defining and adjoint representations of SU(N). These include Casimir operators, formulae for traces of products of generators, etc. Due to the existence of the completely symmetric tensor $d_{abc}$ that arises in the study of the SU(N) Lie algebra, one can also consider relations that involve the adjoint representation matrix, $(D^a)_{bc}=d_{abc}$. In this review, we summarize many useful relations satisfied by the defining and adjoint representation matrices of SU(N). A few relations special to the case of N=3 are highlighted.
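Several relations of this kind are easy to check numerically. The sketch below uses the Gell-Mann matrices for SU(3) with the normalization $\mathrm{Tr}(T^aT^b)=\tfrac12\delta^{ab}$ and the definition $d_{abc} = 2\,\mathrm{Tr}(\{T^a,T^b\}T^c)$. These are standard conventions, but they may not match every convention used in the review. The code checks the quadratic Casimir $\sum_a T^aT^a = C_F\,\mathbf{1}$ with $C_F=(N^2-1)/(2N)=4/3$ and the value $d_{118}=1/\sqrt{3}$.

```python
import math

N = 3
S3 = 1 / math.sqrt(3)
GELL_MANN = [
    [[0, 1, 0], [1, 0, 0], [0, 0, 0]],
    [[0, -1j, 0], [1j, 0, 0], [0, 0, 0]],
    [[1, 0, 0], [0, -1, 0], [0, 0, 0]],
    [[0, 0, 1], [0, 0, 0], [1, 0, 0]],
    [[0, 0, -1j], [0, 0, 0], [1j, 0, 0]],
    [[0, 0, 0], [0, 0, 1], [0, 1, 0]],
    [[0, 0, 0], [0, 0, -1j], [0, 1j, 0]],
    [[S3, 0, 0], [0, S3, 0], [0, 0, -2 * S3]],
]
# Defining-representation generators T^a = lambda^a / 2 (index a runs 0..7 here).
T = [[[x / 2 for x in row] for row in lam] for lam in GELL_MANN]


def mm(A, B):
    """3x3 matrix product."""
    return [[sum(A[r][k] * B[k][c] for k in range(N)) for c in range(N)] for r in range(N)]


def tr(A):
    return sum(A[k][k] for k in range(N))


def casimir():
    """Sum_a T^a T^a, expected to equal C_F times the identity."""
    C = [[0j] * N for _ in range(N)]
    for Ta in T:
        P = mm(Ta, Ta)
        C = [[C[r][c] + P[r][c] for c in range(N)] for r in range(N)]
    return C


def d(a, b, c):
    """d_{abc} = 2 Tr({T^a, T^b} T^c), with 1-based indices as in the physics literature."""
    Ta, Tb, Tc = T[a - 1], T[b - 1], T[c - 1]
    AB, BA = mm(Ta, Tb), mm(Tb, Ta)
    anti = [[AB[r][col] + BA[r][col] for col in range(N)] for r in range(N)]
    return (2 * tr(mm(anti, Tc))).real
```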

Submitted 20 January, 2021; v1 submitted 31 December, 2019; originally announced December 2019.

Comments: 11 pages, a collection of useful formulae from a variety of sources; version 2 matches the published version

Journal ref: SciPost Phys. Lect. Notes 21 (2021)

arXiv:1912.12137 [pdf, other]

Symmetric block-low-rank layers for fully reversible multilevel neural networks

Authors: Bas Peters, Eldad Haber, Keegan Lensink

Abstract: Factors that limit the size of the input and output of a neural network include memory requirements for the network states/activations to compute gradients, as well as memory for the convolutional kernels or other weights. The memory restriction is especially limiting for applications where we want to learn how to map volumetric data to the desired output, such as video-to-video. Recently developed fully reversible neural networks enable gradient computations using storage of the network states for a couple of layers only. While this saves a tremendous amount of memory, it is the convolutional kernels that take up most memory if fully reversible networks contain multiple invertible pooling/coarsening layers. Invertible coarsening operators such as the orthogonal wavelet transform cause the number of channels to grow explosively. We address this issue by combining fully reversible networks with layers that contain the convolutional kernels in a compressed form directly. Specifically, we introduce a layer that has a symmetric block-low-rank structure. In spirit, this layer is similar to bottleneck and squeeze-and-expand structures. We contribute symmetry by construction, and a combination of notation and flattening of tensors allows us to interpret these network structures in linear algebraic fashion as a block-low-rank matrix in factorized form and observe various properties.
A video segmentation example shows that we can train a network to segment the entire video in one go, which would not be possible, in terms of memory requirements, using non-reversible networks and previously proposed reversible networks.
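A simple layer that is both symmetric by construction and low rank is $y = W^{\mathsf T}\sigma(Wx)$, where $W$ is a short, wide factor with $r \ll n$ rows. Its Jacobian $W^{\mathsf T}\,\mathrm{diag}(\sigma'(Wx))\,W$ is symmetric, and only $r\cdot n$ weights are stored instead of $n^2$. The sketch below illustrates this under the stated assumption (dense vectors rather than convolutional blocks, hypothetical names). It is not the block-low-rank layer of the paper.

```python
import math


def symmetric_lowrank_layer(x, W):
    """y = W^T tanh(W x) for a factor W with r rows and n columns (r << n).

    The same factor W is used on both sides (a bottleneck with tied weights),
    so the layer's Jacobian is symmetric and has rank at most r.
    """
    z = [math.tanh(sum(wij * xj for wij, xj in zip(row, x))) for row in W]  # tanh(W x)
    n = len(x)
    return [sum(W[k][j] * z[k] for k in range(len(W))) for j in range(n)]  # W^T z
```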

Submitted 14 December, 2019; originally announced December 2019.

MSC Class: 68T45

arXiv:1910.13157 [pdf, other]

doi 10.1109/JSTSP.2020.2972775

LeanConvNets: Low-cost Yet Effective Convolutional Neural Networks

Authors: Jonathan Ephrath, Moshe Eliasof, Lars Ruthotto, Eldad Haber, Eran Treister

Abstract: Convolutional Neural Networks (CNNs) have become indispensable for solving machine learning tasks in speech recognition, computer vision, and other areas that involve high-dimensional data. A CNN filters the input features using a network containing spatial convolution operators with compactly supported stencils. In practice, the input data and the hidden features consist of a large number of channels, which in most CNNs are fully coupled by the convolution operators. This coupling leads to immense computational cost in the training and prediction phase. In this paper, we introduce LeanConvNets that are derived by sparsifying fully-coupled operators in existing CNNs. Our goal is to improve the efficiency of CNNs by reducing the number of weights, floating point operations and latency times, with minimal loss of accuracy. Our lean convolution operators involve tuning parameters that control the trade-off between the network's accuracy and computational costs. These convolutions can be used in a wide range of existing networks, and we exemplify their use in residual networks (ResNets). Using a range of benchmark problems from image classification and semantic segmentation, we demonstrate that the resulting LeanConvNet's accuracy is close to state-of-the-art networks while being computationally less expensive. In our tests, the lean versions of ResNet in most cases outperform comparable reduced architectures such as MobileNets and ShuffleNets.
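Comparing weight counts shows why sparsifying the channel coupling matters. The sketch below contrasts a fully coupled $k\times k$ convolution with a depthwise-plus-pointwise split, which is the structure used by MobileNets. This split only illustrates sparsified coupling; it is not the exact lean operator defined in the paper. Biases are ignored and the function names are hypothetical.

```python
def full_conv_weights(c_in, c_out, k=3):
    """Fully coupled k x k convolution: every output channel sees every input channel."""
    return k * k * c_in * c_out


def depthwise_plus_pointwise_weights(c_in, c_out, k=3):
    """A k x k stencil per input channel, followed by a 1x1 convolution that mixes channels."""
    return k * k * c_in + c_in * c_out
```

For 256 input and output channels with 3×3 kernels, this gives 589,824 versus 67,840 weights, about 8.7 times fewer. The saving grows with the kernel size because the spatial stencil no longer multiplies the channel-coupling term.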

Submitted 12 February, 2020; v1 submitted 29 October, 2019; originally announced October 2019.

arXiv:1910.01694 [pdf, other]

Fluid Flow Mass Transport for Generative Networks

Authors: Jingrong Lin, Keegan Lensink, Eldad Haber

Abstract: Generative Adversarial Networks have been shown to be powerful in generating content. As a result, they have been studied intensively in the last few years. Nonetheless, training these networks requires solving a saddle point problem that is difficult to solve and converges slowly. Motivated by techniques in the registration of point clouds and by the fluid flow formulation of mass transport, we investigate a new formulation that is based on strict minimization, without the need for maximization. The formulation views the problem as a matching problem rather than an adversarial one and thus allows us to converge quickly and obtain meaningful metrics in the optimization path.

Submitted 7 October, 2019; v1 submitted 3 October, 2019; originally announced October 2019.

arXiv:1910.00170 [pdf, other]

doi 10.1007/s11081-020-09507-w

How To Catch A Lion In The Desert -- On The Solution Of The Coverage Directed Generation (CDG) Problem

Authors: Raviv Gal, Eldad Haber, Brian Irwin, Bilal Saleh, Avi Ziv

Abstract: The testing and verification of a complex hardware or software system, such as modern integrated circuits (ICs) found in everything from smartphones to servers, can be a difficult process. One of the most difficult and time-consuming tasks a verification team faces is reaching coverage closure, or hitting all events in the coverage space. Coverage-directed-generation (CDG), or the automatic generation of tests that can hit hard-to-hit coverage events, and thus provide coverage closure, holds the potential to save verification teams significant simulation resources and time. In this paper, we propose a new approach to the CDG problem by formulating the CDG problem as a noisy derivative free optimization (DFO) problem. However, this formulation is complicated by the fact that derivatives of the objective function are unavailable, and the objective function evaluations are corrupted by noise. We solve this noisy optimization problem by utilizing techniques from direct optimization coupled with a robust noise estimator, and by leveraging techniques from inverse problems to estimate the gradient of the noisy objective function. We demonstrate the efficiency and reliability of this new approach through numerical experiments with an abstract model of part of IBM's NorthStar processor, a superscalar in-order processor designed for servers.
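The abstract mentions estimating the gradient of a noisy objective with techniques from inverse problems. As one illustration of that idea (not the estimator used in the paper), the sketch below fits a 2-D gradient to noisy central differences along random directions by Tikhonov-regularized least squares. The function name and the parameter defaults are hypothetical.

```python
import random


def estimate_gradient_2d(f, x, delta=0.1, samples=50, alpha=1e-6, seed=0):
    """Regularized least-squares gradient estimate from noisy function values.

    Solves min_g sum_k (d_k . g - r_k)^2 + alpha |g|^2, where d_k are random
    directions and r_k = (f(x + delta d_k) - f(x - delta d_k)) / (2 delta).
    """
    rng = random.Random(seed)
    # Accumulate the normal equations (D^T D + alpha I) g = D^T r.
    a11 = a12 = a22 = b1 = b2 = 0.0
    for _ in range(samples):
        d = (rng.gauss(0, 1), rng.gauss(0, 1))
        xp = (x[0] + delta * d[0], x[1] + delta * d[1])
        xm = (x[0] - delta * d[0], x[1] - delta * d[1])
        r = (f(xp) - f(xm)) / (2 * delta)
        a11 += d[0] * d[0]
        a12 += d[0] * d[1]
        a22 += d[1] * d[1]
        b1 += d[0] * r
        b2 += d[1] * r
    a11 += alpha
    a22 += alpha
    det = a11 * a22 - a12 * a12
    # Solve the symmetric 2x2 system by Cramer's rule.
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)
```

Averaging over many directions damps the evaluation noise that would swamp a single finite difference. The small `alpha` keeps the system well posed when few directions are sampled.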

Submitted 30 September, 2019; originally announced October 2019.

Comments: 23 pages, 5 figures

Showing 1–50 of 188 results for author: Haber, E