Search | arXiv e-print repository

doi 10.1145/3649329.3656543

CircuitVAE: Efficient and Scalable Latent Circuit Optimization

Authors: Jialin Song, Aidan Swope, Robert Kirby, Rajarshi Roy, Saad Godil, Jonathan Raiman, Bryan Catanzaro

Abstract: Automatically designing fast and space-efficient digital circuits is challenging because circuits are discrete, must exactly implement the desired logic, and are costly to simulate. We address these challenges with CircuitVAE, a search algorithm that embeds computation graphs in a continuous space and optimizes a learned surrogate of physical simulation by gradient descent. By carefully controllin… ▽ More Automatically designing fast and space-efficient digital circuits is challenging because circuits are discrete, must exactly implement the desired logic, and are costly to simulate. We address these challenges with CircuitVAE, a search algorithm that embeds computation graphs in a continuous space and optimizes a learned surrogate of physical simulation by gradient descent. By carefully controlling overfitting of the simulation surrogate and ensuring diverse exploration, our algorithm is highly sample-efficient, yet gracefully scales to large problem instances and high sample budgets. We test CircuitVAE by designing binary adders across a large range of sizes, IO timing constraints, and sample budgets. Our method excels at designing large circuits, where other algorithms struggle: compared to reinforcement learning and genetic algorithms, CircuitVAE typically finds 64-bit adders which are smaller and faster using less than half the sample budget. We also find CircuitVAE can design state-of-the-art adders in a real-world chip, demonstrating that our method can outperform commercial tools in a realistic setting. △ Less

Submitted 13 June, 2024; originally announced June 2024.

Comments: Design Automation Conference (DAC) 2024; the first two authors contributed equally

arXiv:2406.06751 [pdf, other]

Complexity-Aware Deep Symbolic Regression with Robust Risk-Seeking Policy Gradients

Authors: Zachary Bastiani, Robert M. Kirby, Jacob Hochhalter, Shandian Zhe

Abstract: This paper proposes a novel deep symbolic regression approach to enhance the robustness and interpretability of data-driven mathematical expression discovery. Despite the success of the state-of-the-art method, DSR, it is built on recurrent neural networks, purely guided by data fitness, and potentially meet tail barriers, which can zero out the policy gradient and cause inefficient model updates.… ▽ More This paper proposes a novel deep symbolic regression approach to enhance the robustness and interpretability of data-driven mathematical expression discovery. Despite the success of the state-of-the-art method, DSR, it is built on recurrent neural networks, purely guided by data fitness, and potentially meet tail barriers, which can zero out the policy gradient and cause inefficient model updates. To overcome these limitations, we use transformers in conjunction with breadth-first-search to improve the learning performance. We use Bayesian information criterion (BIC) as the reward function to explicitly account for the expression complexity and optimize the trade-off between interpretability and data fitness. We propose a modified risk-seeking policy that not only ensures the unbiasness of the gradient, but also removes the tail barriers, thus ensuring effective updates from top performers. Through a series of benchmarks and systematic experiments, we demonstrate the advantages of our approach. △ Less

Submitted 10 June, 2024; originally announced June 2024.

arXiv:2406.02336 [pdf, other]

Polynomial-Augmented Neural Networks (PANNs) with Weak Orthogonality Constraints for Enhanced Function and PDE Approximation

Authors: Madison Cooley, Shandian Zhe, Robert M. Kirby, Varun Shankar

Abstract: We present polynomial-augmented neural networks (PANNs), a novel machine learning architecture that combines deep neural networks (DNNs) with a polynomial approximant. PANNs combine the strengths of DNNs (flexibility and efficiency in higher-dimensional approximation) with those of polynomial approximation (rapid convergence rates for smooth functions). To aid in both stable training and enhanced… ▽ More We present polynomial-augmented neural networks (PANNs), a novel machine learning architecture that combines deep neural networks (DNNs) with a polynomial approximant. PANNs combine the strengths of DNNs (flexibility and efficiency in higher-dimensional approximation) with those of polynomial approximation (rapid convergence rates for smooth functions). To aid in both stable training and enhanced accuracy over a variety of problems, we present (1) a family of orthogonality constraints that impose mutual orthogonality between the polynomial and the DNN within a PANN; (2) a simple basis pruning approach to combat the curse of dimensionality introduced by the polynomial component; and (3) an adaptation of a polynomial preconditioning strategy to both DNNs and polynomials. We test the resulting architecture for its polynomial reproduction properties, ability to approximate both smooth functions and functions of limited smoothness, and as a method for the solution of partial differential equations (PDEs). Through these experiments, we demonstrate that PANNs offer superior approximation properties to DNNs for both regression and the numerical solution of PDEs, while also offering enhanced accuracy over both polynomial and DNN-based regression (each) when regressing functions with limited smoothness. △ Less

Submitted 4 June, 2024; originally announced June 2024.

MSC Class: 68T07; 68U99; 65N99

arXiv:2402.11126 [pdf, other]

Kolmogorov n-Widths for Multitask Physics-Informed Machine Learning (PIML) Methods: Towards Robust Metrics

Authors: Michael Penwarden, Houman Owhadi, Robert M. Kirby

Abstract: Physics-informed machine learning (PIML) as a means of solving partial differential equations (PDE) has garnered much attention in the Computational Science and Engineering (CS&E) world. This topic encompasses a broad array of methods and models aimed at solving a single or a collection of PDE problems, called multitask learning. PIML is characterized by the incorporation of physical laws into the… ▽ More Physics-informed machine learning (PIML) as a means of solving partial differential equations (PDE) has garnered much attention in the Computational Science and Engineering (CS&E) world. This topic encompasses a broad array of methods and models aimed at solving a single or a collection of PDE problems, called multitask learning. PIML is characterized by the incorporation of physical laws into the training process of machine learning models in lieu of large data when solving PDE problems. Despite the overall success of this collection of methods, it remains incredibly difficult to analyze, benchmark, and generally compare one approach to another. Using Kolmogorov n-widths as a measure of effectiveness of approximating functions, we judiciously apply this metric in the comparison of various multitask PIML architectures. We compute lower accuracy bounds and analyze the model's learned basis functions on various PDE problems. This is the first objective metric for comparing multitask PIML architectures and helps remove uncertainty in model validation from selective sampling and overfitting. We also identify avenues of improvement for model architectures, such as the choice of activation function, which can drastically affect model generalization to "worst-case" scenarios, which is not observed when reporting task-specific errors. We also incorporate this metric into the optimization process through regularization, which improves the models' generalizability over the multitask PDE problem. △ Less

Submitted 4 September, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

arXiv:2311.04465 [pdf, other]

Solving High Frequency and Multi-Scale PDEs with Gaussian Processes

Authors: Shikai Fang, Madison Cooley, Da Long, Shibo Li, Robert Kirby, Shandian Zhe

Abstract: Machine learning based solvers have garnered much attention in physical simulation and scientific computing, with a prominent example, physics-informed neural networks (PINNs). However, PINNs often struggle to solve high-frequency and multi-scale PDEs, which can be due to spectral bias during neural network training. To address this problem, we resort to the Gaussian process (GP) framework. To fle… ▽ More Machine learning based solvers have garnered much attention in physical simulation and scientific computing, with a prominent example, physics-informed neural networks (PINNs). However, PINNs often struggle to solve high-frequency and multi-scale PDEs, which can be due to spectral bias during neural network training. To address this problem, we resort to the Gaussian process (GP) framework. To flexibly capture the dominant frequencies, we model the power spectrum of the PDE solution with a student $t$ mixture or Gaussian mixture. We apply the inverse Fourier transform to obtain the covariance function (by Wiener-Khinchin theorem). The covariance derived from the Gaussian mixture spectrum corresponds to the known spectral mixture kernel. Next, we estimate the mixture weights in the log domain, which we show is equivalent to placing a Jeffreys prior. It automatically induces sparsity, prunes excessive frequencies, and adjusts the remaining toward the ground truth. Third, to enable efficient and scalable computation on massive collocation points, which are critical to capture high frequencies, we place the collocation points on a grid, and multiply our covariance function at each input dimension. We use the GP conditional mean to predict the solution and its derivatives so as to fit the boundary condition and the equation itself. As a result, we can derive a Kronecker product structure in the covariance matrix. We use Kronecker product properties and multilinear algebra to promote computational efficiency and scalability, without low-rank approximations. We show the advantage of our method in systematic experiments. The code is released at \url{https://github.com/xuangu-fang/Gaussian-Process-Slover-for-High-Freq-PDE}. △ Less

Submitted 18 March, 2024; v1 submitted 8 November, 2023; originally announced November 2023.

Journal ref: The Twelfth International Conference on Learning Representations (ICLR 2024)

arXiv:2311.00176 [pdf, other]

ChipNeMo: Domain-Adapted LLMs for Chip Design

Authors: Mingjie Liu, Teodor-Dumitru Ene, Robert Kirby, Chris Cheng, Nathaniel Pinckney, Rongjian Liang, Jonah Alben, Himyanshu Anand, Sanmitra Banerjee, Ismet Bayraktaroglu, Bonita Bhaskaran, Bryan Catanzaro, Arjun Chaudhuri, Sharon Clay, Bill Dally, Laura Dang, Parikshit Deshpande, Siddhanth Dhodhi, Sameer Halepete, Eric Hill, Jiashang Hu, Sumit Jain, Ankit Jindal, Brucek Khailany, George Kokai , et al. (17 additional authors not shown)

Abstract: ChipNeMo aims to explore the applications of large language models (LLMs) for industrial chip design. Instead of directly deploying off-the-shelf commercial or open-source LLMs, we instead adopt the following domain adaptation techniques: domain-adaptive tokenization, domain-adaptive continued pretraining, model alignment with domain-specific instructions, and domain-adapted retrieval models. We e… ▽ More ChipNeMo aims to explore the applications of large language models (LLMs) for industrial chip design. Instead of directly deploying off-the-shelf commercial or open-source LLMs, we instead adopt the following domain adaptation techniques: domain-adaptive tokenization, domain-adaptive continued pretraining, model alignment with domain-specific instructions, and domain-adapted retrieval models. We evaluate these methods on three selected LLM applications for chip design: an engineering assistant chatbot, EDA script generation, and bug summarization and analysis. Our evaluations demonstrate that domain-adaptive pretraining of language models, can lead to superior performance in domain related downstream tasks compared to their base LLaMA2 counterparts, without degradations in generic capabilities. In particular, our largest model, ChipNeMo-70B, outperforms the highly capable GPT-4 on two of our use cases, namely engineering assistant chatbot and EDA scripts generation, while exhibiting competitive performance on bug summarization and analysis. These results underscore the potential of domain-specific customization for enhancing the effectiveness of large language models in specialized applications. △ Less

Submitted 4 April, 2024; v1 submitted 31 October, 2023; originally announced November 2023.

Comments: Updated results for ChipNeMo-70B model

arXiv:2310.17021 [pdf, other]

Streaming Factor Trajectory Learning for Temporal Tensor Decomposition

Authors: Shikai Fang, Xin Yu, Shibo Li, Zheng Wang, Robert Kirby, Shandian Zhe

Abstract: Practical tensor data is often along with time information. Most existing temporal decomposition approaches estimate a set of fixed factors for the objects in each tensor mode, and hence cannot capture the temporal evolution of the objects' representation. More important, we lack an effective approach to capture such evolution from streaming data, which is common in real-world applications. To add… ▽ More Practical tensor data is often along with time information. Most existing temporal decomposition approaches estimate a set of fixed factors for the objects in each tensor mode, and hence cannot capture the temporal evolution of the objects' representation. More important, we lack an effective approach to capture such evolution from streaming data, which is common in real-world applications. To address these issues, we propose Streaming Factor Trajectory Learning for temporal tensor decomposition. We use Gaussian processes (GPs) to model the trajectory of factors so as to flexibly estimate their temporal evolution. To address the computational challenges in handling streaming data, we convert the GPs into a state-space prior by constructing an equivalent stochastic differential equation (SDE). We develop an efficient online filtering algorithm to estimate a decoupled running posterior of the involved factor states upon receiving new data. The decoupled estimation enables us to conduct standard Rauch-Tung-Striebel smoothing to compute the full posterior of all the trajectories in parallel, without the need for revisiting any previous data. We have shown the advantage of SFTL in both synthetic tasks and real-world applications. The code is available at {https://github.com/xuangu-fang/Streaming-Factor-Trajectory-Learning}. △ Less

Submitted 7 November, 2023; v1 submitted 25 October, 2023; originally announced October 2023.

arXiv:2310.08818 [pdf, other]

Algorithm xxxx: HiPPIS A High-Order Positivity-Preserving Mapping Software for Structured Meshes

Authors: Timbwaoga A. J. Ouermi, Robert M Kirby, Martin Berzins

Abstract: Polynomial interpolation is an important component of many computational problems. In several of these computational problems, failure to preserve positivity when using polynomials to approximate or map data values between meshes can lead to negative unphysical quantities. Currently, most polynomial-based methods for enforcing positivity are based on splines and polynomial rescaling. The spline-ba… ▽ More Polynomial interpolation is an important component of many computational problems. In several of these computational problems, failure to preserve positivity when using polynomials to approximate or map data values between meshes can lead to negative unphysical quantities. Currently, most polynomial-based methods for enforcing positivity are based on splines and polynomial rescaling. The spline-based approaches build interpolants that are positive over the intervals in which they are defined and may require solving a minimization problem and/or system of equations. The linear polynomial rescaling methods allow for high-degree polynomials but enforce positivity only at limited locations (e.g., quadrature nodes). This work introduces open-source software (HiPPIS) for high-order data-bounded interpolation (DBI) and positivity-preserving interpolation (PPI) that addresses the limitations of both the spline and polynomial rescaling methods. HiPPIS is suitable for approximating and mapping physical quantities such as mass, density, and concentration between meshes while preserving positivity. This work provides Fortran and Matlab implementations of the DBI and PPI methods, presents an analysis of the mapping error in the context of PDEs, and uses several 1D and 2D numerical examples to demonstrate the benefits and limitations of HiPPIS. △ Less

Submitted 12 October, 2023; originally announced October 2023.

MSC Class: 65D05; 65D15

arXiv:2310.05387 [pdf, other]

Equation Discovery with Bayesian Spike-and-Slab Priors and Efficient Kernels

Authors: Da Long, Wei W. Xing, Aditi S. Krishnapriyan, Robert M. Kirby, Shandian Zhe, Michael W. Mahoney

Abstract: Discovering governing equations from data is important to many scientific and engineering applications. Despite promising successes, existing methods are still challenged by data sparsity and noise issues, both of which are ubiquitous in practice. Moreover, state-of-the-art methods lack uncertainty quantification and/or are costly in training. To overcome these limitations, we propose a novel equa… ▽ More Discovering governing equations from data is important to many scientific and engineering applications. Despite promising successes, existing methods are still challenged by data sparsity and noise issues, both of which are ubiquitous in practice. Moreover, state-of-the-art methods lack uncertainty quantification and/or are costly in training. To overcome these limitations, we propose a novel equation discovery method based on Kernel learning and BAyesian Spike-and-Slab priors (KBASS). We use kernel regression to estimate the target function, which is flexible, expressive, and more robust to data sparsity and noises. We combine it with a Bayesian spike-and-slab prior -- an ideal Bayesian sparse distribution -- for effective operator selection and uncertainty quantification. We develop an expectation-propagation expectation-maximization (EP-EM) algorithm for efficient posterior inference and function estimation. To overcome the computational challenge of kernel regression, we place the function values on a mesh and induce a Kronecker product construction, and we use tensor algebra to enable efficient computation and optimization. We show the advantages of KBASS on a list of benchmark ODE and PDE discovery tasks. △ Less

Submitted 21 April, 2024; v1 submitted 8 October, 2023; originally announced October 2023.

arXiv:2307.04909 [pdf, other]

Planar Curve Registration using Bayesian Inversion

Authors: Andreas Bock, Colin J. Cotter, Robert C. Kirby

Abstract: We study parameterisation-independent closed planar curve matching as a Bayesian inverse problem. The motion of the curve is modelled via a curve on the diffeomorphism group acting on the ambient space, leading to a large deformation diffeomorphic metric mapping (LDDMM) functional penalising the kinetic energy of the deformation. We solve Hamilton's equations for the curve matching problem using t… ▽ More We study parameterisation-independent closed planar curve matching as a Bayesian inverse problem. The motion of the curve is modelled via a curve on the diffeomorphism group acting on the ambient space, leading to a large deformation diffeomorphic metric mapping (LDDMM) functional penalising the kinetic energy of the deformation. We solve Hamilton's equations for the curve matching problem using the Wu-Xu element [S. Wu, J. Xu, Nonconforming finite element spaces for $2m^\text{th}$ order partial differential equations on $\mathbb{R}^n$ simplicial grids when $m=n+1$, Mathematics of Computation 88 (316) (2019) 531-551] which provides mesh-independent Lipschitz constants for the forward motion of the curve, and solve the inverse problem for the momentum using Bayesian inversion. Since this element is not affine-equivalent we provide a pullback theory which expedites the implementation and efficiency of the forward map. We adopt ensemble Kalman inversion using a negative Sobolev norm mismatch penalty to measure the discrepancy between the target and the ensemble mean shape. We provide several numerical examples to validate the approach. △ Less

Submitted 10 July, 2023; originally announced July 2023.

Comments: 45 pages, 9 figures

arXiv:2304.03297 [pdf, other]

Neural Operator Learning for Ultrasound Tomography Inversion

Authors: Haocheng Dai, Michael Penwarden, Robert M. Kirby, Sarang Joshi

Abstract: Neural operator learning as a means of mapping between complex function spaces has garnered significant attention in the field of computational science and engineering (CS&E). In this paper, we apply Neural operator learning to the time-of-flight ultrasound computed tomography (USCT) problem. We learn the mapping between time-of-flight (TOF) data and the heterogeneous sound speed field using a ful… ▽ More Neural operator learning as a means of mapping between complex function spaces has garnered significant attention in the field of computational science and engineering (CS&E). In this paper, we apply Neural operator learning to the time-of-flight ultrasound computed tomography (USCT) problem. We learn the mapping between time-of-flight (TOF) data and the heterogeneous sound speed field using a full-wave solver to generate the training data. This novel application of operator learning circumnavigates the need to solve the computationally intensive iterative inverse problem. The operator learns the non-linear mapping offline and predicts the heterogeneous sound field with a single forward pass through the model. This is the first time operator learning has been used for ultrasound tomography and is the first step in potential real-time predictions of soft tissue distribution for tumor identification in beast imaging. △ Less

Submitted 28 May, 2023; v1 submitted 6 April, 2023; originally announced April 2023.

Comments: 4 pages, 1 figure

arXiv:2302.14227 [pdf, other]

doi 10.1016/j.jcp.2023.112464

A unified scalable framework for causal sweeping strategies for Physics-Informed Neural Networks (PINNs) and their temporal decompositions

Authors: Michael Penwarden, Ameya D. Jagtap, Shandian Zhe, George Em Karniadakis, Robert M. Kirby

Abstract: Physics-informed neural networks (PINNs) as a means of solving partial differential equations (PDE) have garnered much attention in the Computational Science and Engineering (CS&E) world. However, a recent topic of interest is exploring various training (i.e., optimization) challenges - in particular, arriving at poor local minima in the optimization landscape results in a PINN approximation givin… ▽ More Physics-informed neural networks (PINNs) as a means of solving partial differential equations (PDE) have garnered much attention in the Computational Science and Engineering (CS&E) world. However, a recent topic of interest is exploring various training (i.e., optimization) challenges - in particular, arriving at poor local minima in the optimization landscape results in a PINN approximation giving an inferior, and sometimes trivial, solution when solving forward time-dependent PDEs with no data. This problem is also found in, and in some sense more difficult, with domain decomposition strategies such as temporal decomposition using XPINNs. We furnish examples and explanations for different training challenges, their cause, and how they relate to information propagation and temporal decomposition. We then propose a new stacked-decomposition method that bridges the gap between time-marching PINNs and XPINNs. We also introduce significant computational speed-ups by using transfer learning concepts to initialize subnetworks in the domain and loss tolerance-based propagation for the subdomains. Finally, we formulate a new time-sweeping collocation point algorithm inspired by the previous PINNs causality literature, which our framework can still describe, and provides a significant computational speed-up via reduced-cost collocation point segmentation. The proposed methods form our unified framework, which overcomes training challenges in PINNs and XPINNs for time-dependent PDEs by respecting the causality in multiple forms and improving scalability by limiting the computation required per optimization iteration. Finally, we provide numerical results for these methods on baseline PDE problems for which unmodified PINNs and XPINNs struggle to train. △ Less

Submitted 18 September, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

Journal ref: Journal of Computational Physics, 493, 2023, 112464

arXiv:2302.03175 [pdf, other]

Genetic Programming Based Symbolic Regression for Analytical Solutions to Differential Equations

Authors: Hongsup Oh, Roman Amici, Geoffrey Bomarito, Shandian Zhe, Robert Kirby, Jacob Hochhalter

Abstract: In this paper, we present a machine learning method for the discovery of analytic solutions to differential equations. The method utilizes an inherently interpretable algorithm, genetic programming based symbolic regression. Unlike conventional accuracy measures in machine learning we demonstrate the ability to recover true analytic solutions, as opposed to a numerical approximation. The method is… ▽ More In this paper, we present a machine learning method for the discovery of analytic solutions to differential equations. The method utilizes an inherently interpretable algorithm, genetic programming based symbolic regression. Unlike conventional accuracy measures in machine learning we demonstrate the ability to recover true analytic solutions, as opposed to a numerical approximation. The method is verified by assessing its ability to recover known analytic solutions for two separate differential equations. The developed method is compared to a conventional, purely data-driven genetic programming based symbolic regression algorithm. The reliability of successful evolution of the true solution, or an algebraic equivalent, is demonstrated. △ Less

Submitted 6 February, 2023; originally announced February 2023.

Comments: 14 pages, 9 figures

arXiv:2302.00807 [pdf, other]

Deep neural operators can serve as accurate surrogates for shape optimization: A case study for airfoils

Authors: Khemraj Shukla, Vivek Oommen, Ahmad Peyvan, Michael Penwarden, Luis Bravo, Anindya Ghoshal, Robert M. Kirby, George Em Karniadakis

Abstract: Deep neural operators, such as DeepONets, have changed the paradigm in high-dimensional nonlinear regression from function regression to (differential) operator regression, paving the way for significant changes in computational engineering applications. Here, we investigate the use of DeepONets to infer flow fields around unseen airfoils with the aim of shape optimization, an important design pro… ▽ More Deep neural operators, such as DeepONets, have changed the paradigm in high-dimensional nonlinear regression from function regression to (differential) operator regression, paving the way for significant changes in computational engineering applications. Here, we investigate the use of DeepONets to infer flow fields around unseen airfoils with the aim of shape optimization, an important design problem in aerodynamics that typically taxes computational resources heavily. We present results which display little to no degradation in prediction accuracy, while reducing the online optimization cost by orders of magnitude. We consider NACA airfoils as a test case for our proposed approach, as their shape can be easily defined by the four-digit parametrization. We successfully optimize the constrained NACA four-digit problem with respect to maximizing the lift-to-drag ratio and validate all results by comparing them to a high-order CFD solver. We find that DeepONets have low generalization error, making them ideal for generating solutions of unseen shapes. Specifically, pressure, density, and velocity fields are accurately inferred at a fraction of a second, hence enabling the use of general objective functions beyond the maximization of the lift-to-drag ratio considered in the current work. △ Less

Submitted 1 February, 2023; originally announced February 2023.

Comments: 21 pages, 14 Figures

arXiv:2210.12704 [pdf, other]

Batch Multi-Fidelity Active Learning with Budget Constraints

Authors: Shibo Li, Jeff M. Phillips, Xin Yu, Robert M. Kirby, Shandian Zhe

Abstract: Learning functions with high-dimensional outputs is critical in many applications, such as physical simulation and engineering design. However, collecting training examples for these applications is often costly, e.g. by running numerical solvers. The recent work (Li et al., 2022) proposes the first multi-fidelity active learning approach for high-dimensional outputs, which can acquire examples at… ▽ More Learning functions with high-dimensional outputs is critical in many applications, such as physical simulation and engineering design. However, collecting training examples for these applications is often costly, e.g. by running numerical solvers. The recent work (Li et al., 2022) proposes the first multi-fidelity active learning approach for high-dimensional outputs, which can acquire examples at different fidelities to reduce the cost while improving the learning performance. However, this method only queries at one pair of fidelity and input at a time, and hence has a risk to bring in strongly correlated examples to reduce the learning efficiency. In this paper, we propose Batch Multi-Fidelity Active Learning with Budget Constraints (BMFAL-BC), which can promote the diversity of training examples to improve the benefit-cost ratio, while respecting a given budget constraint for batch queries. Hence, our method can be more practically useful. Specifically, we propose a novel batch acquisition function that measures the mutual information between a batch of multi-fidelity queries and the target function, so as to penalize highly correlated queries and encourages diversity. The optimization of the batch acquisition function is challenging in that it involves a combinatorial search over many fidelities while subject to the budget constraint. To address this challenge, we develop a weighted greedy algorithm that can sequentially identify each (fidelity, input) pair, while achieving a near $(1 - 1/e)$-approximation of the optimum. We show the advantage of our method in several computational physics and engineering applications. △ Less

Submitted 23 October, 2022; originally announced October 2022.

arXiv:2210.12669 [pdf, other]

Meta Learning of Interface Conditions for Multi-Domain Physics-Informed Neural Networks

Authors: Shibo Li, Michael Penwarden, Yiming Xu, Conor Tillinghast, Akil Narayan, Robert M. Kirby, Shandian Zhe

Abstract: Physics-informed neural networks (PINNs) are emerging as popular mesh-free solvers for partial differential equations (PDEs). Recent extensions decompose the domain, apply different PINNs to solve the problem in each subdomain, and stitch the subdomains at the interface. Thereby, they can further alleviate the problem complexity, reduce the computational cost, and allow parallelization. However, t… ▽ More Physics-informed neural networks (PINNs) are emerging as popular mesh-free solvers for partial differential equations (PDEs). Recent extensions decompose the domain, apply different PINNs to solve the problem in each subdomain, and stitch the subdomains at the interface. Thereby, they can further alleviate the problem complexity, reduce the computational cost, and allow parallelization. However, the performance of multi-domain PINNs is sensitive to the choice of the interface conditions. While quite a few conditions have been proposed, there is no suggestion about how to select the conditions according to specific problems. To address this gap, we propose META Learning of Interface Conditions (METALIC), a simple, efficient yet powerful approach to dynamically determine appropriate interface conditions for solving a family of parametric PDEs. Specifically, we develop two contextual multi-arm bandit (MAB) models. The first one applies to the entire training course, and online updates a Gaussian process (GP) reward that given the PDE parameters and interface conditions predicts the performance. We prove a sub-linear regret bound for both UCB and Thompson sampling, which in theory guarantees the effectiveness of our MAB. The second one partitions the training into two stages, one is the stochastic phase and the other deterministic phase; we update a GP reward for each phase to enable different condition selections at the two stages to further bolster the flexibility and performance. We have shown the advantage of METALIC on four bench-mark PDE families. △ Less

Submitted 6 July, 2023; v1 submitted 23 October, 2022; originally announced October 2022.

arXiv:2208.00579 [pdf, other]

Momentum Transformer: Closing the Performance Gap Between Self-attention and Its Linearization

Authors: Tan Nguyen, Richard G. Baraniuk, Robert M. Kirby, Stanley J. Osher, Bao Wang

Abstract: Transformers have achieved remarkable success in sequence modeling and beyond but suffer from quadratic computational and memory complexities with respect to the length of the input sequence. Leveraging techniques include sparse and linear attention and hashing tricks; efficient transformers have been proposed to reduce the quadratic complexity of transformers but significantly degrade the accurac… ▽ More Transformers have achieved remarkable success in sequence modeling and beyond but suffer from quadratic computational and memory complexities with respect to the length of the input sequence. Leveraging techniques include sparse and linear attention and hashing tricks; efficient transformers have been proposed to reduce the quadratic complexity of transformers but significantly degrade the accuracy. In response, we first interpret the linear attention and residual connections in computing the attention map as gradient descent steps. We then introduce momentum into these components and propose the \emph{momentum transformer}, which utilizes momentum to improve the accuracy of linear transformers while maintaining linear memory and computational complexities. Furthermore, we develop an adaptive strategy to compute the momentum value for our model based on the optimal momentum for quadratic optimization. This adaptive momentum eliminates the need to search for the optimal momentum value and further enhances the performance of the momentum transformer. A range of experiments on both autoregressive and non-autoregressive tasks, including image generation and machine translation, demonstrate that the momentum transformer outperforms popular linear transformers in training efficiency and accuracy. △ Less

Submitted 31 July, 2022; originally announced August 2022.

Comments: 22 pages, 5 figures. arXiv admin note: substantial text overlap with arXiv:2110.07034

MSC Class: 65Pxx

arXiv:2207.04084 [pdf, other]

Adaptive Self-supervision Algorithms for Physics-informed Neural Networks

Authors: Shashank Subramanian, Robert M. Kirby, Michael W. Mahoney, Amir Gholami

Abstract: Physics-informed neural networks (PINNs) incorporate physical knowledge from the problem domain as a soft constraint on the loss function, but recent work has shown that this can lead to optimization difficulties. Here, we study the impact of the location of the collocation points on the trainability of these models. We find that the vanilla PINN performance can be significantly boosted by adaptin… ▽ More Physics-informed neural networks (PINNs) incorporate physical knowledge from the problem domain as a soft constraint on the loss function, but recent work has shown that this can lead to optimization difficulties. Here, we study the impact of the location of the collocation points on the trainability of these models. We find that the vanilla PINN performance can be significantly boosted by adapting the location of the collocation points as training proceeds. Specifically, we propose a novel adaptive collocation scheme which progressively allocates more collocation points (without increasing their number) to areas where the model is making higher errors (based on the gradient of the loss function in the domain). This, coupled with a judicious restarting of the training during any optimization stalls (by simply resampling the collocation points in order to adjust the loss landscape) leads to better estimates for the prediction error. We present results for several problems, including a 2D Poisson and diffusion-advection system with different forcing functions. We find that training vanilla PINNs for these problems can result in up to 70% prediction error in the solution, especially in the regime of low collocation points. In contrast, our adaptive schemes can achieve up to an order of magnitude smaller error, with similar computational complexity as the baseline. Furthermore, we find that the adaptive methods consistently perform on-par or slightly better than vanilla PINN method, even for large collocation point regimes. The code for all the experiments has been open sourced. △ Less

Submitted 8 July, 2022; originally announced July 2022.

Comments: 15 pages

arXiv:2207.00678 [pdf, other]

Infinite-Fidelity Coregionalization for Physical Simulation

Authors: Shibo Li, Zheng Wang, Robert M. Kirby, Shandian Zhe

Abstract: Multi-fidelity modeling and learning are important in physical simulation-related applications. It can leverage both low-fidelity and high-fidelity examples for training so as to reduce the cost of data generation while still achieving good performance. While existing approaches only model finite, discrete fidelities, in practice, the fidelity choice is often continuous and infinite, which can cor… ▽ More Multi-fidelity modeling and learning are important in physical simulation-related applications. It can leverage both low-fidelity and high-fidelity examples for training so as to reduce the cost of data generation while still achieving good performance. While existing approaches only model finite, discrete fidelities, in practice, the fidelity choice is often continuous and infinite, which can correspond to a continuous mesh spacing or finite element length. In this paper, we propose Infinite Fidelity Coregionalization (IFC). Given the data, our method can extract and exploit rich information within continuous, infinite fidelities to bolster the prediction accuracy. Our model can interpolate and/or extrapolate the predictions to novel fidelities, which can be even higher than the fidelities of training data. Specifically, we introduce a low-dimensional latent output as a continuous function of the fidelity and input, and multiple it with a basis matrix to predict high-dimensional solution outputs. We model the latent output as a neural Ordinary Differential Equation (ODE) to capture the complex relationships within and integrate information throughout the continuous fidelities. We then use Gaussian processes or another ODE to estimate the fidelity-varying bases. For efficient inference, we reorganize the bases as a tensor, and use a tensor-Gaussian variational posterior to develop a scalable inference algorithm for massive outputs. We show the advantage of our method in several benchmark tasks in computational physics. △ Less

Submitted 23 October, 2022; v1 submitted 1 July, 2022; originally announced July 2022.

arXiv:2205.07000 [pdf, other]

doi 10.1109/DAC18074.2021.9586094

PrefixRL: Optimization of Parallel Prefix Circuits using Deep Reinforcement Learning

Authors: Rajarshi Roy, Jonathan Raiman, Neel Kant, Ilyas Elkin, Robert Kirby, Michael Siu, Stuart Oberman, Saad Godil, Bryan Catanzaro

Abstract: In this work, we present a reinforcement learning (RL) based approach to designing parallel prefix circuits such as adders or priority encoders that are fundamental to high-performance digital design. Unlike prior methods, our approach designs solutions tabula rasa purely through learning with synthesis in the loop. We design a grid-based state-action representation and an RL environment for const… ▽ More In this work, we present a reinforcement learning (RL) based approach to designing parallel prefix circuits such as adders or priority encoders that are fundamental to high-performance digital design. Unlike prior methods, our approach designs solutions tabula rasa purely through learning with synthesis in the loop. We design a grid-based state-action representation and an RL environment for constructing legal prefix circuits. Deep Convolutional RL agents trained on this environment produce prefix adder circuits that Pareto-dominate existing baselines with up to 16.0% and 30.2% lower area for the same delay in the 32b and 64b settings respectively. We observe that agents trained with open-source synthesis tools and cell library can design adder circuits that achieve lower area and delay than commercial tool adders in an industrial cell library. △ Less

Submitted 14 May, 2022; originally announced May 2022.

Comments: Copyright 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

Journal ref: ACM/IEEE Design Automation Conference (DAC), 2021, pp. 853-858

arXiv:2204.04273 [pdf, other]

Weight Matrix Dimensionality Reduction in Deep Learning via Kronecker Multi-layer Architectures

Authors: Jarom D. Hogue, Robert M. Kirby, Akil Narayan

Abstract: Deep learning using neural networks is an effective technique for generating models of complex data. However, training such models can be expensive when networks have large model capacity resulting from a large number of layers and nodes. For training in such a computationally prohibitive regime, dimensionality reduction techniques ease the computational burden, and allow implementations of more r… ▽ More Deep learning using neural networks is an effective technique for generating models of complex data. However, training such models can be expensive when networks have large model capacity resulting from a large number of layers and nodes. For training in such a computationally prohibitive regime, dimensionality reduction techniques ease the computational burden, and allow implementations of more robust networks. We propose a novel type of such dimensionality reduction via a new deep learning architecture based on fast matrix multiplication of a Kronecker product decomposition; in particular our network construction can be viewed as a Kronecker product-induced sparsification of an "extended" fully connected network. Analysis and practical examples show that this architecture allows a neural network to be trained and implemented with a significant reduction in computational time and resources, while achieving a similar error level compared to a traditional feedforward neural network. △ Less

Submitted 17 January, 2023; v1 submitted 8 April, 2022; originally announced April 2022.

MSC Class: 15A23; 15A69; 65F30; 68T05 ACM Class: G.1.3; I.2.6

arXiv:2202.12316 [pdf, other]

AutoIP: A United Framework to Integrate Physics into Gaussian Processes

Authors: Da Long, Zheng Wang, Aditi Krishnapriyan, Robert Kirby, Shandian Zhe, Michael Mahoney

Abstract: Physical modeling is critical for many modern science and engineering applications. From a data science or machine learning perspective, where more domain-agnostic, data-driven models are pervasive, physical knowledge -- often expressed as differential equations -- is valuable in that it is complementary to data, and it can potentially help overcome issues such as data sparsity, noise, and inaccur… ▽ More Physical modeling is critical for many modern science and engineering applications. From a data science or machine learning perspective, where more domain-agnostic, data-driven models are pervasive, physical knowledge -- often expressed as differential equations -- is valuable in that it is complementary to data, and it can potentially help overcome issues such as data sparsity, noise, and inaccuracy. In this work, we propose a simple, yet powerful and general framework -- AutoIP, for Automatically Incorporating Physics -- that can integrate all kinds of differential equations into Gaussian Processes (GPs) to enhance prediction accuracy and uncertainty quantification. These equations can be linear or nonlinear, spatial, temporal, or spatio-temporal, complete or incomplete with unknown source terms, and so on. Based on kernel differentiation, we construct a GP prior to sample the values of the target function, equation-related derivatives, and latent source functions, which are all jointly from a multivariate Gaussian distribution. The sampled values are fed to two likelihoods: one to fit the observations, and the other to conform to the equation. We use the whitening method to evade the strong dependency between the sampled function values and kernel parameters, and we develop a stochastic variational learning algorithm. AutoIP shows improvement upon vanilla GPs in both simulation and several real-world applications, even using rough, incomplete equations. △ Less

Submitted 20 July, 2022; v1 submitted 24 February, 2022; originally announced February 2022.

arXiv:2202.04137 [pdf, other]

Machine Learning in Heterogeneous Porous Materials

Authors: Marta D'Elia, Hang Deng, Cedric Fraces, Krishna Garikipati, Lori Graham-Brady, Amanda Howard, George Karniadakis, Vahid Keshavarzzadeh, Robert M. Kirby, Nathan Kutz, Chunhui Li, Xing Liu, Hannah Lu, Pania Newell, Daniel O'Malley, Masa Prodanovic, Gowri Srinivasan, Alexandre Tartakovsky, Daniel M. Tartakovsky, Hamdi Tchelepi, Bozo Vazic, Hari Viswanathan, Hongkyu Yoon, Piotr Zarzycki

Abstract: The "Workshop on Machine learning in heterogeneous porous materials" brought together international scientific communities of applied mathematics, porous media, and material sciences with experts in the areas of heterogeneous materials, machine learning (ML) and applied mathematics to identify how ML can advance materials research. Within the scope of ML and materials research, the goal of the wor… ▽ More The "Workshop on Machine learning in heterogeneous porous materials" brought together international scientific communities of applied mathematics, porous media, and material sciences with experts in the areas of heterogeneous materials, machine learning (ML) and applied mathematics to identify how ML can advance materials research. Within the scope of ML and materials research, the goal of the workshop was to discuss the state-of-the-art in each community, promote crosstalk and accelerate multi-disciplinary collaborative research, and identify challenges and opportunities. As the end result, four topic areas were identified: ML in predicting materials properties, and discovery and design of novel materials, ML in porous and fractured media and time-dependent phenomena, Multi-scale modeling in heterogeneous porous materials via ML, and Discovery of materials constitutive laws and new governing equations. This workshop was part of the AmeriMech Symposium series sponsored by the National Academies of Sciences, Engineering and Medicine and the U.S. National Committee on Theoretical and Applied Mechanics. △ Less

Submitted 4 February, 2022; originally announced February 2022.

Comments: The workshop link is: https://amerimech.mech.utah.edu

arXiv:2201.00888 [pdf, other]

GP-HMAT: Scalable, ${O}(n\log(n))$ Gaussian Process Regression with Hierarchical Low-Rank Matrices

Authors: Vahid Keshavarzzadeh, Shandian Zhe, Robert M. Kirby, Akil Narayan

Abstract: A Gaussian process (GP) is a powerful and widely used regression technique. The main building block of a GP regression is the covariance kernel, which characterizes the relationship between pairs in the random field. The optimization to find the optimal kernel, however, requires several large-scale and often unstructured matrix inversions. We tackle this challenge by introducing a hierarchical mat… ▽ More A Gaussian process (GP) is a powerful and widely used regression technique. The main building block of a GP regression is the covariance kernel, which characterizes the relationship between pairs in the random field. The optimization to find the optimal kernel, however, requires several large-scale and often unstructured matrix inversions. We tackle this challenge by introducing a hierarchical matrix approach, named HMAT, which effectively decomposes the matrix structure, in a recursive manner, into significantly smaller matrices where a direct approach could be used for inversion. Our matrix partitioning uses a particular aggregation strategy for data points, which promotes the low-rank structure of off-diagonal blocks in the hierarchical kernel matrix. We employ a randomized linear algebra method for matrix reduction on the low-rank off-diagonal blocks without factorizing a large matrix. We provide analytical error and cost estimates for the inversion of the matrix, investigate them empirically with numerical computations, and demonstrate the application of our approach on three numerical examples involving GP regression for engineering problems and a large-scale real dataset. We provide the computer implementation of GP-HMAT, HMAT adapted for GP likelihood and derivative computations, and the implementation of the last numerical example on a real dataset. We demonstrate superior scalability of the HMAT approach compared to built-in $\backslash$ operator in MATLAB for large-scale linear solves $\bf{A}\bf{x} = \bf{y}$ via a repeatable and verifiable empirical study. An extension to hierarchical semiseparable (HSS) matrices is discussed as future research. △ Less

Submitted 17 December, 2021; originally announced January 2022.

arXiv:2110.13361 [pdf, ps, other]

doi 10.1016/j.jcp.2023.11191211912

A Metalearning Approach for Physics-Informed Neural Networks (PINNs): Application to Parameterized PDEs

Authors: Michael Penwarden, Shandian Zhe, Akil Narayan, Robert M. Kirby

Abstract: Physics-informed neural networks (PINNs) as a means of discretizing partial differential equations (PDEs) are garnering much attention in the Computational Science and Engineering (CS&E) world. At least two challenges exist for PINNs at present: an understanding of accuracy and convergence characteristics with respect to tunable parameters and identification of optimization strategies that make PI… ▽ More Physics-informed neural networks (PINNs) as a means of discretizing partial differential equations (PDEs) are garnering much attention in the Computational Science and Engineering (CS&E) world. At least two challenges exist for PINNs at present: an understanding of accuracy and convergence characteristics with respect to tunable parameters and identification of optimization strategies that make PINNs as efficient as other computational science tools. The cost of PINNs training remains a major challenge of Physics-informed Machine Learning (PiML) - and, in fact, machine learning (ML) in general. This paper is meant to move towards addressing the latter through the study of PINNs on new tasks, for which parameterized PDEs provides a good testbed application as tasks can be easily defined in this context. Following the ML world, we introduce metalearning of PINNs with application to parameterized PDEs. By introducing metalearning and transfer learning concepts, we can greatly accelerate the PINNs optimization process. We present a survey of model-agnostic metalearning, and then discuss our model-aware metalearning applied to PINNs as well as implementation considerations and algorithmic complexity. We then test our approach on various canonical forward parameterized PDEs that have been presented in the emerging PINNs literature. △ Less

Submitted 19 January, 2023; v1 submitted 25 October, 2021; originally announced October 2021.

Journal ref: Journal of Computational Physics, Volume 477, 2023, 111912

arXiv:2110.08432 [pdf, other]

Meta-Learning with Adjoint Methods

Authors: Shibo Li, Zheng Wang, Akil Narayan, Robert Kirby, Shandian Zhe

Abstract: Model Agnostic Meta Learning (MAML) is widely used to find a good initialization for a family of tasks. Despite its success, a critical challenge in MAML is to calculate the gradient w.r.t. the initialization of a long training trajectory for the sampled tasks, because the computation graph can rapidly explode and the computational cost is very expensive. To address this problem, we propose Adjoin… ▽ More Model Agnostic Meta Learning (MAML) is widely used to find a good initialization for a family of tasks. Despite its success, a critical challenge in MAML is to calculate the gradient w.r.t. the initialization of a long training trajectory for the sampled tasks, because the computation graph can rapidly explode and the computational cost is very expensive. To address this problem, we propose Adjoint MAML (A-MAML). We view gradient descent in the inner optimization as the evolution of an Ordinary Differential Equation (ODE). To efficiently compute the gradient of the validation loss w.r.t. the initialization, we use the adjoint method to construct a companion, backward ODE. To obtain the gradient w.r.t. the initialization, we only need to run the standard ODE solver twice -- one is forward in time that evolves a long trajectory of gradient flow for the sampled task; the other is backward and solves the adjoint ODE. We need not create or expand any intermediate computational graphs, adopt aggressive approximations, or impose proximal regularizers in the training loss. Our approach is cheap, accurate, and adaptable to different trajectory lengths. We demonstrate the advantage of our approach in both synthetic and real-world meta-learning tasks. △ Less

Submitted 24 February, 2023; v1 submitted 15 October, 2021; originally announced October 2021.

arXiv:2109.02631 [pdf, other]

Guiding Global Placement With Reinforcement Learning

Authors: Robert Kirby, Kolby Nottingham, Rajarshi Roy, Saad Godil, Bryan Catanzaro

Abstract: Recent advances in GPU accelerated global and detail placement have reduced the time to solution by an order of magnitude. This advancement allows us to leverage data driven optimization (such as Reinforcement Learning) in an effort to improve the final quality of placement results. In this work we augment state-of-the-art, force-based global placement solvers with a reinforcement learning agent t… ▽ More Recent advances in GPU accelerated global and detail placement have reduced the time to solution by an order of magnitude. This advancement allows us to leverage data driven optimization (such as Reinforcement Learning) in an effort to improve the final quality of placement results. In this work we augment state-of-the-art, force-based global placement solvers with a reinforcement learning agent trained to improve the final detail placed Half Perimeter Wire Length (HPWL). We propose novel control schemes with either global or localized control of the placement process. We then train reinforcement learning agents to use these controls to guide placement to improved solutions. In both cases, the augmented optimizer finds improved placement solutions. Our trained agents achieve an average 1% improvement in final detail place HPWL across a range of academic benchmarks and more than 1% in global place HPWL on real industry designs. △ Less

Submitted 6 September, 2021; originally announced September 2021.

ACM Class: B.7.2

arXiv:2109.01050 [pdf, other]

Characterizing possible failure modes in physics-informed neural networks

Authors: Aditi S. Krishnapriyan, Amir Gholami, Shandian Zhe, Robert M. Kirby, Michael W. Mahoney

Abstract: Recent work in scientific machine learning has developed so-called physics-informed neural network (PINN) models. The typical approach is to incorporate physical domain knowledge as soft constraints on an empirical loss function and use existing machine learning methodologies to train the model. We demonstrate that, while existing PINN methodologies can learn good models for relatively trivial pro… ▽ More Recent work in scientific machine learning has developed so-called physics-informed neural network (PINN) models. The typical approach is to incorporate physical domain knowledge as soft constraints on an empirical loss function and use existing machine learning methodologies to train the model. We demonstrate that, while existing PINN methodologies can learn good models for relatively trivial problems, they can easily fail to learn relevant physical phenomena for even slightly more complex problems. In particular, we analyze several distinct situations of widespread physical interest, including learning differential equations with convection, reaction, and diffusion operators. We provide evidence that the soft regularization in PINNs, which involves PDE-based differential operators, can introduce a number of subtle problems, including making the problem more ill-conditioned. Importantly, we show that these possible failure modes are not due to the lack of expressivity in the NN architecture, but that the PINN's setup makes the loss landscape very hard to optimize. We then describe two promising solutions to address these failure modes. The first approach is to use curriculum regularization, where the PINN's loss term starts from a simple PDE regularization, and becomes progressively more complex as the NN gets trained. The second approach is to pose the problem as a sequence-to-sequence learning task, rather than learning to predict the entire space-time at once. Extensive testing shows that we can achieve up to 1-2 orders of magnitude lower error with these methods as compared to regular PINN training. △ Less

Submitted 11 November, 2021; v1 submitted 2 September, 2021; originally announced September 2021.

Comments: 22 pages

Journal ref: NeurIPS 2021

arXiv:2108.10946 [pdf, other]

Full waveform inversion using triangular waveform adapted meshes

Authors: Keith J. Roberts, Alexandre Olender, Lucas Franceschini, Robert C. Kirby, Rafael S. Gioria, Bruno S. Carmo

Abstract: In this article, continuous Galerkin finite elements are applied to perform full waveform inversion (FWI) for seismic velocity model building. A time-domain FWI approach is detailed that uses meshes composed of variably sized triangular elements to discretize the domain. To resolve both the forward and adjoint-state equations, and to calculate a mesh-independent gradient associated with the FWI pr… ▽ More In this article, continuous Galerkin finite elements are applied to perform full waveform inversion (FWI) for seismic velocity model building. A time-domain FWI approach is detailed that uses meshes composed of variably sized triangular elements to discretize the domain. To resolve both the forward and adjoint-state equations, and to calculate a mesh-independent gradient associated with the FWI process, a fully-explicit, variable higher-order (up to degree $k=5$ in $2$D and $k=3$ in 3D) mass lumping method is used. By adapting the triangular elements to the expected peak source frequency and properties of the wavefield (e.g., local P-wavespeed) and by leveraging higher-order basis functions, the number of degrees-of-freedom necessary to discretize the domain can be reduced. Results from wave simulations and FWIs in both $2$D and 3D highlight our developments and demonstrate the benefits and challenges with using triangular meshes adapted to the material proprieties. Software developments are implemented an open source code built on top of Firedrake, a high-level Python package for the automated solution of partial differential equations using the finite element method. △ Less

Submitted 24 August, 2021; originally announced August 2021.

arXiv:2107.08093 [pdf, other]

doi 10.1109/TVCG.2021.3093776

Particle Merging-and-Splitting

Authors: Nghia Truong, Cem Yuksel, Chakrit Watcharopas, Joshua A. Levine, Robert M. Kirby

Abstract: Robustly handling collisions between individual particles in a large particle-based simulation has been a challenging problem. We introduce particle merging-and-splitting, a simple scheme for robustly handling collisions between particles that prevents inter-penetrations of separate objects without introducing numerical instabilities. This scheme merges colliding particles at the beginning of the… ▽ More Robustly handling collisions between individual particles in a large particle-based simulation has been a challenging problem. We introduce particle merging-and-splitting, a simple scheme for robustly handling collisions between particles that prevents inter-penetrations of separate objects without introducing numerical instabilities. This scheme merges colliding particles at the beginning of the time-step and then splits them at the end of the time-step. Thus, collisions last for the duration of a time-step, allowing neighboring particles of the colliding particles to influence each other. We show that our merging-and-splitting method is effective in robustly handling collisions and avoiding penetrations in particle-based simulations. We also show how our merging-and-splitting approach can be used for coupling different simulation systems using different and otherwise incompatible integrators. We present simulation tests involving complex solid-fluid interactions, including solid fractures generated by fluid interactions. △ Less

Submitted 16 July, 2021; originally announced July 2021.

Comments: IEEE Trans. Vis. Comput. Graph

arXiv:2106.13361 [pdf, other]

doi 10.1016/j.jcp.2021.110844

Multifidelity Modeling for Physics-Informed Neural Networks (PINNs)

Authors: Michael Penwarden, Shandian Zhe, Akil Narayan, Robert M. Kirby

Abstract: Multifidelity simulation methodologies are often used in an attempt to judiciously combine low-fidelity and high-fidelity simulation results in an accuracy-increasing, cost-saving way. Candidates for this approach are simulation methodologies for which there are fidelity differences connected with significant computational cost differences. Physics-informed Neural Networks (PINNs) are candidates f… ▽ More Multifidelity simulation methodologies are often used in an attempt to judiciously combine low-fidelity and high-fidelity simulation results in an accuracy-increasing, cost-saving way. Candidates for this approach are simulation methodologies for which there are fidelity differences connected with significant computational cost differences. Physics-informed Neural Networks (PINNs) are candidates for these types of approaches due to the significant difference in training times required when different fidelities (expressed in terms of architecture width and depth as well as optimization criteria) are employed. In this paper, we propose a particular multifidelity approach applied to PINNs that exploits low-rank structure. We demonstrate that width, depth, and optimization criteria can be used as parameters related to model fidelity, and show numerical justification of cost differences in training due to fidelity parameter choices. We test our multifidelity scheme on various canonical forward PDE models that have been presented in the emerging PINNs literature. △ Less

Submitted 5 January, 2023; v1 submitted 24 June, 2021; originally announced June 2021.

Journal ref: Journal of Computational Physics Volume 451, 15 February 2022, 110844

arXiv:2106.09884 [pdf, other]

Batch Multi-Fidelity Bayesian Optimization with Deep Auto-Regressive Networks

Authors: Shibo Li, Robert M. Kirby, Shandian Zhe

Abstract: Bayesian optimization (BO) is a powerful approach for optimizing black-box, expensive-to-evaluate functions. To enable a flexible trade-off between the cost and accuracy, many applications allow the function to be evaluated at different fidelities. In order to reduce the optimization cost while maximizing the benefit-cost ratio, in this paper, we propose Batch Multi-fidelity Bayesian Optimization… ▽ More Bayesian optimization (BO) is a powerful approach for optimizing black-box, expensive-to-evaluate functions. To enable a flexible trade-off between the cost and accuracy, many applications allow the function to be evaluated at different fidelities. In order to reduce the optimization cost while maximizing the benefit-cost ratio, in this paper, we propose Batch Multi-fidelity Bayesian Optimization with Deep Auto-Regressive Networks (BMBO-DARN). We use a set of Bayesian neural networks to construct a fully auto-regressive model, which is expressive enough to capture strong yet complex relationships across all the fidelities, so as to improve the surrogate learning and optimization performance. Furthermore, to enhance the quality and diversity of queries, we develop a simple yet efficient batch querying method, without any combinatorial search over the fidelities. We propose a batch acquisition function based on Max-value Entropy Search (MES) principle, which penalizes highly correlated queries and encourages diversity. We use posterior samples and moment matching to fulfill efficient computation of the acquisition function and conduct alternating optimization over every fidelity-input pair, which guarantees an improvement at each step. We demonstrate the advantage of our approach on four real-world hyperparameter optimization applications. △ Less

Submitted 25 October, 2021; v1 submitted 17 June, 2021; originally announced June 2021.

arXiv:2104.14704 [pdf, other]

doi 10.1016/j.cma.2021.114221

Towards an Extrinsic, CG-XFEM Approach Based on Hierarchical Enrichments for Modeling Progressive Fracture

Authors: M. Keith Ballard, Roman Amici, Varun Shankar, Lauren A. Ferguson, Michael Braginsky, Robert M. Kirby

Abstract: We propose an extrinsic, continuous-Galerkin (CG), extended finite element method (XFEM) that generalizes the work of Hansbo and Hansbo to allow multiple Heaviside enrichments within a single element in a hierarchical manner. This approach enables complex, evolving XFEM surfaces in 3D that cannot be captured using existing CG-XFEM approaches. We describe an implementation of the method for 3D stat… ▽ More We propose an extrinsic, continuous-Galerkin (CG), extended finite element method (XFEM) that generalizes the work of Hansbo and Hansbo to allow multiple Heaviside enrichments within a single element in a hierarchical manner. This approach enables complex, evolving XFEM surfaces in 3D that cannot be captured using existing CG-XFEM approaches. We describe an implementation of the method for 3D static elasticity with linearized strain for modeling open cracks as a salient step towards modeling progressive fracture. The implementation includes a description of the finite element model, hybrid implicit/explicit representation of enrichments, numerical integration method, and novel degree-of-freedom (DoF) enumeration algorithm. This algorithm supports an arbitrary number of enrichments within an element, while simultaneously maintaining a CG solution across elements. Additionally, our approach easily allows an implementation suitable for distributed computing systems. Enabled by the DoF enumeration algorithm, the proposed method lays the groundwork for a computational tool that efficiently models progressive fracture. To facilitate a discussion of the complex enrichment hierarchies, we develop enrichment diagrams to succinctly describe and visualize the relationships between the enrichments (and the fields they create) within an element. This also provides a unified language for discussing extrinsic XFEM methods in the literature. We compare several methods, relying on the enrichment diagrams to highlight their nuanced differences. △ Less

Submitted 29 April, 2021; originally announced April 2021.

arXiv:2104.12986 [pdf, other]

doi 10.1145/3490485

Bringing Trimmed Serendipity Methods to Computational Practice in Firedrake

Authors: Justin Crum, Cyrus Cheng, David A. Ham, Lawrence Mitchell, Robert C. Kirby, Joshua A. Levine, Andrew Gillette

Abstract: We present an implementation of the trimmed serendipity finite element family, using the open source finite element package Firedrake. The new elements can be used seamlessly within the software suite for problems requiring $H^1$, \hcurl, or \hdiv-conforming elements on meshes of squares or cubes. To test how well trimmed serendipity elements perform in comparison to traditional tensor product ele… ▽ More We present an implementation of the trimmed serendipity finite element family, using the open source finite element package Firedrake. The new elements can be used seamlessly within the software suite for problems requiring $H^1$, \hcurl, or \hdiv-conforming elements on meshes of squares or cubes. To test how well trimmed serendipity elements perform in comparison to traditional tensor product elements, we perform a sequence of numerical experiments including the primal Poisson, mixed Poisson, and Maxwell cavity eigenvalue problems. Overall, we find that the trimmed serendipity elements converge, as expected, at the same rate as the respective tensor product elements while being able to offer significant savings in the time or memory required to solve certain problems. △ Less

Submitted 8 October, 2021; v1 submitted 27 April, 2021; originally announced April 2021.

Comments: 19 pages, 7 figures, 3 tables, 2 listings

Journal ref: ACM Transactions on Mathematical Software 48(1):8:1-8:19 (2022)

arXiv:2104.03743 [pdf, other]

Residual Gaussian Process: A Tractable Nonparametric Bayesian Emulator for Multi-fidelity Simulations

Authors: Wei W. Xing, Akeel A. Shah, Peng Wang, Shandian Zhe Qian Fu, Robert. M. Kirby

Abstract: Challenges in multi-fidelity modeling relate to accuracy, uncertainty estimation and high-dimensionality. A novel additive structure is introduced in which the highest fidelity solution is written as a sum of the lowest fidelity solution and residuals between the solutions at successive fidelity levels, with Gaussian process priors placed over the low fidelity solution and each of the residuals. T… ▽ More Challenges in multi-fidelity modeling relate to accuracy, uncertainty estimation and high-dimensionality. A novel additive structure is introduced in which the highest fidelity solution is written as a sum of the lowest fidelity solution and residuals between the solutions at successive fidelity levels, with Gaussian process priors placed over the low fidelity solution and each of the residuals. The resulting model is equipped with a closed-form solution for the predictive posterior, making it applicable to advanced, high-dimensional tasks that require uncertainty estimation. Its advantages are demonstrated on univariate benchmarks and on three challenging multivariate problems. It is shown how active learning can be used to enhance the model, especially with a limited computational budget. Furthermore, error bounds are derived for the mean prediction in the univariate case. △ Less

Submitted 8 April, 2021; originally announced April 2021.

arXiv:2012.14901 [pdf, other]

Visualization of topology optimization designs with representative subset selection

Authors: Daniel J Perry, Vahid Keshavarzzadeh, Shireen Y Elhabian, Robert M Kirby, Michael Gleicher, Ross T Whitaker

Abstract: An important new trend in additive manufacturing is the use of optimization to automatically design industrial objects, such as beams, rudders or wings. Topology optimization, as it is often called, computes the best configuration of material over a 3D space, typically represented as a grid, in order to satisfy or optimize physical parameters. Designers using these automated systems often seek to… ▽ More An important new trend in additive manufacturing is the use of optimization to automatically design industrial objects, such as beams, rudders or wings. Topology optimization, as it is often called, computes the best configuration of material over a 3D space, typically represented as a grid, in order to satisfy or optimize physical parameters. Designers using these automated systems often seek to understand the interaction of physical constraints with the final design and its implications for other physical characteristics. Such understanding is challenging because the space of designs is large and small changes in parameters can result in radically different designs. We propose to address these challenges using a visualization approach for exploring the space of design solutions. The core of our novel approach is to summarize the space (ensemble of solutions) by automatically selecting a set of examples and to represent the complete set of solutions as combinations of these examples. The representative examples create a meaningful parameterization of the design space that can be explored using standard visualization techniques for high-dimensional spaces. We present evaluations of our subset selection technique and that the overall approach addresses the needs of expert designers. △ Less

Submitted 29 December, 2020; originally announced December 2020.

Comments: 14 pages, 10 figures

ACM Class: I.3.8; G.1.3

arXiv:2012.00901 [pdf, other]

Deep Multi-Fidelity Active Learning of High-dimensional Outputs

Authors: Shibo Li, Robert M. Kirby, Shandian Zhe

Abstract: Many applications, such as in physical simulation and engineering design, demand we estimate functions with high-dimensional outputs. The training examples can be collected with different fidelities to allow a cost/accuracy trade-off. In this paper, we consider the active learning task that identifies both the fidelity and input to query new training examples so as to achieve the best benefit-cost… ▽ More Many applications, such as in physical simulation and engineering design, demand we estimate functions with high-dimensional outputs. The training examples can be collected with different fidelities to allow a cost/accuracy trade-off. In this paper, we consider the active learning task that identifies both the fidelity and input to query new training examples so as to achieve the best benefit-cost ratio. To this end, we propose DMFAL, a Deep Multi-Fidelity Active Learning approach. We first develop a deep neural network-based multi-fidelity model for learning with high-dimensional outputs, which can flexibly, efficiently capture all kinds of complex relationships across the outputs and fidelities to improve prediction. We then propose a mutual information-based acquisition function that extends the predictive entropy principle. To overcome the computational challenges caused by large output dimensions, we use multi-variate Delta's method and moment-matching to estimate the output posterior, and Weinstein-Aronszajn identity to calculate and optimize the acquisition function. The computation is tractable, reliable and efficient. We show the advantage of our method in several applications of computational physics and engineering design. △ Less

Submitted 25 October, 2021; v1 submitted 1 December, 2020; originally announced December 2020.

arXiv:2006.04976 [pdf, other]

Physics Informed Deep Kernel Learning

Authors: Zheng Wang, Wei Xing, Robert Kirby, Shandian Zhe

Abstract: Deep kernel learning is a promising combination of deep neural networks and nonparametric function learning. However, as a data driven approach, the performance of deep kernel learning can still be restricted by scarce or insufficient data, especially in extrapolation tasks. To address these limitations, we propose Physics Informed Deep Kernel Learning (PI-DKL) that exploits physics knowledge repr… ▽ More Deep kernel learning is a promising combination of deep neural networks and nonparametric function learning. However, as a data driven approach, the performance of deep kernel learning can still be restricted by scarce or insufficient data, especially in extrapolation tasks. To address these limitations, we propose Physics Informed Deep Kernel Learning (PI-DKL) that exploits physics knowledge represented by differential equations with latent sources. Specifically, we use the posterior function sample of the Gaussian process as the surrogate for the solution of the differential equation, and construct a generative component to integrate the equation in a principled Bayesian hybrid framework. For efficient and effective inference, we marginalize out the latent variables in the joint probability and derive a collapsed model evidence lower bound (ELBO), based on which we develop a stochastic model estimation algorithm. Our ELBO can be viewed as a nice, interpretable posterior regularization objective. On synthetic datasets and real-world applications, we show the advantage of our approach in both prediction accuracy and uncertainty quantification. △ Less

Submitted 18 January, 2022; v1 submitted 8 June, 2020; originally announced June 2020.

Comments: 8 pages, 5 figures, AISTATS

arXiv:2006.04972 [pdf, other]

Multi-Fidelity High-Order Gaussian Processes for Physical Simulation

Authors: Zheng Wang, Wei Xing, Robert Kirby, Shandian Zhe

Abstract: The key task of physical simulation is to solve partial differential equations (PDEs) on discretized domains, which is known to be costly. In particular, high-fidelity solutions are much more expensive than low-fidelity ones. To reduce the cost, we consider novel Gaussian process (GP) models that leverage simulation examples of different fidelities to predict high-dimensional PDE solution outputs.… ▽ More The key task of physical simulation is to solve partial differential equations (PDEs) on discretized domains, which is known to be costly. In particular, high-fidelity solutions are much more expensive than low-fidelity ones. To reduce the cost, we consider novel Gaussian process (GP) models that leverage simulation examples of different fidelities to predict high-dimensional PDE solution outputs. Existing GP methods are either not scalable to high-dimensional outputs or lack effective strategies to integrate multi-fidelity examples. To address these issues, we propose Multi-Fidelity High-Order Gaussian Process (MFHoGP) that can capture complex correlations both between the outputs and between the fidelities to enhance solution estimation, and scale to large numbers of outputs. Based on a novel nonlinear coregionalization model, MFHoGP propagates bases throughout fidelities to fuse information, and places a deep matrix GP prior over the basis weights to capture the (nonlinear) relationships across the fidelities. To improve inference efficiency and quality, we use bases decomposition to largely reduce the model parameters, and layer-wise matrix Gaussian posteriors to capture the posterior dependency and to simplify the computation. Our stochastic variational learning algorithm successfully handles millions of outputs without extra sparse approximations. We show the advantages of our method in several typical applications. △ Less

Submitted 8 June, 2020; originally announced June 2020.

arXiv:1911.11906 [pdf, other]

A Scalable Framework for Solving Fractional Diffusion Equations

Authors: Max Carlson, Robert M. Kirby, Hari Sundar

Abstract: The study of fractional order differential operators is receiving renewed attention in many scientific fields. In order to accommodate researchers doing work in these areas, there is a need for highly scalable numerical methods for solving partial differential equations that involve fractional order operators on complex geometries. These operators have desirable special properties that also change… ▽ More The study of fractional order differential operators is receiving renewed attention in many scientific fields. In order to accommodate researchers doing work in these areas, there is a need for highly scalable numerical methods for solving partial differential equations that involve fractional order operators on complex geometries. These operators have desirable special properties that also change the computational considerations in such a way that undermines traditional methods and makes certain other approaches more appealing. We have developed a scalable framework for solving fractional diffusion equations using one such method, specifically the method of eigenfunction expansion. In this paper, we will discuss the specific parallelization strategies used to efficiently compute the full set of eigenvalues and eigenvectors for a discretized Laplace eigenvalue problem and apply them to construct approximate solutions to our fractional order model problems. Additionally, we demonstrate the performance of the method on the Frontera computing cluster and the accuracy of the method on simple geometries using known exact solutions. △ Less

Submitted 26 November, 2019; originally announced November 2019.

arXiv:1910.07577 [pdf, ps, other]

Deep Coregionalization for the Emulation of Spatial-Temporal Fields

Authors: Wei Xing, Robert M. Kirby, Shandian Zhe

Abstract: Data-driven surrogate models are widely used for applications such as design optimization and uncertainty quantification, where repeated evaluations of an expensive simulator are required. For most partial differential equation (PDE) simulators, the outputs of interest are often spatial or spatial-temporal fields, leading to very high-dimensional outputs. Despite the success of existing data-drive… ▽ More Data-driven surrogate models are widely used for applications such as design optimization and uncertainty quantification, where repeated evaluations of an expensive simulator are required. For most partial differential equation (PDE) simulators, the outputs of interest are often spatial or spatial-temporal fields, leading to very high-dimensional outputs. Despite the success of existing data-driven surrogates for high-dimensional outputs, most methods require a significant number of samples to cover the response surface in order to achieve a reasonable degree of accuracy. This demand makes the idea of surrogate models less attractive considering the high computational cost to generate the data. To address this issue, we exploit the multi-fidelity nature of a PDE simulator and introduce deep coregionalization, a Bayesian non-parametric autoregressive framework for efficient emulation of spatial-temporal fields. To effectively extract the output correlations in the context of multi-fidelity data, we develop a novel dimension reduction technique, residual principal component analysis. Our model can simultaneously capture the rich output correlations and the fidelity correlations and make high-fidelity predictions with only a few expensive, high-fidelity simulation samples. We show the advantages of our model in three canonical PDE models and a fluid dynamics problem. The results show that the proposed method cannot only approximate a simulator with significantly less cost (at bout 10%-25%) but also further improve model accuracy. △ Less

Submitted 16 October, 2019; originally announced October 2019.

arXiv:1907.07224 [pdf, other]

The Effect of Data Transformations on Scalar Field Topological Analysis of High-Order FEM Solutions

Authors: Ashok Jallepalli, Joshua A. Levine, Robert M. Kirby

Abstract: High-order finite element methods (HO-FEM) are gaining popularity in the simulation community due to their success in solving complex flow dynamics. There is an increasing need to analyze the data produced as output by these simulations. Simultaneously, topological analysis tools are emerging as powerful methods for investigating simulation data. However, most of the current approaches to topologi… ▽ More High-order finite element methods (HO-FEM) are gaining popularity in the simulation community due to their success in solving complex flow dynamics. There is an increasing need to analyze the data produced as output by these simulations. Simultaneously, topological analysis tools are emerging as powerful methods for investigating simulation data. However, most of the current approaches to topological analysis have had limited application to HO-FEM simulation data for two reasons. First, the current topological tools are designed for linear data (polynomial degree one), but the polynomial degree of the data output by these simulations is typically higher (routinely up to polynomial degree six). Second, the simulation data and derived quantities of the simulation data have discontinuities at element boundaries, and these discontinuities do not match the input requirements for the topological tools. One solution to both issues is to transform the high-order data to achieve low-order, continuous inputs for topological analysis. Nevertheless, there has been little work evaluating the possible transformation choices and their downstream effect on the topological analysis. We perform an empirical study to evaluate two commonly used data transformation methodologies along with the recently introduced L-SIAC filter for processing high-order simulation data. Our results show diverse behaviors are possible. We offer some guidance about how best to consider a pipeline of topological analysis of HO-FEM simulations with the currently available implementations of topological analysis. △ Less

Submitted 16 July, 2019; originally announced July 2019.

Comments: 11 pages, Accepted to IEEEVIS

arXiv:1906.03489 [pdf, other]

doi 10.1016/j.cpc.2019.107110

Nektar++: enhancing the capability and application of high-fidelity spectral/$hp$ element methods

Authors: David Moxey, Chris D. Cantwell, Yan Bao, Andrea Cassinelli, Giacomo Castiglioni, Sehun Chun, Emilia Juda, Ehsan Kazemi, Kilian Lackhove, Julian Marcon, Gianmarco Mengaldo, Douglas Serson, Michael Turner, Hui Xu, Joaquim Peiró, Robert M. Kirby, Spencer J. Sherwin

Abstract: Nektar++ is an open-source framework that provides a flexible, high-performance and scalable platform for the development of solvers for partial differential equations using the high-order spectral/$hp$ element method. In particular, Nektar++ aims to overcome the complex implementation challenges that are often associated with high-order methods, thereby allowing them to be more readily used in a… ▽ More Nektar++ is an open-source framework that provides a flexible, high-performance and scalable platform for the development of solvers for partial differential equations using the high-order spectral/$hp$ element method. In particular, Nektar++ aims to overcome the complex implementation challenges that are often associated with high-order methods, thereby allowing them to be more readily used in a wide range of application areas. In this paper, we present the algorithmic, implementation and application developments associated with our Nektar++ version 5.0 release. We describe some of the key software and performance developments, including our strategies on parallel I/O, on in situ processing, the use of collective operations for exploiting current and emerging hardware, and interfaces to enable multi-solver coupling. Furthermore, we provide details on a newly developed Python interface that enables a more rapid introduction for new users unfamiliar with spectral/$hp$ element methods, C++ and/or Nektar++. This release also incorporates a number of numerical method developments - in particular: the method of moving frames, which provides an additional approach for the simulation of equations on embedded curvilinear manifolds and domains; a means of handling spatially variable polynomial order; and a novel technique for quasi-3D simulations to permit spatially-varying perturbations to the geometry in the homogeneous direction. Finally, we demonstrate the new application-level features provided in this release, namely: a facility for generating high-order curvilinear meshes called NekMesh; a novel new AcousticSolver for aeroacoustic problems; our development of a 'thick' strip model for the modelling of fluid-structure interaction problems in the context of vortex-induced vibrations. We conclude by commenting some directions for future code development and expansion. △ Less

Submitted 26 November, 2019; v1 submitted 8 June, 2019; originally announced June 2019.

Comments: 21 pages, 14 figures

Journal ref: Computer Physics Communications 249 (2020) 107110

arXiv:1811.00684 [pdf, other]

SDCNet: Video Prediction Using Spatially-Displaced Convolution

Authors: Fitsum A. Reda, Guilin Liu, Kevin J. Shih, Robert Kirby, Jon Barker, David Tarjan, Andrew Tao, Bryan Catanzaro

Abstract: We present an approach for high-resolution video frame prediction by conditioning on both past frames and past optical flows. Previous approaches rely on resampling past frames, guided by a learned future optical flow, or on direct generation of pixels. Resampling based on flow is insufficient because it cannot deal with disocclusions. Generative models currently lead to blurry results. Recent app… ▽ More We present an approach for high-resolution video frame prediction by conditioning on both past frames and past optical flows. Previous approaches rely on resampling past frames, guided by a learned future optical flow, or on direct generation of pixels. Resampling based on flow is insufficient because it cannot deal with disocclusions. Generative models currently lead to blurry results. Recent approaches synthesis a pixel by convolving input patches with a predicted kernel. However, their memory requirement increases with kernel size. Here, we spatially-displaced convolution (SDC) module for video frame prediction. We learn a motion vector and a kernel for each pixel and synthesize a pixel by applying the kernel at a displaced location in the source image, defined by the predicted motion vector. Our approach inherits the merits of both vector-based and kernel-based approaches, while ameliorating their respective disadvantages. We train our model on 428K unlabelled 1080p video game frames. Our approach produces state-of-the-art results, achieving an SSIM score of 0.904 on high-definition YouTube-8M videos, 0.918 on Caltech Pedestrian videos. Our model handles large motion effectively and synthesizes crisp frames with consistent motion. △ Less

Submitted 27 March, 2021; v1 submitted 1 November, 2018; originally announced November 2018.

Comments: Published in ECCV 2018. Codes available at https://github.com/NVIDIA/semantic-segmentation/tree/sdcnet/sdcnet. Project page available at https://nv-adlr.github.io/publication/2018-SDCNet

arXiv:1808.05513 [pdf, other]

doi 10.1145/3361745

Code generation for generally mapped finite elements

Authors: Robert C. Kirby, Lawrence Mitchell

Abstract: Many classical finite elements such as the Argyris and Bell elements have long been absent from high-level PDE software. Building on recent theoretical work, we describe how to implement very general finite element transformations in FInAT and hence into the Firedrake finite element system. Numerical results evaluate the new elements, comparing them to existing methods for classical problems. For… ▽ More Many classical finite elements such as the Argyris and Bell elements have long been absent from high-level PDE software. Building on recent theoretical work, we describe how to implement very general finite element transformations in FInAT and hence into the Firedrake finite element system. Numerical results evaluate the new elements, comparing them to existing methods for classical problems. For a second order model problem, we find that new elements give smooth solutions at a mild increase in cost over standard Lagrange elements. For fourth-order problems, however, the newly-enabled methods significantly outperform interior penalty formulations. We also give some advanced use cases, solving the nonlinear Cahn-Hilliard equation and some biharmonic eigenvalue problems (including Chladni plates) using $C^1$ discretizations. △ Less

Submitted 6 September, 2019; v1 submitted 16 August, 2018; originally announced August 2018.

Comments: 23 pages

Journal ref: ACM Transactions on Mathematical Software 45(41):1-23 (2019)

arXiv:1808.01371 [pdf, other]

Large Scale Language Modeling: Converging on 40GB of Text in Four Hours

Authors: Raul Puri, Robert Kirby, Nikolai Yakovenko, Bryan Catanzaro

Abstract: Recent work has shown how to train Convolutional Neural Networks (CNNs) rapidly on large image datasets, then transfer the knowledge gained from these models to a variety of tasks. Following [Radford 2017], in this work, we demonstrate similar scalability and transfer for Recurrent Neural Networks (RNNs) for Natural Language tasks. By utilizing mixed precision arithmetic and a 32k batch size distr… ▽ More Recent work has shown how to train Convolutional Neural Networks (CNNs) rapidly on large image datasets, then transfer the knowledge gained from these models to a variety of tasks. Following [Radford 2017], in this work, we demonstrate similar scalability and transfer for Recurrent Neural Networks (RNNs) for Natural Language tasks. By utilizing mixed precision arithmetic and a 32k batch size distributed across 128 NVIDIA Tesla V100 GPUs, we are able to train a character-level 4096-dimension multiplicative LSTM (mLSTM) for unsupervised text reconstruction over 3 epochs of the 40 GB Amazon Reviews dataset in four hours. This runtime compares favorably with previous work taking one month to train the same size and configuration for one epoch over the same dataset. Converging large batch RNN models can be challenging. Recent work has suggested scaling the learning rate as a function of batch size, but we find that simply scaling the learning rate as a function of batch size leads either to significantly worse convergence or immediate divergence for this problem. We provide a learning rate schedule that allows our model to converge with a 32k batch size. Since our model converges over the Amazon Reviews dataset in hours, and our compute requirement of 128 Tesla V100 GPUs, while substantial, is commercially available, this work opens up large scale unsupervised NLP training to most commercial applications and deep learning researchers. A model can be trained over most public or private text datasets overnight. △ Less

Submitted 10 August, 2018; v1 submitted 3 August, 2018; originally announced August 2018.

Comments: 8 pages; To appear in High Performance Machine Learning Workshop (HPML) 2018

arXiv:1806.02972 [pdf, other]

Robust Node Generation for Meshfree Discretizations on Irregular Domains and Surfaces

Authors: Varun Shankar, Robert M. Kirby, Aaron L. Fogelson

Abstract: We present a new algorithm for the automatic one-shot generation of scattered node sets on irregular 2D and 3D domains using Poisson disk sampling coupled to novel parameter-free, high-order parametric Spherical Radial Basis Function (SBF)-based geometric modeling of irregular domain boundaries. Our algorithm also automatically modifies the scattered node sets locally for time-varying embedded bou… ▽ More We present a new algorithm for the automatic one-shot generation of scattered node sets on irregular 2D and 3D domains using Poisson disk sampling coupled to novel parameter-free, high-order parametric Spherical Radial Basis Function (SBF)-based geometric modeling of irregular domain boundaries. Our algorithm also automatically modifies the scattered node sets locally for time-varying embedded boundaries in the domain interior. We derive complexity estimates for our node generator in 2D and 3D that establish its scalability, and verify these estimates with timing experiments. We explore the influence of Poisson disk sampling parameters on both quasi-uniformity in the node sets and errors in an RBF-FD discretization of the heat equation. In all cases, our framework requires only a small number of "seed" nodes on domain boundaries. The entire framework exhibits O(N) complexity in both 2D and 3D. △ Less

Submitted 8 June, 2018; originally announced June 2018.

Comments: 26 pages, 9 figures, accepted SIAM Journal on Scientific Computing

arXiv:1804.03358 [pdf, other]

Curvilinear Mesh Adaptation using Radial Basis Function Interpolation and Smoothing

Authors: Vidhi Zala, Varun Shankar, Shankar P. Sastry, Robert M. Kirby

Abstract: We present a new iterative technique based on radial basis function (RBF) interpolation and smoothing for the generation and smoothing of curvilinear meshes from straight-sided or other curvilinear meshes. Our technique approximates the coordinate deformation maps in both the interior and boundary of the curvilinear output mesh by using only scattered nodes on the boundary of the input mesh as dat… ▽ More We present a new iterative technique based on radial basis function (RBF) interpolation and smoothing for the generation and smoothing of curvilinear meshes from straight-sided or other curvilinear meshes. Our technique approximates the coordinate deformation maps in both the interior and boundary of the curvilinear output mesh by using only scattered nodes on the boundary of the input mesh as data sites in an interpolation problem. Our technique produces high-quality meshes in the deformed domain even when the deformation maps are singular due to a new iterative algorithm based on modification of the RBF shape parameter. Due to the use of RBF interpolation, our technique is applicable to both 2D and 3D curvilinear mesh generation without significant modification. △ Less

Submitted 10 April, 2018; originally announced April 2018.

Comments: 18 pages, 8 figures. Accepted to Journal of Scientific Computing

arXiv:1711.02473 [pdf, other]

Exposing and exploiting structure: optimal code generation for high-order finite element methods

Authors: Miklós Homolya, Robert C. Kirby, David A. Ham

Abstract: Code generation based software platforms, such as Firedrake, have become popular tools for developing complicated finite element discretisations of partial differential equations. We extended the code generation infrastructure in Firedrake with optimisations that can exploit the structure inherent to some finite elements. This includes sum factorisation on cuboid cells for continuous, discontinuou… ▽ More Code generation based software platforms, such as Firedrake, have become popular tools for developing complicated finite element discretisations of partial differential equations. We extended the code generation infrastructure in Firedrake with optimisations that can exploit the structure inherent to some finite elements. This includes sum factorisation on cuboid cells for continuous, discontinuous, H(div) and H(curl) conforming elements. Our experiments confirm optimal algorithmic complexity for high-order finite element assembly. This is achieved through several novel contributions: the introduction of a more powerful interface between the form compiler and the library providing the finite elements; a more abstract, smarter library of finite elements called FInAT that explicitly communicates the structure of elements; and form compiler algorithms to automatically exploit this exposed structure. △ Less

Submitted 7 November, 2017; originally announced November 2017.

Comments: Submitted to ACM Transactions on Mathematical Software

arXiv:1710.02862 [pdf, other]

Exploration of Heterogeneous Data Using Robust Similarity

Authors: Mahsa Mirzargar, Ross T. Whitaker, Robert M. Kirby

Abstract: Heterogeneous data pose serious challenges to data analysis tasks, including exploration and visualization. Current techniques often utilize dimensionality reductions, aggregation, or conversion to numerical values to analyze heterogeneous data. However, the effectiveness of such techniques to find subtle structures such as the presence of multiple modes or detection of outliers is hindered by the… ▽ More Heterogeneous data pose serious challenges to data analysis tasks, including exploration and visualization. Current techniques often utilize dimensionality reductions, aggregation, or conversion to numerical values to analyze heterogeneous data. However, the effectiveness of such techniques to find subtle structures such as the presence of multiple modes or detection of outliers is hindered by the challenge to find the proper subspaces or prior knowledge to reveal the structures. In this paper, we propose a generic similarity-based exploration technique that is applicable to a wide variety of datatypes and their combinations, including heterogeneous ensembles. The proposed concept of similarity has a close connection to statistical analysis and can be deployed for summarization, revealing fine structures such as the presence of multiple modes, and detection of anomalies or outliers. We then propose a visual encoding framework that enables the exploration of a heterogeneous dataset in different levels of detail and provides insightful information about both global and local structures. We demonstrate the utility of the proposed technique using various real datasets, including ensemble data. △ Less

Submitted 8 October, 2017; originally announced October 2017.

Comments: Presented at Visualization in Data Science (VDS at IEEE VIS 2017)

Showing 1–50 of 53 results for author: Kirby, R