-
PIC2O-Sim: A Physics-Inspired Causality-Aware Dynamic Convolutional Neural Operator for Ultra-Fast Photonic Device FDTD Simulation
Authors:
Pingchuan Ma,
Haoyu Yang,
Zhengqi Gao,
Duane S. Boning,
Jiaqi Gu
Abstract:
The finite-difference time-domain (FDTD) method, which is important in photonic hardware design flow, is widely adopted to solve time-domain Maxwell equations. However, FDTD is known for its prohibitive runtime cost, taking minutes to hours to simulate a single device. Recently, AI has been applied to realize orders-of-magnitude speedup in partial differential equation (PDE) solving. However, AI-b…
▽ More
The finite-difference time-domain (FDTD) method, which is important in photonic hardware design flow, is widely adopted to solve time-domain Maxwell equations. However, FDTD is known for its prohibitive runtime cost, taking minutes to hours to simulate a single device. Recently, AI has been applied to realize orders-of-magnitude speedup in partial differential equation (PDE) solving. However, AI-based FDTD solvers for photonic devices have not been clearly formulated. Directly applying off-the-shelf models to predict the optical field dynamics shows unsatisfying fidelity and efficiency since the model primitives are agnostic to the unique physical properties of Maxwell equations and lack algorithmic customization. In this work, we thoroughly investigate the synergy between neural operator designs and the physical property of Maxwell equations and introduce a physics-inspired AI-based FDTD prediction framework PIC2O-Sim which features a causality-aware dynamic convolutional neural operator as its backbone model that honors the space-time causality constraints via careful receptive field configuration and explicitly captures the permittivity-dependent light propagation behavior via an efficient dynamic convolution operator. Meanwhile, we explore the trade-offs among prediction scalability, fidelity, and efficiency via a multi-stage partitioned time-bundling technique in autoregressive prediction. Multiple key techniques have been introduced to mitigate iterative error accumulation while maintaining efficiency advantages during autoregressive field prediction. Extensive evaluations on three challenging photonic device simulation tasks have shown the superiority of our PIC2O-Sim method, showing 51.2% lower roll-out prediction error, 23.5 times fewer parameters than state-of-the-art neural operators, providing 300-600x higher simulation speed than an open-source FDTD numerical solver.
△ Less
Submitted 24 June, 2024;
originally announced June 2024.
-
Rare Event Probability Learning by Normalizing Flows
Authors:
Zhenggqi Gao,
Dinghuai Zhang,
Luca Daniel,
Duane S. Boning
Abstract:
A rare event is defined by a low probability of occurrence. Accurate estimation of such small probabilities is of utmost importance across diverse domains. Conventional Monte Carlo methods are inefficient, demanding an exorbitant number of samples to achieve reliable estimates. Inspired by the exact sampling capabilities of normalizing flows, we revisit this challenge and propose normalizing flow…
▽ More
A rare event is defined by a low probability of occurrence. Accurate estimation of such small probabilities is of utmost importance across diverse domains. Conventional Monte Carlo methods are inefficient, demanding an exorbitant number of samples to achieve reliable estimates. Inspired by the exact sampling capabilities of normalizing flows, we revisit this challenge and propose normalizing flow assisted importance sampling, termed NOFIS. NOFIS first learns a sequence of proposal distributions associated with predefined nested subset events by minimizing KL divergence losses. Next, it estimates the rare event probability by utilizing importance sampling in conjunction with the last proposal. The efficacy of our NOFIS method is substantiated through comprehensive qualitative visualizations, affirming the optimality of the learned proposal distribution, as well as a series of quantitative experiments encompassing $10$ distinct test cases, which highlight NOFIS's superiority over baseline approaches.
△ Less
Submitted 29 October, 2023;
originally announced October 2023.
-
KirchhoffNet: A Scalable Ultra Fast Analog Neural Network
Authors:
Zhengqi Gao,
Fan-Keng Sun,
Ron Rohrer,
Duane S. Boning
Abstract:
In this paper, we leverage a foundational principle of analog electronic circuitry, Kirchhoff's current and voltage laws, to introduce a distinctive class of neural network models termed KirchhoffNet. Essentially, KirchhoffNet is an analog circuit that can function as a neural network, utilizing its initial node voltages as the neural network input and the node voltages at a specific time point as…
▽ More
In this paper, we leverage a foundational principle of analog electronic circuitry, Kirchhoff's current and voltage laws, to introduce a distinctive class of neural network models termed KirchhoffNet. Essentially, KirchhoffNet is an analog circuit that can function as a neural network, utilizing its initial node voltages as the neural network input and the node voltages at a specific time point as the output. The evolution of node voltages within the specified time is dictated by learnable parameters on the edges connecting nodes. We demonstrate that KirchhoffNet is governed by a set of ordinary differential equations (ODEs), and notably, even in the absence of traditional layers (such as convolution layers), it attains state-of-the-art performances across diverse and complex machine learning tasks. Most importantly, KirchhoffNet can be potentially implemented as a low-power analog integrated circuit, leading to an appealing property -- irrespective of the number of parameters within a KirchhoffNet, its on-chip forward calculation can always be completed within a short time. This characteristic makes KirchhoffNet a promising and fundamental paradigm for implementing large-scale neural networks, opening a new avenue in analog neural networks for AI.
△ Less
Submitted 6 May, 2024; v1 submitted 24 October, 2023;
originally announced October 2023.
-
Nominality Score Conditioned Time Series Anomaly Detection by Point/Sequential Reconstruction
Authors:
Chih-Yu Lai,
Fan-Keng Sun,
Zhengqi Gao,
Jeffrey H. Lang,
Duane S. Boning
Abstract:
Time series anomaly detection is challenging due to the complexity and variety of patterns that can occur. One major difficulty arises from modeling time-dependent relationships to find contextual anomalies while maintaining detection accuracy for point anomalies. In this paper, we propose a framework for unsupervised time series anomaly detection that utilizes point-based and sequence-based recon…
▽ More
Time series anomaly detection is challenging due to the complexity and variety of patterns that can occur. One major difficulty arises from modeling time-dependent relationships to find contextual anomalies while maintaining detection accuracy for point anomalies. In this paper, we propose a framework for unsupervised time series anomaly detection that utilizes point-based and sequence-based reconstruction models. The point-based model attempts to quantify point anomalies, and the sequence-based model attempts to quantify both point and contextual anomalies. Under the formulation that the observed time point is a two-stage deviated value from a nominal time point, we introduce a nominality score calculated from the ratio of a combined value of the reconstruction errors. We derive an induced anomaly score by further integrating the nominality score and anomaly score, then theoretically prove the superiority of the induced anomaly score over the original anomaly score under certain conditions. Extensive studies conducted on several public datasets show that the proposed framework outperforms most state-of-the-art baselines for time series anomaly detection.
△ Less
Submitted 23 October, 2023;
originally announced October 2023.
-
Provable Routing Analysis of Programmable Photonics
Authors:
Zhengqi Gao,
Xiangfeng Chen,
Zhengxing Zhang,
Chih-Yu Lai,
Uttara Chakraborty,
Wim Bogaerts,
Duane S. Boning
Abstract:
Programmable photonic integrated circuits (PPICs) are an emerging technology recently proposed as an alternative to custom-designed application-specific integrated photonics. Light routing is one of the most important functions that need to be realized on a PPIC. Previous literature has investigated the light routing problem from an algorithmic or experimental perspective, e.g., adopting graph the…
▽ More
Programmable photonic integrated circuits (PPICs) are an emerging technology recently proposed as an alternative to custom-designed application-specific integrated photonics. Light routing is one of the most important functions that need to be realized on a PPIC. Previous literature has investigated the light routing problem from an algorithmic or experimental perspective, e.g., adopting graph theory to route an optical signal. In this paper, we also focus on the light routing problem, but from a complementary and theoretical perspective, to answer questions about what is possible to be routed. Specifically, we demonstrate that not all path lengths (defined as the number of tunable basic units that an optical signal traverses) can be realized on a square-mesh PPIC, and a rigorous realizability condition is proposed and proved. We further consider multi-path routing, where we provide an analytical expression on path length sum, upper bounds on path length mean/variance, and the maximum number of realizable paths. All of our conclusions are proven mathematically. Illustrative potential optical applications using our observations are also presented.
△ Less
Submitted 21 June, 2023;
originally announced June 2023.
-
NeurOLight: A Physics-Agnostic Neural Operator Enabling Parametric Photonic Device Simulation
Authors:
Jiaqi Gu,
Zhengqi Gao,
Chenghao Feng,
Hanqing Zhu,
Ray T. Chen,
Duane S. Boning,
David Z. Pan
Abstract:
Optical computing is an emerging technology for next-generation efficient artificial intelligence (AI) due to its ultra-high speed and efficiency. Electromagnetic field simulation is critical to the design, optimization, and validation of photonic devices and circuits. However, costly numerical simulation significantly hinders the scalability and turn-around time in the photonic circuit design loo…
▽ More
Optical computing is an emerging technology for next-generation efficient artificial intelligence (AI) due to its ultra-high speed and efficiency. Electromagnetic field simulation is critical to the design, optimization, and validation of photonic devices and circuits. However, costly numerical simulation significantly hinders the scalability and turn-around time in the photonic circuit design loop. Recently, physics-informed neural networks have been proposed to predict the optical field solution of a single instance of a partial differential equation (PDE) with predefined parameters. Their complicated PDE formulation and lack of efficient parametrization mechanisms limit their flexibility and generalization in practical simulation scenarios. In this work, for the first time, a physics-agnostic neural operator-based framework, dubbed NeurOLight, is proposed to learn a family of frequency-domain Maxwell PDEs for ultra-fast parametric photonic device simulation. We balance the efficiency and generalization of NeurOLight via several novel techniques. Specifically, we discretize different devices into a unified domain, represent parametric PDEs with a compact wave prior, and encode the incident light via masked source modeling. We design our model with parameter-efficient cross-shaped NeurOLight blocks and adopt superposition-based augmentation for data-efficient learning. With these synergistic approaches, NeurOLight generalizes to a large space of unseen simulation settings, demonstrates 2-orders-of-magnitude faster simulation speed than numerical solvers, and outperforms prior neural network models by ~54% lower prediction error with ~44% fewer parameters. Our code is available at https://github.com/JeremieMelo/NeurOLight.
△ Less
Submitted 19 September, 2022;
originally announced September 2022.
-
Automatic Synthesis of Light Processing Functions for Programmable Photonics: Theory and Realization
Authors:
Zhengqi Gao,
Xiangfeng Chen,
Zhengxing Zhang,
Uttara Chakraborty,
Wim Bogaerts,
Duane S. Boning
Abstract:
Linear light processing functions (e.g., routing, splitting, filtering) are key functions requiring configuration to implement on a programmable photonic integrated circuit (PPIC). In recirculating waveguide meshes (which include loop-backs), this is usually done manually. Some previous results describe explorations to perform this task automatically, but their efficiency or applicability is still…
▽ More
Linear light processing functions (e.g., routing, splitting, filtering) are key functions requiring configuration to implement on a programmable photonic integrated circuit (PPIC). In recirculating waveguide meshes (which include loop-backs), this is usually done manually. Some previous results describe explorations to perform this task automatically, but their efficiency or applicability is still limited. In this paper, we propose an efficient method that can automatically realize configurations for many light processing functions on a square-mesh PPIC. At its heart is an automatic differentiation subroutine built upon analytical expressions of scattering matrices, that enables gradient descent optimization for functional circuit synthesis. Similar to the state-of-the-art synthesis techniques, our method can realize configurations for a wide range of light processing functions, and multiple functions on the same PPIC simultaneously. However, we do not need to separate the functions spatially into different subdomains of the mesh, and the resulting optimum can have multiple functions using the same part of the mesh. Furthermore, compared to non-gradient or numerical differentiation based methods, our proposed approach achieves 3x time reduction in computational cost.
△ Less
Submitted 10 February, 2023; v1 submitted 30 August, 2022;
originally announced August 2022.
-
Learning from Multiple Annotator Noisy Labels via Sample-wise Label Fusion
Authors:
Zhengqi Gao,
Fan-Keng Sun,
Mingran Yang,
Sucheng Ren,
Zikai Xiong,
Marc Engeler,
Antonio Burazer,
Linda Wildling,
Luca Daniel,
Duane S. Boning
Abstract:
Data lies at the core of modern deep learning. The impressive performance of supervised learning is built upon a base of massive accurately labeled data. However, in some real-world applications, accurate labeling might not be viable; instead, multiple noisy labels (instead of one accurate label) are provided by several annotators for each data sample. Learning a classifier on such a noisy trainin…
▽ More
Data lies at the core of modern deep learning. The impressive performance of supervised learning is built upon a base of massive accurately labeled data. However, in some real-world applications, accurate labeling might not be viable; instead, multiple noisy labels (instead of one accurate label) are provided by several annotators for each data sample. Learning a classifier on such a noisy training dataset is a challenging task. Previous approaches usually assume that all data samples share the same set of parameters related to annotator errors, while we demonstrate that label error learning should be both annotator and data sample dependent. Motivated by this observation, we propose a novel learning algorithm. The proposed method displays superiority compared with several state-of-the-art baseline methods on MNIST, CIFAR-100, and ImageNet-100. Our code is available at: https://github.com/zhengqigao/Learning-from-Multiple-Annotator-Noisy-Labels.
△ Less
Submitted 22 July, 2022;
originally announced July 2022.
-
FreDo: Frequency Domain-based Long-Term Time Series Forecasting
Authors:
Fan-Keng Sun,
Duane S. Boning
Abstract:
The ability to forecast far into the future is highly beneficial to many applications, including but not limited to climatology, energy consumption, and logistics. However, due to noise or measurement error, it is questionable how far into the future one can reasonably predict. In this paper, we first mathematically show that due to error accumulation, sophisticated models might not outperform bas…
▽ More
The ability to forecast far into the future is highly beneficial to many applications, including but not limited to climatology, energy consumption, and logistics. However, due to noise or measurement error, it is questionable how far into the future one can reasonably predict. In this paper, we first mathematically show that due to error accumulation, sophisticated models might not outperform baseline models for long-term forecasting. To demonstrate, we show that a non-parametric baseline model based on periodicity can actually achieve comparable performance to a state-of-the-art Transformer-based model on various datasets. We further propose FreDo, a frequency domain-based neural network model that is built on top of the baseline model to enhance its performance and which greatly outperforms the state-of-the-art model. Finally, we validate that the frequency domain is indeed better by comparing univariate models trained in the frequency v.s. time domain.
△ Less
Submitted 24 May, 2022;
originally announced May 2022.
-
Adjusting for Autocorrelated Errors in Neural Networks for Time Series
Authors:
Fan-Keng Sun,
Christopher I. Lang,
Duane S. Boning
Abstract:
An increasing body of research focuses on using neural networks to model time series. A common assumption in training neural networks via maximum likelihood estimation on time series is that the errors across time steps are uncorrelated. However, errors are actually autocorrelated in many cases due to the temporality of the data, which makes such maximum likelihood estimations inaccurate. In this…
▽ More
An increasing body of research focuses on using neural networks to model time series. A common assumption in training neural networks via maximum likelihood estimation on time series is that the errors across time steps are uncorrelated. However, errors are actually autocorrelated in many cases due to the temporality of the data, which makes such maximum likelihood estimations inaccurate. In this paper, in order to adjust for autocorrelated errors, we propose to learn the autocorrelation coefficient jointly with the model parameters. In our experiments, we verify the effectiveness of our approach on time series forecasting. Results across a wide range of real-world datasets with various state-of-the-art models show that our method enhances performance in almost all cases. Based on these results, we suggest empirical critical values to determine the severity of autocorrelated errors. We also analyze several aspects of our method to demonstrate its advantages. Finally, other time series tasks are also considered to validate that our method is not restricted to only forecasting.
△ Less
Submitted 8 October, 2021; v1 submitted 27 January, 2021;
originally announced January 2021.
-
Robust Reinforcement Learning on State Observations with Learned Optimal Adversary
Authors:
Huan Zhang,
Hongge Chen,
Duane Boning,
Cho-Jui Hsieh
Abstract:
We study the robustness of reinforcement learning (RL) with adversarially perturbed state observations, which aligns with the setting of many adversarial attacks to deep reinforcement learning (DRL) and is also important for rolling out real-world RL agent under unpredictable sensing noise. With a fixed agent policy, we demonstrate that an optimal adversary to perturb state observations can be fou…
▽ More
We study the robustness of reinforcement learning (RL) with adversarially perturbed state observations, which aligns with the setting of many adversarial attacks to deep reinforcement learning (DRL) and is also important for rolling out real-world RL agent under unpredictable sensing noise. With a fixed agent policy, we demonstrate that an optimal adversary to perturb state observations can be found, which is guaranteed to obtain the worst case agent reward. For DRL settings, this leads to a novel empirical adversarial attack to RL agents via a learned adversary that is much stronger than previous ones. To enhance the robustness of an agent, we propose a framework of alternating training with learned adversaries (ATLA), which trains an adversary online together with the agent using policy gradient following the optimal adversarial attack framework. Additionally, inspired by the analysis of state-adversarial Markov decision process (SA-MDP), we show that past states and actions (history) can be useful for learning a robust agent, and we empirically find a LSTM based policy can be more robust under adversaries. Empirical evaluations on a few continuous control environments show that ATLA achieves state-of-the-art performance under strong adversaries. Our code is available at https://github.com/huanzhang12/ATLA_robust_RL.
△ Less
Submitted 21 January, 2021;
originally announced January 2021.
-
On $\ell_p$-norm Robustness of Ensemble Stumps and Trees
Authors:
Yihan Wang,
Huan Zhang,
Hongge Chen,
Duane Boning,
Cho-Jui Hsieh
Abstract:
Recent papers have demonstrated that ensemble stumps and trees could be vulnerable to small input perturbations, so robustness verification and defense for those models have become an important research problem. However, due to the structure of decision trees, where each node makes decision purely based on one feature value, all the previous works only consider the $\ell_\infty$ norm perturbation.…
▽ More
Recent papers have demonstrated that ensemble stumps and trees could be vulnerable to small input perturbations, so robustness verification and defense for those models have become an important research problem. However, due to the structure of decision trees, where each node makes decision purely based on one feature value, all the previous works only consider the $\ell_\infty$ norm perturbation. To study robustness with respect to a general $\ell_p$ norm perturbation, one has to consider the correlation between perturbations on different features, which has not been handled by previous algorithms. In this paper, we study the problem of robustness verification and certified defense with respect to general $\ell_p$ norm perturbations for ensemble decision stumps and trees. For robustness verification of ensemble stumps, we prove that complete verification is NP-complete for $p\in(0, \infty)$ while polynomial time algorithms exist for $p=0$ or $\infty$. For $p\in(0, \infty)$ we develop an efficient dynamic programming based algorithm for sound verification of ensemble stumps. For ensemble trees, we generalize the previous multi-level robustness verification algorithm to $\ell_p$ norm. We demonstrate the first certified defense method for training ensemble stumps and trees with respect to $\ell_p$ norm perturbations, and verify its effectiveness empirically on real datasets.
△ Less
Submitted 29 September, 2020; v1 submitted 19 August, 2020;
originally announced August 2020.
-
Multi-Stage Influence Function
Authors:
Hongge Chen,
Si Si,
Yang Li,
Ciprian Chelba,
Sanjiv Kumar,
Duane Boning,
Cho-Jui Hsieh
Abstract:
Multi-stage training and knowledge transfer, from a large-scale pretraining task to various finetuning tasks, have revolutionized natural language processing and computer vision resulting in state-of-the-art performance improvements. In this paper, we develop a multi-stage influence function score to track predictions from a finetuned model all the way back to the pretraining data. With this score…
▽ More
Multi-stage training and knowledge transfer, from a large-scale pretraining task to various finetuning tasks, have revolutionized natural language processing and computer vision resulting in state-of-the-art performance improvements. In this paper, we develop a multi-stage influence function score to track predictions from a finetuned model all the way back to the pretraining data. With this score, we can identify the pretraining examples in the pretraining task that contribute most to a prediction in the finetuning task. The proposed multi-stage influence function generalizes the original influence function for a single model in (Koh & Liang, 2017), thereby enabling influence computation through both pretrained and finetuned models. We study two different scenarios with the pretrained embeddings fixed or updated in the finetuning tasks. We test our proposed method in various experiments to show its effectiveness and potential applications.
△ Less
Submitted 17 July, 2020;
originally announced July 2020.
-
Robust Deep Reinforcement Learning against Adversarial Perturbations on State Observations
Authors:
Huan Zhang,
Hongge Chen,
Chaowei Xiao,
Bo Li,
Mingyan Liu,
Duane Boning,
Cho-Jui Hsieh
Abstract:
A deep reinforcement learning (DRL) agent observes its states through observations, which may contain natural measurement errors or adversarial noises. Since the observations deviate from the true states, they can mislead the agent into making suboptimal actions. Several works have shown this vulnerability via adversarial attacks, but existing approaches on improving the robustness of DRL under th…
▽ More
A deep reinforcement learning (DRL) agent observes its states through observations, which may contain natural measurement errors or adversarial noises. Since the observations deviate from the true states, they can mislead the agent into making suboptimal actions. Several works have shown this vulnerability via adversarial attacks, but existing approaches on improving the robustness of DRL under this setting have limited success and lack for theoretical principles. We show that naively applying existing techniques on improving robustness for classification tasks, like adversarial training, is ineffective for many RL tasks. We propose the state-adversarial Markov decision process (SA-MDP) to study the fundamental properties of this problem, and develop a theoretically principled policy regularization which can be applied to a large family of DRL algorithms, including proximal policy optimization (PPO), deep deterministic policy gradient (DDPG) and deep Q networks (DQN), for both discrete and continuous action control problems. We significantly improve the robustness of PPO, DDPG and DQN agents under a suite of strong white box adversarial attacks, including new attacks of our own. Additionally, we find that a robust policy noticeably improves DRL performance even without an adversary in a number of environments. Our code is available at https://github.com/chenhongge/StateAdvDRL.
△ Less
Submitted 14 July, 2021; v1 submitted 19 March, 2020;
originally announced March 2020.
-
Variational inference formulation for a model-free simulation of a dynamical system with unknown parameters by a recurrent neural network
Authors:
Kyongmin Yeo,
Dylan E. C. Grullon,
Fan-Keng Sun,
Duane S. Boning,
Jayant R. Kalagnanam
Abstract:
We propose a recurrent neural network for a "model-free" simulation of a dynamical system with unknown parameters without prior knowledge. The deep learning model aims to jointly learn the nonlinear time marching operator and the effects of the unknown parameters from a time series dataset. We assume that the time series data set consists of an ensemble of trajectories for a range of the parameter…
▽ More
We propose a recurrent neural network for a "model-free" simulation of a dynamical system with unknown parameters without prior knowledge. The deep learning model aims to jointly learn the nonlinear time marching operator and the effects of the unknown parameters from a time series dataset. We assume that the time series data set consists of an ensemble of trajectories for a range of the parameters. The learning task is formulated as a statistical inference problem by considering the unknown parameters as random variables. A latent variable is introduced to model the effects of the unknown parameters, and a variational inference method is employed to simultaneously train probabilistic models for the time marching operator and an approximate posterior distribution for the latent variable. Unlike the classical variational inference, where a factorized distribution is used to approximate the posterior, we employ a feedforward neural network supplemented by an encoder recurrent neural network to develop a more flexible probabilistic model. The approximate posterior distribution makes an inference on a trajectory to identify the effects of the unknown parameters. The time marching operator is approximated by a recurrent neural network, which takes a latent state sampled from the approximate posterior distribution as one of the input variables, to compute the time evolution of the probability distribution conditioned on the latent variable. In the numerical experiments, it is shown that the proposed variational inference model makes a more accurate simulation compared to the standard recurrent neural networks. It is found that the proposed deep learning model is capable of correctly identifying the dimensions of the random parameters and learning a representation of complex time series data.
△ Less
Submitted 26 February, 2021; v1 submitted 2 March, 2020;
originally announced March 2020.
-
Towards Stable and Efficient Training of Verifiably Robust Neural Networks
Authors:
Huan Zhang,
Hongge Chen,
Chaowei Xiao,
Sven Gowal,
Robert Stanforth,
Bo Li,
Duane Boning,
Cho-Jui Hsieh
Abstract:
Training neural networks with verifiable robustness guarantees is challenging. Several existing approaches utilize linear relaxation based neural network output bounds under perturbation, but they can slow down training by a factor of hundreds depending on the underlying network architectures. Meanwhile, interval bound propagation (IBP) based training is efficient and significantly outperforms lin…
▽ More
Training neural networks with verifiable robustness guarantees is challenging. Several existing approaches utilize linear relaxation based neural network output bounds under perturbation, but they can slow down training by a factor of hundreds depending on the underlying network architectures. Meanwhile, interval bound propagation (IBP) based training is efficient and significantly outperforms linear relaxation based methods on many tasks, yet it may suffer from stability issues since the bounds are much looser especially at the beginning of training. In this paper, we propose a new certified adversarial training method, CROWN-IBP, by combining the fast IBP bounds in a forward bounding pass and a tight linear relaxation based bound, CROWN, in a backward bounding pass. CROWN-IBP is computationally efficient and consistently outperforms IBP baselines on training verifiably robust neural networks. We conduct large scale experiments on MNIST and CIFAR datasets, and outperform all previous linear relaxation and bound propagation based certified defenses in $\ell_\infty$ robustness. Notably, we achieve 7.02% verified test error on MNIST at $ε=0.3$, and 66.94% on CIFAR-10 with $ε=8/255$. Code is available at https://github.com/deepmind/interval-bound-propagation (TensorFlow) and https://github.com/huanzhang12/CROWN-IBP (PyTorch).
△ Less
Submitted 27 November, 2019; v1 submitted 14 June, 2019;
originally announced June 2019.
-
Robustness Verification of Tree-based Models
Authors:
Hongge Chen,
Huan Zhang,
Si Si,
Yang Li,
Duane Boning,
Cho-Jui Hsieh
Abstract:
We study the robustness verification problem for tree-based models, including decision trees, random forests (RFs) and gradient boosted decision trees (GBDTs). Formal robustness verification of decision tree ensembles involves finding the exact minimal adversarial perturbation or a guaranteed lower bound of it. Existing approaches find the minimal adversarial perturbation by a mixed integer linear…
▽ More
We study the robustness verification problem for tree-based models, including decision trees, random forests (RFs) and gradient boosted decision trees (GBDTs). Formal robustness verification of decision tree ensembles involves finding the exact minimal adversarial perturbation or a guaranteed lower bound of it. Existing approaches find the minimal adversarial perturbation by a mixed integer linear programming (MILP) problem, which takes exponential time so is impractical for large ensembles. Although this verification problem is NP-complete in general, we give a more precise complexity characterization. We show that there is a simple linear time algorithm for verifying a single tree, and for tree ensembles, the verification problem can be cast as a max-clique problem on a multi-partite graph with bounded boxicity. For low dimensional problems when boxicity can be viewed as constant, this reformulation leads to a polynomial time algorithm. For general problems, by exploiting the boxicity of the graph, we develop an efficient multi-level verification algorithm that can give tight lower bounds on the robustness of decision tree ensembles, while allowing iterative improvement and any-time termination. OnRF/GBDT models trained on 10 datasets, our algorithm is hundreds of times faster than the previous approach that requires solving MILPs, and is able to give tight robustness verification bounds on large GBDTs with hundreds of deep trees.
△ Less
Submitted 9 December, 2019; v1 submitted 10 June, 2019;
originally announced June 2019.
-
Robust Decision Trees Against Adversarial Examples
Authors:
Hongge Chen,
Huan Zhang,
Duane Boning,
Cho-Jui Hsieh
Abstract:
Although adversarial examples and model robustness have been extensively studied in the context of linear models and neural networks, research on this issue in tree-based models and how to make tree-based models robust against adversarial examples is still limited. In this paper, we show that tree based models are also vulnerable to adversarial examples and develop a novel algorithm to learn robus…
▽ More
Although adversarial examples and model robustness have been extensively studied in the context of linear models and neural networks, research on this issue in tree-based models and how to make tree-based models robust against adversarial examples is still limited. In this paper, we show that tree based models are also vulnerable to adversarial examples and develop a novel algorithm to learn robust trees. At its core, our method aims to optimize the performance under the worst-case perturbation of input features, which leads to a max-min saddle point problem. Incorporating this saddle point objective into the decision tree building procedure is non-trivial due to the discrete nature of trees --- a naive approach to finding the best split according to this saddle point objective will take exponential time. To make our approach practical and scalable, we propose efficient tree building algorithms by approximating the inner minimizer in this saddle point problem, and present efficient implementations for classical information gain based trees as well as state-of-the-art tree boosting models such as XGBoost. Experimental results on real world datasets demonstrate that the proposed algorithms can substantially improve the robustness of tree-based models against adversarial examples.
△ Less
Submitted 11 June, 2019; v1 submitted 27 February, 2019;
originally announced February 2019.
-
The Limitations of Adversarial Training and the Blind-Spot Attack
Authors:
Huan Zhang,
Hongge Chen,
Zhao Song,
Duane Boning,
Inderjit S. Dhillon,
Cho-Jui Hsieh
Abstract:
The adversarial training procedure proposed by Madry et al. (2018) is one of the most effective methods to defend against adversarial examples in deep neural networks (DNNs). In our paper, we shed some lights on the practicality and the hardness of adversarial training by showing that the effectiveness (robustness on test set) of adversarial training has a strong correlation with the distance betw…
▽ More
The adversarial training procedure proposed by Madry et al. (2018) is one of the most effective methods to defend against adversarial examples in deep neural networks (DNNs). In our paper, we shed some lights on the practicality and the hardness of adversarial training by showing that the effectiveness (robustness on test set) of adversarial training has a strong correlation with the distance between a test point and the manifold of training data embedded by the network. Test examples that are relatively far away from this manifold are more likely to be vulnerable to adversarial attacks. Consequentially, an adversarial training based defense is susceptible to a new class of attacks, the "blind-spot attack", where the input images reside in "blind-spots" (low density regions) of the empirical distribution of training data but is still on the ground-truth data manifold. For MNIST, we found that these blind-spots can be easily found by simply scaling and shifting image pixel values. Most importantly, for large datasets with high dimensional and complex data manifold (CIFAR, ImageNet, etc), the existence of blind-spots in adversarial training makes defending on any valid test examples difficult due to the curse of dimensionality and the scarcity of training data. Additionally, we find that blind-spots also exist on provable defenses including (Wong & Kolter, 2018) and (Sinha et al., 2018) because these trainable robustness certificates can only be practically optimized on a limited set of training data.
△ Less
Submitted 15 January, 2019;
originally announced January 2019.
-
Towards Fast Computation of Certified Robustness for ReLU Networks
Authors:
Tsui-Wei Weng,
Huan Zhang,
Hongge Chen,
Zhao Song,
Cho-Jui Hsieh,
Duane Boning,
Inderjit S. Dhillon,
Luca Daniel
Abstract:
Verifying the robustness property of a general Rectified Linear Unit (ReLU) network is an NP-complete problem [Katz, Barrett, Dill, Julian and Kochenderfer CAV17]. Although finding the exact minimum adversarial distortion is hard, giving a certified lower bound of the minimum distortion is possible. Current available methods of computing such a bound are either time-consuming or delivering low qua…
▽ More
Verifying the robustness property of a general Rectified Linear Unit (ReLU) network is an NP-complete problem [Katz, Barrett, Dill, Julian and Kochenderfer CAV17]. Although finding the exact minimum adversarial distortion is hard, giving a certified lower bound of the minimum distortion is possible. Current available methods of computing such a bound are either time-consuming or delivering low quality bounds that are too loose to be useful. In this paper, we exploit the special structure of ReLU networks and provide two computationally efficient algorithms Fast-Lin and Fast-Lip that are able to certify non-trivial lower bounds of minimum distortions, by bounding the ReLU units with appropriate linear functions Fast-Lin, or by bounding the local Lipschitz constant Fast-Lip. Experiments show that (1) our proposed methods deliver bounds close to (the gap is 2-3X) exact minimum distortion found by Reluplex in small MNIST networks while our algorithms are more than 10,000 times faster; (2) our methods deliver similar quality of bounds (the gap is within 35% and usually around 10%; sometimes our bounds are even better) for larger networks compared to the methods based on solving linear programming problems but our algorithms are 33-14,000 times faster; (3) our method is capable of solving large MNIST and CIFAR networks up to 7 layers with more than 10,000 neurons within tens of seconds on a single CPU core.
In addition, we show that, in fact, there is no polynomial time algorithm that can approximately find the minimum $\ell_1$ adversarial distortion of a ReLU network with a $0.99\ln n$ approximation ratio unless $\mathsf{NP}$=$\mathsf{P}$, where $n$ is the number of neurons in the network.
△ Less
Submitted 2 October, 2018; v1 submitted 25 April, 2018;
originally announced April 2018.
-
Efficient Spatial Variation Characterization via Matrix Completion
Authors:
Hongge Chen,
Duane Boning,
Zheng Zhang
Abstract:
In this paper, we propose a novel method to estimate and characterize spatial variations on dies or wafers. This new technique exploits recent developments in matrix completion, enabling estimation of spatial variation across wafers or dies with a small number of randomly picked sampling points while still achieving fairly high accuracy. This new approach can be easily generalized, including for e…
▽ More
In this paper, we propose a novel method to estimate and characterize spatial variations on dies or wafers. This new technique exploits recent developments in matrix completion, enabling estimation of spatial variation across wafers or dies with a small number of randomly picked sampling points while still achieving fairly high accuracy. This new approach can be easily generalized, including for estimation of mixed spatial and structure or device type information.
△ Less
Submitted 28 March, 2017;
originally announced March 2017.