Search | arXiv e-print repository

arXiv:2405.20763 [pdf, other]

Improving Generalization and Convergence by Enhancing Implicit Regularization

Authors: Mingze Wang, Haotian He, Jinbo Wang, Zilin Wang, Guanhua Huang, Feiyu Xiong, Zhiyu Li, Weinan E, Lei Wu

Abstract: In this work, we propose an Implicit Regularization Enhancement (IRE) framework to accelerate the discovery of flat solutions in deep learning, thereby improving generalization and convergence. Specifically, IRE decouples the dynamics of flat and sharp directions, which boosts the sharpness reduction along flat directions while maintaining the training stability in sharp directions. We show that IRE can be practically incorporated with generic base optimizers without introducing significant computational overload. Experiments show that IRE consistently improves the generalization performance for image classification tasks across a variety of benchmark datasets (CIFAR-10/100, ImageNet) and models (ResNets and ViTs). Surprisingly, IRE also achieves a $2\times$ speed-up compared to AdamW in the pre-training of Llama models (of sizes ranging from 60M to 229M) on datasets including Wikitext-103, Minipile, and Openwebtext. Moreover, we provide theoretical guarantees, showing that IRE can substantially accelerate the convergence towards flat minima in Sharpness-aware Minimization (SAM).

Submitted 31 May, 2024; originally announced May 2024.

Comments: 35 pages

arXiv:2401.01220 [pdf, other]

Solving multiscale dynamical systems by deep learning

Authors: Zhi-Qin John Xu, Junjie Yao, Yuxiao Yi, Liangkai Hang, Weinan E, Yaoyu Zhang, Tianhan Zhang

Abstract: Multiscale dynamical systems, modeled by high-dimensional stiff ordinary differential equations (ODEs) with wide-ranging characteristic timescales, arise across diverse fields of science and engineering, but their numerical solvers often encounter severe efficiency bottlenecks. This paper introduces a novel DeePODE method, which consists of a global multiscale sampling method and a fitting by deep neural networks to handle multiscale systems. DeePODE's primary contribution is to address the multiscale challenge of efficiently uncovering representative training sets by combining the Monte Carlo method and the ODE system's intrinsic evolution without suffering from the "curse of dimensionality". The DeePODE method is validated in multiscale systems from diverse areas, including a predator-prey model, a power system oscillation, a battery electrolyte auto-ignition, and turbulent flames. Our methods exhibit strong generalization capabilities to unseen conditions, highlighting the power of deep learning in modeling intricate multiscale dynamical processes across science and engineering domains.

Submitted 2 January, 2024; originally announced January 2024.

Comments: 7 pages, 6 figures

arXiv:2311.17749 [pdf, other]

Learning Free Terminal Time Optimal Closed-loop Control of Manipulators

Authors: Wei Hu, Yue Zhao, Weinan E, Jiequn Han, Jihao Long

Abstract: This paper presents a novel approach to learning free terminal time closed-loop control for robotic manipulation tasks, enabling dynamic adjustment of task duration and control inputs to enhance performance. We extend the supervised learning approach, namely solving selected optimal open-loop problems and utilizing them as training data for a policy network, to the free terminal time scenario. Three main challenges are addressed in this extension. First, we introduce a marching scheme that enhances the solution quality and increases the success rate of the open-loop solver by gradually refining time discretization. Second, we extend the QRnet in Nakamura-Zimmerer et al. (2021b) to the free terminal time setting to address discontinuity and improve stability at the terminal state. Third, we present a more automated version of the initial value problem (IVP) enhanced sampling method from previous work (Zhang et al., 2022) to adaptively update the training dataset, significantly improving its quality. By integrating these techniques, we develop a closed-loop policy that operates effectively over a broad domain with varying optimal time durations, achieving near globally optimal total costs.

Submitted 29 November, 2023; originally announced November 2023.

arXiv:2304.06913 [pdf, other]

The Random Feature Method for Time-dependent Problems

Authors: Jingrun Chen, Weinan E, Yixin Luo

Abstract: We present a framework for solving time-dependent partial differential equations (PDEs) in the spirit of the random feature method. The numerical solution is constructed using a space-time partition of unity and random feature functions. Two different ways of constructing the random feature functions are investigated: feature functions that treat the spatial and temporal variables (STC) on the same footing, or functions that are the product of two random feature functions depending on spatial and temporal variables separately (SoV). Boundary and initial conditions are enforced by penalty terms. We also study two ways of solving the resulting least-squares problem: the problem is solved as a whole or solved using the block time-marching strategy. The former is termed "the space-time random feature method" (ST-RFM). Numerical results for a series of problems show that the proposed methods, i.e., ST-RFM with STC and ST-RFM with SoV, have spectral accuracy in both space and time. In addition, ST-RFM only requires collocation points, not a mesh. This is important for solving problems with complex geometry. We demonstrate this by using ST-RFM to solve a two-dimensional wave equation over a complex domain. The two strategies differ significantly in terms of the behavior in time. In the case when block time-marching is used, we prove a lower error bound that shows an exponentially growing factor with respect to the number of blocks in time. For ST-RFM, we prove an upper bound with a sublinearly growing factor with respect to the number of subdomains in time. These estimates are also confirmed by numerical results.

Submitted 13 April, 2023; originally announced April 2023.

Comments: 26 pages, 12 figures

MSC Class: 65M20; 65M55; 65M70

arXiv:2209.04078 [pdf, other]

Initial Value Problem Enhanced Sampling for Closed-Loop Optimal Control Design with Deep Neural Networks

Authors: Xuanxi Zhang, Jihao Long, Wei Hu, Weinan E, Jiequn Han

Abstract: Closed-loop optimal control design for high-dimensional nonlinear systems has been a long-standing challenge. Traditional methods, such as solving the associated Hamilton-Jacobi-Bellman equation, suffer from the curse of dimensionality. Recent literature proposed a new promising approach based on supervised learning, by leveraging powerful open-loop optimal control solvers to generate training data and neural networks as efficient high-dimensional function approximators to fit the closed-loop optimal control. This approach successfully handles certain high-dimensional optimal control problems but still performs poorly on more challenging problems. One of the crucial reasons for the failure is the so-called distribution mismatch phenomenon brought by the controlled dynamics. In this paper, we investigate this phenomenon and propose the initial value problem enhanced sampling method to mitigate this problem. We theoretically prove that this sampling strategy improves over the vanilla strategy on the classical linear-quadratic regulator by a factor proportional to the total time duration. We further numerically demonstrate that the proposed sampling strategy significantly improves the performance on tested control problems, including the optimal landing problem of a quadrotor and the optimal reaching problem of a 7 DoF manipulator.

Submitted 9 July, 2023; v1 submitted 8 September, 2022; originally announced September 2022.

arXiv:2207.13380 [pdf, other]

Bridging Traditional and Machine Learning-based Algorithms for Solving PDEs: The Random Feature Method

Authors: Jingrun Chen, Xurong Chi, Weinan E, Zhouwang Yang

Abstract: One of the oldest and most studied subjects in scientific computing is algorithms for solving partial differential equations (PDEs). A long list of numerical methods has been proposed and successfully used for various applications. In recent years, deep learning methods have shown their superiority for high-dimensional PDEs where traditional methods fail. However, for low dimensional problems, it remains unclear whether these methods have a real advantage over traditional algorithms as a direct solver. In this work, we propose the random feature method (RFM) for solving PDEs, a natural bridge between traditional and machine learning-based algorithms. RFM is based on a combination of well-known ideas: 1. representation of the approximate solution using random feature functions; 2. collocation method to take care of the PDE; 3. the penalty method to treat the boundary conditions, which allows us to treat the boundary condition and the PDE on the same footing. We find it crucial to add several additional components including multi-scale representation and rescaling the weights in the loss function. We demonstrate that the method exhibits spectral accuracy and can compete with traditional solvers in terms of both accuracy and efficiency. In addition, we find that RFM is particularly suited for complex problems with complex geometry, where both traditional and machine learning-based algorithms encounter difficulties.

Submitted 27 July, 2022; originally announced July 2022.

arXiv:2205.08622 [pdf, other]

Solving optimal control of rigid-body dynamics with collisions using the hybrid minimum principle

Authors: Wei Hu, Jihao Long, Yaohua Zang, Weinan E, Jiequn Han

Abstract: Collisions are common in many dynamical systems with real applications. They can be formulated as hybrid dynamical systems with discontinuities automatically triggered when states transverse certain manifolds. We present an algorithm for the optimal control problem of such hybrid dynamical systems based on solving the equations derived from the hybrid minimum principle (HMP). The algorithm is an iterative scheme following the spirit of the method of successive approximations (MSA), and it is robust to undesired collisions observed in the initial guesses. We analyze the discontinuities in the system and propose a stable collision condition, which is crucial for the convergence of iterative algorithms in systems experiencing collisions. Subsequently, we establish a convergence theorem demonstrating linear convergence for the MSA algorithm when collisions are present. We also address several numerical challenges introduced by the discontinuities. The algorithm is tested on disc collision problems whose optimal solutions exhibit one or multiple collisions. Linear convergence in terms of iteration steps and asymptotic first-order accuracy in terms of time discretization are observed when the algorithm is implemented with the forward-Euler scheme. The numerical results demonstrate that the proposed algorithm has better accuracy and convergence than direct methods based on gradient descent. Furthermore, the algorithm is also simpler, more accurate, and more stable than a deep reinforcement learning method.

Submitted 10 May, 2023; v1 submitted 17 May, 2022; originally announced May 2022.

MSC Class: 49Mxx

arXiv:2205.07990 [pdf, other]

Empowering Optimal Control with Machine Learning: A Perspective from Model Predictive Control

Authors: Weinan E, Jiequn Han, Jihao Long

Abstract: Solving complex optimal control problems has confronted computational challenges for a long time. Recent advances in machine learning have provided us with new opportunities to address these challenges. This paper takes model predictive control, a popular optimal control method, as the primary example to survey recent progress that leverages machine learning techniques to empower optimal control solvers. We also discuss some of the main challenges encountered when applying machine learning to develop more robust optimal control algorithms.

Submitted 20 July, 2022; v1 submitted 16 May, 2022; originally announced May 2022.

arXiv:2203.06753 [pdf, other]

A Machine Learning Enhanced Algorithm for the Optimal Landing Problem

Authors: Yaohua Zang, Jihao Long, Xuanxi Zhang, Wei Hu, Weinan E, Jiequn Han

Abstract: We propose a machine learning enhanced algorithm for solving the optimal landing problem. Using Pontryagin's minimum principle, we derive a two-point boundary value problem for the landing problem. The proposed algorithm uses deep learning to predict the optimal landing time and a space-marching technique to provide good initial guesses for the boundary value problem solver. The performance of the proposed method is studied using the quadrotor example, a reasonably high dimensional and strongly nonlinear system. Drastic improvement in reliability and efficiency is observed.

Submitted 13 March, 2022; originally announced March 2022.

arXiv:2201.03549 [pdf, other]

A multi-scale sampling method for accurate and robust deep neural network to predict combustion chemical kinetics

Authors: Tianhan Zhang, Yuxiao Yi, Yifan Xu, Zhi X. Chen, Yaoyu Zhang, Weinan E, Zhi-Qin John Xu

Abstract: Machine learning has long been considered as a black box for predicting combustion chemical kinetics due to the extremely large number of parameters and the lack of evaluation standards and reproducibility. The current work aims to understand two basic questions regarding the deep neural network (DNN) method: what data the DNN needs and how general the DNN method can be. Sampling and preprocessing determine the DNN training dataset and further affect the DNN prediction ability. The current work proposes using Box-Cox transformation (BCT) to preprocess the combustion data. In addition, this work compares different sampling methods with or without preprocessing, including the Monte Carlo method, manifold sampling, generative neural network method (cycle-GAN), and newly-proposed multi-scale sampling. Our results reveal that the DNN trained by the manifold data can capture the chemical kinetics in limited configurations but cannot remain robust toward perturbation, which is inevitable for the DNN coupled with the flow field. The Monte Carlo and cycle-GAN samplings can cover a wider phase space but fail to capture small-scale intermediate species, producing poor prediction results. A three-hidden-layer DNN, based on the multi-scale method without specific flame simulation data, allows predicting chemical kinetics in various scenarios and being stable during the temporal evolutions. This single DNN is readily implemented with several CFD codes and validated in various combustors, including (1) zero-dimensional autoignition, (2) one-dimensional freely propagating flame, (3) two-dimensional jet flame with triple-flame structure, and (4) three-dimensional turbulent lifted flames. The results demonstrate the satisfying accuracy and generalization ability of the pre-trained DNN. The Fortran and Python versions of DNN and example code are attached in the supplementary for reproducibility.

Submitted 12 August, 2022; v1 submitted 9 January, 2022; originally announced January 2022.
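
Note on the Box-Cox transformation (BCT) named in the abstract above: the following is a minimal sketch of the standard one-parameter Box-Cox transform and its inverse, shown only to illustrate why it helps with data spanning many orders of magnitude. The value `lam = 0.1` and the sample mass fractions are illustrative assumptions; the abstract does not state which parameter or data the authors use.

```python
import math


def box_cox(x: float, lam: float) -> float:
    """Standard Box-Cox transform of a strictly positive value x."""
    if x <= 0.0:
        raise ValueError("Box-Cox requires x > 0")
    if lam == 0.0:
        # The lam -> 0 limit of (x**lam - 1) / lam is log(x).
        return math.log(x)
    return (x ** lam - 1.0) / lam


def inverse_box_cox(y: float, lam: float) -> float:
    """Inverse of box_cox for the same lam."""
    if lam == 0.0:
        return math.exp(y)
    return (lam * y + 1.0) ** (1.0 / lam)


# Hypothetical species mass fractions spanning many orders of magnitude.
mass_fractions = [1e-12, 1e-6, 1e-2, 0.5]
lam = 0.1  # assumed value, for illustration only
transformed = [box_cox(x, lam) for x in mass_fractions]
recovered = [inverse_box_cox(y, lam) for y in transformed]
```

With a small positive `lam`, the transform behaves much like a logarithm but stays finite as `x` approaches zero, so trace species are not collapsed to nearly identical values in the training data.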

arXiv:2201.02025 [pdf, other]

A deep learning-based model reduction (DeePMR) method for simplifying chemical kinetics

Authors: Zhiwei Wang, Yaoyu Zhang, Enhan Zhao, Yiguang Ju, Weinan E, Zhi-Qin John Xu, Tianhan Zhang

Abstract: A deep learning-based model reduction (DeePMR) method for simplifying chemical kinetics is proposed and validated using high-temperature auto-ignitions, perfectly stirred reactors (PSR), and one-dimensional freely propagating flames of n-heptane/air mixtures. The mechanism reduction is modeled as an optimization problem on Boolean space, where a Boolean vector, each entry corresponding to a species, represents a reduced mechanism. The optimization goal is to minimize the reduced mechanism size given the error tolerance of a group of pre-selected benchmark quantities. The key idea of the DeePMR is to employ a deep neural network (DNN) to formulate the objective function in the optimization problem. In order to explore high dimensional Boolean space efficiently, an iterative DNN-assisted data sampling and DNN training procedure is implemented. The results show that DNN-assistance improves sampling efficiency significantly, selecting only $10^5$ samples out of $10^{34}$ possible samples for DNN to achieve sufficient accuracy. The results demonstrate the capability of the DNN to recognize key species and reasonably predict reduced mechanism performance. The well-trained DNN guarantees the optimal reduced mechanism by solving an inverse optimization problem. By comparing ignition delay times, laminar flame speeds, and temperatures in PSRs, the resulting skeletal mechanism has fewer species (45 species) but the same level of accuracy as the skeletal mechanism (56 species) obtained by the Path Flux Analysis (PFA) method. In addition, the skeletal mechanism can be further reduced to 28 species if only considering atmospheric, near-stoichiometric conditions (equivalence ratio between 0.6 and 1.2). The DeePMR provides an innovative way to perform model reduction and demonstrates the great potential of data-driven methods in the combustion area.

Submitted 8 September, 2022; v1 submitted 6 January, 2022; originally announced January 2022.

arXiv:2107.03673 [pdf, other]

doi 10.4208/cicp.OA-2021-0257

MOD-Net: A Machine Learning Approach via Model-Operator-Data Network for Solving PDEs

Authors: Lulu Zhang, Tao Luo, Yaoyu Zhang, Weinan E, Zhi-Qin John Xu, Zheng Ma

Abstract: In this paper, we propose a machine learning approach via model-operator-data network (MOD-Net) for solving PDEs. A MOD-Net is driven by a model to solve PDEs based on operator representation with regularization from data. For linear PDEs, we use a DNN to parameterize the Green's function and obtain the neural operator to approximate the solution according to the Green's method. To train the DNN, the empirical risk consists of the mean squared loss with the least square formulation or the variational formulation of the governing equation and boundary conditions. For complicated problems, the empirical risk also includes a few labels, which are computed on coarse grid points with cheap computation cost and significantly improve the model accuracy. Intuitively, the labeled dataset works as a regularization in addition to the model constraints. The MOD-Net solves a family of PDEs rather than a specific one and is much more efficient than the original neural operator because few expensive labels are required. We numerically show MOD-Net is very efficient in solving the Poisson equation and the one-dimensional radiative transfer equation. For nonlinear PDEs, the nonlinear MOD-Net can be similarly used as an ansatz for solving nonlinear PDEs, exemplified by solving several nonlinear PDE problems, such as the Burgers equation.

Submitted 28 December, 2021; v1 submitted 8 July, 2021; originally announced July 2021.

arXiv:2012.12654 [pdf]

A deep learning-based ODE solver for chemical kinetics

Authors: Tianhan Zhang, Yaoyu Zhang, Weinan E, Yiguang Ju

Abstract: Developing efficient and accurate algorithms for chemistry integration is a challenging task due to its strong stiffness and high dimensionality. The current work presents a deep learning-based numerical method called DeepCombustion0.0 to solve stiff ordinary differential equation systems. The homogeneous autoignition of DME/air mixture, including 54 species, is adopted as an example to illustrate the validity and accuracy of the algorithm. The training and testing datasets cover a wide range of temperature, pressure, and mixture conditions between 750-1200 K, 30-50 atm, and equivalence ratio = 0.7-1.5. Both the first-stage low-temperature ignition (LTI) and the second-stage high-temperature ignition (HTI) are considered. The methodology highlights the importance of the adaptive data sampling techniques, power transform preprocessing, and binary deep neural network (DNN) design. By using the adaptive random samplings and appropriate power transforms, smooth submanifolds in the state vector phase space are observed, on which two three-layer DNNs can be appropriately trained. The neural networks are end-to-end, which predict temporal gradients of the state vectors directly. The results show that temporal evolutions predicted by DNN agree well with traditional numerical methods in all state vector dimensions, including temperature, pressure, and species concentrations. Besides, the ignition delay time differences are within 1%. At the same time, the CPU time is reduced by more than 20 times and 200 times compared with the HMTS and VODE methods, respectively. The current work demonstrates the enormous potential of applying the deep learning algorithm in chemical kinetics and combustion modeling.

Submitted 23 November, 2020; originally announced December 2020.

arXiv:2012.01484 [pdf, ps, other]

Some observations on high-dimensional partial differential equations with Barron data

Authors: Weinan E, Stephan Wojtowytsch

Abstract: We use explicit representation formulas to show that solutions to certain partial differential equations lie in Barron spaces or multilayer spaces if the PDE data lie in such function spaces. Consequently, these solutions can be represented efficiently using artificial neural networks, even in high dimension. Conversely, we present examples in which the solution fails to lie in the function space associated to a neural network under consideration.

Submitted 4 June, 2021; v1 submitted 2 December, 2020; originally announced December 2020.

MSC Class: 68T07; 35C15; 65M80

arXiv:2010.05627 [pdf, other]

Towards Theoretically Understanding Why SGD Generalizes Better Than ADAM in Deep Learning

Authors: Pan Zhou, Jiashi Feng, Chao Ma, Caiming Xiong, Steven Hoi, Weinan E

Abstract: It is not clear yet why ADAM-alike adaptive gradient algorithms suffer from worse generalization performance than SGD despite their faster training speed. This work aims to provide an understanding of this generalization gap by analyzing their local convergence behaviors. Specifically, we observe the heavy tails of gradient noise in these algorithms. This motivates us to analyze these algorithms through their Levy-driven stochastic differential equations (SDEs) because of the similar convergence behaviors of an algorithm and its SDE. Then we establish the escaping time of these SDEs from a local basin. The result shows that (1) the escaping time of both SGD and ADAM depends on the Radon measure of the basin positively and the heaviness of gradient noise negatively; (2) for the same basin, SGD enjoys smaller escaping time than ADAM, mainly because (a) the geometry adaptation in ADAM via adaptively scaling each gradient coordinate well diminishes the anisotropic structure in gradient noise and results in larger Radon measure of a basin; (b) the exponential gradient average in ADAM smooths its gradient and leads to lighter gradient noise tails than SGD. So SGD is more locally unstable than ADAM at sharp minima, defined as the minima whose local basins have small Radon measure, and can better escape from them to flatter ones with larger Radon measure. As flat minima, which here refer to the minima at flat or asymmetric basins/valleys, often generalize better than sharp ones, our result explains the better generalization performance of SGD over ADAM. Finally, experimental results confirm our heavy-tailed gradient noise assumption and theoretical affirmation.

Submitted 28 November, 2021; v1 submitted 12 October, 2020; originally announced October 2020.

Comments: NeurIPS 2020

arXiv:2009.14596 [pdf, other]

doi 10.4208/cicp.OA-2020-0185

Machine Learning and Computational Mathematics

Authors: Weinan E

Abstract: Neural network-based machine learning is capable of approximating functions in very high dimension with unprecedented efficiency and accuracy. This has opened up many exciting new possibilities, not just in traditional areas of artificial intelligence, but also in scientific computing and computational science. At the same time, machine learning has also acquired the reputation of being a set of "black box" type of tricks, without fundamental principles. This has been a real obstacle for making further progress in machine learning. In this article, we try to address the following two very important questions: (1) How machine learning has already impacted and will further impact computational mathematics, scientific computing and computational science? (2) How computational mathematics, particularly numerical analysis, can impact machine learning? We describe some of the most important progress that has been made on these issues. Our hope is to put things into a perspective that will help to integrate machine learning with computational mathematics.

Submitted 23 September, 2020; originally announced September 2020.

MSC Class: 68T07; 46E15; 26B35; 26B40

arXiv:2009.13500 [pdf, ps, other]

A priori estimates for classification problems using neural networks

Authors: Weinan E, Stephan Wojtowytsch

Abstract: We consider binary and multi-class classification problems using hypothesis classes of neural networks. For a given hypothesis class, we use Rademacher complexity estimates and direct approximation theorems to obtain a priori error estimates for regularized loss functionals.

Submitted 28 September, 2020; originally announced September 2020.

MSC Class: 68T07; 60-08

arXiv:2009.10713 [pdf, other]

Towards a Mathematical Understanding of Neural Network-Based Machine Learning: what we know and what we don't

Authors: Weinan E, Chao Ma, Stephan Wojtowytsch, Lei Wu

Abstract: The purpose of this article is to review the achievements made in the last few years towards the understanding of the reasons behind the success and subtleties of neural network-based machine learning. In the tradition of good old applied mathematics, we will not only give attention to rigorous mathematical results, but also the insight we have gained from careful numerical experiments as well as the analysis of simplified models. Along the way, we also list the open problems which we believe to be the most important topics for further study. This is not a complete overview over this quickly moving field, but we hope to provide a perspective which may be helpful especially to new researchers in the area.

Submitted 7 December, 2020; v1 submitted 22 September, 2020; originally announced September 2020.

Comments: Review article. Feedback welcome

MSC Class: 68T07 (primary); 26B40; 41A30; 35Q68

arXiv:2009.07799 [pdf, other]

On the Curse of Memory in Recurrent Neural Networks: Approximation and Optimization Analysis

Authors: Zhong Li, Jiequn Han, Weinan E, Qianxiao Li

Abstract: We study the approximation properties and optimization dynamics of recurrent neural networks (RNNs) when applied to learn input-output relationships in temporal data. We consider the simple but representative setting of using continuous-time linear RNNs to learn from data generated by linear relationships. Mathematically, the latter can be understood as a sequence of linear functionals. We prove a universal approximation theorem of such linear functionals, and characterize the approximation rate and its relation with memory. Moreover, we perform a fine-grained dynamical analysis of training linear RNNs, which further reveals the intricate interactions between memory and learning. A unifying theme uncovered is the non-trivial effect of memory, a notion that can be made precise in our framework, on approximation and optimization: when there is long term memory in the target, it takes a large number of neurons to approximate it. Moreover, the training process will suffer from slowdowns. In particular, both of these effects become exponentially more pronounced with memory - a phenomenon we call the "curse of memory". These analyses represent a basic step towards a concrete mathematical understanding of new phenomena that may arise in learning temporal relationships using recurrent architectures.

Submitted 15 May, 2021; v1 submitted 16 September, 2020; originally announced September 2020.

Comments: Published version

MSC Class: 68W25; 68T07; 37M10

ACM Class: I.2.6

arXiv:2009.02327 [pdf, other]

doi 10.1103/PhysRevFluids.6.114402

OnsagerNet: Learning Stable and Interpretable Dynamics using a Generalized Onsager Principle

Authors: Haijun Yu, Xinyuan Tian, Weinan E, Qianxiao Li

Abstract: We propose a systematic method for learning stable and physically interpretable dynamical models using sampled trajectory data from physical processes based on a generalized Onsager principle. The learned dynamics are autonomous ordinary differential equations parameterized by neural networks that retain clear physical structure information, such as free energy, diffusion, conservative motion and external forces. For high dimensional problems with a low dimensional slow manifold, an autoencoder with metric preserving regularization is introduced to find the low dimensional generalized coordinates on which we learn the generalized Onsager dynamics. Our method exhibits clear advantages over existing methods on benchmark problems for learning ordinary differential equations. We further apply this method to study Rayleigh-Benard convection and learn Lorenz-like low dimensional autonomous reduced order models that capture both qualitative and quantitative properties of the underlying dynamics. This forms a general approach to building reduced order models for forced dissipative systems.

Submitted 17 October, 2021; v1 submitted 6 September, 2020; originally announced September 2020.

Comments: 29 pages, 19 figures

MSC Class: 76E30; 34D20; 68T05/07; 82C35

Journal ref: Phys. Rev. Fluids 6(11):114402, 2021

arXiv:2008.13333 [pdf, other]

doi 10.1088/1361-6544/ac337f

Algorithms for Solving High Dimensional PDEs: From Nonlinear Monte Carlo to Machine Learning

Authors: Weinan E, Jiequn Han, Arnulf Jentzen

Abstract: In recent years, tremendous progress has been made on numerical algorithms for solving partial differential equations (PDEs) in a very high dimension, using ideas from either nonlinear (multilevel) Monte Carlo or deep learning. They are potentially free of the curse of dimensionality for many different applications and have been proven to be so in the case of some nonlinear Monte Carlo methods for nonlinear parabolic PDEs. In this paper, we review these numerical and theoretical advances. In addition to algorithms based on stochastic reformulations of the original problem, such as the multilevel Picard iteration and the Deep BSDE method, we also discuss algorithms based on the more traditional Ritz, Galerkin, and least square formulations. We hope to demonstrate to the reader that studying PDEs as well as control and variational problems in very high dimensions might very well be among the most promising new directions in mathematics and scientific computing in the near future.

Submitted 11 September, 2020; v1 submitted 30 August, 2020; originally announced August 2020.

MSC Class: 65C05; 65K10; 65M75; 90C06

Journal ref: Nonlinearity 35 (2022) 278-310

arXiv:2007.15623 [pdf, ps, other]

On the Banach spaces associated with multi-layer ReLU networks: Function representation, approximation theory and gradient descent dynamics

Authors: Weinan E, Stephan Wojtowytsch

Abstract: We develop Banach spaces for ReLU neural networks of finite depth $L$ and infinite width. The spaces contain all finite fully connected $L$-layer networks and their $L^2$-limiting objects under bounds on the natural path-norm. Under this norm, the unit ball in the space for $L$-layer networks has low Rademacher complexity and thus favorable generalization properties. Functions in these spaces can be approximated by multi-layer neural networks with dimension-independent convergence rates. The key to this work is a new way of representing functions in some form of expectations, motivated by multi-layer neural networks. This representation allows us to define a new class of continuous models for machine learning. We show that the gradient flow defined this way is the natural continuous analog of the gradient descent dynamics for the associated multi-layer neural networks. We show that the path-norm increases at most polynomially under this continuous gradient flow dynamics.

Submitted 30 July, 2020; originally announced July 2020.

MSC Class: 68T07; 46E15; 26B35; 26B40

arXiv:2006.14450 [pdf, other]

The Quenching-Activation Behavior of the Gradient Descent Dynamics for Two-layer Neural Network Models

Authors: Chao Ma, Lei Wu, Weinan E

Abstract: A numerical and phenomenological study of the gradient descent (GD) algorithm for training two-layer neural network models is carried out for different parameter regimes when the target function can be accurately approximated by a relatively small number of neurons. It is found that for Xavier-like initialization, there are two distinctive phases in the dynamic behavior of GD in the under-parametrized regime: An early phase in which the GD dynamics follows closely that of the corresponding random feature model and the neurons are effectively quenched, followed by a late phase in which the neurons are divided into two groups: a group of a few "activated" neurons that dominate the dynamics and a group of background (or "quenched") neurons that support the continued activation and deactivation process. This neural network-like behavior is continued into the mildly over-parametrized regime, where it undergoes a transition to a random feature-like behavior. The quenching-activation process seems to provide a clear mechanism for "implicit regularization". This is qualitatively different from the dynamics associated with the "mean-field" scaling where all neurons participate equally and there do not appear to be qualitative changes when the network parameters are changed.

Submitted 25 June, 2020; originally announced June 2020.

Comments: 23 pages

arXiv:2006.05982 [pdf, ps, other]

Representation formulas and pointwise properties for Barron functions

Authors: Weinan E, Stephan Wojtowytsch

Abstract: We study the natural function space for infinitely wide two-layer neural networks with ReLU activation (Barron space) and establish different representation formulae. In two cases, we describe the space explicitly up to isomorphism. Using a convenient representation, we study the pointwise properties of two-layer networks and show that functions whose singular set is fractal or curved (for example distance functions from smooth submanifolds) cannot be represented by infinitely wide two-layer networks with finite path-norm. We use this structure theorem to show that the only $C^1$-diffeomorphisms which preserve Barron space are affine. Furthermore, we show that every Barron function can be decomposed as the sum of a bounded and a positively one-homogeneous function and that there exist Barron functions which decay rapidly at infinity and are globally Lebesgue-integrable. This result suggests that two-layer neural networks may be able to approximate a greater variety of functions than commonly believed.

Submitted 4 June, 2021; v1 submitted 10 June, 2020; originally announced June 2020.

MSC Class: 68T07; 46E15; 26B35; 26B40

arXiv:2006.02619 [pdf, other]

Integrating Machine Learning with Physics-Based Modeling

Authors: Weinan E, Jiequn Han, Linfeng Zhang

Abstract: Machine learning is poised as a very powerful tool that can drastically improve our ability to carry out scientific research. However, many issues need to be addressed before this becomes a reality. This article focuses on one particular issue of broad interest: How can we integrate machine learning with physics-based modeling to develop new interpretable and truly reliable physical models? After introducing the general guidelines, we discuss the two most important issues for developing machine learning-based physical models: Imposing physical constraints and obtaining optimal datasets. We also provide a simple and intuitive explanation for the fundamental reasons behind the success of modern machine learning, as well as an introduction to the concurrent machine learning framework needed for integrating machine learning with physics-based modeling. Molecular dynamics and moment closure of kinetic equations are used as examples to illustrate the main issues discussed. We end with a general discussion on where this integration will lead us to, and where the new frontier will be after machine learning is successfully integrated into scientific modeling.

Submitted 3 June, 2020; originally announced June 2020.

arXiv:2005.10815 [pdf, other]

Can Shallow Neural Networks Beat the Curse of Dimensionality? A mean field training perspective

Authors: Stephan Wojtowytsch, Weinan E

Abstract: We prove that the gradient descent training of a two-layer neural network on empirical or population risk may not decrease population risk at an order faster than $t^{-4/(d-2)}$ under mean field scaling. Thus gradient descent training for fitting reasonably smooth, but truly high-dimensional data may be subject to the curse of dimensionality. We present numerical evidence that gradient descent training with general Lipschitz target functions becomes slower and slower as the dimension increases, but converges at approximately the same rate in all dimensions when the target function lies in the natural function space for two-layer ReLU networks.

Submitted 21 May, 2020; originally announced May 2020.

Comments: 5 figures

MSC Class: 68T07; 49Q22; 68W25

arXiv:2005.10807 [pdf, ps, other]

Kolmogorov Width Decay and Poor Approximators in Machine Learning: Shallow Neural Networks, Random Feature Models and Neural Tangent Kernels

Authors: Weinan E, Stephan Wojtowytsch

Abstract: We establish a scale separation of Kolmogorov width type between subspaces of a given Banach space under the condition that a sequence of linear maps converges much faster on one of the subspaces. The general technique is then applied to show that reproducing kernel Hilbert spaces are poor $L^2$-approximators for the class of two-layer neural networks in high dimension, and that multi-layer networks with small path norm are poor approximators for certain Lipschitz functions, also in the $L^2$-topology.

Submitted 2 October, 2020; v1 submitted 21 May, 2020; originally announced May 2020.

MSC Class: 68T07; 41A30; 41A65; 46E15; 46E22

arXiv:2003.03672 [pdf, other]

doi 10.1103/PhysRevE.102.043309

Machine learning based non-Newtonian fluid model with molecular fidelity

Authors: Huan Lei, Lei Wu, Weinan E

Abstract: We introduce a machine-learning-based framework for constructing continuum non-Newtonian fluid dynamics models directly from a micro-scale description. Dumbbell polymer solutions are used as examples to demonstrate the essential ideas. To faithfully retain molecular fidelity, we establish a micro-macro correspondence via a set of encoders for the micro-scale polymer configurations and their macro-scale counterparts, a set of nonlinear conformation tensors. The dynamics of these conformation tensors can be derived from the micro-scale model and the relevant terms can be parametrized using machine learning. The final model, named the deep non-Newtonian model (DeePN$^2$), takes the form of conventional non-Newtonian fluid dynamics models, with a new form of the objective tensor derivative. Both the formulation of the dynamic equation and the neural network representation rigorously preserve the rotational invariance, which ensures the admissibility of the constructed model. Numerical results demonstrate the accuracy of DeePN$^2$, where models based on empirical closures show limitations.

Submitted 23 October, 2020; v1 submitted 7 March, 2020; originally announced March 2020.

Journal ref: Phys. Rev. E 102, 043309 (2020)

arXiv:1912.12777 [pdf, ps, other]

doi 10.1007/s11425-020-1773-8

Machine Learning from a Continuous Viewpoint

Authors: Weinan E, Chao Ma, Lei Wu

Abstract: We present a continuous formulation of machine learning, as a problem in the calculus of variations and differential-integral equations, in the spirit of classical numerical analysis. We demonstrate that conventional machine learning models and algorithms, such as the random feature model, the two-layer neural network model and the residual neural network model, can all be recovered (in a scaled form) as particular discretizations of different continuous formulations. We also present examples of new models, such as the flow-based random feature model, and new algorithms, such as the smoothed particle method and spectral method, that arise naturally from this continuous formulation. We discuss how the issues of generalization error and implicit regularization can be studied under this framework.

Submitted 26 September, 2020; v1 submitted 29 December, 2019; originally announced December 2019.

Comments: published version

Journal ref: Science China Mathematics (2020)

arXiv:1912.06987 [pdf, ps, other]

The Generalization Error of the Minimum-norm Solutions for Over-parameterized Neural Networks

Authors: Weinan E, Chao Ma, Lei Wu

Abstract: We study the generalization properties of minimum-norm solutions for three over-parametrized machine learning models including the random feature model, the two-layer neural network model and the residual network model. We proved that for all three models, the generalization error for the minimum-norm solution is comparable to the Monte Carlo rate, up to some logarithmic terms, as long as the models are sufficiently over-parametrized.

Submitted 28 January, 2021; v1 submitted 15 December, 2019; originally announced December 2019.

Comments: Published version

Journal ref: Pure and Applied Functional Analysis, Volume 5, Number 6, 1145-1460, 2020

arXiv:1906.08039 [pdf, ps, other]

The Barron Space and the Flow-induced Function Spaces for Neural Network Models

Authors: Weinan E, Chao Ma, Lei Wu

Abstract: One of the key issues in the analysis of machine learning models is to identify the appropriate function space and norm for the model. This is the set of functions endowed with a quantity which can control the approximation and estimation errors by a particular machine learning model. In this paper, we address this issue for two representative neural network models: the two-layer networks and the residual neural networks. We define the Barron space and show that it is the right space for two-layer neural network models in the sense that optimal direct and inverse approximation theorems hold for functions in the Barron space. For residual neural network models, we construct the so-called flow-induced function space, and prove direct and inverse approximation theorems for this space. In addition, we show that the Rademacher complexity for bounded sets under these norms has the optimal upper bounds.

Submitted 27 March, 2021; v1 submitted 18 June, 2019; originally announced June 2019.

arXiv:1904.05263 [pdf, other]

Analysis of the Gradient Descent Algorithm for a Deep Neural Network Model with Skip-connections

Authors: Weinan E, Chao Ma, Qingcan Wang, Lei Wu

Abstract: The behavior of the gradient descent (GD) algorithm is analyzed for a deep neural network model with skip-connections. It is proved that in the over-parametrized regime, for a suitable initialization, with high probability GD can find a global minimum exponentially fast. Generalization error estimates along the GD path are also established. As a consequence, it is shown that when the target function is in the reproducing kernel Hilbert space (RKHS) with a kernel defined by the initialization, there exist generalizable early-stopping solutions along the GD path. In addition, it is also shown that the GD path is uniformly close to the functions given by the related random feature model. Consequently, in this "implicit regularization" setting, the deep neural network model deteriorates to a random feature model. Our results hold for neural networks of any width larger than the input dimension.

Submitted 14 April, 2019; v1 submitted 10 April, 2019; originally announced April 2019.

Comments: 29 pages, 4 figures

arXiv:1904.04326 [pdf, other]

doi 10.1007/s11425-019-1628-5

A Comparative Analysis of the Optimization and Generalization Property of Two-layer Neural Network and Random Feature Models Under Gradient Descent Dynamics

Authors: Weinan E, Chao Ma, Lei Wu

Abstract: A fairly comprehensive analysis is presented for the gradient descent dynamics for training two-layer neural network models in the situation when the parameters in both layers are updated. General initialization schemes as well as general regimes for the network width and training data size are considered. In the over-parametrized regime, it is shown that gradient descent dynamics can achieve zero training loss exponentially fast regardless of the quality of the labels. In addition, it is proved that throughout the training process the functions represented by the neural network model are uniformly close to those of a kernel method. For general values of the network width and training data size, sharp estimates of the generalization error are established for target functions in the appropriate reproducing kernel Hilbert space.

Submitted 20 February, 2020; v1 submitted 8 April, 2019; originally announced April 2019.

Comments: Published version

MSC Class: 41A99; 49M99

Journal ref: Science China Mathematics (2020)

arXiv:1810.06397 [pdf, other]

doi 10.4310/CMS.2019.v17.n5.a11

A Priori Estimates of the Population Risk for Two-layer Neural Networks

Authors: Weinan E, Chao Ma, Lei Wu

Abstract: New estimates for the population risk are established for two-layer neural networks. These estimates are nearly optimal in the sense that the error rates scale in the same way as the Monte Carlo error rates. They are equally effective in the over-parametrized regime when the network size is much larger than the size of the dataset. These new estimates are a priori in nature in the sense that the bounds depend only on some norms of the underlying functions to be fitted, not the parameters in the model, in contrast with most existing results which are a posteriori in nature. Using these a priori estimates, we provide a perspective for understanding why two-layer neural networks perform better than the related kernel methods.

Submitted 20 February, 2020; v1 submitted 15 October, 2018; originally announced October 2018.

Comments: Published version

MSC Class: 41A46; 41A63; 62J02; 65D05

Journal ref: Communications in Mathematical Sciences, Volume 17(2019)

arXiv:1809.10188 [pdf, other]

Monge-Ampère Flow for Generative Modeling

Authors: Linfeng Zhang, Weinan E, Lei Wang

Abstract: We present a deep generative model, named Monge-Ampère flow, which builds on continuous-time gradient flow arising from the Monge-Ampère equation in optimal transport theory. The generative map from the latent space to the data space follows a dynamical system, where a learnable potential function guides a compressible fluid to flow towards the target density distribution. Training of the model amounts to solving an optimal control problem. The Monge-Ampère flow has tractable likelihoods and supports efficient sampling and inference. One can easily impose symmetry constraints in the generative model by designing suitable scalar potential functions. We apply the approach to unsupervised density estimation of the MNIST dataset and variational calculation of the two-dimensional Ising model at the critical point. This approach brings insights and techniques from Monge-Ampère equation, optimal transport, and fluid dynamics into reversible flow-based generative models.

Submitted 26 September, 2018; originally announced September 2018.

arXiv:1807.01083 [pdf, ps, other]

doi 10.1007/s40687-018-0172-y

A Mean-Field Optimal Control Formulation of Deep Learning

Authors: Weinan E, Jiequn Han, Qianxiao Li

Abstract: Recent work linking deep neural networks and dynamical systems opened up new avenues to analyze deep learning. In particular, it is observed that new insights can be obtained by recasting deep learning as an optimal control problem on difference or differential equations. However, the mathematical aspects of such a formulation have not been systematically explored. This paper introduces the mathematical formulation of the population risk minimization problem in deep learning as a mean-field optimal control problem. Mirroring the development of classical optimal control, we state and prove optimality conditions of both the Hamilton-Jacobi-Bellman type and the Pontryagin type. These mean-field results reflect the probabilistic nature of the learning problem. In addition, by appealing to the mean-field Pontryagin's maximum principle, we establish some quantitative relationships between population and empirical learning problems. This serves to establish a mathematical foundation for investigating the algorithmic and theoretical connections between optimal control and deep learning.

Submitted 3 July, 2018; originally announced July 2018.

Comments: 44 pages

Journal ref: Research in the Mathematical Sciences, 6:10 (2019)

arXiv:1709.05963 [pdf, other]

doi 10.1007/s00332-018-9525-3

Machine learning approximation algorithms for high-dimensional fully nonlinear partial differential equations and second-order backward stochastic differential equations

Authors: Christian Beck, Weinan E, Arnulf Jentzen

Abstract: High-dimensional partial differential equations (PDE) appear in a number of models from the financial industry, such as in derivative pricing models, credit valuation adjustment (CVA) models, or portfolio optimization models. The PDEs in such applications are high-dimensional as the dimension corresponds to the number of financial assets in a portfolio. Moreover, such PDEs are often fully nonlinear due to the need to incorporate certain nonlinear phenomena in the model such as default risks, transaction costs, volatility uncertainty (Knightian uncertainty), or trading constraints in the model. Such high-dimensional fully nonlinear PDEs are exceedingly difficult to solve as the computational effort for standard approximation methods grows exponentially with the dimension. In this work we propose a new method for solving high-dimensional fully nonlinear second-order PDEs. Our method can in particular be used to sample from high-dimensional nonlinear expectations. The method is based on (i) a connection between fully nonlinear second-order PDEs and second-order backward stochastic differential equations (2BSDEs), (ii) a merged formulation of the PDE and the 2BSDE problem, (iii) a temporal forward discretization of the 2BSDE and a spatial approximation via deep neural nets, and (iv) a stochastic gradient descent-type optimization procedure. Numerical results obtained using ${\rm T{\small ENSOR}F{\small LOW}}$ in ${\rm P{\small YTHON}}$ illustrate the efficiency and the accuracy of the method in the cases of a $100$-dimensional Black-Scholes-Barenblatt equation, a $100$-dimensional Hamilton-Jacobi-Bellman equation, and a nonlinear expectation of a $100$-dimensional $G$-Brownian motion.

Submitted 18 September, 2017; originally announced September 2017.

Comments: 56 pages, 12 figures

MSC Class: 65C99; 65M99; 60H30; 65-05

Journal ref: J. Nonlinear Sci. 29, 1563-1619 (2019)

arXiv:1708.03223 [pdf, ps, other]

doi 10.1007/s10915-018-00903-0

On multilevel Picard numerical approximations for high-dimensional nonlinear parabolic partial differential equations and high-dimensional nonlinear backward stochastic differential equations

Authors: Weinan E, Martin Hutzenthaler, Arnulf Jentzen, Thomas Kruse

Abstract: Parabolic partial differential equations (PDEs) and backward stochastic differential equations (BSDEs) are key ingredients in a number of models in physics and financial engineering. In particular, parabolic PDEs and BSDEs are fundamental tools in the state-of-the-art pricing and hedging of financial derivatives. The PDEs and BSDEs appearing in such applications are often high-dimensional and nonlinear. Since explicit solutions of such PDEs and BSDEs are typically not available, it is a very active topic of research to solve such PDEs and BSDEs approximately. In the recent article [E, W., Hutzenthaler, M., Jentzen, A., and Kruse, T. Linear scaling algorithms for solving high-dimensional nonlinear parabolic differential equations. arXiv:1607.03295 (2017)] we proposed a family of approximation methods based on Picard approximations and multilevel Monte Carlo methods and showed under suitable regularity assumptions on the exact solution for semilinear heat equations that the computational complexity is bounded by $O( d \, ε^{-(4+δ)})$ for any $δ\in(0,\infty)$, where $d$ is the dimensionality of the problem and $ε\in(0,\infty)$ is the prescribed accuracy. In this paper, we test the applicability of this algorithm on a variety of $100$-dimensional nonlinear PDEs that arise in physics and finance by means of numerical simulations presenting approximation accuracy against runtime. The simulation results for these 100-dimensional example PDEs are very satisfactory in terms of accuracy and speed. In addition, we also provide a review of other approximation methods for nonlinear PDEs and BSDEs from the literature.

Submitted 10 August, 2017; originally announced August 2017.

Journal ref: J. Sci. Comput. 79, 1534-1571 (2019)

arXiv:1707.02568 [pdf, other]

doi 10.1073/pnas.1718942115

Solving high-dimensional partial differential equations using deep learning

Authors: Jiequn Han, Arnulf Jentzen, Weinan E

Abstract: Developing algorithms for solving high-dimensional partial differential equations (PDEs) has been an exceedingly difficult task for a long time, due to the notoriously difficult problem known as the "curse of dimensionality". This paper introduces a deep learning-based approach that can handle general high-dimensional parabolic PDEs. To this end, the PDEs are reformulated using backward stochastic differential equations and the gradient of the unknown solution is approximated by neural networks, very much in the spirit of deep reinforcement learning with the gradient acting as the policy function. Numerical results on examples including the nonlinear Black-Scholes equation, the Hamilton-Jacobi-Bellman equation, and the Allen-Cahn equation suggest that the proposed algorithm is quite effective in high dimensions, in terms of both accuracy and cost. This opens up new possibilities in economics, finance, operational research, and physics, by considering all participating agents, assets, resources, or particles together at the same time, instead of making ad hoc assumptions on their inter-relationships.

Submitted 3 July, 2018; v1 submitted 9 July, 2017; originally announced July 2017.

Comments: 13 pages, 6 figures

Journal ref: Proceedings of the National Academy of Sciences, 115(34), 8505-8510 (2018)

arXiv:1706.04702 [pdf, other]

doi 10.1007/s40304-017-0117-6

Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations

Authors: Weinan E, Jiequn Han, Arnulf Jentzen

Abstract: We propose a new algorithm for solving parabolic partial differential equations (PDEs) and backward stochastic differential equations (BSDEs) in high dimension, by making an analogy between the BSDE and reinforcement learning with the gradient of the solution playing the role of the policy function, and the loss function given by the error between the prescribed terminal condition and the solution of the BSDE. The policy function is then approximated by a neural network, as is done in deep reinforcement learning. Numerical results using TensorFlow illustrate the efficiency and accuracy of the proposed algorithms for several 100-dimensional nonlinear PDEs from physics and finance such as the Allen-Cahn equation, the Hamilton-Jacobi-Bellman equation, and a nonlinear pricing model for financial derivatives.

Submitted 14 June, 2017; originally announced June 2017.

Comments: 39 pages, 15 figures

MSC Class: 65M75; 60H35; 65C30

Journal ref: Commun. Math. Stat. 5, 349-380 (2017)

arXiv:1611.07422 [pdf, other]

Deep Learning Approximation for Stochastic Control Problems

Authors: Jiequn Han, Weinan E

Abstract: Many real world stochastic control problems suffer from the "curse of dimensionality". To overcome this difficulty, we develop a deep learning approach that directly solves high-dimensional stochastic control problems based on Monte-Carlo sampling. We approximate the time-dependent controls as feedforward neural networks and stack these networks together through model dynamics. The objective function for the control problem plays the role of the loss function for the deep neural network. We test this approach using examples from the areas of optimal trading and energy storage. Our results suggest that the algorithm presented here achieves satisfactory accuracy and at the same time, can handle rather high dimensional problems.

Submitted 1 November, 2016; originally announced November 2016.

arXiv:1607.03295 [pdf, ps, other]

doi 10.1007/s42985-021-00089-5

Multilevel Picard iterations for solving smooth semilinear parabolic heat equations

Authors: Weinan E, Martin Hutzenthaler, Arnulf Jentzen, Thomas Kruse

Abstract: We introduce a new family of numerical algorithms for approximating solutions of general high-dimensional semilinear parabolic partial differential equations at single space-time points. The algorithm is obtained through a delicate combination of the Feynman-Kac and the Bismut-Elworthy-Li formulas, and an approximate decomposition of the Picard fixed-point iteration with multilevel accuracy. The algorithm has been tested on a variety of semilinear partial differential equations that arise in physics and finance, with very satisfactory results. Analytical tools needed for the analysis of such algorithms, including a semilinear Feynman-Kac formula, a new class of semi-norms and their recursive inequalities, are also introduced. They allow us to prove for semilinear heat equations with gradient-independent nonlinearity that the computational complexity of the proposed algorithm is bounded by $O(d\,\varepsilon^{-(4+δ)})$ for any $δ\in (0,\infty)$ under suitable assumptions, where $d\in \mathbb{N}$ is the dimensionality of the problem and $\varepsilon\in(0,\infty)$ is the prescribed accuracy.

Submitted 22 February, 2019; v1 submitted 12 July, 2016; originally announced July 2016.

Journal ref: Partial Differential Equations and Applications 2 (2021), no. 80

arXiv:1511.02975 [pdf, other]

Noisy Hegselmann-Krause Systems: Phase Transition and the 2R-Conjecture

Authors: Chu Wang, Qianxiao Li, Weinan E, Bernard Chazelle

Abstract: The classic Hegselmann-Krause (HK) model for opinion dynamics consists of a set of agents on the real line, each one instructed to move, at every time step, to the mass center of all the agents within a fixed distance R. In this work, we investigate the effects of noise in the continuous-time version of the model as described by its mean-field limiting Fokker-Planck equation. In the presence of a finite number of agents, the system exhibits a phase transition from order to disorder as the noise increases. The ordered phase features clusters whose width depends only on the noise level. We introduce an order parameter to track the phase transition and resolve the corresponding phase diagram. The system undergoes a phase transition for small R but none for larger R. Based on the stability analysis of the mean-field equation, we derive the existence of a forbidden zone for the disordered phase to emerge. We also provide a theoretical explanation for the well-known 2R conjecture, which states that, for a random initial distribution in a fixed interval, the final configuration consists of clusters separated by a distance of roughly 2R. Our theoretical analysis also confirms previous simulations and predicts properties of the noisy HK model in higher dimension.

Submitted 24 November, 2015; v1 submitted 9 November, 2015; originally announced November 2015.

arXiv:1211.1446 [pdf, ps, other]

doi 10.1016/j.jcp.2013.03.030

Efficient iterative method for solving the Dirac-Kohn-Sham density functional theory

Authors: Lin Lin, Sihong Shao, Weinan E

Abstract: We present for the first time an efficient iterative method to directly solve the four-component Dirac-Kohn-Sham (DKS) density functional theory. Due to the existence of the negative energy continuum in the DKS operator, the existing iterative techniques for solving the Kohn-Sham systems cannot be efficiently applied to solve the DKS systems. The key component of our method is a novel filtering step (F) which acts as a preconditioner in the framework of the locally optimal block preconditioned conjugate gradient (LOBPCG) method. The resulting method, dubbed the LOBPCG-F method, is able to compute the desired eigenvalues and eigenvectors in the positive energy band without computing any state in the negative energy band. The LOBPCG-F method introduces mild extra cost compared to the standard LOBPCG method and can be easily implemented. We demonstrate our method in the pseudopotential framework with a planewave basis set which naturally satisfies the kinetic balance prescription. Numerical results for Pt$_{2}$, Au$_{2}$, TlF, and Bi$_{2}$Se$_{3}$ indicate that the LOBPCG-F method is a robust and efficient method for investigating the relativistic effect in systems containing heavy elements.

Submitted 21 March, 2013; v1 submitted 6 November, 2012; originally announced November 2012.

Comments: 31 pages, 5 figures

Journal ref: Journal of Computational Physics 245 (2013) 205-217

arXiv:1102.5545 [pdf, ps, other]

Cauchy-Born rule and spin density wave for the spin-polarized Thomas-Fermi-Dirac-von Weizsacker model

Authors: Weinan E, Jianfeng Lu

Abstract: The electronic structure (electron charges and spins) of a perfect crystal under external magnetic field is analyzed using the spin-polarized Thomas-Fermi-Dirac-von Weizsacker model. An extension of the classical Cauchy-Born rule for crystal lattices is established for the electronic structure under sharp stability conditions on charge density wave and spin density wave. A Landau-Lifschitz type micromagnetic energy functional is derived.

Submitted 27 February, 2011; originally announced February 2011.

Comments: 24 pages; dated June 17, 2010

arXiv:1011.0042 [pdf, ps, other]

doi 10.1088/0951-7715/24/6/008

The Gentlest Ascent Dynamics

Authors: Weinan E, Xiang Zhou

Abstract: Dynamical systems that describe the escape from the basins of attraction of stable invariant sets are presented and analyzed. It is shown that the stable fixed points of such dynamical systems are the index-1 saddle points. Generalizations to high index saddle points are discussed. Both gradient and non-gradient systems are considered. Preliminary results on the nature of the dynamical behavior are presented.

Submitted 4 February, 2011; v1 submitted 29 October, 2010; originally announced November 2010.

arXiv:0812.4352 [pdf, ps, other]

doi 10.1103/PhysRevB.79.115133

Multipole Representation of the Fermi Operator with Application to the Electronic Structure Analysis of Metallic Systems

Authors: Lin Lin, Jianfeng Lu, Roberto Car, Weinan E

Abstract: We propose a multipole representation of the Fermi-Dirac function and the Fermi operator, and use this representation to develop algorithms for electronic structure analysis of metallic systems. The new algorithm is quite simple and efficient. Its computational cost scales logarithmically with $βΔε$ where $β$ is the inverse temperature, and $Δε$ is the width of the spectrum of the discretized Hamiltonian matrix.

Submitted 23 December, 2008; originally announced December 2008.

Comments: 10 pages, 3 figures, 3 tables

Journal ref: Phys. Rev. B, 79, 115133, 2009

arXiv:0806.1621 [pdf, ps, other]

Some Critical Issues for the "Equation-Free" Approach to Multiscale Modeling

Authors: Weinan E, Eric Vanden-Eijnden

Abstract: The "equation-free" approach has been proposed in recent years as a general framework for developing multiscale methods to efficiently capture the macroscale behavior of a system using only the microscale models. In this paper, we take a close look at some of the algorithms proposed under the "equation-free" umbrella, the projective integrators and the patch dynamics. We discuss some very simple examples in the context of the "equation-free" approach. These examples seem to indicate that while its general philosophy is quite attractive and indeed similar to many other approaches in concurrent multiscale modeling, there are severe limitations to the specific implementation proposed by the equation-free approach.

Submitted 10 June, 2008; originally announced June 2008.

MSC Class: 65L99; 65M99

arXiv:math/0212415 [pdf, ps, other]

Energy landscapes and rare events

Authors: Weinan E, Weiqing Ren, Eric Vanden-Eijnden

Abstract: Many problems in physics, material sciences, chemistry and biology can be abstractly formulated as a system that navigates over a complex energy landscape of high or infinite dimensions. Well-known examples include phase transitions of condensed matter, conformational changes of biopolymers, and chemical reactions. The energy landscape typically exhibits multiscale features, giving rise to the multiscale nature of the dynamics. This is one of the main challenges that we face in computational science. In this report, we will review the recent work done by scientists from several disciplines on probing such energy landscapes. Of particular interest is the analysis and computation of transition pathways and transition rates between metastable states. We will then present the string method that has proven to be very effective for some truly complex systems in material science and chemistry.

Submitted 30 November, 2002; originally announced December 2002.

Report number: ICM-2002

MSC Class: 60-08; 60F10; 65C

Journal ref: Proceedings of the ICM, Beijing 2002, vol. 1, 621-630

arXiv:math/0005306 [pdf, ps, other]

Invariant measures for Burgers equation with stochastic forcing

Authors: Weinan E, K. M. Khanin, A. E. Mazel, Ya. G. Sinai

Abstract: In this paper we study the following Burgers equation $\partial u/\partial t + \partial/\partial x\,(u^2/2) = \epsilon\,\partial^2 u/\partial x^2 + f(x,t)$, where $f(x,t) = \partial F/\partial x\,(x,t)$ is a random forcing function, which is periodic in $x$ and white noise in $t$. We prove the existence and uniqueness of an invariant measure by establishing a "one force, one solution" principle, namely that for almost every realization of the force, there is a unique distinguished solution that exists for the time interval $(-\infty, +\infty)$ and this solution attracts all other solutions with the same forcing. This is done by studying the so-called one-sided minimizers. We also give a detailed description of the structure and regularity properties for the stationary solutions. In particular, we prove, under some non-degeneracy conditions on the forcing, that almost surely there is a unique main shock and a unique global minimizer for the stationary solutions. Furthermore the global minimizer is a hyperbolic trajectory of the underlying system of characteristics.

Submitted 30 April, 2000; originally announced May 2000.

Comments: 84 pages, published version, abstract added in migration

Report number: Annals migration 4-2001

Journal ref: Ann. of Math. (2) 151 (2000), no. 3, 877-960

Showing 1–50 of 50 results for author: E, W