Showing 1–50 of 65 results for author: E, W

Searching in archive cs.
  1. arXiv:2406.14969  [pdf, other]

    cs.LG cs.AI

    Uni-Mol2: Exploring Molecular Pretraining Model at Scale

    Authors: Xiaohong Ji, Wang Zhen, Zhifeng Gao, Hang Zheng, Linfeng Zhang, Guolin Ke, Weinan E

    Abstract: In recent years, pretraining models have made significant advancements in the fields of natural language processing (NLP), computer vision (CV), and life sciences. The significant advancements in NLP and CV are predominantly driven by the expansion of model parameters and data size, a phenomenon now recognized as the scaling laws. However, research exploring scaling law in molecular pretraining mo…

    Submitted 21 June, 2024; originally announced June 2024.

  2. arXiv:2405.20763  [pdf, other]

    cs.LG math.OC stat.ML

    Improving Generalization and Convergence by Enhancing Implicit Regularization

    Authors: Mingze Wang, Haotian He, Jinbo Wang, Zilin Wang, Guanhua Huang, Feiyu Xiong, Zhiyu Li, Weinan E, Lei Wu

    Abstract: In this work, we propose an Implicit Regularization Enhancement (IRE) framework to accelerate the discovery of flat solutions in deep learning, thereby improving generalization and convergence. Specifically, IRE decouples the dynamics of flat and sharp directions, which boosts the sharpness reduction along flat directions while maintaining the training stability in sharp directions. We show that I…

    Submitted 31 May, 2024; originally announced May 2024.

    Comments: 35 pages

  3. arXiv:2405.12356  [pdf, other]

    physics.bio-ph cs.LG physics.chem-ph physics.data-an

    Coarse-graining conformational dynamics with multi-dimensional generalized Langevin equation: how, when, and why

    Authors: Pinchen Xie, Yunrui Qiu, Weinan E

    Abstract: A data-driven ab initio generalized Langevin equation (AIGLE) approach is developed to learn and simulate high-dimensional, heterogeneous, coarse-grained conformational dynamics. Constrained by the fluctuation-dissipation theorem, the approach can build coarse-grained models in dynamical consistency with all-atom molecular dynamics. We also propose practical criteria for AIGLE to enforce long-term…

    Submitted 20 May, 2024; originally announced May 2024.

  4. arXiv:2402.00522  [pdf, ps, other]

    cs.LG stat.ML

    Understanding the Expressive Power and Mechanisms of Transformer for Sequence Modeling

    Authors: Mingze Wang, Weinan E

    Abstract: We conduct a systematic study of the approximation properties of Transformer for sequence modeling with long, sparse and complicated memory. We investigate the mechanisms through which different components of Transformer, such as the dot-product self-attention, positional encoding and feed-forward layer, affect its expressive power, and we study their combined effects through establishing explicit…

    Submitted 24 May, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

    Comments: 70 pages

  5. arXiv:2401.08309  [pdf, other]

    cs.CL cs.LG

    Anchor function: a type of benchmark functions for studying language models

    Authors: Zhongwang Zhang, Zhiwei Wang, Junjie Yao, Zhangchen Zhou, Xiaolong Li, Weinan E, Zhi-Qin John Xu

    Abstract: Understanding transformer-based language models is becoming increasingly crucial, particularly as they play pivotal roles in advancing towards artificial general intelligence. However, language model research faces significant challenges, especially for academic research groups with constrained resources. These challenges include complex data structures, unknown target functions, high computationa…

    Submitted 16 January, 2024; originally announced January 2024.

  6. arXiv:2311.17749  [pdf, other]

    math.OC cs.RO

    Learning Free Terminal Time Optimal Closed-loop Control of Manipulators

    Authors: Wei Hu, Yue Zhao, Weinan E, Jiequn Han, Jihao Long

    Abstract: This paper presents a novel approach to learning free terminal time closed-loop control for robotic manipulation tasks, enabling dynamic adjustment of task duration and control inputs to enhance performance. We extend the supervised learning approach, namely solving selected optimal open-loop problems and utilizing them as training data for a policy network, to the free terminal time scenario. Thr…

    Submitted 29 November, 2023; originally announced November 2023.

  7. arXiv:2305.01243  [pdf]

    physics.comp-ph cs.LG

    Machine-Learned Invertible Coarse Graining for Multiscale Molecular Modeling

    Authors: Jun Zhang, Xiaohan Lin, Weinan E, Yi Qin Gao

    Abstract: Multiscale molecular modeling is widely applied in scientific research of molecular properties over large time and length scales. Two specific challenges are commonly present in multiscale modeling, provided that information between the coarse and fine representations of molecules needs to be properly exchanged: One is to construct coarse grained (CG) models by passing information from the fine to…

    Submitted 2 May, 2023; originally announced May 2023.

    Comments: 10 pages, 5 figures, plus SI

  8. arXiv:2302.03498  [pdf, other]

    cs.CL cs.SD eess.AS

    MAC: A unified framework boosting low resource automatic speech recognition

    Authors: Zeping Min, Qian Ge, Zhong Li, Weinan E

    Abstract: We propose a unified framework for low resource automatic speech recognition tasks named meta audio concatenation (MAC). It is easy to implement and can be carried out in extremely low resource environments. Mathematically, we give a clear description of the MAC framework from the perspective of Bayesian sampling. In this framework, we leverage a novel concatenative synthesis text-to-speech system to…

    Submitted 15 February, 2023; v1 submitted 5 February, 2023; originally announced February 2023.

  9. arXiv:2201.03549  [pdf, other]

    physics.chem-ph cs.LG math.NA physics.comp-ph physics.flu-dyn

    A multi-scale sampling method for accurate and robust deep neural network to predict combustion chemical kinetics

    Authors: Tianhan Zhang, Yuxiao Yi, Yifan Xu, Zhi X. Chen, Yaoyu Zhang, Weinan E, Zhi-Qin John Xu

    Abstract: Machine learning has long been considered as a black box for predicting combustion chemical kinetics due to the extremely large number of parameters and the lack of evaluation standards and reproducibility. The current work aims to understand two basic questions regarding the deep neural network (DNN) method: what data the DNN needs and how general the DNN method can be. Sampling and preprocessing…

    Submitted 12 August, 2022; v1 submitted 9 January, 2022; originally announced January 2022.

  10. arXiv:2201.02025  [pdf, other]

    cs.LG math.OC

    A deep learning-based model reduction (DeePMR) method for simplifying chemical kinetics

    Authors: Zhiwei Wang, Yaoyu Zhang, Enhan Zhao, Yiguang Ju, Weinan E, Zhi-Qin John Xu, Tianhan Zhang

    Abstract: A deep learning-based model reduction (DeePMR) method for simplifying chemical kinetics is proposed and validated using high-temperature auto-ignitions, perfectly stirred reactors (PSR), and one-dimensional freely propagating flames of n-heptane/air mixtures. The mechanism reduction is modeled as an optimization problem on Boolean space, where a Boolean vector, each entry corresponding to a specie…

    Submitted 8 September, 2022; v1 submitted 6 January, 2022; originally announced January 2022.

  11. arXiv:2112.14798  [pdf, other]

    physics.comp-ph cs.LG physics.flu-dyn

    DeePN$^2$: A deep learning-based non-Newtonian hydrodynamic model

    Authors: Lidong Fang, Pei Ge, Lei Zhang, Weinan E, Huan Lei

    Abstract: A long standing problem in the modeling of non-Newtonian hydrodynamics of polymeric flows is the availability of reliable and interpretable hydrodynamic models that faithfully encode the underlying micro-scale polymer dynamics. The main complication arises from the long polymer relaxation time, the complex molecular structure and heterogeneous interaction. DeePN$^2$, a deep learning-based non-Newt…

    Submitted 13 April, 2022; v1 submitted 29 December, 2021; originally announced December 2021.

  12. arXiv:2112.14377  [pdf, other]

    econ.GN cs.LG

    DeepHAM: A Global Solution Method for Heterogeneous Agent Models with Aggregate Shocks

    Authors: Jiequn Han, Yucheng Yang, Weinan E

    Abstract: An efficient, reliable, and interpretable global solution method, the Deep learning-based algorithm for Heterogeneous Agent Models (DeepHAM), is proposed for solving high dimensional heterogeneous agent models with aggregate shocks. The state distribution is approximately represented by a set of optimal generalized moments. Deep neural networks are used to approximate the value and policy function…

    Submitted 21 February, 2022; v1 submitted 28 December, 2021; originally announced December 2021.

    Comments: Slides available at https://users.flatironinstitute.org/~jhan/files/DeepHAM_slides.pdf

  13. MOD-Net: A Machine Learning Approach via Model-Operator-Data Network for Solving PDEs

    Authors: Lulu Zhang, Tao Luo, Yaoyu Zhang, Weinan E, Zhi-Qin John Xu, Zheng Ma

    Abstract: In this paper, we propose a machine learning approach via model-operator-data network (MOD-Net) for solving PDEs. A MOD-Net is driven by a model to solve PDEs based on operator representation with regularization from data. For linear PDEs, we use a DNN to parameterize the Green's function and obtain the neural operator to approximate the solution according to the Green's method. To train the DNN…

    Submitted 28 December, 2021; v1 submitted 8 July, 2021; originally announced July 2021.

  14. arXiv:2107.03633  [pdf, other]

    cs.LG stat.ML

    Generalization Error of GAN from the Discriminator's Perspective

    Authors: Hongkang Yang, Weinan E

    Abstract: The generative adversarial network (GAN) is a well-known model for learning high-dimensional distributions, but the mechanism for its generalization ability is not understood. In particular, GAN is vulnerable to the memorization phenomenon, the eventual convergence to the empirical distribution. We consider a simplified GAN model with the generator replaced by a density, and analyze how the discri…

    Submitted 5 November, 2021; v1 submitted 8 July, 2021; originally announced July 2021.

    MSC Class: 68T07; 62G07; 60-08

  15. arXiv:2104.07794  [pdf, ps, other]

    cs.LG

    An $L^2$ Analysis of Reinforcement Learning in High Dimensions with Kernel and Neural Network Approximation

    Authors: Jihao Long, Jiequn Han, Weinan E

    Abstract: Reinforcement learning (RL) algorithms based on high-dimensional function approximation have achieved tremendous empirical success in large-scale problems with an enormous number of states. However, most analysis of such algorithms gives rise to error bounds that involve either the number of states or the number of features. This paper considers the situation where the function approximation is ma…

    Submitted 15 February, 2022; v1 submitted 15 April, 2021; originally announced April 2021.

  16. arXiv:2012.12654  [pdf]

    physics.chem-ph cs.LG math.NA

    A deep learning-based ODE solver for chemical kinetics

    Authors: Tianhan Zhang, Yaoyu Zhang, Weinan E, Yiguang Ju

    Abstract: Developing efficient and accurate algorithms for chemistry integration is a challenging task due to its strong stiffness and high dimensionality. The current work presents a deep learning-based numerical method called DeepCombustion0.0 to solve stiff ordinary differential equation systems. The homogeneous autoignition of DME/air mixture, including 54 species, is adopted as an example to illustrate…

    Submitted 23 November, 2020; originally announced December 2020.

  17. arXiv:2012.05420  [pdf, ps, other]

    cs.LG stat.ML

    On the emergence of simplex symmetry in the final and penultimate layers of neural network classifiers

    Authors: Weinan E, Stephan Wojtowytsch

    Abstract: A recent numerical study observed that neural network classifiers enjoy a large degree of symmetry in the penultimate layer. Namely, if $h(x) = Af(x) +b$ where $A$ is a linear map and $f$ is the output of the penultimate layer of the network (after activation), then all data points $x_{i, 1}, \dots, x_{i, N_i}$ in a class $C_i$ are mapped to a single point $y_i$ by $f$ and the points $y_i$ are loc…

    Submitted 4 June, 2021; v1 submitted 9 December, 2020; originally announced December 2020.

    MSC Class: 68T07; 62H30

  18. arXiv:2012.01484  [pdf, ps, other]

    math.AP cs.LG

    Some observations on high-dimensional partial differential equations with Barron data

    Authors: Weinan E, Stephan Wojtowytsch

    Abstract: We use explicit representation formulas to show that solutions to certain partial differential equations lie in Barron spaces or multilayer spaces if the PDE data lie in such function spaces. Consequently, these solutions can be represented efficiently using artificial neural networks, even in high dimension. Conversely, we present examples in which the solution fails to lie in the function space…

    Submitted 4 June, 2021; v1 submitted 2 December, 2020; originally announced December 2020.

    MSC Class: 68T07; 35C15; 65M80

  19. arXiv:2011.14269  [pdf, other]

    stat.ML cs.LG

    Generalization and Memorization: The Bias Potential Model

    Authors: Hongkang Yang, Weinan E

    Abstract: Models for learning probability distributions such as generative models and density estimators behave quite differently from models for learning functions. One example is found in the memorization phenomenon, namely the ultimate convergence to the empirical distribution, that occurs in generative adversarial networks (GANs). For this reason, the issue of generalization is more subtle than that for…

    Submitted 1 March, 2021; v1 submitted 28 November, 2020; originally announced November 2020.

    Comments: Added new section on regularized model

    MSC Class: 68T07; 60-08

  20. arXiv:2010.05627  [pdf, other]

    cs.LG cs.AI math.OC stat.ML

    Towards Theoretically Understanding Why SGD Generalizes Better Than ADAM in Deep Learning

    Authors: Pan Zhou, Jiashi Feng, Chao Ma, Caiming Xiong, Steven Hoi, Weinan E

    Abstract: It is not clear yet why ADAM-alike adaptive gradient algorithms suffer from worse generalization performance than SGD despite their faster training speed. This work aims to provide understandings on this generalization gap by analyzing their local convergence behaviors. Specifically, we observe the heavy tails of gradient noise in these algorithms. This motivates us to analyze these algorithms thr…

    Submitted 28 November, 2021; v1 submitted 12 October, 2020; originally announced October 2020.

    Comments: NeurIPS 2020

  21. arXiv:2010.05311  [pdf, other]

    econ.EM cs.AI cs.LG econ.GN stat.ML

    Interpretable Neural Networks for Panel Data Analysis in Economics

    Authors: Yucheng Yang, Zhong Zheng, Weinan E

    Abstract: The lack of interpretability and transparency are preventing economists from using advanced tools like neural networks in their empirical research. In this paper, we propose a class of interpretable neural network models that can achieve both high prediction accuracy and interpretability. The model can be written as a simple function of a regularized number of interpretable features, which are out…

    Submitted 29 November, 2020; v1 submitted 11 October, 2020; originally announced October 2020.

  22. arXiv:2010.05172  [pdf, other]

    econ.GN cs.AI

    The Knowledge Graph for Macroeconomic Analysis with Alternative Big Data

    Authors: Yucheng Yang, Yue Pang, Guanhua Huang, Weinan E

    Abstract: The current knowledge system of macroeconomics is built on interactions among a small number of variables, since traditional macroeconomic models can mostly handle a handful of inputs. Recent work using big data suggests that a much larger number of variables are active in driving the dynamics of the aggregate economy. In this paper, we introduce a knowledge graph (KG) that consists of not only li…

    Submitted 11 October, 2020; originally announced October 2020.

  23. arXiv:2009.14596  [pdf, other]

    math.NA cs.LG stat.ML

    Machine Learning and Computational Mathematics

    Authors: Weinan E

    Abstract: Neural network-based machine learning is capable of approximating functions in very high dimension with unprecedented efficiency and accuracy. This has opened up many exciting new possibilities, not just in traditional areas of artificial intelligence, but also in scientific computing and computational science. At the same time, machine learning has also acquired the reputation of being a set of "…

    Submitted 23 September, 2020; originally announced September 2020.

    MSC Class: 68T07; 46E15; 26B35; 26B40

  24. arXiv:2009.13500  [pdf, ps, other]

    stat.ML cs.LG math.NA

    A priori estimates for classification problems using neural networks

    Authors: Weinan E, Stephan Wojtowytsch

    Abstract: We consider binary and multi-class classification problems using hypothesis classes of neural networks. For a given hypothesis class, we use Rademacher complexity estimates and direct approximation theorems to obtain a priori error estimates for regularized loss functionals.

    Submitted 28 September, 2020; originally announced September 2020.

    MSC Class: 68T07; 60-08

  25. arXiv:2009.10713  [pdf, other]

    cs.LG math.NA stat.ML

    Towards a Mathematical Understanding of Neural Network-Based Machine Learning: what we know and what we don't

    Authors: Weinan E, Chao Ma, Stephan Wojtowytsch, Lei Wu

    Abstract: The purpose of this article is to review the achievements made in the last few years towards the understanding of the reasons behind the success and subtleties of neural network-based machine learning. In the tradition of good old applied mathematics, we will not only give attention to rigorous mathematical results, but also the insight we have gained from careful numerical experiments as well as…

    Submitted 7 December, 2020; v1 submitted 22 September, 2020; originally announced September 2020.

    Comments: Review article. Feedback welcome

    MSC Class: 68T07 (primary); 26B40; 41A30; 35Q68

  26. arXiv:2009.07799  [pdf, other]

    cs.LG math.OC stat.ML

    On the Curse of Memory in Recurrent Neural Networks: Approximation and Optimization Analysis

    Authors: Zhong Li, Jiequn Han, Weinan E, Qianxiao Li

    Abstract: We study the approximation properties and optimization dynamics of recurrent neural networks (RNNs) when applied to learn input-output relationships in temporal data. We consider the simple but representative setting of using continuous-time linear RNNs to learn from data generated by linear relationships. Mathematically, the latter can be understood as a sequence of linear functionals. We prove a…

    Submitted 15 May, 2021; v1 submitted 16 September, 2020; originally announced September 2020.

    Comments: Published version

    MSC Class: 68W25; 68T07; 37M10

    ACM Class: I.2.6

  27. arXiv:2009.06125  [pdf, other]

    cs.LG stat.ML

    A Qualitative Study of the Dynamic Behavior for Adaptive Gradient Algorithms

    Authors: Chao Ma, Lei Wu, Weinan E

    Abstract: The dynamic behavior of RMSprop and Adam algorithms is studied through a combination of careful numerical experiments and theoretical explanations. Three types of qualitative features are observed in the training loss curve: fast initial convergence, oscillations, and large spikes in the late phase. The sign gradient descent (signGD) flow, which is the limit of Adam when taking the learning rate t…

    Submitted 29 September, 2021; v1 submitted 13 September, 2020; originally announced September 2020.

  28. arXiv:2009.02327  [pdf, other]

    math.DS cs.LG physics.comp-ph

    OnsagerNet: Learning Stable and Interpretable Dynamics using a Generalized Onsager Principle

    Authors: Haijun Yu, Xinyuan Tian, Weinan E, Qianxiao Li

    Abstract: We propose a systematic method for learning stable and physically interpretable dynamical models using sampled trajectory data from physical processes based on a generalized Onsager principle. The learned dynamics are autonomous ordinary differential equations parameterized by neural networks that retain clear physical structure information, such as free energy, diffusion, conservative motion and…

    Submitted 17 October, 2021; v1 submitted 6 September, 2020; originally announced September 2020.

    Comments: 29 pages, 19 figures

    MSC Class: 76E30; 34D20; 68T05/07; 82C35

    Journal ref: Phys. Rev. Fluids 6(11):114402, 2021

  29. arXiv:2008.05621  [pdf, other]

    cs.LG stat.ML

    The Slow Deterioration of the Generalization Error of the Random Feature Model

    Authors: Chao Ma, Lei Wu, Weinan E

    Abstract: The random feature model exhibits a kind of resonance behavior when the number of parameters is close to the training sample size. This behavior is characterized by the appearance of large generalization gap, and is due to the occurrence of very small eigenvalues for the associated Gram matrix. In this paper, we examine the dynamic behavior of the gradient descent algorithm in this regime. We show…

    Submitted 12 August, 2020; originally announced August 2020.

  30. arXiv:2007.15623  [pdf, ps, other]

    stat.ML cs.LG math.FA

    On the Banach spaces associated with multi-layer ReLU networks: Function representation, approximation theory and gradient descent dynamics

    Authors: Weinan E, Stephan Wojtowytsch

    Abstract: We develop Banach spaces for ReLU neural networks of finite depth $L$ and infinite width. The spaces contain all finite fully connected $L$-layer networks and their $L^2$-limiting objects under bounds on the natural path-norm. Under this norm, the unit ball in the space for $L$-layer networks has low Rademacher complexity and thus favorable generalization properties. Functions in these spaces can…

    Submitted 30 July, 2020; originally announced July 2020.

    MSC Class: 68T07; 46E15; 26B35; 26B40

  31. arXiv:2007.09788  [pdf, other]

    quant-ph cond-mat.dis-nn cs.LG

    Coarse-grained spectral projection (CGSP): a deep learning-assisted approach to quantum unitary dynamics

    Authors: Pinchen Xie, Weinan E

    Abstract: We propose the coarse-grained spectral projection method (CGSP), a deep learning-assisted approach for tackling quantum unitary dynamic problems with an emphasis on quench dynamics. We show CGSP can extract spectral components of many-body quantum states systematically with sophisticated neural network quantum ansatz. CGSP exploits fully the linear unitary nature of the quantum dynamics, and is po…

    Submitted 4 November, 2020; v1 submitted 19 July, 2020; originally announced July 2020.

    Journal ref: Phys. Rev. B 103, 024304 (2021)

  32. arXiv:2006.14450  [pdf, other]

    cs.LG math.OC stat.ML

    The Quenching-Activation Behavior of the Gradient Descent Dynamics for Two-layer Neural Network Models

    Authors: Chao Ma, Lei Wu, Weinan E

    Abstract: A numerical and phenomenological study of the gradient descent (GD) algorithm for training two-layer neural network models is carried out for different parameter regimes when the target function can be accurately approximated by a relatively small number of neurons. It is found that for Xavier-like initialization, there are two distinctive phases in the dynamic behavior of GD in the under-parametr…

    Submitted 25 June, 2020; originally announced June 2020.

    Comments: 23 pages

  33. arXiv:2006.05982  [pdf, ps, other]

    stat.ML cs.LG math.AP math.FA

    Representation formulas and pointwise properties for Barron functions

    Authors: Weinan E, Stephan Wojtowytsch

    Abstract: We study the natural function space for infinitely wide two-layer neural networks with ReLU activation (Barron space) and establish different representation formulae. In two cases, we describe the space explicitly up to isomorphism. Using a convenient representation, we study the pointwise properties of two-layer networks and show that functions whose singular set is fractal or curved (for examp…

    Submitted 4 June, 2021; v1 submitted 10 June, 2020; originally announced June 2020.

    MSC Class: 68T07; 46E15; 26B35; 26B40

  34. arXiv:2006.02619  [pdf, other]

    physics.comp-ph cs.LG math.NA

    Integrating Machine Learning with Physics-Based Modeling

    Authors: Weinan E, Jiequn Han, Linfeng Zhang

    Abstract: Machine learning is poised as a very powerful tool that can drastically improve our ability to carry out scientific research. However, many issues need to be addressed before this becomes a reality. This article focuses on one particular issue of broad interest: How can we integrate machine learning with physics-based modeling to develop new interpretable and truly reliable physical models? After…

    Submitted 3 June, 2020; originally announced June 2020.

  35. arXiv:2005.10815  [pdf, other]

    cs.LG math.AP stat.ML

    Can Shallow Neural Networks Beat the Curse of Dimensionality? A mean field training perspective

    Authors: Stephan Wojtowytsch, Weinan E

    Abstract: We prove that the gradient descent training of a two-layer neural network on empirical or population risk may not decrease population risk at an order faster than $t^{-4/(d-2)}$ under mean field scaling. Thus gradient descent training for fitting reasonably smooth, but truly high-dimensional data may be subject to the curse of dimensionality. We present numerical evidence that gradient descent tra…

    Submitted 21 May, 2020; originally announced May 2020.

    Comments: 5 figures

    MSC Class: 68T07; 49Q22; 68W25

  36. arXiv:2005.10807  [pdf, ps, other]

    math.FA cs.LG stat.ML

    Kolmogorov Width Decay and Poor Approximators in Machine Learning: Shallow Neural Networks, Random Feature Models and Neural Tangent Kernels

    Authors: Weinan E, Stephan Wojtowytsch

    Abstract: We establish a scale separation of Kolmogorov width type between subspaces of a given Banach space under the condition that a sequence of linear maps converges much faster on one of the subspaces. The general technique is then applied to show that reproducing kernel Hilbert spaces are poor $L^2$-approximators for the class of two-layer neural networks in high dimension, and that multi-layer networ…

    Submitted 2 October, 2020; v1 submitted 21 May, 2020; originally announced May 2020.

    MSC Class: 68T07; 41A30; 41A65; 46E15; 46E22

  37. arXiv:2004.11658  [pdf, other]

    physics.comp-ph cs.CE

    86 PFLOPS Deep Potential Molecular Dynamics simulation of 100 million atoms with ab initio accuracy

    Authors: Denghui Lu, Han Wang, Mohan Chen, Jiduan Liu, Lin Lin, Roberto Car, Weinan E, Weile Jia, Linfeng Zhang

    Abstract: We present the GPU version of DeePMD-kit, which, upon training a deep neural network model using ab initio data, can drive extremely large-scale molecular dynamics (MD) simulation with ab initio accuracy. Our tests show that the GPU version is 7 times faster than the CPU version with the same power consumption. The code can scale up to the entire Summit supercomputer. For a copper system of 113, 2…

    Submitted 7 September, 2020; v1 submitted 24 April, 2020; originally announced April 2020.

    Comments: 29 pages, 11 figures

  38. arXiv:1912.06987  [pdf, ps, other]

    stat.ML cs.LG math.ST

    The Generalization Error of the Minimum-norm Solutions for Over-parameterized Neural Networks

    Authors: Weinan E, Chao Ma, Lei Wu

    Abstract: We study the generalization properties of minimum-norm solutions for three over-parametrized machine learning models including the random feature model, the two-layer neural network model and the residual network model. We proved that for all three models, the generalization error for the minimum-norm solution is comparable to the Monte Carlo rate, up to some logarithmic terms, as long as the mode…

    Submitted 28 January, 2021; v1 submitted 15 December, 2019; originally announced December 2019.

    Comments: Published version

    Journal ref: Pure and Applied Functional Analysis, Volume 5, Number 6, 1145-1460, 2020

  39. A mathematical model for universal semantics

    Authors: Weinan E, Yajun Zhou

    Abstract: We characterize the meaning of words with language-independent numerical fingerprints, through a mathematical analysis of recurring patterns in texts. Approximating texts by Markov processes on a long-range time scale, we are able to extract topics, discover synonyms, and sketch semantic fields from a particular document of moderate length, without consulting external knowledge-base or thesaurus.…

    Submitted 12 July, 2020; v1 submitted 29 July, 2019; originally announced July 2019.

    Comments: Main text (12 pages, 7 figures); Software Manual (ii+262 pages, 16 figures, 12 tables, available as two ancillary files). Revised according to reviewers' comments

    Journal ref: IEEE Trans. Pattern Anal. Mach. Intell. 44(3):1124-1132 (2022)

  40. arXiv:1906.08039  [pdf, ps, other]

    cs.LG math.PR stat.ML

    The Barron Space and the Flow-induced Function Spaces for Neural Network Models

    Authors: Weinan E, Chao Ma, Lei Wu

    Abstract: One of the key issues in the analysis of machine learning models is to identify the appropriate function space and norm for the model. This is the set of functions endowed with a quantity which can control the approximation and estimation errors by a particular machine learning model. In this paper, we address this issue for two representative neural network models: the two-layer networks and the…

    Submitted 27 March, 2021; v1 submitted 18 June, 2019; originally announced June 2019.

  41. arXiv:1904.05263  [pdf, other]

    cs.LG math.OC stat.ML

    Analysis of the Gradient Descent Algorithm for a Deep Neural Network Model with Skip-connections

    Authors: Weinan E, Chao Ma, Qingcan Wang, Lei Wu

    Abstract: The behavior of the gradient descent (GD) algorithm is analyzed for a deep neural network model with skip-connections. It is proved that in the over-parametrized regime, for a suitable initialization, with high probability GD can find a global minimum exponentially fast. Generalization error estimates along the GD path are also established. As a consequence, it is shown that when the target functi…

    Submitted 14 April, 2019; v1 submitted 10 April, 2019; originally announced April 2019.

    Comments: 29 pages, 4 figures

  42. arXiv:1904.04326  [pdf, other]

    cs.LG math.OC stat.ML

    A Comparative Analysis of the Optimization and Generalization Property of Two-layer Neural Network and Random Feature Models Under Gradient Descent Dynamics

    Authors: Weinan E, Chao Ma, Lei Wu

    Abstract: A fairly comprehensive analysis is presented for the gradient descent dynamics for training two-layer neural network models in the situation when the parameters in both layers are updated. General initialization schemes as well as general regimes for the network width and training data size are considered. In the over-parametrized regime, it is shown that gradient descent dynamics can achieve zero…

    Submitted 20 February, 2020; v1 submitted 8 April, 2019; originally announced April 2019.

    Comments: Published version

    MSC Class: 41A99; 49M99

    Journal ref: Science China Mathematics (2020)

  43. arXiv:1903.02154  [pdf, ps, other]

    cs.LG stat.ML

    A Priori Estimates of the Population Risk for Residual Networks

    Authors: Weinan E, Chao Ma, Qingcan Wang

    Abstract: Optimal a priori estimates are derived for the population risk, also known as the generalization error, of a regularized residual network model. An important part of the regularized model is the usage of a new path norm, called the weighted path norm, as the regularization term. The weighted path norm treats the skip connections and the nonlinearities differently so that paths with more nonlineari…

    Submitted 30 May, 2019; v1 submitted 5 March, 2019; originally announced March 2019.

  44. arXiv:1811.01558  [pdf, other]

    cs.LG stat.ML

    Stochastic Modified Equations and Dynamics of Stochastic Gradient Algorithms I: Mathematical Foundations

    Authors: Qianxiao Li, Cheng Tai, Weinan E

    Abstract: We develop the mathematical foundations of the stochastic modified equations (SME) framework for analyzing the dynamics of stochastic gradient algorithms, where the latter is approximated by a class of stochastic differential equations with small noise parameters. We prove that this approximation can be understood mathematically as a weak approximation, which leads to a number of precise and usef…

    Submitted 5 November, 2018; originally announced November 2018.

    MSC Class: 65K10; 60H10

  45. arXiv:1810.11890  [pdf, other]

    physics.comp-ph cond-mat.mtrl-sci cs.LG

    Active Learning of Uniformly Accurate Inter-atomic Potentials for Materials Simulation

    Authors: Linfeng Zhang, De-Ye Lin, Han Wang, Roberto Car, Weinan E

    Abstract: An active learning procedure called Deep Potential Generator (DP-GEN) is proposed for the construction of accurate and transferable machine learning-based models of the potential energy surface (PES) for the molecular modeling of materials. This procedure consists of three main components: exploration, generation of accurate reference data, and training. Application to the sample systems of Al, Mg…

    Submitted 6 February, 2019; v1 submitted 28 October, 2018; originally announced October 2018.

    Journal ref: Phys. Rev. Materials 3, 023804 (2019)

  46. arXiv:1810.06397  [pdf, other]

    stat.ML cs.LG math.ST

    A Priori Estimates of the Population Risk for Two-layer Neural Networks

    Authors: Weinan E, Chao Ma, Lei Wu

    Abstract: New estimates for the population risk are established for two-layer neural networks. These estimates are nearly optimal in the sense that the error rates scale in the same way as the Monte Carlo error rates. They are equally effective in the over-parametrized regime when the network size is much larger than the size of the dataset. These new estimates are a priori in nature in the sense that the b…

    Submitted 20 February, 2020; v1 submitted 15 October, 2018; originally announced October 2018.

    Comments: Published version

    MSC Class: 41A46; 41A63; 62J02; 65D05

    Journal ref: Communications in Mathematical Sciences, Volume 17 (2019)

  47. arXiv:1809.10188  [pdf, other]

    cs.LG cond-mat.stat-mech math.DS stat.ML

    Monge-Ampère Flow for Generative Modeling

    Authors: Linfeng Zhang, Weinan E, Lei Wang

    Abstract: We present a deep generative model, named Monge-Ampère flow, which builds on continuous-time gradient flow arising from the Monge-Ampère equation in optimal transport theory. The generative map from the latent space to the data space follows a dynamical system, where a learnable potential function guides a compressible fluid to flow towards the target density distribution. Training of the model am…

    Submitted 26 September, 2018; originally announced September 2018.

  48. arXiv:1808.04258  [pdf, other]

    cs.LG physics.comp-ph stat.ML

    Model Reduction with Memory and the Machine Learning of Dynamical Systems

    Authors: Chao Ma, Jianchun Wang, Weinan E

    Abstract: The well-known Mori-Zwanzig theory tells us that model reduction leads to memory effect. For a long time, modeling the memory effect accurately and efficiently has been an important but nearly impossible task in developing a good reduced model. In this work, we explore a natural analogy between recurrent neural networks and the Mori-Zwanzig formalism to establish a systematic approach for developi…

    Submitted 10 August, 2018; originally announced August 2018.

  49. A Mean-Field Optimal Control Formulation of Deep Learning

    Authors: Weinan E, Jiequn Han, Qianxiao Li

    Abstract: Recent work linking deep neural networks and dynamical systems opened up new avenues to analyze deep learning. In particular, it is observed that new insights can be obtained by recasting deep learning as an optimal control problem on difference or differential equations. However, the mathematical aspects of such a formulation have not been systematically explored. This paper introduces the mathem…

    Submitted 3 July, 2018; originally announced July 2018.

    Comments: 44 pages

    Journal ref: Research in the Mathematical Sciences, 6:10 (2019)

  50. arXiv:1807.00297  [pdf, ps, other]

    cs.LG stat.ML

    Exponential Convergence of the Deep Neural Network Approximation for Analytic Functions

    Authors: Weinan E, Qingcan Wang

    Abstract: We prove that for analytic functions in low dimension, the convergence rate of the deep neural network approximation is exponential.

    Submitted 1 July, 2018; originally announced July 2018.