Search | arXiv e-print repository

FADAS: Towards Federated Adaptive Asynchronous Optimization

Authors: Yujia Wang, Shiqiang Wang, Songtao Lu, Jinghui Chen

Abstract: Federated learning (FL) has emerged as a widely adopted training paradigm for privacy-preserving machine learning. While SGD-based FL algorithms have been highly successful, there is a growing trend toward adaptive federated optimization methods, particularly for training large-scale models. However, the conventional synchronous aggregation design poses a significant challenge to the practical deployment of these adaptive methods, particularly in the presence of straggler clients. To fill this research gap, this paper introduces federated adaptive asynchronous optimization (FADAS), a novel method that incorporates asynchronous updates into adaptive federated optimization with provable guarantees. To further improve the efficiency and resilience of the proposed method under large asynchronous delays, we also extend FADAS with a delay-adaptive learning adjustment strategy. We rigorously establish the convergence rates of the proposed algorithms, and empirical results demonstrate that FADAS outperforms other asynchronous FL baselines.

Submitted 25 July, 2024; originally announced July 2024.

Comments: Accepted by ICML 2024
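The abstract combines asynchronous client updates with an adaptive (Adam-family) server optimizer. FADAS's exact update rules are given in the paper and are not reproduced here. As a rough illustration only, the sketch below assumes a buffered server: clients send model deltas tagged with the global round they started from, the server averages `buffer_size` deltas and applies an AMSGrad-style step, and a hypothetical delay-adaptive rule scales the step down when the stalest buffered delta is older than `delay_threshold` rounds. The class name, its parameters and the scaling rule are all assumptions made for this example.

```python
import math


class BufferedAdaptiveServer:
    """Hypothetical buffered asynchronous server with an AMSGrad-style update."""

    def __init__(self, dim, lr=0.1, beta1=0.9, beta2=0.99, tau=1e-8,
                 buffer_size=3, delay_threshold=2):
        self.x = [0.0] * dim      # global model
        self.m = [0.0] * dim      # first-moment estimate
        self.v = [0.0] * dim      # second-moment estimate
        self.v_hat = [0.0] * dim  # running max of v (AMSGrad)
        self.lr, self.beta1, self.beta2, self.tau = lr, beta1, beta2, tau
        self.buffer_size = buffer_size
        self.delay_threshold = delay_threshold
        self.round = 0            # number of global updates applied so far
        self.buffer = []          # pending (delta, start_round) pairs

    def effective_lr(self, max_delay):
        # Hypothetical delay-adaptive rule: multiply the step by
        # delay_threshold / max_delay once the stalest delta is too old.
        if max_delay <= self.delay_threshold:
            return self.lr
        return self.lr * self.delay_threshold / max_delay

    def receive(self, delta, start_round):
        # A client reports its local model change (new minus old), computed
        # from the (possibly stale) global model of round start_round.
        self.buffer.append((delta, start_round))
        if len(self.buffer) >= self.buffer_size:
            self._apply_buffer()

    def _apply_buffer(self):
        k = len(self.buffer)
        avg = [sum(d[i] for d, _ in self.buffer) / k for i in range(len(self.x))]
        max_delay = max(self.round - r for _, r in self.buffer)
        lr = self.effective_lr(max_delay)
        for i, g in enumerate(avg):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            self.v_hat[i] = max(self.v_hat[i], self.v[i])
            # Client deltas already point downhill, so the step is added.
            self.x[i] += lr * self.m[i] / (math.sqrt(self.v_hat[i]) + self.tau)
        self.buffer.clear()
        self.round += 1


server = BufferedAdaptiveServer(dim=1, buffer_size=2)
server.receive([1.0], start_round=0)
server.receive([1.0], start_round=0)
print(server.round, server.x)
```

Buffering lets fast clients contribute without waiting for stragglers, and because `v_hat` never decreases, the per-coordinate step `lr / sqrt(v_hat)` is non-increasing apart from the delay scaling.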

arXiv:2407.08496

Convergences of Combinatorial Ricci Flows to Degenerated Circle Packings in Hyperbolic Background Geometry

Authors: Guangming Hu, Sicheng Lu, Dong Tan, Youliang Zhong, Puchun Zhou

Abstract: This paper investigates a class of degenerated circle packings in hyperbolic background geometry. A central question is whether prescribed total geodesic curvature data can be realized by a degenerated circle packing. We give necessary and sufficient conditions for realizability and prove uniqueness. Furthermore, we introduce a combinatorial Ricci flow to find the desired degenerated circle-packed surface, analogous to the methods of Chow-Luo and Takatsu.

Submitted 11 July, 2024; originally announced July 2024.

Comments: 36 pages, 9 figures

MSC Class: 52C26; 57M50

arXiv:2407.06784

Preasymptotic error estimates of EEM and CIP-EEM for the time-harmonic Maxwell equations with large wave number

Authors: Shuaishuai Lu, Haijun Wu

Abstract: Preasymptotic error estimates are derived for the linear edge element method (EEM) and the linear $\boldsymbol{H}(\boldsymbol{\mathrm{curl}})$-conforming interior penalty edge element method (CIP-EEM) for the time-harmonic Maxwell equations with large wave number. It is shown that, under the mesh condition that $κ^3 h^2$ is sufficiently small, the errors of the solutions of both methods are bounded by $\mathcal{O}(κh + κ^3 h^2)$ in the energy norm and $\mathcal{O}(κh^2 + κ^2 h^2)$ in the $\boldsymbol{L}^2$ norm, where $κ$ is the wave number and $h$ is the mesh size. Numerical tests verify the theoretical results and illustrate the potential of CIP-EEM to significantly reduce the pollution effect.

Submitted 9 July, 2024; originally announced July 2024.

arXiv:2407.05078

Function and derivative approximation by shallow neural networks

Authors: Yuanyuan Li, Shuai Lu

Abstract: We investigate a Tikhonov regularization scheme tailored to shallow neural networks for a classic inverse problem: approximating an unknown function and its derivatives on a unit cubic domain from noisy measurements. The penalty term of the scheme takes one of three distinct but closely related network (semi)norms: the extended Barron norm, the variation norm, and the Radon-BV seminorm. Which penalty applies depends on the specific architecture of the neural network. We establish the connections among these network norms and track their dependence on the dimensionality index, to clarify how the norms interact. We revisit the universality of function approximation under these norms, establish rigorous error bounds for the Tikhonov regularization scheme, and make the dependence on the dimensionality index explicit, clarifying how dimensionality affects approximation performance and how to design neural networks for diverse approximation tasks.

Submitted 6 July, 2024; originally announced July 2024.

MSC Class: 65D15; 65F22; 65J20

arXiv:2407.04909

The averaging principle of stochastic functional partial differential equations with Hölder coefficients and infinite delay

Authors: Shuaishuai Lu, Xue Yang, Yong Li

Abstract: In this paper, we establish the averaging principle for stochastic functional partial differential equations (SFPDEs) with Hölder coefficients and infinite delay. First, we rigorously establish the existence and uniqueness of strong solutions for a class of finite-dimensional systems with Hölder continuous coefficients and infinite delay. We extend these results to their infinite-dimensional counterparts using the variational approach and the Galerkin projection technique. We then establish the averaging principle (the first Bogolyubov theorem) for SFPDEs with infinite delay under linear growth and Hölder continuity conditions. This is achieved through the classical Khasminskii time discretization and reductio ad absurdum, showing that solutions of the original Cauchy problem converge to those of the averaged equation on the finite interval $[0, T]$. To illustrate our findings, we present two applications: stochastic generalized porous media equations and stochastic reaction-diffusion equations with Hölder coefficients.

Submitted 5 July, 2024; originally announced July 2024.

arXiv:2407.00588

Forward and backward problems for coupled subdiffusion systems

Authors: Dian Feng, Yikan Liu, Shuai Lu

Abstract: In this article, we investigate both forward and backward problems for coupled systems of time-fractional diffusion equations, including strongly coupled systems. For the forward problem, we establish the well-posedness of the system using the eigensystem of the corresponding elliptic system. For the backward problem, namely determining initial values from final-time observations, we prove a Lipschitz stability estimate, consistent with the stability known for a single equation. To solve this backward problem numerically, we use the explicit form of Tikhonov regularization to design a multi-channel neural network architecture. Numerical examples demonstrate its efficacy in multidimensional settings and its robustness for initial values not seen during training.

Submitted 30 June, 2024; originally announced July 2024.

Comments: 26 pages, 7 figures

MSC Class: 35R11; 35K58; 35B44

arXiv:2406.10511

Efficient Hardware Accelerator Based on Medium Granularity Dataflow for SpTRSV

Authors: Qian Chen, Xiaofeng Yang, Shengli Lu

Abstract: Sparse triangular solve (SpTRSV) is widely used in various domains. Numerous studies have targeted CPUs, GPUs, and specialized hardware accelerators, whose dataflows can be categorized as coarse- or fine-grained. Coarse-grained dataflow offers good spatial locality but suffers from low parallelism, while fine-grained dataflow provides high parallelism but disrupts the spatial structure, leading to more nodes and poor data reuse. This paper proposes a novel hardware accelerator for SpTRSV and SpTRSV-like DAGs. The accelerator implements a medium-granularity dataflow through hardware-software codesign and achieves both good spatial locality and high parallelism. Additionally, a partial-sum caching mechanism is introduced to reduce how often processing elements (PEs) block, and a reordering algorithm for intra-node edge computation is developed to enhance data reuse. Experimental results on 264 benchmarks with up to 85,392 nodes show average performance improvements of 12.2$\times$ (up to 874.5$\times$) over CPUs and 10.1$\times$ (up to 740.4$\times$) over GPUs. Compared to the state-of-the-art technique (DPU-v2), this work achieves a 2.5$\times$ (up to 5.9$\times$) average performance improvement and a 1.8$\times$ (up to 4.1$\times$) average energy-efficiency improvement.

Submitted 27 June, 2024; v1 submitted 15 June, 2024; originally announced June 2024.
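The accelerator itself is hardware and is not reproduced here. For readers unfamiliar with the kernel, the sketch below shows textbook forward substitution for a lower-triangular matrix stored in CSR form (`indptr`, `indices`, `data`), plus a simple level computation on the dependency DAG the abstract refers to: each off-diagonal entry $(i, j)$ with $j < i$ is an edge $j \to i$, and rows that share a level have no dependencies on each other and could be solved in parallel. This is a software illustration under those assumptions, not the paper's medium-granularity dataflow.

```python
def sptrsv_lower_csr(indptr, indices, data, b):
    """Solve L x = b by forward substitution, with L lower triangular in CSR form."""
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        acc = b[i]
        diag = 0.0
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            if j == i:
                diag = data[k]
            elif j < i:
                # x[j] was solved in an earlier row: this is a DAG edge j -> i.
                acc -= data[k] * x[j]
            else:
                raise ValueError(f"entry above the diagonal in row {i}")
        if diag == 0.0:
            raise ValueError(f"missing or zero diagonal in row {i}")
        x[i] = acc / diag
    return x


def dependency_levels(indptr, indices):
    """Level of each row in the dependency DAG; rows sharing a level are independent."""
    n = len(indptr) - 1
    level = [0] * n
    for i in range(n):
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            if j < i:
                level[i] = max(level[i], level[j] + 1)
    return level


# L = [[2, 0, 0], [1, 1, 0], [0, 3, 4]] in CSR form.
indptr = [0, 1, 3, 5]
indices = [0, 0, 1, 1, 2]
data = [2.0, 1.0, 1.0, 3.0, 4.0]
print(sptrsv_lower_csr(indptr, indices, data, [2.0, 3.0, 11.0]))  # [1.0, 2.0, 1.25]
print(dependency_levels(indptr, indices))  # [0, 1, 2]
```

In this tiny example every row depends on the previous one, so the DAG is a chain and offers no row-level parallelism; sparser matrices typically produce wider levels.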

arXiv:2405.18858

Distributed Bilevel Optimization with Communication Compression

Authors: Yutong He, Jie Hu, Xinmeng Huang, Songtao Lu, Bin Wang, Kun Yuan

Abstract: Stochastic bilevel optimization tackles problems with nested optimization structures. Its rapidly growing scale calls for efficient distributed algorithms. In conventional distributed bilevel methods, each worker must transmit full-dimensional stochastic gradients to the server at every iteration, which incurs significant communication overhead and limits efficiency and scalability. To resolve this issue, we introduce the first family of distributed bilevel algorithms with communication compression. The primary challenge in algorithm design is mitigating the bias in hypergradient estimation caused by the nested structure. We first propose C-SOBA, a simple yet effective approach with unbiased compression and provable linear-speedup convergence. However, it relies on a strong bounded-gradient assumption. To address this limitation, we explore moving averages, error feedback, and multi-step compression in bilevel optimization, resulting in a series of advanced algorithms with relaxed assumptions and improved convergence properties. Numerical experiments show that our compressed bilevel algorithms can achieve a $10\times$ reduction in communication overhead without severe performance degradation.

Submitted 29 May, 2024; originally announced May 2024.
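C-SOBA is described as using unbiased compression. One common unbiased compressor, used here only as an example (the paper may use others), is random-k sparsification: send $k$ of the $d$ coordinates chosen uniformly at random and scale each by $d/k$. Since each coordinate survives with probability $k/d$, the scaling makes the compressed vector equal to the original in expectation. The Monte Carlo average below should approach the uncompressed vector.

```python
import random


def rand_k(vec, k, rng):
    """Random-k sparsification: keep k random coordinates, scaled by d/k.

    Each coordinate is kept with probability k/d, so scaling by d/k gives
    E[rand_k(vec)] = vec, i.e., the compressor is unbiased.
    """
    d = len(vec)
    if not 1 <= k <= d:
        raise ValueError("k must be in [1, len(vec)]")
    out = [0.0] * d
    for i in rng.sample(range(d), k):
        out[i] = vec[i] * d / k
    return out


rng = random.Random(0)
g = [1.0, 2.0, 3.0, 4.0]
trials = 20000
mean = [0.0] * len(g)
for _ in range(trials):
    c = rand_k(g, 2, rng)
    mean = [m + ci / trials for m, ci in zip(mean, c)]
print([round(m, 2) for m in mean])
```

Each message carries only $k$ values (plus their indices), at the price of extra variance: here every coordinate is either doubled or zeroed.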

arXiv:2405.06938

Stochastic functional partial differential equations with monotone coefficients: Poisson stability measures, exponential mixing and limit theorems

Authors: Shuaishuai Lu, Xue Yang, Yong Li

Abstract: This paper examines Poisson stable (including stationary, periodic, almost periodic, Levitan almost periodic, Bohr almost automorphic, pseudo-periodic, Birkhoff recurrent, pseudo-recurrent, etc.) measures and limit theorems for stochastic functional partial differential equations (SFPDEs) with monotone coefficients. First, we show the existence and uniqueness of the entrance measure $μ_{t}$ for SFPDEs by the dissipative method (or remote start). Then, using Shcherbakov's comparability method in terms of recurrence, we prove that the entrance measure inherits the recurrence of the coefficients. Third, we show that the set of measures $\{μ_{t}\}$ is tight. As a result, any sequence of averages of $\{μ_{t}\}_{t\in\mathbb{R}}$ has a limit point $μ^{*}$. Further, we study the uniform exponential mixing of $μ^{*}$ with respect to the Wasserstein metric. Fourth, under uniform exponential mixing and the Markov property, we establish the strong law of large numbers and the central limit theorem, and estimate the corresponding rates of convergence for the solution maps of SFPDEs. Finally, we apply our results to stochastic generalized porous media equations with delay.

Submitted 11 May, 2024; originally announced May 2024.

arXiv:2405.06223

McKean-Vlasov SPDEs with Hölder continuous coefficients: existence, uniqueness, ergodicity, exponential mixing and limit theorems

Authors: Shuaishuai Lu, Xue Yang, Yong Li

Abstract: This paper investigates the existence and uniqueness of solutions, the ergodicity and exponential mixing to invariant measures, and limit theorems for a class of McKean-Vlasov SPDEs with Hölder continuous coefficients. We rigorously establish the existence and uniqueness of strong solutions for a class of finite-dimensional systems with Hölder continuous coefficients, and extend these results to their infinite-dimensional counterparts using the Galerkin projection technique. Additionally, we explore properties of the solutions, including time homogeneity and the Markov and Feller properties. Building on these properties, we examine the exponential ergodicity and mixing of invariant measures under Lyapunov conditions. Finally, for coefficients satisfying Hölder continuity and Lyapunov conditions, together with the uniform mixing property of invariant measures, we establish the strong law of large numbers and the central limit theorem for the solution and obtain estimates of the corresponding convergence rates.

Submitted 9 May, 2024; originally announced May 2024.

arXiv:2404.07230

Interval-valued fuzzy soft $β$-covering approximation spaces

Authors: Shizhan Lu

Abstract: The concept of interval-valued fuzzy soft $β$-covering approximation spaces (IFS$β$CASs) is introduced to combine the theories of soft sets, rough sets and interval-valued fuzzy sets, and some fundamental propositions concerning interval-valued fuzzy soft $β$-neighborhoods and soft $β$-neighborhoods of IFS$β$CASs are explored. Four kinds of fuzzy rough sets based on interval-valued fuzzy soft $β$-coverings are then studied. Finally, the relationships among these four kinds of fuzzy rough sets are investigated.

Submitted 3 April, 2024; originally announced April 2024.

Comments: 12 pages

arXiv:2403.15745

Fast Consensus Topology Design via Minimizing Laplacian Energy

Authors: Susie Lu, Ji Liu

Abstract: This paper characterizes the graphical properties of an optimal topology with minimal Laplacian energy under the constraint of fixed numbers of vertices and edges, and devises an algorithm to construct such connected optimal graphs. The constructed graphs have maximum vertex and edge connectivity and, more importantly, exhibit large algebraic connectivity of optimal order provided they are not sparse. These properties guarantee fast and resilient consensus processes over these graphs.

Submitted 23 March, 2024; originally announced March 2024.
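The abstract links algebraic connectivity to consensus speed. As a minimal illustration (not the paper's topology-design algorithm, and without computing Laplacian energy), the sketch runs the standard discrete-time Laplacian consensus iteration $x \leftarrow x - \varepsilon L x$ on two 6-node graphs with the same step size. For a connected graph and $0 < \varepsilon < 2/\lambda_{\max}(L)$, the iterates converge to the average of the initial values, and the disagreement contracts per step by $\max(|1-\varepsilon\lambda_2|, |1-\varepsilon\lambda_{\max}|)$, so a larger $\lambda_2$ (algebraic connectivity) generally means faster agreement.

```python
def laplacian(n, edges):
    """Graph Laplacian L = D - A of an undirected graph given as an edge list."""
    L = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        L[u][v] -= 1.0
        L[v][u] -= 1.0
        L[u][u] += 1.0
        L[v][v] += 1.0
    return L


def consensus(x0, edges, eps, steps):
    """Iterate x <- x - eps * L x; the sum of x is preserved at every step."""
    n = len(x0)
    L = laplacian(n, edges)
    x = list(x0)
    for _ in range(steps):
        Lx = [sum(L[i][j] * x[j] for j in range(n)) for i in range(n)]
        x = [xi - eps * lxi for xi, lxi in zip(x, Lx)]
    return x


def disagreement(x):
    return max(x) - min(x)


n = 6
cycle = [(i, (i + 1) % n) for i in range(n)]                    # lambda_2 = 1
complete = [(i, j) for i in range(n) for j in range(i + 1, n)]  # lambda_2 = 6
x0 = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
for name, edges in [("cycle", cycle), ("complete", complete)]:
    print(name, disagreement(consensus(x0, edges, eps=0.1, steps=20)))
```

With `eps=0.1`, the complete graph contracts every non-consensus mode by a factor of 0.4 per step, while the cycle's slowest mode contracts by only 0.9, so the cycle is much further from agreement after 20 steps. The complete graph also uses far more edges; the paper's question is how well one can do under a fixed edge budget.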

arXiv:2402.03167

Decentralized Bilevel Optimization over Graphs: Loopless Algorithmic Update and Transient Iteration Complexity

Authors: Boao Kong, Shuchen Zhu, Songtao Lu, Xinmeng Huang, Kun Yuan

Abstract: Stochastic bilevel optimization (SBO) is becoming increasingly essential in machine learning due to its versatility in handling nested structures. For large-scale SBO, decentralized approaches have emerged as effective paradigms in which nodes communicate with immediate neighbors without a central server, thereby improving communication efficiency and enhancing algorithmic robustness. However, current decentralized SBO algorithms face challenges, including expensive inner-loop updates and an unclear understanding of the influence of network topology, data heterogeneity, and the nested bilevel algorithmic structure. In this paper, we introduce a single-loop decentralized SBO algorithm (D-SOBA) and establish its transient iteration complexity, which, for the first time, clarifies the joint influence of network topology and data heterogeneity on decentralized bilevel algorithms. D-SOBA achieves the state-of-the-art asymptotic rate, asymptotic gradient/Hessian complexity, and transient iteration complexity under more relaxed assumptions than existing methods. Numerical experiments validate our theoretical findings.

Submitted 26 February, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

Comments: 37 pages, 6 figures

arXiv:2401.15945

Regularization of linear inverse problems with irregular noise using embedding operators

Authors: Xinyan Li, Simon Hubmer, Shuai Lu, Ronny Ramlau

Abstract: In this paper, we investigate regularization of linear inverse problems with irregular noise. In particular, we consider the case where the noise can be preprocessed by certain adjoint embedding operators. By introducing the resulting preprocessed problem, we provide a convergence analysis for general regularization schemes under standard assumptions. Furthermore, for a special case of Tikhonov regularization in computerized tomography, we show that our approach leads to a novel (Fourier-based) filtered backprojection algorithm. Numerical examples with different parameter choice rules verify the efficiency of the proposed algorithm.

Submitted 29 January, 2024; originally announced January 2024.

Comments: 23 pages, 2 figures
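The paper's contribution, preprocessing irregular noise with adjoint embedding operators, is not reproduced here. For reference, standard Tikhonov regularization computes $x_\alpha = \arg\min_x \|Ax - y\|^2 + \alpha\|x\|^2$, which is equivalent to solving $(A^\top A + \alpha I)\,x = A^\top y$. The sketch below solves these normal equations for a tiny dense system in pure Python; the matrix `A` and the noisy data `y_noisy` are made-up illustrative values.

```python
def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for j in range(c, n + 1):
                aug[r][j] -= f * aug[c][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def tikhonov(A, y, alpha):
    """Minimize ||A x - y||^2 + alpha * ||x||^2 via (A^T A + alpha I) x = A^T y."""
    m, n = len(A), len(A[0])
    M = [[sum(A[k][i] * A[k][j] for k in range(m)) + (alpha if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    rhs = [sum(A[k][i] * y[k] for k in range(m)) for i in range(n)]
    return solve(M, rhs)


A = [[1.0, 1.0], [0.0, 1.0]]
y_noisy = [3.1, 0.9]  # illustrative: exact data for x = [2.0, 1.0] is [3.0, 1.0]
for alpha in (0.0, 0.01, 1.0):
    print(alpha, tikhonov(A, y_noisy, alpha))
```

Larger `alpha` shrinks the solution toward zero and damps the effect of noise; choosing it is exactly the role of the parameter choice rules mentioned in the abstract.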

arXiv:2401.12229

Interior $C^2$ estimate for Hessian quotient equation in general dimension

Authors: Siyuan Lu

Abstract: In this paper, we study the interior $C^2$ regularity problem for the Hessian quotient equation $\left(\frac{σ_n}{σ_k}\right)(D^2u)=f$. We give a complete answer to this longstanding problem: for $k=n-1,n-2$, we establish an interior $C^2$ estimate; for $k\leq n-3$, we show that the interior $C^2$ estimate fails by constructing a singular solution.

Submitted 17 January, 2024; originally announced January 2024.

Comments: arXiv admin note: text overlap with arXiv:2311.05835

MSC Class: 35J60; 35J15; 35J96
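In the Hessian quotient equation, $σ_k$ denotes the $k$-th elementary symmetric polynomial evaluated on the eigenvalues of the Hessian $D^2u$ (so $σ_n(D^2u) = \det D^2u$), and the operator is the ratio $σ_n/σ_k$. The short sketch below computes all $σ_k$ from a given list of eigenvalues and checks a simple case; it is a notational illustration only and says nothing about the regularity results.

```python
from math import comb


def elementary_symmetric(lams):
    """Return [sigma_0, ..., sigma_n] of the eigenvalues lams."""
    n = len(lams)
    e = [1.0] + [0.0] * n
    for lam in lams:
        # Update from high to low degree so each eigenvalue enters a term once.
        for k in range(n, 0, -1):
            e[k] += lam * e[k - 1]
    return e


def hessian_quotient(lams, k):
    """(sigma_n / sigma_k) evaluated on the Hessian eigenvalues lams."""
    e = elementary_symmetric(lams)
    return e[len(lams)] / e[k]


print(elementary_symmetric([1.0, 2.0, 3.0]))  # [1.0, 6.0, 11.0, 6.0]
print(hessian_quotient([1.0, 2.0, 3.0], 1))   # 1.0
# For u(x) = |x|^2 / 2 in dimension 4, D^2u = I, so the quotient is 1 / C(4, 2).
print(hessian_quotient([1.0] * 4, 2), 1 / comb(4, 2))
```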

arXiv:2312.01807

Moduli Space of Dihedral Spherical Surfaces and Measured Foliations

Authors: Sicheng Lu, Bin Xu

Abstract: Cone spherical surfaces are orientable Riemannian surfaces with constant curvature one and a finite set of conical singularities. A subclass of these surfaces, called dihedral surfaces, is characterized by monodromy groups that preserve a pair of antipodal points on the unit two-sphere in three-dimensional Euclidean space. On each dihedral surface, we define a pair of transverse measured foliations that, in turn, completely characterize the original dihedral surface. Furthermore, we introduce a variety of geometric decompositions and deformations specific to dihedral surfaces. As an application, we determine the dimension of the moduli space of dihedral surfaces with prescribed cone angles and topological type. This dimension counts the independent geometric parameters that determine the isometry classes of these surfaces.

Submitted 2 April, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

Comments: 40 pages, 15 figures. All comments are welcome! Small modifications on propositions, descriptions, figures and fonts in v2

MSC Class: 58D27; 53C12; 30F45; 30F30

arXiv:2311.05835

Interior $C^2$ estimate for Hessian quotient equation in dimension three

Authors: Siyuan Lu

Abstract: In this paper, we establish an interior $C^2$ estimate for the Hessian quotient equation $\left(\frac{σ_3}{σ_1}\right)(D^2u)=f$ in dimension three. A crucial ingredient in our proof is a Jacobi inequality.

Submitted 9 November, 2023; originally announced November 2023.

arXiv:2310.20709

Quadratic Differentials as Stability Conditions of Graded Skew-gentle Algebras

Authors: Suiqi Lu, Yu Qiu, Dongjian Wu

Abstract: We prove that the principal component of the exchange graph of hearts of a graded skew-gentle algebra can be identified with the corresponding exchange graph of S-graphs, using the geometric models and the $\operatorname{Int}=\operatorname{dim}\operatorname{Hom}$ formula of Qiu-Zhang-Zhou. Using the same arguments as in Bridgeland-Smith, Barbieri-Möller-Qiu-So and Christ-Haiden-Qiu, we extend this identification to an isomorphism between the spaces of stability conditions and of quadratic differentials.

Submitted 31 October, 2023; originally announced October 2023.

arXiv:2310.06784

Finiteness of pointed maps to moduli spaces of polarized varieties

Authors: Ariyan Javanpeykar, Steven Lu, Ruiran Sun, Kang Zuo

Abstract: We prove a finiteness result for pointed maps to the base space of a family of polarized varieties with maximal variation in moduli. A key ingredient is a new criterion for the rigidity of pointed maps.

Submitted 10 October, 2023; originally announced October 2023.

Comments: 15 pages. Comments welcome

arXiv:2309.12425

Principal Stratification with Continuous Post-Treatment Variables: Nonparametric Identification and Semiparametric Estimation

Authors: Sizhu Lu, Zhichao Jiang, Peng Ding

Abstract: Post-treatment variables often complicate causal inference. They appear in many scientific problems, including noncompliance, truncation by death, mediation, and surrogate endpoint evaluation. Principal stratification addresses these challenges by adjusting for the potential values of the post-treatment variables, which define the principal strata. It allows characterizing treatment effect heterogeneity across principal strata and unveiling how the treatment's impact on the outcome relates to the post-treatment variables. However, the existing literature has focused primarily on binary post-treatment variables, leaving the case of continuous post-treatment variables largely unexplored. This gap persists because of the complexity of infinitely many principal strata, which challenges both the identification and the estimation of causal effects. We fill this gap by providing nonparametric identification and semiparametric estimation theory for principal stratification with continuous post-treatment variables. We propose using working models to approximate the underlying causal effect surfaces and derive the efficient influence functions of the corresponding model parameters. Based on this theory, we construct doubly robust estimators and implement them in an R package.

Submitted 3 April, 2024; v1 submitted 21 September, 2023; originally announced September 2023.
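The paper's doubly robust estimators target principal causal effects with continuous post-treatment variables and are implemented in an R package that is not reproduced here. To illustrate what "doubly robust" means in the simplest setting, the sketch below shows the classic augmented inverse-probability-weighted (AIPW) estimator of an average treatment effect with a binary treatment, assuming fitted propensity scores `e` and fitted outcome regressions `mu1`, `mu0` are already available. The estimate is consistent if either the propensity model or the outcome models are correct; the two toy calls show the idealized extremes.

```python
def aipw_ate(y, t, e, mu1, mu0):
    """Augmented IPW (doubly robust) estimate of E[Y(1) - Y(0)].

    y: outcomes; t: binary treatment indicators (0/1);
    e: fitted propensity scores P(T=1 | X); mu1, mu0: fitted outcome
    regressions E[Y | T=1, X] and E[Y | T=0, X], evaluated at each unit.
    """
    n = len(y)
    total = 0.0
    for yi, ti, ei, m1, m0 in zip(y, t, e, mu1, mu0):
        total += (m1 - m0
                  + ti * (yi - m1) / ei
                  - (1 - ti) * (yi - m0) / (1 - ei))
    return total / n


# Correct outcome models, arbitrary propensities: the residual terms vanish.
print(aipw_ate([3.0, 1.0], [1, 0], [0.3, 0.6], [3.0, 2.0], [0.0, 1.0]))  # 2.0
# Misspecified (all-zero) outcome models, correct propensities: reduces to IPW.
print(aipw_ate([4.0, 2.0, 6.0, 2.0], [1, 0, 1, 0], [0.5] * 4, [0.0] * 4, [0.0] * 4))  # 3.0
```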

arXiv:2306.02422

A Generalized Alternating Method for Bilevel Learning under the Polyak-Łojasiewicz Condition

Authors: Quan Xiao, Songtao Lu, Tianyi Chen

Abstract: Bilevel optimization has recently regained interest owing to its applications in emerging machine learning fields such as hyperparameter optimization, meta-learning, and reinforcement learning. Recent results have shown that simple alternating (implicit) gradient-based algorithms can match the convergence rate of single-level gradient descent (GD) on bilevel problems with a strongly convex lower-level objective. However, it remains unclear whether this result generalizes beyond this basic setting. In this paper, we first introduce a stationarity metric for the considered bilevel problems, which generalizes the existing metric, for a nonconvex lower-level objective that satisfies the Polyak-Łojasiewicz (PL) condition. We then propose a Generalized ALternating mEthod for bilevel opTimization (GALET) tailored to bilevel optimization (BLO) with a convex PL lower-level (LL) problem, and establish that GALET achieves an $ε$-stationary point for the considered problem within $\tilde{\cal O}(ε^{-1})$ iterations, which matches the iteration complexity of GD for single-level smooth nonconvex problems.

Submitted 5 October, 2023; v1 submitted 4 June, 2023; originally announced June 2023.

Comments: Camera ready version

arXiv:2305.14686

Harmonic Measures and Numerical Computation of Cauchy Problems for Laplace Equations

Authors: Yu Chen, Jin Cheng, Shuai Lu, Masahiro Yamamoto

Abstract: It is well known that the Cauchy problem for Laplace equations is ill-posed in Hadamard's sense: small deviations in the Cauchy data may lead to large errors in the solutions. If a bound is imposed on the solution, a conditional stability estimate holds, which gives a reasonable way to construct stable algorithms. However, it is impossible to obtain good results at all points in the domain. Although numerical methods for Cauchy problems for Laplace equations have been widely studied for a long time, some questions remain unclear, for example, how to evaluate the numerical solutions, that is, whether we can approximate the Cauchy data well while keeping the bound on the solution, and at which points the numerical results are reliable. In this paper, we prove a conditional stability estimate that is quantitatively related to harmonic measures. The harmonic measure can be used as an indicator function to evaluate the numerical result pointwise, which further enables us to find a reliable subdomain where the local convergence rate is higher than a certain order.

Submitted 23 May, 2023; originally announced May 2023.
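For intuition about the harmonic measure in the entry above: in the unit disk, the harmonic measure at an interior point $z = re^{i\theta}$ of a boundary arc $E$ is $\omega(z, E) = \frac{1}{2\pi}\int_E \frac{1 - r^2}{1 - 2r\cos(\theta - t) + r^2}\,dt$ (the Poisson kernel integrated over the arc). The sketch below evaluates it numerically with a midpoint rule. The disk domain and the function name are illustrative assumptions; the paper's domains and its stability estimate are not reproduced.

```python
import numpy as np


def harmonic_measure_disk(z: complex, t1: float, t2: float, n: int = 20000) -> float:
    """Harmonic measure at z (|z| < 1) of the arc {e^{it} : t1 <= t <= t2} of the unit circle.

    Integrates the Poisson kernel (1 - r^2) / (1 - 2 r cos(theta - t) + r^2) / (2 pi)
    over the arc with a midpoint rule on n nodes.
    """
    if abs(z) >= 1.0:
        raise ValueError("z must lie inside the unit disk")
    r, theta = abs(z), np.angle(z)
    h = (t2 - t1) / n
    t = t1 + (np.arange(n) + 0.5) * h          # midpoint nodes
    kernel = (1.0 - r ** 2) / (1.0 - 2.0 * r * np.cos(theta - t) + r ** 2)
    return float(kernel.sum() * h / (2.0 * np.pi))


if __name__ == "__main__":
    print(harmonic_measure_disk(0.0, 0.0, np.pi / 2))   # quarter circle seen from the center
    print(harmonic_measure_disk(0.9, -0.2, 0.2))        # point close to the arc: large measure
    print(harmonic_measure_disk(-0.9, -0.2, 0.2))       # point far from the arc: small measure
```

At the center the measure reduces to arc length divided by $2\pi$, and a point near the arc "sees" much more of it than a point on the opposite side.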

arXiv:2302.14252 [pdf, other]

Compressed Decentralized Proximal Stochastic Gradient Method for Nonconvex Composite Problems with Heterogeneous Data

Authors: Yonggui Yan, Jie Chen, Pin-Yu Chen, Xiaodong Cui, Songtao Lu, Yangyang Xu

Abstract: We first propose a decentralized proximal stochastic gradient tracking method (DProxSGT) for nonconvex stochastic composite problems, with data heterogeneously distributed on multiple workers in a decentralized connected network. To save communication cost, we then extend DProxSGT to a compressed method by compressing the communicated information. Both methods need only $\mathcal{O}(1)$ samples per worker for each proximal update, which is important for achieving good generalization performance when training deep neural networks. With a smoothness condition on the expected loss function (but not on each sample function), the proposed methods can achieve an optimal sample complexity for producing a near-stationary point. Numerical experiments on training neural networks demonstrate significantly better generalization performance of our methods compared with large-batch training methods and momentum variance-reduction methods, as well as the ability of the gradient tracking scheme to handle heterogeneous data.

Submitted 27 February, 2023; originally announced February 2023.
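The DProxSGT entry above builds on proximal stochastic gradient updates and compressed communication. As a minimal sketch, assuming an $\ell_1$ regularizer for the nonsmooth term and top-$k$ sparsification as the compressor (the abstract specifies neither), one proximal step and one compression step can look like this. Gradient tracking and the network mixing step are omitted, and the helper names are hypothetical.

```python
import numpy as np


def soft_threshold(v: np.ndarray, tau: float) -> np.ndarray:
    """Proximal operator of tau * ||x||_1 (componentwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def prox_sgd_step(x: np.ndarray, stoch_grad: np.ndarray, lr: float, lam: float) -> np.ndarray:
    """Hypothetical helper: one proximal stochastic gradient step for E[f] + lam * ||x||_1."""
    return soft_threshold(x - lr * stoch_grad, lr * lam)


def top_k(v: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical compressor: keep the k largest-magnitude entries and zero the rest."""
    out = np.zeros_like(v)
    if k <= 0:
        return out
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out


if __name__ == "__main__":
    x = np.array([1.0, -1.0, 0.05])
    g = np.array([0.5, 0.5, 0.0])   # one stochastic gradient sample (made-up numbers)
    x_new = prox_sgd_step(x, g, lr=1.0, lam=0.2)
    print(x_new)                    # entries shrunk toward 0; the small third entry becomes 0
    print(top_k(x_new, 1))          # only the largest-magnitude entry would be transmitted
```

The proximal step handles the nonsmooth term exactly, so only the smooth part needs a stochastic gradient, which is consistent with the abstract's point that each proximal update uses a constant number of samples.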

arXiv:2302.12409 [pdf, ps, other]

Curvature estimates for semi-convex solutions of Hessian equations in hyperbolic space

Authors: Siyuan Lu

Abstract: In this paper, we establish a curvature estimate for semi-convex solutions of Hessian equations in hyperbolic space. We also obtain a curvature estimate for admissible solutions to the prescribed curvature measure type problem in hyperbolic space. A crucial ingredient in both estimates is a concavity inequality for the Hessian operator.

Submitted 23 February, 2023; originally announced February 2023.

arXiv:2302.02922 [pdf, other]

Joint Edge-Model Sparse Learning is Provably Efficient for Graph Neural Networks

Authors: Shuai Zhang, Meng Wang, Pin-Yu Chen, Sijia Liu, Songtao Lu, Miao Liu

Abstract: Due to the significant computational challenge of training large-scale graph neural networks (GNNs), various sparse learning techniques have been exploited to reduce memory and storage costs. Examples include \textit{graph sparsification} that samples a subgraph to reduce the amount of data aggregation and \textit{model sparsification} that prunes the neural network to reduce the number of trainable weights. Despite the empirical successes in reducing the training cost while maintaining the test accuracy, the theoretical generalization analysis of sparse learning for GNNs remains elusive. To the best of our knowledge, this paper provides the first theoretical characterization of joint edge-model sparse learning from the perspective of sample complexity and convergence rate in achieving zero generalization error. It proves analytically that both sampling important nodes and pruning neurons with the lowest magnitude can reduce the sample complexity and improve convergence without compromising the test accuracy. Although the analysis is centered on two-layer GNNs with structural constraints on data, the insights are applicable to more general setups and are justified on both synthetic and practical citation datasets.

Submitted 6 February, 2023; originally announced February 2023.

Journal ref: The Eleventh International Conference on Learning Representations, 2023

arXiv:2301.07875 [pdf, ps, other]

Increasing stability of a linearized inverse boundary value problem for a nonlinear Schrödinger equation on transversally anisotropic manifolds

Authors: Shuai Lu, Jian Zhai

Abstract: We consider the problem of recovering a nonlinear potential function in a nonlinear Schrödinger equation on transversally anisotropic manifolds from the linearized Dirichlet-to-Neumann map at a large wavenumber. By calibrating the complex geometric optics (CGO) solutions according to the wavenumber, we prove the increasing stability of recovering the coefficient of a cubic term as the wavenumber becomes large.

Submitted 18 January, 2023; originally announced January 2023.

arXiv:2212.09513 [pdf, other]

Stochastic Inexact Augmented Lagrangian Method for Nonconvex Expectation Constrained Optimization

Authors: Zichong Li, Pin-Yu Chen, Sijia Liu, Songtao Lu, Yangyang Xu

Abstract: Many real-world problems not only have complicated nonconvex functional constraints but also use a large number of data points. This motivates the design of efficient stochastic methods for finite-sum or expectation constrained problems. In this paper, we design and analyze stochastic inexact augmented Lagrangian methods (Stoc-iALM) to solve problems involving a nonconvex composite (i.e., smooth + nonsmooth) objective and nonconvex smooth functional constraints. We adopt the standard iALM framework and design a subroutine by using the momentum-based variance-reduced proximal stochastic gradient method (PStorm) and a postprocessing step. Under certain regularity conditions (also assumed in existing works), to reach an $\varepsilon$-KKT point in expectation, we establish an oracle complexity result of $O(\varepsilon^{-5})$, which is better than the best-known $O(\varepsilon^{-6})$ result. Numerical experiments on the fairness constrained problem and the Neyman-Pearson classification problem with real data demonstrate that our proposed method outperforms an existing method with the previously best-known complexity result.

Submitted 19 December, 2022; originally announced December 2022.
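To make the inexact augmented Lagrangian (iALM) framework in the Stoc-iALM entry concrete, the sketch below runs the textbook iALM for one inequality constraint $g(x) \le 0$, using $\mathcal{L}_\beta(x, y) = f(x) + \frac{1}{2\beta}\left(\max(0, y + \beta g(x))^2 - y^2\right)$ and the multiplier update $y \leftarrow \max(0, y + \beta g(x))$. It is a deterministic one-dimensional toy assumed for illustration, $f(x) = (x-2)^2$ and $g(x) = x - 1$, whose KKT point is $x = 1$ with multiplier $y = 2$. The inner solver is plain gradient descent, not the PStorm subroutine or the postprocessing step described in the abstract.

```python
def f_grad(x: float) -> float:
    return 2.0 * (x - 2.0)      # f(x) = (x - 2)^2


def g(x: float) -> float:
    return x - 1.0              # constraint g(x) <= 0, i.e. x <= 1


def g_grad(x: float) -> float:
    return 1.0


def aug_lagrangian_grad(x: float, y: float, beta: float) -> float:
    """d/dx of f(x) + (max(0, y + beta*g(x))**2 - y**2) / (2*beta)."""
    return f_grad(x) + max(0.0, y + beta * g(x)) * g_grad(x)


def inexact_alm(beta: float = 10.0, outer: int = 30, inner: int = 200):
    x, y = 0.0, 0.0
    step = 1.0 / (2.0 + beta)   # 1 / (smoothness constant of the augmented Lagrangian in x)
    for _ in range(outer):
        # inexact primal solve: a fixed number of warm-started gradient steps
        for _ in range(inner):
            x -= step * aug_lagrangian_grad(x, y, beta)
        # multiplier update for the inequality constraint
        y = max(0.0, y + beta * g(x))
    return x, y


if __name__ == "__main__":
    x, y = inexact_alm()
    print(x, y)  # approaches the KKT pair (1, 2)
```

For this toy, each outer iteration shrinks the multiplier error $y - 2$ by a factor $2/(2+\beta)$, which is why a modest number of outer rounds suffices.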

arXiv:2211.13562 [pdf, other]

Increasing stability of the first order linearized inverse Schrödinger potential problem with integer power type nonlinearities

Authors: Sen Zou, Shuai Lu, Boxi Xu

Abstract: We investigate the increasing stability of the inverse Schrödinger potential problem with integer power type nonlinearities at a large wavenumber. By considering the first order linearized system with respect to the unknown potential function, we propose a combination formula for the first order linearization, which provides a Lipschitz type stability for the recovery of the Fourier coefficients of the unknown potential function in low frequency modes. These stability results highlight the advantage of nonlinearity in solving this inverse potential problem by explicitly quantifying the dependence on the wavenumber and the nonlinearity index. A reconstruction algorithm for general power type nonlinearities is also provided. Several numerical examples illustrate the efficiency of our proposed algorithm.

Submitted 24 November, 2022; originally announced November 2022.

Comments: 37 pages, 8 figures

MSC Class: 35J25; 65N20

arXiv:2207.13499 [pdf, other]

On a Dynamic Variant of the Iteratively Regularized Gauss-Newton Method with Sequential Data

Authors: Neil K. Chada, Marco A. Iglesias, Shuai Lu, Frank Werner

Abstract: For numerous parameter and state estimation problems, assimilating new data as they become available can help produce accurate and fast inference of unknown quantities. While most existing algorithms for solving these kinds of ill-posed inverse problems can only be used with a single instance of the observed data, in this work we propose a new framework that enables existing algorithms to invert multiple instances of data in a sequential fashion. Specifically, we work with the well-known iteratively regularized Gauss-Newton method (IRGNM), a variational methodology for solving nonlinear inverse problems. We develop a convergence analysis for a proposed dynamic IRGNM algorithm in the presence of Gaussian white noise. We combine this algorithm with the classical IRGNM to deliver a practical (hybrid) algorithm that can invert data sequentially while producing fast estimates. Our work includes a proof of well-definedness of the proposed iterative scheme, as well as various error bounds that rely on standard assumptions for nonlinear inverse problems. We use several numerical experiments to verify our theoretical findings and to highlight the benefits of incorporating sequential data. The numerical experiments cover various parameter identification problems, including a Darcy flow elliptic PDE example and electrical impedance tomography.

Submitted 27 July, 2022; originally announced July 2022.
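The classical IRGNM that the entry above builds on updates $x_{k+1} = x_k + (J_k^\top J_k + \alpha_k I)^{-1}\left(J_k^\top (y - F(x_k)) + \alpha_k (x_0 - x_k)\right)$, where $J_k$ is the Jacobian of $F$ at $x_k$ and $\alpha_k$ decreases to zero. The sketch below applies it to a hypothetical two-dimensional forward map $F$ with noise-free data and $\alpha_k = \alpha_0 q^k$. The dynamic, sequential-data variant proposed in the paper is not reproduced.

```python
import numpy as np


def forward(x: np.ndarray) -> np.ndarray:
    """Hypothetical nonlinear forward operator F: R^2 -> R^2."""
    return np.array([x[0] ** 2 + x[1], x[0] + x[1] ** 2])


def jacobian(x: np.ndarray) -> np.ndarray:
    return np.array([[2.0 * x[0], 1.0], [1.0, 2.0 * x[1]]])


def irgnm(y: np.ndarray, x0: np.ndarray, alpha0: float = 1.0, q: float = 0.5,
          iters: int = 40) -> np.ndarray:
    """Classical IRGNM with alpha_k = alpha0 * q**k; x0 is both the start point and the prior guess."""
    x = x0.copy()
    alpha = alpha0
    for _ in range(iters):
        J = jacobian(x)
        lhs = J.T @ J + alpha * np.eye(x.size)
        rhs = J.T @ (y - forward(x)) + alpha * (x0 - x)
        x = x + np.linalg.solve(lhs, rhs)
        alpha *= q
    return x


if __name__ == "__main__":
    x_true = np.array([1.0, 2.0])
    y = forward(x_true)                        # noise-free data
    print(irgnm(y, x0=np.array([0.8, 1.8])))   # close to x_true
```

With noisy data the iteration would normally be stopped early (for example by a discrepancy principle) rather than run for a fixed count; the fixed count here is only for the noise-free toy.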

arXiv:2207.13283 [pdf, other]

INTERACT: Achieving Low Sample and Communication Complexities in Decentralized Bilevel Learning over Networks

Authors: Zhuqing Liu, Xin Zhang, Prashant Khanduri, Songtao Lu, Jia Liu

Abstract: In recent years, decentralized bilevel optimization problems have received increasing attention in the networking and machine learning communities thanks to their versatility in modeling decentralized learning problems over peer-to-peer networks (e.g., multi-agent meta-learning, multi-agent reinforcement learning, personalized training, and Byzantine-resilient learning). However, for decentralized bilevel optimization over peer-to-peer networks with limited computation and communication capabilities, achieving low sample complexity and low communication complexity are two fundamental challenges that remain under-explored so far. In this paper, we make the first attempt to investigate the class of decentralized bilevel optimization problems with nonconvex and strongly-convex structure corresponding to the outer and inner subproblems, respectively. Our main contributions in this paper are two-fold: i) We first propose a deterministic algorithm called INTERACT (inner-gradient-descent-outer-tracked-gradient) that requires a sample complexity of $\mathcal{O}(n ε^{-1})$ and a communication complexity of $\mathcal{O}(ε^{-1})$ to solve the bilevel optimization problem, where $n$ and $ε> 0$ are the number of samples at each agent and the desired stationarity gap, respectively.
ii) To relax the need for full gradient evaluations in each iteration, we propose a stochastic variance-reduced version of INTERACT (SVR-INTERACT), which improves the sample complexity to $\mathcal{O}(\sqrt{n} ε^{-1})$ while achieving the same communication complexity as the deterministic algorithm. To our knowledge, this work is the first that achieves both low sample and communication complexities for solving decentralized bilevel optimization problems over networks. Our numerical experiments also corroborate our theoretical findings.

Submitted 5 October, 2022; v1 submitted 27 July, 2022; originally announced July 2022.

arXiv:2207.05650 [pdf, other]

A Single-Loop Gradient Descent and Perturbed Ascent Algorithm for Nonconvex Functional Constrained Optimization

Authors: Songtao Lu

Abstract: Nonconvex constrained optimization problems can be used to model a number of machine learning problems, such as multi-class Neyman-Pearson classification and constrained Markov decision processes. However, such problems are challenging because both the objective and constraints are possibly nonconvex, so it is difficult to balance the reduction of the loss value and the reduction of constraint violation. Although a few methods solve this class of problems, all of them are double-loop or triple-loop algorithms, and they require oracles to solve some subproblems up to a certain accuracy by tuning multiple hyperparameters at each iteration. In this paper, we propose a novel gradient descent and perturbed ascent (GDPA) algorithm to solve a class of smooth nonconvex inequality constrained problems. GDPA is a primal-dual algorithm that only exploits the first-order information of both the objective and constraint functions to update the primal and dual variables in an alternating way. The key feature of the proposed algorithm is that it is a single-loop algorithm, where only two step-sizes need to be tuned. We show that under a mild regularity condition GDPA is able to find Karush-Kuhn-Tucker (KKT) points of nonconvex functional constrained problems with convergence rate guarantees. To the best of our knowledge, it is the first single-loop algorithm that can solve general nonconvex smooth problems with nonconvex inequality constraints.
Numerical results also showcase the superiority of GDPA compared with the best-known algorithms (in terms of both the stationarity measure and the feasibility of the obtained solutions).

Submitted 12 July, 2022; originally announced July 2022.

Comments: This work has been accepted by the Thirty-ninth International Conference on Machine Learning. (Some typos in the ICML proceedings are corrected in this version.)
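The GDPA entry above describes a single-loop primal-dual scheme: one gradient step on the primal variable, then one perturbed ascent step on the dual variable, with only two step sizes. The sketch below shows that loop structure only. The perturbation form $(1 - \beta\tau)\lambda$, the constant step sizes, and the toy problem (a convex one, $\min \|x - (2,2)\|^2$ subject to $\|x\|^2 \le 1$, rather than a nonconvex one) are our assumptions, not the paper's exact algorithm. Because of the damping term, the iterates settle at a point that is feasible only up to $g(x) = \tau\lambda$.

```python
import numpy as np


def f_grad(x: np.ndarray) -> np.ndarray:
    return 2.0 * (x - 2.0)          # f(x) = ||x - (2, 2)||^2


def g(x: np.ndarray) -> float:
    return float(x @ x - 1.0)       # constraint g(x) = ||x||^2 - 1 <= 0


def g_grad(x: np.ndarray) -> np.ndarray:
    return 2.0 * x


def single_loop_primal_dual(alpha: float = 0.05, beta: float = 0.05, tau: float = 1e-3,
                            iters: int = 20000):
    x = np.zeros(2)
    lam = 0.0
    for _ in range(iters):
        # primal: one gradient descent step on f(x) + lam * g(x)
        x = x - alpha * (f_grad(x) + lam * g_grad(x))
        # dual: one projected ascent step on lam, damped by (1 - beta * tau) (assumed form)
        lam = max(0.0, (1.0 - beta * tau) * lam + beta * g(x))
    return x, lam


if __name__ == "__main__":
    x, lam = single_loop_primal_dual()
    print(x, g(x), lam)  # x near (1/sqrt(2), 1/sqrt(2)); small residual g(x) = tau * lam
```

Shrinking $\tau$ reduces the constraint residual but weakens the damping; how the paper balances this is in its analysis, not in this sketch.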

arXiv:2206.13482 [pdf, other]

Understanding Benign Overfitting in Gradient-Based Meta Learning

Authors: Lisha Chen, Songtao Lu, Tianyi Chen

Abstract: Meta learning has demonstrated tremendous success in few-shot learning with limited supervised data. In those settings, the meta model is usually overparameterized. While the conventional statistical learning theory suggests that overparameterized models tend to overfit, empirical evidence reveals that overparameterized meta learning methods still work well -- a phenomenon often called "benign overfitting." To understand this phenomenon, we focus on the meta learning settings with a challenging bilevel structure that we term the gradient-based meta learning, and analyze its generalization performance under an overparameterized meta linear regression model. While our analysis uses the relatively tractable linear models, our theory contributes to understanding the delicate interplay among data heterogeneity, model adaptation and benign overfitting in gradient-based meta learning tasks. We corroborate our theoretical claims through numerical simulations.

Submitted 9 November, 2022; v1 submitted 27 June, 2022; originally announced June 2022.

arXiv:2206.00161 [pdf, ps, other]

On the asymptotic Plateau problem in hyperbolic space

Authors: Siyuan Lu

Abstract: In this paper, we solve the asymptotic Plateau problem in hyperbolic space for constant $σ_{n-1}$ curvature, i.e., the existence of a complete hypersurface in $\mathbb{H}^{n+1}$ satisfying $σ_{n-1}(κ)=σ\in (0,n)$ with a prescribed asymptotic boundary $Γ$. The curvature estimates are the key ingredient. Previously, this was only known for $σ_0<σ<n$, where $σ_0$ is a positive constant.

Submitted 12 February, 2023; v1 submitted 31 May, 2022; originally announced June 2022.
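The constant-curvature condition in the entry above uses the elementary symmetric polynomial $\sigma_{n-1}$ of the principal curvatures $\kappa = (\kappa_1, \dots, \kappa_n)$. The sketch below computes $\sigma_k(\kappa)$ as the coefficient of $t^k$ in $\prod_i (1 + \kappa_i t)$, with a brute-force check over subsets. As an example, horospheres in $\mathbb{H}^{n+1}$ have $\kappa_i = 1$ for all $i$, which gives $\sigma_{n-1}(\kappa) = n$, the right endpoint of the interval $(0, n)$. The function names are illustrative.

```python
from itertools import combinations
from math import prod


def elementary_symmetric(kappa, k: int) -> float:
    """sigma_k(kappa): coefficient of t^k in prod_i (1 + kappa_i t), built one factor at a time."""
    e = [1.0] + [0.0] * k
    for value in kappa:
        for j in range(k, 0, -1):   # descending j so e[j - 1] still holds the previous stage
            e[j] += value * e[j - 1]
    return e[k]


def elementary_symmetric_bruteforce(kappa, k: int) -> float:
    """Reference: sum over all k-element subsets of the product of their entries."""
    return float(sum(prod(c) for c in combinations(kappa, k)))


if __name__ == "__main__":
    n = 5
    print(elementary_symmetric([1.0] * n, n - 1))  # all principal curvatures equal 1: sigma_{n-1} = n
```

The recurrence costs $O(nk)$ operations, whereas the brute-force sum ranges over $\binom{n}{k}$ subsets.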

arXiv:2205.14413 [pdf]

Discrimination-Based Double Auction for Maximizing Social Welfare in the Electricity and Heating Market Considering Privacy Preservation

Authors: Lu Wang, Wei Gu, Shuai Lu, Haifeng Qiu, Zhi Wu

Abstract: This paper proposes a double-sided auction mechanism with price discrimination for social welfare (SW) maximization in the electricity and heating market. In this mechanism, energy service providers (ESPs) submit offers and load aggregators (LAs) submit bids to an energy trading center (ETC) to maximize their utility; in turn, the selfless ETC as an auctioneer leverages discriminatory price weights to regulate the behaviors of ESPs and LAs, which combines the individual benefits of each stakeholder with the overall social welfare to achieve the global optimum. Nash games are employed to describe the interactions between players with the same market role. Theoretically, we first prove the existence and uniqueness of the Nash equilibrium; then, considering the requirement of game players to preserve privacy, a distributed algorithm based on the alternating direction method of multipliers is developed to implement distributed bidding, and an analytical target cascading algorithm is applied to reach the balance of demand and supply. We validated the proposed mechanism using case studies on a city-level distribution system. The results indicated that the achieved SW improved by 4%-15% compared with other mechanisms and also verified the effectiveness of the distributed algorithm.

Submitted 28 May, 2022; originally announced May 2022.

arXiv:2204.05420 [pdf, ps, other]

On the Dirichlet problem for Lagrangian phase equation with critical and supercritical phase

Authors: Siyuan Lu

Abstract: In this paper, we solve the Dirichlet problem for the Lagrangian phase equation with critical and supercritical phase. A crucial ingredient is the interior $C^2$ estimate. Our result is sharp in the sense that there exist singular solutions in the subcritical phase case.

Submitted 12 February, 2023; v1 submitted 11 April, 2022; originally announced April 2022.

arXiv:2203.01924 [pdf, other]

Min-Max Bilevel Multi-objective Optimization with Applications in Machine Learning

Authors: Alex Gu, Songtao Lu, Parikshit Ram, Lily Weng

Abstract: We consider a generic min-max multi-objective bilevel optimization problem with applications in robust machine learning such as representation learning and hyperparameter optimization. We design MORBiT, a novel single-loop gradient descent-ascent bilevel optimization algorithm, to solve the generic problem and present a novel analysis showing that MORBiT converges to the first-order stationary point at a rate of $\widetilde{\mathcal{O}}(n^{1/2} K^{-2/5})$ for a class of weakly convex problems with $n$ objectives upon $K$ iterations of the algorithm. Our analysis utilizes novel results to handle the non-smooth min-max multi-objective setup and to obtain a sublinear dependence on the number of objectives $n$. Experimental results on robust representation learning and robust hyperparameter optimization showcase (i) the advantages of considering the min-max multi-objective setup, and (ii) convergence properties of the proposed MORBiT. Our code is at https://github.com/minimario/MORBiT.

Submitted 7 March, 2023; v1 submitted 3 March, 2022; originally announced March 2022.

Comments: 43 pages, 3 figures, ICLR 2023 version
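In a min-max multi-objective problem such as the one in the MORBiT entry, the inner maximization can be taken over objective weights $\lambda$ in the probability simplex $\Delta$, and for fixed $x$, $\max_{\lambda \in \Delta} \sum_i \lambda_i f_i(x) = \max_i f_i(x)$. A gradient ascent step on $\lambda$ therefore needs a Euclidean projection onto the simplex. The sketch below uses the standard sort-based projection. It is a generic building block under that assumption, not MORBiT's update rule, and the helper names are hypothetical.

```python
import numpy as np


def project_to_simplex(v) -> np.ndarray:
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1} (sort-based algorithm)."""
    v = np.asarray(v, dtype=float)
    u = np.sort(v)[::-1]                                  # entries in descending order
    cssv = np.cumsum(u)
    idx = np.arange(1, v.size + 1)
    rho = np.nonzero(u - (cssv - 1.0) / idx > 0)[0][-1]   # last index that stays positive
    theta = (cssv[rho] - 1.0) / (rho + 1)                 # shift that makes the result sum to 1
    return np.maximum(v - theta, 0.0)


def weight_ascent_step(weights, losses, step: float) -> np.ndarray:
    """Hypothetical helper: projected ascent on lambda for sum_i lambda_i * loss_i."""
    return project_to_simplex(np.asarray(weights, dtype=float) + step * np.asarray(losses, dtype=float))


if __name__ == "__main__":
    w = weight_ascent_step([0.5, 0.5], losses=[1.0, 3.0], step=10.0)
    print(w)  # all weight moves to the objective with the larger loss
```

With a large step the weights jump to a vertex of the simplex, which matches the identity above; smaller steps move the weights gradually toward the worst-case objective.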

arXiv:2202.06303 [pdf, other]

On the Exactness of an Energy-efficient Train Control model based on Convex Optimization

Authors: Shaofeng Lu, Minling Feng, Kunpeng Wu

Abstract: In this paper, we present an exactness proof for the energy-efficient train control (EETC) model based on convex optimization. The proof of exactness shows that the convex optimization model yields the same optimal results as the initial model on which the convex relaxations are conducted. We first show how the relaxation of the initial non-convex model is conducted and show that the relaxations are convex constraints, so the relaxed model is a convex model. Subsequently, we prove that the relaxed convex model always attains its optimal solution on the initial equality constraints, so the optimal solution obtained by convex optimization is the same as that of the initial non-convex model and the relaxations applied are exact. A numerical verification has been conducted on a typical urban rail system with a steep gradient. The results of this paper shed light on further applications of convex optimization to energy-efficient train control and to related areas of operation and control of low-carbon transportation systems.

Submitted 13 February, 2022; originally announced February 2022.

Comments: 11 pages and 4 figures

arXiv:2202.06217 [pdf, ps, other]

The double contravariant powerset monad in the Goguen category of fuzzy sets

Authors: Sijia Lu, Dexue Zhang

Abstract: A monad is constructed in the Goguen category of fuzzy sets valued in a unital quantale, which is an analog of the double contravariant powerset monad in the category of sets. With the help of this monad, it is proved that the Goguen category of fuzzy sets is dually monadic over itself.

Submitted 3 August, 2022; v1 submitted 13 February, 2022; originally announced February 2022.

Comments: 21 pages

MSC Class: 03E72; 18C15; 18C20

arXiv:2201.11839 [pdf, ps, other]

The local-global principle for divisibility in CM elliptic curves

Authors: Brendan Creutz, Sheng Lu

Abstract: We consider the local-global principle for divisibility in the Mordell-Weil group of a CM elliptic curve defined over a number field. For each prime $p$ we give sharp lower bounds on the degree $d$ of a number field over which there exists a CM elliptic curve which gives a counterexample to the local-global principle for divisibility by a power of $p$. As a corollary we deduce that there are at most finitely many elliptic curves (with or without CM) which are counterexamples with $p > 2d+1$. We also deduce that the local-global principle for divisibility by powers of $7$ holds over quadratic fields.

Submitted 27 January, 2022; originally announced January 2022.

arXiv:2201.11057 [pdf, ps, other]

Translation of Mathieu's On the five-fold transitive function of 24 quantities

Authors: Yiming Bing, Bright Hu, Ronni Hu, Rhianna Kho, Max Lau, Rhianna Li, Stefan Lu, Finn Mcdonald, Michael Sun, Gavin Trann, Nicholas Wolfe, Joshua Yao, Leon Zhou, Nathan Zhou

Abstract: We translated the paper 'On the five-fold transitive function of 24 quantities' by Émile Mathieu into English. This was done to aid our project on expressing $M_{23}$ as additive functions on the finite field of $2^{11}$ elements.

Submitted 21 January, 2022; originally announced January 2022.

Comments: We translated the paper 'On the five-fold transitive function of 24 quantities' by Émile Mathieu into English. This was done to aid our project on expressing $M_{23}$ as additive functions on the finite field of $2^{11}$ elements

arXiv:2201.10731 [pdf, other]

A fast-solved model for energy-efficient train control based on convex optimization

Authors: Minling Feng, Kunpeng Wu, Shaofeng Lu

Abstract: In modern rail transportation, energy-efficient train control (EETC) is concerned with the optimal train speed trajectory or control strategies to achieve the minimum energy cost under various operation and traction constraints. This paper proposes an EETC model based on convex optimization so that the model can be rapidly solved by convex optimization algorithms. The high computational efficiency and robustness of the convex model are verified by comparing the results of the proposed method with those of other mainstream mathematical programming methods, including mixed-integer linear programming (MILP) and the Radau pseudospectral method (RPM). Owing to the characteristics of convex optimization, the proposed method offers significant advantages over its counterparts in computational efficiency, which is promising for online applications in automatic train control systems for various types of rail transportation.

Submitted 25 January, 2022; originally announced January 2022.

Comments: 10 pages, 5 figures

arXiv:2201.00108 [pdf, ps, other]

The Mathieu group $M_{23}$ as additive functions on the finite field of size ${2^{11}}$

Authors: Yiming Bing, Bright Hu, Ronni Hu, Rhianna Li, Stefan Lu, Finn McDonald, Michael Sun, Nicholas Wolfe, Joshua Yao, Leon Zhou, Nathan Zhou

Abstract: We explicitly extend the standard permutation action of the Mathieu group $M_{23}$ on a 23-element set $C=C_{23}$ contained in a finite field of $2^{11}$ elements $\mathbb{F}_{2^{11}}$ to additive functions on this finite field. That is, we represent $M_{23}$ as functions $\varphi:\mathbb{F}_{2^{11}}\to \mathbb{F}_{2^{11}}$ such that $\varphi(x+y)=\varphi(x)+\varphi(y)$ and $\varphi|_{C}$ is the standard permutation action. We give explicit $11\times 11$ matrices for the pair of standard generators of order $23$ and order $5$, as well as many tables to help facilitate future calculations.

Submitted 31 December, 2021; originally announced January 2022.

Comments: 15 pages

MSC Class: 20C34

arXiv:2112.11420 [pdf, other]

Zeroth-order Optimization for Composite Problems with Functional Constraints

Authors: Zichong Li, Pin-Yu Chen, Sijia Liu, Songtao Lu, Yangyang Xu

Abstract: In many real-world problems, first-order (FO) derivative evaluations are too expensive or even inaccessible. For such problems, zeroth-order (ZO) methods, which need only function evaluations, are often more efficient than FO methods and are sometimes the only option. In this paper, we propose a novel zeroth-order inexact augmented Lagrangian method (ZO-iALM) to solve black-box optimization problems that involve a composite (i.e., smooth+nonsmooth) objective and functional constraints. Under a certain regularity condition (also assumed by several existing works on FO methods), the query complexity of our ZO-iALM is $\tilde{O}(d\varepsilon^{-3})$ to find an $\varepsilon$-KKT point for problems with a nonconvex objective and nonconvex constraints, and $\tilde{O}(d\varepsilon^{-2.5})$ for nonconvex problems with convex constraints, where $d$ is the variable dimension. This appears to be the first work that develops an iALM-based ZO method for functional constrained optimization while achieving query complexity results that match the best-known FO complexity results up to a factor of $d$. An extensive experimental study demonstrates the effectiveness of our method, with applications ranging from classical optimization problems to practical machine learning examples such as resource allocation in sensor networks and adversarial example generation.
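ZO methods replace the gradient with an estimate built from function values alone. The sketch below shows one common two-point estimator (Gaussian smoothing with central differences) inside plain gradient descent on an unconstrained toy quadratic. It is an illustrative assumption, not ZO-iALM itself: the paper's method additionally handles the nonsmooth term and the functional constraints through an inexact augmented Lagrangian, and its estimator and parameters may differ. The names `zo_gradient`, `zo_descent`, `mu`, and `num_dirs` are hypothetical.

```python
import numpy as np


def zo_gradient(f, x, mu=1e-4, num_dirs=20, rng=None):
    """Estimate grad f(x) from function values only (two-point Gaussian smoothing).

    Averages (f(x + mu*u) - f(x - mu*u)) / (2*mu) * u over random Gaussian directions u.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    for _ in range(num_dirs):
        u = rng.standard_normal(x.shape)
        g += (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu) * u
    return g / num_dirs


def zo_descent(f, x0, step=0.05, iters=500, seed=0):
    """Plain gradient descent driven by the zeroth-order estimate (no constraints)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = x - step * zo_gradient(f, x, rng=rng)
    return x


# Toy black-box objective: only its values are ever queried.
target = np.array([1.0, -2.0, 0.5])


def f(x):
    return float(np.sum((x - target) ** 2))


x_star = zo_descent(f, np.zeros(3))
print(x_star)
```

Each gradient estimate costs `2 * num_dirs` function evaluations, and the variance of this kind of estimator grows with the dimension $d$ of $x$.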

Submitted 21 December, 2021; originally announced December 2021.

Comments: AAAI 2022

MSC Class: 90C26; 90C30; 90C25; 90C60; 90C56; 90C06

arXiv:2111.13446 [pdf, other]

doi 10.1088/1361-6420/ac637a

Increasing stability in the linearized inverse Schrödinger potential problem with power type nonlinearities

Authors: Shuai Lu, Mikko Salo, Boxi Xu

Abstract: We consider increasing stability in the inverse Schrödinger potential problem with power-type nonlinearities at a large wavenumber. Two linearization approaches, with respect to small boundary data and to a small potential function, are proposed, and their performance on the inverse Schrödinger potential problem is investigated. We observe that higher-order linearization for small boundary data provides increasing stability for an arbitrary power-type nonlinearity term if the wavenumber is chosen large. Meanwhile, linearization with respect to the potential function leads to increasing stability for a quadratic nonlinearity term, which highlights the advantage of nonlinearity in solving the inverse Schrödinger potential problem. Since both linearization approaches can be numerically approximated, we provide several reconstruction algorithms for the quadratic and general power-type nonlinearity terms, one of which is based on boundary measurements at multiple wavenumbers. Several numerical examples illustrate the efficiency of the proposed algorithms.

Submitted 17 March, 2022; v1 submitted 26 November, 2021; originally announced November 2021.

Comments: 29 pages, 7 figures

Journal ref: Inverse Problems 38 (2022) 065009 (25pp)

arXiv:2111.13020 [pdf, other]

doi 10.1142/S0218202522500361

Normalized solutions with positive energies for a coercive problem and application to the cubic-quintic nonlinear Schrödinger equation

Authors: Louis Jeanjean, Sheng-Sen Lu

Abstract: In any dimension $N \geq 1$, for a given mass $m > 0$ and when the $C^1$ energy functional \begin{equation*} I(u) := \frac{1}{2} \int_{\mathbb{R}^N} |\nabla u|^2 dx - \int_{\mathbb{R}^N} F(u) dx \end{equation*} is coercive on the mass constraint \begin{equation*} S_m := \left\{ u \in H^1(\mathbb{R}^N) ~|~ \|u\|^2_{L^2(\mathbb{R}^N)} = m \right\}, \end{equation*} we search for constrained critical points at positive energy levels. Under general conditions on $F \in C^1(\mathbb{R}, \mathbb{R})$ and for suitable ranges of the mass, we construct such critical points, which appear as local minimizers or correspond to a mountain pass or a symmetric mountain pass level. In particular, our results shed some light on the cubic-quintic nonlinear Schrödinger equation in $\mathbb{R}^3$.

Submitted 15 August, 2022; v1 submitted 25 November, 2021; originally announced November 2021.

Comments: This version is the final one, corresponding to the paper now published in Math. Models Methods Appl. Sci. DOI: 10.1142/S0218202522500361

MSC Class: 35Q55; 35J20

Journal ref: Mathematical Models and Methods in Applied Sciences 32 (2022) 1557-1588

arXiv:2109.07213 [pdf]

Hierarchical Electricity and Carbon Trading in Transmission and Distribution Networks Based on Virtual Federated Prosumer

Authors: Lu Wang, Zhi Wu, Wei Gu, Haifeng Qiu, Shuai Lu

Abstract: Facing the dilemma of growing energy demand and the need to mitigate carbon emissions, this paper proposes an energy sharing mechanism based on virtual federated prosumers (VFPs), with budget allocation for a joint electricity and carbon market, to incentivize distributed energy resources to participate in the hierarchical market and reduce carbon emissions. At the transmission level, the regional transmission operator coordinates transactions between two markets, the inter-VFP energy sharing market and the wholesale market, with the aim of minimizing the overall cost of VFPs. The energy sharing market clearing problem is formulated as a generalized Nash game, for which we develop a first-order response algorithm to obtain the equilibrium. At the distribution level, each VFP acts as a selfless auctioneer that leverages discriminatory weights and benchmark prices to allocate the electricity-carbon budget among the entities in the VFP so as to maximize social welfare. A Nash game is used to characterize the budget allocation problem, for which a distributed feedback allocation algorithm is proposed. The entire hierarchical electricity and carbon trading problem is modeled as an equilibrium problem and solved iteratively. Case studies based on a practical regional grid verify the effectiveness of the proposed algorithm and show that the mechanism improves energy efficiency and reduces carbon emissions.

Submitted 15 September, 2021; originally announced September 2021.

arXiv:2108.04142 [pdf, ps, other]

doi 10.1007/s00526-022-02320-6

On global minimizers for a mass constrained problem

Authors: Louis Jeanjean, Sheng-Sen Lu

Abstract: In any dimension $N \geq 1$, for a given mass $m > 0$ and for the $C^1$ energy functional \begin{equation*} I(u):=\frac{1}{2}\int_{\mathbb{R}^N}|\nabla u|^2dx-\int_{\mathbb{R}^N}F(u)dx, \end{equation*} we revisit the classical problem of finding conditions on $F \in C^1(\mathbb{R},\mathbb{R})$ ensuring that $I$ admits global minimizers on the mass constraint \begin{equation*} S_m:=\left\{u\in H^1(\mathbb{R}^N)~|~\|u\|^2_{L^2(\mathbb{R}^N)}=m\right\}. \end{equation*} Under assumptions that we believe to be nearly optimal, and in particular without assuming that $F$ is even, we show that any such global minimizer, called an energy ground state, has constant sign and is radially symmetric and monotone with respect to some point in $\mathbb{R}^N$. Moreover, we show that any energy ground state is a least action solution of the associated action functional. This last result gives, under general assumptions, a positive answer to a long-standing question.

Submitted 10 October, 2022; v1 submitted 9 August, 2021; originally announced August 2021.

Comments: This version is the final one, corresponding to the paper now published in Calc. Var. Partial Differential Equations

MSC Class: 35J60; 58E05

Journal ref: Calculus of Variations and Partial Differential Equations, 61 (2022): 214

arXiv:2108.02434 [pdf, other]

A Geometrically Consistent Trace Finite Element Method For The Laplace-Beltrami Eigenvalue Problem

Authors: Song Lu, Xianmin Xu

Abstract: In this paper, we propose a new trace finite element method for the Laplace-Beltrami eigenvalue problem. The method is formulated directly on a smooth manifold that is implicitly given by a level-set function, and it requires high-order numerical quadrature on the surface. A comprehensive analysis of the method is provided. We show that the eigenvalues of the discrete Laplace-Beltrami operator coincide with only part of the eigenvalues of an embedded problem, which in turn correspond to the finite eigenvalues of a singular generalized algebraic eigenvalue problem. These finite eigenvalues can be computed efficiently by the rank-completing perturbation algorithm of Hochstenbach et al. (SIAM J. Matrix Anal. Appl., 2019). We prove that the method has an optimal convergence rate. Numerical experiments verify the theoretical analysis and show that geometric consistency can improve numerical accuracy significantly.

Submitted 14 January, 2022; v1 submitted 5 August, 2021; originally announced August 2021.

Comments: 23 pages, 6 figures

arXiv:2108.02375 [pdf, ps, other]

On the $σ_2$-Nirenberg problem on $\mathbb{S}^2$

Authors: YanYan Li, Han Lu, Siyuan Lu

Abstract: We establish theorems on the existence and compactness of solutions to the $σ_2$-Nirenberg problem on the standard sphere $\mathbb S^2$. A first significant ingredient, a Liouville-type theorem for the associated fully nonlinear Möbius-invariant elliptic equations, was established in an earlier paper of ours. Our proof of the existence and compactness results requires a number of additional crucial ingredients, which we prove in this paper: a Liouville-type theorem for the associated fully nonlinear Möbius-invariant degenerate elliptic equations, a priori estimates of first- and second-order derivatives of solutions to the $σ_2$-Nirenberg problem, and a Bôcher-type theorem for the associated fully nonlinear Möbius-invariant elliptic equations. Given these results, we carry out a fine analysis of a sequence of blow-up solutions to the $σ_2$-Nirenberg problem. In particular, we prove that such a blow-up sequence of solutions can have at most one blow-up point. This, together with a Kazdan-Warner type identity, allows us to prove $L^\infty$ a priori estimates for solutions of the $σ_2$-Nirenberg problem under a simple generic hypothesis. The higher derivative estimates then follow from classical estimates of Nirenberg and Schauder. In turn, the existence of solutions to the $σ_2$-Nirenberg problem is obtained by applying the now-standard degree theory for second-order fully nonlinear elliptic operators.

Submitted 5 August, 2021; originally announced August 2021.

arXiv:2107.08157 [pdf, ps, other]

Determination of source terms in diffusion and wave equations by observations after incidents: uniqueness and stability

Authors: Jin Cheng, Shuai Lu, Masahiro Yamamoto

Abstract: We consider the diffusion and wave equations $$ \partial_t^ku(x,t) = Δu(x,t) + μ(t)f(x), \quad x\in Ω, \, t>0, \quad k=1,2 $$ with zero initial and boundary conditions, where $Ω\subset \mathbb{R}^d$ is a bounded domain. We establish uniqueness and/or stability results for the inverse problems of (1) determining $μ(t)$, $0<t<T$, with given $f(x)$, and (2) determining $f(x)$, $x\in Ω$, with given $μ(t)$, from data of $u$: either $u(x_0,\cdot)$ at a fixed point $x_0\in Ω$ or Neumann data on a subboundary over a time interval. In our inverse problems, data are taken over the time interval $T_1<t<T_2$, assuming that $T<T_1<T_2$ and $μ(t)=0$ for $t\ge T$, which means that the source stops being active after time $T$ and the observations start only after $T$. This assumption is practical for such a posteriori data after incidents, although the inverse problems have been well studied in the case of $T=0$. We establish non-uniqueness, uniqueness, and conditional stability for the diffusion and wave equations. The proofs are based on eigenfunction expansions of the solutions $u(x,t)$ and rely on the generalized Weierstrass theorem on polynomial approximation, almost periodic functions, Carleman estimates, and non-harmonic Fourier series.
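For orientation, the eigenfunction expansion of the direct problem can be written out mode by mode. This is a standard sketch, assuming zero Dirichlet boundary conditions (the abstract says only "zero boundary conditions") and writing $(\lambda_n, \varphi_n)$ for the eigenvalues and $L^2(Ω)$-orthonormal eigenfunctions of $-Δ$; these symbols are introduced here for illustration and are not necessarily the paper's notation. Each coefficient $u_n(t) := (u(\cdot,t), \varphi_n)$ solves $u_n' + \lambda_n u_n = \mu(t) f_n$ (for $k=1$) or $u_n'' + \lambda_n u_n = \mu(t) f_n$ (for $k=2$) with zero initial data.

```latex
% Assumed Dirichlet eigenpairs: -\Delta\varphi_n = \lambda_n\varphi_n in \Omega, \varphi_n = 0 on \partial\Omega.
% Notation introduced here: f_n := (f, \varphi_n)_{L^2(\Omega)}.
\begin{align*}
k=1:\quad u(x,t) &= \sum_{n\ge 1} f_n \left( \int_0^t e^{-\lambda_n (t-s)}\, \mu(s)\, ds \right) \varphi_n(x), \\
k=2:\quad u(x,t) &= \sum_{n\ge 1} f_n \left( \int_0^t \frac{\sin\big(\sqrt{\lambda_n}\,(t-s)\big)}{\sqrt{\lambda_n}}\, \mu(s)\, ds \right) \varphi_n(x).
\end{align*}
% Since \mu(s) = 0 for s \ge T, for every t > T the diffusion case reduces to
\[
u(x,t) = \sum_{n\ge 1} f_n\, e^{-\lambda_n t} \left( \int_0^T e^{\lambda_n s}\, \mu(s)\, ds \right) \varphi_n(x).
\]
```

In the diffusion case, observations taken only after $T$ therefore depend on $\mu$ solely through the quantities $\int_0^T e^{\lambda_n s}\mu(s)\,ds$, not through its pointwise values.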

Submitted 16 July, 2021; originally announced July 2021.

Showing 1–50 of 137 results for author: Lu, S