-
Grassmannian optimization is NP-hard
Authors:
Zehua Lai,
Lek-Heng Lim,
Ke Ye
Abstract:
We show that unconstrained quadratic optimization over a Grassmannian $\operatorname{Gr}(k,n)$ is NP-hard. Our results cover all scenarios: (i) when $k$ and $n$ are both allowed to grow; (ii) when $k$ is arbitrary but fixed; (iii) when $k$ is fixed at its lowest possible value $1$. We then deduce the NP-hardness of unconstrained cubic optimization over the Stiefel manifold $\operatorname{V}(k,n)$ and the orthogonal group $\operatorname{O}(n)$. As an addendum we demonstrate the NP-hardness of unconstrained quadratic optimization over the Cartan manifold, i.e., the positive definite cone $\mathbb{S}^n_{\scriptscriptstyle++}$ regarded as a Riemannian manifold, another popular example in manifold optimization. We will also establish the nonexistence of $\mathrm{FPTAS}$ in all cases.
Submitted 27 June, 2024;
originally announced June 2024.
-
Simple matrix expressions for the curvatures of Grassmannian
Authors:
Zehua Lai,
Lek-Heng Lim,
Ke Ye
Abstract:
We show that modeling a Grassmannian as symmetric orthogonal matrices $\operatorname{Gr}(k,\mathbb{R}^n) \cong\{Q \in \mathbb{R}^{n \times n} : Q^{\scriptscriptstyle\mathsf{T}} Q = I, \; Q^{\scriptscriptstyle\mathsf{T}} = Q,\; \operatorname{tr}(Q)=2k - n\}$ yields exceedingly simple matrix formulas for various curvatures and curvature-related quantities, both intrinsic and extrinsic. These include Riemann, Ricci, Jacobi, sectional, scalar, mean, principal, and Gaussian curvatures; Schouten, Weyl, Cotton, Bach, Plebański, cocurvature, nonmetricity, and torsion tensors; first, second, and third fundamental forms; Gauss and Weingarten maps; and upper and lower delta invariants. We will derive explicit, simple expressions for the aforementioned quantities in terms of standard matrix operations that are stably computable with numerical linear algebra. Many of these aforementioned quantities have never before been presented for the Grassmannian.
Submitted 17 June, 2024;
originally announced June 2024.
-
Autonomous Sparse Mean-CVaR Portfolio Optimization
Authors:
Yizun Lin,
Yangyu Zhang,
Zhao-Rong Lai,
Cheng Li
Abstract:
The $\ell_0$-constrained mean-CVaR model poses a significant challenge due to its NP-hard nature, typically tackled through combinatorial methods characterized by high computational demands. From a markedly different perspective, we propose an innovative autonomous sparse mean-CVaR portfolio model, capable of approximating the original $\ell_0$-constrained mean-CVaR model with arbitrary accuracy. The core idea is to convert the $\ell_0$ constraint into an indicator function and subsequently handle it through a tailed approximation. We then propose a proximal alternating linearized minimization algorithm, coupled with a nested fixed-point proximity algorithm (both convergent), to iteratively solve the model. Autonomy in sparsity refers to retaining a significant portion of assets within the selected asset pool during adjustments in pool size. Consequently, our framework offers a theoretically guaranteed approximation of the $\ell_0$-constrained mean-CVaR model, improving computational efficiency while providing a robust asset selection scheme.
Submitted 13 May, 2024;
originally announced May 2024.
-
Globally Optimal Solutions to a Class of Fractional Optimization Problems Based on Proximal Gradient Algorithm
Authors:
Yizun Lin,
Jian-Feng Cai,
Zhao-Rong Lai,
Cheng Li
Abstract:
In this paper, we investigate a category of constrained fractional optimization problems that emerge in various practical applications. The objective function for this category is characterized by the ratio of a numerator and denominator, both being convex, semi-algebraic, Lipschitz continuous, and differentiable with Lipschitz continuous gradients over the constraint sets. The constrained sets associated with these problems are closed, convex, and semi-algebraic. We propose an efficient algorithm that is inspired by the proximal gradient method, and we provide a thorough convergence analysis. Our algorithm offers several benefits compared to existing methods. It requires only a single proximal gradient operation per iteration, thus avoiding the complicated inner-loop concave maximization usually required. Additionally, our method converges to a critical point without the typical need for a nonnegative numerator, and this critical point becomes a globally optimal solution with an appropriate condition. Our approach is adaptable to unbounded constraint sets as well. Therefore, our approach is viable for many more practical models. Numerical experiments show that our method not only reliably reaches ground-truth solutions in some model problems but also outperforms several existing methods in maximizing the Sharpe ratio with real-world financial data.
Submitted 15 May, 2024; v1 submitted 20 June, 2023;
originally announced June 2023.
-
Abstract Visual Reasoning: An Algebraic Approach for Solving Raven's Progressive Matrices
Authors:
Jingyi Xu,
Tushar Vaidya,
Yufei Wu,
Saket Chandra,
Zhangsheng Lai,
Kai Fong Ernest Chong
Abstract:
We introduce algebraic machine reasoning, a new reasoning framework that is well-suited for abstract reasoning. Effectively, algebraic machine reasoning reduces the difficult process of novel problem-solving to routine algebraic computation. The fundamental algebraic objects of interest are the ideals of some suitably initialized polynomial ring. We shall explain how solving Raven's Progressive Matrices (RPMs) can be realized as computational problems in algebra, which combine various well-known algebraic subroutines, including computing the Gröbner basis of an ideal and checking for ideal containment. Crucially, the additional algebraic structure satisfied by ideals allows for more operations on ideals beyond set-theoretic operations.
Our algebraic machine reasoning framework is not only able to select the correct answer from a given answer set, but also able to generate the correct answer with only the question matrix given. Experiments on the I-RAVEN dataset yield an overall $93.2\%$ accuracy, which significantly outperforms the current state-of-the-art accuracy of $77.0\%$ and exceeds human performance at $84.4\%$ accuracy.
Submitted 21 March, 2023;
originally announced March 2023.
-
Towards a Multimodal Charging Network: Joint Planning of Charging Stations and Battery Swapping Stations for Electrified Ride-Hailing Fleets
Authors:
Zhijie Lai,
Sen Li
Abstract:
This paper considers a multimodal charging network in which charging stations and battery swapping stations are jointly built to support an electric ride-hailing fleet synergistically. Our argument is based on the observation that charging an EV is a time-consuming burden, and battery swapping faces scaling issues due to its deployment costs. However, charging stations are cost-effective, making them ideal for scaling up EV fleets, while battery swapping stations offer quick turnaround and can be deployed in tandem with charging stations to improve fleet utilization and reduce operational costs. To fulfill this vision, we consider a ride-hailing platform that jointly builds charging and battery swapping stations to support an EV fleet. An optimization model is proposed to capture the platform's planning and operational decisions. In particular, the model incorporates essential components such as elastic passenger demand, spatial charging equilibrium, charging and swapping congestion, etc. The overall problem is formulated as a nonconcave program. Instead of pursuing the globally optimal solution, we establish a tight upper bound through relaxation and decomposition, allowing us to evaluate the solution optimality even in the absence of concavity. Through case studies for Manhattan, New York City, we find that joint planning of charging and battery swapping stations outperforms deploying only one of them, yielding a total profit that is 11.7% higher than swapping-only deployment under a limited budget, and 17.5% higher than charging-only deployment under a sufficient budget. These results underscore the complementary benefit between charging and battery swapping facilities.
Submitted 1 April, 2024; v1 submitted 24 December, 2022;
originally announced December 2022.
-
Simpler flag optimization
Authors:
Zehua Lai,
Lek-Heng Lim,
Ke Ye
Abstract:
We study the geometry of flag manifolds under different embeddings into a product of Grassmannians. We show that differential geometric objects and operations -- tangent vector, metric, normal vector, exponential map, geodesic, parallel transport, gradient, Hessian, etc -- have closed-form analytic expressions that are computable with standard numerical linear algebra. Furthermore, we are able to derive a coordinate descent method in the flag manifold that performs well compared to other gradient descent methods.
Submitted 30 November, 2022;
originally announced December 2022.
-
Haagerup bound for quaternionic Grothendieck inequality
Authors:
Shmuel Friedland,
Zehua Lai,
Lek-Heng Lim
Abstract:
We present here several versions of the Grothendieck inequality over the skew field of quaternions: the first one is the standard Grothendieck inequality for rectangular matrices, and two additional inequalities are for self-adjoint matrices, as introduced by the first and the last authors in a recent paper. We give several results on the ``conic Grothendieck inequality'', such as the Nesterov $\pi/2$-Theorem, which corresponds to the cones of positive semidefinite matrices; the Goemans--Williamson inequality, which corresponds to the cones of weighted Laplacians; and the case of diagonally dominant matrices. The most challenging technical part of this paper is the proof of the analog of Haagerup's result that the inverse of the hypergeometric function $x {}_2F_1(\frac{1}{2}, \frac{1}{2}; 3; x^2)$ has a positive first Taylor coefficient and all other Taylor coefficients nonpositive.
Submitted 30 November, 2022;
originally announced December 2022.
-
Stochastic Steffensen method
Authors:
Minda Zhao,
Zehua Lai,
Lek-Heng Lim
Abstract:
Is it possible for a first-order method, i.e., only first derivatives allowed, to be quadratically convergent? For univariate loss functions, the answer is yes -- the Steffensen method avoids second derivatives and is still quadratically convergent like Newton's method. By incorporating an optimal step size we can even push its convergence order beyond quadratic to $1+\sqrt{2} \approx 2.414$. While such high convergence orders are pointless overkill for a deterministic algorithm, they become rewarding when the algorithm is randomized for problems of massive sizes, as randomization invariably compromises convergence speed. We will introduce two adaptive learning rates inspired by the Steffensen method, intended for use in a stochastic optimization setting and requiring no hyperparameter tuning aside from batch size. Extensive experiments show that they compare favorably with several existing first-order methods. When restricted to a quadratic objective, our stochastic Steffensen methods reduce to the randomized Kaczmarz method -- note that this is not true for SGD or SLBFGS -- and thus we may also view our methods as a generalization of randomized Kaczmarz to arbitrary objectives.
Submitted 28 November, 2022;
originally announced November 2022.
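
As a rough illustration of the deterministic starting point mentioned in the abstract above, the following is a minimal sketch of the classical univariate Steffensen iteration applied to the derivative $g = f'$ of a loss $f$. It is not the stochastic method or the adaptive learning rates proposed in the paper. The function name `steffensen_minimize`, the tolerance, and the test objective $f(x) = e^x - 2x$ are all assumptions chosen for illustration.

```python
import math

def steffensen_minimize(grad, x0, tol=1e-12, max_iter=50):
    """Classical Steffensen iteration for solving grad(x) = 0.

    Only first derivatives of the loss are evaluated: the second
    derivative is replaced by the divided difference
    (grad(x + grad(x)) - grad(x)) / grad(x).
    """
    x = x0
    for _ in range(max_iter):
        g = grad(x)
        if abs(g) < tol:
            break  # gradient is numerically zero, stop
        denom = grad(x + g) - g
        if denom == 0.0:
            break  # divided difference degenerate, cannot continue
        x = x - g * g / denom
    return x

# Hypothetical test objective: f(x) = exp(x) - 2x, so f'(x) = exp(x) - 2,
# and the unique minimizer is x = ln 2.
x_star = steffensen_minimize(lambda x: math.exp(x) - 2.0, x0=1.0)
print(x_star, math.log(2.0))
```

The update needs two gradient evaluations per step and no Hessian, which is the sense in which the abstract calls it a first-order method.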
-
Spatiotemporal Pricing and Fleet Management of Autonomous Mobility-on-Demand Networks: A Decomposition and Dynamic Programming Approach with Bounded Optimality Gap
Authors:
Zhijie Lai,
Sen Li
Abstract:
This paper studies spatiotemporal pricing and fleet management for autonomous mobility-on-demand (AMoD) systems while taking elastic demand into account. We consider a platform that offers ride-hailing services using a fleet of autonomous vehicles and makes pricing, rebalancing, and fleet sizing decisions in response to demand fluctuations. A network flow model is developed to characterize the evolution of system states over space and time, which captures the vehicle-passenger matching process and demand elasticity with respect to price and waiting time. The platform's objective of maximizing profit is formulated as a constrained optimal control problem, which is highly nonconvex due to the nonlinear demand model and complex supply-demand interdependence. To address this challenge, an integrated decomposition and dynamic programming approach is proposed, where we first relax the problem through a change of variable, then separate the relaxed problem into a few small-scale subproblems via dual decomposition, and finally solve each subproblem using dynamic programming. Despite the nonconvexity, our approach establishes a theoretical upper bound to evaluate the solution optimality. The proposed model and methodology are validated in numerical studies for Manhattan. We find that compared to the benchmark case, the proposed upper bound is significantly tighter. We also find that compared to pricing alone, joint pricing and fleet rebalancing can only offer a minor profit improvement when demand can be accurately predicted. However, during unanticipated demand surges, joint pricing and rebalancing can lead to substantially improved profits, and the impacts of demand shocks, despite being more widespread, can dissipate faster.
Submitted 5 December, 2023; v1 submitted 7 June, 2022;
originally announced June 2022.
-
Riemannian Interior Point Methods for Constrained Optimization on Manifolds
Authors:
Zhijian Lai,
Akiko Yoshise
Abstract:
We extend the classical primal-dual interior point method from the Euclidean setting to the Riemannian one. Our method, named the Riemannian interior point method, is for solving Riemannian constrained optimization problems. We establish its local superlinear and quadratic convergence under the standard assumptions. Moreover, we show its global convergence when it is combined with a classical line search. Our method is a generalization of the classical framework of primal-dual interior point methods for nonlinear nonconvex programming. Numerical experiments show the stability and efficiency of our method.
Submitted 7 February, 2024; v1 submitted 18 March, 2022;
originally announced March 2022.
-
Regulating Transportation Network Companies with a Mixture of Autonomous Vehicles and For-Hire Human Drivers
Authors:
Di Ao,
Jing Gao,
Zhijie Lai,
Sen Li
Abstract:
This paper investigates the equity impacts of autonomous vehicles (AVs) on for-hire human drivers and passengers in a ride-hailing market, and examines regulation policies that protect human drivers and improve transport equity for ride-hailing passengers. We consider a transportation network company (TNC) that employs a mixture of AVs and human drivers to provide ride-hailing services. The TNC platform determines the spatial prices, fleet size, human driver payments, and vehicle relocation strategies to maximize its profit, while individual passengers choose between different transport modes to minimize their travel costs. A market equilibrium model is proposed to capture the interactions among passengers, human drivers, AVs, and the TNC over the transportation network. The overall problem is formulated as a non-concave program, and an algorithm is developed to derive its approximate solution with a theoretical performance guarantee. Our study shows that the TNC prioritizes AV deployment in higher-demand areas to make a higher profit. As AVs flood into these higher-demand areas, they compete with human drivers in the urban core and push them to relocate to suburbs. This leads to reduced earning opportunities for human drivers and increased spatial inequity for passengers. To mitigate these concerns, we consider: (a) a minimum wage for human drivers; and (b) a restrictive pickup policy that prohibits AVs from picking up passengers in higher-demand areas. In the former case, we show that a minimum wage for human drivers will protect them from the negative impact of AVs with negligible impacts on passengers. However, there exists a threshold beyond which the minimum wage will trigger the platform to replace the majority of human drivers with AVs.
Submitted 17 December, 2023; v1 submitted 14 December, 2021;
originally announced December 2021.
-
Completely Positive Factorization by a Riemannian Smoothing Method
Authors:
Zhijian Lai,
Akiko Yoshise
Abstract:
Copositive optimization is a special case of convex conic programming, and it consists of optimizing a linear function over the cone of all completely positive matrices under linear constraints. Copositive optimization provides powerful relaxations of NP-hard quadratic problems or combinatorial problems, but there are still many open problems regarding copositive or completely positive matrices. In this paper, we focus on one such problem: finding a completely positive (CP) factorization for a given completely positive matrix. We treat it as a nonsmooth Riemannian optimization problem, i.e., a minimization problem of a nonsmooth function over a Riemannian manifold. To solve this problem, we present a general smoothing framework for solving nonsmooth Riemannian optimization problems and show convergence to a stationary point of the original problem. An advantage is that we can implement it quickly with minimal effort by directly using the existing standard smooth Riemannian solvers, such as Manopt. Numerical experiments show the efficiency of our method, especially for large-scale CP factorizations.
Submitted 4 October, 2022; v1 submitted 4 July, 2021;
originally announced July 2021.
-
On-Demand Valet Charging for Electric Vehicles: Economic Equilibrium, Infrastructure Planning and Regulatory Incentives
Authors:
Zhijie Lai,
Sen Li
Abstract:
Many city residents cannot install their private electric vehicle (EV) chargers due to the lack of dedicated parking spaces or insufficient grid capacity. This presents a significant barrier towards large-scale EV adoption. To address this concern, this paper considers a novel business model, on-demand valet charging, that unlocks the potential of under-utilized public charging infrastructure to promise higher EV penetration. In the proposed model, a platform recruits a fleet of couriers that shuttle between customers and public charging stations to provide on-demand valet charging services to EV owners at an affordable price. Couriers are dispatched to pick up low-battery EVs from customers, deliver the EVs to charging stations, plug them in, and then return the fully-charged EVs to customers. To depict the proposed business model, we develop a queuing network to represent the stochastic matching dynamics, and further formulate an economic equilibrium model to capture the incentives of couriers, customers as well as the platform. These models are used to examine how charging infrastructure planning and regulatory intervention will affect the market outcome. First, we find that the optimal charging station densities for distinct stakeholders are different: couriers prefer a lower density; the platform prefers a higher density; while the density in-between leads to the highest EV penetration as it balances the time traveling to and queuing at charging stations. Second, we evaluate a regulatory policy that imposes a tax on the platform and invests the tax revenue in public charging infrastructure. Numerical results suggest that this regulation can suppress the platform's market power associated with monopoly pricing, increase social welfare, and facilitate the market expansion.
Submitted 9 November, 2021; v1 submitted 15 June, 2021;
originally announced June 2021.
-
Online Statistical Inference for Stochastic Optimization via Kiefer-Wolfowitz Methods
Authors:
Xi Chen,
Zehua Lai,
He Li,
Yichen Zhang
Abstract:
This paper investigates the problem of online statistical inference of model parameters in stochastic optimization problems via the Kiefer-Wolfowitz algorithm with random search directions. We first present the asymptotic distribution for the Polyak-Ruppert-averaging type Kiefer-Wolfowitz (AKW) estimators, whose asymptotic covariance matrices depend on the distribution of search directions and the function-value query complexity. The distributional result reflects the trade-off between statistical efficiency and function query complexity. We further analyze the choice of random search directions to minimize certain summary statistics of the asymptotic covariance matrix. Based on the asymptotic distribution, we conduct online statistical inference by providing two construction procedures of valid confidence intervals.
Submitted 9 December, 2023; v1 submitted 5 February, 2021;
originally announced February 2021.
-
Simpler Grassmannian optimization
Authors:
Zehua Lai,
Lek-Heng Lim,
Ke Ye
Abstract:
There are two widely used models for the Grassmannian $\operatorname{Gr}(k,n)$, as the set of equivalence classes of orthogonal matrices $\operatorname{O}(n)/(\operatorname{O}(k) \times \operatorname{O}(n-k))$, and as the set of trace-$k$ projection matrices $\{P \in \mathbb{R}^{n \times n} : P^{\mathsf{T}} = P = P^2,\; \operatorname{tr}(P) = k\}$. The former, standard in manifold optimization, has the advantage of giving numerically stable algorithms but the disadvantage of having to work with equivalence classes of matrices. The latter, widely used in coding theory and probability, has the advantage of using actual matrices (as opposed to equivalence classes) but working with projection matrices is numerically unstable. We present an alternative that has both advantages and suffers from neither of the disadvantages; by representing $k$-dimensional subspaces as symmetric orthogonal matrices of trace $2k-n$, we obtain \[ \operatorname{Gr}(k,n) \cong \{Q \in \operatorname{O}(n) : Q^{\mathsf{T}} = Q, \; \operatorname{tr}(Q) = 2k -n\}. \] As with the other two models, we show that differential geometric objects and operations -- tangent vector, metric, normal vector, exponential map, geodesic, parallel transport, gradient, Hessian, etc -- have closed-form analytic expressions that are computable with standard numerical linear algebra. In the proposed model, these expressions are considerably simpler, a result of representing $\operatorname{Gr}(k,n)$ as a linear section of a compact matrix Lie group $\operatorname{O}(n)$, and can be computed with at most one QR decomposition and one exponential of a special skew-symmetric matrix that takes only $O(nk(n-k))$ time. In particular, we completely avoid eigen- and singular value decompositions in our steepest descent, conjugate gradient, quasi-Newton, and Newton methods for the Grassmannian.
Submitted 28 September, 2020;
originally announced September 2020.
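
The two matrix models named in the abstract above can be related by a small numerical check. The sketch below assumes the standard correspondence $Q = 2P - I$ between a trace-$k$ projection matrix $P$ and a symmetric orthogonal matrix $Q$ of trace $2k - n$; the abstract itself does not spell out this map, so treat it as an assumption, along with the helper names and the hand-picked subspace of $\mathbb{R}^3$.

```python
def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def projection_from_basis(basis):
    """Orthogonal projector P = sum_i u_i u_i^T for orthonormal vectors u_i."""
    n = len(basis[0])
    return [[sum(u[i] * u[j] for u in basis) for j in range(n)]
            for i in range(n)]

def involution_from_projection(p):
    """Assumed correspondence Q = 2P - I."""
    n = len(p)
    return [[2.0 * p[i][j] - (1.0 if i == j else 0.0) for j in range(n)]
            for i in range(n)]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

# Hypothetical example: a 2-dimensional subspace of R^3 (k = 2, n = 3)
# spanned by two orthonormal vectors.
basis = [[1.0, 0.0, 0.0], [0.0, 0.6, 0.8]]
k, n = len(basis), len(basis[0])

P = projection_from_basis(basis)
Q = involution_from_projection(P)
QQ = matmul(Q, Q)  # Q is symmetric, so Q^T Q = Q Q

print("tr(P) =", trace(P), " tr(Q) =", trace(Q), " 2k - n =", 2 * k - n)
```

Since $P^2 = P$, one has $Q^2 = 4P - 4P + I = I$, so `QQ` should agree with the identity up to rounding, and `trace(Q)` should agree with $2k - n$.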
-
Recht-Ré Noncommutative Arithmetic-Geometric Mean Conjecture is False
Authors:
Zehua Lai,
Lek-Heng Lim
Abstract:
Stochastic optimization algorithms have become indispensable in modern machine learning. An unresolved foundational question in this area is the difference between with-replacement sampling and without-replacement sampling -- does the latter have superior convergence rate compared to the former? A groundbreaking result of Recht and Ré reduces the problem to a noncommutative analogue of the arithmetic-geometric mean inequality where $n$ positive numbers are replaced by $n$ positive definite matrices. If this inequality holds for all $n$, then without-replacement sampling indeed outperforms with-replacement sampling. The conjectured Recht-Ré inequality has so far only been established for $n = 2$ and a special case of $n = 3$. We will show that the Recht-Ré conjecture is false for general $n$. Our approach relies on the noncommutative Positivstellensatz, which allows us to reduce the conjectured inequality to a semidefinite program and the validity of the conjecture to certain bounds for the optimum values, which we show are false as soon as $n = 5$.
Submitted 2 June, 2020;
originally announced June 2020.