Local convergence of tensor methods

Doikov, Nikita; Nesterov, Yurii

doi:10.1007/s10107-020-01606-x


Full Length Paper
Series A
Open access
Published: 04 January 2021

Volume 193, pages 315–336, (2022)

Mathematical Programming


Abstract

In this paper, we study local convergence of high-order Tensor Methods for solving convex optimization problems with composite objective. We justify local superlinear convergence under the assumption of uniform convexity of the smooth component, having Lipschitz-continuous high-order derivative. The convergence both in function value and in the norm of minimal subgradient is established. Global complexity bounds for the Composite Tensor Method in convex and uniformly convex cases are also discussed. Lastly, we show how local convergence of the methods can be globalized using the inexact proximal iterations.


1 Introduction

Motivation In Nonlinear Optimization, it seems to be a natural idea to increase the performance of numerical methods by employing high-order oracles. However, the main obstacle to this approach is the prohibitive complexity of the corresponding Taylor approximations formed by high-order multidimensional polynomials, which are difficult to store, handle, and minimize. If we go just one step above the commonly used quadratic approximation, we get a multidimensional polynomial of degree three, which is never convex. Consequently, its usefulness for optimization methods is questionable.

However, recently in [18] it was shown that the Taylor polynomials of convex functions have a very interesting structure. It appears that their augmentation by a power of the Euclidean norm with a reasonably big coefficient gives us a global upper convex model of the objective function, which keeps all advantages of the local high-order approximation.

One of the classical and well-known results in Nonlinear Optimization is related to the local quadratic convergence of Newton’s method [13, 19]. Later on, it was generalized to the case of composite optimization problems [14], where the objective is represented as a sum of two convex components: smooth, and possibly nonsmooth but simple. Local superlinear convergence of the Incremental Newton method for finite-sum minimization problems was established in [24].

The study of high-order numerical methods for solving nonlinear equations dates back to the work of Chebyshev in 1838, where the scalar methods of order three and four were proposed [2]. The methods of arbitrary order for solving nonlinear equations were studied in [6].

A big step in the second-order optimization theory was made in [22], where Cubic regularization of the Newton method with its global complexity estimates was proposed. Additionally, the local superlinear convergence was justified. See also [1] for the local analysis of the Adaptive cubic regularization methods.

This paper aims to study the local convergence of high-order methods, generalizing the corresponding results from [22] in several ways. We establish local superlinear convergence of the Tensor Method [18] of degree $p \ge 2$ in the case when the objective is composite and its smooth part is uniformly convex of arbitrary degree q from the interval $2 \le q < p + 1$. For strongly convex functions ($q=2$), this gives local convergence of degree p.

Contents We formulate our problem of interest and define a step of the Regularized Composite Tensor Method in Sect. 2. Then, we state some of its properties, which are required for our analysis.

In Sect. 3, we prove local superlinear convergence of the Tensor Method in function value, and in the norm of minimal subgradient, under the assumption of uniform convexity of the objective.

In Sect. 4, we discuss global behavior of the method and justify sublinear and linear global rates of convergence for convex and uniformly convex cases, respectively.

One application of our developments is provided in Sect. 5. We show how local convergence can be applied for computing an inexact step in proximal methods. A global sublinear rate of convergence for the resulting scheme is also given.

Notations and generalities In what follows, we denote by $\mathbb {E}$ a finite-dimensional real vector space, and by $\mathbb {E}^*$ its dual space composed of linear functions on $\mathbb {E}$. For such a function $s \in \mathbb {E}^*$, we denote by $\langle s, x \rangle $ its value at $x \in \mathbb {E}$. Using a self-adjoint positive-definite operator $B: \mathbb {E}\rightarrow \mathbb {E}^*$ (notation $B = B^* \succ 0$), we can endow these spaces with mutually conjugate Euclidean norms:

$$\begin{aligned} \Vert x \Vert= & {} \langle B x, x \rangle ^{1/2}, \quad x \in \mathbb {E}, \quad \Vert g \Vert _* \; = \; \langle g, B^{-1} g \rangle ^{1/2}, \quad g \in \mathbb {E}^*. \end{aligned}$$

For a smooth function $f: \mathrm{dom} \,f \rightarrow \mathbb {R}$ with convex and open domain $\mathrm{dom} \,f \subseteq \mathbb {E}$, denote by $\nabla f(x)$ its gradient, and by $\nabla ^2 f(x)$ its Hessian evaluated at point $x \in \mathrm{dom} \,f \subseteq \mathbb {E}$. Note that

$$\begin{aligned} \nabla f(x)\in & {} \mathbb {E}^*, \quad \nabla ^2 f(x) h \; \in \; \mathbb {E}^*, \quad x \in \mathrm{dom} \,f, \; h \in \mathbb {E}. \end{aligned}$$

For a non-differentiable convex function $f(\cdot )$, we denote by $\partial f(x) \subset \mathbb {E}^*$ its subdifferential at the point $x \in \mathrm{dom} \,f$.

In what follows, we often work with directional derivatives. For $p \ge 1$, denote by

$$\begin{aligned} D^p f(x)[h_1, \dots , h_p] \end{aligned}$$

the directional derivative of function f at x along directions $h_i \in \mathbb {E}$, $i = 1, \dots , p$. If all directions $h_1, \dots , h_p$ are the same, we apply a simpler notation

$$\begin{aligned} D^p f(x)[h]^p, \quad h \in \mathbb {E}. \end{aligned}$$

Note that $D^p f(x)[ \cdot ]$ is a symmetric p-linear form. Its norm is defined in the standard way:

$$\begin{aligned} \Vert D^pf(x) \Vert= & {} \max \limits _{h_1, \dots , h_p \in \mathbb {E}} \left\{ D^p f(x)[h_1, \dots , h_p ]: \; \Vert h_i \Vert \le 1, \, i = 1, \dots , p \right\} \nonumber \\= & {} \max \limits _{h \in \mathbb {E}} \left\{ \Big | D^p f(x)[h]^p\Big |: \; \Vert h \Vert \le 1 \right\} \end{aligned}$$

(1.1)

(for the last equation see, for example, Appendix 1 in [21]). Similarly, we define

$$\begin{aligned} \begin{array}{rcl} \Vert D^pf(x) - D^pf(y) \Vert= & {} \max \limits _{h \in \mathbb {E}} \left\{ \Big | D^p f(x)[h]^p - D^pf(y)[h]^p\Big |: \; \Vert h \Vert \le 1 \right\} . \end{array} \end{aligned}$$

(1.2)

In particular, for any $x \in \mathrm{dom} \,f$ and $h_1, h_2 \in \mathbb {E}$, we have

$$\begin{aligned} Df(x)[h_1]= & {} \langle \nabla f(x), h_1 \rangle , \quad D^2f(x)[h_1, h_2] \; = \; \langle \nabla ^2 f(x) h_1, h_2 \rangle . \end{aligned}$$

Thus, for the Hessian, our definition corresponds to the spectral norm of the self-adjoint linear operator (the maximal absolute value of its eigenvalues computed with respect to $B \succ 0$).

Finally, the Taylor approximation of function $f(\cdot )$ at $x \in \mathrm{dom} \,f$ is defined as follows:

$$\begin{aligned}&f(x+h) = \varOmega _p(f, x; x + h) + o(\Vert h\Vert ^p), \quad x+h \in \mathrm{dom} \,f,\\&\quad \varOmega _p(f,x;y) \; {\mathop {=}\limits ^{\mathrm {def}}}\; f(x) + \sum \limits _{k=1}^p {1 \over k!} D^k f(x)[y-x]^k, \quad y \in \mathbb {E}. \end{aligned}$$

Consequently, for all $y \in \mathbb {E}$ we have

$$\begin{aligned} \nabla \varOmega _p(f,x;y)= & {} \sum \limits _{k=1}^p {1 \over (k-1)!} D^k f(x)[y-x]^{k-1}, \end{aligned}$$

(1.3)

$$\begin{aligned} \nabla ^2 \varOmega _p(f,x;y)= & {} \sum \limits _{k=2}^p {1 \over (k-2)!} D^k f(x)[y-x]^{k-2}. \end{aligned}$$

(1.4)

2 Main inequalities

In this paper, we consider the following composite convex minimization problem

$$\begin{aligned} \min \limits _{x \in \mathrm{dom} \,h} \Big \{ F(x) = f(x) + h(x) \Big \}, \end{aligned}$$

(2.1)

where $h: \mathbb {E}\rightarrow \mathbb {R}\cup \{+\infty \}$ is a simple proper closed convex function and $f \in C^{p,p}(\mathrm{dom} \,h)$ for a certain $p \ge 2$. In other words, we assume that the pth derivative of function f is Lipschitz continuous:

$$\begin{aligned} \begin{array}{rcl} \Vert D^p f(x) - D^p f(y) \Vert\le & {} L_p \Vert x - y \Vert , \quad x, y \in \mathrm{dom} \,h. \end{array} \end{aligned}$$

(2.2)

Assuming that $L_{p} < +\infty $, by the standard integration arguments we can bound the residual between function value and its Taylor approximation:

$$\begin{aligned} \begin{array}{rcl} | f(y) - \varOmega _p(f,x;y) |\le & {} {L_{p} \over (p+1)!} \Vert y - x \Vert ^{p+1}, \quad x, y \in \mathrm{dom} \,h. \end{array} \end{aligned}$$

(2.3)

Applying the same reasoning to functions $\langle \nabla f(\cdot ), h \rangle $ and $\langle \nabla ^2 f(\cdot ) h, h \rangle $ with direction $h \in \mathbb {E}$ being fixed, we get the following guarantees:

$$\begin{aligned} \Vert \nabla f(y) - \nabla \varOmega _p(f,x;y) \Vert _*\le & {} {L_p \over p!} \Vert y - x \Vert ^{p}, \end{aligned}$$

(2.4)

$$\begin{aligned} \Vert \nabla ^2 f(y) - \nabla ^2 \varOmega _p(f,x;y) \Vert\le & {} {L_p \over (p-1)!} \Vert y - x \Vert ^{p-1}, \end{aligned}$$

(2.5)

which are valid for all $x, y \in \mathrm{dom} \,h$.
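As a quick numerical sanity check of (2.3) and (2.4), the sketch below uses the univariate function $f(t) = \sin t$. This choice is ours and purely illustrative: the function is not convex, but each of its derivatives is Lipschitz continuous with constant $L_p = 1$, which is all that these two bounds require. The helper names `taylor`, `taylor_grad` and `check_bounds` are hypothetical.

```python
import math

def sin_derivative(k, x):
    # k-th derivative of sin at x equals sin(x + k*pi/2).
    return math.sin(x + k * math.pi / 2)

def taylor(p, x, y):
    # Omega_p(f, x; y) for f = sin: sum_{k=0}^p D^k f(x) (y - x)^k / k!.
    return sum(sin_derivative(k, x) * (y - x) ** k / math.factorial(k)
               for k in range(p + 1))

def taylor_grad(p, x, y):
    # Derivative of Omega_p with respect to y, cf. (1.3).
    return sum(sin_derivative(k, x) * (y - x) ** (k - 1) / math.factorial(k - 1)
               for k in range(1, p + 1))

def check_bounds(p, x, y, L_p=1.0, tol=1e-12):
    # Returns (value bound (2.3) holds, gradient bound (2.4) holds).
    r = abs(y - x)
    value_gap = abs(math.sin(y) - taylor(p, x, y))
    grad_gap = abs(math.cos(y) - taylor_grad(p, x, y))
    ok_value = value_gap <= L_p * r ** (p + 1) / math.factorial(p + 1) + tol
    ok_grad = grad_gap <= L_p * r ** p / math.factorial(p) + tol
    return ok_value, ok_grad
```

The small tolerance `tol` only absorbs floating-point rounding; it is not part of the inequalities themselves.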

Let us define now one step of the Regularized Composite Tensor Method (RCTM) of degree $p \ge 2$:

$$\begin{aligned} \begin{array}{rcl} T\equiv & {} T_H(x) \; {\mathop {=}\limits ^{\mathrm {def}}}\; \arg \min \limits _{y \in \mathbb {E}} \left\{ \varOmega _p(f,x;y) + {H \over (p+1)!} \Vert y - x \Vert ^{p+1} + h(y) \right\} . \end{array} \end{aligned}$$

(2.6)

It can be shown that for

$$\begin{aligned} H \ge p L_p \end{aligned}$$

(2.7)

the auxiliary optimization problem in (2.6) is convex (see Theorem 1 in [18]). This condition is crucial for implementability of our methods and we always assume it to be satisfied.
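To make the step (2.6) concrete, consider the simplest setting: $p = 2$, $h \equiv 0$, and a one-dimensional space with $B = 1$. Then the auxiliary problem has a closed-form solution, sketched below. The test function $f(t) = t^2/2 + \log \cosh t$, the bound $L_2 = 1$ on its third derivative, and the name `cubic_newton_step` are assumptions of this illustration and do not come from the paper.

```python
import math

def cubic_newton_step(grad, hess, x, H):
    # One step (2.6) with p = 2 and h = 0 in one dimension:
    #   minimize over s:  g*s + c*s^2/2 + H*|s|^3/6,  g = f'(x), c = f''(x) >= 0.
    g, c = grad(x), hess(x)
    if g == 0.0:
        return x
    # Stationarity gives (H/2) r^2 + c r - |g| = 0 for r = |s|.
    # The positive root is written in a cancellation-free form.
    r = 2.0 * abs(g) / (c + math.sqrt(c * c + 2.0 * H * abs(g)))
    return x - math.copysign(r, g)

# Illustrative objective (an assumption of this sketch).
def f(t):
    return 0.5 * t * t + math.log(math.cosh(t))

def f_grad(t):
    return t + math.tanh(t)

def f_hess(t):
    return 2.0 - math.tanh(t) ** 2   # equals 1 + 1/cosh(t)^2

L2 = 1.0        # upper bound on |f'''|, whose maximum is about 0.77
H = 2.0 * L2    # H = p * L_p with p = 2
x = 1.0
T = cubic_newton_step(f_grad, f_hess, x, H)
```

In higher dimensions or for $p \ge 3$ no such closed form is available in general, and the auxiliary problem has to be solved by an inner convex optimization routine.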

Let us write down the first-order optimality condition for the auxiliary optimization problem in (2.6):

$$\begin{aligned} \begin{array}{rcl} \langle \nabla \varOmega _p(f,x;T) + {H \over p!} \Vert T - x \Vert ^{p-1}B(T-x), y - T \rangle + h(y)\ge & {} h(T), \end{array} \end{aligned}$$

(2.8)

for all $y \in \mathrm{dom} \,h$. In other words, for vector

$$\begin{aligned} h'(T) \; {\mathop {=}\limits ^{\mathrm {def}}}\; - \left( \nabla \varOmega _p(f,x;T) + {H \over p!} \Vert T - x \Vert ^{p-1}B(T-x) \right) \end{aligned}$$

(2.9)

we have $h'(T) {\mathop {\in }\limits ^{(2.8)}} \partial h(T)$. This fact explains our notation

$$\begin{aligned} F'(T) \; {\mathop {=}\limits ^{\mathrm {def}}}\; \nabla f(T) + h'(T) \; \in \partial F(T). \end{aligned}$$

(2.10)

Let us present some properties of the point $T = T_H(x)$. First of all, we need some bounds for the norm of vector $F'(T)$. Note that

$$\begin{aligned} \Big \Vert F'(T) + {H \over p!} \Vert T - x \Vert ^{p-1}B(T-x) \Big \Vert _*&{{\mathop {=}\limits ^{(2.9)}}}&\Big \Vert \nabla f(T) - \nabla \varOmega _p(f,x;T) \Big \Vert _* \nonumber \\&{{\mathop {\le }\limits ^{(2.4)}}}&{L_p \over p!} \Vert T - x \Vert ^p. \end{aligned}$$

(2.11)

Consequently,

$$\begin{aligned} \begin{array}{rcl} \Vert F'(T) \Vert _*\le & {} {L_p+H \over p!} \Vert T - x \Vert ^p. \end{array} \end{aligned}$$

(2.12)

Secondly, we use the following lemma.

Lemma 1

Let $\beta > 1$ and $H = \beta L_p$. Then

$$\begin{aligned} \begin{array}{rcl} \langle F'(T), x - T \rangle\ge & {} \left( {p! \over (p+1)L_p} \right) ^{1 \over p} \cdot \Vert F'(T) \Vert _*^{p+1 \over p} \cdot {(\beta ^2 - 1)^{p-1 \over 2p} \over \beta } \cdot {p \over (p^2-1)^{p-1 \over 2p}}. \end{array} \end{aligned}$$

(2.13)

In particular, if $\beta = p$, then

$$\begin{aligned} \begin{array}{rcl} \langle F'(T), x - T \rangle\ge & {} \left( {p! \over (p+1)L_p} \right) ^{1 \over p} \cdot \Vert F'(T) \Vert _*^{p+1 \over p}. \end{array} \end{aligned}$$

(2.14)

Proof

Denote $r = \Vert T - x \Vert $, $h = {H \over p!}$, and $l = {L_p \over p!}$. Then inequality (2.11) can be written as follows:

$$\begin{aligned} \Vert F'(T) + h r^{p-1} B(T-x) \Vert ^2_*\le & {} l^2 r^{2p}. \end{aligned}$$

This means that

$$\begin{aligned} \begin{array}{rcl} \langle F'(T), x - T \rangle\ge & {} {1 \over 2 h r^{p-1}} \Vert F'(T) \Vert _*^2 + {r^{2p} (h^2 - l^2) \over 2h r^{p-1}}. \end{array} \end{aligned}$$

(2.15)

Denote

$$\begin{aligned} a= & {} {1 \over 2h} \Vert F'(T) \Vert _*^2, \quad b \; = \; {h^2 - l^2 \over 2h}, \quad \tau \; = \; r^{p-1}, \quad \alpha \; = \; {p+1 \over p-1}. \end{aligned}$$

Then inequality (2.15) can be rewritten as follows:

$$\begin{aligned} \langle F'(T) , x - T \rangle\ge & {} {a \over \tau } + b \tau ^{\alpha } \; \ge \; \min \limits _{t > 0} \left\{ {a \over t} + b t^{\alpha } \right\} \; = \; (1+\alpha ) \left( {a \over \alpha } \right) ^{\alpha \over 1 + \alpha } b^{1 \over 1 + \alpha }. \end{aligned}$$

Taking into account that $1+\alpha = {2p \over p-1}$ and ${\alpha \over 1 + \alpha } = {p + 1 \over 2p}$, and using the actual meaning of a, b, and $\alpha $, we get

$$\begin{aligned} \langle F'(T), x - T \rangle\ge & {} {2 p \over p-1} \cdot { \Vert F'(T) \Vert _*^{p+1 \over p} \over (2h)^{p+1 \over 2p}} \cdot {(p-1)^{p+1 \over 2p} \over (p+1)^{p+1 \over 2p}} \cdot {(h^2 - l^2)^{p-1 \over 2p} \over (2h)^{p-1 \over 2p}}\\= & {} \Vert F'(T) \Vert _*^{p+1 \over p} \cdot {(h^2 - l^2)^{p-1 \over 2p} \over h} \cdot {p \over (p+1)^{p+1 \over 2p} (p-1)^{p-1 \over 2p}}\\= & {} \Vert F'(T) \Vert _*^{p+1 \over p} \cdot {(h^2 - l^2)^{p-1 \over 2p} \over h} \cdot {p \over (p^2-1)^{p-1 \over 2p} (p+1)^{1 \over p}}. \end{aligned}$$

It remains to note that

$$\begin{aligned} {(h^2 - l^2)^{p-1 \over 2p} \over h}= & {} {(H^2 - L_p^2)^{p-1 \over 2p} \over H} \cdot (p!)^{1 \over p} \; = \; {(\beta ^2 - 1)^{p-1 \over 2p} \over \beta } \cdot \left( {p! \over L_p} \right) ^{1 \over p}. \end{aligned}$$

$\square $
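The only non-routine step in this proof is the closed-form value of $\min _{t > 0} \{ a/t + b t^{\alpha } \}$. A minimal numerical check of that identity is sketched below; the function names and the sample values of a, b are hypothetical and chosen only for illustration.

```python
def phi(t, a, b, alpha):
    # The function being minimized over t > 0.
    return a / t + b * t ** alpha

def minimizer(a, b, alpha):
    # Root of phi'(t) = -a / t^2 + alpha * b * t^(alpha - 1).
    return (a / (alpha * b)) ** (1.0 / (1.0 + alpha))

def closed_form_min(a, b, alpha):
    # (1 + alpha) * (a / alpha)^(alpha / (1 + alpha)) * b^(1 / (1 + alpha))
    return ((1.0 + alpha) * (a / alpha) ** (alpha / (1.0 + alpha))
            * b ** (1.0 / (1.0 + alpha)))

# Example with p = 3, so that alpha = (p + 1) / (p - 1) = 2.
p = 3
alpha = (p + 1) / (p - 1)
a, b = 0.7, 1.9
t_star = minimizer(a, b, alpha)
gap = abs(phi(t_star, a, b, alpha) - closed_form_min(a, b, alpha))
```

Here `gap` should vanish up to rounding, and perturbing `t_star` in either direction can only increase `phi`.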

3 Local convergence

The main goal of this paper consists in analyzing the local behavior of the Regularized Composite Tensor Method (RCTM):

$$\begin{aligned} \begin{array}{rcl} x_0 \; \in \; \mathrm{dom} \,h, \quad x_{k+1}= & {} T_H(x_k), \quad k \ge 0, \end{array} \end{aligned}$$

(3.1)

as applied to the problem (2.1). In order to prove local superlinear convergence of this scheme, we need one more assumption.

Assumption 1

The objective in problem (2.1) is uniformly convex of degree $q \ge 2$. Thus, for all $x, y \in \mathrm{dom} \,h$ and for all $G_x \in \partial F(x), G_y \in \partial F(y)$, it holds:

$$\begin{aligned} \begin{array}{rcl} \langle G_x - G_y, x - y \rangle\ge & {} \sigma _q \Vert x - y \Vert ^q, \end{array} \end{aligned}$$

(3.2)

for certain $\sigma _q > 0$.

It is well known that this assumption guarantees the uniform convexity of the objective function (see, for example, Lemma 4.2.1 in [19]):

$$\begin{aligned} \begin{array}{rcl} F(y)\ge & {} F(x) + \langle G_x, y - x \rangle + {\sigma _q \over q} \Vert y - x \Vert ^q, \quad y \in \mathrm{dom} \,h, \end{array} \end{aligned}$$

(3.3)

where $G_x$ is an arbitrary subgradient from $\partial F(x)$. Therefore,

$$\begin{aligned} F^*= & {} \min \limits _{y \in \mathrm{dom} \,h} F(y) \; \ge \; \min \limits _{y \in \mathbb {E}} \left\{ F(x) + \langle G_x, y - x \rangle + {\sigma _q \over q} \Vert y - x \Vert ^q \right\} \nonumber \\= & {} F(x) - {q-1 \over q} \left( {1 \over \sigma _q} \right) ^{1 \over q-1} \Vert G_x \Vert _*^{q \over q-1}. \end{aligned}$$

(3.4)

This simple inequality gives us the following local convergence rate for RCTM.

Theorem 1

For any $k \ge 0$ we have

$$\begin{aligned} \begin{array}{rcl} F(x_{k+1}) - F^*\le & {} (q-1) q^{p-q+1 \over q-1} \bigl ({1 \over \sigma _q}\bigr )^{p+1 \over q-1} \left( {L_p + H \over p!} \right) ^{q \over q - 1} \big [F(x_k) - F^* \big ]^{p \over q-1}. \end{array} \end{aligned}$$

(3.5)

Proof

Indeed, for any $k \ge 0$ we have

$$\begin{aligned}&F(x_k) - F^* \ge F(x_k) - F(x_{k+1}) \\&\quad {{\mathop {\ge }\limits ^{(3.3)}}} \; \langle F'(x_{k+1}), x_k - x_{k+1} \rangle + {\sigma _q \over q} \Vert x_k - x_{k+1} \Vert ^q\\&\quad {{\mathop {\ge }\limits ^{(2.13)}}} \; {\sigma _q \over q} \Vert x_k - x_{k+1} \Vert ^q \; {{\mathop {\ge }\limits ^{(2.12)}}} \; {\sigma _q \over q} \left( {p! \over L_p+H} \Vert F'(x_{k+1}) \Vert _* \right) ^{q \over p}\\&\quad {{\mathop {\ge }\limits ^{(3.4)}}} \; {\sigma _q \over q} \left( {p! \over L_p+H}\right) ^{q \over p} \left( {q \, \sigma _q^{1 \over q-1} \over q-1} (F(x_{k+1})-F^*) \right) ^{q -1\over p}. \end{aligned}$$

And this is exactly inequality (3.5). $\square $

Corollary 1

If $p > q-1$, then method (3.1) has local superlinear rate of convergence for problem (2.1).

Proof

Indeed, in this case ${p \over q-1} > 1$. $\square $

For example, if $q = 2$ (strongly convex function) and $p=2$ (Cubic Regularization of the Newton Method), then the rate of convergence is quadratic. If $q=2$, and $p = 3$, then the local rate of convergence is cubic, etc.
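The quadratic rate for $p = q = 2$ can be observed on a toy problem. The sketch below runs the one-dimensional method with $h \equiv 0$ on $f(t) = t^2/2 + \log \cosh t$, for which $F^* = 0$, $\sigma _2 = 1$, and $L_2 = 1$ is a valid (not tight) Lipschitz constant. These choices, and the closed-form step used here, are assumptions of this illustration. With $H = 2 L_2$ the constant in (3.5) equals 4.5.

```python
import math

def f(t):
    # f(t) = t^2/2 + log(cosh(t)); log1p(2*sinh(t/2)^2) is accurate near t = 0.
    return 0.5 * t * t + math.log1p(2.0 * math.sinh(0.5 * t) ** 2)

def step(x, H):
    # Closed-form minimizer of the cubic model in one dimension (p = 2, h = 0).
    g = x + math.tanh(x)
    c = 2.0 - math.tanh(x) ** 2
    if g == 0.0:
        return x
    r = 2.0 * abs(g) / (c + math.sqrt(c * c + 2.0 * H * abs(g)))
    return x - math.copysign(r, g)

p, q, sigma_q, L_p = 2, 2, 1.0, 1.0
H = p * L_p
# Constant in front of [F(x_k) - F*]^(p/(q-1)) in (3.5).
const = ((q - 1) * q ** ((p - q + 1) / (q - 1))
         * (1.0 / sigma_q) ** ((p + 1) / (q - 1))
         * ((L_p + H) / math.factorial(p)) ** (q / (q - 1)))

x, residuals = 1.0, []
for _ in range(6):
    residuals.append(f(x))   # F(x_k) - F*, since F* = f(0) = 0
    x = step(x, H)

for k in range(len(residuals) - 1):
    print(k, residuals[k], residuals[k + 1] <= const * residuals[k] ** 2)
```

Only a few iterations are run on purpose: after that the residual falls below what double precision can resolve reliably.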

Let us study now the local convergence of the method (3.1) in terms of the norm of gradient. For any $x \in \mathrm{dom} \,h$ denote

$$\begin{aligned} \eta (x) \; {\mathop {=}\limits ^{\mathrm {def}}}\; \min \limits _{g \in \partial h(x)} \Vert \nabla f(x) + g \Vert _*. \end{aligned}$$

(3.6)

If $\partial h(x) = \emptyset $, we set $\eta (x) = +\infty $.

Theorem 2

For any $k \ge 0$ we have

$$\begin{aligned} \begin{array}{rcl} \eta (x_{k+1})&\, \le \,&\Vert F'(x_{k + 1}) \Vert _{*} \; \le \; {L_p + H \over p!} \left[ {1 \over \sigma _q} \, \eta (x_k) \right] ^{p \over q-1}. \end{array} \end{aligned}$$

(3.7)

Proof

Indeed, in view of inequality (3.2), we have

$$\begin{aligned} \langle \nabla f(x_k) + g_k, x_{k} - x_{k + 1} \rangle\ge & {} \langle F'(x_{k + 1}), x_k - x_{k + 1} \rangle + \sigma _q \Vert x_k - x_{k + 1}\Vert ^q \\&{{\mathop {\ge }\limits ^{(2.13)}}}&\sigma _q \Vert x_k - x_{k + 1}\Vert ^{q}, \end{aligned}$$

where $g_k$ is an arbitrary vector from $\partial h(x_k)$. Therefore, we conclude that

$$\begin{aligned} \eta (x_k)\ge & {} \sigma _q \Vert x_k - x_{k+1} \Vert ^{q-1}. \end{aligned}$$

It remains to use inequality (2.12). $\square $

As we can see, the condition for superlinear convergence of the method (3.1) in terms of the norm of the gradient is the same as in Corollary 1: we need to have ${p \over q-1} > 1$, that is $p > q-1$. Moreover, the local rate of convergence has the same order as that for the residual of the function value.

According to Theorem 1, the region of superlinear convergence of RCTM in terms of the function value is as follows:

$$\begin{aligned} \begin{array}{rcl} \mathcal {Q}= & {} \left\{ x \in \mathrm{dom} \,h: \; F(x) - F^* \; \le \; {1 \over q} \cdot \biggl ( { \sigma _q^{p + 1} \over (q - 1)^{q - 1} } \cdot \Bigl ( { p! \over L_p + H } \Bigr )^{q} \biggr )^{1 \over p - q + 1} \right\} . \end{array} \end{aligned}$$

(3.8)

Alternatively, by Theorem 2, in terms of the norm of minimal subgradient (3.6), the region of superlinear convergence looks as follows:

$$\begin{aligned} \begin{array}{rcl} \mathcal {G}= & {} \left\{ x \in \mathrm{dom} \,h: \; \eta (x) \; \le \; \biggl ( \sigma _q^{p} \cdot \Bigl ( { p! \over L_p + H } \Bigr )^{q - 1} \biggr )^{1 \over p - q + 1} \right\} . \end{array} \end{aligned}$$

(3.9)
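For given problem constants, the right-hand sides of (3.8) and (3.9) are straightforward to evaluate. The following helper functions are a minimal sketch with hypothetical names; they assume $p > q - 1$, so that the exponent $1/(p - q + 1)$ is well defined.

```python
import math

def region_Q_threshold(p, q, sigma_q, L_p, H):
    # Right-hand side of (3.8): bound on F(x) - F*.
    inner = (sigma_q ** (p + 1) / (q - 1) ** (q - 1)
             * (math.factorial(p) / (L_p + H)) ** q)
    return inner ** (1.0 / (p - q + 1)) / q

def region_G_threshold(p, q, sigma_q, L_p, H):
    # Right-hand side of (3.9): bound on the minimal subgradient norm eta(x).
    inner = sigma_q ** p * (math.factorial(p) / (L_p + H)) ** (q - 1)
    return inner ** (1.0 / (p - q + 1))
```

For instance, with $p = q = 2$, $\sigma _2 = L_2 = 1$ and $H = 2$, the two thresholds evaluate to $2/9$ and $2/3$, respectively.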

Note that these sets can be very different. Indeed, set $\mathcal {Q}$ is a closed and convex neighborhood of the point $x^*$. At the same time, the structure of the set $\mathcal {G}$ can be very complex since in general the function $\eta (x)$ is discontinuous. Let us look at a simple example where $h(x) = \text{ Ind}_Q(x)$, the indicator function of a closed convex set Q.

Example 1

Consider the following optimization problem:

$$\begin{aligned} \min \limits _{x \in \mathbb {R}^2} \left\{ f(x) : \; \Vert x \Vert ^2 \; {\mathop {=}\limits ^{\mathrm {def}}}\; (x^{(1)})^2 + (x^{(2)})^2 \le 1 \right\} , \end{aligned}$$

(3.10)

with

$$\begin{aligned} f(x)= & {} \frac{\sigma _2}{2}\Vert x - \bar{x}\Vert ^2 + \frac{2 \sigma _3}{3}\Vert x - \bar{x}\Vert ^3, \end{aligned}$$

for some fixed $\sigma _2, \sigma _3 > 0$ and $\bar{x} = (0, -2) \in \mathbb {R}^2$. We have

$$\begin{aligned} \nabla f(x) = r(x) \cdot ( x^{(1)}, x^{(2)} + 2), \end{aligned}$$

where $r: \mathbb {R}^2 \rightarrow \mathbb {R}$ is

$$\begin{aligned} r(x) = \sigma _2 + 2\sigma _3 \Vert x - \bar{x}\Vert . \end{aligned}$$

Note that f is uniformly convex of degree $q = 2$ with constant $\sigma _2$, and for $q = 3$ with constant $\sigma _3$ (see Lemma 4.2.3 in [19]). Moreover, we have for any $\nu \in [0, 1]$:

$$\begin{aligned} \langle \nabla f(x) - \nabla f(y), x - y \rangle\ge & {} \sigma _2 \Vert x - y\Vert ^2 + \sigma _3 \Vert x - y\Vert ^3 \\\ge & {} \min \limits _{t \ge 0} \Bigl \{ \frac{\sigma _2}{t^{\nu }} + \sigma _3 t^{1 - \nu } \Bigr \} \cdot \Vert x - y\Vert ^{2 + \nu } \\\ge & {} \sigma _2^{1 - \nu } \sigma _3^{\nu } \cdot \Vert x - y\Vert ^{2 + \nu }. \end{aligned}$$

Hence, this function is uniformly convex of any degree $q \in [2, 3]$. At the same time, the Hessian of f is Lipschitz continuous with constant $L_2 = 4 \sigma _3$ (see Lemma 4.2.4 in [19]).

Clearly, in this problem $x^*=(0,-1)$, and it can be written in the composite form (2.1) with

$$\begin{aligned} h(x)= & {} \left\{ \begin{array}{ll} + \infty , &{}\quad \text{ if } \Vert x \Vert > 1, \\ 0, &{}\quad \text{ otherwise. } \end{array} \right. \end{aligned}$$

Note that for $x \in \mathrm{dom} \,h \equiv \{ x: \; \Vert x \Vert \le 1\}$, we have

$$\begin{aligned} \partial h(x) \; = \; \left\{ \begin{array}{ll} 0, &{}\quad \text{ if } \Vert x \Vert < 1, \\ \{ \gamma x, \, \gamma \ge 0 \}, &{}\quad \text{ if } \Vert x \Vert = 1. \end{array} \right. \end{aligned}$$

Therefore, if $\Vert x \Vert < 1$, then $\eta (x) = \Vert \nabla f(x) \Vert \ge \sigma _2$. If $\Vert x \Vert = 1$, then

$$\begin{aligned} \eta ^2(x)&{{\mathop {=}\limits ^{(3.6)}}}&\min \limits _{\gamma \ge 0} \Bigl \{ \bigl [ (r(x) + \gamma ) x^{(1)} \bigr ]^2 + \bigl [ (r(x) + \gamma ) x^{(2)} + 2 r(x) \bigr ]^2 \Bigr \} \\= & {} \min \limits _{\gamma \ge 0} \Bigl \{ (r(x) + \gamma )^2 + 4r(x) (r(x) + \gamma ) x^{(2)} + 4 r^2(x) \Bigr \} \\= & {} \left\{ \begin{array}{ll} 4r^2(x) (1 - (x^{(2)})^2), &{}\quad \text{ if } x^{(2)} \le -\frac{1}{2}, \\ r^2(x) (5 + 4 x^{(2)}), &{}\quad \text{ otherwise. } \end{array} \right. \end{aligned}$$

Thus, in any neighbourhood of $x^*$, $\eta (x)$ vanishes only along the boundary of the feasible set. $\square $
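The discontinuity of $\eta (\cdot )$ in Example 1 is easy to observe numerically. The sketch below evaluates the closed-form expression for $\eta $ on the unit circle, recomputes it directly from the quadratic in $r(x) + \gamma $, and compares it with the value at interior points. The values $\sigma _2 = \sigma _3 = 1$ and all function names are assumptions made for illustration.

```python
import math

SIGMA2, SIGMA3 = 1.0, 1.0   # illustrative constants

def r(x):
    # r(x) = sigma_2 + 2 sigma_3 ||x - xbar|| with xbar = (0, -2).
    return SIGMA2 + 2.0 * SIGMA3 * math.hypot(x[0], x[1] + 2.0)

def eta_boundary(theta):
    # Closed form of eta on the unit circle, x = (cos(theta), sin(theta)).
    x = (math.cos(theta), math.sin(theta))
    rx = r(x)
    if x[1] <= -0.5:
        return math.sqrt(max(0.0, 4.0 * rx * rx * (1.0 - x[1] ** 2)))
    return rx * math.sqrt(5.0 + 4.0 * x[1])

def eta_boundary_direct(theta):
    # Minimize s^2 + 4 r x2 s + 4 r^2 over s = r + gamma >= r.
    x = (math.cos(theta), math.sin(theta))
    rx = r(x)
    s = max(rx, -2.0 * rx * x[1])
    return math.sqrt(max(0.0, s * s + 4.0 * rx * x[1] * s + 4.0 * rx * rx))

def eta_interior(x):
    # For ||x|| < 1 the subdifferential of h is {0}, so eta(x) = ||grad f(x)||.
    return r(x) * math.hypot(x[0], x[1] + 2.0)

at_solution = eta_boundary(-math.pi / 2)     # x* = (0, -1)
just_inside = eta_interior((0.0, -0.999))    # interior point next to x*
```

At the solution the value is zero, whereas at the nearby interior point it stays above $\sigma _2$, which is exactly the jump described in the example.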

So, the question arises how the Tensor Method (3.1) could come to the region $\mathcal {G}$. The answer follows from the inequalities derived in Sect. 2. Indeed,

$$\begin{aligned} \Vert F'(x_{k+1}) \Vert _* \; {{\mathop {\le }\limits ^{(2.12)}}} \; {L_p + H \over p!} \Vert x_k - x_{k+1} \Vert ^p, \end{aligned}$$

and

$$\begin{aligned} F(x_k) - F(x_{k+1})\ge & {} \langle F'(x_{k+1}), x_k - x_{k+1} \rangle \\&{{\mathop {\ge }\limits ^{(2.14)}}}&\left( {p! \over (p+1)L_p} \right) ^{1 \over p} \cdot \Vert F'(x_{k+1}) \Vert ^{p+1 \over p}_*. \end{aligned}$$

Thus, at some moment the norm $\Vert F'(x_k) \Vert _*$ will be small enough to enter $\mathcal {G}$.

4 Global complexity bounds

Let us briefly discuss the global complexity bounds of the method (3.1), namely the number of iterations required for coming from an arbitrary initial point $x_0 \in \mathrm{dom} \,h$ to the region $\mathcal {Q}$. First, note that for every step $T = T_H(x)$ of the method with parameter $H \ge p L_p$, we have

$$\begin{aligned} F(T)&{{\mathop {\le }\limits ^{(2.3)}}}&\varOmega _p(f,x;T) + \frac{H}{(p + 1)!}\Vert T - x\Vert ^{p + 1} + h(T) \\&{{\mathop {=}\limits ^{(2.6)}}}&\min \limits _{y \in \mathbb {E}} \Bigl \{ \varOmega _p(f,x;y) + \frac{H}{(p + 1)!}\Vert y - x\Vert ^{p + 1} + h(y) \Bigr \} \\&{{\mathop {\le }\limits ^{(2.3)}}}&\min \limits _{y \in \mathbb {E}} \Bigl \{ F(y) + \frac{H + L_p}{(p + 1)!} \Vert y - x\Vert ^{p + 1} \Bigr \}. \end{aligned}$$

Therefore,

$$\begin{aligned} \begin{array}{rcl} F(T_H(x)) - F^{*}\le & {} \frac{H + L_p}{(p + 1)!}\Vert x - x^{*}\Vert ^{p + 1}, \quad x \in \mathrm{dom} \,h, \end{array} \end{aligned}$$

(4.1)

with $x^{*} {\mathop {=}\limits ^{\mathrm {def}}}\arg \min \limits _{y \in \mathbb {E}} F(y)$, which exists by our assumption. Denote by D the maximal radius of the initial level set of the objective, which we assume to be finite:

$$\begin{aligned} \begin{array}{rcl} D \;\; {\mathop {=}\limits ^{\mathrm {def}}}\; \sup \limits _{x \in \mathrm{dom} \,h} \Bigl \{ \Vert x - x^{*}\Vert :\; F(x) \le F(x_0) \Bigr \} \;< & {} \; +\infty . \end{array} \end{aligned}$$

Then, by monotonicity of the method (3.1) and by convexity we conclude

$$\begin{aligned} {1 \over D}\Bigl ( F(x_{k + 1}) - F^* \Bigr ) \; \le \; {1 \over D}\langle F'(x_{k + 1}), x_{k + 1} - x^{*} \rangle \; \le \; \Vert F'(x_{k + 1})\Vert _{*}. \end{aligned}$$

(4.2)

In the general convex case, we can prove the global sublinear rate of convergence of the Tensor Method of the order $O({1 / k^p})$ [18]. For completeness of presentation, let us prove an extension of this result to the composite case.

Theorem 3

For the method (3.1) with $H = pL_p$ we have

$$\begin{aligned} \begin{array}{rcl} F(x_{k}) - F^{*}\le & {} { (p + 1) (2p)^p \over p! } \cdot {L_p D^{p + 1} \over (k - 1)^p}, \qquad k \ge 2. \end{array} \end{aligned}$$

(4.3)

Proof

Indeed, in view of (2.14) and (4.2), we have for every $k \ge 0$

$$\begin{aligned} F(x_{k}) - F(x_{k + 1})\ge & {} \langle F'(x_{k + 1}), x_k - x_{k + 1} \rangle \\&{{\mathop {\ge }\limits ^{(2.14)}}}&\left( {p! \over (p+1)L_p} \right) ^{1 \over p} \cdot \Vert F'(x_{k + 1}) \Vert _*^{p+1 \over p} \\&{{\mathop {\ge }\limits ^{(4.2)}}}&\left( {p! \over (p+1)L_p D^{p + 1} } \right) ^{1 \over p} \cdot \Bigl ( F(x_{k + 1}) - F^* \Bigr )^{ p + 1 \over p }. \end{aligned}$$

Denoting $\delta _k = F(x_k) - F^*$ and $C = \left( {p! \over (p+1) L_p D^{p + 1} }\right) ^{1 \over p}$, we obtain the following recurrence:

$$\begin{aligned} \begin{array}{rcl} \delta _{k} - \delta _{k + 1}\ge & {} C \delta _{k + 1}^{p + 1 \over p}, \qquad k \ge 0, \end{array} \end{aligned}$$

(4.4)

or for $\mu _k = C^p \delta _k {{\mathop {\le }\limits ^{(4.1)}}} 1$, as follows:

$$\begin{aligned} \mu _{k} - \mu _{k + 1}\ge & {} \mu _{k + 1}^{p + 1 \over p}, \qquad k \ge 0. \end{aligned}$$

Then, Lemma 1.1 from [8] provides us with the following guarantee:

$$\begin{aligned} \mu _{k}\le & {} \Bigl ( \frac{p(1 + \mu _1^{1 / p})}{k - 1} \Bigr )^p \; \le \; \Bigl ( \frac{2p}{k - 1} \Bigr )^p, \quad k \ge 2. \end{aligned}$$

Therefore,

$$\begin{aligned} \delta _k= & {} {\mu _{k} \over C^p} \; \le \; \left( { 2p \over C (k - 1) }\right) ^p \; = \; { (p + 1) (2p)^p \over p! } \cdot {L_p D^{p + 1} \over (k - 1)^p}, \qquad k \ge 2. \end{aligned}$$

$\square $
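The key recurrence in this proof can be explored numerically. The sketch below generates the slowest-decreasing sequence allowed by $\mu _k - \mu _{k+1} \ge \mu _{k+1}^{(p+1)/p}$, namely the one satisfying it with equality, and compares it with the bound $(2p/(k-1))^p$. The bisection solver and the function names are our own illustrative choices.

```python
def next_mu(mu_k, p, iters=200):
    # Largest mu_{k+1} allowed by the recurrence:
    # solve m + m^((p+1)/p) = mu_k by bisection on [0, mu_k].
    lo, hi = 0.0, mu_k
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid + mid ** ((p + 1) / p) > mu_k:
            hi = mid
        else:
            lo = mid
    return lo

def worst_case_sequence(mu_1, p, n):
    # Returns [mu_1, ..., mu_n].
    seq = [mu_1]
    for _ in range(n - 1):
        seq.append(next_mu(seq[-1], p))
    return seq

def bound(k, p):
    # Right-hand side (2p / (k - 1))^p, valid for k >= 2 when mu_1 <= 1.
    return (2.0 * p / (k - 1)) ** p

p = 2
seq = worst_case_sequence(1.0, p, 50)
holds = all(seq[k - 1] <= bound(k, p) for k in range(2, 51))
```

Any sequence satisfying the recurrence with strict inequality decreases at least as fast, so this extremal sequence is the natural test case.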

For a given degree $q \ge 2$ of uniform convexity with $\sigma _q > 0$, and for RCTM of order $p \ge q - 1$, let us denote by $\omega _{p, q}$ the following condition number:

$$\begin{aligned} \omega _{p, q} \; {\mathop {=}\limits ^{\mathrm {def}}}\; \frac{p + 1}{p!} \cdot \Bigl ( \frac{q - 1}{q} \Bigr )^{q - 1} \cdot \frac{L_p D^{p - q + 1}}{\sigma _q}. \end{aligned}$$

Corollary 2

In order to achieve the region $\mathcal {Q}$ it is enough to perform

$$\begin{aligned} \Biggl \lceil 2p \cdot \biggl ( { q^{q} \over (q - 1)^{q - 1} } \cdot \omega _{p, q}^{\frac{p + 1}{p}} \biggr )^{1 \over p - q + 1} \Biggr \rceil + 2 \end{aligned}$$

(4.5)

iterations of the method.

Proof

The estimate follows by plugging (3.8) into (4.3) with $H = pL_p$. $\square $

We can improve this estimate, knowing that the objective is globally uniformly convex (3.2). Then a linear rate of convergence arises at the first stage, until the method enters the region $\mathcal {Q}$.

Theorem 4

Let $\sigma _q > 0$ with $q \le p + 1$. Then for the method (3.1) with $H = pL_p$, we have

$$\begin{aligned} \begin{array}{rcl} F(x_{k}) - F^{*}\le & {} \exp \left( -{k \over 1 + \omega ^{1/p}_{p, q}} \right) \cdot \bigl ( F(x_{0}) - F^*\bigr ), \qquad k \ge 1. \end{array} \end{aligned}$$

(4.6)

Therefore, for a given $\varepsilon > 0$ to achieve $F(x_K) - F^{*} \le \varepsilon $, it is enough to set

$$\begin{aligned} \begin{array}{rcl} K= & {} \left\lceil (1+\omega ^{1/p}_{p,q}) \cdot \log {\frac{F(x_0) - F^{*}}{\varepsilon }} \right\rceil + 1. \end{array} \end{aligned}$$

(4.7)

Proof

Indeed, for every $k \ge 0$

$$\begin{aligned} F(x_{k}) - F(x_{k + 1})\ge & {} \langle F'(x_{k + 1}), x_k - x_{k + 1} \rangle \\&{{\mathop {\ge }\limits ^{(2.14)}}}&\left( {p! \over (p+1)L_p} \right) ^{1 \over p} \cdot \Vert F'(x_{k + 1}) \Vert _*^{p+1 \over p} \\= & {} \left( {p! \over (p+1)L_p} \right) ^{1 \over p} \cdot \Vert F'(x_{k + 1}) \Vert _*^{p - q + 1 \over p} \cdot \Vert F'(x_{k + 1}) \Vert _*^{q \over p} \\&{\mathop {\!}\limits ^{(4.2),(3.4)}}\!{\ge }\!\!&\left( {p! \over p \!+\! 1} \cdot { \sigma _q \over L_p D^{p \!-\! q\! +\! 1}} \right) ^{1 \over p} \cdot \left( { q \over q \!-\! 1 }\right) ^{q \!-\! 1 \over p} \cdot \Bigl ( F(x_{k \!+\! 1})\! -\! F^* \Bigr ) \\= & {} \left( \frac{1}{\omega _{p, q}} \right) ^{1 \over p} \cdot \Bigl ( F(x_{k + 1}) - F^* \Bigr ). \end{aligned}$$

Denoting $\delta _k = F(x_k) - F^{*}$, we obtain

$$\begin{aligned} \delta _{k + 1}\le & {} {\omega ^{1/p}_{p,q} \over 1 + \omega ^{1/p}_{p,q}} \cdot \delta _k \; \le \; \exp \left( - {1 \over 1 + \omega ^{1/p}_{p,q}} \right) \cdot \delta _k, \qquad k \ge 1. \end{aligned}$$

$\square $

We see that, for RCTM with $p \ge 2$ minimizing the uniformly convex objective of degree $q \le p + 1$, the condition number $\omega ^{1/p}_{p, q}$ is the main factor in the global complexity estimates (4.5) and (4.7). Since in general this number may be arbitrarily big, complexity estimate $\tilde{O}(\omega _{p, q}^{1 / p})$ in (4.7) is much better than the estimate $O(\omega _{p, q}^{(p + 1) / (p(p - q + 1))})$ in (4.5) because of relation ${ p + 1 \over p - q + 1} \ge 1$.
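To compare the two estimates for concrete constants, one can evaluate the condition number $\omega _{p, q}$ together with the iteration counts (4.5) and (4.7). The helpers below are a minimal sketch; their names and the sample inputs are hypothetical, and `gap0` stands for $F(x_0) - F^*$.

```python
import math

def condition_number(p, q, L_p, D, sigma_q):
    # omega_{p,q} as defined before Corollary 2.
    return ((p + 1) / math.factorial(p)
            * ((q - 1) / q) ** (q - 1)
            * L_p * D ** (p - q + 1) / sigma_q)

def iterations_sublinear_phase(p, q, L_p, D, sigma_q):
    # Estimate (4.5): iterations sufficient to reach the region Q.
    w = condition_number(p, q, L_p, D, sigma_q)
    inner = q ** q / (q - 1) ** (q - 1) * w ** ((p + 1) / p)
    return math.ceil(2 * p * inner ** (1.0 / (p - q + 1))) + 2

def iterations_linear_phase(p, q, L_p, D, sigma_q, gap0, eps):
    # Estimate (4.7): K such that F(x_K) - F* <= eps.
    w = condition_number(p, q, L_p, D, sigma_q) ** (1.0 / p)
    return math.ceil((1.0 + w) * math.log(gap0 / eps)) + 1

omega = condition_number(2, 2, 1.0, 1.0, 1.0)
K = iterations_linear_phase(2, 2, 1.0, 1.0, 1.0, gap0=1.0, eps=1e-6)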

These global bounds can be improved, by using the universal [3, 10] and the accelerated [7, 9, 10, 17, 28] high-order schemes.

High-order tensor methods for minimizing the gradient norm were developed in [4]. These methods achieve near-optimal global convergence rates, and can be used for coming into the region $\mathcal {G}$ (3.9). Note, that for the composite minimization problems, some modification of these methods is required, which ensures minimization of the subgradient norm.

Finally, let us mention some recent results [12, 20], where it was shown that a proper implementation of the third-order schemes by a second-order oracle may lead to a significant acceleration of the methods. However, the relation of these techniques to the local convergence needs further investigation.

5 Application to proximal methods

Let us discuss now a general approach, which uses the local convergence of the methods for justifying the global performance of proximal iterations.

The proximal method [23] is one of the classical methods in theoretical optimization. Every step of the method for solving problem (2.1) is a minimization of the regularized objective:

$$\begin{aligned} \begin{array}{rcl} x_{k + 1}= & {} \arg \min \limits _{x \in \mathbb {E}} \Bigl \{ a_{k + 1} F(x) + \frac{1}{2}\Vert x - x_k\Vert ^2 \Bigr \}, \qquad k \ge 0, \end{array} \end{aligned}$$

(5.1)

where $\{ a_k \}_{k \ge 1}$ is a sequence of positive coefficients, related to the iteration counter.

Of course, in general, we can hope to solve subproblem (5.1) only inexactly. Practical implementations and possible generalizations of the proximal method are still an area of intensive research (see, for example, [11, 25, 26, 27]).

One simple observation about subproblem (5.1) is that it is 1-strongly convex. Therefore, if we were able to pick an initial point from the region of superlinear convergence (3.8) or (3.9), we could minimize it very quickly, up to arbitrary accuracy, by RCTM of degree $p \ge 2$. In this section, we investigate this approach. For the resulting scheme, we prove a global rate of convergence of the order $\tilde{O}(1 / k^{p + 1 \over 2})$.

Denote by $\varPhi _{k + 1}$ the regularized objective from (5.1):

$$\begin{aligned} \varPhi _{k + 1}(x) \; {\mathop {=}\limits ^{\mathrm {def}}}\; a_{k + 1} F(x) + \frac{1}{2}\Vert x - x_k\Vert ^2 \; = \; a_{k + 1} f(x) + \frac{1}{2}\Vert x - x_k\Vert ^2 + a_{k + 1} h(x). \end{aligned}$$

We fix a sequence of accuracies $\{\delta _k\}_{k \ge 1}$ and relax the assumption of exact minimization in (5.1). Now, at every step we need to find a point $x_{k + 1}$ and a corresponding subgradient vector $g_{k + 1} \in \partial \varPhi _{k + 1}(x_{k + 1})$ with bounded norm:

$$\begin{aligned} \begin{array}{rcl} \Vert g_{k + 1}\Vert _{*}\le & {} \delta _{k + 1}. \end{array} \end{aligned}$$

(5.2)

Denote

$$\begin{aligned} F'(x_{k + 1}) \; {\mathop {=}\limits ^{\mathrm {def}}}\; \frac{1}{a_{k + 1}}( g_{k + 1} - B(x_{k + 1} - x_k)) \; \in \; \partial F(x_{k + 1}). \end{aligned}$$

The following global convergence result holds for the general proximal method with inexact minimization criterion (5.2).

Theorem 5

Assume that there exists a minimizer $x^{*} \in \mathrm{dom} \,h$ of problem (2.1). Then, for any $k \ge 1$, we have

$$\begin{aligned} \begin{array}{rcl} \sum \limits _{i = 1}^k a_i(F(x_i) - F^{*}) + \frac{1}{2}\sum \limits _{i = 1}^k a_i^2 \Vert F'(x_i)\Vert _{*}^2 + \frac{1}{2}\Vert x_k - x^{*}\Vert ^2\le & {} R_k(\delta ), \end{array} \end{aligned}$$

(5.3)

where

$$\begin{aligned} R_k(\delta ) \; {\mathop {=}\limits ^{\mathrm {def}}}\; \frac{1}{2}\left( \Vert x_0 - x^{*}\Vert + \sum \limits _{i = 1}^k \delta _i \right) ^2. \end{aligned}$$

Proof

First, let us prove that for all $k \ge 0$ and for every $x \in \mathrm{dom} \,h$, we have

$$\begin{aligned} \begin{array}{rcl} \frac{1}{2}\Vert x_0 - x\Vert ^2 + \sum \limits _{i = 1}^k a_i F(x)\ge & {} \frac{1}{2}\Vert x_k - x\Vert ^2 + C_k(x), \end{array} \end{aligned}$$

(5.4)

where

$$\begin{aligned} C_k(x) \; {\mathop {=}\limits ^{\mathrm {def}}}\; \sum \limits _{i = 1}^k \left( a_i F(x_i) + \frac{a_i^2}{2} \Vert F'(x_i)\Vert _{*}^2 + \langle g_i, x - x_{i - 1} \rangle - \frac{\delta _i^2}{2} \right) . \end{aligned}$$

This is obviously true for $k = 0$. Let it hold for some $k \ge 0$. Consider the step number $k + 1$ of the inexact proximal method.

By condition (5.2), we have

$$\begin{aligned} \Vert a_{k + 1} F'(x_{k + 1}) + B(x_{k + 1} - x_k) \Vert _{*}^2\le & {} \delta _{k + 1}^2. \end{aligned}$$

Equivalently,

$$\begin{aligned} \langle a_{k + 1} F'(x_{k + 1}), x_k - x_{k + 1} \rangle \ge \frac{a_{k + 1}^2}{2}\Vert F'(x_{k + 1})\Vert _{*}^2 + \frac{1}{2}\Vert x_{k + 1} - x_k\Vert ^2 - \frac{\delta _{k + 1}^2}{2}.\nonumber \\ \end{aligned}$$

(5.5)
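
To see the equivalence, one can expand the squared dual norm. The sketch below assumes the Euclidean setting with $\Vert x\Vert = \langle Bx, x \rangle ^{1/2}$ and $\Vert s\Vert _{*} = \langle s, B^{-1}s \rangle ^{1/2}$, so that $\Vert Bv\Vert _{*} = \Vert v\Vert $; the shorthand $s$ and $v$ is introduced here only for illustration. With $s = a_{k + 1} F'(x_{k + 1})$ and $v = x_{k + 1} - x_k$, we get

$$\begin{aligned} \Vert s + Bv \Vert _{*}^2 \; = \; \Vert s\Vert _{*}^2 + 2 \langle s, v \rangle + \Vert v\Vert ^2 \; \le \; \delta _{k + 1}^2, \end{aligned}$$

and rearranging the terms and dividing by two gives (5.5).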

Therefore, using the inductive assumption and strong convexity of $\varPhi _{k + 1}(\cdot )$, we conclude

$$\begin{aligned}&\frac{1}{2}\Vert x_0 - x\Vert ^2 + \sum \limits _{i = 1}^{k + 1} a_i F(x) \; = \; \frac{1}{2}\Vert x_0 - x\Vert ^2 + \sum \limits _{i = 1}^k a_i F(x) + a_{k + 1} F(x) \\&\quad \; {{\mathop {\ge }\limits ^{(5.4)}}} \; \varPhi _{k + 1}(x) + C_k(x) \\&\quad \ge \;\;\, \varPhi _{k + 1}(x_{k + 1}) + \langle g_{k + 1}, x - x_{k + 1} \rangle + \frac{1}{2}\Vert x_{k + 1} - x\Vert ^2 + C_k(x) \\&\quad = \;\;\, a_{k + 1} F(x_{k + 1}) + \frac{1}{2}\Vert x_{k + 1} - x_k\Vert ^2 + \langle g_{k + 1}, x_k - x_{k + 1} \rangle \\&\qquad + \;\;\, \langle g_{k + 1}, x - x_k \rangle + \frac{1}{2}\Vert x_{k + 1} - x\Vert ^2 + C_k(x) \\&\quad = \;\;\, a_{k + 1} F(x_{k + 1}) + \langle a_{k + 1} F'(x_{k + 1}), x_k - x_{k + 1} \rangle - \frac{1}{2}\Vert x_{k + 1} - x_k\Vert ^2 \\&\qquad + \;\;\, \langle g_{k + 1}, x - x_k \rangle + \frac{1}{2}\Vert x_{k + 1} - x\Vert ^2 + C_k(x) \\&\quad {{\mathop {\ge }\limits ^{(5.5)}}} \; a_{k + 1} F(x_{k + 1}) + \frac{a_{k + 1}^2}{2}\Vert F'(x_{k + 1})\Vert _{*}^2 - \frac{\delta _{k + 1}^2}{2} \\&\qquad + \;\;\, \langle g_{k + 1}, x - x_k \rangle + \frac{1}{2}\Vert x_{k + 1} - x\Vert ^2 + C_k(x) \\&\quad = \;\;\, \frac{1}{2}\Vert x_{k + 1} - x\Vert ^2 + C_{k + 1}(x). \end{aligned}$$

Thus, inequality (5.4) is valid for all $k \ge 0$.

Now, plugging $x \equiv x^{*}$ into (5.4), we have

$$\begin{aligned}&\sum \limits _{i = 1}^k a_i (F(x_i) - F^{*}) + \frac{1}{2}\sum \limits _{i = 1}^k a_i^2 \Vert F'(x_i)\Vert _{*}^2 + \frac{1}{2}\Vert x_k - x^{*}\Vert ^2 \nonumber \\&\quad \le \;\;\, \frac{1}{2}\Vert x_0 - x^{*}\Vert ^2 + \frac{1}{2}\sum \limits _{i = 1}^k \delta _i^2 + \sum \limits _{i = 1}^k \langle g_i, x_{i - 1} - x^{*} \rangle \nonumber \\&\quad {{\mathop {\le }\limits ^{(5.2)}}} \; \frac{1}{2}\Vert x_0 - x^{*}\Vert ^2 + \frac{1}{2}\sum \limits _{i = 1}^k \delta _i^2 + \sum \limits _{i = 1}^k \delta _i \Vert x_{i - 1} - x^{*} \Vert {\mathop {=}\limits ^{\mathrm {def}}}\alpha _k. \end{aligned}$$

(5.6)

In order to finish the proof, it is enough to show that $\alpha _k \le R_k(\delta )$.

Indeed,

$$\begin{aligned} \alpha _{k + 1}= & {} \alpha _k + \frac{1}{2} \delta _{k + 1}^2 + \delta _{k + 1} \Vert x_k - x^{*}\Vert \\&{{\mathop {\le }\limits ^{(5.6)}}}&\alpha _k + \frac{1}{2}\delta _{k + 1}^2 + \delta _{k + 1} \sqrt{2 \alpha _k} \\= & {} \left( \sqrt{\alpha _k} + \frac{1}{\sqrt{2}}\delta _{k + 1} \right) ^2. \end{aligned}$$

Therefore,

$$\begin{aligned} \sqrt{\alpha _k}\le & {} \sqrt{\alpha _{k - 1}} + \frac{1}{\sqrt{2}}\delta _{k} \; \le \; \cdots \; \le \; \sqrt{\alpha _0} + \frac{1}{\sqrt{2}}\sum \limits _{i = 1}^k \delta _i \\= & {} \frac{1}{\sqrt{2}}\left( \Vert x_0 - x^{*}\Vert + \sum \limits _{i = 1}^k \delta _i \right) \; = \; \sqrt{R_k(\delta )}. \end{aligned}$$

$\square $

Now we are ready to use the result on the local superlinear convergence of RCTM in the norm of the subgradient (Theorem 2) in order to minimize $\varPhi _{k + 1}(\cdot )$ at every step of the inexact proximal method.

Note that

$$\begin{aligned} \partial \varPhi _{k + 1}(x)= & {} a_{k + 1} \partial F(x) + B(x - x_k), \end{aligned}$$

and it is natural to start the minimization process from the previous point $x_k$, for which $\partial \varPhi _{k + 1}(x_k) = a_{k + 1} \partial F(x_k)$. Note also that the Lipschitz constant of the $p$th derivative ($p \ge 2$) of the smooth part of $\varPhi _{k + 1}$ is $a_{k + 1} L_p$.

Using our previous notation, one step of RCTM can be written as follows:

$$\begin{aligned}&T_H(\varPhi _{k + 1}, z) \\&\;\;\, {\mathop {=}\limits ^{\mathrm {def}}}\;\;\, \arg \min \limits _{y \in \mathbb {E}} \Bigl \{ a_{k \!+\! 1} \varOmega _{p}(f, z; y) + \frac{H}{(p \!+\! 1)!}\Vert y - z\Vert ^{p \!+\! 1} \!+\! a_{k \!+\! 1}h(y) + \frac{1}{2}\Vert y - x_k\Vert ^2 \Bigr \}, \end{aligned}$$

where $H = a_{k + 1}pL_p$. Then, a sufficient condition for $z = x_k$ to be in the region of superlinear convergence (3.9) is

$$\begin{aligned} a_{k + 1} \Vert F'(x_k) \Vert _*\le & {} \left( p! \over a_{k + 1} (p + 1) L_p \right) ^{1 \over p - 1}, \end{aligned}$$

or, equivalently

$$\begin{aligned} a_{k + 1}\le & {} \left( {1 \over \Vert F'(x_k)\Vert _{*} }\right) ^{p - 1 \over p} \left( { p! \over (p + 1) L_p }\right) ^{1 \over p}. \end{aligned}$$

To be sure that $x_k$ is strictly inside the region, we can pick:

$$\begin{aligned} \boxed { \begin{array}{rcl} a_{k + 1}= & {} \left( {1 \over 2 \Vert F'(x_k)\Vert _{*}} \right) ^{p - 1 \over p} \left( p! \over (p + 1) L_p \right) ^{1 \over p} \end{array} } \end{aligned}$$

(5.7)

Note that this rule requires fixing an initial subgradient $F'(x_0) \in \partial F(x_0)$ in order to choose $a_1$.
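Rule (5.7) is straightforward to evaluate once $L_p$ and the current subgradient norm are known. The sketch below is one possible transcription; the names `prox_coefficient` and `contraction_base` are hypothetical. The second function evaluates the product $\beta \, a_{k + 1} \Vert F'(x_k)\Vert _{*}$ that appears in the proof of Lemma 2, which rule (5.7) is designed to make equal to $1/2$.

```python
import math


def prox_coefficient(p: int, L_p: float, grad_norm: float) -> float:
    """Coefficient a_{k+1} from rule (5.7), with grad_norm = ||F'(x_k)||_* > 0."""
    if p < 2 or L_p <= 0.0 or grad_norm <= 0.0:
        raise ValueError("need p >= 2, L_p > 0 and a nonzero subgradient")
    scale = math.factorial(p) / ((p + 1) * L_p)
    return (1.0 / (2.0 * grad_norm)) ** ((p - 1) / p) * scale ** (1.0 / p)


def contraction_base(p: int, L_p: float, grad_norm: float) -> float:
    """beta * a_{k+1} * ||F'(x_k)||_*, where beta is defined as in Lemma 2."""
    a = prox_coefficient(p, L_p, grad_norm)
    beta = (a * (p + 1) * L_p / math.factorial(p)) ** (1.0 / (p - 1))
    return beta * a * grad_norm


for p in (2, 3, 5):
    print(p, prox_coefficient(p, L_p=2.0, grad_norm=1.0),
          contraction_base(p, L_p=2.0, grad_norm=1.0))
```

The printed base stays at $1/2$ up to rounding, independently of $p$, $L_p$ and the subgradient norm.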

Finally, we apply the following steps:

$$\begin{aligned} \begin{array}{rcl} z_0 \; = \; x_k, \quad z_{t+1}= & {} T_{H}(\varPhi _{k + 1}, z_t), \quad t \ge 0. \end{array} \end{aligned}$$

(5.8)
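
The whole scheme, rule (5.7) for the outer coefficients together with the inner iterations (5.8) stopped by criterion (5.2), can be sketched in a few lines for a toy problem. The example below is an illustration under simplifying assumptions and not the authors' implementation: it takes $p = 2$, $h \equiv 0$, $B = 1$, and the one-dimensional objective $f(x) = |x|^3 / 3$, whose second derivative $2|x|$ is Lipschitz continuous with $L_2 = 2$. In one dimension the cubic model has a closed-form minimizer, which is used in `rctm_step`; all function names are hypothetical.

```python
import math

L2 = 2.0  # Lipschitz constant of f'' for the toy objective f(x) = |x|**3 / 3


def df(x: float) -> float:
    return x * abs(x)            # f'(x)


def d2f(x: float) -> float:
    return 2.0 * abs(x)          # f''(x)


def rctm_step(a: float, x_prev: float, z: float) -> float:
    """One step z -> T_H(Phi_{k+1}, z) for p = 2, h = 0, in one dimension."""
    H = 2.0 * a * L2                      # H = a_{k+1} * p * L_p
    g = a * df(z) + (z - x_prev)          # derivative of Phi_{k+1} at z
    c = a * d2f(z) + 1.0                  # second derivative of Phi_{k+1} at z
    if g == 0.0:
        return z
    # Minimizer of g*d + c*d**2/2 + H*|d|**3/6 is d = -sign(g) * r, where
    # r >= 0 solves (H/2)*r**2 + c*r = |g|.
    r = 2.0 * abs(g) / (c + math.sqrt(c * c + 2.0 * H * abs(g)))
    return z - math.copysign(r, g)


def inexact_proximal(x0: float, deltas: list[float]) -> list[tuple[float, int]]:
    """Return (x_{k+1}, number of inner steps) for every outer iteration."""
    x, log = x0, []
    for delta in deltas:
        grad = abs(df(x))
        if grad == 0.0:                   # x is already a minimizer
            break
        a = math.sqrt(1.0 / (3.0 * L2 * grad))      # rule (5.7) with p = 2
        z, steps = x, 0
        while abs(a * df(z) + (z - x)) > delta:     # stopping criterion (5.2)
            z = rctm_step(a, x, z)
            steps += 1
        log.append((z, steps))
        x = z
    return log


deltas = [0.1 / k**2 for k in range(1, 31)]   # delta_k = c / k**s, c = 0.1, s = 2
log = inexact_proximal(1.0, deltas)
print("final point:", log[-1][0], "max inner steps:", max(s for _, s in log))
```

On this toy problem only a handful of inner steps per outer iteration are needed, which is the behaviour that Lemma 2 quantifies in general.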

We can estimate the required number of these iterations as follows.

Lemma 2

At every iteration $k \ge 0$ of the inexact proximal method, in order to achieve $\Vert \varPhi '_{k + 1}(z_t) \Vert _{*} \le \delta _{k + 1}$, it is enough to perform

$$\begin{aligned} \begin{array}{rcl} t_k= & {} \biggl \lceil \frac{1}{\log _2 p} \cdot \log _2 \log _2 \left( \frac{2 D_k(\delta ) }{ \delta _{k + 1}} \right) \biggr \rceil \end{array} \end{aligned}$$

(5.9)

steps of RCTM (5.8), where

$$\begin{aligned} D_k(\delta ) \; {\mathop {=}\limits ^{\mathrm {def}}}\; \max \biggl \{ \Vert x_0 - x^{*}\Vert + \sum \limits _{i = 1}^k \delta _i, \Bigl ( \frac{p! \Vert F'(x_0)\Vert _{*} }{(p + 1)L_p2^{p - 1}} \Bigr )^{1 \over p} \biggr \} \end{aligned}$$

Proof

According to (3.7), one step of RCTM (5.8) provides us with the following guarantee in terms of the subgradients of our objective $\varPhi _{k + 1}(\cdot )$:

$$\begin{aligned} \begin{array}{rcl} \Vert \varPhi '_{k + 1}(z_t) \Vert _{*}\le & {} \frac{a_{k + 1} (p + 1) L_p}{p!} \Vert \varPhi '_{k + 1}(z_{t - 1}) \Vert _{*}^p, \end{array} \end{aligned}$$

(5.10)

where we used in (3.7) the values $q = 2$, $\sigma _q = 1$, $a_{k + 1} L_p$ for the Lipschitz constant of the pth derivative of the smooth part of $\varPhi _{k + 1}$, and $H = a_{k + 1}pL_p$.

Denote $\beta \equiv \left( { a_{k + 1}(p + 1)L_p \over p! } \right) ^{1 \over p - 1} {{\mathop {=}\limits ^{(5.7)}}} \left( { (p + 1) L_p \over 2 \cdot p! \cdot \Vert F'(x_k)\Vert _* } \right) ^{1 \over p}$. Then, from (5.10) we have

$$\begin{aligned}&\beta \Vert \varPhi '_{k + 1}(z_t) \Vert _{*} \le \bigl (\beta \Vert \varPhi '_{k + 1}(z_{t - 1}) \Vert _{*}\bigr )^{p} \nonumber \\&\quad \le \dots \;\; \le \;\; \bigl (\beta \Vert \varPhi '_{k + 1}(z_0) \Vert _{*}\bigr )^{p^t} \nonumber \\&\quad = (\beta a_{k + 1}\Vert F'(x_k)\Vert _{*})^{p^t} \nonumber \\&\quad = \left( a_{k + 1}^{p \over p - 1} \left( { (p + 1) L_p \over p! }\right) ^{1 \over p - 1} \Vert F'(x_k)\Vert _{*} \right) ^{p^t} \nonumber \\&\quad {{\mathop {=}\limits ^{(5.7)}}} \; \left( {1 \over 2}\right) ^{p^t}. \end{aligned}$$

(5.11)

Therefore, for

$$\begin{aligned} \begin{array}{rcl} t\ge & {} \log _p \log _2 \left( \frac{1}{\beta \delta _{k + 1}} \right) \; = \; \frac{1}{\log _2 p} \cdot \log _2 \log _2 \left( \frac{1}{ \delta _{k + 1}} \left( { 2 \cdot p! \cdot \Vert F'(x_k) \Vert _* \over (p + 1) L_p } \right) ^{1 \over p} \right) , \end{array} \end{aligned}$$

(5.12)

we have $\Vert \varPhi '_{k + 1}(z_t)\Vert _{*} \le \delta _{k + 1}$. To finish the proof, let us bound $\Vert F'(x_k) \Vert _{*}$ from above. We have

$$\begin{aligned} 2^{3p - 2 \over p} \left( \frac{(p + 1)L_p}{p!} \right) ^{2 \over p} R_k(\delta )&{{\mathop {\ge }\limits ^{(5.3)}}}&2^{2(p - 1) \over p} \left( \frac{(p + 1)L_p}{p!} \right) ^{2 \over p} \sum \limits _{i = 1}^k a_i^2 \Vert F'(x_i)\Vert _{*}^2 \nonumber \\&{{\mathop {=}\limits ^{(5.7)}}}&\sum \limits _{i = 1}^k \Vert F'(x_{i - 1})\Vert _{*}^{2(1 - p) \over p} \Vert F'(x_i)\Vert _{*}^2. \end{aligned}$$

(5.13)

Thus, for every $1 \le i \le k$ it holds

$$\begin{aligned} \Vert F'(x_i)\Vert _{*} \; {{\mathop {\le }\limits ^{(5.13)}}} \; \Vert F'(x_{i - 1})\Vert _{*}^{\rho } \cdot \mathcal {D}, \end{aligned}$$

(5.14)

with $\mathcal {D} \equiv R_k^{1/2}(\delta ) \left( \frac{(p + 1) L_p}{p!} \right) ^{1 \over p} 2^{3p - 2 \over 2p}$, and $\rho \equiv \frac{p - 1}{p}$. Therefore,

$$\begin{aligned}&\Vert F'(x_k)\Vert _{*} \; {{\mathop {\le }\limits ^{(5.14)}}} \; \Vert F'(x_0)\Vert _{*}^{\rho ^k} \cdot \mathcal {D}^{1 + \rho + \rho ^2 + \dots + \rho ^{k - 1}} \\&\quad = \Vert F'(x_0)\Vert _{*} \cdot \Bigl ( \Vert F'(x_0)\Vert _{*}^{\rho ^k - 1} \cdot \mathcal {D}^{\frac{\; \;1 - \rho ^k}{1 - \rho }} \Bigr ) \\&\quad = \Vert F'(x_0)\Vert _{*} \cdot \left( \frac{\mathcal {D}^{p}}{\Vert F'(x_0)\Vert _{*}} \right) ^{1 - \rho ^k} \; \le \; \Vert F'(x_0) \Vert _{*} \cdot \max \bigl \{ \frac{\mathcal {D}^p}{\Vert F'(x_0)\Vert _{*}}, 1 \bigr \} \\&\quad = \max \biggl \{ \frac{(p + 1) L_p 2^{p - 1}}{p!} \Bigl ( \Vert x_0 - x^{*}\Vert + \sum \limits _{i = 1}^k \delta _i \Bigr )^p, \; \Vert F'(x_0)\Vert _{*} \biggr \}. \end{aligned}$$

Substitution of this bound into (5.12) gives (5.9). $\square $
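
The bound (5.9) grows only doubly logarithmically in the required accuracy. A minimal sketch that evaluates it is given below; the function name `inner_steps_bound` and the sample values are illustrative assumptions, and $D_k(\delta )$ is passed in as a number rather than computed from its definition.

```python
import math


def inner_steps_bound(p: int, D_k: float, delta_next: float) -> int:
    """Evaluate t_k from (5.9) for given D_k(delta) and delta_{k+1}."""
    ratio = 2.0 * D_k / delta_next
    if p < 2 or ratio <= 1.0:
        raise ValueError("need p >= 2 and 2 * D_k > delta_next")
    # The formula can be negative when ratio < 2; zero steps are enough then.
    return max(0, math.ceil(math.log2(math.log2(ratio)) / math.log2(p)))


for delta in (1e-2, 1e-4, 1e-8, 1e-16):
    print(delta, inner_steps_bound(p=3, D_k=1.0, delta_next=delta))
```

Squaring the target accuracy increases the bound by at most one step, in line with the superlinear local rate.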

Let us now prove the rate of convergence for the outer iterations. This is a direct consequence of Theorem 5 and the choice (5.7) of the coefficients $\{ a_{k} \}_{k \ge 1}$.

Lemma 3

Suppose that, for a given $\varepsilon > 0$,

$$\begin{aligned} \begin{array}{rcl} F(x_k) - F^{*}\ge & {} \varepsilon , \qquad 1 \le k \le K. \end{array} \end{aligned}$$

(5.15)

Then for every $1 \le k \le K$, we have

$$\begin{aligned} \begin{array}{rcl} F(\bar{x}_k) - F^{*}\le & {} \frac{L_p \left( \Vert x_0 - x^{*} \Vert + \sum _{i = 1}^k \delta _i \right) ^{p + 1} }{k^{p + 1 \over 2}} \frac{(p + 1) 2^{p - 2} V_k(\varepsilon ) }{ p!}, \end{array} \end{aligned}$$

(5.16)

where $\bar{x}_k \; {\mathop {=}\limits ^{\mathrm {def}}}\; \frac{\sum _{i = 1}^k a_i x_i}{\sum _{i = 1}^k a_i}$, and $V_k(\varepsilon ) \; {\mathop {=}\limits ^{\mathrm {def}}}\; \left( \frac{\Vert F'(x_0)\Vert _{*} \cdot ( \Vert x_0 - x^{*}\Vert + \sum _{i = 1}^k \delta _i )}{\varepsilon } \right) ^{p - 1 \over k}$.

Proof

Using the inequality between the arithmetic and geometric means, we obtain

$$\begin{aligned} R_{k}(\delta )&{{\mathop {\ge }\limits ^{(5.3)}}}&\frac{1}{2}\sum \limits _{i = 1}^k a_i^2 \Vert F'(x_i)\Vert _*^2 \; {{\mathop {=}\limits ^{(5.7)}}} \; \frac{1}{8} \left( \frac{p!}{(p + 1)L_p} \right) ^{2 \over p - 1} \sum \limits _{i = 1}^k \frac{a_i^2}{a_{i + 1}^{2p \over p - 1}} \nonumber \\\ge & {} \frac{k}{8} \left( \frac{p!}{(p + 1)L_p} \right) ^{2 \over p - 1} \left( \prod \limits _{i = 1}^k \frac{a_i^2}{a_{i + 1}^{2p \over p - 1}} \right) ^{1 \over k} \nonumber \\= & {} \frac{k}{8} \left( \frac{p!}{(p + 1)L_p} \right) ^{2 \over p - 1} \left( \frac{a_1}{a_{k + 1}} \right) ^{2p \over (p - 1)k} \left( \prod \limits _{i = 1}^k a_i \right) ^{-2 \over (p - 1)k} \nonumber \\\ge & {} \frac{k^{p + 1 \over p - 1}}{8} \left( \frac{p!}{(p + 1)L_p} \right) ^{2 \over p - 1} \left( \frac{a_1}{a_{k + 1}} \right) ^{2p \over (p - 1)k} \left( \sum \limits _{i = 1}^k a_i \right) ^{-2 \over p - 1}. \end{aligned}$$

(5.17)

Therefore,

$$\begin{aligned} F(\bar{x}_k) - F^{*}\le & {} \frac{1}{\sum \limits _{i = 1}^k a_i} \sum \limits _{i = 1}^k a_i (F(x_i) - F^{*}) \; {{\mathop {\le }\limits ^{(5.3)}}} \; \frac{R_k(\delta )}{\sum \limits _{i = 1}^k a_i} \\&{{\mathop {\le }\limits ^{(5.17)}}}&\frac{ R_k(\delta )^{p + 1 \over 2} }{k^{p + 1 \over 2}} \frac{(p + 1) L_p}{p!} \left( \frac{a_{k + 1}}{a_1} \right) ^{p \over k} 8^{p - 1 \over 2} \\= & {} \frac{L_p \left( \Vert x_0 - x^{*} \Vert + \sum _{i = 1}^k \delta _i \right) ^{p + 1} }{k^{p + 1 \over 2}} \frac{(p + 1) 2^{p - 2} }{ p!} \left( \frac{\Vert F'(x_0)\Vert _{*}}{\Vert F'(x_k)\Vert _{*}} \right) ^{p - 1 \over k}, \end{aligned}$$

where the first inequality holds by convexity. At the same time, we have

$$\begin{aligned} \Vert F'(x_k)\Vert _{*}\ge & {} \frac{\langle F'(x_k), x_k - x^{*} \rangle }{\Vert x_k - x^{*}\Vert } \; \ge \; \frac{F(x_k) - F^{*}}{\Vert x_k - x^{*}\Vert } \\&{{\mathop {\ge }\limits ^{(5.15)}}}&\frac{\varepsilon }{\Vert x_k - x^{*}\Vert } \; {{\mathop {\ge }\limits ^{(5.3)}}} \; \frac{\varepsilon }{\Vert x_0 - x^{*}\Vert + \sum _{i = 1}^k \delta _i }. \end{aligned}$$

Thus, $\left( \frac{\Vert F'(x_0)\Vert _{*}}{\Vert F'(x_k)\Vert _{*}} \right) ^{p - 1 \over k} \le V_k(\varepsilon )$ and we obtain (5.16). $\square $

Remark 1

Note that $\bigl (\frac{1}{\varepsilon }\bigr )^{p - 1 \over k} = \exp \bigl ( {p - 1 \over k} \ln {1 \over \varepsilon } \bigr )$. Therefore after $k = O\left( \ln {1 \over \varepsilon }\right) $ iterations, the factor $V_k(\varepsilon )$ is bounded by an absolute constant.
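
The effect described in Remark 1 is easy to observe numerically. The sketch below evaluates $V_k(\varepsilon )$ for several $k$ and compares it with the cap $\exp (p - 1)$; the function name `v_factor` and all numerical values are illustrative assumptions, and `radius` stands for $\Vert x_0 - x^{*}\Vert + \sum _{i = 1}^k \delta _i$, treated here as a fixed number.

```python
import math


def v_factor(grad0: float, radius: float, eps: float, p: int, k: int) -> float:
    """V_k(eps) from Lemma 3; radius = ||x_0 - x*|| + sum of the deltas."""
    return (grad0 * radius / eps) ** ((p - 1) / k)


p, grad0, radius, eps = 3, 10.0, 5.0, 1e-6
k_min = math.ceil(math.log(grad0 * radius / eps))   # threshold on k
for k in (1, 5, k_min, 10 * k_min):
    print(k, v_factor(grad0, radius, eps, p, k), "cap:", math.exp(p - 1))
```

For $k$ below the threshold the factor can be huge, while for $k \ge $ `k_min` it does not exceed $\exp (p - 1)$.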

Since the local convergence of RCTM is very fast (5.9), we can choose the inner accuracies $\{ \delta _i \}_{i \ge 1}$ small enough for the right-hand side of (5.16) to be of the order $\tilde{O}(1 / k^{p + 1 \over 2})$. Let us present a precise statement.

Theorem 6

Let $\delta _k \equiv \frac{c}{k^s}$ for fixed absolute constants $c > 0$ and $s > 1$. Suppose that, for a given $\varepsilon > 0$, we have

$$\begin{aligned} F(x_k) - F^{*}\ge & {} \varepsilon , \qquad 1 \le k \le K. \end{aligned}$$

Then, for every $k$ such that $\ln \frac{\Vert F'(x_0)\Vert _{*} R}{ \varepsilon } \le k \le K$, we get

$$\begin{aligned} \begin{array}{rcl} F(\bar{x}_k) - F^{*}\le & {} \frac{L_p R^{p + 1}}{k^{p + 1 \over 2}} \frac{(p + 1) 2^{p - 2} \exp (p - 1)}{p!}, \end{array} \end{aligned}$$

(5.18)

where

$$\begin{aligned} R \; {\mathop {=}\limits ^{\mathrm {def}}}\; \Vert x_0 - x^{*}\Vert + \frac{cs}{s - 1}. \end{aligned}$$

The total number of oracle calls $N_k$ during the first $k$ iterations is bounded as follows:

$$\begin{aligned} N_k\le & {} k \cdot \Bigl ( 1 + \frac{1}{\log _2 p} \log _2 \log _2 \frac{2D k^s }{c} \Bigr ), \end{aligned}$$

where

$$\begin{aligned} D \; {\mathop {=}\limits ^{\mathrm {def}}}\; \max \biggl \{ R, \, \Bigl ( \frac{p! \Vert F'(x_0)\Vert _{*} }{(p + 1)L_p2^{p - 1}} \Bigr )^{1 \over p} \biggr \}. \end{aligned}$$

Proof

Indeed,

$$\begin{aligned} \sum \limits _{i = 1}^k \delta _i= & {} c\Biggl (1 + \sum \limits _{i = 2}^k \frac{1}{i^s} \Biggr ) \; \; \le \; \; c\Biggl (1 + \int \limits _1^k \frac{dx}{x^{s}} \Biggr ) \; \; = \; \; c\Biggl (1 - \frac{1}{s - 1} \int \limits _1^k dx^{-(s - 1)} \Biggr ) \\= & {} c\Biggl (1 - \frac{k^{-(s - 1)}}{s - 1} + \frac{1}{s - 1} \Biggr ) \; \; \le \; \; \frac{cs}{s - 1}. \end{aligned}$$

Thus, we obtain (5.18) directly from the bound (5.16), and by the fact that

$$\begin{aligned} V_k(\varepsilon )\le & {} \Bigl ( \frac{\Vert F'(x_0) \Vert _{*} R}{\varepsilon } \Bigr )^{\frac{p - 1}{k}} \; = \; \exp \Bigl ( \frac{p - 1}{k} \ln \frac{\Vert F'(x_0) \Vert _{*} R}{\varepsilon } \Bigr ) \\\le & {} \exp (p - 1), \end{aligned}$$

when $k \ge \ln \frac{\Vert F'(x_0) \Vert _{*} R }{ \varepsilon } $.

Finally,

$$\begin{aligned} N_k&{{\mathop {\le }\limits ^{(5.9)}}}&\sum \limits _{i = 1}^k \left\lceil \frac{1}{\log _2 p} \log _2 \log _2 \frac{2 D }{\delta _i} \right\rceil \; \le \; k + \frac{1}{\log _2 p} \sum \limits _{i = 1}^k \log _2 \log _2 \frac{2Di^s}{c} \\\le & {} k + \frac{1}{\log _2 p} \sum \limits _{i = 1}^k \log _2 \log _2 \frac{2Dk^s}{c} \; = \; k \cdot \Biggl (1 + \frac{1}{\log _2 p} \log _2 \log _2 \frac{2Dk^s}{c} \Biggr ). \end{aligned}$$

$\square $
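
The two estimates used in this proof, the bound $\sum _{i = 1}^k \delta _i \le \frac{cs}{s - 1}$ and the bound on $N_k$, can be checked numerically. The following sketch does this for sample values; the function names and the chosen constants are illustrative assumptions, and $D$ is passed in as a number.

```python
import math


def accuracy_sum(c: float, s: float, k: int) -> float:
    """Sum of delta_i = c / i**s over i = 1, ..., k."""
    return sum(c / i**s for i in range(1, k + 1))


def oracle_calls_bound(k: int, p: int, D: float, c: float, s: float) -> float:
    """Right-hand side of the bound on N_k from Theorem 6."""
    ratio = 2.0 * D * k**s / c
    if ratio <= 1.0:
        raise ValueError("the bound needs 2 * D * k**s / c > 1")
    return k * (1.0 + math.log2(math.log2(ratio)) / math.log2(p))


c, s = 0.1, 2.0
for k in (10, 100, 1000):
    print(k, accuracy_sum(c, s, k), "<=", c * s / (s - 1.0),
          "| N_k <=", oracle_calls_bound(k, p=3, D=1.0, c=c, s=s))
```

The average number of inner steps per outer iteration, $N_k / k$, grows only like $\log \log k$.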

Note that we were able to justify the global performance of the scheme, using only the local convergence results for the inner method. It is interesting to compare our approach with the recent results on the path-following second-order methods [5].

We can drop the logarithmic components in the complexity bounds by using the hybrid proximal methods (see [15, 16]), where only one step of RCTM is performed at each iteration. The resulting rate of convergence there is $O(1 / k^{p + 1 \over 2})$, without any extra logarithmic factors. However, this rate is worse than the rate $O(1 / k^p)$ provided by Theorem 3 for the primal iterations of RCTM (3.1).

References

1. Cartis, C., Gould, N.I.M., Toint, P.L.: Adaptive cubic regularisation methods for unconstrained optimization. Part I: motivation, convergence and numerical results. Math. Program. 127(2), 245–295 (2011)
2. Chebyshev, P.L.: Polnoe sobranie sochinenii. Izd. Akad. Nauk SSSR 5, 7–25 (1951)
3. Doikov, N., Nesterov, Y.: Minimizing uniformly convex functions by cubic regularization of Newton method. arXiv preprint arXiv:1905.02671 (2019)
4. Dvurechensky, P., Gasnikov, A., Ostroukhov, P., Uribe, C.A., Ivanova, A.: Near-optimal tensor methods for minimizing the gradient norm of convex function. arXiv preprint arXiv:1912.03381 (2019)
5. Dvurechensky, P., Nesterov, Y.: Global performance guarantees of second-order methods for unconstrained convex minimization. Technical report, CORE Discussion Paper (2018)
6. Evtushenko, Y.G., Tretyakov, A.A.: p-th order methods for solving nonlinear system. Dokl. akad. nauk 455(5), 512–515 (2014)
7. Gasnikov, A., Dvurechensky, P., Gorbunov, E., Vorontsova, E., Selikhanovych, D., Uribe, C.A.: Optimal tensor methods in smooth convex and uniformly convex optimization. In: Conference on Learning Theory, pp. 1374–1391 (2019)
8. Grapiglia, G.N., Nesterov, Y.: Regularized Newton methods for minimizing functions with Hölder continuous Hessians. SIAM J. Optim. 27(1), 478–506 (2017)
9. Grapiglia, G.N., Nesterov, Y.: Accelerated regularized Newton methods for minimizing composite convex functions. SIAM J. Optim. 29(1), 77–99 (2019)
10. Grapiglia, G.N., Nesterov, Y.: Tensor methods for minimizing functions with Hölder continuous higher-order derivatives. arXiv preprint arXiv:1904.12559 (2019)
11. Güler, O.: On the convergence of the proximal point algorithm for convex minimization. SIAM J. Control Optim. 29(2), 403–419 (1991)
12. Kamzolov, D., Gasnikov, A.: Near-optimal hyperfast second-order method for convex optimization and its sliding. arXiv preprint arXiv:2002.09050 (2020)
13. Kantorovich, L.V.: Functional analysis and applied mathematics. Uspekhi Matematicheskikh Nauk 3(6), 89–185 (1948)
14. Lee, J.D., Sun, Y., Saunders, M.A.: Proximal Newton-type methods for minimizing composite functions. SIAM J. Optim. 24(3), 1420–1443 (2014)
15. Marques Alves, M., Monteiro, R.D.C., Svaiter, B.F.: Iteration-complexity of a Rockafellar’s proximal method of multipliers for convex programming based on second-order approximations. Optimization 68, 1–30 (2019)
16. Monteiro, R.D.C., Svaiter, B.F.: On the complexity of the hybrid proximal extragradient method for the iterates and the ergodic mean. SIAM J. Optim. 20(6), 2755–2787 (2010)
17. Nesterov, Y.: Accelerating the cubic regularization of Newton’s method on convex problems. Math. Program. 112(1), 159–181 (2008)
18. Nesterov, Y.: Implementable Tensor Methods in Unconstrained Convex Optimization. Universite catholique de Louvain, Center for Operations Research and Econometrics (CORE), Leuven (2018)
19. Nesterov, Y.: Lectures on Convex Optimization, vol. 137. Springer, Berlin (2018)
20. Nesterov, Y.: Superfast second-order methods for unconstrained convex optimization. CORE DP 7, 2020 (2020)
21. Nesterov, Y., Nemirovskii, A.: Interior-Point Polynomial Algorithms in Convex Programming. SIAM, Philadelphia (1994)
22. Nesterov, Y., Polyak, B.T.: Cubic regularization of Newton’s method and its global performance. Math. Program. 108(1), 177–205 (2006)
23. Rockafellar, R.T.: Monotone operators and the proximal point algorithm. SIAM J. Control Optim. 14(5), 877–898 (1976)
24. Rodomanov, A., Kropotov, D.: A superlinearly-convergent proximal Newton-type method for the optimization of finite sums. In: International Conference on Machine Learning, pp. 2597–2605 (2016)
25. Salzo, S., Villa, S.: Inexact and accelerated proximal point algorithms. J. Convex Anal. 19(4), 1167–1192 (2012)
26. Schmidt, M., Roux, N.L., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Advances in Neural Information Processing Systems, pp. 1458–1466 (2011)
27. Solodov, M.V., Svaiter, B.F.: A unified framework for some inexact proximal point algorithms. Numer. Funct. Anal. Optim. 22(7–8), 1013–1035 (2001)
28. Song, C., Ma, Y.: Towards unified acceleration of high-order algorithms under Hölder continuity and uniform convexity. arXiv preprint arXiv:1906.00582 (2019)

Acknowledgements

We are very thankful to anonymous referees for valuable comments that improved the initial version of this paper.

Author information

Authors and Affiliations

Institute of Information and Communication Technologies, Electronics and Applied Mathematics (ICTEAM), Catholic University of Louvain (UCL), Louvain-la-Neuve, Belgium
Nikita Doikov
Center for Operations Research and Econometrics (CORE), Catholic University of Louvain (UCL), 34 voie du Roman Pays, 1348, Louvain-la-Neuve, Belgium
Yurii Nesterov

Corresponding author

Correspondence to Nikita Doikov.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The research results of this paper were obtained in the framework of ERC Advanced Grant 788368.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Doikov, N., Nesterov, Y. Local convergence of tensor methods. Math. Program. 193, 315–336 (2022). https://doi.org/10.1007/s10107-020-01606-x

Received: 16 December 2019
Accepted: 08 December 2020
Published: 04 January 2021
Issue Date: May 2022
DOI: https://doi.org/10.1007/s10107-020-01606-x
