Convex optimization via inertial algorithms with vanishing Tikhonov regularization: fast convergence to the minimum norm solution

Attouch, Hedy; László, Szilárd Csaba

doi:10.1007/s00186-024-00867-y

Convex optimization via inertial algorithms with vanishing Tikhonov regularization: fast convergence to the minimum norm solution

Original Article
Open access
Published: 27 June 2024

Volume 99, pages 307–347, (2024)
Cite this article

Download PDF

You have full access to this open access article

Mathematical Methods of Operations Research Aims and scope Submit manuscript

Convex optimization via inertial algorithms with vanishing Tikhonov regularization: fast convergence to the minimum norm solution

Download PDF

920 Accesses
Explore all metrics

Abstract

In a Hilbertian framework, for the minimization of a general convex differentiable function f, we introduce new inertial dynamics and algorithms that generate trajectories and iterates that converge fastly towards the minimizer of f with minimum norm. Our study is based on the non-autonomous version of the Polyak heavy ball method, which, at time t, is associated with the strongly convex function obtained by adding to f a Tikhonov regularization term with vanishing coefficient $\varepsilon (t)$. In this dynamic, the damping coefficient is proportional to the square root of the Tikhonov regularization parameter $\varepsilon (t)$. By adjusting the speed of convergence of $\varepsilon (t)$ towards zero, we will obtain both rapid convergence towards the infimal value of f, and the strong convergence of the trajectories towards the element of minimum norm of the set of minimizers of f. In particular, we obtain an improved version of the dynamic of Su-Boyd-Candès for the accelerated gradient method of Nesterov. This study naturally leads to corresponding first-order algorithms obtained by temporal discretization. In the case of a proper lower semicontinuous and convex function f, we study the proximal algorithms in detail, and show that they benefit from similar properties.

First-order optimization algorithms via inertial systems with Hessian driven damping

Article 16 November 2020

Accelerated Gradient Methods Combining Tikhonov Regularization with Geometric Damping Driven by the Hessian

Article 31 May 2023

A fast continuous time approach for non-smooth convex optimization using Tikhonov regularization technique

Article Open access 25 October 2023

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

Throughout the paper, ${\mathcal {H}}$ is a real Hilbert space which is endowed with the scalar product $\langle \cdot ,\cdot \rangle $, with $\Vert x\Vert ^2= \langle x,x\rangle $ for $x\in {\mathcal {H}}$. We consider the convex minimization problem

$$\begin{aligned} \min \left\{ f (x): \ x \in {\mathcal {H}} \right\} , \end{aligned}$$

(1)

where $f: {\mathcal {H}} \rightarrow {\mathbb {R}}$ is a convex continuously differentiable function whose solution set $S={{\,\textrm{argmin}\,}}f$ is nonempty. We aim at finding by rapid methods the element of minimum norm of S. As an original aspect of our approach, we start from the Polyak heavy ball with friction dynamic for strongly convex functions, and then adapt it to treat the case of general convex functions. Recall that a function $f: {{\mathcal {H}}}\rightarrow {\mathbb {R}}$ is said to be $\mu $-strongly convex for some $\mu >0$ if $f- \frac{\mu }{2}\Vert \cdot \Vert ^2$ is convex. In this setting, we have the exponential convergence result:

Theorem 1

Suppose that $f: {{\mathcal {H}}}\rightarrow {\mathbb {R}}$ is a function of class ${{\mathcal {C}}}^1$ which is $\mu $-strongly convex for some $\mu >0$. Let $x(\cdot ): [t_0, + \infty [ \rightarrow {{\mathcal {H}}}$ be a solution trajectory of

$$\begin{aligned} \ddot{x}(t) + 2\sqrt{\mu } {\dot{x}}(t) + \nabla f (x(t)) = 0. \end{aligned}$$

(2)

Then, the following property holds: $f(x(t))- \min _{{\mathcal {H}}}f = {\mathcal {O}} \left( e^{-\sqrt{\mu }t}\right) $ as $t \rightarrow +\infty $.

Let us see how to take advantage of this fast convergence result, and how to adapt it to the case of a general convex differentiable function $f: {{\mathcal {H}}}\rightarrow {\mathbb {R}}$. The main idea is linked to Tikhonov’s method of regularization. It consists in considering the corresponding non-autonomous dynamic which at time t is governed by the gradient of the strongly convex function $f_t: {{\mathcal {H}}}\rightarrow {\mathbb {R}}$

$$\begin{aligned} f_t (x):= f(x) + \frac{\varepsilon (t)}{2} \Vert x\Vert ^2. \end{aligned}$$

Then replacing f by $f_t$ in (2), and noticing that $f_t$ is $\varepsilon (t)$-strongly convex, we obtain the dynamic

$$\begin{aligned} \mathrm{(TRIGS)} \qquad \ddot{x}(t) + \delta \sqrt{\varepsilon (t)} {\dot{x}}(t) + \nabla f (x(t)) + \varepsilon (t) x(t) =0, \end{aligned}$$

with $\delta =2$. (TRIGS) stands shortly for Tikhonov regularization of inertial gradient systems. In order not to asymptotically modify the equilibria, we suppose that $\varepsilon (t) \rightarrow 0$ as $t\rightarrow +\infty $. This condition implies that (TRIGS) falls within the framework of the inertial gradient systems with asymptotically vanishing damping. The importance of this class of inertial dynamics has been highlighted by several recent studies (Apidopoulos et al. 2018; Attouch et al. 2023; Attouch and Cabot 2017; Attouch et al. 2018; Attouch and Peypouquet 2016; Chambolle and Dossal 2015; Su et al. 2016), which make the link with the accelerated gradient method of Nesterov Nesterov (1983, 2004).

1.1 Historical facts and related results

In relation to optimization algorithms, a rich literature has been devoted to the coupling of dynamic gradient systems with Tikhonov regularization.

1.1.1 First-order gradient dynamics

For first-order gradient systems and subdifferential inclusions, the asymptotic hierarchical minimization property which results from the introduction of a vanishing viscosity term in the dynamic (in our context the Tikhonov approximation (Tikhonov 1963; Tikhonov and Arsenin 1977)) has been highlighted in a series of papers (Alvarez and Cabot 2006; Attouch 1996; Attouch and Cominetti 1996; Attouch and Czarnecki 2010; Baillon and Cominetti 2001; Cominetti et al. 2008; Hirstoaga 2006). In parallel way, there is a vast literature on convex descent algorithms involving Tikhonov and more general penalty, regularization terms. The historical evolution can be traced back to Fiacco and McCormick (1968), and the interpretation of interior point methods with the help of a vanishing logarithmic barrier. Some more specific references for the coupling of Prox and Tikhonov can be found in Cominetti (1997). The time discretization of the first-order gradient systems and subdifferential inclusions involving multiscale (in time) features provides a natural link between the continuous and discrete dynamics. The resulting algorithms combine proximal based methods (for example forward-backward algorithms), with the viscosity of penalization methods, see (Attouch et al. 2011a, b; Bot and Csetnek 2014; Cabot 2004, 2005; Hirstoaga 2006).

1.1.2 Second order gradient dynamics

First studies concerning the coupling of damped inertial dynamics with Tikhonov approximation concerned the heavy ball with friction system of Polyak (1987), where the damping coefficient $\gamma >0$ is fixed. In Attouch and Czarnecki (2002) Attouch-Czarnecki considered the system

$$\begin{aligned} \ddot{x}(t) + \gamma {\dot{x}}(t) + \nabla f(x(t)) + \varepsilon (t) x(t) =0. \end{aligned}$$

(3)

In the slow parametrization case $\int _0^{+\infty } \varepsilon (t) dt = + \infty $, they proved that any solution $x(\cdot )$ of (3) converges strongly to the minimum norm element of ${{\,\textrm{argmin}\,}}f$, see also Jendoubi and May (2010). A parallel study has been developed for PDE’s, see Alvarez and Attouch (2001) for damped hyperbolic equations with non-isolated equilibria, and Alvarez and Cabot (2006) for semilinear PDE’s. The system (3) is a special case of the general dynamic model

$$\begin{aligned} \ddot{x}(t) + \gamma {\dot{x}}(t) + \nabla f (x(t)) + \varepsilon (t) \nabla g (x(t))=0 \end{aligned}$$

(4)

which involves two functions f and g intervening with different time scale. When $\varepsilon (\cdot )$ tends to zero moderately slowly, it was shown in Attouch and Czarnecki (2017) that the trajectories of (4) converge asymptotically to equilibria that are solutions of the following hierarchical problem: they minimize the function g on the set of minimizers of f. When ${\mathcal {H}}= {{\mathcal {H}}}_1\times {{\mathcal {H}}}_2$ is a product space, defining for $x=(x_1,x_2)$, $f (x_1,x_2):= f_1 (x_1)+f_2 (x_2)$ and $g(x_1,x_2):= \Vert A_1 x_1 -A_2 x_2 \Vert ^2$, where the $A_i,\, i\in \{1,2\}$ are linear operators, (4) provides (weakly) coupled inertial systems. The continuous and discrete-time versions of these systems have a natural connection to the best response dynamics for potential games (Attouch and Czarnecki 2010), domain decomposition for PDE’s (Attouch et al. 2016), optimal transport (Attouch et al. 2010), coupled wave equations (Haraux and Jendoubi 2016).

In the quest for a faster convergence, the following system

$$\begin{aligned} \text{(AVD) }_{\alpha , \varepsilon } \quad \quad \ddot{x}(t) + \frac{\alpha }{t} {\dot{x}}(t) + \nabla f (x(t)) +\varepsilon (t) x(t)=0, \end{aligned}$$

(5)

has been studied by Attouch et al. (2018). It is a Tikhonov regularization of the dynamic

$$\begin{aligned} \text{(AVD) }_{\alpha } \quad \quad \ddot{x}(t) + \frac{\alpha }{t} {\dot{x}}(t) + \nabla f (x(t))=0, \end{aligned}$$

(6)

which was introduced by Su et al. (2016). When $\alpha =3$, $\text{(AVD) }_{\alpha }$ can be viewed as a continuous version of the accelerated gradient method of Nesterov. It has been the subject of many recent studies which have given an in-depth understanding of the Nesterov acceleration method, see Apidopoulos et al. (2018); Attouch and Cabot (2017); Attouch et al. (2018); Su et al. (2016). The results obtained in Attouch et al. (2018) concerning (5) will serve as a basis for comparison.

1.2 Model results

To illustrate our results, let us consider the case $\varepsilon (t) = \frac{c}{t^r}$ where r is positive parameter satisfying $0<r \le 2$. The case $r=2$ is of particular interest, it is related to the continuous version of the accelerated gradient method of Nesterov, with optimal convergence rate for general convex differentiable function f.

1.2.1 Case $r=2$

Let us consider the (TRIGS) dynamic

$$\begin{aligned} \ddot{x}(t) + \frac{\alpha }{t} {\dot{x}}(t) + \nabla f\left( x(t) \right) +\frac{c}{t^2}x(t)=0, \end{aligned}$$

(7)

where the parameter $\alpha \ge 3$ plays a crucial role. As a consequence of Theorems 13 and 14 we have

Theorem 2

Let $x: [t_0, +\infty [ \rightarrow {\mathcal {H}}$ be a solution of (7). We then have the following results:

i)
If $\alpha =3$, then $\displaystyle f\left( x(t)\right) - \min _{{{\mathcal {H}}}} f =O\left( \frac{\ln t}{t^2}\right) \text{ as } t\rightarrow +\infty .$
ii)
If $\alpha >3$, then $\displaystyle f\left( x(t)\right) - \min _{{{\mathcal {H}}}} f =O\left( \frac{1}{t^2}\right) \text{ as } t\rightarrow +\infty .$ Further, the trajectory x is bounded, $\displaystyle \Vert {\dot{x}}(t)\Vert =O\left( \frac{1}{t}\right) \text{ as } t\rightarrow +\infty $, and there is strong convergence to the minimum norm solution:
$$\begin{aligned} \liminf _{t \rightarrow +\infty }{\Vert x(t) - x^*\Vert } = 0. \end{aligned}$$

1.2.2 Case $r<2$

As a consequence of Theorems 11 and 16, we have:

Theorem 3

Take $\varepsilon (t)=1/t^r$, $ \frac{2}{3}<r<2$. Let $x: [t_0, +\infty [ \rightarrow {\mathcal {H}}$ be a global solution trajectory of

$$\begin{aligned} \ddot{x}(t) + \frac{\delta }{t^{\frac{r}{2}}}{\dot{x}}(t) + \nabla f\left( x(t) \right) + \frac{1}{t^{r}} x(t)=0. \end{aligned}$$

Then, we have fast convergence the values, and strong convergence to the minimum norm solution:

$$\begin{aligned} f(x(t))-\min _{{{\mathcal {H}}}} f= {\mathcal {O}} \left( \displaystyle { \frac{1}{t^{\frac{3r}{2}-1}} } \right) \text{ and } \liminf _{t \rightarrow + \infty }{\Vert x(t) - x*\Vert } = 0. \end{aligned}$$

These results are completed by showing that, if there exists $T \ge t_0$, such that the trajectory $\{ x(t):t \ge T \}$ stays either in the open ball $B(0, \Vert x^*\Vert )$ or in its complement, then x(t) converges strongly to $x^*$ as $t\rightarrow +\infty .$ Corresponding results for the associated proximal algorithms, obtained by temporal discretization, are obtained in Sect. 5.

A remarkable property of the above results is that the rate of convergence of values is comparable to the Nesterov accelerated gradient method. In addition, we have a strong convergence property to the minimum norm solution, with comparable numerical complexity. These results represent an important advance compared to previous works by producing new dynamics for which we have both rapid convergence of values and strong convergence towards the solution of minimum norm. Let us stress the fact that in our approach the fast convergence of the values and the strong convergence towards the solution of minimum norm are obtained for the same dynamic, whereas in the previous works (Attouch et al. 2018; Attouch and Czarnecki 2002), they are obtained for different dynamics obtained for different settings of the parameters. It is clear that the results extend naturally to obtaining strong convergence towards the solution closest to a desired state $x_d$. It suffices to replace in Tikhonov’s approximation $\Vert x\Vert ^2$ by $\Vert x-x_d\Vert ^2$. This is important for inverse problems.

Remark 4

We emphasize that in Attouch et al. (2022) the authors also considered the case $\varepsilon (t)=1/t^r$, $0<r<2$ and showed the improved rates $ f\left( x(t)\right) -\min _{{{\mathcal {H}}}} f =O\left( \frac{1}{t^{r}}\right) \text{ as } t\rightarrow +\infty ,$ $\Vert {\dot{x}}(t)\Vert =O\left( \frac{1}{t^{\frac{r+2}{4}}}\right) \text{ as } t\rightarrow +\infty $ and the improved strong convergence result $ \lim _{t \rightarrow \infty }{\Vert x(t) - x^*\Vert } = 0.$ More precisely, by denoting $x_t$ the unique minimizer of the strongly convex function $f(x)+\frac{1}{2t^{r}}\Vert x\Vert ^2$, it is shown in Attouch et al. (2022) that $\Vert x(t) - x_t \Vert =O\left( \frac{1}{t^{\frac{2-r}{4}}}\right) \text{ as } t\rightarrow +\infty $ and this implies that $ \lim _{t \rightarrow \infty }{\Vert x(t) - x^*\Vert } = 0.$ The above results were further extended and improved in László (2023), where the rate $ f\left( x(t)\right) -\min _{{{\mathcal {H}}}} f =O\left( \frac{1}{t^{\frac{2r+2}{3}}}\right) \text{ as } t\rightarrow +\infty $ has been obtained. Similar strong convergence results and fast rates were obtained in Attouch et al. (2023) and László (2024) for a system with explicit hessian driven damping and a system with implicit hessian driven damping, respectively.

Nevertheless, whether the strong convergence result $ \lim _{t \rightarrow \infty }{\Vert x(t) - x^*\Vert } = 0$ can be obtained in case $r=2$ is still a challenging open question.

1.3 Contents

In section 2, we show existence and uniqueness of a global solution for the Cauchy problem associated with (TRIGS). Then, based on Lyapunov analysis, we obtain convergence rates of the values which are valid for a general $\varepsilon (\cdot )$. Section 3 is devoted to an in-depth analysis in the critical case $\varepsilon (t) = c/t^2$. Section 4 is devoted to the study of the strong convergence property of the trajectories towards the minimum norm solution, in the case of a general $\varepsilon (\cdot )$. Then in Sect. 5 we obtain similar results for the associated proximal algorithms, obtained by temporal discretization.

2 Convergence analysis for general $\varepsilon (t)$

We are going to analyze via Lyapunov analysis the convergence properties as $t\rightarrow +\infty $ of the solution trajectories of the inertial dynamic (TRIGS) that we recall below

$$\begin{aligned} \ddot{x}(t) + \delta \sqrt{\varepsilon (t)} {\dot{x}}(t) + \nabla f (x(t)) + \varepsilon (t) x(t) =0. \end{aligned}$$

(8)

Throughout the paper, we assume that $t_0$ is the origin of time, $\delta $ is a positive parameter, and

$(H_1)$ :: $f: {\mathcal {H}} \rightarrow {\mathbb {R}}$ is convex and differentiable, $\nabla f$ is Lipschitz continuous on bounded sets.
$(H_2)$ :: $S:= \text{ argmin } f \ne \emptyset $. We denote by $x^*$ the element of minimum norm of S.
$(H_3)$ :: $\varepsilon : [t_0, +\infty [ \rightarrow {\mathbb {R}}^+ $ is a nonincreasing function, of class ${\mathcal {C}}^1$, such that $\lim _{t \rightarrow \infty }\varepsilon (t) =0$.

2.1 Existence and uniqueness for the Cauchy problem

Let us first show that the Cauchy problem for (TRIGS) is well posed.

Theorem 5

Given $(x_0, v_0) \in {{\mathcal {H}}}\times {{\mathcal {H}}}$, there exists a unique global classical solution $x: [t_0, +\infty [ \rightarrow {\mathcal {H}}$ of the Cauchy problem

$$\begin{aligned} {\left\{ \begin{array}{ll} \ddot{x}(t) + \delta \sqrt{\varepsilon (t)} {\dot{x}}(t) + \nabla f\left( x(t) \right) +\varepsilon (t) x(t)=0 \\ x(t_0) = x_0, \, {\dot{x}}(t_0) = v_0. \end{array}\right. } \end{aligned}$$

(9)

Proof

The proof relies on the combination of the Cauchy-Lipschitz theorem with energy estimates. First consider the Hamiltonian formulation of (9) as the first order system

$$\begin{aligned} {\left\{ \begin{array}{ll} {\dot{x}}(t) - y(t) =0 \\ {\dot{y}}(t) + \delta \sqrt{\varepsilon (t)} y(t) + \nabla f\left( x(t) \right) +\varepsilon (t) x(t)=0 \\ x(t_0) = x_0, \, y(t_0) = v_0. \end{array}\right. } \end{aligned}$$

(10)

According to the hypothesis $(H_1), (H_2), (H_3)$, and by applying the Cauchy-Lipschitz theorem in the locally Lipschitz case, we obtain the existence and uniqueness of a local solution. Then, in order to pass from a local solution to a global solution, we rely on the energy estimate obtained by taking the scalar product of (TRIGS) with ${\dot{x}}(t)$. It gives

$$\begin{aligned} \frac{d}{dt} \Big ( \frac{1}{2}\Vert {\dot{x}}(t)\Vert ^2 + f(x(t)) + \frac{1}{2}\varepsilon (t) \Vert x(t)\Vert ^2 ) \Big ) + \delta \sqrt{\varepsilon (t)} \Vert {\dot{x}}(t)\Vert ^2 - \frac{1}{2}{\dot{\varepsilon }} (t) \Vert x(t)\Vert ^2 =0. \end{aligned}$$

From $(H_3)$, $\varepsilon (\cdot )$ is non-increasing. Therefore, the energy function $t \mapsto W(t)$ is decreasing where

$$\begin{aligned} W(t):= \frac{1}{2}\Vert {\dot{x}}(t)\Vert ^2 + f(x(t)) + \frac{1}{2}\varepsilon (t) \Vert x(t)\Vert ^2. \end{aligned}$$

The end of the proof follows a standard argument. Take a maximal solution defined on an interval $[t_0, T[$. If T is infinite, the proof is over. Otherwise, if T is finite, according to the above energy estimate, we have that $\Vert {\dot{x}}(t)\Vert $ remains bounded, just like $\Vert x(t)\Vert $ and $\Vert \ddot{x}(t)\Vert $ (use (TRIGS)). Therefore, the limit of x(t) and ${\dot{x}}(t)$ exists when $t \rightarrow T$. Applying the local existence result at T with the initial conditions thus obtained gives a contradiction to the maximality of the solution. $\square $

2.2 General case

The control of the decay of $\varepsilon (t)$ to zero as $t \rightarrow +\infty $ will play a key role in the Lyapunov analysis of (TRIGS). Precisely, we will use the following condition.

Definition 1

Given $\delta >0$, we say that $t \mapsto \varepsilon (t)$ satisfies the controlled decay property $\mathrm{(CD)}_K$, if it is a nonincreasing function which satisfies: there exists $t_1\ge t_0$ such that for all $t\ge t_1,$

$$\begin{aligned} \left( \frac{1}{\sqrt{\varepsilon (t)}}\right) '\le \min (2K-\delta , \delta -K), \end{aligned}$$

where K is a parameter such that $ \frac{\delta }{2}< K < \delta $ for $0<\delta \le 2$, and $ \frac{\delta + \sqrt{\delta ^2 -4}}{2}< K < \delta $ for $\delta > 2$.

Theorem 6

Let $x: [t_0, +\infty [ \rightarrow {\mathcal {H}}$ be a solution trajectory of (TRIGS). Let $\delta $ be a positive parameter. Suppose that $\varepsilon (\cdot )$ satisfies the condition $\mathrm{(CD)}_K$ for some $K>0$. Then, we have the following rate of convergence of values: for all $t\ge t_1$

$$\begin{aligned} f(x(t)) - \min _{{{\mathcal {H}}}} f \le \frac{K\Vert x^*\Vert ^2}{2}\frac{1}{{\mathfrak {M}}(t)}\int _{t_1}^t \varepsilon ^{\frac{3}{2}}(s){\mathfrak {M}}(s)ds+\frac{C}{{\mathfrak {M}}(t)}, \end{aligned}$$

(11)

where

$$\begin{aligned} {\mathfrak {M}}(t)=\exp \left( {\displaystyle {\int _{t_1}^t\mu (s)ds}}\right) , \quad \mu (t) =-\frac{{\dot{\varepsilon }}(t)}{2\varepsilon (t)}+ (\delta -K)\sqrt{\varepsilon (t)} \end{aligned}$$

and

$$\begin{aligned} C=\left( f(x(t_1)) - f(x^*) \right) +\frac{\varepsilon (t_1)}{2}\Vert x(t_1)\Vert ^2+ \frac{1}{2} \Vert K\sqrt{\varepsilon (t_1)}(x(t_1)-x^*) + {\dot{x}}(t_1) \Vert ^2. \end{aligned}$$

Proof

Lyapunov analysis. Set $f^*:= f(x^*)=\min _{{{\mathcal {H}}}} f$. The energy function ${\mathcal {E}}: [t_0, +\infty [ \rightarrow {\mathbb {R}}_+,$

$$\begin{aligned} {\mathcal {E}}(t):= \left( f(x(t)) - f^*\right) +\frac{\varepsilon (t)}{2}\Vert x(t)\Vert ^2+ \frac{1}{2} \Vert c(t)(x(t)-x^*) + {\dot{x}}(t) \Vert ^2, \end{aligned}$$

(12)

will be the basis for our Lyapunov analysis. The function $c:[t_0,+\infty [\rightarrow {\mathbb {R}}$ will be defined later, appropriately. Let us differentiate ${\mathcal {E}}(\cdot )$. By using the derivation chain rule, we get

$$\begin{aligned} \dot{{\mathcal {E}}}(t)&= \langle {\nabla }f(x(t)),{\dot{x}}(t)\rangle +\frac{{\dot{\varepsilon }}(t)}{2}\Vert x(t)\Vert ^2+\varepsilon (t)\langle {\dot{x}}(t),x(t)\rangle \nonumber \\&\quad +\langle c'(t)(x(t)-x^*)+c(t){\dot{x}}(t) +\ddot{x}(t),c(t)(x(t)-x^*) + {\dot{x}}(t)\rangle . \end{aligned}$$

(13)

According to the constitutive Eq. (8), we have

$$\begin{aligned} \ddot{x}(t) = - \varepsilon (t) x(t)-\delta \sqrt{\varepsilon (t)} {\dot{x}}(t) - \nabla f(x(t)). \end{aligned}$$

(14)

Therefore,

$$\begin{aligned}{} & {} \langle c'(t)(x(t)-x^*)+c(t){\dot{x}}(t) +\ddot{x}(t),c(t)(x(t)-x^*) + {\dot{x}}(t)\rangle \nonumber \\{} & {} \quad = \langle c'(t)(x(t)-x^*)+(c(t)-\delta \sqrt{\varepsilon (t)}){\dot{x}}(t) -(\varepsilon (t)x(t)\nonumber \\{} & {} \qquad +\nabla f(x(t))), c(t)(x(t)-x^*) + {\dot{x}}(t)\rangle \nonumber \\{} & {} \quad =c'(t)c(t)\Vert x(t)-x^*\Vert ^2+(c'(t)+c^2(t)-\delta c(t)\sqrt{\varepsilon (t)})\langle {\dot{x}}(t),x(t)-x^*\rangle \nonumber \\{} & {} \qquad +(c(t)-\delta \sqrt{\varepsilon (t)})\Vert {\dot{x}}(t)\Vert ^2 \nonumber \\{} & {} \qquad -\varepsilon (t)\langle x(t),{\dot{x}}(t)\rangle -\langle \nabla f(x(t)), {\dot{x}}(t)\rangle -c(t)\langle \varepsilon (t)x(t)+\nabla f(x(t)), x(t)-x^*\rangle .\nonumber \\ \end{aligned}$$

(15)

By combining (13) with (15), we get

$$\begin{aligned} \dot{{\mathcal {E}}}(t)&=\frac{{\dot{\varepsilon }}(t)}{2}\Vert x(t)\Vert ^2+c'(t)c(t)\Vert x(t)-x^*\Vert ^2\nonumber \\&\quad +(c'(t)+c^2(t)-\delta c(t)\sqrt{\varepsilon (t)})\langle {\dot{x}}(t),x(t)-x^*\rangle \nonumber \\&\quad +(c(t)-\delta \sqrt{\varepsilon (t)})\Vert {\dot{x}}(t)\Vert ^2-c(t)\langle \varepsilon (t)x(t)+\nabla f(x(t)), x(t)-x^*\rangle . \end{aligned}$$

(16)

Consider the function

$$\begin{aligned} f_t:{\mathcal {H}}\longrightarrow {\mathbb {R}},\,f_t(x)=f(x)+\frac{\varepsilon (t)}{2}\Vert x\Vert ^2. \end{aligned}$$

According to the strong convexity property of $f_t$, we have

$$\begin{aligned} f_t(y)-f_t(x)\ge \langle {\nabla }f_t(x),y-x\rangle +\frac{\varepsilon (t)}{2}\Vert x-y\Vert ^2, \text{ for } \text{ all } x,y\in {\mathcal {H}}. \end{aligned}$$

Take $y=x^*$ and $x=x(t)$ in the above inequality. We get

$$\begin{aligned}&f(x^*)+\frac{\varepsilon (t)}{2}\Vert x^*\Vert ^2-f(x(t))-\frac{\varepsilon (t)}{2}\Vert x(t)\Vert ^2\ge \\&\quad -\langle {\nabla }f(x(t))+\varepsilon (t)(x(t),x(t)-x^*\rangle +\frac{\varepsilon (t)}{2}\Vert x(t)-x^*\Vert ^2. \end{aligned}$$

Consequently,

$$\begin{aligned}{} & {} -\langle {\nabla }f(x(t))+\varepsilon (t)x(t),x(t)-x^*\rangle \le -(f(x(t))-f(x^*)) \nonumber \\{} & {} +\frac{\varepsilon (t)}{2}\Vert x^*\Vert ^2-\frac{\varepsilon (t)}{2}\Vert x(t)\Vert ^2-\frac{\varepsilon (t)}{2}\Vert x(t)-x^*\Vert ^2. \end{aligned}$$

(17)

By multiplying (17) with c(t) and injecting in (16) we get

$$\begin{aligned} \dot{{\mathcal {E}}}(t)&\le -c(t)(f(x(t))-f^*)+\left( \frac{{\dot{\varepsilon }}(t)}{2}-c(t)\frac{\varepsilon (t)}{2}\right) \Vert x(t)\Vert ^2\nonumber \\&\quad +\left( c'(t)c(t)-c(t)\frac{\varepsilon (t)}{2}\right) \Vert x(t)-x^*\Vert ^2+(c(t)-\delta \sqrt{\varepsilon (t)})\Vert {\dot{x}}(t)\Vert ^2\nonumber \\&\quad +(c'(t)+c^2(t)-\delta c(t)\sqrt{\varepsilon (t)})\langle {\dot{x}}(t),x(t)-x^*\rangle +c(t)\frac{\varepsilon (t)}{2}\Vert x^*\Vert ^2. \end{aligned}$$

(18)

On the other hand, for a positive function $\mu (t)$ we have

$$\begin{aligned} \mu (t){\mathcal {E}}(t)&=\mu (t) \left( f(x(t)) - f^*\right) +\mu (t)\frac{\varepsilon (t)}{2}\Vert x(t)\Vert ^2\nonumber \\&\quad + \frac{1}{2}\mu (t)c^2(t) \Vert x(t)-x^*\Vert ^2 +\frac{1}{2}\mu (t) \Vert {\dot{x}}(t) \Vert ^2\nonumber \\&\quad + \mu (t)c(t)\langle {\dot{x}}(t),x(t)-x^*\rangle . \end{aligned}$$

(19)

By adding (18) and (19) we get

$$\begin{aligned} \dot{{\mathcal {E}}}(t)+\mu (t){\mathcal {E}}(t)&\le (\mu (t)-c(t))( f(x(t)) - f^*)+\left( \frac{{\dot{\varepsilon }}(t)}{2}-c(t)\frac{\varepsilon (t)}{2}+\mu (t)\frac{\varepsilon (t)}{2}\right) \Vert x(t)\Vert ^2\nonumber \\&\quad +\left( c'(t)c(t)-c(t)\frac{\varepsilon (t)}{2}+\frac{1}{2}\mu (t)c^2(t)\right) \Vert x(t)-x^*\Vert ^2\nonumber \\&\quad +\left( c(t)-\delta \sqrt{\varepsilon (t)}+\frac{1}{2}\mu (t)\right) \Vert {\dot{x}}(t)\Vert ^2\nonumber \\&\quad +\left( c'(t)+c^2(t)-\delta c(t)\sqrt{\varepsilon (t)}+ \mu (t)c(t)\right) \langle {\dot{x}}(t),x(t)-x^*\rangle \nonumber \\&\quad +c(t)\frac{\varepsilon (t)}{2}\Vert x^*\Vert ^2. \end{aligned}$$

(20)

Since we have no control on the sign of $\langle {\dot{x}}(t),x(t)-x^*\rangle $, we take the coefficient in front of this term equal to zero, that is

$$\begin{aligned} c'(t)+c^2(t)-\delta c(t)\sqrt{\varepsilon (t)}+ \mu (t)c(t) =0. \end{aligned}$$

(21)

Take $c(t)=K\sqrt{\varepsilon (t)}$. Indeed, it is here that the choice of c, and of the corresponding parameter K, come into play. The relation (21) can be equivalently written

$$\begin{aligned} \mu (t) =-\frac{{\dot{\varepsilon }}(t)}{2\varepsilon (t)}+ (\delta -K)\sqrt{\varepsilon (t)}. \end{aligned}$$

According to this choice for $\mu (t)$ and c(t), the inequality (20) becomes

$$\begin{aligned} \dot{{\mathcal {E}}}(t)+\mu (t){\mathcal {E}}(t)&\le \frac{1}{2\varepsilon (t)}\left( -{\dot{\varepsilon }}(t)+2(\delta -2K){\varepsilon (t)}^{\frac{3}{2}}\right) ( f(x(t)) - f^*)\nonumber \\&\quad +\frac{1}{4}\left( {\dot{\varepsilon }}(t) + 2\left( \delta -2K \right) \varepsilon (t)^{\frac{3}{2}}\right) \Vert x(t)\Vert ^2\nonumber \\&\quad +\frac{K}{4}\left( K{\dot{\varepsilon }}(t) +2\varepsilon (t)^{\frac{3}{2}}(-K^2+ \delta K -1) \right) \Vert x(t)-x^*\Vert ^2\nonumber \\&\quad + \frac{1}{4\varepsilon (t)} \left( -{\dot{\varepsilon }}(t)+2(K- \delta ){\varepsilon (t)}^{\frac{3}{2}} \right) \Vert {\dot{x}}(t)\Vert ^2 +\frac{K\Vert x^*\Vert ^2}{2}\varepsilon ^{\frac{3}{2}}(t). \end{aligned}$$

(22)

Let us show that the condition $\mathrm{(CD)}_K$ provide the nonpositive sign for the coefficients in front of the terms of the right side of (22). Recall that, according to the hypotheses $\mathrm{(CD)}_K$, for all $t\ge t_1$ we have the properties a) and b):

$$\begin{aligned}{} & {} a) \, \left( \frac{1}{\sqrt{\varepsilon (t)}}\right) '\le M_1 (K)= \min (2K-\delta , \delta -K)=\left\{ \begin{array}{ll}2K-\delta \text{ if } K\le \frac{2}{3} \delta \\ \delta -K, \text{ if } \frac{2}{3} \delta \le K,\end{array}\right. \\{} & {} b) \, \left( \frac{1}{\sqrt{\varepsilon (t)}}\right) '\ge 0. \end{aligned}$$

Without ambiguity we write briefly $M_1$ for $M_1 (K)$. Note that b) just expresses that $\varepsilon (\cdot )$ is non increasing. According to the hypotheses $\mathrm{(CD)}_K$, we claim that for all $t\ge t_1$

$$\begin{aligned} \left\{ \begin{array}{llll} i) \, \left( \frac{1}{\sqrt{\varepsilon (t)}}\right) '\le 2K-\delta \\ ii) \, \left( \frac{1}{\sqrt{\varepsilon (t)}}\right) '\ge \frac{\delta K-K^2-1}{K} \\ iii) \, \left( \frac{1}{\sqrt{\varepsilon (t)}}\right) '\le \delta -K. \end{array}\right. \end{aligned}$$

(23)

Let us justify these inequalities (23).

i)
is a consequence of $\left( \frac{1}{\sqrt{\varepsilon (t)}}\right) '\le M_1$ and $M_1\le 2K-\delta $.
ii)
is a consequence of $\left( \frac{1}{\sqrt{\varepsilon (t)}}\right) '\ge 0$ and $\delta K-K^2-1 \le 0$. Precisely, when $\delta \le 2$ we have $\delta K-K^2-1 \le 2 K-K^2-1 \le 0$. When $\delta > 2$, we have $\delta K-K^2-1 \le 0$ because $K \ge \frac{\delta + \sqrt{\delta ^2 -4}}{2} $.
iii)
is a consequence of $\left( \frac{1}{\sqrt{\varepsilon (t)}}\right) '\le M_1$ and $M_1\le \delta -K$.

The inequalities (23) can be equivalently written as follows: for all $t\ge t_1$

$$\begin{aligned} \left\{ \begin{array}{llll} i) \, -{\dot{\varepsilon }}(t)+2(\delta -2K){\varepsilon (t)}^{\frac{3}{2}}\le 0 \\ ii) \, K{\dot{\varepsilon }}(t)+2(\delta K-K^2-1){\varepsilon (t)}^{\frac{3}{2}}\le 0\\ iii) \, -{\dot{\varepsilon }}(t)+2(K-\delta ){\varepsilon (t)}^{\frac{3}{2}}\le 0. \end{array}\right. \end{aligned}$$

(24)

The inequalities (24) give that the coefficients entering the right side of (22) are nonpositive:

i) gives that the coefficient of $f(x(t)) - f^*$ is nonpositive.
Since ${\dot{\varepsilon }}(t)\le 0$ we have ${\dot{\varepsilon }}(t)+2(\delta -2K){\varepsilon (t)}^{\frac{3}{2}}\le -{\dot{\varepsilon }}(t)+2(\delta -2K){\varepsilon (t)}^{\frac{3}{2}}$. Therefore, by i) we have that the coefficient of $\Vert x(t)\Vert ^2$ in (22) is nonpositive.
ii) gives that the coefficient of $\Vert x(t)-x^*\Vert ^2$ is nonpositive.
iii) gives that the coefficient of $\Vert {\dot{x}}(t)\Vert ^2 $ is nonpositive.

Let us return to (22). Using (24) and the above results, we obtain

$$\begin{aligned} \dot{{\mathcal {E}}}(t)+\mu (t){\mathcal {E}}(t)\le \frac{K\Vert x^*\Vert ^2}{2}\varepsilon ^{\frac{3}{2}}(t), \text{ for } \text{ all } t\ge t_1. \end{aligned}$$

(25)

By multiplying (25) with ${\mathfrak {M}}(t)= \exp \left( {\displaystyle {\int _{t_1}^t\mu (s)ds}}\right) $ we obtain

$$\begin{aligned} \frac{d}{dt}\left( {\mathfrak {M}}(t){\mathcal {E}}(t)\right) \le \frac{K\Vert x^*\Vert ^2}{2}\varepsilon ^{\frac{3}{2}}(t){\mathfrak {M}}(t). \end{aligned}$$

(26)

By integrating (26) on $[t_1,t]$ we get

$$\begin{aligned} {\mathcal {E}}(t)\le \frac{K\Vert x^*\Vert ^2}{2}\frac{\int _{t_1}^t \varepsilon ^{\frac{3}{2}}(s){\mathfrak {M}}(s)ds}{{\mathfrak {M}}(t)}+\frac{{\mathfrak {M}}(t_1){\mathcal {E}}(t_1)}{{\mathfrak {M}}(t)}. \end{aligned}$$

(27)

By definition of ${\mathcal {E}}(t)$ we deduce that

$$\begin{aligned} f(x(t)) - \min _{{{\mathcal {H}}}} f \le \frac{K\Vert x^*\Vert ^2}{2}\frac{\int _{t_1}^t \varepsilon ^{\frac{3}{2}}(s){\mathfrak {M}}(s)ds}{{\mathfrak {M}}(t)}+\frac{{\mathcal {E}}(t_1)}{{\mathfrak {M}}(t)}, \end{aligned}$$

(28)

for all $t\ge t_1$, and this gives the convergence rate of the values. $\square $

Remark 7

By integrating the relation $0\le \left( \frac{1}{\sqrt{\varepsilon (t)}}\right) '\le M_1$ on an interval $[t_1, t]$, we get

$$\begin{aligned} \frac{1}{\sqrt{\varepsilon (t_1)}}\le \frac{1}{\sqrt{\varepsilon (t)}}\le M_1 t+\frac{1}{\sqrt{\varepsilon (t_1)}}-M_1 t_1. \end{aligned}$$

Therefore, denoting $C_1=\frac{1}{\sqrt{\varepsilon (t_1)}}-M_1 t_1$, and $C_2=\varepsilon (t_1)$ we have

$$\begin{aligned} \frac{1}{(M_1 t+C_1)^2} \le \varepsilon (t) \le C_2. \end{aligned}$$

(29)

This shows that the Lyapunov analysis developed previously only provides information in the case where $ \varepsilon (t) $ is greater than or equal to $ C / t^2 $. Since the damping coefficient $\gamma (t) = \delta \sqrt{\varepsilon (t)}$, this means that $\gamma (t)$ must be greater than or equal to C/t. This is in accordance with the theory of inertial gradient systems with time-dependent viscosity coefficient, which states that the asymptotic optimization property is valid provided that the integral on $[t_0, +\infty [$ of $\gamma (t)$ is infinite, see Attouch and Cabot (2017).

As a consequence of Theorem 6 we have the following result.

Corollary 8

Under the hypothesis of Theorem 6 we have

$$\begin{aligned} \lim _{t\rightarrow +\infty }{\mathfrak {M}}(t)=+\infty . \end{aligned}$$

(30)

Suppose moreover that $\varepsilon ^{\frac{3}{2}}(\cdot )\in L^1(t_0,+\infty )$. Then

$$\begin{aligned} \lim _{t\rightarrow +\infty }f(x(t))=\min _{{{\mathcal {H}}}} f. \end{aligned}$$

(31)

Proof

By definition of $\mu (t)$, since $\varepsilon (\cdot )$ is nonincreasing and $ \delta \ge K$, we have that $\mu (t)$ is nonnegative for all $t\ge t_1$. Therefore, $t \mapsto {\mathfrak {M}}(t)$ is a nondecreasing function. Let us write equivalently $\mu (t)=\frac{d}{dt}\ln \frac{1}{\sqrt{\varepsilon (t)}}+ (\delta -K)\sqrt{\varepsilon (t)}$, and integrate on $[t_1,t]$. We obtain

$$\begin{aligned} {\mathfrak {M}}(t)= \exp \left( {\displaystyle {\int _{t_1}^t\mu (s)ds}}\right) =\frac{C}{\sqrt{\varepsilon (t)}}\exp \left( \int _{t_1}^t (\delta -K)\sqrt{\varepsilon (s)}ds\right) . \end{aligned}$$

Since $ \delta -K\ge 0$, we deduce that ${\mathfrak {M}}(t) \ge \frac{C}{\sqrt{\varepsilon (t)}}$. Since $\lim _{t \rightarrow \infty }\varepsilon (t) =0$, we get

$$\begin{aligned} \lim _{t\rightarrow +\infty }{\mathfrak {M}}(t)=+\infty . \end{aligned}$$

Moreover, if we suppose that $\varepsilon ^{\frac{3}{2}}(\cdot )\in L^1(t_0,+\infty )$, then by Attouch et al. (2018, Lemma A.3) we obtain

$$\begin{aligned} \lim _{t\rightarrow +\infty } \frac{\int _{t_1}^t \varepsilon ^{\frac{3}{2}}(s){\mathfrak {M}}(s)ds}{{\mathfrak {M}}(t)}=0. \end{aligned}$$

Combining these properties with the convergence rate (11) of Theorem 6, we obtain (31). $\square $

2.3 Particular cases

Since $\varepsilon (t) \rightarrow 0$ as $t \rightarrow + \infty $, (TRIGS) falls within the setting of the inertial dynamics with an asymptotic vanishing damping coefficient $\gamma (t)$. Here, $\gamma (t) = \delta \sqrt{\varepsilon (t)} $. We know with (Cabot et al. 2009) that for such systems, the optimization property is satisfied asymptotically if $\int _{t_0}^{+\infty } \gamma (t) dt =+\infty $ (i.e. $\gamma (t)$ does no tend too rapidly towards zero). By taking $\varepsilon (t) = \frac{c}{t^p}$, it is easy to verify that the condition $\mathrm{(CD)}_K$ is satisfied if $p \le 2$, that is $\sqrt{\varepsilon (t)} = \frac{c}{t^p}$, with $p\le 1$, which is in accordance with the above property. Let us particularize Theorem 6 to situations where the integrals can be computed (at least estimated).

2.3.1 $\varepsilon (t) $ of order $1/t^2$

Take

$$\begin{aligned} \varepsilon (t)=\frac{1}{(M t+C)^2},\, \, M < M_1 (K),\ \, C\le C_1. \end{aligned}$$

Then, $\left( \frac{1}{\sqrt{\varepsilon (t)}}\right) '\le M_1 (K)$ for all $t\ge t_0$ and the condition $\mathrm{(CD)}_K$ is satisfied. Moreover,

$$\begin{aligned} \mu (t)=\frac{M+\delta -K}{Mt+C}, \quad {\mathfrak {M}}(t)=\left( \frac{Mt+C}{Mt_0+C}\right) ^{\frac{M+\delta -K}{M}}. \end{aligned}$$

Therefore, (11) becomes

$$\begin{aligned} {\mathcal {E}}(t)\le \frac{K\Vert x^*\Vert ^2}{2}\frac{\displaystyle {\int _{t_0}^t {(Ms+C)}^{\frac{-2M+\delta -K}{M}}ds}}{{(Mt+C)}^{\frac{M+\delta -K}{M}}}+\frac{(Mt_0+C)^{\frac{M+\delta -K}{M}}{\mathcal {E}}(t_0)}{(Mt+C)^{\frac{M+\delta -K}{M}}}. \end{aligned}$$

(32)

Consequently, we have

$$\begin{aligned}{} & {} {\mathcal {E}}(t)\le \frac{K\Vert x^*\Vert ^2}{2(-M+\delta -K)}\frac{1}{{(Mt+C)}^2}\\{} & {} \quad +\frac{-\frac{K\Vert x^*\Vert ^2}{2(-M+\delta -K)}(Mt_0+C)^{\frac{-M+\delta -K}{M}}+(Mt_0+C)^{\frac{M+\delta -K}{M}}{\mathcal {E}}(t_0)}{(Mt+C)^{\frac{M+\delta -K}{M}}}. \end{aligned}$$

By assumption we have $M < M_1 \le \delta -K$. Therefore $\frac{M+\delta -K}{M} > 2$ and $-M +\delta -K > 0$. It follows that when $Mt +C\ge 1$

$$\begin{aligned} {\mathcal {E}}(t)\le \frac{C'}{(Mt+C)^2}, \, \text{ with } \, \displaystyle {C'=\frac{K\Vert x^*\Vert ^2}{2(-M+\delta -K)}+(Mt_0+C)^{\frac{M+\delta -K}{M}}{\mathcal {E}}(t_0)}. \end{aligned}$$

Observe that $\displaystyle {\delta \sqrt{\varepsilon (t)}=\frac{\frac{\delta }{M}}{t+\frac{C}{M}}=\frac{\alpha }{t+\beta }}$, where we set $\alpha = \frac{\delta }{M}$ and $\beta =\frac{C}{M}$. Since $M< M_1\le \frac{1}{3} \delta $ we get $\alpha \in \big ]3,+\infty \big [$. Indeed, we can get any $\alpha > 3$. Note also that by translating the time scale the result in the general case $ \beta \ge 0 $ results from its obtaining for a particular case $ \beta =0$. According to the fact that we can take for $\delta $ any positive number, we obtain

Theorem 9

Take $\alpha \in \big ]3,+\infty \big [,$ $c >0$. Let $x: [t_0, +\infty [ \rightarrow {\mathcal {H}}$ be a solution trajectory of

$$\begin{aligned} \ddot{x}(t) + \frac{\alpha }{t} {\dot{x}}(t) + \nabla f\left( x(t) \right) +\frac{c}{t^2}x(t)=0. \end{aligned}$$

Then, the following convergence rate of the values is satisfied: as $t\rightarrow +\infty $

$$\begin{aligned} f(x(t))-\min _{{{\mathcal {H}}}} f=O\left( \frac{1}{t^2}\right) . \end{aligned}$$

Remark 10

It is an natural question to compare our dynamic ($c>0$) with the (Su et al. 2016) ($c=0$), which was introduced as a continuous version of the Nesterov accelerated gradient method. We obtain the optimal convergence rate of values with an additional Tikhonov regularization term, which is a remarkable property. In fact, in the next sections we will prove that the Tikhonov term induces strong convergence of the trajectory to the minimum norm solution.

2.3.2 $\varepsilon (t)$ of order $1/t^r$, $\frac{2}{3}< r<2$

Take $\varepsilon (t)=1/t^r$, $r<2$. Then

$$\begin{aligned} \mu (t)= & {} -\frac{1}{2}\frac{{\dot{\varepsilon }}(t)}{\varepsilon (t)}+ (\delta -K)\sqrt{\varepsilon (t)}\\= & {} \frac{r}{2t}+ \frac{\delta -K}{ t^\frac{r}{2}}. \end{aligned}$$

Therefore

$$\begin{aligned} {\mathfrak {M}}(t)= & {} \exp {\displaystyle {\int _{t_0}^t \left( \frac{r}{2s}+ \frac{\delta -K}{ s^\frac{r}{2}}\right) ds}}= C \displaystyle {t^{ \frac{r}{2}} \exp \left( \frac{2(\delta -K)}{2-r } t^{1-\frac{r}{2}}\right) }. \end{aligned}$$

Set

$$\begin{aligned} m(t):=\displaystyle {t^{ \frac{r}{2}} \exp \left( \frac{2(\delta -K)}{2-r } t^{1-\frac{r}{2}}\right) }. \end{aligned}$$

According to (28) we have that for some $C_1>0$

$$\begin{aligned} f(x(t)) - \min _{{{\mathcal {H}}}} f\le & {} \frac{C_1}{m(t)} \int _{t_0}^t \frac{m(s)}{s^{\frac{3r}{2}}}ds +\frac{C_1}{m(t)}. \end{aligned}$$

(33)

Note that according to $r<2$, m(t) is an increasing function which has an exponential growth as $t\rightarrow +\infty $. Accordingly, by the mean value theorem we have the following majorization.

$$\begin{aligned} \frac{1}{m(t)} \int _{t_0}^t \frac{m(s)}{s^{\frac{3r}{2}}}ds\le & {} \frac{m(t)}{m(t)} \int _{t_0}^{t} \frac{1}{s^{\frac{3r}{2}}}ds = {\mathcal {O}} \left( \frac{1}{t^{\frac{3r}{2}-1}} \right) . \end{aligned}$$

(34)

Let us summarize these results in the following statement.

Theorem 11

Take $\varepsilon (t)=1/t^r$, $ \frac{2}{3}<r<2$, $\delta >0$. Let $x: [t_0, +\infty [ \rightarrow {\mathcal {H}}$ be a global trajectory of

$$\begin{aligned} \ddot{x}(t) + \frac{\delta }{t^{\frac{r}{2}}}{\dot{x}}(t) + \nabla f\left( x(t) \right) + \frac{1}{t^{r}} x(t)=0. \end{aligned}$$

Then, the following convergence rate of the values is satisfied: as $t\rightarrow +\infty $

$$\begin{aligned} f(x(t))-\min _{{{\mathcal {H}}}} f= {\mathcal {O}} \left( \displaystyle { \frac{1}{t^{\frac{3r}{2}-1}} } \right) . \end{aligned}$$

Remark 12

When $r \rightarrow 2$ the exponent $\frac{3r}{2}-1$ tends to 2. So there is a continuous transition in the convergence rate. As in Remark 10 the additional Tikhonov regularization term is expected to have a regularization effect (even better than in the case $r=2$). In addition, the above analysis makes appear another critical value, namely $r= \frac{2}{3}$.

3 In-depth analysis in the critical case $\varepsilon (t) = c/t^2$

Let us refine our analysis in the case where the Tikhonov regularization coefficient and the damping coefficient are respectively of order $1/t^2$ and 1/t. Our analysis will now take into account the coefficients $\alpha $ and c in front of these terms. So the Cauchy problem for (TRIGS) is written

$$\begin{aligned} {\left\{ \begin{array}{ll} \ddot{x}(t) + \frac{\alpha }{t} {\dot{x}}(t) + \nabla f\left( x(t) \right) +\frac{c}{t^2}x(t)=0\\ x(t_0) = x_0, \, {\dot{x}}(t_0) = v_0, \end{array}\right. } \end{aligned}$$

(35)

where $t_0> 0,\,c> 0$, $(x_0,v_0) \in {\mathcal {H}} \times {\mathcal {H}},$ and $\alpha \ge 3$. The starting time $t_0$ is taken strictly greater than zero to take into account the fact that the functions $\frac{c}{t^2}$ and $\frac{\alpha }{t}$ have singularities at 0. This is not a limitation of the generality of the proposed approach, since we will focus on the asymptotic behaviour of the generated trajectories.

3.1 Convergence rate of the values

Theorem 13

Let $t_0 > 0$ and, for some initial data $x_0,v_0\in {\mathcal {H}}$, let $x: [t_0, +\infty [ \rightarrow {\mathcal {H}}$ be the unique global solution of (35). Then, the following results hold.

i)
If $\alpha =3$, then $\displaystyle f\left( x(t)\right) - \min _{{{\mathcal {H}}}} f =O\left( \frac{\ln t}{t^2}\right) \text{ as } t\rightarrow +\infty .$
ii)
If $\alpha >3$, then $\displaystyle f\left( x(t)\right) - \min _{{{\mathcal {H}}}} f =O\left( \frac{1}{t^2}\right) \text{ as } t\rightarrow +\infty .$ Further, the trajectory x is bounded and $\displaystyle \Vert {\dot{x}}(t)\Vert =O\left( \frac{1}{t}\right) \text{ as } t\rightarrow +\infty .$

Proof

The analysis is parallel to that of Theorem 6. Set $f^*:= f(x^*)=\min _{{{\mathcal {H}}}} f$. Let $b:[t_0,+\infty [\rightarrow {\mathbb {R}}$, $b(t)=\frac{K}{t}$ where $K>0$ will be defined later. Let us introduce ${\mathcal {E}}: [t_0, +\infty [ \rightarrow {\mathbb {R}},$

$$\begin{aligned} {\mathcal {E}}(t) := \left( f(x(t)) - f^*\right) +\frac{c}{2t^2}\Vert x(t)\Vert ^2+ \frac{1}{2} \Vert b(t)(x(t)-x^*) + {\dot{x}}(t) \Vert ^2, \end{aligned}$$

(36)

that will serve as a Lypaunov function. Then,

$$\begin{aligned} \dot{{\mathcal {E}}}(t)&= \langle {\nabla }f(x(t)),{\dot{x}}(t)\rangle -\frac{c}{t^3}\Vert x(t)\Vert ^2+\frac{c}{t^2}\langle {\dot{x}}(t),x(t)\rangle \nonumber \\&\quad +\langle b'(t)(x(t)-x^*)+b(t){\dot{x}}(t) +\ddot{x}(t),b(t)(x(t)-x^*) + {\dot{x}}(t)\rangle . \end{aligned}$$

(37)

According to the dynamic system (35), we have

$$\begin{aligned} \ddot{x}(t) = - \frac{c}{t^2} x(t)-\frac{\alpha }{t} {\dot{x}}(t) - \nabla f(x(t)). \end{aligned}$$

(38)

Therefore,

$$\begin{aligned}&\langle b'(t)(x(t)-x^*)+b(t){\dot{x}}(t) +\ddot{x}(t),b(t)(x(t)-x^*) + {\dot{x}}(t)\rangle =\nonumber \\&\quad \left\langle -\frac{K}{t^2}(x(t)-x^*)+\frac{K-\alpha }{t}{\dot{x}}(t) -\left( \frac{c}{t^2}x(t)+\nabla f(x(t))\right) , \frac{K}{t}(x(t)-x^*) + {\dot{x}}(t)\right\rangle =\nonumber \\&\quad -\frac{K^2}{t^3}\Vert x(t)-x^*\Vert ^2+\frac{K^2-\alpha K-K}{t^2}\langle {\dot{x}}(t),x(t)-x^*\rangle +\frac{K-\alpha }{t}\Vert {\dot{x}}(t)\Vert ^2\nonumber \\&\quad -\frac{c}{t^2}\langle x(t),{\dot{x}}(t)\rangle -\langle \nabla f(x(t)), {\dot{x}}(t)\rangle -\frac{K}{t}\left\langle \frac{c}{t^2}x(t)+\nabla f(x(t)), x(t)-x^*\right\rangle . \end{aligned}$$

(39)

Combining (37) and (39), we get

$$\begin{aligned} \dot{{\mathcal {E}}}(t)&=-\frac{c}{t^3}\Vert x(t)\Vert ^2-\frac{K^2}{t^3}\Vert x(t)-x^*\Vert ^2\nonumber \\&\quad +\frac{K^2-\alpha K-K}{t^2}\langle {\dot{x}}(t),x(t)-x^*\rangle +\frac{K-\alpha }{t}\Vert {\dot{x}}(t)\Vert ^2\nonumber \\&\quad -\frac{K}{t}\left\langle \frac{c}{t^2}x(t)+\nabla f(x(t)), x(t)-x^*\right\rangle . \end{aligned}$$

(40)

Consider the strongly convex function

$$\begin{aligned} f_t:{\mathcal {H}}\longrightarrow {\mathbb {R}},\, \, f_t(x)=f(x)+\frac{c}{2t^2}\Vert x\Vert ^2. \end{aligned}$$

From the gradient inequality we have

$$\begin{aligned} f_t(y)-f_t(x)\ge \langle {\nabla }f_t(x),y-x\rangle +\frac{c}{2t^2}\Vert x-y\Vert ^2, \text{ for } \text{ all } x,y\in {\mathcal {H}}. \end{aligned}$$

Take $y=x^*$ and $x=x(t)$ in the above inequality. We obtain

$$\begin{aligned}&f^*+\frac{c}{2t^2}\Vert x^*\Vert ^2-f(x(t))-\frac{c}{2t^2}\Vert x(t)\Vert ^2\ge \\&\quad -\left\langle {\nabla }f(x(t))+\frac{c}{t^2}x(t),x(t)-x^*\right\rangle +\frac{c}{2t^2}\Vert x(t)-x^*\Vert ^2. \end{aligned}$$

Consequently,

$$\begin{aligned}&-\left\langle \frac{c}{t^2}x(t)+{\nabla }f(x(t)),x(t)-x^*\right\rangle \nonumber \\&\le -(f(x(t))-f^*)-\frac{c}{2t^2}\Vert x(t)\Vert ^2-\frac{c}{2t^2}\Vert x(t)-x^*\Vert ^2\nonumber \\&\quad +\frac{c}{2t^2}\Vert x^*\Vert ^2. \end{aligned}$$

(41)

By multiplying (41) with $\frac{K}{t}$, and injecting in (40), we obtain

$$\begin{aligned} \dot{{\mathcal {E}}}(t)&\le -\frac{K}{t}(f(x(t))-f^*)-\left( \frac{c}{t^3}+\frac{Kc}{2t^3}\right) \Vert x(t)\Vert ^2-\left( \frac{K^2}{t^3}+\frac{Kc}{2t^3}\right) \Vert x(t)-x^*\Vert ^2\nonumber \\&\quad +\frac{K^2-\alpha K-K}{t^2}\langle {\dot{x}}(t),x(t)-x^*\rangle +\frac{K-\alpha }{t}\Vert {\dot{x}}(t)\Vert ^2+ \frac{cK}{2t^3}\Vert x^*\Vert ^2. \end{aligned}$$

(42)

On the other hand, by multiplying the function ${\mathcal {E}}(t)$ by $\mu (t)=\frac{\alpha -K+1}{t}$, we obtain

$$\begin{aligned} \mu (t){\mathcal {E}}(t)&=\frac{\alpha -K+1}{t} \left( f(x(t)) - f^*\right) +\frac{(\alpha -K+1)c}{2 t^3}\Vert x(t)\Vert ^2\nonumber \\&\quad + \frac{(\alpha -K+1)K^2}{2t^3} \Vert x(t)-x^*\Vert ^2 \nonumber \\&\quad +\frac{\alpha -K+1}{2 t} \Vert {\dot{x}}(t) \Vert ^2+ \frac{(\alpha -K+1)K}{t^2}\langle {\dot{x}}(t),x(t)-x^*\rangle . \end{aligned}$$

(43)

By adding (42) and (43), we get

$$\begin{aligned} \dot{{\mathcal {E}}}(t)+\mu (t){\mathcal {E}}(t)&\le \frac{\alpha -2K+1}{t}(f(x(t))-f^*)+\frac{(\alpha -2K-1)c}{2t^3}\Vert x(t)\Vert ^2\nonumber \\&\quad +\frac{(\alpha -K-1)K^2-Kc}{2t^3}\Vert x(t)-x^*\Vert ^2\nonumber \\&\quad +\frac{K-\alpha +1}{2t}\Vert {\dot{x}}(t)\Vert ^2+ \frac{cK}{2t^3}\Vert x^*\Vert ^2. \end{aligned}$$

(44)

The case $\alpha >3.$ Take $\frac{\alpha +1}{2}< K< \alpha -1.$ Since $\alpha > 3$, such K exists. This implies that $\alpha -2K+1<0$, hence $\alpha -2K-1<0$, and $K-\alpha +1<0$. In addition, since $c>0$ there exists $K\in \big ]\frac{\alpha +1}{2},\alpha -1\big [$ such that

$$\begin{aligned} (\alpha -K-1)K^2-Kc\le 0. \end{aligned}$$

(45)

Indeed, (45) can be deduced from the fact that the continuous function $\varphi (K)=(\alpha -K-1)K$ is decreasing on the interval $\left[ \frac{\alpha +1}{2},\alpha -1\right] $ and $\varphi \left( \alpha -1\right) =0$. Therefore, for every $c>0$ there exists $K\in \big ]\frac{\alpha +1}{2},\alpha -1\big [$ such that $c\ge \varphi (K).$ So take $K\in \big ]\frac{\alpha +1}{2},\alpha -1\big [$ such that (45) holds. Then, by collecting the previous results, (44) yields

$$\begin{aligned}&\dot{{\mathcal {E}}}(t)+\mu (t){\mathcal {E}}(t) \le \frac{cK}{2t^3}\Vert x^*\Vert ^2. \end{aligned}$$

(46)

Taking into account that $\mu (t)=\frac{\alpha -K+1}{t}$, by multiplying (46) with $t^{\alpha -K+1}$ we get

$$\begin{aligned}&\frac{d}{dt}\left( t^{\alpha -K+1}{\mathcal {E}}(t)\right) \le \frac{cK}{2}\Vert x^*\Vert ^2 t^{\alpha -K-2}. \end{aligned}$$

(47)

By integrating (47) on $[t_0,t]$, we get

$$\begin{aligned}&{\mathcal {E}}(t)\le \frac{cK\Vert x^*\Vert ^2}{2(\alpha -K-1)} \frac{1}{t^2}-\frac{cK\Vert x^*\Vert ^2}{2(\alpha -K-1)}\frac{t_0^{\alpha -K-1}}{t^{\alpha -K+1}}+\frac{t_0^{\alpha -K+1}{\mathcal {E}}(t_0)}{t^{\alpha -K+1}}. \end{aligned}$$

(48)

Since $\alpha -K+1 >2$, we obtain

$$\begin{aligned} {\mathcal {E}}(t)=O\left( \frac{1}{t^2}\right) \text{ as } t\rightarrow +\infty . \end{aligned}$$

(49)

By definition of ${\mathcal {E}}(t)$ we immediately deduce that

$$\begin{aligned} f(x(t))-\min _{{{\mathcal {H}}}} f=O\left( \frac{1}{t^2}\right) \text{ as } t\rightarrow +\infty , \end{aligned}$$

(50)

and further, that the trajectory $x(\cdot )$ is bounded and

$$\begin{aligned} \Vert {\dot{x}}(t)\Vert =O\left( \frac{1}{t}\right) \text{ as } t\rightarrow +\infty . \end{aligned}$$

The case $\alpha =3$. Take $K=2$. With the previous notations, we have now $\mu (t)=\frac{2}{t}$ and (44) gives

$$\begin{aligned} \dot{{\mathcal {E}}}(t)+\frac{2}{t}{\mathcal {E}}(t)\le&-\frac{c}{t^3}\Vert x(t)\Vert ^2-\frac{c}{t^3}\Vert x(t)-x^*\Vert ^2+ \frac{c}{t^3}\Vert x^*\Vert ^2\le \frac{c}{t^3}\Vert x^*\Vert ^2. \end{aligned}$$

(51)

After multiplication of (51) by $t^2$ we get

$$\begin{aligned} \frac{d}{dt}(t^2{\mathcal {E}}(t))\le \frac{c}{t}\Vert x^*\Vert ^2. \end{aligned}$$

(52)

By integrating (52) on $[t_0,t]$ we get

$$\begin{aligned} {\mathcal {E}}(t)\le c\Vert x^*\Vert ^2\frac{\ln t}{t^2}-c\Vert x^*\Vert ^2\frac{\ln t_0}{t^2}+\frac{t_0^2{\mathcal {E}}(t_0)}{t^2}. \end{aligned}$$

(53)

Consequently, we have

$$\begin{aligned} {\mathcal {E}}(t)=O\left( \frac{\ln t}{t^2}\right) \text{ as } t\rightarrow +\infty . \end{aligned}$$

(54)

By definition of ${\mathcal {E}}(t)$ we immediately deduce that

$$\begin{aligned} f(x(t))-\min f=O\left( \frac{\ln t}{t^2}\right) \text{ as } t\rightarrow +\infty . \end{aligned}$$

(55)

which gives the claim. $\square $

3.2 Strong convergence

Theorem 14

Let $t_0 > 0$ and, for some starting points $x_0,v_0\in {\mathcal {H}}$, let $x: [t_0, +\infty [ \rightarrow {\mathcal {H}}$ be the unique global solution of (35). Let $x^*$ be the element of minimal norm of $S={{\,\textrm{argmin}\,}}f$, that is $x^*=\text{ proj}_{S}0$. Then, for all $\alpha > 3$ we have that

$$\begin{aligned} \liminf _{t \rightarrow +\infty }{\Vert x(t) - x^*\Vert } = 0. \end{aligned}$$

Further, if there exists $T \ge t_0$, such that the trajectory $\{ x(t):t \ge T \}$ stays either in the open ball $B(0, \Vert x^*\Vert )$ or in its complement, then x(t) converges strongly to $x^*$ as $t\rightarrow +\infty .$

Proof

The proof combines energetic and geometric arguments, as it was initiated in Attouch and Czarnecki (2002). We successively consider the three following configurations of the trajectory.

I.:

Assume that there exists $T\ge t_0$ such that $\Vert x(t)\Vert \ge \Vert x^*\Vert $ for all $t\ge T.$ Let us denote $f_t(x):=f(x)+\frac{c}{2t^2}\Vert x\Vert ^2$ and let $x_t:={{\,\textrm{argmin}\,}}f_t(x).$ Let us recall some classical properties of the Tikhonov approximation:

$$\begin{aligned} \forall t>0 \, \, \Vert x_{t}\Vert \le \Vert x^*\Vert , \, \text{ and } \, \lim _{t\rightarrow +\infty }\Vert x_t -x^* \Vert =0. \end{aligned}$$

(56)

Using the gradient inequality for the strongly convex function $f_t$, we have

$$\begin{aligned} f_t(x(t))-f_t(x_t)\ge \frac{c}{2t^2}\Vert x(t)-x_t\Vert ^2. \end{aligned}$$

On the other hand

$$\begin{aligned} f_t(x_t)-f_t(x^*)=f(x_t)-f^*+\frac{c}{2t^2}(\Vert x_t\Vert ^2-\Vert x^*\Vert ^2)\ge \frac{c}{2t^2}(\Vert x_t\Vert ^2-\Vert x^*\Vert ^2). \end{aligned}$$

By adding the last two inequalities we get

$$\begin{aligned} f_t(x(t))-f_t(x^*)\ge \frac{c}{2t^2}(\Vert x(t)-x_t\Vert ^2+\Vert x_t\Vert ^2-\Vert x^*\Vert ^2), \end{aligned}$$

(57)

Therefore, according to (56), to obtain the strong convergence of the trajectory x(t) to $x^*$, it is enough to show that $f_t(x(t))-f_t(x^*)=o\left( \frac{1}{t^2}\right) , \text{ as } t\rightarrow +\infty .$ For $K>0$, consider now the energy functional

$$\begin{aligned} E(t)&=f_t(x(t))-f_t(x^*)+\frac{1}{2}\left\| \frac{K}{t}(x(t)-x^*)+{\dot{x}}(t)\right\| ^2\nonumber \\&=(f(x(t))\!-\!f(x^*))\!+\!\frac{c}{2t^2}(\Vert x(t)\Vert ^2-\Vert x^*\Vert ^2)\!+\!\frac{1}{2}\left\| \frac{K}{t}(x(t)-x^*)\!+\!{\dot{x}}(t)\right\| ^2. \end{aligned}$$

(58)

Then,

$$\begin{aligned} {\dot{E}}(t)&=\langle {\nabla }f_t(x(t)),{\dot{x}}(t)\rangle -\frac{c}{2t^3}(\Vert x(t)\Vert ^2-\Vert x^*\Vert ^2)\nonumber \\&\quad +\left\langle -\frac{K}{t^2}(x(t)-x^*)+\frac{K}{t}{\dot{x}}(t)+\ddot{x}(t),\frac{K}{t}(x(t)-x^*)+{\dot{x}}(t)\right\rangle . \end{aligned}$$

(59)

Let us examine the different terms of (59). According to the constitutive Eq. (35) we have

$$\begin{aligned}&\left\langle -\frac{K}{t^2}(x(t)-x^*)+\frac{K}{t}{\dot{x}}(t)+\ddot{x}(t),\frac{K}{t}(x(t)-x^*)+{\dot{x}}(t)\right\rangle =\nonumber \\&\quad \left\langle -\frac{K}{t^2}(x(t)-x^*)+\frac{K-\alpha }{t}{\dot{x}}(t) -\left( \frac{c}{t^2}x(t)+\nabla f(x(t))\right) , \frac{K}{t}(x(t)-x^*) + {\dot{x}}(t)\right\rangle =\nonumber \\&\quad -\frac{K^2}{t^3}\Vert x(t)-x^*\Vert ^2+\frac{K^2-\alpha K-K}{t^2}\langle {\dot{x}}(t),x(t)-x^*\rangle +\frac{K-\alpha }{t}\Vert {\dot{x}}(t)\Vert ^2\nonumber \\&\quad -\frac{c}{t^2}\langle x(t),{\dot{x}}(t)\rangle -\langle \nabla f(x(t)), {\dot{x}}(t)\rangle -\frac{K}{t}\left\langle \frac{c}{t^2}x(t)+\nabla f(x(t)), x(t)-x^*\right\rangle . \end{aligned}$$

(60)

Further, from (41) we get

$$\begin{aligned}{} & {} -\frac{K}{t}\left\langle \frac{c}{t^2}x(t)+{\nabla }f(x(t)),x(t)-x^*\right\rangle \nonumber \\{} & {} \quad \le -\frac{K}{t}(f(x(t))-f^*)-\frac{cK}{2t^3}\Vert x(t)\Vert ^2-\frac{cK}{2t^3}\Vert x(t)-x^*\Vert ^2 +\frac{cK}{2t^3}\Vert x^*\Vert ^2 \nonumber \\{} & {} \quad =-\frac{K}{t}(f_t(x(t))-f_t(x^*))-\frac{cK}{2t^3}\Vert x(t)-x^*\Vert ^2. \end{aligned}$$

(61)

Injecting (60) and (61) in (59) we get

$$\begin{aligned} {\dot{E}}(t)\le & {} -\frac{K}{t}(f_t(x(t))-f_t(x^*)) -\frac{c}{t^{3}}(\Vert x(t)\Vert ^2-\Vert x^*\Vert ^2)-\frac{2K^2+cK}{2t^3}\Vert x(t)-x^*\Vert ^2 \nonumber \\{} & {} +\frac{K^2-\alpha K-K}{t^2}\langle {\dot{x}}(t),x(t)-x^*\rangle +\frac{K-\alpha }{t}\Vert {\dot{x}}(t)\Vert ^2. \end{aligned}$$

(62)

Consider now the function $\mu (t)=\frac{\alpha +1-K}{t}.$ Then,

$$\begin{aligned} \mu (t)E(t)&=\frac{\alpha +1-K}{t}(f_t(x(t))-f_t(x^*))+\frac{K^2(\alpha +1-K)}{2t^3}\Vert x(t)-x^*\Vert ^2\nonumber \\&\quad +\frac{K(\alpha +1-K)}{t^2}\langle {\dot{x}}(t),x(t)-x^*\rangle +\frac{\alpha +1-K}{2t}\Vert {\dot{x}}(t)\Vert ^2. \end{aligned}$$

(63)

Consequently, (62) and (63) yield

$$\begin{aligned} {\dot{E}}(t)+\mu (t)E(t)\le{} & {} \frac{\alpha +1-2K}{t}(f_t(x(t))-f_t(x^*))-\frac{c}{t^3}(\Vert x(t)\Vert ^2-\Vert x^*\Vert ^2) \hspace{1cm}\nonumber \\{} & {} + \frac{K^2(\alpha -1-K)-cK}{2t^3}\Vert x(t)-x^*\Vert ^2+\frac{K-\alpha +1}{2t}\Vert {\dot{x}}(t)\Vert ^2 \nonumber \\ ={} & {} \frac{\alpha +1-2K}{t}(f(x(t))-f(x^*))+(\alpha -1-2K)\frac{c}{2t^3}(\Vert x(t)\Vert ^2-\Vert x^*\Vert ^2) \nonumber \\{} & {} +\frac{K^2(\alpha -1-K)-cK}{2t^3}\Vert x(t)-x^*\Vert ^2+\frac{K-\alpha +1}{2t}\Vert {\dot{x}}(t)\Vert ^2. \end{aligned}$$

(64)

Assume that $\frac{\alpha +1}{2}< K< \alpha -1.$ Since $\alpha > 3$ such K exists. As in the proof of Theorem 13 we deduce that $\alpha -2K+1<0$, $K-\alpha +1<0$ and since $c>0$ there exists $K\in \left( \frac{\alpha +1}{2},\alpha -1\right) $ such that

$$\begin{aligned} (\alpha -K-1)K^2-Kc\le 0. \end{aligned}$$

(65)

So take $K\in \left( \frac{\alpha +1}{2},\alpha -1\right) $ such that (65) holds. Then, (64) leads to

$$\begin{aligned} {\dot{E}}(t)+\frac{\alpha +1-K}{t}E(t)\le (\alpha -1-2K)\frac{c}{2t^3}(\Vert x(t)\Vert ^2-\Vert x^*\Vert ^2). \end{aligned}$$

(66)

Let us integrate the differential inequality (66). After multiplication by $t^{\alpha +1-K}$ we get

$$\begin{aligned} \frac{d}{dt}t^{\alpha +1-K}E(t)\le \frac{c}{2}(\alpha -1-2K)t^{\alpha -2-K}(\Vert x(t)\Vert ^2-\Vert x^*\Vert ^2) \end{aligned}$$

and integrating the latter on $[T,t],\,t>T$ we obtain

$$\begin{aligned} E(t)\le \frac{c}{2}(\alpha -1-2K)\frac{\int _{T}^t s^{\alpha -2-K}(\Vert x(s)\Vert ^2-\Vert x^*\Vert ^2)ds}{t^{\alpha +1-K}}+\frac{T^{\alpha +1-K}E(T)}{t^{\alpha +1-K}}.\nonumber \\ \end{aligned}$$

(67)

In one hand, from the definition of E(t) we have

$$\begin{aligned} f_t(x(t))-f_t(x^*)\le E(t). \end{aligned}$$

Therefore,

$$\begin{aligned}{} & {} f_t(x(t))-f_t(x^*)\le \frac{c}{2}(\alpha -1-2K)\\{} & {} \frac{\int _{T}^t s^{\alpha -2-K}(\Vert x(s)\Vert ^2-\Vert x^*\Vert ^2)ds}{t^{\alpha +1-K}}+\frac{T^{\alpha +1-K}E(T)}{t^{\alpha +1-K}}. \end{aligned}$$

On the other hand (57) gives

$$\begin{aligned} f_t(x(t))-f_t(x^*)\ge \frac{c}{2t^2}(\Vert x(t)-x_t\Vert ^2+\Vert x_t\Vert ^2-\Vert x^*\Vert ^2). \end{aligned}$$

Consequently,

$$\begin{aligned}{} & {} (\alpha -1-2K)\frac{\int _{T}^t s^{\alpha -2-K}(\Vert x(s)\Vert ^2-\Vert x^*\Vert ^2)ds}{t^{\alpha -1-K}}+\frac{2T^{\alpha +1-K}E(T)}{ct^{\alpha -1-K}}\nonumber \\{} & {} \quad \ge \Vert x(t)-x_t\Vert ^2+\Vert x_t\Vert ^2-\Vert x^*\Vert ^2. \end{aligned}$$

(68)

By assumption $\Vert x(t)\Vert \ge \Vert x^*\Vert $ for all $t\ge T$ and $\alpha -1-2K<0.$ Hence, for all $t>T$, (68) leads to

$$\begin{aligned} \frac{2 T^{\alpha +1-K}E(T)}{ct^{\alpha -1-K}}\ge \Vert x(t)-x_t\Vert ^2+\Vert x_t\Vert ^2-\Vert x^*\Vert ^2. \end{aligned}$$

(69)

Now, by taking the limit $t\longrightarrow +\infty $ and using that $x_t\rightarrow x^*,\,t\rightarrow +\infty $ we get

$$\begin{aligned} \lim _{t\rightarrow +\infty }\Vert x(t)-x_t\Vert \le 0 \end{aligned}$$

and hence

$$\begin{aligned} \lim _{t\rightarrow +\infty }x(t)=x^*. \end{aligned}$$

II.:

Assume now that there exists $T\ge t_0$ such that $\Vert x(t)\Vert < \Vert x^*\Vert $ for all $t\ge T.$ According to Theorem 13, we have that

$$\begin{aligned} \lim _{t \rightarrow +\infty } f(x(t))=\min _{{{\mathcal {H}}}} f. \end{aligned}$$

Let ${\bar{x}} \in {\mathcal {H}}$ be a weak sequential cluster point of the trajectory x, which exists since, by Theorem 13, the trajectory is bounded. So, there exists a sequence $\left( t_{n}\right) _{n \in {\mathbb {N}}} \subseteq [T,+\infty )$ such that $t_{n} \rightarrow +\infty $ and $x\left( t_{n}\right) $ converges weakly to ${\bar{x}}$ as $n \rightarrow +\infty $. Since f is weakly lower semicontinuous, we deduce that

$$\begin{aligned} f({\bar{x}}) \le \liminf _{n \rightarrow +\infty } f\left( x\left( t_{n}\right) \right) =\min _{{{\mathcal {H}}}} f \,, \end{aligned}$$

hence ${\bar{x}} \in {\text {argmin}} f.$ Now, since the norm is weakly lower semicontinuous, and since $\Vert x(t)\Vert < \Vert x^*\Vert $ for all $t\ge T$, we have

$$\begin{aligned} \begin{array}{c} \Vert {\bar{x}}\Vert \le \liminf _{n \rightarrow +\infty }\left\| x\left( t_{n}\right) \right\| \le \left\| x^*\right\| . \end{array} \end{aligned}$$

Combining ${\bar{x}} \in {\text {argmin}} f$ with the definition of $x^*$, this implies that ${\bar{x}}=x^{*}.$ This shows that the trajectory $x(\cdot )$ converges weakly to $x^*$. So

$$\begin{aligned} \left\| x^*\right\| \le \liminf _{t \rightarrow +\infty }\Vert x(t)\Vert \le \limsup _{t \rightarrow +\infty }\Vert x(t)\Vert \le \left\| x^*\right\| , \end{aligned}$$

hence we have

$$\begin{aligned} \lim _{t \rightarrow +\infty }\Vert x(t)\Vert =\left\| x^*\right\| . \end{aligned}$$

Combining this property with $x(t) \rightharpoonup x^*$ as $t \rightarrow +\infty ,$ we obtain the strong convergence, that is

$$\begin{aligned} \lim _{t \rightarrow +\infty } x(t)=x^*. \end{aligned}$$

III.:

We suppose that for every $T \ge t_{0}$ there exists $t \ge T$ such that $\left\| x^*\right\| >\Vert x(t)\Vert $ and also there exists $s \ge T$ such that $\left\| x^{*}\right\| \le \Vert x(s)\Vert $. From the continuity of x, we deduce that there exists a sequence $\left( t_{n}\right) _{n \in {\mathbb {N}}} \subseteq \left[ t_{0},+\infty \right) $ such that $t_{n} \rightarrow +\infty $ as $n \rightarrow +\infty $ and, for all $n \in {\mathbb {N}}$ we have

$$\begin{aligned} \left\| x\left( t_{n}\right) \right\| =\left\| x^{*}\right\| . \end{aligned}$$

Consider ${\bar{x}} \in {\mathcal {H}}$ a weak sequential cluster point of $\left( x\left( t_{n}\right) \right) _{n \in {\mathbb {N}}}$. We deduce as in case II that ${\bar{x}}=x^{*}.$ Hence, $x^*$ is the only weak sequential cluster point of $x(t_n)$ and consequently the sequence $x(t_n)$ converges weakly to $x^*$. Obviously $\left\| x\left( t_{n}\right) \right\| \rightarrow \left\| x^{*}\right\| $ as $n \rightarrow +\infty $. So, it follows that $x(t_n)\rightarrow x^*,\,n\rightarrow +\infty $, that is $\left\| x\left( t_{n}\right) -x^{*}\right\| \rightarrow 0$ as $n \rightarrow +\infty .$ This leads to $\liminf _{t \rightarrow +\infty }\left\| x(t)-x^*\right\| =0$.

$\square $

4 Strong convergence-general case

We are going to analyze via Lyapunov analysis the strong convergence properties as $t\rightarrow +\infty $ of the solution trajectories of the inertial dynamic (TRIGS) that we recall below

$$\begin{aligned} \ddot{x}(t) + \delta \sqrt{\varepsilon (t)} {\dot{x}}(t) + \nabla f (x(t)) + \varepsilon (t) x(t) =0. \end{aligned}$$

Theorem 15

Let consider the dynamic system (TRIGS) where we assume that $\varepsilon (\cdot )$ satisfies the condition $\mathrm{(CD)}_K$ for some $K>0$, $\int _{t_0}^{+\infty }\varepsilon ^{\frac{3}{2}}(t)dt<+\infty $ and $\lim _{t\rightarrow +\infty } \frac{1}{\sqrt{\varepsilon (t)}\exp \left( {\displaystyle {\int _{t_0}^t (\delta -K)\sqrt{\varepsilon (s)}ds}}\right) }=0.$

Then, for any global solution trajectory $x: [t_0, +\infty [ \rightarrow {\mathcal {H}}$ of (TRIGS),

$$\begin{aligned} \liminf _{t \rightarrow +\infty }{\Vert x(t) - x^*\Vert } = 0, \end{aligned}$$

where $x^*$ is the element of minimal norm of ${{\,\textrm{argmin}\,}}f$, that is $x^*=\text{ proj}_{{{\,\textrm{argmin}\,}}f}0$.

Further, if there exists $T \ge t_0$, such that the trajectory $\{ x(t):t \ge T \}$ stays either in the open ball $B(0, \Vert x^*\Vert )$ or in its complement, then x(t) converges strongly to $x^*$ as $t\rightarrow +\infty .$

Proof

The proof is parallel to that of Theorem 14. We analyze the behavior of the trajectory $x(\cdot )$ depending on its position with respect to the ball $B(0, \Vert x^*\Vert )$.

I.:

Assume that $\Vert x(t)\Vert \ge \Vert x^*\Vert $ for all $t\ge T.$ Let us denote $f_t(x)=f(x)+\frac{\varepsilon (t)}{2}\Vert x\Vert ^2$, and consider the energy functional $E: \big [t_1,+\infty \big [\rightarrow {\mathbb {R}}$ defined by

$$\begin{aligned} E(t):=f_t(x(t))-f_t(x^*)+\frac{1}{2}\Vert c(t)(x(t)-x^*)+{\dot{x}}(t)\Vert ^2, \end{aligned}$$

where $c(t)=K\sqrt{\varepsilon (t)}$. Note that $E(t)={\mathcal {E}}(t) -\frac{\varepsilon (t)}{2}\Vert x^*\Vert ^2$, where ${\mathcal {E}}(t)$ was defined in the proof of Theorem 6. Hence, reasoning as in the proof of Theorem 6, see (25) (and keeping the term containing $\Vert x(t)\Vert ^2$ in the right hand side of (22)), we get for all $t\ge t_1$ that

$$\begin{aligned} \dot{{E}}(t)+\mu (t){E}(t)\le&\left( \frac{{\dot{\varepsilon }}(t)}{2}-c(t)\frac{\varepsilon (t)}{2}+\mu (t)\frac{\varepsilon (t)}{2}\right) (\Vert x(t)\Vert ^2-\Vert x^*\Vert ^2), \end{aligned}$$

(70)

where $\mu (t)=-\frac{{\dot{\varepsilon }}(t)}{2\varepsilon (t)}+(\delta -K)\sqrt{\varepsilon (t)}$. An elementary computation gives $\frac{{\dot{\varepsilon }}(t)}{2}-c(t)\frac{\varepsilon (t)}{2}+\mu (t)\frac{\varepsilon (t)}{2} \le 0$, because of $\varepsilon (\cdot )$ decreasing and $K \ge \frac{\delta }{2}$. Since $\Vert x(t)\Vert \ge \Vert x^*\Vert $ for all $t\ge T$, (70) yields

$$\begin{aligned} \dot{{E}}(t)+\mu (t){E}(t)\le 0, \text{ for } \text{ all } t\ge T_1=\max \{T,t_1\}. \end{aligned}$$

(71)

Set

$$\begin{aligned} {\mathfrak {M}}(t)=\exp \left( {\displaystyle {\int _{T_1}^t\mu (s)ds}}\right) =\exp \left( {\displaystyle {\int _{T_1}^t -\frac{{\dot{\varepsilon }}(s)}{2\varepsilon (s)}+(\delta -K)\sqrt{\varepsilon (s)}ds}}\right) . \end{aligned}$$

Therefore, we have with $C=\sqrt{\varepsilon (T_1)}$

$$\begin{aligned} {\mathfrak {M}}(t)= C\frac{1}{\sqrt{\varepsilon (t)}}\exp \left( {\displaystyle {\int _{T_1}^t (\delta -K)\sqrt{\varepsilon (s)}ds}}\right) . \end{aligned}$$

Multiplying (71) with ${\mathfrak {M}}(t)$ and integrating on an interval $[T_1, t],$ we get for all $t\ge T_1$ that

$$\begin{aligned} {\mathfrak {M}}(t)E(t)\le {\mathfrak {M}}(T_1)E(T_1)=C'. \end{aligned}$$

Consequently, there exists $C_1'>0$ such that for all $t\ge T_1$ one has

$$\begin{aligned} E(t)\le \frac{C_1'\sqrt{\varepsilon (t)}}{\exp \left( {\displaystyle {\int _{T_1}^t (\delta -K)\sqrt{\varepsilon (s)}ds}}\right) }. \end{aligned}$$

Further, $f_t(x(t))-f_t(x^*)\le E(t)$, for all $t\ge t_1$. Therefore,

$$\begin{aligned} f_t(x(t))-f_t(x^*)\le \frac{C_1'\sqrt{\varepsilon (t)}}{\exp \left( {\displaystyle {\int _{T_1}^t (\delta -K)\sqrt{\varepsilon (s)}ds}}\right) }, \text{ for } \text{ all } t\ge T_1. \end{aligned}$$

(72)

For fixed t let us denote $x_{\varepsilon (t)}={{\,\textrm{argmin}\,}}f_t(x).$ Obviously $\Vert x_{\varepsilon (t)}\Vert \le \Vert x^*\Vert .$ Using the gradient inequality for the strongly convex function $f_t$ we have

$$\begin{aligned} f_t(x)-f_t(x_{\varepsilon (t)})\ge \frac{{\varepsilon (t)}}{2}\Vert x-x_{\varepsilon (t)}\Vert ^2 \text{ for } \text{ all } x\in {{\mathcal {H}}} \text{ and } t\ge t_0. \end{aligned}$$

On the other hand

$$\begin{aligned}{} & {} f_t(x_{\varepsilon (t)})-f_t(x^*)=f(x_{\varepsilon (t)})-f^*+\frac{{\varepsilon (t)}}{2}(\Vert x_{\varepsilon (t)}\Vert ^2-\Vert x^*\Vert ^2)\\{} & {} \ge \frac{{\varepsilon (t)}}{2}(\Vert x_{\varepsilon (t)}\Vert ^2-\Vert x^*\Vert ^2). \end{aligned}$$

Now, by adding the last two inequalities we get

$$\begin{aligned} f_t(x)-f_t(x^*)\ge \frac{{\varepsilon (t)}}{2}(\Vert x-x_{\varepsilon (t)}\Vert ^2+\Vert x_{\varepsilon (t)}\Vert ^2-\Vert x^*\Vert ^2) \text{ for } \text{ all } x\in {{\mathcal {H}}} \text{ and } t\ge t_0. \end{aligned}$$

(73)

Hence, (72) and (73) lead to

$$\begin{aligned}{} & {} \Vert x(t)-x_{\varepsilon (t)}\Vert ^2+\Vert x_{\varepsilon (t)}\Vert ^2-\Vert x^*\Vert ^2\nonumber \\{} & {} \le \frac{C_2'}{\sqrt{\varepsilon (t)}\exp \left( {\displaystyle {\int _{T_1}^t (\delta -K)\sqrt{\varepsilon (s)}ds}}\right) }, \text{ for } \text{ all } t\ge T_1. \end{aligned}$$

(74)

Now, by taking the limit as $t\rightarrow +\infty $, and using that $x_{\varepsilon (t)}\rightarrow x^*$ as $t\rightarrow +\infty $ and the assumption in the hypotheses of the theorem we get $\lim _{t\rightarrow +\infty }\Vert x(t)-x_{\varepsilon (t)}\Vert \le 0$, and hence $\lim _{t\rightarrow +\infty }x(t)=x^*.$

II.:

Assume now, that $\Vert x(t)\Vert <\Vert x^*\Vert $ for all $t\ge T.$ By Corollary 8 we get that $f(x(t))\rightarrow \min f$ as $t\rightarrow +\infty .$ Now, we take ${\bar{x}} \in {\mathcal {H}}$ a weak sequential cluster point of the trajectory x, which exists since the trajectory is bounded. This means that there exists a sequence $\left( t_{n}\right) _{n \in {\mathbb {N}}} \subseteq [T,+\infty )$ such that $t_{n} \rightarrow +\infty $ and $x\left( t_{n}\right) $ converges weakly to ${\bar{x}}$ as $n \rightarrow +\infty $. We know that f is weakly lower semicontinuous, so one has

$$\begin{aligned} f({\bar{x}}) \le \liminf _{n \rightarrow +\infty } f\left( x\left( t_{n}\right) \right) =\min f \,, \end{aligned}$$

hence ${\bar{x}} \in {\text {argmin}} f.$ Now, since the norm is weakly lower semicontinuous one has that

$$\begin{aligned} \begin{array}{c} \Vert {\bar{x}}\Vert \le \liminf _{n \rightarrow +\infty }\left\| x\left( t_{n}\right) \right\| \le \left\| x^*\right\| \end{array} \end{aligned}$$

which, from the definition of $x^*$, implies that ${\bar{x}}=x^{*}.$ This shows that the trajectory $x(\cdot )$ converges weakly to $x^*$. So

$$\begin{aligned} \left\| x^*\right\| \le \liminf _{t \rightarrow +\infty }\Vert x(t)\Vert \le \limsup _{t \rightarrow +\infty }\Vert x(t)\Vert \le \left\| x^*\right\| , \end{aligned}$$

hence we have

$$\begin{aligned} \lim _{t \rightarrow +\infty }\Vert x(t)\Vert =\left\| x^*\right\| . \end{aligned}$$

From the previous relation and the fact that $x(t) \rightharpoonup x^*$ as $t \rightarrow +\infty ,$ we obtain the strong convergence, that is

$$\begin{aligned} \lim _{t \rightarrow +\infty } x(t)=x^*. \end{aligned}$$

III.:

We suppose that for every $T \ge t_{0}$ there exists $t \ge T$ such that $\left\| x^*\right\| >\Vert x(t)\Vert $ and also there exists $s \ge T$ such that $\left\| x^{*}\right\| \le \Vert x(s)\Vert $. From the continuity of x, we deduce that there exists a sequence $\left( t_{n}\right) _{n \in {\mathbb {N}}} \subseteq \left[ t_{0},+\infty \right) $ such that $t_{n} \rightarrow +\infty $ as $n \rightarrow +\infty $ and, for all $n \in {\mathbb {N}}$ we have

$$\begin{aligned} \left\| x(t_{n})\right\| =\left\| x^{*}\right\| . \end{aligned}$$

Consider ${\bar{x}} \in {\mathcal {H}}$ a weak sequential cluster point of $\left( x\left( t_{n}\right) \right) _{n \in {\mathbb {N}}}$. We deduce as at case II that ${\bar{x}}=x^{*}.$ Hence, $x^*$ is the only weak sequential cluster point of $x(t_n)$ and consequently the sequence $x(t_n)$ converges weakly to $x^*$. Obviously $\left\| x(t_{n})\right\| \rightarrow \left\| x^{*}\right\| $ as $n \rightarrow +\infty $. So, it follows that $x(t_n)\rightarrow x^*,\,n\rightarrow +\infty $, that is $\left\| x\left( t_{n}\right) -x^{*}\right\| \rightarrow 0$ as $n \rightarrow +\infty .$ This leads to $\liminf _{t \rightarrow +\infty }\left\| x(t)-x^*\right\| =0.$

$\square $

4.1 The case $\varepsilon (t)$ is of order $1/t^r$, $\frac{2}{3}< r<2$

Take $\varepsilon (t)=1/t^r$, $\frac{2}{3}<r<2$. Then, $\int _{t_0}^{+\infty }\varepsilon ^{\frac{3}{2}}(t)dt= \int _{t_0}^{+\infty }\frac{1}{t^{\frac{3}{2} r}}dt<+\infty $, $\left( \frac{1}{\sqrt{\varepsilon (t)}}\right) '=\frac{r}{2}t^{\frac{r}{2}-1}$ and

$$\begin{aligned} \lim _{t\rightarrow +\infty } \frac{1}{\sqrt{\varepsilon (t)}\exp \left( {\displaystyle {\int _{t_0}^t (\delta -K)\sqrt{\varepsilon (s)}ds}}\right) }= \lim _{t\rightarrow +\infty }\frac{C t^\frac{r}{2}}{\exp \left( \frac{2(\delta -K)}{2-r}t^{1-\frac{r}{2}}\right) } =0. \end{aligned}$$

Therefore, Theorem 15 can be applied. Let us summarize these results in the following statement.

Theorem 16

Take $\varepsilon (t)=1/t^r$, $ \frac{2}{3}<r<2$. Let $x: [t_0, +\infty [ \rightarrow {\mathcal {H}}$ be a global solution trajectory of

$$\begin{aligned} \ddot{x}(t) + \frac{\delta }{t^{\frac{r}{2}}}{\dot{x}}(t) + \nabla f\left( x(t) \right) + \frac{1}{t^{r}} x(t)=0. \end{aligned}$$

Then, $ \liminf _{t \rightarrow +\infty }{\Vert x(t) - x^*\Vert } = 0.$

Further, if there exists $T \ge t_0$, such that the trajectory $\{ x(t):t \ge T \}$ stays either in the open ball $B(0, \Vert x^*\Vert )$ or in its complement, then x(t) converges strongly to $x^*$ as $t\rightarrow +\infty .$

5 Fast inertial algorithms with Tikhonov regularization

On the basis of the convergence properties of continuous dynamic (TRIGS), one would expect to obtain similar results for the algorithms resulting from its temporal discretization. To illustrate this, we will do a detailed study of the associated proximal algorithms, obtained by implicit discretization. A full study of the associated first-order algorithms would be beyond the scope of this article, and will be the subject of further study. So, for $k\ge 1$, consider the discrete dynamic

$$\begin{aligned} (x_{k+1}-2x_{k}+x_{k-1})+ \frac{\alpha }{k}(x_{k}-x_{k-1})+\nabla f(x_{k+1}) + \frac{c}{k^2} \xi _{k} =0, \end{aligned}$$

(75)

with time step size equal to one. We take $\xi _k = x_{k}$, which gives

$$\begin{aligned} \mathrm{(IPATRE)}\quad \left\{ \begin{array}{l} y_k= x_k+\alpha _{k }(x_k - x_{k-1})\\ x_{k+1}={ \mathrm prox}_{f}\left( y_k - \frac{c}{k^2}x_k\right) , \end{array} \right. \end{aligned}$$

where (IPATRE) stands for Inertial Proximal Algorithm with Tikhonov REgularization. According to (75) we have

$$\begin{aligned} x_{k+1}=\alpha _k(x_{k}-x_{k-1})-\nabla f(x_{k+1}) +\left( 1- \frac{c}{k^2}\right) x_k. \end{aligned}$$

(76)

5.1 Convergence of values

We have the following result.

Theorem 17

Let $(x_k)$ be a sequence generated by $\text{(IPATRE) }$. Assume that $\alpha >3$. Then for all $s\in \big [\frac{1}{2},1\big [$ the following hold:

(i)
$f(x_k)-\min _{{{\mathcal {H}}}} f=o(k^{-2s})$, $\Vert x_{k}-x_{k-1}\Vert =o(k^{-s})$ and $\Vert {\nabla }f(x_k)\Vert =o(k^{-s})$ as $k\rightarrow +\infty .$
(ii)
$\displaystyle \sum _{k=1}^{+\infty } k^{2s-1}(f(x_k)-\min _{{{\mathcal {H}}}} f)<+\infty ,$ $\displaystyle \sum _{k=1}^{+\infty } k^{2s-1}\Vert x_k-x_{k-1}\Vert ^2<+\infty $, $\displaystyle \sum _{k=1}^{+\infty } k^{2s}\Vert {\nabla }f(x_k)\Vert ^2<+\infty $.

Proof

Given $x^*\in {{\,\textrm{argmin}\,}}f$, set $f^*=f(x^*)=\min _{{{\mathcal {H}}}} f$. For $k\ge 2$, consider the discrete energy

$$\begin{aligned} E_k:=\Vert a_{k-1}(x_{k-1}-x^*)+b_{k-1}(x_k-x_{k-1}+{\nabla }f(x_k))\Vert ^2+d_{k-1}\Vert x_{k-1}\Vert ^2, \end{aligned}$$

(77)

where $a_k=ak^{r-1}, \, 2<a<\alpha -1$ and $b_k=k^r$, $r\in ]0,1]$. The sequence $(d_k)$ will be defined later. Set shortly $c_k:=\frac{c}{k^2}.$ Let us develop $E_k$.

$$\begin{aligned} E_k= & {} a_{k-1}^2\Vert x_{k-1}-x^*\Vert ^2+b_{k-1}^2\Vert x_k-x_{k-1}\Vert ^2+b_{k-1}^2\Vert {\nabla }f(x_k)\Vert ^2\nonumber \\{} & {} +2a_{k-1}b_{k-1}\langle x_k-x_{k-1},x_{k-1}-x^*\rangle \nonumber \\{} & {} +2a_{k-1}b_{k-1}\langle {\nabla }f(x_k),x_{k-1}-x^*\rangle \nonumber \\{} & {} +2b_{k-1}^2\langle {\nabla }f(x_k),x_k-x_{k-1}\rangle +d_{k-1}\Vert x_{k-1}\Vert ^2. \end{aligned}$$

(78)

Further

$$\begin{aligned}&2a_{k-1}b_{k-1}\langle x_k-x_{k-1},x_{k-1}-x^*\rangle \\&\quad =a_{k-1}b_{k-1}(\Vert x_k-x^*\Vert ^2-\Vert x_k-x_{k-1}\Vert ^2-\Vert x_{k-1}-x^*\Vert ^2)\\&\quad 2a_{k-1}b_{k-1}\langle {\nabla }f(x_k),x_{k-1}-x^*\rangle =2a_{k-1}b_{k-1}\langle {\nabla }f(x_k),x_{k}-x^*\rangle \\&\quad -2a_{k-1}b_{k-1}\langle {\nabla }f(x_k),x_{k}-x_{k-1}\rangle . \end{aligned}$$

Consequently, (78) becomes

$$\begin{aligned} E_k= & {} a_{k-1}b_{k-1}\Vert x_k-x^*\Vert ^2+(a_{k-1}^2-a_{k-1}b_{k-1})\Vert x_{k-1}-x^*\Vert ^2\nonumber \\{} & {} +(b_{k-1}^2-a_{k-1}b_{k-1})\Vert x_k-x_{k-1}\Vert ^2 \nonumber \\{} & {} +b_{k-1}^2\Vert {\nabla }f(x_k)\Vert ^2+2a_{k-1}b_{k-1}\langle {\nabla }f(x_k),x_{k}-x^*\rangle \nonumber \\{} & {} +(2b_{k-1}^2-2a_{k-1}b_{k-1})\langle {\nabla }f(x_k),x_k-x_{k-1}\rangle \nonumber \\{} & {} +d_{k-1}\Vert x_{k-1}\Vert ^2. \hspace{8cm} \end{aligned}$$

(79)

Let us proceed similarly with $E_{k+1}$. Let us first observe that from (77) we have

$$\begin{aligned} E_{k+1}=\Vert a_{k}(x_{k}-x^*)+b_{k}(\alpha _k(x_k-x_{k-1})-c_k x_k)\Vert ^2+d_{k}\Vert x_{k}\Vert ^2. \end{aligned}$$

Therefore, after development we get

$$\begin{aligned} E_{k+1}= & {} a_k^2\Vert x_k-x^*\Vert ^2+\alpha _k^2b_k^2\Vert x_k-x_{k-1}\Vert ^2+b_k^2c_k^2\Vert x_k\Vert ^2+2\alpha _k a_kb_k\langle x_k-x_{k-1},x_k-x^*\rangle \nonumber \\{} & {} -2\alpha _kb_k^2c_k\langle x_k-x_{k-1},x_k\rangle -2a_kb_kc_k\langle x_k,x_k-x^*\rangle +d_k\Vert x_k\Vert ^2. \end{aligned}$$

(80)

Further,

$$\begin{aligned}{} & {} 2\alpha _k a_kb_k\langle x_k-x_{k-1},x_k-x^*\rangle =-\alpha _k a_kb_k(\Vert x_{k-1}-x^*\Vert -\Vert x_k-x_{k-1}\Vert ^2-\Vert x_k-x^*\Vert ^2) \\{} & {} \quad -2\alpha _kb_k^2c_k\langle x_k-x_{k-1},x_k\rangle =\alpha _kb_k^2c_k(\Vert x_{k-1}\Vert ^2-\Vert x_k-x_{k-1}\Vert ^2-\Vert x_k\Vert ^2) \\{} & {} \quad -2a_kb_kc_k\langle x_k,x_k-x^*\rangle =a_kb_kc_k(\Vert x^*\Vert ^2-\Vert x_k-x^*\Vert ^2-\Vert x_k\Vert ^2). \end{aligned}$$

Therefore, (80) yields

$$\begin{aligned} E_{k+1}&=(a_k^2+\alpha _k a_kb_k-a_kb_kc_k)\Vert x_k-x^*\Vert ^2-\alpha _k a_kb_k\Vert x_{k-1}-x^*\Vert ^2 \nonumber \\&+(\alpha _k^2b_k^2+\alpha _k a_kb_k-\alpha _kb_k^2c_k)\Vert x_k-x_{k-1}\Vert ^2+(b_k^2c_k^2+d_k-\alpha _kb_k^2c_k -a_kb_kc_k)\Vert x_k\Vert ^2\nonumber \\&+\alpha _kb_k^2c_k\Vert x_{k-1}\Vert ^2+a_kb_kc_k\Vert x^*\Vert ^2. \end{aligned}$$

(81)

By combining (79) and (81), we obtain

$$\begin{aligned}&E_{k+1}-E_k=(a_k^2+\alpha _k a_kb_k-a_kb_kc_k-a_{k-1}b_{k-1})\Vert x_k-x^*\Vert ^2 \nonumber \\&\quad +(-\alpha _k a_kb_k -a_{k-1}^2+a_{k-1}b_{k-1})\Vert x_{k-1}-x^*\Vert ^2\nonumber \\&\quad +(\alpha _k^2b_k^2+\alpha _k a_kb_k-\alpha _kb_k^2c_k-b_{k-1}^2+a_{k-1}b_{k-1})\Vert x_k-x_{k-1}\Vert ^2\nonumber \\&\quad +(b_k^2c_k^2+d_k-\alpha _kb_k^2c_k -a_kb_kc_k)\Vert x_k\Vert ^2\nonumber \\&\quad +(\alpha _kb_k^2c_k-d_{k-1})\Vert x_{k-1}\Vert ^2-b_{k-1}^2\Vert {\nabla }f(x_k)\Vert ^2\nonumber \\&\quad +2a_{k-1}b_{k-1}\langle {\nabla }f(x_k),x^*-x_{k}\rangle \nonumber \\&\quad +(2b_{k-1}^2-2a_{k-1}b_{k-1})\langle {\nabla }f(x_k),x_{k-1}-x_k\rangle +a_kb_kc_k\Vert x^*\Vert ^2. \end{aligned}$$

(82)

By convexity of f, we have

$$\begin{aligned} \langle {\nabla }f(x_k),x^*-x_{k}\rangle \le f^*-f(x_k) \text{ and } \langle {\nabla }f(x_k),x_{k-1}-x_k\rangle \le f(x_{k-1})-f(x_k). \end{aligned}$$

According to the form of $(a_k)$ and $(b_k)$, there exists $k_0\ge 2$ such that $b_k\ge a_k$ for all $k \ge k_0$. Consequently, $2b_{k-1}^2-2a_{k-1}b_{k-1} \ge 0$ which, according to the above convexity inequalities, gives

$$\begin{aligned}{} & {} 2a_{k-1}b_{k-1}\langle {\nabla }f(x_k),x^*-x_{k}\rangle +(2b_{k-1}^2-2a_{k-1}b_{k-1})\langle {\nabla }f(x_k),x_{k-1}-x_k\rangle \nonumber \\{} & {} \quad \le 2a_{k-1}b_{k-1}(f^* -f(x_k)) +(2b_{k-1}^2-2a_{k-1}b_{k-1})\left[ f(x_{k-1})-f(x_k) \right] \nonumber \\{} & {} \quad = -2a_{k-1}b_{k-1}(f(x_k)-f^*) +(2b_{k-1}^2-2a_{k-1}b_{k-1})\left[ (f(x_{k-1})- f^* )- (f(x_k) - f^* ) \right] \nonumber \\{} & {} \quad = (2b_{k-1}^2-2a_{k-1}b_{k-1})(f(x_{k-1})-f^*) -2b_{k-1}^2 (f(x_k)-f^*) \nonumber \\{} & {} \quad =(2b_{k-1}^2-2a_{k-1}b_{k-1})(f(x_{k-1})-f^*)\nonumber \\{} & {} \quad -\Big ((2b_{k}^2-2a_{k}b_{k})+(2b_{k-1}^2-2b_{k}^2+2a_{k}b_{k})\Big )(f(x_{k})-f^*). \end{aligned}$$

(83)

Set $\mu _k:=2b_{k}^2-2a_{k}b_{k}$ and observe that $\mu _k\ge 0$ for all $k\ge k_0$, and $\mu _k \sim C k^{2r}$ (we use C as a generic positive constant). Let us also introduce $m_k:=2b_{k-1}^2-2b_{k}^2+2a_{k}b_{k}$, and observe that $m_k\ge 0$ for all $k\ge k_0$. Equivalently, let us show that for all $\frac{1}{2}\le r\le 1$ one has $b_{k}^2-a_{k}b_{k}\le b_{k-1}^2$ for all $k\ge 1$. Equivalently $ k^{2r}-ak^{2r-1}-(k-1)^{2r} \le 0 $. By convexity of the function $x\mapsto x^{2r}$, the subgradient inequality gives

$$\begin{aligned} (x-1)^{2r} \ge x^{2r} -2r x^{2r -1} \ge x^{2r} -a x^{2r -1}, \end{aligned}$$

where the second inequality comes from $ 2 r < a $. Replacing x with k gives the claim. In addition $m_k \sim Ck^{2r-1}.$ Combining (82) and (83), we obtain that for all $k\ge k_0$

$$\begin{aligned}&E_{k+1}-E_k+\mu _k(f(x_{k})-f^*)-\mu _{k-1}(f(x_{k-1})-f^*)+m_k(f(x_{k})-f^*)\nonumber \\&\quad \le (a_k^2+\alpha _k a_kb_k-a_kb_kc_k-a_{k-1}b_{k-1})\Vert x_k-x^*\Vert ^2\nonumber \\&\quad +(-\alpha _k a_kb_k -a_{k-1}^2+a_{k-1}b_{k-1})\Vert x_{k-1}-x^*\Vert ^2\nonumber \\&\quad +(\alpha _k^2b_k^2+\alpha _k a_kb_k-\alpha _kb_k^2c_k-b_{k-1}^2+a_{k-1}b_{k-1})\Vert x_k-x_{k-1}\Vert ^2\nonumber \\&\quad +(b_k^2c_k^2+d_k-\alpha _kb_k^2c_k -a_kb_kc_k)\Vert x_k\Vert ^2+(\alpha _kb_k^2c_k-d_{k-1})\Vert x_{k-1}\Vert ^2-b_{k-1}^2\Vert {\nabla }f(x_k)\Vert ^2\nonumber \\&\quad +a_kb_kc_k\Vert x^*\Vert ^2. \end{aligned}$$

(84)

Let us now analyze the right hand side of (84).

i) Write the coefficient of $\Vert x_k-x^*\Vert ^2$ so as to show a term similar to the coefficient of $\Vert x_{k-1}-x^*\Vert ^2$. This will prepare the summation of these quantities. This gives

$$\begin{aligned}{} & {} a_k^2+\alpha _k a_kb_k-a_kb_kc_k-a_{k-1}b_{k-1}=(\alpha _{k+1}a_{k+1}b_{k+1} +a_{k}^2-a_{k}b_{k}) \nonumber \\{} & {} \quad +(\alpha _k a_kb_k-a_kb_kc_k-a_{k-1}b_{k-1}-\alpha _{k+1}a_{k+1}b_{k+1}+a_{k}b_{k}). \end{aligned}$$

(85)

a)
By definition, $\alpha _{k+1}a_{k+1}b_{k+1} +a_{k}^2-a_{k}b_{k}=a(k+1)^{2r-1}-\alpha a(k+1)^{2r-2}+a^2k^{2r-2}-a k^{2r-1}$. Proceeding as before, let us show that $a(x+1)^{2r-1}-\alpha a(x+1)^{2r-2}+a^2x^{2r-2}-a x^{2r-1}\le 0$ for x large enough. By taking $ \frac{1}{2}\le r \le 1 $, by convexity of the function $x\mapsto -x^{2r-1}$, the subgradient inequality gives $(2r-1)x^{2r-2} \ge (x+1)^{2r-1} - x^{2r -1}.$ Therefore,
$$\begin{aligned}{} & {} a(x+1)^{2r-1}-a x^{2r-1}-\alpha a(x+1)^{2r-2}+a^2x^{2r-2}\\{} & {} \quad \le a(2r-1)x^{2r-2}-\alpha a(x+1)^{2r-2}+a^2x^{2r-2}. \end{aligned}$$
But $a(2r-1)x^{2r-2}+a^2x^{2r-2}\le \alpha a(x+1)^{2r-2}$ since $2r+a\le \alpha +1$ and the claim follows. Therefore, there exists $k_1\ge k_0$ such that for all $\frac{1}{2}\le r\le 1$ we have
$$\begin{aligned} \alpha _{k+1}a_{k+1}b_{k+1} +a_{k}^2-a_{k}b_{k}\le 0, \text{ for } \text{ all } k\ge k_1. \end{aligned}$$
(86)
Set $\nu _k:=-\alpha _{k+1}a_{k+1}b_{k+1} -a_{k}^2+a_{k}b_{k}$. According to (86), $\nu _k\ge 0$ for all $k\ge k_1$, and $\nu _k \sim Ck^{2r-2}$.
b)
Consider now the second term in the right hand side of (85):
$$\begin{aligned}{} & {} a_k a_kb_k-a_kb_kc_k-a_{k-1}b_{k-1}-\alpha _{k+1}a_{k+1}b_{k+1}+a_{k}b_{k}\\{} & {} =2ak^{2r-1}-\alpha ak^{2r-2}-ack^{2r-3}-a(k-1)^{2r-1}\\{} & {} - a(k+1)^{2r-1}+\alpha a(k+1)^{2r-2}. \end{aligned}$$
Let us show that for all $\frac{1}{2}\le r\le 1$
$$\begin{aligned} \phi (x,r)= & {} 2ax^{2r-1}-\alpha ax^{2r-2}-acx^{2r-3}-a(x-1)^{2r-1}\\{} & {} -a(x+1)^{2r-1}+\alpha a(x+1)^{2r-2}\le 0 \end{aligned}$$
for x large enough. By convexity of the function $x\mapsto x^{2r-1}-(x-1)^{2r-1}$ (one can easily verify that its second order derivative is nonnegative), the subgradient inequality gives $(x+1)^{2r-1}-2x^{2r-1}+(x-1)^{2r-1}\ge (2r-1)(x^{2r-2}-(x-1)^{2r-2})$. Therefore
$$\begin{aligned} \phi (x,r)= & {} -a [ (x+1)^{2r-1}-2x^{2r-1}+(x-1)^{2r-1} ]\\{} & {} -\alpha ax^{2r-2}-acx^{2r-3}+ \alpha a(k+1)^{2r-2}\\\le & {} -a [ (2r-1)(x^{2r-2}-(x-1)^{2r-2}) ]\\{} & {} -\alpha ax^{2r-2}-acx^{2r-3}+ \alpha a(k+1)^{2r-2}\\= & {} a (2r-1)(x-1)^{2r-2}-a(\alpha +2r-1)x^{2r-2}\\{} & {} -acx^{2r-3}+\alpha a(x+1)^{2r-2}. \end{aligned}$$
Similarly, by convexity of the function $x\mapsto (x-1)^{2r-2}-x^{2r-2}$, the subgradient inequality gives $2x^{2r-2}-(x+1)^{2r-2}-(x-1)^{2r-2}\ge (2r-2)((x-1)^{2r-3}-x^{2r-3})$. Therefore, $a\alpha (x+1)^{2r-2}-a\alpha x^{2r-2}\le a\alpha (x^{2r-2}-(x-1)^{2r-2})-a\alpha (2r-2)((x-1)^{2r-3}-x^{2r-3}).$ Consequently,
$$\begin{aligned}{} & {} \phi (x,r)\le a (2r-1-\alpha )((x-1)^{2r-2}-x^{2r-2})\\{} & {} -a\alpha (2r-2)((x-1)^{2r-3}-x^{2r-3})-acx^{2r-3}. \end{aligned}$$
Finally, by convexity of the function $x\mapsto x^{2r-2}$, the subgradient inequality gives $(x-1)^{2r-2}-x^{2r-2}\ge -(2r-2)x^{2r-3}$. Taking into account that $a (2r-1-\alpha )\le 0$ we get
$$\begin{aligned} \phi (x,r)\le -a (2r-1-2\alpha )(2r-2)x^{2r-3}-a\alpha (2r-2)(x-1)^{2r-3}-acx^{2r-3}. \end{aligned}$$
Since $\frac{2\alpha +1-2r}{\alpha }>1$ we obtain that $\phi (x,r)\le 0$ for $x>1$. Consequently, there exists $k_2\ge k_1$ such that for all $\frac{1}{2}\le r\le 1$
$$\begin{aligned} \alpha _k a_kb_k-a_kb_kc_k-a_{k-1}b_{k-1}-\alpha _{k+1}a_{k+1}b_{k+1}+a_{k}b_{k}\le 0, \text{ for } \text{ all } k\ge k_2.\nonumber \\ \end{aligned}$$
(87)
Set $n_k:=-\alpha _k a_kb_k+a_kb_kc_k+a_{k-1}b_{k-1}+\alpha _{k+1}a_{k+1}b_{k+1}-a_{k}b_{k}$. So $n_k\ge 0$ for all $k\ge k_2$ and $n_k\sim Ck^{2r-3}.$

ii) Let us now examine the coefficient of $\Vert x_k-x_{k-1}\Vert ^2$. By definition we have

$$\begin{aligned}{} & {} \alpha _k^2b_k^2+\alpha _k a_kb_k-\alpha _kb_k^2c_k-b_{k-1}^2+a_{k-1}b_{k-1}\\{} & {} \quad = k^{2r}-(k-1)^{2r}+(-2\alpha +a)k^{2r-1}+a(k-1)^{2r-1}\\{} & {} \quad +(\alpha ^2 -\alpha a-c)k^{2r-2}+\alpha ck^{2r-3}. \end{aligned}$$

Let us show that for all $\frac{1}{2}\le r\le 1$

$$\begin{aligned} \phi (x,r)= & {} x^{2r}-(x-1)^{2r}+(-2\alpha +a)x^{2r-1}+a(x-1)^{2r-1}\\{} & {} +(\alpha ^2 -\alpha a-c)x^{2r-2}+\alpha cx^{2r-3}\le 0, \end{aligned}$$

if x is large enough. By convexity of the function $x\mapsto x^{2r}-a x^{2r-1}$, the subgradient inequality gives $((x-1)^{2r}-a(x-1)^{2r-1})-(x^{2r}-ax^{2r-1})\ge -(2rx^{2r-1}-a(2r-1)x^{2r-2})$. Therefore, taking into account that $r-\alpha +a\le 1-\alpha +a\le 0$, we obtain

$$\begin{aligned} \phi (x,r)\le 2(r-\alpha +a)x^{2r-1}-a(2r-1)x^{2r-2}+(\alpha ^2 -\alpha a-c)x^{2r-2}+\alpha cx^{2r-3}\le 0, \end{aligned}$$

for x large enough. Consequently, there exist $k_3\ge k_2$ such that for all $\frac{1}{2}\le r\le 1$

$$\begin{aligned} \alpha _k^2b_k^2+\alpha _k a_kb_k-\alpha _kb_k^2c_k-b_{k-1}^2+a_{k-1}b_{k-1}\le 0, \text{ for } \text{ all } k\ge k_3. \end{aligned}$$

(88)

Set $\eta _k:=-\alpha _k^2b_k^2-\alpha _k a_kb_k+\alpha _kb_k^2c_k+b_{k-1}^2-a_{k-1}b_{k-1}$. So $\eta _k\ge 0$ for all $k\ge k_3$ and $\eta _k\sim Ck^{2r-1}.$

iii) The coefficient of $\Vert x_{k-1}\Vert ^2$ is $\alpha _kb_k^2c_k-d_{k-1}$. We proceed in a similar way as in i), and write the coefficient of $\Vert x_k\Vert ^2$ as

$$\begin{aligned}{} & {} b_k^2c_k^2+d_k-\alpha _kb_k^2c_k -a_kb_kc_k=(-\alpha _{k+1}b_{k+1}^2c_{k+1}+d_{k})\\{} & {} \quad +(b_k^2c_k^2+\alpha _{k+1}b_{k+1}^2c_{k+1}-\alpha _kb_k^2c_k -a_kb_kc_k). \end{aligned}$$

We have

$$\begin{aligned} b_k^2c_k^2+\alpha _{k+1}b_{k+1}^2c_{k+1}-\alpha _kb_k^2c_k -a_kb_kc_k&=c^2k^{2r-4}+ c(k+1)^{2r-2}-\alpha c(k+1)^{2r-3}\\&\quad -ck^{2r-2}+\alpha ck^{2r-3}-ack^{2r-3}. \end{aligned}$$

Let us show that for all $\frac{1}{2}\le r\le 1$

$$\begin{aligned}{} & {} \phi (x,r)=c(x+1)^{2r-2}-\alpha c(x+1)^{2r-3}-cx^{2r-2}\\{} & {} +\alpha cx^{2r-3}-acx^{2r-3}+c^2x^{2r-4}\le 0 \end{aligned}$$

for x large enough. Since for x large enough, the function $x\mapsto x^{2r-2}-\alpha x^{2r-3}$ is convex, the subgradient inequality gives

$$\begin{aligned}{} & {} (x^{2r-2}-\alpha x^{2r-3})-( (x+1)^{2r-2}-\alpha (x+1)^{2r-3})\\{} & {} \ge -((2r-2)(x+1)^{2r-3}-\alpha (2r-3)(x+1)^{2r-4}). \end{aligned}$$

Therefore, by taking into account that $r\le 1$, we obtain

$$\begin{aligned} \phi (x,r)\le (2r-2)c(x+1)^{2r-3}-\alpha (2r-3)c(x+1)^{2r-4}-acx^{2r-3}+c^2x^{2r-4}\le 0 \end{aligned}$$

for x large enough. Consequently, there exists $k_4\ge k_3$ such that for all $\frac{1}{2}\le r\le 1$ we have

$$\begin{aligned} b_k^2c_k^2+\alpha _{k+1}b_{k+1}^2c_{k+1}-\alpha _kb_k^2c_k -a_kb_kc_k\le 0 \text{ for } \text{ all } k\ge k_4. \end{aligned}$$

(89)

Let us denote $\sigma _k:=\alpha _{k+1}b_{k+1}^2c_{k+1}-d_k$ and $s_k:=-b_k^2c_k^2-\alpha _{k+1}b_{k+1}^2c_{k+1}+\alpha _kb_k^2c_k +a_kb_kc_k$ and observe that $s_k\ge 0$ for all $k\ge k_4$ and $s_k\sim Ck^{2r-3}.$

Combining (84), (86), (87), (88) and (89) we obtain that for all $k\ge k_4$ and $r\in \left[ \frac{1}{2},1\right] $ it holds

$$\begin{aligned}&E_{k+1}-E_k+\mu _k(f(x_{k})-f^*)-\mu _{k-1}(f(x_{k-1})-f^*)+m_k(f(x_{k})-f^*)\nonumber \\&\quad +\nu _k\Vert x_k-x^*\Vert ^2-\nu _{k-1}\Vert x_{k-1}-x^*\Vert ^2+n_k\Vert x_k-x^*\Vert ^2\nonumber \\&\quad +\sigma _k\Vert x_k\Vert ^2-\sigma _{k-1}\Vert x_{k-1}\Vert ^2+s_k\Vert x_k\Vert ^2\nonumber \\&\quad +\eta _k\Vert x_k-x_{k-1}\Vert ^2+b_{k-1}^2\Vert {\nabla }f(x_k)\Vert ^2\le a_kb_kc_k\Vert x^*\Vert ^2. \end{aligned}$$

(90)

Finally, take $d_{k-1}=\frac{1}{2} \alpha _kb_k^2c_k.$ Then, $\sigma _k=\frac{1}{2}\alpha _{k+1}b_{k+1}^2c_{k+1}\sim C k^{2r-2}$, $\sigma _k\ge 0$ for all $k\ge k_5=\max (\alpha -1, k_4).$ Further, $\mu _k, m_k, \nu _k, n_k, s_k$ and $ \eta _k$ are nonnegative for all $k\ge k_5$ and $r\in \left[ \frac{1}{2},1\right] $.

Assume now that $\frac{1}{2}\le r<1.$ According to $\sum _{k\ge k_5}a_kb_kc_k\Vert x^*\Vert ^2=ac\Vert x^*\Vert ^2\sum _{k\ge k_5}k^{2r-3}=C<+\infty $, by summing up (90) from $k=k_5$ to $k=n>k_5$, we obtain that there exists $C_1>0$ such that

$$\begin{aligned}{} & {} E_{n+1}\le C_1,\\{} & {} \quad \mu _n(f(x_{n})-f^*)\le C_1, \text{ hence } f(x_{n})-f^*={\mathcal {O}}(n^{-2r}),\\{} & {} \quad \sum _{k\ge k_5}m_k(f(x_{k})-f^*)\le C_1, \text{ hence } \sum _{k\ge 1}k^{2r-1}(f(x_{k})-f^*)<+\infty ,\\{} & {} \quad \nu _k\Vert x_k-x^*\Vert ^2\le C_1, \text{ hence } \Vert x_n-x^*\Vert ={\mathcal {O}}(n^{1-r}),\\{} & {} \quad \sum _{k\ge k_5} n_k\Vert x_k-x^*\Vert ^2\le C_1, \text{ hence } \sum _{k\ge 1} k^{2r-3}\Vert x_k-x^*\Vert ^2<+\infty ,\\{} & {} \quad \sigma _k\Vert x_k\Vert ^2\le C_1, \text{ hence } \Vert x_n\Vert ={\mathcal {O}}(n^{1-r}),\\{} & {} \quad \sum _{k\ge k_5} s_k\Vert x_k\Vert ^2\le C_1, \text{ hence } \sum _{k\ge 1} k^{2r-3}\Vert x_k\Vert ^2<+\infty ,\\{} & {} \quad \sum _{k\ge k_5} \eta _k\Vert x_k-x_{k-1}\Vert ^2\le C_1, \text{ hence } \sum _{k\ge 1}k^{2r-1}\Vert x_k-x_{k-1}\Vert ^2<+\infty \\{} & {} \quad \sum _{k\ge k_5} b_{k-1}^2\Vert {\nabla }f(x_k)\Vert ^2\le C_1, \text{ hence } \sum _{k\ge 1}k^{2r}\Vert {\nabla }f(x_k)\Vert ^2<+\infty . \end{aligned}$$

Since $\sum _{k\ge 1}k^{2r}\Vert {\nabla }f(x_k)\Vert ^2<+\infty $, we have $\Vert {\nabla }f(x_n)\Vert =o(n^{-r}).$ Combining this property with $E_{n+1}\le C_1$ yields $\sup _{n\ge 1}\Vert an^{r-1}(x_n-x^*)+n^{r}(x_{n+1}-x_{n})\Vert +\frac{c}{2}\left( 1-\frac{\alpha }{n}\right) n^{2r-2}\Vert x_{n-1}\Vert ^2<+\infty $.

Let us show now, that $f(x_n)-f^*= o(n^{-2r})$ and $\Vert x_n-x_{n-1}\Vert =o(n^{-r}).$ From (90) we get

$$\begin{aligned}{} & {} \sum _{k\ge 1} [(E_{k+1}+\mu _k (f(x_{k})-f^*)+\nu _k\Vert x_k-x^*\Vert ^2)\\{} & {} \quad -(E_{k}+\mu _{k-1} (f(x_{k-1})-f^*)+\nu _{k-1}\Vert x_{k-1}-x^*\Vert ^2)]_+<+\infty . \end{aligned}$$

Therefore, the following limit exists

$$\begin{aligned}{} & {} \lim _{k\rightarrow +\infty }(\Vert ak^{r-1}(x_k-x^*)+k^{r}(x_{k+1}-x_{k})\Vert ^2\\{} & {} +d_k\Vert x_k\Vert ^2+\mu _k (f(x_{k})-f^*)+\nu _k\Vert x_k-x^*\Vert ^2). \end{aligned}$$

Note that $d_k \sim C k^{2r-2},\,\mu _k\sim C k^{2r}$ and $\nu _k\sim C k^{2r-2}.$ Further, we have

$\sum _{k\ge 1} k^{2r-3}\Vert x_k-x^*\Vert ^2<+\infty $, $\sum _{k\ge 1} k^{2r-1}\Vert x_k-x_{k-1}\Vert ^2<+\infty $, $\sum _{k\ge 1}k^{2r-1}(f(x_{k})-f^*)< +\infty $ and $\sum _{k\ge 1} k^{2r-3}\Vert x_k\Vert ^2<+\infty $, hence

$$\begin{aligned}{} & {} \sum _{k\ge 1}\frac{1}{k}(\Vert ak^{r-1}(x_k-x^*)+k^{r}(x_{k+1}-x_{k})\Vert ^2+d_k\Vert x_k\Vert ^2\\{} & {} \quad +\mu _k (f(x_{k})-f^*)+\nu _k\Vert x_k-x^*\Vert ^2)<+\infty . \end{aligned}$$

Since $\sum _{k\ge 1}\frac{1}{k}=+\infty $ we get

$$\begin{aligned}{} & {} \lim _{k\rightarrow +\infty }(\Vert ak^{r-1}(x_k-x^*)+k^{r}(x_{k+1}-x_{k})\Vert ^2+d_k\Vert x_k\Vert ^2\\{} & {} \quad +\mu _k (f(x_{k})-f^*)+\nu _k\Vert x_k-x^*\Vert ^2)=0 \end{aligned}$$

and the claim follows. $\square $

Remark 18

The convergence rate of the values is $f(x_k)-\min _{{{\mathcal {H}}}} f=o(k^{-2s})$ for any $0<s <1$. Practically it is as good as the rate $ f\left( x(t)\right) - \min _{{{\mathcal {H}}}} f =O\left( \frac{1}{t^2}\right) $ obtained for the continuous dynamic.

5.2 Strong convergence to the minimum norm solution

Theorem 19

Take $\alpha >3$. Let $(x_k)$ be a sequence generated by (IPATRE). Let $x^*$ be the minimum norm element of ${{\,\textrm{argmin}\,}}f$. Then, $\liminf _{k\rightarrow +\infty }\Vert x_k-x^*\Vert =0$. Further, $(x_k)$ converges strongly to $x^*$ whenever $(x_k)$ is in the interior of the ball $B(0,\Vert x^*\Vert )$ for k large enough, or $(x_k)$ is in the complement of the ball $B(0,\Vert x^*\Vert )$ for k large enough.

Proof

Case I. Assume that there exists $k_0\in {\mathbb {N}}$ such that $\Vert x_k\Vert \ge \Vert x^*\Vert $ for all $k\ge k_0.$ Set $c_k=\frac{c}{k^2},$ and define $f_{c_k}(x):=f(x)+\frac{c}{2k^2}\Vert x\Vert ^2$. Consider the energy function defined in (77) with $r=1$, that is $a_k=a$ and $b_k=k^2$, where we assume that $\max (2,\alpha -2)<a<\alpha -1.$ Then,

$$\begin{aligned} E_k=\Vert a(x_{k-1}-x^*)+(k-1)^2(x_k-x_{k-1}+{\nabla }f(x_k))\Vert ^2+d_{k-1}\Vert x_{k-1}\Vert ^2, \end{aligned}$$

where the sequence $(d_k)$ will be defined later. Next, we introduce another energy functional

$$\begin{aligned} {\mathcal {E}}_k= & {} \frac{1}{2} c_{k-1}(\Vert x_{k-1}\Vert ^2-\Vert x^*\Vert ^2)+\Vert a(x_{k-1}-x^*)+(k-1)^2(x_k-x_{k-1}+\nabla f(x_k))\Vert ^2 \nonumber \\{} & {} + d_{k-1}\Vert x_{k-1}\Vert ^2. \end{aligned}$$

(91)

Note that ${\mathcal {E}}_k=\frac{1}{2}c_{k-1}(\Vert x_{k-1}\Vert ^2-\Vert x^*\Vert ^2)+E_k$. Then,

$$\begin{aligned} {\mathcal {E}}_{k+1}-{\mathcal {E}}_k=\frac{1}{2}c_{k}(\Vert x_{k}\Vert ^2-\Vert x^*\Vert ^2)-\frac{1}{2}c_{k-1}(\Vert x_{k-1}\Vert ^2-\Vert x^*\Vert ^2)+E_{k+1}-E_k. \end{aligned}$$

(92)

According to (90), there exists $k_1\ge k_0$ such that for all $k\ge k_1$

$$\begin{aligned}&{\mathcal {E}}_{k+1}-{\mathcal {E}}_k+\mu _k(f(x_{k})-f^*)-\mu _{k-1}(f(x_{k-1})-f^*)+m_k(f(x_{k})-f^*)\nonumber \\&\quad +\nu _k\Vert x_k-x^*\Vert ^2-\nu _{k-1}\Vert x_{k-1}-x^*\Vert ^2+n_k\Vert x_k-x^*\Vert ^2\nonumber \\&\quad +\eta _k\Vert x_k-x_{k-1}\Vert ^2+b_{k-1}^2\Vert {\nabla }f(x_k)\Vert ^2\le -\sigma _k\Vert x_k\Vert ^2+\sigma _{k-1}\Vert x_{k-1}\Vert ^2-s_k\Vert x_k\Vert ^2\nonumber \\&\quad +\frac{1}{2}c_{k}(\Vert x_{k}\Vert ^2-\Vert x^*\Vert ^2)-\frac{1}{2}c_{k-1}(\Vert x_{k-1}\Vert ^2-\Vert x^*\Vert ^2)+ a_kb_kc_k\Vert x^*\Vert ^2. \end{aligned}$$

(93)

Adding $\frac{1}{2}(\mu _k+m_k) c_k(\Vert x_k\Vert ^2-\Vert x^*\Vert ^2)-\frac{1}{2}\mu _{k-1} c_{k-1}(\Vert x_{k-1}\Vert ^2-\Vert x^*\Vert ^2)$ to both side of (93) we get

$$\begin{aligned}{} & {} {\mathcal {E}}_{k+1}-{\mathcal {E}}_k+\mu _k(f_{c_k}(x_{k})-f_{c_k}(x^*))-\mu _{k-1}(f_{c_{k-1}}(x_{k-1})-f_{c_{k-1}}(x^*))\nonumber \\{} & {} \quad +m_k(f_{c_k}(x_{k})-f_{c_k}(x^*))\nonumber \\{} & {} \quad +\nu _k\Vert x_k-x^*\Vert ^2-\nu _{k-1}\Vert x_{k-1}-x^*\Vert ^2+n_k\Vert x_k-x^*\Vert ^2 \nonumber \\{} & {} \quad +\eta _k\Vert x_k-x_{k-1}\Vert ^2 +b_{k-1}^2\Vert {\nabla }f(x_k)\Vert ^2 \nonumber \\{} & {} \quad \le -\sigma _k\Vert x_k\Vert ^2+\sigma _{k-1}\Vert x_{k-1}\Vert ^2-s_k\Vert x_k\Vert ^2\nonumber \\{} & {} \quad +\frac{1}{2}(\mu _k+m_k+1)c_{k}(\Vert x_{k}\Vert ^2-\Vert x^*\Vert ^2)\nonumber \\{} & {} \quad -\frac{1}{2}(\mu _{k-1}+1)c_{k-1}(\Vert x_{k-1}\Vert ^2-\Vert x^*\Vert ^2)+ a_kb_kc_k\Vert x^*\Vert ^2. \end{aligned}$$

(94)

The right hand side of (94) can be written as

$$\begin{aligned}&\left( \frac{1}{2}(\mu _k+m_k+1)c_{k}-\sigma _k-s_k\right) (\Vert x_{k}\Vert ^2-\Vert x^*\Vert ^2)\\&\quad +\left( -\frac{1}{2}(\mu _{k-1}+1)c_{k-1}+\sigma _{k-1}\right) (\Vert x_{k-1}\Vert ^2-\Vert x^*\Vert ^2)\\&\quad +(a_kb_kc_k-\sigma _k-s_k+\sigma _{k-1})\Vert x^*\Vert ^2. \end{aligned}$$

In this case we have $\mu _k=2b_{k}^2-2a_{k}b_{k}=2k^2-2ak$ and $m_k=2b_{k-1}^2-2b_{k}^2+2a_{k}b_{k}=2(a-2)k+2$. Further, $\sigma _k=\alpha _{k+1}b_{k+1}^2c_{k+1}-d_k=c-\frac{\alpha c}{k+1}-d_k$ and $s_k=b_k^2c_k^2-\alpha _{k+1}b_{k+1}^2c_{k+1}+\alpha _kb_k^2c_k +a_kb_kc_k=\frac{\alpha c}{k+1}+\frac{c(a-\alpha )}{k}+\frac{c^2}{k^2}.$ Now, take $d_k=\frac{(a+2-\alpha )c}{2k}\ge 0$ and an easy computation gives that there exists $k_2\ge k_1$ such that for all $k\ge k_2$ one has

$$\begin{aligned}{} & {} \frac{1}{2}(\mu _k+m_k+1)c_{k}-\sigma _k-s_k=-\frac{(a+2-\alpha )c}{2k}+\frac{2c^2-3c}{2k^2}\le 0, \\{} & {} \quad -\frac{1}{2}(\mu _{k-1}+1)c_{k-1}+\sigma _{k-1}=\frac{c(a-2-\alpha )}{2(k-1)}+\frac{\alpha c}{k(k-1)}-\frac{c}{2(k-1)^2}\le 0 \\{} & {} \quad a_kb_kc_k-\sigma _k-s_k+\sigma _{k-1}=\frac{(a+2-\alpha )c}{2k}-\frac{(a+2-\alpha )c}{2(k-1)}-\frac{c^2}{k^2}\le 0. \end{aligned}$$

Now, since by assumption $\Vert x_k\Vert \ge \Vert x^*\Vert $ for $k\ge k_0$, we get that the right hand side of (94) is nonpositive for all $k\ge k_2.$ Hence, for all $k\ge k_2$ we have

$$\begin{aligned}&{\mathcal {E}}_{k+1}-{\mathcal {E}}_k+\mu _k(f_{c_k}(x_{k})-f_{c_k}(x^*))-\mu _{k-1}(f_{c_{k-1}}(x_{k-1})-f_{c_{k-1}}(x^*))\nonumber \\&\quad +m_k(f_{c_k}(x_{k})-f_{c_k}(x^*))\nonumber \\&\quad +\nu _k\Vert x_k-x^*\Vert ^2-\nu _{k-1}\Vert x_{k-1}-x^*\Vert ^2+n_k\Vert x_k-x^*\Vert ^2 \nonumber \\&\quad +\eta _k\Vert x_k-x_{k-1}\Vert ^2+b_{k-1}^2\Vert {\nabla }f(x_k)\Vert ^2\le 0. \end{aligned}$$

(95)

Note that $\nu _k \sim C$. Therefore, from (95), similarly as in the proof of Theorem 17, we deduce that $\Vert x_k-x^*\Vert $ is bounded, and therefore $(x_k)$ is bounded. Further,

$$\begin{aligned} \lim _{k\rightarrow +\infty }(\Vert a(x_k-x^*)+k(x_{k+1}-x_{k})\Vert ^2+\mu _k (f_{c_k}(x_{k})-f_{c_k}(x^*))+\nu _k\Vert x_k-x^*\Vert ^2)=0, \end{aligned}$$

that is, $\lim _{k\rightarrow +\infty }\nu _k\Vert x_k-x^*\Vert ^2=0$ and hence $\lim _{k\rightarrow +\infty }x_k=x^*.$

Case II. Assume that there exists $k_0\in {\mathbb {N}}$ such that $\Vert x_k\Vert <\Vert x^*\Vert $ for all $k\ge k_0.$ From there we get that $(x_k)$ is bounded. Now, take ${\bar{x}} \in {\mathcal {H}}$ a weak sequential cluster point of $(x_k),$ which exists since $(x_k)$ is bounded. This means that there exists a sequence $\left( k_{n}\right) _{n \in {\mathbb {N}}} \subseteq [k_0,+\infty )\cap {\mathbb {N}}$ such that $k_{n} \rightarrow +\infty $ and $x_{k_{n}}$ converges weakly to ${\bar{x}}$ as $n \rightarrow +\infty $. Since f is weakly lower semicontinuous, according to Theorem 17 we have $f({\bar{x}}) \le \liminf _{n \rightarrow +\infty } f\left( x_{k_{n}}\right) =\min f \,,$ hence ${\bar{x}} \in {\text {argmin}} f.$ Since the norm is weakly lower semicontinuous, we deduce that

$$\begin{aligned} \begin{array}{c} \Vert {\bar{x}}\Vert \le \liminf _{n \rightarrow +\infty }\left\| x_{k_{n}}\right\| \le \left\| x^*\right\| . \end{array} \end{aligned}$$

According to the definition of $x^*$, we get ${\bar{x}}=x^{*}.$ Therefore $(x_k)$ converges weakly to $x^*$. So

$$\begin{aligned} \left\| x^*\right\| \le \liminf _{k \rightarrow +\infty }\Vert x_k\Vert \le \limsup _{t \rightarrow +\infty }\Vert x_k\Vert \le \left\| x^*\right\| . \end{aligned}$$

Therefore, we have $ \lim _{k \rightarrow +\infty }\Vert x_k\Vert =\left\| x^*\right\| .$ From the previous relation and the fact that $x_k\rightharpoonup x^*$ as $k \rightarrow +\infty ,$ we obtain the strong convergence, that is $\lim _{k \rightarrow +\infty } x_k=x^*.$

Case III. Suppose that for every $k \ge k_{0}$ there exists $l \ge k$ such that $\left\| x^*\right\| >\Vert x_l\Vert $, and suppose also there exists $m \ge k$ such that $\left\| x^{*}\right\| \le \Vert x_m\Vert $. So, let $k_1\ge k_0$ and $l_1\ge k_1$ such that $\left\| x^*\right\| >\Vert x_{l_1}\Vert .$ Let $k_2>l_1$ and $l_2\ge k_2$ such that $\left\| x^*\right\| >\Vert x_{l_2}\Vert .$ Continuing the process, we obtain $(x_{l_n})$, a subsequence of $(x_k)$ with the property that $\Vert x_{l_n}\Vert <\Vert x^*\Vert $ for all $n\in {\mathbb {N}}.$ By reasoning as in Case II, we obtain that $\lim _{n \rightarrow +\infty } x_{l_n}=x^*.$ Consequently, $\liminf _{k \rightarrow +\infty } \Vert x_k-x^*\Vert =0.$ $\square $

5.3 Non-smooth case

Let us extend the results of the previous sections to the case of a proper lower semicontinuous and convex function $f: {{\mathcal {H}}}\rightarrow {\mathbb {R}}\cup \left\{ +\infty \right\} $. We rely on the basic properties of the Moreau envelope $f_{\lambda }: {{\mathcal {H}}}\rightarrow {\mathbb {R}}$ ($\lambda $ is a positive real parameter), which is defined by

$$\begin{aligned} f_{\lambda } (x) = \min _{z \in {{\mathcal {H}}}} \left\{ f (z) + \frac{1}{2 \lambda } \Vert z - x\Vert ^2 \right\} , \quad \text {for any }x\in {{\mathcal {H}}}. \end{aligned}$$

Recall that $f_{\lambda } $ is a convex differentiable function, whose gradient is $\lambda ^{-1}$-Lipschitz continuous, and such that $\min _{{{\mathcal {H}}}} f= \min _{{{\mathcal {H}}}} f_{\lambda }$, ${{\,\textrm{argmin}\,}}_{{{\mathcal {H}}}} f_{\lambda } = {{\,\textrm{argmin}\,}}_{{{\mathcal {H}}}} f$. The interested reader may refer to Bauschke and Combettes (2011); Brézis (1972) for a comprehensive treatment of the Moreau envelope in a Hilbert setting. Since the set of minimizers is preserved by taking the Moreau envelope, the idea is to replace f by $f_{\lambda } $ in the previous algorithm, and take advantage of the fact that $f_{\lambda } $ is continuously differentiable. Then, algorithm $\mathrm{(IPATRE)}$ applied to $f_{\lambda }$ now reads

$$\begin{aligned} \begin{array}{l} \mathrm{(IPATRE)} \quad \left\{ \begin{array}{l} y_k= x_k+\alpha _{k }(x_k - x_{k-1})\\ x_{k+1}={ \mathrm prox}_{f_{\lambda }}\left( y_k - \frac{c}{k^2}x_k\right) . \end{array} \right. \end{array} \end{aligned}$$

By applying Theorems 17 and 19, we obtain fast convergence of the sequence $(x_k)$ to the element of minimum norm of f. Thus, we just need to formulate these results in terms of f and its proximal mapping. This is straightforward thanks to the following formulae from proximal calculus (Bauschke and Combettes 2011):

1.
$f_{\lambda } (x)= f({{\,\textrm{prox}\,}}_{ \lambda f}(x)) + \frac{1}{2\lambda } \Vert x-{{\,\textrm{prox}\,}}_{\lambda f}(x)\Vert ^2$.
2.
$\nabla f_{\lambda } (x)= \frac{1}{\lambda } \left( x-{{\,\textrm{prox}\,}}_{ \lambda f}(x) \right) $.
3.
${{\,\textrm{prox}\,}}_{ \theta f_{\lambda }}(x) = \frac{\lambda }{\lambda +\theta }x + \frac{\theta }{\lambda +\theta }{{\,\textrm{prox}\,}}_{ (\lambda + \theta )f}(x).$

We obtain the following relaxed inertial proximal algorithm (NS stands for non-smooth):

$$\begin{aligned} \begin{array}{rcl} {\text{(IPATRE-NS) }} \quad {\left\{ \begin{array}{ll} y_k= x_{k} + ( 1- \frac{\alpha }{k}) ( x_{k} - x_{k-1}) \\ x_{k+1} = \frac{\lambda }{1+\lambda }\left( y_k - \frac{c}{k^2}x_k \right) + \frac{1}{1+\lambda }{{\,\textrm{prox}\,}}_{(\lambda +1) f} \left( y_k - \frac{c}{k^2}x_k \right) . \end{array}\right. } \end{array} \end{aligned}$$

Theorem 20

Let $f: {{\mathcal {H}}}\rightarrow {\mathbb {R}}\cup \left\{ +\infty \right\} $ be a convex, lower semicontinuous, proper function. Assume that $\alpha >3$. Let $(x_k)$ be a sequence generated by $ \text {(IPATRE-NS)}$. Then for all $s\in \big [\frac{1}{2},1\big [$, we have:

(i)
$f({{{\,\textrm{prox}\,}}}_{ \lambda f}(x_k))-\min _{{{\mathcal {H}}}} f=o(k^{-2s})$, $\Vert x_{k}-x_{k-1}\Vert =o(k^{-s})$,

$\Vert x_k - {{{\,\textrm{prox}\,}}}_{ \lambda f}(x_k))\Vert =o(k^{-s})$ as $k\rightarrow +\infty .$
(ii)
$\displaystyle \sum _{k=1}^{+\infty } k^{2s-1}(f({{{\,\textrm{prox}\,}}}_{ \lambda f}(x_k))-\min _{{{\mathcal {H}}}} f)<+\infty ,$ $\displaystyle \sum _{k=1}^{+\infty } k^{2s-1}\Vert x_k-x_{k-1}\Vert ^2<+\infty $, $\displaystyle \sum _{k=1}^{+\infty } k^{2s}\Vert x_k - {{{\,\textrm{prox}\,}}}_{ \lambda f}(x_k))\Vert ^2< +\infty $.
(iii)
$\liminf _{k\rightarrow +\infty }\Vert x_k-x^*\Vert =0$. Further, $(x_k)$ converges strongly to $x^*$ the element of minimum norm of ${{\,\textrm{argmin}\,}}f$, if $(x_k)$ is in the interior of the ball $B(0,\Vert x^*\Vert )$ for k large enough, or if $(x_k)$ is in the complement of the ball $B(0,\Vert x^*\Vert )$ for k large enough.

6 Conclusion, perspective

In the framework of convex optimization in general Hilbert spaces, we have introduced an inertial dynamic in which the damping coefficient and the Tikhonov regularization coefficient vanish as time tends to infinity. The judicious adjustment of these parameters makes it possible to obtain trajectories converging quickly (and strongly) towards the minimum norm solution. This seems to be the first time that these two properties have been obtained for the same dynamic. Indeed, the Nesterov accelerated gradient method and the hierarchical minimization attached to the Tikhonov regularization are fully effective within this dynamic. On the basis of Lyapunov’s analysis, we have developed an in-depth mathematical study of the dynamic which is a valuable tool for the development of corresponding results for algorithms obtained by temporal discretization. We thus obtained similar results for the corresponding proximal algorithms. This study opens up a large field of promising research concerning first-order optimization algorithms. Many interesting questions such as the introduction of Hessian-driven damping to attenuate oscillations (Attouch et al. 2022, 2016; Boţ et al. 2021), and the study of the impact of errors, perutrbations, deserve further study. These results also adapt well to the numerical analysis of inverse problems for which strong convergence and obtaining a solution close to a desired state are key properties.

Data availability

In this manuscript only the datasets generated by authors were analysed.

References

Alvarez F, Attouch H (2001) Convergence and asymptotic stabilization for some damped hyperbolic equations with non-isolated equilibria. ESAIM Control Optim Calc Var 6:539–552
Article MathSciNet Google Scholar
Alvarez F, Cabot A (2006) Asymptotic selection of viscosity equilibria of semilinear evolution equations by the introduction of a slowly vanishing term. Discrete Contin Dyn Syst 15:921–938
Article MathSciNet Google Scholar
Apidopoulos V, Aujol J-F, Dossal Ch (2018) The differential inclusion modeling the FISTA algorithm and optimality of convergence rate in the case $b \le 3$. SIAM J Optim 28(1):551–574
Article MathSciNet Google Scholar
Attouch H (1996) Viscosity solutions of minimization problems. SIAM J Optim 6(3):769–806
Article MathSciNet Google Scholar
Attouch H, Balhag A, Chbani Z, Riahi H (2022) Damped inertial dynamics with vanishing Tikhonov regularization: strong asymptotic convergence towards the minimum norm solution. J Differ Equ 311:29–58
Article MathSciNet Google Scholar
Attouch H, Balhag A, Chbani Z, Riahi H (2023) Accelerated gradient methods combining Tikhonov regularization with geometric damping driven by the hessian. Appl Math Optim 88:29
Article MathSciNet Google Scholar
Attouch H, Boţ RI, Csetnek ER (2023) Fast optimization via inertial dynamics with closed-loop damping. J Eur Math Soc 25:1985–2056
Article MathSciNet Google Scholar
Attouch H, Briceño-Arias LM, Combettes PL (2010) A parallel splitting method for coupled monotone inclusions. SIAM J Control Optim 48(5):3246–3270
Article MathSciNet Google Scholar
Attouch H, Briceño-Arias LM, Combettes PL (2016) A strongly convergent primal-dual method for nonoverlapping domain decomposition. Numer Math 133(3):443–470
Article MathSciNet Google Scholar
Attouch H, Cabot A (2017) Asymptotic stabilization of inertial gradient dynamics with time-dependent viscosity. J Differ Equ 263(9):5412–5458
Article MathSciNet Google Scholar
Attouch H, Chbani Z, Fadili J, Riahi H (2022) First order optimization algorithms via inertial systems with Hessian driven damping. Math Progr 193:113–155
Article MathSciNet Google Scholar
Attouch H, Chbani Z, Peypouquet J, Redont P (2018) Fast convergence of inertial dynamics and algorithms with asymptotic vanishing viscosity. Math Progr 168(1–2):123–175
Article MathSciNet Google Scholar
Attouch H, Chbani Z, Riahi H (2018) Combining fast inertial dynamics for convex optimization with Tikhonov regularization. J Math Anal Appl 457:1065–1094
Article MathSciNet Google Scholar
Attouch H, Cominetti R (1996) A dynamical approach to convex minimization coupling approximation with the steepest descent method. J Differ Equ 128(2):519–540
Article MathSciNet Google Scholar
Attouch H, Czarnecki M-O (2002) Asymptotic control and stabilization of nonlinear oscillators with non-isolated equilibria. J Differ Equ 179:278–310
Article MathSciNet Google Scholar
Attouch H, Czarnecki M-O (2010) Asymptotic behavior of coupled dynamical systems with multiscale aspects. J Differ Equ 248:1315–1344
Article MathSciNet Google Scholar
Attouch H, Czarnecki M-O, Peypouquet J (2011) Prox-penalization and splitting methods for constrained variational problems. SIAM J Optim 21:149–173
Article MathSciNet Google Scholar
Attouch H, Czarnecki M-O, Peypouquet J (2011) Coupling forward-backward with penalty schemes and parallel splitting for constrained variational inequalities. SIAM J Optim 21:1251–1274
Article MathSciNet Google Scholar
Attouch H, Czarnecki M-O (2017) Asymptotic behavior of gradient-like dynamical systems involving inertia and multiscale aspects. J Differ Equ 262(3):2745–2770
Article MathSciNet Google Scholar
Attouch H, Peypouquet J (2016) The rate of convergence of Nesterov’s accelerated forward-backward method is actually faster than $1/k^2$. SIAM J Optim 26(3):1824–1834
Article MathSciNet Google Scholar
Attouch H, Peypouquet J, Redont P (2016) Fast convex minimization via inertial dynamics with Hessian driven damping. J Differ Equ 261(10):5734–5783
Article Google Scholar
Baillon J-B, Cominetti R (2001) A convergence result for non-autonomous subgradient evolution equations and its application to the steepest descent exponential penalty trajectory in linear programming. J Funct Anal 187:263–273
Article MathSciNet Google Scholar
Bauschke H, Combettes PL (2011) Convex analysis and monotone operator theory in Hilbert spaces. CMS Books in Mathematics, Springer
Book Google Scholar
Bot RI, Csetnek ER (2014) Forward-Backward and Tseng’s type penalty schemes for monotone inclusion problems. Set-Valued Var Anal 22:313–331
Article MathSciNet Google Scholar
Boţ RI, Csetnek ER, László SC (2021) Tikhonov regularization of a second order dynamical system with Hessian damping. Math Progr 189:151–186
Article MathSciNet Google Scholar
Brézis H (1972) Opérateurs maximaux monotones dans les espaces de Hilbert et équations d’évolution, Lecture Notes 5, North Holland
Cabot A (2004) Inertial gradient-like dynamical system controlled by a stabilizing term. J Optim Theory Appl 120:275–303
Article MathSciNet Google Scholar
Cabot A (2005) Proximal point algorithm controlled by a slowly vanishing term: applications to hierarchical minimization. SIAM J Optim 15(2):555–572
Article MathSciNet Google Scholar
Cabot A, Engler H, Gadat S (2009) On the long time behavior of second order differential equations with asymptotically small dissipation Trans. Am Math Soc 361:5983–6017
Article Google Scholar
Chambolle A, Dossal Ch (2015) On the convergence of the iterates of Fista. J Optim Theory Appl 166:968–982
Article MathSciNet Google Scholar
Cominetti R (1997) Coupling the proximal point algorithm with approximation methods. J Optim Theory Appl 95(3):581–600
Article MathSciNet Google Scholar
Cominetti R, Peypouquet J, Sorin S (2008) Strong asymptotic convergence of evolution equations governed by maximal monotone operators with Tikhonov regularization. J Differ Equ 245:3753–3763
Article MathSciNet Google Scholar
Fiacco A, McCormick G (1968) Nonlinear programming: sequential unconstrained minimization techniques. John Wiley and Sons, New York
Google Scholar
Haraux A, Jendoubi MA (2016) A Liapunov function approach to the stabilization of second-order coupled systems arXiv preprint arXiv:1604.06547
Hirstoaga SA (2006) Approximation et résolution de problèmes d’équilibre, de point fixe et d’inclusion monotone. PhD thesis, Université Pierre et Marie Curie - Paris VI, HAL Id: tel-00137228
Jendoubi MA, May R (2010) On an asymptotically autonomous system with Tikhonov type regularizing term. Arch Math 95(4):389–399
Article MathSciNet Google Scholar
László SC (2023) On the strong convergence of the trajectories of a Tikhonov regularized second order dynamical system with asymptotically vanishing damping. J Differ Equ 362:355–381
Article MathSciNet Google Scholar
László SC (2024) Solving convex optimization problems via a second order dynamical system with implicit Hessian damping and Tikhonov regularization arXiv preprint arXiv:2401.02676
Nesterov Y (1983) A method of solving a convex programming problem with convergence rate $O(1/k^2)$. Soviet Math Dokl 27:372–376
Google Scholar
Nesterov Y (2004) Introductory lectures on convex optimization: A basic course, vol 87. Applied Optimization. Kluwer Academic Publishers, Boston, MA
Polyak B (1987) Introduction to Optimization. Optimization Software-Inc, New York
Google Scholar
Su W, Boyd S, Candès EJ (2016) A differential equation for modeling Nesterov’s accelerated gradient method: theory and insights. J Mach Learn Res 17(153):1–43
MathSciNet Google Scholar
Tikhonov AN (1963) Solution of incorrectly formulated problems and the regularization method. Sov Math 4:1035–1038
Google Scholar
Tikhonov AN, Arsenin VY (1977) Solutions of Ill-Posed Problems. Winston, New York
Google Scholar

Download references

Funding

The research leading to these results received funding from Romanian Ministry of Research, Innovation and Digitization, CNCS-UEFISCDI, project number PN-III-P1$-$1.1-TE-2021-0138, within PNCDI III.

Author information

Authors and Affiliations

IMAG, Univ. Montpellier, CNRS, Montpellier, France
Hedy Attouch
Department of Mathematics, Technical University of Cluj-Napoca, Str. Memorandumului nr. 28, 400114, Cluj-Napoca, Romania
Szilárd Csaba László

Authors

Hedy Attouch
View author publications
You can also search for this author in PubMed Google Scholar
Szilárd Csaba László
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Szilárd Csaba László.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

S.C. László: This work was supported by a grant of the Ministry of Research, Innovation and Digitization, CNCS-UEFISCDI, project number PN-III-P1-1.1-TE-2021-0138, within PNCDI III.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Attouch, H., László, S.C. Convex optimization via inertial algorithms with vanishing Tikhonov regularization: fast convergence to the minimum norm solution. Math Meth Oper Res 99, 307–347 (2024). https://doi.org/10.1007/s00186-024-00867-y

Download citation

Received: 13 March 2024
Revised: 20 April 2024
Accepted: 03 May 2024
Published: 27 June 2024
Issue Date: June 2024
DOI: https://doi.org/10.1007/s00186-024-00867-y

Keywords

Mathematics Subject Classification

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Convex optimization via inertial algorithms with vanishing Tikhonov regularization: fast convergence to the minimum norm solution

Abstract

Similar content being viewed by others

First-order optimization algorithms via inertial systems with Hessian driven damping

Accelerated Gradient Methods Combining Tikhonov Regularization with Geometric Damping Driven by the Hessian

A fast continuous time approach for non-smooth convex optimization using Tikhonov regularization technique

1 Introduction

Theorem 1

1.1 Historical facts and related results

1.1.1 First-order gradient dynamics

1.1.2 Second order gradient dynamics

1.2 Model results

1.2.1 Case \(r=2\)

Theorem 2

1.2.2 Case \(r<2\)

Theorem 3

Remark 4

1.3 Contents

2 Convergence analysis for general \(\varepsilon (t)\)

2.1 Existence and uniqueness for the Cauchy problem

Theorem 5

Proof

2.2 General case

Definition 1

Theorem 6

Proof

Remark 7

Corollary 8

Proof

2.3 Particular cases

2.3.1 \(\varepsilon (t) \) of order \(1/t^2\)

Theorem 9

Remark 10

2.3.2 \(\varepsilon (t)\) of order \(1/t^r\), \(\frac{2}{3}< r<2\)

Theorem 11

Remark 12

3 In-depth analysis in the critical case \(\varepsilon (t) = c/t^2\)

3.1 Convergence rate of the values

Theorem 13

Proof

3.2 Strong convergence

Theorem 14

Proof

4 Strong convergence-general case

Theorem 15

Proof

4.1 The case \(\varepsilon (t)\) is of order \(1/t^r\), \(\frac{2}{3}< r<2\)

Theorem 16

5 Fast inertial algorithms with Tikhonov regularization

5.1 Convergence of values

Theorem 17

Proof

Remark 18

5.2 Strong convergence to the minimum norm solution

Theorem 19

Proof

5.3 Non-smooth case

Theorem 20

6 Conclusion, perspective

Data availability

References

Funding

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Mathematics Subject Classification

Search

Navigation