
Extrapolated Plug-and-Play Three-Operator Splitting Methods for Nonconvex Optimization with Applications to Image Restoration

Submitted to the editors October 22, 2023.
Funding: This work was supported by Grant NSFC/RGC N_CUHK 415/19, Grant ITF ITS/173/22FP, Grants RGC 14300219, 14302920, and 14301121, the CUHK Direct Grant for Research, the National Natural Science Foundation of China Grant 12001286, and the China Postdoctoral Science Foundation Grant 2022M711672.

Zhongming Wu (co-first author), School of Management Science and Engineering, Nanjing University of Information Science and Technology, Nanjing, China (wuzm@nuist.edu.cn).
Chaoyan Huang (co-first author), Department of Mathematics, The Chinese University of Hong Kong, Shatin, Hong Kong, China (cyhuang@math.cuhk.edu.hk).
Tieyong Zeng (corresponding author), Department of Mathematics, The Chinese University of Hong Kong, Shatin, Hong Kong, China (zeng@math.cuhk.edu.hk).
Abstract

This paper investigates the convergence properties and applications of the three-operator splitting method, also known as the Davis-Yin splitting (DYS) method, integrated with extrapolation and a Plug-and-Play (PnP) denoiser in a nonconvex framework. We first propose an extrapolated DYS method to solve a class of structural nonconvex optimization problems that involve minimizing the sum of three possibly nonconvex functions. Our approach provides an algorithmic framework that encompasses both the extrapolated forward-backward splitting and the extrapolated Douglas-Rachford splitting methods. We establish the convergence of the proposed method through a rigorous analysis based on the Kurdyka-Łojasiewicz property, under some tight parameter conditions. Moreover, we introduce two extrapolated PnP-DYS methods with convergence guarantees, in which the traditional regularization prior is replaced by a gradient step-based denoiser. This denoiser is built from a differentiable neural network and can be reformulated as the proximal operator of a specific nonconvex functional. We conduct extensive experiments on image deblurring and image super-resolution problems; the results demonstrate the advantage of the extrapolation strategy and the superior performance of the learning-based model that incorporates the PnP denoiser in recovering high-quality images.

Keywords: Plug-and-Play, three-operator splitting method, nonconvex optimization, denoising prior, convergence guarantee

AMS subject classifications: 90C26, 90C30, 90C90, 65K05

1 Introduction

In this paper, we consider the following type of structural nonconvex optimization problem:

$$\min_{\mathbf{x}\in\mathbb{R}^{n}}F(\mathbf{x})=f_{1}(\mathbf{x})+f_{2}(\mathbf{x})+h(\mathbf{x}), \tag{1}$$

where $f_{1}$ and $h$ are continuously differentiable and possibly nonconvex, and $f_{2}$ is a proper closed (possibly nonconvex) function. The model Eq. 1 covers a wide range of applications in deep learning, signal and image processing, and statistical learning; see, e.g., [7, 14, 15, 29, 74, 75, 76]. In particular, the smooth terms include least-squares or logistic loss functions, while the nonsmooth term typically represents a regularizer that promotes structure such as sparsity or low rank.

Splitting methods, which fully leverage the inherent separable structure, are a popular class of state-of-the-art approaches for structural optimization problems. A generic way to solve problems of the form Eq. 1 is the three-operator splitting method, also known as the Davis-Yin splitting (DYS) method, which was first studied in [16] for convex optimization, i.e., when all functions in Eq. 1 are convex. The iterative scheme of the DYS method reads

$$\left\{\begin{array}{l}
\mathbf{y}^{k+1}\in\operatorname*{arg\,min}_{\mathbf{y}\in\mathbb{R}^{n}}\left\{f_{1}(\mathbf{y})+\frac{1}{2\gamma}\left\|\mathbf{y}-\mathbf{x}^{k}\right\|^{2}\right\},\\[6pt]
\mathbf{z}^{k+1}\in\operatorname*{arg\,min}_{\mathbf{z}\in\mathbb{R}^{n}}\left\{f_{2}(\mathbf{z})+\frac{1}{2\gamma}\left\|\mathbf{z}-\left(2\mathbf{y}^{k+1}-\gamma\nabla h(\mathbf{y}^{k+1})-\mathbf{x}^{k}\right)\right\|^{2}\right\},\\[6pt]
\mathbf{x}^{k+1}=\mathbf{x}^{k}+\left(\mathbf{z}^{k+1}-\mathbf{y}^{k+1}\right),
\end{array}\right. \tag{2}$$

where $\gamma>0$ is a proximal parameter. The DYS method Eq. 2 involves two proximal subproblems, with respect to $\mathbf{y}$ and $\mathbf{z}$, and extends various earlier splitting schemes, such as the forward-backward splitting (FBS) method [3], the Douglas-Rachford splitting (DRS) method [34], the alternating direction method of multipliers (ADMM) [23], and the generalized forward-backward splitting method [51]. Variants and extensions of the DYS method were later explored for convex optimization [32, 56, 57, 62]. However, in the nonconvex setting of Eq. 1, the convergence properties of the DYS method Eq. 2 are less well understood. In contrast, the FBS and DRS methods, two special cases of the DYS method, have been well studied for nonconvex optimization; see, e.g., [3, 34, 63]. Splitting methods are also widely employed in image processing, because many image restoration problems can be addressed by variational methods: the restored image is obtained as a minimizer of a suitable energy functional, which typically has a separable structure. For recent applications in this field, we refer to [17, 40, 58, 62].
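To make scheme Eq. 2 concrete, the following minimal Python sketch performs one DYS iteration on a hypothetical separable toy instance of Eq. 1 that is not taken from the paper: $f_1(\mathbf{x})=\frac{\mu}{2}\|\mathbf{x}-\mathbf{a}\|^2$, $f_2=\lambda\|\cdot\|_1$, and $h(\mathbf{x})=\frac12\|\mathbf{x}-\mathbf{c}\|^2$, where $\mu$, $\lambda$, $\mathbf{a}$, $\mathbf{c}$ and the step $\gamma=0.4$ are illustrative choices. All three functions are convex here, so the sketch only illustrates the convex setting of [16]: the point $\mathbf{x}^*=\mathbf{y}^*+\gamma\nabla f_1(\mathbf{y}^*)$, built from the closed-form minimizer $\mathbf{y}^*$ of the toy problem, is a fixed point of the DYS map, and both $\mathbf{y}^{k+1}$ and $\mathbf{z}^{k+1}$ equal $\mathbf{y}^*$ there.

```python
import math

# Hypothetical separable toy instance of Eq. 1 (illustration only, convex):
#   f1(x) = (mu/2)*||x - a||^2,  f2(x) = lam*||x||_1,  h(x) = (1/2)*||x - c||^2
mu, lam, gamma = 1.0, 0.5, 0.4
a = [3.0, -1.0, 0.2]
c = [1.0, -2.0, 0.0]

def soft(t, tau):
    """Scalar soft thresholding, i.e. the prox of tau*|.|."""
    return math.copysign(max(abs(t) - tau, 0.0), t)

def prox_f1(v, step):
    # argmin_y (mu/2)||y - a||^2 + ||y - v||^2 / (2*step), in closed form
    return [(vi + step * mu * ai) / (1 + step * mu) for vi, ai in zip(v, a)]

def prox_f2(v, step):
    # prox of step*lam*||.||_1: entrywise soft thresholding
    return [soft(vi, step * lam) for vi in v]

def grad_h(y):
    return [yi - ci for yi, ci in zip(y, c)]

def dys_step(x, step):
    """One pass of scheme Eq. 2; returns (x_next, y, z)."""
    y = prox_f1(x, step)
    gh = grad_h(y)
    z = prox_f2([2 * yi - step * gi - xi for yi, gi, xi in zip(y, gh, x)], step)
    x_next = [xi + zi - yi for xi, zi, yi in zip(x, z, y)]
    return x_next, y, z

# Closed-form minimizer of the toy F: soft-threshold (mu*a + c)/(1 + mu) at lam/(1 + mu).
y_star = [soft((mu * ai + ci) / (1 + mu), lam / (1 + mu)) for ai, ci in zip(a, c)]
# x* = y* + gamma * grad f1(y*), chosen so that prox_f1(x*) = y*.
x_star = [ys + gamma * mu * (ys - ai) for ys, ai in zip(y_star, a)]

x_next, y, z = dys_step(x_star, gamma)
print(y_star, x_star, x_next)
```

Since all functions are convex and $\gamma<2/L_h$ here, the DYS map is a fixed-point operator of the type analyzed in [16]; the check above only confirms the fixed-point relation for this one instance.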

Another important topic in the study of splitting methods is the incorporation of acceleration techniques. Since Polyak's pioneering work [50] on the heavy-ball method for gradient descent, extrapolation, also called the inertial strategy, has been incorporated into various optimization schemes to accelerate convergence. Notable examples include the accelerated proximal point algorithm [12] for variational inequality problems and the accelerated FBS [4, 68, 6] for convex optimization. Over the past decade, extrapolation has also been extended to various splitting methods for nonconvex optimization to accelerate convergence, with analyses based on the Kurdyka–Łojasiewicz framework (see Definition 2.2); see, e.g., [45, 33, 69, 37, 49, 73, 72, 48]. In this paper, our first focus is to investigate the convergence properties of the DYS method Eq. 2 combined with extrapolation for solving Eq. 1. This yields a versatile framework that includes the extrapolated (also called inertial) FBS and extrapolated DRS methods for nonconvex optimization as special cases.

Recently, Plug-and-Play (PnP) methods, which combine splitting algorithms with denoising priors, have been widely used to solve many practical problems [19, 70, 71, 35]. PnP methods offer a concise yet flexible way to incorporate statistical priors into a problem without explicitly constructing an objective function. The first PnP method, PnP-ADMM [67], was developed for a range of imaging problems and simply replaces a proximal subproblem with a denoiser. Since then, many PnP-based methods, such as PnP-FBS [59, 66], PnP-DRS [10, 28], and PnP primal-dual [46] approaches, have reported empirical success in a wide variety of applications, but with few theoretical guarantees. In several recent studies, the convergence of PnP methods has been established via contractive fixed-point iterations. For example, the convergence of various proximal algorithms has been proved by assuming that the denoiser is averaged [60], firmly nonexpansive [61], or simply nonexpansive [41, 52]. However, off-the-shelf deep denoisers are often not 1-Lipschitz continuous, i.e., not nonexpansive, and imposing strict Lipschitz constraints on the network degrades its denoising performance [24, 28].

To enforce nonexpansiveness in deep denoisers, Ryu et al. [55] proposed normalizing each layer individually by its spectral norm. However, this approach restricts the use of residual skip connections, which are widely employed in deep denoisers. Recently, Hurault et al. [27] tackled this issue by training a deep image denoiser using a gradient-based PnP prior. By replacing the regularization step with the constructed denoiser, they showed that the resulting gradient step PnP prior corresponds to the proximal operator of a specific nonconvex functional [28]. On this basis, they established the convergence of the PnP-FBS, PnP-ADMM, and PnP-DRS iterates to stationary points of explicit functionals. Inspired by this line of research, we explore the convergence guarantees and potential applications of combining PnP methods with the DYS algorithm Eq. 2 for problems of the form Eq. 1.

1.1 Our contribution

This paper provides a generic algorithmic framework that combines splitting methods, extrapolation, and deep priors. The main contributions of this paper are threefold:

  • We propose an extrapolated DYS method for solving the structural nonconvex optimization problem Eq. 1, which provides a generic algorithmic framework that includes the extrapolated FBS and extrapolated DRS methods. Under tight parameter conditions, we establish the convergence of the generated iterates within the Kurdyka–Łojasiewicz framework.

  • By replacing the regularization step with a gradient step-based denoiser, we propose two extrapolated PnP-DYS methods. The denoiser is constructed from a differentiable neural network and can be reformulated as the proximal operator of a specific nonconvex functional. We also establish the convergence of both PnP-DYS algorithms.

  • Extensive experiments on image deblurring and image super-resolution problems are conducted to evaluate the performance of the proposed schemes. The numerical results illustrate the advantages and efficiency of the extrapolation strategy. Moreover, the experiments demonstrate the superiority of the PnP-based model with a deep denoiser in terms of the quality of the recovered images.

1.2 Organization

The remainder of this paper is organized as follows. Related methods and preliminaries are reviewed in Section 2. An extrapolated DYS method and its convergence analysis are developed in Section 3. Section 4 incorporates the PnP approach and develops two extrapolated PnP-DYS methods with convergence guarantees. Experimental results are reported in Section 5, and conclusions are drawn in Section 6.

1.3 Notation

We use $\mathbb{R}^{n}$ to denote the $n$-dimensional Euclidean space, $\mathbb{R}_{+}$ the set of nonnegative real numbers, $\langle\cdot,\cdot\rangle$ the inner product, and $\|\cdot\|$ the norm induced by the inner product. For an extended real-valued function $f$, the domain of $f$ is defined as $\mathrm{dom}\,f:=\{\mathbf{x}\in\mathbb{R}^{n}\mid f(\mathbf{x})<\infty\}$. We say that $f$ is proper if $\mathrm{dom}\,f\neq\emptyset$ and $f(\mathbf{x})>-\infty$ for all $\mathbf{x}\in\mathrm{dom}\,f$, and closed if it is lower semicontinuous. For any subset $S\subseteq\mathbb{R}^{n}$ and any point $\mathbf{x}\in\mathbb{R}^{n}$, the distance from $\mathbf{x}$ to $S$ is defined by $\mathrm{dist}(\mathbf{x},S):=\inf\{\|\mathbf{y}-\mathbf{x}\|\mid\mathbf{y}\in S\}$, with $\mathrm{dist}(\mathbf{x},S)=\infty$ for all $\mathbf{x}$ when $S=\emptyset$.
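For a finite set $S$, the infimum in the definition of $\mathrm{dist}(\mathbf{x},S)$ is attained, so it can be evaluated directly as a minimum. The helper below is a minimal illustration; the name `dist` is ours, it relies on `math.dist` (available since Python 3.8), and the convention for $S=\emptyset$ is handled by the `default` argument of `min`.

```python
import math

def dist(x, S):
    """dist(x, S) = min over y in S of ||y - x|| for a finite set S; +inf if S is empty."""
    return min((math.dist(x, y) for y in S), default=math.inf)

print(dist([0.0, 0.0], [[3.0, 4.0], [1.0, 0.0]]))  # 1.0
print(dist([0.0, 0.0], []))                          # inf
```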

2 Preliminaries

In this section, we review the definitions of the subdifferential and the Kurdyka-Łojasiewicz (KL) property, which are used in the subsequent analysis.

Definition 2.1.

[3, 8] (Subdifferentials) Let $f:\mathbb{R}^{n}\rightarrow(-\infty,+\infty]$ be a proper and lower semicontinuous function.

  • (i)

    For a given $\mathbf{x}\in\mathrm{dom}\,f$, the Fréchet subdifferential of $f$ at $\mathbf{x}$, denoted by $\widehat{\partial}f(\mathbf{x})$, is the set of all vectors $\mathbf{u}\in\mathbb{R}^{n}$ satisfying

    $$\liminf_{\mathbf{y}\neq\mathbf{x},\,\mathbf{y}\rightarrow\mathbf{x}}\frac{f(\mathbf{y})-f(\mathbf{x})-\langle\mathbf{u},\mathbf{y}-\mathbf{x}\rangle}{\|\mathbf{y}-\mathbf{x}\|}\geq 0,$$

    and we set $\widehat{\partial}f(\mathbf{x})=\emptyset$ when $\mathbf{x}\notin\mathrm{dom}\,f$.

  • (ii)

    The limiting subdifferential, or simply the subdifferential, of $f$ at $\mathbf{x}$, denoted by $\partial f(\mathbf{x})$, is defined by

    $$\partial f(\mathbf{x}):=\left\{\mathbf{u}\in\mathbb{R}^{n}\mid\exists\,\mathbf{x}^{k}\rightarrow\mathbf{x}\ \text{s.t.}\ f(\mathbf{x}^{k})\rightarrow f(\mathbf{x})\ \text{and}\ \widehat{\partial}f(\mathbf{x}^{k})\ni\mathbf{u}^{k}\rightarrow\mathbf{u}\right\}. \tag{3}$$

  • (iii)

    A point $\mathbf{x}^{*}$ is called a (limiting-)critical point or stationary point of $f$ if $0\in\partial f(\mathbf{x}^{*})$; the set of critical points of $f$ is denoted by $\mathrm{crit}\,f$.

It follows immediately from Definition 2.1 that $\widehat{\partial}f(\mathbf{x})\subseteq\partial f(\mathbf{x})$; moreover, $\widehat{\partial}f(\mathbf{x})$ is closed and convex, while $\partial f(\mathbf{x})$ is closed. If $f$ is continuously differentiable, the subdifferential Eq. 3 reduces to the gradient, i.e., $\partial f(\mathbf{x})=\{\nabla f(\mathbf{x})\}$. Furthermore, as described in [53], if $g$ is a continuously differentiable function, then $\partial(f+g)=\partial f+\nabla g$.
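A standard one-dimensional example (ours, not from the paper) shows that the inclusion $\widehat{\partial}f(\mathbf{x})\subseteq\partial f(\mathbf{x})$ can be strict and that $\partial f(\mathbf{x})$, although closed, need not be convex. Take $f(x)=-|x|$ on $\mathbb{R}$ at $x=0$. The Fréchet condition must hold along both one-sided limits, which is impossible, whereas the limiting subdifferential collects the gradients $\mp 1$ along sequences $x^k\to 0$ from either side (where $f(x^k)\to f(0)$ by continuity):

```latex
\begin{aligned}
&\frac{f(y)-f(0)-u\,y}{|y|} = -1-u\,\operatorname{sign}(y)
  \;\Longrightarrow\; u\le -1 \ (y\downarrow 0)\ \text{and}\ u\ge 1 \ (y\uparrow 0)
  \;\Longrightarrow\; \widehat{\partial} f(0)=\emptyset,\\
&\nabla f(x) = -\operatorname{sign}(x)\ \ (x\neq 0)
  \;\Longrightarrow\; \partial f(0)=\{-1,\,1\}.
\end{aligned}
```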

Next, we recall the KL property [2, 8], which is important in the convergence analysis.

Definition 2.2.

(KL property and KL function) Let $f:\mathbb{R}^{n}\rightarrow(-\infty,+\infty]$ be a proper and lower semicontinuous function.

  • (a)

    The function $f$ is said to have the KL property at $\mathbf{x}^{*}\in\mathrm{dom}(\partial f)$ if there exist $\eta\in(0,+\infty]$, a neighborhood $U$ of $\mathbf{x}^{*}$, and a continuous concave function $\varphi:[0,\eta)\rightarrow\mathbb{R}_{+}$ such that

    • (i)

      $\varphi(0)=0$ and $\varphi$ is continuously differentiable on $(0,\eta)$ with $\varphi'>0$;

    • (ii)

      for all $\mathbf{x}\in U\cap\{\mathbf{z}\in\mathbb{R}^{n}\mid f(\mathbf{x}^{*})<f(\mathbf{z})<f(\mathbf{x}^{*})+\eta\}$, the following KL inequality holds:

      $$\varphi'\big(f(\mathbf{x})-f(\mathbf{x}^{*})\big)\,\mathrm{dist}\big(0,\partial f(\mathbf{x})\big)\geq 1. \tag{4}$$

  • (b)

    If $f$ satisfies the KL property at each point of $\mathrm{dom}(\partial f)$, then $f$ is called a KL function.
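As a sanity check of Definition 2.2, consider the illustrative function $f(x)=|x|^{p}$ with $p>1$ (our example) at $x^*=0$, with $\varphi(s)=s^{1/p}$, which is continuous, concave, and satisfies condition (i). For $x\neq 0$ we have $\varphi'(f(x))\,|f'(x)|=\frac1p|x|^{1-p}\cdot p|x|^{p-1}=1$, so the KL inequality Eq. 4 holds with equality on $U=\mathbb{R}$ with $\eta=+\infty$. The minimal sketch below evaluates the left-hand side of Eq. 4 numerically at a few sample points.

```python
def kl_product(x, p):
    """Left-hand side of Eq. 4 for f(x) = |x|**p, x* = 0, phi(s) = s**(1/p); needs x != 0."""
    s = abs(x) ** p                          # f(x) - f(x*)
    dphi = (1.0 / p) * s ** (1.0 / p - 1.0)  # phi'(s)
    dist0 = p * abs(x) ** (p - 1.0)          # dist(0, subdiff f(x)) = |f'(x)| for x != 0
    return dphi * dist0

for p in (1.5, 2.0, 4.0):
    values = [kl_product(x, p) for x in (1e-3, 0.1, 0.5, -0.7, 2.0)]
    print(p, min(values), max(values))  # mathematically both equal 1
```

The exponent $1/p$ is what makes the product constant; a flatter minimum (larger $p$) requires a more strongly concave $\varphi$.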

Denote by $\Phi_{\eta}$ the set of functions $\varphi$ satisfying the conditions in Definition 2.2(a). We next state the uniformized KL property established in [8], which will be useful in the convergence analysis.

Lemma 2.3.

[8] (Uniformized KL property) Let $f:\mathbb{R}^{n}\rightarrow(-\infty,+\infty]$ be a proper and lower semicontinuous function and let $\Omega$ be a compact set. Assume that $f$ is constant on $\Omega$ and satisfies the KL property at each point of $\Omega$. Then there exist $\varsigma>0$, $\eta>0$, and $\varphi\in\Phi_{\eta}$ such that

$$\varphi'\big(f(\mathbf{x})-f(\bar{\mathbf{x}})\big)\,\mathrm{dist}\big(0,\partial f(\mathbf{x})\big)\geq 1, \tag{5}$$

for all $\bar{\mathbf{x}}\in\Omega$ and every $\mathbf{x}$ satisfying $\mathrm{dist}(\mathbf{x},\Omega)<\varsigma$ and $f(\bar{\mathbf{x}})<f(\mathbf{x})<f(\bar{\mathbf{x}})+\eta$.

We next recall the well-known descent lemma for smooth functions; a detailed proof can be found in [44, Lemma 1.2.3].

Lemma 2.4.

[44] Let $h:\mathbb{R}^{n}\rightarrow\mathbb{R}$ be a continuously differentiable function whose gradient $\nabla h$ is $L_{h}$-Lipschitz continuous. Then

$$\Big|h(\mathbf{u})-h(\mathbf{v})-\langle\mathbf{u}-\mathbf{v},\nabla h(\mathbf{v})\rangle\Big|\leq\frac{L_{h}}{2}\|\mathbf{u}-\mathbf{v}\|^{2},\qquad\forall\,\mathbf{u},\mathbf{v}\in\mathbb{R}^{n}. \tag{6}$$
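Lemma 2.4 does not require convexity, which is why it applies to the nonconvex $h$ in Eq. 1. The minimal sketch below checks Eq. 6 at random points for the illustrative nonconvex function $h(\mathbf{x})=\sum_i\sin(x_i)$ (our choice), whose gradient $(\cos x_i)_i$ is 1-Lipschitz because its Hessian is $\mathrm{diag}(-\sin x_i)$; the dimension, sampling range, and seed are arbitrary.

```python
import math
import random

# Illustrative nonconvex h(x) = sum_i sin(x_i); Hessian diag(-sin(x_i)) has norm <= 1,
# so grad h(x) = (cos(x_i))_i is Lipschitz with constant L_h = 1.
L_h = 1.0

def h(x):
    return sum(math.sin(t) for t in x)

def grad_h(x):
    return [math.cos(t) for t in x]

def descent_gap(u, v):
    """Return LHS - RHS of Eq. 6; Lemma 2.4 says this is <= 0."""
    lin = sum((ui - vi) * gi for ui, vi, gi in zip(u, v, grad_h(v)))
    lhs = abs(h(u) - h(v) - lin)
    rhs = 0.5 * L_h * sum((ui - vi) ** 2 for ui, vi in zip(u, v))
    return lhs - rhs

random.seed(0)
worst = max(
    descent_gap([random.uniform(-5, 5) for _ in range(4)],
                [random.uniform(-5, 5) for _ in range(4)])
    for _ in range(1000)
)
print(worst)  # nonpositive up to floating-point rounding
```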

Lemma 2.5.

[9] Let $\{a_{n}\}$ and $\{b_{n}\}$ be two nonnegative sequences satisfying $\sum_{n\in\mathbb{N}}b_{n}<\infty$ and $a_{n+1}\leq a\cdot a_{n}+b\cdot a_{n-1}+b_{n}$ for all $n\geq 1$, where $a\in\mathbb{R}$, $b\geq 0$, and $a+b<1$. Then $\sum_{n\in\mathbb{N}}a_{n}<\infty$.
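To see Lemma 2.5 at work, the sketch below generates a sequence that satisfies the recursion with equality for the illustrative values $a=0.5$, $b=0.3$, the summable perturbation $b_n=1/n^2$, and $a_0=a_1=1$ (all our choices). For $a\geq 0$, summing the recursion over $n=1,\dots,N$ and using nonnegativity gives $(1-a-b)\sum_{m=0}^{N+1}a_m\leq a_0+a_1+\sum_{n}b_n$, which is the explicit bound used in the code; the partial sums stay below it and stabilize.

```python
import math

def partial_sum(a, b, n_max, a0=1.0, a1=1.0):
    """Generate a_{n+1} = a*a_n + b*a_{n-1} + b_n with b_n = 1/n**2 (n >= 1)
    and return a_0 + a_1 + ... + a_{n_max}."""
    seq = [a0, a1]
    for n in range(1, n_max):
        seq.append(a * seq[n] + b * seq[n - 1] + 1.0 / n**2)
    return sum(seq)

a, b = 0.5, 0.3  # a >= 0, b >= 0, a + b < 1
# Valid for a >= 0: (1 - a - b) * sum_m a_m <= a_0 + a_1 + sum_n b_n, with sum 1/n^2 = pi^2/6.
bound = (1.0 + 1.0 + math.pi**2 / 6) / (1 - a - b)
s1 = partial_sum(a, b, 1_000)
s2 = partial_sum(a, b, 100_000)
print(s1, s2, bound)
```

The lemma itself also allows $a<0$; the simple bound in the comment is only derived here for $a\geq 0$.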

3 Extrapolated DYS method with convergence analysis

Algorithm 1 An extrapolated DYS method
  Choose the parameters $\alpha\geq 0$ and $\gamma>0$. Given $\mathbf{x}^{0}$ and $\mathbf{x}^{-1}=\mathbf{x}^{0}$, set $k=0$.
  while the stopping criterion is not satisfied do
$$\left\{\begin{aligned}
\mathbf{w}^{k}&=\mathbf{x}^{k}+\alpha(\mathbf{x}^{k}-\mathbf{x}^{k-1}),\\
\mathbf{y}^{k+1}&=\mathrm{Prox}_{\gamma f_{1}}\left(\mathbf{w}^{k}\right),\\
\mathbf{z}^{k+1}&=\mathrm{Prox}_{\gamma f_{2}}\left(2\mathbf{y}^{k+1}-\gamma\nabla h(\mathbf{y}^{k+1})-\mathbf{w}^{k}\right),\\
\mathbf{x}^{k+1}&=\mathbf{w}^{k}+\left(\mathbf{z}^{k+1}-\mathbf{y}^{k+1}\right).
\end{aligned}\right. \tag{7}$$
  end while
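The following Python sketch runs Algorithm 1 on a hypothetical separable toy instance of Eq. 1, used for illustration only: $f_1(\mathbf{x})=\frac{\mu}{2}\|\mathbf{x}-\mathbf{a}\|^2$, $f_2=\lambda\|\cdot\|_1$, and $h(\mathbf{x})=\frac12\|\mathbf{x}-\mathbf{c}\|^2$. The values $\gamma=0.4$ and $\alpha=0.2$, and the stopping rule $\|\mathbf{z}^{k+1}-\mathbf{y}^{k+1}\|_\infty<\mathrm{tol}$, are our assumptions for this convex toy problem; they are not claimed to satisfy the parameter conditions the paper derives for the nonconvex case. Setting `alpha=0.0` recovers the plain DYS iteration Eq. 2.

```python
import math

# Hypothetical separable toy instance of Eq. 1 (illustration only, convex):
#   f1(x) = (mu/2)*||x - a||^2,  f2(x) = lam*||x||_1,  h(x) = (1/2)*||x - c||^2
mu, lam = 1.0, 0.5
a = [3.0, -1.0, 0.2]
c = [1.0, -2.0, 0.0]

def prox_f1(v, step):
    # Closed-form prox of step*f1: (v + step*mu*a) / (1 + step*mu)
    return [(vi + step * mu * ai) / (1 + step * mu) for vi, ai in zip(v, a)]

def prox_f2(v, step):
    # Prox of step*lam*||.||_1: entrywise soft thresholding
    t = step * lam
    return [math.copysign(max(abs(vi) - t, 0.0), vi) for vi in v]

def grad_h(y):
    return [yi - ci for yi, ci in zip(y, c)]

def extrapolated_dys(x0, gamma, alpha, max_iter=500, tol=1e-10):
    """Sketch of scheme Eq. 7; returns the last y-iterate and the number of iterations."""
    x_prev, x = list(x0), list(x0)                                 # x^{-1} = x^0
    for k in range(1, max_iter + 1):
        w = [xi + alpha * (xi - xp) for xi, xp in zip(x, x_prev)]  # extrapolation step
        y = prox_f1(w, gamma)
        gh = grad_h(y)
        z = prox_f2([2 * yi - gamma * gi - wi for yi, gi, wi in zip(y, gh, w)], gamma)
        x_prev, x = x, [wi + zi - yi for wi, zi, yi in zip(w, z, y)]
        # Hypothetical stopping rule: z^{k+1} - y^{k+1} = x^{k+1} - w^k is tiny
        if max(abs(zi - yi) for zi, yi in zip(z, y)) < tol:
            break
    return y, k

y, iters = extrapolated_dys([0.0, 0.0, 0.0], gamma=0.4, alpha=0.2)
print(y, iters)
```

For these data the minimizer of $F$ is $(1.75,-1.25,0)$, obtained coordinate-wise by soft-thresholding $(\mu a_i+c_i)/(1+\mu)$ at $\lambda/(1+\mu)$, so the returned $\mathbf{y}$ can be compared against it directly.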

In this section, we propose a general extrapolated DYS method and analyze its convergence.

3.1 The extrapolated DYS method

We propose an extrapolated DYS algorithm to solve the general nonconvex optimization problem Eq. 1, in which an extrapolation step is incorporated to accelerate convergence. Recall that, for any $\gamma>0$, the proximal operator of a function $f$ is defined by

$$\mathrm{Prox}_{\gamma f}(\mathbf{x})=\operatorname*{arg\,min}_{\mathbf{y}\in\mathbb{R}^{n}}\left\{f(\mathbf{y})+\frac{1}{2\gamma}\|\mathbf{y}-\mathbf{x}\|^{2}\right\}.$$

We say that $f$ is prox-bounded if $f+\frac{1}{2\gamma}\|\cdot\|^{2}$ is bounded below for some $\gamma>0$. The supremum of all such $\gamma$ is called the threshold of prox-boundedness of $f$ and is denoted by $\gamma_{f}$. If $f$ is proper, lower semicontinuous, and prox-bounded, then $\mathrm{Prox}_{\gamma f}(\mathbf{x})$ is nonempty and compact for every $\mathbf{x}\in\mathbb{R}^{n}$ and every $\gamma\in(0,\gamma_{f})$ [53, Theorem 1.25].
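A standard nonconvex example (ours, not from the paper) is the scalar $\ell_0$ penalty $f(y)=\lambda\,[y\neq 0]$ with $\lambda>0$. It is proper, lower semicontinuous, and bounded below, hence prox-bounded with $\gamma_f=+\infty$. Comparing the only two candidates, $y=0$ (cost $x^2/(2\gamma)$) and $y=x$ (cost $\lambda$), gives hard thresholding at $\sqrt{2\gamma\lambda}$; at the tie $|x|=\sqrt{2\gamma\lambda}$ the proximal set contains two points, so it is nonempty and compact but not single-valued. The sketch returns the whole set and checks it against brute-force minimization on a grid.

```python
import math

def prox_l0(x, gamma, lam):
    """All minimizers over scalar y of  lam*[y != 0] + (y - x)**2 / (2*gamma)."""
    thr = math.sqrt(2.0 * gamma * lam)
    if abs(x) > thr:
        return [x]        # keeping y = x costs lam < x**2 / (2*gamma)
    if abs(x) < thr:
        return [0.0]      # y = 0 costs x**2 / (2*gamma) < lam
    return [0.0, x]       # tie: the proximal operator is set-valued here

def objective(y, x, gamma, lam):
    return lam * (y != 0) + (y - x) ** 2 / (2.0 * gamma)

# Brute-force check on a grid for gamma = 0.5, lam = 1.0 (threshold sqrt(1.0) = 1.0).
grid = [i / 1000.0 for i in range(-3000, 3001)]
for x in (-2.5, -0.4, 0.0, 0.7, 1.0, 3.0):
    best = min(objective(y, x, 0.5, 1.0) for y in grid)
    assert all(objective(y, x, 0.5, 1.0) <= best + 1e-12 for y in prox_l0(x, 0.5, 1.0))
print(prox_l0(1.0, 0.5, 1.0))
```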

The concrete iterative scheme is summarized in Algorithm 1, which provides a versatile algorithmic framework encompassing both (extrapolated) forward-backward splitting and (extrapolated) Douglas-Rachford splitting methods. In particular, when the extrapolation step vanishes, i.e., $\alpha=0$, Algorithm 1 reduces to the classical three-operator splitting method studied in [7, 16]. When $f_{1}=0$ in Eq. 1, Algorithm 1 reduces to the extrapolated (also called inertial) forward-backward splitting method, also known as the inertial proximal gradient method, studied in [4, 43, 38, 73]. Algorithm 1 also recovers the extrapolated Douglas-Rachford splitting method when $h=0$.
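Substituting into Eq. 7 makes these two reductions explicit. If $f_1=0$, then $\mathrm{Prox}_{\gamma f_1}$ is the identity, so $\mathbf{y}^{k+1}=\mathbf{w}^k$ and $\mathbf{x}^{k+1}=\mathbf{w}^k+\mathbf{z}^{k+1}-\mathbf{w}^k=\mathbf{z}^{k+1}$; if $h=0$, the gradient term in the $\mathbf{z}$-update simply drops out:

```latex
\begin{aligned}
&f_1 = 0:\quad \mathbf{x}^{k+1} = \mathrm{Prox}_{\gamma f_2}\big(\mathbf{w}^{k}-\gamma\nabla h(\mathbf{w}^{k})\big),
 \qquad \mathbf{w}^{k} = \mathbf{x}^{k}+\alpha(\mathbf{x}^{k}-\mathbf{x}^{k-1});\\
&h = 0:\quad \mathbf{y}^{k+1} = \mathrm{Prox}_{\gamma f_1}(\mathbf{w}^{k}),\quad
 \mathbf{z}^{k+1} = \mathrm{Prox}_{\gamma f_2}(2\mathbf{y}^{k+1}-\mathbf{w}^{k}),\quad
 \mathbf{x}^{k+1} = \mathbf{w}^{k}+\mathbf{z}^{k+1}-\mathbf{y}^{k+1}.
\end{aligned}
```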

Moreover, when $\alpha=0$ and $h$ vanishes, Algorithm 1 reduces to the classical DRS algorithm. The convergence of the DRS method for nonconvex optimization was first studied in [34] and later refined in [63]; for other variants and extensions of the DRS method for nonconvex optimization, see [21, 22, 36, 39, 65]. When $\alpha=0$ and $f_{1}$ vanishes, the DYS algorithm becomes another very popular approach, namely, the forward-backward splitting (FBS) or proximal gradient method. We refer to [1, 3, 9, 64, 73] for extensions of the FBS method to the nonconvex setting.

Next, we state the assumptions on problem Eq. 1 used in the convergence analysis.

Assumption.

The functions $f_{1}$, $f_{2}$, and $h$ in Eq. 1 satisfy the following conditions:

  • (i)

    $f_{1}$ has a Lipschitz continuous gradient, i.e., there exists a constant $L_{f_{1}}>0$ such that

    $$\left\|\nabla f_{1}(\mathbf{y}_{1})-\nabla f_{1}(\mathbf{y}_{2})\right\|\leq L_{f_{1}}\left\|\mathbf{y}_{1}-\mathbf{y}_{2}\right\|,\quad\forall\,\mathbf{y}_{1},\mathbf{y}_{2}\in\mathbb{R}^{n}.$$

  • (ii)

    $h$ has a Lipschitz continuous gradient, i.e., there exists a constant $L_{h}>0$ such that

    $$\left\|\nabla h(\mathbf{y}_{1})-\nabla h(\mathbf{y}_{2})\right\|\leq L_{h}\left\|\mathbf{y}_{1}-\mathbf{y}_{2}\right\|,\quad\forall\,\mathbf{y}_{1},\mathbf{y}_{2}\in\mathbb{R}^{n}.$$

  • (iii)

    $f_{2}:\mathbb{R}^{n}\rightarrow\mathbb{R}\cup\{\infty\}$ is a proper closed function, and the objective function $F$ is bounded from below.

Let $l\in\mathbb{R}$ be a constant such that $f_1+\frac{l}{2}\|\cdot\|^2$ is convex. Such an $l$ always exists because $\nabla f_1$ is Lipschitz continuous; in particular, one can always choose $l=L_{f_1}$. Moreover, the convexity of $f_1+\frac{l}{2}\|\cdot\|^2$ implies that

$$f_{1}({\bf y}_{1})-f_{1}({\bf y}_{2})-\langle\nabla f_{1}({\bf y}_{2}),{\bf y}_{1}-{\bf y}_{2}\rangle\geq-\frac{l}{2}\|{\bf y}_{1}-{\bf y}_{2}\|^{2},\quad\forall~{\bf y}_{1},{\bf y}_{2}\in\mathbb{R}^{n}.$$

Then, by the Lipschitz continuity of $\nabla f_1$ and Lemma 2.4, it must hold that $l\geq-L_{f_1}$. Hence, one can always take a constant $l\in[-L_{f_1},L_{f_1}]$ such that $f_1+\frac{l}{2}\|\cdot\|^2$ is convex. Note that $l<0$ implies that $f_1$ is strongly convex (with modulus $-l$). Define

$$\Lambda(\gamma):=\frac{1-\gamma l-2\gamma L_{h}}{2+\gamma L_{h}}-\gamma^{2}L_{f_{1}}^{2}. \tag{8}$$

We now state the parameter conditions for Algorithm 1 in the following assumption.

Assumption. The parameters $\alpha$ and $\gamma$ are chosen such that $0<\gamma<\frac{1}{L_{f_1}+L_h}$ and $0\leq\alpha<\Lambda(\gamma)$.
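As a small illustration of the constant $l$ and of the parameter assumption, the sketch below uses a hypothetical quadratic $f_1({\bf y})=\frac12{\bf y}^\top Q{\bf y}$ with symmetric, possibly indefinite $Q$. For this choice $L_{f_1}=\max_i|\lambda_i(Q)|$ and the smallest admissible $l$ is $-\lambda_{\min}(Q)$. The sketch spot-checks the curvature inequality above at random points, evaluates $\Lambda(\gamma)$ from Eq. 8, and tests the parameter assumption. The helper names, $L_h=0.5$, and the choice $\gamma=\frac{1}{4(L_{f_1}+L_h)}$ are assumptions made for illustration only.

```python
import numpy as np

# Hypothetical f1(y) = 0.5 * y^T Q y with symmetric, generally indefinite Q.
rng = np.random.default_rng(1)
M = rng.standard_normal((6, 6))
Q = (M + M.T) / 2
f1 = lambda y: 0.5 * y @ Q @ y
grad_f1 = lambda y: Q @ y

eigs = np.linalg.eigvalsh(Q)            # eigenvalues in ascending order
L_f1 = np.max(np.abs(eigs))             # Lipschitz constant of grad f1
l = -eigs[0]                            # smallest l with f1 + (l/2)||.||^2 convex

def curvature_gap(y1, y2):
    # f1(y1) - f1(y2) - <grad f1(y2), y1 - y2> + (l/2)||y1 - y2||^2, nonnegative by convexity
    d = y1 - y2
    return f1(y1) - f1(y2) - grad_f1(y2) @ d + 0.5 * l * d @ d

def Lambda(gamma, l, L_f1, L_h):
    # Lambda(gamma) as defined in Eq. 8
    return (1 - gamma * l - 2 * gamma * L_h) / (2 + gamma * L_h) - gamma**2 * L_f1**2

def parameters_admissible(alpha, gamma, l, L_f1, L_h):
    # Parameter assumption: 0 < gamma < 1/(L_f1 + L_h) and 0 <= alpha < Lambda(gamma)
    return 0 < gamma < 1 / (L_f1 + L_h) and 0 <= alpha < Lambda(gamma, l, L_f1, L_h)

gaps = [curvature_gap(rng.standard_normal(6), rng.standard_normal(6)) for _ in range(1000)]
L_h = 0.5                               # hypothetical Lipschitz constant of grad h
gamma = 0.25 / (L_f1 + L_h)
alpha = 0.5 * Lambda(gamma, l, L_f1, L_h)
```

If $Q$ happens to be positive definite, the same recipe returns $l=-\lambda_{\min}(Q)<0$, matching the strongly convex case noted above.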

Remark 3.1.

Note that for given $L_{f_1}>0$ and $L_h\geq 0$, $\Lambda(\gamma)>0$ always holds if $\gamma>0$ is sufficiently small. Moreover, in the case $L_h=0$ (e.g., when $h\equiv 0$), it is easy to verify that $\Lambda(\gamma)>0$ whenever $\gamma$ satisfies the following threshold:

$$0<\gamma<\frac{-l+\sqrt{l^{2}+8L_{f_{1}}^{2}}}{4L_{f_{1}}^{2}}. \tag{9}$$

The above relation implies that $\gamma<\frac{1}{L_{f_1}}$: for every fixed $L_{f_1}$, the upper bound in Eq. 9 is decreasing in $l\in[-L_{f_1},L_{f_1}]$ and attains its maximum value $\frac{1}{L_{f_1}}$ at $l=-L_{f_1}$. In particular, when $h=0$ and $\alpha=0$, the extrapolated DYS algorithm Eq. 7 reduces to the classical DRS algorithm in [34, 63]. In this case, the range of $\gamma$ specified in Eq. 9 is sharper than that in [34], in the sense that it admits a larger upper bound. For $L_h>0$, we can also provide a computable threshold for $\gamma$ guaranteeing $0<\gamma<\frac{1}{L_{f_1}+L_h}$ and $\Lambda(\gamma)>0$, as required by the parameter assumption above:

$$0<\gamma<\min\left\{\frac{1}{L_{f_{1}}+L_{h}},\gamma_{0}\right\}, \tag{10}$$

where $\gamma_{0}:=\dfrac{-(2L_{h}+L_{f_{1}}+l)+\sqrt{(2L_{h}+L_{f_{1}}+l)^{2}+4(L_{h}L_{f_{1}}+L_{f_{1}}^{2})}}{2(L_{h}L_{f_{1}}+L_{f_{1}}^{2})}$.
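For completeness, the algebra behind the thresholds in Remark 3.1 can be reconstructed directly from Eq. 8. The following derivation is our own check of the omitted steps, not a restatement of the authors' argument:

```latex
% Case L_h = 0: Eq. 8 becomes \Lambda(\gamma) = \frac{1-\gamma l}{2} - \gamma^2 L_{f_1}^2
\begin{aligned}
\Lambda(\gamma)>0
  &\iff 2L_{f_1}^2\gamma^2 + l\gamma - 1 < 0
   \iff 0<\gamma<\frac{-l+\sqrt{l^2+8L_{f_1}^2}}{4L_{f_1}^2},\\
\frac{\partial}{\partial l}\Bigl(-l+\sqrt{l^2+8L_{f_1}^2}\Bigr)
  &= -1+\frac{l}{\sqrt{l^2+8L_{f_1}^2}}<0
  \;\Longrightarrow\;
  \max_{l\in[-L_{f_1},L_{f_1}]}\frac{-l+\sqrt{l^2+8L_{f_1}^2}}{4L_{f_1}^2}
  =\frac{L_{f_1}+3L_{f_1}}{4L_{f_1}^2}=\frac{1}{L_{f_1}}.
\end{aligned}

% Case L_h > 0: compare (2+\gamma L_h)\Lambda(\gamma) with the quadratic defining \gamma_0
\begin{aligned}
(2+\gamma L_h)\Lambda(\gamma)
 -\Bigl[1-(2L_h+L_{f_1}+l)\gamma-(L_hL_{f_1}+L_{f_1}^2)\gamma^2\Bigr]
 &=\gamma L_{f_1}(1-\gamma L_{f_1})(1+\gamma L_h)\;\geq\;0
 \quad\text{whenever } 0<\gamma\leq\tfrac{1}{L_{f_1}}.
\end{aligned}
```

The bracketed quadratic equals $1$ at $\gamma=0$, has a negative leading coefficient, and its positive root is $\gamma_0$, so it is positive on $(0,\gamma_0)$. Since $\frac{1}{L_{f_1}+L_h}\leq\frac{1}{L_{f_1}}$, every $\gamma$ satisfying Eq. 10 therefore gives $\Lambda(\gamma)>0$.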

Remark 3.2.

When $\alpha=0$, the extrapolated DYS algorithm Eq. 7 reduces to the method studied in [7, 42]. In this case, however, for fixed $L_{f_1}$, $L_h$, and $l$, the range of $\gamma$ given by the parameter assumption differs from the result in [7]; in particular, the upper bound of $\gamma$ differs because of the distinct construction of $\Lambda(\gamma)$ in Eq. 8. As a byproduct, this paper therefore provides an improved parameter condition on $\gamma$ that ensures the convergence of the DYS method in the nonconvex setting. In addition, the lower boundedness of the energy function for the DYS method and a sublinear convergence rate are established under some common conditions, as detailed later.
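The thresholds Eq. 9 and Eq. 10 from Remark 3.1 can also be sanity-checked numerically. The sketch below evaluates both bounds and confirms on a grid that $\Lambda(\gamma)>0$ below them. The helper names and the constants ($L_{f_1}=2$ for the case $L_h=0$; $l=0.5$, $L_{f_1}=1$, $L_h=2$ for the case $L_h>0$) are illustrative assumptions.

```python
import numpy as np

def Lambda(gamma, l, L_f1, L_h):
    # Lambda(gamma) from Eq. 8 (works elementwise on NumPy arrays)
    return (1 - gamma * l - 2 * gamma * L_h) / (2 + gamma * L_h) - gamma**2 * L_f1**2

def gamma_bound_h0(l, L_f1):
    # Upper bound in Eq. 9 (case L_h = 0)
    return (-l + np.sqrt(l**2 + 8 * L_f1**2)) / (4 * L_f1**2)

def gamma_bound(l, L_f1, L_h):
    # Threshold in Eq. 10: min{1/(L_f1 + L_h), gamma_0}
    a = L_h * L_f1 + L_f1**2
    c = 2 * L_h + L_f1 + l
    gamma0 = (-c + np.sqrt(c**2 + 4 * a)) / (2 * a)
    return min(1 / (L_f1 + L_h), gamma0)

# Case L_h = 0 with L_f1 = 2: the bound in Eq. 9 as a function of l
L_f1 = 2.0
ls = np.linspace(-L_f1, L_f1, 41)
bounds_h0 = gamma_bound_h0(ls, L_f1)                              # decreasing in l
lam_at_bound = Lambda(gamma_bound_h0(0.3, L_f1), 0.3, L_f1, 0.0)  # should vanish

# Case L_h > 0 with l = 0.5, L_f1 = 1, L_h = 2 (illustrative values)
g_max = gamma_bound(0.5, 1.0, 2.0)       # gamma_0 = 1/6 here, below 1/(L_f1 + L_h) = 1/3
grid = np.linspace(1e-6, g_max, 200)
lam_vals = Lambda(grid, 0.5, 1.0, 2.0)
```

Because Eq. 10 is only a sufficient condition, $\Lambda(\gamma)$ typically remains strictly positive at $\gamma=\gamma_0$, whereas Eq. 9 is exact for $L_h=0$.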

3.2 Convergence analysis

In this subsection, we prove the convergence of Algorithm 1, i.e., the extrapolated DYS algorithm, for the general nonconvex optimization problem Eq. 1 under the two assumptions stated in Section 3.1.

For convenience, we first present the first-order optimality conditions for the ${\bf y}$- and ${\bf z}$-subproblems in Eq. 7, which will be used frequently in the subsequent convergence analysis. Specifically, the optimality condition for the ${\bf y}$-subproblem in Eq. 7 is

$$0=\nabla f_{1}({\bf y}^{k+1})+\frac{1}{\gamma}\left({\bf y}^{k+1}-{\bf w}^{k}\right), \tag{11}$$

and that for the ${\bf z}$-subproblem in Eq. 7 is

$$0\in\partial f_{2}({\bf z}^{k+1})+\frac{1}{\gamma}\left({\bf z}^{k+1}+\gamma\nabla h({\bf y}^{k+1})-2{\bf y}^{k+1}+{\bf w}^{k}\right). \tag{12}$$
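To see Eq. 11 and Eq. 12 in a concrete case, the sketch below uses hypothetical choices: $f_1({\bf y})=\frac12{\bf y}^\top Q{\bf y}$, so that the ${\bf y}$-subproblem reduces to the linear system $(I+\gamma Q){\bf y}={\bf w}^k$; $f_2=\lambda\|\cdot\|_1$, so that the ${\bf z}$-subproblem is soft-thresholding; and $h({\bf y})=\frac12\|A{\bf y}-b\|^2$. It then checks both conditions at a random ${\bf w}^k$. These model choices and all constants are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)
n, lam, gamma = 8, 2.0, 0.05
M = rng.standard_normal((n, n))
Q = (M + M.T) / 2                               # hypothetical nonconvex f1(y) = 0.5 y^T Q y
A = rng.standard_normal((12, n))
b = rng.standard_normal(12)
grad_f1 = lambda y: Q @ y
grad_h = lambda y: A.T @ (A @ y - b)            # h(y) = 0.5 ||A y - b||^2

w = rng.standard_normal(n)                      # plays the role of w^k
P = np.eye(n) + gamma * Q                       # assumed positive definite (gamma small)
y = np.linalg.solve(P, w)                       # y-subproblem: argmin f1(y) + ||y - w||^2 / (2 gamma)
res_11 = grad_f1(y) + (y - w) / gamma           # right-hand side of Eq. 11, should be ~0

v = 2 * y - w - gamma * grad_h(y)
z = np.sign(v) * np.maximum(np.abs(v) - gamma * lam, 0.0)  # z-subproblem for f2 = lam ||.||_1
g = (v - z) / gamma                             # Eq. 12 requires g in lam * (subdifferential of ||.||_1 at z)
nz = z != 0
```

On the support of ${\bf z}^{k+1}$ the subgradient is pinned to $\lambda\,\mathrm{sign}(z_i)$; off the support it only needs to lie in $[-\lambda,\lambda]$, which is exactly what the inclusion in Eq. 12 expresses for this $f_2$.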

To simplify notation in our analysis, we denote

$${\bf v}^{k}=({\bf y}^{k},{\bf z}^{k},{\bf x}^{k})^{\top},\quad{\bf u}^{k}=({\bf v}^{k},{\bf x}^{k-1},{\bf x}^{k-2})^{\top},\quad\forall~k\geq 1, \tag{13}$$

and

$$\Delta_{\bf x}^{k}={\bf x}^{k}-{\bf x}^{k-1},\quad\Delta_{\bf y}^{k}={\bf y}^{k}-{\bf y}^{k-1},\quad\forall~k\geq 1. \tag{14}$$

Next, for $\gamma>0$, we define an auxiliary function $\mathcal{H}_{\gamma}$ as follows:

$$\begin{aligned}
\mathcal{H}_{\gamma}({\bf y},{\bf z},{\bf x})
&=f_{1}({\bf y})+f_{2}({\bf z})+h({\bf y})+\frac{1}{2\gamma}\left\|2{\bf y}-{\bf z}-{\bf x}-\gamma\nabla h({\bf y})\right\|^{2}\\
&\quad-\frac{1}{2\gamma}\left\|{\bf x}-{\bf y}+\gamma\nabla h({\bf y})\right\|^{2}-\frac{1}{\gamma}\left\|{\bf y}-{\bf z}\right\|^{2}\\
&=f_{1}({\bf y})+f_{2}({\bf z})+h({\bf y})+\frac{1}{2\gamma}\left\|{\bf y}-{\bf x}-\gamma\nabla h({\bf y})\right\|^{2}-\frac{1}{2\gamma}\left\|{\bf z}-{\bf x}-\gamma\nabla h({\bf y})\right\|^{2},
\end{aligned} \tag{15}$$

which is motivated by the DYS envelope studied in [42] and also used in [7]. Based on $\mathcal{H}_{\gamma}$, we define the energy function associated with the extrapolated DYS method Eq. 7 as follows:

$$\Theta_{\alpha,\gamma}({\bf y},{\bf z},{\bf x},{\bf x}_{1},{\bf x}_{2})=\mathcal{H}_{\gamma}({\bf y},{\bf z},{\bf x})+\frac{\alpha^{2}}{2\gamma}\left\|{\bf x}_{1}-{\bf x}_{2}\right\|^{2}, \tag{16}$$

where $\alpha\geq 0$ is the same extrapolation parameter as in Algorithm 1.
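The two expressions for $\mathcal{H}_\gamma$ in Eq. 15 agree for every $({\bf y},{\bf z},{\bf x})$, and this is easy to confirm numerically. In the sketch below the choices of $f_1$, $f_2$, $h$ and all constants are hypothetical; only the formulas Eq. 15 and Eq. 16 are taken from the text.

```python
import numpy as np

rng = np.random.default_rng(3)
n, gamma, alpha = 5, 0.1, 0.2
A = rng.standard_normal((7, n))
b = rng.standard_normal(7)
f1 = lambda y: 0.5 * y @ y                  # hypothetical choices of f1, f2, h
f2 = lambda z: np.sum(np.abs(z))
h = lambda y: 0.5 * np.sum((A @ y - b) ** 2)
grad_h = lambda y: A.T @ (A @ y - b)
sq = lambda v: v @ v                        # squared Euclidean norm

def H_first(y, z, x):
    # First expression in Eq. 15
    g = gamma * grad_h(y)
    return (f1(y) + f2(z) + h(y) + sq(2 * y - z - x - g) / (2 * gamma)
            - sq(x - y + g) / (2 * gamma) - sq(y - z) / gamma)

def H_second(y, z, x):
    # Second expression in Eq. 15
    g = gamma * grad_h(y)
    return f1(y) + f2(z) + h(y) + sq(y - x - g) / (2 * gamma) - sq(z - x - g) / (2 * gamma)

def Theta(y, z, x, x1, x2):
    # Energy function Eq. 16
    return H_second(y, z, x) + alpha**2 / (2 * gamma) * sq(x1 - x2)

points = [rng.standard_normal((5, n)) for _ in range(200)]   # rows: y, z, x, x1, x2
max_gap = max(abs(H_first(*p[:3]) - H_second(*p[:3])) for p in points)
theta_excess = min(Theta(*p) - H_second(*p[:3]) for p in points)
```

The equality holds because, with ${\bf p}={\bf y}-{\bf x}-\gamma\nabla h({\bf y})$ and ${\bf q}={\bf y}-{\bf z}$, both expressions reduce to $\frac{1}{2\gamma}\bigl(2\langle{\bf p},{\bf q}\rangle-\|{\bf q}\|^2\bigr)$ plus $f_1+f_2+h$.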

We first show that the sequence $\{\Theta_{\alpha,\gamma}({\bf u}^{k})\}_{k\geq 1}$ is monotonically nonincreasing.

Lemma 3.3.

Suppose that the two assumptions in Section 3.1 hold. Let the sequence $\{({\bf y}^{k},{\bf z}^{k},{\bf x}^{k})\}_{k\geq 1}$ be generated by Eq. 7, and let $\{{\bf u}^{k}\}$, $\{{\bf v}^{k}\}$ and $\{\Delta_{\bf x}^{k}\}$, $\{\Delta_{\bf y}^{k}\}$ be defined in Eq. 13 and Eq. 14, respectively. Then, for any given $\tau\in(\alpha,\Lambda(\gamma))$, the sequence $\{\Theta_{\alpha,\gamma}({\bf u}^{k})\}_{k\geq 1}$ is monotonically nonincreasing. In particular, for any $k\geq 1$, we have

$$\Theta_{\alpha,\gamma}({\bf u}^{k})-\Theta_{\alpha,\gamma}({\bf u}^{k+1})\geq\left(\Lambda(\gamma)-\tau\right)\left(\frac{1}{\gamma}+\frac{L_{h}}{2}\right)\left\|\Delta_{\bf y}^{k+1}\right\|^{2}+\xi(\alpha,\gamma)\left\|\Delta_{\bf x}^{k}\right\|^{2}, \tag{17}$$

where $\Lambda(\gamma)$ is defined in Eq. 8, and $\xi(\alpha,\gamma):=\frac{\alpha}{\gamma}+\alpha L_{h}-\frac{\alpha^{2}L_{h}}{2}-\frac{\gamma\alpha^{2}L_{h}^{2}}{2}-\frac{\alpha^{2}L_{h}}{2\tau}-\frac{\alpha^{2}}{\tau\gamma}$, which is positive whenever $\alpha>0$ (and vanishes when $\alpha=0$).
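The sign of $\xi(\alpha,\gamma)$ can be spot-checked by sampling parameters that satisfy the parameter assumption. A short argument: $\frac{\alpha}{\gamma}-\frac{\alpha^2}{\tau\gamma}>0$ because $\alpha<\tau$, and the remaining terms equal $\alpha L_h\bigl(1-\frac{\alpha(1+\gamma L_h)}{2}-\frac{\alpha}{2\tau}\bigr)$, which is nonnegative because $\alpha<\tau$ and, for $0<\gamma<\frac{1}{L_{f_1}+L_h}$ and $l\geq-L_{f_1}$, one has $\alpha(1+\gamma L_h)<\Lambda(\gamma)(1+\gamma L_h)<1$. The sampling ranges and helper names below are our own assumptions; note that $\xi$ also depends on $\tau$ and $L_h$.

```python
import numpy as np

def Lambda(gamma, l, L_f1, L_h):
    # Lambda(gamma) from Eq. 8
    return (1 - gamma * l - 2 * gamma * L_h) / (2 + gamma * L_h) - gamma**2 * L_f1**2

def xi(alpha, gamma, tau, L_h):
    # xi(alpha, gamma) from Lemma 3.3 (it also depends on tau and L_h)
    return (alpha / gamma + alpha * L_h - alpha**2 * L_h / 2 - gamma * alpha**2 * L_h**2 / 2
            - alpha**2 * L_h / (2 * tau) - alpha**2 / (tau * gamma))

rng = np.random.default_rng(4)
samples = []
while len(samples) < 2000:
    L_f1 = rng.uniform(0.1, 5.0)
    L_h = rng.uniform(0.0, 5.0)
    l = rng.uniform(-L_f1, L_f1)
    gamma = rng.uniform(0.0, 1.0) / (L_f1 + L_h)   # 0 <= gamma < 1/(L_f1 + L_h)
    Lam = Lambda(gamma, l, L_f1, L_h)
    if gamma <= 0 or Lam <= 0:
        continue                                   # parameter assumption cannot hold; resample
    alpha = rng.uniform(0.01, 0.99) * Lam          # 0 < alpha < Lambda(gamma)
    tau = alpha + rng.uniform(0.05, 0.95) * (Lam - alpha)   # tau in (alpha, Lambda(gamma))
    samples.append(xi(alpha, gamma, tau, L_h))
```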

Proof 3.4.

It follows from Eq. 15 that

$$\begin{aligned}
\mathcal{H}_{\gamma}({\bf y}^{k+1},{\bf z}^{k+1},{\bf x}^{k})-\mathcal{H}_{\gamma}({\bf y}^{k+1},{\bf z}^{k+1},{\bf x}^{k+1})
&=\frac{1}{\gamma}\left\langle-\Delta_{\bf x}^{k+1},{\bf z}^{k+1}-{\bf y}^{k+1}\right\rangle\\
&=-\frac{1}{\gamma}\left\|{\bf y}^{k+1}-{\bf z}^{k+1}\right\|^{2}-\frac{\alpha}{\gamma}\left\langle{\bf z}^{k+1}-{\bf y}^{k+1},\Delta_{\bf x}^{k}\right\rangle,
\end{aligned} \tag{18}$$

where the last equality follows from the first and last relations in Eq. 7, which together give $\Delta_{\bf x}^{k+1}=\alpha\Delta_{\bf x}^{k}+{\bf z}^{k+1}-{\bf y}^{k+1}$. Since ${\bf z}^{k+1}$ is a minimizer of the ${\bf z}$-subproblem (the third relation in Eq. 7), we have

$$\begin{aligned}
&f_{2}({\bf z}^{k})+\frac{1}{2\gamma}\left\|2{\bf y}^{k+1}-{\bf z}^{k}-{\bf w}^{k}-\gamma\nabla h({\bf y}^{k+1})\right\|^{2}\\
&\quad\geq f_{2}({\bf z}^{k+1})+\frac{1}{2\gamma}\left\|2{\bf y}^{k+1}-{\bf z}^{k+1}-{\bf w}^{k}-\gamma\nabla h({\bf y}^{k+1})\right\|^{2}.
\end{aligned}$$

Combining this with Eq. 15, we obtain

$$\begin{aligned}
&\mathcal{H}_{\gamma}({\bf y}^{k+1},{\bf z}^{k},{\bf x}^{k})-\mathcal{H}_{\gamma}({\bf y}^{k+1},{\bf z}^{k+1},{\bf x}^{k})\\
&=f_{2}({\bf z}^{k})+\frac{1}{2\gamma}\left\|2{\bf y}^{k+1}-{\bf z}^{k}-{\bf x}^{k}-\gamma\nabla h({\bf y}^{k+1})\right\|^{2}-\frac{1}{\gamma}\left\|{\bf y}^{k+1}-{\bf z}^{k}\right\|^{2}\\
&\quad-f_{2}({\bf z}^{k+1})-\frac{1}{2\gamma}\left\|2{\bf y}^{k+1}-{\bf z}^{k+1}-{\bf x}^{k}-\gamma\nabla h({\bf y}^{k+1})\right\|^{2}+\frac{1}{\gamma}\left\|{\bf y}^{k+1}-{\bf z}^{k+1}\right\|^{2}\\
&\geq\frac{1}{\gamma}\left\|{\bf y}^{k+1}-{\bf z}^{k+1}\right\|^{2}-\frac{1}{\gamma}\left\|{\bf y}^{k+1}-{\bf z}^{k}\right\|^{2}+\frac{1}{\gamma}\left\langle{\bf z}^{k+1}-{\bf z}^{k},{\bf w}^{k}-{\bf x}^{k}\right\rangle\\
&=\frac{1}{\gamma}\left\|{\bf y}^{k+1}-{\bf z}^{k+1}\right\|^{2}-\frac{1}{\gamma}\left\|{\bf y}^{k+1}-{\bf z}^{k}\right\|^{2}+\frac{\alpha}{\gamma}\left\langle{\bf z}^{k+1}-{\bf z}^{k},\Delta_{\bf x}^{k}\right\rangle,
\end{aligned} \tag{19}$$

where the last equality follows from the relation ${\bf w}^{k}={\bf x}^{k}+\alpha\Delta_{\bf x}^{k}$ in Eq. 7. Since $f_{1}+\frac{1}{2\gamma}\|{\bf w}^{k}-\cdot\|^{2}$ is strongly convex with modulus $\frac{1}{\gamma}-l>0$ (note that $\gamma l<1$ because $\gamma<\frac{1}{L_{f_1}}$ and $l\leq L_{f_1}$), and ${\bf y}^{k+1}$ is its minimizer by the optimality condition Eq. 11 for the ${\bf y}$-subproblem, we obtain

$$
f_{1}({\bf y}^{k})+\frac{1}{2\gamma}\left\|{\bf y}^{k}-{\bf w}^{k}\right\|^{2}\geq f_{1}({\bf y}^{k+1})+\frac{1}{2\gamma}\left\|{\bf y}^{k+1}-{\bf w}^{k}\right\|^{2}+\frac{1}{2}\left(\frac{1}{\gamma}-l\right)\left\|\Delta_{\bf y}^{k+1}\right\|^{2}.
$$

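Substituting ${\bf w}^{k}={\bf x}^{k}+\alpha\Delta_{\bf x}^{k}$ into both squared norms and cancelling the common term $\frac{\alpha^{2}}{2\gamma}\left\|\Delta_{\bf x}^{k}\right\|^{2}$, we deduce that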

$$
f_{1}({\bf y}^{k})+\frac{1}{2\gamma}\left\|{\bf y}^{k}-{\bf x}^{k}\right\|^{2}\geq f_{1}({\bf y}^{k+1})+\frac{1}{2\gamma}\left\|{\bf y}^{k+1}-{\bf x}^{k}\right\|^{2}-\frac{\alpha}{\gamma}\left\langle\Delta_{\bf y}^{k+1},\Delta_{\bf x}^{k}\right\rangle+\frac{1}{2}\left(\frac{1}{\gamma}-l\right)\left\|\Delta_{\bf y}^{k+1}\right\|^{2}.
$$
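The inequality above can be probed numerically on a toy instance. The sketch below is an illustration under our own assumptions, not part of the method: we take a weakly convex quadratic $f_{1}({\bf y})=\frac{1}{2}{\bf y}^{\top}Q{\bf y}$ whose eigenvalues lie in $(-l,1-l]$, with $\gamma l<1$, so that the ${\bf y}$-step has the closed form $(Q+\gamma^{-1}I)^{-1}{\bf w}^{k}/\gamma$. The names `f1`, `y_step` and `check_y_step_inequality` are hypothetical.

```python
import numpy as np

# Toy instance (our assumption, not from the paper): f1(y) = 0.5 * y^T Q y
# with eigenvalues of Q in (-l, 1 - l], i.e. f1 is l-weakly convex.
rng = np.random.default_rng(0)
n, l, gamma, alpha = 5, 0.5, 1.0, 0.3      # gamma * l = 0.5 < 1

B = rng.standard_normal((n, n))
Q = B @ B.T
Q = Q / np.linalg.norm(Q, 2) - l * np.eye(n)


def f1(y):
    return 0.5 * y @ Q @ y


def y_step(w):
    # Hypothetical closed-form y-step: minimizer of f1(y) + ||y - w||^2 / (2*gamma),
    # from the optimality condition Q y + (y - w) / gamma = 0.
    return np.linalg.solve(Q + np.eye(n) / gamma, w / gamma)


def check_y_step_inequality(trials=1000):
    """Smallest value of (LHS - RHS) of the inequality over random trials."""
    worst = np.inf
    for _ in range(trials):
        x_prev, x_k, y_k = rng.standard_normal((3, n))
        dx = x_k - x_prev                    # Delta_x^k
        w_k = x_k + alpha * dx               # w^k = x^k + alpha * Delta_x^k
        y_next = y_step(w_k)                 # y^{k+1}
        dy = y_next - y_k                    # Delta_y^{k+1}
        lhs = f1(y_k) + (y_k - x_k) @ (y_k - x_k) / (2 * gamma)
        rhs = (f1(y_next) + (y_next - x_k) @ (y_next - x_k) / (2 * gamma)
               - alpha / gamma * (dy @ dx)
               + 0.5 * (1 / gamma - l) * (dy @ dy))
        worst = min(worst, lhs - rhs)
    return worst


print(check_y_step_inequality())  # should be nonnegative up to round-off
```

For a quadratic $f_{1}$ the gap equals $\frac{1}{2}(\Delta_{\bf y}^{k+1})^{\top}(Q+lI)\Delta_{\bf y}^{k+1}$, which is why it stays nonnegative here; for a general weakly convex $f_{1}$ the sketch only tests, and does not prove, the bound.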

Therefore, it follows from Eq. 15 that

$$
\begin{aligned}
&\mathcal{H}_{\gamma}\left({\bf y}^{k},{\bf z}^{k},{\bf x}^{k}\right)-\mathcal{H}_{\gamma}\left({\bf y}^{k+1},{\bf z}^{k},{\bf x}^{k}\right)\\
&=f_{1}({\bf y}^{k})+h({\bf y}^{k})+\frac{1}{2\gamma}\left\|{\bf y}^{k}-{\bf x}^{k}-\gamma\nabla h({\bf y}^{k})\right\|^{2}-\frac{1}{2\gamma}\left\|{\bf z}^{k}-{\bf x}^{k}-\gamma\nabla h({\bf y}^{k})\right\|^{2}\\
&\quad-f_{1}({\bf y}^{k+1})-h({\bf y}^{k+1})-\frac{1}{2\gamma}\left\|{\bf y}^{k+1}-{\bf x}^{k}-\gamma\nabla h({\bf y}^{k+1})\right\|^{2}+\frac{1}{2\gamma}\left\|{\bf z}^{k}-{\bf x}^{k}-\gamma\nabla h({\bf y}^{k+1})\right\|^{2}\\
&\geq h({\bf y}^{k})-\left\langle{\bf y}^{k}-{\bf x}^{k},\nabla h({\bf y}^{k})\right\rangle+\frac{\gamma}{2}\left\|\nabla h({\bf y}^{k})\right\|^{2}-\frac{1}{2\gamma}\left\|{\bf z}^{k}-{\bf x}^{k}-\gamma\nabla h({\bf y}^{k})\right\|^{2}\\
&\quad-h({\bf y}^{k+1})+\left\langle{\bf y}^{k+1}-{\bf x}^{k},\nabla h({\bf y}^{k+1})\right\rangle-\frac{\gamma}{2}\left\|\nabla h({\bf y}^{k+1})\right\|^{2}+\frac{1}{2\gamma}\left\|{\bf z}^{k}-{\bf x}^{k}-\gamma\nabla h({\bf y}^{k+1})\right\|^{2}\\
&\quad+\frac{1}{2}\left(\frac{1}{\gamma}-l\right)\left\|\Delta_{\bf y}^{k+1}\right\|^{2}-\frac{\alpha}{\gamma}\left\langle\Delta_{\bf y}^{k+1},\Delta_{\bf x}^{k}\right\rangle.
\end{aligned}
$$

Then, expanding the squares and combining the terms on the right-hand side of the above inequality, we have

$$
\begin{aligned}
&\mathcal{H}_{\gamma}\left({\bf y}^{k},{\bf z}^{k},{\bf x}^{k}\right)-\mathcal{H}_{\gamma}\left({\bf y}^{k+1},{\bf z}^{k},{\bf x}^{k}\right)\\
&\geq h({\bf y}^{k})+\left\langle{\bf z}^{k}-{\bf y}^{k},\nabla h({\bf y}^{k})\right\rangle-\frac{1}{2\gamma}\left\|{\bf z}^{k}-{\bf x}^{k}\right\|^{2}-h({\bf y}^{k+1})+\left\langle{\bf y}^{k+1}-{\bf z}^{k},\nabla h({\bf y}^{k+1})\right\rangle\\
&\quad+\frac{1}{2\gamma}\left\|{\bf z}^{k}-{\bf x}^{k}\right\|^{2}+\frac{1}{2}\left(\frac{1}{\gamma}-l\right)\left\|\Delta_{\bf y}^{k+1}\right\|^{2}-\frac{\alpha}{\gamma}\left\langle\Delta_{\bf y}^{k+1},\Delta_{\bf x}^{k}\right\rangle\\
&=h({\bf y}^{k})+\left\langle{\bf z}^{k}-{\bf y}^{k},\nabla h({\bf y}^{k})\right\rangle-h({\bf y}^{k+1})-\left\langle{\bf z}^{k}-{\bf y}^{k+1},\nabla h({\bf y}^{k+1})\right\rangle\\
&\quad+\frac{1}{2}\left(\frac{1}{\gamma}-l\right)\left\|\Delta_{\bf y}^{k+1}\right\|^{2}-\frac{\alpha}{\gamma}\left\langle\Delta_{\bf y}^{k+1},\Delta_{\bf x}^{k}\right\rangle.
\end{aligned}
\tag{20}
$$
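The passage to Eq. 20 is pure algebra, so it can be checked with arbitrary vectors in place of $\nabla h({\bf y}^{k})$ and $\nabla h({\bf y}^{k+1})$ and arbitrary scalars in place of $h({\bf y}^{k})$ and $h({\bf y}^{k+1})$. The following sketch (helper names are our own) compares the $h$-dependent terms before and after expanding the squares; the two terms involving $\Delta_{\bf y}^{k+1}$ are identical on both sides and are omitted.

```python
import numpy as np


def sq(v):
    return float(v @ v)


def expanded_bound(h0, h1, y0, y1, z, x, g0, g1, gamma):
    # h-dependent terms before expanding the squares;
    # g0, g1 stand in for grad h(y^k), grad h(y^{k+1}).
    return (h0 - (y0 - x) @ g0 + gamma / 2 * sq(g0)
            - sq(z - x - gamma * g0) / (2 * gamma)
            - h1 + (y1 - x) @ g1 - gamma / 2 * sq(g1)
            + sq(z - x - gamma * g1) / (2 * gamma))


def simplified_bound(h0, h1, y0, y1, z, g0, g1):
    # Same terms after expansion: the +/- ||z^k - x^k||^2 / (2 gamma) cancel.
    return h0 + (z - y0) @ g0 - h1 - (z - y1) @ g1


rng = np.random.default_rng(1)
max_gap = 0.0
for _ in range(100):
    y0, y1, z, x, g0, g1 = rng.standard_normal((6, 4))
    h0, h1 = rng.standard_normal(2)
    gamma = rng.uniform(0.1, 2.0)
    gap = abs(expanded_bound(h0, h1, y0, y1, z, x, g0, g1, gamma)
              - simplified_bound(h0, h1, y0, y1, z, g0, g1))
    max_gap = max(max_gap, gap)

print(f"max |difference| = {max_gap:.1e}")
```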

Next, by Lemma 2.4, the $L_{h}$-Lipschitz continuity of $\nabla h$, and the Cauchy-Schwarz inequality combined with $ab\leq\frac{1}{2}(a^{2}+b^{2})$, we have

$$
\begin{aligned}
&h({\bf y}^{k})-h({\bf y}^{k+1})+\left\langle{\bf z}^{k}-{\bf y}^{k},\nabla h({\bf y}^{k})\right\rangle-\left\langle{\bf z}^{k}-{\bf y}^{k+1},\nabla h({\bf y}^{k+1})\right\rangle\\
&=h({\bf y}^{k})-h({\bf y}^{k+1})+\left\langle\Delta_{\bf y}^{k+1},\nabla h({\bf y}^{k})\right\rangle-\left\langle{\bf z}^{k}-{\bf y}^{k+1},\nabla h({\bf y}^{k+1})-\nabla h({\bf y}^{k})\right\rangle\\
&\geq-\frac{L_{h}}{2}\left\|\Delta_{\bf y}^{k+1}\right\|^{2}-\left\langle{\bf z}^{k}-{\bf y}^{k+1},\nabla h({\bf y}^{k+1})-\nabla h({\bf y}^{k})\right\rangle\\
&\geq-\frac{L_{h}}{2}\left\|\Delta_{\bf y}^{k+1}\right\|^{2}-\frac{L_{h}}{2}\left\|{\bf y}^{k+1}-{\bf z}^{k}\right\|^{2}-\frac{L_{h}}{2}\left\|\Delta_{\bf y}^{k+1}\right\|^{2}.
\end{aligned}
\tag{21}
$$
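Eq. 21 can likewise be probed on a test function. The sketch below assumes, purely for illustration, a possibly nonconvex quadratic $h({\bf y})=\frac{1}{2}{\bf y}^{\top}A{\bf y}+{\bf b}^{\top}{\bf y}$ with symmetric $A$, whose gradient is Lipschitz with $L_{h}=\|A\|_{2}$. It evaluates the left-hand side of Eq. 21 minus its final lower bound at random points; the helper `eq21_gap` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
M = rng.standard_normal((n, n))
A = (M + M.T) / 2                     # symmetric, generally indefinite
b = rng.standard_normal(n)
L_h = np.linalg.norm(A, 2)            # Lipschitz constant of grad h


def h(y):
    return 0.5 * y @ A @ y + b @ y


def grad_h(y):
    return A @ y + b


def eq21_gap(y_k, y_next, z_k):
    """LHS of Eq. 21 minus its final lower bound (should be >= 0)."""
    dy = y_next - y_k                 # Delta_y^{k+1}
    lhs = (h(y_k) - h(y_next) + (z_k - y_k) @ grad_h(y_k)
           - (z_k - y_next) @ grad_h(y_next))
    rhs = -L_h * (dy @ dy) - L_h / 2 * np.sum((y_next - z_k) ** 2)
    return lhs - rhs


gaps = [eq21_gap(*rng.standard_normal((3, n))) for _ in range(10000)]
print(min(gaps))
```

Because $A$ may be indefinite, the test exercises the nonconvex setting in which only the descent lemma, and not convexity of $h$, is used.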

Substituting Eq. 21 into Eq. 20, we obtain

$$
\begin{aligned}
&\mathcal{H}_{\gamma}\left({\bf y}^{k},{\bf z}^{k},{\bf x}^{k}\right)-\mathcal{H}_{\gamma}\left({\bf y}^{k+1},{\bf z}^{k},{\bf x}^{k}\right)\\
&\geq\frac{1}{2}\left(\frac{1}{\gamma}-l\right)\left\|\Delta_{\bf y}^{k+1}\right\|^{2}-\frac{\alpha}{\gamma}\left\langle\Delta_{\bf y}^{k+1},\Delta_{\bf x}^{k}\right\rangle-L_{h}\left\|\Delta_{\bf y}^{k+1}\right\|^{2}-\frac{L_{h}}{2}\left\|{\bf y}^{k+1}-{\bf z}^{k}\right\|^{2}.
\end{aligned}
\tag{22}
$$

Summing Eq. 18, Eq. 19 and Eq. 22 yields

$$
\begin{aligned}
&\mathcal{H}_{\gamma}\left({\bf y}^{k},{\bf z}^{k},{\bf x}^{k}\right)-\mathcal{H}_{\gamma}\left({\bf y}^{k+1},{\bf z}^{k+1},{\bf x}^{k+1}\right)\\
&\geq\frac{1-\gamma l-2\gamma L_{h}}{2\gamma}\left\|\Delta_{\bf y}^{k+1}\right\|^{2}-\left(\frac{1}{\gamma}+\frac{L_{h}}{2}\right)\left\|{\bf y}^{k+1}-{\bf z}^{k}\right\|^{2}+\frac{\alpha}{\gamma}\left\langle{\bf y}^{k}-{\bf z}^{k},\Delta_{\bf x}^{k}\right\rangle\\
&=\frac{1-\gamma l-2\gamma L_{h}}{2\gamma}\left\|\Delta_{\bf y}^{k+1}\right\|^{2}-\left(\frac{1}{\gamma}+\frac{L_{h}}{2}\right)\left\|{\bf y}^{k+1}-{\bf z}^{k}\right\|^{2}\\
&\quad-\frac{\alpha}{\gamma}\left\|\Delta_{\bf x}^{k}\right\|^{2}+\frac{\alpha^{2}}{\gamma}\left\langle\Delta_{\bf x}^{k-1},\Delta_{\bf x}^{k}\right\rangle,
\end{aligned}
\tag{23}
$$

where the last equality holds because ${\bf y}^{k}-{\bf z}^{k}={\bf w}^{k-1}-{\bf x}^{k}={\bf x}^{k-1}-{\bf x}^{k}+\alpha({\bf x}^{k-1}-{\bf x}^{k-2})$ by Eq. 7. It remains to bound the term $\|{\bf y}^{k+1}-{\bf z}^{k}\|^{2}$, which enters Eq. 23 with a negative coefficient. It follows from the second equality in Eq. 7 that $0=\nabla f_{1}({\bf y}^{k+1})+\frac{1}{\gamma}({\bf y}^{k+1}-{\bf w}^{k})$, that is, ${\bf y}^{k+1}-{\bf w}^{k}=-\gamma\nabla f_{1}({\bf y}^{k+1})$ for every $k$. Combining this with ${\bf z}^{k}={\bf x}^{k}-{\bf w}^{k-1}+{\bf y}^{k}$, we obtain

$$
\begin{aligned}
\left\|{\bf y}^{k+1}-{\bf z}^{k}\right\|^{2}
&=\left\|{\bf y}^{k+1}-\left({\bf x}^{k}-{\bf w}^{k-1}+{\bf y}^{k}\right)\right\|^{2}\\
&=\left\|({\bf y}^{k+1}-{\bf w}^{k})-({\bf y}^{k}-{\bf w}^{k-1})+({\bf w}^{k}-{\bf x}^{k})\right\|^{2}\\
&=\gamma^{2}\left\|\nabla f_{1}({\bf y}^{k})-\nabla f_{1}({\bf y}^{k+1})\right\|^{2}+\left\|{\bf w}^{k}-{\bf x}^{k}\right\|^{2}+2\left\langle({\bf y}^{k+1}-{\bf w}^{k})-({\bf y}^{k}-{\bf w}^{k-1}),{\bf w}^{k}-{\bf x}^{k}\right\rangle\\
&\leq\gamma^{2}L_{f_{1}}^{2}\left\|\Delta_{\bf y}^{k+1}\right\|^{2}+\alpha^{2}\left\|\Delta_{\bf x}^{k}\right\|^{2}+2\alpha\left\langle({\bf y}^{k+1}-{\bf w}^{k})-({\bf y}^{k}-{\bf w}^{k-1}),\Delta_{\bf x}^{k}\right\rangle\\
&=\gamma^{2}L_{f_{1}}^{2}\left\|\Delta_{\bf y}^{k+1}\right\|^{2}+2\alpha\left\langle\Delta_{\bf y}^{k+1},\Delta_{\bf x}^{k}\right\rangle+2\alpha^{2}\left\langle\Delta_{\bf x}^{k-1},\Delta_{\bf x}^{k}\right\rangle-\alpha(2+\alpha)\left\|\Delta_{\bf x}^{k}\right\|^{2},
\end{aligned}
\tag{24}
$$

where the last equality follows from the relation ${\bf w}^{k-1}-{\bf w}^{k}=\alpha({\bf x}^{k-1}-{\bf x}^{k-2})+(1+\alpha)({\bf x}^{k-1}-{\bf x}^{k})$ given by Eq. 7. This relation yields $({\bf y}^{k+1}-{\bf w}^{k})-({\bf y}^{k}-{\bf w}^{k-1})=\Delta_{\bf y}^{k+1}+\alpha\Delta_{\bf x}^{k-1}-(1+\alpha)\Delta_{\bf x}^{k}$. Substituting Eq. 24 into Eq. 23 yields

$$
\begin{aligned}
&\mathcal{H}_{\gamma}({\bf y}^{k},{\bf z}^{k},{\bf x}^{k})-\mathcal{H}_{\gamma}({\bf y}^{k+1},{\bf z}^{k+1},{\bf x}^{k+1})\\
&\geq\left(\frac{1-\gamma l-2\gamma L_{h}}{2\gamma}-\left(\frac{1}{\gamma}+\frac{L_{h}}{2}\right)\gamma^{2}L_{f_{1}}^{2}\right)\|\Delta_{\bf y}^{k+1}\|^{2}\\
&\quad+\left(\frac{\alpha+\alpha^{2}}{\gamma}+\alpha L_{h}+\frac{\alpha^{2}L_{h}}{2}\right)\|\Delta_{\bf x}^{k}\|^{2}\\
&\quad-\left(\frac{2}{\gamma}+L_{h}\right)\alpha\langle\Delta_{\bf y}^{k+1},\Delta_{\bf x}^{k}\rangle-\left(\frac{1}{\gamma}+L_{h}\right)\alpha^{2}\langle\Delta_{\bf x}^{k-1},\Delta_{\bf x}^{k}\rangle.
\end{aligned}
\tag{25}
$$
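The last equality of Eq. 24 is an algebraic identity, and a numerical check with random iterates can confirm it. The Python sketch below is our own illustration, not code from the paper. It makes several assumptions. First, the extrapolation has the form ${\bf w}^{k}={\bf x}^{k}+\alpha({\bf x}^{k}-{\bf x}^{k-1})$, which is one form consistent with the relation quoted above. Second, $\Delta_{\bf x}^{k}={\bf x}^{k}-{\bf x}^{k-1}$ and $\Delta_{\bf y}^{k+1}={\bf y}^{k+1}-{\bf y}^{k}$. Third, the common term $\gamma^{2}L_{f_{1}}^{2}\|\Delta_{\bf y}^{k+1}\|^{2}$ is dropped from both sides. The helper names (`extrapolate`, `check_eq24`) are hypothetical.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sub(u, v):
    return [a - b for a, b in zip(u, v)]


def extrapolate(x_curr, x_prev, alpha):
    # Hypothetical extrapolation step: w = x_curr + alpha * (x_curr - x_prev).
    return [c + alpha * (c - p) for c, p in zip(x_curr, x_prev)]


def check_eq24(alpha, dim=5, seed=0):
    """Return (error of the w-relation, error of the expansion) for random iterates."""
    rng = random.Random(seed)

    def vec():
        return [rng.uniform(-1.0, 1.0) for _ in range(dim)]

    x_km2, x_km1, x_k = vec(), vec(), vec()
    y_k, y_kp1 = vec(), vec()
    w_km1 = extrapolate(x_km1, x_km2, alpha)
    w_k = extrapolate(x_k, x_km1, alpha)
    dx_km1, dx_k = sub(x_km1, x_km2), sub(x_k, x_km1)
    dy_kp1 = sub(y_kp1, y_k)

    # w^{k-1} - w^k = alpha (x^{k-1} - x^{k-2}) + (1 + alpha)(x^{k-1} - x^k)
    rhs_w = [alpha * a - (1 + alpha) * b for a, b in zip(dx_km1, dx_k)]
    id_err = max(abs(a - b) for a, b in zip(sub(w_km1, w_k), rhs_w))

    # Both sides of the last equality of Eq. 24, without gamma^2 L_{f1}^2 ||dy||^2.
    mixed = sub(sub(y_kp1, w_k), sub(y_k, w_km1))
    lhs = alpha**2 * dot(dx_k, dx_k) + 2 * alpha * dot(mixed, dx_k)
    rhs = (2 * alpha * dot(dy_kp1, dx_k)
           + 2 * alpha**2 * dot(dx_km1, dx_k)
           - alpha * (2 + alpha) * dot(dx_k, dx_k))
    return id_err, abs(lhs - rhs)


for alpha in (0.0, 0.3, 0.9):
    print(alpha, check_eq24(alpha))
```

Both reported errors should be at the level of floating-point round-off for every `alpha`.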

Note that for any $\tau>0$, Young's inequality gives

$$\alpha\langle\Delta_{\bf y}^{k+1},\Delta_{\bf x}^{k}\rangle\leq\frac{\tau}{2}\|\Delta_{\bf y}^{k+1}\|^{2}+\frac{\alpha^{2}}{2\tau}\|\Delta_{\bf x}^{k}\|^{2},$$

and

$$\left(\frac{1}{\gamma}+L_{h}\right)\alpha^{2}\langle\Delta_{\bf x}^{k-1},\Delta_{\bf x}^{k}\rangle\leq\frac{\alpha^{2}}{2\gamma}\|\Delta_{\bf x}^{k-1}\|^{2}+\frac{\gamma\alpha^{2}}{2}\left(\frac{1}{\gamma}+L_{h}\right)^{2}\|\Delta_{\bf x}^{k}\|^{2}.$$
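Both bounds are instances of Young's inequality $\langle a,b\rangle\leq\frac{\varepsilon}{2}\|a\|^{2}+\frac{1}{2\varepsilon}\|b\|^{2}$, so each slack (right side minus left side) is a scaled squared norm. The slack of the first bound is $\frac{1}{2\tau}\|\tau\Delta_{\bf y}^{k+1}-\alpha\Delta_{\bf x}^{k}\|^{2}$. The slack of the second is $\frac{\alpha^{2}}{2\gamma}\|\Delta_{\bf x}^{k-1}-(1+\gamma L_{h})\Delta_{\bf x}^{k}\|^{2}$, which corresponds to the weight $\varepsilon=1/(1+\gamma L_{h})$. The Python sketch below is our own check, using random vectors and arbitrarily chosen positive parameters. It compares each slack with its closed form.

```python
import random


def sq(u):
    return sum(a * a for a in u)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def young_slacks(dy, dx, dx_prev, alpha, gamma, L_h, tau):
    """Slack (right side minus left side) of the two Young-type bounds, plus closed forms."""
    slack1 = tau / 2 * sq(dy) + alpha**2 / (2 * tau) * sq(dx) - alpha * dot(dy, dx)
    c = (1 / gamma + L_h) * alpha**2
    slack2 = (alpha**2 / (2 * gamma) * sq(dx_prev)
              + gamma * alpha**2 / 2 * (1 / gamma + L_h) ** 2 * sq(dx)
              - c * dot(dx_prev, dx))
    # Closed forms: each slack is a scaled squared norm, hence nonnegative.
    closed1 = sq([tau * a - alpha * b for a, b in zip(dy, dx)]) / (2 * tau)
    closed2 = alpha**2 / (2 * gamma) * sq(
        [p - (1 + gamma * L_h) * b for p, b in zip(dx_prev, dx)])
    return slack1, slack2, closed1, closed2


rng = random.Random(1)


def rand_vec(dim=4):
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


print(young_slacks(rand_vec(), rand_vec(), rand_vec(),
                   alpha=0.4, gamma=0.5, L_h=2.0, tau=0.7))
```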

Substituting the above inequalities into Eq. 25, we get

$$
\begin{aligned}
&\mathcal{H}_{\gamma}({\bf y}^{k},{\bf z}^{k},{\bf x}^{k})-\mathcal{H}_{\gamma}({\bf y}^{k+1},{\bf z}^{k+1},{\bf x}^{k+1})\\
&\geq(\Lambda(\gamma)-\tau)\left(\frac{1}{\gamma}+\frac{L_{h}}{2}\right)\|\Delta_{\bf y}^{k+1}\|^{2}-\frac{\alpha^{2}}{2\gamma}\|\Delta_{\bf x}^{k-1}\|^{2}\\
&\quad+\left(\frac{\alpha}{\gamma}+\frac{\alpha^{2}}{2\gamma}+\alpha L_{h}-\frac{\alpha^{2}L_{h}}{2}-\frac{\gamma\alpha^{2}L_{h}^{2}}{2}-\frac{\alpha^{2}L_{h}}{2\tau}-\frac{\alpha^{2}}{\tau\gamma}\right)\|\Delta_{\bf x}^{k}\|^{2},
\end{aligned}
\tag{26}
$$

where $\Lambda(\gamma)$ is defined in Eq. 8 and $\tau$ is an auxiliary parameter chosen so that $\alpha<\tau<\Lambda(\gamma)$. Such a $\tau$ exists because $\alpha<\Lambda(\gamma)$ by Section 3.1. The conclusion Eq. 17 then follows directly from Eq. 26 and the definition of $\Theta_{\alpha,\gamma}$ in Eq. 16.
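Passing from Eq. 25 to Eq. 26 is pure coefficient bookkeeping. The $\|\Delta_{\bf x}^{k}\|^{2}$ coefficient of Eq. 25 loses two amounts: $\left(\frac{2}{\gamma}+L_{h}\right)\frac{\alpha^{2}}{2\tau}$ from the first Young bound and $\frac{\gamma\alpha^{2}}{2}\left(\frac{1}{\gamma}+L_{h}\right)^{2}$ from the second. The $\|\Delta_{\bf y}^{k+1}\|^{2}$ coefficient loses $\left(\frac{2}{\gamma}+L_{h}\right)\frac{\tau}{2}=\tau\left(\frac{1}{\gamma}+\frac{L_{h}}{2}\right)$. The Python sketch below is our own check, not from the paper. It uses exact rational arithmetic from the standard `fractions` module at a few arbitrarily chosen sample points.

```python
from fractions import Fraction as Fr


def coef_x_eq25(a, g, Lh):
    # Coefficient of ||Delta_x^k||^2 in Eq. 25.
    return (a + a**2) / g + a * Lh + a**2 * Lh / 2


def coef_x_eq26(a, g, Lh, t):
    # Coefficient of ||Delta_x^k||^2 in Eq. 26.
    return (a / g + a**2 / (2 * g) + a * Lh - a**2 * Lh / 2
            - g * a**2 * Lh**2 / 2 - a**2 * Lh / (2 * t) - a**2 / (t * g))


def young_loss_x(a, g, Lh, t):
    first = (2 / g + Lh) * a**2 / (2 * t)      # from the bound involving tau
    second = g * a**2 / 2 * (1 / g + Lh) ** 2   # from the bound on <Dx^{k-1}, Dx^k>
    return first + second


def bookkeeping_holds(a, g, Lh, t):
    x_ok = coef_x_eq25(a, g, Lh) - young_loss_x(a, g, Lh, t) == coef_x_eq26(a, g, Lh, t)
    y_ok = (2 / g + Lh) * t / 2 == t * (1 / g + Lh / 2)
    return x_ok and y_ok


samples = [(Fr(1, 3), Fr(1, 2), Fr(2), Fr(2, 5)),
           (Fr(1, 10), Fr(3, 7), Fr(0), Fr(1, 4)),
           (Fr(2, 9), Fr(1, 5), Fr(5, 3), Fr(3, 10))]
print(all(bookkeeping_holds(*s) for s in samples))  # True
```

Exact fractions make the comparison an equality test rather than a tolerance test, so any sign or factor slip in transcribing Eq. 26 would show up as `False`.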

Now we show that $\xi(\alpha,\gamma):=\frac{\alpha}{\gamma}+\alpha L_{h}-\frac{\alpha^{2}L_{h}}{2}-\frac{\gamma\alpha^{2}L_{h}^{2}}{2}-\frac{\alpha^{2}L_{h}}{2\tau}-\frac{\alpha^{2}}{\tau\gamma}>0$. Since $\tau>\alpha$, we have $\frac{\alpha}{\gamma}-\frac{\alpha^{2}}{\tau\gamma}=\frac{\alpha}{\gamma}\left(1-\frac{\alpha}{\tau}\right)>0$.
It therefore remains to show that $\alpha L_{h}-\frac{\alpha^{2}L_{h}}{2}-\frac{\gamma\alpha^{2}L_{h}^{2}}{2}-\frac{\alpha^{2}L_{h}}{2\tau}\geq 0$. According to Section 3.1, we know that

$$\alpha<\Lambda(\gamma)<\frac{1-\gamma l-2\gamma L_{h}}{2+\gamma L_{h}}<\frac{1}{1+\gamma L_{h}}.\tag{27}$$

Combining this with $\tau>\alpha$ and $L_{h}\geq 0$, we have $\alpha L_{h}-\frac{\alpha^{2}L_{h}}{2}-\frac{\gamma\alpha^{2}L_{h}^{2}}{2}-\frac{\alpha^{2}L_{h}}{2\tau}\geq\alpha L_{h}-\frac{\alpha^{2}L_{h}}{2}-\frac{\gamma\alpha^{2}L_{h}^{2}}{2}-\frac{\alpha^{2}L_{h}}{2\alpha}=\frac{\alpha L_{h}}{2}(1-\alpha-\alpha\gamma L_{h})\geq 0$. The last inequality holds because $\alpha(1+\gamma L_{h})<1$ by Eq. 27, and both inequalities are strict when $L_{h}>0$. This completes the proof.
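The argument above shows that $\xi(\alpha,\gamma)$ is bounded below by $\frac{\alpha}{\gamma}\left(1-\frac{\alpha}{\tau}\right)+\frac{\alpha L_{h}}{2}(1-\alpha-\alpha\gamma L_{h})$. In fact, the two differ by exactly $\frac{\alpha L_{h}}{2}\left(1-\frac{\alpha}{\tau}\right)\geq 0$. The Python sketch below evaluates both quantities on a small grid. The grid samples only the conditions the argument actually uses: $\gamma>0$, $L_{h}\geq 0$, $0<\alpha<\frac{1}{1+\gamma L_{h}}$ and $\tau>\alpha$. The grid values are arbitrary, and the check is an illustration, not a proof.

```python
def xi(alpha, gamma, L_h, tau):
    return (alpha / gamma + alpha * L_h - alpha**2 * L_h / 2
            - gamma * alpha**2 * L_h**2 / 2
            - alpha**2 * L_h / (2 * tau) - alpha**2 / (tau * gamma))


def xi_lower_bound(alpha, gamma, L_h, tau):
    # (alpha/gamma)(1 - alpha/tau) + (alpha L_h / 2)(1 - alpha - alpha gamma L_h)
    return (alpha / gamma * (1 - alpha / tau)
            + alpha * L_h / 2 * (1 - alpha - alpha * gamma * L_h))


def grid():
    """Yield (alpha, gamma, L_h, tau) with 0 < alpha < 1/(1 + gamma L_h) and tau > alpha."""
    for gamma in (0.05, 0.5, 2.0):
        for L_h in (0.0, 0.5, 4.0):
            alpha_max = 1 / (1 + gamma * L_h)
            for frac in (0.05, 0.5, 0.95):
                alpha = frac * alpha_max
                for stretch in (1.01, 1.5, 10.0):
                    yield alpha, gamma, L_h, stretch * alpha


worst = min(xi(*p) for p in grid())
print(worst > 0)  # True
```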

The following lemma shows that the sequences $\{\Delta_{\bf x}^{k}\}$, $\{\Delta_{\bf y}^{k}\}$, and $\{{\bf y}^{k}-{\bf z}^{k}\}$ converge to zero. It also shows that the smallest norm among the first $K$ terms of each sequence decays at the sublinear rate $\mathcal{O}(1/\sqrt{K})$.

Lemma 3.5.

Suppose that Section 3.1 and Section 3.1 hold. Let the sequence $\{({\bf y}^{k},{\bf z}^{k},{\bf x}^{k})\}_{k\geq 1}$ be generated by Eq. 7, and assume that it is bounded. Let the sequences $\{{\bf u}^{k}\}$ and $\{{\bf v}^{k}\}$ be defined as in Eq. 13. Then,

  • (i)

    it holds that $\sum_{k=0}^{\infty}\|\Delta_{\bf x}^{k}\|^{2}<+\infty$ and $\sum_{k=0}^{\infty}\|\Delta_{\bf y}^{k}\|^{2}<+\infty$. Furthermore, $\lim_{k\rightarrow\infty}\|\Delta_{\bf x}^{k}\|=0$, $\lim_{k\rightarrow\infty}\|\Delta_{\bf y}^{k}\|=0$, and $\lim_{k\rightarrow\infty}\|{\bf y}^{k}-{\bf z}^{k}\|=0$.

  • (ii)

    it holds that $\min_{k\leq K}\|\Delta_{\bf x}^{k}\|=\mathcal{O}(1/\sqrt{K})$, $\min_{k\leq K}\|\Delta_{\bf y}^{k}\|=\mathcal{O}(1/\sqrt{K})$, and $\min_{k\leq K}\|{\bf y}^{k}-{\bf z}^{k}\|=\mathcal{O}(1/\sqrt{K})$.
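Rates of the form in (ii) are the usual consequence of square-summability. The sketch below states the standard argument for $\{\Delta_{\bf x}^{k}\}$. The same bound holds for any sequence whose squared norms are summable.

```latex
S_{\bf x}:=\sum_{k=1}^{\infty}\|\Delta_{\bf x}^{k}\|^{2}<+\infty
\;\Longrightarrow\;
K\min_{1\leq k\leq K}\|\Delta_{\bf x}^{k}\|^{2}\leq\sum_{k=1}^{K}\|\Delta_{\bf x}^{k}\|^{2}\leq S_{\bf x}
\;\Longrightarrow\;
\min_{1\leq k\leq K}\|\Delta_{\bf x}^{k}\|\leq\sqrt{S_{\bf x}/K}=\mathcal{O}(1/\sqrt{K}).
```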

Proof 3.6.

We first prove (i). To begin, we show that $\Theta_{\alpha,\gamma}({\bf u}^{k})$ is bounded from below for all $k$. It follows from the definition of $\Theta_{\alpha,\gamma}$ in Eq. 16 that

$$
\begin{aligned}
\Theta_{\alpha,\gamma}({\bf u}^{k})&=\mathcal{H}_{\gamma}({\bf y}^{k},{\bf z}^{k},{\bf x}^{k})+\frac{\alpha}{2\gamma}\|\Delta_{\bf x}^{k-1}\|^{2}\\
&=f_{1}({\bf y}^{k})+f_{2}({\bf z}^{k})+h({\bf y}^{k})+\frac{1}{2\gamma}\|2{\bf y}^{k}-{\bf z}^{k}-{\bf x}^{k}-\gamma\nabla h({\bf y}^{k})\|^{2}\\
&\quad-\frac{1}{2\gamma}\|{\bf x}^{k}-{\bf y}^{k}+\gamma\nabla h({\bf y}^{k})\|^{2}-\frac{1}{\gamma}\|{\bf y}^{k}-{\bf z}^{k}\|^{2}+\frac{\alpha}{2\gamma}\|\Delta_{\bf x}^{k-1}\|^{2}.
\end{aligned}
\tag{28}
$$

Since $\nabla f_{1}$ and $\nabla h$ are Lipschitz continuous with constants $L_{f_{1}}$ and $L_{h}$, respectively, the descent lemma gives

$$f_{1}({\bf y}^{k})\geq f_{1}({\bf z}^{k})-\langle\nabla f_{1}({\bf y}^{k}),{\bf z}^{k}-{\bf y}^{k}\rangle-\frac{L_{f_{1}}}{2}\|{\bf y}^{k}-{\bf z}^{k}\|^{2},$$

and

$$h({\bf y}^{k})\geq h({\bf z}^{k})-\langle\nabla h({\bf y}^{k}),{\bf z}^{k}-{\bf y}^{k}\rangle-\frac{L_{h}}{2}\|{\bf y}^{k}-{\bf z}^{k}\|^{2}.$$
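These two bounds are the descent lemma for functions with Lipschitz gradients, and they do not require convexity. The Python sketch below uses a hypothetical stand-in for $f_{1}$ or $h$, not a function from the paper: the nonconvex $\phi({\bf x})=\sum_{i}\sin(x_{i})$, whose gradient $(\cos x_{i})_{i}$ is $1$-Lipschitz. It checks $\phi({\bf y})\geq\phi({\bf z})-\langle\nabla\phi({\bf y}),{\bf z}-{\bf y}\rangle-\frac{1}{2}\|{\bf y}-{\bf z}\|^{2}$ at random points.

```python
import math
import random


def phi(x):
    # Hypothetical nonconvex test function; its gradient (cos x_i) is 1-Lipschitz.
    return sum(math.sin(t) for t in x)


def grad_phi(x):
    return [math.cos(t) for t in x]


def descent_gap(y, z, L=1.0):
    """phi(y) - [phi(z) - <grad phi(y), z - y> - L/2 ||y - z||^2]; >= 0 when L is valid."""
    diff = [b - a for a, b in zip(y, z)]  # z - y
    inner = sum(g * d for g, d in zip(grad_phi(y), diff))
    return phi(y) - (phi(z) - inner - L / 2 * sum(d * d for d in diff))


rng = random.Random(2)
gaps = []
for _ in range(1000):
    y = [rng.uniform(-5, 5) for _ in range(3)]
    z = [rng.uniform(-5, 5) for _ in range(3)]
    gaps.append(descent_gap(y, z))
print(min(gaps) >= -1e-12)  # True
```

The tolerance `-1e-12` only absorbs floating-point round-off. By Taylor's theorem, each coordinate contributes $\frac{d^{2}}{2}(1+\sin\zeta)\geq 0$ to the gap, where $d$ is the coordinate difference and $\zeta$ is an intermediate point.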

Substituting these two descent inequalities into Eq. 28 and using $\nabla f_{1}({\bf y}^{k})=-\frac{1}{\gamma}({\bf y}^{k}-{\bf w}^{k-1})$ from Eq. 11, we have

$$
\begin{aligned}
\Theta_{\alpha,\gamma}({\bf u}^{k})&\geq f_{1}({\bf z}^{k})+f_{2}({\bf z}^{k})+h({\bf z}^{k})+\left(\frac{1}{2\gamma}-\frac{L_{f_{1}}+L_{h}}{2}\right)\|{\bf y}^{k}-{\bf z}^{k}\|^{2}\\
&\quad-\frac{1}{\gamma}\langle{\bf x}^{k}-{\bf w}^{k-1},{\bf y}^{k}-{\bf z}^{k}\rangle-\frac{1}{\gamma}\|{\bf y}^{k}-{\bf z}^{k}\|^{2}+\frac{\alpha}{2\gamma}\|\Delta_{\bf x}^{k-1}\|^{2}\\
&\geq F({\bf z}^{k})+\left(\frac{1}{2\gamma}-\frac{L_{f_{1}}+L_{h}}{2}\right)\|{\bf y}^{k}-{\bf z}^{k}\|^{2},
\end{aligned}
\tag{29}
$$

where the first inequality uses the two descent inequalities, the expression for $\nabla f_{1}({\bf y}^{k})$, and the identity $\frac{1}{2\gamma}\|2{\bf y}^{k}-{\bf z}^{k}-{\bf x}^{k}-\gamma\nabla h({\bf y}^{k})\|^{2}=\frac{1}{2\gamma}\|{\bf y}^{k}-{\bf z}^{k}\|^{2}+\frac{1}{\gamma}\langle{\bf y}^{k}-{\bf x}^{k},{\bf y}^{k}-{\bf z}^{k}\rangle-\langle{\bf y}^{k}-{\bf z}^{k},\nabla h({\bf y}^{k})\rangle+\frac{1}{2\gamma}\|{\bf x}^{k}-{\bf y}^{k}+\gamma\nabla h({\bf y}^{k})\|^{2}$. The second inequality follows from $\frac{\alpha}{2\gamma}\geq 0$ and from ${\bf x}^{k}={\bf w}^{k-1}+({\bf z}^{k}-{\bf y}^{k})$ by Eq. 7, which gives $-\frac{1}{\gamma}\langle{\bf x}^{k}-{\bf w}^{k-1},{\bf y}^{k}-{\bf z}^{k}\rangle=\frac{1}{\gamma}\|{\bf y}^{k}-{\bf z}^{k}\|^{2}$. Since $0<\gamma<\frac{1}{L_{f_{1}}+L_{h}}$, the coefficient $\frac{1}{2\gamma}-\frac{L_{f_{1}}+L_{h}}{2}$ is positive. Combined with the boundedness of $F$ and of $\{{\bf u}^{k}\}_{k\geq 1}$, this shows that $\Theta_{\alpha,\gamma}({\bf u}^{k})$ is bounded from below for all $k\geq 1$. Summing Eq. 17 from $k=1$ to $N-1\geq 0$, we get

$$\Theta_{\alpha,\gamma}({\bf u}^{1})-\Theta_{\alpha,\gamma}({\bf u}^{N})\geq\left(\Lambda(\gamma)-\tau\right)\left(\frac{1}{\gamma}+\frac{L_{h}}{2}\right)\sum_{k=2}^{N}\left\|\Delta_{\bf y}^{k}\right\|^{2}+\xi(\alpha,\gamma)\sum_{k=1}^{N-1}\left\|\Delta_{\bf x}^{k}\right\|^{2}. \tag{30}$$

Therefore, letting $N\to+\infty$ and using the lower boundedness of $\{\Theta_{\alpha,\gamma}({\bf u}^{k})\}_{k\geq 1}$, we have

$$\left(\Lambda(\gamma)-\tau\right)\left(\frac{1}{\gamma}+\frac{L_{h}}{2}\right)\sum_{k=2}^{\infty}\left\|\Delta_{\bf y}^{k}\right\|^{2}+\xi(\alpha,\gamma)\sum_{k=1}^{\infty}\left\|\Delta_{\bf x}^{k}\right\|^{2}<+\infty. \tag{31}$$

Since $\Lambda(\gamma)-\tau>0$ and $\xi(\alpha,\gamma)>0$ under the parameter conditions, this implies that $\sum_{k=0}^{\infty}\|\Delta_{\bf x}^{k}\|^{2}<+\infty$ and $\sum_{k=0}^{\infty}\|\Delta_{\bf y}^{k}\|^{2}<+\infty$. Therefore, $\lim_{k\to\infty}\|\Delta_{\bf x}^{k}\|=0$ and $\lim_{k\to\infty}\|\Delta_{\bf y}^{k}\|=0$.
Since ${\bf y}^{k}-{\bf z}^{k}={\bf w}^{k-1}-{\bf x}^{k}={\bf x}^{k-1}-{\bf x}^{k}+\alpha({\bf x}^{k-1}-{\bf x}^{k-2})$ by Eq. 7, we further have $\lim_{k\to\infty}\|{\bf y}^{k}-{\bf z}^{k}\|=0$.
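The squared-norm expansion used for the first inequality in the proof is purely algebraic: it holds for any vector in place of $\nabla h({\bf y}^{k})$. As a quick sanity check (a sketch, not part of the proof), the Python snippet below compares both sides on random vectors. The stand-in vector `g` and the helper name `expansion_sides` are illustrative assumptions.

```python
import numpy as np

def expansion_sides(y, z, x, g, gamma):
    """Return (lhs, rhs) of the squared-norm expansion; g stands in for grad h(y^k)."""
    lhs = np.sum((2 * y - z - x - gamma * g) ** 2) / (2 * gamma)
    rhs = (np.sum((y - z) ** 2) / (2 * gamma)       # (1/2gamma)||y - z||^2
           + np.dot(y - x, y - z) / gamma            # (1/gamma)<y - x, y - z>
           - np.dot(y - z, g)                        # -<y - z, grad h(y)>
           + np.sum((x - y + gamma * g) ** 2) / (2 * gamma))
    return lhs, rhs

rng = np.random.default_rng(0)
y, z, x, g = (rng.standard_normal(6) for _ in range(4))
lhs, rhs = expansion_sides(y, z, x, g, gamma=0.3)
print(lhs, rhs)  # the two values agree up to floating-point rounding
```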

We now prove (ii). By Eq. 30, $\xi(\alpha,\gamma)>0$, and the lower boundedness of $\{\Theta_{\alpha,\gamma}({\bf u}^{k})\}_{k\geq 1}$, there exists a constant $C_{0}>0$ such that

$$K\cdot\min_{1\leq k\leq K}\left\|\Delta_{\bf x}^{k}\right\|^{2}\leq\sum_{k=1}^{K}\left\|\Delta_{\bf x}^{k}\right\|^{2}\leq\frac{1}{\xi(\alpha,\gamma)}\left(\Theta_{\alpha,\gamma}({\bf u}^{1})-\Theta_{\alpha,\gamma}({\bf u}^{K+1})\right)\leq C_{0}. \tag{32}$$

This implies that $\min_{1\leq k\leq K}\|\Delta_{\bf x}^{k}\|\leq\sqrt{C_{0}/K}$, i.e., $\min_{1\leq k\leq K}\|\Delta_{\bf x}^{k}\|=\mathcal{O}(1/\sqrt{K})$. Similarly, we obtain $\min_{1\leq k\leq K}\|\Delta_{\bf y}^{k}\|=\mathcal{O}(1/\sqrt{K})$ and $\min_{1\leq k\leq K}\|{\bf y}^{k}-{\bf z}^{k}\|=\mathcal{O}(1/\sqrt{K})$. This completes the proof.
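To see the quantities of Lemma 3.5 in action, here is a minimal Python sketch on a toy convex instance. It assumes the updates take the standard extrapolated DYS form ${\bf y}^{k}={\rm prox}_{\gamma f_1}({\bf w}^{k-1})$, ${\bf z}^{k}={\rm prox}_{\gamma f_2}(2{\bf y}^{k}-{\bf w}^{k-1}-\gamma\nabla h({\bf y}^{k}))$, ${\bf x}^{k}={\bf w}^{k-1}+({\bf z}^{k}-{\bf y}^{k})$, ${\bf w}^{k}={\bf x}^{k}+\alpha({\bf x}^{k}-{\bf x}^{k-1})$. This form is consistent with the relations used above, but Eq. 7 itself is not reproduced in this section, so treat it as an assumption. The toy choices $f_1=\frac12\|\cdot-{\bf a}\|^2$, $f_2=\lambda\|\cdot\|_1$, $h=\frac{\beta}{2}\|\cdot-{\bf c}\|^2$ and the values of $\gamma,\alpha$ are hypothetical, picked only so that $\gamma<1/(L_{f_1}+L_h)$. The run tracks $\|\Delta_{\bf x}^{k}\|$ and $\|{\bf y}^{k}-{\bf z}^{k}\|$, checks the identity for ${\bf y}^{k}-{\bf z}^{k}$, and checks the elementary step behind Eq. 32, namely $\sqrt{K}\min_{k\leq K}\|\Delta_{\bf x}^{k}\|\leq(\sum_k\|\Delta_{\bf x}^{k}\|^2)^{1/2}$.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t*||.||_1 (componentwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def extrapolated_dys(a, c, lam, beta, gamma, alpha, iters=500):
    """Toy extrapolated DYS run (assumed update form) for
    f1 = 0.5||y - a||^2, f2 = lam*||z||_1, h = 0.5*beta*||y - c||^2."""
    x_prev2 = np.zeros_like(a)   # x^{k-2}
    x_prev = np.zeros_like(a)    # x^{k-1}
    dx, gap, rel_err = [], [], []
    for _ in range(iters):
        w = x_prev + alpha * (x_prev - x_prev2)                               # extrapolation w^{k-1}
        y = (w + gamma * a) / (1.0 + gamma)                                   # prox of gamma*f1
        z = soft_threshold(2 * y - w - gamma * beta * (y - c), gamma * lam)   # prox of gamma*f2
        x = w + (z - y)                                                       # x^k = w^{k-1} + (z^k - y^k)
        # identity y^k - z^k = x^{k-1} - x^k + alpha (x^{k-1} - x^{k-2}), up to rounding
        rel_err.append(np.linalg.norm((y - z) - (x_prev - x + alpha * (x_prev - x_prev2))))
        dx.append(np.linalg.norm(x - x_prev))                                 # ||Delta_x^k||
        gap.append(np.linalg.norm(y - z))                                     # ||y^k - z^k||
        x_prev2, x_prev = x_prev, x
    return y, np.array(dx), np.array(gap), np.array(rel_err)

rng = np.random.default_rng(0)
a, c = rng.standard_normal(8), rng.standard_normal(8)
lam, beta = 0.3, 0.5        # L_f1 = 1, L_h = beta = 0.5
gamma, alpha = 0.5, 0.1     # gamma < 1 / (L_f1 + L_h) = 2/3
y, dx, gap, rel_err = extrapolated_dys(a, c, lam, beta, gamma, alpha)

# For this strongly convex toy problem the minimizer is available in closed form.
y_star = soft_threshold((a + beta * c) / (1 + beta), lam / (1 + beta))
K = np.arange(1, len(dx) + 1)
running_min = np.minimum.accumulate(dx)   # min_{k <= K} ||Delta_x^k||
print(dx[-1], gap[-1], np.max(np.abs(y - y_star)))
```

The toy problem is convex, whereas the analysis above targets nonconvex objectives, so the run only illustrates the behavior of the tracked quantities, not the generality of the result.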

Note that in Lemma 3.5 we establish the lower boundedness of $\Theta_{\alpha,\gamma}$, as well as of $\mathcal{H}_{\gamma}$, along the generated sequences, relying on Section 3.1(iii) and Section 3.1. This lower boundedness plays a crucial role in establishing both the sublinear convergence rate and the convergence of the generated sequence. Similar results are discussed in [42], which establishes the consistency between the lower bounds and the minimizers of $\mathcal{H}_{\gamma}$ and $F$.

In the following, we give the subsequential convergence result for Algorithm 1.

Theorem 3.7.

Suppose that Section 3.1 and Section 3.1 hold. Let the sequence $\{({\bf y}^{k},{\bf z}^{k},{\bf x}^{k})\}_{k\geq 1}$ generated by Eq. 7 be bounded, and let the sequences $\{{\bf u}^{k}\}$ and $\{{\bf v}^{k}\}$ be defined in Eq. 13. Then

  • (i)

    for any cluster point ${\bf u}^{*}:=({\bf y}^{*},{\bf z}^{*},{\bf x}^{*},{\bf x}^{*},{\bf x}^{*})$ of the sequence $\{{\bf u}^{k}\}_{k\geq 1}$, ${\bf y}^{*}$ is a critical point of problem Eq. 1, i.e., $0\in\partial F({\bf y}^{*})$.

  • (ii)

    the limit $\lim_{k\to\infty}\Theta_{\alpha,\gamma}({\bf u}^{k})$ exists, and for any cluster point ${\bf u}^{*}$ of the sequence $\{{\bf u}^{k}\}_{k\geq 1}$, we have

    $$\Theta^{*}:=\lim_{k\to\infty}\Theta_{\alpha,\gamma}({\bf u}^{k})=\Theta_{\alpha,\gamma}({\bf u}^{*}). \tag{33}$$

Proof 3.8.

We first prove (i). It follows from Eq. 7 and Lemma 3.5(i) that

$$\lim_{k\to\infty}\left\|{\bf z}^{k+1}-{\bf z}^{k}\right\|=0.$$

Let ${\bf u}^{*}$ be a cluster point of $\{{\bf u}^{k}\}_{k\geq 1}$, and let $\{{\bf u}^{k_j}\}$ be a convergent subsequence with $\lim_{j\to\infty}{\bf u}^{k_j}={\bf u}^{*}$. Since $\|{\bf u}^{k}-{\bf u}^{k-1}\|\to 0$ by Lemma 3.5(i) and the limit above, we have

$$\lim_{j\to\infty}{\bf u}^{k_j}=\lim_{j\to\infty}{\bf u}^{k_j-1}={\bf u}^{*}. \tag{34}$$

Summing Eq. 11 and Eq. 12, taking the limit along the convergent subsequence $\{{\bf u}^{k_j}\}$, and applying Eq. 3 and Eq. 34, we have

$$0\in\nabla f_{1}({\bf y}^{*})+\partial f_{2}({\bf y}^{*})+\nabla h({\bf y}^{*}),$$
that is, $0\in\partial F({\bf y}^{*})$.

Now we prove (ii). Suppose that $\{{\bf u}^{k_j}\}$ is a subsequence converging to ${\bf u}^{*}$ as $j\to\infty$. By Lemma 3.3 and Lemma 3.5, the sequence $\{\Theta_{\alpha,\gamma}({\bf u}^{k})\}$ is nonincreasing and bounded from below under Section 3.1. Therefore, $\Theta^{*}:=\lim_{k\to\infty}\Theta_{\alpha,\gamma}({\bf u}^{k})$ exists. Since ${\bf z}^{k}$ minimizes the ${\bf z}$-subproblem in Eq. 7, we have

$$\begin{aligned}&f_{2}({\bf z}^{k})+\frac{1}{2\gamma}\left\|{\bf z}^{k}-\left(2{\bf y}^{k}-\gamma\nabla h({\bf y}^{k})-{\bf x}^{k-1}\right)\right\|^{2}\\&\leq f_{2}({\bf z}^{*})+\frac{1}{2\gamma}\left\|{\bf z}^{*}-\left(2{\bf y}^{k}-\gamma\nabla h({\bf y}^{k})-{\bf x}^{k-1}\right)\right\|^{2}.\end{aligned}$$

Replacing $k$ by $k_j$ in the above inequality, taking the limit on both sides, and using Eq. 34 together with the continuity of $\nabla h$, we obtain $\limsup_{j\to\infty}f_{2}({\bf z}^{k_j})\leq f_{2}({\bf z}^{*})$. On the other hand, since $f_{2}$ is proper and closed, we have $\liminf_{j\to\infty}f_{2}({\bf z}^{k_j})\geq f_{2}({\bf z}^{*})$. Hence

$$\lim_{j\to\infty}f_{2}({\bf z}^{k_j})=f_{2}({\bf z}^{*}).$$

Combining this with the properties of $f_{1}$ and $h$ in Section 3.1, Eq. 34, and the boundedness of the sequence $\{{\bf u}^{k}\}$, we obtain $\lim_{j\to\infty}\Theta_{\alpha,\gamma}({\bf u}^{k_j})=\Theta_{\alpha,\gamma}({\bf u}^{*})$. Since the whole sequence $\{\Theta_{\alpha,\gamma}({\bf u}^{k})\}$ converges to $\Theta^{*}$, we conclude that

$$\Theta^{*}:=\lim_{k\to\infty}\Theta_{\alpha,\gamma}({\bf u}^{k})=\Theta_{\alpha,\gamma}({\bf u}^{*}).$$

This completes the proof.

Remark 3.9.

Note that the boundedness of the sequence $\{{\bf x}^{k}\}$ is a standard assumption in the analysis of nonconvex optimization algorithms. As documented in [2, Remark 3.3], this assumption holds automatically when the corresponding sublevel set $\{{\bf x}\mid F({\bf x})\leq F_{0}\}$ is compact for some $F_{0}\in\mathbb{R}$.

We next present an upper bound on the distance from the origin to the subdifferential of $\Theta_{\alpha,\gamma}$, which plays a key role in the subsequent convergence analysis.

Lemma 3.10.

Suppose that Section 3.1 and Section 3.1 hold. Let $h$ be twice continuously differentiable with a bounded Hessian, i.e., there exists a constant $M>0$ such that $\|\nabla^{2}h({\bf y})\|\leq M$ for all ${\bf y}\in\mathbb{R}^{n}$. Let $\{({\bf y}^{k},{\bf z}^{k},{\bf x}^{k})\}_{k\geq 1}$ be the sequence generated by Eq. 7, and let $\{{\bf u}^{k}\}$ and $\{{\bf v}^{k}\}$ be defined in Eq. 13. Then there exists a constant $b>0$ such that, for all $k\geq 1$,

$${\rm dist}\left(0,\partial\Theta_{\alpha,\gamma}({\bf u}^{k+1})\right)\leq b\left(\|\Delta_{\bf x}^{k+1}\|+\|\Delta_{\bf x}^{k}\|\right). \tag{35}$$

Proof 3.11.

First, from the definition of $\Theta_{\alpha,\gamma}$ in Eq. 16, we have

$$\begin{aligned}\nabla_{\bf y}\Theta_{\alpha,\gamma}({\bf u}^{k+1})&=\nabla f_{1}({\bf y}^{k+1})+\frac{1}{\gamma}\left({\bf y}^{k+1}-{\bf x}^{k+1}\right)+\nabla^{2}h({\bf y}^{k+1})^{\top}\left({\bf z}^{k+1}-{\bf y}^{k+1}\right)\\&=\frac{1}{\gamma}\left({\bf x}^{k}-{\bf x}^{k+1}\right)+\frac{\alpha}{\gamma}\left({\bf x}^{k}-{\bf x}^{k-1}\right)+\nabla^{2}h({\bf y}^{k+1})^{\top}\left({\bf z}^{k+1}-{\bf y}^{k+1}\right),\end{aligned}\tag{36}$$

where the last equality follows from Eq. 11. Second, the partial subdifferential of $\Theta_{\alpha,\gamma}$ with respect to ${\bf z}$ satisfies

$$\begin{aligned}\partial_{\bf z}\Theta_{\alpha,\gamma}({\bf u}^{k+1})&=\partial f_{2}({\bf z}^{k+1})+\frac{1}{\gamma}\left({\bf z}^{k+1}-2{\bf y}^{k+1}+\gamma\nabla h({\bf y}^{k+1})+{\bf x}^{k+1}\right)-\frac{2}{\gamma}\left({\bf z}^{k+1}-{\bf y}^{k+1}\right)\\&\ni-\frac{1}{\gamma}\left({\bf x}^{k+1}-{\bf x}^{k}\right)+\frac{\alpha}{\gamma}\left({\bf x}^{k}-{\bf x}^{k-1}\right),\end{aligned}\tag{37}$$

where the inclusion follows from Eq. 7 and Eq. 12. Third, from the definition of $\Theta_{\alpha,\gamma}$ in Eq. 16, we obtain

$$\nabla_{\bf x}\Theta_{\alpha,\gamma}({\bf u}^{k+1})=\frac{1}{\gamma}\left({\bf z}^{k+1}-{\bf y}^{k+1}\right)=\frac{1}{\gamma}\left({\bf x}^{k+1}-{\bf x}^{k}\right)-\frac{\alpha}{\gamma}\left({\bf x}^{k}-{\bf x}^{k-1}\right), \tag{38}$$

where the last equality follows from Eq. 7. Finally, it follows from Eq. 16 that

$$\nabla_{{\bf x}_{1}}\Theta_{\alpha,\gamma}({\bf u}^{k+1})=\frac{\alpha^{2}}{\gamma}\left({\bf x}^{k}-{\bf x}^{k-1}\right)\quad{\rm and}\quad\nabla_{{\bf x}_{2}}\Theta_{\alpha,\gamma}({\bf u}^{k+1})=\frac{\alpha^{2}}{\gamma}\left({\bf x}^{k-1}-{\bf x}^{k}\right). \tag{39}$$

Moreover, since $\|\nabla^{2}h(\cdot)\|\leq M$, we get

$$\begin{aligned}\left\|\nabla_{\bf y}\Theta_{\alpha,\gamma}({\bf u}^{k+1})\right\|&\leq\frac{1}{\gamma}\left\|{\bf x}^{k}-{\bf x}^{k+1}\right\|+\frac{\alpha}{\gamma}\left\|{\bf x}^{k}-{\bf x}^{k-1}\right\|+M\left\|{\bf z}^{k+1}-{\bf y}^{k+1}\right\|\\&\leq\left(\frac{1}{\gamma}+M\right)\left\|{\bf x}^{k}-{\bf x}^{k+1}\right\|+\left(\frac{\alpha}{\gamma}+\alpha M\right)\left\|{\bf x}^{k}-{\bf x}^{k-1}\right\|,\end{aligned}\tag{40}$$

where the last inequality follows from the first and last relations in Eq. 7. Combining Eq. 36, Eq. 37, Eq. 38, Eq. 39, and Eq. 40 yields Eq. 35 immediately. This completes the proof.
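The second inequality in Eq. 40 can be checked directly. Assume, as the coefficients in Eq. 40 suggest, that the first and last relations of Eq. 7 read ${\bf w}^{k}={\bf x}^{k}+\alpha({\bf x}^{k}-{\bf x}^{k-1})$ and ${\bf x}^{k+1}={\bf w}^{k}+{\bf z}^{k+1}-{\bf y}^{k+1}$; the label ${\bf w}^{k}$ for the extrapolated point is ours, and this form should be verified against Eq. 7. Then

$$\left\|{\bf z}^{k+1}-{\bf y}^{k+1}\right\|=\left\|({\bf x}^{k+1}-{\bf x}^{k})-\alpha({\bf x}^{k}-{\bf x}^{k-1})\right\|\leq\left\|{\bf x}^{k+1}-{\bf x}^{k}\right\|+\alpha\left\|{\bf x}^{k}-{\bf x}^{k-1}\right\|,$$

and multiplying by $M$ and adding the first two terms of Eq. 40 produces exactly the coefficients $\frac{1}{\gamma}+M$ and $\frac{\alpha}{\gamma}+\alpha M$.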

We now establish the global convergence of Algorithm 1 based on the uniformized KL property. We show that the sequence $\{{\bf u}^{k}\}_{k\geq 1}$ has finite length and is therefore convergent; in particular, the sequence $\{{\bf y}^{k}\}_{k\geq 1}$ converges to a stationary point in ${\rm crit}F$.
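For reference, the uniformized KL property is commonly stated as follows. The symbols $\sigma$, $\Omega$, $\varepsilon$, $\eta$ and the class $\Phi_{\eta}$ are our notation, and Lemma 2.3, which plays this role in the proof below, may be phrased differently. Let $\Phi_{\eta}$ be the set of concave continuous functions $\varphi:[0,\eta)\to[0,+\infty)$ with $\varphi(0)=0$ that are continuously differentiable on $(0,\eta)$ with $\varphi'>0$ there. If $\Omega$ is compact, $\sigma$ is proper and lower semicontinuous, $\sigma$ is constant on $\Omega$, and $\sigma$ has the KL property at every point of $\Omega$, then there exist $\varepsilon>0$, $\eta>0$ and $\varphi\in\Phi_{\eta}$ such that, for every $\bar{\bf u}\in\Omega$,

$$\varphi'\big(\sigma({\bf u})-\sigma(\bar{\bf u})\big)\,{\rm dist}\big(0,\partial\sigma({\bf u})\big)\geq 1\quad\text{whenever}\quad{\rm dist}({\bf u},\Omega)<\varepsilon\ \text{ and }\ \sigma(\bar{\bf u})<\sigma({\bf u})<\sigma(\bar{\bf u})+\eta.$$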

Theorem 3.12.

Suppose that the assumptions stated in Section 3.1 hold. Let $h$ be a twice continuously differentiable function with a bounded Hessian, i.e., there exists a constant $M>0$ such that $\|\nabla^{2}h({\bf y})\|\leq M$ for all ${\bf y}\in\mathbb{R}^{n}$. Let $\{({\bf y}^{k},{\bf z}^{k},{\bf x}^{k})\}_{k\geq 1}$ be the sequence generated by Eq. 7, assumed to be bounded, and let $\{{\bf u}^{k}\}$ and $\{{\bf v}^{k}\}$ be defined as in Eq. 13. If $F$ in Eq. 1 is a KL function, then the sequence $\{{\bf u}^{k}\}_{k\geq 1}$ has finite length, that is,

$$\sum_{k=1}^{\infty}\left\|{\bf y}^{k+1}-{\bf y}^{k}\right\|<+\infty,\quad\sum_{k=1}^{\infty}\left\|{\bf z}^{k+1}-{\bf z}^{k}\right\|<+\infty,\quad\sum_{k=1}^{\infty}\left\|{\bf x}^{k+1}-{\bf x}^{k}\right\|<+\infty.$$

Hence, the whole sequence $\{{\bf u}^{k}\}_{k\geq 1}$ is convergent.
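The finite-length property can be observed numerically on a toy problem. This sketch is not the paper's experiment: Eq. 7 is not reproduced in this section, so we assume the extrapolated DYS form ${\bf w}^{k}={\bf x}^{k}+\alpha({\bf x}^{k}-{\bf x}^{k-1})$, ${\bf y}^{k+1}={\rm Prox}_{\gamma f_1}({\bf w}^{k})$, ${\bf z}^{k+1}={\rm Prox}_{\gamma f_2}(2{\bf y}^{k+1}-\gamma\nabla h({\bf y}^{k+1})-{\bf w}^{k})$, ${\bf x}^{k+1}={\bf w}^{k}+{\bf z}^{k+1}-{\bf y}^{k+1}$, which is consistent with the coefficients in Eq. 39 and Eq. 40. The problem ($f_1$ the indicator of $[0,1]^n$, $f_2=\lambda\|\cdot\|_1$, $h=\frac12\|{\bf y}-{\bf b}\|^2$) is a hypothetical convex instance, the function name `extrapolated_dys` is ours, and the values $\alpha=0.2$, $\gamma=1$ are not claimed to satisfy the parameter conditions of Theorem 3.12.

```python
import numpy as np


def soft_threshold(v, t):
    """Prox of t * ||.||_1: componentwise soft-thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def extrapolated_dys(b, lam=0.2, gamma=1.0, alpha=0.2, iters=200):
    """Assumed extrapolated DYS on a hypothetical convex toy problem:
    f1 = indicator of [0, 1]^n, f2 = lam * ||.||_1, h = 0.5 * ||y - b||^2.
    Returns the last y-iterate and the step lengths ||x^{k+1} - x^k||.
    """
    x_prev = np.zeros_like(b)
    x = np.zeros_like(b)
    steps = []
    for _ in range(iters):
        w = x + alpha * (x - x_prev)          # extrapolated point w^k (assumed form)
        y = np.clip(w, 0.0, 1.0)              # y^{k+1} = Prox_{gamma f1}(w^k)
        v = 2 * y - gamma * (y - b) - w       # 2y - gamma * grad h(y) - w^k
        z = soft_threshold(v, gamma * lam)    # z^{k+1} = Prox_{gamma f2}(v)
        x_prev, x = x, w + (z - y)            # x^{k+1} = w^k + z^{k+1} - y^{k+1}
        steps.append(float(np.linalg.norm(x - x_prev)))
    return y, steps


b = np.array([1.5, 0.3, -0.5, 0.8])
y, steps = extrapolated_dys(b)
print(y)           # approximately clip(b - lam, 0, 1) = [1.0, 0.1, 0.0, 0.6]
print(sum(steps))  # bounded partial sum of ||x^{k+1} - x^k||
```

On this instance the recorded step lengths shrink rapidly and their partial sums stay bounded, which is the behaviour Theorem 3.12 guarantees in general under its own assumptions.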

Proof 3.13.

Let $\theta({\bf u}^{\infty})$ denote the set of cluster points of the sequence $\{{\bf u}^{k}\}$. Since $\{{\bf u}^{k}\}$ is bounded, $\theta({\bf u}^{\infty})$ is a nonempty compact set, and

$$\lim_{k\rightarrow\infty}{\rm dist}\left({\bf u}^{k},\theta({\bf u}^{\infty})\right)=0.$$

From Lemma 3.5(i), Theorem 3.7(i), and Eq. 7, we know that $\theta({\bf u}^{\infty})\subseteq{\rm crit}F\times{\rm crit}F\times{\rm crit}F\times{\rm crit}F\times{\rm crit}F$. By the definition of $\theta({\bf u}^{\infty})$, for any ${\bf u}^{*}:=({\bf y}^{*},{\bf z}^{*},{\bf x}^{*},{\bf x}^{*},{\bf x}^{*})\in\theta({\bf u}^{\infty})$ there exists a subsequence $\{{\bf u}^{k_{i}}\}$ of $\{{\bf u}^{k}\}$ converging to ${\bf u}^{*}$.

It follows from Theorem 3.7(ii) that $\lim_{k\rightarrow\infty}\Theta_{\alpha,\gamma}({\bf u}^{k})=\Theta_{\alpha,\gamma}({\bf u}^{*})$. If there exists an integer $\bar{k}$ such that $\Theta_{\alpha,\gamma}({\bf u}^{\bar{k}})=\Theta_{\alpha,\gamma}({\bf u}^{*})$, then Lemma 3.3 gives

$$\begin{aligned}
\left(\Lambda(\gamma)-\tau\right)\left(\frac{1}{\gamma}+\frac{L_{h}}{2}\right)\left\|\Delta_{\bf y}^{k+1}\right\|^{2}+\xi(\alpha,\gamma)\left\|\Delta_{\bf x}^{k}\right\|^{2}
&\leq\Theta_{\alpha,\gamma}({\bf u}^{k})-\Theta_{\alpha,\gamma}({\bf u}^{k+1})\\
&\leq\Theta_{\alpha,\gamma}({\bf u}^{\bar{k}})-\Theta_{\alpha,\gamma}({\bf u}^{*})\\
&=0\qquad\forall~k>\bar{k}.
\end{aligned}$$

Thus, ${\bf y}^{k+1}={\bf y}^{k}$ and ${\bf x}^{k+1}={\bf x}^{k}$ for any $k>\bar{k}$. Together with Eq. 7, we also have ${\bf z}^{k+1}={\bf z}^{k}$, and thus the assertions $\sum_{k=1}^{\infty}\|{\bf x}^{k+1}-{\bf x}^{k}\|<+\infty$, $\sum_{k=1}^{\infty}\|{\bf y}^{k+1}-{\bf y}^{k}\|<+\infty$, and $\sum_{k=1}^{\infty}\|{\bf z}^{k+1}-{\bf z}^{k}\|<+\infty$ hold trivially.
Otherwise, since $\Theta_{\alpha,\gamma}({\bf u}^{k})$ is nonincreasing by Lemma 3.3 and converges to $\Theta_{\alpha,\gamma}({\bf u}^{*})$, we have $\Theta_{\alpha,\gamma}({\bf u}^{k})>\Theta_{\alpha,\gamma}({\bf u}^{*})$ for all $k$. Again from $\lim_{k\rightarrow\infty}\Theta_{\alpha,\gamma}({\bf u}^{k})=\Theta_{\alpha,\gamma}({\bf u}^{*})$, for any $\eta>0$ there exists a nonnegative integer $k_{0}$ such that $\Theta_{\alpha,\gamma}({\bf u}^{k})<\Theta_{\alpha,\gamma}({\bf u}^{*})+\eta$ for all $k>k_{0}$.
In addition, for any $\varsigma>0$ there exists a positive integer $k_{1}$ such that ${\rm dist}\left({\bf u}^{k},\theta({\bf u}^{\infty})\right)<\varsigma$ for all $k>k_{1}$. Consequently, for any $\eta,\varsigma>0$ and all $k>k_{2}:=\max\{k_{0},k_{1}\}$, we have

$${\rm dist}\left({\bf u}^{k},\theta({\bf u}^{\infty})\right)<\varsigma\qquad\text{and}\qquad\Theta_{\alpha,\gamma}({\bf u}^{k})<\Theta_{\alpha,\gamma}({\bf u}^{*})+\eta.$$

Since $\theta({\bf u}^{\infty})$ is nonempty and compact, and $\Theta_{\alpha,\gamma}$ is constant on $\theta({\bf u}^{\infty})$, we can apply Lemma 2.3 with $\Omega:=\theta({\bf u}^{\infty})$. Therefore, for any $k>k_{2}$, we have

$$\varphi^{\prime}\big(\Theta_{\alpha,\gamma}({\bf u}^{k})-\Theta_{\alpha,\gamma}({\bf u}^{*})\big)\,{\rm dist}\big(0,\partial\Theta_{\alpha,\gamma}({\bf u}^{k})\big)\geq 1. \tag{41}$$
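To illustrate what Eq. 41 says, consider the power-type desingularizing function $\varphi(s)=\frac{c}{1-\vartheta}s^{1-\vartheta}$ with $c>0$ and $\vartheta\in[0,1)$. This specific $\varphi$ is an assumption made only for illustration (the argument allows any admissible $\varphi$), and the exponent is written $\vartheta$ to avoid a clash with the cluster-point set $\theta({\bf u}^{\infty})$. Since $\varphi'(s)=c\,s^{-\vartheta}$ and $\Theta_{\alpha,\gamma}({\bf u}^{k})>\Theta_{\alpha,\gamma}({\bf u}^{*})$ in this case, Eq. 41 is equivalent to the Łojasiewicz-type inequality

$$\big(\Theta_{\alpha,\gamma}({\bf u}^{k})-\Theta_{\alpha,\gamma}({\bf u}^{*})\big)^{\vartheta}\leq c\,{\rm dist}\big(0,\partial\Theta_{\alpha,\gamma}({\bf u}^{k})\big).$$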

From the concavity of $\varphi$, we have

$$\begin{aligned}
&\varphi\big(\Theta_{\alpha,\gamma}({\bf u}^{k})-\Theta_{\alpha,\gamma}({\bf u}^{*})\big)-\varphi\big(\Theta_{\alpha,\gamma}({\bf u}^{k+1})-\Theta_{\alpha,\gamma}({\bf u}^{*})\big)\\
&\quad\geq\varphi^{\prime}\big(\Theta_{\alpha,\gamma}({\bf u}^{k})-\Theta_{\alpha,\gamma}({\bf u}^{*})\big)\big(\Theta_{\alpha,\gamma}({\bf u}^{k})-\Theta_{\alpha,\gamma}({\bf u}^{k+1})\big).
\end{aligned}$$

Then, combining the estimate ${\rm dist}\left(0,\partial\Theta_{\alpha,\gamma}({\bf u}^{k})\right)\leq b(\|{\bf x}^{k}-{\bf x}^{k-1}\|+\|{\bf x}^{k-1}-{\bf x}^{k-2}\|)$ from Lemma 3.10 with Eq. 41 and the fact that $\varphi^{\prime}\big(\Theta_{\alpha,\gamma}({\bf u}^{k})-\Theta_{\alpha,\gamma}({\bf u}^{*})\big)>0$, we get

$$\begin{aligned}
\Theta_{\alpha,\gamma}({\bf u}^{k})-\Theta_{\alpha,\gamma}({\bf u}^{k+1})
&\leq\frac{\varphi\big(\Theta_{\alpha,\gamma}({\bf u}^{k})-\Theta_{\alpha,\gamma}({\bf u}^{*})\big)-\varphi\big(\Theta_{\alpha,\gamma}({\bf u}^{k+1})-\Theta_{\alpha,\gamma}({\bf u}^{*})\big)}{\varphi^{\prime}\big(\Theta_{\alpha,\gamma}({\bf u}^{k})-\Theta_{\alpha,\gamma}({\bf u}^{*})\big)}\\
&\leq b\big(\|{\bf x}^{k}-{\bf x}^{k-1}\|+\|{\bf x}^{k-1}-{\bf x}^{k-2}\|\big)\\
&\quad\times\big[\varphi\big(\Theta_{\alpha,\gamma}({\bf u}^{k})-\Theta_{\alpha,\gamma}({\bf u}^{*})\big)-\varphi\big(\Theta_{\alpha,\gamma}({\bf u}^{k+1})-\Theta_{\alpha,\gamma}({\bf u}^{*})\big)\big].
\end{aligned}$$

For convenience, for all $p,q\in\mathbb{N}$, we define

$$\zeta_{p,q}:=\varphi\big(\Theta_{\alpha,\gamma}({\bf u}^{p})-\Theta_{\alpha,\gamma}({\bf u}^{*})\big)-\varphi\big(\Theta_{\alpha,\gamma}({\bf u}^{q})-\Theta_{\alpha,\gamma}({\bf u}^{*})\big).$$

Combining Eq. 17 with the above relation yields, for any $k>k_{2}$,

$$\begin{aligned}
\left(\Lambda(\gamma)-\tau\right)\left(\frac{1}{\gamma}+\frac{L_{h}}{2}\right)\left\|\Delta_{\bf y}^{k+1}\right\|^{2}+\xi(\alpha,\gamma)\left\|\Delta_{\bf x}^{k}\right\|^{2}
&\leq\Theta_{\alpha,\gamma}({\bf u}^{k})-\Theta_{\alpha,\gamma}({\bf u}^{k+1})\\
&\leq b\left(\|\Delta_{\bf x}^{k}\|+\|\Delta_{\bf x}^{k-1}\|\right)\zeta_{k,k+1}.
\end{aligned}$$

This implies that

$$\|\Delta_{\bf y}^{k+1}\|\leq\sqrt{\frac{1}{2}\left(\|\Delta_{\bf x}^{k}\|+\|\Delta_{\bf x}^{k-1}\|\right)}\sqrt{\frac{2b}{\rho_{1}}\zeta_{k,k+1}},$$

and

$$\|\Delta_{\bf x}^{k}\|\leq\sqrt{\frac{1}{2}\left(\|\Delta_{\bf x}^{k}\|+\|\Delta_{\bf x}^{k-1}\|\right)}\sqrt{\frac{2b}{\rho_{2}}\zeta_{k,k+1}},$$

where $\rho_{1}:=\left(\Lambda(\gamma)-\tau\right)\left(\frac{1}{\gamma}+\frac{L_{h}}{2}\right)$ and $\rho_{2}:=\xi(\alpha,\gamma)$. Further, using the inequality $\sqrt{\mu_{1}\mu_{2}}\leq\mu_{1}/2+\mu_{2}/2$ (valid for $\mu_{1},\mu_{2}\geq 0$) with $\mu_{1}=(\|\Delta_{\bf x}^{k}\|+\|\Delta_{\bf x}^{k-1}\|)/2$ and $\mu_{2}=2b\zeta_{k,k+1}/\rho_{1}$ or $\mu_{2}=2b\zeta_{k,k+1}/\rho_{2}$, we get

$$\|\Delta_{\bf y}^{k+1}\|\leq\frac{1}{4}\left(\|\Delta_{\bf x}^{k}\|+\|\Delta_{\bf x}^{k-1}\|\right)+\frac{b}{\rho_{1}}\zeta_{k,k+1}, \tag{42}$$

and

$$\|\Delta_{\bf x}^{k}\|\leq\frac{1}{4}\left(\|\Delta_{\bf x}^{k}\|+\|\Delta_{\bf x}^{k-1}\|\right)+\frac{b}{\rho_{2}}\zeta_{k,k+1}. \tag{43}$$

It then follows from Lemma 2.5 and Eq. 43 that $\sum_{k=1}^{\infty}\|\Delta_{\bf x}^{k+1}\|<+\infty$, and Eq. 42 further gives $\sum_{k=1}^{\infty}\|\Delta_{\bf y}^{k+1}\|<+\infty$. Again from Eq. 7, we know that $\sum_{k=1}^{\infty}\|\Delta_{\bf z}^{k+1}\|<+\infty$. Thus, $\{{\bf u}^{k}\}_{k\geq 1}$ is a Cauchy sequence and hence convergent. Applying Theorem 3.7(i), there exists ${\bf y}^{*}\in{\rm crit}F$ such that $\lim_{k\rightarrow\infty}{\bf y}^{k}={\bf y}^{*}$. This completes the proof.
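One way to see how Eq. 43 yields summability is the following spelled-out argument; it is our own derivation, and Lemma 2.5 may be stated in a different form. Write $a_{k}:=\|\Delta_{\bf x}^{k}\|$ and $K:=k_{2}$. For $k>K$, Eq. 43 rearranges to $\frac{3}{4}a_{k}\leq\frac{1}{4}a_{k-1}+\frac{b}{\rho_{2}}\zeta_{k,k+1}$, and $\zeta$ telescopes, so $\sum_{k=K+1}^{N}\zeta_{k,k+1}=\zeta_{K+1,N+1}$. Summing over $k=K+1,\dots,N$ and using $\sum_{k=K}^{N-1}a_{k}\leq a_{K}+\sum_{k=K+1}^{N}a_{k}$ gives

$$\frac{1}{2}\sum_{k=K+1}^{N}a_{k}\leq\frac{1}{4}a_{K}+\frac{b}{\rho_{2}}\,\zeta_{K+1,N+1}\leq\frac{1}{4}a_{K}+\frac{b}{\rho_{2}}\,\varphi\big(\Theta_{\alpha,\gamma}({\bf u}^{K+1})-\Theta_{\alpha,\gamma}({\bf u}^{*})\big),$$

where the last step uses $\varphi\geq 0$. The right-hand side does not depend on $N$, so $\sum_{k}a_{k}<+\infty$, assuming $\rho_{2}>0$ as the division in Eq. 43 already requires.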

Remark 3.14.

KL functions are remarkably versatile: the KL property holds for broad function classes such as semi-algebraic, subanalytic, and log-exp functions. Concrete examples of KL functions can be found in [2, 3, 8]; they include many common instances such as the $\ell_{p}$-norm (with $p\geq 0$), indicator functions of semi-algebraic sets, and many convex functions.

4 Extrapolated PnP-DYS methods

In this section, we develop a class of Plug-and-Play Davis-Yin splitting (PnP-DYS) algorithms with convergence guarantees. PnP is a versatile methodology, used primarily for inverse problems with large-scale measurements, in which statistical priors are incorporated through denoisers. It draws inspiration from well-established proximal algorithms for nonsmooth composite optimization, such as FBS, DRS, and ADMM. With the rise of deep learning, PnP has been widely adopted as a way to exploit learned priors given by pre-trained deep neural networks, which has propelled it to state-of-the-art performance across a range of applications. For instance, replacing the proximal operator of $f_{2}$ in Eq. 2 with a learned denoiser $\mathcal{D}_{\sigma}$ yields the following PnP-DYS method:

$$\left\{\begin{aligned} {\bf y}^{k+1}&={\rm Prox}_{\gamma f_{1}}\left({\bf x}^{k}\right),\\ {\bf z}^{k+1}&=\mathcal{D}_{\sigma}\left(2{\bf y}^{k+1}-\gamma\nabla h({\bf y}^{k+1})-{\bf x}^{k}\right),\\ {\bf x}^{k+1}&={\bf x}^{k}+\left({\bf z}^{k+1}-{\bf y}^{k+1}\right).\end{aligned}\right.$$
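The PnP-DYS iteration above can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: `pnp_dys`, `prox_f1`, `denoiser` and `grad_h` are hypothetical names for user-supplied callables, a fixed iteration count replaces any stopping rule, and in the toy check the "denoiser" is simply the proximal operator of a quadratic. In that convex toy case the iterates ${\bf y}^{k}$ should approach the minimizer $(a+b+c)/3$ of $\frac12\|{\bf x}-a\|^2+\frac12\|{\bf x}-b\|^2+\frac12\|{\bf x}-c\|^2$.

```python
import numpy as np


def pnp_dys(prox_f1, denoiser, grad_h, x0, gamma, num_iters=200):
    """Run the (non-extrapolated) PnP-DYS iteration for a fixed number of steps."""
    x = np.asarray(x0, dtype=float).copy()
    y = x.copy()
    for _ in range(num_iters):
        y = prox_f1(x)                               # y^{k+1} = Prox_{gamma f1}(x^k)
        z = denoiser(2 * y - gamma * grad_h(y) - x)  # z^{k+1} = D_sigma(2y - gamma*grad h(y) - x^k)
        x = x + (z - y)                              # x^{k+1} = x^k + (z^{k+1} - y^{k+1})
    return y, x


# Toy check with hypothetical quadratic data (the "denoiser" is a plain prox here).
gamma = 1.0
a, b, c = np.array([1.0, -2.0]), np.array([3.0, 0.5]), np.array([-1.0, 4.0])
prox_f1 = lambda v: (v + gamma * a) / (1 + gamma)   # Prox of gamma*f1, f1 = 0.5||x - a||^2
denoiser = lambda v: (v + gamma * b) / (1 + gamma)  # Prox of gamma*f2, f2 = 0.5||x - b||^2
grad_h = lambda v: v - c                            # h = 0.5||x - c||^2
y_star, x_star = pnp_dys(prox_f1, denoiser, grad_h, np.zeros(2), gamma)
print(y_star)  # approximately (a + b + c) / 3
```

Passing the operators as callables keeps the splitting structure visible: swapping the quadratic prox for a trained network changes only `denoiser`.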

To guarantee theoretical convergence, we consider the Gradient Step (GS) denoiser developed in [13, 27]:

(44) $$\mathcal{D}_{\sigma}=I-\nabla g_{\sigma},$$

which is obtained from a scalar function:

$$g_{\sigma}({\bf x})=\frac{1}{2}\left\|{\bf x}-N_{\sigma}({\bf x})\right\|^{2},$$

where the mapping $N_{\sigma}({\bf x})$ is realized as a differentiable neural network, which enables explicit computation of $g_{\sigma}$; moreover, $g_{\sigma}$ is designed to have an $L$-Lipschitz continuous gradient with $L<1$. The denoiser $\mathcal{D}_{\sigma}$ in Eq. 44 is trained to denoise images degraded by Gaussian noise of level $\sigma$. It is shown in [27] that, although constrained to be an exact conservative field, it achieves state-of-the-art denoising performance. Remarkably, the denoiser $\mathcal{D}_{\sigma}$ in Eq. 44 takes the form of a proximal mapping of a weakly convex function, as stated in the next proposition.
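To make the construction $\mathcal{D}_{\sigma}=I-\nabla g_{\sigma}$ concrete, the sketch below replaces the neural network by a hypothetical linear map $N({\bf x})=W{\bf x}$, so that $g({\bf x})=\frac12\|A{\bf x}\|^2$ with $A=I-W$ and $\nabla g({\bf x})=A^{\top}A{\bf x}$ can be written by hand (a real network would obtain this gradient by automatic differentiation). The matrix is scaled so that $\|A\|_2=0.6$, which makes the Lipschitz constant of $\nabla g$ equal to $L=0.36<1$. All names here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# Hypothetical linear "network" N(x) = W x with A = I - W scaled to spectral norm 0.6.
A = rng.standard_normal((n, n))
A *= 0.6 / np.linalg.norm(A, 2)
W = np.eye(n) - A


def N(x):
    return W @ x


def g(x):
    """g_sigma(x) = 0.5 * ||x - N(x)||^2."""
    r = x - N(x)
    return 0.5 * (r @ r)


def grad_g(x):
    # For this linear N, g(x) = 0.5 * ||A x||^2, so grad g(x) = A^T A x.
    # A real network would obtain this gradient by automatic differentiation.
    return A.T @ (A @ x)


def gs_denoiser(x):
    """GS denoiser D_sigma = I - grad g_sigma (Eq. 44)."""
    return x - grad_g(x)


# Lipschitz constant of grad g: ||A^T A||_2 = ||A||_2^2 = 0.36 < 1.
L = np.linalg.norm(A.T @ A, 2)
print(L)
```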

Proposition 4.1.

[28, Proposition 3.1] $\mathcal{D}_{\sigma}({\bf x})=\operatorname{prox}_{\phi_{\sigma}}({\bf x})$, where $\phi_{\sigma}$ is defined by

(45) $$\phi_{\sigma}({\bf x})=g_{\sigma}\left(\mathcal{D}_{\sigma}^{-1}({\bf x})\right)-\frac{1}{2}\left\|\mathcal{D}_{\sigma}^{-1}({\bf x})-{\bf x}\right\|^{2}$$

if ${\bf x}\in\operatorname{Im}\left(\mathcal{D}_{\sigma}\right)$, and $\phi_{\sigma}({\bf x})=+\infty$ otherwise. Moreover, $\phi_{\sigma}$ is $\frac{L}{L+1}$-weakly convex, $\nabla\phi_{\sigma}$ is $\frac{L}{1-L}$-Lipschitz on $\operatorname{Im}\left(\mathcal{D}_{\sigma}\right)$, and $\phi_{\sigma}({\bf x})\geq g_{\sigma}({\bf x})$ for all ${\bf x}\in\mathbb{R}^{n}$.
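Proposition 4.1 can be checked numerically in a toy setting. The sketch assumes a hypothetical linear network, so that $g({\bf x})=\frac12{\bf x}^{\top}G{\bf x}$ with $G=A^{\top}A$ and $\|G\|_2=L<1$; then $\mathcal{D}_{\sigma}=I-G$ is invertible, $\operatorname{Im}(\mathcal{D}_{\sigma})=\mathbb{R}^n$, and evaluating Eq. 45 literally gives $\phi({\bf x})=\frac12{\bf x}^{\top}H{\bf x}$ with $H=(I-G)^{-1}G$. The code verifies the prox optimality condition $\nabla\phi({\bf u})+{\bf u}-{\bf x}=0$ at ${\bf u}=\mathcal{D}_{\sigma}({\bf x})$, the curvature bounds, and $\phi\geq g$. In this quadratic case $\phi$ happens to be convex, so the weak-convexity bound holds trivially; a trained network would generally not behave this way.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
A = rng.standard_normal((n, n))
A *= 0.7 / np.linalg.norm(A, 2)          # hypothetical toy network: ||A||_2 = 0.7
G = A.T @ A                              # Hessian of g(x) = 0.5 * ||A x||^2
I = np.eye(n)
L = np.linalg.norm(G, 2)                 # = 0.49 < 1

g = lambda x: 0.5 * x @ G @ x
D = lambda x: x - G @ x                  # D_sigma = I - grad g_sigma
D_inv = lambda x: np.linalg.solve(I - G, x)   # invertible because ||G||_2 < 1


def phi(x):
    """Eq. (45): phi(x) = g(D^{-1} x) - 0.5 * ||D^{-1} x - x||^2."""
    v = D_inv(x)
    return g(v) - 0.5 * np.sum((v - x) ** 2)


# In this quadratic toy case phi(x) = 0.5 * x^T H x with H = (I - G)^{-1} G.
H = np.linalg.solve(I - G, G)
x = rng.standard_normal(n)
u = D(x)
# u = prox_phi(x) iff grad phi(u) + (u - x) = 0.
prox_residual = H @ u + (u - x)
eig_H = np.linalg.eigvalsh(0.5 * (H + H.T))
print(np.abs(prox_residual).max(), eig_H.min(), eig_H.max(), L / (1 - L))
```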

Drawing upon Proposition 4.1, we develop extrapolated PnP-DYS algorithms with the plugged-in denoiser $\mathcal{D}_{\sigma}$ of Eq. 44, which corresponds to the proximal operator of the nonconvex functional $\phi_{\sigma}$ in Eq. 45. To this end, we target the following optimization problem:

(46) $$\min F_{\gamma,\sigma}({\bf x})=f({\bf x})+\frac{1}{\gamma}\phi_{\sigma}({\bf x})+h({\bf x}),$$

where $f$ is a (possibly nonconvex) data-fidelity term, $h$ is differentiable with Lipschitz continuous gradient, $\gamma$ is a regularization parameter, and $\phi_{\sigma}$ is defined as in Proposition 4.1 from the function $g_{\sigma}$ satisfying $\mathcal{D}_{\sigma}=I-\nabla g_{\sigma}$. To apply Proposition 4.1 in our analysis, $g_{\sigma}$ is assumed to be $\mathcal{C}^{2}$ with $L$-Lipschitz continuous gradient ($L<1$). We also assume that $f$, $g_{\sigma}$ and $h$ are bounded from below. Since $\phi_{\sigma}\geq g_{\sigma}$ by Proposition 4.1, $\phi_{\sigma}$, and thus $F_{\gamma,\sigma}$, are also bounded from below. In the following, we develop two extrapolated PnP-DYS methods, depending on whether $f$ in Eq. 46 is smooth, and discuss their theoretical convergence.

According to [25, Lemma 1], $\phi_{\sigma}$ in Eq. 45 satisfies the Kurdyka-Łojasiewicz (KL) property if $g_{\sigma}$ is real analytic [31] in a neighborhood of ${\bf x}\in\mathbb{R}^{n}$ and its Jacobian matrix $Jg_{\sigma}({\bf x})$ is nonsingular. Real analyticity of $g_{\sigma}$ can be ensured for a broad range of deep neural networks, while the nonsingularity of $Jg_{\sigma}({\bf x})$ can be guaranteed by assuming $L<1$, as discussed in [25]. For more general conditions under which the KL property holds for deep neural networks, we refer to [5, 11, 77]. Therefore, in practice it is not difficult to select a neural network for $g_{\sigma}$ that guarantees the KL property of $\phi_{\sigma}$.

4.1 When $f$ is smooth with Lipschitz continuous gradient

In this subsection, we consider the case in which $f$ in Eq. 46 is differentiable with Lipschitz continuous gradient. We replace the second proximal subproblem with the learned denoiser $\mathcal{D}_{\sigma}$ of Eq. 44, which yields the smooth extrapolated PnP-DYS method detailed in Algorithm 2. In particular, Algorithm 2 reduces to extrapolated (i.e., accelerated) versions of the PnP-DRS and PnP-FBS methods when $h=0$ and $f=0$, respectively. To the best of our knowledge, these special cases have not been explored in the previous literature.
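The two reductions can be verified with a single step of the iteration. In the sketch below, `ext_dys_step` is a hypothetical helper implementing one pass of Algorithm 2, and the denoiser, $f$ and $h$ are arbitrary stand-ins chosen only to make the check concrete. With $f=0$ the prox is the identity, so ${\bf y}^{k+1}={\bf w}^{k}$ and the update collapses to the extrapolated PnP-FBS step ${\bf x}^{k+1}=\mathcal{D}_{\sigma}({\bf w}^{k}-\gamma\nabla h({\bf w}^{k}))$; with $h=0$ it becomes the extrapolated PnP-DRS step ${\bf x}^{k+1}={\bf w}^{k}+\mathcal{D}_{\sigma}(2{\bf y}^{k+1}-{\bf w}^{k})-{\bf y}^{k+1}$.

```python
import numpy as np


def ext_dys_step(x, x_prev, prox_f, denoiser, grad_h, gamma, alpha):
    """One iteration of Algorithm 2 (smooth extrapolated PnP-DYS)."""
    w = x + alpha * (x - x_prev)
    y = prox_f(w)
    z = denoiser(2 * y - gamma * grad_h(y) - w)
    return w + (z - y)


# Hypothetical ingredients, chosen only to make the check concrete.
gamma, alpha = 0.5, 0.3
denoiser = lambda v: 0.8 * v                          # stand-in for D_sigma
grad_h = lambda v: v - 1.0                            # h = 0.5 * ||v - 1||^2
prox_f = lambda v: (v + gamma * 2.0) / (1 + gamma)    # Prox_{gamma f}, f = 0.5 * ||v - 2||^2
zero_grad = lambda v: np.zeros_like(v)                # gradient of h = 0
identity = lambda v: v                                # Prox of f = 0

x, x_prev = np.array([1.0, -1.0, 3.0]), np.array([0.5, 0.0, 2.0])
w = x + alpha * (x - x_prev)

# f = 0: Algorithm 2 performs an extrapolated PnP-FBS step x^{k+1} = D(w - gamma*grad h(w)).
fbs = ext_dys_step(x, x_prev, identity, denoiser, grad_h, gamma, alpha)
fbs_ref = denoiser(w - gamma * grad_h(w))

# h = 0: Algorithm 2 performs an extrapolated PnP-DRS step.
drs = ext_dys_step(x, x_prev, prox_f, denoiser, zero_grad, gamma, alpha)
y = prox_f(w)
drs_ref = w + denoiser(2 * y - w) - y
print(np.allclose(fbs, fbs_ref), np.allclose(drs, drs_ref))  # True True
```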

Algorithm 2 A smooth extrapolated PnP-DYS method
  Choose the parameters $\alpha\geq 0$ and $\gamma>0$. Given ${\bf x}^{0}$ and ${\bf x}^{-1}={\bf x}^{0}$, set $k=0$.
  while the stopping criterion is not satisfied do
     
$$\left\{\begin{aligned} {\bf w}^{k}&={\bf x}^{k}+\alpha({\bf x}^{k}-{\bf x}^{k-1}),\\ {\bf y}^{k+1}&={\rm Prox}_{\gamma f}\left({\bf w}^{k}\right),\\ {\bf z}^{k+1}&=\mathcal{D}_{\sigma}\left(2{\bf y}^{k+1}-\gamma\nabla h({\bf y}^{k+1})-{\bf w}^{k}\right),\\ {\bf x}^{k+1}&={\bf w}^{k}+\left({\bf z}^{k+1}-{\bf y}^{k+1}\right).\end{aligned}\right.$$
  end while
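A runnable sketch of Algorithm 2 follows, under loudly stated assumptions: `smooth_ext_pnp_dys` is a hypothetical name, a fixed iteration budget stands in for the stopping criterion, and the toy instance uses quadratic $f$ and $h$ together with a linear GS denoiser $\mathcal{D}_{\sigma}=I-G$ (so $\nabla\phi_{\sigma}({\bf y})=(I-G)^{-1}G{\bf y}$, as in the check after Proposition 4.1). The final lines measure the residual of the critical-point equation $\nabla f({\bf y})+\frac{1}{\gamma}\nabla\phi_{\sigma}({\bf y})+\nabla h({\bf y})=0$ of problem Eq. 46.

```python
import numpy as np


def smooth_ext_pnp_dys(prox_f, denoiser, grad_h, x0, gamma, alpha, num_iters=300):
    """Sketch of Algorithm 2; a fixed iteration budget stands in for the stopping criterion."""
    x = np.asarray(x0, dtype=float).copy()
    x_prev = x.copy()                                  # x^{-1} = x^0
    y, z = x.copy(), x.copy()
    for _ in range(num_iters):
        w = x + alpha * (x - x_prev)                   # w^k = x^k + alpha (x^k - x^{k-1})
        y = prox_f(w)                                  # y^{k+1} = Prox_{gamma f}(w^k)
        z = denoiser(2 * y - gamma * grad_h(y) - w)    # z^{k+1} = D_sigma(2y - gamma grad h(y) - w^k)
        x_prev, x = x, w + (z - y)                     # x^{k+1} = w^k + (z^{k+1} - y^{k+1})
    return y, z, x


# Hypothetical toy instance: quadratic f and h, linear GS denoiser D = I - G.
rng = np.random.default_rng(2)
n = 6
A = rng.standard_normal((n, n))
A *= 0.6 / np.linalg.norm(A, 2)
G = A.T @ A                                            # ||G||_2 = 0.36 < 1
a, c = rng.standard_normal(n), rng.standard_normal(n)
gamma, alpha = 1.0, 0.2
prox_f = lambda v: (v + gamma * a) / (1 + gamma)       # f = 0.5 * ||x - a||^2
denoiser = lambda v: v - G @ v
grad_h = lambda v: v - c                               # h = 0.5 * ||x - c||^2
y, z, x = smooth_ext_pnp_dys(prox_f, denoiser, grad_h, np.zeros(n), gamma, alpha)

# Residual of the critical-point equation of F = f + phi/gamma + h, where grad phi(y) = H y.
H = np.linalg.solve(np.eye(n) - G, G)
residual = (y - a) + H @ y / gamma + grad_h(y)
print(np.linalg.norm(residual))
```

In this linear toy case the iteration map is a contraction, so the extrapolated iterates converge quickly; this says nothing about the parameter conditions needed for a trained nonconvex denoiser.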

Next, we discuss the convergence property of Algorithm 2 for the explicit optimization problem Eq. 46. Before the analysis, we define

$$\widetilde{\mathcal{H}}_{\gamma}\left({\bf y},{\bf z},{\bf x}\right)=f({\bf y})+\frac{1}{\gamma}\phi_{\sigma}({\bf z})+h({\bf y})+\frac{1}{2\gamma}\|{\bf y}-{\bf x}-\gamma\nabla h({\bf y})\|^{2}-\frac{1}{2\gamma}\|{\bf z}-{\bf x}-\gamma\nabla h({\bf y})\|^{2},$$

and

$$\widetilde{\Theta}_{\alpha,\gamma}\left({\bf y},{\bf z},{\bf x},{\bf x}_{1},{\bf x}_{2}\right)=\widetilde{\mathcal{H}}_{\gamma}({\bf y},{\bf z},{\bf x})+\frac{\alpha^{2}}{2\gamma}\|{\bf x}_{1}-{\bf x}_{2}\|^{2}.$$
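A useful sanity check on these definitions: when ${\bf z}={\bf y}$ the two quadratic terms of $\widetilde{\mathcal{H}}_{\gamma}$ cancel, and when ${\bf x}_1={\bf x}_2$ the momentum term vanishes, so $\widetilde{\Theta}_{\alpha,\gamma}({\bf y},{\bf y},{\bf x},{\bf x},{\bf x})=F_{\gamma,\sigma}({\bf y})$ for any ${\bf x}$. The sketch below evaluates the merit function directly; `merit_tilde` is a hypothetical helper, and the quadratic $f$, $\phi$ and $h$ are illustrative stand-ins rather than a trained $\phi_{\sigma}$.

```python
import numpy as np


def merit_tilde(y, z, x, x1, x2, f, phi, h, grad_h, gamma, alpha):
    """Theta~_{alpha,gamma}(y, z, x, x1, x2) = H~_gamma(y, z, x) + alpha^2/(2 gamma) ||x1 - x2||^2."""
    r_y = y - x - gamma * grad_h(y)
    r_z = z - x - gamma * grad_h(y)
    h_tilde = (f(y) + phi(z) / gamma + h(y)
               + (r_y @ r_y) / (2 * gamma) - (r_z @ r_z) / (2 * gamma))
    return h_tilde + alpha ** 2 / (2 * gamma) * np.sum((x1 - x2) ** 2)


# Hypothetical quadratic stand-ins for f, phi_sigma and h.
a, c = np.array([1.0, 2.0]), np.array([0.0, -1.0])
f = lambda v: 0.5 * np.sum((v - a) ** 2)
phi = lambda v: 0.25 * np.sum(v ** 2)
h = lambda v: 0.5 * np.sum((v - c) ** 2)
grad_h = lambda v: v - c
gamma, alpha = 0.8, 0.3
F = lambda v: f(v) + phi(v) / gamma + h(v)

y = np.array([0.3, -0.7])
x = np.array([5.0, 4.0])    # arbitrary; it drops out when z = y and x1 = x2
print(merit_tilde(y, y, x, x, x, f, phi, h, grad_h, gamma, alpha), F(y))
```

This identity is why the limit of the merit values in Theorem 4.2(i) can be related to objective values once ${\bf y}^{k}-{\bf z}^{k}\to 0$ and $\Delta_{\bf x}^{k}\to 0$.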

In the following, we present the convergence results of Algorithm 2.

Theorem 4.2.

Let $g_{\sigma}:\mathbb{R}^{n}\rightarrow\mathbb{R}\cup\{+\infty\}$ be of class $\mathcal{C}^{2}$ with $L$-Lipschitz continuous gradient, where $L<1$, and let $\mathcal{D}_{\sigma}=I-\nabla g_{\sigma}$. Let $f:\mathbb{R}^{n}\rightarrow\mathbb{R}\cup\{+\infty\}$ and $h$ be differentiable with $L_{f}$- and $L_{h}$-Lipschitz continuous gradients, respectively, and let $l_{f}$ be a constant such that $f+\frac{l_{f}}{2}\|\cdot\|^{2}$ is convex. 
Suppose that $f$, $g_{\sigma}$ and $h$ are bounded from below. Then, for $\alpha$ and $\gamma$ satisfying the parameter conditions of Section 3.1 with $L_{f_{1}}:=L_{f}$ and $l:=l_{f}$, any bounded sequence $\left\{({\bf y}^{k},{\bf z}^{k},{\bf x}^{k})\right\}_{k\geq 1}$ generated by Algorithm 2 satisfies the following:

  • (i)

    $\left\{\widetilde{\Theta}_{\alpha,\gamma}\left({\bf y}^{k},{\bf z}^{k},{\bf x}^{k},{\bf x}^{k-1},{\bf x}^{k-2}\right)\right\}_{k\geq 1}$ is nonincreasing and converges.

  • (ii)

    the sequences $\{\Delta_{\bf x}^{k}\}$, $\{\Delta_{\bf y}^{k}\}$ and $\{{\bf y}^{k}-{\bf z}^{k}\}$ vanish with rates $\min_{k\leq K}\|\Delta_{\bf x}^{k}\|=\mathcal{O}(\frac{1}{\sqrt{K}})$, $\min_{k\leq K}\|\Delta_{\bf y}^{k}\|=\mathcal{O}(\frac{1}{\sqrt{K}})$, and $\min_{k\leq K}\|{\bf y}^{k}-{\bf z}^{k}\|=\mathcal{O}(\frac{1}{\sqrt{K}})$, respectively.

  • (iii)

    any cluster point ${\bf u}^{*}:=({\bf y}^{*},{\bf z}^{*},{\bf x}^{*},{\bf x}^{*},{\bf x}^{*})$ of the sequence $\left\{{\bf u}^{k}\right\}_{k\geq 1}$ satisfies $0\in\partial F_{\gamma,\sigma}({\bf y}^{*})$, i.e., ${\bf y}^{*}$ is a critical point of problem Eq. 46.

  • (iv)

    if $h$ is twice continuously differentiable with bounded Hessian, i.e., there exists a constant $M>0$ such that $\|\nabla^{2}h({\bf y})\|\leq M$ for all ${\bf y}\in\mathbb{R}^{n}$, and $F_{\gamma,\sigma}$ in Eq. 46 is a KL function, then the whole sequence $\{{\bf u}^{k}\}_{k\geq 1}$ converges.

Proof 4.3.

Since $f$ and $h$ are differentiable with $L_{f}$- and $L_{h}$-Lipschitz continuous gradients, problem Eq. 46 is a special case of Eq. 1 with $f_{1}:=f$ and $f_{2}:=\frac{1}{\gamma}\phi_{\sigma}$. Therefore, (i) and (ii) follow from Lemma 3.3 and Lemma 3.5. Assertion (iii) follows from Theorem 3.7, and conclusion (iv) can be derived from Theorem 3.12. This completes the proof.
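The $\mathcal{O}(1/\sqrt{K})$ rates in (ii) rest on a standard summability argument. As a sketch, assume for illustration that the descent analysis behind (i) provides a constant $C$, independent of $K$, with $\sum_{k=1}^{K}\|\Delta_{\bf x}^{k}\|^{2}\leq C$; then

$$K\min_{k\leq K}\|\Delta_{\bf x}^{k}\|^{2}\leq\sum_{k=1}^{K}\|\Delta_{\bf x}^{k}\|^{2}\leq C\quad\Longrightarrow\quad\min_{k\leq K}\|\Delta_{\bf x}^{k}\|\leq\sqrt{C/K}=\mathcal{O}\big(1/\sqrt{K}\big),$$

and the same reasoning applies to $\|\Delta_{\bf y}^{k}\|$ and $\|{\bf y}^{k}-{\bf z}^{k}\|$.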

4.2 When $f$ is nonsmooth

To handle problem Eq. 46 with a possibly nondifferentiable function $f$, we propose the nonsmooth extrapolated PnP-DYS method in Algorithm 3. In this case, the first proximal subproblem is replaced by the learned denoiser $\mathcal{D}_{\sigma}$ defined in Eq. 44 in order to guarantee theoretical convergence.

Algorithm 3 A nonsmooth extrapolated PnP-DYS method
  Choose the parameters $\alpha\geq 0$ and $\gamma>0$. Given ${\bf x}^{0}$ and ${\bf x}^{-1}={\bf x}^{0}$, set $k=0$.
  while the stopping criterion is not satisfied do
     
$$\left\{\begin{aligned} {\bf w}^{k}&={\bf x}^{k}+\alpha({\bf x}^{k}-{\bf x}^{k-1}),\\ {\bf y}^{k+1}&=\mathcal{D}_{\sigma}\left({\bf w}^{k}\right),\\ {\bf z}^{k+1}&={\rm Prox}_{\gamma f}\left(2{\bf y}^{k+1}-\gamma\nabla h({\bf y}^{k+1})-{\bf w}^{k}\right),\\ {\bf x}^{k+1}&={\bf w}^{k}+\left({\bf z}^{k+1}-{\bf y}^{k+1}\right).\end{aligned}\right.$$
  end while
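Algorithm 3 differs from Algorithm 2 only in the order of the two nonlinear steps: the denoiser acts first and ${\rm Prox}_{\gamma f}$ second, which lets $f$ be nonsmooth. The sketch below is a hypothetical instance, not the authors' code: $f=\lambda\|\cdot\|_1$ (whose prox is soft-thresholding), $h=\frac12\|\cdot-c\|^2$, a linear GS denoiser $\mathcal{D}_{\sigma}=I-G$, and a fixed iteration budget as the stopping rule. At a critical point, ${\bf r}=-(\frac{1}{\gamma}\nabla\phi_{\sigma}({\bf y})+\nabla h({\bf y}))$ must lie in $\lambda\,\partial\|{\bf y}\|_1$, which the last lines compute.

```python
import numpy as np


def soft_threshold(v, t):
    """Prox of t * ||.||_1 (componentwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def nonsmooth_ext_pnp_dys(denoiser, prox_f, grad_h, x0, gamma, alpha, num_iters=300):
    """Sketch of Algorithm 3; a fixed iteration budget stands in for the stopping criterion."""
    x = np.asarray(x0, dtype=float).copy()
    x_prev = x.copy()                                  # x^{-1} = x^0
    y, z = x.copy(), x.copy()
    for _ in range(num_iters):
        w = x + alpha * (x - x_prev)                   # extrapolation step
        y = denoiser(w)                                # y^{k+1} = D_sigma(w^k)
        z = prox_f(2 * y - gamma * grad_h(y) - w)      # z^{k+1} = Prox_{gamma f}(2y - gamma grad h(y) - w^k)
        x_prev, x = x, w + (z - y)                     # x^{k+1} = w^k + (z^{k+1} - y^{k+1})
    return y, z, x


# Hypothetical toy instance: f = lam * ||.||_1, h = 0.5 * ||. - c||^2, linear GS denoiser D = I - G.
rng = np.random.default_rng(3)
n, lam, gamma, alpha = 8, 0.3, 1.0, 0.3
A = rng.standard_normal((n, n))
A *= 0.6 / np.linalg.norm(A, 2)
G = A.T @ A                                            # ||G||_2 = 0.36 < 1
c = rng.standard_normal(n)
denoiser = lambda v: v - G @ v
prox_f = lambda v: soft_threshold(v, gamma * lam)
grad_h = lambda v: v - c
y, z, x = nonsmooth_ext_pnp_dys(denoiser, prox_f, grad_h, np.zeros(n), gamma, alpha)

# Optimality check: r = -(grad phi(y) / gamma + grad h(y)) should lie in lam * (subdifferential of ||.||_1 at y).
H = np.linalg.solve(np.eye(n) - G, G)                  # grad phi(y) = H y in this toy case
r = -(H @ y / gamma + grad_h(y))
print(z, r)
```

Here ${\bf z}^{k}$ carries exact zeros produced by soft-thresholding, which is convenient for reading off the support when checking the subgradient condition.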

In order to analyze the convergence of Algorithm 3, we define

$$\widehat{\mathcal{H}}_{\gamma}\left({\bf y},{\bf z},{\bf x}\right)=\frac{1}{\gamma}\phi_{\sigma}({\bf y})+f({\bf z})+h({\bf y})+\frac{1}{2\gamma}\|{\bf y}-{\bf x}-\gamma\nabla h({\bf y})\|^{2}-\frac{1}{2\gamma}\|{\bf z}-{\bf x}-\gamma\nabla h({\bf y})\|^{2},$$

and

$$\widehat{\Theta}_{\alpha,\gamma}\left({\bf y},{\bf z},{\bf x},{\bf x}_{1},{\bf x}_{2}\right)=\widehat{\mathcal{H}}_{\gamma}({\bf y},{\bf z},{\bf x})+\frac{\alpha^{2}}{2\gamma}\|{\bf x}_{1}-{\bf x}_{2}\|^{2}.$$

Now we give the convergence results of Algorithm 3 based on the conclusions in Section 3 and the discussions in [28].

Theorem 4.4.

Let $g_{\sigma}:\mathbb{R}^{n}\rightarrow\mathbb{R}\cup\{+\infty\}$ be of class $\mathcal{C}^{2}$ with $L$-Lipschitz continuous gradient, where $L<1$, and let $\mathcal{D}_{\sigma}=I-\nabla g_{\sigma}$ be such that ${\rm Im}(\mathcal{D}_{\sigma})$ is convex. Let $f:\mathbb{R}^{n}\rightarrow\mathbb{R}\cup\{+\infty\}$ be a proper closed function and let $h$ be differentiable with $L_{h}$-Lipschitz continuous gradient. 
Suppose that $f$, $g_{\sigma}$ and $h$ are bounded from below. Then, for $\alpha$ and $\gamma$ satisfying the parameter conditions of Section 3.1 with $L_{f_{1}}:=\frac{L}{\gamma(1-L)}$ and $l:=\frac{L}{\gamma(L+1)}$, any bounded sequence $\left\{({\bf y}^{k},{\bf z}^{k},{\bf x}^{k})\right\}_{k\geq 1}$ generated by Algorithm 3 satisfies the following:

  • (i)

    $\left\{\widehat{\Theta}_{\alpha,\gamma}\left({\bf y}^{k},{\bf z}^{k},{\bf x}^{k},{\bf x}^{k-1},{\bf x}^{k-2}\right)\right\}_{k\geq 1}$ is nonincreasing and converges.

  • (ii)

    the sequences $\{\Delta_{\bf x}^{k}\}$, $\{\Delta_{\bf y}^{k}\}$ and $\{{\bf y}^{k}-{\bf z}^{k}\}$ vanish with rates $\min_{k\leq K}\|\Delta_{\bf x}^{k}\|=\mathcal{O}(\frac{1}{\sqrt{K}})$, $\min_{k\leq K}\|\Delta_{\bf y}^{k}\|=\mathcal{O}(\frac{1}{\sqrt{K}})$, and $\min_{k\leq K}\|{\bf y}^{k}-{\bf z}^{k}\|=\mathcal{O}(\frac{1}{\sqrt{K}})$, respectively.

  • (iii)

    any cluster point ${\bf u}^{*}:=({\bf y}^{*},{\bf z}^{*},{\bf x}^{*},{\bf x}^{*},{\bf x}^{*})$ of the sequence $\{{\bf u}^{k}\}_{k\geq 1}$ is a critical point of problem Eq. 46, i.e., $0\in\partial F_{\gamma,\sigma}({\bf y}^{*})$.

  • (iv)

    if $h$ is twice continuously differentiable with a bounded Hessian, i.e., there exists a constant $M>0$ such that $\|\nabla^{2}h({\bf y})\|\leq M$ for all ${\bf y}\in\mathbb{R}^{n}$, and $F_{\gamma,\sigma}$ in Eq. 46 is a KL function, then the whole sequence $\{{\bf u}^{k}\}_{k\geq 1}$ converges.

Proof 4.5.

It follows from Proposition 4.1 that $\phi_{\sigma}$ is $\frac{L}{L+1}$-weakly convex and $\nabla\phi_{\sigma}$ is $\frac{L}{1-L}$-Lipschitz on $\operatorname{Im}(\mathcal{D}_{\sigma})$. Thus, problem Eq. 46 can be seen as a special case of Eq. 1 with $f_{1}=\frac{1}{\gamma}\phi_{\sigma}$ and $f_{2}=f$. Since $\operatorname{Im}(\mathcal{D}_{\sigma})$ is convex, it follows from [28, Appendix C.2] and Lemma 3.3 that (i) and (ii) hold. By the assumptions on $g_{\sigma}$, $\mathcal{D}_{\sigma}$ is continuous on $\operatorname{Im}(\mathcal{D}_{\sigma})$, so assertion (iii) follows from Theorem 3.7. Moreover, conclusion (iv) can be derived from Theorem 3.12. This completes the proof.

Remark 4.6.

As discussed in [28, 47], one way to ensure that the Lipschitz constant $L$ of $\nabla g_{\sigma}$ satisfies $L<1$ is to softly constrain it by penalizing the spectral norm of the Hessian of $g_{\sigma}$ in the denoiser training loss. This approach is explained further in the experiments. In addition, if $L>1$, one can relax the deep prior with a parameter $\eta\in[0,1]$, given by $\mathcal{D}_{\sigma}^{\eta}=\eta\mathcal{D}_{\sigma}+(1-\eta)I$. Importantly, the relaxed deep prior $\mathcal{D}_{\sigma}^{\eta}$ has the same property as stated in Proposition 4.1; more specifically, $\mathcal{D}_{\sigma}^{\eta}$ is still the proximal operator of a certain weakly convex functional. As a result, the condition becomes $\eta L<1$, which can easily be guaranteed by choosing $\eta<1/L$. We refer to [25, Subsection 3.4] for more discussion.
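The relaxation in Remark 4.6 can be illustrated with a minimal sketch. It assumes the gradient-step form $\mathcal{D}_{\sigma}=I-\nabla g_{\sigma}$ of Eq. 44, under which $I-\mathcal{D}_{\sigma}^{\eta}=\eta(I-\mathcal{D}_{\sigma})=\eta\nabla g_{\sigma}$ is $\eta L$-Lipschitz. The helper names `relax_denoiser` and `max_relaxation`, the 0.99 safety margin, and the toy quadratic $g_{\sigma}({\bf x})=\frac{L}{2}\|{\bf x}\|^{2}$ used for checking are illustrative assumptions; in the experiments $\mathcal{D}_{\sigma}$ is a trained network.

```python
from typing import Callable, List

Vector = List[float]
Denoiser = Callable[[Vector], Vector]


def relax_denoiser(denoiser: Denoiser, eta: float) -> Denoiser:
    """Return the relaxed prior D^eta = eta * D + (1 - eta) * I."""
    if not 0.0 <= eta <= 1.0:
        raise ValueError("eta must lie in [0, 1]")

    def relaxed(x: Vector) -> Vector:
        dx = denoiser(x)
        return [eta * d + (1.0 - eta) * xi for d, xi in zip(dx, x)]

    return relaxed


def max_relaxation(L: float, margin: float = 0.99) -> float:
    """Hypothetical helper: pick eta in [0, 1] with eta * L < 1.

    If L < 1, no relaxation is needed; otherwise eta = margin / L.
    """
    if L < 1.0:
        return 1.0
    return margin / L


# Toy check: g(x) = (L/2)||x||^2 gives D(x) = x - grad g(x) = (1 - L) x.
L = 2.0
toy_denoiser: Denoiser = lambda x: [(1.0 - L) * xi for xi in x]
eta = max_relaxation(L)  # 0.495, so eta * L = 0.99 < 1
relaxed = relax_denoiser(toy_denoiser, eta)
x = [1.0, -2.0]
residual = [xi - ri for xi, ri in zip(x, relaxed(x))]  # equals eta * L * x
```

The wrapper works for any callable denoiser; choosing $\eta$ only requires the Lipschitz bound $L$, not access to $g_{\sigma}$ itself.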

5 Numerical experiments

In this section, we apply the extrapolated DYS algorithm, with and without the PnP denoiser, to image deblurring and image super-resolution tasks, and compare the numerical results with those of other advanced models and methods. All experiments are implemented with PyTorch on an NVIDIA RTX A6000 GPU.

We consider the image restoration problem with both a sparsity-inducing regularizer and Tikhonov regularization, whose mathematical model reads

$$\min_{{\bf x}\in\mathbb{R}^{n}}\ \frac{1}{2\nu^{2}}\|A{\bf x}-{\bf b}\|^{2}+r({\bf x})+\frac{\beta}{2}\|{\bf x}\|^{2}, \tag{47}$$

where $r(\cdot)$ is the sparsity-inducing regularizer, which may be nonconvex, ${\bf b}$ is the observation, $\nu$ is the Gaussian noise level, and $A$ is the linear operator. When $A$ denotes the blur operator, model Eq. 47 corresponds to the image deblurring problem, which aims to restore a clean image ${\bf x}^{*}$ from the observed image ${\bf b}$. Moreover, if $A=SB$, where $B$ denotes the blur operator and $S$ is the standard $s$-fold downsampler (i.e., selecting the upper-left pixel of each distinct $s\times s$ patch), model Eq. 47 reduces to the image super-resolution problem, which aims to recover a high-resolution version of a low-resolution image. Model Eq. 47 falls into the form of Eq. 1 with $f_{1}({\bf x})=\frac{1}{2\nu^{2}}\|A{\bf x}-{\bf b}\|^{2}$, $f_{2}({\bf x})=r({\bf x})$, and $h({\bf x})=\frac{\beta}{2}\|{\bf x}\|^{2}$. The following model with a sparsity-inducing regularizer and a box constraint is also widely used for image deblurring and image super-resolution:

$$\min_{{\bf x}\in\mathcal{B}}\ \frac{1}{2\nu^{2}}\|A{\bf x}-{\bf b}\|^{2}+r({\bf x}), \tag{48}$$

where $\mathcal{B}$ is a (convex) box. Model Eq. 48 is a special case of Eq. 1 when $r(\cdot)$ is smooth, with $f_{1}({\bf x})=r({\bf x})$, $f_{2}({\bf x})=\delta_{\mathcal{B}}({\bf x})$, and $h({\bf x})=\frac{1}{2\nu^{2}}\|A{\bf x}-{\bf b}\|^{2}$, where $\delta_{\mathcal{B}}(\cdot)$ denotes the indicator function of $\mathcal{B}$.
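For DeBox, the nonsmooth term $f_{2}=\delta_{\mathcal{B}}$ enters the splitting only through its proximal operator, which is the Euclidean projection onto $\mathcal{B}$ for every step size. A minimal sketch, assuming the box is $[0,1]^{n}$ (pixel intensities normalized to $[0,1]$; the actual bounds are not stated here) and a flattened image stored as a Python list:

```python
from typing import List


def prox_box_indicator(x: List[float], lower: float = 0.0, upper: float = 1.0) -> List[float]:
    """Proximal operator of delta_B for B = [lower, upper]^n.

    prox_{t * delta_B}(x) = P_B(x) for any t > 0, so no step size is needed:
    each entry is simply clipped to [lower, upper].
    """
    if lower > upper:
        raise ValueError("the box is empty")
    return [min(max(v, lower), upper) for v in x]


print(prox_box_indicator([-0.2, 0.5, 1.3]))  # [0.0, 0.5, 1.0]
```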

In the experiments, we consider two choices of $r(\cdot)$ for Eq. 47 and Eq. 48:

  • 1.

    $r({\bf x})=\|{\bf x}\|_{\rm TV}$, the isotropic total variation (TV) regularizer [20, 54];

  • 2.

    $r({\bf x})=\frac{1}{\gamma}\phi_{\sigma}({\bf x})$, the nonconvex regularizer in Eq. 45 induced by the Gradient Step (GS) denoiser $\mathcal{D}_{\sigma}$.
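The isotropic TV regularizer in item 1 can be evaluated as in the following sketch. The discretization (forward differences, with the difference across the last row and column set to zero, i.e., a Neumann-type boundary) is a common convention and an assumption here; the discretization used inside ‘scikit-image’ may differ.

```python
import math
from typing import List

Image = List[List[float]]


def isotropic_tv(x: Image) -> float:
    """Isotropic TV: sum over pixels of sqrt(dv^2 + dh^2).

    dv and dh are forward differences along rows and columns; they are set to
    zero on the last row / last column (Neumann-type boundary, an assumption).
    """
    h, w = len(x), len(x[0])
    total = 0.0
    for i in range(h):
        for j in range(w):
            dv = x[i + 1][j] - x[i][j] if i + 1 < h else 0.0
            dh = x[i][j + 1] - x[i][j] if j + 1 < w else 0.0
            total += math.hypot(dv, dh)
    return total


print(isotropic_tv([[0.0, 1.0], [0.0, 1.0]]))  # 2.0 (one vertical edge of height 2)
```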

We refer to model Eq. 47 with the above two regularizers as TVTik and DeTik, respectively. Similarly, model Eq. 48 with the two regularizers is denoted as TVBox and DeBox, respectively. As discussed in Section 4, DeTik can be solved by Algorithm 2, while DeBox should be solved by Algorithm 3 due to the nonsmoothness of $\delta_{\mathcal{B}}$. For the classical TV-based models, i.e., TVTik and TVBox, the split Bregman algorithm is applicable. More specifically, we use the function ‘skimage.restoration.denoise_tv_bregman’ from the Python image processing package ‘scikit-image’ to solve the isotropic TV subproblem, with a maximum of 100 iterations. Algorithm 1 can also be used to solve TVTik, since Eq. 47 involves two smooth terms with Lipschitz continuous gradients. We initialize all tested algorithms with ${\bf x}^{-1}={\bf x}^{0}={\bf b}$. The algorithms are terminated when the relative difference between consecutive objective function values is less than $\varepsilon=10^{-8}$ or the number of iterations exceeds $k_{\max}=1000$.
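The initialization and stopping rule above can be sketched as a generic driver loop. The one-step map `step(x, x_prev)` and the objective are hypothetical placeholders for any of the tested algorithms (the two arguments allow an extrapolation step that uses the previous iterate), and the small floor in the denominator is an added safeguard not mentioned in the text.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def run_until_stationary(
    step: Callable[[Vector, Vector], Vector],
    objective: Callable[[Vector], float],
    b: Vector,
    eps: float = 1e-8,
    k_max: int = 1000,
) -> Tuple[Vector, int]:
    """Iterate x^{k+1} = step(x^k, x^{k-1}) starting from x^{-1} = x^0 = b.

    Stops when |F(x^{k+1}) - F(x^k)| / |F(x^k)| < eps or after k_max iterations.
    Returns the last iterate and the number of iterations performed.
    """
    x_prev, x = list(b), list(b)
    f_old = objective(x)
    for k in range(1, k_max + 1):
        x_new = step(x, x_prev)
        f_new = objective(x_new)
        x_prev, x = x, x_new
        if abs(f_new - f_old) / max(abs(f_old), 1e-12) < eps:
            return x, k
        f_old = f_new
    return x, k_max


# Toy usage: gradient descent (no extrapolation) on F(x) = 1 + 0.5 * (x - 1)^2.
def toy_objective(v: Vector) -> float:
    return 1.0 + 0.5 * sum((vi - 1.0) ** 2 for vi in v)


def toy_step(v: Vector, v_prev: Vector) -> Vector:
    return [vi - 0.5 * (vi - 1.0) for vi in v]


x_final, iters = run_until_stationary(toy_step, toy_objective, [0.0])
```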

As mentioned above, we use the deep GS denoiser in place of the traditional regularizer. Specifically, we employ the classical DRUNet [78] as our denoiser $\mathcal{D}_{\sigma}$. DRUNet combines the U-Net and ResNet architectures and takes an additional noise level map as input, achieving state-of-the-art performance in Gaussian noise removal. To ensure that the Lipschitz constant $L$ of $\nabla g_{\sigma}$ in Eq. 44 satisfies $L<1$, we follow the approach in [47] and regularize the training loss of $\mathcal{D}_{\sigma}$ with the spectral norm of the Hessian of $g_{\sigma}$ as follows:

$$\mathcal{L}_{S}(\sigma)=\mathbb{E}_{{\bf x}\sim p,\ \xi_{\sigma}\sim\mathcal{N}(0,\sigma^{2})}\left[\|\mathcal{D}_{\sigma}({\bf x}+\xi_{\sigma})-{\bf x}\|^{2}+\mu\max\left(\|\nabla^{2}g_{\sigma}({\bf x}+\xi_{\sigma})\|_{S},\,1-\epsilon\right)\right], \tag{49}$$

where $p$ is the distribution of a dataset of clean images and $\|\cdot\|_{S}$ is the spectral norm. We set $\epsilon=0.1$ and $\mu=0.01$ according to [28]. Following the setting of [27], we retrain DRUNet [78] with the loss function Eq. 49 on the Berkeley Segmentation Dataset, the Waterloo Exploration Database, the DIV2K dataset, and the Flickr2K dataset. For the image deblurring problem, ten different blur kernels (Ker1 to Ker10, available at https://github.com/Huang-chao-yan/convergent_pnp/tree/main/kernels) and three noise levels $\nu\in\{2.55,7.65,12.75\}$ are used to simulate the degraded images.
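The regularizer in Eq. 49 needs only the spectral norm of a symmetric Hessian, which can be estimated by power iteration. In the sketch below, $\nabla^{2}g_{\sigma}$ is an explicit small symmetric matrix (an assumption made to keep the example self-contained); for a network, the matrix-vector product would instead be a Hessian-vector product obtained by automatic differentiation. The function names and the fixed iteration count are illustrative.

```python
import math
from typing import List

Matrix = List[List[float]]


def spectral_norm_sym(H: Matrix, n_iter: int = 100) -> float:
    """Power-iteration estimate of ||H||_S for a symmetric matrix H."""
    n = len(H)
    v = [1.0 / math.sqrt(n)] * n  # assumed not orthogonal to the top eigenvector
    est = 0.0
    for _ in range(n_iter):
        w = [sum(H[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm_w = math.sqrt(sum(c * c for c in w))
        if norm_w == 0.0:
            return 0.0
        est = norm_w  # ||H v|| with ||v|| = 1
        v = [c / norm_w for c in w]
    return est


def hessian_hinge_penalty(H: Matrix, mu: float = 0.01, eps: float = 0.1) -> float:
    """mu * max(||H||_S, 1 - eps), the regularization term inside Eq. (49).

    The term is constant (mu * (1 - eps)) whenever ||H||_S <= 1 - eps, so it
    only penalizes spectral norms above 1 - eps.
    """
    return mu * max(spectral_norm_sym(H), 1.0 - eps)
```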

5.1 Effect of extrapolation

We first test the effect of the extrapolation parameter $\alpha$ by applying Algorithm 2 to the DeTik model. For DeTik, we have $L_{f_{1}}=\frac{1}{\nu^{2}}\lambda_{\max}(A^{\top}A)$, $l=-L_{f_{1}}$, and $L_{h}=\beta$, where $\lambda_{\max}$ denotes the largest eigenvalue of a given matrix. In the experiment, we set the model parameter $\beta\in[0.0005,0.001]$ in Eq. 47 depending on the noise level. It follows from Section 3.1 that $0\leq\alpha<\Lambda(\gamma)$. Therefore, for a given fixed $\gamma$ satisfying Eq. 10, we test $\alpha\in\{0,0.25,0.50,0.75,0.99\}\cdot\Lambda(\gamma)$ and compare the computational cost and recovery quality for the image deblurring problem.
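The constant $L_{f_{1}}=\lambda_{\max}(A^{\top}A)/\nu^{2}$ can be estimated without forming $A$, by power iteration with $A$ and $A^{\top}$ applied as operators. The sketch assumes a 1-D blur with periodic boundary conditions and an odd-length kernel (so that the adjoint is convolution with the flipped kernel); the kernel, signal length, and noise level are illustrative, and $\Lambda(\gamma)$, defined in Section 3.1, is passed in as a number rather than computed.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def circ_conv(x: Sequence[float], k: Sequence[float]) -> Vector:
    """Circular convolution with a centred odd-length kernel (periodic boundary)."""
    n, m = len(x), len(k)
    if m % 2 == 0:
        raise ValueError("kernel length must be odd")
    c = m // 2
    return [sum(k[t] * x[(i - t + c) % n] for t in range(m)) for i in range(n)]


def circ_conv_adjoint(y: Sequence[float], k: Sequence[float]) -> Vector:
    """Adjoint of circ_conv: convolution with the flipped kernel."""
    return circ_conv(y, list(k)[::-1])


def lambda_max_AtA(k: Sequence[float], n: int, n_iter: int = 300, seed: int = 0) -> float:
    """Power iteration for the largest eigenvalue of A^T A, A = circular blur by k."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    lam = 0.0
    for _ in range(n_iter):
        norm_v = math.sqrt(sum(c * c for c in v))
        v = [c / norm_v for c in v]
        w = circ_conv_adjoint(circ_conv(v, k), k)
        lam = sum(a * b for a, b in zip(v, w))  # Rayleigh quotient v^T A^T A v
        v = w
    return lam


def alpha_grid(Lambda: float) -> List[float]:
    """Tested values alpha = c * Lambda(gamma) for c in {0, 0.25, 0.5, 0.75, 0.99}."""
    return [c * Lambda for c in (0.0, 0.25, 0.50, 0.75, 0.99)]


kernel = [0.25, 0.5, 0.25]  # hypothetical normalized blur kernel (sums to 1)
nu = 2.55 / 255             # noise level on the [0, 1] intensity scale
L_f1 = lambda_max_AtA(kernel, n=16) / nu ** 2
```

For a nonnegative kernel that sums to one, the largest eigenvalue of $A^{\top}A$ under periodic boundaries is $1$, so $L_{f_{1}}$ is governed mainly by the noise level $\nu$.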

Figure 1: Effect of $\alpha$ in Algorithm 2 for solving the DeTik model on ‘butterfly’ with Ker1 and noise level $2.55$. Increasing the extrapolation parameter $\alpha$ speeds up the convergence of the algorithm without degrading the quality of the restoration.

In Fig. 1, we report the effect of $\alpha$ on ‘butterfly’ with Ker1 and noise level $2.55$. More specifically, we plot the residual measure $\min_{j\leq k}\|{\bf x}^{j+1}-{\bf x}^{j}\|^{2}$, the PSNR, and the SSIM against the number of iterations; these curves showcase the advantage of the proposed extrapolation step. Furthermore, detailed results, including the number of iterations (Iter.), the computational time in seconds (Time(s)), the recovered PSNR (dB), and the SSIM for three test images (butterfly, leaves, and starfish) in Set3C with different noise levels, are reported in Appendix A. These results show that Algorithm 2 performs better as the extrapolation stepsize $\alpha$ increases, particularly in terms of computational cost. In the subsequent experiments, we set $\alpha=0.99\,\Lambda(\gamma)$ for a given $\gamma$ to obtain results more efficiently.
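The residual curve in Fig. 1 is a running minimum, so it is nonincreasing by construction. A short sketch of how it can be computed from stored iterates follows; storing the iterates as flat Python lists and the function name are assumptions.

```python
import math
from typing import List, Sequence


def running_min_sq_residual(iterates: Sequence[Sequence[float]]) -> List[float]:
    """Return [min_{j<=k} ||x^{j+1} - x^j||^2 for k = 0, ..., len(iterates) - 2]."""
    out: List[float] = []
    best = math.inf
    for prev, curr in zip(iterates, iterates[1:]):
        r2 = sum((c - p) ** 2 for c, p in zip(curr, prev))
        best = min(best, r2)
        out.append(best)  # nonincreasing by construction
    return out


curve = running_min_sq_residual([[0.0], [1.0], [1.5], [1.4], [2.4]])
```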

Panels: PSNR value versus $\gamma_{\nu}$; $\gamma_{\nu}=0.01$; $\gamma_{\nu}=1$; $\gamma_{\nu}=1.3$.

Figure 2: Influence of the parameter $\gamma_{\nu}=\frac{\nu^{2}}{\gamma}$ for deblurring with the DeTik model. First column: average PSNR versus $\gamma_{\nu}$, with the other parameters fixed. Remaining columns: visual results for deblurring ‘leaves’ with various $\gamma_{\nu}$.

Panels: PSNR value versus $\sigma_{\nu}$; $\sigma_{\nu}=0.6$; $\sigma_{\nu}=1.4$; $\sigma_{\nu}=10$.

Figure 3: Influence of the parameter $\sigma_{\nu}$ for deblurring with the DeTik model. First column: average PSNR versus $\sigma_{\nu}$, with the other parameters fixed. Remaining columns: visual results for deblurring ‘leaves’ with various $\sigma_{\nu}$.

5.2 Parameter analysis

In this subsection, we study the influence of the parameters and the initialization of Algorithm 2 for solving the DeTik model. Recall that DeTik reads

$$\min_{{\bf x}\in\mathbb{R}^{n}}\ \frac{1}{2\nu^{2}}\|A{\bf x}-{\bf b}\|^{2}+\frac{1}{\gamma}\phi_{\sigma}({\bf x})+\frac{\beta}{2}\|{\bf x}\|^{2},$$

where $\nu$ and $\sigma$ are the noise levels of the synthetic input image and of the denoiser $\mathcal{D}_{\sigma}$, respectively. We fix the model parameter $\beta$ for the different noise levels as in the previous subsection, and roughly set $\sigma$ proportional to the input noise level $\nu$ as $\sigma=\sigma_{\nu}\nu$, where $\sigma_{\nu}$ is a positive constant. Consequently, the parameters to be tested are $\gamma_{\nu}=\frac{\nu^{2}}{\gamma}$ and $\sigma_{\nu}=\sigma/\nu$.

In Fig. 2, we display the average PSNR value on Set3C over the 10 tested blur kernels under noise level $2.55$, with $\gamma_{\nu}$ ranging from $0.1$ to $1.4$ in steps of $0.1$. The results show that values of $\gamma_{\nu}$ around 1 perform better than the other cases. This observation is further supported by the restored images on the right-hand side: the image restored with $\gamma_{\nu}=1$ has better quality than those obtained with $\gamma_{\nu}=0.01$ and $\gamma_{\nu}=1.3$. When $\gamma_{\nu}=0.01$, the noise is removed but the blur remains; for the larger value $\gamma_{\nu}=1.3$, both the noise and the blur remain. Hence, in our experiments, we choose $\gamma_{\nu}=1$ for noise level $2.55$. Next, we test the effect of the parameter $\sigma_{\nu}$ and present in Fig. 3 the average PSNR value on Set3C over the 10 tested blur kernels under noise level $2.55$ for $\sigma_{\nu}\in[0.6,2.4]$ in steps of $0.2$. The results indicate that almost no deblurring occurs when $\sigma_{\nu}$ is small. Conversely, as $\sigma_{\nu}$ increases, excessive smoothing takes place, resulting in the loss of image details. Based on both the curve analysis and the visual results, we select $\sigma_{\nu}\in[1,2]$.
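The normalized parameters and the sweep ranges used for Figs. 2 and 3 can be written out as follows. The mapping $\gamma=\nu^{2}/\gamma_{\nu}$, $\sigma=\sigma_{\nu}\nu$ follows the definitions above; expressing $\nu$ on the $[0,1]$ intensity scale ($2.55/255=0.01$) and the function name are assumptions for illustration, and the step-size condition Eq. 10 on $\gamma$ is not checked here.

```python
from typing import List, Tuple


def detik_params(nu: float, gamma_nu: float, sigma_nu: float) -> Tuple[float, float]:
    """Map (gamma_nu, sigma_nu) to (gamma, sigma) via gamma_nu = nu^2/gamma, sigma = sigma_nu*nu."""
    if min(nu, gamma_nu, sigma_nu) <= 0:
        raise ValueError("all parameters must be positive")
    return nu ** 2 / gamma_nu, sigma_nu * nu


# Sweep ranges described above (rounding avoids accumulated floating-point error).
gamma_nu_grid: List[float] = [round(0.1 * i, 1) for i in range(1, 15)]     # 0.1, ..., 1.4
sigma_nu_grid: List[float] = [round(0.6 + 0.2 * i, 1) for i in range(10)]  # 0.6, ..., 2.4

nu = 2.55 / 255
gamma, sigma = detik_params(nu, gamma_nu=1.0, sigma_nu=1.4)  # e.g. gamma_nu = 1, sigma_nu = 1.4
```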

We further investigate the impact of the initialization of Algorithm 2. In Fig. 4, we plot the average PSNR value on Set3C over the 10 tested blur kernels under noise level $2.55$. Because the regularizer is nonconvex, the proposed scheme is sensitive to the initial point. Following the setting of [28], the initial point ${\bf x}^{0}$ is generated with different noise levels $\nu_{init}\in\{0.01,2.55,5,7.5,10\}/255$. The PSNR curve and the visual quality in Fig. 4 show that a suitable initial input is crucial for the image deblurring task. When the initial input closely resembles the ground-truth image, the iteration may terminate prematurely for certain images, particularly when the stopping criteria remain unchanged. On the other hand, if a heavily noisy image serves as the initial input, the iteration progresses smoothly, but the resulting image retains the heavy noise because the low-level denoiser cannot effectively handle such noise levels. In our experiments, we use the observation as the initial input to ensure the validity of the obtained results.

Panels: PSNR value versus $\nu_{init}$; $\nu_{init}=0.01/255$; $\nu_{init}=2.55/255$; $\nu_{init}=10/255$.

Figure 4: Influence of the initialization ${\bf x}^{0}$ for deblurring with the DeTik model. First column: average PSNR versus different ${\bf x}^{0}$, with the other parameters fixed. Remaining columns: visual results for deblurring ‘leaves’ with various ${\bf x}^{0}$.

5.3 Image deblurring and super-resolution

In this subsection, we demonstrate the effectiveness and robustness of the proposed Algorithm 2 and Algorithm 3 on image deblurring and super-resolution problems.

As discussed in Section 4.1, Algorithm 2 can be used to solve the DeTik model owing to the smoothness of $f_{1}({\bf x})=\frac{1}{2\nu^{2}}\|A{\bf x}-{\bf b}\|^{2}$, where $\nu$ is the noise level, while Algorithm 3 can be used to solve the DeBox model Eq. 48. We first determine an appropriate $\gamma$ satisfying the condition in Section 3.1 and set $\alpha=0.99\,\Lambda(\gamma)$. We consider Gaussian noise with three noise levels $\nu\in\{2.55,7.65,12.75\}/255$, i.e., $\nu\in\{0.01,0.03,0.05\}$, and two scale factors, $\times 2$ and $\times 3$. For these three noise levels, we set $\sigma\in\{1.4\nu,0.7\nu,0.6\nu\}$ and $\nu^{2}/\gamma\in\{1,0.9,0.6\}$, respectively, in Algorithm 2 for both image deblurring and super-resolution, and $\sigma\in\{2\nu,\nu,0.75\nu\}$ and $\nu^{2}/\gamma\in\{5,1.5,1\}$, respectively, in Algorithm 3 for both tasks. We test the proposed algorithms on the different tasks and compare the numerical results recovered by DeTik and DeBox.
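The per-noise-level settings listed above can be collected in a small lookup table and converted to the actual $\sigma$ and $\gamma$ for each algorithm. The dictionary layout and the function name are hypothetical, and $\nu$ is again taken on the $[0,1]$ intensity scale.

```python
from typing import Dict, Tuple

NOISE_LEVELS_255: Tuple[float, ...] = (2.55, 7.65, 12.75)

# sigma / nu and nu^2 / gamma for each noise level, as listed in the text.
SETTINGS: Dict[str, Dict[str, Tuple[float, ...]]] = {
    "Algorithm 2 (DeTik)": {"sigma_over_nu": (1.4, 0.7, 0.6), "nu2_over_gamma": (1.0, 0.9, 0.6)},
    "Algorithm 3 (DeBox)": {"sigma_over_nu": (2.0, 1.0, 0.75), "nu2_over_gamma": (5.0, 1.5, 1.0)},
}


def settings_for(algorithm: str, level: int) -> Tuple[float, float, float]:
    """Return (nu, sigma, gamma) for the given algorithm and noise-level index 0, 1, or 2."""
    nu = NOISE_LEVELS_255[level] / 255.0
    cfg = SETTINGS[algorithm]
    sigma = cfg["sigma_over_nu"][level] * nu
    gamma = nu ** 2 / cfg["nu2_over_gamma"][level]
    return nu, sigma, gamma


nu, sigma, gamma = settings_for("Algorithm 2 (DeTik)", 0)  # nu = 0.01, sigma = 0.014, gamma = 1e-4
```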

Table 1: Numerical results (PSNR(dB)) of our DeTik and DeBox for image deblurring with Ker1 and 3 noise levels on Dataset Set3C.
Noise Level 2.55 7.65 12.75
Images Butterfly Leaves Starfish Butterfly Leaves Starfish Butterfly Leaves Starfish
Degraded 17.68 16.50 21.56 17.48 16.34 21.09 17.10 16.06 20.28
DeTik 33.18 34.02 33.14 29.91 30.34 29.78 27.94 28.06 27.58
DeBox 33.62 33.80 33.53 29.75 30.20 29.59 27.90 27.96 27.57

For the image deblurring task, we test four classical datasets, i.e., Set3C, Set14, Kodak24, and Set17, with different blur kernels and noise levels. For brevity, we present the deblurring results for Ker1 with various noise levels on Set3C in Table 1; more results can be found in Appendix B. The proposed methods show competitive deblurring performance across different noise levels. In addition, Fig. 5 shows the visual results for the image ‘powerpoint2002’ in Set14 degraded by blur Ker6 and noise level $12.75$. To assess the convergence of the proposed algorithms empirically, the evolution and energy curves are plotted alongside the corresponding recovered images.
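The PSNR values in the tables follow the standard definition $\mathrm{PSNR}=10\log_{10}(\mathrm{peak}^{2}/\mathrm{MSE})$. A minimal sketch, assuming intensities in $[0,1]$ (peak 1) and flattened images; evaluation details such as border cropping or color handling are not specified here.

```python
import math
from typing import Sequence


def psnr(x: Sequence[float], ref: Sequence[float], peak: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between an estimate x and a reference ref."""
    if len(x) != len(ref) or not ref:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(x, ref)) / len(ref)
    if mse == 0.0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / mse)


# A uniform error of 0.1 on the [0, 1] scale gives MSE = 0.01, i.e. about 20 dB.
value = psnr([0.1] * 4, [0.0] * 4)
```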

For the image super-resolution task, we set the scale factor to $\times 2$ and $\times 3$. The blur kernels and noise levels used in the deblurring task are also considered. The image super-resolution results on the datasets Set5, CBSD68, and Urban100 are reported in Appendix B; the numerical results on Set5 are given in Table 2. Furthermore, the visual results for noise level $7.65$ with blur Ker8 and scale factor $\times 2$ are shown in Fig. 6. The evolution and energy curves demonstrate the convergence of the proposed approaches in practice, in agreement with our theoretical results.
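For super-resolution, $A=SB$ with the $s$-fold downsampler $S$ described earlier. Below is a sketch of $S$ and its zero-filling adjoint $S^{\top}$, which is what the gradient $\nabla f_{1}({\bf x})=\frac{1}{\nu^{2}}A^{\top}(A{\bf x}-{\bf b})$ needs; storing images as nested Python lists and the function names are assumptions, and the blur $B$ is omitted.

```python
from typing import List

Image = List[List[float]]


def downsample(x: Image, s: int) -> Image:
    """Standard s-fold downsampler S: keep the upper-left pixel of each s x s patch."""
    return [row[::s] for row in x[::s]]


def downsample_adjoint(y: Image, s: int, height: int, width: int) -> Image:
    """Adjoint S^T: put each low-res pixel at its patch's upper-left corner, zeros elsewhere."""
    out = [[0.0] * width for _ in range(height)]
    for i, row in enumerate(y):
        for j, v in enumerate(row):
            out[i * s][j * s] = v
    return out


def inner(a: Image, b: Image) -> float:
    """Frobenius inner product of two images of equal size."""
    return sum(u * v for ra, rb in zip(a, b) for u, v in zip(ra, rb))


x = [[float(4 * i + j) for j in range(4)] for i in range(4)]
y = [[1.0, 2.0], [3.0, 4.0]]
# Adjointness check: <S x, y> == <x, S^T y>.
assert inner(downsample(x, 2), y) == inner(x, downsample_adjoint(y, 2, 4, 4))
```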

Table 2: Numerical results (PSNR(dB)) of our DeTik and DeBox for image super-resolution with Ker1, 3 noise levels, and scales ×2 and ×3 on Dataset Set5.
Scales Noise Level 2.55 7.65 12.75
Images Baby Bird Butterfly Head Woman Baby Bird Butterfly Head Woman Baby Bird Butterfly Head Woman
×2 Degraded 28.82 24.73 17.75 25.52 22.73 27.61 24.23 17.64 24.93 22.41 25.90 23.36 17.43 23.94 21.82
DeTik 33.93 31.90 27.88 29.17 30.63 32.49 29.75 26.42 28.53 29.02 31.51 27.87 24.67 27.74 26.81
DeBox 34.26 31.85 27.19 29.19 30.51 32.45 29.53 26.16 28.32 28.89 31.42 27.85 24.77 27.63 26.95
×3 Degraded 28.75 24.72 17.75 25.43 22.66 27.20 24.05 17.61 24.65 22.23 25.15 22.94 17.34 23.40 21.47
DeTik 32.40 29.17 22.51 28.29 27.44 31.51 27.59 23.68 27.77 26.47 30.46 26.05 22.35 27.20 25.14
DeBox 32.53 29.09 22.91 27.67 27.06 31.54 27.44 23.37 27.69 26.45 30.63 26.15 22.44 27.15 25.20

Panels: (a) Original; (b) Observed (18.14 dB); (c) DeTik (30.04 dB); (d) DeBox (30.48 dB); (e) Evolution of (c); (f) Evolution of (d); (g) $\Theta_{\alpha,\gamma}$ value of (c); (h) $\Theta_{\alpha,\gamma}$ value of (d).

Figure 5: Deblurring results of DeTik and DeBox on an image degraded by Ker6 and noise level $12.75$, together with the evolution and the $\Theta_{\alpha,\gamma}$ value versus the number of iterations.

Figure 6: Super-resolution results of DeTik and DeBox on an image degraded with scale factor ×2, blur kernel Ker8, and noise level 7.65, together with the evolution and the $\Theta_{\alpha,\gamma}$ value versus the number of iterations. Panels: (a) Original; (b) Observed (18.03 dB); (c) DeTik (24.82 dB); (d) DeBox (24.79 dB); (e) evolution of (c); (f) evolution of (d); (g) $\Theta_{\alpha,\gamma}$ value of (c); (h) $\Theta_{\alpha,\gamma}$ value of (d).
Table 3: Comparison of average image deblurring results (PSNR (dB)) between state-of-the-art methods and our methods on the Set3C, Set14, and Set17 datasets.
Datasets Noise Level Degraded DWDN DP-IRCNN DPIR DREDDUN TVTik (Alg. 1) DeTik (Alg. 2) TVBox (ADMM) DeBox (Alg. 3)
Set3C 2.55 19.93 30.92 30.92 32.55 30.71 29.46 30.98 28.84 31.24
7.65 19.52 28.62 27.60 28.60 28.62 25.10 28.78 25.18 28.62
12.75 18.84 26.92 25.93 26.80 26.97 23.34 27.08 23.39 27.08
Set14 2.55 22.82 31.08 30.64 31.76 31.16 28.47 30.17 27.68 30.08
7.65 22.10 28.41 28.13 28.79 28.57 26.68 28.47 26.06 28.33
12.75 21.03 27.20 27.03 27.32 27.38 25.30 27.30 25.10 27.32
Set17 2.55 25.28 33.14 32.35 33.98 33.41 30.56 32.60 30.67 32.43
7.65 24.07 30.39 29.83 30.64 30.62 27.73 30.64 27.85 30.55
12.75 22.55 28.93 28.74 29.40 29.24 26.33 29.25 26.54 29.29

5.4 Comparison with state-of-the-art methods

In the preceding subsections, we verified that the proposed algorithms handle both smooth and nonsmooth objective functions. These evaluations alone, however, do not show how our method performs relative to existing approaches. In this subsection, we therefore compare it with state-of-the-art methods.

Figure 7: Image deblurring results with Ker9 and noise level 12.75. Panels: (a) Original; (b) Observed (23.02 dB); (c) DWDN (31.05 dB); (d) DP-IRCNN (31.22 dB); (e) DPIR (31.97 dB); (f) DREDDUN (31.19 dB); (g) DeTik (32.05 dB); (h) DeBox (31.61 dB). (g) is the result of the proposed Algorithm 2 for DeTik; (h) is the result of the proposed Algorithm 3 for DeBox.

5.4.1 Comparisons with advanced deblurring models

Because our method follows the plug-and-play strategy and replaces the regularization step with a denoiser, we compare it with several methods that employ the same strategy. These methods yield competitive results, but our method has a distinct theoretical advantage: it is guaranteed to converge, whereas not all of the compared methods provide such a guarantee. Specifically, we compare with the plug-and-play and unrolling models DWDN [18], DPIR [78] with IRCNN [80] (DP-IRCNN), DPIR [78] with DRUNet (DPIR), and DREDDUN [30]. The code for all compared methods was obtained either from the official published versions or directly from the authors.

Table 3 reports the average image deblurring results over 10 blur kernels for each of the 3 noise levels. Alongside our PnP-based DeTik and DeBox, we also list their TV-regularized counterparts, TVTik and TVBox. The numerical results show that DeTik and DeBox are competitive with the deep learning-based plug-and-play and unrolling methods. In contrast, the TV-based TVTik and TVBox yield noticeably lower PSNR values, which is expected because learned denoisers encode richer prior information than hand-crafted priors. Fig. 7 shows the visual results; for our methods, only the PnP-based results (DeTik and DeBox) are displayed. Although PnP-based methods tend to over-smooth, DeTik and DeBox restore finer details than the compared methods in this example.
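Tables 3 and 4 report PSNR values averaged over the blur kernels for each noise level. The following is a minimal sketch of how such averages can be computed. It assumes images scaled to [0, 1] (peak value 1) and a hypothetical array layout `psnr_table[kernel, noise_level]`. The numbers in the usage example are toy values, not results from our experiments.

```python
import numpy as np


def psnr(estimate, reference, peak=1.0):
    """Peak signal-to-noise ratio in dB, assuming intensities in [0, peak]."""
    estimate = np.asarray(estimate, dtype=float)
    reference = np.asarray(reference, dtype=float)
    mse = np.mean((estimate - reference) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)


def average_over_kernels(psnr_table):
    """Average a (num_kernels, num_noise_levels) PSNR table over the kernels.

    The layout is a hypothetical choice for illustration; the result has one
    averaged PSNR per noise level.
    """
    return np.asarray(psnr_table, dtype=float).mean(axis=0)


# Noise levels 2.55, 7.65, 12.75 on the [0, 255] scale correspond to
# standard deviations 0.01, 0.03, 0.05 on the [0, 1] scale.
noise_levels = np.array([2.55, 7.65, 12.75]) / 255.0

if __name__ == "__main__":
    ref = np.zeros((4, 4))
    est = np.full((4, 4), 0.1)              # uniform error 0.1 -> MSE = 0.01
    print(round(psnr(est, ref), 6))         # 20.0
    toy_table = np.array([[30.0, 28.0, 26.0],
                          [32.0, 30.0, 28.0]])  # 2 kernels x 3 noise levels (toy)
    print(average_over_kernels(toy_table))  # [31. 29. 27.]
```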

Table 4: Comparison of average image super-resolution results (PSNR (dB)) between state-of-the-art methods and our methods on the Set5 and Urban100 datasets.
Scales Datasets Noise Level Bicubic USRNet DP-IRCNN DPIR DREDDUN TVTik (Alg. 1) DeTik (Alg. 2) TVBox (ADMM) DeBox (Alg. 3)
×2 Set5 2.55 24.21 30.75 29.33 31.07 30.49 28.16 30.29 27.70 30.51
7.65 23.48 29.38 27.76 28.81 28.46 26.59 29.16 26.04 29.12
12.75 22.45 27.98 26.96 27.60 27.34 23.26 27.91 24.40 27.99
Urban100 2.55 19.15 25.67 25.34 25.40 25.43 21.23 24.10 21.51 23.86
7.65 18.93 24.49 23.69 24.52 23.81 19.86 24.34 20.80 23.18
12.75 18.53 22.92 22.68 23.18 22.89 19.24 23.29 19.81 22.34
×3 Set5 2.55 23.29 30.11 27.99 28.95 28.55 25.79 28.20 26.09 28.42
7.65 22.71 28.19 26.52 27.22 27.11 25.19 27.65 25.72 27.64
12.75 21.84 27.04 25.68 26.18 26.14 24.57 26.61 25.03 26.67
Urban100 2.55 18.54 24.03 22.80 23.62 23.12 21.52 23.14 20.45 21.72
7.65 18.35 22.12 21.90 22.36 21.67 20.05 21.65 19.92 21.45
12.75 18.00 20.93 20.37 20.91 20.91 19.16 20.70 19.40 20.93

5.4.2 Comparisons with advanced super-resolution models

For the image super-resolution task, we compare with USRNet [79], DPIR [78] with IRCNN [80] (DP-IRCNN), DPIR [78] with DRUNet (DPIR), and DREDDUN [30]. As in the deblurring experiments, the code for all compared methods was obtained from the official published versions or directly from the authors. For the scale factors ×2 and ×3, the degraded images were simulated by applying blur and noise in addition to downsampling; specifically, we used the 10 blur kernels and the 3 Gaussian noise levels described earlier.
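The text does not spell out the exact order of operations in the simulated degradation. A common model, and the one assumed in the sketch below, is $y = (k * x)\downarrow_s + n$: blur the image $x$ with kernel $k$ (here with circular boundary conditions via the FFT), keep every $s$-th pixel, and add white Gaussian noise with standard deviation $\sigma/255$. The function name `degrade`, the single-channel input, and the boundary handling are illustrative assumptions, not necessarily the exact setup used in the experiments.

```python
import numpy as np


def degrade(x, kernel, scale, sigma, rng):
    """Simulate y = downsample_s(k * x) + n for a single-channel image x in [0, 1].

    Assumed degradation model (illustrative, not the paper's code):
    kernel : 2-D blur kernel (assumed to sum to 1)
    scale  : integer downsampling factor (e.g. 2 or 3)
    sigma  : noise level on the [0, 255] scale (e.g. 2.55, 7.65, 12.75)
    rng    : numpy.random.Generator used for the Gaussian noise
    """
    x = np.asarray(x, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    kh, kw = kernel.shape
    # Embed the kernel in an image-sized array and move its center to (0, 0)
    # so that the FFT product implements a centered circular convolution.
    padded = np.zeros(x.shape)
    padded[:kh, :kw] = kernel
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    blurred = np.real(np.fft.ifft2(np.fft.fft2(x) * np.fft.fft2(padded)))
    # Direct downsampling: keep every `scale`-th pixel in each direction.
    low_res = blurred[::scale, ::scale]
    # Additive white Gaussian noise, sigma converted from [0, 255] to [0, 1].
    return low_res + (sigma / 255.0) * rng.standard_normal(low_res.shape)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((64, 64))
    box = np.full((5, 5), 1.0 / 25.0)  # toy uniform blur, not one of the paper's kernels
    y = degrade(image, box, scale=2, sigma=7.65, rng=rng)
    print(y.shape)  # (32, 32)
```

With `sigma=0` and the one-pixel kernel `[[1.0]]`, the function reduces to plain subsampling, which is a convenient sanity check.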

Table 4 lists the average super-resolution results of the proposed algorithms and the compared models. Our methods achieve competitive results for both scale factors. Some compared methods outperform ours in certain degradation settings (for example, USRNet on Set5 at ×3), but most of these methods lack convergence guarantees. The visual comparison in Fig. 8 shows that our methods recover finer details than the compared methods. Taken together, the convergence guarantees and the experimental evidence indicate that the proposed algorithms are well suited to image super-resolution.

Figure 8: Image super-resolution results with scale factor ×2, Ker1, and noise level 7.65. Panels: (a) Original; (b) Observed (18.56 dB); (c) USRNet (22.28 dB); (d) DP-IRCNN (22.10 dB); (e) DPIR (22.51 dB); (f) DREDDUN (21.53 dB); (g) DeTik (22.84 dB); (h) DeBox (22.62 dB). (g) is the result of the proposed Algorithm 2 for DeTik; (h) is the result of the proposed Algorithm 3 for DeBox.

6 Conclusions

This paper studied an extrapolated three-operator splitting method for a class of structured nonconvex optimization problems that minimize the sum of three functions. Our method extends the Davis-Yin splitting approach, which includes the widely used forward-backward and Douglas-Rachford splitting methods as special cases, and incorporates extrapolation to handle nonconvex problems. Convergence to a stationary point was established by leveraging the Kurdyka-Łojasiewicz property. To broaden its applicability, we embedded the proposed splitting method in the Plug-and-Play (PnP) framework with a learned denoiser. The resulting extrapolated PnP splitting methods replace the regularization step with a gradient step denoiser, and we provided theoretical guarantees for their convergence. This integration allows us to exploit the representational power of learning-based models. Finally, extensive numerical experiments on image deblurring and super-resolution demonstrated the efficiency gained from the extrapolation strategy and showed that the learning-based models with the PnP denoiser yield higher image quality than their TV-based counterparts.
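To make the notion of a gradient step denoiser concrete: following Hurault et al. [27], such a denoiser has the form $D_\sigma = \mathrm{Id} - \nabla g_\sigma$ with $g_\sigma(x) = \frac{1}{2}\|x - N_\sigma(x)\|^2$, where $N_\sigma$ is a trained network, and [28] shows that it is the proximal operator of a (possibly nonconvex) function when $\nabla g_\sigma$ is Lipschitz with constant below 1. The sketch below replaces $N_\sigma$ with a hypothetical linear map $N(x) = Wx$, so that $\nabla g$ has the closed form $(I - W)^\top (I - W)x$. It only illustrates the structure of the denoiser and is not the network used in the experiments. The finite-difference check confirms the gradient formula.

```python
import numpy as np


def make_gradient_step_denoiser(W):
    """Toy gradient-step denoiser D(x) = x - grad g(x), g(x) = 0.5 * ||x - N(x)||^2.

    W is a hypothetical linear stand-in for the network N, i.e. N(x) = W @ x.
    """
    W = np.asarray(W, dtype=float)
    M = np.eye(W.shape[0]) - W  # residual map: x - N(x) = M @ x

    def g(x):
        r = M @ x
        return 0.5 * float(r @ r)

    def grad_g(x):
        # Chain rule for 0.5 * ||M x||^2 gives M^T M x.
        return M.T @ (M @ x)

    def denoise(x):
        # One explicit gradient step of unit size on g.
        return x - grad_g(x)

    return g, grad_g, denoise


def finite_difference_grad(f, x, h=1e-6):
    """Central finite-difference approximation of the gradient of f at x."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        grad[i] = (f(x + e) - f(x - e)) / (2 * h)
    return grad


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = 0.1 * rng.standard_normal((5, 5))
    x = rng.standard_normal(5)
    g, grad_g, denoise = make_gradient_step_denoiser(W)
    print(np.allclose(grad_g(x), finite_difference_grad(g, x), atol=1e-6))  # True
```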

In future research, we will consider variants of the proposed method that incorporate line search, inexact subproblem solves, and adaptive parameter selection, in order to extend our framework to a broader range of practical problems. Further theoretical work is needed to establish convergence guarantees for splitting methods combined with other efficient PnP denoisers, such as the Bregman-based denoiser proposed in [26] for Poisson inverse problems. We also plan to investigate applications of the proposed methods to medical image processing.

Appendix A Experimental results on the effect of extrapolation

Table 5 reports the average image deblurring results over 10 blur kernels for the three test images of Set3C (butterfly, leaves, and starfish) at three noise levels, including the number of iterations (Iter.), the computational time in seconds (Time(s)), and the PSNR (dB) and SSIM of the recovered images. In most cases, the number of iterations and the running time of Algorithm 2 decrease as the extrapolation parameter $\alpha$ increases; that is, larger values of $\alpha$ speed up convergence. This speed-up does not affect the restoration quality: the PSNR and SSIM values are essentially identical for all tested values of $\alpha$.
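To illustrate where an extrapolation parameter $\alpha$ enters a Davis-Yin iteration, the sketch below adds a heavy-ball-type step $w^k = z^k + \alpha(z^k - z^{k-1})$ before a standard Davis-Yin update and applies it to a small convex toy problem, $\min_x \iota_{\{x \ge 0\}}(x) + \tau\|x\|_1 + \frac{1}{2}\|x - b\|^2$, whose solution is $\max(b - \tau, 0)$ componentwise. The placement of the extrapolation, the fixed value of $\alpha$, and the toy problem are assumptions for illustration. The sketch does not reproduce Algorithm 2, its admissible parameter bound $\Lambda(\gamma)$, or the nonconvex PnP setting.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def extrapolated_dys(b, tau, gamma=1.0, alpha=0.2, iters=200):
    """Toy extrapolated Davis-Yin iteration (illustrative, not Algorithm 2) for
        min_x  i_{x >= 0}(x) + tau * ||x||_1 + 0.5 * ||x - b||^2.

    The smooth term h(x) = 0.5 * ||x - b||^2 has a 1-Lipschitz gradient,
    so gamma = 1 lies in the usual convex step-size range (0, 2).
    """
    b = np.asarray(b, dtype=float)
    z_prev = np.zeros_like(b)
    z = np.zeros_like(b)
    for _ in range(iters):
        w = z + alpha * (z - z_prev)                         # extrapolation step
        x_g = soft_threshold(w, gamma * tau)                 # prox of gamma * tau * ||.||_1
        grad_h = x_g - b                                     # gradient of h at x_g
        x_f = np.maximum(2 * x_g - w - gamma * grad_h, 0.0)  # projection onto x >= 0
        z_prev, z = z, w + (x_f - x_g)                       # unrelaxed DYS update
    return soft_threshold(z, gamma * tau)


if __name__ == "__main__":
    b = np.array([3.0, 0.5, -2.0, 1.5, 0.9])
    print(extrapolated_dys(b, tau=1.0))  # close to max(b - 1, 0) = [2, 0, 0, 0.5, 0]
```

With `alpha=0` the loop reduces to the plain (unrelaxed) Davis-Yin iteration, which makes it easy to compare iteration counts with and without extrapolation on the same toy problem.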

Table 5: Effect of the extrapolation parameter $\alpha$ in Algorithm 2 on image deblurring with the DeTik model on the Set3C dataset at different noise levels.
$\alpha$ Image butterfly leaves starfish
Noise Level 2.55 7.65 12.75 2.55 7.65 12.75 2.55 7.65 12.75
0 Iter. 681 1001 512 436 596 428 388 396 513
Time(s) 25.97 36.79 18.87 15.54 20.92 15.61 13.56 13.75 18.30
PSNR 33.18 29.91 27.94 33.97 30.33 28.05 33.11 29.78 27.57
SSIM 0.9760 0.9569 0.9367 0.9890 0.9760 0.9617 0.9551 0.9233 0.8866
$0.25\,\Lambda(\gamma)$ Iter. 617 972 457 383 532 381 340 351 417
Time(s) 22.54 37.87 17.03 13.13 19.00 13.72 11.42 12.40 14.74
PSNR 33.18 29.91 27.94 33.97 30.33 28.05 33.11 29.78 27.57
SSIM 0.9760 0.9569 0.9367 0.9890 0.9760 0.9617 0.9551 0.9233 0.8866
$0.50\,\Lambda(\gamma)$ Iter. 550 901 403 332 467 226 297 308 396
Time(s) 20.07 36.18 15.07 11.79 16.32 12.07 10.11 10.51 13.88
PSNR 33.18 29.91 27.94 33.97 30.33 28.05 33.11 29.78 27.57
SSIM 0.9760 0.9569 0.9367 0.9890 0.9760 0.9617 0.9551 0.9233 0.8866
$0.75\,\Lambda(\gamma)$ Iter. 463 873 348 279 405 317 251 265 507
Time(s) 16.77 37.86 12.46 10.06 14.28 10.99 8.71 9.35 17.20
PSNR 33.18 29.91 27.94 33.9 30.33 28.05 33.11 29.78 27.58
SSIM 0.9760 0.9569 0.9367 0.9890 0.9760 0.9617 0.9551 0.9233 0.8866
$0.99\,\Lambda(\gamma)$ Iter. 375 861 491 225 344 26 204 223 276
Time(s) 13.63 32.35 18.41 7.66 12.10 9.21 6.69 7.69 9.31
PSNR 33.18 29.91 27.95 33.97 30.33 28.05 33.14 29.78 27.57
SSIM 0.9760 0.9569 0.9367 0.9890 0.9760 0.9617 0.9551 0.9233 0.8866

Figure 9: Average results (PSNR (dB)) of TVTik and DeTik for image deblurring with 10 different blur kernels and 3 noise levels on the Set3C, Set14, Kodak24, and Set17 datasets. Panels: Set3C, Set14, Kodak24, Set17.

Figure 10: Average results (PSNR (dB)) of TVTik and DeTik for image super-resolution with 2 scales (×2 and ×3), 10 different blur kernels, and 3 noise levels on the Set5, CBSD68, and Urban100 datasets. Panels: Set5, CBSD68, and Urban100 at ×2 (first row) and at ×3 (second row).

Figure 11: Average results (PSNR (dB)) of TVBox and DeBox for image deblurring with 10 different blur kernels and 3 noise levels on the Set3C, Set14, Kodak24, and Set17 datasets. Panels: Set3C, Set14, Kodak24, Set17.

Figure 12: Average numerical results (PSNR (dB)) of TVBox and DeBox for image super-resolution with 2 scales (×2 and ×3), 10 different blur kernels, and 3 noise levels on the Set5, CBSD68, and Urban100 datasets. Panels: Set5, CBSD68, and Urban100 at ×2 (first row) and at ×3 (second row).

Appendix B Experimental results on robustness of Algorithm 2 and Algorithm 3

To further demonstrate the effectiveness of the proposed methods, we compare the results of TVTik and DeTik in Fig. 9 (deblurring) and Fig. 10 (super-resolution), and the results of TVBox and DeBox in Fig. 11 (deblurring) and Fig. 12 (super-resolution).

We use the MATLAB function ‘boxplot’ to create the box plots. As shown in Fig. 9, each panel contains 9 boxes. The yellow, pink, and blue boxes represent the average PSNR values of the degraded images and of the images restored by TVTik and DeTik, respectively, and the first, second, and third groups of boxes correspond to the noise levels 2.55, 7.65, and 12.75, respectively. On each box, the central mark indicates the median, and the bottom and top edges of the box indicate the 25th and 75th percentiles. The whiskers extend to the most extreme data points not considered outliers, and the outliers are plotted individually as dots. The box plots show that the median PSNR of DeTik is higher than that of TVTik, and that TVTik also improves on the degraded images (yellow boxes). These results indicate that Algorithm 2 restores images effectively across the 10 blur kernels and 3 noise levels. Fig. 10 presents the super-resolution results in the same format; its first and second rows correspond to the scale factors ×2 and ×3, respectively. These results also show that the proposed algorithm effectively solves the tested models and that DeTik outperforms TVTik in recovery quality for image super-resolution.
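The box-plot elements described above can be computed from a list of PSNR values as follows. This is a minimal NumPy sketch rather than the MATLAB code used for the figures. It assumes MATLAB's default whisker length of 1.5 times the interquartile range and uses NumPy's default linear percentile interpolation, which can differ slightly from MATLAB's quartile computation for small samples.

```python
import numpy as np


def box_plot_stats(values, whisker=1.5):
    """Median, quartiles, whisker ends, and outliers of a 1-D sample.

    Points beyond q3 + whisker*IQR or below q1 - whisker*IQR are outliers;
    the whiskers end at the most extreme points that are not outliers.
    """
    v = np.sort(np.asarray(values, dtype=float))
    q1, median, q3 = np.percentile(v, [25, 50, 75])
    iqr = q3 - q1
    low_fence, high_fence = q1 - whisker * iqr, q3 + whisker * iqr
    inside = v[(v >= low_fence) & (v <= high_fence)]
    outliers = v[(v < low_fence) | (v > high_fence)]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside.min(),
        "whisker_high": inside.max(),
        "outliers": outliers,
    }


if __name__ == "__main__":
    sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # toy sample with one outlier
    stats = box_plot_stats(sample)
    print(stats["median"], stats["whisker_high"], stats["outliers"])  # 5.5 9.0 [100.]
```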

Fig. 11 shows box plots of the average restoration results on Set3C, Set14, Kodak24, and Set17 for different noise levels and blur kernels. The yellow, pink, and blue boxes denote the average PSNR of the degraded images and of the images restored by TVBox and DeBox, respectively, and the first, second, and third groups of boxes correspond to the noise levels 2.55, 7.65, and 12.75, respectively. Similarly, Fig. 12 presents the super-resolution results for the two scale factors ×2 and ×3. These results show that the proposed method delivers consistent and stable restoration performance. From Figs. 11 and 12, we see that Algorithm 3 effectively solves the DeBox model and that DeBox outperforms TVBox in recovery quality for both image deblurring and super-resolution. The experiments also show that Algorithm 3 can handle minimization problems with a nondifferentiable term.

Acknowledgement

The authors are grateful to the anonymous referees for their valuable comments, which greatly improved the quality of this paper.

References

  • [1] M. Ahookhosh, A. Themelis, and P. Patrinos, A Bregman forward-backward linesearch algorithm for nonconvex composite optimization: superlinear convergence to nonisolated local minima, SIAM Journal on Optimization, 31 (2021), pp. 653–685.
  • [2] H. Attouch, J. Bolte, P. Redont, and A. Soubeyran, Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the Kurdyka-Łojasiewicz inequality, Mathematics of Operations Research, 35 (2010), pp. 438–457.
  • [3] H. Attouch, J. Bolte, and B. F. Svaiter, Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods, Mathematical Programming, 137 (2013), pp. 91–129.
  • [4] H. Attouch, J. Peypouquet, and P. Redont, A dynamical approach to an inertial forward-backward algorithm for convex minimization, SIAM Journal on Optimization, 24 (2014), pp. 232–256.
  • [5] A. Barakat and P. Bianchi, Convergence rates of a momentum algorithm with bounded adaptive step size for nonconvex optimization, in Asian Conference on Machine Learning, PMLR, 2020, pp. 225–240.
  • [6] A. Beck and M. Teboulle, A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM Journal on Imaging Sciences, 2 (2009), pp. 183–202.
  • [7] F. Bian and X. Zhang, A three-operator splitting algorithm for nonconvex sparsity regularization, SIAM Journal on Scientific Computing, 43 (2021), pp. 2809–2839.
  • [8] J. Bolte, S. Sabach, and M. Teboulle, Proximal alternating linearized minimization for nonconvex and nonsmooth problems, Mathematical Programming, 146 (2014), pp. 459–494.
  • [9] R. I. Boţ, E. R. Csetnek, and S. C. László, An inertial forward–backward algorithm for the minimization of the sum of two nonconvex functions, EURO Journal on Computational Optimization, 4 (2016), pp. 3–25.
  • [10] G. T. Buzzard, S. H. Chan, S. Sreehari, and C. A. Bouman, Plug-and-play unplugged: Optimization-free reconstruction using consensus equilibrium, SIAM Journal on Imaging Sciences, 11 (2018), pp. 2001–2020.
  • [11] C. Castera, J. Bolte, C. Févotte, and E. Pauwels, An inertial Newton algorithm for deep learning, The Journal of Machine Learning Research, 22 (2021), pp. 5977–6007.
  • [12] C. Chen, S. Ma, and J. Yang, A general inertial proximal point algorithm for mixed variational inequality problem, SIAM Journal on Optimization, 25 (2015), pp. 2120–2142.
  • [13] R. Cohen, Y. Blau, D. Freedman, and E. Rivlin, It has potential: Gradient-driven denoisers for convergent solutions to inverse problems, Advances in Neural Information Processing Systems, 34 (2021), pp. 18152–18164.
  • [14] P. L. Combettes and J.-C. Pesquet, Fixed point strategies in data science, IEEE Transactions on Signal Processing, 69 (2021), pp. 3878–3905.
  • [15] L. Condat, D. Kitahara, A. Contreras, and A. Hirabayashi, Proximal splitting algorithms for convex optimization: A tour of recent advances, with new twists, SIAM Review, 65 (2023), pp. 375–435.
  • [16] D. Davis and W. Yin, A three-operator splitting scheme and its optimization applications, Set-Valued and Variational Analysis, 25 (2017), pp. 829–858.
  • [17] L.-J. Deng, R. Glowinski, and X.-C. Tai, A new operator splitting method for the Euler elastica model for image smoothing, SIAM Journal on Imaging Sciences, 12 (2019), pp. 1190–1230.
  • [18] J. Dong, S. Roth, and B. Schiele, Deep Wiener deconvolution: Wiener meets deep learning for image deblurring, Advances in Neural Information Processing Systems, 33 (2020), pp. 1048–1059.
  • [19] R. G. Gavaskar, C. D. Athalye, and K. N. Chaudhury, On plug-and-play regularization using linear denoisers, IEEE Transactions on Image Processing, 30 (2021), pp. 4802–4813.
  • [20] T. Goldstein and S. Osher, The split Bregman method for L1-regularized problems, SIAM Journal on Imaging Sciences, 2 (2009), pp. 323–343.
  • [21] K. Guo and D. Han, A note on the Douglas–Rachford splitting method for optimization problems involving hypoconvex functions, Journal of Global Optimization, 72 (2018), pp. 431–441.
  • [22] K. Guo, D. Han, and X. Yuan, Convergence analysis of Douglas–Rachford splitting method for “strongly+ weakly” convex programming, SIAM Journal on Numerical Analysis, 55 (2017), pp. 1549–1577.
  • [23] D. Han, A survey on some recent developments of alternating direction method of multipliers, Journal of the Operations Research Society of China, (2022), pp. 1–52.
  • [24] J. Hertrich, S. Neumayer, and G. Steidl, Convolutional proximal neural networks and plug-and-play algorithms, Linear Algebra and its Applications, 631 (2021), pp. 203–234.
  • [25] S. Hurault, A. Chambolle, A. Leclaire, and N. Papadakis, Convergent Plug-and-Play with proximal denoiser and unconstrained regularization parameter, arXiv preprint arXiv:2311.01216, (2023).
  • [26] S. Hurault, U. Kamilov, A. Leclaire, and N. Papadakis, Convergent Bregman plug-and-play image restoration for Poisson inverse problems, arXiv preprint arXiv:2306.03466, (2023).
  • [27] S. Hurault, A. Leclaire, and N. Papadakis, Gradient step denoiser for convergent plug-and-play, in International Conference on Learning Representations (ICLR’22), 2022.
  • [28] S. Hurault, A. Leclaire, and N. Papadakis, Proximal denoiser for convergent plug-and-play optimization with nonconvex regularization, in International Conference on Machine Learning, PMLR, 2022, pp. 9483–9505.
  • [29] P. Jain, P. Kar, et al., Non-convex optimization for machine learning, Foundations and Trends® in Machine Learning, 10 (2017), pp. 142–363.
  • [30] S. Kong, W. Wang, X. Feng, and X. Jia, Deep RED unfolding network for image restoration, IEEE Transactions on Image Processing, 31 (2022), pp. 852–867.
  • [31] S. G. Krantz and H. R. Parks, A primer of real analytic functions, Springer Science & Business Media, 2002.
  • [32] P. Latafat and P. Patrinos, Asymmetric forward–backward–adjoint splitting for solving monotone inclusions involving three operators, Computational Optimization and Applications, 68 (2017), pp. 57–93.
  • [33] H. Le, N. Gillis, and P. Patrinos, Inertial block proximal methods for non-convex non-smooth optimization, in International Conference on Machine Learning, PMLR, 2020, pp. 5671–5681.
  • [34] G. Li and T. K. Pong, Douglas–Rachford splitting for nonconvex optimization with application to nonconvex feasibility problems, Mathematical Programming, 159 (2016), pp. 371–401.
  • [35] J. Li, C. Huang, R. Chan, H. Feng, M. K. Ng, and T. Zeng, Spherical image inpainting with frame transformation and data-driven prior deep networks, SIAM Journal on Imaging Sciences, 16 (2023), pp. 1179–1196.
  • [36] M. Li and Z. Wu, Convergence analysis of the generalized splitting methods for a class of nonconvex optimization problems, Journal of Optimization Theory and Applications, 183 (2019), pp. 535–565.
  • [37] J. Liang, J. Fadili, and G. Peyré, A multi-step inertial forward-backward splitting method for non-convex optimization, Advances in Neural Information Processing Systems, 29 (2016).
  • [38] J. Liang, J. Fadili, and G. Peyré, Activity identification and local linear convergence of forward–backward-type methods, SIAM Journal on Optimization, 27 (2017), pp. 408–437.
  • [39] S. B. Lindstrom and B. Sims, Survey: sixty years of Douglas–Rachford, Journal of the Australian Mathematical Society, 110 (2021), pp. 333–370.
  • [40] H. Liu, X.-C. Tai, and R. Glowinski, An operator-splitting method for the Gaussian curvature regularization model with applications to surface smoothing and imaging, SIAM Journal on Scientific Computing, 44 (2022), pp. A935–A963.
  • [41] J. Liu, S. Asif, B. Wohlberg, and U. Kamilov, Recovery analysis for plug-and-play priors using the restricted eigenvalue condition, Advances in Neural Information Processing Systems, 34 (2021), pp. 5921–5933.
  • [42] Y. Liu and W. Yin, An envelope for Davis-Yin splitting and strict saddle-point avoidance, Journal of Optimization Theory and Applications, 181 (2019), pp. 567–587.
  • [43] D. A. Lorenz and T. Pock, An inertial forward-backward algorithm for monotone inclusions, Journal of Mathematical Imaging and Vision, 51 (2015), pp. 311–325.
  • [44] Y. Nesterov, Introductory lectures on convex optimization: A basic course, vol. 87, Springer Science & Business Media, 2003.
  • [45] P. Ochs, Y. Chen, T. Brox, and T. Pock, iPiano: Inertial proximal algorithm for nonconvex optimization, SIAM Journal on Imaging Sciences, 7 (2014), pp. 1388–1419.
  • [46] S. Ono, Primal-dual plug-and-play image restoration, IEEE Signal Processing Letters, 24 (2017), pp. 1108–1112.
  • [47] J.-C. Pesquet, A. Repetti, M. Terris, and Y. Wiaux, Learning maximally monotone operators for image recovery, SIAM Journal on Imaging Sciences, 14 (2021), pp. 1206–1237.
  • [48] D. N. Phan and N. Gillis, An inertial block majorization minimization framework for nonsmooth nonconvex optimization, Journal of Machine Learning Research, 24 (2023), pp. 1–41.
  • [49] T. Pock and S. Sabach, Inertial proximal alternating linearized minimization (iPALM) for nonconvex and nonsmooth problems, SIAM Journal on Imaging Sciences, 9 (2016), pp. 1756–1787.
  • [50] B. T. Polyak, Some methods of speeding up the convergence of iteration methods, USSR Computational Mathematics and Mathematical Physics, 4 (1964), pp. 1–17.
  • [51] H. Raguet, J. Fadili, and G. Peyré, A generalized forward-backward splitting, SIAM Journal on Imaging Sciences, 6 (2013), pp. 1199–1226.
  • [52] E. T. Reehorst and P. Schniter, Regularization by denoising: Clarifications and new interpretations, IEEE Transactions on Computational Imaging, 5 (2018), pp. 52–67.
  • [53] R. T. Rockafellar and R. J.-B. Wets, Variational analysis, vol. 317, Springer Science & Business Media, 2009.
  • [54] L. I. Rudin, S. Osher, and E. Fatemi, Nonlinear total variation based noise removal algorithms, Physica D: Nonlinear Phenomena, 60 (1992), pp. 259–268.
  • [55] E. Ryu, J. Liu, S. Wang, X. Chen, Z. Wang, and W. Yin, Plug-and-play methods provably converge with properly trained denoisers, in International Conference on Machine Learning, PMLR, 2019, pp. 5546–5557.
  • [56] E. K. Ryu, A. B. Taylor, C. Bergeling, and P. Giselsson, Operator splitting performance estimation: Tight contraction factors and optimal parameter selection, SIAM Journal on Optimization, 30 (2020), pp. 2251–2271.
  • [57] A. Salim, L. Condat, K. Mishchenko, and P. Richtárik, Dualize, split, randomize: Toward fast nonsmooth optimization algorithms, Journal of Optimization Theory and Applications, 195 (2022), pp. 102–130.
  • [58] S. Setzer, Operator splittings, Bregman methods and frame shrinkage in image processing, International Journal of Computer Vision, 92 (2011), pp. 265–280.
  • [59] S. Sreehari, S. V. Venkatakrishnan, B. Wohlberg, G. T. Buzzard, L. F. Drummy, J. P. Simmons, and C. A. Bouman, Plug-and-play priors for bright field electron tomography and sparse interpolation, IEEE Transactions on Computational Imaging, 2 (2016), pp. 408–423.
  • [60] Y. Sun, B. Wohlberg, and U. S. Kamilov, An online plug-and-play algorithm for regularized image reconstruction, IEEE Transactions on Computational Imaging, 5 (2019), pp. 395–408.
  • [61] Y. Sun, Z. Wu, X. Xu, B. Wohlberg, and U. S. Kamilov, Scalable plug-and-play ADMM with convergence guarantees, IEEE Transactions on Computational Imaging, 7 (2021), pp. 849–863.
  • [62] Y. Tang, M. Wen, and T. Zeng, Preconditioned three-operator splitting algorithm with applications to image restoration, Journal of Scientific Computing, 92 (2022), pp. 1–26.
  • [63] A. Themelis and P. Patrinos, Douglas–Rachford splitting and ADMM for nonconvex optimization: Tight convergence results, SIAM Journal on Optimization, 30 (2020), pp. 149–181.
  • [64] A. Themelis, L. Stella, and P. Patrinos, Forward-backward envelope for the sum of two nonconvex functions: Further properties and nonmonotone linesearch algorithms, SIAM Journal on Optimization, 28 (2018), pp. 2274–2303.
  • [65] A. Themelis, L. Stella, and P. Patrinos, Douglas–Rachford splitting and ADMM for nonconvex optimization: accelerated and newton-type linesearch algorithms, Computational Optimization and Applications, 82 (2022), pp. 395–440.
  • [66] T. Tirer and R. Giryes, Image restoration by iterative denoising and backward projections, IEEE Transactions on Image Processing, 28 (2018), pp. 1220–1234.
  • [67] S. V. Venkatakrishnan, C. A. Bouman, and B. Wohlberg, Plug-and-play priors for model based reconstruction, in 2013 IEEE Global Conference on Signal and Information Processing, IEEE, 2013, pp. 945–948.
  • [68] S. Villa, S. Salzo, L. Baldassarre, and A. Verri, Accelerated and inexact forward-backward algorithms, SIAM Journal on Optimization, 23 (2013), pp. 1607–1633.
  • [69] Q. Wang and D. Han, A generalized inertial proximal alternating linearized minimization method for nonconvex nonsmooth problems, Applied Numerical Mathematics, 189 (2023), pp. 66–87.
  • [70] K. Wei, A. Aviles-Rivero, J. Liang, Y. Fu, H. Huang, and C.-B. Schönlieb, TFPnP: Tuning-free plug-and-play proximal algorithms with applications to inverse imaging problems, The Journal of Machine Learning Research, 23 (2022), pp. 699–746.
  • [71] T. Wu, W. Wu, Y. Yang, F.-L. Fan, and T. Zeng, Retinex image enhancement based on sequential decomposition with a plug-and-play framework, IEEE Transactions on Neural Networks and Learning Systems, (2023), pp. 1–14.
  • [72] Z. Wu, C. Li, M. Li, and A. Lim, Inertial proximal gradient methods with Bregman regularization for a class of nonconvex optimization problems, Journal of Global Optimization, 79 (2021), pp. 617–644.
  • [73] Z. Wu and M. Li, General inertial proximal gradient method for a class of nonconvex nonsmooth optimization problems, Computational Optimization and Applications, 73 (2019), pp. 129–158.
  • [74] J. Yang and Y. Zhang, Alternating direction algorithms for $L_1$-problems in compressive sensing, SIAM Journal on Scientific Computing, 33 (2011), pp. 250–278.
  • [75] P. Yin, Y. Lou, Q. He, and J. Xin, Minimization of $L_1$-$L_2$ for compressed sensing, SIAM Journal on Scientific Computing, 37 (2015), pp. 536–583.
  • [76] A. Yurtsever, V. Mangalick, and S. Sra, Three operator splitting with a nonconvex loss function, in International Conference on Machine Learning, PMLR, 2021, pp. 12267–12277.
  • [77] J. Zeng, T. T.-K. Lau, S. Lin, and Y. Yao, Global convergence of block coordinate descent in deep learning, in International Conference on Machine Learning, PMLR, 2019, pp. 7313–7323.
  • [78] K. Zhang, Y. Li, W. Zuo, L. Zhang, L. Van Gool, and R. Timofte, Plug-and-play image restoration with deep denoiser prior, IEEE Transactions on Pattern Analysis and Machine Intelligence, 44 (2021), pp. 6360–6376.
  • [79] K. Zhang, L. Van Gool, and R. Timofte, Deep unfolding network for image super-resolution, in IEEE Conference on Computer Vision and Pattern Recognition, 2020, pp. 3217–3226.
  • [80] K. Zhang, W. Zuo, S. Gu, and L. Zhang, Learning deep CNN denoiser prior for image restoration, in IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 3929–3938.