
Primal-Dual Strategy (PDS) for Composite Optimization Over Directed Graphs

Sajad Zandi and Mehdi Korki. S. Zandi is with the Faculty of Engineering and IT, University of Technology Sydney, Ultimo, 2007, Australia (e-mail: sajad.zandi@student.uts.edu.au). M. Korki is with the School of Science, Computing and Engineering Technologies, Swinburne University of Technology, Hawthorn, 3122, Australia (e-mail: mkorki@swin.edu.au).
Abstract

We investigate the distributed multi-agent sharing optimization problem over a directed graph, with a composite objective consisting of smooth local functions plus a convex (possibly non-smooth) function shared by all agents. While adhering to the network connectivity structure, the goal is to minimize the sum of the smooth local functions plus the non-smooth function. The proposed primal-dual (PD) algorithm is similar to a previous algorithm [43], but it offers additional benefits. First, we study the problem over directed graphs, where agents can communicate in only one direction and the combination matrix is not symmetric. Furthermore, the combination matrix is time-varying, and the coefficient weights are produced by an adaptive approach. Under the strong convexity assumption and with adaptive, time-varying coefficient weights, we derive new upper bounds on the step-sizes, in the presence of both smooth and non-smooth terms, which guarantee linear convergence. Simulation results show the efficacy of the proposed algorithm compared with existing algorithms.

Index Terms:
Primal-Dual algorithm (PD), directed graph, adaptive coefficient weights.

I Introduction

In a decentralized approach, $N$ agents, which exchange information only with their immediate neighbors, form a multi-agent cooperative network that estimates an unknown parameter through in-network processing. The goal of this work is to modify and extend the multi-agent sharing optimization problem of [43] in a distributed setting. In recent years, there has been a surge of interest in distributed optimization, whose aim is to minimize the average/sum of $n$ convex functions by a network of $n$ nodes, where a single agent $i$ has access only to a private function $f_i$. This problem frequently arises in formation control [13] and non-autonomous power control [14]. The coupled multi-agent sharing optimization problem involves two terms [43]: the sum of smooth local functions plus a convex (possibly non-smooth) function coupling all agents in the network. The agents aim to find the solution $w_k^{\star}\in\mathbb{R}^{Q_k}$ of the following problem:

$$\min_{w_1,\ldots,w_N}\ \sum_{k=1}^{N} J_k(w_k) + \mathrm{D}(w), \qquad (1)$$

where the function $J_k(w_k):\mathbb{R}^{Q_k}\rightarrow\mathbb{R}$ is known only to agent $k$, and the non-smooth function $\mathrm{D}(w)$ is known by all agents. Note that $\mathrm{D}(w)=g\big(\sum_{k=1}^{N}C_k w_k\big):\mathbb{R}^{M}\rightarrow\mathbb{R}\cup\{+\infty\}$ is a convex (possibly non-smooth) function, where the matrix $C_k\in\mathbb{R}^{M\times Q_k}$ is known by agent $k$ only. The coupled multi-agent optimization problem (1) represents the sharing formulation, where the agents (with their various variables) are coupled through $\mathrm{D}(w)$. Applications of problem (1) include regression over distributed features [1], [2], dictionary learning over distributed models [3], clustering in graphs [4], smart grid control [5], and network utility maximization [6], to name a few.
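To make the sharing formulation (1) concrete, the following minimal Python sketch sets up an instance with quadratic local costs $J_k(w_k)=\tfrac{1}{2}\|U_k w_k-d_k\|^2$ and an $\ell_1$ coupling $\mathrm{D}(w)=\lambda\|\sum_k C_k w_k\|_1$; the data $U_k$, $d_k$, and the weight $\lambda$ are illustrative choices and are not taken from the paper.

```python
# Illustrative instance of problem (1): quadratic local costs plus an l1 coupling.
# U_k, d_k, C_k and lam are synthetic placeholders, not data from the paper.
import numpy as np

rng = np.random.default_rng(0)
N, M, Qk, lam = 5, 3, 2, 0.1              # agents, coupling dim, local dim, l1 weight

U = [rng.standard_normal((10, Qk)) for _ in range(N)]   # local data matrices
d = [rng.standard_normal(10) for _ in range(N)]         # local measurements
C = [rng.standard_normal((M, Qk)) for _ in range(N)]    # coupling matrices C_k

def global_cost(w_list):
    """Objective of (1): sum of smooth local costs plus the shared coupling term D(w)."""
    smooth = sum(0.5 * np.linalg.norm(U[k] @ w_list[k] - d[k]) ** 2 for k in range(N))
    coupled = lam * np.linalg.norm(sum(C[k] @ w_list[k] for k in range(N)), 1)
    return smooth + coupled

print(global_cost([np.zeros(Qk) for _ in range(N)]))
```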

The authors in [43] proposed a decentralized algorithm for (1) and established its linear convergence to the global solution. They derived their algorithm by reformulating (1) as an equivalent saddle-point problem that admits a decentralized solution. In this work, we use a similar technique to develop a novel decentralized algorithm over a directed graph, based on adaptive coefficient weights, and derive a new convergence rate. Weight balancing is usually not possible in real-world scenarios. We therefore aim to develop optimization and learning algorithms that are suitable for, and applicable to, directed graphs. The main challenge in dealing with directed graphs is that the doubly-stochastic weighted adjacency matrix must be replaced by either a row-stochastic or a column-stochastic matrix. Row-stochasticity of the weight matrix guarantees that all agents reach consensus, while column-stochasticity ensures optimality [7]. Compared to undirected graphs, directed graph topologies [50] cover a wider range of applications and may also result in lower communication costs and simpler topologies. Most existing methods for optimization over directed graphs combine average-consensus techniques designed for directed graphs with optimization algorithms designed for undirected graphs. For example, the subgradient-push method, described in [8] and extensively investigated in [9], combines push-sum consensus [10] with distributed gradient descent (DGD). Directed-Distributed Gradient Descent (D-DGD) [7, 12] is a linear technique over directed graphs based on surplus consensus [11] and DGD. However, due to the diminishing step-size, such DGD-based algorithms converge relatively slowly for both general convex and strongly convex functions.

I-A Related Works

Information exchange within a multi-agent cooperative network (through coupled neighbors) aims to estimate an unknown parameter. This type of cooperative learning and estimation has applications in various fields such as signal processing, optimization, and control [30], [31]. Information exchange can occur in a distributed or a centralized fashion. In the centralized method, data is collected from the whole network, processed at a fusion center, and then sent back to the agents. This method is powerful; however, it has limitations such as limited bandwidth and vulnerability of the fusion center to failure. Decentralized methods refer to fully distributed solutions. As an application, the distributed approach enables parallel processing and distributes the computational load across multiple agents. When the bandwidth around the central agent is low, the performance of centralized methods degrades; an advantage of decentralized methods is that convergence can still be fast under limited bandwidth. Furthermore, in some sensitive applications, privacy and confidentiality prevent sending information to a fusion center. In a distributed model, the agents are only allowed to share information with their immediate neighbors. This applies to social, transportation, and biological networks [30, 51].

Many studies have made decentralized optimization algorithms in large-scale networks an interesting topic. For instance, [15, 16] consider a centralized topology with one agent at the center, which may fail or violate privacy requirements. The incremental algorithm [17, 20] may be considered as an alternative, based on a directed ring topology rather than a centralized one. Furthermore, to handle a time-varying network, [21] proposes a distributed subgradient (DSG) algorithm, which is slow due to the decaying step-size required to obtain a consensual and optimal solution.

In this paper, we derive a new convergence rate for a primal-dual algorithm over a directed graph with adaptive coefficient weights.

Moreover, if $D(\mathbf{w})$ in (1) is a non-smooth regularizer shared across all the agents, a proximal decentralized algorithm whose fixed point coincides with the desired global solution can be used. On the other hand, if $D(\mathbf{w})=0$, decentralized primal methods with a constant step-size can only converge to a biased solution; to converge exactly, primal methods require a decaying step-size, which slows down the convergence rate [43]. As a result, one of the difficulties in this field is the speed of convergence, since some methods are hampered by decreasing step-sizes. To accelerate convergence, we propose a solution that exploits strong convexity and Lipschitz-continuous gradients. Some methods employ a constant step-size and converge geometrically to an error ball centered at the optimal solution [34]. The authors in [22] achieve geometric convergence with the help of symmetric weights. In [23], the authors combine inexact gradient methods with a gradient estimation technique based on dynamic average consensus, where the underlying graphs must be undirected or weight-balanced. The convergence rate of some methods is limited by a decreasing step-size, depending on whether the objective function is generally convex or strongly convex.

Recently, the authors in [24] proposed a fast distributed algorithm, called DEXTRA, in which an appropriately chosen step-size results in linear convergence when the objective functions are strongly convex with Lipschitz-continuous gradients. With a decreasing step-size, the convergence is slow; however, if the step-size lies in a prescribed interval and the objective functions are strongly convex with Lipschitz-continuous gradients, the iterates converge geometrically to the global optimal point [48]. ADD-OPT/Push-DIGing [25] improves on [24] and provides geometric convergence with a sufficiently small step-size. In the DEXTRA [24] and ADD-OPT/Push-DIGing [25] algorithms, each node needs to know its out-degree in order to build a column-stochastic weight matrix. In [26], the authors eliminate this requirement by using row-stochastic weights, since each agent locally decides the weights assigned to incoming information. Fast methods over directed graphs share one thing in common: they are all based on push-sum type techniques. This results in a nonlinear algorithm, since an auxiliary iteration is required to asymptotically learn either the right or the left eigenvector corresponding to the eigenvalue 1 [49]; it also entails additional computation and communication among the agents.

This work focuses on the case where communication between nodes is directed and time-varying. We analyze the convergence rate with a decreasing step-size for the non-smooth function. First, consider the case where $D(\mathbf{w})$ is smooth. Decentralized primal methods [32]-[35] cannot achieve an exact solution unless a decaying step-size is applied. In [36], a decentralized method based on the alternating direction method of multipliers (ADMM) is utilized to improve the convergence rate. To simplify the implementation, a gradient tracking method is applied in [37], [38]; dual-domain and accelerated dual gradient descent methods are utilized in [39]. When $D(\mathbf{w})$ is non-smooth, a proximal gradient method can be used to compute the solution, but the growing number of inner loops makes it computationally expensive [40]. If the non-smooth function is replaced by the indicator function of zero, a good convergence rate is attained [41], [42]. For the proximal decentralized algorithm, asymptotic linear convergence holds when all the functions are piecewise linear-quadratic (PLQ) [47]; with a sufficiently large number of iterations and the PLQ structure, a good convergence rate can be obtained.

I-B Contribution

In this paper, we develop a modified version of the primal-dual (PD) strategy [43], [44] over a distributed network, namely over a directed graph with adaptive coefficient weights. The purpose of this work is to establish new convergence rates over directed graphs.

The contributions of this paper are summarized as follows:

  • In contrast to the proposed methods in the literature [43], [44], which use uniform combination coefficients, the proposed PD algorithm employs adaptive combination coefficients with new step sizes.

  • To achieve both good convergence and good performance, we optimize the adaptive combination coefficients over the directed graph in order to reduce the communication load and the energy consumption of each agent.

  • Consequently, we derive new bounds on the step-sizes, which result in a lower squared error (see Theorem 1).

Simulation results also show that the proposed algorithm achieves a faster convergence rate.

The rest of the paper is organized as follows. In Section II, we present the problem formulation of the diffusion strategy, the proposed primal-dual diffusion algorithm, and the saddle-point reformulation. We also provide the implementation of the proximal adapt-then-combine (ATC) dual-diffusion algorithm and the diffusion tracking method. In Section III, we present the proposed proximal dual diffusion method, i.e., PD-dMVC, and derive the new theorem that provides the new step-sizes. In Section IV, we provide simulation results to evaluate the performance of the proposed algorithm. We conclude the paper in Section VII.

I-C Notation and Symbols

Throughout the paper, all vectors are column vectors, with the exception of the regression vector, denoted by $\mathbf{u}$, which is a row vector. $I_M$ denotes the identity matrix of size $M\times M$, and $\mathbf{1}_N$ denotes the $N\times 1$ vector with all entries equal to one. Lowercase letters denote vectors and uppercase letters denote matrices. $X\geq Y$ and $X>Y$ mean that $X-Y$ is positive semi-definite and positive definite, respectively ($X$ and $Y$ are symmetric matrices of the same dimension). $\mathrm{col}\{\cdot\}$ denotes a column vector structured by stacking its arguments on top of each other. $\|x\|_C^2=x^T C x$ for a vector $x\in\mathbb{R}^M$ and a square matrix $C\geq 0$. The subdifferential $\partial f(x)$ of a function $f$ at $x$ is the set of all subgradients. The proximal operator of a function $f(x)$ with step-size $\mu$ is $\operatorname{prox}_{\mu f}(x)=\arg\min_u f(u)+\frac{1}{2\mu}\|x-u\|^2$. The conjugate of a function $f$ is defined as $f^*(v)=\sup_x v^{\top}x-f(x)$. A differentiable function $f$ is $\delta$-smooth if $\|\nabla f(x)-\nabla f(y)\|\leq\delta\|x-y\|$ and $\nu$-strongly convex if $(x-y)^{\top}(\nabla f(x)-\nabla f(y))\geq\nu\|x-y\|^2$, for any $x$ and $y$.
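As a hedged illustration of the notation, the following sketch evaluates the proximal operator and the conjugate for $g(x)=\|x\|_1$: the prox is soft-thresholding, the conjugate is the indicator of the unit $\ell_\infty$ ball (whose prox is a clip), and Moreau's identity $\operatorname{prox}_{\mu g^*}(x)=x-\mu\operatorname{prox}_{g/\mu}(x/\mu)$ ties the two together. The numerical values are arbitrary.

```python
# Sketch of prox and conjugate for g(x) = ||x||_1 (illustrative values).
import numpy as np

def prox_l1(x, mu):
    # prox_{mu*||.||_1}(x): entrywise soft-thresholding
    return np.sign(x) * np.maximum(np.abs(x) - mu, 0.0)

def prox_conj_l1(x, mu):
    # g*(y) is the indicator of {||y||_inf <= 1}, so its prox is a clip (for any mu)
    return np.clip(x, -1.0, 1.0)

x, mu = np.array([1.5, -0.3, 0.8]), 0.5
lhs = prox_conj_l1(x, mu)
rhs = x - mu * prox_l1(x / mu, 1.0 / mu)      # Moreau identity
print(np.allclose(lhs, rhs))                  # True
```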

II Problem formulation

II-A Problem Reformulation

II-A1 Primal dual decentralized algorithms

According to several studies, adapt-then-combine (ATC) methods such as the primal-dual recursion have larger step-size stability ranges than non-ATC methods. The primal-dual framework provided the first primal-dual interpretation of ATC gradient tracking methods [ref-ref]. When the step-size is kept constant, diffusion methods solve approximations of the original optimization problem and converge to biased solutions. To converge to the exact solution, primal-based methods require a decaying step-size, which, however, slows down the convergence rate. Primal-dual methods are therefore proposed to solve the problem without bias and converge to the exact minimizer while communication is limited. Formulations can be posed with or without penalty terms; the benefit of a penalty term is that it improves the convergence rate of a decentralized algorithm when the aggregate cost is well-conditioned and strongly convex but the individual costs are not. An equivalent approach is to reformulate the problem over the network and seek the unknown parameters of interest. Indeed, minimizing a sum of local cost functions through a general algorithmic framework for decentralized optimization is an interesting topic. Furthermore, the saddle-point solution structure consists of a primal-descent step, a dual-ascent step, and a combination step: gradient descent, followed by gradient ascent, followed by a combination step.

In this section, we consider saddle-point problems that reformulate (1). A decentralized algorithm cannot be obtained by directly solving the first equivalent saddle-point problem; further reformulation is required to derive a decentralized solution. The network quantities are introduced as follows:

$$\mathcal{W}\triangleq \mathrm{col}\{\mathbf{w}_1,\mathbf{w}_2,\ldots,\mathbf{w}_N\}\in\mathbb{R}^{Q},\qquad Q\triangleq\sum_{k=1}^{N}Q_k, \qquad (2)$$
$$\mathcal{J}(\mathcal{W})\triangleq\sum_{k=1}^{N}J_k(\mathbf{w}_k), \qquad (3)$$
$$C\triangleq\left[C_1,\cdots,C_K\right]\in\mathbb{R}^{M\times Q}. \qquad (4)$$

Then, the problem (1) can be written as follows:

$$\min_{\mathcal{W}}\ \mathcal{J}(\mathcal{W})+g(C\mathcal{W}). \qquad (5)$$
Assumption 1

(Objective Function) [43]
Problem (5) has a solution $\mathcal{W}^{\star}$, and the function $\mathcal{J}:\mathbb{R}^{Q}\rightarrow\mathbb{R}$ is $\delta$-smooth and $\nu$-strongly convex with $0<\nu\leq\delta$. Moreover, the function $g:\mathbb{R}^{M}\rightarrow\mathbb{R}\cup\{+\infty\}$ is proper, lower semi-continuous, and convex, and there exists $\mathcal{W}\in\mathbb{R}^{Q}$ such that $C\mathcal{W}$ belongs to the relative interior of the domain of $g$.

Remark 1

Because the objective function in (5) is strongly convex, the global solution $\mathcal{W}^{\star}$ is unique.

Problem (5) is equivalent to the following saddle-point reformulation under strong duality and Assumption 1:

$$\min_{\mathcal{W}}\max_{y}\ \mathcal{J}(\mathcal{W})+y^{T}C\mathcal{W}-g^{*}(y), \qquad (6)$$

where $g^{*}$ is the conjugate function of $g$ and $y$ is the dual variable. Because the dual variable $y$ couples all agents and its update would require a fusion center, solving (6) directly does not lead to a decentralized algorithm; additional reformulations are required. A local copy of the dual variable $y$ at agent $k$, denoted $y_k$, is introduced to solve (6) in a decentralized manner. The following network quantities are introduced:

$$\mathcal{Y}\triangleq \mathrm{col}\{y_1,y_2,\ldots,y_N\}\in\mathbb{R}^{MN},\qquad \mathcal{G}^{*}(\mathcal{Y})\triangleq\frac{1}{N}\sum_{k=1}^{N}g^{*}(y_k),$$
$$\mathcal{C}_d\triangleq \mathrm{blkdiag}\{C_k\}_{k=1}^{N}\in\mathbb{R}^{MN\times Q}. \qquad (7)$$

We define $\mathcal{D}\in\mathbb{R}^{MN\times MN}$ as a symmetric matrix satisfying

$$\mathcal{D}\mathcal{Y}=0 \ \Leftrightarrow\ y_1=\cdots=y_N. \qquad (8)$$
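A small numerical sketch of condition (8), assuming $\mathcal{D}=(I-A)\otimes I_M$ for a primitive, symmetric, doubly-stochastic $A$ on a connected 3-agent graph (an illustrative choice, not the adaptive weights proposed later): $\mathcal{D}\mathcal{Y}=0$ holds exactly when all $y_k$ agree.

```python
# Check of (8) with D = (I - A) kron I_M for a connected 3-agent graph (illustrative A).
import numpy as np

A = np.array([[0.6, 0.2, 0.2],
              [0.2, 0.5, 0.3],
              [0.2, 0.3, 0.5]])               # symmetric, doubly stochastic, primitive
M = 2
D = np.kron(np.eye(3) - A, np.eye(M))

y_consensus = np.tile([1.0, -2.0], 3)          # y_1 = y_2 = y_3
y_disagree = np.array([1.0, 0.0, 0.0, 1.0, 1.0, 1.0])

print(np.allclose(D @ y_consensus, 0))         # True:  D*Y = 0 under consensus
print(np.allclose(D @ y_disagree, 0))          # False: D*Y != 0 otherwise
```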

The saddle-point type-2 problem is written as:

$$\min_{\mathcal{W},\mathcal{X}}\max_{\mathcal{Y}}\ \mathcal{J}(\mathcal{W})+\mathcal{Y}^{T}\mathcal{C}_d\mathcal{W}+\mathcal{Y}^{T}\mathcal{D}\mathcal{X}-\mathcal{G}^{*}(\mathcal{Y}). \qquad (9)$$

The problem formulation (9) can be solved in a decentralized manner by using an optimal point $(\mathcal{W}^{*},\mathcal{X}^{*},\mathcal{Y}^{*})$.

Lemma 1

(Optimal Point) Under Assumption 1 and (8), a primal-dual pair $(\mathcal{W}^{*},y^{*})$ and a fixed point $(\mathcal{W}^{*},\mathcal{X}^{*},\mathcal{Y}^{*})$ are optimal if, and only if, they satisfy the optimality conditions of saddle-point 1 and saddle-point 2, respectively, given below.

Saddle-Point 1: $(\mathcal{W}^{*},\mathcal{Y}^{*})$
$$-C^{T}y^{*}=\nabla\mathcal{J}(\mathcal{W}^{*}), \qquad (10)$$
$$C\mathcal{W}^{*}\in\partial g^{*}(y^{*}), \qquad (11)$$
Saddle-Point 2: $(\mathcal{W}^{*},\mathcal{X}^{*},\mathcal{Y}^{*})$
$$-\mathcal{C}_d^{T}\mathcal{Y}^{*}=\nabla\mathcal{J}(\mathcal{W}^{*}), \qquad (12)$$
$$\mathcal{D}\mathcal{Y}^{*}=0, \qquad (13)$$
$$\mathcal{C}_d\mathcal{W}^{*}+\mathcal{D}\mathcal{X}^{*}\in\partial\mathcal{G}^{*}(\mathcal{Y}^{*}). \qquad (14)$$
Lemma 2

If $(\mathcal{W}^{*},\mathcal{X}^{*},\mathcal{Y}^{*})$ fulfills the optimality conditions (12), (13), and (14), then $\mathcal{Y}^{*}=\mathbf{1}_{K}\otimes y^{*}$, where $(\mathcal{W}^{*},y^{*})$ satisfies the optimality conditions (10) and (11).

II-A2 Decentralized Strategy

Let $\mathcal{Y}_{-1}$ and $\mathcal{W}_{-1}$ take arbitrary values and $\mathcal{X}_{-1}=0$.

$$\mathcal{W}_i=\mathcal{W}_{i-1}-\mu_{\omega}\nabla\mathcal{J}(\mathcal{W}_{i-1})-\mu_{\omega}\mathcal{C}_d^{T}\mathcal{Y}_{i-1}, \qquad (15)$$
$$\mathcal{Z}_i=\mathcal{Y}_{i-1}+\mu_{y}\mathcal{C}_d\mathcal{W}_i+\mathcal{D}\mathcal{X}_{i-1}, \qquad (16)$$
$$\mathcal{X}_i=\mathcal{X}_{i-1}-\mathcal{D}\mathcal{Z}_i, \qquad (17)$$
$$\mathcal{Y}_i=\operatorname{prox}_{\mu_{y}\mathcal{G}^{*}}(\bar{\mathcal{A}}\mathcal{Z}_i), \qquad (18)$$

where $\bar{\mathcal{A}}=\bar{A}\otimes I_M$ and $\bar{A}\in\mathbb{R}^{N\times N}$ is a left-stochastic or right-stochastic combination matrix.
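The following sketch implements one pass of the stacked recursion (15)-(18) under simplifying assumptions: quadratic $\mathcal{J}$ (so its gradient is a simple function handle) and $g=\lambda\|\cdot\|_1$, for which $\operatorname{prox}_{\mu_y\mathcal{G}^*}$ reduces to an entrywise clip to $[-\lambda,\lambda]$. All matrices and step-sizes are illustrative placeholders.

```python
# One pass of (15)-(18) under simplifying assumptions (quadratic J, g = lam*||.||_1,
# so prox_{mu_y*G*} is an entrywise clip). Inputs are illustrative placeholders.
import numpy as np

def pd_iteration(W, Y, X, grad_J, C_d, D, A_bar, mu_w, mu_y, lam):
    W_new = W - mu_w * grad_J(W) - mu_w * C_d.T @ Y     # (15) primal descent
    Z = Y + mu_y * (C_d @ W_new) + D @ X                # (16) dual ascent
    X_new = X - D @ Z                                   # (17) auxiliary update
    Y_new = np.clip(A_bar @ Z, -lam, lam)               # (18) combine, then prox
    return W_new, Y_new, X_new

# tiny demo: a single block with A_bar = I (hence D = 0) and J(W) = 0.5*||W - 1||^2
rng = np.random.default_rng(1)
Q, MN = 4, 6
C_d = rng.standard_normal((MN, Q))
A_bar = np.eye(MN)
D = np.eye(MN) - A_bar
grad_J = lambda W: W - np.ones(Q)
W, Y, X = np.zeros(Q), np.zeros(MN), np.zeros(MN)
for _ in range(100):
    W, Y, X = pd_iteration(W, Y, X, grad_J, C_d, D, A_bar, 0.1, 0.1, 0.5)
print(np.round(W, 3))
```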

II-A3 Implementation of proximal ATC for dual-diffusion algorithm

By rewriting (16)-(18) and eliminating the variable $\mathcal{X}_i$, we obtain:

$$\begin{aligned}
\mathcal{Z}_i-\mathcal{Z}_{i-1}&=\mu_{y}\mathcal{C}_d(\mathcal{W}_i-\mathcal{W}_{i-1})+\mathcal{Y}_{i-1}-\mathcal{Y}_{i-2}+\mathcal{D}(\mathcal{X}_{i-1}-\mathcal{X}_{i-2})\\
&=\mu_{y}\mathcal{C}_d(\mathcal{W}_i-\mathcal{W}_{i-1})+\mathcal{Y}_{i-1}-\mathcal{Y}_{i-2}+\mathcal{D}(-\mathcal{D}\mathcal{Z}_{i-1}). \qquad (19)
\end{aligned}$$

Rearranging (19) yields

$$\mathcal{Z}_i=(I-\mathcal{D}^{2})\mathcal{Z}_{i-1}+\mathcal{Y}_{i-1}-\mathcal{Y}_{i-2}+\mu_{y}\mathcal{C}_d(\mathcal{W}_i-\mathcal{W}_{i-1}). \qquad (20)$$

The complete proximal ATC dual-diffusion recursion then becomes:

$$\mathcal{W}_i=\mathcal{W}_{i-1}-\mu_{\mathbf{w}}\nabla\mathcal{J}(\mathcal{W}_{i-1})-\mu_{\mathbf{w}}\mathcal{C}_d^{T}\mathcal{Y}_{i-1},$$
$$\mathcal{Z}_i=(I-\mathcal{D}^{2})\mathcal{Z}_{i-1}+\mathcal{Y}_{i-1}-\mathcal{Y}_{i-2}+\mu_{y}\mathcal{C}_d(\mathcal{W}_i-\mathcal{W}_{i-1}), \qquad (21)$$
$$\phi_i=\bar{\mathcal{A}}\mathcal{Z}_i, \qquad (22)$$
$$\mathcal{Y}_i=\operatorname{prox}_{\mu_{y}\mathcal{G}^{*}}(\phi_i).$$

II-A4 Diffusion Tracking Method

Let $\mathcal{D}=I-\bar{\mathcal{A}}$ and $\bar{\mathcal{A}}=\mathcal{A}$; substituting these into (21) results in:

$$\mathcal{Z}_i=\mathcal{A}(2I-\mathcal{A})\mathcal{Z}_{i-1}+\mathcal{Y}_{i-1}-\mathcal{Y}_{i-2}-\mu_{y}\mathcal{C}_d(\mathcal{W}_i-\mathcal{W}_{i-1}). \qquad (23)$$

Multiplying (23) by $\bar{\mathcal{A}}$ and using (22), we have

$$\phi_i=\bar{\mathcal{A}}\big(\phi_{i-1}+\mathcal{Y}_{i-1}-\mathcal{Y}_{i-2}-\mu_{y}\mathcal{C}_d(\mathcal{W}_i-\mathcal{W}_{i-1})\big).$$

Hence, the final expressions for the diffusion tracking method are:

$$\mathcal{W}_i=\mathcal{W}_{i-1}-\mu_{\mathbf{w}}\nabla\mathcal{J}(\mathcal{W}_{i-1})-\mu_{\mathbf{w}}\mathcal{C}_d^{T}\mathcal{Y}_{i-1}, \qquad (24)$$
$$\phi_i=\bar{\mathcal{A}}\big(\phi_{i-1}+\mathcal{Y}_{i-1}-\mathcal{Y}_{i-2}-\mu_{y}\mathcal{C}_d(\mathcal{W}_i-\mathcal{W}_{i-1})\big). \qquad (25)$$

II-A5 Adaptive Combination Matrix

We consider the following definitions and assumptions of the combination matrix:

$$A=[a_{l,k}]\in\mathbb{R}^{N\times N},\qquad \mathcal{A}=A\otimes I_M, \qquad (26)$$

where $a_{l,k}$ is zero if there is no edge between agents $k$ and $l$. The matrix $A$ is assumed to be primitive, symmetric, and left/right-stochastic, and $\bar{A}=A$ is therefore also a primitive, symmetric, left/right-stochastic matrix, with $\mathcal{D}=I-\mathcal{A}$. If the eigenvalues of $\bar{\mathcal{A}}$ belong to $(-1,1]$, many choices for $\mathcal{D}$ exist.
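As a hedged sketch of how such combination matrices can be built in practice, the snippet below forms row-stochastic and column-stochastic weights from a directed adjacency pattern using uniform weights; this is only an illustrative construction and does not implement the adaptive coefficient weights of the proposed algorithm.

```python
# Illustrative construction of row-/column-stochastic weights from a directed
# adjacency pattern (adj[l, k] = 1 if agent l communicates to agent k, self-loops kept).
# Uniform weights only; this is not the paper's adaptive weight rule.
import numpy as np

adj = np.array([[1, 1, 0],
                [0, 1, 1],
                [1, 0, 1]], dtype=float)

A_row = adj / adj.sum(axis=1, keepdims=True)    # rows sum to 1 (consensus)
A_col = adj / adj.sum(axis=0, keepdims=True)    # columns sum to 1 (optimality)
print(A_row.sum(axis=1), A_col.sum(axis=0))     # all ones
```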

Assumption 2

(Combination Matrix) The matrix $\mathcal{D}$ satisfies condition (8), and the following condition holds:

$$I-\mathcal{D}>0.$$

II-A6 Theorem (Convergence-rate)

Theorem 1

Under Assumptions 1 and 2, if $\mathcal{C}_d$ has full row rank and the step-sizes $\mu_{\omega}$ and $\mu_{y}$ are strictly positive and satisfy

$$\mu_{\omega}\leq\frac{1}{2\delta}, \qquad (27)$$
$$\mu_{y}<\frac{\nu}{2\sigma_{\max}^{2}(\mathcal{C}_d)}, \qquad (28)$$

then it holds for all $i\geq 0$ and some $C_o\geq 0$ that:

$$\|\tilde{\mathbf{w}}_i\|^{2}\leq\gamma^{i}C_o, \qquad (29)$$

where

$$\gamma=\max\left\{\frac{1-\mu_{\omega}\nu(1-\mu_{\omega}\delta)}{1-\mu_{y}\mu_{\omega}\sigma_{\max}^{2}(\mathcal{C}_d)},\ 1-\mu_{\omega}\mu_{y}\lambda_{\min}(\mathcal{C}_d\mathcal{C}_d^{T}),\ \underline{\sigma}^{2}(\mathcal{A})\|\tilde{x}_{i-1}\|^{2}\right\}<1.$$

Proof: See Appendix A.
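The bounds (27)-(28) and the contraction factor $\gamma$ can be evaluated numerically once $\delta$, $\nu$, and $\mathcal{C}_d$ are known. The sketch below uses illustrative values (not taken from the paper's experiments) and checks that the first two candidate terms of $\gamma$ are below one.

```python
# Evaluating the step-size bounds (27)-(28) and the first two candidate terms of
# gamma for illustrative values of delta, nu and C_d (not the paper's experiments).
import numpy as np

delta, nu = 4.0, 1.0
C_d = np.diag([1.0, 0.5, 2.0])                  # stand-in for the block-diagonal C_d

sigma_max = np.linalg.svd(C_d, compute_uv=False).max()
mu_w = 0.9 / (2.0 * delta)                      # strictly inside bound (27)
mu_y = 0.9 * nu / (2.0 * sigma_max ** 2)        # strictly inside bound (28)

lam_min = np.linalg.eigvalsh(C_d @ C_d.T).min()
g1 = (1 - mu_w * nu * (1 - mu_w * delta)) / (1 - mu_y * mu_w * sigma_max ** 2)
g2 = 1 - mu_w * mu_y * lam_min
print(mu_w, mu_y, max(g1, g2))                  # both candidate terms are below 1
```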

Remark 2

$\lambda_{\min}(\mathcal{C}_d\mathcal{C}_d^{T})$ is the smallest eigenvalue of $\mathcal{C}_d\mathcal{C}_d^{T}$, $\underline{\sigma}(\mathcal{A})$ denotes the smallest non-zero singular value of $\mathcal{A}$, and $\sigma_{\max}(\mathcal{A})$ is the largest singular value of $\mathcal{A}$.

Remark 3

According to Theorem 1, the PD algorithm converges fast for non-smooth $g$ if $C_k$ has full row rank.

II-A7 Data Model

  1. If we assume that the cost function $J(\mathbf{w})$ is $\delta$-smooth, then the following inequality holds:

$$\|\nabla J(\mathbf{w}(i-1))-\nabla J(w^{*})\|^{2}\leq\delta\,\tilde{\mathbf{w}}^{T}(i-1)\big(\nabla J(\mathbf{w}(i-1))-\nabla J(w^{*})\big). \qquad (30)$$
  2. Strong convexity is often assumed in adaptation and streaming-data settings to guard algorithms against ill-conditioning, as well as to guarantee a unique global minimum [45]-[46]. If $J(\mathbf{w})$ is differentiable, the strong-convexity bound is obtained as

$$\tilde{\mathbf{w}}^{T}(i-1)\big(\nabla J(\mathbf{w}(i-1))-\nabla J(w^{*})\big)\geq\nu\|\tilde{\mathbf{w}}(i-1)\|^{2}. \qquad (31)$$

     A numerical check of (30) and (31) is sketched after this list.
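The inequalities (30) and (31) can be verified numerically for a simple quadratic cost $J(\mathbf{w})=\tfrac{1}{2}\mathbf{w}^T H\mathbf{w}$, for which $\nu=\lambda_{\min}(H)$ and $\delta=\lambda_{\max}(H)$; the matrix $H$ below is an illustrative random positive-definite matrix.

```python
# Numerical check of (30)-(31) for J(w) = 0.5*w^T*H*w with an illustrative H > 0.
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((4, 4))
H = B.T @ B + np.eye(4)                         # positive definite
grad = lambda w: H @ w
eigs = np.linalg.eigvalsh(H)
nu, delta = eigs.min(), eigs.max()

w_star = np.zeros(4)                            # minimizer of J
w = rng.standard_normal(4)
wt = w - w_star                                 # \tilde{w}
g_diff = grad(w) - grad(w_star)

print(g_diff @ g_diff <= delta * (wt @ g_diff) + 1e-12)   # (30) delta-smoothness
print(wt @ g_diff >= nu * (wt @ wt) - 1e-12)              # (31) strong convexity
```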
Table I: Convergence Properties of Distributed Algorithms

Algorithm | Supports prox. operators | Rate (convex) | Step-size (upper bound)
EXTRA [22] | Yes | $O(1/K)$ | $O\big(\upsilon(1-\rho)/L^{2}\big)$
Aug-DGM [27] | Yes | converges | $\min\big\{\frac{(1-\rho)^{2}}{10L\rho\sqrt{n}\sqrt{\kappa}},\frac{1}{2L}\big\}$
DIGing [38] | Yes | -- | $O\big(\frac{(1-\rho)^{2}}{\upsilon\kappa^{1.5}\sqrt{n}}\big)$
NIDS [28] | Yes | $O(1/K)$ | $2/L$
Exact Diffusion [29] | Yes | $\kappa^{2}/(1-\rho)$ | $O(\upsilon/L^{2})$
Primal-Dual | Yes | $O(1/K)$ | $O(1/K)$
Primal-Dual Diffusion | Yes | $O(1/K)$ | $O(1/K)$

III Proximal Decentralized Algorithm

To manage the non-smooth term $\mathrm{D}(\mathbf{w})$, it is defined as

$$\mathrm{D}(\mathbf{w})\triangleq\frac{1}{N}\sum_{k=1}^{N}\mathrm{D}(\mathbf{w}_k). \qquad (32)$$

III-A Proximal Dual Diffusion Method

The following recursion is suggested based on (39), for $i=0,1,\ldots$ and each agent $k$, with $\bar{\mathcal{A}}=\mathcal{A}$ and $\mathcal{D}=(I-\mathcal{A})$:

$$\mathbf{w}_{k,i}=\mathbf{w}_{k,i-1}-\mu_{\mathbf{w}}\nabla J_k(\mathbf{w}_{k,i-1})-\mu_{\mathbf{w}}C_k^{T}y_{k,i-1}, \qquad (33)$$
$$\psi_{k,i}=y_{k,i-1}+\mu_{y}C_k\mathbf{w}_{k,i}, \qquad (34)$$
$$z_{k,i}=\phi_{k,i-1}+\psi_{k,i}-\psi_{k,i-1}, \qquad (35)$$
$$\phi_{k,i}=\sum_{l\in\mathcal{N}_k}\bar{a}_{l,k}z_{l,i}, \qquad (36)$$
$$y_{k,i}=\operatorname{prox}_{(\mu_{y}/K)g^{*}}(\phi_{k,i}). \qquad (37)$$
Algorithm 1: Proximal-Dual diffusion with adaptive coefficient weights (main algorithm)

Input: $C_k$, $\kappa$.
Initialize: choose $\phi_k(-1)=\psi_k(-1)=\theta_k(-1)=0$ for all $k=1,2,\ldots,N$.
1:  for $i \ge 0$ do
2:    set $\mathbf{w}_k(i) = \mathbf{w}_k(i-1) - \mu_w \nabla J_k(\mathbf{w}_k(i-1)) - \mu_w C_k^T y_k(i-1)$
3:    set $\psi_k(i) = y_k(i-1) + \mu_y C_k \mathbf{w}_k(i)$
4:    set $z_k(i) = \phi_k(i-1) + \psi_k(i) - \psi_k(i-1)$
5:    set $\phi_k(i) = \sum_{\ell\in\mathcal{N}_k} a_{\ell,k}\, z_\ell(i)$
6:    set $y_k(i) = \mathrm{Prox}_{\mu_y/K\, g^*}(\phi_k(i))$
      Proximal computation:
7:    set $y_k(i) = \mathrm{Prox}_{\mu_y/K\, g^*}(\phi_k(i))$
8:    set $\varpi = |\phi_k(i)| - \mu_y/K$
9:    set $\varpi' = \max(\varpi,\kappa)\exp(-\varpi)$
10:   if $\|\phi_k(i)\|_2 < \varpi'$ then
11:      set $\theta_1 = F_k(i)\sqrt{\|\epsilon_k(i)\|_2}$
12:      set $\theta_2 = \|\phi_k(i)\|_2$
13:      set $\theta_k(i) = \frac{\theta_1}{1+\theta_2}\,\mathrm{sign}(\phi_k(i))$
14:   else
15:      set $\theta_k(i) = \varpi'\,\mathrm{sign}(\phi_k(i))$
16:   end if
17: end for
Table II: Performance comparison of algorithms for directed and undirected graphs

Algorithm               | Total time (seconds) | Number of iterations
EXTRA [22]              | 19.80                | 500
Aug-DGM [27]            | 18.58                | 500
DIGing [38]             | 18.45                | 500
NIDS [28]               | 18.31                | 500
Exact Diffusion [29]    | 25.62                | 500
Primal-Dual             | 18.01                | 500
Primal-Dual Diffusion   | 17.15                | 500

The desired variance can be estimated iteratively, with the factor parameter estimated by agent $\ell$ by running the following smoothing filter:

\gamma_{\ell,k}(i) = (1-\zeta)\,\gamma_{\ell,k}(i-1) + \zeta\,\|\mathbf{w}_\ell(i-1)\|^2 \qquad (38)

where $\mathbf{w}_\ell(i-1)$, which is needed to run the recursion at agent $k$, is computed by agent $\ell$. To remove the need to transmit $\gamma_{\ell,k}(i)$ from agent $\ell$ to its neighbors, the estimation of $\gamma_{\ell,k}(i)$ is carried out inside the neighbors of agent $\ell$. This avoids sharing the value and overcomes the difficulty that agent $k$ cannot access $\mathbf{w}_\ell(i-1)$ in the ATC diffusion implementation. Instead of receiving $\mathbf{w}_\ell(i-1)$ from its neighbor $\ell$, agent $k$ replaces it with its own iterate $\mathbf{w}_k(i-1)$, since for $i\gg 1$ the iterates at the various agents approach the same optimal point. With this substitution, agent $k$ can estimate the variance $\gamma_\ell^2(i)$ of its neighbor locally by running a smoothing filter of the following form:

\gamma_{\ell,k}(i) = (1-\zeta)\,\gamma_{\ell,k}(i-1) + \zeta\,\|\mathbf{w}_k(i-1)\|^2 \qquad (39)

The above expression provides part of the adaptive construction of the combination weights $a_{l,k}$; in addition, the adaptive weights act as combination factors when evaluating $\mathbf{w}_k(i)$.
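A minimal sketch of the local filter (39), as agent $k$ would run it for one of its neighbors (the function name and scalar return type are ours):

```python
import numpy as np

def smooth_variance(gamma_prev, w_k_prev, zeta):
    # Smoothing filter (39): agent k tracks the variance proxy of a neighbor
    # locally, using its own iterate w_k(i-1) in place of w_l(i-1), so the
    # quantity never has to be transmitted over the network.
    return (1.0 - zeta) * gamma_prev + zeta * float(np.linalg.norm(w_k_prev) ** 2)
```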

The proposed Proximal-Dual diffusion algorithm along with its adaptive coefficient weights are described in Algorithm 1 and Algorithm 2, respectively.

Algorithm 2: Adaptive coefficient weights

Input: $\zeta$.
Initialize: choose $\gamma_{\ell,k}(-1)=0$ for all $k=1,2,\ldots,N$ and $\ell\in\mathcal{N}_k$.
1: for $i \ge 0$ do
2:   set $\chi_{\ell,k}^{2}(i) = (1-\zeta)\chi_{\ell,k}^{2}(i-1) + \zeta \max\Big\{\dfrac{1-\mu_{\omega}\nu(1-\mu_{\omega}\delta)}{1-\mu_{y}\mu_{\omega}\sigma_{\max}^{2}(\mathcal{C}_{d})},\; 1-\mu_{\omega}\mu_{y}\lambda_{\min}(\mathcal{C}_{d}\mathcal{C}_{d}^{T}),\; \underline{\sigma}^{2}(\mathcal{A})\|\tilde{x}_{i-1}\|^{2}\Big\} \times \dfrac{\|\tilde{\mathbf{w}}_{k}(-1)\|_{I-\mu_{y}\mu_{\omega}\mathcal{C}_{d}^{T}\mathcal{C}_{d}}^{2} + a_{m}\|\tilde{\mathbf{y}}_{k}(-1)\|_{I-\mu_{y}\mu_{\omega}\mathcal{C}_{d}\mathcal{C}_{d}^{T}}^{2} + a_{m}\|\tilde{\mathbf{x}}_{k}(-1)\|^{2}}{1-\mu_{y}\mu_{\omega}\sigma_{\max}^{2}(\mathcal{C}_{d})}, \quad \ell\in\mathcal{N}_{k}$
3:   set $a_{\ell,k}(i) = \dfrac{\exp(\chi_{\ell,k}^{-2}(i))}{\sum_{j\in\mathcal{N}_{k}}\exp(\chi_{j,k}^{-2}(i))}, \quad \ell\in\mathcal{N}_{k}$
4: end for
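The normalization in step 3 is a softmax over the inverse smoothed quantities, so neighbors with smaller $\chi_{\ell,k}^{2}(i)$ receive larger weights and the weights sum to one. A small sketch of this step (illustrative only, with no numerical safeguards for very small $\chi^2$ values):

```python
import numpy as np

def adaptive_weights(chi_sq):
    # Step 3 of Algorithm 2: given the smoothed quantities chi^2_{l,k}(i) for the
    # in-neighbors l of agent k, form the combination weights a_{l,k}(i) by
    # normalizing exp(chi^{-2}) so that the weights sum to one.
    chi_sq = np.asarray(chi_sq, dtype=float)
    scores = np.exp(1.0 / chi_sq)
    return scores / scores.sum()
```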

IV Fast Convergence

If $\mathcal{X}_{-1}=0$, then $\mathcal{X}_1=-\mathcal{D}\mathcal{Z}_1$ belongs to the range space of $\mathcal{D}$. As a result, $\{\mathcal{X}_i\}_{i\ge 0}$ always remains in the range space of $\mathcal{D}$. The iterates $(\mathcal{W}_i,\mathcal{Y}_i,\mathcal{Z}_i)$ converge to the point $(\mathcal{W}^*,\mathcal{Y}^*,\mathcal{Z}^*)$. The error quantities are defined as follows:

\tilde{\mathcal{W}}_i \triangleq \mathcal{W}_i - \mathcal{W}^* \qquad (40)
\tilde{\mathcal{Y}}_i \triangleq \mathcal{Y}_i - \mathcal{Y}^* \qquad (41)
\tilde{\mathcal{Z}}_i \triangleq \mathcal{Z}_i - \mathcal{Z}^* \qquad (42)
\tilde{\mathcal{X}}_i \triangleq \mathcal{X}_i - \mathcal{X}^* \qquad (43)

The expressions in (33)-(37) produce the following error recursions:

\tilde{\mathcal{W}}_i = \tilde{\mathcal{W}}_{i-1} - \mu_\omega F_k(i)\big(\nabla\mathcal{J}(\mathcal{W}_{i-1}) - \nabla\mathcal{J}(\mathcal{W}^*)\big) - \mu_\omega \mathcal{C}_d^T \tilde{\mathcal{Y}}_{i-1} \qquad (44)
\tilde{\mathcal{Z}}_i = \tilde{\mathcal{Y}}_{i-1} + \mu_y \mathcal{C}_d \tilde{\mathcal{W}}_i + \mathcal{D}\tilde{\mathcal{X}}_{i-1} \qquad (45)
\tilde{\mathcal{X}}_i = \tilde{\mathcal{X}}_{i-1} - \mathcal{D}\tilde{\mathcal{Z}}_i \qquad (46)
\tilde{\mathcal{Y}}_i = \mathrm{Prox}_{\mu_y\mathcal{G}^*}(\bar{\mathcal{A}}\mathcal{Z}_i) - \mathrm{Prox}_{\mu_y\mathcal{G}^*}(\bar{\mathcal{A}}\mathcal{Z}^*) \qquad (47)

The network mean-square deviation is:

\mathrm{MSD} = \lim_{i\to\infty} \frac{1}{N}\,\mathrm{Tr}\big(E\{\tilde{\mathcal{Y}}_i \tilde{\mathcal{Y}}_i^T\}\big) \qquad (48)
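In simulation, (48) can be approximated by averaging over independent runs, since the trace of the outer product reduces to a squared Frobenius norm. A sketch under that assumption (the array layout is ours):

```python
import numpy as np

def empirical_msd(Y_err_runs):
    # Monte-Carlo estimate of the network MSD in (48): Y_err_runs has shape
    # (runs, N, M) and stacks the steady-state errors tilde{y}_k of the N agents
    # over independent runs; Tr(Y Y^T) is computed as a squared Frobenius norm.
    runs, N, _ = Y_err_runs.shape
    return float(np.sum(Y_err_runs ** 2)) / (runs * N)
```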

V Convergence properties and performance comparison of algorithms

Table I compares the convergence properties of distributed algorithms, in terms of support for prox operators, convergence rate, and upper bound on the step size, for the proposed primal-dual diffusion (PDD) and primal-dual (PD) algorithms against the NIDS, EXTRA, DIGing-ATC, Aug-DGM, and Exact Diffusion algorithms. As can be seen, the proposed primal-dual diffusion algorithm has the same upper bound on the step size as the primal-dual algorithm, while the convergence rate of both PDD and PD is higher than those of NIDS, EXTRA, DIGing-ATC, Aug-DGM, and Exact Diffusion. As Table I also shows, PDD and PD require fewer parameters than the DIGing, EXTRA, and Aug-DGM algorithms, so their upper bounds are simpler. Table II compares the performance of the algorithms for directed and undirected graphs in terms of total time and number of iterations. All algorithms were run for the same number of iterations; as shown, the total time differs across algorithms, and the PDD algorithm outperforms the others.

VI Simulation Results

For distributed parameter estimation, simulations for a network with N = 20 nodes are presented. Figure 1 (a) depicts the directed network topology used in all simulations. The Metropolis rule is used to compute the combination coefficients.

Figure 1: (a) Network topology; (b) network squared error curve of the Primal-dual algorithm without the non-smooth function.

VI-A Decentralized compressed sensing

Consider a decentralized compressed sensing problem over a network of agents. Each agent $i\in\{1,\ldots,n\}$ has its own measurement $y_i = M_i x + e_i$, where $x\in\mathbb{R}^p$ is an unknown sparse signal and $e_i\in\mathbb{R}^{m_i}$ is an i.i.d. Gaussian noise vector. We consider $\lambda\|x\|$ as the non-smooth function and, to simplify the problem, set $\lambda=1$. To estimate the sparse vector $x$, we use the decentralized algorithm.
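A short sketch of how such synthetic data can be generated for this setting; the sizes, sparsity level, and noise standard deviation below are illustrative defaults, not the paper's simulation settings:

```python
import numpy as np

def make_cs_data(n_agents=20, p=50, m_i=10, sparsity=5, noise_std=0.1, seed=0):
    # Decentralized compressed sensing data: a common sparse signal x and,
    # for each agent i, a measurement y_i = M_i x + e_i with Gaussian noise e_i.
    rng = np.random.default_rng(seed)
    x = np.zeros(p)
    support = rng.choice(p, size=sparsity, replace=False)
    x[support] = rng.standard_normal(sparsity)            # sparse ground truth
    M = [rng.standard_normal((m_i, p)) for _ in range(n_agents)]
    y = [Mi @ x + noise_std * rng.standard_normal(m_i) for Mi in M]
    return x, M, y
```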

VI-A1 The case without non-smooth function

Consider the decentralized problem of solving for an unknown signal $x\in\mathbb{R}^p$ in the absence of a non-smooth function.

VI-A2 The case with nonsmooth function

A non-smooth function was considered to solve a decentralized compressed sensing problem.
The comparison of the simulation results of the proposed primal-dual diffusion (PDD) and primal-dual (PD) algorithms with those of the NIDS-adaptive, NIDS, EXTRA, and DIGing-ATC algorithms is shown in Fig. 2 (a). Note that in this scenario the non-smooth function is ignored. It can be seen that the proposed PDD outperforms the other algorithms in terms of squared error.

In Fig. 2 (b), we compare the performance of the proposed PDD algorithm with those of the EXTRA and PD algorithms in the presence of a non-smooth function. As can be seen, the proposed PDD outperforms the EXTRA and PD algorithms.

Figure 2: Network squared error curves of the Primal-dual algorithm: (a) without the non-smooth function; (b) with the non-smooth function.

VII Conclusion

We have studied the distributed multi-agent sharing optimization problem over a directed graph, with a composite objective function consisting of a smooth function plus a non-smooth function shared by all agents in the network. To solve the problem, several reformulations were applied. A new upper bound on the step sizes is obtained, from which linear convergence can be achieved under the strong convexity assumption. Further, we proposed adaptive coefficient weights that are time-varying. We showed that the proposed algorithm achieves linear convergence to the global minimizer with the adaptive coefficient weights and the new bound on the step sizes.

Appendix A Proof of Theorem 1

To solve (44)-(47), we first define the per-agent error quantities and then square both sides of the recursions to obtain the optimal step sizes of the PD algorithm and the convergence-rate equation:

\tilde{\mathbf{w}}_k(i) \triangleq \mathbf{w}_k(i) - w^* \qquad (49)
\tilde{\mathbf{y}}_k(i) \triangleq \mathbf{y}_k(i) - y^* \qquad (50)
\tilde{\mathbf{z}}_k(i) \triangleq \mathbf{z}_k(i) - z^* \qquad (51)
\tilde{\mathbf{x}}_k(i) \triangleq \mathbf{x}_k(i) - x^* \qquad (52)

Based on the per-agent formulation, the equations are rewritten as follows:

\mathbf{w}_k(i) = \mathbf{w}_k(i-1) - \mu_\omega \nabla J(\mathbf{w}(i-1)) - \mu_\omega \mathcal{C}_d^T \mathbf{y}_k(i-1) \qquad (53)
\tilde{\mathbf{w}}_k(i) = \tilde{\mathbf{w}}_k(i-1) - \mu_\omega\{\nabla J(\mathbf{w}(i-1)) - \nabla J(w^*)\} - \mu_\omega \mathcal{C}_d^T \tilde{\mathbf{y}}_k(i-1) \qquad (54)
\mathbf{z}_k(i) = \mathbf{y}_k(i-1) + \mu_y \mathcal{C}_d \mathbf{w}_k(i) + \mathcal{D}\mathbf{x}_{i-1} \qquad (55)
\tilde{\mathbf{z}}_k(i) = \tilde{\mathbf{y}}_k(i-1) + \mu_y \mathcal{C}_d \tilde{\mathbf{w}}_k(i) + \mathcal{D}\tilde{\mathbf{x}}_k(i-1) \;\overset{\mathcal{D}=I-A}{\Longrightarrow} \qquad (56)
\tilde{\mathbf{z}}_k(i) = \tilde{\mathbf{y}}_k(i-1) + \mu_y \mathcal{C}_d \tilde{\mathbf{w}}_k(i) + (I-A)\tilde{\mathbf{x}}_k(i-1) \qquad (57)
\mathbf{x}_k(i) = \mathbf{x}_k(i-1) - \mathcal{D}\mathbf{z}_k(i) \;\overset{\mathcal{D}=I-A}{\Longrightarrow} \qquad (58)
\tilde{\mathbf{x}}_k(i) = \tilde{\mathbf{x}}_k(i-1) - (I-A)\tilde{\mathbf{z}}_k(i) \qquad (59)
\mathbf{y}_k(i) = \mathrm{Prox}(\bar{A}\mathbf{z}_k(i)) \qquad (60)
\tilde{\mathbf{y}}_k(i) = \mathrm{Prox}(\bar{A}\mathbf{z}_k(i)) - \mathrm{Prox}(\bar{A}\mathbf{z}^*) \;\overset{\bar{A}=A}{\Longrightarrow} \qquad (61)
\tilde{\mathbf{y}}_k(i) = \mathrm{Prox}(A\mathbf{z}_k(i)) - \mathrm{Prox}(A\mathbf{z}^*) \qquad (62)

Now, squaring (54), (57), (59), and (62) gives

\|\tilde{\mathbf{w}}_k(i)\|^2 = \|\tilde{\mathbf{w}}_k(i-1) - \mu_\omega\{\nabla J(\mathbf{w}_k(i-1)) - \nabla J(w^*)\}\|^2 - 2\mu_\omega \tilde{\mathbf{y}}_k^T(i-1)\mathcal{C}_d\big(\tilde{\mathbf{w}}_k(i-1) - \mu_\omega\{\nabla J(\mathbf{w}_k(i-1)) - \nabla J(w^*)\}\big) \qquad (63)

\|\tilde{\mathbf{z}}_k(i)\|^2 = \|\tilde{\mathbf{y}}_k(i-1) + \mu_y\mathcal{C}_d\tilde{\mathbf{w}}_k(i)\|^2 + \|(I-A)\tilde{\mathbf{x}}_k(i-1)\|^2 + 2\tilde{\mathbf{x}}_k^T(i-1)(I-A)\big(\tilde{\mathbf{y}}_k(i-1) + \mu_y\mathcal{C}_d\tilde{\mathbf{w}}_k(i)\big) \qquad (64)

\|\tilde{\mathbf{x}}_k(i)\|^2 = \|\tilde{\mathbf{x}}_k(i-1)\|^2 + \|(I-A)\tilde{\mathbf{z}}_k(i)\|^2 - 2\tilde{\mathbf{x}}_k^T(i-1)(I-A)\tilde{\mathbf{z}}_k(i)
\overset{(57)}{=} \|\tilde{\mathbf{x}}_k(i-1)\|^2 + \|(I-A)\tilde{\mathbf{z}}_k(i)\|^2 - 2\tilde{\mathbf{x}}_k^T(i-1)(I-A)\big(\tilde{\mathbf{y}}_k(i-1) + \mu_y\mathcal{C}_d\tilde{\mathbf{w}}_k(i) + (I-A)\tilde{\mathbf{x}}_k(i-1)\big) \qquad (65)
= \|\tilde{\mathbf{x}}_k(i-1)\|^2 + \|(I-A)\tilde{\mathbf{z}}_k(i)\|^2 - 2\tilde{\mathbf{x}}_k^T(i-1)(I-A)\big(\tilde{\mathbf{y}}_k(i-1) + \mu_y\mathcal{C}_d\tilde{\mathbf{w}}_k(i)\big) - 2\|(I-A)\tilde{\mathbf{x}}_k(i-1)\|^2 \qquad (66)

Adding $2\tilde{\mathbf{x}}_k^T(i-1)(I-A)\big(\tilde{\mathbf{y}}_k(i-1)+\mu_y\mathcal{C}_d\tilde{\mathbf{w}}_k(i)\big)$ and $-\|\tilde{\mathbf{x}}_k(i)\|^2$ to both sides of (66) gives:

2\tilde{\mathbf{x}}_k^T(i-1)(I-A)\big(\tilde{\mathbf{y}}_k(i-1)+\mu_y\mathcal{C}_d\tilde{\mathbf{w}}_k(i)\big) - \|\tilde{\mathbf{x}}_k(i)\|^2 + \|\tilde{\mathbf{x}}_k(i)\|^2
= \|\tilde{\mathbf{x}}_k(i-1)\|^2 + \|(I-A)\tilde{\mathbf{z}}_k(i)\|^2 - 2\tilde{\mathbf{x}}_k^T(i-1)(I-A)\big(\tilde{\mathbf{y}}_k(i-1)+\mu_y\mathcal{C}_d\tilde{\mathbf{w}}_k(i)\big) - 2\|(I-A)\tilde{\mathbf{x}}_k(i-1)\|^2 + 2\tilde{\mathbf{x}}_k^T(i-1)(I-A)\big(\tilde{\mathbf{y}}_k(i-1)+\mu_y\mathcal{C}_d\tilde{\mathbf{w}}_k(i)\big) - \|\tilde{\mathbf{x}}_k(i)\|^2 \qquad (67)

2\tilde{\mathbf{x}}_k^T(i-1)(I-A)\big(\tilde{\mathbf{y}}_k(i-1)+\mu_y\mathcal{C}_d\tilde{\mathbf{w}}_k(i)\big)
= \|\tilde{\mathbf{x}}_k(i-1)\|^2 + \|(I-A)\tilde{\mathbf{z}}_k(i)\|^2 - 2\|(I-A)\tilde{\mathbf{x}}_k(i-1)\|^2 - \|\tilde{\mathbf{x}}_k(i)\|^2 \qquad (68)

Then, substituting (68) into (64) yields

\|\tilde{\mathbf{z}}_k(i)\|^2 = \|\tilde{\mathbf{y}}_k(i-1) + \mu_y\mathcal{C}_d\tilde{\mathbf{w}}_k(i)\|^2 + \|(I-A)\tilde{\mathbf{x}}_k(i-1)\|^2 + \|\tilde{\mathbf{x}}_k(i-1)\|^2 + \|(I-A)\tilde{\mathbf{z}}_k(i)\|^2 - 2\|(I-A)\tilde{\mathbf{x}}_k(i-1)\|^2 - \|\tilde{\mathbf{x}}_k(i)\|^2
= \|\tilde{\mathbf{y}}_k(i-1) + \mu_y\mathcal{C}_d\tilde{\mathbf{w}}_k(i)\|^2 + \|\tilde{\mathbf{x}}_k(i-1)\|^2 - \|(I-A)\tilde{\mathbf{x}}_k(i-1)\|^2 + \|(I-A)\tilde{\mathbf{z}}_k(i)\|^2 - \|\tilde{\mathbf{x}}_k(i)\|^2 \qquad (69)

Subtracting $\|(I-A)\tilde{\mathbf{z}}_k(i)\|^2$ from both sides of (69) then yields

\|\tilde{\mathbf{z}}_k(i)\|_{A^2}^2 = \underbrace{\|\tilde{\mathbf{y}}_k(i-1) + \mu_y\mathcal{C}_d\tilde{\mathbf{w}}_k(i)\|^2}_{D'} - \|\tilde{\mathbf{x}}_k(i)\|^2 + \|\tilde{\mathbf{x}}_k(i-1)\|_{A^2}^2 \qquad (70)

Expanding the term $D'$:

\begin{align}
\|\tilde{{\mathbf{y}}}_k(i-1)+\mu_y\mathcal{C}_d\tilde{{\mathbf{w}}}_k(i)\|^2 &= \|\tilde{{\mathbf{y}}}_k(i-1)\|^2 + \|\mu_y\mathcal{C}_d\tilde{{\mathbf{w}}}_k(i)\|^2 \nonumber\\
&\quad + 2\tilde{{\mathbf{y}}}_k(i-1)^T\mu_y\mathcal{C}_d\tilde{{\mathbf{w}}}_k(i) \nonumber\\
&\overset{(54)}{=} \|\tilde{{\mathbf{y}}}_k(i-1)\|^2 + \|\mu_y\mathcal{C}_d\tilde{{\mathbf{w}}}_k(i)\|^2 \nonumber\\
&\quad + 2\tilde{{\mathbf{y}}}_k(i-1)^T\mu_y\mathcal{C}_d\Big(\tilde{{\mathbf{w}}}_k(i-1) - \mu_\omega\{\nabla J({\mathbf{w}}(i-1)) - \nabla J(w^*)\} \nonumber\\
&\quad - \mu_\omega\mathcal{C}_d^T\tilde{{\mathbf{y}}}_k(i-1)\Big) \tag{71}
\end{align}
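The first equality in (71) is the standard expansion of a squared Euclidean norm, $\|a+b\|^{2}=\|a\|^{2}+\|b\|^{2}+2a^{T}b$. A minimal numerical sketch, with random stand-ins assumed for $\tilde{{\mathbf{y}}}_{k}(i-1)$, $\mathcal{C}_{d}$, $\tilde{{\mathbf{w}}}_{k}(i)$ and the step-size $\mu_{y}$, is given below.

\begin{verbatim}
import numpy as np

# Minimal sketch of the expansion used in (71):
# ||y + mu_y*C_d*w||^2 = ||y||^2 + ||mu_y*C_d*w||^2 + 2*y^T(mu_y*C_d*w).
# Dimensions, step-size and random data are illustrative assumptions.
rng = np.random.default_rng(0)
n, m = 5, 4
y = rng.standard_normal(n)           # stands in for y~_k(i-1)
w = rng.standard_normal(m)           # stands in for w~_k(i)
C_d = rng.standard_normal((n, m))    # stands in for the coupling matrix C_d
mu_y = 0.1

lhs = np.linalg.norm(y + mu_y * C_d @ w) ** 2
rhs = (np.linalg.norm(y) ** 2
       + np.linalg.norm(mu_y * C_d @ w) ** 2
       + 2 * y @ (mu_y * C_d @ w))
assert np.isclose(lhs, rhs)          # both sides agree up to round-off
\end{verbatim}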

Similar to (68), we have,

\begin{align}
2\tilde{{\mathbf{y}}}_k^T(i-1)\mathcal{C}_d\big(&\tilde{{\mathbf{w}}}_k(i-1)-\mu_\omega(\nabla J({\mathbf{w}}_k(i-1))-\nabla J(w^*))\big) \nonumber\\
&= \frac{1}{\mu_\omega}\Big(\|\tilde{{\mathbf{w}}}_k(i-1)-\mu_\omega(\nabla J({\mathbf{w}}_k(i-1))-\nabla J(w^*))\|^2 \nonumber\\
&\quad - \|\tilde{{\mathbf{w}}}_k(i)\|^2 + \|\mu_\omega\mathcal{C}_d\tilde{{\mathbf{y}}}_k(i-1)\|^2\Big) \tag{72}
\end{align}

Substituting (72) into ($D'$) yields,

\begin{align}
\|\tilde{{\mathbf{y}}}_k(i-1)+\mu_y\mathcal{C}_d\tilde{{\mathbf{w}}}_k(i)\|^2 &= \|\tilde{{\mathbf{y}}}_k(i-1)\|^2 + \|\mu_y\mathcal{C}_d\tilde{{\mathbf{w}}}_k(i)\|^2 \nonumber\\
&\quad + \frac{\mu_y}{\mu_w}\Big(\|\tilde{{\mathbf{w}}}_k(i-1) - \mu_\omega(\nabla J({\mathbf{w}}_k(i-1))-\nabla J(w^*))\|^2 \nonumber\\
&\quad - \|\tilde{{\mathbf{w}}}_k(i)\|^2 + \|\mu_w\mathcal{C}_d\tilde{{\mathbf{y}}}_k(i-1)\|^2\Big) \nonumber\\
&\quad - 2\mu_y\mu_w\|\mathcal{C}_d\tilde{{\mathbf{y}}}_k(i-1)\|^2 \tag{73}
\end{align}

Substituting (73) into (70),

\begin{align}
\|\tilde{{\mathbf{z}}}_k(i)\|_{A^2}^2 &= \|\tilde{{\mathbf{y}}}_k(i-1)\|^2 + \|\mu_y\mathcal{C}_d\tilde{{\mathbf{w}}}_k(i)\|^2 \nonumber\\
&\quad + \frac{\mu_y}{\mu_w}\Big(\|\tilde{{\mathbf{w}}}_k(i-1) - \mu_\omega(\nabla J({\mathbf{w}}_k(i-1))-\nabla J(w^*))\|^2 \nonumber\\
&\quad - \|\tilde{{\mathbf{w}}}_k(i)\|^2 + \|\mu_w\mathcal{C}_d\tilde{{\mathbf{y}}}_k(i-1)\|^2\Big) \nonumber\\
&\quad - 2\mu_y\mu_w\|\mathcal{C}_d\tilde{{\mathbf{y}}}_k(i-1)\|^2 - \|\tilde{{\mathbf{x}}}_k(i)\|^2 \nonumber\\
&\quad + \|\tilde{{\mathbf{x}}}_k(i-1)\|_{A^2}^2 \tag{74}
\end{align}

Multiplying (74) by $a_m=\frac{\mu_w}{\mu_y}$,

\begin{align}
a_m\|\tilde{{\mathbf{z}}}_k(i)\|_{A^2}^2 + a_m\|\tilde{{\mathbf{x}}}_k(i)\|^2 &= a_m\|\tilde{{\mathbf{y}}}_k(i-1)\|_{I-\mu_y\mu_w\mathcal{C}_d\mathcal{C}_d^T}^2 \nonumber\\
&\quad - \|\tilde{{\mathbf{w}}}_k(i)\|_{I-\mu_w\mu_y\mathcal{C}_d^T\mathcal{C}_d}^2 \nonumber\\
&\quad + a_m\|\tilde{{\mathbf{x}}}_k(i-1)\|^2 \nonumber\\
&\quad + \underbrace{\|\tilde{{\mathbf{w}}}_k(i-1)-\mu_\omega(\nabla J({\mathbf{w}}_k(i-1))-\nabla J(w^*))\|^2}_{D''} \tag{75}
\end{align}

For solving ($D''$), data models No. 6 and No. 7 need to be utilized.

\begin{align}
\|\tilde{{\mathbf{w}}}_k(i-1) - \mu_\omega(\nabla J({\mathbf{w}}_k(i-1))&-\nabla J(w^*))\|^2 \nonumber\\
&= \|\tilde{{\mathbf{w}}}_k(i-1)\|^2 + \mu_\omega^2\|\nabla J({\mathbf{w}}_k(i-1))-\nabla J(w^*)\|^2 \nonumber\\
&\quad - 2\mu_\omega\tilde{{\mathbf{w}}}_k^T(i-1)(\nabla J({\mathbf{w}}_k(i-1))-\nabla J(w^*)) \tag{76}
\end{align}

Relying on data model No. 6,

\begin{align}
\|\tilde{{\mathbf{w}}}_k(i-1) - \mu_\omega(\nabla J({\mathbf{w}}_k(i-1))&-\nabla J(w^*))\|^2 \nonumber\\
&\leq \|\tilde{{\mathbf{w}}}_k(i-1)\|^2 \nonumber\\
&\quad + \mu_\omega^2\delta\,\tilde{{\mathbf{w}}}_k^T(i-1)\big(\nabla J({\mathbf{w}}_k(i-1))-\nabla J(w^*)\big) \nonumber\\
&\quad - 2\mu_\omega\tilde{{\mathbf{w}}}_k^T(i-1)(\nabla J({\mathbf{w}}_k(i-1))-\nabla J(w^*)) \tag{77}
\end{align}

and on data model No. 7,

\begin{align}
\|\tilde{{\mathbf{w}}}_k(i-1) - \mu_\omega(\nabla J({\mathbf{w}}_k(i-1))&-\nabla J(w^*))\|^2 \nonumber\\
&\leq \|\tilde{{\mathbf{w}}}_k(i-1)\|^2 + \mu_\omega^2\delta\nu\|\tilde{{\mathbf{w}}}_k(i-1)\|^2 \nonumber\\
&\quad - 2\mu_\omega\nu\|\tilde{{\mathbf{w}}}_k(i-1)\|^2 \tag{78}\\
\|\tilde{{\mathbf{w}}}_k(i-1) - \mu_\omega(\nabla J({\mathbf{w}}_k(i-1))&-\nabla J(w^*))\|^2 \nonumber\\
&\leq (1-\mu_\omega\nu(2-\mu_\omega\delta))\|\tilde{{\mathbf{w}}}_k(i-1)\|^2 \tag{79}
\end{align}

then,

\begin{align}
\|\tilde{{\mathbf{w}}}_k(i-1) &- \mu_\omega(\nabla J({\mathbf{w}}_k(i-1))-\nabla J(w^*))\|^2 \nonumber\\
&\leq (1-\mu_\omega\nu(2-\mu_\omega\delta))\|\tilde{{\mathbf{w}}}_k(i-1)\|^2 \nonumber\\
&\leq (1-\mu_\omega\nu(1-\mu_\omega\delta))\|\tilde{{\mathbf{w}}}_k(i-1)\|^2 - \mu_\omega\nu\|\tilde{{\mathbf{w}}}_k(i-1)\|^2 \nonumber\\
&\leq (1-\mu_\omega\nu(1-\mu_\omega\delta))\|\tilde{{\mathbf{w}}}_k(i-1)\|_{I-\mu_y\mu_\omega C_d^TC_d}^2 \nonumber\\
&\leq \frac{1-\mu_\omega\nu(1-\mu_\omega\delta)}{1-\mu_y\mu_w\sigma_{Max}^2(C_d)}\|\tilde{{\mathbf{w}}}_k(i-1)\|_{I-\mu_y\mu_\omega C_d^TC_d}^2 \tag{80}
\end{align}
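The chain (77)-(80) only relies on the smoothness and strong-convexity bounds of data models No. 6 and No. 7. A minimal sketch, assuming an illustrative $\nu$-strongly convex and $\delta$-smooth quadratic $J(w)=\frac{1}{2}w^{T}Hw$ (so that $\nabla J(w)-\nabla J(w^{*})=H\tilde{w}$), checks both data-model inequalities and the resulting contraction (79).

\begin{verbatim}
import numpy as np

# Minimal sketch of the bounds behind (77)-(79) for an illustrative
# nu-strongly convex, delta-smooth quadratic J(w) = 0.5 * w^T H w.
# The Hessian H, the error vector and the step-size are assumptions.
rng = np.random.default_rng(1)
n = 6
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
eigs = rng.uniform(0.5, 4.0, size=n)
H = Q @ np.diag(eigs) @ Q.T
nu, delta = eigs.min(), eigs.max()   # strong-convexity / smoothness constants
mu_w = 1.0 / (2.0 * delta)           # step-size with mu_w < 1/(2*delta)

w_tilde = rng.standard_normal(n)     # w_k(i-1) - w*
grad_diff = H @ w_tilde              # grad J(w_k(i-1)) - grad J(w*)

# data model No. 6 (smoothness / co-coercivity):
assert np.linalg.norm(grad_diff) ** 2 <= delta * (w_tilde @ grad_diff) + 1e-12
# data model No. 7 (strong convexity):
assert w_tilde @ grad_diff >= nu * np.linalg.norm(w_tilde) ** 2 - 1e-12
# resulting contraction (79):
lhs = np.linalg.norm(w_tilde - mu_w * grad_diff) ** 2
rhs = (1 - mu_w * nu * (2 - mu_w * delta)) * np.linalg.norm(w_tilde) ** 2
assert lhs <= rhs + 1e-12
\end{verbatim}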

For positive step-sizes we have $a_m>0$ and require $1-\mu_\omega\nu(1-\mu_\omega\delta)>0$, which holds whenever $\mu_\omega<\frac{1}{2\delta}$. In addition, for the factor $\frac{1-\mu_\omega\nu(1-\mu_\omega\delta)}{1-\mu_y\mu_w\sigma_{Max}^{2}(C_d)}$ to be smaller than one, we consequently require $\mu_y\leq\frac{\nu}{2\sigma_{Max}^{2}(C_d)}$. Finally, (75) yields:

\begin{align}
\|\tilde{{\mathbf{w}}}_k(i)\|_{I-\mu_y\mu_\omega C_d^TC_d}^2 &+ a_m\|\tilde{{\mathbf{z}}}_k(i)\|_{A^2}^2 + a_m\|\tilde{{\mathbf{x}}}_k(i)\|^2 \nonumber\\
&\leq a_m\|\tilde{{\mathbf{y}}}_k(i-1)\|_{I-\mu_y\mu_w\mathcal{C}_d\mathcal{C}_d^T}^2 + a_m\|\tilde{{\mathbf{x}}}_k(i-1)\|^2 \nonumber\\
&\quad + \frac{1-\mu_\omega\nu(1-\mu_\omega\delta)}{1-\mu_y\mu_w\sigma_{Max}^2(C_d)}\|\tilde{{\mathbf{w}}}_k(i-1)\|_{I-\mu_y\mu_\omega C_d^TC_d}^2 \tag{81}
\end{align}
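A short numerical sketch of the step-size conditions above: with $\mu_{\omega}<\frac{1}{2\delta}$ and $\mu_{y}\leq\frac{\nu}{2\sigma_{Max}^{2}(C_{d})}$, the coefficient multiplying $\|\tilde{{\mathbf{w}}}_{k}(i-1)\|_{I-\mu_{y}\mu_{\omega}C_{d}^{T}C_{d}}^{2}$ in (81) drops below one. The constants $\nu$, $\delta$ and the matrix $C_{d}$ are illustrative assumptions, and $\mu_{w}$ and $\mu_{\omega}$ are taken to be the same step-size.

\begin{verbatim}
import numpy as np

# Minimal check that the derived step-size bounds make
# (1 - mu_w*nu*(1 - mu_w*delta)) / (1 - mu_y*mu_w*sigma_Max^2(C_d)) < 1.
# nu, delta and C_d are illustrative assumptions; mu_w plays mu_omega.
rng = np.random.default_rng(2)
nu, delta = 1.0, 4.0
C_d = rng.standard_normal((5, 5))
sigma_max2 = np.linalg.norm(C_d, 2) ** 2     # sigma_Max^2(C_d)

mu_w = 0.9 / (2.0 * delta)                   # mu_w < 1/(2*delta)
mu_y = nu / (2.0 * sigma_max2)               # mu_y <= nu / (2*sigma_Max^2(C_d))

coeff = (1 - mu_w * nu * (1 - mu_w * delta)) / (1 - mu_y * mu_w * sigma_max2)
print(coeff)                                 # expected to lie strictly below 1
assert 0 < coeff < 1
\end{verbatim}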
Assumption 3

If $\tilde{{\mathbf{z}}}_k(i)$ and $\tilde{{\mathbf{x}}}_k(i-1)$ lie in the range space of $\mathcal{A}$, then,

\begin{equation}
\|\tilde{{\mathbf{x}}}_k(i-1)\|_{A^2}^2 \geq \underline{\sigma}^2(A)\|\tilde{{\mathbf{x}}}_k(i-1)\|^2 \tag{82}
\end{equation}

where $\underline{\sigma}^2(A)$ is the minimum non-zero singular value of $\mathcal{A}$. $\iota_{min}(C_dC_d^T)$ is assumed to be the smallest eigenvalue of $C_dC_d^T$, which is positive-definite, so that $\iota_{min}(C_dC_d^T)I\leq C_dC_d^T$.

\begin{align}
\|\tilde{{\mathbf{y}}}_k(i-1)\|_{I-\mu_y\mu_w\mathcal{C}_d\mathcal{C}_d^T}^2 \leq (1-\mu_y\mu_\omega\iota_{min}(C_dC_d^T))\|\tilde{{\mathbf{y}}}_k(i-1)\|^2 \tag{83}
\end{align}
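A minimal numerical sketch of the two weighted-norm bounds (82) and (83): a rank-deficient symmetric matrix stands in for $\mathcal{A}$ (so that the range-space condition matters), a random square matrix stands in for $C_{d}$, and $\mu_{w}$ stands in for $\mu_{\omega}$; all of these are assumptions made only for the check.

\begin{verbatim}
import numpy as np

# Minimal sketch of (82) and (83) with illustrative matrices.
rng = np.random.default_rng(3)
n = 6
B = rng.standard_normal((n, n - 2))
A = B @ B.T                                  # symmetric PSD with a null space
x = A @ rng.standard_normal(n)               # x lies in the range space of A
svals = np.linalg.svd(A, compute_uv=False)
sigma_under2 = min(s for s in svals if s > 1e-10) ** 2  # min non-zero singular value, squared
# (82): ||x||_{A^2}^2 >= sigma_under^2(A) * ||x||^2 for x in range(A)
assert np.linalg.norm(A @ x) ** 2 >= sigma_under2 * np.linalg.norm(x) ** 2 - 1e-9

C_d = rng.standard_normal((n, n))
mu_y, mu_w = 0.05, 0.1
M = np.eye(n) - mu_y * mu_w * C_d @ C_d.T
y = rng.standard_normal(n)
iota_min = np.linalg.eigvalsh(C_d @ C_d.T).min()
# (83): ||y||_M^2 <= (1 - mu_y*mu_w*iota_min(C_d C_d^T)) * ||y||^2
assert y @ M @ y <= (1 - mu_y * mu_w * iota_min) * np.linalg.norm(y) ** 2 + 1e-9
\end{verbatim}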
Remark 4

Squaring both sides of (62) introduces the equation below,

\begin{align}
\|\tilde{y}_i\|^2 &= \|\mathrm{Prox}(\bar{\mathcal{A}}\mathcal{Z}_i)-\mathrm{Prox}(\bar{\mathcal{A}}\mathcal{Z}^*)\|^2 \nonumber\\
&\leq \|\bar{\mathcal{A}}\tilde{\mathcal{Z}}_i\|^2 = \|\tilde{\mathcal{Z}}_i\|_{\bar{\mathcal{A}}^2}^2 \nonumber\\
&\overset{(\bar{\mathcal{A}}=\mathcal{A})}{\leq} \|\tilde{\mathcal{Z}}_i\|_{\mathcal{A}^2}^2 \tag{84}
\end{align}
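The inequality in (84) is the non-expansiveness of the proximal operator. The sketch below checks it with soft-thresholding (the proximal map of the $\ell_{1}$ norm), chosen purely as an illustrative prox; $\bar{\mathcal{A}}$, $\mathcal{Z}_{i}$ and $\mathcal{Z}^{*}$ are random stand-ins.

\begin{verbatim}
import numpy as np

# Minimal sketch of the prox non-expansiveness used in (84):
# ||Prox(a) - Prox(b)|| <= ||a - b||.  Soft-thresholding (prox of the
# l1 norm) is used only as an illustrative proximal map.
def soft_threshold(v, tau):
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

rng = np.random.default_rng(4)
n = 8
A_bar = rng.standard_normal((n, n))          # stands in for A-bar
z_i, z_star = rng.standard_normal(n), rng.standard_normal(n)
tau = 0.3

lhs = np.linalg.norm(soft_threshold(A_bar @ z_i, tau)
                     - soft_threshold(A_bar @ z_star, tau))
rhs = np.linalg.norm(A_bar @ (z_i - z_star))
assert lhs <= rhs + 1e-12
\end{verbatim}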

Based on Assumption 3 and Remark 4, (81) becomes,

\begin{align}
\|\tilde{{\mathbf{w}}}_k(i)\|_{I-\mu_y\mu_\omega C_d^TC_d}^2 &+ a_m\|\tilde{{\mathbf{y}}}_k(i)\|^2 + a_m\|\tilde{{\mathbf{x}}}_k(i)\|^2 \nonumber\\
&\leq \gamma_1\|\tilde{{\mathbf{w}}}_k(i-1)\|_{I-\mu_y\mu_\omega C_d^TC_d}^2 \nonumber\\
&\quad + a_m\gamma_2\|\tilde{{\mathbf{y}}}_k(i-1)\|^2 \nonumber\\
&\quad + a_m\gamma_3\|\tilde{{\mathbf{x}}}_k(i-1)\|^2 \tag{85}
\end{align}

where

\begin{align}
\gamma_1 &\triangleq \frac{1-\mu_\omega\nu(1-\mu_\omega\delta)}{1-\mu_y\mu_w\sigma_{Max}^2(C_d)} \tag{86}\\
\gamma_2 &\triangleq 1-\mu_y\mu_\omega\iota_{min}(C_dC_d^T) \tag{87}\\
\gamma_3 &\triangleq \underline{\sigma}^2(A) \tag{88}
\end{align}
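Once $\gamma_{1}$, $\gamma_{2}$ and $\gamma_{3}$ are all strictly smaller than one, (85) implies that the weighted error energy decays geometrically, i.e., the algorithm converges linearly. The short sketch below illustrates this decay with assumed stand-in values for the three factors.

\begin{verbatim}
# Minimal sketch: with gamma_1, gamma_2, gamma_3 all strictly below 1, the
# recursion (85) contracts the weighted error energy geometrically.
# The factor values and the initial energy are illustrative assumptions.
gammas = {"gamma_1": 0.994, "gamma_2": 0.97, "gamma_3": 0.90}
rate = max(gammas.values())

V = 1.0                                # initial weighted error energy
for i in range(1, 6):
    V *= rate                          # one application of (85)
    print(f"iteration {i}: energy bound {V:.4f}")
\end{verbatim}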

References

  • [1] S. Boyd, N. Parikh, and E. Chu, “Distributed optimization and statistical learning via the alternating direction method of multipliers,” in Found. Trends Mach. Lear, vol. 3, no. 1, pp. 1–22, 2011.
  • [2] S. Sundhar Ram, A. Nedic, and V. V. Veeravalli, “A new class of distributed optimization algorithms: Application to regression of distributed data,” in Optimization Methods and Software, vol. 27, no. 1, pp. 71–88, 2012.
  • [3] J. Chen, Z. J. Towfic, and A. H. Sayed, “Dictionary learning over distributed models,” in IEEE Trans. Signal Process., vol. 63, no. 4, pp. 1001–1016, 2015.
  • [4] D. Hallac, J. Leskovec, and S. Boyd, “Network LASSO: Clustering and optimization in large graphs,” in ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 387–396, 2015.
  • [5] T.-H. Chang, A. Nedic, and A. Scaglione, “Distributed constrained optimization by consensus-based primal-dual perturbation method,” in IEEE Trans. Autom. Control, vol. 59, no. 6, pp. 1524–1538, 2014.
  • [6] D. P. Palomar and M. Chiang, “A tutorial on decomposition methods for network utility maximization,” in IEEE Journal on Selected Areas in Communications, vol. 24, no. 8, pp. 1439–1451, 2006.
  • [7] C. Xi, Q. Wu, and U. A. Khan, “On the distributed optimization over directed networks,” in Neurocomputing, vol. 267, pp. 508–515, 2017.
  • [8] K. I. Tsianos, S. Lawlor, and M. Rabbat, “Push-sum distributed dual averaging for convex optimization,” in 2012 IEEE 51st Conference on Decision and Control (CDC), pp. 5453–5458, 2012.
  • [9] A. Nedić, A. Olshevsky, “Distributed optimization over time-varying directed graphs,” in IEEE Transactions on Automatic Control, vol. 60, no. 3, pp. 601–615, 2014.
  • [10] D. Kempe, A. Dobra, and J. Gehrke “Gossip-based computation of aggregate information,” in 44th Annual IEEE Symposium on Foundations of Computer Science, 2003. Proceedings. pp. 482–491, 2003.
  • [11] K. Cai, H. Ishii “Average consensus on general strongly connected digraphs,” in Automatica, vol. 48, no. 11, pp. 2750–2761, 2012.
  • [12] C. Xi, U. A. Khan “Distributed subgradient projection algorithm over directed graphs,” in IEEE Transactions on Automatic Control, vol. 62, no. 8, pp. 3986–3992, 2016.
  • [13] A. Olshevsky “Efficient information aggregation strategies for distributed control and signal processing,” in arXiv preprint arXiv:1009.6036, 2010.
  • [14] S. S. Ram, V. V. Veeravalli, and A. Nedić, “Distributed non-autonomous power control through distributed convex optimization,” in IEEE INFOCOM, pp. 3001–3005, 2009.
  • [15] S. Boyd, N. Parikh, E. Chu, B. Peleato, J. Eckstein, and others “Distributed optimization and statistical learning via the alternating direction method of multipliers,” in Foundations and Trends® in Machine learning, vol. 3, no. 1, pp. 1–122, 2011.
  • [16] V. Cevher, S. Becker, and M. Schmidt “Convex Optimization for Big Data: Scalable, randomized, and parallel algorithms for big data analytics,” in IEEE Signal Processing Magazine, vol. 31, no. 5, pp. 32–43, 2014.
  • [17] D. P. Bertsekas “Incremental proximal methods for large scale convex optimization,” in Mathematical programming, vol. 129, no. 2, pp. 163–195, 2011.
  • [18] A. Nedic, D. P. Bertsekas “Incremental Subgradient Methods for Nondifferentiable Optimization,” in SIAM Journal on Optimization, vol. 12, no. 1, pp. 109–138, 2001.
  • [19] S. Ram, A. Nedić, V. Veeravalli “Incremental stochastic subgradient algorithms for convex optimization,” in SIAM Journal on Optimization, vol. 20, no. 2, pp. 691–717, 2009.
  • [20] A. Nedić, D. Bertsekas, “Convergence rate of incremental subgradient algorithms,” in SIAM Journal on Optimization, pp. 223–264, 2001.
  • [21] A. Nedić, A. Ozdaglar “Distributed subgradient methods for multi-agent optimization,” in IEEE Transactions on Automatic Control, vol. 54, no. 1, pp. 48–61, 2009.
  • [22] W. Shi, Q. Ling, G. Wu, and W. Yin “Extra: An exact first-order algorithm for decentralized consensus optimization,” in SIAM Journal on Optimization, vol. 25, no. 2, pp. 944–966, 2015.
  • [23] M. Zhu, S. Martínez “Discrete-time dynamic average consensus,” in Automatica, vol. 46, no. 2, pp. 322–329, 2010.
  • [24] C. Xi, U. S. Khan “DEXTRA: A Fast Algorithm for Optimization Over Directed Graphs,” in IEEE Transactions on Automatic Control, vol. 62, no. 10, pp. 4980–4993, 2017.
  • [25] C. Xi, R. Xin, U. S. Khan “ADD-OPT: Accelerated distributed directed optimization,” in IEEE Transactions on Automatic Control, vol. 63, no. 5, pp. 1329–1339, 2017.
  • [26] C. Xi, V. S. Mai, R. Xin, E. H. Abed, U. S. Khan “Linear convergence in optimization over directed graphs with row-stochastic matrices,” in IEEE Transactions on Automatic Control, vol. 63, no. 10, pp. 3558–3565, 2018.
  • [27] J. Xu, S. Zhu, Y. Soh, and L. Xie “Augmented distributed gradient methods for multi-agent optimization under uncoordinated constant stepsizes,” in Proceedings of the 54th IEEE Conference on Decision and Control (CDC), pp. 2055–2060, 2015.
  • [28] L. Zhi, S. Wei, and Y. Ming “A Decentralized Proximal-Gradient Method With Network Independent Step-Sizes and Separated Convergence Rates,” in IEEE Transactions on Signal Processing, vol. 67, no. 17, pp. 4494–4506, 2019.
  • [29] K. Yuan, B. Ying, X. Zhao, and A. H. Sayed “Exact diffusion for distributed optimization and learning—Part I: Algorithm development,” in IEEE Transactions on Signal Processing, vol. 67, no. 3, pp. 708–723, 2018.
  • [30] A. H. Sayed, “Adaptation, Learning and Optimization over networks,” in Foundations and Trends in Machine Learning, vol. 7, no. 4-5, pp. 311–801, 2014, 10.1109.
  • [31] S. Haykin, “Cognitive Dynamic Systems: Perception-action Cycle, Radar and Radio,”Cambridge University Press, New York, NY, USA. , 2012.
  • [32] A. Nedic and A. Ozdaglar, “Distributed subgradient methods for multi-agent optimization,” IEEE Transactions on Automatic Control,, vol. 54, no. 1, pp. 48-61, 2009.
  • [33] J. Chen and A. H. Sayed, “Distributed Pareto optimization via diffusion strategies,” IEEE J. Sel. Topics Signal Process., vol. 7, no. 2, pp. 205-220, April. 2013.
  • [34] K. Yuan, Q. Ling, and W. Yin, “On the convergence of decentralized gradient descent,” SIAM Journal on optimization., vol. 26, no. 3, pp. 1835-1854, 2016.
  • [35] J. C. Duchi, A. Agarwal, and M. J. Wainwright, “Dual Averaging for distributed optimization: Convergence analysis and network scaling,” IEEE Transactions on Automatic Control,, vol. 57, no. 3, pp. 592-606, 2012.
  • [36] W. Shi, Q. Ling, K. Yuan, G. Wu, and W. Yin, “On the linear convergence of the ADMM in decentralized consensus optimization,” IEEE Trans. Signal Process.,, vol. 62, no. 7, pp. 1750-1761, 2014.
  • [37] G. Qu and N. Li, “Harnessing smoothness to accelerate distributed optimization,” IEEE Transactions on Control of Network Systems,, vol. 5, no. 3, pp. 1245-1260, 2018.
  • [38] A. Nedic, A. Olshevsky, and W. Shi, “Achieving geometric convergence for distributed optimization over time-varying graphs,” SIAM Journal on Optimization,, vol. 27, no. 4, pp. 2597-2633, 2017.
  • [39] K. Scaman, F. Bach, S. Bubeck, Y. T. Lee, and L. Massoulie “Optimal algorithms for smooth and strongly convex distributed optimization in the network,” in International Conference on Machine Learning (ICML). Stockholm, Sweden, 2017, pp. 3027-3036,.
  • [40] A. I. Chen, and A. Ozdaglar, “A fast distributed optimal-gradient method,” in Annual Allerton Conference on communication , Control, and Computing. Monticello, IL, USA, Oct. 2012, pp. 601-608.
  • [41] G. M. Baudet, “Asynchronous iterative methods for multiprocessors,” Journal of the ACM(JACM), vol. 25, no. 2, pp. 226-244, 1978.
  • [42] D. Bertsekas, “Distributed dynamic programming,” IEEE Transactions on Automatic Control, vol. 27, no. 3, pp. 610-616, 1982.
  • [43] S. A. Alghunaim, M. Yan, and A. H. Sayed, “A Multi-Agent Primal-Dual Strategy for Composite Optimization over Distributed Features,” IEEE 28th European Signal Processing Conference (EUSIPCO), pp. 2095–2099, 2021.
  • [44] S. A. Alghunaim, Q. Lyu, M. Yan, and A. H. Sayed, “A Multi-Agent Primal-Dual Strategy for Composite Optimization over Distributed Features,” IEEE Transactions on Signal Processing, vol. 69, pp. 5568–5579, 2021.
  • [45] S. Boyd and L. Vandenberghe, “Convex Optimization.” Cambridge, U.K.: Cambridge Univ. Press, 2004.
  • [46] D. Bertsekas, “Convex Analysis and optimization.” Singapore: Athena Scientific, 2003.
  • [47] P. Latafat, N. M. Freris, and P. Patrinos, “A new randomized block-coordinate primal-dual proximal algorithm for distributed optimization,” IEEE Transactions on Automatic Control , vol. 64, no. 10, pp. 4050-4065, Oct. 2019.
  • [48] Xin, Ran, and Usman A. Khan. ”A linear algorithm for optimization over directed graphs with geometric convergence.” IEEE Control Systems Letters, vol. 2, no. 3, pp. 315-320, 2018.
  • [49] D. Kempe, A. Dobra, and J. Gehrke. ”Gossip-based computation of aggregate information” 44th Annual IEEE Symposium on Foundations of Computer Science, 2003. Proceedings., pp. 482-491, 2003.
  • [50] Q. Qiu, and H. Su, ”Finite-Time Output Synchronization for Output-Coupled Reaction-Diffusion Neural Networks With Directed Topology,” in IEEE Transactions on Network Science and Engineering, vol. 9, no. 3, pp. 1386-1394, 2022.
  • [51] P. Wan et al., ”Optimal Control for Positive and Negative Information Diffusion Based on Game Theory in Online Social Networks,” in IEEE Transactions on Network Science and Engineering, vol. 10, no. 1, pp. 426-440, 2023.
Sajad Zandi completed his Master of Science degree in Electrical Engineering from Malayer University, Iran, in 2017. Since May 2024, he has been a PhD student at the School of Mechanical and Mechatronics Engineering, University of Technology Sydney (UTS), Australia. His research focuses on adaptive and statistical signal processing, cooperative learning, multi-agent networking, distributed adaptive estimation, distributed active noise control, information-theoretic data, signal processing for communications, and adaptive optimization.
Mehdi Korki received the bachelor's degree in electrical engineering from Shiraz University, Shiraz, Iran, in 2001, and the master's and Ph.D. degrees in electrical engineering from Swinburne University of Technology, Melbourne, Australia, in 2012 and 2016, respectively. He is currently a Lecturer at the School of Science, Computing and Engineering Technologies, Swinburne University of Technology, Melbourne, Australia. His research interests include statistical signal processing, data privacy, wireless sensor networks, and distributed signal processing.