Primal-Dual Strategy (PDS) for Composite Optimization Over Directed Graphs
Abstract
We investigate the distributed multi-agent sharing optimization problem over a directed graph, with a composite objective function consisting of a smooth function plus a convex (possibly non-smooth) function shared by all agents. While adhering to the network connectivity structure, the goal is to minimize the sum of smooth local functions plus a non-smooth function. The proposed Primal-Dual (PD) algorithm is similar to a previous algorithm [43], but it offers additional benefits. First, we study the problem over directed graphs, where agents can communicate in only one direction and the combination matrix is not symmetric. Furthermore, the combination matrix is time-varying, and its coefficient weights are produced by an adaptive procedure. Under the strong convexity assumption and with these adaptive, time-varying coefficient weights, we derive a new upper bound on the step-sizes and show that linear convergence is achieved in the presence of both smooth and non-smooth terms. Simulation results show the efficacy of the proposed algorithm compared with other algorithms.
Index Terms:
Primal-Dual (PD) algorithm, directed graph, adaptive coefficient weights.

I Introduction
In a decentralized approach, agents, which exchange information only with their immediate neighbors, are connected to form a multi-agent cooperative network that estimates an unknown parameter through in-network processing. The goal of this work is to modify and extend the multi-agent sharing optimization problem of [43] in a distributed setting. In recent years, there has been a surge of interest in distributed optimization, i.e., minimizing the average/sum of convex functions over a network of nodes, where each agent has access only to a private function . This problem frequently arises in formation control [13] and non-autonomous power control [14]. The coupled multi-agent sharing optimization problem involves two terms [43]: the sum of smooth local functions plus a convex (possibly non-smooth) function coupling all agents in the network. Agents aim to solve the following problem:
(1) |
where function is only known to agent , and non-smooth function is known by all agents. Note that is a convex (possibly non-smooth) function, where the matrix is known by agent only. The coupled multi-agent optimization problem of (1) represents the sharing formulation, where the agents (with their various variables) are coupled through . The applications of problem (1) include regression over distributed features [1], [2], dictionary learning over distributed models [3], clustering in graphs [4], smart grid control [5], and network utility maximization [6], to name a few.
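As a concrete illustration of the sharing formulation in (1), the sketch below evaluates a composite objective of this type. The quadratic local costs, the l1 choice for the coupling function, and all names are assumptions for illustration, not the paper's exact setup.

```python
import numpy as np

# Hedged sketch of the sharing objective in (1): each agent k holds a private
# smooth loss J_k (here quadratic, an assumption) and a local matrix B_k; the
# non-smooth g (here the l1-norm, an assumption) couples all agents through
# the sum of the local terms B_k @ w_k.

def sharing_objective(w_list, B_list, A_list, b_list, lam=0.1):
    """Evaluate sum_k 0.5||A_k w_k - b_k||^2 + lam * || sum_k B_k w_k ||_1."""
    smooth = sum(0.5 * np.linalg.norm(A @ w - b) ** 2
                 for A, w, b in zip(A_list, w_list, b_list))
    coupled = sum(B @ w for B, w in zip(B_list, w_list))
    return smooth + lam * np.linalg.norm(coupled, 1)
```

The coupling term is what forces cooperation: no single agent can evaluate the l1 term from its own variable alone.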
The authors in [43] proposed a decentralized algorithm for (1) and established its linear convergence to the global solution, deriving their algorithm by reformulating (1) as an equivalent saddle-point problem. In this work, we use a similar technique to develop a novel decentralized algorithm over a directed graph based on adaptive coefficient weights, together with a new convergence rate. Weight balance is usually not possible in real-world scenarios; we therefore aim to develop optimization and learning algorithms that are suitable for and applicable to directed graphs. The main challenge in dealing with directed graphs is that the doubly-stochastic weight matrix must be replaced by either a row-stochastic or a column-stochastic weighted adjacency matrix. Row-stochasticity of the weight matrix guarantees that all agents reach consensus, while column-stochasticity ensures optimality [7]. Compared with undirected graphs, directed graph topologies [50] cover a wider range of applications and may also result in lower communication costs and simpler topologies. Most existing methods for optimization over directed graphs combine average-consensus techniques created for directed graphs with optimization algorithms designed for undirected graphs. For example, the subgradient-push method, described in [8] and extensively investigated in [9], combines push-sum consensus [10] with distributed gradient descent (DGD). Directed-Distributed Gradient Descent (D-DGD) [7, 12] is a linear technique over directed graphs based on surplus consensus [11] and DGD.
However, due to the diminishing step-size, such DGD-based algorithms converge relatively slowly for both general convex and strongly convex functions.
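The row- and column-stochastic weight constructions discussed above can be sketched as follows. Uniform weights over in-/out-neighbors are an assumption here (the paper later derives adaptive weights); the point is only how each normalization direction follows from the directed adjacency structure.

```python
import numpy as np

# Hedged sketch: building row-stochastic and column-stochastic combination
# matrices from a directed graph's adjacency structure, used in place of a
# doubly-stochastic matrix when weight balance is impossible.

def stochastic_weights(adj):
    """adj[k, l] = 1 if agent l sends to agent k (self-loops included)."""
    adj = np.asarray(adj, dtype=float)
    row = adj / adj.sum(axis=1, keepdims=True)   # rows sum to 1: consensus
    col = adj / adj.sum(axis=0, keepdims=True)   # columns sum to 1: optimality
    return row, col
```

Row normalization needs only each agent's in-neighbors, while column normalization requires each agent to know its out-degree, which is exactly the requirement removed in [26].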
I-A Related Works
Information exchange within a multi-agent cooperative network (through coupled neighbors) aims to estimate an unknown parameter. This cooperative learning and estimation has applications in various fields such as signal processing, optimization, and control [30], [31]. Information exchange can be organized in a distributed or a centralized manner. In the centralized approach, data are collected from the whole network, processed in a fusion center, and the result is sent back to the agents. This approach is powerful, yet it suffers from limitations such as limited bandwidth and vulnerability of the fusion center to failure. Decentralized methods refer to fully distributed solutions: they enable parallel processing and spread the computational load across multiple agents. When the bandwidth around the central agent is low, the performance of centralized methods degrades, whereas decentralized methods can still converge quickly under limited bandwidth. Furthermore, in some sensitive applications, privacy and confidentiality prevent sending information to a fusion center; in a distributed model, the agents share information only with their immediate neighbors. Such models apply to social, transportation, and biological networks [30, 51].
Decentralized optimization algorithms for large-scale networks have attracted many studies. For instance, [15, 16] consider a centralized topology with one agent at the center, which may fail or violate privacy requirements. The incremental algorithm [17, 20] may be considered as a non-centralized alternative based on a directed ring topology. Furthermore, to manage a time-varying network, [21] proposes a distributed subgradient (DSG) algorithm, which is slow because of the decaying step-size required to obtain a consensual and optimal solution.
In this paper, we propose a new convergence rate for a primal-dual algorithm in a directed graph with adaptive coefficient weights.
Moreover, if in (1) is a non-smooth regularizer shared across all the agents, a proximal decentralized algorithm with a fixed point matches the desired global solution. On the other hand, if , then decentralized primal methods can only converge to a biased solution unless a decaying step-size is used, which slows down the convergence rate [43]. As a result, one of the difficulties in this field is the speed of convergence, since some methods are hampered by decreasing step-sizes. To accelerate convergence, we exploit strong convexity and Lipschitz-continuous gradients. Some methods employ a constant step-size and converge geometrically to an error ball centered on the optimal solution [34]. The authors in [22] achieve geometric convergence with the help of symmetric weights. In [23], the authors combine inexact gradient methods with a gradient estimation technique based on dynamic average consensus, where the underlying graphs must be undirected or weight-balanced. The convergence rate of such methods can be limited by the decreasing step-size, depending on whether the objective function is generally convex or strongly convex.
Recently, the authors in [24] proposed a fast distributed algorithm, called DEXTRA, in which an appropriate step-size yields linear convergence when the objective functions are strongly convex with Lipschitz-continuous gradients. In this method, a decreasing step-size leads to slow convergence; however, if the step-size is chosen from a suitable interval and the objective functions are strongly convex with Lipschitz-continuous gradients, the iterates converge geometrically to the global optimal point [48]. ADD-OPT/Push-DIGing [25], an improvement of [24], provides geometric convergence with a sufficiently small step-size. In the DEXTRA [24] and ADD-OPT/Push-DIGing [25] algorithms, each node needs to know its out-degree in order to build a column-stochastic weight matrix. In [26], the authors eliminate this requirement by using row-stochastic weights, as each agent locally decides the weights assigned to incoming information. Fast methods over directed graphs share one thing in common: they are all based on push-sum-type techniques. This results in a nonlinear algorithm, since an independent iteration is required to asymptotically learn either the right or the left eigenvector corresponding to the eigenvalue 1 [49], which entails additional computation and communication among the agents.
This work focuses on the case where communication between nodes is directed and time-varying. We analyze the convergence rate with a decreasing step-size for the non-smooth function. First, consider the case where is smooth. The decentralized primal methods in [32]–[35] cannot achieve an exact solution unless a decaying step-size is applied. In [36], a decentralized method based on the alternating direction method of multipliers (ADMM) improves the convergence rate. Gradient tracking methods were applied in [37]–[38] to simplify the implementation, and dual-domain and accelerated dual gradient descent methods were utilized in [39]. When is non-smooth, a proximal gradient method is useful, but the increasing number of inner loops makes it computationally expensive [40]. If the non-smooth function is the indicator function of zero, a good convergence rate is attained [41], [42]. For the proximal decentralized algorithm, asymptotic linear convergence holds even when all the functions are piecewise linear-quadratic (PLQ) [47]; with a sufficiently large number of iterations, a good convergence rate can be obtained.
I-B Contribution
In this paper, we develop a modified version of the primal-dual (PD) strategy [43], [44] over a distributed network, on a directed graph with adaptive coefficient weights. The purpose of this work is to establish new convergence rates on directed graphs.
The contributions of this paper are summarized as follows:
- • To achieve both good convergence and good performance, we optimize the adaptive combination coefficients, over a directed graph, to reduce the communication load and the energy consumption of each agent.
- • Consequently, we derive new bounds on the step-sizes which result in a lower squared error (see Theorem 1).
- • Simulation results also show that the proposed algorithm achieves a faster convergence rate.
The rest of the paper is organized as follows. In Section II, we present the problem formulation, the saddle-point reformulation, and the proposed primal-dual diffusion strategy, including the implementation of proximal adapt-then-combine (ATC) for the dual-diffusion algorithm and the diffusion tracking method. In Section III, we present the proposed proximal dual diffusion method, i.e., PD-dMVC, and derive the theorem that gives the new step-sizes. Section IV discusses fast convergence, and Section V compares the convergence properties and performance of the algorithms. In Section VI, we provide simulation results to evaluate the performance of the proposed algorithm. We conclude the paper in Section VII.
I-C Notation and Symbols
Throughout the paper, all vectors are column vectors, with the exception of the regression vector denoted by , which is a row vector. denotes the identity matrix of size and denotes the vector with all entries equal to one. Lowercase letters denote vectors and uppercase letters denote matrices. and imply that is a positive definite matrix ( and are both symmetric matrices of the same dimension). denotes a column vector structured by stacking the elements on top of each other. , for a vector and a square matrix . The subdifferential of a function at is the set of all its subgradients. The proximal operator of a function with step-size is . The conjugate of a function is defined as . A differentiable function is -smooth and -strongly convex if and , respectively, for any and .
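The proximal operator defined in the notation above can be illustrated with a concrete choice of function. Taking the l1-norm (an assumption for illustration; the paper keeps generic) gives the familiar soft-thresholding map:

```python
import numpy as np

# Hedged illustration of the proximal operator from the notation section:
# prox_{mu*g}(x) = argmin_u { g(u) + (1/(2*mu)) * ||u - x||^2 }.
# For g = ||.||_1 this minimizer has the closed form below (soft-thresholding).

def prox_l1(x, mu):
    """Entrywise soft-thresholding: shrink |x| by mu, clip at zero."""
    return np.sign(x) * np.maximum(np.abs(x) - mu, 0.0)
```

Entries with magnitude below the step-size are set exactly to zero, which is why this operator promotes sparsity in the compressed-sensing experiment later in the paper.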
II Problem formulation
II-A Problem Reformulation
II-A1 Primal dual decentralized algorithms
According to some studies, adapt-then-combine (ATC) methods, such as the primal-dual recursion, have larger step-size stability ranges than non-ATC methods. The primal-dual framework was the first primal-dual interpretation of ATC gradient tracking methods [ref-ref]. When the step-size is kept constant, diffusion methods solve approximations of the optimization problem and converge to biased solutions. To converge to an exact solution, primal-based methods require a decaying step-size, which, however, slows down the convergence rate. As a result, primal-dual methods are proposed to solve the problem without bias and converge to exact minimizers while communication is limited. Formulations come in two kinds: with and without a penalty term. The benefit of a penalty term is that it improves the convergence rate of a decentralized algorithm when the aggregate cost is well-conditioned and strongly convex even if the individual costs are not. An equivalent approach is to reformulate the problem over a network and seek the unknown parameters of interest. Indeed, minimizing a sum of local cost functions through a general algorithmic framework is a central topic in decentralized optimization. The saddle-point solution structure comprises primal-descent, dual-ascent, and combination equations, which cooperate as follows: a gradient-descent step, followed by a gradient-ascent step, followed by a combination step.
In this section, we consider saddle-point problems that reformulate (1). A decentralized algorithm cannot be obtained by directly solving the first equivalent saddle-point problem; as a result, further reformulation is required to derive a decentralized solution. The network quantities are introduced as follows:
(2)
(3)
(4)
Then, the problem (1) can be written as follows:
(5)
Assumption 1
Remark 1
Because the objective function in (5) is strongly convex, the global solution is unique.
Problem (5) is equivalent to the following saddle-point reformulation under strong duality and Assumption 1.
(6) |
where is the conjugate function of and is the dual variable. Because the dual variable couples all agents, requiring a fusion center to compute the dual update, solving (6) directly does not lead to a decentralized algorithm. As a result, additional reformulations are required: a local copy of the dual variable at agent , i.e., , is introduced to solve (6) in a decentralized manner. The following network quantities are introduced:
(7) |
If we define as a symmetric matrix, we can write:
(8) |
The saddle-point type-2 problem is written as:
(9) |
Problem (9) can be solved in a decentralized manner by using an optimal point .
Lemma 1
(Optimal Point) Subject to Assumption 1 and (8), a primal-dual pair and a fixed point are optimal if, and only if, they satisfy the optimality conditions for saddle-point 1 and saddle-point 2 below, respectively.
(10)
(11)
(12)
(13)
(14)
II-A2 Decentralized Strategy
Let and take arbitrary values and .
(15)
(16)
(17)
(18)
where and is a left-stochastic or right-stochastic combination matrix.
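The primal-descent / dual-ascent / combine structure of (15)-(18) can be sketched, under assumptions, as a single iteration. The quadratic local costs, the specific coupling of the dual variable, and all symbol names are illustrative; the paper's actual recursions include additional proximal and tracking terms.

```python
import numpy as np

# Structural sketch (not the paper's exact recursion) of one primal-descent /
# dual-ascent / combine step: each agent takes a gradient step on its smooth
# cost, a local dual-ascent step, and then combines dual iterates with a
# stochastic matrix A over the graph. Costs J_k(w) = 0.5*||w - b_k||^2 are an
# illustrative assumption.

def primal_dual_step(w, y, b, A, mu_w, mu_y):
    """w, y, b: (N, M) stacked agent iterates; A: (N, N) stochastic matrix."""
    grad = w - b                      # local gradients of the quadratic costs
    psi = w - mu_w * (grad + y)       # primal descent, with dual coupling via y
    y_half = y + mu_y * psi           # local dual ascent
    y_new = A @ y_half                # combine dual iterates over the graph
    return psi, y_new
```

In the proposed strategy, A would be the time-varying adaptive combination matrix described later, rather than a fixed matrix.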
II-A3 Implementation of proximal ATC for dual-diffusion algorithm
II-A4 Diffusion Tracking Method
II-A5 Adaptive Combination Matrix
We consider the following definitions and assumptions of the combination matrix:
(26) |
where is zero if there is no edge between agents and . Matrix is assumed to be a primitive, symmetric, left/right-stochastic matrix; , which is also a primitive, symmetric, left/right-stochastic matrix; . If the eigenvalues of belong to (-1,1], many choices for exist.
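One standard choice satisfying these combination-matrix conditions on undirected links is the Metropolis rule, which the simulation section uses to compute combination coefficients. A sketch under the assumption of a symmetric 0/1 adjacency matrix without self-loops:

```python
import numpy as np

# Hedged sketch of the Metropolis rule: off-diagonal weights are the inverse
# of one plus the larger of the two endpoint degrees, and the self-weight
# absorbs the remainder so every row sums to one. On a symmetric adjacency
# this yields a symmetric doubly-stochastic matrix.

def metropolis_weights(adj):
    """adj: symmetric 0/1 adjacency matrix (no self-loops)."""
    adj = np.asarray(adj)
    N = adj.shape[0]
    deg = adj.sum(axis=1)
    A = np.zeros((N, N))
    for k in range(N):
        for l in range(N):
            if adj[k, l]:
                A[k, l] = 1.0 / (1 + max(deg[k], deg[l]))
        A[k, k] = 1.0 - A[k].sum()   # self-weight completes the row to 1
    return A
```

On directed graphs this symmetry is unavailable, which is precisely why the paper replaces such static rules with adaptive, time-varying coefficient weights.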
Assumption 2
(Combination Matrix) Matrix satisfies condition (8), and then the condition below holds:
II-A6 Theorem (Convergence-rate)
Theorem 1
Subject to Assumptions 1 and 2, if is full row rank and the step-sizes and are strictly positive and satisfy
(27)
(28)
it holds for all and some that:
(29) |
where
See Appendix A.
Remark 2
is the smallest eigenvalue of , denotes the smallest non-zero singular value of , and is the largest singular value of .
Remark 3
According to Theorem 1, the PD algorithm converges fast for non-smooth if has full row rank.
II-A7 Data Model
- 1. If we assume that the cost function is -smooth, then the following inequality is established,
(30)
- 2.
III Proximal Decentralized Algorithm
To manage the non-smooth term , it is defined as
(32) |
III-A Proximal Dual Diffusion Method
The following recursion is suggested based on (39): for , and each agent , if , then
(33)
(34)
(35)
(36)
(37)
TABLE II: Total time and number of iterations of the algorithms.
Algorithm | Total time (seconds) | Number of iterations
EXTRA ([22]) | |
Aug-DGM ([27]) | |
DIGing ([38]) | |
NIDS ([28]) | |
Exact Diffusion ([29]) | |
Primal-Dual | |
Primal-Dual Diffusion | |
The desired variance can be estimated iteratively, with the factor parameter estimated by agent by running the following smoothing filter:
(38) |
where {}, which is needed to run the recursion at agent , is computed by agent at iteration . To remove the need to transmit from agent to its neighbors, the estimation of is designed into the neighbors of agent . This gives the advantage of not sharing the value and overcomes the difficulty of agent accessing in the ATC diffusion implementation. The mechanism works as follows: upon receiving from its neighbor , we replace by , since for . Note that the iterates at the various agents approach an optimal point. With this substitution, agent can now locally estimate the variance of its neighbor by running a smoothing filter of the following form:
(39) |
The above expression provides part of an adaptive construction for the combination weights . In addition, the adaptive weights are factors used to evaluate .
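The smoothing-filter construction above can be sketched as follows. The inverse-variance weighting rule, the smoothing constant, and all names are assumptions patterned on relative-variance adaptive combiners, not the paper's exact equations (38)-(39).

```python
import numpy as np

# Hedged sketch of adaptive combination weights: agent k smooths a variance
# proxy for each neighbor l using the deviation ||psi_l - w_k||^2, then
# assigns combination weights inversely proportional to those proxies, so
# neighbors whose iterates track agent k's estimate closely get more weight.

def adaptive_weights(gamma2, psi, w_k, neighbors, nu=0.1):
    """Update variance proxies gamma2 (dict l -> float); return weights."""
    for l in neighbors:
        err = np.linalg.norm(psi[l] - w_k) ** 2
        gamma2[l] = (1 - nu) * gamma2[l] + nu * err   # smoothing filter
    inv = {l: 1.0 / gamma2[l] for l in neighbors}
    total = sum(inv.values())
    return {l: inv[l] / total for l in neighbors}     # weights sum to one
```

Because the proxies are updated every iteration, the resulting combination matrix is time-varying, matching the setting analyzed in Theorem 1.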
The proposed Proximal-Dual diffusion algorithm along with its adaptive coefficient weights are described in Algorithm 1 and Algorithm 2, respectively.
IV Fast Convergence
If , then belongs to the range space of . As a result, always remains in the range space of . The iterates converge to this point . The error quantities are as follows:
(40)
(41)
(42)
(43)
The expressions in (33)-(37) produce the following error recursions:
(44)
(45)
(46)
(47)
The network mean-square deviation is:
(48) |
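The network mean-square deviation in (48) averages each agent's squared error relative to the global minimizer; a direct sketch (assuming the iterates are stacked row-wise) is:

```python
import numpy as np

# Hedged sketch of the network MSD: the average over agents of the squared
# deviation of each agent's iterate from the global solution w_star.

def network_msd(W, w_star):
    """W: (N, M) stacked agent iterates; w_star: (M,) global solution."""
    return np.mean(np.sum((W - w_star) ** 2, axis=1))
```

This is the quantity plotted (typically in dB) when comparing the algorithms in the simulation section.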
V Convergence properties and performance comparison of algorithms
Table I compares the convergence properties of distributed algorithms, in terms of support for prox operators, convergence rate, and step-size upper bound, for the proposed primal-dual diffusion (PDD) and primal-dual (PD) algorithms against the NIDS, EXTRA, DIGing-ATC, Aug-DGM, and Exact Diffusion algorithms. As can be seen, the proposed primal-dual diffusion algorithm has the same step-size upper bound as the primal-dual algorithm, while the convergence rates of both PDD and PD are higher than those of NIDS, EXTRA, DIGing-ATC, Aug-DGM, and Exact Diffusion. As shown in Table I, PDD and PD require fewer parameters than the DIGing, EXTRA, and Aug-DGM algorithms, so their upper bounds are simpler. Table II compares the performance of the algorithms over directed and undirected graphs in terms of total time and number of iterations. All algorithms were run for the same number of iterations; as shown, the total time differs across algorithms, and the PDD algorithm outperforms the others.
VI Simulation Results
For distributed parameter estimation, simulations for a network with N = 20 nodes are presented. Figure 1 (a) depicts the directed network topology used in all simulations. The Metropolis rule is used to compute the combination coefficients.
VI-A Decentralized compressed sensing
Consider a decentralized compressed sensing problem involving a number of network agents. Each agent has its own measurement via , where is an unknown sparse signal and is an i.i.d. Gaussian noise vector. We consider as a non-smooth function and, to simplify the problem, set the value to 1. To estimate the sparse vector , we use the decentralized algorithm.
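The measurement model of this experiment can be sketched as follows. The dimensions, sparsity level, noise scale, and variable names are assumptions for illustration; the paper specifies only N = 20 agents.

```python
import numpy as np

# Hedged sketch of the decentralized compressed sensing data model:
# y_k = U_k @ w0 + v_k, with a sparse ground truth w0 shared by all agents
# and i.i.d. Gaussian noise v_k. All dimensions below are assumptions.

rng = np.random.default_rng(0)
N, M, S, m_k = 20, 50, 5, 10        # agents, signal dim, sparsity, rows/agent

w0 = np.zeros(M)
support = rng.choice(M, size=S, replace=False)
w0[support] = rng.standard_normal(S)                    # sparse ground truth

U = [rng.standard_normal((m_k, M)) for _ in range(N)]   # local sensing matrices
y = [Uk @ w0 + 0.01 * rng.standard_normal(m_k) for Uk in U]  # noisy measurements
```

Each agent sees only its own pair (U_k, y_k), so recovering w0 requires cooperation over the directed graph.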
VI-A1 The case without non-smooth function
Consider the decentralized problem of solving for an unknown signal in the absence of a non-smooth function.
VI-A2 The case with non-smooth function
A non-smooth function was considered to solve a decentralized compressed sensing problem.
The comparison of the simulation results for the proposed primal-dual diffusion (PDD) and primal-dual (PD) algorithms with those of the NIDS-adaptive, NIDS, EXTRA, and DIGing-ATC algorithms is shown in Fig. 1 (a). Note that in this scenario, we ignored the non-smooth function. It is seen that the proposed PDD outperforms the other algorithms in terms of squared error.
In Fig. 1 (b), we compare the performance of the proposed PDD algorithm with those of the EXTRA and PD algorithms in the presence of a non-smooth function. As can be seen, the proposed PDD outperforms the EXTRA and PD algorithms.
VII Conclusion
We have studied the distributed multi-agent sharing optimization problem over a directed graph, with a composite objective function consisting of a smooth function plus a non-smooth function shared by all agents in the network. To solve the problem, several reformulations were applied. A new upper bound on the step-sizes was obtained, from which linear convergence can be achieved under the strong convexity assumption. Furthermore, we proposed adaptive coefficient weights that are time-varying. We showed that the proposed algorithm achieves linear convergence to the global minimizer with the adaptive coefficient weights and the new step-size bounds.
Appendix A Proof of Theorem 1
To solve the recursions (44)-(47), we first need to state some assumptions, and then square both sides of them to reach the optimal step-sizes of the PD algorithm and the convergence-rate equation:
(49)
(50)
(51)
(52)
Based on the per-agent formulation, the equations are reformulated as follows:
(53)
(54)
(55)
(56)
(57)
(58)
(59)
(60)
(61)
(62)
Now, squaring (54), (57), (59), and (62),
(63)
(64)
(65)
(66)
Adding and to both sides of (66):
(67) |
(68) |
Then, substituting (68) into (64) yields,
(69) |
Subtracting from both sides of (69) yields,
(70) |
Solving ():
(71) |
Similarly to (68), we have,
(72) |
Substituting (72) into () yields,
(73) |
(74) |
Multiplying (74) by ,
(75) |
To solve (), data models No. 6 and No. 7 need to be utilized.
(76) |
Relying on data model No. 6,
(77) |
and data model No.7,
(78) |
(79) |
then,
(80) |
For positive step-sizes, , , we have . In addition, ; consequently, (75) finally yields:
(81) |
Assumption 3
If and lie in the range space of , then,
(82) |
where is the minimum non-zero singular value of , and is assumed to be the smallest eigenvalue of , which is positive-definite, and .
(83) |
Remark 4
Squaring both sides of (62) introduces the equation below,
(84) |
References
- [1] S. Boyd, N. Parikh, and E. Chu, “Distributed optimization and statistical learning via the alternating direction method of multipliers,” in Found. Trends Mach. Lear, vol. 3, no. 1, pp. 1–22, 2011.
- [2] S. Sundhar Ram, A. Nedic, and V. V. Veeravalli, “A new class of distributed optimization algorithms: Application to regression of distributed data,” in Optimization Methods and Software, vol. 27, no. 1, pp. 71–88, 2012.
- [3] J. Chen, Z. J. Towfic, and A. H. Sayed, “Dictionary learning over distributed models,” in IEEE Trans. Signal Process., vol. 63, no. 4, pp. 1001–1016, 2015.
- [4] D. Hallac, J. Leskovec, and S. Boyd, “Network LASSO: Clustering and optimization in large graphs,” in ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 387–396, 2015.
- [5] T.-H. Chang, A. Nedic, and A. Scaglione, “Distributed constrained optimization by consensus-based primal-dual perturbation method,” in IEEE Trans. Autom. Control, vol. 59, no. 6, pp. 1524–1538, 2014.
- [6] D. P. Palomar and M. Chiang, “A tutorial on decomposition methods for network utility maximization,” in IEEE Journal on Selected Areas in Communications, vol. 24, no. 8, pp. 1439–1451, 2006.
- [7] C. Xi, Q. Wu, and U. A. Khan, “On the distributed optimization over directed networks,” in Neurocomputing, vol. 267, pp. 508–515, 2017.
- [8] K. I. Tsianos, S. Lawlor, and M. Rabbat, “Push-sum distributed dual averaging for convex optimization,” in 2012 IEEE 51st Conference on Decision and Control (CDC), pp. 5453–5458, 2012.
- [9] A. Nedić, A. Olshevsky, “Distributed optimization over time-varying directed graphs,” in IEEE Transactions on Automatic Control, vol. 60, no. 3, pp. 601–615, 2014.
- [10] D. Kempe, A. Dobra, and J. Gehrke “Gossip-based computation of aggregate information,” in 44th Annual IEEE Symposium on Foundations of Computer Science, 2003. Proceedings. pp. 482–491, 2003.
- [11] K. Cai, H. Ishii “Average consensus on general strongly connected digraphs,” in Automatica, vol. 48, no. 11, pp. 2750–2761, 2012.
- [12] C. Xi, U. A. Khan “Distributed subgradient projection algorithm over directed graphs,” in IEEE Transactions on Automatic Control, vol. 62, no. 8, pp. 3986–3992, 2016.
- [13] A. Olshevsky “Efficient information aggregation strategies for distributed control and signal processing,” in arXiv preprint arXiv:1009.6036, 2010.
- [14] S. S. Ram, V. V. Veeravalli, and A. Nedić, “Distributed non-autonomous power control through distributed convex optimization,” in IEEE INFOCOM, pp. 3001–3005, 2009.
- [15] S. Boyd, N. Parikh, E. Chu, B. Peleato, J. Eckstein, and others “Distributed optimization and statistical learning via the alternating direction method of multipliers,” in Foundations and Trends® in Machine learning, vol. 3, no. 1, pp. 1–122, 2011.
- [16] V. Cevher, S. Becker, and M. Schmidt “Convex Optimization for Big Data: Scalable, randomized, and parallel algorithms for big data analytics,” in IEEE Signal Processing Magazine, vol. 31, no. 5, pp. 32–43, 2014.
- [17] D. P. Bertsekas “Incremental proximal methods for large scale convex optimization,” in Mathematical programming, vol. 129, no. 2, pp. 163–195, 2011.
- [18] A. Nedic, D. P. Bertsekas “Incremental Subgradient Methods for Nondifferentiable Optimization,” in SIAM Journal on Optimization, vol. 12, no. 1, pp. 109–138, 2001.
- [19] S. Ram, A. Nedić, V. Veeravalli “Incremental stochastic subgradient algorithms for convex optimization,” in SIAM Journal on Optimization, vol. 20, no. 2, pp. 691–717, 2009.
- [20] A. Nedić, D. Bertsekas, “Convergence rate of incremental subgradient algorithms,” in SIAM Journal on Optimization, pp. 223–264, 2001.
- [21] A. Nedić, A. Ozdaglar “Distributed subgradient methods for multi-agent optimization,” in IEEE Transactions on Automatic Control, vol. 54, no. 1, pp. 48–61, 2009.
- [22] W. Shi, Q. Ling, G. Wu, and W. Yin “Extra: An exact first-order algorithm for decentralized consensus optimization,” in SIAM Journal on Optimization, vol. 25, no. 2, pp. 944–966, 2015.
- [23] M. Zhu, S. Martínez “Discrete-time dynamic average consensus,” in Automatica, vol. 46, no. 2, pp. 322–329, 2010.
- [24] C. Xi and U. A. Khan, “DEXTRA: A Fast Algorithm for Optimization Over Directed Graphs,” in IEEE Transactions on Automatic Control, vol. 62, no. 10, pp. 4980–4993, 2017.
- [25] C. Xi, R. Xin, and U. A. Khan, “ADD-OPT: Accelerated distributed directed optimization,” in IEEE Transactions on Automatic Control, vol. 63, no. 5, pp. 1329–1339, 2017.
- [26] C. Xi, V. S. Mai, R. Xin, E. H. Abed, and U. A. Khan, “Linear convergence in optimization over directed graphs with row-stochastic matrices,” in IEEE Transactions on Automatic Control, vol. 63, no. 10, pp. 3558–3565, 2018.
- [27] J. Xu, S. Zhu, Y. Soh, and L. Xie “Augmented distributed gradient methods for multi-agent optimization under uncoordinated constant stepsizes,” in Proceedings of the 54th IEEE Conference on Decision and Control (CDC), pp. 2055–2060, 2015.
- [28] L. Zhi, S. Wei, and Y. Ming “A Decentralized Proximal-Gradient Method With Network Independent Step-Sizes and Separated Convergence Rates,” in IEEE Transactions on Signal Processing, vol. 67, no. 17, pp. 4494–4506, 2019.
- [29] K. Yuan, B. Ying, X. Zhao, and A. H. Sayed “Exact diffusion for distributed optimization and learning—Part I: Algorithm development,” in IEEE Transactions on Signal Processing, vol. 67, no. 3, pp. 708–723, 2018.
- [30] A. H. Sayed, “Adaptation, Learning and Optimization over networks,” in Foundations and Trends in Machine Learning, vol. 7, no. 4-5, pp. 311–801, 2014, 10.1109.
- [31] S. Haykin, “Cognitive Dynamic Systems: Perception-action Cycle, Radar and Radio,”Cambridge University Press, New York, NY, USA. , 2012.
- [32] A. Nedic and A. Ozdaglar, “Distributed subgradient methods for multi-agent optimization,” IEEE Transactions on Automatic Control,, vol. 54, no. 1, pp. 48-61, 2009.
- [33] J. Chen and A. H. Sayed, “Distributed Pareto optimization via diffusion strategies,” IEEE J. Sel. Topics Signal Process., vol. 7, no. 2, pp. 205-220, April. 2013.
- [34] K. Yuan, Q. Ling, and W. Yin, “On the convergence of decentralized gradient descent,” SIAM Journal on optimization., vol. 26, no. 3, pp. 1835-1854, 2016.
- [35] J. C. Duchi, A. Agarwal, and M. J. Wainwright, “Dual Averaging for distributed optimization: Convergence analysis and network scaling,” IEEE Transactions on Automatic Control,, vol. 57, no. 3, pp. 592-606, 2012.
- [36] W. Shi, Q. Ling, K. Yuan, G. Wu, and W. Yin, “On the linear convergence of the ADMM in decentralized consensus optimization,” IEEE Trans. Signal Process.,, vol. 62, no. 7, pp. 1750-1761, 2014.
- [37] G. Qu and N. Li, “Harnessing smoothness to accelerate distributed optimization,” IEEE Transactions on Control of Network Systems,, vol. 5, no. 3, pp. 1245-1260, 2018.
- [38] A. Nedic, A. Olshevsky, and W. Shi, “Achieving geometric convergence for distributed optimization over time-varying graphs,” SIAM Journal on Optimization,, vol. 27, no. 4, pp. 2597-2633, 2017.
- [39] K. Scaman, F. Bach, S. Bubeck, Y. T. Lee, and L. Massoulie “Optimal algorithms for smooth and strongly convex distributed optimization in the network,” in International Conference on Machine Learning (ICML). Stockholm, Sweden, 2017, pp. 3027-3036,.
- [40] A. I. Chen, and A. Ozdaglar, “A fast distributed optimal-gradient method,” in Annual Allerton Conference on communication , Control, and Computing. Monticello, IL, USA, Oct. 2012, pp. 601-608.
- [41] G. M. Baudet, “Asynchronous iterative methods for multiprocessors,” Journal of the ACM(JACM), vol. 25, no. 2, pp. 226-244, 1978.
- [42] D. Bertsekas, “Distributed dynamic programming,” IEEE Transactions on Automatic Control, vol. 27, no. 3, pp. 610-616, 1982.
- [43] S. A. Alghunaim, M. Yan, and A. H. Sayed, “A Multi-Agent Primal-Dual Strategy for Composite Optimization over Distributed Features,” IEEE 28th European Signal Processing Conference (EUSIPCO), pp. 2095–2099, 2021.
- [44] S. A. Alghunaim, Q. Lyu, M. Yan, and A. H. Sayed, “A Multi-Agent Primal-Dual Strategy for Composite Optimization over Distributed Features,” IEEE Transactions on Signal Processing, vol. 69, pp. 5568–5579, 2021.
- [45] S. Boyd and L. Vandenberghe, “Convex Optimization.” Cambridge, U.K.: Cambridge Univ. Press, 2004.
- [46] D. Bertsekas, “Convex Analysis and optimization.” Singapore: Athena Scientific, 2003.
- [47] P. Latafat, N. M. Freris, and P. Patrinos, “A new randomized block-coordinate primal-dual proximal algorithm for distributed optimization,” IEEE Transactions on Automatic Control , vol. 64, no. 10, pp. 4050-4065, Oct. 2019.
- [48] R. Xin and U. A. Khan, “A linear algorithm for optimization over directed graphs with geometric convergence,” IEEE Control Systems Letters, vol. 2, no. 3, pp. 315–320, 2018.
- [49] D. Kempe, A. Dobra, and J. Gehrke, “Gossip-based computation of aggregate information,” in 44th Annual IEEE Symposium on Foundations of Computer Science, 2003. Proceedings., pp. 482–491, 2003.
- [50] Q. Qiu and H. Su, “Finite-Time Output Synchronization for Output-Coupled Reaction-Diffusion Neural Networks With Directed Topology,” in IEEE Transactions on Network Science and Engineering, vol. 9, no. 3, pp. 1386–1394, 2022.
- [51] P. Wan et al., “Optimal Control for Positive and Negative Information Diffusion Based on Game Theory in Online Social Networks,” in IEEE Transactions on Network Science and Engineering, vol. 10, no. 1, pp. 426–440, 2023.
Sajad Zandi completed his Master of Science degree in Electrical Engineering from Malayer University, Iran, in 2017. Since May 2024, he has been a PhD student at the School of Mechanical and Mechatronics Engineering, University of Technology Sydney (UTS), Australia. His research focuses on adaptive and statistical signal processing, cooperative learning, multi-agent networking, distributed adaptive estimation, distributed active noise control, information-theoretic data, signal processing for communications, and adaptive optimization. |
Mehdi Korki received the bachelor degree in electrical engineering from Shiraz University, Shiraz, Iran, in 2001 and the master and Ph.D. degrees in electrical engineering from Swinburne University of Technology, Melbourne, Australia, in 2012 and 2016, respectively. He is currently a Lecturer at the School of Science, Computing and Engineering Technologies, Swinburne University of Technology, Melbourne, Australia. His research interests include statistical signal processing, data privacy, wireless sensor networks and distributed signal processing. |