Multidimensional Newton-Raphson Consensus For Distributed Convex Optimization
FeedNetBack, by Progetto di Ateneo CPDA090135/09 funded by the University of Padova, and by the Italian PRIN Project New Methods and Algorithms for Identification and Adaptive Control of Technological Systems.
can be based on incremental gradient methods [5] with deterministic [6] or randomized [7] approaches, and they may use appropriate projection steps to account for possible constraints [8].
Decomposition methods instead operate by manipulating the dual problem, usually splitting it into simpler sub-tasks that require the agents to own local copies of the to-be-updated variables. Convergence to the global optimum is ensured by constraining the local variables to converge to a common value [9]. In the class of dual decomposition methods, a particularly popular strategy is the Alternating Direction Method of Multipliers (ADMM) developed in [1, pp. 253-261] and recently proposed in various distributed contexts [10], [11].
Another interesting approach, suitable only for particular optimization problems, is to use the so-called Fast-Lipschitz methods [12], [13]. These exploit particular structures of the objective functions and constraints to increase the convergence speed. Alternative distributed optimization approaches are based on heuristics like swarm optimization [14] or genetic algorithms [15]; however, their convergence and performance properties are difficult to study analytically.
Statement of contribution: here we focus on the unconstrained minimization of a sum of multidimensional convex functions, where each component of the global function is a private local cost available only to a specific agent. We thus offer a distributed algorithm that approximately operates as a Newton-Raphson minimization procedure, and then derive two approximated versions that trade off the required communication bandwidth against the convergence speed and accuracy of the results. For these strategies we provide convergence proofs and an analysis of the robustness to the initial conditions of the algorithms, under the assumptions that the local cost functions are convex and smooth and that the communication schemes are synchronous. The main algorithm is an extension of what has been proposed in [16], while the approximated versions are completely novel. We notice that communications between agents are based on classical average-consensus algorithms [17]. The offered algorithms thus inherit the good properties of consensus algorithms, namely their simplicity, their potential implementation with asynchronous communication schemes, and their ability to adapt to time-varying network topologies.
Structure of the paper: in Sec. II we formulate the problem from a mathematical point of view. In Sec. III we derive the main generic distributed algorithm, from which we derive three different and specific instances in Sections IV, V and VI. In Sec. VII we briefly discuss the properties of these algorithms, and then in Sec. VIII we show their effectiveness by means of numerical examples. Finally, in Sec. IX we draw some concluding remarks¹.

2012 American Control Conference, Fairmont Queen Elizabeth, Montréal, Canada, June 27-29, 2012. 978-1-4577-1094-0/12/$26.00 © 2012 AACC
II. PROBLEM FORMULATION
We assume that $S$ agents, each endowed with the local $N$-dimensional and strictly convex cost function $f_i : \mathbb{R}^N \to \mathbb{R}$, cooperate in the minimization of the sum

$$f(x) := \sum_{i=1}^{S} f_i(x) \qquad (1)$$

where $x := [x_1 \ \cdots \ x_N]^T$ is the generic element of $\mathbb{R}^N$.
Agents thus want to distributedly compute

$$x^* := \arg\min_x f(x) \qquad (2)$$

exploiting low-complexity distributed optimization algorithms. As in [16], we model the communication network as a graph $G = (V, E)$ whose vertexes $V = \{1, 2, \ldots, S\}$ represent the agents and whose edges $(i, j) \in E$ represent the available communication links. We assume that the graph is undirected and connected. We say that a stochastic matrix $P \in \mathbb{R}^{S \times S}$, i.e. a matrix whose elements are non-negative and s.t. $P \mathbf{1}_S = \mathbf{1}_S$, where $\mathbf{1}_S := [1 \ 1 \ \cdots \ 1]^T \in \mathbb{R}^S$, is consistent with a graph $G$ if $P_{ij} > 0$ only if $(i, j) \in E$. If $P$ is also symmetric and includes all the edges, i.e. $P_{ij} > 0$ if $(i, j) \in E$, then $\lim_{k \to \infty} P^k = \frac{1}{S} \mathbf{1}_S \mathbf{1}_S^T$. Such a matrix $P$ is often referred to as a consensus matrix.
In the following we use $x_i(k) := [x_{i,1}(k) \ \cdots \ x_{i,N}(k)]^T$ to indicate the input location of agent $i$ at time $k$, and the operator $\nabla$ to indicate differentiation w.r.t. $x$, i.e.

$$\nabla f_i(x_i(k)) := \left[ \left.\frac{\partial f_i}{\partial x_1}\right|_{x_i(k)} \cdots \ \left.\frac{\partial f_i}{\partial x_N}\right|_{x_i(k)} \right]^T \qquad (3)$$

$$\nabla^2 f_i(x_i(k)) := \left[ \left.\frac{\partial^2 f_i}{\partial x_m \partial x_n}\right|_{x_i(k)} \right]_{m,n=1}^{N}. \qquad (4)$$
In general we use the fraction bar to indicate the Hadamard division, i.e. the component-wise division of vectors $a, b \in \mathbb{R}^N$:

$$\frac{a}{b} := \left[ \frac{a_1}{b_1}, \ldots, \frac{a_N}{b_N} \right]^T. \qquad (5)$$
We use bold fonts to indicate vectorial quantities or functions whose range is vectorial, plain italic fonts to indicate scalar quantities or functions whose range is scalar, capital italic fonts to indicate matrix quantities, and capital bold fonts to indicate matrix quantities derived by stacking other matrix quantities. As in [16], to simplify the proofs we exploit the following assumption, implying that $x^*$ is unique:
Assumption 1. The local functions $f_i$ belong to $C^2$, $\forall i$, i.e. they are continuous up to their second partial derivatives; moreover, their second partial derivatives are strictly positive, bounded, and defined for all $x \in \mathbb{R}^N$. Finally, each scalar component of the global minimizer $x^*$ is finite.

¹The proofs of the proposed propositions can be found in the homonymous technical report available on the authors' webpages.

Consider

$$A_i := \begin{bmatrix} a^{(i)}_{11} & \cdots & a^{(i)}_{1M} \\ \vdots & & \vdots \\ a^{(i)}_{N1} & \cdots & a^{(i)}_{NM} \end{bmatrix}, \quad i = 1, \ldots, S,$$

to be $S$ generic $N \times M$ matrices associated to agents $1, \ldots, S$, and assume that these agents want to distributedly compute $\frac{1}{S} \sum_{i=1}^{S} A_i$ by means of the doubly stochastic communication matrix $P$. In the following sections, to indicate the whole set of single component-wise steps

$$\begin{bmatrix} a^{(1)}_{pq}(k+1) \\ \vdots \\ a^{(S)}_{pq}(k+1) \end{bmatrix} = P \begin{bmatrix} a^{(1)}_{pq}(k) \\ \vdots \\ a^{(S)}_{pq}(k) \end{bmatrix}, \quad p = 1, \ldots, N, \; q = 1, \ldots, M, \qquad (6)$$
we use the equivalent matricial notation

$$\begin{bmatrix} A_1(k+1) \\ \vdots \\ A_S(k+1) \end{bmatrix} = (P \otimes I_N) \begin{bmatrix} A_1(k) \\ \vdots \\ A_S(k) \end{bmatrix} \qquad (7)$$

where $I_N$ is the identity in $\mathbb{R}^{N \times N}$ and $\otimes$ is the Kronecker product. Notice that the notation is suited also for vectorial quantities, e.g. $A_i \in \mathbb{R}^N$.
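To make the notation concrete, the consensus update (7) can be simulated directly. The following sketch (an illustration, not code from the paper; the sizes S, N, M and the ring consensus matrix are arbitrary choices) iterates (P ⊗ I_N) on vertically stacked matrices and checks that every block converges to the average (1/S) Σ A_i.

```python
import numpy as np

S, N, M = 5, 2, 3  # number of agents and matrix dimensions (illustrative)

# Symmetric circulant consensus matrix of a ring graph.
P = np.zeros((S, S))
for i in range(S):
    P[i, i] = 0.5
    P[i, (i - 1) % S] = 0.25
    P[i, (i + 1) % S] = 0.25

rng = np.random.default_rng(0)
A = [rng.standard_normal((N, M)) for _ in range(S)]

# Stack the A_i vertically and iterate update (7): A(k+1) = (P kron I_N) A(k).
stacked = np.vstack(A)            # shape (N*S, M)
PkronI = np.kron(P, np.eye(N))    # shape (N*S, N*S)
for _ in range(200):
    stacked = PkronI @ stacked

average = sum(A) / S
# Every block A_i(k) converges to the average of the initial matrices.
print(np.allclose(stacked[:N, :], average, atol=1e-6))  # prints: True
```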
III. DISTRIBUTED MULTIDIMENSIONAL
CONSENSUS-BASED OPTIMIZATION
Assume the local cost functions to be quadratic, i.e.

$$f_i(x) = \frac{1}{2} (x - b_i)^T A_i (x - b_i)$$

where $A_i > 0$. Straightforward computations show that the unique minimizer of $f$ is given by

$$x^* = \left( \frac{1}{S} \sum_{i=1}^{S} A_i \right)^{-1} \left( \frac{1}{S} \sum_{i=1}^{S} A_i b_i \right)$$

and can thus be computed using the outputs of two average consensus algorithms. Defining the local variables

$$y_i(0) := A_i b_i \in \mathbb{R}^N, \qquad Z_i(0) := A_i \in \mathbb{R}^{N \times N}$$
and the corresponding compact forms

$$Y(k) := \begin{bmatrix} y_1(k) \\ \vdots \\ y_S(k) \end{bmatrix} \in \mathbb{R}^{NS}, \qquad Z(k) := \begin{bmatrix} Z_1(k) \\ \vdots \\ Z_S(k) \end{bmatrix} \in \mathbb{R}^{NS \times N},$$

then the algorithm

$$Y(k+1) = (P \otimes I_N)\, Y(k) \qquad (8)$$
$$Z(k+1) = (P \otimes I_N)\, Z(k) \qquad (9)$$
$$x_i(k) = (Z_i(k))^{-1} y_i(k), \quad i = 1, \ldots, S \qquad (10)$$

alternates average-consensus steps (i.e. (8) and (9), given the considerations in Sec. II-A) with local updates (i.e. (10)), and is s.t. $\lim_{k \to \infty} x_i(k) = x^*$. The element $x_i(k)$ can thus be considered the local estimate of the global minimizer $x^*$.
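As a quick numerical check of (8)-(10), the sketch below (illustrative only; the quadratic costs are randomly generated and the ring consensus matrix is an arbitrary choice) shows the local estimate of a single agent approaching the centralized minimizer x*.

```python
import numpy as np

rng = np.random.default_rng(1)
S, N = 5, 2

# Ring consensus matrix.
P = np.zeros((S, S))
for i in range(S):
    P[i, i], P[i, (i - 1) % S], P[i, (i + 1) % S] = 0.5, 0.25, 0.25

# Random quadratic costs f_i(x) = 0.5 (x - b_i)^T A_i (x - b_i), with A_i > 0.
D = [rng.standard_normal((N, N)) for _ in range(S)]
A = [Di @ Di.T + np.eye(N) for Di in D]
b = [rng.standard_normal(N) for _ in range(S)]

# Local variables y_i(0) = A_i b_i and Z_i(0) = A_i, stacked as in Sec. II.
Y = np.vstack([(Ai @ bi).reshape(N, 1) for Ai, bi in zip(A, b)])  # (NS, 1)
Z = np.vstack(A)                                                  # (NS, N)

PkronI = np.kron(P, np.eye(N))
for _ in range(300):
    Y = PkronI @ Y        # consensus step (8)
    Z = PkronI @ Z        # consensus step (9)

# Local update (10) for agent 0: x_0(k) = (Z_0(k))^{-1} y_0(k).
x0 = np.linalg.solve(Z[:N, :], Y[:N, 0])

# Centralized minimizer x* = (sum_i A_i)^{-1} (sum_i A_i b_i).
x_star = np.linalg.solve(sum(A), sum(Ai @ bi for Ai, bi in zip(A, b)))
print(np.allclose(x0, x_star, atol=1e-6))  # prints: True
```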
The average of the local estimates $\bar{x}(t) := \frac{1}{S} \sum_{i=1}^{S} x_i(t)$ approximately follows the update rule

$$\dot{\bar{x}}(t) = \varepsilon \left[ -\bar{x}(t) + \left( \frac{1}{S} \sum_{i=1}^{S} H_i(\bar{x}(t)) \right)^{-1} \left( \frac{1}{S} \sum_{i=1}^{S} g_i(\bar{x}(t)) \right) \right]$$
(see proof of Prop. 2). In the following we show that this property is appealing since, by properly choosing $g_i(k)$ and $H_i(k)$, we can obtain distributed optimization algorithms with desirable properties such as convergence to the global optimum and small communication bandwidth requirements.
IV. DISTRIBUTED MULTIDIMENSIONAL NEWTON-RAPHSON

Consider the following Alg. 2, based on the general layout given by Alg. 1. We show now how it corresponds to the multidimensional extension of the distributed scalar optimizer described in [16], and that it distributedly computes the global optimum $x^*$. Define

$$G(k) := \begin{bmatrix} g_1(k) \\ \vdots \\ g_S(k) \end{bmatrix} \in \mathbb{R}^{NS}, \qquad H(k) := \begin{bmatrix} H_1(k) \\ \vdots \\ H_S(k) \end{bmatrix} \in \mathbb{R}^{NS \times N}.$$
Algorithm 2 Distributed Newton-Raphson
Execute Alg. 1 with definitions

$$g_i(k) := \nabla^2 f_i(x_i(k)) \, x_i(k) - \nabla f_i(x_i(k)) \in \mathbb{R}^N$$
$$H_i(k) := \nabla^2 f_i(x_i(k)) \in \mathbb{R}^{N \times N}.$$
The first step is then to introduce the additional variables $V(k) = G(k-1)$ and $W(k) = H(k-1)$ and rewrite Alg. 2 as

$$\begin{aligned} V(k) &= G(k-1) \\ W(k) &= H(k-1) \\ Y(k) &= (P \otimes I_N) \left[ Y(k-1) + G(k-1) - V(k-1) \right] \\ Z(k) &= (P \otimes I_N) \left[ Z(k-1) + H(k-1) - W(k-1) \right] \\ x_i(k) &= (1 - \varepsilon)\, x_i(k-1) + \varepsilon\, (Z_i(k))^{-1} y_i(k) \end{aligned} \qquad (11)$$
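A minimal simulation of the recursion (11) can be sketched as follows. The local costs f_i, the initialization Y(0) = V(0) = G(0) and Z(0) = W(0) = H(0) with x_i(0) = 0, and the value of ε are all assumptions made for illustration, since Alg. 1 is not reproduced in full in this excerpt.

```python
import numpy as np

rng = np.random.default_rng(2)
S, N, eps = 5, 2, 0.05

# Ring consensus matrix.
P = np.zeros((S, S))
for i in range(S):
    P[i, i], P[i, (i - 1) % S], P[i, (i + 1) % S] = 0.5, 0.25, 0.25
PkronI = np.kron(P, np.eye(N))

# Hypothetical smooth, strictly convex local costs:
# f_i(x) = sum_n a_in (x_n - b_in)^2 + log cosh(x_n).
a = rng.uniform(0.5, 1.5, size=(S, N))
b = rng.uniform(-2.0, 2.0, size=(S, N))

def grad(i, x):   # gradient of f_i at x
    return 2 * a[i] * (x - b[i]) + np.tanh(x)

def hess(i, x):   # Hessian of f_i at x (diagonal for this choice of f_i)
    return np.diag(2 * a[i] + 1.0 / np.cosh(x) ** 2)

def G_of(X):      # stacked g_i(k) = hess_i x_i - grad_i
    return np.vstack([(hess(i, X[i]) @ X[i] - grad(i, X[i])).reshape(N, 1)
                      for i in range(S)])

def H_of(X):      # stacked H_i(k) = hess_i
    return np.vstack([hess(i, X[i]) for i in range(S)])

X = np.zeros((S, N))               # local estimates x_i(0) = 0
G, H = G_of(X), H_of(X)
Y, Z, V, W = G.copy(), H.copy(), G.copy(), H.copy()

for _ in range(4000):
    Gm, Hm, Vm, Wm = G, H, V, W    # values at time k-1
    V, W = Gm, Hm
    Y = PkronI @ (Y + Gm - Vm)     # consensus tracking of the g_i's
    Z = PkronI @ (Z + Hm - Wm)     # consensus tracking of the H_i's
    for i in range(S):             # local update with step size eps
        X[i] = (1 - eps) * X[i] + eps * np.linalg.solve(
            Z[i * N:(i + 1) * N, :], Y[i * N:(i + 1) * N, 0])
    G, H = G_of(X), H_of(X)

# Compare against a centralized Newton iteration on f = sum_i f_i.
x = np.zeros(N)
for _ in range(100):
    g_tot = sum(grad(i, x) for i in range(S))
    H_tot = sum(hess(i, x) for i in range(S))
    x -= np.linalg.solve(H_tot, g_tot)
print(np.allclose(X[0], x, atol=1e-3))
```

Note how the sums of the blocks of Y and Z are preserved by the doubly stochastic mixing, so at quasi-steady state each agent sees the averages of the g_i's and H_i's.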
from which it is possible to recognize the tracking of the quantities $x_i(k)$ plus the consensus step (1st to 4th rows) and the local smooth updates (5th row). (11) can be considered the Euler discretization, with time interval $T = 1$, of the continuous-time system

$$\begin{aligned} \dot{V}(t) &= -V(t) + G(t) \\ \dot{W}(t) &= -W(t) + H(t) \\ \dot{Y}(t) &= -K Y(t) + (I_{NS} - K) \left[ G(t) - V(t) \right] \\ \dot{Z}(t) &= -K Z(t) + (I_{NS} - K) \left[ H(t) - W(t) \right] \\ \dot{x}_i(t) &= \varepsilon \left[ -x_i(t) + (Z_i(t))^{-1} y_i(t) \right] \end{aligned} \qquad (12)$$
with $K := I_{NS} - (P \otimes I_N)$. It is immediate to show that $K$ is positive semidefinite, that its kernel is generated by $\mathbf{1}_{NS}$, and that its eigenvalues satisfy $0 = \lambda_1 < \mathrm{Re}[\lambda_2] \le \cdots \le \mathrm{Re}[\lambda_{NS}] < 2$, where $\mathrm{Re}[\lambda]$ indicates the real part of $\lambda$. (12) is constituted by two dynamical subsystems with different time scales, one of which is regulated by the parameter $\varepsilon$. Exploiting classical time-separation techniques [19, Chap. 11], splitting the dynamics in the two time scales and studying them separately for sufficiently small $\varepsilon$, it follows that the fast dynamics, i.e. the first four equations of (12), are s.t. $x_i(t) \approx \bar{x}(t)$, where $\bar{x}(t) := \frac{1}{S} \sum_{i=1}^{S} x_i(t)$; moreover, $\bar{x}(t)$ evolves with good approximation following the ordinary differential equation

$$\dot{\bar{x}}(t) = -\varepsilon \left( \nabla^2 f(\bar{x}(t)) \right)^{-1} \nabla f(\bar{x}(t)) \qquad (13)$$

corresponding to a continuous Newton-Raphson algorithm² that we will prove to be always convergent to the global optimum $x^*$, i.e. $\lim_{k \to +\infty} x_i(k) = x^*$ for all $i$.
V. DISTRIBUTED MULTIDIMENSIONAL JACOBI

The implementation of Alg. 2 requires agents to exchange about $O(N^2)$ scalars. This could be prohibitive in multidimensional scenarios with serious communication bandwidth constraints and large $N$. In these cases, to minimize the amount of information to be exchanged, it is meaningful to let $H_i(k)$ be not the whole Hessian matrix $\nabla^2 f_i(x_i(k))$ but only its diagonal. The corresponding algorithm, which we call Jacobi due to the underlying diagonalization process, is offered in Alg. 3. We notice that this diagonalization process has already been used in the literature, e.g., see [24], [25], even if in conjunction with different communication structures.

²Asymptotic properties of the scalar and continuous-time Newton-Raphson method can be found, e.g., in [22], [23].
Algorithm 3 Distributed Jacobi
Execute Alg. 1 with definitions

$$g_i(k) := H_i(k) \, x_i(k) - \nabla f_i(x_i(k)) \in \mathbb{R}^N$$
$$H_i(k) := \begin{bmatrix} \left.\frac{\partial^2 f_i}{\partial x_1^2}\right|_{x_i(k)} & & 0 \\ & \ddots & \\ 0 & & \left.\frac{\partial^2 f_i}{\partial x_N^2}\right|_{x_i(k)} \end{bmatrix} \in \mathbb{R}^{N \times N}.$$
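The computational saving of the Jacobi choice can be seen in isolation: with a diagonal H_i(k), the matrix inversion in the local update degenerates into the Hadamard division (5). A toy sketch (with a hypothetical quadratic cost, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(3)
N = 4
Dm = rng.standard_normal((N, N))
Q = Dm @ Dm.T + np.eye(N)   # Hessian of the hypothetical cost 0.5 x'Qx - p'x
p = rng.standard_normal(N)

x = rng.standard_normal(N)  # current local estimate x_i(k)
grad = Q @ x - p            # gradient of f_i at x_i(k)

h = np.diag(Q).copy()       # Alg. 3 keeps only the N diagonal Hessian entries
g = h * x - grad            # g_i(k) := H_i(k) x_i(k) - grad f_i(x_i(k))

# Inverting the diagonal H_i(k) is just the Hadamard division (5):
# N scalar divisions instead of an O(N^3) matrix inversion.
update = g / h
print(update.shape)         # prints: (4,)
```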
Possible interpretations of the proposed approximation are the following:
- agents perform modified second-order Taylor approximations of the local functions;
- agents choose a steepest-descent direction in a simplified norm;
- the ellipsoids corresponding to the various Hessians $\nabla^2 f_i$ are approximated with ellipsoids having axes that are parallel to the current coordinate system.

It is easy to show that this approximated strategy is not invariant w.r.t. affine transformations of the coordinates, i.e. $f_{\mathrm{new}}(x) = f(Tx)$ with $T \in \mathbb{R}^{N \times N}$ invertible, as classical Newton-Raphson algorithms instead are [26, Sec. 9.5]. It is moreover possible to prove that also Alg. 3 ensures convergence to the global optimum, i.e. to prove the following (proof in the Appendix):
Proposition 3. If Assumption 1 holds true, then there exists an $\varepsilon^* \in \mathbb{R}_+$ s.t., if $\varepsilon < \varepsilon^*$, then

$$\lim_{k \to +\infty} x_i(k) = x^* \quad \text{for all } i.$$

The analytical characterization of the convergence speed of Alg. 2 and Alg. 3 is left as future work.
VI. DISTRIBUTED MULTIDIMENSIONAL GRADIENT DESCENT

We notice now that the distributed Jacobi relieves the computational requirements of the distributed Newton-Raphson, since the inversion of $H_i(x_i(k))$ corresponds to the inversion of $N$ scalars; nonetheless, agents still have to compute the local second derivatives $\left.\frac{\partial^2 f_i}{\partial x_n^2}\right|_{x_i(k)}$. If this task is still too demanding, e.g. in cases where nodes have severe computational constraints, it is possible to redefine $H_i(k)$ in Alg. 1 so that the scheme reduces to a gradient-descent procedure, as done in the following algorithm.
Algorithm 4 Distributed gradient-descent
Execute Alg. 1 with definitions

$$g_i(k) := x_i(k) - \nabla f_i(x_i(k)) \in \mathbb{R}^N$$
$$H_i(k) := I_N \in \mathbb{R}^{N \times N}.$$
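The three instances of the generic scheme differ only in the pair (g_i(k), H_i(k)); in all cases H_i^{-1} g_i = x_i - H_i^{-1} ∇f_i, i.e. a full Newton, diagonally scaled, or plain gradient step respectively. A compact sketch of the three definitions (with a hypothetical quadratic cost, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(4)
N = 3
Dm = rng.standard_normal((N, N))
Q = Dm @ Dm.T + np.eye(N)       # Hessian of a hypothetical quadratic cost
b = rng.standard_normal(N)

def grad(x):                    # gradient of f(x) = 0.5 (x-b)^T Q (x-b)
    return Q @ (x - b)

def hess(x):                    # Hessian (constant for a quadratic)
    return Q

x = rng.standard_normal(N)      # current local estimate

choices = {
    "newton":   hess(x),                    # Alg. 2: full Hessian
    "jacobi":   np.diag(np.diag(hess(x))),  # Alg. 3: its diagonal
    "gradient": np.eye(N),                  # Alg. 4: identity
}

for name, H in choices.items():
    g = H @ x - grad(x)                 # g_i(k) := H_i(k) x_i(k) - grad f_i
    direction = np.linalg.solve(H, g)   # H_i^{-1} g_i ...
    # ... equals x_i minus the (scaled) descent step H_i^{-1} grad f_i:
    assert np.allclose(direction, x - np.linalg.solve(H, grad(x)))
print("all three pairs verified")
```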
VII. DISCUSSION ON THE PREVIOUS ALGORITHMS
The costs associated with the previously proposed strategies are summarized in Tab. I.
Algorithm          | 2     | 3    | 4
Computational cost | O(N³) | O(N) | O(N)
Communication cost | O(N²) | O(N) | O(N)
Memory cost        | O(N²) | O(N) | O(N)

TABLE I
COMPUTATIONAL, COMMUNICATION AND MEMORY COSTS OF ALGORITHMS 2, 3, AND 4 PER SINGLE UNIT AND SINGLE STEP (LINES 6 TO 13 OF ALGORITHM 1).
We notice that approximating the Hessian by neglecting its off-diagonal terms has already been proposed in centralized approaches, e.g. [27]. Intuitively, the effect of this diagonal approximation is the following: the full Newton method performs both a scaling and a rotation of the steepest-descent step, while the diagonally modified Newton method only scales the descent step in each direction. Thus, the more the directions of maximal and minimal curvature are aligned with the axes, the better the approximated method captures the curvature information and the better it performs.
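A toy illustration of this alignment effect (not from the paper): for an axis-aligned quadratic the diagonally scaled direction coincides with the Newton direction, while for a rotated one it does not.

```python
import numpy as np

def directions(Q, x):
    g = Q @ x                          # gradient of 0.5 x^T Q x
    newton = np.linalg.solve(Q, g)     # full Newton direction
    jacobi = g / np.diag(Q)            # diagonally scaled direction
    return newton, jacobi

x = np.array([1.0, 0.0])

# Axis-aligned curvature: diagonal Hessian, the two directions coincide.
Q_aligned = np.diag([1.0, 10.0])
n1, j1 = directions(Q_aligned, x)
print(np.allclose(n1, j1))             # prints: True

# Rotated curvature: strong off-diagonal terms, the directions differ.
theta = np.pi / 4
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
Q_rot = R @ Q_aligned @ R.T
n2, j2 = directions(Q_rot, x)
print(np.allclose(n2, j2))             # prints: False
```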
A final remark is that the analytic Hessian can be approximated in several ways, but in general it is necessary to consider only approximations that maintain symmetry and positive definiteness. In cases where this definiteness is lacking, or where matrices are badly conditioned, modifications are usually performed, e.g., through Cholesky factorizations [28].
VIII. NUMERICAL EXAMPLES

We consider a ring communication graph, where agents can communicate only with their left and right neighbors, and thus the symmetric circulant communication matrix

$$P = \begin{bmatrix} 0.5 & 0.25 & & & 0.25 \\ 0.25 & 0.5 & 0.25 & & \\ & \ddots & \ddots & \ddots & \\ & & 0.25 & 0.5 & 0.25 \\ 0.25 & & & 0.25 & 0.5 \end{bmatrix}. \qquad (14)$$
We consider $S = 15$, $N = 2$, and local objective functions randomly generated as

$$f_i(x) = \exp\left( (x - b_i)^T A_i (x - b_i) \right), \quad i = 1, \ldots, S,$$

where $b_i \sim [\,U[-5, 5], \; U[-5, 5]\,]^T$, $A_i = D_i D_i^T > 0$, and

$$D_i := \begin{bmatrix} d_{11} & d_{12} \\ d_{21} & d_{22} \end{bmatrix} \in \mathbb{R}^{2 \times 2}. \qquad (15)$$
We compare the performances of the previous algorithms in the following three different scenarios:

$$S_1 : \quad \begin{cases} d_{11} = d_{22} \sim U[-0.08, 0.08] + R[-1, 1] \\ d_{12} \sim U[-0.08, 0.08] + R[0.25, 0.5] \\ d_{21} \sim U[-0.08, 0.08] + R[-0.5, -0.25] \end{cases} \qquad (16)$$

where the R-distribution is defined as

$$R[c, d] := \begin{cases} c & \text{with probability } 1/2 \\ d & \text{with probability } 1/2, \end{cases}$$

i.e. the axes of each contour plot are randomly oriented in the 2-D plane.
$$S_2 : \quad \begin{cases} d_{11} \sim U[-0.08, 0.08] \\ d_{12} = d_{21} = 0 \\ d_{22} = 2\, d_{11} \end{cases} \qquad (17)$$

i.e. the axes of all the contour plots of the $f_i$ surfaces are aligned with the axes of the natural reference system.

$$S_3 : \quad \begin{cases} d_{11} \sim U[-0.08, 0.08] \\ d_{12} = d_{21} = 0.01 \\ d_{22} \sim R[0.9, 1.1] \, d_{11} \end{cases} \qquad (18)$$

i.e. the axes of each contour plot are randomly oriented along the bisector of the first and third quadrants.
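The construction $A_i = D_i D_i^T$ can be checked numerically. The sketch below samples a scenario of the $S_1$ type; since some operators in (16) did not survive extraction, the summation of the $U[-0.08, 0.08]$ perturbation with the $R$ terms is an assumption made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)

def R(c, d):
    # The R-distribution: c or d, each with probability 1/2.
    return c if rng.random() < 0.5 else d

def sample_D_scenario_1():
    # Hypothetical reading of scenario S1: small uniform perturbations
    # added to the randomly signed R terms.
    d11 = d22 = rng.uniform(-0.08, 0.08) + R(-1.0, 1.0)
    d12 = rng.uniform(-0.08, 0.08) + R(0.25, 0.5)
    d21 = rng.uniform(-0.08, 0.08) + R(-0.5, -0.25)
    return np.array([[d11, d12], [d21, d22]])

# A_i = D_i D_i^T is positive semidefinite by construction; whenever D_i is
# invertible (here always, since det D_i = d11*d22 - d12*d21 > 0), A_i > 0.
for _ in range(100):
    Di = sample_D_scenario_1()
    Ai = Di @ Di.T
    assert np.all(np.linalg.eigvalsh(Ai) > 0)
print("all sampled A_i are positive definite")
```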
The contour plots of the global cost functions $f$ generated using (16), (17) and (18), and the evolutions of the local states $x_i$ for the three algorithms, are shown in Fig. 1. We notice that Alg. 2 and Alg. 3 have qualitatively the same behavior in scenarios (16) and (17). This is because the approximation introduced in Alg. 3 is actually a good approximation of the analytical Hessians $\nabla^2 f_i(x_i(k))$. Conversely, Alg. 4 presents a remarkably slower convergence rate. Since the computational times of Alg. 3 and Alg. 4 are comparable, Alg. 3 seems to represent the best choice among all the presented solutions.
IX. CONCLUSIONS AND FUTURE WORKS

Starting from [16], we offered a multidimensional distributed convex optimization algorithm that behaves approximately as a Newton-Raphson procedure. We then proposed two approximated versions of the main algorithm to take into account the computational, communication and memory constraints that may arise in practical scenarios. We produced proofs of convergence under the assumption of dealing with smooth convex functions, and numerical simulations to compare the performances of the proposed algorithms.
Currently there are many open research directions. A first branch concerns the analytical characterization of the convergence speeds of the proposed strategies, while another one concerns the application of quasi-Newton methods, to avoid the computation of the Hessians, and the use of trust-region methods. Finally, an important future extension is to allow the strategy to be implemented in asynchronous communication frameworks.
Fig. 1. First column on the left: contour plots of the global function $f$ for scenarios $S_1$, $S_2$, $S_3$, respectively (from top to bottom). Black dots indicate the positions of the global minima $x^*$. Second, third and fourth columns: temporal evolutions of the first components of the local states $x_i$, for the case $\varepsilon = 0.25$ and $S = 15$. In particular: second column, distributed Newton-Raphson (Alg. 2); third column, distributed Jacobi (Alg. 3); fourth column, distributed gradient descent (Alg. 4). First row, scenario $S_1$; second row, scenario $S_2$; third row, scenario $S_3$. The black dashed lines indicate the first components of the global optima $x^*$. Notice that we show a larger number of time steps for the gradient descent algorithm (fourth column).
REFERENCES
[1] D. P. Bertsekas and J. N. Tsitsiklis, Parallel and Distributed Computation: Numerical Methods. Athena Scientific, 1997.
[2] D. P. Bertsekas, Network Optimization: Continuous and Discrete Models. Belmont, Massachusetts: Athena Scientific, 1998.
[3] K. C. Kiwiel, "Convergence of approximate and incremental subgradient methods for convex optimization," SIAM J. on Optim., vol. 14, no. 3, pp. 807-840, 2004.
[4] B. Johansson, "On distributed optimization in networked systems," Ph.D. dissertation, KTH Electrical Engineering, 2008.
[5] A. Nedić and D. P. Bertsekas, "Incremental subgradient methods for nondifferentiable optimization," SIAM J. on Optim., vol. 12, no. 1, pp. 109-138, 2001.
[6] D. Blatt, A. Hero, and H. Gauchman, "A convergent incremental gradient method with a constant step size," SIAM J. on Optim., vol. 18, no. 1, pp. 29-51, 2007.
[7] S. S. Ram, A. Nedić, and V. Veeravalli, "Incremental stochastic subgradient algorithms for convex optimization," SIAM J. on Optim., vol. 20, no. 2, pp. 691-717, 2009.
[8] A. Nedić, A. Ozdaglar, and P. A. Parrilo, "Constrained consensus and optimization in multi-agent networks," IEEE TAC, vol. 55, no. 4, pp. 922-938, 2010.
[9] L. Xiao, M. Johansson, and S. Boyd, "Simultaneous routing and resource allocation via dual decomposition," IEEE Trans. on Comm., vol. 52, no. 7, pp. 1136-1144, 2004.
[10] I. D. Schizas, A. Ribeiro, and G. B. Giannakis, "Consensus in ad hoc WSNs with noisy links - part I: Distributed estimation of deterministic signals," IEEE Trans. on Sig. Proc., vol. 56, pp. 350-364, 2008.
[11] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, "Distributed optimization and statistical learning via the alternating direction method of multipliers," Stanford Statistics Dept., Tech. Rep., 2010.
[12] C. Fischione, "F-Lipschitz optimization with Wireless Sensor Networks applications," IEEE TAC, to appear, 2011.
[13] C. Fischione and U. Jönsson, "Fast-Lipschitz optimization with Wireless Sensor Networks applications," in IPSN, 2011.
[14] J. Van Ast, R. Babuška, and B. De Schutter, "Particle swarms in optimization and control," in IFAC World Congress, 2008.
[15] E. Alba and J. M. Troya, "A survey of parallel distributed genetic algorithms," Complexity, vol. 4, no. 4, pp. 31-52, 1999.
[16] F. Zanella, D. Varagnolo, A. Cenedese, G. Pillonetto, and L. Schenato, "Newton-Raphson consensus for distributed convex optimization," in IEEE Conference on Decision and Control, 2011.
[17] F. Garin and L. Schenato, Networked Control Systems. Springer, 2011, ch. "A survey on distributed estimation and control applications using linear consensus algorithms," pp. 75-107.
[18] P. Kokotović, H. K. Khalil, and J. O'Reilly, Singular Perturbation Methods in Control: Analysis and Design, ser. Classics in Applied Mathematics. SIAM, 1999, no. 25.
[19] H. K. Khalil, Nonlinear Systems, 3rd ed. Prentice Hall, 2001.
[20] Y. C. Ho, L. Servi, and R. Suri, "A class of center-free resource allocation algorithms," Large Scale Systems, vol. 1, pp. 51-62, 1980.
[21] L. Xiao and S. Boyd, "Optimal scaling of a gradient method for distributed resource allocation," J. Opt. Theory and Applications, vol. 129, no. 3, pp. 469-488, 2006.
[22] K. Tanabe, "Global analysis of continuous analogues of the Levenberg-Marquardt and Newton-Raphson methods for solving nonlinear equations," Inst. of Stat. Math., vol. 37, pp. 189-203, 1985.
[23] R. Hauser and J. Nedić, "The continuous Newton-Raphson method can look ahead," SIAM J. on Opt., vol. 15, pp. 915-925, 2005.
[24] S. Athuraliya and S. H. Low, "Optimization flow control with Newton-like algorithm," Telecommunication Systems, vol. 15, no. 3-4, pp. 345-358, 2000.
[25] M. Zargham, A. Ribeiro, A. Ozdaglar, and A. Jadbabaie, "Accelerated dual descent for network optimization," in ACC, 2011.
[26] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2004.
[27] S. Becker and Y. LeCun, "Improving the convergence of back-propagation learning with second order models," University of Toronto, Tech. Rep. CRG-TR-88-5, September 1988.
[28] G. H. Golub and C. F. Van Loan, Matrix Computations. Johns Hopkins University Press, 1996, sec. 4.2.