MULTI-LEVEL DIFFUSION ADAPTIVE NETWORKS
Federico S. Cattivelli
Ali H. Sayed
Department of Electrical Engineering
University of California, Los Angeles
ABSTRACT
We study the problem of distributed estimation, where a set of
nodes are required to collectively estimate some parameter of interest from their measurements. Diffusion algorithms have been
shown to achieve good performance and increased robustness, and are
amenable to real-time implementation. In this work we focus on
multi-level diffusion algorithms, where a network running a diffusion algorithm is enhanced by adding special nodes that can perform
different processing. These special nodes form a second network
where a second diffusion algorithm is implemented. We illustrate
the concept using diffusion LMS, provide performance analysis for
multi-level collaboration and present simulation results showing
improved performance over conventional diffusion.
Index Terms— Distributed estimation, cooperation, diffusion,
adaptive network
1. INTRODUCTION
We study the problem of distributed estimation, where a set of nodes
are required to collectively estimate some parameter of interest from
their measurements. Consider a set of N nodes distributed over some
region. At every time instant i, every node k takes a scalar measurement dk (i) of some random process dk (i) and a 1 × M regression
vector, uk,i , corresponding to a realization of a random process uk,i
which is correlated with dk (i). The objective is for every node in
the network to use the data {dk (i), uk,i } to estimate some deterministic unknown vector wo . Specifically, we seek the optimal linear
estimator wo that minimizes the following global cost function:
    J^{glob}(w) = Σ_{k=1}^{N} E |d_k(i) − u_{k,i} w|²        (1)
where E denotes the expectation operator.
In a centralized solution to the problem, every node transmits its
data {dk (i), uk,i } to a central fusion center for processing, requiring
large amounts of energy for communication. We refer to the solution
obtained by a fusion center as the global solution. An adaptive global
LMS solution to (1) can be obtained as follows [1, 2]:
    w_i = w_{i−1} + µ Σ_{k=1}^{N} u∗_{k,i} (d_k(i) − u_{k,i} w_{i−1})        (2)
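As a concrete illustration of (2), the following Python sketch runs the global LMS recursion on synthetic data; the data-generation model, noise levels and all variable names are assumptions made for this example, not specifications from the paper.

    import numpy as np

    rng = np.random.default_rng(0)
    N, M, mu, iters = 20, 2, 0.01, 2000
    w_o = rng.standard_normal((M, 1))            # unknown vector w^o
    sigma2_v = 0.01 * np.ones(N)                 # per-node noise variances (assumed)

    w = np.zeros((M, 1))                         # global estimate w_i
    for i in range(iters):
        update = np.zeros((M, 1))
        for k in range(N):
            u = rng.standard_normal((1, M))      # regressor u_{k,i}
            d = float(u @ w_o) + np.sqrt(sigma2_v[k]) * rng.standard_normal()   # d_k(i)
            update += u.conj().T * (d - float(u @ w))   # u*_{k,i}(d_k(i) - u_{k,i} w_{i-1})
        w = w + mu * update                      # global LMS update (2)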
Distributed implementations avoid the use of a fusion center and
distribute the processing and communication across the entire network. Two well-known classes of distributed estimation methods are
incremental and diffusion. Diffusion algorithms are amenable to
real-time implementation, robust to node and link failure, and
obtain good performance in terms of estimation accuracy. Several distributed estimation algorithms have been proposed, including incremental LMS and RLS [1], diffusion LMS [3], diffusion RLS [4], and diffusion Kalman filtering [5] and smoothing [6]. Algorithms based on consensus have been proposed in [7, 8].

This work was supported in part by NSF grants ECS-0601266 and ECS-0725441. Authors' emails: {fcattiv, sayed}@ee.ucla.edu.

Fig. 1. Disjoint adaptive networks connected via supernodes.
In this work, we study multi-level diffusion algorithms, whereby
an adaptive network is enhanced with a set of nodes with special
abilities. These new nodes form a second network that also runs a
diffusion algorithm, creating an adaptive network with two levels of
adaptivity. We focus on the adapt-then-combine (ATC) version of
diffusion LMS [9], though the results can be extended to other diffusion algorithms. We start by providing some motivation for multi-level diffusion, then propose a multi-level diffusion algorithm,
provide performance analysis for the case of diffusion LMS with
small step-sizes, and show through simulation how adding a second
layer of diffusion improves the overall performance of the network.
2. MOTIVATION
In our previous work we considered diffusion algorithms whereby
nodes communicate with their neighbors in an isotropic manner.
These algorithms usually include two updates: an adaptive update
where nodes use their individual measurements to adapt their current estimates based on an adaptive filtering algorithm such as LMS
or RLS, followed by a combination update where nodes combine
the estimates from their neighbors to obtain the new estimate. This
type of diffusion algorithm is fully distributed. If a node were to
fail, the remaining nodes would continue operating, and the adaptive
network would continue working.
Consider now a situation where the adaptive network is divided
into disconnected sub-networks, as shown in Fig. 1. This situation
could arise in an application where the area to be sensed is too large,
or where we wish to sense events located at points spatially far away
from each other. One approach could be to scatter different adaptive
networks over the area, each running a diffusion algorithm, say diffusion LMS. A question arises then about how to fuse the estimates
from these sub-networks such that they benefit from interacting with
each other. One solution would be to equip each of the sub-networks with a supernode, which is able to communicate with the
supernodes from the other networks (supernodes are shown as red
squares in Fig. 1). This communication may be done through a special link or through multi-hop transmissions. Then the supernodes
would form a new network, and we could run a second diffusion algorithm on the supernodes. Thus, the supernodes would become a
new adaptive network as shown by the red dashed links in Fig. 1.
Another scenario where multi-level diffusion algorithms could
be useful is the case where we have a large adaptive network, and
we want to improve its convergence speed. When an event occurs
on one end of the network, it will take a number of iterations for this
event to propagate. Again we can think of equipping the network
with supernodes capable of communicating through larger distances.
An example network is shown in Fig. 2. Events happening on one
end of the network would propagate faster through the supernodes.
Thus, multi-level diffusion schemes are attractive in situations
where the original adaptive network needs to be enhanced. It is important to note that multi-level diffusion algorithms should preserve
the property of being fully distributed. If one of the supernodes fails,
the network should be able to keep functioning without severe impact on the performance.
3. MULTI-LEVEL DIFFUSION
A multi-level diffusion algorithm consisting of two levels is discussed next. The algorithm gives special privileges to a subset of
the nodes which we call supernodes. Thus, there will be a set of
normal nodes running a diffusion algorithm, and a second level of
diffusion running on the supernodes. Let S denote the set of nodes
that are supernodes. Let N_k denote the neighborhood of node k, i.e., the set of nodes that are connected to node k, including k itself, and let N_k^{(1)} denote the set of supernodes connected to supernode k, including k itself. Consider matrices A, C and A^{(1)} with individual non-negative real entries a_{l,k}, c_{l,k} and a^{(1)}_{l,k}, respectively. These matrices satisfy

    c_{l,k} = a_{l,k} = 0 if l ∉ N_k,      1∗ A = 1∗,   1∗ C = 1∗,   C1 = 1        (3)

    a^{(1)}_{l,k} = 0 if l ∉ N_k^{(1)},    1∗ A^{(1)} = 1∗        (4)

where 1 denotes the N × 1 vector with unit entries.
In a multi-level diffusion scheme, normal nodes (i.e., those that
are not supernodes) will perform exactly the same updates as in a
diffusion algorithm. In this case, we focus on the ATC diffusion
LMS algorithm of [9], namely:
For every node k such that k ∉ S:

    ψ_{k,i} = w_{k,i−1} + µ_k Σ_{l∈N_k} c_{l,k} u∗_{l,i} (d_l(i) − u_{l,i} w_{k,i−1})
                                                                                        (5)
    w_{k,i} = Σ_{l∈N_k} a_{l,k} ψ_{l,i}
The first and second equations of (5) are known as the adaptation
and combination updates, respectively. The matrix C allows nodes
to use measurements from their neighbors in the update, though for
a choice C = I, no exchange of measurements is needed. In the
latter case, at every time instant i, every node k updates the estimate
in three steps. First, every node adapts its current estimate using its
individual measurements {dk (i), uk,i } to obtain ψk,i . Second, all
nodes exchange their pre-estimates ψk,i with their neighbors. Finally, every node combines (or averages) the pre-estimates to obtain
the new estimate wk,i .
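To make the three steps concrete, here is a minimal Python sketch of one ATC diffusion LMS iteration (5) with C = I; the data containers (per-node estimates, neighbor lists, combination matrix) are assumptions of this example.

    import numpy as np

    def atc_diffusion_step(w_prev, d, U, neighbors, A, mu):
        """One ATC diffusion LMS iteration (5) with C = I.
        w_prev: N x M estimates w_{k,i-1}; d: length-N measurements d_k(i);
        U: N x M regressors u_{k,i}; neighbors[k]: nodes in N_k (including k);
        A: N x N combination matrix whose columns sum to one; mu: step-sizes."""
        N, _ = w_prev.shape
        psi = np.empty_like(w_prev)
        for k in range(N):
            # Adaptation: psi_{k,i} = w_{k,i-1} + mu_k u*_{k,i} (d_k(i) - u_{k,i} w_{k,i-1})
            err = d[k] - U[k] @ w_prev[k]
            psi[k] = w_prev[k] + mu[k] * err * U[k].conj()
        w_new = np.empty_like(w_prev)
        for k in range(N):
            # Combination: w_{k,i} = sum_{l in N_k} a_{l,k} psi_{l,i}
            w_new[k] = sum(A[l, k] * psi[l] for l in neighbors[k])
        return w_new, psi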
The fact that normal nodes perform the same update as in a conventional diffusion algorithm adds robustness to the adaptive network. Normal nodes do not need to be aware that supernodes are
present in the network, and will treat them as normal nodes. Thus,
even if all the supernodes were to fail, this would be transparent to
the rest of the nodes, and the adaptive network would keep functioning as a conventional single-level network.
For the supernodes, k ∈ S, we adopt a diffusion mechanism in
three steps as follows:
For every node k such that k ∈ S:

    φ_{k,i} = a_{k,k} w_{k,i−1} + Σ_{l∈N_k, l≠k} a_{l,k} ψ_{l,i}

    ϕ_{k,i} = φ_{k,i} + µ_k Σ_{l∈N_k} c_{l,k} u∗_{l,i} (d_l(i) − u_{l,i} φ_{k,i})        (6)

    w_{k,i} = ψ_{k,i} = Σ_{l∈N_k^{(1)}} a^{(1)}_{l,k} ϕ_{l,i}
After the adaptive update of the normal nodes in (5), these nodes will
transmit their pre-estimates ψk,i to their neighbors. A supernode will
receive these pre-estimates from its neighbors and will perform an
initial combination step using these pre-estimates and its previous
estimate wk,i−1 , to obtain φk,i . Then it will run an adaptive step
to obtain ϕ_{k,i}. In the third step, supernodes will exchange ϕ_{k,i} with their neighboring supernodes N_k^{(1)}, and will combine them to obtain w_{k,i}. The neighbors N_k of the supernode k will receive this estimate w_{k,i} = ψ_{k,i} and will use it in the combination step of (5).
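A corresponding sketch of the supernode update (6), again with C = I; it reuses the containers of the previous sketch, and the split into two helper functions is an illustrative choice rather than something prescribed by the paper.

    import numpy as np

    def supernode_adapt(k, w_prev, psi, d, U, neighbors, A, mu):
        """Steps 1-2 of (6) for supernode k, with C = I."""
        # Step 1: combine the neighbors' pre-estimates with the previous estimate
        phi = A[k, k] * w_prev[k] + sum(A[l, k] * psi[l] for l in neighbors[k] if l != k)
        # Step 2: adapt using the supernode's own data
        err = d[k] - U[k] @ phi
        return phi + mu[k] * err * U[k].conj()          # varphi_{k,i}

    def supernode_combine(k, varphi, super_neighbors, A_super):
        """Step 3 of (6): w_{k,i} = psi_{k,i} = sum over N_k^(1) of a^(1)_{l,k} varphi_{l,i}."""
        return sum(A_super[l, k] * varphi[l] for l in super_neighbors[k])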
4. PERFORMANCE ANALYSIS
In this section we analyze the performance of the multi-level diffusion LMS algorithm. The analysis is similar to the one presented in
[9], but the results are more general and include the ones presented
in [9] as a special case.
In what follows we consider the estimates w k,i to be random,
and analyze their performance in terms of their expected behavior.
Consider a diffusion LMS update consisting of two adaptive steps
and two combination steps as follows:
    ψ_{k,i} = w_{k,i−1} + µ_{1,k} Σ_{l=1}^{N} c_{1,l,k} u∗_{l,i} [d_l(i) − u_{l,i} w_{k,i−1}]

    φ_{k,i} = Σ_{l=1}^{N} a_{1,l,k} ψ_{l,i}
                                                                                        (7)
    ϕ_{k,i} = φ_{k,i} + µ_{2,k} Σ_{l=1}^{N} c_{2,l,k} u∗_{l,i} [d_l(i) − u_{l,i} φ_{k,i}]

    w_{k,i} = Σ_{l=1}^{N} a_{2,l,k} ϕ_{l,i}
where the coefficients aj,l,k and cj,l,k are real, non-negative, and
correspond to the {l, k} entries of matrices Aj and Cj , respectively,
for j = 1, 2, and the columns of Aj add up to one. The step-sizes
µ1,k and µ2,k correspond to node k. Eq. (7) can be specialized
to the multi-level diffusion LMS algorithm (5)-(6) by appropriately
selecting these matrices; see (20)-(23).
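For reference, one iteration of the general recursion (7) can be written as below; all container shapes are assumptions of this sketch, and setting C_1, A_1, C_2, A_2 as in (20)-(23) recovers the multi-level algorithm (5)-(6).

    import numpy as np

    def two_level_step(w_prev, d, U, A1, A2, C1, C2, mu1, mu2):
        """One iteration of (7). w_prev, U: N x M arrays; d, mu1, mu2: length-N;
        A1, A2, C1, C2: N x N matrices with entries a_{j,l,k} and c_{j,l,k}."""
        N, _ = w_prev.shape
        psi = np.empty_like(w_prev)
        for k in range(N):        # first adaptation step
            g = sum(C1[l, k] * (d[l] - U[l] @ w_prev[k]) * U[l].conj() for l in range(N))
            psi[k] = w_prev[k] + mu1[k] * g
        # first combination step
        phi = np.array([sum(A1[l, k] * psi[l] for l in range(N)) for k in range(N)])
        varphi = np.empty_like(w_prev)
        for k in range(N):        # second adaptation step
            g = sum(C2[l, k] * (d[l] - U[l] @ phi[k]) * U[l].conj() for l in range(N))
            varphi[k] = phi[k] + mu2[k] * g
        # second combination step
        return np.array([sum(A2[l, k] * varphi[l] for l in range(N)) for k in range(N)])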
As in [9], we introduce the following assumptions:
• The measurements are related to the unknown vector as follows:

      d_k(i) = u_{k,i} w^o + v_k(i)        (8)

  where v_k(i) is a zero-mean random variable with variance σ²_{v,k}, independent of u_{k,i} for all k and i, and independent of v_l(j) for l ≠ k or i ≠ j.
• All regressors u_{k,i} are spatially and temporally independent with covariance matrices R_{u,k}.
• The step-sizes {µ_{j,k}}_{k=1}^{N} are sufficiently small (see [9] for details).
We define the error quantities w̃ k,i = wo − w k,i , ψ̃ k,i = wo −
ψ k,i , φ̃k,i = wo − φk,i and ϕ̃k,i = wo − ϕk,i , and the global
vectors:
    w̃_i = col{w̃_{1,i} , . . . , w̃_{N,i}},    ψ̃_i = col{ψ̃_{1,i} , . . . , ψ̃_{N,i}},
    φ̃_i = col{φ̃_{1,i} , . . . , φ̃_{N,i}},    ϕ̃_i = col{ϕ̃_{1,i} , . . . , ϕ̃_{N,i}}
We introduce the matrices below for j = 1, 2:

    ℳ_j = diag{µ_{j,1} I_M , . . . , µ_{j,N} I_M},    𝒜_j = A_j ⊗ I_M,    𝒞_j = C_j ⊗ I_M        (9)

    𝒟_{j,i} = diag{ Σ_{l=1}^{N} c_{j,l,1} u∗_{l,i} u_{l,i} , . . . , Σ_{l=1}^{N} c_{j,l,N} u∗_{l,i} u_{l,i} },
    G_i = col{ u∗_{1,i} v_1(i) , . . . , u∗_{N,i} v_N(i) }        (10)
Then from (7) and the linear model assumption we have:

    ψ̃_i = w̃_{i−1} − ℳ_1 [𝒟_{1,i} w̃_{i−1} + 𝒞_1^T G_i]
    φ̃_i = 𝒜_1^T ψ̃_i
    ϕ̃_i = φ̃_i − ℳ_2 [𝒟_{2,i} φ̃_i + 𝒞_2^T G_i]
    w̃_i = 𝒜_2^T ϕ̃_i

or, equivalently,

    w̃_i = 𝒜_2^T [I − ℳ_2 𝒟_{2,i}] 𝒜_1^T [I − ℳ_1 𝒟_{1,i}] w̃_{i−1} −
           𝒜_2^T [I − ℳ_2 𝒟_{2,i}] 𝒜_1^T ℳ_1 𝒞_1^T G_i − 𝒜_2^T ℳ_2 𝒞_2^T G_i        (11)

Using the assumption that the step-sizes are small for all filters, we drop the terms that include products of step-sizes from (11) to obtain

    w̃_i ≈ 𝒜_2^T [𝒜_1^T − ℳ_2 𝒟_{2,i} 𝒜_1^T − 𝒜_1^T ℳ_1 𝒟_{1,i}] w̃_{i−1} −
           𝒜_2^T 𝒜_1^T ℳ_1 𝒞_1^T G_i − 𝒜_2^T ℳ_2 𝒞_2^T G_i        (12)

Moreover, for j = 1, 2, let

    𝒟_j ≜ E 𝒟_{j,i} = diag{ Σ_{l=1}^{N} c_{j,l,1} R_{u,l} , . . . , Σ_{l=1}^{N} c_{j,l,N} R_{u,l} },
    G ≜ E[G_i G_i^∗] = diag{ σ²_{v,1} R_{u,1} , . . . , σ²_{v,N} R_{u,N} }        (13)

We follow the energy conservation analysis of [10]. From the independence of the regressors u_{k,i} we obtain that 𝒟_{j,i} is independent of w̃_{i−1}. Evaluating the weighted norm of w̃_i in (12) we obtain:

    E ||w̃_i||²_Σ = E ||w̃_{i−1}||²_{Σ′} + Tr[Σ K G K^∗]        (14)

where Σ is any Hermitian positive-definite matrix and

    Σ′ ≈ 𝒜_1 𝒜_2 Σ 𝒜_2^T 𝒜_1^T − 𝒜_1 𝒜_2 Σ 𝒜_2^T ℳ_2 𝒟_2 𝒜_1^T − 𝒜_1 𝒟_2^∗ ℳ_2 𝒜_2 Σ 𝒜_2^T 𝒜_1^T −
         𝒜_1 𝒜_2 Σ 𝒜_2^T 𝒜_1^T ℳ_1 𝒟_1 − 𝒟_1^∗ ℳ_1 𝒜_1 𝒜_2 Σ 𝒜_2^T 𝒜_1^T        (15)

    K = 𝒜_2^T 𝒜_1^T ℳ_1 𝒞_1^T + 𝒜_2^T ℳ_2 𝒞_2^T

where again we ignored terms including products of step-sizes.

Let σ = vec(Σ) and Σ = vec^{−1}(σ), where the vec(·) notation stacks the columns of the matrix argument on top of each other. We also use the notation ||w̃||²_σ to denote ||w̃||²_Σ. Using the Kronecker product property vec(PΣQ) = (Q^T ⊗ P) vec(Σ), we arrive at

    σ′ ≜ vec(Σ′) = F σ

where

    F = (𝒜_1 ⊗ 𝒜_1)(𝒜_2 ⊗ 𝒜_2) − 𝒜_2 ⊗ (𝒟_2^T ℳ_2 𝒜_2) − (𝒟_2^T ℳ_2 𝒜_2) ⊗ 𝒜_2 −
        (𝒟_1^T ℳ_1 𝒜_1 𝒜_2) ⊗ (𝒜_1 𝒜_2) − (𝒜_1 𝒜_2) ⊗ (𝒟_1^T ℳ_1 𝒜_1 𝒜_2)        (16)

Then, using the result that Tr(ΣX) = vec(X^T)^T σ, we arrive at

    E ||w̃_∞||²_{(I−F)σ} = [vec(K G^T K^T)]^T σ        (17)

The MSD at node k can be obtained by weighting E ||w̃_∞||² with a block matrix that has an identity matrix at block {k, k} and zeros elsewhere. Let us denote the vectorized version of this matrix by q_k, that is, q_k = vec(diag(e_k) ⊗ I_M). Then the MSD becomes:

    MSD_k = E ||w̃_∞||²_{q_k} = [vec(K G^T K^T)]^T (I − F)^{−1} q_k        (18)

The EMSE at node k is obtained by weighting E ||w̃_∞||² with a block matrix that has R_{u,k} at block {k, k} and zeros elsewhere, that is, by selecting σ such that (I − F)σ = r_k = vec(diag(e_k) ⊗ R_{u,k}). Then the EMSE becomes:

    EMSE_k = E ||w̃_∞||²_{r_k} = [vec(K G^T K^T)]^T (I − F)^{−1} r_k        (19)

Note that if we select ℳ_1 = 0, we obtain the same result as in [9], and therefore the case presented here is more general. The network MSD and EMSE are defined as the average MSD and EMSE, respectively, across all nodes in the network.
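The steady-state expressions (18)-(19) lend themselves to direct numerical evaluation once the block quantities of (9)-(13) are formed. The following Python sketch does this under the stated assumptions; matrix and variable names are chosen for the example, and since F has dimension (NM)² the sketch is practical only for small networks.

    import numpy as np

    def theoretical_msd_emse(A1, A2, C1, C2, mu1, mu2, Ru, sigma2_v):
        """Evaluate MSD_k (18) and EMSE_k (19) for the recursion (7).
        A1, A2, C1, C2: N x N; Ru: list of M x M covariances R_{u,k};
        mu1, mu2, sigma2_v: length-N arrays."""
        N, M = len(Ru), Ru[0].shape[0]
        I = np.eye(M)
        blk = lambda B: np.block([[B[i] if i == j else np.zeros((M, M))
                                   for j in range(N)] for i in range(N)])
        M1, M2 = blk([mu1[k] * I for k in range(N)]), blk([mu2[k] * I for k in range(N)])
        cA1, cA2 = np.kron(A1, I), np.kron(A2, I)           # block matrices of (9)
        cC1, cC2 = np.kron(C1, I), np.kron(C2, I)
        D1 = blk([sum(C1[l, k] * Ru[l] for l in range(N)) for k in range(N)])   # (13)
        D2 = blk([sum(C2[l, k] * Ru[l] for l in range(N)) for k in range(N)])
        G = blk([sigma2_v[k] * Ru[k] for k in range(N)])
        K = cA2.T @ cA1.T @ M1 @ cC1.T + cA2.T @ M2 @ cC2.T        # K defined after (15)
        F = (np.kron(cA1, cA1) @ np.kron(cA2, cA2)                  # (16)
             - np.kron(cA2, D2.T @ M2 @ cA2) - np.kron(D2.T @ M2 @ cA2, cA2)
             - np.kron(D1.T @ M1 @ cA1 @ cA2, cA1 @ cA2)
             - np.kron(cA1 @ cA2, D1.T @ M1 @ cA1 @ cA2))
        vecKGK = (K @ G.T @ K.T).flatten(order='F')         # vec(K G^T K^T)
        IFinv = np.linalg.inv(np.eye(F.shape[0]) - F)
        msd, emse = np.zeros(N), np.zeros(N)
        for k in range(N):
            ek = np.zeros((N, N)); ek[k, k] = 1.0
            msd[k] = vecKGK @ IFinv @ np.kron(ek, I).flatten(order='F')        # (18)
            emse[k] = vecKGK @ IFinv @ np.kron(ek, Ru[k]).flatten(order='F')   # (19)
        return msd, emse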
In order to apply the above analysis to the multi-level diffusion
LMS algorithm (5)-(6), we need to appropriately select the matrices
A1 , A2 , C1 and C2 in (7). Let γ denote a vector of size N with unity
entries at position k if k ∈ S, and zeros elsewhere. Then we have
    C_1 = C · diag(1 − γ)                                        (20)
    A_1 = A · diag(γ) + I_N · diag(1 − γ)                        (21)
    C_2 = C · diag(γ)                                            (22)
    A_2 = A^{(1)} [I_N · diag(γ) + A · diag(1 − γ)]              (23)
Eq. (20) indicates that all nodes that are not supernodes will update
their estimates using the adaptive update. Eq. (21) indicates that
all the supernodes will combine the estimates from their neighbors,
whereas the normal nodes will not update their estimates. Eq. (22)
represents the step where the supernodes perform the adaptive update on the data from their neighbors. Finally, Eq. (23) represents
the steps where the supernodes combine the estimates from their supernode neighbors (via A(1) ), combined with the step where all normal nodes combine the estimates from their neighbors (via A).
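A small sketch of how (20)-(23) can be assembled; A, C, A^{(1)} and the indicator vector γ are assumed to be given as N x N arrays and a 0/1 vector.

    import numpy as np

    def multilevel_matrices(A, C, A_super, gamma):
        """Build C1, A1, C2, A2 of (20)-(23) from A, C, A^(1) and gamma."""
        N = len(gamma)
        Dg, Dn = np.diag(gamma), np.diag(1.0 - gamma)
        C1 = C @ Dn                         # (20): only normal nodes adapt in step 1
        A1 = A @ Dg + Dn                    # (21): supernodes combine, normal nodes hold
        C2 = C @ Dg                         # (22): only supernodes adapt in step 2
        A2 = A_super @ (Dg + A @ Dn)        # (23): supernode-level combination,
                                            #       normal nodes combine via A
        return C1, A1, C2, A2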
5. SIMULATIONS
Fig. 2. Network topology (top), noise variances σ²_{v,k} (bottom, left) and trace of regressor covariances Tr(R_{u,k}) (bottom, right).

In order to illustrate the performance of the multi-level diffusion algorithms, we present a simulation example in Figs. 2-4. We use a network with N = 30 nodes and a topology as in Fig. 2, where the supernodes are denoted by squares. Supernode neighbors are defined as those supernodes that are three or fewer hops away from each other. The regressors have size M = 2, are zero-mean Gaussian, independent in time and space, and have covariance matrices R_{u,k}. The background noise power is denoted by σ²_{v,k}. Fig. 2 also shows σ²_{v,k} and Tr(R_{u,k}) for every node k. The results were averaged over a total of 100 experiments of duration 200 iterations each. The unknown w^o was chosen randomly at the beginning of each experiment. The step-size µ_k = 0.05 is constant for all nodes and algorithms (except for one of the multi-level diffusion algorithms, as discussed next). For the diffusion algorithms, relative-degree weights [4] are used for the matrix A and for the matrix A^{(1)}, with the exception that supernodes
are assigned ten times higher degree than their actual degree in order to compute the weights. In this way we give more weight to the
estimates of the supernodes. We also used C = I in all cases.
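The weighting rule just described can be implemented as below; the paper does not give the construction in code form, so the details (adjacency representation, the factor of ten applied to supernode degrees) are spelled out here as assumptions consistent with the text.

    import numpy as np

    def relative_degree_weights(adjacency, gamma, boost=10.0):
        """Relative-degree combination weights [4] with inflated supernode degrees.
        adjacency: symmetric 0/1 N x N matrix (no self-loops); gamma: 0/1 supernode
        indicator. Each column of the returned matrix sums to one."""
        N = adjacency.shape[0]
        nbhd = adjacency + np.eye(N)                  # neighborhoods include the node itself
        deg = nbhd.sum(axis=1)                        # node degrees
        deg_eff = np.where(gamma > 0, boost * deg, deg)   # supernodes count ten times more
        A = np.zeros((N, N))
        for k in range(N):
            A[:, k] = np.where(nbhd[:, k] > 0, deg_eff, 0.0)
            A[:, k] /= A[:, k].sum()                  # normalize so that 1* A = 1*
        return A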
We compare five algorithms. The algorithm denoted “No cooperation” corresponds to the case where every node runs an LMS
algorithm on its data and does not communicate at all with its neighbors. The algorithm denoted “Global LMS” is the global solution
(2). “ATC diff. LMS” refers to the algorithm of [9], i.e., (5) with C = I.
“Multi-level ATC diff. LMS 1” and “Multi-level ATC diff. LMS 2” denote the multi-level diffusion LMS algorithm (5)-(6). The former
uses µ_k = 0.0083 if k is a supernode and µ_k = 0.05
otherwise, whereas the latter uses a constant step-size µ_k = 0.05.
Fig. 3 shows the learning curves for the different LMS algorithms in terms of the network EMSE. We can observe that the multi-level algorithms outperform the conventional diffusion LMS algorithm. Moreover, the multi-level algorithm that keeps the constant step-size attains comparable steady-state performance, yet converges much faster, whereas
the multi-level algorithm with a smaller step-size at the supernode level attains a comparable convergence rate, with improved
steady-state performance. The steady-state is close to the one offered by the global solution. Fig. 4 shows the steady-state network
EMSE for the same algorithms, showing agreement with the theoretical results from expressions (18) and (19). The steady-state values
are obtained by averaging over 40 time samples after convergence.
Fig. 3. Transient network EMSE (top) and MSD (bottom) for LMS without cooperation, CTA and ATC diffusion LMS, and global LMS.

Fig. 4. Steady-state performance of different algorithms.
6. CONCLUSIONS
We proposed a cooperation scheme, termed multi-level diffusion,
in which nodes with enhanced capabilities are introduced into the network. We showed through theory and simulation that this method
improves both the convergence and steady-state performance over
conventional diffusion methods, at the expense of requiring special
provisions to accommodate communication between the supernodes.
7. REFERENCES
[1] A. H. Sayed and C. G. Lopes, “Adaptive processing over distributed
networks,” IEICE Trans. on Fund. of Electronics, Communications and
Computer Sciences, vol. E90-A, no. 8, pp. 1504–1510, August 2007.
[2] A. H. Sayed and F. Cattivelli, “Distributed adaptive learning mechanisms,” Handbook on Array Processing and Sensor Networks, S.
Haykin and K. J. Ray Liu, editors, Wiley, NJ, 2009.
[3] C. G. Lopes and A. H. Sayed, “Diffusion least-mean squares over adaptive networks: Formulation and performance analysis,” IEEE Trans. on
Signal Processing, vol. 56, no. 7, pp. 3122–3136, July 2008.
[4] F. S. Cattivelli, C. G. Lopes, and A. H. Sayed, “Diffusion recursive
least-squares for distributed estimation over adaptive networks,” IEEE
Trans. on Signal Processing, vol. 56, no. 5, pp. 1865–1877, May 2008.
[5] F. S. Cattivelli, C. G. Lopes, and A. H. Sayed, “Diffusion strategies for
distributed Kalman filtering: formulation and performance analysis,” in
Proc. Cognitive Information Processing, Santorini, Greece, June 2008.
[6] F. S. Cattivelli and A. H. Sayed, “Diffusion mechanisms for fixedpoint distributed Kalman smoothing,” in Proc. EUSIPCO, Lausanne,
Switzerland, August 2008.
[7] L. Xiao, S. Boyd, and S. Lall, “A space-time diffusion scheme for peerto-peer least-squares estimation,” in Proc IPSN, Nashville, TN, April
2006, pp. 168–176.
[8] I. D. Schizas, G. Mateos, and G. B. Giannakis, “Stability analysis of
the consensus-based distributed LMS algorithm,” in Proc. ICASSP, Las
Vegas, NV, March 2008, pp. 3289–3292.
[9] F. S. Cattivelli and A. H. Sayed, “Diffusion LMS algorithms with information exchange,” in Proc. Asilomar Conf. Signals, Systems and
Computers, Pacific Grove, CA, October 2008.
[10] A. H. Sayed, Fundamentals of Adaptive Filtering, Wiley, NJ, 2003.