MULTI-LEVEL DIFFUSION ADAPTIVE NETWORKS
Federico S. Cattivelli
Ali H. Sayed
Department of Electrical Engineering
University of California, Los Angeles
ABSTRACT
We study the problem of distributed estimation, where a set of
nodes are required to collectively estimate some parameter of interest from their measurements. Diffusion algorithms have been
shown to achieve good performance and increased robustness, and are
amenable to real-time implementation. In this work we focus on
multi-level diffusion algorithms, where a network running a diffusion algorithm is enhanced by adding special nodes that can perform
different processing. These special nodes form a second network
where a second diffusion algorithm is implemented. We illustrate
the concept using diffusion LMS, provide performance analysis for
multi-level collaboration and present simulation results showing
improved performance over conventional diffusion.
Index Terms— Distributed estimation, cooperation, diffusion,
adaptive network
1. INTRODUCTION
We study the problem of distributed estimation, where a set of nodes
are required to collectively estimate some parameter of interest from
their measurements. Consider a set of N nodes distributed over some
region. At every time instant i, every node k takes a scalar measurement dk (i) of some random process dk (i) and a 1 × M regression
vector, uk,i , corresponding to a realization of a random process uk,i
which is correlated with dk (i). The objective is for every node in
the network to use the data {dk (i), uk,i } to estimate some deterministic unknown vector wo . Specifically, we seek the optimal linear
estimator wo that minimizes the following global cost function:
    J^{glob}(w) = Σ_{k=1}^{N} E |d_k(i) − u_{k,i} w|²        (1)
where E denotes the expectation operator.
In a centralized solution to the problem, every node transmits its
data {dk (i), uk,i } to a central fusion center for processing, requiring
large amounts of energy for communication. We refer to the solution
obtained by a fusion center as the global solution. An adaptive global
LMS solution to (1) can be obtained as follows [1, 2]:
    w_i = w_{i−1} + µ Σ_{k=1}^{N} u∗_{k,i} (d_k(i) − u_{k,i} w_{i−1})        (2)
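As a concrete illustration of (2), the following Python sketch runs the global LMS recursion on synthetic data; the data-generation model, noise levels and all variable names are assumptions made for this example, not specifications from the paper.

    import numpy as np

    rng = np.random.default_rng(0)
    N, M, mu, iters = 20, 2, 0.01, 2000
    w_o = rng.standard_normal((M, 1))            # unknown vector w^o
    sigma2_v = 0.01 * np.ones(N)                 # per-node noise variances (assumed)

    w = np.zeros((M, 1))                         # global estimate w_i
    for i in range(iters):
        update = np.zeros((M, 1))
        for k in range(N):
            u = rng.standard_normal((1, M))      # regressor u_{k,i}
            d = float(u @ w_o) + np.sqrt(sigma2_v[k]) * rng.standard_normal()   # d_k(i)
            update += u.conj().T * (d - float(u @ w))   # u*_{k,i}(d_k(i) - u_{k,i} w_{i-1})
        w = w + mu * update                      # global LMS update (2)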
Distributed implementations avoid the use of a fusion center and
distribute the processing and communication across the entire network. Two well-known classes of distributed estimation methods are
incremental and diffusion. Diffusion algorithms are amenable to
real-time implementation, robust to node and link failure, and
obtain good performance in terms of estimation accuracy. Several distributed estimation algorithms have been proposed, including incremental LMS and RLS [1], diffusion LMS [3], diffusion RLS [4], and diffusion Kalman filtering [5] and smoothing [6]. Algorithms based on consensus have been proposed in [7, 8].

This work was supported in part by NSF grants ECS-0601266 and ECS-0725441. Authors' emails: {fcattiv, sayed}@ee.ucla.edu.

Fig. 1. Disjoint adaptive networks connected via supernodes.
In this work, we study multi-level diffusion algorithms, whereby
an adaptive network is enhanced with a set of nodes with special
abilities. These new nodes form a second network that also runs a
diffusion algorithm, creating an adaptive network with two levels of
adaptivity. We focus on the adapt-then-combine (ATC) version of
diffusion LMS [9], though the results can be extended to other diffusion algorithms. We start by providing some motivation for multi-level diffusion, then propose a multi-level diffusion algorithm,
provide performance analysis for the case of diffusion LMS with
small step-sizes, and show through simulation how adding a second
layer of diffusion improves the overall performance of the network.
2. MOTIVATION
In our previous work we considered diffusion algorithms whereby
nodes communicate with their neighbors in an isotropic manner.
These algorithms usually include two updates: an adaptive update
where nodes use their individual measurements to adapt their current estimates based on an adaptive filtering algorithm such as LMS
or RLS, followed by a combination update where nodes combine
the estimates from their neighbors to obtain the new estimate. This
type of diffusion algorithm is fully distributed. If a node were to
fail, the remaining nodes would continue operating, and the adaptive
network would continue working.
Consider now a situation where the adaptive network is divided
into disconnected sub-networks, as shown in Fig. 1. This situation
could arise in an application where the area to be sensed is too large,
or where we wish to sense events located at points spatially far away
from each other. One approach could be to scatter different adaptive
networks over the area, each running a diffusion algorithm, say diffusion LMS. A question arises then about how to fuse the estimates
from these sub-networks such that they benefit from interacting with
each other. One solution would be to equip each of the sub-networks with a supernode, which is able to communicate with the
supernodes from the other networks (supernodes are shown as red
squares in Fig. 1). This communication may be done through a special link or through multi-hop transmissions. Then the supernodes
would form a new network, and we could run a second diffusion algorithm on the supernodes. Thus, the supernodes would become a
new adaptive network as shown by the red dashed links in Fig. 1.
Another scenario where multi-level diffusion algorithms could
be useful is the case where we have a large adaptive network, and
we want to improve its convergence speed. When an event occurs
on one end of the network, it will take a number of iterations for this
event to propagate. Again we can think of equipping the network
with supernodes capable of communicating through larger distances.
An example network is shown in Fig. 2. Events happening on one
end of the network would propagate faster through the supernodes.
Thus, multi-level diffusion schemes are attractive in situations
where the original adaptive network needs to be enhanced. It is important to note that multi-level diffusion algorithms should preserve
the property of being fully distributed. If one of the supernodes fails,
the network should be able to keep functioning without severe impact on the performance.
3. MULTI-LEVEL DIFFUSION
A multi-level diffusion algorithm consisting of two levels is discussed next. The algorithm gives special privileges to a subset of
the nodes which we call supernodes. Thus, there will be a set of
normal nodes running a diffusion algorithm, and a second level of
diffusion running on the supernodes. Let S denote the set of nodes
that are supernodes. Let N_k denote the neighborhood of node k, i.e., the set of nodes that are connected to node k, including k itself, and let N_k^{(1)} denote the set of supernodes connected to supernode k, including k itself. Consider matrices A, C and A^{(1)} with individual non-negative real entries a_{l,k}, c_{l,k} and a^{(1)}_{l,k}, respectively. These matrices satisfy

    c_{l,k} = a_{l,k} = 0 if l ∉ N_k,      1∗ A = 1∗,   1∗ C = 1∗,   C1 = 1        (3)

    a^{(1)}_{l,k} = 0 if l ∉ N_k^{(1)},    1∗ A^{(1)} = 1∗        (4)

where 1 denotes the N × 1 vector with unit entries.
In a multi-level diffusion scheme, normal nodes (i.e., those that
are not supernodes) will perform exactly the same updates as in a
diffusion algorithm. In this case, we focus on the ATC diffusion
LMS algorithm of [9], namely:
For every node k such that k ∉ S:

    ψ_{k,i} = w_{k,i−1} + µ_k Σ_{l∈N_k} c_{l,k} u∗_{l,i} (d_l(i) − u_{l,i} w_{k,i−1})
                                                                                        (5)
    w_{k,i} = Σ_{l∈N_k} a_{l,k} ψ_{l,i}
The first and second equations of (5) are known as the adaptation
and combination updates, respectively. The matrix C allows nodes
to use measurements from their neighbors in the update, though for
a choice C = I, no exchange of measurements is needed. In the
latter case, at every time instant i, every node k updates the estimate
in three steps. First, every node adapts its current estimate using its
individual measurements {dk (i), uk,i } to obtain ψk,i . Second, all
nodes exchange their pre-estimates ψk,i with their neighbors. Finally, every node combines (or averages) the pre-estimates to obtain
the new estimate wk,i .
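To make the three steps concrete, here is a minimal Python sketch of one ATC diffusion LMS iteration (5) with C = I; the data containers (per-node estimates, neighbor lists, combination matrix) are assumptions of this example.

    import numpy as np

    def atc_diffusion_step(w_prev, d, U, neighbors, A, mu):
        """One ATC diffusion LMS iteration (5) with C = I.
        w_prev: N x M estimates w_{k,i-1}; d: length-N measurements d_k(i);
        U: N x M regressors u_{k,i}; neighbors[k]: nodes in N_k (including k);
        A: N x N combination matrix whose columns sum to one; mu: step-sizes."""
        N, _ = w_prev.shape
        psi = np.empty_like(w_prev)
        for k in range(N):
            # Adaptation: psi_{k,i} = w_{k,i-1} + mu_k u*_{k,i} (d_k(i) - u_{k,i} w_{k,i-1})
            err = d[k] - U[k] @ w_prev[k]
            psi[k] = w_prev[k] + mu[k] * err * U[k].conj()
        w_new = np.empty_like(w_prev)
        for k in range(N):
            # Combination: w_{k,i} = sum_{l in N_k} a_{l,k} psi_{l,i}
            w_new[k] = sum(A[l, k] * psi[l] for l in neighbors[k])
        return w_new, psi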
The fact that normal nodes perform the same update as in a conventional diffusion algorithm adds robustness to the adaptive network. Normal nodes do not need to be aware that supernodes are
present in the network, and will treat them as normal nodes. Thus,
even if all the supernodes were to fail, this would be transparent to
the rest of the nodes, and the adaptive network would keep functioning as a conventional single-level network.
For the supernodes, k ∈ S, we adopt a diffusion mechanism in
three steps as follows:
For every node k such that k ∈ S:

    φ_{k,i} = a_{k,k} w_{k,i−1} + Σ_{l∈N_k, l≠k} a_{l,k} ψ_{l,i}

    ϕ_{k,i} = φ_{k,i} + µ_k Σ_{l∈N_k} c_{l,k} u∗_{l,i} (d_l(i) − u_{l,i} φ_{k,i})        (6)

    w_{k,i} = ψ_{k,i} = Σ_{l∈N_k^{(1)}} a^{(1)}_{l,k} ϕ_{l,i}
After the adaptive update of the normal nodes in (5), these nodes will
transmit their pre-estimates ψk,i to their neighbors. A supernode will
receive these pre-estimates from its neighbors and will perform an
initial combination step using these pre-estimates and its previous
estimate wk,i−1 , to obtain φk,i . Then it will run an adaptive step
to obtain ϕ_{k,i}. In the third step, supernodes will exchange ϕ_{k,i} with their neighboring supernodes N_k^{(1)}, and will combine them to obtain w_{k,i}. The neighbors N_k of the supernode k will receive this estimate w_{k,i} = ψ_{k,i} and will use it in the combination step of (5).
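A corresponding sketch of the supernode update (6), again with C = I; it reuses the containers of the previous sketch, and the split into two helper functions is an illustrative choice rather than something prescribed by the paper.

    import numpy as np

    def supernode_adapt(k, w_prev, psi, d, U, neighbors, A, mu):
        """Steps 1-2 of (6) for supernode k, with C = I."""
        # Step 1: combine the neighbors' pre-estimates with the previous estimate
        phi = A[k, k] * w_prev[k] + sum(A[l, k] * psi[l] for l in neighbors[k] if l != k)
        # Step 2: adapt using the supernode's own data
        err = d[k] - U[k] @ phi
        return phi + mu[k] * err * U[k].conj()          # varphi_{k,i}

    def supernode_combine(k, varphi, super_neighbors, A_super):
        """Step 3 of (6): w_{k,i} = psi_{k,i} = sum over N_k^(1) of a^(1)_{l,k} varphi_{l,i}."""
        return sum(A_super[l, k] * varphi[l] for l in super_neighbors[k])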
4. PERFORMANCE ANALYSIS
In this section we analyze the performance of the multi-level diffusion LMS algorithm. The analysis is similar to the one presented in
[9], but the results are more general and include the ones presented
in [9] as a special case.
In what follows we consider the estimates w k,i to be random,
and analyze their performance in terms of their expected behavior.
Consider a diffusion LMS update consisting of two adaptive steps
and two combination steps as follows:
    ψ_{k,i} = w_{k,i−1} + µ_{1,k} Σ_{l=1}^{N} c_{1,l,k} u∗_{l,i} [d_l(i) − u_{l,i} w_{k,i−1}]

    φ_{k,i} = Σ_{l=1}^{N} a_{1,l,k} ψ_{l,i}
                                                                                        (7)
    ϕ_{k,i} = φ_{k,i} + µ_{2,k} Σ_{l=1}^{N} c_{2,l,k} u∗_{l,i} [d_l(i) − u_{l,i} φ_{k,i}]

    w_{k,i} = Σ_{l=1}^{N} a_{2,l,k} ϕ_{l,i}
where the coefficients aj,l,k and cj,l,k are real, non-negative, and
correspond to the {l, k} entries of matrices Aj and Cj , respectively,
for j = 1, 2, and the columns of Aj add up to one. The step-sizes
µ1,k and µ2,k correspond to node k. Eq. (7) can be specialized
to the multi-level diffusion LMS algorithm (5)-(6) by appropriately
selecting these matrices; see (20)-(23).
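For reference, one iteration of the general recursion (7) can be written as below; all container shapes are assumptions of this sketch, and setting C_1, A_1, C_2, A_2 as in (20)-(23) recovers the multi-level algorithm (5)-(6).

    import numpy as np

    def two_level_step(w_prev, d, U, A1, A2, C1, C2, mu1, mu2):
        """One iteration of (7). w_prev, U: N x M arrays; d, mu1, mu2: length-N;
        A1, A2, C1, C2: N x N matrices with entries a_{j,l,k} and c_{j,l,k}."""
        N, _ = w_prev.shape
        psi = np.empty_like(w_prev)
        for k in range(N):        # first adaptation step
            g = sum(C1[l, k] * (d[l] - U[l] @ w_prev[k]) * U[l].conj() for l in range(N))
            psi[k] = w_prev[k] + mu1[k] * g
        # first combination step
        phi = np.array([sum(A1[l, k] * psi[l] for l in range(N)) for k in range(N)])
        varphi = np.empty_like(w_prev)
        for k in range(N):        # second adaptation step
            g = sum(C2[l, k] * (d[l] - U[l] @ phi[k]) * U[l].conj() for l in range(N))
            varphi[k] = phi[k] + mu2[k] * g
        # second combination step
        return np.array([sum(A2[l, k] * varphi[l] for l in range(N)) for k in range(N)])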
As in [9], we introduce the following assumptions:
• The measurements are related to the unknown vector as follows:

      d_k(i) = u_{k,i} w^o + v_k(i)        (8)

  where v_k(i) is a zero-mean random variable with variance σ²_{v,k}, independent of u_{k,i} for all k and i, and independent of v_l(j) for l ≠ k or i ≠ j.
• All regressors u_{k,i} are spatially and temporally independent with covariance matrices R_{u,k}.
• The step-sizes {µ_{j,k}}_{k=1}^{N} are sufficiently small (see [9] for details).
We define the error quantities w̃ k,i = wo − w k,i , ψ̃ k,i = wo −
ψ k,i , φ̃k,i = wo − φk,i and ϕ̃k,i = wo − ϕk,i , and the global
vectors:
    w̃_i = col{w̃_{1,i} , . . . , w̃_{N,i}},    ψ̃_i = col{ψ̃_{1,i} , . . . , ψ̃_{N,i}},
    φ̃_i = col{φ̃_{1,i} , . . . , φ̃_{N,i}},    ϕ̃_i = col{ϕ̃_{1,i} , . . . , ϕ̃_{N,i}}
We introduce the matrices below for j = 1, 2:

    ℳ_j = diag{µ_{j,1} I_M , . . . , µ_{j,N} I_M},    𝒜_j = A_j ⊗ I_M,    𝒞_j = C_j ⊗ I_M        (9)

    𝒟_{j,i} = diag{ Σ_{l=1}^{N} c_{j,l,1} u∗_{l,i} u_{l,i} , . . . , Σ_{l=1}^{N} c_{j,l,N} u∗_{l,i} u_{l,i} },
    G_i = col{ u∗_{1,i} v_1(i) , . . . , u∗_{N,i} v_N(i) }        (10)
Then from (7) and the linear model assumption we have:

    ψ̃_i = w̃_{i−1} − ℳ_1 [𝒟_{1,i} w̃_{i−1} + 𝒞_1^T G_i]
    φ̃_i = 𝒜_1^T ψ̃_i
    ϕ̃_i = φ̃_i − ℳ_2 [𝒟_{2,i} φ̃_i + 𝒞_2^T G_i]
    w̃_i = 𝒜_2^T ϕ̃_i

or, equivalently,

    w̃_i = 𝒜_2^T [I − ℳ_2 𝒟_{2,i}] 𝒜_1^T [I − ℳ_1 𝒟_{1,i}] w̃_{i−1} −
           𝒜_2^T [I − ℳ_2 𝒟_{2,i}] 𝒜_1^T ℳ_1 𝒞_1^T G_i − 𝒜_2^T ℳ_2 𝒞_2^T G_i        (11)

Using the assumption that the step-sizes are small for all filters, we drop the terms that include products of step-sizes from (11) to obtain

    w̃_i ≈ 𝒜_2^T [𝒜_1^T − ℳ_2 𝒟_{2,i} 𝒜_1^T − 𝒜_1^T ℳ_1 𝒟_{1,i}] w̃_{i−1} −
           𝒜_2^T 𝒜_1^T ℳ_1 𝒞_1^T G_i − 𝒜_2^T ℳ_2 𝒞_2^T G_i        (12)

Moreover, for j = 1, 2, let

    𝒟_j ≜ E 𝒟_{j,i} = diag{ Σ_{l=1}^{N} c_{j,l,1} R_{u,l} , . . . , Σ_{l=1}^{N} c_{j,l,N} R_{u,l} },
    G ≜ E[G_i G_i^∗] = diag{ σ²_{v,1} R_{u,1} , . . . , σ²_{v,N} R_{u,N} }        (13)

We follow the energy conservation analysis of [10]. From the independence of the regressors u_{k,i} we obtain that 𝒟_{j,i} is independent of w̃_{i−1}. Evaluating the weighted norm of w̃_i in (12) we obtain:

    E ||w̃_i||²_Σ = E ||w̃_{i−1}||²_{Σ′} + Tr[Σ K G K^∗]        (14)

where Σ is any Hermitian positive-definite matrix and

    Σ′ ≈ 𝒜_1 𝒜_2 Σ 𝒜_2^T 𝒜_1^T − 𝒜_1 𝒜_2 Σ 𝒜_2^T ℳ_2 𝒟_2 𝒜_1^T − 𝒜_1 𝒟_2^∗ ℳ_2 𝒜_2 Σ 𝒜_2^T 𝒜_1^T −
         𝒜_1 𝒜_2 Σ 𝒜_2^T 𝒜_1^T ℳ_1 𝒟_1 − 𝒟_1^∗ ℳ_1 𝒜_1 𝒜_2 Σ 𝒜_2^T 𝒜_1^T        (15)

    K = 𝒜_2^T 𝒜_1^T ℳ_1 𝒞_1^T + 𝒜_2^T ℳ_2 𝒞_2^T

where again we ignored terms including products of step-sizes.

Let σ = vec(Σ) and Σ = vec^{−1}(σ), where the vec(·) notation stacks the columns of the matrix argument on top of each other. We also use the notation ||w̃||²_σ to denote ||w̃||²_Σ. Using the Kronecker product property vec(PΣQ) = (Q^T ⊗ P) vec(Σ), we arrive at

    σ′ ≜ vec(Σ′) = F σ

where

    F = (𝒜_1 ⊗ 𝒜_1)(𝒜_2 ⊗ 𝒜_2) − 𝒜_2 ⊗ (𝒟_2^T ℳ_2 𝒜_2) − (𝒟_2^T ℳ_2 𝒜_2) ⊗ 𝒜_2 −
        (𝒟_1^T ℳ_1 𝒜_1 𝒜_2) ⊗ (𝒜_1 𝒜_2) − (𝒜_1 𝒜_2) ⊗ (𝒟_1^T ℳ_1 𝒜_1 𝒜_2)        (16)

Then, using the result that Tr(ΣX) = vec(X^T)^T σ, we arrive at

    E ||w̃_∞||²_{(I−F)σ} = [vec(K G^T K^T)]^T σ        (17)

The MSD at node k can be obtained by weighting E ||w̃_∞||² with a block matrix that has an identity matrix at block {k, k} and zeros elsewhere. Let us denote the vectorized version of this matrix by q_k, that is, q_k = vec(diag(e_k) ⊗ I_M). Then the MSD becomes:

    MSD_k = E ||w̃_∞||²_{q_k} = [vec(K G^T K^T)]^T (I − F)^{−1} q_k        (18)

The EMSE at node k is obtained by weighting E ||w̃_∞||² with a block matrix that has R_{u,k} at block {k, k} and zeros elsewhere, that is, by selecting σ such that (I − F)σ = r_k = vec(diag(e_k) ⊗ R_{u,k}). Then the EMSE becomes:

    EMSE_k = E ||w̃_∞||²_{r_k} = [vec(K G^T K^T)]^T (I − F)^{−1} r_k        (19)

Note that if we select ℳ_1 = 0, we obtain the same result as in [9], and therefore the case presented here is more general. The network MSD and EMSE are defined as the average MSD and EMSE, respectively, across all nodes in the network.
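The steady-state expressions (18)-(19) lend themselves to direct numerical evaluation once the block quantities of (9)-(13) are formed. The following Python sketch does this under the stated assumptions; matrix and variable names are chosen for the example, and since F has dimension (NM)² the sketch is practical only for small networks.

    import numpy as np

    def theoretical_msd_emse(A1, A2, C1, C2, mu1, mu2, Ru, sigma2_v):
        """Evaluate MSD_k (18) and EMSE_k (19) for the recursion (7).
        A1, A2, C1, C2: N x N; Ru: list of M x M covariances R_{u,k};
        mu1, mu2, sigma2_v: length-N arrays."""
        N, M = len(Ru), Ru[0].shape[0]
        I = np.eye(M)
        blk = lambda B: np.block([[B[i] if i == j else np.zeros((M, M))
                                   for j in range(N)] for i in range(N)])
        M1, M2 = blk([mu1[k] * I for k in range(N)]), blk([mu2[k] * I for k in range(N)])
        cA1, cA2 = np.kron(A1, I), np.kron(A2, I)           # block matrices of (9)
        cC1, cC2 = np.kron(C1, I), np.kron(C2, I)
        D1 = blk([sum(C1[l, k] * Ru[l] for l in range(N)) for k in range(N)])   # (13)
        D2 = blk([sum(C2[l, k] * Ru[l] for l in range(N)) for k in range(N)])
        G = blk([sigma2_v[k] * Ru[k] for k in range(N)])
        K = cA2.T @ cA1.T @ M1 @ cC1.T + cA2.T @ M2 @ cC2.T        # K defined after (15)
        F = (np.kron(cA1, cA1) @ np.kron(cA2, cA2)                  # (16)
             - np.kron(cA2, D2.T @ M2 @ cA2) - np.kron(D2.T @ M2 @ cA2, cA2)
             - np.kron(D1.T @ M1 @ cA1 @ cA2, cA1 @ cA2)
             - np.kron(cA1 @ cA2, D1.T @ M1 @ cA1 @ cA2))
        vecKGK = (K @ G.T @ K.T).flatten(order='F')         # vec(K G^T K^T)
        IFinv = np.linalg.inv(np.eye(F.shape[0]) - F)
        msd, emse = np.zeros(N), np.zeros(N)
        for k in range(N):
            ek = np.zeros((N, N)); ek[k, k] = 1.0
            msd[k] = vecKGK @ IFinv @ np.kron(ek, I).flatten(order='F')        # (18)
            emse[k] = vecKGK @ IFinv @ np.kron(ek, Ru[k]).flatten(order='F')   # (19)
        return msd, emse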
In order to apply the above analysis to the multi-level diffusion
LMS algorithm (5)-(6), we need to appropriately select the matrices
A1 , A2 , C1 and C2 in (7). Let γ denote a vector of size N with unity
entries at position k if k ∈ S, and zeros elsewhere. Then we have
    C_1 = C · diag(1 − γ)                                        (20)
    A_1 = A · diag(γ) + I_N · diag(1 − γ)                        (21)
    C_2 = C · diag(γ)                                            (22)
    A_2 = A^{(1)} [I_N · diag(γ) + A · diag(1 − γ)]              (23)
Eq. (20) indicates that all nodes that are not supernodes will update
their estimates using the adaptive update. Eq. (21) indicates that
all the supernodes will combine the estimates from their neighbors,
whereas the normal nodes will not update their estimates. Eq. (22)
represents the step where the supernodes perform the adaptive update on the data from their neighbors. Finally, Eq. (23) represents
the steps where the supernodes combine the estimates from their supernode neighbors (via A(1) ), combined with the step where all normal nodes combine the estimates from their neighbors (via A).
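A small sketch of how (20)-(23) can be assembled; A, C, A^{(1)} and the indicator vector γ are assumed to be given as N x N arrays and a 0/1 vector.

    import numpy as np

    def multilevel_matrices(A, C, A_super, gamma):
        """Build C1, A1, C2, A2 of (20)-(23) from A, C, A^(1) and gamma."""
        N = len(gamma)
        Dg, Dn = np.diag(gamma), np.diag(1.0 - gamma)
        C1 = C @ Dn                         # (20): only normal nodes adapt in step 1
        A1 = A @ Dg + Dn                    # (21): supernodes combine, normal nodes hold
        C2 = C @ Dg                         # (22): only supernodes adapt in step 2
        A2 = A_super @ (Dg + A @ Dn)        # (23): supernode-level combination,
                                            #       normal nodes combine via A
        return C1, A1, C2, A2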
5. SIMULATIONS
Fig. 2. Network topology (top), noise variances σ²_{v,k} (bottom, left) and trace of regressor covariances Tr(R_{u,k}) (bottom, right).

In order to illustrate the performance of the multi-level diffusion algorithms, we present a simulation example in Figs. 2-4. We use a network with N = 30 nodes and a topology as in Fig. 2, where the supernodes are denoted by squares. Supernode neighbors are defined as those supernodes that are three or fewer hops away from each other. The regressors have size M = 2, are zero-mean Gaussian, independent in time and space, and have covariance matrices R_{u,k}. The background noise power is denoted by σ²_{v,k}. Fig. 2 also shows σ²_{v,k} and Tr(R_{u,k}) for every node k. The results were averaged over a total of 100 experiments of duration 200 iterations each. The unknown w^o was chosen randomly at the beginning of each experiment. The step-size µ_k = 0.05 is constant for all nodes and algorithms (except for one of the multi-level diffusion algorithms, as discussed next). For the diffusion algorithms, relative-degree weights [4] are used for the matrix A and for the matrix A^{(1)}, with the exception that supernodes
are assigned ten times higher degree than their actual degree in order to compute the weights. In this way we give more weight to the
estimates of the supernodes. We also used C = I in all cases.
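The weighting rule just described can be implemented as below; the paper does not give the construction in code form, so the details (adjacency representation, the factor of ten applied to supernode degrees) are spelled out here as assumptions consistent with the text.

    import numpy as np

    def relative_degree_weights(adjacency, gamma, boost=10.0):
        """Relative-degree combination weights [4] with inflated supernode degrees.
        adjacency: symmetric 0/1 N x N matrix (no self-loops); gamma: 0/1 supernode
        indicator. Each column of the returned matrix sums to one."""
        N = adjacency.shape[0]
        nbhd = adjacency + np.eye(N)                  # neighborhoods include the node itself
        deg = nbhd.sum(axis=1)                        # node degrees
        deg_eff = np.where(gamma > 0, boost * deg, deg)   # supernodes count ten times more
        A = np.zeros((N, N))
        for k in range(N):
            A[:, k] = np.where(nbhd[:, k] > 0, deg_eff, 0.0)
            A[:, k] /= A[:, k].sum()                  # normalize so that 1* A = 1*
        return A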
We compare five algorithms. The algorithm denoted “No cooperation” corresponds to the case where every node runs an LMS
algorithm on its data and does not communicate at all with its neighbors. The algorithm denoted “Global LMS” is the global solution
(2). “ATC diff. LMS” refers to the algorithm of [9], i.e., (5) with C = I.
“Multi-level ATC diff. LMS 1” and “Multi-level ATC diff. LMS 2” denote the multi-level diffusion LMS algorithm (5)-(6). The former
uses µ_k = 0.0083 if k is a supernode and µ_k = 0.05
otherwise, whereas the latter uses a constant step-size µ_k = 0.05.
Fig. 3 shows the learning curves for the different LMS algorithms in terms of the network EMSE. We can observe that the multi-level algorithms outperform the conventional diffusion LMS algorithm. Moreover, the multi-level algorithm that keeps the constant step-size attains comparable steady-state performance, yet converges much faster, whereas
the multi-level algorithm with a smaller step-size at the supernode level attains a comparable convergence rate, with improved
steady-state performance. The steady-state is close to the one offered by the global solution. Fig. 4 shows the steady-state network
EMSE for the same algorithms, showing agreement with the theoretical results from expressions (18) and (19). The steady-state values
are obtained by averaging over 40 time samples after convergence.
Fig. 3. Transient network EMSE (top) and MSD (bottom) for LMS without cooperation, CTA and ATC diffusion LMS, and global LMS.

Fig. 4. Steady-state performance of different algorithms.
6. CONCLUSIONS
We proposed a cooperation scheme, termed multi-level diffusion,
in which nodes with enhanced capabilities are introduced into the network. We showed through theory and simulation that this method
improves both the convergence and steady-state performance over
conventional diffusion methods, at the expense of requiring special
provisions to accommodate communication between the supernodes.
7. REFERENCES
[1] A. H. Sayed and C. G. Lopes, “Adaptive processing over distributed
networks,” IEICE Trans. on Fund. of Electronics, Communications and
Computer Sciences, vol. E90-A, no. 8, pp. 1504–1510, August 2007.
[2] A. H. Sayed and F. Cattivelli, “Distributed adaptive learning mechanisms,” Handbook on Array Processing and Sensor Networks, S.
Haykin and K. J. Ray Liu, editors, Wiley, NJ, 2009.
[3] C. G. Lopes and A. H. Sayed, “Diffusion least-mean squares over adaptive networks: Formulation and performance analysis,” IEEE Trans. on
Signal Processing, vol. 56, no. 7, pp. 3122–3136, July 2008.
[4] F. S. Cattivelli, C. G. Lopes, and A. H. Sayed, “Diffusion recursive
least-squares for distributed estimation over adaptive networks,” IEEE
Trans. on Signal Processing, vol. 56, no. 5, pp. 1865–1877, May 2008.
[5] F. S. Cattivelli, C. G. Lopes, and A. H. Sayed, “Diffusion strategies for
distributed Kalman filtering: formulation and performance analysis,” in
Proc. Cognitive Information Processing, Santorini, Greece, June 2008.
[6] F. S. Cattivelli and A. H. Sayed, “Diffusion mechanisms for fixedpoint distributed Kalman smoothing,” in Proc. EUSIPCO, Lausanne,
Switzerland, August 2008.
[7] L. Xiao, S. Boyd, and S. Lall, “A space-time diffusion scheme for peerto-peer least-squares estimation,” in Proc IPSN, Nashville, TN, April
2006, pp. 168–176.
[8] I. D. Schizas, G. Mateos, and G. B. Giannakis, “Stability analysis of
the consensus-based distributed LMS algorithm,” in Proc. ICASSP, Las
Vegas, NV, March 2008, pp. 3289–3292.
[9] F. S. Cattivelli and A. H. Sayed, “Diffusion LMS algorithms with information exchange,” in Proc. Asilomar Conf. Signals, Systems and
Computers, Pacific Grove, CA, October 2008.
[10] A. H. Sayed, Fundamentals of Adaptive Filtering, Wiley, NJ, 2003.