Comput Optim Appl
DOI 10.1007/s10589-007-9031-2

Algorithms with high order convergence speed for blind source extraction

Pando Georgiev · Panos Pardalos · Andrzej Cichocki

© Springer Science+Business Media, LLC 2007

Abstract  A rigorous convergence analysis for the fixed point ICA algorithm of Hyvärinen and Oja is provided, and a generalization of it involving cumulants of an arbitrary order is presented. We consider a specific optimization problem OP(p), for an integer p > 3, arising from a blind source extraction (BSE) problem, and prove that every local maximum of OP(p) is a solution of BSE in the sense that it extracts one source signal from a linear mixture of unknown statistically independent signals. An algorithm for solving OP(p) is constructed, which has a rate of convergence p − 1.

Keywords  Independent component analysis · Cumulants · Fixed point algorithm

Research partially supported by NSF and NIH grants.

P. Georgiev: ECECS Department, University of Cincinnati, ML 0030, Cincinnati, OH 45221-0030, USA, e-mail: pgeorgie@ececs.uc.edu; and Sofia University "St. Kl. Ohridski", 5 J. Bourchier Blvd., 1164 Sofia, Bulgaria, e-mail: pandogeorgiev@gmail.com
P. Pardalos: Center for Applied Optimization, University of Florida, Florida, USA, e-mail: pardalos@cao.ise.ufl.edu
A. Cichocki: The Institute for Physical and Chemical Research (RIKEN), Brain Science Institute, Wako-shi, Saitama 351-0198, Japan, e-mail: cia@bsp.brain.riken.jp

1 Introduction

The blind source extraction (BSE) problem that we consider is formulated as follows: we observe sensor signals (random variables) $x(k) = [x_1(k), \ldots, x_n(k)]^T$ described as

$$x(k) = A s(k), \quad k = 1, 2, \ldots, \qquad (1)$$

where $A$ is an unknown $n \times n$ non-singular mixing matrix and $s(k) = [s_1(k), \ldots, s_n(k)]^T$ is a vector of unknown zero-mean source signals. Our objective is to estimate the source signals, sequentially one by one or simultaneously, under the assumption that they are statistically independent. The literature on the BSE problem (and on independent component analysis, ICA) is vast; see for instance [2, 3, 5] and the references therein. In this article we rigorously prove the convergence of the algorithm of Hyvärinen and Oja [6] under suitable conditions and generalize it to cumulants of arbitrary order (greater than two). It is an example of an algorithm with a high order of convergence (faster than cubic).

Define the function $\varphi_p : \mathbb{R}^n \to \mathbb{R}$ by $\varphi_p(w) = \mathrm{cum}_p(w^T x)$, where $\mathrm{cum}_p$ denotes the self-cumulant of order $p$:

$$\mathrm{cum}_p(s) = \mathrm{cumulant}(\underbrace{s, \ldots, s}_{p})$$

(see [8] for the definition and properties of cumulants). We use essentially the following property of cumulants: if $s_i$, $i = 1, \ldots, n$, are statistically independent, then

$$\mathrm{cum}_p\Bigl(\sum_{i=1}^n c_i s_i\Bigr) = \sum_{i=1}^n c_i^p \,\mathrm{cum}_p(s_i). \qquad (2)$$

Consider the maximization problems

OP(p): maximize $|\varphi_p(w)|$ subject to $\|w\| = 1$,

and

DP(p): maximize $|\psi_p(c)|$ subject to $\|c\| = 1$,

where $\psi_p(c) = \mathrm{cum}_p(\sum_{i=1}^n c_i s_i)$. Setting $y = w^T x$ and $c = A^T w$, we have

$$y = c^T s = \sum_{i=1}^n c_i s_i \quad \text{and} \quad \varphi_p(w) = \psi_p(A^T w). \qquad (3)$$

Without loss of generality we may assume that the matrix $A$ is orthogonal (assuming that the well-known preprocessing called "prewhitening" has been performed; see [3] or [5]). It is easy to see (using (3) and the orthogonality of $A$) that the problems DP(p) and OP(p) are equivalent in the sense that $w_0$ is a solution of OP(p) if and only if $c_0 = A^T w_0$ is a solution of DP(p). We shall call this property "duality" between the original (observable) problem OP(p) and the dual (hidden) problem DP(p).

A very useful observation is the following: if the vector $c$ contains only one nonzero component, say $c_{i_0} = \pm 1$, then the vector $w = Ac$ extracts the source with index $i_0$ (call the extracted signal $y(k)$), since

$$y(k) := w^T x(k) = c^T A^T x(k) = c^T s(k) = \pm s_{i_0}(k) \quad \forall k = 1, 2, \ldots. \qquad (4)$$

We shall show that the solutions $c$ of DP(p) have exactly one nonzero element; by the duality above we obtain the vectors $w = Ac$ as solutions of the original problem OP(p), and by (4) we achieve extraction. An interesting property of the optimization problem OP(p) is that it has exactly $n$ solutions (up to sign), which are orthonormal, and each of which extracts one source signal. The fixed point algorithm finds one of these solutions.
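The prewhitening step invoked above can be illustrated numerically. The following minimal sketch is not from the paper; the Laplace sources, dimensions, and seed are arbitrary illustrative choices. It whitens the observations so that the effective mixing matrix of the whitened data becomes approximately orthogonal, which is the standing assumption in the rest of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 4, 100_000

# Independent, zero-mean, unit-variance, non-Gaussian sources
# (the Laplace distribution is an arbitrary illustrative choice).
s = rng.laplace(size=(n, T))
s /= s.std(axis=1, keepdims=True)

A = rng.standard_normal((n, n))   # unknown non-singular mixing matrix
x = A @ s                         # observed mixtures, model (1)

# Prewhitening: z = V x with V = C^{-1/2}, C = E{x x^T}, so E{z z^T} = I.
d, E = np.linalg.eigh(np.cov(x))
V = E @ np.diag(d ** -0.5) @ E.T
z = V @ x

# The effective mixing matrix V A of the whitened data is nearly orthogonal
# (exactly orthogonal in the limit of infinitely many samples).
A_eff = V @ A
print(np.allclose(A_eff @ A_eff.T, np.eye(n), atol=1e-2))
```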
2 Extraction via maximization of the absolute value of the cumulants

We note that the idea of maximizing $\mathrm{cum}_4(w^T x)$ in order to extract one source from a linear mixture was already considered in [4]. We need the following lemma, which generalizes (with a simple proof) a lemma in [4] (the case considered there is $p = 4$).

Lemma 1  Consider the optimization problem: minimize (maximize)

$$f(v) = \sum_{i=1}^n k_i v_i^p$$

subject to $\|v\| = c > 0$, where $p > 2$ and $v = (v_1, \ldots, v_n)$. Denote $I^+ = \{i \in \{1,\ldots,n\} : k_i > 0\}$, $I^- = \{i \in \{1,\ldots,n\} : k_i < 0\}$ and $e_i = (0, \ldots, 0, 1, 0, \ldots, 0)$ (1 in the $i$th place).

(a) If $p$ is even, then the points of local minimum are exactly the vectors $m_i^\pm = \pm c e_i$, $i \in I^-$, and the points of local maximum are exactly the vectors $M_j^\pm = \pm c e_j$, $j \in I^+$.

(b) If $p$ is odd, then the points of local minimum are exactly the vectors $m_i^- = c e_i$, $i \in I^-$, and $m_i^+ = -c e_i$, $i \in I^+$, and the points of local maximum are exactly the vectors $M_j^+ = c e_j$, $j \in I^+$, and $M_j^- = -c e_j$, $j \in I^-$.

Proof  (a) Applying the Lagrange multipliers theorem at a point of local optimum $\bar v = (\bar v_1, \ldots, \bar v_n)$, we write

$$k_i p \bar v_i^{\,p-1} - 2\lambda \bar v_i = 0, \quad i = 1, \ldots, n, \qquad (5)$$

where $\lambda$ is a Lagrange multiplier. Multiplying (5) by $\bar v_i$ and summing, we obtain $p f_{\mathrm{opt}} = 2\lambda c^2$, where $f_{\mathrm{opt}}$ denotes the value of $f$ at the local optimum. Hence

$$\lambda = \frac{p f_{\mathrm{opt}}}{2 c^2}. \qquad (6)$$

From (5) we obtain

$$\bar v_i \Bigl( k_i p \bar v_i^{\,p-2} - \frac{p f_{\mathrm{opt}}}{c^2} \Bigr) = 0,$$

whence $\bar v_i$ is either 0 or

$$\pm \Bigl( \frac{f_{\mathrm{opt}}}{k_i c^2} \Bigr)^{\frac{1}{p-2}}. \qquad (7)$$

Case 1. Assume that $k_{i_0} < 0$ for some index $i_0$ and $\bar v$ is a local minimum. Then obviously $f_{\mathrm{loc\,min}} < 0$. According to the second order optimality condition [1], a point $x_0$ is a local minimum if

$$h^T L''(x_0) h > 0 \quad \forall h \in K(x_0) = \{h : h^T x_0 = 0\},\ h \neq 0,$$

where

$$L(x) = \sum_{i=1}^n k_i x_i^p - \lambda \bigl( \|x\|^2 - c^2 \bigr)$$

is the Lagrange function. In our case, by (6) and (7) we obtain

$$h^T L''(\bar v) h = \sum_{i=1}^n \bigl( p(p-1) k_i \bar v_i^{\,p-2} - 2\lambda \bigr) h_i^2 = \frac{p f_{\mathrm{loc\,min}}}{c^2} \Bigl( (p-2) \sum_{i \in I} h_i^2 - \sum_{i \notin I} h_i^2 \Bigr), \qquad (8)$$

where $I$ is the set of those indices $i$ for which $\bar v_i$ is different from 0. We check the second order sufficient condition for a local minimum at the points $m_{i_0}^\pm$. We have

$$K(m_{i_0}^\pm) = \{h : h_{i_0} = 0\};$$

therefore, for $h \in K(m_{i_0}^\pm)$, $h \neq 0$, we have $h^T L''(m_{i_0}^\pm) h > 0$, since $h_{i_0} = 0$ and $f_{\mathrm{loc\,min}} < 0$, i.e. the second order sufficient condition is satisfied and $m_{i_0}^\pm$ is a local minimum. By (8) it follows that for any vector $\bar v$ with at least two nonzero components the quadratic form (8) can take both positive and negative values for different choices of $h$, i.e. the second order necessary condition for a local minimum is not satisfied for such a vector.

Case 2. Assume that $k_j > 0$ for some index $j$ and $\bar v$ is a local maximum. We apply Case 1 to the function $-f$ and finish the proof. The assertion (b) is proved analogously. □
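As a quick numerical sanity check of Lemma 1(a) (illustrative only, not part of the paper; the vector of coefficients $k$ below is an arbitrary choice with both signs present), one can verify that no point of the feasible sphere beats the objective value at the predicted maximizers $\pm c\, e_j$, $j \in I^+$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, c = 5, 4, 1.0
k = np.array([1.5, -2.0, 0.7, -0.3, 0.9])   # mixed signs: I+ = {0, 2, 4}

# Predicted global maximum over the sphere ||v|| = c for even p:
# attained at +/- c*e_j with j = argmax k_j (Lemma 1(a)).
best_pred = c ** p * k.max()

# Random search over the feasible sphere never exceeds it.
v = rng.standard_normal((100_000, n))
v = c * v / np.linalg.norm(v, axis=1, keepdims=True)
best_rand = np.max(np.sum(k * v ** p, axis=1))
print(best_pred >= best_rand)                # True
```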
In the following theorem we show that any local solution of OP(p) extracts one source, so we need not care what type of solution we obtain, local or global. We also characterize all solutions of the Lagrange equation for OP(p). Recall that the Lagrange equation of OP(p) is

$$\mathrm{sign}(\varphi_p(w))\, \varphi_p'(w) + 2\lambda w = 0. \qquad (9)$$

Theorem 2  Suppose that the matrix $A$ in (1) is orthogonal. Then:

(a) the local solutions of the maximization problem OP(p) are exactly $\pm A e_i$, $i = 1, \ldots, n$;

(b) any solution $w$ of the Lagrange equation for OP(p) is characterized by the equations

$$c_i\, \|\psi_p'(c)\| = \pm p\, \mathrm{cum}_p(s_i)\, c_i^{\,p-1} \quad \forall i = 1, \ldots, n, \qquad (10)$$

where $c = A^T w$;

(c) we can extract the original source signals up to sign and permutation:

$$y_i(k) = w_i^T x(k) = \pm s_{r_i}(k), \quad r_i \in \{1, \ldots, n\}.$$

Proof  (a) By property (2) and by Lemma 1, DP(p) has exactly $n$ solutions (up to sign), namely the vectors $\pm e_i$ having $\pm 1$ in the $i$th place and 0 elsewhere. The conclusion of (a) is now evident, since the problems OP(p) and DP(p) are equivalent (see the duality in the introduction).

(b) Let $w$ be a solution of (9). Since $\varphi_p(w) = \psi_p(A^T w)$, using property (2) of the cumulants and the chain rule for differentiating the composite function $\psi_p(A^T w)$ with respect to $w$, we obtain

$$\varphi_p'(w) = A \psi_p'(c) = p A \bigl( c_1^{\,p-1} \mathrm{cum}_p(s_1), \ldots, c_n^{\,p-1} \mathrm{cum}_p(s_n) \bigr)^T. \qquad (11)$$

Hence, using (9), we have

$$A c = w = \pm \frac{\varphi_p'(w)}{\|\varphi_p'(w)\|} = \pm \frac{A \psi_p'(c)}{\|A \psi_p'(c)\|} = \pm \frac{A \psi_p'(c)}{\|\psi_p'(c)\|}, \qquad (12)$$

since $A$ is orthogonal. Multiplying (12) by $A^T$ we obtain

$$c = \pm \frac{\psi_p'(c)}{\|\psi_p'(c)\|}, \qquad (13)$$

or, componentwise,

$$c_i = \pm \frac{p\, \mathrm{cum}_p(s_i)\, c_i^{\,p-1}}{\|\psi_p'(c)\|} \quad \forall i = 1, \ldots, n, \qquad (14)$$

which is (10). Analogously we obtain that any solution of (10) is a solution of (9). The assertion (c) is already proved by (4). □

3 A generalization of the fixed point algorithm

Consider the following algorithm:

$$w(l) = \frac{\varphi_p'(w(l-1))}{\|\varphi_p'(w(l-1))\|}, \quad l = 1, 2, \ldots, \qquad (15)$$

which is a generalization of the fixed point algorithm of Hyvärinen and Oja. The name derives from the Lagrange equation (9): the iteration (15) tries to solve it iteratively, and such a solution is a fixed point of the operator defined by the right-hand side of (15).
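A minimal sketch of iteration (15), assuming prewhitened data and taking $p = 4$ so that $\varphi_4'$ has the closed sample form listed in Sect. 4 below (on whitened data with $\|w\| = 1$, $\varphi_4'(w)$ is proportional to $E\{(w^T x)^3 x\} - 3w$). The data generation repeats the earlier prewhitening sketch; sources, sizes, and seed remain illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
n, T = 4, 100_000

# Whitened mixtures of unit-variance independent Laplace sources
# (same setup as the prewhitening sketch above).
s = rng.laplace(size=(n, T)); s /= s.std(axis=1, keepdims=True)
A = rng.standard_normal((n, n)); x = A @ s
d, E = np.linalg.eigh(np.cov(x)); z = (E @ np.diag(d ** -0.5) @ E.T) @ x

# Iteration (15) for p = 4: w <- phi_4'(w) / ||phi_4'(w)||,
# with phi_4'(w) proportional to E{(w^T z)^3 z} - 3 w on whitened data.
w = rng.standard_normal(n); w /= np.linalg.norm(w)
for _ in range(20):
    y = w @ z
    w_new = (z * y ** 3).mean(axis=1) - 3 * w
    w_new /= np.linalg.norm(w_new)
    if w_new @ w < 0:        # sign-align with the previous iterate; harmless,
        w_new = -w_new       # since phi_4' is odd, and convenient for monitoring
    w = w_new

# w^T z should now match one source up to sign.
corr = np.corrcoef(w @ z, s)[0, 1:]
print(np.max(np.abs(corr)))  # close to 1
```

By Theorem 3 below, with $p = 4$ the convergence of this iteration is cubic, so the 20 iterations used here are far more than needed.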
Theorem 3  Assume that the $s_i$ are statistically independent, zero-mean signals, the mixing matrix $A$ is orthogonal, $p \geq 3$ is a given integer, and $\mathrm{cum}_p(s_i) \neq 0$, $i = 1, \ldots, n$. Let

$$I(c) = \arg\max_{1 \le i \le n}\, |c_i|\, |\mathrm{cum}_p(s_i)|^{\frac{1}{p-2}}.$$

Denote by $W_0$ the set of all elements $w \in \mathbb{R}^n$ such that $\|w\| = 1$, the set $I(A^T w)$ contains only one element, say $i(w)$, and $c_{i(w)} \neq 0$. Then:

(a) the complement of $W_0$ has measure zero;

(b) if $w(0) \in W_0$, then

$$\lim_{l \to \infty} y_l(k) = \pm s_{i_0}(k) \quad \forall k = 1, 2, \ldots,$$

where $y_l(k) = w(l)^T x(k)$ and $i_0 = i(w(0))$;

(c) the rate of convergence in (b) is of order $p - 1$.

Proof  (a) It is easy to see that the complement $K$ of $W_0$ is a finite union of proper subspaces of $\mathbb{R}^n$, and therefore has measure zero. Indeed, $K$ can be represented as

$$K = \bigcup_{i \neq j} W_{i,j},$$

where $W_{i,j}$ is a subspace such that $I(A^T w)$ contains at least the indices $i$ and $j$.

(b), (c) Using (11) with $w = w(l)$ and $c(l) = A^T w(l)$ we obtain (as in (12)):

$$A c(l) = w(l) = \frac{\varphi_p'(w(l-1))}{\|\varphi_p'(w(l-1))\|} = \frac{A \psi_p'(c(l-1))}{\|A \psi_p'(c(l-1))\|} \qquad (16)$$

$$= \frac{A \psi_p'(c(l-1))}{\|\psi_p'(c(l-1))\|} \quad \forall l = 1, 2, \ldots, \qquad (17)$$

since $A$ is orthogonal. Multiplying (16) by $A^T$ we obtain

$$c(l) = \frac{\psi_p'(c(l-1))}{\|\psi_p'(c(l-1))\|},$$

or, componentwise,

$$c_i(l) = \sigma_i\, \frac{p\, |\mathrm{cum}_p(s_i)|\, c_i(l-1)^{p-1}}{\|\psi_p'(c(l-1))\|} \qquad (18)$$

for every $i = 1, \ldots, n$, $l = 1, 2, \ldots$, where $\sigma_i = \mathrm{sign}(\mathrm{cum}_p(s_i))$. From (18) we obtain

$$c_i(l)\, |\mathrm{cum}_p(s_i)|^{\frac{1}{p-2}} = \frac{\sigma_i \bigl[ c_i(l-1)\, |\mathrm{cum}_p(s_i)|^{\frac{1}{p-2}} \bigr]^{p-1}}{\bigl\| \bigl( c_1(l-1)^{p-1} \mathrm{cum}_p(s_1), \ldots, c_n(l-1)^{p-1} \mathrm{cum}_p(s_n) \bigr)^T \bigr\|}. \qquad (19)$$

From (19) it follows by induction that if the initial conditions satisfy $c_i(0)\, \mathrm{cum}_p(s_i) \neq 0$ for some $i \in \{1, \ldots, n\}$, then the denominator in (19) cannot be zero for any $l \geq 0$, so $w(l)$ in (16) is well defined. Hence, if $w(0) \in W_0$, we obtain

$$\frac{c_i(l)\, |\mathrm{cum}_p(s_i)|^{\frac{1}{p-2}}}{c_{i_0}(l)\, |\mathrm{cum}_p(s_{i_0})|^{\frac{1}{p-2}}} = \frac{\sigma_i}{\sigma_{i_0}} \Biggl( \frac{c_i(l-1)\, |\mathrm{cum}_p(s_i)|^{\frac{1}{p-2}}}{c_{i_0}(l-1)\, |\mathrm{cum}_p(s_{i_0})|^{\frac{1}{p-2}}} \Biggr)^{p-1}. \qquad (20)$$

Now by (20) one proves easily by induction that for every $l \geq 0$ and every $i = 1, \ldots, n$

$$\frac{c_i(l)\, |\mathrm{cum}_p(s_i)|^{\frac{1}{p-2}}}{c_{i_0}(l)\, |\mathrm{cum}_p(s_{i_0})|^{\frac{1}{p-2}}} = \Bigl( \frac{\sigma_i}{\sigma_{i_0}} \Bigr)^{l} \Biggl( \frac{c_i(0)\, |\mathrm{cum}_p(s_i)|^{\frac{1}{p-2}}}{c_{i_0}(0)\, |\mathrm{cum}_p(s_{i_0})|^{\frac{1}{p-2}}} \Biggr)^{(p-1)^l}. \qquad (21)$$

From (21) it follows that $c_i(l) \to 0$ as $l \to \infty$ for $i \neq i_0$ and $|c_{i_0}(l)| \to 1$ (since $\|c(l)\| = 1$), and the speed of the convergence is of order $p - 1$. Thus we have proved

$$\lim_{l \to \infty} y_l(k) = \lim_{l \to \infty} w(l)^T x(k) = \lim_{l \to \infty} c(l)^T s(k) = \pm s_{i_0}(k) \quad \forall k = 1, 2, \ldots. \qquad (22)$$

A similar algorithm can be used for cumulants involving time delays, as in [7]. This allows extraction of signals with zero cumulants. □

4 Examples

Below we consider some examples for concrete values of $p$.

1. $p = 3$. Then

$$\varphi_3(w) = \mathrm{cum}_3(w^T x) = E\{(w^T x)^3\} \quad \text{and} \quad \varphi_3'(w) = 3 E\{(w^T x)^2 x\}.$$

2. $p = 4$. Then

$$\varphi_4(w) = \mathrm{cum}_4(w^T x) = E\{(w^T x)^4\} - 3 \bigl( E\{(w^T x)^2\} \bigr)^2$$

and

$$\varphi_4'(w) = 4 E\{(w^T x)^3 x\} - 12\, E\{(w^T x)^2\}\, E\{x x^T\}\, w.$$

We note that if the standard prewhitening has been performed (i.e. $E\{x x^T\} = I_n$ and $A$ is orthogonal), the algorithm (15) recovers the fixed-point algorithm of Hyvärinen and Oja, i.e.

$$w(l+1) = \frac{E\{(w(l)^T x)^3 x\} - 3 w(l)}{\bigl\| E\{(w(l)^T x)^3 x\} - 3 w(l) \bigr\|}.$$

3. $p = 5$. Then

$$\varphi_5(w) = E\{(w^T x)^5\} - 10\, E\{(w^T x)^3\}\, E\{(w^T x)^2\}$$

and

$$\varphi_5'(w) = 5 E\{(w^T x)^4 x\} - 30\, E\{(w^T x)^2 x\}\, E\{(w^T x)^2\} - 20\, E\{(w^T x)^3\}\, E\{x x^T\}\, w.$$

If, in addition, $E\{x x^T\} = I_n$, then

$$\varphi_5'(w) = 5 E\{(w^T x)^4 x\} - 30\, E\{(w^T x)^2 x\} - 20\, E\{(w^T x)^3\}\, w.$$

4. $p = 6$. Then

$$\varphi_6(w) = E\{(w^T x)^6\} - 15\, E\{(w^T x)^2\}\, E\{(w^T x)^4\} - 10 \bigl( E\{(w^T x)^3\} \bigr)^2 + 30 \bigl( E\{(w^T x)^2\} \bigr)^3,$$

$$\varphi_6'(w) = 6 E\{(w^T x)^5 x\} - 30\, E\{x x^T\} w\, E\{(w^T x)^4\} - 60\, E\{(w^T x)^2\}\, E\{(w^T x)^3 x\} - 60\, E\{(w^T x)^3\}\, E\{(w^T x)^2 x\} + 180 \bigl( E\{(w^T x)^2\} \bigr)^2 E\{x x^T\}\, w.$$

If, in addition, $E\{x x^T\} = I$, then

$$\varphi_6'(w) = 6 E\{(w^T x)^5 x\} - 30\, w\, E\{(w^T x)^4\} - 60\, E\{(w^T x)^3 x\} - 60\, E\{(w^T x)^3\}\, E\{(w^T x)^2 x\} + 180\, w.$$

5 Some remarks on practical implementation

We have implemented the algorithm (15) in the case of sixth-order cumulants, using a deflation procedure. We generated 50 sparse source signals and ran the algorithm on a random mixture of them. The number of iterations for extracting all signals one by one was 313 using sixth-order cumulants and 339 using fourth-order cumulants. The computational time, however, is larger in the sixth-order case; this is explained by the time needed to estimate the sixth-order cumulants, which have a more complex structure. We should also mention that successful extraction of the components depends on their statistical independence. When the components are not sufficiently independent, the extraction is problematic and depends on the initial condition.
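The paper does not spell out its deflation procedure. A standard choice, sketched below under that assumption, is Gram–Schmidt deflation: after each unit vector converges, subsequent iterates are re-orthogonalized against the already extracted directions, so each run of (15) converges to a new column of the (orthogonal) effective mixing matrix. The $p = 4$ update is used for brevity; `extract_all` and its parameters are hypothetical names introduced here for illustration.

```python
import numpy as np

def extract_all(z, n_iter=50, rng=None):
    """Deflationary extraction on whitened data z (n x T) via iteration (15), p = 4.

    Gram-Schmidt deflation is an assumption here; the paper states only
    that *a* deflation procedure was used.
    """
    rng = rng or np.random.default_rng(3)
    n = z.shape[0]
    W = np.zeros((n, n))
    for j in range(n):
        w = rng.standard_normal(n)
        w /= np.linalg.norm(w)
        for _ in range(n_iter):
            w = (z * (w @ z) ** 3).mean(axis=1) - 3 * w   # phi_4'-step (up to scaling)
            w -= W[:j].T @ (W[:j] @ w)                    # project out extracted directions
            w /= np.linalg.norm(w)
        W[j] = w
    return W   # rows approximate the source directions, up to sign and permutation
```

Applied to the whitened data `z` from the earlier sketches, `extract_all(z) @ z` recovers all $n$ sources up to sign and order.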
References

1. Alekseev, V.M., Tihomirov, V.M., Fomin, S.V.: Optimal Control. Nauka, Moscow (1979) (in Russian)
2. Amari, S., Cichocki, A.: Adaptive blind signal processing—neural network approaches. Proc. IEEE 86(10), 2016–2048 (1998) (invited paper)
3. Cichocki, A., Amari, S.: Adaptive Blind Signal and Image Processing. Wiley, Chichester (2002)
4. Delfosse, N., Loubaton, P.: Adaptive blind separation of independent sources: a deflation approach. Signal Process. 45, 59–83 (1995)
5. Hyvärinen, A., Karhunen, J., Oja, E.: Independent Component Analysis. Wiley, New York (2001)
6. Hyvärinen, A., Oja, E.: A fast fixed-point algorithm for independent component analysis. Neural Comput. 9, 1483–1492 (1997)
7. Hyvärinen, A.: Blind source separation by nonstationarity of variance: a cumulant based approach. IEEE Trans. Neural Netw. 12(6), 1471–1474 (2001)
8. Shiryaev, A.N.: Probability. Graduate Texts in Mathematics. Springer, New York (1996)