
Computational Optimization and Applications, 2007
A Bilinear Algorithm for Sparse Representations

Pando Georgiev (1), Panos Pardalos (2), Fabian Theis (3)

(1) University of Cincinnati, ECECS Department, Cincinnati ML 0030, Ohio 45221, USA. E-mail: pgeorgie@ececs.uc.edu
(2) University of Florida, Center for Applied Optimization. E-mail: pardalos@ufl.edu
(3) Institute of Biophysics, University of Regensburg, D-93040 Regensburg, Germany. E-mail: fabian@theis.name

Abstract

We consider the following sparse representation problem: represent a given matrix X ∈ IR^{m×N} as a product X = AS of two matrices A ∈ IR^{m×n} (m ≤ n < N) and S ∈ IR^{n×N}, under the requirements that all m × m submatrices of A are nonsingular and that S is sparse in the sense that each column of S has at least n − m + 1 zero elements. It is known that under some mild additional assumptions such a representation is unique, up to scaling and permutation of the rows of S. We show that finding A (the most difficult part of such a representation) can be reduced to a hyperplane clustering problem. We present a bilinear algorithm for this clustering which is robust to outliers. A computer simulation example is presented showing the robustness of our algorithm.

Keywords: Sparse Component Analysis, Blind Source Separation, underdetermined mixtures

1 Introduction

Representation of data has become a fundamental question in data analysis, signal processing, data mining, neuroscience, etc., since the amount of collected data grows exponentially. In the present paper we consider a linear representation of a given matrix that exploits sparsity. Such a linear representation of a data set X (given in the form of an (m × N)-matrix) is

    X = AS,   A ∈ IR^{m×n},   S ∈ IR^{n×N}.   (1)

Under different assumptions on the unknown matrices A (dictionary) and S (source signals) this linear representation is known as:

1) the Independent Component Analysis (ICA) problem, if the rows of S are (discrete) random variables which are statistically independent as much as possible;

2) Nonnegative Matrix Factorization (NMF) (see [14]), if the elements of X, A and S are nonnegative;

3) the sparse representation or Sparse Component Analysis (SCA) problem, if S contains as many zeros as possible.

There is a large number of papers devoted to ICA problems (see for instance [7], [12] and references therein), but mostly for the case m ≥ n. We refer to [13, 16-18] and references therein for some recent papers on SCA and underdetermined ICA (m < n). A related problem is the so-called Blind Source Separation (BSS) problem, in which we know a priori that a representation such as in equation (1) exists and the task is to recover the sources (and the mixing matrix) as accurately as possible. A fundamental property of the complete BSS problem is that such a recovery (under the assumptions in 1) and non-Gaussianity of the sources) is unique up to permutation and scaling of the sources, which makes the BSS problem so attractive. A similar uniqueness of the recovery holds if, instead of independence, we assume some sparsity of the source matrix (see the next section for a review of this method). The main contribution of this paper is a new algorithm for such a recovery under sparsity assumptions, which is robust to outliers.

We have to note the fundamental difference between recovery of sparse signals by our method and by l1-norm minimization. l1-norm minimization gives solutions which generally have at most m non-zeros [8], [9] and in some special cases gives the sparsest solution [9]. We utilize the fact that for almost all data vectors x (in the sense of measure) such that the system x = As has a sparse solution with fewer than m nonzero elements, this solution is unique, while in [9] the authors proved that for all data vectors x such that the system x = As has a sparse solution with fewer than Spark(A)/2 nonzero elements, this solution is unique. Note that Spark(A) ≤ m + 1, where Spark(A) is the smallest number of linearly dependent columns of A. So our assumption for uniqueness of the representation (up to permutation and scaling of the sources) is less restrictive, since we allow more nonzero elements in the solution, namely up to m − 1.
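The Spark bound can be made concrete with a small brute-force check. The sketch below is our own illustration, not part of the paper; the function name spark and the toy matrix are chosen here for demonstration only.

```python
import itertools
import numpy as np

def spark(A, tol=1e-10):
    """Smallest number of linearly dependent columns of A (brute force).

    Checks column subsets of increasing size and returns the size of the
    first subset whose rank is smaller than its cardinality.
    """
    m, n = A.shape
    for k in range(1, n + 1):
        for cols in itertools.combinations(range(n), k):
            if np.linalg.matrix_rank(A[:, cols], tol=tol) < k:
                return k
    return n + 1  # by convention, when all columns are linearly independent

# Toy example with m = 2, n = 3: any 2 columns are independent,
# so Spark(A) = m + 1 = 3 and the bound Spark(A) <= m + 1 is attained.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
print(spark(A))  # -> 3
```

Since any m + 1 vectors in IR^m are linearly dependent, Spark(A) ≤ m + 1 holds whenever A has more than m columns, which is the bound used in the comparison above.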
2 Blind Source Separation based on sparsity assumptions

In this section we present a method (see [11]) for solving the BSS problem if the following assumptions are satisfied:

A1) the mixing matrix A ∈ IR^{m×n} has the property that any square m × m submatrix of it is nonsingular;

A2) each column of the source matrix S has at most m − 1 nonzero elements;

A3) the sources are sufficiently richly represented in the following sense: for any index set of n − m + 1 elements I = {i_1, ..., i_{n−m+1}} ⊂ {1, ..., n} there exist at least m column vectors of the matrix S such that each of them has zero elements in the places with indexes in I and each m − 1 of them are linearly independent.

2.1 Matrix identification

We describe conditions in the sparse BSS problem under which we can identify the mixing matrix uniquely up to permutation and scaling of the columns.

2.1.1 General case – full identifiability

Theorem 1. (Identifiability conditions – general case) Assume that in the representation X = AS the matrix A satisfies condition A1), the matrix S satisfies conditions A2) and A3), and only the matrix X is known. Then the mixing matrix A is identifiable uniquely up to permutation and scaling of the columns.

2.2 Identification of sources

Theorem 2. (Uniqueness of sparse representation) Let H be the set of all x ∈ IR^m such that the linear system As = x has a solution with at least n − m + 1 zero components. If A fulfills A1), then there exists a subset H_0 ⊂ H with measure zero with respect to H such that for every x ∈ H \ H_0 this system has no other solution with this property.

From Theorem 2 it follows that the sources are identifiable generically, i.e. up to a set with measure zero, if they have a level of sparseness greater than or equal to n − m + 1 (i.e., each column of S has at least n − m + 1 zero elements) and the mixing matrix is known.

Algorithm 1: SCA matrix identification algorithm

Data: samples x(1), ..., x(T) of X
Result: estimated mixing matrix Â

1. Hyperplane identification. Cluster the columns of X into (n choose m−1) groups H_k, k = 1, ..., (n choose m−1), such that the span of the elements of each group H_k produces one hyperplane and these hyperplanes are different.
2. Matrix identification. Cluster the normal vectors to these hyperplanes into the smallest number of groups G_j, j = 1, ..., n (which gives the number of sources n), such that the normal vectors to the hyperplanes in each group G_j lie in a new hyperplane Ĥ_j.
3. Calculate the normal vector â_j to each hyperplane Ĥ_j, j = 1, ..., n.
4. The matrix Â with columns â_j is an estimate of the mixing matrix (up to permutation and scaling of the columns).
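To illustrate steps 2 and 3 of Algorithm 1 (a sketch of our own, not the authors' implementation): every hyperplane that contains the column a_j has a normal vector orthogonal to a_j, so the normals collected in a group G_j lie in a hyperplane Ĥ_j whose own normal is, up to sign and scale, a_j itself. With numpy this normal can be estimated as the right singular vector of the stacked group normals corresponding to the smallest singular value; the function name and the toy data below are assumptions made for the example.

```python
import numpy as np

def normal_of_group(W_group):
    """Unit normal of the hyperplane (through 0) best fitting the rows of
    W_group: the right singular vector with the smallest singular value."""
    _, _, Vt = np.linalg.svd(W_group)
    v = Vt[-1]
    return v / np.linalg.norm(v)

# Illustrative check with m = 3, n = 4: the normals of the hyperplanes
# spanned by pairs {A[:,0], A[:,j]} should reconstruct A[:,0] up to sign/scale.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
normals = []
for j in (1, 2, 3):                      # hyperplanes span{A[:,0], A[:,j]}
    _, _, Vt = np.linalg.svd(A[:, [0, j]].T)
    normals.append(Vt[-1])               # vector orthogonal to both columns
a0_est = normal_of_group(np.array(normals))
print(np.allclose(np.abs(a0_est @ A[:, 0]) / np.linalg.norm(A[:, 0]), 1.0))
```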
Algorithm 2: Source recovery algorithm

Data: samples x(1), ..., x(N) (column vectors) of the data matrix X, and the mixing matrix A
Result: estimated source matrix S

1. Identify the set ℋ of hyperplanes produced by taking the linear hull of every subset of the columns of A with m − 1 elements.
2. Repeat for i = 1 to N:
3. Identify the hyperplane H ∈ ℋ containing x_i := X(:, i) or, in the practical situation with presence of noise, identify the one to which the distance from x_i is minimal, and project x_i onto H to obtain x̃_i.
4. If H is produced by the linear hull of the column vectors a_{i_1}, ..., a_{i_{m−1}}, find coefficients λ_{i,j} such that

       x̃_i = Σ_{j=1}^{m−1} λ_{i,j} a_{i_j}.

   These coefficients are uniquely determined if x̃_i does not belong to the set H_0 with measure zero with respect to H (see Theorem 2).
5. Construct the solution S(:, i): it contains λ_{i,j} in place i_j for j = 1, ..., m − 1; the rest of its components are zero.
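A minimal numpy sketch of this recovery step (our own illustration under assumptions A1)-A2); the function and variable names are ours): for each data column it searches over all (m−1)-element subsets of the columns of A, keeps the subset whose span is closest, and fills the corresponding entries of the source column with the least-squares coefficients.

```python
import itertools
import numpy as np

def recover_sources(X, A):
    """Sparse source recovery given the mixing matrix A (sketch of Algorithm 2)."""
    m, n = A.shape
    _, N = X.shape
    S = np.zeros((n, N))
    subsets = list(itertools.combinations(range(n), m - 1))
    for i in range(N):
        x = X[:, i]
        best = None
        for idx in subsets:
            B = A[:, idx]
            # least-squares coefficients and residual of projecting x onto span(B)
            lam, *_ = np.linalg.lstsq(B, x, rcond=None)
            res = np.linalg.norm(B @ lam - x)
            if best is None or res < best[0]:
                best = (res, idx, lam)
        _, idx, lam = best
        S[list(idx), i] = lam       # nonzeros only at the selected m-1 places
    return S
```

In the noise-free case the winning residual is zero and, by Theorem 2, the coefficients are generically unique.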
3 Sparse Component Analysis

In this section we review a method from [11] for solving the SCA problem. Now the conditions are formulated only in terms of the data matrix X.

Theorem 3. (SCA conditions) Assume that m ≤ n ≤ N and the matrix X ∈ IR^{m×N} satisfies the following conditions:

(i) the columns of X lie in the union ℋ of (n choose m−1) different hyperplanes, each column lies in only one such hyperplane, and each hyperplane contains at least m columns of X such that each m − 1 of them are linearly independent;

(ii) for each i ∈ {1, ..., n} there exist p = (n−1 choose m−2) different hyperplanes {H_{i,j}}_{j=1}^{p} in ℋ such that their intersection L_i = ∩_{j=1}^{p} H_{i,j} is a one-dimensional subspace;

(iii) any m different L_i span the whole IR^m.

Then the matrix X is representable uniquely (up to permutation and scaling of the columns of A and rows of S) in the form X = AS, where the matrices A ∈ IR^{m×n} and S ∈ IR^{n×N} satisfy condition A1) and conditions A2), A3), respectively.

4 Skeletons of a finite set of points

Let X be a finite set of points represented by the columns {x_j}_{j=1}^{N} of the matrix X ∈ IR^{m×N}. The solution {(w_i^0, b_i^0)}_{i=1}^{n} of the following minimization problem:

    minimize   Σ_{j=1}^{N} min_{1≤i≤n} |w_i^T x_j − b_i|
    subject to ||w_i|| = 1,  b_i ∈ IR,  i = 1, ..., n,        (2)

defines the w^(1)-skeleton of X (introduced in [15]). It consists of a union of n hyperplanes H_i = {x ∈ IR^m : w_i^T x = b_i}, i = 1, ..., n, such that the sum of the minimum distances of every point x_j to them is minimal. Analogously, the solution of the following minimization problem:

    minimize   Σ_{j=1}^{N} min_{1≤i≤n} |w_i^T x_j − b_i|^2
    subject to ||w_i|| = 1,  b_i ∈ IR,  i = 1, ..., n,        (3)

defines the w^(2)-skeleton of X (introduced in [3]). It consists of a union of n hyperplanes {H_i}_{i=1}^{n} (defined as above) such that the sum of the squared minimum distances of every point x_j to them is minimal.

It is clear that if the representation X = AS is sparse (in the sense that each column of S contains at most m − 1 non-zero elements, i.e. it satisfies condition A2)), then the two skeletons defined above coincide and the data points (columns of X) lie on them. Conversely, assuming that the two skeletons coincide, the columns of X lie on them and the conditions of Theorem 3 are satisfied, then finding these skeletons leads to finding the mixing matrix and, subsequently, a sparse representation of X in the form (1), which is unique (up to scaling and permutation of the rows of S).

5 Reduction of the clustering problem to a bilinear optimization problem

Below we present a bilinear algorithm for finding a modified skeleton, replacing the condition ||w_i|| = 1 in (2) with the condition w_i^T e = 1, where the vector e has norm one.

Proposition 4. Assume that the w^(1)-skeleton of X is unique and any column of X lies in only one hyperplane of this skeleton. If the minimum value of the cost function in the clustering problem (2) is zero and b_i = 0, then finding the w^(1)-skeleton is equivalent to solving the following bilinear minimization problem (with variables t_{ij} ∈ IR, u_{ij} ∈ IR, w_i ∈ IR^m):

    minimize   Σ_{j=1}^{N} Σ_{i=1}^{n} t_{ij} u_{ij}          (4)

under the constraints

    −u_{ij} ≤ w_i^T x_j ≤ u_{ij}            for all i, j,     (5)
    Σ_{i=1}^{n} t_{ij} = 1,  t_{ij} ≥ 0     for all i, j,     (6)
    w_i^T e = 1                             for all i.        (7)

The solution of (4)-(7) is unique, and for any j only one of the components t_{ij}, i = 1, ..., n, is one while the rest are zero, i.e. the matrix T = {t_{ij}} gives the cluster assignment.

Remark. A necessary condition for the minimum value of the cost function in the minimization problem (4) to be zero is that no x_j, j = 1, ..., N, is collinear to the vector e.

Proof of Proposition 4. Let {w_i}_{i=1}^{n} be the solution of (2). Put t_{ij} = 1 if i = argmin_{1≤k≤n} |w_k^T x_j| and t_{ij} = 0 otherwise. Such an argmin is uniquely defined by the assumption that any column of X lies in only one hyperplane of the skeleton. Putting also w_i' = w_i / (w_i^T e) and u_{ij} = |w_i'^T x_j|, a solution of (4)-(7) is given by w_i', t_{ij}, u_{ij}, i = 1, ..., n, j = 1, ..., N.

Conversely, let w_i', t_{ij}, u_{ij}, i = 1, ..., n, j = 1, ..., N, be a solution of (4)-(7) (with minimum value zero). By (6), for any j there is at least one index i_j such that t_{i_j j} ≠ 0, which implies that u_{i_j j} = 0, i.e. x_j lies in a hyperplane with normal vector w_{i_j}'. Since any column of X lies in only one hyperplane of the skeleton, we have w_i'^T x_j ≠ 0 for every i ≠ i_j, which implies t_{ij} = 0. Putting w_i = w_i' / ||w_i'||, we see that {w_i}_{i=1}^{n} is a solution of (2).
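For a fixed assignment matrix T = {t_{ij}}, problem (4)-(7) splits into n independent linear programs, one per cluster, in the variables (w_i, {u_{ij}}). The following scipy sketch (our own illustration; the function and variable names are ours) solves one such per-cluster LP:

```python
import numpy as np
from scipy.optimize import linprog

def fit_hyperplane_lp(Xk, e):
    """For one cluster of columns Xk (m x Tk), solve the LP
         min sum_j u_j  s.t.  -u_j <= w^T x_j <= u_j,  e^T w = 1,
    i.e. the (w, u) part of problem (4)-(7) for a fixed assignment."""
    m, Tk = Xk.shape
    c = np.concatenate([np.zeros(m), np.ones(Tk)])           # objective: sum of u
    #  w^T x_j - u_j <= 0   and   -w^T x_j - u_j <= 0
    A_ub = np.vstack([np.hstack([ Xk.T, -np.eye(Tk)]),
                      np.hstack([-Xk.T, -np.eye(Tk)])])
    b_ub = np.zeros(2 * Tk)
    A_eq = np.concatenate([e, np.zeros(Tk)]).reshape(1, -1)   # e^T w = 1
    b_eq = np.array([1.0])
    bounds = [(None, None)] * m + [(0, None)] * Tk
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    w = res.x[:m]
    return w / np.linalg.norm(w)                              # normalize as in (2)
```

Alternating this per-cluster LP with reassignment of the columns to their nearest hyperplane is exactly the scheme used in Algorithm 3 below.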
In the case when the assumptions of Proposition 4 are satisfied except for the assumption that the minimum value of the cost function in the clustering problem (2) is zero, the minimization problem (4)-(7) and the following problem

    minimize   Σ_{j=1}^{N} min_{1≤i≤n} |w_i^T x_j|
    subject to w_i^T e = 1,  i = 1, ..., n,                   (8)

are equivalent. The proof is similar to that of Proposition 4, having in mind that

    Σ_{j=1}^{N} Σ_{i=1}^{n} t_{ij} u_{ij}  ≥  Σ_{j=1}^{N} min_{1≤i≤n} u_{ij},

where equality is achieved only if t_{i_j j} = 1 with i_j = argmin_{1≤i≤n} u_{ij}. If f_min and g_min are the minimum values of the minimization problems (2) and (8) respectively, we have f_min ≤ g_min, so if g_min is near zero we can expect that their solutions coincide (this is the case in the example presented below). Problem (8) has the advantage that it can be solved by linear programming (reformulated as problem (4)-(7)), which is robust to outliers. This is stated in the following proposition.

Proposition 5. Assume that the vector e with norm 1 is chosen in such a way that the following condition is satisfied: for any disjoint subsets J_1 and J_2 of {1, ..., N}, the vector e is not collinear to the vector Σ_{j∈J_1} x_j − Σ_{j∈J_2} x_j. If {w_i}_{i=1}^{n} is a solution of (8), then for every i ∈ {1, ..., n} there is at least one j ∈ {1, ..., N} such that w_i^T x_j = 0, i.e. each hyperplane with normal vector w_i contains at least one column vector of X.

Remark. Almost all vectors e (in the sense of measure) in the unit sphere in IR^m satisfy the assumption of Proposition 5, so in practice e is chosen randomly.

Proof of Proposition 5. Denote

    J_i = {j : |w_i^T x_j| = min_{1≤k≤n} |w_k^T x_j|}.

Then w_i is a solution of the following minimization problem:

    minimize   f(w) = Σ_{j∈J_i} |w^T x_j|   subject to w^T e = 1.   (9)

Let i be fixed and denote for simplicity w = w_i. Define

    I^+ = {j ∈ J_i : w^T x_j > 0},
    I^− = {j ∈ J_i : w^T x_j < 0},
    I^0 = {j ∈ J_i : w^T x_j = 0}.

We have to prove that I^0 ≠ ∅. By the Kuhn-Tucker theorem in its subdifferential form and by the chain rule for subdifferentials (see for instance [1]), there exist ε_j ∈ [−1, 1], j ∈ I^0, and λ ∈ IR such that

    Σ_{j∈I^+} x_j − Σ_{j∈I^−} x_j + Σ_{j∈I^0} ε_j x_j = λ e.   (10)

Here we used the fact that the subdifferential of the function |·| at zero is equal to [−1, 1]. Multiplying (10) by w we obtain λ = f_min. If f_min = 0, the proposition is proved. Otherwise λ ≠ 0. Now if we assume that I^0 = ∅, then (10) says that Σ_{j∈I^+} x_j − Σ_{j∈I^−} x_j is a nonzero multiple of e, which contradicts the assumption on the choice of e.

The above proposition shows that our method for finding skeletons is robust to outliers in the following sense: if most of the column vectors of X lie on hyperplanes, finding these hyperplanes can be performed by solving the linear programming problems in (4)-(7). Note that such robustness to outliers is not unexpected; it appears, for instance, in linear regression and clustering problems when the l2 (Euclidean) norm is replaced by the l1 norm (see for instance [4]).

The linear programming problems with respect to {t_{ij}} and with respect to ({w_i}_{i=1}^{n}, {u_{ij}}) can be solved separately, which leads to the following analogue (see Algorithm 3) of the k-median clustering algorithm of Bradley, Mangasarian and Street [4]. Note that the linear programming problem with respect to {t_{ij}} is the cluster assignment step, which is realized simply as an assignment. The finite termination of the algorithm can be proved analogously to the proof of Theorem 3.7 in [3].

Algorithm 3: n-planes clustering algorithm via linear programming

Data: samples x(1), ..., x(T) of X
Result: estimated matrix W containing the normal vectors of the hyperplanes

1. Initialize W = (w_1, ..., w_n) randomly.
2. for j ← 1, ..., j_0 do
3.   Cluster assignment. for t ← 1, ..., T do: add x(t) to cluster Y^(i), where i is chosen to minimize |w_i^T x(t)| (the distance to the hyperplane given by the i-th column of W). end
     Exit if the mean distance to the hyperplanes is smaller than some preset value.
4.   Matrix update. Put T_k = {t ∈ {1, ..., T} : x(t) ∈ Y^(k)}. for k ← 1, ..., n do: solve the following linear programming problem (with respect to w and u(t), t ∈ T_k):
         minimize Σ_{t∈T_k} u(t)
         under the constraints −u(t) ≤ w^T x(t) ≤ u(t), t ∈ T_k, and w^T e = 1.
     Let w̃_k be the solution in place of w and put w_k = w̃_k / ||w̃_k||. end
5. end

The constant j_0 is chosen sufficiently large in practice.
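Putting the pieces together, here is a compact Python sketch of Algorithm 3 (our own rendering, not the authors' code; the function name, the default number of iterations j0 and the stopping tolerance are our choices):

```python
import numpy as np
from scipy.optimize import linprog

def n_planes_clustering(X, n, e=None, j0=50, tol=1e-6, rng=None):
    """Sketch of Algorithm 3: alternate cluster assignment and LP-based
    hyperplane updates. Returns W (m x n) with unit normal vectors as columns."""
    rng = np.random.default_rng(rng)
    m, T = X.shape
    e = rng.standard_normal(m) if e is None else e
    e = e / np.linalg.norm(e)                      # random unit vector (see Prop. 5)
    W = rng.standard_normal((m, n))
    W /= np.linalg.norm(W, axis=0)
    for _ in range(j0):
        # cluster assignment: nearest hyperplane by |w_i^T x(t)|
        labels = np.argmin(np.abs(W.T @ X), axis=0)
        if np.mean(np.min(np.abs(W.T @ X), axis=0)) < tol:
            break
        # matrix update: one LP per cluster, in (w, u(t), t in T_k)
        for k in range(n):
            Xk = X[:, labels == k]
            Tk = Xk.shape[1]
            if Tk == 0:
                continue
            c = np.concatenate([np.zeros(m), np.ones(Tk)])
            A_ub = np.vstack([np.hstack([ Xk.T, -np.eye(Tk)]),
                              np.hstack([-Xk.T, -np.eye(Tk)])])
            b_ub = np.zeros(2 * Tk)
            A_eq = np.concatenate([e, np.zeros(Tk)]).reshape(1, -1)
            res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                          bounds=[(None, None)] * m + [(0, None)] * Tk)
            if res.success:
                W[:, k] = res.x[:m] / np.linalg.norm(res.x[:m])
    return W
```

The vector e is drawn randomly, which by the remark after Proposition 5 satisfies the required non-collinearity condition with probability one.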
6 Computer simulation example

We present an example, created in Matlab, showing the robustness of our algorithm to outliers. We created a matrix S0 ∈ IR^{5×120} from 5 random variables with normal distribution, using the command S0 = randn(5, 120) (in Matlab notation). We set to zero two randomly chosen elements in each column of S0 and obtained a 2-sparse source matrix S, plotted in Figure 1.

[Figure 1: Example. Artificially created 2-sparse source signals.]

We created randomly a normalized mixing matrix (the norm of each column is one)

    H = [  0.5688   0.1983  −0.4287  −0.3894  −0.7692
           0.4337  −0.4531  −0.2374  −0.0730   0.2047
          −0.6317  −0.2109  −0.1637   0.8516  −0.6037
          −0.2988  −0.8432  −0.8562  −0.3432  −0.0443 ]

and created the mixed signal matrix as X = HS. We added to the first 20 columns of X a noise N created as N = 0.1 * randn(5, 20) (in Matlab notation).

We ran our n-planes clustering algorithm repeatedly: after obtaining a hyperplane such that at least 5 columns of X are within distance 10^(−4) of it, we removed these columns and ran the algorithm again in the same way, until we obtained 10 hyperplanes. They produced the following estimate of the mixing matrix (after running the matrix identification part of Algorithm 1, which reconstructs the matrix from the planes):

    A = [  0.7692   0.1983   0.3894   0.4287   0.5688
          −0.2047  −0.4531   0.0730   0.2374   0.4337
           0.6037  −0.2109  −0.8516   0.1637  −0.6317
           0.0443  −0.8432   0.3432   0.8562  −0.2988 ]

Such a perfect reconstruction was not possible when running the algorithm with precision lower than 10^(−4). After running the source recovery algorithm (see Algorithm 2) we obtained the recovered signals plotted in Figure 2.

[Figure 2: Example. Recovered source signals using the recovered mixing matrix A. The samples from 20 to 120 recover perfectly (after permutation and scaling) the original source signals.]
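For readers who want to reproduce a similar setup, the following numpy sketch (our own approximation of the Matlab commands above, with our own random seed, and with the noise block sized to match the four rows of X) generates 2-sparse sources, mixes them with a random column-normalized 4 × 5 matrix, and verifies that every noise-free column of X lies on one of the (5 choose 3) = 10 hyperplanes spanned by three columns of the mixing matrix.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)

# 2-sparse sources: 5 x 120, two randomly chosen zeros per column
S = rng.standard_normal((5, 120))
for j in range(S.shape[1]):
    S[rng.choice(5, size=2, replace=False), j] = 0.0

# random mixing matrix with unit-norm columns (4 x 5), mixtures X = A S
A = rng.standard_normal((4, 5))
A /= np.linalg.norm(A, axis=0)
X = A @ S
X[:, :20] += 0.1 * rng.standard_normal((4, 20))   # noise on the first 20 columns

# each noise-free column of X lies on one of the C(5,3) = 10 hyperplanes
# spanned by 3 columns of A; check this via the hyperplane normals
normals = []
for idx in itertools.combinations(range(5), 3):
    _, _, Vt = np.linalg.svd(A[:, idx].T)
    normals.append(Vt[-1])                         # normal of span(A[:, idx])
dists = np.min(np.abs(np.array(normals) @ X[:, 20:]), axis=0)
print(dists.max() < 1e-10)                         # -> True
```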
7 Conclusion

We developed a new algorithm for hyperplane clustering which, in combination with other algorithms, leads to a sparse representation of signals, in the sense that we can reconstruct uniquely (up to permutation and scaling) a given linear mixture as the product of a mixing matrix and a sparse source matrix. Our algorithm has the advantage that it is robust to outliers and relies on linear programming. An example is presented showing the robustness of our algorithm.

References

[1] V.M. Alekseev, V.M. Tihomirov and S.V. Fomin, Optimal Control, Nauka, 1979 (in Russian).
[2] P. Bofill and M. Zibulevsky, "Underdetermined Blind Source Separation using Sparse Representation", Signal Processing, vol. 81, no. 11, pp. 2353-2362, 2001.
[3] P.S. Bradley and O.L. Mangasarian, "k-Plane Clustering", Journal of Global Optimization, vol. 16, no. 1, pp. 23-32, 2000.
[4] P.S. Bradley, O.L. Mangasarian and W.N. Street, "Clustering via Concave Minimization", Mathematical Programming Technical Report 96-03, May 1996; Advances in Neural Information Processing Systems 9, MIT Press, Cambridge, MA, 1997, pp. 368-374.
[5] P.S. Bradley, U.M. Fayyad and O.L. Mangasarian, "Mathematical programming for data mining: formulations and challenges", INFORMS Journal on Computing, vol. 11, no. 3, pp. 217-238, 1999.
[6] A.M. Bronstein, M.M. Bronstein, M. Zibulevsky and Y.Y. Zeevi, "Blind Separation of Reflections using Sparse ICA", in Proc. Int. Conf. ICA2003, Nara, Japan, pp. 227-232.
[7] A. Cichocki and S. Amari, Adaptive Blind Signal and Image Processing, John Wiley, Chichester, 2002.
[8] S. Chen, D. Donoho and M. Saunders, "Atomic decomposition by basis pursuit", SIAM Journal on Scientific Computing, vol. 20, no. 1, pp. 33-61, 1998.
[9] D. Donoho and M. Elad, "Optimally sparse representation in general (nonorthogonal) dictionaries via l1 minimization", Proceedings of the National Academy of Sciences, vol. 100, no. 5, pp. 2197-2202, 2003.
[10] D. Donoho and V. Stodden, "When Does Non-Negative Matrix Factorization Give a Correct Decomposition into Parts?", Neural Information Processing Systems (NIPS) 2003 Conference, http://books.nips.cc
[11] P. Georgiev, F. Theis and A. Cichocki, "Sparse Component Analysis and Blind Source Separation of Underdetermined Mixtures", IEEE Transactions on Neural Networks, to appear.
[12] A. Hyvärinen, J. Karhunen and E. Oja, Independent Component Analysis, John Wiley & Sons, 2001.
[13] T.-W. Lee, M.S. Lewicki, M. Girolami and T.J. Sejnowski, "Blind source separation of more sources than mixtures using overcomplete representations", IEEE Signal Processing Letters, vol. 6, no. 4, pp. 87-90, 1999.
[14] D.D. Lee and H.S. Seung, "Learning the parts of objects by non-negative matrix factorization", Nature, vol. 401, pp. 788-791, 1999.
[15] A.M. Rubinov and J. Ugon, "Skeletons of finite sets of points", Research Working Paper 03/06, School of Information Technology & Mathematical Sciences, University of Ballarat, 2003.
[16] F.J. Theis, E.W. Lang and C.G. Puntonet, "A geometric algorithm for overcomplete linear ICA", Neurocomputing, in print, 2003.
[17] K. Waheed and F. Salem, "Algebraic Overcomplete Independent Component Analysis", in Proc. Int. Conf. ICA2003, Nara, Japan, pp. 1077-1082.
[18] M. Zibulevsky and B.A. Pearlmutter, "Blind source separation by sparse decomposition in a signal dictionary", Neural Computation, vol. 13, no. 4, pp. 863-882, 2001.