Summary of EFICA
A basic assumption in ICA is that the elements of the original source signals S, denoted by s_ij, are mutually independent random variables, identically distributed within each row, with probability density functions (pdfs) p_i(s_ij), i = 1, ..., d. The row variables s_ij for all j = 1, ..., N, having the same density, are thus an independent identically distributed (i.i.d.) sample of one of the independent sources, denoted by s_i. The key assumptions for the identifiability of the model (1), that is, for recovering both the mixing matrix A and the original sources S up to some simple ambiguities (the order, scale, and sign of the sources), are that all but at most one of the densities are non-Gaussian and that the unknown matrix A has full rank. (The rank of a matrix is the maximum number of linearly independent columns, which always equals the maximum number of linearly independent rows. Full rank means that the rank equals the smaller of the number of rows and the number of columns. All columns can be linearly independent only if the number of columns does not exceed the number of rows; for a square d x d mixing matrix, full rank means the matrix is invertible.) <- Part of ICA intro
The basic ICA problem and its extensions and applications have been studied widely, and many algorithms have been developed. One of the main differences between them is how the unknown pdfs p_i() of the original signals are estimated or replaced by suitable nonlinearities in the ICA contrast functions. Non-Gaussianity is the key property. For instance, JADE [SELF] is based on the estimation of kurtosis via cumulants, NPICA [SELF] uses a nonparametric model of the density functions, and RADICAL [SELF] uses an approximation of the entropy of the densities based on order statistics. The FastICA algorithm uses either kurtosis [FastICAs Paper] or other measures of non-Gaussianity in entropy approximations, in the form of suitable nonlinear functions G(). <- Part of intro. Please include the other nonlinearities here as well.
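The mixing model and the full-rank assumption can be illustrated with a small NumPy sketch. It assumes the usual noiseless linear form X = AS with a square d x d mixing matrix; the function name make_mixture, the choice of Laplace and uniform sources, and the sizes are illustrative assumptions, not taken from the EFICA paper.

```python
import numpy as np

def make_mixture(d=3, n=1000, seed=0):
    """Draw d independent non-Gaussian sources and mix them: X = A @ S."""
    rng = np.random.default_rng(seed)
    # Each row of S is an i.i.d. sample of one source; rows alternate between
    # Laplace (super-Gaussian) and uniform (sub-Gaussian) distributions.
    S = np.vstack([
        rng.laplace(size=n) if i % 2 == 0 else rng.uniform(-1.0, 1.0, size=n)
        for i in range(d)
    ])
    A = rng.standard_normal((d, d))  # hypothetical random square mixing matrix
    X = A @ S                        # observed mixtures, one row per sensor
    return S, A, X

S, A, X = make_mixture()
# For a square d x d matrix, full rank means rank == d, i.e. A is invertible.
rank_A = np.linalg.matrix_rank(A)
print("rank of A:", rank_A, "of", A.shape[0])
```

A random Gaussian matrix is full rank with probability one, which is why it is a convenient stand-in for an unknown mixing matrix in experiments.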
Though ICA has been very successful in large-scale practical problems, some issues remain, such as the theoretical accuracy of the algorithm over the various possible inputs. For an algorithm to be considered statistically efficient, its accuracy should reach the Cramér-Rao lower bound (CRB). (Need to add info about the CRB here. Also note in the CRB part that, in practice, the demixing matrix W is not exactly the inverse of the mixing matrix A, so the estimated sources only approximate the original signals. The error is characterized by the variance of the elements of WA - I, where WA is the product of the demixing and mixing matrices and I is the d x d identity matrix, which has the same dimensions as A and W.)
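The deviation of WA from the identity can be computed directly. The sketch below is only an illustration: the matrices are made up, the helper names are hypothetical, and the interference-to-signal ratio is one common summary of the off-diagonal error rather than something defined in these notes. It also assumes the order and scale ambiguities have already been resolved, so that perfect separation corresponds to WA = I.

```python
import numpy as np

def gain_matrix(W, A):
    """Product of demixing and mixing matrices; the identity for perfect separation."""
    return W @ A

def interference_to_signal(G):
    """Per-row ratio of off-diagonal (interference) power to diagonal (signal) power."""
    power = G ** 2
    signal = np.diag(power)
    return (power.sum(axis=1) - signal) / signal

A = np.array([[1.0, 0.5], [0.3, 1.0]])  # hypothetical mixing matrix
# Hypothetical demixing estimate: the exact inverse plus a small error.
W = np.linalg.inv(A) + 0.01 * np.array([[0.0, 1.0], [1.0, 0.0]])

G = gain_matrix(W, A)
print(G - np.eye(2))               # deviation of WA from the identity
print(interference_to_signal(G))   # small values mean good separation
```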
An asymptotic performance analysis of the FastICA algorithm was compared with the CRB for ICA [SELF] and showed that the accuracy of FastICA is very close, but not equal, to the CRB. The condition for this is that the nonlinearity G() in the FastICA contrast function is the integral of the score function of the original signals, that is, the negative log density [SELF].
When the asymptotic performance achieves the CRB, the best possible accuracy is reached and cannot be improved further. Use Section II.A in FastICA as a chapter.
Use the last paragraph for EFICA, along with the figure. (If possible, try to understand the maths of Section B.)
EFFICIENT FASTICA: EFICA
The proposed efficient version of FastICA is based on the following observations:
i) The symmetric FastICA algorithm can be run with a different nonlinearity for each source.
ii) In the symmetrization step of each iteration, it is possible to introduce auxiliary constants that can be tuned to minimize the mean square estimation error in one (say, the kth) row of the estimated demixing matrix. These estimations can be performed in parallel for all rows to obtain an estimate of the whole demixing matrix that achieves the corresponding CRB, provided the nonlinearities correspond to the score functions of the sources.
iii) The algorithm remains asymptotically efficient (attaining the CRB) if the theoretically optimum auxiliary constants are replaced by their consistent estimates.
The proposed algorithm, EFICA, models all independent signals as having a generalized Gaussian (GG) distribution with appropriate parameters α. The algorithm is summarized in Fig. 1. Note that, unlike in symmetric FastICA, the output is not constrained, in the sense that the separated components need not have exactly zero sample correlations. To explain the proposed algorithm in more detail, the notion of generalized symmetric FastICA is introduced and its efficiency is studied in Section III-A. The EFICA algorithm is presented in detail in Section III-B.
A. Generalizing the Symmetric FastICA to Attain the CRB
Consider a version of symmetric FastICA in which two changes have been made. First, because the CRB cannot be attained if only one nonlinearity g() is used, different nonlinear functions g_k(), k = 1, ..., d, are used to estimate each row of W+. Second, the first step of the iteration is followed by multiplying each row of W+ by a suitable positive number c_i, i = 1, ..., d, before the symmetric orthogonalization. This changes the length (norm) of each row, which affects the orientations of the rows after orthonormalization.
The true score functions are rarely known in advance, so the generalized symmetric FastICA has only a theoretical meaning. It can be proved, however, that the asymptotic efficiency of the algorithm is maintained if the score functions and the optimum coefficients are replaced by their consistent estimates.
For the consistent estimation, two things are needed. First, a consistent initial estimate of the mixing or demixing matrix is required; the ordinary symmetric FastICA is one possible choice. Second, one needs a consistent estimate of the score functions, computed from the sample distribution functions of the components. This is a widely studied task, and numerous approaches have been developed, either parametric [21] or nonparametric [11], [12], [22]. Note, however, that not every score function can serve as a suitable nonlinearity in the FastICA iteration: a suitable nonlinearity must be continuous and differentiable.
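The two changes that define the generalized symmetric FastICA (a separate nonlinearity per row and a positive row weight before the symmetric orthogonalization) can be sketched as a single iteration. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes whitened data Z, uses the standard one-unit fixed-point update for each row, and takes the weights c as given rather than computing their optimal values. The names sym_orth and generalized_step are hypothetical.

```python
import numpy as np

def sym_orth(W):
    """Symmetric orthogonalization: (W W^T)^(-1/2) W."""
    vals, vecs = np.linalg.eigh(W @ W.T)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T @ W

def generalized_step(W, Z, gs, dgs, c):
    """One iteration with per-row nonlinearities and row weights.

    W   : (d, d) current demixing estimate for whitened data
    Z   : (d, N) whitened observations
    gs  : list of d nonlinearities g_k
    dgs : list of their derivatives g_k'
    c   : d positive row weights (assumed given here)
    """
    d = Z.shape[0]
    W_plus = np.empty_like(W)
    for k in range(d):
        u = W[k] @ Z  # current estimate of the kth source
        # Standard one-unit FastICA update with the kth nonlinearity.
        W_plus[k] = (Z * gs[k](u)).mean(axis=1) - dgs[k](u).mean() * W[k]
    W_plus *= np.asarray(c, dtype=float)[:, None]  # rescale each row by c_k
    return sym_orth(W_plus)

rng = np.random.default_rng(0)
Z = rng.laplace(scale=1 / np.sqrt(2), size=(2, 500))  # roughly unit-variance toy data
dtanh = lambda u: 1.0 - np.tanh(u) ** 2
W1 = generalized_step(np.eye(2), Z, [np.tanh, np.tanh], [dtanh, dtanh], c=[1.0, 2.0])
```

With all weights equal, the row scaling cancels in the orthogonalization and the step reduces to ordinary symmetric FastICA; unequal weights change how strongly each row pulls the orthogonalized result toward its own one-unit estimate.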
B. Proposed Algorithm
In this section, an algorithm called EFICA for brevity is proposed. It combines the idea of the generalized symmetric FastICA with an adaptive choice of the function g_k, based on modelling the distribution of each independent component by a GG distribution [15]. The algorithm consists of three steps.
Step 1) Running the Symmetric FastICA Until Convergence: The purpose of Step 1) is to get preliminary estimates of the original signals quickly and reliably. In this step, therefore, the nonlinearity g(s) = tanh(s), one of the options in the original symmetric FastICA, is used because of its universality, but other choices seem to give promising results as well, e.g., g(s) = s/(1 + s^2). The test for saddle points introduced in [8] is also performed to get reliable source estimates.
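Step 1 can be sketched as plain symmetric FastICA with the tanh nonlinearity. The code below is a simplified illustration under stated assumptions: whitening is done by an eigendecomposition of the sample covariance, the stopping rule and iteration limit are arbitrary choices, the toy mixing matrix is made up, and the saddle-point test from [8] is not included.

```python
import numpy as np

def whiten(X):
    """Center the rows of X and transform them to identity sample covariance."""
    X = X - X.mean(axis=1, keepdims=True)
    vals, vecs = np.linalg.eigh(np.cov(X))
    return (vecs @ np.diag(vals ** -0.5) @ vecs.T) @ X

def symmetric_fastica(Z, n_iter=200, tol=1e-8, seed=0):
    """Symmetric FastICA with g = tanh on whitened data Z of shape (d, N)."""
    d, n = Z.shape
    rng = np.random.default_rng(seed)
    W = np.linalg.qr(rng.standard_normal((d, d)))[0]  # random orthogonal start
    for _ in range(n_iter):
        G = np.tanh(W @ Z)
        # Row k: E[z g(w_k^T z)] - E[g'(w_k^T z)] w_k
        W_plus = G @ Z.T / n - np.diag((1.0 - G ** 2).mean(axis=1)) @ W
        vals, vecs = np.linalg.eigh(W_plus @ W_plus.T)
        W_new = vecs @ np.diag(vals ** -0.5) @ vecs.T @ W_plus
        # Converged when every row matches its previous value up to sign.
        done = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0)) < tol
        W = W_new
        if done:
            break
    return W

rng = np.random.default_rng(1)
S = np.vstack([rng.laplace(size=2000), rng.uniform(-1.0, 1.0, size=2000)])
X = np.array([[1.0, 0.5], [0.3, 1.0]]) @ S  # hypothetical mixture
Z = whiten(X)
W = symmetric_fastica(Z)
U = W @ Z  # preliminary source estimates u_k passed on to the next step
```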
Step 2) Adaptive Choice of Nonlinearities: Different nonlinearities g_k are chosen adaptively to estimate the score functions of the found sources, based on the outcome of Step 1). Assume that u_k is the kth estimated independent signal obtained in Step 1). In many real situations, the distributions of the signals are unimodal and symmetric. The paper focuses on a parametric choice of g_k that works well for the class of GG distributions with parameter α, denoted GG(α). Up to a positive scale factor, the score function of this distribution is sign(x) |x|^(α-1).
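To make the GG(α) family more concrete, the sketch below evaluates its score function up to scale and its normalized fourth moment as a function of α. The fourth-moment expression is the standard Gamma-function identity for a density proportional to exp(-|x/β|^α); it is not stated in these notes, and the function names are hypothetical.

```python
import math

def gg_score(x, alpha):
    """Score of GG(alpha) up to a positive scale factor: sign(x) * |x|**(alpha - 1)."""
    if x == 0:
        # Value at the origin set to 0 by convention; for alpha <= 1 the true
        # score has no continuous extension at this point.
        return 0.0
    return math.copysign(abs(x) ** (alpha - 1), x)

def gg_fourth_moment(alpha):
    """Normalized fourth moment E[x^4] / E[x^2]^2 of GG(alpha)."""
    return math.gamma(5 / alpha) * math.gamma(1 / alpha) / math.gamma(3 / alpha) ** 2

for alpha in (1.0, 2.0, 4.0, 100.0):
    print(alpha, round(gg_fourth_moment(alpha), 4))
```

The value equals 3 for the Gaussian case α = 2, exceeds 3 for α < 2, and falls toward 1.8 (the uniform distribution) as α grows, which is why an estimated fourth moment can be used to judge which part of the family a source belongs to.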
A problem with the score function of the distribution GG(α) is that it is not continuous for α ≤ 1, and thus it is not a valid nonlinearity for FastICA. For these values of α, statistical efficiency cannot be achieved by the algorithm using this score function. The authors therefore treat the super-Gaussian (α < 2) and sub-Gaussian (α > 2) cases separately. In summary, the chosen nonlinearity depends on m_4k, the estimated fourth-order moment of the kth source signal, and on α_k, whose expression involves the constants v1 ≈ 0.2096 and v2 ≈ 0.1851.
Step 3) The Refinement: Each of the found source components is refined, or fine-tuned, by one-unit FastICA using the nonlinearities found in Step 2), followed by another fine-tuning using the optimal c_k parameters (the positive row weights applied before the symmetric orthogonalization).
The refinement of the initial estimate proceeds in two steps. The first step, denoted R1, is a more sophisticated implementation of the relation (15). Theoretically, it would suffice to perform (15) once, starting from the initial estimate of W. However, better results are obtained if it is performed separately for each k as a series of one-unit FastICA iterations until convergence is achieved. In the last iteration, however, the normalization step is skipped. This method works well if the preliminary estimates u_k of the original signals from the first step (symmetric FastICA) lie in the right domain of attraction. It might happen, however, that some components are difficult to separate from others, and the one-unit iterations converge to a wrong component. This pathological case can be excluded by checking whether the angle between the component separated by the initial solution and the one-unit solution is too large. If it is, the one-unit solution should be replaced by the initial estimate.
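The angle check that guards against convergence to a wrong component can be sketched as follows. The threshold of 30 degrees and the function name are hypothetical choices made for illustration; these notes do not say which value the paper uses. The absolute value of the inner product is taken so that a mere sign flip of the demixing vector is not treated as a change of component.

```python
import numpy as np

def accept_one_unit(w_init, w_refined, max_angle_deg=30.0):
    """Return the refined vector unless it has drifted too far from the initial one."""
    w_init = np.asarray(w_init, dtype=float)
    w_refined = np.asarray(w_refined, dtype=float)
    cos = abs(w_init @ w_refined) / (np.linalg.norm(w_init) * np.linalg.norm(w_refined))
    angle = np.degrees(np.arccos(np.clip(cos, 0.0, 1.0)))
    # Fall back to the initial (symmetric FastICA) estimate when the angle is too large.
    return w_refined if angle <= max_angle_deg else w_init

w0 = np.array([1.0, 0.0])
print(accept_one_unit(w0, np.array([0.99, 0.10])))  # small drift
print(accept_one_unit(w0, np.array([0.0, 1.0])))    # orthogonal direction
```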