Kernel independent component analysis

Authors:

Francis R. Bach, Michael I. Jordan

The Journal of Machine Learning Research, Volume 3

Pages 1 - 48

https://doi.org/10.1162/153244303768966085

Published: 01 March 2003

Abstract

We present a class of algorithms for independent component analysis (ICA) which use contrast functions based on canonical correlations in a reproducing kernel Hilbert space. On the one hand, we show that our contrast functions are related to mutual information and have desirable mathematical properties as measures of statistical dependence. On the other hand, building on recent developments in kernel methods, we show that these criteria and their derivatives can be computed efficiently. Minimizing these criteria leads to flexible and robust algorithms for ICA. We illustrate with simulations involving a wide variety of source distributions, showing that our algorithms outperform many of the presently known algorithms.
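To make the idea of a contrast function based on canonical correlations in a reproducing kernel Hilbert space more concrete, the following is a minimal sketch for two one-dimensional variables. It is not taken from the abstract: the Gaussian kernel, the ridge-style regularization, the parameter names `sigma` and `kappa`, and the function names `centered_gram`, `kernel_canonical_correlation`, and `kcca_contrast` are all assumptions made for illustration. It forms dense Gram matrices, so it is only practical for small samples and does not reflect the efficient computation the abstract refers to.

```python
import numpy as np


def centered_gram(x, sigma=1.0):
    """Centered Gram matrix of a 1-D sample (Gaussian kernel is an assumption)."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    gram = np.exp(-((x - x.T) ** 2) / (2.0 * sigma ** 2))
    n = gram.shape[0]
    centering = np.eye(n) - np.full((n, n), 1.0 / n)
    return centering @ gram @ centering


def kernel_canonical_correlation(x, y, sigma=1.0, kappa=2e-2):
    """Regularized first kernel canonical correlation between samples x and y.

    The ridge term (n * kappa / 2) * I is an assumed form of regularization;
    without it the empirical correlation in feature space is trivially 1.
    """
    n = len(x)
    ridge = 0.5 * n * kappa * np.eye(n)
    kx = centered_gram(x, sigma)
    ky = centered_gram(y, sigma)
    # Shrunk operators (K + ridge)^{-1} K, symmetric with eigenvalues in [0, 1).
    rx = np.linalg.solve(kx + ridge, kx)
    ry = np.linalg.solve(ky + ridge, ky)
    # The largest singular value of rx @ ry is the regularized correlation.
    rho = np.linalg.svd(rx @ ry, compute_uv=False)[0]
    return float(min(rho, 1.0 - 1e-12))


def kcca_contrast(x, y, sigma=1.0, kappa=2e-2):
    """Dependence score -0.5 * log(1 - rho); larger means more dependent."""
    rho = kernel_canonical_correlation(x, y, sigma, kappa)
    return -0.5 * np.log(1.0 - rho)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    s = rng.standard_normal(200)
    independent = rng.standard_normal(200)
    # s and s**2 are uncorrelated but strongly dependent.
    print("dependent pair:  ", kcca_contrast(s, s ** 2))
    print("independent pair:", kcca_contrast(s, independent))
```

In an ICA setting, a score of this kind would be evaluated on the estimated components and minimized over the demixing matrix. The sketch only evaluates the score for a fixed pair of signals.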

Published In

The Journal of Machine Learning Research, Volume 3
3/1/2003
1437 pages
ISSN: 1532-4435
EISSN: 1533-7928

Publisher

JMLR.org