Kernel independent component analysis

Published: 01 March 2003

Abstract

We present a class of algorithms for independent component analysis (ICA) which use contrast functions based on canonical correlations in a reproducing kernel Hilbert space. On the one hand, we show that our contrast functions are related to mutual information and have desirable mathematical properties as measures of statistical dependence. On the other hand, building on recent developments in kernel methods, we show that these criteria and their derivatives can be computed efficiently. Minimizing these criteria leads to flexible and robust algorithms for ICA. We illustrate with simulations involving a wide variety of source distributions, showing that our algorithms outperform many of the presently known algorithms.
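
As a rough illustration of the kind of dependence measure the abstract describes, the sketch below estimates the first regularized kernel canonical correlation between two one-dimensional samples. It is not the authors' implementation. The Gaussian kernel, the width `sigma`, the regularization `kappa`, the function names, and the toy data are all assumptions made for this example, and it works with full Gram matrices rather than any low-rank approximation.

```python
import numpy as np


def centered_gram(x, sigma=1.0):
    """Centered Gaussian-kernel Gram matrix of a 1-D sample (assumed kernel)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    sq_dist = (x[:, None] - x[None, :]) ** 2
    K = np.exp(-sq_dist / (2.0 * sigma ** 2))
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return H @ K @ H


def first_kernel_canonical_correlation(x, y, sigma=1.0, kappa=2e-2):
    """Largest regularized kernel canonical correlation between x and y.

    Hypothetical helper: each Gram matrix K is shrunk to
    R = (K + (n * kappa / 2) I)^{-1} K, and the result is the largest
    singular value of R1 @ R2. The regularization form is an assumption.
    """
    n = len(x)
    K1 = centered_gram(x, sigma)
    K2 = centered_gram(y, sigma)
    reg = 0.5 * n * kappa * np.eye(n)
    R1 = np.linalg.solve(K1 + reg, K1)
    R2 = np.linalg.solve(K2 + reg, K2)
    return np.linalg.svd(R1 @ R2, compute_uv=False)[0]


# Toy data (assumed): y_dep is uncorrelated with x but fully determined by it.
rng = np.random.default_rng(0)
x = rng.standard_normal(200)
y_ind = rng.standard_normal(200)
y_dep = x ** 2
y_dep = (y_dep - y_dep.mean()) / y_dep.std()

rho_ind = first_kernel_canonical_correlation(x, y_ind)
rho_dep = first_kernel_canonical_correlation(x, y_dep)
print("independent pair:", rho_ind)
print("dependent pair:  ", rho_dep)
```

The dependent pair has essentially zero linear correlation, so a measure like this is useful only because the kernel lets it pick up nonlinear dependence. An ICA procedure in this spirit would search for a demixing matrix that makes such a measure small across the estimated components.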


Published In

The Journal of Machine Learning Research  Volume 3, Issue
3/1/2003
1437 pages
ISSN:1532-4435
EISSN:1533-7928

Publisher

JMLR.org

Publication History

Published: 01 March 2003
Published in JMLR Volume 3

Author Tags

  1. Stiefel manifold
  2. blind source separation
  3. canonical correlations
  4. Gram matrices
  5. incomplete Cholesky decomposition
  6. independent component analysis
  7. integral equations
  8. kernel methods
  9. mutual information
  10. semiparametric models

Qualifiers

  • Article
