
A method for combining mutual information and canonical correlation analysis

Published: 01 February 2012

Highlights

• We propose a hybrid measure of relevance based on MI and KCCA.
• Our measure, PMI, gives more weight to samples with predictive power.
• PMI effectively eliminates samples with no predictive contribution.
• We show that PMI has improved feature detection capability.

Abstract

Feature selection is a critical step in many artificial intelligence and pattern recognition problems. Shannon's Mutual Information (MI) is a classical and widely used measure of dependence that serves as a good basis for feature selection. However, because it measures dependence on average, under-sampled classes (rare events) can be overlooked by this measure, which can cause critical false negatives (missing a relevant feature that is highly predictive of some rare but important classes). Shannon's mutual information requires a well-sampled database, which is not typical of many fields of modern science (such as biomedicine), where there is a limited number of samples to learn from, or at least not all classes of the target function (such as certain phenotypes in biomedical data) are well sampled. Kernel Canonical Correlation Analysis (KCCA), on the other hand, is a nonlinear correlation measure effectively used to detect independence, but its use for feature selection or ranking is limited because its formulation is not intended to measure the amount of information (entropy) of the dependence. In this paper, we propose a hybrid measure of relevance, Predictive Mutual Information (PMI), which is based on MI but, as in KCCA, also accounts for the predictability of signals from each other. We show that PMI has improved feature detection capability compared to MI, especially in catching suspicious coincidences that are rare but potentially important, not only for experimental studies but also for building computational models. We demonstrate the usefulness of PMI, and its superiority over MI, on both toy and real datasets.
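The exact PMI formulation appears only in the full text, not on this page. As a rough, illustrative sketch of the idea described above, the Python below contrasts plain histogram-based MI with a PMI-style score that reweights each sample's pointwise information by a leave-one-out kernel estimate of how predictable its class is from the feature. The binning, the Nadaraya-Watson weighting, and all function names are assumptions standing in for the paper's KCCA-derived predictability term, not the authors' method.

import numpy as np

def _joint_hist(x, y, bins):
    """Discretize feature x into equal-width bins and build the joint
    distribution with the (integer-coded) class labels y."""
    edges = np.histogram_bin_edges(x, bins=bins)
    xd = np.clip(np.digitize(x, edges[1:-1]), 0, bins - 1)
    classes, yd = np.unique(y, return_inverse=True)
    joint = np.zeros((bins, classes.size))
    np.add.at(joint, (xd, yd), 1.0)
    return joint / joint.sum(), xd, yd

def mutual_information(x, y, bins=8):
    """Plain histogram MI: an average over the joint distribution, so
    cells belonging to rare classes carry very little probability mass."""
    joint, _, _ = _joint_hist(x, y, bins)
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log2(joint[nz] / (px @ py)[nz])).sum())

def predictive_mutual_information(x, y, bins=8, sigma=0.5):
    """PMI-style score (illustrative only): each sample's pointwise
    information log2 p(x, y) / (p(x) p(y)) is reweighted by a
    leave-one-out kernel estimate of how predictable its class is from
    the feature -- a stand-in for the paper's KCCA-based weighting."""
    joint, xd, yd = _joint_hist(x, y, bins)
    px, py = joint.sum(axis=1), joint.sum(axis=0)
    pointwise = np.log2(joint[xd, yd] / (px[xd] * py[yd]))
    # Leave-one-out Nadaraya-Watson estimate of P(class_i | x_i):
    # samples whose class is unpredictable from x get weight near the
    # class prior; well-predicted (even rare) samples get weight near 1.
    k = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2.0 * sigma ** 2))
    np.fill_diagonal(k, 0.0)
    same_class = (yd[:, None] == yd[None, :]).astype(float)
    w = (k * same_class).sum(axis=1) / np.maximum(k.sum(axis=1), 1e-12)
    return float(np.mean(w * pointwise))

# Toy comparison on an imbalanced problem: one feature cleanly flags a
# rare class (10 of 500 samples); the other is pure noise.
rng = np.random.default_rng(0)
y = np.array([0] * 490 + [1] * 10)
x_good = np.where(y == 1, rng.normal(6.0, 0.1, 500), rng.normal(0.0, 1.0, 500))
x_noise = rng.normal(0.0, 1.0, 500)
for name, x in [("rare-class feature", x_good), ("noise feature", x_noise)]:
    print(f"{name}: MI={mutual_information(x, y):.4f} "
          f"PMI-style={predictive_mutual_information(x, y):.4f}")

On such a toy problem the weighting leaves well-predicted rare-class samples at full strength while discounting samples whose class cannot be predicted from the feature, which is the behavior the abstract attributes to PMI.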


    Published In

Expert Systems with Applications: An International Journal, Volume 39, Issue 3
February 2012
1661 pages

    Publisher

    Pergamon Press, Inc.

    United States

    Author Tags

    1. Canonical correlation
    2. Gebelein's maximal correlation
    3. Imbalanced datasets
    4. Mutual information
    5. Statistical dependence
    6. Suspicious coincidences

    Qualifiers

    • Research-article
