A self-contained, cross-platform package for computing mutual information,
joint/conditional probability, entropy, and more. This package has also been
used for general machine learning and data mining purposes such as feature
selection, Bayesian network construction, signal processing, etc.
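As a rough illustration of the quantities named above, the following is a minimal Python sketch (not this package's own code) that estimates entropy, conditional entropy, and mutual information from discrete samples by counting frequencies. The function names `entropy`, `conditional_entropy`, and `mutual_information` are hypothetical and chosen only for this example.

```python
from collections import Counter
from math import log2


def entropy(xs):
    """Shannon entropy H(X) in bits, estimated from a sequence of discrete values."""
    n = len(xs)
    return -sum((c / n) * log2(c / n) for c in Counter(xs).values())


def conditional_entropy(xs, ys):
    """H(X|Y) = H(X,Y) - H(Y), estimated from paired samples."""
    return entropy(list(zip(xs, ys))) - entropy(ys)


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from paired samples."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


if __name__ == "__main__":
    x = [0, 0, 1, 1]
    y = [0, 0, 1, 1]   # identical to x, so it shares all of x's information
    z = [0, 1, 0, 1]   # empirically independent of x in this tiny sample
    print(mutual_information(x, y), mutual_information(x, z))
```

These are plug-in (frequency-count) estimates, so they are only meaningful for discrete data with enough samples per state.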
This package also contains two functions for minimal-redundancy feature
selection. Both functions support only discretized data, so if you have
continuous data in "d", you will need to discretize it first.
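One possible way to discretize a continuous feature, sketched here in Python, is to threshold it at the mean plus or minus a multiple of the standard deviation, giving three states. This particular scheme, the function name `discretize`, and the parameter `k` are assumptions made for illustration; the package itself does not prescribe a discretization method in this description.

```python
from statistics import mean, pstdev


def discretize(values, k=1.0):
    """Map each value to -1, 0, or 1 using thresholds at mean - k*std and mean + k*std.

    Assumption for this sketch: three states are enough; other binning
    schemes (equal-width, quantiles, ...) would work just as well.
    """
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    lo, hi = mu - k * sigma, mu + k * sigma
    return [-1 if v < lo else (1 if v > hi else 0) for v in values]


if __name__ == "__main__":
    column = [0.0, 5.0, 5.0, 5.0, 10.0]
    print(discretize(column))
```

Each column of the data would be discretized separately before being passed to the feature selection functions.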
The package includes two source code files for the mRMR (minimum-redundancy
maximum-relevancy) feature selection method of Peng et al. (2005) and
Ding & Peng (2003, 2005), whose
better performance over the conventional top-ranking method has been
demonstrated on a number of data sets in recent publications. This version uses
mutual information as a proxy for computing relevance and redundancy among
variables (features). Other variations, such as using correlation, the F-test,
or distances, can be easily implemented within this framework, too.
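The idea of trading relevance against redundancy can be sketched as a greedy loop. The Python example below is a minimal illustration, not the package's source: it uses the difference form of the criterion (relevance to the class minus mean mutual information with the features already selected), and the names `mrmr_select`, `features`, and `labels` are hypothetical. Replacing `mutual_information` with another score (correlation, F-test, a distance) would give the variations mentioned above.

```python
from collections import Counter
from math import log2


def entropy(xs):
    """Shannon entropy in bits of a sequence of discrete values."""
    n = len(xs)
    return -sum((c / n) * log2(c / n) for c in Counter(xs).values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) from paired discrete samples."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def mrmr_select(features, labels, k):
    """Greedily pick up to k feature indices.

    features: list of columns, each a list of discrete values
    labels:   list of discrete class labels, same length as each column
    Score of a candidate = relevance - mean redundancy with selected features.
    """
    relevance = [mutual_information(col, labels) for col in features]
    selected = []
    remaining = list(range(len(features)))

    while remaining and len(selected) < k:
        def score(j):
            if not selected:
                return relevance[j]  # first pick: relevance only
            redundancy = sum(
                mutual_information(features[j], features[i]) for i in selected
            ) / len(selected)
            return relevance[j] - redundancy

        best = max(remaining, key=score)  # ties go to the lowest index
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    labels = [0, 1, 2, 3, 0, 1, 2, 3]
    features = [
        [0, 0, 1, 1, 0, 0, 1, 1],  # feature 0: one half of the class information
        [0, 0, 1, 1, 0, 0, 1, 1],  # feature 1: exact duplicate of feature 0
        [0, 1, 0, 1, 0, 1, 0, 1],  # feature 2: the complementary half
    ]
    print(mrmr_select(features, labels, 2))
```

In this toy data all three features are equally relevant on their own, so ranking by relevance alone cannot tell the duplicate apart from the complementary feature; the redundancy term is what makes the second pick skip the duplicate.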
Hanchuan Peng, Fuhui Long, and Chris Ding, "Feature selection based on mutual
information: criteria of max-dependency, max-relevance, and min-redundancy,"
IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 27, No. 8,
pp. 1226-1238, 2005.
Chris Ding and Hanchuan Peng, "Minimum redundancy feature selection from
microarray gene expression data," Journal of Bioinformatics and Computational
Biology, Vol. 3, No. 2, pp. 185-205, 2005.
Chris Ding and Hanchuan Peng, Proc. 2nd IEEE Computational Systems
Bioinformatics Conference (CSB 2003), pp. 523-528, Stanford, CA, Aug. 2003.