
Normalized mutual information feature selection

Published: 01 February 2009
Abstract

    A filter method of feature selection based on mutual information, called normalized mutual information feature selection (NMIFS), is presented. NMIFS is an enhancement over Battiti's MIFS, MIFS-U, and mRMR methods. The average normalized mutual information is proposed as a measure of redundancy among features. NMIFS outperformed MIFS, MIFS-U, and mRMR on several artificial and benchmark data sets without requiring a user-defined parameter. In addition, NMIFS is combined with a genetic algorithm to form a hybrid filter/wrapper method called GAMIFS, which includes an initialization procedure and a mutation operator based on NMIFS to speed up the convergence of the genetic algorithm. GAMIFS overcomes the limitations of incremental search algorithms, which are unable to find dependencies between groups of features.
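
    To make the selection criterion concrete: NMIFS first picks the feature with maximum mutual information (MI) with the class, then greedily adds the candidate f maximizing I(f; C) - (1/|S|) * sum over s in S of I(f; s)/min{H(f), H(s)}, where S is the set of already-selected features and the ratio is the normalized MI. Below is a minimal Python sketch of this greedy loop for discrete (pre-discretized) features. The function names (entropy, mutual_info, nmifs) are illustrative choices, not the authors' implementation, and the plug-in histogram MI estimates stand in for the more careful estimators the paper discusses for continuous variables.

```python
import numpy as np


def entropy(x):
    """Plug-in Shannon entropy (in nats) of a discrete 1-D array."""
    _, counts = np.unique(x, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))


def mutual_info(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), from joint frequency counts."""
    pairs = np.stack([x, y], axis=1)
    _, counts = np.unique(pairs, axis=0, return_counts=True)
    p = counts / counts.sum()
    joint_h = -np.sum(p * np.log(p))
    return entropy(x) + entropy(y) - joint_h


def nmifs(X, y, k):
    """Greedily select k feature indices from discrete data X of shape
    (n_samples, n_features), scoring each candidate f against the selected
    set S with  G(f) = I(f;y) - (1/|S|) * sum_s I(f;s) / min(H(f), H(s)).
    """
    n_features = X.shape[1]
    relevance = np.array([mutual_info(X[:, j], y) for j in range(n_features)])
    h = np.array([entropy(X[:, j]) for j in range(n_features)])
    selected = [int(np.argmax(relevance))]  # start with the most relevant feature
    while len(selected) < k:
        candidates = [j for j in range(n_features) if j not in selected]
        scores = []
        for j in candidates:
            # Average normalized MI between candidate j and selected features;
            # the epsilon guards against zero-entropy (constant) features.
            penalty = np.mean([mutual_info(X[:, j], X[:, s])
                               / max(min(h[j], h[s]), 1e-12)
                               for s in selected])
            scores.append(relevance[j] - penalty)
        selected.append(candidates[int(np.argmax(scores))])
    return selected
```

    As a quick sanity check on this sketch: if an informative column of X is duplicated, the duplicate's normalized MI with the already-selected original equals 1 (its MI with the original equals their common entropy), the maximum possible penalty, so the duplicate is pushed to the bottom of the ranking even though its relevance to the class is unchanged.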

    References

    [1]
    I. Guyon and A. Elisseeff, "An introduction to variable and feature selection," J. Mach. Learn. Res., vol. 3, pp. 1157-1182, 2003.
    [2]
    A. K. Jain, R. P. Duin, and J. Mao, "Statistical pattern recognition: A review," IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 1, pp. 4-37, Jan. 2000.
    [3]
    G. H. John, R. Kohavi, and K. Pfleger, "Irrelevant features and the subset selection problem," in Proc. 11th Int. Conf. Mach. Learn., 1994, pp. 121-129.
    [4]
    M. Dash and H. Liu, "Feature selection for classification," Intell. Data Anal., vol. 1, no. 3, pp. 131-156, 1997.
    [5]
    J. Bins and B. Draper, "Feature selection from huge feature sets," in Proc. Int. Conf. Comput. Vis., Vancouver, BC, Canada, Jul. 2001, pp. 159-165.
    [6]
    M. Sebban and R. Nock, "A hybrid filter/wrapper approach of feature selection using information theory," Pattern Recognit., vol. 35, no. 4, pp. 835-846, Apr. 2002.
    [7]
    M. A. Hall, "Correlation-based feature selection for machine learning," Ph.D. dissertation, Dept. Comput. Sci., Univ. Waikato, Waikato, New Zealand, 1999.
    [8]
    L. Yu and H. Liu, "Efficient feature selection via analysis of relevance and redundancy," J. Mach. Learn. Res., vol. 5, pp. 1205-1224, Oct. 2004.
    [9]
    M. Dash and H. Liu, "Consistency-based search in feature selection," Artif. Intell. J., vol. 151, pp. 155-176, Dec. 2003.
    [10]
    G. Lashkia and L. Anthony, "Relevant, irredundant feature selection and noisy example elimination," IEEE Trans. Syst. Man Cybern. B, Cybern., vol. 34, no. 2, pp. 888-897, Apr. 2004.
    [11]
    R. Battiti, "Using mutual information for selecting features in supervised neural net learning," IEEE Trans. Neural Netw., vol. 5, no. 4, pp. 537-550, Jul. 1994.
    [12]
    N. Kwak and C.-H. Choi, "Input feature selection for classification problems," IEEE Trans. Neural Netw., vol. 3, no. 1, pp. 143-159, Jan. 2002.
    [13]
    P. A. Estévez and R. Caballero, "A niching genetic algorithm for selecting features for neural network classifiers," in Perspectives in Neural Computing (ICANN'98). New York: Springer-Verlag, 1998, pp. 311-316.
    [14]
    G. van Dijck and M. M. van Hulle, "Speeding up the wrapper feature subset selection in regression by mutual information relevance and redundancy analysis," in Lecture Notes in Computer Science. Berlin, Germany: Springer-Verlag, 2006, vol. 4131, pp. 31-40.
    [15]
    D. Koller and M. Sahami, "Toward optimal feature selection," in Proc. 13th Int. Conf. Mach. Learn., 1996, pp. 284-292.
    [16]
    H. Peng, F. Long, and C. Ding, "Feature selection based on mutual information: Criteria of max-dependency, max-relevance and min-redundancy," IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 8, pp. 1226-1238, Aug. 2005.
    [17]
    T. W. Chow and D. Huang, "Estimating optimal feature subsets using efficient estimation of high-dimensional mutual information," IEEE Trans. Neural Netw., vol. 16, no. 1, pp. 213-224, Jan. 2005.
    [18]
    K. E. Hild, II, D. Erdogmus, K. Torkkola, and J. C. Principe, "Feature extraction using information theoretic learning," IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 9, pp. 1385-1392, Sep. 2006.
    [19]
    B. Bonev, F. Escolano, and M. Cazorla, "Feature selection, mutual information, and the classification of high-dimensional patterns," Pattern Anal. Appl., vol. 11, no. 3-4, pp. 309-319, 2008.
    [20]
    V. Sindhwani, S. Rakshit, D. Deodhar, D. Erdogmus, J. Principe, and P. Niyogi, "Feature selection in MLPs and SVMs based on maximum output information," IEEE Trans. Neural Netw., vol. 15, no. 4, pp. 937-948, Jul. 2004.
    [21]
    M. Mitchell, An Introduction to Genetic Algorithms. Cambridge, MA: MIT Press, 1996.
    [22]
    F. Brill, D. Brown, and W. Martin, "Fast genetic selection of features for neural network classifiers," IEEE Trans. Neural Netw., vol. 3, no. 2, pp. 324-328, Mar. 1992.
    [23]
    M. Raymer, W. Punch, E. Goodman, L. Kuhn, and A. Jain, "Dimensionality reduction using genetic algorithms," IEEE Trans. Evol. Comput., vol. 4, no. 2, pp. 164-171, Jul. 2000.
    [24]
    S. W. Mahfoud, "Niching methods for genetic algorithms," Ph.D. dissertation, Dept. General Eng., Univ. Illinois at Urbana-Champaign, Urbana, IL, 1995.
    [25]
    I.-S. Oh, J.-S. Lee, and B.-R. Moon, "Hybrid genetic algorithms for feature selection," IEEE Trans. Pattern Anal. Mach. Intell., vol. 26, no. 11, pp. 1424-1437, Nov. 2004.
    [26]
    J. Huang, N. Lv, and W. Li, "A novel feature selection approach by hybrid genetic algorithm," in Lecture Notes in Artificial Intelligence. Berlin, Germany: Springer-Verlag, 2006, vol. 4099, pp. 721-729.
    [27]
    T. M. Cover and J. A. Thomas, Elements of Information Theory. New York: Wiley, 1991.
    [28]
    S. Kullback, Information Theory and Statistics. New York: Dover, 1997.
    [29]
    A. M. Fraser and H. L. Swinney, "Independent coordinates for strange attractors from mutual information," Phys. Rev. A, Gen. Phys., vol. 33, no. 2, pp. 1134-1140, Feb. 1986.
    [30]
    K. Fukunaga, Introduction to Statistical Pattern Recognition, 2nd ed. New York: Academic, 1990.
    [31]
    A. Kraskov, H. Stögbauer, and P. Grassberger, "Estimating mutual information," Phys. Rev. E, Stat. Phys. Plasmas Fluids Relat. Interdiscip. Top., vol. 69, no. 6, p. 066138, Jun. 2004.
    [32]
    T. Lan and D. Erdogmus, "Maximally informative feature and sensor selection in pattern recognition using local and global independent component analysis," J. VLSI Signal Process. Syst., vol. 48, no. 1-2, pp. 39-52, Aug. 2007.
    [33]
    O. Vasicek, "A test for normality based on sample entropy," J. Roy. Statist. Soc. B, vol. 38, no. 1, pp. 54-59, 1976.
    [34]
    N. Kwak and C.-H. Choi, "Input feature selection by mutual information based on Parzen window," IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 12, pp. 1667-1671, Dec. 2002.
    [35]
    M. Tesmer and P. A. Estévez, "AMIFS: Adaptive feature selection by using mutual information," in Proc. IEEE Int. Joint Conf. Neural Netw., Budapest, Hungary, Jul. 2004, pp. 303-308.
    [36]
    J. R. Quinlan, "Induction of decision trees," Mach. Learn., vol. 1, pp. 81-106, 1986.
    [37]
    W. Press, B. Flannery, S. Teukolsky, and W. Vetterling, Numerical Recipes in C, 2nd ed. Cambridge, U.K.: Cambridge Univ. Press, 1992.
    [38]
    W. Siedlecki and J. Sklansky, "A note on genetic algorithms for large-scale feature selection," Pattern Recognit. Lett., vol. 10, no. 5, pp. 335-347, 1989.
    [39]
    K. Saito and R. Nakano, "Partial BFGS update and efficient step-length calculation for three-layer neural networks," Neural Comput., vol. 9, no. 1, pp. 123-141, 1997.
    [40]
    L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, Classification and Regression Trees. London, U.K.: Chapman & Hall, 1984.
    [41]
    D. Newman, S. Hettich, C. Blake, and C. Merz, UCI Repository of Machine Learning Databases, Univ. California at Irvine, Irvine, CA, 1998 [Online]. Available: http://www.ics.uci.edu/~mlearn/MLRepository.html
    [42]
    G. E. P. Box and G. M. Jenkins, Time Series Analysis. Cambridge, U.K.: Cambridge Univ. Press, 2003.
    [43]
    W. Pedrycz, "An identification algorithm in fuzzy relational systems," Fuzzy Sets Syst., vol. 13, pp. 153-167, 1984.
    [44]
    M. Sugeno and T. Yasukawa, "A fuzzy-logic-based approach to qualitative modeling," IEEE Trans. Fuzzy Syst., vol. 1, no. 1, pp. 7-31, Feb. 1993.
    [45]
    R. Tong, "The evaluation of fuzzy models derived from experimental data," Fuzzy Sets Syst., vol. 4, pp. 1-12, 1980.
    [46]
    C. Xu and Z. Yong, "Fuzzy model identification and self-learning for dynamic systems," IEEE Trans. Syst. Man Cybern., vol. SMC-17, no. 4, pp. 683-689, Jul./Aug. 1987.
    [47]
    J. C. Principe, D. Xu, and J. Fisher, "Information theoretic learning," in Unsupervised Adaptive Filtering, S. Haykin, Ed. New York: Wiley, 1999, ch. 7.


      Published In

      IEEE Transactions on Neural Networks, Volume 20, Issue 2
      February 2009
      184 pages

      Publisher

      IEEE Press

      Publication History

      Published: 01 February 2009
      Accepted: 17 July 2008
      Revised: 04 January 2008
      Received: 16 February 2007

      Author Tags

      1. Feature selection
      2. Genetic algorithms
      3. Multilayer perceptron (MLP) neural networks
      4. Normalized mutual information (MI)

      Qualifiers

      • Research-article

      Cited By

      • (2024) "Neighborhood contrastive representation learning for attributed graph clustering," Neurocomputing, vol. 562. https://doi.org/10.1016/j.neucom.2023.126880
      • (2024) "Information bottleneck fusion for deep multi-view clustering," Knowledge-Based Systems, vol. 289. https://doi.org/10.1016/j.knosys.2024.111551
      • (2024) "Deep clustering framework review using multicriteria evaluation," Knowledge-Based Systems, vol. 285. https://doi.org/10.1016/j.knosys.2023.111315
      • (2024) "Unsupervised social event detection via hybrid graph contrastive learning and reinforced incremental clustering," Knowledge-Based Systems, vol. 284. https://doi.org/10.1016/j.knosys.2023.111225
      • (2024) "Starlight," Journal of Parallel and Distributed Computing, vol. 187. https://doi.org/10.1016/j.jpdc.2023.104832
      • (2024) "Minimising redundancy, maximising relevance," Expert Systems with Applications, vol. 239. https://doi.org/10.1016/j.eswa.2023.122490
      • (2024) "An efficient classification framework for Type 2 Diabetes incorporating feature interactions," Expert Systems with Applications, vol. 239. https://doi.org/10.1016/j.eswa.2023.122138
      • (2024) "FS-SCF network," Expert Systems with Applications, vol. 237. https://doi.org/10.1016/j.eswa.2023.121670
      • (2024) "Local and soft feature selection for value function approximation in batch reinforcement learning for robot navigation," The Journal of Supercomputing, vol. 80, no. 8, pp. 10720-10745. https://doi.org/10.1007/s11227-023-05854-4
      • (2023) "Centerless multi-view K-means based on the adjacency matrix," Proc. 37th AAAI Conf. Artificial Intelligence, pp. 8949-8956. https://doi.org/10.1609/aaai.v37i7.26075
