DOI: 10.1145/2739482.2764712
poster

Imbalanced Classification Using Genetically Optimized Random Forests

Published: 11 July 2015

    Abstract

    Class imbalance is a problem that commonly affects 'real world' classification datasets and has been shown to hinder the performance of classifiers. A dataset suffers from class imbalance when the number of instances belonging to one class outnumbers the number of instances belonging to another class. Two ways of dealing with class imbalance are modifying the dataset to reduce the number of instances belonging to the majority class(es) (known as resampling), or allowing the classifier to penalize misclassification of the minority class(es) more heavily than misclassification of the majority class(es), which can be done by implementing a cost matrix. This paper attempts to improve the classification performance of the Random Forest classifier on imbalanced datasets by exploiting these two techniques; a genetic algorithm is employed to find optimal parameters. Results are compared to commonly used classification algorithms.
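
The abstract names three ingredients: undersampling the majority class, a cost matrix, and a genetic algorithm that tunes their parameters. The sketch below shows one way these pieces could fit together using only the Python standard library. It is not the authors' implementation. The function names, the binary labels (0 = majority, 1 = minority), the choice of genome (an undersampling ratio and a minority misclassification cost), and the GA operators are all assumptions made for illustration; the abstract does not say which parameters are optimized or how. In practice the `fitness` callable would train a Random Forest on the resampled data and score it on validation data, which is left out here.

```python
import random


def undersample(labels, ratio, rng):
    """Return sorted indices that keep every minority instance (label 1)
    and at most ratio * n_minority randomly chosen majority instances."""
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    keep = min(len(majority), max(1, round(ratio * len(minority))))
    return sorted(minority + rng.sample(majority, keep))


def predict_min_cost(p_minority, cost):
    """Pick the prediction with the lowest expected cost.
    cost[t][p] is the cost of predicting p when the true class is t."""
    probs = (1.0 - p_minority, p_minority)
    expected = [sum(probs[t] * cost[t][p] for t in (0, 1)) for p in (0, 1)]
    return 0 if expected[0] <= expected[1] else 1


def evolve(fitness, bounds, rng, pop_size=20, generations=30, sigma=0.1):
    """Small real-valued GA (assumed operators): tournament selection,
    uniform crossover, Gaussian mutation, one elite. Maximizes fitness."""
    def clip(value, lo, hi):
        return max(lo, min(hi, value))

    pop = [tuple(rng.uniform(lo, hi) for lo, hi in bounds)
           for _ in range(pop_size)]
    for _ in range(generations):
        next_pop = [max(pop, key=fitness)]  # elitism: keep the current best
        while len(next_pop) < pop_size:
            parent_a = max(rng.sample(pop, 3), key=fitness)
            parent_b = max(rng.sample(pop, 3), key=fitness)
            child = [rng.choice(pair) for pair in zip(parent_a, parent_b)]
            child = tuple(
                clip(gene + rng.gauss(0.0, sigma * (hi - lo)), lo, hi)
                for gene, (lo, hi) in zip(child, bounds)
            )
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness)


if __name__ == "__main__":
    rng = random.Random(0)

    # 90 majority and 10 minority labels; keep two majority per minority.
    kept = undersample([0] * 90 + [1] * 10, ratio=2.0, rng=rng)
    print(len(kept))  # 30

    # A missed minority instance costs 5, a false alarm costs 1, so a
    # minority probability of only 0.3 is already enough to predict class 1.
    print(predict_min_cost(0.3, [[0, 1], [5, 0]]))  # 1

    # Toy stand-in fitness with a known optimum at (1.5, 4.0).
    def toy_fitness(genome):
        return -((genome[0] - 1.5) ** 2) - (genome[1] - 4.0) ** 2

    print(evolve(toy_fitness, [(1.0, 5.0), (1.0, 10.0)], rng))
```

Genomes are tuples so that an expensive fitness function can be wrapped with `functools.lru_cache`. That matters once each evaluation involves training a forest, because this loop calls `fitness` repeatedly on the same individuals.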

    Cited By

    • (2018) Class Weights Random Forest Algorithm for Processing Class Imbalanced Medical Data. IEEE Access 6, 4641-4652. DOI: 10.1109/ACCESS.2018.2789428

      Published In

      GECCO Companion '15: Proceedings of the Companion Publication of the 2015 Annual Conference on Genetic and Evolutionary Computation
      July 2015
      1568 pages
      ISBN: 9781450334884
      DOI: 10.1145/2739482
      Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. classification
      2. cost matrix
      3. cost sensitive classification
      4. genetic algorithms
      5. random forest

      Qualifiers

      • Poster

      Conference

      GECCO '15
      Acceptance Rates

      Overall Acceptance Rate 1,669 of 4,410 submissions, 38%
