A fast meta-heuristic approach for the \((\alpha ,\beta )-k\)-feature set problem

Journal of Heuristics

Abstract

The feature selection problem aims to choose a subset of a given set of features that best represents the whole in a particular aspect, preserving the original semantics of the variables on the given samples and classes. In 2004, a new approach to feature selection was proposed, based on an NP-complete combinatorial optimisation problem called the (\(\alpha ,\beta \))-k-feature set problem. The approach proved effective in many practical cases and became an important feature selection tool, but the only existing solution method, proposed in the original paper, was found not to work well for several instances. Our work aims to close this gap in the literature by quickly obtaining high-quality solutions for the instances that the existing approach cannot solve. This work proposes a heuristic based on the greedy randomised adaptive search procedure (GRASP) and tabu search to address this problem, together with benchmark instances to evaluate its performance. The computational results show that our method obtains high-quality solutions for both real instances and the proposed artificial instances, and requires only a fraction of the computational resources required by the state-of-the-art exact and heuristic approaches, which use mixed integer programming models.
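To make the problem concrete, the following is a minimal sketch, not the authors' implementation. It assumes the commonly stated definition of the problem, which this page does not spell out: a set of features is an (\(\alpha ,\beta \))-k-feature set if every pair of samples from different classes differs on at least \(\alpha\) of the chosen features, and every pair from the same class agrees on at least \(\beta\) of them. The function names, the `rcl_size` parameter, and the deficit-based greedy score are all illustrative assumptions. The construction step only shows the randomised greedy idea behind GRASP; the local search and tabu search phases mentioned in the abstract are omitted.

```python
import random
from itertools import combinations


def pair_coverage(samples, labels, features):
    """Count, for each sample pair, the chosen features that cover it.

    A different-class pair is covered by a feature whose values differ;
    a same-class pair is covered by a feature whose values agree.
    """
    counts = {}
    for i, j in combinations(range(len(samples)), 2):
        differ = labels[i] != labels[j]
        counts[(i, j)] = sum(
            (samples[i][f] != samples[j][f]) == differ for f in features
        )
    return counts


def deficit(samples, labels, features, alpha, beta):
    """Total missing coverage over all pairs (0 means feasible)."""
    total = 0
    for (i, j), c in pair_coverage(samples, labels, features).items():
        need = alpha if labels[i] != labels[j] else beta
        total += max(0, need - c)
    return total


def is_alpha_beta_feature_set(samples, labels, features, alpha, beta):
    """Check the assumed (alpha, beta) coverage requirements."""
    return deficit(samples, labels, features, alpha, beta) == 0


def grasp_construct(samples, labels, alpha, beta, rcl_size=2, rng=None):
    """Randomised greedy construction (hypothetical helper).

    At each step, rank the unused features by the deficit they would
    leave, then pick at random among the best `rcl_size` improving ones.
    Returns a sorted feature list, or None when no feature can reduce
    the deficit, which means no feasible set exists.
    """
    rng = rng or random.Random(0)
    chosen = []
    remaining = set(range(len(samples[0])))
    current = deficit(samples, labels, chosen, alpha, beta)
    while current > 0:
        scored = sorted(
            (deficit(samples, labels, chosen + [f], alpha, beta), f)
            for f in remaining
        )
        improving = [(d, f) for d, f in scored if d < current]
        if not improving:
            return None
        d, f = rng.choice(improving[:rcl_size])
        chosen.append(f)
        remaining.remove(f)
        current = d
    return sorted(chosen)
```

The construction may return more features than strictly necessary, since it picks at random among good candidates rather than always taking the best one. In a full GRASP it would be repeated many times, each run followed by an improvement phase, keeping the best solution found.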


Author information

Corresponding author

Correspondence to Mateus Rocha de Paula.

About this article

Cite this article

Rocha de Paula, M., Berretta, R. & Moscato, P. A fast meta-heuristic approach for the \((\alpha ,\beta )-k\)-feature set problem. J Heuristics 22, 199–220 (2016). https://doi.org/10.1007/s10732-015-9307-0

  • DOI: https://doi.org/10.1007/s10732-015-9307-0
