Gene selection and disease prediction from gene expression data using a two-stage hetero-associative memory

  • Regular Paper
Progress in Artificial Intelligence

Abstract

Gene expression microarrays typically comprise a vast number of genes but very few samples, which poses a critical challenge for disease prediction and diagnosis. This paper develops a two-stage algorithm that integrates feature selection and prediction by extending a type of hetero-associative neural network. In the first stage, the algorithm generates the associative memory; in the second stage, it picks the most relevant genes. To illustrate the applicability and efficiency of the proposed method, we use four gene expression microarray databases and compare its classification performance against that of other well-known classifiers built on the whole (original) gene space. The experimental results show that the two-stage hetero-associative memory is competitive with standard classification models in terms of overall accuracy, sensitivity and specificity. It also significantly reduces computational effort and increases the biological interpretability of microarrays, because irrelevant and/or redundant genes are discarded.
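The abstract does not state the memory's learning rule or the gene-ranking criterion, so the following is only a minimal sketch under stated assumptions, not the authors' algorithm. It substitutes a classical linear associator (correlation matrix memory) for the extended hetero-associative network, and it ranks genes by how much their memory weights differ across classes. The function names (`build_memory`, `select_genes`, `recall`, `binary_metrics`), the mean-centering step and the toy data are all hypothetical.

```python
import numpy as np


def build_memory(X, y, n_classes):
    """Stage 1 (assumed rule): M = sum_k onehot(y_k) x_k^T, a linear associator."""
    Y = np.eye(n_classes)[y]      # one-hot targets, shape (n_samples, n_classes)
    return Y.T @ X                # memory, shape (n_classes, n_genes)


def select_genes(M, k):
    """Stage 2 (assumed criterion): keep the k genes whose weights vary most across classes."""
    spread = M.max(axis=0) - M.min(axis=0)
    top = np.argsort(spread)[::-1][:k]
    return np.sort(top)


def recall(M, X):
    """Predict the class whose memory row responds most strongly to each profile."""
    return np.argmax(X @ M.T, axis=1)


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity and specificity; assumes both classes occur in y_true."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == positive) & (y_pred == positive))
    tn = np.sum((y_true != positive) & (y_pred != positive))
    fp = np.sum((y_true != positive) & (y_pred == positive))
    fn = np.sum((y_true == positive) & (y_pred != positive))
    return {
        "accuracy": float((tp + tn) / len(y_true)),
        "sensitivity": float(tp / (tp + fn)),
        "specificity": float(tn / (tn + fp)),
    }


# Toy data (hypothetical): 6 samples x 5 genes; only genes 0 and 3 separate the classes.
X = np.array([
    [1, 2, 3, 5, 2],
    [1, 3, 2, 5, 2],
    [1, 2, 2, 5, 3],
    [5, 2, 3, 1, 2],
    [5, 3, 2, 1, 2],
    [5, 2, 2, 1, 3],
], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

mu = X.mean(axis=0)                       # center so uninformative genes get ~zero weight
M = build_memory(X - mu, y, n_classes=2)  # stage 1: build the memory
genes = select_genes(M, k=2)              # stage 2: pick the most relevant genes
pred = recall(M[:, genes], (X - mu)[:, genes])
print("selected genes:", genes)
print(binary_metrics(y, pred))
```

The metrics printed here are computed on the training samples themselves, so they only show how the three measures named in the abstract are defined; with so few samples, a held-out or cross-validated estimate would be needed for any real comparison.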

Acknowledgements

This study was partially supported by the Valencian Council of Education, Research, Culture and Sport [PROMETEOII/2014/062], the Mexican PRODEP [DSA/103.5/15/7004], and the Spanish Ministry of Economy, Industry and Competitiveness under Grant [TIN2013-46522-P].

Author information

Corresponding author

Correspondence to J. Salvador Sánchez.

Cite this article

Cleofas-Sánchez, L., Sánchez, J.S. & García, V. Gene selection and disease prediction from gene expression data using a two-stage hetero-associative memory. Prog Artif Intell 8, 63–71 (2019). https://doi.org/10.1007/s13748-018-0148-6

  • DOI: https://doi.org/10.1007/s13748-018-0148-6
