Abstract
In general, gene expression microarrays consist of a vast number of genes and very few samples, which represents a critical challenge for disease prediction and diagnosis. This paper develops a two-stage algorithm that integrates feature selection and prediction by extending a type of hetero-associative neural networks. In the first level, the algorithm generates the associative memory, whereas the second level picks the most relevant genes. With the purpose of illustrating the applicability and efficiency of the method proposed here, we use four different gene expression microarray databases and compare their classification performance against that of other renowned classifiers built on the whole (original) feature (gene) space. The experimental results show that the two-stage hetero-associative memory is quite competitive with standard classification models regarding the overall accuracy, sensitivity and specificity. In addition, it also produces a significant decrease in computational efforts and an increase in the biological interpretability of microarrays because worthless (irrelevant and/or redundant) genes are discarded.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.References
Aghajari, Z.H., Teshnehlab, M., Jahed Motlagh, M.R.: A novel chaotic hetero-associative memory. Neurocomputing 167, 352–358 (2015)
Aihara, K., Takabe, T., Toyoda, M.: Chaotic neural networks. Phys. Lett. A 144(6), 333–340 (1990)
Aldape-Pérez, M., Yáñez-Márquez, C., Camacho-Nieto, O., Argüelles-Cruz, A.J.: An associative memory approach to medical decision support systems. Comput. Methods Prog. Biomed. 106(3), 287–307 (2012)
Anderson, J.A.: A simple neural network generating an interactive memory. Math. Biosci. 14, 197–220 (1972)
Ang, J.C., Mirzal, A., Haron, H., Hamed, H.N.A.: Supervised, unsupervised, and semi-supervised feature selection: a review on gene selection. IEEE ACM Trans Comput. Biol. Bioinform. 13(5), 971–989 (2016)
Arya, K.V., Singh, V., Mitra, P., Gupta, P.: Face recognition using parallel associative memory. In: Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, Singapore, pp. 1332–1336 (2008)
Babu, M., Sarkar, K.: A comparative study of gene selection methods for cancer classification using microarray data. In: Proceedings of the 2nd International Conference on Research in Computational Intelligence and Communication Networks, Kolkata, India, pp. 204–211 (2016)
Ben-Hur, A., Weston, J.: A user’s guide to support vector machines. In: Carugo, O., Eisenhaber, F. (eds.) Data Mining Techniques for the Life Sciences, Methods in Molecular Biology, vol. 609, pp. 223–239. Humana Press, New York (2010)
Berns, A.: Cancer: gene expression in diagnosis. Nature 403, 491–492 (2000)
Braga-Neto, U.M., Dougherty, E.R.: Is cross-validation valid for small-sample microarray classification? Bioinformatics 20(3), 374–380 (2004)
Chartier, S., Lepage, R.: Learning and extracting edges from images by a modified hopfield neural network. In: Proceedings of the 16th International Conference on Pattern Recognition, Quebec City, Canada, vol. 3, pp. 431–434 (2002)
Cleofas-Sánchez, L., García, V., Marqués, A., Sánchez, J.: Financial distress prediction using the hybrid associative memory with translation. Appl. Soft Comput. 44, 144–152 (2016)
Dougherty, E.R.: Small sample issues for microarray-based classification. Comp. Funct. Genom. 2(1), 28–34 (2001)
Dudoit, S., Fridlyand, J.: Classification in microarray experiments. In: Speed, T.P. (ed.) Statistical Analysis of Gene Expression Microarray Data, pp. 93–158. Chapman & Hall/CRC Press, London (2003)
Ein-Dor, L., Zuk, O., Domany, E.: Thousands of samples are needed to generate a robust gene list for predicting outcome in cancer. Proc. Natl. Acad. Sci. 103(15), 5923–5928 (2006)
García, V., Sánchez, J.S.: Mapping microarray gene expression data into dissimilarity spaces for tumor classification. Inform. Sci. 294, 362–375 (2015)
García, V., Sánchez, J.S., Cleofas-Sánchez, L., Ochoa-Domínguez, H.J., López-Orozco, F.: An insight on the ‘large G, small n’ problem in gene-expression microarray classification. In: Proceedings of the 8th Iberian Conference on Pattern Recognition and Image Analysis, Faro, Portugal, pp. 483–490 (2017)
Hassanien, A.E., Al-Shammari, E.T., Ghali, N.I.: Computational intelligence techniques in bioinformatics. Comput. Biol. Chem. 47, 37–47 (2013)
Hira, Z.M., Gillies, D.F.: A review of feature selection and feature extraction methods applied on microarray data. Adv. Bioinform. 2015(ID 198363), 1–13 (2015)
Ho, T.K., Basu, M.: Complexity measures of supervised classification problems. IEEE Trans. Pattern Anal. Mach. Intell. 24(3), 289–300 (2002)
Hopfield, J.J.: Neural networks and physical systems with emergent collective computational abilities. In: Anderson, J.A., Rosenfeld, E. (eds.) Neurocomputing: Foundations of Research, pp. 457–464. Proceedings of the National Academy of Sciences USA, Cambridge (1988)
Hruschka, E.R., Hruschka, E.R., Ebecken, N.F.F.: Towards efficient imputation by nearest-neighbors: a clustering-based approach. In: Proceedings of the 17th Australian Joint Conference on Artificial Intelligence, Cairns, Australia, pp. 513–525 (2004)
Hua, J., Xiong, Z., Lowey, J., Suh, E., Dougherty, E.R.: Optimal number of features as a function of sample size for various classification rules. Bioinformatics 21(8), 1509–1515 (2005)
Irsoy, O., Yildiz, O.T., Alpaydin, E.: Design and analysis of classifier learning experiments in bioinformatics: survey and case studies. IEEE ACM Trans. Comput. Biol. 9(6), 1663–1675 (2012)
Japkowicz, N.: Assessment metrics for imbalanced learning. In: He, H., Ma, Y. (eds.) Imbalanced Learning: Foundations, Algorithms, and Applications, pp. 187–210. Wiley IEEE Press, New York (2013)
Kohonen, T.: Correlation matrix memories. IEEE Trans. Comput. C–21(4), 353–359 (1972)
Kohonen, T.: Associative Memory. A System—Theoretical Approach. Springer, Berlin (1977)
Kosko, B.: Bidirectional associative memories. IEEE Trans. Syst. Man Cybern. 18(1), 49–60 (1988)
Larrañaga, P., Calvo, B., Santana, R., Bielza, C., Galdiano, J., Inza, I., Lozano, J.A., Armañanzas, R., Santafé, G., Pérez, A., Robles, V.: Machine learning in bioinformatics. Brief. Bioinform. 7(1), 86–112 (2011)
Lazar, C., Taminau, J., Meganck, S., Steenhoff, D., Coletta, A., Molter, C., de Schaetzen, V., Duque, R., Bersini, H., Nowe, A.: A survey on filter techniques for feature selection in gene expression microarray analysis. IEEE ACM Trans. Comput. Biol. Bioinform. 9(4), 1106–1119 (2012)
Lee, J.W., Lee, J.B., Park, M., Song, S.H.: An extensive evaluation of recent classification tools applied to microarray data. Comput. Stat. Data Anal. 48, 869–885 (2005)
Li, D., Deogun, J., Spaulding, W., Shuart, B.: Towards missing data imputation: a study of fuzzy K-means clustering method. In: Proceedings of the 4th International Conference on Rough Sets and Current Trends in Computing, Uppsala, Sweden, pp. 573–579 (2004)
Lu, Y., Han, J.: Cancer classification using gene expression data. Inform. Syst. 28(4), 243–268 (2003)
Ma, S., Huang, J.: Regularized ROC method for disease classification and biomarker selection with microarray data. Bioinformatics 21(2), 4356–4362 (2005)
Mahata, P., Mahata, K.: Selecting differentially expressed genes using minimum probability of classification error. J. Biomed. Inform. 40(6), 775–786 (2007)
Nakano, K.: Associatron—a model on associative memory. IEEE Trans. Syst. Man Cybern. 2(3), 380–388 (1972)
Raspe, E., Decraene, C., Berx, G.: Gene expression profiling to dissect the complexity of cancer biology: pitfalls and promise. Semin. Cancer Biol. 22(3), 250–260 (2012)
Raudys, S.J., Jain, A.K.: Small sample size effects in statistical pattern recognition: recommendations for practitioners. IEEE Trans. Pattern Anal. Mach. Intell. 13(3), 252–264 (1991)
Saeys, Y., Inza, I., Larrañaga, P.: A review of feature selection techniques in bioinformatics. Bioinformatics 23(19), 2507–2517 (2007)
Sharma, N., Ray, A., Sharma, S., Shukla, K., Pradhan, S., Aggarwal, L.: Segmentation and classification of medical images using texture-primitive features: application of BAM-type artificial neural network. J. Med. Phys. 33(3), 119–126 (2008)
Steinbuch, K.: Die lernmatrix. Kybernetik 1(1), 36–45 (1961). In German
Sudo, A., Sato, A., Hasegawa, O.: Associative memory for online learning in noisy environments using self-organizing incremental neural network. IEEE Trans. Neural Netw. 20(6), 964–972 (2009)
Sun, X., Liu, Y., Wei, D., Xu, M., Chen, H., Han, J.: Selection of interdependent genes via dynamic relevance analysis for cancer diagnosis. J. Biomed. Inform. 46(2), 252–258 (2013)
Vaishnavi, Y., Shreyas, R., Suhas, S., Surya, U.N., Ladwani, V.M., Ramasubramanian, V.: Associative memory framework for speech recognition: adaptation of hopfield network. In: 2016 IEEE Annual India Conference, Bangalore, India, pp. 1–6 (2016)
Villuendas-Rey, Y., Rey-Benguría, C.F., Ferreira-Santiago, A., Camacho-Nieto, O., Yáñez-Márquez, C.: The naïve associative classifier (NAC): a novel, simple, transparent, and accurate classification model evaluated on financial data. Neurocomputing 265, 105–115 (2017)
Weigelt, B., Baehner, F.L., Reis-Filho, J.S.: The contribution of gene expression profiling to breast cancer classification, prognostication and prediction: a retrospective of the last decade. J. Pathol. 220(2), 263–280 (2010)
Xing, E.P., Jordan, M.I., Karp, R.M.: Feature selection for high-dimensional genomic microarray data. In: Proceedings of the 8th International Conference on Machine Learning, Williamstown, MA, pp. 601–608 (2001)
Yáñez-Márquez, C.: Associative memories based on order relations and binary operators. Ph.D. thesis, Centro de Investigación en Computación - Instituto Politécnico Nacional, Mexico, (In Spanish) (2002)
Yoon, Y., Lee, J., Park, S., Bien, S., Chung, H.C., Rha, S.Y.: Direct integration of microarrays for selecting informative genes and phenotype classification. Inf. Sci. 178(1), 88–105 (2008)
Zhang, Z., Zhuo, H., Liu, S., de B Harrington, P.: Classification of cancer patients based on elemental contents of serums using bidirectional associative memory networks. Anal. Chim. Acta 436(2), 281–291 (2001)
Acknowledgements
This study was partially supported by the Valencian Council of Education, Research, Culture and Sport [PROMETEOII/2014/062], the Mexican PRODEP [DSA/103.5/15/7004], and the Spanish Ministry of Economy, Industry and Competitiveness under Grant [TIN2013-46522-P].
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Cleofas-Sánchez, L., Sánchez, J.S. & García, V. Gene selection and disease prediction from gene expression data using a two-stage hetero-associative memory. Prog Artif Intell 8, 63–71 (2019). https://doi.org/10.1007/s13748-018-0148-6
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s13748-018-0148-6