Predicting COVID-19 hospitalizations with attribute selection based on genetic and classification algorithms
DOI:
https://doi.org/10.5753/isys.2022.2187
Keywords:
Feature selection, COVID-19, Genetic algorithm, Machine learning, Hospitalization prediction
Abstract
The COVID-19 pandemic has put pressure on society as a whole and overloaded hospital systems. Machine learning models that predict hospitalizations can help target hospital resources more effectively. However, an excess of information, often irrelevant or redundant, can impair the performance of predictive models, so in this work we propose a hybrid approach to attribute selection. The method uses a genetic algorithm to search for an optimal attribute subset, and its evaluation function takes into account the results of a classification model, with the goal of improving the prediction of hospitalization need for COVID-19 patients. We evaluated the approach on two official databases from the State Health Secretariat of Rio Grande do Sul, covering COVID-19 cases registered up to October 2020 and June 2021, respectively. In the first database, classification precision for patients who need hospitalization increased by 18%; in the second, under a temporal evaluation with a sliding window, the average gain was 6%. In a real-time application, this would mean more precise targeting of resources and, consequently and most importantly, better service to the infected population.
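The wrapper idea described in the abstract can be sketched roughly as follows: each individual in the genetic algorithm is a bit mask over the attributes, and its fitness is the precision a classifier reaches for the positive (hospitalization) class when trained only on the selected attributes. Everything concrete here is an illustrative assumption rather than the authors' configuration: the synthetic data, the nearest-centroid classifier, tournament selection, one-point crossover, the mutation rate, and all function names.

```python
import random

def make_data(n, rng):
    """Synthetic rows (assumption): features 0-1 carry signal, 2-5 are noise."""
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < 0.3 else 0
        signal = [rng.gauss(1.5 * label, 1.0) for _ in range(2)]
        noise = [rng.gauss(0.0, 3.0) for _ in range(4)]
        X.append(signal + noise)
        y.append(label)
    return X, y

def centroid_predict(X_train, y_train, X_test, mask):
    """Nearest-centroid classifier restricted to the features where mask == 1."""
    cols = [j for j, bit in enumerate(mask) if bit]
    centroids = {}
    for c in (0, 1):
        rows = [x for x, t in zip(X_train, y_train) if t == c]
        centroids[c] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    preds = []
    for x in X_test:
        dist = {c: sum((x[j] - m) ** 2 for j, m in zip(cols, centroids[c]))
                for c in (0, 1)}
        preds.append(min(dist, key=dist.get))
    return preds

def precision_positive(y_true, y_pred):
    """Precision for class 1: TP / (TP + FP), or 0.0 if nothing is predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if tp + fp else 0.0

def fitness(mask, data):
    """Evaluation function: validation precision of the classifier on the masked features."""
    if not any(mask):
        return 0.0  # an empty attribute subset cannot classify anything
    X_tr, y_tr, X_va, y_va = data
    return precision_positive(y_va, centroid_predict(X_tr, y_tr, X_va, mask))

def select_features(data, n_features, rng, pop_size=20, generations=15, p_mut=0.1):
    """Genetic search over bit masks with elitism, tournaments, and bit-flip mutation."""
    # Seed the population with the full attribute set so the baseline is always a candidate.
    pop = [[1] * n_features]
    pop += [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        elite = max(pop, key=lambda m: fitness(m, data))
        next_pop = [elite[:]]  # elitism: the best mask survives unchanged
        while len(next_pop) < pop_size:
            p1 = max(rng.sample(pop, 3), key=lambda m: fitness(m, data))
            p2 = max(rng.sample(pop, 3), key=lambda m: fitness(m, data))
            cut = rng.randint(1, n_features - 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - b if rng.random() < p_mut else b for b in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=lambda m: fitness(m, data))

rng = random.Random(0)
X, y = make_data(400, rng)
data = (X[:300], y[:300], X[300:], y[300:])  # simple train/validation split
best = select_features(data, n_features=6, rng=rng)
baseline = fitness([1] * 6, data)
print("selected mask:", best)
print("precision, all features:", round(baseline, 3))
print("precision, selected features:", round(fitness(best, data), 3))
```

Because the full attribute set is in the initial population and the best mask is carried over each generation, the selected subset never scores below the all-attributes baseline on the validation split. A real study would still need a separate held-out test set, since the fitness is optimized on the validation data.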
License
Copyright (c) 2022 The authors
This work is licensed under a Creative Commons Attribution 4.0 International License.