Classification of Users of a Health Service Provider Using Unsupervised Machine Learning Methods

Arango-Abella, Marlon David; Figueroa-García, Juan Carlos

doi:10.1007/s42979-024-02685-9

Original Research
Published: 13 May 2024

SN Computer Science, Volume 5, article number 543 (2024)

Abstract

In this paper, we compare three unsupervised classification methods, k-means, fuzzy clustering, and Self-Organizing Maps (SOM), on a database from a health service provider in Bogotá, Colombia. The goal is to classify the users who request services at its different offices and to propose a reorganization of human resources across all offices according to customer density and needs. To do so, the database is first pre-processed to correct data problems such as incomplete records, erroneous measurements, and outliers. The three clustering methods are then applied and their results compared, leading to recommendations for improving service levels and reducing both total service times and waiting times.
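The workflow described in the abstract (clean the records, then cluster and compare) can be illustrated with a minimal sketch. It is not the authors' implementation: the two features (age, visits per year), the synthetic data, the z-score scaling, the farthest-first seeding, and all function names are assumptions made for illustration, and SOM is left out for brevity. Only the Python standard library is used.

```python
import math
import random

def drop_incomplete(rows):
    """Discard records that contain a missing value (None)."""
    return [r for r in rows if all(v is not None for v in r)]

def standardize(rows):
    """Z-score each column so no single feature dominates the distance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - mu) ** 2 for v in c) / len(c)) or 1.0
            for c, mu in zip(cols, means)]
    return [[(v - mu) / s for v, mu, s in zip(r, means, stds)] for r in rows]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_first(points, k):
    """Deterministic seeding (an assumption of this sketch): start at the
    first point, then add the point farthest from the chosen centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return [list(c) for c in centers]

def hard_labels(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]

def kmeans(points, k, iters=50):
    """Lloyd's algorithm: assign to nearest centroid, then recompute means."""
    centers = farthest_first(points, k)
    for _ in range(iters):
        labels = hard_labels(points, centers)
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, hard_labels(points, centers)

def memberships(points, centers, m):
    """Fuzzy c-means membership degrees; each row sums to 1."""
    rows = []
    for p in points:
        inv = [max(sq_dist(p, c), 1e-12) ** (-1.0 / (m - 1.0)) for c in centers]
        total = sum(inv)
        rows.append([v / total for v in inv])
    return rows

def fuzzy_cmeans(points, c, m=2.0, iters=100):
    """Alternate membership and weighted-centroid updates (fuzzifier m > 1)."""
    centers = farthest_first(points, c)
    dims = range(len(points[0]))
    for _ in range(iters):
        u = memberships(points, centers, m)
        for j in range(c):
            w = [row[j] ** m for row in u]
            centers[j] = [sum(wi * p[d] for wi, p in zip(w, points)) / sum(w)
                          for d in dims]
    return centers, memberships(points, centers, m)

# Synthetic stand-in for user records: (age, visits per year). Hypothetical.
rng = random.Random(42)
raw = [[rng.gauss(30, 2), rng.gauss(2, 0.3)] for _ in range(40)]
raw += [[rng.gauss(70, 2), rng.gauss(12, 0.3)] for _ in range(40)]
raw.append([55.0, None])  # an incomplete record, removed during cleaning

data = standardize(drop_incomplete(raw))
km_centers, km_labels = kmeans(data, k=2)
fc_centers, fc_u = fuzzy_cmeans(data, c=2)
fc_labels = [row.index(max(row)) for row in fc_u]  # defuzzify by max membership
```

The difference between the two methods shows in their outputs: k-means returns one label per user, while fuzzy c-means returns a membership degree per cluster, which can flag users who sit between groups. On real data, outlier handling and the choice of the number of clusters would need the validation steps that this sketch omits.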

Author information

Marlon David Arango-Abella and Juan Carlos Figueroa-García contributed equally to this work.

Authors and Affiliations

Faculty of Engineering, Universidad Distrital Francisco José de Caldas, Bogotá, Colombia
Marlon David Arango-Abella & Juan Carlos Figueroa-García

Corresponding author

Correspondence to Marlon David Arango-Abella.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Recent Advances of AI, Optimization and Simulation” guest edited by Juan Carlos Figueroa-García, Roman Neruda, José Luis Villa Ramirez, Carlos Franco and Germán Hernández-Pérez.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Arango-Abella, M.D., Figueroa-García, J.C. Classification of Users of a Health Service Provider Using Unsupervised Machine Learning Methods. SN COMPUT. SCI. 5, 543 (2024). https://doi.org/10.1007/s42979-024-02685-9

Received: 27 November 2023
Accepted: 05 February 2024
Published: 13 May 2024
DOI: https://doi.org/10.1007/s42979-024-02685-9
