
Classification of Users of a Health Service Provider Using Unsupervised Machine Learning Methods

  • Original Research
  • Published:
SN Computer Science

Abstract

In this paper, we compare three unsupervised classification (clustering) methods, k-means, fuzzy clustering and Self-Organizing Maps (SOM), on a database from a health service provider in Bogotá, Colombia. The goal is to classify the users who request services at the provider's different offices and to propose a reorganization of human resources across all offices according to customer density and customer needs. To do so, the database is first pre-processed to correct data problems such as individuals with incomplete records, erroneous measurements and outliers. The three selected clustering methods are then applied and their results compared. Finally, we propose recommendations for improving service levels and reducing both total service time and waiting time.
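The pipeline described above (clean the records, then cluster them) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The abstract does not state which missing-data rule, outlier test, scaling, distance, initialization or fuzzifier was used, so the following are assumptions: records with a missing field are dropped, outliers are removed with Tukey's IQR fences, features are z-scored, distances are Euclidean, both methods are seeded with farthest-first (maximin) initialization, and fuzzy c-means uses m = 2. All function names and the toy data are hypothetical, and SOM is not shown.

```python
import math
from statistics import mean, pstdev, quantiles

def drop_incomplete(rows):
    """Keep only records with no missing (None) fields."""
    return [r for r in rows if all(v is not None for v in r)]

def drop_outliers_iqr(rows, k=1.5):
    """Drop records outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR] on any feature (assumed rule)."""
    fences = []
    for j in range(len(rows[0])):
        q1, _, q3 = quantiles([r[j] for r in rows], n=4, method="inclusive")
        iqr = q3 - q1
        fences.append((q1 - k * iqr, q3 + k * iqr))
    return [r for r in rows if all(lo <= v <= hi for v, (lo, hi) in zip(r, fences))]

def standardize(rows):
    """Z-score each feature so that no single unit (e.g. minutes vs. counts) dominates distances."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [tuple((v - mu) / sd for v, mu, sd in zip(r, mus, sds)) for r in rows]

def farthest_first(points, k):
    """Deterministic maximin seeding: start at points[0], then add the point farthest from the chosen centres."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate nearest-centre assignment and centroid update."""
    centers = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        new = [min(range(k), key=lambda i: math.dist(p, centers[i])) for p in points]
        if new == labels:  # assignments stable: converged
            break
        labels = new
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # keep the old centre if a cluster becomes empty
                centers[i] = tuple(mean(col) for col in zip(*members))
    return labels, centers

def fuzzy_cmeans(points, c, m=2.0, tol=1e-6, max_iter=300):
    """Bezdek's fuzzy c-means; u[k][i] is the membership of point k in cluster i."""
    centers = farthest_first(points, c)
    for _ in range(max_iter):
        u = []
        for p in points:
            d = [math.dist(p, v) for v in centers]
            if 0.0 in d:  # point sits exactly on a centre: crisp membership
                z = d.index(0.0)
                u.append([1.0 if i == z else 0.0 for i in range(c)])
            else:
                u.append([1.0 / sum((di / dj) ** (2 / (m - 1)) for dj in d) for di in d])
        new_centers = []
        for i in range(c):
            w = [row[i] ** m for row in u]  # fuzzified weights u^m
            new_centers.append(tuple(sum(wk * x[j] for wk, x in zip(w, points)) / sum(w)
                                     for j in range(len(points[0]))))
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return u, centers

# Hypothetical toy data: two groups of users, one incomplete record, one gross outlier.
raw = [
    (1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.2),  # group A
    (None, 3.0),                                     # incomplete record
    (5.0, 5.0), (5.2, 4.9), (4.8, 5.1), (5.1, 5.2),  # group B
    (50.0, 50.0),                                    # gross outlier
]
X = standardize(drop_outliers_iqr(drop_incomplete(raw)))
labels, _ = kmeans(X, 2)
u, _ = fuzzy_cmeans(X, 2)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1]
```

The design difference the comparison rests on is visible here. k-means gives each user exactly one cluster label, while fuzzy c-means returns a membership row per user that sums to 1, so users lying between two service profiles can be identified rather than forced into one.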


Figs. 1–9



Author information


Corresponding author

Correspondence to Marlon David Arango-Abella.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Recent Advances of AI, Optimization and Simulation” guest edited by Juan Carlos Figueroa-García, Roman Neruda, José Luis Villa Ramirez, Carlos Franco and Germán Hernández-Pérez.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Arango-Abella, M.D., Figueroa-García, J.C. Classification of Users of a Health Service Provider Using Unsupervised Machine Learning Methods. SN COMPUT. SCI. 5, 543 (2024). https://doi.org/10.1007/s42979-024-02685-9


  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s42979-024-02685-9

Keywords