research-article

Early diagnosis and personalised treatment focusing on synthetic data modelling: Novel visual learning approach in healthcare

Published: 01 September 2023

Abstract

Machine learning facilitates the early diagnosis and personalised treatment of diseases. Data quality affects diagnosis: medical data are usually sparse and imbalanced and contain irrelevant attributes, which leads to suboptimal diagnosis. To address these data challenges, improve resource allocation, and achieve better health outcomes, a novel visual learning approach is proposed. The approach helps determine whether less or more synthetic data are required to improve the quality of a dataset, in terms of the number of observations and features, according to the intended personalised treatment and early diagnosis. Numerous visualisation experiments are conducted, using statistical characteristics, cumulative sums, histograms, correlation matrices, root mean square error, and principal component analysis, to visualise both original and synthetic data and so address the data challenges. Real medical datasets for cancer, heart disease, diabetes, cryotherapy and immunotherapy are selected as case studies. Several models, such as k-Nearest Neighbours and Random Forest, are implemented as benchmarks and compared on classification measures such as accuracy, sensitivity, and specificity. To simulate data and algorithm implementation, a Generative Adversarial Network is used to create and manipulate synthetic data, whilst Random Forest is used to classify the data. Combining the Generative Adversarial Network and Random Forest models yields an amendable and adaptable system, whose model is presented through working steps, an overview and a flowchart. Experiments reveal that, in the majority of data-enhancement scenarios, visual learning can be applied in the first stage of data analysis as a novel approach.
Visual learning provides researchers and practitioners with practical human-in-the-loop machine learning visualisation tools for achieving a meaningful, adaptable synergy between data of appropriate quality and optimal classification performance while maintaining statistical characteristics. The approach can be used before any algorithm is implemented to support early and personalised diagnosis. For the immunotherapy data, Random Forest performed best: precision, recall, F-measure, accuracy, sensitivity, and specificity were 81%, 82%, 81%, 88%, 95%, and 60%, respectively, on the original data, as opposed to 91%, 96%, 93%, 93%, 96%, and 73% on the synthetic data. Future studies might examine the optimal strategies to balance the quantity and quality of medical data.
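The six measures quoted in the abstract can all be derived from a binary confusion matrix. The sketch below shows the standard positive-class definitions; it is an illustration, not the authors' evaluation code. The counts are hypothetical and were chosen only so that sensitivity, specificity and accuracy come out at 95%, 60% and 88%; they are not the paper's confusion matrix. Under these definitions recall and sensitivity are the same quantity, whereas the abstract reports them as different figures, so the paper presumably uses another convention for precision, recall and F-measure that the abstract does not specify.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Binary confusion-matrix counts (hypothetical container for this sketch)."""
    tp: int  # treated successfully, predicted success
    fp: int  # treatment failed, predicted success
    tn: int  # treatment failed, predicted failure
    fn: int  # treated successfully, predicted failure


def _ratio(numerator: int, denominator: int) -> float:
    # Return 0.0 instead of raising when a class is empty.
    return numerator / denominator if denominator else 0.0


def classification_measures(c: Confusion) -> dict:
    """Standard positive-class measures computed from the four counts."""
    precision = _ratio(c.tp, c.tp + c.fp)
    sensitivity = _ratio(c.tp, c.tp + c.fn)  # identical to recall here
    specificity = _ratio(c.tn, c.tn + c.fp)
    accuracy = _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn)
    denom = precision + sensitivity
    f_measure = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "precision": precision,
        "recall": sensitivity,
        "f_measure": f_measure,
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
    }


# Hypothetical counts for a 25-sample test split (an assumption, not paper data).
measures = classification_measures(Confusion(tp=19, fp=2, tn=3, fn=1))
for name, value in measures.items():
    print(f"{name:12s} {value:.2%}")
```

With imbalanced data such as this, accuracy alone can look good while specificity stays low, which is why the abstract reports sensitivity and specificity alongside it.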

Highlights

A novel hybrid visual learning approach based on experiments for human judgment and modelling.
Personalised and early diagnosis system adaptable at data and algorithm levels.
Original and synthetic data visualisation experiments for well-informed classification.
Addressing challenging medical data: the root cause of suboptimal diagnosis.
Integration of GAN and RF to generate, manipulate and classify data.
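The comparison of original and synthetic data named in the highlights can be illustrated with a few summary computations. The sketch below is a rough, assumption-laden example: the toy rows are hypothetical, and a simple per-feature Gaussian sampler stands in for the Generative Adversarial Network, which is deliberately not implemented here. It computes per-feature means and standard deviations, a cumulative sum, and the root mean square error between the statistics of the two datasets, which is one plausible reading of how RMSE is used in the paper.

```python
import math
import random
import statistics

# Toy "original" data: rows are patients, columns are numeric features.
# All values are hypothetical and are not taken from any dataset in the paper.
original = [
    [22.0, 2.25, 14.0],
    [15.0, 3.00, 2.0],
    [16.0, 10.50, 2.0],
    [27.0, 4.50, 9.0],
    [20.0, 8.00, 6.0],
    [35.0, 5.00, 11.0],
]


def columns(rows):
    """Transpose a list of rows into a list of feature columns."""
    return [list(col) for col in zip(*rows)]


def feature_means(rows):
    return [statistics.mean(col) for col in columns(rows)]


def feature_stdevs(rows):
    return [statistics.stdev(col) for col in columns(rows)]


def gaussian_stand_in(rows, n_samples, seed=0):
    """Stand-in generator (NOT a GAN): samples each feature independently
    from a normal distribution fitted to the corresponding original column."""
    rng = random.Random(seed)
    params = list(zip(feature_means(rows), feature_stdevs(rows)))
    return [[rng.gauss(mu, sd) for mu, sd in params] for _ in range(n_samples)]


def cumulative_sum(values):
    """Running total of a sequence, as would be plotted for visual inspection."""
    total, out = 0.0, []
    for v in values:
        total += v
        out.append(total)
    return out


def rmse(a, b):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


synthetic = gaussian_stand_in(original, n_samples=50)
mean_gap = rmse(feature_means(original), feature_means(synthetic))
std_gap = rmse(feature_stdevs(original), feature_stdevs(synthetic))

print(f"RMSE between feature means:  {mean_gap:.3f}")
print(f"RMSE between feature stdevs: {std_gap:.3f}")
print("Cumulative sum of feature 0:", cumulative_sum(columns(original)[0]))
```

In a human-in-the-loop setting, an analyst would inspect these gaps (or the corresponding plots) and decide whether to generate more or fewer synthetic rows before training any classifier.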

Published In

Computers in Biology and Medicine, Volume 164, Issue C
Sep 2023
1450 pages

Publisher

Pergamon Press, Inc.
United States

Author Tags

1. Personalised and early diagnosis
2. Machine learning
3. Imbalanced UCI data
4. Generative Adversarial Network
5. Random Forest
6. Synthetic data
7. Visualisations
8. Healthcare

Qualifiers

• Research-article
