CDL4CDRP: A Collaborative Deep Learning Approach for Clinical Decision and Risk Prediction
Abstract
1. Introduction
2. Related Work
3. Materials and Methods
3.1. Materials
3.1.1. Chronic Kidney Disease Dataset
3.1.2. Dermatology Dataset
3.2. Classification Methods
3.2.1. User-Based Collaborative Filtering
3.2.2. RBM-Based Collaborative Filtering for Clinical Diagnosis
Algorithm 1. Training update for RBM over (h, v) based on CD

Input: training set D, number of iterations n_step, number of Gibbs sampling steps cd_k
Output: updated parameters a*, b*, W*

Step 1: Input the training dataset D, the number of iterations n_step, and the number of Gibbs sampling steps cd_k;
Step 2: Initialize the RBM parameters a, b, W;
Step 3: Perform Gibbs sampling for cd_k iterations, updating h_k and v_{k+1} according to Equations (10) and (11), respectively;
Step 4: Update the parameters a, b, W according to the update formula;
Step 5: Output the parameters a*, b*, W* as the model parameters.
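The steps of Algorithm 1 can be sketched in a few lines of dependency-free Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes binary visible and hidden units, treats a as the visible bias and b as the hidden bias, and uses the standard contrastive divergence update with a hypothetical learning rate `lr`, since the update formula of Step 4 is not reproduced here. The function names and the `n_hidden` parameter are also illustrative.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def hidden_probs(v, W, b):
    # P(h_j = 1 | v), assuming b holds the hidden biases.
    return [sigmoid(b[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(b))]


def visible_probs(h, W, a):
    # P(v_i = 1 | h), assuming a holds the visible biases.
    return [sigmoid(a[i] + sum(W[i][j] * h[j] for j in range(len(h))))
            for i in range(len(a))]


def sample(probs, rng):
    # Draw a binary vector from independent Bernoulli probabilities.
    return [1 if rng.random() < p else 0 for p in probs]


def train_rbm(D, n_hidden, n_step, cd_k, lr=0.1, seed=0):
    rng = random.Random(seed)
    n_visible = len(D[0])
    # Step 2: initialize a, b, W (small random weights, zero biases).
    a = [0.0] * n_visible
    b = [0.0] * n_hidden
    W = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
         for _ in range(n_visible)]
    for _epoch in range(n_step):
        for v0 in D:
            ph0 = hidden_probs(v0, W, b)
            vk, phk = v0, ph0
            # Step 3: cd_k rounds of Gibbs sampling, h_k then v_{k+1}.
            for _gibbs in range(cd_k):
                hk = sample(phk, rng)
                vk = sample(visible_probs(hk, W, a), rng)
                phk = hidden_probs(vk, W, b)
            # Step 4: standard CD update (assumed form, with learning rate lr).
            for i in range(n_visible):
                a[i] += lr * (v0[i] - vk[i])
                for j in range(n_hidden):
                    W[i][j] += lr * (v0[i] * ph0[j] - vk[i] * phk[j])
            for j in range(n_hidden):
                b[j] += lr * (ph0[j] - phk[j])
    # Step 5: return the learned parameters a*, b*, W*.
    return a, b, W


# Toy usage on two binary patterns (made-up data, for illustration only).
D = [[1, 1, 0, 0], [0, 0, 1, 1]] * 3
a, b, W = train_rbm(D, n_hidden=2, n_step=5, cd_k=1)
```

The positive and negative statistics use hidden probabilities rather than sampled states, a common variance-reduction choice; sampling the hidden states instead would also be a valid instance of CD.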
3.2.3. DRBM-Based Collaborative Filtering for Clinical Diagnosis
3.3. Simulation
4. Results
4.1. Data Visualization
4.2. User-Based CF for Clinical Disease Diagnosis
4.3. RBM-Based CF for Clinical Disease Diagnosis
4.4. DRBM-Based CF for Clinical Disease Diagnosis
4.5. Experimental Result and Comparative Analysis
5. Discussion
6. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
Appendix A
RBM | Restricted Boltzmann Machine |
DRBM | Discriminative Restricted Boltzmann Machine |
SVM | Support Vector Machines |
RF | Random Forest |
LR | Logistic Regression |
MLP | Multi-layer Perceptron |
Mean | Mean Imputation |
kNN | k-nearest neighbors |
cart | Classification and Regression Tree |
pmm | Predictive Mean Matching Imputation |
rf | Random Forest Imputation |
MICE | Multivariate Imputation by Chained Equations |
MAR | Missing at random |
MCAR | Missing completely at random |
RecSys | Recommender systems |
RF-Mean | Random forest with Mean imputation |
RF-kNN | Random forest with kNN imputation |
RF-cart | Random forest with cart imputation |
RF-pmm | Random forest with pmm imputation |
RF-rf | Random forest with rf imputation |

Numerical features are binned into the intervals shown, with numeric assigned values of 0, 1, … and 6.

Feature | Type | Values |
---|---|---|
f25 (class) | Class | C1: ckd; C2: notckd |
f1 | Numerical | (2, 34], (34, 46], (46, 54], (54, 60], (60, 67], (67, 90] |
f2 | Numerical | (50, 60], (60, 70], (70, 76], (76, 80], (80, 90], (90, 180] |
f10 | Numerical | (22, 94], (94, 108], (108, 125], (125, 148.2], (148.2, 203], (203, 490] |
f11 | Numerical | (1.5, 23], (23, 32], (32, 44], (44, 53], (53, 85], (85, 391] |
f12 | Numerical | (0.4, 0.8], (0.8, 1.1], (1.1, 1.3], (1.3, 2.2], (2.2, 3.9], (3.9, 76] |
f13 | Numerical | (4.5, 135], (135, 137], (137, 137.5], (137.5, 139], (139, 142], (142, 163] |
f14 | Numerical | (2.5, 3.7], (3.7, 4.2], (4.2, 4.6], (4.6, 4.62], (4.62, 4.9], (4.9, 47] |
f15 | Numerical | (3.1, 9.8], (9.8, 11.4], (11.4, 12.5], (12.5, 13.7], (13.7, 15.2], (15.2, 17.8] |
f16 | Numerical | (9, 31], (31, 37], (37, 38.9], (38.9, 42], (42, 47], (47, 54] |
f17 | Numerical | (2200, 6300], (6300, 7700], (7700, 8406], (8406, 8406.2], (8406.2, 9800], (9800, 26400] |
f18 | Numerical | (2.1, 3.9], (3.9, 4.7], (4.7, 4.71], (4.71, 4.8], (4.8, 5.4], (5.4, 8] |
f3 | Nominal | 1.005, 1.01, 1.015, 1.02, 1.025 |
f4 | Nominal | 0, 1, 2, 3, 4, 5 |
f5 | Nominal | 0, 1, 2, 3, 4, 5 |
f6 | Nominal | abnormal, normal |
f7 | Nominal | abnormal, normal |
f8 | Nominal | notpresent, present |
f9 | Nominal | notpresent, present |
f19 | Nominal | no, yes |
f20 | Nominal | no, yes |
f21 | Nominal | no, yes |
f22 | Nominal | poor, good |
f23 | Nominal | no, yes |
f24 | Nominal | no, yes |
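
The interval coding of the numerical features can be illustrated with a short Python sketch. It maps a raw value of f1 to the index of its right-closed interval, using the f1 bin edges from the table. Two points are assumptions and not stated in the table: that the intervals are numbered 1 to 6 in order, and that the remaining code 0 marks a missing value. The names `F1_LOWER`, `F1_UPPER_EDGES`, and `bin_code` are illustrative.

```python
from bisect import bisect_left

# Bin edges for f1, copied from the table: (2, 34], (34, 46], ..., (67, 90].
F1_LOWER = 2
F1_UPPER_EDGES = [34, 46, 54, 60, 67, 90]


def bin_code(value, lower, upper_edges):
    """Return the assigned code for a raw value under right-closed bins."""
    if value is None:
        return 0  # assumption: code 0 is reserved for missing values
    if not (lower < value <= upper_edges[-1]):
        raise ValueError(f"{value} lies outside ({lower}, {upper_edges[-1]}]")
    # bisect_left finds the first upper edge >= value, which is exactly
    # the right-closed interval (edge[i-1], edge[i]] containing the value.
    return bisect_left(upper_edges, value) + 1


codes = [bin_code(x, F1_LOWER, F1_UPPER_EDGES) for x in (34, 35, 48, 90, None)]
```

Because the bins are right-closed, a value equal to an upper edge (such as 34) stays in the lower interval, which is why `bisect_left` is used here in place of `bisect_right`.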

Classes (Class Label) | Clinical Features (numeric assigned values of 0, 1, 2, and 3) | Histopathological Features (numeric assigned values of 0, 1, 2, and 3) |
---|---|---|
C1: Psoriasis; C2: Seborrheic dermatitis; C3: Lichen planus; C4: Pityriasis rosea; C5: Chronic dermatitis; C6: Pityriasis rubra pilaris | f1, f2, f3, f4, f5, f6, f7, f8, f9, f10; f11 (0 or 1); f34: (0, 21], (21, 29], (29, 36], (36, 43], (43, 52], (52, 75] | f12, f13, f14, f15, f16, f17, f18, f19, f20, f21, f22, f23, f24, f25, f26, f27, f28, f29, f30, f31, f32, f33 |

Missingness | Baseline | 10% | 20% | 30% |
---|---|---|---|---|
Accuracy of dermatology | 0.96894 (k = 1) | 0.96382 (k = 1) | 0.92764 (k = 1) | 0.82814 (k = 8) |
F1-score of dermatology | 0.96787 (k = 1) | 0.96313 (k = 1) | 0.8937 (k = 1) | 0.72510 (k = 8) |
Accuracy of CKD | 0.97183 (k = 4) | 0.96894 (k = 6) | 0.97041 (k = 7) | 0.95275 (k = 6) |
F1-score of CKD | 0.97019 (k = 4) | 0.96894 (k = 6) | 0.96894 (k = 7) | 0.95077 (k = 6) |

Missingness | Baseline | 10% | 20% | 30% |
---|---|---|---|---|
F1-score of dermatology | 0.441 | 0.382 | 0.291 | 0.152 |
F1-score of CKD | 0.730 | 0.686 | 0.619 | 0.530 |

Missingness | Baseline | 10% | 20% | 30% |
---|---|---|---|---|
F1-score of dermatology | 0.964 | 0.953 | 0.941 | 0.903 |
F1-score of CKD | 0.996 | 0.982 | 0.978 | 0.968 |

Missingness | Baseline | 10% | 20% | 30% |
---|---|---|---|---|
Accuracy of dermatology | 0.027 (SVM-cart) | 0.108 (RF-cart) | 0.064 (RF-rf) | 0.109 (LR-rf) |
F1-score of dermatology | 0.035 (MLP-pmm) | 0.066 (LR-cart) | 0.053 (RF-rf) | 0.087 (SVM-rf) |
Accuracy of CKD | 0.987 (RF-Mean) | 0.964 (LR-cart) | 0.975 (RF-cart) | 0.966 (LR-pmm) |
F1-score of CKD | 0.987 (RF-Mean) | 0.963 (RF-Mean) | 0.975 (RF-cart) | 0.964 (LR-pmm) |
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Sun, M.; Min, T.; Zang, T.; Wang, Y. CDL4CDRP: A Collaborative Deep Learning Approach for Clinical Decision and Risk Prediction. Processes 2019, 7, 265. https://doi.org/10.3390/pr7050265