Abstract
Deep neural network models are emerging as an important method in healthcare delivery, following their recent success in other domains such as image recognition. Because of their multiple non-linear inner transformations, deep neural networks are viewed by many as black boxes. For practical use, deep learning models require explanations that are intuitive to clinicians. In this study, we developed a deep neural network model to predict outcomes following major cardiovascular procedures, using a temporal image representation of past medical history as input. We created a novel explanation for the model's predictions by defining impact scores that associate clinical observations with the outcome. For comparison, a logistic regression model was fitted to the same dataset. We compared the impact scores with the log odds ratios by calculating three types of correlations, which provided a partial validation of the impact scores. The deep neural network model achieved an area under the receiver operating characteristic curve (AUC) of 0.787, compared to 0.746 for the logistic regression model. Moderate correlations were found between the impact scores and the log odds ratios. Impact scores generated by the explanation algorithm have the potential to shed light on the “black box” deep neural network model and could facilitate its adoption by clinicians.
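The two quantities reported above, AUC and the correlation between impact scores and log odds ratios, can be illustrated with a minimal sketch. The abstract does not say which three correlation types were used, so the Pearson and Spearman coefficients below are assumptions chosen for illustration, and all data values and function names are made up rather than taken from the study.

```python
from math import sqrt

def auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Ties between a positive and a negative count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def pearson(x, y):
    """Pearson linear correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical toy data, not from the study.
labels = [0, 0, 1, 1]                       # observed outcomes
scores = [0.1, 0.4, 0.35, 0.8]              # model-predicted risks
impact = [0.9, -0.2, 0.4, 0.1, -0.5]        # per-feature impact scores (assumed)
log_odds = [1.2, -0.1, 0.3, 0.4, -0.8]      # per-feature log odds ratios (assumed)

print("AUC:", auc(labels, scores))
print("Pearson:", pearson(impact, log_odds))
print("Spearman:", spearman(impact, log_odds))
```

A rank-based coefficient such as Spearman only asks whether the two methods order the clinical features similarly, whereas Pearson also depends on the relative magnitudes, which is one reason a comparison might report more than one type.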




Acknowledgments
This study was funded by: NIH grant R56 (AG052536-01A1); The Clinical and Translational Science Institute at Children’s National (CTSI-CN) through the NIH Clinical and Translational Science Award (CTSA) program (UL1TR001876); CREATE: A VHA NLP Software Ecosystem for Collaborative Development and Integration (#CRE 12-315); Veterans Health Administration Health Services Research & Development (#CRE 12-321); Career Development Award from the NHLBI (K08HL136850).
Ethics declarations
Conflict of interests
All authors declare that they have no conflicts of interest.
Ethical approval
All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. This article does not contain any studies with animals performed by any of the authors.
Informed consent
Informed consent was obtained from all individual participants included in the study.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This article is part of the Topical Collection on Image & Signal Processing
Cite this article
Shao, Y., Cheng, Y., Shah, R.U. et al. Shedding Light on the Black Box: Explaining Deep Neural Network Prediction of Clinical Outcomes. J Med Syst 45, 5 (2021). https://doi.org/10.1007/s10916-020-01701-8
DOI: https://doi.org/10.1007/s10916-020-01701-8