Application of a Comprehensive Evaluation Framework to COVID-19 Studies: Systematic Review of Translational Aspects of Artificial Intelligence in Health Care
Review
Aaron Edward Casey1,2, PhD; Saba Ansari3, GCHE (Teaching and Learning), MSc; Bahareh Nakisa4, BSE, MCS,
PhD; Blair Kelly5, Grad Dip (InfoLibStds), BCom; Pieta Brown6, MPS; Paul Cooper3, PhD; Imran Muhammad3,
MIS, MSc, PhD; Steven Livingstone6, BSc, GradDipSci, MDataSci; Sandeep Reddy3, MBBS, MSc, PhD; Ville-Petteri
Makinen1,2,7,8, DSc
1 South Australian Health and Medical Research Institute, Adelaide, Australia
2 Australian Centre for Precision Health, Cancer Research Institute, University of South Australia, Adelaide, Australia
3 School of Medicine, Deakin University, Geelong, Australia
4 School of Information Technology, Deakin University, Geelong, Australia
5 Library, Deakin University, Geelong, Australia
6 Orion Health, Auckland, New Zealand
7 Computational Medicine, Faculty of Medicine, University of Oulu, Oulu, Finland
8 Centre for Life Course Health Research, Faculty of Medicine, University of Oulu, Oulu, Finland
Corresponding Author:
Aaron Edward Casey, PhD
South Australian Health and Medical Research Institute
North Terrace
Adelaide, 5000
Australia
Phone: 61 08 8128 4064
Email: aaron.casey@sahmri.com
Abstract
Background: Despite immense progress in artificial intelligence (AI) models, there has been limited deployment in health care
environments. The gap between potential and actual AI applications is likely due to the lack of translatability between controlled
research environments (where these models are developed) and clinical environments for which the AI tools are ultimately
intended.
Objective: We previously developed the Translational Evaluation of Healthcare AI (TEHAI) framework to assess the translational
value of AI models and to support successful transition to health care environments. In this study, we applied the TEHAI framework
to the COVID-19 literature in order to assess how well translational topics are covered.
Methods: A systematic literature search for COVID-19 AI studies published between December 2019 and December 2020
resulted in 3830 records. A subset of 102 (2.7%) papers that passed the inclusion criteria was sampled for full review. The papers
were assessed for translational value, and descriptive data were collected, by 9 reviewers (each study was assessed by 2 reviewers).
Evaluation scores and extracted data were compared by a third reviewer for resolution of discrepancies. The review process was
conducted on the Covidence software platform.
Results: We observed a significant trend for studies to attain high scores for technical capability but low scores for the areas
essential for clinical translatability. Specific questions regarding external model validation, safety, nonmaleficence, and service
adoption received failing scores in most studies.
Conclusions: Using TEHAI, we identified notable gaps in how well the translational topics of AI models are covered in the
COVID-19 clinical sphere. These gaps in areas crucial for clinical translatability could, and should, be addressed as early as the
model development stage to increase the translatability of AI models into real COVID-19 health care environments.
KEYWORDS
artificial intelligence; health care; clinical translation; translational value; evaluation; capability; utility; adoption; COVID-19;
AI application; health care AI; model validation; AI model; AI tools
Table 1 footnotes:

aTEHAI: Translational Evaluation of Healthcare Artificial Intelligence (AI).

bThe framework comprises 15 separate criteria (subcomponents) that are grouped into 3 higher-level components. Each criterion yields a score between 0 and 3 points, depending on the quality of the study. To compare 2 or more AI models against each other, further weighting of the scores can be applied to emphasize translatability. However, in this study, weighting was not used, since we focused on the statistics of the subcomponents instead.
was 100; however, additional papers were randomly picked to account for the rejection of 21 (17.1%) papers that passed the initial screen but were deemed ineligible after closer inspection (Multimedia Appendix 3). Early on in the evaluation, it became apparent that a significant portion of the studies focused on image analysis; we then enriched the pool for studies that were not imaging focused, bringing the ratio of imaging-focused to nonimaging-focused studies to 1:1. The full text was retrieved for all 123 (12.7%) studies in the randomized sample; however, only 102 (82.9%) studies met our inclusion criteria at the evaluation and extraction stage (Multimedia Appendix 4). Of the studies that did not meet our inclusion criteria, the majority were nonimaging studies, and the final ratio of imaging-focused to nonimaging-focused studies was 2:1.

Evaluation and data extraction were conducted using Covidence systematic review software [22]. We used this software to facilitate the creation of a quality assessment template based on the TEHAI framework [15] in combination with other questions (henceforth referred to as data extraction questions) aimed at further understanding the components that may influence a study's capacity to translate into clinical practice (Multimedia Appendix 1). To minimize the impact of subjectivity introduced by human evaluation, each paper was initially scored by 2 reviewers, who independently evaluated the paper against the elements of the TEHAI framework and extracted relevant data. A third reviewer then checked the scores and, if discrepancies were present, chose 1 of the 2 independent reviewers' scores as the final result. This process was built into the Covidence platform. To further minimize subjectivity, reviewer roles were also randomly assigned across the evaluation team.

For scoring of the included studies, we drew upon previously provided guidance for scoring evidence within the TEHAI framework [15]. The TEHAI framework is composed of 3 overarching components: capability, utility, and adoption. Each component comprises numerous subcomponent questions, of which there are 15 in total. The scoring of each TEHAI subcomponent is based on a range of 0-3, depending on the criteria met by the study. In this study, we also investigated the sums of these scores at the component level to provide a better overview of the data. In addition, TEHAI facilitates direct comparisons between specific studies using a weighting mechanism that further emphasizes the importance of translatability (see the last column in Table 1). However, for this study, where we focused on the aggregate statistical patterns, weighting was not used.

We also asked reviewers to report on a select number of data extraction questions that would enable us to further tease apart which components of a study may influence the score obtained. These questions covered (1) the broad type of the AI algorithm, (2) methodological or clinical focus, (3) open source or proprietary software, (4) the data set size, (5) the country of origin, and (6) imaging or nonimaging data.

Data Analysis

Associations between groupings of papers and the distributions of subcomponent scores were assessed with the Fisher exact test. Correlations between subcomponents were calculated using the Kendall formula. Component scores were calculated by adding the relevant subcomponent scores together; group differences in mean component scores were assessed using the t-test. As there are 15 subcomponents, we set a multiple testing threshold of P<.003 to indicate a 5% type 1 error probability under the Bonferroni correction for 15 independent tests. Unless otherwise indicated, mean (SE) scores are reported.

Results

TEHAI Subcomponent Scores

A total of 102 manuscripts were reviewed by 9 reviewers (mean 22.67 per reviewer, SD 7.71, min=11, max=36), with the same 2 reviewers scoring the same manuscript an average of 2.83 times (SD 2.58, min=0, max=13). The Cohen κ statistic for interreviewer reliability was 0.45, with an asymptotic SE of 0.017 over the 2 independent reviewers. The reviewer scores were in moderate agreement (κ=0.45) according to Cohen's original tiers [23]. In practice, this means that the scoring system was successful in capturing important and consistent information from the COVID-19 papers, but there would be too much disagreement due to reviewer background or random noise for demanding applications, such as clinical diagnoses [24]. Given that the role of the TEHAI framework is to provide guidance and decision support (not diagnoses), moderate accuracy is sufficient for a meaningful practical benefit for AI development. Nevertheless, the question of reviewer bias should be revisited in future updates to the framework.

Overall, the capability component attained the highest mean score, followed by adoption and utility (Figure 1A). At the subcomponent level, the poorest-performing questions were nonmaleficence (93/102, 91.2%, scoring 0 points), followed closely by safety and quality, external validity, and the number of services (Figure 1B).

We observed moderate positive correlations (R=0.19-0.43) between most capability component questions (data source vs internal validation R=0.43, external validation R=0.20, performance R=0.33, and use case R=0.37; internal validation vs performance R=0.40 and use case R=0.31; performance vs use case R=0.32), with the exception of the "objective of study" subcomponent (objective of study vs data source R=0.13, internal validation R=0.09, external validation R=0.08); see Figure 2. This indicated that if a study scored well in one subcomponent of the capability component, then it was also likely to score well in the other capability subcomponents, with the exception of the "objective of the study" subcomponent. Furthermore, there was also a correlation between the subcomponents belonging to the capability component and the "generalizability and contextualization" (R=0.19-0.31), "transparency" (R=0.11-0.27), and "alignment with the domain" (R=0.13-0.40) subcomponents, as well as our data extraction question 9 (method of machine learning used; R=0.11-0.24); see Figure 2. There was also a significant, moderate correlation between most adoption component questions (R=0.18-0.42), with the exception of the "alignment with the domain" subcomponent (R=0.04-0.26); see Figure 2. A significant negative correlation was observed between a country's gross domestic product (GDP) and imaging studies (R=–0.30), indicating that high-GDP countries are less likely to conduct imaging studies than middle-GDP countries. The negative correlation between the audience (clinical or methodological) and the number of services (R=–0.36) indicated that methodological studies are less likely to be associated with numerous services than clinical studies. Code availability was inversely correlated with transparency (R=–0.36), as expected (open source was 1 of the assessment conditions).
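As an illustration of the analyses reported above, here is a minimal sketch of how the pairwise Kendall correlations, the Bonferroni threshold (P<.003), and the interreviewer Cohen κ could be computed. This is not the authors' analysis code; the file and column names are hypothetical.

```python
# Sketch of the correlation and reliability analyses described above;
# not the authors' code. File and column names are hypothetical.
from itertools import combinations

import pandas as pd
from scipy.stats import kendalltau
from sklearn.metrics import cohen_kappa_score

scores = pd.read_csv("tehai_consensus_scores.csv")  # one row per study
subcomponents = list(scores.columns)  # the 15 TEHAI subcomponent scores

alpha = 0.05 / len(subcomponents)  # Bonferroni: P < .003 for 15 tests

# Pairwise Kendall correlations between subcomponent scores
for a, b in combinations(subcomponents, 2):
    r, p = kendalltau(scores[a], scores[b])
    if p < alpha:
        print(f"{a} vs {b}: R={r:.2f}, P={p:.3g}")

# Interreviewer reliability: Cohen kappa between the 2 independent
# reviewers' scores for the same items (aligned vectors, hypothetical file)
reviews = pd.read_csv("paired_reviewer_scores.csv")
kappa = cohen_kappa_score(reviews["reviewer1"], reviews["reviewer2"])
print(f"Cohen kappa = {kappa:.2f}")
```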
Figure 1. Overall consensus scores obtained by all studies reviewed. (A) Average consensus scores for all studies reviewed (error bars=SE). (B) Stacked
bar graph showing the distribution of scores for each subcomponent question. Ext: external; h/care: health care; int: internal.
Figure 2. Correlation heatmap showing the strength of correlation between all subcomponents and select data extraction questions. The strength of
correlation, as determined by the Fisher exact test, is shown in color, with the size of squares representing the level of significance. Avail: availability;
ext: external; GDP: gross domestic product; h/care: health care; int: internal.
Figure 4. Component and subcomponent scores split into subcategories based on data extraction questions, including (A and B) “intended audience,”
(C and D) “type of software,” and (E and F) “size of data set.” Bars show average scores, with error bars equal to SE. Bold P values indicate P<.05.
Bonferroni-corrected significance P=.003. Ext: external; h/care: health care; int: internal.
Close to half of the studies used open source software (n=45, 44.1%), with a small portion (n=8, 7.8%) using proprietary software (the remaining studies were unclear as to the software availability). There was a tendency for proprietary software to perform better at adoption, particularly in the "use in a health care setting" subcomponent (open source software studies mean score 0.69, SE 0.09; proprietary software studies mean score 1.25, SE 0.16; P=.02), while papers with open source software tended to score better in utility, including the "safety and quality" (open source software studies mean score 0.27, SE 0.09; proprietary software studies mean score 0.13, SE 0.13; P=.99), "privacy" (open source software studies mean score 0.91, SE 0.14; proprietary software studies mean score 0.75, SE 0.31; P=.43), and "nonmaleficence" (open source software
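A sketch of how such group comparisons could be reproduced is below: a Welch t-test compares mean component scores between the two software groups, and a 2x2 Fisher exact test is shown for a dichotomized subcomponent (failing score of 0 vs nonzero). All file and column names are hypothetical, and this is an illustration of the reported statistics rather than the authors' code.

```python
# Sketch of the open source vs proprietary group comparisons; not the
# authors' code. The CSV file and column names are hypothetical.
import pandas as pd
from scipy.stats import fisher_exact, ttest_ind

scores = pd.read_csv("tehai_consensus_scores.csv")
open_src = scores[scores["software"] == "open_source"]
prop = scores[scores["software"] == "proprietary"]

# Welch t-test on mean adoption component scores between the groups
t_stat, p_val = ttest_ind(open_src["adoption"], prop["adoption"],
                          equal_var=False)
print(f"adoption: t={t_stat:.2f}, P={p_val:.2f}")

# Fisher exact test on a 2x2 table: failing (0) vs nonzero scores
table = [
    [(open_src["nonmaleficence"] == 0).sum(),
     (open_src["nonmaleficence"] > 0).sum()],
    [(prop["nonmaleficence"] == 0).sum(),
     (prop["nonmaleficence"] > 0).sum()],
]
odds_ratio, p_fisher = fisher_exact(table)
print(f"nonmaleficence: OR={odds_ratio:.2f}, P={p_fisher:.2f}")
```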
not surprising to find that the CNN was the most popular machine learning model, as most of the selected studies related to medical imaging analysis (69/102 studies were imaging studies compared to 33/102 studies that were not), where the technique is widely understood and beginning to be applied in some clinical settings [6,30].

Although there was a consistent trend for studies with large data sets to score higher than those with small data sets, there was no significant difference in any subcomponent between studies with small versus large data sets. This was a surprising finding and indicates that even when studies have collected more data, they advance no further in the utility or adoption fields; should the total number of studies analyzed be increased, we would expect the difference between the two groups to become significant. Regarding imaging versus nonimaging, we observed that nonimaging studies scored higher in some adoption and utility subcomponents. We suspect this was due to the more clinical nature of the nonimaging research teams; thus, the papers focused more on issues important to clinical practice. Although we expected studies using proprietary software to be more mature, the authors had not advanced the findings into practice any more than the authors of open source, algorithm-based studies. Again, we would expect this difference to become significant if the number of studies scored were increased. We also assessed the interpretability of the models as part of the "transparency" subcomponent and found that imaging studies in particular included additional visualizations to pinpoint the regions that were driving the classification. Further, scoring the studies in each of the TEHAI components evidenced the need to plan in advance for external validation, safety, and integration in health services to ensure the full translatability of AI models into health care.

Most of the reviewed studies lacked sufficient considerations for adoption into health care practices (the third TEHAI component), which has implications for the business case for AI applications in health care. The cost of deployment and the costs of misclassification, from both monetary and patient safety/discomfort perspectives, can only be assessed if pilot data are available from actual tests that put new tools into service. Furthermore, critical administrative outcomes, such as workload requirements, should be considered as early as possible. Although we understand that such tests are hard to organize on an academic basis, the TEHAI framework can be used as an incentive to move in this direction.

We note that the availability of dedicated data sets and computing resources for training could be a bottleneck for some applications. In this study, we observed multiple instances of transfer learning, which is 1 solution; however, we will revise the capability section of TEHAI to give more specific consideration to these issues.

Fair access to AI technology should also be part of good design. The TEHAI framework includes this in the "internal validity" subcomponent, where small studies in particular struggled with representing a sufficient diversity of individuals. From a translational point of view, we also observed shortcomings in the contextualization of AI models. Again, since there was limited evidence on service deployment, most studies scored low on fairness simply due to a lack of data. We also note that deployment in this case may be hindered by the clinical acceptance of the models [11], and we will include this topic in future amendments to the TEHAI framework.

Limitations

Although we undertook a comprehensive evaluation of AI studies, unlike previous assessments, our study still has some limitations. First, the period we used to review and select studies was narrow, being just a year. Another limitation is that for practical reasons, we randomly chose a subset of 102 studies for evaluation out of the 968 eligible studies. Despite these limitations, we are confident that the evaluation process we undertook was rigorous, as evidenced by the systematic review of the literature, the detailed assessment of each of the selected studies, and the parallel review and consensus steps.

We recommend caution when generalizing the results from this COVID-19 study to other areas of AI in health care. First, evaluation frameworks that rely on human experts can be sensitive to the selection of the experts (subjectivity). Second, scoring variation may arise from the nature of the clinical problem rather than the AI solution per se; thus, TEHAI results from different fields may not be directly comparable. Third, we intentionally excluded discovery studies aimed at new biology or novel treatments, as those would have been too early in the translation pipeline for a meaningful evaluation. Fourth, significant heterogeneity of clinical domains may also confound the evaluation results and may prevent comparisons of studies (here, we made an effort to preselect studies that were comparable). Lastly, the TEHAI framework is designed to be widely applicable, which means that stakeholders with specific subjective requirements may need to adapt their interpretations accordingly.

We acknowledge the rapid progress in AI algorithms that may make some of the evaluation aspects obsolete over time; however, we also emphasize that 2 of the 3 TEHAI components are not related to AI itself but to the ways AI interacts with the requirements of clinical practice and health care processes. Therefore, we expect that the translatability observations from this study will have longevity.

Conclusion

AI in health care has a translatability challenge, as evidenced by our evaluation study. By assessing 102 AI studies for their capability, utility, and adoption aspects, we uncovered translational gaps in many of these studies. Our study highlights the need to plan for translational aspects early in the AI development cycle. The evaluation framework we used and the findings from its application will inform developers, researchers, clinicians, authorities, and other stakeholders to develop and deploy more translatable AI models in health care.
Acknowledgments
BK extracted appropriate studies from databases. AEC assigned studies to reviewers, carried out all analysis, and generated
figures. All authors were involved in the scoring process. AEC, SR, SA, and V-PM drafted the manuscript. All authors provided
feedback and edits for the final manuscript.
Conflicts of Interest
SR holds directorship in Medi-AI. The other authors have no conflicts of interest to declare.
Multimedia Appendix 1
Component and subcomponent scores split into subcategories based on data extraction questions, including (A and B) "country
GDP" and (C and D) "imaging/nonimaging"-based study. Bars show average scores, with error bars equal to SE. Bold P values
indicate P<.05. Bonferroni-corrected significance P=.003. GDP: gross domestic product.
[PNG File, 144 KB - Multimedia Appendix 1]
Multimedia Appendix 2
Search strategies.
[DOCX File, 15 KB - Multimedia Appendix 2]
Multimedia Appendix 3
PRISMA flow diagram.
[DOCX File, 41 KB - Multimedia Appendix 3]
Multimedia Appendix 4
Evaluation and scoring questions.
[DOCX File, 29 KB - Multimedia Appendix 4]
References
1. Desai AN. Artificial intelligence: promise, pitfalls, and perspective. JAMA 2020 Jul 23;323(24):2448-2449 [doi:
10.1001/jama.2020.8737] [Medline: 32492093]
2. Reddy S, Fox J, Purohit MP. Artificial intelligence-enabled healthcare delivery. J R Soc Med 2019 Jan;112(1):22-28 [FREE
Full text] [doi: 10.1177/0141076818815510] [Medline: 30507284]
3. Artificial intelligence in health care: benefits and challenges of technologies to augment patient care. United States
Government Accountability Office. 2020. URL: https://www.gao.gov/products/gao-21-7sp [accessed 2022-07-13]
4. Feng J, Phillips RV, Malenica I, Bishara A, Hubbard AE, Celi LA, et al. Clinical artificial intelligence quality improvement:
towards continual monitoring and updating of AI algorithms in healthcare. NPJ Digit Med 2022 May 31;5(1):66 [FREE
Full text] [doi: 10.1038/s41746-022-00611-y] [Medline: 35641814]
5. Nsoesie EO. Evaluating artificial intelligence applications in clinical settings. JAMA Netw Open 2018 Oct 07;1(5):e182658
[FREE Full text] [doi: 10.1001/jamanetworkopen.2018.2658] [Medline: 30646173]
6. van Leeuwen KG, Schalekamp S, Rutten MJCM, van Ginneken B, de Rooij M. Artificial intelligence in radiology: 100
commercially available products and their scientific evidence. Eur Radiol 2021 Jul;31(6):3797-3804 [FREE Full text] [doi:
10.1007/s00330-021-07892-z] [Medline: 33856519]
7. Roberts M, Driggs D, Thorpe M, Gilbey J, Yeung M, Ursprung S, et al. Common pitfalls and recommendations for using
machine learning to detect and prognosticate for COVID-19 using chest radiographs and CT scans. Nat Mach Intell 2021
Mar 15;3(3):199-217 [doi: 10.1038/s42256-021-00307-0]
8. Seneviratne MG, Shah NH, Chu L. Bridging the implementation gap of machine learning in healthcare. BMJ Innov 2019
Dec 20;6(2):45-47 [doi: 10.1136/bmjinnov-2019-000359]
9. Kim DW, Jang HY, Kim KW, Shin Y, Park SH. Design characteristics of studies reporting the performance of artificial
intelligence algorithms for diagnostic analysis of medical images: results from recently published papers. Korean J Radiol
2019 Mar;20(3):405-410 [FREE Full text] [doi: 10.3348/kjr.2019.0025] [Medline: 30799571]
10. Yu AC, Mohajer B, Eng J. External validation of deep learning algorithms for radiologic diagnosis: a systematic review.
Radiol Artif Intell 2022 May;4(3):e210064 [FREE Full text] [doi: 10.1148/ryai.210064] [Medline: 35652114]
11. Schneider J, Agus M. Reflections on the clinical acceptance of artificial intelligence. In: Househ M, Borycki E, Kushniruk
A, editors. Multiple Perspectives on Artificial Intelligence in Healthcare: Opportunities and Challenges. Cham: Springer
International Publishing; 2021:103-114
12. Reddy S, Allan S, Coghlan S, Cooper P. A governance model for the application of AI in health care. J Am Med Inform
Assoc 2020 Mar 01;27(3):491-497 [FREE Full text] [doi: 10.1093/jamia/ocz192] [Medline: 31682262]
13. Mhasawade V, Zhao Y, Chunara R. Machine learning and algorithmic fairness in public and population health. Nat Mach
Intell 2021 Jul 29;3(8):659-666 [doi: 10.1038/s42256-021-00373-4]
14. AlHasan A. Bias in medical artificial intelligence. Bull R Coll Surg Engl 2021 Sep;103(6):302-305 [doi:
10.1308/rcsbull.2021.111]
15. Reddy S, Rogers W, Makinen V, Coiera E, Brown P, Wenzel M, et al. Evaluation framework to guide implementation of
AI systems into healthcare settings. BMJ Health Care Inform 2021 Oct;28(1):e100444 [FREE Full text] [doi:
10.1136/bmjhci-2021-100444] [Medline: 34642177]
16. Kim W, Jang Y, Yang J, Chung J. Spatial activation of TORC1 is regulated by Hedgehog and E2F1 signaling in the
Drosophila eye. Dev Cell 2017 Aug 21;42(4):363-375.e4 [FREE Full text] [doi: 10.1016/j.devcel.2017.07.020] [Medline:
28829944]
17. Saygılı A. A new approach for computer-aided detection of coronavirus (COVID-19) from CT and X-ray images using
machine learning methods. Appl Soft Comput 2021 Jul;105:107323 [FREE Full text] [doi: 10.1016/j.asoc.2021.107323]
[Medline: 33746657]
18. Roimi M, Gutman R, Somer J, Ben Arie A, Calman I, Bar-Lavie Y, et al. Development and validation of a machine learning
model predicting illness trajectory and hospital utilization of COVID-19 patients: a nationwide study. J Am Med Inform
Assoc 2021 Jul 12;28(6):1188-1196 [FREE Full text] [doi: 10.1093/jamia/ocab005] [Medline: 33479727]
19. Jin C, Chen W, Cao Y, Xu Z, Tan Z, Zhang X, et al. Development and evaluation of an artificial intelligence system for
COVID-19 diagnosis. Nat Commun 2020 Oct 09;11(1):5088 [FREE Full text] [doi: 10.1038/s41467-020-18685-1] [Medline:
33037212]
20. Reddy S, Bhaskar R, Padmanabhan S, Verspoor K, Mamillapalli C, Lahoti R, et al. Use and validation of text mining and
cluster algorithms to derive insights from corona virus disease-2019 (COVID-19) medical literature. Comput Methods
Programs Biomed Update 2021;1:100010 [FREE Full text] [doi: 10.1016/j.cmpbup.2021.100010] [Medline: 34337589]
21. Guo Y, Zhang Y, Lyu T, Prosperi M, Wang F, Xu H, et al. The application of artificial intelligence and data integration in
COVID-19 studies: a scoping review. J Am Med Inform Assoc 2021 Aug 13;28(9):2050-2067 [FREE Full text] [doi:
10.1093/jamia/ocab098] [Medline: 34151987]
22. Covidence systematic review software. Veritas Health Innovation. URL: https://www.covidence.org [accessed 2023-06-01]
23. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas 1960;20(1):37-46 [doi:
10.1177/001316446002000104]
24. Sim J, Wright CC. The kappa statistic in reliability studies: use, interpretation, and sample size requirements. Phys Ther
2005 Mar;85(3):257-268 [Medline: 15733050]
25. The world by income and region. The World Bank. 2022. URL: https://datatopics.worldbank.org/world-development-indicators/the-world-by-income-and-region.html [accessed 2022-07-13]
26. Ruamviboonsuk P, Tiwari R, Sayres R, Nganthavee V, Hemarat K, Kongprayoon A, et al. Real-time diabetic retinopathy
screening by deep learning in a multisite national screening programme: a prospective interventional cohort study. Lancet
Digit Health 2022 May;4(4):e235-e244 [FREE Full text] [doi: 10.1016/S2589-7500(22)00017-6] [Medline: 35272972]
27. Cen LP, Ji J, Lin J, Ju S, Lin H, Li T, et al. Automatic detection of 39 fundus diseases and conditions in retinal photographs
using deep neural networks. Nat Commun 2021 Aug 10;12(1):4828 [FREE Full text] [doi: 10.1038/s41467-021-25138-w]
[Medline: 34376678]
28. Deperlioglu O, Kose U, Gupta D, Khanna A, Sangaiah AK. Diagnosis of heart diseases by a secure Internet of Health
Things system based on autoencoder deep neural network. Comput Commun 2020 Oct 01;162:31-50 [FREE Full text] [doi:
10.1016/j.comcom.2020.08.011] [Medline: 32843778]
29. Liu J, Gibson E, Ramchal S, Shankar V, Piggott K, Sychev Y, et al. Diabetic retinopathy screening with automated retinal
image analysis in a primary care setting improves adherence to ophthalmic care. Ophthalmol Retina 2021 Jan;5(1):71-77
[FREE Full text] [doi: 10.1016/j.oret.2020.06.016] [Medline: 32562885]
30. Omoumi P, Ducarouge A, Tournier A, Harvey H, Kahn CE, Louvet-de Verchère F, et al. To buy or not to buy-evaluating
commercial AI solutions in radiology (the ECLAIR guidelines). Eur Radiol 2021 Jul;31(6):3786-3796 [FREE Full text]
[doi: 10.1007/s00330-020-07684-x] [Medline: 33666696]
Abbreviations
AI: artificial intelligence
CNN: convolutional neural network
GDP: gross domestic product
MeSH: Medical Subject Headings
NIH: National Institutes of Health
TEHAI: Translational Evaluation of Healthcare AI
Edited by K El Emam, B Malin; submitted 31.08.22; peer-reviewed by W Klement, S Lin; comments to author 07.11.22; revised version
received 23.11.22; accepted 22.03.23; published 06.07.23
Please cite as:
Casey AE, Ansari S, Nakisa B, Kelly B, Brown P, Cooper P, Muhammad I, Livingstone S, Reddy S, Makinen VP
Application of a Comprehensive Evaluation Framework to COVID-19 Studies: Systematic Review of Translational Aspects of Artificial
Intelligence in Health Care
JMIR AI 2023;2:e42313
URL: https://ai.jmir.org/2023/1/e42313
doi: 10.2196/42313
©Aaron Edward Casey, Saba Ansari, Bahareh Nakisa, Blair Kelly, Pieta Brown, Paul Cooper, Imran Muhammad, Steven
Livingstone, Sandeep Reddy, Ville-Petteri Makinen. Originally published in JMIR AI (https://ai.jmir.org), 06.07.2023. This is
an open-access article distributed under the terms of the Creative Commons Attribution License
(https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium,
provided the original work, first published in JMIR AI, is properly cited. The complete bibliographic information, a link to the
original publication on https://www.ai.jmir.org/, as well as this copyright and license information must be included.