1 Introduction

1.1 The social motivation

Pandemic scenarios are always challenging because they demand massive, fast, and practical actions to save lives. The SARS-CoV-2 pandemic has become a unique event of the twenty-first century, surpassing the pandemic effects caused by the Spanish flu in 1918. The social complexities and gaps of a world whose population is approaching the 8 billion inhabitants mark exceed the realities of the 1.5 billion people alive in 1918. The world's inhabitants of 1918 had no effective vaccines or medications to fight the H1N1 virus. That combination resulted in around 50 million deaths, with one-third of the world population infected (Jordan 2019).

During the COVID-19 pandemic, developed nations kept at least 50% of all available global vaccine doses, leaving emerging countries behind and increasing the chances of the virus spreading across a population (Fox 2020).

1.2 The biological motivation

The chances of novel virus variants emerging (Gallagher 2021) increase significantly when the vaccination process covers only a tiny fraction of the population. The natural mechanism of viral variation is always fast enough to find more efficient ways to hijack human host cells and to incubate and replicate faster. The result can be catastrophic, causing the pandemic scenario to slip out of the control of healthcare authorities (Shang et al. 2020). Once at this stage, it may become hard to retake control of the health situation in a region affected by an outbreak. That is why virus contact tracing plays a critical role in reducing infections. However, contact tracing is an arduous manual task and is not enough when a novel virus jumps into humans (Buhat et al. 2021; Shufro 2020).

Both social and biological aspects have motivated the scientific community to seek fast responses to protect the population since 2020. Applied computing has been collaborating with the health area by providing computational models that attempt to predict the pandemic's next steps.

Ghaderzadeh and Asadi (2021) provide a systematic review of the current state of models for detecting and diagnosing COVID-19 through radiology modalities and their processing based on deep learning (DL). Alyasseri et al. (2022) provided a comprehensive review of the most recent DL and machine learning (ML) techniques for COVID-19 diagnosis. Syeda et al. (2021) conducted a systematic review of the literature on the role of artificial intelligence (AI) as a technology to fight the COVID-19 crisis in the fields of epidemiology, diagnosis, and disease progression.

1.3 Objective

Indeed, all previous systematic reviews cited in Sect. 1.2 are critical guides for helping the health area thrive during the pandemic. However, our systematic review presents an essential advantage: it helps health authorities save critical time and stay steps ahead of the disease. This systematic review aims to review ML model studies that track pandemic trends within an affected region and compare them to the actual trends in the same region. The goal is to answer the question: how effectively do ML models behave compared to the reality in the regions affected by the pandemic? The results obtained in this systematic review will serve as a reference for healthcare authorities to make fast decisions, position themselves steps ahead of the pandemic, mitigate deaths, and prevent overcrowded hospital admissions. The scientific contribution of this systematic review is to distill the lessons learned from the ML models so that they can be applied appropriately in future pandemic scenarios and succeed preventively in protecting population health.

The usage of ML models helps attenuate out-of-control pandemic scenarios once the prediction matches possible critical situations with a degree of certainty. To do that, we need to analyze the published works in which authors proposed ML prediction models and assess how closely the ML approached real scenarios in territories where COVID-19 caused damage.

To organize the results of this systematic review, we will use the Participants, Intervention, Comparison and Outcomes (PICO) approach (Moher et al. 2009; Liberati et al. 2009).

2 Materials and methods

We performed a systematic review of articles that analyze COVID-19 pandemic scenarios and the computational models used for intervention, prediction, and mitigation of the critical aspects of virus spreading, aiming to optimize the conduct of the pandemic phase by sanitary authorities.

The systematic review uses the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA 2020 statement) to ensure the necessary transparency, protocol application, and adherence to guidelines (Moher et al. 2009; Liberati et al. 2009; Page et al. 2021).

This section discusses the research design and the steps used to achieve the goal of this systematic review: the eligibility criteria, the information sources, the research questions, the study selection, the data collection, and the article selection process.

2.1 Eligibility criteria

We specified inclusion and exclusion criteria to synthesize and group the studies needed for this review.

The search took place in July 2022 to identify studies focusing on computational models used for intervention, prediction, and mitigation of SARS-CoV-2 pandemic scenarios published between January 2020 and July 2022.

We have defined and refined the search string using the PICO approach (Booth et al. 2019). PICO covers PRISMA checklist topics, such as objectives, search questions, and eligibility criteria, by clustering participants, interventions, comparisons, and outcomes to facilitate our searches.

  • Participants: We included randomized controlled trials of the COVID-19-positive population within specific territories. We considered individual mobility within geolocation as a factor of virus transmissibility.

  • Intervention: We considered all interventions identified by the search where a pre-defined algorithm or plan aims to predict the next steps within the pandemic scenario and how such methods can improve critical scenarios. We did not consider analyses and models that do not offer options to sanitary authorities.

  • Comparison: We assess the studies where ML prediction models for pandemic scenarios can help sanitary authorities map and plan necessary steps to control, mitigate, or prevent the virus from spreading.

  • Outcomes: To be included, a trial had to use ML or mathematical models (MM) that consider the COVID-19-positive population, the territory in scope, the mobility of individuals (whenever present), geolocation information (or the possibility of considering it), and the power of ML or MM to help sanitary authorities' management tackle the scenario and regain control over the situation (examples: prediction of future virus waves, preventive actions like lockdowns or restrictions to slow down the virus spreading, or which vaccines are more effective depending upon the virus variants detected in a studied territory). We excluded reports with analyses of ML models that do not aim to help sanitary authorities (examples: surveys about ML methods, or MM focused only on simulation, without a goal of effective actions toward the population).

We considered articles written in English only. We performed the search and retrieval in all databases in July 2022. We excluded theses, dissertations, opinions, criticism, protocols, books, posters, oral presentations, general surveys, and previous unrelated systematic and literature reviews.

We present the detailed inclusion and exclusion criteria established in this systematic review below:

  • Inclusion criteria: articles that use ML to attempt to identify or model critical pandemic regions undergoing significant virus spreading; articles considering geolocation studies and implementations within a pandemic territory study; articles using reinforcement learning as an alternative to help mitigate the pandemic; articles focusing on pandemic prediction or evolution within a specific territory using ML; articles considering machine or specific deep learning prediction approaches intended to help sanitary authorities; articles concerning SARS-CoV-2 variants and spreading trends over regions; articles considering population mobility between locations affected by SARS-CoV-2.

  • Exclusion criteria: non-ML models; articles involving COVID-19 drug matching or discovery; articles exploring ML restricted to medical standpoints only, like medical imaging, X-ray, or tomography to help identify the presence of COVID-19; informative medical articles with superficial, generic, or overview-level medical interventions or analyses against SARS-CoV-2; articles exploring socioeconomic aspects or blockchain methods during or after the pandemic; reports about the usage of wearable IoT devices to help detect COVID-19 presence or exposure; articles focusing on population behavioral issues, mental health changes, or education impacts on society during the pandemic; articles using algorithms to detect death tolls per se, without the objective of predicting virus spreading or improving pandemic scenarios; articles about hospital capacity matching or optimization through ML algorithms (except when the ML algorithms can also be used for epidemiological tracking); articles where bio-environmental aspects are involved as an effect of the pandemic, like air pollution, climate change, or city water quality; and general, direct, or indirect survey articles whose focus is to cite or compare ML techniques.

2.2 Information sources

On 31 July 2022, we searched each of the databases listed in Table 1. We adapted the base search string from Table 2 to ensure that each database would return records matching our systematic review objectives (see Appendix A for details). We describe the search strategy employed in this paper in Sect. 2.3. We queried the databases using the exact dates (01 January 2020 to 31 July 2022). However, some databases are limited to the publication year only; for these cases, we used 2020 to 2022 and retrieved the records available until 31 July 2022.

Table 1 All databases covered in this systematic review

2.3 Search strategy

To execute the search strategy, we needed to define general and specific research questions covering the applications and results of the computational models and techniques found during the systematic review.

Here are the General Questions (GQ) addressed by this study:

  • GQ1. What were the primary studies regarding ML COVID-19 models to track pandemics’ evolution within a territory?

  • GQ2. What are the challenges in algorithms and computational methods to help sanitary authorities control pandemic scenarios?

These are the Specific Questions (SQ) employed during this systematic review:

  • SQ1. How do studies map COVID-19-positive population within the territory through ML models?

  • SQ2. What studies consider SARS-CoV-2 virus variants tied to COVID-19-positive population within a specific territory?

  • SQ3. What studies help healthcare authorities predict real-time or soft real-time pandemics evolution via ML models?

  • SQ4. What studies focus on building geolocation maps based on the COVID-19-positive population?

We defined the following base search string based on general and specific questions to query all eligible and pertinent databases defined in Table 1.

Table 2 Base search string used during database record retrieval

We detail the individual search strings for each queried database in Appendix A.

The search strategy development process considered candidate search terms identified by examining words in the titles, abstracts, and subject indexing of relevant records. The database strategy uses the Cochrane RCT filter reported in the Cochrane Handbook (Lasserson et al. 2019; Thomas et al. 2019).

2.4 Selection process

We combined two approaches to ensure accuracy during the selection process: assessment of records by more than one reviewer (Gartlehner et al. 2020; Waffenschmidt et al. 2019; Wang et al. 2020) and crowdsourcing (Noel-Storr 2019; Nama et al. 2019). For crowdsourcing, we used the Mendeley Reference Manager (MRM) tool as a central repository of records for teamwork, through its Private Group feature.

We did not use any ML approach in the selection process. Three researchers divided the screening tasks across titles, abstracts, keywords, and full-text articles. In case of inconsistencies or disagreements at any of these stages, the three researchers discussed until they reached a consensus.

2.5 Data collection process

We followed meticulous steps for the retrieval, selection, inclusion, and exclusion of records during this systematic review. To achieve the result, we executed the tasks detailed in the workflow in Fig. 1. We used the following tools to assist the data collection:

  • Mendeley Reference Manager (MRM): the central repository of records collected through the search string from Table 2

  • Mendeley Desktop Tool (MDT): the tool that removes duplicated records (a deduplication sketch follows this list)

  • Mendeley Web Importer (MWI): the tool that imports records directly into MRM from the web search result pages of each database in Table 1
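
The Mendeley tools automate record deduplication; purely as an illustration of the underlying idea, the following Python sketch deduplicates bibliographic records by DOI or, failing that, by a normalized title. The record fields and helper names are our own hypothetical choices, not Mendeley internals.

    import re

    def normalize(record):
        """Build a deduplication key: prefer the DOI; otherwise a normalized title."""
        doi = (record.get("doi") or "").strip().lower()
        if doi:
            return ("doi", doi)
        # Lowercase the title and collapse punctuation and whitespace
        title = re.sub(r"[^a-z0-9]+", " ", record.get("title", "").lower()).strip()
        return ("title", title)

    def deduplicate(records):
        """Keep the first occurrence of each normalized key."""
        seen, unique = set(), []
        for rec in records:
            key = normalize(rec)
            if key not in seen:
                seen.add(key)
                unique.append(rec)
        return unique

    records = [
        {"title": "ML prediction of COVID-19 waves", "doi": "10.1000/xyz123"},
        {"title": "ML Prediction of COVID-19 Waves.", "doi": "10.1000/XYZ123"},
    ]
    assert len(deduplicate(records)) == 1  # the second record is a duplicate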

Fig. 1 Steps performed to execute the data collection process

2.6 Data items

We organized the included studies into two domains and built a taxonomy to facilitate the analyses in the results section, as shown in Fig. 2.

Fig. 2 Taxonomic analysis and definition of (a) Domain 1: COVID-19 pandemics' trending prediction and (b) Domain 2: COVID-19 geolocation tracking and prediction

All models and algorithms derived from ML techniques were eligible for inclusion. ML models focusing on predictions of pandemic spreading and geolocation had preference during the inclusion process.

We collected data on:

  • The report: author, year, and source of publication

  • The study: types of ML techniques used and sample applications

  • The participants: COVID-19-positive individuals and their geographical location, whenever available

  • The intervention: type, duration, timing, and mode of delivery of the employed ML model.

2.7 Risk of bias

We assessed the risk of bias (RoB) in the included studies by following the revised Cochrane risk-of-bias tool for randomized trials (RoB 2.0) (Higgins et al. 2019; Collaboration 2021). RoB 2.0 addresses five specific domains:

  1. Bias arising from the randomization process

  2. Bias due to deviations from intended interventions

  3. Bias due to missing outcome data

  4. Bias in measurement of the outcome

  5. Bias in selection of the reported result.

Three review authors independently applied the RoB 2.0 tool to each included study. They recorded supporting information and justifications for their risk-of-bias judgements in each domain (low, high, or some concerns).

We discussed and resolved all discrepancies in RoB judgements or their justifications to reach a consensus among the three review authors. Following the guidelines of the RoB 2.0 tool (Sect. 1.3.4) (Higgins et al. 2019), we derived an overall summary risk-of-bias judgement (low, high, or some concerns) for each specific outcome.

The review authors determined the overall RoB for each study as the highest RoB level in any of the assessed domains.
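
Operationally, this "worst domain wins" rule is a maximum over an ordered judgement scale. A minimal Python sketch of the rule as stated above (our own illustration; note that RoB 2.0 guidance additionally allows elevating several "some concerns" judgements to an overall "high"):

    # Ordered judgement scale: a higher rank means a higher risk of bias
    RANK = {"low": 0, "some concerns": 1, "high": 2}

    def overall_rob(domain_judgements):
        """Overall RoB = the worst judgement across the five RoB 2.0 domains."""
        return max(domain_judgements, key=lambda judgement: RANK[judgement])

    # Example: a single "some concerns" dominates four low-risk domains
    print(overall_rob(["low", "low", "some concerns", "low", "low"]))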

For all studies with missing outcomes, we proceeded with the instructions in Sects. 10.12.2 and 10.12.3 of the Cochrane Handbook for Systematic Reviews of Interventions (Higgins et al. 2019) and did not exclude the studies. Instead, we considered the effects of the missing outcomes during the risk-of-bias determinations.

2.8 Effect measures

Following the Cochrane Handbook for Systematic Reviews of Interventions (Higgins et al. 2019), we analyze the effect measures using the continuous data method when presenting the synthesis of the meta-analysis in this systematic review. We applied the effect size measures by first testing fixed-effect and random-effects models to determine the best statistical approach. We also adopted standardized mean differences (SMD), summarizing the effect size measures within a forest plot.
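
For context, the SMD for a single study divides the difference between the two group means by the pooled standard deviation; the Cochrane software reports the small-sample-corrected variant (Hedges' adjusted g). A minimal sketch following the Cochrane Handbook formulas, with hypothetical inputs:

    import math

    def smd_hedges_g(m1, sd1, n1, m2, sd2, n2):
        """Standardized mean difference (Hedges' adjusted g) and its standard error."""
        # Pooled standard deviation across the two groups
        sp = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
        d = (m1 - m2) / sp
        g = (1 - 3 / (4 * (n1 + n2) - 9)) * d          # small-sample correction
        se = math.sqrt((n1 + n2) / (n1 * n2) + g**2 / (2 * (n1 + n2 - 3.94)))
        return g, se

    # Hypothetical inputs: ML-predicted vs. actual cumulative case series
    g, se = smd_hedges_g(m1=1200, sd1=300, n1=90, m2=1100, sd2=280, n2=90)
    print(f"SMD = {g:.2f}, SE = {se:.2f}")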

Three review authors independently applied the Cochrane Review Manager 5.4.1 tool (Collaboration 2020) to assess each included study and determine the effect size measure for all interventions and the forest plot presentation. In case of discrepancies or divergent judgements, the three review authors resolved the conflict through discussion until consensus.

2.9 Synthesis methods

To identify the included studies that answer the research questions, we categorized the research questions into the following main ML categories for the employed interventions:

  1. COVID-19 pandemic trending prediction models

  2. COVID-19 geolocation tracking and prediction models.

This approach helped us build a corresponding taxonomy map to facilitate answering the research questions framework and to produce the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) summary of findings (Schunemann 2013).

Within the GRADE summary of findings, we could assess the risk of bias and the level of certainty within the 95% confidence interval of the grouped and pooled results.

We used the synthesis methods described in the Cochrane Handbook for Systematic Reviews of Interventions (Higgins et al. 2019). For this systematic review, we applied the following meta-analysis synthesis aspects regarding effect measures:

  1. We divided the meta-analysis into two comparison groups: actual COVID-19 case detection versus ML COVID-19 case prediction.

  2. We obtained study heterogeneity \(I^2=0\) in the random-effects analysis. Section 10.10.4.1 of the Cochrane Handbook for Systematic Reviews of Interventions (Higgins et al. 2019) states that one should never choose between a fixed-effect and a random-effects meta-analysis based on the statistical test for heterogeneity alone. However, we tested both fixed-effect and random-effects meta-analyses, and both approaches returned the same pooled results within the 95% confidence interval. These findings and their statistical significance led us to choose a fixed-effect meta-analysis approach.

  3. We used standardized mean difference analysis to move toward statistical homogeneity, because each study has a different number of observations over different periods.

  4. As part of the fixed-effect and standardized mean difference analysis, we applied the inverse variance statistical method with a 95% confidence interval (CI); a minimal sketch of this pooling follows the list.
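
A minimal sketch of the inverse-variance fixed-effect pooling described in items 2-4, taking per-study SMDs and standard errors as inputs (the values below are illustrative, not the reviewed data):

    import math

    def fixed_effect_pool(effects, ses):
        """Inverse-variance fixed-effect pooling of per-study effect sizes."""
        weights = [1 / se**2 for se in ses]             # inverse-variance weights
        pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
        se_pooled = math.sqrt(1 / sum(weights))
        ci = (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled)
        z = pooled / se_pooled                          # test of the overall effect
        return pooled, ci, z

    # Illustrative per-study SMDs and standard errors
    smds = [0.10, 0.25, 0.30, -0.05]
    ses = [0.12, 0.20, 0.15, 0.18]
    pooled, (low, high), z = fixed_effect_pool(smds, ses)
    print(f"SMD = {pooled:.2f}, 95% CI [{low:.2f}, {high:.2f}], Z = {z:.2f}")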

According to Sect. 10.10.2 of the Cochrane Handbook for Systematic Reviews of Interventions (Higgins et al. 2019), the simple usage of the \(\chi ^2\) test to determine heterogeneity has low power in the typical meta-analysis situation where studies have small sample sizes or are few in number. Instead, we adopted the \(I^2\) approach to measure heterogeneity with better precision.
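
Under the same illustrative inputs as in the pooling sketch above, Cochran's Q and the derived \(I^2\) statistic can be computed as follows; \(I^2\) expresses the percentage of variation across studies attributable to heterogeneity rather than chance, and it is truncated at zero when Q does not exceed its degrees of freedom:

    def i_squared(effects, ses):
        """Cochran's Q and the I^2 heterogeneity statistic (as a percentage)."""
        weights = [1 / se**2 for se in ses]
        pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
        q = sum(w * (e - pooled)**2 for w, e in zip(weights, effects))  # Cochran's Q
        df = len(effects) - 1                   # degrees of freedom
        i2 = max(0.0, 100 * (q - df) / q) if q > 0 else 0.0
        return q, i2

    # Same illustrative inputs as in the pooling sketch above
    q, i2 = i_squared([0.10, 0.25, 0.30, -0.05], [0.12, 0.20, 0.15, 0.18])
    print(f"Q = {q:.2f}, I^2 = {i2:.0f}%")     # prints I^2 = 0% for these inputs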

We prepared the synthesis results within a forest plot. The forest plot presents important synthesis parameters for the included studies, such as the p value and the \(I^2\) heterogeneity measurement. We applied the Cochrane Review Manager 5.4.1 tool (RevMan) to generate the funnel and forest plots. RevMan includes the RoB 1.0 assessment; however, we replaced it with the updated RoB 2.0 for the risk-of-bias analysis (Collaboration 2021).

We built a funnel plot to detect possible publication bias during the analysis of the results (Sects. 13.3.5.2 and 13.3.5.3 of the Cochrane Handbook for Systematic Reviews of Interventions (Higgins et al. 2019)). We also linked the included studies to the research questions framework (both general and specific questions). For all studies with missing outcomes, we proceeded with the instructions in Sects. 10.12.2 and 10.12.3 of the Cochrane Handbook (Higgins et al. 2019) and did not exclude the studies. Instead, the forest and funnel plots discarded the studies with missing numerical outcomes.
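
The review generated its plots with RevMan. Purely as an illustration of what a funnel plot encodes, the following sketch draws one from hypothetical per-study SMDs and standard errors, with pseudo 95% confidence limits at \(\pm 1.96\,SE\) around the fixed-effect pooled estimate:

    import numpy as np
    import matplotlib.pyplot as plt

    # Hypothetical per-study SMDs and standard errors
    smds = np.array([0.10, 0.25, 0.30, -0.05, 0.18, 0.45])
    ses = np.array([0.12, 0.20, 0.15, 0.18, 0.10, 0.22])

    # Fixed-effect pooled SMD (inverse-variance weights)
    weights = 1 / ses**2
    pooled = np.sum(weights * smds) / np.sum(weights)

    # Pseudo 95% confidence limits widen as the standard error grows
    se_grid = np.linspace(0.001, ses.max() * 1.1, 100)
    plt.plot(pooled - 1.96 * se_grid, se_grid, "k--")
    plt.plot(pooled + 1.96 * se_grid, se_grid, "k--")
    plt.axvline(pooled, color="k", linewidth=0.8)

    plt.scatter(smds, ses)
    plt.gca().invert_yaxis()             # larger studies (small SE) at the top
    plt.xlabel("SMD")
    plt.ylabel("Standard error (SE)")
    plt.title("Funnel plot")
    plt.show()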

Since the meta-analysis resulted in fixed effects and the heterogeneity coefficient is \(I^2=0\), we agreed that we did not need to perform further meta-regression, sensitivity, or other subgroup analyses, as they would not influence the final results of this systematic review.

3 Results

We now present the detailed results of applying the methods described in the previous sections. All results follow the PRISMA 2020 checklist (Higgins et al. 2019) and the PRISMA protocol (Moher et al. 2015).

3.1 PRISMA workflow

The search strings initially returned 7,031 records from the database searches (see Table 3). We organized the results by following the PRISMA workflow, as shown in Fig. 3.

After removing duplicated records, marking records as ineligible by automation, and excluding records for other reasons, we screened 4,819 records based on their titles, abstracts, and keywords and excluded 4,581 of them. Then, from the resulting 217 records sought for retrieval, we could not retrieve seven.

We then reviewed 210 full-text documents and arrived at 45 included reports, grouped into 30 unique studies. Finally, we performed meta-analyses of the results.

Table 3 All databases used for this systematic review
Fig. 3 PRISMA 2020 workflow

3.2 Taxonomy of the included studies

We obtained the 30 unique studies from the 45 reports by performing a detailed taxonomic analysis, dividing them into the two main clusters detailed in Sect. 2.9.

Figure 4 describes the entire taxonomic organization of all studies in this systematic review.

  1. Domain 1: COVID-19 pandemics' trending prediction: 38 studies

     (a) Stochastic, principal component analysis (PCA), and logistic models: 7 studies
     (b) Reinforcement learning (RL): 3 studies
     (c) Recurrent neural networks (RNN) with long short-term memory (LSTM) architecture: 7 studies
     (d) Time-series: 8 studies
     (e) Gated recurrent unit (GRU): 2 studies
     (f) Convolutional neural network (CNN) with logistic regression: 1 study
     (g) Multilayer perceptron (MLP) neural network: 2 studies
     (h) Linear regression: 4 studies
     (i) Ensemble classifiers: 4 studies.

  2. Domain 2: COVID-19 geolocation tracking and prediction: 7 studies

     (a) Ensemble regression learning: 1 study
     (b) RL: 2 studies
     (c) Fuzzy analytic hierarchy process (AHP) with fuzzy technique for order of preference by similarity to ideal solution (TOPSIS): 1 study
     (d) AHP: 1 study
     (e) Random forest: 2 studies.

Fig. 4 Taxonomy of the studies included in this systematic review

3.3 Answers to the research questions

The included studies addressed all research questions proposed by this paper, with different densities. We show the relationship between the included studies and the addressed questions in Table 4 and discuss them later in this section.

Table 4 Answers to research questions

We have the following density of research questions answered:

  • GQ1: 29 studies (96%)

  • GQ2: 20 studies (69%)

  • SQ1: 23 studies (84%)

  • SQ2: 1 study (2%)

  • SQ3: 9 studies (53%)

  • SQ4: 9 studies (20%).

Notice that SQ2 returned only one study (Rashed and Hirata 2021) from the search strategy executed between January 2020 and July 2022. One possible reason is that the impact of SARS-CoV-2 virus variants only began to be acknowledged around the third quarter of 2020, which would explain the lack of ML models focusing on virus variants. However, variant prediction and spreading must be considered essential variables when building epidemiological prediction models.

Some included studies had quick access to real-time confirmed COVID-19 case data provided by their national governments or reliable institutions, facilitating faster predictive modeling. For instance, studies from India frequently cited the same data provided by government authorities, while USA studies relied on the Johns Hopkins COVID-19 Map. Other studies frequently used official data from World Health Organization (WHO) databases, which may depend upon countries' slower case notifications. This scenario may help some countries step ahead in producing more studies about prediction models, while others do not have the same advantage.

3.4 Risk-of-bias results

We assessed the risk of bias using the RoB 2.0 tool (Higgins et al. 2019; Collaboration 2021) (see Sect. 2.7 for the domains assessed in this systematic review).

In terms of the overall risk of bias, the following studies presented high risk (5 of 45 included studies) (Kuo and Fu 2021; Pan et al. 2021; Pang et al. 2021; Shoaib et al. 2021; Zivkovic et al. 2021).

The following studies presented some level of concern (13 of 45 included studies) (Abdallah et al. 2022; Alamrouni et al. 2022; Bi et al. 2022; Chyon et al. 2022; Majhi et al. 2021; Senthilkumar et al. 2021; Nobi et al. 2021; Ohi et al. 2020; Ozik et al. 2021; Raheja et al. 2021; Sah et al. 2022; Ullah et al. 2022; Vasconcelos et al. 2021).

We present the risk of bias for each study in Fig. 5. We used the RobVis R tool to generate the risk of bias reports (McGuinness 2019).

We applied the RoB 2.0 tool over the two branches of our taxonomic analysis; that is, we considered the risk-of-bias analysis within the context of each taxonomic branch.

The studies from Kou et al. (2021) and Alanazi et al. (2020) show zero observations in the forest plot. However, they present outcome data according to their study contexts, which is why we rated their risk of bias as low. The study from Haghighat (2021) shows zero observations in the forest plot because it focuses on three categories only: hospitalized, deceased, and discharged COVID-19-positive patients; the total number of cases is not the simple sum of these classes. However, its risk of bias is low, because the study presents the outcomes as expected (Fig. 6).

Fig. 5 Risk of bias from RoB 2.0 tool for each included study

Fig. 6 Risk-of-bias summary from RoB 2.0 tool

3.5 Overall effect size results of the individual studies

We present the forest plot in Fig. 7, including the meta-analysis results for each study included in this systematic review.

We need the forest plot analysis to determine how accurately the ML prediction models represent actual cumulative COVID-19 cases. The more accurate the models are, the less risky their adoption is for public health authorities.

Fig. 7 Forest plot for effect sizes. The figure displays the meta-analysis summary statistics [standardized mean difference (SMD), standard deviation (SD), sample size, and weight proportion] for the comparison between cases predicted by the ML models vs. actual COVID-19 cases. It also displays the mean difference and its 95% confidence interval (CI) for the continuous outcome

The usual goal of a forest plot is to help physicians determine the effectiveness of a drug or treatment within a population. Physicians are most interested in studies that sit far from the line of no effect and do not cross the extension of the diamond within the forest plot; anything different from this pattern means no statistical significance for physicians.

In our systematic review, we explore the exact opposite effect and interpretation. Studies closer to the line of no effect simulate real COVID-19 case scenarios more accurately, and the same holds for studies crossing the forest plot diamond. That is the statistical significance we aim for in this systematic review.

On the other hand, if the forest plot shows the actual case data as strongly favored, it may indicate a risk of bias from missing or unclear outcomes in a study. We analyze this scenario later in this review.

From Fig. 7, three case prediction studies lie far from the line of no effect: Basu and Campbell (2020), Casini and Roccetti (2020), and Kolozsvári et al. (2021). On the actual cases' side of the forest plot, only Namasudra et al. (2021) is somewhat distant from the line of no effect.

3.6 Results of syntheses

The forest plot presented heterogeneity \(I^2=0\) for both fixed-effect and random-effects analyses. We chose the fixed-effect analysis for the reasons explained in Sect. 2.9.

Another critical aspect to observe in the forest plot is the summary of results: SMD = 0.18, with 95% CI [0.01, 0.35]. Since the interval does not cross the line of no effect, the difference found between the two groups is statistically significant. As we used the SMD calculation, the overall effect returns the Z statistic and its corresponding p value: Z = 2.02 and p = 0.04. Furthermore, since \(p \le 0.05\), the summary result is inside the statistical significance threshold established by medical and psychological research (Sect. 15.3.2 of the Cochrane Handbook (Higgins et al. 2019)). In this specific case, the overall effect means that the included ML prediction studies are significantly valid in simulating the trend of the real cumulative COVID-19 infection cases.
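
As a consistency check (our own arithmetic from the reported summary values), the standard error can be recovered from the CI width, and Z and p follow from it; the small discrepancy from the reported Z = 2.02 comes from rounding of the displayed SMD and CI limits:

\[
\mathrm{SE} \approx \frac{0.35 - 0.01}{2 \times 1.96} \approx 0.087, \qquad
Z = \frac{\mathrm{SMD}}{\mathrm{SE}} \approx \frac{0.18}{0.087} \approx 2.07, \qquad
p = 2\,\bigl(1 - \Phi(|Z|)\bigr) \approx 0.04
\]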

The forest plot (Fig. 7) presents zero observations for most geolocation studies, because the plot's synthesis focuses on pandemic prediction simulation, not geolocation. However, RevMan 5.4.1 discards the zero observations, so they do not influence the results of the forest plot statistical analyses. The zero observations are reflected in the risk of bias within domain D3 (bias due to missing outcome data), which we indicate with a No information statement for missing outcomes.

3.7 Risk of bias due to missing results

Based on the forest plot in Fig. 7, we flagged the following studies with No information in domain D3 of the risk-of-bias analysis. The referred studies do not match the synthesis proposed in the forest plot (predicted vs. the actual number of cumulative COVID-19-positive cases) (Abdallah et al. 2022; Alali et al. 2022; Alamrouni et al. 2022; Alanazi et al. 2020; Bushira and Ongala 2021; Bi et al. 2022; Chandra et al. 2022; Chyon et al. 2022; de Araújo Morais and da Silva Gomes 2022; Haghighat 2021; Kou et al. 2021; Kuo and Fu 2021; Malakar 2021; Mallick et al. 2022; Nobi et al. 2021; Ohi et al. 2020; Ozik et al. 2021; Pan et al. 2021; Pang et al. 2021; Rashed and Hirata 2021; Sah et al. 2022; Shoaib et al. 2021; Ullah et al. 2022).

All the above studies present zero observations, which makes SMD calculations impossible. However, they do not alter the overall pooled effects calculated within the forest plot. For the reasons already detailed in Sect. 2.7, we did not exclude these studies, as they provide essential methodologies for this systematic review.

3.8 Risk of publication bias

We then grouped all studies that produced results to generate the funnel plot (Fig. 8) and analyzed aspects related to publication bias.

Fig. 8 Funnel plot based on non-zero studies from the forest plot (Fig. 7). The x-axis shows the SMD, while the y-axis shows the standard error (SE)

The funnel plot places most of the studies close to the funnel's center, with only one study (Basu and Campbell 2020) falling outside the 95% CI as an outlier. This aspect of the funnel plot indicates a low probability of publication bias. In other words, the ML prediction studies can simulate real-case scenarios with a high degree of certainty, reinforcing the p = 0.04 calculated in the forest plot.

The GRADE summary of findings also accounts for publication bias when presenting the pooled study results for the certainty of evidence.

3.9 Certainty of evidence

We present the certainty-of-evidence assessment through the GRADE summary of findings tables (Schunemann 2013), divided into the two domains defined in Sect. 2.6. The GRADE summary of findings tables were created and assessed based on the taxonomy defined in Sect. 3.2.

Fig. 9 Question for domain 1 GRADE assessment: ML prediction compared to real cumulative cases for tracking COVID-19 pandemics' future trending. Explanations: (a) Mohan et al. 2021 study should compare real and predicted cases; (b) Raheja et al. 2021 study presents an abnormal standard error and increases the risk of publication bias; (c) Marzouk et al. 2021 study should compare actual and predicted cases; (d) Basu et al. 2020 appear as an outlier in the funnel plot, indicating possible imprecision; (e) Vasconcelos et al. 2021 bring results, but with a low graph resolution, which may cause imprecision; (f) Pang et al. 2021 study should bring the predicted vs. real cases, as this is the goal of the study; (g) Shoaib et al. 2021 bring no outcomes; (h) Haghighat 2021 brings no outcomes; (i) Pan et al. 2020 graphs with predicted vs. real cases have low resolution, affecting the exact values from the study; (j) Ohi et al. 2020 and Kou et al. 2021 do not bring results about predicted vs. real cumulative cases; (k, l) Sah 2022 and Bi 2022 should bring the predicted vs. real cases, as this is the goal of the study; there is only continuity from actual to predicted cases (Senthilkumar et al. 2021; Raheja et al. 2021; Marzouk et al. 2021; Basu and Campbell 2020; Vasconcelos et al. 2021; Pang et al. 2021; Shoaib et al. 2021; Haghighat 2021; Pan et al. 2021; Ohi et al. 2020; Kou et al. 2021; Sah et al. 2022; Bi et al. 2022)

Fig. 10 Question for domain 2 GRADE assessment: ML prediction compared to real geographic location for tracking COVID-19 pandemics' critical geolocations: (a) Ozik et al. 2021 have no outcomes for cumulative COVID-19 case prediction, and Giacopelli 2021 brings very few observations; (b) Malakar et al. 2021 bring imprecise outcomes for cumulative COVID-19 cases during modeling; (c) Bushira et al. 2021 have no outcomes for cumulative COVID-19 case prediction; (d) Kuo et al. 2021 have no outcomes for cumulative COVID-19 case prediction (Ozik et al. 2021; Giacopelli 2021; Malakar 2021; Bushira and Ongala 2021; Kuo and Fu 2021)

A summary of findings for domains 1 and 2 revealed the following pooled studies and corresponding levels of certainty. Further details are available in GRADE summary of findings tables in Figs. 9 and 10.

  • Domain 1: COVID-19 pandemics’ trending prediction

    • Moderate level of certainty: Linear regression, LSTM RNN, stochastic and PCA logistic models, and ensemble classifiers’ family

    • Low level of certainty: CNN with logistic regression, time-series neural network, MLP, and GRU

    • Very low level of certainty: Reinforcement learning.

  • Domain 2: COVID-19 geolocation tracking and prediction

    • High level of certainty: Ensemble regression learning

    • Low level of certainty: Fuzzy AHP and TOPSIS, AHP, and random forest

    • Very low level of certainty: Reinforcement learning.

3.10 Limitations

The first limitation of this systematic review is that each study had access to heterogeneous databases, with no common databases over which we could promote study crossovers. The forest plot (Fig. 7) statistically normalizes the data to calculate a common SMD. However, we consider that crossing studies over common databases would add significance to this systematic review.

Some studies provided outcomes as low-resolution graphics, which made the task of numerical reading more difficult. In these cases, authors should provide supplementary material with the numerical data from which they built the outcome graphics, or a high-resolution version of the graphics. We needed to read the data very closely to avoid bias and extract the best approximate numerical values.

As we mentioned in the risk-of-bias Sect. 3.7, some studies brought innovative ML methods to address COVID-19 prediction but presented no numerical or graphical outcomes of the prediction results. Although Sect. 10.12.2 of the Cochrane Handbook for Systematic Reviews of Interventions (Higgins et al. 2019) allows filling in replacement data when numerical outcomes are missing, we did not adopt this approach, to avoid introducing bias into this systematic review. Instead, we decided to rely on the graphics data even when they were not accurate. To preserve the original level of certainty, we considered either no data or approximated data (due to the low-resolution graphics in some studies).

4 Conclusion

This article presented a systematic review highlighting the contribution of applied computing to the health area. There is a significant effort to model, predict, and mitigate the COVID-19 pandemic across the population. We conducted the systematic review based on the PRISMA protocol, fully reviewing 45 reports out of the 7,031 records initially identified. We could then answer the general and specific questions proposed in this systematic review based on the PICO approach.

Compared to the other systematic reviews cited in this study (Ghaderzadeh and Asadi 2021; Alyasseri et al. 2022; Syeda et al. 2021), our systematic review has the advantage of providing a straightforward scientific contribution: joining all ML pandemic evolution models into a single study and comparing which model can be the best fit depending upon the criticality of the affected region.

Using the Cochrane methodology and the GRADE assessment for the summary of findings tables allowed us to dive deep into the studies and determine the significance of their scientific contribution, which can act as a speedy response to minimize the health disasters caused by the COVID-19 pandemic.

We observed essential ML approaches to predict the next steps of the pandemic regarding COVID-19-positive trends and geolocation. By conducting risk-of-bias assessment, certainty-of-evidence grading, the summary of findings, and statistical analysis via the forest and funnel plot assessments for the 45 reports, we could determine the statistical significance of these studies in simulating the progress of the pandemic. Despite some study limitations found during this systematic review, our final results corroborate the possibility of using ML prediction to serve healthcare authorities in decision-making and preventive actions toward saving lives. ML and healthcare can offer valuable options to respond to and combat current and future pandemics.

Healthcare authorities can take immediate advantage of the studies included in this systematic review and implement any of the corresponding ML models as ML pipeline graphs or Kubernetes container resources already present in the cloud hyperscalers (like Google Cloud Platform, SAP Data Intelligence Cloud, Microsoft Azure, or Amazon Web Services). The containers provide endpoints where users can enter big data for training and testing. Then, the trained ML container seamlessly becomes a production microservice available to final users (a minimal sketch of this pattern follows).
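
As a minimal sketch of this pattern (our own illustration, not any specific vendor's API), a trained model can be wrapped in a small Flask service whose prediction endpoint a container platform such as Kubernetes exposes as a microservice; the model file name and endpoint path are hypothetical:

    from flask import Flask, request, jsonify
    import pickle

    app = Flask(__name__)

    # Hypothetical: a previously trained case-prediction model serialized to disk
    with open("covid_trend_model.pkl", "rb") as f:
        model = pickle.load(f)

    @app.route("/predict", methods=["POST"])
    def predict():
        """Accept feature rows as JSON and return the predicted case counts."""
        features = request.get_json()["features"]
        return jsonify({"predicted_cases": model.predict(features).tolist()})

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=8080)    # the container exposes this port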

5 Registration and protocol

This systematic review has not yet been registered in the PROSPERO database.

The citation for this systematic review and meta-analysis protocol is presented in Moher et al. (2015). We explain the details of differences in the resulting statistical significance throughout this systematic review, but they do not cause any deviation from the original PRISMA-P protocol.