1 Introduction

In December 2019, the Chinese government informed the rest of the world that a novel coronavirus, Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), the cause of coronavirus disease 2019 (COVID-19), was spreading rapidly throughout China; it quickly reached many other countries. The United States Centers for Disease Control and Prevention (CDC) recognized a seafood market in Wuhan as the center of the outbreak. On January 13, 2020, the World Health Organization (WHO) reported a case in Thailand, the first to be identified outside China. On January 16, Japan confirmed its first case, and on January 20, South Korea reported its first confirmed case. Most countries in the world have since been affected by the virus.

Putra and Khozin Mu’tamar [1] used the Particle Swarm Optimization (PSO) algorithm to estimate the parameters of the Susceptible, Infected, Recovered (SIR) model. Their results indicate that the suggested method is precise and has a sufficiently low error compared with other analytical methods. Mbuvha and Marwala [2] calibrated the SIR model to South Africa’s reported cases, considering different scenarios of the reproduction number (R0) for reporting infections and estimating healthcare resources. Qi and Xiao [3] suggested that both daily temperature and relative humidity can influence the occurrence of COVID-19 in Hubei and other provinces.

Salgotra and Gandomi [4] developed two COVID-19 prediction models based on genetic programming and applied them to India. Their findings indicate that genetic evolutionary programming models are highly reliable for forecasting COVID-19 cases in India.

The rest of the paper is organized as follows. Sections 2 and 3 present the search method procedure and other reviews, respectively. Section 4 shows the main research fields. Generic illustrations are provided in Sect. 5. Mathematical modeling and criteria evaluation are presented in Sects. 6 and 7. Solution approaches, including autoregressive models, exponential models, deep learning, and regression methods, are described in Sect. 8. Section 9 discusses the strengths and weaknesses of the various forecasting models. Finally, the conclusion, discussion of results, and future directions are presented in Sect. 10.

2 Search method procedure

The process used to identify the articles for this study’s review is presented in this section.

2.1 Search method

Web of Science (WOS) and Scopus were used to find related publications based on the following keywords: forecasting, prediction, COVID-19, and coronavirus. The classification of the chosen publications by subject area is displayed in Fig. 1. Articles published from the beginning of 2020 to the time of writing were filtered from Scopus using the Boolean operator OR, for both topics and titles. We selected 920 publications available as of October 10, 2020 (Fig. 2), comprising technical research articles with algorithmic descriptions, review articles, conference papers, and case studies, as well as works that provide managerial insights. In addition, this study focuses more on papers indexed by the Web of Science.

Fig. 1 Classification of scientific papers based on subject area

Fig. 2 Research methodology used in this paper

3 Other reviews

Mahalle and Kalamkar [5] categorized forecasting models into mathematical models and machine learning techniques, using WHO and social media communications as datasets. Significant parameters, including death count, meteorological parameters, quarantine period, medical resources, and mobility, were also studied [5].

Naudé [6] reviewed the contribution of artificial intelligence (AI) to the fight against COVID-19. The areas in which AI has contributed were identified as early warnings and alerts, tracking and prediction, data dashboards, diagnosis and prognosis, treatments and cures, and social control [6].

4 Main research fields

Keywords are critical in identifying the appropriate literature in a research field [7]. As specified by [8]: “keywords represent the core research of a paper.” A keyword network offers a picture of a knowledge area that provides insight into the available subjects and how these topics are related and organized [9]. Therefore, the VOSviewer 1.6.11 software was applied to build a keyword co-occurrence network from bibliographic data derived from Scopus. Author keywords were used to generate the network. A total of 1931 keywords were obtained from the dataset using full counting. Table 1 presents the parameter settings for keyword visualization.

Table 1 Parameter settings

The resulting network contains 500 nodes and 4000 links, as shown in Fig. 3, which also presents the main fields for forecasting coronavirus. Stronger links in the network visualization are indicated by thicker lines [10]. Figure 3 shows that Coronavirus, prediction, epidemic, human, and forecasting are linked to one another. Moreover, Coronavirus, prediction, epidemic, human, statistical analysis, quarantine, hospitalization, mortality, and weather are among the top keywords on which researchers focused. In Fig. 3, each cluster is indicated by a color, and a bigger circle represents a more frequently used keyword.

Fig. 3 Networks across the links (keywords analysis)

Figure 4 presents the detailed analysis of the sum of works cited and the number of records versus affiliations. The filtered numbers of records and works cited include a minimum of 1 and 18, respectively.

Fig. 4 A detailed analysis (sum of works cited and number of records vs. affiliations)

5 Generic illustrations

Several epidemic models have been used by researchers to estimate the outbreak in the short and long term [11,12,13,14]. The most widely applied epidemic models are the susceptible, infected, and recovered (SIR) model and the susceptible, exposed, infected, and recovered (SEIR) model. The SIR model [15, 16] is illustrated in Fig. 5.

Fig. 5 Susceptible, infected, and recovered (SIR) model

In terms of mathematical modeling, the SIR model is shown below [17]:

$$\frac{{{\text{d}}S}}{{{\text{d}}t}} = - \beta IS$$
(1)
$$\frac{{{\text{d}}I}}{{{\text{d}}t}} = \beta IS - \gamma I$$
(2)
$$\frac{{{\text{d}}R}}{{{\text{d}}t}} = \gamma I$$
(3)

where S is the number of susceptible individuals at time t; I is the number of infected individuals at time t; R is the number of recovered individuals at time t; and \(\beta\) and \(\gamma\) are the transmission rate and the rate of recovery (removal), respectively. The SEIR model [18] is similar to the SIR model except that a compartment E is added for exposed individuals, who have been infected but are still in the latent period and do not yet show symptoms. The SEIR diagram is presented in Fig. 6.
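Returning to the SIR model, Eqs. 1–3 can be integrated numerically to produce an epidemic curve. The following is a minimal sketch only, not the method of any cited study: it assumes that S, I, and R are expressed as fractions of the population, uses a plain forward-Euler scheme, and the function name `simulate_sir` and all parameter values are hypothetical choices made for illustration.

```python
def simulate_sir(beta, gamma, s_init, i_init, r_init, days, steps_per_day=100):
    """Integrate Eqs. 1-3 with forward Euler; returns one (S, I, R) tuple per day.

    Assumption: S, I, R are population fractions, so beta * I * S needs no 1/N factor.
    """
    h = 1.0 / steps_per_day          # Euler step size in days
    s, i, r = s_init, i_init, r_init
    trajectory = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * i * s   # flow S -> I (Eqs. 1 and 2)
            recoveries = gamma * i          # flow I -> R (Eqs. 2 and 3)
            s -= h * new_infections
            i += h * (new_infections - recoveries)
            r += h * recoveries
        trajectory.append((s, i, r))
    return trajectory


if __name__ == "__main__":
    # Illustrative values only (not estimated from data).
    curve = simulate_sir(beta=0.3, gamma=0.1, s_init=0.99, i_init=0.01,
                         r_init=0.0, days=100)
    peak_day = max(range(len(curve)), key=lambda d: curve[d][1])
    print("peak infected fraction on day", peak_day)
```

Because the three right-hand sides sum to zero, S + I + R stays constant along the trajectory, which gives a quick sanity check on any implementation.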

Fig. 6 The susceptible, exposed, infected, and recovered (SEIR) diagram [18]

The equations of the SEIR model are defined below:

$$\frac{{{\text{d}}S(t)}}{{{\text{d}}t}} = - \beta \frac{S(t)I(t)}{N} - \alpha S(t)$$
(4)
$$\frac{{{\text{d}}E(t)}}{{{\text{d}}t}} = \beta \frac{S(t)I(t)}{N} - \gamma E(t)$$
(5)
$$\frac{{{\text{d}}I(t)}}{{{\text{d}}t}} = \gamma E(t) - \delta I(t)$$
(6)
$$\frac{{{\text{d}}Q(t)}}{{{\text{d}}t}} = \delta I(t) - \lambda (t)Q(t) - \kappa (t)Q(t)$$
(7)
$$\frac{{{\text{d}}R(t)}}{{{\text{d}}t}} = \lambda (t)Q(t)$$
(8)
$$\frac{{{\text{d}}D(t)}}{{{\text{d}}t}} = \kappa (t)Q(t)$$
(9)
$$\frac{{{\text{d}}P(t)}}{{{\text{d}}t}} = \alpha S(t)$$
(10)

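where S(t), E(t), I(t), Q(t), R(t), D(t), and P(t) are the susceptible, exposed, infected, quarantined, recovered, deceased, and protected populations at time t; N is the total population; \(\alpha\) is the protection rate; \(\beta\) is the infection rate; \(\gamma\) is the inverse of the average latent time; \(\delta\) is the inverse of the average quarantine time; \(\lambda (t)\) is the time-dependent cure rate, parameterized by the coefficients \(\lambda_{0}\) and \(\lambda_{1}\); and \(\kappa (t)\) is the time-dependent mortality rate, parameterized by the coefficients \(\kappa_{0}\) and \(\kappa_{1}\) [18].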
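Equations 4–10 can be simulated in the same way as the SIR model. The sketch below is illustrative only. Since the functional forms of \(\lambda (t)\) and \(\kappa (t)\) are not given here, it assumes constant cure and mortality rates (`lam` and `kappa`), takes N as the sum of all seven compartments, and uses forward Euler; the function name and all numerical values are hypothetical.

```python
def simulate_generalized_seir(state, alpha, beta, gamma, delta, lam, kappa,
                              days, steps_per_day=100):
    """Forward-Euler integration of Eqs. 4-10.

    state: initial (S, E, I, Q, R, D, P).
    Assumption: lam and kappa are constants standing in for lambda(t), kappa(t).
    Returns one (S, E, I, Q, R, D, P) tuple per day.
    """
    s, e, i, q, r, d, p = state
    n = sum(state)                    # total population N (assumed constant)
    h = 1.0 / steps_per_day
    trajectory = [tuple(state)]
    for _ in range(days):
        for _ in range(steps_per_day):
            infection = beta * s * i / n    # S -> E
            protection = alpha * s          # S -> P
            onset = gamma * e               # E -> I
            quarantine = delta * i          # I -> Q
            cure = lam * q                  # Q -> R
            death = kappa * q               # Q -> D
            s += h * (-infection - protection)      # Eq. 4
            e += h * (infection - onset)            # Eq. 5
            i += h * (onset - quarantine)           # Eq. 6
            q += h * (quarantine - cure - death)    # Eq. 7
            r += h * cure                           # Eq. 8
            d += h * death                          # Eq. 9
            p += h * protection                     # Eq. 10
        trajectory.append((s, e, i, q, r, d, p))
    return trajectory


if __name__ == "__main__":
    # Illustrative values only (not fitted to any dataset).
    start = (9990.0, 0.0, 10.0, 0.0, 0.0, 0.0, 0.0)
    out = simulate_generalized_seir(start, alpha=0.01, beta=0.5, gamma=0.2,
                                    delta=0.1, lam=0.05, kappa=0.01, days=60)
    print("deceased after 60 days:", round(out[-1][5], 1))
```

Replacing the constants `lam` and `kappa` with functions of time would recover the time-dependent rates described above, once their functional forms are specified.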

6 Mathematical modeling

Ahmar and del Val [19] used the SutteARIMA method to produce short-term forecasts of confirmed COVID-19 cases and of the Spanish stock market index (IBEX 35). Based on mean absolute percentage error (MAPE) values, SutteARIMA was found to be more suitable than the AutoRegressive Integrated Moving Average (ARIMA) model for forecasting daily confirmed cases in Spain. Al-qaness [20] proposed an improved Adaptive Neuro-Fuzzy Inference System (ANFIS) for forecasting the number of confirmed COVID-19 cases in China, in which the ANFIS parameters are determined by a hybrid of the Flower Pollination Algorithm (FPA) and the Salp Swarm Algorithm (SSA). The performance of the hybrid was validated by comparing it with existing modified ANFIS models based on Particle Swarm Optimization (PSO), the genetic algorithm (GA), the artificial bee colony (ABC) algorithm, and FPA alone. Anastassopoulou and Russo [21] proposed a method for estimating the reproduction number (R0) and other key parameters from the susceptible, infected, recovered, and deceased (SIRD) model in order to forecast the spread of the COVID-19 epidemic in China. Chakraborty and Ghosh [22] presented a real-time forecast of confirmed COVID-19 cases for multiple countries, as well as a risk assessment of COVID-19 for some profoundly affected countries using the regression tree algorithm. A simple moving average approach was used in [23] to predict confirmed COVID-19 cases in Pakistan. The authors of [24] used a five-parameter logistic growth model to reconstruct and forecast the COVID-19 epidemic in the USA; however, they noted that the accuracy of their model depends on federal- and state-level policy decisions. Cheng and Burcu [12] introduced a platform, icumonitoring.ch, that provides hospital-level projections of intensive care unit (ICU) occupancy based on SEIR models. The platform, which is updated every 3–4 days, could help ICU managers to estimate the need for additional resources. Chimmula and Zhang [25] applied long short-term memory (LSTM) networks, a deep learning technique, to predict COVID-19 outbreaks in Canada. Their approach identified the key features for estimating the trends of the pandemic in Canada. A simple ARIMA model was proposed in [26] to estimate registered and recovered cases after a lockdown in Italy.
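The MAPE criterion mentioned above for comparing SutteARIMA with ARIMA is straightforward to compute. The sketch below uses the standard definition of MAPE, the mean of |actual − forecast| / |actual| expressed as a percentage; the function name `mape` and the sample numbers are hypothetical and are not taken from any of the reviewed studies.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent.

    Raises ValueError if the inputs are empty, differ in length,
    or contain an actual value of zero (MAPE is undefined there).
    """
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("actual and forecast must be non-empty and equal in length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


if __name__ == "__main__":
    # Made-up daily case counts and forecasts, for illustration only.
    observed = [100, 200, 400]
    predicted = [110, 180, 400]
    print(f"MAPE = {mape(observed, predicted):.2f}%")
```

A lower MAPE indicates a closer fit, but the measure is undefined for days with zero reported cases, which matters early in an outbreak.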

Salgotra and Gandomi [4] established two COVID-19 prediction models for India based on genetic programming. Their results indicate that genetic evolutionary programming models are highly reliable for forecasting COVID-19 cases in India. Dil and Dil [11] used the SIR model to forecast confirmed COVID-19 cases in the Eastern Mediterranean region, namely Iran, Iraq, Saudi Arabia, United Arab Emirates, Lebanon, Egypt, and Pakistan, with a special focus on Pakistan. A simple SIRD model was proposed in [14] to predict COVID-19 outbreaks in China, Italy, and France and to estimate healthcare facility needs, such as ventilation units.

7 Criteria evaluation

The criteria that have attracted the most interest from scholars are forecasts of confirmed cases, risk assessment, the stock market, ICU beds, and registered and recovered cases.

8 Solution approaches

Researchers have proposed several approaches to predict the COVID-19 outbreak [27, 28]. Table 2 presents the solution approaches proposed for forecasting COVID-19, among which SIR, SEIR, SIRD, and the moving average are the most popular. In addition, some researchers [29, 30] preferred hybrid algorithms to enhance the power of forecasting algorithms.

Table 2 Proposed solution approaches for forecasting coronavirus 2019 (COVID-19)

8.1 Autoregressive model

The autoregressive time-series model is known as a useful tool to model dependent data and has been applied to various real-world problems [49–53].

8.1.1 Moving average

In statistics and economics, a moving average is a way to calculate and analyze data by providing a series of averages of various subsets of the dataset [54].

8.1.1.1 Simple moving average

A simple moving average (SMA) is defined as the unweighted mean of the previous data points (in finance) or of an equal number of data points on either side of a central value (in science or engineering) [54]. An example of the application of a simple moving average to COVID-19 can be found in [23].
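The trailing form of the definition above (the unweighted mean of the previous data points) can be sketched as follows. This is an illustrative implementation only; the function name `simple_moving_average`, the window length, and the sample series are hypothetical and do not reproduce the setup used in [23].

```python
def simple_moving_average(values, window):
    """Trailing SMA: the k-th output is the mean of values[k : k + window]."""
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    running = sum(values[:window])        # sum of the first full window
    averages = [running / window]
    for k in range(window, len(values)):
        # Slide the window: add the newest point, drop the oldest.
        running += values[k] - values[k - window]
        averages.append(running / window)
    return averages


if __name__ == "__main__":
    # Made-up daily counts, for illustration only.
    daily_cases = [1, 2, 3, 4, 5]
    smoothed = simple_moving_average(daily_cases, window=3)
    print(smoothed)  # [2.0, 3.0, 4.0]
```

Under this scheme the last average can serve as a naive one-step-ahead forecast; a longer window smooths more but reacts more slowly to changes in trend.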

8.1.1.2 Autoregressive integrated moving average (ARIMA)

An autoregressive integrated moving average model is a generalized form of the autoregressive moving average model. Because ARIMA is well established in forecasting, some researchers have used it to predict the spread of the new pandemic [31, 55–58].
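To make the three parts of the name concrete, the sketch below fits the simplest non-trivial case, ARIMA(1,1,0): the series is differenced once (the "integrated" part) and a first-order autoregression with intercept is fitted to the differences. It is a simplified illustration under stated assumptions: it uses conditional least squares rather than the maximum-likelihood estimation found in statistical packages, it has no moving average terms, and the function names and sample series are hypothetical.

```python
def fit_arima_110(series):
    """Fit d[t+1] = c + phi * d[t] to the first differences d by least squares."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    x, y = diffs[:-1], diffs[1:]
    if len(x) < 2:
        raise ValueError("need at least four observations")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("differences are constant; phi is not identifiable")
    phi = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx
    c = mean_y - phi * mean_x
    return c, phi


def forecast_arima_110(series, c, phi, horizon):
    """Iterate the fitted recursion and undo the differencing."""
    level = series[-1]
    last_diff = series[-1] - series[-2]
    forecasts = []
    for _ in range(horizon):
        last_diff = c + phi * last_diff   # predicted next increment
        level += last_diff                # integrate back to the original scale
        forecasts.append(level)
    return forecasts


if __name__ == "__main__":
    # Made-up cumulative counts, for illustration only.
    cumulative = [0, 10, 16, 20, 23, 25.5, 27.75]
    c, phi = fit_arima_110(cumulative)
    print(forecast_arima_110(cumulative, c, phi, horizon=3))
```

For cumulative case counts, the differenced series corresponds to daily new cases, so the autoregression is fitted to the daily increments rather than to the running total.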

8.1.2 Two-piece distributions based on the scale

Maleki et al. [35] proposed an autoregressive time-series model based on the two-piece scale mixture normal distribution to predict confirmed and recovered COVID-19 cases. The proposed model outperforms the standard autoregressive time-series model in forecasting confirmed and recovered COVID-19 cases around the world.

8.2 Exponential models

Exponential models are suitable in the modeling of several phenomena, such as populations, interest rates, and infectious diseases [59].

8.2.1 Logistic functions

One of the best-known S-shaped curves is the logistic function, which has applications in biology, chemistry, linguistics, political science, and statistics. Examples of applications of logistic functions to COVID-19 are provided in [24, 37, 38].
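As an illustration of the S-shaped curve, the sketch below evaluates the standard three-parameter logistic function K / (1 + exp(−r (t − t0))). This is the textbook form, not the five-parameter variant used in [24]; the symbols K (final size), r (growth rate), and t0 (inflection time), as well as the function name and sample values, are assumptions introduced here.

```python
import math


def logistic(t, k, r, t0):
    """Three-parameter logistic curve.

    k  : upper asymptote (final cumulative count), assumed symbol
    r  : growth rate, assumed symbol
    t0 : time of the inflection point, where the curve equals k / 2
    """
    return k / (1.0 + math.exp(-r * (t - t0)))


if __name__ == "__main__":
    # Illustrative parameters only (not fitted to any dataset).
    for day in (0, 20, 40, 60, 80):
        print(day, round(logistic(day, k=10000.0, r=0.15, t0=40.0), 1))
```

Growth is close to exponential well before t0 and saturates toward K afterwards, which is why fits made early in an outbreak can be very sensitive to the estimated final size.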

8.2.2 Deep learning

Deep learning is a well-known branch of machine learning in which the learning process can be supervised, semi-supervised, or unsupervised [60–62]. Applications of different forms of deep learning to forecasting COVID-19 cases include long short-term memory (LSTM) networks [25, 63], polynomial neural networks [39], and neural networks [31, 40].

8.2.3 Regression methods

In statistics, regression methods are a set of statistical modeling techniques for estimating the relationship between a dependent variable and one or more independent variables [64, 65]. As a powerful forecasting tool, various regression methods have been applied by researchers to COVID-19 [42–44, 66, 67].

8.2.4 Prophet algorithm

The Prophet algorithm is an open-source tool that works well with time-series data that have seasonal effects. The main goal of the algorithm, developed by Facebook’s Data Science team, is business forecasting [68, 69]. The Prophet algorithm has proven to be robust in dealing with missing data [70].

8.2.5 Genetic programming

Genetic programming (GP) is a nature-inspired algorithm whose key components include program representation (tree structure), selection, crossover, and mutation [71]. Some examples of GP applied to COVID-19 are available in [32–34].

8.2.6 SIR

One of the most applied epidemic models is the susceptible, infected, and recovered (SIR) model [15, 16]. Variables S, I, and R are defined in Eqs. 1–3.

8.2.7 SEIR

The SEIR model [18] is an extended version of the SIR model that adds a compartment, E, representing exposed individuals, who have been infected but are still in the latent period and do not yet show symptoms.

8.2.8 SIRD

The SIRD model differentiates between recovered individuals (those who have survived the disease and are now immune) and deceased individuals [13, 14].
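For reference, one common way to write the SIRD model, using the same notation as Eqs. 1–3, is sketched below. The mortality rate \(\mu\) and the deceased compartment D are symbols assumed here for illustration; the cited studies [13, 14] may use different notation or parameterizations.

$$\frac{{{\text{d}}S}}{{{\text{d}}t}} = - \beta IS,\quad \frac{{{\text{d}}I}}{{{\text{d}}t}} = \beta IS - \gamma I - \mu I,\quad \frac{{{\text{d}}R}}{{{\text{d}}t}} = \gamma I,\quad \frac{{{\text{d}}D}}{{{\text{d}}t}} = \mu I$$

Setting \(\mu = 0\) recovers Eqs. 1–3, so the SIR model is the special case in which no deaths are tracked separately.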

9 Strengths and weaknesses of forecasting models

As discussed earlier, many machine learning algorithms have been used to forecast the new pandemic in different parts of the world. Figure 7 presents the percentage contribution of the different solution approaches applied in forecasting COVID-19 confirmed cases (there are 925 indexed articles in Scopus as of October 10, 2020). As is clear from Fig. 7, deep learning, compartmental models, and other methods contribute the most, while the Prophet algorithm, a new branch of machine learning, contributes the least.

Fig. 7 % of contribution of different solution approaches applied in the forecasting of COVID-19 confirmed cases

The strengths and weaknesses of the machine learning algorithms are described in Table 3.

Table 3 Strengths and weaknesses of proposed machine learning algorithms

10 Conclusion and discussion

At the time of writing, COVID-19 had spread to more than 200 countries worldwide, with more than 36 million confirmed cases. Several works on predicting the global outbreak have been published. This study reviewed the most important forecasting models for COVID-19 and provided a short analysis of the published literature. It highlighted the most important subject areas through keywords analysis, identified several criteria that could help researchers in future work, and recognized the most useful models that researchers have applied to predict the pandemic. Furthermore, this paper may help researchers to identify important gaps in the research area and, subsequently, to develop new machine learning models for forecasting COVID-19 cases. A detailed scientometric analysis was performed as an influential tool for bibliometric analyses and reviews. To this end, keywords and subject areas were discussed, while the classification of forecasting models, criteria evaluation, and comparison of solution methods were provided in the second part of the work.

This study describes some key arguments that are worthy of further discussion:

  • In terms of subject area, medicine, biochemistry, and mathematics are the areas most discussed by scholars.

  • In terms of keywords analysis, trends suggest that studies on COVID-19 will increase in the next few months. Moreover, Coronavirus, prediction, epidemic, human, statistical analysis, quarantine, hospitalization, mortality, and weather are the keywords of most interest to scholars.

  • Several other criteria have been used by researchers in forecasting, including:

    • Confirmed cases, risk assessment, stock market, ventilation units, ICU beds, estimated registered and recovered cases.

  • Several countries, including China, Pakistan, France, Italy, USA, UK, Brazil, Nigeria, Iran, Germany, and India, were addressed as case studies.

  • Among the forecasting models, deep learning, SIR, and SEIR are the ones most used by researchers.

  • Hybrid algorithms are used to enhance the power of forecasting approaches.

  • The majority of studies are deterministic approaches, while there is an urgent need to provide robust approaches for tackling uncertain situations.

For future research, a comprehensive review of other fields, such as artificial intelligence (AI) and deep learning, is encouraged. Moreover, more studies should address the development of novel and hybrid approaches to forecast the pandemic. Furthermore, at the time of writing this paper, we had access to only a limited number of published articles indexed by Scopus and WOS. However, the most important parts of this paper are the keywords and scientometric analyses, which consider the whole database and from which we chose some examples of published articles for review. Therefore, a more comprehensive review of the research area is suggested.