1 Introduction

Demand forecasting models are essential machine learning tools used by the majority of international pharmaceutical corporations, with the dual objective of matching supply to any rise or fall in product demand and keeping inventory as low as possible. If such a system is deployed effectively, it contributes to the efficient management of the supply chain [1], which in turn improves the overall profitability of the organization. The supply chain of a pharmaceutical company consists of a series of activities that are complex and sometimes intertwined, and these complexities add to the numerous challenges that pharmaceutical companies face around the globe. Efficient supply chain management is critical for any organization in the essential service sector, and this holds true for the pharmaceutical industry as well. The COVID-19 pandemic, which was originally considered an epidemic confined to China, became a global pandemic. This pandemic, one of the most unusual situations of the last 100 years, has affected both the supply and demand sides of even the well-established global pharmaceutical industry. On the one hand, sudden outbreaks led to waves of rising demand; on the other hand, lockdowns and the discovery of vaccines, pills, and medicines contributed to the unpredictability of supply for pharmaceutical products. Demand Forecasting Models (DFMs) help companies stock or produce the required quantity of pharmaceutical products. Efficient DFMs help pharmaceutical companies predict demand accurately and make appropriate decisions in their production and supply chain operations. In a nutshell, efficient DFMs not only help to surmount the complexity of the supply chain but also create a competitive advantage for global pharmaceutical companies [2].

Most pharmaceutical companies use machine learning-based forecasting techniques to identify future demand in advance [3]. The main advantage of artificial neural networks and data mining techniques is their ability to give high-quality predictions in less time. It is practically impossible to obtain such accuracy using the existing standard statistical forecasting methods [4]. In [5], the authors reported that companies were not able to deliver the right quantity of products on time to their suppliers. The foremost reason was that these companies failed to identify rapid changes in the quantity demanded and were unaware of competitors' demand in the supply chain. The authors therefore suggested that these organizations use a predictive tool for managing demand forecasting and related functions.

The most popular and successful linear time-series forecasting models are the Autoregressive Integrated Moving Average (ARIMA), Seasonal Autoregressive Integrated Moving Average (SARIMA), and Autoregressive Moving Average (ARMA) models [6], but their prediction quality suffers from their assumption of linear behavior. This has led to the rapid development of machine learning methods, such as data mining techniques and Artificial Neural Networks (ANN), for time-series forecasting, which provide better prediction accuracy [1]. Recently, ANN-based time-series forecasting models have grown tremendously because they can make accurate predictions from nonlinear input and output data, whereas the linear forecasting methods mentioned above are not suitable for modeling data containing nonlinear behavior [7]. ANNs are now widely used in time-series forecasting for better prediction quality [8], and numerous studies in the literature have compared the performance of ANN-based time-series forecasting with that of conventional forecasting methods. These studies have also shown that there is no benchmark neural network parameter setting for the time-series domain; the appropriate setting depends on the problem domain [1].

Accordingly, in this paper, shallow neural networks (Probabilistic Neural Network, P_NN; Generalized Regression Neural Network, GR_NN; and Radial Basis Function Neural Network, RBF_NN) and a deep learning neural network (Long Short-Term Memory neural network, LSTM) are used to develop ANN-based DFMs. Moreover, a comparative study is presented to determine whether shallow neural network-based demand forecasting models perform better than deep neural network-based models on pharmaceutical time-series data.

The main contributions of this work are summarized as follows:

• Development of a methodology to select the best demand forecasting models (DFMs) for pharmaceutical data using statistical and neural network models on a time-series sales database.

• Investigation of whether shallow and deep neural network methods such as RBF_NN, P_NN, GR_NN, LSTM, and Stacked LSTM lead to more accurate ATC thematic drug prediction than non-neural network models.

• Investigation of the performance of the ARIMA model, shallow neural network models, and deep neural network models on a time-series dataset.

This paper is organized as follows. Section 1 introduces the study. Section 2 reviews the related literature and highlights the research gaps. Section 3 briefly explains the time-series forecasting models and evaluation measures. Section 4 presents the proposed methodology and experimental analysis. Finally, Sect. 5 summarizes the present work with possible future work.

2 Review of literature

In general, demand forecasting is a major challenge for big data applications, which may use additional data characteristics such as value and veracity for better prediction. One line of research has focused mainly on demand forecasting for big data analytics [2]. It explains how various data sources can be incorporated into the prediction process, but this requires a data analyst to accomplish the task, an adequate technical base, and investment in technology [9].

Past literature highlights the insufficiency of statistical methods for optimizing pharmaceutical assembly lines and production facilities. With this constraint in mind, some researchers [10] adopted an improved approach that combines simulation methods and data analysis. It was recommended that the pharmaceutical unit be optimized to improve its performance [11].

In another study [12], a Convolutional Neural Network (CNN), an NN-based Genetic Algorithm, and an NN-based Particle Swarm Optimization (PSO) method were used to predict the demand for energy. According to the simulation results, the neural network-based energy forecasting techniques predicted energy demand with greater accuracy than the CNN approach.

Genetic Algorithm (GA), Bayesian structure, and Reinforcing Artificial Neural Network (RANN) designs were developed by the authors of [13] to forecast short-run irrigation demand. The experimental study highlighted the superiority of the ANN model, which showed greater predictive quality than the traditional algorithms.

In yet another study [14], Iran’s annual energy usage was forecast using three different combinations of the Autoregressive Integrated Moving Average (ARIMA) model and the Adaptive Neuro-Fuzzy Inference System (ANFIS) model. The findings indicated that the hybrid models have higher accuracy than the ARIMA or ANFIS model individually for forecasting energy use.

Many researchers have attempted to develop near-perfect, zero-error forecasting models. In one such study, a neuro-fuzzy approach was employed [15] to forecast future demand from past sales data. ANFIS was implemented along with an ANN as a “neuro-fuzzy approach”, and the study reported that the approach was effective. A notable study [16] addressed state-of-the-art approaches and significant forecasting issues concerning the demand for drugs of pharmaceutical companies. Its authors used three demand forecasting scenarios and concluded that the “symbolic regression-based forecasting model” had a better model fit and a lower error rate on the demand observation data than the other models. A few authors have attempted to forecast the revenue stream of pharmaceutical companies. One such study [17] introduced a new forecasting approach for pharmaceutical companies' revenue estimates, which combined NN-based prediction methods for the time-series dataset.

Due to the COVID-19 pandemic, the breakdown of supply chain functions created a shortage of ambulatory medicines [18]. The pandemic increased demand for pharmaceutical products, and shortages were reported during the early period of the outbreak [3]. The principal objective of this study was to find the most effective animal drugs based on drug distribution routes like those for humans. This generates demands for investment in the efficiency, health, and feasibility of drug delivery systems in testing [17,18,19]. The Long Short-Term Memory (LSTM) model has been used to enhance predictive quality on time-series datasets. A demand forecasting approach based on a multi-layer LSTM model, which has a robust capacity to predict sharply rising and falling demand data, was proposed in [20]. The LSTM model was compared with the CNN model, and the comparison indicated that the LSTM model has more robust prediction performance [21].

A theoretical Fuzzy Neural Network (FNN) was employed in another study [22] to develop forecasting techniques. The FNN can remove insignificant weights to develop fuzzy IF–THEN rules for further learning. The output of the FNN is pooled with ANN results for better prediction of time-series data. Statistical and computational intelligence approaches such as exponential smoothing, K-Nearest Neighbors (KNN), Support Vector Machines (SVM), ARIMA, ANN, RNN, and LSTM have been proposed in the literature as forecasting techniques for various problem domains [23].

In [24], the authors introduced a forecasting approach to analyze existing sales data and compared it with various machine learning models for predicting future demand. The LSTM-based demand forecasting method was developed to capture the nonlinearities present in the e-commerce business. The findings showed that LSTM outperformed univariate predictive techniques. The study in [25] proposed a hybrid approach for long-memory time series that integrates the Autoregressive Fractionally Integrated Moving Average (ARFIMA) model and feed-forward neural networks (FFNN). The authors reported that the hybrid model provided better predictive accuracy than other standard approaches.

Table 1 lists some neural-network-based predictive models used for different demand forecasting operations. After a detailed investigation of the methods, strategies, techniques, and methodologies developed and implemented in previous studies, the ANN-based demand forecasting model is identified as the appropriate demand forecasting model for time-series datasets. Therefore, in this research work, ANN-based demand forecasting models are developed for the Anatomical Therapeutic Chemical (ATC) thematic drug sales dataset to predict weekly demand.

Table 1 List of popular NN-based forecasting models

3 Methods and materials

The Autoregressive Integrated Moving Average (ARIMA) model is a widely used forecasting model due to its simplicity and its ability to generalize to non-stationary series. A detailed description of this model can be found in [14, 23, 35].
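
For orientation, the ARIMA(p, d, q) model can be written in its standard textbook form as follows. The notation here (backshift operator B, coefficients φ and θ, constant c, and noise term ε) is introduced for illustration and is not taken from the cited works:

$$ \left( 1 - \sum_{i=1}^{p} \phi_{i} B^{i} \right) \left( 1 - B \right)^{d} y_{t} = c + \left( 1 + \sum_{j=1}^{q} \theta_{j} B^{j} \right) \varepsilon_{t} $$

Here y_t is the sales value at time t, B y_t = y_{t-1}, p is the autoregressive order, d is the degree of differencing used to remove non-stationarity, q is the moving-average order, and ε_t is a white-noise error term.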

3.1 Shallow neural network model: SNN

The popular SNN models used for time-series forecasting are the Radial Basis Function Neural Network (RBF_NN) [40,41,42,43,44,45], the Generalized Regression Neural Network (GR_NN), and the Probabilistic Neural Network (P_NN) [41, 46,47,48,49,50,51]. These models are used here to build demand forecasting models for pharmaceutical products. An SNN is made up of an input layer (IL), one or two hidden layers (HL), and an output layer (OL), and it performs regression when the target variable of the problem is continuous. The typical network configuration of an SNN is shown in Fig. 1. The IL has one neuron for each predictor variable (X1, X2, …, Xn). In the HL, each neuron (H1, H2, …, Hn) stores the predictor variable values and the target value of a case in the dataset. When input data are presented to the HL, each neuron calculates the distance between the test case and its stored center point using a specified distance measure. This distance is given as the input to the RBF of the HL. The output of the HL is passed to the next layer (the pattern/summation layer or the output layer) to produce an intermediate output. The decision layer of the P_NN/GR_NN, or the output layer of the RBF_NN, is the last layer and calculates the final output of the whole network. Detailed descriptions of these neural networks are given in Table 2.

Fig. 1 Common architecture of SNN

Table 2 Description of shallow neural network models

3.2 Description of the shallow neural network models

3.2.1 RBF_NN

The input layer of the RBF_NN receives the input data and feeds it into the hidden layer of the RBF_NN. The calculation that takes place within the hidden layer uses the radial basis kernel function. The output layer performs the prediction task.
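
The three layers described above can be sketched as follows. This is a minimal illustration only: it assumes a Gaussian radial basis kernel with a single shared width and a linear output layer, and the centers, weights, and inputs are hypothetical values rather than the configuration used in this study.

```python
import math

def gaussian_rbf(x, center, width):
    """Gaussian radial basis kernel: exp(-||x - c||^2 / (2 * width^2))."""
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-sq_dist / (2.0 * width ** 2))

def rbf_forward(x, centers, width, weights, bias=0.0):
    """Hidden layer of RBF activations followed by a linear output layer."""
    hidden = [gaussian_rbf(x, c, width) for c in centers]
    return bias + sum(w * h for w, h in zip(weights, hidden))

# Hypothetical example: two lagged sales values as input, two hidden neurons.
centers = [[10.0, 12.0], [30.0, 28.0]]   # assumed hidden-neuron centers
weights = [11.0, 29.0]                   # assumed output-layer weights
prediction = rbf_forward([10.0, 12.0], centers, width=5.0, weights=weights)
print(round(prediction, 3))
```

Because the input coincides with the first center, the first hidden neuron responds fully while the distant second neuron contributes almost nothing, so the prediction is dominated by the first output weight.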

3.2.2 P_NN

The input layer of the P_NN computes the Euclidean distance from the input time-series data to the training time-series data and generates a vector whose components indicate how close the input is to each training input. The second layer sums the contributions for each class of inputs and produces its net output as a vector of probabilities using the probability density function. Finally, an activation function on the second layer’s output selects the neuron with the maximum probability, generating 1 for the target class and 0 for the non-target classes.
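
A compact sketch of this distance, summation, and decision sequence is given below. It assumes a Gaussian kernel as the probability density estimate and a single smoothing parameter `sigma`; the training patterns and the class labels "low" and "high" are hypothetical.

```python
import math

def pnn_classify(x, train_inputs, train_labels, sigma):
    """Return (predicted_label, class_probabilities) for input vector x."""
    scores, counts = {}, {}
    # Pattern layer: one Gaussian kernel per stored training pattern.
    for pattern, label in zip(train_inputs, train_labels):
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, pattern))
        kernel = math.exp(-sq_dist / (2.0 * sigma ** 2))
        scores[label] = scores.get(label, 0.0) + kernel
        counts[label] = counts.get(label, 0) + 1
    # Summation layer: average kernel response per class, then normalise.
    density = {c: scores[c] / counts[c] for c in scores}
    total = sum(density.values())
    if total == 0.0:
        # All kernels underflowed; fall back to equal probabilities.
        probs = {c: 1.0 / len(density) for c in density}
    else:
        probs = {c: d / total for c, d in density.items()}
    # Decision layer: the class with the largest probability wins.
    best = max(probs, key=probs.get)
    return best, probs

# Hypothetical two-class example with two-dimensional inputs.
train_inputs = [[1.0, 1.2], [1.1, 0.9], [5.0, 5.2], [4.8, 5.1]]
train_labels = ["low", "low", "high", "high"]
label, probs = pnn_classify([1.0, 1.0], train_inputs, train_labels, sigma=1.0)
print(label, probs)
```

The winning class corresponds to the output of 1 described above, and every other class to 0.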

3.2.3 GR_NN

The input layer of the GR_NN receives the time series data and feeds it into the hidden layer of the GR_NN. The calculation that takes place within the hidden layer uses the radial basis kernel function.

In the summation/pattern layer, the denominator summation unit adds up the weight values of the hidden neurons, and the numerator summation unit adds up the weight values multiplied by the actual target value of each hidden neuron. The decision layer divides the value accumulated in the numerator summation unit by the value in the denominator summation unit and uses the result as the predicted target value.
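
The numerator and denominator units described above amount to a kernel-weighted average of the stored targets. The sketch below assumes a Gaussian kernel with smoothing parameter `sigma`; the training values are hypothetical.

```python
import math

def grnn_predict(x, train_inputs, train_targets, sigma):
    """Kernel-weighted average of stored targets (numerator / denominator)."""
    numerator = 0.0    # sum of weight * target over hidden neurons
    denominator = 0.0  # sum of weights over hidden neurons
    for pattern, target in zip(train_inputs, train_targets):
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, pattern))
        weight = math.exp(-sq_dist / (2.0 * sigma ** 2))
        numerator += weight * target
        denominator += weight
    if denominator == 0.0:
        # All weights underflowed; fall back to the mean training target.
        return sum(train_targets) / len(train_targets)
    return numerator / denominator

# Hypothetical one-dimensional example.
train_inputs = [[1.0], [2.0], [3.0]]
train_targets = [10.0, 20.0, 30.0]
grnn_value = grnn_predict([2.0], train_inputs, train_targets, sigma=0.5)
print(grnn_value)
```

A smaller `sigma` makes the prediction follow the nearest stored pattern more closely, while a larger value smooths across more patterns.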

3.3 Deep Neural Network Model: LSTM

An LSTM network consists of a single input layer (IL), one or more hidden layers (HL), and a single output layer (OL) [52,53,54]. It is a Recurrent Neural Network (RNN) with a deep learning architecture that contains LSTM units in its hidden layers. The key benefit of RNNs is that they can be used for both classification and regression using historical time points. The neurons in the hidden layers are fully interconnected and include memory cells and related gate units. The general architecture of an LSTM is shown in Fig. 2. An LSTM unit consists of a cell, an input gate, an output gate, and a forget gate. The cell remembers values over arbitrary time intervals. The input gate manages the flow of information into the cell, and the output gate manages the flow of information out of the cell. Similarly, the forget gate decides which information in the cell is retained and which is discarded. LSTMs are useful for prediction and classification problems.

Fig. 2 LSTM network architecture
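
The interaction of the cell and the three gates can be illustrated with one time step of a scalar LSTM unit. This sketch uses the standard LSTM update equations; the parameter names and values are hypothetical and do not reflect the trained networks of this study.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, params):
    """One time step of a scalar LSTM unit.

    params maps each gate name to a tuple (w_x, w_h, b).
    """
    def pre_activation(name):
        w_x, w_h, b = params[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre_activation("forget"))       # how much old cell state to keep
    i = sigmoid(pre_activation("input"))        # how much new information to admit
    g = math.tanh(pre_activation("candidate"))  # candidate cell content
    o = sigmoid(pre_activation("output"))       # how much cell state to expose
    c = f * c_prev + i * g                      # updated cell state
    h = o * math.tanh(c)                        # updated hidden state
    return h, c

# Hypothetical parameters shared by all gates, applied to a short sequence.
gate_names = ("forget", "input", "candidate", "output")
params = {name: (0.5, 0.1, 0.0) for name in gate_names}
h, c = 0.0, 0.0
for x in [0.2, 0.4, 0.1]:
    h, c = lstm_step(x, h, c, params)
print(h, c)
```

A stacked LSTM, in this picture, feeds the hidden state `h` of one such layer as the input `x` of the next layer at every time step.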

3.4 Performance metrics

Root Mean Squared Error (RMSE) is defined as the square root of the mean of the squared differences between the predicted and actual sales values, as given in Eq. 5. A smaller RMSE value indicates a better model. Normalized RMSE is calculated as the RMSE divided by the mean of the actual sales values (Eq. 6). The percentage of error (PE) is defined as the sum of the absolute differences between the forecasted and actual sales values, divided by the sum of the forecasted sales values and multiplied by 100 (Eq. 7).

$$ {\text{RMSE}} = \sqrt {\frac{1}{n}\sum\nolimits_{{i = 1}}^{n} {\left( {{\text{Sales}}\_P_{i} - {\text{Sales}}\_A_{i} } \right)^{2} } } $$
(5)
$$ {\text{Normalized RMSE}} = \;\frac{{{\text{RMSE}}}}{{{\text{mean}}\left( {{\text{Sales}}\_A_{i} } \right)}} $$
(6)
$$ PE = \left( {\mathop \sum \limits_{i = 1}^{n} abs\left( {{\text{Sales}}\_P_{i} - {\text{Sales}}\_A_{i} } \right)/\mathop \sum \limits_{i = 1}^{n} \left( {{\text{Sales}}\_P_{i} } \right)} \right)*100 $$
(7)

In Eqs. 5, 6 and 7, Sales_P denotes the predicted sales values, Sales_A denotes the actual sales values, and n represents the number of instances in the dataset.
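
Equations 5, 6 and 7 translate directly into code. The following sketch uses hypothetical sales values; the function names are chosen here for illustration.

```python
import math

def rmse(predicted, actual):
    """Eq. 5: square root of the mean squared error."""
    n = len(actual)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n)

def normalized_rmse(predicted, actual):
    """Eq. 6: RMSE divided by the mean of the actual values."""
    return rmse(predicted, actual) / (sum(actual) / len(actual))

def percentage_error(predicted, actual):
    """Eq. 7: sum of absolute errors over the sum of predictions, times 100."""
    abs_error = sum(abs(p - a) for p, a in zip(predicted, actual))
    return abs_error / sum(predicted) * 100.0

# Hypothetical weekly sales values.
sales_a = [10.0, 20.0, 30.0]   # actual
sales_p = [12.0, 18.0, 33.0]   # predicted
print(rmse(sales_p, sales_a))
print(normalized_rmse(sales_p, sales_a))
print(percentage_error(sales_p, sales_a))
```

Note that Eq. 7 normalizes by the predicted rather than the actual sales, so PE is undefined when the predictions sum to zero.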

4 Proposed methodology

In this work, ARIMA and neural network variants such as GR_NN, RBF_NN, P_NN, LSTM, and Stacked LSTM are used to build DFMs for pharmaceutical time-series sales data. The graphical abstract of the proposed methodology is given in Fig. 3. The performance of the models is compared using key performance indicators such as RMSE and PE.

Fig. 3 Workflow of proposed methodology

Algorithm 1 lists the sequence of steps of the neural network-based prediction model for demand forecasting. The first step is the initialization of the model parameters and hyperparameters of the RBF_NN, GR_NN, P_NN, and LSTM models. For the initial run, the parameters of the neural network model, such as weights and biases, and the other hyperparameters are initialized randomly. In this work, the dataset is split in the ratio 70%:15%:15%, with 10-fold cross-validation. The epoch count represents the maximum number of iterations, and the objective of each run is to minimize the RMSE of the model. Second, the threshold (θ), which denotes the target RMSE value of the model, is fixed; it depends on the dataset and is user-defined. Third, the NN_net model is simulated with the initial values and the training dataset, and its performance is evaluated. If the desired performance is reached, the training process stops. Otherwise, training is repeated with different hyperparameter values, as specified in Table 3, until the termination criterion is reached. Fourth, the NN_net model is built using the optimal parameter values. Fifth, the NN_net model is tested using the testing data. Sixth, the testing performance of the NN_net model is estimated. Next, the best model is identified from the list of NN models using the benchmark performance indicator (RMSE). The best model is then simulated using the optimal neural network configuration parameter values for the epidemiological data prediction. Finally, the values for the next n time stamps are forecast.
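
The control flow of these steps can be sketched as below. This is not Algorithm 1 itself: it assumes the 70%:15%:15% split denotes chronological training, validation, and testing portions, it omits the cross-validation, and it uses a simple moving-average forecaster with window size k as a stand-in for the neural network models. The synthetic series, candidate values, and threshold are hypothetical.

```python
import math

def chronological_split(series, train_pct=70, val_pct=15):
    """Split a series in time order into train, validation, and test parts."""
    n = len(series)
    n_train = n * train_pct // 100
    n_val = n * val_pct // 100
    return series[:n_train], series[n_train:n_train + n_val], series[n_train + n_val:]

def rmse(predicted, actual):
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

def moving_average_score(history, future, k):
    """Walk-forward RMSE of a k-point moving-average forecaster (stand-in model)."""
    history = list(history)
    preds = []
    for actual in future:
        preds.append(sum(history[-k:]) / k)  # one-step-ahead forecast
        history.append(actual)               # reveal the true value afterwards
    return rmse(preds, future)

def tune_until_threshold(candidates, score_fn, theta):
    """Try settings in order; stop once the best validation RMSE <= theta."""
    best_setting, best_score = None, math.inf
    for setting in candidates:
        score = score_fn(setting)
        if score < best_score:
            best_setting, best_score = setting, score
        if best_score <= theta:
            break  # desired performance reached
    return best_setting, best_score

# Synthetic weekly series with a repeating pattern (illustrative only).
series = [float(20 + (t % 4)) for t in range(40)]
train, val, test = chronological_split(series)
best_k, val_rmse = tune_until_threshold(
    candidates=[1, 2, 4, 8],
    score_fn=lambda k: moving_average_score(train, val, k),
    theta=1.2,
)
test_rmse = moving_average_score(train + val, test, best_k)
print(best_k, val_rmse, test_rmse)
```

The same loop would be run once per model family, after which the model with the lowest test RMSE would be retained for forecasting.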

Table 3 Neural network model parameters setup
Algorithm 1 Neural network-based prediction model for demand forecasting

4.1 Data description

The weekly sales data for pharmaceutical products are taken from the Kaggle website [55]. The database contains 600,000 sales instances of 57 drugs during the period 2014–2019 and includes attributes such as the date and time of sale, the brand name, and the amount of each drug sold. In this work, demand is predicted for specific types of drugs rather than for specific products. Therefore, the selected group of 57 drugs is classified into eight categories of the Anatomical Therapeutic Chemical (ATC) classification, as shown in Fig. 4. The sales trends of all drug categories are shown graphically in Fig. 5, which gives a clear picture of the nature of the dataset used in the study.

Fig. 4 Components of drug dataset

Fig. 5 Sales trends of the eight categories of the ATC drugs
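
One way to turn transaction-level records into weekly totals per ATC category is sketched below. The record layout (the `datetime`, `drug`, and `quantity` fields), the timestamp format, and the brand-to-category mapping are assumptions made for illustration; the actual schema of the dataset in [55] may differ.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def week_start(timestamp):
    """Return the Monday (as a date) of the week containing the timestamp."""
    day = timestamp.date()
    return day - timedelta(days=day.weekday())

def weekly_totals(transactions, drug_to_atc):
    """Sum sold quantities per (ATC category, week start)."""
    totals = defaultdict(float)
    for row in transactions:
        ts = datetime.strptime(row["datetime"], "%Y-%m-%d %H:%M")
        category = drug_to_atc[row["drug"]]
        totals[(category, week_start(ts))] += row["quantity"]
    return dict(totals)

# Hypothetical brand names mapped to two of the ATC categories.
drug_to_atc = {"brand_a": "M01AE", "brand_b": "R06"}
transactions = [
    {"datetime": "2014-01-06 09:15", "drug": "brand_a", "quantity": 2.0},
    {"datetime": "2014-01-08 17:40", "drug": "brand_a", "quantity": 1.5},
    {"datetime": "2014-01-13 11:05", "drug": "brand_b", "quantity": 3.0},
]
weekly = weekly_totals(transactions, drug_to_atc)
for key in sorted(weekly):
    print(key, weekly[key])
```

Each resulting per-category series can then be split chronologically and passed to the forecasting models.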

4.2 Result and discussions

Table 3 shows the model parameter and hyperparameter setup used to implement the shallow and deep neural network models for demand forecasting. The hyperparameters of the demand forecasting models, namely ARIMA, RBF_NN, GR_NN, P_NN, LSTM, and stacked LSTM, are listed in Table 4.

Table 4 List of hyperparameters for neural network model

Tables 5 and 6 show the RMSE values of the ARIMA model and the shallow neural network-based demand forecasting models, and of the deep neural network-based demand forecasting models, respectively.

Table 5 RMSE and normalized RMSE for ARIMA and shallow NN-DFM
Table 6 RMSE and normalized RMSE for deep NN-DFM

From Table 5, it is observed that all three shallow neural network models have relatively similar RMSE values for all eight drug categories. The values highlighted in red indicate the lowest RMSE value. The results show that the GR_NN-based DFM works better for most drug types, reducing the RMSE by 5–23%.

As far as the ARIMA-based DFM is concerned, the average RMSE of the ARIMA model over all drug categories is 15.02, and its overall prediction accuracy over all drug categories is 83.43%. A model whose prediction accuracy is close to 100% is considered the best model. However, for real-time datasets, achieving 90% forecast accuracy is always desirable.

It is observed from Table 6 that there is little difference between the RMSE values of LSTM and stacked LSTM for all drug types, and only a slight difference in the remaining RMSE values. As noted above, the values highlighted in red indicate the lowest RMSE value for the deep NN-based DFMs. For the M01AE and R06 drugs, the Deep_NN-based DFMs worked better than the Shallow_NN-based DFMs. For the rest of the drugs, the Shallow_NN-based DFMs showed the best performance. The best models have a prediction accuracy of over 90% for all drugs, which is one of the interesting findings of this research work.

LSTM and stacked LSTM showed almost equal efficacy for all drug types except M01AE. Another interesting finding is that the shallow NN-based DFMs outperformed the deep NN-based DFMs. This is because a shallow NN has a strong ability to tolerate noise in datasets and to learn patterns from small databases, and it is easy to build with a minimal number of parameters.

Table 7 shows the PE values of the shallow neural network-based and deep neural network-based demand forecasting models. The PE of the best model ranges from 0.2% to 13.57%. Therefore, these DFMs have good predictive accuracy for the ATC thematic drug dataset.

Table 7 Percentage of error for NN-DFMs

The performance of DFMs for eight drug types is compared using a benchmark performance indicator called the RMSE value. This is illustrated graphically in Fig. 6. Figure 7 illustrates the comparison of the minimum RMSE values of all eight drug types obtained using shallow NN and deep NN-based demand forecasting models.

Fig. 6 RMSE value for each category

Fig. 7 Minimal RMSE value for eight categories

Table 8 displays the mean RMSE value of all demand forecasting models for the eight drug types. The shallow NN models (GR_NN, RBF_NN, and P_NN) have almost identical mean RMSE values for all eight drug categories. It is therefore concluded that the shallow NN-based demand forecasting models worked best in predicting demand for pharmaceutical products.

Table 8 Overall performance of all DFMs

5 Summary and scope

The main scope of demand forecasting models is stated as follows:

  • Achieving the planned production target

  • Stabilizing production against demand

  • Future growth of sales

  • Long-term investment planning

  • Budget preparation and Sales budget

  • Control of raw materials

It is evident that, in most cases, the shallow neural network models showed better performance in terms of predictive accuracy. The most important finding of this study is that the deep neural networks did not show the best performance on this drug sales dataset because the number of samples available for building the demand forecasting model was small. In short, shallow neural network-based time-series demand forecasting models offer potentially useful recommendations to pharmaceutical companies. The weekly sales data analysis of ATC drug products has proven useful for identifying the days on which to run special marketing campaigns to promote sales.

6 Conclusion and future work

The main objective of this work was to provide a demand forecasting methodology that helps pharmaceutical companies manufacture or stock the right quantity of pharmaceutical products. Shallow neural network (GR_NN, RBF_NN, and P_NN) and deep neural network (LSTM and Stacked LSTM) models were developed for eight categories of ATC thematic drugs. In this work, the shallow neural network-based DFMs performed well for five of the eight drug categories, while the ARIMA model worked best for the remaining three. The experimental study shows that no single demand forecasting model is optimal for all drug categories. For the ATC thematic drug time-series dataset, the shallow NN-based DFM performed better, with a mean RMSE of 6.27 over all eight drug types. Future work to enhance the predictive accuracy of the time-series forecasting models will include more cases, explore different accuracy measures, and develop shallow and deep neural network models with optimal hyperparameters.