1. Introduction
Groundwater, a significant water supply for agricultural, drinking, domestic and industrial purposes, plays a vital role in the sustainable development of society and ecology in arid regions where rainfall is extremely scarce [1,2]. The groundwater level is the most critical indicator for quantifying the groundwater resource. However, the decline of groundwater levels caused by the external pressures of population increase, economic development, climate change, pollution and over-exploitation is threatening the sustainability of water resources in arid areas [3,4]. Therefore, accurate and specific groundwater level estimates are essential for policy makers and planners to evaluate and manage groundwater resources and to prevent over-abstraction more effectively.
Generally, the groundwater level can be assessed by physically based and data-based models. Physically based models describe the detailed interaction of the physical processes that control the hydrologic behavior of the groundwater system through a simplified governing equation [5]. To achieve this, proper initial and boundary conditions are indispensable for the numerical solution [6]. However, the large quantity of hydrogeological data and the physical properties required, such as hydraulic conductivity, volumetric water content and matric potential, are hard to obtain even with expensive site investigations. Moreover, the computational cost of partitioning the physical domain for the numerical solution is extremely high [7,8]. In addition, researchers' limited insight into the physical process of groundwater flow restricts the practical application of these models. Given these constraints, numerical groundwater models may not produce satisfactory results. An alternative is therefore needed: artificial intelligence (AI) models, which capture the nonlinearity of the groundwater level by relying only on historical data or broader exogenous data as inputs.
In recent years, a wide variety of AI models, such as the Artificial Neural Network (ANN), Adaptive Neuro-Fuzzy Inference System (ANFIS), Support Vector Machine (SVM), Extreme Learning Machine (ELM) and genetic programming (GP), have been extensively recommended and applied to hot topics in hydrologic research, mainly rainfall, runoff, reference evapotranspiration, flood, water quality and groundwater level forecasting [9,10,11,12,13,14,15]. For groundwater level prediction, the applicability and potential of these AI methods have been confirmed [16]. Among these models, the ELM has received increasing attention in hydrological modeling due to its fast learning speed and strong generalization capability. Compared with the traditional ANN methodology, the ELM algorithm does not need prior tuning of parameters such as input weights and hidden layer biases [10]: these are assigned randomly, and the output weights are then solved analytically. Thus, a global solution and more accurate predictions can be attained.
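The ELM idea described above can be illustrated with a minimal NumPy sketch. It is not the implementation used in this study: the class name `SimpleELM`, the sigmoid activation, the uniform initialization range and the hidden layer size are all assumptions chosen for illustration.

```python
import numpy as np


class SimpleELM:
    """Single-hidden-layer ELM regressor (illustrative sketch, not the study's code)."""

    def __init__(self, n_hidden=20, seed=0):
        self.n_hidden = n_hidden                # assumed hidden layer size
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid activation of the randomly projected inputs
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        n_features = X.shape[1]
        # Input weights and biases are drawn once and never tuned
        self.W = self.rng.uniform(-1.0, 1.0, size=(n_features, self.n_hidden))
        self.b = self.rng.uniform(-1.0, 1.0, size=self.n_hidden)
        H = self._hidden(X)
        # Output weights: least-squares solution via the Moore-Penrose pseudoinverse
        self.beta = np.linalg.pinv(H) @ y
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta


# Toy usage on a synthetic series (not groundwater data)
X_demo = np.linspace(-3.0, 3.0, 200).reshape(-1, 1)
y_demo = np.sin(X_demo).ravel()
model = SimpleELM(n_hidden=20, seed=0).fit(X_demo, y_demo)
rmse = np.sqrt(np.mean((model.predict(X_demo) - y_demo) ** 2))
print(f"training RMSE: {rmse:.4f}")
```

Because only the output weights are estimated, and by a linear least-squares step, training involves no iterative gradient descent, which is the source of the fast learning speed mentioned above.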
However, deficiencies still exist in groundwater modelling despite the effectiveness and applicability of conventional AI methods. Firstly, the complexity of the groundwater dynamic system itself makes it difficult to identify groundwater features accurately [17]. In addition, uncertainties from non-stationary inputs caused by trends or seasonal variation in the data may strongly affect model performance if no preprocessing procedure is applied [18,19]. Under such circumstances, coupling a data pretreatment process with conventional AI models is an effective way to analyze the dynamic characteristics of the series carefully and to improve the accuracy of these models.
In recent years, a number of studies have verified and reported the superiority and applicability of AI models coupled with data preprocessing techniques [4,18,20,21,22,23,24,25,26]. In this context, the wavelet transform (WT), empirical mode decomposition (EMD) and ensemble empirical mode decomposition (EEMD) are three commonly used techniques that decompose an original time series into sub-series components and extract valuable information concealed in the datasets [27,28,29]. Several successful applications of the WT, EMD and EEMD in hydrological forecasting have been described by Napolitano et al. [30], Kisi et al. [31], Feng et al. [32], Rezaie-Balf et al. [33], Hadi et al. [34], Yu et al. [35], and Rezaie-Balf et al. [36]. These data preprocessing methods mitigate the shortcomings of AI models in dealing with nonstationary and nonlinear signals by decomposing the time series into a set of simpler components that give deeper insight into its features [22,37]. Prediction precision can therefore be improved. Nevertheless, these techniques have weaknesses that may affect the accuracy of the predictions: the high dependence of wavelet decomposition on the choice of mother wavelet function [38]; mode mixing and the lack of a strict mathematical theory in the EMD method [39]; and the large computational cost, uncontrollable modal components and unremovable noise of the EEMD [17].
The variational mode decomposition (VMD), proposed by Dragomiretskiy and Zosso [40], is a completely non-recursive variational model for signal decomposition. This method has attracted much attention due to its solid theoretical foundation, strong noise robustness and precise component separation [41]. Hybrid AI and VMD models have been successfully employed in power quality event recognition [42], short-term load forecasting [43], time-frequency analysis of Mirnov coil [44], stock price and movement prediction [45], short-term wind power generation forecasting [46], wind speed forecasting [47], and solar radiation forecasting [48]. In the hydrological domain, studies have focused mainly on runoff and rainfall-runoff prediction. Seo et al. [49] proposed VMD-based ELM (VMD-ELM) and VMD-based least squares support vector regression (VMD-LSSVR) models for daily rainfall-runoff modeling in the Geumho River Watershed, South Korea. The results showed that the VMD-ELM and VMD-LSSVR models performed best when compared with the VMD-based ANN (VMD-ANN), the discrete wavelet transform (DWT)-based models (DWT-ELM, DWT-LSSVR, and DWT-ANN) and the single models (ELM, LSSVR, and ANN). He et al. [41] simulated daily runoff using deep neural networks (DNN) coupled with VMD (VMD-DNN) at the Zhangjiashan hydrological station on the Jing River, China. The results confirmed the superiority and novelty of the proposed hybrid model in daily runoff forecasting. Furthermore, the good stability and representativeness of the deep belief network (DBN) coupled with VMD in short-term runoff prediction were reported by Xie et al. [17]. Despite the demonstrated potential and advantages of AI methods coupled with VMD, their use in groundwater level prediction is rarely reported, especially in arid environments where groundwater shows high irregularity, complex nonlinearity and multi-scale variability. It is therefore important to apply VMD-based hybrid models to groundwater level prediction to provide reliable scientific references under such conditions.
In particular, the uncertainty inherent in stochastic AI algorithms is a problem that cannot be ignored when applying these models. Uncertainty analysis is an indispensable step for obtaining reliable simulation results, since these models are sensitive to the input data and cannot reproduce the same results even in identical situations. However, the uncertainty of hybrid models in the estimation process is often neglected, because deterministic analysis is usually the main focus. The bootstrap method, a resampling technique, can be effective for uncertainty analysis because its computation is more convenient than the derivatives and Hessian matrix involved in the delta method or the Monte Carlo simulations involved in the Bayesian approach, and it has been used successfully in a wide range of hydrological modeling problems [50,51,52,53]. Applications that combine a machine learning method based on data decomposition with bootstrap resampling are described in detail in [51,52,54]. These researchers developed hybrid wavelet-bootstrap-neural network (WBNN) models for forecasting hourly floods, daily discharge, and medium-term urban water demand, respectively, and confirmed the superiority of the bootstrap method in assessing uncertainty. However, the uncertainty of AI-coupled decomposition ensemble models is usually ignored, especially in groundwater level prediction. Therefore, in this study, uncertainty analysis based on the bootstrap method was conducted to assess the precision of the hybrid models.
Based on the above considerations, the main objective of this study is to propose a hybrid model combining signal decomposition (VMD), feature selection (Boruta) and ELM (VMD-Boruta-ELM, abbreviated as VBELM) for the accurate simulation of 1-, 2- and 3-month-ahead groundwater level. Specifically, the VMD was first used to decompose the time series of groundwater level, rainfall and temperature into multiple intrinsic narrow-band sub-components called intrinsic mode functions (IMFs); then feature selection was carried out using the Boruta method to reduce the input dimensions and to select the input variables for each IMF; next, the selected IMF inputs were fed into the ELM model to derive a corresponding forecast; finally, all the predicted IMFs were summed to give the final output. In addition, the single ELM model and the coupled VELM model, which omits the Boruta step, were also developed for comparison. Moreover, uncertainty analysis was performed for the VBELM model using the bootstrap sampling technique to obtain more accurate and reliable groundwater level predictions.
3. Model Development
In light of the foregoing discussion, a hybrid VBELM model integrating modal decomposition, feature selection and AI methods was proposed for monthly groundwater level prediction. Antecedent groundwater level and weather data have been employed as inputs in most studies [4,8,33]. In the present study, the past total precipitation (P), average temperature (T) and groundwater level (GW) were used as inputs to predict the future groundwater level. The maximum time lag was identified as 3; that is, the time series of P, T and GW for the past three months (1 time step represents 1 month) were employed to forecast the GW at the t + 1, t + 2 and t + 3 timescales (Equation (14)):

$$GW(t + \Delta t) = f\big(P(t),\, P(t-1),\, P(t-2),\, T(t),\, T(t-1),\, T(t-2),\, GW(t),\, GW(t-1),\, GW(t-2)\big) \qquad (14)$$

where f denotes the mapping learned by the model, t denotes the current time, Δt is the lead-time period (1, 2 and 3 time steps were selected in this study), and t − 2 and t − 1 denote the prior 2-month and prior 1-month time steps, respectively.
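The lagged input layout described above can be sketched as follows. This is an illustrative helper, not code from the study: the function name `build_lagged_dataset`, its parameters and the column order (oldest to newest within each variable) are assumptions.

```python
import numpy as np


def build_lagged_dataset(P, T, GW, n_lags=3, lead=1):
    """Build (inputs, target) pairs from monthly series.

    Each input row holds P, T and GW at times t-2, t-1, t (for n_lags=3);
    the target is GW at time t + lead. Names and layout are hypothetical.
    """
    P, T, GW = (np.asarray(a, dtype=float) for a in (P, T, GW))
    n = len(GW)
    rows, targets = [], []
    # t is the "current" month; it needs n_lags - 1 earlier months and
    # `lead` later months to exist in the record
    for t in range(n_lags - 1, n - lead):
        window = slice(t - n_lags + 1, t + 1)
        rows.append(np.concatenate([P[window], T[window], GW[window]]))
        targets.append(GW[t + lead])
    return np.array(rows), np.array(targets)


# Toy usage with synthetic monthly values (not observed data)
months = np.arange(12)
X1, y1 = build_lagged_dataset(months + 100, months + 200, months, lead=1)
X3, y3 = build_lagged_dataset(months + 100, months + 200, months, lead=3)
print(X1.shape, y1.shape, X3.shape, y3.shape)
```

Longer lead times leave fewer usable samples, because the last `lead` months of the record have no observed target.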
For model construction, firstly, the VMD technique was adopted to decompose the original precipitation, temperature and groundwater series (for both training and testing sets) into the same number of IMFs, with corresponding low- to high-frequency components, to obtain stationary subseries. Secondly, the Boruta feature selection algorithm was used to select appropriate input components for each IMF during the training phase in order to build concise and efficient models. Thirdly, the ELM model was applied to simulate each IMF. Finally, the predicted groundwater level was obtained by aggregating the predictions of all IMF components. This process constituted the deterministic analysis. For the uncertainty analysis, each IMF selected by Boruta was randomly resampled 1000 times (see Section 2.4); running the ELM model on the resamples yielded 1000 outputs for each IMF series. The predictions of all IMFs were then aggregated for each of the 1000 resamples to obtain the final 1000 outputs used in the uncertainty analysis. The schematics of the process are illustrated in Figure 1.
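The resample-predict-aggregate loop for the uncertainty analysis can be sketched as below. This is a sketch under stated assumptions rather than the procedure of Section 2.4: an ordinary least-squares model stands in for the ELM, training pairs are resampled with replacement, and the 95% band is taken from empirical quantiles. All function names are hypothetical.

```python
import numpy as np


def fit_predict_linear(X_train, y_train, X_test):
    """Stand-in learner (ordinary least squares); the study uses an ELM here."""
    A = np.column_stack([X_train, np.ones(len(X_train))])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([X_test, np.ones(len(X_test))]) @ coef


def bootstrap_ensemble(imf_sets, n_boot=1000, seed=0, fit_predict=fit_predict_linear):
    """imf_sets: list of (X_train, y_train, X_test), one tuple per IMF.

    Returns an (n_boot, n_test) array; each row is one aggregated forecast,
    i.e. the sum over IMFs of predictions from models fitted on resamples.
    """
    rng = np.random.default_rng(seed)
    n_test = len(imf_sets[0][2])
    totals = np.zeros((n_boot, n_test))
    for X_train, y_train, X_test in imf_sets:
        n = len(y_train)
        for b in range(n_boot):
            idx = rng.integers(0, n, size=n)   # sample rows with replacement
            totals[b] += fit_predict(X_train[idx], y_train[idx], X_test)
    return totals


def prediction_band(totals, level=0.95):
    """Empirical lower bound, median and upper bound across bootstrap outputs."""
    alpha = (1.0 - level) / 2.0
    lower, upper = np.quantile(totals, [alpha, 1.0 - alpha], axis=0)
    return lower, np.median(totals, axis=0), upper


# Toy usage with two synthetic "IMFs" (not study data)
rng = np.random.default_rng(1)
imf_sets = []
for _ in range(2):
    X = rng.normal(size=(40, 3))
    y = X @ np.array([0.5, -0.2, 0.1]) + 0.05 * rng.normal(size=40)
    imf_sets.append((X, y, rng.normal(size=(5, 3))))
totals = bootstrap_ensemble(imf_sets, n_boot=200)
lower, median, upper = prediction_band(totals)
print(totals.shape, lower.shape, upper.shape)
```

Passing a different `fit_predict` callable, for example one wrapping an ELM, changes the learner without touching the resampling or aggregation logic.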
Note that the input and output data were normalized to a mean of 0 and a variance of 1 before running the model in order to eliminate the influence of differing dimensions. The equation used is as follows:

$$x_{new} = \frac{x - \mu}{\sigma}$$

where x_new is the normalized dimensionless data, x is the original data, and μ and σ are the average and standard deviation of the data, respectively.
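A minimal sketch of this standardization and its inverse is given below. The source does not state which subset supplies μ and σ; estimating them on the training set only, as done here, is an assumption, and the function names are hypothetical.

```python
import numpy as np


def zscore_fit(x_train):
    """Estimate mean and standard deviation column-wise (training data assumed)."""
    return x_train.mean(axis=0), x_train.std(axis=0)


def zscore_apply(x, mu, sigma):
    """x_new = (x - mu) / sigma"""
    return (x - mu) / sigma


def zscore_invert(x_new, mu, sigma):
    """Map normalized values (e.g. model outputs) back to the original units."""
    return x_new * sigma + mu


# Toy usage with made-up values in two different units
train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
mu, sigma = zscore_fit(train)
z = zscore_apply(train, mu, sigma)
print(z.mean(axis=0), z.std(axis=0))
```

Predictions made in the normalized space need `zscore_invert` with the target's own μ and σ before they are compared with observed groundwater levels.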
6. Conclusions
Accurate and reliable groundwater level prediction can provide governments with valuable information for water resources planning and management. In this study, a hybrid VBELM model coupling the VMD data decomposition method, the Boruta feature selection technique and the ELM model was developed for 1-, 2- and 3-month-ahead groundwater level forecasting at well I, well II and well III in an arid region of northwest China. The performance of the VBELM model was compared with that of the VELM and ELM models in terms of R, RMSE, MAE and NS. Beyond the frequently used deterministic analysis, uncertainty analysis was also carried out using the bootstrap technique to evaluate the proposed model.
The results demonstrate that the VBELM model could provide accurate forecasts of 1-, 2- and 3-month-ahead groundwater level for all three wells. The best outcome was obtained for the 1-month-ahead forecast, while the performance deteriorated gradually as the prediction time was extended. The comparison shows that the VBELM model performed better than the VELM and ELM models for all prediction periods. In terms of the evaluation indices, the VBELM model obtained higher R and NS values as well as lower RMSE and MAE values than the other two models. In addition, the VBELM model is capable of tracing the fluctuations at the peaks and valleys of the groundwater level. Among the three models, the VBELM performed slightly better than the VELM model, while the single ELM model gave the worst predictions. This result implies that incorporating data decomposition and input selection techniques significantly improves the accuracy of hybrid models in monthly groundwater level prediction. The results of the uncertainty analysis further indicate that the performance of the proposed VBELM model was satisfactory, with more observed data falling inside the confidence interval and lower, acceptable d-factor values. Though the uncertainty of the VBELM model decreased as the forecast time increased, the results remained acceptable. In summary, the VBELM model provides an appropriate approach to groundwater level forecasting and can be regarded as an alternative tool for exploring a complex groundwater system in an arid environment.
Although the VBELM method was found to be promising for groundwater level modeling, improvement is still needed. First, the performance of the developed models varies substantially among the three observation wells. This is mainly caused by differences in the internal mechanisms of the groundwater systems, which were neglected in this study. Second, the performance of the hybrid VBELM model in groundwater level forecasting was explored without considering other possible hybrid models. Third, only the uncertainty in the forecasting results caused by input data was discussed, while uncertainty from other sources was barely addressed. Fourth, extreme climate events, such as extreme precipitation and floods, have an important impact on groundwater level, and the groundwater level changes caused by these factors have not been discussed. To obtain more reliable groundwater level predictions, further studies should concentrate on exploring more hybrid models, introducing physical mechanisms into the prediction models, investigating uncertainty in the model structure and parameters, and incorporating extreme climate change to further clarify the reasons for groundwater level fluctuation.