Ensemble Streamflow Simulations in a Qinghai–Tibet Plateau Basin Using a Deep Learning Method with Remote Sensing Precipitation Data as Input

Wang, Jinqiang; Li, Zhanjie; Zhou, Ling; Ma, Chi; Sun, Wenchao

doi:10.3390/rs17060967



Beijing Key Laboratory of Urban Hydrological Cycle and Sponge City Technology, College of Water Sciences, Beijing Normal University, Xinjiekouwai Street 19, Beijing 100875, China

* Author to whom correspondence should be addressed: Wenchao Sun.

Remote Sens. 2025, 17(6), 967; https://doi.org/10.3390/rs17060967

Submission received: 17 December 2024 / Revised: 9 February 2025 / Accepted: 6 March 2025 / Published: 9 March 2025

(This article belongs to the Special Issue Machine Learning and Automation in Remote Sensing Applied in Hydrological Processes)


Abstract

Satellite- and reanalysis-based precipitation products play a crucial role in overcoming the scarcity of ground-based observations. These products are widely used in hydrometeorological research, particularly in data-scarce regions such as the Qinghai–Tibetan Plateau (QTP). This study proposes an ensemble streamflow simulation method that uses remote sensing precipitation data as input. A one-dimensional Convolutional Neural Network (1D CNN) integrates streamflow simulations from multiple models, and a Shapley Additive exPlanations (SHAP) interpretability analysis is used to examine the contribution of each individual model to the ensemble streamflow simulation. The method is demonstrated using GPM IMERG (Global Precipitation Measurement Integrated Multi-satellite Retrievals) precipitation data for streamflow estimation in the region upstream of the Ganzi gauging station in the Yalong River basin of the QTP for the period 2010–2019. Streamflow simulations were carried out using models with diverse structures: the physically based BTOPMC (Block-wise use of TOPMODEL) and two machine learning models, Random Forest (RF) and Long Short-Term Memory neural network (LSTM). Three ensemble approaches were then compared: the Simple Average Method (SAM), the Weighted Average Method (WAM), and the proposed 1D CNN method. For the individual models, the Kling–Gupta Efficiency (KGE) values during the validation period were 0.66 for BTOPMC, 0.71 for RF, and 0.74 for LSTM. Among the ensemble approaches, the validation period KGE values for SAM, WAM, and the 1D CNN-based nonlinear method were 0.74, 0.73, and 0.82, respectively, indicating that the nonlinear 1D CNN approach achieved the highest accuracy.
The SHAP-based interpretability analysis further showed that RF made the largest contribution to the ensemble simulation, while LSTM contributed the least. These findings suggest that the proposed 1D CNN ensemble simulation framework has considerable potential to improve streamflow estimates driven by remote sensing precipitation data and may provide new insight into how deep learning methods can advance the application of remote sensing in hydrological research.

Keywords:

GPM IMERG; Qinghai–Tibetan Plateau; machine learning; ensemble simulation; interpretability

1. Introduction

The reliable simulation of streamflow within a watershed is fundamental to addressing a wide range of hydrological challenges, including flood forecasting, drought assessment, and effective water resource management [1,2]. Such simulations not only provide critical support for practical decision-making but also deepen our understanding of the hydrological processes at play within the watershed [3]. Precipitation is one of the primary driving inputs of hydrological models, and a lack of such data can significantly reduce simulation accuracy [4]. Traditionally, precipitation data are obtained from ground-based measurement stations [5]. However, in high-altitude and cold regions such as the Qinghai–Tibet Plateau (QTP), establishing measurement stations is often difficult, and the limited number of existing stations cannot adequately monitor precipitation, which constrains our understanding of regional meteorological and hydrological processes and hinders effective water resource management [6,7]. With the advancement of satellite remote sensing technology, satellite-based precipitation products have effectively addressed the limitations of sparse in situ gauge data [8,9]. Global Precipitation Measurement Integrated Multi-Satellite Retrievals (GPM IMERG), which succeeds the Tropical Rainfall Measuring Mission (TRMM), is now a widely used global remote sensing precipitation dataset [10]. Compared with its predecessor TRMM, it offers broader coverage, smaller errors, and lower uncertainty [11,12]. Previous studies have demonstrated that GPM IMERG can accurately capture the spatial patterns of precipitation across mainland China [13,14], Austria [15], and Pakistan [4].
Over the past decades, remote sensing precipitation products, including GPM IMERG, have gradually become an important data source for driving hydrological models, playing a positive role in flood forecasting, drought warning, and streamflow simulation in data-scarce regions [16]. Many countries and research institutions have developed satellite precipitation products with wide spatial coverage, long temporal continuity, and high spatiotemporal resolution. Although these products are now widely used in hydrological modelling [17,18], their errors are larger than those of ground measurements; the influence of these errors on hydrological modeling cannot be ignored and must be addressed to ensure reliable streamflow estimates [8].

Traditionally, streamflow simulation has relied on physically based hydrological models, which are constructed using scientific assumptions and an in-depth understanding of the actual hydrological processes within a watershed [19]. While these models possess a clear physical basis, they are unable to fully capture all hydrological phenomena within a watershed, often showing significant variability when simulating hydrological behavior [20]. Furthermore, due to differences in design focus and assumptions, different physically based models may exhibit sensitivity to specific aspects of the hydrological cycle, leading to inconsistencies in the predictions of the same hydrological variables [21]. Machine learning models, on the other hand, demonstrate strong potential in data mining, especially for hydrological processes where internal mechanisms remain poorly understood, enabling the discovery of previously unknown information [22]. Data-driven models such as Artificial Neural Networks (ANNs), Support Vector Machines (SVMs), and Long Short-Term Memory (LSTM) networks have been widely used in time series forecasting [23,24,25]. Compared to traditional models, these approaches feature flexible and robust structures capable of capturing both the linear and nonlinear relationships between inputs and outputs, making them particularly effective for time series predictions in hydrology [26,27]. However, applying machine learning models in prediction often requires long time series data for training to achieve high simulation accuracy [28], which is impractical in data-scarce regions. Moreover, even with sufficient time series data, machine learning models are prone to overfitting during training, which can degrade their performance during the prediction phase [29]. 
Imperfect model structures and parameterization methods in both physically based and data-driven models cause simulation errors and uncertainty [30,31]. The problem becomes more complex when satellite precipitation data are used as input, because input data errors interact with model structure and parameters [19].

No single model can comprehensively capture all the natural hydrological processes within a watershed. Different models emphasize different aspects of the hydrological cycle, and combining the strengths of physically based models and machine learning approaches may further enhance the accuracy and reliability of predictions [32,33]. Common ensemble modeling techniques include linear and nonlinear ensemble methods [34,35]. Linear methods comprise non-weighted, weighted, and regression approaches, while nonlinear methods include neural networks, fuzzy theory-based systems, and genetic algorithms [36,37]. The stacking ensemble forecasting method combines simulation results from multiple models to improve prediction accuracy [38]. It operates on two levels: level 0 and level 1. At level 0, the initial training data are used to train multiple models, each of which generates its own predictions [39]. These predictions are then used as inputs to a meta-model, which produces the level 1 output. Cavadias and Morin [40] were among the first to use weighted averaging and multiple linear regression to integrate results from different hydrological models, achieving better accuracy than any single model alone. Humphrey [26] used soil water simulated by the GR4J hydrological model as the initial soil moisture condition of the watershed and then predicted streamflow using an ANN, GR4J, and a hybrid model combining ANN and GR4J. Their results showed that the hybrid model integrated the strengths of both approaches and performed particularly well under high-flow conditions compared with the single models. Li [41] predicted the monthly flow of the Three Gorges Reservoir using Elastic Net Regression (ENR), Support Vector Regression (SVR), RF, and eXtreme Gradient Boosting (XGB). Based on these predictions, they proposed a new ensemble forecasting approach, the modified stacking ensemble strategy.
This approach had a simple structure and high computational efficiency, and showed significant improvements in streamflow prediction accuracy. These studies collectively highlight the broad applicability of ensemble forecasting techniques in the field of hydrological modeling, showcasing their potential to enhance the precision and reliability of hydrological simulations. In the context of using remote sensing precipitation products as the driving data of hydrological modelling, an ensemble simulation of multiple models may reduce the sensitivity of flow simulations to observational errors in remote sensing rainfall data by combining the diverse error-response characteristics of different models.

Using machine learning models to ensemble multiple model simulations can better exploit the strengths of each model, thereby improving streamflow simulation within watersheds. However, machine learning models are often considered “black-box” models that rely solely on features learned from input data to make predictions. They provide no insight into their internal workings, leaving users unaware of what knowledge the model has derived from the input data. This lack of transparency poses potential risks when decisions are based solely on model outputs [42,43]. Moreover, machine learning models may perform well during training but poorly during testing, which indicates overfitting and raises concerns about the reliability of their simulation results [44]. Therefore, an interpretability analysis of machine learning ensemble simulations is crucial. One common approach is Shapley Additive exPlanations (SHAP) [45], a game theory-based global sensitivity analysis method that can explain the influence of each input variable on the model’s output from a global perspective [46]. SHAP builds on the Shapley value, which was introduced in game theory by Shapley [47] to determine the contribution of each participant in a cooperative game. SHAP has demonstrated its effectiveness in feature analysis across multiple domains, such as medicine [48], automotive engineering [49], materials science [50], hydrological modelling [51,52], and water environment research [53,54]. It can provide both global explanations (measuring the average impact of features on the entire model) and local explanations (calculating feature contributions for individual samples) [44]. One advantage of SHAP is that it is model-agnostic: it applies to various machine learning models and does not depend on a specific model structure [51,55].

Based on this understanding, and in order to reduce the influence of errors in remote sensing precipitation products on streamflow estimates obtained through rainfall-runoff modelling, this study developed an ensemble streamflow simulation method rooted in deep learning. A one-dimensional Convolutional Neural Network (1D CNN) integrates streamflow simulations from multiple models to account for the interaction between satellite observation errors and model structure in streamflow estimation. Meanwhile, a SHAP interpretability analysis was conducted to examine the contribution of each individual model to the ensemble streamflow simulation. The method is demonstrated through a case study in the region upstream of the Ganzi gauging station in the Yalong River basin of the QTP, estimating streamflow for the period 2010–2019. The findings from this study are expected to highlight the potential of the proposed 1D CNN ensemble simulation framework to reduce uncertainty in streamflow estimates driven by remote sensing precipitation data and to provide new insight into how deep learning methods can advance remote sensing applications in hydrological research.

2. Materials and Methods

2.1. Study Area

The Yalong River originates from the southern slopes of the Bayan Har Mountains on the QTP, flows through Qinghai Province into Sichuan Province, China, and ultimately joins the Jinsha River as its largest tributary. This study focuses on hydrological simulation in the upstream area of the Ganzi Hydrological Station, as shown in Figure 1. The total length of the mainstream is around 690 km, the whole basin area is 32,535 km², and there is a total elevation drop of about 2700 m, with the river flowing in a northwest-to-southeast direction. Precipitation in the basin increases progressively from north to south, with long-term annual averages ranging from 600 mm to 800 mm. The temperature within the basin decreases from south to north, with annual average temperatures ranging from −4.9 °C in the coldest northern regions to 7.8 °C in the warm southern areas. The dry and wet seasons are distinctly demarcated, with the wet season occurring from May to October and accounting for 80% of the average annual runoff [56].

2.2. Stacking Ensemble Method

The stacking ensemble method combines the predictions of multiple models to improve simulation accuracy. It consists of two layers of learners. In the first layer, multiple models are trained on the initial training dataset, and each generates predictions of the response variable. These predictions are then used as inputs to the second-layer learner, which can be either a linear or a nonlinear ensemble technique. Common linear methods include the simple average method (SAM) and the weighted average method (WAM), while nonlinear methods may involve machine learning models such as SVR. In this study, the physically based distributed model BTOPMC and the RF and LSTM models serve as the first-layer base learners. For the second layer, both linear ensemble methods and the nonlinear 1D CNN model are employed. Additionally, SHAP is used to perform an interpretability analysis of the 1D CNN ensemble simulation.
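The two-layer structure can be illustrated with a minimal sketch, assuming NumPy and synthetic data. Here an ordinary least-squares meta-learner stands in for the second layer (a linear stacking example, not the 1D CNN used in this study), and the three first-layer series are hypothetical stand-ins for the BTOPMC, RF, and LSTM simulations.

```python
import numpy as np


def fit_linear_meta_learner(base_preds: np.ndarray, observed: np.ndarray) -> np.ndarray:
    """Fit second-layer coefficients [intercept, w_1, ..., w_m] by least squares.

    base_preds: (n_days, m) array, one column per first-layer model.
    observed:   (n_days,) array of observed streamflow.
    """
    design = np.column_stack([np.ones(len(observed)), base_preds])
    coef, *_ = np.linalg.lstsq(design, observed, rcond=None)
    return coef


def predict_linear_meta_learner(coef: np.ndarray, base_preds: np.ndarray) -> np.ndarray:
    """Apply the fitted second layer to first-layer predictions."""
    design = np.column_stack([np.ones(base_preds.shape[0]), base_preds])
    return design @ coef


# Synthetic example: three hypothetical first-layer simulations of one observed series.
rng = np.random.default_rng(42)
observed = 100.0 + 50.0 * np.sin(np.linspace(0.0, 6.0 * np.pi, 730))
base_preds = np.column_stack([
    0.8 * observed + rng.normal(0.0, 5.0, 730),   # model biased low
    observed + rng.normal(0.0, 10.0, 730),        # unbiased but noisy model
    1.1 * observed + rng.normal(0.0, 5.0, 730),   # model biased high
])
coef = fit_linear_meta_learner(base_preds, observed)
ensemble = predict_linear_meta_learner(coef, base_preds)
```

In practice the second layer should be fitted on first-layer predictions that were not used to train the first-layer models (for example, out-of-fold predictions); otherwise the meta-learner learns from overly optimistic inputs.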

2.2.1. Base-Learner Models

BTOPMC is a physically based hydrological model that applies the TOPMODEL framework in a block-wise manner and uses the Muskingum–Cunge method for streamflow routing. Developed by Takeuchi [57] at the University of Yamanashi, Japan, this semi-distributed, grid-based model divides the watershed into several sub-basins (blocks) according to grid-cell heterogeneity and watershed topography [10]. BTOPMC has been widely applied in streamflow simulation, flood forecasting, and other fields [58,59]. In this study, 17 key parameters that influence BTOPMC streamflow simulation were selected, as shown in Table 1.
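For readers unfamiliar with the routing step, the sketch below shows the classical Muskingum scheme on which Muskingum–Cunge builds, assuming NumPy. In the classical form, the storage constant K and weighting factor X are fixed inputs; Muskingum–Cunge instead derives them from channel properties such as reach length, slope, and wave celerity. That derivation, and BTOPMC's own implementation, are not shown here, and the parameter values in the example are hypothetical.

```python
import numpy as np


def muskingum_route(inflow, K, X, dt, initial_outflow=None):
    """Route an inflow hydrograph through one reach with the classical Muskingum method.

    inflow: inflow series (m^3/s) at a fixed time step dt
    K:      storage constant (same time unit as dt)
    X:      weighting factor between inflow and outflow (0 <= X <= 0.5)
    """
    denom = 2.0 * K * (1.0 - X) + dt
    c0 = (dt - 2.0 * K * X) / denom
    c1 = (dt + 2.0 * K * X) / denom
    c2 = (2.0 * K * (1.0 - X) - dt) / denom   # c0 + c1 + c2 == 1
    inflow = np.asarray(inflow, dtype=float)
    outflow = np.empty_like(inflow)
    # Assume steady state at the start unless an initial outflow is given.
    outflow[0] = inflow[0] if initial_outflow is None else initial_outflow
    for t in range(1, len(inflow)):
        outflow[t] = c0 * inflow[t] + c1 * inflow[t - 1] + c2 * outflow[t - 1]
    return outflow


# Hypothetical triangular flood wave routed with K = 12 h, X = 0.2, dt = 6 h.
inflow = np.concatenate([
    np.full(5, 10.0),
    np.linspace(10.0, 110.0, 11),   # rising limb
    np.linspace(105.0, 10.0, 20),   # falling limb
    np.full(40, 10.0),
])
outflow = muskingum_route(inflow, K=12.0, X=0.2, dt=6.0)
```

With these values all three coefficients are positive, so the routed peak is lower than and arrives later than the inflow peak (attenuation and translation).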

BTOPMC was calibrated using the Generalized Likelihood Uncertainty Estimation (GLUE) method with 10,000 parameter sets drawn by Latin Hypercube Sampling. The Kling–Gupta Efficiency (KGE), given in Equations (1)–(3), was chosen as the likelihood function. Previous research suggests that KGE may be superior to the widely used Nash–Sutcliffe Efficiency (NSE) for evaluating rainfall-runoff models, because KGE accounts for multiple aspects of model performance (correlation, variability, and bias), whereas NSE primarily emphasizes high-flow performance [60]. KGE is computed as follows:

L_{y}(\theta \mid Y) = \mathrm{KGE} = 1 - \sqrt{(r - 1)^{2} + (\alpha - 1)^{2} + (\beta - 1)^{2}}

(1)

\alpha = \frac{\sigma_{sim}}{\sigma_{obs}}

(2)

\beta = \frac{Q_{sim}}{Q_{obs}}

(3)

where L_y(θ|Y) is the likelihood of a specific parameter set θ given the observations Y; r is the linear (Pearson) correlation coefficient between simulated and observed streamflow; α is the ratio of the standard deviation of simulated streamflow (σ_sim) to that of observed streamflow (σ_obs), measuring relative variability; and β is the ratio of the mean simulated (Q_sim) to the mean observed daily streamflow (Q_obs), representing bias. Parameter sets with a KGE above the threshold of 0.5 were treated as behavioral, and at each time step the 50th percentile (median) of the streamflow simulated by the behavioral parameter sets was used as the BTOPMC simulation.
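The calibration procedure described above (Latin Hypercube Sampling, KGE as the likelihood, a behavioral threshold of 0.5, and the per-time-step median of the behavioral simulations) can be sketched as follows, assuming NumPy. BTOPMC itself is not reproduced: a hypothetical two-parameter linear reservoir (`toy_linear_reservoir`, with recession coefficient k and runoff coefficient c) stands in for it, the strict "> 0.5" comparison is an assumption, and the example draws 2,000 rather than 10,000 samples only to keep it fast.

```python
import numpy as np


def kge(sim, obs):
    """Kling-Gupta Efficiency, Equations (1)-(3)."""
    sim, obs = np.asarray(sim, dtype=float), np.asarray(obs, dtype=float)
    r = np.corrcoef(sim, obs)[0, 1]
    alpha = sim.std() / obs.std()
    beta = sim.mean() / obs.mean()
    return 1.0 - np.sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2)


def latin_hypercube(n_samples, bounds, rng):
    """One sample per equal-probability stratum in every parameter dimension."""
    bounds = np.asarray(bounds, dtype=float)            # shape (n_params, 2)
    n_params = bounds.shape[0]
    strata = np.column_stack([rng.permutation(n_samples) for _ in range(n_params)])
    u = (strata + rng.random((n_samples, n_params))) / n_samples
    return bounds[:, 0] + u * (bounds[:, 1] - bounds[:, 0])


def toy_linear_reservoir(precip, k, c):
    """Hypothetical stand-in model: storage gains c*P and releases k*storage each step."""
    storage, flow = 0.0, np.empty(len(precip))
    for t, p in enumerate(precip):
        storage += c * p
        flow[t] = k * storage
        storage -= flow[t]
    return flow


def glue(precip, obs, bounds, n_samples=10_000, threshold=0.5, seed=0):
    """Return the median behavioral simulation plus behavioral parameters and scores."""
    rng = np.random.default_rng(seed)
    params = latin_hypercube(n_samples, bounds, rng)
    sims = np.array([toy_linear_reservoir(precip, k, c) for k, c in params])
    scores = np.array([kge(s, obs) for s in sims])
    behavioral = scores > threshold
    if not behavioral.any():
        raise ValueError("no behavioral parameter sets; widen bounds or sample more")
    median_sim = np.percentile(sims[behavioral], 50, axis=0)
    return median_sim, params[behavioral], scores[behavioral]


rng = np.random.default_rng(1)
precip = rng.gamma(0.5, 8.0, 365) * (rng.random(365) < 0.4)   # synthetic daily rain (mm)
obs = toy_linear_reservoir(precip, k=0.3, c=0.6)              # synthetic "observations"
median_sim, behavioral_params, behavioral_scores = glue(
    precip, obs, bounds=[(0.05, 0.9), (0.2, 1.0)], n_samples=2_000)
```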

RF is an ensemble learning technique that improves prediction accuracy by combining multiple decision trees [61]. Each tree is trained independently on a random (bootstrap) sample of the data and considers a random subset of features at each split, which introduces diversity into the ensemble. The final RF prediction aggregates the predictions of all individual trees, by majority voting for classification or by averaging for regression tasks such as streamflow simulation. Because the trees are trained independently, RF supports parallel computation and can handle large datasets. It can effectively process high-dimensional data and features with complex interactions. Additionally, RF provides feature-importance measures that quantify the contribution of each feature to the predictions, helping researchers identify the key features for the predictive task.

LSTM is a specialized type of Recurrent Neural Network (RNN) designed to learn long-term dependencies [23]. Traditional RNNs are limited by vanishing or exploding gradients during training, which impairs their ability to capture long-term relationships. LSTM overcomes this problem with a “gating mechanism” that enables the network to learn and retain information over extended periods. At the core of an LSTM network are its memory cells, which regulate the flow of information through three gates: the input gate, the forget gate, and the output gate. The input gate controls which new information is written to the memory cell, the forget gate determines which information is discarded from it, and the output gate decides which part of the cell state is exposed as the hidden state passed to the next time step and the next layer. These gates allow the network to selectively retain or update its memory according to the task, enabling LSTM to capture long-term dependencies in sequential data effectively.
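The gate equations can be made concrete with a single forward step of a standard LSTM cell, written in NumPy as a minimal sketch. The stacked gate order (input, forget, candidate, output) is an assumption of this sketch, and the weights are random and untrained. The standard cell uses tanh in the two places marked `activation`; if the `activation: relu` setting reported below refers to this argument (as in Keras's LSTM layer), ReLU would replace tanh there.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_cell_step(x_t, h_prev, c_prev, W, U, b, activation=np.tanh):
    """One time step of a standard LSTM cell.

    x_t: (D,) input at time t; h_prev, c_prev: (H,) previous hidden and cell states.
    W: (4H, D), U: (4H, H), b: (4H,). Gate order assumed here: input, forget, candidate, output.
    """
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[:H])               # input gate: how much new information to write
    f = sigmoid(z[H:2 * H])          # forget gate: how much of the old cell state to keep
    g = activation(z[2 * H:3 * H])   # candidate values for the cell state
    o = sigmoid(z[3 * H:])           # output gate: how much of the cell state to expose
    c_t = f * c_prev + i * g
    h_t = o * activation(c_t)
    return h_t, c_t


# Run over a hypothetical 5-day window of 3 forcing variables (precipitation, temperature, PET).
rng = np.random.default_rng(0)
D, H = 3, 4
W, U, b = rng.normal(0, 0.1, (4 * H, D)), rng.normal(0, 0.1, (4 * H, H)), np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x_t in rng.normal(size=(5, D)):
    h, c = lstm_cell_step(x_t, h, c, W, U, b)
```

Setting the forget gate fully open and the input gate fully closed leaves the cell state unchanged from one step to the next, which is the mechanism that lets information persist over long sequences.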

Daily meteorological forcing data, including precipitation, temperature, and potential evapotranspiration, were used as input features for both the RF and LSTM models, with daily streamflow records as the training targets. A mean squared error loss function and 10-fold cross-validation were used to determine the optimal input configuration for each model. Six input configurations were built from sequential time steps, with the earliest included day ranging from t (day t only) to t − 5; for example, the t − 5 configuration includes data from day t − 5 through day t. Extending the input window in this way tests whether longer temporal ranges capture the lag effect on streamflow. Hyperparameters for each model were determined by trial and error combined with GridSearchCV. For the RF model, the number of trees (n_estimators) was set to 500; the maximum depth of each tree (max_depth) was None, i.e., no depth limit; the minimum number of samples required to split an internal node (min_samples_split) was 10; the minimum number of samples required at a leaf node (min_samples_leaf) was 4; and the number of features considered when looking for the best split (max_features) was “sqrt”. For the LSTM model, the hyperparameters were: units: 50; epochs: 50; batch_size: 32; dropout_rate: 0.01; activation: relu; and learning_rate: 0.001.
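The lagged input configurations can be built as in the following minimal NumPy sketch: a flat matrix for RF and a three-dimensional (samples, time steps, variables) array for LSTM. The function names and the toy forcing array are hypothetical; the study's own preprocessing code is not given.

```python
import numpy as np


def make_lagged_features(forcing, max_lag):
    """Flat lag matrix for RF: row k holds days t, t-1, ..., t-max_lag for target day t = k + max_lag.

    forcing: (n_days, n_vars) array, e.g. columns [precipitation, temperature, PET].
    Returns (n_days - max_lag, n_vars * (max_lag + 1)); column blocks ordered lag 0, 1, ..., max_lag.
    """
    forcing = np.asarray(forcing, dtype=float)
    n_days = forcing.shape[0]
    return np.hstack([forcing[max_lag - lag:n_days - lag] for lag in range(max_lag + 1)])


def make_sequences(forcing, max_lag):
    """3D input for an LSTM: (samples, max_lag + 1 time steps, n_vars), oldest day first."""
    forcing = np.asarray(forcing, dtype=float)
    n_days = forcing.shape[0]
    return np.stack([forcing[max_lag - lag:n_days - lag] for lag in range(max_lag, -1, -1)], axis=1)


# The "t - 4" configuration selected in this study would correspond to max_lag = 4.
forcing = np.arange(30, dtype=float).reshape(10, 3)   # 10 hypothetical days, 3 variables
X_rf = make_lagged_features(forcing, max_lag=2)
X_lstm = make_sequences(forcing, max_lag=2)
# The matching targets would be streamflow[max_lag:].
```

With scikit-learn, the flat matrix could then be passed to `RandomForestRegressor(n_estimators=500, max_depth=None, min_samples_split=10, min_samples_leaf=4, max_features="sqrt")`, which mirrors the RF settings reported above.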

2.2.2. Ensemble Simulation Method

This study developed a nonlinear ensemble simulation scheme based on a 1D CNN, with satellite precipitation data as the driving input. CNNs are highly regarded for their powerful feature extraction capabilities, which make them a preferred choice for tasks including image recognition, time series analysis, and short-term precipitation forecasting [62]. Although CNNs are often associated with two-dimensional (2D) architectures, which are extensively used in image classification, one-dimensional (1D-CNN) and three-dimensional (3D-CNN) variants also exist [63] and are applied in a wide range of real-world engineering scenarios. Regardless of the dimensionality of the input data, all CNNs share the same fundamental principles and operational framework; the key distinction lies in how the filters (feature detectors) slide over the data, which depends on its dimensions. In this study, a 1D CNN was designed for ensemble simulation because it processes one-dimensional input sequences effectively. Using GPM IMERG precipitation data as input, streamflow was first estimated with the BTOPMC, RF, and LSTM models. The streamflow time series simulated by these three models were then used as input features of the 1D CNN for streamflow estimation. The 1D CNN hyperparameters were determined by trial and error with 10-fold cross-validation and set as follows: epochs: 200; filters: 64; batch_size: 64; kernel_size: 3; learning_rate: 0.1; and dropout_rate: 0.1.
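To show how a 1D convolution slides over the three base-model series, the following NumPy sketch runs an untrained forward pass: sliding windows of the BTOPMC, RF, and LSTM simulations (three input channels), one convolution layer with 64 filters and kernel size 3, ReLU, global average pooling, and a dense output. Only the filter count and kernel size come from the text; the window length, pooling, and output layer are assumptions, and dropout and training are omitted.

```python
import numpy as np


def make_windows(base_sims, window):
    """Stack sliding windows of the base-model series: (n_days - window + 1, window, n_models)."""
    base_sims = np.asarray(base_sims, dtype=float)
    return np.stack([base_sims[s:s + window] for s in range(base_sims.shape[0] - window + 1)])


def conv1d_valid(x, kernels, bias):
    """Stride-1, no-padding 1D convolution (cross-correlation, as in CNN layers).

    x: (length, in_channels); kernels: (kernel_size, in_channels, filters); bias: (filters,)
    Returns (length - kernel_size + 1, filters).
    """
    k = kernels.shape[0]
    out = np.empty((x.shape[0] - k + 1, kernels.shape[2]))
    for t in range(out.shape[0]):
        out[t] = np.tensordot(x[t:t + k], kernels, axes=([0, 1], [0, 1])) + bias
    return out


def ensemble_forward(window, kernels, bias, w_out, b_out):
    """Conv1D -> ReLU -> global average pooling -> dense output (one streamflow value)."""
    features = np.maximum(conv1d_valid(window, kernels, bias), 0.0)
    return float(features.mean(axis=0) @ w_out + b_out)


# Untrained, randomly initialized weights: this only illustrates shapes and data flow.
rng = np.random.default_rng(0)
base_sims = rng.gamma(2.0, 50.0, size=(365, 3))    # hypothetical BTOPMC, RF, LSTM series
windows = make_windows(base_sims, window=7)         # window length is an assumption
kernels = rng.normal(0, 0.05, size=(3, 3, 64))      # kernel_size 3, 3 input channels, 64 filters
bias, w_out = np.zeros(64), rng.normal(0, 0.05, 64)
prediction = ensemble_forward(windows[0], kernels, bias, w_out, b_out=0.0)
```

In a deep learning framework the same layer would be a single built-in 1D convolution; the explicit loop here only makes the sliding-filter operation visible.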

SHAP is based on the Shapley value, which was introduced in game theory by Shapley [47] to determine the contribution of each participant in a cooperative game. In this study, SHAP is used to evaluate the contribution of each individual streamflow simulation to the ensemble simulation produced by the 1D CNN. Its core principle is to allocate the overall payoff of the game according to the average marginal contribution of each participant. In a machine learning context, the input features of the model are treated as participants in a cooperative game, and the model’s prediction represents the total payoff. The SHAP value of each input feature quantifies the contribution of that feature to the model’s prediction. The explanation model can be expressed as shown in Equation (4):

G(t) = \phi_{0} + \sum_{j=1}^{n} \phi_{j} t_{j}

(4)

where G(t) is the additive explanation model of a single prediction of the ensemble model F, φ_0 is the base value (the expected model output), n denotes the number of input features, and t = (t_1, …, t_n) ∈ {0, 1}^n is the simplified input vector: features used for prediction are assigned t_j = 1, while unused features are assigned t_j = 0. The parameter φ_j ∈ ℝ indicates the contribution of input feature j to the model’s output, also known as its SHAP value, which is calculated as shown in Equation (5):

\phi_{j}(F, x) = \sum_{S \subseteq N \setminus \{j\}} \frac{|S|! \, (n - |S| - 1)!}{n!} \left[ F(S \cup \{j\}) - F(S) \right]

(5)

where φ_j is the SHAP value of feature j, N = {1, 2, …, n} is the set of all feature indices, S is a subset of N that excludes j, F(S) is the model output when only the features in S are known, and x = {x₁, x₂, …, x_n} represents the input features. The symbol “∖” denotes set difference. A larger absolute SHAP value indicates a larger contribution of the corresponding feature to the model’s prediction.
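With only three base-model inputs, Equation (5) can be evaluated exactly by enumerating all coalitions. The NumPy sketch below does this under a simple assumption for F(S): features outside the coalition are replaced by fixed baseline values, so φ_0 = F(baseline). Libraries such as the `shap` package approximate these values for larger models and typically average over a background dataset rather than a single baseline. The `toy_ensemble` function is hypothetical.

```python
import itertools
import math

import numpy as np


def exact_shapley(model, x, baseline):
    """Exact Shapley values (Equation (5)) by enumerating every subset S of the other features.

    model:    callable mapping a 1D feature vector to a scalar prediction
    x:        feature vector to explain (e.g. the BTOPMC, RF, and LSTM simulations on one day)
    baseline: reference values used for features outside the coalition (an assumption)
    """
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    n = len(x)

    def value(subset):
        z = baseline.copy()
        idx = list(subset)
        z[idx] = x[idx]
        return model(z)

    phi = np.zeros(n)
    for j in range(n):
        others = [k for k in range(n) if k != j]
        for size in range(n):
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for S in itertools.combinations(others, size):
                phi[j] += weight * (value(S + (j,)) - value(S))
    return phi


def toy_ensemble(z):
    """Hypothetical nonlinear ensemble with an interaction between features 0 and 1."""
    return z[0] * z[1] + z[2] ** 2


phi = exact_shapley(toy_ensemble, x=[1.0, 2.0, 3.0], baseline=[0.0, 0.0, 0.0])
```

The interaction term is split equally between features 0 and 1, and the values satisfy the additivity property of Equation (4): φ_0 plus the sum of all φ_j equals the model output for x.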

To evaluate the feasibility of the proposed method, it was also compared with the SAM and WAM schemes. SAM is suitable when the performance of the individual model simulations differs relatively little. In this case, the final simulation is obtained by averaging the outputs of the individual models, as in Equation (6):

H(x) = \frac{1}{n} \sum_{i=1}^{n} h_{i}(x)

(6)

where n is the number of models, h_i(x) represents the simulated time series from the i-th model, and H(x) is the final simulated time series.

When the simulation performance of the individual models differs substantially, the simple averaging method may overlook important information in some models while amplifying errors in others, which could introduce additional uncertainty into the final prediction. In such cases, weighting each model’s predictions by its performance often yields more accurate results. WAM combines the model outputs by giving higher weights to models that perform better according to a chosen evaluation metric, as in Equation (7):

H(x) = \sum_{i=1}^{n} w_{i} h_{i}(x)

(7)

where w_i is the weight assigned to the i-th model, determined from its performance, and h_i(x) is the simulated output of the i-th model. The weight w_i is calculated from an objective function of_i that measures the performance of each model, such as an accuracy metric for which higher values indicate better performance (an error metric would first need to be inverted). The weights are then normalized by the sum of the objective functions of all models, as shown in Equation (8):

w_{i} = \frac{of_{i}}{of_{s}}

(8)

where of_i represents the objective function of the i-th model and of_s represents the sum of the objective functions of all models.
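Equations (6)–(8) translate directly into a few lines of NumPy, shown here as a minimal sketch. The study does not state which objective function was used for WAM; purely for illustration, the example uses the calibration-period KGE values reported in Section 3.2 (0.66, 0.81, and 0.77 for BTOPMC, RF, and LSTM), and the three-day simulations are hypothetical.

```python
import numpy as np


def simple_average(sims):
    """Equation (6): sims is (n_models, n_days); returns the per-day mean."""
    return np.mean(np.asarray(sims, dtype=float), axis=0)


def weighted_average(sims, scores):
    """Equations (7)-(8): weights proportional to a positive, higher-is-better score per model."""
    scores = np.asarray(scores, dtype=float)
    if np.any(scores <= 0):
        raise ValueError("proportional weighting needs positive scores")
    weights = scores / scores.sum()   # w_i = of_i / of_s
    return weights @ np.asarray(sims, dtype=float), weights


# Hypothetical three-day simulations from BTOPMC, RF, and LSTM (rows).
sims = np.array([[120.0, 150.0, 180.0],
                 [110.0, 160.0, 170.0],
                 [115.0, 155.0, 175.0]])
sam = simple_average(sims)
# Assumed objective function: calibration-period KGE of each model.
wam, weights = weighted_average(sims, scores=[0.66, 0.81, 0.77])
```

When all scores are equal, WAM reduces to SAM; with similar scores, as here, the two ensembles differ only slightly.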

2.3. Data Sources and Modelling Setting

The observed precipitation data (2010–2019) were obtained from the China Meteorological Data Service Centre, and the streamflow data for the Ganzi station in the upper Yalong River basin were taken from the hydrological yearbooks. GPM IMERG was chosen as the precipitation forcing for all three models in this study. IMERG combines microwave measurements from low-orbit satellites with microwave-calibrated infrared estimates from geostationary satellites, producing high-resolution precipitation estimates with a spatial resolution of 0.1° and a temporal resolution of 30 min [64] over a quasi-global domain from 60°S to 60°N. GPM IMERG is available in three versions that differ in release latency: IMERG-E, IMERG-L, and IMERG-F [15]. IMERG-E and IMERG-L are near-real-time products released approximately 4 h and 12 h after observation, respectively. IMERG-F is a post-real-time product that is calibrated with monthly ground-based precipitation data and released about 3.5 months after observation. Studies have demonstrated that IMERG-F is considerably more accurate than IMERG-E and IMERG-L. Therefore, this study uses the daily IMERG-F dataset because of its superior reliability and precision.
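The relation between IMERG's native 30-min resolution and the daily values used here can be illustrated with a short NumPy sketch. It assumes that half-hourly values are rates in mm/h (the unit of the half-hourly IMERG files) and that the series starts at the first slot of a day; IMERG-F is also distributed at daily resolution, so this step is only needed when starting from half-hourly data.

```python
import numpy as np

SLOTS_PER_DAY = 48     # 30-min time steps per day
HOURS_PER_SLOT = 0.5


def halfhourly_to_daily(rates_mm_per_h):
    """Aggregate half-hourly precipitation rates (mm/h) to daily totals (mm/day).

    rates_mm_per_h: array whose first axis is time (a whole number of days, starting at the
    first slot of a day); any further axes (e.g. lat, lon) are carried through unchanged.
    """
    rates = np.asarray(rates_mm_per_h, dtype=float)
    if rates.shape[0] % SLOTS_PER_DAY != 0:
        raise ValueError("time axis must cover whole days")
    depths = rates * HOURS_PER_SLOT   # mm accumulated in each 30-min slot
    return depths.reshape(-1, SLOTS_PER_DAY, *rates.shape[1:]).sum(axis=1)


# Two hypothetical days on a 2 x 3 grid with a constant rate of 2 mm/h.
daily = halfhourly_to_daily(np.full((96, 2, 3), 2.0))
```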

The period 2010–2019 was used for hydrological modeling, with 2010–2014 as the calibration period and 2015–2019 as the validation period. The potential evapotranspiration data used in the machine learning models were taken from GLEAM 3.8a [65], and the temperature data from the MSWX [66] products. For the remaining BTOPMC inputs, the Digital Elevation Model (DEM) was derived from the Shuttle Radar Topography Mission, the Land Use and Land Cover data were obtained from China’s Land Use Remote Sensing Mapping System Dataset, the soil data were sourced from the Chinese Soil Science Database, and the meteorological data were obtained from CRU TS [67]. Details of the data used for model construction are provided in Table 2.

2.4. Statistical Analysis

Before the GPM IMERG satellite-based precipitation product is used for streamflow estimation, its errors must be evaluated against ground-gauge data. This evaluation used the Coefficient of Determination (R², Equation (9)) and the Root Mean Square Error (RMSE, Equation (10)) to assess the accuracy of the GPM IMERG precipitation data.

R^{2} = \frac{\left[\sum_{i=1}^{n} (P_{obs,i} - P_{obs,avg})(P_{RS,i} - P_{RS,avg})\right]^{2}}{\sum_{i=1}^{n} (P_{obs,i} - P_{obs,avg})^{2} \sum_{i=1}^{n} (P_{RS,i} - P_{RS,avg})^{2}}

(9)

\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (P_{obs,i} - P_{RS,i})^{2}}

(10)

where P_obs,i is the precipitation observed at a ground station, P_RS,i is the corresponding precipitation estimated by the GPM IMERG product, P_obs,avg and P_RS,avg are their respective means, and n is the number of samples.
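Equations (9) and (10) can be computed as in the following NumPy sketch; the gauge values are hypothetical. Note that Equation (9) is the squared Pearson correlation, so it ignores systematic bias.

```python
import numpy as np


def r_squared(obs, est):
    """Equation (9): squared Pearson correlation between observed and estimated values."""
    obs, est = np.asarray(obs, dtype=float), np.asarray(est, dtype=float)
    d_obs, d_est = obs - obs.mean(), est - est.mean()
    return float((d_obs @ d_est) ** 2 / ((d_obs @ d_obs) * (d_est @ d_est)))


def rmse(obs, est):
    """Equation (10): root mean square error, in the units of the data (here mm)."""
    obs, est = np.asarray(obs, dtype=float), np.asarray(est, dtype=float)
    return float(np.sqrt(np.mean((obs - est) ** 2)))


gauge = np.array([0.0, 2.0, 5.0, 1.0])   # hypothetical daily gauge totals (mm)
doubled = 2.0 * gauge                     # perfectly correlated but biased estimate
```

The doubled estimate still yields R² = 1 while its RMSE is clearly non-zero, which is why the two indicators are reported together.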

Additionally, to assess the performance of streamflow estimates from the individual models and the different ensemble methods, KGE (Equations (1)–(3)), NSE (Equation (11)), R² (Equation (9)), the Mean Absolute Error (MAE, Equation (12)), and the Mean Absolute Percentage Error (MAPE, Equation (13)) were used as evaluation indicators:

\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{n} (Q_{obs,i} - Q_{sim,i})^{2}}{\sum_{i=1}^{n} (Q_{obs,i} - Q_{obs,avg})^{2}}

(11)

\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \lvert Q_{obs,i} - Q_{sim,i} \rvert

(12)

\mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left\lvert \frac{Q_{obs,i} - Q_{sim,i}}{Q_{obs,i}} \right\rvert

(13)

where Q_obs,i and Q_sim,i denote the observed and simulated streamflow at time step i, respectively, Q_obs,avg is the mean observed streamflow, and n is the number of time steps. These metrics are widely used to gauge the accuracy of hydrological simulations. Values of KGE, NSE, and R² closer to 1, and values of MAE and MAPE closer to 0, indicate better model performance.
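For completeness, Equations (11)–(13) can be computed as in this NumPy sketch; the flows are hypothetical. It also illustrates the reference point of NSE: simulating the mean observed flow on every day gives NSE = 0.

```python
import numpy as np


def nse(obs, sim):
    """Equation (11)."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    return float(1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2))


def mae(obs, sim):
    """Equation (12), in the units of streamflow."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    return float(np.mean(np.abs(obs - sim)))


def mape(obs, sim):
    """Equation (13), in percent; undefined when any observed value is zero."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    if np.any(obs == 0):
        raise ValueError("MAPE is undefined for zero observed flow")
    return float(100.0 * np.mean(np.abs((obs - sim) / obs)))


obs = np.array([100.0, 200.0, 300.0])   # hypothetical daily flows (m^3/s)
sim = np.array([110.0, 180.0, 300.0])
```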

3. Results

3.1. Evaluation of GPM IMERG Product Based on Ground Measurement

Before remote sensing gridded precipitation products are applied in streamflow simulations, their quality should be assessed against in situ gauge precipitation. A point-to-grid approach was used to evaluate the GPM IMERG product, with R² and RMSE as performance indicators. Figure 2 shows scatterplots of GPM IMERG versus gauge precipitation at the daily and monthly scales. For daily precipitation, R² is 0.31 and RMSE is 4.09 mm; at the monthly scale, they are 0.88 and 26.01 mm, respectively. The daily accuracy is generally comparable to that of PERSIANN-CDR, 3B42RT, and 3B42 in the upper Yalong River basin, as estimated by Alazzy et al. [68]. These results indicate that the GPM IMERG product generally captures the variations in precipitation, particularly at the monthly scale.
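One common way to implement a point-to-grid comparison is to pair each gauge with the grid cell that contains it. The NumPy sketch below assumes a regular grid described by its lower-left edge and resolution (0.1° by default) and a (day, lon, lat) axis order; both conventions are hypothetical and should be checked against the actual files, and the study does not specify whether nearest-cell pairing or interpolation was used.

```python
import numpy as np


def grid_cell_index(lon, lat, lon_min=-180.0, lat_min=-90.0, res=0.1):
    """Index (i_lon, i_lat) of the regular-grid cell containing a point; origin is the lower-left edge."""
    i_lon = int(np.floor((lon - lon_min) / res))
    i_lat = int(np.floor((lat - lat_min) / res))
    return i_lon, i_lat


def point_to_grid_pairs(gauge_series, grid, lon, lat, **grid_spec):
    """Pair a gauge series with the series of the grid cell containing the gauge.

    grid: (n_days, n_lon, n_lat) array of gridded daily precipitation.
    Days where either value is NaN are dropped.
    """
    i_lon, i_lat = grid_cell_index(lon, lat, **grid_spec)
    gauge = np.asarray(gauge_series, dtype=float)
    cell = np.asarray(grid, dtype=float)[:, i_lon, i_lat]
    valid = ~(np.isnan(gauge) | np.isnan(cell))
    return gauge[valid], cell[valid]
```

The paired series can then be passed directly to the R² and RMSE definitions in Equations (9) and (10), and summed by month for the monthly comparison.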

3.2. Streamflow Simulations of Individual Models

Table 3 summarizes the daily simulation results for the different input combinations of RF and LSTM. Both models initially benefited from including additional time steps, but LSTM’s simulation accuracy started to decline when t − 5 was included. This suggests that overly extending the input time range introduces noise or redundancy that may degrade model performance. Based on these findings, both the RF and LSTM models were applied for streamflow estimation using data up to t − 4 as inputs. This choice gives the models sufficient temporal context to make accurate predictions without overburdening them with excessive input data. Note that the NSE values of RF and LSTM are lower during the validation period than during the calibration period, which indicates a certain degree of overfitting. This phenomenon was also observed in another study on streamflow estimation [69]. Possible overfitting is a limitation of relying on a single machine learning model, which motivates ensemble streamflow estimation that combines multiple models with different structures.

Figure 3, Figure 4 and Figure 5 illustrate the simulated hydrographs of the BTOPMC, RF, and LSTM models, and Table 4 summarizes the statistical metrics that quantify their simulation accuracy. The hydrograph comparisons show that the two machine learning models, RF and LSTM, reproduce streamflow dynamics better than the physically based BTOPMC model. During the calibration period, RF attained the highest KGE (0.81) of the three models, followed by LSTM (0.77) and BTOPMC (0.66). In the validation period, LSTM performed best, with a KGE of 0.74, followed by RF (0.71) and BTOPMC (0.66).
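KGE is used here as the summary metric because it combines correlation, variability and bias in a single score. The following is a minimal sketch assuming the original formulation KGE = 1 − √((r − 1)² + (α − 1)² + (β − 1)²). In this formulation, r is the linear correlation, α = σ_sim/σ_obs and β = μ_sim/μ_obs. The exact variant used by the authors is defined in Section 2.4 and may differ, for example a modified KGE based on coefficients of variation. NSE is included for comparison.

```python
import math


def _mean(xs):
    return sum(xs) / len(xs)


def _std(xs):
    """Population standard deviation."""
    m = _mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))


def kge(obs, sim):
    """Kling-Gupta Efficiency from correlation, variability and bias ratios."""
    mo, ms = _mean(obs), _mean(sim)
    so, ss = _std(obs), _std(sim)
    # Pearson correlation using population moments
    r = sum((o - mo) * (s - ms) for o, s in zip(obs, sim)) / (len(obs) * so * ss)
    alpha = ss / so  # variability ratio
    beta = ms / mo   # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


def nse(obs, sim):
    """Nash-Sutcliffe Efficiency: 1 minus error variance over observed variance."""
    mo = _mean(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mo) ** 2 for o in obs)
    return 1.0 - num / den


obs = [1.0, 2.0, 3.0, 4.0]
print(kge(obs, obs))                     # 1.0 for a perfect simulation
print(kge(obs, [2 * q for q in obs]))    # 1 - sqrt(2), about -0.414
```

A simulation that is perfectly correlated but doubled in magnitude still loses substantial KGE through α and β. NSE would penalize the same error differently, which is why the two metrics can rank models differently.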

RF showed a clear reduction in performance during the validation period (KGE decreased from 0.81 to 0.71 and NSE from 0.84 to 0.55), suggesting that it may have overfit the training data. In contrast, LSTM performed more consistently across the calibration and validation periods (KGE of 0.77 and 0.74), highlighting its robustness and ability to capture complex temporal dynamics. Despite their higher accuracy, the machine learning models operate as black boxes, which introduces a certain degree of uncertainty because their internal processes are not transparent [70]. Although the physically based BTOPMC model performed worse than the machine learning models, it offers interpretability and lower structural uncertainty. This can make BTOPMC more reliable for long-term projections under changing environmental conditions, such as climate change and land-use change. Additionally, physically based models are less prone to overfitting and provide valuable insight into intermediate hydrological processes, which is crucial for understanding the behavior of the water cycle [71].

3.3. Comparisons of Ensemble Simulations Among 1D CNN, SAM, and WAM

Figure 6, Figure 7 and Figure 8 illustrate the simulated hydrographs of streamflow predictions generated using three ensemble methods: the SAM, the WAM, and the 1D CNN nonlinear ensemble method. Table 5 provides a detailed summary of the statistical metrics used to assess the simulation accuracy of these ensemble approaches.

A closer examination of Figure 6, Figure 7 and Figure 8 shows that the 1D CNN nonlinear ensemble method produces streamflow simulations that align more closely with the observed data than the two linear ensemble methods. SAM and WAM perform comparably. Their KGE values are 0.74 and 0.73, respectively, in both the calibration and validation periods. Their NSE values are 0.80 and 0.80 for calibration and 0.71 and 0.70 for validation. For the other metrics (R², MAE, and MAPE), the two methods are likewise comparable (Table 5).
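The two linear ensemble methods can be sketched as follows. SAM is an equal-weight mean. The weighting rule for WAM is not given in this section, so the sketch assumes, purely for illustration, that weights are proportional to each model's calibration-period KGE. The function names are hypothetical, and the authors' actual WAM weights may be derived differently.

```python
def simple_average(sims):
    """SAM: equal-weight mean of the member simulations at each time step."""
    return [sum(vals) / len(vals) for vals in zip(*sims)]


def normalized_weights(scores):
    """Weights proportional to a skill score, rescaled to sum to 1."""
    total = sum(scores)
    return [s / total for s in scores]


def weighted_average(sims, scores):
    """WAM sketch: skill-proportional weights applied at each time step.

    Weighting by calibration KGE is an illustrative assumption.
    """
    weights = normalized_weights(scores)
    return [sum(w * v for w, v in zip(weights, vals)) for vals in zip(*sims)]


# Assumed rule: weight by calibration KGE (Table 4: BTOPMC 0.66, RF 0.81, LSTM 0.77).
print(normalized_weights([0.66, 0.81, 0.77]))  # approximately [0.29, 0.36, 0.34]
```

Under this assumed rule, the weights stay close to 1/3 because the member scores are similar. That would make WAM behave almost like SAM, which is consistent with the near-identical SAM and WAM metrics in Table 5.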

With the 1D CNN ensemble, the calibration-period NSE (0.84) and KGE (0.88) are higher than those of SAM and WAM. In the validation period, the 1D CNN achieves the highest KGE (0.82), while its NSE, R², MAE, and MAPE are similar to those of SAM and WAM. It is worth noting that both SAM and the 1D CNN outperform the individual models in the validation period. SAM matches the best individual validation KGE (0.74, LSTM) and exceeds all individual models in NSE, R², MAE, and MAPE, whereas the 1D CNN achieves the greatest KGE improvement. These results highlight the advantage of the 1D CNN in exploiting the complementary strengths of different models, improving the accuracy and reliability of streamflow simulations compared with traditional linear averaging techniques.
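The network architecture is not described in this section. The following minimal sketch only illustrates why a 1D CNN can act as a nonlinear combiner of the three member series. A convolution slides over a short time window of the BTOPMC, RF and LSTM simulations, a ReLU introduces nonlinearity, and a linear layer produces the ensemble value. The window length, kernel size, number of filters and all weights are hypothetical. In practice the weights would be learned by gradient descent in a deep learning framework rather than set by hand, and the authors' network may have more layers.

```python
def conv1d_relu(x, kernels, biases):
    """'Valid' 1D convolution over time followed by ReLU.

    x:       C input channels (one per member model), each of length L.
    kernels: F filters; each filter holds C weight rows of length k.
    biases:  F bias terms.
    Returns F feature maps, each of length L - k + 1.
    """
    n_channels, length = len(x), len(x[0])
    k = len(kernels[0][0])
    feature_maps = []
    for w, b in zip(kernels, biases):
        fmap = []
        for t in range(length - k + 1):
            # weighted sum across all member models and the k time steps in view
            z = b + sum(w[c][j] * x[c][t + j] for c in range(n_channels) for j in range(k))
            fmap.append(max(0.0, z))  # ReLU: the source of nonlinearity
        feature_maps.append(fmap)
    return feature_maps


def cnn_ensemble_value(window, kernels, biases, dense_w, dense_b):
    """Flatten the feature maps and apply a linear output layer."""
    feats = [v for fmap in conv1d_relu(window, kernels, biases) for v in fmap]
    return dense_b + sum(w * f for w, f in zip(dense_w, feats))


# Hypothetical window: 3 days of simulated flow from each member model.
window = [
    [310.0, 330.0, 360.0],  # BTOPMC
    [300.0, 345.0, 380.0],  # RF
    [305.0, 340.0, 370.0],  # LSTM
]
# Hypothetical, untrained weights: 2 filters with kernel size 2.
kernels = [
    [[0.2, 0.2], [0.3, 0.3], [0.0, 0.0]],
    [[-0.1, 0.1], [0.0, 0.0], [-0.1, 0.1]],
]
biases = [0.0, 0.0]
dense_w = [0.5, 0.5, 1.0, 1.0]  # 2 filters x 2 output positions
print(cnn_ensemble_value(window, kernels, biases, dense_w, dense_b=0.0))
```

Unlike SAM and WAM, the effective weight given to each member here depends on the inputs themselves, because filters whose weighted sum is negative are switched off by the ReLU.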

3.4. SHAP-Based Interpretability Analysis of 1D CNN Ensemble Simulation

Figure 9 illustrates the SHAP-based interpretability analysis of the 1D CNN ensemble model, which integrates three models with different structures. A positive SHAP value indicates that an input raises the model's prediction relative to the expected model output, whereas a negative SHAP value indicates that it lowers the prediction. The magnitude of the SHAP value reflects the strength of the input's contribution, with larger absolute values indicating larger contributions. The results show that all three models (RF, LSTM, and BTOPMC) made positive contributions to the final streamflow predictions when their simulated streamflow values were high. Asadi et al. [72] found that, in the high-streamflow range, the LSTM and SVR models contributed negatively to the output of their ensemble model. Their ensemble therefore applied a corrective downward adjustment to counteract the overestimations of LSTM and SVR. In our study, by contrast, higher simulated values from the individual models increased the ensemble simulation values. The SHAP analysis also quantifies the overall importance of each model within the ensemble framework (Figure 10). RF has the largest influence on the ensemble output, reflecting its strong ability to capture patterns and trends in the data. In comparison, LSTM has the smallest influence, which may be attributed to a relatively lower weight in the ensemble or to overlap between its contributions and those of the other models. Overall, integrating the three models combines their respective strengths, which contributes to the robustness and accuracy of the 1D CNN ensemble.
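SHAP attributes a prediction to its inputs using Shapley values from cooperative game theory [47]. The SHAP library approximates these values for neural networks. With only three inputs, however, they can be computed exactly by enumerating every coalition of models, as in the minimal sketch below. The sketch treats the ensemble as a function of the three member simulations at one time step. Inputs absent from a coalition are replaced by baseline values. Global importance is computed as the mean absolute Shapley value. These choices, the toy ensemble function and the baseline are assumptions for illustration; the explainer and background data used in the paper are not stated in this section.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x for a small number of inputs.

    An input absent from a coalition is replaced by its baseline value,
    so the attributions satisfy sum(phi) == f(x) - f(baseline).
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def global_importance(phis):
    """Mean absolute Shapley value of each input over all explained time steps."""
    return [sum(abs(p[i]) for p in phis) / len(phis) for i in range(len(phis[0]))]


# Hypothetical nonlinear combiner of (BTOPMC, RF, LSTM) member flows.
def toy_ensemble(q):
    return 0.3 * q[0] + 0.5 * q[1] + 0.2 * q[2] + 0.001 * q[0] * q[1]


x = [350.0, 420.0, 390.0]
base = [200.0, 200.0, 200.0]  # hypothetical reference flows
phi = shapley_values(toy_ensemble, x, base)
print(phi, sum(phi), toy_ensemble(x) - toy_ensemble(base))
```

The final print checks the additivity property. The Shapley values sum to the difference between the prediction and the baseline prediction. For this reason, members with simulated flows above the reference tend to receive positive attributions when the combiner is increasing in its inputs.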

To further analyze the contributions of individual models to the ensemble simulation across observed streamflow ranges and at each time step, the relative contribution of each model was defined as the ratio of its absolute SHAP value to the sum of the absolute SHAP values of the three models. Figure 11 shows scatterplots of observed streamflow versus the relative contribution of each model at the corresponding time step. The relative contributions of LSTM are generally lower than those of BTOPMC and RF. Between the latter two, RF contributes relatively more than BTOPMC in the middle- to high-streamflow range, whereas in the low-flow range the contribution of BTOPMC is generally higher than that of RF. This pattern is somewhat similar to the findings of Jimeno-Sáez et al. [73], who showed that the physically based SWAT model performed better in the low-streamflow range, whereas the ANN model was more accurate for high flows. This interpretability analysis suggests that the proposed 1D CNN ensemble simulation method has the potential to integrate the advantages of different models for improved streamflow estimation.
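The relative contribution defined above is c_i(t) = |φ_i(t)| / Σ_j |φ_j(t)|, where φ_i(t) is the SHAP value of model i at time step t. The minimal sketch below computes it and, as a hypothetical way to summarize plots such as Figure 11, averages it within observed-flow classes. The class thresholds and function names are illustrative assumptions, since the paper presents scatterplots rather than flow classes.

```python
def relative_contributions(phi):
    """Share of total |SHAP| attributed to each member at one time step."""
    total = sum(abs(p) for p in phi)
    if total == 0:
        return [0.0] * len(phi)  # no attribution at this time step
    return [abs(p) / total for p in phi]


def mean_share_by_flow_class(obs, phis, thresholds):
    """Average relative contributions within observed-flow classes.

    thresholds: ascending cut points (hypothetical), e.g. [low_max, mid_max].
    Returns one list of mean shares per class, or None for an empty class.
    """
    n_classes = len(thresholds) + 1
    sums = [[0.0] * len(phis[0]) for _ in range(n_classes)]
    counts = [0] * n_classes
    for q, phi in zip(obs, phis):
        k = sum(q > t for t in thresholds)  # class index 0 .. n_classes - 1
        shares = relative_contributions(phi)
        sums[k] = [a + b for a, b in zip(sums[k], shares)]
        counts[k] += 1
    return [[v / c for v in s] if c else None for s, c in zip(sums, counts)]


print(relative_contributions([1.0, -3.0, 0.0]))  # [0.25, 0.75, 0.0]
```

Using absolute values makes the measure reflect how strongly each model influences the output, regardless of whether it pushes the ensemble up or down.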

4. Conclusions

Satellite precipitation datasets are now widely used for streamflow estimation in regions lacking ground measurements. However, errors in such datasets interact with the hydrological model structure and parameterization schemes, which can increase uncertainty in model simulations. To reduce uncertainty in streamflow estimations that use satellite precipitation datasets as input, an ensemble simulation method based on the 1D CNN deep learning method was proposed. It was evaluated through a case study in the upstream region of the Ganzi gauging station in the Yalong River basin of the QTP, using the GPM IMERG precipitation dataset as input. A comparison of SAM, WAM, and the proposed deep learning ensemble method showed that, according to the comprehensive performance indicator KGE, the 1D CNN ensemble method performs best and outperforms the three individual models (validation KGE of 0.82, compared with 0.66 to 0.74). This shows its effectiveness in reducing the influence of errors in remotely sensed precipitation datasets on streamflow estimation. Furthermore, the SHAP-based analysis showed that RF contributed the most to the 1D CNN ensemble simulations, followed by BTOPMC and LSTM, which is valuable for quantifying the contribution of each model. In this proof-of-concept study, the feasibility of the proposed deep-learning-based ensemble streamflow estimation method was evaluated through a single case study using GPM IMERG precipitation data. In future research, the method should be tested in more basins with various rainfall-runoff generation characteristics.
As different models vary in their mathematical descriptions of runoff generation processes and parameterization schemes, they capture different characteristics and dynamic behaviors of the system. Incorporating more models with different structures into the proposed method would therefore also be valuable for exploring the potential of deep learning in improving ensemble hydrological simulations.

Author Contributions

J.W. and W.S. developed the methodology. J.W. developed the code and carried out the simulation and analysis. J.W., Z.L., L.Z. and W.S. wrote the original manuscript. J.W., Z.L. and C.M. processed the data and visualized the results. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (Grant nos. 52394232, 52179002).

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Devia, G.K.; Ganasri, B.P.; Dwarakish, G.S. A Review on Hydrological Models. Aquat. Procedia 2015, 4, 1001–1007.
2. Ye, A.; Deng, X.; Ma, F.; Duan, Q.; Zhou, Z.; Du, C. Integrating weather and climate predictions for seamless hydrologic ensemble forecasting: A case study in the Yalong River basin. J. Hydrol. 2017, 547, 196–207.
3. Gupta, H.V.; Beven, K.J.; Wagener, T. Model Calibration and Uncertainty Estimation. In Encyclopedia of Hydrological Sciences; John Wiley & Sons: Hoboken, NJ, USA, 2006.
4. Arshad, M.; Ma, X.; Yin, J.; Ullah, W.; Ali, G.; Ullah, S.; Liu, M.; Shahzaman, M.; Ullah, I. Evaluation of GPM-IMERG and TRMM-3B42 precipitation products over Pakistan. Atmos. Res. 2021, 249, 105341.
5. Tang, X.; Zhang, J.; Gao, C.; Ruben, G.; Wang, G. Assessing the Uncertainties of Four Precipitation Products for SWAT Modeling in Mekong River Basin. Remote Sens. 2019, 11, 304.
6. Suliman, A.H.A.; Awchi, T.A.; Al-Mola, M.; Shahid, S. Evaluation of remotely sensed precipitation sources for drought assessment in Semi-Arid Iraq. Atmos. Res. 2020, 242, 105007.
7. Sun, R.; Yuan, H.; Liu, X.; Jiang, X. Evaluation of the latest satellite–gauge precipitation products and their hydrologic applications over the Huaihe River basin. J. Hydrol. 2016, 536, 302–319.
8. Chen, J.; Li, Z.; Li, L.; Wang, J.; Qi, W.; Xu, C.-Y.; Kim, J.-S. Evaluation of Multi-Satellite Precipitation Datasets and Their Error Propagation in Hydrological Modeling in a Monsoon-Prone Region. Remote Sens. 2020, 12, 3550.
9. Lu, D.; Yong, B. A Preliminary Assessment of the Gauge-Adjusted Near-Real-Time GSMaP Precipitation Estimate over Mainland China. Remote Sens. 2020, 12, 141.
10. Sun, W.C.; Wang, J.; Li, Z.J.; Yao, X.L.; Yu, J.S. Influences of Climate Change on Water Resources Availability in Jinjiang Basin, China. Sci. World J. 2014, 2014, 908349.
11. Huffman, G.J.; Bolvin, D.T.; Nelkin, E.J.; Wolff, D.B.; Adler, R.F.; Gu, G.; Hong, Y.; Bowman, K.P.; Stocker, E.F. The TRMM Multisatellite Precipitation Analysis (TMPA): Quasi-Global, Multiyear, Combined-Sensor Precipitation Estimates at Fine Scales. J. Hydrometeorol. 2007, 8, 38–55.
12. Fang, J.; Yang, W.; Luan, Y.B.; Du, J.; Lin, A.W.; Zhao, L. Evaluation of the TRMM 3B42 and GPM IMERG products for extreme precipitation analysis over China. Atmos. Res. 2019, 223, 24–38.
13. Wang, Y.; Miao, C.; Zhao, X.; Zhang, Q.; Su, J. Evaluation of the GPM IMERG product at the hourly timescale over China. Atmos. Res. 2023, 285, 106656.
14. Zhang, Y.; Zheng, X.; Li, X.; Lyu, J.; Zhao, L. Evaluation of the GPM-IMERG V06 Final Run products for monthly/annual precipitation under the complex climatic and topographic conditions of China. J. Appl. Meteorol. Clim. 2023, 62, 929–946.
15. Sungmin, O.; Foelsche, U.; Kirchengast, G.; Fuchsberger, J.; Tan, J.; Petersen, W.A. Evaluation of GPM IMERG Early, Late, and Final rainfall estimates using WegenerNet gauge data in southeastern Austria. Hydrol. Earth Syst. Sci. 2017, 21, 6559–6572.
16. Zhou, Z.; Lu, D.; Yong, B.; Shen, Z.; Wu, H.; Yu, L. Evaluation of GPM-IMERG Precipitation Product at Multiple Spatial and Sub-Daily Temporal Scales over Mainland China. Remote Sens. 2023, 15, 1237.
17. Sun, Q.; Miao, C.; Duan, Q.; Ashouri, H.; Sorooshian, S.; Hsu, K.L. A Review of Global Precipitation Data Sets: Data Sources, Estimation, and Intercomparisons. Rev. Geophys. 2018, 56, 79–107.
18. Beck, H.E.; Wood, E.F.; Pan, M.; Fisher, C.K.; Miralles, D.G.; Van Dijk, A.I.J.M.; McVicar, T.R.; Adler, R.F. MSWEP V2 Global 3-Hourly 0.1° Precipitation: Methodology and Quantitative Assessment. Bull. Am. Meteorol. Soc. 2019, 100, 473–500.
19. Beven, K.J.; Binley, A. The future of distributed models: Model calibration and uncertainty prediction. Hydrol. Process. 1992, 6, 279–298.
20. Pechlivanidis, I.; Jackson, B.; Mcintyre, N.; Wheater, H. Catchment scale hydrological modelling: A review of model types, calibration approaches and uncertainty analysis methods in the context of recent developments in technology and applications. Glob. Nest J. 2011, 13, 193–214.
21. Beven, K.J. Rainfall-Runoff Modeling: Introduction. In Encyclopedia of Hydrological Sciences; John Wiley & Sons: Hoboken, NJ, USA, 2006.
22. Reichstein, M.; Camps-Valls, G.; Stevens, B.; Jung, M.; Denzler, J.; Carvalhais, N.; Prabhat. Deep learning and process understanding for data-driven Earth system science. Nature 2019, 566, 195–204.
23. Frame, J.M.; Kratzert, F.; Klotz, D.; Gauch, M.; Shalev, G.; Gilon, O.; Qualls, L.M.; Gupta, H.V.; Nearing, G.S. Deep learning rainfall–runoff predictions of extreme events. Hydrol. Earth Syst. Sci. 2022, 26, 3377–3392.
24. Taormina, R.; Chau, K.-W. ANN-based interval forecasting of streamflow discharges using the LUBE method and MOFIPS. Eng. Appl. Artif. Intell. 2015, 45, 429–440.
25. Yu, X.; Wang, Y.; Wu, L.; Chen, G.; Wang, L.; Qin, H. Comparison of support vector regression and extreme gradient boosting for decomposition-based data-driven 10-day streamflow forecasting. J. Hydrol. 2020, 582, 124293.
26. Humphrey, G.B.; Gibbs, M.S.; Dandy, G.C.; Maier, H.R. A hybrid approach to monthly streamflow forecasting: Integrating hydrological model outputs into a Bayesian artificial neural network. J. Hydrol. 2016, 540, 623–640.
27. Maier, H.R.; Jain, A.; Dandy, G.C.; Sudheer, K.P. Methods used for the development of neural networks for the prediction of water resource variables in river systems: Current status and future directions. Environ. Model. Softw. 2010, 25, 891–909.
28. Slater, L.J.; Arnal, L.; Boucher, M.-A.; Chang, A.Y.Y.; Moulds, S.; Murphy, C.; Nearing, G.; Shalev, G.; Shen, C.; Speight, L.; et al. Hybrid forecasting: Blending climate predictions with AI models. Hydrol. Earth Syst. Sci. 2023, 27, 1865–1889.
29. Altman, N.; Krzywinski, M. The curse(s) of dimensionality. Nat. Methods 2018, 15, 399–400.
30. Li, W.; Sankarasubramanian, A. Reducing hydrologic model uncertainty in monthly streamflow predictions using multimodel combination. Water Resour. Res. 2012, 48, W12516.
31. Xu, W.; Chen, J.; Corzo, G.; Xu, C.; Zhang, X.J.; Xiong, L.; Liu, D.; Xia, J. Coupling Deep Learning and Physically Based Hydrological Models for Monthly Streamflow Predictions. Water Resour. Res. 2024, 60, e2023WR035618.
32. Kumar, A.; Singh, R.; Jena, P.P.; Chatterjee, C.; Mishra, A. Identification of the best multi-model combination for simulating river discharge. J. Hydrol. 2015, 525, 313–325.
33. Nourani, V.; Elkiran, G.; Abdullahi, J. Multi-step ahead modeling of reference evapotranspiration using a multi-model approach. J. Hydrol. 2020, 581, 124434.
34. Gelete, G.; Nourani, V.; Gokcekus, H.; Gichamo, T. Physical and artificial intelligence-based hybrid models for rainfall-runoff-sediment process modelling. Hydrol. Sci. J. 2023, 68, 1841–1863.
35. Zounemat-Kermani, M.; Batelaan, O.; Fadaee, M.; Hinkelmann, R. Ensemble machine learning paradigms in hydrology: A review. J. Hydrol. 2021, 598, 126266.
36. Gelete, G.; Nourani, V.; Gokcekus, H.; Gichamo, T. Ensemble physically based semi-distributed models for the rainfall-runoff process modeling in the data-scarce Katar catchment, Ethiopia. J. Hydroinform. 2023, 25, 567–592.
37. Gichamo, T.; Nourani, V.; Gökçekus, H.; Gelete, G. Ensemble of artificial intelligence and physically based models for rainfall-runoff modeling in the upper Blue Nile Basin. Hydrol. Res. 2024, 55, 976–1000.
38. Yan, J.; Jia, S.; Lv, A.; Zhu, W. Water Resources Assessment of China’s Transboundary River Basins Using a Machine Learning Approach. Water Resour. Res. 2019, 55, 632–655.
39. Sun, W.; Trevor, B. A stacking ensemble learning framework for annual river ice breakup dates. J. Hydrol. 2018, 561, 636–650.
40. Cavadias, G.; Morin, G. The Combination of Simulated Discharges of Hydrological Models: Application to the WMO Intercomparison of Conceptual Models of Snowmelt Runoff. Hydrol. Res. 1986, 17, 21–32.
41. Li, Y.; Liang, Z.; Hu, Y.; Li, B.; Xu, B.; Wang, D. A multi-model integration method for monthly streamflow prediction: Modified stacking ensemble strategy. J. Hydroinform. 2019, 22, 310–326.
42. Miao, J.; Zhang, X.; Zhang, G.; Wei, T.; Zhao, Y.; Ma, W.; Chen, Y.; Li, Y.; Wang, Y. Applications and interpretations of different machine learning models in runoff and sediment discharge simulations. Catena 2024, 238, 107848.
43. Samek, W.; Wiegand, T.; Müller, K.-R. Explainable artificial intelligence: Understanding, visualizing and interpreting deep learning models. arXiv 2017, arXiv:1708.08296.
44. Ali, S.; Abuhmed, T.; El-Sappagh, S.; Muhammad, K.; Alonso-Moral, J.M.; Confalonieri, R.; Guidotti, R.; Del Ser, J.; Díaz-Rodríguez, N.; Herrera, F. Explainable Artificial Intelligence (XAI): What we know and what is left to attain Trustworthy Artificial Intelligence. Inf. Fusion 2023, 99, 101805.
45. Lundberg, S.; Lee, S.I. A Unified Approach to Interpreting Model Predictions. arXiv 2017, arXiv:1705.07874.
46. Liu, Q.; Gui, D.; Zhang, L.; Niu, J.; Dai, H.; Wei, G.; Hu, B.X. Simulation of regional groundwater levels in arid regions using interpretable machine learning models. Sci. Total Environ. 2022, 831, 154902.
47. Shapley, L.S. A Value for n-Person Games. In Contributions to the Theory of Games; Kuhn, H.W., Tucker, A.W., Eds.; Princeton University Press: Princeton, NJ, USA, 1953; Volume 2, pp. 307–318.
48. Lama, L.; Wilhelmsson, O.; Norlander, E.; Gustafsson, L.; Lager, A.; Tynelius, P.; Wärvik, L.; Östenson, C.G. Machine learning for prediction of diabetes risk in middle-aged Swedish people. Heliyon 2021, 7, e07419.
49. Wen, X.; Xie, Y.; Wu, L.; Jiang, L. Quantifying and comparing the effects of key risk factors on various types of roadway segment crashes with LightGBM and SHAP. Accid. Anal. Prev. 2021, 159, 106261.
50. Mangalathu, S.; Hwang, S.H.; Jeon, J.S. Failure mode and effects analysis of RC members based on machine-learning-based SHapley Additive exPlanations (SHAP) approach. Eng. Struct. 2020, 219, 110927.
51. Liu, J.; Ren, K.; Ming, T.; Qu, J.; Guo, W.; Li, H. Investigating the effects of local weather, streamflow lag, and global climate information on 1-month-ahead streamflow forecasting by using XGBoost and SHAP: Two case studies involving the contiguous USA. Acta Geophys. 2023, 71, 905–925.
52. Wang, R.Z.; Kim, J.H.; Li, M.H. Predicting stream water quality under different urban development pattern scenarios with an interpretable machine learning approach. Sci. Total Environ. 2021, 761, 144057.
53. Kuntla, S.K.; Saharia, M.; Kirstetter, P. Global-scale characterization of streamflow extremes. J. Hydrol. 2022, 615, 128668.
54. Wang, S.; Peng, H.; Liang, S. Prediction of estuarine water quality using interpretable machine learning approach. J. Hydrol. 2022, 605, 127320.
55. Lin, Y.; Wang, D.; Wang, G.; Qiu, J.; Long, K.; Du, Y.; Xie, H.; Wei, Z.; Shangguan, W.; Dai, Y. A hybrid deep learning algorithm and its application to streamflow prediction. J. Hydrol. 2021, 601, 126636.
56. Wang, X.; Sun, W.; Lu, F.; Zuo, R. Combining Satellite Optical and Radar Image Data for Streamflow Estimation Using a Machine Learning Method. Remote Sens. 2023, 15, 5184.
57. Takeuchi, K.; Ao, T.; Ishidaira, H. Introduction of block-wise use of TOPMODEL and Muskingum-Cunge method for the hydroenvironmental simulation of a large ungauged basin. Hydrol. Sci. J. 1999, 44, 633–646.
58. Hishinuma, S.; Takeuchi, K.; Magome, J. Challenges of hydrological analysis for water resource development in semi-arid mountainous regions: Case study in Iran. Hydrol. Sci. J. 2014, 59, 1718–1737.
59. Petpongpan, C.; Ekkawatpanit, C.; Kositgittiwong, D. Landslide risk assessment using hydrological model in the Upper Yom River Basin, Thailand. Catena 2021, 204, 105402.
60. Basijokaite, R.; Kelleher, C. Time-Varying Sensitivity Analysis Reveals Relationships Between Watershed Climate and Variations in Annual Parameter Importance in Regions with Strong Interannual Variability. Water Resour. Res. 2021, 57, e2020WR028544.
61. Islam, K.I.; Elias, E.; Carroll, K.C.; Brown, C. Exploring Random Forest Machine Learning and Remote Sensing Data for Streamflow Prediction: An Alternative Approach to a Process-Based Hydrologic Modeling in a Snowmelt-Driven Watershed. Remote Sens. 2023, 15, 3999.
62. Xie, Y.; Sun, W.; Ren, M.; Chen, S.; Huang, Z.; Pan, X. Stacking ensemble learning models for daily runoff prediction using 1D and 2D CNNs. Expert Syst. Appl. 2023, 217, 119469.
63. Hussain, D.; Hussain, T.; Khan, A.A.; Naqvi, S.A.A.; Jamil, A. A deep learning approach for hydrological time-series prediction: A case study of Gilgit river basin. Earth Sci. Inform. 2020, 13, 915–927.
64. Tang, G.; Clark, M.P.; Papalexiou, S.M.; Ma, Z.; Hong, Y. Have satellite precipitation products improved over last two decades? A comprehensive comparison of GPM IMERG with nine satellite and reanalysis datasets. Remote Sens. Environ. 2020, 240, 111697.
65. Miralles, D.G.; Holmes, T.R.H.; De Jeu, R.A.M.; Gash, J.H.; Meesters, A.G.C.A.; Dolman, A.J. Global land-surface evaporation estimated from satellite-based observations. Hydrol. Earth Syst. Sci. 2011, 15, 453–469.
66. Beck, H.E.; Van Dijk, A.I.J.M.; Larraondo, P.R.; McVicar, T.R.; Pan, M.; Dutra, E.; Miralles, D.G. MSWX: Global 3-Hourly 0.1° Bias-Corrected Meteorological Data Including Near-Real-Time Updates and Forecast Ensembles. Bull. Am. Meteorol. Soc. 2022, 103, E710–E732.
67. Harris, I.; Osborn, T.J.; Jones, P.; Lister, D. Version 4 of the CRU TS monthly high-resolution gridded multivariate climate dataset. Sci. Data 2020, 7, 109.
68. Alazzy, A.A.; Lü, H.; Chen, R.; Ali, A.B.; Zhu, Y.; Su, J. Evaluation of Satellite Precipitation Products and Their Potential Influence on Hydrological Modeling over the Ganzi River Basin of the Tibetan Plateau. Adv. Meteorol. 2017, 2017, 3695285.
69. Deng, H.; Chen, W.; Huang, G. Deep insight into daily runoff forecasting based on a CNN-LSTM model. Nat. Hazards 2022, 113, 1675–1696.
70. Yang, S.; Yang, D.; Chen, J.; Santisirisomboon, J.; Lu, W.; Zhao, B. A physical process and machine learning combined hydrological model for daily streamflow simulations of large watersheds with limited observation data. J. Hydrol. 2020, 590, 125206.
71. Nearing, G.S.; Kratzert, F.; Sampson, A.F.; Pelissier, C.S.; Klotz, D.; Frame, J.M.; Prieto, C.; Gupta, H.V. What Role Does Hydrological Science Play in the Age of Machine Learning? Water Resour. Res. 2020, 57, e2020WR028091.
72. Asadi, S.; Jimeno-Sáez, P.; López-Ballesteros, A.; Senent-Aparicio, J. Comparison and integration of physical and interpretable AI-driven models for rainfall-runoff simulation. Results Eng. 2024, 24, 103048.
73. Jimeno-Sáez, P.; Senent-Aparicio, J.; Pérez-Sánchez, J.; Pulido-Velazquez, D. A Comparison of SWAT and ANN Models for Daily Runoff Simulation in Different Climatic Zones of Peninsular Spain. Water 2018, 10, 192.

Figure 1. The topography and spatial locations of meteorological and hydrological stations in the upstream region of the Ganzi gauging station in the Yalong River Basin.

Figure 2. Comparison between in situ observations of precipitation and GPM IMERG observations from 2010 to 2019 at (a) daily and (b) monthly scales.

Figure 3. Simulated streamflow of BTOPMC model for the (a) calibration period and (b) validation period.

Figure 4. Simulated streamflow of RF model for the (a) calibration period and (b) validation period.

Figure 5. Simulated streamflow of LSTM model for the (a) calibration period and (b) validation period.

Figure 6. Ensemble streamflow simulation using a simple average method for the (a) calibration period and (b) validation period.

Figure 7. Ensemble streamflow simulation using a weighted average method for the (a) calibration period and (b) validation period.

Figure 8. Ensemble streamflow simulation using the 1D CNN method for the (a) calibration period and (b) validation period.

Figure 9. SHAP value of the three single models used for 1D CNN ensemble simulation.

Figure 10. Global feature importance of the three single models used for 1D CNN ensemble simulation.

Figure 11. Scatterplots of relative contributions of (a) BTOPMC, (b) RF, and (c) LSTM to the ensemble simulation versus observed streamflow.

Table 1. Parameters of the BTOPMC model being calibrated in this study.

No.	Parameter	Parameter Description	Unit	Min	Max
1	T0_Clay	Saturated transmissivity of soil (Clay)	m²/h	3	51
2	T0_Sand	Saturated transmissivity of soil (Sand)	m²/h	8	150
3	T0_Silt	Saturated transmissivity of soil (Silt)	m²/h	5	101
4	sdbar	Initial value of average soil saturation deficit	m	0.001	1
5	n0	Block average roughness coefficient	—	0.001	0.07
6	alpha	Drying function parameter	—	1	8
7	m	Decay factor of transmissivity	m	0.001	0.2
8	Srmax1	Maximum root zone storage of deep-rooted vegetation	—	0.001	0.01
9	Srmax2	Maximum root zone storage of shallow-rooted vegetation	—	0.001	0.04
10	Srmax3	Maximum root zone storage of impervious area	—	0.0001	0.02
11	Tsm	Soil freezing threshold temperature	°C	−1	1
12	Tb	Threshold temperature of snowfall	°C	−2	0
13	Tr	Threshold temperature of rainfall	°C	0	2
14	Mf	Degree day factor	mm °C⁻¹ day⁻¹	1	4
15	T_base	Threshold temperature for snowmelt	°C	0	2
16	Phi	Snow pack yield parameter	—	0.1	1.5
17	Cfr	Refreezing coefficient	mm day⁻¹	0.01	1

Table 2. Overview of the datasets used for modeling.

Type of Data	Model	Data Products	Spatial Resolution	Temporal Resolution	Source
DEM	BTOPMC	Shuttle Radar Topography Mission	90 × 90 m	-	http://www.gscloud.cn/ (accessed on 4 February 2025).
Land Use and Land Cover	BTOPMC	China’s Land Use Remote Sensing Mapping System (CNLUCC) Dataset	1 × 1 km	-	http://www.geodata.cn/ (accessed on 4 February 2025).
Soil	BTOPMC	Chinese Soil Science Database	1 × 1 km	-	http://vdb3.soil.csdb.cn/ (accessed on 4 February 2025).
Precipitation	—	Ground-Based Measurement	-	Daily	https://data.cma.cn/data (accessed on 4 February 2025).
Precipitation	BTOPMC/RF/LSTM	GPM IMERG	0.1°	Daily	https://pmm.nasa.gov/data-access/downloads/gpm (accessed on 4 February 2025).
Temperature	BTOPMC/RF/LSTM	MSWX	0.1°	Daily	http://www.gloh2o.org/mswx/ (accessed on 4 February 2025).
Weather	BTOPMC	CRU TS	0.5°	Monthly	https://crudata.uea.ac.uk/cru/data/hrg/cru_ts_4.07/ (accessed on 4 February 2025).
Evapotranspiration	RF/LSTM	GLEAM 3.8a	0.25°	Daily	https://www.gleam.eu (accessed on 4 February 2025).
Streamflow	BTOPMC/RF/LSTM	Hydrological Year Book	-	Daily	Ministry of Water Resources, China

Table 3. Streamflow simulation performance of RF and LSTM with different inputs.

Model	Metric	t	t − 1	t − 2	t − 3	t − 4	t − 5
RF	CAL_NSE	0.71	0.72	0.78	0.82	0.84	0.86
	CAL_KGE	0.71	0.72	0.76	0.80	0.81	0.82
	VAL_NSE	0.41	0.43	0.49	0.53	0.56	0.58
	VAL_KGE	0.62	0.63	0.67	0.70	0.71	0.73
LSTM	CAL_NSE	0.55	0.57	0.61	0.69	0.72	0.78
	CAL_KGE	0.62	0.63	0.71	0.82	0.77	0.86
	VAL_NSE	0.43	0.46	0.48	0.37	0.54	0.43
	VAL_KGE	0.62	0.64	0.69	0.69	0.74	0.71

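CAL: calibration; VAL: validation. Each column gives the earliest time step included in the inputs (e.g., t − 4 denotes inputs from t to t − 4).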

Table 4. Streamflow simulation performance for BTOPMC, RF, and LSTM.

Metric	BTOPMC	RF	LSTM
CAL_NSE	0.63	0.84	0.72
CAL_KGE	0.66	0.81	0.77
CAL_R²	0.84	0.88	0.85
CAL_MAE	102.3	77.9	91.1
CAL_MAPE	0.37	0.29	0.34
VAL_NSE	0.54	0.55	0.54
VAL_KGE	0.66	0.71	0.74
VAL_R²	0.79	0.76	0.76
VAL_MAE	102.7	116.1	114.5
VAL_MAPE	0.36	0.51	0.49

CAL: calibration; VAL: validation.

Table 5. Estimated streamflow performance for the three ensemble simulation methods.

Metric	SAM	WAM	1D CNN
CAL_NSE	0.80	0.80	0.84
CAL_KGE	0.74	0.73	0.88
CAL_R²	0.91	0.91	0.91
CAL_MAE	66.5	66.8	64.0
CAL_MAPE	0.20	0.21	0.21
VAL_NSE	0.71	0.70	0.71
VAL_KGE	0.74	0.73	0.82
VAL_R²	0.84	0.84	0.85
VAL_MAE	85.5	87.3	87.4
VAL_MAPE	0.32	0.33	0.34

CAL: calibration; VAL: validation.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Wang, J.; Li, Z.; Zhou, L.; Ma, C.; Sun, W. Ensemble Streamflow Simulations in a Qinghai–Tibet Plateau Basin Using a Deep Learning Method with Remote Sensing Precipitation Data as Input. Remote Sens. 2025, 17, 967. https://doi.org/10.3390/rs17060967

AMA Style

Wang J, Li Z, Zhou L, Ma C, Sun W. Ensemble Streamflow Simulations in a Qinghai–Tibet Plateau Basin Using a Deep Learning Method with Remote Sensing Precipitation Data as Input. Remote Sensing. 2025; 17(6):967. https://doi.org/10.3390/rs17060967

Chicago/Turabian Style

Wang, Jinqiang, Zhanjie Li, Ling Zhou, Chi Ma, and Wenchao Sun. 2025. "Ensemble Streamflow Simulations in a Qinghai–Tibet Plateau Basin Using a Deep Learning Method with Remote Sensing Precipitation Data as Input" Remote Sensing 17, no. 6: 967. https://doi.org/10.3390/rs17060967

APA Style

Wang, J., Li, Z., Zhou, L., Ma, C., & Sun, W. (2025). Ensemble Streamflow Simulations in a Qinghai–Tibet Plateau Basin Using a Deep Learning Method with Remote Sensing Precipitation Data as Input. Remote Sensing, 17(6), 967. https://doi.org/10.3390/rs17060967

