Constraining Flood Forecasting Uncertainties through Streamflow Data Assimilation in the Tropical Andes of Peru: Case of the Vilcanota River Basin

Llauca, Harold; Arestegui, Miguel; Lavado-Casimiro, Waldo

doi:10.3390/w15223944

by Harold Llauca ¹,²,³,*, Miguel Arestegui ⁴ and Waldo Lavado-Casimiro ¹,³

¹ National Service of Meteorology and Hydrology of Peru (SENAMHI), Lima 15072, Peru

² Universidad de Ingenieria y Tecnologia (UTEC), Lima 15063, Peru

³ Doctoral Program in Water Resources, National Agrarian University La Molina, Lima 15024, Peru

⁴ Practical Action Latin America, Lima 15074, Peru

* Author to whom correspondence should be addressed.

Water 2023, 15(22), 3944; https://doi.org/10.3390/w15223944

Submission received: 19 September 2023 / Revised: 27 October 2023 / Accepted: 31 October 2023 / Published: 13 November 2023

(This article belongs to the Section Hydrology)

Abstract

Flood modeling and forecasting are crucial for managing and preparing for extreme flood events, such as those in the Tropical Andes. In this context, assimilating streamflow data is essential. Data Assimilation (DA) seeks to reconcile errors in forecasting models and discharge measurements by updating model states. This study aims to assess the applicability and performance of streamflow DA in a sub-daily forecasting system of the Peruvian Tropical Andes using the Ensemble Kalman Filter (EnKF) and Particle Filter (PF) algorithms. The study was conducted in a data-sparse Andean basin during the period February–March 2022. For this purpose, the lumped GR4H rainfall–runoff model was run forward with 100 ensemble members in four different DA experiments based on IMERG-E and GSMaP-NRT precipitation sources and assimilated real-time hourly discharges at the basin outlet. Ensemble modeling with EnKF and PF showed that the GSMaP-NRT’-driven experiments reduced the model uncertainties more than the IMERG-E’ ones, and the reduction in high-flow underestimation was most notable for the GSMaP-NRT’+EnKF configuration. The ensemble forecasting framework from 1 to 24 h proposed here showed that updating model states using DA techniques improved the accuracy of streamflow prediction at least during the first 6–8 h on average, especially for the GSMaP-NRT’+EnKF scheme. Finally, this study benchmarks the application of streamflow DA in data-sparse basins in the Tropical Andes and will support the development of more accurate climate services in Peru.

Keywords:

streamflow data assimilation; flood forecasting; Tropical Andes; satellite precipitation products; GR4H model

1. Introduction

Floods triggered by extreme rainfall lead the rankings of natural disasters in the Tropical Andes [1,2], causing huge damage to people’s lives and health, agricultural and production systems, the economy, and infrastructure [3,4,5]. Thus, flood modeling and forecasting are key to managing and preparing for extreme flood events and are also valuable for timely flood warnings and emergency responses [6,7]. In that sense, the Andean population of Peru is highly prone and vulnerable to a large suite of extreme hydrometeorological events [8,9], such as higher rainfall rates in the eastern Andean/western Amazon transition [10] and greater water discharges from the highlands to the Andes foothills [11]. For instance, extreme rainfall and the subsequent floods that occurred in the austral summer of 2010 in Cusco (southern Peruvian Andes) caused USD 250 million in losses [12], evidencing the crucial necessity for an accurate operational flood forecasting system [13].

In operational streamflow prediction systems, hydrological forecasting aims to predict the catchment response (e.g., discharges at the basin outlet) to sudden changes in precipitation rates [14]. These predictions are typically made with models, which simplify real-world processes and therefore introduce uncertainties because of limited system knowledge [15]. The accuracy of hydrological forecasting is limited by several uncertainty sources, such as measurement errors in the tipping bucket mechanisms of rain gauges [16]; model input errors, such as those reported in [17] during interpolation and scaling of rainfall data; structural errors due to the selection of a single hydrological model for the study area [18]; parameter estimation uncertainties and sensitivity, because most parameters are difficult to measure directly and need to be estimated through calibration [19]; and simulation errors, because conventional streamflow at the basin outlet is used as a reliable reference for determining the performance of hydrological models [20]. In that sense, streamflow data assimilation (DA) seeks to provide more accurate predictions of water discharge by combining errors in observations and model components through the updating of model states [21].

The application of streamflow DA may follow three main steps. First, the design of the DA experiment scheme includes the selection of variables to be perturbed and assimilated. For example, rainfall is the variable most commonly perturbed in ensemble modeling, and real-time in situ streamflow measurements are increasingly being used to improve estimations of forecasting models via data assimilation in many operational systems, such as in [22,23,24,25,26]. Next, the quantification of model errors is a key process in DA applications because modeling uncertainties in rainfall, model states, and discharge may have significant impacts on results, even more than the DA algorithm selected [27,28]. Finally, the chosen DA algorithms are applied to an Open Loop (OL) hydrological model. For instance, sequential assimilation of observations in models with the Ensemble Kalman Filter (EnKF) and Particle Filter (PF) is widely employed for probabilistic hydrologic predictions and skillful operational flood forecasting systems, such as in [29,30,31,32].

Despite the increasing scientific literature on streamflow DA, DA applications in Andean hydrology are still emerging. For instance, ref. [7] reports the lack of ensemble modeling in South America, with only 4% (2) of publications worldwide from 2011 to 2019. Most DA studies in South America have focused on incorporating satellite altimetry information to understand the terrestrial water cycle in the larger Amazon River Basin, such as in [33,34,35], because the Amazon plays key roles in the water, energy, and carbon cycles and interacts with the global climate. By contrast, in the Andes Cordillera, DA has been applied only in the extratropical Andes (mainly with snow-dominated basins) using observed discharges [27] and remote sensing of snow cover [36]. In that context, the application of ensemble modeling in the hydrology of the Tropical Andes—with mostly pluvial-dominated basins—is still an unexplored study focus.

Following ref. [37], conceptual hydrological models put considerable emphasis on structure as a result of knowledge-driven model reduction in how we formalize our concept of a hydrological system. The GR4H [38] is a parsimonious and easy-to-implement model that has been widely used worldwide in basins of different climate conditions for flood monitoring [39], extreme event assessment [40], surface runoff in reservoir operation [41], and data assimilation experiments [26], and it has been merged with machine learning architectures [42]. In the Vilcanota Basin, the GR4H model was tested in [43] to evaluate the hydrometeorological performance of near real-time satellite precipitation products (SPPs) in a data-sparse context. The main results indicated good performance for semi-distributed runoff estimation in an Andean basin with limited information, with IMERG-E and GSMaP-NRT as the best-performing precipitation sources.

In this context, flood forecasting in the Peruvian Tropical Andes is a challenging task due to (a) the inaccessibility of remote areas with complex topography, (b) a limited in situ hydrometeorological network due to restricted funds, (c) the large number of stations with short records (e.g., most pluviometric automatic records begin in 2016) and huge numbers of missing values, and (d) increased uncertainties of model inputs related to data scarcity. Some studies in the Vilcanota River Basin dealt with these issues by assessing near real-time satellite precipitation products for rainfall–runoff modeling at sub-daily timesteps [43], constraining structural errors in conceptual hydrological models [18], and incorporating vegetation remote sensing into model calibration strategies [44]. Hence, the improvement of accurate flood forecasting in real-time hydrological systems is still a pending task.

Altogether, this study aims to provide new perspectives on the application of ensemble modeling and streamflow data assimilation in a lumped basin of the Tropical Andes of Peru with high flood risk and a sparse hydrometeorological data context. This paper conducts streamflow DA experiments in the Vilcanota River Basin using the EnKF and PF algorithms to assimilate discharge observations into a sub-daily lumped forecasting system based on the GR4H model and driven by satellite–gauge merged rainfall estimations.

2. Materials and Methods

2.1. Study Area

The study area comprises the Vilcanota River Basin located in the southeastern Tropical Andes of Peru. This basin plays a key role in the economic tourism activity of Cusco and was affected by extreme floods in 2010. For this work, we selected a basin area delineated upstream of the Pisac fluviometric station (see Figure 1). The drainage area of the basin is approximately 6900 km², spanning from 2959 m.a.s.l to 6268 m.a.s.l. Also, we defined a regular domain between latitudes 12.9° S–14.8° S and longitudes 70.6° W–72.7° W to download and merge satellite-based precipitation and pluviometric station observations in the study area.

The Vilcanota Basin has a predominantly pluvial regime, with a smaller glacial/snow melting contribution to the total runoff volume. The basin has sparse hydrometeorological data, with only two pluviometric stations (SAL and SIC in Figure 1) inside the area, so the use of satellite precipitation products [43] is relevant for this study. Figure 2 displays the seasonal behavior of the main hydroclimatic variables for the climatological period 1981–2010, according to the PISCO dataset. Precipitation follows a unimodal distribution, with rainfall rates ranging from 120 mm/month to 150 mm/month during the austral summer (December–March) and mean air temperature varying from 6 °C to 11 °C. Monthly discharges span from 20 m³/s to 140 m³/s, with peak flows often occurring in January and February.

Hourly discharge and rainfall records were collected from 1 fluviometric and 12 pluviometric gauge stations (10 stations outside the basin area) from 1 January 2017 to 31 July 2022. The stations are owned by the National Service of Meteorology and Hydrology of Peru (SENAMHI). The gauges’ locations are displayed in Figure 1 and summarized in Table 1. Also, near-real-time satellite precipitation products (SPPs) from the Integrated Multi-satellite Retrievals for GPM—Early Run (IMERG-E) and the Global Satellite Mapping of Precipitation (GSMaP-NRT) were chosen for this study. These SPPs have 5 h of latency for UTC-5 with a spatial and temporal resolution of 0.1° (~10 km) and 0.5 h, respectively. Hourly potential evapotranspiration was estimated based on the gridded evapotranspiration data set developed for Peru, named PISCOpe [47].

The process of merging SPPs and rain gauge information for the Vilcanota Basin (IMERG-E’ and GSMaP-NRT’) was the same as in [43]. Rain gauge observations and SPPs were merged using a simple bias adjustment correction [48]. For this, in each time interval (t), the pixel corresponding to the station point was extracted, and the bias to the station point (Ybias) was calculated as:

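$$\mathrm{Ybias}_{t} = G_{t} - S_{t} \qquad (1)$$

where G_t is the rain gauge observation and S_t is the SPP value at the pixel containing the station. Then, Ybias was interpolated by inverse distance weighting (IDW), using a weighting factor of 2, to generate a bias map (Mbias). Finally, the bias map was added to the SPPs to obtain the corrected product (SPP′), as shown below:

$$\mathrm{SPP}'_{t} = \mathrm{SPP}_{t} + \mathrm{Mbias}_{t} \qquad (2)$$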
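The merging step in Equations (1) and (2) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names (`idw_bias_map`, `merge_spp`), the flat coordinate arrays, and the small distance floor used to avoid division by zero are all assumptions introduced here.

```python
import numpy as np

def idw_bias_map(station_xy, station_bias, grid_xy, power=2.0):
    """Interpolate station biases to grid points by inverse distance weighting."""
    # Pairwise distances between grid points and stations: (n_grid, n_stations)
    d = np.linalg.norm(grid_xy[:, None, :] - station_xy[None, :, :], axis=2)
    d = np.maximum(d, 1e-12)  # assumed floor so a pixel on a station does not divide by zero
    w = 1.0 / d**power        # weighting factor of 2, as stated in the text
    return (w * station_bias).sum(axis=1) / w.sum(axis=1)

def merge_spp(spp_grid, spp_at_stations, gauge, station_xy, grid_xy):
    """Apply Equations (1) and (2) for a single time step."""
    ybias = gauge - spp_at_stations                    # Equation (1): Ybias = G - S
    mbias = idw_bias_map(station_xy, ybias, grid_xy)   # bias map Mbias
    return spp_grid + mbias                            # Equation (2): SPP' = SPP + Mbias
```

Note that an additive correction of this kind can produce negative rainfall where the satellite estimate is small and the interpolated bias is negative; the text does not state how such values were handled, so the sketch leaves them untouched.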

2.2. Parsimonious Sub-Daily Hydrological Modeling

The GR4H model [38] is the hourly adaptation of the GR4J model [49] with four parameters (X1, X2, X3, and X4). In this model, hourly precipitation and potential evapotranspiration are considered to determine the net precipitation (Pn) and evapotranspiration (En), respectively. Part of Pn is lost in the storage reservoir of the soil (Ps), while the remaining amount forms the effective precipitation (Pt = Pn − Ps). The soil moisture content (S), with a maximum value of X1, decreases due to percolation (Perc). Subsequently, Pt is routed at the basin outlet as follows: 10% is routed through a single unit hydrograph with a base time equal to X4, and the remaining 90% is routed through a unit hydrograph and nonlinear reservoir (R) with a maximum capacity of X3. Additionally, a loss function (F), denoted by parameter X2, is applied to both flows to represent the subsurface exchange (loss or gain in the system).
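The first steps of this description (net inputs, filling of the production store, and the 10%/90% split) can be sketched as follows. The expression used for Ps is the standard GR4J production-store formula and is assumed here to carry over unchanged to the hourly model; percolation, the unit hydrographs, the routing store, and the exchange term F are deliberately left out. Function names are hypothetical.

```python
import math

def net_inputs(p, e):
    """Return (Pn, En): net precipitation and net evapotranspiration."""
    if p >= e:
        return p - e, 0.0
    return 0.0, e - p

def production_fill(pn, s, x1):
    """Ps: part of Pn stored in the production reservoir (GR4J expression, assumed)."""
    ratio = s / x1                 # relative filling of the store, between 0 and 1
    t = math.tanh(pn / x1)
    return x1 * (1.0 - ratio**2) * t / (1.0 + ratio * t)

def split_effective(pn, ps):
    """Split effective precipitation Pt = Pn - Ps into the 10% and 90% branches."""
    pt = pn - ps
    return 0.1 * pt, 0.9 * pt
```

With this expression a full store (S = X1) absorbs nothing and an empty store absorbs almost all of a small rainfall input, which matches the qualitative behavior described above.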

The lumped GR4H models were forced using the merged IMERG-E’/GSMaP-NRT’ data and the PISCOpe dataset at the basin scale. Models were calibrated and validated using observed hourly discharges at the Pisac station. Calibration was conducted from 00:00 h on 1 January 2017 to 23:00 h on 31 August 2020. The first 1200 values were chosen for model warm-up. Model validation took place from 00:00 h on 1 September 2020 to 23:00 h on 31 July 2022.

Model parameters were calibrated with the Shuffled Complex Evolution (SCE-UA) algorithm [50] using the objective function (Fobj) proposed in [26] and shown in Equation (3).

$$F_{\mathrm{obj}} = 0.25\,F_{\mathrm{logNSE}} + 0.25\,F_{\mathrm{BoxNSE}} + 0.25\,F_{\mathrm{KGE}} + 0.25\,F_{\mathrm{BIAS}} \qquad (3)$$

where Fobj is the arithmetic average of the logarithmic Nash–Sutcliffe Efficiency (logNSE), the Box–Cox transformed Nash–Sutcliffe Efficiency (BoxNSE), the Kling–Gupta Efficiency (KGE), and the BIAS. Details of the selected statistical metrics and their equations are summarized in Table 2.
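As a rough illustration of Equation (3), the sketch below combines four component scores with equal weights and includes textbook definitions of NSE and KGE. How each F term is derived from its metric, and the exact logarithmic and Box–Cox transformations, are defined in Table 2 and are not reproduced here, so the helper functions should be read as assumptions rather than as the calibration code used in the study.

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe Efficiency (standard definition)."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def kge(obs, sim):
    """Kling-Gupta Efficiency (2009 formulation, assumed)."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    r = np.corrcoef(obs, sim)[0, 1]    # linear correlation
    alpha = sim.std() / obs.std()      # variability ratio
    beta = sim.mean() / obs.mean()     # bias ratio
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

def f_obj(f_lognse, f_boxnse, f_kge, f_bias):
    """Equation (3): equally weighted average of the four component scores."""
    return 0.25 * f_lognse + 0.25 * f_boxnse + 0.25 * f_kge + 0.25 * f_bias
```

For example, a logNSE-type score could be obtained as `nse(np.log(obs), np.log(sim))` for strictly positive flows, which emphasizes low-flow errors.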

2.3. Design of Streamflow Data Assimilation Experiments

The sequential assimilation technique is well known for operational flood forecasting systems and is structured by adding normally distributed perturbations to the forcing vector [53]. Figure 3 illustrates the general framework for streamflow DA in the Vilcanota River Basin, beginning with the quantification of model errors, running the Open Loop model, and finishing with the application of DA algorithms.

To design streamflow DA experiments in this study, two Open Loop (OL) models were run forward using 100 ensemble members of precipitation and potential evapotranspiration data. Both OL schemes had the same potential evapotranspiration series from the PISCOpe dataset but differed in precipitation inputs. Thus, the OL models were named IMERG-E’+OL and GSMaP-NRT’+OL, depending on the precipitation source used.

Also, we conducted four sequential DA experiments using the Ensemble Kalman Filter (EnKF) and Particle Filter (PF) algorithms. The main difference between the algorithms is how they recursively generate an approximation of the probability distribution of the prognostic variable. EnKF assumes a Gaussian prior distribution when computing the Kalman gain and the analysis states [54]. In contrast, PF updates the importance weights according to the likelihood of the observation given each particle, removing samples with negligible weights and replicating samples with large weights to avoid filter degeneracy [30].
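The contrast between the two update rules can be made concrete with a minimal sketch for a single discharge observation. This is not the airGRdatassim code: the perturbed-observation form of the EnKF, the Gaussian likelihood, and the multinomial resampling in the PF are common textbook choices assumed here, and all function and argument names are hypothetical.

```python
import numpy as np

def enkf_update(states, q_sim, q_obs, obs_var, rng):
    """EnKF analysis step. states: (n_members, n_states); q_sim: (n_members,)."""
    n = len(q_sim)
    x_anom = states - states.mean(axis=0)
    y_anom = q_sim - q_sim.mean()
    cov_xy = x_anom.T @ y_anom / (n - 1)   # state-discharge covariance
    var_y = y_anom @ y_anom / (n - 1)      # ensemble variance of simulated discharge
    gain = cov_xy / (var_y + obs_var)      # Kalman gain for a scalar observation
    # Each member assimilates its own perturbed copy of the observation
    q_pert = q_obs + rng.normal(0.0, np.sqrt(obs_var), size=n)
    return states + np.outer(q_pert - q_sim, gain)

def pf_update(states, q_sim, q_obs, obs_var, rng):
    """PF step: Gaussian likelihood weights followed by multinomial resampling."""
    logw = -0.5 * (q_obs - q_sim) ** 2 / obs_var
    w = np.exp(logw - logw.max())          # subtract the max for numerical stability
    w /= w.sum()
    idx = rng.choice(len(w), size=len(w), p=w)  # replicate heavy particles, drop light ones
    return states[idx], w
```

The EnKF moves every member toward the observation through a linear correction, whereas the PF keeps the member states unchanged and only reselects them, which is one reason the two filters can produce different ensemble spreads.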

As in the OL experiments, we denominated experiments based on the precipitation source and algorithm chosen, so the IMERG-E’+EnKF and IMERG-E’+PF setups were run using IMERG-E’ inputs, while GSMaP-NRT’+EnKF and GSMaP-NRT’+PF used the GSMaP-NRT’ data.

These DA experiments were applied using an hourly adaptation of the R source code from the airGRdatassim package [55].

2.4. Quantification of Model Errors

An ensemble of model realizations was generated to reflect the uncertainties in the catchment model. Rainfall (P) and evapotranspiration (E) inputs and basin soil moisture (S) were perturbed to generate an ensemble of model states and discharge predictions (Q). Also, the observed discharge was perturbed to represent the observation error. Similar to ref. [26], parameters and structural errors were assumed to be accumulated in soil moisture.

Ensemble meteorological inputs were generated by perturbing rainfall and evapotranspiration with multiplicative stochastic noise φ_p applied at each hourly time step, according to the methodology proposed in [54] and presented in Equations (4) and (5). Also, as in [30], random perturbations were provided by a first-order autoregressive model to guarantee temporal correlation of the time-variant forcings. The fractional error parameter was set to 0.65, and temporal decorrelation lengths were defined as 5 h for rainfall and 3 h for potential evapotranspiration based on an autocorrelation analysis.

$$p' = p\,\varphi_{p} \qquad (4)$$

$$\varphi_{p} = (1 - \varepsilon_{p}) + 2\,u_{p}\,\varepsilon_{p} \qquad (5)$$

where ε_p is the error parameter for precipitation, and u_p is a uniform random number, such that φ_p is a realization from a uniform distribution ranging from 1 − ε_p to 1 + ε_p.
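A minimal sketch of Equations (4) and (5) is given below. For brevity it draws u_p independently at every time step, so it does not reproduce the first-order autoregressive correlation described above; that simplification, and the function name `perturb_forcing`, are assumptions of this example.

```python
import numpy as np

def perturb_forcing(p, eps, n_members, rng):
    """Multiplicative uniform perturbation of a forcing series.

    p: (n_steps,) forcing values; eps: fractional error parameter.
    Returns an array of shape (n_members, n_steps).
    """
    p = np.asarray(p, float)
    # u_p ~ U(0, 1), drawn independently here (no AR(1) correlation in this sketch)
    u = rng.uniform(0.0, 1.0, size=(n_members, p.size))
    phi = (1.0 - eps) + 2.0 * u * eps   # Equation (5): phi in [1 - eps, 1 + eps]
    return p * phi                      # Equation (4): p' = p * phi
```

With the fractional error of 0.65 reported in the text, each member's hourly rainfall lies between 35% and 165% of the input value, and dry time steps stay dry because the noise is multiplicative.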

Basin soil moisture state (S) was perturbed through normally distributed null-mean noise at each assimilation time step after the analysis procedure, following the approach in [56]. This perturbation was truncated by the upper (X1) and lower (zero) bounds of soil moisture.
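The state perturbation can be sketched as additive Gaussian noise followed by bounding. The noise standard deviation `sigma` is not given in the text and is a hypothetical parameter here, and bounding by clipping is only one possible reading of "truncated".

```python
import numpy as np

def perturb_soil_moisture(s, x1, sigma, rng):
    """Add zero-mean Gaussian noise to the store level S and keep it in [0, X1]."""
    s = np.asarray(s, float)
    noisy = s + rng.normal(0.0, sigma, size=s.shape)
    # Clipping is an assumption; resampling out-of-range draws would be an alternative
    return np.clip(noisy, 0.0, x1)
```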

According to ref. [54], errors in discharge are mainly derived from the measurement of water level and uncertainties in rating curves. Here, we used a normal distribution with a zero-valued mean and variance (σ_obs²) for describing the measurement noise and parameterizing it as a function of the discharge observation (Q_obs), as presented in Equation (6).

$$\sigma_{obs}^{2} = (\varepsilon_{obs}\,Q_{obs})^{2} \qquad (6)$$

Following the approach proposed in [30], the error parameter ε_obs was set to 0.1, the quantile 10 (Q₁₀) was used as the minimum threshold to prevent underestimated error variances in the case of low discharges, and the variance was evaluated proportionally to Q₁₀² for values below Q₁₀.
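Equation (6) together with the Q₁₀ floor can be written compactly as below. The function name and the idea of passing Q₁₀ as a precomputed argument are assumptions of this sketch.

```python
import numpy as np

def obs_error_variance(q_obs, q10, eps_obs=0.1):
    """Observation error variance from Equation (6), floored at the Q10 level."""
    # Below Q10 the variance is evaluated with Q10 instead of the observation
    q = np.maximum(np.asarray(q_obs, float), q10)
    return (eps_obs * q) ** 2
```

For instance, with Q₁₀ = 20 m³/s an observation of 100 m³/s gives a variance of 100 (m³/s)², while an observation of 5 m³/s is assigned the floor value of 4 (m³/s)².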

2.5. Evaluation of Model Forecast

We assessed forecasting performance during February and March 2022 for lead times from 1 to 24 h. For this procedure, the updated (a posteriori) states of the GR4H model (S, R, UH1, and UH2) were taken as the initial conditions for flow forecasting from 1 to 24 h. Here, we used Nash–Sutcliffe Efficiency (NSE) and Bias (BIAS) to evaluate the ensemble mean forecast, and the Mean of Ensemble Root Mean Squared Error (MRMSE) and the Continuous Ranked Probability Skill Score (CRPSS) to examine the ensemble spread. Details of statistical metrics and their equations are summarized in Table 3.
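Because Table 3 is not reproduced here, the following sketch uses a widely used ensemble estimator of the CRPS and the usual skill-score form relative to a reference forecast. The choice of reference forecast is left to the caller and is an assumption, as are the function names.

```python
import numpy as np

def crps_ensemble(members, obs):
    """Ensemble CRPS estimate: E|X - y| - 0.5 * E|X - X'|."""
    m = np.asarray(members, float)
    term_obs = np.mean(np.abs(m - obs))                          # distance to the observation
    term_spread = 0.5 * np.mean(np.abs(m[:, None] - m[None, :])) # internal ensemble spread
    return term_obs - term_spread

def crpss(crps_forecast, crps_reference):
    """Skill score: 1 is perfect, 0 matches the reference, negative is worse."""
    return 1.0 - crps_forecast / crps_reference
```

For a single-member ensemble the CRPS reduces to the absolute error, which makes it directly comparable with deterministic forecasts.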

3. Results

3.1. Model Calibration and Validation

The GR4H model parameters were calibrated for both precipitation sources (IMERG-E’ and GSMaP-NRT’) using hourly streamflow records from the Pisac gauge station (basin outlet). Overall, the calibration process gave good results (logNSE ≥ 0.75, BoxNSE ≥ 0.75, NSE ≥ 0.75, and −0.1 ≤ BIAS ≤ 0.1) in terms of mass balance (BIAS), variance (KGE), and flow representation (logNSE, BoxNSE). The summary of statistical metrics selected for calibration, validation, and total period performance evaluation is shown in Table 4. IMERG-E’ performed best during both the calibration and validation periods, although the GSMaP-NRT’ skill scores were also high. Despite all metrics being equally weighted in the objective function (Fobj), it is important to note that KGE values were mostly close to 1 during calibration in comparison to those of logNSE and BoxNSE. This suggested that high-flow representations were slightly superior to those of middle and low flows, at least during model calibration.

The calibration procedure showed that the most sensitive parameters were the conceptual soil moisture (S), with a maximum capacity equal to X1 ranging from 618 mm (IMERG-E’) to 972 mm (GSMaP-NRT’), and the conceptual slow routing storage (R), with a maximum value of X3 varying between 57 mm (IMERG-E’) and 136 mm (GSMaP-NRT’). The upper values of X1 and X3 on the GSMaP-NRT’ model were related to the control of rainfall overestimation over the study domain and runoff transformation, as shown in [43]. For instance, Figure 4a,b shows the rainfall series at the basin scale from both satellite sources as gray bars, exhibiting higher hourly rainfall rates for GSMaP-NRT’ than for IMERG-E’.

The comparison between observed and simulated hourly discharges at the Pisac gauge station (basin outlet) is displayed in Figure 4. Figure 4a,b shows the flow series (2017–2022) during the calibration and validation periods. Two different mean areal precipitation inputs over the Vilcanota Basin are shown as gray bars above their respective hydrographs. Observed discharge is presented as blue lines and simulated discharges are plotted as green (IMERG-E’) and magenta (GSMaP-NRT’) lines. The dashed red box corresponds to the rainiest months in 2022 (February and March) selected for the streamflow DA assessment.

During the calibration and validation steps in Figure 4a,b, the parsimonious GR4H model represented the high variability of discharge at the hourly time step. However, high peak flows (the main focus of this particular application) were often underestimated. In terms of statistical metrics (Table 4), the simulations performed slightly better (logNSE, BoxNSE, and KGE higher than 0.85) when the GR4H model was forced with IMERG-E’ than with GSMaP-NRT’, even during model validation. These results suggested that running the model with calibrated parameters was insufficient to appropriately represent the runoff in the basin during the rainy period, particularly given the added uncertainty introduced by the satellite rainfall estimations at the basin scale, especially at finer temporal scales.

Although the calibration process significantly improved the model performance in terms of statistical metrics, the correct reproduction of high peak flows in short time windows remained incomplete. For example, Figure 4c,d shows scatter plots between observed and simulated values for February and March 2022. Simple linear regression (SLR) models for IMERG-E’ (green) and GSMaP-NRT’ (magenta) simulations and Pearson correlation coefficients are presented in the figures. The low SLR slope and Pearson coefficient (R < 0.50) indicated underestimation and high dispersion relative to the observed discharges, respectively, and were most pronounced for the GSMaP-NRT’-driven model. These results provide evidence that good overall model skill does not always reflect a good representation of discharge in short periods. To address this, we then performed DA experiments to improve simulation during the wet period.

3.2. Estimation of Model Uncertainties in Streamflow Data Assimilation

The experiments aimed to assess the usefulness of DA-based estimation to improve streamflow simulation when accounting for the uncertainties of the meteorological forcings. The results obtained for the assimilated discharge data at the basin outlet with EnKF and PF are presented in Figure 5. The period considered here was February and March 2022. Blue lines correspond to hourly discharge observations, black lines represent OL model runs, green (magenta) lines display ensemble mean streamflow simulations for IMERG-E’ (GSMaP-NRT’), and the ensemble spreads are shown as gray bounds.

The root mean squared error (RMSE) values in Figure 5 quantified errors between simulated and observed discharges and are displayed to represent the preliminary improvement in simulations after streamflow DA concerning Open Loop (OL) simulations. In OL cases, the obtained lower RMSE values indicated that IMERG-E’ (61.25 m³/s) simulations performed better than GSMaP-NRT’ (71.76 m³/s). This is notable when comparing the observed and OL series. For instance, IMERG-E’ simulations usually overestimated discharge, while for the same period, GSMaP-NRT’ tended to underestimate high flow.

After DA application, one could appreciate a match between ensemble mean simulations with the EnKF and PF algorithms and observations in all experiments. Concerning the algorithms used, EnKF had better performance than PF and better agreement between simulations and observations. Also, it is easy to note that ensemble spreads in PF experiments were thinner than in EnKF ones, even for the two different precipitation sources. In more detail, the GSMaP-NRT’+EnKF scheme outperformed other experiments in terms of the lowest RMSE value obtained (17.35 m³/s).

In summary, the results presented here suggest that GSMaP-NRT’-driven experiments performed better than IMERG-E’ ones after applying DA with EnKF and PF, even when IMERG-E’ outperformed during OL runs. In addition, the previous findings displayed here highlight the benefits of sequential data assimilation for the reduction of streamflow uncertainties at the outlet of a data-sparse basin in the Tropical Andes.

On the other hand, the idea of comparing uncertainties in the state variable S from the production reservoir, as a proxy of soil moisture (SM), was tested here and is illustrated in Figure 6. Ensemble means are presented as green (magenta) for the IMERG-E’ (GSMaP-NRT’) experiments, and ensemble spreads are shown as gray bounds. The results displayed huge differences in the SM behavior in all schemes. For instance, SM ranged from 192 mm (IMERG-E’+PF) to 545 mm (GSMaP-NRT’+EnKF). Here, it is important to note that SM values were higher in GSMaP-NRT’ experiments than in IMERG-E’ ones, and were related to the major X1 value obtained during model calibration. Furthermore, similar to the streamflow results shown in Figure 5, the uncertainty bounds were wider with EnKF schemes compared to PF setups, suggesting that EnKF incorporated more variability in SM during perturbation of the basin soil moisture state (S). Here, the use of SM remote sensing could contribute to the reduction in SM uncertainties, even in a lumped hydrological system. However, because the focus of this work was on observed discharge data assimilation, we leave this proposal for future investigation.

3.3. Forecasting Performance Assessment

The main goal of applying discharge assimilation in this framework was to update the amount of water stored in the basin before the catchment responded to a rainfall event (i.e., obtaining the “right” initial conditions before the rising limb of a flood event). Once the updated initial states were obtained, the rest of the forecasting work was accomplished by the hydrological model. Figure 7 illustrates the forecasting performance after streamflow DA for the lumped hydrological system in the Vilcanota River Basin at the Pisac stream gauge station. For this purpose, the perturbed IMERG-E’ and GSMaP-NRT’ precipitation sources were used for computing forecasting skills during lead times from 1 to 24 h.

In terms of NSE, Figure 7a shows that the OL forecasting was much worse (NSE < 0), especially for GSMaP-NRT’. Nevertheless, the ensemble means of the GSMaP-NRT’+EnKF setup had the best performance after streamflow assimilation, with NSE values always higher than those of other schemes. For instance, considering a threshold value of 0.50 for acceptable forecasting, the GSMaP-NRT’+EnKF exhibited acceptable predictions during the first 8 h, while the remaining schemes exceeded this threshold in only the first 5 h.

Concerning BIAS of the ensemble means forecasting (Figure 7b), OL of IMERG-E’ displayed positive values, indicating discharge overestimation, while GSMaP-NRT’ exhibited high negative BIAS associated with huge flow underestimation. After streamflow DA, all configurations showed negative BIAS, indicating that discharge underestimation was still present in the forecasting model even after including observed discharges for updated model states (S and R). However, the improvement in high peak flow prediction was notable in comparison to the calibrated-validated model (without DA). In addition, the underestimation was observed to change with the forecasting time. For instance, negative BIAS decreased during the first 4–5 h of forecasting to then increase until the 24 h. As can be seen in Figure 7b, these BIAS changes were more visible in GSMaP-NRT’ experiments than in IMERG-E’ ones.

In terms of MRMSE (Figure 7c), the OL run with GSMaP-NRT’ showed high model errors in contrast to IMERG-E’. However, the ensemble simulation of PF performed better than the EnKF algorithm, specifically with the GSMaP-NRT’ forcing data. These results might be related to the thinner ensemble spread of PF simulations, which introduced less uncertainty from each ensemble member compared to the high uncertainty bounds of IMERG-E’. Also, similar to other statistics, MRMSE worsened as lead times increased from 1 to 24 h. Nevertheless, in the case of the IMERG-E’+EnKF scheme, MRMSE was worse at the beginning of forecasting (e.g., 7–8 h forecasts) but better for longer-term forecasts (e.g., 24 h forecasts).

Concerning CRPSS, where the continuous ranked probability score (CRPS) measured the quadratic distance between the cumulative distribution of the forecasts and the cumulative distribution of the observations of the predicted variable, values of CRPSS closer to 1 were preferred. GSMaP-NRT’ setups resulted in an improvement in forecasting accuracy up to 10 days (CRPSS ≥ 0.75), decreasing progressively with long-term forecasts, and much less improvement was achieved using IMERG-E’ schemes, with CRPSS values ranging from 0.21 to 0.54.

In summary, all forecasting skills (NSE, BIAS, MRMSE, and CRPSS) were worse when lead times rose from 1 to 24 h at the Pisac gauge station. The good performance at the beginning of forecasting (compared to poor skills for OL runs) demonstrated the benefits of streamflow DA applications to improve the accuracy of predictions. The superiority of GSMaP-NRT’ experiments still existed after sequential data assimilation in the forecasting framework, as was demonstrated by many metrics, and the application of the EnKF algorithm introduced more accuracy in terms of ensemble mean predictions. In this context, the GSMaP-NRT’+EnKF setup outperformed other setups.

Figure 8 displays the comparison between observed (blue lines) and forecasted (magenta lines) hydrographs with GSMaP-NRT’+EnKF for lead times of 1, 3, 6, 12, 18, and 24 h. In addition, ensemble spreads from an ensemble size of 100 members are shown as gray bounds. A quick examination of the hourly hydrographs illustrates the low quality of long-term forecasts (e.g., 18–24 h). For example, the peak flow values during the middle of March 2022 for lead times of 18 h (328 m³/s) and 24 h (302 m³/s) substantially underestimated the observed value (465 m³/s). In contrast, high peak flow forecasts given by the most recent prediction (e.g., 1–6 h) were considered to be the best estimate, as shown in Figure 8a–c. Nevertheless, the ensemble mean forecasted discharges were still below the observed values, although by a much smaller margin than in the OL runs and in model calibration-validation.

4. Discussion

Limitations and Potential of Streamflow Data Assimilation in the Vilcanota River Basin

The results showed the potential to improve sub-daily streamflow forecasting in the Vilcanota River Basin by assimilating real-time observed discharges at the basin outlet. This work establishes the basis for hydrological streamflow predictions in an Andean basin of Peru. However, some key considerations are discussed below.

The ensemble spreads of the EnKF experiments tended to be wider than those of the PF experiments, as displayed in the streamflow (Figure 5) and soil moisture (Figure 6) simulations. This suggests that EnKF requires a larger ensemble than PF to deal with the same uncertainty sources. Here, the GR4H model was run forward in time with 100 ensemble members, following similar studies in other domains [27,28,30]. However, ref. [57] suggested that the EnKF ensemble size influences streamflow DA and must be at least 500 members because of non-Gaussian forecasting; an ensemble of that size would increase the computational demand on operational systems, especially at sub-daily time steps. In contrast, the PF algorithm is more flexible about ensemble size, but a very small ensemble (e.g., 20 or 40 members) increases filter degeneracy and reduces the number of successful runs [58,59]. Moreover, PF and EnKF are inherently sequential algorithms that are easy to use in real-time forecasting applications because measurements are processed as they become available [30]. Recent studies in real-time flood forecasting have also incorporated new sequential DA techniques, such as the Ensemble Transform Kalman Filter (ETKF) [60], covariance resampling for PF [58], disaggregated multi-level Factorial Hydrologic Data Assimilation (FHDA) [61], and state updates through backpropagation and deep learning models [62], which might be tested in future research.
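The two update rules contrasted above can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the implementation used in this study: a toy linear relation between a store level and discharge stands in for GR4H, the observation error is assumed Gaussian, and all function names, the seed, and the numerical values are hypothetical.

```python
import numpy as np


def enkf_update(states, sim_q, obs_q, obs_std, rng):
    """Stochastic (perturbed-observation) EnKF update of one scalar state per member.

    states  : (N,) forecast states, e.g. a production-store level per member
    sim_q   : (N,) simulated discharge per member
    obs_q   : observed discharge (scalar)
    obs_std : assumed standard deviation of the observation error
    """
    n = states.size
    # Sample covariance between state and simulated discharge, and discharge variance.
    cov_sq = np.sum((states - states.mean()) * (sim_q - sim_q.mean())) / (n - 1)
    var_q = np.var(sim_q, ddof=1)
    gain = cov_sq / (var_q + obs_std**2)  # scalar Kalman gain
    perturbed_obs = obs_q + rng.normal(0.0, obs_std, n)
    return states + gain * (perturbed_obs - sim_q)


def pf_weights(sim_q, obs_q, obs_std):
    """Normalized Gaussian-likelihood weights for a particle filter."""
    log_w = -0.5 * ((obs_q - sim_q) / obs_std) ** 2
    w = np.exp(log_w - log_w.max())  # subtract the max for numerical stability
    return w / w.sum()


def effective_sample_size(weights):
    """N_eff = 1 / sum(w^2); values far below N signal filter degeneracy."""
    return 1.0 / np.sum(weights**2)


def systematic_resample(weights, rng):
    """Systematic resampling: indices of the particles carried forward."""
    n = weights.size
    positions = (rng.random() + np.arange(n)) / n
    return np.minimum(np.searchsorted(np.cumsum(weights), positions), n - 1)


# Toy example with 100 members (all numbers are hypothetical).
rng = np.random.default_rng(42)
n_members = 100
states = rng.normal(300.0, 30.0, n_members)             # toy store levels (mm)
sim_q = 0.8 * states + rng.normal(0.0, 5.0, n_members)  # toy linear "model" (m³/s)
obs_q, obs_std = 280.0, 10.0

updated = enkf_update(states, sim_q, obs_q, obs_std, rng)
weights = pf_weights(sim_q, obs_q, obs_std)
n_eff = effective_sample_size(weights)
kept = systematic_resample(weights, rng)
print(f"EnKF spread: {states.std():.1f} -> {updated.std():.1f}; PF N_eff = {n_eff:.1f}")
```

The EnKF shifts every member by a gain-weighted innovation, whereas the PF keeps the member values and only reweights and resamples them. With a small ensemble, few particles fall near the observation, so the effective sample size collapses, which is the degeneracy issue noted above.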

It is also well known that forecasting accuracy is sensitive to each source of model uncertainty, whose impact is likely to differ depending mainly on the choice of modeling error scheme and then on the selected DA algorithm. For modeling input errors, we used two rainfall sources at the basin scale (IMERG-E’ and GSMaP-NRT’) and a space–time correlation using a uniform distribution, so that the random perturbations were provided by a first-order autoregressive model similar to ref. [54]. In this case, autoregressive models similar to ref. [60] were built for the rainiest months of 2022 to focus on improving the forecasting accuracy of recent floods. The dry season might therefore be tested in future studies to examine model errors related to uncertainties in low-flow measurements and the contribution of groundwater (baseflow) to surface runoff, such as in [63]. In addition, state perturbations were applied only to the production reservoir (S), which is related to basin soil moisture, following the methodology in [26,28] for DA in the GR4H model; the remaining model states (R, UH1, and UH2) can also be updated through the EnKF and PF algorithms to assess the impact on hourly simulations, as presented in [30] for GR4J at a daily time step. Regarding system output errors, the stage–discharge rating curves at the Pisac stream gauge need to be continuously adjusted to reduce observed streamflow errors, such as in [64], especially for low and high flows, where streamflows are usually interpolated and extrapolated, respectively. Recently, a new LiDAR sensor was installed to monitor water levels [65], similar to [66], which will support real-time flood monitoring.
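The input-error scheme described above can be sketched in Python as follows. This is a minimal sketch, assuming a multiplicative rainfall error whose time correlation follows a first-order autoregressive (AR(1)) process driven by uniform white noise. The correlation coefficient `rho`, the fractional error `frac_err`, and the toy hyetograph are hypothetical values, not the ones used in the study.

```python
import numpy as np


def ar1_multipliers(n_steps, n_members, rho, frac_err, rng):
    """AR(1) multiplicative perturbations driven by uniform white noise.

    eps[t] = rho * eps[t-1] + sqrt(1 - rho**2) * eta[t],  eta ~ U(-frac_err, frac_err)
    The factor applied to rainfall at step t is (1 + eps[t]).
    """
    eta = rng.uniform(-frac_err, frac_err, size=(n_steps, n_members))
    eps = np.zeros((n_steps, n_members))
    eps[0] = eta[0]
    for t in range(1, n_steps):
        eps[t] = rho * eps[t - 1] + np.sqrt(1.0 - rho**2) * eta[t]
    return 1.0 + eps


rng = np.random.default_rng(7)

# Toy 48 h hyetograph (mm/h) with a single six-hour storm (hypothetical values).
rain = np.zeros(48)
rain[10:16] = [1.2, 3.5, 6.0, 4.1, 2.0, 0.5]

# Hypothetical settings: 100 members, lag-1 correlation 0.8, +/-30 % innovations.
mult = ar1_multipliers(n_steps=48, n_members=100, rho=0.8, frac_err=0.3, rng=rng)

# Perturbed rainfall ensemble; clipping guards against negative rainfall.
perturbed = np.clip(rain[:, None] * mult, 0.0, None)

# Pooled lag-1 autocorrelation of the perturbations, as a sanity check on rho.
lag1 = np.corrcoef((mult[:-1] - 1.0).ravel(), (mult[1:] - 1.0).ravel())[0, 1]
print(f"pooled lag-1 autocorrelation of the perturbations: {lag1:.2f}")
```

Because the error is multiplicative, dry hours stay dry in every member, and the autoregressive term keeps each member's error persistent over consecutive hours instead of fluctuating independently at every time step.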

Furthermore, we note the limitations of lumped-model uncertainty assessments in a basin with an area of 6900 km², especially regarding the impact of local variability on the basin’s hydrology. Nevertheless, we highlight the benefits of streamflow DA in an Andean basin of Peru for improving forecasting accuracy using real-time discharges at the basin outlet. Future work will assess DA techniques in the semi-distributed Vilcanota systems presented in [43] to test whether incorporating forcing spatialization, river routing, and sub-basin soil moisture spatialization in conceptual models, such as in [25,26], is more appropriate for Andean basins with sparse data availability. For instance, ref. [24] suggests that hydrologic river routing, such as the Muskingum method, is subject to potentially significant errors from structural and parametric uncertainties.

Finally, generalizing the results of these particular experiments to the wide hydrological heterogeneity of the Peruvian Andes [67] remains a pending task. However, the case of the Vilcanota River Basin, with the four DA schemes tested here, has distinctive features that can be expected to apply to a range of other basins in the Tropical Andes of Peru with the same data scarcity issue [68]. Overall, the results presented here benchmark the use of EnKF and PF for real-time streamflow DA in Peruvian hydrology.

5. Conclusions

The goal of this work was to evaluate the applicability and performance of streamflow data assimilation techniques in a lumped model with limited in situ hydrometeorological data in the southern Peruvian Andes. The model calibration and validation showed that it is possible to obtain reliable hourly streamflow simulations at the basin outlet (logNSE ≥ 0.75, BoxNSE ≥ 0.75, KGE ≥ 0.75, and −0.1 ≤ BIAS ≤ 0.1) for the period of available discharge data (~5 years of hourly records); however, this did not necessarily reduce the uncertainty of flow simulations. The notable differences between observed and simulated discharges in peak-flow seasons highlight that outstanding model skill over long hourly records (years) does not always reflect the real model fit in shorter time windows (hours, days, and weeks). For the streamflow data assimilation experiments, the study period was set between February and March 2022 to examine the improvement of high-flow simulations during the rainiest months, when river flooding occurs in the basin.

The idea of propagating information inside the basin through streamflow data assimilation was tested using four different configurations. Ensemble modeling with EnKF and PF showed that the perturbations introduced in GSMaP-NRT’-driven experiments reduced the model uncertainties more than those in IMERG-E’-driven ones. This result is relevant for operational purposes in the basin because IMERG-E’-driven models had better skill during model calibration and validation, and even during the Open Loop runs. Furthermore, the results showed that the assimilation of real-time discharge measurements for the period February–March 2022 drastically reduced the model underestimation of high flows at the basin outlet. This reduction was more notable for the GSMaP-NRT’+EnKF and GSMaP-NRT’+PF schemes.

The ensemble forecasting framework from 1 to 24 h proposed here shows that updating model states using EnKF and PF improves the accuracy of streamflow prediction at least during the first 6–8 h on average. It was not surprising that GSMaP-NRT’ had the best performance and, based on the ensemble verification metrics used, the GSMaP-NRT’+EnKF scheme had the best forecasting skill (NSE ≥ 0.50, −15 m³/s ≤ BIAS ≤ −10 m³/s, MRMSE ≤ 70 m³/s, and CRPSS ≥ 0.75) among the four experiments.

Finally, hydrologic data assimilation techniques have usually been tested in developed countries, where detailed hydrometeorological datasets of the most relevant hydrological features (rainfall, runoff, soil, land use, and other properties) are typically available. This work provides a benchmark of streamflow data assimilation for flood forecasting in a sparsely monitored basin using near real-time satellite precipitation products and highlights several aspects for new research in the Tropical Andes. In particular, the role of model discretization in uncertainty estimation must be understood in further detail to guide data collection and model development for enhanced decision making.

Author Contributions

Conceptualization, H.L. and W.L.-C.; methodology, H.L.; software, H.L.; validation, H.L. and W.L.-C.; formal analysis, H.L.; investigation, H.L.; resources, W.L.-C.; data curation, H.L.; writing—original draft preparation, H.L.; writing—review and editing, H.L., W.L.-C. and M.A.; visualization, H.L.; supervision, W.L.-C.; project administration, M.A.; funding acquisition, M.A. All authors have read and agreed to the published version of the manuscript.

Funding

We acknowledge the financial support of the Z Zurich Foundation through Practical Action Latin America as a partner of the Zurich Flood Resilience Alliance.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The data presented in this study will be made available upon request.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

BIAS	Error in observations and/or simulations.
BoxNSE	Nash–Sutcliffe Efficiency criterion with Box–Cox transformed values.
CRPS	Continuous Ranked Probability Score.
CRPSS	Continuous Ranked Probability Skill Score.
DA	Data Assimilation.
EnKF	Ensemble Kalman Filter.
GR4H	Génie Rural à 4 paramètres Horaire.
GSMaP-NRT	Global Satellite Mapping of Precipitation Near Real-Time product.
GSMaP-NRT’	GSMaP-NRT product merged with pluviometric stations.
GSMaP-NRT’+EnKF	EnKF experiment applied to the hydrological model forced with GSMaP-NRT’.
GSMaP-NRT’+OL	Open Loop for the hydrological model forced with GSMaP-NRT’.
GSMaP-NRT’+PF	PF experiment applied to the hydrological model forced with GSMaP-NRT’.
IMERG-E	Integrated Multi-satellitE Retrievals for GPM Early Runs product.
IMERG-E’	IMERG-E product merged with pluviometric stations.
IMERG-E’+EnKF	EnKF experiment applied to the hydrological model forced with IMERG-E’.
IMERG-E’+OL	Open Loop for the hydrological model forced with IMERG-E’.
IMERG-E’+PF	PF experiment applied to the hydrological model forced with IMERG-E’.
KGE	Kling-Gupta Efficiency criterion.
logNSE	Nash–Sutcliffe Efficiency criterion with logarithmic values.
MRMSE	Mean of Ensemble Root Mean Squared Error.
NSE	Nash–Sutcliffe Efficiency criterion.
OL	Open Loop.
PF	Particle Filter.
RMSE	Root Mean Square Error.
SM	Soil Moisture.

References

1. Poveda, G.; Espinoza, J.C.; Zuluaga, M.D.; Solman, S.A.; Garreaud, R.; van Oevelen, P.J. High Impact Weather Events in the Andes. Front. Earth Sci. 2020, 8.
2. Motschmann, A. Water Resource Risks in the Andes of Peru: An Integrative Perspective; University of Zurich: Zürich, Switzerland, 2021.
3. Ávila, Á.; Guerrero, F.C.; Escobar, Y.C.; Justino, F. Recent Precipitation Trends and Floods in the Colombian Andes. Water 2019, 11, 379.
4. Pinos, J.; Orellana, D.; Timbe, L. Assessment of Microscale Economic Flood Losses in Urban and Agricultural Areas: Case Study of the Santa Bárbara River, Ecuador. Nat. Hazards 2020, 103, 2323–2337.
5. Höglund, S.; Rodin, L. Flood Simulation in the Colombian Andean Region Using UAV-Based LiDAR: Minor Field Study in Colombia; KTH Royal Institute of Technology: Stockholm, Sweden, 2023.
6. Muñoz, P.; Orellana-Alvear, J.; Bendix, J.; Feyen, J.; Célleri, R. Flood Early Warning Systems Using Machine Learning Techniques: The Case of the Tomebamba Catchment at the Southern Andes of Ecuador. Hydrology 2021, 8, 183.
7. Wu, W.; Emerton, R.; Duan, Q.; Wood, A.W.; Wetterhall, F.; Robertson, D.E. Ensemble Flood Forecasting: Current Status and Future Opportunities. WIREs Water 2020, 7, e1432.
8. Drenkhan, F.; Carey, M.; Huggel, C.; Seidel, J.; Oré, M.T. The Changing Water Cycle: Climatic and Socioeconomic Drivers of Water-Related Changes in the Andes of Peru. Wiley Interdiscip. Rev. Water 2015, 2, 715–733.
9. Huggel, C.; Raissig, A.; Rohrer, M.; Romero, G.; Diaz, A.; Salzmann, N. How Useful and Reliable Are Disaster Databases in the Context of Climate and Global Change? A Comparative Case Study Analysis in Peru. Nat. Hazards Earth Syst. Sci. 2015, 15, 475–485.
10. Espinoza, J.C.; Chavez, S.; Ronchail, J.; Junquas, C.; Takahashi, K.; Lavado, W. Rainfall Hotspots over the Southern Tropical Andes: Spatial Distribution, Rainfall Intensity, and Relations with Large-Scale Atmospheric Circulation. Water Resour. Res. 2015, 51, 3459–3475.
11. Llauca, H.; Leon, K.; Lavado-Casimiro, W. Construction of a Daily Streamflow Dataset for Peru Using a Similarity-Based Regionalization Approach and a Hybrid Hydrological Modeling Framework. J. Hydrol. Reg. Stud. 2023, 47, 101381.
12. Lavado-Casimiro, W.; Silvestre, E.; Pulache, W. Extreme Rainfall Trends around Cusco and Its Relationship with the Floods in January 2010; Revista Peruana Geo-Atmosferica: Cusco, Peru, 2010.
13. Waldo, L.-C.; Juan Carlos, J.; Harold, L.; Karen, L.; Clara, O.; Alan, L.; Adrian, H.; Oscar, F.; Julia, A.; Pedro, R.; et al. ANDES: The First System for Flash Flood Monitoring and Forecasting in Peru. In Proceedings of the EGU General Assembly 2020, Online, 4–8 May 2020; p. 3759.
14. Fan, Y. Uncertainty Quantification in Hydrologic Predictions: A Brief Review. J. Environ. Inform. Lett. 2019, 2, 48–56.
15. Gupta, A.; Govindaraju, R.S. Uncertainty Quantification in Watershed Hydrology: Which Method to Use? J. Hydrol. 2023, 616, 128749.
16. Segovia-Cardozo, D.A.; Bernal-Basurco, C.; Rodríguez-Sinobas, L. Tipping Bucket Rain Gauges in Hydrological Research: Summary on Measurement Uncertainties, Calibration, and Error Reduction Strategies. Sensors 2023, 23, 5385.
17. McMillan, H.K.; Westerberg, I.K.; Krueger, T. Hydrological Data Uncertainty and Its Implications. WIREs Water 2018, 5, e1319.
18. Saavedra, D.; Mendoza, P.A.; Addor, N.; Llauca, H.; Vargas, X. A Multi-objective Approach to Select Hydrological Models and Constrain Structural Uncertainties for Climate Impact Assessments. Hydrol. Process. 2021, 36, e14446.
19. Herrera, P.A.; Marazuela, M.A.; Hofmann, T. Parameter Estimation and Uncertainty Analysis in Hydrological Modeling. WIREs Water 2022, 9, e1569.
20. Panchanathan, A.; Ahrari, A.H.; Ghag, K.; Mustafa, S.M.T.; Haghighi, A.T.; Kløve, B.; Oussalah, M. An Overview of Approaches for Reducing Uncertainties in Hydrological Forecasting: Progress, and Challenges. Syst. Rev. 2023. Available online: https://www.researchsquare.com/article/rs-2802423/v1 (accessed on 26 October 2023).
21. Rasmussen, J.; Madsen, H.; Jensen, K.H.; Refsgaard, J.C. Data Assimilation in Integrated Hydrological Modelling in the Presence of Observation Bias. Hydrol. Earth Syst. Sci. Discuss. 2015, 12, 8131–8173.
22. Avellaneda, P.M.; Ficklin, D.L.; Lowry, C.S.; Knouft, J.H.; Hall, D.M. Improving Hydrological Models with the Assimilation of Crowdsourced Data. Water Resour. Res. 2020, 56, e2019WR026325.
23. Boucher, M.-A.; Quilty, J.; Adamowski, J. Data Assimilation for Streamflow Forecasting Using Extreme Learning Machines and Multilayer Perceptrons. Water Resour. Res. 2020, 56, e2019WR026226.
24. Noh, S.J.; Lee, H.S.; Seo, D.J. Streamflow Data Assimilation for Hydrologic River Routing: Advances and Challenges. In Proceedings of the American Geophysical Union, Fall Meeting 2019, Washington, DC, USA, 9–13 December 2019; Volume 2019. pp. H31J–1853.
25. Mazzoleni, M.; Noh, S.J.; Lee, H.; Liu, Y.; Seo, D.-J.; Amaranto, A.; Alfonso, L.; Solomatine, D.P. Real-Time Assimilation of Streamflow Observations into a Hydrological Routing Model: Effects of Model Structures and Updating Methods. Hydrol. Sci. J. 2018, 63, 386–407.
26. Li, Y.; Ryu, D.; Western, A.W.; Wang, Q.J. Assimilation of Stream Discharge for Flood Forecasting: Updating a Semidistributed Model with an Integrated Data Assimilation Scheme. Water Resour. Res. 2015, 51, 3238–3258.
27. Mendoza, P.A.; McPhee, J.; Vargas, X. Uncertainty in Flood Forecasting: A Distributed Modeling Approach in a Sparse Data Catchment. Water Resour. Res. 2012, 48, W09532.
28. Li, Y.; Ryu, D.; Western, A.W.; Wang, Q.J.; Robertson, D.E.; Crow, W.T. An Integrated Error Parameter Estimation and Lag-Aware Data Assimilation Scheme for Real-Time Flood Forecasting. J. Hydrol. 2014, 519, 2722–2736.
29. Bergeron, J.; Leconte, R.; Trudel, M.; Farhoodi, S. On the Choice of Metric to Calibrate Time-Invariant Ensemble Kalman Filter Hyper-Parameters for Discharge Data Assimilation and Its Impact on Discharge Forecast Modelling. Hydrology 2021, 8, 36.
30. Piazzi, G.; Thirel, G.; Perrin, C.; Delaigue, O. Sequential Data Assimilation for Streamflow Forecasting: Assessing the Sensitivity to Uncertainties and Updated Variables of a Conceptual Hydrological Model at Basin Scale. Water Resour. Res. 2021, 57, e2020WR028390.
31. Leach, J.M.; Coulibaly, P. An Extension of Data Assimilation into the Short-Term Hydrologic Forecast for Improved Prediction Reliability. Adv. Water Resour. 2019, 134, 103443.
32. Wang, S.; Ancell, B.C.; Huang, G.H.; Baetz, B.W. Improving Robustness of Hydrologic Ensemble Predictions through Probabilistic Pre- and Post-processing in Sequential Data Assimilation. Water Resour. Res. 2018, 54, 2129–2151.
33. Emery, C.M.; Biancamaria, S.; Boone, A.; Ricci, S.; Rochoux, M.C.; Pedinotti, V.; David, C.H. Assimilation of Wide-Swath Altimetry Water Elevation Anomalies to Correct Large-Scale River Routing Model Parameters. Hydrol. Earth Syst. Sci. 2020, 24, 2207–2233.
34. Wongchuig, S.C.; de Paiva, R.C.D.; Siqueira, V.; Collischonn, W. Hydrological Reanalysis across the 20th Century: A Case Study of the Amazon Basin. J. Hydrol. 2019, 570, 755–773.
35. Paiva, R.C.D.; Collischonn, W.; Bonnet, M.-P.; de Gonçalves, L.G.G.; Calmant, S.; Getirana, A.; Santos da Silva, J. Assimilating in Situ and Radar Altimetry Data into a Large-Scale Hydrologic-Hydrodynamic Model for Streamflow Forecast in the Amazon. Hydrol. Earth Syst. Sci. Discuss. 2013, 10, 2879–2925.
36. Cortés, G.; Girotto, M.; Margulis, S. Snow Process Estimation over the Extratropical Andes Using a Data Assimilation Framework Integrating MERRA Data and Landsat Imagery. Water Resour. Res. 2016, 52, 2582–2600.
37. Mount, N.J.; Maier, H.R.; Toth, E.; Elshorbagy, A.; Solomatine, D.; Chang, F.-J.; Abrahart, R.J. Data-Driven Modelling Approaches for Socio-Hydrology: Opportunities and Challenges within the Panta Rhei Science Plan. Hydrol. Sci. J. 2016, 61, 1192–1208.
38. Le Moine, N. Le Bassin Versant de Surface vu Par Le Souterrain: Une Voie d’amélioration Des Performances et Du Réalisme Des Modèles Pluie-Débit? Ph.D. Thesis, Université Pierre et Marie Curie, Paris, France, 2008.
39. Chancay, J.E.; Espitia-Sarmiento, E.F. Improving Hourly Precipitation Estimates for Flash Flood Modeling in Data-Scarce Andean-Amazon Basins: An Integrative Framework Based on Machine Learning and Multiple Remotely Sensed Data. Remote Sens. 2021, 13, 4446.
40. Caligiuri, S.; Camera, C.; Masetti, M.; Bruggeman, A.; Sofokleous, I. Testing GR4H Model Parameter Transferability for Extreme Events in Cyprus: Evaluation of a Cluster Analysis Approach. In Proceedings of the EGU General Assembly 2019, Vienna, Austria, 7–12 April 2019; Geophysical Research Abstracts. Volume 21.
41. Basri, H.; Sidek, L.M.; Razad, A.Z.; Pokhrel, P. Hydrological Modelling of Surface Runoff for Temengor Reservoir Using GR4H Model. Int. J. Civ. Eng. Technol. 2019, 10, 29–40.
42. Ayzel, G.; Heistermann, M. The Effect of Calibration Data Length on the Performance of a Conceptual Hydrological Model versus LSTM and GRU: A Case Study for Six Basins from the CAMELS Dataset. Comput. Geosci. 2021, 149, 104708.
43. Llauca, H.; Lavado-Casimiro, W.; León, K.; Jimenez, J.; Traverso, K.; Rau, P. Assessing Near Real-Time Satellite Precipitation Products for Flood Simulations at Sub-Daily Scales in a Sparsely Gauged Watershed in Peruvian Andes. Remote Sens. 2021, 13, 826.
44. Fernandez-Palomino, C.A.; Hattermann, F.F.; Krysanova, V.; Vega-Jácome, F.; Bronstert, A. Towards a More Consistent Eco-Hydrological Modelling through Multi-Objective Calibration: A Case Study in the Andean Vilcanota River Basin, Peru. Hydrol. Sci. J. 2021, 66, 59–74.
45. Aybar, C.; Fernández, C.; Huerta, A.; Lavado, W.; Vega, F.; Felipe-Obando, O. Construction of a High-Resolution Gridded Rainfall Dataset for Peru from 1981 to the Present Day. Hydrol. Sci. J. 2020, 65, 770–785.
46. Huerta, A.; Camacho, C.L.A.; Imfeld, N.; Correa, K. High-Resolution Grids of Daily Air Temperature for Peru-the New PISCOt v1.2 Dataset. 2022. Available online: https://eartharxiv.org/repository/view/4864/ (accessed on 26 October 2023).
47. Huerta, A.; Bonnesoeur, V.; Cuadros-Adriazola, J.; Gutierrez, L.; Ochoa-Tocachi, B.F.; Román-Dañobeytia, F.; Lavado-Casimiro, W. PISCOeo_pm, a Reference Evapotranspiration Gridded Database Based on FAO Penman-Monteith in Peru. Sci. Data 2022, 9, 328.
48. Dinku, T.; Hailemariam, K.; Maidment, R.; Tarnavsky, E.; Connor, S. Combined Use of Satellite Estimates and Rain Gauge Observations to Generate High-Quality Historical Rainfall Time Series over Ethiopia. Int. J. Climatol. 2014, 34, 2489–2504.
49. Perrin, C.; Michel, C.; Andréassian, V. Improvement of a Parsimonious Model for Streamflow Simulation. J. Hydrol. 2003, 279, 275–289.
50. Duan, Q.Y.; Gupta, V.K.; Sorooshian, S. Shuffled Complex Evolution Approach for Effective and Efficient Global Minimization. J. Optim. Theory Appl. 1993, 76, 501–521.
51. Pfannerstill, M.; Guse, B.; Fohrer, N. Smart Low Flow Signature Metrics for an Improved Overall Performance Evaluation of Hydrological Models. J. Hydrol. 2014, 510, 447–458.
52. Mizukami, N.; Rakovec, O.; Newman, A.J.; Clark, M.P.; Wood, A.W.; Gupta, H.V.; Kumar, R. On the Choice of Calibration Metrics for “High-Flow” Estimation Using Hydrologic Models. Hydrol. Earth Syst. Sci. 2019, 23, 2601–2614.
53. Jafarzadegan, K.; Abbaszadeh, P.; Moradkhani, H. Sequential Data Assimilation for Real-Time Probabilistic Flood Inundation Mapping. Hydrol. Earth Syst. Sci. 2021, 25, 4995–5011.
54. Clark, M.P.; Rupp, D.E.; Woods, R.A.; Zheng, X.; Ibbitt, R.P.; Slater, A.G.; Schmidt, J.; Uddstrom, M.J. Hydrological Data Assimilation with the Ensemble Kalman Filter: Use of Streamflow Observations to Update States in a Distributed Hydrological Model. Adv. Water Resour. 2008, 31, 1309–1324.
55. Piazzi, G.; Delaigue, O. Ensemble-Based Data Assimilation with GR Hydrological Models (v. 0.1.3); HAL; R Foundation: Vienna, Austria, 2021; Available online: https://doi.org/hal-03301603 (accessed on 26 October 2023).
56. Moradkhani, H.; Sorooshian, S.; Gupta, H.V.; Houser, P.R. Dual State–Parameter Estimation of Hydrological Models Using Ensemble Kalman Filter. Adv. Water Resour. 2005, 28, 135–147.
57. Reichle, R.H.; McLaughlin, D.B.; Entekhabi, D. Hydrologic Data Assimilation with the Ensemble Kalman Filter. Mon. Weather Rev. 2002, 130, 103–114.
58. Berg, D.; Bauser, H.H.; Roth, K. Covariance Resampling for Particle Filter—State and Parameter Estimation for Soil Hydrology. Hydrol. Earth Syst. Sci. 2019, 23, 1163–1178.
59. Jamal, A.; Linker, R. Covariance-Based Selection of Parameters for Particle Filter Data Assimilation in Soil Hydrology. Water 2022, 14, 3606.
60. He, X.; Lucatero, D.; Ridler, M.-E.; Madsen, H.; Kidmose, J.; Hole, Ø.; Petersen, C.; Zheng, C.; Refsgaard, J.C. Real-Time Simulation of Surface Water and Groundwater with Data Assimilation. Adv. Water Resour. 2019, 127, 13–25.
61. Wang, F.; Huang, G.H.; Fan, Y.; Li, Y.P. Development of a Disaggregated Multi-Level Factorial Hydrologic Data Assimilation Model. J. Hydrol. 2022, 610, 127802.
62. Nearing, G.S.; Klotz, D.; Frame, J.M. Data Assimilation and Autoregression for Using Near-Real-Time Streamflow Observations in Long Short-Term Memory Networks. Hydrol. Earth Syst. Sci. 2022, 26, 5493–5513.
63. De Sousa, E.R.; Hipsey, M.R.; Vogwill, R.I.J. Data Assimilation, Sensitivity Analysis and Uncertainty Quantification in Semi-Arid Terminal Catchments Subject to Long-Term Rainfall Decline. Front. Earth Sci. 2023, 10, 886304.
64. Mansanarez, V.; Renard, B.; Coz, J.L. Shift Happens! Adjusting Stage-discharge Rating Curves to Morphological Changes at Known Times. Water Resour. Res. 2019, 55, 2876–2899.
65. Arestegui, M.; Lavado, W.; Cisneros, A.; Madueño, G.; Almeida, C.; Millán, C.; Bazo, J.; Anicama, J. Exploration of Flood Lead-Times through River Level Monitoring: A Case Study from the Vilcanota River in Cusco, Peru. In Proceedings of the EGU General Assembly 2023, Online, 23–28 April 2023; p. 17570.
66. Paul, J.D.; Buytaert, W.; Sah, N. A Technical Evaluation of Lidar-based Measurement of River Water Levels. Water Resour. Res. 2020, 56, e2019WR026810.
67. Llauca, H.; Lavado-Casimiro, W.; Montesinos, C.; Santini, W.; Rau, P. PISCO_HyM_GR2M: A Model of Monthly Water Balance in Peru (1981–2020). Water 2021, 13, 1048.
68. Condom, T.; Martínez, R.; Pabón, J.D.; Costa, F.; Pineda, L.; Nieto, J.J.; López, F.; Villacis, M. Climatological and Hydrological Observations for the South American Andes: In Situ Stations, Satellite, and Reanalysis Data Sets. Front. Earth Sci. 2020, 8, 92.

Figure 1. The Pisac gauge station in the Vilcanota River Basin, located in the southern Peruvian Tropical Andes. The available pluviometric network in the study domain is shown as orange triangles.

Figure 2. Seasonal variations in rainfall, mean air temperature, and discharge for the climatological period 1981–2010 on the Vilcanota River Basin at the Pisac stream gauge station. Monthly rainfall [45] and air temperature [46] data were obtained from the gridded PISCO dataset.

Figure 3. Streamflow data assimilation framework in the Vilcanota River Basin.

Figure 4. (a,b) Comparison of simulated (IMERG-E’ and GSMaP-NRT’) and observed hourly flow series at the Pisac stream gauge station during 2017–2022. The red dashed box represents the period February–March 2022 selected for testing streamflow DA techniques. (c,d) Scatter plots between observed and simulated discharges for the period February–March 2022.

Figure 5. Comparison between observed and simulated hourly flow series during February and March 2022 at the Pisac stream gauge station for (a) IMERG-E’+EnKF, (b) IMERG-E’+PF, (c) GSMaP-NRT’+EnKF, and (d) GSMaP-NRT’+PF experiments.

Figure 6. Uncertainty in simulated soil moisture (S) from the GR4H model in the Vilcanota Basin during February and March 2022 for (a) IMERG-E’+EnKF, (b) IMERG-E’+PF, (c) GSMaP-NRT’+EnKF, and (d) GSMaP-NRT’+PF experiments.

Figure 7. Performance skills of hourly forecasted discharges from 1 to 24 h of lead times at the Pisac stream gauge station from February to March 2022, based on (a) NSE, (b) BIAS, (c) MRMSE and (d) CRPSS coefficients.

Figure 8. Comparison of observed and forecasted hourly flow series at the Pisac station from February to March 2022, for lead times of (a) 1, (b) 3, (c) 6, (d) 12, (e) 18, and (f) 24 h in the GSMaP-NRT’+EnKF experiment.

Table 1. Selected gauges with hourly records in the study domain.

Type	Station	Abbrev.	Longitude [°W]	Latitude [°S]	Elevation [m.a.s.l.]
Fluviometric	Pisac	PIS	71.84	13.43	2791.65
Pluviometric	Acjanaco Gore	AGR	71.62	13.20	3466.11
	Calca	CAL	71.96	13.33	2921.24
	Casaccancha	CAS	72.30	13.99	4033.16
	Huayllabamba	HUA	72.45	13.27	2976.55
	Intihuatana M	INM	72.56	13.17	1778.23
	Machupicchu	MAC	72.55	13.18	2399.80
	Marcapata Gore	MAR	70.90	13.50	1792.76
	Qorihuayrachina	QOR	72.43	13.22	2517.25
	Salcca	SAL	71.23	14.17	3920.10
	San Pablo	SPB	72.62	13.03	1228.11
	Santo Tomas	STM	72.10	14.45	3665.48
	Sicuani	SIC	71.24	14.24	3534.95

Table 2. Statistical metrics and their corresponding equations used for evaluating the hydrological performance of the GR4H model.

Statistical Metric	Equation	Min, Max, Optimal	Emphasis
Logarithmic Nash–Sutcliffe Efficiency (logNSE) [-]	$\mathrm{logNSE} = 1 - \frac{\sum_{t=1}^{n} \left(\log Q_{sim,t} - \log Q_{obs,t}\right)^{2}}{\sum_{t=1}^{n} \left(\log Q_{obs,t} - \overline{\log Q_{obs}}\right)^{2}}$	−∞,1,1	Low flows [51]
Nash–Sutcliffe Efficiency with Box–Cox transformation (BoxNSE) [-]	$\mathrm{BoxNSE} = 1 - \frac{\sum_{t=1}^{n} \left(Q'_{sim,t} - Q'_{obs,t}\right)^{2}}{\sum_{t=1}^{n} \left(Q'_{obs,t} - \overline{Q'_{obs}}\right)^{2}}$ $Q' = \frac{(Q + 1)^{\gamma} - 1}{\gamma}$	−∞,1,1	Middle flows [26]
Kling–Gupta Efficiency (KGE) [-]	$\mathrm{KGE} = 1 - \sqrt{(r - 1)^{2} + (\alpha - 1)^{2} + (\beta - 1)^{2}}$ $r = \frac{\sum_{t=1}^{n} \left(Q_{sim,t} - \bar{Q}_{sim}\right)\left(Q_{obs,t} - \bar{Q}_{obs}\right)}{\sqrt{\sum_{t=1}^{n} \left(Q_{sim,t} - \bar{Q}_{sim}\right)^{2}} \sqrt{\sum_{t=1}^{n} \left(Q_{obs,t} - \bar{Q}_{obs}\right)^{2}}}$ $\alpha = \frac{\sigma_{sim}}{\sigma_{obs}}; \beta = \frac{\mu_{sim}}{\mu_{obs}}$	−∞,1,1	Variance and high flows [52]
Bias (BIAS) [%]	$\mathrm{BIAS} = \left[\max\left(\frac{\bar{Q}_{sim}}{\bar{Q}_{obs}}, \frac{\bar{Q}_{obs}}{\bar{Q}_{sim}}\right) - 1\right]^{2}$	0,+∞,0	Average trend of simulated flows [26]

Notes: Qobs: corresponds to observed discharges; Qsim: corresponds to simulated discharges.
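The calibration metrics in Table 2 can be written compactly in Python. The sketch below follows the equations as listed in the table. The function names, the toy discharge series, and the Box–Cox exponent `gamma=0.25` are hypothetical choices for illustration, since the value of γ is not stated in this section.

```python
import numpy as np


def nse(sim, obs):
    """Nash-Sutcliffe Efficiency on whatever (transformed) series is passed in."""
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)


def log_nse(sim, obs):
    """NSE on log-transformed flows (emphasis on low flows); flows must be > 0."""
    return nse(np.log(sim), np.log(obs))


def box_cox(q, gamma):
    """Box-Cox transform Q' = ((Q + 1)^gamma - 1) / gamma."""
    return ((q + 1.0) ** gamma - 1.0) / gamma


def box_nse(sim, obs, gamma):
    """NSE on Box-Cox transformed flows (emphasis on middle flows)."""
    return nse(box_cox(sim, gamma), box_cox(obs, gamma))


def kge(sim, obs):
    """Kling-Gupta Efficiency from correlation, variability ratio, and mean ratio."""
    r = np.corrcoef(sim, obs)[0, 1]
    alpha = sim.std() / obs.std()
    beta = sim.mean() / obs.mean()
    return 1.0 - np.sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2)


def bias(sim, obs):
    """Symmetric squared bias of the mean flows, as defined in Table 2."""
    ratio = sim.mean() / obs.mean()
    return (max(ratio, 1.0 / ratio) - 1.0) ** 2


# Toy hourly discharges in m³/s (hypothetical values); the simulation is 10 % low.
obs = np.array([55.0, 80.0, 120.0, 210.0, 340.0, 260.0, 150.0, 90.0])
sim = 0.9 * obs

scores = {
    "logNSE": log_nse(sim, obs),
    "BoxNSE": box_nse(sim, obs, gamma=0.25),  # gamma is an assumed value
    "KGE": kge(sim, obs),
    "BIAS": bias(sim, obs),
}
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Because the BIAS definition takes the larger of the two mean-flow ratios, it penalizes overestimation and underestimation symmetrically and equals 0 only when the simulated and observed means coincide.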

Table 3. Statistical metrics and their respective equations used for evaluating forecasts.

Statistical Metric	Equation
Nash–Sutcliffe Efficiency (NSE) [-]	$\mathrm{NSE} = 1 - \frac{\sum_{t=1}^{n} \left(Q_{fcst,t} - Q_{obs,t}\right)^{2}}{\sum_{t=1}^{n} \left(Q_{obs,t} - \bar{Q}_{obs}\right)^{2}}$
Bias (BIAS) [m³/s]	$\mathrm{BIAS} = \frac{1}{n} \sum_{t=1}^{n} \left(Q_{fcst,t} - Q_{obs,t}\right)$
Mean of Ensemble Root Mean Squared Error (MRMSE) [m³/s]	$\mathrm{MRMSE} = \frac{1}{n} \sum_{t=1}^{n} \sqrt{\frac{1}{F} \sum_{j=1}^{F} \left(Q^{j}_{fcst,t} - Q_{obs,t}\right)^{2}}$
Continuous Ranked Probability Skill Score (CRPSS) [-]	$\mathrm{CRPS} = \frac{1}{n} \sum_{t=1}^{n} \int_{-\infty}^{\infty} \left(F_{fcst,t}(Q) - F_{obs,t}(Q)\right)^{2} dQ$ $F_{obs,t}(Q) = \begin{cases} 0, & Q < Q_{obs,t} \\ 1, & Q \geq Q_{obs,t} \end{cases}$ $\mathrm{CRPSS} = 1 - \frac{\mathrm{CRPS}}{\mathrm{CRPS}_{OL}}$

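Notes: Qobs: corresponds to observed hourly discharges; Qfcst: corresponds to forecasted discharges at different lead times. In the MRMSE equation, F is the number of ensemble members and j is the member index. Ffcst and Fobs are the cumulative distribution functions of the forecast and the observation, respectively. CRPS is the continuous ranked probability score, and OL corresponds to the Open Loop model.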
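The forecast verification metrics in Table 3 can be sketched in Python as follows. This is an illustrative sketch, not the code used in the study. The CRPS is evaluated for the empirical ensemble distribution through the identity CRPS = E|X − y| − ½E|X − X′|, which equals the integral form in the table when the forecast CDF is the empirical CDF of the members. The synthetic DA and Open Loop ensembles, their biases and spreads, and all function names are hypothetical.

```python
import numpy as np


def nse(fcst, obs):
    """Nash-Sutcliffe Efficiency of a deterministic forecast (e.g. the ensemble mean)."""
    return 1.0 - np.sum((fcst - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)


def bias(fcst, obs):
    """Mean error in m³/s; negative values indicate underestimation."""
    return float(np.mean(fcst - obs))


def mrmse(ens, obs):
    """Time mean of the per-step RMSE taken across the F ensemble members.

    ens : (n, F) ensemble forecasts, obs : (n,) observations
    """
    return float(np.mean(np.sqrt(np.mean((ens - obs[:, None]) ** 2, axis=1))))


def crps(ens, obs):
    """Time-mean CRPS of the empirical ensemble CDF against the observations."""
    term1 = np.mean(np.abs(ens - obs[:, None]), axis=1)
    term2 = 0.5 * np.mean(np.abs(ens[:, :, None] - ens[:, None, :]), axis=(1, 2))
    return float(np.mean(term1 - term2))


def crpss(ens, ens_ol, obs):
    """Skill relative to the Open Loop ensemble: 1 is perfect, 0 is no gain."""
    return 1.0 - crps(ens, obs) / crps(ens_ol, obs)


# Synthetic example (all numbers hypothetical): 8 time steps, 100 members.
rng = np.random.default_rng(0)
obs = np.array([120.0, 150.0, 210.0, 320.0, 465.0, 380.0, 260.0, 180.0])
ens_da = obs[:, None] - 10.0 + rng.normal(0.0, 15.0, (obs.size, 100))
ens_ol = obs[:, None] - 60.0 + rng.normal(0.0, 40.0, (obs.size, 100))

scores = {
    "NSE": nse(ens_da.mean(axis=1), obs),
    "BIAS": bias(ens_da.mean(axis=1), obs),
    "MRMSE": mrmse(ens_da, obs),
    "CRPSS": crpss(ens_da, ens_ol, obs),
}
for name, value in scores.items():
    print(f"{name}: {value:.2f}")
```

NSE and BIAS are computed here on the ensemble mean, whereas MRMSE and CRPSS use every member, so the latter two also reflect the ensemble spread.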

Table 4. Model performance metrics during calibration and validation periods.

Statistical Metric	Calibration		Validation	
	IMERG-E’	GSMaP-NRT’	IMERG-E’	GSMaP-NRT’
logNSE	0.875	0.792	0.878	0.786
BoxNSE	0.883	0.831	0.878	0.819
KGE	0.912	0.871	0.869	0.789
BIAS	0.003	0.003	0.016	0.029

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
