Article history:
Received 20 July 2016
Received in revised form 20 September
Accepted 24 November 2016
Available online 25 November 2016

This manuscript was handled by Corrado Corradini, Editor-in-Chief, with the assistance of Subashisa Dutta, Associate Editor

Keywords:
Mean annual discharge
River discharge
Global hydrology
Empirical modelling
Predictions in ungauged basins
Scaling relationships
Model comparison
Spatial error model

Abstract
Quantifying mean annual flow of rivers (MAF) at ungauged sites is essential for assessments of global water supply, ecosystem integrity and water footprints. MAF can be quantified with spatially explicit process-based models, which might be overly time-consuming and data-intensive for this purpose, or with empirical regression models that predict MAF based on climate and catchment characteristics. Yet, regression models have mostly been developed at a regional scale and the extent to which they can be extrapolated to other regions is not known. In this study, we developed a global-scale regression model for MAF based on a dataset unprecedented in size, using observations of discharge and catchment characteristics from 1885 catchments worldwide, measuring between 2 and 10⁶ km². In addition, we compared the performance of the regression model with the predictive ability of the spatially explicit global hydrological model PCR-GLOBWB by comparing results from both models to independent measurements. We obtained a regression model explaining 89% of the variance in MAF based on catchment area and catchment-averaged mean annual precipitation and air temperature, slope and elevation. The regression model performed better than PCR-GLOBWB for the prediction of MAF, as root-mean-square error (RMSE) values were lower (0.29–0.38 compared to 0.49–0.57) and the modified index of agreement (d) was higher (0.80–0.83 compared to 0.72–0.75). Our regression model can be applied globally to estimate MAF at any point of the river network, thus providing a feasible alternative to spatially explicit process-based global hydrological models.
© 2016 Elsevier B.V. All rights reserved.
[…] regression equations relating streamflow to explanatory catchment characteristics like upstream drainage area, precipitation and temperature may help to better understand general hydrological patterns and processes across different scales (Burgers et al., 2013; Farmer et al., 2015). However, to date, regression-based approaches relating mean annual streamflow to catchment characteristics have mainly been applied at a regional scale (Hortness and Berenbrock, 2001; Stuckey, 2006; Tran et al., 2015; Verdin and Worstell, 2008; Vogel et al., 1999) or to specific climate zones (Syvitski et al., 2003), and the extent to which these models can be extrapolated to other regions is not known. Regression relationships at the global scale have hardly been established so far. An exception is Burgers et al. (2013), who derived MAF relationships at a global scale using precipitation and catchment area as predictors. However, their model explained only 56% of the variance in MAF, which is low compared to the range of 77–99% achieved by regional regression models (e.g. Verdin and Worstell, 2008). Yet, the regional studies typically included a larger number of predictors, which suggests that the explanatory power of a global-scale regression model may increase if relevant predictors are added. In addition, the applicability of global regression relationships for the prediction of mean annual streamflow has not yet been tested.
Therefore, the aim of this study was twofold: (1) to establish an empirical regression model relating MAF to easily retrievable catchment characteristics at the global scale; (2) to test the predictive ability of the regression model in a backcasting analysis and compare its performance with the predictive performance of PCR-GLOBWB, a spatially explicit MHM (van Beek et al., 2011). To our knowledge, our study is the first to make an explicit comparison of the predictive abilities of a process-based and a regression-based global-scale model for MAF.
We based our regression model on measured long-term average MAF from 1885 catchments worldwide, ranging from 2 km² to 10⁶ km² in size. We used five predictor variables, including two climatic variables – mean annual precipitation and air temperature – and three geomorphologic variables – area, mean slope and mean elevation of the catchment. Drainage area, mean annual precipitation and mean annual temperature are often used as predictors of MAF in regional regression modelling studies (Verdin and Worstell, 2008; Vogel and Sankarasubramanian, 2000; Vogel et al., 1999). The dependence of MAF on drainage area is a well-accepted power relationship reflecting the self-similarity of river systems (Rodríguez-Iturbe and Rinaldo, 2001). Mean annual precipitation represents the potential runoff of the catchment, as it equals the amount of water supplied to the catchment (Thomas and Benson, 1970). We selected the mean annual temperature as a proxy for the potential evapotranspiration (PET), because temperature is a major determinant of evapotranspiration (Hamon, 1963; Lu et al., 2005; Thornthwaite, 1948). Furthermore, previous regression analyses of MAF have shown an increased explained variance when additional geomorphologic parameters were considered (Hortness and Berenbrock, 2001; Stuckey, 2006; Vogel et al., 1999). Therefore, we included average slope and elevation of the catchment as additional predictors in our study. Although elevation and slope alone may not directly influence MAF, they may serve as proxies for other factors that cause inter-basin streamflow variation and are difficult to measure, e.g. radiation, wind, vegetation and basin ruggedness (Thomas and Benson, 1970).

2. Materials and methods

2.1. Mean annual discharge data

We retrieved worldwide MAF data from the Global Runoff Data Centre (GRDC) database, which provides daily or monthly observations of 9213 gauging stations monitored from 1806 to 2015, with variable record length (GRDC, 2015). The GRDC has spent more than 25 years gathering river discharge data from the National Hydrological Services of all the World Meteorological Organization (WMO) state members, which has resulted in a discharge dataset unprecedented in size. For example, the SAGE Global River
Fig. 1. Distribution of the 1885 GRDC gauging stations monitored for at least 15 years in the 1981–2010 period. The stations are grouped based on the mean annual flow
(MAF) recorded at each station. Next to each MAF category, the number of observations is provided in brackets.
Discharge Database (http://nelson.wisc.edu/sage/data-and-models/riverdata/) and the RivDis database (Vorosmarty et al., 1998) provide discharge data for 3500 and 1018 stations, respectively. The accuracy of the discharge measurements included in the GRDC database is estimated to be about 10–20% (Syvitski et al., 2005). For our model development we selected discharge data for the period 1981–2010. We used a 30-year period because this is in accordance with the recommendations for climate analyses (World Meteorological Organization, 1992). We excluded years after 2010 because of a decrease in data availability for the most recent years. We averaged the daily discharge data over each year, using only those years in which 100% of the daily observations were available. Next, we averaged the yearly values over the period 1981–2010 to obtain a long-term mean annual discharge for each catchment. We selected monitoring stations with at least 15 yearly average discharge values in order to obtain representative long-term mean values (Kennard et al., 2010). This resulted in a dataset of 1885 observations out of the 2759 available GRDC gauged catchments with observations in the time range 1981–2010 (Fig. 1).

2.2. Catchment characteristics

We retrieved catchment-specific values for catchment area (A) and catchment-averaged altitude (H), slope (S), 30-year mean temperature (T) and 30-year mean precipitation (P) from a combination of data sources (Table 1). Catchment area was retrieved from the GRDC database, which includes a georeferenced map of the upstream catchment corresponding with each gauging station (GRDC, 2011). Catchment boundaries in this map have been established based on the HydroSHEDS drainage network, a 15 arc-second hydrological map derived from 3 arc-second elevation data of the National Aeronautics and Space Administration (NASA) Shuttle Radar Topography Mission (SRTM), extended with the hydro1k hydrological network for latitudes above 60°N, which are not covered by the SRTM data (GRDC, 2011; Lehner et al., 2008).
We derived altitude from the WorldClim digital elevation model (DEM), which is a 30 arc-second DEM based upon the SRTM elevation data extended with GTOPO30 elevation data for latitudes above 60°N (Hijmans et al., 2005). We derived a raster slope map from the WorldClim DEM using the 'average maximum technique' in ArcGIS, similar to Hortness and Berenbrock (2001). For precipitation and temperature, we averaged 30 arc-minute resolution monthly raster maps from the Climatic Research Unit time series (CRU TS) 3.23 to annual values, and subsequently averaged these over the period 1981–2010 (Harris et al., 2014; University of East Anglia Climatic Research Unit, 2013).
We resampled the raster maps obtained for precipitation, temperature, altitude and slope in order to match the 15 arc-second resolution of the HydroSHEDS drainage network, upon which the watershed boundaries were established. We then calculated a single mean value of each variable for each catchment corresponding with a GRDC gauging station, as described by Syvitski et al. (2003). Resampling and averaging were performed in ArcGIS 10.3. An overview of the summary statistics of the variables is available in Table 1.

2.3. Model fitting

Methods available for correlative modelling range from parametric and non-parametric regression-based approaches to machine-learning techniques (Chen et al., 2015; Danandeh Mehr et al., 2013; Fan et al., 2015; Okkan and Serbes, 2012; Wang et al., 2015; Wu et al., 2009). For the present study we selected ordinary least squares (OLS) regression because it results in an explicit equation, which facilitates interpretation and comparison with other studies. All variables except temperature were log-transformed to avoid heteroscedasticity, as they showed a right-skewed distribution (Table 1), in agreement with the choices made in previous studies for similar variables (Burgers et al., 2013; Hendriks et al., 2012; Syvitski et al., 2003; Verdin and Worstell, 2008; Vogel et al., 1999). This resulted in the following linear regression equation:

$$\log_{10} Q = \beta_0 + \beta_A \log_{10} A + \beta_P \log_{10} P + \beta_T T + \beta_S \log_{10} S + \beta_H \log_{10} H + \varepsilon \qquad (1)$$

where β0 is the intercept, ⟨βA, βP, βT, βS, βH⟩ is the vector of the regression coefficients associated with the predictor variables (symbols are described in Table 1) and ε is the error term. Back-transforming Eq. (1) to the real scale yields the nonlinear formulation:

$$Q = 10^{\beta_0}\, A^{\beta_A}\, P^{\beta_P}\, 10^{\beta_T T}\, S^{\beta_S}\, H^{\beta_H}\, 10^{\varepsilon} \qquad (2)$$

Prior to performing the regression analysis, we assessed multicollinearity among the predictors using Variance Inflation Factors (VIFs), employing the function "vif" of the package "HH" (Heiberger, 2015) in the R environment (Development Core Team, 2005). We preferred VIFs over bivariate correlation analysis because pairwise correlation coefficients do not reveal more subtle forms of multicollinearity (Field, 2009). The maximum VIF was 2.8, well below the standard threshold of 5 (Zuur et al., 2009). We then fitted OLS regression models for an increasing number of predictors. To identify the best regression model for each of the five sets of predictors (first set with one predictor variable, second set with two predictor variables, etc.) as well as the best overall model, we employed the function "dredge" of the package "MuMIn" in the R environment (Barton, 2015). Within each set, the algorithm analyzes all possible combinations of predictor variables and ranks the regression models based on a user-defined criterion. To identify the most parsimonious model for each set of predictors, we
Table 1
Summary statistics of the mean annual streamflow (MAF) and catchment characteristics of 1885 gauging stations in the period 1981–2010.

Variable         Symbol  Unit      Mean        Median      SD(a)       Min(a)       Max(a)      γ1(a)  Source database
MAF              Q       m³ s⁻¹    3.72×10²    3.38×10¹    1.99×10³    3.47×10⁻³    4.73×10⁴    14.2   GRDC(b)
Catchment area   A       m²        4.58×10¹⁰   4.61×10⁹    2.05×10¹¹   2.00×10⁶     3.63×10¹²   9.7    GRDC(c)
Altitude         H       m         6.84×10²    4.52×10²    6.05×10²    1.35×10¹     4.76×10³    1.6    WorldClim(d)
Slope            S       °         2.31×10⁰    1.27×10⁰    2.65×10⁰    4.71×10⁻²    1.56×10¹    2.0    WorldClim(d)
Precipitation    P       m s⁻¹     2.78×10⁻⁸   2.46×10⁻⁸   1.46×10⁻⁸   3.53×10⁻⁹    1.09×10⁻⁷   1.3    CRU TS 3.23(e)
Temperature      T       °C        8.96×10⁰    8.24×10⁰    7.51×10⁰    −1.67×10¹    2.76×10¹    0.4    CRU TS 3.23(e)

(a) SD = standard deviation; Min = minimum; Max = maximum; γ1 = skewness.
(b) GRDC (2015).
(c) GRDC (2011).
(d) Hijmans et al. (2005).
(e) Harris et al. (2014).
used the Akaike Information Criterion (AIC) as well as the Bayesian Information Criterion (BIC), which employs a larger penalty term for additional predictor variables. Further, we used Cook's D influence statistic to identify observations that may have biased the regression coefficients (Cook and Weisberg, 1982).
To assess potential bias in the regression coefficients induced by spatial autocorrelation resulting from the nested structure of the catchments, we compared the regression coefficients with the coefficients of a spatial error (SE) model. Since spatial autocorrelation can exist either in the residuals (spatial error) or in the response variable (spatial lag), we performed a preliminary test for spatial autocorrelation based on the Lagrange Multiplier test (LM test) using the R package 'spdep' (Anselin, 1988; Bivand et al., 2013; Bivand and Piras, 2015). We preferred the LM test to the more commonly employed Moran's I test because the LM test has a higher power to discriminate between spatial error autocorrelation and spatial lag (Anselin and Rey, 1991). The LM test showed significantly higher autocorrelation in the error term (LM test value of 465, robust LM test value of 375) than in the response variable (values of 115 and 25, respectively). Therefore, we fitted an SE model that accounts for spatial autocorrelation in the residuals, expressing the error term of Eq. (1) as $\varepsilon = \lambda W \varepsilon + \mu$, […]

2.4. Comparison with PCR-GLOBWB: backcasting analysis

We compared the predictive performance of our best regression model with the global hydrological model PCR-GLOBWB (van Beek and Bierkens, 2008; van Beek et al., 2011). Defined as a "leaky bucket" type of model, PCR-GLOBWB calculates changes in water storage between two soil layers, a groundwater reservoir and the atmosphere, forced by CRU TS 2.1 data, on a cell-by-cell basis at 30 arc-minute resolution, for daily time steps. PCR-GLOBWB has been widely employed for assessments of global surface water and groundwater availability, nutrient transport modelling and biodiversity impact calculations (Beusen et al., 2016; Gleeson et al., 2012; Janse et al., 2015; Wada et al., 2011; Wanders and Wada, 2015). Unlike some other GHMs, PCR-GLOBWB is a purely process-based model, as opposed to, for example, WaterGAP, which is partially calibrated (Alcamo et al., 2003; Döll et al., 2003). Therefore, we considered PCR-GLOBWB a more suitable benchmark for the comparison.
We used monitoring data of GRDC stations continuously monitored from 1971 to 1980 as an independent and common basis for the comparison between the regression model and PCR-GLOBWB. From the 2219 GRDC stations used for the testing of PCR-GLOBWB (van Beek et al., 2011), we selected the 543 stations that were continuously monitored from 1971 through 1980 (Fig. S2). We derived mean annual values of temperature and precipitation from CRU TS 3.23 for each catchment, according to the approach described in Section 2.2, and calculated mean annual streamflow for each year in the time span 1971–1980, as well as a 10-year average.
We evaluated and compared the performances of the regression model and PCR-GLOBWB using the root mean square error (RMSE) and the modified index of agreement (d). Thus, we employed an absolute as well as a relative error measure, following the recommendations for hydrological model evaluation provided by Legates and McCabe (1999). We adopted the index of agreement d₂ in the modified form d to avoid inflation of errors by squared values (Legates and McCabe, 1999). In addition, d represents an improvement over the coefficient of determination (R²) (Legates and McCabe, 1999).
Given the six orders of magnitude covered by the data, we log-transformed the mean annual streamflow values. The RMSE is calculated as:

$$\mathrm{RMSE} = \sqrt{\frac{1}{nm}\sum_{x=1}^{m}\sum_{t=1}^{n}\left(\log_{10} O_{x,t} - \log_{10} P_{x,t}\right)^{2}} \qquad (3)$$

where n × m are the dimensions of the matrix of observations of the m GRDC stations over the n years of the backcasting period, $O_{x,t}$ is the observed value for station x at time t, and $P_{x,t}$ (not to be confused with precipitation P) is the predicted value for station x at time t.
The index d is formulated as:

$$d = 1 - \frac{\sum_{x=1}^{m}\sum_{t=1}^{n}\left|\log_{10} O_{x,t} - \log_{10} P_{x,t}\right|}{\sum_{x=1}^{m}\sum_{t=1}^{n}\left(\left|\log_{10} P_{x,t} - \log_{10}\bar{O}\right| + \left|\log_{10} O_{x,t} - \log_{10}\bar{O}\right|\right)} \qquad (4)$$

[…]

The OLS regression analysis revealed that the model with the full set of predictors was the most parsimonious (i.e., lowest AIC; Table S1). Nearly 90% of the variation in MAF could be explained by the five catchment characteristics (Table 2), indicating that the most relevant predictors for MAF were covered by the regression model. According to the standardized regression coefficients, which can be compared across explanatory variables to assess their relative importance (Bring, 1994), catchment area was the most important predictor of MAF, followed by precipitation, temperature, slope and elevation.
The model performed better for higher MAF values (Fig. 2). Furthermore, residual errors were slightly larger for catchments with lower precipitation values and at higher altitudes (Fig. S1). Residuals tended to be randomly distributed in relation to catchment area, precipitation or slope (Fig. S1). Only about 1.3% of the predicted values showed errors greater than one order of magnitude (Fig. 2). For Cook's D statistic, a maximum value of 0.03 was found, well below the threshold of 1 (Cook and Weisberg, 1982), meaning that none of the observations biased the regression coefficients.
The comparison of the regression coefficients between the OLS and SE regression models revealed a large overlap of the confidence intervals of the coefficients (49–86% CI overlap; Table S2). This indicated that the OLS regression coefficients were not significantly influenced by spatial autocorrelation (type I error).

3.2. Performance testing on independent data and comparison with PCR-GLOBWB

The testing of the regression model on independent data in the time period 1971–1980 (backcasting analysis), for both single-year and 10-year average MAF, revealed that the predictions of the
Table 2
Coefficients (raw and standardized), goodness of fit (R²) and number of underlying observations (m) of the most parsimonious regression model $Q = 10^{\beta_0} A^{\beta_A} P^{\beta_P} 10^{\beta_T T} H^{\beta_H} S^{\beta_S}$. CI = confidence interval.
Fig. 3. Results of the backcasting analysis for the period 1971–1980, showing predicted versus observed MAF for the OLS regression model (left) and the GHM PCR-GLOBWB (right), based on yearly values (top) as well as 10-year average values (bottom). Within each chart, the solid line represents perfect model fit (1:1 line) and the dashed lines define a range of accuracy of plus/minus one order of magnitude. RMSE = root mean square error; d = modified index of agreement.
[…] include temperature as predictor, the larger MAF at higher altitudes may result in a positive regression coefficient for altitude instead. The positive exponent of about ½ obtained for slope (S) compares well with the values reported in regional studies and reflects the fact that in catchments with steeper slopes, runoff, and consequently MAF, is enhanced.

4.2. Performance comparison with PCR-GLOBWB

Overall, our regression model performed better than PCR-GLOBWB when applied to an independent test dataset. Differences were apparent in particular for smaller catchments (Fig. 3). The absolute residuals of PCR-GLOBWB for the 10-year average MAF revealed a significant negative trend in relation to catchment area (p-value < 0.01, R² = 0.03), while for the regression model no significant correlation was found. However, new global hydrological models with a finer spatial resolution than the 30 arc-minute version of PCR-GLOBWB employed in this study may achieve better results, especially for smaller catchments (see e.g. the list of models provided by Bierkens, 2015). Yet, such refined models are more demanding in terms of computational costs (Bierkens, 2015), and might therefore be more suitable when monthly or daily discharge values are needed.
Both the regression model and PCR-GLOBWB performed worse for water-scarce regions, as revealed by larger errors at higher dryness ratio values (Fig. 4). The dryness ratio reflects water losses due to evapotranspiration relative to the amount of precipitation. It is defined as the actual annual evapotranspiration divided by the
V. Barbarossa et al. / Journal of Hydrology 544 (2017) 479–487 485
Table 3
Regression coefficients found in this study compared with coefficients reported in regional and global studies available from the literature. n represents the number of catchments employed to calibrate the regression coefficients; Area range is the range of the catchment areas employed in the respective study.

Study                            βA            βP            βT            βS                βH             n      Area range [km²]        Extent
This study                       1.02          2.07          0.04          0.46              0.51           1885   2.0×10⁰ to 3.6×10⁶      Global
Burgers et al. (2013)            0.86          1.01          –             –                 –              663    7.3×10³ to 4.6×10⁶      Global
Gyawali et al. (2015)            0.87          3.68          –             0.30(b)           –              93     6.2×10⁰ to 4.4×10³      western Great Lakes, […]
Hortness and Berenbrock (2001)   0.83 to 1.10  1.64 to 2.70  –             3.44 to 7.52(b)   2.36 to 2.30   200    8.0×10⁰ to 3.5×10⁴      Idaho, USA
Stuckey (2006)                   1.01          1.80          –             –                 0.13           195    5.6×10⁰ to 4.5×10³      Pennsylvania, […]
Tran et al. (2015)               1.01          2.04          0.49(a)       –                 –              533    1.3×10¹ to 1.7×10²      Upper Mississippi, USA
Vogel et al. (1999)              0.58 to 1.14  1.21 to 6.42  7.66 to […]   0.33 to 0.51      1.66(c)        1553   –                       […]

(a) Values refer to the log-transformed form and are therefore not directly comparable with the coefficient obtained in this study.
(b) Values refer to slope in percent instead of degrees and are therefore not directly comparable with the coefficient obtained in this study.
(c) Used only for region 16, "Great Basin" in Vogel et al. (1999).
Fig. 4. Residuals plot of the backcasting of the 10-year average MAF vs the dryness ratio for (a) the regression model and (b) PCR-GLOBWB. Note that 18 values (about 3% of
total data) with dryness ratios smaller than 0 were excluded for clarity of representation.
total annual precipitation, where the actual evapotranspiration is calculated as the annual precipitation minus the unit discharge, which is in turn obtained by dividing the discharge by the area of the catchment (Vogel et al., 1999). The larger errors at higher dryness ratios are likely due to a combination of higher uncertainty in the precipitation values for water-scarce regions and hydrological processes that are particularly relevant in dry regions yet not described by the models (Döll et al., 2003; van Beek et al., 2011; Vogel et al., 1999). Examples of such processes include the almost instantaneous evaporation from many ephemeral post-rainfall ponds and relatively large losses from the river channel to groundwater (Döll et al., 2003). In addition, water abstraction by human activities is likely to affect the natural flow in water-scarce regions more than in wet regions, which provides an additional possible explanation for the overestimation of MAF in dry regions.

4.3. Applicability of the regression model

The residuals of our regression model were not related to area (Fig. S1-b), suggesting that the model is area-independent and maintains similar performance across catchments spanning at least six orders of magnitude in size (2 to 10⁶ km²). Although the distribution of the monitoring stations employed for the calibration of the regression coefficients was skewed towards America and Europe (Fig. 1), the residuals of the model application (yearly and 10-year average MAF) were consistent across different continents (Fig. S3). This is explained by the wide range of latitudes covered by the monitoring stations, reflected by a large range in precipitation and temperature values (Table 1). This indicates that the model developed in this study can be applied to predict MAF at any point of any river network globally, taking into account its weaker predictive power for water-scarce regions (see Section 4.2). As such, the model is suitable for assessments of water availability and ecological integrity in relation to changing future climatic conditions.
The model requires only a small number of input parameters, namely catchment area, catchment-averaged precipitation, catchment-averaged temperature, catchment-averaged altitude and catchment-averaged slope. Catchment area as well as slope and altitude can be easily derived from a flow direction raster map and a digital elevation model with standard Geographic Information System (GIS) tools. Annual mean precipitation and temperature can be obtained from observations covering a given period of interest. If predictor variable values are within the range of values employed in the calibration phase (Table 1), the uncertainty of the predictions is known. Applying the model to predictor values outside the calibration domain yields MAF values with unknown uncertainty. In addition, given that the model was calibrated on the CRU TS 3.23 climatic data (Harris et al., 2014), the HydroSHEDS 15 arc-second river network (Lehner et al., 2008) and the DEM provided by the WorldClim database (Hijmans et al., 2005), we acknowledge that the model is not valid with other input sources.
The regression model performed worse for extreme MAF values when applied at finer temporal scales, as exemplified by the decreased performance of the model on a year-by-year basis (Fig. 3). This might be due to the fact that the regression
