Fitting Flood Frequency Distributions Using The Annual Maximum Series and The Peak Over Threshold Approaches
To cite this article: Daniel Caissie, Gabriel Goguen, Nassir El-Jabi & Wafa Chouaib (2022) Fitting
flood frequency distributions using the annual maximum series and the peak over threshold
approaches, Canadian Water Resources Journal / Revue canadienne des ressources hydriques,
47:2-3, 122-136, DOI: 10.1080/07011784.2022.2052752
Daniel Caissie (a), Gabriel Goguen (a), Nassir El-Jabi (b) and Wafa Chouaib (a)
(a) Fisheries and Oceans Canada, Moncton, NB, Canada; (b) Civil Engineering Department, Université de Moncton, Moncton, NB, Canada
ABSTRACT
Flood frequency plays an important role in the design of hydraulic structures as well as in the management of fisheries and aquatic resources. There are two types of flood frequency analysis, namely the annual maximum series (AMS) analysis and the partial duration series (or peak over threshold, POT) analysis. The POT analysis consists of studying discharge data above a certain threshold (or truncation level). In the present study, the AMS (generalized extreme value distribution, GEV) and the POT (generalized Pareto distribution, GP, and exponential distribution, Exp) were used to calculate floods at four hydrometric stations of the Miramichi River in New Brunswick. A simple method was proposed in which truncation levels corresponding to 1, 1.5 and 2 floods per year on average are used. Using several truncation levels in the POT analysis has the advantage of providing more results, which are used to identify which level provides a better fit of the flood data. The GEV and GP distributions better represented flood data in the Miramichi River, whereas the Exp distribution did not show a good fit to the data, particularly for floods of high recurrence intervals (>25 years). Results showed that truncation levels with a flood count of 1 (the highest truncation level) generally provided a better fit for floods of high recurrence intervals (>25 years). In addition, lower truncation levels tended to provide flood estimates with less uncertainty (lower coefficient of variation, as tested using a jackknife technique). Finally, results showed that the AMS and the POT are complementary in flood frequency analyses. The AMS is the more classical approach to flood frequency analysis; however, the POT provides a better characterization of floods (e.g. flood magnitude, duration and volume).
CONTACT Daniel Caissie Daniel.Caissie@dfo-mpo.gc.ca Fisheries and Ocean Canada, Moncton, NB, Canada.
Supplemental data for this article can be accessed online at http://dx.doi.org/10.1080/07011784.2022.2052752
The work of Nassir El-Jabi is © 2022 Canadian Water Resources Association
The work of Daniel Caissie, Gabriel Goguen and Wafa Chouaib is © 2022 Her Majesty the Queen in Right of Canada as represented by Fisheries and Oceans Canada
CANADIAN WATER RESOURCES JOURNAL / REVUE CANADIENNE DES RESSOURCES HYDRIQUES 123
relatively high truncation levels where the flood count was generally below 1 flood per year on average. Rather than using physical attributes of rivers, some studies have selected the truncation level based purely on statistical properties of the Poisson distribution (e.g. Ashkar and Rousselle 1983b; Caissie and El-Jabi 1991; Lang, Ouarda, and Bobée 1999). A wide range of truncation levels (high and low values) has been reported in the literature when applying this selection criterion (Ashkar and Rousselle 1987).

Based on results of previous studies, it is clear that the truncation level should be set high enough to prevent any autocorrelation among floods, especially when floods are very close to each other. However, it should also be set low enough to maximize the flood information for an effective fit of the distributions (e.g. low sampling variance of flood estimates). In order to satisfy these conditions, studies have shown that the truncation level should most likely be set between 1 and 2 floods per year on average (Cunnane 1973; Taesombut and Yevjevich 1978; Cunnane 1979; Ashkar and Rousselle 1983b). In the present study we suggest a new approach for setting the truncation level for the POT analysis. We propose pre-setting three fixed truncation levels that correspond to 1, 1.5 and 2 floods per year on average. This new approach of selecting fixed flood counts has the following advantages: (1) simplifying the POT analysis by imposing truncation levels at specific flood counts, and (2) being able to compare results from different truncation levels to determine which truncation level provides a better fit of the flood data. This study will also compare AMS and POT flood frequency results using different criteria, as very few studies have compared these approaches in the literature.

In the application of the POT analysis, two distributions are fitted, namely the distribution of the occurrence of floods (time of arrival) and the distribution of the flood exceedances (magnitudes of floods above the truncation level). The distributions used for the occurrence of floods are generally the Poisson, the binomial or the negative binomial distributions (Önöz and Bayazit 2001; Bhunya et al. 2013). In the case of the flood exceedances, the generalized Pareto distribution or special cases thereof are often used (Ben-Zvi 2016). Cunnane (1979) looked at the relationship between successive flood peaks using scatter plots (floods that occurred within 5 and 10 days and for flood counts between 1 and 5 floods on average per year). No evidence of autocorrelation was found by the author from the 26 studied hydrometric stations. Most studies have reported that when the truncation level is relatively high (i.e. representing less than 2 floods per year on average), the independence criterion is generally met (e.g. Cunnane 1973; Taesombut and Yevjevich 1978; Cunnane 1979; Ashkar and Rousselle 1987).

Some theoretical arguments suggest that certain distributions should be favored when dealing with extreme data (as they tend to converge to some limiting distributions). For the AMS analysis, the limiting distribution can be shown to belong to the generalized extreme value (GEV) family of distributions, whereas, for the POT analysis, the limiting distribution can be shown to belong to the generalized Pareto (GP) family of distributions (Coles 2001; Salvadori et al. 2007). In the present study, the GEV will be used to analyse flood data using the AMS analysis, and both the GP and exponential (Exp) distributions (the latter a special case of the GP) will be used for flood exceedances in the POT analysis. In addition, the Poisson and negative binomial distributions will be used to model the occurrence of floods. The exponential distribution has been widely used in the POT analysis in the past (Todorovic and Zelenhasic 1970; Todorovic and Rousselle 1971; Ashkar and Rousselle 1983b; Caissie and El-Jabi 1991); however, more recent studies are using the GP (e.g. Ben-Zvi 2016). Both the GP and exponential distributions will be used for comparison purposes. The AMS and POT flood frequency results will be compared using modeling performance criteria (e.g. the coefficient of variation and the root mean square error). The analysis will be carried out using data from four hydrometric stations within the Miramichi River basin, namely Catamaran Brook, the Northwest Miramichi River, the Little Southwest Miramichi River and the Southwest Miramichi River. These rivers are part of a homogeneous region due to their close proximity; however, they differ in basin size, with drainage areas ranging from 28.7 to 5050 km².

The specific objectives of the present study are: (1) to fit the occurrence of flood exceedances to Poisson and negative binomial distributions, (2) to apply the POT analysis using fixed truncation levels with a flood count of 1, 1.5 and 2 floods per year on average, (3) to fit flood exceedances of the POT analysis to GP and exponential distributions, (4) to fit annual flood data to a GEV distribution function (AMS analysis), and (5) to calculate and compare floods of different recurrence intervals (both AMS and POT analysis), using different modeling criteria (coefficient of variation and root mean square error).
Figure 1. Miramichi River showing the location and basin of studied rivers. Source: Author.
Material and methods

Study area

The hydrological analysis was carried out using historical data from four hydrometric stations located within the Miramichi River basin (New Brunswick, Canada). This region receives approximately 1200 mm of precipitation yearly, of which 38% is lost through evapotranspiration (resulting in 744 mm of runoff). All data used in this study were collected from the Historical Hydrometric Data at the following site (https://wateroffice.ec.gc.ca/mainmenu/historical_data_index_e.html). Data extracted included extreme values, that is, annual maximum daily discharges, as well as daily discharge data. The location of the stations is outlined in Figure 1 and relevant station characteristics are presented in Table 1. The number of years of record varies between 26 (Catamaran Brook) and 64 (Little Southwest Miramichi River). The smallest drainage basin is Catamaran Brook at 28.7 km², whereas the largest is the Southwest Miramichi River at 5050 km².

Both the mean annual flows and the mean annual floods (average of annual maxima) are reported in Table 1. The mean annual flow varied between 0.646 m³/s (CatBk) and 120 m³/s (SwMir), whereas the mean annual flood varied between 7.32 m³/s and 941 m³/s for the same rivers. The truncation levels used for the POT analyses for different mean numbers of exceedances or events per year (i.e. for 1, 1.5 and 2 floods per year on average) are also presented in Table 1.

Peak over threshold (POT) analysis

The peak over threshold (POT) analysis requires the study of two distinct properties of floods, namely the number of occurrences of floods and the characteristics of floods (magnitude, duration and volume). Figure 2 illustrates a discharge hydrograph with associated POT events, which describe flood exceedances above the truncation level ($Q_{TL}$). As such, the selection of the truncation level ($Q_{TL}$) is the first part of the analysis. Every event above the truncation level is associated with a flood exceedance, ξ (magnitude), a flood duration (Dur) and a flood volume (Vol) (grey area; Figure 2). In Figure 2, $Q_{max}$ represents the sum of the exceedance value and the truncation level.

Selection of the truncation level in the POT analysis

The literature shows that the truncation level should be set sufficiently low to maximize flood data, which should result in lower sampling variance of flood estimates. However, the truncation level should also be
126 D. CAISSIE ET AL.
Table 1. Studied hydrometric stations within the Miramichi River (New Brunswick, Canada) and relevant characteristics.

| Station name | Catamaran Brook | Northwest Miramichi R. | Little Southwest Miramichi R. | Southwest Miramichi R. |
|---|---|---|---|---|
| Station ID | 01BP002 | 01BQ001 | 01BP001 | 01BO001 |
| Abbreviation | CatBk | NwMir | LSwMir | SwMir |
| Drainage area (km²) | 28.7 | 948 | 1340 | 5050 |
| Period of record | 1990–2015 | 1962–2015 | 1952–2015 | 1962–2015 |
| Sample size (years) | 26 | 54 | 64 | 54 |
| Mean annual flow (m³/s) | 0.646 | 21.9 | 33.3 | 120 |
| Mean annual flood (m³/s) | 7.32 | 212 | 270 | 941 |
| Truncation level (m³/s), with mean number of events per year in parentheses | | | | |
| MEvYr = 1 | 6.75 (1.000) | 180 (1.000) | 222 (1.000) | 850 (1.019) |
| MEvYr = 1.5 | 5.60 (1.500) | 150 (1.500) | 202 (1.500) | 750 (1.519) |
| MEvYr = 2 | 5.20 (2.000) | 129 (2.019) | 177 (2.000) | 685 (2.000) |

Mean annual flood represents the average of annual maximum discharge.
MEvYr = mean number of events per year.
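The truncation levels in Table 1 are tied to preset mean numbers of events per year. A minimal Python sketch of that idea follows, assuming the independent flood peaks have already been extracted from the daily record; the helper names and the toy data are hypothetical and are not taken from the study.

```python
def truncation_level_for_count(peaks, n_years, events_per_year):
    """Return a threshold that keeps about events_per_year * n_years peaks.

    peaks: independent flood peaks (m3/s), assumed already declustered.
    The threshold is set at the smallest retained peak, so peaks greater
    than or equal to it are counted as events.
    """
    n_keep = round(events_per_year * n_years)
    if not 1 <= n_keep <= len(peaks):
        raise ValueError("not enough peaks for the requested flood count")
    ranked = sorted(peaks, reverse=True)
    return ranked[n_keep - 1]


def mean_events_per_year(peaks, n_years, threshold):
    """Mean number of peaks at or above the threshold per year of record."""
    return sum(1 for q in peaks if q >= threshold) / n_years


# Toy record (hypothetical): 4 years and 10 independent peaks in m3/s.
toy_peaks = [12.0, 9.5, 8.1, 7.7, 7.2, 6.9, 6.4, 6.0, 5.8, 5.1]
for target in (1, 1.5, 2):
    q_tl = truncation_level_for_count(toy_peaks, 4, target)
    print(target, q_tl, mean_events_per_year(toy_peaks, 4, q_tl))
```

When several peaks share the same value, or when the threshold is rounded, the realized count can differ slightly from the target.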
Ashkar and Rousselle 1983a). In some cases the occurrence of floods (or the number of exceedances per year) has been found to follow a binomial distribution or a negative binomial distribution (Cunnane 1979; Önöz and Bayazit 2001; Bhunya et al. 2013).

where H(x) is the cumulative distribution of the flood exceedances (e.g. exponential, generalized Pareto or other distributions) and where P(n) represents the distribution of the occurrence of events as described above (Poisson or negative binomial distribution). In
Equation (2) there is no restriction on the distribution of the occurrence of floods; however, if the occurrence of floods is a Poisson process, then substituting the Poisson distribution into Equation (2) results in the following equation:

$$F(x) = e^{-\lambda\,(1 - H(x))} \tag{3}$$

where $\lambda$ is the Poisson parameter and H(x) represents the distribution of the flood exceedances. One simple distribution for H(x), which has been widely used in flood analysis in the past, is the exponential distribution, a special case of the GP distribution (Todorovic and Zelenhasic 1970; Todorovic and Rousselle 1971; Caissie and El-Jabi 1991, etc.). When using the exponential distribution, H(x) has the following equation:

$$H(x) = 1 - e^{-\beta x} \tag{4}$$

where $\beta$ is the exponential distribution scale parameter, which can be calculated using the following equation:

$$E(\text{exceedances}) = \frac{1}{\beta} = \text{mean}(\text{exceedances}) \tag{5}$$

Therefore, the cumulative distribution for the POT model, Equation (3), is given by:

$$F(x) = e^{-\lambda e^{-\beta x}} \tag{6}$$

Equation (6) represents a classic double exponential function which has been used in flood analysis, and this equation has some similarities to the Gumbel model used when analyzing floods with the AMS method. We can extract x from Equation (6) to calculate exceedances as a function of different frequencies F(x):

$$x = -\frac{1}{\beta}\,\ln\!\left(\frac{-\ln F(x)}{\lambda}\right) \tag{7}$$

where x represents the flood exceedances of different frequencies, and all other parameters have been defined previously.

In hydrology, F(x) is also expressed as:

$$F(x) = 1 - \frac{1}{T} \tag{8}$$

where T represents the recurrence interval in years, such that a 2-year flood has F(x) = 0.5 and a 100-year flood has F(x) = 0.99. Therefore, the discharge for different recurrence intervals, obtained by introducing Equation (8) in Equation (7) and adding the truncation level $Q_{TL}$, is given by:

$$Q_{T\_Exp} = -\frac{1}{\beta}\,\ln\!\left(\frac{-\ln\left(1 - \frac{1}{T}\right)}{\lambda}\right) + Q_{TL} \tag{9}$$

where $Q_{T\_Exp}$ represents the discharge of different recurrence intervals by the POT model using the exponential distribution.

Similarly, we can analyze flood exceedances following a GP distribution rather than the exponential distribution. The GP distribution, when used in the POT model, is given by the following equation (Ben-Zvi 2016):

$$H(x) = 1 - \left(1 + k'\,\frac{x}{\alpha'}\right)^{-1/k'} \tag{10}$$

In Equation (10), $\alpha'$ is the scale parameter and $k'$ is the shape parameter. Therefore, if we use the GP distribution instead of the exponential distribution in the POT model, the flood frequency equation becomes:

$$Q_{T\_GP} = \frac{\alpha'}{k'}\left[\left(\frac{-\ln\left(1 - \frac{1}{T}\right)}{\lambda}\right)^{-k'} - 1\right] + Q_{TL} \tag{11}$$

where $Q_{T\_GP}$ represents the discharge for different recurrence intervals using the GP distribution.

Annual maximum series analysis

For the annual maximum series (AMS) analysis, the maximum daily discharge for each year was used. This dataset constitutes a different time series than that of the POT approach; however, maximum annual discharges greater than the truncation level (from the POT analysis) are data common to both approaches when representing F(x). For the AMS analysis, the Generalized Extreme Value (GEV) distribution was used. The cumulative distribution of the GEV is given by (Madsen, Rasmussen, and Rosbjerg 1997):

$$F(x) = \exp\!\left\{-\left[1 + \varepsilon\,\frac{x - \mu}{\sigma}\right]^{-1/\varepsilon}\right\} \tag{12}$$

where $\sigma$ is the scale parameter, $\mu$ is the location parameter and $\varepsilon$ is the shape parameter. If we isolate x from Equation (12), we have:

$$x = \frac{\sigma}{\varepsilon}\left[\left(-\ln F(x)\right)^{-\varepsilon} - 1\right] + \mu \tag{13}$$

where x represents the discharge for different values of F(x).

Therefore, the estimation of discharge for different recurrence intervals using the GEV distribution, $Q_{T\_GEV}$, is given by:

$$Q_{T\_GEV} = \frac{\sigma}{\varepsilon}\left[\left(-\ln\left(1 - \frac{1}{T}\right)\right)^{-\varepsilon} - 1\right] + \mu \tag{14}$$
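The three flood frequency equations (9), (11) and (14) can be sketched in a few lines of Python. The function names are hypothetical, and the sign convention assumed here is the one in which a positive shape parameter gives a heavier upper tail. The example values are those listed for the SwMir in Tables 1 and 3 at one event per year; treating them as directly compatible with this convention is an assumption.

```python
import math


def q_exp(T, lam, beta, q_tl):
    """Equation (9): POT quantile with exponential exceedances."""
    return -math.log(-math.log(1 - 1 / T) / lam) / beta + q_tl


def q_gp(T, lam, alpha, k, q_tl):
    """Equation (11): POT quantile with generalized Pareto exceedances.

    k must be non-zero; as k approaches 0 the result approaches q_exp
    with beta = 1 / alpha.
    """
    y = -math.log(1 - 1 / T) / lam
    return alpha / k * (y ** (-k) - 1) + q_tl


def q_gev(T, mu, sigma, eps):
    """Equation (14): AMS quantile with the GEV distribution (eps != 0)."""
    return sigma / eps * ((-math.log(1 - 1 / T)) ** (-eps) - 1) + mu


# SwMir, one event per year (values read from Tables 1 and 3).
lam, q_tl = 1.019, 850.0
for T in (10, 50, 100):
    print(
        T,
        round(q_gev(T, mu=791.41, sigma=275.14, eps=0.030)),
        round(q_gp(T, lam, alpha=189.67, k=0.138, q_tl=q_tl)),
        round(q_exp(T, lam, beta=1 / 219.64, q_tl=q_tl)),
    )
```

Keeping the three models behind the same `T`-based interface makes it straightforward to compare AMS and POT estimates for a given recurrence interval.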
Table 2. Testing of the Poisson and negative binomial distributions for the occurrence of floods using the chi-square test.

| Mean number of events per year | 1 | 1 | 1.5 | 1.5 | 2 | 2 |
|---|---|---|---|---|---|---|
| Distribution | Poisson | Neg Bino | Poisson | Neg Bino | Poisson | Neg Bino |
| Catamaran Brook | 0.069 | 0.127 | 0.268 | 0.406 | 0.088 | 0.178 |
| Northwest Miramichi R. | 0.943 | 0.876 | 0.255 | 0.219 | 0.597 | 0.617 |
| Little Southwest Miramichi R. | **0.014** | **0.022** | 0.091 | 0.167 | 0.135 | 0.112 |
| Southwest Miramichi R. | 0.057 | 0.070 | 0.414 | 0.764 | 0.335 | 0.423 |

Values in this table represent p-values. Bold values represent a significant difference between observed and theoretical chi-square values. Poisson = Poisson distribution; Neg Bino = negative binomial distribution.
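The kind of check summarized in Table 2 can be illustrated with a short Python sketch that computes the mean to variance ratio of yearly flood counts and a chi-square statistic against a fitted Poisson distribution. The class pooling, the function names and the toy counts are assumptions made for illustration; the study's exact test settings are not reproduced here.

```python
import math
from collections import Counter


def mean_variance_ratio(counts):
    """Mean divided by sample variance of yearly event counts.

    A ratio near 1 is consistent with a Poisson process; a ratio below 1
    (variance larger than the mean) points toward the negative binomial.
    """
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return mean / var


def poisson_chi_square(counts, max_class):
    """Chi-square statistic for yearly counts against a fitted Poisson.

    Classes are 0, 1, ..., max_class - 1 plus a pooled '>= max_class'.
    Returns (statistic, degrees of freedom).
    """
    n = len(counts)
    lam = sum(counts) / n  # Poisson parameter estimated by the sample mean
    observed = Counter(min(c, max_class) for c in counts)
    probs = [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(max_class)]
    probs.append(1 - sum(probs))  # pooled upper class
    chi2 = sum((observed.get(k, 0) - n * p) ** 2 / (n * p) for k, p in enumerate(probs))
    dof = len(probs) - 1 - 1  # classes minus one, minus one fitted parameter
    return chi2, dof


# Toy yearly counts (hypothetical, 10 years).
toy_counts = [1, 0, 2, 1, 1, 3, 0, 1, 2, 1]
print(mean_variance_ratio(toy_counts))
print(poisson_chi_square(toy_counts, max_class=3))
```

The statistic can then be compared with a chi-square table at the chosen significance level to obtain a p-value like those in Table 2.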
exception of the LSwMir for a flood count of 1; Table 2). The fact that the LSwMir did not satisfy the Poisson process was most likely a random occurrence, as the Poisson model represented the flood occurrences well in all other cases (11/12). The Poisson distribution was satisfied for the LSwMir at flood counts of 1.5 and 2, and theoretically, it should have been satisfied at the higher truncation level (i.e. flood count of 1). In the case of the LSwMir, which had a mean to variance ratio of 0.7 for a flood count of 1, the chi-square test was rejected for both distributions (Poisson and negative binomial, therefore not favoring either distribution). As noted in previous studies, the longer the time series, the more restrictive the chi-square test becomes (Ashkar and Rousselle 1987). The LSwMir had the longest time series with 64 years of data (Table 1), which likely resulted in a more restrictive test in this case.

Nonetheless, results show that in most cases both the Poisson and the negative binomial distributions adequately represented the occurrence of floods for the POT model. Also, the selection of different truncation levels (flood count of 1, 1.5 and 2) did not show an impact on the distribution of the occurrence of floods (Table 2) or on the mean to variance ratio (Figure 3). These results support the fact that when the Poisson model is satisfied at a flood count of 2, then it is generally also satisfied at a higher truncation level (e.g. Ashkar and Rousselle 1983a; Ashkar and Rousselle 1987). Similar observations can be made for the negative binomial distribution (Cunnane 1979). As the Poisson distribution requires the estimation of only 1 parameter, this distribution should be favored over the 2-parameter negative binomial distribution.

Using multiple preset truncation levels for each station can be an advantage in the application of the POT model. It simplifies the approach and it also provides more flood frequency information and multiple fits of the data to compare. In fact, the selection of the truncation level using complicated procedures (e.g. at the bankfull discharge, graphical procedure, based on statistical criteria, etc.) has been a problem in the application of the POT model from a practical perspective in the past (e.g. Nagy, Mohssen, and Hughey 2017).

Flood frequency analysis by POT and AMS methods

Following the selection of truncation levels and the testing of the distribution of the occurrence of floods (Poisson and negative binomial distributions), both the exponential and GP distributions were used for the POT analysis. For the AMS analysis the GEV distribution was used (as described in the method section). Results of this analysis are presented in Figure 4 using a Gumbel plot (i.e. using the reduced variable y = -ln(-ln(F(x))) on the x axis). Here, the annual maximum discharges are presented for the complete time series (i.e. number of years of record), which correspond to the annual maximum series (AMS) observed data. The red data points in Figure 4 identify the three different truncation levels applied in the POT model, and as such, the annual maximum discharges below each truncation level are not part of the POT analysis. For this reason, results of the fitted distributions are only shown for floods above the 2-year event (to compare both AMS and POT models). Notably, the exponential distribution plots as a straight line on the Gumbel plot, as this distribution has only a scale parameter (scale = 1/β). In Figure 4, the Poisson distribution was used to model the occurrence of flood events. Results showed that the GP distribution fitted the flood data better than the exponential distribution, especially at the tail of the distribution (at higher recurrence intervals) for the POT analysis. In addition, the GP distribution fit was also better for lower mean numbers of events per year (e.g. higher truncation levels; values of 1 and 1.5). This was most evident for the LSwMir (Figure 4(c)), where the POT (GP-1 and GP-1.5) fitted the data better, followed by the GEV distribution, POT (GP-2) and exponential distribution. For the LSwMir,
Figure 4. Fitted distributions (GEV, GP and Exp) for different mean numbers of events per year (POT analysis using the Poisson distribution) within the Miramichi River. Values at the top of the graph represent the recurrence intervals. Red data points represent the three different truncation levels for the POT analysis.
the exponential distribution (Exp-1, Exp-1.5 and Exp-2) did not fit the data well for high recurrence intervals, as all flood data above the 10-year recurrence interval are above the fitted values. Results showed a relatively good fit of the GEV distribution for all studied rivers.

The parameters of the fitted distributions are presented in Table 3, where the focus will be on the shape parameter for both the GEV and GP distributions, as this parameter influences the curvature of the fit (Figure 4). For CatBk, the shape parameter was slightly negative for the GEV distribution (-0.019) but more strongly negative for the GP-1 and GP-1.5 (-0.315 and -0.260, respectively), as the curvatures of the distribution were more pronounced for these latter cases (Figure 4(a)). The shape parameter was positive and high for the LSwMir GP-1 and GP-1.5 (0.506 and 0.448, respectively) and the GP distribution captured the high return floods well, followed by the GEV distribution (0.281). In the case of the NwMir and SwMir rivers, the shape parameters were generally less than 0.2, but higher for the GP-1 when compared to GP-1.5 and GP-2 (i.e. lower truncation levels). Results between the GEV and GP distributions were similar for the NwMir and the SwMir but somewhat different from the exponential distribution at high return floods (Figure 4(b, d)). The SwMir showed the lowest shape parameters (<0.06; GP-1.5 and GP-2) and showed closer results to the exponential distribution at high return floods (Figure 4(d)).

The above results showed that both the GEV (AMS) and GP (POT) distributions fitted the flood data well. The exponential distribution (POT) did not fit the data well for all stations, as it lacked some flexibility for stations with relatively high shape parameters (Table 3; stations that showed some level of curvature on a Gumbel plot; Figure 4). The exponential distribution tended to underestimate high return floods. As expected, the exponential distribution fitted the flood data well for stations where the shape parameter was close to zero (SwMir; shape 0.03 to 0.14; Table 3 and Figure 4(d)). This is supported by the fact that the exponential distribution is a special case of the GP distribution (e.g. Naveau et al. 2005). In contrast, the exponential distribution lacked some flexibility in fitting high return floods for the NwMir (Figure 4(b)) and especially for the LSwMir (Figure 4(c)). Large differences were observed in the 100-year flood estimates
Table 3. Results of the fitted distribution parameters for both the AMS and POT approaches.

| Station | Parameter | GEV | GP-1 | Exp-1 | GP-1.5 | Exp-1.5 | GP-2 | Exp-2 |
|---|---|---|---|---|---|---|---|---|
| Catamaran Brook (CatBk) | location | 6.05 | | | | | | |
| | scale | 2.23 | 2.90 | | 3.02 | | 2.14 | |
| | shape | -0.019 | -0.315 | | -0.260 | | 0.010 | |
| | scale = 1/β | | | 2.15 | | 2.37 | | 2.12 |
| Northwest Miramichi R. (NwMir) | location | 167.0 | | | | | | |
| | scale | 65.56 | 52.10 | | 60.02 | | 60.04 | |
| | shape | 0.105 | 0.190 | | 0.094 | | 0.090 | |
| | scale = 1/β | | | 63.96 | | 66.23 | | 65.96 |
| Little Southwest Miramichi R. (LSwMir) | location | 203.0 | | | | | | |
| | scale | 71.42 | 46.31 | | 41.94 | | 54.15 | |
| | shape | 0.281 | 0.506 | | 0.448 | | 0.265 | |
| | scale = 1/β | | | 83.69 | | 71.76 | | 73.56 |
| Southwest Miramichi R. (SwMir) | location | 791.41 | | | | | | |
| | scale | 275.14 | 189.67 | | 215.47 | | 212.69 | |
| | shape | 0.030 | 0.138 | | 0.057 | | 0.062 | |
| | scale = 1/β | | | 219.64 | | 228.56 | | 226.72 |

Note that GP-1, GP-1.5, GP-2, Exp-1, Exp-1.5 and Exp-2 represent the generalized Pareto and exponential distributions for a mean number of 1, 1.5 and 2 events per year. For the POT analysis the truncation levels are provided in Table 1.
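Table 3 lists only a scale parameter for the exponential fits, which is why they plot as straight lines on a Gumbel plot. A short worked derivation, assuming the reduced variable $y = -\ln(-\ln F(x))$, the Poisson parameter $\lambda$ and the POT flood frequency equation with exponential exceedances (Equation (9)):

```latex
\begin{align*}
Q_{T\_Exp} &= -\frac{1}{\beta}\,\ln\!\left(\frac{-\ln F(x)}{\lambda}\right) + Q_{TL} \\
           &= \frac{1}{\beta}\,\bigl[-\ln(-\ln F(x)) + \ln\lambda\bigr] + Q_{TL} \\
           &= \frac{1}{\beta}\,y + \left(Q_{TL} + \frac{\ln\lambda}{\beta}\right),
\qquad y = -\ln(-\ln F(x)).
\end{align*}
```

The discharge is therefore linear in $y$, with slope $1/\beta$ and intercept $Q_{TL} + \ln(\lambda)/\beta$. Under the same assumptions, the GP version becomes $Q_{T\_GP} = (\alpha'/k')\,(\lambda^{k'} e^{k' y} - 1) + Q_{TL}$, which curves upward for a positive shape parameter and downward for a negative one.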
between the exponential distribution and both the GEV and GP distributions. In CatBk, the shape parameter was negative for both the GEV and GP distributions. The fit was more difficult to assess, as this station had fewer years of data and the two highest floods were of the same magnitude (i.e. 13 m³/s; Figure 4(a)). Results from CatBk showed that some distributions had a negative shape (downward curvature; GP-1 and GP-1.5) while others did not. Again, these results (similar to LSwMir) showed that the GP-1 and GP-1.5 are more sensitive to the higher return flood data.

Past studies have used the exponential distribution to represent flood data in the POT model (Todorovic 1978; Cunnane 1979; Caissie and El-Jabi 1991); however, more recent studies have favored the GP distribution, generally because it provides greater flexibility (Hosking and Wallis 1987; Rosbjerg, Madsen, and Rasmussen 1992; Madsen, Rasmussen, and Rosbjerg 1997; Bhunya et al. 2013; Gharib et al. 2017). The present study showed that the GP distribution generally provided a better fit than the exponential distribution, which further supports these previous studies. It has been shown in the literature that the GEV did not fit flood data well when negative shape parameters were observed (Madsen, Rasmussen, and Rosbjerg 1997). CatBk had a small negative shape parameter (shape = -0.019), and the GEV showed a fit similar to other distributions, with the exception of those with a relatively high negative shape parameter (GP-1; shape = -0.315 and GP-1.5; shape = -0.260; Figure 4(a)). When a high positive shape parameter was observed (>0.2; Table 3), both GEV and GP distributions fitted the flood data well, especially for high return floods (Figure 4(b, c)). For example, the LSwMir showed among the highest shape parameters (shape > 0.4), and the GP-1, GP-1.5 and GEV all fitted the data well (Figure 4(c)). Notably, this figure also showed that the highest truncation level (GP-1) fitted the tail of the distribution (high return floods) better than GP-1.5 and GEV. This observation can also be made for other sites, where the highest truncation level (GP-1) tended to better fit the upper tail of the distribution or high return floods (NwMir and SwMir; Figure 4(b, d)).

Studies have shown that when the data follow a GP distribution, a truncated dataset (selecting a higher truncation level) also follows a GP distribution (Madsen, Rasmussen, and Rosbjerg 1997). The study by Madsen, Rasmussen, and Rosbjerg (1997) showed that imposing a higher truncation level on the GP distribution should provide the same shape parameter as the lower truncation levels. Results from the present study showed that this is clearly not the case when using actual flood data, especially from the Miramichi River system. In fact, as the truncation is set at higher levels, the shape parameter tends to increase (generally providing more curvature to the GP distribution and a better fit of high return flood data; Table 3). The selection of multiple truncation levels can effectively show the influence of various truncation levels on flood estimates (Figure 4). This added information from the POT analysis can be very useful in flood frequency analyses, and when comparing this analysis to the GEV (AMS) distribution.

Comparison of flood quantiles with both the Poisson and negative binomial distributions

Results of the present study showed that the negative binomial distribution could also be used to represent the occurrence of floods as the mean to variance ratio
Figure 5. Differences in flood discharge (%) when comparing the negative binomial and Poisson distributions to represent the
occurrence of flood in the POT approach.
was slightly less than one for all stations (Figure 3). Results of this analysis are presented in Figure 5, where percentage differences were calculated using Equation (9) (equation also shown in figure). In most cases, results showed that discharges calculated using the negative binomial distribution tended to be lower than those calculated using the Poisson model (with the exception of CatBk at the 50 and 100-year floods; Figure 5(a)). For CatBk (50 and 100-year), the negative binomial distribution calculated higher discharges, but within 1.5% of values obtained from the Poisson model. Larger differences between the two flood occurrence models were observed at low return floods (e.g. 2-year) and differences were generally less than 6% (the highest differences were observed at CatBk, 6.4%, and LSwMir, 5.0%; Figure 5(a, c)). For high return floods, the differences in discharge (negative binomial vs. Poisson) were generally very low and less than 1% for return floods greater than 10 years (Figure 5). Önöz and Bayazit (2001) observed similar results and suggested using the Poisson model, the simpler model. These results suggest that flood quantiles in the POT model are not very sensitive to the choice of the flood occurrence model, especially for high return floods (e.g. less than 1% for floods greater than 10-year return periods). This means that very little gain in terms of fit is to be attained by using the 2-parameter negative binomial distribution.

Fit and uncertainties in flood quantiles

In order to compare flood estimates (fit of the distributions) among different approaches (AMS/POT) and distributions (GEV, GP and Exp), the root mean square error (RMSE) was calculated using 20% of the data, that is, 20% of the highest observed flood data (see method section above). The 20% of the highest floods represented 5 years of data for CatBk, 11 years for NwMir, 13 years for LSwMir and 11 years for the SwMir. Results of this analysis are presented in Figure 6. It should be noted that RMSE comparisons can be made among fitted distributions and AMS vs. POT for each station, but not among stations (because of differences in drainage area and sample sizes). RMSEs at CatBk showed that the GP-1 and the Exp-1 had among the lowest RMSE values at 0.68 m³/s (Figure 6(a)), which is consistent with the
Figure 6. Root mean square error for the different fitted distributions (at the tail of the distribution) and for different mean num-
ber of events per year (for the POT approach).
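The tail RMSE summarized in Figure 6 can be sketched as follows. The plotting position used to assign a recurrence interval to each ranked flood is not given in this section, so the Weibull formula T = (n + 1)/m is assumed here purely for illustration; the function names and the synthetic data are hypothetical.

```python
import math


def tail_size(n_years, tail_fraction=0.2):
    """Number of the largest annual floods used for the tail RMSE."""
    return round(tail_fraction * n_years)


def tail_rmse(annual_maxima, quantile_fn, tail_fraction=0.2):
    """RMSE between the largest observed floods and fitted quantiles.

    quantile_fn maps a recurrence interval T (years) to a discharge,
    for example one of the fitted GEV, GP or Exp frequency equations.
    """
    ranked = sorted(annual_maxima, reverse=True)
    n = len(ranked)
    n_tail = tail_size(n, tail_fraction)
    if n_tail < 1:
        raise ValueError("record too short for the requested tail fraction")
    sq_err = 0.0
    for m in range(1, n_tail + 1):  # m = 1 is the largest flood
        T = (n + 1) / m             # Weibull plotting position (assumption)
        sq_err += (ranked[m - 1] - quantile_fn(T)) ** 2
    return math.sqrt(sq_err / n_tail)


def gumbel_quantile(T):
    """Synthetic double exponential quantile curve used as toy data."""
    return 100.0 + 30.0 * -math.log(-math.log(1 - 1 / T))


# Toy check: data generated from the curve itself give an RMSE of zero.
toy_record = [gumbel_quantile((26 + 1) / m) for m in range(1, 27)]
print(tail_size(26), tail_rmse(toy_record, gumbel_quantile))
```

With this rounding, record lengths of 26, 54 and 64 years give 5, 11 and 13 floods in the tail, which matches the counts reported in the text.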
better fit of these distributions for high return floods (Figure 4). Other POT results were similar; however, the GEV showed the highest RMSE (worst fit) at 1.12 m³/s (Figure 6(a)). For the NwMir, the best fit was observed with the GEV distribution (RMSE = 38 m³/s), followed by very good results for the GP distribution (RMSE = 40–43 m³/s; Figure 6(b)). The exponential distribution showed the highest RMSEs, with values between 50 and 53 m³/s. The LSwMir showed the best results with the GP-1 and GP-1.5 (RMSE = 36 m³/s and 39 m³/s), followed by the GEV (RMSE = 43 m³/s; Figure 6(c)). Neither the GP-2 nor the exponential distributions fit the tail of the distribution well for the LSwMir (RMSEs > 62 m³/s, reaching 126 m³/s for the Exp-1.5). These results confirm the visual fit of the LSwMir in Figure 4(c). For the SwMir, the best results were observed for the GP-1, with an RMSE of 115 m³/s (Figure 6(d)). The GEV, GP-1.5 and GP-2 also showed a relatively good fit, with RMSEs between 120 m³/s and 123 m³/s. Similar to previous rivers, the exponential distribution showed the highest RMSEs, with values greater than 150 m³/s (Figure 6(d)).
The exponential distribution showed the worst fit among the tested distributions (with the exception of CatBk, where results were similar between the Exp and GP distributions; Figure 6). The exponential distribution poorly captured the high return floods and RMSEs were 25% (NwMir), 30% (SwMir) and 250% (LSwMir) higher than RMSEs of the GP distribution. The exponential distribution performed poorly for high shape parameter sites (e.g. LSwMir; Figure 4 and Table 3) and performed slightly better for low (NwMir and SwMir) and negative shape parameter sites (CatBk). Interestingly, the GEV performed well for most stations (with the exception of CatBk) and the fit for the GEV was comparable to the POT model (especially for GP-1 and GP-1.5). These results suggest that the GEV, GP-1 and GP-1.5 provided an equally good fit to the flood data for the studied stations.
Results of the jackknife sample are presented in Figure 7. The coefficients of variation (CVs) were generally less than 4% for all sites, and the highest values were observed for high return floods. However, we point out some differences in the pattern of the CVs among rivers. In the case of CatBk, the CV increased with increasing return periods for the AMS approach and reached a value of 3% (GEV, 100-year). The lowest CV was observed with the GP-1 distribution at a return period of 20 years, at approximately 1% (Figure 7(a)). At the 50 and 100-year event both the
Figure 7. Coefficient of variation for the fit of different distributions (jackknife technique; see text for details) and for different
mean number of events per year within the Miramichi River.
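To make the quantity plotted in Figure 7 more concrete, the sketch below computes a coefficient of variation from delete-one jackknife subsamples. It is a minimal illustration that assumes a delete-one scheme and a moment-style exponential fit; the study's own resampling procedure is described in its methods section and may differ, and the function names and `peaks` values are hypothetical.

```python
import math


def exponential_flood_estimate(sample, return_period):
    """T-year flood from a two-parameter exponential fit (moment-style, assumed)."""
    loc = min(sample)
    scale = sum(sample) / len(sample) - loc
    p = 1.0 - 1.0 / return_period               # non-exceedance probability
    return loc - scale * math.log(1.0 - p)


def jackknife_cv(sample, estimator):
    """CV (%) of an estimator across delete-one subsamples of a list."""
    n = len(sample)
    # Re-estimate n times, each time leaving out one observation.
    estimates = [estimator(sample[:i] + sample[i + 1:]) for i in range(n)]
    mean = sum(estimates) / n
    variance = sum((e - mean) ** 2 for e in estimates) / (n - 1)
    return 100.0 * math.sqrt(variance) / mean


# Hypothetical flood peaks in m3/s, for illustration only.
peaks = [12.0, 15.5, 9.8, 22.1, 18.4, 11.2, 30.7, 14.9, 16.3, 25.0]
cv_2yr = jackknife_cv(peaks, lambda s: exponential_flood_estimate(s, 2))
cv_100yr = jackknife_cv(peaks, lambda s: exponential_flood_estimate(s, 100))
```

Note that this returns the plain spread of the leave-one-out estimates. A formal jackknife standard error rescales the sum of squared deviations by (n - 1)/n instead of 1/(n - 1), which would give larger values.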
GP-1 and GP-1.5 showed the lowest CVs (less than 2%). The exponential distribution showed CVs that increased from 1.5% (2-year) to 2.5% (50 and 100-year).
For the NwMir, LSwMir, and SwMir, the CVs increased with the return period for both the AMS and POT methods. The CVs for the exponential distribution (POT) were noticeably low (<1%) for low return floods (2-year) and reached slightly over 1% at higher return floods (100-year; Figure 7(b–d)). For both the NwMir and SwMir, the CVs were low for low return floods (<0.5%) and increased to 2% (GEV and GP) for the 100-year flood (Figure 7(b, d)). For the LSwMir, results were similar to previous rivers at lower return floods (2-year); however, the CVs showed among the highest values at 2.5%–3.8% for the 100-year flood (GP-1; Figure 7(c)). The CVs for NwMir, LSwMir and SwMir remained relatively low for the exponential distribution at high return floods, at less than 1.5% (e.g. 50 and 100-year; Figure 7(b–d)).
Studies showed that the POT method could have lower flood estimate uncertainties, especially when the mean number of floods per year is greater than 1.65 (Cunnane 1973; Taesombut and Yevjevich 1978). As expected, low return floods showed among the lowest CVs (Taesombut and Yevjevich 1978). Low return floods showed CVs close to 0.5% (with the exception of CatBk) and high return floods showed the highest CVs (generally less than 4% for 100-year floods). CatBk showed slightly different results (potentially as a result of a negative shape parameter) whereas other stations showed similar CVs at low and high return floods. Notably, the GEV distribution had similar CVs to the POT model at high return floods (2%–4%). The CVs of the GEV distribution (at 100-year) were very close to values observed with the GP-2, which is consistent with results of previous studies (Cunnane 1973; Taesombut and Yevjevich 1978). As expected, the CVs for the GP-1 were slightly higher than those of the GP-1.5, followed by higher values for the GP-2. The exponential distribution showed very low CVs for all stations at high return floods compared to the GP distribution. This is reflective of the fact that this distribution lacks some flexibility in fitting flood data (Figure 6), and resulted in low variability in flood estimates. Results of the exponential distribution show that uncertainties (low CVs) in
Regionalization.” Canadian Journal of Civil Engineering 18 (2): 225–236. doi:10.1139/l91-027.
Chow, V. T., D. R. Maidment, and L. W. Mays. 1988. Applied Hydrology, 572. McGraw-Hill, New York.
Coles, S. 2001. An Introduction to Statistical Modeling of Extreme Values, 224. Springer: London.
Cunnane, C. 1973. “A Particular Comparison of Annual Maxima and Partial Duration Series Methods of Flood Frequency Prediction.” Journal of Hydrology 18 (3–4): 257–271. doi:10.1016/0022-1694(73)90051-6.
Cunnane, C. 1979. “A Note on the Poisson Assumption in Partial Duration Series Models.” Water Resources Research 15 (2): 489–494. doi:10.1029/WR015i002p00489.
Durocher, M., S. M. Zadeh, D. H. Burn, and F. Ashkar. 2018. “Comparison of Automatic Procedures for Selecting Flood Peaks over Threshold Based on Goodness-of-Fit Tests.” Hydrological Processes 32 (18): 2874–2887. doi:10.1002/hyp.13223.
Elwood, J. W., and T. F. Waters. 1969. “Effects of Floods on Food Consumption and Production Rates of a Stream Brook Trout Population.” Transactions of the American Fisheries Society 98 (2): 253–262. doi:10.1577/1548-8659(1969)98[253:EOFOFC.2.0.CO;2]
Gharib, A., E. G. R. Davies, G. G. Goss, and M. Faramarzi. 2017. “Assessment of the Combined Effects of Threshold Selection and Parameter Estimation of Generalized Pareto Distribution with Applications to Flood Frequency Analysis.” Water 9 (9): 692. doi:10.3390/w9090692.
Hosking, J. R. M., and J. R. Wallis. 1987. “Parameter and Quantile Estimation for the Generalized Pareto Distribution.” Technometrics 29 (3): 339–349. doi:10.1080/00401706.1987.10488243.
Irvine, K. N., and P. R. Waylen. 1986. “Partial Series Analysis of High Flows in Canadian Rivers.” Canadian Water Resources Journal 11 (2): 83–91. doi:10.4296/cwrj1102083.
Koutsoyiannis, D. 2004. “Statistics of Extremes and Estimation of Extreme Rainfall: I. Theoretical Investigation.” Hydrological Sciences Journal 49 (4): 575–590. doi:10.1623/hysj.49.4.575.54430.
Lang, M., T. B. M. J. Ouarda, and B. Bobée. 1999. “Towards Operational Guidelines for Over-Threshold Modeling.” Journal of Hydrology 225 (3–4): 103–107. doi:10.1016/S0022-1694(99)00167-5.
Langbein, W. B. 1949. “Annual Flood and the Partial-Duration Flood Series.” Transactions, American Geophysical Union 30 (6): 879–881. doi:10.1029/TR030i006p00879.
Madsen, H., P. Rasmussen, and D. Rosbjerg. 1997. “Comparison of Annual Maximum Series and Partial Duration Series Methods for Modeling Extreme Hydrologic Events 1. At-Site Modeling.” Water Resources Research 33 (4): 747–757. doi:10.1029/96WR03848.
Milner, A. M., A. L. Robertson, M. J. McDermott, M. J. Klaar, and L. E. Brown. 2013. “Major Flood Disturbance Alters River Ecosystem Evolution.” Nature Climate Change 3 (2): 137–141. doi:10.1038/nclimate1665.
Muis, S., B. Güneralp, B. Jongman, J. C. J. H. Aerts, and P. J. Ward. 2015. “Flood Risk and Adaptation Strategies under Climate Change and Urban Expansion: A Probabilistic Analysis Using Global Data.” The Science of the Total Environment 538: 445–457. doi:10.1016/j.scitotenv.2015.08.068.
Nagy, B. K., M. Mohssen, and K. F. D. Hughey. 2017. “Flood Frequency Analysis for a Braided River Catchment in New Zealand: Comparing Annual Maximum and Partial Duration Series with Varying Record Lengths.” Journal of Hydrology 547: 365–374. doi:10.1016/j.jhydrol.2017.02.001.
Naveau, P., M. Nogaj, C. Ammann, P. Yiou, D. Cooley, and V. Jomelli. 2005. “Statistical Methods for the Analysis of Climate Extremes.” Comptes Rendus Geoscience 337 (10–11): 1013–1022. doi:10.1016/j.crte.2005.04.015.
Önöz, B., and M. Bayazit. 2001. “Effect of the Occurrence Process of the Peaks over Threshold on the Flood Estimates.” Journal of Hydrology 244 (1–2): 86–96. doi:10.1016/S0022-1694(01)00330-4.
Rosbjerg, D., H. Madsen, and P. F. Rasmussen. 1992. “Prediction in Partial Duration Series with Generalized Pareto Distributed Exceedances.” Water Resources Research 28 (11): 3001–3010. doi:10.1029/92WR01750.
Salvadori, G., C. De Michele, N. T. Kottegoda, and R. Rosso. 2007. Extremes in Nature: An Approach Using Copulas, 292. Springer: The Netherlands.
Taesombut, V., and V. Yevjevich. 1978. “Use of Partial Flood Series for Estimating Distribution of Maximum Annual Flood Peak.” Hydrology Papers 97, Colorado State University, p. 71.
Todorovic, P. 1970. “On Some Problems Involving Random Number of Random Variables.” The Annals of Mathematical Statistics 41 (3): 1059–1063. doi:10.1214/aoms/1177696981.
Todorovic, P. 1978. “Stochastic Models of Floods.” Water Resources Research 14 (2): 345–356. doi:10.1029/WR014i002p00345.
Todorovic, P., and J. Rousselle. 1971. “Some Problems of Flood Analysis.” Water Resources Research 7 (5): 1144–1150. doi:10.1029/WR007i005p01144.
Todorovic, P., and D. A. Woolhiser. 1972. “On the Time When the Extreme Flood Occurs.” Water Resources Research 8 (6): 1433–1438. doi:10.1029/WR008i006p01433.
Todorovic, P., and E. Zelenhasic. 1970. “A Stochastic Model for Flood Analysis.” Water Resources Research 6 (6): 1641–1648. doi:10.1029/WR006i006p01641.
Waylen, P., and M.-K. Woo. 1983a. “Stochastic Analysis of High Flows in Some Central British Columbia Rivers.” Canadian Journal of Civil Engineering 10 (2): 205–213. doi:10.1139/l83-036.
Waylen, P., and M.-K. Woo. 1983b. “Stochastic Analysis of High Flows Generated by Mixed Processes.” Canadian Journal of Civil Engineering 10 (4): 639–648. doi:10.1139/l83-092.
Zelenhasic, E. 1970. “Theoretical Probability Distribution for Flood Peaks.” Hydrology Paper 42, Colorado State University, p. 35.