Partial Duration Series - SWETAPADMA

https://doi.org/10.5194/hess-2021-570
Preprint. Discussion started: 19 November 2021
© Author(s) 2021. CC BY 4.0 License.
Technical Note: Flood frequency study using partial duration series coupled with entropy principle
Sonali Swetapadma¹, Chandra Shekhar Prasad Ojha²
¹ Research Scholar, Department of Civil Engineering, IIT Roorkee, Roorkee – 247667, Uttarakhand, India
² Professor, Department of Civil Engineering, IIT Roorkee, Roorkee – 247667, Uttarakhand, India
Correspondence to: Sonali Swetapadma (sonaliswetapadma1992@gmail.com)
Abstract. Quality discharge measurements and frequency analysis are two major prerequisites for defining a design flood. Flood frequency analysis (FFA) utilizes a comprehensive understanding of the probabilistic behavior of extreme events but has certain limitations regarding the sampling method and choice of distribution models. Entropy as a modern-day tool has found several applications in FFA, mainly in the derivation of probability distributions and their parameter estimation as per the principle of maximum entropy (POME) theory. The present study explores a new dimension to this area of research, where POME theory is applied in the partial duration series (PDS) modeling of FFA to locate the optimum threshold and the respective distribution models. The proposed methodology is applied to the Waimakariri River at the Old Highway Bridge site in New Zealand, as it has one of the best-quality discharge records. The catchment also has a history of significant flood events in the last few decades. The degree of fitness of models to the exceedances is compared with the standardized statistical approach followed in the literature. Also, the threshold estimated from this study is compared with some previous findings. Various return period quantiles are calculated, and their predictive ability is tested by bootstrap sampling. An overall analysis of results shows that entropy can also be used as an effective tool for threshold identification in PDS modeling of flood frequency studies.

1 Introduction
Frequency analysis of hydrologic events extracts statistical inferences from the data that help in deriving a frequency distribution. This distribution becomes a function of the probability of exceedance or return period unique to each gauging site. At-site flood frequency analysis is suitable for reliably predicting the design discharge of various hydraulic structures to ensure their safe planning and management (Meng et al., 2007; Stedinger et al., 1992). Flood frequency analysis (FFA) comprises two types of sampling approaches: Annual Maximum Series (AMS) and Partial Duration Series (PDS). An annual maximum series includes the largest flow of each year, thereby having one event per year, while the partial duration series is derived by extracting all the independent peaks exceeding a particular discharge, called the threshold. The number of events in a PDS is generally greater than the number of years for which data are available (N), i.e., the average number of events per year (λ) usually exceeds one. This is beneficial where data are scarce (Lang et al., 1999; Madsen et al., 1997; Önöz & Bayazit, 2001), as the series retains many extreme values carrying primary information about flood events. A PDS represents the complete flood generating process by dual modeling of peaks above a threshold, where one model describes the arrival of peaks and the other fits a distribution to their magnitude. The application of PDS has some statistical constraints in selecting thresholds and appropriate probability distributions (Guru & Jha, 2016; Adamowski, 2000; Beguería, 2005; Claps & Laio, 2003; Cunnane, 1973; Pham et al., 2014).
Previously, some researchers proposed the identification of thresholds in PDS based on the average number of peaks per year (λ). Langbein (1949) suggested the threshold as the lowest annual maximum event of the series, thereby making the value of λ at least one. Similarly, the better performance of PDS with λ of 1.65 over the AMS model was observed by Cunnane (1973), Stedinger et al. (1992), and Madsen et al. (1997), among others. Some other studies proposed the choice of threshold depending upon the Poisson arrival of peaks. Cunnane (1979) derived a dispersion index test to check the suitability of the Poisson process in modeling the arrival rate of peaks. Ashkar and Rousselle (1987) also reported that thresholds should be selected in such a way that the flood exceedances fit the Poisson process. Following this, Lang et al. (1999) suggested operational guidelines for choosing a threshold: an initial region is identified from the graphical analysis of the dispersion index test and the variation of mean exceedances above a threshold, and the largest threshold within this region with λ > 2 or 3 becomes the optimum threshold. Other threshold selection techniques were also proposed; for example, Beguería (2005) applied threshold censoring with Generalized Pareto and Poisson distributions in PDS modeling. Solari et al. (2017) developed a framework for automatic threshold selection using the Anderson-Darling EDF statistic. Northrop et al. (2017) used the Bayesian cross-validation method to derive inferences from several thresholds instead of finalizing a single threshold value. Some conventional graphical tools also found applications in threshold selection, such as the mean residual life plot (MRLP) and the parameter stability plot (Ghosh and Resnick, 2010). Several studies compare the performance of PDS with AMS models in flood frequency analysis. The relatively better performance of PDS compared to AMS even when λ = 1 was observed by Bezak et al. (2014). Nagy et al. (2017) also carried out a flood frequency study for the Waimakariri River catchment in New Zealand. Statistical results indicated the better accuracy of PDS over AMS, where the PDS with λ = 3.98 gave the best results. They suggested that PDS is more applicable in areas where historical data are unavailable. A detailed review of these threshold estimation techniques and their uncertainty analysis is given by Scarrott and Macdonald (2012). Langousis et al. (2016) also presented a review of the usual methods available for threshold identification, classifying these approaches into three categories: nonparametric methods, graphical tools, and goodness-of-fit tests that include statistical metrics and the Hill-assumption-based process. They observed that the automation of the mean residual life plot performed better, with less sensitivity to the length of the sample and to low levels of data quantization.
Besides all these, entropy has emerged as an effective modern-day tool in recent years. It has found wide application in the derivation of probability distributions and their parameter estimation based on the principle of maximum entropy (POME). For example, Xiong et al. (2018) proposed the Halphen distribution with POME for flood frequency study and applied it to the annual maximum flow series at 12 gauging sites. A Monte Carlo simulation tested the predictive and descriptive ability of the approach. The results suggested that the proposed methodology can be applied as an alternative in FFA. Deng (2019) presented a distribution-free method for FFA combining maximum entropy and Akaike's information criterion. Zhang et al. (2020) applied an entropy-based model selection technique in flood frequency analysis with the AMS sampling approach. Monte Carlo simulation analyzed the performance of the proposed method and confirmed its better accuracy when the sample size is small, with a positive skewness coefficient and a bell-shaped density function. Even though various threshold identification techniques exist, there are very few applications of entropy in PDS sampling of flood frequency analysis. So, the present work adds a new dimension to this field, where an entropy-based approach is proposed to choose the threshold in PDS as well as the underlying dual models. A new approach based on POME is suggested as a threshold selection criterion. This proposed methodology is applied to the daily discharge data of the Waimakariri River at the Old Highway Bridge site in New Zealand.
2 Theoretical background
This section describes the background of the methodology proposed in the present research, which includes (i) probability distributions for dual modeling of PDS, (ii) the potential of the entropy approach, (iii) entropy functions of probability distributions, (iv) independence criteria and Poisson's hypothesis test, and (v) model selection criteria.
2.1 Probability distributions for the dual modeling of PDS
In the present study, four probability distributions are used to model the magnitude of exceedances above a particular threshold: the Generalized Pareto (GP), Generalized Extreme Value (GEV), Pearson type III (P 3), and Log Pearson type III (LP 3) distributions, chosen because of their widespread application in flood frequency studies (Cunnane, 1988; Stedinger et al., 1993; Karim and Chowdhury, 1995; Rao and Hamed, 2000; Ghorbani et al., 2010; Chen et al., 2015; Benumar et al., 2017; Drissia et al., 2019; Swetapadma and Ojha, 2020). The shape parameter of such three-parameter distributions accounts for the skewness present in most hydrologic data series. The GP distribution is usually known as the 'Peaks Over Threshold' (POT) model in hydrology, as it models the exceedances over the threshold because of its underlying properties (Davison and Smith, 1990; Guru and Jha, 2016; Hosking and Wallis, 1987; Pham et al., 2014; Smith, 1989; Solari et al., 2017). The GEV distribution is mainly used to model extreme statistical events. Similarly, various hydrological processes are effectively modeled using gamma family distributions such as P 3 and LP 3 (Bobee and Ashkar, 1991). The LP 3 distribution is proposed as a standard distribution for design flood estimation in England and Europe (England, 2011; Bezak et al., 2014). The parameters of these distributions are estimated using the L-moment method. L-moments are linear combinations of order statistics and are therefore more robust to outliers in the data than ordinary moments. Also, when estimating quantiles from a small sample, less biased inferences can be made using L-moments (Hosking, 1990; Hosking and Wallis, 1997; Sankarasubramanian and Srinivasan, 1999; Bezak et al., 2014). For a sorted sample of length n (x1 ≤ x2 ≤ … ≤ xn), the first three L-moments l1, l2, and l3 can be expressed as

$$l_1 = \beta_0; \quad l_2 = 2\beta_1 - \beta_0; \quad l_3 = 6\beta_2 - 6\beta_1 + \beta_0; \quad \text{where } \beta_r = n^{-1} \sum_{i=r+1}^{n} \frac{(i-1)(i-2)\cdots(i-r)}{(n-1)(n-2)\cdots(n-r)}\, x_i .$$

The L-skewness (t3) equals l3 / l2. Details of all these distributions, such as their cumulative distribution function (CDF), parameters, and the respective L-moment equations, are given in Table 1.
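The sample L-moment expressions above can be illustrated with a minimal Python sketch. The function name `sample_lmoments` is hypothetical, and the third L-moment is computed with the standard combination 6β2 − 6β1 + β0; this is an illustration of the formulas, not the authors' code.

```python
def sample_lmoments(data):
    """Return (l1, l2, l3, t3) from the probability weighted moments b0, b1, b2."""
    x = sorted(data)                      # x[0] <= x[1] <= ... <= x[n-1]
    n = len(x)
    if n < 3:
        raise ValueError("at least three observations are needed")
    # beta_r = (1/n) * sum_{i=r+1}^{n} [(i-1)...(i-r)] / [(n-1)...(n-r)] * x_i
    b0 = sum(x) / n
    b1 = sum((i - 1) / (n - 1) * x[i - 1] for i in range(2, n + 1)) / n
    b2 = sum((i - 1) * (i - 2) / ((n - 1) * (n - 2)) * x[i - 1]
             for i in range(3, n + 1)) / n
    l1 = b0
    l2 = 2.0 * b1 - b0
    l3 = 6.0 * b2 - 6.0 * b1 + b0
    return l1, l2, l3, l3 / l2            # last entry is the L-skewness t3


# A symmetric sample has zero L-skewness.
l1, l2, l3, t3 = sample_lmoments([1.0, 2.0, 3.0, 4.0, 5.0])
```

For the symmetric sample used here, l1 is the sample mean and t3 vanishes, which gives a quick sanity check of an implementation.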
Table 1. Continuous probability distributions used to model the exceedances in PDS (Source: Swetapadma and Ojha, 2020)

| Distribution model | Cumulative distribution function (CDF) | L-moment expressions for parameters |
|---|---|---|
| Generalized Extreme Value (GEV) | $F(x) = \exp\left(-(1 - kz)^{1/k}\right)$ | $c = \frac{2}{3+t_3} - \frac{\ln 2}{\ln 3}$; $k = 7.8590c + 2.9554c^2$; $\sigma = \frac{k\lambda_2}{\Gamma(1+k)\,(1-2^{-k})}$; $\mu = \lambda_1 + \frac{\sigma(\Gamma(1+k)-1)}{k}$ |
| Generalized Pareto (GP) | $F(x) = 1 - (1 + kz)^{-1/k}$ | $k = \frac{3t_3 - 1}{1 + t_3}$; $\sigma = \lambda_2 (1-k)(2-k)$; $\mu = \lambda_1 - \frac{\sigma}{1-k}$ |
| Pearson Type III (P 3) | $F(x) = G\left(\alpha, \frac{x-\Upsilon}{\beta}\right) / \Gamma(\alpha)$ | For $0 < \lvert t_3\rvert < 1/3$: $z = 3\pi t_3^2$; $\alpha = \frac{1+0.2906z}{z+0.1882z^2+0.0442z^3}$. For $1/3 < \lvert t_3\rvert < 1$: $z = 1 - \lvert t_3\rvert$; $\alpha = \frac{0.36067z-0.59567z^2+0.25361z^3}{1-2.78861z+2.5609z^2-0.77045z^3}$. For all $t_3$ values: $\beta = \mathrm{sign}(t_3)\,\pi^{1/2}\lambda_2\,\Gamma(\alpha)/\Gamma(\alpha+0.5)$ and $\Upsilon = \lambda_1 - \alpha\beta$ |
| Log Pearson 3 (LP 3) | $F(x) = G\left(\alpha, \frac{\ln(x)-\Upsilon}{\beta}\right) / \Gamma(\alpha)$ | Same equations as for the P 3 distribution |

For GEV and GP, z = (x − µ)/σ, where k, µ, and σ are the shape, location, and scale parameters, respectively. Similarly, α, ϒ, and β represent the shape, location, and scale parameters of the P 3 and LP 3 distributions, and G(·,·) denotes the lower incomplete gamma function.
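As a hedged illustration of how the L-moment expressions in Table 1 translate into parameter estimates, the sketch below implements the GP row only. The function name `gp_from_lmoments` is hypothetical, and the inputs are assumed to be the first two L-moments and the L-skewness of the exceedance sample.

```python
def gp_from_lmoments(l1, l2, t3):
    """Shape k, location mu and scale sigma of the GP distribution (Table 1).

    Sign convention assumed here: F(x) = 1 - (1 + k*z)**(-1/k), z = (x - mu)/sigma.
    """
    k = (3.0 * t3 - 1.0) / (1.0 + t3)
    sigma = l2 * (1.0 - k) * (2.0 - k)
    mu = l1 - sigma / (1.0 - k)
    return k, mu, sigma


# Check against the exponential limit (k = 0): an exponential distribution with
# location 1 and scale 2 has l1 = 3, l2 = 1 and L-skewness 1/3.
k, mu, sigma = gp_from_lmoments(3.0, 1.0, 1.0 / 3.0)
```

The exponential case is a convenient check because it is the k → 0 limit of the GP family, so the estimated shape should vanish and the location and scale should be recovered.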
Based on the dispersion index value, the Poisson, binomial, or negative binomial distribution is used to represent the arrival of peaks above any threshold (Lang et al., 1999). Table 2 gives details of these distributions, such as their probability mass functions and expressions for mean and variance.
Table 2. Discrete distributions used to model the arrival of peaks in the PDS

| Distribution model | Parameters | Probability mass function (PMF) | Mean and variance |
|---|---|---|---|
| Poisson | λ | $P = \lambda^k e^{-\lambda} / k!$, k = 0, 1, 2, … | E[X] = Var[X] = λ |
| Binomial | n = 0, 1, 2, …, the number of trials; p ϵ [0,1], i.e., the success probability of each trial; q = 1 − p | $\binom{n}{k} p^k q^{n-k}$ | E[X] = np; Var[X] = npq |
| Negative binomial | r > 0, the number of failures until the experiment is stopped; p ϵ [0,1], i.e., the success probability of each trial | $\binom{k+r-1}{k} p^k q^r$ | E[X] = pr/(1 − p); Var[X] = pr/(1 − p)² |

The exceedance probabilities of a PDS and an AMS, i.e., P(X) = 1 – F(X) = 1 / T, are not comparable if λ > 1. The statistical relationship proposed by Langbein (1949), based on the Poissonian assumption, is most commonly used to convert the recurrence intervals from PDS to the annual domain. However, the Poisson distribution is not the only choice for modeling the arrival of peaks. So in the present study, the following expression is used (Mohssen, 2009; Nagy et al., 2017).

$$\frac{1}{T_a} = \lambda \left(\frac{1}{T_P}\right) \left(1 - \frac{1}{T_P}\right)^{\lambda - 1} \tag{1}$$

where TP is the return period in the PDS context and Ta is the annual return period, 1 – F(X) = 1/Ta.
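A direct evaluation of Eq. (1) may help in reading the conversion. The sketch below is only an arithmetic illustration; the function name `annual_return_period` and the example values are hypothetical and are not results from the study.

```python
def annual_return_period(t_pds, lam):
    """Annual return period Ta from the PDS return period TP and rate lam, Eq. (1)."""
    p = 1.0 / t_pds
    inv_ta = lam * p * (1.0 - p) ** (lam - 1.0)
    return 1.0 / inv_ta


# With one peak per year on average the two return periods coincide.
ta_lam1 = annual_return_period(10.0, 1.0)
# Illustrative case with lam = 2: 1/Ta = 2 * 0.1 * 0.9 = 0.18.
ta_lam2 = annual_return_period(10.0, 2.0)
```

When λ = 1 the exponent vanishes and Ta equals TP, which is the expected limiting behavior of the expression.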
2.2 The potential of the entropy approach
Entropy describes the unpredictability associated with a system by quantifying its degree of disorder. It is a better measure of information than variance as it relates to higher-order distribution moments (Ebrahimi et al., 1999). C.E. Shannon gave a quantitative measure of entropy for a particular distribution. For a discrete random variable Y taking n values {y1, …, yn}, Shannon's entropy is given by (Shannon, 1948),

$$H(y) = E[I(y)] = E[-\ln(P(y))] \tag{2}$$

where E represents the expected value operator, I(y) is a random variable signifying the information contained in the dataset, and P(y) is the probability mass function. The above expression of entropy can be written as,

$$H(y) = \sum_{i=1}^{n} P(y_i)\, I(y_i) = -\sum_{i=1}^{n} P(y_i) \log_b\left(P(y_i)\right) \tag{3}$$

Here 'b' is the base of the logarithm, which defines the units of entropy. In this paper, 'e' is used as the logarithm base, i.e., the unit of H becomes 'Nats'. Similarly, the expression of entropy for a continuous random variable is given below.

$$H = -\int_{-\infty}^{\infty} f(y) \ln(f(y))\, dy \tag{4}$$

H is the amount of uncertainty represented by a probability distribution, and f(y) is the probability density function of the continuous random variable 'Y'. The above form of entropy is known as 'continuous entropy' or 'differential entropy'. Expressions for the continuous entropy of various probability distributions can be derived from Eq. (4).
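To make Eq. (4) concrete, the following sketch evaluates the differential entropy numerically with a midpoint rule and compares it with a known closed form. The exponential density is used purely as a test case (its entropy is 1 − ln(rate) nats); the function name and the integration settings are assumptions of this illustration.

```python
import math


def differential_entropy(pdf, lower, upper, steps=20000):
    """Midpoint-rule approximation of H = -integral f(y) ln f(y) dy, in nats."""
    h = (upper - lower) / steps
    total = 0.0
    for i in range(steps):
        y = lower + (i + 0.5) * h
        f = pdf(y)
        if f > 0.0:                      # f ln f -> 0 where the density vanishes
            total -= f * math.log(f) * h
    return total


rate = 2.0
H_numeric = differential_entropy(lambda y: rate * math.exp(-rate * y), 0.0, 40.0)
H_exact = 1.0 - math.log(rate)           # closed form for the exponential density
```

Unlike the discrete entropy of Eq. (3), the differential entropy can be negative, for example for an exponential density with a rate larger than e.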
The principle of maximum entropy given by Jaynes (1957) states that, while making inferences from limited available data, the probability distribution with the maximum entropy is the best to represent the data. Entropy derives more information from a probability distribution to characterize the input data effectively. So, the minimally biased distribution will have the maximum entropy subject to the available limited data. It will be more probable or less predictable than other distributions with lower entropy values. Therefore, while characterizing unknown events or some limited data with any statistical model, one should prefer the maximum entropy distribution (Lee et al., 2011). POME has been applied to derive several probability distributions frequently used in hydrology and their respective parameters (Singh, 1998). Apart from POME, the concept of entropy has found numerous applications in many areas of research, such as clustering of homogeneous regions (Basu & Srinivas, 2013; Yao et al., 2000), thresholding for image edge detection, and image grey level thresholding (Chang et al., 1994; Pal, 1989; Pun, 1981). Notable research on the application of entropy includes Singh (1997), Alfonso et al. (2010), Krstanovic & Singh (1992), Atieh et al. (2015), Moramarco and Singh (2010), Hao and Singh (2011), Rajsekhar et al. (2015), Li and Zheng (2016), and Zhang et al. (2020).
2.3 Entropy functions of probability distributions
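The expression for the entropy of the three-parameter GP distribution is derived here. The probability density function (PDF) of the three-parameter GP distribution is

$$f(x) = \frac{1}{\sigma} \left(1 + \frac{k(x-\mu)}{\sigma}\right)^{-\frac{1}{k}-1}; \quad \text{for } k \neq 0 \tag{5}$$

The entropy of this GP distribution can be derived by substituting Eq. (5) into Eq. (4):

$$I(f) = \ln(\sigma) \int_{0}^{\infty} f(x)\,dx - \left(-1 - \frac{1}{k}\right) \int_{0}^{\infty} \ln\left(1 + \frac{k(x-\mu)}{\sigma}\right) f(x)\,dx \tag{6}$$

The constraints of the equation can be expressed as

$$\int_{0}^{\infty} f(x)\,dx = 1; \qquad \int_{0}^{\infty} \ln\left[1 + k\,\frac{x-\mu}{\sigma}\right] f(x)\,dx = E\left[\ln\left(1 + k\,\frac{x-\mu}{\sigma}\right)\right] \tag{7}$$

So the final expression of entropy becomes,

$$I_{GP3}(f) = \ln(\sigma) - \left(-1 - \frac{1}{k}\right) E\left[\ln\left(1 + \frac{k(x-\mu)}{\sigma}\right)\right] \tag{8}$$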
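Eq. (8) can be evaluated on a sample once the expectation is replaced by a sample mean, which is an assumption of the sketch below rather than a procedure stated in the text. The function name `gp_entropy` and the synthetic quantile sample are illustrative only.

```python
import math


def gp_entropy(sample, k, mu, sigma):
    """Eq. (8) with E[.] estimated by the sample mean (result in nats)."""
    terms = [math.log(1.0 + k * (x - mu) / sigma) for x in sample]
    return math.log(sigma) + (1.0 + 1.0 / k) * sum(terms) / len(terms)


# Synthetic exceedances: GP quantiles at evenly spaced probabilities, obtained by
# inverting F(x) = 1 - (1 + k*(x - mu)/sigma)**(-1/k).
k, mu, sigma, n = 0.2, 550.0, 100.0, 2000
sample = [mu + sigma / k * ((1.0 - (i + 0.5) / n) ** (-k) - 1.0) for i in range(n)]
H_gp = gp_entropy(sample, k, mu, sigma)
```

Under this parameterization ln(1 + kz)/k follows a unit exponential distribution, so the expectation in Eq. (8) equals k and the entropy reduces to ln(σ) + 1 + k, which the sample estimate above should approach.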
Similarly, the expressions for entropy functions for the other three continuous distributions used in this study can be derived.
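For the Generalized Extreme Value distribution with PDF given as

$$f(x) = \frac{1}{\sigma} \left(1 - \frac{k(x-\mu)}{\sigma}\right)^{\frac{1-k}{k}} \exp\left[-\left(1 - \frac{k(x-\mu)}{\sigma}\right)^{1/k}\right] \tag{9}$$

the expression for entropy is derived as

$$I_{GEV}(f) = \ln(\sigma) + \frac{k-1}{k}\, E\left[\ln\left(1 - \frac{k(x-\mu)}{\sigma}\right)\right] + E\left[\left(1 - \frac{k(x-\mu)}{\sigma}\right)^{1/k}\right] \tag{10}$$

Similarly, the continuous entropy functions for the P 3 and LP 3 distributions are

$$I_{P3}(f) = \ln\left(\alpha^{\beta}\, \Gamma(\beta)\right) - \frac{\Upsilon}{\alpha} + \frac{\bar{x}}{\alpha} - (\beta - 1)\, E\left[\ln(x - \Upsilon)\right] \tag{11}$$

$$I_{LP3}(f) = \ln\left(\alpha^{\beta}\, \Gamma(\beta)\right) - \frac{\Upsilon}{\alpha} + \left(\frac{\alpha+1}{\alpha}\right) \bar{y} - (\beta - 1)\, E\left[\ln(y - \Upsilon)\right], \quad y = \ln(x) \tag{12}$$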
The entropy functions for discrete distributions (Poisson, Binomial, and Negative binomial) can be calculated by simply putting
their probability mass function from Table 2 in Eq. (3).
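The substitution of a probability mass function into Eq. (3) can be sketched as follows for the Poisson and binomial cases of Table 2. The function names and the truncation rule for the infinite Poisson sum are assumptions made for this illustration.

```python
import math


def entropy_from_pmf(pmf_values):
    """Shannon entropy in nats, Eq. (3) with base e; zero-probability terms are skipped."""
    return -sum(p * math.log(p) for p in pmf_values if p > 0.0)


def poisson_entropy(lam, k_max=None):
    """Entropy of a Poisson(lam) arrival model, summing the PMF up to k_max."""
    if k_max is None:
        k_max = int(lam + 20.0 * math.sqrt(lam) + 20.0)   # generous truncation point
    pmf = (math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))
           for k in range(k_max + 1))
    return entropy_from_pmf(pmf)


def binomial_entropy(n, p):
    """Entropy of a Binomial(n, p) arrival model."""
    pmf = (math.comb(n, k) * p ** k * (1.0 - p) ** (n - k) for k in range(n + 1))
    return entropy_from_pmf(pmf)


H_poisson = poisson_entropy(1.0)      # approximately 1.3048 nats
H_coin = binomial_entropy(1, 0.5)     # a single fair trial gives ln 2
```

The Poisson PMF is evaluated in log space through `math.lgamma` so that large counts do not overflow the factorial.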
2.4 Independence and Poisson’s hypothesis test
One of the basic assumptions of FFA is that the data series to be analyzed is independent or random. If a PDS is not free from dependent values, it underestimates the variability of the quantiles (Fawcett & Walshaw, 2012). So before conducting any statistical analysis on a partial duration series, it is essential to verify this independence criterion, which is quite a complex task. Ashkar and Rousselle (1987) stated that the exceedances above a particular threshold level are independent if the average return period between successive events is relatively long. Such a statistical condition alone cannot establish independence, as the peak discharge values also depend upon various catchment dynamics in space and time, such as catchment area and the frequency and magnitude of rainfall (Lang et al., 1999). In the present study, the criteria given by the United States Water Resources Council (USWRC) are used to select independent peaks above a particular threshold level. According to these criteria, two successive events are independent if they are separated by at least as many days as five plus the natural logarithm of the drainage area in square miles, with the additional requirement that intermediate flows must drop below 75% of the lower of the two consecutive peaks (USWRC, 1982). Therefore, two successive flood peaks are dependent, which causes rejection of the second peak, if they satisfy the following expression.

$$\theta < 5\ \text{days} + \ln(A) \quad \text{OR} \quad q_{min} > \tfrac{3}{4} \min[q_1, q_2] \tag{13}$$

where θ is the number of days between occurrences of two successive events, A is the catchment area in square miles, and qmin is the minimum intermediate discharge between two peaks q1 and q2. The present study applies this independence criterion to remove all the dependent flood peaks from the PDS derived at each threshold. To justify the independence of these PDS, the modified Kendall's test (Claps and Laio, 2003) is performed at each gauging site. Ferguson et al. (2000) proposed Kendall's tau test for serial dependence. Visual observation of autocorrelation plots also gives an idea about the independence of peaks.
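The independence rule of Eq. (13) can be sketched for a daily discharge series as below. This is a minimal illustration under stated assumptions: candidate peaks are taken as local maxima above the threshold, each candidate is compared with the last retained peak, and the second peak of a dependent pair is rejected as described in the text. The function names and the synthetic series are hypothetical.

```python
import math


def local_peaks(flows, threshold):
    """Indices of local maxima of a daily series that lie above the threshold."""
    return [i for i in range(1, len(flows) - 1)
            if flows[i] > threshold
            and flows[i] >= flows[i - 1] and flows[i] > flows[i + 1]]


def independent_peaks(flows, threshold, area_sq_miles):
    """Apply Eq. (13): drop a peak that is dependent on the previously kept peak."""
    min_gap_days = 5.0 + math.log(area_sq_miles)
    kept = []
    for j in local_peaks(flows, threshold):
        if kept:
            i = kept[-1]
            q_min = min(flows[i:j + 1])                  # lowest intermediate flow
            too_close = (j - i) < min_gap_days
            no_recession = q_min > 0.75 * min(flows[i], flows[j])
            if too_close or no_recession:
                continue                                 # dependent: reject second peak
        kept.append(j)
    return kept


# Synthetic daily series (m3/s); 3654 km2 is roughly 1411 square miles.
area = 3654.0 * 0.386102
flows = [100.0] * 40
flows[5], flows[10], flows[30] = 900.0, 800.0, 1000.0
peaks = independent_peaks(flows, 500.0, area)            # day 10 is too close to day 5
```

A common variant keeps the larger of two dependent peaks instead of the first one; the sketch follows the wording of the text and rejects the second.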
The partial duration series (PDS) at each threshold is then checked for Poisson's hypothesis by applying the dispersion index test (Cunnane, 1979), which helps identify the discrete distribution best suited to modeling the arrival rate of peaks per year. A more detailed description of this test is given by Lang et al. (1999).
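A minimal sketch of the dispersion index test is given below, assuming the usual form in which the index is the variance-to-mean ratio of the annual peak counts and (N − 1) times the index is compared with chi-square limits on N − 1 degrees of freedom. To keep the example free of external dependencies, the chi-square quantiles use the Wilson-Hilferty approximation, which is a substitution made here and not part of the cited test. All names are hypothetical.

```python
from statistics import NormalDist, mean, variance


def chi2_quantile_wh(p, dof):
    """Wilson-Hilferty approximation to the chi-square quantile."""
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * c ** 0.5) ** 3


def dispersion_index_test(annual_counts, alpha=0.05):
    """Return (DI, lower limit, upper limit, suggested arrival model)."""
    n = len(annual_counts)
    di = variance(annual_counts) / mean(annual_counts)   # sample variance / mean
    lower = chi2_quantile_wh(alpha / 2.0, n - 1) / (n - 1)
    upper = chi2_quantile_wh(1.0 - alpha / 2.0, n - 1) / (n - 1)
    if di < lower:
        model = "binomial"              # under-dispersed counts
    elif di > upper:
        model = "negative binomial"     # over-dispersed counts
    else:
        model = "poisson"
    return di, lower, upper, model


# Hypothetical numbers of peaks per year at one threshold.
di, lower, upper, model = dispersion_index_test([1, 2, 3, 4, 2, 1, 3, 2, 0, 2])
```

For the hypothetical counts above the index is 2/3, which lies between the two limits, so the Poisson model would be retained at that threshold.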
2.5 Exceedance model selection criteria
In the present study, different model selection criteria were used to assess the degree of fit of the continuous distributions to the exceedances above a threshold. These include three goodness of fit (GOF) statistics, i.e., the Anderson-Darling (AD), modified Anderson-Darling (ADC), and Kolmogorov-Smirnov (KS) statistics, which measure the fit of cumulative distribution functions. However, AD and ADC give more weight to higher quantiles. Information-based criteria such as the modified Akaike Information Criterion (AICC) and the Schwarz Bayesian Criterion (BIC) were also applied, as their combination with ADC helps evaluate flood frequency models. Along with this, the root mean square error (RMSE), relative root mean square error (RRMSE), and correlation coefficient (CC) were used to measure the error between the observed and predicted quantiles (Swetapadma and Ojha, 2020). The four candidate distributions were fitted to the magnitude of peaks to compute these statistics. Following a statistical ranking method (Olofintoye et al., 2009), the models were ranked from one to four on each of the model selection criteria listed in Table 3. The distribution with the minimum RMSE, RRMSE, AICC, BIC, KS, AD, or ADC, or the maximum CC, gets rank one. The ranks assigned from each of these test statistics were added, and the distribution with the lowest total rank became the best fit distribution for the exceedances above a threshold.
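The rank-sum selection described above can be sketched as follows. The function name is hypothetical, the score values are made-up placeholders rather than results from the study, and ties are not treated specially in this illustration.

```python
def rank_sum_selection(scores, higher_is_better=("CC",)):
    """Rank models on every criterion (1 = best) and return the lowest rank total."""
    totals = {model: 0 for model in scores}
    criteria = list(next(iter(scores.values())).keys())
    for criterion in criteria:
        best_first = sorted(scores,
                            key=lambda m: scores[m][criterion],
                            reverse=criterion in higher_is_better)
        for rank, model in enumerate(best_first, start=1):
            totals[model] += rank
    best = min(totals, key=totals.get)
    return best, totals


# Placeholder scores for three of the eight criteria (illustrative values only).
scores = {
    "GEV": {"RMSE": 30.0, "KS": 0.08, "CC": 0.980},
    "GP":  {"RMSE": 25.0, "KS": 0.05, "CC": 0.990},
    "P3":  {"RMSE": 40.0, "KS": 0.10, "CC": 0.970},
    "LP3": {"RMSE": 35.0, "KS": 0.09, "CC": 0.975},
}
best_model, rank_totals = rank_sum_selection(scores)
```

Only CC is treated as a criterion to be maximized, matching the rule that every other statistic is ranked by its minimum.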
Table 3. Model selection criteria for the choice of best fit exceedance distribution (Source: Swetapadma and Ojha, 2020)

| Criteria | Equations | Reference |
|---|---|---|
| Kolmogorov-Smirnov test (KS) | $D = \max_{1 \le i \le n} \left( F(x_i) - \frac{i-1}{n},\ \frac{i}{n} - F(x_i) \right)$ | (Frank and Massey 1951) |
| Anderson-Darling test (AD) | $A^2 = -n - \frac{1}{n} \sum_{i=1}^{n} (2i-1) \left[ \ln F(x_i) + \ln\left(1 - F(x_{n-i+1})\right) \right]$ | (Anderson and Darling 1952) |
| Akaike Information Criterion – second-order variant (AICC) | $AIC_c = AIC + \frac{2m(m+1)}{n-m-1}$, where $AIC = n \ln(RSS/n) + 2k$ | (Burnham and Anderson 2002) |
| Schwarz Bayesian Information Criterion (BIC) | $BIC = k \ln(n) + n \ln(RSS/n)$ | |
| Root Mean Square Error (RMSE) | $RMSE = \left[ \frac{\sum (O_i - P_i)^2}{n-m} \right]^{1/2}$ | (Hyndman and Koehler 2006) |
| Relative Root Mean Square Error (RRMSE) | $RRMSE = \left[ \frac{1}{n-m} \sum \left\{ \frac{O_i - P_i}{O_i} \right\}^2 \right]^{1/2}$ | (Yu et al., 1994) |
| Correlation Coefficient (CC) | $CC = \frac{\sum \{ (O_i - \bar{O})(P_i - \bar{P}) \}}{\left\{ \sum (O_i - \bar{O})^2 \sum (P_i - \bar{P})^2 \right\}^{1/2}}$ | |
| Modified Anderson-Darling statistics (ADC) | $ADC = \frac{n}{2} - \sum_{i=1}^{n} \left[ \left(2 - \frac{2i-1}{n}\right) \log\left(1 - F(x_i)\right) + 2F(x_i) \right]$ | (Sinclair et al., 1990) |

F(xi) is the cumulative distribution function; i represents the rank of an observation; n is the length of the sample; m is the number of distribution parameters; k in the AIC and BIC expressions likewise denotes the number of estimated parameters; RSS stands for the residual sum of squares; Oi and Pi represent the observed and predicted peak discharge values, respectively; Ō and P̄ are the means of the observed and predicted series.
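The three CDF-based statistics of Table 3 can be computed from the fitted CDF values of the sorted sample, as in the hedged sketch below. It assumes the values F(x_1) ≤ … ≤ F(x_n) are already available and lie strictly between 0 and 1; the function names are hypothetical, and the formulas are a transcription of the table as read here.

```python
import math


def ks_statistic(F):
    """Kolmogorov-Smirnov D for fitted CDF values of the sorted sample."""
    n = len(F)
    return max(max(F[i] - i / n, (i + 1) / n - F[i]) for i in range(n))


def ad_statistic(F):
    """Anderson-Darling A^2 (Table 3), with 0-based index i standing for rank i + 1."""
    n = len(F)
    s = sum((2 * i + 1) * (math.log(F[i]) + math.log(1.0 - F[n - 1 - i]))
            for i in range(n))
    return -n - s / n


def adc_statistic(F):
    """Modified (upper-tail) Anderson-Darling statistic ADC (Table 3)."""
    n = len(F)
    s = sum((2.0 - (2 * i + 1) / n) * math.log(1.0 - F[i]) + 2.0 * F[i]
            for i in range(n))
    return n / 2.0 - s


# Fitted CDF values at the plotting positions (i - 0.5)/n for n = 4.
F_values = [0.125, 0.375, 0.625, 0.875]
D = ks_statistic(F_values)
```

For a single observation with F = 0.5 the two Anderson-Darling forms reduce to 2 ln 2 − 1 and ln 2 − 0.5, which is a convenient hand check of the index handling.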
3 Methodology
There is dual modeling of extreme values in the partial duration series of FFA: one model is used for the arrival of peaks per year (M1), and the other fits the magnitude of these peak values (M2). The present study suggests that the optimum threshold for PDS analysis is the one where the combined entropy of both these models is the maximum. After removing dependent peaks from the PDS, the variation of the average number of peaks per year and the mean residual life plot are analyzed graphically to identify a suitable range of thresholds. The dispersion index test gives the appropriate distribution to model the arrival of peaks, and the respective entropy (HM1) is calculated from its probability mass function. Four candidate distributions are fitted to the magnitude of exceedances to derive the entropy values (HM2). The degree of fit of these continuous distributions to the exceedance series is compared with the conventional statistical approach using the eight model selection criteria described in the previous section. Finally, the total entropy (Htotal) at each threshold is calculated as the sum of these two entropy components. The threshold with the maximum Htotal is selected as the optimum threshold for PDS modeling of the study area. The optimum threshold derived from the proposed methodology is compared with some existing literature.
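The selection rule described above (maximize Htotal = HM1 + HM2 over the candidate thresholds) can be sketched as below. The function name is hypothetical and the entropy values are made-up placeholders, not results from the study; they are chosen only to show that the threshold maximizing HM2 alone need not maximize the total.

```python
def optimum_threshold(h_m1, h_m2):
    """Return the threshold with the largest total entropy and all totals.

    h_m1, h_m2: dicts mapping threshold -> entropy (nats) of the arrival model
    and of the exceedance-magnitude model, respectively.
    """
    totals = {t: h_m1[t] + h_m2[t] for t in h_m1 if t in h_m2}
    best = max(totals, key=totals.get)
    return best, totals


# Placeholder entropies at three thresholds (m3/s); illustrative values only.
h_m1 = {600: 1.9, 700: 1.8, 800: 1.6}
h_m2 = {600: 6.0, 700: 6.2, 800: 6.3}
best_t, h_total = optimum_threshold(h_m1, h_m2)
```

In this toy case HM2 alone peaks at 800 while the total peaks at 700, so including the arrival model changes the selected threshold.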
The T-year event is expressed as the (1 − 1/λT) quantile in the PDS perspective. For example, return period estimates (XT, predicted) are obtained from the GP/PDS model using the following expression (Rosbjerg, 1985):

$$F(x) = 1 - \left[1 + \frac{k(x-\mu)}{\sigma}\right]^{-\frac{1}{k}} = 1 - \frac{1}{\lambda T} \tag{14}$$

where λ is the average number of exceedances per year, T is the return period (years), and k, µ, and σ represent the three parameters of the GP distribution obtained from the PDS extracted at a threshold. Similarly, various return period quantiles are computed, and the bootstrap sampling approach is used to plot the respective 95% confidence intervals.
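Solving Eq. (14) for x gives x_T = µ + (σ/k)[(λT)^k − 1] for k ≠ 0. The sketch below evaluates this quantile and adds a generic percentile bootstrap interval; the resampling scheme, the function names, and all numerical values are assumptions of this illustration and are not taken from the study.

```python
import random


def gp_quantile(T, lam, k, mu, sigma):
    """T-year quantile of the GP/PDS model, obtained by inverting Eq. (14), k != 0."""
    return mu + sigma / k * ((lam * T) ** k - 1.0)


def bootstrap_ci(sample, estimator, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval of estimator(sample) by resampling with replacement."""
    rng = random.Random(seed)
    n = len(sample)
    stats = sorted(estimator([rng.choice(sample) for _ in range(n)])
                   for _ in range(n_boot))
    lo = stats[int((1.0 - level) / 2.0 * n_boot)]
    hi = stats[min(n_boot - 1, int((1.0 + level) / 2.0 * n_boot))]
    return lo, hi


# Illustrative parameters: 10-year event with two exceedances per year on average.
x10 = gp_quantile(T=10.0, lam=2.0, k=0.1, mu=500.0, sigma=200.0)
# Generic use of the bootstrap helper, here with the sample mean as estimator.
lo, hi = bootstrap_ci(list(range(1, 51)), lambda s: sum(s) / len(s))
```

For quantile intervals, the estimator passed to `bootstrap_ci` would refit the distribution to each resample and return the T-year quantile; the sample mean is used above only to keep the example short.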
Figure 1 depicts the detailed methodology followed in this research.
Flowchart steps shown in Figure 1, in order:
1. Preliminary analysis of the discharge series.
2. Extract PDS at a set of trial thresholds and apply the USWRC independence criteria; check for independence using the modified Mann-Kendall's Tau test and autocorrelation plots.
3. Plot t vs λ to identify the region where a further increase in threshold causes a decrease in λ.
4. Plot the mean of exceedances above the threshold and identify the region where this mean varies linearly with the threshold.
5. Based on these graphical analyses, identify the range of peak values and apply the dispersion index test to find the suitable distribution for modeling the arrival of peaks above that threshold.
6. Calculate the entropy of Model 1 (HM1) (Section 2.3).
7. Fit the selected continuous probability distributions to the value of exceedances and estimate the entropy of all the models (HM2) (Section 2.3).
8. Calculate the total entropy at each threshold; identify the optimum threshold as the one with the maximum total entropy (Htotal).
9. Compare the degree of fitness of exceedances with the conventional statistical approach using suitable model selection criteria.
10. Estimate return period flood flows at the optimum threshold considering the underlying distribution models, and check the predictive ability through bootstrap sampling.

Figure 1: Flowchart showing the detailed methodology followed in this study.
4 Study area
The proposed methodology is applied to the discharge data series obtained for the Waimakariri River at the Old Highway Bridge (OHB) site. The Waimakariri River, which flows eastwards from the Southern Alps, is one of the largest rivers, with a length of 150 km and a catchment area of 3654 km2. It is a large, steep, braided gravel-bed river. The upper region of this catchment is mountainous and glaciated. Flood management is one of the significant issues with the river because of its natural tendency to flow into multiple courses. The major floods in this river are due to heavy rainfall on the Main Divide (Nagy et al., 2017). Snowmelt in the spring season accounts for 30% of the flow (Gray et al., 2006). The Canterbury Regional Council (Environment Canterbury, ECan) has placed a gauging station on the river at the Old Highway Bridge (OHB). This gauging site has one of the country's best-quality and oldest discharge data sets. So the quality of the data available for this site, along with various studies on the river's flood problem, motivated the authors to apply the proposed methodology here. Hourly data from 1 January 1967 to 31 December 2015 were obtained from Environment Canterbury, and the maximum daily discharge series was extracted to carry out the frequency analysis.
Table 4. Some major flow properties of the data series

| Series | Years available | Mean flow [m3/s] | Standard deviation [m3/s] | Skewness | Largest flow on the record [m3/s] | Mean annual flood [m3/s] |
|---|---|---|---|---|---|---|
| Waimakariri daily maximum flow | 1967-2015 | 119.064 | 119.379 | 5.576 | 2835.579 in 1979 | 1450.868 |

5 Results and discussion
The proposed methodology for the choice of threshold in partial duration series was applied to the daily maximum discharge data for the Waimakariri River at the Old Highway Bridge site. The annual maximum series having 49 events was extracted, and initially, some thresholds were applied to derive the respective PDS. Satisfying the independence criteria of the peaks is a prerequisite in any statistical frequency analysis. So, the dependent peaks from those extracted PDS were dropped by following the USWRC independence criteria as described in Sect. 2.4. Visual observation of autocorrelation plots confirmed the absence of serial dependence in the PDS samples. Also, Kendall's Tau test verified the independence of these series, and the PDS at some thresholds were omitted from frequency analysis due to the presence of a significant positive trend at a 95% confidence level.
[Figure 2 image: panel (a) plots Kendall's Tau and the critical Tau at the 5% significance level against threshold (m3s-1); panels (b) to (d) show autocorrelation plots.]
Figure 2: Test for the independence of flood peaks above the thresholds, (a) Modified Mann-Kendall's Tau test, (b) – (d) autocorrelation plot at a threshold of 300 m3s-1, 500 m3s-1, and 700 m3s-1, respectively.
For a PDS, an extremely low threshold makes the whole series lie above it; as the threshold increases, more peaks are retained, and the value of λ rises. After reaching a peak value, λ gradually decreases until no peaks are included once the threshold exceeds the largest discharge in the record. This gradual variation of the average number of peaks per year divides the entire range of thresholds into four domains, as described by Lang et al. (1999). Figure 3 depicts the variation of the average number of peaks per year (λ) with threshold level for the study area, and domain 3 was identified.
[Figure 3 image: average number of peaks per year (λ) against threshold (m3s-1), with Domain 3 and the λ > 2 condition marked.]
Figure 3: Variation of the average number of peaks per year with the threshold.
Lang et al. (1999) also suggested respecting the condition of λ > 2 while analyzing the variation of λ. The thresholds with λ > 2 are marked in Figure 3. Davison and Smith (1990) recommended selecting the threshold within a region where the mean of exceedances above a threshold is a linear function of the threshold, for the stability of the distribution parameters. This plot is known as the Mean Residual Life Plot (MRLP). Figure 4 demonstrates the variation of the mean of exceedances above the threshold ($\bar{X}_t - t$) with the threshold (t), along with the 95% confidence interval of the MRLP.
[Figure 4 image: panels (a) and (b) plot the mean excess above threshold (m3s-1) against threshold (m3s-1).]
Figure 4: (a) Variation of the mean of exceedances above the threshold, i.e., MRLP with 95% confidence interval, and (b) a zoomed-in figure of a selected range of thresholds.
As per the MRLP, a threshold should be selected from a region where the plot is linear. For the present study area, the plot
was approximately linear for thresholds between 550 and 1000 m3/s and departs from linearity beyond this range. Linearity is
lost at higher thresholds because few exceedances remain, so the variance of a few extreme values can cause a sudden jump in
the plot. Setting an optimal threshold merely from such graphical observation is therefore subjective, but the plot indicates
the range in which the optimum threshold may lie. Based on this, thresholds within 550 to 1000 m3/s at which the PDS satisfied
the independence criteria were selected for further analysis.
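A minimal sketch of how the points of an MRLP can be tabulated is given here. It assumes a normal-approximation 95% confidence interval (mean ± 1.96 standard errors), which may differ from the interval used for Figure 4; the function names and peak values are hypothetical.

```python
import math
import statistics


def mean_excess(peaks, threshold, z=1.96):
    """Mean excess above `threshold` with a normal-approximation 95% CI.

    Returns (mean_excess, lower, upper), or None when fewer than two
    peaks exceed the threshold (the standard error is then undefined).
    """
    excesses = [q - threshold for q in peaks if q > threshold]
    if len(excesses) < 2:
        return None
    m = statistics.fmean(excesses)
    half_width = z * statistics.stdev(excesses) / math.sqrt(len(excesses))
    return m, m - half_width, m + half_width


def mean_residual_life(peaks, thresholds):
    """Tabulate the MRLP as {threshold: (mean_excess, lower, upper)}."""
    rows = {}
    for t in thresholds:
        row = mean_excess(peaks, t)
        if row is not None:
            rows[t] = row
    return rows


example_peaks = [600, 700, 800, 900, 1000, 1400]  # hypothetical peaks in m3/s
mrlp = mean_residual_life(example_peaks, [550, 750, 1300])
```

At the highest threshold only one peak remains, so no interval can be formed. This is the same small-sample effect that makes the upper end of the plot unstable.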
To choose the distribution that models the arrival of peaks above a threshold, the dispersion index (DI) test proposed by
Cunnane (1979) was applied. Figure 5 displays the dispersion index with its limits at the 5% significance level for the study
area. At most thresholds the PDS follows a Poisson process, with DI lying between the upper and lower limits. The binomial
distribution showed a better fit at some thresholds. Based on this, the entropy of model 1 (HM1) was calculated at each
truncation level as described in Sect. 2.3.
[Figure 5 plot: dispersion index with lower and upper limits at the 5% significance level versus threshold (m3s-1).]
Figure 5: Dispersion Index test at 5% significance level.
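The dispersion index test can be sketched as follows, assuming DI is the ratio of the sample variance to the sample mean of the annual peak counts and that (N − 1)·DI is approximately chi-square distributed with N − 1 degrees of freedom under the Poisson hypothesis. The chi-square quantile is obtained here with the Wilson-Hilferty approximation to avoid external libraries, so the limits are approximate; the function names and counts are hypothetical.

```python
import statistics
from statistics import NormalDist


def chi2_quantile(p, dof):
    """Wilson-Hilferty approximation to the chi-square quantile (approximate)."""
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * c ** 0.5) ** 3


def dispersion_index_test(annual_counts, alpha=0.05):
    """Return (DI, lower, upper, verdict) for yearly counts of peaks."""
    n = len(annual_counts)
    if n < 2:
        raise ValueError("need at least two years of counts")
    mean = statistics.fmean(annual_counts)
    di = statistics.variance(annual_counts) / mean
    # Under the Poisson hypothesis, (n - 1) * DI is roughly chi-square(n - 1).
    lower = chi2_quantile(alpha / 2.0, n - 1) / (n - 1)
    upper = chi2_quantile(1.0 - alpha / 2.0, n - 1) / (n - 1)
    if di < lower:
        verdict = "binomial (under-dispersed)"
    elif di > upper:
        verdict = "negative binomial (over-dispersed)"
    else:
        verdict = "Poisson not rejected"
    return di, lower, upper, verdict


counts = [3, 1, 5, 2, 4, 3, 0, 6, 2, 4]  # hypothetical peaks per year
di, lower, upper, verdict = dispersion_index_test(counts)
```

A DI inside the limits supports the Poisson model, while values below or above the limits point to the binomial or negative binomial alternatives, respectively.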
The four candidate distributions (GEV, GP, P 3, and LP 3) were then fitted to the magnitude of exceedances to compute their
entropy function (HM2) as described in Sect. 2.3. The combined entropy of both models (Htotal = HM1 + HM2) was calculated
at the chosen thresholds. The threshold with the maximum total entropy was selected as the optimum one for each distribution.
Figure 6 shows the variation of the entropy functions with the threshold. Figure 6(a-d) compares the total entropy of each
distribution with its entropy of model 2, i.e., the entropy of the distribution fitted to the magnitude of exceedances.
For a given distribution, the threshold at which HM2 is maximum differs from the threshold at which Htotal is maximum. For
example, GEV has its highest HM2 at 1100 m3/s, while its total entropy peaks at a threshold of 700 m3/s. Considering the
entropy of model 1 therefore changes the choice of optimum threshold for each distribution.
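The selection rule can be illustrated with a short sketch. It assumes that HM1 is the Shannon entropy of a Poisson count with rate λ, evaluated here by direct summation rather than by the expressions of Sect. 2.3, and that HM2 has already been computed for each threshold. The (λ, HM2) pairs are hypothetical and are not values from the study.

```python
import math


def poisson_entropy(lam, tol=1e-12):
    """Shannon entropy (nats) of a Poisson(lam) count, by direct summation.

    Intended for the small rates typical of PDS (a few peaks per year).
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    p = math.exp(-lam)  # P(K = 0)
    entropy, k, cumulative = 0.0, 0, 0.0
    while cumulative < 1.0 - tol and k < 10_000:
        if p > 0.0:
            entropy -= p * math.log(p)
        cumulative += p
        k += 1
        p *= lam / k  # recurrence P(K = k) = P(K = k - 1) * lam / k
    return entropy


def best_threshold(candidates):
    """candidates maps threshold -> (lam, h_m2); return (threshold, h_total)."""
    totals = {t: poisson_entropy(lam) + h_m2 for t, (lam, h_m2) in candidates.items()}
    t_opt = max(totals, key=totals.get)
    return t_opt, totals[t_opt]


# Hypothetical (lambda, HM2) pairs per threshold in m3/s.
candidates = {600: (4.0, 5.70), 700: (3.22, 6.00), 1100: (1.5, 6.10)}
t_opt, h_total = best_threshold(candidates)
t_by_hm2 = max(candidates, key=lambda t: candidates[t][1])
```

In this made-up case HM2 alone favors the highest threshold, but the Poisson entropy falls as λ decreases, so the total entropy peaks at an intermediate threshold.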
[Figure 6 plots: panels (a) GEV, (b) GP, (c) P 3, and (d) LP 3 show total entropy and entropy of model 2 (nats) versus threshold (m3s-1); panel (e) shows entropy of model 1 (nats) versus threshold; panel (f) shows total entropy of GEV, GP, P 3, and LP 3 versus threshold.]
Figure 6: Variation of entropy with the threshold.
Figure 6(f) shows the variation of total entropy with the threshold for all four distributions. LP 3 has the maximum entropy
at most thresholds, while P 3, GP, and GEV rank second, third, and fourth, respectively. LP 3 is recommended as the standard
distribution for FFA in the United States by federal agencies (England, 2011). However, the logarithmic transformation of
small events in the series may affect the results when using the LP 3 distribution. LP 3/PD at a threshold of 710 m3/s was
the most suitable choice for PDS modeling of the study area. GP and GEV performed best at 700 m3/s, whereas the PDS at
830 m3/s had the highest total entropy for the P 3 distribution. Table 5 summarizes the final results. The Poisson
distribution was found suitable for the arrival of peaks at these thresholds. The average number of peaks per year varied
between 2.5 and 3.2.
Table 5 Summary of optimum threshold and the underlying models
Distribution models    Topt (m3/s)    λ       Htotal (Nats)
GEV/PD                 700            3.22    7.812
GP/PD                  700            3.22    8.510
P 3/PD                 830            2.47    8.523
LP 3/PD                710            3.18    8.756
Various test statistics were calculated to check how well these continuous probability distributions fit at different
thresholds, so that the results of the proposed method could be compared with the conventional goodness-of-fit (GOF)
approach. A numerical assessment based on a statistical ranking method was applied using the eight model selection criteria
described in Sect. 2.5. The distributions were ranked according to each test statistic, and the final rank was computed.
Figure 7 shows the rank of these models and their total rank at some thresholds, as an example. A similar analysis was
performed at the other thresholds. The GP and LP 3 distributions had better GOF statistics, i.e., KS and AD, at the largest
number of thresholds, implying a closer match between the empirical and predicted cumulative distribution functions of
exceedances. The LP 3 distribution had a better combination of the modified AD statistic with the information criteria at
the majority of thresholds, which is helpful in flood frequency analysis. Also, the squared error metrics and the correlation
coefficient of the exceedance series were better when modeled with the LP 3 distribution for most thresholds. Overall, the
LP 3 distribution performed better for thresholds within 600 m3/s to 1050 m3/s. LP 3 best described the exceedances
extracted at 700 and 710 m3/s as per all the test statistics. GP was the second-best model for the exceedances at the
majority of thresholds. These results agree with those obtained by applying the modified principle of maximum entropy in
this research.
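The ranking step can be sketched as a simple rank aggregation. The sketch assumes that rank 1 is assigned to the best model under each criterion, that the lowest summed rank identifies the preferred model, and that only the correlation coefficient (CC) is a higher-is-better criterion; the ranking convention of Figure 7 may differ. The scores shown are hypothetical.

```python
def rank_models(scores, higher_is_better=("CC",)):
    """Sum per-criterion ranks (1 = best); the lowest total rank wins.

    scores maps criterion -> {model: value}. Ties are not handled specially.
    """
    totals = {}
    for criterion, values in scores.items():
        reverse = criterion in higher_is_better
        ordered = sorted(values, key=values.get, reverse=reverse)
        for rank, model in enumerate(ordered, start=1):
            totals[model] = totals.get(model, 0) + rank
    # Return models ordered from best (smallest total) to worst.
    return dict(sorted(totals.items(), key=lambda item: item[1]))


# Hypothetical test statistics for three of the criteria; not study results.
scores = {
    "KS": {"GEV": 0.09, "GP": 0.06, "P3": 0.08, "LP3": 0.05},
    "RMSE": {"GEV": 41.0, "GP": 33.0, "P3": 37.0, "LP3": 30.0},
    "CC": {"GEV": 0.981, "GP": 0.990, "P3": 0.986, "LP3": 0.993},
}
total_rank = rank_models(scores)
```

Summing ranks rather than raw statistics keeps criteria with different units, such as RMSE and KS, on a common footing.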
[Figure 7 plots: rank of individual test statistics (KS, AD, AICC, BIC, RMSE, RRMSE, CC, ADC) and total rank for the GEV, GP, P 3, and LP 3 probability distributions, in four panels.]
Figure 7: Ranking of distributions based on eight model selection criteria at a threshold of (a) 600 m3/s, (b) 700 m3/s,
(c) 850 m3/s, and (d) 1100 m3/s.
Return period estimates for 10, 50, 100, and 500 years were calculated and plotted in Figure 8. The GEV and LP 3 distribution
models gave higher design flood discharges for T ≥ 50 years. However, for the lower return period of 10 years, all four
distribution models predicted similar design flows. The choice of threshold and of the respective distribution models
therefore does not significantly influence the lower return period estimates, but it plays a vital role for larger
quantiles. Nagy et al. (2017) arrived at similar conclusions.
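For orientation, a T-year quantile can be obtained from a PDS model as sketched here. The sketch uses the classical relation for Poisson arrivals, 1 − 1/T = exp(−λ(1 − G(x))), with a GP distribution G in the Hosking parameterization. It is not the annual-domain conversion formula applied in this study, and the parameter values are hypothetical.

```python
import math


def gp_quantile(prob, loc, scale, shape):
    """Quantile of the generalized Pareto distribution in the Hosking form,
    F(x) = 1 - (1 - shape * (x - loc) / scale) ** (1 / shape)."""
    if not 0.0 <= prob < 1.0:
        raise ValueError("prob must be in [0, 1)")
    if abs(shape) < 1e-12:
        return loc - scale * math.log(1.0 - prob)  # exponential limit
    return loc + scale / shape * (1.0 - (1.0 - prob) ** shape)


def pds_return_level(return_period, lam, loc, scale, shape):
    """T-year flood for Poisson arrivals (rate lam per year) and GP exceedances."""
    if return_period <= 1.0:
        raise ValueError("return_period must exceed 1 year")
    # Annual non-exceedance probability: 1 - 1/T = exp(-lam * (1 - G(x))).
    g = 1.0 + math.log(1.0 - 1.0 / return_period) / lam
    if g < 0.0:
        raise ValueError("return period too short for this arrival rate")
    return gp_quantile(g, loc, scale, shape)


# Hypothetical parameters: threshold 700 m3/s as location, 3.22 peaks per year.
quantiles = {
    T: pds_return_level(T, lam=3.22, loc=700.0, scale=300.0, shape=0.1)
    for T in (10, 50, 100, 500)
}
```

Because the tail of the fitted distribution governs the result for large T, differences between distribution models grow with the return period, in line with the behaviour described above.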
[Figure 8 plot: design flood estimates (m3s-1) versus return period (10, 50, 100, and 500 years) for GEV/PD/700, GP/PD/700, P3/PD/830, and LP3/PD/710.]
Figure 8: Quantile estimates of PDS at the optimum threshold.
Bootstrap sampling was performed with 1000 samples, each with the same data length as the main PDS, to check the predictive
ability of these distribution models. The 95% confidence intervals (CI) of the quantile estimates were plotted and analyzed
for uncertainty. Figure 9 illustrates the 95% CI for the GP distribution, where the estimated flood quantiles lie within the
upper and lower limits, thereby supporting the predictive ability of the models.
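A percentile bootstrap of this kind can be sketched as follows. The sketch assumes resampling with replacement at the original sample size and takes the 2.5th and 97.5th percentiles of the re-estimated statistic; the estimator passed in (here the mean excess) stands in for a full quantile estimator, and the data are hypothetical.

```python
import random
import statistics


def bootstrap_ci(sample, estimator, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for estimator(sample)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    # Resample with replacement at the original length and re-estimate.
    estimates = sorted(
        estimator(rng.choices(sample, k=n)) for _ in range(n_boot)
    )
    tail = (1.0 - level) / 2.0
    lower = estimates[int(tail * (n_boot - 1))]
    upper = estimates[int(round((1.0 - tail) * (n_boot - 1)))]
    return lower, upper


# Hypothetical exceedances above a threshold (m3/s).
excesses = [35, 60, 88, 120, 150, 210, 260, 340, 470, 690]
ci_low, ci_high = bootstrap_ci(excesses, statistics.fmean)
```

Replacing `statistics.fmean` with a function that fits a distribution and returns a T-year quantile would give intervals of the kind shown in Figure 9.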
[Figure 9 plots: 50 year and 100 year design discharge (m3s-1), each showing the lower limit (LL), the quantile estimate (Q50 or Q100), and the upper limit (UL).]
Figure 9: 95% CI for 50 and 100 year return period quantiles from GP distribution.
According to the operational guidelines proposed by Lang et al. (1999), the optimum threshold for this study area was
identified as 730 m3/s. As per Rosbjerg and Madsen (1992), the threshold from a daily discharge series should be
$T_{opt} = E(Q) + 3\sqrt{\mathrm{Var}(Q)}$; following this, a threshold of 666 m3/s was obtained for the study area. Langbein
(1949) defined the threshold as the lowest annual maximum discharge, which gives 716 m3/s for the Waimakariri record at OHB.
Nagy et al. (2017) calculated a threshold of 700 m3/s for the LP 3 and 750 m3/s for the GP distribution. The optimum
thresholds obtained in the present study (700 to 830 m3/s) are close to the findings from some existing threshold selection
techniques. Considering the entropy of model 1, i.e., the arrival of peaks, instead of only the entropy of the distributions
used for modeling exceedances, gives more accurate optimum threshold values. The conventional statistical approach ensures
only the fitness of models to exceedances, whereas the modified POME method helps identify the optimum threshold along with
both models required to describe the PDS. This approach of calculating the total entropy of the dual models of the PDS can
therefore be used as an alternative for locating the optimum threshold and the respective distribution models.
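The two closed-form rules quoted above can be written down directly. The sketch assumes the sample standard deviation for the Rosbjerg and Madsen (1992) rule and a mapping from year to discharge values for the Langbein (1949) rule; the function names and the discharges are hypothetical and far shorter than a real record.

```python
import statistics


def rosbjerg_madsen_threshold(daily_flows, k=3.0):
    """Threshold = mean + k * standard deviation of the daily discharge series."""
    return statistics.fmean(daily_flows) + k * statistics.stdev(daily_flows)


def langbein_threshold(flows_by_year):
    """Threshold = lowest annual maximum; flows_by_year maps year -> flows."""
    return min(max(flows) for flows in flows_by_year.values())


# Hypothetical discharges (m3/s) for three years.
flows_by_year = {
    2001: [80, 950, 120],
    2002: [60, 720, 300],
    2003: [90, 1100, 640],
}
daily = [q for flows in flows_by_year.values() for q in flows]
t_rm = rosbjerg_madsen_threshold(daily)
t_langbein = langbein_threshold(flows_by_year)
```

Neither rule considers the distribution of the exceedances or the arrival process, which is the gap the entropy-based selection is intended to address.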
6 Conclusions
Several schools of thought exist regarding the choice of threshold in the partial duration series approach to flood frequency
analysis. The present study adds another, in which the principle of maximum entropy is applied to locate the optimum
threshold and the underlying distribution models of the PDS. The methodology was applied to the Waimakariri River at the Old
Highway Bridge, New Zealand. After removing dependent peaks from the PDS, a range of thresholds was identified based on the
operational guidelines proposed by Lang et al. (1999). The dispersion index gave the distribution model for the arrival of
peaks above a threshold, and the corresponding entropy was estimated. All four candidate distributions were fitted to the
magnitude of peaks to calculate the respective entropy functions. The threshold with the maximum total entropy of both these
models became the optimum threshold. The fitness of the candidate distributions to the exceedances was also compared with the
conventional statistical approach, where eight model selection criteria were applied. The results obtained using POME were
similar to those of the standardized procedure. For all the candidate distributions, the average number of peaks per year at
the optimum threshold lay between 2.47 and 3.22. The PDS sample with an average of 3.2 peaks per year, modeled with the Log
Pearson type 3 and Poisson distributions, performed best. The formula used for converting return periods into the annual
domain also simplified the use of PDS by eliminating the compulsory assumption of a Poisson distribution for the occurrence
of peaks. Various return period quantiles were estimated, and bootstrap sampling with 1000 samples provided their 95%
confidence intervals. The results supported the predictive ability of the models derived by applying POME in the PDS context.
The threshold obtained in the present research was close to those reported in some previous research. The method has an
advantage over existing methods because it considers both models while identifying the optimum threshold: including the
entropy of model 1, i.e., the arrival of peaks, instead of only the entropy of the distributions used for modeling
exceedances, gives more accurate optimum threshold values. Overall, the current research suggests this POME-based method as
an alternative to the existing conventional approaches to threshold selection in the PDS context.
Data availability: The hourly discharge data for the Waimakariri River at the Old Highway Bridge gauging site, New Zealand,
are available at https://www.ecan.govt.nz/. This work uses material sourced from “the Environment Canterbury Surface Water
Archive”, which is licensed under a Creative Commons Attribution 4.0 International license by Environment Canterbury.
Author Contribution: SS performed data collection, analysis, and manuscript preparation under the guidance of CSPO, whose
suggestions and review helped in refining the article. Both authors read and approved the final manuscript.
Competing interests: The authors declare that they have no conflict of interest.
Acknowledgments: The authors acknowledge Environment Canterbury Regional Council for providing discharge data for the
Waimakariri River at the Old Highway Bridge gauging site, New Zealand.
References
Adamowski, K.: Regional analysis of annual maximum and partial duration flood data by nonparametric and L-moment
methods, J. Hydrol., 229(3–4), 219–231, doi:10.1016/S0022-1694(00)00156-6, 2000.
Alfonso, L., Lobbrecht, A., and Price, R.: Optimization of water level monitoring network in polder systems using information
theory, Water Resour. Res., 46(12), doi:10.1029/2009WR008953, 2010.
Anderson, T.W., and Darling, D. A.: Asymptotic Theory of Certain “Goodness of Fit” Criteria Based on Stochastic Processes,
Ann. Math. Stat., 23(2), 193–212, 1952.
Ashkar, F., and Rousselle, J.: Partial duration series modeling under the assumption of a Poissonian flood count, J. Hydrol.,
90(1–2), 135–144, doi:10.1016/0022-1694(87)90176-4, 1987.
Atieh, M., Gharabaghi, B., and Rudra, R.: Entropy-based neural networks model for flow duration curves at ungauged sites,
J. Hydrol., 529, 1007–1020, doi:10.1016/j.jhydrol.2015.08.068, 2015.
Basu, B., and Srinivas, V. V.: Regional Flood Frequency Analysis Using Entropy-Based Clustering Approach, J. Hydrol. Eng.,
21(8), 1–12, doi:10.1061/(ASCE)HE.1943-5584.0001351, 2013.
Beguería, S.: Uncertainties in partial duration series modelling of extremes related to the choice of the threshold value, J.
Hydrol., 303(1–4), 215–230, doi:10.1016/j.jhydrol.2004.07.015, 2005.
Bezak, N., Brilly, M., and Šraj, M.: Comparison between the peaks-over-threshold method and the annual maximum method
for flood frequency analysis, Hydrol. Sci. J., 59(5), 959–977, doi:10.1080/02626667.2013.831174, 2014.
Bobee, B., and Ashkar, F.: The Gamma Family And Derived Distributions Applied In Hydrology, Water Resources
Publications, Colorado, USA., 1991.
Burnham, K. P., and Anderson, D. R.: Model Selection and Multimodel Inference: A Practical Information-Theoretic
Approach., Springer-Verlag New York, New York, USA., 2002.
Chang, C. I., Chen, K., Wang, J., and Althouse, M. L. G.: A relative entropy-based approach to image thresholding, Pattern
Recognit., 27(9), 1275–1289, doi:10.1016/0031-3203(94)90011-6, 1994.
Claps, P., and Laio, F.: Can continuous streamflow data support flood frequency analysis? An alternative to the partial duration
series approach, Water Resour. Res., 39(8), 1216, doi:10.1029/2002WR001868, 2003.
Cunnane, C.: A particular comparison of annual maxima and partial duration series methods of flood frequency prediction, J.
Hydrol., 18(3–4), 257–271, doi:10.1016/0022-1694(73)90051-6, 1973.
Cunnane, C.: A note on the Poisson assumption in partial duration series models, Water Resour. Res., 15(2), 489–494,
doi:10.1029/WR015i002p00489, 1979.
Davison, A. C., and Smith, R. L.: Models for Exceedances over High Thresholds, J. R. Stat. Soc., 52(3), 393–442, 1990.
Deng, J.: Maximum entropy method for flood frequency analysis: A case study of the Grand River in Ontario, Canada, IOP
Conf. Ser. Earth Environ. Sci., 344(1), doi:10.1088/1755-1315/344/1/012002, 2019.
Drissia, T. K., Jothiprakash, V., and Anitha, A. B.: Flood Frequency Analysis Using L Moments: a Comparison between
At-Site and Regional Approach, Water Resour. Manag., 33, 1013–1037, doi:10.1007/s11269-018-2162-7, 2019.
Ebrahimi, N., Maasoumi, E., and Soofi, E. S.: Ordering univariate distributions by entropy and variance, J. econometrics,
90(2), 317-336, 1999.
England, J. F.: Flood frequency and design flood estimation procedures in the United States: Progress and challenges, Aust. J.
Water Resour., 15(1), 33–46, doi:10.1080/13241583.2011.11465388, 2011.
Ferguson, T. S., Genest, C., and Hallin, M.: Kendall’s tau for serial dependence, Can. J. Stat., 28(3), 587–604, 2000.
Massey, F. J., Jr.: The Kolmogorov-Smirnov Test for Goodness of Fit, J. Am. Stat. Assoc., 46(253), 68–78, 1951.
Ghorbani, M. A., Ruskeepää, H., Singh, V. P., and Sivakumar, B.: Flood frequency analysis using Mathematica, Turkish J.
Eng. Environ. Sci., 34(3), 171–188, doi:10.3906/muh-1002-2, 2010.
Ghosh, S., and Resnick, S.: A discussion on mean excess plots, Stoch. Process. their Appl., 120(8), 1492–1517,
doi:10.1016/j.spa.2010.04.002, 2010.
Gray, D., Scarsbrook, M. R., and Harding, J. S.: Spatial biodiversity patterns in a large New Zealand braided river, New Zeal.
J. Mar. Freshw. Res., 40(4), 631–642, doi:10.1080/00288330.2006.9517451, 2006.
Guru, N., and Jha, R.: Flood estimation in Mahanadi river system, India using partial duration series, Georisk Assess. Manag.
Risk Eng. Syst. Geohazards, 10(2), 135–145, doi:10.1080/17499518.2015.1116013, 2016.
Hao, Z., and Singh, V. P.: Single-site monthly streamflow simulation using entropy theory, Water Resour. Res., 47(9),
W09528, doi:10.1029/2010WR010208, 2011.
Hosking, J. R. M.: L-Moments : Analysis and Estimation of Distributions Using Linear Combinations of Order Statistics, J.
Royal Stat. Soc., B, 52(1), 105–124, http://www.jstor.org/stable/2345653, 1990.
Hosking, J. R. M., and Wallis, J. R.: Parameter and Quantile Estimation for the Generalized Pareto Distribution,
Technometrics, 29(3), 339–349, 1987.
Hosking, J. R. M., and Wallis, J. R.: Regional Frequency Analysis: An Approach Based on L-Moments, Cambridge:
Cambridge University Press., 1997.
Hyndman, R. J., and Koehler, A. B.: Another look at measures of forecast accuracy, Int. J. Forecasting, 22(4), 679–688, 2006.
Jaynes, E. T.: Information theory and statistical mechanics, Phys. Rev., 106(4), 620–630, 1957.
Karim, A. M., and Chowdhury, J. U.: A comparison of four distributions used in flood frequency analysis in Bangladesh,
Hydrol. Sci. J., 40(1), 55–66, doi:10.1080/02626669509491390, 1995.
Krstanovic, P. F., and Singh, V. P.: Evaluation of Rainfall Networks Using Entropy: I. Theoretical Development, Water
Resour. Manag., 6, 279–293, https://doi.org/10.1007/BF00872281, 1992.
Lang, M., Ouarda, T. B. M. J., and Bobee, B.: Towards operational guidelines for over-threshold modeling, J. Hydrol.,
225(3–4), 103–117, https://doi.org/10.1016/S0022-1694(99)00167-5, 1999.
Langbein, W. B.: Annual floods and the partial duration series, Eos Trans. Am. Geophys. Union, 30(6), 879–881,
doi:10.1029/TR030i006p00879, 1949.
Langousis, A., Mamalakis, A., Puliga, M., and Deidda, R.: Threshold detection for the generalized pareto distribution: Review
of representative methods and applications to the NOAA NCDC daily rainfall database, Water Resour. Res., 52(4), 2659–
2681, doi:10.1002/2015WR018502, 2016.
Fawcett, L., and Walshaw, D.: Estimating return levels from serially dependent extremes, Environmetrics, 23, 272–283,
doi:10.1002/env.2133, 2012.
Lee, S.: A maximum entropy type test of fit: Composite hypothesis test, Comput. Stat. Data Anal., 57(1), 59–67,
doi:10.1016/j.csda.2011.03.012, 2013.
Li, F., and Zheng, Q.: Probabilistic modelling of flood events using the entropy copula, Adv. Water Resour., 97, 233–240,
doi:10.1016/j.advwatres.2016.09.016, 2016.
Mohssen, M.: Partial duration series in the annual domain, in: Anderssen, R., Braddock, R., Newham, L. (Eds.), Proceedings
of the 18th World IMACS and MODSIM International Congress, Cairns, 13–17 July 2009, International Association for
Mathematics and Computers in Simulation, Cairns, Australia, 2694–2700, 2009.
Madsen, H., Pearson, C. P., and Rosbjerg, D.: Comparison of annual maximum series and partial duration series methods for
modeling extreme hydrologic events: 2. Regional modeling, Water Resour. Res., 33(4), 759–769, doi:10.1029/96WR03849,
1997.
Meng, F., Li, J., and Gao, L.: ERM-POT Method for Quantifying Operational Risk for Chinese Commercial Banks, in: Shi Y.,
van Albada G.D., Dongarra J., Sloot P.M.A. (eds) Computational Science – ICCS 2007, ICCS 2007, Lecture Notes in
Computer Science, vol 4488, Springer, Berlin, Heidelberg, https://doi.org/10.1007/978-3-540-72586-2_68 , 478–481, 2007.
Moramarco, T., and Singh, V. P.: Formulation of the Entropy Parameter Based on Hydraulic and Geometric Characteristics of
River Cross Sections, J. Hydrol. Eng., 15(10), 852–858, https://doi.org/10.1061/(ASCE)HE.1943-5584.0000255, 2010.
Nagy, B. K., Mohssen, M., and Hughey, K. F. D.: Flood frequency analysis for a braided river catchment in New Zealand:
Comparing annual maximum and partial duration series with varying record lengths, J. Hydrol., 547, 365–374,
doi:10.1016/j.jhydrol.2017.02.001, 2017.
Northrop, P. J., Attalides, N., and Jonathan, P.: Cross-validatory extreme value threshold selection and uncertainty with
application to ocean storm severity, J. R. Stat. Soc., 66(1), 93–120, doi:10.1111/rssc.12159, 2017.
Olofintoye, O. O., Sule, B. F., and Salami, A. W.: Best-fit Probability distribution model for peak daily rainfall of selected
Cities in Nigeria, New York Sci. J., 2(3), 1–12, doi:10.2174/138920312803582960, 2009.
Önöz, B., and Bayazit, M.: Effect of the occurrence process of the peaks over threshold on the flood estimates, J. Hydrol.,
244(1–2), 86–96, doi:10.1016/S0022-1694(01)00330-4, 2001.
Pal, N. R., and Pal, S. K.: Entropic thresholding, Signal Processing, 16(2), 97–108, 1989.
Pham, H. X., Shamseldin, A. Y., and Melville, B.: Statistical Properties of Partial Duration Series: Case Study of North Island,
New Zealand, J. Hydrol. Eng., 19(4), 807–815, doi:10.1061/(ASCE)HE.1943-5584.0000841, 2014.
Rajsekhar, D., Singh, V. P., and Mishra, A. K.: Multivariate drought index : An information theory based approach for
integrated drought assessment, J. Hydrol., 526, 164–182, doi:10.1016/j.jhydrol.2014.11.031, 2015.
Rao, A. R., and Hamed, K. H.: Flood Frequency Analysis, CRC Press., 2000.
Rosbjerg, D., and Madsen, H.: On the choice of threshold level in partial duration series, in: Proc Nordic Hydrological
Conference, Alta, NHP Rep No. 30, 604–615, 1992.
Rosbjerg, D.: Estimation in partial duration series with independent and dependent peak values, J. Hydrol., 76(1–2), 183–195,
doi:10.1016/0022-1694(85)90098-8, 1985.
Sankarasubramanian, A., and Srinivasan, K.: Investigation and comparison of sampling properties of L-moments and
conventional moments, J. Hydrol., 218(1–2), 13–34, doi:10.1016/S0022-1694(99)00018-9, 1999.
Scarrott, C., and Macdonald, A.: A review of extreme value threshold estimation and uncertainty quantification, REVSTAT-
Stat. J., 10(1), 33–60, 2012.
Shannon, C. E.: A Mathematical Theory of Communication, Bell Syst. Tech. J., 27(3), 379–423, doi:10.1002/j.1538-
7305.1948.tb01338.x, 1948.
Sinclair, C. D., Spurr, B. D., and Ahmad, M. I.: Modified anderson darling test, Commun. Stat. - Theory Methods, 19(10),
3677–3686, doi:10.1080/03610929008830405, 1990.
Singh, V. P.: The use of entropy in hydrology and water resources, Hydrol. Process., 11, 587–626, 1997.
Smith, R. L.: Extreme Value Analysis of Environmental Time Series: An Application to Trend Detection in Ground-Level
Ozone, Stat. Sci., 4, 367–377, doi:10.1214/ss/1177012400, 1989.
Solari, S., Eguen, M., Polo, M. J., and Losada, M. A.: Peak Over Threshold (POT): A methodology for automatic threshold
estimation using goodness of fit p-value, Water Resour. Res., 53, 5375–5377, doi:10.1002/2013WR014979, 2017.
Stedinger, J. R., Vogel, R. M., and Foufoula-Georgiou, E.: Frequency analysis of extreme events, in: D. R. Maidment (ed.),
Handbook of Hydrology, McGraw-Hill, New York, 1992.
Swetapadma, S., and Ojha, C. S. P.: Selection of a basin-scale model for flood frequency analysis in Mahanadi river basin,
India, Nat. Hazards, 102, 519–522, https://doi.org/10.1007/s11069-020-03936-7, 2020.
Pun, D.: Entropic Thresholding, A new approach, Comput. Gr. and Image Processing, 16, 210-239,1981.
U. S. Water Resources Council: Guidelines for determining flood frequency analysis, Bulletin 17B, Hydrologica
Communications, Washington, DC, 1982.
Vogel, R. M., Wilbert, O., Thomas Jr, and McMahon, T. A.: Flood-flow frequency model selection in Southwestern United
States, J. Water Resour. Plan. Manag., 119(3), 353–366, https://doi.org/10.1061/(ASCE)0733-9496(1993)119:3(353), 1993.
Xiong, F., Guo, S., Chen, L., Yin, J., and Liu, P.: Flood Frequency Analysis Using Halphen Distribution and Maximum
Entropy, J. Hydrol. Eng., 23(5), 04018012, doi:10.1061/(asce)he.1943-5584.0001637, 2018.
Yao, J., Dash, M., Tan, S. T., and Liu, H.: Entropy-based fuzzy clustering and fuzzy modeling, Fuzzy Sets Syst., 113(3), 381–
388, doi:10.1016/S0165-0114(98)00038-4, 2000.
Yu, F. X., Naghavi, B., Singh, V. P., and Wang, G.: MMO: An improved estimator for log-Pearson type-3 distribution, Stoch.
Hydrol. Hydraul., 8, 219–231, 1994.
Zhang, H., Chen, L., and Singh, V. P.: Flood frequency analysis using generalized distributions and entropy-based model
selection method, J. Hydrol., 595, 125610, doi:10.1016/j.jhydrol.2020.125610, 2020.