1-s2.0-S2214714419316678-main
ARTICLE INFO
Keywords: Principal component analysis; Water treatment plant; Data-driven algorithms; Cross-validation; Kano-Tamburawa; Extreme learning machine
ABSTRACT
Providing a robust and reliable model is essential for hydro-environmental and public health engineering perspectives, including water treatment plants (WTPs). The current research develops an emerging evolutionary data-intelligence model, an extreme learning machine (ELM) integrated with kernel principal component analysis (KPCA), to predict the performance of the Tamburawa WTP in Kano, Nigeria. A traditional feed-forward neural network (FFNN) and a classical linear autoregressive (AR) model were also employed to compare the predictive performance. For this purpose, different input data were used, with the corresponding treated pH, turbidity, total dissolved solids, and hardness obtained from the WTP as the target variables. The predictive models are evaluated based on three numerical indices, namely Nash-Sutcliffe (NC), root mean squared error (RMSE) and mean absolute percentage error (MAPE). To examine the similarities and differences between the observed and predicted values, a two-dimensional graphical diagram (i.e., Taylor diagram) was also utilized. The predictive results revealed the potential of KPCA-ELM, which exhibited a high level of accuracy in comparison to the single models for all the considered variables, with a slight exception in terms of pH prediction. Two different model combinations were built for each single model (FFNN, ELM, and AR) and its KPCA counterpart (KPCA-FFNN, KPCA-ELM, and KPCA-AR). The results also showed that both the ELM and FFNN models demonstrated prediction skill and can therefore serve as reliable models. The outcomes may contribute to the modeling of the treated parameters and provide a reference benchmark for wastewater management and control in the Tamburawa WTP.
1. Introduction
The rapid increase in population, urbanization, and industrial and agricultural water demands threatens water treatment plants (WTPs) through capacity overload. According to the World Health Organization (WHO) and the United Nations International Children's Emergency Fund (UNICEF), water is one of the most indispensable factors needed to sustain life, and an affordable and adequate supply of water must be available [1,2]. WTPs are operated to remove bacteria, solids, micro-organisms, and other contaminants from untreated water through several processes and technologies [3]. The United Nations Educational, Scientific and Cultural Organization (UNESCO) reported that WTPs are essential components for attaining sustainable development and are crucial for public and environmental health [3,4]. Therefore, a satisfactory WTP is paramount to overcome the problems of water scarcity and to meet the domestic water standards required by law [6]. The physicochemical characteristics are often the major factors affecting the performance, operation, and control of WTPs [4,5]. The failure to keep the physicochemical parameters within the prescribed standard limit in any WTP may cause significant environmental and public
⁎ Corresponding author.
E-mail address: bachquangvu@duytan.edu.vn (Q.-V. Bach).
1 These authors contributed equally to this work.
https://doi.org/10.1016/j.jwpe.2019.101081
Received 5 September 2019; Received in revised form 23 October 2019; Accepted 22 November 2019
Available online 20 December 2019
2214-7144/ © 2019 Elsevier Ltd. All rights reserved.
S.I. Abba, et al. Journal of Water Process Engineering 33 (2020) 101081
health problems. On the other hand, appropriate and adequate control may be achieved by introducing a robust tool for modeling the performance of WTPs [5–8]. The WTP process is complicated due to the dilute mixture of several compositions, quality, and characteristics of the system, which results in difficulties in modeling WTP parameters [9–11]. Also, the biochemical and physical nature of WTPs exhibits non-linear phenomena which are too complicated to simulate by simple deterministic principles or mathematical models [12].
In the last decades, different linear models have been widely presented for managing the overall performance of WTPs, but most of them have limitations and are incapable of meeting the standard of non-linear modeling systems [13]. On the other hand, non-linear artificial intelligence (AI) models such as the artificial neural network (ANN), adaptive neuro-fuzzy inference system (ANFIS), and support vector machine (SVM) have shown merit in modeling the non-linear systems of WTPs. The models applied in hydro-environmental studies can be grouped into two categories, namely physical-based and data-driven models. Physical models are based on the concept of distributed (white-box) models that address the physical process and interaction for simulating the hydro-environmental system. In contrast, data-driven models are based on lumped (black-box) models that acquire the optimal correlation between inputs and outputs but neglect the physical process [14]. Besides, several efforts have been made to improve the accuracy and reliability of modeling the influent-effluent parameters in the field of hydro-environmental studies; however, no particular method has proven to be universally applicable in modeling the process [11,15,13].
With this perspective, it could be stated that there is no single accepted model that performs better than the others across different hydro-environmental systems, due to the dynamic and complex nature of the data. This has necessitated the development of more reliable and efficient models using the available data [14–19]. For instance, Al-baidhani and Alameedee [20] developed an ANN model to predict the effluent pH and turbidity using various measured input parameters such as pH, temperature, and dose turbidity. The results demonstrated the suitability of ANN in modeling the WTP parameters. Wu and Lo [21] used ANN and ANFIS models to compute the real-time coagulant dosage in a WTP using the measurements of turbidity, pH and colour. The outcomes demonstrated that the ANFIS model was capable of accurately predicting the coagulant dosage in comparison with the ANN model. Gaya et al. [12] described the application of ANN and Hammerstein-Wiener (H-W) models for forecasting the influent turbidity in WTPs using different input parameters. The simulated results indicated that ANN could outperform the H-W model and may serve as an acceptable tool for modeling the turbidity of a WTP.
Similarly, other researchers [22–24] were able to employ an ANN for the prediction of optimum coagulants in WTPs. Yaseen et al. [25] performed another study on the ELM application to forecast the daily time-scale (in a tropical environment) of the Johor River located in Malaysia. The research findings provided evidence showing the capacity of the ELM model in the region. Nadiri et al. [26] studied the treatability of the Tabriz wastewater treatment plant (WWTP) using supervised committee fuzzy logic (SCFL) and committee fuzzy logic (CFL) approaches. Different measured influent water quality (WQ) parameters were used for the prediction of BOD, COD and TSS. The predicted results indicated the advantage of the SCFL approach over FL and CFL. Manu and Thalla [27] employed SVM and ANFIS models for the simulation of Kjeldahl nitrogen in a domestic WWTP located at Mangalore, India. The historical data obtained during the period from June 2014 to September 2014, including the influent pH, TSS, BOD, and Kjeldahl nitrogen, were used to attain the target objectives. The outcomes demonstrated the potential of the SVM model in modeling the biological processes in a WWTP. Likewise, the performance of ANN in modeling chemical oxygen demand in a WWTP was reported in [28] and [9].
On the other hand, several data processing and input variable selection methods have been applied in different prediction models in order to improve the prediction accuracy, including sensitivity analysis [29], principal component analysis (PCA) [24,25], linear discriminant analysis [32] and kernel principal component analysis (KPCA) [22]. It was reported that PCA and KPCA are commonly employed for dimension reduction, classification and feature extraction in multidimensional data sets [22,27,30]. However, in contrast to PCA, KPCA is capable of handling and capturing non-linear interactions within the process owing to its kernel function. According to the aforementioned literature, it is evident that several studies using AI models have been conducted and have shown promising performance, each model for a specific case. Although there is no exceptional model that exhibits superiority over the others, applying the knowledge of a kernel-based input variable selection approach could lead to more promising outcomes. Due to the problems of overfitting, local minima, and slow learning speed in some AI models such as ANN, a novel algorithm known as the extreme learning machine (ELM) was proposed by Huang et al. [34] to overcome the disadvantages of the traditional feed-forward backpropagation.
To the best of the authors’ knowledge, this is the first study in which the Tamburawa WTP has been modeled using ELM, FFNN, and KPCA. The current study is aimed at the following: (i) to explore the potential of ELM with the kernel PCA for modeling the performance of the Tamburawa WTP in terms of different physicochemical parameters such as pH, turbidity, and hardness; (ii) to develop and compare the ELM with the traditional feed-forward neural network (FFNN) and the classical autoregressive (AR) model using the same input combinations.

2. Methods and modeling development

2.1. Extreme learning machine (ELM)
The ELM was recently developed as a new learning approach whose primary advantage is its ability to map the internal features without the need to iteratively tune the parameters of the hidden neurons, as required in a traditional ANN model [34]. The input and hidden neuron weightings are assigned randomly in the ELM from several pre-assigned neurons without having to pass through all the neurons in the model [35]. Also, the generalization capability of the ELM is acceptable, and it requires less computation time [36–38]. As a newly emerging black-box data-driven algorithm, the ELM was first proposed by [34] and is comprised of single hidden layer feedforward networks (SLFNs). The ELM is quite different from the traditional FFNN because it can overcome the problems of slow learning speed, local minima, and overfitting [33,31,34]. It is notable that the potential of the ELM could be attributed to its generalization ability and fast learning speed [39]. Due to its promising performance, ELM has been applied in various fields of hydro-environmental studies [40]. The structure of the ELM network used in this study is presented in Fig. 1.
In this study, an ELM model was developed using calibration and validation data sets, as mentioned above. For a collection of N training samples (i.e., t = 1, 2, …, N) in which $x_t \in \mathbb{R}^d$ and $y_t \in \mathbb{R}$, an SLFN with H hidden nodes is mathematically expressed as [34]:
$$\sum_{i=1}^{H} B_i \, g_i(\alpha_i \cdot x_t + \beta_i) = z_t, \tag{1}$$
where $B \in \mathbb{R}^H$, $Z$ ($z_t \in \mathbb{R}$) and $G(\alpha, \beta, x)$ represent the predicted weights in the output layer, the model output and the activation function of the hidden layer, respectively, while $\alpha_i$, $\beta_i$, $i$ and $d$ indicate the weights of the randomized layers, the biases of these randomized layers, the index of the specific node in the hidden layer and the number of inputs, respectively.
The sigmoid activation function was found to perform best and is therefore employed in this study:
$$G(x) = \frac{1}{1 + \exp(-x)} \tag{2}$$
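As an illustration of Eqs. (1) and (2), the following is a minimal NumPy sketch of an ELM with a sigmoid hidden layer; it is not the authors' implementation. The function names (`elm_fit`, `elm_predict`), the uniform range used for the random weights, and the seed are assumptions made for this example. The output weights are solved by least squares with the Moore-Penrose pseudoinverse, which is the standard ELM training step [34].

```python
import numpy as np


def sigmoid(x):
    """Sigmoid activation, Eq. (2)."""
    return 1.0 / (1.0 + np.exp(-x))


def elm_fit(X, y, n_hidden, seed=0):
    """Fit an ELM: random hidden layer, analytic output weights.

    X has shape (N, d), y has shape (N,). The sampling range [-1, 1]
    for alpha and beta is an assumption, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    alpha = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # input weights
    beta = rng.uniform(-1.0, 1.0, size=n_hidden)                 # hidden biases
    G = sigmoid(X @ alpha + beta)      # hidden-layer output matrix, shape (N, H)
    B = np.linalg.pinv(G) @ y          # least-squares output weights
    return alpha, beta, B


def elm_predict(X, alpha, beta, B):
    """Model output z_t of Eq. (1) for every row of X."""
    return sigmoid(X @ alpha + beta) @ B
```

Because the hidden weights are never updated, training reduces to a single linear solve; only the number of hidden nodes H has to be chosen.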
In an ELM model, a proper number of hidden neurons, randomized input layer weights (α), and randomized hidden layer biases (β) can lead to a zero training error, so that the weights of the output layer can be obtained analytically for any training set [34]:
$$\sum_{t=1}^{N} \lVert z_t - y_t \rVert = 0, \tag{3}$$
A system of linear equations can be used to obtain the value of B for any input-output training samples:
$$Y = GB \tag{4}$$
in which
$$G(\alpha, \beta, x) = \begin{bmatrix} g(x_1) \\ \vdots \\ g(x_N) \end{bmatrix} = \begin{bmatrix} g_1(\alpha_1 \cdot x_1 + \beta_1) & \cdots & g_H(\alpha_H \cdot x_1 + \beta_H) \\ \vdots & \cdots & \vdots \\ g_1(\alpha_1 \cdot x_N + \beta_1) & \cdots & g_H(\alpha_H \cdot x_N + \beta_H) \end{bmatrix}_{N \times H} \tag{5}$$
and
$$B = \begin{bmatrix} B_1^T \\ \vdots \\ B_H^T \end{bmatrix}_{H \times 1} \tag{6}$$
and
$$Y = \begin{bmatrix} y_1^T \\ \vdots \\ y_N^T \end{bmatrix}_{N \times 1} \tag{7}$$
where G is known as the hidden layer output, and T is the transpose of the matrix. The output weights $\hat{B}$ can be estimated by inverting the matrix of the hidden layer using the Moore-Penrose generalized inverse function (+):
$$\hat{B} = G^{+} Y \tag{8}$$
Subsequently, the estimated values $\hat{y}$ can be determined by:
$$\hat{y} = \sum_{i=1}^{H} \hat{B}_i \, g_i(\alpha_i \cdot x_t + \beta_i) \tag{9}$$

2.2. Feedforward neural network (FFNN)
FFNN with backpropagation (FFNN-BP) is one of the most widely used ANN algorithms. It is a mathematical model aimed at handling a non-linear relation between input-output sets of data. Historically, ANNs are information-processing tools that were derived from, and work like, the biological nervous system of the brain, with a fundamental component known as a neuron (node) [41]. ANN has been shown to be a useful tool for solving complex functions in different fields of water and environmental engineering [15,4].
The FFNN-BP algorithm involves training the network with input data that are processed through the system and then passed to the output layer; the resulting error is propagated back through the system until the desired target is achieved. The fundamental principle of FFNN-BP is to minimize the error in order to learn from the training data and subsequently estimate the actual target [42]. The BPNN is composed of three layers: input, hidden, and output layers, as seen in Fig. 2. The number of neurons in the hidden layer affects the generalization ability and capacity of the neural network: too many neurons increase the computational burden, whereas too few cannot produce the desired prediction accuracy [29,43]. The linear activation function is used in the output layers, while the sigmoid activation is applied to the hidden and input layers. The activation function is a mathematical function which is introduced into each neuron for the conversion of a linear function into a non-linear function.

2.3. Kernel principal component analysis (KPCA)
PCA is one of the common multivariate statistical techniques used for reducing the dimension of high-volume data. The dimensionality reduction is normally achieved by identifying the linear correlation between the variables [31]. However, as mentioned above, standard PCA allows only linear dimensionality reduction, while KPCA has been demonstrated to be a more powerful algorithm for mapping a non-linear process in the data set. The major importance of the kernel algorithm is the ability to operate without any non-linear optimization, which is contrary to other non-linear methods [37,38]. By applying this method, input variables are changed and used as independent PC variables [30]. Kaiser–Meyer–Olkin (KMO) is among the most commonly used statistics employed to assess the suitability of data in any factor analysis (FA) [13]. The classification of the KMO coefficient is as follows: Excellent ≥ 0.9, Very well = 0.8-0.89, Well = 0.7-0.79, Mediocre = 0.6-0.69, Poor = 0.5-0.59 and Unacceptable < 0.5. The KMO index is given in Eq. (10). More explanation of the PCA can be obtained in other studies [13,45,46,44,14]. In this paper, a brief description of constructing KPCA for dimensional reduction is provided.
$$\mathrm{KMO} = \frac{\sum_{i} \sum_{j \neq i} r_{ij}^{2}}{\sum_{i} \sum_{j \neq i} r_{ij}^{2} + \sum_{i} \sum_{j \neq i} a_{ij}^{2}} \tag{10}$$
where $r_{ij}$ is the correlation coefficient between variables i and j, and $a_{ij}$ is the partial correlation coefficient between them.
Assuming that a non-linear transformation $\phi(x)$ from the original sample covariance matrix C in F space should fit the formula (Eq. 12),
the projected new features have zero mean:
$$\frac{1}{N} \sum_{i=1}^{N} \phi(X_i) = 0 \tag{11}$$
$$C = \frac{1}{N} \sum_{i=1}^{N} \phi(X_i)\, \phi(X_i)^{T} \tag{12}$$
If the kernel function is defined as:
$$k(X_i, X_j) = \phi(X_i)^{T} \phi(X_j) \tag{13}$$
the matrix notation can be employed:
$$K^{2} a_k = \lambda_k N K a_k, \tag{14}$$
where
$$K_{ij} = k(X_i, X_j), \tag{15}$$
and $a_k$ is the N-dimensional column vector of $a_{ki}$:
$$a_k = [a_{k1}, a_{k2}, \ldots, a_{kN}]^{T} \tag{16}$$
$a_k$ can be solved by
$$K a_k = \lambda_k N a_k, \tag{17}$$
and the resulting kernel principal components can be calculated using
$$y_k(X) = \phi(X)^{T} v_k = \sum_{i=1}^{N} a_{ki}\, k(X, X_i) \tag{18}$$
If the projected dataset $\{\phi(x_i)\}$ does not have zero mean, the Gram matrix $\tilde{K}$ can be used to substitute the kernel matrix K. The Gram matrix is given by:
$$\tilde{K} = K - 1_N K - K 1_N + 1_N K 1_N \tag{19}$$
where $1_N$ is the N × N matrix with all elements equal to 1/N.
The power of the kernel methods is that it is not necessary to compute $\phi(x_i)$ explicitly; the kernel matrix can be directly constructed from the training data set $\{x_i\}$ [47]. The standard steps of kernel PCA dimensionality reduction can be summarized as (i) construct the kernel matrix K from the training data set $\{x_i\}$ using Eq. (15); (ii) compute the Gram matrix $\tilde{K}$ using Eq. (19); (iii) use Eq. (14) to solve for the vectors $a_i$ (substitute K with $\tilde{K}$); (iv) compute the kernel principal components $y_k(x)$ using Eq. (18). Two commonly used kernels are the polynomial kernel and the Gaussian kernel. The current work employs the Gaussian kernel function:
$$k(X, Y) = \exp\left(-\frac{\lVert X - Y \rVert^{2}}{2\sigma^{2}}\right) \tag{20}$$

2.4. Autoregressive (AR) model
AR is commonly used in time-series simulation because it describes a stochastic process built with a degree of randomness and uncertainty [48]. The AR model forecasts the future value of any variable based on its prior values. In particular, the AR model is the regression of values on previous occurrences. Therefore, the AR model of order p is defined as AR(p) and expressed as:
$$X_t = \beta_1 X_{t-1} + \beta_2 X_{t-2} + \ldots + \beta_p X_{t-p} + \varepsilon_t \tag{21}$$
where $\varepsilon_t$ is white noise with $E(\varepsilon_t) = 0$ and $\mathrm{VAR}(\varepsilon_t) = \sigma_e^{2}$, and the parameters $\beta_1, \beta_2, \ldots, \beta_p$ are the AR coefficients [14].

2.5. Proposed model development
For any data-driven model, determination of proper input variables is of paramount importance. Similarly, in time-series modeling, identifying the appropriate time lags is an essential part of selecting the proper model input combinations. As such, the autocorrelation function (ACF) and partial ACF (PACF) are used. In a time-series, autocorrelation is the correlation between previous and forthcoming data points of the series [36,49]. The proposed development of the current study is illustrated in Fig. 3, for ELM (KPCA-ELM) and FFNN (KPCA-FFNN). As the figure shows, historically recorded data are collected, pre-processed and normalized within the range of 0-1. For this purpose, three different data-driven algorithms (ELM, FFNN, and AR) coupled with the KPCA were employed for modeling the performance of the Tamburawa WTP in Kano, Nigeria. At first, the KPCA algorithm is used to perform the dimension reduction of the variables. Subsequently, the selected KPCA input variables are imposed into the ELM, FFNN and AR models to determine the performance of the WTP (see Eq. 2).

2.5.1. Model validation
Generally, in AI models, the primary purpose is to fit the model to the given data based on the employed indicators with the goal of achieving reliable prediction on the unknown data set. Due to overfitting problems, satisfactory training performance is not always in agreement with the testing performance. Even though the ELM model can handle the overfitting problems of the traditional FFNN, overfitting
can still occur, especially when the data size is small. In the validation process, different types of validation approach can be applied, including k-fold cross-validation, holdout, and leave-one-out. The holdout method is also considered a simpler version of k-fold, where the data are usually divided randomly into two sets known as the training and testing sets [42,43].
The k-fold cross-validation is a mechanism that was adopted in order to avoid further overfitting. The original training set is partitioned into k equal-sized subsets. From these k subsets, one subset is held out and used for validation, while the remaining k-1 subsets are used for training. The cross-validation procedure is then repeated k times (the folds), where each of the k subsets is utilized as the validation data in alternation. The final performance of the training model is the average of the k validation performances. Mostly, the k value is determined by sample availability, usually from 2-10. The major advantage of the k-fold cross-validation mechanism is that in every single round, the validation set and the training sets are independent [50,51]. This brings about an objective performance measure which creates a sound foundation for optimizing the model [52]. Apart from this, implementing cross-validation can improve the efficiency of data usage. Generally, in model configuration, the overall data set is classified into three independent sets: model calibration set, test set, and validation set. Sometimes, sample sizes can be small and this can lead to a lack of or poor sample representation. Through the involvement of cross-validation, the validation set and calibration set are combined together as a whole. Therefore, the overall data can be classified into two sets. With the k-fold random division of training samples, the model is more objective and stable [52]. As stated above, the obtained data is divided into two samples (training = 75 % and testing = 25 %) considering the 4-fold cross-validation. It is noteworthy that other approaches for validating and partitioning the data could be used (see Fig. 4).
Fig. 4. Illustration of k-fold cross-validation.

2.5.2. Evaluation criteria
Different evaluation criteria can be used to determine the comparative accuracy of the predictive models; as such, a multi-criteria indicator for measuring the model’s performance was used in the current study, namely Nash-Sutcliffe (NC) as a goodness-of-fit measure and two statistical errors, root mean squared error (RMSE) and mean absolute percentage error (MAPE).
$$NC = 1 - \frac{\sum_{i=1}^{N} \left[ Y_{obs_i} - Y_{com_i} \right]^{2}}{\sum_{i=1}^{N} \left[ Y_{obs_i} - \bar{Y}_{obs_i} \right]^{2}} \tag{22}$$
$$RMSE = \sqrt{\frac{\sum_{i=1}^{N} \left( Y_{obs_i} - Y_{com_i} \right)^{2}}{N}} \tag{23}$$
$$MAPE = \frac{1}{N} \sum_{i=1}^{N} \left| \frac{Y_{obs_i} - Y_{com_i}}{Y_{obs_i}} \right| \tag{24}$$
where $N$, $Y_{obs_i}$, $\bar{Y}_{obs_i}$ and $Y_{com_i}$ are the data number, observed data, average value of the observed data and computed values, respectively.

3. Case study and data description
The Tamburawa water treatment plant (TWTP) in Kano (Nigeria), like other conventional water treatment plants, has the capacity to produce 150 ML of potable water per day to cover the communities in Kano city and the surroundings (see Fig. 5a). The raw water from the source is pumped via a pump station and then enters a preliminary treatment unit where grits and some of the suspended solids are
Fig. 5. (a) Map of the Tamburawa WTP (b) operational process of the plant (c) concentration of the raw and treated (pH, Turb, TDS, and Hard).
Table 1
Descriptive statistics of the data.
Parameters    X̄    Xmax    Xmin
X̄, Xmax and Xmin indicate the mean, maximum and minimum, respectively.

Table 3
Eigenvalue and percentage of data explained by each factor.
Number    Eigenvalue    Difference    Proportion    Value    Proportion
removed to avoid pump wear and pipe deterioration. Fig. 5b shows the schematic flow chart of the important operational process. The operational process contains rapid mix, coagulation/flocculation, sedimentation, filtration, disinfection and final treated water, which can be distributed to different users, such as domestic, commercial and institutional consumers [12]. The historical recorded data from TWTP contained raw and treated turbidity (Turbr and Turbt) (NTU), total dissolved solids (TDSr and TDSt) (mg/L), suspended solids (SSr and SSt) (mg/L), pH (pHr and pHt), hardness (Hardr and Hardt) (mg/L), conductivity (Condr and Condt) (mS/cm), chloride content (Clr and Clt) (mg/L) and iron content (Fer and Fet) (mg/L). Table 1 shows the descriptive statistical analysis used for studying the data. The concentration of the raw and treated pH, Turb, TDS and Hard (mg/L) at the exit before the discharge to the receiving body is shown in Fig. 5c.
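The evaluation criteria of Section 2.5.2, Eqs. (22)–(24), can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the authors' code: the function names are hypothetical, and MAPE is returned as a fraction exactly as Eq. (24) is written (multiplying by 100 to report a percentage is an assumption about how the tabulated values were scaled).

```python
import numpy as np


def nash_sutcliffe(y_obs, y_com):
    """Nash-Sutcliffe coefficient NC, Eq. (22)."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_com = np.asarray(y_com, dtype=float)
    residual = np.sum((y_obs - y_com) ** 2)
    spread = np.sum((y_obs - y_obs.mean()) ** 2)
    return 1.0 - residual / spread


def rmse(y_obs, y_com):
    """Root mean squared error, Eq. (23)."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_com = np.asarray(y_com, dtype=float)
    return np.sqrt(np.mean((y_obs - y_com) ** 2))


def mape(y_obs, y_com):
    """Mean absolute percentage error as a fraction, Eq. (24).

    Observed values must be non-zero.
    """
    y_obs = np.asarray(y_obs, dtype=float)
    y_com = np.asarray(y_com, dtype=float)
    return np.mean(np.abs((y_obs - y_com) / y_obs))
```

An NC of 1 corresponds to a perfect fit, while an NC of 0 means the model is no better than predicting the mean of the observations.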
Table 4
Performance results for pHt, Turbt, TDSt and Hardt.
Parameter Model Type Training Testing
Fig. 8. Scatter plots for the best model in testing phase (a) pHt (b) Turbt (c) TDSt (d) Hardt.
treatment processes, the addition of chlorine (chlorination process after the addition of lime to correct the pH) at the distribution points brings the treated water to a slightly acidic pH (6.0–6.9) to kill all the microorganisms in the pipes and pumps along the distribution line to the points of use, and this process reduces the acidity, making the water neutral (pH = 7.0–7.5). It is important to note that the low pH could also be attributed to the large quantity of chemicals used as fertilizer in the form of NPK (nitrogen: phosphorus: potassium) during irrigation activities. These issues can be reduced using organic matter, which has a relatively neutral pH. Despite the probable changes of pH values as a boundary condition, it should be noted that the proposed methods handle the initial condition of the data, and the validation process was carried out as described in Section 2.5.1. To further increase the quantitative accuracy of pH prediction in the TWTP, an ensemble technique can be employed. The ensemble technique is an approach employed to combine the outputs of multiple predictors in order to enhance the final performance. This technique has been used with promising success in several fields, including hydro-environmental engineering, data mining and statistics, as an approach to improve the prediction skill. The main goal of this technique is to improve the performance of a single model by combining the results of various individual models. As a result, the ensemble approach can increase the prediction performance with regard to a single model [57–59]. Hence, this approach is also expected to improve the accuracy of pH prediction for the TWTP. However, it is worth noting that the ensemble approach normally requires a lot of computational time, which suggests the use of kernel optimization functions.
Turbidity mostly provides cover and food for pathogens and, if not effectively removed, can cause an outbreak of waterborne diseases [12]. Table 4 presents the performance results of modeling the treated turbidity for all three models. A direct comparison between the models indicates that almost all the non-linear model combinations attained appreciable performances with respect to NC, RMSE, and MAPE. It can be observed that the integration of KPCA with the four-input combination yielded the best performance outcomes in modeling Turbt. Furthermore, the results revealed that for predicting the performance of Turbt in the Tamburawa WTP, KPCA-ELM, with NC (0.9920), RMSE (0.0000) and MAPE (0.0250) values in the testing phase, proved its merit over KPCA-FFNN and KPCA-AR and therefore emerged as a reliable model. The KPCA-FFNN model can also serve the prediction purpose efficiently despite being outperformed by the KPCA-ELM models. Similarly, it is interesting to note that there is a small increase in the prediction performance of KPCA-ELM with regard to the KPCA-FFNN model and an approximate 40 % increase with regard to the KPCA-AR model. Figs. 7 and 8b demonstrate the time-series and scatter plots for the best model in the testing phase. The overall prediction results show that, with the dimensionally reduced number of input variables, the performance accuracy increased in the case of Turbt.
On the other hand, TDS, which is comprised of organic salt, is considered to be one of the major organic substances that contribute to the deterioration of water quality. The obtained results for this variable are presented in Table 4. The M2-KPCA-ELM model outperformed all models with reasonable accuracy in both the training and testing steps. According to the results, it can be observed that M2-KPCA with the 4-input combination, with NC (0.9680), RMSE (0.0010) and MAPE (3.3320) values in the testing phase, exhibited the best performance accuracy and therefore proved to be a reliable model for prediction of TDSt for the Tamburawa WTP. A review of the TDSt results indicates that the KPCA-ELM model increased the prediction accuracy by up to 3 % and 30 % with regard to KPCA-FFNN and KPCA-AR, respectively. Furthermore, Table 4 clearly shows that TDSt had higher MAPE values in both training and testing, for all the models, than the other parameters in the WTP. This indicates a large accumulated percentage error. According to [12], the smaller the MAPE, the more accurate the prediction performance, with a range from 0 to 10 % as the best MAPE. Figs. 7 and 8c demonstrate the time-series and scatter plots for the best model in the testing phase. The overall prediction results show that with a dimensionally reduced number of input variables, the
Fig. 9. Taylor diagram depicting the best performance of (a) pH, (b) Turb, (c) TDS and (d) Hard during the testing phase.
performance accuracy increased in the case of TDSt.
Lastly, the modeling of Hardt in terms of NC, RMSE, and MAPE is presented in Table 4. It can be seen from the table that the KPCA-ELM model outperforms the other two models in terms of all the performance criteria. The results also show that M2 with the 4-input combination served as the best model for assessing the performance of the Tamburawa WTP. M2-KPCA-ELM is superior to all other combinations, with values of NC = 0.9970, RMSE = 0.0000 and MAPE = 0.1740 in the testing phase. An additional comparison of the results indicated that, with regard to percentage variation and accuracy, the M2-KPCA-ELM model improved by 8 % and 32 % in comparison to KPCA-FFNN and KPCA-AR, respectively. The forecasts for each model are represented in Figs. 7 and 8d in the form of a time-series and scatterplot, respectively.
In order to capture the detail of the three predictive data-driven algorithms, a two-dimensional method that exhibits how closely a model or different models match the observed and corresponding computed values, i.e. the Taylor diagram [47,48], is constructed to visualize the information in Fig. 9. The Taylor diagram is also the most widely recommended diagram for accuracy comparison because it combines and quantifies multiple statistical performance metrics by comparing the similarity between the measured and predicted values in one diagram [52,34,16]. In accordance with the visualized graphical interpretation, the KPCA-ELM model was closer to the target measured values for all the variables in comparison with the other two models, with the slight exception of the modeling of pHt. The predictive results are also evidenced by the high value of correlation (R) attained by the KPCA-ELM. Generally, if the standard deviation (SD) of the computed values is higher than the SD of the observed values, overestimation results, and vice versa. It can be clearly observed that the emerging ELM demonstrated promise in the non-linear process, which is not surprising, as ELM has demonstrated excellent performance in modeling and prediction in recent decades in the field of hydro-environmental engineering [25].
It is indeed crucial to state that there is performance uniformity of the models in terms of prediction results; in other words, there is no exceptional model that exhibited superiority over the others. Generally, data-driven models behave differently in accordance with the processes of learning [60]. As such, it is important to validate the current work outcomes against the established technical literature. For instance, some studies [3,12,21,61–63] reported significant performance of data-driven models in WTP analysis using various input variables and performance indicators. To summarize the discussion section, the proposed evolutionary ELM algorithm coupled with KPCA was found to have excellent prediction skills for modeling the performance of the Tamburawa WTP in comparison with the application of single data-driven intelligence models. The key advantage of the ELM model was its promising ability to overcome the disadvantages of the traditional feedforward backpropagation [25]. On the other hand, PCA has been applied
S.I. Abba, et al. Journal of Water Process Engineering 33 (2020) 101081
municipal solid waste generation with combination of support vector machine and principal component analysis: a case study of mashhad, Environ. Prog. Sustain. Energy 28 (2) (2009) 249–258.
[31] M. G. E. D. Z. S. J. R. B, Miodrag Belosevic, Decomposition analysis on influence factors of direct household, Environ. Sci. Technol. 33 (2) (2014) 482–489.
[32] C.H. Park, H. Park, A comparison of generalized linear discriminant analysis algorithms, Pattern Recognit. 41 (3) (2008) 1083–1097.
[33] X. Xin, et al., Insights into the toxicity of triclosan to green microalga Chlorococcum sp. using synchrotron-based fourier transform infrared spectromicroscopy: biophysiological analyses and roles of environmental factors, Environ. Sci. Technol. 52 (4) (2018) 2295–2306.
[34] G.-B. Huang, Q.-Y. Zhu, C.-K. Siew, Extreme learning machine: theory and applications, Neurocomputing 70 (1–3) (2006) 489–501.
[35] S.J. Hadi, S.I. Abba, S.S. Sammen, S.Q. Salih, N. Al-Ansari, Z.M. Yaseen, Non-linear input variable selection approach integrated with non-tuned data intelligence model for streamflow pattern simulation, IEEE Access 7 (2019) 141533–141548.
[36] Z.M. Yaseen, et al., Stream-flow forecasting using extreme learning machines: a case study in a semi-arid region in Iraq, J. Hydrol. 542 (2016) 603–614.
[37] Z.M. Yaseen, S.O. Sulaiman, R.C. Deo, K.W. Chau, An enhanced extreme learning machine model for river flow forecasting: state-of-the-art, practical applications in water resource engineering area and future research direction, J. Hydrol. 569 (August) (2019) 387–408.
[38] S. Zhu, S. Heddam, S. Wu, J. Dai, B. Jia, Extreme learning machine-based prediction of daily water temperature for rivers, Environ. Earth Sci. 78 (6) (2019) p. 0.
[39] V. Nourani, G. Andalib, F. Sadikoglu, Multi-station streamflow forecasting using wavelet denoising and artificial intelligence models, Procedia Comput. Sci. 120 (2017) 617–624.
[40] Z.M. Yaseen, S.O. Sulaiman, R.C. Deo, K.W. Chau, An enhanced extreme learning machine model for river flow forecasting: state-of-the-art, practical applications in water resource engineering area and future research direction, J. Hydrol. 569 (2019) 387–408.
[41] S.I. Abba, G. Elkiran, Effluent prediction of chemical oxygen demand from the astewater treatment plant using artificial neural network application, Procedia Comput. Sci. 120 (2017) 156–163.
[42] M.S. Gaya, N. Abdul Wahab, Y.M. Sam, S.I. Samsudin, Anfis modelling of carbon and nitrogen removal in domestic wastewater treatment plant, J. Teknol. Sciences Eng. 67 (5) (2014) 29–34.
[43] S.I. Abba, S.J. Hadi, J. Abdullahi, River water modelling prediction using multi-linear regression, artificial neural network, and adaptive neuro-fuzzy inference system techniques, Procedia Comput. Sci. 120 (2017) 75–82.
[44] Y. Zhang, Enhanced statistical analysis of nonlinear processes using KPCA, KICA and SVM, Chem. Eng. Sci. 64 (5) (2009) 801–811.
[45] B. Schölkopf, A. Smola, K.R. Müller, Nonlinear component analysis as a kernel eigenvalue problem, Neural Comput. 10 (5) (1998) 1299–1319.
[46] K.P. Singh, A. Malik, D. Mohan, S. Sinha, Multivariate statistical techniques for the evaluation of spatial and temporal variations in water quality of Gomti River (India) - A case study, Water Res. 38 (18) (2004) 3980–3992.
[47] Q. Wang, Kernel Principal Component Analysis and Its Applications in Face Recognition and Active Shape Models, (2012).
[48] J. Adamowski, H. Fung Chan, S.O. Prasher, B. Ozga-Zielinski, A. Sliusarieva, Comparison of multiple linear and nonlinear regression, autoregressive integrated moving average, artificial neural network, and wavelet artificial neural network methods for urban water demand forecasting in Montreal, Canada, Water Resour. Res. 48 (1) (2012) 1–14.
[49] S.I. Abba, et al., Modelling of Uncertain system: a comparison study of linear and non-linear approaches, IEEE (2019) 1–6.
[50] S.J. Aboud, M. Al Fayoumi, M. Alnuaimi, Verification and validation of simulation models, Handb. Res. Discret. Event Simul. Environ. Technol. Appl. (2009) 58–74.
[51] N. Tsioptsias, A. Tako, S. Robinson, Model validation and testing in simulation: a literature review, OpenAccess Ser. Informatics 50 (6) (2016) 6.1–6.11.
[52] T. Zhou, F. Wang, Z. Yang, Comparative analysis of ANN and SVM models combined with wavelet preprocess for groundwater depth prediction, Water (Switzerland) 9 (10) (2017).
[53] E. Olyaie, H. Zare Abyaneh, A. Danandeh Mehr, A comparative analysis among computational intelligence techniques for dissolved oxygen prediction in Delaware River, Geosci. Front. 8 (3) (2017) 517–527.
[54] C. Holland, Principal components analysis, Encycl. Ecol. (2008) 2940–2949 Five-Volume Set, no. July.
[55] H. Ahmad, I. Indabawa, A study of algal species of Kano River, Tamburawa, Kano State, Nigeria, Bayero J. Pure Appl. Sci. 8 (1) (2015) 42.
[56] A. Zakari, V.A. Ikudayisi, S.I. Giwa, Quality assessment of the changes in the physico-chemical parameters in Pipe-Borne water supplied in Kano Metropolis, Nigeria, IOSR J. Appl. Chem. 7 (11) (2014) 74–81.
[57] Y. Khan, S.S. Chai, Ensemble of ANN and ANFIS for water quality prediction and analysis - a data driven approach, J. Telecommun. Electron. Comput. Eng. 9 (2–9) (2017) 117–122.
[58] S.E. Kim, I.W. Seo, Artificial Neural Network ensemble modeling with conjunctive data clustering for water quality prediction in rivers, J. Hydro-Environ. Res. 9 (3) (2015) 325–339.
[59] I. Partalas, G. Tsoumakas, E.V. Hatzikos, I. Vlahavas, Greedy regression ensemble selection: theory and an application to water quality prediction, Inf. Sci. (Ny) 178 (20) (2008) 3867–3879.
[60] O. Kisi, Z.M. Yaseen, The potential of hybrid evolutionary fuzzy intelligence model for suspended sediment concentration prediction, Catena 174 (May) (2019) 11–23.
[61] K.E. Taylor, Summarizing multiple aspects of model performance in a single diagram, J. Geophys. Res. Atmos. 106 (D7) (2001) 7183–7192.
[62] C.M. Kim, M. Parnichkun, Prediction of settled water turbidity and optimal coagulant dosage in drinking water treatment plant using a hybrid model of k-means clustering and adaptive neuro-fuzzy inference system, Appl. Water Sci. 7 (7) (2017) 3885–3902.
[63] A. Maleki, S. Nasseri, M.S. Aminabad, M. Hadi, Comparison of ARIMA and NNAR models for forecasting water treatment plant’s influent characteristics, KSCE J. Civ. Eng. 22 (9) (2018) 3233–3245.
[64] D. Zhang, E.S. Hølland, G. Lindholm, H. Ratnaweera, Hydraulic modeling and deep learning based flow forecasting for optimizing inter catchment wastewater transfer, J. Hydrol. 567 (November) (2018) 792–802.
[65] P. Vijai, P. Bagavathi Sivakumar, Performance comparison of techniques for water demand forecasting, Procedia Comput. Sci. 143 (2018) 258–266.
[66] J. Inoue, Y. Yamagata, Y. Chen, C.M. Poskitt, J. Sun, Anomaly detection for a water treatment system using unsupervised machine learning, IEEE Int. Conf. Data Min. Work. ICDMW 2017 (November) (2017) 1058–1065.