Article
Predicting Effluent Quality in Full-Scale Wastewater Treatment
Plants Using Shallow and Deep Artificial Neural Networks
Raed Jafar 1, *, Adel Awad 2 , Kamel Jafar 3 and Isam Shahrour 4

1 Engineering Faculty, Manara University, Lattakia, Syria


2 Environmental Engineering Department, Tishreen University, Lattakia P.O. Box 1385, Syria
3 Information Technology Department, Syrian Virtual University, Damascus P.O. Box 35329, Syria
4 Laboratory of Civil Engineering and Geo-Environment (LGCgE), University of Science and Technology of
Lille, 59650 Villeneuve-d’Ascq, France
* Correspondence: raed.jafar@manara.edu.sy or raedjafar@yahoo.fr; Tel.: +963-948309911

Abstract: This research focuses on applying artificial neural network (ANN) models with nonlinear transformation to predict the performance of wastewater treatment plant (WWTP) processes. The paper presents a novel machine learning (ML)-based approach for predicting effluent quality in WWTPs by explaining the relationships between the multiple influent and effluent pollution variables of an existing WWTP. We developed AI models such as the feed-forward neural network (FFNN) and random forest (RF) as well as deep learning methods such as the convolutional neural network (CNN), recurrent neural network (RNN), and pre-trained stacked auto-encoder (SAE) in order to avoid various shortcomings of conventional mechanistic models. The developed models focus on providing an adaptive, functional, and alternative methodology for modeling the performance of the WWTP. They are based on pollution data collected over three years, including chemical oxygen demand (COD), biochemical oxygen demand (BOD5), phosphates (PO4−3), and nitrates (NO3−), as well as auxiliary indicators including the temperature (T), degree of acidity or alkalinity (pH), electric conductivity (EC), and total dissolved solids (TDS). The paper presents the results of using SNN- and DNN-based models to predict the effluent concentrations. Our results show that SNN can predict plant performance with a correlation coefficient (R) up to 88%, 90%, 93%, and 96% for the single models COD, BOD5, NO3−, and PO4−3, respectively, and up to 88%, 96%, and 93% for the ensemble models (BOD5 and COD), (PO4−3 and NO3−), and (COD, BOD5, NO3−, PO4−3), respectively. The results also show that the two-hidden-layer models outperform the one-hidden-layer models (SNN). Moreover, increasing the number of input parameters improves the performance of models with one and two hidden layers. We applied DNNs (CNN, RNN, SAE) with three, four, and five hidden layers for WWTP modeling, but due to the small dataset, they gave low performance and accuracy. In sum, this paper shows that SNN (one and two hidden layers) and the random forest (RF) machine learning technique provide effective modeling of the WWTP process and could be used in WWTP management.

Keywords: shallow neural networks; deep neural networks; modeling; statistical analysis; wastewater treatment plant; random forest; quality prediction

1. Introduction
The issue of wastewater disposal has become a major worldwide problem because of its high impact on health and the environment. Treatment plants play a major role in wastewater management; consequently, they should be managed in an optimal way [1,2]. The total system has to be regarded as an entity; therefore, to establish an adequate statistical basis for evaluating performance, years of data must be examined and processed [3]. ANN black-box models can be used to predict WWTP performance using important pollution variables. Certain key parameters in a WWTP can be used to evaluate plant performance; these parameters may include biological oxygen demand (BOD), chemical oxygen demand (COD), and suspended solids (SS) [4].


However, the strategy for monitoring and observing the influent and effluent of the
plant requires an understanding of the plant’s performance and the factors affecting the
water specifications, such as time, season, and people’s lifestyle. Measuring wastewater
pollutants at the inlet and outlet of treatment plants enables managers to predict the quality
of the water rejected into the water resource.
The operational control process of a biological wastewater treatment plant (WWTP)
is complex due to changes in the composition of raw wastewater, different flow rates,
and the complex nature of the treatment process [5]. In addition, the lack of permanent
monitoring of the pollution variables reduces the effective control of the quality of the
wastewater discharge. Conventional bioprocess modeling methods are based on equilib-
rium equations with rate equations for bacterial growth and substrate consumption [6],
and micro-organisms grow by converting environmental nutrients into biomass, primarily
proteins and other macromolecules. This conversion is accomplished through networks
of biochemical reactions that span cellular functions such as metabolism, gene expression,
transport, and signaling. Furthermore, because microbial reactions in conjunction with
environmental interactions are nonlinear, time-variable, and complex, traditional deter-
ministic and empirical modeling have limitations. Predicting plant operational parameters
using traditional experimental techniques is also time consuming and a hindrance to the
efficient control of such processes. Hong et al. [7] analyzed multidimensional process data
and diagnosed the inter-relationship of process variables in a real activated sludge WWTP
using an unsupervised neural network as an efficient tool for discovering complex depen-
dencies between process variables and diagnosing the municipal WWTP’s system behavior.
Since microbial interactions are combined with environmental reactions, these equations are nonlinear, time-dependent, and complex in nature [8]. Côte et al. [9] developed a
two-step procedure to improve the accuracy of a mechanistic model of the activated sludge
process. First, they optimized the numerous model parameters using the downhill simplex
technique to minimize the sum of the squares of the errors between the calculated and
observed values of related parameters. Second, neural network models were successfully
used to predict the optimized mechanistic model’s remaining errors. Hamed et al. [10]
developed two ANN-based models to predict BOD and SS effluent concentrations at a
main WWTP in Cairo. Over a 10-month period, the developed models were trained and
tested on daily sets of BOD and SS measurements. The BOD and SS models provided good
estimates. The ANN models were a reliable prediction tool due to the prediction error,
which varied slightly and smoothly across the range of data sizes used in training and
testing. However, their data were limited, and measuring additional parameters (e.g., pH, temperature) would improve the predictive capability of the neural network. Given all of this, predicting the operational parameters of the treatment unit using traditional experimental methods is time consuming and constitutes an impediment to the effective management of the WWTP process.
Methods based on artificial intelligence technologies have been applied in many
fields, especially environmental issues. The artificial neural network (ANN) method was
employed to model the wastewater treatment plants (WWTP) in order to improve the
prediction of treatment process [11]. Some key variables can be used to evaluate the perfor-
mance of the plant, including biological oxygen demand (BOD), chemical oxygen demand
(COD), phosphates (PO4 −3 ), and nitrates ( NO3 − ). The work in Ref. [12] showed that the
ANN models can provide an effective tool for modeling the complex processes in the treat-
ment units. Several other studies have employed ANN in this domain as well. For example,
Hamoda et al. [5] evaluated the performance of the Al-Ardiya wastewater treatment plant
in Kuwait City using the ANN backpropagations method. The results demonstrated that
neural networks provide a flexible tool for modeling the wastewater treatment plants. Rene
and Saidutta [13] predicted the BOD5 and COD values of wastewater from a petrochemical
factory treatment plant in India using multilayer neural networks. Vyas et al. [14] used
the ANN approach to predict the parameters of cotreatment plants (sewage and industrial
wastewater). Two models were constructed using ANN with a forward-feeding methodology (three layers) and a backpropagation algorithm to predict the BOD5 concentration
in and out of the Govindpura sewage treatment plant in Bhopal. They found that the
plant’s efficiency for removing BOD5 is 80%. Jami et al. [15] used the multi-input ANN to
predict the performance of a wastewater treatment plant. The highest correlation coefficient
between the calculated and measured values was equal to R = 0.6 when predicting the COD
pollution. Pakrou et al. [16] used the artificial neural networks to predict the treatment effi-
ciency and the effect of input parameters on predicting the wastewater treatment plant in
Tabriz. They concluded that the best model was obtained by combining the input variables
of Qinf , TSSeff , and MLSS. It gave R = 0.898 and RMSE = 0.443. Nourani et al. [17] studied
the Nicosia WWTP performance using three variables (CODeff, BOD5eff, and TNeff) and three different artificial intelligence (AI)-adopted nonlinear models, the feed-forward neural network (FFNN), adaptive neuro-fuzzy inference system (ANFIS), and support vector machine (SVM) methods, as well as a traditional multilinear regression (MLR), for WWTP performance forecasting. The results showed that the neural network ensemble
(NNE) model was robust for predicting the WWTP performance.
Wang et al. [18] used machine learning (ML) methods to model wastewater treatment plant processes in order to avoid the weaknesses of traditional mechanistic models. They presented an original ML framework based on RF, DNN, the variable importance measure (VIM), and the partial dependence plot (PDP) to improve effluent quality control in WWTPs. The suggested ML framework shows potential for improving effluent-quality management approaches at Sweden's Umeå WWTP.
It is worth noting that data mining and knowledge discovery through machine learning (ML) have recently found use in environmental cleanup, particularly in the investigation of multifactorial processes such as the elimination of hexavalent chromium [Cr(VI)] from industrial wastewater [19] and the adsorption of antibiotics as emerging pollutants from wastewater [20].
Alsulaili and Refaie [21] investigated the use of the ANN in forecasting the influent
BOD5 and the WWTPs’ performance. The performance of the WWTPs were determined in
relation to the effluent concentrations of COD, BOD5 , and TSS. The best forecasting model
for the inlet BOD5 achieved a value of R = 0.87.
Finally, WWTPs are complex, nonlinear processes with great variations in flow rate, chemical composition, pollution load, and hydraulic conditions. Modeling WWTP processes is challenging due to these complexities and uncertainties [18]. Indeed, deterministic models such as activated sludge models (ASMs) and other mechanistic models have been commonly applied for modeling WWTP processes and for forecasting the behavior of specific parameters [22]. However, because many hypotheses and simplifications are required to make mechanistic models controllable and calculable, they have several restrictions. ASMs, for example, are only applicable within specific ranges of alkalinity, pH, and temperature [18,23].
Many of these limitations are avoided by machine learning (ML) models, since they
are specially focused on capturing relations concerning the input and output data that
facilitate decisions and allow forecasts [24]. Analysis of the above research on the use of
ANN in wastewater treatment plant modeling shows the following limitations: (i) the
use of a low number of variables in the ANN models and (ii) the moderate performance achieved by these models (R = 0.70–0.89).
This work overcomes the aforementioned literature limitations by using a new ML-
adopted framework, which is planned to predict WWTP effluent quality by explaining the
relationships between the multi-influent auxiliary and effluent pollution variables.
We developed AI-based models such as the feed-forward neural network (FFNN) and
random forest (RF) as well as deep learning methods such as the convolutional neural
network (CNN), recurrent neural network (RNN), and pre-train stacked auto-encoder
(SAE), which are relatively new methods for the assessment of WWTP processes, with the goal of avoiding the many shortcomings of traditional mechanistic models. Through
the application of these developed models, we can decrease the number of laboratory
measurements and shorten the time, effort, and cost. It takes five days to measure the
effluent BOD5 in the laboratory, for example, while we can calculate its value by applying
these models at any time and with a very high accuracy, thus reducing the cost of laboratory
materials and the time required to conduct these experiments. All this ultimately serves to
stabilize the ecological balance and reduce pollution. The paper presents, firstly, the models
developed in this research, followed by the methodology and its application to the Khirbet
al-Mu’azah full-scale wastewater treatment plant located in the southeast of Tartous city
in Syria.

2. Theoretical Background
2.1. Artificial Neural Networks Theory
This study is based on creating multiple artificial neural network (ANN) models. These networks are data-processing systems inspired by the way biological nervous systems, such as the human brain, process information. The goal of the neural network is to compute the output from the input values through certain internal calculations [25]. Feed-forward neural networks generally consist of a system of neurons organized in several layers (an input layer, an output layer, and at least one hidden layer), representing the second generation of neural networks, or the shallow neural network (SNN). Each neuron in each layer is connected to each neuron in the next layer with an initial weight that is then modified and adjusted during the training and learning process (Figure A1).
In our study, feed-forward neural networks (FFNNs) of multiple ANN models were developed. Forward feeding means that data entering the network propagate in the forward direction, always from the input layer towards the output layer. This type of network is called an error backpropagation network because the actual output of the network is compared with the target output, and the difference between these values, called the error, is propagated backwards by the network starting from the output layer [26]. To define the mechanism of error backpropagation mathematically, the forward-feeding mechanism must first be clarified, as demonstrated in the equations below [27].
In the first stage, the feed-forward phase, the output Y_i^(k−1) of neuron (i) in layer (k − 1) of the forward-feeding network is connected to the input of neuron (j) in the following layer (k) through a weight factor W_ji^k.
Where:
k: layer index (k = I, II);
i: neuron index of the (k − 1) layer;
j: neuron index of the (k) layer.
To compute the output Y_j^k, neuron j of layer k (k = I, II) performs the following calculation:

Y_j^k = f^k [ ∑_{i=1}^{N} (W_ji^k · Y_i^(k−1)) + b_i ]   (1)

where:
N: the number of neurons in the (k − 1) layer;
f^k: the transfer function.
The bias vector (b_i) acts as the constant term in the polynomial mathematical equations and helps in solving these equations more easily and quickly.
The second stage is the error backpropagation step, in which the mean square error
(MSE) and the error correction factor (δ) are determined in the output unit using the
following equations:
MSE = err = (1 / (2q)) ∑_{i=1}^{q} (y_i − a_2)^2   (2)

δ = ∂err / ∂a_2   (3)

where:
err: squared error at the output unit; y: target output; δ: error correction factor; a_2: calculated output.
The third stage is the updating weights phase, which involves updating the weights
and bias factor as follows:
W_new = W_old + Δw   (4)
b_new = b_old + Δb   (5)
where: ∆w: weight correction factor; ∆b: bias correction factor.
There are two different methods of updating the weights of an artificial neural network (the incremental and batch input methods), assuming that the network inputs are in the form of a mathematical matrix consisting of rows and columns, where each row represents a vector that contains all the variables to be entered into the network [28].
The process of updating weights can be repeated thousands of times in familiar
practical applications, and training usually stops when an acceptable error level is reached
or when the number of iterations (epoch) specified by the trainer is reached.
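To make the three stages concrete, the following minimal NumPy sketch runs one forward pass (Equation (1)), computes the output error and correction factor (Equations (2) and (3)), and applies one weight/bias update (Equations (4) and (5)) for a one-hidden-layer network. The layer sizes, tansig/linear transfer pair, and learning rate are illustrative assumptions, not the exact MATLAB configuration used later in this paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, lr = 7, 10, 0.01

W1, b1 = rng.normal(size=(n_hid, n_in)), np.zeros(n_hid)  # hidden layer (tansig)
W2, b2 = rng.normal(size=(1, n_hid)), np.zeros(1)         # output layer (linear)

x = rng.normal(size=n_in)   # one normalized input vector
y = np.array([0.5])         # its target

# Feed-forward phase, Equation (1): Y_j^k = f^k[sum_i W_ji^k * Y_i^(k-1) + b]
a1 = np.tanh(W1 @ x + b1)
a2 = W2 @ a1 + b2

# Error backpropagation phase, Equations (2)-(3), with q = 1 output unit
err = 0.5 * np.sum((y - a2) ** 2)          # MSE at the output unit
delta2 = a2 - y                            # d err / d a2
delta1 = (W2.T @ delta2) * (1 - a1 ** 2)   # propagated through the tanh derivative

# Weight-updating phase, Equations (4)-(5): W_new = W_old + dW, b_new = b_old + db
W2 -= lr * np.outer(delta2, a1); b2 -= lr * delta2
W1 -= lr * np.outer(delta1, x);  b1 -= lr * delta1
```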

2.2. Deep Neural Networks (DNN)


A deep neural network (DNN) is an artificial neural network (ANN) that contains
numerous layers between the input and output layers, which are regular feed-forward
networks in which data flows from the input to the output layer without returning back [29].
DNN can be thought of as an upgrade of SNN, which shows major improvement over SNN
for its obvious enhancement of prediction accuracy on the unseen or testing dataset [30].
These layers are the input, hidden, and output, each of which is composed of several
neurons; more than three layers (together with input and output layer) qualify as “deep”
learning. Generally, when there is more than one hidden layer, a feed-forward ANN can be
referred to as a deep neural network (DNN). The most essential theories in a FF-DNN are
weights, biases, nonlinear activation, and backpropagation [18].
The DNN constructs a network of simulated neurons and assigns random numbers, "weights", to the interconnections between them. The weights and inputs are then multiplied and return an output between 0 and 1. If the network fails to recognize a specific pattern accurately, an algorithm modifies the weights. The algorithm can increase the influence of specific parameters until it determines the correct mathematical treatment to fully process the data and turn the input into the output, whether the relation is linear or nonlinear.
Subsequently, deep learning has progressively become successful, and certain types of deep neural networks, such as convolutional neural networks (CNN) and recurrent neural networks (RNN), have achieved surprising accomplishments in image and voice recognition and natural language processing (NLP) [31]. Currently, deep learning has become the main stream in machine learning (ML). However, the implementations of DNNs for environmental issues are still restricted. This is mainly due to the problems associated with DNN training and with collecting big datasets in wastewater treatment process science. DNNs for wastewater issues commonly have no more than 100 input variables (including characteristics, physical, chemical, organic, microbiology, processing, and property variables), and fewer parameters need to be defined. Therefore, small DNNs (few hidden layers and a small number of neurons in every layer) could be sufficient for almost all wastewater treatment plant issues.
The potential of using DNNs with small datasets in wastewater applications is obvious: extensive regression problems already solved on small datasets by conventional artificial intelligence (AI) methods such as SNN [32] can be treated by DNN with greater reliability and superior generalization performance.
In this paper, we use pollution parameter prediction as a case study to compare SNN (a traditional shallow NN), random forest (RF), and DNN (CNN, RNN, and the SAE pre-trained stacked auto-encoder DNN), and we show the performance of each method with a small dataset.

3. Materials and Methods
3.1. Plant Description and Data Used
The Khirbet al-Mu'azah wastewater treatment plant is located southeast of Tartous city in Syria. It is placed beside the main Tartous–Safita road, about 17 km from Tartous and 13 km from Safita city. Figure 1 shows the schematic of the WWTP process.

Figure 1. Schematic of the WWTP process.

The Khirbet al-Mu'azah treatment plant is based on the activated sludge treatment with the extended aeration technique. It was planned to serve a group of villages with 10,000 people. The average inflow is 42 m3/h.

This research used measurements of samples taken from the influent and effluent of the Khirbet al-Mu'azah wastewater treatment plant for 2018–2020. These measurements of the selected variables cover all the seasonal variations. Moreover, they include several sets of input and output parameters. The database consists of 198 domestic wastewater samples taken from the inlet and the same number from the outlet over three years; eight pollution variables were measured at the inlet and outlet. This database was utilized to build the models, including the influent chemical oxygen demand CODinf, biochemical oxygen demand BODinf, phosphates PO4inf, nitrates NO3inf, temperature Tinf, degree of acidity or alkalinity pHinf, electric conductivity ECinf, and total dissolved solids TDSinf, which were used as the inputs of the models, and the effluent measurements of the same parameters (CODeff, BODeff, PO4eff, NO3eff, Teff, pHeff, ECeff, and TDSeff), which were used as the targets of the models.

In the WWTP, the COD/BOD ratio is normal (equal to 2), indicating that a considerable portion of the organic matter will readily degrade biologically. The measurements of BOD5 showed that the pollution load originates mainly from the households, with only a minor contribution from the industrial area. In addition, the variations and components of the WWTP load are governed by the quantity of domestic organic waste.

Figure 2a illustrates descriptive statistics for the selected treated parameters as a boxplot. Figure 2b presents the observed influent (inf) and treated effluent (eff). Figure 2c illustrates the concentrations of COD, BOD5, NO3, PO4, T, pH, EC, and TDS at the entrance and outlet of the plant.

3.2. Models Development
The creation of the artificial neural network models depends mainly on the available databases of the studied phenomenon factors. Therefore, the data of these factors (inputs and outputs) that we collected through the research period were statistically examined by the one-way analysis of variance (ANOVA1) method (Figure A2a,b), which is included in the MATLAB software environment.
This ANOVA1 analysis was applied before developing ANN models to reject and
exclude the raw data with anomalous and inaccurate values in the database. The plot
signifies the degree of anomalies, effectiveness, and the range of each treatment plant
variable. After completing the statistical analysis phase, artificial neural network models
were created using Matlab, which provides an important platform for applying the ANN
modeling and simulation process. The software includes a special toolbox that contains
several functions that help manage and analyze historical data.
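For reference, a hedged Python analogue of this screening step is scipy.stats.f_oneway, which, like MATLAB's anova1 applied to a data matrix, runs a one-way ANOVA across the columns; the data below are illustrative stand-ins, not the plant measurements.

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(1)
# Illustrative stand-in: 198 samples of three variables with different means
X = rng.normal(loc=(5.0, 7.2, 30.0), scale=1.0, size=(198, 3))

# One-way ANOVA across the columns, analogous to MATLAB's anova1(X)
f_stat, p_value = f_oneway(X[:, 0], X[:, 1], X[:, 2])
print(f"F = {f_stat:.1f}, p = {p_value:.3g}")
```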
Figure 2. (a) Boxplot of treated influent and effluent parameters; the orange dots represent the data points. (b) BOD5, COD, NO3, PO4, and TDS concentrations of the WWTP at the inlet (influent) and (c) outlet (effluent).

These ANOVA1 graphs summarize each variable through four elements, as follows: the centerline in each box indicates the sample mean, which refers to the central tendency or location; the box represents the variance around this central tendency (the box's edges reflect the 25th and 75th percentiles); and the whiskers around a box represent the variable's range. Any measurement lying beyond the whisker length is marked with a plus sign (+) if its value is more than 1.5 times the interquartile range away from the top or bottom of the box.
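The '+' marking rule is the standard 1.5× interquartile-range (IQR) test; a minimal sketch on an illustrative one-variable series:

```python
import numpy as np

x = np.array([4.8, 5.1, 5.0, 5.3, 4.9, 12.7, 5.2])  # one variable's measurements
q1, q3 = np.percentile(x, [25, 75])                  # box edges (25th/75th percentiles)
iqr = q3 - q1
outliers = x[(x < q1 - 1.5 * iqr) | (x > q3 + 1.5 * iqr)]
print(outliers)  # -> [12.7], the value a boxplot would mark with '+'
```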
Related to the dataset division, we started with 80% of the data in the training set, 10% in the validation set, and 10% in the testing set, and we then adopted a split of 70% of the data in the training set, 15% in the validation set, and 15% in the testing set. The optimum split of the training, validation, and testing sets depends upon factors such as the use case, the model structure, and the data dimension.
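As a sketch, the two partitions can be produced by applying scikit-learn's train_test_split twice; X and y below are placeholders with the dataset's shape (198 samples, seven influent inputs), not the plant data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X, y = np.random.rand(198, 7), np.random.rand(198)  # illustrative shapes only

# 70% train, then split the remaining 30% evenly into validation and test
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.30, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.50, random_state=42)
# For the 80/10/10 variant, use test_size=0.20 in the first split instead
```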
Two artificial neural network main scenarios (S1 and S2) were developed. According
to the regular rules in the neural network process, input variables and target variables
must be normalized before use in the network [33]. Thus, at the primary phase before the
training of the model, input and output data were standardized (i.e., to the range of 0 to 1) as [34,35]:

x_i = (x_u − x_min) / (x_max − x_min)   (6)
where: x_i is the standardized data value, x_u is the observed data, x_min is the minimum, and x_max is the maximum value of the measured dataset. The statistical analysis of the input–output variables is vital in artificial neural modeling because this type of analysis determines the nature and strength of the relationships between inputs and outputs.
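Equation (6) amounts to column-wise min-max scaling, as in this minimal sketch (the small matrix is illustrative; scikit-learn's MinMaxScaler performs the same operation):

```python
import numpy as np

X = np.array([[250.0, 7.1],
              [410.0, 7.9],
              [330.0, 7.4]])  # e.g., COD_inf and pH_inf columns (illustrative values)

# Equation (6): x_i = (x_u - x_min) / (x_max - x_min), applied per column
X_scaled = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
print(X_scaled)  # every column now lies in [0, 1]
```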
For all types of data-based methods (e.g., artificial intelligence methods), if the dispersion (standard deviation) of the data is low (indicating closeness of the data to the mean), less biased model outputs can be expected. The correlation coefficient (R; a widely used measure) was calculated in the descriptive statistical analysis to determine the strength and degree of the linear relationship between two variables, which can serve as an initial indication of a potential linear relationship among a group of parameters. Table 1 presents the results of the Pearson correlation matrix between influent and effluent parameters.
Nevertheless, the weakness of the calculated R coefficients demonstrates that traditional linear methods are not preferred for processing complex nonlinear relations, and there is a significant need for additional, robust nonlinear techniques.
As a result, unlike previous studies that used linear correlation coefficients between
input and output parameters to select the dominant inputs of nonlinear models, this study
examines different combinations of input parameters using the ANN method.
The main shallow artificial neural network scenarios aimed to achieve optimum
performance using the NNFTool box library included in the Matlab environment and
the Levenberg–Marquardt (LM) network-training algorithm. The first ANN scenario (S1)
consists of the input layer, one hidden layer, and the output layer. The Tansig function was
used as a transfer function for the hidden layer, and the linear transfer function was used
as the transfer function for the output layer. As for the second scenario (S2), it includes an
input layer, two hidden layers, and an output layer, where the sigmoid function was used
as a transfer function for the first hidden layer, the Tansig function as a transfer function for
the second hidden layer, and the linear function as a transfer function for the output layer.
The mean square error (MSE) and the (R) correlation coefficient were used to assess the
network effectiveness in the two scenarios.
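For readers outside MATLAB, the following hedged Keras sketch mirrors the layer and transfer-function layout of the two scenarios, using the 55 and 50-30 hidden-neuron counts of the best COD models in Table 2 as examples. Keras has no Levenberg–Marquardt solver, so the Adam optimizer stands in purely for illustration.

```python
from tensorflow import keras
from tensorflow.keras import layers

# S1: one tansig (tanh) hidden layer and a linear output layer
s1 = keras.Sequential([
    layers.Dense(55, activation="tanh", input_shape=(7,)),
    layers.Dense(1, activation="linear"),
])

# S2: sigmoid first hidden layer, tansig second hidden layer, linear output
s2 = keras.Sequential([
    layers.Dense(50, activation="sigmoid", input_shape=(7,)),
    layers.Dense(30, activation="tanh"),
    layers.Dense(1, activation="linear"),
])

for model in (s1, s2):
    model.compile(optimizer="adam", loss="mse")  # MSE, as in the paper's assessment
```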

Table 1. Pearson correlation matrix between influent and effluent parameters.

Parameters COD_inf BOD_inf PO4 _inf NO3 _inf T_inf pH_inf EC_inf TDS_inf COD_eff BOD_eff PO4 _eff NO3 _eff T_eff pH_eff EC_eff TDS_eff
COD_inf 1
BOD_inf 0.901 ** 1
PO4 _inf 0.130 0.069 1
NO3 _inf 0.194 ** 0.169 * 0.346 ** 1
T_inf 0.142 * 0.128 −0.180 * −0.232 ** 1
pH_inf 0.031 −0.005 −0.080 −0.259 ** −0.130 1
EC_inf 0.251 ** 0.270 ** 0.235 ** 0.536 ** −0.030 −0.316 ** 1
TDS_inf 0.248 ** 0.269 ** 0.205 ** 0.527 ** −0.055 −0.260 ** 0.952 ** 1
COD_eff 0.128 0.105 0.167 * 0.287 ** 0.040 −0.070 0.296 ** 0.295 ** 1
BOD_eff 0.016 0.015 0.306 ** 0.030 0.120 0.034 0.010 0.020 0.187 ** 1
PO4 _eff 0.130 0.076 0.888 ** 0.396 ** −0.138 −0.052 0.214 ** 0.180 * 0.087 0.207 ** 1
NO3 _eff 0.196 ** 0.168 * 0.457 ** 0.774 ** −0.231 ** −0.138 0.350 ** 0.344 ** 0.174 * −0.050 0.599 ** 1
T_eff 0.102 0.099 −0.162 * −0.203 ** 0.892 ** −0.178 * −0.059 −0.095 0.072 0.151 * −0.129 −0.220 ** 1
pH_eff 0.053 0.004 0.047 0.002 −0.037 0.635 ** −0.012 0.013 0.114 0.129 0.029 0.008 −0.048 1
EC_eff 0.139 0.217 ** 0.329 ** 0.332 ** 0.141 * −0.284 ** 0.631 ** 0.575 ** 0.123 0.148 * 0.299 ** 0.266 ** 0.148 * 0.008 1
TDS_eff 0.137 0.203 ** 0.314 ** 0.327 ** 0.146 * −0.268 ** 0.616 ** 0.581 ** 0.124 0.161 * 0.287 ** 0.260 ** 0.157 * −0.010 0.949 ** 1
**. Correlation is significant at the 0.01 level (2-tailed). *. Correlation is significant at the 0.05 level (2-tailed).

In addition, we applied random forest (RF), which is one of the traditional machine
learning (ML) techniques. It can perform both regression and classification problems. It is
a supervised type of ML applied in pattern recognition. The random forest (RF) regression
is an ensemble machine learning algorithm that combines multiple decision trees and that
was first developed by Breiman [36]. A regression tree is a set of conditions or restrictions
that are organized hierarchically and applied sequentially from the tree’s root to its leaf.
RF is based on the assumption that different independent predictors predict incorrectly in
different areas, and that by combining the prediction results of the independent predictors,
the overall prediction accuracy can be improved. When the training data vary slightly, the
structures of regression trees in RF show significant differences. Independent predictors
can be created by combining this characteristic with bagging (bootstrap aggregation) and
random feature selection to construct a random decision tree.
The RF starts with a large number of bootstrap samples taken at random from the
original training dataset. Each bootstrap sample is fitted with a regression tree. For
binary segmentation, a small set of input variables chosen at random from the total set
is considered for each node per tree. In this study, the random forest regression model
was imported from the Sklearn package as “sklearn.ensemble.RandomForestRegressor”.
The RF algorithm needs to define the number of random features, trees, and stop criteria.
Averaging over all trees gives the predicted value of an observation. The number of regression trees ('ntree'; default value is 100 trees) and the number of input variables per node ('mtry'; set to 1 here) should be optimized in the RF. When 'mtry' equals one, the split variable is completely random, so all variables get a chance. Given the set of training
input–output pairs, the RF regression model was used to model the relationship between
the WWTP influent auxiliary pollution parameters and effluent parameters. RF modeling
generates training data by sampling and replacing all of the samples for each predictor
in the ensemble [36]. We stopped training when the minimum sample in a tree was one
sample with a minimum impurity of zero.
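A minimal sketch of this setup with scikit-learn, using the settings quoted above ('ntree' = 100, 'mtry' = 1, trees grown until leaves hold one sample with zero impurity); X and y are placeholders for the influent inputs and one effluent target.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

X, y = np.random.rand(198, 7), np.random.rand(198)  # illustrative shapes only

rf = RandomForestRegressor(
    n_estimators=100,           # 'ntree': number of regression trees
    max_features=1,             # 'mtry' = 1: completely random split variable
    min_samples_leaf=1,         # stop when a leaf holds a single sample
    min_impurity_decrease=0.0,  # ... with minimum impurity of zero
    bootstrap=True,             # bagging: sample with replacement per tree
    random_state=0,
)
rf.fit(X, y)
y_pred = rf.predict(X)  # prediction = average over all trees
```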
Convolutional neural network (CNN), which is a subset of machine learning that uses
neural networks with at least three layers, has evolved into one of the most prominent
neural networks in the field of deep learning. CNN is a type of feedforward neural network
that uses convolution structures to extract features from data, unlike traditional feature
extraction methods [37]. CNN needs a convolutional layer but can also include nonlinear,
pooling, and fully connected layers to form a deep convolutional neural network [38]. CNN
can be useful depending on the application. However, it adds new parameters for training.
Convolutional filters are trained in the CNN using the backpropagation method. In the
convolutional layer, multiple filters slide over the layer for the given input data. The output
of this layer is a sum of an element-by-element multiplication of the filters and receptive
field of the input. The weighted summation is added as an element to the following layer.
In this study, related to the COD and BOD5 , their CNN models were imported from the
Keras package as “keras.models import Sequential and keras.layers import Dense, Conv1D,
Flatten”. Sequential is the easiest way to build a model in Keras. It allows building a model
layer by layer. Our first layer was Conv1D layer. This is a convolution layer that deals with
the input variables, which is seen as a 1-dimensional matrix. We used 32 as the number
of nodes in the layer. This number can be adjusted to be higher or lower, depending on
the size of the dataset. In our case, 32 worked well. Kernel size is the size of the filter
matrix for our convolution. Therefore, a kernel size of 2 means we would have a 2 × 2 filter
matrix. We used the ReLU activation function for this and (Dense) layers. This activation
function has been proven to work well in neural networks. Our first layer also took an input shape corresponding to the model's 7 inputs and 1 output. Between the Conv1D layer and the first dense layer,
there was a “Flatten” layer. Flatten serves as a connection between the convolution and
dense layers. “Dense” is the layer type we used for our third and output layer. Dense is
a standard layer type that is used in many cases for neural networks. First dense layer
includes 64 nodes and ReLU activation function, while the output dense layer includes one
output COD or BOD5 according to the studied model. Regarding the model compiling, we
used the “Adam” optimizer to control the learning rate. The Adam optimizer adjusts the
learning rate throughout training. The learning rate defines in what way and how fast the
optimal weights for the model are calculated. A smaller learning rate may drive additional
accurate weights (to a certain extent); however, the time required to compute the weights
will be longer. We used "MSE" for the loss function; this is the usual choice for regression problems, and a lower value indicates that the model is performing better. The most widely used evaluation metrics are R-squared (R2), the proportion of variation in the outcome that is explained by the predictor variables in a regression model, and the mean squared error (MSE), the average error made by the model in predicting the outcome for an observation.
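The description above corresponds to the following minimal Keras sketch; it reproduces the stated layers and compilation choices but is a reconstruction, not the authors' exact script.

```python
from tensorflow import keras
from tensorflow.keras.layers import Conv1D, Dense, Flatten

model = keras.Sequential([
    # Convolution over the 7 inputs, seen as a length-7, single-channel sequence
    Conv1D(32, kernel_size=2, activation="relu", input_shape=(7, 1)),
    Flatten(),                     # bridge between convolution and dense layers
    Dense(64, activation="relu"),  # first dense layer
    Dense(1),                      # output: predicted COD_eff or BOD5_eff
])
model.compile(optimizer="adam", loss="mse")
model.summary()
```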
A recurrent neural network (RNN) is a type of artificial neural network that is most
commonly used in speech recognition and natural language processing (NLP). RNN is also
used in deep learning and the creation of models that simulate the activity of neurons in the
human brain. A recurrent neural network (RNN) is a type of neural network in which the
previous step’s output serves as the input to the current step. While all inputs and outputs
in traditional neural networks are independent of each other, in some prediction cases, such
as predicting the next word of a sentence, knowing the previous words is required, and
thus the previous words must be remembered. This is why RNN was created, which solved
the problem by using a hidden layer. The hidden state, which remembers some information
about the sequence, distinguishes and elevates an RNN [39]. Except for the addition of a hidden state (the memory state of the neurons), a recurrent neural network is similar to a traditional neural network, with a simple memory included in the computation. A
recurrent neural network is a deep learning algorithm that uses a sequential approach. All
inputs and outputs in neural networks are always dependent on all other layers. These
neural networks perform mathematical calculations sequentially, which is why they are
referred to as recurrent neural networks [40].
Similarly (up to a certain point) to the above CNN models, we constructed the RNN model with "keras.models import Sequential and keras.layers import Dense, SimpleRNN", imported from the Keras package. In Keras, the simplest way to build a model is Sequential. The RNN models used one SimpleRNN layer with 40 neurons, an input shape of 7 inputs and 1 output, and ReLU as the activation function, followed by four layers of the Dense type containing 30, 10, 5, and 1 neurons, respectively, with ReLU activation for all; the last dense layer produces the output of the model (COD or BOD5). Concerning the model
compiling, we used the “Adam” optimizer to control the learning rate and “MSE” for
the loss function. Finally, we used the R-squared (R2 ) and MSE for evaluation of the
performance of our models.
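The stated architecture corresponds to the following minimal Keras sketch, again a reconstruction under the assumption that the SimpleRNN layer was used:

```python
from tensorflow import keras
from tensorflow.keras.layers import Dense, SimpleRNN

model = keras.Sequential([
    SimpleRNN(40, activation="relu", input_shape=(7, 1)),  # 40 recurrent units
    Dense(30, activation="relu"),
    Dense(10, activation="relu"),
    Dense(5, activation="relu"),
    Dense(1, activation="relu"),   # output: predicted COD_eff or BOD5_eff
])
model.compile(optimizer="adam", loss="mse")
```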
Pre-training sets up DNN with optimized weights and biases values that are close to
the global optimal solution, allowing the subsequent fine-tuning step to avoid the traps of
the local optimal solution. It is possible to initialize DNN through stacked auto-encoder
(SAE); the auto-encoder is a one-hidden-layer shallow neural network with the same input
and output layers. Each auto-encoder has the same number of neurons as the corresponding
layer of the DNN. The first auto-encoder receives the DNN's input as both its input and its reconstruction target, and the output of its hidden layer provides the input and target of the second auto-encoder; in general, the output of the previous auto-encoder's hidden layer provides the input and target of the next auto-encoder. Each trained auto-encoder gives initial weights and biases
values to the DNN’s corresponding layer [30].
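A minimal sketch of this greedy layer-wise scheme, with an illustrative 7-(4-3)-1 sizing: each auto-encoder reconstructs its own input, and its trained encoder layer seeds the corresponding layer of the fine-tuning DNN.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras.layers import Dense

X = np.random.rand(198, 7)  # illustrative influent inputs
sizes, inputs, encoders = [4, 3], X, []

for n in sizes:  # train one auto-encoder per hidden layer
    ae = keras.Sequential([
        Dense(n, activation="sigmoid", input_shape=(inputs.shape[1],)),  # encoder
        Dense(inputs.shape[1], activation="sigmoid"),                    # decoder
    ])
    ae.compile(optimizer="adam", loss="mse")
    ae.fit(inputs, inputs, epochs=50, verbose=0)  # input reconstructs itself
    encoders.append(ae.layers[0])                 # keep the trained encoder
    inputs = ae.layers[0](inputs).numpy()         # hidden output feeds the next AE

# DNN initialized with the pre-trained encoder weights, ready for fine-tuning
dnn = keras.Sequential(encoders + [Dense(1)])
dnn.compile(optimizer="adam", loss="mse")
```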
Moreover, we used RF, CNN, RNN, and the SAE pre-trained stacked auto-encoder as DNN methods in the Python and MATLAB environments to determine their accuracy and performance for the COD and BOD5 models.

4. Results and Discussion


4.1. Single Model (COD Effluent)
We assessed and compared the forecasting effectiveness of the artificial intelligence methods on the chemical oxygen demand removal performance of the full-scale WWTP after evaluating the data dependency (Figure 3a). The scatter matrix of the dependent
variable (CODeff ) and the other seven variables is depicted in Figure 3a. The Pearson corre-
lation coefficient (R) of CODeff as well as the other seven variables are represented in the
same figure. The scatter plot can easily reveal any obvious patterns or linear relationships
between them. They demonstrate that no noticeable patterns existed in the scatter matrix.
The absolute value of the correlation coefficient (R) is well below 0.5, indicating that there
is no linear relationship between (CODeff ) and the other seven variables; the relationship is
nonlinear and cannot be expressed by a simple function. Several artificial neural networks
have been constructed in the two main scenarios (S1: one hidden layer, and S2: two hidden
layers). Both include one input layer consisting of five subscenarios for the input variables,
considering various input combinations (Table 2): (4 neurons: Tinf , pHinf , TDSinf , ECinf ),
(5 neurons: Tinf , pHinf , TDSinf , ECinf , NO3inf ), (6 neurons: Tinf , pHinf , TDSinf , ECinf , NO3inf ,
PO4inf ), (7 neurons: Tinf , pHinf , TDSinf , ECinf , NO3inf , PO4inf , BOD5inf ), (8 neurons: Tinf ,
pHinf , TDSinf , ECinf , NO3inf , PO4inf , BOD5inf , CODinf ); all the input values were taken at
the WWTP entrance. The output layer was limited to a single neuron, and it represents
the values of the CODeff at the outlet of the plant. As for the hidden layer in the two main
scenarios, it is worth noting that defining the number of hidden neurons, training epochs,
and transfer functions are important elements in planning the FFNN model. Because of its
quick learning and high performance precision, Levenberg–Marquardt was selected and
used as the BP training algorithm in this research. It was determined using the experiment
(trial) method, in which the number of neurons and sample group division were adjusted
to obtain the lowest mean square error for different repetitive epochs.

Table 2. Performance of the CODeff shallow artificial neural network models.

Model No. | Model Input Variables | Output Variable(s) | Training R | Validation R | Testing R | All Data R | Neurons in Hidden Layers | MSE
M1-S1 | Tinf, pHinf, ECinf, TDSinf | CODeff | 0.557 | 0.37 | 0.49 | 0.504 | 67 | 137.42
M1-S2 | Tinf, pHinf, ECinf, TDSinf | CODeff | 0.6637 | 0.317 | 0.6104 | 0.605 | 60-40 | 121.26
M2-S1 | Tinf, pHinf, ECinf, TDSinf, NO3−inf | CODeff | 0.65 | 0.703 | 0.806 | 0.676 | 65 | 39.57
M2-S2 | Tinf, pHinf, ECinf, TDSinf, NO3−inf | CODeff | 0.6859 | 0.6838 | 0.6023 | 0.671 | 55-30 | 67.34
M3-S1 | Tinf, pHinf, ECinf, TDSinf, NO3−inf, PO4−3inf | CODeff | 0.605 | 0.71 | 0.393 | 0.59 | 65 | 169.21
M3-S2 | Tinf, pHinf, ECinf, TDSinf, NO3−inf, PO4−3inf | CODeff | 0.696 | 0.804 | 0.769 | 0.719 | 60-40 | 46.61
M4-S1 | Tinf, pHinf, ECinf, TDSinf, NO3−inf, PO4−3inf, BOD5inf | CODeff | 0.79 | 0.752 | 0.872 | 0.798 | 55 | 48.937
M4-S2 | Tinf, pHinf, ECinf, TDSinf, NO3−inf, PO4−3inf, BOD5inf | CODeff | 0.891 | 0.912 | 0.833 | 0.888 | 50-30 | 8.5135
M5-S1 | Tinf, pHinf, ECinf, TDSinf, NO3−inf, PO4−3inf, BOD5inf, CODinf | CODeff | 0.79 | 0.67 | 0.658 | 0.754 | 60 | 44.963
M5-S2 | Tinf, pHinf, ECinf, TDSinf, NO3−inf, PO4−3inf, BOD5inf, CODinf | CODeff | 0.826 | 0.78 | 0.895 | 0.841 | 50-30 | 40.09

R values are correlation coefficients for the training, validation, testing, and full datasets.

Each subscenario arrangement was trained more than 100 times with different random
seeds before selecting the best SNN.
Matlab was used for all calculations, as well as its statistics, ML, and ANN tool-
boxes. Pearson correlation coefficients of target values and SNN prediction values were
investigated as a training, validation, and testing accuracy metric. Table 2 shows the best
performance of CODeff models selected from several studied architectures. Figure 3b,c
represent the best architecture of the two scenarios (M4-S1 and M4-S2, respectively) through
a trial–error process and the correlation coefficients for all training, validation, and testing
groups, in addition to the set of all the data. The FFNN model that includes seven inputs
and one output neurons was discovered as the best for all simulated parameters, as shown
in Table 2 and illustrated in Figure 3d for the first scenario (M4-S1) and Figure 3e for the
second scenario (M4-S2). Figure 3f shows the results’ predicted values by ANN as the
samples’ sequence plots for CODeff compared to the measured values. It could be observed
that the results of FFNN are acceptable for forecasting the performance of the WWTP
(Table 2).
Figure 3. (a) Scatter matrix of (CODeff) and the other seven input variables; the values of the Pearson correlation coefficients (Rs) are also shown in the figure. (b) The best artificial neural network architecture for Model M4-S1 in the first scenario. (c) The best artificial neural network architecture for Model M4-S2 in the second scenario. (d) The values of correlation coefficient (R) in all stages of Model M4-S1 in the first scenario. (e) The values of correlation coefficient (R) in all stages of Model M4-S2 in the second scenario. (f) Measured vs. predicted samples sequence obtained by the best single COD effluent model M4-S2. (g) Measured vs. predicted samples sequence achieved by the best single COD effluent RF model M4-S2. (h) Measured vs. predicted samples sequence achieved by the best single COD effluent CNN model M4-S2. (i) Measured vs. predicted samples sequence achieved by the best single COD effluent RNN model M4-S2. (j) Measured vs. predicted samples sequence achieved by the best single COD effluent SAE model M4-S2.

The aim of comparing the two-hidden-layer models to the one-hidden-layer models was to integrate a set of models, decrease the disadvantages of every individual model (one hidden layer), and build an enhanced and more reliable model with high accuracy using two hidden layers.

The results of the random forest (RF) for the COD model are presented in Figure 3g, with R equal to 0.933 and MSE equal to 24.3.

We applied the deep neural network (DNN) with the LBFGS solver to the data of the best model within the first and second scenarios (M4-S2). It consists of three hidden layers with the following neuron architecture: 50-30-10. It gave slightly better results, as it achieved an R value equal to 0.929, but the MSE value was 54.71.

Figure 3h shows the results achieved by the CNN model, with R equal to 0.42 and MSE equal to 152.81.

Figure 3i presents the result of the RNN model, with R equal to 0.51 and MSE equal to 62.16.

We also tried the stacked auto-encoder technique with three hidden layers, forming fully connected deep neural networks of the 7-(4-3-2)-1 and 7-(5-4-3)-1 structures; four-hidden-layer deep neural networks of the 7-(5-4-3-3)-1 and 7-(6-5-4-3)-1 structures; and five-hidden-layer deep neural networks of the 7-(6-5-4-3-3)-1 and 7-(5-4-3-3-3)-1 structures. Figure 3j illustrates the best results achieved by the four-hidden-layer DNN with the 7-(6-5-4-3)-1 neuron architecture. We obtained a value of 0.34 for the (R) correlation coefficient and an MSE equal to 0.174.

4.2. Single Model (BOD5 Effluent)


We similarly built several artificial neural networks in the two main scenarios that
include a single input layer consisting of five subscenarios for the input variables: (4, 5,
6, 7, 8 neurons), as shown in Table A1 (Appendix A). The input values were all taken at
the WWTP inlet, while the output layer was limited to a single neuron representing the
BOD5eff value at the plant’s outlet. Table A1 displays the best performance of BOD5eff
models selected from several studied architectures, as well as the correlation coefficients
for all training, validation, and testing groups, in addition to the set of the entire data for
the two scenarios (one and two hidden layers). The best model performance is illustrated
in Figure A3a,b. Measured versus predicted BOD5eff values are described in Figure A3.
We have also applied the deep neural network (DNN) with the (LBFGS) solver to the
data of the best BOD5 effluent model (M9-S2). It consists of three hidden layers with the following neuron architecture: 24-30-10. It gave much lower results, as it achieved an R value equal to 0.294, and the MSE value was 322.
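One plausible reading of this "(LBFGS) solver" setup is scikit-learn's MLPRegressor, sketched below with the 24-30-10 hidden architecture quoted above; this reading is an assumption rather than the authors' confirmed toolchain, and X and y are placeholders.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

X, y = np.random.rand(198, 7), np.random.rand(198)  # illustrative shapes only

dnn = MLPRegressor(hidden_layer_sizes=(24, 30, 10),  # three hidden layers: 24-30-10
                   solver="lbfgs", max_iter=5000, random_state=0)
dnn.fit(X, y)
print(dnn.score(X, y))  # R^2 on the training data
```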
The results of the applied random forest (RF) for the BOD5 model are presented in Figure A3d, with R equal to 0.94 and MSE equal to 6.12.
Figure A3e shows the results achieved by the CNN model, with R equal to 0.53 and
MSE equal to 20.55.
Figure A3f presents the result of the RNN model, with R equal to 0.32 and MSE equal
to 20.26.
We tried the stacked auto-encoder technique with three hidden layers, forming fully connected deep neural networks for BOD5 of the 7-(4-3-2)-1 and 7-(5-4-3)-1 structures; four-hidden-layer deep neural networks of the 7-(5-4-3-3)-1 and 7-(6-5-4-3)-1 structures; and five-hidden-layer deep neural networks of the 7-(6-5-4-3-3)-1 and 7-(5-4-3-3-3)-1 structures. Figure A3g illustrates the best results achieved by the four-hidden-layer DNN with the 7-(6-5-4-3)-1 neuron architecture. We obtained a value of 0.39 for the (R) correlation coefficient and an MSE equal to 0.176.
Based on these results of applying three, four, and five hidden layers in the previous models (COD effluent and BOD5 effluent), we settled on two hidden layers for all subsequent models: the (PO4−3 eff) model, the (NO3eff) model, the (CODeff and BOD5eff) ensemble model, the (PO4−3 eff and NO3eff) ensemble model, and the (CODeff, BOD5eff, PO4−3 eff, NO3eff) ensemble model.

4.3. Single Model (PO4 −3 Effluent)


We constructed several artificial neural networks that include a single input layer
consisting of five subscenarios for the input variables: (4, 5, 6, 7, 8 neurons), as shown in
Table 3. All the input values were taken at the WWTP inlet, while the output layer was
limited to a single neuron representing the value of the PO4 −3 effluent at the outlet of the
plant. Table 3 shows the best performance of the PO4 −3 eff models selected from several
studied architectures, as well as the correlation coefficients for all training, validation,
and test groups, in addition to the set of all data, for the two scenarios (one and two
hidden layers). The best model performance is illustrated in Figure 4a,b. Measured versus
predicted PO4 −3 eff values are presented in Figure 4c.

Table 3. Performance of the PO4−3 eff artificial neural network models.

Model No. | Network Input | Network Output | Training R | Validation R | Testing R | All Data R | Neurons in Hidden Layers | MSE
M11-S1 | Tinf, pHinf, ECinf, TDSinf | PO4−3eff | 0.676 | 0.027 | 0.021 | 0.223 | 67 | 2999.55
M11-S2 | Tinf, pHinf, ECinf, TDSinf | PO4−3eff | 0.681 | 0.948 | 0.930 | 0.81 | 40–60 | 90.89
M12-S1 | Tinf, pHinf, ECinf, TDSinf, NO3−inf | PO4−3eff | 0.968 | 0.136 | 0.0526 | 0.391 | 65 | 2833.67
M12-S2 | Tinf, pHinf, ECinf, TDSinf, NO3−inf | PO4−3eff | 0.572 | 0.670 | 0.625 | 0.58 | 30–55 | 59.22
M13-S1 | Tinf, pHinf, ECinf, TDSinf, NO3−inf, PO4−3inf | PO4−3eff | 0.97 | 0.658 | 0.569 | 0.832 | 65 | 186.127
M13-S2 | Tinf, pHinf, ECinf, TDSinf, NO3−inf, PO4−3inf | PO4−3eff | 0.976 | 0.622 | 0.97 | 0.963 | 40–60 | 10.26
M14-S1 | Tinf, pHinf, ECinf, TDSinf, NO3−inf, PO4−3inf, BOD5inf | PO4−3eff | 0.869 | 0.162 | 0.0836 | 0.65 | 55 | 278.46
M14-S2 | Tinf, pHinf, ECinf, TDSinf, NO3−inf, PO4−3inf, BOD5inf | PO4−3eff | 0.953 | 0.939 | 0.989 | 0.958 | 30–50 | 48.34
M15-S1 | Tinf, pHinf, ECinf, TDSinf, NO3−inf, PO4−3inf, BOD5inf, CODinf | PO4−3eff | 0.904 | 0.789 | 0.23 | 0.76 | 60 | 256.2
M15-S2 | Tinf, pHinf, ECinf, TDSinf, NO3−inf, PO4−3inf, BOD5inf, CODinf | PO4−3eff | 0.933 | 0.959 | 0.937 | 0.936 | 30–50 | 25.53

R values are correlation coefficients for the training, validation, testing, and full datasets.

Figure 4. (a) The values of correlation coefficient (R) in all stages of Model M13-S1 in the first scenario. (b) The values of correlation coefficient (R) in all stages of Model M13-S2 in the second scenario. (c) Observed vs. predicted samples sequence obtained by the best single PO4−3 effluent model M13-S2.
4.4. Single Model (NO3 − Effluent)
We built numerous artificial neural networks that included a single input layer containing five subscenarios for the input variables (4, 5, 6, 7, or 8 neurons), as shown in Table A2.
The input values were all taken at the WWTP inlet, while the output layer was restricted
to a single neuron representing the value of NO3 − effluent at the plant’s outlet. Table A2
shows the best-performance NO3 − eff models chosen from several studied architectures,
as well as the correlation coefficients for all training, validation, and testing groups, in
addition to the set of all the data for the two scenarios (one and two hidden layers). The
best model performance is illustrated in Figure A4a,b. Observed versus predicted NO3 − eff
values are shown in Figure A4c.

4.5. Ensemble Model (COD and BOD5 Effluent)


We established several artificial neural networks that included a single input layer
consisting of five subscenarios: (4 neurons: Tinf , pHinf , TDSinf , ECinf ), (5 neurons: Tinf , pHinf ,
TDSinf , ECinf , NO3inf ), (6 neurons: Tinf , pHinf , TDSinf , ECinf , NO3inf , PO4inf ), (7 neurons: Tinf ,
pHinf , TDSinf , ECinf , NO3inf , PO4inf , BOD5inf ), (8 neurons: Tinf , pHinf , TDSinf , ECinf , NO3inf ,
PO4inf, BOD5inf, CODinf). All the input values were taken at the WWTP inlet, whereas
the output layer contained two neurons representing the values of (CODeff and BOD5eff)
at the outlet of the plant. Table 4 shows the best performance of
the (CODeff and BOD5eff ) models selected from various studied architectures. Figure 5a,b
represents the best performance of the first and second scenario, as well as the correlation
coefficients for all training, validation, and testing groups, in addition to the set of all data.
Observed versus predicted ensemble CODeff and BOD5eff values are presented in Figure 5c.
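Because a single regressor can carry an output layer with two neurons, such an ensemble model needs no extra machinery; the sketch below (scikit-learn assumed, with placeholder arrays) mirrors the eight-input configuration, fitting one network on a two-column target matrix for CODeff and BOD5eff.

```python
# Minimal sketch, assuming scikit-learn; X and Y are hypothetical placeholders.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 8))   # placeholder: 8 influent variables
Y = rng.normal(size=(300, 2))   # placeholder targets: COD_eff and BOD5_eff

# Output layer with two neurons: fit on a 2-column target matrix.
net = MLPRegressor(hidden_layer_sizes=(50, 30), max_iter=5000, random_state=0)
net.fit(X, Y)
pred = net.predict(X)           # shape (n_samples, 2)
for j, name in enumerate(["COD_eff", "BOD5_eff"]):
    r = np.corrcoef(Y[:, j], pred[:, j])[0, 1]
    print(f"R({name}) = {r:.2f}")
```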

Figure 5. (a) The values of correlation coefficient (R) in all stages of Model M25-S1 in the first scenario. (b) The values of correlation coefficient (R) in all stages of Model M25-S2 in the second scenario. (c) Measured vs. predicted samples sequence obtained by best ensemble (COD and BOD5) effluent models M25-S2.

Table 4. Performance of the (CODeff and BOD5eff) artificial neural network models.

Model No. | Network Input | Network Output | Training (R) | Validation (R) | Testing (R) | All Data (R) | No. Neurons in Hidden Layers | MSE
M21-S1 | Tinf, pHinf, ECinf, TDSinf | CODeff & BOD5eff | 0.806 | 0.351 | 0.411 | 0.62 | 67 | 242.09
M21-S2 | Tinf, pHinf, ECinf, TDSinf | CODeff & BOD5eff | 0.804 | 0.850 | 0.833 | 0.811 | 40–60 | 34.22
M22-S1 | Tinf, pHinf, ECinf, TDSinf, NO3− inf | CODeff & BOD5eff | 0.53 | 0.34 | 0.52 | 0.5 | 65 | 212.37
M22-S2 | Tinf, pHinf, ECinf, TDSinf, NO3− inf | CODeff & BOD5eff | 0.74 | 0.756 | 0.837 | 0.756 | 30–55 | 46.54
M23-S1 | Tinf, pHinf, ECinf, TDSinf, NO3− inf, PO4−3 inf | CODeff & BOD5eff | 0.866 | 0.46 | 0.57 | 0.57 | 65 | 138.95
M23-S2 | Tinf, pHinf, ECinf, TDSinf, NO3− inf, PO4−3 inf | CODeff & BOD5eff | 0.749 | 0.772 | 0.828 | 0.761 | 40–60 | 60.04
M24-S1 | Tinf, pHinf, ECinf, TDSinf, NO3− inf, PO4−3 inf, BOD5inf | CODeff & BOD5eff | 0.921 | 0.432 | 0.446 | 0.75 | 55 | 126.998
M24-S2 | Tinf, pHinf, ECinf, TDSinf, NO3− inf, PO4−3 inf, BOD5inf | CODeff & BOD5eff | 0.818 | 0.891 | 0.87 | 0.834 | 30–50 | 22.59
M25-S1 | Tinf, pHinf, ECinf, TDSinf, NO3− inf, PO4−3 inf, BOD5inf, CODinf | CODeff & BOD5eff | 0.929 | 0.685 | 0.504 | 0.829 | 60 | 75
M25-S2 | Tinf, pHinf, ECinf, TDSinf, NO3− inf, PO4−3 inf, BOD5inf, CODinf | CODeff & BOD5eff | 0.873 | 0.891 | 0.894 | 0.878 | 30–50 | 34.34

4.6. Ensemble Model (PO4 −3 and NO3 − Effluent)


We constructed several artificial neural networks whose single input layer followed the
same five subscenarios as the previous models. The input values were taken at the plant's
inlet, whereas the output layer contained two neurons representing the values of
(PO4−3 eff and NO3− eff) at the outlet of the
plant. Table 5 shows the best performance of (PO4 −3 eff and NO3 − eff) models selected from
several studied architectures. Figure 6a,b represent the best models’ performance, as well
as the correlation coefficients for all training, validation, and testing groups, in addition to
the set of all the data for the two scenarios (one and two hidden layers). Observed versus
predicted (PO4 −3 eff and NO3 − eff) values are illustrated in Figure 6c.


Figure 6. (a) The values of correlation coefficient (R) in all stages of Model M29-S1 in the first scenario. (b) The values of correlation coefficient (R) in all stages of Model M29-S2 in the second scenario. (c) Measured vs. predicted samples sequence obtained by the best ensemble (PO4−3 and NO3−) effluent models M29-S2.

Table 5. Performance of the (PO4−3 eff and NO3− eff) artificial neural network models.

Model No. | Network Input | Network Output | Training (R) | Validation (R) | Testing (R) | All Data (R) | No. Neurons in Hidden Layers | MSE
M26-S1 | Tinf, pHinf, ECinf, TDSinf | PO4−3 eff & NO3− eff | 0.631 | 0.376 | 0.223 | 0.496 | 67 | 2025.37
M26-S2 | Tinf, pHinf, ECinf, TDSinf | PO4−3 eff & NO3− eff | 0.786 | 0.625 | 0.761 | 0.764 | 40–60 | 606.61
M27-S1 | Tinf, pHinf, ECinf, TDSinf, NO3− inf | PO4−3 eff & NO3− eff | 0.753 | 0.488 | 0.617 | 0.685 | 65 | 1653.70
M27-S2 | Tinf, pHinf, ECinf, TDSinf, NO3− inf | PO4−3 eff & NO3− eff | 0.862 | 0.870 | 0.8 | 0.854 | 30–55 | 391.07
M28-S1 | Tinf, pHinf, ECinf, TDSinf, NO3− inf, PO4−3 inf | PO4−3 eff & NO3− eff | 0.923 | 0.14 | 0.48 | 0.58 | 65 | 5828.85
M28-S2 | Tinf, pHinf, ECinf, TDSinf, NO3− inf, PO4−3 inf | PO4−3 eff & NO3− eff | 0.939 | 0.938 | 0.85 | 0.929 | 40–60 | 179.08
M29-S1 | Tinf, pHinf, ECinf, TDSinf, NO3− inf, PO4−3 inf, BOD5inf | PO4−3 eff & NO3− eff | 0.818 | 0.386 | 0.64 | 0.725 | 55 | 2850.58
M29-S2 | Tinf, pHinf, ECinf, TDSinf, NO3− inf, PO4−3 inf, BOD5inf | PO4−3 eff & NO3− eff | 0.958 | 0.966 | 0.738 | 0.942 | 30–50 | 56.94
M30-S1 | Tinf, pHinf, ECinf, TDSinf, NO3− inf, PO4−3 inf, BOD5inf, CODinf | PO4−3 eff & NO3− eff | 0.639 | 0.296 | 0.442 | 0.513 | 60 | 3318.54
M30-S2 | Tinf, pHinf, ECinf, TDSinf, NO3− inf, PO4−3 inf, BOD5inf, CODinf | PO4−3 eff & NO3− eff | 0.962 | 0.931 | 0.957 | 0.957 | 30–50 | 198.4

4.7. Ensemble Model (COD, BOD5 , PO4 −3 and NO3 − Effluent)


We built several artificial neural networks with a single input layer made up of five
subscenarios for the input variables (4, 5, 6, 7, or 8 neurons), as shown in Table A3. All the
input values were taken at the WWTP inlet, whereas the output layer contained four
neurons representing the values of (CODeff, BOD5eff, PO4−3 eff and NO3− eff) at the plant's
outlet. Table A3 shows the best performance of the (CODeff, BOD5eff,
PO4 −3 eff and NO3 − eff ) models selected from several studied architectures. Figure A5a,b
represent the best models’ performance and the correlation coefficients for all training,
validation, and testing groups, in addition to the set of all the data for the two scenarios
(one and two hidden layers).

The results demonstrated that both the single and the ensemble models provide good
predictions of the studied variables in a full-scale WWTP.
It is worth noting that the two-hidden-layer SNN models, which were designed to
overcome the weaknesses of the one-hidden-layer SNN models and to produce improved,
composite models that are more reliable and highly accurate, gave superior results
compared to the one-hidden-layer models. For example, for the ensemble model (COD,
BOD5, PO4−3 and NO3− effluent), Table A3 shows that the one-hidden-layer model
(M35-S1) with architecture (8-60-4), i.e., eight influent pollution variables as input,
60 neurons in the single hidden layer, and four effluent pollution variables as output,
gave a correlation coefficient of 0.757 for all the data and a mean squared error (MSE)
of 622, whereas the two-hidden-layer model (M35-S2) with architecture (8-50-30-4), i.e.,
eight influent pollution variables as input, 50 neurons in the first hidden layer, 30 neurons
in the second hidden layer, and four effluent pollution variables as output, achieved a
higher correlation coefficient of 0.936 for all the data and an MSE of 51.05.
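For reference, the two figures of merit quoted throughout, R over a data split and the MSE, can be computed as in the short sketch below; the helper names are ours and the arrays are hypothetical, and pooling all outputs into one R is only one plausible convention for the ensemble case.

```python
# Minimal sketch of the two reported metrics; helper names and data are hypothetical.
import numpy as np

def correlation_r(y_obs, y_pred):
    """Pearson correlation coefficient (R) between observed and predicted values."""
    return float(np.corrcoef(np.ravel(y_obs), np.ravel(y_pred))[0, 1])

def mean_squared_error(y_obs, y_pred):
    """Mean squared error over all samples and output variables."""
    return float(np.mean((np.asarray(y_obs) - np.asarray(y_pred)) ** 2))

y_obs = np.array([[120.0, 38.0], [95.0, 30.0], [110.0, 35.0]])   # e.g., COD, BOD5
y_pred = np.array([[112.0, 40.0], [99.0, 28.0], [108.0, 36.0]])
print(correlation_r(y_obs, y_pred), mean_squared_error(y_obs, y_pred))
```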
Moreover, the performance of all models with one and two hidden layers improved as the
number of influent pollution variables used as inputs increased.
The performance of the SNN and DNN models was satisfactory in the calibration
and verification stages, owing to the robustness of the neural network approach in
capturing nonlinear interactions and its ability to backpropagate the error through the
calibration stage until the desired result was obtained.
The use of artificial neural network technology gave remarkable results in forecasting
the performance of the WWTP, and the application of shallow and deep learning further
increased the prediction efficiency, which contributes to the treatment plant's
enhanced quality.
These ANN models aid in predicting the effluent quality of wastewater treatment
plants and thus provide a useful tool for modeling and predicting WWTP performance.

5. Conclusions
This paper presented the use of SNN and DNN models to predict the performances
of a full-scale WWTP. It showed that the use of the influent and effluent concentrations of
several pollution parameters (COD, BOD5 , PO4 −3 , NO3 − , T, pH, EC, and TDS) gave a high
modeling accuracy.
We studied the performance of several neural network techniques and architectures.
Our results indicate that increasing the number of input variables and neurons in hidden
layers improved the accuracy of the SNN models. Moreover, the ensemble models were
more reliable, robust, and efficient than the single models.
In conclusion, we note that the first SNN scenario (one-hidden-layer models) generally
gave good correlation values, with the correlation values in the validation and testing
phases also at acceptable levels. The second SNN scenario (two-hidden-layer models)
and the random forest gave excellent correlation values in all stages (training,
validation, testing).
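A random forest baseline of the kind recommended here takes only a few lines; in the sketch below (scikit-learn assumed), the data are placeholders and the hyperparameters are illustrative rather than the settings used in this study (cf. Figure A3d for the RF BOD5 effluent model).

```python
# Minimal sketch, assuming scikit-learn; data and hyperparameters are illustrative.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 8))   # placeholder: 8 influent variables
y = rng.normal(size=300)        # placeholder: e.g., BOD5_eff

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.15, random_state=0)
rf = RandomForestRegressor(n_estimators=500, random_state=0).fit(X_tr, y_tr)
r = np.corrcoef(y_te, rf.predict(X_te))[0, 1]
print(f"Random forest: R = {r:.2f}")
```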
DNNs (CNN, RNN, SAE) can be used for WWTP modeling, but due to the small
datasets, they gave lower performance and accuracy. Because only small datasets are
generally available in wastewater management, shallow (one- or two-hidden-layer) neural
networks are highly recommended for modeling the WWTP process. The use of these
models significantly reduces the need for periodic laboratory measurements, which
minimizes the operational cost of these plants, and helps in assessing the stability of the
environmental balance. Moreover, operating parameters from the aerated activated
sludge tank could be added to the models for comparison, which we recommend for
future work.

Author Contributions: Conceptualization, R.J., A.A., K.J. and I.S.; methodology, R.J., A.A., K.J. and I.S.; software, R.J. and K.J.; validation, R.J., A.A., K.J. and I.S.; formal analysis, R.J., A.A., K.J. and I.S.; data curation, R.J. and K.J.; writing—original draft preparation, R.J., A.A., K.J. and I.S.; writing—review and editing, R.J., A.A., K.J. and I.S.; visualization, R.J. and K.J.; supervision, A.A. and I.S. All authors have read and agreed to the published version of the manuscript.

Funding: This research received no external funding.

Data Availability Statement: The datasets are available from the corresponding author on reasonable request.

Conflicts of Interest: The authors declare no conflict of interest.

Appendix A


Figure A1. Schematic of a three-layer artificial neural network.


Figure A2. (a) Analysis of influent parameters using ANOVA1, (b) analysis of effluent parameters using ANOVA1.
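ANOVA1 here refers to a one-way analysis of variance; an equivalent check can be reproduced as in the sketch below (SciPy assumed; the groups are hypothetical stand-ins for one influent parameter split by observation period).

```python
# Minimal sketch of a one-way ANOVA, assuming SciPy; the groups are hypothetical.
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(3)
# Hypothetical: one influent parameter split into three observation periods.
g1, g2, g3 = (rng.normal(loc=m, scale=1.0, size=50) for m in (5.0, 5.2, 4.9))
f_stat, p_value = f_oneway(g1, g2, g3)
print(f"F = {f_stat:.2f}, p = {p_value:.3f}")  # small p: group means differ
```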


Table A1. Performance of the BOD5eff artificial neural network models.

Model No. | Model Input Variables | Output Variable(s) | Training (R) | Validation (R) | Testing (R) | All Data (R) | No. Neurons in Hidden Layers | MSE
M6-S1 | Tinf, pHinf, ECinf, TDSinf | BOD5eff | 0.515 | 0.774 | 0.564 | 0.541 | 67 | 13.964
M6-S2 | Tinf, pHinf, ECinf, TDSinf | BOD5eff | 0.685 | 0.361 | 0.825 | 0.681 | 40–60 | 18.28
M7-S1 | Tinf, pHinf, ECinf, TDSinf, NO3− inf | BOD5eff | 0.69 | 0.69 | 0.624 | 0.68 | 65 | 10.926
M7-S2 | Tinf, pHinf, ECinf, TDSinf, NO3− inf | BOD5eff | 0.675 | 0.520 | 0.820 | 0.678 | 30–55 | 18.51
M8-S1 | Tinf, pHinf, ECinf, TDSinf, NO3− inf, PO4−3 inf | BOD5eff | 0.534 | 0.596 | 0.687 | 0.557 | 65 | 19.223
M8-S2 | Tinf, pHinf, ECinf, TDSinf, NO3− inf, PO4−3 inf | BOD5eff | 0.704 | 0.648 | 0.825 | 0.715 | 40–60 | 11.10
M9-S1 | Tinf, pHinf, ECinf, TDSinf, NO3− inf, PO4−3 inf, BOD5inf | BOD5eff | 0.753 | 0.82 | 0.87 | 0.78 | 55 | 10.09
M9-S2 | Tinf, pHinf, ECinf, TDSinf, NO3− inf, PO4−3 inf, BOD5inf | BOD5eff | 0.901 | 0.942 | 0.895 | 0.906 | 30–50 | 2.14
M10-S1 | Tinf, pHinf, ECinf, TDSinf, NO3− inf, PO4−3 inf, BOD5inf, CODinf | BOD5eff | 0.674 | 0.775 | 0.51 | 0.651 | 60 | 20.224
M10-S2 | Tinf, pHinf, ECinf, TDSinf, NO3− inf, PO4−3 inf, BOD5inf, CODinf | BOD5eff | 0.890 | 0.935 | 0.898 | 0.898 | 30–50 | 6.55

Table A2. Performance of the NO3− eff artificial neural network models.

Model No. | Network Input | Network Output | Training (R) | Validation (R) | Testing (R) | All Data (R) | No. Neurons in Hidden Layers | MSE
M16-S1 | Tinf, pHinf, ECinf, TDSinf | NO3− eff | 0.606 | 0.273 | 0.245 | 0.489 | 67 | 4135.91
M16-S2 | Tinf, pHinf, ECinf, TDSinf | NO3− eff | 0.83 | 0.485 | 0.209 | 0.607 | 40–60 | 1915
M17-S1 | Tinf, pHinf, ECinf, TDSinf, NO3− inf | NO3− eff | 0.907 | 0.448 | 0.505 | 0.715 | 65 | 5015.33
M17-S2 | Tinf, pHinf, ECinf, TDSinf, NO3− inf | NO3− eff | 0.901 | 0.988 | 0.809 | 0.912 | 30–55 | 223.42
M18-S1 | Tinf, pHinf, ECinf, TDSinf, NO3− inf, PO4−3 inf | NO3− eff | 0.972 | 0.598 | 0.661 | 0.789 | 65 | 2054.71
M18-S2 | Tinf, pHinf, ECinf, TDSinf, NO3− inf, PO4−3 inf | NO3− eff | 0.90 | 0.975 | 0.864 | 0.909 | 40–60 | 188.87
M19-S1 | Tinf, pHinf, ECinf, TDSinf, NO3− inf, PO4−3 inf, BOD5inf | NO3− eff | 0.969 | 0.698 | 0.581 | 0.793 | 55 | 2520.53
M19-S2 | Tinf, pHinf, ECinf, TDSinf, NO3− inf, PO4−3 inf, BOD5inf | NO3− eff | 0.905 | 0.844 | 0.914 | 0.9 | 30–50 | 406.11
M20-S1 | Tinf, pHinf, ECinf, TDSinf, NO3− inf, PO4−3 inf, BOD5inf, CODinf | NO3− eff | 0.876 | 0.779 | 0.576 | 0.822 | 60 | 2038.57
M20-S2 | Tinf, pHinf, ECinf, TDSinf, NO3− inf, PO4−3 inf, BOD5inf, CODinf | NO3− eff | 0.928 | 0.953 | 0.934 | 0.932 | 30–50 | 65.94

Figure A3. (a) The values of correlation coefficient (R) in all stages of Model M9-S1 in the first scenario. (b) The values of correlation coefficient (R) in all stages of Model M9-S2 in the second scenario. (c) Measured vs. predicted samples sequence achieved by best single BOD5 effluent models M9-S2. (d) Measured vs. predicted samples sequence obtained by random forest (RF) BOD5 effluent models. (e) Measured vs. predicted samples sequence obtained by (CNN) convolutional neural network BOD5 effluent models. (f) Measured vs. predicted samples sequence obtained by (RNN) recurrent neural network BOD5 effluent models. (g) Measured vs. predicted samples sequence obtained by (DNN) SAE neural network BOD5 effluent models.

Figure A4. (a) The values of correlation coefficient (R) in all stages of Model M20-S1 in the first scenario. (b) The values of correlation coefficient (R) in all stages of Model M20-S2 in the second scenario. (c) Measured vs. predicted samples sequence achieved by best single NO3− effluent models M20-S2.

Table A3. Performance of the (CODeff, BOD5eff, PO4−3 eff and NO3− eff) artificial neural network models.

Model No. | Network Input | Network Output | Training (R) | Validation (R) | Testing (R) | All Data (R) | No. Neurons in Hidden Layers | MSE
M31-S1 | Tinf, pHinf, ECinf, TDSinf | CODeff & BOD5eff & PO4−3 eff & NO3− eff | 0.872 | 0.30 | 0.62 | 0.682 | 67 | 1707.26
M31-S2 | Tinf, pHinf, ECinf, TDSinf | CODeff & BOD5eff & PO4−3 eff & NO3− eff | 0.803 | 0.490 | 0.603 | 0.732 | 40–60 | 665.98
M32-S1 | Tinf, pHinf, ECinf, TDSinf, NO3− inf | CODeff & BOD5eff & PO4−3 eff & NO3− eff | 0.848 | 0.487 | 0.604 | 0.75 | 65 | 870.369
M32-S2 | Tinf, pHinf, ECinf, TDSinf, NO3− inf | CODeff & BOD5eff & PO4−3 eff & NO3− eff | 0.932 | 0.852 | 0.755 | 0.872 | 30–55 | 185.41
M33-S1 | Tinf, pHinf, ECinf, TDSinf, NO3− inf, PO4−3 inf | CODeff & BOD5eff & PO4−3 eff & NO3− eff | 0.829 | 0.418 | 0.352 | 0.671 | 65 | 748.01
M33-S2 | Tinf, pHinf, ECinf, TDSinf, NO3− inf, PO4−3 inf | CODeff & BOD5eff & PO4−3 eff & NO3− eff | 0.938 | 0.931 | 0.858 | 0.625 | 40–60 | 78.93
M34-S1 | Tinf, pHinf, ECinf, TDSinf, NO3− inf, PO4−3 inf, BOD5inf | CODeff & BOD5eff & PO4−3 eff & NO3− eff | 0.793 | 0.641 | 0.497 | 0.706 | 55 | 711.238
M34-S2 | Tinf, pHinf, ECinf, TDSinf, NO3− inf, PO4−3 inf, BOD5inf | CODeff & BOD5eff & PO4−3 eff & NO3− eff | 0.909 | 0.965 | 0.973 | 0.928 | 30–50 | 69.71
M35-S1 | Tinf, pHinf, ECinf, TDSinf, NO3− inf, PO4−3 inf, BOD5inf, CODinf | CODeff & BOD5eff & PO4−3 eff & NO3− eff | 0.946 | 0.571 | 0.339 | 0.757 | 60 | 621.78
M35-S2 | Tinf, pHinf, ECinf, TDSinf, NO3− inf, PO4−3 inf, BOD5inf, CODinf | CODeff & BOD5eff & PO4−3 eff & NO3− eff | 0.925 | 0.952 | 0.972 | 0.936 | 30–50 | 51.05

Figure A5. (a) The values of correlation coefficient (R) in all stages of Model M35-S1 in the first scenario. (b) The values of correlation coefficient (R) in all stages of Model M35-S2 in the second scenario.


References
1. Vanrolleghem, P.; Verstraete, W. Simultaneous biokinetic characterization of heterotrophic and nitrifying populations of activated
sludge with an on-line respirographic biosensor. Water Sci. Technol. 1993, 28, 377–387. [CrossRef]
2. Vassos, T.D. Future directions in instrumentation, control and automation in the water and wastewater industry. Water Sci.
Technol. 1993, 28, 9–14. [CrossRef]
3. Harremoës, P.; Capodaglio, A.G.; Hellström, B.G.; Henze, M.; Jensen, K.N.; Lynggaard-Jensen, A.; Otterpohl, R.; Søeberg, H.
Wastewater treatment plants under transient loading-Performance, modelling and control. Water Sci. Technol. 1993, 27, 71.
[CrossRef]
4. Mjalli, F.S.; Al-Asheh, S.; Alfadala, H. Use of artificial neural network black-box modeling for the prediction of wastewater
treatment plants performance. J. Environ. Manag. 2007, 83, 329–338. [CrossRef]
5. Hamoda, M.F.; Al-Ghusain, I.A.; Hassan, A.H. Integrated wastewater treatment plant performance evaluation using artificial
neural networks. Water Sci. Technol. 1999, 40, 55–65. [CrossRef]
6. Nasr, M.S.; Moustafa, M.A.E.; Seif, H.A.E.; El Kobrosy, G. Application of Artificial Neural Network (ANN) for the prediction of
EL-AGAMY wastewater treatment plant performance-EGYPT. Alex. Eng. J. 2012, 51, 37–43. [CrossRef]
7. Hong, Y.-S.T.; Rosen, M.R.; Bhamidimarri, R. Analysis of a municipal wastewater treatment plant using a neural network-based
pattern analysis. Water Res. 2003, 37, 1608–1618. [CrossRef] [PubMed]
8. Lee, D.S.; Park, J.M. Neural network modeling for on-line estimation of nutrient dynamics in a sequentially-operated batch
reactor. J. Biotechnol. 1999, 75, 229–239. [CrossRef]
9. Côte, M.; Grandjean, B.P.A.; Lessard, P.; Thibault, J. Dynamic modelling of the activated sludge process: Improving prediction
using neural networks. Water Res. 1995, 29, 995–1004. [CrossRef]
10. Hamed, M.M.; Khalafallah, M.G.; Hassanien, E.A. Prediction of wastewater treatment plant performance using artificial neural
networks. Environ. Model. Softw. 2004, 19, 919–928. [CrossRef]
11. Maier, H.R.; Dandy, G.C. Neural networks for the prediction and forecasting of water resources variables: A review of modelling
issues and applications. Environ. Model. Softw. 2000, 15, 101–124. [CrossRef]
12. Blaesi, J.; Jensen, B. Can Neural Networks Compete with Process Calculations. InTech 1992, 39. Available online: https://www.osti.
gov/biblio/6370708 (accessed on 2 October 2022).
13. Rene, E.R.; Saidutta, M. Prediction of BOD and COD of a refinery wastewater using multilayer artificial neural networks. J. Urban
Environ. Eng. 2008, 2, 1–7. [CrossRef]
14. Vyas, M.; Modhera, B.; Vyas, V.; Sharma, A. Performance forecasting of common effluent treatment plant parameters by artificial
neural network. ARPN J. Eng. Appl. Sci. 2011, 6, 38–42.
15. Jami, M.S.; Husain, I.; Kabbashi, N.A.; Abdullah, N. Multiple inputs artificial neural network model for the prediction of
wastewater treatment plant performance. Aust. J. Basic Appl. Sci. 2012, 6, 62–69.
16. Pakrou, S.; Mehrdadi, N.; Baghvand, A. Artificial neural networks modeling for predicting treatment efficiency and considering
effects of input parameters in prediction accuracy: A case study in tabriz treatment plant. Indian J. Fundam. Appl. Life Sci. 2014, 4,
2231–6345.
17. Nourani, V.; Elkiran, G.; Abba, S. Wastewater treatment plant performance analysis using artificial intelligence—An ensemble
approach. Water Sci. Technol. 2018, 78, 2064–2076. [CrossRef]
18. Wang, D.; Thunéll, S.; Lindberg, U.; Jiang, L.; Trygg, J.; Tysklind, M.; Souihi, N. A machine learning framework to improve
effluent quality control in wastewater treatment plants. Sci. Total Environ. 2021, 784, 147138. [CrossRef]
19. Zhu, X.; Xu, Z.; You, S.; Komárek, M.; Alessi, D.S.; Yuan, X.; Palansooriya, K.N.; Ok, Y.S.; Tsang, D.C.W. Machine learning
exploration of the direct and indirect roles of Fe impregnation on Cr (VI) removal by engineered biochar. Chem. Eng. J. 2022,
428, 131967. [CrossRef]
20. Zhu, X.; Wan, Z.; Tsang, D.C.W.; He, M.; Hou, D.; Su, Z.; Shang, J. Machine learning for the selection of carbon-based materials for
tetracycline and sulfamethoxazole adsorption. Chem. Eng. J. 2021, 406, 126782. [CrossRef]
21. Alsulaili, A.; Refaie, A. Artificial neural network modeling approach for the prediction of five-day biological oxygen demand and
wastewater treatment plant performance. Water Supply 2021, 21, 1861–1877. [CrossRef]
22. Wu, X.; Yang, Y.; Wu, G.; Mao, J.; Zhou, T. Simulation and optimization of a coking wastewater biological treatment process by
activated sludge models (ASM). J. Environ. Manag. 2016, 165, 235–242. [CrossRef]
23. Henze, M.; Gujer, W.; Mino, T.; Matsuo, T.; Wentzel, M.C.; Marais, G.V.R.; van Loosdrecht, M.C.M. Activated sludge model no. 2d,
ASM2d. Water Sci. Technol. 1999, 39, 165–182. [CrossRef]
24. Müller, A.C.; Guido, S. Introduction to Machine Learning with Python: A Guide for Data Scientists; O’Reilly Media, Inc.: Sebastopol,
CA, USA, 2016.
25. Delgrange, N.; Cabassud, C.; Cabassud, M.; Durand-Bourlier, L.; Lainé, J.M. Neural networks for prediction of ultrafiltration
transmembrane pressure–application to drinking water production. J. Membr. Sci. 1998, 150, 111–123. [CrossRef]
26. Eslamian, S.; Gohari, A.; Biabanaki, M.; Malekian, R. Estimation of monthly pan evaporation using artificial neural networks and
support vector machines. J. Appl. Sci. 2008, 8, 3497–3502. [CrossRef]
27. Taylor, J.G. Neural Networks and Their Applications; John Wiley and Sons: Hoboken, NJ, USA, 1996; p. 322.
28. Principe, J.C.; Euliano, N.R.; Lefebvre, W.C. Neural and Adaptive Systems: Fundamentals through Simulations; John Wiley and Sons:
Hoboken, NJ, USA, 1999.

29. Das, H.S.; Roy, P. A Deep Dive into Deep Learning Techniques for Solving Spoken Language Identification Problems. In Intelligent
Speech Signal Processing; Elsevier: Amsterdam, The Netherlands, 2019; pp. 81–100.
30. Feng, S.; Zhou, H.; Dong, H. Using deep neural network with small dataset to predict material defects. Mater. Des. 2018, 162,
300–310. [CrossRef]
31. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [CrossRef]
32. Ward, L.; Agrawal, A.; Choudhary, A.; Wolverton, C. A general-purpose machine learning framework for predicting properties of
inorganic materials. NPJ Comput. Mater. 2016, 2, 16028. [CrossRef]
33. Hagan, M.T.; Demuth, H.B.; Beale, M.H.; De Jesus, O. Neural Network Design; Martin Hagan: Stillwater, OK, USA, 2014.
34. Nourani, V.; Baghanam, A.H.; Gebremichael, M. Investigating the Ability of Artificial Neural Network (ANN) Models to Estimate
Missing Rain-gauge Data. J. Environ. Inform. 2012, 19, 38–50. [CrossRef]
35. Nourani, V.; Hakimzadeh, H.; Amini, A.B. Implementation of artificial neural network technique in the simulation of dam breach
hydrograph. J. Hydroinform. 2012, 14, 478–496. [CrossRef]
36. Breiman, L.; Friedman, J.; Stone, C.J.; Olshen, R.A. Classification and Regression Trees; Routledge: Milton Park, Abingdon-on-Thames,
UK, 2017.
37. Li, Z.; Liu, F.; Yang, W.; Peng, S.; Zhou, J. A survey of convolutional neural networks: Analysis, applications, and prospects. IEEE
Trans. Neural Netw. Learn. Syst. 2021, 1–21. [CrossRef] [PubMed]
38. Wu, J. Introduction to Convolutional Neural Networks; National Key Lab for Novel Software Technology, Nanjing University:
Nanjing, China, 2017; Volume 5, p. 23.
39. Apaydin, H.; Feizi, H.; Sattari, M.T.; Colak, M.S.; Shamshirband, S.; Chau, K.-W. Comparative analysis of recurrent neural
network architectures for reservoir inflow forecasting. Water 2020, 12, 1500. [CrossRef]
40. Li, L.; Jiang, P.; Xu, H.; Lin, G.; Guo, D.; Wu, H. Water quality prediction based on recurrent neural network and improved
evidence theory: A case study of Qiantang River, China. Environ. Sci. Pollut. Res. 2019, 26, 19879–19896. [CrossRef] [PubMed]
