Dynamic Real-Time Prediction of Reclaimed Water Volumes Using the Improved Transformer Model and Decomposition Integration Technology

Sun, Xiangyu; Zhang, Lina; Wang, Chao; Yang, Yiyang; Wang, Hao

doi:10.3390/su16156598

Open AccessArticle

Dynamic Real-Time Prediction of Reclaimed Water Volumes Using the Improved Transformer Model and Decomposition Integration Technology

by

Xiangyu Sun

¹,

Lina Zhang

²,

Chao Wang

^3,*,

Yiyang Yang

⁴ and

Hao Wang

^1,3

¹

Research Center of Fluid Machinery Engineering and Technology, Jiangsu University, Zhenjiang 212013, China

²

School of Resources and Civil Engineering, Northeastern University, Shenyang 110819, China

³

State Key Laboratory of Simulation and Regulation of Water Cycle in River Basin, China Institute of Water Resources and Hydropower Research, Beijing 100038, China

⁴

College of the Environment and Ecology, Xiamen University, Xiamen 361104, China

^*

Author to whom correspondence should be addressed.

Sustainability 2024, 16(15), 6598; https://doi.org/10.3390/su16156598

Submission received: 3 June 2024 / Revised: 27 July 2024 / Accepted: 31 July 2024 / Published: 1 August 2024

Download

Browse Figures

Versions Notes

Abstract

:

In recent years, wastewater reuse has become crucial for addressing global freshwater scarcity and promoting sustainable water resource development. Accurate inflow volume predictions are essential for enhancing operational efficiency in water treatment facilities and effective wastewater utilization. Traditional and decomposition integration models often struggle with non-stationary time series, particularly in peak and anomaly sensitivity. To address this challenge, a differential decomposition integration model based on real-time rolling forecasts has been developed. This model uses an initial prediction with a machine learning (ML) model, followed by differential decomposition using Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN). A Time-Aware Outlier-Sensitive Transformer (TS-Transformer) is then applied for integrated predictions. The ML-CEEMDAN-TSTF model demonstrated superior accuracy compared to basic ML models, decomposition integration models, and other Transformer-based models. This hybrid model explicitly incorporates time-scale differentiated information as a feature, improving the model’s adaptability to complex environmental data and predictive performance. The TS-Transformer was designed to make the model more sensitive to anomalies and peaks in time series, addressing issues such as anomalous data, uncertainty in water volume data, and suboptimal forecasting accuracy. The results indicated that: (1) the introduction of time-scale differentiated information significantly enhanced model accuracy; (2) ML-CEEMDAN-TSTF demonstrated higher accuracy compared to ML-CEEMDAN-Transformer; (3) the TS-Transformer-based decomposition integration model consistently outperformed those based on LSTM and eXtreme Gradient Boosting (XGBoost). Consequently, this research provides a precise and robust method for predicting reclaimed water volumes, which holds significant implications for research on clean water and water environment management.

Keywords:

reclaimed water volumes; dynamic real-time forecasting; transformer; machine learning; forecast decomposition and ensemble technique

1. Introduction

As global water resources become increasingly scarce, factors such as population growth, accelerated industrialization, and intensified climate change have further heightened the demand for freshwater, exacerbating the scarcity of water resources [1,2]. Reclaimed water inflows refer to the volume of treated wastewater that is reintroduced into water systems for reuse. Effective management of reclaimed water inflows is essential for supporting various levels of wastewater management and planning [3]. With the proliferation of IoT devices and smart sensors, continuous monitoring and real-time decision-making have become possible, enhancing system responsiveness and management efficiency. This process involves real-time treatment of wastewater to remove contaminants, ensuring it can be safely used for irrigation, industrial processes, and replenishing natural water bodies, while also meeting different water volume demands and maintaining water quality standards. The inflows of reclaimed water are influenced by network factors, demographic characteristics, and meteorological events [4]. During heavy rainfall, wastewater flows can surge dramatically, causing variations in flow rates within combined sewer systems by more than two orders of magnitude, a situation that is not uncommon [5]. Such fluctuations in flow can exceed the hydraulic capacity of treatment plants, potentially disrupting biological treatment processes. Reclaimed water inflows help alleviate freshwater scarcity by reducing reliance on limited freshwater resources. At the same time, they enable the recycling of water resources, reducing the pressure on natural water bodies and supporting long-term environmental sustainability and ecological balance. Therefore, accurately predicting the inflow volumes of reclaimed water plants is essential not only for enhancing operational efficiency and achieving precise control of water quality but also for reducing the demand for freshwater and promoting sustainability in water use [6]. This study assumes strict adherence to water recycling and plant management standards in post-treatment water quality, focusing primarily on the accurate prediction of water volumes.

Over the past few decades, research has primarily focused on developing models based on physical and data-driven approaches to simulate and manage the water cycle processes [7,8,9]. Although physical models are capable of simulating the water cycle, they exhibit limitations in handling the complexity of wastewater flow volumes [10,11]. Operators often rely on empirical judgments or complex physical models for predictions, which poses challenges in practical applications. In contrast, data-driven models, particularly Time Series Statistics (TSS) [12] and Machine Learning (ML) [13] technologies, demonstrate advantages in processing large datasets and adapting to new information [14,15]. Machine learning models such as Decision Trees (DT) [16], Linear Regression (LR) [17], Gradient Boosting Regression (GBR) [18], Support Vector Regression (SVR) [19], eXtreme Gradient Boosting (XGBoost) [20] and Long Short-Term Memory networks (LSTM) have proven effective in revealing complex patterns within reclaimed water volume (RWV) data [21,22]. However, standalone ML models may struggle with highly non-stationary time series that contain substantial noise components without prior raw data preprocessing [23,24,25]. Appropriate time series preprocessing can simplify the original signals, extracting multi-scale features and enabling ML techniques to effectively analyze the raw signals and uncover hidden information within the dataset [26,27]. Common decomposition techniques such as Empirical Mode Decomposition (EMD) [28], Ensemble Empirical Mode Decomposition (EEMD) [29], Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN) [30], and Variational Mode Decomposition (VMD) [31] enhance ML model performance by decomposing non-stationary signals into simpler sub-signals. Recent research has introduced a novel hybrid predictive model that employs a two-stage decomposition approach (including CEEMDAN and VMD) combined with deep learning algorithms [32]. This study utilizes these techniques to extract major fluctuations and trends from time series, then employs the Long Short-Term Memory (LSTM) deep learning algorithm to explore the dynamic characteristics of each sub-sequence. Consequently, decomposition integration models that combine time series preprocessing with machine learning techniques have been developed to predict both the quantity and quality of wastewater treatment simultaneously [33,34].

Furthermore, the application of the Transformer model and its attention mechanism in the field of wastewater prediction offers an enhanced method by prioritizing important parts of the data. The Transformer model, introduced by Vaswani et al. [35], has revolutionized the field of Natural Language Processing (NLP) and rapidly become a significant milestone in deep learning research. However, its application in time series prediction, particularly in wastewater treatment, has not been extensively explored. Huang et al. [36] were pioneers in applying the Transformer model to wastewater treatment tasks, achieving a sample prediction accuracy of up to 96.8%. This marked the first application of the Transformer model in the wastewater treatment domain, breaking the constraints of traditional models in terms of predictive accuracy. The study by Peng and Fanchao [37], focused on fault detection rather than direct water volume prediction, but the models built using the Transformer network provided effective solutions for complex issues in wastewater treatment processes, potentially inspiring other applications, including water volume prediction. Although the Transformer model has demonstrated potential in certain aspects of wastewater treatment, current research primarily concentrates on enhancing the model’s predictive performance. There remains a relative scarcity of in-depth studies addressing specific challenges faced by the model in predicting wastewater volumes, including handling seasonal and non-stationary features, as well as sensitivity to peaks and anomalies. This necessitates a dual focus: not only on enhancing the accuracy of the Transformer model but also on ensuring the model can sensitively and meticulously respond to the non-stationarity and complexity of the data, thereby ensuring robustness and precision in predictions [38,39]. In the realm of convolutional neural networks implementing fuzzy time series predictions, utilizing differencing methods can attenuate the non-stationarity of time series, thereby facilitating low-error predictions for non-stationary time series. Although studies indicate that differencing can improve model efficiency, how to effectively integrate differencing sequences and deep learning models, including the aforementioned decomposition techniques, particularly within the Transformer model, and effectively handle the seasonal changes and non-stationarity of wastewater volume data remains a challenge yet to be fully addressed [40,41]. The self-attention mechanism in long-sequence time predictions can lead to the loss of temporal information, causing existing models to perform poorly in capturing subtle changes in time series data. When predicting wastewater volumes, the Transformer model may ignore the continuity of temporal information due to the characteristics of the self-attention mechanism, resulting in a failure to capture fine variations in water volume. The dot-product self-attention mechanism in the classic Transformer architecture is insensitive to local context, making it unable to accurately distinguish turning points and anomalies, and thus unable to promptly handle unexpected situations.

To address these challenges and enhance the efficiency of time series prediction with the Transformer, this study introduces the ML-CEEMDAN-TSTF differential decomposition integration model, incorporating the following enhancements to the base Transformer model:

(1) A differential transformation was implemented on time series data to explicitly capture crucial information, significantly enhancing the model’s ability to recognize dynamic changes within the data.

(2) The introduction of a Time-Aware Outlier-Sensitive Loss Function allows the model to place greater emphasis on anomalies and abrupt changes during the loss calculation. Consequently, this design significantly enhances the model’s adaptability to key changes and the precision of its predictions.

(3) The adoption of an adaptive sliding window mechanism enables the model to dynamically adjust the size of the processing window based on the specific features of the time series data. Consequently, this approach effectively reveals long-term dependencies and key patterns, thereby enhancing both the efficiency and accuracy of the model in handling complex time series data.

(4) By utilizing an enhanced Transformer model and its decomposition integration technology, the model efficiently distills multidimensional features while reducing white noise. This significantly augments the model’s capability for comprehensive analysis and processing of time series data.

In summary, the primary goal of this research is to establish a precise and reliable real-time forecasting decomposition integration model, while also identifying the optimal model parameters that are well-suited for complex, nonlinear, non-stationary time series, and to train the model to accurately predict wastewater volumes. Specifically, this study aims to demonstrate the effectiveness and reliability of the proposed ML-CEEMDAN-TSTF model through comparative analysis with single ML models and various hybrid forecasting models. Furthermore, the research focuses on categorizing experiments and discussing solutions to address issues of slow time perception and differencing forgetfulness, ensuring optimal model accuracy.

The remainder of this paper is organized as follows: Section 2 provides a detailed description of four single ML models, differencing decomposition algorithms, factor selection methods, and the customized Transformer hybrid forecasting model. Section 3 describes a case study, including the basic introduction, data preparation, and model parameter settings. Section 4 further analyzes the effects of the model. Section 5 emphasizes the advantages and limitations of the methods proposed in this paper. Finally, conclusions are drawn in Section 6.

2. Materials and Methods

2.1. Research Framework

This study has developed a differential decomposition integration prediction model based on the TS-Transformer, aimed at accurately forecasting time series data. Figure 1 illustrates the framework of the entire decomposition integration model. Initially, three machine learning models were selected: LR, SVR, and GBR. These models are capable of capturing key features of the time series from different dimensions. Specifically, they extract the lagged sequences most correlated with the target sequence from historical data, providing a solid foundation for model training. To address the issue of long-term dependencies and to capture non-linear features within the time series, we further integrated LSTM. Subsequently, to enhance the model’s accuracy and stability, we employed the CEEMDAN differencing decomposition algorithm to perform multimodal decomposition of the series. This step eliminates white noise from the differenced series and isolates components strongly related to the prediction target, thereby strengthening the model’s recognition of seasonal and trend features within the time series. The decomposition by CEEMDAN reveals hidden cyclical and structural information in the data, providing a clearer basis for subsequent analysis. Finally, an integration using the TS-Transformer was performed to combine features generated by the previous prediction and decomposition models, further refining the predictive capability. The TS-Transformer, through its self-attention mechanism, enhances the model’s ability to handle long-distance dependencies within the time series and is trained using a custom time-aware outlier-sensitive loss function, which increases sensitivity to the most recent observations and outliers. Consequently, the integrated model constructed in this study is designed to capture the complex dynamics of time series data, providing accurate predictions for various application scenarios.

2.2. Single Forecast Models

2.2.1. Regression Models

In this study, three machine learning regression models were utilized for predicting time series data: Linear Regression (LR), Support Vector Regression (SVR), and Gradient Boosting Regression (GBR). These models were selected due to their respective strengths in handling complex environmental data, and their complementarity during the analytical process.

LR is a foundational statistical method that models the relationship between a dependent variable and one or more independent variables by fitting a linear equation to the observed data. The simplicity of this model lies in its assumption that there is a straight-line relationship between the variables. Its primary advantage is its simplicity and interpretability, making it a preferred choice for initial exploratory analysis. To implement LR, (1) collect and prepr90ocess the dataset; (2) designate the dependent and independent variables; (3) calculate the coefficients using the least squares method; and (4) use the derived equation to make predictions.

SVR operates by finding a hyperplane that best fits the data, aiming to limit the errors within a certain threshold. Its efficacy arises from its ability to manage non-linear data using kernel functions. It offers robustness against outliers and has the flexibility to model non-linear relationships. To deploy SVR, (1) scale and preprocess the data; (2) choose an appropriate kernel function (linear, polynomial, or radial basis function); (3) train the model by solving the optimization problem; and (4) predict values using the trained model.

GBR is an ensemble learning method that builds multiple decision trees sequentially, where each tree corrects the errors of its predecessor. It works effectively because it optimizes a loss function, reducing errors iteratively. GBR stands out for its ability to handle missing data, its resistance to overfitting with adequate tree depth, and its capacity to model complex non-linear relationships. To utilize GBR, (1) initialize the model with parameters like learning rate and number of trees; (2) train the model by fitting it to the data, allowing each tree to learn from the residuals of the previous one; and (3) use the ensemble of trees for predictions.

The selection of the three aforementioned regression models is justified by their unique data processing characteristics and their complementary analytical dimensions, which are vital for the construction of the final integrated model. Linear regression, based on its simplicity, provides a clear starting point for predictions and reveals the linear relationships between variables. SVR, on the other hand, introduces the capacity to handle complex non-linear data, adapting to a more diverse data structure. In contrast, GBR, through its foundation in decision trees, can capture more subtle features and patterns within the data. Integrating these three models into a more advanced model aims to enhance the final prediction’s accuracy and robustness by synthesizing the strengths of different models, thereby providing a more comprehensive data foundation for the final integrated model.

2.2.2. Long Short-Term Memory (LSTM) Networks

This study incorporates LSTM networks to address the challenges posed by traditional Recurrent Neural Networks (RNNs) in processing extended temporal sequences, a critical aspect in the realm of predictive analytics [42,43]. LSTM networks are specifically deployed to complement the Transformer model, enhancing the overall predictive accuracy of the integrated system.

At the core of LSTM’s effectiveness is its intricate internal architecture, which meticulously orchestrates memory cells and hidden cells. This design allows for the nuanced capture of both gradual and abrupt temporal shifts in data, a key feature for accurate forecasting. The LSTM model is equipped with three essential gates: input, forget, and output. Each of these gates plays a pivotal role in modulating the inflow, retention, and outflow of information within each cell, thereby maintaining the integrity and relevance of data through the sequence. The application of LSTM in hydrological modeling, as explored in-depth by Kratzert et al. [44], provides a foundation for its deployment in this study. The integration of LSTM networks into this study’s forecasting system not only mitigates the limitations of traditional sequence modeling approaches but also enriches the integrating model’s capacity to handle intricate temporal dynamics. This synergy enhances the overall robustness and precision of the predictive analytics framework developed for time series forecasting.

2.3. Differencing Decomposition Based on the CEEMDAN Algorithm

When addressing the complex relationships between time series and their influencing factors, relying solely on raw data sequences often fails to reveal the direct connections among these variables. Therefore, an efficient decomposition algorithm is necessary to eliminate white noise from the differenced series and to identify factors strongly correlated with the time series. Traditionally, the EEMD method has been used for this purpose, but it has a flaw: the reconstruction error increases with the number of ensembles, and the added white noise cannot be completely removed by increasing the average number of times, resulting in lengthy computation times. In contrast, the CEEMDAN method adopted in this study introduces adaptive white noise during the decomposition process. This method effectively reduces the reconstruction error to near zero with fewer ensemble means, thereby optimizing the data processing procedure.

The original time series data are transformed into a differenced form:

D (n) = Q (n) - Q (n - 1)

(1)

where

D (n)

is the differenced series,

D (n)

is the original series, and

n

represents the time step, yielding the sequence at time

t = 1, 2, \dots, T

.

Employing the CEEMDAN algorithm, the differenced data series undergoes multimodal decomposition as follows:

Step 1: First, we calculate the first modal component, similarly to the first independent component in EMD:

I M F_{1}_(n) = \frac{1}{T} \sum_{t = 1}^{T} I M F_{t 1} (n) = I M F_{\tilde{1}} (n)

(2)

Step 2: Next, we compute the first unique residual signal:

r_{1}_(n) = D (n) - I M F_{\tilde{1}} (n)

(3)

Step 3: Decompose the remaining signals, repeating the experiments

T

times, until the second modal component is obtained:

I M F_{\tilde{2}}_(n) = \frac{1}{T} \sum_{t = 1}^{T} E_{1} (r_{1} (n) + ϵ_{1} E_{1} (λ_{t} (n)))

(4)

In this process,

ϵ_{1}

represents the noise coefficient.

Step 4: For each subsequent step

i = 2, 3, \dots, I

, calculate the ith residual signal

r_{i} (n)

in the same manner as Step 3. Continue the decomposition until no further modal components can be obtained:

r_{i} (n) = r_{i - 1} (n) - I M F_{\tilde{i}} (n)

(5)

I M F_{\tilde{i + 1}}_(n) = \frac{1}{T} \sum_{t = 1}^{T} E_{i} (r_{i} (n) + ϵ_{i} E_{i} (λ_{t} (n)))

(6)

Step 5: At each step, determine if the number of extrema in the residual signal is at most two. If this condition is satisfied, the algorithm stops, and the final

I

modal components are obtained. The final residual signal can be expressed as follows:

R (n) = d (n) - \sum_{i = 1}^{I} I M F_{\tilde{i}} (n)

(7)

Thus, the original differenced data

D (n)

can ultimately be decomposed into

I

modal components and the residual signal, as shown in Equation (7):

D (n) = \sum_{i = 1}^{I} I M F_{\tilde{i}} (n) + R (n)

(8)

2.4. Decision Tree for Factor Importance Measurement

In the evaluation of the importance of influencing factors, decision trees are widely utilized due to their intuitiveness and ease of interpretation across various data analysis scenarios. As depicted in Figure 2, a decision tree is visualized as a flowchart, where the root and subsequent internal nodes conduct attribute tests, with branches indicating possible outcomes of these tests. Each terminal leaf node defines a category label or a predicted value, and every path from the root to the leaves constitutes a set of rules for classification or regression decisions. With such a structure, decision trees not only streamline the decision-making process but also offer a transparent perspective on how features influence prediction outcomes.

The construction of a decision tree begins with the analysis of the input time series data, comprising a series of influencing factors. During the data processing stage, the decision tree model progressively reveals, much like assembling a puzzle, which factors are essential to the output result. At each node, the model evaluates and selects an influencing factor to test, dividing the data into two subsets corresponding to the different value ranges of the factor.

These subsets are further tested and divided until no further splitting is possible, culminating in the formation of leaf nodes. Each leaf node contains a set of factors that make a decisive contribution to the prediction outcome. The paths leading to these leaf nodes represent a series of decision rules from input data to the prediction result. Through this recursive partitioning process, the decision tree effectively filters out the time series factors with the greatest impact on the prediction target.

In this process, the model assesses the effectiveness of each influencing factor in differentiating the data, converting it into a score that reflects the importance of each factor. Ultimately, those factors contributing most significantly to the score are considered the most important influencing factors. Thus, we can identify key time series data points for model prediction, bridging historical information to future forecasts. The decision tree makes the complex process of data analysis coherent and manageable, enabling us to easily identify and focus on those key factors that have the most substantial impact on the results.

2.5. Ensemble Model

2.5.1. Time-Aware Outlier-Sensitive Transformer (TS-Transformer)

Currently, Transformer models have achieved notable success in the fields of image, text, audio, and time-series data processing, renowned for their capabilities in parallel computing and capturing long-range dependencies. In this study, we have employed a customized Transformer model, the Time-Aware Outlier-Sensitive Transformer. This model adheres to the original Transformer architecture, consisting of two primary modules: the encoder and the decoder. The structure of our Transformer prediction model is presented in Figure 3.

The initial layer of the encoder is the input embedding, which maps each feature of the original time series into a high-dimensional vector space. This process is formulated as

X_{embed} = X_{input} W_{embedding} + b_{embedding}

(9)

where

X_{input}

is the input feature matrix,

W_{embedding}

is the weight matrix of the embedding layer, and

b_{embedding}

is the bias term.

Post embedding, each vector undergoes positional encoding to retain the sequence order. The encoding for each position

pos

and dimension

i

in a vector of dimension

d_{model}

is given by

{PE}_{(pos, 2 i)} = \sin (\frac{pos}{10000^{2 i / d_{model}}})

(10)

{PE}_{(pos, 2 i + 1)} = \cos (\frac{pos}{10000^{2 i / d_{model}}})

(11)

Each encoder layer comprises a self-attention sub-layer followed by a feed-forward sub-layer, each with a subsequent normalization step. The self-attention mechanism in the

l

-th layer is calculated as

Z^{(l)} = A t t e n t i o n (Q^{(l)}, K^{(l)}, V^{(l)})

(12)

where

Q^{(l)}

,

K^{(l)}

,

V^{(l)}

are the query, key, and value matrices derived from the input to the

l

-th layer. The attention mechanism can be further detailed in the scaled dot-product attention.

The decoder’s initial input aligns with the last sample point of the encoder, ensuring a seamless transition into the forecasting phase.

In addition to self-attention, each decoder layer implements a cross-attention sub-layer that focuses on the encoder’s output, formulated as

U^{(l)} = C r o s s A t t e n t i o n (Z_{e n c o d e r}, Z_{d e c o d e r}^{(l)})

(13)

where

Z_{e n c o d e r}

is the output from the final encoder layer, and

Z_{d e c o d e r}^{(l)}

is the output from the self-attention sub-layer of the

l

-th decoder layer.

To capture the varying influence of different time points, a time attention module assigns weights to each timestep, enhancing the model’s sensitivity to temporal dynamics.

To further enhance the model’s performance in accuracy, we have integrated a custom loss function, ‘Time-Aware Outlier-Sensitive Loss’, which dynamically adjusts its focus on recent data points and outliers during the training process. This loss function is formulated as

L o s s = s i g m o i d (α) \cdot \frac{\sum_{t = 1}^{T} ω_{t} {(y_{t} - {\hat{y}}_{t})}^{2}}{T} + (1 - s i g m o i d (α) \cdot \frac{\sum_{t = 1}^{T} 1 \{|y_{t} - {\hat{y}}_{t} > μ + s i g m o i d (β) \cdot σ|\} |y_{t} - {\hat{y}}_{t}|}{T}

(14)

where

y_{t}

and

{\hat{y}}_{t}

are the actual and predicted values at time

t

,

ω_{t}

is a linearly decreasing time weight,

μ

and

σ

are the mean and standard deviation of the errors, and

α

and

β

are trainable parameters. This advanced loss function enhances the model’s ability to prioritize recent observations and adaptively respond to outliers.

Additionally, to optimize the training of our Transformer model, we employ the Adam optimizer, a method known for its efficiency and effectiveness in handling sparse gradients and large-scale data operations. The Adam optimizer, an algorithm that combines the advantages of two other popular methods, Adagrad [45] and RMSprop [46], is particularly suitable for our application due to its adaptive learning rate capabilities. The optimizer adjusts the learning rates of each parameter through moment estimation, enhancing the convergence speed and performance of the model.

The integration of a Time-Aware Outlier-Sensitive Loss Function and the Adam optimizer into the Transformer model has substantially enhanced its capability to identify and interpret complex temporal patterns in wastewater flow data. By effectively utilizing dynamically tunable parameters for temporal weighting and outlier sensitivity within the model’s loss function, this approach has consequently led to a significant improvement in overall predictive performance for time series forecasting.

2.5.2. Real-Time Forecast

The time series dataset was divided into two halves: a calibration set and a validation set, each representing 50% of the original dataset’s total length. In the intricate process of analyzing time series, four sophisticated predictive models were employed: LR, SVR, GBR, and LSTM. The LSTM model’s input selection strategy was meticulously crafted, focusing on time-lagged selection. From all related factors, decision tree analysis was employed to evaluate the effectiveness of each factor in distinguishing the data. The 10 sequences with the highest correlation were selected as inputs, with the original sequence volume serving as the target sequence. This approach enables the identification and selection of the most influential factors based on their capability for classification and accurate prediction of the target sequence. The model training process was finely tuned to utilize the minimal NSE value to ensure the model’s optimal performance.

The original sequence was artfully transformed into a differenced sequence, which was then deftly decomposed using the CEEMDAN decomposition algorithm, yielding nine sequences of varying frequencies, labeled as imf1 through imf9. These IMFs, along with the regression sequences and LSTM-generated forecast sequences, were astutely selected as forecasting factors, with the differenced sequence as the target sequence. A fixed time step preceding the forecast point was precisely selected, and the TS-Transformer model was employed to seamlessly integrate these sequences of diverse dimensions and frequencies. The Time-Aware Outlier-Sensitive Loss was adeptly used to dynamically balance the sensitivity to the latest data points and outliers, thereby optimizing the model’s performance in time series forecasting.

In this study, a dynamic training set window was strategically adopted to train the Transformer model, effectively simulating a real-time prediction environment. Initially, the differential data set

D (n)

was divided into three parts: model training, model validation, and the rest assigned to the test set for performance evaluation. The TS-Transformer model, with its refined precision, predicted one period for each differenced subsequence of the inflow. After each prediction, the differenced forecast value was added to the previous day’s actual value, yielding the final predicted volume. Simultaneously, the sliding window was advanced one time step, ensuring the inclusion of the most recent data points. As new observation points emerged during the iterative training process, they were continuously integrated into the training set, thereby updating the model’s knowledge base. This approach led to a gradual increase in the training set’s length and a corresponding decrease in the test set, maintaining a constant size of the overall dataset. This strategy emulated a continuous learning environment, allowing the model to adapt to the latest data trends and ensuring consistency in the evaluation process, as well as real-time verification of the model’s predictive capabilities. The flowchart of the whole real-time forecast experiment is shown in Figure 4.

2.6. Performance Evaluation Criteria

In the interest of ensuring an objective evaluation of predictive performance, the experiments utilized the following metrics to quantitatively assess the accuracy of the models: Root Mean Square Error (RMSE), Nash–Sutcliffe Efficiency Coefficient (NSE), Mean Relative Error (MRE), and the Correlation Coefficient (Corr). The calculation formulas for each metric are presented below.

R M S E = \sqrt{\frac{1}{N} \sum_{t = 1}^{N} {(y_{t} - {y^{'}}_{t})}^{2}}

(15)

N S E = 1 - \frac{\sum_{t = 1}^{N} {(y_{t} - {y^{'}}_{t})}^{2}}{\sum_{t = 1}^{N} {(y_{t} - \bar{y})}^{2}}

(16)

M R E = \frac{1}{N} \sum_{t = 1}^{N} \frac{|y_{t} - {y^{'}}_{t}|}{y_{t}} \times 100 %

(17)

C o r r = 1 - \frac{\sum_{t = 1}^{N} (y_{t} - \bar{y}) (y^{'} - \bar{y^{'}})}{\sqrt{\sum_{t = 1}^{N} {(y_{t} - \bar{y})}^{2}} \sqrt{\sum_{t = 1}^{N} {(y^{'} - \bar{y^{'}})}^{2}}}

(18)

In these formulas,

N

represents the number of samples,

y

is the observed value, and

y^{'}

is the predicted value. The model is considered to perform optimally when the NSE and Corr approach 1, and both the RMSE and MRE approach 0, reflecting high accuracy and reliability of the predictions.

3. Case Study

3.1. Study Area and Data

Beijing, as the capital and the pivotal hub of the coordinated development of the Jing-Jin-Ji region, grapples with inherent water scarcity, a situation exacerbated by its categorization as a severely water-deficient area. Water resources constitute the foundation and lifeline of the capital’s development. As of 2022, Beijing boasts a permanent population exceeding 21 million, with an average daily water usage of 163.22 L per capita and a water supply coverage rate reaching 99.81%. Consequently, the use of recycled water plays a vital role in addressing Beijing’s water resource challenges. To maximize the effective use of the city’s recycled water sources, numerous recycling plants have been constructed, significantly mitigating the water shortage in Beijing. In 2022, the city’s total water supply amounted to 14.99 billion cubic meters, marking a marginal decrease of 0.17% compared to previous years, while the utilization and allocation of recycled water exceeded 1.2 billion cubic meters. In this context, investigating the precision and robustness of key technologies in water recycling plants is of paramount importance.

This study focuses on the CH and YF reclaimed water plants in the Haidian District of Beijing to validate the CEEMDAN-TAOST model. Located along the vital Han River tributary of the Yangtze River, the plants’ historical inflow data span 2435 days, from 1 January 2015 to 31 August 2021. This study revolves around the actual historical and meteorological data specific to the CH and YF plants and the Haidian District of Beijing, conducting an in-depth analysis of the plants’ water inflow mechanisms, with rainfall data obtained from the WQ Rainfall Station. A comprehensive historical dataset encompassing various factors was established, from which key predictive elements were extracted. Utilizing these elements, a predictive model focusing on the day-to-day difference decomposition and integrated deep learning has been developed, concentrating on the accurate prediction of water inflow to the CH and YF plants.

3.2. Data Normalization

Given the nature of time series prediction, which is composed of sequences across multiple dimensions and exhibits significant numerical disparity between decomposed and predicted sequences, it is therefore necessary to normalize the data during the preprocessing stage. In this study, all forecasting factors have been standardized to a common scale of [−1, 1]. The predictors are selected from historical lagged information in measured data and real-time forecasting data, and are categorized into three groups: water quantity data, including the influent flow rate of the wastewater treatment plant; climatic factors, comprising maximum temperature, minimum temperature, and rainfall; and other factors, such as holidays and residential water consumption. For the forecasting model in this study, the first round of forecasting models normalizes using Equation (19), while the Transformer model during decomposition integration normalizes using Equation (20):

X^{'} = \frac{X - X_{\min}}{X_{\max} - X_{\min}}

(19)

X^{'} = 2 * \frac{X - X_{\min}}{X_{\max} - X_{\min}} - 1

(20)

where

X^{'}

is the normalized value and

X

is the target sequence, with

X_{\max}

and

X_{\min}

representing the maximum and minimum values in the original sequence, respectively. Considering that the differenced series of observations includes both positive and negative amplitudes, Equation (20) is used for normalization in the integrated model. The above equations ensure that all features are scaled equivalently, circumventing gradient issues caused by inconsistencies in feature dimensions, and thereby promoting numerical stability during the model training process.

3.3. Model Configuration

To assure an impartial appraisal of the predictive performance, four forecasting schemes were established for comparative analysis: the ML model, decomposition integration model, comprehensive model under the Transformer, and comprehensive model under differencing scenarios. These evaluations utilized daily RWV data from the CH and YF reclaimed water plants, as detailed in Section 3.1.

All ML model inputs incorporated historical and real-time information, including rainfall, holidays, and maximum and minimum temperatures, along with measured RWV data. A decision tree was used to select influential factors from these inputs, and different combinations were tested in real-time forecasting to enhance model performance. This methodology allowed the models to capture key time series features, thereby improving the accuracy of the forecasts. For the RWV data, sequences lagged from 1 to 10 days were taken as pre-inputs, combining historical factors with influencing factors over these time frames (i.e., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 day lag). From these inputs, the 10 sequences most strongly correlated with the time series were selected to predict future data points. This approach enabled the model to grasp crucial time series features, hence increasing forecast precision.

To accommodate computational resource limitations and identify optimal model parameters, we adjusted the hyperparameters of LSTM through empirical methods, focusing on optimizing the number of hidden layers and neuron units, which were set to 1 and 7, respectively, while the batch size remained at 1000; other LSTM hyperparameters adopted default settings. Experimental analysis indicated that a single hidden layer outperformed multiple hidden layers on both the original series and the decomposed subsequences; consequently, we selected a single-layer architecture. The parameters for LR, SVR, and GBR models retained their default values. The differenced series was decomposed into nine subsequences with distinct features through CEEMDAN, each independently input into the Transformer model for prediction.

For the foundational configuration of the Transformer model, the historical data input stride was set at 30 days, with the future period for predictions fixed at 1 day. The training epochs were determined to be 500 to ensure the model adequately learned the complex patterns in the data. The chosen number of input features was 14, encompassing the differenced series of historical observations, ML models (including LR, SVR, GBR, and LSTM), and the CEEMDAN decomposed differenced series. The differential dataset was divided into three parts: 50% for model training, 10% for model validation, and the remaining 40% for the test set. For the core configuration of the Transformer model, after empirical testing, the embedding dimension was set at 290 to capture a rich representation of input features; the number of neurons in the hidden layer was 319, providing sufficient model complexity; the number of heads was set to 29, allowing the model to parallelly learn different aspects of the data; the number of blocks for both encoder and decoder was fixed at 3 to form a deep network structure. Additionally, a dropout rate of 0.05 was used during model training to prevent overfitting, the learning rate was set at 0.0001 to ensure stable convergence, and the batch size was maintained at 28. Finally, the predicted differenced series was combined with historical observations to obtain the final forecasting results, which were evaluated using the metrics outlined in Section 2.6.

4. Results

4.1. Experiment on Model Applicability under Temporal and Spatial Conditions

To validate the generalizability and applicability of the method proposed in this paper under different temporal and spatial conditions, we designed this experiment from two perspectives: time, divided into summer–autumn and winter–spring seasons; and space, divided into CH plant and YF plant. Specifically, we selected the 10 most correlated factors out of 35 potential influencing factors, which included maximum temperature, minimum temperature, rainfall, holidays, and the 1–7 lagged sequences of inflow.

Table 1 shows the high-correlation influencing factors and their correlations for the CH plant in different periods, sorted from high to low correlation. Factor n represents the n-th lag of the factor. Table 2 presents the same for the YF plant. From the tables, it is evident that the lagged sequences of inflow occupy a high correlation position in each experiment, with correlations exceeding 65%, indicating that inflow is a significant factor affecting the operation of reclaimed water plants. However, the correlation differences of factors like minimum temperature across different seasons and plants also indicate the need to consider spatiotemporal variations in the model, highlighting the necessity of dynamically adjusting input factors to improve prediction accuracy. Due to the limited capacity of the CH and YF plants, the separation of stormwater and sewage, and minimal rainfall entering the plants, the impact of rainfall is negligible. Therefore, the following experiments did not specifically train the model by distinguishing seasons. Given the generalizability of our model, we selected the CH plant for the subsequent experiments and analysis.

4.2. Comparative Results of Single Models

Initially, in the absence of model integration, single forecast models were employed to train on the original inflow sequence of the wastewater plant, with comparative analyses of the results presented in this segment. As depicted in Figure 5, the comparison models included each of the models within the integrated framework, all ensuring an NSE greater than 0.50 during the model selection process, indicating a certain level of accuracy and meeting the requirements. The selected single models encompassed four machine-learning models and two Transformer models in the first round of forecasting. The Transformer models consisted of the standard Transformer (abbreviated as TR in the figure) and its variant, the Time-Aware Outlier-Sensitive Transformer (abbreviated as TSTF in the figure). To ensure a fair comparison of evaluation metrics, the raw data were normalized to mitigate the impact of differing scales. Such standardization not only highlighted the relative relationships between input factors but also established a unified assessment framework. The four evaluation metrics outlined in Section 2.6 were used to assess the different single forecast models, with RMSE being normalized for graphical representation due to its larger scale.

The LR model exhibited superior performance in the single model assessment, improving MRE by 23.73% and 6.25% relative to GBR and LSTM models, respectively, demonstrating its precision in capturing data dynamics. Compared to SVR and Time-Aware Outlier-Sensitive Transformer models, the LR model achieved a 29.6% and 30.6% reduction in RMSE, signifying a significant advantage in prediction accuracy and implying a certain linearity in the plant inflow data, although numerous other influencing factors were also at play. Despite slightly underperforming compared to the LR model, SVR and GBR outperformed TS-Transformer, exhibiting their specific strengths. The LSTM model, commonly utilized for time series data, matched the RMSE of the LR model, indicating its strong potential in such data contexts. Considering LSTM’s capability to handle complex patterns, the results highlighted the LSTM model’s commendable forecasting precision and data dynamics capture ability.

In this evaluation, the three regression models and the LSTM model demonstrated higher forecasting accuracy. However, the performance of the Transformer models was relatively weaker, with lower NSE values. Although they surpassed the baseline consistency threshold, there was still room for improvement compared to other models. The TS-Transformer performance was marginally better than that of the standard Transformer, suggesting that variations of the Transformer could enhance the model’s own accuracy, yet they still lagged behind other machine learning models. This indicates that while variant Transformers can indeed offer effective improvements over the base model, their precision and predictive capacity are still suboptimal. The Transformer models and their variants did not exhibit the same forecasting ability as the other machine learning models, but they demonstrated superior feature extraction and integration in the integrated models discussed in Section 4.3, suggesting that better performance might be achievable under specific configurations.

The relatively poorer performance of Transformer models in single-model forecasting can be attributed to several key factors. First, although Transformers excel at handling sequential data with long-range dependencies, they often struggle with the short-term fluctuations and local patterns that are critical for accurate predictions of wastewater inflows. Second, the complexity of the Transformer architecture, including its self-attention mechanism, can lead to overfitting, especially when the training data are limited or noisy. This overfitting reduces the model’s ability to generalize to unseen data, resulting in poorer performance. Additionally, the baseline Transformer model lacks specific adaptations for handling time series data, such as mechanisms for capturing seasonality and trends, which are essential for precise time series forecasts. However, the introduction of the Time-Aware Outlier-Sensitive Transformer (TSTF) has led to improvements in accuracy, which will be discussed in detail in Section 4.4.

4.3. Comparative Results of Hybrid Forecasting Models

Although the forecasting results of the Transformer model were not entirely satisfactory in single model forecasts, research has shown the Transformer’s potent feature extraction capabilities [47]. With the assistance of decomposition models, it can maximize the integration of the strengths of each single model, thereby enhancing the overall forecast accuracy. Studies have shown that machine learning models vary in their selection of real-time factors; the higher the accuracy of machine learning models in the first round of forecasting, the better the integrated model’s forecasting outcomes [48]. To validate the feature extraction and integration capabilities of the Transformer, several commonly used integrated models were set up for comparative experiments. Only the results of the TS-Transformer were used to represent the Transformer model for comparison, and the differences from the basic Transformer will be elaborated in Section 4.4. Moreover, in this section, the prediction target for all integrated models was set as the differenced sequence of the inflow volume, with all decomposition models specifically focusing on the differenced sequence. The reasons for utilizing the differenced sequence instead of the original will be discussed in detail in Section 5.

Table 3 displays a comparison of evaluation metrics for the forecasting results of various integrated models during the testing process, including the following six models: Back Propagation (BP), Support Vector Machine (SVM), Gated Recurrent Unit (GRU), eXtreme Gradient Boosting (XGBoost), Long Short-Term Memory (LSTM), and Time-Aware Outlier-Sensitive Transformer (TSTF). It is evident from the table that the NSE for the ensemble of the TS-Transformer, LSTM, and XGBoost models reached the highest, with the ML-CEEMDAN-TSTF model achieving an astonishing NSE of 0.98, significantly surpassing other common integrated models, and improving by 10.83% and 16.19% compared to ML-CEEMDAN-LSTM and ML-CEEMDAN-XGBoost, respectively. The Transformer’s parallel processing capability significantly enhanced training efficiency, enabling the model to quickly adapt to new data. The flexibility of the Transformer to handle time series of varying lengths is key to dealing with wastewater data characterized by seasonality and non-stationarity. It is apparent that some models exhibited substantial precision improvements with integration compared to single model forecasts, while the ensemble effects of other models were mediocre, and in some cases, the ensemble results were even worse than the single forecast models, evidently not suited for model integration. Additionally, Table 3 contrasts the results of the same integrated models with and without prior differencing decomposition. It is observable that the differencing decomposition-enhanced TS-Transformer, LSTM, and XGBoost integrated models outperformed models without differencing decomposition by 7.91%, 4.23%, and 4.44%, respectively. Hence, it can be seen that differencing decomposition is not the primary factor for forecast accuracy but is indeed one of the key elements that can enhance the precision of model forecasts.

Figure 6 presents the relationship between the forecasted values combined with the CEEMDAN differencing decomposition model and the VMD differencing decomposition model and the actual values, involving three of the best-performing integrated models: XGBoost, LSTM, and the TSTF model utilized in this experiment. In the figure, the pink dots represent the collection of observed inflow data, while the blue line is the fitted line. The green and yellow histograms display the distribution of the predicted inflow values and the observed inflow values, respectively, and the purple and cyan curves represent the corresponding probability density functions. In addition, the figure provides the NSE values and fitting equations for each result, mathematically describing the predictive accuracy and the relationship with the observed inflow values. The comparison reveals that not only is there a substantial improvement in accuracy using integrated models over single forecast models, but the results also vary significantly when different decomposition models are applied for differencing decomposition integration. CEEMDAN and VMD have both been popular decomposition models in recent years; however, the outcomes demonstrate that the accuracy of XGBoost and LSTM increased by 18% and 12%, respectively, when using the CEEMDAN model compared to the VMD model. Meanwhile, the TSTF model still achieved an improvement of 0.03 in NSE = 0.95, with scatter points more uniformly and concentratedly distributed around the y = x line, indicating an exceptionally favorable forecasting performance. Therefore, in the models established by this experiment, CEEMDAN was chosen for decomposing the differenced series.

The superior performance of the CEEMDAN model can be attributed to its ability to effectively decompose non-stationary and nonlinear time series data into intrinsic mode functions (IMFs) with adaptive noise. This decomposition helps in isolating significant underlying patterns and removing noise, resulting in more accurate and stable predictions. CEEMDAN resolves the mode mixing problem encountered in traditional empirical mode decomposition and provides a more comprehensive representation of the data’s inherent characteristics. Additionally, by incorporating adaptive noise, CEEMDAN enhances the model’s robustness against data anomalies and fluctuations, which are common in real-world datasets. This allows models like XGBoost, LSTM, and TSTF to better capture and predict the intricate variations in the wastewater inflow data, thereby improving forecasting reliability.

4.4. Comparative Results of Transformer and TS-Transformer

In this experiment, we introduced into the Transformer model a time-aware outlier-sensitive loss function specifically designed for time series forecasting, aimed at enhancing the model’s sensitivity to temporal patterns and its efficacy in handling outliers. Figure 7 contrasts the training results of the Transformer and TS-Transformer within the decomposition integration model against the actual value series. In the figure, the black line represents the observed values, while the green and red lines represent the predicted results of the ML-CEEMDAN-TSTF and ML-CEEMDAN-Transformer models. The figure highlights two segments for comparison, and it is evident that the ML-CEEMDAN-TSTF model exhibits greater sensitivity and precision in scenarios of peaks and sharp increases than the Transformer model, which, despite generally accurate predictions, tends to overestimate high peak values. From the perspective of evaluation metrics, the employment of the time-aware outlier-sensitive loss function resulted in increases of 3.65% and 1.79% in NSE and Corr, respectively, while RMSE and MRE decreased by a substantial 42.52% and 38.55%. These enhancements yielded the optimal solution in this experiment, with an NSE reaching 0.98. It is undeniable that the ML-CEEMDAN-Transformer model already possessed forecasting results far superior to various other models, and the addition of the custom loss function further improved its performance.

The superior performance of the Time-Aware Outlier-Sensitive Transformer (TSTF) model compared to the standard Transformer model can be attributed to the introduction of the Time-Aware Outlier-Sensitive loss function. The time-aware component of the loss function assigns different weights to errors based on their temporal position. This means that errors occurring at critical time points (such as peaks or abrupt changes) are given higher weights. Meanwhile, the outlier-sensitive component of the loss function ensures that the model pays special attention to outliers by increasing the penalty for large prediction errors. Consequently, the introduction of the Time-Aware Outlier-Sensitive loss function enhances the model’s ability to detect and respond to peaks and abrupt changes, while ensuring that outliers are accurately handled without distorting the overall prediction. This improvement enables the TSTF model to outperform the standard Transformer, making it suitable for complex and noisy environments.

5. Discussion

5.1. Advantages of ML-CEEMDAN-TSTF

The ML-CEEMDAN-TSTF model presented in this paper boasts several key advantages, including the integration strengths of Transformer, the reinforced training benefits of the custom time-aware outlier-sensitive loss function, the feature extraction capabilities of CEEMDAN differencing decomposition, and the strategic selection of differenced sequences as the target for prediction.

The ensemble advantage of Transformer is manifested in its robust feature extraction capacity, particularly in capturing short-term dependencies and long-term trends in time series. As observed in Section 4.2, while the Transformer’s performance as a fundamental forecasting model was unremarkable, Section 4.3 demonstrated its formidable parallel capabilities in integrating various models’ features and strengths. This integration is crucial for addressing the challenges faced by conventional deep learning models when dealing with time series data characterized by seasonality and non-stationarity. By reducing the demand for training data volume, the Transformer not only diminishes reliance on computational resources but also significantly improves the efficiency of model training.

The enhanced training benefits of the time-aware outlier-sensitive loss function enable the model to predict sequences with anomalies or significant changes more accurately. The design of this loss function surmounts the limitations of traditional models, which struggle with the impact of differences between predictive and previous time points, effectively boosting the model’s responsiveness to unforeseen events and prediction accuracy.

The feature extraction prowess of differencing decomposition lies in its ability to meticulously dissect time series, identifying and eliminating non-stationarity and white noise, while also uncovering strong correlations with key influencing factors. This approach furnishes Transformer with purer and more representative input features, laying the groundwork for precise predictions.

Lastly, the benefit of choosing differenced sequences as the target for prediction lies in its efficacy in enhancing the model’s capacity to capture seasonal trends and cyclical patterns. By integrating several machine learning models capable of extracting lagged features before Transformer, past time points are effectively recognized and utilized, further bolstering the model’s ability to identify cyclical patterns. A detailed comparative analysis of these advantages will be the focus of Section 5.2.

In conclusion, ML-CEEMDAN-TSTF, with its distinctive design and structure, has successfully addressed several challenges faced by conventional time series forecasting models. It not only offers theoretical innovation but has also demonstrated superior performance in experimental settings, particularly when processing complex time series data exhibiting seasonal and non-stationary characteristics.

5.2. Advantages of Differencing in Decomposition Integration Models

Historically, two misconceptions have often arisen in forecasting. It is well-known that the factors influencing the current period’s data points are essential for forecasting results; however, in daily forecasting, using current-period meteorological information might suggest foreknowledge, while using data from the previous moment as influencing factors often yields suboptimal outcomes. Even if forecast data for the day can be obtained early, the actual operation of models in projects also requires time, as do the actions taken based on the forecasts—warnings, drills, and contingency plans—therefore, daily forecasts often cannot be applied timely to formulate corresponding strategies. This perception is a misconception; factors from the previous moment can significantly affect the actual model operation [49]. The second misconception is that models often overlook the significant impact of differences between previous days and the forecast day on the results, a problem we refer to as ‘differencing forgetfulness’. Many researchers and practitioners assume that models inherently possess the ability to recognize and utilize such temporal scale differences [50,51]. In reality, however, the complexity of time series data often exceeds the processing capabilities of basic model structures, especially when dealing with environmental data characterized by strong seasonality and sudden events. Therefore, this capability often needs to be explicitly introduced through specific data processing and model design. With these two misconceptions in mind, the key issue becomes: how can we enable models to extract the relationship between the differences from the previous day to the forecast day and the inflow volume?

To address this issue, an effective strategy is to introduce the differencing information as an explicit feature into the model. This means that instead of simply inputting sequential data points into the model, we calculate the differences between adjacent time points to provide the model with information about dynamic changes directly. The core of this approach is that it allows the model to focus directly on key change points within the time series, rather than being overwhelmed by stable or minor changes. Through this method, the model can more sensitively respond to key changes in the actual environment, such as sudden weather changes leading to sharp increases or decreases in flow, thereby enhancing the accuracy of predictions. Additionally, this method also helps the model better understand the nonlinear characteristics and non-stationarity of time series, further improving the model’s adaptability to complex environmental data and forecasting performance.

To validate this theoretical approach, we conducted comparative experiments using multiple groups of models, including ML-CEEMDAN-TSTF, ML-CEEMDAN-Transformer, and ML-TSTF, to forecast using both differenced sequences and original water volume sequences, with results illustrated in Figure 8. Forecasts using differenced sequences demonstrated improvements in NSE of no less than 3.16% across different models, yielding better fitting outcomes. Moreover, using differenced sequences as the forecasting sequence still achieved a 5.81% increase in accuracy in tasks without differencing decomposition, clearly indicating that the concept of differencing to enhance model accuracy proposed in this experiment applies not only to differencing decomposition but also as a training target for forecasting models.

5.3. Impact of Sliding Window Length on Forecast Accuracy in Real-Time Prediction

To address the demands of real-time forecasting, Transformer incorporates a sliding window mechanism that allows the model to continuously receive and update data for each batch to respond in real-time. Through its self-attention mechanism, Transformer can capture dependencies between different time points within the sliding window. This means that when processing sequence data provided by the sliding window, the model not only considers the positional information of each element within the sequence but also understands and utilizes the dynamic interactions among these elements, thereby achieving accurate predictions of future events.

Consequently, in our study’s ML-CEEMDAN-TSTF model, the selection of the sliding window length has been demonstrated to have a critical impact on improving forecast accuracy. To understand how the sliding window length affects model performance, we designed a series of experiments comparing the impact of different sliding lengths, ranging from 1 to 60 days, on model forecasting accuracy. These lengths were chosen to cover different time scales from short to long term, assessing the model’s ability to capture short-term dependencies and long-term trends in the time series.

Figure 9 shows the results of four evaluation metrics for the ML-CEEMDAN-TSTF model with slider lengths ranging from 1 to 60, highlighting the best result among them. Experimental results indicate that when the sliding window was set to 30 days, the forecast results achieved the lowest values for MSE and RMSE, and the highest for NSE and Corr, indicating optimal forecast precision. This finding reveals the model’s effectiveness in capturing monthly cyclical changes in the time series and their predictive impact on future water quantity variations. Compared to shorter lengths, such as 7 days and 10 days, a 30-day window provides sufficient historical information, allowing the model to learn richer temporal dependencies, thus enhancing prediction outcomes. However, increasing the window length further to 60 days, although providing access to longer historical data, introduces more noise and unnecessary information, potentially obscuring critical short-term and medium-term dependencies, leading to a decline in forecast accuracy.

The experimental results from this section emphasize the need to balance the sufficiency of historical information and the risk of introducing excessive noise in practical applications, to find the sliding window length that best fits specific data characteristics for optimal forecast results.

5.4. Limitations and Prospects

In order to enhance the reliability of predictive models and reduce the impact of uncertainties in meteorological forecasting, this study employs historical observational values as inputs to the model, explicitly excluding information from the forecast day itself to mitigate overreliance on the accuracy of meteorological forecasts. Despite this, technological advancements and innovations in models have led to considerable objectivity in recent meteorological forecast accuracy, with daily predictions of rainfall and temperature exceeding 90% accuracy. Although the methodology used in this paper ensures the accuracy of input data, it also introduces certain limitations. Primarily, relying on historical data means that the most current meteorological forecast information cannot be fully utilized, which may restrict the model’s timeliness and forward-looking capabilities in practical applications. Additionally, this approach may cause the model to overly adapt to specific patterns in historical data, overlooking the actual impact of changes in meteorological conditions on reclaimed water output, thus affecting the model’s adaptability and forecast accuracy when faced with future changes in weather conditions. Therefore, in practical applications, forecasts often require not only the next time point but also daily predictions for a longer future period. Whether to calibrate the model using meteorological forecast information or the model’s own prediction data remains to be further explored through experimental analysis and consideration in future research.

In this study, our proposed ML-CEEMDAN-TSTF differencing model has made significant progress in enhancing the predictive capability for reclaimed water inflow, achieving forecast accuracy above 98.2%. However, the model faces challenges in algorithm optimization and model simplification. Due to its integration of various complex data processing and forecasting techniques, which have enhanced its ability to handle complex time series, the model also incurs higher operational times and resource consumption. Additionally, calibrating the model’s parameters requires substantial effort, a limitation that becomes particularly evident in scenarios dealing with large datasets or requiring real-time predictions, potentially restricting the model’s widespread application and dissemination. Future research should thus focus on developing more efficient algorithm optimization techniques and model simplification strategies, such as by streamlining model structures and optimizing algorithmic processes to significantly enhance operational efficiency without significantly sacrificing forecast accuracy. Furthermore, exploring the design of lightweight models, such as by reducing the number of model parameters or employing model compression techniques, could not only reduce the demand for computational resources but also simplify the model’s tuning and maintenance processes, enhancing usability and practicality.

This research includes a variety of predictive approaches ranging from traditional machine learning models like support vector regression and gradient boosting regression to deep learning models based on neural networks like Transformer and LSTM. During the experimental process, we found that although neural network-based models excel at learning deep features and complex dimensions of data, capturing more detailed data patterns, they do not match the computational stability of traditional machine learning models. Specifically, we observed that neural network-based models exhibit significant fluctuations in results during continuous training, indicating a lack of robustness. This variability could stem from deep learning models’ high sensitivity to noise and outliers in the data and instability in optimizing within a large parameter space. In contrast, traditional machine learning models, due to their relatively simple and well-defined mathematical frameworks, typically provide more stable and predictable forecast results. Future research should therefore focus on how to improve the robustness of neural network-based models. Developing hybrid predictive models that combine the advantages of deep learning models and traditional machine learning models might be an effective direction, aiming to utilize the high representational capabilities of deep learning and the stability of traditional models to achieve higher forecast accuracy and better stability.

A key model in this study, Transformer, features its self-attention mechanism as one of its core attributes, allowing it to effectively capture dependencies between elements when processing sequence data. The successful application of the self-attention mechanism spans multiple domains, including natural language processing, time series forecasting, and image processing. Although the self-attention mechanism enables the model to capture long-distance dependencies, its computational complexity increases significantly with sequence length, limiting its application in processing very long sequence data. Future research could explore new variants of attention mechanisms, such as sparse self-attention and local self-attention, to reduce computational costs and enhance the model’s capability to handle long sequences. Additionally, multimodal learning has become a recent research hotspot and is highly compatible with the Transformer model. The self-attention mechanism shows tremendous potential in integrating information from different modalities, such as combining text, images, and audio data for comprehensive understanding and analysis. Future work could further explore how to optimize the self-attention mechanism to more effectively blend and process multimodal data, for instance, whether surface water flow, changes in water body areas, or sound signals related to water flow can be incorporated as key data information into the model, further enhancing its performance.

6. Conclusions

The ML-CEEMDAN-TSTF differencing decomposition integration model has been proposed, which is a time series forecasting framework that incorporates machine learning, CEEMDAN differencing decomposition, and the TS-Transformer to enhance the overall performance, reliability, and efficiency of the decomposition integration model. It particularly addresses the issue of differencing forgetfulness. This model is designed to tackle key challenges in forecasting inflow volumes at reclaimed water plants, including handling seasonal and non-stationary features, as well as improving sensitivity and prediction accuracy for peak and anomalous conditions. To validate the ML-CEEMDAN-TSTF model, extensive comparative analyses were conducted, including non-decomposed ML models, time-aware outlier-sensitive loss functions, TSTF integrated models, TS-Transformer decomposition integrated models, and TS-Transformer differencing decomposition integrated models. The study utilized daily RWV data from the CH and YF reclaimed water plants in Beijing from February 2018 to February 2021. The main findings can be summarized as follows:

(1) The introduction of time-scale differencing information as an explicit feature resolved the differencing forgetfulness issue, enhancing the model’s sensitivity to temporal changes and its dynamic adaptability to predictions;

(2) TS-Transformer demonstrated significant improvements in predictive performance and accuracy over traditional Transformers, particularly in handling outliers and peaks within time series;

(3) Differencing decomposition techniques provided rich decision-making information, significantly enhancing the model’s overall prediction accuracy, especially in analyzing intrinsic features of the time series;

(4) Although Transformer exhibited average performance in standalone prediction tasks, it demonstrated superior feature extraction and data handling capabilities within integrated model applications, effectively enhancing the predictive performance of the integrated models.

In summary, the ML-CEEMDAN-TSTF differencing decomposition integration model has significantly improved the dynamic adaptability and predictive accuracy of RWV in practical scenarios. Due to the ongoing improvement of supporting equipment and the need for coordination among multiple departments for real-time monitoring data, the model and results are expected to be applied in the future once these issues are resolved. Future research will explore more efficient algorithm optimization strategies and attention mechanism techniques to further enhance the model’s computational efficiency and real-time prediction capabilities, expanding its potential applications in multimodal data fusion and analysis.

Author Contributions

Conceptualization, X.S.; Data curation, L.Z. and Y.Y.; Formal analysis, X.S. and L.Z.; Funding acquisition, C.W. and H.W.; Investigation, X.S. and Y.Y.; Methodology, X.S.; Project administration, C.W.; Resources, C.W. and L.Z.; Software, X.S. and L.Z.; Supervision, H.W.; Validation, X.S.; Visualization, X.S.; Writing—original draft, X.S.; Writing—review and editing, C.W. and L.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by National Key R&D Program of China (2022YFC3204603), Major Science and Technology Projects of the Ministry of Water Resources of China (SKS-2022117).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Wang, J.; Li, Z.; Chen, Y. The Spatiotemporal Evolution of Socioeconomic Drought in the Arid Area of Northwest China Based on the Water Poverty Index. J. Clean. Prod. 2023, 401, 136719. [Google Scholar] [CrossRef]
Zhang, X.; Jiang, L.; Kittel, C.M.M.; Yao, Z.; Nielsen, K.; Liu, Z.; Wang, R.; Liu, J.; Andersen, O.B.; Bauer-Gottwein, P. On the Performance of Sentinel-3 Altimetry Over New Reservoirs: Approaches to Determine Onboard A Priori Elevation. Geophys. Res. Lett. 2020, 47, e2020GL088770. [Google Scholar] [CrossRef]
Zhang, L.; Wang, C.; Yu, Y.; Duan, C.; Lei, X.; Chen, B.; Wang, H.; Zhang, R.; Wang, Y. Real-Time Optimization of Urban Channel Gate Control Based on a Segmentation Hydraulic Model. J. Hydrol. 2023, 625, 130029. [Google Scholar] [CrossRef]
Guan, X.; Guo, Z.; Wang, X.; Xiang, S.; Sun, T.; Zhao, R.; He, J.; Liu, F. Transfer Route and Driving Forces of Antibiotic Resistance Genes from Reclaimed Water to Groundwater. Environ. Pollut. 2023, 330, 121800. [Google Scholar] [CrossRef] [PubMed]
Sonnenschein, B.; Ziel, F. Probabilistic Intraday Wastewater Treatment Plant Inflow Forecast Utilizing Rain Forecast Data and Sewer Network Sensor Data. Water Resour. Res. 2023, 59, e2022WR033826. [Google Scholar] [CrossRef]
Montoya-Coronado, V.A.; Tedoldi, D.; Castebrunet, H.; Molle, P.; Lipeme Kouyi, G. Data-Driven Methodological Approach for Modeling Rainfall-Induced Infiltration Effects on Combined Sewer Overflow in Urban Catchments. J. Hydrol. 2024, 632, 130834. [Google Scholar] [CrossRef]
Aliashrafi, A.; Zhang, Y.; Groenewegen, H.; Peleato, N.M. A Review of Data-Driven Modelling in Drinking Water Treatment. Rev. Environ. Sci. Biotechnol. 2021, 20, 985–1009. [Google Scholar] [CrossRef]
Feng, D.; Liu, J.; Lawson, K.; Shen, C. Differentiable, Learnable, Regionalized Process-Based Models With Multiphysical Outputs Can Approach State-Of-The-Art Hydrologic Prediction Accuracy. Water Resour. Res. 2022, 58, e2022WR032404. [Google Scholar] [CrossRef]
Zhang, X.; Jiang, L.; Liu, Z.; Kittel, C.M.M.; Yao, Z.; Druce, D.; Wang, R.; Tøttrup, C.; Liu, J.; Jiang, H.; et al. Flow Regime Changes in the Lancang River, Revealed by Integrated Modeling with Multiple Earth Observation Datasets. Sci. Total Environ. 2023, 862, 160656. [Google Scholar] [CrossRef] [PubMed]
Cogan, N.G.; Chellam, S. Global Parametric Sensitivity Analysis of a Model for Dead-End Microfiltration of Bacterial Suspensions. J. Membr. Sci. 2017, 537, 119–127. [Google Scholar] [CrossRef]
Liu, J.; Qi, H.; Song, Y.; Chen, S.; Wu, D. Dynamics of Bubbles Detached from Non-Circular Orifices: Confinement Effect of Orifice Boundary. Chem. Eng. J. 2023, 471, 144777. [Google Scholar] [CrossRef]
Yang, J.; Choudhary, G.I.; Rahardja, S.; Fränti, P. Classification of Interbeat Interval Time-Series Using Attention Entropy. IEEE Trans. Affective Comput. 2023, 14, 321–330. [Google Scholar] [CrossRef]
Von Rueden, L.; Mayer, S.; Beckh, K.; Georgiev, B.; Giesselbach, S.; Heese, R.; Kirsch, B.; Walczak, M.; Pfrommer, J.; Pick, A.; et al. Informed Machine Learning—A Taxonomy and Survey of Integrating Prior Knowledge into Learning Systems. IEEE Trans. Knowl. Data Eng. 2021, 35, 614–633. [Google Scholar] [CrossRef]
Garg, R.; Barpanda, S.; Salanke, G.R.S.N.; Ramya, S. Machine Learning Algorithms for Time Series Analysis and Forecasting. arXiv 2022, arXiv:2211.14387. [Google Scholar] [CrossRef]
Kramar, V.; Alchakov, V. Time-Series Forecasting of Seasonal Data Using Machine Learning Methods. Algorithms 2023, 16, 248. [Google Scholar] [CrossRef]
Zheng, Y.; Ding, J.; Liu, F.; Wang, D. Adaptive Neural Decision Tree for EEG Based Emotion Recognition. Inf. Sci. 2023, 643, 119160. [Google Scholar] [CrossRef]
Tidke, P.; Sarode, S.; Guhe, S. A Review on Weather Forecasting Using Linear Regression. ISJEM 2023, 2. [Google Scholar] [CrossRef]
Salditt, M.; Humberg, S.; Nestler, S. Gradient Tree Boosting for Hierarchical Data. Multivar. Behav. Res. 2023, 58, 911–937. [Google Scholar] [CrossRef] [PubMed]
Trần, V.A.; Hà, T.K.; Khúc, T.Đ.; Lê, T.N.; Trần, H.H.; Doãn, H.P. Nghiên Cứu Khả Năng Của Mô Hình Học Máy GB và SVR Trong Thành Lập Bản Đồ Nguy Cơ Lún Đất Khu Vực Bán Đảo Cà Mau, Việt Nam. VNJHM 2024, 1, 60–73. [Google Scholar] [CrossRef]
Chimphlee, W.; Chimphlee, S. Hyperparameters Optimization XGBoost for Network Intrusion Detection Using CSE-CIC-IDS 2018 Dataset. IJ-AI 2024, 13, 817. [Google Scholar] [CrossRef]
Safder, U.; Kim, J.; Pak, G.; Rhee, G.; You, K. Investigating Machine Learning Applications for Effective Real-Time Water Quality Parameter Monitoring in Full-Scale Wastewater Treatment Plants. Water 2022, 14, 3147. [Google Scholar] [CrossRef]
Tian, B.; Sun, Z.; Hong, H. A Review of Time Series Prediction Methods Based on Deep Learning. In Proceedings of the 2023 2nd International Conference on Artificial Intelligence and Computer Information Technology (AICIT), Yichang, China, 15 September 2023; pp. 1–6. [Google Scholar]
Ao, S.-I.; Fayek, H. Continual Deep Learning for Time Series Modeling. Sensors 2023, 23, 7167. [Google Scholar] [CrossRef] [PubMed]
Giaretta, A.; Bisiacco, M.; Pillonetto, G. Kernel-Based Function Learning in Dynamic and Non Stationary Environments. arXiv 2023, arXiv:2310.02767. [Google Scholar] [CrossRef]
Lin, Z.; Lu, Z.; Di, Z.; Tang, Y. Learning Noise-Induced Transitions by Multi-Scaling Reservoir Computing. arXiv 2023, arXiv:2309.05413. [Google Scholar] [CrossRef]
Casolaro, A.; Capone, V.; Iannuzzo, G.; Camastra, F. Deep Learning for Time Series Forecasting: Advances and Open Problems. Information 2023, 14, 598. [Google Scholar] [CrossRef]
Zhang, Y.; Suzuki, G.; Shioya, H. Prediction and Detection of Sewage Treatment Process Using N-BEATS Autoencoder Network. IEEE Access 2022, 10, 112594–112608. [Google Scholar] [CrossRef]
Mahmmoud, Z.; Al-Rahim, A. Application of the Empirical Mode Decomposition Method to Noise Reduction Using Seismic Event Data Recorded at Kirkuk Seismological Station. IGJ 2023, 56, 135–142. [Google Scholar] [CrossRef]
Samal, P.; Hashmi, M.F. Ensemble Median Empirical Mode Decomposition for Emotion Recognition Using EEG Signal. IEEE Sens. Lett. 2023, 7, 7001–7004. [Google Scholar] [CrossRef]
Wang, L.; Li, H.; Xi, T.; Wei, S. Fault Feature Extraction Method for Rolling Bearings Based on Complete Ensemble Empirical Mode Decomposition with Adaptive Noise and Variational Mode Decomposition. Sensors 2023, 23, 9441. [Google Scholar] [CrossRef]
Zhao, Z.; Yun, S.; Jia, L.; Guo, J.; Meng, Y.; He, N.; Li, X.; Shi, J.; Yang, L. Hybrid VMD-CNN-GRU-Based Model for Short-Term Forecasting of Wind Power Considering Spatio-Temporal Features. Eng. Appl. Artif. Intell. 2023, 121, 105982. [Google Scholar] [CrossRef]
Li, Y.; Zhou, J.; Li, H.; Meng, G.; Bian, J. A Fast and Adaptive Empirical Mode Decomposition Method and Its Application in Rolling Bearing Fault Diagnosis. IEEE Sens. J. 2023, 23, 567–576. [Google Scholar] [CrossRef]
Clark, S.R.; Pagendam, D.; Ryan, L. Forecasting Multiple Groundwater Time Series with Local and Global Deep Learning Networks. IJERPH 2022, 19, 5091. [Google Scholar] [CrossRef]
Köylü, F.; Ülker, M. Machine Learning-Based Bitcoin Time Series Analysis and Price Prediction. In Proceedings of the 2023 Innovations in Intelligent Systems and Applications Conference (ASYU), Sivas, Turkiye, 11 October 2023; pp. 1–5. [Google Scholar]
Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. arXiv 2017, arXiv:1706.03762. [Google Scholar] [CrossRef]
Huang, Z.; Zhang, L.; Zhang, Y.; Qian, S.; Wang, C. Transformer Based Multi-Output Regression Learning for Wastewater Treatment. In Proceedings of the 2021 IEEE 33rd International Conference on Tools with Artificial Intelligence (ICTAI), Washington, DC, USA, 1–3 November 2021; pp. 698–703. [Google Scholar]
Peng, C.; Fanchao, M. Fault Detection of Urban Wastewater Treatment Process Based on Combination of Deep Information and Transformer Network. IEEE Trans. Neural Netw. Learn. Syst. 2024, 35, 1–10. [Google Scholar] [CrossRef] [PubMed]
Wang, W.; Liu, Y.; Sun, H. TLNets: Transformation Learning Networks for Long-Range Time-Series Prediction. arXiv 2023, arXiv:2305.15770. [Google Scholar] [CrossRef]
Xuan, A.; Yin, M.; Li, Y.; Chen, X.; Ma, Z. A Comprehensive Evaluation of Statistical, Machine Learning and Deep Learning Models for Time Series Prediction. In Proceedings of the 2022 7th International Conference on Data Science and Machine Learning Applications (CDMA), Riyadh, Saudi Arabia, 1–3 March 2022; pp. 55–60. [Google Scholar]
Pang, L.; Liu, W.; Wu, L.; Xie, K.; Guo, S.; Chalapathy, R.; Wen, M. Applied Machine Learning Methods for Time Series Forecasting. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management, Atlanta, GA, USA, 17 October 2022; pp. 5175–5176. [Google Scholar]
Xie, Y.; Chen, Y.; Lian, Q.; Yin, H.; Peng, J.; Sheng, M.; Wang, Y. Enhancing Real-Time Prediction of Effluent Water Quality of Wastewater Treatment Plant Based on Improved Feedforward Neural Network Coupled with Optimization Algorithm. Water 2022, 14, 1053. [Google Scholar] [CrossRef]
Pylov, P.; Maitak, R.; Protodyakonov, A. Algebraic Reconfiguration of LSTM Network for Automated Video Data Stream Analytics Using Applied Machine Learning. E3S Web Conf. 2023, 458, 09023. [Google Scholar] [CrossRef]
Zhang, Y. Encoder-Decoder Models in Sequence-to-Sequence Learning: A Survey of RNN and LSTM Approaches. ACE 2023, 22, 218–226. [Google Scholar] [CrossRef]
Kratzert, F.; Klotz, D.; Brenner, C.; Schulz, K.; Herrnegger, M. Rainfall–Runoff Modelling Using Long Short-Term Memory (LSTM) Networks. Hydrol. Earth Syst. Sci. 2018, 22, 6005–6022. [Google Scholar] [CrossRef]
Kavis, A.; Levy, K.Y.; Cevher, V. High Probability Bounds for a Class of Nonconvex Algorithms with AdaGrad Stepsize. arXiv 2022, arXiv:2204.02833. [Google Scholar] [CrossRef]
Xu, D.; Zhang, S.; Zhang, H.; Mandic, D.P. Convergence of the RMSProp Deep Learning Method with Penalty for Nonconvex Optimization. Neural Netw. 2021, 139, 17–23. [Google Scholar] [CrossRef] [PubMed]
Yang, Z.; Zhang, Q.; Chang, W.; Xiao, P.; Li, M. EGFormer: An Enhanced Transformer Model with Efficient Attention Mechanism for Traffic Flow Forecasting. Vehicles 2024, 6, 120–139. [Google Scholar] [CrossRef]
Zhang, L.; Wang, C.; Hu, W.; Wang, X.; Wang, H.; Sun, X.; Ren, W.; Feng, Y. Dynamic Real-Time Forecasting Technique for Reclaimed Water Volumes in Urban River Environmental Management. Environ. Res. 2024, 248, 118267. [Google Scholar] [CrossRef]
Kostić, M.; Jovanović, A. Modeling of Real Combat Operations. J. Process. Manag. New Technol. 2023, 11, 39–56. [Google Scholar] [CrossRef]
Chen, Z.; Ma, Q.; Lin, Z. Time-Aware Multi-Scale RNNs for Time Series Modeling. In Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, Montreal, QC, Canada, 19–26 August 2021; pp. 2285–2291. [Google Scholar]
D’Amato, V.; Oneto, L.; Camurri, A.; Anguita, D.; Zarandi, Z.; Fadiga, L.; D Ausilio, A.; Pozzo, T. The Importance of Multiple Temporal Scales in Motion Recognition: From Shallow to Deep Multi Scale Models. In Proceedings of the 2022 International Joint Conference on Neural Networks (IJCNN), Padua, Italy, 18 July 2022; pp. 1–9. [Google Scholar]

Figure 1. Flowchart of forecasting decomposition integration framework.

Figure 2. Schematic diagram of decision tree structure.

Figure 3. Schematic flow diagram of TS-Transformer model.

Figure 4. The flowchart of the real-time forecast experiment.

Figure 5. Bar chart of prediction accuracy indicators for different single machine learning models in a wastewater reclamation plant.

Figure 6. The relationship between predicted and observed influent volumes at a reclaimed water plant using ML-VMD-XGBoost (a), ML-VMD-LSTM (b), ML-VMD-TSTF (c), ML-CEEMDAN-XGBoost (d), ML-CEEMDAN-LSTM (e), and ML-CEEMDAN-TSTF (f).

Figure 7. Comparison of influent volumes observed and predicted under different Transformer models at a reclaimed water plant. The black line represents the observed process, the red line represents ML-CEEMDAN-Transformer predicted process, and the green line represents ML-CEEMDAN-TSTF predicted process.

Figure 8. The relationship between predicted and observed influent volumes at a reclaimed water plant with and without the application of a differential concept, using ML-CEEMDAN-TSTF (a,d), ML-CEEMDAN-Transformer (b,e), ML-TSTF (c,f), where (a–c) are scenarios without differential application, and (d–f) are scenarios with differential application.

Figure 9. Diagram illustrating the influence of slider length on forecasting accuracy of the ML-CEEMDAN-TSTF model.

Table 1. High-correlation influencing factors for CH plant under different temporal conditions.

Summer–Autumn		Winter–Spring
Sequence	Correlation (%)	Sequence	Correlation (%)
Inflow-1	94.3	Inflow-1	95.3
Inflow-2	90.3	Inflow-2	93.9
Inflow-3	88.8	Inflow-3	93.7
Inflow-4	87.9	Inflow-4	93.5
Inflow-5	86.8	Inflow-7	92.8
Inflow-6	85.5	Inflow-5	92.7
Inflow-7	84.6	Inflow-6	92.6
Min Temperature-7	25.8	Min Temperature-1	4.0
Min Temperature-4	25.6	Min Temperature-2	3.8
Min Temperature-6	25.5	Min Temperature-3	3.7

Table 2. High-correlation influencing factors for YF plant under different temporal conditions.

Summer–Autumn		Winter–Spring
Sequence	Correlation (%)	Sequence	Correlation (%)
Inflow-1	86.5	Inflow-1	90.1
Inflow-2	79.8	Inflow-2	82.7
Inflow-3	73.7	Inflow-3	77.7
Inflow-4	71.5	Inflow-4	73.9
Inflow-5	69.2	Inflow-5	73.2
Inflow-6	67.4	Inflow-6	72.7
Inflow-7	65.1	Inflow-7	71.8
Min Temperature-6	28.2	Min Temperature-7	14.5
Min Temperature-7	28.2	Min Temperature-6	14.0
Min Temperature-3	28.1	Min Temperature-5	13.5

Table 3. Comparative performance metrics of various ensemble models under decomposition and non-decomposition conditions.

Type	Model	RMSE (m³)	NSE (%)	MRE (%)	Corr (%)
Non-Decomposition	ML-BP	750.9	25.5	5.9	70.8
	ML-SVM	671.0	40.5	4.8	70.9
	ML-GRU	661.4	42.2	4.8	71.1
	ML-XGBoost	379.3	81.0	3.4	95.0
	ML-LSTM	336.3	85.1	2.9	94.9
	ML-TSTF	259.9	91.1	2.1	96.5
Decomposition	ML-CEEMDAN-BP	687.5	37.6	5.0	70.9
	ML-CEEMDAN-SVM	671.0	40.5	4.8	70.8
	ML-CEEMDAN-GRU	602.9	52.0	4.4	75.5
	ML-CEEMDAN-XGBoost	341.2	84.6	3.2	96.8
	ML-CEEMDAN-LSTM	293.0	88.7	2.7	97.8
	ML-CEEMDAN-TSTF	113.8	98.3	0.9	99.3

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Sun, X.; Zhang, L.; Wang, C.; Yang, Y.; Wang, H. Dynamic Real-Time Prediction of Reclaimed Water Volumes Using the Improved Transformer Model and Decomposition Integration Technology. Sustainability 2024, 16, 6598. https://doi.org/10.3390/su16156598

AMA Style

Sun X, Zhang L, Wang C, Yang Y, Wang H. Dynamic Real-Time Prediction of Reclaimed Water Volumes Using the Improved Transformer Model and Decomposition Integration Technology. Sustainability. 2024; 16(15):6598. https://doi.org/10.3390/su16156598

Chicago/Turabian Style

Sun, Xiangyu, Lina Zhang, Chao Wang, Yiyang Yang, and Hao Wang. 2024. "Dynamic Real-Time Prediction of Reclaimed Water Volumes Using the Improved Transformer Model and Decomposition Integration Technology" Sustainability 16, no. 15: 6598. https://doi.org/10.3390/su16156598

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Dynamic Real-Time Prediction of Reclaimed Water Volumes Using the Improved Transformer Model and Decomposition Integration Technology

Abstract

1. Introduction

2. Materials and Methods

2.1. Research Framework

2.2. Single Forecast Models

2.2.1. Regression Models

2.2.2. Long Short-Term Memory (LSTM) Networks

2.3. Differencing Decomposition Based on the CEEMDAN Algorithm

2.4. Decision Tree for Factor Importance Measurement

2.5. Ensemble Model

2.5.1. Time-Aware Outlier-Sensitive Transformer (TS-Transformer)

2.5.2. Real-Time Forecast

2.6. Performance Evaluation Criteria

3. Case Study

3.1. Study Area and Data

3.2. Data Normalization

3.3. Model Configuration

4. Results

4.1. Experiment on Model Applicability under Temporal and Spatial Conditions

4.2. Comparative Results of Single Models

4.3. Comparative Results of Hybrid Forecasting Models

4.4. Comparative Results of Transformer and TS-Transformer

5. Discussion

5.1. Advantages of ML-CEEMDAN-TSTF

5.2. Advantages of Differencing in Decomposition Integration Models

5.3. Impact of Sliding Window Length on Forecast Accuracy in Real-Time Prediction

5.4. Limitations and Prospects

6. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI