A Rapid Prediction Method for Key Information of the Urban Flood Control Engineering System Based on Machine Learning: An Empirical Study of the Wusha River Basin

Hu, Yaosheng; Tang, Ming; Ma, Shuaitao; Zhu, Zihan; Zhou, Qin; Xie, Qianchen; Wu, Yuze

doi:10.3390/w17060784

Open Access Article


by Yaosheng Hu ¹, Ming Tang ¹,²,*, Shuaitao Ma ¹, Zihan Zhu ¹, Qin Zhou ¹, Qianchen Xie ³ and Yuze Wu ⁴

¹ School of Hydraulic Engineering, Nanchang Institute of Technology, Nanchang 330099, China
² Jiangxi Provincial Key Laboratory of Water Resources Allocation and Efficient Utilization, Nanchang 330099, China
³ School of Resources & Environment, Nanchang University, Nanchang 330031, China
⁴ College of Water Conservancy & Hydropower Engineering, Hohai University, Nanjing 210024, China
* Author to whom correspondence should be addressed.

Water 2025, 17(6), 784; https://doi.org/10.3390/w17060784

Submission received: 12 January 2025 / Revised: 2 March 2025 / Accepted: 6 March 2025 / Published: 8 March 2025


Abstract

With the intensification of global climate change, the frequency and intensity of urban flood disasters have increased significantly, highlighting the need for a scientific assessment of urban flood risk. However, most existing studies focus on the spatial distribution of urban flood data and their socio-economic impacts, paying limited attention to the urban flood control engineering system (UFCES) itself and to the analysis of urban flood risk from the perspective of the degree of system failure. To address this gap, we proposed a rapid prediction method for key information of the UFCES based on machine learning. To improve the accuracy and timeliness of prediction, we employed a coupled modeling approach that integrates physical mechanisms with data-driven methods. Taking the Wusha River Basin in Nanchang City as a case study, we used an urban flood mechanism model to generate the training, validation, and testing datasets for machine learning. We then compared the prediction performance of four machine learning models: random forest (RF), XGBoost (XGB), support vector regression (SVR), and the backpropagation neural network (BP). The results indicate that the XGB model provides more stable and accurate simulation of key information, with Nash coefficient (R²) values above 0.87 and relative error (RE) values below 0.06. The XGB model also showed clear advantages in simulation speed and generalization performance. Furthermore, we explored methods for selecting key information indicators and for generating the samples required by the coupled model. These findings support the rapid prediction of key information in the UFCES, improve the technical level of urban flood simulation, and provide richer information for urban flood risk management.

Keywords:

urban flood; urban flood control engineering system (UFCES); machine learning models; rapid simulation; stratified sampling strategy

1. Introduction

Since the mid-20th century, global warming has become increasingly evident, with extreme weather events becoming more frequent and intense [1]. From 17 to 23 July 2021, Henan Province in China experienced an extremely rare torrential rainstorm. In some areas, the 1-h rainfall reached a once-in-a-millennium level, triggering severe floods that affected 14.786 million people and caused direct economic losses of CNY 120.06 billion. In July 2022, consecutive heavy rains in eastern Kentucky, USA, caused flooding that resulted in at least 37 fatalities and displaced thousands of people [2]. Flooding caused by extreme rainfall poses a serious and persistent threat to daily life and socio-economic development. Therefore, monitoring, simulating, assessing, and issuing warnings for urban floods under heavy rain, and taking preventive measures to mitigate their extensive impacts, are major concerns for the international community, governments, and academia [3,4]. Moreover, with the advancement of artificial intelligence, the application of machine learning to urban flood information simulation has become a research hotspot.

In urban flood simulation, scenario-based assessment using physically based models has become the predominant approach in flood risk research. Scholars have studied surface models, river network models, and underground pipe network models in depth, as well as their coupling mechanisms, yielding fruitful outcomes [5,6]. Widely used urban flood simulation models include SWMM, the MIKE series, InfoWorks ICM, HEC-RAS, TUFLOW, STORM, and IFMS Urban. These models can provide the spatiotemporal evolution of water level, flow rate, flow velocity, and waterlogging depth under various rainfall scenarios. However, they require large amounts of comprehensive input data and substantial computational resources, and their simulations take a long time. To improve efficiency, scholars have used GPU acceleration or supercomputers to speed up hydrological and hydraulic models [7]. Nevertheless, as cities develop, the increasing complexity of the underlying surface, pipe networks, and surface channels leads to a dramatic increase in the number of computational grid cells and in the complexity of the physical processes. Even with GPUs, supercomputers, or other acceleration methods, these models still fail to meet the time-sensitive requirements of urban emergency management.

With the advancement of machine learning and related technologies, integrating machine learning with physically based flood simulation has become a frontier of urban flood risk assessment [8]. In recent years, coupling machine learning models, such as neural networks and decision trees, with numerical models for urban flood prediction has developed rapidly, greatly improving the timeliness of simulations [9]. Wang J et al. [10] proposed a flood prediction method that uses a BP neural network to correct the Xinanjiang (XAJ) hydrological model; the BP neural network effectively reduced the prediction errors of the XAJ model and performed stably in real-time correction. Wang Z et al. [11] used a random forest (RF) model to assess regional flood disaster risk, validating the feasibility and rationality of RF for flood risk evaluation. Ma et al. [12] introduced a flash flood risk assessment method based on XGBoost that achieved a test accuracy of 0.84, highlighting the model's robustness. Li et al. [13] developed a flood prediction model that integrates support vector machine (SVM) modeling with boosting algorithms, which significantly improved prediction accuracy, particularly for complex flood data. Alipour et al. [14] employed both artificial neural networks (ANNs) and support vector regression (SVR) to simulate two-dimensional hydrodynamic flood models and found that SVR outperformed ANN in predicting water levels. These studies highlight the considerable potential of machine learning in flood simulation and prediction and provide strong support for flood disaster prevention and management. However, these models also face challenges, including a limited range of predicted indicators, high computational demands, and insufficient generalization performance [15].

Urbanization has transformed urban flooding in large cities into a complex three-dimensional dynamic system, in which the once-linear process of disaster formation has evolved into a network of interdependent feedback loops and secondary disasters. In particular, major cities in southern China lie close to large rivers, and their drainage systems are tightly connected to these water bodies. As a result, these cities are more susceptible to the combined impacts of external floods and internal waterlogging, which exacerbates urban flood disasters. External floods can impede the timely discharge of rainwater from protected urban areas, causing river and lake levels to rise, which further weakens the capacity of the drainage network. In severe cases, floodwater can flow back into the city, triggering even more severe waterlogging [16]. The UFCES, composed of subsystems such as the water conservancy drainage system (river embankments, pumping stations, etc.) and the urban drainage system (municipal pipe networks, stormwater pumping stations), is therefore often at risk of partial or even complete failure.

Given these challenges, it is crucial to focus on the UFCES itself and to assess urban flood risk from the perspective of system failure. However, real-time monitoring data are often scarce in urban stormwater management practice. It is therefore necessary to develop a coupled modeling approach that integrates physically based models with data-driven methods to simulate key information of the UFCES, so that the performance metrics of each subsystem and the overall degree of system failure can be reasonably evaluated. We first employed a physically based scenario simulation method to obtain urban flood information, which served as the dataset for constructing the machine learning models. We then selected the optimal approach among RF, XGBoost, SVR, and the BP neural network to rapidly simulate and predict key information of the UFCES. This includes characteristic indices of the water conservancy drainage system, such as the highest water level of the embankment (HWLE), and characteristic indices of the urban drainage system, such as the maximum waterlogging depth at the waterlogging point (MWD), the minimum average flow rate of the pipeline over 1 h (MAFP), and the time of negative flow of the pipeline (TNFP). These efforts will support subsequent failure risk assessment of the UFCES, provide a basis for proactive prevention and rational resource allocation by the government, and enhance flood emergency management capacity.

2. Materials and Methods

2.1. Study Area and Data

2.1.1. Study Area

Nanchang is located in the lower reaches of the Ganjiang and Fuhe Rivers in Jiangxi Province. The Ganjiang River flows through the urban area from southwest to northeast, dividing it into two districts: Changnan and Changbei. The Changbei district has undulating topography comprising low mountains, hills, and riverside flatlands, with many small tributaries, small and scattered flood protection circles, and complex combinations of regional floods, including flash floods and flooding of urban depressions caused by external floods and local topography; it is a typical low-mountain and hilly urban area. We therefore took the Wusha River Basin in Changbei as the study area. We selected six typical embankment sections (QD, MW, MLWL, MLW, JLWG, LQ) for the water conservancy drainage system and six waterlogging points (XPG, UFHO, CIFA, XHC, WCCZ, CFDA) for the urban drainage system to carry out the empirical study (Figure 1).

The Wusha River Basin is located in the central-northern part of Jiangxi Province, and the Wusha River is a left-bank tributary of the lower Ganjiang River. The main channel is approximately 40 km long, with a longitudinal slope of 4.46‰, and the average elevation of the basin is 135 m. The water level in the lower Wusha River is closely linked to that of the Ganjiang River; when the Ganjiang River is high, it has a significant backwater effect on the Wusha River.

2.1.2. Data Sources

The original monitoring data used in this paper were provided by the Hydrological Monitoring Center of Jiangxi Province and comprise daily mean water levels at Nanchang Waizhou Station (1950–2013), precipitation at Waizhou Station (1956–1996 and 2005–2022), and precipitation at Nanchang Station (1961–2020). The design parameters and operational data of the pipe network, river network, and pump gates in the Wusha River Basin were provided by the Nanchang Urban Planning and Design Institute Group and the Nanchang Water Affairs Bureau.

2.1.3. Stormwater Scenario Schemes

Based on 59 years of measured 24-h cumulative precipitation at Waizhou Station, the design 24-h cumulative rainfall amounts R_i (i = 20, 50, 100, 200, 500, 1000) were calculated for six return periods of 20, 50, 100, 200, 500, and 1000 years. A total of 428 rainfall events were extracted from the hourly rainfall data recorded at Nanchang Station and clustered with the dynamic time warping (DTW)-hierarchical clustering algorithm, yielding 8 rainfall patterns P_j (j = 1, 2, …, 8) [17]. In addition, because the water level of the Ganjiang River has a significant backwater (jacking) effect on the Wusha River, 4 design water levels L_k (k = 1, 2, 3, 4) at the confluence of the lower Wusha River and the Ganjiang River were obtained from the flood prevention and control scheme in the “Revision of the Wusha River Water System Plan for the Changbei City District of Nanchang”.
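The DTW-hierarchical clustering step can be illustrated with a minimal sketch. It is not the exact procedure of [17]: it assumes each rainfall event has already been converted to a normalized cumulative-rainfall curve (a hypothetical preprocessing choice), uses the classic dynamic-programming DTW recursion with absolute difference as the local cost, and merges clusters by average linkage until the requested number of clusters (8 in the study) remains. The four toy events are invented.

```python
import numpy as np


def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences (O(len(a) * len(b)))."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest admissible warping step: insertion, deletion, or match.
            cost[i, j] = local + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]


def hierarchical_clusters(series, n_clusters):
    """Average-linkage agglomerative clustering on a DTW distance matrix."""
    k = len(series)
    dist = np.array([[dtw_distance(series[p], series[q]) for q in range(k)] for p in range(k)])
    clusters = [[p] for p in range(k)]
    while len(clusters) > n_clusters:
        best = None
        for u in range(len(clusters)):
            for v in range(u + 1, len(clusters)):
                # Average linkage: mean DTW distance over all cross-cluster pairs.
                d = dist[np.ix_(clusters[u], clusters[v])].mean()
                if best is None or d < best[0]:
                    best = (d, u, v)
        _, u, v = best
        clusters[u] = clusters[u] + clusters[v]
        del clusters[v]  # v > u, so index u is unaffected
    return clusters


# Hypothetical normalized cumulative-rainfall curves (not data from the study).
events = [
    np.array([0.1, 0.5, 0.9, 1.0]),  # early-peaked
    np.array([0.1, 0.6, 0.9, 1.0]),  # early-peaked
    np.array([0.0, 0.1, 0.3, 1.0]),  # late-peaked
    np.array([0.0, 0.1, 0.2, 1.0]),  # late-peaked
]
clusters = hierarchical_clusters(events, n_clusters=2)
print(clusters)  # [[0, 1], [2, 3]]
```

DTW is used rather than a point-by-point distance because it tolerates shifts in peak timing between otherwise similar storms; a library routine such as SciPy's hierarchical clustering could replace the hand-written merge loop.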

Combining the 6 rainfall return periods R_i, 8 rainfall patterns P_j, and 4 downstream river levels L_k yields 6 × 8 × 4 = 192 predefined stormwater scenarios S_m:

S_{m} = [R_{i}, P_{j}, L_{k}] \tag{1}

For each stormwater scenario S_m (m = 1, 2, …, 192), the maximum n-hour rainfall MR_mn (n = 1, 2, 3, 6, 12), the cumulative rainfall CR_m, and the downstream river level L_m are extracted as flood characteristic indicators and used as inputs to the machine learning models.
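The scenario design and input features can be sketched as follows. The sketch assumes that each 24-h hyetograph is obtained by distributing the design depth R_i over the hourly fractions of pattern P_j; `design_hyetograph`, `scenario_features`, and the example values (a 100 mm event and an 18.5 m downstream level) are hypothetical. MR_mn is computed as the largest rainfall total over any n consecutive hours.

```python
import itertools

import numpy as np

RETURN_PERIODS = [20, 50, 100, 200, 500, 1000]  # R_i, years
PATTERNS = range(1, 9)                           # P_1 ... P_8
LEVELS = range(1, 5)                             # L_1 ... L_4

# Eq. (1): every combination [R_i, P_j, L_k] is one scenario S_m (6 * 8 * 4 = 192).
scenarios = list(itertools.product(RETURN_PERIODS, PATTERNS, LEVELS))


def design_hyetograph(total_mm, pattern_fractions):
    """Hypothetical helper: spread a design depth over the hours of a rainfall pattern."""
    frac = np.asarray(pattern_fractions, dtype=float)
    return total_mm * frac / frac.sum()


def max_n_hour_rainfall(hourly, n):
    """MR_mn: the largest rainfall total over any n consecutive hours."""
    window_sums = np.convolve(np.asarray(hourly, dtype=float), np.ones(n), mode="valid")
    return float(window_sums.max())


def scenario_features(hourly, downstream_level, durations=(1, 2, 3, 6, 12)):
    """Hypothetical feature vector [MR_m1, MR_m2, MR_m3, MR_m6, MR_m12, CR_m, L_m]."""
    mr = [max_n_hour_rainfall(hourly, n) for n in durations]
    return np.array(mr + [float(np.sum(hourly)), float(downstream_level)])


# Invented example: a 100 mm event concentrated in hours 10-12, downstream level 18.5 m.
pattern = np.zeros(24)
pattern[10:13] = [0.2, 0.5, 0.3]
feats = scenario_features(design_hyetograph(100.0, pattern), 18.5)
print(len(scenarios), feats)
```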

2.1.4. The Key Information Indicators of the UFCES

In the water conservancy drainage system, the highest water level of the embankment (HWLE) is a key indicator of flood information, directly related to the safety of flood control embankments and the flood control capacity of the basin. Monitoring and analysis of the HWLE are crucial for predicting and preventing potential urban flood disasters.

In the urban drainage system, three key indicators were selected to assess flood risk: the maximum waterlogging depth at the waterlogging point (MWD), the minimum average flow rate of the pipeline over 1 h (MAFP), and the time of negative flow of the pipeline (TNFP). MWD directly reflects the drainage efficiency and waterlogging risk of the city under extreme rainfall. MAFP and TNFP, derived from pipeline flow data, reflect the degree of failure of the drainage system and the time at which it fails completely when handling excess rainwater.
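A sketch of how the four indicators could be read from the mechanism-model time series, assuming a uniform output time step. The exact definitions used in the study are not spelled out here, so two choices are assumptions: MAFP is taken as the minimum 1-h moving average of pipe discharge, and TNFP as the first output time at which the discharge becomes negative (reverse flow). The series are invented.

```python
import numpy as np


def key_indicators(times_h, stage, depth, pipe_flow, window_h=1.0):
    """Extract HWLE, MWD, MAFP and TNFP from simulated series on a uniform time step.

    times_h   -- output times in hours
    stage     -- water level at an embankment section
    depth     -- water depth at a waterlogging point
    pipe_flow -- discharge in a monitored pipe (negative values = reverse flow)
    """
    times_h = np.asarray(times_h, dtype=float)
    pipe_flow = np.asarray(pipe_flow, dtype=float)
    dt = times_h[1] - times_h[0]
    steps = max(1, int(round(window_h / dt)))
    # Assumption: MAFP = minimum of the 1-h moving average of pipe discharge.
    moving_avg = np.convolve(pipe_flow, np.ones(steps) / steps, mode="valid")
    # Assumption: TNFP = first output time at which the discharge turns negative.
    negative = np.flatnonzero(pipe_flow < 0)
    return {
        "HWLE": float(np.max(stage)),
        "MWD": float(np.max(depth)),
        "MAFP": float(moving_avg.min()),
        "TNFP": float(times_h[negative[0]]) if negative.size else None,
    }


# Invented half-hourly series for illustration only.
ind = key_indicators(
    times_h=[0.0, 0.5, 1.0, 1.5, 2.0, 2.5],
    stage=[10.0, 11.0, 12.5, 12.0, 11.5, 11.0],
    depth=[0.0, 0.1, 0.3, 0.45, 0.4, 0.2],
    pipe_flow=[2.0, 1.0, 0.5, -0.5, -1.0, 0.5],
)
print(ind)
```

If TNFP is instead meant as the total duration of reverse flow, the last entry would become `negative.size * dt`.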

2.2. The Key Information of the UFCES Rapid Simulation Model

We constructed a rapid simulation model for key information of the UFCES by combining an urban flood mechanism model with machine learning models, as illustrated in Figure 2. We first built the urban flood mechanism model and completed its calibration and validation; we then ran it with the 192 stormwater scenarios to obtain key urban flood information and generated the training, validation, and testing sets for the machine learning models. Subsequently, we compared the simulation performance of machine learning models such as RF and XGB and, finally, used the best model to achieve rapid simulation of key information for the UFCES.

2.2.1. Physically Based Urban Flood Mechanism Model (Sub-Model 1)

We collected basic data, including drainage pipeline data and river network data for the Wusha River Basin, and constructed a coupled one-dimensional/two-dimensional urban flood mechanism model of the basin in MIKE+. First, we activated the hydrodynamic (HD) and rainfall-runoff (RR) modules. We then built the one-dimensional pipeline model, the river model, and the two-dimensional overland flow model. After coupling these three models, we specified the boundary conditions, initial conditions, and simulation settings, and finally solved the model and presented the results. We collected hourly precipitation data from Nanchang Station for 8 and 20 June 2023 (referred to as “2023.6.8” and “2023.6.20”), each event lasting 24 h. We also collected river water levels and the maximum water depths at two waterlogging points (“XPG” and “UFHO”) during these two rainfall events, as well as the measured water level at Changling Hydrological Station on the main stream of the Wusha River on 2023.6.8. These data were used to calibrate and validate the MIKE+ urban flood mechanism model. We then used the 192 stormwater scenarios as inputs and boundary conditions for the mechanism model to simulate floods in the Wusha River Basin and extracted the key information (sample set) required by the machine-learning-based rapid simulation model, as shown in Figure 3.

2.2.2. Rapid Simulation Model for Key Information of the UFCES Based on Machine Learning (Sub-Model 2)

Alternative Machine Learning Models

(1): Random forest

Random forest is an ensemble learning method that improves predictive accuracy by constructing multiple decision trees, each trained on a bootstrap sample of the training data with a random subset of features considered at each split, and aggregating the predictions of all trees.

For regression, the forest prediction is the average of the individual tree predictions:

\hat{y} = \frac{1}{T} \sum_{i=1}^{T} y_{i} \tag{2}

where ŷ is the prediction, T is the number of decision trees, and y_i is the prediction of the i-th tree for a given sample.
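Eq. (2) can be illustrated with a toy random forest in NumPy. For brevity each tree is a depth-one regression stump (a simplification; practical implementations grow much deeper trees), trained on a bootstrap sample with a randomly chosen feature; the forest prediction is the plain average of Eq. (2). The synthetic data are hypothetical.

```python
import numpy as np


def fit_stump(X, y, feature_ids):
    """Least-squares regression stump (one split) restricted to the given features."""
    best = None
    for f in feature_ids:
        for t in np.unique(X[:, f])[:-1]:  # drop the max so both sides are non-empty
            left = X[:, f] <= t
            pl, pr = y[left].mean(), y[~left].mean()
            sse = ((y[left] - pl) ** 2).sum() + ((y[~left] - pr) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, f, t, pl, pr)
    if best is None:  # constant feature(s): fall back to the sample mean
        m = y.mean()
        return lambda Z: np.full(len(Z), m)
    _, f, t, pl, pr = best
    return lambda Z: np.where(Z[:, f] <= t, pl, pr)


def fit_forest(X, y, n_trees=50, max_features=1, seed=0):
    rng = np.random.default_rng(seed)
    trees = []
    for _ in range(n_trees):
        rows = rng.integers(0, len(y), len(y))                       # bootstrap sample
        feats = rng.choice(X.shape[1], max_features, replace=False)  # random features (one split per stump)
        trees.append(fit_stump(X[rows], y[rows], feats))
    return trees


def predict_forest(trees, X):
    """Eq. (2): average the predictions of the T trees."""
    return np.mean([tree(X) for tree in trees], axis=0)


# Synthetic data: only the first feature matters.
x0 = np.linspace(0.0, 1.0, 200)
x1 = np.tile([0.0, 1.0], 100)
X = np.column_stack([x0, x1])
y = (x0 > 0.5).astype(float)
trees = fit_forest(X, y)
pred = predict_forest(trees, np.array([[0.9, 0.0], [0.1, 0.0]]))
print(pred)
```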

(2): XGBoost

XGBoost is an efficient gradient boosting decision tree algorithm that builds on the gradient boosting decision tree (GBDT) framework and substantially improves its effectiveness. As a forward additive model, it combines multiple weak learners (trees) into a single strong learner. Trees are added one at a time: each new tree is fitted to the residual between the target values and the cumulative prediction of all previous trees (more precisely, to the gradient statistics of the loss), and the final prediction is the sum of the outputs of all trees.

For regression, the XGBoost prediction for the i-th sample is the sum of the outputs of K regression trees:

\hat{y}_{i} = \sum_{k=1}^{K} f_{k}(x_{i}) \tag{3}

where K is the number of trees, ŷ_i is the predicted value of the i-th sample, and f_k(x_i) is the output of the k-th regression tree for the i-th sample.

The objective function of the XGB model consists of a loss function and a regularization term:

Obj = \sum_{i=1}^{n} l(y_{i}, \hat{y}_{i}) + \sum_{k=1}^{K} \Omega(f_{k}) \tag{4}

where l is the loss function, n is the number of samples, and Ω(f_k) is a regularization term that penalizes the complexity of the k-th tree.
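Eqs. (3) and (4) can be made concrete with a didactic boosting loop; this is not the xgboost package itself. It assumes squared-error loss l = ½(y - ŷ)² and the regularizer used by XGBoost, Ω(f) = γT + ½λ‖w‖² (T leaves with weights w). Each round adds a depth-one tree whose split gain and leaf weights come from a second-order expansion of Eq. (4): the optimal leaf weight is w* = -G/(H + λ), where G and H are the sums of first and second derivatives of the loss in the leaf, and it is scaled by a learning rate η. A larger λ shrinks the leaf weights, which is the complexity control discussed in Section 3.4.

```python
import numpy as np


def leaf_weight(g, h, lam):
    """Optimal leaf value w* = -G / (H + lambda) for the regularized objective."""
    return -g.sum() / (h.sum() + lam)


def fit_boosted_stumps(X, y, n_rounds=100, eta=0.3, lam=1.0, gamma=0.0):
    """Additive model of Eq. (3) built from depth-one trees, squared-error loss."""
    pred = np.zeros(len(y))
    trees = []
    for _ in range(n_rounds):
        g = pred - y            # first derivative of 0.5 * (y - y_hat)^2
        h = np.ones(len(y))     # second derivative
        G, H = g.sum(), h.sum()
        best = None
        for f in range(X.shape[1]):
            for t in np.unique(X[:, f])[:-1]:
                left = X[:, f] <= t
                GL, HL = g[left].sum(), h[left].sum()
                # Split gain with Omega(f) = gamma * T + 0.5 * lambda * ||w||^2.
                gain = 0.5 * (GL ** 2 / (HL + lam) + (G - GL) ** 2 / (H - HL + lam)
                              - G ** 2 / (H + lam)) - gamma
                if best is None or gain > best[0]:
                    best = (gain, f, t, left)
        if best is None or best[0] <= 0:  # no split reduces the regularized objective
            break
        _, f, t, left = best
        wl = eta * leaf_weight(g[left], h[left], lam)
        wr = eta * leaf_weight(g[~left], h[~left], lam)
        trees.append((f, t, wl, wr))
        pred += np.where(left, wl, wr)
    return trees


def predict_boosted(trees, X):
    out = np.zeros(len(X))
    for f, t, wl, wr in trees:
        out += np.where(X[:, f] <= t, wl, wr)
    return out


X = np.linspace(0.0, 1.0, 50).reshape(-1, 1)
y = np.where(X[:, 0] > 0.5, 2.0, 0.0)
trees = fit_boosted_stumps(X, y)
print(len(trees), np.mean((predict_boosted(trees, X) - y) ** 2))
```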

(3): BP neural network

The backpropagation (BP) neural network is a multilayer feed-forward neural network whose weights and biases are adjusted by the backpropagation algorithm. A typical BP neural network consists of an input layer, one or more hidden layers, and an output layer. Each neuron (node) is connected to all neurons in the previous layer via weights; it computes the weighted sum of its inputs plus a bias and passes this sum through a nonlinear activation function:

z_{j} = \sum_{i=1}^{n} w_{ji} x_{i} + b_{j} \tag{5}

a_{j} = f(z_{j}) \tag{6}

where z_j is the weighted input sum of the j-th neuron, w_ji is the weight connecting the i-th input to the j-th neuron, x_i is the i-th input, b_j is the bias of the j-th neuron, a_j is the output of the j-th neuron, and f is the activation function.
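Eqs. (5) and (6), together with the backpropagation update, are sketched below for a network with one hidden layer. The sigmoid hidden activation, linear output neuron, mean-squared-error loss, learning rate, and layer size are assumptions for illustration, not the configuration used in the study.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_bp(X, y, hidden=8, lr=0.1, epochs=3000, seed=0):
    """One-hidden-layer network trained by full-batch backpropagation on MSE."""
    rng = np.random.default_rng(seed)
    W1, b1 = rng.normal(0.0, 0.5, (X.shape[1], hidden)), np.zeros(hidden)
    W2, b2 = rng.normal(0.0, 0.5, (hidden, 1)), np.zeros(1)
    y = y.reshape(-1, 1)
    losses = []
    for _ in range(epochs):
        # Forward pass: Eq. (5) weighted sums, Eq. (6) activation.
        a1 = sigmoid(X @ W1 + b1)
        out = a1 @ W2 + b2                        # linear output neuron for regression
        err = out - y
        losses.append(float(np.mean(err ** 2)))
        # Backward pass: propagate dLoss/dOutput back through each layer.
        d_out = 2.0 * err / len(X)
        dW2, db2 = a1.T @ d_out, d_out.sum(axis=0)
        d_hid = (d_out @ W2.T) * a1 * (1.0 - a1)  # sigmoid'(z) = a * (1 - a)
        dW1, db1 = X.T @ d_hid, d_hid.sum(axis=0)
        W1, b1 = W1 - lr * dW1, b1 - lr * db1
        W2, b2 = W2 - lr * dW2, b2 - lr * db2
    return (W1, b1, W2, b2), losses


def predict_bp(params, X):
    W1, b1, W2, b2 = params
    return (sigmoid(X @ W1 + b1) @ W2 + b2).ravel()


X = np.linspace(-1.0, 1.0, 50).reshape(-1, 1)
y = X[:, 0] ** 2
params, losses = train_bp(X, y)
print(losses[0], losses[-1])
```

The random initialization (`seed`) is exactly the sensitivity to initial weights noted in Section 3.4: different seeds can converge to different local optima.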

(4): Support Vector Regression

Support vector regression (SVR) is a regression method based on the support vector machine (SVM). It seeks a regression function f(x) = w · x + b, defined in a (possibly kernel-induced) high-dimensional feature space, that deviates from the observed targets by no more than ε wherever possible while remaining as flat as possible. Deviations inside this ε-insensitive tube are not penalized, and larger deviations are penalized through slack variables, so the model balances model complexity against fitting error.

SVR is formulated as the minimization of the following objective function:

\min_{w, b, \xi, \xi^{*}} \frac{1}{2} \|w\|^{2} + C \sum_{i=1}^{n} (\xi_{i} + \xi_{i}^{*}) \tag{7}

subject to the following constraints:

\begin{cases} y_{i} - (w \cdot x_{i} + b) \leq \varepsilon + \xi_{i} \\ (w \cdot x_{i} + b) - y_{i} \leq \varepsilon + \xi_{i}^{*} \\ \xi_{i}, \xi_{i}^{*} \geq 0 \end{cases} \tag{8}

where w is the weight vector, b is the bias term, ξ_i and ξ_i* are slack variables that measure deviations beyond ε above and below the regression function, C is a regularization parameter that balances model complexity against fitting error, and ε is the preset width of the insensitive tube within which errors are not penalized.
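To make Eqs. (7) and (8) concrete, the sketch below evaluates the primal objective for a linear SVR, using the smallest slacks that satisfy the constraints, and fits w and b by plain subgradient descent. This is a didactic substitute: library SVR implementations (e.g., those built on LIBSVM) solve the dual problem and support kernels, which this sketch omits. C, ε, the learning rate, and the data are hypothetical.

```python
import numpy as np


def svr_objective(w, b, X, y, C=1.0, eps=0.1):
    """Eq. (7) evaluated with the smallest slacks that satisfy Eq. (8)."""
    f = X @ w + b
    xi = np.maximum(0.0, y - f - eps)       # target above the eps-tube
    xi_star = np.maximum(0.0, f - y - eps)  # target below the eps-tube
    return 0.5 * np.dot(w, w) + C * np.sum(xi + xi_star)


def fit_linear_svr(X, y, C=1.0, eps=0.1, lr=0.01, epochs=2000):
    """Linear SVR by subgradient descent on the primal objective (didactic only)."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        r = y - (X @ w + b)
        # Subgradient of the eps-insensitive loss w.r.t. f(x): -1 above, +1 below, 0 inside.
        s = np.where(r > eps, -1.0, np.where(r < -eps, 1.0, 0.0))
        w = w - lr * (w + C * X.T @ s)
        b = b - lr * C * s.sum()
    return w, b


X = np.linspace(0.0, 1.0, 20).reshape(-1, 1)
y = 2.0 * X[:, 0]
w, b = fit_linear_svr(X, y)
print(w, b, svr_objective(w, b, X, y))
```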

Constructing a Rapid Simulation Model for Key Information of the UFCES Based on Machine Learning

Four commonly used machine learning models, RF, XGB, SVR, and the BP neural network, were used to rapidly simulate key information of the UFCES. The key information indicators HWLE, MWD, MAFP, and TNFP were extracted from the urban flood results simulated by the mechanism model, and the feature indicators MR_mn, CR_m, and L_m were extracted from the 192 stormwater scenarios. Together, these form the sample set Ω for the machine learning models.

Randomly dividing the data into training and validation sets may leave some rainfall intensities underrepresented, so that the models fail to capture key urban flood information under those intensities. To improve simulation accuracy, we therefore divided the training set Ω_tr, validation set Ω_va, and testing set Ω_te according to rainfall return period, ensuring that each dataset covers all rainfall intensities, i.e., contains data from all 6 rainfall-intensity subsets. First, the sample set Ω was divided by rainfall return period into 6 subsets Ω₁, Ω₂, …, Ω₆, and a set of data was randomly selected from each subset to form the testing set Ω_te. Second, the remaining data in each of the six subsets were randomly divided at a 7:3 ratio into training and validation data. Finally, these were aggregated to form the training set Ω_tr and the validation set Ω_va.
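The splitting procedure can be sketched as follows, assuming that "a set of data" means one scenario per return-period subset (so Ω_te holds 6 scenarios) and that each scenario is identified by its (R_i, P_j, L_k) tuple. The function name and seed are hypothetical.

```python
import itertools
import random
from collections import defaultdict


def stratified_split(samples, key, train_frac=0.7, seed=42):
    """Per stratum: one random sample to the test set, the rest split train_frac : rest."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for s in samples:
        strata[key(s)].append(s)
    train, val, test = [], [], []
    for group in strata.values():
        group = list(group)
        rng.shuffle(group)
        test.append(group.pop())                  # one scenario per return period -> Omega_te
        n_train = round(train_frac * len(group))  # 7:3 split of the remainder
        train.extend(group[:n_train])
        val.extend(group[n_train:])
    return train, val, test


RETURN_PERIODS = [20, 50, 100, 200, 500, 1000]
scenarios = list(itertools.product(RETURN_PERIODS, range(1, 9), range(1, 5)))
train, val, test = stratified_split(scenarios, key=lambda s: s[0])
print(len(train), len(val), len(test))  # 132 54 6
```

Each return period contributes 32 scenarios (8 patterns × 4 levels): one goes to testing, and the remaining 31 split into 22 for training and 9 for validation.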

During model training, the hyperparameters of each machine learning model were optimized by grid search, the performance of each model under its optimal parameters was verified, and the optimal prediction model for key information of the UFCES was finally selected using the testing set.
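Grid search evaluates every combination of candidate hyperparameters and keeps the best validation score. The sketch assumes a user-supplied `fit_and_score` callable that trains a model with the given parameters on Ω_tr and scores it on Ω_va. The parameter names mimic XGBoost's `max_depth` and `learning_rate`, but the grid and the toy score function are hypothetical, since the grids used in the study are not listed. scikit-learn's `GridSearchCV` offers the same exhaustive search but scores candidates by cross-validation rather than a fixed validation set.

```python
import itertools


def grid_search(param_grid, fit_and_score):
    """Try every combination in param_grid; return the best params and score.

    fit_and_score(params) should train on the training set and return a
    validation score where higher is better (e.g. R^2 on Omega_va).
    """
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = fit_and_score(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(p):
    """Hypothetical validation score peaking at max_depth=5, learning_rate=0.1."""
    return -(p["max_depth"] - 5) ** 2 - (p["learning_rate"] - 0.1) ** 2


grid = {"max_depth": [3, 5, 7], "learning_rate": [0.05, 0.1, 0.3]}
best, score = grid_search(grid, toy_score)
print(best)  # {'max_depth': 5, 'learning_rate': 0.1}
```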

Indicators for Model Evaluation

Model performance was evaluated using the Nash coefficient R² and the relative error RE. R² measures how well the model reproduces the observed data; the closer it is to 1, the more accurate the model. RE evaluates prediction accuracy; the closer it is to 0, the more accurate the model. They are defined as follows:

R^{2} = 1 - \frac{\sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^{2}}{\sum_{i=1}^{n} (y_{i} - \bar{y})^{2}} \tag{9}

RE = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{\hat{y}_{i} - y_{i}}{y_{i}} \right| \times 100\% \tag{10}

where y_i is the i-th observation, ŷ_i is the i-th value predicted by the model, ȳ is the mean of the observations, and n is the total number of observations.
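Eqs. (9) and (10) translate directly into NumPy. Eq. (9), the Nash-Sutcliffe form, can be negative for poor models, and Eq. (10) is undefined when an observation equals zero, which may matter for indicators such as MAFP that can be zero or negative. The sketch returns RE as a fraction (multiply by 100 for a percentage); the example values are invented.

```python
import numpy as np


def nash_r2(obs, sim):
    """Eq. (9): 1 - SSE / sum of squared deviations of obs from their mean."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    return float(1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2))


def relative_error(obs, sim):
    """Eq. (10) as a fraction (multiply by 100 for %); undefined if any obs is 0."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    return float(np.mean(np.abs((sim - obs) / obs)))


obs = [1.0, 2.0, 3.0, 4.0]   # invented observations
sim = [1.1, 1.9, 3.2, 3.8]   # invented predictions
print(round(nash_r2(obs, sim), 4), round(relative_error(obs, sim), 4))  # 0.98 0.0667
```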

Combining R² and RE, the simulation performance of the urban flood information simulation model can be comprehensively evaluated to ensure that the selected model can provide scientific and accurate decision support for urban planning and flood prevention and control.

3. Results and Discussion

3.1. Calibration and Validation of Urban Flood Mechanism Model

The calibration and validation results are shown in Tables 1 and 2. All relative errors lie within ±10%, indicating that the urban flood mechanism model constructed in this paper can satisfactorily simulate the key information of the UFCES.

To further test the reliability of the model, we validated it against the water levels observed at the Changling Hydrological Station on the Wusha River on 8 June 2023. As shown in Figure 4, the simulated and observed water levels rise and fall consistently, with a Nash coefficient of 0.929, a satisfactory validation outcome. In addition, in previous research projects we successfully applied the MIKE+ model (http://www.dhichina.cn/h-col-189.html, accessed on 5 March 2025) in Nanchang to build a stormwater runoff simulation model for the Qingshan Lake urban area, with high accuracy and reliability [18,19]. Based on these experiences, we are confident that the model provides a reliable basis for prediction and analysis in the current study.

3.2. Comparison of Machine Learning Model Simulation Results

3.2.1. Comparison of Simulation Performances Based on the Validation Set

The simulation performance of the four machine learning models (RF, XGB, SVR, and the BP neural network) optimized by grid search is shown in Table 3.

As shown in Table 3, in the simulation of the HWLE index, the XGB model demonstrates excellent predictive ability, with an R² value of 0.8770 and a relative error of only 0.0044, indicating that its predictions closely match the true values. The RF model performs next best, while the SVR model and BP neural network exhibit larger relative errors. For MWD, the SVR model performs best, while the XGB and RF models also perform very well, with R² values close to 0.95 and RE values within 0.045, demonstrating high accuracy in simulating MWD. For MAFP, the XGB model again performs best, with an R² value of 0.8817 and a relative error within a reasonable range, indicating its suitability for flow prediction. For TNFP, although the XGB model performs relatively better, the overall simulation performance of all four machine learning models is generally low.

Overall, the XGB model provides more stable and accurate simulation effects on most key information indicators, demonstrating its superiority in simulating the key information of the UFCES.

3.2.2. Comparison of Prediction Performances Based on Test Sets

The test performance of the optimized models is illustrated in Figure 5. The XGB model shows good stability in predicting the four indicators across rainfall intensity levels and maintains a relatively low relative error regardless of rainfall conditions, indicating strong generalization performance and adaptability to complex and changeable heavy rainfall scenarios. The XGB model is therefore a highly promising tool for simulating and providing early warning of key information of the UFCES.

3.3. Comparison of Simulation Speeds

We compared the computational speed of the four machine learning models in the same computing environment; the results are shown in Table 4. Machine learning models have a clear advantage over the traditional urban flood mechanism model in running time. Simulating the key information of the UFCES with the MIKE+ model takes 21,600 s (6 h), a significant limitation for flood early warning and management systems that require rapid response.

In contrast, the machine learning models are far faster. The simulation times of the XGB models are all within 0.2 s, far lower than that of MIKE+. This rapid simulation ability enables the XGB model to play an important role in real-time flood early warning systems and to provide decision-makers with timely flood risk assessments. The RF and SVR models take slightly longer, and the BP model takes the longest. Owing to its fast and stable performance, the XGB model is the preferred model for urban flood information simulation in this study and has important application value for urban flood early warning: it improves the timeliness and accuracy of flood information prediction and can support rapid response and emergency management of urban flood disasters.
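The speed comparison reduces to simple arithmetic: with the reported 21,600 s for MIKE+ and at most 0.2 s for XGB, each prediction is at least 108,000 times faster. The `median_runtime` helper below is a hypothetical way to time per-prediction wall-clock cost consistently across models. The ratio compares prediction time only; it does not include the 192 MIKE+ runs needed to build the training set.

```python
import time


def median_runtime(fn, *args, repeats=5):
    """Median wall-clock time (s) of fn(*args) over several repeats."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        times.append(time.perf_counter() - start)
    times.sort()
    return times[len(times) // 2]


MIKE_PLUS_S = 21_600   # reported MIKE+ run time for the key information (6 h)
XGB_MAX_S = 0.2        # reported upper bound of the XGB simulation time
speedup = MIKE_PLUS_S / XGB_MAX_S
print(f"speed-up of at least {speedup:,.0f}x")  # speed-up of at least 108,000x

# Example timing of an arbitrary callable (here a built-in, as a stand-in for model.predict).
t = median_runtime(sorted, list(range(10_000)))
```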

3.4. Discussion

(1): Advantages of Stratified Sampling Strategy

The stratified sampling strategy based on rainfall return period divides the data into multiple subsets, each corresponding to one rainfall intensity. Through stratified division and uniform sampling, the training set Ω_tr, validation set Ω_va, and test set Ω_te all cover every rainfall intensity scenario, which ensures that the model can fully learn the features and patterns under different rainfall intensities. In contrast, random sampling may leave certain rainfall intensities with few or even no samples, leading to underlearning in those scenarios and compromising prediction performance. The stratified sampling strategy therefore helps the model learn and predict key information under different rainfall intensities more comprehensively, enhancing its generalization performance and, in turn, its accuracy and reliability in practical applications.
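The risk of random sampling can be quantified. Assuming the 192 scenarios split evenly into 32 per return period (8 patterns × 4 levels) and a test set of six scenarios, a simple random draw contains every return period exactly once with probability 32⁶ / C(192, 6), about 1.7%, whereas the stratified procedure guarantees it.

```python
from math import comb


def prob_random_test_covers_all(n_strata=6, per_stratum=32):
    """Chance that n_strata scenarios drawn at random (without replacement) from
    n_strata * per_stratum scenarios include every stratum exactly once."""
    return per_stratum ** n_strata / comb(n_strata * per_stratum, n_strata)


p = prob_random_test_covers_all()
print(f"{p:.4f}")  # 0.0167; the stratified split achieves this with probability 1
```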

(2): Optimal Simulation Model—XGB Model

The simulation experiments in this paper show that the XGB model generally outperforms the RF model in simulating key information of the UFCES. The R² values of the XGB model are mostly above 0.87 and its RE values mostly below 0.06, whereas the R² values of the RF model are mostly above 0.8 and its RE in simulating MAFP is 0.35, much higher than that of the XGB model. This differs slightly from the findings of some previous studies [20]. A possible reason is that the XGB model introduces regularization that controls model complexity to prevent overfitting, which improves generalization to unseen data; moreover, the XGB model uses all of the data in each iteration and progressively optimizes the model through gradient boosting, which allows it to capture critical information more effectively and improve prediction accuracy. In addition, the simulation times of the XGB models are all within 0.13 s, showing high computational efficiency. In contrast, although the RF model is robust as an ensemble method, it lacks a regularization term to control model complexity, so it cannot effectively suppress overfitting on complex datasets, especially those with many features and much noise, which affects its generalization to unseen data. Moreover, the RF model lacks an explicit loss function to guide optimization during training, which limits its performance on complex datasets. Furthermore, because the RF model evaluates feature importance by averaging the impurity reduction of each feature across all decision trees, it may not accurately reflect the actual contribution of features to predictions in some cases.
When dealing with large-scale datasets, the RF model requires training of a large number of decision trees, which is slower with limited computational resources. In addition to the RF model, there is also a gap in the simulation performance of the BP neural network and SVR models compared with the XGB model. The BP neural network is sensitive to initial weights and thresholds, and it may fall into local optima during training, leading to unstable simulation results. Moreover, the BP neural network model has a slow training speed and a complex structure, resulting in low efficiency when dealing with large-scale data. For the SVR model, the selection of the kernel function and the optimization of parameters have a significant impact on the simulation results, and improper selection may lead to overfitting or underfitting. Compared with these models, the XGB model has better performance in terms of simulation accuracy, generalization ability, and computational efficiency. Therefore, the XGB model is more suitable for popularization and application in the field of UFCES key information prediction to improve the accuracy and timeliness of flood risk assessment and provide strong support for urban flood early warning and risk management.
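The regularization argument above refers to the penalty term in XGBoost's training objective, Ω(f) = γT + ½λ‖w‖², where T is the number of leaves of a tree and w its leaf weights. The plain-Python sketch below is not the authors' implementation; assuming squared-error loss, it evaluates the standard closed-form leaf weight and split gain to show how λ (XGBoost's `lambda`/`reg_lambda` parameter) shrinks leaf outputs and how γ (`gamma`) prunes splits whose gain does not cover the cost of an extra leaf. The function names and sample residuals are hypothetical.

```python
def leaf_weight(residuals, reg_lambda):
    """Optimal leaf weight w* = -G / (H + lambda) under squared-error loss.

    For loss 0.5 * (y - y_hat)**2 the gradient is g = y_hat - y = -residual and
    the hessian is h = 1, so G = -sum(residuals) and H = number of samples.
    """
    G = -sum(residuals)
    H = float(len(residuals))
    return -G / (H + reg_lambda)


def split_gain(left, right, reg_lambda, gamma):
    """Regularized gain of splitting one leaf into `left` and `right`.

    A negative gain means the split does not pay for its extra leaf (gamma),
    so XGBoost-style pruning would discard it.
    """
    def score(res):
        G = -sum(res)
        return G * G / (len(res) + reg_lambda)

    return 0.5 * (score(left) + score(right) - score(left + right)) - gamma


residuals = [1.0, 2.0, 3.0]
print(leaf_weight(residuals, reg_lambda=0.0))  # -> 2.0 (unregularized: mean residual)
print(leaf_weight(residuals, reg_lambda=3.0))  # -> 1.0 (shrunk towards zero)

left, right = [1.0, 1.0], [-1.0, -1.0]
print(split_gain(left, right, reg_lambda=0.0, gamma=0.0))  # -> 2.0
print(split_gain(left, right, reg_lambda=0.0, gamma=2.5))  # -> -0.5 (split pruned)
```

Larger λ pulls every leaf output towards zero and larger γ demands more gain before a tree may grow, which together limit model complexity in the way the text credits for the better generalization of the XGB model.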

(3): Limitations

The simulation experiments also show that all four machine learning models perform only moderately on the TNFP indicator (R² from 0.3712 for BP to 0.7499 for XGB; Table 3). A possible cause is the widespread reciprocating flow that occurs in the urban drainage system of the UFCES under extreme heavy rainfall, which is difficult to simulate accurately. Hydraulic modeling of urban flooding is extremely challenging: existing urban flood mechanism models struggle to capture the complex processes arising from the interaction between buildings and water flow, roughness coefficient calibration can only partially compensate for these limitations, and reciprocating flow in pipes is still poorly simulated [21,22]. Future work could proceed in two directions: further optimizing the urban flood mechanism model, for example by using finer grid discretization or multi-physics coupling to better capture complex flow phenomena; and adopting more advanced machine learning techniques such as deep learning, whose strength in complex pattern recognition may uncover hidden relationships in the data and improve the simulation of reciprocating flow under extreme rainfall.

(4): Research Prospects

In future studies, the XGB model could be used to predict the key information of the UFCES and thereby calculate the system's overall failure degree, enabling flood risk assessment from the perspective of system failure and highlighting potential risks to the UFCES. Integrating real-time sensor data (e.g., from IoT-based monitoring) directly into model training could further enhance prediction capability. Better methods for simulating reciprocating flow within mechanism-based models could raise numerical accuracy and supply better training data for the machine learning models. Finally, applying deep learning models to urban flood simulation may improve both simulation accuracy and efficiency.

4. Conclusions

(1): This study moved beyond the single indicators used in traditional urban flood information prediction by selecting four indices, namely HWLE in the water conservancy drainage system and MWD, TNFP, and MAFP in the urban drainage system, to construct a more comprehensive prediction model for the key information of the UFCES.
(2): A dataset was constructed by stratified sampling of storm and flood information based on rainfall return periods. The resulting training, validation, and testing datasets cover heavy rainfall at all intensity levels, which significantly improves the generalization performance of the trained machine learning models and enables them to better handle flood prediction under varying rainfall intensities.
(3): A comparison of four commonly used machine learning models showed that the XGB model provides more stable and accurate simulation of the key information, with R² values above 0.87 and RE values below 0.06 for every indicator except TNFP. It is therefore better suited to wider adoption in UFCES key information prediction, supplying key urban flood information more efficiently for urban planning and emergency management.
(4): The rapid simulation model constructed in this study enriches the technical means of urban flood simulation. It can predict key information of the UFCES under different rainfall return periods and thereby support calculation of the UFCES failure degree, offering a scientific and technical foundation for assessing the overall performance of the UFCES.

Author Contributions

All authors contributed to the study conception and design. Writing and editing: Y.H. and M.T.; data collection: Q.X. and Y.W.; data analysis: Y.H., S.M. and Z.Z.; algorithm development: Y.H. and Q.Z.; chart editing: Y.H. and S.M.; construction of flood model: Q.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Joint Major Project of ‘Science and Technology + Water Conservancy’ in Jiangxi Province (2022KSG01007) and the National Natural Science Foundation of China (No. 52469002).

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Dai, Y.S.; Abhishek; Li, L.J.; Gong, Y.; Wu, X.; Sheng, B.; Zhao, W.P. Variations in Present and Future Hourly Extreme Rainfall: Insights from High-Resolution Data and Novel Temporal Disaggregation Model. Water 2024, 16, 3463.
2. Lecomte, D. U.S. Weather Highlights 2022-Drought, Flash Floods, Tornado Outbreaks, Hurricane Ian, Blockbuster Winter Storms. Weatherwise 2023, 76, 14–22.
3. Wu, M.M.; Wu, Z.N.; Ge, W.; Wang, H.L.; Jiang, M.M. Identification of sensitivity indicators of urban rainstorm flood disasters: A case study in China. J. Hydrol. 2021, 599, 126393.
4. He, S.Y.; Zhang, L.M. A stress test of urban system flooding upon extreme rainstorms in Hong Kong. J. Hydrol. 2021, 597, 125713.
5. Li, D.L.; Hou, J.M.; Shen, R.Z.; Gao, X.J.; Huang, M.S.; Ma, Y. Partitioned Adaptive Model for Urban Rainstorm Runoff Process Based on Plot Generalization and Road Network Detailed Simulation. Adv. Water Sci. 2023, 34, 197–208.
6. Zeng, Z.Y.; Lai, C.G.; Wang, Z.L.; Wu, X.S.; Huang, G.R.; Hu, Q.F. Rapid Simulation of Urban Rainstorm Flood Based on WCA2D and SWMM Models. Adv. Water Sci. 2020, 31, 10.
7. Huang, G.R.; Chen, Z.W.; Zeng, B.W. Research Progress of Urban Flood Model and CPU-GPU Heterogeneous Parallel Computing Technology. J. Hydraul. Eng. 2023, 54, 654–665.
8. Mahato, S.; Pal, S.; Talukdar, S.; Saha, T.; Mandal, P. Field based index of flood vulnerability (IFV): A new validation technique for flood susceptible models. Geosci. Front. 2021, 12, 101175.
9. Zhang, R.; Chai, Z.Y.; Zhang, T.; Li, J.Z. Research Progress of Flood Forecasting Based on Machine Learning Models. Water Resour. Hydropower Eng. 2023, 54, 89–101.
10. Wang, J.J.; Shi, P.; Jiang, P.; Hu, J.W.; Xiao, Z.W. Application of BP Neural Network Algorithm in Traditional Hydrological Model for Flood Forecasting. Water 2017, 9, 48.
11. Wang, Z.L.; Lai, C.G.; Chen, X.H.; Yang, B.; Zhao, S.W.; Bai, X.Y. Flood hazard risk assessment model based on random forest. J. Hydrol. 2015, 527, 1130–1141.
12. Ma, M.H.; Zhao, G.; He, B.S.; Li, Q.; Wang, Z.L. XGBoost-based method for flash flood risk assessment. J. Hydrol. 2021, 598, 126382.
13. Li, S.J.; Ma, K.K.; Jin, Z.; Zhu, Y.L. A new flood forecasting model based on SVM and boosting learning algorithms. In Proceedings of the 2016 IEEE Congress on Evolutionary Computation (CEC), Vancouver, BC, Canada, 24–29 July 2016.
14. Alipour Saba, M.; Joao, L. Emulation of 2D Hydrodynamic Flood Simulations at Catchment Scale Using ANN and SVR. Water 2021, 13, 2858.
15. Li, X.L.; Lü, H.S.; An, T.Q.; Liu, D.; Luo, Y. Real-time flood forecast using a Support Vector Machine. In Proceedings of the International Symposium on Integrated Water Resources Management, Agadir, Morocco, 24–25 November 2010.
16. Liu, J.H.; Mei, C.; Liu, H.W.; Fang, X.Y.; Ni, G.H.; Jin, W.B. Key Scientific and Technological Issues in the Joint Prevention and Control of Urban Flood and Inland Waterlogging Disaster Chains in Megacities. Adv. Water Sci. 2023, 34, 172–181.
17. Wu, Y.Z.; Tang, M.; Zhou, Z.H.; Chu, J.Y.; Zeng, Y.L.; Zhan, M.J.; Xu, W.B. Rainfall Pattern Construction Method Based on DTW-HCA and Urban Flood Simulation: A Case Study of Nanchang City, China. Water 2023, 16, 65.
18. Tang, M.; Xu, W.B.; Yao, J.H.; Tang, C.S. A study of design storm rain patterns based on numerical simulation of urban flooding. China Water Wastewater 2021, 37, 97–105.
19. Tang, M.; Xu, W.B. Scenario simulation-based joint urban stormwater scheduling strategy. China Rural Water Hydropower 2020, 6, 76–81.
20. Dai, X.; Huang, H.; Ji, X.Y.; Wang, W. Spatial-temporal rapid prediction model of urban rainstorm waterlogging based on machine learning. J. Tsinghua Univ. Sci. Technol. 2023, 63, 865–873.
21. Bulti, D.T.; Abebe, B.G. A review of flood modeling methods for urban pluvial flood application. Model. Earth Syst. Environ. 2020, 6, 1293–1320.
22. Todini, F.D. Testing a simple 2D hydraulic model in an urban flood experiment. Hydrol. Process. 2013, 27, 1301–1320.

Figure 1. Schematic diagram of the study area (left: the Wusha River Basin; upper right: selection of typical embankments; bottom right: selection of waterlogging points).

Figure 2. Construction diagram of the rapid simulation model for key information of the UFCES.

Figure 3. Construction diagram of the urban flood information simulation model based on the mechanism model.

Figure 4. Validation results of the water level at Changling hydrological station.

Figure 5. Relative errors of each model in predicting key information at different rainfall intensity levels (levels are defined by rainfall return period: levels 1, 2, 3, 4, 5, and 6 correspond to the 20-, 50-, 100-, 200-, 500-, and 1000-year return periods, respectively).

Table 1. Calibration results on “20 June 2023”.

Waterlogging Points	Simulated Maximum Water Depth (m)	Measured Maximum Water Depth (m)	Relative Error (%)
UFHO	0.127	0.131	−2.92
XPG	0.429	0.434	−1.12

Table 2. Validation results on “8 June 2023”.

Waterlogging Points	Simulated Maximum Water Depth (m)	Measured Maximum Water Depth (m)	Relative Error (%)
UFHO	0.414	0.458	−9.55
XPG	0.148	0.141	5.30
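The Relative Error columns of Tables 1 and 2 are consistent in sign with RE = (simulated − measured) / measured × 100, so negative values indicate that the model underestimates the maximum water depth. The sketch below assumes that formula, since this section does not state it, and the names `relative_error_pct` and `TABLE_ROWS` are hypothetical. Applied to the depths as printed (rounded to the millimetre), it reproduces the sign of every tabulated value and its magnitude to within about 0.4 percentage points; the small differences presumably reflect rounding of the printed depths.

```python
def relative_error_pct(simulated, measured):
    """Relative error in percent; negative values mean underestimation."""
    return (simulated - measured) / measured * 100.0


# (event, waterlogging point): (simulated depth m, measured depth m, tabulated RE %)
TABLE_ROWS = {
    ("20 June 2023", "UFHO"): (0.127, 0.131, -2.92),
    ("20 June 2023", "XPG"): (0.429, 0.434, -1.12),
    ("8 June 2023", "UFHO"): (0.414, 0.458, -9.55),
    ("8 June 2023", "XPG"): (0.148, 0.141, 5.30),
}

for (event, point), (sim, obs, tabulated) in TABLE_ROWS.items():
    recomputed = relative_error_pct(sim, obs)
    print(f"{event} {point}: recomputed {recomputed:+.2f}%, tabulated {tabulated:+.2f}%")
```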

Table 3. Simulation effects of each model on each key information indicator.

Model	Assessment Indicator	HWLE	MWD	MAFP	TNFP
RF	R²	0.8129	0.9497	0.8539	0.7315
RF	RE	0.0072	0.0305	0.3570	0.2399
XGB	R²	0.8770	0.9494	0.8817	0.7499
XGB	RE	0.0044	0.0442	0.0541	0.2501
SVR	R²	0.8542	0.9601	0.6743	0.6105
SVR	RE	0.1458	0.0399	0.3057	0.3895
BP	R²	0.8692	0.9230	0.7507	0.3712
BP	RE	0.1308	0.0770	0.2493	0.4621
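The abstract describes R² as the Nash coefficient, i.e., the Nash–Sutcliffe efficiency used in hydrology, which equals 1 for a perfect fit and 0 when the model does no better than the mean of the observations. A minimal sketch of both Table 3 metrics follows. The R² function uses the standard Nash–Sutcliffe formula; the RE function assumes a mean absolute relative error, because this section does not give the authors' exact RE definition, and the sample values are illustrative, not data from the study.

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - sum of squared errors / variance sum."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


def mean_relative_error(observed, simulated):
    """Mean absolute relative error; an assumed stand-in for the paper's RE."""
    return sum(abs(s - o) / abs(o) for o, s in zip(observed, simulated)) / len(observed)


# Illustrative values only (not data from the study)
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [1.1, 1.9, 3.2, 3.8]
print(round(nash_sutcliffe(observed, simulated), 4))       # -> 0.98
print(round(mean_relative_error(observed, simulated), 4))  # -> 0.0667
```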

Table 4. Comparison of model simulation speeds (s).

Model	HWLE	MWD	MAFP	TNFP
MIKE+	21,600
RF	0.0764	0.2177	0.1383	0.0811
XGB	0.1027	0.1209	0.102	0.102
SVR	0.1458	0.0399	0.3895	0.3895
BP	83.1666	137.7661	133.3321	133.3321
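Table 4 quantifies the "rapid" in rapid prediction: one MIKE+ run takes 21,600 s (6 h). The short sketch below divides that time by the slowest per-indicator prediction time of each machine learning model to obtain a conservative speed-up. It uses only the tabulated numbers, and treating the single MIKE+ value as the reference for all four indicators is an assumption about how the table should be read; `min_speedup` is a hypothetical helper.

```python
MIKE_PLUS_SECONDS = 21_600  # one MIKE+ mechanism-model run, Table 4 (6 h)

# Prediction times (s) from Table 4, in the order HWLE, MWD, MAFP, TNFP
ML_SECONDS = {
    "RF": [0.0764, 0.2177, 0.1383, 0.0811],
    "XGB": [0.1027, 0.1209, 0.102, 0.102],
    "SVR": [0.1458, 0.0399, 0.3895, 0.3895],
    "BP": [83.1666, 137.7661, 133.3321, 133.3321],
}


def min_speedup(times, reference=MIKE_PLUS_SECONDS):
    """Speed-up over the reference using the slowest indicator (a lower bound)."""
    return reference / max(times)


for model, times in ML_SECONDS.items():
    print(f"{model}: at least {min_speedup(times):,.0f}x faster than MIKE+")
```

Using the slowest indicator keeps the figure conservative: even on its worst indicator, the XGB model is more than five orders of magnitude faster than the mechanism model, while BP is only about two orders faster.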

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Hu, Y.; Tang, M.; Ma, S.; Zhu, Z.; Zhou, Q.; Xie, Q.; Wu, Y. A Rapid Prediction Method for Key Information of the Urban Flood Control Engineering System Based on Machine Learning: An Empirical Study of the Wusha River Basin. Water 2025, 17, 784. https://doi.org/10.3390/w17060784

