1. Introduction
Accurate wind data are crucial for successful search and rescue (SAR) operations when a maritime accident occurs. Because wind affects the drift patterns of survivors and debris, precise wind information is essential for predicting their locations [1,2,3]. Without reliable wind data, search efforts become less effective, potentially delaying survivor rescue and debris recovery.
However, maritime wind stations are sparsely distributed, making it difficult to obtain real-time wind data in many areas. As a result, numerical weather prediction (NWP) models, which generate forecasts for all grid points, are typically used to estimate wind conditions at the accident site [3,4]. These NWP models may not always provide precise wind predictions at the times needed for effective SAR operations, since they suffer from spin-up issues [5,6,7] and systematic biases [8,9,10]. More accurate wind prediction models are therefore needed to better support SAR efforts after maritime accidents.
As wind speed prediction plays a crucial role in SAR operations, air mobility, air pollution forecasting, and wind power generation, there have been numerous efforts to enhance its accuracy [11,12,13,14,15,16,17,18,19]. Many of these efforts have focused on improving predictions by post-processing NWP output. Sweeney et al. [20] compared seven adaptive approaches to post-processing wind speed forecasts over Ireland, demonstrating that combined forecasts improved accuracy by reducing the root-mean-squared error (RMSE) relative to traditional NWPs. Xu et al. [21] post-processed the Weather Research and Forecasting (WRF) model with a gradient boosting decision tree [22] to improve wind speed forecasts over the original WRF results. Phipps et al. [23] post-processed weather elements such as temperature, wind speed, and the u- and v-components of wind from NWP weather ensembles to enhance wind power forecasts. Duan et al. [24] proposed a graph-based wind speed prediction model that post-processes the WRF model. Bouallègue et al. [25] used machine learning methods to optimize ECMWF's operational medium-range forecasts of 10 m wind speed by correcting past errors and reducing forecast uncertainties, resulting in a 10–15% improvement in RMSE across various methods. Zhang et al. [10] developed a deep learning model to improve real-time wind forecasts, using a spatiotemporal method to learn a nonlinear mapping between forecasts from the European Centre for Medium-Range Weather Forecasts (ECMWF) and the fifth-generation ECMWF atmospheric reanalysis, reducing wind speed and direction biases. These results suggest that post-processing NWP data to improve wind prediction accuracy could help predict the drift patterns of survivors and debris during maritime accidents, thereby enhancing the efficiency of SAR operations.
Supervised learning has been widely used for post-processing NWPs [10,20,21,24]. This machine learning technique creates a function that maps input data to output data based on given instances of such mappings. The resulting function can then predict the output for new input data; it is referred to as a classifier if the output is categorical, or a regression function if the output is continuous. For instance, in our context, a classifier would predict whether or not strong winds will occur, whereas a regression function might predict wind speed in meters per second.
In machine learning, dimensionality reduction techniques construct an effective set of features from high-dimensional input data. As the dimension of the input data increases, the time or memory required by machine learning techniques can increase significantly. This phenomenon is referred to as the curse of dimensionality [26], and it can be alleviated by dimensionality reduction techniques, which include feature selection [27] and feature extraction [28]. Feature selection chooses a subset of the input variables, while feature extraction projects high-dimensional features onto a lower-dimensional space.
In this paper, we predict the u- and v-components of wind by post-processing forecasts from the ECMWF high-resolution model. We evaluate the first 24 h of each ECMWF forecast and apply post-processing to the subsequent 48 h. First, we apply principal component analysis (PCA) [29] for feature extraction on wind observation and forecast data around the Korean Peninsula for the initial 24 h. We use these data to train a support vector regression (SVR) [30] model, which then makes wind predictions for the following 48 h. Furthermore, we devise an adaptive weighting scheme that dynamically combines predictions at locations without wind stations with those from the nearest wind station. This approach improved the accuracy of predictions for locations without wind stations. Using observations to post-process ECMWF forecasts is a novel approach compared to previous studies; however, it cannot be applied to locations with no nearby wind stations.
The rest of this paper is organized as follows: In Section 2, we introduce the datasets and their features. In Section 3, we present the details of PCA, SVR, and the adaptive weighting scheme. In Section 4, we present the experimental setup and results. Finally, in Section 5, we draw conclusions and discuss future research directions.
2. Data
Observation data were collected from seven offshore wind stations around the Korean Peninsula, each equipped with sensors that measure wind speed and direction. We convert these measurements into the u- and v-components of wind, as this decomposition is known to be beneficial for wind forecasting [31].
Figure 1 shows the seven stations we use: two are in the East Sea, two in the Yellow Sea, and the remaining three in the Korea Strait. Wind data for 1 June 2022 to 31 May 2023 at these locations are used for training and evaluating our scheme.
Table 1 lists the details of these stations.
We post-process the ECMWF high-resolution forecasts initialized at 00:00 UTC. Each forecast predicts the u- and v-components of surface wind at hourly intervals, from 0 h to 71 h. Every day at 00:00 UTC, we use the first 24 h (0 h–23 h) of the forecast released on the previous day and the latest 24 h of observations (00:00 UTC to 23:00 UTC) to correct the last 48 h (24 h–71 h) of the ECMWF's 72 h forecast.
Table 2 lists the independent variables used in this study. To mitigate the artificial dissimilarity between 1 January (day 1) and 31 December (day 365), we applied trigonometric functions to encode the cyclic day-of-year data (1–365).
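The cyclic day encoding can be sketched as follows. The paper states that trigonometric functions are applied to the day-of-year but does not give the exact formula, so the sine/cosine mapping below is an assumption consistent with common practice:

```python
import numpy as np

def encode_day_of_year(day: int, period: int = 365) -> tuple[float, float]:
    """Map a day of year (1-365) onto the unit circle so that
    day 365 and day 1 are adjacent rather than maximally distant."""
    angle = 2 * np.pi * (day - 1) / period
    return np.sin(angle), np.cos(angle)

# Days 1 and 365 end up close together in feature space:
s1, c1 = encode_day_of_year(1)
s365, c365 = encode_day_of_year(365)
distance = np.hypot(s1 - s365, c1 - c365)
```

With the raw day number, days 1 and 365 are 364 units apart; on the unit circle their distance is about 0.017, matching their seasonal similarity.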
3. Methods
We use a machine learning approach, in which features are extracted from the original dataset using principal component analysis (PCA), and then support vector regression (SVR) is used to correct the u- and v-components of wind data for each forecasting interval. The proposed method is evaluated by the root-mean-squared error (RMSE).
3.1. Nomenclature
The nomenclature used in this section is given below.
3.2. Principal Component Analysis
A large number of input variables can significantly increase computational time and memory usage, and can cause overfitting, which degrades performance on unseen data. To resolve this issue, two types of dimensionality reduction techniques can be used: feature extraction and feature selection. Based on preliminary experiments not covered in this paper, we chose PCA for feature extraction because it performed better on our dataset than other dimensionality reduction techniques, such as various wrapper methods [32] and correlation-based feature selection [33]. When raw input data have little classification power, feature extraction tends to be preferred over feature selection [27,34].
PCA is a feature extraction technique that uses an orthogonal transformation to convert possibly correlated variables into principal components (PCs), which are linearly uncorrelated. It provides an informative view of the data by introducing a new coordinate system and reduces the dimensionality of the data. For example, PCA has been used in wavelet denoising [35], extracting features from facial images [36], and constructing early warning systems for heavy rainfall [37] by discarding insignificant features from the feature space.
Let $\mathbf{w}_p$ be the $p$-th PC and $\mathbf{x}_n = (x_{n1}, \ldots, x_{nM})$ represent the variable values of the $n$-th instance in a training set $\{\mathbf{x}_1, \ldots, \mathbf{x}_N\}$, where $M$ is the number of variables associated with $\mathbf{x}_n$ and $x_{nm}$ is the value of the $m$-th variable. Assuming the data are mean-centered, the first PC $\mathbf{w}_1$ is computed to maximize variance:

$$\mathbf{w}_1 = \arg\max_{\|\mathbf{w}\| = 1} \sum_{n=1}^{N} \left( \mathbf{w}^{\top} \mathbf{x}_n \right)^2$$

The remaining PCs are subsequently constructed to maximize variance while being orthogonal to the previous components. The number of PCs is at most $M$, and the dimensionality of the data can be reduced by selecting the first $s$ ($< M$) PCs without significant loss of information. After selecting $s$ PCs, each instance $\mathbf{x}_n$ is transformed to $(\mathbf{w}_1^{\top}\mathbf{x}_n, \ldots, \mathbf{w}_s^{\top}\mathbf{x}_n)$.
In general, the PCs are computed using the singular value decomposition (SVD) [38]. The SVD of an $N \times M$ matrix $\mathbf{X}$ is a factorization of the form $\mathbf{X} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^{\top}$, where $\mathbf{U}$ is an $N \times N$ matrix containing the left singular vectors, $\mathbf{V}$ is an $M \times M$ matrix containing the right singular vectors, and $\boldsymbol{\Sigma}$ is an $N \times M$ diagonal matrix of singular values. The right singular vectors (the columns of $\mathbf{V}$) are the PCs, which represent the directions of maximum variance in $\mathbf{X}$. Typically, the time complexity of PCA is $O(\min(N^2 M, N M^2))$, where $M$ is the number of variables and $N$ is the number of instances. Detailed information on the computation of the SVD can be found in the works of Jolliffe [38] and Leskovec et al. [39]. An illustrative example of PCA is shown in Figure 2.
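The SVD route to the PCs can be illustrated with NumPy (a minimal sketch on synthetic data, not the paper's dataset):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
X -= X.mean(axis=0)  # center so the PCs capture variance, not the mean

# Rows of Vt (right singular vectors) are the principal components,
# ordered by decreasing singular value.
U, S, Vt = np.linalg.svd(X, full_matrices=False)

# Explained variance of each PC follows from the singular values.
explained_variance = S**2 / (X.shape[0] - 1)

# Projecting onto the first s PCs reduces dimensionality from 5 to s.
s = 2
X_reduced = X @ Vt[:s].T
```

The columns of `Vt.T` form an orthonormal basis, so the projection is the orthogonal transformation described above.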
The most common criterion for choosing the value of $s$ is the cumulative percentage of total variation: $s$ is chosen to be as small as possible while ensuring that the percentage of variation accounted for by the first $s$ PCs exceeds a specified cutoff. Although a sensible cutoff typically lies between 70% and 90%, it can vary depending on the properties of the dataset [38]. In this study, we set the cutoff at 98%, which, based on our preliminary experiments, retains approximately 19 variables. Reducing the cutoff from 98% to 90% decreased the average number of PCs $s$ from 19.0 to 4.7 per wind station, but significantly degraded the performance of the wind data correction.
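The cutoff-based selection of $s$ can be sketched with scikit-learn, which the authors use for their implementation (the synthetic data below is an illustrative stand-in, not the paper's dataset):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-in for the ~101 input variables of one station
# (encoded day, past 24 h forecasts and observations, target-hour inputs).
X = rng.normal(size=(365, 101))
X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=365)  # two highly correlated columns

X_std = StandardScaler().fit_transform(X)

# A float in (0, 1) tells scikit-learn to keep the smallest number of
# components whose cumulative explained variance exceeds that fraction.
pca = PCA(n_components=0.98, svd_solver="full")
X_reduced = pca.fit_transform(X_std)

s = pca.n_components_  # the number of retained PCs
```

On real, strongly correlated meteorological inputs, a 98% cutoff compresses the feature space far more aggressively than on this random example.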
3.3. Support Vector Regression
SVR is a machine learning algorithm that extends the principles of support vector machines [40] to regression problems. SVR aims to find a function that approximates the relationship between the input and output variables by minimizing the prediction error while keeping the model as simple as possible. SVR is widely used in marine engineering, including applications such as predicting the maneuvering motion of an unmanned surface vehicle [41] and nonparametric modeling of ship dynamics [42].
SVR maps the input data onto a high-dimensional feature space using a kernel function and then performs linear regression in that space. This approach enables SVR to effectively handle nonlinear relationships between the input and output variables. Commonly used kernels include the linear, polynomial, and radial basis function kernels [43].
Another feature of SVR is the epsilon tube, which ignores errors that lie within a distance $\epsilon$ of the true value. This creates a "tube" around the regression line inside which errors are not penalized. SVR also manages the trade-off between achieving a low error on the training dataset and minimizing model complexity through the regularization parameter $C$: a large $C$ value tries to fit the training data as well as possible, while a smaller $C$ value leads to a simpler model. The optimization problem for SVR can be formulated as follows:

$$\min_{\mathbf{w},\, b} \; \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_{n=1}^{N} \max\!\left(0,\; \left| y_n - (\mathbf{w}^{\top} \mathbf{x}_n + b) \right| - \epsilon \right)$$

where $\mathbf{w}$ and $b$ are the parameters of the regression function, $C$ is the regularization parameter, $\epsilon$ is the width of the epsilon tube, $y_n$ are the target values, and $\mathbf{x}_n$ are the input variables.
Figure 3 illustrates the support vector regression with the epsilon tube. The central solid line indicates the regression function, representing the best fit within the epsilon tube. The shaded area around this line is the epsilon tube. Points falling within this tube are considered well predicted. The epsilon tube defines a margin of tolerance, where no penalty is given to errors.
In our study, we employ SVR to correct the u- and v-components of wind data for each forecasting interval. The input features for SVR include both observed and forecasted wind data. Following standard practice, we standardized the input variables [44] by rescaling them to zero mean and unit standard deviation, ensuring an equal contribution from each variable. We then applied PCA to reduce dimensionality, capturing the most significant variance while minimizing noise and computational complexity, before training the SVR models.
Figure 4 shows a flowchart of the wind prediction correction process using PCA and SVR.
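The standardize → PCA → SVR sequence can be sketched with scikit-learn (a minimal sketch on synthetic data; the paper trains one such model per station, per wind component, and per forecast interval, and the kernel and iteration settings below are those reported in Section 4.1):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVR

rng = np.random.default_rng(42)
# Synthetic stand-in: rows are days, columns are the input variables
# (encoded day, past 24 h observations and forecasts, target-hour forecast).
X_train = rng.normal(size=(300, 101))
y_train = X_train[:, :3].sum(axis=1) + 0.1 * rng.normal(size=300)

# One model corrects one wind component at one forecast interval.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.98, svd_solver="full"),
    SVR(kernel="linear", max_iter=16000),
)
model.fit(X_train, y_train)

u_corrected = model.predict(rng.normal(size=(1, 101)))
```

Wrapping the three steps in a pipeline ensures the scaler and PCA are fit only on training folds, avoiding leakage during cross-validation.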
3.4. Adaptive Weighting Scheme
In our study, we corrected the wind forecasts for the next 48 h by post-processing the observations and NWP forecasts from the past 24 h. Consequently, a wind data correction model cannot be trained for locations without observational data. In such cases, we use the model trained at the nearest wind station to correct the wind data. The wind station supplies the model with the first 98 input variables (the day and the most recent 24 h of forecasts and observations), while the remaining three variables (the target hour of the correction and the forecasts for that time) come from the location without observations.
However, since the two locations are not exactly the same, we found that combining the original predictions with the corrections from the model trained at the nearest wind station yields better results. To achieve this, we devised an adaptive weighting scheme.
Cosine similarity measures the degree to which the directions of two vectors align. It ranges from −1 (opposite directions) to 1 (perfectly aligned directions) and is calculated as follows:

$$\cos(\mathbf{a}, \mathbf{b}) = \frac{\mathbf{a} \cdot \mathbf{b}}{\|\mathbf{a}\| \, \|\mathbf{b}\|}$$

where $\mathbf{a}$ and $\mathbf{b}$ are the two vectors, $\mathbf{a} \cdot \mathbf{b}$ is their dot product, and $\|\mathbf{a}\|$ and $\|\mathbf{b}\|$ are their magnitudes (lengths).
The adaptive weighting scheme calculates the cosine similarity between the past 24 h forecasts at the location without observations and those at the nearest wind station. Using this scheme, the corrected u-component of wind, $\hat{u}$, is calculated as follows:

$$\hat{u} = (1 - w)\, u_{\mathrm{org}} + w\, u_{\mathrm{cor}}$$

where $u_{\mathrm{org}}$ is the original forecast at the location without observations, $u_{\mathrm{cor}}$ is the u-component of wind corrected using the model from the nearest wind station, and $w$ is the cosine similarity between the past 24 h u-component forecasts at the location without observations and those at the nearest wind station. Each past forecast can be interpreted as a vector of length 24, allowing us to calculate the cosine similarity between the two forecasts. The more similar these vectors are, the closer the correction is to the post-processed value that uses observations from the nearest wind station; if the vectors differ significantly, the result stays close to the original forecast. The v-component of the wind at locations without observations is determined in a similar manner.
Figure 5 shows a flowchart of the wind prediction correction process at locations without observational data using the adaptive weighting scheme.
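The adaptive weighting step can be sketched as follows. The blend `(1 - w) * u_org + w * u_cor`, with `w` the cosine similarity between the two 24 h forecast vectors, is our reading of the scheme described above:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two forecast vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def adaptive_correction(u_org: float, u_cor: float,
                        past_target: np.ndarray,
                        past_station: np.ndarray) -> float:
    """Blend the raw forecast with the nearest-station correction.

    past_target, past_station: the past 24 h u-component forecasts at the
    target location and at the nearest wind station (length-24 vectors).
    """
    w = cosine_similarity(past_target, past_station)
    return (1 - w) * u_org + w * u_cor

# If the two 24 h forecasts agree perfectly (w = 1), the station's
# corrected value is used unchanged:
past = np.arange(24, dtype=float) + 1.0
result = adaptive_correction(u_org=5.0, u_cor=4.2,
                             past_target=past, past_station=past)
```

The same function applies unchanged to the v-component.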
4. Results
4.1. Experimental Setup
We used the first three days of the high-resolution forecasts from the ECMWF, released daily at 00:00 UTC, for the seven locations shown in Figure 1. This dataset covers the period from June 2022 to May 2023. To evaluate the performance of the wind forecast corrections, we used 12-fold cross-validation (CV), with each fold consisting of one month. For example, in the first fold, data from June 2022 to April 2023 were used as the training set, while data from May 2023 were used as the test set. This procedure was repeated for all months, and the results were averaged to produce a single performance estimate for each model.
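The leave-one-month-out splitting can be sketched as follows (the month labels are illustrative):

```python
def monthly_folds(months: list[str]):
    """Yield (train_months, test_month) pairs: each month in turn is held
    out as the test set, and the remaining eleven form the training set."""
    for i, test_month in enumerate(months):
        train = months[:i] + months[i + 1:]
        yield train, test_month

# June 2022 through May 2023: twelve one-month folds.
months = [f"2022-{m:02d}" for m in range(6, 13)] + \
         [f"2023-{m:02d}" for m in range(1, 6)]
folds = list(monthly_folds(months))
```

Splitting by calendar month rather than by random rows keeps temporally adjacent (and thus correlated) samples out of both sides of a fold.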
We tested the performance of SVR without any dimensionality reduction techniques, and of SVR preceded by PCA, which uses an orthogonal transformation to convert possibly correlated variables into linearly uncorrelated variables. By projecting high-dimensional features into a lower-dimensional space and introducing a new coordinate system, PCA reduces the number of variables in the dataset.
We also tested linear regression, random forest [45], and the light gradient-boosting machine (LightGBM) [22]. Linear regression models the relationship between the input variables and the target variable by fitting a linear equation to the observed data. Random forest is an ensemble learning method that constructs multiple decision trees during training and outputs the mean prediction of the individual trees, providing robustness against overfitting. LightGBM is a gradient-boosting framework that uses tree-based learning algorithms. Each of these techniques was evaluated to compare its effectiveness in correcting wind forecasts.
We implemented the machine learning techniques used in this study with scikit-learn [46]. All inputs were standardized, and hyperparameters were kept at their default settings, except for SVR, which used a linear kernel with a maximum of 16,000 iterations. Since conventional machine learning techniques generate only a single output, separate correction models were trained for the u- and v-components of the wind. All experiments were conducted on an Intel® Core™ i9-12900K processor.
4.2. Performance Criterion
We use the root-mean-squared error (RMSE) to evaluate the performance of the wind correction. In this study, the errors in the u- and v-components of the wind are assessed simultaneously as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{n=1}^{N} \left[ (u_n - \hat{u}_n)^2 + (v_n - \hat{v}_n)^2 \right]}$$

where $N$ is the number of test cases, $u_n$ and $v_n$ are the true u- and v-components of the wind, and $\hat{u}_n$ and $\hat{v}_n$ are the predicted u- and v-components of the wind.
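A direct implementation of this joint u/v metric (illustrative):

```python
import numpy as np

def wind_rmse(u_true, v_true, u_pred, v_pred) -> float:
    """RMSE over both wind components jointly: each test case contributes
    its squared u-error plus its squared v-error."""
    u_true, v_true = np.asarray(u_true), np.asarray(v_true)
    u_pred, v_pred = np.asarray(u_pred), np.asarray(v_pred)
    sq = (u_true - u_pred) ** 2 + (v_true - v_pred) ** 2
    return float(np.sqrt(sq.mean()))

# Each case below is off by a 3-4-5 triangle (3 m/s in u, 4 m/s in v),
# so the joint RMSE equals the 5 m/s vector error.
rmse = wind_rmse([0, 0], [0, 0], [3, 3], [4, 4])  # → 5.0
```

Note that this joint error equals the RMSE of the wind-vector error magnitude, which is why it is a natural criterion when both components matter, as in drift prediction.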
4.3. Comparative Analysis
First, we compared the performance of each machine learning model without any dimensionality reduction. Table 3 shows the average RMSE values for the 24 h to 71 h forecasts, obtained using 12-fold cross-validation. The ECMWF high-resolution model served as the baseline, and we compared it with linear regression, random forest, LightGBM, and SVR. Each model's performance is evaluated at the stations in the East Sea, Yellow Sea, and Korea Strait. Without dimensionality reduction, LightGBM achieved the lowest RMSE (2.81), the best performance among the compared techniques. Linear regression, random forest, and SVR also achieved RMSE values below 2.92, an improvement of over 12% compared to the ECMWF baseline.
Next, Table 4 shows the results of wind correction using machine learning techniques after applying PCA for dimensionality reduction. Here, SVR achieved the best performance with the lowest average RMSE (2.78), improving on the ECMWF by 16.01% and outperforming LightGBM without PCA on average. Specifically, SVR improved the RMSE at all stations, with the average RMSE decreasing from 3.31 to 2.78. These findings highlight the robustness of SVR in refining wind forecasts, which is critical for applications like SAR operations. The performance differences between SVR kernel types can be found in Appendix A, and the performance for wind direction is given in Appendix B.
Linear regression also showed improvement, with an RMSE of 2.82, over 3% better than its result without PCA. The performance of linear regression was not significantly worse than that of the best model, which is consistent with the findings of Bouallègue et al. [25]. LightGBM, which performed best without PCA, showed an increase in RMSE of over 12% after dimensionality reduction. Additionally, the RMSE of random forest became higher than that of the ECMWF. These tree-based models do not appear to perform well with PCA-based dimensionality reduction, which is consistent with the experimental results of Moon et al. [37].
4.4. Analysis of PCA
Table 5 shows the number of input variables and the corresponding average RMSE values obtained by varying the PCA cutoff when correcting wind data with SVR. The cutoff values range from 0.90 to 0.99 and determine the cumulative percentage of variance that must be preserved after dimensionality reduction. As the cutoff increases, more input variables are retained, preserving more information from the original dataset, and the average RMSE generally decreases up to a certain point. The lowest RMSE of 2.7801 is achieved at a cutoff of 0.98, making it the optimal cutoff for this experiment. Therefore, all subsequent experiments were conducted using SVR with a linear kernel and PCA with a cutoff of 0.98, which retains approximately 19 variables. With this setting, the model takes approximately 100 s to train on one year of data from a single location and about 2 s to correct the 24 h–71 h forecast for a single day (the source code is available at https://github.com/uramoon/wind-correct, accessed on 9 August 2024).
4.5. Monthly Performance Evaluation
Figure 6 illustrates the monthly RMSE values for wind forecasts from the ECMWF model and the improvements achieved through post-processing these forecasts using PCA and SVR for three regions: East Sea, Yellow Sea, and Korea Strait. The RMSE value for each region is the average of the RMSE values of the stations within that region.
In the East Sea, the RMSE values for forecasts from the ECMWF range from 2.52 in January to 7.22 in May. After applying PCA and SVR, the RMSE values show a significant reduction, ranging from 2.23 in January to 6.33 in May. This demonstrates a consistent improvement across all months, with the post-processed RMSE values being lower than the original forecasts from the ECMWF.
For the Yellow Sea, the ECMWF RMSE values are generally higher than those of the East Sea, with values such as 5.36 in January and 5.07 in April. Applying PCA and SVR reduces these to 3.59 in January and 3.60 in April. This substantial decrease in RMSE highlights the effectiveness of the post-processing method in improving wind forecast accuracy in the Yellow Sea.
In the Korea Strait, ECMWF RMSE values range from 2.23 in January to 4.12 in May. After post-processing, the RMSE values improve to a range of 2.06 in January to 3.66 in May. The reduction in RMSE values across all months indicates the robustness of the PCA and SVR approach in providing more accurate wind forecasts for the Korea Strait.
Overall, post-processing forecasts from the ECMWF with PCA and SVR consistently improved the RMSE values across all three regions and throughout the year. The reductions in RMSE values are evident in every month, suggesting that the post-processing method is effective in enhancing the accuracy of wind forecasts.
4.6. Analysis of Forecast Horizon
We corrected the 24 h–71 h portion of the 72 h forecast released on the previous day, which aligns with the 0 h–48 h forecast released on the current day. Due to spin-up issues, the forecast for the current day becomes available several hours after 00:00 UTC. Our corrected forecast can therefore be useful during this prediction gap. Furthermore, as shown in Figure 7, our corrected forecast is more accurate than the forecast released on the current day, making it highly beneficial for SAR operations.
4.7. Stations without Observational Data
So far, we have successfully post-processed the final 48 h of the forecasts by utilizing the first 24 h of forecast data together with observational data at locations with observations. However, most maritime accidents occur in areas where wind observations are not available, and we need to account for them. To address this, we use the model trained at the wind station nearest to the accident site and evaluate the similarity between the recent 24 h forecasts at the two locations. We then correct the wind data using the adaptive weighting scheme.
Table 6 compares the results of different methods, assuming that each location has no observational data and using the model trained at the nearest station. In this table, ECMWF denotes the baseline NWPs for the target station. SVR represents the correction result obtained using the model trained at the nearest station. Averaging refers to the arithmetic mean of the ECMWF and SVR predictions. Adaptive weighting is the correction result obtained using an adaptive weighting scheme, which evaluates the cosine similarity between the forecasts of the target station and its nearest wind station.
In the East Sea, SVR produced the best results, but the adaptive weighting scheme also improved upon the ECMWF baseline. For the Yellow Sea, both SVR and Averaging showed higher RMSE values than the baseline, indicating a failure in correction; however, the adaptive weighting scheme successfully improved upon the baseline. Lastly, in the Korea Strait, the adaptive weighting scheme demonstrated the best results on average. Overall, the adaptive weighting scheme achieved an average RMSE improvement of 5.42%, outperforming the baseline at all locations. These promising results suggest that it can enhance NWP forecasts for SAR operations even at sites without observational data. The results of using the Euclidean norm instead of cosine similarity in the adaptive weighting scheme can be found in Appendix C.
5. Conclusions and Future Work
In this study, we explored various machine learning techniques to improve wind forecast accuracy in areas with and without observational data. Our primary methods were SVR for regression and PCA for dimensionality reduction. Through rigorous experimentation, we identified a PCA cutoff value that balances dimensionality reduction against information retention. This balance was crucial in ensuring that the most relevant features were preserved while minimizing computational complexity, thereby enhancing the performance of the wind prediction models.
Additionally, we introduced an adaptive weighting scheme to refine wind predictions for locations without observational data. This scheme utilized the cosine similarity between forecasts from target locations and their nearest wind stations to dynamically correct predictions. Our results demonstrated that this approach enhanced wind forecasts, which is essential for SAR operations.
Ensemble models are generally known to outperform single NWP models in terms of prediction accuracy [47]. For future research, we aim to improve wind prediction performance by post-processing various ensemble models. Additionally, preliminary experiments indicated that the performance of a simple artificial neural network was underwhelming; our goal is therefore to improve the correction model's performance with various neural network architectures, such as convolutional neural networks, recurrent neural networks, and Transformers.