Enhancing multivariate post-processed visibility predictions utilizing CAMS forecasts

Mária Lakatos^1,2 and Sándor Baran^1,∗
¹Faculty of Informatics, University of Debrecen, Hungary
²Doctoral School of Informatics, University of Debrecen, Hungary

Abstract

In our contemporary era, meteorological weather forecasts increasingly incorporate ensemble predictions of visibility – a parameter of great importance in aviation, maritime navigation, and air quality assessment, with direct implications for public health. However, this weather variable falls short of the predictive accuracy achieved for other quantities issued by meteorological centers. Therefore, statistical post-processing is recommended to enhance the reliability and accuracy of predictions. By estimating the predictive distributions of the variables with the aid of historical observations and forecasts, one can achieve statistical consistency between true observations and ensemble predictions. Visibility observations, following the recommendation of the World Meteorological Organization, are typically reported in discrete values; hence, the predictive distribution of the weather quantity takes the form of a discrete parametric law. Recent studies demonstrated that the application of classification algorithms can successfully improve the skill of such discrete forecasts; however, a frequently emerging issue is that certain spatial and/or temporal dependencies could be lost between marginals. Based on visibility ensemble forecasts of the European Centre for Medium-Range Weather Forecasts for 30 locations in Central Europe, we investigate whether the inclusion of Copernicus Atmosphere Monitoring Service (CAMS) predictions of the same weather quantity as an additional covariate could enhance the skill of the post-processing methods and whether it contributes to the successful integration of spatial dependence between marginals. Our study confirms that post-processed forecasts are substantially superior to raw and climatological predictions, and the utilization of CAMS forecasts provides a further significant enhancement both in the univariate and multivariate setup.

Keywords: Copernicus Atmosphere Monitoring Service, ensemble calibration, ensemble copula coupling, multivariate post-processing, Schaake shuffle, visibility

¹¹footnotetext: Corresponding author: baran.sandor@inf.unideb.hu

1 Introduction

Due to its significant impact on a wide range of aspects of aviation meteorology, visibility is a critical parameter for both weather forecasters and pilots. Beyond the previously mentioned contexts, visibility’s importance extends notably into the realms of agriculture and maritime transportation. Additionally, poor visibility conditions stand out as one of the most common causes of road accidents. The World Meteorological Organization (WMO) defines visibility as ”the greatest distance at which a black object of suitable dimensions (located on the ground) can be seen and recognized when observed against the horizon sky” (WMO,, 1992). Due to the mentioned reasons, it is crucial to create the most accurate visibility forecasts possible, a task that, fortunately, an increasing number of meteorological services have been undertaking in recent years.

In the present day, weather forecasts are crafted through the use of numerical weather prediction (NWP) models. These models utilize initial values obtained from the previous states of the atmosphere, and the perturbation of these initial conditions results in the creation of a set of forecasts, commonly referred to as ensemble predictions. Despite notable advancements, NWP systems face challenges in achieving complete bias-free forecasts for various weather parameters, even within short lead times. Visibility emerges as a parameter with comparatively weak forecast skill, primarily because, unlike temperature, wind speed, or precipitation accumulation, most NWPs do not explicitly model visibility. Consequently, visibility forecasts necessitate derivation from predictions of related quantities such as relative humidity or precipitation (Chmielecki and Raftery,, 2011). Specifically, at the European Centre of Medium-Range Weather Forecasts (ECMWF), the visibility parameter was integrated into the ECMWF Integrated Forecasting System (IFS) on 12 May 2015 (ECMWF,, 2021). This parameter utilizes model projections of water vapor, cloud, rain, and snow, along with climatological aerosol fields, to estimate the visibility that would be recorded by weather observers.

To correct the biases and lack of calibration in the forecasts, one possible direction is the application of statistical post-processing. In recent years, numerous post-processing techniques have been developed to calibrate ensemble forecasts, for a recent overview we refer to Vannitsem et al., (2021). From this wide range of methods, the straightforward yet potent parametric ensemble model output statistic (Gneiting et al.,, 2005) approach maps ensemble forecasts onto a probability distribution, providing a pathway for assessing uncertainties associated with potential outcomes. The commonly used Bayesian model averaging (BMA; Raftery et al.,, 2005) also offers a complete predictive distribution for the examined weather variable; however, by now the machine learning-based distributional regression network (Rasp and Lerch,, 2018) method has became a rather popular parametric technique (see e.g. Schultz and Lerch,, 2022). Another widespread approach to post-processing is capturing the predictive distribution of a weather quantity with the help of its quantiles (see e.g. Friederichs and Hense,, 2007; Bremnes,, 2019). Naturally, such nonparametric calibration is also frequently combined with machine learning techniques. As examples, one can mention the quantile regression forests (Taillardat et al.,, 2016) or the Bernstein quantile network (Bremnes,, 2020).

The utility of univariate post-processing techniques is supported by numerous studies; however, even today, many experts examine the disadvantage that the correlations between marginal distributions often get lost when applying such methods. This can be relevant to connections between locations, predictions initialized at the same time, or the loss of dependencies between weather quantities. Multivariate post-processing aims to restore potentially lost correlation structures, and there are already numerous parametric or non-parametric models available for achieving these goals (see e.g. Schefzik and Möller,, 2018). One option is to consider multivariate predictive distributions (see e.g. Baran and Möller,, 2015); however, the drawback of such models is that due to the estimation of a large number of parameters, they can easily encounter numerical problems. In contrast, the ensemble copula coupling (ECC; Schefzik et al.,, 2013) represents a two-step approach that involves initially generating samples from univariate predictive distributions. Subsequently, these samples are reorganized into the raw ensemble rank order structure, thus, this procedure can be interpreted as the application of an empirical copula. The Schaake Shuffle (Clark et al.,, 2004) is another method that functions based on a similar principle to ECC; nevertheless, past research has suggested that ECC frequently demonstrates slightly better predictive performance (see e.g. Chen et al.,, 2024; Lakatos et al.,, 2023). For a comprehensive comparison of such multivariate post-processing methods, we refer to Lerch et al., (2020).

Visibility observations are typically reported in discrete values according to WMO guidelines, namely ”100 to 5,000 m in steps of 100 m, 6 to 30 km in steps of 1 km, and 35 to 70 km in steps of 5 km” (WMO,, 2018, Section 9.1.2). As a result, the post-processing of visibility forecasts is reduced to a classification task, where the predicted probabilities of the different classes comprise the forecast distribution in the form of a discrete probability mass function (PMF). A recent study (Baran and Lakatos,, 2023) demonstrated that the implementation of multilayer perceptron neural network (MLP; Goodfellow et al.,, 2016) and proportional odds logistic regression (POLR; McCullagh,, 1980) classifiers can significantly enhance the forecast skill of raw ECMWF visibility predictions. Note, that these two methods displayed superior forecast skills also in the realm of post-processing ensemble predictions of total cloud cover (Baran et al.,, 2021). However, in contrast to Baran et al., (2021), the investigation of Baran and Lakatos, (2023) did not include the exploration of the use of additional covariates that are not exclusively based on ECMWF forecasts of the target weather quantity. A natural first step in this direction is to consider further predictions of visibility that are independent of the operational ECMWF ensemble.

The Copernicus Atmosphere Monitoring Service (CAMS) is dedicated to monitoring and predicting the composition of Earth’s atmosphere. It provides information on various atmospheric components, including air quality, greenhouse gases, and aerosols (Pitkänen et al.,, 2023). CAMS delivers deterministic forecasts related to these components, generated by extending the IFS model developed by ECMWF with additional modules.

In this paper, we study the predictive accuracy of POLR and MLP techniques for calibrating ECMWF visibility ensemble forecasts, incorporating CAMS forecasts as covariates, in the contexts of both univariate and multivariate post-processing. Raw ECMWF visibility ensemble and climatology are considered as reference forecasts.

The structure of the paper is outlined as follows. In Section 2, a concise description of the visibility datasets under study is presented. Section 3 provides a review of the considered univariate and multivariate post-processing methods, along with details on the approaches used for training data selection and the verification tools considered. The outcomes of our case study are detailed in Section 4, followed by a brief discussion and conclusions in Section 5.

2 Data

Refer to caption — Figure 1: Locations of SYNOP observation stations corresponding to the ECMWF- and CAMS forecasts for 2020-2021

The visibility ensemble forecasts employed in this study consist of the operational ECMWF control (CTRL) and the corresponding 50 interchangeable ensemble members (ENS) produced by the ECMWF IFS. Regarding the latter 50 members of the ensemble, it is important to note that their interchangeability stems from the statistical indistinguishability arising from the perturbations of the initial conditions of the NWP. As mentioned in the Introduction, while visibility is primarily reported as a continuous variable in meters, the WMO suggests reporting observations in discrete categories, resulting in 84 different values in total, which can be described by the set

\mathcal{Y}=\{0,100,200,\ldots,4900,5000,6000,7000,\ldots,29000,30000,35000,40% 000,\ldots,65000,70000\},

where the alignment of forecasts (provided in 1 m increments) with observations is achieved by rounding down to the nearest reported value.

As discussed, our goal is to show that incorporating CAMS visibility forecasts as an additional predictor in the post-processing methods described in Section 3, – representing an independent yet comparable meteorological parameter – can significantly enhance the forecast skill of both the raw ECMWF predictions and the corresponding calibrated forecasts. The deterministic CAMS predictions are part of the global forecasts for atmospheric composition provided by the Copernicus Atmosphere Monitoring Service, a service provided by the ECMWF as part of the Copernicus program. The CAMS Global atmospheric composition forecast utilizes the IFS, the same model employed for ECMWF weather forecasts. However, specific modules within CAMS have been enabled to account for aerosols, reactive gases, and greenhouse gases, which have been developed uniquely within the CAMS framework.

Both the ECMWF ensemble, the CAMS forecasts, and the corresponding observations span a 2-year period from January 2020 to December 2021, for 30 SYNOP stations located in Germany, the Czech Republic, and Poland (refer to Figure 1). All forecasts are initialized at 0000 UTC, and we consider 20 different lead times ranging from 6 hours to 120 hours, with a time step of 6 hours. Notably, this dataset is fairly complete, with no missing observations.

3 Post-processing- and forecast evaluation methods

In what follows, let $\boldsymbol{f}^{(d)}=\big{(}f_{1}^{(d)},f_{2}^{(d)},\ldots,f_{51}^{(d)}\big{)}$ denote a 51-member ECMWF ensemble forecast for SYNOP station $d$ ( $d=1,2,\ldots,30$ ) for a given time point with a given lead time, where $f_{1}^{(d)}=f^{(d)}_{\text{CTRL}}$ represents the control run, and $f^{(d)}_{2},f^{(d)}_{3},\ldots,f^{(d)}_{51}$ correspond to the 50 exchangeable members $f^{(d)}_{\text{ENS},1},f^{(d)}_{\text{ENS},2},\ldots,f^{(d)}_{\text{ENS},50}$ . Furthermore, let $f_{0}^{(d)}=f^{(d)}_{\text{CAMS}}$ be the matching CAMS prediction, while $Y^{(d)}\in{\mathcal{Y}}=\big{\{}y_{1},y_{2},\ldots,y_{84}\big{\}}$ denotes the corresponding observed visibility with ${\mathcal{Y}}$ defined in Section 2. Finally, station-specific data are combined into multivariate forecasts $\boldsymbol{\mathfrak{f}}_{0},\boldsymbol{\mathfrak{f}}_{1},\ldots,\boldsymbol% {\mathfrak{f}}_{51}$ and observations $\boldsymbol{Y}$ , where $\boldsymbol{\mathfrak{f}}_{\ell}=\big{(}f_{\ell}^{(1)},f_{\ell}^{(2)},\ldots,f% _{\ell}^{(30)}\big{)}^{\top},\ \ell=0,1,\ldots,51,$ and $\boldsymbol{Y}=\big{(}Y^{(1)},Y^{(2)},\ldots,Y^{(30)}\big{)}^{\top}$ .

3.1 Univariate post-processing

To simplify the presentation of calibration methods for marginals, in this section we omit the indication of the SYNOP station and use notations $\boldsymbol{f}=\boldsymbol{f}^{(d)}$ and $Y=Y^{(d)}$ for forecasts and observations, respectively.

As $Y$ is a discrete random variable, the predictive distribution of $Y$ is specified by conditional probabilities

{\mathsf{P}}(Y=y_{k}\mid\boldsymbol{f}),\qquad k=1,2,\ldots,84,

(3.1)

forming a conditional PMF with respect to the ensemble forecast.

Naturally, in (3.1) the raw forecast can replaced by any feature vector $\boldsymbol{x}$ derived from the ensemble and/or other covariates. These covariates can be location or time-specific variables or additional forecasts of visibility or other related weather quantities.

3.1.1 Multilayer perceptron neural network

In recent years, the application of various types of artificial neural networks (ANNs) for the calibration of ensemble forecasts has become increasingly popular. These networks have a significant advantage compared to classical parametric or non-parametric approaches, as they are more flexible e.g. in representing non-linear relations between the ensemble forecasts and the parameters of the predictive distributions and in incorporating new explanatory variables in the modelling process. Multilayer perceptron (MLP) networks are a powerful class of ANNs widely employed for classification tasks in machine learning. A conventional MLP comprises multiple layers and nodes (neurons), where the value of each node is a result of the transformed (via an activation function) weighted sum of values from the nodes of the preceding layer, along with an added bias term. Input features are introduced to the input layer, and predictions regarding the distribution of different classes are generated in the output layer. The level of abstraction can be determined by tuning parameters, such as the number of hidden layers and the neurons within them. For more details, please refer to the work by Goodfellow et al., (2016).

3.1.2 Proportional odds logistic regression

Proportional odds logistic regression (POLR) is an efficient parametric classification method for ordered classes, such as the visibility observations at hand. It specifies the conditional cumulative distribution of observed visibility $Y$ given an $M$ -dimensional feature vector $\boldsymbol{x}$ as

{\mathsf{P}}\big{(}Y\leq y_{k}\mid\boldsymbol{x}\big{)}=\frac{{\mathrm{e}}^{% \mathcal{L}_{k}(\boldsymbol{x})}}{1+{\mathrm{e}}^{\mathcal{L}_{k}(\boldsymbol{% x})}},\qquad\text{where}\qquad\mathcal{L}_{k}(\boldsymbol{x}):=\alpha_{k}+% \boldsymbol{x}^{\top}\boldsymbol{\beta},\quad\ k=1,2,\ldots,84,

with $\alpha_{k}\in{\mathbb{R}},\ \boldsymbol{\beta}\in{\mathbb{R}}^{M}$ , and $\alpha_{1}<\alpha_{2}<\cdots<\alpha_{84}$ . Hence, POLR modelling of visibility classes results in a total of $84+M$ unknown parameters to be estimated.

3.2 Multivariate methods

The purpose of applying univariate post-processing methods is to fine-tune and calibrate the forecasts predicted by the NWP model, addressing various accuracy issues or biases. However, when applying these techniques, there is a risk of losing the spatial, temporal, or even inter-variable correlations between marginals. Here we consider two easy-to-implement yet powerful two-step techniques that can correct this particular deficiency and restore dependence between marginal distributions (see e.g. Lerch et al.,, 2020; Lakatos et al.,, 2023). Both methods are based on reordering a sample generated from the calibrated predictive distribution obtained in the first step, where one can either consider equidistant quantiles or draw random or stratified samples. As Chen et al., (2024) reveal only minor differences in skill between the multivariate forecasts corresponding to the various sampling strategies, here we will work with the equidistant quantiles of the post-processed univariate forecasts.

3.3 Ensemble copula coupling

Ensemble copula coupling (ECC) leverages the raw ensemble rank structure (with ties resolved at random) and capitalizes on the ample information embedded in ensemble predictions concerning dependencies. Following the notations of Lakatos et al., (2023) this iterative algorithm can be described as follows:

1.

Generate a sample $\widehat{f}_{1}^{(d)},\widehat{f}_{2}^{(d)},\ldots,\widehat{f}_{K}^{(d)}$ for each dimension $d$ , matching the size of the raw ensemble ( $K$ = 51), from the calibrated marginal predictive distribution assumed to be sorted in ascending order.

Consider permutations $\boldsymbol{\pi}_{d}=\big{(}\pi_{d}(1),\pi_{d}(2),\ldots,\pi_{d}(K)\big{)}$ of $\{1,2,\ldots,K\}$ corresponding to the rank order structure of the raw ensemble $f_{1}^{(d)},f_{2}^{(d)},\ldots,f_{K}^{(d)}$ , namely $\pi_{d}(k):=\mathop{\hbox{\rm rank}}\big{(}f_{k}^{(d)}\big{)}$ , where ties are resolved at random. The ECC calibrated sample $\widetilde{f}_{1}^{(d)},\widetilde{f}_{2}^{(d)},\ldots,\widetilde{f}_{K}^{(d)}$ for location $d$ is obtained by rearranging the sample generated in step 1 according to permutation $\boldsymbol{\pi}_{d}$ , that is

\widetilde{f}_{k}^{(d)}:=\widehat{f}_{\pi_{d}(k)}^{(d)},\qquad k=1,2,\ldots,K,% \quad d=1,2,\ldots,30.

3.4 Schaake shuffle

The other nonparametric multivariate post-processing technique we consider is the Schaake shuffle (SSh; Clark et al.,, 2004), which involves using randomly selected past observations to create a dependence template aimed at restoring correlations between the marginal distributions. Samples from the calibrated univariate predictive distributions are then rearranged to match the rank order structure of this selected set of historical observations of the corresponding cardinality. In this way, the SSh method offers the flexibility to generate post-processed forecasts of various sizes, provided there is a sufficiently long historical climatological dataset available. However, to ensure a fair comparison with ECC methods, we restrict the sample size to that of the raw ensemble and use the same sampling strategy.

3.5 Training data selection

Each statistical post-processing method, including both the MLP and POLR models introduced in Section 3.1, conducts parameter estimation using training data, which comprises a collection of prior forecasts and observations predating the current forecast time. Various approaches exist for selecting this training dataset in terms of both spatial and temporal decomposition.

A frequently used method of spatial selection, known as local estimation, entails using forecasts and observations specific to the station under consideration to estimate model parameters or weights (Thorarinsdottir and Gneiting,, 2010). This approach could offer significant advantages as it effectively integrates the station’s characteristics and local conditions into the calibration process. Another widely used and quite powerful alternative to the latter selection process is to treat together historical data of all stations of the ensemble domain, which, for a short training period, can be more efficient than local modelling. Nevertheless, this method, referred to as regional, is not particularly suitable for expansive and diverse domains, since it may struggle to effectively capture the varied complexities and nuances present across such landscapes. Due to the factors mentioned, in the case of regional modeling, all locations share the same set of parameters on a specific day to derive the parameters of the predictive distribution. A middle ground between regional and local methods is semi-local modeling, where stations with similar characteristics are clustered together, and regional modelling is applied within these clusters (Lerch and Baran,, 2017).

The efficiency of post-processing methods is further influenced by the length of the training period and where the training period is positioned in time. One commonly used technique for temporal selection is when the training data is selected using a so-called sliding window approach, where the data for the preceding $n$ calendar days before the actual forecast date are chosen. Alternatively, fixing the training period to a specific extensive time frame in the past can also be suitable for implementing the modelling process, especially in the case of machine learning-based techniques requiring large training datasets. For a recent comparison of various time-adaptive training schemes, we refer to Lang et al., (2020).

3.6 Verification scores

Following the suggestions of Gneiting and Raftery, (2007), the predictive performance of the studied univariate and multivariate visibility forecasts is evaluated with the help of proper scoring rules, as they simultaneously address both the calibration and sharpness of the predictive distribution. To be consistent with Baran and Lakatos, (2023), in the univariate case, we consider the logarithmic score (LogS; Good,, 1952) and the continuous ranked probability score (CRPS; Wilks,, 2019, Section 9.5.1). For a discrete visibility forecast $F$ represented by a PMF $p_{F}(y)$ on $\mathcal{Y}$ the former is defined as

\mathop{\hbox{\rm LogS}}(F,y):=-\log\big{(}p_{F}(y)\big{)},

whereas the latter equals

\mathop{\hbox{\rm CRPS}}(F,y)=\sum_{k=1}^{84}p_{F}(y_{k})\big{|}y-y_{k}\big{|}% -\sum_{k=2}^{84}\sum_{\ell=1}^{k-1}p_{F}(y_{k})p_{F}(y_{\ell})\big{|}y_{\ell}-% y_{k}\big{|},

which is the discrete version of the general representation

\mathop{\hbox{\rm CRPS}}(F,y)={\mathsf{E}}|Y-y|-\frac{1}{2}{\mathsf{E}}|Y-Y^{% \prime}|

(3.2)

derived by Gneiting and Raftery, (2007), where $Y$ and $Y^{\prime}$ are independent random variables with finite second moments distributed according to $F$ .

Form (3.2) allows a direct extension of the CRPS to multivariate forecasts. For a $D$ -dimensional predictive cumulative distribution function (CDF) $F$ and vector $\boldsymbol{y}=\big{(}y^{(1)},y^{(2)},\ldots,y^{(D)}\big{)}^{\top},$ the energy score (ES; Gneiting and Raftery,, 2007) is defined as

\mathop{\hbox{\rm ES}}(F,\boldsymbol{y})={\mathsf{E}}\|\boldsymbol{Y}-% \boldsymbol{y}\|-\frac{1}{2}{\mathsf{E}}\|\boldsymbol{Y}-\boldsymbol{Y}^{% \prime}\|,

where now $\boldsymbol{Y}$ and $\boldsymbol{Y}^{\prime}$ are independent random vectors having distribution $F$ , and $\|\cdot\|$ denotes the Euclidean distance. For a $K$ -member forecast ensemble or sample drawn from the predictive distribution, one has to consider the empirical CDF $F_{K}$ resulting in the ensemble energy score (Gneiting et al.,, 2008)

\mathop{\hbox{\rm ES}}(F_{K},\boldsymbol{y})=\frac{1}{K}\sum_{j=1}^{K}\|% \boldsymbol{\mathfrak{f}}_{j}-\boldsymbol{y}\|-\frac{1}{2K^{2}}\sum_{j=1}^{K}% \sum_{k=1}^{K}\|\boldsymbol{\mathfrak{f}}_{j}-\boldsymbol{\mathfrak{f}}_{k}\|.

Finally, we also report the (ensemble) variogram score of order $p$ ( ${\rm VS}^{p}$ ; Scheuerer and Hamill,, 2015), which is more sensitive to the errors in the specification of correlations than the $\mathop{\hbox{\rm ES}}$ . Given an ensemble forecast $\boldsymbol{\mathfrak{f}}_{k}=\big{(}f_{k}^{(1)},f_{k}^{(2)},\ldots,f_{k}^{(D)% }\big{)}^{\top},\ k=1,2,\ldots,K,$ it equals

{\rm VS}^{p}(F_{K},\boldsymbol{y})=\sum_{i=1}^{D}\sum_{j=1}^{D}\omega_{ij}% \left(\big{|}y^{(i)}-y^{(j)}\big{|}^{p}-\frac{1}{K}\sum_{k=1}^{K}\big{|}f_{k}^% {(i)}-f_{k}^{(j)}\big{|}^{p}\right)^{2},

where $\omega_{ij}\geq 0$ is the weight for coordinate pair $(i,j)$ . The usual choices for order $p$ are $0.5$ and $1$ , in our case study we consider the former and use the notation ${\rm VS}$ for ${\rm VS}^{0.5}$ . Note that all four proper scoring rules defined above are negatively oriented, that is the smaller the better, and implemented in the R package scoringRules (Jordan et al.,, 2019).

In the case study of Section 4 the predictive performance of a forecast $F$ with a given forecast horizon in terms of a score $\mathcal{S}_{F}$ is quantified by the mean score value $\overline{\mathcal{S}}_{F}$ over all forecast cases used for verification. Furthermore, we also consider skill scores (Murphy,, 1973)

{\mathcal{SS}}_{F}:=1-\frac{\overline{\mathcal{S}}_{F}}{\overline{\mathcal{S}}% _{F_{\text{ref}}}},

providing the relative improvement of a forecast $F$ in terms of the score $\mathcal{S}_{F}$ with respect to a reference forecast $F_{\text{ref}}$ resulting in a mean score value $\overline{\mathcal{S}}_{F_{\text{ref}}}$ . Skill scores are positively oriented (the larger the better) and often reported in percentage values. In Section 4 we investigate continuous ranked probability skill score (CRPSS), logarithmic skill score (LogSS), energy skill score (ESS), and variogram skill score (VSS).

In addition to reporting the various score and skill score values of the competing visibility forecasts, we also address the significance of the score differences and uncertainty in score values by providing 95 % confidence bounds. These confidence intervals are calculated using 2000 block-bootstrap samples according to the stationary bootstrap scheme with the mean block length formula specified by Politis and Romano, (1994).

Calibration of univariate ensemble forecasts can also be assessed graphically with the help of verification rank histograms (Wilks,, 2019, Section 9.7.1). For a properly calibrated $K$ -member ensemble, the ranks of the verifying observations with respect to the corresponding forecasts follow a discrete uniform distribution on the set $\{1,2,\ldots,K+1\}$ , and the shape of the corresponding histogram reflects the source of the lack of calibration. Multivariate generalizations of the verification rank are based on the concept of pre-ranks and the corresponding histograms display the pre-rank of the observation with respect to the ensemble forecast with ties resolved at random (Gneiting et al.,, 2008). Here we consider the average and band-depth ranks suggested by Thorarinsdottir et al., (2016) and energy score and dependence (variogram) ranks studied in detail e.g. by Allen et al, (2024). The average rank is simply the mean of the univariate ranks and the interpretation of the corresponding histogram is similar to the univariate case: $\cup$ - and $\cap$ -shaped histograms refer to under- and overdispersion, respectively, whereas biased forecasts result in triangular shapes. Band-depth histograms are based on the discrete version of ordering multivariate functional data proposed by López-Pintado and Romo, (2009), while energy score and dependence histograms use the idea of Knüppel et al., (2022) of proper score-based pre-ranks. Right-skewed band-depth histograms indicate overdispersion, left-skewed ones underdispersion or bias, whereas $\cup$ - and $\cap$ -shaped histograms mean too low- and high correlations in the ensemble. Furthermore, low score-based ranks indicate that the observation is similar to the ensemble members in terms of the given scoring rule, whereas outliers result in high pre-ranks. Note, that in the case of the dependence ranking the weight for a pair of coordinates in the variogram score is the negative exponential of the geographical distance between the corresponding SYNOP stations given in 100 km. Finally, as a measure of the deviation of verification ranks from uniformity we consider the reliability index (RI; Delle Monache et al.,, 2006) defined as

\mathop{\hbox{\rm RI}}:=\sum_{r=1}^{K+1}\Big{|}\rho_{r}-\frac{1}{K+1}\Big{|},

(3.3)

where $\rho_{r}$ is the relative frequency of rank $r$ .

4 Results

We first explore the advantage of applying CAMS visibility forecasts as additional covariates in POLR and MLP methods for discrete post-processing of ECMWF ensemble predictions of visibility compared to models based on the raw ECMWF ensemble only. We also study the predictive performance of the corresponding multivariate forecasts obtained with the help of ensemble copula coupling and Schaake shuffle. To be consistent with the results of Baran and Lakatos, (2023), all investigated MLP and POLR classifiers rely on 350-day rolling training periods, and each lead time is treated separately. We consider local (MLP-L and POLR-L), regional (MLP-R and POLR-R) and clustering-based semi-local (MLP-C and POLR-C) training with four clusters derived using $k$ -means clustering of SYNOP stations with respect to three-dimensional feature vectors consisting of observed frequencies of visibility intervals 0–5000, 5000–30000, and 30000–70000 m in the training period. Both univariate and multivariate predictions are validated on forecast-observation pairs for calendar year 2021 and as reference forecasts, we consider the raw ECMWF visibility ensemble and 30-day climatology.

The reference MLP and POLR models studied by Baran and Lakatos, (2023) are based on eight-dimensional input feature vectors

\big{(}\tilde{f}_{CTRL},\overline{f}_{ENS},s^{2},p_{1},p_{2},p_{3},\beta_{1},% \beta_{2}\big{)}^{\top},

(4.1)

where $\tilde{f}_{CTRL}:=f_{CTRL}/70000$ is the normalized control member, $\overline{f}_{ENS}$ is the mean of the 50 (normalized) exchangeable members, $s^{2}$ is the variance of the 51-member (normalized) operational ensemble, $p_{1},\ p_{2}$ and $p_{3}$ are the proportions of ensemble members predicting visibility up to 1000 m, 1000–2000 m and more than 30000 m, respectively, whereas $\beta_{1}$ and $\beta_{2}$ are annual base functions addressing seasonal variations defined as

\beta_{1}(d):=\sin\big{(}2\pi d/365\big{)}\qquad\text{and}\qquad\beta_{2}(d):=% \cos\big{(}2\pi d/365\big{)},

where $d$ denotes the day of the year (see e.g. Dabering et al.,, 2017).

Here we add the normalized CAMS forecast $\tilde{f}_{CAMS}:=f_{CAMS}/70000$ to the input features and investigate the forecast skill of CAMS extended local, regional, and semi-local MLP and POLR post-processing based on the feature vector

\big{(}\tilde{f}_{CTRL},\overline{f}_{ENS},\tilde{f}_{CAMS},s^{2},p_{1},p_{2},% p_{3},\beta_{1},\beta_{2}\big{)}^{\top}

(4.2)

together with the corresponding multivariate forecasts derived with the help of the two-step approaches presented in Section 3.2.

4.1 Univariate performance

The predictive performance of univariate visibility forecasts is evaluated with the help of the mean CRPS and mean LogS over all forecast cases in the verification period, together with the corresponding skill scores. As in Baran and Lakatos, (2023), extremely low predicted probabilities resulting in numerical issues in LogS calculations are handled by introducing a probability threshold $p_{\min}=2.75\times 10^{-5}$ , which replaces all values in the predictive PMFs which are below this probability. This particular choice of $p_{\min}$ ensures that with a 1 % probability the corresponding reported visibility materializes at least once in a year; however, it is low enough, so there are no observable differences in terms of other scores between the original and the adjusted and renormalized PMFs. Furthermore, as for the reference POLR approaches based on feature vector (4.1) the weights of $\tilde{f}_{CTRL}$ and $\overline{f}_{ENS}$ are forced to be non-negative, we apply the same constraint also for the coefficient of $\tilde{f}_{CAMS}$ of the CAMS extended local, semi-local and regional POLR models. For further details of the implementation of MLP and POLR classifiers see Baran and Lakatos, (2023) and Hemri et al., (2016).

As demonstrated in Section 4.2 of Baran and Lakatos, (2023), in terms of the mean CRPS and mean LogS, up to 120 h, all MLP and POLR post-processed forecasts have shown significant improvement compared to both the raw ensemble and climatology. The extent of this improvement is particularly striking in the case of the raw ensemble, which we have chosen to exclude from our univariate analysis. Figure 2a displays the mean CRPS of climatological and CAMS extended MLP and POLR forecasts as functions of the forecast horizon. All post-processed predictions outperform climatology for all lead times, and the ranking of the various approaches is identical to the one for the corresponding reference models based on feature vector (4.1), see Figure 2 of Baran and Lakatos, (2023). The advantage achieved in mean CRPS by utilizing the CAMS extended feature vector (4.2) over models based solely on ECMWF forecasts is illustrated by the CRPSS values of Figure 2b. In general, it can be observed that the integration of CAMS into the post-processing workflow for regional models yields the lowest benefits, but this advantage is still noteworthy. Note that the best performing POLR and MLP variants, which are the local POLR model and the semi-local MLP, profit the most from the extension of the input features with an average CRPSS of around 3 %. One should also remark that the improvement in mean CRPS from including CAMS predictions might further decrease for forecast horizons beyond 120 h.

The mean LogS values of Figure 3a tell the same story about the various post-processing approaches as Figure 3 of Baran and Lakatos, (2023). Adding the CAMS forecast to the input features of the local, semi-local, and regional MLP and POLR models does not change the ranking of the forecasts; however, as portrayed in Figure 3b, it results in a modest 0.4 – 0.9 % average improvement in LogS.

To address the significance of gain of CAMS extended post-processing models with respect to the corresponding forecasts based only on ECMWF predictions, in Figure 4 we accompany the CRPSS and LogSS of the MLP-C and POLR-L approaches, which are the best model variants according to Baran and Lakatos, (2023), by 95 % block-bootstrap confidence intervals. As depicted in Figures 4a and 4b, up to the maximal studied lead time of 120 h, the improvement in terms of both scores is significant at a 5 % level.

4.2 Multivariate post-processing

Continuing our analysis, we delve into the comparison of forecast skill among multivariate approaches detailed in Section 3.2. In the case of the applied multivariate methods, calibrated samples of size $51$ are employed, mirroring the dimensionality of the raw ECMWF ensemble. When employing the SSh method, which relies on historical data to establish dependency templates, we utilize observations obtained from the rolling training window applied during univariate calibration. This decision is made based on the finding of Lakatos et al., (2023), that extending these pool dates to encompass the entire available historical dataset may only yield negligible advantages. In our case, this entails selecting 51 days from a pool of 350 dates (which is the length of the training period). As mentioned, in both the ECC and SSh methods, the initial step entails generating samples from the calibrated PMFs, which we perform by considering their equidistant quantiles.

To simplify presentation, as univariate calibrated forecasts we consider the CAMS extended MLP-C and POLR-L predictions and their counterparts based only on the ECMWF ensemble. The corresponding multivariate predictions obtained with the help of the ECC and SSh are referred to as MLP-C ECC, POLR-L ECC, and MLP-C SSh, POLR-L SSh, respectively. We also report the performance of naive multivariate forecasts obtained by simply arranging the 51-member station-specific MLP-C and POLR-L samples into 30-dimensional vectors. In the multivariate context, notations MLP-C and POLR-L will refer to these ”independent” predictions. Finally, as a multivariate climatological forecast, we consider the vector of 51 equidistant quantiles of 30-day climatological PMFs corresponding to the investigated SYNOP stations. In principle, one could also consider the vectors of station-specific 51-day climatological forecasts; however, due to their discrete nature, they would substantially differ from the competing calibrated predictions.

First, we assess the performance of the previously discussed multivariate forecasts based on the energy score. Figure 5a clearly illustrates the substantial superiority of each CAMS extended multivariate forecast over the raw ensemble and the 30-day climatology in terms of this scoring rule. Notably, the semi-local MLP ECC and local POLR ECC exhibit the best predictive performance, extremely closely approaching their SSh counterparts, thereby minimizing performance disparities and highlighting the success of multivariate post-processing. According to the energy skill scores of Figure 5b, the univariate MLP-C and POLR-L models alone demonstrate a substantial average improvement of 12 % over multivariate climatology, while the multivariate models show even more pronounced performance enhancements, averaging 17 %. This analysis emphasizes that, at least in our dataset, irrespective of the selected dependency template, multivariate models consistently exhibit improvement compared to independent calibration and climatology.

Figure 6 presents a slightly different picture in terms of model ranking, which might be a natural phenomenon when considering the variogram score depicted here. This verification measure clearly illustrates the success of multivariate post-processing in the redefinition of covariance structures. As expected, (CAMS extended) naive models, such as MLP-C and POLR-L, do not capture the spatial dependencies and might underperform the raw forecast vectors and multivariate climatology, as well. According to the VSS values of Figure 6b, the two-step multivariate models demonstrate a notable improvement of over 20 % with respect to the reference climatological predictions. Among these approaches, the POLR variants exhibit the highest skill, particularly the POLR-L SSh configuration. Moreover, it is noteworthy that a more pronounced advantage was attained with the SSh models obtaining dependence templates from past observations, compared to the ECC forecasts utilizing the rank structure of raw ensemble.

In Figure 7, we illustrate how the CAMS extended post-processed forecasts compare to the corresponding multivariate predictions from classifiers that ignore CAMS forecasts. Both the ESS and VSS reveal a diminishing advantage of CAMS integration with increasing forecast horizons. In the case of the ES the differences between the various post-processed forecasts are minor (see Figure 7a) with the MLP-C ECC variant showing the most overall improvement. The VSS values of Figure 7b again tell a different story. With independent MLP-C and POLR-L models, there is an average improvement of around 5 % in variogram scores suggesting that using CAMS as a feature helps in better preserving spatial dependencies. For the multivariate models, the gain is much smaller, especially for the 84-hour horizon and the POLR-L SSh model even displays negative skill at 66 and 108 hours.

For better clarity, in Figure 8, we accompanied the ESS and VSS values of MLP-C ECC and POLR-L ECC forecasts – which are overall considered the best performers – with 95 % bootstrap confidence bounds. In cases where climatology serves as the reference, as seen in Figures 8a and 8b, in general, the local POLR-based ECC outperforms its MLP-C counterpart. However, for forecasts corresponding to 6 UTC, both in terms of ES and VS, the difference is not significant, indicating that the advantage of the POLR-L ECC approach over the MLP-C ECC is more apparent for the other three studied observation times (12, 18, 24 UTC). Furthermore, Figures 8c and 8d demonstrate that incorporating CAMS forecasts as an additional feature leads to a significant enhancement in ES for all investigated lead times, whereas for VS there are a few instances of longer lead times exhibiting negative skill within the confidence bounds.

In addition to the previous analyses, we also generated multivariate rank histograms to thoroughly assess the differences in calibration of the individual forecasts (see Figure 9). In general, properly calibrated forecasts result in flat histograms, and the deviation from uniformity can be quantified by the reliability index (3.3). In principle, one can have an instant ranking of the various multivariate predictions with the help of the reliability indices, displayed in Figure 10 (the smaller the better); however, the interpretation of the different rank histograms can vary depending on the functions applied to determine the pre-ranks (see Section 3.6), as these functions may be more sensitive to detecting different deficiencies. Therefore, as argued by Thorarinsdottir et al., (2016) and Allen et al, (2024), using multiple pre-rank functions to evaluate the performance of the forecasts could be beneficial, providing a more comprehensive comparison. In the case of naive MLP-C and POLR-L models, hump-shaped average- and band-depth rank histograms indicate overestimated correlations, which is also supported by the skewed energy score- and dependence rank histograms. To a smaller extent, the same applies to the multivariate climatology. Multivariate calibration results in a substantial improvement, especially in the case of MLP-C ECC and POLR-L ECC forecasts. According to Figure 10, in terms of the average ranks, these two methods provide by far the lowest reliability indices and are very competitive with the MLP-C SSh and POLR-L SSh approaches in the three other cases, as well.

5 Conclusions

The current work is a direct continuation of Baran and Lakatos, (2023) where location-specific (univariate) discrete post-processing of visibility was studied. To get a deeper insight into the behaviour of this complex and hardly predictable weather quantity, two main questions are investigated. First, we assess whether the state-of-the-art multivariate post-processing approaches which were approved to be successful in the case of more traditional variables such as temperature, wind speed, or precipitation (see e.g. Lakatos et al.,, 2023), can reliably restore spatial dependencies that lost during separate calibration of visibility ensemble forecasts for each SYNOP station. Furthermore, we also investigate whether the use of CAMS visibility predictions as additional covariates in post-processing models can significantly improve the forecast skill.

In the case of the best-performing univariate post-processing approaches (MLP-C and POLR-L) from the methods investigated by Baran and Lakatos, (2023), the addition of CAMS predictions to the input features results in a significant 1 – 5.25 % improvement in mean CRPS and 0.27 – 1.37 % in mean LogS. Note, that for both scores the gain displays a decreasing trend as the lead time increases.

Joint multivariate post-processing of forecasts for all 30 investigated SYNOP observation stations is performed with the help of the two-step ECC and SSh approaches utilizing the dependence structure of the raw vector ensemble forecasts and historical observations, respectively. In fact, in terms of ES and VS, there are no visible differences in skill between the two dependence template selection methods which substantially outperform the reference independently calibrated (naive) multivariate forecasts. Compared to forecasts based only on the ECMWF predictions, utilizing CAMS results in an additional 0.58 – 4.11 % improvement in energy score for all investigated models. The VS focusing on the correct specification of correlations shows a different picture. In the case of the independent reference methods, the advantage of CAMS extended forecasts is around 5 %; however, the corresponding MLP-C forecasts underperform climatology for almost all lead times. In contrast, for the ECC and SSh methods, the VSS of CAMS extended predictions with respect to the models based only on the ECMWF ensemble is below 2.5 %; nevertheless, they outperform climatology by more than 20 %. From the competing multivariate predictions, the ECC-corrected MLP-C and POLR-L methods display the best overall performance.

Our case study demonstrated the usefulness of CAMS visibility predictions as additional covariates in the discrete post-processing of visibility ensemble forecasts. Furthermore, it showed the efficiency of the simplest two-step multivariate post-processing methods in capturing spatial dependencies in the case of this particular weather quantity.

A natural direction of further studies is the utilization of CAMS forecasts and/or other visibility-related covariates in post-processing methods treating visibility as a continuous quantity. Possible parametric candidates are the BMA approach of Chmielecki and Raftery, (2011), where the predictive distribution for visibility is a mixture of beta laws, and the more recent censored gamma and censored truncated normal mixture model proposed by Baran and Baran, (2023). However, the most straightforward extension of the present work would be the adaptation of the classification and interpolation-based approach of Scheuerer et al., (2020) to visibility forecasts.

Acknowledgments. The authors gratefully acknowledge the support of the ÚNKP-23-3 New National Excellence Program of the Ministry for Culture and Innovation from the source of the National Research, Development and Innovation Fund and of the National Research, Development and Innovation Office under Grant No. K142849. They are also indebted to Martin Leutbecher and Zied Ben Bouallègue for the inspiring discussions and for providing the ECMWF and CAMS visibility data.

References

Allen et al, (2024) Allen, S., Ziegel, J. and Ginsbourger, D. (2024) Assessing the calibration of multivariate probabilistic forecasts. Q. J. R. Meteorol. Soc. 150, 1315–1335.
Baran and Baran, (2023) Baran, Á. and Baran, S. (2023) Parametric model for post-processing visibility ensemble forecasts. Preprint. Available at: https://doi.org/10.48550/arXiv.2310.16824 [Accessed on 19 June 2024]
Baran et al., (2021) Baran, Á., Lerch, S., El Ayari, M. and Baran, S. (2021) Machine learning for total cloud cover prediction. Neural. Comput. Appl. 33, 2605–2620.
Baran and Lakatos, (2023) Baran, S. and Lakatos, M. (2023) Statistical post-processing of visibility ensemble forecasts. Meteorol. Appl. 30, paper e2157.
Baran and Möller, (2015) Baran, S. and Möller, A. (2015) Joint probabilistic forecasting of wind speed and temperature using Bayesian model averaging. Environmetrics 26, 120–132.
Bremnes, (2019) Bremnes, J. B. (2019) Constrained quantile regression splines for ensemble postprocessing. Mon. Weather Rev. 147, 1769–1780.
Bremnes, (2020) Bremnes, J. B. (2020) Ensemble postprocessing using quantile function regression based on neural networks and Bernstein polynomials. Mon. Weather Rev. 148, 403–414.
Chen et al., (2024) Chen, J., Janke, T., Steinke, F. and Lerch, S. (2024) Generative machine learning methods for multivariate ensemble postprocessing. Ann. Appl. Stat. 18, 159–183.
Chmielecki and Raftery, (2011) Chmielecki, R. M. and Raftery, A. E. (2011) Probabilistic visibility forecasting using Bayesian model averaging. Mon. Weather Rev. 139, 1626–1636.
Clark et al., (2004) Clark, M., Gangopadhyay, S., Hay, L., Rajagopalan, B. and Wilby, R. (2004) The Schaake shuffle: A method for reconstructing space–time variability in forecasted precipitation and temperature fields. J. Hydrometeorol. 5, 243–262.
Dabering et al., (2017) Dabernig, M., Mayr, G. J., Messner, J. W. and Zeileis, A. (2017) Spatial ensemble post-processing with standardized anomalies. Q. J. R. Meteorol. Soc. 143, 909–916.
Delle Monache et al., (2006) Delle Monache, L., Hacker, J. P., Zhou, Y., Deng, X. and Stull, R. B. (2006) Probabilistic aspects of meteorological and ozone regional ensemble forecasts. J. Geophys. Res. 111, paper D24307, doi:10.1029/2005JD006917.
ECMWF, (2021) ECMWF (2021) IFS Documentation CY47R3 - Part IV Physical processes. ECMWF, Reading. Available at: http://dx.doi.org/10.21957/eyrpir4vj [Accessed on 19 June 2024]
Friederichs and Hense, (2007) Friederichs, P. and Hense, A. (2007) Statistical downscaling of extreme precipitation events using censored quantile regression. Mon. Weather Rev. 135, 2365–2378.
Gneiting and Raftery, (2007) Gneiting, T. and Raftery, A. E. (2007) Strictly proper scoring rules, prediction and estimation. J. Amer. Statist. Assoc. 102, 359–378.
Gneiting et al., (2005) Gneiting, T., Raftery, A. E., Westveld, A. H. and Goldman, T. (2005) Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation. Mon. Weather Rev. 133, 1098–1118.
Gneiting et al., (2008) Gneiting, T., Stanberry, L. I., Grimit, E. P., Held, L. and Johnson, N. A. (2008) Assessing probabilistic forecasts of multivariate quantities, with an application to ensemble predictions of surface winds. Test 17, 211–235.
Good, (1952) Good, I. J. (1952) Rational decisions. J. R. Stat. Soc. Series B Stat. Methodol. 14, 107–114.
Goodfellow et al., (2016) Goodfellow, I., Bengio, Y. and Courville, A. (2016) Deep Learning. MIT Press, Cambridge.
Hemri et al., (2016) Hemri, S., Haiden, T. and Pappenberger, F. (2016) Discrete postprocessing of total cloud cover ensemble forecasts. Mon. Weather Rev. 144, 2565–2577.
Jordan et al., (2019) Jordan, A., Krüger, F. and Lerch, S. (2019) Evaluating probabilistic forecasts with scoringRules. J. Stat. Softw. 90, 1–37.
Knüppel et al., (2022) Knüppel, M., Krüger, F. and Pohle, M.-O. (2022) Score-based calibration testing for multivariate forecast distributions. Preprint. Available at: https://doi.org/10.48550/arXiv.2211.16362. [Accessed on 19 June 2024]
Lakatos et al., (2023) Lakatos, M., Lerch, S., Hemri, S. and Baran, S. (2023) Comparison of multivariate post-processing methods using global ECMWF ensemble forecasts. Q. J. R. Meteorol. Soc. 149, 856–877.
Lang et al., (2020) Lang, M. N., Lerch, S., Mayr, G. J., Simon, T., Stauffer, R. and Zeileis, A. (2020) Remember the past: a comparison of time-adaptive training schemes for non-homogeneous regression. Nonlin. Processes Geophys. 27, 23–34.
Lerch and Baran, (2017) Lerch, S. and Baran, S. (2017) Similarity-based semi-local estimation of EMOS models. J. R. Stat. Soc. Ser. C Appl. Stat. 66, 29–51.
Lerch et al., (2020) Lerch, S., Baran, S., Möller, A., Groß, J., Schefzik, R., Hemri, S. and Graeter, M. (2020) Simulation-based comparison of multivariate ensemble post-processing methods. Nonlinear Process. Geophys. 27, 349–371.
López-Pintado and Romo, (2009) López-Pintado, S. and Romo, J. (2009) On the concept of depth for functional data. J. Amer. Statist. Assoc. 104, 718–734.
McCullagh, (1980) McCullagh, P. (1980) Regression model for ordinal data (with discussion). J. R. Stat. Soc. Series B Stat. Methodol. 42, 243–268.
Murphy, (1973) Murphy, A. H. (1973) Hedging and skill scores for probability forecasts. J. Appl. Meteorol. 12, 215–223.
Pitkänen et al., (2023) Pitkänen, M. R. A., Benedictow, A., Arola, A., Bennouna, Y., Blake, L., Bouarar, I., Cuevas, E., Errera, Q., Eskes, H. J., Griesfeller, J., Ilic, L., Kapsomenakis, J., Langerock, B., Mortier, A., Pison, I., Ramonet, M., Richter, A., Schoenhardt, A., Schulz, M., Tarniewicz, J., Thouret, V., Tsikerdekis, A., Warneke, T., Zerefos, C. (2023) Validation report of the CAMS near-real-time global atmospheric composition service: March – May 2023. Copernicus Atmosphere Monitoring Service (CAMS) report, September 2023, Available at: http://dx.doi.org/10.24380/ac0r-ao9u. [Accessed on 19 June 2024]
Politis and Romano, (1994) Politis, D. N. and Romano, J. P. (1994) The stationary bootstrap. J. Amer. Statist. Assoc. 89, 1303–1313.
Raftery et al., (2005) Raftery, A. E., Gneiting, T., Balabdaoui, F. and Polakowski, M. (2005) Using Bayesian model averaging to calibrate forecast ensembles. Mon. Weather Rev. 133, 1155–1174.
Rasp and Lerch, (2018) Rasp, S. and Lerch, S. (2018) Neural networks for postprocessing ensemble weather forecasts. Mon. Weather Rev. 146, 3885–3900.
Schefzik and Möller, (2018) Schefzik, R. and Möller, A. (2018) Ensemble postprocessing methods incorporating dependence structures. In Vannitsem, S., Wilks, D. S. and Messner, J. W. (eds.), Statistical Postprocessing of Ensemble Forecasts. Elsevier, Amsterdam, pp. 91–125.
Schefzik et al., (2013) Schefzik, R., Thorarinsdottir T. L. and Gneiting, T. (2013) Uncertainty quantification in complex simulation models using ensemble copula coupling. Statist. Sci. 28, 616–640.
Scheuerer and Hamill, (2015) Scheuerer, M. and Hamill, T. M. (2015) Variogram-based proper scoring rules for probabilistic forecasts of multivariate quantities. Mon. Weather Rev. 143, 1321–1334.
Scheuerer et al., (2020) Scheuerer, M., Switanek, M. B., Worsnop, R. P. and Hamill, T. M. (2020) Using artificial neural networks for generating probabilistic subseasonal precipitation forecasts over California. Mon. Weather Rev. 148, 3489–3506.
Schultz and Lerch, (2022) Schultz, B. and Lerch, S. (2022) Machine learning methods for postprocessing ensemble forecasts of wind gusts: a systematic comparison. Mon. Weather Rev. 150, 235–257.
Taillardat et al., (2016) Taillardat, M., Mestre, O., Zamo, M. and Naveau, P. (2016) Calibrated ensemble forecasts using quantile regression forests and ensemble model output statistics. Mon. Weather Rev. 144, 2375–2393.
Thorarinsdottir and Gneiting, (2010) Thorarinsdottir, T. L. and Gneiting, T. (2010) Probabilistic forecasts of wind speed: ensemble model output statistics by using heteroscedastic censored regression. J. R. Stat. Soc. Ser. A Stat. Soc. 173, 371–388.
Thorarinsdottir et al., (2016) Thorarinsdottir, T. L., Scheuerer, M. and Heinz, C. (2016) Assessing the calibration of high-dimensional ensemble forecasts using rank histograms. J. Comput. Graph. Stat. 25, 105–122.
Vannitsem et al., (2021) Vannitsem, S., Bremnes, J. B., Demaeyer, J., Evans, G. R., Flowerdew, J., Hemri, S., Lerch, S., Roberts, N., Theis, S., Atencia, A., Ben Boualègue, Z., Bhend, J., Dabernig, M., De Cruz, L., Hieta, L., Mestre, O., Moret, L., Odak Plenkovič, I., Schmeits, M., Taillardat, M., Van den Bergh, J., Van Schaeybroeck, B., Whan, K. and Ylhaisi, J. (2021) Statistical postprocessing for weather forecasts – review, challenges and avenues in a big data world. Bull. Amer. Meteorol. Soc. 102, E681–E699.
Wilks, (2019) Wilks, D. S. (2019) Statistical Methods in the Atmospheric Sciences. 4th ed. Elsevier, Amsterdam.
WMO, (1992) World Meteorological Organization (1992) International Meteorological Vocabulary (WMO-No.182). WMO, Geneva.
WMO, (2018) World Meteorological Organization (2018) Guide to Instruments and Methods of Observation. Volume I – Measurement of Meteorological Variables (WMO-No.8). WMO, Geneva.