Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                

Enhancing multivariate post-processed visibility predictions utilizing CAMS forecasts

Mária Lakatos1,2 and Sándor Baran1,∗
1Faculty of Informatics, University of Debrecen, Hungary
2Doctoral School of Informatics, University of Debrecen, Hungary
Abstract

In our contemporary era, meteorological weather forecasts increasingly incorporate ensemble predictions of visibility – a parameter of great importance in aviation, maritime navigation, and air quality assessment, with direct implications for public health. However, this weather variable falls short of the predictive accuracy achieved for other quantities issued by meteorological centers. Therefore, statistical post-processing is recommended to enhance the reliability and accuracy of predictions. By estimating the predictive distributions of the variables with the aid of historical observations and forecasts, one can achieve statistical consistency between true observations and ensemble predictions. Visibility observations, following the recommendation of the World Meteorological Organization, are typically reported in discrete values; hence, the predictive distribution of the weather quantity takes the form of a discrete parametric law. Recent studies demonstrated that the application of classification algorithms can successfully improve the skill of such discrete forecasts; however, a frequently emerging issue is that certain spatial and/or temporal dependencies could be lost between marginals. Based on visibility ensemble forecasts of the European Centre for Medium-Range Weather Forecasts for 30 locations in Central Europe, we investigate whether the inclusion of Copernicus Atmosphere Monitoring Service (CAMS) predictions of the same weather quantity as an additional covariate could enhance the skill of the post-processing methods and whether it contributes to the successful integration of spatial dependence between marginals. Our study confirms that post-processed forecasts are substantially superior to raw and climatological predictions, and the utilization of CAMS forecasts provides a further significant enhancement both in the univariate and multivariate setup.


Keywords: Copernicus Atmosphere Monitoring Service, ensemble calibration, ensemble copula coupling, multivariate post-processing, Schaake shuffle, visibility

11footnotetext: Corresponding author: baran.sandor@inf.unideb.hu

1 Introduction

Due to its significant impact on a wide range of aspects of aviation meteorology, visibility is a critical parameter for both weather forecasters and pilots. Beyond the previously mentioned contexts, visibility’s importance extends notably into the realms of agriculture and maritime transportation. Additionally, poor visibility conditions stand out as one of the most common causes of road accidents. The World Meteorological Organization (WMO) defines visibility as ”the greatest distance at which a black object of suitable dimensions (located on the ground) can be seen and recognized when observed against the horizon sky” (WMO,, 1992). Due to the mentioned reasons, it is crucial to create the most accurate visibility forecasts possible, a task that, fortunately, an increasing number of meteorological services have been undertaking in recent years.

In the present day, weather forecasts are crafted through the use of numerical weather prediction (NWP) models. These models utilize initial values obtained from the previous states of the atmosphere, and the perturbation of these initial conditions results in the creation of a set of forecasts, commonly referred to as ensemble predictions. Despite notable advancements, NWP systems face challenges in achieving complete bias-free forecasts for various weather parameters, even within short lead times. Visibility emerges as a parameter with comparatively weak forecast skill, primarily because, unlike temperature, wind speed, or precipitation accumulation, most NWPs do not explicitly model visibility. Consequently, visibility forecasts necessitate derivation from predictions of related quantities such as relative humidity or precipitation (Chmielecki and Raftery,, 2011). Specifically, at the European Centre of Medium-Range Weather Forecasts (ECMWF), the visibility parameter was integrated into the ECMWF Integrated Forecasting System (IFS) on 12 May 2015 (ECMWF,, 2021). This parameter utilizes model projections of water vapor, cloud, rain, and snow, along with climatological aerosol fields, to estimate the visibility that would be recorded by weather observers.

To correct the biases and lack of calibration in the forecasts, one possible direction is the application of statistical post-processing. In recent years, numerous post-processing techniques have been developed to calibrate ensemble forecasts, for a recent overview we refer to Vannitsem et al., (2021). From this wide range of methods, the straightforward yet potent parametric ensemble model output statistic (Gneiting et al.,, 2005) approach maps ensemble forecasts onto a probability distribution, providing a pathway for assessing uncertainties associated with potential outcomes. The commonly used Bayesian model averaging (BMA; Raftery et al.,, 2005) also offers a complete predictive distribution for the examined weather variable; however, by now the machine learning-based distributional regression network (Rasp and Lerch,, 2018) method has became a rather popular parametric technique (see e.g. Schultz and Lerch,, 2022). Another widespread approach to post-processing is capturing the predictive distribution of a weather quantity with the help of its quantiles (see e.g. Friederichs and Hense,, 2007; Bremnes,, 2019). Naturally, such nonparametric calibration is also frequently combined with machine learning techniques. As examples, one can mention the quantile regression forests (Taillardat et al.,, 2016) or the Bernstein quantile network (Bremnes,, 2020).

The utility of univariate post-processing techniques is supported by numerous studies; however, even today, many experts examine the disadvantage that the correlations between marginal distributions often get lost when applying such methods. This can be relevant to connections between locations, predictions initialized at the same time, or the loss of dependencies between weather quantities. Multivariate post-processing aims to restore potentially lost correlation structures, and there are already numerous parametric or non-parametric models available for achieving these goals (see e.g. Schefzik and Möller,, 2018). One option is to consider multivariate predictive distributions (see e.g. Baran and Möller,, 2015); however, the drawback of such models is that due to the estimation of a large number of parameters, they can easily encounter numerical problems. In contrast, the ensemble copula coupling (ECC; Schefzik et al.,, 2013) represents a two-step approach that involves initially generating samples from univariate predictive distributions. Subsequently, these samples are reorganized into the raw ensemble rank order structure, thus, this procedure can be interpreted as the application of an empirical copula. The Schaake Shuffle (Clark et al.,, 2004) is another method that functions based on a similar principle to ECC; nevertheless, past research has suggested that ECC frequently demonstrates slightly better predictive performance (see e.g. Chen et al.,, 2024; Lakatos et al.,, 2023). For a comprehensive comparison of such multivariate post-processing methods, we refer to Lerch et al., (2020).

Visibility observations are typically reported in discrete values according to WMO guidelines, namely ”100 to 5,000 m in steps of 100 m, 6 to 30 km in steps of 1 km, and 35 to 70 km in steps of 5 km” (WMO,, 2018, Section 9.1.2). As a result, the post-processing of visibility forecasts is reduced to a classification task, where the predicted probabilities of the different classes comprise the forecast distribution in the form of a discrete probability mass function (PMF). A recent study (Baran and Lakatos,, 2023) demonstrated that the implementation of multilayer perceptron neural network (MLP; Goodfellow et al.,, 2016) and proportional odds logistic regression (POLR; McCullagh,, 1980) classifiers can significantly enhance the forecast skill of raw ECMWF visibility predictions. Note, that these two methods displayed superior forecast skills also in the realm of post-processing ensemble predictions of total cloud cover (Baran et al.,, 2021). However, in contrast to Baran et al., (2021), the investigation of Baran and Lakatos, (2023) did not include the exploration of the use of additional covariates that are not exclusively based on ECMWF forecasts of the target weather quantity. A natural first step in this direction is to consider further predictions of visibility that are independent of the operational ECMWF ensemble.

The Copernicus Atmosphere Monitoring Service (CAMS) is dedicated to monitoring and predicting the composition of Earth’s atmosphere. It provides information on various atmospheric components, including air quality, greenhouse gases, and aerosols (Pitkänen et al.,, 2023). CAMS delivers deterministic forecasts related to these components, generated by extending the IFS model developed by ECMWF with additional modules.

In this paper, we study the predictive accuracy of POLR and MLP techniques for calibrating ECMWF visibility ensemble forecasts, incorporating CAMS forecasts as covariates, in the contexts of both univariate and multivariate post-processing. Raw ECMWF visibility ensemble and climatology are considered as reference forecasts.

The structure of the paper is outlined as follows. In Section 2, a concise description of the visibility datasets under study is presented. Section 3 provides a review of the considered univariate and multivariate post-processing methods, along with details on the approaches used for training data selection and the verification tools considered. The outcomes of our case study are detailed in Section 4, followed by a brief discussion and conclusions in Section 5.

2 Data

Refer to caption
Figure 1: Locations of SYNOP observation stations corresponding to the ECMWF- and CAMS forecasts for 2020-2021

The visibility ensemble forecasts employed in this study consist of the operational ECMWF control (CTRL) and the corresponding 50 interchangeable ensemble members (ENS) produced by the ECMWF IFS. Regarding the latter 50 members of the ensemble, it is important to note that their interchangeability stems from the statistical indistinguishability arising from the perturbations of the initial conditions of the NWP. As mentioned in the Introduction, while visibility is primarily reported as a continuous variable in meters, the WMO suggests reporting observations in discrete categories, resulting in 84 different values in total, which can be described by the set

𝒴={0,100,200,,4900,5000,6000,7000,,29000,30000,35000,40000,,65000,70000},𝒴01002004900500060007000290003000035000400006500070000\mathcal{Y}=\{0,100,200,\ldots,4900,5000,6000,7000,\ldots,29000,30000,35000,40% 000,\ldots,65000,70000\},caligraphic_Y = { 0 , 100 , 200 , … , 4900 , 5000 , 6000 , 7000 , … , 29000 , 30000 , 35000 , 40000 , … , 65000 , 70000 } ,

where the alignment of forecasts (provided in 1 m increments) with observations is achieved by rounding down to the nearest reported value.

As discussed, our goal is to show that incorporating CAMS visibility forecasts as an additional predictor in the post-processing methods described in Section 3, – representing an independent yet comparable meteorological parameter – can significantly enhance the forecast skill of both the raw ECMWF predictions and the corresponding calibrated forecasts. The deterministic CAMS predictions are part of the global forecasts for atmospheric composition provided by the Copernicus Atmosphere Monitoring Service, a service provided by the ECMWF as part of the Copernicus program. The CAMS Global atmospheric composition forecast utilizes the IFS, the same model employed for ECMWF weather forecasts. However, specific modules within CAMS have been enabled to account for aerosols, reactive gases, and greenhouse gases, which have been developed uniquely within the CAMS framework.

Both the ECMWF ensemble, the CAMS forecasts, and the corresponding observations span a 2-year period from January 2020 to December 2021, for 30 SYNOP stations located in Germany, the Czech Republic, and Poland (refer to Figure 1). All forecasts are initialized at 0000 UTC, and we consider 20 different lead times ranging from 6 hours to 120 hours, with a time step of 6 hours. Notably, this dataset is fairly complete, with no missing observations.

3 Post-processing- and forecast evaluation methods

In what follows, let  𝒇(d)=(f1(d),f2(d),,f51(d))superscript𝒇𝑑superscriptsubscript𝑓1𝑑superscriptsubscript𝑓2𝑑superscriptsubscript𝑓51𝑑\boldsymbol{f}^{(d)}=\big{(}f_{1}^{(d)},f_{2}^{(d)},\ldots,f_{51}^{(d)}\big{)}bold_italic_f start_POSTSUPERSCRIPT ( italic_d ) end_POSTSUPERSCRIPT = ( italic_f start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_d ) end_POSTSUPERSCRIPT , italic_f start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_d ) end_POSTSUPERSCRIPT , … , italic_f start_POSTSUBSCRIPT 51 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_d ) end_POSTSUPERSCRIPT )  denote a 51-member ECMWF ensemble forecast for SYNOP station  d𝑑ditalic_d  (d=1,2,,30𝑑1230d=1,2,\ldots,30italic_d = 1 , 2 , … , 30) for a given time point with a given lead time, where  f1(d)=fCTRL(d)superscriptsubscript𝑓1𝑑subscriptsuperscript𝑓𝑑CTRLf_{1}^{(d)}=f^{(d)}_{\text{CTRL}}italic_f start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_d ) end_POSTSUPERSCRIPT = italic_f start_POSTSUPERSCRIPT ( italic_d ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT CTRL end_POSTSUBSCRIPT  represents the control run, and   f2(d),f3(d),,f51(d)subscriptsuperscript𝑓𝑑2subscriptsuperscript𝑓𝑑3subscriptsuperscript𝑓𝑑51f^{(d)}_{2},f^{(d)}_{3},\ldots,f^{(d)}_{51}italic_f start_POSTSUPERSCRIPT ( italic_d ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_f start_POSTSUPERSCRIPT ( italic_d ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT , … , italic_f start_POSTSUPERSCRIPT ( italic_d ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 51 end_POSTSUBSCRIPT  correspond to the 50 exchangeable members  fENS,1(d),fENS,2(d),,fENS,50(d)subscriptsuperscript𝑓𝑑ENS1subscriptsuperscript𝑓𝑑ENS2subscriptsuperscript𝑓𝑑ENS50f^{(d)}_{\text{ENS},1},f^{(d)}_{\text{ENS},2},\ldots,f^{(d)}_{\text{ENS},50}italic_f start_POSTSUPERSCRIPT ( italic_d ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT ENS , 1 end_POSTSUBSCRIPT , italic_f start_POSTSUPERSCRIPT ( italic_d ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT ENS , 2 end_POSTSUBSCRIPT , … , italic_f start_POSTSUPERSCRIPT ( italic_d ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT ENS , 50 end_POSTSUBSCRIPT.  Furthermore, let f0(d)=fCAMS(d)superscriptsubscript𝑓0𝑑subscriptsuperscript𝑓𝑑CAMSf_{0}^{(d)}=f^{(d)}_{\text{CAMS}}italic_f start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_d ) end_POSTSUPERSCRIPT = italic_f start_POSTSUPERSCRIPT ( italic_d ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT CAMS end_POSTSUBSCRIPT be the matching CAMS prediction, while  Y(d)𝒴={y1,y2,,y84}superscript𝑌𝑑𝒴subscript𝑦1subscript𝑦2subscript𝑦84Y^{(d)}\in{\mathcal{Y}}=\big{\{}y_{1},y_{2},\ldots,y_{84}\big{\}}italic_Y start_POSTSUPERSCRIPT ( italic_d ) end_POSTSUPERSCRIPT ∈ caligraphic_Y = { italic_y start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_y start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , … , italic_y start_POSTSUBSCRIPT 84 end_POSTSUBSCRIPT }  denotes the corresponding observed visibility with  𝒴𝒴{\mathcal{Y}}caligraphic_Y  defined in Section 2. Finally, station-specific data are combined into multivariate forecasts  𝖋0,𝖋1,,𝖋51subscript𝖋0subscript𝖋1subscript𝖋51\boldsymbol{\mathfrak{f}}_{0},\boldsymbol{\mathfrak{f}}_{1},\ldots,\boldsymbol% {\mathfrak{f}}_{51}bold_fraktur_f start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , bold_fraktur_f start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , bold_fraktur_f start_POSTSUBSCRIPT 51 end_POSTSUBSCRIPT  and observations  𝒀𝒀\boldsymbol{Y}bold_italic_Y,  where  𝖋=(f(1),f(2),,f(30)),=0,1,,51,formulae-sequencesubscript𝖋superscriptsuperscriptsubscript𝑓1superscriptsubscript𝑓2superscriptsubscript𝑓30top0151\boldsymbol{\mathfrak{f}}_{\ell}=\big{(}f_{\ell}^{(1)},f_{\ell}^{(2)},\ldots,f% _{\ell}^{(30)}\big{)}^{\top},\ \ell=0,1,\ldots,51,bold_fraktur_f start_POSTSUBSCRIPT roman_ℓ end_POSTSUBSCRIPT = ( italic_f start_POSTSUBSCRIPT roman_ℓ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( 1 ) end_POSTSUPERSCRIPT , italic_f start_POSTSUBSCRIPT roman_ℓ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( 2 ) end_POSTSUPERSCRIPT , … , italic_f start_POSTSUBSCRIPT roman_ℓ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( 30 ) end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT , roman_ℓ = 0 , 1 , … , 51 ,  and  𝒀=(Y(1),Y(2),,Y(30))𝒀superscriptsuperscript𝑌1superscript𝑌2superscript𝑌30top\boldsymbol{Y}=\big{(}Y^{(1)},Y^{(2)},\ldots,Y^{(30)}\big{)}^{\top}bold_italic_Y = ( italic_Y start_POSTSUPERSCRIPT ( 1 ) end_POSTSUPERSCRIPT , italic_Y start_POSTSUPERSCRIPT ( 2 ) end_POSTSUPERSCRIPT , … , italic_Y start_POSTSUPERSCRIPT ( 30 ) end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT.

3.1 Univariate post-processing

To simplify the presentation of calibration methods for marginals, in this section we omit the indication of the SYNOP station and use notations  𝒇=𝒇(d)𝒇superscript𝒇𝑑\boldsymbol{f}=\boldsymbol{f}^{(d)}bold_italic_f = bold_italic_f start_POSTSUPERSCRIPT ( italic_d ) end_POSTSUPERSCRIPT  and  Y=Y(d)𝑌superscript𝑌𝑑Y=Y^{(d)}italic_Y = italic_Y start_POSTSUPERSCRIPT ( italic_d ) end_POSTSUPERSCRIPT  for forecasts and observations, respectively.

As  Y𝑌Yitalic_Y  is a discrete random variable, the predictive distribution of  Y𝑌Yitalic_Y  is specified by conditional probabilities

𝖯(Y=yk𝒇),k=1,2,,84,formulae-sequence𝖯𝑌conditionalsubscript𝑦𝑘𝒇𝑘1284{\mathsf{P}}(Y=y_{k}\mid\boldsymbol{f}),\qquad k=1,2,\ldots,84,sansserif_P ( italic_Y = italic_y start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∣ bold_italic_f ) , italic_k = 1 , 2 , … , 84 , (3.1)

forming a conditional PMF with respect to the ensemble forecast.

Naturally, in (3.1) the raw forecast can replaced by any feature vector  𝒙𝒙\boldsymbol{x}bold_italic_x  derived from the ensemble and/or other covariates. These covariates can be location or time-specific variables or additional forecasts of visibility or other related weather quantities.

3.1.1 Multilayer perceptron neural network

In recent years, the application of various types of artificial neural networks (ANNs) for the calibration of ensemble forecasts has become increasingly popular. These networks have a significant advantage compared to classical parametric or non-parametric approaches, as they are more flexible e.g. in representing non-linear relations between the ensemble forecasts and the parameters of the predictive distributions and in incorporating new explanatory variables in the modelling process. Multilayer perceptron (MLP) networks are a powerful class of ANNs widely employed for classification tasks in machine learning. A conventional MLP comprises multiple layers and nodes (neurons), where the value of each node is a result of the transformed (via an activation function) weighted sum of values from the nodes of the preceding layer, along with an added bias term. Input features are introduced to the input layer, and predictions regarding the distribution of different classes are generated in the output layer. The level of abstraction can be determined by tuning parameters, such as the number of hidden layers and the neurons within them. For more details, please refer to the work by Goodfellow et al., (2016).

3.1.2 Proportional odds logistic regression

Proportional odds logistic regression (POLR) is an efficient parametric classification method for ordered classes, such as the visibility observations at hand. It specifies the conditional cumulative distribution of observed visibility  Y𝑌Yitalic_Y  given an M𝑀Mitalic_M-dimensional feature vector  𝒙𝒙\boldsymbol{x}bold_italic_x   as

𝖯(Yyk𝒙)=ek(𝒙)1+ek(𝒙),wherek(𝒙):=αk+𝒙𝜷,k=1,2,,84,formulae-sequence𝖯𝑌conditionalsubscript𝑦𝑘𝒙superscriptesubscript𝑘𝒙1superscriptesubscript𝑘𝒙whereformulae-sequenceassignsubscript𝑘𝒙subscript𝛼𝑘superscript𝒙top𝜷𝑘1284{\mathsf{P}}\big{(}Y\leq y_{k}\mid\boldsymbol{x}\big{)}=\frac{{\mathrm{e}}^{% \mathcal{L}_{k}(\boldsymbol{x})}}{1+{\mathrm{e}}^{\mathcal{L}_{k}(\boldsymbol{% x})}},\qquad\text{where}\qquad\mathcal{L}_{k}(\boldsymbol{x}):=\alpha_{k}+% \boldsymbol{x}^{\top}\boldsymbol{\beta},\quad\ k=1,2,\ldots,84,sansserif_P ( italic_Y ≤ italic_y start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∣ bold_italic_x ) = divide start_ARG roman_e start_POSTSUPERSCRIPT caligraphic_L start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( bold_italic_x ) end_POSTSUPERSCRIPT end_ARG start_ARG 1 + roman_e start_POSTSUPERSCRIPT caligraphic_L start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( bold_italic_x ) end_POSTSUPERSCRIPT end_ARG , where caligraphic_L start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( bold_italic_x ) := italic_α start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT + bold_italic_x start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_italic_β , italic_k = 1 , 2 , … , 84 ,

with  αk,𝜷Mformulae-sequencesubscript𝛼𝑘𝜷superscript𝑀\alpha_{k}\in{\mathbb{R}},\ \boldsymbol{\beta}\in{\mathbb{R}}^{M}italic_α start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∈ blackboard_R , bold_italic_β ∈ blackboard_R start_POSTSUPERSCRIPT italic_M end_POSTSUPERSCRIPT,  and  α1<α2<<α84subscript𝛼1subscript𝛼2subscript𝛼84\alpha_{1}<\alpha_{2}<\cdots<\alpha_{84}italic_α start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT < italic_α start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT < ⋯ < italic_α start_POSTSUBSCRIPT 84 end_POSTSUBSCRIPT.  Hence, POLR modelling of visibility classes results in a total of  84+M84𝑀84+M84 + italic_M  unknown parameters to be estimated.

3.2 Multivariate methods

The purpose of applying univariate post-processing methods is to fine-tune and calibrate the forecasts predicted by the NWP model, addressing various accuracy issues or biases. However, when applying these techniques, there is a risk of losing the spatial, temporal, or even inter-variable correlations between marginals. Here we consider two easy-to-implement yet powerful two-step techniques that can correct this particular deficiency and restore dependence between marginal distributions (see e.g. Lerch et al.,, 2020; Lakatos et al.,, 2023). Both methods are based on reordering a sample generated from the calibrated predictive distribution obtained in the first step, where one can either consider equidistant quantiles or draw random or stratified samples. As Chen et al., (2024) reveal only minor differences in skill between the multivariate forecasts corresponding to the various sampling strategies, here we will work with the equidistant quantiles of the post-processed univariate forecasts.

3.3 Ensemble copula coupling

Ensemble copula coupling (ECC) leverages the raw ensemble rank structure (with ties resolved at random) and capitalizes on the ample information embedded in ensemble predictions concerning dependencies. Following the notations of Lakatos et al., (2023) this iterative algorithm can be described as follows:

  1. 1.

    Generate a sample  f^1(d),f^2(d),,f^K(d)superscriptsubscript^𝑓1𝑑superscriptsubscript^𝑓2𝑑superscriptsubscript^𝑓𝐾𝑑\widehat{f}_{1}^{(d)},\widehat{f}_{2}^{(d)},\ldots,\widehat{f}_{K}^{(d)}over^ start_ARG italic_f end_ARG start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_d ) end_POSTSUPERSCRIPT , over^ start_ARG italic_f end_ARG start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_d ) end_POSTSUPERSCRIPT , … , over^ start_ARG italic_f end_ARG start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_d ) end_POSTSUPERSCRIPT  for each dimension  d𝑑ditalic_d,  matching the size of the raw ensemble (K𝐾Kitalic_K = 51), from the calibrated marginal predictive distribution assumed to be sorted in ascending order.

  2. 2.

    Consider permutations  𝝅d=(πd(1),πd(2),,πd(K))subscript𝝅𝑑subscript𝜋𝑑1subscript𝜋𝑑2subscript𝜋𝑑𝐾\boldsymbol{\pi}_{d}=\big{(}\pi_{d}(1),\pi_{d}(2),\ldots,\pi_{d}(K)\big{)}bold_italic_π start_POSTSUBSCRIPT italic_d end_POSTSUBSCRIPT = ( italic_π start_POSTSUBSCRIPT italic_d end_POSTSUBSCRIPT ( 1 ) , italic_π start_POSTSUBSCRIPT italic_d end_POSTSUBSCRIPT ( 2 ) , … , italic_π start_POSTSUBSCRIPT italic_d end_POSTSUBSCRIPT ( italic_K ) )  of  {1,2,,K}12𝐾\{1,2,\ldots,K\}{ 1 , 2 , … , italic_K }  corresponding to the rank order structure of the raw ensemble   f1(d),f2(d),,fK(d)superscriptsubscript𝑓1𝑑superscriptsubscript𝑓2𝑑superscriptsubscript𝑓𝐾𝑑f_{1}^{(d)},f_{2}^{(d)},\ldots,f_{K}^{(d)}italic_f start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_d ) end_POSTSUPERSCRIPT , italic_f start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_d ) end_POSTSUPERSCRIPT , … , italic_f start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_d ) end_POSTSUPERSCRIPT,  namely  πd(k):=rank(fk(d))assignsubscript𝜋𝑑𝑘ranksuperscriptsubscript𝑓𝑘𝑑\pi_{d}(k):=\mathop{\hbox{\rm rank}}\big{(}f_{k}^{(d)}\big{)}italic_π start_POSTSUBSCRIPT italic_d end_POSTSUBSCRIPT ( italic_k ) := rank ( italic_f start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_d ) end_POSTSUPERSCRIPT ),  where ties are resolved at random. The ECC calibrated sample  f~1(d),f~2(d),,f~K(d)superscriptsubscript~𝑓1𝑑superscriptsubscript~𝑓2𝑑superscriptsubscript~𝑓𝐾𝑑\widetilde{f}_{1}^{(d)},\widetilde{f}_{2}^{(d)},\ldots,\widetilde{f}_{K}^{(d)}over~ start_ARG italic_f end_ARG start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_d ) end_POSTSUPERSCRIPT , over~ start_ARG italic_f end_ARG start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_d ) end_POSTSUPERSCRIPT , … , over~ start_ARG italic_f end_ARG start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_d ) end_POSTSUPERSCRIPT  for location  d𝑑ditalic_d  is obtained by rearranging the sample generated in step 1 according to permutation  𝝅dsubscript𝝅𝑑\boldsymbol{\pi}_{d}bold_italic_π start_POSTSUBSCRIPT italic_d end_POSTSUBSCRIPT,  that is

    f~k(d):=f^πd(k)(d),k=1,2,,K,d=1,2,,30.formulae-sequenceassignsuperscriptsubscript~𝑓𝑘𝑑superscriptsubscript^𝑓subscript𝜋𝑑𝑘𝑑formulae-sequence𝑘12𝐾𝑑1230\widetilde{f}_{k}^{(d)}:=\widehat{f}_{\pi_{d}(k)}^{(d)},\qquad k=1,2,\ldots,K,% \quad d=1,2,\ldots,30.over~ start_ARG italic_f end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_d ) end_POSTSUPERSCRIPT := over^ start_ARG italic_f end_ARG start_POSTSUBSCRIPT italic_π start_POSTSUBSCRIPT italic_d end_POSTSUBSCRIPT ( italic_k ) end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_d ) end_POSTSUPERSCRIPT , italic_k = 1 , 2 , … , italic_K , italic_d = 1 , 2 , … , 30 .

3.4 Schaake shuffle

The other nonparametric multivariate post-processing technique we consider is the Schaake shuffle (SSh; Clark et al.,, 2004), which involves using randomly selected past observations to create a dependence template aimed at restoring correlations between the marginal distributions. Samples from the calibrated univariate predictive distributions are then rearranged to match the rank order structure of this selected set of historical observations of the corresponding cardinality. In this way, the SSh method offers the flexibility to generate post-processed forecasts of various sizes, provided there is a sufficiently long historical climatological dataset available. However, to ensure a fair comparison with ECC methods, we restrict the sample size to that of the raw ensemble and use the same sampling strategy.

3.5 Training data selection

Each statistical post-processing method, including both the MLP and POLR models introduced in Section 3.1, conducts parameter estimation using training data, which comprises a collection of prior forecasts and observations predating the current forecast time. Various approaches exist for selecting this training dataset in terms of both spatial and temporal decomposition.

A frequently used method of spatial selection, known as local estimation, entails using forecasts and observations specific to the station under consideration to estimate model parameters or weights (Thorarinsdottir and Gneiting,, 2010). This approach could offer significant advantages as it effectively integrates the station’s characteristics and local conditions into the calibration process. Another widely used and quite powerful alternative to the latter selection process is to treat together historical data of all stations of the ensemble domain, which, for a short training period, can be more efficient than local modelling. Nevertheless, this method, referred to as regional, is not particularly suitable for expansive and diverse domains, since it may struggle to effectively capture the varied complexities and nuances present across such landscapes. Due to the factors mentioned, in the case of regional modeling, all locations share the same set of parameters on a specific day to derive the parameters of the predictive distribution. A middle ground between regional and local methods is semi-local modeling, where stations with similar characteristics are clustered together, and regional modelling is applied within these clusters (Lerch and Baran,, 2017).

The efficiency of post-processing methods is further influenced by the length of the training period and where the training period is positioned in time. One commonly used technique for temporal selection is when the training data is selected using a so-called sliding window approach, where the data for the preceding  n𝑛nitalic_n  calendar days before the actual forecast date are chosen. Alternatively, fixing the training period to a specific extensive time frame in the past can also be suitable for implementing the modelling process, especially in the case of machine learning-based techniques requiring large training datasets. For a recent comparison of various time-adaptive training schemes, we refer to Lang et al., (2020).

3.6 Verification scores

Following the suggestions of Gneiting and Raftery, (2007), the predictive performance of the studied univariate and multivariate visibility forecasts is evaluated with the help of proper scoring rules, as they simultaneously address both the calibration and sharpness of the predictive distribution. To be consistent with Baran and Lakatos, (2023), in the univariate case, we consider the logarithmic score (LogS; Good,, 1952) and the continuous ranked probability score (CRPS; Wilks,, 2019, Section 9.5.1). For a discrete visibility forecast  F𝐹Fitalic_F  represented by a PMF  pF(y)subscript𝑝𝐹𝑦p_{F}(y)italic_p start_POSTSUBSCRIPT italic_F end_POSTSUBSCRIPT ( italic_y )  on  𝒴𝒴\mathcal{Y}caligraphic_Y  the former is defined as

LogS(F,y):=log(pF(y)),assignLogS𝐹𝑦subscript𝑝𝐹𝑦\mathop{\hbox{\rm LogS}}(F,y):=-\log\big{(}p_{F}(y)\big{)},LogS ( italic_F , italic_y ) := - roman_log ( italic_p start_POSTSUBSCRIPT italic_F end_POSTSUBSCRIPT ( italic_y ) ) ,

whereas the latter equals

CRPS(F,y)=k=184pF(yk)|yyk|k=284=1k1pF(yk)pF(y)|yyk|,CRPS𝐹𝑦superscriptsubscript𝑘184subscript𝑝𝐹subscript𝑦𝑘𝑦subscript𝑦𝑘superscriptsubscript𝑘284superscriptsubscript1𝑘1subscript𝑝𝐹subscript𝑦𝑘subscript𝑝𝐹subscript𝑦subscript𝑦subscript𝑦𝑘\mathop{\hbox{\rm CRPS}}(F,y)=\sum_{k=1}^{84}p_{F}(y_{k})\big{|}y-y_{k}\big{|}% -\sum_{k=2}^{84}\sum_{\ell=1}^{k-1}p_{F}(y_{k})p_{F}(y_{\ell})\big{|}y_{\ell}-% y_{k}\big{|},CRPS ( italic_F , italic_y ) = ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 84 end_POSTSUPERSCRIPT italic_p start_POSTSUBSCRIPT italic_F end_POSTSUBSCRIPT ( italic_y start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) | italic_y - italic_y start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT | - ∑ start_POSTSUBSCRIPT italic_k = 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 84 end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT roman_ℓ = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_k - 1 end_POSTSUPERSCRIPT italic_p start_POSTSUBSCRIPT italic_F end_POSTSUBSCRIPT ( italic_y start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) italic_p start_POSTSUBSCRIPT italic_F end_POSTSUBSCRIPT ( italic_y start_POSTSUBSCRIPT roman_ℓ end_POSTSUBSCRIPT ) | italic_y start_POSTSUBSCRIPT roman_ℓ end_POSTSUBSCRIPT - italic_y start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT | ,

which is the discrete version of the general representation

CRPS(F,y)=𝖤|Yy|12𝖤|YY|CRPS𝐹𝑦𝖤𝑌𝑦12𝖤𝑌superscript𝑌\mathop{\hbox{\rm CRPS}}(F,y)={\mathsf{E}}|Y-y|-\frac{1}{2}{\mathsf{E}}|Y-Y^{% \prime}|CRPS ( italic_F , italic_y ) = sansserif_E | italic_Y - italic_y | - divide start_ARG 1 end_ARG start_ARG 2 end_ARG sansserif_E | italic_Y - italic_Y start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT | (3.2)

derived by Gneiting and Raftery, (2007), where  Y𝑌Yitalic_Y  and  Ysuperscript𝑌Y^{\prime}italic_Y start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT  are independent random variables with finite second moments distributed according to  F𝐹Fitalic_F.

Form (3.2) allows a direct extension of the CRPS to multivariate forecasts. For a D𝐷Ditalic_D-dimensional predictive cumulative distribution function (CDF)  F𝐹Fitalic_F  and vector  𝒚=(y(1),y(2),,y(D)),𝒚superscriptsuperscript𝑦1superscript𝑦2superscript𝑦𝐷top\boldsymbol{y}=\big{(}y^{(1)},y^{(2)},\ldots,y^{(D)}\big{)}^{\top},bold_italic_y = ( italic_y start_POSTSUPERSCRIPT ( 1 ) end_POSTSUPERSCRIPT , italic_y start_POSTSUPERSCRIPT ( 2 ) end_POSTSUPERSCRIPT , … , italic_y start_POSTSUPERSCRIPT ( italic_D ) end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ,  the energy score (ES; Gneiting and Raftery,, 2007) is defined as

ES(F,𝒚)=𝖤𝒀𝒚12𝖤𝒀𝒀,ES𝐹𝒚𝖤norm𝒀𝒚12𝖤norm𝒀superscript𝒀\mathop{\hbox{\rm ES}}(F,\boldsymbol{y})={\mathsf{E}}\|\boldsymbol{Y}-% \boldsymbol{y}\|-\frac{1}{2}{\mathsf{E}}\|\boldsymbol{Y}-\boldsymbol{Y}^{% \prime}\|,ES ( italic_F , bold_italic_y ) = sansserif_E ∥ bold_italic_Y - bold_italic_y ∥ - divide start_ARG 1 end_ARG start_ARG 2 end_ARG sansserif_E ∥ bold_italic_Y - bold_italic_Y start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ∥ ,

where now  𝒀𝒀\boldsymbol{Y}bold_italic_Y  and  𝒀superscript𝒀\boldsymbol{Y}^{\prime}bold_italic_Y start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT  are independent random vectors having distribution  F𝐹Fitalic_F, and  \|\cdot\|∥ ⋅ ∥ denotes the Euclidean distance. For a  K𝐾Kitalic_K-member forecast ensemble or sample drawn from the predictive distribution, one has to consider the empirical CDF  FKsubscript𝐹𝐾F_{K}italic_F start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT  resulting in the ensemble energy score (Gneiting et al.,, 2008)

ES(FK,𝒚)=1Kj=1K𝖋j𝒚12K2j=1Kk=1K𝖋j𝖋k.ESsubscript𝐹𝐾𝒚1𝐾superscriptsubscript𝑗1𝐾normsubscript𝖋𝑗𝒚12superscript𝐾2superscriptsubscript𝑗1𝐾superscriptsubscript𝑘1𝐾normsubscript𝖋𝑗subscript𝖋𝑘\mathop{\hbox{\rm ES}}(F_{K},\boldsymbol{y})=\frac{1}{K}\sum_{j=1}^{K}\|% \boldsymbol{\mathfrak{f}}_{j}-\boldsymbol{y}\|-\frac{1}{2K^{2}}\sum_{j=1}^{K}% \sum_{k=1}^{K}\|\boldsymbol{\mathfrak{f}}_{j}-\boldsymbol{\mathfrak{f}}_{k}\|.ES ( italic_F start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT , bold_italic_y ) = divide start_ARG 1 end_ARG start_ARG italic_K end_ARG ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT ∥ bold_fraktur_f start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT - bold_italic_y ∥ - divide start_ARG 1 end_ARG start_ARG 2 italic_K start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT ∥ bold_fraktur_f start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT - bold_fraktur_f start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∥ .

Finally, we also report the (ensemble) variogram score of order  p𝑝pitalic_p (VSpsuperscriptVS𝑝{\rm VS}^{p}roman_VS start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT; Scheuerer and Hamill,, 2015), which is more sensitive to the errors in the specification of correlations than the ESES\mathop{\hbox{\rm ES}}ES. Given an ensemble forecast   𝖋k=(fk(1),fk(2),,fk(D)),k=1,2,,K,formulae-sequencesubscript𝖋𝑘superscriptsuperscriptsubscript𝑓𝑘1superscriptsubscript𝑓𝑘2superscriptsubscript𝑓𝑘𝐷top𝑘12𝐾\boldsymbol{\mathfrak{f}}_{k}=\big{(}f_{k}^{(1)},f_{k}^{(2)},\ldots,f_{k}^{(D)% }\big{)}^{\top},\ k=1,2,\ldots,K,bold_fraktur_f start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = ( italic_f start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( 1 ) end_POSTSUPERSCRIPT , italic_f start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( 2 ) end_POSTSUPERSCRIPT , … , italic_f start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_D ) end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT , italic_k = 1 , 2 , … , italic_K ,  it equals

VSp(FK,𝒚)=i=1Dj=1Dωij(|y(i)y(j)|p1Kk=1K|fk(i)fk(j)|p)2,superscriptVS𝑝subscript𝐹𝐾𝒚superscriptsubscript𝑖1𝐷superscriptsubscript𝑗1𝐷subscript𝜔𝑖𝑗superscriptsuperscriptsuperscript𝑦𝑖superscript𝑦𝑗𝑝1𝐾superscriptsubscript𝑘1𝐾superscriptsuperscriptsubscript𝑓𝑘𝑖superscriptsubscript𝑓𝑘𝑗𝑝2{\rm VS}^{p}(F_{K},\boldsymbol{y})=\sum_{i=1}^{D}\sum_{j=1}^{D}\omega_{ij}% \left(\big{|}y^{(i)}-y^{(j)}\big{|}^{p}-\frac{1}{K}\sum_{k=1}^{K}\big{|}f_{k}^% {(i)}-f_{k}^{(j)}\big{|}^{p}\right)^{2},roman_VS start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT ( italic_F start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT , bold_italic_y ) = ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT italic_ω start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT ( | italic_y start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT - italic_y start_POSTSUPERSCRIPT ( italic_j ) end_POSTSUPERSCRIPT | start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT - divide start_ARG 1 end_ARG start_ARG italic_K end_ARG ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT | italic_f start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT - italic_f start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_j ) end_POSTSUPERSCRIPT | start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ,

where  ωij0subscript𝜔𝑖𝑗0\omega_{ij}\geq 0italic_ω start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT ≥ 0  is the weight for coordinate pair  (i,j)𝑖𝑗(i,j)( italic_i , italic_j ).  The usual choices for order  p𝑝pitalic_p  are 0.50.50.50.5 and 1111, in our case study we consider the former and use the notation  VSVS{\rm VS}roman_VS  for  VS0.5superscriptVS0.5{\rm VS}^{0.5}roman_VS start_POSTSUPERSCRIPT 0.5 end_POSTSUPERSCRIPT.  Note that all four proper scoring rules defined above are negatively oriented, that is the smaller the better, and implemented in the R package scoringRules (Jordan et al.,, 2019).

In the case study of Section 4 the predictive performance of a forecast  F𝐹Fitalic_F  with a given forecast horizon in terms of a score  𝒮Fsubscript𝒮𝐹\mathcal{S}_{F}caligraphic_S start_POSTSUBSCRIPT italic_F end_POSTSUBSCRIPT  is quantified by the mean score value  𝒮¯Fsubscript¯𝒮𝐹\overline{\mathcal{S}}_{F}over¯ start_ARG caligraphic_S end_ARG start_POSTSUBSCRIPT italic_F end_POSTSUBSCRIPT  over all forecast cases used for verification. Furthermore, we also consider skill scores (Murphy,, 1973)

𝒮𝒮F:=1𝒮¯F𝒮¯Fref,assign𝒮subscript𝒮𝐹1subscript¯𝒮𝐹subscript¯𝒮subscript𝐹ref{\mathcal{SS}}_{F}:=1-\frac{\overline{\mathcal{S}}_{F}}{\overline{\mathcal{S}}% _{F_{\text{ref}}}},caligraphic_S caligraphic_S start_POSTSUBSCRIPT italic_F end_POSTSUBSCRIPT := 1 - divide start_ARG over¯ start_ARG caligraphic_S end_ARG start_POSTSUBSCRIPT italic_F end_POSTSUBSCRIPT end_ARG start_ARG over¯ start_ARG caligraphic_S end_ARG start_POSTSUBSCRIPT italic_F start_POSTSUBSCRIPT ref end_POSTSUBSCRIPT end_POSTSUBSCRIPT end_ARG ,

providing the relative improvement of a forecast  F𝐹Fitalic_F  in terms of the score  𝒮Fsubscript𝒮𝐹\mathcal{S}_{F}caligraphic_S start_POSTSUBSCRIPT italic_F end_POSTSUBSCRIPT  with respect to a reference forecast    Frefsubscript𝐹refF_{\text{ref}}italic_F start_POSTSUBSCRIPT ref end_POSTSUBSCRIPT  resulting in a mean score value  𝒮¯Frefsubscript¯𝒮subscript𝐹ref\overline{\mathcal{S}}_{F_{\text{ref}}}over¯ start_ARG caligraphic_S end_ARG start_POSTSUBSCRIPT italic_F start_POSTSUBSCRIPT ref end_POSTSUBSCRIPT end_POSTSUBSCRIPT.  Skill scores are positively oriented (the larger the better) and often reported in percentage values. In Section 4 we investigate continuous ranked probability skill score (CRPSS), logarithmic skill score (LogSS), energy skill score (ESS), and variogram skill score (VSS).

In addition to reporting the various score and skill score values of the competing visibility forecasts, we also address the significance of the score differences and uncertainty in score values by providing 95 % confidence bounds. These confidence intervals are calculated using 2000 block-bootstrap samples according to the stationary bootstrap scheme with the mean block length formula specified by Politis and Romano, (1994).

Calibration of univariate ensemble forecasts can also be assessed graphically with the help of verification rank histograms (Wilks,, 2019, Section 9.7.1). For a properly calibrated K𝐾Kitalic_K-member ensemble, the ranks of the verifying observations with respect to the corresponding forecasts follow a discrete uniform distribution on the set  {1,2,,K+1}12𝐾1\{1,2,\ldots,K+1\}{ 1 , 2 , … , italic_K + 1 },  and the shape of the corresponding histogram reflects the source of the lack of calibration. Multivariate generalizations of the verification rank are based on the concept of pre-ranks and the corresponding histograms display the pre-rank of the observation with respect to the ensemble forecast with ties resolved at random (Gneiting et al.,, 2008). Here we consider the average and band-depth ranks suggested by Thorarinsdottir et al., (2016) and energy score and dependence (variogram) ranks studied in detail e.g. by Allen et al, (2024). The average rank is simply the mean of the univariate ranks and the interpretation of the corresponding histogram is similar to the univariate case: \cup- and \cap-shaped histograms refer to under- and overdispersion, respectively, whereas biased forecasts result in triangular shapes. Band-depth histograms are based on the discrete version of ordering multivariate functional data proposed by López-Pintado and Romo, (2009), while energy score and dependence histograms use the idea of Knüppel et al., (2022) of proper score-based pre-ranks. Right-skewed band-depth histograms indicate overdispersion, left-skewed ones underdispersion or bias, whereas \cup- and \cap-shaped histograms mean too low- and high correlations in the ensemble. Furthermore, low score-based ranks indicate that the observation is similar to the ensemble members in terms of the given scoring rule, whereas outliers result in high pre-ranks. Note, that in the case of the dependence ranking the weight for a pair of coordinates in the variogram score is the negative exponential of the geographical distance between the corresponding SYNOP stations given in 100 km. Finally, as a measure of the deviation of verification ranks from uniformity we consider the reliability index (RI; Delle Monache et al.,, 2006) defined as

RI:=r=1K+1|ρr1K+1|,assignRIsuperscriptsubscript𝑟1𝐾1subscript𝜌𝑟1𝐾1\mathop{\hbox{\rm RI}}:=\sum_{r=1}^{K+1}\Big{|}\rho_{r}-\frac{1}{K+1}\Big{|},RI := ∑ start_POSTSUBSCRIPT italic_r = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_K + 1 end_POSTSUPERSCRIPT | italic_ρ start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT - divide start_ARG 1 end_ARG start_ARG italic_K + 1 end_ARG | , (3.3)

where  ρrsubscript𝜌𝑟\rho_{r}italic_ρ start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT  is the relative frequency of rank  r𝑟ritalic_r.

4 Results

We first explore the advantage of applying CAMS visibility forecasts as additional covariates in POLR and MLP methods for discrete post-processing of ECMWF ensemble predictions of visibility compared to models based on the raw ECMWF ensemble only. We also study the predictive performance of the corresponding multivariate forecasts obtained with the help of ensemble copula coupling and Schaake shuffle. To be consistent with the results of Baran and Lakatos, (2023), all investigated MLP and POLR classifiers rely on 350-day rolling training periods, and each lead time is treated separately. We consider local (MLP-L and POLR-L), regional (MLP-R and POLR-R) and clustering-based semi-local (MLP-C and POLR-C) training with four clusters derived using k𝑘kitalic_k-means clustering of SYNOP stations with respect to three-dimensional feature vectors consisting of observed frequencies of visibility intervals 0–5000, 5000–30000, and 30000–70000 m in the training period. Both univariate and multivariate predictions are validated on forecast-observation pairs for calendar year 2021 and as reference forecasts, we consider the raw ECMWF visibility ensemble and 30-day climatology.

Refer to caption
Figure 2: Mean CRPS of climatological and post-processed visibility forecasts based on CAMS extended MLP and POLR models (a) and CRPSS of CAMS extended post-processed forecasts with respect to the corresponding MLP and POLR models based only on ECMWF predictions (b) as functions of the lead time.

The reference MLP and POLR models studied by Baran and Lakatos, (2023) are based on eight-dimensional input feature vectors

(f~CTRL,f¯ENS,s2,p1,p2,p3,β1,β2),superscriptsubscript~𝑓𝐶𝑇𝑅𝐿subscript¯𝑓𝐸𝑁𝑆superscript𝑠2subscript𝑝1subscript𝑝2subscript𝑝3subscript𝛽1subscript𝛽2top\big{(}\tilde{f}_{CTRL},\overline{f}_{ENS},s^{2},p_{1},p_{2},p_{3},\beta_{1},% \beta_{2}\big{)}^{\top},( over~ start_ARG italic_f end_ARG start_POSTSUBSCRIPT italic_C italic_T italic_R italic_L end_POSTSUBSCRIPT , over¯ start_ARG italic_f end_ARG start_POSTSUBSCRIPT italic_E italic_N italic_S end_POSTSUBSCRIPT , italic_s start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT , italic_p start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_p start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_p start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT , italic_β start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_β start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT , (4.1)

where  f~CTRL:=fCTRL/70000assignsubscript~𝑓𝐶𝑇𝑅𝐿subscript𝑓𝐶𝑇𝑅𝐿70000\tilde{f}_{CTRL}:=f_{CTRL}/70000over~ start_ARG italic_f end_ARG start_POSTSUBSCRIPT italic_C italic_T italic_R italic_L end_POSTSUBSCRIPT := italic_f start_POSTSUBSCRIPT italic_C italic_T italic_R italic_L end_POSTSUBSCRIPT / 70000  is the normalized control member,  f¯ENSsubscript¯𝑓𝐸𝑁𝑆\overline{f}_{ENS}over¯ start_ARG italic_f end_ARG start_POSTSUBSCRIPT italic_E italic_N italic_S end_POSTSUBSCRIPT  is the mean of the 50 (normalized) exchangeable members,  s2superscript𝑠2s^{2}italic_s start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT  is the variance of the 51-member (normalized) operational ensemble,  p1,p2subscript𝑝1subscript𝑝2p_{1},\ p_{2}italic_p start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_p start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT  and  p3subscript𝑝3p_{3}italic_p start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT  are the proportions of ensemble members predicting visibility up to 1000 m, 1000–2000 m and more than 30000 m, respectively, whereas  β1subscript𝛽1\beta_{1}italic_β start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT  and  β2subscript𝛽2\beta_{2}italic_β start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT  are annual base functions addressing seasonal variations defined as

β1(d):=sin(2πd/365)andβ2(d):=cos(2πd/365),formulae-sequenceassignsubscript𝛽1𝑑2𝜋𝑑365andassignsubscript𝛽2𝑑2𝜋𝑑365\beta_{1}(d):=\sin\big{(}2\pi d/365\big{)}\qquad\text{and}\qquad\beta_{2}(d):=% \cos\big{(}2\pi d/365\big{)},italic_β start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ( italic_d ) := roman_sin ( 2 italic_π italic_d / 365 ) and italic_β start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( italic_d ) := roman_cos ( 2 italic_π italic_d / 365 ) ,

where  d𝑑ditalic_d  denotes the day of the year (see e.g. Dabering et al.,, 2017).

Here we add the normalized CAMS forecast  f~CAMS:=fCAMS/70000assignsubscript~𝑓𝐶𝐴𝑀𝑆subscript𝑓𝐶𝐴𝑀𝑆70000\tilde{f}_{CAMS}:=f_{CAMS}/70000over~ start_ARG italic_f end_ARG start_POSTSUBSCRIPT italic_C italic_A italic_M italic_S end_POSTSUBSCRIPT := italic_f start_POSTSUBSCRIPT italic_C italic_A italic_M italic_S end_POSTSUBSCRIPT / 70000  to the input features and investigate the forecast skill of CAMS extended local, regional, and semi-local MLP and POLR post-processing based on the feature vector

(f~CTRL,f¯ENS,f~CAMS,s2,p1,p2,p3,β1,β2)superscriptsubscript~𝑓𝐶𝑇𝑅𝐿subscript¯𝑓𝐸𝑁𝑆subscript~𝑓𝐶𝐴𝑀𝑆superscript𝑠2subscript𝑝1subscript𝑝2subscript𝑝3subscript𝛽1subscript𝛽2top\big{(}\tilde{f}_{CTRL},\overline{f}_{ENS},\tilde{f}_{CAMS},s^{2},p_{1},p_{2},% p_{3},\beta_{1},\beta_{2}\big{)}^{\top}( over~ start_ARG italic_f end_ARG start_POSTSUBSCRIPT italic_C italic_T italic_R italic_L end_POSTSUBSCRIPT , over¯ start_ARG italic_f end_ARG start_POSTSUBSCRIPT italic_E italic_N italic_S end_POSTSUBSCRIPT , over~ start_ARG italic_f end_ARG start_POSTSUBSCRIPT italic_C italic_A italic_M italic_S end_POSTSUBSCRIPT , italic_s start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT , italic_p start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_p start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_p start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT , italic_β start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_β start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT (4.2)

together with the corresponding multivariate forecasts derived with the help of the two-step approaches presented in Section 3.2.

4.1 Univariate performance

Refer to caption
Figure 3: Mean LogS of climatological and post-processed visibility forecasts based on CAMS extended MLP and POLR models (a) and LogSS of CAMS extended post-processed forecasts with respect to the corresponding MLP and POLR models based only on ECMWF predictions (b) as functions of the lead time.

The predictive performance of univariate visibility forecasts is evaluated with the help of the mean CRPS and mean LogS over all forecast cases in the verification period, together with the corresponding skill scores. As in Baran and Lakatos, (2023), extremely low predicted probabilities resulting in numerical issues in LogS calculations are handled by introducing a probability threshold  pmin=2.75×105subscript𝑝2.75superscript105p_{\min}=2.75\times 10^{-5}italic_p start_POSTSUBSCRIPT roman_min end_POSTSUBSCRIPT = 2.75 × 10 start_POSTSUPERSCRIPT - 5 end_POSTSUPERSCRIPT,  which replaces all values in the predictive PMFs which are below this probability. This particular choice of  pminsubscript𝑝p_{\min}italic_p start_POSTSUBSCRIPT roman_min end_POSTSUBSCRIPT  ensures that with a 1 % probability the corresponding reported visibility materializes at least once in a year; however, it is low enough, so there are no observable differences in terms of other scores between the original and the adjusted and renormalized PMFs. Furthermore, as for the reference POLR approaches based on feature vector (4.1) the weights of  f~CTRLsubscript~𝑓𝐶𝑇𝑅𝐿\tilde{f}_{CTRL}over~ start_ARG italic_f end_ARG start_POSTSUBSCRIPT italic_C italic_T italic_R italic_L end_POSTSUBSCRIPT and  f¯ENSsubscript¯𝑓𝐸𝑁𝑆\overline{f}_{ENS}over¯ start_ARG italic_f end_ARG start_POSTSUBSCRIPT italic_E italic_N italic_S end_POSTSUBSCRIPT  are forced to be non-negative, we apply the same constraint also for the coefficient of  f~CAMSsubscript~𝑓𝐶𝐴𝑀𝑆\tilde{f}_{CAMS}over~ start_ARG italic_f end_ARG start_POSTSUBSCRIPT italic_C italic_A italic_M italic_S end_POSTSUBSCRIPT  of the CAMS extended local, semi-local and regional POLR models. For further details of the implementation of MLP and POLR classifiers see Baran and Lakatos, (2023) and Hemri et al., (2016).

Refer to caption
Figure 4: CRPSS (a) and LogSS (b) of CAMS extended MLP-C and POLR-L forecasts with respect to the corresponding models based only on ECMWF predictions together with 95 % confidence bounds as functions of the lead time.

As demonstrated in Section 4.2 of Baran and Lakatos, (2023), in terms of the mean CRPS and mean LogS, up to 120 h, all MLP and POLR post-processed forecasts have shown significant improvement compared to both the raw ensemble and climatology. The extent of this improvement is particularly striking in the case of the raw ensemble, which we have chosen to exclude from our univariate analysis. Figure 2a displays the mean CRPS of climatological and CAMS extended MLP and POLR forecasts as functions of the forecast horizon. All post-processed predictions outperform climatology for all lead times, and the ranking of the various approaches is identical to the one for the corresponding reference models based on feature vector (4.1), see Figure 2 of Baran and Lakatos, (2023). The advantage achieved in mean CRPS by utilizing the CAMS extended feature vector (4.2) over models based solely on ECMWF forecasts is illustrated by the CRPSS values of Figure 2b. In general, it can be observed that the integration of CAMS into the post-processing workflow for regional models yields the lowest benefits, but this advantage is still noteworthy. Note that the best performing POLR and MLP variants, which are the local POLR model and the semi-local MLP, profit the most from the extension of the input features with an average CRPSS of around 3 %. One should also remark that the improvement in mean CRPS from including CAMS predictions might further decrease for forecast horizons beyond 120 h.

The mean LogS values of Figure 3a tell the same story about the various post-processing approaches as Figure 3 of Baran and Lakatos, (2023). Adding the CAMS forecast to the input features of the local, semi-local, and regional MLP and POLR models does not change the ranking of the forecasts; however, as portrayed in Figure 3b, it results in a modest 0.4 – 0.9 % average improvement in LogS.

To address the significance of gain of CAMS extended post-processing models with respect to the corresponding forecasts based only on ECMWF predictions, in Figure 4 we accompany the CRPSS and LogSS of the MLP-C and POLR-L approaches, which are the best model variants according to Baran and Lakatos, (2023), by 95 % block-bootstrap confidence intervals. As depicted in Figures 4a and 4b, up to the maximal studied lead time of 120 h, the improvement in terms of both scores is significant at a 5 % level.

Refer to caption
Figure 5: Mean ES of raw, climatological and post-processed multivariate visibility forecasts based on CAMS extended MLP-C and POLR-L models (a) and ESS of CAMS extended post-processed forecasts with respect to multivariate climatology (b) as functions of the lead time.

4.2 Multivariate post-processing

Continuing our analysis, we delve into the comparison of forecast skill among multivariate approaches detailed in Section 3.2. In the case of the applied multivariate methods, calibrated samples of size 51515151 are employed, mirroring the dimensionality of the raw ECMWF ensemble. When employing the SSh method, which relies on historical data to establish dependency templates, we utilize observations obtained from the rolling training window applied during univariate calibration. This decision is made based on the finding of Lakatos et al., (2023), that extending these pool dates to encompass the entire available historical dataset may only yield negligible advantages. In our case, this entails selecting 51 days from a pool of 350 dates (which is the length of the training period). As mentioned, in both the ECC and SSh methods, the initial step entails generating samples from the calibrated PMFs, which we perform by considering their equidistant quantiles.

Refer to caption
Figure 6: Mean VS of raw, climatological and post-processed multivariate visibility forecasts based on CAMS extended MLP-C and POLR-L models (a) and VSS of CAMS extended post-processed forecasts with respect to climatology (b) as functions of the lead time.

To simplify presentation, as univariate calibrated forecasts we consider the CAMS extended MLP-C and POLR-L predictions and their counterparts based only on the ECMWF ensemble. The corresponding multivariate predictions obtained with the help of the ECC and SSh are referred to as MLP-C ECC, POLR-L ECC, and MLP-C SSh, POLR-L SSh, respectively. We also report the performance of naive multivariate forecasts obtained by simply arranging the 51-member station-specific MLP-C and POLR-L samples into 30-dimensional vectors. In the multivariate context, notations MLP-C and POLR-L will refer to these ”independent” predictions. Finally, as a multivariate climatological forecast, we consider the vector of 51 equidistant quantiles of 30-day climatological PMFs corresponding to the investigated SYNOP stations. In principle, one could also consider the vectors of station-specific 51-day climatological forecasts; however, due to their discrete nature, they would substantially differ from the competing calibrated predictions.

First, we assess the performance of the previously discussed multivariate forecasts based on the energy score. Figure 5a clearly illustrates the substantial superiority of each CAMS extended multivariate forecast over the raw ensemble and the 30-day climatology in terms of this scoring rule. Notably, the semi-local MLP ECC and local POLR ECC exhibit the best predictive performance, extremely closely approaching their SSh counterparts, thereby minimizing performance disparities and highlighting the success of multivariate post-processing. According to the energy skill scores of Figure 5b, the univariate MLP-C and POLR-L models alone demonstrate a substantial average improvement of 12 % over multivariate climatology, while the multivariate models show even more pronounced performance enhancements, averaging 17 %. This analysis emphasizes that, at least in our dataset, irrespective of the selected dependency template, multivariate models consistently exhibit improvement compared to independent calibration and climatology.

Refer to caption
Figure 7: ESS (a) and VSS (b) post-processed multivariate visibility forecasts based on CAMS extended MLP-C and POLR-L models with respect to the corresponding forecasts based only on ECMWF predictions as functions of the lead time.

Figure 6 presents a slightly different picture in terms of model ranking, which might be a natural phenomenon when considering the variogram score depicted here. This verification measure clearly illustrates the success of multivariate post-processing in the redefinition of covariance structures. As expected, (CAMS extended) naive models, such as MLP-C and POLR-L, do not capture the spatial dependencies and might underperform the raw forecast vectors and multivariate climatology, as well. According to the VSS values of Figure 6b, the two-step multivariate models demonstrate a notable improvement of over 20 % with respect to the reference climatological predictions. Among these approaches, the POLR variants exhibit the highest skill, particularly the POLR-L SSh configuration. Moreover, it is noteworthy that a more pronounced advantage was attained with the SSh models obtaining dependence templates from past observations, compared to the ECC forecasts utilizing the rank structure of raw ensemble.

In Figure 7, we illustrate how the CAMS extended post-processed forecasts compare to the corresponding multivariate predictions from classifiers that ignore CAMS forecasts. Both the ESS and VSS reveal a diminishing advantage of CAMS integration with increasing forecast horizons. In the case of the ES the differences between the various post-processed forecasts are minor (see Figure 7a) with the MLP-C ECC variant showing the most overall improvement. The VSS values of Figure 7b again tell a different story. With independent MLP-C and POLR-L models, there is an average improvement of around 5 % in variogram scores suggesting that using CAMS as a feature helps in better preserving spatial dependencies. For the multivariate models, the gain is much smaller, especially for the 84-hour horizon and the POLR-L SSh model even displays negative skill at 66 and 108 hours.

Refer to caption
Figure 8: ESS (a,c) and VSS (b,d) of CAMS extended MLP-C ECC and POLR-L ECC multivariate forecasts with respect to multivariate climatology (a,b) and to the corresponding MLP-C ECC and POLR-L ECC forecasts based only on ECMWF predictions (c,d) together with 95 % confidence bounds as functions of the lead time.

For better clarity, in Figure 8, we accompanied the ESS and VSS values of MLP-C ECC and POLR-L ECC forecasts – which are overall considered the best performers – with 95 % bootstrap confidence bounds. In cases where climatology serves as the reference, as seen in Figures 8a and 8b, in general, the local POLR-based ECC outperforms its MLP-C counterpart. However, for forecasts corresponding to 6 UTC, both in terms of ES and VS, the difference is not significant, indicating that the advantage of the POLR-L ECC approach over the MLP-C ECC is more apparent for the other three studied observation times (12, 18, 24 UTC). Furthermore, Figures 8c and 8d demonstrate that incorporating CAMS forecasts as an additional feature leads to a significant enhancement in ES for all investigated lead times, whereas for VS there are a few instances of longer lead times exhibiting negative skill within the confidence bounds.

Refer to caption
Figure 9: Rank histograms of climatological and post-processed multivariate visibility forecasts based on CAMS extended MLP-C and POLR-L models.

In addition to the previous analyses, we also generated multivariate rank histograms to thoroughly assess the differences in calibration of the individual forecasts (see Figure 9). In general, properly calibrated forecasts result in flat histograms, and the deviation from uniformity can be quantified by the reliability index (3.3). In principle, one can have an instant ranking of the various multivariate predictions with the help of the reliability indices, displayed in Figure 10 (the smaller the better); however, the interpretation of the different rank histograms can vary depending on the functions applied to determine the pre-ranks (see Section 3.6), as these functions may be more sensitive to detecting different deficiencies. Therefore, as argued by Thorarinsdottir et al., (2016) and Allen et al, (2024), using multiple pre-rank functions to evaluate the performance of the forecasts could be beneficial, providing a more comprehensive comparison. In the case of naive MLP-C and POLR-L models, hump-shaped average- and band-depth rank histograms indicate overestimated correlations, which is also supported by the skewed energy score- and dependence rank histograms. To a smaller extent, the same applies to the multivariate climatology. Multivariate calibration results in a substantial improvement, especially in the case of MLP-C ECC and POLR-L ECC forecasts. According to Figure 10, in terms of the average ranks, these two methods provide by far the lowest reliability indices and are very competitive with the MLP-C SSh and POLR-L SSh approaches in the three other cases, as well.

Refer to caption
Figure 10: Reliability indices of rank histograms of raw, climatological and post-processed multivariate visibility forecasts based on CAMS extended MLP-C and POLR-L models as functions of the lead time.

5 Conclusions

The current work is a direct continuation of Baran and Lakatos, (2023) where location-specific (univariate) discrete post-processing of visibility was studied. To get a deeper insight into the behaviour of this complex and hardly predictable weather quantity, two main questions are investigated. First, we assess whether the state-of-the-art multivariate post-processing approaches which were approved to be successful in the case of more traditional variables such as temperature, wind speed, or precipitation (see e.g. Lakatos et al.,, 2023), can reliably restore spatial dependencies that lost during separate calibration of visibility ensemble forecasts for each SYNOP station. Furthermore, we also investigate whether the use of CAMS visibility predictions as additional covariates in post-processing models can significantly improve the forecast skill.

In the case of the best-performing univariate post-processing approaches (MLP-C and POLR-L) from the methods investigated by Baran and Lakatos, (2023), the addition of CAMS predictions to the input features results in a significant 1 – 5.25 % improvement in mean CRPS and 0.27 – 1.37 % in mean LogS. Note, that for both scores the gain displays a decreasing trend as the lead time increases.

Joint multivariate post-processing of forecasts for all 30 investigated SYNOP observation stations is performed with the help of the two-step ECC and SSh approaches utilizing the dependence structure of the raw vector ensemble forecasts and historical observations, respectively. In fact, in terms of ES and VS, there are no visible differences in skill between the two dependence template selection methods which substantially outperform the reference independently calibrated (naive) multivariate forecasts. Compared to forecasts based only on the ECMWF predictions, utilizing CAMS results in an additional 0.58 – 4.11 % improvement in energy score for all investigated models. The VS focusing on the correct specification of correlations shows a different picture. In the case of the independent reference methods, the advantage of CAMS extended forecasts is around 5 %; however, the corresponding MLP-C forecasts underperform climatology for almost all lead times. In contrast, for the ECC and SSh methods, the VSS of CAMS extended predictions with respect to the models based only on the ECMWF ensemble is below 2.5 %; nevertheless, they outperform climatology by more than 20 %. From the competing multivariate predictions, the ECC-corrected MLP-C and POLR-L methods display the best overall performance.

Our case study demonstrated the usefulness of CAMS visibility predictions as additional covariates in the discrete post-processing of visibility ensemble forecasts. Furthermore, it showed the efficiency of the simplest two-step multivariate post-processing methods in capturing spatial dependencies in the case of this particular weather quantity.

A natural direction of further studies is the utilization of CAMS forecasts and/or other visibility-related covariates in post-processing methods treating visibility as a continuous quantity. Possible parametric candidates are the BMA approach of Chmielecki and Raftery, (2011), where the predictive distribution for visibility is a mixture of beta laws, and the more recent censored gamma and censored truncated normal mixture model proposed by Baran and Baran, (2023). However, the most straightforward extension of the present work would be the adaptation of the classification and interpolation-based approach of Scheuerer et al., (2020) to visibility forecasts.


Acknowledgments.  The authors gratefully acknowledge the support of the ÚNKP-23-3 New National Excellence Program of the Ministry for Culture and Innovation from the source of the National Research, Development and Innovation Fund and of the National Research, Development and Innovation Office under Grant No. K142849. They are also indebted to Martin Leutbecher and Zied Ben Bouallègue for the inspiring discussions and for providing the ECMWF and CAMS visibility data.

References

  • Allen et al, (2024) Allen, S., Ziegel, J. and Ginsbourger, D. (2024) Assessing the calibration of multivariate probabilistic forecasts. Q. J. R. Meteorol. Soc. 150, 1315–1335.
  • Baran and Baran, (2023) Baran, Á. and Baran, S. (2023) Parametric model for post-processing visibility ensemble forecasts. Preprint. Available at: https://doi.org/10.48550/arXiv.2310.16824 [Accessed on 19 June 2024]
  • Baran et al., (2021) Baran, Á., Lerch, S., El Ayari, M. and Baran, S. (2021) Machine learning for total cloud cover prediction. Neural. Comput. Appl. 33, 2605–2620.
  • Baran and Lakatos, (2023) Baran, S. and Lakatos, M. (2023) Statistical post-processing of visibility ensemble forecasts. Meteorol. Appl. 30, paper e2157.
  • Baran and Möller, (2015) Baran, S. and Möller, A. (2015) Joint probabilistic forecasting of wind speed and temperature using Bayesian model averaging. Environmetrics 26, 120–132.
  • Bremnes, (2019) Bremnes, J. B. (2019) Constrained quantile regression splines for ensemble postprocessing. Mon. Weather Rev. 147, 1769–1780.
  • Bremnes, (2020) Bremnes, J. B. (2020) Ensemble postprocessing using quantile function regression based on neural networks and Bernstein polynomials. Mon. Weather Rev. 148, 403–414.
  • Chen et al., (2024) Chen, J., Janke, T., Steinke, F. and Lerch, S. (2024) Generative machine learning methods for multivariate ensemble postprocessing. Ann. Appl. Stat. 18, 159–183.
  • Chmielecki and Raftery, (2011) Chmielecki, R. M. and Raftery, A. E. (2011) Probabilistic visibility forecasting using Bayesian model averaging. Mon. Weather Rev. 139, 1626–1636.
  • Clark et al., (2004) Clark, M., Gangopadhyay, S., Hay, L., Rajagopalan, B. and Wilby, R. (2004) The Schaake shuffle: A method for reconstructing space–time variability in forecasted precipitation and temperature fields. J. Hydrometeorol. 5, 243–262.
  • Dabering et al., (2017) Dabernig, M., Mayr, G. J., Messner, J. W. and Zeileis, A. (2017) Spatial ensemble post-processing with standardized anomalies. Q. J. R. Meteorol. Soc. 143, 909–916.
  • Delle Monache et al., (2006) Delle Monache, L., Hacker, J. P., Zhou, Y., Deng, X. and Stull, R. B. (2006) Probabilistic aspects of meteorological and ozone regional ensemble forecasts. J. Geophys. Res. 111, paper D24307, doi:10.1029/2005JD006917.
  • ECMWF, (2021) ECMWF (2021) IFS Documentation CY47R3 - Part IV Physical processes. ECMWF, Reading. Available at: http://dx.doi.org/10.21957/eyrpir4vj [Accessed on 19 June 2024]
  • Friederichs and Hense, (2007) Friederichs, P. and Hense, A. (2007) Statistical downscaling of extreme precipitation events using censored quantile regression. Mon. Weather Rev. 135, 2365–2378.
  • Gneiting and Raftery, (2007) Gneiting, T. and Raftery, A. E. (2007) Strictly proper scoring rules, prediction and estimation. J. Amer. Statist. Assoc. 102, 359–378.
  • Gneiting et al., (2005) Gneiting, T., Raftery, A. E., Westveld, A. H. and Goldman, T. (2005) Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation. Mon. Weather Rev. 133, 1098–1118.
  • Gneiting et al., (2008) Gneiting, T., Stanberry, L. I., Grimit, E. P., Held, L. and Johnson, N. A. (2008) Assessing probabilistic forecasts of multivariate quantities, with an application to ensemble predictions of surface winds. Test 17, 211–235.
  • Good, (1952) Good, I. J. (1952) Rational decisions. J. R. Stat. Soc. Series B Stat. Methodol. 14, 107–114.
  • Goodfellow et al., (2016) Goodfellow, I., Bengio, Y. and Courville, A. (2016) Deep Learning. MIT Press, Cambridge.
  • Hemri et al., (2016) Hemri, S., Haiden, T. and Pappenberger, F. (2016) Discrete postprocessing of total cloud cover ensemble forecasts. Mon. Weather Rev. 144, 2565–2577.
  • Jordan et al., (2019) Jordan, A., Krüger, F. and Lerch, S. (2019) Evaluating probabilistic forecasts with scoringRules. J. Stat. Softw. 90, 1–37.
  • Knüppel et al., (2022) Knüppel, M., Krüger, F. and Pohle, M.-O. (2022) Score-based calibration testing for multivariate forecast distributions. Preprint. Available at: https://doi.org/10.48550/arXiv.2211.16362. [Accessed on 19 June 2024]
  • Lakatos et al., (2023) Lakatos, M., Lerch, S., Hemri, S. and Baran, S. (2023) Comparison of multivariate post-processing methods using global ECMWF ensemble forecasts. Q. J. R. Meteorol. Soc. 149, 856–877.
  • Lang et al., (2020) Lang, M. N., Lerch, S., Mayr, G. J., Simon, T., Stauffer, R. and Zeileis, A. (2020) Remember the past: a comparison of time-adaptive training schemes for non-homogeneous regression. Nonlin. Processes Geophys. 27, 23–34.
  • Lerch and Baran, (2017) Lerch, S. and Baran, S. (2017) Similarity-based semi-local estimation of EMOS models. J. R. Stat. Soc. Ser. C Appl. Stat. 66, 29–51.
  • Lerch et al., (2020) Lerch, S., Baran, S., Möller, A., Groß, J., Schefzik, R., Hemri, S. and Graeter, M. (2020) Simulation-based comparison of multivariate ensemble post-processing methods. Nonlinear Process. Geophys. 27, 349–371.
  • López-Pintado and Romo, (2009) López-Pintado, S. and Romo, J. (2009) On the concept of depth for functional data. J. Amer. Statist. Assoc. 104, 718–734.
  • McCullagh, (1980) McCullagh, P. (1980) Regression model for ordinal data (with discussion). J. R. Stat. Soc. Series B Stat. Methodol. 42, 243–268.
  • Murphy, (1973) Murphy, A. H. (1973) Hedging and skill scores for probability forecasts. J. Appl. Meteorol. 12, 215–223.
  • Pitkänen et al., (2023) Pitkänen, M. R. A., Benedictow, A., Arola, A., Bennouna, Y., Blake, L., Bouarar, I., Cuevas, E., Errera, Q., Eskes, H. J., Griesfeller, J., Ilic, L., Kapsomenakis, J., Langerock, B., Mortier, A., Pison, I., Ramonet, M., Richter, A., Schoenhardt, A., Schulz, M., Tarniewicz, J., Thouret, V., Tsikerdekis, A., Warneke, T., Zerefos, C. (2023) Validation report of the CAMS near-real-time global atmospheric composition service: March – May 2023. Copernicus Atmosphere Monitoring Service (CAMS) report, September 2023, Available at: http://dx.doi.org/10.24380/ac0r-ao9u. [Accessed on 19 June 2024]
  • Politis and Romano, (1994) Politis, D. N. and Romano, J. P. (1994) The stationary bootstrap. J. Amer. Statist. Assoc. 89, 1303–1313.
  • Raftery et al., (2005) Raftery, A. E., Gneiting, T., Balabdaoui, F. and Polakowski, M. (2005) Using Bayesian model averaging to calibrate forecast ensembles. Mon. Weather Rev. 133, 1155–1174.
  • Rasp and Lerch, (2018) Rasp, S. and Lerch, S. (2018) Neural networks for postprocessing ensemble weather forecasts. Mon. Weather Rev. 146, 3885–3900.
  • Schefzik and Möller, (2018) Schefzik, R. and Möller, A. (2018) Ensemble postprocessing methods incorporating dependence structures. In Vannitsem, S., Wilks, D. S. and Messner, J. W. (eds.), Statistical Postprocessing of Ensemble Forecasts. Elsevier, Amsterdam, pp. 91–125.
  • Schefzik et al., (2013) Schefzik, R., Thorarinsdottir T. L. and Gneiting, T. (2013) Uncertainty quantification in complex simulation models using ensemble copula coupling. Statist. Sci. 28, 616–640.
  • Scheuerer and Hamill, (2015) Scheuerer, M. and Hamill, T. M. (2015) Variogram-based proper scoring rules for probabilistic forecasts of multivariate quantities. Mon. Weather Rev. 143, 1321–1334.
  • Scheuerer et al., (2020) Scheuerer, M., Switanek, M. B., Worsnop, R. P. and Hamill, T. M. (2020) Using artificial neural networks for generating probabilistic subseasonal precipitation forecasts over California. Mon. Weather Rev. 148, 3489–3506.
  • Schultz and Lerch, (2022) Schultz, B. and Lerch, S. (2022) Machine learning methods for postprocessing ensemble forecasts of wind gusts: a systematic comparison. Mon. Weather Rev. 150, 235–257.
  • Taillardat et al., (2016) Taillardat, M., Mestre, O., Zamo, M. and Naveau, P. (2016) Calibrated ensemble forecasts using quantile regression forests and ensemble model output statistics. Mon. Weather Rev. 144, 2375–2393.
  • Thorarinsdottir and Gneiting, (2010) Thorarinsdottir, T. L. and Gneiting, T. (2010) Probabilistic forecasts of wind speed: ensemble model output statistics by using heteroscedastic censored regression. J. R. Stat. Soc. Ser. A Stat. Soc. 173, 371–388.
  • Thorarinsdottir et al., (2016) Thorarinsdottir, T. L., Scheuerer, M. and Heinz, C. (2016) Assessing the calibration of high-dimensional ensemble forecasts using rank histograms. J. Comput. Graph. Stat. 25, 105–122.
  • Vannitsem et al., (2021) Vannitsem, S., Bremnes, J. B., Demaeyer, J., Evans, G. R., Flowerdew, J., Hemri, S., Lerch, S., Roberts, N., Theis, S., Atencia, A., Ben Boualègue, Z., Bhend, J., Dabernig, M., De Cruz, L., Hieta, L., Mestre, O., Moret, L., Odak Plenkovič, I., Schmeits, M., Taillardat, M., Van den Bergh, J., Van Schaeybroeck, B., Whan, K. and Ylhaisi, J. (2021) Statistical postprocessing for weather forecasts – review, challenges and avenues in a big data world. Bull. Amer. Meteorol. Soc. 102, E681–E699.
  • Wilks, (2019) Wilks, D. S. (2019) Statistical Methods in the Atmospheric Sciences. 4th ed. Elsevier, Amsterdam.
  • WMO, (1992) World Meteorological Organization (1992) International Meteorological Vocabulary (WMO-No.182). WMO, Geneva.
  • WMO, (2018) World Meteorological Organization (2018) Guide to Instruments and Methods of Observation. Volume I – Measurement of Meteorological Variables (WMO-No.8). WMO, Geneva.