Search | arXiv e-print repository

Gradient-Boosted Generalized Linear Models for Conditional Vine Copulas

Authors: David Jobst, Annette Möller, Jürgen Groß

Abstract: Vine copulas are flexible dependence models using bivariate copulas as building blocks. If the parameters of the bivariate copulas in the vine copula depend on covariates, one obtains a conditional vine copula. We propose an extension for the estimation of continuous conditional vine copulas, where the parameters of continuous conditional bivariate copulas are estimated sequentially and separately via gradient-boosting. For this purpose, we link covariates via generalized linear models (GLMs) to Kendall's $\tau$ correlation coefficient from which the corresponding copula parameter can be obtained. Consequently, the gradient-boosting algorithm estimates the copula parameters, providing a natural covariate selection. In a second step, an additional covariate deselection procedure is applied. The performance of the gradient-boosted conditional vine copulas is illustrated in a simulation study. Linear covariate effects in low- and high-dimensional settings are investigated for the conditional bivariate copulas separately and for conditional vine copulas. Moreover, the gradient-boosted conditional vine copulas are applied to the temporal postprocessing of ensemble weather forecasts in a low-dimensional setting. The results show that our suggested method is able to outperform the benchmark methods and identifies temporal correlations better. Finally, we provide an R-package called boostCopula for this method.

Submitted 19 June, 2024; originally announced June 2024.
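
Note on the abstract above: the chain from covariates to Kendall's $\tau$ and on to a copula parameter can be sketched as follows. This is a minimal illustration and not the boostCopula implementation. The tanh inverse link, the function names, and the coefficient values are assumptions made for the example; the conversions from $\tau$ to the copula parameter are the standard closed forms for the Gaussian, Clayton, and Gumbel families.

```python
import math


def kendalls_tau(covariates, coefficients, intercept=0.0):
    """Map a linear predictor to Kendall's tau in (-1, 1).

    Assumption: a tanh inverse link is used here for illustration;
    the paper may use a different link function.
    """
    eta = intercept + sum(b * x for b, x in zip(coefficients, covariates))
    return math.tanh(eta)


def copula_parameter(tau, family):
    """Convert Kendall's tau to the parameter of a one-parameter copula."""
    if family == "gaussian":
        # rho = sin(pi * tau / 2)
        return math.sin(math.pi * tau / 2.0)
    if family == "clayton":
        # theta = 2 * tau / (1 - tau), defined for 0 < tau < 1
        if not 0.0 < tau < 1.0:
            raise ValueError("Clayton copula requires 0 < tau < 1")
        return 2.0 * tau / (1.0 - tau)
    if family == "gumbel":
        # theta = 1 / (1 - tau), defined for 0 <= tau < 1
        if not 0.0 <= tau < 1.0:
            raise ValueError("Gumbel copula requires 0 <= tau < 1")
        return 1.0 / (1.0 - tau)
    raise ValueError(f"unknown family: {family}")


# Hypothetical covariate values and coefficients, for illustration only.
tau = kendalls_tau(covariates=[0.5, -1.0], coefficients=[0.8, 0.1], intercept=0.2)
theta = copula_parameter(tau, "clayton")
```

In a boosting setting, the coefficients would be updated componentwise, so covariates whose coefficients stay at zero are effectively deselected.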

arXiv:2404.04964

The zero degree of freedom non-central chi squared distribution for ensemble postprocessing

Authors: Jürgen Groß, Annette Möller

Abstract: In this note the use of the zero degree non-central chi squared distribution as predictive distribution for ensemble postprocessing is investigated. It has a point mass at zero by definition, and is thus particularly suited for postprocessing weather variables naturally exhibiting large numbers of zeros, such as precipitation, solar radiation or lightning. Due to the properties of the distribution, no additional truncation or censoring is required to obtain a positive probability at zero. The presented study investigates its performance compared to that of the censored generalized extreme value distribution and the censored and shifted gamma distribution for postprocessing 24h accumulated precipitation using an EMOS (ensemble model output statistics) approach with a rolling training period. The obtained results support the conclusion that it serves well as a predictive distribution in postprocessing precipitation and thus may also be considered in future analyses of other weather variables having substantial zero observations.

Submitted 7 April, 2024; originally announced April 2024.

MSC Class: 62P12
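
Note on the abstract above: the point mass at zero can be made concrete with a short sketch. It relies on the standard representation of the zero degree of freedom non-central chi squared distribution as a Poisson mixture of central chi squared distributions with $2k$ degrees of freedom, where the $k = 0$ component is a point mass at zero with weight $e^{-\lambda/2}$. The function names and the truncation at `max_terms` are assumptions of this example, which is not taken from the paper.

```python
import math


def chi2_even_cdf(x, k):
    """CDF of a central chi squared distribution with 2k degrees of freedom, k >= 1."""
    # Closed (Erlang) form: 1 - exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return 1.0 - math.exp(-half) * total


def zero_df_ncx2_cdf(x, lam, max_terms=200):
    """CDF of the zero degree of freedom non-central chi squared distribution.

    lam is the non-centrality parameter. The series is truncated after
    max_terms Poisson components (an assumption of this sketch).
    """
    if x < 0:
        return 0.0
    weight = math.exp(-lam / 2.0)  # Poisson weight for k = 0: the point mass at zero
    cdf = weight
    for k in range(1, max_terms):
        weight *= (lam / 2.0) / k  # Poisson(lam / 2) weight for component k
        cdf += weight * chi2_even_cdf(x, k)
    return cdf


# Probability of observing exactly zero (for example, a dry day), lam = 4 chosen arbitrarily.
prob_zero = zero_df_ncx2_cdf(0.0, lam=4.0)
```

Because the probability of zero is $e^{-\lambda/2}$, a larger non-centrality parameter corresponds to a smaller chance of a zero observation, with no censoring step needed.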

arXiv:2402.00555

Time Series based Ensemble Model Output Statistics for Temperature Forecasts Postprocessing

Authors: David Jobst, Annette Möller, Jürgen Groß

Abstract: Nowadays, weather prediction is based on numerical weather prediction (NWP) models that produce an ensemble of forecasts. Despite large improvements over the last few decades, they still tend to exhibit systematic bias and dispersion errors. Consequently, these forecasts may be improved by statistical postprocessing. This work proposes an extension of the ensemble model output statistics (EMOS) method in a time series framework. Besides accounting for seasonality and trend in the location and scale parameter of the predictive distribution, the autoregressive process in the mean forecast errors or the standardized forecast errors is considered. The models can be further extended by allowing generalized autoregressive conditional heteroscedasticity (GARCH). Last but not least, it is outlined how to use these models for arbitrary forecast horizons. To illustrate the performance of the suggested EMOS models in time series fashion, we present a case study for the postprocessing of 2 m surface temperature forecasts using five different lead times and a set of observation stations in Germany. The results indicate that the time series EMOS extensions are able to significantly outperform the benchmark EMOS and autoregressive adjusted EMOS (AR-EMOS) in most of the lead time-station cases. To complement this article, our method is accompanied by an R-package called tsEMOS.

Submitted 1 February, 2024; originally announced February 2024.
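
Note on the abstract above: the benchmark Gaussian EMOS model that the paper extends can be sketched as below. This is a minimal illustration under assumptions, not the tsEMOS code: the location is taken as linear in the ensemble mean, the variance as linear in the ensemble variance, and the coefficients `a`, `b`, `c`, `d` are hypothetical placeholders that would normally be estimated, for example by minimizing the mean CRPS over a training period. The closed-form CRPS of a normal distribution used here is the standard one.

```python
import math
import statistics


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def emos_parameters(ensemble, a, b, c, d):
    """Location and scale of a Gaussian EMOS predictive distribution.

    Assumed form: mu = a + b * ensemble mean, sigma^2 = c + d * ensemble variance.
    """
    mu = a + b * statistics.mean(ensemble)
    sigma = math.sqrt(c + d * statistics.variance(ensemble))
    return mu, sigma


def crps_normal(mu, sigma, y):
    """Closed-form CRPS of a N(mu, sigma^2) forecast for observation y."""
    z = (y - mu) / sigma
    return sigma * (z * (2.0 * normal_cdf(z) - 1.0)
                    + 2.0 * normal_pdf(z)
                    - 1.0 / math.sqrt(math.pi))


# Hypothetical ensemble of 2 m temperature forecasts (degrees Celsius) and coefficients.
mu, sigma = emos_parameters([1.0, 2.0, 3.0], a=0.0, b=1.0, c=0.5, d=0.5)
score = crps_normal(mu, sigma, y=2.4)
```

A time series variant along the lines of the abstract would additionally let `mu` and `sigma` depend on seasonal terms and on past forecast errors.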

arXiv:2309.05603

D-Vine GAM Copula based Quantile Regression with Application to Ensemble Postprocessing

Authors: David Jobst, Annette Möller, Jürgen Groß

Abstract: Temporal, spatial or spatio-temporal probabilistic models are frequently used for weather forecasting. The D-vine (drawable vine) copula quantile regression (DVQR) is a powerful tool for this application field, as it can automatically select important predictor variables from a large set and is able to model complex nonlinear relationships among them. However, the current DVQR does not always allow one to account explicitly and economically for additional covariate effects, e.g. temporal or spatio-temporal information. Consequently, we propose an extension of the current DVQR, where we parametrize the bivariate copulas in the D-vine copula through Kendall's Tau which can be linked to additional covariates. The parametrization of the correlation parameter allows generalized additive models (GAMs) and spline smoothing to detect potentially hidden covariate effects. The new method is called GAM-DVQR, and its performance is illustrated in a case study for the postprocessing of 2m surface temperature forecasts. We investigate a constant as well as a time-dependent Kendall's Tau. The GAM-DVQR models are compared to the benchmark methods Ensemble Model Output Statistics (EMOS), its gradient-boosted extension (EMOS-GB) and basic DVQR. The results indicate that the GAM-DVQR models are able to identify time-dependent correlations as well as relevant predictor variables and significantly outperform the state-of-the-art methods EMOS and EMOS-GB.
Furthermore, the introduced parameterization allows using a static training period for GAM-DVQR, yielding a more sustainable model estimation in comparison to DVQR using a sliding training window. Finally, we give an outlook on further applications and extensions of the GAM-DVQR model. To complement this article, our method is accompanied by an R-package called gamvinereg.

Submitted 11 September, 2023; originally announced September 2023.

arXiv:2309.02069

Some Additional Remarks on Statistical Properties of Cohen's d from Linear Regression

Authors: Jürgen Groß, Annette Möller

Abstract: The size of the effect of the difference in two groups with respect to a variable of interest may be estimated by the classical Cohen's $d$. A recently proposed generalized estimator allows conditioning on further independent variables within the framework of a linear regression model. In this note, it is demonstrated how unbiased estimation of the effect size parameter together with a corresponding standard error may be obtained based on the non-central $t$ distribution. The portrayed estimator may be considered as a natural generalization of the unbiased Hedges' $g$. In addition, confidence interval estimation for the unknown parameter is demonstrated by applying the so-called inversion confidence interval principle. The regarded properties collapse to already known ones in the absence of any additional independent variables. The stated remarks are illustrated with a publicly available data set.

Submitted 5 September, 2023; originally announced September 2023.

MSC Class: 62J05 (Primary) 62F03; 62F10 (Secondary)

arXiv:2305.10095

Nonparametric estimation of the interventional disparity indirect effect among the exposed

Authors: Helene C. W. Rytgaard, Amalie Lykkemark Møller, Thomas A. Gerds

Abstract: In situations with non-manipulable exposures, interventions can be targeted to shift the distribution of intermediate variables between exposure groups to define interventional disparity indirect effects. In this work, we present a theoretical study of identification and nonparametric estimation of the interventional disparity indirect effect among the exposed. The targeted estimand is intended for applications examining the outcome risk among an exposed population for which the risk is expected to be reduced if the distribution of a mediating variable were changed by a (hypothetical) policy or health intervention that targets the exposed population specifically. We derive the nonparametric efficient influence function, study its double robustness properties and present a targeted minimum loss-based estimation (TMLE) procedure. All theoretical results and algorithms are provided for both uncensored and right-censored survival outcomes. Taking the ongoing discussion of the interpretation of non-manipulable exposures as our starting point, we discuss relevant interpretations of the estimand under different sets of assumptions of no unmeasured confounding and provide a comparison of our estimand to other related estimands within the framework of interventional (disparity) effects. Small-sample performance and double robustness properties of our estimation procedure are investigated and illustrated in a simulation study.

Submitted 17 May, 2023; originally announced May 2023.

Comments: 35 pages, 1 figure

arXiv:2302.14580

Effect Size Estimation in Linear Mixed Models

Authors: Jürgen Groß, Annette Möller

Abstract: In this note, we reconsider Cohen's effect size measure $f^2$ under linear mixed models and demonstrate its application by employing an artificially generated data set. It is shown how $f^2$ can be computed with the statistical software environment R using lme4 without the need for specification and computation of a coefficient of determination.

Submitted 20 May, 2023; v1 submitted 28 February, 2023; originally announced February 2023.

MSC Class: 62J05; 62J20; 62F03

arXiv:2211.10987

Finding active galactic nuclei through Fink

Authors: Etienne Russeil, Emille E. O. Ishida, Roman Le Montagner, Julien Peloton, Anais Moller

Abstract: We present the Active Galactic Nuclei (AGN) classifier as currently implemented within the Fink broker. Features were built upon summary statistics of available photometric points, as well as color estimation enabled by symbolic regression. The learning stage includes an active learning loop, used to build an optimized training sample from labels reported in astronomical catalogs. Using this method to classify real alerts from the Zwicky Transient Facility (ZTF), we achieved 98.0% accuracy, 93.8% precision and 88.5% recall. We also describe the modifications necessary to enable processing data from the upcoming Vera C. Rubin Observatory Legacy Survey of Space and Time (LSST), and apply them to the training sample of the Extended LSST Astronomical Time-series Classification Challenge (ELAsTiCC). Results show that our designed feature space enables high performance of traditional machine learning algorithms in this binary classification task.

Submitted 20 November, 2022; originally announced November 2022.

Comments: Accepted for the Machine learning and the Physical Sciences workshop of NeurIPS 2022

arXiv:2210.13048

A Note on Cohen's d From a Partitioned Linear Regression Model

Authors: Jürgen Groß, Annette Möller

Abstract: In this note we introduce a generalized formula for Cohen's $d$ under the presence of additional independent variables, providing a measure for the size of a possible effect concerning the location difference of a variable in two groups. This is done by employing the so-called Frisch-Waugh-Lovell theorem in a partitioned linear regression model. The generalization is motivated by demonstrating the relationship to appropriate $t$ and $F$ statistics. Our discussion is further illustrated by inference from a publicly available data set.

Submitted 24 October, 2022; originally announced October 2022.

MSC Class: 62J20; 62F03; 91C99

arXiv:2210.11580

doi 10.1080/1743727X.2022.2128744

Predicting school transition rates in Austria with classification trees

Authors: Annette Möller, Ann Cathrice George, Jürgen Groß

Abstract: Methods based on machine learning are becoming increasingly popular in many areas as they allow models to be fitted in a highly data-driven fashion, and often show comparable or even increased performance in comparison to classical methods. However, in the area of educational sciences the application of machine learning is still quite uncommon. This work investigates the benefit of using classification trees for analyzing data from educational sciences. An application to data on school transition rates in Austria indicates different aspects of interest in the context of educational sciences: (i) the trees select variables for predicting school transition rates in a data-driven fashion which are well in accordance with existing confirmatory theories from educational sciences, (ii) trees can be employed for performing variable selection for regression models, (iii) the classification performance of trees is comparable to that of binary regression models. These results indicate that trees and possibly other machine learning methods may also be helpful to explore high-dimensional educational data sets, especially where no confirmatory theories have been developed yet.

Submitted 20 October, 2022; originally announced October 2022.

Journal ref: International Journal of Research & Method in Education, 2022

arXiv:2007.09246

Global estimation of unintended pregnancy and abortion using a Bayesian hierarchical random walk model

Authors: Jonathan Marc Bearak, Anna Popinchalk, Bela Ganatra, Ann-Beth Moller, Özge Tunçalp, Cynthia Beavin, Lorraine Kwok, Leontine Alkema

Abstract: Unintended pregnancy and abortion estimates are needed to inform and motivate investment in global health programmes and policies. Variability in the availability and reliability of data poses challenges for producing estimates. We developed a Bayesian model that simultaneously estimates incidence of unintended pregnancy and abortion for 195 countries and territories. Our modelling strategy was informed by the proximate determinants of fertility with (i) incidence of unintended pregnancy defined by the number of women (grouped by marital and contraceptive use status) and their respective pregnancy rates, and (ii) abortion incidence defined by group-specific pregnancies and propensities to have an abortion. Hierarchical random walk models are used to estimate country-group-period-specific pregnancy rates and propensities to abort.

Submitted 17 July, 2020; originally announced July 2020.

arXiv:1909.08578

Estimating maternal mortality using data from national civil registration vital statistics systems: A Bayesian hierarchical bivariate random walk model to estimate sensitivity and specificity of reporting

Authors: Emily Peterson, Doris Chou, Ann-Beth Moller, Alison Gemmill, Lale Say, Leontine Alkema

Abstract: Civil registration vital statistics (CRVS) data are used to produce national estimates of maternal mortality, but are often subject to substantial reporting errors due to misclassification of maternal deaths. The accuracy of CRVS systems can be assessed by comparing CRVS-based counts of maternal and non-maternal deaths to those obtained from specialized studies, which are rigorous assessments of maternal mortality for a given country-period. We developed a Bayesian bivariate random walk model to estimate sensitivity and specificity of the reporting on maternal mortality in CRVS data, and associated CRVS adjustment factors. The model was fitted to a global data set of CRVS and specialized study data. Validation exercises suggest that the model performs well in terms of predicting CRVS-based proportions of maternal deaths for country-periods without specialized studies. The new model is used by the UN Maternal Mortality Inter-Agency Group to account for misclassification errors when estimating maternal mortality using CRVS data.

Submitted 18 September, 2019; originally announced September 2019.

arXiv:1903.06739

doi 10.1002/qj.3667

Probabilistic Temperature Forecasting with a Heteroscedastic Autoregressive Ensemble Postprocessing model

Authors: Annette Möller, Jürgen Groß

Abstract: Weather prediction today is performed with numerical weather prediction (NWP) models. These are deterministic simulation models describing the dynamics of the atmosphere, and evolving the current conditions forward in time to obtain a prediction for future atmospheric states. To account for uncertainty in NWP models it has become common practice to employ ensembles of NWP forecasts. However, NWP ensembles often exhibit forecast biases and dispersion errors, and thus require statistical postprocessing to improve reliability of the ensemble forecasts. This work proposes an extension of a recently developed postprocessing model utilizing autoregressive information present in the forecast error of the raw ensemble members. The original approach is modified to let the variance parameter depend on the ensemble spread, yielding a two-fold heteroscedastic model. Furthermore, an additional high-resolution forecast is included in the postprocessing model, yielding improved predictive performance. Finally, it is outlined how the autoregressive model can be utilized to postprocess ensemble forecasts with higher forecast horizons, without the necessity of making fundamental changes to the original model. We accompany the new methodology by an implementation within the R package ensAR to make our method available for other researchers working in this area.
To illustrate the performance of the heteroscedastic extension of the autoregressive model, and its use for higher forecast horizons, we present a case study for a data set containing 12 years of temperature forecasts and observations over Germany. The case study indicates that the autoregressive model yields particularly strong improvements for forecast horizons beyond 24 hours.

Submitted 15 March, 2019; originally announced March 2019.

arXiv:1811.02255

Vine copula based post-processing of ensemble forecasts for temperature

Authors: Annette Möller, Ludovica Spazzini, Daniel Kraus, Thomas Nagler, Claudia Czado

Abstract: Today, weather forecasting is conducted using numerical weather prediction (NWP) models, consisting of a set of differential equations describing the dynamics of the atmosphere. The output of such NWP models are single deterministic forecasts of future atmospheric states. To assess uncertainty in NWP forecasts, so-called forecast ensembles are utilized. They are generated by employing an NWP model for distinct variants. However, as forecast ensembles are not able to capture the full amount of uncertainty in an NWP model, they often exhibit biases and dispersion errors. Therefore it has become common practice to employ statistical post-processing models which correct for biases and improve calibration. We propose a novel post-processing approach based on D-vine copulas, representing the predictive distribution by its quantiles. These models allow for much more general dependence structures than the state-of-the-art EMOS model and are highly data-adapted. Our D-vine quantile regression approach shows excellent predictive performance in comparative studies of temperature forecasts over Europe with different forecast horizons based on the 52-member ensemble of the European Centre for Medium-Range Weather Forecasting (ECMWF). Specifically for larger forecast horizons the method clearly improves over the benchmark EMOS model.

Submitted 6 November, 2018; originally announced November 2018.

arXiv:1511.03330

A Bayesian approach to the global estimation of maternal mortality

Authors: Leontine Alkema, Sanqian Zhang, Doris Chou, Alison Gemmill, Ann-Beth Moller, Doris Ma Fat, Lale Say, Colin Mathers, Daniel Hogan

Abstract: The maternal mortality ratio (MMR) is defined as the number of maternal deaths in a population per 100,000 live births. Country-specific MMR estimates are published on a regular basis by the United Nations Maternal Mortality Estimation Inter-agency Group (UN MMEIG) to track progress in reducing maternal deaths and to evaluate regional and national performance related to Millennium Development Goal (MDG) 5, which calls for a 75% reduction in the MMR between 1990 and 2015. Until 2014, the UN MMEIG used a multilevel regression model for producing estimates for countries without sufficient data from vital registration systems. While this model worked well in the past to assess MMR levels for countries with limited data, it was deemed unsatisfactory for final MDG 5 reporting for countries where longer time series of observations had become available because by construction, estimated trends in the MMR were covariate-driven only and did not necessarily track data-driven trends. We developed a Bayesian maternal mortality estimation model, which extends upon the UN MMEIG multilevel regression model. The new model assesses data-driven trends through the inclusion of an ARIMA time series model that captures accelerations and decelerations in the rate of change in the MMR. Varying reporting and data quality issues are accounted for in source-specific data models. The revised model provides data-driven estimates of MMR levels and trends and will be used for MDG 5 reporting for all countries.

Submitted 10 November, 2015; originally announced November 2015.
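
Note on the abstract above: the MMR definition and the MDG 5 target translate into simple arithmetic, sketched here. The function names and the death and birth counts are hypothetical, and expressing the target as a constant continuous annual rate of reduction is an assumption of this example rather than something stated in the abstract.

```python
import math


def maternal_mortality_ratio(maternal_deaths, live_births):
    """Maternal deaths per 100,000 live births."""
    return 100_000 * maternal_deaths / live_births


def required_annual_rate_of_reduction(total_reduction, years):
    """Constant continuous annual rate that yields total_reduction over the given years."""
    return -math.log(1.0 - total_reduction) / years


# Hypothetical counts, for illustration only.
mmr = maternal_mortality_ratio(maternal_deaths=300, live_births=150_000)

# MDG 5 target from the abstract: a 75% reduction between 1990 and 2015.
arr = required_annual_rate_of_reduction(0.75, 2015 - 1990)
```

Under these assumptions the target corresponds to an average decline of roughly 5.5% per year, which gives a yardstick against which estimated country trends can be compared.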

arXiv:1508.01397

doi 10.1002/qj.2741

Probabilistic temperature forecasting based on an ensemble AR modification

Authors: Annette Möller, Jürgen Groß

Abstract: To address the uncertainty in outputs of numerical weather prediction (NWP) models, ensembles of forecasts are used. To obtain such an ensemble of forecasts the NWP model is run multiple times, each time with different formulations and/or initial or boundary conditions. To correct for possible biases and dispersion errors in the ensemble, statistical postprocessing models are frequently employed. These statistical models yield full predictive probability distributions for a weather quantity of interest and thus allow for a more accurate assessment of forecast uncertainty. This paper proposes to combine the state-of-the-art Ensemble Model Output Statistics (EMOS) with an ensemble that is adjusted by an AR process fitted to the respective error series by a spread-adjusted linear pool (SLP) in case of temperature forecasts. The basic ensemble modification technique we introduce may be used to simply adjust the ensemble itself as well as to obtain a full predictive distribution for the weather quantity. As demonstrated for temperature forecasts of the European Centre for Medium-Range Weather Forecasts (ECMWF) ensemble, the proposed procedure gives rise to improved results upon the basic (local) EMOS method.

Submitted 6 August, 2015; originally announced August 2015.

Comments: 18 pages, 4 figures, 5 tables

arXiv:1507.05066

Spatially adaptive, Bayesian estimation for probabilistic temperature forecasts

Authors: Annette Möller, Thordis L. Thorarinsdottir, Alex Lenkoski, Tilmann Gneiting

Abstract: Uncertainty in the prediction of future weather is commonly assessed through the use of forecast ensembles that employ a numerical weather prediction model in distinct variants. Statistical postprocessing can correct for biases in the numerical model and improve calibration. We propose a Bayesian version of the standard ensemble model output statistics (EMOS) postprocessing method, in which spatially varying bias coefficients are interpreted as realizations of Gaussian Markov random fields. Our Markovian EMOS (MEMOS) technique utilizes the recently developed stochastic partial differential equation (SPDE) and integrated nested Laplace approximation (INLA) methods for computationally efficient inference. The MEMOS approach shows good predictive performance in a comparative study of 24-hour ahead temperature forecasts over Germany based on the 50-member ensemble of the European Centre for Medium-Range Weather Forecasting (ECMWF).

Submitted 15 June, 2016; v1 submitted 17 July, 2015; originally announced July 2015.

arXiv:1507.03479

doi 10.1007/s00703-016-0467-8

Bivariate ensemble model output statistics approach for joint forecasting of wind speed and temperature

Authors: Sándor Baran, Annette Möller

Abstract: Forecast ensembles are typically employed to account for prediction uncertainties in numerical weather prediction models. However, ensembles often exhibit biases and dispersion errors, thus they require statistical post-processing to improve their predictive performance. Two popular univariate post-processing models are the Bayesian model averaging (BMA) and the ensemble model output statistics (EMOS). In the last few years increased interest has emerged in developing multivariate post-processing models, incorporating dependencies between weather quantities, such as a bivariate distribution for wind vectors or even a more general setting that allows combining any types of weather variables. In line with a recently proposed approach to model temperature and wind speed jointly by a bivariate BMA model, this paper introduces a bivariate EMOS model for these weather quantities based on a truncated normal distribution. The bivariate EMOS model is applied to temperature and wind speed forecasts of the eight-member University of Washington mesoscale ensemble and of the eleven-member ALADIN-HUNEPS ensemble of the Hungarian Meteorological Service and its predictive performance is compared to the performance of the bivariate BMA model and a multivariate Gaussian copula approach, post-processing the margins with univariate EMOS. While the predictive skills of the compared methods are similar, the bivariate EMOS model requires considerably lower computation times than the bivariate BMA method.

Submitted 27 July, 2015; v1 submitted 13 July, 2015; originally announced July 2015.

Comments: 21 pages; 5 figures; 3 tables. arXiv admin note: text overlap with arXiv:1404.3681

Journal ref: Meteorology and Atmospheric Physics 129 (2017), no. 1, 99-112

arXiv:1404.3681 [pdf, ps, other]

doi 10.1002/env.2316

Joint probabilistic forecasting of wind speed and temperature using Bayesian model averaging

Authors: Sándor Baran, Annette Möller

Abstract: Ensembles of forecasts are typically employed to account for the forecast uncertainties inherent in predictions of future weather states. However, biases and dispersion errors often present in forecast ensembles require statistical post-processing. Univariate post-processing models such as Bayesian model averaging (BMA) have been successfully applied for various weather quantities. Nonetheless, BMA and many other standard post-processing procedures are designed for a single weather variable, thus ignoring possible dependencies among weather quantities. In line with recently upcoming research to develop multivariate post-processing procedures, e.g., BMA for bivariate wind vectors, or flexible procedures applicable for multiple weather quantities of different types, a bivariate BMA model for joint calibration of wind speed and temperature forecasts is proposed based on the bivariate truncated normal distribution. It extends the univariate truncated normal BMA model designed for post-processing ensemble forecast of wind speed by adding a normally distributed temperature component with a covariance structure representing the dependency among the two weather quantities. The method is applied to wind speed and temperature forecasts of the eight-member University of Washington mesoscale ensemble and of the eleven-member ALADIN-HUNEPS ensemble of the Hungarian Meteorological Service and its predictive performance is compared to that of the general Gaussian copula method. The results indicate improved calibration of probabilistic and accuracy of point forecasts in comparison to the raw ensemble and the overall performance of this model is able to keep up with that of the Gaussian copula method.

Submitted 14 April, 2014; originally announced April 2014.

Comments: 22 pages, 4 figures. arXiv admin note: substantial text overlap with arXiv:1305.1184

Journal ref: Environmetrics 26 (2015), no. 2, 120-132

arXiv:1202.3956 [pdf, other]

doi 10.1002/qj.2009

Multivariate probabilistic forecasting using Bayesian model averaging and copulas

Authors: Annette Möller, Alex Lenkoski, Thordis L. Thorarinsdottir

Abstract: We propose a method for post-processing an ensemble of multivariate forecasts in order to obtain a joint predictive distribution of weather. Our method utilizes existing univariate post-processing techniques, in this case ensemble Bayesian model averaging (BMA), to obtain estimated marginal distributions. However, implementing these methods individually offers no information regarding the joint distribution. To correct this, we propose the use of a Gaussian copula, which offers a simple procedure for recovering the dependence that is lost in the estimation of the ensemble BMA marginals. Our method is applied to 48-h forecasts of a set of five weather quantities using the 8-member University of Washington mesoscale ensemble. We show that our method recovers many well-understood dependencies between weather quantities and subsequently improves calibration and sharpness over both the raw ensemble and a method which does not incorporate joint distributional information.

Submitted 17 February, 2012; originally announced February 2012.

Comments: 17 pages, 4 figures

Showing 1–20 of 20 results for author: Möller, A