Computation
- [1] arXiv:2408.00206 [pdf, html, other]
Title: Gaussian Processes Sampling with Sparse Grids under Additive Schwarz Preconditioner
Comments: 20 pages, 12 figures
Subjects: Computation (stat.CO); Machine Learning (stat.ML)
Gaussian processes (GPs) are widely used in non-parametric Bayesian modeling and play an important role in various statistical and machine learning applications. In a variety of uncertainty quantification tasks, generating random sample paths of GPs is of interest. As GP sampling requires generating high-dimensional Gaussian random vectors, it is computationally challenging if a direct method, such as the Cholesky decomposition, is used. In this paper, we propose a scalable algorithm for sampling random realizations of the prior and posterior of GP models. The proposed algorithm leverages an inducing-point approximation with sparse grids, as well as additive Schwarz preconditioners, which reduce computational complexity and ensure fast convergence. We demonstrate the efficacy and accuracy of the proposed method through a series of experiments and comparisons with other recent works.
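For orientation, the direct Cholesky approach that the abstract calls computationally challenging can be sketched as follows. This is not the paper's algorithm (no sparse grids, inducing points, or Schwarz preconditioning). The squared-exponential kernel, the `jitter` term, and all function names are assumptions chosen for illustration.

```python
import numpy as np


def rbf_kernel(x, y, lengthscale=0.5, variance=1.0):
    """Squared-exponential covariance between two sets of 1-D inputs (assumed kernel)."""
    sq_dist = (x[:, None] - y[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)


def sample_gp_prior_cholesky(x, n_samples, jitter=1e-6, seed=0):
    """Draw GP prior sample paths at inputs x with the direct Cholesky method."""
    rng = np.random.default_rng(seed)
    # A small diagonal jitter keeps the covariance numerically positive definite.
    cov = rbf_kernel(x, x) + jitter * np.eye(len(x))
    # The factorisation costs O(n^3) in the number of inputs n,
    # which is why the direct method does not scale.
    chol = np.linalg.cholesky(cov)
    z = rng.standard_normal((len(x), n_samples))
    # If z ~ N(0, I), then chol @ z ~ N(0, cov).
    return (chol @ z).T


x = np.linspace(0.0, 1.0, 50)
paths = sample_gp_prior_cholesky(x, n_samples=3)  # one row per sample path
```

The cubic cost of the factorisation is the bottleneck that scalable samplers aim to avoid.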
New submissions for Friday, 2 August 2024 (showing 1 of 1 entries)
- [2] arXiv:2408.00014 (cross-list from cs.DC) [pdf, html, other]
Title: Optimization of Energy Consumption Forecasting in Puno using Parallel Computing and ARIMA Models: An Innovative Approach to Big Data Processing
Comments: In preparation for Journal Submission
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Computation (stat.CO); Machine Learning (stat.ML)
This research presents an innovative use of parallel computing with the ARIMA (AutoRegressive Integrated Moving Average) model to forecast energy consumption in Peru's Puno region. The study conducts a thorough and multifaceted analysis, focusing on the execution speed, prediction accuracy, and scalability of both sequential and parallel implementations. A significant emphasis is placed on efficiently managing large datasets. The findings demonstrate notable improvements in computational efficiency and data processing capabilities through the parallel approach, all while maintaining the accuracy and integrity of predictions. This new method provides a versatile and reliable solution for real-time predictive analysis and enhances energy resource management, which is particularly crucial for developing areas. In addition to highlighting the technical advantages of parallel computing in this field, the study explores its practical impacts on energy planning and sustainable development in regions like Puno.
- [3] arXiv:2408.00409 (cross-list from q-bio.PE) [pdf, html, other]
Title: Within-vector viral dynamics challenges how to model the extrinsic incubation period for major arboviruses: dengue, Zika, and chikungunya
Subjects: Populations and Evolution (q-bio.PE); Quantitative Methods (q-bio.QM); Computation (stat.CO)
Arboviruses represent a significant threat to human, animal, and plant health worldwide. To elucidate transmission, anticipate their spread and efficiently control them, mechanistic modelling has proven its usefulness. However, most models rely on assumptions about how the extrinsic incubation period (EIP) is represented: the intra-vector viral dynamics (IVD), occurring during the EIP, is approximated by a single state. After an average duration, all exposed vectors become infectious. Behind this are hidden two strong hypotheses: (i) EIP is exponentially distributed in the vector population; (ii) viruses successfully cross the infection, dissemination, and transmission barriers in all exposed vectors. To assess these hypotheses, we developed a stochastic compartmental model which represents successive IVD stages, associated with whether or not each of these three barriers is crossed. We calibrated the model using an ABC-SMC (Approximate Bayesian Computation - Sequential Monte Carlo) method with model selection. We systematically searched for literature data on experimental infections of Aedes mosquitoes infected by either dengue, chikungunya, or Zika viruses. We demonstrated the discrepancy between the exponential hypothesis and observed EIP distributions for dengue and Zika viruses and identified more relevant EIP distributions. We also quantified the fraction of infected mosquitoes eventually becoming infectious, highlighting that often only a small fraction crosses the three barriers. This work provides a generic modelling framework applicable to other arboviruses for which similar data are available. Our model can also be coupled to population-scale models to aid future arbovirus control.
- [4] arXiv:2408.00507 (cross-list from stat.AP) [pdf, other]
Title: Spatial Weather, Socio-Economic and Political Risks in Probabilistic Load Forecasting
Subjects: Applications (stat.AP); Computational Engineering, Finance, and Science (cs.CE); General Economics (econ.GN); Risk Management (q-fin.RM); Computation (stat.CO)
Accurate forecasts of the impact of spatial weather and pan-European socio-economic and political risks on hourly electricity demand for the mid-term horizon are crucial for strategic decision-making amidst the inherent uncertainty. Most importantly, these forecasts are essential for the operational management of power plants, ensuring supply security and grid stability, and in guiding energy trading and investment decisions. The primary challenge for this forecasting task lies in disentangling the multifaceted drivers of load, which include national deterministic (daily, weekly, annual, and holiday patterns) and national stochastic weather and autoregressive effects. Additionally, transnational stochastic socio-economic and political effects add further complexity, in particular, due to their non-stationarity. To address this challenge, we present an interpretable probabilistic mid-term forecasting model for the hourly load that captures, besides all deterministic effects, the various uncertainties in load. This model recognizes transnational dependencies across 24 European countries, with multivariate modeled socio-economic and political states and cross-country dependent forecasting. Built from interpretable Generalized Additive Models (GAMs), the model enables an analysis of the transmission of each incorporated effect to the hour-specific load. Our findings highlight the vulnerability of countries reliant on electric heating under extreme weather scenarios. This emphasizes the need for high-resolution forecasting of weather effects on pan-European electricity consumption especially in anticipation of widespread electric heating adoption.
- [5] arXiv:2408.00651 (cross-list from stat.ME) [pdf, html, other]
Title: A Dirichlet stochastic block model for composition-weighted networks
Subjects: Methodology (stat.ME); Computation (stat.CO); Machine Learning (stat.ML)
Network data are observed in various applications where the individual entities of the system interact with or are connected to each other, and often these interactions are defined by their associated strength or importance. Clustering is a common task in network analysis that involves finding groups of nodes displaying similarities in the way they interact with the rest of the network. However, most clustering methods use the strengths of connections between entities in their original form, ignoring the possible differences in the capacities of individual nodes to send or receive edges. This often leads to clustering solutions that are heavily influenced by the nodes' capacities. One way to overcome this is to analyse the strengths of connections in relative rather than absolute terms, expressing each edge weight as a proportion of the sending (or receiving) capacity of the respective node. This, however, induces additional modelling constraints that most existing clustering methods are not designed to handle. In this work we propose a stochastic block model for composition-weighted networks based on direct modelling of compositional weight vectors using a Dirichlet mixture, with the parameters determined by the cluster labels of the sender and the receiver nodes. Inference is implemented via an extension of the classification expectation-maximisation algorithm that uses a working independence assumption, expressing the complete data likelihood of each node of the network as a function of fixed cluster labels of the remaining nodes. A model selection criterion is derived to aid the choice of the number of clusters. The model is validated using simulation studies, and showcased on network data from the Erasmus exchange program and a bike sharing network for the city of London.
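The step of expressing each edge weight as a proportion of the sending node's capacity can be illustrated with a small sketch. It covers only that transformation, not the Dirichlet mixture or the inference procedure. The function name, the toy flow matrix, and the choice to ignore self-loops are assumptions.

```python
import numpy as np


def to_compositional_weights(adjacency):
    """Turn absolute edge weights into proportions of each sender's total outflow."""
    w = np.asarray(adjacency, dtype=float).copy()
    np.fill_diagonal(w, 0.0)  # assumption: self-loops are not part of the composition
    totals = w.sum(axis=1, keepdims=True)
    # Rows with zero outflow stay all-zero instead of triggering a division by zero.
    return np.divide(w, totals, out=np.zeros_like(w), where=totals > 0)


# Hypothetical flows: node 0 sends far more than node 1 in absolute terms.
flows = np.array([[0, 30, 10],
                  [2, 0, 2],
                  [0, 0, 0]])
props = to_compositional_weights(flows)
```

After the transformation each non-empty row sums to one, so a high-capacity sender such as node 0 no longer dominates purely through the size of its weights.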
Cross submissions for Friday, 2 August 2024 (showing 4 of 4 entries)
- [6] arXiv:2110.00314 (replaced) [pdf, html, other]
Title: Confounder importance learning for treatment effect inference
Subjects: Methodology (stat.ME); Applications (stat.AP); Computation (stat.CO)
We address modelling and computational issues for multiple treatment effect inference under many potential confounders. Our main contribution is providing a trade-off between preventing the omission of relevant confounders and avoiding an over-selection of instruments that significantly inflates variance. We propose a novel empirical Bayes framework for Bayesian model averaging that learns from data the extent to which the inclusion of key covariates should be encouraged. Our framework sets a prior that asymptotically matches the true amount of confounding in the data, as measured by a novel confounding coefficient. A key challenge is computational. We develop fast algorithms, using an exact gradient of the marginal likelihood that has linear cost in the number of covariates, and a variational counterpart. Our framework uses widely-used ingredients and largely existing software, and it is implemented within the R package mombf. We illustrate our work with two applications. The first is the association between salary variation and discriminatory factors. The second, which has been debated in previous works, is the association between abortion policies and crime. Our approach provides insights that differ from previous analyses, especially in situations with weaker treatment effects.
- [7] arXiv:2402.06133 (replaced) [pdf, other]
Title: Leveraging Quadratic Polynomials in Python for Advanced Data Analysis
Comments: The datasets can be freely accessed at this https URL. To facilitate ease of use and accessibility, the code was made available through this http URL (this https URL)
Subjects: Methodology (stat.ME); Computation (stat.CO)
This research explores the application of quadratic polynomials in Python for advanced data analysis. The study demonstrates how quadratic models can effectively capture nonlinear relationships in complex datasets by leveraging Python libraries such as NumPy, Matplotlib, scikit-learn, and Pandas. The methodology involves fitting quadratic polynomials to the data using least-squares regression and evaluating the model fit using the coefficient of determination (R-squared). The results highlight the strong performance of the quadratic polynomial fit, as evidenced by high R-squared values, indicating the model's ability to explain a substantial proportion of the data variability. Comparisons with linear and cubic models further underscore the quadratic model's balance between simplicity and precision for many practical applications. The study also acknowledges the limitations of quadratic polynomials and proposes future research directions to enhance their accuracy and efficiency for diverse data analysis tasks. This research bridges the gap between theoretical concepts and practical implementation, providing an accessible Python-based tool for leveraging quadratic polynomials in data analysis.
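The workflow the abstract describes, a least-squares quadratic fit scored by R-squared, might look like the following minimal sketch. It is not the authors' released code. The synthetic data, the noise level, and the function name `fit_quadratic` are assumptions; only standard NumPy calls are used.

```python
import numpy as np


def fit_quadratic(x, y):
    """Least-squares quadratic fit; returns coefficients (highest power first) and R-squared."""
    coeffs = np.polyfit(x, y, deg=2)
    fitted = np.polyval(coeffs, x)
    ss_res = np.sum((y - fitted) ** 2)        # residual sum of squares
    ss_tot = np.sum((y - np.mean(y)) ** 2)    # total sum of squares
    return coeffs, 1.0 - ss_res / ss_tot


# Synthetic data from an assumed quadratic with Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(-3.0, 3.0, 60)
y = 1.5 * x**2 - 2.0 * x + 0.5 + rng.normal(scale=0.3, size=x.size)

coeffs, r2 = fit_quadratic(x, y)
print("coefficients (a, b, c):", coeffs)
print("R-squared:", r2)
```

Changing `deg` to 1 or 3 in `np.polyfit` gives linear and cubic fits for the kind of comparison the abstract mentions.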
- [8] arXiv:2404.12290 (replaced) [pdf, other]
Title: Debiased Distribution Compression
Comments: Published at ICML 2024
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Computation (stat.CO); Methodology (stat.ME)
Modern compression methods can summarize a target distribution $\mathbb{P}$ more succinctly than i.i.d. sampling but require access to a low-bias input sequence like a Markov chain converging quickly to $\mathbb{P}$. We introduce a new suite of compression methods suitable for compression with biased input sequences. Given $n$ points targeting the wrong distribution and quadratic time, Stein kernel thinning (SKT) returns $\sqrt{n}$ equal-weighted points with $\widetilde{O}(n^{-1/2})$ maximum mean discrepancy (MMD) to $\mathbb{P}$. For larger-scale compression tasks, low-rank SKT achieves the same feat in sub-quadratic time using an adaptive low-rank debiasing procedure that may be of independent interest. For downstream tasks that support simplex or constant-preserving weights, Stein recombination and Stein Cholesky achieve even greater parsimony, matching the guarantees of SKT with as few as $\text{poly-log}(n)$ weighted points. Underlying these advances are new guarantees for the quality of simplex-weighted coresets, the spectral decay of kernel matrices, and the covering numbers of Stein kernel Hilbert spaces. In our experiments, our techniques provide succinct and accurate posterior summaries while overcoming biases due to burn-in, approximate Markov chain Monte Carlo, and tempering.
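The abstract measures quality by maximum mean discrepancy (MMD), which a small two-sample sketch can make concrete. It swaps in a plain Gaussian kernel and an empirical reference sample, whereas the paper uses Stein kernels and measures MMD to $\mathbb{P}$ itself. The bandwidth, sample sizes, and function names are assumptions.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of a and b (assumed kernel, not a Stein kernel)."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * bandwidth**2))


def mmd_squared(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    k_xx = gaussian_kernel(x, x, bandwidth).mean()
    k_yy = gaussian_kernel(y, y, bandwidth).mean()
    k_xy = gaussian_kernel(x, y, bandwidth).mean()
    return k_xx + k_yy - 2.0 * k_xy


rng = np.random.default_rng(0)
target = rng.standard_normal((200, 2))          # stand-in for draws from the target
same_dist = rng.standard_normal((200, 2))       # second sample from the same distribution
shifted = rng.standard_normal((200, 2)) + 2.0   # sample from a biased distribution

mmd_same = mmd_squared(target, same_dist)
mmd_shifted = mmd_squared(target, shifted)
```

A sample that targets the wrong distribution gives a visibly larger value than one drawn from the target, which is the gap that debiasing methods aim to close.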