Search | arXiv e-print repository

Stochastic diagonal estimation with adaptive parameter selection

Authors: Zongyuan Han, Wenhao Li, Shengxin Zhu

Abstract: In this paper, we investigate diagonal estimation for large or implicit matrices, aiming to develop a novel and efficient stochastic algorithm that incorporates adaptive parameter selection. We explore the influence of different eigenvalue distributions on diagonal estimation and analyze the necessity of introducing the projection method and adaptive parameter optimization into the stochastic diagonal estimator. Based on this analysis, we derive a lower bound on the number of random query vectors needed to satisfy a given probabilistic error bound, which forms the foundation of our adaptive stochastic diagonal estimation algorithm. Finally, numerical experiments demonstrate the effectiveness of the proposed estimator for various matrix types, showcasing its efficiency and stability compared to other existing stochastic diagonal estimation methods.

Submitted 15 October, 2024; originally announced October 2024.
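
As a rough illustration of what "stochastic diagonal estimation" with random query vectors means, the sketch below implements the classical Rademacher-probe estimator, which averages v ⊙ (A v) over random sign vectors v. This is not the adaptive, projection-based algorithm the abstract describes; the function name `estimate_diagonal`, the test matrix `A`, and the fixed query count are assumptions chosen for illustration.

```python
import random


def estimate_diagonal(matvec, n, num_queries, seed=0):
    """Estimate diag(A) from matrix-vector products only.

    Classical estimator: for a Rademacher vector v (entries +1 or -1),
    E[v * (A v)] equals diag(A) elementwise, so we average over queries.
    """
    rng = random.Random(seed)
    acc = [0.0] * n
    for _ in range(num_queries):
        v = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        av = matvec(v)  # the only access to A that the estimator needs
        for i in range(n):
            acc[i] += v[i] * av[i]
    return [a / num_queries for a in acc]


# Hypothetical small symmetric matrix, accessed only through matvec.
A = [
    [4.0, 1.0, 0.0],
    [1.0, 3.0, 0.5],
    [0.0, 0.5, 2.0],
]


def matvec(v):
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in A]


estimate = estimate_diagonal(matvec, n=3, num_queries=2000)
print(estimate)  # close to [4.0, 3.0, 2.0], up to sampling noise
```

The error of each entry shrinks as the number of queries grows and depends on the size of the off-diagonal entries in that row, which is why a bound on the number of query vectors, as mentioned in the abstract, is a natural quantity to study.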

arXiv:2410.04390 [pdf, ps, other]

Approximate Maximum Likelihood Inference for Acoustic Spatial Capture-Recapture with Unknown Identities, Using Monte Carlo Expectation Maximization

Authors: Yuheng Wang, Juan Ye, Weiye Li, David L. Borchers

Abstract: Acoustic spatial capture-recapture (ASCR) surveys with an array of synchronized acoustic detectors can be an effective way of estimating animal density or call density. However, constructing the capture histories required for ASCR analysis is challenging, as recognizing which detections at different detectors are of which calls is not a trivial task. Because calls from different distances take different times to arrive at detectors, the order in which calls are detected is not necessarily the same as the order in which they are made, and without knowing which detections are of the same call, we do not know how many different calls are detected. We propose a Monte Carlo expectation-maximization (MCEM) estimation method to resolve this unknown call identity problem. To implement the MCEM method in this context, we sample the latent variables from a complete-data likelihood model in the expectation step and use a semi-complete-data likelihood or conditional likelihood in the maximization step. We use a parametric bootstrap to obtain confidence intervals. When we apply our method to a survey of moss frogs, it gives an estimate within 15% of the estimate obtained using data with call capture histories constructed by experts, and unlike this latter estimate, our confidence interval incorporates the uncertainty about call identities. Simulations show it to have a low bias (6%) and coverage probabilities close to the nominal 95% value.

Submitted 6 October, 2024; originally announced October 2024.

Comments: 36 pages, 3 figures, 1 table

arXiv:2409.01248 [pdf, other]

Nonparametric Estimation of Path-specific Effects in Presence of Nonignorable Missing Covariates

Authors: Jiawei Shan, Ting Wang, Wei Li, Chunrong Ai

Abstract: The path-specific effect (PSE) is of primary interest in mediation analysis when multiple intermediate variables between treatment and outcome are observed, as it can isolate the specific effect through each mediator, thus mitigating potential bias arising from other intermediate variables serving as mediator-outcome confounders. However, estimation and inference of PSE become challenging in the presence of nonignorable missing covariates, a situation particularly common in epidemiological research involving sensitive patient information. In this paper, we propose a fully nonparametric methodology to address this challenge. We establish identification for PSE by expressing it as a functional of observed data and demonstrate that the associated nuisance functions can be uniquely determined through sequential optimization problems by leveraging a shadow variable. Then we propose a sieve-based regression imputation approach for estimation. We establish the large-sample theory for the proposed estimator, and introduce a robust and efficient approach to make inference for PSE. The proposed method is applied to the NHANES dataset to investigate the mediation roles of dyslipidemia and obesity in the pathway from Type 2 diabetes mellitus to cardiovascular disease.

Submitted 2 September, 2024; originally announced September 2024.

Comments: 37 pages, 6 figures

arXiv:2408.09722 [pdf, other]

Towards Few-Shot Learning in the Open World: A Review and Beyond

Authors: Hui Xue, Yuexuan An, Yongchun Qin, Wenqian Li, Yixin Wu, Yongjuan Che, Pengfei Fang, Minling Zhang

Abstract: Human intelligence is characterized by our ability to absorb and apply knowledge from the world around us, especially in rapidly acquiring new concepts from minimal examples, underpinned by prior knowledge. Few-shot learning (FSL) aims to mimic this capacity by enabling significant generalizations and transferability. However, traditional FSL frameworks often rely on assumptions of clean, complete, and static data, conditions that are seldom met in real-world environments. Such assumptions falter in the inherently uncertain, incomplete, and dynamic contexts of the open world. This paper presents a comprehensive review of recent advancements designed to adapt FSL for use in open-world settings. We categorize existing methods into three distinct types of open-world few-shot learning: those involving varying instances, varying classes, and varying distributions. Each category is discussed in terms of its specific challenges and methods, as well as its strengths and weaknesses. We standardize experimental settings and metric benchmarks across scenarios, and provide a comparative analysis of the performance of various methods. In conclusion, we outline potential future research directions for this evolving field. It is our hope that this review will catalyze further development of effective solutions to these complex challenges, thereby advancing the field of artificial intelligence.

Submitted 19 August, 2024; originally announced August 2024.

arXiv:2407.18166 [pdf, other]

Identification and multiply robust estimation of causal effects via instrumental variables from an auxiliary heterogeneous population

Authors: Wei Li, Jiapeng Liu, Peng Ding, Zhi Geng

Abstract: Evaluating causal effects in a primary population of interest with unmeasured confounders is challenging. Although instrumental variables (IVs) are widely used to address unmeasured confounding, they may not always be available in the primary population. Fortunately, IVs might have been used in previous observational studies on similar causal problems, and these auxiliary studies can be useful to infer causal effects in the primary population, even if they represent different populations. However, existing methods often assume homogeneity or equality of conditional average treatment effects between the primary and auxiliary populations, which may be limited in practice. This paper aims to remove the homogeneity requirement and establish a novel identifiability result allowing for different conditional average treatment effects across populations. We also construct a multiply robust estimator that remains consistent despite partial misspecifications of the observed data model and achieves local efficiency if all nuisance models are correct. The proposed approach is illustrated through simulation studies. Finally, we apply our approach by leveraging data from lower-income individuals, with cigarette price as a valid IV, to evaluate the causal effect of smoking on physical functional status in a higher-income group where strong IVs are not available.

Submitted 25 July, 2024; originally announced July 2024.

arXiv:2407.11646 [pdf, other]

Discovery and inference of possibly bi-directional causal relationships with invalid instrumental variables

Authors: Wei Li, Rui Duan, Sai Li

Abstract: Learning causal relationships between pairs of complex traits from observational studies is of great interest across various scientific domains. However, most existing methods assume the absence of unmeasured confounding and restrict causal relationships between two traits to be uni-directional, which may be violated in real-world systems. In this paper, we address the challenge of causal discovery and effect inference for two traits while accounting for unmeasured confounding and potential feedback loops. By leveraging possibly invalid instrumental variables, we provide identification conditions for causal parameters in a model that allows for bi-directional relationships, and we also establish identifiability of the causal direction under the introduced conditions. Then we propose a data-driven procedure to detect the causal direction and provide inference results about causal effects along the identified direction. We show that our method consistently recovers the true direction and produces valid confidence intervals for the causal effect. We conduct extensive simulation studies to show that our proposal outperforms existing methods. We finally apply our method to analyze real data sets from UK Biobank.

Submitted 16 July, 2024; originally announced July 2024.

arXiv:2407.00397 [pdf, other]

Markovian Gaussian Process: A Universal State-Space Representation for Stationary Temporal Gaussian Process

Authors: Weihan Li, Yule Wang, Chengrui Li, Anqi Wu

Abstract: Gaussian Processes (GPs) and Linear Dynamical Systems (LDSs) are essential time series and dynamic system modeling tools. GPs can handle complex, nonlinear dynamics but are computationally demanding, while LDSs offer efficient computation but lack the expressive power of GPs. To combine their benefits, we introduce a universal method that allows an LDS to mirror stationary temporal GPs. This state-space representation, known as the Markovian Gaussian Process (Markovian GP), leverages the flexibility of kernel functions while maintaining efficient linear computation. Unlike existing GP-LDS conversion methods, which require separability for most multi-output kernels, our approach works universally for single- and multi-output stationary temporal kernels. We evaluate our method by computing covariance, performing regression tasks, and applying it to a neuroscience application, demonstrating that our method provides an accurate state-space representation for stationary temporal GPs.

Submitted 29 June, 2024; originally announced July 2024.

arXiv:2406.16708 [pdf, other]

CausalFormer: An Interpretable Transformer for Temporal Causal Discovery

Authors: Lingbai Kong, Wengen Li, Hanchen Yang, Yichao Zhang, Jihong Guan, Shuigeng Zhou

Abstract: Temporal causal discovery is a crucial task aimed at uncovering the causal relations within time series data. The latest temporal causal discovery methods usually train deep learning models on prediction tasks to uncover the causality between time series. They capture causal relations by analyzing the parameters of some components of the trained models, e.g., attention weights and convolution weights. However, this is an incomplete mapping process from the model parameters to the causality and fails to investigate the other components, e.g., fully connected layers and activation functions, that are also significant for causal discovery. To facilitate the utilization of whole deep learning models in temporal causal discovery, we propose an interpretable transformer-based causal discovery model termed CausalFormer, which consists of the causality-aware transformer and the decomposition-based causality detector. The causality-aware transformer learns the causal representation of time series data using a prediction task with the designed multi-kernel causal convolution, which aggregates each input time series along the temporal dimension under the temporal priority constraint. Then, the decomposition-based causality detector interprets the global structure of the trained causality-aware transformer with the proposed regression relevance propagation to identify potential causal relations and finally construct the causal graph. Experiments on synthetic, simulated, and real datasets demonstrate the state-of-the-art performance of CausalFormer on discovering temporal causality. Our code is available at https://github.com/lingbai-kong/CausalFormer.

Submitted 24 June, 2024; originally announced June 2024.

arXiv:2406.13936 [pdf, other]

Communication-Efficient Adaptive Batch Size Strategies for Distributed Local Gradient Methods

Authors: Tim Tsz-Kit Lau, Weijian Li, Chenwei Xu, Han Liu, Mladen Kolar

Abstract: Modern deep neural networks often require distributed training with many workers due to their large size. As worker numbers increase, communication overheads become the main bottleneck in data-parallel minibatch stochastic gradient methods with per-iteration gradient synchronization. Local gradient methods like Local SGD reduce communication by only syncing after several local steps. Despite understanding their convergence in i.i.d. and heterogeneous settings and knowing the importance of batch sizes for efficiency and generalization, optimal local batch sizes are difficult to determine. We introduce adaptive batch size strategies for local gradient methods that increase batch sizes adaptively to reduce minibatch gradient variance. We provide convergence guarantees under homogeneous data conditions and support our claims with image classification experiments, demonstrating the effectiveness of our strategies in training and generalization.

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.10554 [pdf, other]

Causal Inference with Outcomes Truncated by Death and Missing Not at Random

Authors: Wei Li, Yuan Liu, Shanshan Luo, Zhi Geng

Abstract: In clinical trials, principal stratification analysis is commonly employed to address the issue of truncation by death, where a subject dies before the outcome can be measured. However, in practice, many survivor outcomes may remain uncollected or be missing not at random, posing a challenge to standard principal stratification analyses. In this paper, we explore the identification, estimation, and bounds of the average treatment effect within a subpopulation of individuals who would potentially survive under both treatment and control conditions. We show that the causal parameter of interest can be identified by introducing a proxy variable that affects the outcome only through the principal strata, while requiring that the treatment variable does not directly affect the missingness mechanism. Subsequently, we propose an approach for estimating causal parameters and derive nonparametric bounds in cases where identification assumptions are violated. We illustrate the performance of the proposed method through simulation studies and a real dataset obtained from a Human Immunodeficiency Virus (HIV) study.

Submitted 2 August, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

arXiv:2406.06767 [pdf]

ULV: A robust statistical method for clustered data, with applications to multisubject, single-cell omics data

Authors: Mingyu Du, Kevin Johnston, Veronica Berrocal, Wei Li, Xiangmin Xu, Zhaoxia Yu

Abstract: Molecular and genomic technological advancements have greatly enhanced our understanding of biological processes by allowing us to quantify key biological variables such as gene expression, protein levels, and microbiome compositions. These breakthroughs have enabled us to achieve increasingly higher levels of resolution in our measurements, exemplified by our ability to comprehensively profile biological information at the single-cell level. However, the analysis of such data faces several critical challenges: limited number of individuals, non-normality, potential dropouts, outliers, and repeated measurements from the same individual. In this article, we propose a novel method, which we call U-statistic based latent variable (ULV). Our proposed method takes advantage of the robustness of rank-based statistics and exploits the statistical efficiency of parametric methods for small sample sizes. It is a computationally feasible framework that addresses all the issues mentioned above simultaneously. An additional advantage of ULV is its flexibility in modeling various types of single-cell data, including both RNA and protein abundance. The usefulness of our method is demonstrated in two studies: a single-cell proteomics study of acute myelogenous leukemia (AML) and a single-cell RNA study of COVID-19 symptoms. In the AML study, ULV successfully identified differentially expressed proteins that would have been missed by the pseudobulk version of the Wilcoxon rank-sum test. In the COVID-19 study, ULV identified genes associated with covariates such as age and gender, and genes that would be missed without adjusting for covariates. The differentially expressed genes identified by our method are less biased toward genes with high expression levels. Furthermore, ULV identified additional gene pathways likely contributing to the mechanisms of COVID-19 severity.

Submitted 10 June, 2024; originally announced June 2024.

arXiv:2406.04201 [pdf, ps, other]

Securing Equal Share: A Principled Approach for Learning Multiplayer Symmetric Games

Authors: Jiawei Ge, Yuanhao Wang, Wenzhe Li, Chi Jin

Abstract: This paper examines multiplayer symmetric constant-sum games with more than two players in a competitive setting, including examples like Mahjong, Poker, and various board and video games. In contrast to two-player zero-sum games, equilibria in multiplayer games are neither unique nor non-exploitable, failing to provide meaningful guarantees when competing against opponents who play different equilibria or non-equilibrium strategies. This gives rise to a series of long-lasting fundamental questions in multiplayer games regarding suitable objectives, solution concepts, and principled algorithms. This paper takes an initial step towards addressing these challenges by focusing on the natural objective of equal share -- securing an expected payoff of C/n in an n-player symmetric game with a total payoff of C. We rigorously identify the theoretical conditions under which achieving an equal share is tractable and design a series of efficient algorithms, inspired by no-regret learning, that provably attain approximate equal share across various settings. Furthermore, we provide complementary lower bounds that justify the sharpness of our theoretical results. Our experimental results highlight worst-case scenarios where meta-algorithms from prior state-of-the-art systems for multiplayer games fail to secure an equal share, while our algorithm succeeds, demonstrating the effectiveness of our approach.

Submitted 2 October, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

arXiv:2405.08699 [pdf]

Weakly-supervised causal discovery based on fuzzy knowledge and complex data complementarity

Authors: Wenrui Li, Wei Zhang, Qinghao Zhang, Xuegong Zhang, Xiaowo Wang

Abstract: Causal discovery based on observational data is important for deciphering the causal mechanism behind complex systems. However, the effectiveness of existing causal discovery methods is limited due to inferior prior knowledge, domain inconsistencies, and the challenges of high-dimensional datasets with small sample sizes. To address this gap, we propose a novel weakly-supervised fuzzy knowledge and data co-driven causal discovery method named KEEL. KEEL adopts a fuzzy causal knowledge schema to encapsulate diverse types of fuzzy knowledge, and forms corresponding weakened constraints. This schema not only lessens the dependency on expertise but also allows various types of limited and error-prone fuzzy knowledge to guide causal discovery. It can enhance the generalization and robustness of causal discovery, especially in high-dimensional and small-sample scenarios. In addition, we integrate the extended linear causal model (ELCM) into KEEL to handle multi-distribution and incomplete data. Extensive experiments with different datasets demonstrate the superiority of KEEL over several state-of-the-art methods in accuracy, robustness, and computational efficiency. For causal discovery in real protein signal transduction processes, KEEL outperforms the benchmark method with limited data. In summary, KEEL tackles causal discovery tasks with higher accuracy while alleviating the requirement for extensive domain expertise.

Submitted 14 May, 2024; originally announced May 2024.

arXiv:2404.16444 [pdf, other]

Automating the Discovery of Partial Differential Equations in Dynamical Systems

Authors: Weizhen Li, Rui Carvalho

Abstract: Identifying partial differential equations (PDEs) from data is crucial for understanding the governing mechanisms of natural phenomena, yet it remains a challenging task. We present an extension to the ARGOS framework, ARGOS-RAL, which leverages sparse regression with the recurrent adaptive lasso to identify PDEs from limited prior knowledge automatically. Our method automates calculating partial derivatives, constructing a candidate library, and estimating a sparse model. We rigorously evaluate the performance of ARGOS-RAL in identifying canonical PDEs under various noise levels and sample sizes, demonstrating its robustness in handling noisy and non-uniformly distributed data. We also test the algorithm's performance on datasets consisting solely of random noise to simulate scenarios with severely compromised data quality. Our results show that ARGOS-RAL effectively and reliably identifies the underlying PDEs from data, outperforming the sequential threshold ridge regression method in most cases. We highlight the potential of combining statistical methods, machine learning, and dynamical systems theory to automatically discover governing equations from collected data, streamlining the scientific modeling process.

Submitted 2 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

Comments: 18 pages, 6 figures, 1 table

arXiv:2404.03830 [pdf, other]

BiSHop: Bi-Directional Cellular Learning for Tabular Data with Generalized Sparse Modern Hopfield Model

Authors: Chenwei Xu, Yu-Chao Huang, Jerry Yao-Chieh Hu, Weijian Li, Ammar Gilani, Hsi-Sheng Goan, Han Liu

Abstract: We introduce the Bi-Directional Sparse Hopfield Network (BiSHop), a novel end-to-end framework for deep tabular learning. BiSHop handles the two major challenges of deep tabular learning: non-rotationally invariant data structure and feature sparsity in tabular data. Our key motivation comes from the recently established connection between associative memory and attention mechanisms. Consequently, BiSHop uses a dual-component approach, sequentially processing data both column-wise and row-wise through two interconnected directional learning modules. Computationally, these modules house layers of generalized sparse modern Hopfield layers, a sparse extension of the modern Hopfield model with adaptable sparsity. Methodologically, BiSHop facilitates multi-scale representation learning, capturing both intra-feature and inter-feature interactions, with adaptive sparsity at each scale. Empirically, through experiments on diverse real-world datasets, we demonstrate that BiSHop surpasses current SOTA methods with significantly fewer HPO runs, marking it a robust solution for deep tabular learning.

Submitted 12 July, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

Comments: 31 pages; Code available at https://github.com/MAGICS-LAB/BiSHop

arXiv:2404.03828 [pdf, other]

Outlier-Efficient Hopfield Layers for Large Transformer-Based Models

Authors: Jerry Yao-Chieh Hu, Pei-Hsuan Chang, Robin Luo, Hong-Yu Chen, Weijian Li, Wei-Po Wang, Han Liu

Abstract: We introduce an Outlier-Efficient Modern Hopfield Model (termed OutEffHop) and use it to address the outlier inefficiency problem of training gigantic transformer-based models. Our main contribution is a novel associative memory model facilitating outlier-efficient associative memory retrievals. Interestingly, this memory model manifests a model-based interpretation of an outlier-efficient attention mechanism (Softmax_1): it is an approximation of the memory retrieval process of OutEffHop. Methodologically, this allows us to introduce novel outlier-efficient Hopfield layers as powerful alternatives to traditional attention mechanisms, with superior post-quantization performance. Theoretically, the Outlier-Efficient Modern Hopfield Model retains and improves the desirable properties of standard modern Hopfield models, including fixed point convergence and exponential storage capacity. Empirically, we demonstrate the efficacy of the proposed model across large-scale transformer-based and Hopfield-based models (including BERT, OPT, ViT, and STanHop-Net), benchmarking against state-of-the-art methods like Clipped_Softmax and Gated_Attention. Notably, OutEffHop achieves an average reduction of 22+% in average kurtosis and 26+% in the maximum infinity norm of model outputs across four models. Code is available on GitHub at https://github.com/MAGICS-LAB/OutEffHop; models are on the Hugging Face Hub at https://huggingface.co/collections/magicslabnu/outeffhop-6610fcede8d2cda23009a98f; future updates are on arXiv at https://arxiv.org/abs/2404.03828.

Submitted 26 June, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

Comments: Accepted at ICML 2024; v2 updated to camera-ready version; Code available at https://github.com/MAGICS-LAB/OutEffHop; Models are on Hugging Face: https://huggingface.co/collections/magicslabnu/outeffhop-6610fcede8d2cda23009a98f

arXiv:2404.02313 [pdf, other]

Optimal combination of composite likelihoods using approximate Bayesian computation with application to state-space models

Authors: Wentao Li, Rosabeth White, Dennis Prangle

Abstract: Composite likelihood provides approximate inference when the full likelihood is intractable and sub-likelihood functions of marginal events can be evaluated relatively easily. It has been successfully applied for many complex models. However, its wider application is limited by two issues. First, weight selection of marginal likelihood can have a significant impact on the information efficiency and is currently an open question. Second, calibrated Bayesian inference with composite likelihood requires curvature adjustment which is difficult for dependent data. This work shows that approximate Bayesian computation (ABC) can properly address these two issues by using multiple composite score functions as summary statistics. First, the summary-based posterior distribution gives the optimal Godambe information among a wide class of estimators defined by linear combinations of estimating functions. Second, to make ABC computationally feasible for models where marginal likelihoods have no closed form, a novel approach is proposed to estimate all simulated marginal scores using a Monte Carlo sample with size N. Sufficient conditions are given for the additional noise to be negligible with N fixed as the data size n goes to infinity, and the computational cost is O(n). Third, asymptotic properties of ABC with summary statistics having heterogeneous convergence rates are derived, and an adaptive scheme to choose the component composite scores is proposed.
Numerical studies show that the new method significantly outperforms the existing Bayesian composite likelihood methods, and the efficiency of adaptively combined composite scores well approximates the efficiency of particle MCMC using the full likelihood.

Submitted 4 September, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

Comments: 56 pages, 7 figures

arXiv:2403.14593 [pdf, other]

Rethinking Adversarial Inverse Reinforcement Learning: Policy Imitation, Transferable Reward Recovery and Algebraic Equilibrium Proof

Authors: Yangchun Zhang, Qiang Liu, Weiming Li, Yirui Zhou

Abstract: Adversarial inverse reinforcement learning (AIRL) stands as a cornerstone approach in imitation learning, yet it faces criticisms from prior studies. In this paper, we rethink AIRL and respond to these criticisms. Criticism 1 lies in Inadequate Policy Imitation. We show that substituting the built-in algorithm with soft actor-critic (SAC) during policy updating (which requires multiple iterations) significantly enhances the efficiency of policy imitation. Criticism 2 lies in Limited Performance in Transferable Reward Recovery Despite SAC Integration. While we find that SAC indeed exhibits a significant improvement in policy imitation, it introduces drawbacks to transferable reward recovery. We prove that the SAC algorithm itself cannot comprehensively disentangle the reward function during the AIRL training process, and propose a hybrid framework, PPO-AIRL + SAC, for a satisfactory transfer effect. Criticism 3 lies in Unsatisfactory Proof from the Perspective of Potential Equilibrium. We reanalyze it from an algebraic theory perspective.

Submitted 14 May, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

arXiv:2402.14438 [pdf, ps, other]

Efficiency-improved doubly robust estimation with non-confounding predictive covariates

Authors: Shanshan Luo, Mengchen Shi, Wei Li, Xueli Wang, Zhi Geng

Abstract: In observational studies, covariates with substantial missing data are often omitted, despite their strong predictive capabilities. These excluded covariates are generally believed not to simultaneously affect both treatment and outcome, indicating that they are not genuine confounders and do not impact the identification of the average treatment effect (ATE). In this paper, we introduce an alternative doubly robust (DR) estimator that fully leverages non-confounding predictive covariates to enhance efficiency, while also allowing missing values in such covariates. Beyond the double robustness property, our proposed estimator is designed to be more efficient than the standard DR estimator. Specifically, when the propensity score model is correctly specified, it achieves the smallest asymptotic variance among the class of DR estimators, and brings additional efficiency gains by further integrating predictive covariates. Simulation studies demonstrate the notable performance of the proposed estimator over current popular methods. An illustrative example is provided to assess the effectiveness of right heart catheterization (RHC) for critically ill patients.

Submitted 22 February, 2024; originally announced February 2024.

arXiv:2402.12825 [pdf, other]

On scalable ARMA models

Authors: Yuchang Lin, Wenyu Li, Qianqian Zhu, Guodong Li

Abstract: This paper considers both the least squares and quasi-maximum likelihood estimation for the recently proposed scalable ARMA model, a parametric infinite-order vector AR model, and establishes the asymptotic normality of both estimators. It makes feasible the inference on this computationally efficient model, especially for economic and financial time series. An efficient block coordinate descent algorithm is further introduced to search for estimates, and a Bayesian information criterion with selection consistency is suggested for model selection. Simulation experiments are conducted to illustrate their finite sample performance, and a real application on six macroeconomic indicators illustrates the usefulness of the proposed methodology.

Submitted 27 June, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

Comments: 67 pages, 3 figures, 7 tables

MSC Class: 62M10; 62F12

arXiv:2402.11948 [pdf]

Mini-Hes: A Parallelizable Second-order Latent Factor Analysis Model

Authors: Jialiang Wang, Weiling Li, Yurong Zhong, Xin Luo

Abstract: Interactions among a large number of entities are naturally high-dimensional and incomplete (HDI) in many big data related tasks. Behavioral characteristics of users are hidden in these interactions; hence, effective representation of the HDI data is a fundamental task for understanding user behaviors. The latent factor analysis (LFA) model has proven to be effective in representing HDI data. The performance of an LFA model relies heavily on its training process, which is a non-convex optimization. It has been proven that incorporating local curvature and preprocessing gradients during its training process can lead to superior performance compared to LFA models built with first-order family methods. However, with the escalation of data volume, the feasibility of second-order algorithms encounters challenges. To address this pivotal issue, this paper proposes a mini-block diagonal hessian-free (Mini-Hes) optimization for building an LFA model. It leverages the dominant diagonal blocks in the generalized Gauss-Newton matrix based on the analysis of the Hessian matrix of the LFA model and serves as an intermediary strategy bridging the gap between first-order and second-order optimization methods. Experimental results indicate that, with Mini-Hes, the LFA model outperforms several state-of-the-art models in addressing the missing data estimation task on multiple real HDI datasets from recommender systems. (The source code of Mini-Hes is available at https://github.com/Goallow/Mini-Hes)

Submitted 19 February, 2024; originally announced February 2024.

Comments: 6 pages

arXiv:2402.06162 [pdf, other]

Wasserstein proximal operators describe score-based generative models and resolve memorization

Authors: Benjamin J. Zhang, Siting Liu, Wuchen Li, Markos A. Katsoulakis, Stanley J. Osher

Abstract: We focus on the fundamental mathematical structure of score-based generative models (SGMs). We first formulate SGMs in terms of the Wasserstein proximal operator (WPO) and demonstrate that, via mean-field games (MFGs), the WPO formulation reveals mathematical structure that describes the inductive bias of diffusion and score-based models. In particular, MFGs yield optimality conditions in the form of a pair of coupled partial differential equations: a forward-controlled Fokker-Planck (FP) equation, and a backward Hamilton-Jacobi-Bellman (HJB) equation. Via a Cole-Hopf transformation and taking advantage of the fact that the cross-entropy can be related to a linear functional of the density, we show that the HJB equation is an uncontrolled FP equation. Second, with the mathematical structure at hand, we present an interpretable kernel-based model for the score function which dramatically improves the performance of SGMs in terms of training samples and training time. In addition, the WPO-informed kernel model is explicitly constructed to avoid the recently studied memorization effects of score-based generative models. The mathematical form of the new kernel-based models in combination with the use of the terminal condition of the MFG reveals new explanations for the manifold learning and generalization properties of SGMs, and provides a resolution to their memorization effects. Finally, our mathematically informed, interpretable kernel-based model suggests new scalable bespoke neural network architectures for high-dimensional applications.

Submitted 8 February, 2024; originally announced February 2024.

arXiv:2402.05384 [pdf, other]

Efficient Nonparametric Inference of Causal Mediation Effects with Nonignorable Missing Confounders

Authors: Jiawei Shan, Wei Li, Chunrong Ai

Abstract: We consider causal mediation analysis with confounders subject to nonignorable missingness in a nonparametric framework. Our approach relies on shadow variables that are associated with the missing confounders but independent of the missingness mechanism. The mediation effect of interest is shown to be a weighted average of an iterated conditional expectation, which motivates our Sieve-based Iterative Outward (SIO) estimator. We derive the rate of convergence and asymptotic normality of the SIO estimator, which do not suffer from the ill-posed inverse problem. Essentially, we show that the asymptotic normality is not affected by the slow convergence rate of nonparametric estimators of nuisance functions. Moreover, we demonstrate that our estimator is locally efficient and attains the semiparametric efficiency bound under certain conditions. We accurately depict the efficiency loss attributable to missingness and identify scenarios in which efficiency loss is absent. We also propose a stable and easy-to-implement approach to estimate asymptotic variance and construct confidence intervals for the mediation effects. Finally, we evaluate the finite-sample performance of our proposed approach through simulation studies, and apply it to the CFPS data to show its practical applicability.

Submitted 7 February, 2024; originally announced February 2024.

arXiv:2402.01036 [pdf, other]

Fisher information dissipation for time inhomogeneous stochastic differential equations

Authors: Qi Feng, Xinzhe Zuo, Wuchen Li

Abstract: We provide a Lyapunov convergence analysis for time-inhomogeneous variable coefficient stochastic differential equations (SDEs). Three typical examples include overdamped, irreversible drift, and underdamped Langevin dynamics. We first formulate the probability transition equation of Langevin dynamics as a modified gradient flow of the Kullback-Leibler divergence in the probability space with respect to time-dependent optimal transport metrics. This formulation contains both gradient and non-gradient directions depending on a class of time-dependent target distributions. We then select a time-dependent relative Fisher information functional as a Lyapunov functional. We develop a time-dependent Hessian matrix condition, which guarantees the convergence of the probability density function of the SDE. We verify the proposed conditions for several time-inhomogeneous Langevin dynamics. For the overdamped Langevin dynamics, we prove the $O(t^{-1/2})$ convergence in $L^1$ distance for the simulated annealing dynamics with a strongly convex potential function. For the irreversible drift Langevin dynamics, we prove an improved convergence towards the target distribution in an asymptotic regime. We also verify the convergence condition for the underdamped Langevin dynamics. Numerical examples demonstrate the convergence results for the time-dependent Langevin dynamics.

Submitted 1 February, 2024; originally announced February 2024.

Comments: 9 figures, 36 pages

arXiv:2402.00597 [pdf, other]

An efficient multivariate volatility model for many assets

Authors: Wenyu Li, Yuchang Lin, Qianqian Zhu, Guodong Li

Abstract: This paper develops a flexible and computationally efficient multivariate volatility model, which allows for dynamic conditional correlations and volatility spillover effects among financial assets. The new model has desirable properties such as identifiability and computational tractability for many assets. A sufficient condition of the strict stationarity is derived for the new process. Two quasi-maximum likelihood estimation methods are proposed for the new model with and without low-rank constraints on the coefficient matrices respectively, and the asymptotic properties for both estimators are established. Moreover, a Bayesian information criterion with selection consistency is developed for order selection, and the testing for volatility spillover effects is carefully discussed. The finite sample performance of the proposed methods is evaluated in simulation studies for small and moderate dimensions. The usefulness of the new model and its inference tools is illustrated by two empirical examples for 5 stock markets and 17 industry portfolios, respectively.

Submitted 1 February, 2024; originally announced February 2024.

arXiv:2401.11070 [pdf, other]

Efficient Data Reduction Strategies for Big Data and High-Dimensional LASSO Regressions

Authors: Xin Wang, Min Yang, William Li

Abstract: The IBOSS approach proposed by Wang et al. (2019) selects the most informative subset of n points. It assumes that the ordinary least squares method is used and requires that the number of variables, p, is not large. However, in many practical problems, p is very large and penalty-based model fitting methods such as LASSO are used. We study big data problems in which both n and p are large. In the first part, we focus on reduction in data points. We develop theoretical results showing that the IBOSS type of approach can be applicable to penalty-based regressions such as LASSO. In the second part, we consider the situations where p is extremely large. We propose a two-step approach that involves first reducing the number of variables and then reducing the number of data points. Two separate algorithms are developed, whose performances are studied through extensive simulation studies. Compared to existing methods, including the well-known split-and-conquer approach, the proposed methods enjoy advantages in terms of estimation accuracy, prediction accuracy, and computation time.

Submitted 19 January, 2024; originally announced January 2024.

arXiv:2401.00775 [pdf, other]

doi 10.1146/annurev-statistics-040522-022138

Recent Advances in Text Analysis

Authors: Zheng Tracy Ke, Pengsheng Ji, Jiashun Jin, Wanshan Li

Abstract: Text analysis is an interesting research area in data science and has various applications, such as in artificial intelligence, biomedical research, and engineering. We review popular methods for text analysis, ranging from topic modeling to the recent neural language models. In particular, we review Topic-SCORE, a statistical approach to topic modeling, and discuss how to use it to analyze MADStat - a dataset on statistical publications that we collected and cleaned. The application of Topic-SCORE and other methods on MADStat leads to interesting findings. For example, $11$ representative topics in statistics are identified. For each journal, the evolution of topic weights over time can be visualized, and these results are used to analyze the trends in statistical research. In particular, we propose a new statistical model for ranking the citation impacts of $11$ topics, and we also build a cross-topic citation graph to illustrate how research results on different topics spread to one another. The results on MADStat provide a data-driven picture of the statistical research in $1975$--$2015$, from a text analysis perspective.

Submitted 7 February, 2024; v1 submitted 1 January, 2024; originally announced January 2024.

Journal ref: Annual Review of Statistics and Its Application 2024 11:1

arXiv:2312.17346 [pdf, other]

STanHop: Sparse Tandem Hopfield Model for Memory-Enhanced Time Series Prediction

Authors: Dennis Wu, Jerry Yao-Chieh Hu, Weijian Li, Bo-Yu Chen, Han Liu

Abstract: We present STanHop-Net (Sparse Tandem Hopfield Network) for multivariate time series prediction with memory-enhanced capabilities. At the heart of our approach is STanHop, a novel Hopfield-based neural network block, which sparsely learns and stores both temporal and cross-series representations in a data-dependent fashion. In essence, STanHop sequentially learns temporal representation and cross-series representation using two tandem sparse Hopfield layers. In addition, STanHop incorporates two additional external memory modules: a Plug-and-Play module and a Tune-and-Play module for train-less and task-aware memory-enhancements, respectively. They allow STanHop-Net to swiftly respond to certain sudden events. Methodologically, we construct the STanHop-Net by stacking STanHop blocks in a hierarchical fashion, enabling multi-resolution feature extraction with resolution-specific sparsity. Theoretically, we introduce a sparse extension of the modern Hopfield model (Generalized Sparse Modern Hopfield Model) and show that it endows a tighter memory retrieval error compared to the dense counterpart without sacrificing memory capacity. Empirically, we validate the efficacy of our framework on both synthetic and real-world settings.

Submitted 28 December, 2023; originally announced December 2023.

arXiv:2312.01411 [pdf, other]

Bayesian inference on Cox regression models using catalytic prior distributions

Authors: Weihao Li, Dongming Huang

Abstract: The Cox proportional hazards model (Cox model) is a popular model for survival data analysis. When the sample size is small relative to the dimension of the model, the standard maximum partial likelihood inference is often problematic. In this work, we propose the Cox catalytic prior distributions for Bayesian inference on Cox models, which is an extension of a general class of prior distributions originally designed for stabilizing complex parametric models. The Cox catalytic prior is formulated as a weighted likelihood of the regression coefficients based on synthetic data and a surrogate baseline hazard constant. This surrogate hazard can be either provided by the user or estimated from the data, and the synthetic data are generated from the predictive distribution of a fitted simpler model. For point estimation, we derive an approximation of the marginal posterior mode, which can be computed conveniently as a regularized log partial likelihood estimator. We prove that our prior distribution is proper and the resulting estimator is consistent under mild conditions. In simulation studies, our proposed method outperforms standard maximum partial likelihood inference and is on par with existing shrinkage methods. We further illustrate the application of our method to a real dataset.

Submitted 3 December, 2023; originally announced December 2023.

Comments: 34 pages

arXiv:2311.16793 [pdf, other]

Mediation pathway selection with unmeasured mediator-outcome confounding

Authors: Kang Shuai, Lan Liu, Yangbo He, Wei Li

Abstract: Causal mediation analysis aims to investigate how an intermediary factor, called a mediator, regulates the causal effect of a treatment on an outcome. With the increasing availability of measurements on a large number of potential mediators, methods for selecting important mediators have been proposed. However, these methods often assume the absence of unmeasured mediator-outcome confounding. We allow for such confounding in a linear structural equation model for the outcome and further propose an approach to tackle the mediator selection issue. To achieve this, we firstly identify causal parameters by constructing a pseudo proxy variable for unmeasured confounding. Leveraging this proxy variable, we propose a partially penalized method to identify mediators affecting the outcome. The resultant estimates are consistent, and the estimates of nonzero parameters are asymptotically normal. Motivated by these results, we introduce a two-step procedure to consistently select active mediation pathways, eliminating the need to test composite null hypotheses for each mediator that are commonly required by traditional methods. Simulation studies demonstrate the superior performance of our approach compared to existing methods. Finally, we apply our approach to genomic data, identifying gene expressions that potentially mediate the impact of a genetic variant on mouse obesity.

Submitted 28 November, 2023; originally announced November 2023.

Comments: 35 pages

arXiv:2311.02516 [pdf, other]

Forward $χ^2$ Divergence Based Variational Importance Sampling

Authors: Chengrui Li, Yule Wang, Weihan Li, Anqi Wu

Abstract: Maximizing the log-likelihood is a crucial aspect of learning latent variable models, and variational inference (VI) stands as the commonly adopted method. However, VI can encounter challenges in achieving a high log-likelihood when dealing with complicated posterior distributions. In response to this limitation, we introduce a novel variational importance sampling (VIS) approach that directly estimates and maximizes the log-likelihood. VIS leverages the optimal proposal distribution, achieved by minimizing the forward $χ^2$ divergence, to enhance log-likelihood estimation. We apply VIS to various popular latent variable models, including mixture models, variational auto-encoders, and partially observable generalized linear models. Results demonstrate that our approach consistently outperforms state-of-the-art baselines, both in terms of log-likelihood and model parameter estimation.

Submitted 2 February, 2024; v1 submitted 4 November, 2023; originally announced November 2023.

arXiv:2311.00878 [pdf, other]

Backward Joint Model for the Dynamic Prediction of Both Competing Risk and Longitudinal Outcomes

Authors: Wenhao Li, Brad C. Astor, Wei Yang, Tom H. Greene, Liang Li

Abstract: Joint modeling is a useful approach to dynamic prediction of clinical outcomes using longitudinally measured predictors. When the outcomes are competing risk events, fitting the conventional shared random effects joint model often involves intensive computation, especially when multiple longitudinal biomarkers are used as predictors, as is often desired in prediction problems. This paper proposes a new joint model for the dynamic prediction of competing risk outcomes. The model factorizes the likelihood into the distribution of the competing risks data and the distribution of longitudinal data given the competing risks data. It extends the basic idea of the recently published backward joint model (BJM) to the competing risk setting, and we call this model crBJM. This model also enables the prediction of future longitudinal data trajectories conditional on being at risk at a future time, a practically important problem that has not been studied in the statistical literature. The model fitting with the EM algorithm is efficient, stable and computationally fast, with a one-dimensional integral in the E-step and convex optimization for most parameters in the M-step, regardless of the number of longitudinal predictors. The model also comes with a consistent albeit less efficient estimation method that can be quickly implemented with standard software, ideal for model building and diagnostics. We study the numerical properties of the proposed method using simulations and illustrate its use in a chronic kidney disease study.

Submitted 30 August, 2024; v1 submitted 1 November, 2023; originally announced November 2023.

arXiv:2310.16323 [pdf, other]

Personalized Federated $\mathcal{X}$-armed Bandit

Authors: Wenjie Li, Qifan Song, Jean Honorio

Abstract: In this work, we study the personalized federated $\mathcal{X}$-armed bandit problem, where the heterogeneous local objectives of the clients are optimized simultaneously in the federated learning paradigm. We propose the \texttt{PF-PNE} algorithm with a unique double elimination strategy, which safely eliminates the non-optimal regions while encouraging federated collaboration through biased but effective evaluations of the local objectives. The proposed \texttt{PF-PNE} algorithm is able to optimize local objectives with arbitrary levels of heterogeneity, and its limited communications protect the confidentiality of the client-wise reward data. Our theoretical analysis shows the benefit of the proposed algorithm over single-client algorithms. Experimentally, \texttt{PF-PNE} outperforms multiple baselines on both synthetic and real-life datasets.

Submitted 24 October, 2023; originally announced October 2023.

arXiv:2310.07990 [pdf]

Multi-View Variational Autoencoder for Missing Value Imputation in Untargeted Metabolomics

Authors: Chen Zhao, Kuan-Jui Su, Chong Wu, Xuewei Cao, Qiuying Sha, Wu Li, Zhe Luo, Tian Qin, Chuan Qiu, Lan Juan Zhao, Anqi Liu, Lindong Jiang, Xiao Zhang, Hui Shen, Weihua Zhou, Hong-Wen Deng

Abstract: Background: Missing data is a common challenge in mass spectrometry-based metabolomics, which can lead to biased and incomplete analyses. The integration of whole-genome sequencing (WGS) data with metabolomics data has emerged as a promising approach to enhance the accuracy of data imputation in metabolomics studies. Method: In this study, we propose a novel method that leverages the information from WGS data and reference metabolites to impute unknown metabolites. Our approach utilizes a multi-view variational autoencoder to jointly model the burden score, polygenic risk score (PGS), and linkage disequilibrium (LD) pruned single nucleotide polymorphisms (SNPs) for feature extraction and missing metabolomics data imputation. By learning the latent representations of both omics data, our method can effectively impute missing metabolomics values based on genomic information. Results: We evaluate the performance of our method on empirical metabolomics datasets with missing values and demonstrate its superiority compared to conventional imputation techniques. Using 35 template metabolites derived burden scores, PGS and LD-pruned SNPs, the proposed method achieved R^2-scores > 0.01 for 71.55% of metabolites. Conclusion: The integration of WGS data in metabolomics imputation not only improves data completeness but also enhances downstream analyses, paving the way for more comprehensive and accurate investigations of metabolic pathways and disease associations. Our findings offer valuable insights into the potential benefits of utilizing WGS data for metabolomics data imputation and underscore the importance of leveraging multi-modal data integration in precision medicine research.

Submitted 12 March, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

Comments: 19 pages, 3 figures

arXiv:2310.05495 [pdf, other]

On the Convergence of Federated Averaging under Partial Participation for Over-parameterized Neural Networks

Authors: Xin Liu, Wei Li, Dazhi Zhan, Yu Pan, Xin Ma, Yu Ding, Zhisong Pan

Abstract: Federated learning (FL) is a widely employed distributed paradigm for collaboratively training machine learning models from multiple clients without sharing local data. In practice, FL encounters challenges in dealing with partial client participation due to the limited bandwidth, intermittent connection and strict synchronized delay. Simultaneously, there exist few theoretical convergence guarantees in this practical setting, especially when associated with the non-convex optimization of neural networks. To bridge this gap, we focus on the training problem of the federated averaging (FedAvg) method for two canonical models: a deep linear network and a two-layer ReLU network. Under the over-parameterized assumption, we provably show that FedAvg converges to a global minimum at a linear rate $\mathcal{O}\left((1-\frac{\min_{i \in [t]}|S_i|}{N^2})^t\right)$ after $t$ iterations, where $N$ is the number of clients and $|S_i|$ is the number of participating clients in the $i$-th iteration. Experimental evaluations confirm our theoretical results.

Submitted 2 February, 2024; v1 submitted 9 October, 2023; originally announced October 2023.

arXiv:2309.12997 [pdf, other]

Scaling Limits of the Wasserstein information matrix on Gaussian Mixture Models

Authors: Wuchen Li, Jiaxi Zhao

Abstract: We consider the Wasserstein metric on the Gaussian mixture models (GMMs), which is defined as the pullback of the full Wasserstein metric on the space of smooth probability distributions with finite second moment. We derive a class of Wasserstein metrics on probability simplices over one-dimensional bounded homogeneous lattices via a scaling limit of the Wasserstein metric on GMMs. Specifically, for a sequence of GMMs whose variances tend to zero, we prove that the limit of the Wasserstein metric exists after certain renormalization. Generalizations of this metric in general GMMs are established, including inhomogeneous lattice models whose lattice gaps are not the same, extended GMMs whose mean parameters of Gaussian components can also change, and the second-order metric containing high-order information of the scaling limit. We further study the Wasserstein gradient flows on GMMs for three typical functionals: potential, internal, and interaction energies. Numerical examples demonstrate the effectiveness of the proposed GMM models for approximating Wasserstein gradient flows.

Submitted 22 September, 2023; originally announced September 2023.

Comments: 32 pages, 3 figures

MSC Class: 62B11; 41A60

arXiv:2309.08199 [pdf, ps, other]

Multiply robust estimation of causal effects using linked data

Authors: Shanshan Luo, Yechi Zhang, Wei Li

Abstract: Unmeasured confounding presents a common challenge in observational studies, potentially making standard causal parameters unidentifiable without additional assumptions. Given the increasing availability of diverse data sources, exploiting data linkage offers a potential solution to mitigate unmeasured confounding within a primary study of interest. However, this approach often introduces selection bias, as data linkage is feasible only for a subset of the study population. To address this concern, we explore three nonparametric identification strategies under the assumption that a unit's inclusion in the linked cohort is determined solely by the observed confounders, while acknowledging that the ignorability assumption may depend on some partially unobserved covariates. The existence of multiple identification strategies motivates the development of estimators that effectively capture distinct components of the observed data distribution. Appropriately combining these estimators yields triply robust estimators for the average treatment effect. These estimators remain consistent if at least one of the three distinct parts of the observed data law is correct. Moreover, they are locally efficient if all the models are correctly specified. We evaluate the proposed estimators using simulation studies and real data analysis.

Submitted 15 September, 2023; originally announced September 2023.

arXiv:2309.02087 [pdf, ps, other]

Identifying Causal Effects Using Instrumental Variables from the Auxiliary Population

Authors: Kang Shuai, Shanshan Luo, Wei Li, Yangbo He

Abstract: Instrumental variable approaches have gained popularity for estimating causal effects in the presence of unmeasured confounding. However, the availability of instrumental variables in the primary population is often challenged due to stringent and untestable assumptions. This paper presents a novel method to identify and estimate causal effects in the primary population by utilizing instrumental variables from the auxiliary population, incorporating a structural equation model, even in scenarios with nonlinear treatment effects. Our approach involves using two datasets: one from the primary population with joint observations of treatment and outcome, and another from the auxiliary population providing information about the instrument and treatment. Our strategy differs from most existing methods by not depending on the simultaneous measurements of instrument and outcome. The central idea for identifying causal effects is to establish a valid substitute through the auxiliary population, addressing unmeasured confounding. This is achieved by developing a control function and projecting it onto the function space spanned by the treatment variable. We then propose a three-step estimator for estimating causal effects and derive its asymptotic results. We illustrate the proposed estimator through simulation studies, and the results demonstrate favorable performance. We also conduct a real data analysis to evaluate the causal effect between vitamin D status and BMI.

Submitted 5 September, 2023; originally announced September 2023.

Comments: 19 pages

arXiv:2308.14945 [pdf, other]

Noise-Free Sampling Algorithms via Regularized Wasserstein Proximals

Authors: Hong Ye Tan, Stanley Osher, Wuchen Li

Abstract: We consider the problem of sampling from a distribution governed by a potential function. This work proposes an explicit score-based MCMC method that is deterministic, resulting in a deterministic evolution for particles rather than a stochastic differential equation evolution. The score term is given in closed form by a regularized Wasserstein proximal, using a kernel convolution that is approximated by sampling. We demonstrate fast convergence on various problems and show improved dimensional dependence of mixing time bounds for the case of Gaussian distributions compared to the unadjusted Langevin algorithm (ULA) and the Metropolis-adjusted Langevin algorithm (MALA). We additionally derive closed-form expressions for the distributions at each iterate for quadratic potential functions, characterizing the variance reduction. Empirical results demonstrate that the particles behave in an organized manner, lying on level set contours of the potential. Moreover, the posterior mean estimator of the proposed method is shown to be closer to the maximum a-posteriori estimator compared to ULA and MALA in the context of Bayesian logistic regression. Additional examples demonstrate competitive performance for Bayesian neural network training.

Submitted 2 October, 2023; v1 submitted 28 August, 2023; originally announced August 2023.

MSC Class: 65C05; 62G07

arXiv:2308.10505 [pdf, other]

A Clustering Algorithm to Organize Satellite Hotspot Data for the Purpose of Tracking Bushfires Remotely

Authors: Weihao Li, Emily Dodwell, Dianne Cook

Abstract: This paper proposes a spatiotemporal clustering algorithm and its implementation in the R package spotoroo. This work is motivated by the catastrophic bushfires in Australia throughout the summer of 2019-2020 and made possible by the availability of satellite hotspot data. The algorithm is inspired by two existing spatiotemporal clustering algorithms but makes enhancements to cluster points spatially in conjunction with their movement across consecutive time periods. It also allows for the adjustment of key parameters, if required, for different locations and satellite data sources. Bushfire data from Victoria, Australia, is used to illustrate the algorithm and its use within the package.

Submitted 21 August, 2023; originally announced August 2023.

arXiv:2308.05964 [pdf, other]

A Plot is Worth a Thousand Tests: Assessing Residual Diagnostics with the Lineup Protocol

Authors: Weihao Li, Dianne Cook, Emi Tanaka, Susan VanderPlas

Abstract: Regression experts consistently recommend plotting residuals for model diagnosis, despite the availability of many numerical hypothesis test procedures designed to use residuals to assess problems with a model fit. Here we provide evidence for why this is good advice using data from a visual inference experiment. We show how conventional tests are too sensitive, which means that too often the conclusion would be that the model fit is inadequate. The experiment uses the lineup protocol which puts a residual plot in the context of null plots. This helps generate reliable and consistent reading of residual plots for better model diagnosis. It can also help in an obverse situation where a conventional test would fail to detect a problem with a model due to contaminated data. The lineup protocol also detects a range of departures from good residuals simultaneously. Supplemental materials for the article are available online.

Submitted 24 March, 2024; v1 submitted 11 August, 2023; originally announced August 2023.

arXiv:2306.16578 [pdf, other]

Allocating Divisible Resources on Arms with Unknown and Random Rewards

Authors: Ningyuan Chen, Wenhao Li

Abstract: We consider a decision maker allocating one unit of renewable and divisible resource in each period on a number of arms. The arms have unknown and random rewards whose means are proportional to the allocated resource and whose variances are proportional to an order $b$ of the allocated resource. In particular, if the decision maker allocates resource $A_i$ to arm $i$ in a period, then the reward $Y_i$ is $Y_i(A_i)=A_i \mu_i+A_i^b \xi_{i}$, where $\mu_i$ is the unknown mean and the noise $\xi_{i}$ is independent and sub-Gaussian. When the order $b$ ranges from 0 to 1, the framework smoothly bridges the standard stochastic multi-armed bandit and online learning with full feedback. We design two algorithms that attain the optimal gap-dependent and gap-independent regret bounds for $b\in [0,1]$, and demonstrate a phase transition at $b=1/2$. The theoretical results hinge on a novel concentration inequality we have developed that bounds a linear combination of sub-Gaussian random variables whose weights are fractional, adapted to the filtration, and monotonic.

Submitted 2 November, 2023; v1 submitted 28 June, 2023; originally announced June 2023.
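
The reward model stated in the abstract above, $Y_i(A_i)=A_i \mu_i+A_i^b \xi_{i}$, can be simulated in a few lines. The following is only a minimal sketch and is not taken from the paper: the function name `simulate_reward`, the parameter `noise_sd`, and the choice of Gaussian noise (one particular sub-Gaussian distribution) are assumptions made for illustration.

```python
import random


def simulate_reward(allocation, mu, b, noise_sd=1.0, rng=random):
    """Draw one reward Y = A * mu + A**b * xi for a single arm.

    allocation: resource A given to the arm, assumed to lie in [0, 1]
    mu:         the arm's mean reward per unit of resource
    b:          the order in the abstract's model, assumed in [0, 1]
    noise_sd:   standard deviation of the Gaussian noise xi (illustrative choice)
    """
    if not 0.0 <= allocation <= 1.0:
        raise ValueError("allocation must lie in [0, 1]")
    xi = rng.gauss(0.0, noise_sd)  # Gaussian noise is an assumption; any sub-Gaussian law fits the model
    return allocation * mu + (allocation ** b) * xi
```

With `noise_sd=0.0` the noise term vanishes and the function returns the mean reward `allocation * mu`, which is a quick way to check the deterministic part of the model.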

arXiv:2306.15286 [pdf, other]

Multilayer random dot product graphs: Estimation and online change point detection

Authors: Fan Wang, Wanshan Li, Oscar Hernan Madrid Padilla, Yi Yu, Alessandro Rinaldo

Abstract: We study the multilayer random dot product graph (MRDPG) model, an extension of the random dot product graph to multilayer networks. To estimate the edge probabilities, we deploy a tensor-based methodology and demonstrate its superiority over existing approaches. Moving to dynamic MRDPGs, we formulate and analyse an online change point detection framework. At every time point, we observe a realization from an MRDPG. Across layers, we assume fixed shared common node sets and latent positions but allow for different connectivity matrices. We propose efficient tensor algorithms under both fixed and random latent position cases to minimize the detection delay while controlling false alarms. Notably, in the random latent position case, we devise a novel nonparametric change point detection algorithm based on density kernel estimation that is applicable to a wide range of scenarios, including stochastic block models as special cases. Our theoretical findings are supported by extensive numerical experiments, with the code available online at https://github.com/MountLee/MRDPG.

Submitted 10 June, 2024; v1 submitted 27 June, 2023; originally announced June 2023.

arXiv:2306.06252 [pdf, other]

Feature Programming for Multivariate Time Series Prediction

Authors: Alex Reneau, Jerry Yao-Chieh Hu, Chenwei Xu, Weijian Li, Ammar Gilani, Han Liu

Abstract: We introduce the concept of programmable feature engineering for time series modeling and propose a feature programming framework. This framework generates large amounts of predictive features for noisy multivariate time series while allowing users to incorporate their inductive bias with minimal effort. The key motivation of our framework is to view any multivariate time series as a cumulative sum of fine-grained trajectory increments, with each increment governed by a novel spin-gas dynamical Ising model. This fine-grained perspective motivates the development of a parsimonious set of operators that summarize multivariate time series in an abstract fashion, serving as the foundation for large-scale automated feature engineering. Numerically, we validate the efficacy of our method on several synthetic and real-world noisy time series datasets.

Submitted 9 June, 2023; originally announced June 2023.

Comments: 21 pages, accepted to ICML 2023. Code is available at https://github.com/SirAlex900/FeatureProgramming

arXiv:2305.13856 [pdf, ps, other]

On the Optimal Batch Size for Byzantine-Robust Distributed Learning

Authors: Yi-Rui Yang, Chang-Wei Shi, Wu-Jun Li

Abstract: Byzantine-robust distributed learning (BRDL), in which computing devices are likely to behave abnormally due to accidental failures or malicious attacks, has recently become a hot research topic. However, even in the independent and identically distributed (i.i.d.) case, existing BRDL methods will suffer from a significant drop in model accuracy due to the large variance of stochastic gradients. Increasing batch sizes is a simple yet effective way to reduce the variance. However, when the total number of gradient computations is fixed, a too-large batch size will lead to a too-small iteration number (update number), which may also degrade the model accuracy. In view of this challenge, we mainly study the optimal batch size when the total number of gradient computations is fixed in this work. In particular, we theoretically and empirically show that when the total number of gradient computations is fixed, the optimal batch size in BRDL increases with the fraction of Byzantine workers. Therefore, compared to the case without attacks, the batch size should be set larger when under Byzantine attacks. However, for existing BRDL methods, large batch sizes will lead to a drop in model accuracy, even if there is no Byzantine attack. To deal with this problem, we propose a novel BRDL method, called Byzantine-robust stochastic gradient descent with normalized momentum (ByzSGDnm), which can alleviate the drop in model accuracy in large-batch cases. Moreover, we theoretically prove the convergence of ByzSGDnm for general non-convex cases under Byzantine attacks. Empirical results show that ByzSGDnm has a comparable performance to existing BRDL methods under bit-flipping failure, but can outperform existing BRDL methods under deliberately crafted attacks.

Submitted 23 May, 2023; originally announced May 2023.

arXiv:2305.06807 [pdf, other]

Information Design in Multi-Agent Reinforcement Learning

Authors: Yue Lin, Wenhao Li, Hongyuan Zha, Baoxiang Wang

Abstract: Reinforcement learning (RL) is inspired by the way human infants and animals learn from the environment. The setting is somewhat idealized because, in actual tasks, other agents in the environment have their own goals and behave adaptively to the ego agent. To thrive in those environments, the agent needs to influence other agents so their actions become more helpful and less harmful. Research in computational economics distills two ways to influence others directly: by providing tangible goods (mechanism design) and by providing information (information design). This work investigates information design problems for a group of RL agents. The main challenges are two-fold. One is that the information provided will immediately affect the transition of the agent trajectories, which introduces additional non-stationarity. The other is that the information can be ignored, so the sender must provide information that the receiver is willing to respect. We formulate the Markov signaling game, and develop the notions of signaling gradient and the extended obedience constraints that address these challenges. Our algorithm is efficient on various mixed-motive tasks and provides further insights into computational economics. Our code is publicly available at https://github.com/YueLin301/InformationDesignMARL.

Submitted 29 October, 2023; v1 submitted 8 May, 2023; originally announced May 2023.

arXiv:2303.17048 [pdf]

Applying Machine Learning to Understand Water Security and Water Access Inequality in Underserved Colonia Communities

Authors: Zhining Gu, Wenwen Li, Michael Hanemann, Yushiou Tsai, Amber Wutich, Paul Westerhoff, Laura Landes, Anais D. Roque, Madeleine Zheng, Carmen A. Velasco, Sarah Porter

Abstract: This paper explores the application of machine learning to enhance our understanding of water accessibility issues in underserved communities called Colonias located along the northern part of the United States-Mexico border. We analyzed more than 2000 such communities using data from the Rural Community Assistance Partnership (RCAP) and applied hierarchical clustering and the adaptive affinity propagation algorithm to automatically group Colonias into clusters with different water access conditions. The Gower distance was introduced to make the algorithm capable of processing complex datasets containing both categorical and numerical attributes. To better understand and explain the clustering results derived from the machine learning process, we further applied a decision tree analysis algorithm to associate the input data with the derived clusters, to identify and rank the importance of factors that characterize different water access conditions in each cluster. Our results complement experts' priority rankings of water infrastructure needs, providing a more in-depth view of the water insecurity challenges that the Colonias suffer from. As an automated and reproducible workflow combining a series of tools, the proposed machine learning pipeline represents an operationalized solution for conducting data-driven analysis to understand water access inequality. This pipeline can be adapted to analyze different datasets and decision scenarios.

Submitted 29 March, 2023; originally announced March 2023.

Comments: 26 pages, 7 figures, accepted by Computers, Environment and Urban Systems (CEUS)
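
The abstract above mentions the Gower distance as the way to compare records that mix categorical and numerical attributes. As a rough illustration of the general idea only (not the authors' pipeline), the sketch below computes an unweighted Gower distance with no handling of missing values; the function name `gower_distance`, the `numeric_ranges` argument, and the example attribute values are hypothetical.

```python
def gower_distance(x, y, numeric_ranges):
    """Unweighted Gower distance between two records with mixed attribute types.

    x, y:           sequences of attribute values, same length and order
    numeric_ranges: dict mapping the index of each numeric attribute to its
                    range (max - min over the dataset); every other attribute
                    is treated as categorical
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("records must be non-empty and of equal length")
    total = 0.0
    for j, (xj, yj) in enumerate(zip(x, y)):
        if j in numeric_ranges:
            value_range = numeric_ranges[j]
            # Numeric attribute: absolute difference scaled by the attribute's range
            total += abs(xj - yj) / value_range if value_range > 0 else 0.0
        else:
            # Categorical attribute: 0 if the categories match, 1 otherwise
            total += 0.0 if xj == yj else 1.0
    return total / len(x)
```

For example, two hypothetical records `("well", 10.0)` and `("well", 30.0)`, with the numeric attribute ranging over 40.0 units, agree on the categorical attribute and differ by half the range on the numeric one, giving a distance of 0.25.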

arXiv:2303.10134 [pdf, ps, other]

doi 10.1016/j.spl.2023.109836

Proximal Causal Inference without Uniqueness Assumptions

Authors: Jeffrey Zhang, Wei Li, Wang Miao, Eric Tchetgen Tchetgen

Abstract: We consider identification and inference about a counterfactual outcome mean when there is unmeasured confounding using tools from proximal causal inference (Miao et al. [2018], Tchetgen Tchetgen et al. [2020]). Proximal causal inference requires existence of solutions to at least one of two integral equations. We motivate the existence of solutions to the integral equations from proximal causal inference by demonstrating that, assuming the existence of a solution to one of the integral equations, $\sqrt{n}$-estimability of a linear functional (such as its mean) of that solution requires the existence of a solution to the other integral equation. Solutions to the integral equations may not be unique, which complicates estimation and inference. We construct a consistent estimator for the solution set for one of the integral equations and then adapt the theory of extremum estimators to find from the estimated set a consistent estimator for a uniquely defined solution. A debiased estimator for the counterfactual mean is shown to be root-$n$ consistent, regular, and asymptotically semiparametrically locally efficient under additional regularity conditions.

Submitted 1 October, 2023; v1 submitted 17 March, 2023; originally announced March 2023.

Comments: Fixed some errors and added to acknowledgements

Journal ref: Statistics & Probability Letters 198 (2023)

arXiv:2303.10112 [pdf, other]

Causal Discovery from Temporal Data: An Overview and New Perspectives

Authors: Chang Gong, Di Yao, Chuzhe Zhang, Wenbin Li, Jingping Bi

Abstract: Temporal data, representing chronological observations of complex systems, has always been a typical data structure that can be widely generated by many domains, such as industry, medicine and finance. Analyzing this type of data is extremely valuable for various applications. Thus, different temporal data analysis tasks, e.g., classification, clustering and prediction, have been proposed in the past decades. Among them, causal discovery, learning the causal relations from temporal data, is considered an interesting yet critical task and has attracted much research attention. Existing causal discovery works can be divided into two highly correlated categories according to whether the temporal data is calibrated, i.e., multivariate time series causal discovery, and event sequence causal discovery. However, most previous surveys are only focused on the time series causal discovery and ignore the second category. In this paper, we specify the correlation between the two categories and provide a systematic overview of existing solutions. Furthermore, we provide public datasets, evaluation metrics and new perspectives for temporal data causal discovery.

Submitted 3 August, 2023; v1 submitted 17 March, 2023; originally announced March 2023.

Comments: 54 pages, 7 figures

arXiv:2303.04030 [pdf, other]

PyXAB -- A Python Library for $\mathcal{X}$-Armed Bandit and Online Blackbox Optimization Algorithms

Authors: Wenjie Li, Haoze Li, Jean Honorio, Qifan Song

Abstract: We introduce a Python open-source library for $\mathcal{X}$-armed bandit and online blackbox optimization named PyXAB. PyXAB contains the implementations for more than 10 $\mathcal{X}$-armed bandit algorithms, such as HOO, StoSOO, HCT, and the most recent works GPO and VHCT. PyXAB also provides the most commonly used synthetic objectives to evaluate the performance of different algorithms and the various choices of the hierarchical partitions on the parameter space. The online documentation for PyXAB includes clear instructions for installation, straightforward examples, detailed feature descriptions, and a complete reference of the API. PyXAB is released under the MIT license in order to encourage both academic and industrial usage. The library can be directly installed from PyPI with its source code available at https://github.com/WilliamLwj/PyXAB

Submitted 7 March, 2023; originally announced March 2023.

Showing 1–50 of 253 results for author: Li, W