
Showing 1–50 of 435 results for author: Zhang, H

Searching in archive stat.
  1. arXiv:2406.16221  [pdf, other]

    cs.LG cs.AI cs.GR econ.EM stat.ME

    F-FOMAML: GNN-Enhanced Meta-Learning for Peak Period Demand Forecasting with Proxy Data

    Authors: Zexing Xu, Linjun Zhang, Sitan Yang, Rasoul Etesami, Hanghang Tong, Huan Zhang, Jiawei Han

    Abstract: Demand prediction is a crucial task for e-commerce and physical retail businesses, especially during high-stake sales events. However, the limited availability of historical data from these peak periods poses a significant challenge for traditional forecasting methods. In this paper, we propose a novel approach that leverages strategically chosen proxy data reflective of potential sales patterns f…

    Submitted 23 June, 2024; originally announced June 2024.

    MSC Class: 68T07; 68T05; 62M10; 62M20; 90C90; 91B84

  2. arXiv:2406.14071  [pdf, other]

    stat.ML cs.LG

    Bayesian Bandit Algorithms with Approximate Inference in Stochastic Linear Bandits

    Authors: Ziyi Huang, Henry Lam, Haofeng Zhang

    Abstract: Bayesian bandit algorithms with approximate Bayesian inference have been widely used in real-world applications. Nevertheless, their theoretical justification is less investigated in the literature, especially for contextual bandit problems. To fill this gap, we propose a general theoretical framework to analyze stochastic linear bandits in the presence of approximate inference and conduct regret…

    Submitted 20 June, 2024; originally announced June 2024.

  3. arXiv:2406.12212  [pdf, other]

    stat.AP stat.ME

    Identifying Genetic Variants for Obesity Incorporating Prior Insights: Quantile Regression with Insight Fusion for Ultra-high Dimensional Data

    Authors: Jiantong Wang, Heng Lian, Yan Yu, Heping Zhang

    Abstract: Obesity is widely recognized as a critical and pervasive health concern. We strive to identify important genetic risk factors from hundreds of thousands of single nucleotide polymorphisms (SNPs) for obesity. We propose and apply a novel Quantile Regression with Insight Fusion (QRIF) approach that can integrate insights from established studies or domain knowledge to simultaneously select variables…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: This article has been submitted to the Journal of the American Statistical Association
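
    QRIF builds on quantile regression, which estimates the τ-th conditional quantile by minimizing the check (pinball) loss ρ_τ(u) = u(τ − 1{u < 0}) over the residuals u. As a minimal sketch, assuming only that standard loss (the insight-fusion penalty and variable selection described in the abstract are not shown), minimizing it over a single constant recovers a sample quantile:

```python
import numpy as np


def check_loss(residuals, tau):
    """Summed quantile-regression check loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    u = np.asarray(residuals, dtype=float)
    return float(np.sum(u * (tau - (u < 0))))


def constant_quantile_fit(y, tau):
    """Best constant c (searched over the observed values) under the check loss.

    For a model with only an intercept, the minimizer is a tau-th sample quantile.
    """
    y = np.asarray(y, dtype=float)
    losses = [check_loss(y - c, tau) for c in y]
    return float(y[int(np.argmin(losses))])


y = np.arange(1.0, 10.0)  # 1, 2, ..., 9
median_fit = constant_quantile_fit(y, 0.5)
```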

  4. arXiv:2406.11397  [pdf, other]

    cs.LG cs.AI stat.ML

    DistPred: A Distribution-Free Probabilistic Inference Method for Regression and Forecasting

    Authors: Daojun Liang, Haixia Zhang, Dongfeng Yuan

    Abstract: Traditional regression and prediction tasks often only provide deterministic point estimates. To estimate the uncertainty or distribution information of the response variable, methods such as Bayesian inference, model ensembling, or MC Dropout are typically used. These methods either assume that the posterior distribution of samples follows a Gaussian process or require thousands of forward passes…

    Submitted 17 June, 2024; originally announced June 2024.

  5. arXiv:2405.18795  [pdf, other]

    stat.ML cs.LG

    Federated Q-Learning with Reference-Advantage Decomposition: Almost Optimal Regret and Logarithmic Communication Cost

    Authors: Zhong Zheng, Haochen Zhang, Lingzhou Xue

    Abstract: In this paper, we consider model-free federated reinforcement learning for tabular episodic Markov decision processes. Under the coordination of a central server, multiple agents collaboratively explore the environment and learn an optimal policy without sharing their raw data. Despite recent advances in federated Q-learning algorithms achieving near-linear regret speedup with low communication co…

    Submitted 29 May, 2024; originally announced May 2024.

  6. arXiv:2405.09362  [pdf, other]

    stat.ML cs.LG

    On the Saturation Effect of Kernel Ridge Regression

    Authors: Yicheng Li, Haobo Zhang, Qian Lin

    Abstract: The saturation effect refers to the phenomenon that the kernel ridge regression (KRR) fails to achieve the information theoretical lower bound when the smoothness of the ground-truth function exceeds a certain level. The saturation effect has been widely observed in practice and a saturation lower bound of KRR has been conjectured for decades. In this paper, we provide a proof of this long-sta…

    Submitted 28 May, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

    Comments: ICLR 2023; Minor errors are corrected in this version
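
    For reference, the kernel ridge regression estimator discussed above is f̂(x) = k(x, X)(K + nλI)^{-1}y, where K is the kernel matrix of the n training inputs X and λ > 0 is the ridge parameter. A minimal sketch, assuming a Gaussian (RBF) kernel and illustrative values of the bandwidth and λ; it shows the estimator only, not the saturation analysis:

```python
import numpy as np


def rbf_kernel(a, b, bandwidth=0.2):
    """Gaussian (RBF) kernel matrix between 1-D point sets a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-d**2 / (2 * bandwidth**2))


def krr_fit_predict(x_train, y_train, x_test, lam=1e-3, bandwidth=0.2):
    """Kernel ridge regression: solve (K + n*lam*I) alpha = y, then predict k(x_test, X) @ alpha."""
    n = len(x_train)
    K = rbf_kernel(x_train, x_train, bandwidth)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y_train)
    return rbf_kernel(x_test, x_train, bandwidth) @ alpha


# Hypothetical data: noisy samples of sin(2*pi*x) on [0, 1].
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 200)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(200)
x_grid = np.linspace(0.05, 0.95, 50)
pred = krr_fit_predict(x, y, x_grid)
```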

  7. arXiv:2405.01010  [pdf, other]

    cs.LG stat.ML

    Efficient and Adaptive Posterior Sampling Algorithms for Bandits

    Authors: Bingshan Hu, Zhiming Huang, Tianyue H. Zhang, Mathias Lécuyer, Nidhi Hegde

    Abstract: We study Thompson Sampling-based algorithms for stochastic bandits with bounded rewards. As the existing problem-dependent regret bound for Thompson Sampling with Gaussian priors [Agrawal and Goyal, 2017] is vacuous when $T \le 288 e^{64}$, we derive a more practical bound that tightens the coefficient of the leading term from $288 e^{64}$ to $1270$. Additionally, motivated by large-scale real-wo…

    Submitted 2 May, 2024; originally announced May 2024.
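
    The Gaussian-prior Thompson Sampling of Agrawal and Goyal draws, for each arm i, a sample from N(μ̂_i, 1/(k_i + 1)), where k_i is the arm's pull count and μ̂_i is its reward sum divided by k_i + 1, then plays the arm with the largest sample. A minimal sketch of that baseline, assuming Bernoulli rewards (which are bounded in [0, 1]) and the hypothetical arm means below; it is not the new algorithms proposed in the paper:

```python
import numpy as np


def gaussian_thompson_sampling(means, horizon, seed=0):
    """Thompson Sampling with Gaussian posteriors for rewards in [0, 1].

    Arm i is sampled from N(mu_hat_i, 1 / (k_i + 1)), where k_i is its pull count and
    mu_hat_i = (sum of its rewards) / (k_i + 1). Rewards are Bernoulli(means[i]).
    Returns the number of pulls of each arm.
    """
    rng = np.random.default_rng(seed)
    pulls = np.zeros(len(means))
    reward_sums = np.zeros(len(means))
    for _ in range(horizon):
        mu_hat = reward_sums / (pulls + 1)
        theta = rng.normal(mu_hat, np.sqrt(1.0 / (pulls + 1)))  # one posterior draw per arm
        arm = int(np.argmax(theta))
        reward = float(rng.random() < means[arm])  # Bernoulli reward in {0, 1}
        pulls[arm] += 1
        reward_sums[arm] += reward
    return pulls


counts = gaussian_thompson_sampling([0.3, 0.5, 0.7], horizon=5000)
```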

  8. arXiv:2404.18980  [pdf, other]

    econ.GN physics.soc-ph stat.AP

    The Impact of COVID-19 on Co-authorship and Economics Scholars' Productivity

    Authors: Hanqiao Zhang, Joy D. Xiuyao Yang

    Abstract: The COVID-19 pandemic has disrupted traditional academic collaboration patterns, prompting a unique opportunity to analyze the influence of peer effects and coauthorship dynamics on research output. Using a novel dataset, this paper endeavors to make a first cut at investigating the role of peer effects on the productivity of economics scholars, measured by the number of publications, in both pre-…

    Submitted 29 April, 2024; originally announced April 2024.

  9. arXiv:2404.18979  [pdf, other]

    econ.GN stat.AP

    Analysis of Proximity Informed User Behavior in a Global Online Social Network

    Authors: Nils Breitmar, Matthew C. Harding, Hanqiao Zhang

    Abstract: Despite the earlier claim of "Death of Distance", recent studies revealed that geographical proximity still greatly influences link formation in online social networks. However, it is unclear how physical distances are intertwined with users' online behaviors in a virtual world. We study the role of spatial dependence on a global online social network with a dyadic Logit model. Results show countr…

    Submitted 29 April, 2024; originally announced April 2024.

  10. arXiv:2404.18009  [pdf, other]

    econ.GN stat.AP

    Exit Spillovers of Foreign-invested Enterprises in Shenzhen's Electronics Manufacturing Industry

    Authors: Hanqiao Zhang

    Abstract: Neighborhood characteristics have been broadly studied with different firm behaviors, e.g. birth, entry, expansion, and survival, except for firm exit. Using a novel dataset of foreign-invested enterprises operating in Shenzhen's electronics manufacturing industry from 2017 to 2021, I investigate the spillover effects of firm exits on other firms in the vicinity, from both the industry group and t…

    Submitted 27 April, 2024; originally announced April 2024.

  11. arXiv:2404.13366  [pdf]

    stat.ME

    Prior Effective Sample Size When Borrowing on the Treatment Effect Scale

    Authors: Hongtao Zhang, Keaven M Anderson, Zachary Zimmer, Gregory Golm, Aditi Sapre, Joseph G Ibrahim

    Abstract: With the robust uptick in the applications of Bayesian external data borrowing, eliciting a prior distribution with the proper amount of information becomes increasingly critical. The prior effective sample size (ESS) is an intuitive and efficient measure for this purpose. The majority of ESS definitions have been proposed in the context of borrowing control information. While many Bayesian models…

    Submitted 20 April, 2024; originally announced April 2024.
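
    One common way to quantify prior information, used here only as an illustration and not necessarily the definition this paper proposes, is a variance ratio: a N(m, τ²) prior on a mean measured with per-observation standard deviation σ adds precision 1/τ², the same as σ²/τ² observations. On the treatment-effect scale, assuming two arms with n patients each and a common σ, the estimated difference has variance 2σ²/n, so the prior matches n = 2σ²/τ² patients per arm. A minimal sketch of these formulas, plus the familiar a + b for a Beta(a, b) prior on a proportion:

```python
def normal_prior_ess(prior_sd, obs_sd):
    """Observations worth of a N(m, prior_sd**2) prior on a mean: obs_sd**2 / prior_sd**2."""
    if prior_sd <= 0 or obs_sd <= 0:
        raise ValueError("standard deviations must be positive")
    return obs_sd**2 / prior_sd**2


def effect_prior_ess_per_arm(prior_sd, obs_sd):
    """Patients per arm matched by a N(m, prior_sd**2) prior on a two-arm mean difference.

    Assumes equal allocation and a common per-patient SD obs_sd, so that the estimated
    difference has variance 2 * obs_sd**2 / n; result is 2 * obs_sd**2 / prior_sd**2.
    """
    return 2.0 * normal_prior_ess(prior_sd, obs_sd)


def beta_prior_ess(a, b):
    """Effective sample size of a Beta(a, b) prior on a proportion: a + b."""
    return a + b
```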

  12. arXiv:2404.12613  [pdf, other]

    stat.ML cs.LG eess.SP stat.ME

    A Fourier Approach to the Parameter Estimation Problem for One-dimensional Gaussian Mixture Models

    Authors: Xinyu Liu, Hai Zhang

    Abstract: The purpose of this paper is twofold. First, we propose a novel algorithm for estimating parameters in one-dimensional Gaussian mixture models (GMMs). The algorithm takes advantage of the Hankel structure inherent in the Fourier data obtained from independent and identically distributed (i.i.d.) samples of the mixture. For GMMs with a unified variance, a singular value ratio functional using the Fo…

    Submitted 18 April, 2024; originally announced April 2024.
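
    The Hankel structure mentioned above can be seen directly. If every component has the same variance σ², the characteristic function is φ(t) = exp(−σ²t²/2) Σ_k w_k exp(iμ_k t); dividing out the Gaussian factor leaves a sum of K complex exponentials, and on an equispaced grid of frequencies the Hankel matrix built from those values has rank K. A minimal sketch, assuming σ is known and using a hypothetical two-component mixture (the paper's singular value ratio functional and estimation steps are not reproduced):

```python
import numpy as np


def empirical_fourier_data(samples, t):
    """Empirical characteristic function (1/N) * sum_j exp(i * t * x_j) at each frequency in t."""
    samples = np.asarray(samples, dtype=float)
    return np.exp(1j * np.outer(t, samples)).mean(axis=1)


def hankel_matrix(values, rows):
    """Hankel matrix H[r, c] = values[r + c] with the given number of rows."""
    cols = len(values) - rows + 1
    return np.array([[values[r + c] for c in range(cols)] for r in range(rows)])


def deconvolved_hankel(fourier_values, t, sigma, rows):
    """Remove the shared Gaussian factor exp(-sigma^2 t^2 / 2), then form the Hankel matrix.

    What remains is sum_k w_k exp(i mu_k t); on an equispaced t grid its Hankel matrix
    has rank K, the number of mixture components.
    """
    return hankel_matrix(fourier_values * np.exp(sigma**2 * t**2 / 2), rows)


# Exact Fourier data of a hypothetical two-component mixture with common sigma.
weights, means, sigma = np.array([0.4, 0.6]), np.array([-1.0, 1.5]), 0.5
t = 0.1 * np.arange(11)  # equispaced frequencies 0.0, 0.1, ..., 1.0
phi = np.exp(-sigma**2 * t**2 / 2) * (weights * np.exp(1j * np.outer(t, means))).sum(axis=1)
sv = np.linalg.svd(deconvolved_hankel(phi, t, sigma, rows=6), compute_uv=False)
```

    With empirical_fourier_data applied to real samples, sampling noise is multiplied by exp(σ²t²/2) at high frequencies, so the frequency range matters in practice.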

  13. arXiv:2404.12597  [pdf, other]

    cs.LG math.ST stat.ML

    The phase diagram of kernel interpolation in large dimensions

    Authors: Haobo Zhang, Weihao Lu, Qian Lin

    Abstract: The generalization ability of kernel interpolation in large dimensions (i.e., $n \asymp d^γ$ for some $γ>0$) might be one of the most interesting problems in the recent renaissance of kernel regression, since it may help us understand the 'benign overfitting phenomenon' reported in the neural networks literature. Focusing on the inner product kernel on the sphere, we fully characterized the exact…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: 18 pages, 1 figure

  14. arXiv:2404.10004  [pdf]

    cs.LG physics.soc-ph stat.AP

    A Strategy Transfer and Decision Support Approach for Epidemic Control in Experience Shortage Scenarios

    Authors: X. Xiao, P. Chen, X. Cao, K. Liu, L. Deng, D. Zhao, Z. Chen, Q. Deng, F. Yu, H. Zhang

    Abstract: Epidemic outbreaks can cause critical health concerns and severe global economic crises. For countries or regions with new infectious disease outbreaks, it is essential to generate preventive strategies by learning lessons from others with similar risk profiles. A Strategy Transfer and Decision Support Approach (STDSA) is proposed based on the profile similarity evaluation. There are four steps in…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: 20 pages, 9 figures

  15. arXiv:2404.08613  [pdf, other]

    physics.ao-ph stat.ML

    Using Explainable AI and Transfer Learning to understand and predict the maintenance of Atlantic blocking with limited observational data

    Authors: Huan Zhang, Justin Finkel, Dorian S. Abbot, Edwin P. Gerber, Jonathan Weare

    Abstract: Blocking events are an important cause of extreme weather, especially long-lasting blocking events that trap weather systems in place. The duration of blocking events is, however, underestimated in climate models. Explainable Artificial Intelligence comprises a class of data analysis methods that can help identify physical causes of prolonged blocking events and diagnose model deficiencies. We demonstra…

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: 29 pages, 10 figures

  16. arXiv:2404.08164  [pdf, other]

    stat.ML cs.AI cs.CL cs.LG

    Language Model Prompt Selection via Simulation Optimization

    Authors: Haoting Zhang, Jinghai He, Rhonda Righter, Zeyu Zheng

    Abstract: With the advancement in generative language models, the selection of prompts has gained significant attention in recent years. A prompt is an instruction or description provided by the user, serving as a guide for the generative language model in content generation. Despite existing methods for prompt selection that are based on human labor, we consider facilitating this selection through simulati…

    Submitted 19 May, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

  17. arXiv:2403.12367  [pdf, other]

    stat.ML cs.LG stat.ME

    Semisupervised score based matching algorithm to evaluate the effect of public health interventions

    Authors: Hongzhe Zhang, Jiasheng Shi, Jing Huang

    Abstract: Multivariate matching algorithms "pair" similar study units in an observational study to remove potential bias and confounding effects caused by the absence of randomizations. In one-to-one multivariate matching algorithms, a large number of "pairs" to be matched could mean both the information from a large sample and a large number of tasks, and therefore, to best match the pairs, such a matching…

    Submitted 18 March, 2024; originally announced March 2024.

  18. arXiv:2403.06246  [pdf, other]

    econ.EM stat.ME

    Estimating Factor-Based Spot Volatility Matrices with Noisy and Asynchronous High-Frequency Data

    Authors: Degui Li, Oliver Linton, Haoxuan Zhang

    Abstract: We propose a new estimator of high-dimensional spot volatility matrices satisfying a low-rank plus sparse structure from noisy and asynchronous high-frequency data collected for an ultra-large number of assets. The noise processes are allowed to be temporally correlated, heteroskedastic, asymptotically vanishing and dependent on the efficient prices. We define a kernel-weighted pre-averaging metho…

    Submitted 10 March, 2024; originally announced March 2024.

  19. arXiv:2402.03352  [pdf, other]

    math.OC cs.LG stat.ML

    Zeroth-Order primal-dual Alternating Projection Gradient Algorithms for Nonconvex Minimax Problems with Coupled linear Constraints

    Authors: Huiling Zhang, Zi Xu, Yuhong Dai

    Abstract: In this paper, we study zeroth-order algorithms for nonconvex minimax problems with coupled linear constraints under the deterministic and stochastic settings, which have attracted wide attention in machine learning, signal processing and many other fields in recent years, e.g., adversarial attacks in resource allocation problems and network flow problems etc. We propose two single-loop algorithms…

    Submitted 26 January, 2024; originally announced February 2024.

    Comments: arXiv admin note: text overlap with arXiv:2212.04672
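
    Zeroth-order methods replace gradients with estimates built from function values alone. A common two-point estimator, assumed here for illustration rather than taken from the paper, averages (f(x + μu) − f(x − μu))/(2μ) · u over random Gaussian directions u; for smooth f and small μ its expectation approximates ∇f(x). A minimal sketch on a hypothetical quadratic whose gradient is known:

```python
import numpy as np


def zo_gradient(f, x, mu=1e-4, num_dirs=20, rng=None):
    """Two-point zeroth-order gradient estimate from function values only:
    the average over Gaussian directions u of (f(x + mu*u) - f(x - mu*u)) / (2*mu) * u."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for _ in range(num_dirs):
        u = rng.standard_normal(x.shape)
        grad += (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u
    return grad / num_dirs


A = np.diag([1.0, 3.0])


def quadratic(z):
    """Hypothetical test function 0.5 * z @ A @ z; its true gradient is A @ z."""
    return 0.5 * z @ A @ z


x0 = np.array([1.0, -2.0])
est = zo_gradient(quadratic, x0, num_dirs=20000, rng=np.random.default_rng(1))
```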

  20. arXiv:2402.03104  [pdf, other]

    stat.ML cs.LG

    High-dimensional Bayesian Optimization via Covariance Matrix Adaptation Strategy

    Authors: Lam Ngo, Huong Ha, Jeffrey Chan, Vu Nguyen, Hongyu Zhang

    Abstract: Bayesian Optimization (BO) is an effective method for finding the global optimum of expensive black-box functions. However, it is well known that applying BO to high-dimensional optimization problems is challenging. To address this issue, a promising solution is to use a local search strategy that partitions the search domain into local regions with high likelihood of containing the global optimum…

    Submitted 5 February, 2024; originally announced February 2024.

    Comments: 31 pages, 17 figures

    Journal ref: Transactions on Machine Learning Research 2024

  21. arXiv:2402.02368  [pdf, other]

    cs.LG stat.ML

    Timer: Generative Pre-trained Transformers Are Large Time Series Models

    Authors: Yong Liu, Haoran Zhang, Chenyu Li, Xiangdong Huang, Jianmin Wang, Mingsheng Long

    Abstract: Deep learning has contributed remarkably to the advancement of time series analysis. Still, deep models can encounter performance bottlenecks in real-world data-scarce scenarios, which can be concealed due to the performance saturation with small models on current benchmarks. Meanwhile, large models have demonstrated great powers in these scenarios through large-scale pre-training. Continuous prog…

    Submitted 4 June, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

  22. arXiv:2401.06091  [pdf, other]

    cs.LG stat.ME

    A Closer Look at AUROC and AUPRC under Class Imbalance

    Authors: Matthew B. A. McDermott, Lasse Hyldig Hansen, Haoran Zhang, Giovanni Angelotti, Jack Gallifant

    Abstract: In machine learning (ML), a widespread adage is that the area under the precision-recall curve (AUPRC) is a superior metric for model comparison to the area under the receiver operating characteristic (AUROC) for binary classification tasks with class imbalance. This paper challenges this notion through novel mathematical analysis, illustrating that AUROC and AUPRC can be concisely related in prob…

    Submitted 18 April, 2024; v1 submitted 11 January, 2024; originally announced January 2024.
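
    Both metrics can be computed from scores and binary labels. A minimal sketch, assuming AUROC is taken as the probability that a random positive outscores a random negative (ties count one half) and AUPRC is approximated by average precision over untied scores; libraries may interpolate the precision-recall curve differently:

```python
import numpy as np


def auroc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    scores, labels = np.asarray(scores, dtype=float), np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]  # every positive-negative pair
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


def average_precision(scores, labels):
    """AUPRC as average precision: mean precision at the rank of each positive.

    Assumes no tied scores.
    """
    scores, labels = np.asarray(scores, dtype=float), np.asarray(labels, dtype=bool)
    hits = labels[np.argsort(-scores)]  # labels sorted by decreasing score
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return float(precision_at_k[hits].mean())


scores = [0.9, 0.8, 0.7, 0.6]
labels = [1, 0, 1, 0]
roc = auroc(scores, labels)
ap = average_precision(scores, labels)
```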

  23. arXiv:2401.05812  [pdf, other]

    stat.CO

    A Tidy Framework and Infrastructure to Systematically Assemble Spatio-temporal Indexes from Multivariate Data

    Authors: H. Sherry Zhang, Dianne Cook, Ursula Laa, Nicolas Langrené, Patricia Menéndez

    Abstract: Indexes are useful for summarizing multivariate information into single metrics for monitoring, communicating, and decision-making. While most work has focused on defining new indexes for specific purposes, more attention needs to be directed towards making it possible to understand index behavior in different data conditions, and to determine how their structure affects their values and variation…

    Submitted 13 May, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

  24. arXiv:2401.00540  [pdf, other]

    stat.ME stat.AP

    Study Duration Prediction for Clinical Trials with Time-to-Event Endpoints Using Mixture Distributions Accounting for Heterogeneous Population

    Authors: Hong Zhang, Jie Pu, Shibing Deng, Satrajit Roychoudhury, Haitao Chu, Douglas Robinson

    Abstract: In the era of precision medicine, more and more clinical trials are now driven or guided by biomarkers, which are patient characteristics objectively measured and evaluated as indicators of normal biological processes, pathogenic processes, or pharmacologic responses to therapeutic interventions. With the overarching objective to optimize and personalize disease management, biomarker-guided clinic…

    Submitted 31 December, 2023; originally announced January 2024.

  25. arXiv:2312.15447  [pdf, other]

    cs.CV cs.LG stat.AP

    Superpixel-based and Spatially-regularized Diffusion Learning for Unsupervised Hyperspectral Image Clustering

    Authors: Kangning Cui, Ruoning Li, Sam L. Polk, Yinyi Lin, Hongsheng Zhang, James M. Murphy, Robert J. Plemmons, Raymond H. Chan

    Abstract: Hyperspectral images (HSIs) provide exceptional spatial and spectral resolution of a scene, crucial for various remote sensing applications. However, the high dimensionality, presence of noise and outliers, and the need for precise labels of HSIs present significant challenges to HSI analysis, motivating the development of performant HSI clustering algorithms. This paper introduces a novel unsupe…

    Submitted 24 December, 2023; originally announced December 2023.

    Comments: 27 pages, 9 figures, and 2 tables

  26. arXiv:2312.10618  [pdf]

    stat.ME cs.LG stat.ML

    Sparse Learning and Class Probability Estimation with Weighted Support Vector Machines

    Authors: Liyun Zeng, Hao Helen Zhang

    Abstract: Classification and probability estimation have broad applications in modern machine learning and data science applications, including biology, medicine, engineering, and computer science. The recent development of a class of weighted Support Vector Machines (wSVMs) has shown great value in robustly predicting the class probability and classification for various problems with high accuracy. The cu…

    Submitted 17 December, 2023; originally announced December 2023.

  27. arXiv:2312.08670  [pdf, other]

    stat.ME cs.AI cs.LG

    Temporal-Spatial Entropy Balancing for Causal Continuous Treatment-Effect Estimation

    Authors: Tao Hu, Honglong Zhang, Fan Zeng, Min Du, XiangKun Du, Yue Zheng, Quanqi Li, Mengran Zhang, Dan Yang, Jihao Wu

    Abstract: In the field of intracity freight transportation, changes in order volume are significantly influenced by temporal and spatial factors. When building subsidy and pricing strategies, predicting the causal effects of these strategies on order volume is crucial. In the process of calculating causal effects, confounding variables can have an impact. Traditional methods to control confounding variables…

    Submitted 18 December, 2023; v1 submitted 14 December, 2023; originally announced December 2023.

    Comments: 10 pages

  28. arXiv:2311.18725  [pdf, other]

    stat.ME cs.LG stat.AP stat.ML

    AI in Pharma for Personalized Sequential Decision-Making: Methods, Applications and Opportunities

    Authors: Yuhan Li, Hongtao Zhang, Keaven Anderson, Songzi Li, Ruoqing Zhu

    Abstract: In the pharmaceutical industry, the use of artificial intelligence (AI) has seen consistent growth over the past decade. This rise is attributed to major advancements in statistical machine learning methodologies, computational capabilities and the increased availability of large datasets. AI techniques are applied throughout different stages of drug development, ranging from drug discovery to pos…

    Submitted 30 November, 2023; originally announced November 2023.

  29. arXiv:2311.15051  [pdf, other]

    cs.LG math.OC stat.ML

    Gradient Descent with Polyak's Momentum Finds Flatter Minima via Large Catapults

    Authors: Prin Phunyaphibarn, Junghyun Lee, Bohan Wang, Huishuai Zhang, Chulhee Yun

    Abstract: Although gradient descent with Polyak's momentum is widely used in modern machine and deep learning, a concrete understanding of its effects on the training trajectory remains elusive. In this work, we empirically show that for linear diagonal networks and nonlinear neural networks, momentum gradient descent with a large learning rate displays large catapults, driving the iterates towards much fla…

    Submitted 29 May, 2024; v1 submitted 25 November, 2023; originally announced November 2023.

    Comments: v3: major updates; 25 pages, 17 figures; the first two authors contributed equally. The preliminary version was accepted to the NeurIPS 2023 M3L Workshop (oral) under the title "Large Catapults in Momentum Gradient Descent with Warmup: An Empirical Study."
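
    Polyak's (heavy-ball) momentum adds a multiple β of the previous step to each gradient step: x_{k+1} = x_k − η∇f(x_k) + β(x_k − x_{k−1}). A minimal sketch on a hypothetical quadratic f(x) = ½xᵀAx with illustrative η and β; it shows the update rule only, not the catapult dynamics studied in the paper:

```python
import numpy as np


def heavy_ball(grad, x0, lr, beta, steps):
    """Gradient descent with Polyak's momentum:
    x_{k+1} = x_k - lr * grad(x_k) + beta * (x_k - x_{k-1}), starting with x_{-1} = x_0."""
    x = np.asarray(x0, dtype=float)
    x_prev = x.copy()
    for _ in range(steps):
        # The right-hand side uses the old x and x_prev before both are updated.
        x, x_prev = x - lr * grad(x) + beta * (x - x_prev), x
    return x


A = np.diag([1.0, 10.0])  # hypothetical quadratic f(x) = 0.5 * x @ A @ x, gradient A @ x
x_final = heavy_ball(lambda x: A @ x, x0=[1.0, 1.0], lr=0.1, beta=0.5, steps=200)
```

    On such a quadratic the iteration is stable when 0 < ηλ < 2(1 + β) for every eigenvalue λ of A (with 0 ≤ β < 1), so momentum admits larger stable learning rates than the ηλ < 2 limit of plain gradient descent.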

  30. arXiv:2311.12244  [pdf, other]

    cs.LG cs.AI stat.ML

    Provable Representation with Efficient Planning for Partial Observable Reinforcement Learning

    Authors: Hongming Zhang, Tongzheng Ren, Chenjun Xiao, Dale Schuurmans, Bo Dai

    Abstract: In most real-world reinforcement learning applications, state information is only partially observable, which breaks the Markov decision process assumption and leads to inferior performance for algorithms that conflate observations with state. Partially Observable Markov Decision Processes (POMDPs), on the other hand, provide a general framework that allows for partial observability to be accounte…

    Submitted 10 June, 2024; v1 submitted 20 November, 2023; originally announced November 2023.

    Comments: The first two authors contributed equally

  31. arXiv:2311.08908  [pdf, other]

    stat.ME cs.CV

    Robust Brain MRI Image Classification with SIBOW-SVM

    Authors: Liyun Zeng, Hao Helen Zhang

    Abstract: The majority of primary Central Nervous System (CNS) tumors in the brain are among the most aggressive diseases affecting humans. Early detection of brain tumor types, whether benign or malignant, glial or non-glial, is critical for cancer prevention and treatment, ultimately improving human life expectancy. Magnetic Resonance Imaging (MRI) stands as the most effective technique to detect brain tu…

    Submitted 15 November, 2023; originally announced November 2023.

  32. arXiv:2311.03757  [pdf, other]

    stat.ML cs.LG

    Manifold learning: what, how, and why

    Authors: Marina Meilă, Hanyu Zhang

    Abstract: Manifold learning (ML), known also as non-linear dimension reduction, is a set of methods to find the low dimensional structure of data. Dimension reduction for large, high dimensional data is not merely a way to reduce the data; the new representations and descriptors obtained by ML reveal the geometric shape of high dimensional point clouds, and allow one to visualize, de-noise and interpret the…

    Submitted 7 November, 2023; originally announced November 2023.

  33. arXiv:2311.03201  [pdf, other]

    stat.ML cs.LG math.NA

    Spatial Process Approximations: Assessing Their Necessity

    Authors: Hao Zhang

    Abstract: In spatial statistics and machine learning, the kernel matrix plays a pivotal role in prediction, classification, and maximum likelihood estimation. A thorough examination reveals that for large sample sizes, the kernel matrix becomes ill-conditioned, provided the sampling locations are fairly evenly distributed. This condition poses significant challenges to numerical algorithms used in predictio…

    Submitted 6 November, 2023; originally announced November 2023.
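
    The ill-conditioning described above is easy to observe numerically. A minimal sketch, assuming an exponential covariance kernel exp(−|s − s'|/ℓ) with an illustrative length scale and evenly spaced locations on [0, 1]; as n grows the locations get closer, neighboring rows of the kernel matrix become nearly identical, and the condition number increases:

```python
import numpy as np


def exponential_kernel_condition(n, length_scale=0.2):
    """Condition number of the exponential kernel matrix exp(-|s - s'| / length_scale)
    at n evenly spaced locations on [0, 1]."""
    s = np.linspace(0.0, 1.0, n)
    K = np.exp(-np.abs(s[:, None] - s[None, :]) / length_scale)
    return float(np.linalg.cond(K))


conds = [exponential_kernel_condition(n) for n in (25, 100, 400)]
```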

  34. arXiv:2311.02618  [pdf, other]

    stat.AP

    Regionalization of China's PM2.5 through Robust Spatio temporal Functional Clustering Method

    Authors: Tingyin Wang, Xueqin Wang, Xiaobo Guo, Heping Zhang

    Abstract: The patterns of particulate matter with diameters that are generally 2.5 micrometers and smaller (PM2.5) are heterogeneous in China nationwide but can be homogeneous region-wide. To reduce the adverse effects from PM2.5, policymakers need to develop location-specific regulations based on nationwide clustering analysis of PM2.5 concentrations. However, such an analysis is challenging because the da…

    Submitted 5 November, 2023; originally announced November 2023.

  35. arXiv:2311.01797  [pdf, other]

    cs.LG stat.ML

    On the Generalization Properties of Diffusion Models

    Authors: Puheng Li, Zhong Li, Huishuai Zhang, Jiang Bian

    Abstract: Diffusion models are a class of generative models that serve to establish a stochastic transport map between an empirically observed, yet unknown, target distribution and a known prior. Despite their remarkable success in real-world applications, a theoretical understanding of their generalization capabilities remains underdeveloped. This work embarks on a comprehensive theoretical exploration of…

    Submitted 12 January, 2024; v1 submitted 3 November, 2023; originally announced November 2023.

    Comments: 42 pages, 11 figures

  36. arXiv:2310.20438  [pdf, ps, other]

    stat.ML cs.LG

    The Phase Transition Phenomenon of Shuffled Regression

    Authors: Hang Zhang, Ping Li

    Abstract: We study the phase transition phenomenon inherent in the shuffled (permuted) regression problem, which has found numerous applications in databases, privacy, data analysis, etc. In this study, we aim to precisely identify the locations of the phase transition points by leveraging techniques from message passing (MP). In our analysis, we first transform the permutation recovery problem into a proba…

    Submitted 31 October, 2023; originally announced October 2023.

  37. arXiv:2310.19051  [pdf, other]

    stat.ME cs.MS

    A Survey of Methods for Estimating Hurst Exponent of Time Sequence

    Authors: Hong-Yan Zhang, Zhi-Qiang Feng, Si-Yu Feng, Yu Zhou

    Abstract: The Hurst exponent is a significant indicator for characterizing the self-similarity and long-term memory properties of time sequences. It has wide applications in physics, technologies, engineering, mathematics, statistics, economics, psychology and so on. Currently, available methods for estimating the Hurst exponent of time sequences can be divided into different categories: time-domain methods…

    Submitted 29 October, 2023; originally announced October 2023.

    Comments: 46 pages, 8 figures, 4 tables, 24 algorithms with pseudo-codes
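
    Rescaled-range (R/S) analysis is one classical time-domain estimator of the Hurst exponent: within each window of length n, the range R of the cumulative mean-adjusted sum is divided by the window's standard deviation S, and H is the slope of log(mean R/S) against log n. A minimal sketch, assuming the window sizes below; the survey covers many other estimators. White noise should give H near 0.5 (R/S is biased slightly upward for short windows), while a random walk gives a value near 1:

```python
import numpy as np


def hurst_rs(x, window_sizes=(16, 32, 64, 128, 256)):
    """Rescaled-range (R/S) Hurst exponent: slope of log(mean R/S) against log(window size)."""
    x = np.asarray(x, dtype=float)
    log_n, log_rs = [], []
    for n in window_sizes:
        ratios = []
        # Split the series into non-overlapping windows of length n.
        for start in range(0, len(x) - n + 1, n):
            chunk = x[start:start + n]
            dev = np.cumsum(chunk - chunk.mean())  # cumulative mean-adjusted sum
            r, s = dev.max() - dev.min(), chunk.std()
            if s > 0:
                ratios.append(r / s)
        if ratios:
            log_n.append(np.log(n))
            log_rs.append(np.log(np.mean(ratios)))
    slope, _ = np.polyfit(log_n, log_rs, 1)
    return float(slope)


rng = np.random.default_rng(0)
noise = rng.standard_normal(4096)
h_noise = hurst_rs(noise)            # white noise
h_walk = hurst_rs(np.cumsum(noise))  # random walk (strongly persistent)
```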

  38. arXiv:2310.15448  [pdf, other]

    math.OC cs.LG stat.ML

    An accelerated first-order regularized momentum descent ascent algorithm for stochastic nonconvex-concave minimax problems

    Authors: Huiling Zhang, Zi Xu

    Abstract: Stochastic nonconvex minimax problems have attracted wide attention in machine learning, signal processing and many other fields in recent years. In this paper, we propose an accelerated first-order regularized momentum descent ascent algorithm (FORMDA) for solving stochastic nonconvex-concave minimax problems. The iteration complexity of the algorithm is proved to be…

    Submitted 23 October, 2023; originally announced October 2023.

  39. arXiv:2310.09257  [pdf, other]

    stat.ME

    A SIMPLE Approach to Provably Reconstruct Ising Model with Global Optimality

    Authors: Junxian Zhu, Xuanyu Chen, Jin Zhu, Xueqin Wang, Heping Zhang

    Abstract: Reconstruction of interaction network between random events is a critical problem arising from statistical physics and politics to sociology, biology, and psychology, and beyond. The Ising model lays the foundation for this reconstruction process, but finding the underlying Ising model from the least amount of observed samples in a computationally efficient manner has been historically challenging…

    Submitted 13 October, 2023; originally announced October 2023.

  40. arXiv:2310.08208  [pdf, other]

    stat.CO

    Distributed Estimation for Large-Scale Cox Regression with Poisson Subsampling

    Authors: Haixiang Zhang, Yang Li, HaiYing Wang

    Abstract: To ensure privacy protection and alleviate computational burden, we propose a Poisson-subsampling based distributed estimation procedure for the Cox model with massive survival datasets from multi-centered, decentralized sources. The proposed estimator is computed based on optimal subsampling probabilities that we derived and enables transmission of subsample-based summary level statistics between…

    Submitted 12 October, 2023; originally announced October 2023.
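
    Poisson subsampling includes each row independently with its own probability π_i, so the subsample size is random, and inverse-probability weights 1/π_i keep weighted sums unbiased. A minimal sketch of that mechanism with a Horvitz-Thompson estimate of a total, assuming hypothetical size-proportional probabilities; the paper's optimal probabilities and distributed Cox estimator are not shown:

```python
import numpy as np


def poisson_subsample(probs, rng):
    """Include row i independently with probability probs[i].

    Returns the selected row indices and their inverse-probability weights 1 / probs[i].
    """
    probs = np.asarray(probs, dtype=float)
    keep = rng.random(len(probs)) < probs
    idx = np.flatnonzero(keep)
    return idx, 1.0 / probs[idx]


rng = np.random.default_rng(0)
y = rng.exponential(scale=2.0, size=100_000)
# Hypothetical size-proportional probabilities with expected subsample size about 1,000.
probs = np.clip(0.01 * y / y.mean(), 1e-4, 1.0)
idx, w = poisson_subsample(probs, rng)
ht_total = float(np.sum(w * y[idx]))  # Horvitz-Thompson estimate of y.sum()
```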

  41. arXiv:2310.01326  [pdf, other]

    stat.ML cs.LG

    Optimal Estimator for Linear Regression with Shuffled Labels

    Authors: Hang Zhang, Ping Li

    Abstract: This paper considers the task of linear regression with shuffled labels, i.e., $\mathbf Y = \mathbf Π\mathbf X \mathbf B + \mathbf W$, where $\mathbf Y \in \mathbb R^{n\times m}, \mathbf Π \in \mathbb R^{n\times n}, \mathbf X\in \mathbb R^{n\times p}, \mathbf B \in \mathbb R^{p\times m}$, and $\mathbf W\in \mathbb R^{n\times m}$, respectively, represent the sensing results, (unknown or missing) c…

    Submitted 2 October, 2023; originally announced October 2023.
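
    In the simplest scalar case (p = m = 1, so y = Πxb + w with scalar b), the least-squares permutation has a classical closed form: by the rearrangement inequality, the best Π pairs sorted y with x sorted in the same order when b > 0 and in the opposite order when b < 0. A minimal sketch of that special case, assuming hypothetical simulated data; it is not the paper's optimal estimator for general B:

```python
import numpy as np


def shuffled_scalar_regression(x, y):
    """Least-squares b for y = Pi x b + w with unknown permutation Pi and scalar b.

    For either sign of b, the best Pi pairs sorted y with x sorted in the same order
    (b > 0) or the reverse order (b < 0); fit b by least squares for each pairing
    and keep the one with the smaller residual sum of squares.
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    y_sorted = np.sort(y)
    best_b, best_rss = None, np.inf
    for x_paired in (np.sort(x), np.sort(x)[::-1]):
        b = float(x_paired @ y_sorted / (x_paired @ x_paired))
        rss = float(np.sum((y_sorted - b * x_paired) ** 2))
        if rss < best_rss:
            best_b, best_rss = b, rss
    return best_b


# Hypothetical simulated data with the responses shuffled.
rng = np.random.default_rng(0)
x = rng.standard_normal(500)
y_pos = 2.0 * rng.permutation(x) + 0.01 * rng.standard_normal(500)
y_neg = -1.5 * rng.permutation(x) + 0.01 * rng.standard_normal(500)
b_pos = shuffled_scalar_regression(x, y_pos)
b_neg = shuffled_scalar_regression(x, y_neg)
```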

  42. arXiv:2309.16578  [pdf, other]

    stat.ML cs.LG physics.chem-ph

    Overcoming the Barrier of Orbital-Free Density Functional Theory for Molecular Systems Using Deep Learning

    Authors: He Zhang, Siyuan Liu, Jiacheng You, Chang Liu, Shuxin Zheng, Ziheng Lu, Tong Wang, Nanning Zheng, Bin Shao

    Abstract: Orbital-free density functional theory (OFDFT) is a quantum chemistry formulation that has a lower cost scaling than the prevailing Kohn-Sham DFT, which is increasingly desired for contemporary molecular research. However, its accuracy is limited by the kinetic energy density functional, which is notoriously hard to approximate for non-periodic molecular systems. Here we propose M-OFDFT, an OFDFT…

    Submitted 9 March, 2024; v1 submitted 28 September, 2023; originally announced September 2023.

    Comments: Published in Nature Computational Science, March 2024. Full paper with supplementary information

  43. arXiv:2309.16492  [pdf, other]

    stat.ME cs.AI stat.AP stat.ML

    Asset Bundling for Wind Power Forecasting

    Authors: Hanyu Zhang, Mathieu Tanneau, Chaofan Huang, V. Roshan Joseph, Shangkun Wang, Pascal Van Hentenryck

    Abstract: The growing penetration of intermittent, renewable generation in US power grids, especially wind and solar generation, results in increased operational uncertainty. In that context, accurate forecasts are critical, especially for wind generation, which exhibits large variability and is historically harder to predict. To overcome this challenge, this work proposes a novel Bundle-Predict-Reconcile (…

    Submitted 28 September, 2023; originally announced September 2023.

  44. arXiv:2309.11764  [pdf, other]

    stat.ME

    Causal inference with outcome dependent sampling and mismeasured outcome

    Authors: Min Zeng, Zeyang Jia, Zijian Sui, Jinfeng Xu, Hong Zhang

    Abstract: Outcome-dependent sampling designs are extensively utilized in various scientific disciplines, including epidemiology, ecology, and economics, with retrospective case-control studies being specific examples of such designs. Additionally, if the outcome used for sample selection is also mismeasured, then it is even more challenging to estimate the average treatment effect (ATE) accurately. To our k…

    Submitted 20 September, 2023; originally announced September 2023.

    Comments: 49 pages, 5 figures

  45. arXiv:2309.08642  [pdf, other]

    eess.SY cs.AI cs.LG stat.ME

    A Stochastic Online Forecast-and-Optimize Framework for Real-Time Energy Dispatch in Virtual Power Plants under Uncertainty

    Authors: Wei Jiang, Zhongkai Yi, Li Wang, Hanwei Zhang, Jihai Zhang, Fangquan Lin, Cheng Yang

    Abstract: Aggregating distributed energy resources in power systems significantly increases uncertainty, in particular uncertainty caused by fluctuations in renewable energy generation. This issue has created a need to make wide use of advanced predictive control techniques under uncertainty to ensure long-term economics and decarbonization. In this paper, we propose a real-time uncertainty-aware energy dis…

    Submitted 14 September, 2023; originally announced September 2023.

    Comments: Preprint. Accepted by CIKM 23

  46. arXiv:2309.06230  [pdf, other]

    stat.ML cs.LG stat.CO stat.ME

    A Consistent and Scalable Algorithm for Best Subset Selection in Single Index Models

    Authors: Borui Tang, Jin Zhu, Junxian Zhu, Xueqin Wang, Heping Zhang

    Abstract: Analysis of high-dimensional data has led to increased interest in both single index models (SIMs) and best subset selection. SIMs provide an interpretable and flexible modeling framework for high-dimensional data, while best subset selection aims to find a sparse model from a large set of predictors. However, best subset selection in high-dimensional models is known to be computationally intracta…

    Submitted 12 September, 2023; originally announced September 2023.

  47. arXiv:2309.04268  [pdf, other]

    stat.ML cs.LG math.ST

    Optimal Rate of Kernel Regression in Large Dimensions

    Authors: Weihao Lu, Haobo Zhang, Yicheng Li, Manyun Xu, Qian Lin

    Abstract: We study kernel regression for large-dimensional data (where the sample size $n$ depends polynomially on the dimension $d$ of the samples, i.e., $n\asymp d^{\gamma}$ for some $\gamma>0$). We first build a general tool to characterize the upper bound and the minimax lower bound of kernel regression for large-dimensional data through the Mendelson complexity $\varepsilon_{n}^{2}$ and the metr…

    Submitted 8 September, 2023; originally announced September 2023.

    MSC Class: 62G08; 46E22; 68T07

  48. arXiv:2308.13514  [pdf, other]

    stat.ME stat.AP

    Towards more scientific meta-analyses

    Authors: Lily H. Zhang, Menelaos Konstantinidis, Marie-Abèle Bind, Donald B. Rubin

    Abstract: Meta-analysis can be a critical part of the research process, often serving as the primary analysis on which the practitioners, policymakers, and individuals base their decisions. However, current literature synthesis approaches to meta-analysis typically estimate a different quantity than what is implicitly intended; concretely, standard approaches estimate the average effect of a treatment for a…

    Submitted 25 August, 2023; originally announced August 2023.

    Comments: Oral presentation at Cochrane Colloquium 2023

  49. arXiv:2308.11374  [pdf, other]

    stat.ME

    Weighting Based Approaches to Borrowing Historical Controls for Indirect comparison for Time-to-Event Data with a Cure Fraction

    Authors: Jixian Wang, Hongtao Zhang, Ram Tiwari

    Abstract: To use historical controls for indirect comparison with single-arm trials, the population difference between data sources should be adjusted to reduce confounding bias. The adjustment is more difficult for time-to-event data with a cure fraction. We propose different adjustment approaches based on pseudo observations and calibration weighting by entropy balancing. We show a simple way to obtain th…

    Submitted 22 August, 2023; originally announced August 2023.

  50. Fixed-Point Algorithms for Solving the Critical Value and Upper Tail Quantile of Kuiper's Statistics

    Authors: Hong-Yan Zhang, Wei Sun, Xiao Chen, Rui-Jia Lin, Yu Zhou

    Abstract: Kuiper's statistic is a good measure of the difference between an ideal distribution and an empirical distribution in goodness-of-fit testing. However, solving for the critical value and upper tail quantile, or simply the Kuiper pair, of Kuiper's statistic is challenging because of the difficulty of solving a nonlinear equation and of reasonably approximating an infinite series. In this work, the contr…

    Submitted 23 March, 2024; v1 submitted 18 August, 2023; originally announced August 2023.

    Comments: 20 pages, 6 figures, 5 tables, code available on GitHub

    Journal ref: Heliyon, 10(7): e28274, April 15, 2024
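    The quantities in the Kuiper entry above can be sketched in a few lines of Python, assuming the textbook definitions: V = D+ + D- for a sample against a fully specified CDF, and the asymptotic upper-tail series Q(λ) = 2 Σ_{j≥1} (4j²λ² - 1) exp(-2j²λ²). The critical λ for a level α is found here by plain bisection. This is not the fixed-point algorithm the paper proposes. The finite-sample scaling in `kuiper_pvalue` (Stephens' √n + 0.155 + 0.24/√n factor) and all function names are assumptions for illustration.

```python
import math
from typing import Callable, Sequence


def kuiper_statistic(sample: Sequence[float], cdf: Callable[[float], float]) -> float:
    """Kuiper's V = D+ + D- of a sample against a fully specified continuous CDF."""
    xs = sorted(sample)
    n = len(xs)
    if n == 0:
        raise ValueError("sample must be non-empty")
    # D+: largest amount by which the empirical CDF exceeds the model CDF.
    d_plus = max((i + 1) / n - cdf(x) for i, x in enumerate(xs))
    # D-: largest amount by which the model CDF exceeds the empirical CDF.
    d_minus = max(cdf(x) - i / n for i, x in enumerate(xs))
    return d_plus + d_minus


def kuiper_tail(lam: float, terms: int = 100) -> float:
    """Asymptotic upper tail Q(lam) = 2 * sum_{j>=1} (4 j^2 lam^2 - 1) exp(-2 j^2 lam^2)."""
    if lam < 0.4:
        return 1.0  # the series converges slowly here, and Q is essentially 1
    total = 0.0
    for j in range(1, terms + 1):
        jl2 = (j * lam) ** 2
        total += (4.0 * jl2 - 1.0) * math.exp(-2.0 * jl2)
    return min(1.0, max(0.0, 2.0 * total))  # clamp truncation/rounding error


def kuiper_pvalue(v: float, n: int) -> float:
    """Approximate p-value via Stephens' finite-sample scaling of V (an assumption)."""
    root_n = math.sqrt(n)
    return kuiper_tail((root_n + 0.155 + 0.24 / root_n) * v)


def kuiper_critical_lambda(alpha: float, lo: float = 0.4, hi: float = 5.0,
                           tol: float = 1e-10) -> float:
    """Solve Q(lam) = alpha by bisection; Q decreases in lam, and the default
    bracket [0.4, 5.0] contains the root for any practical alpha."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kuiper_tail(mid) > alpha:
            lo = mid  # tail still too heavy: the critical value lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    v = kuiper_statistic([0.1, 0.4, 0.7], lambda x: x)  # uniform(0, 1) model
    print(v, kuiper_pvalue(v, 3))
    print(kuiper_critical_lambda(0.05))
```

    Bisection is used only because it is simple and robust once Q is known to be monotone. A finite-n critical value of V itself would follow by dividing the critical λ by the same scaling factor, under the same approximation.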