Showing 1–50 of 569 results for author: Zhang, J

Searching in archive stat.
  1. arXiv:2409.16534  [pdf, other]

    stat.AP

    Dependencies in Item-Adaptive CAT Data and Differential Item Functioning Detection: A Multilevel Framework

    Authors: Dandan Chen Kaptur, Justin Kern, Chingwei David Shin, Jinming Zhang

    Abstract: This study investigates differential item functioning (DIF) detection in computerized adaptive testing (CAT) using multilevel modeling. We argue that traditional DIF methods have proven ineffective in CAT due to the hierarchical nature of the data. Our proposed two-level model accounts for dependencies between items via provisional ability estimates. Simulations revealed that our model outperforme…

    Submitted 24 September, 2024; originally announced September 2024.

    Comments: 38 pages, preprint

  2. arXiv:2409.13300  [pdf, other]

    stat.ME math.ST

    A Two-stage Inference Procedure for Sample Local Average Treatment Effects in Randomized Experiments

    Authors: Zhen Zhong, Per Johansson, Junni L. Zhang

    Abstract: In a given randomized experiment, individuals are often volunteers and can differ in important ways from a population of interest. It is thus of interest to focus on the sample at hand. This paper focuses on inference about the sample local average treatment effect (LATE) in randomized experiments with non-compliance. We present a two-stage procedure that provides asymptotically correct coverage r…

    Submitted 20 September, 2024; originally announced September 2024.

  3. arXiv:2409.12928  [pdf, other]

    stat.ME

    A general condition for bias attenuation by a nondifferentially mismeasured confounder

    Authors: Jeffrey Zhang, Junu Lee

    Abstract: In real-world studies, the collected confounders may suffer from measurement error. Although mismeasurement of confounders is typically unintentional -- originating from sources such as human oversight or imprecise machinery -- deliberate mismeasurement also occurs and is becoming increasingly more common. For example, in the 2020 U.S. Census, noise was added to measurements to assuage privacy con…

    Submitted 19 September, 2024; originally announced September 2024.

  4. arXiv:2409.12848  [pdf, ps, other]

    stat.ME

    Bridging the Gap Between Design and Analysis: Randomization Inference and Sensitivity Analysis for Matched Observational Studies with Treatment Doses

    Authors: Jeffrey Zhang, Siyu Heng

    Abstract: Matching is a commonly used causal inference study design in observational studies. Through matching on measured confounders between different treatment groups, valid randomization inferences can be conducted under the no unmeasured confounding assumption, and sensitivity analysis can be further performed to assess sensitivity of randomization inference results to potential unmeasured confounding.…

    Submitted 19 September, 2024; originally announced September 2024.

  5. arXiv:2409.09512  [pdf, other]

    stat.ME

    Doubly robust and computationally efficient high-dimensional variable selection

    Authors: Abhinav Chakraborty, Jeffrey Zhang, Eugene Katsevich

    Abstract: The variable selection problem is to discover which of a large set of predictors is associated with an outcome of interest, conditionally on the other predictors. This problem has been widely studied, but existing approaches lack either power against complex alternatives, robustness to model misspecification, computational efficiency, or quantification of evidence against individual hypotheses. We…

    Submitted 14 September, 2024; originally announced September 2024.

  6. arXiv:2409.09355  [pdf, ps, other]

    stat.ME math.ST

    A Random-effects Approach to Regression Involving Many Categorical Predictors and Their Interactions

    Authors: Hanmei Sun, Jiangshan Zhang, Jiming Jiang

    Abstract: Linear model prediction with a large number of potential predictors is both statistically and computationally challenging. The traditional approaches are largely based on shrinkage selection/estimation methods, which are applicable even when the number of potential predictors is (much) larger than the sample size. A situation of the latter scenario occurs when the candidate predictors involve many…

    Submitted 14 September, 2024; originally announced September 2024.

    Comments: 28 pages

  7. arXiv:2409.06530  [pdf, ps, other]

    math.OC cs.LG stat.ML

    Functionally Constrained Algorithm Solves Convex Simple Bilevel Problems

    Authors: Huaqing Zhang, Lesi Chen, Jing Xu, Jingzhao Zhang

    Abstract: This paper studies simple bilevel problems, where a convex upper-level function is minimized over the optimal solutions of a convex lower-level problem. We first show the fundamental difficulty of simple bilevel problems, that the approximate optimal value of such problems is not obtainable by first-order zero-respecting algorithms. Then we follow recent works to pursue the weak approximate soluti…

    Submitted 10 September, 2024; originally announced September 2024.

  8. arXiv:2409.05276  [pdf, ps, other]

    stat.ME

    An Eigengap Ratio Test for Determining the Number of Communities in Network Data

    Authors: Yujia Wu, Jingfei Zhang, Wei Lan, Chih-Ling Tsai

    Abstract: To characterize the community structure in network data, researchers have introduced various block-type models, including the stochastic block model, degree-corrected stochastic block model, mixed membership block model, degree-corrected mixed membership block model, and others. A critical step in applying these models effectively is determining the number of communities in the network. However, t…

    Submitted 8 September, 2024; originally announced September 2024.

  9. arXiv:2408.15419  [pdf, other]

    stat.AP

    Bayesian Inference General Procedures for A Single-subject Test Study

    Authors: Jie Li, Gary Green, Sarah J. A. Carr, Peng Liu, Jian Zhang

    Abstract: Abnormality detection in the identification of a single-subject which deviates from the majority of the dataset that comes from a control group is a critical problem. A common approach is to assume that the control group can be characterised in terms of standard Normal statistics and the detection of single abnormal subject is in that context. But in many situations the control group can not be de…

    Submitted 10 September, 2024; v1 submitted 27 August, 2024; originally announced August 2024.

    Comments: 35 pages, 12 figures and 10 tables

  10. arXiv:2408.13702  [pdf, ps, other]

    stat.AP

    Examining Differential Item Functioning (DIF) in Self-Reported Health Survey Data: Via Multilevel Modeling

    Authors: Dandan Chen Kaptur, Yiqing Liu, Bradley Kaptur, Nicholas Peterman, Jinming Zhang, Justin Kern, Carolyn Anderson

    Abstract: Few health-related constructs or measures have received critical evaluation in terms of measurement equivalence, such as self-reported health survey data. Differential item functioning (DIF) analysis is crucial for evaluating measurement equivalence in self-reported health surveys, which are often hierarchical in structure. While traditional DIF methods rely on single-level models, multilevel mode…

    Submitted 24 August, 2024; originally announced August 2024.

    Comments: preprint, 11 pages (excluding references)

  11. arXiv:2408.13430  [pdf, other]

    stat.AP cs.DL cs.GT cs.LG stat.ML

    Analysis of the ICML 2023 Ranking Data: Can Authors' Opinions of Their Own Papers Assist Peer Review in Machine Learning?

    Authors: Buxin Su, Jiayao Zhang, Natalie Collina, Yuling Yan, Didong Li, Kyunghyun Cho, Jianqing Fan, Aaron Roth, Weijie J. Su

    Abstract: We conducted an experiment during the review process of the 2023 International Conference on Machine Learning (ICML) that requested authors with multiple submissions to rank their own papers based on perceived quality. We received 1,342 rankings, each from a distinct author, pertaining to 2,592 submissions. In this paper, we present an empirical analysis of how author-provided rankings could be le…

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: See more details about the experiment at https://openrank.cc/

  12. arXiv:2408.11922  [pdf, other]

    stat.AP

    Evaluating Four Methods for Detecting Differential Item Functioning in Large-Scale Assessments with More Than Two Groups

    Authors: Dandan Chen Kaptur, Jinming Zhang

    Abstract: This study evaluated four multi-group differential item functioning (DIF) methods (the root mean square deviation approach, Wald-1, generalized logistic regression procedure, and generalized Mantel-Haenszel method) via Monte Carlo simulation of controlled testing conditions. These conditions varied in the number of groups, the ability and sample size of the DIF-contaminated group, the parameter as…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: preprint, 16 pages (excluding figures, references, and title page)

  13. arXiv:2407.15439  [pdf, other]

    cs.LG stat.ML

    Merit-based Fair Combinatorial Semi-Bandit with Unrestricted Feedback Delays

    Authors: Ziqun Chen, Kechao Cai, Zhuoyue Chen, Jinbei Zhang, John C. S. Lui

    Abstract: We study the stochastic combinatorial semi-bandit problem with unrestricted feedback delays under merit-based fairness constraints. This is motivated by applications such as crowdsourcing, and online advertising, where immediate feedback is not immediately available and fairness among different choices (or arms) is crucial. We consider two types of unrestricted feedback delays: reward-independent…

    Submitted 29 July, 2024; v1 submitted 22 July, 2024; originally announced July 2024.

    Comments: 28 pages, 9 figures, accepted for 27th European Conference on Artificial Intelligence (ECAI 2024), Source code added, Typo fixed

  14. arXiv:2407.14976  [pdf, other]

    stat.AP q-bio.PE

    Multiple merger coalescent inference of effective population size

    Authors: Julie Zhang, Julia A. Palacios

    Abstract: Variation in a sample of molecular sequence data informs about the past evolutionary history of the sample's population. Traditionally, Bayesian modeling coupled with the standard coalescent, is used to infer the sample's bifurcating genealogy and demographic and evolutionary parameters such as effective population size, and mutation rates. However, there are many situations where binary coalescen…

    Submitted 20 July, 2024; originally announced July 2024.

    Comments: 22 pages, 11 figures, 4 tables

  15. arXiv:2407.12835  [pdf, ps, other]

    cs.CL cs.AI stat.ML

    Regurgitative Training: The Value of Real Data in Training Large Language Models

    Authors: Jinghui Zhang, Dandan Qiao, Mochen Yang, Qiang Wei

    Abstract: What happens if we train a new Large Language Model (LLM) using data that are at least partially generated by other LLMs? The explosive success of LLMs means that a substantial amount of content online will be generated by LLMs rather than humans, which will inevitably enter the training datasets of next-generation LLMs. We evaluate the implications of such "regurgitative training" on LLM performa…

    Submitted 25 July, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

  16. arXiv:2407.11901  [pdf, other]

    stat.ML cs.LG stat.CO stat.ME

    Combining Wasserstein-1 and Wasserstein-2 proximals: robust manifold learning via well-posed generative flows

    Authors: Hyemin Gu, Markos A. Katsoulakis, Luc Rey-Bellet, Benjamin J. Zhang

    Abstract: We formulate well-posed continuous-time generative flows for learning distributions that are supported on low-dimensional manifolds through Wasserstein proximal regularizations of $f$-divergences. Wasserstein-1 proximal operators regularize $f$-divergences so that singular distributions can be compared. Meanwhile, Wasserstein-2 proximal operators regularize the paths of the generative flows by add…

    Submitted 16 July, 2024; originally announced July 2024.

  17. arXiv:2407.04819  [pdf, other]

    cs.LG cs.AI cs.CV cs.IT stat.ML

    RPN: Reconciled Polynomial Network Towards Unifying PGMs, Kernel SVMs, MLP and KAN

    Authors: Jiawei Zhang

    Abstract: In this paper, we will introduce a novel deep model named Reconciled Polynomial Network (RPN) for deep function learning. RPN has a very general architecture and can be used to build models with various complexities, capacities, and levels of completeness, which all contribute to the correctness of these models. As indicated in the subtitle, RPN can also serve as the backbone to unify different ba…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: 110 pages, 31 figures, 33 tables

  18. arXiv:2406.17184  [pdf, ps, other]

    cs.LG stat.ML

    Minimax Optimality in Contextual Dynamic Pricing with General Valuation Models

    Authors: Xueping Gong, Jiheng Zhang

    Abstract: Dynamic pricing, the practice of adjusting prices based on contextual factors, has gained significant attention due to its impact on revenue maximization. In this paper, we address the contextual dynamic pricing problem, which involves pricing decisions based on observable product features and customer characteristics. We propose a novel algorithm that achieves improved regret bounds while minimiz…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: 29 pages

  19. arXiv:2406.15514  [pdf, other]

    physics.soc-ph q-bio.PE stat.ME

    How big does a population need to be before demographers can ignore individual-level randomness in demographic events?

    Authors: John Bryant, Tahu Kukutai, Junni L. Zhang

    Abstract: When studying a national-level population, demographers can safely ignore the effect of individual-level randomness on age-sex structure. When studying a single community, or group of communities, however, the potential importance of individual-level randomness is less clear. We seek to measure the effect of individual-level randomness in births and deaths on standard summary indicators of age-sex…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 28 pages, 8 figures, 3 tables

    MSC Class: 91-XX

  20. arXiv:2406.10537  [pdf, other]

    cs.LG cs.AI stat.ML

    Scalable Differentiable Causal Discovery in the Presence of Latent Confounders with Skeleton Posterior (Extended Version)

    Authors: Pingchuan Ma, Rui Ding, Qiang Fu, Jiaru Zhang, Shuai Wang, Shi Han, Dongmei Zhang

    Abstract: Differentiable causal discovery has made significant advancements in the learning of directed acyclic graphs. However, its application to real-world datasets remains restricted due to the ubiquity of latent confounders and the requirement to learn maximal ancestral graphs (MAGs). To date, existing differentiable MAG learning algorithms have been limited to small datasets and failed to scale to lar…

    Submitted 15 June, 2024; originally announced June 2024.

  21. arXiv:2406.04487  [pdf, other]

    cs.LG stat.ML

    A multi-core periphery perspective: Ranking via relative centrality

    Authors: Chandra Sekhar Mukherjee, Jiapeng Zhang

    Abstract: Community and core-periphery are two widely studied graph structures, with their coexistence observed in real-world graphs (Rombach, Porter, Fowler & Mucha [SIAM J. App. Math. 2014, SIAM Review 2017]). However, the nature of this coexistence is not well understood and has been pointed out as an open problem (Yanchenko & Sengupta [Statistics Surveys, 2023]). Especially, the impact of inferring th…

    Submitted 6 June, 2024; originally announced June 2024.

  22. arXiv:2406.01823  [pdf, other]

    cs.LG cs.AI stat.ME stat.ML

    Causal Discovery with Fewer Conditional Independence Tests

    Authors: Kirankumar Shiragur, Jiaqi Zhang, Caroline Uhler

    Abstract: Many questions in science center around the fundamental problem of understanding causal relationships. However, most constraint-based causal discovery algorithms, including the well-celebrated PC algorithm, often incur an exponential number of conditional independence (CI) tests, posing limitations in various applications. Addressing this, our work focuses on characterizing what can be learned abo…

    Submitted 3 June, 2024; originally announced June 2024.

  23. arXiv:2405.18979  [pdf, other]

    cs.LG stat.ML

    MANO: Exploiting Matrix Norm for Unsupervised Accuracy Estimation Under Distribution Shifts

    Authors: Renchunzi Xie, Ambroise Odonnat, Vasilii Feofanov, Weijian Deng, Jianfeng Zhang, Bo An

    Abstract: Leveraging the models' outputs, specifically the logits, is a common approach to estimating the test accuracy of a pre-trained neural network on out-of-distribution (OOD) samples without requiring access to the corresponding ground truth labels. Despite their ease of implementation and computational efficiency, current logit-based methods are vulnerable to overconfidence issues, leading to predict…

    Submitted 24 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

    Comments: The three first authors contributed equally

  24. arXiv:2405.15754  [pdf, ps, other]

    stat.ML cs.LG math.ST

    Score-based generative models are provably robust: an uncertainty quantification perspective

    Authors: Nikiforos Mimikos-Stamatopoulos, Benjamin J. Zhang, Markos A. Katsoulakis

    Abstract: Through an uncertainty quantification (UQ) perspective, we show that score-based generative models (SGMs) are provably robust to the multiple sources of error in practical implementation. Our primary tool is the Wasserstein uncertainty propagation (WUP) theorem, a model-form UQ bound that describes how the $L^2$ error from learning the score function propagates to a Wasserstein-1 ($\mathbf{d}_1$)…

    Submitted 24 May, 2024; originally announced May 2024.

  25. arXiv:2405.15038  [pdf, other]

    stat.ME

    Preferential Latent Space Models for Networks with Textual Edges

    Authors: Maoyu Zhang, Biao Cai, Dong Li, Xiaoyue Niu, Jingfei Zhang

    Abstract: Many real-world networks contain rich textual information in the edges, such as email networks where an edge between two nodes is an email exchange. Other examples include co-author networks and social media networks. The useful textual information carried in the edges is often discarded in most network analyses, resulting in an incomplete view of the relationships between nodes. In this work, we…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 30 pages

    MSC Class: G.3; F.2 ACM Class: G.3

  26. arXiv:2405.12421  [pdf, other]

    cs.LG cs.AI stat.ML

    A Unified Linear Programming Framework for Offline Reward Learning from Human Demonstrations and Feedback

    Authors: Kihyun Kim, Jiawei Zhang, Asuman Ozdaglar, Pablo A. Parrilo

    Abstract: Inverse Reinforcement Learning (IRL) and Reinforcement Learning from Human Feedback (RLHF) are pivotal methodologies in reward learning, which involve inferring and shaping the underlying reward function of sequential decision-making problems based on observed human demonstrations and feedback. Most prior work in reward learning has relied on prior knowledge or assumptions about decision or prefer…

    Submitted 3 June, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

    Comments: ICML 2024

  27. arXiv:2405.10991  [pdf, other]

    cs.LG cs.AI stat.ME

    Relative Counterfactual Contrastive Learning for Mitigating Pretrained Stance Bias in Stance Detection

    Authors: Jiarui Zhang, Shaojuan Wu, Xiaowang Zhang, Zhiyong Feng

    Abstract: Stance detection classifies stance relations (namely, Favor, Against, or Neither) between comments and targets. Pretrained language models (PLMs) are widely used to mine the stance relation to improve the performance of stance detection through pretrained knowledge. However, PLMs also embed "bad" pretrained knowledge concerning stance into the extracted stance relation semantics, resulting in pr…

    Submitted 16 May, 2024; originally announced May 2024.

  28. arXiv:2405.08235  [pdf, other]

    stat.ML cs.LG

    Additive-Effect Assisted Learning

    Authors: Jiawei Zhang, Yuhong Yang, Jie Ding

    Abstract: It is quite popular nowadays for researchers and data analysts holding different datasets to seek assistance from each other to enhance their modeling performance. We consider a scenario where different learners hold datasets with potentially distinct variables, and their observations can be aligned by a nonprivate identifier. Their collaboration faces the following difficulties: First, learners m…

    Submitted 13 May, 2024; originally announced May 2024.

  29. arXiv:2405.06613  [pdf, other]

    stat.ME

    Simultaneously detecting spatiotemporal changes with penalized Poisson regression models

    Authors: Zerui Zhang, Xin Wang, Xin Zhang, Jing Zhang

    Abstract: In the realm of large-scale spatiotemporal data, abrupt changes are commonly occurring across both spatial and temporal domains. This study aims to address the concurrent challenges of detecting change points and identifying spatial clusters within spatiotemporal count data. We introduce an innovative method based on the Poisson regression model, employing doubly fused penalization to unveil the u…

    Submitted 10 May, 2024; originally announced May 2024.

  30. arXiv:2405.01275  [pdf, other]

    stat.ME

    Variable Selection in Ultra-high Dimensional Feature Space for the Cox Model with Interval-Censored Data

    Authors: Daewoo Pak, Jianrui Zhang, Di Wu, Haolei Weng, Chenxi Li

    Abstract: We develop a set of variable selection methods for the Cox model under interval censoring, in the ultra-high dimensional setting where the dimensionality can grow exponentially with the sample size. The methods select covariates via a penalized nonparametric maximum likelihood estimation with some popular penalty functions, including lasso, adaptive lasso, SCAD, and MCP. We prove that our penalize…

    Submitted 2 May, 2024; originally announced May 2024.

  31. arXiv:2404.19557  [pdf, other]

    stat.ML cs.LG

    Neural Dynamic Data Valuation

    Authors: Zhangyong Liang, Huanhuan Gao, Ji Zhang

    Abstract: Data constitute the foundational component of the data economy and its marketplaces. Efficient and fair data valuation has emerged as a topic of significant interest. Many approaches based on marginal contribution have shown promising results in various downstream tasks. However, they are well known to be computationally expensive as they require training a large number of utility functions, whic…

    Submitted 12 June, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

    Comments: 43 pages, 19 figures

  32. arXiv:2404.02093  [pdf, other]

    stat.ME

    High-dimensional covariance regression with application to co-expression QTL detection

    Authors: Rakheon Kim, Jingfei Zhang

    Abstract: While covariance matrices have been widely studied in many scientific fields, relatively limited progress has been made on estimating conditional covariances that permits a large covariance matrix to vary with high-dimensional subject-level covariates. In this paper, we present a new sparse multivariate regression framework that models the covariance matrix as a function of subject-level covariate…

    Submitted 2 April, 2024; originally announced April 2024.

  33. arXiv:2403.16059  [pdf, other]

    stat.ML cs.LG math.OC

    Manifold Regularization Classification Model Based On Improved Diffusion Map

    Authors: Hongfu Guo, Wencheng Zou, Zeyu Zhang, Shuishan Zhang, Ruitong Wang, Jintao Zhang

    Abstract: Manifold regularization model is a semi-supervised learning model that leverages the geometric structure of a dataset, comprising a small number of labeled samples and a large number of unlabeled samples, to generate classifiers. However, the original manifold norm limits the performance of models to local regions. To address this limitation, this paper proposes an approach to improve manifold reg…

    Submitted 24 March, 2024; originally announced March 2024.

    Comments: 20 pages, 24 figures

  34. arXiv:2403.12448  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Do Generated Data Always Help Contrastive Learning?

    Authors: Yifei Wang, Jizhe Zhang, Yisen Wang

    Abstract: Contrastive Learning (CL) has emerged as one of the most successful paradigms for unsupervised visual representation learning, yet it often depends on intensive manual data augmentations. With the rise of generative models, especially diffusion models, the ability to generate realistic images close to the real data distribution has been well recognized. These generated high-quality images have be…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: 19 pages. Accepted by ICLR 2024

  35. arXiv:2403.05759  [pdf, other]

    cs.LG cs.AI stat.ME stat.ML

    Membership Testing in Markov Equivalence Classes via Independence Query Oracles

    Authors: Jiaqi Zhang, Kirankumar Shiragur, Caroline Uhler

    Abstract: Understanding causal relationships between variables is a fundamental problem with broad impact in numerous scientific fields. While extensive research has been dedicated to learning causal graphs from data, its complementary concept of testing causal relationships has remained largely unexplored. While learning involves the task of recovering the Markov equivalence class (MEC) of the underlying c…

    Submitted 8 March, 2024; originally announced March 2024.

  36. arXiv:2403.05647  [pdf, other]

    stat.ME stat.CO

    Minor Issues Escalated to Critical Levels in Large Samples: A Permutation-Based Fix

    Authors: Xuekui Zhang, Li Xing, Jing Zhang, Soojeong Kim

    Abstract: In the big data era, the need to reevaluate traditional statistical methods is paramount due to the challenges posed by vast datasets. While larger samples theoretically enhance accuracy and hypothesis testing power without increasing false positives, practical concerns about inflated Type-I errors persist. The prevalent belief is that larger samples can uncover subtle effects, necessitating dual…

    Submitted 8 March, 2024; originally announced March 2024.

  37. arXiv:2402.17287  [pdf, other]

    cs.LG cs.CV stat.ML

    An Interpretable Evaluation of Entropy-based Novelty of Generative Models

    Authors: Jingwei Zhang, Cheuk Ting Li, Farzan Farnia

    Abstract: The massive developments of generative model frameworks require principled methods for the evaluation of a model's novelty compared to a reference dataset. While the literature has extensively studied the evaluation of the quality, diversity, and generalizability of generative models, the assessment of a model's novelty compared to a reference model has not been adequately explored in the machine…

    Submitted 13 June, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

  38. arXiv:2402.14942  [pdf, other]

    stat.ME

    On Identification of Dynamic Treatment Regimes with Proxies of Hidden Confounders

    Authors: Jeffrey Zhang, Eric Tchetgen Tchetgen

    Abstract: We consider identification of optimal dynamic treatment regimes in a setting where time-varying treatments are confounded by hidden time-varying confounders, but proxy variables of the unmeasured confounders are available. We show that, with two independent proxy variables at each time point that are sufficiently relevant for the hidden confounders, identification of the joint distribution of coun…

    Submitted 22 February, 2024; originally announced February 2024.

  39. arXiv:2402.12655  [pdf, other]

    cs.SI stat.AP

    Ego Group Partition: A Novel Framework for Improving Ego Experiments in Social Networks

    Authors: Lu Deng, JingJing Zhang, Yong Wang, Chuan Chen

    Abstract: Estimating the average treatment effect in social networks is challenging due to individuals influencing each other. One approach to address interference is ego cluster experiments, where each cluster consists of a central individual (ego) and its peers (alters). Clusters are randomized, and only the effects on egos are measured. In this work, we propose an improved framework for ego cluster exper…

    Submitted 19 February, 2024; originally announced February 2024.

  40. arXiv:2402.12653  [pdf, other]

    cs.SI stat.AP

    Unbiased Estimation for Total Treatment Effect Under Interference Using Aggregated Dyadic Data

    Authors: Lu Deng, Yilin Li, JingJing Zhang, Yong Wang, Chuan Chen

    Abstract: In social media platforms, user behavior is often influenced by interactions with other users, complicating the accurate estimation of causal effects in traditional A/B experiments. This study investigates situations where an individual's outcome can be broken down into the sum of multiple pairwise outcomes, a reflection of user interactions. These outcomes, referred to as dyadic data, are prevale…

    Submitted 19 February, 2024; originally announced February 2024.

  41. arXiv:2402.10357  [pdf, other]

    math.ST cs.LG math.PR stat.CO stat.ML

    Efficient Sampling on Riemannian Manifolds via Langevin MCMC

    Authors: Xiang Cheng, Jingzhao Zhang, Suvrit Sra

    Abstract: We study the task of efficiently sampling from a Gibbs distribution $d π^* = e^{-h} d {vol}_g$ over a Riemannian manifold $M$ via (geometric) Langevin MCMC; this algorithm involves computing exponential maps in random Gaussian directions and is efficiently implementable in practice. The key to our analysis of Langevin MCMC is a bound on the discretization error of the geometric Euler-Maruyama sche…

    Submitted 15 February, 2024; originally announced February 2024.

    Comments: This is an old paper from NeurIPS 2022. arXiv admin note: text overlap with arXiv:2204.13665

  42. arXiv:2402.06162  [pdf, other]

    stat.ML cs.LG

    Wasserstein proximal operators describe score-based generative models and resolve memorization

    Authors: Benjamin J. Zhang, Siting Liu, Wuchen Li, Markos A. Katsoulakis, Stanley J. Osher

    Abstract: We focus on the fundamental mathematical structure of score-based generative models (SGMs). We first formulate SGMs in terms of the Wasserstein proximal operator (WPO) and demonstrate that, via mean-field games (MFGs), the WPO formulation reveals mathematical structure that describes the inductive bias of diffusion and score-based models. In particular, MFGs yield optimality conditions in the form…

    Submitted 8 February, 2024; originally announced February 2024.

  43. arXiv:2402.06010  [pdf, other]

    cs.LG stat.ML

    NPSVC++: Nonparallel Classifiers Encounter Representation Learning

    Authors: Junhong Zhang, Zhihui Lai, Jie Zhou, Guangfei Liang

    Abstract: This paper focuses on a specific family of classifiers called nonparallel support vector classifiers (NPSVCs). Different from typical classifiers, the training of an NPSVC involves the minimization of multiple objectives, resulting in the potential concerns of feature suboptimality and class dependency. Consequently, no effective learning scheme has been established to improve NPSVCs' performance…

    Submitted 8 February, 2024; originally announced February 2024.

  44. arXiv:2402.04885  [pdf, other]

    stat.ML cs.AI cs.LG

    A Unified Gaussian Process for Branching and Nested Hyperparameter Optimization

    Authors: Jiazhao Zhang, Ying Hung, Chung-Ching Lin, Zicheng Liu

    Abstract: Choosing appropriate hyperparameters plays a crucial role in the success of neural networks, as hyperparameters directly control the behavior and performance of the training algorithms. To obtain efficient tuning, Bayesian optimization methods based on Gaussian process (GP) models are widely used. Despite numerous applications of Bayesian optimization in deep learning, the existing methodologies a…

    Submitted 19 January, 2024; originally announced February 2024.

  45. arXiv:2402.01607  [pdf, other]

    cs.AI cs.CV cs.LG cs.NE stat.ME

    Natural Counterfactuals With Necessary Backtracking

    Authors: Guang-Yuan Hao, Jiji Zhang, Biwei Huang, Hao Wang, Kun Zhang

    Abstract: Counterfactual reasoning is pivotal in human cognition and especially important for providing explanations and making decisions. While Judea Pearl's influential approach is theoretically elegant, its generation of a counterfactual scenario often requires interventions that are too detached from the real scenarios to be feasible. In response, we propose a framework of natural counterfactuals and a…

    Submitted 20 February, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

  46. arXiv:2401.15309  [pdf, other]

    stat.ME

    Zero-inflated Smoothing Spline (ZISS) Models for Individual-level Single-cell Temporal Data

    Authors: Yifu Tang, Yi Zhang, Yue Wang, Jingyi Zhang, Xiaoxiao Sun

    Abstract: Recent advancements in single-cell RNA-sequencing (scRNA-seq) have enhanced our understanding of cell heterogeneity at a high resolution. With the ability to sequence over 10,000 cells per hour, researchers can collect large scRNA-seq datasets for different participants, offering an opportunity to study the temporal progression of individual-level single-cell data. However, the presence of excessi…

    Submitted 27 January, 2024; originally announced January 2024.

  47. arXiv:2401.13094  [pdf, other]

    stat.ME

    On cross-validated estimation of skew normal model

    Authors: Jian Zhang, Tong Wang

    Abstract: The skew normal model suffers from inferential drawbacks, namely singular Fisher information in the vicinity of symmetry and divergence of the maximum likelihood estimate. To address these drawbacks, Azzalini and Arellano-Valle (2013) introduced maximum penalised likelihood estimation (MPLE) by subtracting a penalty function from the log-likelihood function with a pre-specified penalty coefficient. H…

    Submitted 23 January, 2024; originally announced January 2024.

  48. arXiv:2401.09660  [pdf]

    stat.AP

    Data-Driven Assessment of the County-Level Breast Cancer Incidence in the United States: Impacts of Modifiable and Non-Modifiable Factors

    Authors: Tingting Zhao, Qing Han, Jinfeng Zhang

    Abstract: The female breast cancer (FBC) incidence rate (IR) varies greatly by county across the United States (US). Factors responsible for such high spatial disparities are not well understood, making it challenging to design effective intervention strategies. We predicted FBC IRs using prevailing machine learning techniques for 1,754 US counties with a female population over 10,000. Outlier counties with t…

    Submitted 17 January, 2024; originally announced January 2024.

  49. arXiv:2401.06909  [pdf, other]

    stat.ME

    Sensitivity Analysis for Matched Observational Studies with Continuous Exposures and Binary Outcomes

    Authors: Jeffrey Zhang, Dylan Small, Siyu Heng

    Abstract: Matching is one of the most widely used study designs for adjusting for measured confounders in observational studies. However, unmeasured confounding may exist and cannot be removed by matching. Therefore, a sensitivity analysis is typically needed to assess a causal conclusion's sensitivity to unmeasured confounding. Sensitivity analysis frameworks for binary exposures have been well-established…

    Submitted 12 January, 2024; originally announced January 2024.

  50. arXiv:2401.06134  [pdf]

    econ.GN stat.AP

    Synergy or Rivalry? Glimpses of Regional Modernization and Public Service Equalization: A Case Study from China

    Authors: Shengwen Shi, Jian'an Zhang

    Abstract: For most developing countries, increasing the equalization of basic public services is widely recognized as an effective channel to improve people's sense of contentment. However, for many emerging economies like China, the equalization level of basic public services may often be neglected in the trade-off between the speed and quality of development. Taking the Yangtze River Delta region of China…

    Submitted 21 November, 2023; originally announced January 2024.