Showing 1–50 of 71 results for author: Jin, Y

Searching in archive stat.
  1. arXiv:2410.05594  [pdf, ps, other]

    stat.ME

    Comparing HIV Vaccine Immunogenicity across Trials with Different Populations and Study Designs

    Authors: Yutong Jin, Alex Luedtke, Zoe Moodie, Holly Janes, David Benkeser

    Abstract: Safe and effective preventive vaccines have the potential to help stem the HIV epidemic. The efficacy of such vaccines is typically measured in randomized, double-blind phase IIb/III trials and described as a reduction in newly acquired HIV infections. However, such trials are often expensive, time-consuming, and/or logistically challenging. These challenges lead to a great interest in immune resp…

    Submitted 7 October, 2024; originally announced October 2024.

  2. arXiv:2409.19314  [pdf, other]

    stat.AP

    Re-evaluating the impact of reduced malaria prevalence on birthweight in sub-Saharan Africa: A pair-of-pairs study via two-stage bipartite and non-bipartite matching

    Authors: Pengyun Wang, Ping Huang, Yifan Jin, Yanxin Shen, Omar El Shahawy, Dae Woong Ham, Wendy P. O'Meara, Siyu Heng

    Abstract: According to the WHO, in 2021, about 32% of pregnant women in sub-Saharan Africa were infected with malaria during pregnancy. Malaria infection during pregnancy can cause various adverse birth outcomes such as low birthweight. Over the past two decades, while some sub-Saharan African areas have experienced a large reduction in malaria prevalence due to improved malaria control and treatments, othe…

    Submitted 28 September, 2024; originally announced September 2024.

  3. arXiv:2409.07018  [pdf, other]

    stat.ME

    Clustered Factor Analysis for Multivariate Spatial Data

    Authors: Yanxiu Jin, Tomoya Wakayama, Renhe Jiang, Shonosuke Sugasawa

    Abstract: Factor analysis has been extensively used to reveal the dependence structures among multivariate variables, offering valuable insight in various fields. However, it cannot incorporate the spatial heterogeneity that is typically present in spatial data. To address this issue, we introduce an effective method specifically designed to discover the potential dependence structures in multivariate spati…

    Submitted 11 September, 2024; originally announced September 2024.

  4. arXiv:2408.08450  [pdf, other]

    stat.AP stat.ME

    Smooth and shape-constrained quantile distributed lag models

    Authors: Yisen Jin, Aaron J. Molstad, Ander Wilson, Joseph Antonelli

    Abstract: Exposure to environmental pollutants during the gestational period can significantly impact infant health outcomes, such as birth weight and neurological development. Identifying critical windows of susceptibility, which are specific periods during pregnancy when exposure has the most profound effects, is essential for developing targeted interventions. Distributed lag models (DLMs) are widely use…

    Submitted 15 August, 2024; originally announced August 2024.

  5. arXiv:2406.08709  [pdf, other]

    cs.LG stat.ME

    Introducing Diminutive Causal Structure into Graph Representation Learning

    Authors: Hang Gao, Peng Qiao, Yifan Jin, Fengge Wu, Jiangmeng Li, Changwen Zheng

    Abstract: When engaging in end-to-end graph representation learning with Graph Neural Networks (GNNs), the intricate causal relationships and rules inherent in graph data pose a formidable challenge for the model in accurately capturing authentic data relationships. A proposed mitigating strategy involves the direct integration of rules or relationships corresponding to the graph data into the model. Howeve…

    Submitted 12 June, 2024; originally announced June 2024.

  6. arXiv:2406.04167  [pdf, other]

    stat.ME stat.AP

    Comparing estimators of discriminative performance of time-to-event models

    Authors: Ying Jin, Andrew Leroux

    Abstract: Predicting the timing and occurrence of events is a major focus of data science applications, especially in the context of biomedical research. Performance for models estimating these outcomes, often referred to as time-to-event or survival outcomes, is frequently summarized using measures of discrimination, in particular time-dependent AUC and concordance. Many estimators for these quantities hav…

    Submitted 6 June, 2024; originally announced June 2024.

  7. arXiv:2405.14374  [pdf, other]

    stat.ML cs.AI cs.LG

    State-Constrained Offline Reinforcement Learning

    Authors: Charles A. Hepburn, Yue Jin, Giovanni Montana

    Abstract: Traditional offline reinforcement learning methods predominantly operate in a batch-constrained setting. This confines the algorithms to a specific state-action distribution present in the dataset, reducing the effects of distributional shift but restricting the algorithm greatly. In this paper, we alleviate this limitation by introducing a novel framework named state-constrained offline re…

    Submitted 23 May, 2024; originally announced May 2024.

  8. arXiv:2405.10301  [pdf, other]

    stat.ML cs.AI cs.LG

    Conformal Alignment: Knowing When to Trust Foundation Models with Guarantees

    Authors: Yu Gui, Ying Jin, Zhimei Ren

    Abstract: Before deploying outputs from foundation models in high-stakes tasks, it is imperative to ensure that they align with human values. For instance, in radiology report generation, reports generated by a vision-language model must align with human evaluations before their use in medical decision-making. This paper presents Conformal Alignment, a general framework for identifying units whose outputs m…

    Submitted 21 May, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

  9. arXiv:2404.08284  [pdf]

    stat.ME

    A unified generalization of inverse regression via adaptive column selection

    Authors: Yin Jin, Wei Luo

    Abstract: A bottleneck of sufficient dimension reduction (SDR) in the modern era is that, among numerous methods, only the sliced inverse regression (SIR) is generally applicable under the high-dimensional settings. The higher-order inverse regression methods, which form a major family of SDR methods that are superior to SIR in the population level, suffer from the dimensionality of their intermediate matri…

    Submitted 23 July, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

    Comments: submitted to the "Journal of the Royal Statistical Society, Series B (Statistical Methodology)"

  10. arXiv:2403.19720  [pdf, other]

    math.ST cs.LG stat.ML

    Meta-Learning with Generalized Ridge Regression: High-dimensional Asymptotics, Optimality and Hyper-covariance Estimation

    Authors: Yanhao Jin, Krishnakumar Balasubramanian, Debashis Paul

    Abstract: Meta-learning involves training models on a variety of training tasks in a way that enables them to generalize well on new, unseen test tasks. In this work, we consider meta-learning within the framework of high-dimensional multivariate random-effects linear models and study generalized ridge-regression based predictions. The statistical intuition of using generalized ridge regression in this sett…

    Submitted 27 March, 2024; originally announced March 2024.

  11. arXiv:2403.03868  [pdf, other]

    stat.ME math.ST stat.ML

    Confidence on the Focal: Conformal Prediction with Selection-Conditional Coverage

    Authors: Ying Jin, Zhimei Ren

    Abstract: Conformal prediction builds marginally valid prediction intervals that cover the unknown outcome of a randomly drawn new test point with a prescribed probability. However, a common scenario in practice is that, after seeing the data, practitioners decide which test unit(s) to focus on in a data-driven manner and seek for uncertainty quantification of the focal unit(s). In such cases, marginally va…

    Submitted 24 March, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

  12. arXiv:2402.11742  [pdf, other]

    cs.LG stat.ML

    Balanced Data, Imbalanced Spectra: Unveiling Class Disparities with Spectral Imbalance

    Authors: Chiraag Kaushik, Ran Liu, Chi-Heng Lin, Amrit Khera, Matthew Y Jin, Wenrui Ma, Vidya Muthukumar, Eva L Dyer

    Abstract: Classification models are expected to perform equally well for different classes, yet in practice, there are often large gaps in their performance. This issue of class bias is widely studied in cases of datasets with sample imbalance, but is relatively overlooked in balanced datasets. In this work, we introduce the concept of spectral imbalance in features as a potential source for class dispariti…

    Submitted 3 June, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

    Comments: 25 pages, 9 figures

  13. arXiv:2312.09613  [pdf, other]

    cs.LG cs.AI stat.ML

    Rethinking Causal Relationships Learning in Graph Neural Networks

    Authors: Hang Gao, Chengyu Yao, Jiangmeng Li, Lingyu Si, Yifan Jin, Fengge Wu, Changwen Zheng, Huaping Liu

    Abstract: Graph Neural Networks (GNNs) demonstrate their significance by effectively modeling complex interrelationships within graph-structured data. To enhance the credibility and robustness of GNNs, it becomes exceptionally crucial to bolster their ability to capture causal relationships. However, despite recent advancements that have indeed strengthened GNNs from a causal learning perspective, conductin…

    Submitted 15 December, 2023; originally announced December 2023.

  14. arXiv:2309.01056  [pdf, other]

    stat.AP stat.ME

    Diagnosing the role of observable distribution shift in scientific replications

    Authors: Ying Jin, Kevin Guo, Dominik Rothenhäusler

    Abstract: Many researchers have identified distribution shift as a likely contributor to the reproducibility crisis in behavioral and biomedical sciences. The idea is that if treatment effects vary across individual characteristics and experimental contexts, then studies conducted in different populations will estimate different average effects. This paper uses "generalizability" methods to quantify how mu…

    Submitted 2 September, 2023; originally announced September 2023.

  15. arXiv:2307.09291  [pdf, other]

    stat.ME math.ST stat.AP

    Model-free selective inference under covariate shift via weighted conformal p-values

    Authors: Ying Jin, Emmanuel J. Candès

    Abstract: This paper introduces novel weighted conformal p-values and methods for model-free selective inference. The problem is as follows: given test units with covariates $X$ and missing responses $Y$, how do we select units for which the responses $Y$ are larger than user-specified values while controlling the proportion of false positives? Can we achieve this without any modeling assumptions on the dat…

    Submitted 26 September, 2023; v1 submitted 18 July, 2023; originally announced July 2023.

  16. arXiv:2305.14535  [pdf, other]

    cs.LG stat.ML

    Uncertainty Quantification over Graph with Conformalized Graph Neural Networks

    Authors: Kexin Huang, Ying Jin, Emmanuel Candès, Jure Leskovec

    Abstract: Graph Neural Networks (GNNs) are powerful machine learning prediction models on graph-structured data. However, GNNs lack rigorous uncertainty estimates, limiting their reliable deployment in settings where the cost of errors is significant. We propose conformalized GNN (CF-GNN), extending conformal prediction (CP) to graph-based models for guaranteed uncertainty estimates. Given an entity in the…

    Submitted 30 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: Published at NeurIPS 2023

  17. arXiv:2301.00457  [pdf, other]

    math.OC cs.CR cs.DS cs.LG stat.ML

    ReSQueing Parallel and Private Stochastic Convex Optimization

    Authors: Yair Carmon, Arun Jambulapati, Yujia Jin, Yin Tat Lee, Daogao Liu, Aaron Sidford, Kevin Tian

    Abstract: We introduce a new tool for stochastic convex optimization (SCO): a Reweighted Stochastic Query (ReSQue) estimator for the gradient of a function convolved with a (Gaussian) probability density. Combining ReSQue with recent advances in ball oracle acceleration [CJJJLST20, ACJJS21], we develop algorithms achieving state-of-the-art complexities for SCO in parallel and private settings. For a SCO obj…

    Submitted 27 October, 2023; v1 submitted 1 January, 2023; originally announced January 2023.

  18. arXiv:2212.09900  [pdf, other]

    cs.LG math.ST stat.ME stat.ML

    Policy learning "without" overlap: Pessimism and generalized empirical Bernstein's inequality

    Authors: Ying Jin, Zhimei Ren, Zhuoran Yang, Zhaoran Wang

    Abstract: This paper studies offline policy learning, which aims at utilizing observations collected a priori (from either fixed or adaptively evolving behavior policies) to learn an optimal individualized decision rule that achieves the best overall outcomes for a given population. Existing policy learning methods rely on a uniform overlap assumption, i.e., the propensities of exploring all actions for all…

    Submitted 26 September, 2024; v1 submitted 19 December, 2022; originally announced December 2022.

  19. arXiv:2212.06069  [pdf, other]

    cs.LG stat.ML

    VO$Q$L: Towards Optimal Regret in Model-free RL with Nonlinear Function Approximation

    Authors: Alekh Agarwal, Yujia Jin, Tong Zhang

    Abstract: We study time-inhomogeneous episodic reinforcement learning (RL) under general function approximation and sparse rewards. We design a new algorithm, Variance-weighted Optimistic $Q$-Learning (VO$Q$L), based on $Q$-learning and bound its regret assuming completeness and bounded Eluder dimension for the regression function class. As a special case, VO$Q$L achieves $\tilde{O}(d\sqrt{HT}+d^6H^{5})$ re…

    Submitted 12 December, 2022; originally announced December 2022.

  20. arXiv:2211.10032  [pdf, other]

    stat.ME

    Modular Regression: Improving Linear Models by Incorporating Auxiliary Data

    Authors: Ying Jin, Dominik Rothenhäusler

    Abstract: This paper develops a new framework, called modular regression, to utilize auxiliary information -- such as variables other than the original features or additional data sets -- in the training process of linear models. At a high level, our method follows the routine: (i) decomposing the regression task into several sub-tasks, (ii) fitting the sub-task models, and (iii) using the sub-task models t…

    Submitted 23 November, 2023; v1 submitted 18 November, 2022; originally announced November 2022.

    Comments: Journal of Machine Learning Research

  21. arXiv:2210.01408  [pdf, other]

    stat.ME stat.ML

    Selection by Prediction with Conformal p-values

    Authors: Ying Jin, Emmanuel J. Candès

    Abstract: Decision making or scientific discovery pipelines such as job hiring and drug discovery often involve multiple stages: before any resource-intensive step, there is often an initial screening that uses predictions from a machine learning model to shortlist a few candidates from a large pool. We study screening procedures that aim to select candidates whose unobserved outcomes exceed user-specified…

    Submitted 26 May, 2023; v1 submitted 4 October, 2022; originally announced October 2022.

    Comments: Journal of Machine Learning Research

  22. arXiv:2209.07015  [pdf, ps, other]

    stat.ML cs.LG

    Upper bounds on the Natarajan dimensions of some function classes

    Authors: Ying Jin

    Abstract: The Natarajan dimension is a fundamental tool for characterizing multi-class PAC learnability, generalizing the Vapnik-Chervonenkis (VC) dimension from binary to multi-class classification problems. This work establishes upper bounds on Natarajan dimensions for certain function classes, including (i) multi-class decision tree and random forests, and (ii) multi-class neural networks with binary, li…

    Submitted 23 April, 2023; v1 submitted 14 September, 2022; originally announced September 2022.

    Comments: To appear at IEEE ISIT 2023

  23. arXiv:2205.04616  [pdf, other]

    cs.LG stat.AP

    Nightly Automobile Claims Prediction from Telematics-Derived Features: A Multilevel Approach

    Authors: Allen R. Williams, Yoolim Jin, Anthony Duer, Tuka Alhanai, Mohammad Ghassemi

    Abstract: In recent years it has become possible to collect GPS data from drivers and to incorporate this data into automobile insurance pricing for the driver. This data is continuously collected and processed nightly into metadata consisting of mileage and time summaries of each discrete trip taken, and a set of behavioral scores describing attributes of the trip (e.g, driver fatigue or driver distraction…

    Submitted 9 May, 2022; originally announced May 2022.

  24. arXiv:2203.04373  [pdf, other]

    stat.ME

    Sensitivity analysis under the $f$-sensitivity models: a distributional robustness perspective

    Authors: Ying Jin, Zhimei Ren, Zhengyuan Zhou

    Abstract: This paper introduces the $f$-sensitivity model, a new sensitivity model that characterizes the violation of unconfoundedness in causal inference. It assumes the selection bias due to unmeasured confounding is bounded "on average"; compared with the widely used point-wise sensitivity models in the literature, it is able to capture the strength of unmeasured confounding by not only its magnitude bu…

    Submitted 5 September, 2022; v1 submitted 8 March, 2022; originally announced March 2022.

  25. arXiv:2112.03440  [pdf, ps, other]

    cs.LG stat.ML

    A Unified Framework for Multi-distribution Density Ratio Estimation

    Authors: Lantao Yu, Yujia Jin, Stefano Ermon

    Abstract: Binary density ratio estimation (DRE), the problem of estimating the ratio $p_1/p_2$ given their empirical samples, provides the foundation for many state-of-the-art machine learning algorithms such as contrastive representation learning and covariate shift adaptation. In this work, we consider a generalized setting where given samples from multiple distributions $p_1, \ldots, p_k$ (for $k > 2$),…

    Submitted 6 December, 2021; originally announced December 2021.

  26. arXiv:2111.12161  [pdf, other]

    stat.ME

    Sensitivity Analysis of Individual Treatment Effects: A Robust Conformal Inference Approach

    Authors: Ying Jin, Zhimei Ren, Emmanuel J. Candès

    Abstract: We propose a model-free framework for sensitivity analysis of individual treatment effects (ITEs), building upon ideas from conformal inference. For any unit, our procedure reports the $Γ$-value, a number which quantifies the minimum strength of confounding needed to explain away the evidence for ITE. Our approach rests on the reliable predictive inference of counterfactuals and ITEs in situations…

    Submitted 24 April, 2022; v1 submitted 23 November, 2021; originally announced November 2021.

  27. arXiv:2110.13406  [pdf, other]

    stat.ME

    Towards Optimal Variance Reduction in Online Controlled Experiments

    Authors: Ying Jin, Shan Ba

    Abstract: We study optimal variance reduction solutions for count and ratio metrics in online controlled experiments. Our methods leverage flexible machine learning tools to incorporate covariates that are independent from the treatment but have predictive power for the outcomes, and employ the cross-fitting technique to remove the bias in complex machine learning models. We establish CLT-type asymptotic in…

    Submitted 1 September, 2022; v1 submitted 26 October, 2021; originally announced October 2021.

  28. arXiv:2104.04565  [pdf, other]

    stat.ME

    Tailored inference for finite populations: conditional validity and transfer across distributions

    Authors: Ying Jin, Dominik Rothenhäusler

    Abstract: Parameters of sub-populations can be more relevant than super-population ones. For example, a healthcare provider may be interested in the effect of a treatment plan for a specific subset of their patients; policymakers may be concerned with the impact of a policy in a particular state within a given population. In these cases, the focus is on a specific finite population, as opposed to an infinit…

    Submitted 20 March, 2023; v1 submitted 9 April, 2021; originally announced April 2021.

    Comments: To appear at Biometrika

  29. arXiv:2102.05198  [pdf, other]

    stat.ML cs.LG

    Statistical Inference for Polyak-Ruppert Averaged Zeroth-order Stochastic Gradient Algorithm

    Authors: Yanhao Jin, Tesi Xiao, Krishnakumar Balasubramanian

    Abstract: Statistical machine learning models trained with stochastic gradient algorithms are increasingly being deployed in critical scientific applications. However, computing the stochastic gradient in several such applications is highly expensive or even impossible at times. In such cases, derivative-free or zeroth-order algorithms are used. An important question which has thus far not been addressed su…

    Submitted 14 November, 2021; v1 submitted 9 February, 2021; originally announced February 2021.

  30. arXiv:2102.00431  [pdf, other]

    cs.LG cs.AI stat.ML

    Synergetic Learning of Heterogeneous Temporal Sequences for Multi-Horizon Probabilistic Forecasting

    Authors: Longyuan Li, Jihai Zhang, Junchi Yan, Yaohui Jin, Yunhao Zhang, Yanjie Duan, Guangjian Tian

    Abstract: Time-series is ubiquitous across applications, such as transportation, finance and healthcare. Time-series is often influenced by external factors, especially in the form of asynchronous events, making forecasting difficult. However, existing models are mainly designated for either synchronous time-series or asynchronous event sequence, and can hardly provide a synthetic way to capture the relatio…

    Submitted 31 January, 2021; originally announced February 2021.

    Comments: Accepted by AAAI 2021 conference

  31. arXiv:2102.00397  [pdf, other]

    cs.LG stat.ML

    Learning Interpretable Deep State Space Model for Probabilistic Time Series Forecasting

    Authors: Longyuan Li, Junchi Yan, Xiaokang Yang, Yaohui Jin

    Abstract: Probabilistic time series forecasting involves estimating the distribution of future based on its history, which is essential for risk management in downstream decision-making. We propose a deep state space model for probabilistic time series forecasting whereby the non-linear emission model and transition model are parameterized by networks and the dependency is modeled by recurrent neural nets.…

    Submitted 31 January, 2021; originally announced February 2021.

    Comments: IJCAI 2019

  32. arXiv:2101.11159  [pdf, other]

    stat.ME cs.DM stat.CO

    An Early Stopping Bayesian Data Assimilation Approach for Mixed-Logit Estimation

    Authors: Shanshan Xie, Tim Hillel, Ying Jin

    Abstract: The mixed-logit model is a flexible tool in transportation choice analysis, which provides valuable insights into inter and intra-individual behavioural heterogeneity. However, applications of mixed-logit models are limited by the high computational and data requirements for model estimation. When estimating on small samples, the Bayesian estimation approach becomes vulnerable to over and under-fi…

    Submitted 26 January, 2021; originally announced January 2021.

  33. arXiv:2012.15085  [pdf, other]

    cs.LG cs.AI math.OC math.ST stat.ML

    Is Pessimism Provably Efficient for Offline RL?

    Authors: Ying Jin, Zhuoran Yang, Zhaoran Wang

    Abstract: We study offline reinforcement learning (RL), which aims to learn an optimal policy based on a dataset collected a priori. Due to the lack of further interactions with the environment, offline RL suffers from the insufficient coverage of the dataset, which eludes most existing theoretical analysis. In this paper, we propose a pessimistic variant of the value iteration algorithm (PEVI), which incor…

    Submitted 4 May, 2022; v1 submitted 30 December, 2020; originally announced December 2020.

    Comments: This version adds results on RKHS, and a data-splitting algorithm

  34. arXiv:2012.05158  [pdf, other]

    stat.ME

    Exponential Family Graphical Models: Correlated Replicates and Unmeasured Confounders, with Applications to fMRI Data

    Authors: Yanxin Jin, Yang Ning, Kean Ming Tan

    Abstract: Graphical models have been used extensively for modeling brain connectivity networks. However, unmeasured confounders and correlations among measurements are often overlooked during model fitting, which may lead to spurious scientific discoveries. Motivated by functional magnetic resonance imaging (fMRI) studies, we propose a novel method for constructing brain connectivity networks with correlate…

    Submitted 9 December, 2020; originally announced December 2020.

    Comments: R package latentgraph for fitting the model can be found at CRAN

  35. arXiv:2010.15240  [pdf, other]

    cs.LG stat.ML

    Test Set Optimization by Machine Learning Algorithms

    Authors: Kaiming Fu, Yulu Jin, Zhousheng Chen

    Abstract: Diagnosis results are highly dependent on the volume of test set. To derive the most efficient test set, we propose several machine learning based methods to predict the minimum amount of test data that produces relatively accurate diagnosis. By collecting outputs from failing circuits, the feature matrix and label vector are generated, which involves the inference information of the test terminat…

    Submitted 28 October, 2020; originally announced October 2020.

  36. arXiv:2010.01265  [pdf, other]

    cs.LG q-fin.GN stat.ML

    DoubleEnsemble: A New Ensemble Method Based on Sample Reweighting and Feature Selection for Financial Data Analysis

    Authors: Chuheng Zhang, Yuanqi Li, Xi Chen, Yifei Jin, Pingzhong Tang, Jian Li

    Abstract: Modern machine learning models (such as deep neural networks and boosting decision tree models) have become increasingly popular in financial market prediction, due to their superior capacity to extract complex non-linear patterns. However, since financial datasets have very low signal-to-noise ratio and are non-stationary, complex models are often very prone to overfitting and suffer from instabi…

    Submitted 31 January, 2021; v1 submitted 2 October, 2020; originally announced October 2020.

    Comments: This paper was published in ICDM 2020. We have fixed several typos and polished the writing in this revision

  37. arXiv:2009.11510  [pdf, other]

    cs.LG cs.NE cs.SI stat.ML

    EPNE: Evolutionary Pattern Preserving Network Embedding

    Authors: Junshan Wang, Yilun Jin, Guojie Song, Xiaojun Ma

    Abstract: Information networks are ubiquitous and are ideal for modeling relational data. Networks being sparse and irregular, network embedding algorithms have caught the attention of many researchers, who came up with numerous embeddings algorithms in static networks. Yet in real life, networks constantly evolve over time. Hence, evolutionary patterns, namely how nodes develop itself over time, would serv…

    Submitted 24 September, 2020; originally announced September 2020.

    Comments: 8 pages

  38. arXiv:2008.12776  [pdf, other]

    cs.LG cs.DS math.OC stat.ML

    Efficiently Solving MDPs with Stochastic Mirror Descent

    Authors: Yujia Jin, Aaron Sidford

    Abstract: We present a unified framework based on primal-dual stochastic mirror descent for approximately solving infinite-horizon Markov decision processes (MDPs) given a generative model. When applied to an average-reward MDP with $A_{tot}$ total state-action pairs and mixing time bound $t_{mix}$ our method computes an $ε$-optimal policy with an expected $\widetilde{O}(t_{mix}^2 A_{tot} ε^{-2})$ samples f…

    Submitted 28 August, 2020; originally announced August 2020.

    Comments: ICML 2020

  39. arXiv:2006.14278  [pdf, other]

    cs.LG cs.SI stat.ML

    Graph Structural-topic Neural Network

    Authors: Qingqing Long, Yilun Jin, Guojie Song, Yi Li, Wei Lin

    Abstract: Graph Convolutional Networks (GCNs) achieved tremendous success by effectively gathering local features for nodes. However, commonly do GCNs focus more on node features but less on graph structures within the neighborhood, especially higher-order structural patterns. However, such local structural patterns are shown to be indicative of node properties in numerous fields. In addition, it is not jus…

    Submitted 4 July, 2020; v1 submitted 25 June, 2020; originally announced June 2020.

  40. arXiv:2004.09845  [pdf, other]

    cs.LG cs.CV stat.ML

    LRTD: Long-Range Temporal Dependency based Active Learning for Surgical Workflow Recognition

    Authors: Xueying Shi, Yueming Jin, Qi Dou, Pheng-Ann Heng

    Abstract: Automatic surgical workflow recognition in video is an essentially fundamental yet challenging problem for developing computer-assisted and robotic-assisted surgery. Existing approaches with deep learning have achieved remarkable performance on analysis of surgical videos, however, heavily relying on large-scale labelled datasets. Unfortunately, the annotation is not often available in abundance,…

    Submitted 23 April, 2020; v1 submitted 21 April, 2020; originally announced April 2020.

    Comments: Accepted as a conference paper in IPCAI 2020

  41. arXiv:2004.00378  [pdf, other]

    cs.LG eess.SP stat.ML

    Time-Frequency Analysis based Blind Modulation Classification for Multiple-Antenna Systems

    Authors: Weiheng Jiang, Xiaogang Wu, Bolin Chen, Wenjiang Feng, Yi Jin

    Abstract: Blind modulation classification is an important step to implement cognitive radio networks. The multiple-input multiple-output (MIMO) technique is widely used in military and civil communication systems. Due to the lack of prior information about channel parameters and the overlapping of signals in the MIMO systems, the traditional likelihood-based and feature-based approaches cannot be applied in…

    Submitted 1 April, 2020; originally announced April 2020.

    Comments: 12 pages, 11 figures

  42. arXiv:2003.03564  [pdf, other]

    cs.LG cs.DC stat.ML

    Ternary Compression for Communication-Efficient Federated Learning

    Authors: Jinjin Xu, Wenli Du, Ran Cheng, Wangli He, Yaochu Jin

    Abstract: Learning over massive data stored in different locations is essential in many real-world applications. However, sharing data is full of challenges due to the increasing demands of privacy and security with the growing use of smart mobile devices and IoT devices. Federated learning provides a potential solution to privacy-preserving and secure machine learning, by means of jointly training a global…

    Submitted 29 March, 2022; v1 submitted 7 March, 2020; originally announced March 2020.

    Journal ref: IEEE Trans Neural Netw Learn Syst. 2022 Mar;33(3):1162-1176

  43. arXiv:2003.02793  [pdf, ps, other]

    cs.LG cs.DC stat.ML

    Real-time Federated Evolutionary Neural Architecture Search

    Authors: Hangyu Zhu, Yaochu Jin

    Abstract: Federated learning is a distributed machine learning approach to privacy preservation and two major technical challenges prevent a wider application of federated learning. One is that federated learning raises high demands on communication, since a large number of model parameters must be transmitted between the server and the clients. The other challenge is that training large machine learning mo…

    Submitted 4 March, 2020; originally announced March 2020.

  44. arXiv:2002.11545  [pdf, other]

    cs.LG stat.ML

    Towards Utilizing Unlabeled Data in Federated Learning: A Survey and Prospective

    Authors: Yilun Jin, Xiguang Wei, Yang Liu, Qiang Yang

    Abstract: Federated Learning (FL) proposed in recent years has received significant attention from researchers in that it can bring separate data sources together and build machine learning models in a collaborative but private manner. Yet, in most applications of FL, such as keyboard prediction, labeling data requires virtually no additional efforts, which is not generally the case. In reality, acquiring l…

    Submitted 11 May, 2020; v1 submitted 26 February, 2020; originally announced February 2020.

  45. arXiv:2002.09145  [pdf, other]

    cs.LG stat.ML

    Leveraging Cross Feedback of User and Item Embeddings with Attention for Variational Autoencoder based Collaborative Filtering

    Authors: Yuan Jin, He Zhao, Ming Liu, Ye Zhu, Lan Du, Longxiang Gao, He Zhang, Yunfeng Li

    Abstract: Matrix factorization (MF) has been widely applied to collaborative filtering in recommendation systems. Its Bayesian variants can derive posterior distributions of user and item embeddings, and are more robust to sparse ratings. However, the Bayesian methods are restricted by their update rules for the posterior parameters due to the conjugacy of the priors and the likelihood. Variational autoenco…

    Submitted 22 August, 2022; v1 submitted 21 February, 2020; originally announced February 2020.

  46. arXiv:1912.10773  [pdf, other]

    cs.LG cs.CV eess.SY stat.ML

    A Survey of Deep Learning Applications to Autonomous Vehicle Control

    Authors: Sampo Kuutti, Richard Bowden, Yaochu Jin, Phil Barber, Saber Fallah

    Abstract: Designing a controller for autonomous vehicles capable of providing adequate performance in all driving scenarios is challenging due to the highly complex environment and inability to test the system in the wide variety of scenarios which it may encounter after deployment. However, deep learning methods have shown great promise in not only providing excellent performance for complex and non-linear…

    Submitted 23 December, 2019; originally announced December 2019.

    Comments: 23 pages, 3 figures, Accepted in IEEE Transactions on Intelligent Transportation Systems

  47. arXiv:1912.03699  [pdf, other]

    cs.LG cs.CV stat.ML

    Minimum Class Confusion for Versatile Domain Adaptation

    Authors: Ying Jin, Ximei Wang, Mingsheng Long, Jianmin Wang

    Abstract: There are a variety of Domain Adaptation (DA) scenarios subject to label sets and domain configurations, including closed-set and partial-set DA, as well as multi-source and multi-target DA. It is notable that existing DA methods are generally designed only for a specific scenario, and may underperform for scenarios they are not tailored to. To this end, this paper studies Versatile Domain Adaptat…

    Submitted 10 August, 2020; v1 submitted 8 December, 2019; originally announced December 2019.

    Comments: Accepted by ECCV 2020

  48. arXiv:1911.07675  [pdf, other]

    cs.LG stat.ML

    GraLSP: Graph Neural Networks with Local Structural Patterns

    Authors: Yilun Jin, Guojie Song, Chuan Shi

    Abstract: It is not until recently that graph neural networks (GNNs) are adopted to perform graph representation learning, among which, those based on the aggregation of features within the neighborhood of a node achieved great success. However, despite such achievements, GNNs illustrate defects in identifying some common structural patterns which, unfortunately, play significant roles in various network ph…

    Submitted 7 December, 2019; v1 submitted 18 November, 2019; originally announced November 2019.

  49. arXiv:1910.08892  [pdf, other]

    stat.ME

    Bayesian Symbolic Regression

    Authors: Ying Jin, Weilin Fu, Jian Kang, Jiadong Guo, Jian Guo

    Abstract: Interpretability is crucial for machine learning in many scenarios such as quantitative finance, banking, healthcare, etc. Symbolic regression (SR) is a classic interpretable machine learning method by bridging X and Y using mathematical expressions composed of some basic functions. However, the search space of all possible expressions grows exponentially with the length of the expression, making…

    Submitted 15 January, 2020; v1 submitted 20 October, 2019; originally announced October 2019.

  50. arXiv:1910.08864  [pdf, other]

    cs.LG q-bio.MN stat.ML

    Identification of Interaction Clusters Using a Semi-supervised Hierarchical Clustering Method

    Authors: Yu Chen, Yuanyuan Yang, Yaochu Jin, Xiufen Zou

    Abstract: Motivation: Identifying interaction clusters of large gene regulatory networks (GRNs) is critical for its further investigation, while this task is very challenging, attributed to data noise in experiment data, large scale of GRNs, and inconsistency between gene expression profiles and function modules, etc. It is promising to semi-supervise this process by prior information, but shortage of prior…

    Submitted 19 October, 2019; originally announced October 2019.