Showing 1–50 of 108 results for author: Hu, X

Searching in archive stat.
  1. arXiv:2404.09438  [pdf, other]

    math.OC cs.LG stat.ML

    Developing Lagrangian-based Methods for Nonsmooth Nonconvex Optimization

    Authors: Nachuan Xiao, Kuangyu Ding, Xiaoyin Hu, Kim-Chuan Toh

    Abstract: In this paper, we consider the minimization of a nonsmooth nonconvex objective function $f(x)$ over a closed convex subset $\mathcal{X}$ of $\mathbb{R}^n$, with additional nonsmooth nonconvex constraints $c(x) = 0$. We develop a unified framework for developing Lagrangian-based methods, which takes a single-step update to the primal variables by some subgradient methods in each iteration. These su…

    Submitted 14 April, 2024; originally announced April 2024.

    Comments: 30 pages, 4 figures

  2. arXiv:2402.15515  [pdf]

    cs.AI q-bio.QM stat.AP

    Feasibility of Identifying Factors Related to Alzheimer's Disease and Related Dementia in Real-World Data

    Authors: Aokun Chen, Qian Li, Yu Huang, Yongqiu Li, Yu-neng Chuang, Xia Hu, Serena Guo, Yonghui Wu, Yi Guo, Jiang Bian

    Abstract: A comprehensive view of factors associated with AD/ADRD will significantly aid in studies to develop new treatments for AD/ADRD and identify high-risk populations and patients for prevention efforts. In our study, we summarized the risk factors for AD/ADRD by reviewing existing meta-analyses and review articles on risk and preventive factors for AD/ADRD. In total, we extracted 477 risk factors in…

    Submitted 3 February, 2024; originally announced February 2024.

  3. arXiv:2311.11965  [pdf, other]

    cs.LG stat.ML

    Provably Efficient CVaR RL in Low-rank MDPs

    Authors: Yulai Zhao, Wenhao Zhan, Xiaoyan Hu, Ho-fung Leung, Farzan Farnia, Wen Sun, Jason D. Lee

    Abstract: We study risk-sensitive Reinforcement Learning (RL), where we aim to maximize the Conditional Value at Risk (CVaR) with a fixed risk tolerance $τ$. Prior theoretical work studying risk-sensitive RL focuses on the tabular Markov Decision Processes (MDPs) setting. To extend CVaR RL to settings where state space is large, function approximation must be deployed. We study CVaR RL in low-rank MDPs with…

    Submitted 20 November, 2023; originally announced November 2023.

    Comments: The first three authors contribute equally and are ordered randomly

  4. arXiv:2310.09516  [pdf, other]

    cs.LG stat.ML

    Efficient Link Prediction via GNN Layers Induced by Negative Sampling

    Authors: Yuxin Wang, Xiannian Hu, Quan Gan, Xuanjing Huang, Xipeng Qiu, David Wipf

    Abstract: Graph neural networks (GNNs) for link prediction can loosely be divided into two broad categories. First, \emph{node-wise} architectures pre-compute individual embeddings for each node that are later combined by a simple decoder to make predictions. While extremely efficient at inference time (since node embeddings are only computed once and repeatedly reused), model expressiveness is limited such…

    Submitted 14 October, 2023; originally announced October 2023.

    Comments: 19 pages, 5 figures

  5. arXiv:2309.03097  [pdf, other]

    stat.AP

    An Algorithm for Modelling Escalator Fixed Loss Energy for PHM and sustainable energy usage

    Authors: Xuwen Hu, Jiaqi Qiu, Yu Lin, Inez Maria Zwetsloot, William Ka Fai Lee, Edmond Yin San Yeung, Colman Yiu Wah Yeung, Chris Chun Long Wong

    Abstract: Prognostic Health Management (PHM) is designed to assess and monitor the health status of systems, anticipate the onset of potential failure, and prevent unplanned downtime. In recent decades, collecting massive amounts of real-time sensor data enabled condition monitoring (CM) and consequently, detection of abnormalities to support maintenance decision-making. Additionally, the utilization of PHM…

    Submitted 6 September, 2023; originally announced September 2023.

  6. arXiv:2307.10053  [pdf, other]

    math.OC cs.AI cs.LG stat.ML

    SGD-type Methods with Guaranteed Global Stability in Nonsmooth Nonconvex Optimization

    Authors: Nachuan Xiao, Xiaoyin Hu, Kim-Chuan Toh

    Abstract: In this paper, we focus on providing convergence guarantees for variants of the stochastic subgradient descent (SGD) method in minimizing nonsmooth nonconvex functions. We first develop a general framework to establish global stability for general stochastic subgradient methods, where the corresponding differential inclusion admits a coercive Lyapunov function. We prove that, with sufficiently sma…

    Submitted 13 May, 2024; v1 submitted 19 July, 2023; originally announced July 2023.

    Comments: 36 pages

  7. arXiv:2306.17584  [pdf, other]

    stat.ME stat.AP

    Flexible and Accurate Methods for Estimation and Inference of Gaussian Graphical Models with Applications

    Authors: Yueqi Qian, Xianghong Hu, Can Yang

    Abstract: The Gaussian graphical model (GGM) incorporates an undirected graph to represent the conditional dependence between variables, with the precision matrix encoding partial correlation between pair of variables given the others. To achieve flexible and accurate estimation and inference of GGM, we propose the novel method FLAG, which utilizes the random effects model for pairwise conditional regressio…

    Submitted 30 June, 2023; originally announced June 2023.

  8. arXiv:2306.07479  [pdf, ps, other]

    cs.GT cs.IR cs.LG stat.ML

    Incentivizing High-Quality Content in Online Recommender Systems

    Authors: Xinyan Hu, Meena Jagadeesan, Michael I. Jordan, Jacob Steinhardt

    Abstract: In content recommender systems such as TikTok and YouTube, the platform's recommendation algorithm shapes content producer incentives. Many platforms employ online learning, which generates intertemporal incentives, since content produced today affects recommendations of future content. We study the game between producers and analyze the content created at equilibrium. We show that standard online…

    Submitted 21 June, 2024; v1 submitted 12 June, 2023; originally announced June 2023.

    Comments: Updated version with revised and expanded content

  9. arXiv:2305.03938  [pdf, other]

    math.OC cs.LG stat.ML

    Adam-family Methods for Nonsmooth Optimization with Convergence Guarantees

    Authors: Nachuan Xiao, Xiaoyin Hu, Xin Liu, Kim-Chuan Toh

    Abstract: In this paper, we present a comprehensive study on the convergence properties of Adam-family methods for nonsmooth optimization, especially in the training of nonsmooth neural networks. We introduce a novel two-timescale framework that adopts a two-timescale updating scheme, and prove its convergence properties under mild assumptions. Our proposed framework encompasses various popular Adam-family…

    Submitted 19 February, 2024; v1 submitted 6 May, 2023; originally announced May 2023.

    Comments: 53 pages

  10. arXiv:2303.02566  [pdf, other]

    stat.ML cs.LG stat.CO

    MFAI: A Scalable Bayesian Matrix Factorization Approach to Leveraging Auxiliary Information

    Authors: Zhiwei Wang, Fa Zhang, Cong Zheng, Xianghong Hu, Mingxuan Cai, Can Yang

    Abstract: In various practical situations, matrix factorization methods suffer from poor data quality, such as high data sparsity and low signal-to-noise ratio (SNR). Here, we consider a matrix factorization problem by utilizing auxiliary information, which is massively available in real-world applications, to overcome the challenges caused by poor data quality. Unlike existing methods that mainly rely on s…

    Submitted 12 February, 2024; v1 submitted 4 March, 2023; originally announced March 2023.

  11. arXiv:2211.16681  [pdf, other]

    stat.ME q-bio.GN stat.AP

    Biomarker-guided heterogeneity analysis of genetic regulations via multivariate sparse fusion

    Authors: Sanguo Zhang, Xiaonan Hu, Ziye Luo, Yu Jiang, Yifan Sun, Shuangge Ma

    Abstract: Heterogeneity is a hallmark of many complex diseases. There are multiple ways of defining heterogeneity, among which the heterogeneity in genetic regulations, for example GEs (gene expressions) by CNVs (copy number variations) and methylation, has been suggested but little investigated. Heterogeneity in genetic regulations can be linked with disease severity, progression, and other traits and is b…

    Submitted 29 November, 2022; originally announced November 2022.

    Comments: 24 pages, 8 figures

    Journal ref: Statistics in Medicine, 40: 3915-3936, 2021

  12. arXiv:2210.01281  [pdf, other]

    stat.ME

    A Predictor-Informed Multi-Subject Bayesian Approach for Dynamic Functional Connectivity

    Authors: Jaylen Lee, Sana Hussain, Ryan Warnick, Marina Vannucci, Isaac Menchaca, Aaron R. Seitz, Xiaoping Hu, Megan A. K. Peters, Michele Guindani

    Abstract: Time Varying Functional Connectivity (TVFC) investigates how the interactions among brain regions vary over the course of an fMRI experiment. The transitions between different individual connectivity states can be modulated by changes in underlying physiological mechanisms that drive functional network dynamics, e.g., changes in attention or cognitive effort as measured by pupil dilation. In this…

    Submitted 9 January, 2023; v1 submitted 3 October, 2022; originally announced October 2022.

  13. arXiv:2209.00383  [pdf, other]

    cs.CV stat.ML

    TokenCut: Segmenting Objects in Images and Videos with Self-supervised Transformer and Normalized Cut

    Authors: Yangtao Wang, Xi Shen, Yuan Yuan, Yuming Du, Maomao Li, Shell Xu Hu, James L Crowley, Dominique Vaufreydaz

    Abstract: In this paper, we describe a graph-based algorithm that uses the features obtained by a self-supervised transformer to detect and segment salient objects in images and videos. With this approach, the image patches that compose an image or video are organised into a fully connected graph, where the edge between each pair of patches is labeled with a similarity score between patches using features l…

    Submitted 5 December, 2023; v1 submitted 1 September, 2022; originally announced September 2022.

    Comments: arXiv admin note: text overlap with arXiv:2202.11539

  14. arXiv:2207.07624  [pdf, other]

    cs.LG stat.ML

    Feed-Forward Latent Domain Adaptation

    Authors: Ondrej Bohdal, Da Li, Shell Xu Hu, Timothy Hospedales

    Abstract: We study a new highly-practical problem setting that enables resource-constrained edge devices to adapt a pre-trained model to their local data distributions. Recognizing that device's data are likely to come from multiple latent domains that include a mixture of unlabelled domain-relevant and domain-irrelevant examples, we focus on the comparatively under-studied problem of latent domain adaptati…

    Submitted 31 January, 2024; v1 submitted 15 July, 2022; originally announced July 2022.

    Comments: Accepted at WACV 2024. Project page: https://ondrejbohdal.github.io/cxda

  15. arXiv:2206.13140  [pdf, other]

    cs.LG stat.ML

    Compressing Features for Learning with Noisy Labels

    Authors: Yingyi Chen, Shell Xu Hu, Xi Shen, Chunrong Ai, Johan A. K. Suykens

    Abstract: Supervised learning can be viewed as distilling relevant information from input data into feature representations. This process becomes difficult when supervision is noisy as the distilled information might not be relevant. In fact, recent research shows that networks can easily overfit all labels including those that are corrupted, and hence can hardly generalize to clean datasets. In this paper,…

    Submitted 27 June, 2022; originally announced June 2022.

    Comments: Accepted to TNNLS 2022. Project page: https://yingyichen-cyy.github.io/CompressFeatNoisyLabels/

  16. arXiv:2202.06400  [pdf, other]

    math.ST stat.ME

    Misspecification Analysis of High-Dimensional Random Effects Models for Estimation of Signal-to-Noise Ratios

    Authors: Xiaohan Hu, Xiaodong Li

    Abstract: Estimation of signal-to-noise ratios and residual variances in high-dimensional linear models has various important applications including, e.g. heritability estimation in bioinformatics. One commonly used estimator, usually referred to as REML, is based on the likelihood of the random effects model, in which both the regression coefficients and the noise variables are respectively assumed to be i…

    Submitted 7 June, 2023; v1 submitted 13 February, 2022; originally announced February 2022.

    Comments: 58 pages, 4 figures

    MSC Class: 62J05

  17. arXiv:2202.01614  [pdf, other]

    cs.SD eess.AS stat.ML

    The RoyalFlush System of Speech Recognition for M2MeT Challenge

    Authors: Shuaishuai Ye, Peiyao Wang, Shunfei Chen, Xinhui Hu, Xinkang Xu

    Abstract: This paper describes our RoyalFlush system for the track of multi-speaker automatic speech recognition (ASR) in the M2MeT challenge. We adopted the serialized output training (SOT) based multi-speakers ASR system with large-scale simulation data. Firstly, we investigated a set of front-end methods, including multi-channel weighted predicted error (WPE), beamforming, speech separation, speech enhan…

    Submitted 24 February, 2022; v1 submitted 3 February, 2022; originally announced February 2022.

  18. arXiv:2201.03092  [pdf, other]

    cs.LG econ.GN stat.ML

    Uncovering the Source of Machine Bias

    Authors: Xiyang Hu, Yan Huang, Beibei Li, Tian Lu

    Abstract: We develop a structural econometric model to capture the decision dynamics of human evaluators on an online micro-lending platform, and estimate the model parameters using a real-world dataset. We find two types of biases in gender, preference-based bias and belief-based bias, are present in human evaluators' decisions. Both types of biases are in favor of female applicants. Through counterfactual…

    Submitted 9 January, 2022; originally announced January 2022.

    Comments: accepted by KDD 2021, MLCM workshop

    ACM Class: I.2

  19. arXiv:2201.00382  [pdf, other]

    cs.LG cs.DB stat.AP stat.ML

    ECOD: Unsupervised Outlier Detection Using Empirical Cumulative Distribution Functions

    Authors: Zheng Li, Yue Zhao, Xiyang Hu, Nicola Botta, Cezar Ionescu, George H. Chen

    Abstract: Outlier detection refers to the identification of data points that deviate from a general data distribution. Existing unsupervised approaches often suffer from high computational cost, complex hyperparameter tuning, and limited interpretability, especially when working with large, high-dimensional datasets. To address these issues, we present a simple yet effective algorithm called ECOD (Empirical…

    Submitted 24 August, 2022; v1 submitted 2 January, 2022; originally announced January 2022.

    Comments: Accepted to IEEE Transactions on Knowledge and Data Engineering (TKDE) with fixed data statistics. Zheng Li and Yue Zhao contributed equally. Code is available in PyOD library at https://github.com/yzhao062/pyod

  20. arXiv:2112.04598  [pdf, other]

    cs.CV cs.LG stat.ML

    InvGAN: Invertible GANs

    Authors: Partha Ghosh, Dominik Zietlow, Michael J. Black, Larry S. Davis, Xiaochen Hu

    Abstract: Generation of photo-realistic images, semantic editing and representation learning are a few of many potential applications of high resolution generative models. Recent progress in GANs have established them as an excellent choice for such tasks. However, since they do not provide an inference model, image editing or downstream tasks such as classification can not be done on real images using the…

    Submitted 10 December, 2021; v1 submitted 8 December, 2021; originally announced December 2021.

  21. arXiv:2111.07465  [pdf, other]

    stat.ML cs.LG econ.EM econ.TH

    Decoding Causality by Fictitious VAR Modeling

    Authors: Xingwei Hu

    Abstract: In modeling multivariate time series for either forecast or policy analysis, it would be beneficial to have figured out the cause-effect relations within the data. Regression analysis, however, is generally for correlation relation, and very few researches have focused on variance analysis for causality discovery. We first set up an equilibrium for the cause-effect relations using a fictitious vec…

    Submitted 21 November, 2021; v1 submitted 14 November, 2021; originally announced November 2021.

    Comments: 32 pages, 10 figures, 10 theorems, 5 corollaries, 3 algorithms, 2 tables, and 14 proofs

    MSC Class: 62P20; 91A12; 62D20 ACM Class: I.2.6

  22. arXiv:2110.02419  [pdf, other]

    stat.ML cs.GT cs.LG econ.TH

    Feature Selection by a Mechanism Design

    Authors: Xingwei Hu

    Abstract: In constructing an econometric or statistical model, we pick relevant features or variables from many candidates. A coalitional game is set up to study the selection problem where the players are the candidates and the payoff function is a performance measurement in all possible modeling scenarios. Thus, in theory, an irrelevant feature is equivalent to a dummy player in the game, which contribute…

    Submitted 5 October, 2021; originally announced October 2021.

    Comments: 15 pages, 2 figures, 1 table

    MSC Class: 62C10; 91A12; 91B03; 91B68 ACM Class: I.2.6

  23. arXiv:2106.12674  [pdf, other]

    cs.LG cs.AI cs.CY stat.ML

    Fairness via Representation Neutralization

    Authors: Mengnan Du, Subhabrata Mukherjee, Guanchu Wang, Ruixiang Tang, Ahmed Hassan Awadallah, Xia Hu

    Abstract: Existing bias mitigation methods for DNN models primarily work on learning debiased encoders. This process not only requires a lot of instance-level annotations for sensitive attributes, it also does not guarantee that all fairness sensitive information has been removed from the encoder. To address these limitations, we explore the following research question: Can we reduce the discrimination of D…

    Submitted 27 October, 2021; v1 submitted 23 June, 2021; originally announced June 2021.

    Comments: Accepted by NeurIPS 2021

  24. arXiv:2105.13841   

    cs.LG cs.AI stat.ML

    A General Taylor Framework for Unifying and Revisiting Attribution Methods

    Authors: Huiqi Deng, Na Zou, Mengnan Du, Weifu Chen, Guocan Feng, Xia Hu

    Abstract: Attribution methods provide an insight into the decision-making process of machine learning models, especially deep neural networks, by assigning contribution scores to each individual feature. However, the attribution problem has not been well-defined, which lacks a unified guideline to the contribution assignment process. Furthermore, existing attribution methods often built upon various empiric…

    Submitted 25 February, 2023; v1 submitted 28 May, 2021; originally announced May 2021.

    Comments: In the current version, the author information is not complete and there are some mathematical errors in the proof. We need to correct errors and add all co-authors who contribute to the paper. Therefore, we hope to withdraw the manuscript

  25. arXiv:2104.03087  [pdf, other]

    stat.ME

    Dynamic Principal Component Analysis in High Dimensions

    Authors: Xiaoyu Hu, Fang Yao

    Abstract: Principal component analysis is a versatile tool to reduce dimensionality which has wide applications in statistics and machine learning. It is particularly useful for modeling data in high-dimensional scenarios where the number of variables $p$ is comparable to, or much larger than the sample size $n$. Despite an extensive literature on this topic, researchers have focused on modeling static prin…

    Submitted 17 August, 2022; v1 submitted 7 April, 2021; originally announced April 2021.

    Comments: 35 pages, 2 figures, 4 tables

  26. arXiv:2011.00959  [pdf, other]

    stat.ME

    Sparse Functional Principal Component Analysis in High Dimensions

    Authors: Xiaoyu Hu, Fang Yao

    Abstract: Functional principal component analysis (FPCA) is a fundamental tool and has attracted increasing attention in recent decades, while existing methods are restricted to data with a single or finite number of random functions (much smaller than the sample size $n$). In this work, we focus on high-dimensional functional processes where the number of random functions $p$ is comparable to, or even much…

    Submitted 21 January, 2021; v1 submitted 2 November, 2020; originally announced November 2020.

    Comments: 27 pages, 2 figures, 3 tables

  27. arXiv:2010.15949  [pdf, other]

    cs.LG stat.ML

    Graph Regularized Autoencoder and its Application in Unsupervised Anomaly Detection

    Authors: Imtiaz Ahmed, Travis Galoppo, Xia Hu, Yu Ding

    Abstract: Dimensionality reduction is a crucial first step for many unsupervised learning tasks including anomaly detection and clustering. Autoencoder is a popular mechanism to accomplish dimensionality reduction. In order to make dimensionality reduction effective for high-dimensional data embedding nonlinear low-dimensional manifold, it is understood that some sort of geodesic distance metric should be u…

    Submitted 11 March, 2021; v1 submitted 29 October, 2020; originally announced October 2020.

  28. arXiv:2010.13015  [pdf, other]

    cs.LG stat.ML

    Towards Interaction Detection Using Topological Analysis on Neural Networks

    Authors: Zirui Liu, Qingquan Song, Kaixiong Zhou, Ting Hsiang Wang, Ying Shan, Xia Hu

    Abstract: Detecting statistical interactions between input features is a crucial and challenging task. Recent advances demonstrate that it is possible to extract learned interactions from trained neural networks. It has also been observed that, in neural networks, any interacting features must follow a strongly weighted connection to common hidden units. Motivated by the observation, in this paper, we propo…

    Submitted 3 November, 2020; v1 submitted 24 October, 2020; originally announced October 2020.

  29. A Two-Sample Conditional Distribution Test Using Conformal Prediction and Weighted Rank Sum

    Authors: Xiaoyu Hu, Jing Lei

    Abstract: We consider the problem of testing the equality of conditional distributions of a response variable given a vector of covariates between two populations. Such a hypothesis testing problem can be motivated from various machine learning and statistical inference scenarios, including transfer learning and causal predictive inference. We develop a nonparametric test procedure inspired from the conform…

    Submitted 22 February, 2023; v1 submitted 14 October, 2020; originally announced October 2020.

    Comments: 46 pages, 2 figures, 7 tables; to appear in Journal of the American Statistical Association

  30. arXiv:2009.09822  [pdf, other]

    cs.DB cs.LG stat.ML

    TODS: An Automated Time Series Outlier Detection System

    Authors: Kwei-Herng Lai, Daochen Zha, Guanchu Wang, Junjie Xu, Yue Zhao, Devesh Kumar, Yile Chen, Purav Zumkhawaka, Minyang Wan, Diego Martinez, Xia Hu

    Abstract: We present TODS, an automated Time Series Outlier Detection System for research and industrial applications. TODS is a highly modular system that supports easy pipeline construction. The basic building block of TODS is primitive, which is an implementation of a function with hyperparameters. TODS currently supports 70 primitives, including data processing, time series processing, feature analysis,…

    Submitted 7 January, 2021; v1 submitted 18 September, 2020; originally announced September 2020.

    Comments: Accepted by AAAI'21 demo track

  31. COPOD: Copula-Based Outlier Detection

    Authors: Zheng Li, Yue Zhao, Nicola Botta, Cezar Ionescu, Xiyang Hu

    Abstract: Outlier detection refers to the identification of rare items that are deviant from the general data distribution. Existing approaches suffer from high computational complexity, low predictive capability, and limited interpretability. As a remedy, we present a novel outlier detection algorithm called COPOD, which is inspired by copulas for modeling multivariate data distribution. COPOD first constr…

    Submitted 20 September, 2020; originally announced September 2020.

    Comments: Proceedings of the 2020 International Conference on Data Mining (ICDM)

    Journal ref: 2020 IEEE International Conference on Data Mining (ICDM)

  32. arXiv:2009.07415  [pdf, other]

    cs.LG stat.ML

    Meta-AAD: Active Anomaly Detection with Deep Reinforcement Learning

    Authors: Daochen Zha, Kwei-Herng Lai, Mingyang Wan, Xia Hu

    Abstract: High false-positive rate is a long-standing challenge for anomaly detection algorithms, especially in high-stake applications. To identify the true anomalies, in practice, analysts or domain experts will be employed to investigate the top instances one by one in a ranked list of anomalies identified by an anomaly detection system. This verification procedure generates informative labels that can b…

    Submitted 15 September, 2020; originally announced September 2020.

    Comments: Accepted by ICDM 2020

  33. arXiv:2008.09695  [pdf, other]

    stat.ML cs.AI cs.CV cs.LG

    A Unified Taylor Framework for Revisiting Attribution Methods

    Authors: Huiqi Deng, Na Zou, Mengnan Du, Weifu Chen, Guocan Feng, Xia Hu

    Abstract: Attribution methods have been developed to understand the decision-making process of machine learning models, especially deep neural networks, by assigning importance scores to individual features. Existing attribution methods often built upon empirical intuitions and heuristics. There still lacks a general and theoretical framework that not only can unify these attribution methods, but also theor…

    Submitted 13 April, 2021; v1 submitted 21 August, 2020; originally announced August 2020.

  34. arXiv:2008.09316  [pdf, other]

    cs.LG stat.ML

    Explainable Recommender Systems via Resolving Learning Representations

    Authors: Ninghao Liu, Yong Ge, Li Li, Xia Hu, Rui Chen, Soo-Hyun Choi

    Abstract: Recommender systems play a fundamental role in web applications in filtering massive information and matching user interests. While many efforts have been devoted to developing more effective models in various scenarios, the exploration on the explainability of recommender systems is running behind. Explanations could help improve user experience and discover system defects. In this paper, after f…

    Submitted 21 August, 2020; originally announced August 2020.

  35. arXiv:2007.07224  [pdf, other]

    cs.IR cs.LG stat.ML

    AutoRec: An Automated Recommender System

    Authors: Ting-Hsiang Wang, Qingquan Song, Xiaotian Han, Zirui Liu, Haifeng Jin, Xia Hu

    Abstract: Realistic recommender systems are often required to adapt to ever-changing data and tasks or to explore different models systematically. To address the need, we present AutoRec, an open-source automated machine learning (AutoML) platform extended from the TensorFlow ecosystem and, to our knowledge, the first framework to leverage AutoML for model search and hyperparameter tuning in deep recommenda…

    Submitted 26 June, 2020; originally announced July 2020.

  36. arXiv:2006.15097  [pdf, other]

    cs.LG stat.ML

    Policy-GNN: Aggregation Optimization for Graph Neural Networks

    Authors: Kwei-Herng Lai, Daochen Zha, Kaixiong Zhou, Xia Hu

    Abstract: Graph data are pervasive in many real-world applications. Recently, increasing attention has been paid on graph neural networks (GNNs), which aim to model the local graph structures and capture the hierarchical patterns by aggregating the information from neighbors with stackable network modules. Motivated by the observation that different nodes often require different iterations of aggregation to…

    Submitted 26 June, 2020; originally announced June 2020.

    Comments: Accepted by ACM SIGKDD'20 research track

  37. arXiv:2006.11485  [pdf, other]

    cs.LG stat.ML

    Generating Adjacency-Constrained Subgoals in Hierarchical Reinforcement Learning

    Authors: Tianren Zhang, Shangqi Guo, Tian Tan, Xiaolin Hu, Feng Chen

    Abstract: Goal-conditioned hierarchical reinforcement learning (HRL) is a promising approach for scaling up reinforcement learning (RL) techniques. However, it often suffers from training inefficiency as the action space of the high-level, i.e., the goal space, is often large. Searching in a large goal space poses difficulties for both high-level subgoal generation and low-level policy learning. In this pap…

    Submitted 18 March, 2021; v1 submitted 19 June, 2020; originally announced June 2020.

    Comments: Accepted by NeurIPS 2020

  38. arXiv:2006.11321  [pdf, other]

    cs.LG stat.ML

    AutoOD: Automated Outlier Detection via Curiosity-guided Search and Self-imitation Learning

    Authors: Yuening Li, Zhengzhang Chen, Daochen Zha, Kaixiong Zhou, Haifeng Jin, Haifeng Chen, Xia Hu

    Abstract: Outlier detection is an important data mining task with numerous practical applications such as intrusion detection, credit card fraud detection, and video surveillance. However, given a specific complicated task with big data, the process of building a powerful deep learning based system for outlier detection still highly relies on human expertise and laboring trials. Although Neural Architecture…

    Submitted 19 June, 2020; originally announced June 2020.

  39. arXiv:2006.08962  [pdf, other]

    cs.LG stat.ML

    Measuring Model Complexity of Neural Networks with Curve Activation Functions

    Authors: Xia Hu, Weiqing Liu, Jiang Bian, Jian Pei

    Abstract: It is fundamental to measure model complexity of deep neural networks. The existing literature on model complexity mainly focuses on neural networks with piecewise linear activation functions. Model complexity of neural networks with general curve activation functions remains an open problem. To tackle the challenge, in this paper, we first propose the linear approximation neural network (LANN for…

    Submitted 16 June, 2020; originally announced June 2020.

    Comments: KDD 2020

  40. arXiv:2006.06972  [pdf, other]

    cs.LG stat.ML

    Towards Deeper Graph Neural Networks with Differentiable Group Normalization

    Authors: Kaixiong Zhou, Xiao Huang, Yuening Li, Daochen Zha, Rui Chen, Xia Hu

    Abstract: Graph neural networks (GNNs), which learn the representation of a node by aggregating its neighbors, have become an effective computational tool in downstream applications. Over-smoothing is one of the key issues which limit the performance of GNNs as the number of layers increases. It is because the stacked aggregators would make node representations converge to indistinguishable vectors. Several…

    Submitted 12 June, 2020; originally announced June 2020.

  41. XGNN: Towards Model-Level Explanations of Graph Neural Networks

    Authors: Hao Yuan, Jiliang Tang, Xia Hu, Shuiwang Ji

    Abstract: Graphs neural networks (GNNs) learn node features by aggregating and combining neighbor information, which have achieved promising performance on many graph tasks. However, GNNs are mostly treated as black-boxes and lack human intelligible explanations. Thus, they cannot be fully trusted and used in certain application domains if GNN models cannot be explained. In this work, we propose a novel app…

    Submitted 3 June, 2020; originally announced June 2020.

  42. arXiv:2005.07855  [pdf]

    cs.SI cs.LG stat.ML

    Neural Stochastic Block Model & Scalable Community-Based Graph Learning

    Authors: Zheng Chen, Xinli Yu, Yuan Ling, Xiaohua Hu

    Abstract: This paper proposes a novel scalable community-based neural framework for graph learning. The framework learns the graph topology through the task of community detection and link prediction by optimizing with our proposed joint SBM loss function, which results from a non-trivial adaptation of the likelihood function of the classic Stochastic Block Model (SBM). Compared with SBM, our framework is f…

    Submitted 15 May, 2020; originally announced May 2020.

    ACM Class: I.2; H.2.8

  43. arXiv:2004.12696  [pdf, other]

    cs.LG stat.ML

    Empirical Bayes Transductive Meta-Learning with Synthetic Gradients

    Authors: Shell Xu Hu, Pablo G. Moreno, Yang Xiao, Xi Shen, Guillaume Obozinski, Neil D. Lawrence, Andreas Damianou

    Abstract: We propose a meta-learning approach that learns from multiple tasks in a transductive setting, by leveraging the unlabeled query set in addition to the support set to generate a more powerful model for each task. To develop our framework, we revisit the empirical Bayes formulation for multi-task learning. The evidence lower bound of the marginal log-likelihood of empirical Bayes decomposes as a su…

    Submitted 27 April, 2020; originally announced April 2020.

    Comments: ICLR 2020

  44. arXiv:2004.11488  [pdf, other]

    cs.LG cs.CV stat.ML

    Adversarial Attacks and Defenses: An Interpretation Perspective

    Authors: Ninghao Liu, Mengnan Du, Ruocheng Guo, Huan Liu, Xia Hu

    Abstract: Despite the recent advances in a wide spectrum of applications, machine learning models, especially deep neural networks, have been shown to be vulnerable to adversarial attacks. Attackers add carefully-crafted perturbations to input, where the perturbations are almost imperceptible to humans, but can cause models to make wrong predictions. Techniques to protect models against adversarial input ar…

    Submitted 7 October, 2020; v1 submitted 23 April, 2020; originally announced April 2020.

  45. arXiv:2004.04618  [pdf, other]

    cs.LG eess.SP stat.ML

    Deep Reinforcement Learning (DRL): Another Perspective for Unsupervised Wireless Localization

    Authors: You Li, Xin Hu, Yuan Zhuang, Zhouzheng Gao, Peng Zhang, Naser El-Sheimy

    Abstract: Location is key to spatializing internet-of-things (IoT) data. However, it is challenging to use low-cost IoT devices for robust unsupervised localization (i.e., localization without training data that have known location labels). Thus, this paper proposes a deep reinforcement learning (DRL) based unsupervised wireless-localization method. The main contributions are as follows. (1) This paper propos…

    Submitted 9 April, 2020; originally announced April 2020.

  46. arXiv:2003.12198  [pdf, other]

    stat.ML cs.LG econ.GN

    Sorting Big Data by Revealed Preference with Application to College Ranking

    Authors: Xingwei Hu

    Abstract: When ranking big data observations such as colleges in the United States, diverse consumers reveal heterogeneous preferences. The objective of this paper is to sort out a linear ordering for these observations and to recommend strategies to improve their relative positions in the ranking. A properly sorted solution could help consumers make the right choices, and governments make wise policy decis…

    Submitted 26 March, 2020; originally announced March 2020.

    Comments: 43 pages, 1 figure, 5 theorems, and 1 lemma

    MSC Class: 91A11; 91B08; 91B42; 91B68; 91B69; 91D30

    ACM Class: E.4; F.2.2; G.3; J.4

    Journal ref: Journal of Big Data, 2020

  47. arXiv:2003.11461  [pdf, other]

    cs.HC cs.LG stat.ML

    Emotion Recognition From Gait Analyses: Current Research and Future Directions

    Authors: Shihao Xu, Jing Fang, Xiping Hu, Edith Ngai, Wei Wang, Yi Guo, Victor C. M. Leung

    Abstract: Human gait refers to a daily motion that not only represents mobility but can also be used to identify the walker by either human observers or computers. Recent studies reveal that gait even conveys information about the walker's emotion. Individuals in different emotional states may show different gait patterns. The mapping between various emotions and gait patterns provides a new source for au…

    Submitted 15 July, 2022; v1 submitted 13 March, 2020; originally announced March 2020.

  48. arXiv:2003.05731  [pdf, other]

    cs.LG cs.DC cs.IR stat.ML

    SUOD: Accelerating Large-Scale Unsupervised Heterogeneous Outlier Detection

    Authors: Yue Zhao, Xiyang Hu, Cheng Cheng, Cong Wang, Changlin Wan, Wen Wang, Jianing Yang, Haoping Bai, Zheng Li, Cao Xiao, Yunlong Wang, Zhi Qiao, Jimeng Sun, Leman Akoglu

    Abstract: Outlier detection (OD) is a key machine learning (ML) task for identifying abnormal objects from general samples, with numerous high-stakes applications including fraud detection and intrusion detection. Due to the lack of ground truth labels, practitioners often have to build a large number of unsupervised, heterogeneous models (i.e., different algorithms with varying hyperparameters) for further c…

    Submitted 4 March, 2021; v1 submitted 10 March, 2020; originally announced March 2020.

    Comments: Proceedings of the 4th Conference on Machine Learning and Systems (MLSys). The code is available at http://github.com/yzhao062/SUOD. arXiv admin note: text overlap with arXiv:2002.03222

  49. arXiv:2003.05602  [pdf, other]

    cs.LG cs.AI stat.ML

    PyODDS: An End-to-end Outlier Detection System with Automated Machine Learning

    Authors: Yuening Li, Daochen Zha, Praveen Kumar Venugopal, Na Zou, Xia Hu

    Abstract: Outlier detection is an important task for various data mining applications. Current outlier detection techniques are often manually designed for specific domains, requiring substantial human effort in database setup, algorithm selection, and hyper-parameter tuning. To fill this gap, we present PyODDS, an automated end-to-end Python system for Outlier Detection with Database Support, which automaticall…

    Submitted 11 March, 2020; originally announced March 2020.

    Comments: In Companion Proceedings of the Web Conference 2020 (WWW 20)

  50. arXiv:2003.03616  [pdf, other]

    stat.ML cs.CV cs.LG math.PR

    Diffusion State Distances: Multitemporal Analysis, Fast Algorithms, and Applications to Biological Networks

    Authors: Lenore Cowen, Kapil Devkota, Xiaozhe Hu, James M. Murphy, Kaiyi Wu

    Abstract: Data-dependent metrics are powerful tools for learning the underlying structure of high-dimensional data. This article develops and analyzes a data-dependent metric known as diffusion state distance (DSD), which compares points using a data-driven diffusion process. Unlike related diffusion methods, DSDs incorporate information across time scales, which allows for the intrinsic data structure to b…

    Submitted 7 March, 2020; originally announced March 2020.

    Comments: 28 pages