Search | arXiv e-print repository

arXiv:2405.03734 [pdf, other]

FOKE: A Personalized and Explainable Education Framework Integrating Foundation Models, Knowledge Graphs, and Prompt Engineering

Authors: Silan Hu, Xiaoning Wang

Abstract: Integrating large language models (LLMs) and knowledge graphs (KGs) holds great promise for revolutionizing intelligent education, but challenges remain in achieving personalization, interactivity, and explainability. We propose FOKE, a Forest Of Knowledge and Education framework that synergizes foundation models, knowledge graphs, and prompt engineering to address these challenges. FOKE introduce… ▽ More Integrating large language models (LLMs) and knowledge graphs (KGs) holds great promise for revolutionizing intelligent education, but challenges remain in achieving personalization, interactivity, and explainability. We propose FOKE, a Forest Of Knowledge and Education framework that synergizes foundation models, knowledge graphs, and prompt engineering to address these challenges. FOKE introduces three key innovations: (1) a hierarchical knowledge forest for structured domain knowledge representation; (2) a multi-dimensional user profiling mechanism for comprehensive learner modeling; and (3) an interactive prompt engineering scheme for generating precise and tailored learning guidance. We showcase FOKE's application in programming education, homework assessment, and learning path planning, demonstrating its effectiveness and practicality. Additionally, we implement Scholar Hero, a real-world instantiation of FOKE. Our research highlights the potential of integrating foundation models, knowledge graphs, and prompt engineering to revolutionize intelligent education practices, ultimately benefiting learners worldwide. FOKE provides a principled and unified approach to harnessing cutting-edge AI technologies for personalized, interactive, and explainable educational services, paving the way for further research and development in this critical direction. △ Less

Submitted 6 May, 2024; originally announced May 2024.

arXiv:2404.19292 [pdf, other]

Provably Efficient Information-Directed Sampling Algorithms for Multi-Agent Reinforcement Learning

Authors: Qiaosheng Zhang, Chenjia Bai, Shuyue Hu, Zhen Wang, Xuelong Li

Abstract: This work designs and analyzes a novel set of algorithms for multi-agent reinforcement learning (MARL) based on the principle of information-directed sampling (IDS). These algorithms draw inspiration from foundational concepts in information theory, and are proven to be sample efficient in MARL settings such as two-player zero-sum Markov games (MGs) and multi-player general-sum MGs. For episodic t… ▽ More This work designs and analyzes a novel set of algorithms for multi-agent reinforcement learning (MARL) based on the principle of information-directed sampling (IDS). These algorithms draw inspiration from foundational concepts in information theory, and are proven to be sample efficient in MARL settings such as two-player zero-sum Markov games (MGs) and multi-player general-sum MGs. For episodic two-player zero-sum MGs, we present three sample-efficient algorithms for learning Nash equilibrium. The basic algorithm, referred to as MAIDS, employs an asymmetric learning structure where the max-player first solves a minimax optimization problem based on the joint information ratio of the joint policy, and the min-player then minimizes the marginal information ratio with the max-player's policy fixed. Theoretical analyses show that it achieves a Bayesian regret of tilde{O}(sqrt{K}) for K episodes. To reduce the computational load of MAIDS, we develop an improved algorithm called Reg-MAIDS, which has the same Bayesian regret bound while enjoying less computational complexity. Moreover, by leveraging the flexibility of IDS principle in choosing the learning target, we propose two methods for constructing compressed environments based on rate-distortion theory, upon which we develop an algorithm Compressed-MAIDS wherein the learning target is a compressed environment. Finally, we extend Reg-MAIDS to multi-player general-sum MGs and prove that it can learn either the Nash equilibrium or coarse correlated equilibrium in a sample efficient manner. △ Less

Submitted 30 April, 2024; originally announced April 2024.

arXiv:2404.09729 [pdf]

Amplitude-Phase Fusion for Enhanced Electrocardiogram Morphological Analysis

Authors: Shuaicong Hu, Yanan Wang, Jian Liu, Jingyu Lin, Shengmei Qin, Zhenning Nie, Zhifeng Yao, Wenjie Cai, Cuiwei Yang

Abstract: Considering the variability of amplitude and phase patterns in electrocardiogram (ECG) signals due to cardiac activity and individual differences, existing entropy-based studies have not fully utilized these two patterns and lack integration. To address this gap, this paper proposes a novel fusion entropy metric, morphological ECG entropy (MEE) for the first time, specifically designed for ECG mor… ▽ More Considering the variability of amplitude and phase patterns in electrocardiogram (ECG) signals due to cardiac activity and individual differences, existing entropy-based studies have not fully utilized these two patterns and lack integration. To address this gap, this paper proposes a novel fusion entropy metric, morphological ECG entropy (MEE) for the first time, specifically designed for ECG morphology, to comprehensively describe the fusion of amplitude and phase patterns. MEE is computed based on beat-level samples, enabling detailed analysis of each cardiac cycle. Experimental results demonstrate that MEE achieves rapid, accurate, and label-free localization of abnormal ECG arrhythmia regions. Furthermore, MEE provides a method for assessing sample diversity, facilitating compression of imbalanced training sets (via representative sample selection), and outperforms random pruning. Additionally, MEE exhibits the ability to describe areas of poor quality. By discussing, it proves the robustness of MEE value calculation to noise interference and its low computational complexity. Finally, we integrate this method into a clinical interactive interface to provide a more convenient and intuitive user experience. These findings indicate that MEE serves as a valuable clinical descriptor for ECG characterization. The implementation code can be referenced at the following link: https://github.com/fdu-harry/ECG-MEE-metric. △ Less

Submitted 15 April, 2024; originally announced April 2024.

Comments: 16 pages, 12 figures

ACM Class: I.5.2

arXiv:2404.03701 [pdf, other]

Predictive Analytics of Varieties of Potatoes

Authors: Fabiana Ferracina, Bala Krishnamoorthy, Mahantesh Halappanavar, Shengwei Hu, Vidyasagar Sathuvalli

Abstract: We explore the application of machine learning algorithms to predict the suitability of Russet potato clones for advancement in breeding trials. Leveraging data from manually collected trials in the state of Oregon, we investigate the potential of a wide variety of state-of-the-art binary classification models. We conduct a comprehensive analysis of the dataset that includes preprocessing, feature… ▽ More We explore the application of machine learning algorithms to predict the suitability of Russet potato clones for advancement in breeding trials. Leveraging data from manually collected trials in the state of Oregon, we investigate the potential of a wide variety of state-of-the-art binary classification models. We conduct a comprehensive analysis of the dataset that includes preprocessing, feature engineering, and imputation to address missing values. We focus on several key metrics such as accuracy, F1-score, and Matthews correlation coefficient (MCC) for model evaluation. The top-performing models, namely the multi-layer perceptron classifier (MLPC), histogram-based gradient boosting classifier (HGBC), and a support vector machine classifier (SVC), demonstrate consistent and significant results. Variable selection further enhances model performance and identifies influential features in predicting trial outcomes. The findings emphasize the potential of machine learning in streamlining the selection process for potato varieties, offering benefits such as increased efficiency, substantial cost savings, and judicious resource utilization. Our study contributes insights into precision agriculture and showcases the relevance of advanced technologies for informed decision-making in breeding programs. △ Less

Submitted 18 April, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

Comments: 19 pages, 3 figures, submitted to Artificial Intelligence in Agriculture

arXiv:2403.05425 [pdf, ps, other]

An Adaptive Dimension Reduction Estimation Method for High-dimensional Bayesian Optimization

Authors: Shouri Hu, Jiawei Li, Zhibo Cai

Abstract: Bayesian optimization (BO) has shown impressive results in a variety of applications within low-to-moderate dimensional Euclidean spaces. However, extending BO to high-dimensional settings remains a significant challenge. We address this challenge by proposing a two-step optimization framework. Initially, we identify the effective dimension reduction (EDR) subspace for the objective function using… ▽ More Bayesian optimization (BO) has shown impressive results in a variety of applications within low-to-moderate dimensional Euclidean spaces. However, extending BO to high-dimensional settings remains a significant challenge. We address this challenge by proposing a two-step optimization framework. Initially, we identify the effective dimension reduction (EDR) subspace for the objective function using the minimum average variance estimation (MAVE) method. Subsequently, we construct a Gaussian process model within this EDR subspace and optimize it using the expected improvement criterion. Our algorithm offers the flexibility to operate these steps either concurrently or in sequence. In the sequential approach, we meticulously balance the exploration-exploitation trade-off by distributing the sampling budget between subspace estimation and function optimization, and the convergence rate of our algorithm in high-dimensional contexts has been established. Numerical experiments validate the efficacy of our method in challenging scenarios. △ Less

Submitted 8 March, 2024; originally announced March 2024.

Comments: First draft

arXiv:2402.00728 [pdf, other]

Dropout-Based Rashomon Set Exploration for Efficient Predictive Multiplicity Estimation

Authors: Hsiang Hsu, Guihong Li, Shaohan Hu, Chun-Fu, Chen

Abstract: Predictive multiplicity refers to the phenomenon in which classification tasks may admit multiple competing models that achieve almost-equally-optimal performance, yet generate conflicting outputs for individual samples. This presents significant concerns, as it can potentially result in systemic exclusion, inexplicable discrimination, and unfairness in practical applications. Measuring and mitiga… ▽ More Predictive multiplicity refers to the phenomenon in which classification tasks may admit multiple competing models that achieve almost-equally-optimal performance, yet generate conflicting outputs for individual samples. This presents significant concerns, as it can potentially result in systemic exclusion, inexplicable discrimination, and unfairness in practical applications. Measuring and mitigating predictive multiplicity, however, is computationally challenging due to the need to explore all such almost-equally-optimal models, known as the Rashomon set, in potentially huge hypothesis spaces. To address this challenge, we propose a novel framework that utilizes dropout techniques for exploring models in the Rashomon set. We provide rigorous theoretical derivations to connect the dropout parameters to properties of the Rashomon set, and empirically evaluate our framework through extensive experimentation. Numerical results show that our technique consistently outperforms baselines in terms of the effectiveness of predictive multiplicity metric estimation, with runtime speedup up to $20\times \sim 5000\times$. With efficient Rashomon set exploration and metric estimation, mitigation of predictive multiplicity is then achieved through dropout ensemble and model selection. △ Less

Submitted 1 February, 2024; originally announced February 2024.

Comments: ICLR 2024

arXiv:2309.13557 [pdf, other]

Bayesian Parameter Inference for Partially Observed Diffusions using Multilevel Stochastic Runge-Kutta Methods

Authors: Pierre Del Moral, Shulan Hu, Ajay Jasra, Hamza Ruzayqat, Xinyu Wang

Abstract: We consider the problem of Bayesian estimation of static parameters associated to a partially and discretely observed diffusion process. We assume that the exact transition dynamics of the diffusion process are unavailable, even up-to an unbiased estimator and that one must time-discretize the diffusion process. In such scenarios it has been shown how one can introduce the multilevel Monte Carlo m… ▽ More We consider the problem of Bayesian estimation of static parameters associated to a partially and discretely observed diffusion process. We assume that the exact transition dynamics of the diffusion process are unavailable, even up-to an unbiased estimator and that one must time-discretize the diffusion process. In such scenarios it has been shown how one can introduce the multilevel Monte Carlo method to reduce the cost to compute posterior expected values of the parameters for a pre-specified mean square error (MSE). These afore-mentioned methods rely on upon the Euler-Maruyama discretization scheme which is well-known in numerical analysis to have slow convergence properties. We adapt stochastic Runge-Kutta (SRK) methods for Bayesian parameter estimation of static parameters for diffusions. This can be implemented in high-dimensions of the diffusion and seemingly under-appreciated in the uncertainty quantification and statistics fields. For a class of diffusions and SRK methods, we consider the estimation of the posterior expectation of the parameters. We prove that to achieve a MSE of $\mathcal{O}(ε^2)$, for $ε>0$ given, the associated work is $\mathcal{O}(ε^{-2})$. Whilst the latter is achievable for the Milstein scheme, this method is often not applicable for diffusions in dimension larger than two. We also illustrate our methodology in several numerical examples. △ Less

Submitted 24 September, 2023; originally announced September 2023.

arXiv:2309.05145 [pdf, other]

Outlier Robust Adversarial Training

Authors: Shu Hu, Zhenhuan Yang, Xin Wang, Yiming Ying, Siwei Lyu

Abstract: Supervised learning models are challenged by the intrinsic complexities of training data such as outliers and minority subpopulations and intentional attacks at inference time with adversarial samples. While traditional robust learning methods and the recent adversarial training approaches are designed to handle each of the two challenges, to date, no work has been done to develop models that are… ▽ More Supervised learning models are challenged by the intrinsic complexities of training data such as outliers and minority subpopulations and intentional attacks at inference time with adversarial samples. While traditional robust learning methods and the recent adversarial training approaches are designed to handle each of the two challenges, to date, no work has been done to develop models that are robust with regard to the low-quality training data and the potential adversarial attack at inference time simultaneously. It is for this reason that we introduce Outlier Robust Adversarial Training (ORAT) in this work. ORAT is based on a bi-level optimization formulation of adversarial training with a robust rank-based loss function. Theoretically, we show that the learning objective of ORAT satisfies the $\mathcal{H}$-consistency in binary classification, which establishes it as a proper surrogate to adversarial 0/1 loss. Furthermore, we analyze its generalization ability and provide uniform convergence rates in high probability. ORAT can be optimized with a simple algorithm. Experimental evaluations on three benchmark datasets demonstrate the effectiveness and robustness of ORAT in handling outliers and adversarial attacks. Our code is available at https://github.com/discovershu/ORAT. △ Less

Submitted 10 September, 2023; originally announced September 2023.

Comments: Accepted by The 15th Asian Conference on Machine Learning (ACML 2023)

arXiv:2308.04158 [pdf, other]

A Dual Cox Model Theory And Its Applications In Oncology

Authors: Powei Chen, Siying Hu, Dr. Haojin Zhou

Abstract: Given the prominence of targeted therapy and immunotherapy in cancer treatment, it becomes imperative to consider heterogeneity in patients' responses to treatments, which contributes greatly to the widely used proportional hazard assumption invalidated as in several clinical trials. To address the challenge, we develop a Dual Cox model theory including a Dual Cox model and a fitting algorithm.… ▽ More Given the prominence of targeted therapy and immunotherapy in cancer treatment, it becomes imperative to consider heterogeneity in patients' responses to treatments, which contributes greatly to the widely used proportional hazard assumption invalidated as in several clinical trials. To address the challenge, we develop a Dual Cox model theory including a Dual Cox model and a fitting algorithm. As one of the finite mixture models, the proposed Dual Cox model consists of two independent Cox models based on patients' responses to one designated treatment (usually the experimental one) in the clinical trial. Responses of patients in the designated treatment arm can be observed and hence those patients are known responders or non-responders. From the perspective of subgroup classification, such a phenomenon renders the proposed model as a semi-supervised problem, compared to the typical finite mixture model where the subgroup classification is usually unsupervised. A specialized expectation-maximization algorithm is utilized for model fitting, where the initial parameter values are estimated from the patients in the designated treatment arm and then the iteratively reweighted least squares (IRLS) is applied. Under mild assumptions, the consistency and asymptotic normality of its estimators of effect parameters in each Cox model are established. In addition to strong theoretical properties, simulations demonstrate that our theory can provide a good approximation to a wide variety of survival models, is relatively robust to the change of censoring rate and response rate, and has a high prediction accuracy and stability in subgroup classification while it has a fast convergence rate. Finally, we apply our theory to two clinical trials with cross-overed KM plots and identify the subgroups where the subjects benefit from the treatment or not. △ Less

Submitted 8 August, 2023; originally announced August 2023.

arXiv:2211.10508 [pdf, other]

Distributionally Robust Survival Analysis: A Novel Fairness Loss Without Demographics

Authors: Shu Hu, George H. Chen

Abstract: We propose a general approach for training survival analysis models that minimizes a worst-case error across all subpopulations that are large enough (occurring with at least a user-specified minimum probability). This approach uses a training loss function that does not know any demographic information to treat as sensitive. Despite this, we demonstrate that our proposed approach often scores bet… ▽ More We propose a general approach for training survival analysis models that minimizes a worst-case error across all subpopulations that are large enough (occurring with at least a user-specified minimum probability). This approach uses a training loss function that does not know any demographic information to treat as sensitive. Despite this, we demonstrate that our proposed approach often scores better on recently established fairness metrics (without a significant drop in prediction accuracy) compared to various baselines, including ones which directly use sensitive demographic information in their training loss. Our code is available at: https://github.com/discovershu/DRO_COX △ Less

Submitted 18 November, 2022; originally announced November 2022.

Comments: Machine Learning for Health (ML4H 2022)

arXiv:2209.00383 [pdf, other]

TokenCut: Segmenting Objects in Images and Videos with Self-supervised Transformer and Normalized Cut

Authors: Yangtao Wang, Xi Shen, Yuan Yuan, Yuming Du, Maomao Li, Shell Xu Hu, James L Crowley, Dominique Vaufreydaz

Abstract: In this paper, we describe a graph-based algorithm that uses the features obtained by a self-supervised transformer to detect and segment salient objects in images and videos. With this approach, the image patches that compose an image or video are organised into a fully connected graph, where the edge between each pair of patches is labeled with a similarity score between patches using features l… ▽ More In this paper, we describe a graph-based algorithm that uses the features obtained by a self-supervised transformer to detect and segment salient objects in images and videos. With this approach, the image patches that compose an image or video are organised into a fully connected graph, where the edge between each pair of patches is labeled with a similarity score between patches using features learned by the transformer. Detection and segmentation of salient objects is then formulated as a graph-cut problem and solved using the classical Normalized Cut algorithm. Despite the simplicity of this approach, it achieves state-of-the-art results on several common image and video detection and segmentation tasks. For unsupervised object discovery, this approach outperforms the competing approaches by a margin of 6.1%, 5.7%, and 2.6%, respectively, when tested with the VOC07, VOC12, and COCO20K datasets. For the unsupervised saliency detection task in images, this method improves the score for Intersection over Union (IoU) by 4.4%, 5.6% and 5.2%. When tested with the ECSSD, DUTS, and DUT-OMRON datasets, respectively, compared to current state-of-the-art techniques. This method also achieves competitive results for unsupervised video object segmentation tasks with the DAVIS, SegTV2, and FBMS datasets. △ Less

Submitted 5 December, 2023; v1 submitted 1 September, 2022; originally announced September 2022.

Comments: arXiv admin note: text overlap with arXiv:2202.11539

arXiv:2208.02627 [pdf, ps, other]

Modelling multivariate extreme value distributions via Markov trees

Authors: Shuang Hu, Zuoxiang Peng, Johan Segers

Abstract: Multivariate extreme value distributions are a common choice for modelling multivariate extremes. In high dimensions, however, the construction of flexible and parsimonious models is challenging. We propose to combine bivariate extreme value distributions into a Markov random field with respect to a tree. Although in general not an extreme value distribution itself, this Markov tree is attracted b… ▽ More Multivariate extreme value distributions are a common choice for modelling multivariate extremes. In high dimensions, however, the construction of flexible and parsimonious models is challenging. We propose to combine bivariate extreme value distributions into a Markov random field with respect to a tree. Although in general not an extreme value distribution itself, this Markov tree is attracted by a multivariate extreme value distribution. The latter serves as a tree-based approximation to an unknown extreme value distribution with the given bivariate distributions as margins. Given data, we learn an appropriate tree structure by Prim's algorithm with estimated pairwise upper tail dependence coefficients or Kendall's tau values as edge weights. The distributions of pairs of connected variables can be fitted in various ways. The resulting tree-structured extreme value distribution allows for inference on rare event probabilities, as illustrated on river discharge data from the upper Danube basin. △ Less

Submitted 29 July, 2022; originally announced August 2022.

Comments: 37 pages, 10 figures, 7 tables

MSC Class: 62G32; 62H22

arXiv:2207.07624 [pdf, other]

Feed-Forward Latent Domain Adaptation

Authors: Ondrej Bohdal, Da Li, Shell Xu Hu, Timothy Hospedales

Abstract: We study a new highly-practical problem setting that enables resource-constrained edge devices to adapt a pre-trained model to their local data distributions. Recognizing that device's data are likely to come from multiple latent domains that include a mixture of unlabelled domain-relevant and domain-irrelevant examples, we focus on the comparatively under-studied problem of latent domain adaptati… ▽ More We study a new highly-practical problem setting that enables resource-constrained edge devices to adapt a pre-trained model to their local data distributions. Recognizing that device's data are likely to come from multiple latent domains that include a mixture of unlabelled domain-relevant and domain-irrelevant examples, we focus on the comparatively under-studied problem of latent domain adaptation. Considering limitations of edge devices, we aim to only use a pre-trained model and adapt it in a feed-forward way, without using back-propagation and without access to the source data. Modelling these realistic constraints bring us to the novel and practically important problem setting of feed-forward latent domain adaptation. Our solution is to meta-learn a network capable of embedding the mixed-relevance target dataset and dynamically adapting inference for target examples using cross-attention. The resulting framework leads to consistent improvements over strong ERM baselines. We also show that our framework sometimes even improves on the upper bound of domain-supervised adaptation, where only domain-relevant instances are provided for adaptation. This suggests that human annotated domain labels may not always be optimal, and raises the possibility of doing better through automated instance selection. △ Less

Submitted 31 January, 2024; v1 submitted 15 July, 2022; originally announced July 2022.

Comments: Accepted at WACV 2024. Project page: https://ondrejbohdal.github.io/cxda

arXiv:2206.13140 [pdf, other]

Compressing Features for Learning with Noisy Labels

Authors: Yingyi Chen, Shell Xu Hu, Xi Shen, Chunrong Ai, Johan A. K. Suykens

Abstract: Supervised learning can be viewed as distilling relevant information from input data into feature representations. This process becomes difficult when supervision is noisy as the distilled information might not be relevant. In fact, recent research shows that networks can easily overfit all labels including those that are corrupted, and hence can hardly generalize to clean datasets. In this paper,… ▽ More Supervised learning can be viewed as distilling relevant information from input data into feature representations. This process becomes difficult when supervision is noisy as the distilled information might not be relevant. In fact, recent research shows that networks can easily overfit all labels including those that are corrupted, and hence can hardly generalize to clean datasets. In this paper, we focus on the problem of learning with noisy labels and introduce compression inductive bias to network architectures to alleviate this over-fitting problem. More precisely, we revisit one classical regularization named Dropout and its variant Nested Dropout. Dropout can serve as a compression constraint for its feature dropping mechanism, while Nested Dropout further learns ordered feature representations w.r.t. feature importance. Moreover, the trained models with compression regularization are further combined with Co-teaching for performance boost. Theoretically, we conduct bias-variance decomposition of the objective function under compression regularization. We analyze it for both single model and Co-teaching. This decomposition provides three insights: (i) it shows that over-fitting is indeed an issue for learning with noisy labels; (ii) through an information bottleneck formulation, it explains why the proposed feature compression helps in combating label noise; (iii) it gives explanations on the performance boost brought by incorporating compression regularization into Co-teaching. Experiments show that our simple approach can have comparable or even better performance than the state-of-the-art methods on benchmarks with real-world label noise including Clothing1M and ANIMAL-10N. Our implementation is available at https://yingyichen-cyy.github.io/CompressFeatNoisyLabels/. △ Less

Submitted 27 June, 2022; originally announced June 2022.

Comments: Accepted to TNNLS 2022. Project page: https://yingyichen-cyy.github.io/CompressFeatNoisyLabels/

arXiv:2206.08531 [pdf, ps, other]

Reframed GES with a Neural Conditional Dependence Measure

Authors: Xinwei Shen, Shengyu Zhu, Jiji Zhang, Shoubo Hu, Zhitang Chen

Abstract: In a nonparametric setting, the causal structure is often identifiable only up to Markov equivalence, and for the purpose of causal inference, it is useful to learn a graphical representation of the Markov equivalence class (MEC). In this paper, we revisit the Greedy Equivalence Search (GES) algorithm, which is widely cited as a score-based algorithm for learning the MEC of the underlying causal s… ▽ More In a nonparametric setting, the causal structure is often identifiable only up to Markov equivalence, and for the purpose of causal inference, it is useful to learn a graphical representation of the Markov equivalence class (MEC). In this paper, we revisit the Greedy Equivalence Search (GES) algorithm, which is widely cited as a score-based algorithm for learning the MEC of the underlying causal structure. We observe that in order to make the GES algorithm consistent in a nonparametric setting, it is not necessary to design a scoring metric that evaluates graphs. Instead, it suffices to plug in a consistent estimator of a measure of conditional dependence to guide the search. We therefore present a reframing of the GES algorithm, which is more flexible than the standard score-based version and readily lends itself to the nonparametric setting with a general measure of conditional dependence. In addition, we propose a neural conditional dependence (NCD) measure, which utilizes the expressive power of deep neural networks to characterize conditional independence in a nonparametric manner. We establish the optimality of the reframed GES algorithm under standard assumptions and the consistency of using our NCD estimator to decide conditional independence. Together these results justify the proposed approach. Experimental results demonstrate the effectiveness of our method in causal discovery, as well as the advantages of using our NCD measure over kernel-based measures. △ Less

Submitted 16 June, 2022; originally announced June 2022.

Comments: Accepted to UAI 2022

arXiv:2206.07902 [pdf, other]

On Privacy and Personalization in Cross-Silo Federated Learning

Authors: Ziyu Liu, Shengyuan Hu, Zhiwei Steven Wu, Virginia Smith

Abstract: While the application of differential privacy (DP) has been well-studied in cross-device federated learning (FL), there is a lack of work considering DP and its implications for cross-silo FL, a setting characterized by a limited number of clients each containing many data subjects. In cross-silo FL, usual notions of client-level DP are less suitable as real-world privacy regulations typically con… ▽ More While the application of differential privacy (DP) has been well-studied in cross-device federated learning (FL), there is a lack of work considering DP and its implications for cross-silo FL, a setting characterized by a limited number of clients each containing many data subjects. In cross-silo FL, usual notions of client-level DP are less suitable as real-world privacy regulations typically concern the in-silo data subjects rather than the silos themselves. In this work, we instead consider an alternative notion of silo-specific sample-level DP, where silos set their own privacy targets for their local examples. Under this setting, we reconsider the roles of personalization in federated learning. In particular, we show that mean-regularized multi-task learning (MR-MTL), a simple personalization framework, is a strong baseline for cross-silo FL: under stronger privacy requirements, silos are incentivized to federate more with each other to mitigate DP noise, resulting in consistent improvements relative to standard baseline methods. We provide an empirical study of competing methods as well as a theoretical characterization of MR-MTL for mean estimation, highlighting the interplay between privacy and cross-silo data heterogeneity. Our work serves to establish baselines for private cross-silo FL as well as identify key directions of future work in this area. △ Less

Submitted 17 October, 2022; v1 submitted 15 June, 2022; originally announced June 2022.

Comments: NeurIPS 2022, 37 pages

arXiv:2203.11691 [pdf, other]

GAM(L)A: An econometric model for interpretable Machine Learning

Authors: Emmanuel Flachaire, Gilles Hacheme, Sullivan Hué, Sébastien Laurent

Abstract: Despite their high predictive performance, random forest and gradient boosting are often considered as black boxes or uninterpretable models which has raised concerns from practitioners and regulators. As an alternative, we propose in this paper to use partial linear models that are inherently interpretable. Specifically, this article introduces GAM-lasso (GAMLA) and GAM-autometrics (GAMA), denote… ▽ More Despite their high predictive performance, random forest and gradient boosting are often considered as black boxes or uninterpretable models which has raised concerns from practitioners and regulators. As an alternative, we propose in this paper to use partial linear models that are inherently interpretable. Specifically, this article introduces GAM-lasso (GAMLA) and GAM-autometrics (GAMA), denoted as GAM(L)A in short. GAM(L)A combines parametric and non-parametric functions to accurately capture linearities and non-linearities prevailing between dependent and explanatory variables, and a variable selection procedure to control for overfitting issues. Estimation relies on a two-step procedure building upon the double residual method. We illustrate the predictive performance and interpretability of GAM(L)A on a regression and a classification problem. The results show that GAM(L)A outperforms parametric models augmented by quadratic, cubic and interaction effects. Moreover, the results also suggest that the performance of GAM(L)A is not significantly different from that of random forest and gradient boosting. △ Less

Submitted 17 March, 2022; originally announced March 2022.

Comments: 47 pages, 12 tables and 7 figures

arXiv:2202.11539 [pdf, other]

Self-Supervised Transformers for Unsupervised Object Discovery using Normalized Cut

Authors: Yangtao Wang, Xi Shen, Shell Hu, Yuan Yuan, James Crowley, Dominique Vaufreydaz

Abstract: Transformers trained with self-supervised learning using self-distillation loss (DINO) have been shown to produce attention maps that highlight salient foreground objects. In this paper, we demonstrate a graph-based approach that uses the self-supervised transformer features to discover an object from an image. Visual tokens are viewed as nodes in a weighted graph with edges representing a connect… ▽ More Transformers trained with self-supervised learning using self-distillation loss (DINO) have been shown to produce attention maps that highlight salient foreground objects. In this paper, we demonstrate a graph-based approach that uses the self-supervised transformer features to discover an object from an image. Visual tokens are viewed as nodes in a weighted graph with edges representing a connectivity score based on the similarity of tokens. Foreground objects can then be segmented using a normalized graph-cut to group self-similar regions. We solve the graph-cut problem using spectral clustering with generalized eigen-decomposition and show that the second smallest eigenvector provides a cutting solution since its absolute value indicates the likelihood that a token belongs to a foreground object. Despite its simplicity, this approach significantly boosts the performance of unsupervised object discovery: we improve over the recent state of the art LOST by a margin of 6.9%, 8.1%, and 8.1% respectively on the VOC07, VOC12, and COCO20K. The performance can be further improved by adding a second stage class-agnostic detector (CAD). Our proposed method can be easily extended to unsupervised saliency detection and weakly supervised object detection. For unsupervised saliency detection, we improve IoU for 4.9%, 5.2%, 12.9% on ECSSD, DUTS, DUT-OMRON respectively compared to previous state of the art. For weakly supervised object detection, we achieve competitive performance on CUB and ImageNet. △ Less

Submitted 24 March, 2022; v1 submitted 23 February, 2022; originally announced February 2022.

Journal ref: CVPR 2022 - Conference on Computer Vision and Pattern Recognition, Jun 2022, New Orleans, United States

arXiv:2106.03300 [pdf, other]

Sum of Ranked Range Loss for Supervised Learning

Authors: Shu Hu, Yiming Ying, Xin Wang, Siwei Lyu

Abstract: In forming learning objectives, one oftentimes needs to aggregate a set of individual values to a single output. Such cases occur in the aggregate loss, which combines individual losses of a learning model over each training sample, and in the individual loss for multi-label learning, which combines prediction scores over all class labels. In this work, we introduce the sum of ranked range (SoRR)… ▽ More In forming learning objectives, one oftentimes needs to aggregate a set of individual values to a single output. Such cases occur in the aggregate loss, which combines individual losses of a learning model over each training sample, and in the individual loss for multi-label learning, which combines prediction scores over all class labels. In this work, we introduce the sum of ranked range (SoRR) as a general approach to form learning objectives. A ranked range is a consecutive sequence of sorted values of a set of real numbers. The minimization of SoRR is solved with the difference of convex algorithm (DCA). We explore two applications in machine learning of the minimization of the SoRR framework, namely the AoRR aggregate loss for binary/multi-class classification at the sample level and the TKML individual loss for multi-label/multi-class classification at the label level. A combination loss of AoRR and TKML is proposed as a new learning objective for improving the robustness of multi-label learning in the face of outliers in sample and labels alike. Our empirical results highlight the effectiveness of the proposed optimization frameworks and demonstrate the applicability of proposed losses using synthetic and real data sets. △ Less

Submitted 3 April, 2022; v1 submitted 6 June, 2021; originally announced June 2021.

Comments: Accepted by Journal of Machine Learning Research (JMLR). arXiv admin note: text overlap with arXiv:2010.01741

arXiv:2106.00925 [pdf, other]

Contrastive ACE: Domain Generalization Through Alignment of Causal Mechanisms

Authors: Yunqi Wang, Furui Liu, Zhitang Chen, Qing Lian, Shoubo Hu, Jianye Hao, Yik-Chung Wu

Abstract: Domain generalization aims to learn knowledge invariant across different distributions while semantically meaningful for downstream tasks from multiple source domains, to improve the model's generalization ability on unseen target domains. The fundamental objective is to understand the underlying "invariance" behind these observational distributions and such invariance has been shown to have a clo… ▽ More Domain generalization aims to learn knowledge invariant across different distributions while semantically meaningful for downstream tasks from multiple source domains, to improve the model's generalization ability on unseen target domains. The fundamental objective is to understand the underlying "invariance" behind these observational distributions and such invariance has been shown to have a close connection to causality. While many existing approaches make use of the property that causal features are invariant across domains, we consider the causal invariance of the average causal effect of the features to the labels. This invariance regularizes our training approach in which interventions are performed on features to enforce stability of the causal prediction by the classifier across domains. Our work thus sheds some light on the domain generalization problem by introducing invariance of the mechanisms into the learning process. Experiments on several benchmark datasets demonstrate the performance of the proposed method against SOTAs. △ Less

Submitted 2 June, 2021; originally announced June 2021.

arXiv:2103.10912 [pdf, other]

Copula Averaging for Tail Dependence in Insurance Claims Data

Authors: Sen Hu, Adrian O'Hagan

Abstract: Analysing dependent risks is an important task for insurance companies. A dependency is reflected in the fact that information about one random variable provides information about the likely distribution of values of another random variable. Insurance companies in particular must investigate such dependencies between different lines of business and the effects that an extreme loss event, such as a… ▽ More Analysing dependent risks is an important task for insurance companies. A dependency is reflected in the fact that information about one random variable provides information about the likely distribution of values of another random variable. Insurance companies in particular must investigate such dependencies between different lines of business and the effects that an extreme loss event, such as an earthquake or hurricane, has across multiple lines of business simultaneously. Copulas provide a popular model-based approach to analysing the dependency between risks, and the coefficient of tail dependence is a measure of dependence for extreme losses. Besides commonly used empirical estimators for estimating the tail dependence coefficient, copula fitting can lead to estimation of such coefficients directly or can verify their existence. Generally, a range of copula models is available to fit a data set well, leading to multiple different tail dependence results; a method based on Bayesian model averaging is designed to obtain a unified estimate of tail dependence. In this article, this model-based coefficient estimation method is illustrated through a variety of copula fitting approaches and results are presented for several simulated data sets and also a real general insurance loss data set. △ Less

Submitted 19 March, 2021; originally announced March 2021.

arXiv:2012.04221 [pdf, other]

Ditto: Fair and Robust Federated Learning Through Personalization

Authors: Tian Li, Shengyuan Hu, Ahmad Beirami, Virginia Smith

Abstract: Fairness and robustness are two important concerns for federated learning systems. In this work, we identify that robustness to data and model poisoning attacks and fairness, measured as the uniformity of performance across devices, are competing constraints in statistically heterogeneous networks. To address these constraints, we propose employing a simple, general framework for personalized fede… ▽ More Fairness and robustness are two important concerns for federated learning systems. In this work, we identify that robustness to data and model poisoning attacks and fairness, measured as the uniformity of performance across devices, are competing constraints in statistically heterogeneous networks. To address these constraints, we propose employing a simple, general framework for personalized federated learning, Ditto, that can inherently provide fairness and robustness benefits, and develop a scalable solver for it. Theoretically, we analyze the ability of Ditto to achieve fairness and robustness simultaneously on a class of linear problems. Empirically, across a suite of federated datasets, we show that Ditto not only achieves competitive performance relative to recent personalization methods, but also enables more accurate, robust, and fair models relative to state-of-the-art fair or robust baselines. △ Less

Submitted 15 June, 2021; v1 submitted 8 December, 2020; originally announced December 2020.

Comments: Accepted by ICML 2021

arXiv:2010.01741 [pdf, other]

Learning by Minimizing the Sum of Ranked Range

Authors: Shu Hu, Yiming Ying, Xin Wang, Siwei Lyu

Abstract: In forming learning objectives, one oftentimes needs to aggregate a set of individual values to a single output. Such cases occur in the aggregate loss, which combines individual losses of a learning model over each training sample, and in the individual loss for multi-label learning, which combines prediction scores over all class labels. In this work, we introduce the sum of ranked range (SoRR)… ▽ More In forming learning objectives, one oftentimes needs to aggregate a set of individual values to a single output. Such cases occur in the aggregate loss, which combines individual losses of a learning model over each training sample, and in the individual loss for multi-label learning, which combines prediction scores over all class labels. In this work, we introduce the sum of ranked range (SoRR) as a general approach to form learning objectives. A ranked range is a consecutive sequence of sorted values of a set of real numbers. The minimization of SoRR is solved with the difference of convex algorithm (DCA). We explore two applications in machine learning of the minimization of the SoRR framework, namely the AoRR aggregate loss for binary classification and the TKML individual loss for multi-label/multi-class classification. Our empirical results highlight the effectiveness of the proposed optimization framework and demonstrate the applicability of proposed losses using synthetic and real datasets. △ Less

Submitted 4 October, 2020; originally announced October 2020.

Comments: Accepted by Thirty-fourth Conference on Neural Information Processing Systems (NeurIPS 2020)

arXiv:2009.04197

QR-MIX: Distributional Value Function Factorisation for Cooperative Multi-Agent Reinforcement Learning

Authors: Jian Hu, Seth Austin Harding, Haibin Wu, Siyue Hu, Shih-wei Liao

Abstract: In Cooperative Multi-Agent Reinforcement Learning (MARL) and under the setting of Centralized Training with Decentralized Execution (CTDE), agents observe and interact with their environment locally and independently. With local observation and random sampling, the randomness in rewards and observations leads to randomness in long-term returns. Existing methods such as Value Decomposition Network… ▽ More In Cooperative Multi-Agent Reinforcement Learning (MARL) and under the setting of Centralized Training with Decentralized Execution (CTDE), agents observe and interact with their environment locally and independently. With local observation and random sampling, the randomness in rewards and observations leads to randomness in long-term returns. Existing methods such as Value Decomposition Network (VDN) and QMIX estimate the value of long-term returns as a scalar that does not contain the information of randomness. Our proposed model QR-MIX introduces quantile regression, modeling joint state-action values as a distribution, combining QMIX with Implicit Quantile Network (IQN). However, the monotonicity in QMIX limits the expression of joint state-action value distribution and may lead to incorrect estimation results in non-monotonic cases. Therefore, we proposed a flexible loss function to approximate the monotonicity found in QMIX. Our model is not only more tolerant of the randomness of returns, but also more tolerant of the randomness of monotonic constraints. The experimental results demonstrate that QR-MIX outperforms the previous state-of-the-art method QMIX in the StarCraft Multi-Agent Challenge (SMAC) environment. △ Less

Submitted 23 February, 2021; v1 submitted 9 September, 2020; originally announced September 2020.

Comments: There are some experimental errors and experimental unfairness in this paper that will seriously affect the later studies

arXiv:2009.01272 [pdf, other]

Understanding the wiring evolution in differentiable neural architecture search

Authors: Sirui Xie, Shoukang Hu, Xinjiang Wang, Chunxiao Liu, Jianping Shi, Xunying Liu, Dahua Lin

Abstract: Controversy exists on whether differentiable neural architecture search methods discover wiring topology effectively. To understand how wiring topology evolves, we study the underlying mechanism of several existing differentiable NAS frameworks. Our investigation is motivated by three observed searching patterns of differentiable NAS: 1) they search by growing instead of pruning; 2) wider networks… ▽ More Controversy exists on whether differentiable neural architecture search methods discover wiring topology effectively. To understand how wiring topology evolves, we study the underlying mechanism of several existing differentiable NAS frameworks. Our investigation is motivated by three observed searching patterns of differentiable NAS: 1) they search by growing instead of pruning; 2) wider networks are more preferred than deeper ones; 3) no edges are selected in bi-level optimization. To anatomize these phenomena, we propose a unified view on searching algorithms of existing frameworks, transferring the global optimization to local cost minimization. Based on this reformulation, we conduct empirical and theoretical analyses, revealing implicit inductive biases in the cost's assignment mechanism and evolution dynamics that cause the observed phenomena. These biases indicate strong discrimination towards certain topologies. To this end, we pose questions that future differentiable methods for neural wiring discovery need to confront, hoping to evoke a discussion and rethinking on how much bias has been enforced implicitly in existing NAS methods. △ Less

Submitted 25 February, 2021; v1 submitted 2 September, 2020; originally announced September 2020.

Comments: AISTATS 2021

arXiv:2006.13681 [pdf, other]

Multi-view Drone-based Geo-localization via Style and Spatial Alignment

Authors: Siyi Hu, Xiaojun Chang

Abstract: In this paper, we focus on the task of multi-view multi-source geo-localization, which serves as an important auxiliary method of GPS positioning by matching drone-view image and satellite-view image with pre-annotated GPS tag. To solve this problem, most existing methods adopt metric loss with an weighted classification block to force the generation of common feature space shared by different vie… ▽ More In this paper, we focus on the task of multi-view multi-source geo-localization, which serves as an important auxiliary method of GPS positioning by matching drone-view image and satellite-view image with pre-annotated GPS tag. To solve this problem, most existing methods adopt metric loss with an weighted classification block to force the generation of common feature space shared by different view points and view sources. However, these methods fail to pay sufficient attention to spatial information (especially viewpoint variances). To address this drawback, we propose an elegant orientation-based method to align the patterns and introduce a new branch to extract aligned partial feature. Moreover, we provide a style alignment strategy to reduce the variance in image style and enhance the feature unification. To demonstrate the performance of the proposed approach, we conduct extensive experiments on the large-scale benchmark dataset. The experimental results confirm the superiority of the proposed approach compared to state-of-the-art alternatives. △ Less

Submitted 8 July, 2020; v1 submitted 23 June, 2020; originally announced June 2020.

Comments: 9 pages 9 figures. arXiv admin note: text overlap with arXiv:2002.12186 by other authors

ACM Class: I.4.7; I.2.10

arXiv:2006.13463 [pdf, other]

Graph Policy Network for Transferable Active Learning on Graphs

Authors: Shengding Hu, Zheng Xiong, Meng Qu, Xingdi Yuan, Marc-Alexandre Côté, Zhiyuan Liu, Jian Tang

Abstract: Graph neural networks (GNNs) have been attracting increasing popularity due to their simplicity and effectiveness in a variety of fields. However, a large number of labeled data is generally required to train these networks, which could be very expensive to obtain in some domains. In this paper, we study active learning for GNNs, i.e., how to efficiently label the nodes on a graph to reduce the an… ▽ More Graph neural networks (GNNs) have been attracting increasing popularity due to their simplicity and effectiveness in a variety of fields. However, a large number of labeled data is generally required to train these networks, which could be very expensive to obtain in some domains. In this paper, we study active learning for GNNs, i.e., how to efficiently label the nodes on a graph to reduce the annotation cost of training GNNs. We formulate the problem as a sequential decision process on graphs and train a GNN-based policy network with reinforcement learning to learn the optimal query strategy. By jointly training on several source graphs with full labels, we learn a transferable active learning policy which can directly generalize to unlabeled target graphs. Experimental results on multiple datasets from different domains prove the effectiveness of the learned policy in promoting active learning performance in both settings of transferring between graphs in the same domain and across different domains. △ Less

Submitted 23 October, 2020; v1 submitted 24 June, 2020; originally announced June 2020.

ACM Class: I.2

arXiv:2006.07856 [pdf, other]

doi 10.1145/3510540

The OARF Benchmark Suite: Characterization and Implications for Federated Learning Systems

Authors: Sixu Hu, Yuan Li, Xu Liu, Qinbin Li, Zhaomin Wu, Bingsheng He

Abstract: This paper presents and characterizes an Open Application Repository for Federated Learning (OARF), a benchmark suite for federated machine learning systems. Previously available benchmarks for federated learning have focused mainly on synthetic datasets and use a limited number of applications. OARF mimics more realistic application scenarios with publicly available data sets as different data si… ▽ More This paper presents and characterizes an Open Application Repository for Federated Learning (OARF), a benchmark suite for federated machine learning systems. Previously available benchmarks for federated learning have focused mainly on synthetic datasets and use a limited number of applications. OARF mimics more realistic application scenarios with publicly available data sets as different data silos in image, text and structured data. Our characterization shows that the benchmark suite is diverse in data size, distribution, feature distribution and learning task complexity. The extensive evaluations with reference implementations show the future research opportunities for important aspects of federated learning systems. We have developed reference implementations, and evaluated the important aspects of federated learning, including model accuracy, communication cost, throughput and convergence time. Through these evaluations, we discovered some interesting findings such as federated learning can effectively increase end-to-end throughput. △ Less

Submitted 2 March, 2022; v1 submitted 14 June, 2020; originally announced June 2020.

Comments: ACM Transactions on Intelligent Systems and Technology, Vol. 13, No. 4, Article 63

arXiv:2006.04877 [pdf, other]

A Causal Direction Test for Heterogeneous Populations

Authors: Vahid Partovi Nia, Xinlin Li, Masoud Asgharian, Shoubo Hu, Zhitang Chen, Yanhui Geng

Abstract: A probabilistic expert system emulates the decision-making ability of a human expert through a directional graphical model. The first step in building such systems is to understand data generation mechanism. To this end, one may try to decompose a multivariate distribution into product of several conditionals, and evolving a blackbox machine learning predictive models towards transparent cause-and… ▽ More A probabilistic expert system emulates the decision-making ability of a human expert through a directional graphical model. The first step in building such systems is to understand data generation mechanism. To this end, one may try to decompose a multivariate distribution into product of several conditionals, and evolving a blackbox machine learning predictive models towards transparent cause-and-effect discovery. Most causal models assume a single homogeneous population, an assumption that may fail to hold in many applications. We show that when the homogeneity assumption is violated, causal models developed based on such assumption can fail to identify the correct causal direction. We propose an adjustment to a commonly used causal direction test statistic by using a $k$-means type clustering algorithm where both the labels and the number of components are estimated from the collected data to adjust the test statistic. Our simulation result show that the proposed adjustment significantly improves the performance of the causal direction test statistic for heterogeneous data. We study large sample behaviour of our proposed test statistic and demonstrate the application of the proposed method using real data. △ Less

Submitted 27 September, 2021; v1 submitted 8 June, 2020; originally announced June 2020.

MSC Class: 62D20; 62H30

arXiv:2005.00667 [pdf]

Data-Driven Modeling Reveals the Impact of Stay-at-Home Orders on Human Mobility during the COVID-19 Pandemic in the U.S

Authors: Chenfeng Xiong, Songhua Hu, Mofeng Yang, Hannah N Younes, Weiyu Luo, Sepehr Ghader, Lei Zhang

Abstract: One approach to delay the spread of the novel coronavirus (COVID-19) is to reduce human travel by imposing travel restriction policies. It is yet unclear how effective those policies are on suppressing the mobility trend due to the lack of ground truth and large-scale dataset describing human mobility during the pandemic. This study uses real-world location-based service data collected from anonym… ▽ More One approach to delay the spread of the novel coronavirus (COVID-19) is to reduce human travel by imposing travel restriction policies. It is yet unclear how effective those policies are on suppressing the mobility trend due to the lack of ground truth and large-scale dataset describing human mobility during the pandemic. This study uses real-world location-based service data collected from anonymized mobile devices to uncover mobility changes during COVID-19 and under the 'Stay-at-home' state orders in the U.S. The study measures human mobility with two important metrics: daily average number of trips per person and daily average person-miles traveled. The data-driven analysis and modeling attribute less than 5% of the reduction in the number of trips and person-miles traveled to the effect of the policy. The models developed in the study exhibit high prediction accuracy and can be applied to inform epidemics modeling with empirically verified mobility trends and to support time-sensitive decision-making processes. △ Less

Submitted 4 May, 2020; v1 submitted 1 May, 2020; originally announced May 2020.

arXiv:2004.12696 [pdf, other]

Empirical Bayes Transductive Meta-Learning with Synthetic Gradients

Authors: Shell Xu Hu, Pablo G. Moreno, Yang Xiao, Xi Shen, Guillaume Obozinski, Neil D. Lawrence, Andreas Damianou

Abstract: We propose a meta-learning approach that learns from multiple tasks in a transductive setting, by leveraging the unlabeled query set in addition to the support set to generate a more powerful model for each task. To develop our framework, we revisit the empirical Bayes formulation for multi-task learning. The evidence lower bound of the marginal log-likelihood of empirical Bayes decomposes as a su… ▽ More We propose a meta-learning approach that learns from multiple tasks in a transductive setting, by leveraging the unlabeled query set in addition to the support set to generate a more powerful model for each task. To develop our framework, we revisit the empirical Bayes formulation for multi-task learning. The evidence lower bound of the marginal log-likelihood of empirical Bayes decomposes as a sum of local KL divergences between the variational posterior and the true posterior on the query set of each task. We derive a novel amortized variational inference that couples all the variational posteriors via a meta-model, which consists of a synthetic gradient network and an initialization network. Each variational posterior is derived from synthetic gradient descent to approximate the true posterior on the query set, although where we do not have access to the true gradient. Our results on the Mini-ImageNet and CIFAR-FS benchmarks for episodic few-shot classification outperform previous state-of-the-art methods. Besides, we conduct two zero-shot learning experiments to further explore the potential of the synthetic gradient. △ Less

Submitted 27 April, 2020; originally announced April 2020.

Comments: ICLR 2020

arXiv:2002.09128 [pdf, other]

DSNAS: Direct Neural Architecture Search without Parameter Retraining

Authors: Shoukang Hu, Sirui Xie, Hehui Zheng, Chunxiao Liu, Jianping Shi, Xunying Liu, Dahua Lin

Abstract: If NAS methods are solutions, what is the problem? Most existing NAS methods require two-stage parameter optimization. However, performance of the same architecture in the two stages correlates poorly. In this work, we propose a new problem definition for NAS, task-specific end-to-end, based on this observation. We argue that given a computer vision task for which a NAS method is expected, this de… ▽ More If NAS methods are solutions, what is the problem? Most existing NAS methods require two-stage parameter optimization. However, performance of the same architecture in the two stages correlates poorly. In this work, we propose a new problem definition for NAS, task-specific end-to-end, based on this observation. We argue that given a computer vision task for which a NAS method is expected, this definition can reduce the vaguely-defined NAS evaluation to i) accuracy of this task and ii) the total computation consumed to finally obtain a model with satisfying accuracy. Seeing that most existing methods do not solve this problem directly, we propose DSNAS, an efficient differentiable NAS framework that simultaneously optimizes architecture and parameters with a low-biased Monte Carlo estimate. Child networks derived from DSNAS can be deployed directly without parameter retraining. Comparing with two-stage methods, DSNAS successfully discovers networks with comparable accuracy (74.4%) on ImageNet in 420 GPU hours, reducing the total time by more than 34%. Our implementation is available at https://github.com/SNAS-Series/SNAS-Series. △ Less

Submitted 31 March, 2020; v1 submitted 20 February, 2020; originally announced February 2020.

Comments: To appear in CVPR 2020

arXiv:2002.05582 [pdf, other]

Learning to Predict Error for MRI Reconstruction

Authors: Shi Hu, Nicola Pezzotti, Max Welling

Abstract: In healthcare applications, predictive uncertainty has been used to assess predictive accuracy. In this paper, we demonstrate that predictive uncertainty estimated by the current methods does not highly correlate with prediction error by decomposing the latter into random and systematic errors, and showing that the former is equivalent to the variance of the random error. In addition, we observe t… ▽ More In healthcare applications, predictive uncertainty has been used to assess predictive accuracy. In this paper, we demonstrate that predictive uncertainty estimated by the current methods does not highly correlate with prediction error by decomposing the latter into random and systematic errors, and showing that the former is equivalent to the variance of the random error. In addition, we observe that current methods unnecessarily compromise performance by modifying the model and training loss to estimate the target and uncertainty jointly. We show that estimating them separately without modifications improves performance. Following this, we propose a novel method that estimates the target labels and magnitude of the prediction error in two steps. We demonstrate this method on a large-scale MRI reconstruction task, and achieve significantly better results than the state-of-the-art uncertainty estimation methods. △ Less

Submitted 6 July, 2021; v1 submitted 13 February, 2020; originally announced February 2020.

Comments: Accepted to MICCAI 2021

arXiv:1910.07629 [pdf, other]

A New Defense Against Adversarial Images: Turning a Weakness into a Strength

Authors: Tao Yu, Shengyuan Hu, Chuan Guo, Wei-Lun Chao, Kilian Q. Weinberger

Abstract: Natural images are virtually surrounded by low-density misclassified regions that can be efficiently discovered by gradient-guided search --- enabling the generation of adversarial images. While many techniques for detecting these attacks have been proposed, they are easily bypassed when the adversary has full knowledge of the detection mechanism and adapts the attack strategy accordingly. In this… ▽ More Natural images are virtually surrounded by low-density misclassified regions that can be efficiently discovered by gradient-guided search --- enabling the generation of adversarial images. While many techniques for detecting these attacks have been proposed, they are easily bypassed when the adversary has full knowledge of the detection mechanism and adapts the attack strategy accordingly. In this paper, we adopt a novel perspective and regard the omnipresence of adversarial perturbations as a strength rather than a weakness. We postulate that if an image has been tampered with, these adversarial directions either become harder to find with gradient methods or have substantially higher density than for natural images. We develop a practical test for this signature characteristic to successfully detect adversarial attacks, achieving unprecedented accuracy under the white-box setting where the adversary is given full knowledge of our detection mechanism. △ Less

Submitted 3 December, 2019; v1 submitted 16 October, 2019; originally announced October 2019.

Comments: NeurIPS 2019, 14 pages

arXiv:1907.11216 [pdf, other]

Domain Generalization via Multidomain Discriminant Analysis

Authors: Shoubo Hu, Kun Zhang, Zhitang Chen, Laiwan Chan

Abstract: Domain generalization (DG) aims to incorporate knowledge from multiple source domains into a single model that could generalize well on unseen target domains. This problem is ubiquitous in practice since the distributions of the target data may rarely be identical to those of the source data. In this paper, we propose Multidomain Discriminant Analysis (MDA) to address DG of classification tasks in… ▽ More Domain generalization (DG) aims to incorporate knowledge from multiple source domains into a single model that could generalize well on unseen target domains. This problem is ubiquitous in practice since the distributions of the target data may rarely be identical to those of the source data. In this paper, we propose Multidomain Discriminant Analysis (MDA) to address DG of classification tasks in general situations. MDA learns a domain-invariant feature transformation that aims to achieve appealing properties, including a minimal divergence among domains within each class, a maximal separability among classes, and overall maximal compactness of all classes. Furthermore, we provide the bounds on excess risk and generalization error by learning theory analysis. Comprehensive experiments on synthetic and real benchmark datasets demonstrate the effectiveness of MDA. △ Less

Submitted 25 July, 2019; originally announced July 2019.

Comments: UAI 2019

arXiv:1907.09693 [pdf, other]

doi 10.1109/TKDE.2021.3124599

A Survey on Federated Learning Systems: Vision, Hype and Reality for Data Privacy and Protection

Authors: Qinbin Li, Zeyi Wen, Zhaomin Wu, Sixu Hu, Naibo Wang, Yuan Li, Xu Liu, Bingsheng He

Abstract: Federated learning has been a hot research topic in enabling the collaborative training of machine learning models among different organizations under the privacy restrictions. As researchers try to support more machine learning models with different privacy-preserving approaches, there is a requirement in developing systems and infrastructures to ease the development of various federated learning… ▽ More Federated learning has been a hot research topic in enabling the collaborative training of machine learning models among different organizations under the privacy restrictions. As researchers try to support more machine learning models with different privacy-preserving approaches, there is a requirement in developing systems and infrastructures to ease the development of various federated learning algorithms. Similar to deep learning systems such as PyTorch and TensorFlow that boost the development of deep learning, federated learning systems (FLSs) are equivalently important, and face challenges from various aspects such as effectiveness, efficiency, and privacy. In this survey, we conduct a comprehensive review on federated learning systems. To achieve smooth flow and guide future research, we introduce the definition of federated learning systems and analyze the system components. Moreover, we provide a thorough categorization for federated learning systems according to six different aspects, including data distribution, machine learning model, privacy mechanism, communication architecture, scale of federation and motivation of federation. The categorization can help the design of federated learning systems as shown in our case studies. By systematically summarizing the existing federated learning systems, we present the design factors, case studies, and future research opportunities. △ Less

Submitted 4 December, 2021; v1 submitted 23 July, 2019; originally announced July 2019.

Comments: Accepted to IEEE Transactions on Knowledge and Data Engineering (TKDE)

arXiv:1907.01949 [pdf, other]

Supervised Uncertainty Quantification for Segmentation with Multiple Annotations

Authors: Shi Hu, Daniel Worrall, Stefan Knegt, Bas Veeling, Henkjan Huisman, Max Welling

Abstract: The accurate estimation of predictive uncertainty carries importance in medical scenarios such as lung node segmentation. Unfortunately, most existing works on predictive uncertainty do not return calibrated uncertainty estimates, which could be used in practice. In this work we exploit multi-grader annotation variability as a source of 'groundtruth' aleatoric uncertainty, which can be treated as… ▽ More The accurate estimation of predictive uncertainty carries importance in medical scenarios such as lung node segmentation. Unfortunately, most existing works on predictive uncertainty do not return calibrated uncertainty estimates, which could be used in practice. In this work we exploit multi-grader annotation variability as a source of 'groundtruth' aleatoric uncertainty, which can be treated as a target in a supervised learning problem. We combine this groundtruth uncertainty with a Probabilistic U-Net and test on the LIDC-IDRI lung nodule CT dataset and MICCAI2012 prostate MRI dataset. We find that we are able to improve predictive uncertainty estimates. We also find that we can improve sample accuracy and sample diversity. In real-world applications, our method could inform doctors about the confidence of the segmentation results. △ Less

Submitted 27 May, 2022; v1 submitted 3 July, 2019; originally announced July 2019.

Comments: MICCAI 2019. Fixed a few typos

arXiv:1905.12269 [pdf, other]

doi 10.2140/astat.2022.13.41

Topological Techniques in Model Selection

Authors: Shaoxiong Hu, Hugo Maruri-Aguliar, Zixiang Ma

Abstract: The LASSO is an attractive regularisation method for linear regression that combines variable selection with an efficient computation procedure. This paper is concerned with enhancing the performance of LASSO for square-free hierarchical polynomial models when combining validation error with a measure of model complexity. The measure of the complexity is the sum of Betti numbers of the model which… ▽ More The LASSO is an attractive regularisation method for linear regression that combines variable selection with an efficient computation procedure. This paper is concerned with enhancing the performance of LASSO for square-free hierarchical polynomial models when combining validation error with a measure of model complexity. The measure of the complexity is the sum of Betti numbers of the model which is seen as a simplicial complex, and we describe the model in terms of components and cycles, borrowing from recent developments in computational topology. We study and propose an algorithm which combines statistical and topological criteria. This compound criterion would allow us to deal with model selection problems in polynomial regression models containing higher-order interactions. Simulation results demonstrate that the compound criteria produce sparser models with lower prediction errors than the estimators of several other statistical methods for higher order interaction models. △ Less

Submitted 29 May, 2019; originally announced May 2019.

Journal ref: Alg. Stat. 13 (2022) 41-56

arXiv:1904.04699 [pdf, other]

Bivariate Gamma Mixture of Experts Models for Joint Insurance Claims Modeling

Authors: Sen Hu, T Brendan Murphy, Adrian O'Hagan

Abstract: In general insurance, risks from different categories are often modeled independently and their sum is regarded as the total risk the insurer takes on in exchange for a premium. The dependence from multiple risks is generally neglected even when correlation could exist, for example a single car accident may result in claims from multiple risk categories. It is desirable to take the covariance of d… ▽ More In general insurance, risks from different categories are often modeled independently and their sum is regarded as the total risk the insurer takes on in exchange for a premium. The dependence from multiple risks is generally neglected even when correlation could exist, for example a single car accident may result in claims from multiple risk categories. It is desirable to take the covariance of different categories into consideration in modeling in order to better predict future claims and hence allow greater accuracy in ratemaking. In this work multivariate severity models are investigated using mixture of experts models with bivariate gamma distributions, where the dependence structure is modeled directly using a GLM framework, and covariates can be placed in both gating and expert networks. Furthermore, parsimonious parameterisations are considered, which leads to a family of bivariate gamma mixture of experts models. It can be viewed as a model-based clustering approach that clusters policyholders into sub-groups with different dependencies, and the parameters of the mixture models are dependent on the covariates. Clustering is shown to be important in separating the data into sub-groupings where strong dependence is often present, even if the overall data set exhibits only weak dependence. In doing so, the correlation within different components features prominently in the model. It is shown that, by applying to both simulated data and a real-world Irish GI insurer data set, claim predictions can be improved. △ Less

Submitted 9 April, 2019; originally announced April 2019.

arXiv:1902.07849 [pdf, other]

doi 10.1145/3308558.3313426

STFNets: Learning Sensing Signals from the Time-Frequency Perspective with Short-Time Fourier Neural Networks

Authors: Shuochao Yao, Ailing Piao, Wenjun Jiang, Yiran Zhao, Huajie Shao, Shengzhong Liu, Dongxin Liu, Jinyang Li, Tianshi Wang, Shaohan Hu, Lu Su, Jiawei Han, Tarek Abdelzaher

Abstract: Recent advances in deep learning motivate the use of deep neural networks in Internet-of-Things (IoT) applications. These networks are modelled after signal processing in the human brain, thereby leading to significant advantages at perceptual tasks such as vision and speech recognition. IoT applications, however, often measure physical phenomena, where the underlying physics (such as inertia, wir… ▽ More Recent advances in deep learning motivate the use of deep neural networks in Internet-of-Things (IoT) applications. These networks are modelled after signal processing in the human brain, thereby leading to significant advantages at perceptual tasks such as vision and speech recognition. IoT applications, however, often measure physical phenomena, where the underlying physics (such as inertia, wireless signal propagation, or the natural frequency of oscillation) are fundamentally a function of signal frequencies, offering better features in the frequency domain. This observation leads to a fundamental question: For IoT applications, can one develop a new brand of neural network structures that synthesize features inspired not only by the biology of human perception but also by the fundamental nature of physics? Hence, in this paper, instead of using conventional building blocks (e.g., convolutional and recurrent layers), we propose a new foundational neural network building block, the Short-Time Fourier Neural Network (STFNet). It integrates a widely-used time-frequency analysis method, the Short-Time Fourier Transform, into data processing to learn features directly in the frequency domain, where the physics of underlying phenomena leave better foot-prints. STFNets bring additional flexibility to time-frequency analysis by offering novel nonlinear learnable operations that are spectral-compatible. Moreover, STFNets show that transforming signals to a domain that is more connected to the underlying physics greatly simplifies the learning process. We demonstrate the effectiveness of STFNets with extensive experiments. STFNets significantly outperform the state-of-the-art deep learning models in all experiments. A STFNet, therefore, demonstrates superior capability as the fundamental building block of deep neural networks for IoT applications for various sensor inputs. △ Less

Submitted 20 February, 2019; originally announced February 2019.

arXiv:1812.11027 [pdf, other]

Exploring Weight Symmetry in Deep Neural Networks

Authors: Xu Shell Hu, Sergey Zagoruyko, Nikos Komodakis

Abstract: We propose to impose symmetry in neural network parameters to improve parameter usage and make use of dedicated convolution and matrix multiplication routines. Due to significant reduction in the number of parameters as a result of the symmetry constraints, one would expect a dramatic drop in accuracy. Surprisingly, we show that this is not the case, and, depending on network size, symmetry can ha… ▽ More We propose to impose symmetry in neural network parameters to improve parameter usage and make use of dedicated convolution and matrix multiplication routines. Due to significant reduction in the number of parameters as a result of the symmetry constraints, one would expect a dramatic drop in accuracy. Surprisingly, we show that this is not the case, and, depending on network size, symmetry can have little or no negative effect on network accuracy, especially in deep overparameterized networks. We propose several ways to impose local symmetry in recurrent and convolutional neural networks, and show that our symmetry parameterizations satisfy universal approximation property for single hidden layer networks. We extensively evaluate these parameterizations on CIFAR, ImageNet and language modeling datasets, showing significant benefits from the use of symmetry. For instance, our ResNet-101 with channel-wise symmetry has almost 25% less parameters and only 0.2% accuracy loss on ImageNet. Code for our experiments is available at https://github.com/hushell/deep-symmetry △ Less

Submitted 10 January, 2019; v1 submitted 28 December, 2018; originally announced December 2018.

arXiv:1812.08434 [pdf]

Graph Neural Networks: A Review of Methods and Applications

Authors: Jie Zhou, Ganqu Cui, Shengding Hu, Zhengyan Zhang, Cheng Yang, Zhiyuan Liu, Lifeng Wang, Changcheng Li, Maosong Sun

Abstract: Lots of learning tasks require dealing with graph data which contains rich relation information among elements. Modeling physics systems, learning molecular fingerprints, predicting protein interface, and classifying diseases demand a model to learn from graph inputs. In other domains such as learning from non-structural data like texts and images, reasoning on extracted structures (like the depen… ▽ More Lots of learning tasks require dealing with graph data which contains rich relation information among elements. Modeling physics systems, learning molecular fingerprints, predicting protein interface, and classifying diseases demand a model to learn from graph inputs. In other domains such as learning from non-structural data like texts and images, reasoning on extracted structures (like the dependency trees of sentences and the scene graphs of images) is an important research topic which also needs graph reasoning models. Graph neural networks (GNNs) are neural models that capture the dependence of graphs via message passing between the nodes of graphs. In recent years, variants of GNNs such as graph convolutional network (GCN), graph attention network (GAT), graph recurrent network (GRN) have demonstrated ground-breaking performances on many deep learning tasks. In this survey, we propose a general design pipeline for GNN models and discuss the variants of each component, systematically categorize the applications, and propose four open problems for future research. △ Less

Submitted 6 October, 2021; v1 submitted 20 December, 2018; originally announced December 2018.

Comments: Published at AI Open 2021

arXiv:1809.08568 [pdf, other]

Causal Inference and Mechanism Clustering of A Mixture of Additive Noise Models

Authors: Shoubo Hu, Zhitang Chen, Vahid Partovi Nia, Laiwan Chan, Yanhui Geng

Abstract: The inference of the causal relationship between a pair of observed variables is a fundamental problem in science, and most existing approaches are based on one single causal model. In practice, however, observations are often collected from multiple sources with heterogeneous causal models due to certain uncontrollable factors, which renders causal analysis results obtained by a single model skep… ▽ More The inference of the causal relationship between a pair of observed variables is a fundamental problem in science, and most existing approaches are based on one single causal model. In practice, however, observations are often collected from multiple sources with heterogeneous causal models due to certain uncontrollable factors, which renders causal analysis results obtained by a single model skeptical. In this paper, we generalize the Additive Noise Model (ANM) to a mixture model, which consists of a finite number of ANMs, and provide the condition of its causal identifiability. To conduct model estimation, we propose Gaussian Process Partially Observable Model (GPPOM), and incorporate independence enforcement into it to learn latent parameter associated with each observation. Causal inference and clustering according to the underlying generating mechanisms of the mixture model are addressed in this work. Experiments on synthetic and real data demonstrate the effectiveness of our proposed approach. △ Less

Submitted 11 November, 2018; v1 submitted 23 September, 2018; originally announced September 2018.

Comments: Published at NIPS 2018

arXiv:1809.08560 [pdf, other]

doi 10.1162/neco_a_01064

A Kernel Embedding-based Approach for Nonstationary Causal Model Inference

Authors: Shoubo Hu, Zhitang Chen, Laiwan Chan

Abstract: Although nonstationary data are more common in the real world, most existing causal discovery methods do not take nonstationarity into consideration. In this letter, we propose a kernel embedding-based approach, ENCI, for nonstationary causal model inference where data are collected from multiple domains with varying distributions. In ENCI, we transform the complicated relation of a cause-effect p… ▽ More Although nonstationary data are more common in the real world, most existing causal discovery methods do not take nonstationarity into consideration. In this letter, we propose a kernel embedding-based approach, ENCI, for nonstationary causal model inference where data are collected from multiple domains with varying distributions. In ENCI, we transform the complicated relation of a cause-effect pair into a linear model of variables of which observations correspond to the kernel embeddings of the cause-and-effect distributions in different domains. In this way, we are able to estimate the causal direction by exploiting the causal asymmetry of the transformed linear model. Furthermore, we extend ENCI to causal graph discovery for multiple variables by transforming the relations among them into a linear nongaussian acyclic model. We show that by exploiting the nonstationarity of distributions, both cause-effect pairs and two kinds of causal graphs are identifiable under mild conditions. Experiments on synthetic and real-world data are conducted to justify the efficacy of ENCI over major existing methods. △ Less

Submitted 23 September, 2018; originally announced September 2018.

Comments: Published at Neural Computation

Journal ref: Neural computation, 30(5), 1394-1425, 2018

arXiv:1809.01471 [pdf, other]

Chest X-ray Inpainting with Deep Generative Models

Authors: Ecem Sogancioglu, Shi Hu, Davide Belli, Bram van Ginneken

Abstract: Generative adversarial networks have been successfully applied to inpainting in natural images. However, the current state-of-the-art models have not yet been widely adopted in the medical imaging domain. In this paper, we investigate the performance of three recently published deep learning based inpainting models: context encoders, semantic image inpainting, and the contextual attention model, a… ▽ More Generative adversarial networks have been successfully applied to inpainting in natural images. However, the current state-of-the-art models have not yet been widely adopted in the medical imaging domain. In this paper, we investigate the performance of three recently published deep learning based inpainting models: context encoders, semantic image inpainting, and the contextual attention model, applied to chest x-rays, as the chest exam is the most commonly performed radiological procedure. We train these generative models on 1.2M 128 $\times$ 128 patches from 60K healthy x-rays, and learn to predict the center 64 $\times$ 64 region in each patch. We test the models on both the healthy and abnormal radiographs. We evaluate the results by visual inspection and comparing the PSNR scores. The outputs of the models are in most cases highly realistic. We show that the methods have potential to enhance and detect abnormalities. In addition, we perform a 2AFC observer study and show that an experienced human observer performs poorly in detecting inpainted regions, particularly those generated by the contextual attention model. △ Less

Submitted 29 August, 2018; originally announced September 2018.

Comments: 9 pages

arXiv:1808.05766 [pdf]

The Function Transformation Omics - Funomics

Authors: Yongshuai Jiang, Jing Xu, Simeng Hu, Di Liu, Linna Zhao, Xu Zhou

Abstract: There are no two identical leaves in the world, so how to find effective markers or features to distinguish them is an important issue. Function transformation, such as f(x,y) and f(x,y,z), can transform two, three, or multiple input/observation variables (in biology, it generally refers to the observed/measured value of biomarkers, biological characteristics, or other indicators) into a new outpu… ▽ More There are no two identical leaves in the world, so how to find effective markers or features to distinguish them is an important issue. Function transformation, such as f(x,y) and f(x,y,z), can transform two, three, or multiple input/observation variables (in biology, it generally refers to the observed/measured value of biomarkers, biological characteristics, or other indicators) into a new output variable (new characteristics or indicators). This provided us a chance to re-cognize objective things or relationships beyond the original measurements. For example, Body Mass Index, which transform weight and high into a new indicator BMI=x/y^2 (where x is weight and y is high), is commonly used in to gauge obesity. Here, we proposed a new system, Funomics (Function Transformation Omics), for understanding the world in a different perspective. Funome can be understood as a set of math functions consist of basic elementary functions (such as power functions and exponential functions) and basic mathematical operations (such as addition, subtraction). By scanning the whole Funome, researchers can identify some special functions (called handsome functions) which can generate the novel important output variable (characteristics or indicators). We also start "the Funome project" to develop novel methods, function library and analysis software for Funome studies. The Funome project will accelerate the discovery of new useful indicators or characteristics, will improve the utilization efficiency of directly measured data, and will enhance our ability to understand the world. The analysis tools and data resources about the Funome project can be found gradually at http://www.funome.com. △ Less

Submitted 17 August, 2018; originally announced August 2018.

arXiv:1805.11793 [pdf, ps, other]

Infinite Arms Bandit: Optimality via Confidence Bounds

Authors: Hock Peng Chan, Shouri Hu

Abstract: Berry et al. (1997) initiated the development of the infinite arms bandit problem. They derived a regret lower bound of all allocation strategies for Bernoulli rewards with uniform priors, and proposed strategies based on success runs. Bonald and Proutière (2013) proposed a two-target algorithm that achieves the regret lower bound, and extended optimality to Bernoulli rewards with general priors.… ▽ More Berry et al. (1997) initiated the development of the infinite arms bandit problem. They derived a regret lower bound of all allocation strategies for Bernoulli rewards with uniform priors, and proposed strategies based on success runs. Bonald and Proutière (2013) proposed a two-target algorithm that achieves the regret lower bound, and extended optimality to Bernoulli rewards with general priors. We present here a confidence bound target (CBT) algorithm that achieves optimality for rewards that are bounded above. For each arm we construct a confidence bound and compare it against each other and a target value to determine if the arm should be sampled further. The target value depends on the assumed priors of the arm means. In the absence of information on the prior, the target value is determined empirically. Numerical studies here show that CBT is versatile and outperforms its competitors. △ Less

Submitted 21 June, 2020; v1 submitted 29 May, 2018; originally announced May 2018.

Comments: Fourth version

arXiv:1804.04206 [pdf, other]

Multi-scale Neural Networks for Retinal Blood Vessels Segmentation

Authors: Boheng Zhang, Shenglei Huang, Shaohan Hu

Abstract: Existing supervised approaches didn't make use of the low-level features which are actually effective to this task. And another deficiency is that they didn't consider the relation between pixels, which means effective features are not extracted. In this paper, we proposed a novel convolutional neural network which make sufficient use of low-level features together with high-level features and inv… ▽ More Existing supervised approaches didn't make use of the low-level features which are actually effective to this task. And another deficiency is that they didn't consider the relation between pixels, which means effective features are not extracted. In this paper, we proposed a novel convolutional neural network which make sufficient use of low-level features together with high-level features and involves atrous convolution to get multi-scale features which should be considered as effective features. Our model is tested on three standard benchmarks - DRIVE, STARE, and CHASE databases. The results presents that our model significantly outperforms existing approaches in terms of accuracy, sensitivity, specificity, the area under the ROC curve and the highest prediction speed. Our work provides evidence of the power of wide and deep neural networks in retinal blood vessels segmentation task which could be applied on other medical images tasks. △ Less

Submitted 11 April, 2018; originally announced April 2018.

arXiv:1710.03704 [pdf, other]

Motor Insurance Accidental Damage Claims Modeling with Factor Collapsing and Bayesian Model Averaging

Authors: Sen Hu, Adrian O'Hagan, Thomas Brendan Murphy

Abstract: Accidental damage is a typical component of motor insurance claim. Modeling of this nature generally involves analysis of past claim history and different characteristics of the insured objects and the policyholders. Generalized linear models (GLMs) have become the industry's standard approach for pricing and modeling risks of this nature. However, the GLM approach utilizes a single "best" model o… ▽ More Accidental damage is a typical component of motor insurance claim. Modeling of this nature generally involves analysis of past claim history and different characteristics of the insured objects and the policyholders. Generalized linear models (GLMs) have become the industry's standard approach for pricing and modeling risks of this nature. However, the GLM approach utilizes a single "best" model on which loss predictions are based, which ignores the uncertainty among the competing models and variable selection. An additional characteristic of motor insurance data sets is the presence of many categorical variables, within which the number of levels is high. In particular, not all levels of such variables may be statistically significant and rather some subsets of the levels may be merged to give a smaller overall number of levels for improved model parsimony and interpretability. A method is proposed for assessing the optimal manner in which to collapse a factor with many levels into one with a smaller number of levels, then Bayesian model averaging (BMA) is used to blend model predictions from all reasonable models to account for factor collapsing uncertainty. This method will be computationally intensive due to the number of factors being collapsed as well as the possibly large number of levels within factors. Hence a stochastic optimisation is proposed to quickly find the best collapsing cases across the model space. △ Less

Submitted 10 October, 2017; originally announced October 2017.

arXiv:1704.04235 [pdf, other]

Close Yet Distinctive Domain Adaptation

Authors: Lingkun Luo, Xiaofang Wang, Shiqiang Hu, Chao Wang, Yuxing Tang, Liming Chen

Abstract: Domain adaptation is transfer learning which aims to generalize a learning model across training and testing data with different distributions. Most previous research tackle this problem in seeking a shared feature representation between source and target domains while reducing the mismatch of their data distributions. In this paper, we propose a close yet discriminative domain adaptation method,… ▽ More Domain adaptation is transfer learning which aims to generalize a learning model across training and testing data with different distributions. Most previous research tackle this problem in seeking a shared feature representation between source and target domains while reducing the mismatch of their data distributions. In this paper, we propose a close yet discriminative domain adaptation method, namely CDDA, which generates a latent feature representation with two interesting properties. First, the discrepancy between the source and target domain, measured in terms of both marginal and conditional probability distribution via Maximum Mean Discrepancy is minimized so as to attract two domains close to each other. More importantly, we also design a repulsive force term, which maximizes the distances between each label dependent sub-domain to all others so as to drag different class dependent sub-domains far away from each other and thereby increase the discriminative power of the adapted domain. Moreover, given the fact that the underlying data manifold could have complex geometric structure, we further propose the constraints of label smoothness and geometric structure consistency for label propagation. Extensive experiments are conducted on 36 cross-domain image classification tasks over four public datasets. The comprehensive results show that the proposed method consistently outperforms the state-of-the-art methods with significant margins. △ Less

Submitted 13 April, 2017; originally announced April 2017.

Comments: 11pages, 3 figures, ICCV2017

Showing 1–50 of 56 results for author: Hu, S