Search | arXiv e-print repository

Improved identification of breakpoints in piecewise regression and its applications

Authors: Taehyeong Kim, Hyungu Lee, Hayoung Choi

Abstract: Identifying breakpoints in piecewise regression is critical in enhancing the reliability and interpretability of data fitting. In this paper, we propose novel algorithms based on the greedy algorithm to accurately and efficiently identify breakpoints in piecewise polynomial regression. The algorithm updates the breakpoints to minimize the error by exploring the neighborhood of each breakpoint. It has a fast convergence rate and stability to find optimal breakpoints. Moreover, it can determine the optimal number of breakpoints. The computational results for real and synthetic data show that its accuracy is better than any existing methods. The real-world datasets demonstrate that breakpoints through the proposed algorithm provide valuable data information.

Submitted 27 August, 2024; v1 submitted 25 August, 2024; originally announced August 2024.

Comments: 13 pages, 6 figures
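
Illustration: the abstract above describes updating breakpoints by exploring the neighborhood of each one. A minimal sketch of that general idea, assuming piecewise linear segments, breakpoints stored as index positions, and a one-step neighborhood, might look like the following. It is not the authors' algorithm; the function names (`segment_sse`, `total_sse`, `refine_breakpoints`) and the `min_size` parameter are hypothetical.

```python
def segment_sse(xs, ys):
    """Sum of squared errors of the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = mean_y - slope * mean_x
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


def total_sse(xs, ys, cuts):
    """Total error when the data are split at the index positions in `cuts`."""
    bounds = [0] + list(cuts) + [len(xs)]
    return sum(segment_sse(xs[a:b], ys[a:b]) for a, b in zip(bounds, bounds[1:]))


def refine_breakpoints(xs, ys, cuts, min_size=2):
    """Greedy local search (assumed scheme): shift one cut by one index
    whenever that lowers the total error, until no shift helps."""
    cuts = list(cuts)
    best = total_sse(xs, ys, cuts)
    improved = True
    while improved:
        improved = False
        for i in range(len(cuts)):
            for step in (-1, 1):
                trial = cuts[:]
                trial[i] += step
                bounds = [0] + trial + [len(xs)]
                if any(b - a < min_size for a, b in zip(bounds, bounds[1:])):
                    continue  # keep every segment large enough to fit a line
                err = total_sse(xs, ys, trial)
                if err < best - 1e-12:
                    cuts, best, improved = trial, err, True
    return cuts, best


# Toy data: the slope changes at index 10; the initial guess is index 6.
xs = list(range(20))
ys = [2.0 * x if x < 10 else 5.0 for x in xs]
cuts, err = refine_breakpoints(xs, ys, [6])
print(cuts, err)
```

A local search of this kind can stall in a local minimum on noisy data, so the starting breakpoints matter; the sketch makes no attempt to choose the number of breakpoints.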

arXiv:2407.10784 [pdf, other]

AdapTable: Test-Time Adaptation for Tabular Data via Shift-Aware Uncertainty Calibrator and Label Distribution Handler

Authors: Changhun Kim, Taewon Kim, Seungyeon Woo, June Yong Yang, Eunho Yang

Abstract: In real-world scenarios, tabular data often suffer from distribution shifts that threaten the performance of machine learning models. Despite its prevalence and importance, handling distribution shifts in the tabular domain remains underexplored due to the inherent challenges within the tabular data itself. In this sense, test-time adaptation (TTA) offers a promising solution by adapting models to target data without accessing source data, crucial for privacy-sensitive tabular domains. However, existing TTA methods either 1) overlook the nature of tabular distribution shifts, often involving label distribution shifts, or 2) impose architectural constraints on the model, leading to a lack of applicability. To this end, we propose AdapTable, a novel TTA framework for tabular data. AdapTable operates in two stages: 1) calibrating model predictions using a shift-aware uncertainty calibrator, and 2) adjusting these predictions to match the target label distribution with a label distribution handler. We validate the effectiveness of AdapTable through theoretical analysis and extensive experiments on various distribution shift scenarios. Our results demonstrate AdapTable's ability to handle various real-world distribution shifts, achieving up to a 16% improvement on the HELOC dataset.

Submitted 26 August, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

Comments: Under Review at AAAI 2025

arXiv:2404.13321 [pdf]

Accelerated System-Reliability-based Disaster Resilience Analysis for Structural Systems

Authors: Taeyong Kim, Sang-ri Yi

Abstract: Resilience has emerged as a crucial concept for evaluating structural performance under disasters because of its ability to extend beyond traditional risk assessments, accounting for a system's ability to minimize disruptions and maintain functionality during recovery. To facilitate the holistic understanding of resilience performance in structural systems, a system-reliability-based disaster resilience analysis framework was developed. The framework describes resilience using three criteria: reliability, redundancy, and recoverability, and the system's internal resilience is evaluated by inspecting the characteristics of reliability and redundancy for different possible progressive failure modes. However, the practical application of this framework has been limited to complex structures with numerous sub-components, as it becomes intractable to evaluate the performances for all possible initial disruption scenarios. To bridge the gap between the theory and practical use, especially for evaluating reliability and redundancy, this study centers on the idea that the computational burden can be substantially alleviated by focusing on initial disruption scenarios that are practically significant. To achieve this research goal, we propose three methods to efficiently eliminate insignificant scenarios: the sequential search method, the n-ball sampling method, and the surrogate model-based adaptive sampling algorithm. Three numerical examples, including buildings and a bridge, are introduced to prove the applicability and efficiency of the proposed approaches. The findings of this study are expected to offer practical solutions to the challenges of assessing resilience performance in complex structural systems.

Submitted 20 April, 2024; originally announced April 2024.

Comments: 25 pages, 18 figures

arXiv:2312.03386 [pdf, other]

An Infinite-Width Analysis on the Jacobian-Regularised Training of a Neural Network

Authors: Taeyoung Kim, Hongseok Yang

Abstract: The recent theoretical analysis of deep neural networks in their infinite-width limits has deepened our understanding of initialisation, feature learning, and training of those networks, and brought new practical techniques for finding appropriate hyperparameters, learning network weights, and performing inference. In this paper, we broaden this line of research by showing that this infinite-width analysis can be extended to the Jacobian of a deep neural network. We show that a multilayer perceptron (MLP) and its Jacobian at initialisation jointly converge to a Gaussian process (GP) as the widths of the MLP's hidden layers go to infinity and characterise this GP. We also prove that in the infinite-width limit, the evolution of the MLP under the so-called robust training (i.e., training with a regulariser on the Jacobian) is described by a linear first-order ordinary differential equation that is determined by a variant of the Neural Tangent Kernel. We experimentally show the relevance of our theoretical claims to wide finite networks, and empirically analyse the properties of kernel regression solution to obtain an insight into Jacobian regularisation.

Submitted 21 August, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

Comments: Accepted at ICML 2024. 74 pages, 18 figures

arXiv:2310.13349 [pdf, other]

DeepFDR: A Deep Learning-based False Discovery Rate Control Method for Neuroimaging Data

Authors: Taehyo Kim, Hai Shu, Qiran Jia, Mony J. de Leon

Abstract: Voxel-based multiple testing is widely used in neuroimaging data analysis. Traditional false discovery rate (FDR) control methods often ignore the spatial dependence among the voxel-based tests and thus suffer from substantial loss of testing power. While recent spatial FDR control methods have emerged, their validity and optimality remain questionable when handling the complex spatial dependencies of the brain. Concurrently, deep learning methods have revolutionized image segmentation, a task closely related to voxel-based multiple testing. In this paper, we propose DeepFDR, a novel spatial FDR control method that leverages unsupervised deep learning-based image segmentation to address the voxel-based multiple testing problem. Numerical studies, including comprehensive simulations and Alzheimer's disease FDG-PET image analysis, demonstrate DeepFDR's superiority over existing methods. DeepFDR not only excels in FDR control and effectively diminishes the false nondiscovery rate, but also boasts exceptional computational efficiency highly suited for tackling large-scale neuroimaging data.

Submitted 10 March, 2024; v1 submitted 20 October, 2023; originally announced October 2023.

Journal ref: Proceedings of The 27th International Conference on Artificial Intelligence and Statistics (AISTATS 2024), PMLR 238:946-954, 2024

arXiv:2307.09254 [pdf, other]

PAC Neural Prediction Set Learning to Quantify the Uncertainty of Generative Language Models

Authors: Sangdon Park, Taesoo Kim

Abstract: Uncertainty learning and quantification of models are crucial tasks to enhance the trustworthiness of the models. Importantly, the recent surge of generative language models (GLMs) emphasizes the need for reliable uncertainty quantification due to the concerns on generating hallucinated facts. In this paper, we propose to learn neural prediction set models that come with the probably approximately correct (PAC) guarantee for quantifying the uncertainty of GLMs. Unlike existing prediction set models, which are parameterized by a scalar value, we propose to parameterize prediction sets via neural networks, which achieves more precise uncertainty quantification but still satisfies the PAC guarantee. We demonstrate the efficacy of our method on four types of language datasets and six types of models by showing that our method improves the quantified uncertainty by 63% on average, compared to a standard baseline method.

Submitted 18 July, 2023; originally announced July 2023.

arXiv:2307.08150 [pdf, other]

Efficient Treatment Effect Estimation with Out-of-bag Post-stratification

Authors: Taebin Kim, Lili Wang, Randy Lai, Sangho Yoon

Abstract: Post-stratification is often used to estimate treatment effects with higher efficiency. However, the majority of existing post-stratification frameworks depend on prior knowledge of the distributions of covariates and assume that the units are classified into post-strata without error. We propose a novel method to determine a proper stratification rule by mapping the covariates into a post-stratification factor (PSF) using predictive regression models. Inspired by the bootstrap aggregating (bagging) method, we utilize the out-of-bag delete-D jackknife to estimate strata boundaries, strata weights, and the variance of the point estimate. Confidence intervals are constructed with these estimators to take into account the additional variability coming from uncertainty in the strata boundaries and weights. Extensive simulations show that our proposed method consistently improves the efficiency of the estimates when the regression models are predictive and tends to be more robust than the regression imputation method.

Submitted 12 September, 2023; v1 submitted 16 July, 2023; originally announced July 2023.

arXiv:2304.04221 [pdf, other]

Maximum Agreement Linear Prediction via the Concordance Correlation Coefficient

Authors: Taeho Kim, George Luta, Matteo Bottai, Pierre Chausse, Gheorghe Doros, Edsel A. Pena

Abstract: This paper examines distributional properties and predictive performance of the estimated maximum agreement linear predictor (MALP) introduced in the Bottai, Kim, Lieberman, Luta, and Pena (2022) paper in The American Statistician, which is the linear predictor maximizing Lin's concordance correlation coefficient (CCC) between the predictor and the predictand. It is compared and contrasted, theoretically and through computer experiments, with the estimated least-squares linear predictor (LSLP). Finite-sample and asymptotic properties are obtained, and confidence intervals are also presented. The predictors are illustrated using two real data sets: an eye data set and a bodyfat data set. The results indicate that the estimated MALP is a viable alternative to the estimated LSLP if one desires a predictor whose predicted values possess higher agreement with the predictand values, as measured by the CCC.

Submitted 10 February, 2024; v1 submitted 9 April, 2023; originally announced April 2023.

MSC Class: 62J99; 62H20; 62F99
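
Illustration: the abstract above is built around Lin's concordance correlation coefficient, which in its sample form is 2·s_xy / (s_x² + s_y² + (mean_x − mean_y)²). The sketch below computes that quantity with 1/n normalisation, which is an assumption made here; the function name `concordance_correlation` is hypothetical, and the code does not implement the MALP estimator itself.

```python
def concordance_correlation(xs, ys):
    """Sample version of Lin's CCC between two equal-length sequences.

    Uses 1/n (population-style) variances and covariance; this
    normalisation is an assumption of the sketch.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    # The squared mean difference penalises systematic shifts that
    # the Pearson correlation would ignore.
    return 2.0 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)


observed = [1.0, 2.0, 3.0]
shifted = [2.0, 3.0, 4.0]  # perfectly correlated, but offset by 1
print(concordance_correlation(observed, observed))
print(concordance_correlation(observed, shifted))
```

The second call returns a value below 1 even though the two sequences are perfectly linearly related, which is the sense in which the CCC measures agreement rather than correlation.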

arXiv:2303.15833 [pdf, other]

Complementary Domain Adaptation and Generalization for Unsupervised Continual Domain Shift Learning

Authors: Wonguk Cho, Jinha Park, Taesup Kim

Abstract: Continual domain shift poses a significant challenge in real-world applications, particularly in situations where labeled data is not available for new domains. The challenge of acquiring knowledge in this problem setting is referred to as unsupervised continual domain shift learning. Existing methods for domain adaptation and generalization have limitations in addressing this issue, as they focus either on adapting to a specific domain or generalizing to unseen domains, but not both. In this paper, we propose Complementary Domain Adaptation and Generalization (CoDAG), a simple yet effective learning framework that combines domain adaptation and generalization in a complementary manner to achieve three major goals of unsupervised continual domain shift learning: adapting to a current domain, generalizing to unseen domains, and preventing forgetting of previously seen domains. Our approach is model-agnostic, meaning that it is compatible with any existing domain adaptation and generalization algorithms. We evaluate CoDAG on several benchmark datasets and demonstrate that our model outperforms state-of-the-art models in all datasets and evaluation metrics, highlighting its effectiveness and robustness in handling unsupervised continual domain shift learning.

Submitted 13 October, 2023; v1 submitted 28 March, 2023; originally announced March 2023.

Comments: ICCV 2023

arXiv:2210.13533 [pdf, other]

Sufficient Invariant Learning for Distribution Shift

Authors: Taero Kim, Sungjun Lim, Kyungwoo Song

Abstract: Machine learning algorithms have shown remarkable performance in diverse applications. However, it is still challenging to guarantee performance in distribution shifts when distributions of training and test datasets are different. There have been several approaches to improve the performance in distribution shift cases by learning invariant features across groups or domains. However, we observe that the previous works only learn invariant features partially. While the prior works focus on the limited invariant features, we first raise the importance of the sufficient invariant features. Since only training sets are given empirically, the learned partial invariant features from training sets might not be present in the test sets under distribution shift. Therefore, the performance improvement on distribution shifts might be limited. In this paper, we argue that learning sufficient invariant features from the training set is crucial for the distribution shift case. Concretely, we newly observe the connection between a) sufficient invariant features and b) flatness differences between groups or domains. Moreover, we propose a new algorithm, Adaptive Sharpness-aware Group Distributionally Robust Optimization (ASGDRO), to learn sufficient invariant features across domains or groups. ASGDRO learns sufficient invariant features by seeking common flat minima across all groups or domains. Therefore, ASGDRO improves the performance on diverse distribution shift cases. Besides, we provide a new simple dataset, Heterogeneous-CMNIST, to diagnose whether the various algorithms learn sufficient invariant features.

Submitted 28 August, 2023; v1 submitted 24 October, 2022; originally announced October 2022.

arXiv:2209.05150 [pdf, other]

Bounding the Rademacher Complexity of Fourier neural operators

Authors: Taeyoung Kim, Myungjoo Kang

Abstract: A Fourier neural operator (FNO) is one of the physics-inspired machine learning methods. In particular, it is a neural operator. In recent times, several types of neural operators have been developed, e.g., deep operator networks, Graph neural operator (GNO), and Multiwavelet-based operator (MWTO). Compared with other models, the FNO is computationally efficient and can learn nonlinear operators between function spaces independent of a certain finite basis. In this study, we investigated the bounding of the Rademacher complexity of the FNO based on specific group norms. Using capacity based on these norms, we bound the generalization error of the model. In addition, we investigated the correlation between the empirical generalization error and the proposed capacity of FNO. From the perspective of our result, we inferred that the type of group norms determines the information about the weights and architecture of the FNO model stored in the capacity. And then, we confirmed these inferences through experiments. Based on this fact, we gained insight into the impact of the number of modes used in the FNO model on the generalization error. And we got experimental results that followed our insights.

Submitted 26 September, 2022; v1 submitted 12 September, 2022; originally announced September 2022.

Comments: 21 pages, 19 figures

arXiv:2207.07533 [pdf, ps, other]

Selection of the Most Probable Best

Authors: Taeho Kim, Kyoung-kuk Kim, Eunhye Song

Abstract: We consider an expected-value ranking and selection (R&S) problem where all k solutions' simulation outputs depend on a common parameter whose uncertainty can be modeled by a distribution. We define the most probable best (MPB) to be the solution that has the largest probability of being optimal with respect to the distribution and design an efficient sequential sampling algorithm to learn the MPB when the parameter has a finite support. We derive the large deviations rate of the probability of falsely selecting the MPB and formulate an optimal computing budget allocation problem to find the rate-maximizing static sampling ratios. The problem is then relaxed to obtain a set of optimality conditions that are interpretable and computationally efficient to verify. We devise a series of algorithms that replace the unknown means in the optimality conditions with their estimates and prove the algorithms' sampling ratios achieve the conditions as the simulation budget increases. Furthermore, we show that the empirical performances of the algorithms can be significantly improved by adopting the kernel ridge regression for mean estimation while achieving the same asymptotic convergence results. The algorithms are benchmarked against a state-of-the-art contextual R&S algorithm and demonstrated to have superior empirical performances.

Submitted 20 April, 2024; v1 submitted 15 July, 2022; originally announced July 2022.

arXiv:2104.14695 [pdf, other]

Dynamic Gene Coexpression Analysis with Correlation Modeling

Authors: Tae Hyun Kim, Dan Nicolae

Abstract: In many transcriptomic studies, the correlation of genes might fluctuate with quantitative factors such as genetic ancestry. We propose a method that models the covariance between two variables to vary against a continuous covariate. For the bivariate case, the proposed score test statistic is computationally simple and robust to model misspecification of the covariance term. Subsequently, the method is expanded to test relationships between one highly connected gene, such as a transcription factor, and several other genes for a more global investigation of the dynamic of the coexpression network. Simulations show that the proposed method has higher statistical power than alternatives, can be used in more diverse scenarios, and is computationally cheaper. We apply this method to African American subjects from GTEx to analyze the dynamic behavior of their gene coexpression against genetic ancestry and to identify transcription factors whose coexpression with their target genes changes with the genetic ancestry. The proposed method can be applied to a wide array of problems that require covariance modeling.

Submitted 29 April, 2021; originally announced April 2021.

arXiv:2103.00083 [pdf, other]

Flexible Model Aggregation for Quantile Regression

Authors: Rasool Fakoor, Taesup Kim, Jonas Mueller, Alexander J. Smola, Ryan J. Tibshirani

Abstract: Quantile regression is a fundamental problem in statistical learning motivated by a need to quantify uncertainty in predictions, or to model a diverse population without being overly reductive. For instance, epidemiological forecasts, cost estimates, and revenue predictions all benefit from being able to quantify the range of possible values accurately. As such, many models have been developed for this problem over many years of research in statistics, machine learning, and related fields. Rather than proposing yet another (new) algorithm for quantile regression we adopt a meta viewpoint: we investigate methods for aggregating any number of conditional quantile models, in order to improve accuracy and robustness. We consider weighted ensembles where weights may vary over not only individual models, but also over quantile levels, and feature values. All of the models we consider in this paper can be fit using modern deep learning toolkits, and hence are widely accessible (from an implementation point of view) and scalable. To improve the accuracy of the predicted quantiles (or equivalently, prediction intervals), we develop tools for ensuring that quantiles remain monotonically ordered, and apply conformal calibration methods. These can be used without any modification of the original library of base models. We also review some basic theory surrounding quantile aggregation and related scoring rules, and contribute a few new results to this literature (for example, the fact that post sorting or post isotonic regression can only improve the weighted interval score). Finally, we provide an extensive suite of empirical comparisons across 34 data sets from two different benchmark repositories.

Submitted 15 April, 2023; v1 submitted 26 February, 2021; originally announced March 2021.

Comments: Accepted at JMLR 2023
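
Illustration: the abstract above mentions keeping predicted quantiles monotonically ordered by post sorting. The toy sketch below sorts one set of crossing quantile predictions and compares a summed pinball (quantile) loss before and after. The pinball loss is used here as a simple stand-in for the weighted interval score named in the abstract, and the function names and the numbers are hypothetical.

```python
def pinball_loss(y, q, tau):
    """Pinball (quantile) loss of prediction q for level tau at outcome y."""
    diff = y - q
    return max(tau * diff, (tau - 1.0) * diff)


def total_pinball(y, preds, taus):
    """Sum of pinball losses over all quantile levels."""
    return sum(pinball_loss(y, q, t) for q, t in zip(preds, taus))


def post_sort(preds):
    """Reorder predictions so they are non-decreasing in the quantile level.

    Assumes `preds` is listed in increasing order of quantile level.
    """
    return sorted(preds)


taus = [0.1, 0.5, 0.9]
raw = [3.0, 1.0, 2.0]  # crossing quantiles: the 10% level sits above the 50% level
fixed = post_sort(raw)
outcome = 2.0

print(fixed)
print(total_pinball(outcome, raw, taus), total_pinball(outcome, fixed, taus))
```

In this toy case the sorted predictions score better; the sketch only illustrates the mechanics and is not evidence for the general result stated in the abstract.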

arXiv:2101.02491 [pdf, ps, other]

Density Deconvolution with Non-Standard Error Distributions: Rates of Convergence and Adaptive Estimation

Authors: Alexander Goldenshluger, Taeho Kim

Abstract: It is a typical standard assumption in the density deconvolution problem that the characteristic function of the measurement error distribution is non-zero on the real line. While this condition is assumed in the majority of existing works on the topic, there are many problem instances of interest where it is violated. In this paper we focus on non-standard settings where the characteristic function of the measurement errors has zeros, and study how zeros multiplicity affects the estimation accuracy. For a prototypical problem of this type we demonstrate that the best achievable estimation accuracy is determined by the multiplicity of zeros, the rate of decay of the error characteristic function, as well as by the smoothness and the tail behavior of the estimated density. We derive lower bounds on the minimax risk and develop optimal in the minimax sense estimators. In addition, we consider the problem of adaptive estimation and propose a data-driven estimator that automatically adapts to unknown smoothness and tail behavior of the density to be estimated.

Submitted 7 January, 2021; originally announced January 2021.

Comments: 32 pages

MSC Class: 62G07; 62G20

arXiv:2012.03501 [pdf, other]

Adaptive Local Bayesian Optimization Over Multiple Discrete Variables

Authors: Taehyeon Kim, Jaeyeon Ahn, Nakyil Kim, Seyoung Yun

Abstract: In the machine learning algorithms, the choice of the hyperparameter is often an art more than a science, requiring labor-intensive search with expert experience. Therefore, automation on hyperparameter optimization to exclude human intervention is a great appeal, especially for the black-box functions. Recently, there have been increasing demands of solving such concealed tasks for better generalization, though the task-dependent issue is not easy to solve. The Black-Box Optimization challenge (NeurIPS 2020) required competitors to build a robust black-box optimizer across different domains of standard machine learning problems. This paper describes the approach of team KAIST OSI in a step-wise manner, which outperforms the baseline algorithms by up to +20.39%. We first strengthen the local Bayesian search under the concept of region reliability. Then, we design a combinatorial kernel for a Gaussian process kernel. In a similar vein, we combine the methodology of Bayesian and multi-armed bandit (MAB) approaches to select the values with the consideration of the variable types; the real and integer variables are with Bayesian, while the boolean and categorical variables are with MAB. Empirical evaluations demonstrate that our method outperforms the existing methods across different tasks.

Submitted 7 December, 2020; originally announced December 2020.

Comments: workshop at NeurIPS 2020 Competition Track on Black-Box Optimization Challenge

arXiv:2010.01792 [pdf, other]

Can we Generalize and Distribute Private Representation Learning?

Authors: Sheikh Shams Azam, Taejin Kim, Seyyedali Hosseinalipour, Carlee Joe-Wong, Saurabh Bagchi, Christopher Brinton

Abstract: We study the problem of learning representations that are private yet informative, i.e., provide information about intended "ally" targets while hiding sensitive "adversary" attributes. We propose Exclusion-Inclusion Generative Adversarial Network (EIGAN), a generalized private representation learning (PRL) architecture that accounts for multiple ally and adversary attributes unlike existing PRL solutions. While a centrally-aggregated dataset is a prerequisite for most PRL techniques, data in the real world is often siloed across multiple distributed nodes unwilling to share the raw data because of privacy concerns. We address this practical constraint by developing D-EIGAN, the first distributed PRL method that learns representations at each node without transmitting the source data. We theoretically analyze the behavior of adversaries under the optimal EIGAN and D-EIGAN encoders and the impact of dependencies among ally and adversary tasks on the optimization objective. Our experiments on various datasets demonstrate the advantages of EIGAN in terms of performance, robustness, and scalability. In particular, EIGAN outperforms the previous state-of-the-art by a significant accuracy margin (47% improvement), and D-EIGAN's performance is consistently on par with EIGAN under different network settings.

Submitted 30 January, 2022; v1 submitted 5 October, 2020; originally announced October 2020.

Comments: In Proceedings of the 25th International Conference on Artificial Intelligence and Statistics (AISTATS) 2022

arXiv:2007.02105 [pdf, other]

Prediction Regions for Poisson and Over-Dispersed Poisson Regression Models with Applications to Forecasting Number of Deaths during the COVID-19 Pandemic

Authors: T. Kim, B. Lieberman, G. Luta, E. Pena

Abstract: Motivated by the current Coronavirus Disease (COVID-19) pandemic, which is due to the SARS-CoV-2 virus, and the important problem of forecasting daily deaths and cumulative deaths, this paper examines the construction of prediction regions or intervals under the Poisson regression model and for an over-dispersed Poisson regression model. For the Poisson regression model, several prediction regions are developed and their performance is compared through simulation studies. The methods are applied to the problem of forecasting daily and cumulative deaths in the United States (US) due to COVID-19. To examine their performance relative to what actually happened, daily deaths data until May 15th were used to forecast cumulative deaths by June 1st. It was observed that there is over-dispersion in the observed data relative to the Poisson regression model. An over-dispersed Poisson regression model is therefore proposed. This new model builds on frailty ideas in Survival Analysis and over-dispersion is quantified through an additional parameter. The Poisson regression model is a hidden model in this over-dispersed Poisson regression model and obtains as a limiting case when the over-dispersion parameter increases to infinity. A prediction region for the cumulative number of US deaths due to COVID-19 by July 16th, given the data until July 2nd, is presented.
Finally, the paper discusses limitations of the proposed procedures and mentions open research problems, as well as the dangers and pitfalls when forecasting on a long horizon, with focus on this pandemic where events, both foreseen and unforeseen, could have huge impacts on point predictions and prediction regions.

Submitted 6 July, 2020; v1 submitted 4 July, 2020; originally announced July 2020.

Comments: There are 16 figures, some containing one to four plot panels. The appendix sections are supplementary materials. Without these supplementary materials, there are 35 pages in this manuscript

MSC Class: Primary: 62J02; 62P99; Secondary: 62F99; 62M10

arXiv:2006.09679 [pdf, other]

FrostNet: Towards Quantization-Aware Network Architecture Search

Authors: Taehoon Kim, YoungJoon Yoo, Jihoon Yang

Abstract: INT8 quantization has become one of the standard techniques for deploying convolutional neural networks (CNNs) on edge devices to reduce memory and computational resource usage. By analyzing quantized performances of existing mobile-target network architectures, we can raise an issue regarding the importance of network architecture for optimal INT8 quantization. In this paper, we present a new network architecture search (NAS) procedure to find a network that guarantees both full-precision (FLOAT32) and quantized (INT8) performances. We first propose critical but straightforward optimization methods that enable quantization-aware training (QAT): floating-point statistic assisting (StatAssist) and stochastic gradient boosting (GradBoost). By integrating the gradient-based NAS with StatAssist and GradBoost, we discovered a quantization-efficient network building block, Frost bottleneck. Furthermore, we used Frost bottleneck as the building block for hardware-aware NAS to obtain quantization-efficient networks, FrostNets, which show improved quantization performances compared to other mobile-target networks while maintaining competitive FLOAT32 performance. Our FrostNets achieve higher recognition accuracy than existing CNNs with comparable latency when quantized, due to a higher latency reduction rate (average 65%).

Submitted 30 November, 2020; v1 submitted 17 June, 2020; originally announced June 2020.

arXiv:2003.01860 [pdf, ps, other]

Designing a Bonus-Malus system reflecting the claim size under the dependent frequency-severity model

Authors: Rosy Oh, Joseph H. T. Kim, Jae Youn Ahn

Abstract: In auto insurance, a Bonus-Malus System (BMS) is commonly used as an a posteriori risk classification mechanism to set the premium for the next contract period based on a policyholder's claim history. Even though recent literature reports evidence of a significant dependence between frequency and severity, the current BMS practice is to use a frequency-based transition rule while ignoring severity information. Although Oh et al. (2019) claim that the frequency-driven BMS transition rule can accommodate the dependence between frequency and severity, their proposal is only a partial solution, as the transition rule still completely ignores the claim severity and is unable to penalize large claims. In this study, we propose to use the BMS with a transition rule based on both frequency and size of claim, based on the bivariate random effect model, which conveniently allows dependence between frequency and severity. We analytically derive the optimal relativities under the proposed BMS framework and show that the proposed BMS outperforms the existing frequency-driven BMS. Numerical experiments are also provided using both hypothetical and actual datasets in order to assess the effect of various dependencies on the BMS risk classification and confirm our theoretical findings.

Submitted 3 March, 2020; originally announced March 2020.

arXiv:2002.11903 [pdf, other]

Acceleration of Actor-Critic Deep Reinforcement Learning for Visual Grasping in Clutter by State Representation Learning Based on Disentanglement of a Raw Input Image

Authors: Taewon Kim, Yeseong Park, Youngbin Park, Il Hong Suh

Abstract: For a robotic grasping task in which diverse unseen target objects exist in a cluttered environment, some deep learning-based methods have achieved state-of-the-art results using visual input directly. In contrast, actor-critic deep reinforcement learning (RL) methods typically perform very poorly when grasping diverse objects, especially when learning from raw images and sparse rewards. To make these RL techniques feasible for vision-based grasping tasks, we employ state representation learning (SRL), where we encode essential information first for subsequent use in RL. However, typical representation learning procedures are unsuitable for extracting pertinent information for learning the grasping skill, because the visual inputs for representation learning, where a robot attempts to grasp a target object in clutter, are extremely complex. We found that preprocessing based on the disentanglement of a raw input image is the key to effectively capturing a compact representation. This enables deep RL to learn robotic grasping skills from highly varied and diverse visual inputs. We demonstrate the effectiveness of this approach with varying levels of disentanglement in a realistic simulated environment.

Submitted 26 February, 2020; originally announced February 2020.

arXiv:1912.13366 [pdf, other]

Fast and Accurate Transferability Measurement for Heterogeneous Multivariate Data

Authors: Seungcheol Park, Huiwen Xu, Taehun Kim, Inhwan Hwang, Kyung-Jun Kim, U Kang

Abstract: Given a set of heterogeneous source datasets with their classifiers, how can we quickly find the most useful source dataset for a specific target task? We address the problem of measuring transferability between source and target datasets, where the source and the target have different feature spaces and distributions. We propose Transmeter, a fast and accurate method to estimate the transferability of two heterogeneous multivariate datasets. We address three challenges in measuring transferability between two heterogeneous multivariate datasets: reducing time, minimizing domain gap, and extracting meaningful homogeneous representations. To overcome the above issues, we utilize a pre-trained source model, an adversarial network, and an encoder-decoder architecture. Extensive experiments on heterogeneous multivariate datasets show that Transmeter gives the most accurate transferability measurement with up to 10.3 times faster performance than its competitor. We also show that selecting the best source data with Transmeter followed by a full transfer leads to the best transfer accuracy and the fastest running time.

Submitted 29 January, 2021; v1 submitted 23 December, 2019; originally announced December 2019.

arXiv:1912.04871 [pdf, other]

Deep symbolic regression: Recovering mathematical expressions from data via risk-seeking policy gradients

Authors: Brenden K. Petersen, Mikel Landajuela, T. Nathan Mundhenk, Claudio P. Santiago, Soo K. Kim, Joanne T. Kim

Abstract: Discovering the underlying mathematical expressions describing a dataset is a core challenge for artificial intelligence. This is the problem of $\textit{symbolic regression}$. Despite recent advances in training neural networks to solve complex tasks, deep learning approaches to symbolic regression are underexplored. We propose a framework that leverages deep learning for symbolic regression via a simple idea: use a large model to search the space of small models. Specifically, we use a recurrent neural network to emit a distribution over tractable mathematical expressions and employ a novel risk-seeking policy gradient to train the network to generate better-fitting expressions. Our algorithm outperforms several baseline methods (including Eureqa, the gold standard for symbolic regression) in its ability to exactly recover symbolic expressions on a series of benchmark problems, both with and without added noise. More broadly, our contributions include a framework that can be applied to optimize hierarchical, variable-length objects under a black-box performance metric, with the ability to incorporate constraints in situ, and a risk-seeking policy gradient formulation that optimizes for best-case performance instead of expected performance.

Submitted 5 April, 2021; v1 submitted 10 December, 2019; originally announced December 2019.

Comments: Published at International Conference on Learning Representations, 2021

Report number: LLNL-CONF-790457

Journal ref: International Conference on Learning Representations, 2021

arXiv:1912.03756 [pdf, other]

Improved Multiple Confidence Intervals via Thresholding Informed by Prior Information

Authors: Taeho Kim, Edsel A. Pena

Abstract: Consider a statistical problem where a set of parameters is of interest to a researcher. Then multiple confidence intervals can be constructed to infer the set of parameters simultaneously. The constructed multiple confidence intervals are the realization of a multiple interval estimator (MIE), the main focus of this study. In particular, a thresholding approach is introduced to improve the performance of the MIE. The developed thresholds require additional information, so a prior distribution is assumed for this purpose. The MIE procedure is then evaluated by two performance measures: a global coverage probability and a global expected content, which are averages with respect to the prior distribution. The procedure defined by the performance measures will be called a Bayes MIE with thresholding (BMIE Thres). In this study, a normal-normal model is utilized to build up the BMIE Thres for a set of location parameters. Then, the behaviors of BMIE Thres are investigated in terms of the performance measures, which approach those of the corresponding z-based MIE as the thresholding parameter, C, goes to infinity. In addition, an optimization procedure is introduced to achieve the best thresholding parameter C. For illustrations, in-season baseball batting average data and leukemia gene expression data are used to demonstrate the procedure for the known and unknown standard deviation situations, respectively.
In the ensuing simulations, the target parameters are generated from different true generating distributions to consider the misspecified prior situation. The simulation also involves Bayes credible MIEs, and the effectiveness of the different MIEs is compared with respect to the performance measures. In general, the thresholding procedure helps to achieve a meaningful reduction in the global expected content while maintaining a nominal level of the global coverage probability.

Submitted 8 December, 2019; originally announced December 2019.

Comments: 34 pages and 7 figures

MSC Class: 62F25; 62H12; 62H15

arXiv:1910.00775 [pdf, other]

Variational Temporal Abstraction

Authors: Taesup Kim, Sungjin Ahn, Yoshua Bengio

Abstract: We introduce a variational approach to learning and inference of temporally hierarchical structure and representation for sequential data. We propose the Variational Temporal Abstraction (VTA), a hierarchical recurrent state space model that can infer the latent temporal structure and thus perform the stochastic state transition hierarchically. We also propose to apply this model to implement the jumpy-imagination ability in imagination-augmented agent-learning in order to improve the efficiency of the imagination. In experiments, we demonstrate that our proposed method can model 2D and 3D visual sequence datasets with interpretable temporal structure discovery and that its application to jumpy imagination enables more efficient agent-learning in a 3D navigation task.

Submitted 2 October, 2019; originally announced October 2019.

Comments: Accepted in NeurIPS 2019

arXiv:1906.05956 [pdf, other]

doi 10.1007/978-3-030-32248-9_25

Scalable Neural Architecture Search for 3D Medical Image Segmentation

Authors: Sungwoong Kim, Ildoo Kim, Sungbin Lim, Woonhyuk Baek, Chiheon Kim, Hyungjoo Cho, Boogeon Yoon, Taesup Kim

Abstract: In this paper, a neural architecture search (NAS) framework is proposed for 3D medical image segmentation, to automatically optimize a neural architecture from a large design space. Our NAS framework searches the structure of each layer, including neural connectivities and operation types, in both the encoder and decoder. Since optimizing over a large discrete architecture space is difficult due to high-resolution 3D medical images, a novel stochastic sampling algorithm based on a continuous relaxation is also proposed for scalable gradient-based optimization. On the 3D medical image segmentation tasks with a benchmark dataset, an architecture automatically designed by the proposed NAS framework outperforms the human-designed 3D U-Net, and moreover this optimized architecture is well suited to be transferred to different tasks.

Submitted 13 June, 2019; originally announced June 2019.

Comments: 9 pages, 3 figures

arXiv:1906.04691 [pdf, other]

On Single Source Robustness in Deep Fusion Models

Authors: Taewan Kim, Joydeep Ghosh

Abstract: Algorithms that fuse multiple input sources benefit from both complementary and shared information. Shared information may provide robustness against faulty or noisy inputs, which is indispensable for safety-critical applications like self-driving cars. We investigate learning fusion algorithms that are robust against noise added to a single source. We first demonstrate that robustness against single source noise is not guaranteed in a linear fusion model. Motivated by this discovery, two possible approaches are proposed to increase robustness: a carefully designed loss with corresponding training algorithms for deep fusion models, and a simple convolutional fusion layer that has a structural advantage in dealing with noise. Experimental results show that both training algorithms and our fusion layer make a deep fusion-based 3D object detector robust against noise applied to a single source, while preserving the original performance on clean data.

Submitted 16 October, 2019; v1 submitted 11 June, 2019; originally announced June 2019.

Comments: Accepted to NeurIPS 2019

arXiv:1905.13536 [pdf, other]

Scaling Video Analytics on Constrained Edge Nodes

Authors: Christopher Canel, Thomas Kim, Giulio Zhou, Conglong Li, Hyeontaek Lim, David G. Andersen, Michael Kaminsky, Subramanya R. Dulloor

Abstract: As video camera deployments continue to grow, the need to process large volumes of real-time data strains wide area network infrastructure. When per-camera bandwidth is limited, it is infeasible for applications such as traffic monitoring and pedestrian tracking to offload high-quality video streams to a datacenter. This paper presents FilterForward, a new edge-to-cloud system that enables datacenter-based applications to process content from thousands of cameras by installing lightweight edge filters that backhaul only relevant video frames. FilterForward introduces fast and expressive per-application microclassifiers that share computation to simultaneously detect dozens of events on computationally constrained edge nodes. Only matching events are transmitted to the cloud. Evaluation on two real-world camera feed datasets shows that FilterForward reduces bandwidth use by an order of magnitude while improving computational efficiency and event detection accuracy for challenging video content.

Submitted 24 May, 2019; originally announced May 2019.

Comments: This paper is an extended version of a paper with the same title published in the 2nd SysML Conference, SysML '19 (Canel et al., 2019)

arXiv:1905.00397 [pdf, other]

Fast AutoAugment

Authors: Sungbin Lim, Ildoo Kim, Taesup Kim, Chiheon Kim, Sungwoong Kim

Abstract: Data augmentation is an essential technique for improving the generalization ability of deep learning models. Recently, AutoAugment has been proposed as an algorithm to automatically search for augmentation policies from a dataset and has significantly enhanced performances on many image recognition tasks. However, its search method requires thousands of GPU hours even for a relatively small dataset. In this paper, we propose an algorithm called Fast AutoAugment that finds effective augmentation policies via a more efficient search strategy based on density matching. In comparison to AutoAugment, the proposed algorithm speeds up the search time by orders of magnitude while achieving comparable performance on image recognition tasks with various models and datasets including CIFAR-10, CIFAR-100, SVHN, and ImageNet.

Submitted 25 May, 2019; v1 submitted 1 May, 2019; originally announced May 2019.

Comments: 8 pages, 2 figures

Report number: NeurIPS/2019/12

arXiv:1902.06562 [pdf, other]

doi 10.1016/j.bspc.2020.102037

Intra- and Inter-epoch Temporal Context Network (IITNet) Using Sub-epoch Features for Automatic Sleep Scoring on Raw Single-channel EEG

Authors: Hogeon Seo, Seunghyeok Back, Seongju Lee, Deokhwan Park, Tae Kim, Kyoobin Lee

Abstract: A deep learning model, named IITNet, is proposed to learn intra- and inter-epoch temporal contexts from raw single-channel EEG for automatic sleep scoring. To classify the sleep stage from half-minute EEG, called an epoch, sleep experts investigate sleep-related events and consider the transition rules between the found events. Similarly, IITNet extracts representative features at a sub-epoch level by a residual neural network and captures intra- and inter-epoch temporal contexts from the sequence of the features via bidirectional LSTM. The performance was investigated for three datasets as the sequence length (L) increased from one to ten. IITNet achieved performance comparable to other state-of-the-art results. The best accuracy, MF1, and Cohen's kappa ($κ$) were 83.9%, 77.6%, 0.78 for SleepEDF (L=10), 86.5%, 80.7%, 0.80 for MASS (L=9), and 86.7%, 79.8%, 0.81 for SHHS (L=10), respectively. Even when using four epochs, the performance was still comparable. Compared to using a single epoch, on average, accuracy and MF1 increased by 2.48%p and 4.90%p and F1 of N1, N2, and REM increased by 16.1%p, 1.50%p, and 6.42%p, respectively. Above four epochs, the performance improvement was not significant. The results support that considering the latest two-minute raw single-channel EEG can be a reasonable choice for sleep scoring via deep neural networks with efficiency and reliability.
Furthermore, the experiments with the baselines showed that introducing intra-epoch temporal context learning with a deep residual network contributes to the improvement in the overall performance and has a positive synergy effect with the inter-epoch temporal context learning.

Submitted 10 June, 2020; v1 submitted 18 February, 2019; originally announced February 2019.

Comments: First three authors contributed equally to this work; Accepted manuscript for Biomedical Signal Processing and Control (BSPC); 12 pages, 6 figures

arXiv:1902.04224 [pdf, other]

Effective Network Compression Using Simulation-Guided Iterative Pruning

Authors: Dae-Woong Jeong, Jaehun Kim, Youngseok Kim, Tae-Ho Kim, Myungsu Chae

Abstract: Existing high-performance deep learning models require very intensive computing. For this reason, it is difficult to embed a deep learning model into a system with limited resources. In this paper, we propose a novel network compression idea as a method to overcome this limitation. The principle of this idea is to make iterative pruning more effective and sophisticated by simulating the reduced network. A simple experiment was conducted to evaluate the method; the results showed that the proposed method achieved higher performance than existing methods at the same pruning level.

Submitted 11 February, 2019; originally announced February 2019.

Comments: Submitted to NIPS 2018 MLPCD2

MSC Class: 68T05

arXiv:1812.08997 [pdf, other]

Stochastic Doubly Robust Gradient

Authors: Kanghoon Lee, Jihye Choi, Moonsu Cha, Jung-Kwon Lee, Taeyoon Kim

Abstract: When training a machine learning model with observational data, some values are often systematically missing. Learning from incomplete data in which the missingness depends on some covariates may lead to biased estimation of parameters and even harm the fairness of decision outcomes. This paper proposes how to adjust the causal effect of covariates on the missingness when training models using stochastic gradient descent (SGD). Inspired by the design of the doubly robust estimator and its theoretical property of double robustness, we introduce stochastic doubly robust gradient (SDRG) consisting of two models: weight-corrected gradients for inverse propensity score weighting and per-covariate control variates for regression adjustment. Also, we identify the connection between double robustness and variance reduction in SGD by demonstrating the SDRG algorithm with a unifying framework for variance reduced SGD. The performance of our approach is empirically tested by showing the convergence in training image classifiers with several examples of missing data.

Submitted 21 December, 2018; originally announced December 2018.

Comments: 9 pages, 2 figures

arXiv:1812.02341 [pdf, other]

Quantifying Generalization in Reinforcement Learning

Authors: Karl Cobbe, Oleg Klimov, Chris Hesse, Taehoon Kim, John Schulman

Abstract: In this paper, we investigate the problem of overfitting in deep reinforcement learning. Among the most common benchmarks in RL, it is customary to use the same environments for both training and testing. This practice offers relatively little insight into an agent's ability to generalize. We address this issue by using procedurally generated environments to construct distinct training and test sets. Most notably, we introduce a new environment called CoinRun, designed as a benchmark for generalization in RL. Using CoinRun, we find that agents overfit to surprisingly large training sets. We then show that deeper convolutional architectures improve generalization, as do methods traditionally found in supervised learning, including L2 regularization, dropout, data augmentation and batch normalization.

Submitted 14 July, 2019; v1 submitted 5 December, 2018; originally announced December 2018.

arXiv:1810.02358 [pdf, other]

Transfer Learning via Unsupervised Task Discovery for Visual Question Answering

Authors: Hyeonwoo Noh, Taehoon Kim, Jonghwan Mun, Bohyung Han

Abstract: We study how to leverage off-the-shelf visual and linguistic data to cope with out-of-vocabulary answers in the visual question answering task. Existing large-scale visual datasets with annotations such as image class labels, bounding boxes and region descriptions are good sources for learning rich and diverse visual concepts. However, it is not straightforward how the visual concepts can be captured and transferred to visual question answering models due to the missing link between question-dependent answering models and visual data without questions. We tackle this problem in two steps: 1) learning a task conditional visual classifier, which is capable of solving diverse question-specific visual recognition tasks, based on unsupervised task discovery and 2) transferring the task conditional visual classifier to visual question answering models. Specifically, we employ linguistic knowledge sources such as a structured lexical database (e.g. WordNet) and visual descriptions for unsupervised task discovery, and transfer a learned task conditional visual classifier as an answering unit in a visual question answering model. We empirically show that the proposed algorithm generalizes to out-of-vocabulary answers successfully using the knowledge transferred from the visual dataset.

Submitted 7 April, 2019; v1 submitted 3 October, 2018; originally announced October 2018.

Comments: CVPR 2019

arXiv:1809.00758 [pdf]

End-to-end Multimodal Emotion and Gender Recognition with Dynamic Joint Loss Weights

Authors: Myungsu Chae, Tae-Ho Kim, Young Hoon Shin, June-Woo Kim, Soo-Young Lee

Abstract: Multi-task learning is a method for improving the generalizability of multiple tasks. In order to perform multiple classification tasks with one neural network model, the losses of each task should be combined. Previous studies have mostly focused on multiple prediction tasks using joint loss with static weights for training models, choosing the weights between tasks without making sufficient considerations by setting them uniformly or empirically. In this study, we propose a method to calculate joint loss using dynamic weights to improve the total performance, instead of the individual performance, of tasks. We apply this method to design an end-to-end multimodal emotion and gender recognition model using audio and video data. This approach provides proper weights for the loss of each task when the training process ends. In our experiments, emotion and gender recognition with the proposed method yielded a lower joint loss, which is computed as the negative log-likelihood, than using static weights for joint loss. Moreover, our proposed model has better generalizability than other models. To the best of our knowledge, this research is the first to demonstrate the strength of using dynamic weights for joint loss for maximizing overall performance in emotion and gender recognition tasks.

Submitted 2 October, 2018; v1 submitted 3 September, 2018; originally announced September 2018.

Comments: IROS 2018 Workshop on Crossmodal Learning for Intelligent Robotics

MSC Class: 68T05

arXiv:1806.03836 [pdf, other]

Bayesian Model-Agnostic Meta-Learning

Authors: Taesup Kim, Jaesik Yoon, Ousmane Dia, Sungwoong Kim, Yoshua Bengio, Sungjin Ahn

Abstract: Learning to infer Bayesian posterior from a few-shot dataset is an important step towards robust meta-learning due to the model uncertainty inherent in the problem. In this paper, we propose a novel Bayesian model-agnostic meta-learning method. The proposed method combines scalable gradient-based meta-learning with nonparametric variational inference in a principled probabilistic framework. During fast adaptation, the method is capable of learning complex uncertainty structure beyond a point estimate or a simple Gaussian approximation. In addition, a robust Bayesian meta-update mechanism with a new meta-loss prevents overfitting during meta-update. Remaining an efficient gradient-based meta-learner, the method is also model-agnostic and simple to implement. Experiment results show the accuracy and robustness of the proposed method in various tasks: sinusoidal regression, image classification, active learning, and reinforcement learning.

Submitted 18 November, 2018; v1 submitted 11 June, 2018; originally announced June 2018.

Comments: First two authors contributed equally. 15 pages with appendix including experimental details. Accepted in NIPS 2018

arXiv:1806.02071 [pdf, other]

doi 10.1111/cgf.13619

Deep Fluids: A Generative Network for Parameterized Fluid Simulations

Authors: Byungsoo Kim, Vinicius C. Azevedo, Nils Thuerey, Theodore Kim, Markus Gross, Barbara Solenthaler

Abstract: This paper presents a novel generative model to synthesize fluid simulations from a set of reduced parameters. A convolutional neural network is trained on a collection of discrete, parameterizable fluid simulation velocity fields. Due to the capability of deep learning architectures to learn representative features of the data, our generative model is able to accurately approximate the training data set, while providing plausible interpolated in-betweens. The proposed generative model is optimized for fluids by a novel loss function that guarantees divergence-free velocity fields at all times. In addition, we demonstrate that we can handle complex parameterizations in reduced spaces, and advance simulations in time by integrating in the latent space with a second network. Our method models a wide variety of fluid behaviors, thus enabling applications such as fast construction of simulations, interpolation of fluids with different parameters, time re-sampling, latent space simulations, and compression of fluid simulation data. Reconstructed velocity fields are generated up to 700x faster than re-simulating the data with the underlying CPU solver, while achieving compression rates of up to 1300x.

Submitted 1 February, 2019; v1 submitted 6 June, 2018; originally announced June 2018.

Comments: Computer Graphics Forum (Proceedings of EUROGRAPHICS 2019), additional materials: http://www.byungsoo.me/project/deep-fluids/

Journal ref: Computer Graphics Forum (Proc. Eurographics), 38, 2 (2019), 59-70

arXiv:1805.10724 [pdf, other]

doi 10.1109/TVCG.2018.2865027

RetainVis: Visual Analytics with Interpretable and Interactive Recurrent Neural Networks on Electronic Medical Records

Authors: Bum Chul Kwon, Min-Je Choi, Joanne Taery Kim, Edward Choi, Young Bin Kim, Soonwook Kwon, Jimeng Sun, Jaegul Choo

Abstract: We have recently seen many successful applications of recurrent neural networks (RNNs) on electronic medical records (EMRs), which contain histories of patients' diagnoses, medications, and other various events, in order to predict the current and future states of patients. Despite the strong performance of RNNs, it is often challenging for users to understand why the model makes a particular prediction. Such black-box nature of RNNs can impede its wide adoption in clinical practice. Furthermore, we have no established methods to interactively leverage users' domain expertise and prior knowledge as inputs for steering the model. Therefore, our design study aims to provide a visual analytics solution to increase interpretability and interactivity of RNNs via a joint effort of medical experts, artificial intelligence scientists, and visual analytics researchers. Following the iterative design process between the experts, we design, implement, and evaluate a visual analytics tool called RetainVis, which couples a newly improved, interpretable and interactive RNN-based model called RetainEX and visualizations for users' exploration of EMR data in the context of prediction tasks. Our study shows the effective use of RetainVis for gaining insights into how individual medical codes contribute to making risk predictions, using EMRs of patients with heart failure and cataract symptoms. Our study also demonstrates how we made substantial changes to the state-of-the-art RNN model called RETAIN in order to make use of temporal information and increase interactivity. This study will provide a useful guideline for researchers that aim to design an interpretable and interactive visual analytics tool for RNNs.

Submitted 23 October, 2018; v1 submitted 27 May, 2018; originally announced May 2018.

Comments: Accepted at IEEE VIS 2018. To appear in IEEE Transactions on Visualization and Computer Graphics in January 2019

arXiv:1801.06700 [pdf, other]

A Deep Reinforcement Learning Chatbot (Short Version)

Authors: Iulian V. Serban, Chinnadhurai Sankar, Mathieu Germain, Saizheng Zhang, Zhouhan Lin, Sandeep Subramanian, Taesup Kim, Michael Pieper, Sarath Chandar, Nan Rosemary Ke, Sai Rajeswar, Alexandre de Brebisson, Jose M. R. Sotelo, Dendi Suhubdy, Vincent Michalski, Alexandre Nguyen, Joelle Pineau, Yoshua Bengio

Abstract: We present MILABOT: a deep reinforcement learning chatbot developed by the Montreal Institute for Learning Algorithms (MILA) for the Amazon Alexa Prize competition. MILABOT is capable of conversing with humans on popular small talk topics through both speech and text. The system consists of an ensemble of natural language generation and retrieval models, including neural network and template-based models. By applying reinforcement learning to crowdsourced data and real-world user interactions, the system has been trained to select an appropriate response from the models in its ensemble. The system has been evaluated through A/B testing with real-world users, where it performed significantly better than other systems. The results highlight the potential of coupling ensemble systems with deep reinforcement learning as a fruitful path for developing real-world, open-domain conversational agents.

Submitted 20 January, 2018; originally announced January 2018.

Comments: 9 pages, 1 figure, 2 tables; presented at NIPS 2017, Conversational AI: "Today's Practice and Tomorrow's Potential" Workshop

ACM Class: I.5.1; I.2.7

arXiv:1711.07433 [pdf, other]

Relaxed Oracles for Semi-Supervised Clustering

Authors: Taewan Kim, Joydeep Ghosh

Abstract: Pairwise "same-cluster" queries are one of the most widely used forms of supervision in semi-supervised clustering. However, it is impractical to ask human oracles to answer every query correctly. In this paper, we study the influence of allowing "not-sure" answers from a weak oracle and propose an effective algorithm to handle such uncertainties in query responses. Two realistic weak oracle models are considered where ambiguity in answering depends on the distance between two points. We show that a small query complexity is adequate for effective clustering with high probability by providing better pairs to the weak oracle. Experimental results on synthetic and real data show the effectiveness of our approach in overcoming supervision uncertainties and yielding high quality clusters.

Submitted 20 November, 2017; originally announced November 2017.

Comments: NIPS 2017 Workshop: Learning with Limited Labeled Data (LLD 2017)

arXiv:1709.03202 [pdf, other]

Semi-Supervised Active Clustering with Weak Oracles

Authors: Taewan Kim, Joydeep Ghosh

Abstract: Semi-supervised active clustering (SSAC) utilizes the knowledge of a domain expert to cluster data points by interactively making pairwise "same-cluster" queries. However, it is impractical to ask human oracles to answer every pairwise query. In this paper, we study the influence of allowing "not-sure" answers from a weak oracle and propose algorithms to efficiently handle uncertainties. Different types of model assumptions are analyzed to cover realistic scenarios of oracle abstraction. In the first model, random-weak oracle, an oracle randomly abstains with a certain probability. We also proposed two distance-weak oracle models which simulate the case of getting confused based on the distance between two points in a pairwise query. For each weak oracle model, we show that a small query complexity is adequate for the effective $k$ means clustering with high probability. Sufficient conditions for the guarantee include a $γ$-margin property of the data, and an existence of a point close to each cluster center. Furthermore, we provide a sample complexity with a reduced effect of the cluster's margin and only a logarithmic dependency on the data dimension. Our results allow significantly less number of same-cluster queries if the margin of the clusters is tight, i.e. $γ\approx 1$. Experimental results on synthetic data show the effective performance of our approach in overcoming uncertainties.

Submitted 10 September, 2017; originally announced September 2017.

arXiv:1709.02349 [pdf, other]

A Deep Reinforcement Learning Chatbot

Authors: Iulian V. Serban, Chinnadhurai Sankar, Mathieu Germain, Saizheng Zhang, Zhouhan Lin, Sandeep Subramanian, Taesup Kim, Michael Pieper, Sarath Chandar, Nan Rosemary Ke, Sai Rajeshwar, Alexandre de Brebisson, Jose M. R. Sotelo, Dendi Suhubdy, Vincent Michalski, Alexandre Nguyen, Joelle Pineau, Yoshua Bengio

Abstract: We present MILABOT: a deep reinforcement learning chatbot developed by the Montreal Institute for Learning Algorithms (MILA) for the Amazon Alexa Prize competition. MILABOT is capable of conversing with humans on popular small talk topics through both speech and text. The system consists of an ensemble of natural language generation and retrieval models, including template-based models, bag-of-words models, sequence-to-sequence neural network and latent variable neural network models. By applying reinforcement learning to crowdsourced data and real-world user interactions, the system has been trained to select an appropriate response from the models in its ensemble. The system has been evaluated through A/B testing with real-world users, where it performed significantly better than many competing systems. Due to its machine learning architecture, the system is likely to improve with additional data.

Submitted 5 November, 2017; v1 submitted 7 September, 2017; originally announced September 2017.

Comments: 40 pages, 9 figures, 11 tables

ACM Class: I.5.1; I.2.7

arXiv:1707.08774 [pdf, other]

Topological Data Analysis of Clostridioides difficile Infection and Fecal Microbiota Transplantation

Authors: Pavel Petrov, Stephen T Rush, Zhichun Zhai, Christine H Lee, Peter T Kim, Giseon Heo

Abstract: Computational topologists recently developed a method, called persistent homology to analyze data presented in terms of similarity or dissimilarity. Indeed, persistent homology studies the evolution of topological features in terms of a single index, and is able to capture higher order features beyond the usual clustering techniques. There are three descriptive statistics of persistent homology, namely barcode, persistence diagram and more recently, persistence landscape. Persistence landscape is useful for statistical inference as it belongs to a space of $p-$integrable functions, a separable Banach space. We apply tools in both computational topology and statistics to DNA sequences taken from Clostridioides difficile infected patients treated with an experimental fecal microbiota transplantation. Our statistical and topological data analysis are able to detect interesting patterns among patients and donors. It also provides visualization of DNA sequences in the form of clusters and loops.

Submitted 31 July, 2017; v1 submitted 27 July, 2017; originally announced July 2017.

Comments: 20 pages, 8 figures

MSC Class: 62-07

arXiv:1607.08877 [pdf, other]

The Phylogenetic LASSO and the Microbiome

Authors: Stephen T Rush, Christine H Lee, Washington Mio, Peter T Kim

Abstract: Scientific investigations that incorporate next generation sequencing involve analyses of high-dimensional data where the need to organize, collate and interpret the outcomes are pressingly important. Currently, data can be collected at the microbiome level leading to the possibility of personalized medicine whereby treatments can be tailored at this scale. In this paper, we lay down a statistical framework for this type of analysis with a view toward synthesis of products tailored to individual patients. Although the paper applies the technique to data for a particular infectious disease, the methodology is sufficiently rich to be expanded to other problems in medicine, especially those in which coincident `-omics' covariates and clinical responses are simultaneously captured.

Submitted 29 July, 2016; originally announced July 2016.

Comments: 31 pages, 6 figures, 5 tables

MSC Class: 62P10

arXiv:1606.03439 [pdf, other]

Deep Directed Generative Models with Energy-Based Probability Estimation

Authors: Taesup Kim, Yoshua Bengio

Abstract: Training energy-based probabilistic models is confronted with apparently intractable sums, whose Monte Carlo estimation requires sampling from the estimated probability distribution in the inner loop of training. This can be approximately achieved by Markov chain Monte Carlo methods, but may still face a formidable obstacle that is the difficulty of mixing between modes with sharp concentrations of probability. Whereas an MCMC process is usually derived from a given energy function based on mathematical considerations and requires an arbitrarily long time to obtain good and varied samples, we propose to train a deep directed generative model (not a Markov chain) so that its sampling distribution approximately matches the energy function that is being trained. Inspired by generative adversarial networks, the proposed framework involves training of two models that represent dual views of the estimated probability distribution: the energy function (mapping an input configuration to a scalar energy value) and the generator (mapping a noise vector to a generated configuration), both represented by deep neural networks.

Submitted 10 June, 2016; originally announced June 2016.

arXiv:1605.04955 [pdf, other]

Probing the Geometry of Data with Diffusion Fréchet Functions

Authors: Diego Hernán Díaz Martínez, Christine H. Lee, Peter T. Kim, Washington Mio

Abstract: Many complex ecosystems, such as those formed by multiple microbial taxa, involve intricate interactions amongst various sub-communities. The most basic relationships are frequently modeled as co-occurrence networks in which the nodes represent the various players in the community and the weighted edges encode levels of interaction. In this setting, the composition of a community may be viewed as a probability distribution on the nodes of the network. This paper develops methods for modeling the organization of such data, as well as their Euclidean counterparts, across spatial scales. Using the notion of diffusion distance, we introduce diffusion Frechet functions and diffusion Frechet vectors associated with probability distributions on Euclidean space and the vertex set of a weighted network, respectively. We prove that these functional statistics are stable with respect to the Wasserstein distance between probability measures, thus yielding robust descriptors of their shapes. We apply the methodology to investigate bacterial communities in the human gut, seeking to characterize divergence from intestinal homeostasis in patients with Clostridium difficile infection (CDI) and the effects of fecal microbiota transplantation, a treatment used in CDI patients that has proven to be significantly more effective than traditional treatment with antibiotics. The proposed method proves useful in deriving a biomarker that might help elucidate the mechanisms that drive these processes.

Submitted 7 March, 2017; v1 submitted 16 May, 2016; originally announced May 2016.

Comments: 26 pages, 8 figures. Lemma 1b and Theorem 2 have been revised, as well as the results derived from them

MSC Class: 62-07; 92C50

arXiv:1410.3752 [pdf, ps, other]

Enhanced Random Forest with Image/Patch-Level Learning for Image Understanding

Authors: Wai Lam Hoo, Tae-Kyun Kim, Yuru Pei, Chee Seng Chan

Abstract: Image understanding is an important research domain in the computer vision due to its wide real-world applications. For an image understanding framework that uses the Bag-of-Words model representation, the visual codebook is an essential part. Random forest (RF) as a tree-structure discriminative codebook has been a popular choice. However, the performance of the RF can be degraded if the local patch labels are poorly assigned. In this paper, we tackle this problem by a novel way to update the RF codebook learning for a more discriminative codebook with the introduction of the soft class labels, estimated from the pLSA model based on a feedback scheme. The feedback scheme is performed on both the image and patch levels respectively, which is in contrast to the state-of-the-art RF codebook learning that focused on either image or patch level only. Experiments on 15-Scene and C-Pascal datasets had shown the effectiveness of the proposed method in image understanding task.

Submitted 14 October, 2014; originally announced October 2014.

Comments: Accepted in ICPR 2014 (Oral)

Showing 1–47 of 47 results for author: Kim, T