Search | arXiv e-print repository

Meent: Differentiable Electromagnetic Simulator for Machine Learning

Authors: Yongha Kim, Anthony W. Jung, Sanmun Kim, Kevin Octavian, Doyoung Heo, Chaejin Park, Jeongmin Shin, Sunghyun Nam, Chanhyung Park, Juho Park, Sangjun Han, Jinmyoung Lee, Seolho Kim, Min Seok Jang, Chan Y. Park

Abstract: Electromagnetic (EM) simulation plays a crucial role in analyzing and designing devices with sub-wavelength scale structures such as solar cells, semiconductor devices, image sensors, future displays and integrated photonic devices. Specifically, optics problems such as estimating semiconductor device structures and designing nanophotonic devices provide intriguing research topics with far-reachin… ▽ More Electromagnetic (EM) simulation plays a crucial role in analyzing and designing devices with sub-wavelength scale structures such as solar cells, semiconductor devices, image sensors, future displays and integrated photonic devices. Specifically, optics problems such as estimating semiconductor device structures and designing nanophotonic devices provide intriguing research topics with far-reaching real world impact. Traditional algorithms for such tasks require iteratively refining parameters through simulations, which often yield sub-optimal results due to the high computational cost of both the algorithms and EM simulations. Machine learning (ML) emerged as a promising candidate to mitigate these challenges, and optics research community has increasingly adopted ML algorithms to obtain results surpassing classical methods across various tasks. To foster a synergistic collaboration between the optics and ML communities, it is essential to have an EM simulation software that is user-friendly for both research communities. To this end, we present Meent, an EM simulation software that employs rigorous coupled-wave analysis (RCWA). Developed in Python and equipped with automatic differentiation (AD) capabilities, Meent serves as a versatile platform for integrating ML into optics research and vice versa. To demonstrate its utility as a research platform, we present three applications of Meent: 1) generating a dataset for training neural operator, 2) serving as an environment for the reinforcement learning of nanophotonic device optimization, and 3) providing a solution for inverse problems with gradient-based optimizers. These applications highlight Meent's potential to advance both EM simulation and ML methodologies. The code is available at https://github.com/kc-ml2/meent with the MIT license to promote the cross-polinations of ideas among academic researchers and industry practitioners. △ Less

Submitted 11 June, 2024; originally announced June 2024.

Comments: under review

arXiv:2406.06111 [pdf, other]

JenGAN: Stacked Shifted Filters in GAN-Based Speech Synthesis

Authors: Hyunjae Cho, Junhyeok Lee, Wonbin Jung

Abstract: Non-autoregressive GAN-based neural vocoders are widely used due to their fast inference speed and high perceptual quality. However, they often suffer from audible artifacts such as tonal artifacts in their generated results. Therefore, we propose JenGAN, a new training strategy that involves stacking shifted low-pass filters to ensure the shift-equivariant property. This method helps prevent alia… ▽ More Non-autoregressive GAN-based neural vocoders are widely used due to their fast inference speed and high perceptual quality. However, they often suffer from audible artifacts such as tonal artifacts in their generated results. Therefore, we propose JenGAN, a new training strategy that involves stacking shifted low-pass filters to ensure the shift-equivariant property. This method helps prevent aliasing and reduce artifacts while preserving the model structure used during inference. In our experimental evaluation, JenGAN consistently enhances the performance of vocoder models, yielding significantly superior scores across the majority of evaluation metrics. △ Less

Submitted 10 June, 2024; originally announced June 2024.

Comments: Accepted to Interspeech 2024

arXiv:2405.00646 [pdf, other]

Learning to Compose: Improving Object Centric Learning by Injecting Compositionality

Authors: Whie Jung, Jaehoon Yoo, Sungjin Ahn, Seunghoon Hong

Abstract: Learning compositional representation is a key aspect of object-centric learning as it enables flexible systematic generalization and supports complex visual reasoning. However, most of the existing approaches rely on auto-encoding objective, while the compositionality is implicitly imposed by the architectural or algorithmic bias in the encoder. This misalignment between auto-encoding objective a… ▽ More Learning compositional representation is a key aspect of object-centric learning as it enables flexible systematic generalization and supports complex visual reasoning. However, most of the existing approaches rely on auto-encoding objective, while the compositionality is implicitly imposed by the architectural or algorithmic bias in the encoder. This misalignment between auto-encoding objective and learning compositionality often results in failure of capturing meaningful object representations. In this study, we propose a novel objective that explicitly encourages compositionality of the representations. Built upon the existing object-centric learning framework (e.g., slot attention), our method incorporates additional constraints that an arbitrary mixture of object representations from two images should be valid by maximizing the likelihood of the composite data. We demonstrate that incorporating our objective to the existing framework consistently improves the objective-centric learning and enhances the robustness to the architectural choices. △ Less

Submitted 1 May, 2024; originally announced May 2024.

arXiv:2404.01954 [pdf, other]

HyperCLOVA X Technical Report

Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seongjin Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han , et al. (371 additional authors not shown)

Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment t… ▽ More We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment to responsible AI. The model is evaluated across various benchmarks, including comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness, in both Korean and English. HyperCLOVA X exhibits strong reasoning capabilities in Korean backed by a deep understanding of the language and cultural nuances. Further analysis of the inherent bilingual nature and its extension to multilingualism highlights the model's cross-lingual proficiency and strong generalization ability to untargeted languages, including machine translation between several language pairs and cross-lingual inference tasks. We believe that HyperCLOVA X can provide helpful guidance for regions or countries in developing their sovereign LLMs. △ Less

Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

Comments: 44 pages; updated authors list and fixed author names

arXiv:2402.15604 [pdf, other]

Goal-Reaching Trajectory Design Near Danger with Piecewise Affine Reach-avoid Computation

Authors: Long Kiu Chung, Wonsuhk Jung, Chuizheng Kong, Shreyas Kousik

Abstract: Autonomous mobile robots must maintain safety, but should not sacrifice performance, leading to the classical reach-avoid problem: find a trajectory that is guaranteed to reach a goal and avoid obstacles. This paper addresses the near danger case, also known as a narrow gap, where the agent starts near the goal, but must navigate through tight obstacles that block its path. The proposed method bui… ▽ More Autonomous mobile robots must maintain safety, but should not sacrifice performance, leading to the classical reach-avoid problem: find a trajectory that is guaranteed to reach a goal and avoid obstacles. This paper addresses the near danger case, also known as a narrow gap, where the agent starts near the goal, but must navigate through tight obstacles that block its path. The proposed method builds off the common approach of using a simplified planning model to generate plans, which are then tracked using a high-fidelity tracking model and controller. Existing approaches use reachability analysis to overapproximate the error between these models and ensure safety, but doing so introduces numerical approximation error conservativeness that prevents goal-reaching. The present work instead proposes a Piecewise Affine Reach-avoid Computation (PARC) method to tightly approximate the reachable set of the planning model. PARC significantly reduces conservativeness through a careful choice of the planning model and set representation, along with an effective approach to handling time-varying tracking errors. The utility of this method is demonstrated through extensive numerical experiments in which PARC outperforms state-of-the-art reach avoid methods in near-danger goal reaching. Furthermore, in a simulated demonstration, PARC enables the generation of provably-safe extreme vehicle dynamics drift parking maneuvers. A preliminary hardware demo on a TurtleBot3 also validates the method. △ Less

Submitted 28 May, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

Comments: The first two authors contributed equally to the work. This work has been submitted for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2312.00548 [pdf, other]

Domain Adaptive Imitation Learning with Visual Observation

Authors: Sungho Choi, Seungyul Han, Woojun Kim, Jongseong Chae, Whiyoung Jung, Youngchul Sung

Abstract: In this paper, we consider domain-adaptive imitation learning with visual observation, where an agent in a target domain learns to perform a task by observing expert demonstrations in a source domain. Domain adaptive imitation learning arises in practical scenarios where a robot, receiving visual sensory data, needs to mimic movements by visually observing other robots from different angles or obs… ▽ More In this paper, we consider domain-adaptive imitation learning with visual observation, where an agent in a target domain learns to perform a task by observing expert demonstrations in a source domain. Domain adaptive imitation learning arises in practical scenarios where a robot, receiving visual sensory data, needs to mimic movements by visually observing other robots from different angles or observing robots of different shapes. To overcome the domain shift in cross-domain imitation learning with visual observation, we propose a novel framework for extracting domain-independent behavioral features from input observations that can be used to train the learner, based on dual feature extraction and image reconstruction. Empirical results demonstrate that our approach outperforms previous algorithms for imitation learning from visual observation with domain shift. △ Less

Submitted 1 December, 2023; originally announced December 2023.

Comments: Accepted to NeurIPS 2023

arXiv:2311.13824 [pdf, other]

Constraint-Guided Online Data Selection for Scalable Data-Driven Safety Filters in Uncertain Robotic Systems

Authors: Jason J. Choi, Fernando Castañeda, Wonsuhk Jung, Bike Zhang, Claire J. Tomlin, Koushil Sreenath

Abstract: As the use of autonomous robotic systems expands in tasks that are complex and challenging to model, the demand for robust data-driven control methods that can certify safety and stability in uncertain conditions is increasing. However, the practical implementation of these methods often faces scalability issues due to the growing amount of data points with system complexity, and a significant rel… ▽ More As the use of autonomous robotic systems expands in tasks that are complex and challenging to model, the demand for robust data-driven control methods that can certify safety and stability in uncertain conditions is increasing. However, the practical implementation of these methods often faces scalability issues due to the growing amount of data points with system complexity, and a significant reliance on high-quality training data. In response to these challenges, this study presents a scalable data-driven controller that efficiently identifies and infers from the most informative data points for implementing data-driven safety filters. Our approach is grounded in the integration of a model-based certificate function-based method and Gaussian Process (GP) regression, reinforced by a novel online data selection algorithm that reduces time complexity from quadratic to linear relative to dataset size. Empirical evidence, gathered from successful real-world cart-pole swing-up experiments and simulated locomotion of a five-link bipedal robot, demonstrates the efficacy of our approach. Our findings reveal that our efficient online data selection algorithm, which strategically selects key data points, enhances the practicality and efficiency of data-driven certifying filters in complex robotic systems, significantly mitigating scalability concerns inherent in nonparametric learning-based control methods. △ Less

Submitted 23 November, 2023; originally announced November 2023.

Comments: The first three authors contributed equally to the work. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2311.12630 [pdf, other]

Hierarchical Joint Graph Learning and Multivariate Time Series Forecasting

Authors: Juhyeon Kim, Hyungeun Lee, Seungwon Yu, Ung Hwang, Wooyul Jung, Miseon Park, Kijung Yoon

Abstract: Multivariate time series is prevalent in many scientific and industrial domains. Modeling multivariate signals is challenging due to their long-range temporal dependencies and intricate interactions--both direct and indirect. To confront these complexities, we introduce a method of representing multivariate signals as nodes in a graph with edges indicating interdependency between them. Specificall… ▽ More Multivariate time series is prevalent in many scientific and industrial domains. Modeling multivariate signals is challenging due to their long-range temporal dependencies and intricate interactions--both direct and indirect. To confront these complexities, we introduce a method of representing multivariate signals as nodes in a graph with edges indicating interdependency between them. Specifically, we leverage graph neural networks (GNN) and attention mechanisms to efficiently learn the underlying relationships within the time series data. Moreover, we suggest employing hierarchical signal decompositions running over the graphs to capture multiple spatial dependencies. The effectiveness of our proposed model is evaluated across various real-world benchmark datasets designed for long-term forecasting tasks. The results consistently showcase the superiority of our model, achieving an average 23\% reduction in mean squared error (MSE) compared to existing models. △ Less

Submitted 30 November, 2023; v1 submitted 21 November, 2023; originally announced November 2023.

Comments: Temporal Graph Learning Workshop @ NeurIPS 2023, New Orleans, United States

arXiv:2310.20412 [pdf]

Thermal-Infrared Remote Target Detection System for Maritime Rescue based on Data Augmentation with 3D Synthetic Data

Authors: Sungjin Cheong, Wonho Jung, Yoon Seop Lim, Yong-Hwa Park

Abstract: This paper proposes a thermal-infrared (TIR) remote target detection system for maritime rescue using deep learning and data augmentation. We established a self-collected TIR dataset consisting of multiple scenes imitating human rescue situations using a TIR camera (FLIR). Additionally, to address dataset scarcity and improve model robustness, a synthetic dataset from a 3D game (ARMA3) to augment… ▽ More This paper proposes a thermal-infrared (TIR) remote target detection system for maritime rescue using deep learning and data augmentation. We established a self-collected TIR dataset consisting of multiple scenes imitating human rescue situations using a TIR camera (FLIR). Additionally, to address dataset scarcity and improve model robustness, a synthetic dataset from a 3D game (ARMA3) to augment the data is further collected. However, a significant domain gap exists between synthetic TIR and real TIR images. Hence, a proper domain adaptation algorithm is essential to overcome the gap. Therefore, we suggest a domain adaptation algorithm in a target-background separated manner from 3D game-to-real, based on a generative model, to address this issue. Furthermore, a segmentation network with fixed-weight kernels at the head is proposed to improve the signal-to-noise ratio (SNR) and provide weak attention, as remote TIR targets inherently suffer from unclear boundaries. Experiment results reveal that the network trained on augmented data consisting of translated synthetic and real TIR data outperforms that trained on only real TIR data by a large margin. Furthermore, the proposed segmentation model surpasses the performance of state-of-the-art segmentation methods. △ Less

Submitted 31 October, 2023; originally announced October 2023.

Comments: 12 pages

MSC Class: 68T45

arXiv:2310.16530 [pdf, other]

Toward Practical Privacy-Preserving Convolutional Neural Networks Exploiting Fully Homomorphic Encryption

Authors: Jaiyoung Park, Donghwan Kim, Jongmin Kim, Sangpyo Kim, Wonkyung Jung, Jung Hee Cheon, Jung Ho Ahn

Abstract: Incorporating fully homomorphic encryption (FHE) into the inference process of a convolutional neural network (CNN) draws enormous attention as a viable approach for achieving private inference (PI). FHE allows delegating the entire computation process to the server while ensuring the confidentiality of sensitive client-side data. However, practical FHE implementation of a CNN faces significant hu… ▽ More Incorporating fully homomorphic encryption (FHE) into the inference process of a convolutional neural network (CNN) draws enormous attention as a viable approach for achieving private inference (PI). FHE allows delegating the entire computation process to the server while ensuring the confidentiality of sensitive client-side data. However, practical FHE implementation of a CNN faces significant hurdles, primarily due to FHE's substantial computational and memory overhead. To address these challenges, we propose a set of optimizations, which includes GPU/ASIC acceleration, an efficient activation function, and an optimized packing scheme. We evaluate our method using the ResNet models on the CIFAR-10 and ImageNet datasets, achieving several orders of magnitude improvement compared to prior work and reducing the latency of the encrypted CNN inference to 1.4 seconds on an NVIDIA A100 GPU. We also show that the latency drops to a mere 0.03 seconds with a custom hardware design. △ Less

Submitted 25 October, 2023; originally announced October 2023.

Comments: 3 pages, 1 figure, appears at DISCC 2023 (2nd Workshop on Data Integrity and Secure Cloud Computing, in conjunction with the 56th International Symposium on Microarchitecture (MICRO 2023))

arXiv:2310.13312 [pdf, other]

Exploring the Impact of Corpus Diversity on Financial Pretrained Language Models

Authors: Jaeyoung Choe, Keonwoong Noh, Nayeon Kim, Seyun Ahn, Woohwan Jung

Abstract: Over the past few years, various domain-specific pretrained language models (PLMs) have been proposed and have outperformed general-domain PLMs in specialized areas such as biomedical, scientific, and clinical domains. In addition, financial PLMs have been studied because of the high economic impact of financial data analysis. However, we found that financial PLMs were not pretrained on sufficient… ▽ More Over the past few years, various domain-specific pretrained language models (PLMs) have been proposed and have outperformed general-domain PLMs in specialized areas such as biomedical, scientific, and clinical domains. In addition, financial PLMs have been studied because of the high economic impact of financial data analysis. However, we found that financial PLMs were not pretrained on sufficiently diverse financial data. This lack of diverse training data leads to a subpar generalization performance, resulting in general-purpose PLMs, including BERT, often outperforming financial PLMs on many downstream tasks. To address this issue, we collected a broad range of financial corpus and trained the Financial Language Model (FiLM) on these diverse datasets. Our experimental results confirm that FiLM outperforms not only existing financial PLMs but also general domain PLMs. Furthermore, we provide empirical evidence that this improvement can be achieved even for unseen corpus groups. △ Less

Submitted 20 October, 2023; originally announced October 2023.

Comments: Accepted to EMNLP 2023 (Findings)

arXiv:2310.11715 [pdf, other]

Enhancing Low-resource Fine-grained Named Entity Recognition by Leveraging Coarse-grained Datasets

Authors: Su Ah Lee, Seokjin Oh, Woohwan Jung

Abstract: Named Entity Recognition (NER) frequently suffers from the problem of insufficient labeled data, particularly in fine-grained NER scenarios. Although $K$-shot learning techniques can be applied, their performance tends to saturate when the number of annotations exceeds several tens of labels. To overcome this problem, we utilize existing coarse-grained datasets that offer a large number of annotat… ▽ More Named Entity Recognition (NER) frequently suffers from the problem of insufficient labeled data, particularly in fine-grained NER scenarios. Although $K$-shot learning techniques can be applied, their performance tends to saturate when the number of annotations exceeds several tens of labels. To overcome this problem, we utilize existing coarse-grained datasets that offer a large number of annotations. A straightforward approach to address this problem is pre-finetuning, which employs coarse-grained data for representation learning. However, it cannot directly utilize the relationships between fine-grained and coarse-grained entities, although a fine-grained entity type is likely to be a subcategory of a coarse-grained entity type. We propose a fine-grained NER model with a Fine-to-Coarse(F2C) mapping matrix to leverage the hierarchical structure explicitly. In addition, we present an inconsistency filtering method to eliminate coarse-grained entities that are inconsistent with fine-grained entity types to avoid performance degradation. Our experimental results show that our method outperforms both $K$-shot learning and supervised learning methods when dealing with a small number of fine-grained annotations. △ Less

Submitted 13 November, 2023; v1 submitted 18 October, 2023; originally announced October 2023.

Comments: Accepted to EMNLP 2023

arXiv:2310.03457 [pdf, other]

A Quantitatively Interpretable Model for Alzheimer's Disease Prediction Using Deep Counterfactuals

Authors: Kwanseok Oh, Da-Woon Heo, Ahmad Wisnu Mulyadi, Wonsik Jung, Eunsong Kang, Kun Ho Lee, Heung-Il Suk

Abstract: Deep learning (DL) for predicting Alzheimer's disease (AD) has provided timely intervention in disease progression yet still demands attentive interpretability to explain how their DL models make definitive decisions. Recently, counterfactual reasoning has gained increasing attention in medical research because of its ability to provide a refined visual explanatory map. However, such visual explan… ▽ More Deep learning (DL) for predicting Alzheimer's disease (AD) has provided timely intervention in disease progression yet still demands attentive interpretability to explain how their DL models make definitive decisions. Recently, counterfactual reasoning has gained increasing attention in medical research because of its ability to provide a refined visual explanatory map. However, such visual explanatory maps based on visual inspection alone are insufficient unless we intuitively demonstrate their medical or neuroscientific validity via quantitative features. In this study, we synthesize the counterfactual-labeled structural MRIs using our proposed framework and transform it into a gray matter density map to measure its volumetric changes over the parcellated region of interest (ROI). We also devised a lightweight linear classifier to boost the effectiveness of constructed ROIs, promoted quantitative interpretation, and achieved comparable predictive performance to DL methods. Throughout this, our framework produces an ``AD-relatedness index'' for each ROI and offers an intuitive understanding of brain status for an individual patient and across patient groups with respect to AD progression. △ Less

Submitted 5 October, 2023; originally announced October 2023.

Comments: 15 pages, 5 figures, 4 tables

arXiv:2310.03404 [pdf, other]

EAG-RS: A Novel Explainability-guided ROI-Selection Framework for ASD Diagnosis via Inter-regional Relation Learning

Authors: Wonsik Jung, Eunjin Jeon, Eunsong Kang, Heung-Il Suk

Abstract: Deep learning models based on resting-state functional magnetic resonance imaging (rs-fMRI) have been widely used to diagnose brain diseases, particularly autism spectrum disorder (ASD). Existing studies have leveraged the functional connectivity (FC) of rs-fMRI, achieving notable classification performance. However, they have significant limitations, including the lack of adequate information whi… ▽ More Deep learning models based on resting-state functional magnetic resonance imaging (rs-fMRI) have been widely used to diagnose brain diseases, particularly autism spectrum disorder (ASD). Existing studies have leveraged the functional connectivity (FC) of rs-fMRI, achieving notable classification performance. However, they have significant limitations, including the lack of adequate information while using linear low-order FC as inputs to the model, not considering individual characteristics (i.e., different symptoms or varying stages of severity) among patients with ASD, and the non-explainability of the decision process. To cover these limitations, we propose a novel explainability-guided region of interest (ROI) selection (EAG-RS) framework that identifies non-linear high-order functional associations among brain regions by leveraging an explainable artificial intelligence technique and selects class-discriminative regions for brain disease identification. The proposed framework includes three steps: (i) inter-regional relation learning to estimate non-linear relations through random seed-based network masking, (ii) explainable connection-wise relevance score estimation to explore high-order relations between functional connections, and (iii) non-linear high-order FC-based diagnosis-informative ROI selection and classifier learning to identify ASD. We validated the effectiveness of our proposed method by conducting experiments using the Autism Brain Imaging Database Exchange (ABIDE) dataset, demonstrating that the proposed method outperforms other comparative methods in terms of various evaluation metrics. Furthermore, we qualitatively analyzed the selected ROIs and identified ASD subtypes linked to previous neuroscientific studies. △ Less

Submitted 5 October, 2023; originally announced October 2023.

Comments: 12 pages, 6 figures, 6 tables

arXiv:2310.03353 [pdf, other]

Deep Geometric Learning with Monotonicity Constraints for Alzheimer's Disease Progression

Authors: Seungwoo Jeong, Wonsik Jung, Junghyo Sohn, Heung-Il Suk

Abstract: Alzheimer's disease (AD) is a devastating neurodegenerative condition that precedes progressive and irreversible dementia; thus, predicting its progression over time is vital for clinical diagnosis and treatment. Numerous studies have implemented structural magnetic resonance imaging (MRI) to model AD progression, focusing on three integral aspects: (i) temporal variability, (ii) incomplete observ… ▽ More Alzheimer's disease (AD) is a devastating neurodegenerative condition that precedes progressive and irreversible dementia; thus, predicting its progression over time is vital for clinical diagnosis and treatment. Numerous studies have implemented structural magnetic resonance imaging (MRI) to model AD progression, focusing on three integral aspects: (i) temporal variability, (ii) incomplete observations, and (iii) temporal geometric characteristics. However, deep learning-based approaches regarding data variability and sparsity have yet to consider inherent geometrical properties sufficiently. The ordinary differential equation-based geometric modeling method (ODE-RGRU) has recently emerged as a promising strategy for modeling time-series data by intertwining a recurrent neural network and an ODE in Riemannian space. Despite its achievements, ODE-RGRU encounters limitations when extrapolating positive definite symmetric metrics from incomplete samples, leading to feature reverse occurrences that are particularly problematic, especially within the clinical facet. Therefore, this study proposes a novel geometric learning approach that models longitudinal MRI biomarkers and cognitive scores by combining three modules: topological space shift, ODE-RGRU, and trajectory estimation. We have also developed a training algorithm that integrates manifold mapping with monotonicity constraints to reflect measurement transition irreversibility. We verify our proposed method's efficacy by predicting clinical labels and cognitive scores over time in regular and irregular settings. Furthermore, we thoroughly analyze our proposed framework through an ablation study. △ Less

Submitted 5 October, 2023; originally announced October 2023.

arXiv:2307.16833 [pdf, other]

Data Augmentation for Neural Machine Translation using Generative Language Model

Authors: Seokjin Oh, Su Ah Lee, Woohwan Jung

Abstract: Despite the rapid growth in model architecture, the scarcity of large parallel corpora remains the main bottleneck in Neural Machine Translation. Data augmentation is a technique that enhances the performance of data-hungry models by generating synthetic data instead of collecting new ones. We explore prompt-based data augmentation approaches that leverage large-scale language models such as ChatG… ▽ More Despite the rapid growth in model architecture, the scarcity of large parallel corpora remains the main bottleneck in Neural Machine Translation. Data augmentation is a technique that enhances the performance of data-hungry models by generating synthetic data instead of collecting new ones. We explore prompt-based data augmentation approaches that leverage large-scale language models such as ChatGPT. To create a synthetic parallel corpus, we compare 3 methods using different prompts. We employ two assessment metrics to measure the diversity of the generated synthetic data. This approach requires no further model training cost, which is mandatory in other augmentation methods like back-translation. The proposed method improves the unaugmented baseline by 0.68 BLEU score. △ Less

Submitted 13 November, 2023; v1 submitted 25 July, 2023; originally announced July 2023.

arXiv:2303.13110 [pdf, other]

OCELOT: Overlapped Cell on Tissue Dataset for Histopathology

Authors: Jeongun Ryu, Aaron Valero Puche, JaeWoong Shin, Seonwook Park, Biagio Brattoli, Jinhee Lee, Wonkyung Jung, Soo Ick Cho, Kyunghyun Paeng, Chan-Young Ock, Donggeun Yoo, Sérgio Pereira

Abstract: Cell detection is a fundamental task in computational pathology that can be used for extracting high-level medical information from whole-slide images. For accurate cell detection, pathologists often zoom out to understand the tissue-level structures and zoom in to classify cells based on their morphology and the surrounding context. However, there is a lack of efforts to reflect such behaviors by… ▽ More Cell detection is a fundamental task in computational pathology that can be used for extracting high-level medical information from whole-slide images. For accurate cell detection, pathologists often zoom out to understand the tissue-level structures and zoom in to classify cells based on their morphology and the surrounding context. However, there is a lack of efforts to reflect such behaviors by pathologists in the cell detection models, mainly due to the lack of datasets containing both cell and tissue annotations with overlapping regions. To overcome this limitation, we propose and publicly release OCELOT, a dataset purposely dedicated to the study of cell-tissue relationships for cell detection in histopathology. OCELOT provides overlapping cell and tissue annotations on images acquired from multiple organs. Within this setting, we also propose multi-task learning approaches that benefit from learning both cell and tissue tasks simultaneously. When compared against a model trained only for the cell detection task, our proposed approaches improve cell detection performance on 3 datasets: proposed OCELOT, public TIGER, and internal CARP datasets. On the OCELOT test set in particular, we show up to 6.79 improvement in F1-score. We believe the contributions of this paper, including the release of the OCELOT dataset at https://lunit-io.github.io/research/publications/ocelot are a crucial starting point toward the important research direction of incorporating cell-tissue relationships in computation pathology. △ Less

Submitted 23 March, 2023; v1 submitted 23 March, 2023; originally announced March 2023.

Comments: Accepted for publication at CVPR'23

arXiv:2303.00451 [pdf, other]

A Variational Approach to Mutual Information-Based Coordination for Multi-Agent Reinforcement Learning

Authors: Woojun Kim, Whiyoung Jung, Myungsik Cho, Youngchul Sung

Abstract: In this paper, we propose a new mutual information framework for multi-agent reinforcement learning to enable multiple agents to learn coordinated behaviors by regularizing the accumulated return with the simultaneous mutual information between multi-agent actions. By introducing a latent variable to induce nonzero mutual information between multi-agent actions and applying a variational bound, we… ▽ More In this paper, we propose a new mutual information framework for multi-agent reinforcement learning to enable multiple agents to learn coordinated behaviors by regularizing the accumulated return with the simultaneous mutual information between multi-agent actions. By introducing a latent variable to induce nonzero mutual information between multi-agent actions and applying a variational bound, we derive a tractable lower bound on the considered MMI-regularized objective function. The derived tractable objective can be interpreted as maximum entropy reinforcement learning combined with uncertainty reduction of other agents actions. Applying policy iteration to maximize the derived lower bound, we propose a practical algorithm named variational maximum mutual information multi-agent actor-critic, which follows centralized learning with decentralized execution. We evaluated VM3-AC for several games requiring coordination, and numerical results show that VM3-AC outperforms other MARL algorithms in multi-agent tasks requiring high-quality coordination. △ Less

Submitted 1 March, 2023; originally announced March 2023.

Comments: arXiv admin note: text overlap with arXiv:2006.02732

arXiv:2302.12391 [pdf, other]

PITS: Variational Pitch Inference without Fundamental Frequency for End-to-End Pitch-controllable TTS

Authors: Junhyeok Lee, Wonbin Jung, Hyunjae Cho, Jaeyeon Kim, Jaehwan Kim

Abstract: Previous pitch-controllable text-to-speech (TTS) models rely on directly modeling fundamental frequency, leading to low variance in synthesized speech. To address this issue, we propose PITS, an end-to-end pitch-controllable TTS model that utilizes variational inference to model pitch. Based on VITS, PITS incorporates the Yingram encoder, the Yingram decoder, and adversarial training of pitch-shif… ▽ More Previous pitch-controllable text-to-speech (TTS) models rely on directly modeling fundamental frequency, leading to low variance in synthesized speech. To address this issue, we propose PITS, an end-to-end pitch-controllable TTS model that utilizes variational inference to model pitch. Based on VITS, PITS incorporates the Yingram encoder, the Yingram decoder, and adversarial training of pitch-shifted synthesis to achieve pitch-controllability. Experiments demonstrate that PITS generates high-quality speech that is indistinguishable from ground truth speech and has high pitch-controllability without quality degradation. Code, audio samples, and demo are available at https://github.com/anonymous-pits/pits. △ Less

Submitted 6 June, 2023; v1 submitted 23 February, 2023; originally announced February 2023.

Comments: 6 pages, preprint

arXiv:2212.10504 [pdf, other]

Can Current Task-oriented Dialogue Models Automate Real-world Scenarios in the Wild?

Authors: Sang-Woo Lee, Sungdong Kim, Donghyeon Ko, Donghoon Ham, Youngki Hong, Shin Ah Oh, Hyunhoon Jung, Wangkyo Jung, Kyunghyun Cho, Donghyun Kwak, Hyungsuk Noh, Woomyoung Park

Abstract: Task-oriented dialogue (TOD) systems are mainly based on the slot-filling-based TOD (SF-TOD) framework, in which dialogues are broken down into smaller, controllable units (i.e., slots) to fulfill a specific task. A series of approaches based on this framework achieved remarkable success on various TOD benchmarks. However, we argue that the current TOD benchmarks are limited to surrogate real-worl… ▽ More Task-oriented dialogue (TOD) systems are mainly based on the slot-filling-based TOD (SF-TOD) framework, in which dialogues are broken down into smaller, controllable units (i.e., slots) to fulfill a specific task. A series of approaches based on this framework achieved remarkable success on various TOD benchmarks. However, we argue that the current TOD benchmarks are limited to surrogate real-world scenarios and that the current TOD models are still a long way to cover the scenarios. In this position paper, we first identify current status and limitations of SF-TOD systems. After that, we explore the WebTOD framework, the alternative direction for building a scalable TOD system when a web/mobile interface is available. In WebTOD, the dialogue system learns how to understand the web/mobile interface that the human agent interacts with, powered by a large-scale language model. △ Less

Submitted 24 May, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

arXiv:2211.15034 [pdf, other]

Quantile Constrained Reinforcement Learning: A Reinforcement Learning Framework Constraining Outage Probability

Authors: Whiyoung Jung, Myungsik Cho, Jongeui Park, Youngchul Sung

Abstract: Constrained reinforcement learning (RL) is an area of RL whose objective is to find an optimal policy that maximizes expected cumulative return while satisfying a given constraint. Most of the previous constrained RL works consider expected cumulative sum cost as the constraint. However, optimization with this constraint cannot guarantee a target probability of outage event that the cumulative sum… ▽ More Constrained reinforcement learning (RL) is an area of RL whose objective is to find an optimal policy that maximizes expected cumulative return while satisfying a given constraint. Most of the previous constrained RL works consider expected cumulative sum cost as the constraint. However, optimization with this constraint cannot guarantee a target probability of outage event that the cumulative sum cost exceeds a given threshold. This paper proposes a framework, named Quantile Constrained RL (QCRL), to constrain the quantile of the distribution of the cumulative sum cost that is a necessary and sufficient condition to satisfy the outage constraint. This is the first work that tackles the issue of applying the policy gradient theorem to the quantile and provides theoretical results for approximating the gradient of the quantile. Based on the derived theoretical results and the technique of the Lagrange multiplier, we construct a constrained RL algorithm named Quantile Constrained Policy Optimization (QCPO). We use distributional RL with the Large Deviation Principle (LDP) to estimate quantiles and tail probability of the cumulative sum cost for the implementation of QCPO. The implemented algorithm satisfies the outage probability constraint after the training period. △ Less

Submitted 27 November, 2022; originally announced November 2022.

Comments: NeurIPS 2022

arXiv:2211.04610 [pdf, other]

doi 10.1109/ICASSP49357.2023.10096374

PhaseAug: A Differentiable Augmentation for Speech Synthesis to Simulate One-to-Many Mapping

Authors: Junhyeok Lee, Seungu Han, Hyunjae Cho, Wonbin Jung

Abstract: Previous generative adversarial network (GAN)-based neural vocoders are trained to reconstruct the exact ground truth waveform from the paired mel-spectrogram and do not consider the one-to-many relationship of speech synthesis. This conventional training causes overfitting for both the discriminators and the generator, leading to the periodicity artifacts in the generated audio signal. In this wo… ▽ More Previous generative adversarial network (GAN)-based neural vocoders are trained to reconstruct the exact ground truth waveform from the paired mel-spectrogram and do not consider the one-to-many relationship of speech synthesis. This conventional training causes overfitting for both the discriminators and the generator, leading to the periodicity artifacts in the generated audio signal. In this work, we present PhaseAug, the first differentiable augmentation for speech synthesis that rotates the phase of each frequency bin to simulate one-to-many mapping. With our proposed method, we outperform baselines without any architecture modification. Code and audio samples will be available at https://github.com/mindslab-ai/phaseaug. △ Less

Submitted 13 March, 2023; v1 submitted 8 November, 2022; originally announced November 2022.

Comments: Accepted to ICASSP 2023

arXiv:2208.10733 [pdf, other]

Recursively Feasible Probabilistic Safe Online Learning with Control Barrier Functions

Authors: Fernando Castañeda, Jason J. Choi, Wonsuhk Jung, Bike Zhang, Claire J. Tomlin, Koushil Sreenath

Abstract: Learning-based control schemes have recently shown great efficacy performing complex tasks for a wide variety of applications. However, in order to deploy them in real systems, it is of vital importance to guarantee that the system will remain safe during online training and execution. Among the currently most popular methods to tackle this challenge, Control Barrier Functions (CBFs) serve as math… ▽ More Learning-based control schemes have recently shown great efficacy performing complex tasks for a wide variety of applications. However, in order to deploy them in real systems, it is of vital importance to guarantee that the system will remain safe during online training and execution. Among the currently most popular methods to tackle this challenge, Control Barrier Functions (CBFs) serve as mathematical tools that provide a formal safety-preserving control synthesis procedure for systems with known dynamics. In this paper, we first introduce a model-uncertainty-aware reformulation of CBF-based safety-critical controllers using Gaussian Process (GP) regression to bridge the gap between an approximate mathematical model and the real system. Compared to previous approaches, we study the feasibility of the resulting robust safety-critical controller. This feasibility analysis results in a set of richness conditions that the available information about the system should satisfy to guarantee that a safe control action can be found at all times. We then use these conditions to devise an event-triggered online data collection strategy that ensures the recursive feasibility of the learned safety-critical controller. Our proposed methodology endows the system with the ability to reason at all times about whether the current information at its disposal is enough to ensure safety or if new measurements are required. This, in turn, allows us to provide formal results of forward invariance of a safe set with high probability, even in a priori unexplored regions. Finally, we validate the proposed framework in numerical simulations of an adaptive cruise control system and a kinematic vehicle. △ Less

Submitted 26 September, 2023; v1 submitted 23 August, 2022; originally announced August 2022.

Comments: Journal article. Includes the results of the 2021 CDC paper titled "Pointwise feasibility of gaussian process-based safety-critical control under model uncertainty" and proposes a recursively feasible safe online learning algorithm as new contribution

arXiv:2207.13223 [pdf, other]

XADLiME: eXplainable Alzheimer's Disease Likelihood Map Estimation via Clinically-guided Prototype Learning

Authors: Ahmad Wisnu Mulyadi, Wonsik Jung, Kwanseok Oh, Jee Seok Yoon, Heung-Il Suk

Abstract: Diagnosing Alzheimer's disease (AD) involves a deliberate diagnostic process owing to its innate traits of irreversibility with subtle and gradual progression. These characteristics make AD biomarker identification from structural brain imaging (e.g., structural MRI) scans quite challenging. Furthermore, there is a high possibility of getting entangled with normal aging. We propose a novel deep-le… ▽ More Diagnosing Alzheimer's disease (AD) involves a deliberate diagnostic process owing to its innate traits of irreversibility with subtle and gradual progression. These characteristics make AD biomarker identification from structural brain imaging (e.g., structural MRI) scans quite challenging. Furthermore, there is a high possibility of getting entangled with normal aging. We propose a novel deep-learning approach through eXplainable AD Likelihood Map Estimation (XADLiME) for AD progression modeling over 3D sMRIs using clinically-guided prototype learning. Specifically, we establish a set of topologically-aware prototypes onto the clusters of latent clinical features, uncovering an AD spectrum manifold. We then measure the similarities between latent clinical features and well-established prototypes, estimating a "pseudo" likelihood map. By considering this pseudo map as an enriched reference, we employ an estimating network to estimate the AD likelihood map over a 3D sMRI scan. Additionally, we promote the explainability of such a likelihood map by revealing a comprehensible overview from two perspectives: clinical and morphological. During the inference, this estimated likelihood map served as a substitute over unseen sMRI scans for effectively conducting the downstream task while providing thorough explainable states. △ Less

Submitted 26 July, 2022; originally announced July 2022.

arXiv:2207.07862 [pdf, other]

MAC-DO: An Efficient Output-Stationary GEMM Accelerator for CNNs Using DRAM Technology

Authors: Minki Jeong, Wanyeong Jung

Abstract: DRAM-based in-situ accelerators have shown their potential in addressing the memory wall challenge of the traditional von Neumann architecture. Such accelerators exploit charge sharing or logic circuits for simple logic operations at the DRAM subarray level. However, their throughput is limited due to low array utilization, as only a few row cells in a DRAM array participate in operations while mo… ▽ More DRAM-based in-situ accelerators have shown their potential in addressing the memory wall challenge of the traditional von Neumann architecture. Such accelerators exploit charge sharing or logic circuits for simple logic operations at the DRAM subarray level. However, their throughput is limited due to low array utilization, as only a few row cells in a DRAM array participate in operations while most rows remain deactivated. Moreover, they require many cycles for more complex operations such as a multi-bit multiply-accumulate (MAC) operation, resulting in significant data access and movement and potentially worsening power efficiency. To overcome these limitations, this paper presents MAC-DO, an efficient and low-power DRAM-based in-situ accelerator. Compared to previous DRAM-based in-situ accelerators, a MAC-DO cell, consisting of two 1T1C DRAM cells (two transistors and two capacitors), innately supports a multi-bit MAC operation within a single cycle, ensuring good linearity and compatibility with existing 1T1C DRAM cells and array structures. This achievement is facilitated by a novel analog computation method utilizing charge steering. Additionally, MAC-DO enables concurrent individual MAC operations in each MAC-DO cell without idle cells, significantly improving throughput and energy efficiency. As a result, a MAC-DO array efficiently can accelerate matrix multiplications based on output stationary mapping, supporting the majority of computations performed in deep neural networks (DNNs). Furthermore, a MAC-DO array efficiently reuses three types of data (input, weight and output), minimizing data movement. △ Less

Submitted 7 February, 2024; v1 submitted 16 July, 2022; originally announced July 2022.

Comments: 13 pages, 21 figures

arXiv:2206.12132 [pdf, other]

doi 10.21437/Interspeech.2022-46

SANE-TTS: Stable And Natural End-to-End Multilingual Text-to-Speech

Authors: Hyunjae Cho, Wonbin Jung, Junhyeok Lee, Sang Hoon Woo

Abstract: In this paper, we present SANE-TTS, a stable and natural end-to-end multilingual TTS model. By the difficulty of obtaining multilingual corpus for given speaker, training multilingual TTS model with monolingual corpora is unavoidable. We introduce speaker regularization loss that improves speech naturalness during cross-lingual synthesis as well as domain adversarial training, which is applied in… ▽ More In this paper, we present SANE-TTS, a stable and natural end-to-end multilingual TTS model. By the difficulty of obtaining multilingual corpus for given speaker, training multilingual TTS model with monolingual corpora is unavoidable. We introduce speaker regularization loss that improves speech naturalness during cross-lingual synthesis as well as domain adversarial training, which is applied in other multilingual TTS models. Furthermore, by adding speaker regularization loss, replacing speaker embedding with zero vector in duration predictor stabilizes cross-lingual inference. With this replacement, our model generates speeches with moderate rhythm regardless of source speaker in cross-lingual synthesis. In MOS evaluation, SANE-TTS achieves naturalness score above 3.80 both in cross-lingual and intralingual synthesis, where the ground truth score is 3.99. Also, SANE-TTS maintains speaker similarity close to that of ground truth even in cross-lingual inference. Audio samples are available on our web page. △ Less

Submitted 24 June, 2022; originally announced June 2022.

Comments: Accepted to Interspeech 2022

arXiv:2206.10607 [pdf, other]

MASER: Multi-Agent Reinforcement Learning with Subgoals Generated from Experience Replay Buffer

Authors: Jeewon Jeon, Woojun Kim, Whiyoung Jung, Youngchul Sung

Abstract: In this paper, we consider cooperative multi-agent reinforcement learning (MARL) with sparse reward. To tackle this problem, we propose a novel method named MASER: MARL with subgoals generated from experience replay buffer. Under the widely-used assumption of centralized training with decentralized execution and consistent Q-value decomposition for MARL, MASER automatically generates proper subgoa… ▽ More In this paper, we consider cooperative multi-agent reinforcement learning (MARL) with sparse reward. To tackle this problem, we propose a novel method named MASER: MARL with subgoals generated from experience replay buffer. Under the widely-used assumption of centralized training with decentralized execution and consistent Q-value decomposition for MARL, MASER automatically generates proper subgoals for multiple agents from the experience replay buffer by considering both individual Q-value and total Q-value. Then, MASER designs individual intrinsic reward for each agent based on actionable representation relevant to Q-learning so that the agents reach their subgoals while maximizing the joint action value. Numerical results show that MASER significantly outperforms StarCraft II micromanagement benchmark compared to other state-of-the-art MARL algorithms. △ Less

Submitted 20 June, 2022; originally announced June 2022.

arXiv:2206.09314 [pdf, other]

Robust Imitation Learning against Variations in Environment Dynamics

Authors: Jongseong Chae, Seungyul Han, Whiyoung Jung, Myungsik Cho, Sungho Choi, Youngchul Sung

Abstract: In this paper, we propose a robust imitation learning (IL) framework that improves the robustness of IL when environment dynamics are perturbed. The existing IL framework trained in a single environment can catastrophically fail with perturbations in environment dynamics because it does not capture the situation that underlying environment dynamics can be changed. Our framework effectively deals w… ▽ More In this paper, we propose a robust imitation learning (IL) framework that improves the robustness of IL when environment dynamics are perturbed. The existing IL framework trained in a single environment can catastrophically fail with perturbations in environment dynamics because it does not capture the situation that underlying environment dynamics can be changed. Our framework effectively deals with environments with varying dynamics by imitating multiple experts in sampled environment dynamics to enhance the robustness in general variations in environment dynamics. In order to robustly imitate the multiple sample experts, we minimize the risk with respect to the Jensen-Shannon divergence between the agent's policy and each of the sample experts. Numerical results show that our algorithm significantly improves robustness against dynamics perturbations compared to conventional IL baselines. △ Less

Submitted 18 June, 2022; originally announced June 2022.

Comments: Accepted to ICML 2022

arXiv:2202.03805 [pdf, other]

Quantifying the topic disparity of scientific articles

Authors: Munjung Kim, Jisung Yoon, Woo-Sung Jung, Hyunuk Kim

Abstract: Citation count is a popular index for assessing scientific papers. However, it depends on not only the quality of a paper but also various factors, such as conventionality, team size, and gender. Here, we examine the extent to which the conventionality of a paper is related to its citation percentile in a discipline by using our measure, topic disparity. The topic disparity is the cosine distance… ▽ More Citation count is a popular index for assessing scientific papers. However, it depends on not only the quality of a paper but also various factors, such as conventionality, team size, and gender. Here, we examine the extent to which the conventionality of a paper is related to its citation percentile in a discipline by using our measure, topic disparity. The topic disparity is the cosine distance between a paper and its discipline on a neural embedding space. Using this measure, we show that the topic disparity is negatively associated with the citation percentile in many disciplines, even after controlling team size and the genders of the first and last authors. This result indicates that less conventional research tends to receive fewer citations than conventional research. Our proposed method can be used to complement the raw citation counts and to recommend papers at the periphery of a discipline because of their less conventional topics. △ Less

Submitted 8 February, 2022; v1 submitted 8 February, 2022; originally announced February 2022.

Comments: 5 pages, 2 figures, Submitted to the 2nd International Workshop on Scientific Knowledge: Representation, Discovery, and Assessment

arXiv:2202.01466 [pdf, other]

Quantifying knowledge synchronisation in the 21st century

Authors: Jisung Yoon, Jinseo Park, Jinhyuk Yun, Woo-Sung Jung

Abstract: Humans acquire and accumulate knowledge through language usage and eagerly exchange their knowledge for advancement. Although geographical barriers had previously limited communication, the emergence of information technology has opened new avenues for knowledge exchange. However, it is unclear which communication pathway is dominant in the 21st century. Here, we explore the dominant path of knowl… ▽ More Humans acquire and accumulate knowledge through language usage and eagerly exchange their knowledge for advancement. Although geographical barriers had previously limited communication, the emergence of information technology has opened new avenues for knowledge exchange. However, it is unclear which communication pathway is dominant in the 21st century. Here, we explore the dominant path of knowledge diffusion in the 21st century using Wikipedia, the largest communal dataset. We evaluate the similarity of shared knowledge between population groups, distinguished based on their language usage. When population groups are more engaged with each other, their knowledge structure is more similar, where engagement is indicated by socioeconomic connections, such as cultural, linguistic, and historical features. Moreover, geographical proximity is no longer a critical requirement for knowledge dissemination. Furthermore, we integrate our data into a mechanistic model to better understand the underlying mechanism and suggest that the knowledge "Silk Road" of the 21st century is based online. △ Less

Submitted 3 February, 2022; originally announced February 2022.

Comments: 35 pages, 11 figures

arXiv:2201.06699 [pdf, other]

AESPA: Accuracy Preserving Low-degree Polynomial Activation for Fast Private Inference

Authors: Jaiyoung Park, Michael Jaemin Kim, Wonkyung Jung, Jung Ho Ahn

Abstract: Hybrid private inference (PI) protocol, which synergistically utilizes both multi-party computation (MPC) and homomorphic encryption, is one of the most prominent techniques for PI. However, even the state-of-the-art PI protocols are bottlenecked by the non-linear layers, especially the activation functions. Although a standard non-linear activation function can generate higher model accuracy, it… ▽ More Hybrid private inference (PI) protocol, which synergistically utilizes both multi-party computation (MPC) and homomorphic encryption, is one of the most prominent techniques for PI. However, even the state-of-the-art PI protocols are bottlenecked by the non-linear layers, especially the activation functions. Although a standard non-linear activation function can generate higher model accuracy, it must be processed via a costly garbled-circuit MPC primitive. A polynomial activation can be processed via Beaver's multiplication triples MPC primitive but has been incurring severe accuracy drops so far. In this paper, we propose an accuracy preserving low-degree polynomial activation function (AESPA) that exploits the Hermite expansion of the ReLU and basis-wise normalization. We apply AESPA to popular ML models, such as VGGNet, ResNet, and pre-activation ResNet, to show an inference accuracy comparable to those of the standard models with ReLU activation, achieving superior accuracy over prior low-degree polynomial studies. When applied to the all-RELU baseline on the state-of-the-art Delphi PI protocol, AESPA shows up to 42.1x and 28.3x lower online latency and communication cost. △ Less

Submitted 18 February, 2022; v1 submitted 17 January, 2022; originally announced January 2022.

Comments: 11 pages, 5 figures

arXiv:2112.15479 [pdf, other]

doi 10.1145/3470496.3527415

BTS: An Accelerator for Bootstrappable Fully Homomorphic Encryption

Authors: Sangpyo Kim, Jongmin Kim, Michael Jaemin Kim, Wonkyung Jung, Minsoo Rhu, John Kim, Jung Ho Ahn

Abstract: Homomorphic encryption (HE) enables the secure offloading of computations to the cloud by providing computation on encrypted data (ciphertexts). HE is based on noisy encryption schemes in which noise accumulates as more computations are applied to the data. The limited number of operations applicable to the data prevents practical applications from exploiting HE. Bootstrapping enables an unlimited… ▽ More Homomorphic encryption (HE) enables the secure offloading of computations to the cloud by providing computation on encrypted data (ciphertexts). HE is based on noisy encryption schemes in which noise accumulates as more computations are applied to the data. The limited number of operations applicable to the data prevents practical applications from exploiting HE. Bootstrapping enables an unlimited number of operations or fully HE (FHE) by refreshing the ciphertext. Unfortunately, bootstrapping requires a significant amount of additional computation and memory bandwidth as well. Prior works have proposed hardware accelerators for computation primitives of FHE. However, to the best of our knowledge, this is the first to propose a hardware FHE accelerator that supports bootstrapping as a first-class citizen. In particular, we propose BTS - Bootstrappable, Technologydriven, Secure accelerator architecture for FHE. We identify the challenges of supporting bootstrapping in the accelerator and analyze the off-chip memory bandwidth and computation required. In particular, given the limitations of modern memory technology, we identify the HE parameter sets that are efficient for FHE acceleration. Based on the insights gained from our analysis, we propose BTS, which effectively exploits the parallelism innate in HE operations by arranging a massive number of processing elements in a grid. We present the design and microarchitecture of BTS, including a network-on-chip design that exploits a deterministic communication pattern. BTS shows 5,556x and 1,306x improved execution time on ResNet-20 and logistic regression over a CPU, with a chip area of 373.6mm^2 and up to 163.2W of power. △ Less

Submitted 28 April, 2022; v1 submitted 31 December, 2021; originally announced December 2021.

Comments: 15 pages, 10 figures

arXiv:2108.01625 [pdf, other]

From augmented microscopy to the topological transformer: a new approach in cell image analysis for Alzheimer's research

Authors: Wooseok Jung

Abstract: Cell image analysis is crucial in Alzheimer's research to detect the presence of A$β$ protein inhibiting cell function. Deep learning speeds up the process by making only low-level data sufficient for fruitful inspection. We first found Unet is most suitable in augmented microscopy by comparing performance in multi-class semantics segmentation. We develop the augmented microscopy method to capture… ▽ More Cell image analysis is crucial in Alzheimer's research to detect the presence of A$β$ protein inhibiting cell function. Deep learning speeds up the process by making only low-level data sufficient for fruitful inspection. We first found Unet is most suitable in augmented microscopy by comparing performance in multi-class semantics segmentation. We develop the augmented microscopy method to capture nuclei in a brightfield image and the transformer using Unet model to convert an input image into a sequence of topological information. The performance regarding Intersection-over-Union is consistent concerning the choice of image preprocessing and ground-truth generation. Training model with data of a specific cell type demonstrates transfer learning applies to some extent. The topological transformer aims to extract persistence silhouettes or landscape signatures containing geometric information of a given image of cells. This feature extraction facilitates studying an image as a collection of one-dimensional data, substantially reducing computational costs. Using the transformer, we attempt grouping cell images by their cell type relying solely on topological features. Performances of the transformers followed by SVM, XGBoost, LGBM, and simple convolutional neural network classifiers are inferior to the conventional image classification. However, since this research initiates a new perspective in biomedical research by combining deep learning and topology for image analysis, we speculate follow-up investigation will reinforce our genuine regime. △ Less

Submitted 3 August, 2021; originally announced August 2021.

arXiv:2107.03649 [pdf]

Heavily Augmented Sound Event Detection utilizing Weak Predictions

Authors: Hyeonuk Nam, Byeong-Yun Ko, Gyeong-Tae Lee, Seong-Hu Kim, Won-Ho Jung, Sang-Min Choi, Yong-Hwa Park

Abstract: The performances of Sound Event Detection (SED) systems are greatly limited by the difficulty in generating large strongly labeled dataset. In this work, we used two main approaches to overcome the lack of strongly labeled data. First, we applied heavy data augmentation on input features. Data augmentation methods used include not only conventional methods used in speech/audio domains but also our… ▽ More The performances of Sound Event Detection (SED) systems are greatly limited by the difficulty in generating large strongly labeled dataset. In this work, we used two main approaches to overcome the lack of strongly labeled data. First, we applied heavy data augmentation on input features. Data augmentation methods used include not only conventional methods used in speech/audio domains but also our proposed method named FilterAugment. Second, we propose two methods to utilize weak predictions to enhance weakly supervised SED performance. As a result, we obtained the best PSDS1 of 0.4336 and best PSDS2 of 0.8161 on the DESED real validation dataset. This work is submitted to DCASE 2021 Task4 and is ranked on the 3rd place. Code availa-ble: https://github.com/frednam93/FilterAugSED. △ Less

Submitted 14 September, 2021; v1 submitted 8 July, 2021; originally announced July 2021.

Comments: Won 3rd place on IEEE DCASE 2021 Task 4

arXiv:2106.15166 [pdf, other]

doi 10.1016/j.joi.2022.101294

Disturbance of questionable publishing to academia

Authors: Taekho You, Jinseo Park, June Young Lee, Jinhyuk Yun, Woo-Sung Jung

Abstract: Questionable publications have been accused of "greedy" practices; however, their influence on academia has not been gauged. Here, we probe the impact of questionable publications through a systematic and comprehensive analysis with various participants from academia and compare the results with those of their unaccused counterparts using billions of citation records, including liaisons, i.e., jou… ▽ More Questionable publications have been accused of "greedy" practices; however, their influence on academia has not been gauged. Here, we probe the impact of questionable publications through a systematic and comprehensive analysis with various participants from academia and compare the results with those of their unaccused counterparts using billions of citation records, including liaisons, i.e., journals and publishers, and prosumers, i.e., authors. Questionable publications attribute publisher-level self-citations to their journals while limiting journal-level self-citations; yet, conventional journal-level metrics are unable to detect these publisher-level self-citations. We propose a hybrid journal-publisher metric for detecting self-favouring citations among QJs from publishers. Additionally, we demonstrate that the questionable publications were less disruptive and influential than their counterparts. Our findings indicate an inflated citation impact of suspicious academic publishers. The findings provide a basis for actionable policy-making against questionable publications. △ Less

Submitted 19 April, 2022; v1 submitted 29 June, 2021; originally announced June 2021.

Comments: 16 pages of main text including 4 figures + 42 pages of supplementary information including 38 supplementary figures

Journal ref: Journal of Informetrics, 2022, 16(2), 101294

arXiv:2106.12753 [pdf, other]

DeepAuditor: Distributed Online Intrusion Detection System for IoT devices via Power Side-channel Auditing

Authors: Woosub Jung, Yizhou Feng, Sabbir Ahmed Khan, Chunsheng Xin, Danella Zhao, Gang Zhou

Abstract: As the number of IoT devices has increased rapidly, IoT botnets have exploited the vulnerabilities of IoT devices. However, it is still challenging to detect the initial intrusion on IoT devices prior to massive attacks. Recent studies have utilized power side-channel information to identify this intrusion behavior on IoT devices but still lack accurate models in real-time for ubiquitous botnet de… ▽ More As the number of IoT devices has increased rapidly, IoT botnets have exploited the vulnerabilities of IoT devices. However, it is still challenging to detect the initial intrusion on IoT devices prior to massive attacks. Recent studies have utilized power side-channel information to identify this intrusion behavior on IoT devices but still lack accurate models in real-time for ubiquitous botnet detection. We proposed the first online intrusion detection system called DeepAuditor for IoT devices via power auditing. To develop the real-time system, we proposed a lightweight power auditing device called Power Auditor. We also designed a distributed CNN classifier for online inference in a laboratory setting. In order to protect data leakage and reduce networking redundancy, we then proposed a privacy-preserved inference protocol via Packed Homomorphic Encryption and a sliding window protocol in our system. The classification accuracy and processing time were measured, and the proposed classifier outperformed a baseline classifier, especially against unseen patterns. We also demonstrated that the distributed CNN design is secure against any distributed components. Overall, the measurements were shown to the feasibility of our real-time distributed system for intrusion detection on IoT devices. △ Less

Submitted 9 May, 2022; v1 submitted 23 June, 2021; originally announced June 2021.

Comments: The 21st ACM/IEEE Conference on Information Processing in Sensor Networks (IPSN'22)

ACM Class: C.2.4; I.2.11

arXiv:2104.10812 [pdf]

doi 10.1038/s41562-022-01367-x

The latent structure of global scientific development

Authors: Lili Miao, Dakota Murray, Woo-Sung Jung, Vincent Larivière, Cassidy R. Sugimoto, Yong-Yeol Ahn

Abstract: Science is essential to innovation and economic prosperity. Although studies have shown that national scientific development is affected by geographic, historic, and economic factors, it remains unclear whether there are universal structures and trajectories of national scientific development that can inform forecasting and policymaking. Here, by examining countries' scientific 'exports'-publicati… ▽ More Science is essential to innovation and economic prosperity. Although studies have shown that national scientific development is affected by geographic, historic, and economic factors, it remains unclear whether there are universal structures and trajectories of national scientific development that can inform forecasting and policymaking. Here, by examining countries' scientific 'exports'-publications that are indexed in international databases-we reveal a three-cluster structure in the relatedness network of disciplines that underpin national scientific development and the organization of global science. Tracing the evolution of national research portfolios reveals that while nations are proceeding to more diverse research profiles individually, scientific production is increasingly specialized in global science over the past decades. By uncovering the underlying structure of scientific development and connecting it with economic development, our results may offer a new perspective on the evolution of global science. △ Less

Submitted 30 March, 2022; v1 submitted 21 April, 2021; originally announced April 2021.

Comments: 30 pages(main text), 5 figures(main text), 3 tables(main text)

arXiv:2104.06697 [pdf, other]

Revisiting Hierarchical Approach for Persistent Long-Term Video Prediction

Authors: Wonkwang Lee, Whie Jung, Han Zhang, Ting Chen, Jing Yu Koh, Thomas Huang, Hyungsuk Yoon, Honglak Lee, Seunghoon Hong

Abstract: Learning to predict the long-term future of video frames is notoriously challenging due to inherent ambiguities in the distant future and dramatic amplifications of prediction error through time. Despite the recent advances in the literature, existing approaches are limited to moderately short-term prediction (less than a few seconds), while extrapolating it to a longer future quickly leads to des… ▽ More Learning to predict the long-term future of video frames is notoriously challenging due to inherent ambiguities in the distant future and dramatic amplifications of prediction error through time. Despite the recent advances in the literature, existing approaches are limited to moderately short-term prediction (less than a few seconds), while extrapolating it to a longer future quickly leads to destruction in structure and content. In this work, we revisit hierarchical models in video prediction. Our method predicts future frames by first estimating a sequence of semantic structures and subsequently translating the structures to pixels by video-to-video translation. Despite the simplicity, we show that modeling structures and their dynamics in the discrete semantic structure space with a stochastic recurrent estimator leads to surprisingly successful long-term prediction. We evaluate our method on three challenging datasets involving car driving and human dancing, and demonstrate that it can generate complicated scene structures and motions over a very long time horizon (i.e., thousands frames), setting a new standard of video prediction with orders of magnitude longer prediction time than existing approaches. Full videos and codes are available at https://1konny.github.io/HVP/. △ Less

Submitted 14 April, 2021; originally announced April 2021.

Comments: Accepted as a conference paper at ICLR 2021

arXiv:2104.04952 [pdf, other]

Fine-Grained Attention for Weakly Supervised Object Localization

Authors: Junghyo Sohn, Eunjin Jeon, Wonsik Jung, Eunsong Kang, Heung-Il Suk

Abstract: Although recent advances in deep learning accelerated an improvement in a weakly supervised object localization (WSOL) task, there are still challenges to identify the entire body of an object, rather than only discriminative parts. In this paper, we propose a novel residual fine-grained attention (RFGA) module that autonomously excites the less activated regions of an object by utilizing informat… ▽ More Although recent advances in deep learning accelerated an improvement in a weakly supervised object localization (WSOL) task, there are still challenges to identify the entire body of an object, rather than only discriminative parts. In this paper, we propose a novel residual fine-grained attention (RFGA) module that autonomously excites the less activated regions of an object by utilizing information distributed over channels and locations within feature maps in combination with a residual operation. To be specific, we devise a series of mechanisms of triple-view attention representation, attention expansion, and feature calibration. Unlike other attention-based WSOL methods that learn a coarse attention map, having the same values across elements in feature maps, our proposed RFGA learns fine-grained values in an attention map by assigning different attention values for each of the elements. We validated the superiority of our proposed RFGA module by comparing it with the recent methods in the literature over three datasets. Further, we analyzed the effect of each mechanism in our RFGA and visualized attention maps to get insights. △ Less

Submitted 11 April, 2021; originally announced April 2021.

Comments: 16 pages, 11 figures

arXiv:2102.02463 [pdf]

DIFFnet: Diffusion parameter mapping network generalized for input diffusion gradient schemes and bvalues

Authors: Juhung Park, Woojin Jung, Eun-Jung Choi, Se-Hong Oh, Dongmyung Shin, Hongjun An, Jongho Lee

Abstract: In MRI, deep neural networks have been proposed to reconstruct diffusion model parameters. However, the inputs of the networks were designed for a specific diffusion gradient scheme (i.e., diffusion gradient directions and numbers) and a specific b-value that are the same as the training data. In this study, a new deep neural network, referred to as DIFFnet, is developed to function as a generaliz… ▽ More In MRI, deep neural networks have been proposed to reconstruct diffusion model parameters. However, the inputs of the networks were designed for a specific diffusion gradient scheme (i.e., diffusion gradient directions and numbers) and a specific b-value that are the same as the training data. In this study, a new deep neural network, referred to as DIFFnet, is developed to function as a generalized reconstruction tool of the diffusion-weighted signals for various gradient schemes and b-values. For generalization, diffusion signals are normalized in a q-space and then projected and quantized, producing a matrix (Qmatrix) as an input for the network. To demonstrate the validity of this approach, DIFFnet is evaluated for diffusion tensor imaging (DIFFnetDTI) and for neurite orientation dispersion and density imaging (DIFFnetNODDI). In each model, two datasets with different gradient schemes and b-values are tested. The results demonstrate accurate reconstruction of the diffusion parameters at substantially reduced processing time (approximately 8.7 times and 2240 times faster processing time than conventional methods in DTI and NODDI, respectively; less than 4% mean normalized root-mean-square errors (NRMSE) in DTI and less than 8% in NODDI). The generalization capability of the networks was further validated using reduced numbers of diffusion signals from the datasets. Different from previously proposed deep neural networks, DIFFnet does not require any specific gradient scheme and b-value for its input. As a result, it can be adopted as an online reconstruction tool for various complex diffusion imaging. △ Less

Submitted 4 February, 2021; originally announced February 2021.

arXiv:2101.09356 [pdf]

Dynamical prediction of two meteorological factors using the deep neural network and the long short term memory $(1)$

Authors: Ki Hong Shin, Jae Won Jung, Sung Kyu Seo, Cheol Hwan You, Dong In Lee, Jisun Lee, Ki Ho Chang, Woon Seon Jung, Kyungsik Kim

Abstract: It is important to calculate and analyze temperature and humidity prediction accuracies among quantitative meteorological forecasting. This study manipulates the extant neural network methods to foster the predictive accuracy. To achieve such tasks, we analyze and explore the predictive accuracy and performance in the neural networks using two combined meteorological factors (temperature and humid… ▽ More It is important to calculate and analyze temperature and humidity prediction accuracies among quantitative meteorological forecasting. This study manipulates the extant neural network methods to foster the predictive accuracy. To achieve such tasks, we analyze and explore the predictive accuracy and performance in the neural networks using two combined meteorological factors (temperature and humidity). Simulated studies are performed by applying the artificial neural network (ANN), deep neural network (DNN), extreme learning machine (ELM), long short-term memory (LSTM), and long short-term memory with peephole connections (LSTM-PC) machine learning methods, and the accurate prediction value are compared to that obtained from each other methods. Data are extracted from low frequency time-series of ten metropolitan cities of South Korea from March 2014 to February 2020 to validate our observations. To test the robustness of methods, the error of LSTM is found to outperform that of the other four methods in predictive accuracy. Particularly, as testing results, the temperature prediction of LSTM in summer in Tongyeong has a root mean squared error (RMSE) value of 0.866 lower than that of other neural network methods, while the mean absolute percentage error (MAPE) value of LSTM for humidity prediction is 5.525 in summer in Mokpo, significantly better than other metropolitan cities. △ Less

Submitted 16 January, 2021; originally announced January 2021.

Comments: 33 Pages, 9 figures

arXiv:2012.02785 [pdf, other]

doi 10.1073/pnas.2305414120

Unsupervised embedding of trajectories captures the latent structure of scientific migration

Authors: Dakota Murray, Jisung Yoon, Sadamori Kojaku, Rodrigo Costas, Woo-Sung Jung, Staša Milojević, Yong-Yeol Ahn

Abstract: Human migration and mobility drives major societal phenomena including epidemics, economies, innovation, and the diffusion of ideas. Although human mobility and migration have been heavily constrained by geographic distance throughout the history, advances and globalization are making other factors such as language and culture increasingly more important. Advances in neural embedding models, origi… ▽ More Human migration and mobility drives major societal phenomena including epidemics, economies, innovation, and the diffusion of ideas. Although human mobility and migration have been heavily constrained by geographic distance throughout the history, advances and globalization are making other factors such as language and culture increasingly more important. Advances in neural embedding models, originally designed for natural language, provide an opportunity to tame this complexity and open new avenues for the study of migration. Here, we demonstrate the ability of the model word2vec to encode nuanced relationships between discrete locations from migration trajectories, producing an accurate, dense, continuous, and meaningful vector-space representation. The resulting representation provides a functional distance between locations, as well as a digital double that can be distributed, re-used, and itself interrogated to understand the many dimensions of migration. We show that the unique power of word2vec to encode migration patterns stems from its mathematical equivalence with the gravity model of mobility. Focusing on the case of scientific migration, we apply word2vec to a database of three million migration trajectories of scientists derived from the affiliations listed on their publication records. Using techniques that leverage its semantic structure, we demonstrate that embeddings can learn the rich structure that underpins scientific migration, such as cultural, linguistic, and prestige relationships at multiple levels of granularity. Our results provide a theoretical foundation and methodological framework for using neural embeddings to represent and understand migration both within and beyond science. △ Less

Submitted 17 November, 2023; v1 submitted 4 December, 2020; originally announced December 2020.

Comments: 44 pages (main text), 102 pages total. 5 figures (main text), 32 figures in supporting information

arXiv:2012.01968 [pdf, other]

doi 10.1109/IISWC50251.2020.00033

Accelerating Number Theoretic Transformations for Bootstrappable Homomorphic Encryption on GPUs

Authors: Sangpyo Kim, Wonkyung Jung, Jaiyoung Park, Jung Ho Ahn

Abstract: Homomorphic encryption (HE) draws huge attention as it provides a way of privacy-preserving computations on encrypted messages. Number Theoretic Transform (NTT), a specialized form of Discrete Fourier Transform (DFT) in the finite field of integers, is the key algorithm that enables fast computation on encrypted ciphertexts in HE. Prior works have accelerated NTT and its inverse transformation on… ▽ More Homomorphic encryption (HE) draws huge attention as it provides a way of privacy-preserving computations on encrypted messages. Number Theoretic Transform (NTT), a specialized form of Discrete Fourier Transform (DFT) in the finite field of integers, is the key algorithm that enables fast computation on encrypted ciphertexts in HE. Prior works have accelerated NTT and its inverse transformation on a popular parallel processing platform, GPU, by leveraging DFT optimization techniques. However, these GPU-based studies lack a comprehensive analysis of the primary differences between NTT and DFT or only consider small HE parameters that have tight constraints in the number of arithmetic operations that can be performed without decryption. In this paper, we analyze the algorithmic characteristics of NTT and DFT and assess the performance of NTT when we apply the optimizations that are commonly applicable to both DFT and NTT on modern GPUs. From the analysis, we identify that NTT suffers from severe main-memory bandwidth bottleneck on large HE parameter sets. To tackle the main-memory bandwidth issue, we propose a novel NTT-specific on-the-fly root generation scheme dubbed on-the-fly twiddling (OT). Compared to the baseline radix-2 NTT implementation, after applying all the optimizations, including OT, we achieve 4.2x speedup on a modern GPU. △ Less

Submitted 3 December, 2020; originally announced December 2020.

Comments: 12 pages, 13 figures, to appear in IISWC 2020

arXiv:2011.11851 [pdf, other]

Dual Supervision Framework for Relation Extraction with Distant Supervision and Human Annotation

Authors: Woohwan Jung, Kyuseok Shim

Abstract: Relation extraction (RE) has been extensively studied due to its importance in real-world applications such as knowledge base construction and question answering. Most of the existing works train the models on either distantly supervised data or human-annotated data. To take advantage of the high accuracy of human annotation and the cheap cost of distant supervision, we propose the dual supervisio… ▽ More Relation extraction (RE) has been extensively studied due to its importance in real-world applications such as knowledge base construction and question answering. Most of the existing works train the models on either distantly supervised data or human-annotated data. To take advantage of the high accuracy of human annotation and the cheap cost of distant supervision, we propose the dual supervision framework which effectively utilizes both types of data. However, simply combining the two types of data to train a RE model may decrease the prediction accuracy since distant supervision has labeling bias. We employ two separate prediction networks HA-Net and DS-Net to predict the labels by human annotation and distant supervision, respectively, to prevent the degradation of accuracy by the incorrect labeling of distant supervision. Furthermore, we propose an additional loss term called disagreement penalty to enable HA-Net to learn from distantly supervised labels. In addition, we exploit additional networks to adaptively assess the labeling bias by considering contextual information. Our performance study on sentence-level and document-level REs confirms the effectiveness of the dual supervision framework. △ Less

Submitted 23 November, 2020; originally announced November 2020.

Comments: Accepted to COLING 2020

arXiv:2007.10588 [pdf, other]

CyCNN: A Rotation Invariant CNN using Polar Mapping and Cylindrical Convolution Layers

Authors: Jinpyo Kim, Wooekun Jung, Hyungmo Kim, Jaejin Lee

Abstract: Deep Convolutional Neural Networks (CNNs) are empirically known to be invariant to moderate translation but not to rotation in image classification. This paper proposes a deep CNN model, called CyCNN, which exploits polar mapping of input images to convert rotation to translation. To deal with the cylindrical property of the polar coordinates, we replace convolution layers in conventional CNNs to… ▽ More Deep Convolutional Neural Networks (CNNs) are empirically known to be invariant to moderate translation but not to rotation in image classification. This paper proposes a deep CNN model, called CyCNN, which exploits polar mapping of input images to convert rotation to translation. To deal with the cylindrical property of the polar coordinates, we replace convolution layers in conventional CNNs to cylindrical convolutional (CyConv) layers. A CyConv layer exploits the cylindrically sliding windows (CSW) mechanism that vertically extends the input-image receptive fields of boundary units in a convolutional layer. We evaluate CyCNN and conventional CNN models for classification tasks on rotated MNIST, CIFAR-10, and SVHN datasets. We show that if there is no data augmentation during training, CyCNN significantly improves classification accuracies when compared to conventional CNN models. Our implementation of CyCNN is publicly available on https://github.com/mcrl/CyCNN. △ Less

Submitted 21 July, 2020; originally announced July 2020.

arXiv:2006.04941 [pdf, other]

Persona2vec: A Flexible Multi-role Representations Learning Framework for Graphs

Authors: Jisung Yoon, Kai-Cheng Yang, Woo-Sung Jung, Yong-Yeol Ahn

Abstract: Graph embedding techniques, which learn low-dimensional representations of a graph, are achieving state-of-the-art performance in many graph mining tasks. Most existing embedding algorithms assign a single vector to each node, implicitly assuming that a single representation is enough to capture all characteristics of the node. However, across many domains, it is common to observe pervasively over… ▽ More Graph embedding techniques, which learn low-dimensional representations of a graph, are achieving state-of-the-art performance in many graph mining tasks. Most existing embedding algorithms assign a single vector to each node, implicitly assuming that a single representation is enough to capture all characteristics of the node. However, across many domains, it is common to observe pervasively overlapping community structure, where most nodes belong to multiple communities, playing different roles depending on the contexts. Here, we propose persona2vec, a graph embedding framework that efficiently learns multiple representations of nodes based on their structural contexts. Using link prediction-based evaluation, we show that our framework is significantly faster than the existing state-of-the-art model while achieving better performance. △ Less

Submitted 21 October, 2020; v1 submitted 4 June, 2020; originally announced June 2020.

Comments: 9 pages, 7 figures

arXiv:2006.02732 [pdf, other]

A Maximum Mutual Information Framework for Multi-Agent Reinforcement Learning

Authors: Woojun Kim, Whiyoung Jung, Myungsik Cho, Youngchul Sung

Abstract: In this paper, we propose a maximum mutual information (MMI) framework for multi-agent reinforcement learning (MARL) to enable multiple agents to learn coordinated behaviors by regularizing the accumulated return with the mutual information between actions. By introducing a latent variable to induce nonzero mutual information between actions and applying a variational bound, we derive a tractable… ▽ More In this paper, we propose a maximum mutual information (MMI) framework for multi-agent reinforcement learning (MARL) to enable multiple agents to learn coordinated behaviors by regularizing the accumulated return with the mutual information between actions. By introducing a latent variable to induce nonzero mutual information between actions and applying a variational bound, we derive a tractable lower bound on the considered MMI-regularized objective function. Applying policy iteration to maximize the derived lower bound, we propose a practical algorithm named variational maximum mutual information multi-agent actor-critic (VM3-AC), which follows centralized learning with decentralized execution (CTDE). We evaluated VM3-AC for several games requiring coordination, and numerical results show that VM3-AC outperforms MADDPG and other MARL algorithms in multi-agent tasks requiring coordination. △ Less

Submitted 4 June, 2020; originally announced June 2020.

arXiv:2005.02643 [pdf, other]

Deep Recurrent Model for Individualized Prediction of Alzheimer's Disease Progression

Authors: Wonsik Jung, Eunji Jun, Heung-Il Suk

Abstract: Alzheimer's disease (AD) is known as one of the major causes of dementia and is characterized by slow progression over several years, with no treatments or available medicines. In this regard, there have been efforts to identify the risk of developing AD in its earliest time. While many of the previous works considered cross-sectional analysis, more recent studies have focused on the diagnosis and… ▽ More Alzheimer's disease (AD) is known as one of the major causes of dementia and is characterized by slow progression over several years, with no treatments or available medicines. In this regard, there have been efforts to identify the risk of developing AD in its earliest time. While many of the previous works considered cross-sectional analysis, more recent studies have focused on the diagnosis and prognosis of AD with longitudinal or time series data in a way of disease progression modeling (DPM). Under the same problem settings, in this work, we propose a novel computational framework that can predict the phenotypic measurements of MRI biomarkers and trajectories of clinical status along with cognitive scores at multiple future time points. However, in handling time series data, it generally faces with many unexpected missing observations. In regard to such an unfavorable situation, we define a secondary problem of estimating those missing values and tackle it in a systematic way by taking account of temporal and multivariate relations inherent in time series data. Concretely, we propose a deep recurrent network that jointly tackles the four problems of (i) missing value imputation, (ii) phenotypic measurements forecasting, (iii) trajectory estimation of the cognitive score, and (iv) clinical status prediction of a subject based on his/her longitudinal imaging biomarkers. Notably, the learnable model parameters of our network are trained in an end-to-end manner with our circumspectly defined loss function. In our experiments over TADPOLE challenge cohort, we measured performance for various metrics and compared our method to competing methods in the literature. Exhaustive analyses and ablation studies were also conducted to better confirm the effectiveness of our method. △ Less

Submitted 27 August, 2020; v1 submitted 6 May, 2020; originally announced May 2020.

Comments: 17 pages, 12 figures

arXiv:2003.04510 [pdf, other]

doi 10.1109/ACCESS.2021.3096189

HEAAN Demystified: Accelerating Fully Homomorphic Encryption Through Architecture-centric Analysis and Optimization

Authors: Wonkyung Jung, Eojin Lee, Sangpyo Kim, Keewoo Lee, Namhoon Kim, Chohong Min, Jung Hee Cheon, Jung Ho Ahn

Abstract: Homomorphic Encryption (HE) draws a significant attention as a privacy-preserving way for cloud computing because it allows computation on encrypted messages called ciphertexts. Among numerous HE schemes proposed, HE for Arithmetic of Approximate Numbers (HEAAN) is rapidly gaining popularity across a wide range of applications because it supports messages that can tolerate approximate computation… ▽ More Homomorphic Encryption (HE) draws a significant attention as a privacy-preserving way for cloud computing because it allows computation on encrypted messages called ciphertexts. Among numerous HE schemes proposed, HE for Arithmetic of Approximate Numbers (HEAAN) is rapidly gaining popularity across a wide range of applications because it supports messages that can tolerate approximate computation with no limit on the number of arithmetic operations applicable to the corresponding ciphertexts. A critical shortcoming of HE is the high computation complexity of ciphertext arithmetic; especially, HE multiplication (HE Mul) is more than 10,000 times slower than the corresponding multiplication between unencrypted messages. This leads to a large body of HE acceleration studies, including ones exploiting FPGAs; however, those did not conduct a rigorous analysis of computational complexity and data access patterns of HE Mul. Moreover, the proposals mostly focused on designs with small parameter sizes, making it difficult to accurately estimate their performance in conducting a series of complex arithmetic operations. In this paper, we first describe how HE Mul of HEAAN is performed in a manner friendly to computer architects. Then we conduct a disciplined analysis on its computational and memory access characteristics, through which we (1) extract parallelism in the key functions composing HE Mul and (2) demonstrate how to effectively map the parallelism to the popular parallel processing platforms, multicore CPUs and GPUs, by applying a series of optimization techniques such as transposing matrices and pinning data to threads. This leads to the performance improvement of HE Mul on a CPU and a GPU by 42.9x and 134.1x, respectively, over the single-thread reference HEAAN running on a CPU. The conducted analysis and optimization would set a new foundation for future HE acceleration research. △ Less

Submitted 9 March, 2020; originally announced March 2020.

Journal ref: IEEE Access 2021

arXiv:2001.02907 [pdf, other]

Population-Guided Parallel Policy Search for Reinforcement Learning

Authors: Whiyoung Jung, Giseung Park, Youngchul Sung

Abstract: In this paper, a new population-guided parallel learning scheme is proposed to enhance the performance of off-policy reinforcement learning (RL). In the proposed scheme, multiple identical learners with their own value-functions and policies share a common experience replay buffer, and search a good policy in collaboration with the guidance of the best policy information. The key point is that the… ▽ More In this paper, a new population-guided parallel learning scheme is proposed to enhance the performance of off-policy reinforcement learning (RL). In the proposed scheme, multiple identical learners with their own value-functions and policies share a common experience replay buffer, and search a good policy in collaboration with the guidance of the best policy information. The key point is that the information of the best policy is fused in a soft manner by constructing an augmented loss function for policy update to enlarge the overall search region by the multiple learners. The guidance by the previous best policy and the enlarged range enable faster and better policy search. Monotone improvement of the expected cumulative return by the proposed scheme is proved theoretically. Working algorithms are constructed by applying the proposed scheme to the twin delayed deep deterministic (TD3) policy gradient algorithm. Numerical results show that the constructed algorithm outperforms most of the current state-of-the-art RL algorithms, and the gain is significant in the case of sparse reward environment. △ Less

Submitted 9 January, 2020; originally announced January 2020.

Comments: Accepted to ICLR 2020

Showing 1–50 of 65 results for author: Jung, W