Showing 1–50 of 171 results for author: Hwang, S J

Searching in archive cs.
  1. arXiv:2406.17808  [pdf, other]

    cs.CL cs.AI cs.LG

    Training-Free Exponential Extension of Sliding Window Context with Cascading KV Cache

    Authors: Jeffrey Willette, Heejun Lee, Youngwan Lee, Myeongjae Jeon, Sung Ju Hwang

    Abstract: The context window within a transformer provides a form of active memory for the current task, which can be useful for few-shot learning and conditional generation, both which depend heavily on previous context tokens. However, as the context length grows, the computational cost increases quadratically. Recent works have shown that saving a few initial tokens along with a fixed-sized sliding windo…

    Submitted 23 June, 2024; originally announced June 2024.

  2. arXiv:2406.16013  [pdf, other]

    cs.CL cs.AI cs.IR

    Database-Augmented Query Representation for Information Retrieval

    Authors: Soyeong Jeong, Jinheon Baek, Sukmin Cho, Sung Ju Hwang, Jong C. Park

    Abstract: Information retrieval models that aim to search for the documents relevant to the given query have shown many successes, which have been applied to diverse tasks. However, the query provided by the user is oftentimes very short, which challenges the retrievers to correctly fetch relevant documents. To tackle this, existing studies have proposed expanding the query with a couple of additional (user…

    Submitted 23 June, 2024; originally announced June 2024.

  3. arXiv:2406.10995  [pdf, other]

    cs.CV cs.LG

    Concept-skill Transferability-based Data Selection for Large Vision-Language Models

    Authors: Jaewoo Lee, Boyang Li, Sung Ju Hwang

    Abstract: Instruction tuning, or supervised finetuning on extensive task-specific data, is necessary for Large Vision-Language Models (LVLMs) to generalize well across a broad range of vision-language (VL) tasks. However, training on large VL datasets can become prohibitively expensive. In this work, we introduce COINCIDE, an effective and scalable data selection technique that uses a small model as a refer…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: Preprint

  4. arXiv:2406.09827  [pdf, other]

    cs.CL cs.CV cs.DC cs.LG

    HiP Attention: Sparse Sub-Quadratic Attention with Hierarchical Attention Pruning

    Authors: Heejun Lee, Geon Park, Youngwan Lee, Jina Kim, Wonyoung Jeong, Myeongjae Jeon, Sung Ju Hwang

    Abstract: In modern large language models (LLMs), increasing sequence lengths is a crucial challenge for enhancing their comprehension and coherence in handling complex tasks such as multi-modal question answering. However, handling long context sequences with LLMs is prohibitively costly due to the conventional attention mechanism's quadratic time and space complexity, and the context window size is limite…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 26 pages, 15 figures

  5. arXiv:2405.18540  [pdf, other]

    cs.CL cs.CR cs.LG

    Learning diverse attacks on large language models for robust red-teaming and safety tuning

    Authors: Seanie Lee, Minsu Kim, Lynn Cherif, David Dobre, Juho Lee, Sung Ju Hwang, Kenji Kawaguchi, Gauthier Gidel, Yoshua Bengio, Nikolay Malkin, Moksh Jain

    Abstract: Red-teaming, or identifying prompts that elicit harmful responses, is a critical step in ensuring the safe and responsible deployment of large language models (LLMs). Developing effective protection against many modes of attack prompts requires discovering diverse attacks. Automated red-teaming typically uses reinforcement learning to fine-tune an attacker language model to generate prompts that e…

    Submitted 28 May, 2024; originally announced May 2024.

  6. arXiv:2405.18042  [pdf, other]

    cs.CV cs.LG

    Visualizing the loss landscape of Self-supervised Vision Transformer

    Authors: Youngwan Lee, Jeffrey Ryan Willette, Jonghee Kim, Sung Ju Hwang

    Abstract: The Masked autoencoder (MAE) has drawn attention as a representative self-supervised approach for masked image modeling with vision transformers. However, even though MAE shows better generalization capability than fully supervised training from scratch, the reason why has not been explored. In another line of work, the Reconstruction Consistent Masked Auto Encoder (RC-MAE), has been proposed whic…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: NeurIPS 2023 Workshop: Self-Supervised Learning - Theory and Practice

  7. arXiv:2405.17918  [pdf, other]

    cs.LG cs.AI

    Cost-Sensitive Multi-Fidelity Bayesian Optimization with Transfer of Learning Curve Extrapolation

    Authors: Dong Bok Lee, Aoxuan Silvia Zhang, Byungjoo Kim, Junhyeon Park, Juho Lee, Sung Ju Hwang, Hae Beom Lee

    Abstract: In this paper, we address the problem of cost-sensitive multi-fidelity Bayesian Optimization (BO) for efficient hyperparameter optimization (HPO). Specifically, we assume a scenario where users want to early-stop the BO when the performance improvement is not satisfactory with respect to the required computational cost. Motivated by this scenario, we introduce utility, which is a function predefin…

    Submitted 28 May, 2024; originally announced May 2024.

  8. arXiv:2405.16567  [pdf, other]

    cs.AI cs.CR

    Automatic Jailbreaking of the Text-to-Image Generative AI Systems

    Authors: Minseon Kim, Hyomin Lee, Boqing Gong, Huishuai Zhang, Sung Ju Hwang

    Abstract: Recent AI systems have shown extremely powerful performance, even surpassing human performance, on various tasks such as information retrieval, language generation, and image generation based on large language models (LLMs). At the same time, there are diverse safety risks that can cause the generation of malicious contents by circumventing the alignment in LLMs, which are often referred to as jai…

    Submitted 28 May, 2024; v1 submitted 26 May, 2024; originally announced May 2024.

    Comments: Under review

  9. arXiv:2405.11162  [pdf, other]

    cs.CL

    LG AI Research & KAIST at EHRSQL 2024: Self-Training Large Language Models with Pseudo-Labeled Unanswerable Questions for a Reliable Text-to-SQL System on EHRs

    Authors: Yongrae Jo, Seongyun Lee, Minju Seo, Sung Ju Hwang, Moontae Lee

    Abstract: Text-to-SQL models are pivotal for making Electronic Health Records (EHRs) accessible to healthcare professionals without SQL knowledge. With the advancements in large language models, these systems have become more adept at translating complex questions into SQL queries. Nonetheless, the critical need for reliability in healthcare necessitates these models to accurately identify unanswerable ques…

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: NAACL 2024 Clinical NLP Workshop

  10. arXiv:2404.07738  [pdf, other]

    cs.CL cs.AI cs.LG

    ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models

    Authors: Jinheon Baek, Sujay Kumar Jauhar, Silviu Cucerzan, Sung Ju Hwang

    Abstract: Scientific Research, vital for improving human life, is hindered by its inherent complexity, slow pace, and the need for specialized experts. To enhance its productivity, we propose a ResearchAgent, a large language model-powered research idea writing agent, which automatically generates problems, methods, and experiment designs while iteratively refining them based on scientific literature. Speci…

    Submitted 11 April, 2024; originally announced April 2024.

  11. arXiv:2404.04243  [pdf, other]

    cs.CV cs.AI

    Identity Decoupling for Multi-Subject Personalization of Text-to-Image Models

    Authors: Sangwon Jang, Jaehyeong Jo, Kimin Lee, Sung Ju Hwang

    Abstract: Text-to-image diffusion models have shown remarkable success in generating personalized subjects based on a few reference images. However, current methods often fail when generating multiple subjects simultaneously, resulting in mixed identities with combined attributes from different subjects. In this work, we present MuDI, a novel framework that enables multi-subject personalization by effective…

    Submitted 28 May, 2024; v1 submitted 5 April, 2024; originally announced April 2024.

    Comments: Preprint. Project page: https://mudi-t2i.github.io/

  12. arXiv:2404.00921  [pdf, other]

    cs.CV

    Towards Label-Efficient Human Matting: A Simple Baseline for Weakly Semi-Supervised Trimap-Free Human Matting

    Authors: Beomyoung Kim, Myeong Yeon Yi, Joonsang Yu, Young Joon Yoo, Sung Ju Hwang

    Abstract: This paper presents a new practical training method for human matting, which demands delicate pixel-level human region identification and significantly laborious annotations. To reduce the annotation cost, most existing matting approaches often rely on image synthesis to augment the dataset. However, the unnaturalness of synthesized training images brings in a new domain generalization challenge f…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: Preprint, 15 pages, 13 figures

  13. arXiv:2404.00918  [pdf, other]

    cs.CV

    Rethinking Saliency-Guided Weakly-Supervised Semantic Segmentation

    Authors: Beomyoung Kim, Donghyun Kim, Sung Ju Hwang

    Abstract: This paper presents a fresh perspective on the role of saliency maps in weakly-supervised semantic segmentation (WSSS) and offers new insights and research directions based on our empirical findings. We conduct comprehensive experiments and observe that the quality of the saliency map is a critical factor in saliency-guided WSSS approaches. Nonetheless, we find that the saliency maps used in previ…

    Submitted 2 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

    Comments: Preprint, 17 pages, 7 figures

  14. arXiv:2403.20126  [pdf, other]

    cs.CV

    ECLIPSE: Efficient Continual Learning in Panoptic Segmentation with Visual Prompt Tuning

    Authors: Beomyoung Kim, Joonsang Yu, Sung Ju Hwang

    Abstract: Panoptic segmentation, combining semantic and instance segmentation, stands as a cutting-edge computer vision task. Despite recent progress with deep learning models, the dynamic nature of real-world applications necessitates continual learning, where models adapt to new classes (plasticity) over time without forgetting old ones (catastrophic forgetting). Current continual segmentation methods oft…

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: CVPR 2024

  15. arXiv:2403.15456  [pdf, other]

    cs.AI cs.CL

    WoLF: Wide-scope Large Language Model Framework for CXR Understanding

    Authors: Seil Kang, Donghyun Kim, Junhyeok Kim, Hyo Kyung Lee, Seong Jae Hwang

    Abstract: Significant methodological strides have been made toward Chest X-ray (CXR) understanding via modern vision-language models (VLMs), demonstrating impressive Visual Question Answering (VQA) and CXR report generation abilities. However, existing CXR understanding frameworks still possess several procedural caveats. (1) Previous methods solely use CXR reports, which are insufficient for comprehensive…

    Submitted 29 March, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

    Comments: 11 pages main paper, 2 pages supplementary

  16. arXiv:2403.14403  [pdf, other]

    cs.CL cs.AI

    Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity

    Authors: Soyeong Jeong, Jinheon Baek, Sukmin Cho, Sung Ju Hwang, Jong C. Park

    Abstract: Retrieval-Augmented Large Language Models (LLMs), which incorporate the non-parametric knowledge from external knowledge bases into LLMs, have emerged as a promising approach to enhancing response accuracy in several tasks, such as Question-Answering (QA). However, even though there are various approaches dealing with queries of different complexities, they either handle simple queries with unnece…

    Submitted 28 March, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

    Comments: NAACL 2024

  17. arXiv:2403.08801  [pdf, other]

    cs.CV

    CoBra: Complementary Branch Fusing Class and Semantic Knowledge for Robust Weakly Supervised Semantic Segmentation

    Authors: Woojung Han, Seil Kang, Kyobin Choo, Seong Jae Hwang

    Abstract: Leveraging semantically precise pseudo masks derived from image-level class knowledge for segmentation, namely image-level Weakly Supervised Semantic Segmentation (WSSS), still remains challenging. While Class Activation Maps (CAMs) using CNNs have steadily been contributing to the success of WSSS, the resulting activation maps often narrowly focus on class-specific parts (e.g., only face of human…

    Submitted 27 May, 2024; v1 submitted 5 February, 2024; originally announced March 2024.

  18. arXiv:2403.06516  [pdf, other]

    cs.CV

    Advancing Text-Driven Chest X-Ray Generation with Policy-Based Reinforcement Learning

    Authors: Woojung Han, Chanyoung Kim, Dayun Ju, Yumin Shim, Seong Jae Hwang

    Abstract: Recent advances in text-conditioned image generation diffusion models have begun paving the way for new opportunities in modern medical domain, in particular, generating Chest X-rays (CXRs) from diagnostic reports. Nonetheless, to further drive the diffusion models to generate CXRs that faithfully reflect the complexity and diversity of real data, it has become evident that a nontrivial learning a…

    Submitted 11 March, 2024; originally announced March 2024.

  19. arXiv:2403.01482  [pdf, other]

    cs.CV

    EAGLE: Eigen Aggregation Learning for Object-Centric Unsupervised Semantic Segmentation

    Authors: Chanyoung Kim, Woojung Han, Dayun Ju, Seong Jae Hwang

    Abstract: Semantic segmentation has innately relied on extensive pixel-level annotated data, leading to the emergence of unsupervised methodologies. Among them, leveraging self-supervised Vision Transformers for unsupervised semantic segmentation (USS) has been making steady progress with expressive deep features. Yet, for semantically segmenting images with complex objects, a predominant challenge remains:…

    Submitted 5 April, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

  20. arXiv:2402.18153  [pdf, other]

    cs.LG cs.AI

    Diffusion-based Neural Network Weights Generation

    Authors: Bedionita Soro, Bruno Andreis, Hayeon Lee, Song Chong, Frank Hutter, Sung Ju Hwang

    Abstract: Transfer learning is a topic of significant interest in recent deep learning research because it enables faster convergence and improved performance on new tasks. While the performance of transfer learning depends on the similarity of the source data to the target data, it is costly to train a model on a large number of datasets. Therefore, pretrained models are generally blindly selected with the…

    Submitted 28 February, 2024; originally announced February 2024.

    Comments: 14 pages

  21. arXiv:2402.13482  [pdf, other]

    cs.CL cs.AI cs.LG

    Retrieval-Augmented Data Augmentation for Low-Resource Domain Tasks

    Authors: Minju Seo, Jinheon Baek, James Thorne, Sung Ju Hwang

    Abstract: Despite large successes of recent language models on diverse tasks, they suffer from severe performance degeneration in low-resource settings with limited training data available. Many existing works tackle this problem by generating synthetic data from the training data and then training models on them, recently using Large Language Models (LLMs). However, in low-resource settings, the amount of…

    Submitted 20 February, 2024; originally announced February 2024.

  22. arXiv:2402.08712  [pdf, other]

    cs.LG cs.CV

    BECoTTA: Input-dependent Online Blending of Experts for Continual Test-time Adaptation

    Authors: Daeun Lee, Jaehong Yoon, Sung Ju Hwang

    Abstract: Continual Test Time Adaptation (CTTA) is required to adapt efficiently to continuous unseen domains while retaining previously learned knowledge. However, despite the progress of CTTA, it is still challenging to deploy the model with improved forgetting-adaptation trade-offs and efficiency. In addition, current CTTA scenarios assume only the disjoint situation, even though real-world domains are s…

    Submitted 31 May, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

    Comments: Accepted by ICML2024, 22 pages, Project page: https://becotta-ctta.github.io/

  23. arXiv:2312.11973  [pdf, other]

    cs.CV cs.AI cs.LG

    Continual Learning: Forget-free Winning Subnetworks for Video Representations

    Authors: Haeyong Kang, Jaehong Yoon, Sung Ju Hwang, Chang D. Yoo

    Abstract: Inspired by the Lottery Ticket Hypothesis (LTH), which highlights the existence of efficient subnetworks within larger, dense networks, a high-performing Winning Subnetwork (WSN) in terms of task performance under appropriate sparsity conditions is considered for various continual learning tasks. It leverages pre-existing weights from dense networks to achieve efficient learning in Task Incrementa…

    Submitted 2 June, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2303.14962, arXiv:2306.11305

  24. arXiv:2312.08958  [pdf, other]

    cs.LG cs.AI cs.RO

    LiFT: Unsupervised Reinforcement Learning with Foundation Models as Teachers

    Authors: Taewook Nam, Juyong Lee, Jesse Zhang, Sung Ju Hwang, Joseph J. Lim, Karl Pertsch

    Abstract: We propose a framework that leverages foundation models as teachers, guiding a reinforcement learning agent to acquire semantically meaningful behavior without human feedback. In our framework, the agent receives task instructions grounded in a training environment from large language models. Then, a vision-language model guides the agent in learning the multi-task language-conditioned policy by p…

    Submitted 14 December, 2023; originally announced December 2023.

    Comments: 2nd Workshop on Agent Learning in Open-Endedness (ALOE) at NeurIPS 2023

  25. arXiv:2312.04005  [pdf, other]

    cs.CV cs.AI

    KOALA: Empirical Lessons Toward Memory-Efficient and Fast Diffusion Models for Text-to-Image Synthesis

    Authors: Youngwan Lee, Kwanyong Park, Yoorhim Cho, Yong-Ju Lee, Sung Ju Hwang

    Abstract: As text-to-image (T2I) synthesis models increase in size, they demand higher inference costs due to the need for more expensive GPUs with larger memory, which makes it challenging to reproduce these models in addition to the restricted access to training datasets. Our study aims to reduce these inference costs and explores how far the generative capabilities of T2I models can be extended using onl…

    Submitted 28 May, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

    Comments: Project page: https://youngwanlee.github.io/KOALA/

  26. arXiv:2311.08106  [pdf, other]

    cs.CL

    Carpe Diem: On the Evaluation of World Knowledge in Lifelong Language Models

    Authors: Yujin Kim, Jaehong Yoon, Seonghyeon Ye, Sangmin Bae, Namgyu Ho, Sung Ju Hwang, Se-young Yun

    Abstract: The dynamic nature of knowledge in an ever-changing world presents challenges for language models trained on static data; the model in the real world often requires not only acquiring new knowledge but also overwriting outdated information into updated ones. To study the ability of language models for these time-dependent dynamics in human language, we introduce a novel task, EvolvingQA, a tempora…

    Submitted 20 April, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

    Comments: 15 pages, 10 figures, 5 tables; accepted to NAACL 2024

  27. arXiv:2311.07006  [pdf, other]

    cs.CL cs.AI

    Context-dependent Instruction Tuning for Dialogue Response Generation

    Authors: Jin Myung Kwak, Minseon Kim, Sung Ju Hwang

    Abstract: Recent language models have achieved impressive performance in natural language tasks by incorporating instructions with task input during fine-tuning. Since all samples in the same natural language task can be explained with the same task instructions, many instruction datasets only provide a few instructions for the entire task, without considering the input of each example in the task. However,…

    Submitted 12 November, 2023; originally announced November 2023.

    Comments: Work in Progress

  28. arXiv:2311.05161  [pdf, other]

    cs.CL

    Enhancing Computation Efficiency in Large Language Models through Weight and Activation Quantization

    Authors: Jangwhan Lee, Minsoo Kim, Seungcheol Baek, Seok Joong Hwang, Wonyong Sung, Jungwook Choi

    Abstract: Large Language Models (LLMs) are proficient in natural language processing tasks, but their deployment is often restricted by extensive parameter sizes and computational demands. This paper focuses on post-training quantization (PTQ) in LLMs, specifically 4-bit weight and 8-bit activation (W4A8) quantization, to enhance computational efficiency -- a topic less explored compared to weight-only quan…

    Submitted 9 November, 2023; originally announced November 2023.

    Comments: EMNLP 2023 Main Conference

  29. arXiv:2311.02849  [pdf, other]

    cs.CL cs.AI

    Co-training and Co-distillation for Quality Improvement and Compression of Language Models

    Authors: Hayeon Lee, Rui Hou, Jongpil Kim, Davis Liang, Hongbo Zhang, Sung Ju Hwang, Alexander Min

    Abstract: Knowledge Distillation (KD) compresses computationally expensive pre-trained language models (PLMs) by transferring their knowledge to smaller models, allowing their use in resource-constrained or real-time settings. However, most smaller models fail to surpass the performance of the original larger model, resulting in sacrificing performance to improve inference speed. To address this issue, we p…

    Submitted 7 November, 2023; v1 submitted 5 November, 2023; originally announced November 2023.

    Comments: Findings of EMNLP 2023

  30. arXiv:2310.13307  [pdf, other]

    cs.CL cs.LG

    Test-Time Self-Adaptive Small Language Models for Question Answering

    Authors: Soyeong Jeong, Jinheon Baek, Sukmin Cho, Sung Ju Hwang, Jong C. Park

    Abstract: Recent instruction-finetuned large language models (LMs) have achieved notable performances in various tasks, such as question-answering (QA). However, despite their ability to memorize a vast amount of general knowledge across diverse tasks, they might be suboptimal on specific tasks due to their limited capacity to transfer and adapt knowledge to target tasks. Moreover, further finetuning LMs wi…

    Submitted 20 October, 2023; originally announced October 2023.

    Comments: EMNLP Findings 2023

  31. arXiv:2310.12836  [pdf, other]

    cs.CL cs.LG

    Knowledge-Augmented Language Model Verification

    Authors: Jinheon Baek, Soyeong Jeong, Minki Kang, Jong C. Park, Sung Ju Hwang

    Abstract: Recent Language Models (LMs) have shown impressive capabilities in generating texts with the knowledge internalized in parameters. Yet, LMs often generate the factually incorrect responses to the given queries, since their knowledge may be inaccurate, incomplete, and outdated. To address this problem, previous works propose to augment LMs with the knowledge retrieved from an external knowledge sou…

    Submitted 19 October, 2023; originally announced October 2023.

    Comments: EMNLP 2023

  32. arXiv:2310.08204  [pdf, other]

    cs.CV cs.LG

    STELLA: Continual Audio-Video Pre-training with Spatio-Temporal Localized Alignment

    Authors: Jaewoo Lee, Jaehong Yoon, Wonjae Kim, Yunji Kim, Sung Ju Hwang

    Abstract: Continuously learning a variety of audio-video semantics over time is crucial for audio-related reasoning tasks in our ever-evolving world. However, this is a nontrivial problem and poses two critical challenges: sparse spatio-temporal correlation between audio-video pairs and multimodal correlation overwriting that forgets audio-video relations. To tackle this problem, we propose a new continual…

    Submitted 28 May, 2024; v1 submitted 12 October, 2023; originally announced October 2023.

    Comments: ICML 2024

  33. arXiv:2310.07216  [pdf, other]

    cs.LG stat.ML

    Generative Modeling on Manifolds Through Mixture of Riemannian Diffusion Processes

    Authors: Jaehyeong Jo, Sung Ju Hwang

    Abstract: Learning the distribution of data on Riemannian manifolds is crucial for modeling data from non-Euclidean space, which is required by many applications in diverse scientific fields. Yet, existing generative models on manifolds suffer from expensive divergence computation or rely on approximations of heat kernel. These limitations restrict their applicability to simple geometries and hinder scalabi…

    Submitted 2 June, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

    Comments: ICML 2024

  34. arXiv:2310.06511  [pdf, other]

    cs.LG

    Self-Supervised Dataset Distillation for Transfer Learning

    Authors: Dong Bok Lee, Seanie Lee, Joonho Ko, Kenji Kawaguchi, Juho Lee, Sung Ju Hwang

    Abstract: Dataset distillation methods have achieved remarkable success in distilling a large dataset into a small set of representative samples. However, they are not designed to produce a distilled dataset that can be effectively used for facilitating self-supervised pre-training. To this end, we propose a novel problem of distilling an unlabeled dataset into a set of small synthetic samples for efficient…

    Submitted 11 April, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

  35. arXiv:2310.01777  [pdf, other]

    cs.CL cs.LG

    SEA: Sparse Linear Attention with Estimated Attention Mask

    Authors: Heejun Lee, Jina Kim, Jeffrey Willette, Sung Ju Hwang

    Abstract: The transformer architecture has driven breakthroughs in recent years on tasks which require modeling pairwise relationships between sequential elements, as is the case in natural language understanding. However, long sequences pose a problem due to the quadratic complexity of the attention operation. Previous research has aimed to lower the complexity by sparsifying or linearly approximating the…

    Submitted 25 March, 2024; v1 submitted 2 October, 2023; originally announced October 2023.

    Comments: 9 main pages

  36. arXiv:2310.00841  [pdf, other]

    cs.LG

    Drug Discovery with Dynamic Goal-aware Fragments

    Authors: Seul Lee, Seanie Lee, Kenji Kawaguchi, Sung Ju Hwang

    Abstract: Fragment-based drug discovery is an effective strategy for discovering drug candidates in the vast chemical space, and has been widely employed in molecular generative models. However, many existing fragment extraction methods in such models do not take the target chemical properties into account or rely on heuristic rules. Additionally, the existing fragment-based generative models cannot update…

    Submitted 30 May, 2024; v1 submitted 1 October, 2023; originally announced October 2023.

    Comments: ICML 2024

  37. arXiv:2306.12026  [pdf, other]

    cs.LG cs.CV

    Continual Learners are Incremental Model Generalizers

    Authors: Jaehong Yoon, Sung Ju Hwang, Yue Cao

    Abstract: Motivated by the efficiency and rapid convergence of pre-trained models for solving downstream tasks, this paper extensively studies the impact of Continual Learning (CL) models as pre-trainers. In both supervised and unsupervised CL, we find that the transfer quality of the representation often increases gradually without noticeable degradation in fine-tuning performance. This is because CL model…

    Submitted 21 June, 2023; originally announced June 2023.

    Comments: ICML 2023

  38. arXiv:2306.11305  [pdf, other]

    cs.CV cs.AI cs.LG

    Progressive Fourier Neural Representation for Sequential Video Compilation

    Authors: Haeyong Kang, Jaehong Yoon, DaHyun Kim, Sung Ju Hwang, Chang D Yoo

    Abstract: Neural Implicit Representation (NIR) has recently gained significant attention due to its remarkable ability to encode complex and high-dimensional data into representation space and easily reconstruct it through a trainable mapping function. However, NIR methods assume a one-to-one mapping between the target data and representation models regardless of data relevancy or similarity. This results i…

    Submitted 6 February, 2024; v1 submitted 20 June, 2023; originally announced June 2023.

  39. arXiv:2306.05031  [pdf, other]

    cs.LG

    Generalizable Lightweight Proxy for Robust NAS against Diverse Perturbations

    Authors: Hyeonjeong Ha, Minseon Kim, Sung Ju Hwang

    Abstract: Recent neural architecture search (NAS) frameworks have been successful in finding optimal architectures for given conditions (e.g., performance or latency). However, they search for optimal architectures in terms of their performance on clean images only, while robustness against various types of perturbations or corruptions is crucial in practice. Although there exist several robust NAS framewor…

    Submitted 20 October, 2023; v1 submitted 8 June, 2023; originally announced June 2023.

    Comments: NeurIPS 2023, Code is available at https://github.com/HyeonjeongHa/CRoZe

  40. arXiv:2306.04293  [pdf, other]

    cs.CL cs.IR cs.LG

    Phrase Retrieval for Open-Domain Conversational Question Answering with Conversational Dependency Modeling via Contrastive Learning

    Authors: Soyeong Jeong, Jinheon Baek, Sung Ju Hwang, Jong C. Park

    Abstract: Open-Domain Conversational Question Answering (ODConvQA) aims at answering questions through a multi-turn conversation based on a retriever-reader pipeline, which retrieves passages and then predicts answers with them. However, such a pipeline approach not only makes the reader vulnerable to the errors propagated from the retriever, but also demands additional effort to develop both the retriever…

    Submitted 7 June, 2023; originally announced June 2023.

    Comments: Findings of ACL 2023

  41. arXiv:2305.19135  [pdf, other]

    cs.CV

    Context-Preserving Two-Stage Video Domain Translation for Portrait Stylization

    Authors: Doyeon Kim, Eunji Ko, Hyunsu Kim, Yunji Kim, Junho Kim, Dongchan Min, Junmo Kim, Sung Ju Hwang

    Abstract: Portrait stylization, which translates a real human face image into an artistically stylized image, has attracted considerable interest and many prior works have shown impressive quality in recent years. However, despite their remarkable performances in the image-level translation tasks, prior methods show unsatisfactory results when they are applied to the video domain. To address the issue, we p…

    Submitted 30 May, 2023; originally announced May 2023.

    Comments: 5 pages, 3 figures, CVPR 2023 Workshop on AI for Content Creation

  42. arXiv:2305.18846  [pdf, other]

    cs.CL cs.AI cs.LG

    Knowledge Graph-Augmented Language Models for Knowledge-Grounded Dialogue Generation

    Authors: Minki Kang, Jin Myung Kwak, Jinheon Baek, Sung Ju Hwang

    Abstract: Language models have achieved impressive performances on dialogue generation tasks. However, when generating responses for a conversation that requires factual knowledge, they are far from perfect, due to an absence of mechanisms to retrieve, encode, and reflect the knowledge in the generated responses. Some knowledge-grounded dialogue generation methods tackle this problem by leveraging facts fro…

    Submitted 30 May, 2023; originally announced May 2023.

    Comments: Preprint. Under review

  43. arXiv:2305.18395  [pdf, other]

    cs.CL cs.AI cs.LG

    Knowledge-Augmented Reasoning Distillation for Small Language Models in Knowledge-Intensive Tasks

    Authors: Minki Kang, Seanie Lee, Jinheon Baek, Kenji Kawaguchi, Sung Ju Hwang

    Abstract: Large Language Models (LLMs) have shown promising performance in knowledge-intensive reasoning tasks that require a compound understanding of knowledge. However, deployment of the LLMs in real-world applications can be challenging due to their high computational requirements and concerns on data privacy. Previous studies have focused on building task-specific small Language Models (LMs) by fine-tu…

    Submitted 30 October, 2023; v1 submitted 28 May, 2023; originally announced May 2023.

    Comments: NeurIPS 2023

  44. arXiv:2305.18239  [pdf, other]

    cs.CL cs.AI

    A Study on Knowledge Distillation from Weak Teacher for Scaling Up Pre-trained Language Models

    Authors: Hayeon Lee, Rui Hou, Jongpil Kim, Davis Liang, Sung Ju Hwang, Alexander Min

    Abstract: Distillation from Weak Teacher (DWT) is a method of transferring knowledge from a smaller, weaker teacher model to a larger student model to improve its performance. Previous studies have shown that DWT can be effective in the vision domain and natural language processing (NLP) pre-training stage. Specifically, DWT shows promise in practical scenarios, such as enhancing new generation or larger mo…

    Submitted 26 May, 2023; originally announced May 2023.

    Comments: Findings of ACL 2023

  45. arXiv:2305.16948  [pdf, other]

    cs.LG cs.AI

    Meta-prediction Model for Distillation-Aware NAS on Unseen Datasets

    Authors: Hayeon Lee, Sohyun An, Minseon Kim, Sung Ju Hwang

    Abstract: Distillation-aware Neural Architecture Search (DaNAS) aims to search for an optimal student architecture that obtains the best performance and/or efficiency when distilling the knowledge from a given teacher model. Previous DaNAS methods have mostly tackled the search for the neural architecture for fixed datasets and a fixed teacher, and so do not generalize well to a new task consisting of an unsee…

    Submitted 26 May, 2023; originally announced May 2023.

    Comments: ICLR 2023 (Notable-top-25%)

  46. arXiv:2305.16943  [pdf, other]

    cs.LG

    DiffusionNAG: Predictor-guided Neural Architecture Generation with Diffusion Models

    Authors: Sohyun An, Hayeon Lee, Jaehyeong Jo, Seanie Lee, Sung Ju Hwang

    Abstract: Existing NAS methods suffer from an excessive amount of time spent on repetitive sampling and training of many task-irrelevant architectures. To tackle such limitations of existing NAS methods, we propose a paradigm shift from NAS to a novel conditional Neural Architecture Generation (NAG) framework based on diffusion models, dubbed DiffusionNAG. Specifically, we consider the neural architecture…

    Submitted 24 March, 2024; v1 submitted 26 May, 2023; originally announced May 2023.

    Comments: Accepted to ICLR 2024

  47. arXiv:2305.16625  [pdf, other]

    cs.LG cs.AI cs.NE

    Set-based Neural Network Encoding

    Authors: Bruno Andreis, Soro Bedionita, Sung Ju Hwang

    Abstract: We propose an approach to neural network weight encoding for generalization performance prediction that utilizes set-to-set and set-to-vector functions to efficiently encode neural network parameters. Our approach is capable of encoding neural networks in a modelzoo of mixed architecture and different parameter sizes as opposed to previous approaches that require custom encoding models for differe…

    Submitted 26 May, 2023; originally announced May 2023.

    Comments: 13 pages

  48. arXiv:2305.13831  [pdf, other]

    cs.SD cs.CL eess.AS

    ZET-Speech: Zero-shot adaptive Emotion-controllable Text-to-Speech Synthesis with Diffusion and Style-based Models

    Authors: Minki Kang, Wooseok Han, Sung Ju Hwang, Eunho Yang

    Abstract: Emotional Text-To-Speech (TTS) is an important task in the development of systems (e.g., human-like dialogue agents) that require natural and emotional speech. Existing approaches, however, only aim to produce emotional TTS for seen speakers during training, without consideration of the generalization to unseen speakers. In this paper, we propose ZET-Speech, a zero-shot adaptive emotion-controllab…

    Submitted 23 May, 2023; originally announced May 2023.

    Comments: Accepted by INTERSPEECH 2023

  49. arXiv:2305.12416  [pdf, other]

    cs.IR

    Direct Fact Retrieval from Knowledge Graphs without Entity Linking

    Authors: Jinheon Baek, Alham Fikri Aji, Jens Lehmann, Sung Ju Hwang

    Abstract: There has been a surge of interest in utilizing Knowledge Graphs (KGs) for various natural language processing/understanding tasks. The conventional mechanism to retrieve facts in KGs usually involves three steps: entity span detection, entity disambiguation, and relation classification. However, this approach requires additional labels for training each of the three subcomponents in addition to p…

    Submitted 21 May, 2023; originally announced May 2023.

    Comments: ACL 2023

  50. arXiv:2304.01515  [pdf, other]

    cs.LG cs.CL cs.CV

    Text-Conditioned Sampling Framework for Text-to-Image Generation with Masked Generative Models

    Authors: Jaewoong Lee, Sangwon Jang, Jaehyeong Jo, Jaehong Yoon, Yunji Kim, Jin-Hwa Kim, Jung-Woo Ha, Sung Ju Hwang

    Abstract: Token-based masked generative models are gaining popularity for their fast inference time with parallel decoding. While recent token-based approaches achieve competitive performance to diffusion-based models, their generation performance is still suboptimal as they sample multiple tokens simultaneously without considering the dependence among them. We empirically investigate this problem and propo…

    Submitted 3 April, 2023; originally announced April 2023.

    ACM Class: I.5.4; I.2.10; I.4.m