Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to main content

Showing 1–50 of 229 results for author: Chung, J

Searching in archive cs. Search in all archives.
.
  1. arXiv:2406.18925  [pdf, other

    cs.CL cs.CV

    Selective Vision is the Challenge for Visual Reasoning: A Benchmark for Visual Argument Understanding

    Authors: Jiwan Chung, Sungjae Lee, Minseo Kim, Seungju Han, Ashkan Yousefpour, Jack Hessel, Youngjae Yu

    Abstract: Visual arguments, often used in advertising or social causes, rely on images to persuade viewers to do or believe something. Understanding these arguments requires selective vision: only specific visual stimuli within an image are relevant to the argument, and relevance can only be understood within the context of a broader argumentative structure. While visual arguments are readily appreciated by… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 12 pages, 5 figures

  2. arXiv:2406.16994  [pdf, other

    eess.SP cs.AI

    Quantum Multi-Agent Reinforcement Learning for Cooperative Mobile Access in Space-Air-Ground Integrated Networks

    Authors: Gyu Seon Kim, Yeryeong Cho, Jaehyun Chung, Soohyun Park, Soyi Jung, Zhu Han, Joongheon Kim

    Abstract: Achieving global space-air-ground integrated network (SAGIN) access only with CubeSats presents significant challenges such as the access sustainability limitations in specific regions (e.g., polar regions) and the energy efficiency limitations in CubeSats. To tackle these problems, high-altitude long-endurance unmanned aerial vehicles (HALE-UAVs) can complement these CubeSat shortcomings for prov… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: 17 pages, 22 figures

  3. arXiv:2406.16896  [pdf, other

    eess.SP cs.LG

    f-GAN: A frequency-domain-constrained generative adversarial network for PPG to ECG synthesis

    Authors: Nathan C. L. Kong, Dae Lee, Huyen Do, Dae Hoon Park, Cong Xu, Hongda Mao, Jonathan Chung

    Abstract: Electrocardiograms (ECGs) and photoplethysmograms (PPGs) are generally used to monitor an individual's cardiovascular health. In clinical settings, ECGs and fingertip PPGs are the main signals used for assessing cardiovascular health, but the equipment necessary for their collection precludes their use in daily monitoring. Although PPGs obtained from wrist-worn devices are susceptible to noise due… ▽ More

    Submitted 15 May, 2024; originally announced June 2024.

  4. arXiv:2406.14703  [pdf, other

    cs.CL cs.AI

    Do LLMs Have Distinct and Consistent Personality? TRAIT: Personality Testset designed for LLMs with Psychometrics

    Authors: Seungbeen Lee, Seungwon Lim, Seungju Han, Giyeong Oh, Hyungjoo Chae, Jiwan Chung, Minju Kim, Beong-woo Kwak, Yeonsoo Lee, Dongha Lee, Jinyoung Yeo, Youngjae Yu

    Abstract: The idea of personality in descriptive psychology, traditionally defined through observable behavior, has now been extended to Large Language Models (LLMs) to better understand their behavior. This raises a question: do LLMs exhibit distinct and consistent personality traits, similar to humans? Existing self-assessment personality tests, while applicable, lack the necessary validity and reliabilit… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Preprint; Under review

  5. arXiv:2406.14559  [pdf, other

    cs.SD eess.AS

    Disentangled Representation Learning for Environment-agnostic Speaker Recognition

    Authors: KiHyun Nam, Hee-Soo Heo, Jee-weon Jung, Joon Son Chung

    Abstract: This work presents a framework based on feature disentanglement to learn speaker embeddings that are robust to environmental variations. Our framework utilises an auto-encoder as a disentangler, dividing the input speaker embedding into components related to the speaker and other residual information. We employ a group of objective functions to ensure that the auto-encoder's code representation -… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Interspeech 2024. The official webpage can be found at https://mm.kaist.ac.kr/projects/voxceleb-disentangler/

  6. arXiv:2406.14190  [pdf, other

    cs.LO

    Extended Resolution Clause Learning via Dual Implication Points

    Authors: Sam Buss, Jonathan Chung, Vijay Ganesh, Albert Oliveras

    Abstract: We present a new extended resolution clause learning (ERCL) algorithm, implemented as part of a conflict-driven clause-learning (CDCL) SAT solver, wherein new variables are dynamically introduced as definitions for {\it Dual Implication Points} (DIPs) in the implication graph constructed by the solver at runtime. DIPs are generalizations of unique implication points and can be informally viewed as… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  7. arXiv:2406.10549  [pdf, other

    eess.AS cs.CL cs.SD

    Lightweight Audio Segmentation for Long-form Speech Translation

    Authors: Jaesong Lee, Soyoon Kim, Hanbyul Kim, Joon Son Chung

    Abstract: Speech segmentation is an essential part of speech translation (ST) systems in real-world scenarios. Since most ST models are designed to process speech segments, long-form audio must be partitioned into shorter segments before translation. Recently, data-driven approaches for the speech segmentation task have been developed. Although the approaches improve overall translation quality, a performan… ▽ More

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024

  8. arXiv:2406.09286  [pdf, other

    eess.AS cs.SD

    FlowAVSE: Efficient Audio-Visual Speech Enhancement with Conditional Flow Matching

    Authors: Chaeyoung Jung, Suyeon Lee, Ji-Hoon Kim, Joon Son Chung

    Abstract: This work proposes an efficient method to enhance the quality of corrupted speech signals by leveraging both acoustic and visual cues. While existing diffusion-based approaches have demonstrated remarkable quality, their applicability is limited by slow inference speeds and computational complexity. To address this issue, we present FlowAVSE which enhances the inference speed and reduces the numbe… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: INTERSPEECH 2024

  9. arXiv:2406.08719  [pdf, other

    cs.CR

    TikTag: Breaking ARM's Memory Tagging Extension with Speculative Execution

    Authors: Juhee Kim, Jinbum Park, Sihyeon Roh, Jaeyoung Chung, Youngjoo Lee, Taesoo Kim, Byoungyoung Lee

    Abstract: ARM Memory Tagging Extension (MTE) is a new hardware feature introduced in ARMv8.5-A architecture, aiming to detect memory corruption vulnerabilities. The low overhead of MTE makes it an attractive solution to mitigate memory corruption attacks in modern software systems and is considered the most promising path forward for improving C/C++ software security. This paper explores the potential secur… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

  10. arXiv:2406.05339  [pdf, other

    eess.AS cs.AI

    To what extent can ASV systems naturally defend against spoofing attacks?

    Authors: Jee-weon Jung, Xin Wang, Nicholas Evans, Shinji Watanabe, Hye-jin Shim, Hemlata Tak, Sidhhant Arora, Junichi Yamagishi, Joon Son Chung

    Abstract: The current automatic speaker verification (ASV) task involves making binary decisions on two types of trials: target and non-target. However, emerging advancements in speech generation technology pose significant threats to the reliability of ASV systems. This study investigates whether ASV effortlessly acquires robustness against spoofing attacks (i.e., zero-shot capability) by systematically ex… ▽ More

    Submitted 14 June, 2024; v1 submitted 7 June, 2024; originally announced June 2024.

    Comments: 5 pages, 3 figures, 3 tables, Interspeech 2024

  11. arXiv:2406.03344  [pdf, other

    cs.SD cs.AI eess.AS

    Audio Mamba: Bidirectional State Space Model for Audio Representation Learning

    Authors: Mehmet Hamza Erol, Arda Senocak, Jiu Feng, Joon Son Chung

    Abstract: Transformers have rapidly become the preferred choice for audio classification, surpassing methods based on CNNs. However, Audio Spectrogram Transformers (ASTs) exhibit quadratic scaling due to self-attention. The removal of this quadratic self-attention cost presents an appealing direction. Recently, state space models (SSMs), such as Mamba, have demonstrated potential in language and vision task… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Code is available at https://github.com/mhamzaerol/Audio-Mamba-AuM

  12. arXiv:2405.16301  [pdf, other

    cs.CV cs.LG

    Active Learning for Finely-Categorized Image-Text Retrieval by Selecting Hard Negative Unpaired Samples

    Authors: Dae Ung Jo, Kyuewang Lee, JaeHo Chung, Jin Young Choi

    Abstract: Securing a sufficient amount of paired data is important to train an image-text retrieval (ITR) model, but collecting paired data is very expensive. To address this issue, in this paper, we propose an active learning algorithm for ITR that can collect paired data cost-efficiently. Previous studies assume that image-text pairs are given and their category labels are asked to the annotator. However,… ▽ More

    Submitted 25 May, 2024; originally announced May 2024.

  13. arXiv:2405.16137  [pdf, other

    cs.RO

    Comparison between Behavior Trees and Finite State Machines

    Authors: Matteo Iovino, Julian Förster, Pietro Falco, Jen Jen Chung, Roland Siegwart, Christian Smith

    Abstract: Behavior Trees (BTs) were first conceived in the computer games industry as a tool to model agent behavior, but they received interest also in the robotics community as an alternative policy design to Finite State Machines (FSMs). The advantages of BTs over FSMs had been highlighted in many works, but there is no thorough practical comparison of the two designs. Such a comparison is particularly r… ▽ More

    Submitted 25 May, 2024; originally announced May 2024.

    Comments: Submitted to IEEE Transactions on Robotics (T-RO). arXiv admin note: text overlap with arXiv:2209.07392

  14. arXiv:2405.13220  [pdf, other

    cs.LG cs.AI math.NA

    Paired Autoencoders for Inverse Problems

    Authors: Matthias Chung, Emma Hart, Julianne Chung, Bas Peters, Eldad Haber

    Abstract: We consider the solution of nonlinear inverse problems where the forward problem is a discretization of a partial differential equation. Such problems are notoriously difficult to solve in practice and require minimizing a combination of a data-fit term and a regularization term. The main computational bottleneck of typical algorithms is the direct estimation of the data misfit. Therefore, likelih… ▽ More

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: 18 pages, 6 figures

  15. arXiv:2405.10272  [pdf, other

    cs.CV cs.AI cs.SD eess.AS eess.IV

    Faces that Speak: Jointly Synthesising Talking Face and Speech from Text

    Authors: Youngjoon Jang, Ji-Hoon Kim, Junseok Ahn, Doyeop Kwak, Hong-Sun Yang, Yoon-Cheol Ju, Il-Hwan Kim, Byeong-Yeol Kim, Joon Son Chung

    Abstract: The goal of this work is to simultaneously generate natural talking faces and speech outputs from text. We achieve this by integrating Talking Face Generation (TFG) and Text-to-Speech (TTS) systems into a unified framework. We address the main challenges of each task: (1) generating a range of head poses representative of real-world scenarios, and (2) ensuring voice consistency despite variations… ▽ More

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: CVPR 2024

  16. arXiv:2405.05581  [pdf, other

    cs.HC cs.AI cs.CL

    One vs. Many: Comprehending Accurate Information from Multiple Erroneous and Inconsistent AI Generations

    Authors: Yoonjoo Lee, Kihoon Son, Tae Soo Kim, Jisu Kim, John Joon Young Chung, Eytan Adar, Juho Kim

    Abstract: As Large Language Models (LLMs) are nondeterministic, the same input can generate different outputs, some of which may be incorrect or hallucinated. If run again, the LLM may correct itself and produce the correct answer. Unfortunately, most LLM-powered systems resort to single results which, correct or not, users accept. Having the LLM produce multiple outputs may help identify disagreements or a… ▽ More

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: Accepted to FAccT 2024

  17. arXiv:2404.16283  [pdf, other

    cs.DC cs.LG

    Andes: Defining and Enhancing Quality-of-Experience in LLM-Based Text Streaming Services

    Authors: Jiachen Liu, Zhiyu Wu, Jae-Won Chung, Fan Lai, Myungjin Lee, Mosharaf Chowdhury

    Abstract: The advent of large language models (LLMs) has transformed text-based services, enabling capabilities ranging from real-time translation to AI-driven chatbots. However, existing serving systems primarily focus on optimizing server-side aggregate metrics like token generation throughput, ignoring individual user experience with streamed text. As a result, under high and/or bursty load, a significan… ▽ More

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: 16 pages, 22 figures

  18. arXiv:2404.12465  [pdf, other

    cs.CR

    Toward a Quantum Information System Cybersecurity Taxonomy and Testbed: Exploiting a Unique Opportunity for Early Impact

    Authors: Benjamin Blakely, Joaquin Chung, Alec Poczatek, Ryan Syed, Raj Kettimuthu

    Abstract: Any human-designed system can potentially be exploited in ways that its designers did not envision, and information systems or networks using quantum components do not escape this reality. We are presented with a unique but quickly waning opportunity to bring cybersecurity concerns to the forefront for quantum information systems before they become widely deployed. The resources and knowledge requ… ▽ More

    Submitted 18 April, 2024; originally announced April 2024.

  19. arXiv:2404.11358  [pdf, other

    cs.CV

    DeblurGS: Gaussian Splatting for Camera Motion Blur

    Authors: Jeongtaek Oh, Jaeyoung Chung, Dongwoo Lee, Kyoung Mu Lee

    Abstract: Although significant progress has been made in reconstructing sharp 3D scenes from motion-blurred images, a transition to real-world applications remains challenging. The primary obstacle stems from the severe blur which leads to inaccuracies in the acquisition of initial camera poses through Structure-from-Motion, a critical aspect often overlooked by previous approaches. To address this challeng… ▽ More

    Submitted 17 April, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

  20. arXiv:2404.06675  [pdf, ps, other

    cs.LG cs.AR cs.DC

    Toward Cross-Layer Energy Optimizations in Machine Learning Systems

    Authors: Jae-Won Chung, Mosharaf Chowdhury

    Abstract: The enormous energy consumption of machine learning (ML) and generative AI workloads shows no sign of waning, taking a toll on operating costs, power delivery, and environmental sustainability. Despite a long line of research on energy-efficient hardware, we found that software plays a critical role in ML energy optimization through two recent works: Zeus and Perseus. This is especially true for l… ▽ More

    Submitted 9 April, 2024; originally announced April 2024.

  21. arXiv:2404.03753  [pdf, other

    cs.LO cs.AI cs.LG

    A Reinforcement Learning based Reset Policy for CDCL SAT Solvers

    Authors: Chunxiao Li, Charlie Liu, Jonathan Chung, Zhengyang Lu, Piyush Jha, Vijay Ganesh

    Abstract: Restart policy is an important technique used in modern Conflict-Driven Clause Learning (CDCL) solvers, wherein some parts of the solver state are erased at certain intervals during the run of the solver. In most solvers, variable activities are preserved across restart boundaries, resulting in solvers continuing to search parts of the assignment tree that are not far from the one immediately prio… ▽ More

    Submitted 19 April, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

  22. arXiv:2404.03477  [pdf, other

    cs.CV

    Towards Automated Movie Trailer Generation

    Authors: Dawit Mureja Argaw, Mattia Soldan, Alejandro Pardo, Chen Zhao, Fabian Caba Heilbron, Joon Son Chung, Bernard Ghanem

    Abstract: Movie trailers are an essential tool for promoting films and attracting audiences. However, the process of creating trailers can be time-consuming and expensive. To streamline this process, we propose an automatic trailer generation framework that generates plausible trailers from a full movie by automating shot selection and composition. Our approach draws inspiration from machine translation tec… ▽ More

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: Accepted to CVPR 2024

  23. arXiv:2404.03398  [pdf, other

    cs.CV

    Scaling Up Video Summarization Pretraining with Large Language Models

    Authors: Dawit Mureja Argaw, Seunghyun Yoon, Fabian Caba Heilbron, Hanieh Deilamsalehy, Trung Bui, Zhaowen Wang, Franck Dernoncourt, Joon Son Chung

    Abstract: Long-form video content constitutes a significant portion of internet traffic, making automated video summarization an essential research problem. However, existing video summarization datasets are notably limited in their size, constraining the effectiveness of state-of-the-art methods for generalization. Our work aims to overcome this limitation by capitalizing on the abundance of long-form vide… ▽ More

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: Accepted to CVPR 2024

  24. arXiv:2404.02575  [pdf, other

    cs.CL

    Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models

    Authors: Hyungjoo Chae, Yeonghyeon Kim, Seungone Kim, Kai Tzu-iunn Ong, Beong-woo Kwak, Moohyeon Kim, Seonghwan Kim, Taeyoon Kwon, Jiwan Chung, Youngjae Yu, Jinyoung Yeo

    Abstract: Algorithmic reasoning refers to the ability to understand the complex patterns behind the problem and decompose them into a sequence of reasoning steps towards the solution. Such nature of algorithmic reasoning makes it a challenge for large language models (LLMs), even though they have demonstrated promising performance in other reasoning tasks. Within this context, some recent studies use progra… ▽ More

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: 38 pages, 4 figures

  25. arXiv:2404.01954  [pdf, other

    cs.CL cs.AI

    HyperCLOVA X Technical Report

    Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seongjin Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han , et al. (371 additional authors not shown)

    Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment t… ▽ More

    Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: 44 pages; updated authors list and fixed author names

  26. A Design Space for Intelligent and Interactive Writing Assistants

    Authors: Mina Lee, Katy Ilonka Gero, John Joon Young Chung, Simon Buckingham Shum, Vipul Raheja, Hua Shen, Subhashini Venugopalan, Thiemo Wambsganss, David Zhou, Emad A. Alghamdi, Tal August, Avinash Bhat, Madiha Zahrah Choksi, Senjuti Dutta, Jin L. C. Guo, Md Naimul Hoque, Yewon Kim, Simon Knight, Seyed Parsa Neshaei, Agnia Sergeyuk, Antonette Shibani, Disha Shrivastava, Lila Shroff, Jessi Stark, Sarah Sterman , et al. (11 additional authors not shown)

    Abstract: In our era of rapid technological advancement, the research landscape for writing assistants has become increasingly fragmented across various research communities. We seek to address this challenge by proposing a design space as a structured way to examine and explore the multidimensional space of intelligent and interactive writing assistants. Through a large community collaboration, we explore… ▽ More

    Submitted 26 March, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

    Comments: Published as a conference paper at CHI 2024

  27. arXiv:2403.13548  [pdf, other

    cs.CV

    Diversity-aware Channel Pruning for StyleGAN Compression

    Authors: Jiwoo Chung, Sangeek Hyun, Sang-Heon Shim, Jae-Pil Heo

    Abstract: StyleGAN has shown remarkable performance in unconditional image generation. However, its high computational cost poses a significant challenge for practical applications. Although recent efforts have been made to compress StyleGAN while preserving its performance, existing compressed models still lag behind the original model, particularly in terms of sample diversity. To overcome this, we propos… ▽ More

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR 2024. Project page: https://jiwoogit.github.io/DCP-GAN_site

  28. arXiv:2403.10576  [pdf, other

    cs.CR cs.CL cs.LG

    Ignore Me But Don't Replace Me: Utilizing Non-Linguistic Elements for Pretraining on the Cybersecurity Domain

    Authors: Eugene Jang, Jian Cui, Dayeon Yim, Youngjin Jin, Jin-Woo Chung, Seungwon Shin, Yongjae Lee

    Abstract: Cybersecurity information is often technically complex and relayed through unstructured text, making automation of cyber threat intelligence highly challenging. For such text domains that involve high levels of expertise, pretraining on in-domain corpora has been a popular method for language models to obtain domain expertise. However, cybersecurity texts often contain non-linguistic elements (suc… ▽ More

    Submitted 2 April, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

    Comments: To appear in NAACL Findings 2024

    ACM Class: I.2.7

  29. arXiv:2403.09502  [pdf, other

    cs.LG cs.AI cs.MM

    EquiAV: Leveraging Equivariance for Audio-Visual Contrastive Learning

    Authors: Jongsuk Kim, Hyeongkeun Lee, Kyeongha Rho, Junmo Kim, Joon Son Chung

    Abstract: Recent advancements in self-supervised audio-visual representation learning have demonstrated its potential to capture rich and comprehensive representations. However, despite the advantages of data augmentation verified in many learning methods, audio-visual learning has struggled to fully harness these benefits, as augmentations can easily disrupt the correspondence between input pairs. To addre… ▽ More

    Submitted 20 June, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

    Comments: 15 pages, 3 figures; Accepted to ICML 2024 (camera ready version)

  30. Authors' Values and Attitudes Towards AI-bridged Scalable Personalization of Creative Language Arts

    Authors: Taewook Kim, Hyomin Han, Eytan Adar, Matthew Kay, John Joon Young Chung

    Abstract: Generative AI has the potential to create a new form of interactive media: AI-bridged creative language arts (CLA), which bridge the author and audience by personalizing the author's vision to the audience's context and taste at scale. However, it is unclear what the authors' values and attitudes would be regarding AI-bridged CLA. To identify these values and attitudes, we conducted an interview s… ▽ More

    Submitted 1 March, 2024; originally announced March 2024.

    Comments: 16 pages, 6 figures, 2 tables. Accepted to ACM CHI 2024

  31. arXiv:2401.16743  [pdf, ps, other

    cs.IT

    Multi-Group Multicasting Systems Using Multiple RISs

    Authors: Hyeongtaek Lee, Seungsik Moon, Youngjoo Lee, Jaeky Oh, Jaehoon Chung, Junil Choi

    Abstract: In this paper, practical utilization of multiple distributed reconfigurable intelligent surfaces (RISs), which are able to conduct group-specific operations, for multi-group multicasting systems is investigated. To tackle the inter-group interference issue in the multi-group multicasting systems, the block diagonalization (BD)-based beamforming is considered first. Without any inter-group interfer… ▽ More

    Submitted 29 January, 2024; originally announced January 2024.

    Comments: Accepted to IEEE Transactions on Wireless Communications

  32. arXiv:2401.11429  [pdf, ps, other

    cs.IT eess.SP

    Joint Downlink and Uplink Optimization for RIS-Aided FDD MIMO Communication Systems

    Authors: Gyoseung Lee, Hyeongtaek Lee, Donghwan Kim, Jaehoon Chung, A. Lee. Swindlehurst, Junil Choi

    Abstract: This paper investigates reconfigurable intelligent surface (RIS)-aided frequency division duplexing (FDD) communication systems. Since the downlink and uplink signals are simultaneously transmitted in FDD, the phase shifts at the RIS should be designed to support both transmissions. Considering a single-user multiple-input multiple-output system, we formulate a weighted sum-rate maximization probl… ▽ More

    Submitted 21 January, 2024; originally announced January 2024.

    Comments: Accepted to IEEE Transactions on Wireless Communications

  33. arXiv:2401.10032  [pdf, other

    eess.AS cs.AI eess.SP

    FreGrad: Lightweight and Fast Frequency-aware Diffusion Vocoder

    Authors: Tan Dat Nguyen, Ji-Hoon Kim, Youngjoon Jang, Jaehun Kim, Joon Son Chung

    Abstract: The goal of this paper is to generate realistic audio with a lightweight and fast diffusion-based vocoder, named FreGrad. Our framework consists of the following three key components: (1) We employ discrete wavelet transform that decomposes a complicated waveform into sub-band wavelets, which helps FreGrad to operate on a simple and concise feature space, (2) We design a frequency-aware dilated co… ▽ More

    Submitted 18 January, 2024; originally announced January 2024.

    Comments: Accepted to ICASSP 2024

  34. arXiv:2401.09944  [pdf, other

    cs.LG cs.AI cs.RO

    WindSeer: Real-time volumetric wind prediction over complex terrain aboard a small UAV

    Authors: Florian Achermann, Thomas Stastny, Bogdan Danciu, Andrey Kolobov, Jen Jen Chung, Roland Siegwart, Nicholas Lawrance

    Abstract: Real-time high-resolution wind predictions are beneficial for various applications including safe manned and unmanned aviation. Current weather models require too much compute and lack the necessary predictive capabilities as they are valid only at the scale of multiple kilometers and hours - much lower spatial and temporal resolutions than these applications require. Our work, for the first time,… ▽ More

    Submitted 18 January, 2024; originally announced January 2024.

  35. arXiv:2401.08415  [pdf, other

    cs.SD cs.LG eess.AS

    From Coarse to Fine: Efficient Training for Audio Spectrogram Transformers

    Authors: Jiu Feng, Mehmet Hamza Erol, Joon Son Chung, Arda Senocak

    Abstract: Transformers have become central to recent advances in audio classification. However, training an audio spectrogram transformer, e.g. AST, from scratch can be resource and time-intensive. Furthermore, the complexity of transformers heavily depends on the input audio spectrogram size. In this work, we aim to optimize AST training by linking to the resolution in the time-axis. We introduce multi-pha… ▽ More

    Submitted 16 January, 2024; originally announced January 2024.

    Comments: ICASSP 2024

  36. arXiv:2312.15861  [pdf, other

    cs.CV

    Towards Squeezing-Averse Virtual Try-On via Sequential Deformation

    Authors: Sang-Heon Shim, Jiwoo Chung, Jae-Pil Heo

    Abstract: In this paper, we first investigate a visual quality degradation problem observed in recent high-resolution virtual try-on approach. The tendency is empirically found that the textures of clothes are squeezed at the sleeve, as visualized in the upper row of Fig.1(a). A main reason for the issue arises from a gradient conflict between two popular losses, the Total Variation (TV) and adversarial los… ▽ More

    Submitted 25 December, 2023; originally announced December 2023.

    Comments: Accepted to AAAI 2024

  37. arXiv:2312.11949  [pdf, other

    cs.HC

    CreativeConnect: Supporting Reference Recombination for Graphic Design Ideation with Generative AI

    Authors: DaEun Choi, Sumin Hong, Jeongeon Park, John Joon Young Chung, Juho Kim

    Abstract: Graphic designers often get inspiration through the recombination of references. Our formative study (N=6) reveals that graphic designers focus on conceptual keywords during this process, and want support for discovering the keywords, expanding them, and exploring diverse recombination options of them, while still having room for designers' creativity. We propose CreativeConnect, a system with gen… ▽ More

    Submitted 6 March, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

  38. arXiv:2312.11805  [pdf, other

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee , et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr… ▽ More

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  39. arXiv:2312.09008  [pdf, other

    cs.CV

    Style Injection in Diffusion: A Training-free Approach for Adapting Large-scale Diffusion Models for Style Transfer

    Authors: Jiwoo Chung, Sangeek Hyun, Jae-Pil Heo

    Abstract: Despite the impressive generative capabilities of diffusion models, existing diffusion model-based style transfer methods require inference-stage optimization (e.g. fine-tuning or textual inversion of style) which is time-consuming, or fails to leverage the generative ability of large-scale diffusion models. To address these issues, we introduce a novel artistic style transfer method based on a pr… ▽ More

    Submitted 20 March, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

    Comments: Accepted to CVPR 2024. Project page: https://jiwoogit.github.io/StyleID_site

  40. arXiv:2312.06902  [pdf, other

    cs.LG cs.DC

    Perseus: Removing Energy Bloat from Large Model Training

    Authors: Jae-Won Chung, Yile Gu, Insu Jang, Luoxi Meng, Nikhil Bansal, Mosharaf Chowdhury

    Abstract: Training large AI models on numerous GPUs consumes a massive amount of energy. We observe that not all energy consumed during training directly contributes to end-to-end training throughput, and a significant portion can be removed without slowing down training, which we call energy bloat. In this work, we identify two independent sources of energy bloat in large model training, intrinsic and ex… ▽ More

    Submitted 11 December, 2023; originally announced December 2023.

    Comments: Open-source at https://ml.energy/zeus/perseus/

  41. arXiv:2312.06498  [pdf, other

    cs.CE eess.SY

    Sustainability through Optimal Design of Buildings for Natural Ventilation using Updated Comfort and Occupancy Models

    Authors: Jihoon Chung, Nastaran Shahmansouri, Rhys Goldstein, James Stoddart, John Locke

    Abstract: This paper explores the benefits of incorporating natural ventilation (NV) simulation into a generative process of designing residential buildings to improve energy efficiency and indoor thermal comfort. Our proposed workflow uses the Wave Function Collapse algorithm to generate a diverse set of plausible floor plans. It also includes post-COVID occupant presence models while incorporating adaptiv… ▽ More

    Submitted 11 December, 2023; originally announced December 2023.

    Comments: 12 pages, 14 figures

  42. arXiv:2311.13398  [pdf, other

    cs.CV cs.GR

    Depth-Regularized Optimization for 3D Gaussian Splatting in Few-Shot Images

    Authors: Jaeyoung Chung, Jeongtaek Oh, Kyoung Mu Lee

    Abstract: In this paper, we present a method to optimize Gaussian splatting with a limited number of images while avoiding overfitting. Representing a 3D scene by combining numerous Gaussian splats has yielded outstanding visual quality. However, it tends to overfit the training views when only a small number of images are available. To address this issue, we introduce a dense depth map as a geometry guide… ▽ More

    Submitted 4 January, 2024; v1 submitted 22 November, 2023; originally announced November 2023.

    Comments: 10 pages, 5 figures; Project page: robot0321.github.io/DepthRegGS

  43. arXiv:2311.13384  [pdf, other

    cs.CV

    LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes

    Authors: Jaeyoung Chung, Suyoung Lee, Hyeongjin Nam, Jaerin Lee, Kyoung Mu Lee

    Abstract: With the widespread usage of VR devices and contents, demands for 3D scene generation techniques become more popular. Existing 3D scene generation models, however, limit the target scene to specific domain, primarily due to their training strategies using 3D scan dataset that is far from the real-world. To address such limitation, we propose LucidDreamer, a domain-free scene generation pipeline by… ▽ More

    Submitted 23 November, 2023; v1 submitted 22 November, 2023; originally announced November 2023.

    Comments: Project page: https://luciddreamer-cvlab.github.io/

  44. arXiv:2311.04066  [pdf, other

    cs.CV cs.AI cs.MM cs.SD eess.AS

    Can CLIP Help Sound Source Localization?

    Authors: Sooyoung Park, Arda Senocak, Joon Son Chung

    Abstract: Large-scale pre-trained image-text models demonstrate remarkable versatility across diverse tasks, benefiting from their robust representational capabilities and effective multimodal alignment. We extend the application of these models, specifically CLIP, to the domain of sound source localization. Unlike conventional approaches, we employ the pre-trained CLIP model without explicit text input, re… ▽ More

    Submitted 7 November, 2023; originally announced November 2023.

    Comments: WACV 2024

  45. arXiv:2311.01233  [pdf, other

    cs.CV cs.AI

    Long Story Short: a Summarize-then-Search Method for Long Video Question Answering

    Authors: Jiwan Chung, Youngjae Yu

    Abstract: Large language models such as GPT-3 have demonstrated an impressive capability to adapt to new tasks without requiring task-specific training data. This capability has been particularly effective in settings such as narrative question answering, where the diversity of tasks is immense, but the available supervision data is small. In this work, we investigate if such language models can extend thei… ▽ More

    Submitted 2 November, 2023; originally announced November 2023.

    Comments: Published in BMVC 2023

  46. arXiv:2310.19581  [pdf, other

    eess.AS cs.CV cs.SD

    Seeing Through the Conversation: Audio-Visual Speech Separation based on Diffusion Model

    Authors: Suyeon Lee, Chaeyoung Jung, Youngjoon Jang, Jaehun Kim, Joon Son Chung

    Abstract: The objective of this work is to extract target speaker's voice from a mixture of voices using visual cues. Existing works on audio-visual speech separation have demonstrated their performance with promising intelligibility, but maintaining naturalness remains a challenge. To address this issue, we propose AVDiffuSS, an audio-visual speech separation model based on a diffusion mechanism known for… ▽ More

    Submitted 30 October, 2023; originally announced October 2023.

    Comments: Project page with demo: https://mm.kaist.ac.kr/projects/avdiffuss/

  47. arXiv:2310.19274  [pdf, other

    cs.LG physics.comp-ph physics.geo-ph

    Prediction of Effective Elastic Moduli of Rocks using Graph Neural Networks

    Authors: Jaehong Chung, Rasool Ahmad, WaiChing Sun, Wei Cai, Tapan Mukerji

    Abstract: This study presents a Graph Neural Networks (GNNs)-based approach for predicting the effective elastic moduli of rocks from their digital CT-scan images. We use the Mapper algorithm to transform 3D digital rock images into graph datasets, encapsulating essential geometrical information. These graphs, after training, prove effective in predicting elastic moduli. Our GNN model shows robust predictiv… ▽ More

    Submitted 2 January, 2024; v1 submitted 30 October, 2023; originally announced October 2023.

  48. arXiv:2310.16058  [pdf, other

    cs.LG stat.AP

    A Sparse Bayesian Learning for Diagnosis of Nonstationary and Spatially Correlated Faults with Application to Multistation Assembly Systems

    Authors: Jihoon Chung, Zhenyu Kong

    Abstract: Sensor technology developments provide a basis for effective fault diagnosis in manufacturing systems. However, the limited number of sensors due to physical constraints or undue costs hinders the accurate diagnosis in the actual process. In addition, time-varying operational conditions that generate nonstationary process faults and the correlation information in the process require to consider fo… ▽ More

    Submitted 20 October, 2023; originally announced October 2023.

  49. arXiv:2310.10418  [pdf, other

    cs.LG cs.AI

    Reading Books is Great, But Not if You Are Driving! Visually Grounded Reasoning about Defeasible Commonsense Norms

    Authors: Seungju Han, Junhyeok Kim, Jack Hessel, Liwei Jiang, Jiwan Chung, Yejin Son, Yejin Choi, Youngjae Yu

    Abstract: Commonsense norms are defeasible by context: reading books is usually great, but not when driving a car. While contexts can be explicitly described in language, in embodied scenarios, contexts are often provided visually. This type of visually grounded reasoning about defeasible commonsense norms is generally easy for humans, but (as we show) poses a challenge for machines, as it necessitates both… ▽ More

    Submitted 11 November, 2023; v1 submitted 16 October, 2023; originally announced October 2023.

    Comments: Published as a conference paper at EMNLP 2023 (long)

  50. arXiv:2310.09767  [pdf, other

    cs.CL cs.AI

    VLIS: Unimodal Language Models Guide Multimodal Language Generation

    Authors: Jiwan Chung, Youngjae Yu

    Abstract: Multimodal language generation, which leverages the synergy of language and vision, is a rapidly expanding field. However, existing vision-language models face challenges in tasks that require complex linguistic understanding. To address this issue, we introduce Visual-Language models as Importance Sampling weights (VLIS), a novel framework that combines the visual conditioning capability of visio… ▽ More

    Submitted 19 December, 2023; v1 submitted 15 October, 2023; originally announced October 2023.

    Comments: Accepted as main paper in EMNLP 2023