
Showing 1–33 of 33 results for author: Kwon, G

Searching in archive cs.
  1. arXiv:2407.16054  [pdf, other]

    cs.RO

    Development of Tendon-Driven Compliant Snake Robot with Global Bending and Twisting Actuation

    Authors: Seongil Kwon, Serdar Incekara, Gangil Kwon, Junhyoung Ha

    Abstract: Snake robots have been studied for decades with the aim of achieving biological snakes' fluent locomotion. Yet, as of today, their locomotion remains far from that of the biological snakes. Our recent study suggested that snake locomotion utilizing partial ground contacts can be achieved with robots by using body compliance and lengthwise-globally applied body tensions. In this paper, we present t…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: 10 pages, 12 figures

  2. arXiv:2405.16823  [pdf, other]

    cs.CV cs.AI

    Unified Editing of Panorama, 3D Scenes, and Videos Through Disentangled Self-Attention Injection

    Authors: Gihyun Kwon, Jangho Park, Jong Chul Ye

    Abstract: While text-to-image models have achieved impressive capabilities in image generation and editing, their application across various modalities often necessitates training separate models. Inspired by existing method of single image editing with self attention injection and video editing with shared attention, we propose a novel unified editing framework that combines the strengths of both approache…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Project Page: https://unifyediting.github.io/

  3. arXiv:2404.03913  [pdf, other]

    cs.CV cs.AI cs.LG

    Concept Weaver: Enabling Multi-Concept Fusion in Text-to-Image Models

    Authors: Gihyun Kwon, Simon Jenni, Dingzeyu Li, Joon-Young Lee, Jong Chul Ye, Fabian Caba Heilbron

    Abstract: While there has been significant progress in customizing text-to-image generation models, generating images that combine multiple personalized concepts remains challenging. In this work, we introduce Concept Weaver, a method for composing customized text-to-image diffusion models at inference time. Specifically, the method breaks the process into two steps: creating a template image aligned with t…

    Submitted 5 April, 2024; originally announced April 2024.

    Comments: CVPR 2024

  4. arXiv:2312.08223  [pdf, other]

    cs.CV

    Patch-wise Graph Contrastive Learning for Image Translation

    Authors: Chanyong Jung, Gihyun Kwon, Jong Chul Ye

    Abstract: Recently, patch-wise contrastive learning is drawing attention for the image translation by exploring the semantic correspondence between the input and output images. To further explore the patch-wise topology for high-level semantic understanding, here we exploit the graph neural network to capture the topology-aware features. Specifically, we construct the graph based on the patch-wise similarit…

    Submitted 19 February, 2024; v1 submitted 13 December, 2023; originally announced December 2023.

    Comments: AAAI 2024

  5. arXiv:2311.18608  [pdf, other]

    cs.CV cs.AI cs.LG

    Contrastive Denoising Score for Text-guided Latent Diffusion Image Editing

    Authors: Hyelin Nam, Gihyun Kwon, Geon Yeong Park, Jong Chul Ye

    Abstract: With the remarkable advent of text-to-image diffusion models, image editing methods have become more diverse and continue to evolve. A promising recent approach in this realm is Delta Denoising Score (DDS) - an image editing technique based on Score Distillation Sampling (SDS) framework that leverages the rich generative prior of text-to-image diffusion models. However, relying solely on the diffe…

    Submitted 1 April, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

    Comments: CVPR 2024 (poster); Project page: https://hyelinnam.github.io/CDS/

  6. arXiv:2310.02712  [pdf, other]

    cs.CV cs.AI cs.LG stat.ML

    ED-NeRF: Efficient Text-Guided Editing of 3D Scene with Latent Space NeRF

    Authors: Jangho Park, Gihyun Kwon, Jong Chul Ye

    Abstract: Recently, there has been a significant advancement in text-to-image diffusion models, leading to groundbreaking performance in 2D image generation. These advancements have been extended to 3D models, enabling the generation of novel 3D objects from textual descriptions. This has evolved into NeRF editing methods, which allow the manipulation of existing 3D objects through textual conditioning. How…

    Submitted 21 March, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

    Comments: ICLR 2024; Project Page: https://jhq1234.github.io/ed-nerf.github.io/

  7. arXiv:2306.04396  [pdf, other]

    cs.CV cs.AI cs.LG stat.ML

    Improving Diffusion-based Image Translation using Asymmetric Gradient Guidance

    Authors: Gihyun Kwon, Jong Chul Ye

    Abstract: Diffusion models have shown significant progress in image translation tasks recently. However, due to their stochastic nature, there's often a trade-off between style transformation and content preservation. Current strategies aim to disentangle style and content, preserving the source image's structure while successfully transitioning from a source to a target domain under text or one-shot image…

    Submitted 7 June, 2023; originally announced June 2023.

  8. arXiv:2305.18842  [pdf, other]

    cs.CL cs.AI cs.CV

    Generate then Select: Open-ended Visual Question Answering Guided by World Knowledge

    Authors: Xingyu Fu, Sheng Zhang, Gukyeong Kwon, Pramuditha Perera, Henghui Zhu, Yuhao Zhang, Alexander Hanbo Li, William Yang Wang, Zhiguo Wang, Vittorio Castelli, Patrick Ng, Dan Roth, Bing Xiang

    Abstract: The open-ended Visual Question Answering (VQA) task requires AI models to jointly reason over visual and natural language inputs using world knowledge. Recently, pre-trained Language Models (PLM) such as GPT-3 have been applied to the task and shown to be powerful world knowledge sources. However, these methods suffer from low knowledge coverage caused by PLM bias -- the tendency to generate certa…

    Submitted 30 May, 2023; originally announced May 2023.

    Comments: Accepted to ACL 2023 Findings

  9. arXiv:2305.15086  [pdf, other]

    cs.CV cs.AI cs.LG stat.ML

    Unpaired Image-to-Image Translation via Neural Schrödinger Bridge

    Authors: Beomsu Kim, Gihyun Kwon, Kwanyoung Kim, Jong Chul Ye

    Abstract: Diffusion models are a powerful class of generative models which simulate stochastic differential equations (SDEs) to generate data from noise. While diffusion models have achieved remarkable progress, they have limitations in unpaired image-to-image (I2I) translation tasks due to the Gaussian prior assumption. Schrödinger Bridge (SB), which learns an SDE to translate between two arbitrary distrib…

    Submitted 2 March, 2024; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: ICLR 2024

  10. arXiv:2304.02389  [pdf, other]

    eess.IV cs.CV cs.LG

    DRAC: Diabetic Retinopathy Analysis Challenge with Ultra-Wide Optical Coherence Tomography Angiography Images

    Authors: Bo Qian, Hao Chen, Xiangning Wang, Haoxuan Che, Gitaek Kwon, Jaeyoung Kim, Sungjin Choi, Seoyoung Shin, Felix Krause, Markus Unterdechler, Junlin Hou, Rui Feng, Yihao Li, Mostafa El Habib Daho, Qiang Wu, Ping Zhang, Xiaokang Yang, Yiyu Cai, Weiping Jia, Huating Li, Bin Sheng

    Abstract: Computer-assisted automatic analysis of diabetic retinopathy (DR) is of great importance in reducing the risks of vision loss and even blindness. Ultra-wide optical coherence tomography angiography (UW-OCTA) is a non-invasive and safe imaging modality in DR diagnosis system, but there is a lack of publicly available benchmarks for model development and evaluation. To promote further research and s…

    Submitted 5 April, 2023; originally announced April 2023.

  11. arXiv:2302.03900  [pdf, other]

    cs.CV cs.AI cs.LG stat.ML

    Zero-shot Generation of Coherent Storybook from Plain Text Story using Diffusion Models

    Authors: Hyeonho Jeong, Gihyun Kwon, Jong Chul Ye

    Abstract: Recent advancements in large scale text-to-image models have opened new possibilities for guiding the creation of images through human-devised natural language. However, while prior literature has primarily focused on the generation of individual images, it is essential to consider the capability of these models to ensure coherency within a sequence of images to fulfill the demands of real-world a…

    Submitted 8 February, 2023; originally announced February 2023.

  12. arXiv:2210.09558  [pdf, other]

    eess.IV cs.CV cs.LG

    Bag of Tricks for Developing Diabetic Retinopathy Analysis Framework to Overcome Data Scarcity

    Authors: Gitaek Kwon, Eunjin Kim, Sunho Kim, Seongwon Bak, Minsung Kim, Jaeyoung Kim

    Abstract: Recently, diabetic retinopathy (DR) screening utilizing ultra-wide optical coherence tomography angiography (UW-OCTA) has been used in clinical practices to detect signs of early DR. However, developing a deep learning-based DR analysis system using UW-OCTA images is not trivial due to the difficulty of data collection and the absence of public datasets. By realistic constraints, a model trained o…

    Submitted 17 October, 2022; originally announced October 2022.

  13. arXiv:2209.15264  [pdf, other]

    cs.CV cs.AI cs.LG stat.ML

    Diffusion-based Image Translation using Disentangled Style and Content Representation

    Authors: Gihyun Kwon, Jong Chul Ye

    Abstract: Diffusion-based image translation guided by semantic texts or a single target image has enabled flexible style transfer which is not limited to the specific domains. Unfortunately, due to the stochastic nature of diffusion models, it is often difficult to maintain the original content of the image during the reverse diffusion. To address this, here we present a novel diffusion-based unsupervised i…

    Submitted 1 February, 2023; v1 submitted 30 September, 2022; originally announced September 2022.

    Comments: ICLR 2023 camera ready

  14. arXiv:2208.02131  [pdf, other]

    cs.CV cs.CL cs.LG

    Masked Vision and Language Modeling for Multi-modal Representation Learning

    Authors: Gukyeong Kwon, Zhaowei Cai, Avinash Ravichandran, Erhan Bas, Rahul Bhotika, Stefano Soatto

    Abstract: In this paper, we study how to use masked signal modeling in vision and language (V+L) representation learning. Instead of developing masked language modeling (MLM) and masked image modeling (MIM) independently, we propose to build joint masked vision and language modeling, where the masked signal of one modality is reconstructed with the help from another modality. This is motivated by the nature…

    Submitted 14 March, 2023; v1 submitted 3 August, 2022; originally announced August 2022.

    Comments: International Conference on Learning Representations (ICLR) 2023

  15. arXiv:2206.11485  [pdf, other]

    eess.IV cs.LG

    Patient Aware Active Learning for Fine-Grained OCT Classification

    Authors: Yash-yee Logan, Ryan Benkert, Ahmad Mustafa, Gukyeong Kwon, Ghassan AlRegib

    Abstract: This paper considers making active learning more sensible from a medical perspective. In practice, a disease manifests itself in different forms across patient cohorts. Existing frameworks have primarily used mathematical constructs to engineer uncertainty or diversity-based methods for selecting the most informative samples. However, such algorithms do not present themselves naturally as usable b…

    Submitted 27 June, 2022; v1 submitted 23 June, 2022; originally announced June 2022.

    Comments: IEEE International Conference on Image Processing (ICIP)

  16. arXiv:2204.05626  [pdf, other]

    cs.CV cs.CL cs.LG

    X-DETR: A Versatile Architecture for Instance-wise Vision-Language Tasks

    Authors: Zhaowei Cai, Gukyeong Kwon, Avinash Ravichandran, Erhan Bas, Zhuowen Tu, Rahul Bhotika, Stefano Soatto

    Abstract: In this paper, we study the challenging instance-wise vision-language tasks, where the free-form language is required to align with the objects instead of the whole image. To address these tasks, we propose X-DETR, whose architecture has three major components: an object detector, a language encoder, and vision-language alignment. The vision and language streams are independent until the end and t…

    Submitted 12 April, 2022; originally announced April 2022.

  17. arXiv:2203.10622  [pdf, other]

    eess.IV cs.CV

    Multi-Modal Learning Using Physicians Diagnostics for Optical Coherence Tomography Classification

    Authors: Y. Logan, K. Kokilepersaud, G. Kwon, G. AlRegib, C. Wykoff, H. Yu

    Abstract: In this paper, we propose a framework that incorporates experts diagnostics and insights into the analysis of Optical Coherence Tomography (OCT) using multi-modal learning. To demonstrate the effectiveness of this approach, we create a medical diagnostic attribute dataset to improve disease classification using OCT. Although there have been successful attempts to deploy machine learning for diseas…

    Submitted 20 March, 2022; originally announced March 2022.

  18. arXiv:2203.09301  [pdf, other]

    cs.CV cs.AI cs.LG stat.ML

    One-Shot Adaptation of GAN in Just One CLIP

    Authors: Gihyun Kwon, Jong Chul Ye

    Abstract: There are many recent research efforts to fine-tune a pre-trained generator with a few target images to generate images of a novel domain. Unfortunately, these methods often suffer from overfitting or under-fitting when fine-tuned with a single target image. To address this, here we present a novel single-shot GAN adaptation method through unified CLIP space manipulations. Specifically, our model…

    Submitted 30 January, 2023; v1 submitted 17 March, 2022; originally announced March 2022.

  19. arXiv:2203.04195  [pdf, other]

    cs.LG cs.CV

    A Gating Model for Bias Calibration in Generalized Zero-shot Learning

    Authors: Gukyeong Kwon, Ghassan AlRegib

    Abstract: Generalized zero-shot learning (GZSL) aims at training a model that can generalize to unseen class data by only using auxiliary information. One of the main challenges in GZSL is a biased model prediction toward seen classes caused by overfitting on only available seen class data during training. To overcome this issue, we propose a two-stream autoencoder-based gating model for GZSL. Our gating mo…

    Submitted 8 March, 2022; originally announced March 2022.

    Comments: IEEE Transactions on Image Processing, 2022. Code is available at https://github.com/gukyeongkwon/gating-ae

  20. arXiv:2203.01532  [pdf, other]

    cs.CV

    Exploring Patch-wise Semantic Relation for Contrastive Learning in Image-to-Image Translation Tasks

    Authors: Chanyong Jung, Gihyun Kwon, Jong Chul Ye

    Abstract: Recently, contrastive learning-based image translation methods have been proposed, which contrasts different spatial locations to enhance the spatial correspondence. However, the methods often ignore the diverse semantic relation within the images. To address this, here we propose a novel semantic relation consistency (SRC) regularization along with the decoupled contrastive learning, which utiliz…

    Submitted 3 March, 2022; originally announced March 2022.

    Comments: CVPR 2022

  21. arXiv:2112.00374  [pdf, other]

    cs.CV cs.CL eess.IV

    CLIPstyler: Image Style Transfer with a Single Text Condition

    Authors: Gihyun Kwon, Jong Chul Ye

    Abstract: Existing neural style transfer methods require reference style images to transfer texture information of style images to content images. However, in many practical situations, users may not have reference style images but still be interested in transferring styles by just imagining them. In order to deal with such applications, we propose a new framework that enables a style transfer `without' a s…

    Submitted 19 March, 2022; v1 submitted 1 December, 2021; originally announced December 2021.

    Comments: CVPR 2022 camera ready

  22. arXiv:2103.16146  [pdf, other]

    cs.CV

    Diagonal Attention and Style-based GAN for Content-Style Disentanglement in Image Generation and Translation

    Authors: Gihyun Kwon, Jong Chul Ye

    Abstract: One of the important research topics in image generative models is to disentangle the spatial contents and styles for their separate control. Although StyleGAN can generate content feature vectors from random noises, the resulting spatial content control is primarily intended for minor spatial variations, and the disentanglement of global content and styles is by no means complete. Inspired by a m…

    Submitted 23 July, 2021; v1 submitted 30 March, 2021; originally announced March 2021.

    Comments: ICCV 2021

  23. arXiv:2008.06094  [pdf, other]

    cs.CV

    Novelty Detection Through Model-Based Characterization of Neural Networks

    Authors: Gukyeong Kwon, Mohit Prabhushankar, Dogancan Temel, Ghassan AlRegib

    Abstract: In this paper, we propose a model-based characterization of neural networks to detect novel input types and conditions. Novelty detection is crucial to identify abnormal inputs that can significantly degrade the performance of machine learning algorithms. Majority of existing studies have focused on activation-based representations to detect abnormal inputs, which limits the characterization of ab…

    Submitted 13 August, 2020; originally announced August 2020.

    Comments: IEEE International Conference on Image Processing (ICIP) 2020

  24. arXiv:2008.00178  [pdf, other]

    cs.CV cs.AI cs.LG

    Contrastive Explanations in Neural Networks

    Authors: Mohit Prabhushankar, Gukyeong Kwon, Dogancan Temel, Ghassan AlRegib

    Abstract: Visual explanations are logical arguments based on visual features that justify the predictions made by neural networks. Current modes of visual explanations answer questions of the form $`Why \text{ } P?'$. These $Why$ questions operate under broad contexts thereby providing answers that are irrelevant in some cases. We propose to constrain these $Why$ questions based on some context $Q$ so that…

    Submitted 1 August, 2020; originally announced August 2020.

  25. arXiv:2007.09507  [pdf, other]

    cs.CV

    Backpropagated Gradient Representations for Anomaly Detection

    Authors: Gukyeong Kwon, Mohit Prabhushankar, Dogancan Temel, Ghassan AlRegib

    Abstract: Learning representations that clearly distinguish between normal and abnormal data is key to the success of anomaly detection. Most of existing anomaly detection algorithms use activation representations from forward propagation while not exploiting gradients from backpropagation to characterize data. Gradients capture model updates required to represent data. Anomalies require more drastic model…

    Submitted 18 July, 2020; originally announced July 2020.

    Comments: European Conference on Computer Vision (ECCV) 2020

  26. arXiv:1908.09998  [pdf, other]

    cs.CV eess.IV

    Distorted Representation Space Characterization Through Backpropagated Gradients

    Authors: Gukyeong Kwon, Mohit Prabhushankar, Dogancan Temel, Ghassan AlRegib

    Abstract: In this paper, we utilize weight gradients from backpropagation to characterize the representation space learned by deep learning algorithms. We demonstrate the utility of such gradients in applications including perceptual image quality assessment and out-of-distribution classification. The applications are chosen to validate the effectiveness of gradients as features when the test image distribu…

    Submitted 26 August, 2019; originally announced August 2019.

    Comments: 5 pages, 5 figures, 2 tables, ICIP 2019

  27. arXiv:1908.08239  [pdf, other]

    cs.CV

    Progressive Face Super-Resolution via Attention to Facial Landmark

    Authors: Deokyun Kim, Minseon Kim, Gihyun Kwon, Dae-Shik Kim

    Abstract: Face Super-Resolution (SR) is a subfield of the SR domain that specifically targets the reconstruction of face images. The main challenge of face SR is to restore essential facial features without distortion. We propose a novel face SR method that generates photo-realistic 8x super-resolved face images with fully retained facial details. To that end, we adopt a progressive training method, which a…

    Submitted 22 August, 2019; originally announced August 2019.

    Comments: BMVC 2019 Accepted

  28. arXiv:1908.02498  [pdf, other]

    eess.IV cs.CV

    Generation of 3D Brain MRI Using Auto-Encoding Generative Adversarial Networks

    Authors: Gihyun Kwon, Chihye Han, Dae-shik Kim

    Abstract: As deep learning is showing unprecedented success in medical image analysis tasks, the lack of sufficient medical data is emerging as a critical problem. While recent attempts to solve the limited data problem using Generative Adversarial Networks (GAN) have been successful in generating realistic images with diversity, most of them are based on image-to-image translation and thus require extensiv…

    Submitted 7 August, 2019; originally announced August 2019.

    Comments: 8.5 pages, 4 figures, Accepted by the 22nd International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2019)

  29. arXiv:1905.02422  [pdf, other]

    q-bio.NC cs.AI cs.CV

    Representation of White- and Black-Box Adversarial Examples in Deep Neural Networks and Humans: A Functional Magnetic Resonance Imaging Study

    Authors: Chihye Han, Wonjun Yoon, Gihyun Kwon, Seungkyu Nam, Daeshik Kim

    Abstract: The recent success of brain-inspired deep neural networks (DNNs) in solving complex, high-level visual tasks has led to rising expectations for their potential to match the human visual system. However, DNNs exhibit idiosyncrasies that suggest their visual representation and processing might be substantially different from human vision. One limitation of DNNs is that they are vulnerable to adversa…

    Submitted 7 May, 2019; originally announced May 2019.

    Comments: Copyright 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

  30. arXiv:1902.06334  [pdf, other]

    cs.CV eess.IV

    Semantically Interpretable and Controllable Filter Sets

    Authors: Mohit Prabhushankar, Gukyeong Kwon, Dogancan Temel, Ghassan AlRegib

    Abstract: In this paper, we generate and control semantically interpretable filters that are directly learned from natural images in an unsupervised fashion. Each semantic filter learns a visually interpretable local structure in conjunction with other filters. The significance of learning these interpretable filter sets is demonstrated on two contrasting applications. The first application is image recogni…

    Submitted 17 February, 2019; originally announced February 2019.

    Comments: 5 pages, 5 figures, 1 table

    ACM Class: I.2; I.4; I.5

  31. Power of Tempospatially Unified Spectral Density for Perceptual Video Quality Assessment

    Authors: Mohammed A. Aabed, Gukyeong Kwon, Ghassan AlRegib

    Abstract: We propose a perceptual video quality assessment (PVQA) metric for distorted videos by analyzing the power spectral density (PSD) of a group of pictures. This is an estimation approach that relies on the changes in video dynamic calculated in the frequency domain and are primarily caused by distortion. We obtain a feature map by processing a 3D PSD tensor obtained from a set of distorted frames. T…

    Submitted 12 December, 2018; originally announced December 2018.

    Comments: 6 pages, 4 figures, 3 tables

    Journal ref: M. A. Aabed, G. Kwon, and G. AlRegib, "Power of Tempospatially Unified Spectral Density for Perceptual Video Quality Assessment," 2017 IEEE International Conference on Multimedia and Expo (ICME), Hong Kong, 2017, pp. 1476-1481

  32. arXiv:1712.02463  [pdf, other]

    cs.CV

    CURE-TSR: Challenging Unreal and Real Environments for Traffic Sign Recognition

    Authors: Dogancan Temel, Gukyeong Kwon, Mohit Prabhushankar, Ghassan AlRegib

    Abstract: In this paper, we investigate the robustness of traffic sign recognition algorithms under challenging conditions. Existing datasets are limited in terms of their size and challenging condition coverage, which motivated us to generate the Challenging Unreal and Real Environments for Traffic Sign Recognition (CURE-TSR) dataset. It includes more than two million traffic sign images that are based on…

    Submitted 13 November, 2018; v1 submitted 6 December, 2017; originally announced December 2017.

    Comments: 31st Conference on Neural Information Processing Systems (NIPS), Machine Learning for Intelligent Transportation Systems Workshop, Long Beach, CA, USA, 2017

    ACM Class: I.2; I.4; I.5

    Journal ref: D. Temel, G. Kwon*, M. Prabhushankar*, and G. AlRegib, "CURE-TSR: Challenging unreal and real environments for traffic sign recognition," Neural Information Processing Systems (NIPS) MLITSW, December 2017

  33. arXiv:cs/0606107  [pdf]

    cs.HC

    Human Information Processing with the Personal Memex

    Authors: Ingrid Burbey, Gyuhyun Kwon, Uma Murthy, Nicholas Polys, Prince Vincent

    Abstract: In this report, we describe the work done in a project that explored the human information processing aspects of a personal memex (a memex to organize personal information). In the project, we considered the use of the personal memex, focusing on information recall, by three populations: people with Mild Cognitive Impairment, those diagnosed with Macular Degeneration, and a high-functioning popu…

    Submitted 26 June, 2006; originally announced June 2006.