
Showing 1–50 of 130 results for author: Min, D

  1. arXiv:2409.08566  [pdf, other]

    cs.CV

    Hybrid-TTA: Continual Test-time Adaptation via Dynamic Domain Shift Detection

    Authors: Hyewon Park, Hyejin Park, Jueun Ko, Dongbo Min

    Abstract: Continual Test Time Adaptation (CTTA) has emerged as a critical approach for bridging the domain gap between the controlled training environments and the real-world scenarios, enhancing model adaptability and robustness. Existing CTTA methods, typically categorized into Full-Tuning (FT) and Efficient-Tuning (ET), struggle with effectively addressing domain shifts. To overcome these challenges, we…

    Submitted 13 September, 2024; originally announced September 2024.

  2. arXiv:2409.02846  [pdf, other]

    cs.CV

    MaDis-Stereo: Enhanced Stereo Matching via Distilled Masked Image Modeling

    Authors: Jihye Ahn, Hyesong Choi, Soomin Kim, Dongbo Min

    Abstract: In stereo matching, CNNs have traditionally served as the predominant architectures. Although Transformer-based stereo models have been studied recently, their performance still lags behind CNN-based stereo models due to the inherent data scarcity issue in the stereo matching task. In this paper, we propose Masked Image Modeling Distilled Stereo matching model, termed MaDis-Stereo, that enhances l…

    Submitted 4 September, 2024; originally announced September 2024.

  3. arXiv:2409.02838  [pdf, other]

    cs.CV

    iConFormer: Dynamic Parameter-Efficient Tuning with Input-Conditioned Adaptation

    Authors: Hayeon Jo, Hyesong Choi, Minhee Cho, Dongbo Min

    Abstract: Transfer learning based on full fine-tuning (FFT) of the pre-trained encoder and task-specific decoder becomes increasingly complex as deep models grow exponentially. Parameter efficient fine-tuning (PEFT) approaches using adapters consisting of small learnable layers have emerged as an alternative to FFT, achieving comparable performance while maintaining high training efficiency. However, the in…

    Submitted 4 September, 2024; originally announced September 2024.

  4. arXiv:2409.02699  [pdf, other]

    cs.CV

    CLDA: Collaborative Learning for Enhanced Unsupervised Domain Adaptation

    Authors: Minhee Cho, Hyesong Choi, Hayeon Jo, Dongbo Min

    Abstract: Unsupervised Domain Adaptation (UDA) endeavors to bridge the gap between a model trained on a labeled source domain and its deployment in an unlabeled target domain. However, current high-performance models demand significant resources, resulting in prohibitive deployment costs and highlighting the need for small yet effective models. For UDA of lightweight models, Knowledge Distillation (KD) in a…

    Submitted 4 September, 2024; originally announced September 2024.

  5. arXiv:2409.02545  [pdf, other]

    cs.CV

    UniTT-Stereo: Unified Training of Transformer for Enhanced Stereo Matching

    Authors: Soomin Kim, Hyesong Choi, Jihye Ahn, Dongbo Min

    Abstract: Unlike other vision tasks where Transformer-based approaches are becoming increasingly common, stereo depth estimation is still dominated by convolution-based approaches. This is mainly due to the limited availability of real-world ground truth for stereo matching, which is a limiting factor in improving the performance of Transformer-based stereo approaches. In this paper, we propose UniTT-Stereo…

    Submitted 4 September, 2024; originally announced September 2024.

  6. arXiv:2409.02513  [pdf, other]

    cs.CV

    SG-MIM: Structured Knowledge Guided Efficient Pre-training for Dense Prediction

    Authors: Sumin Son, Hyesong Choi, Dongbo Min

    Abstract: Masked Image Modeling (MIM) techniques have redefined the landscape of computer vision, enabling pre-trained models to achieve exceptional performance across a broad spectrum of tasks. Despite their success, the full potential of MIM-based methods in dense prediction tasks, particularly in depth estimation, remains untapped. Existing MIM approaches primarily rely on single-image inputs, which make…

    Submitted 4 September, 2024; originally announced September 2024.

  7. arXiv:2409.01627  [pdf, other]

    cs.CV

    Dynamic Guidance Adversarial Distillation with Enhanced Teacher Knowledge

    Authors: Hyejin Park, Dongbo Min

    Abstract: In the realm of Adversarial Distillation (AD), strategic and precise knowledge transfer from an adversarially robust teacher model to a less robust student model is paramount. Our Dynamic Guidance Adversarial Distillation (DGAD) framework directly tackles the challenge of differential sample importance, with a keen focus on rectifying the teacher model's misclassifications. DGAD employs Misclassif…

    Submitted 3 September, 2024; originally announced September 2024.

  8. arXiv:2407.18892  [pdf, other]

    cs.RO cs.AI eess.SY

    SHANGUS: Deep Reinforcement Learning Meets Heuristic Optimization for Speedy Frontier-Based Exploration of Autonomous Vehicles in Unknown Spaces

    Authors: Seunghyeop Nam, Tuan Anh Nguyen, Eunmi Choi, Dugki Min

    Abstract: This paper introduces SHANGUS, an advanced framework combining Deep Reinforcement Learning (DRL) with heuristic optimization to improve frontier-based exploration efficiency in unknown environments, particularly for intelligent vehicles in autonomous air services, search and rescue operations, and space exploration robotics. SHANGUS harnesses DRL's adaptability and heuristic prioritization, marked…

    Submitted 26 July, 2024; originally announced July 2024.

  9. arXiv:2406.15755  [pdf, other]

    cs.CV cs.AI

    Fine-grained Background Representation for Weakly Supervised Semantic Segmentation

    Authors: Xu Yin, Woobin Im, Dongbo Min, Yuchi Huo, Fei Pan, Sung-Eui Yoon

    Abstract: Generating reliable pseudo masks from image-level labels is challenging in the weakly supervised semantic segmentation (WSSS) task due to the lack of spatial information. Prevalent class activation map (CAM)-based solutions are challenged to discriminate the foreground (FG) objects from the suspicious background (BG) pixels (a.k.a. co-occurring) and learn the integral object regions. This paper pr…

    Submitted 22 June, 2024; originally announced June 2024.

  10. arXiv:2406.15725  [pdf, other]

    eess.AS cs.SD

    Self Training and Ensembling Frequency Dependent Networks with Coarse Prediction Pooling and Sound Event Bounding Boxes

    Authors: Hyeonuk Nam, Deokki Min, Seungdeok Choi, Inhan Choi, Yong-Hwa Park

    Abstract: To tackle sound event detection (SED), we propose frequency dependent networks (FreDNets), which heavily leverage frequency-dependent methods. We apply frequency warping and FilterAugment, which are frequency-dependent data augmentation methods. The model architecture consists of 3 branches: audio teacher-student transformer (ATST) branch, BEATs branch and CNN branch including either partial dilat…

    Submitted 19 September, 2024; v1 submitted 22 June, 2024; originally announced June 2024.

    Comments: DCASE 2024 Challenge Task 4 technical report, DCASE 2024 Workshop accepted

  11. arXiv:2406.05341  [pdf, other]

    eess.AS cs.SD

    Diversifying and Expanding Frequency-Adaptive Convolution Kernels for Sound Event Detection

    Authors: Hyeonuk Nam, Seong-Hu Kim, Deokki Min, Junhyeok Lee, Yong-Hwa Park

    Abstract: Frequency dynamic convolution (FDY conv) has shown the state-of-the-art performance in sound event detection (SED) using frequency-adaptive kernels obtained by frequency-varying combination of basis kernels. However, FDY conv lacks an explicit mean to diversify frequency-adaptive kernels, potentially limiting the performance. In addition, size of basis kernels is limited while time-frequency patte…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: Accepted to INTERSPEECH 2024

  12. arXiv:2406.02596  [pdf, other]

    cs.LG cs.AI

    Slow and Steady Wins the Race: Maintaining Plasticity with Hare and Tortoise Networks

    Authors: Hojoon Lee, Hyeonseo Cho, Hyunseung Kim, Donghu Kim, Dugki Min, Jaegul Choo, Clare Lyle

    Abstract: This study investigates the loss of generalization ability in neural networks, revisiting warm-starting experiments from Ash & Adams. Our empirical analysis reveals that common methods designed to enhance plasticity by maintaining trainability provide limited benefits to generalization. While reinitializing the network can be effective, it also risks losing valuable prior knowledge. To this end, w…

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: accepted to ICML 2024

  13. arXiv:2404.08330  [pdf, other]

    cs.CV

    Emerging Property of Masked Token for Effective Pre-training

    Authors: Hyesong Choi, Hunsang Lee, Seyoung Joung, Hyejin Park, Jiyeong Kim, Dongbo Min

    Abstract: Driven by the success of Masked Language Modeling (MLM), the realm of self-supervised learning for computer vision has been invigorated by the central role of Masked Image Modeling (MIM) in driving recent breakthroughs. Notwithstanding the achievements of MIM across various downstream tasks, its overall efficiency is occasionally hampered by the lengthy duration of the pre-training phase. This pap…

    Submitted 12 April, 2024; originally announced April 2024.

  14. arXiv:2404.08327  [pdf, other]

    cs.CV

    Salience-Based Adaptive Masking: Revisiting Token Dynamics for Enhanced Pre-training

    Authors: Hyesong Choi, Hyejin Park, Kwang Moo Yi, Sungmin Cha, Dongbo Min

    Abstract: In this paper, we introduce Saliency-Based Adaptive Masking (SBAM), a novel and cost-effective approach that significantly enhances the pre-training performance of Masked Image Modeling (MIM) approaches by prioritizing token salience. Our method provides robustness against variations in masking ratios, effectively mitigating the performance instability issues common in existing methods. This relax…

    Submitted 12 April, 2024; originally announced April 2024.

  15. arXiv:2404.00636  [pdf, other]

    cs.CV cs.AI cs.MM

    Learning to Generate Conditional Tri-plane for 3D-aware Expression Controllable Portrait Animation

    Authors: Taekyung Ki, Dongchan Min, Gyeongsu Chae

    Abstract: In this paper, we present Export3D, a one-shot 3D-aware portrait animation method that is able to control the facial expression and camera view of a given portrait image. To achieve this, we introduce a tri-plane generator with an effective expression conditioning method, which directly generates a tri-plane of 3D prior by transferring the expression parameter of 3DMM into the source image. The tr…

    Submitted 23 July, 2024; v1 submitted 31 March, 2024; originally announced April 2024.

    Comments: ECCV 2024. Project page: https://export3d.github.io

  16. arXiv:2403.19723  [pdf, other]

    cs.CL cs.AI cs.DB cs.MM

    HGT: Leveraging Heterogeneous Graph-enhanced Large Language Models for Few-shot Complex Table Understanding

    Authors: Rihui Jin, Yu Li, Guilin Qi, Nan Hu, Yuan-Fang Li, Jiaoyan Chen, Jianan Wang, Yongrui Chen, Dehai Min

    Abstract: Table understanding (TU) has achieved promising advancements, but it faces the challenges of the scarcity of manually labeled tables and the presence of complex table structures. To address these challenges, we propose HGT, a framework with a heterogeneous graph (HG)-enhanced large language model (LLM) to tackle few-shot TU tasks. It leverages the LLM by aligning the table semantics with the LLM's p…

    Submitted 27 March, 2024; originally announced March 2024.

  17. arXiv:2403.19305  [pdf, other]

    cs.CL cs.AI

    MATEval: A Multi-Agent Discussion Framework for Advancing Open-Ended Text Evaluation

    Authors: Yu Li, Shenyu Zhang, Rui Wu, Xiutian Huang, Yongrui Chen, Wenhao Xu, Guilin Qi, Dehai Min

    Abstract: Recent advancements in generative Large Language Models (LLMs) have been remarkable; however, the quality of the text generated by these models often reveals persistent issues. Evaluating the quality of text generated by these models, especially in open-ended text, has consistently presented a significant challenge. Addressing this, recent work has explored the possibility of using LLMs as evaluato…

    Submitted 15 April, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

    Comments: This paper has been accepted as a long paper presentation by DASFAA 2024 Industrial Track

  18. arXiv:2403.13578  [pdf, other]

    cs.CL cs.LG

    Dynamic Reward Adjustment in Multi-Reward Reinforcement Learning for Counselor Reflection Generation

    Authors: Do June Min, Veronica Perez-Rosas, Kenneth Resnicow, Rada Mihalcea

    Abstract: In this paper, we study the problem of multi-reward reinforcement learning to jointly optimize for multiple text qualities for natural language generation. We focus on the task of counselor reflection generation, where we optimize the generators to simultaneously improve the fluency, coherence, and reflection quality of generated counselor responses. We introduce two novel bandit methods, DynaOpt…

    Submitted 20 March, 2024; originally announced March 2024.

  19. arXiv:2402.12869  [pdf, other]

    cs.CL

    Exploring the Impact of Table-to-Text Methods on Augmenting LLM-based Question Answering with Domain Hybrid Data

    Authors: Dehai Min, Nan Hu, Rihui Jin, Nuo Lin, Jiaoyan Chen, Yongrui Chen, Yu Li, Guilin Qi, Yun Li, Nijun Li, Qianren Wang

    Abstract: Augmenting Large Language Models (LLMs) for Question Answering (QA) with domain specific data has attracted wide attention. However, domain data often exists in a hybrid format, including text and semi-structured tables, posing challenges for the seamless integration of information. Table-to-Text Generation is a promising solution by facilitating the transformation of hybrid data into a uniformly…

    Submitted 9 April, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: Accepted to NAACL 2024 Industry Track Paper

  20. arXiv:2311.08300  [pdf, other]

    cs.CL cs.AI

    Workflow-Guided Response Generation for Task-Oriented Dialogue

    Authors: Do June Min, Paloma Sodhi, Ramya Ramakrishnan

    Abstract: Task-oriented dialogue (TOD) systems aim to achieve specific goals through interactive dialogue. Such tasks usually involve following specific workflows, i.e. executing a sequence of actions in a particular order. While prior work has focused on supervised learning methods to condition on past actions, they do not explicitly optimize for compliance to a desired workflow. In this paper, we propose…

    Submitted 14 November, 2023; originally announced November 2023.

  21. arXiv:2311.08299  [pdf, other]

    cs.CL cs.AI

    VERVE: Template-based ReflectiVE Rewriting for MotiVational IntErviewing

    Authors: Do June Min, Verónica Pérez-Rosas, Kenneth Resnicow, Rada Mihalcea

    Abstract: Reflective listening is a fundamental skill that counselors must acquire to achieve proficiency in motivational interviewing (MI). It involves responding in a manner that acknowledges and explores the meaning of what the client has expressed in the conversation. In this work, we introduce the task of counseling response rewriting, which transforms non-reflective statements into reflective response…

    Submitted 8 March, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

  22. arXiv:2310.15482  [pdf, other]

    cs.CV

    Salient Object Detection in RGB-D Videos

    Authors: Ao Mou, Yukang Lu, Jiahao He, Dingyao Min, Keren Fu, Qijun Zhao

    Abstract: Given the widespread adoption of depth-sensing acquisition devices, RGB-D videos and related data/media have gained considerable traction in various aspects of daily life. Consequently, conducting salient object detection (SOD) in RGB-D videos presents a highly promising and evolving avenue. Despite the potential of this area, SOD in RGB-D videos remains somewhat under-explored, with RGB-D SOD and…

    Submitted 21 May, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

    Comments: IEEE TIP (under major revision)

  23. arXiv:2306.11427  [pdf]

    eess.AS

    Auditory Neural Response Inspired Sound Event Detection Based on Spectro-temporal Receptive Field

    Authors: Deokki Min, Hyeonuk Nam, Yong-Hwa Park

    Abstract: Sound event detection (SED) is one of tasks to automate function by human auditory system which listens and understands auditory scenes. Therefore, we were inspired to make SED recognize sound events in the way human auditory system does. Spectro-temporal receptive field (STRF), an approach to describe the relationship between perceived sound at ear and transformed neural response in the auditory…

    Submitted 20 June, 2023; originally announced June 2023.

    Comments: Submitted to DCASE 2023 Workshop

  24. arXiv:2306.11277  [pdf, other]

    cs.SD eess.AS

    Frequency & Channel Attention for Computationally Efficient Sound Event Detection

    Authors: Hyeonuk Nam, Seong-Hu Kim, Deokki Min, Yong-Hwa Park

    Abstract: We explore various attention methods on frequency and channel dimensions for sound event detection (SED) in order to enhance performance with minimal increase in computational cost while leveraging domain knowledge to address the frequency dimension of audio data. We have introduced frequency dynamic convolution (FDY conv) in a previous work to release the translational equivariance issue assoc…

    Submitted 28 August, 2023; v1 submitted 20 June, 2023; originally announced June 2023.

    Comments: Accepted to DCASE 2023 workshop

  25. arXiv:2306.01866  [pdf, ps, other]

    math.DG

    Construction of higher dimensional ALF Calabi-Yau metrics

    Authors: Daheng Min

    Abstract: Roughly speaking, an ALF metric of real dimension $4n$ should be a metric such that its asymptotic cone is $4n-1$ dimensional, the volume growth of this metric is of order $4n-1$ and its sectional curvature tends to 0 at infinity. In this paper, I will first show that the Taub-NUT deformation of a hyperkähler cone with respect to a locally free $\mathbb{S}^1-$symmetry is ALF hyperkähler. Modelle…

    Submitted 2 June, 2023; originally announced June 2023.

    MSC Class: 53C25 (Primary) 53C55; 53C26; 53C30; 53D20 (Secondary)

  26. arXiv:2305.19135  [pdf, other]

    cs.CV

    Context-Preserving Two-Stage Video Domain Translation for Portrait Stylization

    Authors: Doyeon Kim, Eunji Ko, Hyunsu Kim, Yunji Kim, Junho Kim, Dongchan Min, Junmo Kim, Sung Ju Hwang

    Abstract: Portrait stylization, which translates a real human face image into an artistically stylized image, has attracted considerable interest and many prior works have shown impressive quality in recent years. However, despite their remarkable performances in the image-level translation tasks, prior methods show unsatisfactory results when they are applied to the video domain. To address the issue, we p…

    Submitted 30 May, 2023; originally announced May 2023.

    Comments: 5 pages, 3 figures, CVPR 2023 Workshop on AI for Content Creation

  27. arXiv:2305.12544  [pdf, other]

    cs.CL cs.AI

    Has It All Been Solved? Open NLP Research Questions Not Solved by Large Language Models

    Authors: Oana Ignat, Zhijing Jin, Artem Abzaliev, Laura Biester, Santiago Castro, Naihao Deng, Xinyi Gao, Aylin Gunal, Jacky He, Ashkan Kazemi, Muhammad Khalifa, Namho Koh, Andrew Lee, Siyang Liu, Do June Min, Shinka Mori, Joan Nwatu, Veronica Perez-Rosas, Siqi Shen, Zekun Wang, Winston Wu, Rada Mihalcea

    Abstract: Recent progress in large language models (LLMs) has enabled the deployment of many generative NLP applications. At the same time, it has also led to a misleading public discourse that ``it's all been solved.'' Not surprisingly, this has, in turn, made many NLP researchers -- especially those at the beginning of their careers -- worry about what NLP research area they should focus on. Has it all be…

    Submitted 15 March, 2024; v1 submitted 21 May, 2023; originally announced May 2023.

    Comments: Accepted at COLING 2024

  28. arXiv:2305.00521  [pdf, other]

    cs.CV cs.AI cs.LG

    StyleLipSync: Style-based Personalized Lip-sync Video Generation

    Authors: Taekyung Ki, Dongchan Min

    Abstract: In this paper, we present StyleLipSync, a style-based personalized lip-sync video generative model that can generate identity-agnostic lip-synchronizing video from arbitrary audio. To generate a video of arbitrary identities, we leverage expressive lip prior from the semantically rich latent space of a pre-trained StyleGAN, where we can also design a video consistency with a linear transformation.…

    Submitted 12 February, 2024; v1 submitted 30 April, 2023; originally announced May 2023.

    Comments: International Conference on Computer Vision (ICCV) 2023. Project page: https://stylelipsync.github.io

  29. Adaptive Endpointing with Deep Contextual Multi-armed Bandits

    Authors: Do June Min, Andreas Stolcke, Anirudh Raju, Colin Vaz, Di He, Venkatesh Ravichandran, Viet Anh Trinh

    Abstract: Current endpointing (EP) solutions learn in a supervised framework, which does not allow the model to incorporate feedback and improve in an online setting. Also, it is a common practice to utilize costly grid-search to find the best configuration for an endpointing model. In this paper, we aim to provide a solution for adaptive endpointing by proposing an efficient method for choosing an optimal…

    Submitted 23 March, 2023; originally announced March 2023.

    Journal ref: Proc. IEEE ICASSP, June 2023

  30. arXiv:2303.10368  [pdf, other]

    cs.CL

    An Empirical Study of Pre-trained Language Models in Simple Knowledge Graph Question Answering

    Authors: Nan Hu, Yike Wu, Guilin Qi, Dehai Min, Jiaoyan Chen, Jeff Z. Pan, Zafar Ali

    Abstract: Large-scale pre-trained language models (PLMs) such as BERT have recently achieved great success and become a milestone in natural language processing (NLP). It is now the consensus of the NLP community to adopt PLMs as the backbone for downstream tasks. In recent works on knowledge graph question answering (KGQA), BERT or its variants have become necessary in their KGQA models. However, there is…

    Submitted 18 March, 2023; originally announced March 2023.

    Comments: Accepted by World Wide Web Journal

  31. arXiv:2303.07992  [pdf, other]

    cs.CL

    Can ChatGPT Replace Traditional KBQA Models? An In-depth Analysis of the Question Answering Performance of the GPT LLM Family

    Authors: Yiming Tan, Dehai Min, Yu Li, Wenbo Li, Nan Hu, Yongrui Chen, Guilin Qi

    Abstract: ChatGPT is a powerful large language model (LLM) that covers knowledge resources such as Wikipedia and supports natural language question answering using its own knowledge. Therefore, there is growing interest in exploring whether ChatGPT can replace traditional knowledge-based question answering (KBQA) models. Although there have been some works analyzing the question answering performance of Cha…

    Submitted 20 September, 2023; v1 submitted 14 March, 2023; originally announced March 2023.

    Comments: To be published in Proceedings of ISWC 2023, 22nd International Semantic Web Conference

  32. arXiv:2211.09383  [pdf, other]

    eess.AS cs.AI cs.SD

    Grad-StyleSpeech: Any-speaker Adaptive Text-to-Speech Synthesis with Diffusion Models

    Authors: Minki Kang, Dongchan Min, Sung Ju Hwang

    Abstract: There has been a significant progress in Text-To-Speech (TTS) synthesis technology in recent years, thanks to the advancement in neural generative modeling. However, existing methods on any-speaker adaptive TTS have achieved unsatisfactory performance, due to their suboptimal accuracy in mimicking the target speakers' styles. In this work, we present Grad-StyleSpeech, which is an any-speaker adapt…

    Submitted 13 March, 2023; v1 submitted 17 November, 2022; originally announced November 2022.

    Comments: ICASSP 2023

  33. arXiv:2210.02689  [pdf, other]

    cs.CV

    Neural Matching Fields: Implicit Representation of Matching Fields for Visual Correspondence

    Authors: Sunghwan Hong, Jisu Nam, Seokju Cho, Susung Hong, Sangryul Jeon, Dongbo Min, Seungryong Kim

    Abstract: Existing pipelines of semantic correspondence commonly include extracting high-level semantic features for the invariance against intra-class variations and background clutters. This architecture, however, inevitably results in a low-resolution matching field that additionally requires an ad-hoc interpolation process as a post-processing for converting it into a high-resolution one, certainly limi…

    Submitted 6 October, 2022; originally announced October 2022.

    Comments: NeurIPS2022 camera ready

  34. arXiv:2210.00223  [pdf, other]

    cs.CV

    Contour-Aware Equipotential Learning for Semantic Segmentation

    Authors: Xu Yin, Dongbo Min, Yuchi Huo, Sung-Eui Yoon

    Abstract: With increasing demands for high-quality semantic segmentation in the industry, hard-distinguishing semantic boundaries have posed a significant threat to existing solutions. Inspired by real-life experience, i.e., combining varied observations contributes to higher visual recognition confidence, we present the equipotential learning (EPL) method. This novel module transfers the predicted/ground-t…

    Submitted 1 October, 2022; originally announced October 2022.

  35. arXiv:2209.02518  [pdf]

    cs.CV

    Sequential Cross Attention Based Multi-task Learning

    Authors: Sunkyung Kim, Hyesong Choi, Dongbo Min

    Abstract: In multi-task learning (MTL) for visual scene understanding, it is crucial to transfer useful information between multiple tasks with minimal interferences. In this paper, we propose a novel architecture that effectively transfers informative features by applying the attention mechanism to the multi-scale features of the tasks. Since applying the attention module directly to all possible features…

    Submitted 6 September, 2022; originally announced September 2022.

    Comments: ICIP 2022

  36. arXiv:2208.10922  [pdf, other]

    cs.CV cs.LG eess.AS eess.IV

    StyleTalker: One-shot Style-based Audio-driven Talking Head Video Generation

    Authors: Dongchan Min, Minyoung Song, Eunji Ko, Sung Ju Hwang

    Abstract: We propose StyleTalker, a novel audio-driven talking head generation model that can synthesize a video of a talking person from a single reference image with accurately audio-synced lip shapes, realistic head poses, and eye blinks. Specifically, by leveraging a pretrained image generator and an image encoder, we estimate the latent codes of the talking head video that faithfully reflects the given…

    Submitted 15 March, 2024; v1 submitted 23 August, 2022; originally announced August 2022.

  37. arXiv:2207.13340  [pdf, other]

    cs.CV cs.LG

    PointFix: Learning to Fix Domain Bias for Robust Online Stereo Adaptation

    Authors: Kwonyoung Kim, Jungin Park, Jiyoung Lee, Dongbo Min, Kwanghoon Sohn

    Abstract: Online stereo adaptation tackles the domain shift problem, caused by different environments between synthetic (training) and real (test) datasets, to promptly adapt stereo models in dynamic real-world applications such as autonomous driving. However, previous methods often fail to counteract particular regions related to dynamic objects with more severe environmental changes. To mitigate this issu…

    Submitted 27 July, 2022; originally announced July 2022.

    Comments: Accepted to ECCV 2022

  38. arXiv:2206.12059  [pdf]

    eess.AS cs.SD

    Data Augmentation and Squeeze-and-Excitation Network on Multiple Dimension for Sound Event Localization and Detection in Real Scenes

    Authors: Byeong-Yun Ko, Hyeonuk Nam, Seong-Hu Kim, Deokki Min, Seung-Deok Choi, Yong-Hwa Park

    Abstract: Performance of sound event localization and detection (SELD) in real scenes is limited by small size of SELD dataset, due to difficulty in obtaining sufficient amount of realistic multi-channel audio data recordings with accurate label. We used two main strategies to solve problems arising from the small real SELD dataset. First, we applied various data augmentation methods on all data dimensions:…

    Submitted 23 June, 2022; originally announced June 2022.

    Comments: Technical Report submitted for DCASE2022 Challenge Task3

  39. arXiv:2206.11645  [pdf, ps, other]

    eess.AS

    Frequency Dependent Sound Event Detection for DCASE 2022 Challenge Task 4

    Authors: Hyeonuk Nam, Seong-Hu Kim, Deokki Min, Byeong-Yun Ko, Seung-Deok Choi, Yong-Hwa Park

    Abstract: While many deep learning methods on other domains have been applied to sound event detection (SED), differences between original domains of the methods and SED have not been appropriately considered so far. As SED uses audio data with two dimensions (time and frequency) for input, thorough comprehension on these two dimensions is essential for application of methods from other domains on SED. Prev…

    Submitted 23 June, 2022; originally announced June 2022.

    Comments: Technical Report submitted for DCASE2022 Challenge Task4

  40. arXiv:2206.09604  [pdf, other]

    cs.CV cs.AI

    Distortion-Aware Network Pruning and Feature Reuse for Real-time Video Segmentation

    Authors: Hyunsu Rhee, Dongchan Min, Sunil Hwang, Bruno Andreis, Sung Ju Hwang

    Abstract: Real-time video segmentation is a crucial task for many real-world applications such as autonomous driving and robot control. Since state-of-the-art semantic segmentation models are often too heavy for real-time applications despite their impressive performance, researchers have proposed lightweight architectures with speed-accuracy trade-offs, achieving real-time speed at the expense of reduced a…

    Submitted 15 December, 2022; v1 submitted 20 June, 2022; originally announced June 2022.

  41. arXiv:2204.03609  [pdf, other]

    cs.CV cs.LG

    Pin the Memory: Learning to Generalize Semantic Segmentation

    Authors: Jin Kim, Jiyoung Lee, Jungin Park, Dongbo Min, Kwanghoon Sohn

    Abstract: The rise of deep neural networks has led to several breakthroughs for semantic segmentation. In spite of this, a model trained on source domain often fails to work properly in new challenging domains, that is directly concerned with the generalization capability of the model. In this paper, we present a novel memory-guided domain generalization method for semantic segmentation based on meta-learni…

    Submitted 30 May, 2022; v1 submitted 7 April, 2022; originally announced April 2022.

    Comments: Accepted to CVPR 2022

  42. arXiv:2202.06060  [pdf, other]

    cs.CV

    Depth-Cooperated Trimodal Network for Video Salient Object Detection

    Authors: Yukang Lu, Dingyao Min, Keren Fu, Qijun Zhao

    Abstract: Depth can provide useful geographical cues for salient object detection (SOD), and has been proven helpful in recent RGB-D SOD methods. However, existing video salient object detection (VSOD) methods only utilize spatiotemporal information and seldom exploit depth information for detection. In this paper, we propose a depth-cooperated trimodal network, called DCTNet for VSOD, which is a pioneering…

    Submitted 11 July, 2022; v1 submitted 12 February, 2022; originally announced February 2022.

    Comments: 5 pages, 3 figures, Accepted at ICIP-2022

  43. arXiv:2110.11590  [pdf, other]

    cs.CV

    DIML/CVL RGB-D Dataset: 2M RGB-D Images of Natural Indoor and Outdoor Scenes

    Authors: Jaehoon Cho, Dongbo Min, Youngjung Kim, Kwanghoon Sohn

    Abstract: This manual is intended to provide a detailed description of the DIML/CVL RGB-D dataset. This dataset is comprised of 2M color images and their corresponding depth maps from a great variety of natural indoor and outdoor scenes. The indoor dataset was constructed using the Microsoft Kinect v2, while the outdoor dataset was built using the stereo cameras (ZED stereo camera and built-in stereo camera…

    Submitted 22 October, 2021; originally announced October 2021.

    Comments: Technical report

  44. Self-balanced Learning For Domain Generalization

    Authors: Jin Kim, Jiyoung Lee, Jungin Park, Dongbo Min, Kwanghoon Sohn

    Abstract: Domain generalization aims to learn a prediction model on multi-domain source data such that the model can generalize to a target domain with unknown statistics. Most existing approaches have been developed under the assumption that the source data is well-balanced in terms of both domain and class. However, real-world training data collected with different composition biases often exhibits severe…

    Submitted 30 August, 2021; originally announced August 2021.

    Comments: Accepted at International Conference on Image Processing (ICIP) 2021

    Journal ref: ICIP, 2021, pp. 779-783

  45. arXiv:2106.03153  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation

    Authors: Dongchan Min, Dong Bok Lee, Eunho Yang, Sung Ju Hwang

    Abstract: With rapid progress in neural text-to-speech (TTS) models, personalized speech generation is now in high demand for many applications. For practical applicability, a TTS model should generate high-quality speech with only a few audio samples from the given speaker, that are also short in length. However, existing methods either require to fine-tune the model or achieve low adaptation quality witho…

    Submitted 16 June, 2021; v1 submitted 6 June, 2021; originally announced June 2021.

    Comments: Accepted by ICML 2021

  46. arXiv:2101.00431  [pdf, other]

    cs.CV

    On the confidence of stereo matching in a deep-learning era: a quantitative evaluation

    Authors: Matteo Poggi, Seungryong Kim, Fabio Tosi, Sunok Kim, Filippo Aleotti, Dongbo Min, Kwanghoon Sohn, Stefano Mattoccia

    Abstract: Stereo matching is one of the most popular techniques to estimate dense depth maps by finding the disparity between matching pixels on two, synchronized and rectified images. Alongside with the development of more accurate algorithms, the research community focused on finding good strategies to estimate the reliability, i.e. the confidence, of estimated disparity maps. This information proves to b…

    Submitted 30 March, 2021; v1 submitted 2 January, 2021; originally announced January 2021.

    Comments: TPAMI final version

  47. arXiv:2011.10897

    cs.AI eess.SY

    Reinforcement learning with distance-based incentive/penalty (DIP) updates for highly constrained industrial control systems

    Authors: Hyungjun Park, Daiki Min, Jong-hyun Ryu, Dong Gu Choi

    Abstract: Typical reinforcement learning (RL) methods show limited applicability for real-world industrial control problems because industrial systems involve various constraints and simultaneously require continuous and discrete control. To overcome these challenges, we devise a novel RL algorithm that enables an agent to handle a highly constrained action space. This algorithm has two main features. First…

    Submitted 19 May, 2021; v1 submitted 21 November, 2020; originally announced November 2020.

    Comments: We request withdrawal of this article due to a definition error on methodology and problem definition (Section 3-4; pages 2-5)

  48. arXiv:2009.12840  [pdf, other]

    cs.CV

    Adaptive confidence thresholding for monocular depth estimation

    Authors: Hyesong Choi, Hunsang Lee, Sunkyung Kim, Sunok Kim, Seungryong Kim, Kwanghoon Sohn, Dongbo Min

    Abstract: Self-supervised monocular depth estimation has become an appealing solution to the lack of ground truth labels, but its reconstruction loss often produces over-smoothed results across object boundaries and is incapable of handling occlusion explicitly. In this paper, we propose a new approach to leverage pseudo ground truth depth maps of stereo images generated from self-supervised stereo matching…

    Submitted 23 August, 2021; v1 submitted 27 September, 2020; originally announced September 2020.

    Comments: ICCV 2021

  49. arXiv:2006.16659  [pdf, other]

    eess.SY cs.LG

    Delayed Q-update: A novel credit assignment technique for deriving an optimal operation policy for the Grid-Connected Microgrid

    Authors: Hyungjun Park, Daiki Min, Jong-hyun Ryu, Dong Gu Choi

    Abstract: A microgrid is an innovative system that integrates distributed energy resources to supply electricity demand within electrical boundaries. This study proposes an approach for deriving a desirable microgrid operation policy that enables sophisticated controls in the microgrid system using the proposed novel credit assignment technique, delayed-Q update. The technique employs novel features such as…

    Submitted 20 October, 2020; v1 submitted 30 June, 2020; originally announced June 2020.

  50. arXiv:2004.13354  [pdf, other]

    cs.CR

    SGX-SSD: A Policy-based Versioning SSD with Intel SGX

    Authors: Jinwoo Ahn, Seungjin Lee, Jinhoon Lee, Yungwoo Ko, Donghyun Min, Junghee Lee, Youngjae Kim

    Abstract: This paper demonstrates that SSDs, which perform device-level versioning, can be exposed to data tampering attacks when the retention time of data is less than the malware's dwell time. To deal with that threat, we propose SGX-SSD, a SGX-based versioning SSD which selectively preserves file history based on the given policy. The proposed system adopts Intel SGX to implement the version policy mana…

    Submitted 28 April, 2020; v1 submitted 28 April, 2020; originally announced April 2020.

    Comments: 7 pages, 4 figures

    ACM Class: E.5