Showing 1–50 of 99 results for author: Kim, B

Searching in archive eess. Search in all archives.
  1. arXiv:2405.10272  [pdf, other]

    cs.CV cs.AI cs.SD eess.AS eess.IV

    Faces that Speak: Jointly Synthesising Talking Face and Speech from Text

    Authors: Youngjoon Jang, Ji-Hoon Kim, Junseok Ahn, Doyeop Kwak, Hong-Sun Yang, Yoon-Cheol Ju, Il-Hwan Kim, Byeong-Yeol Kim, Joon Son Chung

    Abstract: The goal of this work is to simultaneously generate natural talking faces and speech outputs from text. We achieve this by integrating Talking Face Generation (TFG) and Text-to-Speech (TTS) systems into a unified framework. We address the main challenges of each task: (1) generating a range of head poses representative of real-world scenarios, and (2) ensuring voice consistency despite variations…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: CVPR 2024

  2. arXiv:2405.08247  [pdf, other]

    eess.IV cs.AI

    Automated classification of multi-parametric body MRI series

    Authors: Boah Kim, Tejas Sudharshan Mathai, Kimberly Helm, Ronald M. Summers

    Abstract: Multi-parametric MRI (mpMRI) studies are widely available in clinical practice for the diagnosis of various diseases. As the volume of mpMRI exams increases yearly, there are concomitant inaccuracies that exist within the DICOM header fields of these exams. This precludes the use of the header information for the arrangement of the different series as part of the radiologist's hanging protocol, an…

    Submitted 13 May, 2024; originally announced May 2024.

  3. arXiv:2405.05944  [pdf, other]

    eess.IV cs.CV

    MRISegmentator-Abdomen: A Fully Automated Multi-Organ and Structure Segmentation Tool for T1-weighted Abdominal MRI

    Authors: Yan Zhuang, Tejas Sudharshan Mathai, Pritam Mukherjee, Brandon Khoury, Boah Kim, Benjamin Hou, Nusrat Rabbee, Abhinav Suri, Ronald M. Summers

    Abstract: Background: Segmentation of organs and structures in abdominal MRI is useful for many clinical applications, such as disease diagnosis and radiotherapy. Current approaches have focused on delineating a limited set of abdominal structures (13 types). To date, there is no publicly available abdominal MRI dataset with voxel-level annotations of multiple organs and structures. Consequently, a segmenta…

    Submitted 24 June, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

    Comments: We made the segmentation model publicly available

  4. arXiv:2405.05107  [pdf, other]

    cs.ET cs.AR eess.SY

    Leveraging AES Padding: dBs for Nothing and FEC for Free in IoT Systems

    Authors: Jongchan Woo, Vipindev Adat Vasudevan, Benjamin D. Kim, Rafael G. L. D'Oliveira, Alejandro Cohen, Thomas Stahlbuhk, Ken R. Duffy, Muriel Médard

    Abstract: The Internet of Things (IoT) represents a significant advancement in digital technology, with its rapidly growing network of interconnected devices. This expansion, however, brings forth critical challenges in data security and reliability, especially under the threat of increasing cyber vulnerabilities. Addressing the security concerns, the Advanced Encryption Standard (AES) is commonly employed…

    Submitted 8 May, 2024; originally announced May 2024.

  5. arXiv:2402.08098  [pdf, other]

    eess.IV cs.CV

    Automated Classification of Body MRI Sequence Type Using Convolutional Neural Networks

    Authors: Kimberly Helm, Tejas Sudharshan Mathai, Boah Kim, Pritam Mukherjee, Jianfei Liu, Ronald M. Summers

    Abstract: Multi-parametric MRI of the body is routinely acquired for the identification of abnormalities and diagnosis of diseases. However, a standard naming convention for the MRI protocols and associated sequences does not exist due to wide variations in imaging practice at institutions and myriad MRI scanners from various manufacturers being used for imaging. The intensity distributions of MRI sequences…

    Submitted 12 February, 2024; originally announced February 2024.

    Comments: Accepted at SPIE 2024

  6. arXiv:2402.06846  [pdf, other]

    cs.CR eess.SY

    System-level Analysis of Adversarial Attacks and Defenses on Intelligence in O-RAN based Cellular Networks

    Authors: Azuka Chiejina, Brian Kim, Kaushik Chowdhury, Vijay K. Shah

    Abstract: While the open architecture, open interfaces, and integration of intelligence within Open Radio Access Network technology hold the promise of transforming 5G and 6G networks, they also introduce cybersecurity vulnerabilities that hinder its widespread adoption. In this paper, we conduct a thorough system-level investigation of cyber threats, with a specific focus on machine learning (ML) intellige…

    Submitted 13 February, 2024; v1 submitted 9 February, 2024; originally announced February 2024.

    Comments: This paper has been accepted for publication in ACM WiSec 2024

  7. arXiv:2402.00977  [pdf, other]

    cs.CV eess.IV

    Enhanced fringe-to-phase framework using deep learning

    Authors: Won-Hoe Kim, Bongjoong Kim, Hyung-Gun Chi, Jae-Sang Hyun

    Abstract: In Fringe Projection Profilometry (FPP), achieving robust and accurate 3D reconstruction with a limited number of fringe patterns remains a challenge in structured light 3D imaging. Conventional methods require a set of fringe images, but using only one or two patterns complicates phase recovery and unwrapping. In this study, we introduce SFNet, a symmetric fusion network that transforms two fring…

    Submitted 1 February, 2024; originally announced February 2024.

    Comments: 35 pages, 13 figures, 6 tables

  8. arXiv:2401.13921  [pdf, other]

    eess.AS cs.SD

    Intelli-Z: Toward Intelligible Zero-Shot TTS

    Authors: Sunghee Jung, Won Jang, Jaesam Yoon, Bongwan Kim

    Abstract: Although numerous recent studies have suggested new frameworks for zero-shot TTS using large-scale, real-world data, studies that focus on the intelligibility of zero-shot TTS are relatively scarce. Zero-shot TTS demands additional efforts to ensure clear pronunciation and speech quality due to its inherent requirement of replacing a core parameter (speaker embedding or acoustic prompt) with a new…

    Submitted 24 January, 2024; originally announced January 2024.

  9. arXiv:2401.12473  [pdf, other]

    eess.AS cs.SD

    Boosting Unknown-number Speaker Separation with Transformer Decoder-based Attractor

    Authors: Younglo Lee, Shukjae Choi, Byeong-Yeol Kim, Zhong-Qiu Wang, Shinji Watanabe

    Abstract: We propose a novel speech separation model designed to separate mixtures with an unknown number of speakers. The proposed model stacks 1) a dual-path processing block that can model spectro-temporal patterns, 2) a transformer decoder-based attractor (TDA) calculation module that can deal with an unknown number of speakers, and 3) triple-path processing blocks that can model inter-speaker relations…

    Submitted 22 January, 2024; originally announced January 2024.

    Comments: 5 pages, 4 figures, accepted by ICASSP 2024

  10. arXiv:2312.14939  [pdf, other]

    q-bio.NC cs.CV cs.LG eess.IV

    Large-scale Graph Representation Learning of Dynamic Brain Connectome with Transformers

    Authors: Byung-Hoon Kim, Jungwon Choi, EungGu Yun, Kyungsang Kim, Xiang Li, Juho Lee

    Abstract: Graph Transformers have recently been successful in various graph representation learning tasks, providing a number of advantages over message-passing Graph Neural Networks. Utilizing Graph Transformers for learning the representation of the brain functional connectivity network is also gaining interest. However, studies to date have underlooked the temporal dynamics of functional connectivity, wh…

    Submitted 4 December, 2023; originally announced December 2023.

    Comments: NeurIPS 2023 Temporal Graph Learning Workshop

  11. arXiv:2312.07896  [pdf, other]

    eess.SY cs.NI

    TRACTOR: Traffic Analysis and Classification Tool for Open RAN

    Authors: Joshua Groen, Mauro Belgiovine, Utku Demir, Brian Kim, Kaushik Chowdhury

    Abstract: 5G and beyond cellular networks promise remarkable advancements in bandwidth, latency, and connectivity. The emergence of Open Radio Access Network (O-RAN) represents a pivotal direction for the evolution of cellular networks, inherently supporting machine learning (ML) for network operation control. Within this framework, RAN Intelligence Controllers (RICs) from one provider can employ ML models…

    Submitted 12 December, 2023; originally announced December 2023.

    Comments: 6 pages, 5 figures, 2 tables, submitted to ICC 2024

  12. arXiv:2312.06453  [pdf, other]

    cs.CV eess.IV

    Semantic Image Synthesis for Abdominal CT

    Authors: Yan Zhuang, Benjamin Hou, Tejas Sudharshan Mathai, Pritam Mukherjee, Boah Kim, Ronald M. Summers

    Abstract: As a new emerging and promising type of generative models, diffusion models have proven to outperform Generative Adversarial Networks (GANs) in multiple tasks, including image synthesis. In this work, we explore semantic image synthesis for abdominal CT using conditional diffusion models, which can be used for downstream applications such as data augmentation. We systematically evaluated the perfo…

    Submitted 11 December, 2023; originally announced December 2023.

    Comments: This paper has been accepted at Deep Generative Models workshop at MICCAI 2023

  13. arXiv:2312.01994  [pdf, other]

    cs.LG cs.CV eess.IV q-bio.NC

    A Generative Self-Supervised Framework using Functional Connectivity in fMRI Data

    Authors: Jungwon Choi, Seongho Keum, EungGu Yun, Byung-Hoon Kim, Juho Lee

    Abstract: Deep neural networks trained on Functional Connectivity (FC) networks extracted from functional Magnetic Resonance Imaging (fMRI) data have gained popularity due to the increasing availability of data and advances in model architectures, including Graph Neural Network (GNN). Recent research on the application of GNN to FC suggests that exploiting the time-varying properties of the FC could signifi…

    Submitted 4 December, 2023; originally announced December 2023.

    Comments: NeurIPS 2023 Temporal Graph Learning Workshop

  14. arXiv:2311.11745  [pdf, other]

    cs.SD cs.CL eess.AS

    ELF: Encoding Speaker-Specific Latent Speech Feature for Speech Synthesis

    Authors: Jungil Kong, Junmo Lee, Jeongmin Kim, Beomjeong Kim, Jihoon Park, Dohee Kong, Changheon Lee, Sangjin Kim

    Abstract: In this work, we propose a novel method for modeling numerous speakers, which enables expressing the overall characteristics of speakers in detail like a trained multi-speaker model without additional training on the target speaker's dataset. Although various works with similar purposes have been actively studied, their performance has not yet reached that of trained multi-speaker models due to th…

    Submitted 31 May, 2024; v1 submitted 20 November, 2023; originally announced November 2023.

    Comments: ICML 2024

  15. arXiv:2309.06035  [pdf]

    physics.optics eess.SY physics.class-ph

    Non-reciprocal absorption and zero reflection in physically separated dual photonic resonators by traveling-wave-induced indirect coupling

    Authors: Bojong Kim, Junyoung Kim, Hae-Chan Jeon, Sang-Koog Kim

    Abstract: We experimentally explored novel behaviors of non-reciprocal absorption and almost zero reflection in a dual photon resonator system, which is physically separated and composed of two inverted split ring resonators (ISRRs) with varying inter-distances. We also found that an electromagnetically-induced-transparency (EIT)-like peak at a specific inter-distance of d = 18 mm through traveling waves fl…

    Submitted 12 September, 2023; originally announced September 2023.

  16. arXiv:2309.03844  [pdf, other]

    cs.NI eess.SP

    Experimental Study of Adversarial Attacks on ML-based xApps in O-RAN

    Authors: Naveen Naik Sapavath, Brian Kim, Kaushik Chowdhury, Vijay K Shah

    Abstract: Open Radio Access Network (O-RAN) is considered as a major step in the evolution of next-generation cellular networks given its support for open interfaces and utilization of artificial intelligence (AI) into the deployment, operation, and maintenance of RAN. However, due to the openness of the O-RAN architecture, such AI models are inherently vulnerable to various adversarial machine learning (ML…

    Submitted 7 September, 2023; originally announced September 2023.

    Comments: Accepted for Globecom 2023

  17. arXiv:2309.00647  [pdf, other]

    eess.AS cs.LG cs.SD

    Improving Small Footprint Few-shot Keyword Spotting with Supervision on Auxiliary Data

    Authors: Seunghan Yang, Byeonggeun Kim, Kyuhong Shim, Simyung Chang

    Abstract: Few-shot keyword spotting (FS-KWS) models usually require large-scale annotated datasets to generalize to unseen target keywords. However, existing KWS datasets are limited in scale and gathering keyword-like labeled data is costly undertaking. To mitigate this issue, we propose a framework that uses easily collectible, unlabeled reading speech data as an auxiliary source. Self-supervised learning…

    Submitted 31 August, 2023; originally announced September 2023.

    Comments: Interspeech 2023

  18. arXiv:2308.10372  [pdf]

    eess.IV cs.CV cs.LG q-bio.QM

    Developing a Machine Learning-Based Clinical Decision Support Tool for Uterine Tumor Imaging

    Authors: Darryl E. Wright, Adriana V. Gregory, Deema Anaam, Sepideh Yadollahi, Sumana Ramanathan, Kafayat A. Oyemade, Reem Alsibai, Heather Holmes, Harrison Gottlich, Cherie-Akilah G. Browne, Sarah L. Cohen Rassier, Isabel Green, Elizabeth A. Stewart, Hiroaki Takahashi, Bohyun Kim, Shannon Laughlin-Tommaso, Timothy L. Kline

    Abstract: Uterine leiomyosarcoma (LMS) is a rare but aggressive malignancy. On imaging, it is difficult to differentiate LMS from, for example, degenerated leiomyoma (LM), a prevalent but benign condition. We curated a data set of 115 axial T2-weighted MRI images from 110 patients (mean [range] age=45 [17-81] years) with UTs that included five different tumor types. These data were randomly split stratifyin…

    Submitted 20 August, 2023; originally announced August 2023.

  19. arXiv:2308.06957  [pdf, other]

    eess.IV cs.CV cs.LG

    CEmb-SAM: Segment Anything Model with Condition Embedding for Joint Learning from Heterogeneous Datasets

    Authors: Dongik Shin, Beomsuk Kim, Seungjun Baek

    Abstract: Automated segmentation of ultrasound images can assist medical experts with diagnostic and therapeutic procedures. Although using the common modality of ultrasound, one typically needs separate datasets in order to segment, for example, different anatomical structures or lesions with different levels of malignancy. In this paper, we consider the problem of jointly learning from heterogeneous datas…

    Submitted 14 August, 2023; originally announced August 2023.

  20. arXiv:2308.05063  [pdf, other]

    cs.CR cs.AR cs.IT eess.SY

    CERMET: Coding for Energy Reduction with Multiple Encryption Techniques -- $It's\ easy\ being\ green$

    Authors: Jongchan Woo, Vipindev Adat Vasudevan, Benjamin Kim, Alejandro Cohen, Rafael G. L. D'Oliveira, Thomas Stahlbuhk, Muriel Médard

    Abstract: This paper presents CERMET, an energy-efficient hardware architecture designed for hardware-constrained cryptosystems. CERMET employs a base cryptosystem in conjunction with network coding to provide both information-theoretic and computational security while reducing energy consumption per bit. This paper introduces the hardware architecture for the system and explores various optimizations to en…

    Submitted 9 August, 2023; originally announced August 2023.

  21. arXiv:2308.00193  [pdf, other]

    eess.IV cs.CV cs.LG

    C-DARL: Contrastive diffusion adversarial representation learning for label-free blood vessel segmentation

    Authors: Boah Kim, Yujin Oh, Bradford J. Wood, Ronald M. Summers, Jong Chul Ye

    Abstract: Blood vessel segmentation in medical imaging is one of the essential steps for vascular disease diagnosis and interventional planning in a broad spectrum of clinical scenarios in image-based medicine and interventional medicine. Unfortunately, manual annotation of the vessel masks is challenging and resource-intensive due to subtle branches and complex structures. To overcome this issue, this pape…

    Submitted 31 July, 2023; originally announced August 2023.

  22. arXiv:2307.16430  [pdf, other]

    cs.SD cs.LG eess.AS

    VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech with Adversarial Learning and Architecture Design

    Authors: Jungil Kong, Jihoon Park, Beomjeong Kim, Jeongmin Kim, Dohee Kong, Sangjin Kim

    Abstract: Single-stage text-to-speech models have been actively studied recently, and their results have outperformed two-stage pipeline systems. Although the previous single-stage model has made great progress, there is room for improvement in terms of its intermittent unnaturalness, computational efficiency, and strong dependence on phoneme conversion. In this work, we introduce VITS2, a single-stage text…

    Submitted 31 July, 2023; originally announced July 2023.

    Comments: Interspeech 2023

  23. arXiv:2306.14385  [pdf, other]

    eess.SP

    Calibration of Wideband LFM Radars based on Sliding Window Algorithm

    Authors: Hyung-Woo Kim, Jin-woo Kim, Jin-ha Kim, JaeYoung Choi, Sangpyo Hong, Byungkwan Kim

    Abstract: This paper addresses the challenges of wideband signal beamforming in radar systems and proposes a new calibration method. Due to operating conditions, the frequency dependent characteristics of the system can be changed, and amplitude, phase, and time delay error can be generated. The proposed method is based on the concept of sliding window algorithm for linear frequency modulated (LFM) signals.…

    Submitted 25 June, 2023; originally announced June 2023.

    Comments: 11 pages

  24. arXiv:2305.16699  [pdf, other]

    eess.AS cs.AI cs.LG

    Automatic Tuning of Loss Trade-offs without Hyper-parameter Search in End-to-End Zero-Shot Speech Synthesis

    Authors: Seongyeon Park, Bohyung Kim, Tae-hyun Oh

    Abstract: Recently, zero-shot TTS and VC methods have gained attention due to their practicality of being able to generate voices even unseen during training. Among these methods, zero-shot modifications of the VITS model have shown superior performance, while having useful properties inherited from VITS. However, the performance of VITS and VITS-based zero-shot models vary dramatically depending on how the…

    Submitted 26 May, 2023; originally announced May 2023.

    Comments: Interspeech 2023

  25. arXiv:2304.08707  [pdf, other]

    eess.AS cs.SD

    Neural Speech Enhancement with Very Low Algorithmic Latency and Complexity via Integrated Full- and Sub-Band Modeling

    Authors: Zhong-Qiu Wang, Samuele Cornell, Shukjae Choi, Younglo Lee, Byeong-Yeol Kim, Shinji Watanabe

    Abstract: We propose FSB-LSTM, a novel long short-term memory (LSTM) based architecture that integrates full- and sub-band (FSB) modeling, for single- and multi-channel speech enhancement in the short-time Fourier transform (STFT) domain. The model maintains an information highway to flow an over-complete input representation through multiple FSB-LSTM modules. Each FSB-LSTM module consists of a full-band bl…

    Submitted 17 April, 2023; originally announced April 2023.

    Comments: in ICASSP 2023

  26. arXiv:2304.00471  [pdf, other]

    cs.SD cs.CV cs.GR cs.LG eess.AS

    A Unified Compression Framework for Efficient Speech-Driven Talking-Face Generation

    Authors: Bo-Kyeong Kim, Jaemin Kang, Daeun Seo, Hancheol Park, Shinkook Choi, Hyoung-Kyu Song, Hyungshin Kim, Sungsu Lim

    Abstract: Virtual humans have gained considerable attention in numerous industries, e.g., entertainment and e-commerce. As a core technology, synthesizing photorealistic face frames from target speech and facial identity has been actively studied with generative adversarial networks. Despite remarkable results of modern talking-face generation models, they often entail high computational burdens, which limi…

    Submitted 28 April, 2023; v1 submitted 2 April, 2023; originally announced April 2023.

    Comments: MLSys Workshop on On-Device Intelligence, 2023; Demo: https://huggingface.co/spaces/nota-ai/compressed_wav2lip

  27. arXiv:2303.16511  [pdf, other]

    eess.AS

    Joint unsupervised and supervised learning for context-aware language identification

    Authors: Jinseok Park, Hyung Yong Kim, Jihwan Park, Byeong-Yeol Kim, Shukjae Choi, Yunkyu Lim

    Abstract: Language identification (LID) recognizes the language of a spoken utterance automatically. According to recent studies, LID models trained with an automatic speech recognition (ASR) task perform better than those trained with a LID task only. However, we need additional text labels to train the model to recognize speech, and acquiring the text labels is a cost high. In order to overcome this probl…

    Submitted 14 April, 2023; v1 submitted 29 March, 2023; originally announced March 2023.

    Comments: Accepted by ICASSP 2023

  28. arXiv:2303.15669  [pdf, other]

    eess.AS cs.AI cs.LG

    Unsupervised Pre-Training For Data-Efficient Text-to-Speech On Low Resource Languages

    Authors: Seongyeon Park, Myungseo Song, Bohyung Kim, Tae-Hyun Oh

    Abstract: Neural text-to-speech (TTS) models can synthesize natural human speech when trained on large amounts of transcribed speech. However, collecting such large-scale transcribed data is expensive. This paper proposes an unsupervised pre-training method for a sequence-to-sequence TTS model by leveraging large untranscribed speech data. With our pre-training, we can remarkably reduce the amount of paired…

    Submitted 27 March, 2023; originally announced March 2023.

    Comments: ICASSP 2023

  29. arXiv:2303.09463  [pdf, other]

    cs.RO eess.SY

    An Autonomous System for Head-to-Head Race: Design, Implementation and Analysis; Team KAIST at the Indy Autonomous Challenge

    Authors: Chanyoung Jung, Andrea Finazzi, Hyunki Seong, Daegyu Lee, Seungwook Lee, Bosung Kim, Gyuri Gang, Seungil Han, David Hyunchul Shim

    Abstract: While the majority of autonomous driving research has concentrated on everyday driving scenarios, further safety and performance improvements of autonomous vehicles require a focus on extreme driving conditions. In this context, autonomous racing is a new area of research that has been attracting considerable interest recently. Due to the fact that a vehicle is driven by its perception, planning,…

    Submitted 16 March, 2023; originally announced March 2023.

    Comments: 35 pages, 31 figures, 5 tables, Field Robotics (accepted)

  30. arXiv:2302.14370  [pdf, other]

    cs.SD cs.AI eess.AS eess.SP

    CrossSpeech: Speaker-independent Acoustic Representation for Cross-lingual Speech Synthesis

    Authors: Ji-Hoon Kim, Hong-Sun Yang, Yoon-Cheol Ju, Il-Hwan Kim, Byeong-Yeol Kim

    Abstract: While recent text-to-speech (TTS) systems have made remarkable strides toward human-level quality, the performance of cross-lingual TTS lags behind that of intra-lingual TTS. This gap is mainly rooted from the speaker-language entanglement problem in cross-lingual TTS. In this paper, we propose CrossSpeech which improves the quality of cross-lingual speech by effectively disentangling speaker and…

    Submitted 12 June, 2023; v1 submitted 28 February, 2023; originally announced February 2023.

    Comments: Accepted to ICASSP 2023

  31. arXiv:2301.08078  [pdf, other]

    cs.RO eess.SY

    Stable Contact Guaranteeing Motion/Force Control for an Aerial Manipulator on an Arbitrarily Tilted Surface

    Authors: Jeonghyun Byun, Byeongjun Kim, Changhyeon Kim, Donggeon David Oh, H. Jin Kim

    Abstract: This study aims to design a motion/force controller for an aerial manipulator which guarantees the tracking of time-varying motion/force trajectories as well as the stability during the transition between free and contact motions. To this end, we model the force exerted on the end-effector as the Kelvin-Voigt linear model and estimate its parameters by recursive least-squares estimator. Then, the…

    Submitted 19 January, 2023; originally announced January 2023.

    Comments: to be presented in 2023 IEEE International Conference on Robotics and Automations (ICRA), London, United Kingdom, 2023

  32. arXiv:2211.12433  [pdf, other]

    cs.SD eess.AS

    TF-GridNet: Integrating Full- and Sub-Band Modeling for Speech Separation

    Authors: Zhong-Qiu Wang, Samuele Cornell, Shukjae Choi, Younglo Lee, Byeong-Yeol Kim, Shinji Watanabe

    Abstract: We propose TF-GridNet for speech separation. The model is a novel deep neural network (DNN) integrating full- and sub-band modeling in the time-frequency (T-F) domain. It stacks several blocks, each consisting of an intra-frame full-band module, a sub-band temporal module, and a cross-frame self-attention module. It is trained to perform complex spectral mapping, where the real and imaginary (RI)…

    Submitted 4 August, 2023; v1 submitted 22 November, 2022; originally announced November 2022.

    Comments: In IEEE/ACM Transactions on Audio, Speech, and Language Processing. A sound demo is available at https://zqwang7.github.io/demos/TF-GridNet-demo/index.html, and the code is available at https://github.com/espnet/espnet/pull/5395

  33. arXiv:2211.00439  [pdf, other]

    eess.AS cs.SD

    Metric Learning for User-defined Keyword Spotting

    Authors: Jaemin Jung, Youkyum Kim, Jihwan Park, Youshin Lim, Byeong-Yeol Kim, Youngjoon Jang, Joon Son Chung

    Abstract: The goal of this work is to detect new spoken terms defined by users. While most previous works address Keyword Spotting (KWS) as a closed-set classification problem, this limits their transferability to unseen terms. The ability to define custom keywords has advantages in terms of user experience. In this paper, we propose a metric learning-based training strategy for user-defined keyword spott…

    Submitted 1 November, 2022; originally announced November 2022.

  34. arXiv:2209.14566  [pdf, other]

    eess.IV cs.CV cs.LG

    Diffusion Adversarial Representation Learning for Self-supervised Vessel Segmentation

    Authors: Boah Kim, Yujin Oh, Jong Chul Ye

    Abstract: Vessel segmentation in medical images is one of the important tasks in the diagnosis of vascular diseases and therapy planning. Although learning-based segmentation approaches have been extensively studied, a large amount of ground-truth labels are required in supervised methods and confusing background structures make neural networks hard to segment vessels in an unsupervised manner. To address t…

    Submitted 15 February, 2023; v1 submitted 29 September, 2022; originally announced September 2022.

    Comments: Accepted at ICLR 2023

  35. arXiv:2209.06305  [pdf]

    physics.optics eess.IV

    Ptychographic lens-less polarization microscopy

    Authors: Jeongsoo Kim, Seungri Song, Bora Kim, Mirae Park, Seung Jae Oh, Daesuk Kim, Barry Cense, Yong-Min Huh, Joo Yong Lee, Chulmin Joo

    Abstract: Birefringence, an inherent characteristic of optically anisotropic materials, is widely utilized in various imaging applications ranging from material characterizations to clinical diagnosis. Polarized light microscopy enables high-resolution, high-contrast imaging of optically anisotropic specimens, but it is associated with mechanical rotations of polarizer/analyzer and relatively complex optica…

    Submitted 13 September, 2022; originally announced September 2022.

    Comments: 18 pages, 10 figures, author names corrected

  36. arXiv:2209.03952  [pdf, other]

    cs.SD eess.AS

    TF-GridNet: Making Time-Frequency Domain Models Great Again for Monaural Speaker Separation

    Authors: Zhong-Qiu Wang, Samuele Cornell, Shukjae Choi, Younglo Lee, Byeong-Yeol Kim, Shinji Watanabe

    Abstract: We propose TF-GridNet, a novel multi-path deep neural network (DNN) operating in the time-frequency (T-F) domain, for monaural talker-independent speaker separation in anechoic conditions. The model stacks several multi-path blocks, each consisting of an intra-frame spectral module, a sub-band temporal module, and a full-band self-attention module, to leverage local and global spectro-temporal inf…

    Submitted 15 March, 2023; v1 submitted 8 September, 2022; originally announced September 2022.

    Comments: in IEEE ICASSP 2023

  37. arXiv:2206.13909  [pdf, other]

    cs.SD cs.LG eess.AS

    QTI Submission to DCASE 2021: residual normalization for device-imbalanced acoustic scene classification with efficient design

    Authors: Byeonggeun Kim, Seunghan Yang, Jangho Kim, Simyung Chang

    Abstract: This technical report describes the details of our TASK1A submission of the DCASE2021 challenge. The goal of the task is to design an audio scene classification system for device-imbalanced datasets under the constraints of model complexity. This report introduces four methods to achieve the goal. First, we propose Residual Normalization, a novel feature normalization method that uses instance nor…

    Submitted 25 October, 2022; v1 submitted 28 June, 2022; originally announced June 2022.

    Comments: tech report; won 1st place in DCASE2021 challenge. arXiv admin note: substantial text overlap with arXiv:2111.06531

  38. arXiv:2206.13708  [pdf, other]

    cs.SD cs.LG eess.AS

    Personalized Keyword Spotting through Multi-task Learning

    Authors: Seunghan Yang, Byeonggeun Kim, Inseop Chung, Simyung Chang

    Abstract: Keyword spotting (KWS) plays an essential role in enabling speech-based user interaction on smart devices, and conventional KWS (C-KWS) approaches have concentrated on detecting user-agnostic pre-defined keywords. However, in practice, most user interactions come from target users enrolled in the device which motivates to construct personalized keyword spotting. We design two personalized KWS task…

    Submitted 27 June, 2022; originally announced June 2022.

    Comments: Proceedings of INTERSPEECH 2022

  39. arXiv:2206.13691  [pdf, other]

    cs.SD cs.LG eess.AS

    Dummy Prototypical Networks for Few-Shot Open-Set Keyword Spotting

    Authors: Byeonggeun Kim, Seunghan Yang, Inseop Chung, Simyung Chang

    Abstract: Keyword spotting is the task of detecting a keyword in streaming audio. Conventional keyword spotting targets predefined keywords classification, but there is growing attention in few-shot (query-by-example) keyword spotting, e.g., N-way classification given M-shot support samples. Moreover, in real-world scenarios, there can be utterances from unexpected categories (open-set) which need to be rej…

    Submitted 27 June, 2022; originally announced June 2022.

    Comments: Proceedings of INTERSPEECH 2022

  40. arXiv:2206.13295  [pdf, other]

    eess.IV cs.CV cs.LG

    Diffusion Deformable Model for 4D Temporal Medical Image Generation

    Authors: Boah Kim, Jong Chul Ye

    Abstract: Temporal volume images with 3D+t (4D) information are often used in medical imaging to statistically analyze temporal dynamics or capture disease progression. Although deep-learning-based generative models for natural images have been extensively studied, approaches for temporal medical image generation such as 4D cardiac volume data are limited. In this work, we present a novel deep learning mode…

    Submitted 27 June, 2022; originally announced June 2022.

    Comments: Accepted for MICCAI 2022

  41. arXiv:2206.12513  [pdf, other]

    cs.SD cs.LG eess.AS

    Domain Generalization with Relaxed Instance Frequency-wise Normalization for Multi-device Acoustic Scene Classification

    Authors: Byeonggeun Kim, Seunghan Yang, Jangho Kim, Hyunsin Park, Juntae Lee, Simyung Chang

    Abstract: While using two-dimensional convolutional neural networks (2D-CNNs) in image processing, it is possible to manipulate domain information using channel statistics, and instance normalization has been a promising way to get domain-invariant features. However, unlike image processing, we analyze that domain-relevant information in an audio feature is dominant in frequency statistics rather than chann…

    Submitted 24 June, 2022; originally announced June 2022.

    Comments: Proceedings of INTERSPEECH 2022

  42. arXiv:2205.10405  [pdf, other]

    cs.NI eess.SP

    Demo: A Transparent Antenna System for In-Building Networks

    Authors: Sang-Hyun Park, Soo-Min Kim, Seonghoon Kim, HongIl Yoo, Byoungnam Kim, Chan-Byoung Chae

    Abstract: For in-building networks, the potential of transparent antennas, which are used as windows of a building, is presented in this paper. In this scenario, a transparent window antenna communicates with outdoor devices or base stations, and the indoor repeaters act as relay stations of the transparent window antenna for indoor devices. At indoor, back lobe waves of the transparent window antenna are d…

    Submitted 19 May, 2022; originally announced May 2022.

    Comments: 2 pages, 3 figures

  43. arXiv:2205.05275  [pdf, other]

    eess.SY

    Strong Sign Controllability of Diffusively-Coupled Networks

    Authors: Nam-Jin Park, Seong-Ho Kwon, Yoo-Bin Bae, Byeong-Yeon Kim, Kevin L. Moore, Hyo-Sung Ahn

    Abstract: This paper presents several conditions to determine strong sign controllability for diffusively-coupled undirected networks. The strong sign controllability is determined by the sign patterns (positive, negative, zero) of the edges. We first provide the necessary and sufficient conditions for strong sign controllability of basic components, such as path, cycle, and tree. Next, we propose a merging…

    Submitted 11 May, 2022; originally announced May 2022.

  44. arXiv:2201.09458  [pdf]

    cs.RO eess.SY

    Hybrid Adaptive Control for Series Elastic Actuator of Humanoid Robot

    Authors: Anh Khoa Lanh Luu, Van Tu Duong, Huy Hung Nguyen, Sang Bong Kim, Tan Tien Nguyen

    Abstract: Generally, humanoid robots usually suffer significant impact force when walking or running in a non-predefined environment that could easily damage the actuators due to high stiffness. In recent years, the usages of passive, compliant series elastic actuators (SEA) for driving humanoid's joints have proved the capability in many aspects so far. However, despite being widely applied in the biped ro…

    Submitted 23 January, 2022; originally announced January 2022.

  45. arXiv:2112.11414  [pdf, ps, other]

    eess.SP cs.LG cs.NI stat.ML

    Covert Communications via Adversarial Machine Learning and Reconfigurable Intelligent Surfaces

    Authors: Brian Kim, Tugba Erpek, Yalin E. Sagduyu, Sennur Ulukus

    Abstract: By moving from massive antennas to antenna surfaces for software-defined wireless systems, the reconfigurable intelligent surfaces (RISs) rely on arrays of unit cells to control the scattering and reflection profiles of signals, mitigating the propagation loss and multipath attenuation, and thereby improving the coverage and spectral efficiency. In this paper, covert communication is considered in…

    Submitted 21 December, 2021; originally announced December 2021.

  46. arXiv:2112.05149  [pdf, other]

    eess.IV cs.CV cs.LG

    DiffuseMorph: Unsupervised Deformable Image Registration Using Diffusion Model

    Authors: Boah Kim, Inhwa Han, Jong Chul Ye

    Abstract: Deformable image registration is one of the fundamental tasks in medical imaging. Classical registration algorithms usually require a high computational cost for iterative optimizations. Although deep-learning-based methods have been developed for fast image registration, it is still challenging to obtain realistic continuous deformations from a moving image to a fixed image with less topological…

    Submitted 29 September, 2022; v1 submitted 9 December, 2021; originally announced December 2021.

  47. arXiv:2112.04188  [pdf, other]

    eess.SP cs.NI eess.SY

    Beam Squint in Ultra-wideband mmWave Systems: RF Lens Array vs. Phase-Shifter-Based Array

    Authors: Sang-Hyun Park, Byoungnam Kim, Dong Ku Kim, Linglong Dai, Kai-Kit Wong, Chan-Byoung Chae

    Abstract: In this article, we discuss the potential of radio frequency (RF) lens for ultra-wideband millimeter-wave (mmWave) systems. In terms of the beam squint, we compare the proposed RF lens antenna with the phase shifter-based array for hybrid beamforming. To reduce the complexities for fully digital beamforming, researchers have come up with RF lens-based hybrid beamforming. The use of mmWave systems,…

    Submitted 8 December, 2021; originally announced December 2021.

    Comments: 8 pages, 4 figures, 2 tables

  48. arXiv:2111.14362  [pdf, other]

    eess.IV cs.CV

    Unsupervised Image Denoising with Frequency Domain Knowledge

    Authors: Nahyun Kim, Donggon Jang, Sunhyeok Lee, Bomi Kim, Dae-Shik Kim

    Abstract: Supervised learning-based methods yield robust denoising results, yet they are inherently limited by the need for large-scale clean/noisy paired datasets. The use of unsupervised denoisers, on the other hand, necessitates a more detailed understanding of the underlying image statistics. In particular, it is well known that apparent differences between clean and noisy images are most prominent on h…

    Submitted 29 November, 2021; originally announced November 2021.

    Comments: Accepted to BMVC 2021

  49. arXiv:2111.10067  [pdf]

    physics.med-ph eess.IV

    Noise-resistant reconstruction algorithm based on the sinogram pattern

    Authors: Byung Chun Kim, Hyunju Lee, Kyungtaek Jun

    Abstract: We introduce a new CT image reconstruction algorithm that is less affected by various artifacts. The new reconstruction algorithm is a method of minimizing the difference between synchrotron X-ray tomography data and sinograms generated using Radon transform of CT images. The CT image is iteratively updated to reduce the difference from the sinogram of the data. This method can obtain clean CT ima…

    Submitted 19 November, 2021; originally announced November 2021.

    Comments: 8 pages, 3 figures

  50. arXiv:2111.06531  [pdf, other]

    cs.SD cs.LG eess.AS

    Domain Generalization on Efficient Acoustic Scene Classification using Residual Normalization

    Authors: Byeonggeun Kim, Seunghan Yang, Jangho Kim, Simyung Chang

    Abstract: It is a practical research topic how to deal with multi-device audio inputs by a single acoustic scene classification system with efficient design. In this work, we propose Residual Normalization, a novel feature normalization method that uses frequency-wise normalization % instance normalization with a shortcut path to discard unnecessary device-specific information without losing useful informat…

    Submitted 11 November, 2021; originally announced November 2021.

    Comments: Proceedings of the Detection and Classification of Acoustic Scenes and Events 2021 Workshop (DCASE2021)