Showing 1–28 of 28 results for author: Nam, H

Searching in archive eess.
  1. arXiv:2406.15725  [pdf, other]

    eess.AS cs.SD

    Self Training and Ensembling Frequency Dependent Networks with Coarse Prediction Pooling and Sound Event Bounding Boxes

    Authors: Hyeonuk Nam, Deokki Min, Seungdeok Choi, Inhan Choi, Yong-Hwa Park

    Abstract: To tackle sound event detection (SED) task, we propose frequency dependent networks (FreDNets), which heavily leverage frequency-dependent methods. We apply frequency warping and FilterAugment, which are frequency-dependent data augmentation methods. The model architecture consists of 3 branches: audio teacher-student transformer (ATST) branch, BEATs branch and CNN branch including either partial…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: DCASE 2024 Challenge Task 4 technical report

  2. arXiv:2406.13312  [pdf, other]

    eess.AS cs.SD

    Pushing the Limit of Sound Event Detection with Multi-Dilated Frequency Dynamic Convolution

    Authors: Hyeonuk Nam, Yong-Hwa Park

    Abstract: Frequency dynamic convolution (FDY conv) has been a milestone in the sound event detection (SED) field, but it involves a substantial increase in model size due to multiple basis kernels. In this work, we propose partial frequency dynamic convolution (PFD conv), which concatenates static convolution output and dynamic FDY conv output in order to minimize model size increase while maintaining the p…

    Submitted 7 July, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

  3. arXiv:2406.05341  [pdf, other]

    eess.AS cs.SD

    Diversifying and Expanding Frequency-Adaptive Convolution Kernels for Sound Event Detection

    Authors: Hyeonuk Nam, Seong-Hu Kim, Deokki Min, Junhyeok Lee, Yong-Hwa Park

    Abstract: Frequency dynamic convolution (FDY conv) has shown the state-of-the-art performance in sound event detection (SED) using frequency-adaptive kernels obtained by frequency-varying combination of basis kernels. However, FDY conv lacks an explicit mean to diversify frequency-adaptive kernels, potentially limiting the performance. In addition, size of basis kernels is limited while time-frequency patte…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: Accepted to INTERSPEECH 2024

  4. arXiv:2403.16652  [pdf, other]

    cs.RO eess.SY

    Trajectory Planning of Robotic Manipulator in Dynamic Environment Exploiting DRL

    Authors: Osama Ahmad, Zawar Hussain, Hammad Naeem

    Abstract: This study is about the implementation of a reinforcement learning algorithm in the trajectory planning of manipulators. We have a 7-DOF robotic arm to pick and place the randomly placed block at a random target point in an unknown environment. The obstacle is randomly moving which creates a hurdle in picking the object. The objective of the robot is to avoid the obstacle and pick the block with c…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: Accepted in ICIESTR-2024

  5. arXiv:2403.08187  [pdf, other]

    cs.CL cs.SD eess.AS

    Automatic Speech Recognition (ASR) for the Diagnosis of pronunciation of Speech Sound Disorders in Korean children

    Authors: Taekyung Ahn, Yeonjung Hong, Younggon Im, Do Hyung Kim, Dayoung Kang, Joo Won Jeong, Jae Won Kim, Min Jung Kim, Ah-ra Cho, Dae-Hyun Jang, Hosung Nam

    Abstract: This study presents a model of automatic speech recognition (ASR) designed to diagnose pronunciation issues in children with speech sound disorders (SSDs) to replace manual transcriptions in clinical procedures. Since ASR models trained for general purposes primarily predict input speech into real words, employing a well-known high-performance ASR model for evaluating pronunciation in children wit…

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: 12 pages, 2 figures

    ACM Class: I.2.7

  6. arXiv:2312.15924  [pdf, other]

    cs.IT eess.SP

    Modeling and Analysis of GEO Satellite Networks

    Authors: Dong-Hyun Jung, Hongjae Nam, Junil Choi, David J. Love

    Abstract: The extensive coverage offered by satellites makes them effective in enhancing service continuity for users on dynamic airborne and maritime platforms, such as airplanes and ships. In particular, geosynchronous Earth orbit (GEO) satellites ensure stable connectivity for terrestrial users due to their stationary characteristics when observed from Earth. This paper introduces a novel approach to mod…

    Submitted 26 December, 2023; originally announced December 2023.

    Comments: 12 pages, 9 figures, submitted to IEEE Transactions on Wireless Communications

  7. arXiv:2309.11127  [pdf, other]

    eess.SP cs.AI cs.CL

    Language-Oriented Communication with Semantic Coding and Knowledge Distillation for Text-to-Image Generation

    Authors: Hyelin Nam, Jihong Park, Jinho Choi, Mehdi Bennis, Seong-Lyun Kim

    Abstract: By integrating recent advances in large language models (LLMs) and generative models into the emerging semantic communication (SC) paradigm, in this article we put forward to a novel framework of language-oriented semantic communication (LSC). In LSC, machines communicate using human language messages that can be interpreted and manipulated via natural language processing (NLP) techniques for SC e…

    Submitted 20 September, 2023; originally announced September 2023.

    Comments: 5 pages, 4 figures, submitted to 2024 IEEE International Conference on Acoustics, Speech and Signal Processing

  8. arXiv:2309.04287  [pdf, other]

    eess.SP cs.AI

    Sequential Semantic Generative Communication for Progressive Text-to-Image Generation

    Authors: Hyelin Nam, Jihong Park, Jinho Choi, Seong-Lyun Kim

    Abstract: This paper proposes new framework of communication system leveraging promising generation capabilities of multi-modal generative models. Regarding nowadays smart applications, successful communication can be made by conveying the perceptual meaning, which we set as text prompt. Text serves as a suitable semantic representation of image data as it has evolved to instruct an image or generate image…

    Submitted 8 September, 2023; originally announced September 2023.

    Comments: 4 pages, 2 figures, to be published in IEEE International Conference on Sensing, Communication, and Networking, Workshop on Semantic Communication for 6G (SC6G-SECON23)

  9. arXiv:2307.11998  [pdf, other]

    eess.IV

    ELiOT: End-to-end Lidar Odometry using Transformer Framework

    Authors: Daegyu Lee, Hyunwoo Nam, D. Hyunchul Shim

    Abstract: In recent years, deep-learning-based point cloud registration methods have shown significant promise. Furthermore, learning-based 3D detectors have demonstrated their effectiveness in encoding semantic information from LiDAR data. In this paper, we introduce ELiOT, an end-to-end LiDAR odometry framework built on a transformer architecture. Our proposed Self-attention flow embedding network implici…

    Submitted 12 September, 2023; v1 submitted 22 July, 2023; originally announced July 2023.

  10. arXiv:2306.11427  [pdf]

    eess.AS

    Auditory Neural Response Inspired Sound Event Detection Based on Spectro-temporal Receptive Field

    Authors: Deokki Min, Hyeonuk Nam, Yong-Hwa Park

    Abstract: Sound event detection (SED) is one of tasks to automate function by human auditory system which listens and understands auditory scenes. Therefore, we were inspired to make SED recognize sound events in the way human auditory system does. Spectro-temporal receptive field (STRF), an approach to describe the relationship between perceived sound at ear and transformed neural response in the auditory…

    Submitted 20 June, 2023; originally announced June 2023.

    Comments: Submitted to DCASE 2023 Workshop

  11. arXiv:2306.11277  [pdf, other]

    cs.SD eess.AS

    Frequency & Channel Attention for Computationally Efficient Sound Event Detection

    Authors: Hyeonuk Nam, Seong-Hu Kim, Deokki Min, Yong-Hwa Park

    Abstract: We explore on various attention methods on frequency and channel dimensions for sound event detection (SED) in order to enhance performance with minimal increase in computational cost while leveraging domain knowledge to address the frequency dimension of audio data. We have introduced frequency dynamic convolution (FDY conv) in a previous work to release the translational equivariance issue assoc…

    Submitted 28 August, 2023; v1 submitted 20 June, 2023; originally announced June 2023.

    Comments: Accepted to DCASE 2023 workshop

  12. arXiv:2306.05004  [pdf, other]

    eess.AS cs.AI cs.SD

    VIFS: An End-to-End Variational Inference for Foley Sound Synthesis

    Authors: Junhyeok Lee, Hyeonuk Nam, Yong-Hwa Park

    Abstract: The goal of DCASE 2023 Challenge Task 7 is to generate various sound clips for Foley sound synthesis (FSS) by "category-to-sound" approach. "Category" is expressed by a single index while corresponding "sound" covers diverse and different sound examples. To generate diverse sounds for a given category, we adopt VITS, a text-to-speech (TTS) model with variational inference. In addition, we apply va…

    Submitted 8 June, 2023; originally announced June 2023.

    Comments: DCASE 2023 Challenge Task 7

  13. arXiv:2212.12844  [pdf, other]

    eess.IV cs.CV

    Weakly-Supervised Deep Learning Model for Prostate Cancer Diagnosis and Gleason Grading of Histopathology Images

    Authors: Mohammad Mahdi Behzadi, Mohammad Madani, Hanzhang Wang, Jun Bai, Ankit Bhardwaj, Anna Tarakanova, Harold Yamase, Ga Hie Nam, Sheida Nabavi

    Abstract: Prostate cancer is the most common cancer in men worldwide and the second leading cause of cancer death in the United States. One of the prognostic features in prostate cancer is the Gleason grading of histopathology images. The Gleason grade is assigned based on tumor architecture on Hematoxylin and Eosin (H&E) stained whole slide images (WSI) by the pathologists. This process is time-consuming a…

    Submitted 24 December, 2022; originally announced December 2022.

  14. arXiv:2206.12059  [pdf]

    eess.AS cs.SD

    Data Augmentation and Squeeze-and-Excitation Network on Multiple Dimension for Sound Event Localization and Detection in Real Scenes

    Authors: Byeong-Yun Ko, Hyeonuk Nam, Seong-Hu Kim, Deokki Min, Seung-Deok Choi, Yong-Hwa Park

    Abstract: Performance of sound event localization and detection (SELD) in real scenes is limited by small size of SELD dataset, due to difficulty in obtaining sufficient amount of realistic multi-channel audio data recordings with accurate label. We used two main strategies to solve problems arising from the small real SELD dataset. First, we applied various data augmentation methods on all data dimensions:…

    Submitted 23 June, 2022; originally announced June 2022.

    Comments: Technical Report submitted for DCASE2022 Challenge Task3

  15. arXiv:2206.11645  [pdf, ps, other]

    eess.AS

    Frequency Dependent Sound Event Detection for DCASE 2022 Challenge Task 4

    Authors: Hyeonuk Nam, Seong-Hu Kim, Deokki Min, Byeong-Yun Ko, Seung-Deok Choi, Yong-Hwa Park

    Abstract: While many deep learning methods on other domains have been applied to sound event detection (SED), differences between original domains of the methods and SED have not been appropriately considered so far. As SED uses audio data with two dimensions (time and frequency) for input, thorough comprehension on these two dimensions is essential for application of methods from other domains on SED. Prev…

    Submitted 23 June, 2022; originally announced June 2022.

    Comments: Technical Report submitted for DCASE2022 Challenge Task4

  16. arXiv:2205.01679  [pdf, other]

    eess.IV cs.CV

    Physics to the Rescue: Deep Non-line-of-sight Reconstruction for High-speed Imaging

    Authors: Fangzhou Mu, Sicheng Mo, Jiayong Peng, Xiaochun Liu, Ji Hyun Nam, Siddeshwar Raghavan, Andreas Velten, Yin Li

    Abstract: Computational approach to imaging around the corner, or non-line-of-sight (NLOS) imaging, is becoming a reality thanks to major advances in imaging hardware and reconstruction algorithms. A recent development towards practical NLOS imaging, Nam et al. demonstrated a high-speed non-confocal imaging system that operates at 5Hz, 100x faster than the prior art. This enormous gain in acquisition rate,…

    Submitted 5 August, 2022; v1 submitted 2 May, 2022; originally announced May 2022.

    Comments: ICCP 2022 (TPAMI Special Issue on Computational Photography). Project page: https://pages.cs.wisc.edu/~fmu/nlos3d/

  17. arXiv:2203.15296  [pdf, other]

    eess.AS

    Frequency Dynamic Convolution: Frequency-Adaptive Pattern Recognition for Sound Event Detection

    Authors: Hyeonuk Nam, Seong-Hu Kim, Byeong-Yun Ko, Yong-Hwa Park

    Abstract: 2D convolution is widely used in sound event detection (SED) to recognize two dimensional time-frequency patterns of sound events. However, 2D convolution enforces translation equivariance on sound events along both time and frequency axis while frequency is not shift-invariant dimension. In order to improve physical consistency of 2D convolution on SED, we propose frequency dynamic convolution wh…

    Submitted 3 July, 2022; v1 submitted 29 March, 2022; originally announced March 2022.

    Comments: Accepted to INTERSPEECH 2022

  18. arXiv:2203.15277  [pdf, other]

    eess.AS cs.SD

    Decomposed Temporal Dynamic CNN: Efficient Time-Adaptive Network for Text-Independent Speaker Verification Explained with Speaker Activation Map

    Authors: Seong-Hu Kim, Hyeonuk Nam, Yong-Hwa Park

    Abstract: To extract accurate speaker information for text-independent speaker verification, temporal dynamic CNNs (TDY-CNNs) adapting kernels to each time bin was proposed. However, model size of TDY-CNN is too large and the adaptive kernel's degree of freedom is limited. To address these limitations, we propose decomposed temporal dynamic CNNs (DTDY-CNNs) which forms time-adaptive kernel by combining stat…

    Submitted 27 October, 2022; v1 submitted 29 March, 2022; originally announced March 2022.

    Comments: Submitted to ICASSP 2023

  19. arXiv:2110.03282  [pdf, other]

    eess.AS

    FilterAugment: An Acoustic Environmental Data Augmentation Method

    Authors: Hyeonuk Nam, Seong-Hu Kim, Yong-Hwa Park

    Abstract: Acoustic environments affect acoustic characteristics of sound to be recognized by physically interacting with sound wave propagation. Thus, training acoustic models for audio and speech tasks requires regularization on various acoustic environments in order to achieve robust performance in real life applications. We propose FilterAugment, a data augmentation method for regularization of acoustic…

    Submitted 7 February, 2022; v1 submitted 7 October, 2021; originally announced October 2021.

    Comments: Accepted to ICASSP 2022

  20. arXiv:2110.03213  [pdf]

    eess.AS

    Temporal Dynamic Convolutional Neural Network for Text-Independent Speaker Verification and Phonemetic Analysis

    Authors: Seong-Hu Kim, Hyeonuk Nam, Yong-Hwa Park

    Abstract: In the field of text-independent speaker recognition, dynamic models that adapt along the time axis have been proposed to consider the phoneme-varying characteristics of speech. However, a detailed analysis of how dynamic models work depending on phonemes is insufficient. In this paper, we propose temporal dynamic CNN (TDY-CNN) that considers temporal variation of phonemes by applying kernels opti…

    Submitted 8 February, 2022; v1 submitted 7 October, 2021; originally announced October 2021.

    Comments: Accepted to ICASSP 2022

  21. arXiv:2109.07783   

    eess.IV cs.CV

    Towards Non-Line-of-Sight Photography

    Authors: Jiayong Peng, Fangzhou Mu, Ji Hyun Nam, Siddeshwar Raghavan, Yin Li, Andreas Velten, Zhiwei Xiong

    Abstract: Non-line-of-sight (NLOS) imaging is based on capturing the multi-bounce indirect reflections from the hidden objects. Active NLOS imaging systems rely on the capture of the time of flight of light through the scene, and have shown great promise for the accurate and robust reconstruction of hidden scenes without the need for specialized scene setups and prior assumptions. Despite that existing meth…

    Submitted 17 April, 2022; v1 submitted 16 September, 2021; originally announced September 2021.

    Comments: The proposed method and dataset require further validation

  22. Deep learning based cough detection camera using enhanced features

    Authors: Gyeong-Tae Lee, Hyeonuk Nam, Seong-Hu Kim, Sang-Min Choi, Youngkey Kim, Yong-Hwa Park

    Abstract: Coughing is a typical symptom of COVID-19. To detect and localize coughing sounds remotely, a convolutional neural network (CNN) based deep learning model was developed in this work and integrated with a sound camera for the visualization of the cough sounds. The cough detection model is a binary classifier of which the input is a two second acoustic feature and the output is one of two inferences…

    Submitted 24 May, 2022; v1 submitted 28 July, 2021; originally announced July 2021.

    Comments: 28 pages, 20 figures, and 14 tables

    Journal ref: Expert Systems With Applications, Vol. 206, No. 15, pp. 1-20, 2022

  23. arXiv:2107.03649  [pdf]

    eess.AS cs.SD

    Heavily Augmented Sound Event Detection utilizing Weak Predictions

    Authors: Hyeonuk Nam, Byeong-Yun Ko, Gyeong-Tae Lee, Seong-Hu Kim, Won-Ho Jung, Sang-Min Choi, Yong-Hwa Park

    Abstract: The performances of Sound Event Detection (SED) systems are greatly limited by the difficulty in generating large strongly labeled dataset. In this work, we used two main approaches to overcome the lack of strongly labeled data. First, we applied heavy data augmentation on input features. Data augmentation methods used include not only conventional methods used in speech/audio domains but also our…

    Submitted 14 September, 2021; v1 submitted 8 July, 2021; originally announced July 2021.

    Comments: Won 3rd place on IEEE DCASE 2021 Task 4

  24. arXiv:2103.12622  [pdf, other]

    eess.IV cs.CV

    Virtual Light Transport Matrices for Non-Line-Of-Sight Imaging

    Authors: Julio Marco, Adrian Jarabo, Ji Hyun Nam, Xiaochun Liu, Miguel Ángel Cosculluela, Andreas Velten, Diego Gutierrez

    Abstract: The light transport matrix (LTM) is an instrumental tool in line-of-sight (LOS) imaging, describing how light interacts with the scene and enabling applications such as relighting or separation of illumination components. We introduce a framework to estimate the LTM of non-line-of-sight (NLOS) scenarios, coupling recent virtual forward light propagation models for NLOS imaging with the LOS light t…

    Submitted 5 October, 2021; v1 submitted 23 March, 2021; originally announced March 2021.

    Comments: ICCV 2021 (Oral)

    Journal ref: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 2440-2449

  25. arXiv:2007.08667  [pdf, other]

    eess.IV cs.CV

    Super-Resolution Remote Imaging using Time Encoded Remote Apertures

    Authors: Ji Hyun Nam, Andreas Velten

    Abstract: Imaging of scenes using light or other wave phenomena is subject to the diffraction limit. The spatial profile of a wave propagating between a scene and the imaging system is distorted by diffraction resulting in a loss of resolution that is proportional with traveled distance. We show here that it is possible to reconstruct sparse scenes from the temporal profile of the wave-front using only one…

    Submitted 16 July, 2020; originally announced July 2020.

  26. arXiv:1907.06834  [pdf, other]

    eess.SP

    Noise Removal of FTIR Hyperspectral Images via MMSE

    Authors: Chang Sik Lee, Hyeong Geun Yu, Dong Jo Park, Dong Eui Chang, Hyunwoo Nam, Byeong Hwang Park

    Abstract: Fourier transform infrared (FTIR) hyperspectral imaging systems are deployed in various fields where spectral information is exploited. Chemical warfare agent (CWA) detection is one of such fields and it requires a fast and accurate process from the measurement to the visualization of detection results, including noise removal. A general concern of existing noise removal algorithms is a trade-off…

    Submitted 29 December, 2019; v1 submitted 16 July, 2019; originally announced July 2019.

  27. arXiv:1906.06021  [pdf, other]

    eess.SP cs.IT

    Self-Tuning Sectorization: Deep Reinforcement Learning Meets Broadcast Beam Optimization

    Authors: Rubayet Shafin, Hao Chen, Young Han Nam, Sooyoung Hur, Jeongho Park, Jianzhong Zhang, Jeffrey Reed, Lingjia Liu

    Abstract: Beamforming in multiple input multiple output (MIMO) systems is one of the key technologies for modern wireless communication. Creating appropriate sector-specific broadcast beams are essential for enhancing the coverage of cellular network and for improving the broadcast operation for control signals. However, in order to maximize the coverage, patterns for broadcast beams need to be adapted base…

    Submitted 14 June, 2019; originally announced June 2019.

    Comments: 30 pages, 16 figures

  28. arXiv:1906.03585  [pdf, other]

    eess.SP

    A State-of-the-Art Survey on Multidimensional Scaling Based Localization Techniques

    Authors: Nasir Saeed, Haewoon Nam, Tareq Y. Al-Naffouri, Mohamed-Slim Alouini

    Abstract: Current and future wireless applications strongly rely on precise real-time localization. A number of applications such as smart cities, Internet of Things (IoT), medical services, automotive industry, underwater exploration, public safety, and military systems require reliable and accurate localization techniques. Generally, the most popular localization/ positioning system is the Global Position…

    Submitted 9 June, 2019; originally announced June 2019.

    Comments: Accepted in IEEE Communications Surveys and Tutorials