-
Ring-LWE based encrypted controller with unlimited number of recursive multiplications and effect of error growth
Authors:
Yeongjun Jang,
Joowon Lee,
Seonhong Min,
Hyesun Kwak,
Junsoo Kim,
Yongsoo Song
Abstract:
In this paper, we propose a method to encrypt linear dynamic controllers that enables an unlimited number of recursive homomorphic multiplications on a Ring Learning With Errors (Ring-LWE) based cryptosystem without bootstrapping. Unlike LWE based schemes, where a scalar error is injected during encryption for security, Ring-LWE based schemes are based on polynomial rings and inject error as a polynomial having multiple error coefficients. Such errors accumulate under recursive homomorphic operations, and it has been studied that their effect can be suppressed by the closed-loop stability when dynamic controllers are encrypted using LWE based schemes. We show that this also holds for the proposed controller encrypted using a Ring-LWE based scheme. Specifically, only the constant terms of the error polynomials affect the control performance, and their effect can be arbitrarily bounded even when the noneffective terms diverge. Furthermore, a novel packing algorithm is applied, resulting in reduced computation time and enhanced memory efficiency. Simulation results demonstrate the effectiveness of the proposed method.
Submitted 20 June, 2024;
originally announced June 2024.
-
Nickel and Diming Your GAN: A Dual-Method Approach to Enhancing GAN Efficiency via Knowledge Distillation
Authors:
Sangyeop Yeo,
Yoojin Jang,
Jaejun Yoo
Abstract:
In this paper, we address the challenge of compressing generative adversarial networks (GANs) for deployment in resource-constrained environments by proposing two novel methodologies: Distribution Matching for Efficient compression (DiME) and Network Interactive Compression via Knowledge Exchange and Learning (NICKEL). DiME employs foundation models as embedding kernels for efficient distribution matching, leveraging maximum mean discrepancy to facilitate effective knowledge distillation. Simultaneously, NICKEL employs an interactive compression method that enhances the communication between the student generator and discriminator, achieving a balanced and stable compression process. Our comprehensive evaluation on the StyleGAN2 architecture with the FFHQ dataset shows the effectiveness of our approach, with NICKEL & DiME achieving FID scores of 10.45 and 15.93 at compression rates of 95.73% and 98.92%, respectively. Remarkably, our methods sustain generative quality even at an extreme compression rate of 99.69%, surpassing the previous state-of-the-art performance by a large margin. These findings not only demonstrate our methodologies' capacity to significantly lower GANs' computational demands but also pave the way for deploying high-quality GAN models in settings with limited resources. Our code will be released soon.
Submitted 4 September, 2024; v1 submitted 19 May, 2024;
originally announced May 2024.
-
Faces that Speak: Jointly Synthesising Talking Face and Speech from Text
Authors:
Youngjoon Jang,
Ji-Hoon Kim,
Junseok Ahn,
Doyeop Kwak,
Hong-Sun Yang,
Yoon-Cheol Ju,
Il-Hwan Kim,
Byeong-Yeol Kim,
Joon Son Chung
Abstract:
The goal of this work is to simultaneously generate natural talking faces and speech outputs from text. We achieve this by integrating Talking Face Generation (TFG) and Text-to-Speech (TTS) systems into a unified framework. We address the main challenges of each task: (1) generating a range of head poses representative of real-world scenarios, and (2) ensuring voice consistency despite variations in facial motion for the same identity. To tackle these issues, we introduce a motion sampler based on conditional flow matching, which is capable of high-quality motion code generation in an efficient way. Moreover, we introduce a novel conditioning method for the TTS system, which utilises motion-removed features from the TFG model to yield uniform speech outputs. Our extensive experiments demonstrate that our method effectively creates natural-looking talking faces and speech that accurately match the input text. To our knowledge, this is the first effort to build a multimodal synthesis system that can generalise to unseen identities.
Submitted 16 May, 2024;
originally announced May 2024.
-
WateRF: Robust Watermarks in Radiance Fields for Protection of Copyrights
Authors:
Youngdong Jang,
Dong In Lee,
MinHyuk Jang,
Jong Wook Kim,
Feng Yang,
Sangpil Kim
Abstract:
The advances in the Neural Radiance Fields (NeRF) research offer extensive applications in diverse domains, but protecting their copyrights has not yet been researched in depth. Recently, NeRF watermarking has been considered one of the pivotal solutions for safely deploying NeRF-based 3D representations. However, existing methods are designed to apply only to implicit or explicit NeRF representations. In this work, we introduce an innovative watermarking method that can be employed in both representations of NeRF. This is achieved by fine-tuning NeRF to embed binary messages in the rendering process. In detail, we propose utilizing the discrete wavelet transform in the NeRF space for watermarking. Furthermore, we adopt a deferred back-propagation technique and introduce a combination with the patch-wise loss to improve rendering quality and bit accuracy with minimum trade-offs. We evaluate our method in three different aspects: capacity, invisibility, and robustness of the embedded watermarks in the 2D-rendered images. Our method achieves state-of-the-art performance with faster training speed over the compared state-of-the-art methods.
Submitted 11 July, 2024; v1 submitted 3 May, 2024;
originally announced May 2024.
-
Learning with errors based dynamic encryption that discloses residue signal for anomaly detection
Authors:
Yeongjun Jang,
Joowon Lee,
Junsoo Kim,
Hyungbo Shim
Abstract:
Anomaly detection is a protocol that detects integrity attacks on control systems by comparing the residue signal with a threshold. Implementing anomaly detection on encrypted control systems has been a challenge because it is hard to detect an anomaly from the encrypted residue signal without the secret key. In this paper, we propose a dynamic encryption scheme for a linear system that automatically discloses the residue signal. The initial state and the input are encrypted based on the zero-dynamics of the system, so that the effect of encryption on the residue signal remains identically zero. The proposed scheme is shown to be secure in the sense that no other information than the residue signal is disclosed. Furthermore, we demonstrate a method of utilizing the disclosed residue signal to operate an observer-based controller over encrypted data for an infinite time horizon without re-encryption.
Submitted 3 April, 2024;
originally announced April 2024.
-
Hybrid Neural Representations for Spherical Data
Authors:
Hyomin Kim,
Yunhui Jang,
Jaeho Lee,
Sungsoo Ahn
Abstract:
In this paper, we study hybrid neural representations for spherical data, a domain of increasing relevance in scientific research. In particular, our work focuses on weather and climate data as well as cosmic microwave background (CMB) data. Although previous studies have delved into coordinate-based neural representations for spherical signals, they often fail to capture the intricate details of highly nonlinear signals. To address this limitation, we introduce a novel approach named Hybrid Neural Representations for Spherical data (HNeR-S). Our main idea is to use spherical feature-grids to obtain positional features which are combined with a multilayer perceptron to predict the target signal. We consider feature-grids with equirectangular and hierarchical equal area isolatitude pixelization structures that align with weather data and CMB data, respectively. We extensively verify the effectiveness of our HNeR-S for regression, super-resolution, temporal interpolation, and compression tasks.
Submitted 5 February, 2024;
originally announced February 2024.
-
FreGrad: Lightweight and Fast Frequency-aware Diffusion Vocoder
Authors:
Tan Dat Nguyen,
Ji-Hoon Kim,
Youngjoon Jang,
Jaehun Kim,
Joon Son Chung
Abstract:
The goal of this paper is to generate realistic audio with a lightweight and fast diffusion-based vocoder, named FreGrad. Our framework consists of the following three key components: (1) We employ discrete wavelet transform that decomposes a complicated waveform into sub-band wavelets, which helps FreGrad to operate on a simple and concise feature space, (2) We design a frequency-aware dilated convolution that elevates frequency awareness, resulting in generating speech with accurate frequency information, and (3) We introduce a bag of tricks that boosts the generation quality of the proposed model. In our experiments, FreGrad achieves 3.7 times faster training time and 2.2 times faster inference speed compared to our baseline while reducing the model size by 0.6 times (only 1.78M parameters) without sacrificing the output quality. Audio samples are available at: https://mm.kaist.ac.kr/projects/FreGrad.
Submitted 18 January, 2024;
originally announced January 2024.
-
A Unified Approach for Comprehensive Analysis of Various Spectral and Tissue Doppler Echocardiography
Authors:
Jaeik Jeon,
Jiyeon Kim,
Yeonggul Jang,
Yeonyee E. Yoon,
Dawun Jeong,
Youngtaek Hong,
Seung-Ah Lee,
Hyuk-Jae Chang
Abstract:
Doppler echocardiography offers critical insights into cardiac function and phases by quantifying blood flow velocities and evaluating myocardial motion. However, previous methods for automating Doppler analysis, ranging from initial signal processing techniques to advanced deep learning approaches, have been constrained by their reliance on electrocardiogram (ECG) data and their inability to process Doppler views collectively. We introduce a novel unified framework using a convolutional neural network for comprehensive analysis of spectral and tissue Doppler echocardiography images that combines automatic measurements and end-diastole (ED) detection into a singular method. The network automatically recognizes key features across various Doppler views, with novel Doppler shape embedding and anti-aliasing modules enhancing interpretation and ensuring consistent analysis. Empirical results indicate a consistent outperformance in performance metrics, including dice similarity coefficients (DSC) and intersection over union (IoU). The proposed framework demonstrates strong agreement with clinicians in Doppler automatic measurements and competitive performance in ED detection.
Submitted 14 November, 2023;
originally announced November 2023.
-
Seeing Through the Conversation: Audio-Visual Speech Separation based on Diffusion Model
Authors:
Suyeon Lee,
Chaeyoung Jung,
Youngjoon Jang,
Jaehun Kim,
Joon Son Chung
Abstract:
The objective of this work is to extract a target speaker's voice from a mixture of voices using visual cues. Existing works on audio-visual speech separation have demonstrated their performance with promising intelligibility, but maintaining naturalness remains a challenge. To address this issue, we propose AVDiffuSS, an audio-visual speech separation model based on a diffusion mechanism known for its capability in generating natural samples. For an effective fusion of the two modalities for diffusion, we also propose a cross-attention-based feature fusion mechanism. This mechanism is specifically tailored for the speech domain to integrate the phonetic information from audio-visual correspondence in speech generation. In this way, the fusion process maintains the high temporal resolution of the features, without excessive computational requirements. We demonstrate that the proposed framework achieves state-of-the-art results on two benchmarks, VoxCeleb2 and LRS3, producing speech with notably better naturalness.
Submitted 30 October, 2023;
originally announced October 2023.
-
Self supervised convolutional kernel based handcrafted feature harmonization: Enhanced left ventricle hypertension disease phenotyping on echocardiography
Authors:
Jina Lee,
Youngtaek Hong,
Dawun Jeong,
Yeonggul Jang,
Jaeik Jeon,
Sihyeon Jeong,
Taekgeun Jung,
Yeonyee E. Yoon,
Inki Moon,
Seung-Ah Lee,
Hyuk-Jae Chang
Abstract:
Radiomics, a medical imaging technique, extracts quantitative handcrafted features from images to predict diseases. Harmonization of those features ensures consistent feature extraction across various imaging devices and protocols. Methods for harmonization include standardized imaging protocols, statistical adjustments, and evaluating feature robustness. Myocardial diseases such as Left Ventricular Hypertrophy (LVH) and Hypertensive Heart Disease (HHD) are diagnosed via echocardiography, but variable imaging settings pose challenges. Harmonization techniques are crucial for applying handcrafted features in disease diagnosis in such scenarios. Self-supervised learning (SSL) enhances data understanding within limited datasets and adapts to diverse data settings. ConvNeXt-V2 integrates convolutional layers into SSL, displaying superior performance in various tasks. This study focuses on convolutional filters within SSL, using them as preprocessing to convert images into feature maps for handcrafted feature harmonization. Our proposed method excelled in harmonization evaluation and exhibited superior LVH classification performance compared to existing methods.
Submitted 22 November, 2023; v1 submitted 13 October, 2023;
originally announced October 2023.
-
TalkNCE: Improving Active Speaker Detection with Talk-Aware Contrastive Learning
Authors:
Chaeyoung Jung,
Suyeon Lee,
Kihyun Nam,
Kyeongha Rho,
You Jin Kim,
Youngjoon Jang,
Joon Son Chung
Abstract:
The goal of this work is Active Speaker Detection (ASD), a task to determine whether a person is speaking or not in a series of video frames. Previous works have dealt with the task by exploring network architectures while learning effective representations has been less explored. In this work, we propose TalkNCE, a novel talk-aware contrastive loss. The loss is only applied to part of the full segments where a person on the screen is actually speaking. This encourages the model to learn effective representations through the natural correspondence of speech and facial movements. Our loss can be jointly optimized with the existing objectives for training ASD models without the need for additional supervision or training data. The experiments demonstrate that our loss can be easily integrated into the existing ASD frameworks, improving their performance. Our method achieves state-of-the-art performances on AVA-ActiveSpeaker and ASW datasets.
Submitted 21 September, 2023;
originally announced September 2023.
-
Improving Out-of-Distribution Detection in Echocardiographic View Classification through Enhancing Semantic Features
Authors:
Jaeik Jeon,
Seongmin Ha,
Yeonggul Jang,
Yeonyee E. Yoon,
Jiyeon Kim,
Hyunseok Jeong,
Dawun Jeong,
Youngtaek Hong,
Seung-Ah Lee,
Hyuk-Jae Chang
Abstract:
In echocardiographic view classification, accurately detecting out-of-distribution (OOD) data is essential but challenging, especially given the subtle differences between in-distribution and OOD data. While conventional OOD detection methods, such as Mahalanobis distance (MD), are effective in far-OOD scenarios with clear distinctions between distributions, they struggle to discern the less obvious variations characteristic of echocardiographic data. In this study, we introduce a novel use of label smoothing to enhance semantic feature representation in echocardiographic images, demonstrating that these enriched semantic features are key for significantly improving near-OOD instance detection. By combining label smoothing with MD-based OOD detection, we establish a new benchmark for accuracy in echocardiographic OOD detection.
Submitted 23 November, 2023; v1 submitted 31 August, 2023;
originally announced August 2023.
-
Benchmarking Deep Learning Frameworks for Automated Diagnosis of Ocular Toxoplasmosis: A Comprehensive Approach to Classification and Segmentation
Authors:
Syed Samiul Alam,
Samiul Based Shuvo,
Shams Nafisa Ali,
Fardeen Ahmed,
Arbil Chakma,
Yeong Min Jang
Abstract:
Ocular Toxoplasmosis (OT) is a common eye infection caused by T. gondii that can cause vision problems. Diagnosis is typically done through a clinical examination and imaging, but these methods can be complicated and costly, requiring trained personnel. To address this issue, we have created a benchmark study that evaluates the effectiveness of existing pre-trained networks using transfer learning techniques to detect OT from fundus images. Furthermore, we have also analysed the performance of transfer-learning based segmentation networks to segment lesions in the images. This research seeks to provide a guide for future researchers looking to utilise DL techniques and develop a cheap, automated, easy-to-use, and accurate diagnostic method. We have performed in-depth analysis of different feature extraction techniques in order to find the optimal one for OT classification and segmentation of lesions. For classification tasks, we have evaluated pre-trained models such as VGG16, MobileNetV2, InceptionV3, ResNet50, and DenseNet121. Among them, MobileNetV2 outperformed all other models in terms of Accuracy (Acc), Recall, and F1 Score, exceeding the second-best model, InceptionV3, by 0.7% in Acc. However, DenseNet121 achieved the best result in terms of Precision, which was 0.1% higher than MobileNetV2. For the segmentation task, this work has exploited the U-Net architecture. In order to utilize transfer learning, the encoder block of the traditional U-Net was replaced by MobileNetV2, InceptionV3, ResNet34, and VGG16 to evaluate different architectures; moreover, two different loss functions (Dice loss and Jaccard loss) were exploited in order to find the optimal one. The MobileNetV2/U-Net outperformed ResNet34 by 0.5% and 2.1% in terms of Acc and Dice Score, respectively, when the Jaccard loss function is employed during training.
Submitted 18 May, 2023;
originally announced May 2023.
-
Self-supervised Image Denoising with Downsampled Invariance Loss and Conditional Blind-Spot Network
Authors:
Yeong Il Jang,
Keuntek Lee,
Gu Yong Park,
Seyun Kim,
Nam Ik Cho
Abstract:
There have been many image denoisers using deep neural networks, which outperform conventional model-based methods by large margins. Recently, self-supervised methods have attracted attention because constructing a large real noise dataset for supervised training is an enormous burden. The most representative self-supervised denoisers are based on blind-spot networks, which exclude the receptive field's center pixel. However, excluding any input pixel is abandoning some information, especially when the input pixel at the corresponding output position is excluded. In addition, a standard blind-spot network fails to reduce real camera noise due to the pixel-wise correlation of noise, though it successfully removes independently distributed synthetic noise. Hence, to realize a more practical denoiser, we propose a novel self-supervised training framework that can remove real noise. For this, we derive the theoretic upper bound of a supervised loss where the network is guided by the downsampled blinded output. Also, we design a conditional blind-spot network (C-BSN), which selectively controls the blindness of the network to use the center pixel information. Furthermore, we exploit a random subsampler to decorrelate noise spatially, making the C-BSN free of visual artifacts that were often seen in downsample-based methods. Extensive experiments show that the proposed C-BSN achieves state-of-the-art performance on real-world datasets as a self-supervised denoiser and shows qualitatively pleasing results without any post-processing or refinement.
Submitted 28 July, 2023; v1 submitted 19 April, 2023;
originally announced April 2023.
-
NeBLa: Neural Beer-Lambert for 3D Reconstruction of Oral Structures from Panoramic Radiographs
Authors:
Sihwa Park,
Seongjun Kim,
Doeyoung Kwon,
Yohan Jang,
In-Seok Song,
Seung Jun Baek
Abstract:
Panoramic radiography (Panoramic X-ray, PX) is a widely used imaging modality for dental examination. However, PX only provides a flattened 2D image, lacking a 3D view of the oral structure. In this paper, we propose NeBLa (Neural Beer-Lambert) to estimate 3D oral structures from real-world PX. NeBLa tackles full 3D reconstruction for varying subjects (patients) where each reconstruction is based only on a single panoramic image. We create an intermediate representation called simulated PX (SimPX) from 3D Cone-beam computed tomography (CBCT) data based on the Beer-Lambert law of X-ray rendering and rotational principles of PX imaging. SimPX aims not only at truthfully simulating PX, but also at facilitating the reverting process back to 3D data. We propose a novel neural model based on ray tracing which exploits both global and local input features to convert SimPX to 3D output. At inference, a real PX image is translated to a SimPX-style image with semantic regularization, and the translated image is processed by the generation module to produce high-quality outputs. Experiments show that NeBLa outperforms prior state-of-the-art in reconstruction tasks both quantitatively and qualitatively. Unlike prior methods, NeBLa does not require any prior information such as the shape of dental arches, nor the matched PX-CBCT dataset for training, which is difficult to obtain in clinical practice. Our code is available at https://github.com/sihwa-park/nebla.
Submitted 6 February, 2024; v1 submitted 8 April, 2023;
originally announced April 2023.
-
SurgT challenge: Benchmark of Soft-Tissue Trackers for Robotic Surgery
Authors:
Joao Cartucho,
Alistair Weld,
Samyakh Tukra,
Haozheng Xu,
Hiroki Matsuzaki,
Taiyo Ishikawa,
Minjun Kwon,
Yong Eun Jang,
Kwang-Ju Kim,
Gwang Lee,
Bizhe Bai,
Lueder Kahrs,
Lars Boecking,
Simeon Allmendinger,
Leopold Muller,
Yitong Zhang,
Yueming Jin,
Sophia Bano,
Francisco Vasconcelos,
Wolfgang Reiter,
Jonas Hajek,
Bruno Silva,
Estevao Lima,
Joao L. Vilaca,
Sandro Queiros
, et al. (1 additional author not shown)
Abstract:
This paper introduces the "SurgT: Surgical Tracking" challenge which was organised in conjunction with MICCAI 2022. There were two purposes for the creation of this challenge: (1) the establishment of the first standardised benchmark for the research community to assess soft-tissue trackers; and (2) to encourage the development of unsupervised deep learning methods, given the lack of annotated data in surgery. A dataset of 157 stereo endoscopic videos from 20 clinical cases, along with stereo camera calibration parameters, has been provided. Participants were assigned the task of developing algorithms to track the movement of soft tissues, represented by bounding boxes, in stereo endoscopic videos. At the end of the challenge, the developed methods were assessed on a previously hidden test subset. This assessment uses benchmarking metrics that were purposely developed for this challenge, to verify the efficacy of unsupervised deep learning algorithms in tracking soft tissue. The metric used for ranking the methods was the Expected Average Overlap (EAO) score, which measures the average overlap between a tracker's and the ground truth bounding boxes. Coming first in the challenge was the deep learning submission by ICVS-2Ai with a superior EAO score of 0.617. This method employs ARFlow to estimate unsupervised dense optical flow from cropped images, using photometric and regularization losses. Second, Jmees, with an EAO of 0.583, uses deep learning for surgical tool segmentation on top of a non-deep learning baseline method: CSRT. CSRT by itself scores a similar EAO of 0.563. The results from this challenge show that currently, non-deep learning methods are still competitive. The dataset and benchmarking tool created for this challenge have been made publicly available at https://surgt.grand-challenge.org/.
Submitted 30 August, 2023; v1 submitted 6 February, 2023;
originally announced February 2023.
-
Energy-Efficient Vehicular Edge Computing with One-by-one Access Scheme
Authors:
Youngsu Jang,
Seongah Jeong,
Joonhyuk Kang
Abstract:
With the advent of ever-growing vehicular applications, vehicular edge computing (VEC) has been a promising solution to augment the computing capacity of future smart vehicles. The ultimate challenge to fulfill the quality of service (QoS) is increasingly prominent with constrained computing and communication resources of vehicles. In this paper, we propose an energy-efficient task offloading strategy for a VEC system with a one-by-one scheduling mechanism, where only one vehicle wakes up at a time to offload with a road side unit (RSU). The goal of the system is to minimize the total energy consumption of vehicles by jointly optimizing user scheduling, offloading ratio and bit allocation within a given mission time. To this end, the non-convex and mixed-integer optimization problem is formulated and solved by adopting the Lagrange dual problem, whose superior performance is verified via numerical results, as compared to other benchmark schemes.
Submitted 31 January, 2023;
originally announced January 2023.
-
Quantum Communication Systems: Vision, Protocols, Applications, and Challenges
Authors:
Syed Rakib Hasan,
Mostafa Zaman Chowdhury,
Md. Saiam,
Yeong Min Jang
Abstract:
The growth of modern technological sectors has risen to such a spectacular level that the blessings of technology have spread to every corner of the world, even to remote corners. At present, technological development finds its basis in the theoretical foundation of classical physics in every field of scientific research, such as wireless communication, visible light communication, machine learning, and computing. The performance of conventional communication systems is becoming almost saturated due to the usage of bits. The usage of quantum bits in communication technology has already surpassed the limits of existing technologies and revealed a new path for developing technological sectors. Implementing quantum technology over existing system infrastructure not only provides better performance but also keeps the system secure and reliable. This technology is very promising for future communication systems. This review article describes the fundamentals of quantum communication, its vision, design goals, information processing, and protocols. In addition, a quantum communication architecture is proposed. The article also explains the prospective applications of quantum technology over existing technological systems, along with the potential challenges of achieving this goal.
Submitted 26 December, 2022;
originally announced December 2022.
-
Over-the-Air Consensus for Distributed Vehicle Platooning Control (Extended version)
Authors:
Jihoon Lee,
Yonghoon Jang,
Hansol Kim,
Seong-Lyun Kim,
Seung-Woo Ko
Abstract:
Distributed control of vehicle platooning is referred to as distributed consensus (DC), since many autonomous vehicles (AVs) reach a consensus to move as one body with the same velocity and inter-distance. For DC control to be stable, other AVs' real-time position information must be input to each AV's controller via vehicle-to-vehicle (V2V) communications. However, this requires too many V2V links to be simultaneously established and frequently retrained, causing frequent packet loss and longer communication latency. We propose a novel DC algorithm called over-the-air consensus (AirCons), a joint communication-and-control design with two key features to overcome the above limitations. First, exploiting a wireless signal's superposition and broadcasting properties makes all AVs' signals converge to a specific value proportional to the participating AVs' average position, without individual V2V channel information. Second, the estimated average position is used to control each AV's dynamics instead of each AV's individual position. Through analytic and numerical studies, the effectiveness of the proposed AirCons, designed on the state-of-the-art New Radio architecture, is verified by showing a $14.22\%$ control gain compared to the benchmark without the average position.
Submitted 11 November, 2022;
originally announced November 2022.
-
Metric Learning for User-defined Keyword Spotting
Authors:
Jaemin Jung,
Youkyum Kim,
Jihwan Park,
Youshin Lim,
Byeong-Yeol Kim,
Youngjoon Jang,
Joon Son Chung
Abstract:
The goal of this work is to detect new spoken terms defined by users. While most previous works address Keyword Spotting (KWS) as a closed-set classification problem, this limits their transferability to unseen terms. The ability to define custom keywords has advantages in terms of user experience.
In this paper, we propose a metric learning-based training strategy for user-defined keyword spotting. In particular, we make the following contributions: (1) we construct a large-scale keyword dataset from an existing speech corpus and propose a filtering method to remove data that degrade model training; (2) we propose a metric learning-based two-stage training strategy and demonstrate that the proposed method improves performance on the user-defined keyword spotting task by enriching keyword representations; (3) to facilitate fair comparison in the user-defined KWS field, we propose a unified evaluation protocol and metrics.
Our proposed system does not require incremental training on the user-defined keywords, and it outperforms previous works by a significant margin on the Google Speech Commands dataset under both the proposed and the existing metrics.
Submitted 1 November, 2022;
originally announced November 2022.
-
Bayesian approaches for Quantifying Clinicians' Variability in Medical Image Quantification
Authors:
Jaeik Jeon,
Yeonggul Jang,
Youngtaek Hong,
Hackjoon Shim,
Sekeun Kim
Abstract:
Medical imaging, including MRI, CT, and ultrasound, plays a vital role in clinical decisions. Accurate segmentation is essential to measure the structure of interest from the image. However, manual segmentation is highly operator-dependent, which leads to high inter- and intra-rater variability of quantitative measurements. In this paper, we explore whether a Bayesian predictive distribution parameterized by deep neural networks can capture the clinicians' inter- and intra-rater variability. By exploring and analyzing recently emerged approximate inference schemes, we evaluate whether approximate Bayesian deep learning with the posterior over segmentations can learn inter- and intra-rater variability both in segmentation and in clinical measurements. The experiments are performed with two different imaging modalities: MRI and ultrasound. We empirically demonstrate that a Bayesian predictive distribution parameterized by deep neural networks can approximate the clinicians' inter- and intra-rater variability. We show a new perspective on analyzing medical images quantitatively by providing clinical measurement uncertainty.
Submitted 6 July, 2022; v1 submitted 5 July, 2022;
originally announced July 2022.
-
LC-FDNet: Learned Lossless Image Compression with Frequency Decomposition Network
Authors:
Hochang Rhee,
Yeong Il Jang,
Seyun Kim,
Nam Ik Cho
Abstract:
Recent learning-based lossless image compression methods encode an image in units of subimages and achieve performance comparable to conventional non-learning algorithms. However, these methods do not consider the performance drop in the high-frequency region, giving equal consideration to the low- and high-frequency areas. In this paper, we propose a new lossless image compression method that performs the encoding in a coarse-to-fine manner to separate and process low- and high-frequency regions differently. We first compress the low-frequency components and then use them as additional input for encoding the remaining high-frequency region. The low-frequency components act as a strong prior in this case, which leads to improved estimation in the high-frequency area. In addition, we design the frequency decomposition process to be adaptive to color channel, spatial location, and image characteristics. As a result, our method derives an image-specific optimal ratio of low/high-frequency components. Experiments show that the proposed method achieves state-of-the-art performance on benchmark high-resolution datasets.
Submitted 12 December, 2021;
originally announced December 2021.
-
Dynamic Placement of Rapidly Deployable Mobile Sensor Robots Using Machine Learning and Expected Value of Information
Authors:
Alice Agogino,
Hae Young Jang,
Vivek Rao,
Ritik Batra,
Felicity Liao,
Rohan Sood,
Irving Fang,
R. Lily Hu,
Emerson Shoichet-Bartus,
John Matranga
Abstract:
Although the Industrial Internet of Things has increased the number of sensors permanently installed in industrial plants, there will be gaps in coverage due to broken sensors or sparse density in very large plants, such as in the petrochemical industry. Modern emergency response operations are beginning to use Small Unmanned Aerial Systems (sUAS) that have the ability to drop sensor robots to precise locations. sUAS can provide longer-term persistent monitoring that aerial drones are unable to provide. Despite the relatively low cost of these assets, the choice of which robotic sensing systems to deploy to which part of an industrial process in a complex plant environment during emergency response remains challenging.
This paper describes a framework for optimizing the deployment of emergency sensors as a preliminary step towards realizing the responsiveness of robots in disaster circumstances. AI techniques (Long short-term memory, 1-dimensional convolutional neural network, logistic regression, and random forest) identify regions where sensors would be most valued without requiring humans to enter the potentially dangerous area. In the case study described, the cost function for optimization considers costs of false-positive and false-negative errors. Decisions on mitigation include implementing repairs or shutting down the plant. The Expected Value of Information (EVI) is used to identify the most valuable type and location of physical sensors to be deployed to increase the decision-analytic value of a sensor network. This method is applied to a case study using the Tennessee Eastman process data set of a chemical plant, and we discuss implications of our findings for operation, distribution, and decision-making of sensors in plant emergency and resilience scenarios.
Submitted 15 November, 2021;
originally announced November 2021.
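The Expected Value of Information mentioned in the abstract above can be illustrated with a toy two-action decision. This is a minimal sketch under assumed conditions, not the paper's model: a single binary fault, one sensor with a given sensitivity and specificity, and two actions ("mitigate" and "continue") whose only costs are the false-positive and false-negative costs. All names and numbers are hypothetical.

```python
def expected_value_of_information(
    p_fault: float,
    sensitivity: float,
    specificity: float,
    cost_false_positive: float,
    cost_false_negative: float,
) -> float:
    """EVI = best expected cost without the sensor minus with the sensor."""

    def best_cost(p: float) -> float:
        # "Mitigate" only costs something when there is no fault (false positive);
        # "continue" only costs something when there is a fault (false negative).
        return min((1.0 - p) * cost_false_positive, p * cost_false_negative)

    prior_cost = best_cost(p_fault)

    # Probability that the sensor raises an alarm, and that it stays quiet.
    p_alarm = sensitivity * p_fault + (1.0 - specificity) * (1.0 - p_fault)
    p_quiet = 1.0 - p_alarm

    # Expected cost when the decision is made after seeing the reading,
    # using Bayes' rule for the posterior fault probability.
    posterior_cost = 0.0
    if p_alarm > 0:
        posterior_cost += p_alarm * best_cost(sensitivity * p_fault / p_alarm)
    if p_quiet > 0:
        posterior_cost += p_quiet * best_cost((1.0 - sensitivity) * p_fault / p_quiet)

    return prior_cost - posterior_cost
```

In this simplified setting, an uninformative sensor (sensitivity and specificity both 0.5) has an EVI of zero, while a perfect sensor recovers the entire prior expected cost. Comparing this quantity across candidate sensor types and locations is the kind of ranking the abstract describes.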
-
Breaking Moravec's Paradox: Visual-Based Distribution in Smart Fashion Retail
Authors:
Shin Woong Sung,
Hyunsuk Baek,
Hyeonjun Sim,
Eun Hie Kim,
Hyunwoo Hwangbo,
Young Jae Jang
Abstract:
In this paper, we report an industry-academia collaborative study on the distribution method of fashion products using an artificial intelligence (AI) technique combined with an optimization method. To meet the current fashion trend of short product lifetimes and an increasing variety of styles, the company produces limited volumes of a large variety of styles. However, due to the limited volume of each style, some styles may not be distributed to some off-line stores. As a result, this high-variety, low-volume strategy presents another challenge to distribution managers. We collaborated with KOLON F/C, one of the largest fashion business units in South Korea, to develop models and an algorithm to optimally distribute the products to the stores based on the visual images of the products. The team developed a deep learning model that effectively represents the styles of clothes based on their visual image. Moreover, the team created an optimization model that effectively determines the product mix for each store based on the image representation of clothes. In the past, computers were only considered to be useful for conducting logical calculations, and visual perception and cognition were considered to be difficult computational tasks. The proposed approach is significant in that it uses both AI (perception and cognition) and mathematical optimization (logical calculation) to address a practical supply chain problem, which is why the study was called "Breaking Moravec's Paradox."
Submitted 9 July, 2020;
originally announced July 2020.
-
Energy-Efficient UAV Relaying Robust Resource Allocation in Uncertain Adversarial Networks
Authors:
S. Ahmed,
Mostafa Z. Chowdhury,
S. R. Sabuj,
M. I. Alam,
Y. M. Jang
Abstract:
The mobile relaying technique is a critical enhancing technology in wireless communications because it offers a higher chance of supporting a remote user from the base station (BS) with better quality of service. This paper investigates energy-efficient (EE) mobile relaying networks, mounted on an unmanned aerial vehicle (UAV), while unknown adversaries try to intercept the legitimate link. We aim to robustly optimize the transmit power of both the UAV and the BS, along with the relay hovering path, speed, and acceleration. The BS sends legitimate information, which is forwarded to the user by the relay. This procedure is defined as the information-causality-constraint (ICC). We jointly optimize the worst case secrecy rate (WCSR) and UAV propulsion energy consumption (PEC) for a finite time horizon. We construct the BS-UAV, the UAV-user, and the UAV-adversary channel models. We apply the UAV PEC considering UAV speed and acceleration. Finally, we derive the EE UAV relay-user maximization problem in adversarial wireless networks. Because the problem is non-convex, we propose an iterative, sub-optimal algorithm to optimize the EE UAV relay under constraints such as ICC, trajectory, speed, acceleration, and transmit power. First, we optimize both the BS and UAV transmit power and the hovering speed for known UAV path planning and acceleration. Using the optimal transmit power and speed, we then obtain the optimal trajectory and acceleration. We compare our algorithm with existing algorithms and demonstrate improved EE UAV relaying communication for our model.
Submitted 23 July, 2021; v1 submitted 28 June, 2020;
originally announced June 2020.
-
Opportunities of Optical Spectrum for Future Wireless Communications
Authors:
Mostafa Zaman Chowdhury,
Moh Khalid Hasan,
Md Shahjalal,
Eun Bi Shin,
Yeong Min Jang
Abstract:
The service-quality requirements of future fifth-generation (5G) communication, such as data rate, latency, power consumption, and number of connections, are very high. Moreover, the Internet of Things (IoT) requires massive connectivity. Optical wireless communication (OWC) technologies such as visible light communication, light fidelity, optical camera communication, and free space optical communication can effectively support the successful deployment of 5G and IoT. This paper presents the contributions of OWC networks to 5G and IoT solutions.
Submitted 30 May, 2020;
originally announced June 2020.
-
Optical wireless hybrid networks for 5G and beyond communications
Authors:
Mostafa Zaman Chowdhury,
Moh Khalid Hasan,
Md Shahjalal,
Md Tanvir Hossan,
Yeong Min Jang
Abstract:
The next fifth-generation (5G) and beyond communication systems, with ultra-high speed, ultra-low latency, and extremely high reliability, will consist of heterogeneous networks. These heterogeneous networks will consist not only of radio frequency (RF) based systems but also of optical wireless based systems. Hybrid architectures among different networks are an excellent approach for achieving the required level of service quality. In this paper, we present the opportunities brought by hybrid systems that consider RF as well as optical wireless based communication technologies. We also discuss the key research directions for hybrid network systems.
Submitted 30 May, 2020;
originally announced June 2020.
-
Energy-Efficient Task Offloading for Vehicular Edge Computing: Joint Optimization of Offloading and Bit Allocation
Authors:
Youngsu Jang,
Jinyeop Na,
Seongah Jeong,
Joonhyuk Kang
Abstract:
With the rapid development of vehicular networks, various applications that require high computation resources have emerged. To efficiently execute these applications, vehicular edge computing (VEC) can be employed. VEC offloads the computation tasks to the VEC node, i.e., the road side unit (RSU), which improves vehicular service and reduces the energy consumption of the vehicle. However, the communication environment is time-varying due to the movement of the vehicle, so finding the optimal offloading parameters is still an open problem. Therefore, it is necessary to investigate an optimal offloading strategy for effective energy savings in energy-limited vehicles. In this paper, we consider the changes in the communication environment due to the various speeds of vehicles, which were not considered in previous studies. We then jointly optimize the offloading proportion and the uplink/computation/downlink bit allocation of multiple vehicles, with the aim of minimizing the total energy consumption of the vehicles under the delay constraint. Numerical results demonstrate that the proposed energy-efficient offloading strategy significantly reduces the total energy consumption.
Submitted 15 October, 2019;
originally announced October 2019.
-
6G Wireless Communication Systems: Applications, Requirements, Technologies, Challenges, and Research Directions
Authors:
Mostafa Zaman Chowdhury,
Md. Shahjalal,
Shakil Ahmed,
Yeong Min Jang
Abstract:
Fifth-generation (5G) communication, which has many more features than fourth-generation communication, will be officially launched very soon. A new paradigm of wireless communication, the sixth-generation (6G) system, with the full support of artificial intelligence, is expected to be deployed between 2027 and 2030. Beyond 5G, some fundamental issues that need to be addressed are higher system capacity, higher data rate, lower latency, and improved quality of service (QoS) compared to the 5G system. This paper presents the vision of future 6G wireless communication and its network architecture. We discuss emerging technologies such as artificial intelligence, terahertz communications, optical wireless technology, free space optic network, blockchain, three-dimensional networking, quantum communications, unmanned aerial vehicle, cell-free communications, integration of wireless information and energy transfer, integration of sensing and communication, integration of access-backhaul networks, dynamic network slicing, holographic beamforming, and big data analytics that can assist the 6G architecture development in guaranteeing the QoS. We present the expected applications with the requirements and the possible technologies for 6G communication. We also outline the possible challenges and research directions to reach this goal.
Submitted 25 September, 2019;
originally announced September 2019.
-
A Novel Indoor Mobile Localization System Based on Optical Camera Communication
Authors:
Md. Tanvir Hossan,
Mostafa Zaman Chowdhury,
Amirul Islam,
Yeong Min Jang
Abstract:
Localizing smartphones in indoor environments offers excellent opportunities for e-commerce. In this paper, we propose a localization technique for smartphones in indoor environments. This technique can calculate the coordinates of a smartphone using existing illumination infrastructure with light-emitting diodes (LEDs). The system can locate smartphones without further modification of the existing LED light infrastructure. Smartphones do not have a fixed position and may move frequently anywhere in an environment. Our algorithm uses multiple (i.e., more than two) LED lights simultaneously. The smartphone gets the LED-IDs from the LED lights that are within the field of view (FOV) of the smartphone's camera. These LED-IDs contain the coordinate information of the LED lights. Concurrently, the pixel area of the projected image on the image sensor (IS) changes with the relative motion between the smartphone and each LED light, which allows the algorithm to calculate the distance from the smartphone to that LED.
Submitted 5 October, 2018;
originally announced October 2018.