Search Results (11)

Search Parameters:
Keywords = VQ-VAE

17 pages, 14286 KiB  
Article
Anomaly Detection in Optical Coherence Tomography Angiography (OCTA) with a Vector-Quantized Variational Auto-Encoder (VQ-VAE)
by Hana Jebril, Meltem Esengönül and Hrvoje Bogunović
Bioengineering 2024, 11(7), 682; https://doi.org/10.3390/bioengineering11070682 - 5 Jul 2024
Viewed by 443
Abstract
Optical coherence tomography angiography (OCTA) provides detailed information on retinal blood flow and perfusion. Abnormal retinal perfusion indicates possible ocular or systemic disease. We propose a deep learning-based anomaly detection model to identify such anomalies in OCTA. It combines two deep learning approaches: first, representation learning with a Vector-Quantized Variational Auto-Encoder (VQ-VAE) followed by Auto-Regressive (AR) modeling; second, epistemic uncertainty estimates from a Bayesian U-Net employed to segment the vasculature on OCTA en face images. Evaluation on two large public datasets, DRAC and OCTA-500, demonstrates effective anomaly detection (an AUROC of 0.92 on DRAC and 0.75 on OCTA-500) and localization (a mean Dice score of 0.61 on DRAC) on this challenging task. To our knowledge, this is the first work to address anomaly detection in OCTA.
(This article belongs to the Special Issue Translational AI and Computational Tools for Ophthalmic Disease)
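As an illustration of the detection pipeline this abstract describes, here is a minimal sketch of likelihood-based anomaly scoring with a VQ-VAE and an autoregressive prior. The `vqvae` and `ar_prior` interfaces, shapes, and upsampling step are assumptions for illustration, not the authors' implementation:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def anomaly_map(vqvae, ar_prior, image):
    """vqvae: hypothetical model with encode(image) -> (1, H, W) code indices;
    ar_prior: hypothetical model mapping codes -> (1, K, H, W) logits;
    image: tensor of shape (1, C, H_img, W_img)."""
    codes = vqvae.encode(image)                                # discrete latent codes
    log_probs = F.log_softmax(ar_prior(codes), dim=1)          # per-position code likelihoods
    nll = -log_probs.gather(1, codes.unsqueeze(1)).squeeze(1)  # (1, H, W) surprise map
    # Upsample the latent-space NLL to image resolution for localization.
    return F.interpolate(nll.unsqueeze(1), size=image.shape[-2:],
                         mode="bilinear", align_corners=False)
```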

18 pages, 5104 KiB  
Article
Hierarchical Vector-Quantized Variational Autoencoder and Vector Credibility Mechanism for High-Quality Image Inpainting
by Cheng Li, Dan Xu and Kuai Chen
Electronics 2024, 13(10), 1852; https://doi.org/10.3390/electronics13101852 - 9 May 2024
Viewed by 670
Abstract
Image inpainting infers the missing areas of a corrupted image from the information in the undamaged part. With fast-developing deep-learning technology, many existing image inpainting methods can generate plausible results from damaged images. However, they still suffer from over-smoothed textures or textural distortion in cases of complex textural detail or large damaged areas. To restore textures at a fine-grained level, we propose an image inpainting method based on a hierarchical VQ-VAE with a vector credibility mechanism. It first trains the hierarchical VQ-VAE on ground truth images to update two codebooks and to obtain two corresponding vector collections containing information on the ground truth images. The two vector collections are fed to a decoder to generate the corresponding high-fidelity outputs. An encoder is then trained on the corresponding damaged images; it generates vector collections approximating the ground truth with the help of the prior knowledge provided by the codebooks. The two vector collections then pass through the decoder of the hierarchical VQ-VAE to produce the inpainted results. In addition, we apply a vector credibility mechanism that encourages the vector collections from damaged images to approximate those from ground truth images. To further improve the inpainting result, we apply a refinement network, which uses residual blocks with different dilation rates to acquire both global information and local textural details. Extensive experiments conducted on several datasets demonstrate that our method outperforms state-of-the-art ones.
(This article belongs to the Special Issue Applications of Artificial Intelligence in Image and Video Processing)
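The nearest-neighbour codebook lookup at the core of any (hierarchical) VQ-VAE, applied here at each scale, can be written compactly; the shapes below are illustrative assumptions:

```python
import torch

def quantize(features: torch.Tensor, codebook: torch.Tensor):
    """features: (B, D, H, W) encoder output; codebook: (K, D) code vectors.
    Returns the quantized features and the chosen code indices."""
    B, D, H, W = features.shape
    flat = features.permute(0, 2, 3, 1).reshape(-1, D)  # (B*H*W, D)
    dists = torch.cdist(flat, codebook)                 # (B*H*W, K) pairwise distances
    idx = dists.argmin(dim=1)                           # nearest codebook entry
    quantized = codebook[idx].reshape(B, H, W, D).permute(0, 3, 1, 2)
    return quantized, idx.reshape(B, H, W)
```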

12 pages, 12887 KiB  
Communication
A One-Class Classifier for the Detection of GAN Manipulated Multi-Spectral Satellite Images
by Lydia Abady, Giovanna Maria Dimitri and Mauro Barni
Remote Sens. 2024, 16(5), 781; https://doi.org/10.3390/rs16050781 - 24 Feb 2024
Viewed by 790
Abstract
Current image generative models have achieved remarkably realistic image quality, offering numerous academic and industrial applications. However, to ensure these models are used for benign purposes, it is essential to develop tools that definitively detect whether an image has been synthetically generated. Consequently, several detectors with excellent performance in computer vision applications have been developed. However, these detectors cannot be applied as they are to multi-spectral satellite images, necessitating the training of new models. While two-class classifiers generally achieve high detection accuracies, they struggle to generalize to image domains and generative architectures different from those encountered during training. In this paper, we propose a one-class classifier based on Vector Quantized Variational Autoencoder 2 (VQ-VAE 2) features to overcome the limitations of two-class classifiers. We first highlight the generalization problem faced by binary classifiers, demonstrated by training and testing an EfficientNet-B4 architecture on multiple multi-spectral datasets. We then show that the VQ-VAE 2-based classifier, trained exclusively on pristine images, can detect images from different domains and from architectures not encountered during training. Finally, a head-to-head comparison of the two classifiers on the same generated datasets emphasizes the superior generalization of the VQ-VAE 2-based detector: at a 0.05 false alarm rate, it obtained a probability of detection of 1 for the blue and red channels, versus 0.72 for the EfficientNet-B4 classifier.
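One way to realize the one-class setup described above is to fit a classifier on VQ-VAE 2 features of pristine images only and flag outliers at test time. The sketch below uses scikit-learn's OneClassSVM; the upstream feature extraction and the hyperparameters are assumptions, not the authors' exact configuration:

```python
import numpy as np
from sklearn.svm import OneClassSVM

def fit_one_class(features_pristine: np.ndarray) -> OneClassSVM:
    """features_pristine: (N, D) VQ-VAE 2 features of pristine satellite images."""
    return OneClassSVM(kernel="rbf", nu=0.05).fit(features_pristine)

def is_generated(clf: OneClassSVM, features: np.ndarray) -> np.ndarray:
    # OneClassSVM predicts +1 for inliers (pristine) and -1 for outliers,
    # which this sketch interprets as GAN-generated images.
    return clf.predict(features) == -1
```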

30 pages, 163128 KiB  
Article
PCGen: A Fully Parallelizable Point Cloud Generative Model
by Nicolas Vercheval, Remco Royen, Adrian Munteanu and Aleksandra Pižurica
Sensors 2024, 24(5), 1414; https://doi.org/10.3390/s24051414 - 22 Feb 2024
Viewed by 952
Abstract
Generative models have the potential to revolutionize 3D extended reality. A primary obstacle is that augmented and virtual reality need real-time computing, and current state-of-the-art point cloud random generation methods are not fast enough for these applications. We introduce a vector-quantized variational autoencoder (VQVAE) model that can synthesize high-quality point clouds in milliseconds. Unlike previous work on VQVAEs, our model offers a compact sample representation suitable for conditional generation and data exploration, with potential applications in rapid prototyping. We achieve this by combining architectural improvements with an innovative approach to probabilistic random generation. First, we rethink current parallel point cloud autoencoder structures and propose several solutions to improve robustness, efficiency, and reconstruction quality. Notable contributions in the decoder architecture include an innovative computation layer to process shape semantic information, an attention mechanism that helps the model focus on different areas, and a filter to cover possible sampling errors. Second, we introduce a parallel sampling strategy for VQVAE models consisting of a double encoding system, in which a variational autoencoder learns to generate the complex discrete distribution of the VQVAE, not only allowing quick inference but also describing the shape with a few global variables. We compare the proposed decoder and our VQVAE model with established and concurrent work, and we validate each contribution in turn.
(This article belongs to the Topic Applied Computing and Machine Intelligence (ACMI))
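A minimal sketch of the parallel (non-autoregressive) sampling idea this abstract describes: a small VAE decodes a global Gaussian latent into logits over all VQ-VAE code positions at once, so no sequential loop is needed. All names and shapes are illustrative assumptions:

```python
import torch

@torch.no_grad()
def sample_point_cloud(vae_decoder, vqvae_decoder, codebook,
                       n_samples=1, z_dim=128, grid=64, temperature=1.0):
    z = torch.randn(n_samples, z_dim)        # a few global shape variables
    logits = vae_decoder(z)                  # (n, K, grid) code logits, all positions at once
    probs = torch.softmax(logits / temperature, dim=1)
    idx = torch.multinomial(                 # sample one code per latent position
        probs.permute(0, 2, 1).reshape(-1, probs.shape[1]), 1
    ).reshape(n_samples, grid)
    codes = codebook[idx]                    # (n, grid, D) quantized vectors
    return vqvae_decoder(codes)              # (n, N_points, 3) point clouds
```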

20 pages, 4900 KiB  
Article
Vector Quantized Variational Autoencoder-Based Compressive Sampling Method for Time Series in Structural Health Monitoring
by Ge Liang, Zhenglin Ji, Qunhong Zhong, Yong Huang and Kun Han
Sustainability 2023, 15(20), 14868; https://doi.org/10.3390/su152014868 - 13 Oct 2023
Cited by 1 | Viewed by 1062
Abstract
The theory of compressive sampling (CS) has revolutionized data compression technology by capitalizing on the inherent sparsity of a signal to enable signal recovery from significantly fewer samples than required by the Nyquist–Shannon sampling theorem. Recent advances in deep generative models, which can efficiently represent high-dimensional data in a low-dimensional latent space when trained on big data, have been used to further reduce the sample size for image data compressive sampling. However, compressive sampling of 1D time series data has not significantly benefited from this progress. In this study, we investigate deep neural network architectures suitable for time series data compression and propose an efficient method for solving the compressive sampling problem on one-dimensional (1D) structural health monitoring (SHM) data, based on block CS and a vector quantized variational autoencoder with a naïve multitask paradigm (VQ-VAE-M). The proposed method uses VQ-VAE-M to learn the characteristics of the signal and replaces the "hard constraint" of sparsity in reconstructing the compressively sampled signal, thereby avoiding the need to select an appropriate sparse basis for the signal. A comparative analysis against various CS methods and other deep neural network models was performed on both synthetic data and real-world data from two bridges in China. The results demonstrate the superiority of the proposed method, which achieves the smallest reconstruction errors of 0.038, 0.034 and 0.021 and the highest reconstruction accuracies of 0.882, 0.892 and 0.936 for compression ratios of 4.0, 2.66 and 2.0, respectively.
(This article belongs to the Special Issue Artificial Intelligence (AI) in Structural Health Monitoring)
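For intuition, the sketch below shows the generic latent-optimization formulation of deep-generative CS recovery that this line of work builds on: search the decoder's input space for a signal whose compressed measurements match the observations, in place of a sparsity constraint. The decoder interface is an assumption, and the paper's actual VQ-VAE-M procedure differs in detail:

```python
import torch

def cs_reconstruct(decoder, A, y, z_dim=64, steps=500, lr=1e-2):
    """decoder: hypothetical generative model mapping (1, z_dim) -> (1, N) signal block;
    A: (M, N) block sensing matrix; y: (M,) compressed measurements."""
    z = torch.zeros(1, z_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        x = decoder(z)                                 # candidate time-series block
        loss = torch.sum((A @ x.squeeze(0) - y) ** 2)  # measurement consistency
        loss.backward()
        opt.step()
    return decoder(z).detach()
```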

18 pages, 1711 KiB  
Article
MURM: Utilization of Multi-Views for Goal-Conditioned Reinforcement Learning in Robotic Manipulation
by Seongwon Jang, Hyemi Jeong and Hyunseok Yang
Robotics 2023, 12(4), 119; https://doi.org/10.3390/robotics12040119 - 19 Aug 2023
Viewed by 1631
Abstract
We present a novel framework, multi-view unified reinforcement learning for robotic manipulation (MURM), which efficiently utilizes multiple camera views to train a goal-conditioned policy for a robot performing complex tasks. The MURM framework consists of three main phases: (i) demo collection from an expert, (ii) representation learning, and (iii) offline reinforcement learning. In the demo collection phase, we design a scripted expert policy that uses privileged information, such as the Cartesian coordinates of the target and goal, to solve the tasks. We add noise to the expert policy to provide sufficient interactive information about the environment, as well as suboptimal behavioral trajectories. We designed three tasks in a PyBullet simulation environment, including placing an object at a desired goal position and picking up various objects randomly positioned in the environment. In the representation learning phase, we use a vector-quantized variational autoencoder (VQVAE) to learn a structured latent representation that is more feasible for RL training than high-dimensional raw images. We train a VQVAE model for each distinct camera view and determine the best viewpoint settings for training. In the offline reinforcement learning phase, we use the Implicit Q-learning (IQL) algorithm as our baseline and introduce a separated Q-functions method and a dropout method that can be implemented in multi-view settings to train the goal-conditioned policy with supervised goal images. We conduct experiments in simulation and show that the single-view baseline fails to solve complex tasks, whereas MURM succeeds.
(This article belongs to the Section AI in Robotics)
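One ingredient a goal-conditioned framework of this kind needs is a reward computed per camera view; a common choice in latent goal-conditioned RL (an assumption here, not necessarily MURM's exact formulation) is the negative distance between latents of the current observation and the goal image:

```python
import torch

@torch.no_grad()
def goal_reward(vqvae_per_view, obs_images, goal_images):
    """obs_images / goal_images: dicts mapping view name -> (1, C, H, W) tensor;
    vqvae_per_view: dict mapping view name -> hypothetical VQ-VAE exposing
    encode_latent(image) -> (1, D, h, w) quantized latents."""
    total = 0.0
    for view, vqvae in vqvae_per_view.items():
        z_obs = vqvae.encode_latent(obs_images[view])
        z_goal = vqvae.encode_latent(goal_images[view])
        total -= torch.norm(z_obs - z_goal).item()  # closer to goal => higher reward
    return total / len(vqvae_per_view)              # average over camera views
```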

11 pages, 2769 KiB  
Communication
Fast Jukebox: Accelerating Music Generation with Knowledge Distillation
by Michel Pezzat-Morales, Hector Perez-Meana and Toru Nakashika
Appl. Sci. 2023, 13(9), 5630; https://doi.org/10.3390/app13095630 - 3 May 2023
Cited by 1 | Viewed by 1710
Abstract
The Jukebox model can generate high-diversity music within a single system, which it achieves by using a hierarchical VQ-VAE architecture to compress audio into a discrete space at different compression levels. Even though the results are impressive, the inference stage is tremendously slow. To address this issue, we propose Fast Jukebox, which uses different knowledge distillation strategies to reduce the number of parameters of the prior model over the compressed space. Since Jukebox has shown highly diverse audio generation capabilities, we used a simple compilation of songs for experimental purposes. Evaluation results based on emotional valence show that the proposed approach retains a tendency towards actively pleasant music while reducing inference time for all VQ-VAE levels without compromising quality.
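A minimal sketch of a knowledge-distillation objective for shrinking an autoregressive prior over VQ-VAE codes: the student matches the teacher's softened next-code distribution alongside the hard targets. The temperature and weighting are illustrative assumptions, not the paper's exact strategy:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets, T=2.0, alpha=0.5):
    """Logits: (B, K, L) over K codebook entries at L positions; targets: (B, L) codes."""
    # Soft targets: KL between softened teacher and student distributions.
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)
    # Hard targets: usual cross-entropy on the true next codes.
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1 - alpha) * hard
```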

13 pages, 4267 KiB  
Article
Generating High-Resolution 3D Faces and Bodies Using VQ-VAE-2 with PixelSNAIL Networks on 2D Representations
by Alessio Gallucci, Dmitry Znamenskiy, Yuxuan Long, Nicola Pezzotti and Milan Petkovic
Sensors 2023, 23(3), 1168; https://doi.org/10.3390/s23031168 - 19 Jan 2023
Cited by 3 | Viewed by 2997
Abstract
Modeling and representing 3D shapes of the human body and face is a prominent field due to its applications in the healthcare, clothing, and movie industries. In our work, we tackled the problem of 3D face and body synthesis by reducing 3D meshes to 2D image representations. We show that the face can naturally be modeled on a 2D grid. For the more challenging 3D body geometries, we propose a novel non-bijective 3D–2D conversion method representing the 3D body mesh as a plurality of rendered projections on the 2D grid. We then trained a state-of-the-art vector-quantized variational autoencoder (VQ-VAE-2) to learn a latent representation of the 2D images and fitted a PixelSNAIL autoregressive model to sample novel synthetic meshes. We evaluated our method against a classical one based on principal component analysis (PCA) by sampling from the empirical cumulative distribution of the PCA scores. Using the empirical distributions of two commonly used metrics, specificity and diversity, we quantitatively demonstrate that the synthetic faces generated with our method are statistically closer to real faces than the PCA ones. Our experiment on 3D body geometry requires further research to match the test set statistics but shows promising results.
(This article belongs to the Special Issue Computer Vision in Human Analysis: From Face and Body to Clothes)
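The two evaluation metrics mentioned above can be sketched under a common definition (an assumption; the paper's formulation may differ in detail): specificity as the mean distance from each synthetic sample to its nearest real sample, and diversity as the mean pairwise distance among synthetic samples:

```python
import torch

def specificity(synthetic: torch.Tensor, real: torch.Tensor) -> float:
    """synthetic: (S, D), real: (R, D) flattened mesh vertex coordinates."""
    d = torch.cdist(synthetic, real)           # (S, R) pairwise distances
    return d.min(dim=1).values.mean().item()   # nearest real mesh per sample

def diversity(synthetic: torch.Tensor) -> float:
    d = torch.cdist(synthetic, synthetic)      # (S, S) pairwise distances
    off_diag = d[~torch.eye(len(d), dtype=torch.bool)]  # drop self-distances
    return off_diag.mean().item()
```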

15 pages, 16165 KiB  
Article
Deep Multi-Task Learning for an Autoencoder-Regularized Semantic Segmentation of Fundus Retina Images
by Ge Jin, Xu Chen and Long Ying
Mathematics 2022, 10(24), 4798; https://doi.org/10.3390/math10244798 - 16 Dec 2022
Cited by 2 | Viewed by 1583
Abstract
Automated segmentation of retinal blood vessels is necessary for the diagnosis, monitoring, and treatment planning of disease. Although current U-shaped models have achieved outstanding performance, some challenges remain due to the nature of this problem and of mainstream models. (1) There is no effective framework for obtaining and incorporating features with different spatial and semantic information at multiple levels. (2) Fundus retina images coupled with high-quality blood vessel segmentations are relatively rare. (3) The information in edge regions, which are the most difficult parts to segment, has not received adequate attention. In this work, we propose a novel encoder–decoder architecture based on the multi-task learning paradigm to tackle these challenges. The shared image encoder is regularized by conducting a reconstruction task in the VQ-VAE (Vector Quantized Variational AutoEncoder) module branch to improve its generalization ability. Meanwhile, hierarchical representations are generated and integrated to complement the input image. An edge attention module is designed to make the model capture edge-focused feature representations via deep supervision, focusing on the target edge regions that are the most difficult to recognize. Extensive evaluations on three publicly accessible datasets demonstrate that the proposed model outperforms current state-of-the-art methods.
(This article belongs to the Special Issue Computer Vision and Pattern Recognition with Applications)
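A minimal sketch of a multi-task objective of the kind described: a segmentation loss, a VQ-VAE reconstruction branch regularizing the shared encoder, and edge-focused deep supervision. The loss heads and weights are illustrative assumptions, not the paper's exact values:

```python
import torch.nn.functional as F

def multitask_loss(seg_logits, seg_gt, recon, image, edge_logits, edge_gt,
                   vq_commit_loss, w_rec=0.5, w_edge=1.0, w_vq=0.25):
    """All logits/targets share spatial shape (B, 1, H, W); image is (B, C, H, W);
    vq_commit_loss is the scalar commitment term from the VQ-VAE branch."""
    seg = F.binary_cross_entropy_with_logits(seg_logits, seg_gt)    # main vessel mask
    rec = F.mse_loss(recon, image)                                  # VQ-VAE branch reconstruction
    edge = F.binary_cross_entropy_with_logits(edge_logits, edge_gt) # edge deep supervision
    return seg + w_rec * rec + w_edge * edge + w_vq * vq_commit_loss
```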

15 pages, 6234 KiB  
Article
Anomaly Detection for Agricultural Vehicles Using Autoencoders
by Esma Mujkic, Mark P. Philipsen, Thomas B. Moeslund, Martin P. Christiansen and Ole Ravn
Sensors 2022, 22(10), 3608; https://doi.org/10.3390/s22103608 - 10 May 2022
Cited by 16 | Viewed by 3327
Abstract
The safe in-field operation of autonomous agricultural vehicles requires detecting all objects that pose a risk of collision. Current vision-based algorithms for object detection and classification are unable to detect unknown classes of objects. In this paper, the problem is instead posed as anomaly detection, where convolutional autoencoders are applied to identify any objects deviating from the normal pattern. Training an autoencoder network to reconstruct normal patterns in agricultural fields makes it possible to detect unknown objects by their high reconstruction error. A basic autoencoder (AE), a vector-quantized variational autoencoder (VQ-VAE), a denoising autoencoder (DAE) and a semisupervised autoencoder (SSAE) with a max-margin-inspired loss function are investigated and compared with a baseline object detector based on YOLOv5. Results indicate that the SSAE, with an area under the precision/recall curve (PR AUC) of 0.9353, outperforms the other autoencoder models and is comparable to the object detector, which achieves a PR AUC of 0.9794. Qualitative results show that the SSAE is capable of detecting unknown objects, whereas the object detector cannot and, in specific cases, fails to identify known classes of objects.
(This article belongs to the Special Issue Robotics and Sensors Technology in Agriculture)
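The detection criterion shared by these autoencoder variants can be sketched as follows: reconstruct the input and flag regions with high reconstruction error as unknown objects. The model interface and threshold are assumptions for illustration:

```python
import torch

@torch.no_grad()
def detect_anomaly(autoencoder, image, threshold=0.05):
    """autoencoder: hypothetical model mapping (1, C, H, W) -> (1, C, H, W);
    returns a per-pixel error map and a binary anomaly mask."""
    recon = autoencoder(image)                   # reconstruction of the normal pattern
    error = (image - recon).pow(2).mean(dim=1)   # per-pixel MSE, shape (1, H, W)
    return error, error > threshold              # high error => unknown object
```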

19 pages, 2363 KiB  
Article
Image Anomaly Detection Using Normal Data Only by Latent Space Resampling
by Lu Wang, Dongkai Zhang, Jiahao Guo and Yuexing Han
Appl. Sci. 2020, 10(23), 8660; https://doi.org/10.3390/app10238660 - 3 Dec 2020
Cited by 31 | Viewed by 9324
Abstract
Detecting image anomalies automatically in industrial scenarios can improve economic efficiency, but the scarcity of anomalous samples increases the difficulty of the task. Recently, autoencoders have been widely used for image anomaly detection without anomalous images during training. However, it is hard to determine the proper dimensionality of the latent space, which often leads to unwanted reconstruction of the anomalous parts. To solve this problem, we propose a novel method based on the autoencoder, in which the latent space of the autoencoder is estimated using a discrete probability model. With the estimated probability model, the anomalous components in the latent space can be well excluded and undesirable reconstruction of the anomalous parts avoided. Specifically, we first adopt a VQ-VAE as the reconstruction model to obtain a discrete latent space of normal samples. Then, PixelSNAIL, a deep autoregressive model, is used to estimate the probability model of the discrete latent space. In the detection stage, the autoregressive model determines which parts of the input latent space deviate from the normal distribution. The deviating codes are then resampled from the normal distribution and decoded to yield a restored image closest to the anomalous input. The anomaly is detected by comparing the difference between the restored image and the anomalous image. Our proposed method is evaluated on the high-resolution industrial inspection image dataset MVTec AD, which consists of 15 categories. The results show that the AUROC of the model improves by 15% over the plain autoencoder and yields competitive performance compared with state-of-the-art methods.
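A minimal sketch of the latent resampling step described above: codes that the autoregressive prior finds improbable are replaced by draws from that prior, and the decoded image then serves as an anomaly-free reference for comparison. The interfaces and probability threshold are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def restore(vqvae, ar_prior, image, p_min=1e-3):
    """vqvae: hypothetical model with encode/decode over (1, H, W) code grids;
    ar_prior: hypothetical model mapping codes -> (1, K, H, W) logits."""
    codes = vqvae.encode(image)                                # (1, H, W) code indices
    probs = F.softmax(ar_prior(codes), dim=1)                  # (1, K, H, W)
    p_code = probs.gather(1, codes.unsqueeze(1)).squeeze(1)    # probability of each code
    resampled = torch.distributions.Categorical(
        probs.permute(0, 2, 3, 1)).sample()                    # draws from the normal prior
    codes = torch.where(p_code < p_min, resampled, codes)      # replace improbable codes
    restored = vqvae.decode(codes)
    return restored, (image - restored).abs()                  # reference and difference map
```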
