A Review on Methods and Applications in Multimodal Deep Learning
Deep learning has been applied to a wide range of applications and has become increasingly popular in recent years. The goal of multimodal deep learning (MMDL) is to create models that can process and link information using various modalities. Despite the ...
Video Frame Interpolation: A Comprehensive Survey
Video Frame Interpolation (VFI) is a fascinating and challenging problem in the computer vision (CV) field, aiming to generate non-existing frames between two consecutive video frames. In recent years, many algorithms based on optical flow, kernel, or ...
A Decoupled Kernel Prediction Network Guided by Soft Mask for Single Image HDR Reconstruction
Recent works on single image high dynamic range (HDR) reconstruction fail to hallucinate plausible textures, resulting in missing information and artifacts in large-scale under/over-exposed regions. In this article, a decoupled kernel prediction network ...
Point Cloud Quality Assessment: Dataset Construction and Learning-based No-reference Metric
Full-reference (FR) point cloud quality assessment (PCQA) has achieved impressive progress in recent years. However, in many cases, obtaining the reference point clouds is difficult, so no-reference (NR) metrics have become a research hotspot. Few ...
Pose- and Attribute-consistent Person Image Synthesis
Person Image Synthesis aims at transferring the appearance of the source person image into a target pose. Existing methods cannot handle large pose variations and therefore suffer from two critical problems: (1) synthesis distortion due to the ...
Scalable Color Quantization for Task-centric Image Compression
Conventional image compression techniques that target perceptual quality are not generally optimized for classification tasks using deep neural networks (DNNs). To compress images for DNN inference tasks, recent studies have proposed task-centric ...
From False-Free to Privacy-Oriented Communitarian Microblogging Social Networks
Online Social Networks (OSNs) have gained enormous popularity in recent years. They provide a dynamic platform for sharing content (text messages or multimedia) and for facilitating communication between friends and acquaintances. Microblogging services ...
Query-Guided Prototype Learning with Decoder Alignment and Dynamic Fusion in Few-Shot Segmentation
Few-shot segmentation aims to segment objects belonging to a specific class under the guidance of a few annotated examples. Most existing approaches follow the prototype learning paradigm and generate category prototypes by squeezing masked feature maps ...
ML-CookGAN: Multi-Label Generative Adversarial Network for Food Image Generation
Generating food images from recipe and ingredient information can be applied to many tasks such as food recommendation, recipe development, and health management. For the characteristics of food images, this paper proposes ML-CookGAN, a novel CGAN. This ...
GHOSM: Graph-based Hybrid Outline and Skeleton Modelling for Shape Recognition
An efficient and accurate shape detection model plays a major role in many research areas. With the emergence of more complex shapes in real-life applications, shape recognition models need to capture the structure with more effective features to achieve ...
Distill-DBDGAN: Knowledge Distillation and Adversarial Learning Framework for Defocus Blur Detection
Defocus blur detection (DBD) aims to segment the blurred regions from a given image affected by defocus blur. It is a crucial pre-processing step for various computer vision tasks. With the increasing popularity of small mobile devices, there is a need ...
Boosting Relationship Detection in Images with Multi-Granular Self-Supervised Learning
Visual and spatial relationship detection in images has been a fast-developing research topic in the multimedia field, which learns to recognize the semantic/spatial interactions between objects in an image, aiming to compose a structured semantic ...
Robust Long-Term Tracking via Localizing Occluders
Occlusion is known as one of the most challenging factors in long-term tracking because of its unpredictable shape. Existing works are devoted to the design of loss functions, training strategies, or model architectures, which are considered to have not ...
Context Prior Guided Semantic Modeling for Biomedical Image Segmentation
Most state-of-the-art deep networks proposed for biomedical image segmentation are developed based on U-Net. While remarkable success has been achieved, its inherent limitations hinder it from yielding more precise segmentation. First, its receptive field ...
A Optimized BERT for Multimodal Sentiment Analysis
Sentiment analysis of one modality (e.g., text or image) has been broadly studied. However, not much attention has been paid to the sentiment analysis of multi-modal data. As the research on and applications of multi-modal data analysis are becoming more ...
Progressive Transformer Machine for Natural Character Reenactment
Character reenactment aims to control a target person’s full-head movement by a driving monocular sequence that is made up of the driving character video. Current algorithms utilize convolutional neural networks in generative adversarial networks, which ...
Is it Violin or Viola? Classifying the Instruments’ Music Pieces using Descriptive Statistics
Classifying music pieces based on their instrument sounds is pivotal for analysis and application purposes. Given its importance, techniques using machine learning have been proposed to classify violin and viola music pieces. The violin and viola are two ...
EiMOL: A Secure Medical Image Encryption Algorithm based on Optimization and the Lorenz System
Nowadays, the demand for digital images from different intelligent devices and sensors has dramatically increased in smart healthcare. Due to advanced low-cost and easily available tools and software, manipulation of these images is an easy task. Thus, ...
UEFPN: Unified and Enhanced Feature Pyramid Networks for Small Object Detection
Object detection models based on feature pyramid networks have made significant progress in general object detection. However, small object detection is still a challenge for the existing models. In this paper, we think that two factors in the existing ...
Deep Learning-Based Intra Mode Derivation for Versatile Video Coding
In intra coding, Rate Distortion Optimization (RDO) is performed to achieve the optimal intra mode from a pre-defined candidate list. The optimal intra mode is also required to be encoded and transmitted to the decoder side besides the residual signal, ...
Learning Explicit and Implicit Dual Common Subspaces for Audio-visual Cross-modal Retrieval
Audio-visual tracks in video contain rich semantic information with potential in many applications and research. Since the audio-visual data have inconsistent distributions and because of the heterogeneous nature of representations, the heterogeneous gap ...
Real-time Image Enhancement with Attention Aggregation
Image enhancement has stimulated significant research over the past years for its great application potential in video conferencing scenarios. Nevertheless, most existing image enhancement approaches are still struggling to find a good tradeoff that ...
Toward Visual Behavior and Attention Understanding for Augmented 360 Degree Videos
Augmented reality (AR) overlays digital content onto reality. In an AR system, correct and precise estimations of user visual fixations and head movements can enhance the quality of experience by allocating more computational resources for analyzing, ...
Mirror Segmentation via Semantic-aware Contextual Contrasted Feature Learning
Mirrors are everywhere in our daily lives. Existing computer vision systems do not consider mirrors, and hence may get confused by the reflected content inside a mirror, resulting in a severe performance degradation. However, separating the real content ...