Self-Guiding Multimodal LSTM—When We Do Not Have a Perfect Training Dataset for Image Captioning
In this paper, a self-guiding multimodal LSTM (sgLSTM) image captioning model is proposed to handle an uncontrolled imbalanced real-world image-sentence dataset. We collect a FlickrNYC dataset from Flickr as our testbed with 306,165 images and the ...
Visual Attention Prediction for Stereoscopic Video by Multi-Module Fully Convolutional Network
Visual attention is an important mechanism in the human visual system (HVS) and there have been numerous saliency detection algorithms designed for 2D images/video recently. However, the research for fixation detection of stereoscopic video is still ...
Occlusion-Aware Depth Map Coding Optimization Using Allowable Depth Map Distortions
In depth map coding, rate-distortion optimization for those pixels that will cause occlusion in view synthesis is a rather challenging task, since the synthesis distortion estimation is complicated by the warping competition and the occlusion order can be ...
Sample Fusion Network: An End-to-End Data Augmentation Network for Skeleton-Based Human Action Recognition
Data augmentation is a widely used technique for enhancing the generalization ability of deep neural networks for skeleton-based human action recognition (HAR) tasks. Most existing data augmentation methods generate new samples by means of handcrafted ...
Saliency From Growing Neural Gas: Learning Pre-Attentional Structures for a Flexible Attention System
Artificial visual attention has been an active research area for over two decades. In particular, the concept of saliency has been implemented in many different ways. Early approaches aimed at closely modeling saliency processing with concepts from ...
Exploiting Images for Video Recognition: Heterogeneous Feature Augmentation via Symmetric Adversarial Learning
Training deep models of video recognition usually requires sufficient labeled videos in order to achieve good performance without over-fitting. However, it is quite labor-intensive and time-consuming to collect and annotate a large amount of videos. ...
A Work Efficient Parallel Algorithm for Exact Euclidean Distance Transform
A fully-parallelized work-time optimal algorithm is presented for computing the exact Euclidean Distance Transform (EDT) of a 2D binary image with the size of $n\times n$. Unlike ...
Reference-Free Quality Assessment of Sonar Images via Contour Degradation Measurement
Sonar imagery plays a significant role in oceanic applications since there is little natural light underwater, and light is irrelevant to sonar imaging. Sonar images are very likely to be affected by various distortions during the process of transmission ...
Multi-View Linear Discriminant Analysis Network
In many real-world applications, an object can be described from multiple views or styles, leading to the emerging multi-view analysis. To eliminate the complicated (usually highly nonlinear) view discrepancy for favorable cross-view recognition and ...
O2O Method for Fast 2D Shape Retrieval
A novel post-processing method, online to offline (O2O), is proposed in this paper to improve the efficiency of shape retrieval. The essence of the method is to move more of the computationally expensive work offline. Based on this approach,...
Point Cloud Saliency Detection by Local and Global Feature Fusion
Inspired by the characteristics of the human visual system, a novel method is proposed for detecting the visually salient regions on 3D point clouds. First, the local distinctness of each point is evaluated based on the difference with its local ...
Learning to Find Unpaired Cross-Spectral Correspondences
We present a deep architecture and learning framework for establishing correspondences across cross-spectral visible and infrared images in an unpaired setting. To overcome the unpaired cross-spectral data problem, we design the unified image translation ...
Robust Adaptive Median Binary Pattern for Noisy Texture Classification and Retrieval
Texture is an important characteristic for different computer vision tasks and applications. Local binary pattern (LBP) is considered one of the most efficient texture descriptors yet. However, LBP has some notable limitations, in particular its ...
Optimal Adaptive Quantization Based on Temporal Distortion Propagation Model for HEVC
Optimal adaptive quantization is one of the key points to optimize the coding efficiency of video encoders. The latest block-based video compression standards, such as high-efficiency video coding (HEVC), extensively use ...
Weakly Supervised Salient Object Detection by Learning A Classifier-Driven Map Generator
Top-down saliency detection aims to highlight the regions of a specific object category, and typically relies on pixel-wise annotated training data. In this paper, we address the high cost of collecting such training data by a weakly supervised approach ...
Learning Deep Features for One-Class Classification
We present a novel deep-learning-based approach for one-class transfer learning in which labeled data from an unrelated task is used for feature learning in one-class classification. The proposed method operates on top of a convolutional ...
AttGAN: Facial Attribute Editing by Only Changing What You Want
Facial attribute editing aims to manipulate single or multiple attributes on a given face image, i.e., to generate a new face image with desired attributes while preserving other details. Recently, the generative adversarial net (GAN) and encoder–...
Reconstruction of Stochastic 3D Signals With Symmetric Statistics From 2D Projection Images Motivated by Cryo-Electron Microscopy
Cryo-electron microscopy provides 2D projection images of the 3D electron scattering intensity of many instances of the particle under study (e.g., a virus). Both symmetry (rotational point groups) and heterogeneity are important aspects of biological ...
Multiple Pyramids Based Image Inpainting Using Local Patch Statistics and Steering Kernel Feature
In this paper, we propose a novel multiple pyramids based image inpainting method using local patch statistics and geometric feature-based sparse representation to maintain texture consistency and structure coherence. First, we approximate each patch in ...
Adaptive Morphological Reconstruction for Seeded Image Segmentation
Morphological reconstruction (MR) is often employed by seeded image segmentation algorithms such as watershed transform and power watershed, as it is able to filter out seeds (regional minima) to reduce over-segmentation. However, the MR might mistakenly ...
Fast Blind Quality Assessment of DIBR-Synthesized Video Based on High-High Wavelet Subband
Free-viewpoint video, as the development direction of the next-generation video technologies, uses the depth-image-based rendering (DIBR) technique for the synthesis of video sequences at viewpoints, where real captured videos are missing. As reference ...
Texture Variation Adaptive Image Denoising With Nonlocal PCA
Image textures, as a kind of local variation, provide important information for the human visual system. Many image textures, especially the small-scale or stochastic textures, are rich in high-frequency variations and are difficult to preserve. ...
CAM-RNN: Co-Attention Model Based RNN for Video Captioning
Video captioning is a technique that bridges vision and language together, for which both visual information and text information are quite important. Typical approaches are based on the recurrent neural network (RNN), where the video caption is generated ...
TextField: Learning a Deep Direction Field for Irregular Scene Text Detection
Scene text detection is an important step in the scene text reading system. The main challenges lie in significantly varied sizes and aspect ratios, arbitrary orientations, and shapes. Driven by the recent progress in deep learning, impressive ...
Underwater Image Enhancement Using Adaptive Retinal Mechanisms
We propose an underwater image enhancement model inspired by the morphology and function of the teleost fish retina. We aim to solve the problems of underwater image degradation raised by the blurring and nonuniform color biasing. In particular, the ...
Learning Adaptive Discriminative Correlation Filters via Temporal Consistency Preserving Spatial Feature Selection for Robust Visual Object Tracking
With efficient appearance learning models, discriminative correlation filter (DCF) has been proven to be very successful in recent video object tracking benchmarks and competitions. However, the existing DCF paradigm suffers from two major issues, i.e., ...
Efficient Bandwidth Estimation in 2D Filtered Backprojection Reconstruction
A generalized cross-validation approach to estimate the reconstruction filter bandwidth in 2D filtered backprojection is presented. The method writes the reconstruction equation in equivalent backprojected filtering form, derives results on ...
Subjective and Objective Quality Assessment of Stitched Images for Virtual Reality
We consider the problem of quality assessment (QA) of image stitching algorithms used to generate panoramic images for virtual reality applications. Our contributions are two-fold. We design the Indian Institute of Science Stitched Image QA (ISIQA) ...
Conditional Random Field Model for Robust Multi-Focus Image Fusion
In this paper, a novel multi-focus image fusion algorithm based on conditional random field optimization (mf-CRF) is proposed. It is based on a unary term that includes the combined activity estimation of both high and low frequencies of the input images,...
Channel Splitting Network for Single MR Image Super-Resolution
High resolution magnetic resonance (MR) imaging is desirable in many clinical applications due to its contribution to more accurate subsequent analyses and early clinical diagnoses. Single image super-resolution (SISR) is an effective and cost efficient ...