Deep Boosting for Image Denoising
Boosting is a classic algorithm which has been successfully applied to diverse computer vision tasks. In the scenario of image denoising, however, the existing boosting algorithms are surpassed by the emerging learning-based models. In this paper, ...
Self-Supervised Relative Depth Learning for Urban Scene Understanding
As an agent moves through the world, the apparent motion of scene elements is (usually) inversely proportional to their depth (Strictly speaking, this statement is true only after one has compensated for camera rotation, individual object motion, ...
K-convexity Shape Priors for Segmentation
This work extends popular star-convexity and other more general forms of convexity priors. We represent an object as a union of “convex” overlappable subsets. Since an arbitrary shape can always be divided into convex parts, our regularization ...
Pixel2Mesh: Generating 3D Mesh Models from Single RGB Images
We propose an end-to-end deep learning architecture that produces a 3D shape in triangular mesh from a single color image. Limited by the nature of deep neural networks, previous methods usually represent a 3D shape in volume or point cloud, and it ...
Boosted Attention: Leveraging Human Attention for Image Captioning
Visual attention has shown usefulness in image captioning, with the goal of enabling a caption model to selectively focus on regions of interest. Existing models typically rely on top-down language information and learn attention implicitly by ...
Image Inpainting for Irregular Holes Using Partial Convolutions
Existing deep learning based image inpainting methods use a standard convolutional network over the corrupted image, using convolutional filter responses conditioned on both valid pixels as well as the substitute values in the masked holes (...
Fighting Fake News: Image Splice Detection via Learned Self-Consistency
Advances in photo editing and manipulation tools have made it significantly easier to create fake imagery. Learning to detect such manipulations, however, remains a challenging problem due to the lack of sufficient amounts of manipulated training ...
Hand Pose Estimation via Latent 2.5D Heatmap Regression
Estimating the 3D pose of a hand is an essential part of human-computer interaction. Estimating 3D pose using depth or multi-view sensors has become easier with recent advances in computer vision; however, regressing pose from a single RGB image ...
Depth-Aware CNN for RGB-D Segmentation
Convolutional neural networks (CNN) are limited by the lack of capability to handle geometric information due to the fixed grid kernel structure. The availability of depth data enables progress in RGB-D semantic segmentation with CNNs. State-of-...
CAR-Net: Clairvoyant Attentive Recurrent Network
We present an interpretable framework for path prediction that leverages dependencies between agents’ behaviors and their spatial navigation environment. We exploit two sources of information: the past motion trajectory of the agent of interest ...
Evaluating Capability of Deep Neural Networks for Image Classification via Information Plane
Inspired by the pioneering work of information bottleneck principle for Deep Neural Networks (DNNs) analysis, we design an information plane based framework to evaluate the capability of DNNs for image classification tasks, which not only helps ...
What Do I Annotate Next? An Empirical Study of Active Learning for Action Localization
Despite tremendous progress achieved in temporal action localization, state-of-the-art methods still struggle to train accurate models when annotated data is scarce. In this paper, we introduce a novel active learning framework for temporal ...
Semi-supervised Adversarial Learning to Generate Photorealistic Face Images of New Identities from 3D Morphable Model
We propose a novel end-to-end semi-supervised adversarial framework to generate photorealistic face images of new identities with a wide range of expressions, poses, and illuminations conditioned by synthetic images sampled from a 3D morphable ...
HairNet: Single-View Hair Reconstruction Using Convolutional Neural Networks
We introduce a deep learning-based method to generate full 3D hair geometry from an unconstrained image. Our method can recover local strand details and has real-time performance. State-of-the-art hair modeling techniques rely on large hairstyle ...
Neural Network Encapsulation
A capsule is a collection of neurons which represents different variants of a pattern in the network. The routing scheme ensures only certain capsules which resemble lower counterparts in the higher layer should be activated. However, the ...
Learning Deep Representations with Probabilistic Knowledge Transfer
Knowledge Transfer (KT) techniques tackle the problem of transferring the knowledge from a large and complex neural network into a smaller and faster one. However, existing KT methods are tailored towards classification tasks and they cannot be ...
Integrating Egocentric Videos in Top-View Surveillance Videos: Joint Identification and Temporal Alignment
Videos recorded from first person (egocentric) perspective have little visual appearance in common with those from third person perspective, especially with videos captured by top-view surveillance cameras. In this paper, we aim to relate these ...
Visual-Inertial Object Detection and Mapping
We present a method to populate an unknown environment with models of previously seen objects, placed in a Euclidean reference frame that is inferred causally and on-line using monocular video along with inertial sensors. The system we implement ...
Actor-Centric Relation Network
Current state-of-the-art approaches for spatio-temporal action localization rely on detections at the frame level and model temporal context with 3D ConvNets. Here, we go one step further and model spatio-temporal relations to capture the ...
Zero-Annotation Object Detection with Web Knowledge Transfer
Object detection is one of the major problems in computer vision, and has been extensively studied. Most of the existing detection works rely on labor-intensive supervision, such as ground truth bounding boxes of objects or at least image-level ...
Receptive Field Block Net for Accurate and Fast Object Detection
Current top-performing object detectors depend on deep CNN backbones, such as ResNet-101 and Inception, benefiting from their powerful feature representations but suffering from high computational costs. Conversely, some lightweight model based ...
Deep Adversarial Attention Alignment for Unsupervised Domain Adaptation: The Benefit of Target Expectation Maximization
In this paper, we make two contributions to unsupervised domain adaptation (UDA) using the convolutional neural network (CNN). First, our approach transfers knowledge in all the convolutional layers through attention alignment. Most previous ...
TS2C: Tight Box Mining with Surrounding Segmentation Context for Weakly Supervised Object Detection
This work provides a simple approach to discover tight object bounding boxes with only image-level supervision, called Tight box mining with Surrounding Segmentation Context (TS2C). We observe that object candidates mined through current multiple ...
Hierarchy of Alternating Specialists for Scene Recognition
We introduce a method for improving convolutional neural networks (CNNs) for scene classification. We present a hierarchy of specialist networks, which disentangles the intra-class variation and inter-class similarity in a coarse to fine manner. ...
Move Forward and Tell: A Progressive Generator of Video Descriptions
We present an efficient framework that can generate a coherent paragraph to describe a given video. Previous works on video captioning usually focus on video clips. They typically treat an entire video as a whole and generate the caption ...
Learning Monocular Depth by Distilling Cross-Domain Stereo Networks
Monocular depth estimation aims at estimating a pixelwise depth map for a single image, which has wide applications in scene understanding and autonomous driving. Existing supervised and unsupervised methods face great challenges. Supervised ...
Index Terms
- Computer Vision – ECCV 2018: 15th European Conference, Munich, Germany, September 8-14, 2018, Proceedings, Part XI