No abstract available.
PS-FCN: A Flexible Learning Framework for Photometric Stereo
This paper addresses the problem of photometric stereo for non-Lambertian surfaces. Existing approaches often adopt simplified reflectance models to make the problem more tractable, but this greatly hinders their application to real-world ...
Ask, Acquire, and Attack: Data-Free UAP Generation Using Class Impressions
Deep learning models are susceptible to input-specific noise, called adversarial perturbations. Moreover, there exists input-agnostic noise, called Universal Adversarial Perturbations (UAP), that can affect inference of the models over most input ...
Rendering Portraitures from Monocular Camera and Beyond
Shallow Depth-of-Field (DoF) is a desirable effect in photography which renders artistic photos. Usually, it requires single-lens reflex cameras and certain photography skills to generate such effects. Recently, dual lenses on cellphones are used to ...
Learning to Zoom: A Saliency-Based Sampling Layer for Neural Networks
We introduce a saliency-based distortion layer for convolutional neural networks that helps to improve the spatial sampling of input data for a given task. Our differentiable layer can be added as a preprocessing block to existing task networks ...
A Scalable Exemplar-Based Subspace Clustering Algorithm for Class-Imbalanced Data
Subspace clustering methods based on expressing each data point as a linear combination of a few other data points (e.g., sparse subspace clustering) have become a popular tool for unsupervised learning due to their empirical success and ...
RCAA: Relational Context-Aware Agents for Person Search
We aim to search for a target person from a gallery of whole scene images for which the annotations of pedestrian bounding boxes are unavailable. Previous approaches to this problem have relied on a pedestrian proposal net, which may generate ...
Distractor-Aware Siamese Networks for Visual Object Tracking
Recently, Siamese networks have drawn great attention in the visual tracking community because of their balanced accuracy and speed. However, features used in most Siamese tracking approaches can only discriminate foreground from the non-semantic ...
Adding Attentiveness to the Neurons in Recurrent Neural Networks
Recurrent neural networks (RNNs) are capable of modeling the temporal dynamics of complex sequential information. However, the structures of existing RNN neurons mainly focus on controlling the contributions of current and historical information ...
Learning Dynamic Memory Networks for Object Tracking
Template-matching methods for visual tracking have gained popularity recently due to their comparable performance and fast speed. However, they lack effective ways to adapt to changes in the target object’s appearance, making their tracking ...
GeoDesc: Learning Local Descriptors by Integrating Geometry Constraints
Learned local descriptors based on Convolutional Neural Networks (CNNs) have achieved significant improvements on patch-based benchmarks, but have not demonstrated strong generalization ability on recent benchmarks of image-based 3D ...
Unsupervised Image-to-Image Translation with Stacked Cycle-Consistent Adversarial Networks
Recent studies on unsupervised image-to-image translation have made remarkable progress by training a pair of generative adversarial networks with a cycle-consistent loss. However, such unsupervised methods may generate inferior results when the ...
Find and Focus: Retrieve and Localize Video Events with Natural Language Queries
The growth of video sharing services brings new challenges to video retrieval, e.g., the rapid increase in video duration and content diversity. Meeting such challenges calls for new techniques that can effectively retrieve videos with natural ...
Face Super-Resolution Guided by Facial Component Heatmaps
State-of-the-art face super-resolution methods leverage deep convolutional neural networks to learn a mapping between low-resolution (LR) facial patterns and their corresponding high-resolution (HR) counterparts by exploring local appearance ...
Reverse Attention for Salient Object Detection
Benefiting from the rapid development of deep learning techniques, salient object detection has made remarkable progress recently. However, the following two major challenges still hinder its application in embedded devices, low ...
Action Search: Spotting Actions in Videos and Its Application to Temporal Action Localization
State-of-the-art temporal action detectors inefficiently search the entire video for specific actions. Despite the encouraging progress these methods achieve, it is crucial to design automated approaches that only explore parts of the video which ...
PSANet: Point-wise Spatial Attention Network for Scene Parsing
We notice that information flow in convolutional neural networks is restricted to local neighborhood regions due to the physical design of convolutional filters, which limits the overall understanding of complex scenes. In this paper, we propose ...
Repeatability Is Not Enough: Learning Affine Regions via Discriminability
A method for learning local affine-covariant regions is presented. We show that maximizing geometric repeatability does not lead to local regions, a.k.a. features, that are reliably matched, and this necessitates descriptor-based learning. We ...
Compressing the Input for CNNs with the First-Order Scattering Transform
We study the first-order scattering transform as a candidate for reducing the signal processed by a convolutional neural network (CNN). We show theoretical and empirical evidence that in the case of natural images and sufficiently small ...
Faces as Lighting Probes via Unsupervised Deep Highlight Extraction
We present a method for estimating detailed scene illumination using human faces in a single image. In contrast to previous works that estimate lighting in terms of low-order basis functions or distant point lights, our technique estimates ...
DetNet: Design Backbone for Object Detection
Recent CNN-based object detectors, either one-stage methods like YOLO, SSD, and RetinaNet, or two-stage detectors like Faster R-CNN, R-FCN and FPN, usually fine-tune directly from ImageNet pre-trained models designed for the task of ...
Structured Siamese Network for Real-Time Visual Tracking
Local structures of target objects are essential for robust tracking. However, existing methods based on deep neural networks mostly describe the target appearance from the global view, leading to high sensitivity to non-rigid appearance change ...
Associating Inter-image Salient Instances for Weakly Supervised Semantic Segmentation
Effectively bridging between image level keyword annotations and corresponding image pixels is one of the main challenges in weakly supervised semantic segmentation. In this paper, we use an instance-level salient object detector to automatically ...
HybridFusion: Real-Time Performance Capture Using a Single Depth Sensor and Sparse IMUs
We propose a light-weight yet highly robust method for real-time human performance capture based on a single depth camera and sparse inertial measurement units (IMUs). Our method combines non-rigid surface tracking and volumetric fusion to ...
Learning Human-Object Interactions by Graph Parsing Neural Networks
This paper addresses the task of detecting and recognizing human-object interactions (HOI) in images and videos. We introduce the Graph Parsing Neural Network (GPNN), a framework that incorporates structural knowledge while being differentiable ...
Macro-Micro Adversarial Network for Human Parsing
In human parsing, the pixel-wise classification loss has drawbacks in its low-level local inconsistency and high-level semantic inconsistency. The introduction of the adversarial network tackles the two problems using a single discriminator. ...
Stereo Computation for a Single Mixture Image
This paper proposes an original problem of stereo computation from a single mixture image, a challenging problem that had not been researched before. The goal is to separate (i.e., unmix) a single mixture image into two constituent image layers, ...
Dividing and Aggregating Network for Multi-view Action Recognition
In this paper, we propose a new Dividing and Aggregating Network (DA-Net) for multi-view action recognition. In our DA-Net, we learn view-independent representations shared by all views at lower layers, while we learn one view-specific ...
Selective Zero-Shot Classification with Augmented Attributes
In this paper, we introduce a selective zero-shot classification problem: how can the classifier avoid making dubious predictions? Existing attribute-based zero-shot classification methods are shown to work poorly in the selective classification ...
Modeling Varying Camera-IMU Time Offset in Optimization-Based Visual-Inertial Odometry
Yonggen Ling, Linchao Bao, Zequn Jie, Fengming Zhu, Ziyang Li, Shanmin Tang, Yongsheng Liu, Wei Liu, Tong Zhang
Combining cameras and inertial measurement units (IMUs) has been proven effective in motion tracking, as these two sensing modalities offer complementary characteristics that are suitable for fusion. While most works focus on global-shutter ...
Index Terms
- Computer Vision – ECCV 2018: 15th European Conference, Munich, Germany, September 8–14, 2018, Proceedings, Part IX