Front Matter
Partially-Shared Variational Auto-encoders for Unsupervised Domain Adaptation with Target Shift
Target shift, the different label distributions of source and target domains, is an important problem for practical use of unsupervised domain adaptation (UDA); as we do not know labels in target domain datasets, we cannot ensure an identical ...
Learning Where to Focus for Efficient Video Object Detection
Transferring existing image-based detectors to video is non-trivial since the quality of frames is always deteriorated by partial occlusion, rare poses, and motion blur. Previous approaches propagate and aggregate features across video ...
Learning Object Permanence from Video
Object Permanence allows people to reason about the location of non-visible objects, by understanding that they continue to exist even when not perceived directly. Object Permanence is critical for building a model of the world, since objects in ...
Adaptive Text Recognition Through Visual Matching
This work addresses the problems of generalization and flexibility for text recognition in documents. We introduce a new model that exploits the repetitive nature of characters in languages, and decouples the visual decoding and linguistic ...
Actions as Moving Points
The existing action tubelet detectors often depend on heuristic anchor design and placement, which might be computationally expensive and sub-optimal for precise localization. In this paper, we present a conceptually simple, computationally ...
Geometric Correspondence Fields: Learned Differentiable Rendering for 3D Pose Refinement in the Wild
- Alexander Grabner,
- Yaming Wang,
- Peizhao Zhang,
- Peihong Guo,
- Tong Xiao,
- Peter Vajda,
- Peter M. Roth,
- Vincent Lepetit
We present a novel 3D pose refinement approach based on differentiable rendering for objects of arbitrary categories in the wild. In contrast to previous methods, we make two main contributions: First, instead of comparing real-world images and ...
3D Fluid Flow Reconstruction Using Compact Light Field PIV
Particle Imaging Velocimetry (PIV) estimates the fluid flow by analyzing the motion of injected particles. The problem is challenging as the particles lie at different depths but have similar appearances. Tracking a large number of moving ...
Contextual Diversity for Active Learning
The requirement of large annotated datasets restricts the use of deep convolutional neural networks (CNNs) for many practical applications. The problem can be mitigated by using active learning (AL) techniques which, under a given annotation budget, ...
Temporal Aggregate Representations for Long-Range Video Understanding
Future prediction, especially in long-range videos, requires reasoning from current and past observations. In this work, we address questions of temporal extent, scaling, and level of semantic abstraction with a flexible multi-granular temporal ...
Stochastic Fine-Grained Labeling of Multi-state Sign Glosses for Continuous Sign Language Recognition
In this paper, we propose novel stochastic modeling of various components of a continuous sign language recognition (CSLR) system that is based on the transformer encoder and connectionist temporal classification (CTC). Most importantly, we model ...
General 3D Room Layout from a Single View by Render-and-Compare
We present a novel method to reconstruct the 3D layout of a room—walls, floors, ceilings—from a single perspective view in challenging conditions, by contrast with previous single-view methods restricted to cuboid-shaped layouts. This input view ...
Neural Dense Non-Rigid Structure from Motion with Latent Space Constraints
We introduce the first dense neural non-rigid structure from motion (N-NRSfM) approach, which can be trained end-to-end in an unsupervised manner from 2D point tracks. Compared to the competing methods, our combination of loss functions is fully-...
Multimodal Memorability: Modeling Effects of Semantics and Decay on Video Memorability
A key capability of an intelligent system is deciding when events from past experience must be remembered and when they can be forgotten. Towards this goal, we develop a predictive model of human visual event memory and how those memories decay ...
Yet Another Intermediate-Level Attack
The transferability of adversarial examples across deep neural network (DNN) models is the crux of a spectrum of black-box attacks. In this paper, we propose a novel method to enhance the black-box transferability of baseline adversarial examples. ...
Topology-Change-Aware Volumetric Fusion for Dynamic Scene Reconstruction
Topology change is a challenging problem for 4D reconstruction of dynamic scenes. In the classic volumetric fusion-based framework, a mesh is usually extracted from the TSDF volume as the canonical surface representation to help estimate ...
Early Exit or Not: Resource-Efficient Blind Quality Enhancement for Compressed Images
Lossy image compression is widely used to save communication bandwidth, resulting in undesirable compression artifacts. Recently, many approaches have been proposed to reduce image compression artifacts at the decoder side; however, ...
PatchNets: Patch-Based Generalizable Deep Implicit 3D Shape Representations
Implicit surface representations, such as signed-distance functions, combined with deep learning have led to impressive models which can represent detailed shapes of objects with arbitrary topology. Since a continuous function is learned, the ...
Infrastructure-Based Multi-camera Calibration Using Radial Projections
Multi-camera systems are an important sensor platform for intelligent systems such as self-driving cars. Pattern-based calibration techniques can be used to calibrate the intrinsics of the cameras individually. However, extrinsic calibration of ...
MotionSqueeze: Neural Motion Feature Learning for Video Understanding
Motion plays a crucial role in understanding videos and most state-of-the-art neural models for video classification incorporate motion information typically using optical flows extracted by a separate off-the-shelf method. As the frame-by-frame ...
Polarized Optical-Flow Gyroscope
We merge by generalization two principles of passive optical sensing of motion. One is common spatially resolved imaging, where motion induces temporal readout changes at high-contrast spatial features, as used in traditional optical-flow. The ...
Online Meta-learning for Multi-source and Semi-supervised Domain Adaptation
Domain adaptation (DA) is the topical problem of adapting models from labelled source datasets so that they perform well on target datasets where only unlabelled or partially labelled data is available. Many methods have been proposed to address ...
On the Effectiveness of Image Rotation for Open Set Domain Adaptation
Open Set Domain Adaptation (OSDA) bridges the domain gap between a labeled source domain and an unlabeled target domain, while also rejecting target classes that are not present in the source. To avoid negative transfer, OSDA can be tackled by ...
Combining Task Predictors via Enhancing Joint Predictability
Predictor combination aims to improve a (target) predictor of a learning task based on the (reference) predictors of potentially relevant tasks, without having access to the internals of individual predictors. We present a new predictor ...
Multi-scale Positive Sample Refinement for Few-Shot Object Detection
Few-shot object detection (FSOD) helps detectors adapt to unseen classes with few training instances, and is useful when manual annotation is time-consuming or data acquisition is limited. Unlike previous attempts that exploit few-shot ...
Single-Image Depth Prediction Makes Feature Matching Easier
Good local features improve the robustness of many 3D re-localization and multi-view reconstruction pipelines. The problem is that viewing angle and distance severely impact the recognizability of a local feature. Attempts to improve appearance ...
Index Terms
- Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XVI