Absolute Pose Estimation With a Known Direction by Motion Decoupling
This paper develops an extremely robust solution for absolute pose estimation with a known prior gravity direction by means of motion decoupling. Absolute pose estimation is a fundamental problem in computer vision, and recently the prior known vertical direction is ...
THISNet: Tooth Instance Segmentation on 3D Dental Models via Highlighting Tooth Regions
Automatic tooth instance segmentation on 3D dental models is crucial for digitizing dental treatments and enabling computer-assisted treatment planning. However, it is challenging due to the tight arrangement of dental structures and the consequential ...
Online Discriminative Cross-Modal Hashing
Online cross-modal hashing has received increasing research attention due to its capability of encoding streaming data and updating hash functions simultaneously. Despite significant progress, there is still room for further improving accuracy from two ...
Intermediate Domain-Based Meta Learning Framework for Adaptive Object Detection
Deep-learning-based object detection methods have made significant progress in recent years. However, these methods often suffer from a substantial performance drop when domain shifts occur, making it difficult to generalize a source-domain-trained object ...
Cascade Semantic Prompt Alignment Network for Image Captioning
Image captioning (IC) takes an image as input and generates open-form descriptions in the domain of natural language. IC requires the detection of objects, modeling of relations between them, an assessment of the semantics of the scene and representing ...
Small Sample Image Segmentation by Coupling Convolutions and Transformers
Compared with natural image segmentation, small sample image segmentation tasks, such as medical image segmentation and defect detection, have been less studied. Recent studies have made efforts to bring together Convolutional Neural Networks (CNNs) and ...
A New Training Data Organization Form and Training Mode for Unbiased Scene Graph Generation
Current mainstream studies on Scene Graph Generation (SGG) are devoted to the long-tailed predicate distribution problem in order to generate unbiased scene graphs. The long-tailed predicate distribution exists in the Visual Genome (VG) dataset and is more severe during the SGG network ...
Representation Robustness and Feature Expansion for Exemplar-Free Class-Incremental Learning
Although deep neural networks have made outstanding achievements in many static tasks, they suffer from catastrophic forgetting when faced with a continuous stream of data, since the previous data are usually inaccessible. Stored data or a generative model is ...
Online Multi-Scale Classification and Global Feature Modulation for Robust Visual Tracking
Recent advanced trackers, composed of discriminative classification and dedicated bounding box estimation, have achieved remarkable performance gains in visual object tracking. However, existing methods cannot satisfy the demands of tracking ...
Allowing Supervision in Unsupervised Deformable-Instances Image-to-Image Translation
Replacing objects in images, e.g., changing clothes, is a practical feature of Photoshop. This task is defined as Unsupervised Deformable-Instances Image-to-Image Translation (UDIT), which maps multiple foreground instances of a source domain to a ...
Efficient Task-Specific Feature Re-Fusion for More Accurate Object Detection and Instance Segmentation
Feature pyramid representations, which provide abundant information from various spatial levels for the classification and localization sub-tasks, have been widely adopted in the object detection literature to better handle variations in scale. We find ...
Toward Meta-Shape-Based Multi-View 3D Point Cloud Registration: An Evaluation
Reducing cumulative registration error is critical to accurate 3D multi-view registration. Meta-shape-based methods, which optimize the rigid transformations of point clouds by iteratively registering each point cloud to a meta-shape, remain popular ...
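The meta-shape idea in the abstract above can be illustrated with a minimal Python sketch. It is not the procedure of any evaluated method. It assumes equal-size clouds with known point-to-point correspondences, whereas real multi-view pipelines usually estimate correspondences by nearest-neighbour search, as in ICP. It also takes the meta-shape to be the mean of the currently aligned clouds and solves each rigid step with the Kabsch/SVD method, which makes it essentially generalized Procrustes analysis. The names `kabsch` and `meta_shape_registration` are hypothetical.

```python
import numpy as np


def kabsch(src: np.ndarray, dst: np.ndarray):
    """Least-squares rigid transform (R, t) mapping src onto dst.

    Assumes src[i] corresponds to dst[i]; both are (N, 3) arrays.
    """
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    # Cross-covariance of the centred point sets.
    H = (src - mu_s).T @ (dst - mu_d)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T
    t = mu_d - R @ mu_s
    return R, t


def meta_shape_registration(clouds, iters: int = 10):
    """Alternate between building a meta-shape and registering each cloud to it.

    Hypothetical helper: the meta-shape is assumed to be the mean of the
    currently aligned clouds (known correspondences, equal point counts).
    """
    aligned = [c.copy() for c in clouds]
    for _ in range(iters):
        meta = np.mean(aligned, axis=0)  # current meta-shape estimate
        for k, cloud in enumerate(clouds):
            # Register each view from its original coordinates to the meta-shape.
            R, t = kabsch(cloud, meta)
            aligned[k] = cloud @ R.T + t
    return aligned, np.mean(aligned, axis=0)
```

Because every view is re-registered to the shared meta-shape rather than chained through pairwise registrations, errors from one pair do not propagate down a sequence of views. This is the cumulative-error concern the abstract raises.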
Unveiling the Power of Visible-Thermal Video Object Segmentation
Despite recent progress, Video Object Segmentation (VOS) remains challenging in complex situations such as low-light and dark scenes. In this paper, we tackle the visibility limitations by introducing thermal information as auxiliary information for VOS. Specifically,...
ESNet: An Efficient Framework for Superpixel Segmentation
Superpixel segmentation divides an original image into mid-level regions to reduce the number of computational primitives for subsequent tasks. Two-stage approaches work better but have high computational complexity among the existing deep superpixel ...
Attention-Bridged Modal Interaction for Text-to-Image Generation
We propose a novel Text-to-Image Generation Network, Attention-bridged Modal Interaction Generative Adversarial Network (AMI-GAN), to better explore modal interaction and perception for high-quality image synthesis. The AMI-GAN contains two novel designs: ...
The Devil Is in the Boundary: Boundary-Enhanced Polyp Segmentation
Due to the varied appearance of polyps and the low contrast between the polyp area and its surrounding background, accurate polyp segmentation remains a challenging task. To tackle this issue, we introduce a boundary-enhanced framework for polyp ...
Context-Aware and Semantic-Consistent Spatial Interactions for One-Shot Object Detection Without Fine-Tuning
One-shot object detection (OSOD) without fine-tuning has recently garnered considerable research attention. It aims to directly detect novel-class objects in the target image given only one support image patch, without undergoing the ...
Enhancing Micro-Video Venue Recognition via Multi-Modal and Multi-Granularity Object Relations
Micro-video venue recognition aims to predict the venue category where a micro-video was filmed. Unlike traditional long videos, which contain rich temporal context, micro-videos make venue prediction difficult due to their limited duration (...
Efficient Camouflaged Object Detection Network Based on Global Localization Perception and Local Guidance Refinement
Camouflaged Object Detection (COD) is a challenging visual task due to the complex contours, diverse scales, and high background similarity of camouflaged objects. Existing COD methods encounter two predicaments: one is that they are prone to falling into local perception,...
Identity-Aware Variational Autoencoder for Face Swapping
Face swapping aims to transfer the identity of a source face to a target face image while preserving the target attributes (e.g., facial expression, head pose, illumination, and background). Most existing methods use a face recognition model to extract ...
Weakly-Supervised Video Anomaly Detection With Snippet Anomalous Attention
Video anomaly detection, which focuses on abnormal events contained within untrimmed videos, is attracting increasing research interest. Among different video anomaly detection scenarios, weakly-supervised video anomaly detection poses a significant ...
Analogical Learning-Based Few-Shot Class-Incremental Learning
Few-shot class-incremental learning (FSCIL) is a prominent research topic in the machine learning (ML) community. It faces two significant challenges: forgetting old-class knowledge and overfitting to limited new-class training examples. In this paper, we present a novel ...
Dynamics-Aware Adversarial Attack of Adaptive Neural Networks
In this paper, we investigate the dynamics-aware adversarial attack problem of adaptive neural networks. Most existing adversarial attack algorithms are designed under a basic assumption: the network architecture is fixed throughout the attack ...
Knowledge Synergy Learning for Multi-Modal Tracking
Benefiting from the rich information provided by different modalities, multi-modal tracking has shown significant improvements compared to single-modal tracking. However, in practical applications, multi-modal tracking still faces two major challenges. ...
Equity in Unsupervised Domain Adaptation by Nuclear Norm Maximization
Nuclear norm maximization has been empirically shown to enhance the transferability of unsupervised domain adaptation (UDA) models. In this paper, we identify a new property termed equity, which indicates the balance degree of predicted ...
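The following minimal NumPy sketch illustrates the baseline nuclear-norm-maximization idea that the abstract builds on. The softmax outputs of a batch form a B x C prediction matrix, and the loss is its negative nuclear norm (sum of singular values) divided by the batch size. Minimizing the loss therefore favours predictions that are both confident and spread across classes. This is the generic objective, assumed here for illustration, and not the equity-aware method proposed in the paper. The function names are hypothetical.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    # Row-wise softmax with max-subtraction for numerical stability.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def nuclear_norm_loss(logits: np.ndarray) -> float:
    """Negative batch nuclear norm of the (B x C) prediction matrix.

    Hypothetical helper: lower values mean the batch predictions are more
    confident and more diverse across classes.
    """
    probs = softmax(logits)
    batch_size = probs.shape[0]
    # ord="nuc" computes the sum of singular values.
    return -float(np.linalg.norm(probs, ord="nuc")) / batch_size
```

With confident, one-hot-like predictions over three distinct classes, the nuclear norm approaches 3. Uniform predictions give a rank-one matrix with nuclear norm 1. The loss is therefore lower for the former: roughly -1 versus -1/3 for a batch of three.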
SwinIT: Hierarchical Image-to-Image Translation Framework Without Cycle Consistency
Image-to-image (I2I) translation often requires establishing cycle consistency between the source and the translated images across different domains. However, cycle consistency requires redundant reconstruction and is too restrictive to satisfy the ...
DRNet: Disentanglement and Recombination Network for Few-Shot Semantic Segmentation
Few-shot semantic segmentation (FSS) aims to segment novel classes with only a few annotated samples. Existing FSS methods generally combine the annotated mask with the corresponding support image to generate the class-specific representation, and ...
Weakly-Supervised Action Learning in Procedural Task Videos via Process Knowledge Decomposition
Action learning is a research area that aims to recognize the action category of each frame in the video. Context information is crucial for learning actions, but most existing methods face two challenges in exploiting this information: 1) They apply ...
Pedestrian 3D Shape Understanding for Person Re-Identification via Multi-View Learning
Recent developments in computing power have led to performance improvements on holistic (non-occluded) person Re-Identification (ReID) tasks. Nevertheless, the accuracy of recent methods diminishes when a pedestrian is obstructed by ...
TED-Net: Dispersal Attention for Perceiving Interaction Region in Indirectly-Contact HOI Detection
Human-Object Interaction (HOI) detection is a fertile research area in computer vision that merits further investigation, and it plays an important role in understanding high-level semantic information in images. To achieve superior object detection ...