IEEETCSVT: Vol 34, No 7

Volume 34, Issue 7, July 2024

Publisher: IEEE Press

ISSN: 1051-8215

research-article

Absolute Pose Estimation With a Known Direction by Motion Decoupling

Pages 5215–5228 https://doi.org/10.1109/TCSVT.2023.3264451

This paper develops an extremely robust solution for absolute pose estimation with a known prior gravity direction by motion decoupling. Absolute pose estimation is a fundamental problem in computer vision, and recently the prior known vertical direction is ...

research-article

THISNet: Tooth Instance Segmentation on 3D Dental Models via Highlighting Tooth Regions

Pages 5229–5241 https://doi.org/10.1109/TCSVT.2023.3341805

Automatic tooth instance segmentation on 3D dental models is crucial for digitizing dental treatments and enabling computer-assisted treatment planning. However, it is challenging since the tight arrangement of dental structures and the consequential ...

research-article

Online Discriminative Cross-Modal Hashing

Pages 5242–5254 https://doi.org/10.1109/TCSVT.2023.3342418

Online cross-modal hashing has received increasing research attention due to its capability of encoding streaming data and updating hash functions simultaneously. Despite significant progress, there is still room for further improving accuracy from two ...

research-article

Intermediate Domain-Based Meta Learning Framework for Adaptive Object Detection

Pages 5255–5265 https://doi.org/10.1109/TCSVT.2023.3342879

Deep learning based object detection methods have made significant progress in recent years. However, these methods often suffer from a substantial performance drop when domain shifts occur, making it difficult to generalize a source domain trained object ...

research-article

Cascade Semantic Prompt Alignment Network for Image Captioning

Pages 5266–5281 https://doi.org/10.1109/TCSVT.2023.3343520

Image captioning (IC) takes an image as input and generates open-form descriptions in the domain of natural language. IC requires the detection of objects, modeling of relations between them, an assessment of the semantics of the scene and representing ...

research-article

Small Sample Image Segmentation by Coupling Convolutions and Transformers

Pages 5282–5294 https://doi.org/10.1109/TCSVT.2023.3343632

Compared with natural image segmentation, small sample image segmentation tasks, such as medical image segmentation and defect detection, have been less studied. Recent studies have made efforts to bring together Convolutional Neural Networks (CNNs) and ...

research-article

A New Training Data Organization Form and Training Mode for Unbiased Scene Graph Generation

Pages 5295–5305 https://doi.org/10.1109/TCSVT.2023.3344569

The current mainstream studies on Scene Graph Generation (SGG) are devoted to the long-tailed predicate distribution problem in order to generate unbiased scene graphs. The long-tailed predicate distribution exists in the VG dataset and is more severe during the SGG network ...

research-article

Representation Robustness and Feature Expansion for Exemplar-Free Class-Incremental Learning

Pages 5306–5320 https://doi.org/10.1109/TCSVT.2023.3344574

Although deep neural networks have made outstanding achievements in many static tasks, when faced with a continuous stream of data, they suffer from catastrophic forgetting since the previous data is usually inaccessible. Stored data or generative model is ...

research-article

Online Multi-Scale Classification and Global Feature Modulation for Robust Visual Tracking

Pages 5321–5334 https://doi.org/10.1109/TCSVT.2023.3343949

Recent advanced trackers, composed of discriminative classification and dedicated bounding box estimation, have achieved remarkable advancements in the performance of visual object tracking. However, existing methods cannot satisfy the demands of tracking ...

research-article

Allowing Supervision in Unsupervised Deformable-Instances Image-to-Image Translation

Pages 5335–5349 https://doi.org/10.1109/TCSVT.2023.3343733

Replacing objects in images is a practical functionality of Photoshop, e.g., clothes changing. This task is defined as Unsupervised Deformable-Instances Image-to-Image Translation (UDIT), which maps multiple foreground instances of a source domain to a ...

research-article

Efficient Task-Specific Feature Re-Fusion for More Accurate Object Detection and Instance Segmentation

Pages 5350–5360 https://doi.org/10.1109/TCSVT.2023.3344713

Feature pyramid representations have been widely adopted in the object detection literature for better handling of variations in scale, which provide abundant information from various spatial levels for classification and localization sub-tasks. We find ...

research-article

Toward Meta-Shape-Based Multi-View 3D Point Cloud Registration: An Evaluation

Pages 5361–5375 https://doi.org/10.1109/TCSVT.2023.3341622

Reducing cumulative registration error is critical to accurate 3D multi-view registration. Meta-shape based methods optimize rigid transformations of point clouds by iteratively registering each point cloud with a meta-shape, which remain popular ...

research-article

Unveiling the Power of Visible-Thermal Video Object Segmentation

Pages 5376–5388 https://doi.org/10.1109/TCSVT.2023.3345852

Despite recent progress, Video Object Segmentation (VOS) remains challenging in complex situations such as low light and dark scenes. In this paper, we tackle the visibility limitations by introducing thermal information as auxiliary for VOS. Specifically, ...

research-article

ESNet: An Efficient Framework for Superpixel Segmentation

Pages 5389–5399 https://doi.org/10.1109/TCSVT.2023.3347402

Superpixel segmentation divides an original image into mid-level regions to reduce the number of computational primitives for subsequent tasks. The two-stage approaches work better but have high computational complexity among the existing deep superpixel ...

research-article

Attention-Bridged Modal Interaction for Text-to-Image Generation

Pages 5400–5413 https://doi.org/10.1109/TCSVT.2023.3347971

We propose a novel Text-to-Image Generation Network, Attention-bridged Modal Interaction Generative Adversarial Network (AMI-GAN), to better explore modal interaction and perception for high-quality image synthesis. The AMI-GAN contains two novel designs: ...

research-article

The Devil Is in the Boundary: Boundary-Enhanced Polyp Segmentation

Pages 5414–5423 https://doi.org/10.1109/TCSVT.2023.3348598

Due to the varied appearance of polyps and the low contrast between the polyp area and its surrounding background, accurate polyp segmentation has become a challenging task. To tackle this issue, we introduce a boundary-enhanced framework for polyp ...

research-article

Context-Aware and Semantic-Consistent Spatial Interactions for One-Shot Object Detection Without Fine-Tuning

Pages 5424–5439 https://doi.org/10.1109/TCSVT.2023.3349007

One-shot object detection (OSOD) without fine-tuning has recently garnered considerable attention and research focus. It aims to directly detect novel-class objects in the target image by providing merely one support image patch without undergoing the ...

research-article

Enhancing Micro-Video Venue Recognition via Multi-Modal and Multi-Granularity Object Relations

Pages 5440–5451 https://doi.org/10.1109/TCSVT.2023.3349202

Micro-video venue recognition aims to predict the venue category where a micro-video was filmed. Different from traditional long videos which contain rich temporal context, venue prediction for micro-videos is difficult due to its limited duration (...

research-article

Efficient Camouflaged Object Detection Network Based on Global Localization Perception and Local Guidance Refinement

Pages 5452–5465 https://doi.org/10.1109/TCSVT.2023.3349209

Camouflaged Object Detection (COD) is a challenging visual task due to its complex contour, diverse scales, and high similarity to the background. Existing COD methods encounter two predicaments: One is that they are prone to falling into local perception, ...

research-article

Identity-Aware Variational Autoencoder for Face Swapping

Pages 5466–5479 https://doi.org/10.1109/TCSVT.2024.3349909

Face swapping aims to transfer the identity of a source face to a target face image while preserving the target attributes (e.g., facial expression, head pose, illumination, and background). Most existing methods use a face recognition model to extract ...

research-article

Weakly-Supervised Video Anomaly Detection With Snippet Anomalous Attention

Pages 5480–5492 https://doi.org/10.1109/TCSVT.2024.3350084

With a focus on abnormal events contained within untrimmed videos, there is increasing interest among researchers in video anomaly detection. Among different video anomaly detection scenarios, weakly-supervised video anomaly detection poses a significant ...

research-article

Analogical Learning-Based Few-Shot Class-Incremental Learning

Pages 5493–5504 https://doi.org/10.1109/TCSVT.2024.3350913

FSCIL (Few-shot class-incremental learning) is a prominent research topic in the ML community. It faces two significant challenges: forgetting old class knowledge and overfitting to limited new class training examples. In this paper, we present a novel ...

research-article

Dynamics-Aware Adversarial Attack of Adaptive Neural Networks

Pages 5505–5518 https://doi.org/10.1109/TCSVT.2024.3351680

In this paper, we investigate the dynamics-aware adversarial attack problem of adaptive neural networks. Most existing adversarial attack algorithms are designed under a basic assumption – the network architecture is fixed throughout the attack ...

research-article

Knowledge Synergy Learning for Multi-Modal Tracking

Pages 5519–5532 https://doi.org/10.1109/TCSVT.2024.3352573

Benefiting from the rich information provided by different modalities, multi-modal tracking has shown significant improvements compared to single-modal tracking. However, in practical applications, multi-modal tracking still faces two major challenges. ...

research-article

Equity in Unsupervised Domain Adaptation by Nuclear Norm Maximization

Pages 5533–5545 https://doi.org/10.1109/TCSVT.2023.3346444

Nuclear norm maximization has been shown empirically to enhance the transferability of unsupervised domain adaptation (UDA) models. In this paper, we identify a new property termed equity, which indicates the balance degree of predicted ...

research-article

SwinIT: Hierarchical Image-to-Image Translation Framework Without Cycle Consistency

Pages 5546–5559 https://doi.org/10.1109/TCSVT.2024.3353932

Image-to-image (I2I) translation often requires establishing cycle consistency between the source and the translated images across different domains. However, cycle consistency requires redundant reconstruction, and is too restrictive to satisfy the ...

research-article

DRNet: Disentanglement and Recombination Network for Few-Shot Semantic Segmentation

Pages 5560–5574 https://doi.org/10.1109/TCSVT.2024.3358679

Few-shot semantic segmentation (FSS) aims to segment novel classes with only a few annotated samples. Existing FSS methods generally combine the annotated mask and the corresponding support image to generate the class-specific representation, and ...

research-article

Weakly-Supervised Action Learning in Procedural Task Videos via Process Knowledge Decomposition

Pages 5575–5588 https://doi.org/10.1109/TCSVT.2024.3358547

Action learning is a research area that aims to recognize the action category of each frame in the video. Context information is crucial for learning actions, but most existing methods face two challenges in exploiting this information: 1) They apply ...

research-article

Pedestrian 3D Shape Understanding for Person Re-Identification via Multi-View Learning

Pages 5589–5602 https://doi.org/10.1109/TCSVT.2024.3358850

Recent developments in computing power have resulted in performance improvements on holistic (non-occluded) person Re-Identification (ReID) tasks. Nevertheless, the precision of the recent research will diminish when a pedestrian is obstructed by ...

research-article

TED-Net: Dispersal Attention for Perceiving Interaction Region in Indirectly-Contact HOI Detection

Pages 5603–5615 https://doi.org/10.1109/TCSVT.2024.3358952

Human-Object Interaction (HOI) detection is a fertile research ground that merits further investigation in computer vision, and plays an important role in image high-level semantic information understanding. To achieve superior object detection ...

IEEE Transactions on Circuits and Systems for Video Technology
