Author: Huang, Qingming

research-article

Rethink video retrieval representation for video captioning

Pattern Recognition (PATT), Volume 156, Issue C, https://doi.org/10.1016/j.patcog.2024.110744

Abstract

Video captioning, a challenging task targeting the automatic generation of accurate and comprehensive descriptions based on video content, has witnessed substantial success recently driven by bridging video representations and textual semantics. ...

Highlights

Multi-grained video-text alignment when extracting visual features for captioning.
A learnable token shift module to enhance fine-grained inter-frame info interaction.
Refineformer provides additional well text-related spatial info ...

research-article

MovingColor: Seamless Fusion of Fine-grained Video Color Enhancement

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 7454–7463, https://doi.org/10.1145/3664647.3681130

Fine-grained video color enhancement delivers superior visual results by making precise adjustments to specific areas of the frame, maintaining more natural color relationships compared to global enhancement techniques. However, dynamically applying ...

research-article

Open Access

Regularized Contrastive Partial Multi-view Outlier Detection

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 8711–8720, https://doi.org/10.1145/3664647.3681125

In recent years, multi-view outlier detection (MVOD) methods have advanced significantly, aiming to identify outliers within multi-view datasets. A key point is to better detect class outliers and class-attribute outliers, which only exist in multi-view ...

research-article

Open Access

HGOE: Hybrid External and Internal Graph Outlier Exposure for Graph Out-of-Distribution Detection

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 1544–1553, https://doi.org/10.1145/3664647.3681118

With the progressive advancements in deep graph learning, out-of-distribution (OOD) detection for graph data has emerged as a critical challenge. While the efficacy of auxiliary datasets in enhancing OOD detection has been extensively studied for image ...

research-article

Open Access

Honorable Mention

Not All Pairs are Equal: Hierarchical Learning for Average-Precision-Oriented Video Retrieval

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 3828–3837, https://doi.org/10.1145/3664647.3681110

The rapid growth of online video resources has significantly promoted the development of video retrieval methods. As a standard evaluation metric for video retrieval, Average Precision (AP) assesses the overall rankings of relevant videos at the top list,...

research-article

Unsupervised Image-to-Video Adaptation via Category-aware Flow Memory Bank and Realistic Video Generation

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 8795–8804, https://doi.org/10.1145/3664647.3681063

Image-to-Video adaptation is proposed to train a model using labeled images and unlabeled videos to facilitate the classification of unlabeled videos. The latest work synthesizes videos using still images to mitigate the modality gap between images and ...

research-article

Finding a Taxi With Illegal Driver Substitution Activity via Behavior Modelings

IEEE Transactions on Intelligent Transportation Systems (ITS-TRANSACTIONS), Volume 25, Issue 12, Pages 20309–20319, https://doi.org/10.1109/TITS.2024.3409744

In our urban life, Illegal Driver Substitution (IDS) activity for a taxi is a grave unlawful activity in the taxi industry. Currently, the IDS activity is manually supervised by law enforcers, i.e., law enforcers empirically choose a taxi and inspect it. ...

Article

Distractors-Immune Representation Learning with Cross-Modal Contrastive Regularization for Change Captioning

Computer Vision – ECCV 2024, Pages 311–328, https://doi.org/10.1007/978-3-031-72775-7_18

Abstract

Change captioning aims to succinctly describe the semantic change between a pair of similar images, while being immune to distractors (illumination and viewpoint changes). Under these distractors, unchanged objects often appear pseudo changes ...

research-article

SpikeODE: Image Reconstruction for Spike Camera With Neural Ordinary Differential Equation

IEEE Transactions on Circuits and Systems for Video Technology (IEEETCSVT), Volume 34, Issue 11_Part_1, Pages 11142–11155, https://doi.org/10.1109/TCSVT.2024.3417812

The recently invented retina-inspired spike camera has shown great potential for capturing dynamic scenes. However, reconstructing high-quality images from the binary spike data remains a challenge due to the existence of noises in the camera. This paper ...

research-article

Self-Constructing Stereo Correspondences for Unsupervised Multi-View Stereo

IEEE Transactions on Circuits and Systems for Video Technology (IEEETCSVT), Volume 34, Issue 11_Part_1, Pages 10732–10742, https://doi.org/10.1109/TCSVT.2024.3416474

Existing unsupervised Multi-View Stereo (MVS) methods generally construct supervision on the basis of the photometric consistency loss, which suffers from unreliable supervision and limited scalability. In this paper, a novel unsupervised MVS framework ...

research-article

Self-Supervised Monocular Depth Estimation via Binocular Geometric Correlation Learning

ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 20, Issue 8, Article No.: 250, Pages 1–19, https://doi.org/10.1145/3663570

Monocular depth estimation aims to infer a depth map from a single image. Although supervised learning-based methods have achieved remarkable performance, they generally rely on a large amount of labor-intensively annotated data. Self-supervised methods, ...

research-article

Collaborative Debias Strategy for Temporal Sentence Grounding in Video

IEEE Transactions on Circuits and Systems for Video Technology (IEEETCSVT), Volume 34, Issue 11_Part_1, Pages 10972–10986, https://doi.org/10.1109/TCSVT.2024.3413074

Temporal sentence grounding in video has witnessed significant advancements, but suffers from substantial dataset bias, which undermines its generalization ability. Existing debias approaches primarily concentrate on well-known distribution and linguistic ...

research-article

Multiple object tracking based on appearance and motion graph convolutional neural networks with an explainer

Neural Computing and Applications (NCAA), Volume 36, Issue 22, Pages 13799–13814, https://doi.org/10.1007/s00521-024-09773-0

Abstract

The tracking performance of Multi-Object Tracking (MOT) has recently been improved by using discriminative appearance and motion features. However, dense crowds and occlusions significantly reduce the reliability of these features, resulting in ...

research-article

Feature-based Perturbation Makes a Better Ensemble Learning for SSL Classification

CVIPPR '24: Proceedings of the 2024 2nd Asia Conference on Computer Vision, Image Processing and Pattern Recognition, Article No.: 49, Pages 1–5, https://doi.org/10.1145/3663976.3664035

Semi-supervised learning (SSL) poses a significant practical challenge in the field of computer vision. Pseudo Labeling methods (PL methods), as representative SSL techniques, obtain the State Of The Art (SOTA) performances in SSL. However, the error ...

research-article

Ensemble of Distinct Students for SSL 2D Pose Estimation

CVIPPR '24: Proceedings of the 2024 2nd Asia Conference on Computer Vision, Image Processing and Pattern Recognition, Article No.: 48, Pages 1–5, https://doi.org/10.1145/3663976.3664034

Semi-supervised pose estimation poses a significant challenge in computer vision. Although numerous semi-supervised classification techniques have been developed, they often rely on confidence scores to assess the quality of pseudo-labels, a feat that ...

research-article

SMART: Syntax-Calibrated Multi-Aspect Relation Transformer for Change Captioning

IEEE Transactions on Pattern Analysis and Machine Intelligence (ITPM), Volume 46, Issue 7, Pages 4926–4943, https://doi.org/10.1109/TPAMI.2024.3365104

Change captioning aims to describe the semantic change between two similar images. In this process, as the most typical distractor, viewpoint change leads to the pseudo changes about appearance and position of objects, thereby overwhelming the real ...

research-article

Algorithm-Dependent Generalization of AUPRC Optimization: Theory and Algorithm

IEEE Transactions on Pattern Analysis and Machine Intelligence (ITPM), Volume 46, Issue 7, Pages 5062–5079, https://doi.org/10.1109/TPAMI.2024.3361861

Stochastic optimization of the Area Under the Precision-Recall Curve (AUPRC) is a crucial problem for machine learning. Despite extensive studies on AUPRC optimization, generalization is still an open problem. In this work, we present the first trial in ...

research-article

Stereo Image Restoration via Attention-Guided Correspondence Learning

IEEE Transactions on Pattern Analysis and Machine Intelligence (ITPM), Volume 46, Issue 7, Pages 4850–4865, https://doi.org/10.1109/TPAMI.2024.3357709

Although stereo image restoration has been extensively studied, most existing work focuses on restoring stereo images with limited horizontal parallax due to the binocular symmetry constraint. Stereo images with unlimited parallax (e.g., large ranges and ...

research-article

Multi-Granularity Contrastive Cross-Modal Collaborative Generation for End-to-End Long-Term Video Question Answering

IEEE Transactions on Image Processing (TIP), Volume 33, Pages 3115–3129, https://doi.org/10.1109/TIP.2024.3390984

Long-term Video Question Answering (VideoQA) is a challenging vision-and-language bridging task focusing on semantic understanding of untrimmed long-term videos and diverse free-form questions, simultaneously emphasizing comprehensive cross-modal ...

research-article

Fine-Grained Accident Detection: Database and Algorithm

IEEE Transactions on Image Processing (TIP), Volume 33, Pages 1059–1069, https://doi.org/10.1109/TIP.2024.3355812

This paper presents a novel fine-grained task for traffic accident analysis. Accident detection in surveillance or dashcam videos is a common task in the field of traffic accident analysis by using videos. However, common accident detection does not ...
