- research-article, November 2024
Rethink video retrieval representation for video captioning
Abstract: Video captioning, a challenging task targeting the automatic generation of accurate and comprehensive descriptions based on video content, has witnessed substantial success recently driven by bridging video representations and textual semantics. ...
Highlights:
  - Multi-grained video-text alignment when extracting visual features for captioning.
  - A learnable token shift module to enhance fine-grained inter-frame info interaction.
  - Refineformer provides additional well text-related spatial info ...
- research-article, October 2024
MovingColor: Seamless Fusion of Fine-grained Video Color Enhancement
Yi Dong, Yuxi Wang, Zheng Fang, Wenqi Ouyang, Xianhui Lin, Zhiqi Shen, Peiran Ren, Xuansong Xie, Qingming Huang
MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 7454–7463
https://doi.org/10.1145/3664647.3681130
Fine-grained video color enhancement delivers superior visual results by making precise adjustments to specific areas of the frame, maintaining more natural color relationships compared to global enhancement techniques. However, dynamically applying ...
- research-article, October 2024
Regularized Contrastive Partial Multi-view Outlier Detection
MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 8711–8720
https://doi.org/10.1145/3664647.3681125
In recent years, multi-view outlier detection (MVOD) methods have advanced significantly, aiming to identify outliers within multi-view datasets. A key point is to better detect class outliers and class-attribute outliers, which only exist in multi-view ...
- research-article, October 2024
HGOE: Hybrid External and Internal Graph Outlier Exposure for Graph Out-of-Distribution Detection
MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 1544–1553
https://doi.org/10.1145/3664647.3681118
With the progressive advancements in deep graph learning, out-of-distribution (OOD) detection for graph data has emerged as a critical challenge. While the efficacy of auxiliary datasets in enhancing OOD detection has been extensively studied for image ...
- research-article, October 2024, Honorable Mention
Not All Pairs are Equal: Hierarchical Learning for Average-Precision-Oriented Video Retrieval
MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 3828–3837
https://doi.org/10.1145/3664647.3681110
The rapid growth of online video resources has significantly promoted the development of video retrieval methods. As a standard evaluation metric for video retrieval, Average Precision (AP) assesses the overall rankings of relevant videos at the top list,...
- research-article, October 2024
Unsupervised Image-to-Video Adaptation via Category-aware Flow Memory Bank and Realistic Video Generation
MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 8795–8804
https://doi.org/10.1145/3664647.3681063
Image-to-Video adaptation is proposed to train a model using labeled images and unlabeled videos to facilitate the classification of unlabeled videos. The latest work synthesizes videos using still images to mitigate the modality gap between images and ...
- research-article, October 2024
Finding a Taxi With Illegal Driver Substitution Activity via Behavior Modelings
IEEE Transactions on Intelligent Transportation Systems (ITS-TRANSACTIONS), Volume 25, Issue 12, Pages 20309–20319
https://doi.org/10.1109/TITS.2024.3409744
In our urban life, Illegal Driver Substitution (IDS) activity for a taxi is a grave unlawful activity in the taxi industry. Currently, the IDS activity is manually supervised by law enforcers, i.e., law enforcers empirically choose a taxi and inspect it. ...
- Article, September 2024
Distractors-Immune Representation Learning with Cross-Modal Contrastive Regularization for Change Captioning
Abstract: Change captioning aims to succinctly describe the semantic change between a pair of similar images, while being immune to distractors (illumination and viewpoint changes). Under these distractors, unchanged objects often appear pseudo changes ...
- research-article, June 2024
SpikeODE: Image Reconstruction for Spike Camera With Neural Ordinary Differential Equation
IEEE Transactions on Circuits and Systems for Video Technology (IEEETCSVT), Volume 34, Issue 11_Part_1, Pages 11142–11155
https://doi.org/10.1109/TCSVT.2024.3417812
The recently invented retina-inspired spike camera has shown great potential for capturing dynamic scenes. However, reconstructing high-quality images from the binary spike data remains a challenge due to the existence of noises in the camera. This paper ...
- research-article, June 2024
Self-Constructing Stereo Correspondences for Unsupervised Multi-View Stereo
IEEE Transactions on Circuits and Systems for Video Technology (IEEETCSVT), Volume 34, Issue 11_Part_1, Pages 10732–10742
https://doi.org/10.1109/TCSVT.2024.3416474
Existing unsupervised Multi-View Stereo (MVS) methods generally construct supervision on the basis of the photometric consistency loss, which suffers from unreliable supervision and limited scalability. In this paper, a novel unsupervised MVS framework ...
- research-article, June 2024
Self-Supervised Monocular Depth Estimation via Binocular Geometric Correlation Learning
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 20, Issue 8, Article No.: 250, Pages 1–19
https://doi.org/10.1145/3663570
Monocular depth estimation aims to infer a depth map from a single image. Although supervised learning-based methods have achieved remarkable performance, they generally rely on a large amount of labor-intensively annotated data. Self-supervised methods, ...
- research-article, June 2024
Collaborative Debias Strategy for Temporal Sentence Grounding in Video
IEEE Transactions on Circuits and Systems for Video Technology (IEEETCSVT), Volume 34, Issue 11_Part_1, Pages 10972–10986
https://doi.org/10.1109/TCSVT.2024.3413074
Temporal sentence grounding in video has witnessed significant advancements, but suffers from substantial dataset bias, which undermines its generalization ability. Existing debias approaches primarily concentrate on well-known distribution and linguistic ...
- research-article, April 2024
Multiple object tracking based on appearance and motion graph convolutional neural networks with an explainer
Neural Computing and Applications (NCAA), Volume 36, Issue 22, Pages 13799–13814
https://doi.org/10.1007/s00521-024-09773-0
Abstract: The tracking performance of Multi-Object Tracking (MOT) has recently been improved by using discriminative appearance and motion features. However, dense crowds and occlusions significantly reduce the reliability of these features, resulting in ...
- research-article, June 2024
Feature-based Perturbation Makes a Better Ensemble Learning for SSL Classification
CVIPPR '24: Proceedings of the 2024 2nd Asia Conference on Computer Vision, Image Processing and Pattern Recognition, Article No.: 49, Pages 1–5
https://doi.org/10.1145/3663976.3664035
Semi-supervised learning (SSL) poses a significant practical challenge in the field of computer vision. Pseudo Labeling methods (PL methods), as representative SSL techniques, obtain the State Of The Art (SOTA) performances in SSL. However, the error ...
- research-article, June 2024
Ensemble of Distinct Students for SSL 2D Pose Estimation
CVIPPR '24: Proceedings of the 2024 2nd Asia Conference on Computer Vision, Image Processing and Pattern Recognition, Article No.: 48, Pages 1–5
https://doi.org/10.1145/3663976.3664034
Semi-supervised pose estimation poses a significant challenge in computer vision. Although numerous semi-supervised classification techniques have been developed, they often rely on confidence scores to assess the quality of pseudo-labels, a feat that ...
- research-article, February 2024
SMART: Syntax-Calibrated Multi-Aspect Relation Transformer for Change Captioning
IEEE Transactions on Pattern Analysis and Machine Intelligence (ITPM), Volume 46, Issue 7, Pages 4926–4943
https://doi.org/10.1109/TPAMI.2024.3365104
Change captioning aims to describe the semantic change between two similar images. In this process, as the most typical distractor, viewpoint change leads to the pseudo changes about appearance and position of objects, thereby overwhelming the real ...
- research-article, February 2024
Algorithm-Dependent Generalization of AUPRC Optimization: Theory and Algorithm
IEEE Transactions on Pattern Analysis and Machine Intelligence (ITPM), Volume 46, Issue 7, Pages 5062–5079
https://doi.org/10.1109/TPAMI.2024.3361861
Stochastic optimization of the Area Under the Precision-Recall Curve (AUPRC) is a crucial problem for machine learning. Despite extensive studies on AUPRC optimization, generalization is still an open problem. In this work, we present the first trial in ...
- research-article, January 2024
Stereo Image Restoration via Attention-Guided Correspondence Learning
IEEE Transactions on Pattern Analysis and Machine Intelligence (ITPM), Volume 46, Issue 7, Pages 4850–4865
https://doi.org/10.1109/TPAMI.2024.3357709
Although stereo image restoration has been extensively studied, most existing work focuses on restoring stereo images with limited horizontal parallax due to the binocular symmetry constraint. Stereo images with unlimited parallax (e.g., large ranges and ...
- research-article, April 2024
Multi-Granularity Contrastive Cross-Modal Collaborative Generation for End-to-End Long-Term Video Question Answering
IEEE Transactions on Image Processing (TIP), Volume 33, Pages 3115–3129
https://doi.org/10.1109/TIP.2024.3390984
Long-term Video Question Answering (VideoQA) is a challenging vision-and-language bridging task focusing on semantic understanding of untrimmed long-term videos and diverse free-form questions, simultaneously emphasizing comprehensive cross-modal ...
- research-article, January 2024
Fine-Grained Accident Detection: Database and Algorithm
IEEE Transactions on Image Processing (TIP), Volume 33, Pages 1059–1069
https://doi.org/10.1109/TIP.2024.3355812
This paper presents a novel fine-grained task for traffic accident analysis. Accident detection in surveillance or dashcam videos is a common task in the field of traffic accident analysis by using videos. However, common accident detection does not ...