- research-article, February 2025
Multi-view human pose and shape estimation via mesh-aligned voxel interpolation
Abstract: Although multi-view human pose and shape regression methods can use information from other views to complement and correct their estimates, existing methods still do not take full advantage of the multi-view setup. Thus they are far from ...
Highlights:
  - A new network for multi-view human pose and shape regression.
  - Human body features are well merged through multi-scale volumetric aggregation.
  - A mesh-aligned voxel selection module is proposed to make effective prediction.
  - A new ...
- research-article, January 2025
ENCODE: Breaking the Trade-Off Between Performance and Efficiency in Long-Term User Behavior Modeling
IEEE Transactions on Knowledge and Data Engineering (IEEECS_TKDE), Volume 37, Issue 1, Pages 265–277
https://doi.org/10.1109/TKDE.2024.3486445
Long-term user behavior sequences are a goldmine for businesses to explore users’ interests to improve Click-Through Rate (CTR). However, it is very challenging to accurately capture users’ long-term interests from their long-term behavior ...
- research-article, December 2024
Decomposed Prototype Learning for Few-Shot Scene Graph Generation
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 21, Issue 1, Article No.: 30, Pages 1–24
https://doi.org/10.1145/3700877
Today's scene graph generation (SGG) models typically require abundant manual annotations to learn new predicate types. Therefore, it is difficult to apply them to real-world applications with massive uncommon predicate categories whose annotations are ...
- research-article, December 2024
IDPro: Flexible Interactive Video Object Segmentation by ID-Queried Concurrent Propagation
IEEE Transactions on Circuits and Systems for Video Technology (IEEETCSVT), Volume 34, Issue 12, Part 1, Pages 12171–12183
https://doi.org/10.1109/TCSVT.2024.3431714
Interactive Video Object Segmentation (iVOS) is inherently demanding, requiring real-time interaction between humans and computers. Enhancing user experience involves considerations such as user input habits, segmentation quality, running time, and memory ...
- research-article, November 2024
Knowledge-Guided Causal Intervention for Weakly-Supervised Object Localization
IEEE Transactions on Knowledge and Data Engineering (IEEECS_TKDE), Volume 36, Issue 11, Pages 6477–6489
https://doi.org/10.1109/TKDE.2024.3389668
Previous weakly-supervised object localization (WSOL) methods aim to expand activation map discriminative areas to cover the whole objects, yet neglect two inherent challenges when relying solely on image-level labels. First, the “entangled context” ...
- research-article, October 2024
Point Cloud Densification for 3D Gaussian Splatting from Sparse Input Views
MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 896–904
https://doi.org/10.1145/3664647.3681454
The technique of 3D Gaussian splatting (3DGS) has demonstrated its effectiveness and efficiency in rendering photo-realistic images for novel view synthesis. However, 3DGS requires a high density of camera coverage, and its performance inevitably ...
- research-article, October 2024
Seeing Beyond Classes: Zero-Shot Grounded Situation Recognition via Language Explainer
MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 1602–1611
https://doi.org/10.1145/3664647.3681036
Benefiting from strong generalization ability, pre-trained vision-language models (VLMs), e.g., CLIP, have been widely utilized in zero-shot scene understanding. Unlike simple recognition tasks, grounded situation recognition (GSR) requires the model not ...
- research-article, October 2024
FC-4DFS: Frequency-controlled Flexible 4D Facial Expression Synthesizing
MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 10882–10890
https://doi.org/10.1145/3664647.3681455
4D facial expression synthesizing is a critical problem in the fields of computer vision and graphics. Current methods lack flexibility and smoothness when simulating the inter-frame motion of expression sequences. In this paper, we propose a frequency-...
- research-article, October 2024
Neural Interaction Energy for Multi-Agent Trajectory Prediction
MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 1952–1960
https://doi.org/10.1145/3664647.3680792
Maintaining temporal stability is crucial in multi-agent trajectory prediction. Insufficient regularization to uphold this temporal stability often results in fluctuations in kinematic states, leading to inconsistent predictions and the amplification of ...
- research-article, October 2024
NICEST: Noisy Label Correction and Training for Robust Scene Graph Generation
IEEE Transactions on Pattern Analysis and Machine Intelligence (ITPM), Volume 46, Issue 10, Pages 6873–6888
https://doi.org/10.1109/TPAMI.2024.3387349
Nearly all existing scene graph generation (SGG) models have overlooked the ground-truth annotation qualities of mainstream SGG datasets, i.e., they assume: 1) all the manually annotated positive samples are equally correct; 2) all the un-annotated ...
- Article, September 2024
DECap: Towards Generalized Explicit Caption Editing via Diffusion Mechanism
Abstract: Explicit Caption Editing (ECE), which refines reference image captions through a sequence of explicit edit operations (e.g., KEEP, DELETE), has raised significant attention due to its explainable and human-like nature. After training with carefully ...
- research-article, September 2024
UN-η: An offline adaptive normalization method for deploying transformers
Abstract: Transformer has become the de-facto architecture for many natural language and vision tasks thanks to its remarkable performance. As an essential part of transformers, normalization functions struggle to improve the robustness and performance of ...
- research-article, September 2024
Deep multi-scale feature mixture model for image super-resolution with multiple-focal-length degradation
Abstract: Single image super-resolution is a classic problem in computer vision. In recent years, deep learning-based models have achieved unprecedented success with this problem. However, most existing deep super-resolution models unavoidably produce ...
Highlights:
  - This paper focuses on a real-world degradation, i.e., the multiple-focal-length degradation, for single image super-resolution.
  - An efficient multi-scale feature mixture model is proposed for restoring images under the multiple-focal-...
- research-article, September 2024
SPMLD: A skin pathological image dataset for non-melanoma with detailed lesion area annotation
Computers in Biology and Medicine (CBIM), Volume 179, Issue C
https://doi.org/10.1016/j.compbiomed.2024.108793
Abstract: Skin tumors are the most common tumors in humans and the clinical characteristics of three common non-melanoma tumors (IDN, SK, BCC) are similar, resulting in a high misdiagnosis rate. The accurate differential diagnosis of these tumors needs to ...
Highlights:
  - We develop a publicly available skin pathological image dataset, SPMLD, carried out by some dermatologists who collect and annotate the dataset.
  - We propose a lesion area-based enhanced classification network, which can automatically ...
- research-article, August 2024
From Easy to Hard: Learning Curricular Shape-Aware Features for Robust Panoptic Scene Graph Generation
International Journal of Computer Vision (IJCV), Volume 133, Issue 1, Pages 489–508
https://doi.org/10.1007/s11263-024-02190-9
Abstract: Panoptic Scene Graph Generation (PSG) aims to generate a comprehensive graph-structure representation based on panoptic segmentation masks. Despite remarkable progress in PSG, almost all existing methods neglect the importance of shape-aware ...
- Article, August 2024
One Stage Near-Ground Pothole Object Detection
Advanced Intelligent Computing Technology and Applications, Pages 419–429
https://doi.org/10.1007/978-981-97-5612-4_36
Abstract: To propose a single-stage object detection method for realizing real-time and accurate identification and detection of near-ground small pothole objects photographed by UAV. Through the special copy-paste technology of the pothole dataset obtained ...
- poster, July 2024
T2DyVec: Leveraging Sparse Images and Controllable Text for Dynamic SVG
SIGGRAPH '24: ACM SIGGRAPH 2024 Posters, Article No.: 48, Pages 1–2
https://doi.org/10.1145/3641234.3671020
In this work, we introduce a controlled dynamic vector graphic generation method. While existing work mostly focuses on text-based generation of single-frame images, dynamic images, or single-frame vectors, there is a lack of research on generating ...
- research-article, July 2024
Learning Combinatorial Prompts for Universal Controllable Image Captioning
International Journal of Computer Vision (IJCV), Volume 133, Issue 1, Pages 129–150
https://doi.org/10.1007/s11263-024-02179-4
Abstract: Controllable Image Captioning (CIC), i.e., generating natural language descriptions about images under the guidance of given control signals, is one of the most promising directions toward next-generation captioning systems. Till now, various kinds of ...