- Article, November 2024
Hybrid Video Diffusion Models with 2D Triplane and 3D Wavelet Representation
Abstract: Generating high-quality videos that synthesize desired realistic content is a challenging task due to their intricate high dimensionality and complexity. Several recent diffusion-based methods have shown comparable performance by compressing ...
- Article, November 2024
DIFFender: Diffusion-Based Adversarial Defense Against Patch Attacks
Abstract: Adversarial attacks, particularly patch attacks, pose significant threats to the robustness and reliability of deep learning models. Developing reliable defenses against patch attacks is crucial for real-world applications. This paper introduces ...
- Article, November 2024
Privacy-Preserving Adaptive Re-Identification Without Image Transfer
Abstract: Re-Identification systems (Re-ID) are crucial for public safety but face the challenge of having to adapt to environments that differ from their training distribution. Furthermore, rigorous privacy protocols in public places are being enforced as ...
- Article, November 2024
EA-VTR: Event-Aware Video-Text Retrieval
- Zongyang Ma,
- Ziqi Zhang,
- Yuxin Chen,
- Zhongang Qi,
- Chunfeng Yuan,
- Bing Li,
- Yingmin Luo,
- Xu Li,
- Xiaojuan Qi,
- Ying Shan,
- Weiming Hu
Abstract: Understanding the content of events occurring in the video and their inherent temporal logic is crucial for video-text retrieval. However, web-crawled pre-training datasets often lack sufficient event information, and the widely adopted video-...
- Article, November 2024
Continuous SO(3) Equivariant Convolution for 3D Point Cloud Analysis
Abstract: The inherent richness of geometric information in point cloud underscores the necessity of leveraging group equivariance, as preserving the topological structure of the point cloud up to the feature space provides an intuitive inductive bias for ...
- Article, November 2024
Textual Knowledge Matters: Cross-Modality Co-teaching for Generalized Visual Class Discovery
Abstract: In this paper, we study the problem of Generalized Category Discovery (GCD), which aims to cluster unlabeled data from both known and unknown categories using the knowledge of labeled data from known categories. Current GCD methods rely on only ...
- Article, November 2024
VQ-HPS: Human Pose and Shape Estimation in a Vector-Quantized Latent Space
Abstract: Previous works on Human Pose and Shape Estimation (HPSE) from RGB images can be broadly categorized into two main groups: parametric and non-parametric approaches. Parametric techniques leverage a low-dimensional statistical body model for ...
- Article, November 2024
V2X-Real: A Large-Scale Dataset for Vehicle-to-Everything Cooperative Perception
- Hao Xiang,
- Zhaoliang Zheng,
- Xin Xia,
- Runsheng Xu,
- Letian Gao,
- Zewei Zhou,
- Xu Han,
- Xinkai Ji,
- Mingxi Li,
- Zonglin Meng,
- Li Jin,
- Mingyue Lei,
- Zhaoyang Ma,
- Zihang He,
- Haoxuan Ma,
- Yunshuang Yuan,
- Yingqian Zhao,
- Jiaqi Ma
Abstract: Recent advancements in Vehicle-to-Everything (V2X) technologies have enabled autonomous vehicles to share sensing information to see through occlusions, greatly boosting the perception capability. However, there are no real-world datasets to ...
- Article, November 2024
Dual-Decoupling Learning and Metric-Adaptive Thresholding for Semi-supervised Multi-label Learning
Abstract: Semi-supervised multi-label learning (SSMLL) is a powerful framework for leveraging unlabeled data to reduce the expensive cost of collecting precise multi-label annotations. Unlike semi-supervised learning, one cannot select the most probable ...
- Article, November 2024
Real Appearance Modeling for More General Deepfake Detection
Abstract: Recent studies in deepfake detection have shown promising results when detecting deepfakes of the same type as those present in training. However, their ability to generalize to unseen deepfakes remains limited. This work improves the ...
- Article, November 2024
Disentangled Clothed Avatar Generation from Text Descriptions
- Jionghao Wang,
- Yuan Liu,
- Zhiyang Dou,
- Zhengming Yu,
- Yongqing Liang,
- Cheng Lin,
- Rong Xie,
- Li Song,
- Xin Li,
- Wenping Wang
Abstract: In this paper, we introduce a novel text-to-avatar generation method that separately generates the human body and the clothes and allows high-quality animation on the generated avatar. While recent advancements in text-to-avatar generation have ...
- Article, November 2024
Adaptive Multi-modal Fusion of Spatially Variant Kernel Refinement with Diffusion Model for Blind Image Super-Resolution
- Junxiong Lin,
- Yan Wang,
- Zeng Tao,
- Boyang Wang,
- Qing Zhao,
- Haorang Wang,
- Xuan Tong,
- Xinji Mai,
- Yuxuan Lin,
- Wei Song,
- Jiawen Yu,
- Shaoqi Yan,
- Wenqiang Zhang
Abstract: Pre-trained diffusion models utilized for image generation encapsulate a substantial reservoir of a priori knowledge pertaining to intricate textures. Harnessing the potential of leveraging this a priori knowledge in the context of image super-...
- Article, November 2024
Knowledge-Enhanced Visual-Language Pretraining for Computational Pathology
Abstract: In this paper, we consider the problem of visual representation learning for computational pathology, by exploiting large-scale image-text pairs gathered from public resources, along with the domain-specific knowledge in pathology. Specifically, ...
- Article, November 2024
CountFormer: Multi-view Crowd Counting Transformer
Abstract: Multi-view counting (MVC) methods have shown their superiority over single-view counterparts, particularly in situations characterized by heavy occlusion and severe perspective distortions. However, hand-crafted heuristic features and identical ...
- Article, November 2024
Beyond Viewpoint: Robust 3D Object Recognition Under Arbitrary Views Through Joint Multi-part Representation
Abstract: Existing view-based methods excel at recognizing 3D objects from predefined viewpoints, but their exploration of recognition under arbitrary views is limited. This is a challenging and realistic setting because each object has different viewpoint ...
- Article, November 2024
DriveLM: Driving with Graph Visual Question Answering
- Chonghao Sima,
- Katrin Renz,
- Kashyap Chitta,
- Li Chen,
- Hanxue Zhang,
- Chengen Xie,
- Jens Beißwenger,
- Ping Luo,
- Andreas Geiger,
- Hongyang Li
Abstract: We study how vision-language models (VLMs) trained on web-scale data can be integrated into end-to-end driving systems to boost generalization and enable interactivity with human users. While recent approaches adapt VLMs to driving via single-...
- Article, November 2024
Emergent Visual-Semantic Hierarchies in Image-Text Representations
Abstract: While recent vision-and-language models (VLMs) like CLIP are a powerful tool for analyzing text and images in a shared semantic space, they do not explicitly model the hierarchical nature of the set of texts which may describe an image. Conversely,...