Artificial intelligence

25 Results for: Book/Issue: Computer Vision – ECCV 2024

Searched The ACM Guide to Computing Literature (3,856,501 records)

Showing 1 - 20 of 25 Results

Article
November 2024
Hybrid Video Diffusion Models with 2D Triplane and 3D Wavelet Representation
Computer Vision – ECCV 2024, Pages 148–165, https://doi.org/10.1007/978-3-031-72943-0_9
Abstract
Generating high-quality videos that synthesize desired realistic content is a challenging task due to their intricate high dimensionality and complexity. Several recent diffusion-based methods have shown comparable performance by compressing ...
Article
November 2024
DIFFender: Diffusion-Based Adversarial Defense Against Patch Attacks
Computer Vision – ECCV 2024, Pages 130–147, https://doi.org/10.1007/978-3-031-72943-0_8
Abstract
Adversarial attacks, particularly patch attacks, pose significant threats to the robustness and reliability of deep learning models. Developing reliable defenses against patch attacks is crucial for real-world applications. This paper introduces ...
Article
November 2024
A Simple Low-Bit Quantization Framework for Video Snapshot Compressive Imaging
Computer Vision – ECCV 2024, Pages 112–129, https://doi.org/10.1007/978-3-031-72943-0_7
Abstract
Video Snapshot Compressive Imaging (SCI) aims to use a low-speed 2D camera to capture high-speed scene as snapshot compressed measurements, followed by a reconstruction algorithm to reconstruct the high-speed video frames. State-of-the-art (SOTA) ...
Article
November 2024
Privacy-Preserving Adaptive Re-Identification Without Image Transfer
Computer Vision – ECCV 2024, Pages 95–111, https://doi.org/10.1007/978-3-031-72943-0_6
Abstract
Re-Identification systems (Re-ID) are crucial for public safety but face the challenge of having to adapt to environments that differ from their training distribution. Furthermore, rigorous privacy protocols in public places are being enforced as ...
Article
November 2024
EA-VTR: Event-Aware Video-Text Retrieval
- Zongyang Ma,
- Ziqi Zhang,
- Yuxin Chen,
- Zhongang Qi,
- Chunfeng Yuan,
- Bing Li,
- Yingmin Luo,
- Xu Li,
- Xiaojuan Qi,
- Ying Shan,
- Weiming Hu
Computer Vision – ECCV 2024, Pages 76–94, https://doi.org/10.1007/978-3-031-72943-0_5
Abstract
Understanding the content of events occurring in the video and their inherent temporal logic is crucial for video-text retrieval. However, web-crawled pre-training datasets often lack sufficient event information, and the widely adopted video-...
Article
November 2024
Continuous SO(3) Equivariant Convolution for 3D Point Cloud Analysis
Computer Vision – ECCV 2024, Pages 59–75, https://doi.org/10.1007/978-3-031-72943-0_4
Abstract
The inherent richness of geometric information in point cloud underscores the necessity of leveraging group equivariance, as preserving the topological structure of the point cloud up to the feature space provides an intuitive inductive bias for ...
Article
November 2024
Textual Knowledge Matters: Cross-Modality Co-teaching for Generalized Visual Class Discovery
Computer Vision – ECCV 2024, Pages 41–58, https://doi.org/10.1007/978-3-031-72943-0_3
Abstract
In this paper, we study the problem of Generalized Category Discovery (GCD), which aims to cluster unlabeled data from both known and unknown categories using the knowledge of labeled data from known categories. Current GCD methods rely on only ...
Article
November 2024
VQ-HPS: Human Pose and Shape Estimation in a Vector-Quantized Latent Space
Computer Vision – ECCV 2024, Pages 471–490, https://doi.org/10.1007/978-3-031-72943-0_27
Abstract
Previous works on Human Pose and Shape Estimation (HPSE) from RGB images can be broadly categorized into two main groups: parametric and non-parametric approaches. Parametric techniques leverage a low-dimensional statistical body model for ...
Article
November 2024
V2X-Real: A Large-Scale Dataset for Vehicle-to-Everything Cooperative Perception
- Hao Xiang,
- Zhaoliang Zheng,
- Xin Xia,
- Runsheng Xu,
- Letian Gao,
- Zewei Zhou,
- Xu Han,
- Xinkai Ji,
- Mingxi Li,
- Zonglin Meng,
- Li Jin,
- Mingyue Lei,
- Zhaoyang Ma,
- Zihang He,
- Haoxuan Ma,
- Yunshuang Yuan,
- Yingqian Zhao,
- Jiaqi Ma
Computer Vision – ECCV 2024, Pages 455–470, https://doi.org/10.1007/978-3-031-72943-0_26
Abstract
Recent advancements in Vehicle-to-Everything (V2X) technologies have enabled autonomous vehicles to share sensing information to see through occlusions, greatly boosting the perception capability. However, there are no real-world datasets to ...
Article
November 2024
Dual-Decoupling Learning and Metric-Adaptive Thresholding for Semi-supervised Multi-label Learning
Computer Vision – ECCV 2024, Pages 437–454, https://doi.org/10.1007/978-3-031-72943-0_25
Abstract
Semi-supervised multi-label learning (SSMLL) is a powerful framework for leveraging unlabeled data to reduce the expensive cost of collecting precise multi-label annotations. Unlike semi-supervised learning, one cannot select the most probable ...
Article
November 2024
6DGS: 6D Pose Estimation from a Single Image and a 3D Gaussian Splatting Model
Computer Vision – ECCV 2024, Pages 420–436, https://doi.org/10.1007/978-3-031-72943-0_24
Abstract
We propose 6DGS to estimate the camera pose of a target RGB image given a 3D Gaussian Splatting (3DGS) model representing the scene. 6DGS avoids the iterative process typical of analysis-by-synthesis methods (e.g. iNeRF) that also require an ...
Article
November 2024
Real Appearance Modeling for More General Deepfake Detection
- Jiahe Tian,
- Cai Yu,
- Xi Wang,
- Peng Chen,
- Zihao Xiao,
- Jiao Dai,
- Jizhong Han,
- Yesheng Chai
Computer Vision – ECCV 2024, Pages 402–419, https://doi.org/10.1007/978-3-031-72943-0_23
Abstract
Recent studies in deepfake detection have shown promising results when detecting deepfakes of the same type as those present in training. However, their ability to generalize to unseen deepfakes remains limited. This work improves the ...
Article
November 2024
Disentangled Clothed Avatar Generation from Text Descriptions
- Jionghao Wang,
- Yuan Liu,
- Zhiyang Dou,
- Zhengming Yu,
- Yongqing Liang,
- Cheng Lin,
- Rong Xie,
- Li Song,
- Xin Li,
- Wenping Wang
Computer Vision – ECCV 2024, Pages 381–401, https://doi.org/10.1007/978-3-031-72943-0_22
Abstract
In this paper, we introduce a novel text-to-avatar generation method that separately generates the human body and the clothes and allows high-quality animation on the generated avatar. While recent advancements in text-to-avatar generation have ...
Article
November 2024
Adaptive Multi-modal Fusion of Spatially Variant Kernel Refinement with Diffusion Model for Blind Image Super-Resolution
- Junxiong Lin,
- Yan Wang,
- Zeng Tao,
- Boyang Wang,
- Qing Zhao,
- Haorang Wang,
- Xuan Tong,
- Xinji Mai,
- Yuxuan Lin,
- Wei Song,
- Jiawen Yu,
- Shaoqi Yan,
- Wenqiang Zhang
Computer Vision – ECCV 2024, Pages 363–380, https://doi.org/10.1007/978-3-031-72943-0_21
Abstract
Pre-trained diffusion models utilized for image generation encapsulate a substantial reservoir of a priori knowledge pertaining to intricate textures. Harnessing the potential of leveraging this a priori knowledge in the context of image super-...
Article
November 2024
Knowledge-Enhanced Visual-Language Pretraining for Computational Pathology
Computer Vision – ECCV 2024, Pages 345–362, https://doi.org/10.1007/978-3-031-72943-0_20
Abstract
In this paper, we consider the problem of visual representation learning for computational pathology, by exploiting large-scale image-text pairs gathered from public resources, along with the domain-specific knowledge in pathology. Specifically, ...
Article
November 2024
CountFormer: Multi-view Crowd Counting Transformer
- Hong Mo,
- Xiong Zhang,
- Jianchao Tan,
- Cheng Yang,
- Qiong Gu,
- Bo Hang,
- Wenqi Ren
Computer Vision – ECCV 2024, Pages 20–40, https://doi.org/10.1007/978-3-031-72943-0_2
Abstract
Multi-view counting (MVC) methods have shown their superiority over single-view counterparts, particularly in situations characterized by heavy occlusion and severe perspective distortions. However, hand-crafted heuristic features and identical ...
Article
November 2024
Beyond Viewpoint: Robust 3D Object Recognition Under Arbitrary Views Through Joint Multi-part Representation
- Linlong Fan,
- Ye Huang,
- Yanqi Ge,
- Wen Li,
- Lixin Duan
Computer Vision – ECCV 2024, Pages 291–309, https://doi.org/10.1007/978-3-031-72943-0_17
Abstract
Existing view-based methods excel at recognizing 3D objects from predefined viewpoints, but their exploration of recognition under arbitrary views is limited. This is a challenging and realistic setting because each object has different viewpoint ...
Article
November 2024
DriveLM: Driving with Graph Visual Question Answering
Computer Vision – ECCV 2024, Pages 256–274, https://doi.org/10.1007/978-3-031-72943-0_15
Abstract
We study how vision-language models (VLMs) trained on web-scale data can be integrated into end-to-end driving systems to boost generalization and enable interactivity with human users. While recent approaches adapt VLMs to driving via single-...
Article
November 2024
Context-Guided Spatial Feature Reconstruction for Efficient Semantic Segmentation
Computer Vision – ECCV 2024, Pages 239–255, https://doi.org/10.1007/978-3-031-72943-0_14
Abstract
Semantic segmentation is an important task for numerous applications but it is still quite challenging to achieve advanced performance with limited computational costs. In this paper, we present CGRSeg, an efficient yet competitive segmentation ...
Article
November 2024
Emergent Visual-Semantic Hierarchies in Image-Text Representations
- Morris Alper,
- Hadar Averbuch-Elor
Computer Vision – ECCV 2024, Pages 220–238, https://doi.org/10.1007/978-3-031-72943-0_13
Abstract
While recent vision-and-language models (VLMs) like CLIP are a powerful tool for analyzing text and images in a shared semantic space, they do not explicitly model the hierarchical nature of the set of texts which may describe an image. Conversely,...
