DOI:
10.1007/978-3-031-72940-9
Computer Vision – ECCV 2024: 18th European Conference, Milan, Italy, September 29–October 4, 2024, Proceedings, Part XLI
2024 Proceeding
Editors:
  • Aleš Leonardis
  • Elisa Ricci
  • Stefan Roth
  • Olga Russakovsky
  • Torsten Sattler
  • Gül Varol
Publisher:
  • Springer-Verlag
  • Berlin, Heidelberg
Conference:
European Conference on Computer Vision, Milan, Italy, 29 September 2024
ISBN:
978-3-031-72939-3
Published:
03 January 2025

Abstract

No abstract available.

Front Matter
Pages i–lxxxv
Back Matter
Article
Audio-Synchronized Visual Animation
Abstract

Current visual generation methods can produce high-quality videos guided by text prompts. However, effectively controlling object dynamics remains a challenge. This work explores audio as a cue to generate temporally synchronized image animations. ...

Article
Expressive Whole-Body 3D Gaussian Avatar
Abstract

Facial expression and hand motions are necessary to express our emotions and interact with the world. Nevertheless, most of the 3D human avatars modeled from a casually captured video only support body motions without facial expressions and hand ...

Article
Canonical Shape Projection Is All You Need for 3D Few-Shot Class Incremental Learning
Abstract

In recent years, robust pre-trained foundation models have been successfully used in many downstream tasks. Here, we would like to use such powerful models to address the problem of few-shot class incremental learning (FSCIL) tasks on 3D point ...

Article
Controllable Human-Object Interaction Synthesis
Abstract

Synthesizing semantic-aware, long-horizon, human-object interaction is critical to simulate realistic human behaviors. In this work, we address the challenging problem of generating synchronized object motion and human motion guided by language ...

Article
Freditor: High-Fidelity and Transferable NeRF Editing by Frequency Decomposition
Abstract

This paper enables high-fidelity, transferable NeRF editing by frequency decomposition. Recent NeRF editing pipelines lift 2D stylization results to 3D scenes while suffering from blurry results, and fail to capture detailed structures caused by ...

Article
DoughNet: A Visual Predictive Model for Topological Manipulation of Deformable Objects
Abstract

Manipulation of elastoplastic objects like dough often involves topological changes such as splitting and merging. The ability to accurately predict these topological changes that a specific action might incur is critical for planning interactions ...

Article
PAV: Personalized Head Avatar from Unstructured Video Collection
Abstract

We propose PAV, Personalized Head Avatar for the synthesis of human faces under arbitrary viewpoints and facial expressions. PAV introduces a method that learns a dynamic deformable neural radiance field (NeRF), in particular from a collection of ...

Article
Strike a Balance in Continual Panoptic Segmentation
Abstract

This study explores the emerging area of continual panoptic segmentation, highlighting three key balances. First, we introduce past-class backtrace distillation to balance the stability of existing knowledge with the adaptability to new ...

Article
In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation
Abstract

We present Lazy Visual Grounding for open-vocabulary semantic segmentation, which decouples unsupervised object mask discovery from object grounding. Plenty of the previous art casts this task as pixel-to-text classification without object-level ...

Article
MultiDelete for Multimodal Machine Unlearning
Abstract

Machine Unlearning removes specific knowledge about training data samples from an already trained model. It has significant practical benefits, such as purging private, inaccurate, or outdated information from trained models without the need for ...

Article
Unified Local-Cloud Decision-Making via Reinforcement Learning
Abstract

Embodied vision-based real-world systems, such as mobile robots, require a careful balance between energy consumption, compute latency, and safety constraints to optimize operation across dynamic tasks and contexts. As local computation tends to ...

Article
UniTalker: Scaling up Audio-Driven 3D Facial Animation Through A Unified Model
Abstract

Audio-driven 3D facial animation aims to map input audio to realistic facial motion. Despite significant progress, limitations arise from inconsistent 3D annotations, restricting previous models to training on specific annotations and thereby ...

Article
Robo-ABC: Affordance Generalization Beyond Categories via Semantic Correspondence for Robot Manipulation
Abstract

Enabling robotic manipulation that generalizes to out-of-distribution scenes is a crucial step toward the open-world embodied intelligence. For human beings, this ability is rooted in the understanding of semantic correspondence among different ...

Article
Efficient Frequency-Domain Image Deraining with Contrastive Regularization
Abstract

Most current single image-deraining (SID) methods are based on the Transformer with global modeling for high-quality reconstruction. However, their architectures only build long-range features from the spatial domain, which suffers from a ...

Article
Stitched ViTs are Flexible Vision Backbones
Abstract

Large pretrained plain vision Transformers (ViTs) have been the workhorse for many downstream tasks. However, existing works utilizing off-the-shelf ViTs are inefficient in terms of training and deployment, because adopting ViTs with individual ...

Article
TrajPrompt: Aligning Color Trajectory with Vision-Language Representations
Abstract

Cross-modal learning shows promising potential to overcome the limitations of single-modality tasks. However, without proper design for representation alignment between different data sources, the external modality cannot fully exhibit its value. ...

Article
SemReg: Semantics Constrained Point Cloud Registration
Abstract

Despite the recent success of Transformers in point cloud registration, the cross-attention mechanism, while enabling point-wise feature exchange between point clouds, suffers from redundant feature interactions among semantically unrelated ...

Article
Cascade-Zero123: One Image to Highly Consistent 3D with Self-prompted Nearby Views
Abstract

Synthesizing multi-view 3D from one single image is a significant but challenging task. Zero-1-to-3 methods have achieved great success by lifting a 2D latent diffusion model to the 3D scope. The target-view image is generated with a single-view ...

Article
RoScenes: A Large-Scale Multi-view 3D Dataset for Roadside Perception
Abstract

We introduce RoScenes, the largest multi-view roadside perception dataset, which aims to shed light on the development of vision-centric Bird’s Eye View (BEV) approaches for more challenging traffic scenes. The highlights of RoScenes include ...

Article
ReSyncer: Rewiring Style-Based Generator for Unified Audio-Visually Synced Facial Performer
Abstract

Lip-syncing videos with given audio is the foundation for various applications including the creation of virtual presenters or performers. While recent studies explore high-fidelity lip-sync with different techniques, their task-orientated models ...

Article
Language-Driven Physics-Based Scene Synthesis and Editing via Feature Splatting
Abstract

Scene representations using 3D Gaussian primitives have produced excellent results in modeling the appearance of static and dynamic 3D scenes. Many graphics applications, however, demand the ability to manipulate both the appearance and the ...

Article
AlignDiff: Aligning Diffusion Models for General Few-Shot Segmentation
Abstract

Text-to-image diffusion models have shown remarkable success in synthesizing photo-realistic images. Apart from creative applications, can we use such models to synthesize samples that aid the few-shot training of discriminative models? In this ...

Article
SkateFormer: Skeletal-Temporal Transformer for Human Action Recognition
Abstract

Skeleton-based action recognition, which classifies human actions based on the coordinates of joints and their connectivity within skeleton data, is widely utilized in various scenarios. While Graph Convolutional Networks (GCNs) have been proposed ...

Article
R2-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding
Abstract

Video temporal grounding (VTG) is a fine-grained video understanding problem that aims to ground relevant clips in untrimmed videos given natural language queries. Most existing VTG models are built upon frame-wise final-layer CLIP features, aided ...

Article
Tree-D Fusion: Simulation-Ready Tree Dataset from Single Images with Diffusion Priors
Abstract

We introduce Tree-D Fusion, featuring the first collection of 600,000 environmentally aware, 3D simulation-ready tree models generated through Diffusion priors. Each reconstructed 3D tree model corresponds to an image from Google’s Auto Arborist ...

Article
Parameterization-Driven Neural Surface Reconstruction for Object-Oriented Editing in Neural Rendering
Abstract

The advancements in neural rendering have increased the need for techniques that enable intuitive editing of 3D objects represented as neural implicit surfaces. This paper introduces a novel neural algorithm for parameterizing neural implicit ...

Article
DomainFusion: Generalizing to Unseen Domains with Latent Diffusion Models
Abstract

Latent Diffusion Models (LDMs) are powerful and potential tools for facilitating generation-based methods for domain generalization. However, existing diffusion-based DG methods are restricted to offline augmentation using LDM and suffer from ...

Contributors
  • University of Birmingham
  • Bruno Kessler Foundation
  • Technical University of Darmstadt
  • Princeton University
  • Czech Technical University in Prague