DOI:
10.1007/978-3-031-72970-6
Computer Vision – ECCV 2024: 18th European Conference, Milan, Italy, September 29–October 4, 2024, Proceedings, Part XLVII
2024 Proceeding
Editors:
  • Aleš Leonardis
  • Elisa Ricci
  • Stefan Roth
  • Olga Russakovsky
  • Torsten Sattler
  • Gül Varol
Publisher:
  • Springer-Verlag
  • Berlin, Heidelberg
Conference:
European Conference on Computer Vision, Milan, Italy, 29 September 2024
ISBN:
978-3-031-72969-0
Published:
03 January 2025

Abstract

No abstract available.

Front Matter
Pages i–lxxxv
Back Matter
Article
RCS-Prompt: Learning Prompt to Rearrange Class Space for Prompt-Based Continual Learning
Abstract

Prompt-based Continual Learning is an emerging direction in leveraging pre-trained knowledge for downstream continual learning. While arriving at a new session, existing prompt-based continual learning methods usually adapt features from pre-...

Article
Text-Anchored Score Composition: Tackling Condition Misalignment in Text-to-Image Diffusion Models
Abstract

Text-to-image diffusion models have advanced towards more controllable generation via supporting various additional conditions (e.g., depth map, bounding box) beyond text. However, these models are learned based on the premise of perfect alignment ...

Article
Grounding DINO: Marrying DINO with Grounded Pre-training for Open-Set Object Detection
Abstract

In this paper, we develop an open-set object detector, called Grounding DINO, by marrying Transformer-based detector DINO with grounded pre-training, which can detect arbitrary objects with human inputs such as category names or referring ...

Article
Make Your ViT-Based Multi-view 3D Detectors Faster via Token Compression
Abstract

Slow inference speed is one of the most crucial concerns for deploying multi-view 3D detectors to tasks with high real-time requirements like autonomous driving. Although many sparse query-based methods have already attempted to improve the ...

Article
OV-Uni3DETR: Towards Unified Open-Vocabulary 3D Object Detection via Cycle-Modality Propagation
Abstract

In the current state of 3D object detection research, the severe scarcity of annotated 3D data, substantial disparities across different data modalities, and the absence of a unified architecture have impeded the progress towards the goal of ...

Article
CatchBackdoor: Backdoor Detection via Critical Trojan Neural Path Fuzzing
Abstract

The success of deep neural networks (DNNs) in real-world applications has benefited from abundant pre-trained models. However, the backdoored pre-trained models can pose a significant trojan threat to the deployment of downstream DNNs. Numerous ...

Article
UCIP: A Universal Framework for Compressed Image Super-Resolution Using Dynamic Prompt
Abstract

Compressed Image Super-resolution (CSR) aims to simultaneously super-resolve the compressed images and tackle the challenging hybrid distortions caused by compression. However, existing works on CSR usually focus on a single compression codec, i.e., ...

Article
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Abstract

This paper presents LLaVA-Plus (Large Language and Vision Assistants that Plug and Learn to Use Skills), a general-purpose multimodal assistant trained using an end-to-end approach that systematically expands the capabilities of large multimodal ...

Article
ClearCLIP: Decomposing CLIP Representations for Dense Vision-Language Inference
Abstract

Despite the success of large-scale pretrained Vision-Language Models (VLMs), especially CLIP, in various open-vocabulary tasks, their application to semantic segmentation remains challenging, producing noisy segmentation maps with mis-segmented ...

Article
Two-Stage Active Learning for Efficient Temporal Action Segmentation
Abstract

Training a temporal action segmentation (TAS) model on long and untrimmed videos requires gathering framewise video annotations, which is very costly. We propose a two-stage active learning framework to efficiently learn a TAS model using only a ...

Article
TexDreamer: Towards Zero-Shot High-Fidelity 3D Human Texture Generation
Abstract

Texturing 3D humans with semantic UV maps remains a challenge due to the difficulty of acquiring reasonably unfolded UV. Despite recent text-to-3D advancements in supervising multi-view renderings using large text-to-image (T2I) models, issues ...

Article
MVPGS: Excavating Multi-view Priors for Gaussian Splatting from Sparse Input Views
Abstract

Recently, the Neural Radiance Field (NeRF) advancement has facilitated few-shot Novel View Synthesis (NVS), which is a significant challenge in 3D vision applications. Despite numerous attempts to reduce the dense input requirement in NeRF, it ...

Article
Domain-Adaptive 2D Human Pose Estimation via Dual Teachers in Extremely Low-Light Conditions
Abstract

Existing 2D human pose estimation research predominantly concentrates on well-lit scenarios, with limited exploration of poor lighting conditions, which are a prevalent aspect of daily life. Recent studies on low-light pose estimation require the ...

Article
Towards More Practical Group Activity Detection: A New Benchmark and Model
Abstract

Group activity detection (GAD) is the task of identifying members of each group and classifying the activity of the group at the same time in a video. While GAD has been studied recently, there is still much room for improvement in both dataset ...

Article
Depicting Beyond Scores: Advancing Image Quality Assessment Through Multi-modal Language Models
Abstract

We introduce a Depicted image Quality Assessment method (DepictQA), overcoming the constraints of traditional score-based methods. DepictQA allows for detailed, language-based, human-like evaluation of image quality by leveraging Multi-modal Large ...

Article
Zero-Shot Image Feature Consensus with Deep Functional Maps
Abstract

Correspondences emerge from large-scale vision models trained for generative and discriminative tasks. This has been revealed and benchmarked by computing correspondence maps between pairs of images, using nearest neighbors on the feature grids. ...

Article
WindPoly: Polygonal Mesh Reconstruction via Winding Numbers
Abstract

Polygonal mesh reconstruction of a raw point cloud is a valuable topic in the field of computer graphics and 3D vision. Especially for 3D architectural models, polygonal mesh provides concise expressions for fundamental geometric structures while ...

Article
MinD-3D: Reconstruct High-Quality 3D Objects in Human Brain
Abstract

In this paper, we introduce Recon3DMind, an innovative task aimed at reconstructing 3D visuals from Functional Magnetic Resonance Imaging (fMRI) signals, marking a significant advancement in the fields of cognitive neuroscience and computer ...

Article
Tokenize Anything via Prompting
Abstract

We present a unified, promptable model capable of simultaneously segmenting, recognizing, and captioning anything. Unlike SAM, we aim to build a versatile region representation in the wild via visual prompting. To achieve this, we train a ...

Article
Geospecific View Generation: Geometry-Context Aware High-Resolution Ground View Inference from Satellite Views
Abstract

Predicting realistic ground views from satellite imagery in urban scenes is a challenging task due to the significant view gaps between satellite and ground-view images. We propose a novel pipeline to tackle this challenge, by generating ...

Article
Scissorhands: Scrub Data Influence via Connection Sensitivity in Networks
Abstract

Machine unlearning has become a pivotal task to erase the influence of data from a trained model. It adheres to recent data regulation standards and enhances the privacy and security of machine learning applications. In this work, we present a new ...

Article
City-on-Web: Real-Time Neural Rendering of Large-Scale Scenes on the Web
Abstract

Existing neural radiance field-based methods can achieve real-time rendering of small scenes on the web platform. However, extending these methods to large-scale scenes still poses significant challenges due to limited resources in computation, ...

Article
GRAPE: Generalizable and Robust Multi-view Facial Capture
Abstract

Deep learning-based multi-view facial capture methods have shown impressive accuracy while being several orders of magnitude faster than a traditional mesh registration pipeline. However, the existing systems (e.g. TEMPEH) are strictly restricted ...

Article
Training-Free Model Merging for Multi-target Domain Adaptation
Abstract

In this paper, we study multi-target domain adaptation of scene understanding models. While previous methods achieved commendable results through inter-domain consistency losses, they often assumed unrealistic simultaneous access to images from ...

Article
Multi-RoI Human Mesh Recovery with Camera Consistency and Contrastive Losses
Abstract

Besides a 3D mesh, Human Mesh Recovery (HMR) methods usually need to estimate a camera for computing 2D reprojection loss. Previous approaches may encounter the following problem: both the mesh and camera are not correct but the combination of ...

Article
Co-Student: Collaborating Strong and Weak Students for Sparsely Annotated Object Detection
Abstract

Sparsely Annotated Object Detection (SAOD) tackles the issue of incomplete labeling in object detection. Compared with Fully Annotated Object Detection (FAOD), SAOD is more complicated and challenging. Unlabeled objects tend to provide wrong ...

Article
Open-Vocabulary Camouflaged Object Segmentation
Abstract

Recently, the emergence of the large-scale vision-language model (VLM), such as CLIP, has opened the way towards open-world object perception. Many works have explored the utilization of pre-trained VLM for the challenging open-vocabulary dense ...

Contributors
  • University of Birmingham
  • Bruno Kessler Foundation
  • Technical University of Darmstadt
  • Princeton University
  • Czech Technical University in Prague