OmnimatteZero: Training-free Real-time Omnimatte with Pre-trained Video Diffusion Models
OmnimatteZero is a training-free, real-time method for decomposing videos into background and foreground layers. Existing approaches require heavy computation or supervised training; OmnimatteZero instead works directly in the spatio-temporal latent space of pre-trained video diffusion models. It removes objects together with their footprints (shadows, reflections), can blend the extracted objects seamlessly into new videos, and runs at 24 FPS on an A100 GPU.
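The summary does not say which latent operations are used. The sketch below only illustrates what compositing layers in a video latent space can look like. It makes three assumptions. Latents are NumPy arrays of shape (channels, frames, height, width). A soft foreground mask covering the object and its footprint is already available at latent resolution. `composite_latents` is a hypothetical helper, not part of OmnimatteZero. A real pipeline would also decode the blended latent back to pixels with the diffusion model's VAE.

```python
import numpy as np


def composite_latents(fg_latent, bg_latent, mask):
    """Hypothetical helper: blend a foreground layer into a background in latent space.

    fg_latent, bg_latent: arrays of shape (C, T, H, W).
    mask: array of shape (T, H, W) with values in [0, 1]; 1 where the foreground
          (object plus its shadow/reflection footprint) should appear.
    This is an illustrative masked blend, not the OmnimatteZero algorithm.
    """
    if fg_latent.shape != bg_latent.shape:
        raise ValueError("foreground and background latents must have the same shape")
    if mask.shape != fg_latent.shape[1:]:
        raise ValueError("mask must match the latent's (T, H, W) dimensions")
    m = mask[np.newaxis]  # add a channel axis so the mask broadcasts over C
    return m * fg_latent + (1.0 - m) * bg_latent


# Toy example: 4 latent channels, 3 frames, 8x8 spatial grid.
rng = np.random.default_rng(0)
fg = rng.standard_normal((4, 3, 8, 8))
bg = rng.standard_normal((4, 3, 8, 8))
mask = np.zeros((3, 8, 8))
mask[:, 2:5, 2:5] = 1.0  # object occupies the same 3x3 patch in every frame
out = composite_latents(fg, bg, mask)
```

The mask has its own frame axis, so the blend region can change from frame to frame as an object and its shadow move. Fractional mask values give soft edges.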