Matan Levy

I am a Computer Science Ph.D. candidate at the School of Computer Science and Engineering at the Hebrew University of Jerusalem, under the joint supervision of Prof. Dani Lischinski and Dr. Rami Ben-Ari.

I previously worked at IBM Research AI as a research intern.

My research interests are computer vision and NLP, and tasks that combine them.

Semantic Scholar Google Scholar LinkedIn {last name}@cs.huji.ac.il

Publications

OmnimatteZero: Training-free Real-time Omnimatte with Pre-trained Video Diffusion Models

arXiv, 2025

Dvir Samuel, Matan Levy, Nir Darshan, Gal Chechik, Rami Ben-Ari

OmnimatteZero is a training-free, real-time method for decomposing videos into background and foreground layers. Unlike existing approaches that require heavy computation or supervised training, it can remove objects with their footprints (shadows, reflections) and blend them seamlessly into new videos. Running at 24 FPS on an A100 GPU, it achieves this by directly manipulating the spatio-temporal latent space of pre-trained video diffusion models.

Project Page arXiv

Find your Needle: Small Object Image Retrieval via Multi-Object Attention Optimization

arXiv, 2025

Michael Green*, Matan Levy*, Issar Tzachor*, Dvir Samuel, Nir Darshan, Rami Ben-Ari

We tackle the problem of Small Object Image Retrieval (SoIR), where the goal is to retrieve images containing specific small objects within cluttered scenes. We establish new benchmarks and introduce Multi-object Attention Optimization (MaO), a novel framework that significantly outperforms existing methods, paving the way for future advancements in efficient, fine-grained retrieval tasks.

arXiv

Task-Specific Adaptation with Restricted Model Access

arXiv, 2025

Matan Levy, Rami Ben-Ari, Dvir Samuel, Nir Darshan, Dani Lischinski

In this work, we propose "Gray-box" fine-tuning frameworks that enable task-specific adaptation of foundation models without exposing their weights or architecture. Using lightweight input and output adapters, our approach effectively adapts models while keeping them fixed. We introduce DarkGray-box and LightGray-box variants, demonstrating performance competitive with full fine-tuning on tasks like text-image and text-video alignment.

arXiv

EffoVPR: Effective Foundation Model Utilization for Visual Place Recognition

ICLR 2025

Issar Tzachor, Boaz Lerner, Matan Levy, Michael Green, Tal Berkovitz Shalev, Gavriel Habib, Dvir Samuel, Noam Korngut Zailer, Or Shimshi, Nir Darshan, Rami Ben-Ari

This work introduces a new method for visual place recognition (VPR) that uses features from foundation models to improve accuracy. It excels in handling challenging scenarios like occlusions, seasonal changes, and day-night variations, offering more efficient and accurate results than previous methods.

arXiv

Where's Waldo: Diffusion Features for Personalized Segmentation and Retrieval

NeurIPS 2024

Dvir Samuel, Rami Ben-Ari, Matan Levy, Nir Darshan, Gal Chechik

This work leverages text-to-image diffusion models for personalized image segmentation and retrieval, using features from pre-trained models. It surpasses existing methods in identifying specific objects within images without additional training.

Project Page arXiv Code

Chatting Makes Perfect: Chat-based Image Retrieval

NeurIPS 2023

Matan Levy, Rami Ben-Ari, Nir Darshan, Dani Lischinski

This work proposes a chat-based image retrieval system that refines search results through interactive dialogue. By asking follow-up questions, the system improves retrieval accuracy and surpasses traditional single-query methods in performance.

Project Page arXiv Code

Data Roaming and Quality Assessment for Composed Image Retrieval

AAAI 2024

Matan Levy, Rami Ben-Ari, Nir Darshan, Dani Lischinski

This work introduces a new dataset for Composed Image Retrieval (CoIR) and a model that significantly improves retrieval performance. The dataset offers richer queries with less redundancy, and the model achieves state-of-the-art results on benchmarks like FashionIQ and CIRR.

Project Page arXiv Code

Classification-Regression for Chart Comprehension

ECCV 2022

Matan Levy, Rami Ben-Ari, Dani Lischinski

This work presents a model for chart question answering that combines visual and textual data, significantly improving performance on complex charts. It excels at handling out-of-vocabulary answers and regression tasks, achieving strong results on the PlotQA dataset.

Project Page arXiv Code