
Showing 1–50 of 108 results for author: Snoek, C G M

Searching in archive cs.
  1. arXiv:2406.09415

    cs.CV cs.LG

    An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels

    Authors: Duy-Kien Nguyen, Mahmoud Assran, Unnat Jain, Martin R. Oswald, Cees G. M. Snoek, Xinlei Chen

    Abstract: This work does not introduce a new method. Instead, we present an interesting finding that questions the necessity of the inductive bias -- locality in modern computer vision architectures. Concretely, we find that vanilla Transformers can operate by directly treating each individual pixel as a token and achieve highly performant results. This is substantially different from the popular design in…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Technical report, 23 pages

  2. arXiv:2404.00701

    cs.CV

    Training-Free Semantic Segmentation via LLM-Supervision

    Authors: Wenfang Sun, Yingjun Du, Gaowen Liu, Ramana Kompella, Cees G. M. Snoek

    Abstract: Recent advancements in open vocabulary models, like CLIP, have notably advanced zero-shot classification and segmentation by utilizing natural language for class-specific embeddings. However, most research has focused on improving model accuracy through prompt engineering, prompt learning, or fine-tuning with limited labeled data, thereby overlooking the importance of refining the class descriptor…

    Submitted 31 March, 2024; originally announced April 2024.

    Comments: 22 pages, 10 figures, conference

  3. arXiv:2403.12143

    cs.LG cs.AI stat.ML

    Graph Neural Networks for Learning Equivariant Representations of Neural Networks

    Authors: Miltiadis Kofinas, Boris Knyazev, Yan Zhang, Yunlu Chen, Gertjan J. Burghouts, Efstratios Gavves, Cees G. M. Snoek, David W. Zhang

    Abstract: Neural networks that process the parameters of other neural networks find applications in domains as diverse as classifying implicit neural representations, generating neural network weights, and predicting generalization errors. However, existing approaches either overlook the inherent permutation symmetry in the neural network or rely on intricate weight-sharing patterns to achieve equivariance,…

    Submitted 20 March, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: In ICLR 2024. Source code: https://github.com/mkofinas/neural-graphs

  4. arXiv:2402.10099

    cs.CV

    Any-Shift Prompting for Generalization over Distributions

    Authors: Zehao Xiao, Jiayi Shen, Mohammad Mahdi Derakhshani, Shengcai Liao, Cees G. M. Snoek

    Abstract: Image-language models with prompt learning have shown remarkable advances in numerous downstream vision tasks. Nevertheless, conventional prompt learning methods overfit their training distribution and lose the generalization ability on test distributions. To improve generalization across various distribution shifts, we propose any-shift prompting: a general probabilistic inference framework that…

    Submitted 15 February, 2024; originally announced February 2024.

  5. arXiv:2402.08657

    cs.CV

    PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs

    Authors: Michael Dorkenwald, Nimrod Barazani, Cees G. M. Snoek, Yuki M. Asano

    Abstract: Vision-Language Models (VLMs), such as Flamingo and GPT-4V, have shown immense potential by integrating large language models with vision systems. Nevertheless, these models face challenges in the fundamental computer vision task of object localisation, due to their training on multimodal data containing mostly captions without explicit spatial grounding. While it is possible to construct custom,…

    Submitted 13 February, 2024; originally announced February 2024.

  6. arXiv:2401.04716

    cs.CV

    Low-Resource Vision Challenges for Foundation Models

    Authors: Yunhua Zhang, Hazel Doughty, Cees G. M. Snoek

    Abstract: Low-resource settings are well-established in natural language processing, where many languages lack sufficient data for deep learning at scale. However, low-resource problems are under-explored in computer vision. In this paper, we address this gap and explore the challenges of low-resource image tasks with vision foundation models. We first collect a benchmark of genuinely low-resource image dat…

    Submitted 11 April, 2024; v1 submitted 9 January, 2024; originally announced January 2024.

    Comments: Accepted at CVPR2024

  7. arXiv:2312.10825

    cs.CV cs.LG

    Latent Space Editing in Transformer-Based Flow Matching

    Authors: Vincent Tao Hu, David W Zhang, Pascal Mettes, Meng Tang, Deli Zhao, Cees G. M. Snoek

    Abstract: This paper strives for image editing via generative models. Flow Matching is an emerging generative modeling technique that offers the advantage of simple and efficient training. Simultaneously, a new transformer-based U-ViT has recently been proposed to replace the commonly used UNet for better scalability and performance in generative modeling. Hence, Flow Matching with a transformer backbone of…

    Submitted 17 December, 2023; originally announced December 2023.

    Comments: AAAI 2024 with Appendix

  8. arXiv:2312.08895

    cs.CV

    Motion Flow Matching for Human Motion Synthesis and Editing

    Authors: Vincent Tao Hu, Wenzhe Yin, Pingchuan Ma, Yunlu Chen, Basura Fernando, Yuki M Asano, Efstratios Gavves, Pascal Mettes, Bjorn Ommer, Cees G. M. Snoek

    Abstract: Human motion synthesis is a fundamental task in computer animation. Recent methods based on diffusion models or GPT structure demonstrate commendable performance but exhibit drawbacks in terms of slow sampling speeds and error accumulation. In this paper, we propose Motion Flow Matching, a novel generative model designed for human motion generation featuring efficient sampling and effective…

    Submitted 14 December, 2023; originally announced December 2023.

    Comments: WIP

  9. arXiv:2312.08825

    cs.CV

    Guided Diffusion from Self-Supervised Diffusion Features

    Authors: Vincent Tao Hu, Yunlu Chen, Mathilde Caron, Yuki M. Asano, Cees G. M. Snoek, Bjorn Ommer

    Abstract: Guidance serves as a key concept in diffusion models, yet its effectiveness is often limited by the need for extra data annotation or classifier pretraining. That is why guidance was harnessed from self-supervised learning backbones, like DINO. However, recent studies have revealed that the feature representation derived from diffusion model itself is discriminative for numerous downstream tasks a…

    Submitted 14 December, 2023; originally announced December 2023.

    Comments: Work In Progress

  10. arXiv:2311.18512

    cs.CV cs.LG

    Revisiting Proposal-based Object Detection

    Authors: Aritra Bhowmik, Martin R. Oswald, Pascal Mettes, Cees G. M. Snoek

    Abstract: This paper revisits the pipeline for detecting objects in images with proposals. For any object detector, the obtained box proposals or queries need to be classified and regressed towards ground truth boxes. The common solution for the final predictions is to directly maximize the overlap between each proposal and the ground truth box, followed by a winner-takes-all ranking or non-maximum suppress…

    Submitted 30 November, 2023; originally announced November 2023.

    Comments: 10 pages, 7 figures

  11. arXiv:2311.17937

    cs.CV

    Unlocking Spatial Comprehension in Text-to-Image Diffusion Models

    Authors: Mohammad Mahdi Derakhshani, Menglin Xia, Harkirat Behl, Cees G. M. Snoek, Victor Rühle

    Abstract: We propose CompFuser, an image generation pipeline that enhances spatial comprehension and attribute assignment in text-to-image generative models. Our pipeline enables the interpretation of instructions defining spatial relationships between objects in a scene, such as 'An image of a gray cat on the left of an orange dog', and generate corresponding images. This is especially important in order t…

    Submitted 28 November, 2023; originally announced November 2023.

  12. arXiv:2311.13895

    cs.CV

    Query by Activity Video in the Wild

    Authors: Tao Hu, William Thong, Pascal Mettes, Cees G. M. Snoek

    Abstract: This paper focuses on activity retrieval from a video query in an imbalanced scenario. In current query-by-activity-video literature, a common assumption is that all activities have sufficient labelled examples when learning an embedding. This assumption does however practically not hold, as only a portion of activities have many examples, while other activities are only described by few examples.…

    Submitted 23 November, 2023; originally announced November 2023.

    Comments: An extended version of ICIP 2023

  13. arXiv:2311.08851

    cs.LG cs.CV

    Data Augmentations in Deep Weight Spaces

    Authors: Aviv Shamsian, David W. Zhang, Aviv Navon, Yan Zhang, Miltiadis Kofinas, Idan Achituve, Riccardo Valperga, Gertjan J. Burghouts, Efstratios Gavves, Cees G. M. Snoek, Ethan Fetaya, Gal Chechik, Haggai Maron

    Abstract: Learning in weight spaces, where neural networks process the weights of other deep neural networks, has emerged as a promising research direction with applications in various fields, from analyzing and editing neural fields and implicit neural representations, to network pruning and quantization. Recent works designed architectures for effective learning in that space, which takes into account its…

    Submitted 15 November, 2023; originally announced November 2023.

    Comments: Accepted to NeurIPS 2023 Workshop on Symmetry and Geometry in Neural Representations

  14. arXiv:2310.19776

    cs.CV cs.AI cs.IT cs.LG

    Learn to Categorize or Categorize to Learn? Self-Coding for Generalized Category Discovery

    Authors: Sarah Rastegar, Hazel Doughty, Cees G. M. Snoek

    Abstract: In the quest for unveiling novel categories at test time, we confront the inherent limitations of traditional supervised recognition models that are restricted by a predefined category set. While strides have been made in the realms of self-supervised and open-world learning towards test-time category discovery, a crucial yet often overlooked question persists: what exactly delineates a category?…

    Submitted 18 January, 2024; v1 submitted 30 October, 2023; originally announced October 2023.

    Comments: Accepted by NeurIPS 2023

    ACM Class: I.2.1.b; I.2.6.g; I.5.4.b; I.4

  15. arXiv:2310.05920

    cs.CV

    SimPLR: A Simple and Plain Transformer for Scaling-Efficient Object Detection and Segmentation

    Authors: Duy-Kien Nguyen, Martin R. Oswald, Cees G. M. Snoek

    Abstract: The ability to detect objects in images at varying scales has played a pivotal role in the design of modern object detectors. Despite considerable progress in removing hand-crafted components and simplifying the architecture with transformers, multi-scale feature maps and/or pyramid design remain a key factor for their empirical success. In this paper, we show that this reliance on either feature…

    Submitted 15 March, 2024; v1 submitted 9 October, 2023; originally announced October 2023.

  16. arXiv:2310.00500

    cs.CV

    Self-Supervised Open-Ended Classification with Small Visual Language Models

    Authors: Mohammad Mahdi Derakhshani, Ivona Najdenkoska, Cees G. M. Snoek, Marcel Worring, Yuki M. Asano

    Abstract: We present Self-Context Adaptation (SeCAt), a self-supervised approach that unlocks few-shot abilities for open-ended classification with small visual language models. Our approach imitates image captions in a self-supervised way based on clustering a large pool of images followed by assigning semantically-unrelated names to clusters. By doing so, we construct a training signal consisting of inter…

    Submitted 6 December, 2023; v1 submitted 30 September, 2023; originally announced October 2023.

  17. arXiv:2308.11796

    cs.CV

    Time Does Tell: Self-Supervised Time-Tuning of Dense Image Representations

    Authors: Mohammadreza Salehi, Efstratios Gavves, Cees G. M. Snoek, Yuki M. Asano

    Abstract: Spatially dense self-supervised learning is a rapidly growing problem domain with promising applications for unsupervised segmentation and pretraining for dense downstream tasks. Despite the abundance of temporal data in the form of videos, this information-rich source has been largely overlooked. Our paper aims to address this gap by proposing a novel approach that incorporates temporal consisten…

    Submitted 22 August, 2023; originally announced August 2023.

  18. arXiv:2307.04033

    cs.LG cs.AI

    Learning Variational Neighbor Labels for Test-Time Domain Generalization

    Authors: Sameer Ambekar, Zehao Xiao, Jiayi Shen, Xiantong Zhen, Cees G. M. Snoek

    Abstract: This paper strives for domain generalization, where models are trained exclusively on source domains before being deployed at unseen target domains. We follow the strict separation of source training and target testing but exploit the value of the unlabeled target data itself during inference. We make three contributions. First, we propose probabilistic pseudo-labeling of target samples to general…

    Submitted 23 October, 2023; v1 submitted 8 July, 2023; originally announced July 2023.

    Comments: Under review

  19. arXiv:2306.12795

    cs.CV cs.LG cs.MM

    Learning Unseen Modality Interaction

    Authors: Yunhua Zhang, Hazel Doughty, Cees G. M. Snoek

    Abstract: Multimodal learning assumes all modality combinations of interest are available during training to learn cross-modal correspondences. In this paper, we challenge this modality-complete assumption for multimodal learning and instead strive for generalization to unseen modality combinations during inference. We pose the problem of unseen modality interaction and introduce a first solution. It exploi…

    Submitted 25 October, 2023; v1 submitted 22 June, 2023; originally announced June 2023.

    Comments: Published at NeurIPS 2023

  20. Multi-Label Meta Weighting for Long-Tailed Dynamic Scene Graph Generation

    Authors: Shuo Chen, Yingjun Du, Pascal Mettes, Cees G. M. Snoek

    Abstract: This paper investigates the problem of scene graph generation in videos with the aim of capturing semantic relations between subjects and objects in the form of ⟨subject, predicate, object⟩ triplets. Recognizing the predicate between subject and object pairs is imbalanced and multi-label in nature, ranging from ubiquitous interactions such as spatial relationships (e.g., in fro…

    Submitted 16 June, 2023; originally announced June 2023.

    Comments: ICMR 2023

    ACM Class: I.2.10

  21. arXiv:2306.05411

    cs.CV

    R-MAE: Regions Meet Masked Autoencoders

    Authors: Duy-Kien Nguyen, Vaibhav Aggarwal, Yanghao Li, Martin R. Oswald, Alexander Kirillov, Cees G. M. Snoek, Xinlei Chen

    Abstract: In this work, we explore regions as a potential visual analogue of words for self-supervised image representation learning. Inspired by Masked Autoencoding (MAE), a generative pre-training baseline, we propose masked region autoencoding to learn from groups of pixels or regions. Specifically, we design an architecture which efficiently addresses the one-to-many mapping between images and regions,…

    Submitted 4 January, 2024; v1 submitted 8 June, 2023; originally announced June 2023.

  22. arXiv:2306.05189

    cs.LG

    EMO: Episodic Memory Optimization for Few-Shot Meta-Learning

    Authors: Yingjun Du, Jiayi Shen, Xiantong Zhen, Cees G. M. Snoek

    Abstract: Few-shot meta-learning presents a challenge for gradient descent optimization due to the limited number of training samples per task. To address this issue, we propose an episodic memory optimization for meta-learning, we call EMO, which is inspired by the human ability to recall past learning experiences from the brain's memory. EMO retains the gradient history of past experienced tasks in extern…

    Submitted 26 June, 2023; v1 submitted 8 June, 2023; originally announced June 2023.

    Comments: Accepted by CoLLAs 2023

  23. arXiv:2306.05129

    cs.CV

    Focus for Free in Density-Based Counting

    Authors: Zenglin Shi, Pascal Mettes, Cees G. M. Snoek

    Abstract: This work considers supervised learning to count from images and their corresponding point annotations. Where density-based counting methods typically use the point annotations only to create Gaussian-density maps, which act as the supervision signal, the starting point of this work is that point annotations have counting potential beyond density map generation. We introduce two methods that repur…

    Submitted 8 June, 2023; originally announced June 2023.

    Comments: 18 pages

  24. arXiv:2305.10309

    cs.LG

    MetaModulation: Learning Variational Feature Hierarchies for Few-Shot Learning with Fewer Tasks

    Authors: Wenfang Sun, Yingjun Du, Xiantong Zhen, Fan Wang, Ling Wang, Cees G. M. Snoek

    Abstract: Meta-learning algorithms are able to learn a new task using previously learned knowledge, but they often require a large number of meta-training tasks which may not be readily available. To address this issue, we propose a method for few-shot learning with fewer tasks, which we call MetaModulation. The key idea is to use a neural network to increase the density of the meta-training tasks by modula…

    Submitted 17 May, 2023; originally announced May 2023.

    Comments: Accepted by ICML 2023

  25. arXiv:2304.00961

    cs.CV

    Self-Ordering Point Clouds

    Authors: Pengwan Yang, Cees G. M. Snoek, Yuki M. Asano

    Abstract: In this paper we address the task of finding representative subsets of points in a 3D point cloud by means of a point-wise ordering. Only a few works have tried to address this challenging vision problem, all with the help of hard to obtain point and cloud labels. Different from these works, we introduce the task of point-wise ordering in 3D point clouds through self-supervision, which we call sel…

    Submitted 10 April, 2023; v1 submitted 3 April, 2023; originally announced April 2023.

  26. arXiv:2304.00101

    cs.CV

    SuperDisco: Super-Class Discovery Improves Visual Recognition for the Long-Tail

    Authors: Yingjun Du, Jiayi Shen, Xiantong Zhen, Cees G. M. Snoek

    Abstract: Modern image classifiers perform well on populated classes, while degrading considerably on tail classes with only a few instances. Humans, by contrast, effortlessly handle the long-tailed recognition challenge, since they can learn the tail representation based on different levels of semantic abstraction, making the learned tail features more discriminative. This phenomenon motivated us to propos…

    Submitted 31 March, 2023; originally announced April 2023.

    Comments: Accepted by CVPR 2023

  27. arXiv:2303.05977

    cs.CV

    Open-Ended Medical Visual Question Answering Through Prefix Tuning of Language Models

    Authors: Tom van Sonsbeek, Mohammad Mahdi Derakhshani, Ivona Najdenkoska, Cees G. M. Snoek, Marcel Worring

    Abstract: Medical Visual Question Answering (VQA) is an important challenge, as it would lead to faster and more accurate diagnoses and treatment decisions. Most existing methods approach it as a multi-class classification problem, which restricts the outcome to a predefined closed-set of curated answers. We focus on open-ended VQA and motivated by the recent advances in language models consider it as a gen…

    Submitted 21 July, 2023; v1 submitted 10 March, 2023; originally announced March 2023.

    MSC Class: 68T07

  28. arXiv:2302.11215

    cs.LG

    Energy-Based Test Sample Adaptation for Domain Generalization

    Authors: Zehao Xiao, Xiantong Zhen, Shengcai Liao, Cees G. M. Snoek

    Abstract: In this paper, we propose energy-based sample adaptation at test time for domain generalization. Where previous works adapt their models to target domains, we adapt the unseen target samples to source-trained models. To this end, we design a discriminative energy-based model, which is trained on source domains to jointly model the conditional distribution for classification and data distribution f…

    Submitted 22 February, 2023; originally announced February 2023.

    Comments: Accepted by ICLR 2023

  29. arXiv:2301.13197

    cs.LG cs.CV

    Unlocking Slot Attention by Changing Optimal Transport Costs

    Authors: Yan Zhang, David W. Zhang, Simon Lacoste-Julien, Gertjan J. Burghouts, Cees G. M. Snoek

    Abstract: Slot attention is a powerful method for object-centric modeling in images and videos. However, its set-equivariance limits its ability to handle videos with a dynamic number of objects because it cannot break ties. To overcome this limitation, we first establish a connection between slot attention and optimal transport. Based on this new perspective we propose MESH (Minimize Entropy of Sinkhorn):…

    Submitted 31 May, 2023; v1 submitted 30 January, 2023; originally announced January 2023.

    Comments: Published at International Conference on Machine Learning (ICML) 2023

  30. arXiv:2301.02074

    cs.CV cs.AI

    Test of Time: Instilling Video-Language Models with a Sense of Time

    Authors: Piyush Bagad, Makarand Tapaswi, Cees G. M. Snoek

    Abstract: Modelling and understanding time remains a challenge in contemporary video understanding models. With language emerging as a key driver towards powerful generalization, it is imperative for foundational video-language models to have a sense of time. In this paper, we consider a specific aspect of temporal understanding: consistency of time order as elicited by before/after relations. We establish…

    Submitted 25 March, 2023; v1 submitted 5 January, 2023; originally announced January 2023.

    Comments: Accepted for publication at CVPR 2023. Project page: https://bpiyush.github.io/testoftime-website/index.html

  31. arXiv:2212.12395

    cs.CV

    Detecting Objects with Context-Likelihood Graphs and Graph Refinement

    Authors: Aritra Bhowmik, Yu Wang, Nora Baka, Martin R. Oswald, Cees G. M. Snoek

    Abstract: The goal of this paper is to detect objects by exploiting their interrelationships. Contrary to existing methods, which learn objects and relations separately, our key idea is to learn the object-relation distribution jointly. We first propose a novel way of creating a graphical representation of an image from inter-object relation priors and initial class predictions, we call a context-likelihood…

    Submitted 27 September, 2023; v1 submitted 23 December, 2022; originally announced December 2022.

    Comments: 13 pages, 8 figures. In Proceedings of International Conference on Computer Vision (ICCV) 2023

  32. arXiv:2212.02053

    cs.CV

    Day2Dark: Pseudo-Supervised Activity Recognition beyond Silent Daylight

    Authors: Yunhua Zhang, Hazel Doughty, Cees G. M. Snoek

    Abstract: This paper strives to recognize activities in the dark, as well as in the day. We first establish that state-of-the-art activity recognizers are effective during the day, but not trustworthy in the dark. The main causes are the limited availability of labeled dark videos to learn from, as well as the distribution shift towards the lower color contrast at test-time. To compensate for the lack of la…

    Submitted 27 August, 2023; v1 submitted 5 December, 2022; originally announced December 2022.

    Comments: Under review

  33. arXiv:2210.10378

    cs.LG cs.CV

    Variational Model Perturbation for Source-Free Domain Adaptation

    Authors: Mengmeng Jing, Xiantong Zhen, Jingjing Li, Cees G. M. Snoek

    Abstract: We aim for source-free domain adaptation, where the task is to deploy a model pre-trained on source domains to target domains. The challenges stem from the distribution shift from the source to the target domain, coupled with the unavailability of any source data and labeled target data for optimization. Rather than fine-tuning the model by updating the parameters, we propose to perturb the source…

    Submitted 19 October, 2022; originally announced October 2022.

  34. arXiv:2210.06462

    cs.CV cs.AI cs.LG

    Self-Guided Diffusion Models

    Authors: Vincent Tao Hu, David W Zhang, Yuki M. Asano, Gertjan J. Burghouts, Cees G. M. Snoek

    Abstract: Diffusion models have demonstrated remarkable progress in image generation quality, especially when guidance is used to control the generative process. However, guidance requires a large amount of image-annotation pairs for training and is thus dependent on their availability, correctness and unbiasedness. In this paper, we eliminate the need for such annotation by instead leveraging the flexibili…

    Submitted 27 November, 2023; v1 submitted 12 October, 2022; originally announced October 2022.

    Comments: CVPR 2023

  35. arXiv:2210.04637

    cs.CV

    Association Graph Learning for Multi-Task Classification with Category Shifts

    Authors: Jiayi Shen, Zehao Xiao, Xiantong Zhen, Cees G. M. Snoek, Marcel Worring

    Abstract: In this paper, we focus on multi-task classification, where related classification tasks share the same label space and are learned simultaneously. In particular, we tackle a new setting, which is more realistic than currently addressed in the literature, where categories shift from training to test data. Hence, individual tasks do not contain complete training data for the categories in the test…

    Submitted 10 October, 2022; originally announced October 2022.

  36. arXiv:2210.02390

    cs.CV cs.AI cs.LG

    Bayesian Prompt Learning for Image-Language Model Generalization

    Authors: Mohammad Mahdi Derakhshani, Enrique Sanchez, Adrian Bulat, Victor Guilherme Turrisi da Costa, Cees G. M. Snoek, Georgios Tzimiropoulos, Brais Martinez

    Abstract: Foundational image-language models have generated considerable interest due to their efficient adaptation to downstream tasks by prompt learning. Prompt learning treats part of the language model input as trainable while freezing the rest, and optimizes an Empirical Risk Minimization objective. However, Empirical Risk Minimization is known to suffer from distributional shifts which hurt generaliza…

    Submitted 20 August, 2023; v1 submitted 5 October, 2022; originally announced October 2022.

    Comments: Accepted at ICCV 2023

  37. arXiv:2205.14297

    cs.CV cs.LG

    Fake It Till You Make It: Towards Accurate Near-Distribution Novelty Detection

    Authors: Hossein Mirzaei, Mohammadreza Salehi, Sajjad Shahabi, Efstratios Gavves, Cees G. M. Snoek, Mohammad Sabokrou, Mohammad Hossein Rohban

    Abstract: We aim for image-based novelty detection. Despite considerable progress, existing models either fail or face a dramatic drop under the so-called "near-distribution" setting, where the differences between normal and anomalous samples are subtle. We first demonstrate existing methods experience up to 20% decrease in performance in the near-distribution setting. Next, we propose to exploit a score-ba…

    Submitted 28 November, 2022; v1 submitted 27 May, 2022; originally announced May 2022.

  38. arXiv:2204.08874

    cs.CV

    Less than Few: Self-Shot Video Instance Segmentation

    Authors: Pengwan Yang, Yuki M. Asano, Pascal Mettes, Cees G. M. Snoek

    Abstract: The goal of this paper is to bypass the need for labelled examples in few-shot video understanding at run time. While proven effective, in many practical video settings even labelling a few examples appears unrealistic. This is especially true as the level of details in spatio-temporal video understanding and with it, the complexity of annotations continues to increase. Rather than performing few-…

    Submitted 19 April, 2022; originally announced April 2022.

    Comments: 25 pages, 5 figures, 13 tables

  39. arXiv:2204.05737

    cs.CV

    LifeLonger: A Benchmark for Continual Disease Classification

    Authors: Mohammad Mahdi Derakhshani, Ivona Najdenkoska, Tom van Sonsbeek, Xiantong Zhen, Dwarikanath Mahapatra, Marcel Worring, Cees G. M. Snoek

    Abstract: Deep learning models have shown a great effectiveness in recognition of findings in medical images. However, they cannot handle the ever-changing clinical environment, bringing newly annotated medical data from different sources. To exploit the incoming streams of data, these models would benefit largely from sequentially learning from new samples, without forgetting the previously obtained knowle…

    Submitted 30 June, 2022; v1 submitted 12 April, 2022; originally announced April 2022.

    MSC Class: 68T07

  40. arXiv:2203.14240

    cs.CV

    Audio-Adaptive Activity Recognition Across Video Domains

    Authors: Yunhua Zhang, Hazel Doughty, Ling Shao, Cees G. M. Snoek

    Abstract: This paper strives for activity recognition under domain shift, for example caused by change of scenery or camera viewpoint. The leading approaches reduce the shift in activity appearance by adversarial training and self-supervised learning. Different from these vision-focused works we leverage activity sounds for domain adaptation as they have less variance across domains and can reliably indicat…

    Submitted 29 March, 2022; v1 submitted 27 March, 2022; originally announced March 2022.

    Comments: Accepted at CVPR 2022

  41. arXiv:2203.12344

    cs.CV

    How Do You Do It? Fine-Grained Action Understanding with Pseudo-Adverbs

    Authors: Hazel Doughty, Cees G. M. Snoek

    Abstract: We aim to understand how actions are performed and identify subtle differences, such as 'fold firmly' vs. 'fold gently'. To this end, we propose a method which recognizes adverbs across different actions. However, such fine-grained annotations are difficult to obtain and their long-tailed nature makes it challenging to recognize adverbs in rare action-adverb compositions. Our approach therefore us…

    Submitted 10 June, 2022; v1 submitted 23 March, 2022; originally announced March 2022.

    Comments: Accepted to CVPR 2022

  42. arXiv:2202.08045

    cs.LG cs.CV

    Learning to Generalize across Domains on Single Test Samples

    Authors: Zehao Xiao, Xiantong Zhen, Ling Shao, Cees G. M. Snoek

    Abstract: We strive to learn a model from a set of source domains that generalizes well to unseen target domains. The main challenge in such a domain generalization scenario is the unavailability of any target domain data during training, resulting in the learned model not being explicitly adapted to the unseen target domains. We propose learning to generalize across domains on single test samples. We lever…

    Submitted 16 February, 2022; originally announced February 2022.

  43. arXiv:2112.13410

    cs.LG cs.AI

    Generative Kernel Continual learning

    Authors: Mohammad Mahdi Derakhshani, Xiantong Zhen, Ling Shao, Cees G. M. Snoek

    Abstract: Kernel continual learning by Derakhshani et al. (2021) has recently emerged as a strong continual learner due to its non-parametric ability to tackle task interference and catastrophic forgetting. Unfortunately its success comes at the expense of an explicit memory to store samples from past tasks, which hampers scalability to continual learning settings with a large number of tasks. In this p…

    Submitted 26 December, 2021; originally announced December 2021.

    Comments: work in progress

  44. arXiv:2112.08181  [pdf, other]

    cs.LG

    Hierarchical Variational Memory for Few-shot Learning Across Domains

    Authors: Yingjun Du, Xiantong Zhen, Ling Shao, Cees G. M. Snoek

    Abstract: Neural memory enables fast adaptation to new tasks with just a few training samples. Existing memory models store features only from the single last layer, which does not generalize well in the presence of a domain shift between training and test distributions. Rather than relying on a flat memory, we propose a hierarchical alternative that stores features at different semantic levels. We introduce a…

    Submitted 20 April, 2022; v1 submitted 15 December, 2021; originally announced December 2021.

    Comments: 17 pages, 5 figures

    Journal ref: ICLR 2022

  45. arXiv:2111.13087  [pdf, other]

    cs.CV

    BoxeR: Box-Attention for 2D and 3D Transformers

    Authors: Duy-Kien Nguyen, Jihong Ju, Olaf Booij, Martin R. Oswald, Cees G. M. Snoek

    Abstract: In this paper, we propose a simple attention mechanism that we call box-attention. It enables spatial interaction between grid features, as sampled from boxes of interest, and improves the learning capability of transformers for several vision tasks. Specifically, we present BoxeR, short for Box Transformer, which attends to a set of boxes by predicting their transformation from a reference window on…

    Submitted 25 March, 2022; v1 submitted 25 November, 2021; originally announced November 2021.

    Comments: In Proceedings of CVPR 2022

  46. arXiv:2111.12193  [pdf, other]

    cs.LG stat.ML

    Multiset-Equivariant Set Prediction with Approximate Implicit Differentiation

    Authors: Yan Zhang, David W. Zhang, Simon Lacoste-Julien, Gertjan J. Burghouts, Cees G. M. Snoek

    Abstract: Most set prediction models in deep learning use set-equivariant operations, but they actually operate on multisets. We show that set-equivariant functions cannot represent certain functions on multisets, so we introduce the more appropriate notion of multiset-equivariance. We identify that the existing Deep Set Prediction Network (DSPN) can be multiset-equivariant without being hindered by set-equ…

    Submitted 3 February, 2022; v1 submitted 23 November, 2021; originally announced November 2021.

    Comments: Published at International Conference on Learning Representations (ICLR) 2022

  47. arXiv:2110.14336  [pdf, other]

    cs.CV

    Feature and Label Embedding Spaces Matter in Addressing Image Classifier Bias

    Authors: William Thong, Cees G. M. Snoek

    Abstract: This paper strives to address image classifier bias, with a focus on both feature and label embedding spaces. Previous works have shown that spurious correlations from protected attributes, such as age, gender, or skin tone, can cause adverse decisions. To balance potential harms, there is a growing need to identify and mitigate image classifier bias. First, we identify in the feature space a bias…

    Submitted 27 October, 2021; originally announced October 2021.

    Comments: Accepted at British Machine Vision Conference (BMVC) 2021

  48. arXiv:2110.13110  [pdf, other]

    cs.CV

    Diagnosing Errors in Video Relation Detectors

    Authors: Shuo Chen, Pascal Mettes, Cees G. M. Snoek

    Abstract: Video relation detection forms a new and challenging problem in computer vision, where subjects and objects need to be localized spatio-temporally and a predicate label needs to be assigned if and only if there is an interaction between the two. Despite recent progress in video relation detection, overall performance is still marginal and it remains unclear what the key factors are towards solving…

    Submitted 25 October, 2021; originally announced October 2021.

    Comments: BMVC 2021

  49. arXiv:2108.08363  [pdf, other]

    cs.CV

    Social Fabric: Tubelet Compositions for Video Relation Detection

    Authors: Shuo Chen, Zenglin Shi, Pascal Mettes, Cees G. M. Snoek

    Abstract: This paper strives to classify and detect the relationship between object tubelets appearing within a video as a <subject-predicate-object> triplet. Where existing works treat object proposals or tubelets as single entities and model their relations a posteriori, we propose to classify and detect predicates for pairs of object tubelets a priori. We also propose Social Fabric: an encoding that repr…

    Submitted 18 August, 2021; originally announced August 2021.

    Comments: ICCV 2021

  50. arXiv:2108.03656  [pdf, other]

    cs.CV

    Skeleton-Contrastive 3D Action Representation Learning

    Authors: Fida Mohammad Thoker, Hazel Doughty, Cees G. M. Snoek

    Abstract: This paper strives for self-supervised learning of a feature space suitable for skeleton-based action recognition. Our proposal is built upon learning invariances to input skeleton representations and various skeleton augmentations via a noise contrastive estimation. In particular, we propose inter-skeleton contrastive learning, which learns from multiple different input skeleton representations i…

    Submitted 8 August, 2021; originally announced August 2021.

    Comments: Accepted in ACM Multimedia 2021