Showing 1–50 of 119 results for author: Ricci, E

Searching in archive cs.
  1. arXiv:2406.12321

    cs.AI cs.CL cs.CV

    Automatic benchmarking of large multimodal models via iterative experiment programming

    Authors: Alessandro Conti, Enrico Fini, Paolo Rota, Yiming Wang, Massimiliano Mancini, Elisa Ricci

    Abstract: Assessing the capabilities of large multimodal models (LMMs) often requires the creation of ad-hoc evaluations. Currently, building new benchmarks requires tremendous amounts of manual work for each specific analysis. This makes the evaluation process tedious and costly. In this paper, we present APEx, Automatic Programming of Experiments, the first framework for automatic benchmarking of LMMs. Gi…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 31 pages, 6 figures, code is available at https://github.com/altndrr/apex

  2. arXiv:2405.18330

    cs.CV cs.AI

    Frustratingly Easy Test-Time Adaptation of Vision-Language Models

    Authors: Matteo Farina, Gianni Franchi, Giovanni Iacca, Massimiliano Mancini, Elisa Ricci

    Abstract: Vision-Language Models seamlessly discriminate among arbitrary semantic categories, yet they still suffer from poor generalization when presented with challenging examples. For this reason, Episodic Test-Time Adaptation (TTA) strategies have recently emerged as powerful techniques to adapt VLMs in the presence of a single unlabeled image. The recent literature on TTA is dominated by the paradigm o…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: Preprint. Work in progress

  3. arXiv:2405.15633

    cs.CV cs.AI

    Less is more: Summarizing Patch Tokens for efficient Multi-Label Class-Incremental Learning

    Authors: Thomas De Min, Massimiliano Mancini, Stéphane Lathuilière, Subhankar Roy, Elisa Ricci

    Abstract: Prompt tuning has emerged as an effective rehearsal-free technique for class-incremental learning (CIL) that learns a tiny set of task-specific parameters (or prompts) to instruct a pre-trained transformer to learn on a sequence of tasks. Albeit effective, prompt tuning methods do not lend well in the multi-label class incremental learning (MLCIL) scenario (where an image contains multiple foregro…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: Published at 3rd Conference on Lifelong Learning Agents (CoLLAs), 2024

  4. arXiv:2405.10053

    cs.CV

    SHiNe: Semantic Hierarchy Nexus for Open-vocabulary Object Detection

    Authors: Mingxuan Liu, Tyler L. Hayes, Elisa Ricci, Gabriela Csurka, Riccardo Volpi

    Abstract: Open-vocabulary object detection (OvOD) has transformed detection into a language-guided task, empowering users to freely define their class vocabularies of interest during inference. However, our initial investigation indicates that existing OvOD detectors exhibit significant variability when dealing with vocabularies across various semantic granularities, posing a concern for real-world deployme…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: Accepted as a conference paper (highlight) at CVPR 2024

  5. arXiv:2404.10864

    cs.CV

    Vocabulary-free Image Classification and Semantic Segmentation

    Authors: Alessandro Conti, Enrico Fini, Massimiliano Mancini, Paolo Rota, Yiming Wang, Elisa Ricci

    Abstract: Large vision-language models revolutionized image classification and semantic segmentation paradigms. However, they typically assume a pre-defined set of categories, or vocabulary, at test time for composing textual prompts. This assumption is impractical in scenarios with unknown or evolving semantic context. Here, we address this issue and introduce the Vocabulary-free Image Classification (VIC)…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: Under review, 22 pages, 10 figures, code is available at https://github.com/altndrr/vicss. arXiv admin note: text overlap with arXiv:2306.00917

  6. arXiv:2404.07560

    cs.RO cs.AI

    Socially Pertinent Robots in Gerontological Healthcare

    Authors: Xavier Alameda-Pineda, Angus Addlesee, Daniel Hernández García, Chris Reinke, Soraya Arias, Federica Arrigoni, Alex Auternaud, Lauriane Blavette, Cigdem Beyan, Luis Gomez Camara, Ohad Cohen, Alessandro Conti, Sébastien Dacunha, Christian Dondrup, Yoav Ellinson, Francesco Ferro, Sharon Gannot, Florian Gras, Nancie Gunson, Radu Horaud, Moreno D'Incà, Imad Kimouche, Séverin Lemaignan, Oliver Lemon, Cyril Liotard, et al. (19 additional authors not shown)

    Abstract: Despite the many recent achievements in developing and deploying social robotics, there are still many underexplored environments and applications for which systematic evaluation of such systems by end-users is necessary. While several robotic platforms have been used in gerontological healthcare, the question of whether or not a social interactive robot with multi-modal conversational capabilitie…

    Submitted 11 April, 2024; originally announced April 2024.

  7. arXiv:2404.05621

    cs.CV

    MULTIFLOW: Shifting Towards Task-Agnostic Vision-Language Pruning

    Authors: Matteo Farina, Massimiliano Mancini, Elia Cunegatti, Gaowen Liu, Giovanni Iacca, Elisa Ricci

    Abstract: While excellent in transfer learning, Vision-Language models (VLMs) come with high computational costs due to their large number of parameters. To address this issue, removing parameters via model pruning is a viable solution. However, existing techniques for VLMs are task-specific, and thus require pruning the network from scratch for each new task of interest. In this work, we explore a new dire…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: CVPR 2024

  8. arXiv:2404.05426

    cs.CV

    Test-Time Zero-Shot Temporal Action Localization

    Authors: Benedetta Liberatori, Alessandro Conti, Paolo Rota, Yiming Wang, Elisa Ricci

    Abstract: Zero-Shot Temporal Action Localization (ZS-TAL) seeks to identify and locate actions in untrimmed videos unseen during training. Existing ZS-TAL methods involve fine-tuning a model on a large amount of annotated training data. While effective, training-based ZS-TAL approaches assume the availability of labeled data for supervised learning, which can be impractical in some applications. Furthermore…

    Submitted 11 April, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

    Comments: CVPR 2024

  9. arXiv:2404.01014

    cs.CV

    Harnessing Large Language Models for Training-free Video Anomaly Detection

    Authors: Luca Zanella, Willi Menapace, Massimiliano Mancini, Yiming Wang, Elisa Ricci

    Abstract: Video anomaly detection (VAD) aims to temporally locate abnormal events in a video. Existing works mostly rely on training deep models to learn the distribution of normality with either video-level supervision, one-class supervision, or in an unsupervised setting. Training-based methods are prone to be domain-specific, thus being costly for practical deployment as any domain change will involve da…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: CVPR 2024. Project website at https://lucazanella.github.io/lavad/

  10. arXiv:2402.14797

    cs.CV cs.AI

    Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis

    Authors: Willi Menapace, Aliaksandr Siarohin, Ivan Skorokhodov, Ekaterina Deyneka, Tsai-Shien Chen, Anil Kag, Yuwei Fang, Aleksei Stoliar, Elisa Ricci, Jian Ren, Sergey Tulyakov

    Abstract: Contemporary models for generating images show remarkable quality and versatility. Swayed by these advantages, the research community repurposes them to generate videos. Since video content is highly redundant, we argue that naively bringing advances of image models to the video generation domain reduces motion fidelity, visual quality and impairs scalability. In this work, we build Snap Video, a…

    Submitted 22 February, 2024; originally announced February 2024.

  11. arXiv:2401.13837

    cs.CV

    Democratizing Fine-grained Visual Recognition with Large Language Models

    Authors: Mingxuan Liu, Subhankar Roy, Wenjing Li, Zhun Zhong, Nicu Sebe, Elisa Ricci

    Abstract: Identifying subordinate-level categories from images is a longstanding task in computer vision and is referred to as fine-grained visual recognition (FGVR). It has tremendous significance in real-world applications since an average layperson does not excel at differentiating species of birds or mushrooms due to subtle differences among the species. A major bottleneck in developing FGVR systems is…

    Submitted 10 March, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

    Comments: Accepted as a conference paper at ICLR 2024; Project page: https://projfiner.github.io/

  12. arXiv:2312.03782

    cs.CV

    Novel class discovery meets foundation models for 3D semantic segmentation

    Authors: Luigi Riz, Cristiano Saltori, Yiming Wang, Elisa Ricci, Fabio Poiesi

    Abstract: The task of Novel Class Discovery (NCD) in semantic segmentation entails training a model able to accurately segment unlabelled (novel) classes, relying on the available supervision from annotated (base) classes. Although extensively investigated in 2D image data, the extension of the NCD task to the domain of 3D point clouds represents a pioneering effort, characterized by assumptions and challen…

    Submitted 6 December, 2023; originally announced December 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2303.11610

  13. arXiv:2312.03046

    cs.CV

    Diversified in-domain synthesis with efficient fine-tuning for few-shot classification

    Authors: Victor G. Turrisi da Costa, Nicola Dall'Asen, Yiming Wang, Nicu Sebe, Elisa Ricci

    Abstract: Few-shot image classification aims to learn an image classifier using only a small set of labeled examples per class. A recent research direction for improving few-shot classifiers involves augmenting the labelled samples with synthetic images created by state-of-the-art text-to-image generation models. Following this trend, we propose Diversified In-domain Synthesis with Efficient Fine-tuning (DI…

    Submitted 6 December, 2023; v1 submitted 5 December, 2023; originally announced December 2023.

    Comments: 14 pages, 6 figures, 8 tables

  14. arXiv:2312.01800

    cs.CV

    Collaborative Neural Painting

    Authors: Nicola Dall'Asen, Willi Menapace, Elia Peruzzo, Enver Sangineto, Yiming Wang, Elisa Ricci

    Abstract: The process of painting fosters creativity and rational planning. However, existing generative AI mostly focuses on producing visually pleasant artworks, without emphasizing the painting process. We introduce a novel task, Collaborative Neural Painting (CNP), to facilitate collaborative art painting generation between humans and machines. Given any number of user-input brushstrokes as the context…

    Submitted 4 December, 2023; originally announced December 2023.

    Comments: Submitted to Computer Vision and Image Understanding, project website at https://fodark.github.io/collaborative-neural-painting/

  15. arXiv:2311.09004

    cs.CV

    Incremental Object-Based Novelty Detection with Feedback Loop

    Authors: Simone Caldarella, Elisa Ricci, Rahaf Aljundi

    Abstract: Object-based Novelty Detection (ND) aims to identify unknown objects that do not belong to classes seen during training by an object detection model. The task is particularly crucial in real-world applications, as it allows to avoid potentially harmful behaviours, e.g. as in the case of object detection models adopted in a self-driving car or in an autonomous robot. Traditional approaches to ND fo…

    Submitted 15 November, 2023; originally announced November 2023.

  16. arXiv:2310.02835

    cs.CV

    Delving into CLIP latent space for Video Anomaly Recognition

    Authors: Luca Zanella, Benedetta Liberatori, Willi Menapace, Fabio Poiesi, Yiming Wang, Elisa Ricci

    Abstract: We tackle the complex problem of detecting and recognising anomalies in surveillance videos at the frame level, utilising only video-level supervision. We introduce the novel method AnomalyCLIP, the first to combine Large Language and Vision (LLV) models, such as CLIP, with multiple instance learning for joint video anomaly detection and classification. Our approach specifically involves manipulat…

    Submitted 4 October, 2023; originally announced October 2023.

    Comments: submitted to Computer Vision and Image Understanding, project website and code are available at https://luca-zanella-dvl.github.io/AnomalyCLIP/

  17. arXiv:2309.15478

    cs.CV cs.LG

    The Robust Semantic Segmentation UNCV2023 Challenge Results

    Authors: Xuanlong Yu, Yi Zuo, Zitao Wang, Xiaowen Zhang, Jiaxuan Zhao, Yuting Yang, Licheng Jiao, Rui Peng, Xinyi Wang, Junpei Zhang, Kexin Zhang, Fang Liu, Roberto Alcover-Couso, Juan C. SanMiguel, Marcos Escudero-Viñolo, Hanlin Tian, Kenta Matsui, Tianhao Wang, Fahmy Adan, Zhitong Gao, Xuming He, Quentin Bouniot, Hossein Moghaddam, Shyam Nandan Rai, Fabio Cermelli, et al. (12 additional authors not shown)

    Abstract: This paper outlines the winning solutions employed in addressing the MUAD uncertainty quantification challenge held at ICCV 2023. The challenge was centered around semantic segmentation in urban environments, with a particular focus on natural adversarial scenarios. The report presents the results of 19 submitted entries, with numerous techniques drawing inspiration from cutting-edge uncertainty q…

    Submitted 27 September, 2023; originally announced September 2023.

    Comments: 11 pages, 4 figures, accepted at ICCV 2023 UNCV workshop

  18. arXiv:2308.14619

    cs.CV

    Compositional Semantic Mix for Domain Adaptation in Point Cloud Segmentation

    Authors: Cristiano Saltori, Fabio Galasso, Giuseppe Fiameni, Nicu Sebe, Fabio Poiesi, Elisa Ricci

    Abstract: Deep-learning models for 3D point cloud semantic segmentation exhibit limited generalization capabilities when trained and tested on data captured with different sensors or in varying environments due to domain shift. Domain adaptation methods can be employed to mitigate this domain shift, for instance, by simulating sensor noise, developing domain-agnostic generators, or training point cloud comp…

    Submitted 29 August, 2023; v1 submitted 28 August, 2023; originally announced August 2023.

    Comments: TPAMI. arXiv admin note: text overlap with arXiv:2207.09778

  19. arXiv:2308.09610

    cs.CV

    On the Effectiveness of LayerNorm Tuning for Continual Learning in Vision Transformers

    Authors: Thomas De Min, Massimiliano Mancini, Karteek Alahari, Xavier Alameda-Pineda, Elisa Ricci

    Abstract: State-of-the-art rehearsal-free continual learning methods exploit the peculiarities of Vision Transformers to learn task-specific prompts, drastically reducing catastrophic forgetting. However, there is a tradeoff between the number of learned parameters and the performance, making such models computationally expensive. In this work, we aim to reduce this cost while maintaining competitive perfor…

    Submitted 18 August, 2023; originally announced August 2023.

    Comments: In The First Workshop on Visual Continual Learning (ICCVW 2023); Oral

  20. arXiv:2308.09139

    cs.CV

    The Unreasonable Effectiveness of Large Language-Vision Models for Source-free Video Domain Adaptation

    Authors: Giacomo Zara, Alessandro Conti, Subhankar Roy, Stéphane Lathuilière, Paolo Rota, Elisa Ricci

    Abstract: Source-Free Video Unsupervised Domain Adaptation (SFVUDA) task consists in adapting an action recognition model, trained on a labelled source dataset, to an unlabelled target dataset, without accessing the actual source data. The previous approaches have attempted to address SFVUDA by leveraging self-supervision (e.g., enforcing temporal consistency) derived from the target data itself. In this wo…

    Submitted 22 August, 2023; v1 submitted 17 August, 2023; originally announced August 2023.

    Comments: Accepted at ICCV2023, 14 pages, 7 figures, code is available at https://github.com/giaczara/dallv

  21. Interactive Neural Painting

    Authors: Elia Peruzzo, Willi Menapace, Vidit Goel, Federica Arrigoni, Hao Tang, Xingqian Xu, Arman Chopikyan, Nikita Orlov, Yuxiao Hu, Humphrey Shi, Nicu Sebe, Elisa Ricci

    Abstract: In the last few years, Neural Painting (NP) techniques became capable of producing extremely realistic artworks. This paper advances the state of the art in this emerging research domain by proposing the first approach for Interactive NP. Considering a setting where a user looks at a scene and tries to reproduce it on a painting, our objective is to develop a computational framework to assist the…

    Submitted 31 July, 2023; originally announced July 2023.

    Comments: This is a preprint version of the paper to appear at Computer Vision and Image Understanding (CVIU). The final journal version will be available at https://www.sciencedirect.com/science/article/pii/S1077314223001583

    Journal ref: 10.1016/j.cviu.2023.103778

  22. arXiv:2307.09662

    cs.CV

    Object-aware Gaze Target Detection

    Authors: Francesco Tonini, Nicola Dall'Asen, Cigdem Beyan, Elisa Ricci

    Abstract: Gaze target detection aims to predict the image location where the person is looking and the probability that a gaze is out of the scene. Several works have tackled this task by regressing a gaze heatmap centered on the gaze location, however, they overlooked decoding the relationship between the people and the gazed objects. This paper proposes a Transformer-based architecture that automatically…

    Submitted 27 September, 2023; v1 submitted 18 July, 2023; originally announced July 2023.

    Comments: Accepted to ICCV 2023. Code is available at https://github.com/francescotonini/object-aware-gaze-target-detection

  23. arXiv:2307.01533

    cs.CV

    Unsupervised Video Anomaly Detection with Diffusion Models Conditioned on Compact Motion Representations

    Authors: Anil Osman Tur, Nicola Dall'Asen, Cigdem Beyan, Elisa Ricci

    Abstract: This paper aims to address the unsupervised video anomaly detection (VAD) problem, which involves classifying each frame in a video as normal or abnormal, without any access to labels. To accomplish this, the proposed method employs conditional diffusion models, where the input data is the spatiotemporal features extracted from a pre-trained network, and the condition is the features extracted fro…

    Submitted 19 July, 2023; v1 submitted 4 July, 2023; originally announced July 2023.

    Comments: Accepted to ICIAP 2023

  24. arXiv:2306.07483

    cs.CV

    Semi-supervised learning made simple with self-supervised clustering

    Authors: Enrico Fini, Pietro Astolfi, Karteek Alahari, Xavier Alameda-Pineda, Julien Mairal, Moin Nabi, Elisa Ricci

    Abstract: Self-supervised learning models have been shown to learn rich visual representations without requiring human annotations. However, in many real-world scenarios, labels are partially available, motivating a recent line of work on semi-supervised methods inspired by self-supervised principles. In this paper, we propose a conceptually simple yet empirically powerful approach to turn clustering-based…

    Submitted 12 June, 2023; originally announced June 2023.

    Comments: CVPR 2023 - Code available at https://github.com/pietroastolfi/suave-daino

    Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023) 3187-3197

  25. arXiv:2306.00917

    cs.CV

    Vocabulary-free Image Classification

    Authors: Alessandro Conti, Enrico Fini, Massimiliano Mancini, Paolo Rota, Yiming Wang, Elisa Ricci

    Abstract: Recent advances in large vision-language models have revolutionized the image classification paradigm. Despite showing impressive zero-shot capabilities, a pre-defined set of categories, a.k.a. the vocabulary, is assumed at test time for composing the textual prompts. However, such assumption can be impractical when the semantic context is unknown and evolving. We thus formalize a novel task, term…

    Submitted 12 January, 2024; v1 submitted 1 June, 2023; originally announced June 2023.

    Comments: Accepted at NeurIPS2023, 19 pages, 8 figures, code is available at https://github.com/altndrr/vic

  26. arXiv:2305.05268

    cs.CV cs.AI

    Rotation Synchronization via Deep Matrix Factorization

    Authors: Gk Tejus, Giacomo Zara, Paolo Rota, Andrea Fusiello, Elisa Ricci, Federica Arrigoni

    Abstract: In this paper we address the rotation synchronization problem, where the objective is to recover absolute rotations starting from pairwise ones, where the unknowns and the measures are represented as nodes and edges of a graph, respectively. This problem is an essential task for structure from motion and simultaneous localization and mapping. We focus on the formulation of synchronization via neur…

    Submitted 9 May, 2023; originally announced May 2023.

    Comments: To be published in ICRA 2023

  27. arXiv:2304.11705

    cs.CV cs.AI cs.LG

    Walking Your LiDOG: A Journey Through Multiple Domains for LiDAR Semantic Segmentation

    Authors: Cristiano Saltori, Aljoša Ošep, Elisa Ricci, Laura Leal-Taixé

    Abstract: The ability to deploy robots that can operate safely in diverse environments is crucial for developing embodied intelligent agents. As a community, we have made tremendous progress in within-domain LiDAR semantic segmentation. However, do these methods generalize across domains? To answer this question, we design the first experimental setup for studying domain generalization (DG) for LiDAR semant…

    Submitted 29 August, 2023; v1 submitted 23 April, 2023; originally announced April 2023.

    Comments: Accepted at ICCV 2023

  28. arXiv:2304.05841

    cs.CV

    Exploring Diffusion Models for Unsupervised Video Anomaly Detection

    Authors: Anil Osman Tur, Nicola Dall'Asen, Cigdem Beyan, Elisa Ricci

    Abstract: This paper investigates the performance of diffusion models for video anomaly detection (VAD) within the most challenging but also the most operational scenario in which the data annotations are not used. As being sparse, diverse, contextual, and often ambiguous, detecting abnormal events precisely is a very ambitious task. To this end, we rely only on the information-rich spatio-temporal data, an…

    Submitted 2 July, 2023; v1 submitted 12 April, 2023; originally announced April 2023.

    Comments: Accepted to IEEE ICIP 2023

  29. arXiv:2304.01110

    cs.CV

    AutoLabel: CLIP-based framework for Open-set Video Domain Adaptation

    Authors: Giacomo Zara, Subhankar Roy, Paolo Rota, Elisa Ricci

    Abstract: Open-set Unsupervised Video Domain Adaptation (OUVDA) deals with the task of adapting an action recognition model from a labelled source domain to an unlabelled target domain that contains "target-private" categories, which are present in the target but absent in the source. In this work we deviate from the prior work of training a specialized open-set classifier or weighted adversarial learning b…

    Submitted 4 April, 2023; v1 submitted 3 April, 2023; originally announced April 2023.

    Comments: Accepted to CVPR 2023

  30. arXiv:2303.15975

    cs.CV cs.LG

    Large-scale Pre-trained Models are Surprisingly Strong in Incremental Novel Class Discovery

    Authors: Mingxuan Liu, Subhankar Roy, Zhun Zhong, Nicu Sebe, Elisa Ricci

    Abstract: Discovering novel concepts from unlabelled data and in a continuous manner is an important desideratum of lifelong learners. In the literature such problems have been partially addressed under very restricted settings, where either access to labelled data is provided for discovering novel concepts (e.g., NCD) or learning occurs for a limited number of incremental steps (e.g., class-iNCD). In this…

    Submitted 29 March, 2023; v1 submitted 28 March, 2023; originally announced March 2023.

  31. arXiv:2303.15444

    cs.CV

    Quantum Multi-Model Fitting

    Authors: Matteo Farina, Luca Magri, Willi Menapace, Elisa Ricci, Vladislav Golyanik, Federica Arrigoni

    Abstract: Geometric model fitting is a challenging but fundamental computer vision problem. Recently, quantum optimization has been shown to enhance robust fitting for the case of a single model, while leaving the question of multi-model fitting open. In response to this challenge, this paper shows that the latter case can significantly benefit from quantum hardware and proposes the first quantum approach t…

    Submitted 27 March, 2023; originally announced March 2023.

    Comments: In Computer Vision and Pattern Recognition (CVPR) 2023; Highlight

  32. arXiv:2303.13472

    cs.CV cs.AI

    Promptable Game Models: Text-Guided Game Simulation via Masked Diffusion Models

    Authors: Willi Menapace, Aliaksandr Siarohin, Stéphane Lathuilière, Panos Achlioptas, Vladislav Golyanik, Sergey Tulyakov, Elisa Ricci

    Abstract: Neural video game simulators emerged as powerful tools to generate and edit videos. Their idea is to represent games as the evolution of an environment's state driven by the actions of its agents. While such a paradigm enables users to play a game action-by-action, its rigidity precludes more semantic forms of control. To overcome this limitation, we augment game models with prompts specified as a…

    Submitted 21 January, 2024; v1 submitted 23 March, 2023; originally announced March 2023.

    Comments: ACM Transactions on Graphics © Copyright is held by the owner/author(s) 2023. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in ACM Transactions on Graphics, http://dx.doi.org/10.1145/3635705

  33. arXiv:2303.11610

    cs.CV

    Novel Class Discovery for 3D Point Cloud Semantic Segmentation

    Authors: Luigi Riz, Cristiano Saltori, Elisa Ricci, Fabio Poiesi

    Abstract: Novel class discovery (NCD) for semantic segmentation is the task of learning a model that can segment unlabelled (novel) classes using only the supervision from labelled (base) classes. This problem has recently been pioneered for 2D image data, but no work exists for 3D point cloud data. In fact, the assumptions made for 2D are loosely applicable to 3D in this case. This paper is presented to ad…

    Submitted 21 March, 2023; originally announced March 2023.

    Comments: Paper accepted at CVPR 2023

  34. arXiv:2302.09251

    cs.CV

    StyLIP: Multi-Scale Style-Conditioned Prompt Learning for CLIP-based Domain Generalization

    Authors: Shirsha Bose, Ankit Jha, Enrico Fini, Mainak Singha, Elisa Ricci, Biplab Banerjee

    Abstract: Large-scale foundation models, such as CLIP, have demonstrated impressive zero-shot generalization performance on downstream tasks, leveraging well-designed language prompts. However, these prompt learning techniques often struggle with domain shift, limiting their generalization capabilities. In our study, we tackle this issue by proposing StyLIP, a novel approach for Domain Generalization (DG) t…

    Submitted 28 November, 2023; v1 submitted 18 February, 2023; originally announced February 2023.

    Comments: 23 pages, 5 figures, 7 tables, Accepted in WACV 2024

  35. arXiv:2301.03322

    cs.CV

    Simplifying Open-Set Video Domain Adaptation with Contrastive Learning

    Authors: Giacomo Zara, Victor Guilherme Turrisi da Costa, Subhankar Roy, Paolo Rota, Elisa Ricci

    Abstract: In an effort to reduce annotation costs in action recognition, unsupervised video domain adaptation methods have been proposed that aim to adapt a predictive model from a labelled dataset (i.e., source domain) to an unlabelled dataset (i.e., target domain). In this work we address a more realistic scenario, called open-set video domain adaptation (OUVDA), where the target dataset contains "unknown…

    Submitted 9 January, 2023; originally announced January 2023.

    Comments: Currently under review at Computer Vision and Image Understanding (CVIU) journal

  36. arXiv:2212.05102

    cs.CV cs.LG

    A soft nearest-neighbor framework for continual semi-supervised learning

    Authors: Zhiqi Kang, Enrico Fini, Moin Nabi, Elisa Ricci, Karteek Alahari

    Abstract: Despite significant advances, the performance of state-of-the-art continual learning approaches hinges on the unrealistic scenario of fully labeled data. In this paper, we tackle this challenge and propose an approach for continual semi-supervised learning--a setting where not all the data samples are labeled. A primary issue in this scenario is the model forgetting representations of unlabeled da…

    Submitted 11 September, 2023; v1 submitted 9 December, 2022; originally announced December 2022.

    Comments: Accepted at ICCV 2023

  37. arXiv:2210.11539

    cs.CV

    ConfMix: Unsupervised Domain Adaptation for Object Detection via Confidence-based Mixing

    Authors: Giulio Mattolin, Luca Zanella, Elisa Ricci, Yiming Wang

    Abstract: Unsupervised Domain Adaptation (UDA) for object detection aims to adapt a model trained on a source domain to detect instances from a new target domain for which annotations are not available. Different from traditional approaches, we propose ConfMix, the first method that introduces a sample mixing strategy based on region-level detection confidence for adaptive object detector learning. We mix t…

    Submitted 20 October, 2022; originally announced October 2022.

    Comments: Accepted in WACV 2023

  38. arXiv:2210.09836

    cs.CV

    Overlap-guided Gaussian Mixture Models for Point Cloud Registration

    Authors: Guofeng Mei, Fabio Poiesi, Cristiano Saltori, Jian Zhang, Elisa Ricci, Nicu Sebe

    Abstract: Probabilistic 3D point cloud registration methods have shown competitive performance in overcoming noise, outliers, and density variations. However, registering point cloud pairs in the case of partial overlap is still a challenge. This paper proposes a novel overlap-guided probabilistic registration approach that computes the optimal transformation from matched Gaussian Mixture Model (GMM) parame…

    Submitted 17 October, 2022; originally announced October 2022.

    Comments: Accepted in WACV 2023

  39. arXiv:2210.05246

    cs.CV cs.AI

    Cluster-level pseudo-labelling for source-free cross-domain facial expression recognition

    Authors: Alessandro Conti, Paolo Rota, Yiming Wang, Elisa Ricci

    Abstract: Automatically understanding emotions from visual data is a fundamental task for human behaviour understanding. While models devised for Facial Expression Recognition (FER) have demonstrated excellent performances on many datasets, they often suffer from severe performance degradation when trained and tested on different datasets due to domain shift. In addition, as face images are considered highl…

    Submitted 11 October, 2022; originally announced October 2022.

    Comments: Accepted at BMVC2022, 13 pages, 4 figures, code is available at https://github.com/altndrr/clup

  40. arXiv:2210.02798  [pdf, other]

    cs.CV

    Data Augmentation-free Unsupervised Learning for 3D Point Cloud Understanding

    Authors: Guofeng Mei, Cristiano Saltori, Fabio Poiesi, Jian Zhang, Elisa Ricci, Nicu Sebe, Qiang Wu

    Abstract: Unsupervised learning on 3D point clouds has undergone a rapid evolution, especially thanks to data augmentation-based contrastive methods. However, data augmentation is not ideal as it requires a careful selection of the type of augmentations to perform, which in turn can affect the geometric and semantic information learned by the network during self-training. To overcome this issue, we propose…

    Submitted 6 October, 2022; originally announced October 2022.

    Comments: BMVC 2022

  41. arXiv:2210.01578  [pdf, other]

    cs.CV

    Cooperative Self-Training for Multi-Target Adaptive Semantic Segmentation

    Authors: Yangsong Zhang, Subhankar Roy, Hongtao Lu, Elisa Ricci, Stéphane Lathuilière

    Abstract: In this work we address multi-target domain adaptation (MTDA) in semantic segmentation, which consists in adapting a single model from an annotated source dataset to multiple unannotated target datasets that differ in their underlying data distributions. To address MTDA, we propose a self-training strategy that employs pseudo-labels to induce cooperation among multiple domain-specific classifiers.…

    Submitted 4 October, 2022; originally announced October 2022.

    Comments: Accepted at WACV 2023

  42. arXiv:2209.12948  [pdf]

    physics.comp-ph cond-mat.mtrl-sci cs.LG physics.chem-ph

    Developing Machine-Learned Potentials for Coarse-Grained Molecular Simulations: Challenges and Pitfalls

    Authors: Eleonora Ricci, George Giannakopoulos, Vangelis Karkaletsis, Doros N. Theodorou, Niki Vergadou

    Abstract: Coarse graining (CG) enables the investigation of molecular properties for larger systems and at longer timescales than the ones attainable at the atomistic resolution. Machine learning techniques have been recently proposed to learn CG particle interactions, i.e. develop CG force fields. Graph representations of molecules and supervised training of a graph convolutional neural network architectur…

    Submitted 26 September, 2022; originally announced September 2022.

    Comments: Proceedings of 12th Conference on Artificial Intelligence (SETN), 2022

  43. arXiv:2209.12946  [pdf]

    physics.comp-ph cond-mat.mtrl-sci cs.LG physics.chem-ph

    Investigation of Machine Learning-based Coarse-Grained Mapping Schemes for Organic Molecules

    Authors: Dimitris Nasikas, Eleonora Ricci, George Giannakopoulos, Vangelis Karkaletsis, Doros N. Theodorou, Niki Vergadou

    Abstract: Due to the wide range of timescales that are present in macromolecular systems, hierarchical multiscale strategies are necessary for their computational study. Coarse-graining (CG) establishes a link between different system resolutions and provides the backbone for the development of robust multiscale simulations and analyses. The CG mapping process is typically system- and application-sp…

    Submitted 26 September, 2022; originally announced September 2022.

    Comments: Proceedings of 12th Conference on Artificial Intelligence (SETN), 2022

  44. arXiv:2208.10822  [pdf, other]

    cs.CV cs.AI cs.HC

    Multimodal Across Domains Gaze Target Detection

    Authors: Francesco Tonini, Cigdem Beyan, Elisa Ricci

    Abstract: This paper addresses the gaze target detection problem in single images captured from the third-person perspective. We present a multimodal deep architecture to infer where a person in a scene is looking. This spatial model is trained on the head images of the person-of-interest, scene and depth maps representing rich context information. Our model, unlike several prior works, does not require superv…

    Submitted 23 August, 2022; originally announced August 2022.

    Comments: Accepted to 24th ACM International Conference on Multimodal Interaction (ICMI 2022)

  45. arXiv:2208.07591  [pdf, other]

    cs.CV cs.LG

    Uncertainty-guided Source-free Domain Adaptation

    Authors: Subhankar Roy, Martin Trapp, Andrea Pilzer, Juho Kannala, Nicu Sebe, Elisa Ricci, Arno Solin

    Abstract: Source-free domain adaptation (SFDA) aims to adapt a classifier to an unlabelled target data set by only using a pre-trained source model. However, the absence of the source data and the domain shift make the predictions on the target data unreliable. We propose quantifying the uncertainty in the source model predictions and utilizing it to guide the target adaptation. For this, we construct a pr…

    Submitted 16 August, 2022; originally announced August 2022.

    Comments: ECCV 2022

  46. arXiv:2207.12842  [pdf, other]

    cs.CV

    Unsupervised Domain Adaptation for Video Transformers in Action Recognition

    Authors: Victor G. Turrisi da Costa, Giacomo Zara, Paolo Rota, Thiago Oliveira-Santos, Nicu Sebe, Vittorio Murino, Elisa Ricci

    Abstract: Over the last few years, Unsupervised Domain Adaptation (UDA) techniques have acquired remarkable importance and popularity in computer vision. However, when compared to the extensive literature available for images, the field of videos is still relatively unexplored. On the other hand, the performance of a model in action recognition is heavily affected by domain shift. In this paper, we propose…

    Submitted 26 July, 2022; originally announced July 2022.

    Comments: Accepted at ICPR 2022

  47. arXiv:2207.11482  [pdf, other]

    cs.CV cs.AI cs.LG cs.MM

    Multimodal Emotion Recognition with Modality-Pairwise Unsupervised Contrastive Loss

    Authors: Riccardo Franceschini, Enrico Fini, Cigdem Beyan, Alessandro Conti, Federica Arrigoni, Elisa Ricci

    Abstract: Emotion recognition is involved in several real-world applications. With an increase in available modalities, automatic understanding of emotions is being performed more accurately. The success of Multimodal Emotion Recognition (MER) relies primarily on the supervised learning paradigm. However, data annotation is expensive, time-consuming, and as emotion expression and perception depend on seve…

    Submitted 23 July, 2022; originally announced July 2022.

    Comments: Accepted to 26th International Conference on Pattern Recognition (ICPR) 2022

  48. arXiv:2207.09778  [pdf, other]

    cs.CV cs.AI cs.LG

    CoSMix: Compositional Semantic Mix for Domain Adaptation in 3D LiDAR Segmentation

    Authors: Cristiano Saltori, Fabio Galasso, Giuseppe Fiameni, Nicu Sebe, Elisa Ricci, Fabio Poiesi

    Abstract: 3D LiDAR semantic segmentation is fundamental for autonomous driving. Several Unsupervised Domain Adaptation (UDA) methods for point cloud data have been recently proposed to improve model generalization for different sensors and environments. Researchers working on UDA problems in the image domain have shown that sample mixing can mitigate domain shift. We propose a new approach of sample mixing…

    Submitted 20 July, 2022; originally announced July 2022.

    Comments: Accepted at ECCV 2022

  49. arXiv:2207.09763  [pdf, other]

    cs.CV cs.AI cs.LG

    GIPSO: Geometrically Informed Propagation for Online Adaptation in 3D LiDAR Segmentation

    Authors: Cristiano Saltori, Evgeny Krivosheev, Stéphane Lathuilière, Nicu Sebe, Fabio Galasso, Giuseppe Fiameni, Elisa Ricci, Fabio Poiesi

    Abstract: 3D point cloud semantic segmentation is fundamental for autonomous driving. Most approaches in the literature neglect an important aspect, i.e., how to deal with domain shift when handling dynamic scenes. This can significantly hinder the navigation capabilities of self-driving vehicles. This paper advances the state of the art in this research field. Our first contribution consists in analysing a…

    Submitted 20 July, 2022; originally announced July 2022.

    Comments: Accepted at ECCV 2022

  50. arXiv:2207.08605  [pdf, other]

    cs.CV

    Class-incremental Novel Class Discovery

    Authors: Subhankar Roy, Mingxuan Liu, Zhun Zhong, Nicu Sebe, Elisa Ricci

    Abstract: We study the new task of class-incremental Novel Class Discovery (class-iNCD), which refers to the problem of discovering novel categories in an unlabelled data set by leveraging a pre-trained model that has been trained on a labelled data set containing disjoint yet related categories. Apart from discovering novel classes, we also aim at preserving the ability of the model to recognize previously…

    Submitted 18 July, 2022; originally announced July 2022.

    Comments: ECCV 2022