
Showing 1–10 of 10 results for author: Falcon, A

Searching in archive cs.
  1. arXiv:2312.14630  [pdf, other]

    cs.CV

    A Language-based solution to enable Metaverse Retrieval

    Authors: Ali Abdari, Alex Falcon, Giuseppe Serra

    Abstract: Recently, the Metaverse is becoming increasingly attractive, with millions of users accessing the many available virtual worlds. However, how do users find the one Metaverse which best fits their current interests? So far, the search process is mostly done by word of mouth, or by advertisement on technology-oriented websites. However, the lack of search engines similar to those available for other…

    Submitted 22 December, 2023; originally announced December 2023.

    Comments: Accepted at the 30th International Conference on Multimedia Modeling (MMM 2024)

  2. arXiv:2309.03100  [pdf, other]

    cs.CV cs.MM

    FArMARe: a Furniture-Aware Multi-task methodology for Recommending Apartments based on the user interests

    Authors: Ali Abdari, Alex Falcon, Giuseppe Serra

    Abstract: Nowadays, many people frequently have to search for new accommodation options. Searching for a suitable apartment is a time-consuming process, especially because visiting them is often mandatory to assess the truthfulness of the advertisements found on the Web. While this process could be alleviated by visiting the apartments in the metaverse, the Web-based recommendation platforms are not suitabl…

    Submitted 6 September, 2023; originally announced September 2023.

    Comments: Accepted for presentation at the ICCV 2023 CV4Metaverse workshop

  3. arXiv:2306.15445  [pdf, ps, other]

    cs.CV

    UniUD Submission to the EPIC-Kitchens-100 Multi-Instance Retrieval Challenge 2023

    Authors: Alex Falcon, Giuseppe Serra

    Abstract: In this report, we present the technical details of our submission to the EPIC-Kitchens-100 Multi-Instance Retrieval Challenge 2023. To participate in the challenge, we ensembled two models trained with two different loss functions on 25% of the training data. Our submission, visible on the public leaderboard, obtains an average score of 56.81% nDCG and 42.63% mAP.

    Submitted 16 July, 2023; v1 submitted 27 June, 2023; originally announced June 2023.

  4. A Feature-space Multimodal Data Augmentation Technique for Text-video Retrieval

    Authors: Alex Falcon, Giuseppe Serra, Oswald Lanz

    Abstract: Every hour, huge amounts of visual contents are posted on social media and user-generated content platforms. To find relevant videos by means of a natural language query, text-video retrieval methods have received increased attention over the past few years. Data augmentation techniques were introduced to increase the performance on unseen test examples by creating new training samples with the ap…

    Submitted 3 August, 2022; originally announced August 2022.

    Comments: Accepted for presentation at 30th ACM International Conference on Multimedia (ACM MM)

  5. arXiv:2206.10903  [pdf, ps, other]

    cs.CV

    UniUD-FBK-UB-UniBZ Submission to the EPIC-Kitchens-100 Multi-Instance Retrieval Challenge 2022

    Authors: Alex Falcon, Giuseppe Serra, Sergio Escalera, Oswald Lanz

    Abstract: This report presents the technical details of our submission to the EPIC-Kitchens-100 Multi-Instance Retrieval Challenge 2022. To participate in the challenge, we designed an ensemble consisting of different models trained with two recently developed relevance-augmented versions of the widely used triplet loss. Our submission, visible on the public leaderboard, obtains an average score of 61.02% n…

    Submitted 22 June, 2022; originally announced June 2022.

    Comments: Ranked joint 1st place in the Multi-Instance Action Retrieval Challenge organized at EPIC@CVPR2022

  6. Relevance-based Margin for Contrastively-trained Video Retrieval Models

    Authors: Alex Falcon, Swathikiran Sudhakaran, Giuseppe Serra, Sergio Escalera, Oswald Lanz

    Abstract: Video retrieval using natural language queries has attracted increasing interest due to its relevance in real-world applications, from intelligent access in private media galleries to web-scale video search. Learning the cross-similarity of video and text in a joint embedding space is the dominant approach. To do so, a contrastive loss is usually employed because it organizes the embedding space b…

    Submitted 27 April, 2022; originally announced April 2022.

    Comments: Accepted for presentation at International Conference on Multimedia Retrieval (ICMR '22)

  7. arXiv:2203.08688  [pdf, other]

    cs.CV

    Learning video retrieval models with relevance-aware online mining

    Authors: Alex Falcon, Giuseppe Serra, Oswald Lanz

    Abstract: Due to the amount of videos and related captions uploaded every hour, deep learning-based solutions for cross-modal video retrieval are attracting more and more attention. A typical approach consists in learning a joint text-video embedding space, where the similarity of a video and its associated caption is maximized, whereas a lower similarity is enforced with all the other captions, called nega…

    Submitted 16 March, 2022; originally announced March 2022.

    Comments: Accepted at 21st International Conference on Image Analysis and Processing (ICIAP 2021)

  8. arXiv:2110.02902  [pdf, ps, other]

    cs.CV

    SAIC_Cambridge-HuPBA-FBK Submission to the EPIC-Kitchens-100 Action Recognition Challenge 2021

    Authors: Swathikiran Sudhakaran, Adrian Bulat, Juan-Manuel Perez-Rua, Alex Falcon, Sergio Escalera, Oswald Lanz, Brais Martinez, Georgios Tzimiropoulos

    Abstract: This report presents the technical details of our submission to the EPIC-Kitchens-100 Action Recognition Challenge 2021. To participate in the challenge we deployed spatio-temporal feature extraction and aggregation models we have developed recently: GSF and XViT. GSF is an efficient spatio-temporal feature extracting module that can be plugged into 2D CNNs for video action recognition. XViT is a…

    Submitted 6 October, 2021; originally announced October 2021.

    Comments: Ranked third in the EPIC-Kitchens-100 Action Recognition Challenge @ CVPR 2021

  9. arXiv:2008.09849  [pdf, other]

    cs.CV

    Data augmentation techniques for the Video Question Answering task

    Authors: Alex Falcon, Oswald Lanz, Giuseppe Serra

    Abstract: Video Question Answering (VideoQA) is a task that requires a model to analyze and understand both the visual content given by the input video and the textual part given by the question, and the interaction between them in order to produce a meaningful answer. In our work we focus on the Egocentric VideoQA task, which exploits first-person videos, because of the importance of such task which can ha…

    Submitted 22 August, 2020; originally announced August 2020.

    Comments: 16 pages, 5 figures; to be published in Egocentric Perception, Interaction and Computing (EPIC) Workshop Proceedings, at ECCV 2020

  10. arXiv:1910.04056  [pdf, other]

    cs.LG cs.CL stat.ML

    Text-to-Image Synthesis Based on Machine Generated Captions

    Authors: Marco Menardi, Alex Falcon, Saida S. Mohamed, Lorenzo Seidenari, Giuseppe Serra, Alberto Del Bimbo, Carlo Tasso

    Abstract: Text to Image Synthesis refers to the process of automatic generation of a photo-realistic image starting from a given text and is revolutionizing many real-world applications. In order to perform such process it is necessary to exploit datasets containing captioned images, meaning that each image is associated with one (or more) captions describing it. Despite the abundance of uncaptioned images…

    Submitted 9 October, 2019; originally announced October 2019.