Search | arXiv e-print repository

Weatherproofing Retrieval for Localization with Generative AI and Geometric Consistency

Authors: Yannis Kalantidis, Mert Bülent Sarıyıldız, Rafael S. Rezende, Philippe Weinzaepfel, Diane Larlus, Gabriela Csurka

Abstract: State-of-the-art visual localization approaches generally rely on a first image retrieval step whose role is crucial. Yet, retrieval often struggles when facing varying conditions, due to e.g. weather or time of day, with dramatic consequences on the visual localization accuracy. In this paper, we improve this retrieval step and tailor it to the final localization task. Among the several changes we advocate for, we propose to synthesize variants of the training set images, obtained from generative text-to-image models, in order to automatically expand the training set towards a number of nameable variations that particularly hurt visual localization. After expanding the training set, we propose a training approach that leverages the specificities and the underlying geometry of this mix of real and synthetic images. We experimentally show that those changes translate into large improvements for the most challenging visual localization datasets. Project page: https://europe.naverlabs.com/ret4loc

Submitted 14 February, 2024; originally announced February 2024.

Comments: Accepted at ICLR 2024. Project Page: https://europe.naverlabs.com/ret4loc

arXiv:2307.00814 [pdf, other]

Finite Element Modeling of Power Cables using Coordinate Transformations

Authors: Albert Piwonski, Julien Dular, Rodrigo Silva Rezende, Rolf Schuhmann

Abstract: Power cables have complex geometries in order to reduce their ac resistance. Although there are many different cable designs, most have in common that their inner conductors' cross-section is divided into several electrically insulated conductors, which are twisted over the cable's length (helicoidal symmetry). In previous works, we presented how to exploit this symmetry by means of dimensional reduction within the $\mathbf{H}-\varphi$ formulation of the eddy current problem. Here, the dimensional reduction is based on a coordinate transformation from the Cartesian coordinate system to a helicoidal coordinate system. This contribution focuses on how this approach can be incorporated into the magnetic vector potential based $\mathbf{A}-v$ formulation.

Submitted 3 July, 2023; originally announced July 2023.

Comments: arXiv admin note: text overlap with arXiv:2301.03370

arXiv:2304.01961 [pdf, other]

AToMiC: An Image/Text Retrieval Test Collection to Support Multimedia Content Creation

Authors: Jheng-Hong Yang, Carlos Lassance, Rafael Sampaio de Rezende, Krishna Srinivasan, Miriam Redi, Stéphane Clinchant, Jimmy Lin

Abstract: This paper presents the AToMiC (Authoring Tools for Multimedia Content) dataset, designed to advance research in image/text cross-modal retrieval. While vision-language pretrained transformers have led to significant improvements in retrieval effectiveness, existing research has relied on image-caption datasets that feature only simplistic image-text relationships and underspecified user models of retrieval tasks. To address the gap between these oversimplified settings and real-world applications for multimedia content creation, we introduce a new approach for building retrieval test collections. We leverage hierarchical structures and diverse domains of texts, styles, and types of images, as well as large-scale image-document associations embedded in Wikipedia. We formulate two tasks based on a realistic user model and validate our dataset through retrieval experiments using baseline models. AToMiC offers a testbed for scalable, diverse, and reproducible multimedia retrieval research. Finally, the dataset provides the basis for a dedicated track at the 2023 Text Retrieval Conference (TREC), and is publicly available at https://github.com/TREC-AToMiC/AToMiC.

Submitted 4 April, 2023; originally announced April 2023.

arXiv:2301.03370 [pdf, other]

doi 10.1109/TMAG.2022.3231054

2D Eddy Current Boundary Value Problems for Power Cables with Helicoidal Symmetry

Authors: Albert Piwonski, Julien Dular, Rodrigo Silva Rezende, Rolf Schuhmann

Abstract: Power cables have complex geometries in order to reduce their AC resistance. The cross-section of a cable consists of several conductors that are electrically insulated from each other to counteract the current displacement caused by the skin effect. Furthermore, the individual conductors are twisted over the cable's length. This geometry has a non-standard symmetry - a combination of translation and rotation. Exploiting this property allows formulating a dimensionally reduced boundary value problem. Dimension reduction is desirable, otherwise the electromagnetic modeling of these cables becomes impracticable due to tremendous computational efforts. We investigate 2D eddy current boundary value problems which still allow the analysis of 3D effects, such as the twisting of conductor layers.

Submitted 9 January, 2023; originally announced January 2023.

Comments: 4 pages, 6 figures

arXiv:2203.08101 [pdf, other]

ARTEMIS: Attention-based Retrieval with Text-Explicit Matching and Implicit Similarity

Authors: Ginger Delmas, Rafael Sampaio de Rezende, Gabriela Csurka, Diane Larlus

Abstract: An intuitive way to search for images is to use queries composed of an example image and a complementary text. While the first provides rich and implicit context for the search, the latter explicitly calls for new traits, or specifies how some elements of the example image should be changed to retrieve the desired target image. Current approaches typically combine the features of each of the two elements of the query into a single representation, which can then be compared to the ones of the potential target images. Our work aims at shedding new light on the task by looking at it through the prism of two familiar and related frameworks: text-to-image and image-to-image retrieval. Taking inspiration from them, we exploit the specific relation of each query element with the targeted image and derive light-weight attention mechanisms which enable to mediate between the two complementary modalities. We validate our approach on several retrieval benchmarks, querying with images and their associated free-form text modifiers. Our method obtains state-of-the-art results without resorting to side information, multi-level features, heavy pre-training nor large architectures as in previous works.

Submitted 16 May, 2022; v1 submitted 15 March, 2022; originally announced March 2022.

Comments: Published in ICLR 2022

arXiv:2112.11743 [pdf, other]

Simple and Effective Balance of Contrastive Losses

Authors: Arnaud Sors, Rafael Sampaio de Rezende, Sarah Ibrahimi, Jean-Marc Andreoli

Abstract: Contrastive losses have long been a key ingredient of deep metric learning and are now becoming more popular due to the success of self-supervised learning. Recent research has shown the benefit of decomposing such losses into two sub-losses which act in a complementary way when learning the representation network: a positive term and an entropy term. Although the overall loss is thus defined as a combination of two terms, the balance of these two terms is often hidden behind implementation details and is largely ignored and sub-optimal in practice. In this work, we approach the balance of contrastive losses as a hyper-parameter optimization problem, and propose a coordinate descent-based search method that efficiently finds the hyper-parameters that optimize evaluation performance. In the process, we extend existing balance analyses to the contrastive margin loss, include batch size in the balance, and explain how to aggregate loss elements from the batch to maintain near-optimal performance over a larger range of batch sizes. Extensive experiments with benchmarks from deep metric learning and self-supervised learning show that optimal hyper-parameters are found faster with our method than with other common search methods.

Submitted 22 December, 2021; originally announced December 2021.

Comments: 15 pages, 10 figures

arXiv:2112.10453 [pdf, other]

Learning with Label Noise for Image Retrieval by Selecting Interactions

Authors: Sarah Ibrahimi, Arnaud Sors, Rafael Sampaio de Rezende, Stéphane Clinchant

Abstract: Learning with noisy labels is an active research area for image classification. However, the effect of noisy labels on image retrieval has been less studied. In this work, we propose a noise-resistant method for image retrieval named Teacher-based Selection of Interactions, T-SINT, which identifies noisy interactions, i.e. elements in the distance matrix, and selects correct positive and negative interactions to be considered in the retrieval loss by using a teacher-based training setup which contributes to the stability. As a result, it consistently outperforms state-of-the-art methods on high noise rates across benchmark datasets with synthetic noise and more realistic noise.

Submitted 21 December, 2021; v1 submitted 20 December, 2021; originally announced December 2021.

Comments: Accepted at WACV 2022. 13 pages, 5 figures

arXiv:2101.05068 [pdf, other]

Probabilistic Embeddings for Cross-Modal Retrieval

Authors: Sanghyuk Chun, Seong Joon Oh, Rafael Sampaio de Rezende, Yannis Kalantidis, Diane Larlus

Abstract: Cross-modal retrieval methods build a common representation space for samples from multiple modalities, typically from the vision and the language domains. For images and their captions, the multiplicity of the correspondences makes the task particularly challenging. Given an image (respectively a caption), there are multiple captions (respectively images) that equally make sense. In this paper, we argue that deterministic functions are not sufficiently powerful to capture such one-to-many correspondences. Instead, we propose to use Probabilistic Cross-Modal Embedding (PCME), where samples from the different modalities are represented as probabilistic distributions in the common embedding space. Since common benchmarks such as COCO suffer from non-exhaustive annotations for cross-modal matches, we propose to additionally evaluate retrieval on the CUB dataset, a smaller yet clean database where all possible image-caption pairs are annotated. We extensively ablate PCME and demonstrate that it not only improves the retrieval performance over its deterministic counterpart but also provides uncertainty estimates that render the embeddings more interpretable. Code is available at https://github.com/naver-ai/pcme

Submitted 14 June, 2021; v1 submitted 13 January, 2021; originally announced January 2021.

Comments: Accepted to CVPR 2021; Code is available at https://github.com/naver-ai/pcme

arXiv:2012.04329 [pdf, other]

StacMR: Scene-Text Aware Cross-Modal Retrieval

Authors: Andrés Mafla, Rafael Sampaio de Rezende, Lluís Gómez, Diane Larlus, Dimosthenis Karatzas

Abstract: Recent models for cross-modal retrieval have benefited from an increasingly rich understanding of visual scenes, afforded by scene graphs and object interactions to mention a few. This has resulted in an improved matching between the visual representation of an image and the textual representation of its caption. Yet, current visual representations overlook a key aspect: the text appearing in images, which may contain crucial information for retrieval. In this paper, we first propose a new dataset that allows exploration of cross-modal retrieval where images contain scene-text instances. Then, armed with this dataset, we describe several approaches which leverage scene text, including a better scene-text aware cross-modal retrieval method which uses specialized representations for text from the captions and text from the visual scene, and reconcile them in a common embedding space. Extensive experiments confirm that cross-modal retrieval approaches benefit from scene text and highlight interesting research questions worth exploring further. Dataset and code are available at http://europe.naverlabs.com/stacmr

Submitted 8 December, 2020; originally announced December 2020.

arXiv:1906.07589 [pdf, other]

Learning with Average Precision: Training Image Retrieval with a Listwise Loss

Authors: Jerome Revaud, Jon Almazan, Rafael Sampaio de Rezende, Cesar Roberto de Souza

Abstract: Image retrieval can be formulated as a ranking problem where the goal is to order database images by decreasing similarity to the query. Recent deep models for image retrieval have outperformed traditional methods by leveraging ranking-tailored loss functions, but important theoretical and practical problems remain. First, rather than directly optimizing the global ranking, they minimize an upper-bound on the essential loss, which does not necessarily result in an optimal mean average precision (mAP). Second, these methods require significant engineering efforts to work well, e.g. special pre-training and hard-negative mining. In this paper we propose instead to directly optimize the global mAP by leveraging recent advances in listwise loss formulations. Using a histogram binning approximation, the AP can be differentiated and thus employed to end-to-end learning. Compared to existing losses, the proposed method considers thousands of images simultaneously at each iteration and eliminates the need for ad hoc tricks. It also establishes a new state of the art on many standard retrieval benchmarks. Models and evaluation scripts have been made available at https://europe.naverlabs.com/Deep-Image-Retrieval/

Submitted 18 June, 2019; originally announced June 2019.

arXiv:1711.10394 [pdf, other]

Exposing Computer Generated Images by Using Deep Convolutional Neural Networks

Authors: Edmar R. S. de Rezende, Guilherme C. S. Ruppert, Antonio Theophilo, Tiago Carvalho

Abstract: The recent computer graphics developments have upraised the quality of the generated digital content, astonishing the most skeptical viewer. Games and movies have taken advantage of this fact but, at the same time, these advances have brought serious negative impacts like the ones yielded by fake images produced with malicious intents. Digital artists can compose artificial images capable of deceiving the great majority of people, turning this into a very dangerous weapon in a timespan currently known as the "Fake News/Post-Truth" Era. In this work, we propose a new approach for dealing with the problem of detecting computer generated images, through the application of deep convolutional networks and transfer learning techniques. We start from Residual Networks and develop different models adapted to the binary problem of identifying whether an image was computer generated or not. Differently from the current state-of-the-art approaches, we don't rely on hand-crafted features, but provide to the model the raw pixel information, achieving the same 0.97 of state-of-the-art methods with two main advantages: our methods show more stable results (depicted by lower variance) and eliminate the laborious and manual step of specialized features extraction and selection.

Submitted 28 November, 2017; originally announced November 2017.

arXiv:1705.04043 [pdf, other]

SCNet: Learning Semantic Correspondence

Authors: Kai Han, Rafael S. Rezende, Bumsub Ham, Kwan-Yee K. Wong, Minsu Cho, Cordelia Schmid, Jean Ponce

Abstract: This paper addresses the problem of establishing semantic correspondences between images depicting different instances of the same object or scene category. Previous approaches focus on either combining a spatial regularizer with hand-crafted features, or learning a correspondence model for appearance only. We propose instead a convolutional neural network architecture, called SCNet, for learning a geometrically plausible model for semantic correspondence. SCNet uses region proposals as matching primitives, and explicitly incorporates geometric consistency in its loss function. It is trained on image pairs obtained from the PASCAL VOC 2007 keypoint dataset, and a comparative evaluation on several standard benchmarks demonstrates that the proposed approach substantially outperforms both recent deep learning architectures and previous methods based on hand-crafted features.

Submitted 17 August, 2017; v1 submitted 11 May, 2017; originally announced May 2017.

Comments: ICCV 2017

Showing 1–12 of 12 results for author: Rezende, R S