-
Grounding Image Matching in 3D with MASt3R
Authors:
Vincent Leroy,
Yohann Cabon,
Jérôme Revaud
Abstract:
Image Matching is a core component of all best-performing algorithms and pipelines in 3D vision. Yet despite matching being fundamentally a 3D problem, intrinsically linked to camera pose and scene geometry, it is typically treated as a 2D problem. This makes sense as the goal of matching is to establish correspondences between 2D pixel fields, but also seems like a potentially hazardous choice. In this work, we take a different stance and propose to cast matching as a 3D task with DUSt3R, a recent and powerful 3D reconstruction framework based on Transformers. Based on pointmaps regression, this method displayed impressive robustness in matching views with extreme viewpoint changes, yet with limited accuracy. We aim here to improve the matching capabilities of such an approach while preserving its robustness. We thus propose to augment the DUSt3R network with a new head that outputs dense local features, trained with an additional matching loss. We further address the issue of quadratic complexity of dense matching, which becomes prohibitively slow for downstream applications if not carefully treated. We introduce a fast reciprocal matching scheme that not only accelerates matching by orders of magnitude, but also comes with theoretical guarantees and, lastly, yields improved results. Extensive experiments show that our approach, coined MASt3R, significantly outperforms the state of the art on multiple matching tasks. In particular, it beats the best published methods by 30% (absolute improvement) in VCRE AUC on the extremely challenging Map-free localization dataset.
Submitted 14 June, 2024;
originally announced June 2024.
-
DUSt3R: Geometric 3D Vision Made Easy
Authors:
Shuzhe Wang,
Vincent Leroy,
Yohann Cabon,
Boris Chidlovskii,
Jerome Revaud
Abstract:
Multi-view stereo reconstruction (MVS) in the wild requires first estimating the camera parameters, e.g. intrinsic and extrinsic parameters. These are usually tedious and cumbersome to obtain, yet they are mandatory to triangulate corresponding pixels in 3D space, which is the core of all best performing MVS algorithms. In this work, we take an opposite stance and introduce DUSt3R, a radically novel paradigm for Dense and Unconstrained Stereo 3D Reconstruction of arbitrary image collections, i.e. operating without prior information about camera calibration or viewpoint poses. We cast the pairwise reconstruction problem as a regression of pointmaps, relaxing the hard constraints of usual projective camera models. We show that this formulation smoothly unifies the monocular and binocular reconstruction cases. In the case where more than two images are provided, we further propose a simple yet effective global alignment strategy that expresses all pairwise pointmaps in a common reference frame. We base our network architecture on standard Transformer encoders and decoders, allowing us to leverage powerful pretrained models. Our formulation directly provides a 3D model of the scene as well as depth information, but interestingly, we can seamlessly recover from it pixel matches and relative and absolute camera poses. Exhaustive experiments on all these tasks showcase that the proposed DUSt3R can unify various 3D vision tasks and set new SoTAs on monocular/multi-view depth estimation as well as relative pose estimation. In summary, DUSt3R makes many geometric 3D vision tasks easy.
Submitted 21 December, 2023;
originally announced December 2023.
-
Cross-view and Cross-pose Completion for 3D Human Understanding
Authors:
Matthieu Armando,
Salma Galaaoui,
Fabien Baradel,
Thomas Lucas,
Vincent Leroy,
Romain Brégier,
Philippe Weinzaepfel,
Grégory Rogez
Abstract:
Human perception and understanding is a major domain of computer vision which, like many other vision subdomains recently, stands to gain from the use of large models pre-trained on large datasets. We hypothesize that the most common pre-training strategy of relying on general purpose, object-centric image datasets such as ImageNet, is limited by an important domain shift. On the other hand, collecting domain-specific ground truth such as 2D or 3D labels does not scale well. Therefore, we propose a pre-training approach based on self-supervised learning that works on human-centric data using only images. Our method uses pairs of images of humans: the first is partially masked and the model is trained to reconstruct the masked parts given the visible ones and a second image. It relies on both stereoscopic (cross-view) pairs, and temporal (cross-pose) pairs taken from videos, in order to learn priors about 3D as well as human motion. We pre-train a model for body-centric tasks and one for hand-centric tasks. With a generic transformer architecture, these models outperform existing self-supervised pre-training methods on a wide set of human-centric downstream tasks, and obtain state-of-the-art performance for instance when fine-tuning for model-based and model-free human mesh recovery.
Submitted 18 April, 2024; v1 submitted 15 November, 2023;
originally announced November 2023.
-
Win-Win: Training High-Resolution Vision Transformers from Two Windows
Authors:
Vincent Leroy,
Jerome Revaud,
Thomas Lucas,
Philippe Weinzaepfel
Abstract:
Transformers have become the standard in state-of-the-art vision architectures, achieving impressive performance on both image-level and dense pixelwise tasks. However, training vision transformers for high-resolution pixelwise tasks has a prohibitive cost. Typical solutions boil down to hierarchical architectures, fast and approximate attention, or training on low-resolution crops. This latter solution does not constrain architectural choices, but it leads to a clear performance drop when testing at resolutions significantly higher than that used for training, thus requiring ad-hoc and slow post-processing schemes. In this paper, we propose a novel strategy for efficient training and inference of high-resolution vision transformers. The key principle is to mask out most of the high-resolution inputs during training, keeping only N random windows. This allows the model to learn local interactions between tokens inside each window, and global interactions between tokens from different windows. As a result, the model can directly process the high-resolution input at test time without any special trick. We show that this strategy is effective when using relative positional embedding such as rotary embeddings. It is 4 times faster to train than a full-resolution network, and it is straightforward to use at test time compared to existing approaches. We apply this strategy to three dense prediction tasks with high-resolution data. First, we show on the task of semantic segmentation that a simple setting with 2 windows performs best, hence the name of our method: Win-Win. Second, we confirm this result on the task of monocular depth prediction. Third, we further extend it to the binocular task of optical flow, reaching state-of-the-art performance on the Spring benchmark that contains Full-HD images with an order of magnitude faster inference than the best competitor.
Submitted 22 March, 2024; v1 submitted 1 October, 2023;
originally announced October 2023.
-
SHOWMe: Benchmarking Object-agnostic Hand-Object 3D Reconstruction
Authors:
Anilkumar Swamy,
Vincent Leroy,
Philippe Weinzaepfel,
Fabien Baradel,
Salma Galaaoui,
Romain Bregier,
Matthieu Armando,
Jean-Sebastien Franco,
Gregory Rogez
Abstract:
Recent hand-object interaction datasets show limited real object variability and rely on fitting the MANO parametric model to obtain groundtruth hand shapes. To go beyond these limitations and spur further research, we introduce the SHOWMe dataset which consists of 96 videos, annotated with real and detailed hand-object 3D textured meshes. Following recent work, we consider a rigid hand-object scenario, in which the pose of the hand with respect to the object remains constant during the whole video sequence. This assumption allows us to register sub-millimetre-precise groundtruth 3D scans to the image sequences in SHOWMe. Although simpler, this hypothesis makes sense in terms of applications where the required accuracy and level of detail are important, e.g., object hand-over in human-robot collaboration, object scanning, or manipulation and contact point analysis. Importantly, the rigidity of the hand-object systems allows us to tackle video-based 3D reconstruction of unknown hand-held objects using a 2-stage pipeline consisting of a rigid registration step followed by a multi-view reconstruction (MVR) part. We carefully evaluate a set of non-trivial baselines for these two stages and show that it is possible to achieve promising object-agnostic 3D hand-object reconstructions employing an SfM toolbox or a hand pose estimator to recover the rigid transforms and off-the-shelf MVR algorithms. However, these methods remain sensitive to the initial camera pose estimates which might be imprecise due to lack of textures on the objects or heavy occlusions of the hands, leaving room for improvements in the reconstruction. Code and dataset are available at https://europe.naverlabs.com/research/showme
Submitted 19 September, 2023;
originally announced September 2023.
-
4DHumanOutfit: a multi-subject 4D dataset of human motion sequences in varying outfits exhibiting large displacements
Authors:
Matthieu Armando,
Laurence Boissieux,
Edmond Boyer,
Jean-Sebastien Franco,
Martin Humenberger,
Christophe Legras,
Vincent Leroy,
Mathieu Marsot,
Julien Pansiot,
Sergi Pujades,
Rim Rekik,
Gregory Rogez,
Anilkumar Swamy,
Stefanie Wuhrer
Abstract:
This work presents 4DHumanOutfit, a new dataset of densely sampled spatio-temporal 4D human motion data of different actors, outfits and motions. The dataset is designed to contain different actors wearing different outfits while performing different motions in each outfit. In this way, the dataset can be seen as a cube of data containing 4D motion sequences along 3 axes with identity, outfit and motion. This rich dataset has numerous potential applications for the processing and creation of digital humans, e.g. augmented reality, avatar creation and virtual try on. 4DHumanOutfit is released for research purposes at https://kinovis.inria.fr/4dhumanoutfit/. In addition to image data and 4D reconstructions, the dataset includes reference solutions for each axis. We present independent baselines along each axis that demonstrate the value of these reference solutions for evaluation tasks.
Submitted 12 June, 2023;
originally announced June 2023.
-
CroCo v2: Improved Cross-view Completion Pre-training for Stereo Matching and Optical Flow
Authors:
Philippe Weinzaepfel,
Thomas Lucas,
Vincent Leroy,
Yohann Cabon,
Vaibhav Arora,
Romain Brégier,
Gabriela Csurka,
Leonid Antsfeld,
Boris Chidlovskii,
Jérôme Revaud
Abstract:
Despite impressive performance for high-level downstream tasks, self-supervised pre-training methods have not yet fully delivered on dense geometric vision tasks such as stereo matching or optical flow. The application of self-supervised concepts, such as instance discrimination or masked image modeling, to geometric tasks is an active area of research. In this work, we build on the recent cross-view completion framework, a variation of masked image modeling that leverages a second view from the same scene which makes it well suited for binocular downstream tasks. The applicability of this concept has so far been limited in at least two ways: (a) by the difficulty of collecting real-world image pairs -- in practice only synthetic data have been used -- and (b) by the lack of generalization of vanilla transformers to dense downstream tasks for which relative position is more meaningful than absolute position. We explore three avenues of improvement. First, we introduce a method to collect suitable real-world image pairs at large scale. Second, we experiment with relative positional embeddings and show that they enable vision transformers to perform substantially better. Third, we scale up vision transformer based cross-completion architectures, which is made possible by the use of large amounts of data. With these improvements, we show for the first time that state-of-the-art results on stereo matching and optical flow can be reached without using any classical task-specific techniques like correlation volume, iterative estimation, image warping or multi-scale reasoning, thus paving the way towards universal vision models.
Submitted 18 August, 2023; v1 submitted 18 November, 2022;
originally announced November 2022.
-
CroCo: Self-Supervised Pre-training for 3D Vision Tasks by Cross-View Completion
Authors:
Philippe Weinzaepfel,
Vincent Leroy,
Thomas Lucas,
Romain Brégier,
Yohann Cabon,
Vaibhav Arora,
Leonid Antsfeld,
Boris Chidlovskii,
Gabriela Csurka,
Jérôme Revaud
Abstract:
Masked Image Modeling (MIM) has recently been established as a potent pre-training paradigm. A pretext task is constructed by masking patches in an input image, and this masked content is then predicted by a neural network using visible patches as sole input. This pre-training leads to state-of-the-art performance when finetuned for high-level semantic tasks, e.g. image classification and object detection. In this paper we instead seek to learn representations that transfer well to a wide variety of 3D vision and lower-level geometric downstream tasks, such as depth prediction or optical flow estimation. Inspired by MIM, we propose an unsupervised representation learning task trained from pairs of images showing the same scene from different viewpoints. More precisely, we propose the pretext task of cross-view completion where the first input image is partially masked, and this masked content has to be reconstructed from the visible content and the second image. In single-view MIM, the masked content often cannot be inferred precisely from the visible portion only, so the model learns to act as a prior influenced by high-level semantics. In contrast, this ambiguity can be resolved with cross-view completion from the second unmasked image, on the condition that the model is able to understand the spatial relationship between the two images. Our experiments show that our pretext task leads to significantly improved performance for monocular 3D vision downstream tasks such as depth estimation. In addition, our model can be directly applied to binocular downstream tasks like optical flow or relative camera pose estimation, for which we obtain competitive results without bells and whistles, i.e., using a generic architecture without any task-specific design.
Submitted 12 January, 2023; v1 submitted 19 October, 2022;
originally announced October 2022.
-
MonoNHR: Monocular Neural Human Renderer
Authors:
Hongsuk Choi,
Gyeongsik Moon,
Matthieu Armando,
Vincent Leroy,
Kyoung Mu Lee,
Gregory Rogez
Abstract:
Existing neural human rendering methods struggle with a single image input due to the lack of information in invisible areas and the depth ambiguity of pixels in visible areas. In this regard, we propose Monocular Neural Human Renderer (MonoNHR), a novel approach that renders robust free-viewpoint images of an arbitrary human given only a single image. MonoNHR is the first method that (i) renders human subjects never seen during training in a monocular setup, and (ii) is trained in a weakly-supervised manner without geometry supervision. First, we propose to disentangle 3D geometry and texture features and to condition the texture inference on the 3D geometry features. Second, we introduce a Mesh Inpainter module that inpaints the occluded parts exploiting human structural priors such as symmetry. Experiments on ZJU-MoCap, AIST, and HUMBI datasets show that our approach significantly outperforms the recent methods adapted to the monocular case.
Submitted 2 October, 2022;
originally announced October 2022.
-
Three-dimensional acoustic lensing with a bubbly diamond metamaterial
Authors:
Maxime Lanoy,
Fabrice Lemoult,
Geoffroy Lerosey,
Arnaud Tourin,
Valentin Leroy,
John H. Page
Abstract:
A sound wave travelling in water is scattered by a periodic assembly of air bubbles. The local structure matters even in the low frequency regime. If the bubbles are arranged in a face-centered cubic (fcc) lattice, a total band gap opens near the Minnaert resonance frequency. If they are arranged in the diamond structure, which one obtains by simply adding a second bubble to the unit cell, one finds an additional branch with a negative slope (optical branch). For a single specific frequency, the medium behaves as if its refractive index (relative to water) is exactly $n=-1$. We show that a slab of this material can be used to design a three-dimensional flat lens. We also report super-resolution focusing in the near field of the slab and illustrate its potential for imaging in three dimensions.
Submitted 3 February, 2021; v1 submitted 6 January, 2021;
originally announced January 2021.
-
SMPLy Benchmarking 3D Human Pose Estimation in the Wild
Authors:
Vincent Leroy,
Philippe Weinzaepfel,
Romain Brégier,
Hadrien Combaluzier,
Grégory Rogez
Abstract:
Predicting 3D human pose from images has seen great recent improvements. Novel approaches that can even predict both pose and shape from a single input image have been introduced, often relying on a parametric model of the human body such as SMPL. While qualitative results for such methods are often shown for images captured in-the-wild, a proper benchmark in such conditions is still missing, as it is cumbersome to obtain ground-truth 3D poses elsewhere than in a motion capture room. This paper presents a pipeline to easily produce and validate such a dataset with accurate ground-truth, with which we benchmark recent 3D human pose estimation methods in-the-wild. We make use of the recently introduced Mannequin Challenge dataset which contains in-the-wild videos of people frozen in action like statues and leverage the fact that people are static and the camera moving to accurately fit the SMPL model on the sequences. A total of 24,428 frames with registered body models are then selected from 567 scenes at almost no cost, using only online RGB videos. We benchmark state-of-the-art SMPL-based human pose estimation methods on this dataset. Our results highlight that challenges remain, in particular for difficult poses or for scenes where the persons are partially truncated or occluded.
Submitted 4 December, 2020;
originally announced December 2020.
-
DOPE: Distillation Of Part Experts for whole-body 3D pose estimation in the wild
Authors:
Philippe Weinzaepfel,
Romain Brégier,
Hadrien Combaluzier,
Vincent Leroy,
Grégory Rogez
Abstract:
We introduce DOPE, the first method to detect and estimate whole-body 3D human poses, including bodies, hands and faces, in the wild. Achieving this level of detail is key for a number of applications that require understanding the interactions of people with each other or with the environment. The main challenge is the lack of in-the-wild data with labeled whole-body 3D poses. In previous work, training data has been annotated or generated for simpler tasks focusing on bodies, hands or faces separately. In this work, we propose to take advantage of these datasets to train independent experts for each part, namely a body, a hand and a face expert, and distill their knowledge into a single deep network designed for whole-body 2D-3D pose detection. In practice, given a training image with partial or no annotation, each part expert detects its subset of keypoints in 2D and 3D and the resulting estimations are combined to obtain whole-body pseudo ground-truth poses. A distillation loss encourages the whole-body predictions to mimic the experts' outputs. Our results show that this approach significantly outperforms the same whole-body model trained without distillation while staying close to the performance of the experts. Importantly, DOPE is computationally less demanding than the ensemble of experts and can achieve real-time performance. Test code and models are available at https://europe.naverlabs.com/research/computer-vision/dope.
Submitted 21 August, 2020;
originally announced August 2020.
-
Robust Image Retrieval-based Visual Localization using Kapture
Authors:
Martin Humenberger,
Yohann Cabon,
Nicolas Guerin,
Julien Morat,
Vincent Leroy,
Jérôme Revaud,
Philippe Rerole,
Noé Pion,
Cesar de Souza,
Gabriela Csurka
Abstract:
Visual localization tackles the challenge of estimating the camera pose from images by using correspondence analysis between query images and a map. This task is computation- and data-intensive, which poses challenges for the thorough evaluation of methods on various datasets. However, in order to further advance the field, we claim that robust visual localization algorithms should be evaluated on multiple datasets covering a broad domain variety. To facilitate this, we introduce kapture, a new, flexible, unified data format and toolbox for visual localization and structure-from-motion (SFM). It enables easy usage of different datasets as well as efficient and reusable data processing. To demonstrate this, we present a versatile pipeline for visual localization that facilitates the use of different local and global features, 3D data (e.g. depth maps), non-vision sensor data (e.g. IMU, GPS, WiFi), and various processing algorithms. Using multiple configurations of the pipeline, we show the great versatility of kapture in our experiments. Furthermore, we evaluate our methods on eight public datasets where they rank top on all and first on many of them. To foster future research, we release code, models, and all datasets used in this paper in the kapture format open source under a permissive BSD license. github.com/naver/kapture, github.com/naver/kapture-localization
Submitted 7 January, 2022; v1 submitted 27 July, 2020;
originally announced July 2020.
-
Mutation++: MUlticomponent Thermodynamic And Transport properties for IONized gases in C++
Authors:
James B. Scoggins,
Vincent Leroy,
Georgios Bellas-Chatzigeorgis,
Bruno Dias,
Thierry E. Magin
Abstract:
The Mutation++ library provides accurate and efficient computation of physicochemical properties associated with partially ionized gases in various degrees of thermal nonequilibrium. With v1.0.0, users can compute thermodynamic and transport properties, multiphase linearly-constrained equilibria, chemical production rates, energy transfer rates, and gas-surface interactions. The framework is based on an object-oriented design in C++, allowing users to plug-and-play various models, algorithms, and data as necessary. Mutation++ is available open-source under the GNU Lesser General Public License v3.0.
Submitted 5 February, 2020;
originally announced February 2020.
-
Ultrasound transmission through monodisperse 2D microfoams
Authors:
Lorène Champougny,
Juliette Pierre,
Antoine Devulder,
Valentin Leroy,
Marie-Caroline Jullien
Abstract:
While the acoustic properties of solid foams have been abundantly characterized, sound propagation in liquid foams remains poorly understood. Recent studies have investigated the transmission of ultrasound through three-dimensional polydisperse liquid foams (Pierre et al., 2013, 2014, 2017). However, further progress requires characterizing the acoustic response of better controlled foam structures. In this work, we study experimentally the transmission of ultrasound through a single layer of monodisperse bubbles generated by microfluidics techniques. In such a material, we show that the sound velocity is only sensitive to the gas phase. Nevertheless, the structure of the liquid network has to be taken into account through a transfer parameter analogous to the one in a layer of porous material. Finally, we observe that the attenuation cannot be explained by thermal dissipation alone, but is compatible with viscous dissipation in the gas pores of the monolayer.
Submitted 18 January, 2019;
originally announced January 2019.
-
Using Longitudinal Targeted Maximum Likelihood Estimation in Complex Settings with Dynamic Interventions
Authors:
Michael Schomaker,
Miguel Angel Luque-Fernandez,
Valeriane Leroy,
Mary-Ann Davies
Abstract:
Longitudinal targeted maximum likelihood estimation (LTMLE) has very rarely been used to estimate dynamic treatment effects in the context of time-dependent confounding affected by prior treatment when faced with long follow-up times, multiple time-varying confounders, and complex associational relationships simultaneously. Reasons for this include the potential computational burden, technical challenges, restricted modeling options for long follow-up times, and limited practical guidance in the literature. However, LTMLE has desirable asymptotic properties, i.e. it is doubly robust, and can yield valid inference when used in conjunction with machine learning. We use a topical and sophisticated question from HIV treatment research to show that LTMLE can be used successfully in complex realistic settings and compare results to competing estimators. Our example illustrates the following practical challenges common to many epidemiological studies: 1) long follow-up time (30 months), 2) gradually declining sample size, 3) limited support for some intervention rules of interest, 4) a high-dimensional set of potential adjustment variables, increasing both the need and the challenge of integrating appropriate machine learning methods, and 5) consideration of collider bias. Our analyses, as well as simulations, shed new light on the application of LTMLE in complex and realistic settings: we show that (i) LTMLE can yield stable and good estimates, even when confronted with small samples and limited modeling options; (ii) machine learning utilized with a small set of simple learners (if more complex ones can't be fitted) can outperform a single, complex model, which is tailored to incorporate prior clinical knowledge; (iii) performance can vary considerably depending on interventions and their support in the data, and therefore critical quality checks should accompany every LTMLE analysis.
Submitted 4 March, 2021; v1 submitted 14 February, 2018;
originally announced February 2018.
-
Acoustic double negativity induced by position correlations within a disordered set of monopolar resonators
Authors:
Maxime Lanoy,
John H. Page,
Geoffroy Lerosey,
Fabrice Lemoult,
Arnaud Tourin,
Valentin Leroy
Abstract:
Using a Multiple Scattering Theory algorithm, we investigate numerically the transmission of ultrasonic waves through a disordered locally resonant metamaterial containing only monopolar resonators. By comparing the cases of a perfectly random medium with its pair correlated counterpart, we show that the introduction of short range correlation can substantially impact the effective parameters of the sample. We report, notably, the opening of an acoustic transparency window in the region of the hybridization band gap. Interestingly, the transparency window is found to be associated with negative values of both effective compressibility and density. Despite this feature being unexpected for a disordered medium of monopolar resonators, we show that it can be fully described analytically and that it gives rise to negative refraction of waves.
Submitted 11 October, 2017; v1 submitted 9 October, 2017;
originally announced October 2017.
-
An FFT approach to the analysis of dynamic properties of gas/liquid interfaces
Authors:
Sandrine Mariot,
Valentin Leroy,
Juliette Pierre,
Florence Elias,
Eloise Bouthemy,
Dominique Langevin,
Wiebke Drenckhan
Abstract:
The characterisation of the dynamic properties of viscoelastic monolayers of surfactants at the gas/liquid interface is very important in the analysis and prediction of foam stability. With most of the relevant dynamic processes being rapid (thermal fluctuation, film coalescence etc.), it is important to probe interfacial dynamics at high deformation rates. Today, only a few techniques allow this, one of them being the characterisation of the propagation of electrocapillary waves on the liquid surface. Traditionally, this technique has been applied in a continuous mode (i.e. at constant frequency) in order to ensure reliable accuracy. Here we explore the possibility of analysing the propagation of an excited pulse in order to access the interfacial properties in one single Fourier treatment over a wide range of frequencies. The main advantage of this approach is that the measurement times and the required liquid volumes can be reduced significantly. This occurs at the cost of precision in the measurement, due partly to the presence of a pronounced resonance of the liquid surface. The pulsed approach may therefore be used to prescan the surface response before a more in-depth scan at constant frequency, or to follow the changes of the interfacial properties during surfactant adsorption.
Submitted 21 October, 2016;
originally announced October 2016.
-
Ternary Neural Networks for Resource-Efficient AI Applications
Authors:
Hande Alemdar,
Vincent Leroy,
Adrien Prost-Boucle,
Frédéric Pétrot
Abstract:
The computation and storage requirements for Deep Neural Networks (DNNs) are usually high. This issue limits their deployability on ubiquitous computing devices such as smart phones, wearables and autonomous drones. In this paper, we propose ternary neural networks (TNNs) in order to make deep learning more resource-efficient. We train these TNNs using a teacher-student approach based on a novel, layer-wise greedy methodology. Thanks to our two-stage training procedure, the teacher network is still able to use state-of-the-art methods such as dropout and batch normalization to increase accuracy and reduce training time. Using only ternary weights and activations, the student ternary network learns to mimic the behavior of its teacher network without using any multiplication. Unlike its -1,1 binary counterparts, a ternary neural network inherently prunes the smaller weights by setting them to zero during training. This makes them sparser and thus more energy-efficient. We design a purpose-built hardware architecture for TNNs and implement it on FPGA and ASIC. We evaluate TNNs on several benchmark datasets and demonstrate up to 3.1x better energy efficiency with respect to the state of the art while also improving accuracy.
Submitted 26 February, 2017; v1 submitted 1 September, 2016;
originally announced September 2016.
-
Testing Interestingness Measures in Practice: A Large-Scale Analysis of Buying Patterns
Authors:
Martin Kirchgessner,
Vincent Leroy,
Sihem Amer-Yahia,
Shashwat Mishra
Abstract:
Understanding customer buying patterns is of great interest to the retail industry and has been shown to benefit a wide variety of goals ranging from managing stocks to implementing loyalty programs. Association rule mining is a common technique for extracting correlations such as "people in the South of France buy rosé wine" or "customers who buy paté also buy salted butter and sour bread." Unfortunately, sifting through a high number of buying patterns is not useful in practice, because of the predominance of popular products in the top rules. As a result, a number of "interestingness" measures (over 30) have been proposed to rank rules. However, there is no agreement on which measures are more appropriate for retail data. Moreover, since pattern mining algorithms output thousands of association rules for each product, the ability of an analyst to rely on ranking measures to identify the most interesting ones is crucial. In this paper, we develop CAPA (Comparative Analysis of PAtterns), a framework that provides analysts with the ability to compare the outcome of interestingness measures applied to buying patterns in the retail industry. We report on how we used CAPA to compare 34 measures applied to over 1,800 stores of Intermarché, one of the largest food retailers in France.
Submitted 15 March, 2016;
originally announced March 2016.
-
Time reversal sub-wavelength focusing in bubbly media
Authors:
Maxime Lanoy,
Romain Pierrat,
Fabrice Lemoult,
Mathias Fink,
Valentin Leroy,
Arnaud Tourin
Abstract:
Using a Multiple Scattering Theory algorithm, we present a way to focus energy at the deep subwavelength scale, from the far field, inside a cubic disordered bubble cloud by using broadband Time Reversal (TR). We show that the analytical calculation of an effective wavenumber using the Independent Scattering Approximation (ISA) matches the numerical results for the focal extension. Subwavelength focal spots of $\lambda/100$ are reported for simulations with perfect bubbles (no loss). A more realistic case, with viscous and thermal losses, allows us to obtain a $\lambda/14$ focal spot, with a low volume fraction of scatterers ($\phi = 0.01$). Bubbly materials could open new perspectives for acoustic actuation in the microfluidic context.
Submitted 23 October, 2015;
originally announced October 2015.
-
Manipulating bubbles with secondary Bjerknes forces
Authors:
Maxime Lanoy,
Caroline Derec,
Arnaud Tourin,
Valentin Leroy
Abstract:
Gas bubbles in a sound field are subjected to a radiative force, known as the secondary Bjerknes force. We propose an original experimental setup that allows us to investigate in detail this force between two bubbles, as a function of the sonication frequency, as well as the bubbles' radii and distance. We report the observation of both attractive and, more interestingly, repulsive Bjerknes forces, the latter when the two bubbles are driven in antiphase. Our experiments show the importance of taking multiple scattering into account, which leads to a strong acoustic coupling of the bubbles when their radii are similar. Our setup demonstrates the accuracy of secondary Bjerknes forces for attracting or repelling a bubble, and could lead to new acoustic tools for non-contact manipulation in microfluidic devices.
Submitted 23 October, 2015;
originally announced October 2015.
-
A technique for measuring velocity and attenuation of ultrasound in liquid foams
Authors:
Juliette Pierre,
F. Elias,
Valentin Leroy
Abstract:
We describe an experimental setup specifically designed for measuring the ultrasonic transmission through liquid foams, over a broad range of frequencies (60-600 kHz). The question of determining the ultrasonic properties of the foam (density, phase velocity and attenuation) from the transmission measurements is addressed. An inversion method is proposed, tested on synthetic data, and applied to a liquid foam at different times during the coarsening. The ultrasonic velocity and attenuation are found to be very sensitive to the foam bubble sizes, suggesting that a spectroscopy technique could be developed for liquid foams.
Submitted 19 October, 2012;
originally announced October 2012.
-
Enhanced and reduced transmission of acoustic waves with bubble meta-screens
Authors:
Alice Bretagne,
Arnaud Tourin,
V. Leroy
Abstract:
We present a class of sonic meta-screens for manipulating air-borne acoustic waves at ultrasonic or audible frequencies. Our screens consist of periodic arrangements of air bubbles in water or possibly embedded in a soft elastic matrix. They can be used for soundproofing, but also for enhancing transmission at an air/water interface or even to achieve enhanced absorption.
Submitted 5 November, 2011;
originally announced November 2011.
-
Influence of positional correlations on the propagation of waves in a complex medium with polydisperse resonant scatterers
Authors:
V. Leroy,
A. L. Strybulevych,
J. H. Page,
M. G. Scanlon
Abstract:
We present experimental results on a model system for studying wave propagation in a complex medium exhibiting low frequency resonances. These experiments enable us to investigate a fundamental question that is relevant for many materials, such as metamaterials, where low-frequency scattering resonances strongly influence the effective medium properties. This question concerns the effect of correlations in the positions of the scatterers on the coupling between their resonances, and hence on wave transport through the medium. To examine this question experimentally, we measure the effective medium wave number of acoustic waves in a sample made of bubbles embedded in an elastic matrix over a frequency range that includes the resonance frequency of the bubbles. The effective medium is highly dispersive, showing peaks in the attenuation and the phase velocity as functions of the frequency, which cannot be accurately described using the Independent Scattering Approximation (ISA). This discrepancy may be explained by the effects of the positional correlations of the scatterers, which we show to be dependent on the size of the scatterers. We propose a self-consistent approach for taking this "polydisperse correlation" into account and show that our model better describes the experimental results than the ISA.
Submitted 26 April, 2011; v1 submitted 29 October, 2010;
originally announced October 2010.
-
Thermal degradation of ligno-cellulosic fuels: DSC and TGA studies
Authors:
Valérie Leroy,
Dominique Cancellieri,
Eric Leoni
Abstract:
The scope of this work was to show the utility of thermal analysis and calorimetric experiments to study the thermal oxidative degradation of Mediterranean scrubs. We investigated the thermal degradation of four species; DSC and TGA were used under air sweeping to record oxidative reactions in dynamic conditions. Heat released and mass loss are important data to be measured for wildland fire modelling purposes and fire hazard studies on ligno-cellulosic fuels. Around 638 and 778 K, two dominating and overlapping exothermic peaks were recorded in DSC and individualized using an experimental and numerical separation. This stage allowed us to obtain the enthalpy variation of each exothermic phenomenon. As an application, we propose to classify the fuels according to the heat released and the rate constant of each reaction. TGA experiments under air showed two successive mass losses around 638 and 778 K. Both techniques are useful for measuring the ignitability, combustibility and sustainability of forest fuels.
Submitted 17 November, 2008;
originally announced November 2008.