Search | arXiv e-print repository

TransCAD: A Hierarchical Transformer for CAD Sequence Inference from Point Clouds

Authors: Elona Dupont, Kseniya Cherenkova, Dimitrios Mallis, Gleb Gusev, Anis Kacem, Djamila Aouada

Abstract: 3D reverse engineering, in which a CAD model is inferred given a 3D scan of a physical object, is a research direction that offers many promising practical applications. This paper proposes TransCAD, an end-to-end transformer-based architecture that predicts the CAD sequence from a point cloud. TransCAD leverages the structure of CAD sequences by using a hierarchical learning strategy. A loop refi… ▽ More 3D reverse engineering, in which a CAD model is inferred given a 3D scan of a physical object, is a research direction that offers many promising practical applications. This paper proposes TransCAD, an end-to-end transformer-based architecture that predicts the CAD sequence from a point cloud. TransCAD leverages the structure of CAD sequences by using a hierarchical learning strategy. A loop refiner is also introduced to regress sketch primitive parameters. Rigorous experimentation on the DeepCAD and Fusion360 datasets show that TransCAD achieves state-of-the-art results. The result analysis is supported with a proposed metric for CAD sequence, the mean Average Precision of CAD Sequence, that addresses the limitations of existing metrics. △ Less

Submitted 18 July, 2024; v1 submitted 17 July, 2024; originally announced July 2024.

arXiv:2404.07168 [pdf, other]

Using Neural Networks to Model Hysteretic Kinematics in Tendon-Actuated Continuum Robots

Authors: Yuan Wang, Max McCandless, Abdulhamit Donder, Giovanni Pittiglio, Behnam Moradkhani, Yash Chitalia, Pierre E. Dupont

Abstract: The ability to accurately model mechanical hysteretic behavior in tendon-actuated continuum robots using deep learning approaches is a growing area of interest. In this paper, we investigate the hysteretic response of two types of tendon-actuated continuum robots and, ultimately, compare three types of neural network modeling approaches with both forward and inverse kinematic mappings: feedforward… ▽ More The ability to accurately model mechanical hysteretic behavior in tendon-actuated continuum robots using deep learning approaches is a growing area of interest. In this paper, we investigate the hysteretic response of two types of tendon-actuated continuum robots and, ultimately, compare three types of neural network modeling approaches with both forward and inverse kinematic mappings: feedforward neural network (FNN), FNN with a history input buffer, and long short-term memory (LSTM) network. We seek to determine which model best captures temporal dependent behavior. We find that, depending on the robot's design, choosing different kinematic inputs can alter whether hysteresis is exhibited by the system. Furthermore, we present the results of the model fittings, revealing that, in contrast to the standard FNN, both FNN with a history input buffer and the LSTM model exhibit the capacity to model historical dependence with comparable performance in capturing rate-dependent hysteresis. △ Less

Submitted 10 April, 2024; originally announced April 2024.

Comments: 7 pages, 8 figures, conference

arXiv:2402.17678 [pdf, other]

CAD-SIGNet: CAD Language Inference from Point Clouds using Layer-wise Sketch Instance Guided Attention

Authors: Mohammad Sadil Khan, Elona Dupont, Sk Aziz Ali, Kseniya Cherenkova, Anis Kacem, Djamila Aouada

Abstract: Reverse engineering in the realm of Computer-Aided Design (CAD) has been a longstanding aspiration, though not yet entirely realized. Its primary aim is to uncover the CAD process behind a physical object given its 3D scan. We propose CAD-SIGNet, an end-to-end trainable and auto-regressive architecture to recover the design history of a CAD model represented as a sequence of sketch-and-extrusion f… ▽ More Reverse engineering in the realm of Computer-Aided Design (CAD) has been a longstanding aspiration, though not yet entirely realized. Its primary aim is to uncover the CAD process behind a physical object given its 3D scan. We propose CAD-SIGNet, an end-to-end trainable and auto-regressive architecture to recover the design history of a CAD model represented as a sequence of sketch-and-extrusion from an input point cloud. Our model learns visual-language representations by layer-wise cross-attention between point cloud and CAD language embedding. In particular, a new Sketch instance Guided Attention (SGA) module is proposed in order to reconstruct the fine-grained details of the sketches. Thanks to its auto-regressive nature, CAD-SIGNet not only reconstructs a unique full design history of the corresponding CAD model given an input point cloud but also provides multiple plausible design choices. This allows for an interactive reverse engineering scenario by providing designers with multiple next-step choices along with the design process. Extensive experiments on publicly available CAD datasets showcase the effectiveness of our approach against existing baseline models in two settings, namely, full design history recovery and conditional auto-completion from point clouds. △ Less

Submitted 27 February, 2024; originally announced February 2024.

arXiv:2401.17161 [pdf, other]

doi 10.1109/IROS55552.2023.10341686

Hybrid Tendon and Ball Chain Continuum Robots for Enhanced Dexterity in Medical Interventions

Authors: Giovanni Pittiglio, Margherita Mencattelli, Abdulhamit Donder, Yash Chitalia, Pierre E. Dupont

Abstract: A hybrid continuum robot design is introduced that combines a proximal tendon-actuated section with a distal telescoping section comprised of permanent-magnet spheres actuated using an external magnet. While, individually, each section can approach a point in its workspace from one or at most several orientations, the two-section combination possesses a dexterous workspace. The paper describes kin… ▽ More A hybrid continuum robot design is introduced that combines a proximal tendon-actuated section with a distal telescoping section comprised of permanent-magnet spheres actuated using an external magnet. While, individually, each section can approach a point in its workspace from one or at most several orientations, the two-section combination possesses a dexterous workspace. The paper describes kinematic modeling of the hybrid design and provides a description of the dexterous workspace. We present experimental validation which shows that a simplified kinematic model produces tip position mean and maximum errors of 3% and 7% of total robot length, respectively. △ Less

Submitted 30 January, 2024; originally announced January 2024.

Journal ref: Medical Interventions," 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Detroit, MI, USA, 2023, pp. 8461-8466

arXiv:2312.02753 [pdf, other]

C3: High-performance and low-complexity neural compression from a single image or video

Authors: Hyunjik Kim, Matthias Bauer, Lucas Theis, Jonathan Richard Schwarz, Emilien Dupont

Abstract: Most neural compression models are trained on large datasets of images or videos in order to generalize to unseen data. Such generalization typically requires large and expressive architectures with a high decoding complexity. Here we introduce C3, a neural compression method with strong rate-distortion (RD) performance that instead overfits a small model to each image or video separately. The res… ▽ More Most neural compression models are trained on large datasets of images or videos in order to generalize to unseen data. Such generalization typically requires large and expressive architectures with a high decoding complexity. Here we introduce C3, a neural compression method with strong rate-distortion (RD) performance that instead overfits a small model to each image or video separately. The resulting decoding complexity of C3 can be an order of magnitude lower than neural baselines with similar RD performance. C3 builds on COOL-CHIC (Ladune et al.) and makes several simple and effective improvements for images. We further develop new methodology to apply C3 to videos. On the CLIC2020 image benchmark, we match the RD performance of VTM, the reference implementation of the H.266 codec, with less than 3k MACs/pixel for decoding. On the UVG video benchmark, we match the RD performance of the Video Compression Transformer (Mentzer et al.), a well-established neural video codec, with less than 5k MACs/pixel for decoding. △ Less

Submitted 5 December, 2023; originally announced December 2023.

arXiv:2308.15966 [pdf, other]

SHARP Challenge 2023: Solving CAD History and pArameters Recovery from Point clouds and 3D scans. Overview, Datasets, Metrics, and Baselines

Authors: Dimitrios Mallis, Sk Aziz Ali, Elona Dupont, Kseniya Cherenkova, Ahmet Serdar Karadeniz, Mohammad Sadil Khan, Anis Kacem, Gleb Gusev, Djamila Aouada

Abstract: Recent breakthroughs in geometric Deep Learning (DL) and the availability of large Computer-Aided Design (CAD) datasets have advanced the research on learning CAD modeling processes and relating them to real objects. In this context, 3D reverse engineering of CAD models from 3D scans is considered to be one of the most sought-after goals for the CAD industry. However, recent efforts assume multipl… ▽ More Recent breakthroughs in geometric Deep Learning (DL) and the availability of large Computer-Aided Design (CAD) datasets have advanced the research on learning CAD modeling processes and relating them to real objects. In this context, 3D reverse engineering of CAD models from 3D scans is considered to be one of the most sought-after goals for the CAD industry. However, recent efforts assume multiple simplifications limiting the applications in real-world settings. The SHARP Challenge 2023 aims at pushing the research a step closer to the real-world scenario of CAD reverse engineering through dedicated datasets and tracks. In this paper, we define the proposed SHARP 2023 tracks, describe the provided datasets, and propose a set of baseline methods along with suitable evaluation metrics to assess the performance of the track solutions. All proposed datasets along with useful routines and the evaluation metrics are publicly available. △ Less

Submitted 30 August, 2023; originally announced August 2023.

arXiv:2305.15574 [pdf, other]

Deep Stochastic Processes via Functional Markov Transition Operators

Authors: Jin Xu, Emilien Dupont, Kaspar Märtens, Tom Rainforth, Yee Whye Teh

Abstract: We introduce Markov Neural Processes (MNPs), a new class of Stochastic Processes (SPs) which are constructed by stacking sequences of neural parameterised Markov transition operators in function space. We prove that these Markov transition operators can preserve the exchangeability and consistency of SPs. Therefore, the proposed iterative construction adds substantial flexibility and expressivity… ▽ More We introduce Markov Neural Processes (MNPs), a new class of Stochastic Processes (SPs) which are constructed by stacking sequences of neural parameterised Markov transition operators in function space. We prove that these Markov transition operators can preserve the exchangeability and consistency of SPs. Therefore, the proposed iterative construction adds substantial flexibility and expressivity to the original framework of Neural Processes (NPs) without compromising consistency or adding restrictions. Our experiments demonstrate clear advantages of MNPs over baseline models on a variety of tasks. △ Less

Submitted 24 May, 2023; originally announced May 2023.

Comments: 18 pages, 5 figures

arXiv:2304.06531 [pdf, other]

SepicNet: Sharp Edges Recovery by Parametric Inference of Curves in 3D Shapes

Authors: Kseniya Cherenkova, Elona Dupont, Anis Kacem, Ilya Arzhannikov, Gleb Gusev, Djamila Aouada

Abstract: 3D scanning as a technique to digitize objects in reality and create their 3D models, is used in many fields and areas. Though the quality of 3D scans depends on the technical characteristics of the 3D scanner, the common drawback is the smoothing of fine details, or the edges of an object. We introduce SepicNet, a novel deep network for the detection and parametrization of sharp edges in 3D shape… ▽ More 3D scanning as a technique to digitize objects in reality and create their 3D models, is used in many fields and areas. Though the quality of 3D scans depends on the technical characteristics of the 3D scanner, the common drawback is the smoothing of fine details, or the edges of an object. We introduce SepicNet, a novel deep network for the detection and parametrization of sharp edges in 3D shapes as primitive curves. To make the network end-to-end trainable, we formulate the curve fitting in a differentiable manner. We develop an adaptive point cloud sampling technique that captures the sharp features better than uniform sampling. The experiments were conducted on a newly introduced large-scale dataset of 50k 3D scans, where the sharp edge annotations were extracted from their parametric CAD models, and demonstrate significant improvement over state-of-the-art methods. △ Less

Submitted 13 April, 2023; originally announced April 2023.

arXiv:2302.03728 [pdf, other]

doi 10.1109/ICRA48891.2023.10160695.

Magnetic Ball Chain Robots for Endoluminal Interventions

Authors: Giovanni Pittiglio, Margherita Mencattelli, Pierre E. Dupont

Abstract: This paper introduces a novel class of hyperredundant robots comprised of chains of permanently magnetized spheres enclosed in a cylindrical polymer skin. With their shape controlled using an externally-applied magnetic field, the spherical joints of these robots enable them to bend to very small radii of curvature. These robots can be used as steerable tips for endoluminal instruments. A kinemati… ▽ More This paper introduces a novel class of hyperredundant robots comprised of chains of permanently magnetized spheres enclosed in a cylindrical polymer skin. With their shape controlled using an externally-applied magnetic field, the spherical joints of these robots enable them to bend to very small radii of curvature. These robots can be used as steerable tips for endoluminal instruments. A kinematic model is derived based on minimizing magnetic and elastic potential energy. Simulation is used to demonstrate the enhanced steerability of these robots in comparison to magnetic soft continuum robots designed using either distributed or lumped magnetic material. Experiments are included to validate the model and to demonstrate the steering capability of ball chain robots in bifurcating channels. △ Less

Submitted 7 February, 2023; originally announced February 2023.

arXiv:2302.03130 [pdf, other]

Spatial Functa: Scaling Functa to ImageNet Classification and Generation

Authors: Matthias Bauer, Emilien Dupont, Andy Brock, Dan Rosenbaum, Jonathan Richard Schwarz, Hyunjik Kim

Abstract: Neural fields, also known as implicit neural representations, have emerged as a powerful means to represent complex signals of various modalities. Based on this Dupont et al. (2022) introduce a framework that views neural fields as data, termed *functa*, and proposes to do deep learning directly on this dataset of neural fields. In this work, we show that the proposed framework faces limitations w… ▽ More Neural fields, also known as implicit neural representations, have emerged as a powerful means to represent complex signals of various modalities. Based on this Dupont et al. (2022) introduce a framework that views neural fields as data, termed *functa*, and proposes to do deep learning directly on this dataset of neural fields. In this work, we show that the proposed framework faces limitations when scaling up to even moderately complex datasets such as CIFAR-10. We then propose *spatial functa*, which overcome these limitations by using spatially arranged latent representations of neural fields, thereby allowing us to scale up the approach to ImageNet-1k at 256x256 resolution. We demonstrate competitive performance to Vision Transformers (Steiner et al., 2022) on classification and Latent Diffusion (Rombach et al., 2022) on image generation respectively. △ Less

Submitted 9 February, 2023; v1 submitted 6 February, 2023; originally announced February 2023.

arXiv:2208.10555 [pdf, other]

CADOps-Net: Jointly Learning CAD Operation Types and Steps from Boundary-Representations

Authors: Elona Dupont, Kseniya Cherenkova, Anis Kacem, Sk Aziz Ali, Ilya Arzhannikov, Gleb Gusev, Djamila Aouada

Abstract: 3D reverse engineering is a long sought-after, yet not completely achieved goal in the Computer-Aided Design (CAD) industry. The objective is to recover the construction history of a CAD model. Starting from a Boundary Representation (B-Rep) of a CAD model, this paper proposes a new deep neural network, CADOps-Net, that jointly learns the CAD operation types and the decomposition into different CA… ▽ More 3D reverse engineering is a long sought-after, yet not completely achieved goal in the Computer-Aided Design (CAD) industry. The objective is to recover the construction history of a CAD model. Starting from a Boundary Representation (B-Rep) of a CAD model, this paper proposes a new deep neural network, CADOps-Net, that jointly learns the CAD operation types and the decomposition into different CAD operation steps. This joint learning allows to divide a B-Rep into parts that were created by various types of CAD operations at the same construction step; therefore providing relevant information for further recovery of the design history. Furthermore, we propose the novel CC3D-Ops dataset that includes over $37k$ CAD models annotated with CAD operation type labels and step labels. Compared to existing datasets, the complexity and variety of CC3D-Ops models are closer to those used for industrial purposes. Our experiments, conducted on the proposed CC3D-Ops and the publicly available Fusion360 datasets, demonstrate the competitive performance of CADOps-Net with respect to state-of-the-art, and confirm the importance of the joint learning of CAD operation types and steps. △ Less

Submitted 22 August, 2022; originally announced August 2022.

arXiv:2208.08768 [pdf, other]

TSCom-Net: Coarse-to-Fine 3D Textured Shape Completion Network

Authors: Ahmet Serdar Karadeniz, Sk Aziz Ali, Anis Kacem, Elona Dupont, Djamila Aouada

Abstract: Reconstructing 3D human body shapes from 3D partial textured scans remains a fundamental task for many computer vision and graphics applications -- e.g., body animation, and virtual dressing. We propose a new neural network architecture for 3D body shape and high-resolution texture completion -- BCom-Net -- that can reconstruct the full geometry from mid-level to high-level partial input scans. We… ▽ More Reconstructing 3D human body shapes from 3D partial textured scans remains a fundamental task for many computer vision and graphics applications -- e.g., body animation, and virtual dressing. We propose a new neural network architecture for 3D body shape and high-resolution texture completion -- BCom-Net -- that can reconstruct the full geometry from mid-level to high-level partial input scans. We decompose the overall reconstruction task into two stages - first, a joint implicit learning network (SCom-Net and TCom-Net) that takes a voxelized scan and its occupancy grid as input to reconstruct the full body shape and predict vertex textures. Second, a high-resolution texture completion network, that utilizes the predicted coarse vertex textures to inpaint the missing parts of the partial 'texture atlas'. A thorough experimental evaluation on 3DBodyTex.V2 dataset shows that our method achieves competitive results with respect to the state-of-the-art while generalizing to different types and levels of partial shapes. The proposed method has also ranked second in the track1 of SHApe Recovery from Partial textured 3D scans (SHARP [38,1]) 2022 challenge1. △ Less

Submitted 22 August, 2022; v1 submitted 18 August, 2022; originally announced August 2022.

Comments: Accepted in European Conference on Computer Vision Workshop (ECCVW) 2022

arXiv:2201.12904 [pdf, other]

COIN++: Neural Compression Across Modalities

Authors: Emilien Dupont, Hrushikesh Loya, Milad Alizadeh, Adam Goliński, Yee Whye Teh, Arnaud Doucet

Abstract: Neural compression algorithms are typically based on autoencoders that require specialized encoder and decoder architectures for different data modalities. In this paper, we propose COIN++, a neural compression framework that seamlessly handles a wide range of data modalities. Our approach is based on converting data to implicit neural representations, i.e. neural functions that map coordinates (s… ▽ More Neural compression algorithms are typically based on autoencoders that require specialized encoder and decoder architectures for different data modalities. In this paper, we propose COIN++, a neural compression framework that seamlessly handles a wide range of data modalities. Our approach is based on converting data to implicit neural representations, i.e. neural functions that map coordinates (such as pixel locations) to features (such as RGB values). Then, instead of storing the weights of the implicit neural representation directly, we store modulations applied to a meta-learned base network as a compressed code for the data. We further quantize and entropy code these modulations, leading to large compression gains while reducing encoding time by two orders of magnitude compared to baselines. We empirically demonstrate the feasibility of our method by compressing various data modalities, from images and audio to medical and climate data. △ Less

Submitted 8 December, 2022; v1 submitted 30 January, 2022; originally announced January 2022.

Comments: TMLR camera ready

arXiv:2201.12204 [pdf, other]

From data to functa: Your data point is a function and you can treat it like one

Authors: Emilien Dupont, Hyunjik Kim, S. M. Ali Eslami, Danilo Rezende, Dan Rosenbaum

Abstract: It is common practice in deep learning to represent a measurement of the world on a discrete grid, e.g. a 2D grid of pixels. However, the underlying signal represented by these measurements is often continuous, e.g. the scene depicted in an image. A powerful continuous alternative is then to represent these measurements using an implicit neural representation, a neural function trained to output t… ▽ More It is common practice in deep learning to represent a measurement of the world on a discrete grid, e.g. a 2D grid of pixels. However, the underlying signal represented by these measurements is often continuous, e.g. the scene depicted in an image. A powerful continuous alternative is then to represent these measurements using an implicit neural representation, a neural function trained to output the appropriate measurement value for any input spatial location. In this paper, we take this idea to its next level: what would it take to perform deep learning on these functions instead, treating them as data? In this context we refer to the data as functa, and propose a framework for deep learning on functa. This view presents a number of challenges around efficient conversion from data to functa, compact representation of functa, and effectively solving downstream tasks on functa. We outline a recipe to overcome these challenges and apply it to a wide range of data modalities including images, 3D shapes, neural radiance fields (NeRF) and data on manifolds. We demonstrate that this approach has various compelling properties across data modalities, in particular on the canonical tasks of generative modeling, data imputation, novel view synthesis and classification. Code: https://github.com/deepmind/functa △ Less

Submitted 10 November, 2022; v1 submitted 28 January, 2022; originally announced January 2022.

arXiv:2103.03123 [pdf, other]

COIN: COmpression with Implicit Neural representations

Authors: Emilien Dupont, Adam Goliński, Milad Alizadeh, Yee Whye Teh, Arnaud Doucet

Abstract: We propose a new simple approach for image compression: instead of storing the RGB values for each pixel of an image, we store the weights of a neural network overfitted to the image. Specifically, to encode an image, we fit it with an MLP which maps pixel locations to RGB values. We then quantize and store the weights of this MLP as a code for the image. To decode the image, we simply evaluate th… ▽ More We propose a new simple approach for image compression: instead of storing the RGB values for each pixel of an image, we store the weights of a neural network overfitted to the image. Specifically, to encode an image, we fit it with an MLP which maps pixel locations to RGB values. We then quantize and store the weights of this MLP as a code for the image. To decode the image, we simply evaluate the MLP at every pixel location. We found that this simple approach outperforms JPEG at low bit-rates, even without entropy coding or learning a distribution over weights. While our framework is not yet competitive with state of the art compression methods, we show that it has various attractive properties which could make it a viable alternative to other neural data compression approaches. △ Less

Submitted 10 April, 2021; v1 submitted 3 March, 2021; originally announced March 2021.

Comments: Added qualitative comparisons and link to github repo https://github.com/EmilienDupont/coin

arXiv:2102.04776 [pdf, other]

Generative Models as Distributions of Functions

Authors: Emilien Dupont, Yee Whye Teh, Arnaud Doucet

Abstract: Generative models are typically trained on grid-like data such as images. As a result, the size of these models usually scales directly with the underlying grid resolution. In this paper, we abandon discretized grids and instead parameterize individual data points by continuous functions. We then build generative models by learning distributions over such functions. By treating data points as func… ▽ More Generative models are typically trained on grid-like data such as images. As a result, the size of these models usually scales directly with the underlying grid resolution. In this paper, we abandon discretized grids and instead parameterize individual data points by continuous functions. We then build generative models by learning distributions over such functions. By treating data points as functions, we can abstract away from the specific type of data we train on and construct models that are agnostic to discretization. To train our model, we use an adversarial approach with a discriminator that acts on continuous signals. Through experiments on a wide variety of data modalities including images, 3D shapes and climate data, we demonstrate that our model can learn rich distributions of functions independently of data type and resolution. △ Less

Submitted 17 February, 2022; v1 submitted 9 February, 2021; originally announced February 2021.

Comments: AISTATS 2022 Oral camera ready. Incorporated reviewer feedback

arXiv:2012.10885 [pdf, other]

LieTransformer: Equivariant self-attention for Lie Groups

Authors: Michael Hutchinson, Charline Le Lan, Sheheryar Zaidi, Emilien Dupont, Yee Whye Teh, Hyunjik Kim

Abstract: Group equivariant neural networks are used as building blocks of group invariant neural networks, which have been shown to improve generalisation performance and data efficiency through principled parameter sharing. Such works have mostly focused on group equivariant convolutions, building on the result that group equivariant linear maps are necessarily convolutions. In this work, we extend the sc… ▽ More Group equivariant neural networks are used as building blocks of group invariant neural networks, which have been shown to improve generalisation performance and data efficiency through principled parameter sharing. Such works have mostly focused on group equivariant convolutions, building on the result that group equivariant linear maps are necessarily convolutions. In this work, we extend the scope of the literature to self-attention, that is emerging as a prominent building block of deep learning models. We propose the LieTransformer, an architecture composed of LieSelfAttention layers that are equivariant to arbitrary Lie groups and their discrete subgroups. We demonstrate the generality of our approach by showing experimental results that are competitive to baseline methods on a wide range of tasks: shape counting on point clouds, molecular property regression and modelling particle trajectories under Hamiltonian dynamics. △ Less

Submitted 16 June, 2021; v1 submitted 20 December, 2020; originally announced December 2020.

arXiv:2006.10711 [pdf, other]

STEER: Simple Temporal Regularization For Neural ODEs

Authors: Arnab Ghosh, Harkirat Singh Behl, Emilien Dupont, Philip H. S. Torr, Vinay Namboodiri

Abstract: Training Neural Ordinary Differential Equations (ODEs) is often computationally expensive. Indeed, computing the forward pass of such models involves solving an ODE which can become arbitrarily complex during training. Recent works have shown that regularizing the dynamics of the ODE can partially alleviate this. In this paper we propose a new regularization technique: randomly sampling the end ti… ▽ More Training Neural Ordinary Differential Equations (ODEs) is often computationally expensive. Indeed, computing the forward pass of such models involves solving an ODE which can become arbitrarily complex during training. Recent works have shown that regularizing the dynamics of the ODE can partially alleviate this. In this paper we propose a new regularization technique: randomly sampling the end time of the ODE during training. The proposed regularization is simple to implement, has negligible overhead and is effective across a wide variety of tasks. Further, the technique is orthogonal to several other methods proposed to regularize the dynamics of ODEs and as such can be used in conjunction with them. We show through experiments on normalizing flows, time series models and image recognition that the proposed regularization can significantly decrease training time and even improve performance over baseline models. △ Less

Submitted 2 November, 2020; v1 submitted 18 June, 2020; originally announced June 2020.

Comments: Neurips 2020

arXiv:2006.07630 [pdf, other]

Equivariant Neural Rendering

Authors: Emilien Dupont, Miguel Angel Bautista, Alex Colburn, Aditya Sankar, Carlos Guestrin, Josh Susskind, Qi Shan

Abstract: We propose a framework for learning neural scene representations directly from images, without 3D supervision. Our key insight is that 3D structure can be imposed by ensuring that the learned representation transforms like a real 3D scene. Specifically, we introduce a loss which enforces equivariance of the scene representation with respect to 3D transformations. Our formulation allows us to infer… ▽ More We propose a framework for learning neural scene representations directly from images, without 3D supervision. Our key insight is that 3D structure can be imposed by ensuring that the learned representation transforms like a real 3D scene. Specifically, we introduce a loss which enforces equivariance of the scene representation with respect to 3D transformations. Our formulation allows us to infer and render scenes in real time while achieving comparable results to models requiring minutes for inference. In addition, we introduce two challenging new datasets for scene representation and neural rendering, including scenes with complex lighting and backgrounds. Through experiments, we show that our model achieves compelling results on these datasets as well as on standard ShapeNet benchmarks. △ Less

Submitted 21 December, 2020; v1 submitted 13 June, 2020; originally announced June 2020.

Comments: Add link to code

arXiv:1904.01681 [pdf, other]

Augmented Neural ODEs

Authors: Emilien Dupont, Arnaud Doucet, Yee Whye Teh

Abstract: We show that Neural Ordinary Differential Equations (ODEs) learn representations that preserve the topology of the input space and prove that this implies the existence of functions Neural ODEs cannot represent. To address these limitations, we introduce Augmented Neural ODEs which, in addition to being more expressive models, are empirically more stable, generalize better and have a lower computa… ▽ More We show that Neural Ordinary Differential Equations (ODEs) learn representations that preserve the topology of the input space and prove that this implies the existence of functions Neural ODEs cannot represent. To address these limitations, we introduce Augmented Neural ODEs which, in addition to being more expressive models, are empirically more stable, generalize better and have a lower computational cost than Neural ODEs. △ Less

Submitted 26 October, 2019; v1 submitted 2 April, 2019; originally announced April 2019.

Comments: NeurIPS camera ready, additional experiments, additional datasets, discussion on relation to other models

arXiv:1810.03728 [pdf, other]

Probabilistic Semantic Inpainting with Pixel Constrained CNNs

Authors: Emilien Dupont, Suhas Suresha

Abstract: Semantic inpainting is the task of inferring missing pixels in an image given surrounding pixels and high level image semantics. Most semantic inpainting algorithms are deterministic: given an image with missing regions, a single inpainted image is generated. However, there are often several plausible inpaintings for a given missing region. In this paper, we propose a method to perform probabilist… ▽ More Semantic inpainting is the task of inferring missing pixels in an image given surrounding pixels and high level image semantics. Most semantic inpainting algorithms are deterministic: given an image with missing regions, a single inpainted image is generated. However, there are often several plausible inpaintings for a given missing region. In this paper, we propose a method to perform probabilistic semantic inpainting by building a model, based on PixelCNNs, that learns a distribution of images conditioned on a subset of visible pixels. Experiments on the MNIST and CelebA datasets show that our method produces diverse and realistic inpaintings. △ Less

Submitted 23 February, 2019; v1 submitted 8 October, 2018; originally announced October 2018.

Comments: AISTATS camera ready version

arXiv:1804.00104 [pdf, other]

Learning Disentangled Joint Continuous and Discrete Representations

Authors: Emilien Dupont

Abstract: We present a framework for learning disentangled and interpretable jointly continuous and discrete representations in an unsupervised manner. By augmenting the continuous latent distribution of variational autoencoders with a relaxed discrete distribution and controlling the amount of information encoded in each latent unit, we show how continuous and categorical factors of variation can be discov… ▽ More We present a framework for learning disentangled and interpretable jointly continuous and discrete representations in an unsupervised manner. By augmenting the continuous latent distribution of variational autoencoders with a relaxed discrete distribution and controlling the amount of information encoded in each latent unit, we show how continuous and categorical factors of variation can be discovered automatically from data. Experiments show that the framework disentangles continuous and discrete generative factors on various datasets and outperforms current disentangling methods when a discrete generative factor is prominent. △ Less

Submitted 22 October, 2018; v1 submitted 30 March, 2018; originally announced April 2018.

Comments: NIPS camera ready, added quantitative evaluation and figures for dsprites dataset

Showing 1–22 of 22 results for author: Dupont, E