
Showing 1–50 of 57 results for author: Eslami, A

  1. arXiv:2405.03162  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Advancing Multimodal Medical Capabilities of Gemini

    Authors: Lin Yang, Shawn Xu, Andrew Sellergren, Timo Kohlberger, Yuchen Zhou, Ira Ktena, Atilla Kiraly, Faruk Ahmed, Farhad Hormozdiari, Tiam Jaroensri, Eric Wang, Ellery Wulczyn, Fayaz Jamil, Theo Guidroz, Chuck Lau, Siyuan Qiao, Yun Liu, Akshay Goel, Kendall Park, Arnav Agharwal, Nick George, Yang Wang, Ryutaro Tanno, David G. T. Barrett, Wei-Hung Weng, et al. (22 additional authors not shown)

    Abstract: Many clinical tasks require an understanding of specialized data, such as medical images and genomics, which is not typically found in general-purpose large multimodal models. Building upon Gemini's multimodal models, we develop several models within the new Med-Gemini family that inherit core capabilities of Gemini and are optimized for medical use via fine-tuning with 2D and 3D radiology, histop…

    Submitted 6 May, 2024; originally announced May 2024.

  2. arXiv:2404.18416  [pdf, other]

    cs.AI cs.CL cs.CV cs.LG

    Capabilities of Gemini Models in Medicine

    Authors: Khaled Saab, Tao Tu, Wei-Hung Weng, Ryutaro Tanno, David Stutz, Ellery Wulczyn, Fan Zhang, Tim Strother, Chunjong Park, Elahe Vedadi, Juanma Zambrano Chaves, Szu-Yeu Hu, Mike Schaekermann, Aishwarya Kamath, Yong Cheng, David G. T. Barrett, Cathy Cheung, Basil Mustafa, Anil Palepu, Daniel McDuff, Le Hou, Tomer Golany, Luyang Liu, Jean-baptiste Alayrac, Neil Houlsby, et al. (42 additional authors not shown)

    Abstract: Excellence in a wide variety of medical applications poses considerable challenges for AI, requiring advanced reasoning, access to up-to-date medical knowledge and understanding of complex multimodal data. Gemini models, with strong general capabilities in multimodal and long-context reasoning, offer exciting possibilities in medicine. Building on these core strengths of Gemini, we introduce Med-G…

    Submitted 1 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

  3. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  4. arXiv:2311.18260  [pdf, other]

    eess.IV cs.CL cs.CV cs.LG

    Consensus, dissensus and synergy between clinicians and specialist foundation models in radiology report generation

    Authors: Ryutaro Tanno, David G. T. Barrett, Andrew Sellergren, Sumedh Ghaisas, Sumanth Dathathri, Abigail See, Johannes Welbl, Karan Singhal, Shekoofeh Azizi, Tao Tu, Mike Schaekermann, Rhys May, Roy Lee, SiWai Man, Zahra Ahmed, Sara Mahdavi, Yossi Matias, Joelle Barral, Ali Eslami, Danielle Belgrave, Vivek Natarajan, Shravya Shetty, Pushmeet Kohli, Po-Sen Huang, Alan Karthikesalingam, et al. (1 additional author not shown)

    Abstract: Radiology reports are an instrumental part of modern medicine, informing key clinical decisions such as diagnosis and treatment. The worldwide shortage of radiologists, however, restricts access to expert care and imposes heavy workloads, contributing to avoidable errors and delays in report delivery. While recent progress in automated report generation with vision-language models offer clear pote…

    Submitted 20 December, 2023; v1 submitted 30 November, 2023; originally announced November 2023.

  5. arXiv:2310.06085  [pdf, other]

    cs.CV cs.LG

    Quantile-based Maximum Likelihood Training for Outlier Detection

    Authors: Masoud Taghikhah, Nishant Kumar, Siniša Šegvić, Abouzar Eslami, Stefan Gumhold

    Abstract: Discriminative learning effectively predicts true object class for image classification. However, it often results in false positives for outliers, posing critical concerns in applications like autonomous driving and video surveillance systems. Previous attempts to address this challenge involved training image classifiers through contrastive learning using actual outlier data or synthesizing outl…

    Submitted 2 June, 2024; v1 submitted 20 August, 2023; originally announced October 2023.

    Comments: Camera Ready Version. Accepted at AAAI 2024. Code available at https://github.com/taghikhah/QuantOD

  6. arXiv:2302.07106  [pdf, other]

    cs.CV

    Normalizing Flow based Feature Synthesis for Outlier-Aware Object Detection

    Authors: Nishant Kumar, Siniša Šegvić, Abouzar Eslami, Stefan Gumhold

    Abstract: Real-world deployment of reliable object detectors is crucial for applications such as autonomous driving. However, general-purpose object detectors like Faster R-CNN are prone to providing overconfident predictions for outlier objects. Recent outlier-aware object detection approaches estimate the density of instance-wide features with class-conditional Gaussians and train on synthesized outlier f…

    Submitted 28 May, 2023; v1 submitted 1 February, 2023; originally announced February 2023.

    Comments: Accepted as CVPR 2023 Highlight (Top 10% of all acceptance)

  7. arXiv:2212.09538  [pdf]

    stat.ME

    Estimation of the attributable fraction for time to event outcomes using an inverse probability of exposure weighted Kaplan-Meier estimator

    Authors: Denis Talbot, Miceline Mésidor, Kossi Clément Trenou, Mathilde Lavigne-Robichaud, Xavier Trudel, Aida Eslami

    Abstract: Population attributable fractions aim to quantify the proportion of the cases of an outcome (for example, a disease) that would have been avoided had no individuals in the population been exposed to a given exposure. This quantity thus plays a crucial role in epidemiology and public health, notably to guide policies, interventions or to assess the burden of a disease due to a particular exposure.…

    Submitted 19 December, 2022; originally announced December 2022.

    Comments: 15 pages, 0 figure

  8. arXiv:2210.06433  [pdf, other]

    cs.CV cs.AI cs.LG

    Self-supervised video pretraining yields human-aligned visual representations

    Authors: Nikhil Parthasarathy, S. M. Ali Eslami, João Carreira, Olivier J. Hénaff

    Abstract: Humans learn powerful representations of objects and scenes by observing how they evolve over time. Yet, outside of specific tasks that require explicit temporal understanding, static image pretraining remains the dominant paradigm for learning visual foundation models. We question this mismatch, and ask whether video pretraining can yield visual representations that bear the hallmarks of human pe…

    Submitted 25 July, 2023; v1 submitted 12 October, 2022; originally announced October 2022.

    Comments: Technical report

  9. arXiv:2207.05727  [pdf, other]

    cs.CV math.PR

    Enhancing Fairness of Visual Attribute Predictors

    Authors: Tobias Hänel, Nishant Kumar, Dmitrij Schlesinger, Mengze Li, Erdem Ünal, Abouzar Eslami, Stefan Gumhold

    Abstract: The performance of deep neural networks for image recognition tasks such as predicting a smiling face is known to degrade with under-represented classes of sensitive attributes. We address this problem by introducing fairness-aware regularization losses based on batch estimates of Demographic Parity, Equalized Odds, and a novel Intersection-over-Union measure. The experiments performed on facial a…

    Submitted 1 October, 2022; v1 submitted 7 July, 2022; originally announced July 2022.

    Comments: Camera Ready, ACCV 2022

  10. arXiv:2201.12204  [pdf, other]

    cs.LG

    From data to functa: Your data point is a function and you can treat it like one

    Authors: Emilien Dupont, Hyunjik Kim, S. M. Ali Eslami, Danilo Rezende, Dan Rosenbaum

    Abstract: It is common practice in deep learning to represent a measurement of the world on a discrete grid, e.g. a 2D grid of pixels. However, the underlying signal represented by these measurements is often continuous, e.g. the scene depicted in an image. A powerful continuous alternative is then to represent these measurements using an implicit neural representation, a neural function trained to output t…

    Submitted 10 November, 2022; v1 submitted 28 January, 2022; originally announced January 2022.

  11. arXiv:2106.14108  [pdf, other]

    cs.CE eess.IV

    Inferring a Continuous Distribution of Atom Coordinates from Cryo-EM Images using VAEs

    Authors: Dan Rosenbaum, Marta Garnelo, Michal Zielinski, Charlie Beattie, Ellen Clancy, Andrea Huber, Pushmeet Kohli, Andrew W. Senior, John Jumper, Carl Doersch, S. M. Ali Eslami, Olaf Ronneberger, Jonas Adler

    Abstract: Cryo-electron microscopy (cryo-EM) has revolutionized experimental protein structure determination. Despite advances in high resolution reconstruction, a majority of cryo-EM experiments provide either a single state of the studied macromolecule, or a relatively small number of its conformations. This reduces the effectiveness of the technique for proteins with flexible regions, which are known to…

    Submitted 26 June, 2021; originally announced June 2021.

  12. arXiv:2106.13884  [pdf, other]

    cs.CV cs.CL cs.LG

    Multimodal Few-Shot Learning with Frozen Language Models

    Authors: Maria Tsimpoukelli, Jacob Menick, Serkan Cabi, S. M. Ali Eslami, Oriol Vinyals, Felix Hill

    Abstract: When trained at sufficient scale, auto-regressive language models exhibit the notable ability to learn a new language task after being prompted with just a few examples. Here, we present a simple, yet effective, approach for transferring this few-shot learning ability to a multimodal setting (vision and language). Using aligned image and caption data, we train a vision encoder to represent each im…

    Submitted 3 July, 2021; v1 submitted 25 June, 2021; originally announced June 2021.

  13. arXiv:2105.12196  [pdf, other]

    cs.AI cs.MA cs.NE cs.RO

    From Motor Control to Team Play in Simulated Humanoid Football

    Authors: Siqi Liu, Guy Lever, Zhe Wang, Josh Merel, S. M. Ali Eslami, Daniel Hennes, Wojciech M. Czarnecki, Yuval Tassa, Shayegan Omidshafiei, Abbas Abdolmaleki, Noah Y. Siegel, Leonard Hasenclever, Luke Marris, Saran Tunyasuvunakool, H. Francis Song, Markus Wulfmeier, Paul Muller, Tuomas Haarnoja, Brendan D. Tracey, Karl Tuyls, Thore Graepel, Nicolas Heess

    Abstract: Intelligent behaviour in the physical world exhibits structure at multiple spatial and temporal scales. Although movements are ultimately executed at the level of instantaneous muscle tensions or joint torques, they must be selected to serve goals defined on much longer timescales, and in terms of relations that extend far beyond the body itself, ultimately involving coordination with other agents…

    Submitted 25 May, 2021; originally announced May 2021.

  14. arXiv:2105.00162  [pdf, other]

    cs.AI cs.NE

    Generative Art Using Neural Visual Grammars and Dual Encoders

    Authors: Chrisantha Fernando, S. M. Ali Eslami, Jean-Baptiste Alayrac, Piotr Mirowski, Dylan Banarse, Simon Osindero

    Abstract: Whilst there are perhaps only a few scientific methods, there seem to be almost as many artistic methods as there are artists. Artistic processes appear to inhabit the highest order of open-endedness. To begin to understand some of the processes of art making it is helpful to try to automate them even partially. In this paper, a novel algorithm for producing generative art is described which allow…

    Submitted 3 May, 2021; v1 submitted 1 May, 2021; originally announced May 2021.

  15. arXiv:2011.09192  [pdf, other]

    cs.AI cs.GT cs.MA

    Game Plan: What AI can do for Football, and What Football can do for AI

    Authors: Karl Tuyls, Shayegan Omidshafiei, Paul Muller, Zhe Wang, Jerome Connor, Daniel Hennes, Ian Graham, William Spearman, Tim Waskett, Dafydd Steele, Pauline Luc, Adria Recasens, Alexandre Galashov, Gregory Thornton, Romuald Elie, Pablo Sprechmann, Pol Moreno, Kris Cao, Marta Garnelo, Praneet Dutta, Michal Valko, Nicolas Heess, Alex Bridgland, Julien Perolat, Bart De Vylder, et al. (11 additional authors not shown)

    Abstract: The rapid progress in artificial intelligence (AI) and machine learning has opened unprecedented analytics possibilities in various team and individual sports, including baseball, basketball, and tennis. More recently, AI techniques have been applied to football, due to a huge increase in data collection by professional teams, increased computational power, and advances in machine learning, with t…

    Submitted 18 November, 2020; originally announced November 2020.

  16. arXiv:2007.05566  [pdf, other]

    cs.LG stat.ML

    Contrastive Training for Improved Out-of-Distribution Detection

    Authors: Jim Winkens, Rudy Bunel, Abhijit Guha Roy, Robert Stanforth, Vivek Natarajan, Joseph R. Ledsam, Patricia MacWilliams, Pushmeet Kohli, Alan Karthikesalingam, Simon Kohl, Taylan Cemgil, S. M. Ali Eslami, Olaf Ronneberger

    Abstract: Reliable detection of out-of-distribution (OOD) inputs is increasingly understood to be a precondition for deployment of machine learning systems. This paper proposes and investigates the use of contrastive training to boost OOD detection performance. Unlike leading methods for OOD detection, our approach does not require access to examples labeled explicitly as OOD, which can be difficult to coll…

    Submitted 10 July, 2020; originally announced July 2020.

  17. arXiv:2003.05707  [pdf, other]

    cs.CV cs.LG

    Fairness by Learning Orthogonal Disentangled Representations

    Authors: Mhd Hasan Sarhan, Nassir Navab, Abouzar Eslami, Shadi Albarqouni

    Abstract: Learning discriminative powerful representations is a crucial step for machine learning systems. Introducing invariance against arbitrary nuisance or sensitive attributes while performing well on specific tasks is an important problem in representation learning. This is mostly approached by purging the sensitive information from learned representations. In this paper, we propose a novel disentangl…

    Submitted 4 July, 2020; v1 submitted 12 March, 2020; originally announced March 2020.

  18. arXiv:2002.10880  [pdf, other]

    cs.GR cs.CV cs.LG stat.ML

    PolyGen: An Autoregressive Generative Model of 3D Meshes

    Authors: Charlie Nash, Yaroslav Ganin, S. M. Ali Eslami, Peter W. Battaglia

    Abstract: Polygon meshes are an efficient representation of 3D geometry, and are of central importance in computer graphics, robotics and games development. Existing learning-based approaches have avoided the challenges of working with 3D meshes, instead using alternative object representations that are more compatible with neural architectures and training approaches. We present an approach which models th…

    Submitted 23 February, 2020; originally announced February 2020.

  19. arXiv:1912.04618  [pdf, other]

    cs.CV

    Deep Attention Based Semi-Supervised 2D-Pose Estimation for Surgical Instruments

    Authors: Mert Kayhan, Okan Köpüklü, Mhd Hasan Sarhan, Mehmet Yigitsoy, Abouzar Eslami, Gerhard Rigoll

    Abstract: For many practical problems and applications, it is not feasible to create a vast and accurately labeled dataset, which restricts the application of deep learning in many areas. Semi-supervised learning algorithms intend to improve performance by also leveraging unlabeled data. This is very valuable for 2D-pose estimation task where data labeling requires substantial time and is subject to noise.…

    Submitted 11 January, 2021; v1 submitted 10 December, 2019; originally announced December 2019.

  20. arXiv:1910.01007  [pdf, other]

    cs.CV cs.LG stat.ML

    Unsupervised Doodling and Painting with Improved SPIRAL

    Authors: John F. J. Mellor, Eunbyung Park, Yaroslav Ganin, Igor Babuschkin, Tejas Kulkarni, Dan Rosenbaum, Andy Ballard, Theophane Weber, Oriol Vinyals, S. M. Ali Eslami

    Abstract: We investigate using reinforcement learning agents as generative models of images (extending arXiv:1804.01118). A generative agent controls a simulated painting environment, and is trained with rewards provided by a discriminator network simultaneously trained to assess the realism of the agent's samples, either unconditional or reconstructions. Compared to prior work, we make a number of improvem…

    Submitted 2 October, 2019; originally announced October 2019.

    Comments: See https://learning-to-paint.github.io for an interactive version of this paper, with videos

    ACM Class: I.2; I.4

  21. arXiv:1909.06693  [pdf, ps, other]

    cs.GT

    Local Voting Games for Misbehavior Detection in VANETs in Presence of Uncertainty

    Authors: Ali Behfarnia, Ali Eslami

    Abstract: Cooperation between neighboring vehicles is an effective solution to the problem of malicious node identification in vehicular ad hoc networks (VANETs). However, the outcome is subject to nodes' beliefs and reactions in the collaboration. In this paper, a plain game-theoretic approach that captures the uncertainty of nodes about their monitoring systems, the type of their neighboring nodes, and th…

    Submitted 14 September, 2019; originally announced September 2019.

  22. arXiv:1905.13077  [pdf, other]

    cs.CV

    A Hierarchical Probabilistic U-Net for Modeling Multi-Scale Ambiguities

    Authors: Simon A. A. Kohl, Bernardino Romera-Paredes, Klaus H. Maier-Hein, Danilo Jimenez Rezende, S. M. Ali Eslami, Pushmeet Kohli, Andrew Zisserman, Olaf Ronneberger

    Abstract: Medical imaging only indirectly measures the molecular identity of the tissue within each voxel, which often produces only ambiguous image evidence for target measures of interest, like semantic segmentation. This diversity and the variations of plausible interpretations are often specific to given image regions and may thus manifest on various scales, spanning all the way from the pixel to the im…

    Submitted 30 May, 2019; originally announced May 2019.

    Comments: 25 pages, 15 figures

  23. arXiv:1905.09272  [pdf, other]

    cs.CV cs.LG

    Data-Efficient Image Recognition with Contrastive Predictive Coding

    Authors: Olivier J. Hénaff, Aravind Srinivas, Jeffrey De Fauw, Ali Razavi, Carl Doersch, S. M. Ali Eslami, Aaron van den Oord

    Abstract: Human observers can learn to recognize new categories of images from a handful of examples, yet doing so with artificial ones remains an open challenge. We hypothesize that data-efficient recognition is enabled by representations which make the variability in natural signals more predictable. We therefore revisit and improve Contrastive Predictive Coding, an unsupervised objective for learning suc…

    Submitted 1 July, 2020; v1 submitted 22 May, 2019; originally announced May 2019.

  24. arXiv:1904.12732  [pdf, other]

    eess.IV cs.CV physics.med-ph

    Multi-scale Microaneurysms Segmentation Using Embedding Triplet Loss

    Authors: Mhd Hasan Sarhan, Shadi Albarqouni, Mehmet Yigitsoy, Nassir Navab, Abouzar Eslami

    Abstract: Deep learning techniques are recently being used in fundus image analysis and diabetic retinopathy detection. Microaneurysms are an important indicator of diabetic retinopathy progression. We introduce a two-stage deep learning approach for microaneurysms segmentation using multiple scales of the input with selective sampling and embedding triplet loss. The model first segments on two scales and t…

    Submitted 14 August, 2019; v1 submitted 18 April, 2019; originally announced April 2019.

  25. arXiv:1904.08491  [pdf, other]

    cs.LG stat.ML

    Learning Interpretable Disentangled Representations using Adversarial VAEs

    Authors: Mhd Hasan Sarhan, Abouzar Eslami, Nassir Navab, Shadi Albarqouni

    Abstract: Learning Interpretable representation in medical applications is becoming essential for adopting data-driven models into clinical practice. It has been recently shown that learning a disentangled feature representation is important for a more compact and explainable representation of the data. In this paper, we introduce a novel adversarial variational autoencoder with a total correlation constrai…

    Submitted 17 April, 2019; originally announced April 2019.

  26. arXiv:1903.11907  [pdf, other]

    stat.ML cs.LG

    Meta-Learning surrogate models for sequential decision making

    Authors: Alexandre Galashov, Jonathan Schwarz, Hyunjik Kim, Marta Garnelo, David Saxton, Pushmeet Kohli, S. M. Ali Eslami, Yee Whye Teh

    Abstract: We introduce a unified probabilistic framework for solving sequential decision making problems ranging from Bayesian optimisation to contextual bandits and reinforcement learning. This is accomplished by a probabilistic model-based approach that explains observed data while capturing predictive uncertainty during the decision making process. Crucially, this probabilistic model is chosen to be a Me…

    Submitted 12 June, 2019; v1 submitted 28 March, 2019; originally announced March 2019.

  27. arXiv:1903.02034  [pdf, ps, other]

    cs.CR

    Risk Assessment of Autonomous Vehicles Using Bayesian Defense Graphs

    Authors: Ali Behfarnia, Ali Eslami

    Abstract: Recent developments have made autonomous vehicles (AVs) closer to hitting our roads. However, their security is still a major concern among drivers as well as manufacturers. Although some work has been done to identify threats and possible solutions, a theoretical framework is needed to measure the security of AVs. In this paper, a simple security model based on defense graphs is proposed to quant…

    Submitted 5 March, 2019; originally announced March 2019.

    Comments: IEEE 88th Vehicular Technology Conference: VTC2018-Fall

  28. arXiv:1901.05761  [pdf, other]

    cs.LG stat.ML

    Attentive Neural Processes

    Authors: Hyunjik Kim, Andriy Mnih, Jonathan Schwarz, Marta Garnelo, Ali Eslami, Dan Rosenbaum, Oriol Vinyals, Yee Whye Teh

    Abstract: Neural Processes (NPs) (Garnelo et al 2018a;b) approach regression by learning to map a context set of observed input-output pairs to a distribution over regression functions. Each function models the distribution of the output given an input, conditioned on the context. NPs have the benefit of fitting observed data efficiently with linear complexity in the number of context input-output pairs, an…

    Submitted 9 July, 2019; v1 submitted 17 January, 2019; originally announced January 2019.

  29. arXiv:1812.00898  [pdf, other]

    cs.LG cs.CL cs.CV stat.ML

    Generating Diverse Programs with Instruction Conditioned Reinforced Adversarial Learning

    Authors: Aishwarya Agrawal, Mateusz Malinowski, Felix Hill, Ali Eslami, Oriol Vinyals, Tejas Kulkarni

    Abstract: Advances in Deep Reinforcement Learning have led to agents that perform well across a variety of sensory-motor domains. In this work, we study the setting in which an agent must learn to generate programs for diverse scenes conditioned on a given symbolic instruction. Final goals are specified to our agent via images of the scenes. A symbolic instruction consistent with the goal images is used as…

    Submitted 3 December, 2018; originally announced December 2018.

  30. Towards Robotic Eye Surgery: Marker-free, Online Hand-eye Calibration using Optical Coherence Tomography Images

    Authors: Mingchuan Zhou, Mahdi Hamad, Jakob Weiss, Abouzar Eslami, Kai Huang, Mathias Maier, Chris P. Lohmann, Nassir Navab, Alois Knoll, M. Ali Nasseri

    Abstract: Ophthalmic microsurgery is known to be a challenging operation, which requires very precise and dexterous manipulation. Image guided robot-assisted surgery (RAS) is a promising solution that brings significant improvements in outcomes and reduces the physical limitations of human surgeons. However, this technology must be further developed before it can be routinely used in clinics. One of the pro…

    Submitted 17 August, 2018; originally announced August 2018.

    Comments: *The first two authors contributed equally to this paper. Accepted by IEEE Robotics and Automation Letters (RA-L), 2018

  31. arXiv:1807.03149  [pdf, other]

    cs.CV cs.LG stat.ML

    Learning models for visual 3D localization with implicit mapping

    Authors: Dan Rosenbaum, Frederic Besse, Fabio Viola, Danilo J. Rezende, S. M. Ali Eslami

    Abstract: We consider learning based methods for visual localization that do not require the construction of explicit maps in the form of point clouds or voxels. The goal is to learn an implicit representation of the environment at a higher, more abstract level. We propose to use a generative approach based on Generative Query Networks (GQNs, Eslami et al. 2018), asking the following questions: 1) Can GQN c…

    Submitted 12 December, 2018; v1 submitted 4 July, 2018; originally announced July 2018.

  32. arXiv:1807.02033  [pdf, other]

    cs.CV cs.LG stat.ML

    Consistent Generative Query Networks

    Authors: Ananya Kumar, S. M. Ali Eslami, Danilo J. Rezende, Marta Garnelo, Fabio Viola, Edward Lockhart, Murray Shanahan

    Abstract: Stochastic video prediction models take in a sequence of image frames, and generate a sequence of consecutive future image frames. These models typically generate future frames in an autoregressive fashion, which is slow and requires the input and output frames to be consecutive. We introduce a model that overcomes these drawbacks by generating a latent representation from an arbitrary set of fram…

    Submitted 21 April, 2019; v1 submitted 5 July, 2018; originally announced July 2018.

  33. arXiv:1807.01670  [pdf, other]

    cs.CL cs.AI cs.CV cs.LG

    Encoding Spatial Relations from Natural Language

    Authors: Tiago Ramalho, Tomáš Kočiský, Frederic Besse, S. M. Ali Eslami, Gábor Melis, Fabio Viola, Phil Blunsom, Karl Moritz Hermann

    Abstract: Natural language processing has made significant inroads into learning the semantics of words through distributional approaches, however representations learnt via these methods fail to capture certain kinds of information implicit in the real world. In particular, spatial relations are encoded in a way that is inconsistent with human spatial reasoning and lacking invariance to viewpoint changes.…

    Submitted 5 July, 2018; v1 submitted 4 July, 2018; originally announced July 2018.

  34. arXiv:1807.01622  [pdf, other]

    cs.LG stat.ML

    Neural Processes

    Authors: Marta Garnelo, Jonathan Schwarz, Dan Rosenbaum, Fabio Viola, Danilo J. Rezende, S. M. Ali Eslami, Yee Whye Teh

    Abstract: A neural network (NN) is a parameterised function that can be tuned via gradient descent to approximate a labelled collection of data with high precision. A Gaussian process (GP), on the other hand, is a probabilistic model that defines a distribution over possible functions, and is updated in light of data via the rules of probabilistic inference. GPs are probabilistic, data-efficient and flexibl…

    Submitted 4 July, 2018; originally announced July 2018.

  35. arXiv:1807.01613  [pdf, other]

    cs.LG stat.ML

    Conditional Neural Processes

    Authors: Marta Garnelo, Dan Rosenbaum, Chris J. Maddison, Tiago Ramalho, David Saxton, Murray Shanahan, Yee Whye Teh, Danilo J. Rezende, S. M. Ali Eslami

    Abstract: Deep neural networks excel at function approximation, yet they are typically trained from scratch for each new function. On the other hand, Bayesian methods, such as Gaussian Processes (GPs), exploit prior knowledge to quickly infer the shape of a new function at test time. Yet GPs are computationally expensive, and it can be hard to design appropriate priors. In this paper we propose a family of…

    Submitted 4 July, 2018; originally announced July 2018.

  36. arXiv:1806.06163  [pdf, other]

    cs.IT

    A Micro-Scale Mobile-Enabled Implantable Medical Sensor

    Authors: Michael Okwori, Ali Behfarnia, Phanikumar Vuka, Ali Eslami

    Abstract: Micro-scale implantable medical devices (IMDs) extend the immense benefits of sensors used in health management. However, their development is limited by many requirements and challenges, such as the use of safe materials, size restrictions, safe and efficient powering, and selection of suitable wireless communication technologies. Some of the proposed wireless communication technologies are the t…

    Submitted 15 June, 2018; originally announced June 2018.

  37. arXiv:1806.05034  [pdf, other]

    cs.CV cs.LG cs.NE stat.ML

    A Probabilistic U-Net for Segmentation of Ambiguous Images

    Authors: Simon A. A. Kohl, Bernardino Romera-Paredes, Clemens Meyer, Jeffrey De Fauw, Joseph R. Ledsam, Klaus H. Maier-Hein, S. M. Ali Eslami, Danilo Jimenez Rezende, Olaf Ronneberger

    Abstract: Many real-world vision problems suffer from inherent ambiguities. In clinical applications for example, it might not be clear from a CT scan alone which particular region is cancer tissue. Therefore a group of graders typically produces a set of diverse but plausible segmentations. We consider the task of learning a distribution over segmentations given an input. To this end we propose a generativ…

    Submitted 29 January, 2019; v1 submitted 13 June, 2018; originally announced June 2018.

    Comments: Last update: added further details about the LIDC experiment. 11 pages for the main paper, 28 pages including appendix. 5 figures in the main paper, 18 figures in total, Advances in Neural Information Processing Systems (NeurIPS), 2018

  38. arXiv:1804.09401  [pdf, other]

    stat.ML cs.LG

    Generative Temporal Models with Spatial Memory for Partially Observed Environments

    Authors: Marco Fraccaro, Danilo Jimenez Rezende, Yori Zwols, Alexander Pritzel, S. M. Ali Eslami, Fabio Viola

    Abstract: In model-based reinforcement learning, generative and temporal models of environments can be leveraged to boost agent performance, either by tuning the agent's representations during training or via use as part of an explicit planning mechanism. However, their application in practice has been limited to simplistic environments, due to the difficulty of training such models in larger, potentially p…

    Submitted 19 July, 2018; v1 submitted 25 April, 2018; originally announced April 2018.

    Comments: ICML 2018

  39. arXiv:1804.01118  [pdf, other]

    cs.CV cs.LG stat.ML

    Synthesizing Programs for Images using Reinforced Adversarial Learning

    Authors: Yaroslav Ganin, Tejas Kulkarni, Igor Babuschkin, S. M. Ali Eslami, Oriol Vinyals

    Abstract: Advances in deep generative networks have led to impressive results in recent years. Nevertheless, such models can often waste their capacity on the minutiae of datasets, presumably due to weak inductive biases in their decoders. This is where graphics engines may come in handy since they abstract away low-level details and represent images as high-level programs. Current methods that combine deep…

    Submitted 3 April, 2018; originally announced April 2018.

    Comments: 12 pages, 13 figures

  40. arXiv:1803.03835  [pdf, other]

    cs.LG

    Kickstarting Deep Reinforcement Learning

    Authors: Simon Schmitt, Jonathan J. Hudson, Augustin Zidek, Simon Osindero, Carl Doersch, Wojciech M. Czarnecki, Joel Z. Leibo, Heinrich Kuttler, Andrew Zisserman, Karen Simonyan, S. M. Ali Eslami

    Abstract: We present a method for using previously-trained 'teacher' agents to kickstart the training of a new 'student' agent. To this end, we leverage ideas from policy distillation and population based training. Our method places no constraints on the architecture of the teacher or student agents, and it regulates itself to allow the students to surpass their teachers in performance. We show that, on a c…

    Submitted 10 March, 2018; originally announced March 2018.

  41. arXiv:1802.07740  [pdf, other]

    cs.AI

    Machine Theory of Mind

    Authors: Neil C. Rabinowitz, Frank Perbet, H. Francis Song, Chiyuan Zhang, S. M. Ali Eslami, Matthew Botvinick

    Abstract: Theory of mind (ToM; Premack & Woodruff, 1978) broadly refers to humans' ability to represent the mental states of others, including their desires, beliefs, and intentions. We propose to train a machine to build such models too. We design a Theory of Mind neural network -- a ToMnet -- which uses meta-learning to build models of the agents it encounters, from observations of their behaviour alone.…

    Submitted 12 March, 2018; v1 submitted 21 February, 2018; originally announced February 2018.

    Comments: 21 pages, 15 figures

  42. arXiv:1802.06446  [pdf, other]

    cs.CV

    Fast 5DOF Needle Tracking in iOCT

    Authors: Jakob Weiss, Nicola Rieke, Mohammad Ali Nasseri, Mathias Maier, Abouzar Eslami, Nassir Navab

    Abstract: Purpose. Intraoperative Optical Coherence Tomography (iOCT) is an increasingly available imaging technique for ophthalmic microsurgery that provides high-resolution cross-sectional information of the surgical scene. We propose to build on its desirable qualities and present a method for tracking the orientation and location of a surgical needle. Thereby, we enable direct analysis of instrument-tis…

    Submitted 18 February, 2018; originally announced February 2018.

  43. arXiv:1802.03006  [pdf, other]

    cs.LG

    Learning and Querying Fast Generative Models for Reinforcement Learning

    Authors: Lars Buesing, Theophane Weber, Sebastien Racaniere, S. M. Ali Eslami, Danilo Rezende, David P. Reichert, Fabio Viola, Frederic Besse, Karol Gregor, Demis Hassabis, Daan Wierstra

    Abstract: A key challenge in model-based reinforcement learning (RL) is to synthesize computationally efficient and accurate environment models. We show that carefully designed generative models that learn and operate on compact state representations, so-called state-space models, substantially reduce the computational costs for predicting outcomes of sequences of actions. Extensive experiments establish th…

    Submitted 8 February, 2018; originally announced February 2018.

  44. arXiv:1710.10304  [pdf, other]

    cs.NE cs.CV

    Few-shot Autoregressive Density Estimation: Towards Learning to Learn Distributions

    Authors: Scott Reed, Yutian Chen, Thomas Paine, Aäron van den Oord, S. M. Ali Eslami, Danilo Rezende, Oriol Vinyals, Nando de Freitas

    Abstract: Deep autoregressive models have shown state-of-the-art performance in density estimation for natural images on large-scale datasets such as ImageNet. However, such models require many thousands of gradient-based weight updates and unique image examples for training. Ideally, the models would rapidly learn visual concepts from only a handful of examples, similar to the manner in which humans learn…

    Submitted 28 February, 2018; v1 submitted 27 October, 2017; originally announced October 2017.

  45. arXiv:1707.02286  [pdf, other]

    cs.AI

    Emergence of Locomotion Behaviours in Rich Environments

    Authors: Nicolas Heess, Dhruva TB, Srinivasan Sriram, Jay Lemmon, Josh Merel, Greg Wayne, Yuval Tassa, Tom Erez, Ziyu Wang, S. M. Ali Eslami, Martin Riedmiller, David Silver

    Abstract: The reinforcement learning paradigm allows, in principle, for complex behaviours to be learned directly from simple reward signals. In practice, however, it is common to carefully hand-design the reward function to encourage a particular solution, or to derive it from demonstration data. In this paper we explore how a rich environment can help to promote the learning of complex behaviour. Specifically…

    Submitted 10 July, 2017; v1 submitted 7 July, 2017; originally announced July 2017.

  46. arXiv:1703.10701  [pdf, other]

    cs.CV

    Concurrent Segmentation and Localization for Tracking of Surgical Instruments

    Authors: Iro Laina, Nicola Rieke, Christian Rupprecht, Josué Page Vizcaíno, Abouzar Eslami, Federico Tombari, Nassir Navab

    Abstract: Real-time instrument tracking is a crucial requirement for various computer-assisted interventions. In order to overcome problems such as specular reflections and motion blur, we propose a novel method that takes advantage of the interdependency between localization and segmentation of the surgical tool. In particular, we reformulate the 2D instrument pose estimation as heatmap regression and ther…

    Submitted 1 August, 2017; v1 submitted 30 March, 2017; originally announced March 2017.

    Comments: I. Laina and N. Rieke contributed equally to this work. Accepted to MICCAI 2017

  47. arXiv:1609.07796  [pdf, ps, other]

    cs.SI physics.soc-ph

    Error Correction Coding Meets Cyber-Physical Systems: Message-Passing Analysis of Self-Healing Interdependent Networks

    Authors: Ali Behfarnia, Ali Eslami

    Abstract: Coupling cyber and physical systems gives rise to numerous engineering challenges and opportunities. An important challenge is the contagion of failure from one system to another, which can lead to large-scale cascading failures. However, the self-healing ability emerges as a valuable opportunity where the overlaying cyber network can cure failures in the underlying physical network. To capture bo…

    Submitted 30 June, 2017; v1 submitted 25 September, 2016; originally announced September 2016.

    Comments: arXiv admin note: substantial text overlap with arXiv:1606.00955

  48. arXiv:1607.00662  [pdf, other]

    cs.CV cs.LG stat.ML

    Unsupervised Learning of 3D Structure from Images

    Authors: Danilo Jimenez Rezende, S. M. Ali Eslami, Shakir Mohamed, Peter Battaglia, Max Jaderberg, Nicolas Heess

    Abstract: A key goal of computer vision is to recover the underlying 3D structure from 2D observations of the world. In this paper we learn strong deep generative models of 3D structures, and recover these structures from 3D and 2D images via probabilistic inference. We demonstrate high-quality samples and report log-likelihoods on several datasets, including ShapeNet [2], and establish the first benchmarks…

    Submitted 19 June, 2018; v1 submitted 3 July, 2016; originally announced July 2016.

    Comments: Appears in Advances in Neural Information Processing Systems 29 (NIPS 2016)

  49. arXiv:1606.00955   

    cs.SI

    Message Passing for Analysis and Resilient Design of Self-Healing Interdependent Cyber-Physical Networks

    Authors: Ali Behfarnia, Ali Eslami

    Abstract: Coupling cyber and physical systems gives rise to numerous engineering challenges and opportunities. An important challenge is the contagion of failure from one system to another, which can lead to large-scale cascading failures. On the other hand, the self-healing ability emerges as a valuable opportunity where the overlay cyber network can cure failures in the underlying physical network. To capture…

    Submitted 29 December, 2016; v1 submitted 2 June, 2016; originally announced June 2016.

    Comments: The complete version of this paper is uploaded into this system with this title: " Error Correction Coding Meets Cyber-Physical Systems: Message-Passing Analysis of Self-Healing Interdependent Networks". See arXiv:1609.07796. So, by withdrawing this paper, readers have the opportunity to find a complete version of this

  50. arXiv:1603.08575  [pdf, other]

    cs.CV cs.LG

    Attend, Infer, Repeat: Fast Scene Understanding with Generative Models

    Authors: S. M. Ali Eslami, Nicolas Heess, Theophane Weber, Yuval Tassa, David Szepesvari, Koray Kavukcuoglu, Geoffrey E. Hinton

    Abstract: We present a framework for efficient inference in structured image models that explicitly reason about objects. We achieve this by performing probabilistic inference using a recurrent neural network that attends to scene elements and processes them one at a time. Crucially, the model itself learns to choose the appropriate number of inference steps. We use this scheme to learn to perform inference…

    Submitted 12 August, 2016; v1 submitted 28 March, 2016; originally announced March 2016.