Search | arXiv e-print repository

Automatic Evaluation of Excavator Operators using Learned Reward Functions

Authors: Pranav Agarwal, Marek Teichmann, Sheldon Andrews, Samira Ebrahimi Kahou

Abstract: Training novice users to operate an excavator for learning different skills requires the presence of expert teachers. Considering the complexity of the problem, it is comparatively expensive to find skilled experts as the process is time-consuming and requires precise focus. Moreover, since humans tend to be biased, the evaluation process is noisy and will lead to high variance in the final score of different operators with similar skills. In this work, we address these issues and propose a novel strategy for the automatic evaluation of excavator operators. We take into account the internal dynamics of the excavator and the safety criterion at every time step to evaluate the performance. To further validate our approach, we use this score prediction model as a source of reward for a reinforcement learning agent to learn the task of maneuvering an excavator in a simulated environment that closely replicates the real-world dynamics. For a policy learned using these external reward prediction models, our results demonstrate safer solutions following the required dynamic constraints when compared to policy trained with task-based reward functions only, making it one step closer to real-life adoption. For future research, we release our codebase at https://github.com/pranavAL/InvRL_Auto-Evaluate and video results https://drive.google.com/file/d/1jR1otOAu8zrY8mkhUOUZW9jkBOAKK71Z/view?usp=share_link .

Submitted 15 November, 2022; originally announced November 2022.

Comments: 11 pages, 5 figures, Accepted at Reinforcement Learning for Real Life (RL4RealLife) Workshop at NeurIPS 2022

arXiv:2207.00095 [pdf, other]

End-to-end Learning for Image-based Detection of Molecular Alterations in Digital Pathology

Authors: Marvin Teichmann, Andre Aichert, Hanibal Bohnenberger, Philipp Ströbel, Tobias Heimann

Abstract: Current approaches for classification of whole slide images (WSI) in digital pathology predominantly utilize a two-stage learning pipeline. The first stage identifies areas of interest (e.g. tumor tissue), while the second stage processes cropped tiles from these areas in a supervised fashion. During inference, a large number of tiles are combined into a unified prediction for the entire slide. A major drawback of such approaches is the requirement for task-specific auxiliary labels which are not acquired in clinical routine. We propose a novel learning pipeline for WSI classification that is trainable end-to-end and does not require any auxiliary annotations. We apply our approach to predict molecular alterations for a number of different use-cases, including detection of microsatellite instability in colorectal tumors and prediction of specific mutations for colon, lung, and breast cancer cases from The Cancer Genome Atlas. Results reach AUC scores of up to 94% and are shown to be competitive with state of the art two-stage pipelines. We believe our approach can facilitate future research in digital pathology and contribute to solve a large range of problems around the prediction of cancer phenotypes, hopefully enabling personalized therapies for more patients in future.

Submitted 19 July, 2022; v1 submitted 30 June, 2022; originally announced July 2022.

Comments: MICCAI 2022; 8.5 Pages, 4 Figures

arXiv:2003.09260 [pdf]

doi 10.3233/JAD-190594

Accuracy of MRI Classification Algorithms in a Tertiary Memory Center Clinical Routine Cohort

Authors: Alexandre Morin, Jorge Samper-González, Anne Bertrand, Sebastian Stroer, Didier Dormont, Aline Mendes, Pierrick Coupé, Jamila Ahdidan, Marcel Lévy, Dalila Samri, Harald Hampel, Bruno Dubois, Marc Teichmann, Stéphane Epelbaum, Olivier Colliot

Abstract: BACKGROUND: Automated volumetry software (AVS) has recently become widely available to neuroradiologists. MRI volumetry with AVS may support the diagnosis of dementias by identifying regional atrophy. Moreover, automatic classifiers using machine learning techniques have recently emerged as promising approaches to assist diagnosis. However, the performance of both AVS and automatic classifiers has been evaluated mostly in the artificial setting of research datasets. OBJECTIVE: Our aim was to evaluate the performance of two AVS and an automatic classifier in the clinical routine condition of a memory clinic. METHODS: We studied 239 patients with cognitive troubles from a single memory center cohort. Using clinical routine T1-weighted MRI, we evaluated the classification performance of: 1) univariate volumetry using two AVS (volBrain and Neuroreader$^{TM}$); 2) Support Vector Machine (SVM) automatic classifier, using either the AVS volumes (SVM-AVS), or whole gray matter (SVM-WGM); 3) reading by two neuroradiologists. The performance measure was the balanced diagnostic accuracy. The reference standard was consensus diagnosis by three neurologists using clinical, biological (cerebrospinal fluid) and imaging data and following international criteria. RESULTS: Univariate AVS volumetry provided only moderate accuracies (46% to 71% with hippocampal volume). The accuracy improved when using SVM-AVS classifier (52% to 85%), becoming close to that of SVM-WGM (52% to 90%). Visual classification by neuroradiologists ranged between SVM-AVS and SVM-WGM. CONCLUSION: In the routine practice of a memory clinic, the use of volumetric measures provided by AVS yields only moderate accuracy. Automatic classifiers can improve accuracy and could be a useful tool to assist diagnosis.

Submitted 19 March, 2020; originally announced March 2020.

Journal ref: Journal of Alzheimer's Disease, IOS Press, 2020, pp.1-10

arXiv:1912.03201 [pdf, other]

doi 10.1007/978-3-030-01418-6_25

A Neural Spiking Approach Compared to Deep Feedforward Networks on Stepwise Pixel Erasement

Authors: René Larisch, Michael Teichmann, Fred H. Hamker

Abstract: In real world scenarios, objects are often partially occluded. This requires a robustness for object recognition against these perturbations. Convolutional networks have shown good performances in classification tasks. The learned convolutional filters seem similar to receptive fields of simple cells found in the primary visual cortex. Alternatively, spiking neural networks are more biologically plausible. We developed a two layer spiking network, trained on natural scenes with a biologically plausible learning rule. It is compared to two deep convolutional neural networks using a classification task of stepwise pixel erasement on MNIST. In comparison to these networks the spiking approach achieves good accuracy and robustness.

Submitted 6 December, 2019; originally announced December 2019.

Comments: Published in ICANN 2018: Artificial Neural Networks and Machine Learning - ICANN 2018 https://link.springer.com/chapter/10.1007/978-3-030-01418-6_25 The final authenticated publication is available online at https://doi.org/10.1007/978-3-030-01418-6_25

arXiv:1909.10239 [pdf, other]

Large Scale Joint Semantic Re-Localisation and Scene Understanding via Globally Unique Instance Coordinate Regression

Authors: Ignas Budvytis, Marvin Teichmann, Tomas Vojir, Roberto Cipolla

Abstract: In this work we present a novel approach to joint semantic localisation and scene understanding. Our work is motivated by the need for localisation algorithms which not only predict 6-DoF camera pose but also simultaneously recognise surrounding objects and estimate 3D geometry. Such capabilities are crucial for computer vision guided systems which interact with the environment: autonomous driving, augmented reality and robotics. In particular, we propose a two step procedure. During the first step we train a convolutional neural network to jointly predict per-pixel globally unique instance labels and corresponding local coordinates for each instance of a static object (e.g. a building). During the second step we obtain scene coordinates by combining object center coordinates and local coordinates and use them to perform 6-DoF camera pose estimation. We evaluate our approach on real world (CamVid-360) and artificial (SceneCity) autonomous driving datasets. We obtain smaller mean distance and angular errors than state-of-the-art 6-DoF pose estimation algorithms based on direct pose regression and pose estimation from scene coordinates on all datasets. Our contributions include: (i) a novel formulation of scene coordinate regression as two separate tasks of object instance recognition and local coordinate regression and a demonstration that our proposed solution allows to predict accurate 3D geometry of static objects and estimate 6-DoF pose of camera on (ii) maps larger by several orders of magnitude than previously attempted by scene coordinate regression methods, as well as on (iii) lightweight, approximate 3D maps built from 3D primitives such as building-aligned cuboids.

Submitted 23 September, 2019; originally announced September 2019.

Comments: BMVC 2019

arXiv:1812.01584 [pdf, other]

Detect-to-Retrieve: Efficient Regional Aggregation for Image Search

Authors: Marvin Teichmann, Andre Araujo, Menglong Zhu, Jack Sim

Abstract: Retrieving object instances among cluttered scenes efficiently requires compact yet comprehensive regional image representations. Intuitively, object semantics can help build the index that focuses on the most relevant regions. However, due to the lack of bounding-box datasets for objects of interest among retrieval benchmarks, most recent work on regional representations has focused on either uniform or class-agnostic region selection. In this paper, we first fill the void by providing a new dataset of landmark bounding boxes, based on the Google Landmarks dataset, that includes $86k$ images with manually curated boxes from $15k$ unique landmarks. Then, we demonstrate how a trained landmark detector, using our new dataset, can be leveraged to index image regions and improve retrieval accuracy while being much more efficient than existing regional methods. In addition, we introduce a novel regional aggregated selective match kernel (R-ASMK) to effectively combine information from detected regions into an improved holistic image representation. R-ASMK boosts image retrieval accuracy substantially with no dimensionality increase, while even outperforming systems that index image regions independently. Our complete image retrieval system improves upon the previous state-of-the-art by significant margins on the Revisited Oxford and Paris datasets. Code and data available at the project webpage: https://github.com/tensorflow/models/tree/master/research/delf.

Submitted 13 May, 2019; v1 submitted 4 December, 2018; originally announced December 2018.

Comments: CVPR 2019. Code and dataset available: https://github.com/tensorflow/models/tree/master/research/delf

arXiv:1805.04777 [pdf, other]

Convolutional CRFs for Semantic Segmentation

Authors: Marvin T. T. Teichmann, Roberto Cipolla

Abstract: For the challenging semantic image segmentation task the most efficient models have traditionally combined the structured modelling capabilities of Conditional Random Fields (CRFs) with the feature extraction power of CNNs. In more recent works however, CRF post-processing has fallen out of favour. We argue that this is mainly due to the slow training and inference speeds of CRFs, as well as the difficulty of learning the internal CRF parameters. To overcome both issues we propose to add the assumption of conditional independence to the framework of fully-connected CRFs. This allows us to reformulate the inference in terms of convolutions, which can be implemented highly efficiently on GPUs. Doing so speeds up inference and training by a factor of more than 100. All parameters of the convolutional CRFs can easily be optimized using backpropagation. To facilitate further CRF research we make our implementation publicly available. Please visit: https://github.com/MarvinTeichmann/ConvCRF

Submitted 15 May, 2018; v1 submitted 12 May, 2018; originally announced May 2018.

Comments: 8 Pages + Appendix, references. Code can be found under: https://github.com/MarvinTeichmann/ConvCRF

arXiv:1805.02475 [pdf, other]

Comparative evaluation of instrument segmentation and tracking methods in minimally invasive surgery

Authors: Sebastian Bodenstedt, Max Allan, Anthony Agustinos, Xiaofei Du, Luis Garcia-Peraza-Herrera, Hannes Kenngott, Thomas Kurmann, Beat Müller-Stich, Sebastien Ourselin, Daniil Pakhomov, Raphael Sznitman, Marvin Teichmann, Martin Thoma, Tom Vercauteren, Sandrine Voros, Martin Wagner, Pamela Wochner, Lena Maier-Hein, Danail Stoyanov, Stefanie Speidel

Abstract: Intraoperative segmentation and tracking of minimally invasive instruments is a prerequisite for computer- and robotic-assisted surgery. Since additional hardware like tracking systems or the robot encoders are cumbersome and lack accuracy, surgical vision is evolving as promising techniques to segment and track the instruments using only the endoscopic images. However, what is missing so far are common image data sets for consistent evaluation and benchmarking of algorithms against each other. The paper presents a comparative validation study of different vision-based methods for instrument segmentation and tracking in the context of robotic as well as conventional laparoscopic surgery. The contribution of the paper is twofold: we introduce a comprehensive validation data set that was provided to the study participants and present the results of the comparative validation study. Based on the results of the validation study, we arrive at the conclusion that modern deep learning approaches outperform other methods in instrument segmentation tasks, but the results are still not perfect. Furthermore, we show that merging results from different methods actually significantly increases accuracy in comparison to the best stand-alone method. On the other hand, the results of the instrument tracking task show that this is still an open challenge, especially during challenging scenarios in conventional laparoscopic surgery.

Submitted 7 May, 2018; originally announced May 2018.

arXiv:1612.07695 [pdf, other]

MultiNet: Real-time Joint Semantic Reasoning for Autonomous Driving

Authors: Marvin Teichmann, Michael Weber, Marius Zoellner, Roberto Cipolla, Raquel Urtasun

Abstract: While most approaches to semantic reasoning have focused on improving performance, in this paper we argue that computational times are very important in order to enable real time applications such as autonomous driving. Towards this goal, we present an approach to joint classification, detection and semantic segmentation via a unified architecture where the encoder is shared amongst the three tasks. Our approach is very simple, can be trained end-to-end and performs extremely well in the challenging KITTI dataset, outperforming the state-of-the-art in the road segmentation task. Our approach is also very efficient, taking less than 100 ms to perform all tasks.

Submitted 8 May, 2018; v1 submitted 22 December, 2016; originally announced December 2016.

Comments: 9 pages, 7 tables and 9 figures; first place on Kitti Road Segmentation; Code on GitHub (https://github.com/MarvinTeichmann/MultiNet)

arXiv:1511.00513 [pdf, other]

Pixel-wise Segmentation of Street with Neural Networks

Authors: Sebastian Bittel, Vitali Kaiser, Marvin Teichmann, Martin Thoma

Abstract: Pixel-wise street segmentation of photographs taken from a driver's perspective is important for self-driving cars and can also support other object recognition tasks. A framework called SST was developed to examine the accuracy and execution time of different neural networks. The best neural network achieved an $F_1$-score of 89.5% with a simple feedforward neural network which was trained to solve a regression task.

Submitted 2 November, 2015; originally announced November 2015.

Showing 1–10 of 10 results for author: Teichmann, M