Search | arXiv e-print repository

arXiv:2006.14480 [pdf, other]

One Thousand and One Hours: Self-driving Motion Prediction Dataset

Authors: John Houston, Guido Zuidhof, Luca Bergamini, Yawei Ye, Long Chen, Ashesh Jain, Sammy Omari, Vladimir Iglovikov, Peter Ondruska

Abstract: Motivated by the impact of large-scale datasets on ML systems we present the largest self-driving dataset for motion prediction to date, containing over 1,000 hours of data. This was collected by a fleet of 20 autonomous vehicles along a fixed route in Palo Alto, California, over a four-month period. It consists of 170,000 scenes, where each scene is 25 seconds long and captures the perception out… ▽ More Motivated by the impact of large-scale datasets on ML systems we present the largest self-driving dataset for motion prediction to date, containing over 1,000 hours of data. This was collected by a fleet of 20 autonomous vehicles along a fixed route in Palo Alto, California, over a four-month period. It consists of 170,000 scenes, where each scene is 25 seconds long and captures the perception output of the self-driving system, which encodes the precise positions and motions of nearby vehicles, cyclists, and pedestrians over time. On top of this, the dataset contains a high-definition semantic map with 15,242 labelled elements and a high-definition aerial view over the area. We show that using a dataset of this size dramatically improves performance for key self-driving problems. Combined with the provided software kit, this collection forms the largest and most detailed dataset to date for the development of self-driving machine learning tasks, such as motion forecasting, motion planning and simulation. The full dataset is available at http://level5.lyft.com/. △ Less

Submitted 16 November, 2020; v1 submitted 25 June, 2020; originally announced June 2020.

Comments: Presente at CoRL2020

arXiv:2001.11190 [pdf, other]

2018 Robotic Scene Segmentation Challenge

Authors: Max Allan, Satoshi Kondo, Sebastian Bodenstedt, Stefan Leger, Rahim Kadkhodamohammadi, Imanol Luengo, Felix Fuentes, Evangello Flouty, Ahmed Mohammed, Marius Pedersen, Avinash Kori, Varghese Alex, Ganapathy Krishnamurthi, David Rauber, Robert Mendel, Christoph Palm, Sophia Bano, Guinther Saibro, Chi-Sheng Shih, Hsun-An Chiang, Juntang Zhuang, Junlin Yang, Vladimir Iglovikov, Anton Dobrenkii, Madhu Reddiboina , et al. (16 additional authors not shown)

Abstract: In 2015 we began a sub-challenge at the EndoVis workshop at MICCAI in Munich using endoscope images of ex-vivo tissue with automatically generated annotations from robot forward kinematics and instrument CAD models. However, the limited background variation and simple motion rendered the dataset uninformative in learning about which techniques would be suitable for segmentation in real surgery. In… ▽ More In 2015 we began a sub-challenge at the EndoVis workshop at MICCAI in Munich using endoscope images of ex-vivo tissue with automatically generated annotations from robot forward kinematics and instrument CAD models. However, the limited background variation and simple motion rendered the dataset uninformative in learning about which techniques would be suitable for segmentation in real surgery. In 2017, at the same workshop in Quebec we introduced the robotic instrument segmentation dataset with 10 teams participating in the challenge to perform binary, articulating parts and type segmentation of da Vinci instruments. This challenge included realistic instrument motion and more complex porcine tissue as background and was widely addressed with modifications on U-Nets and other popular CNN architectures. In 2018 we added to the complexity by introducing a set of anatomical objects and medical devices to the segmented classes. To avoid over-complicating the challenge, we continued with porcine data which is dramatically simpler than human tissue due to the lack of fatty tissue occluding many organs. △ Less

Submitted 2 August, 2020; v1 submitted 30 January, 2020; originally announced January 2020.

arXiv:1905.01743 [pdf, other]

Breast Tumor Cellularity Assessment using Deep Neural Networks

Authors: Alexander Rakhlin, Aleksei Tiulpin, Alexey A. Shvets, Alexandr A. Kalinin, Vladimir I. Iglovikov, Sergey Nikolenko

Abstract: Breast cancer is one of the main causes of death worldwide. Histopathological cellularity assessment of residual tumors in post-surgical tissues is used to analyze a tumor's response to a therapy. Correct cellularity assessment increases the chances of getting an appropriate treatment and facilitates the patient's survival. In current clinical practice, tumor cellularity is manually estimated by p… ▽ More Breast cancer is one of the main causes of death worldwide. Histopathological cellularity assessment of residual tumors in post-surgical tissues is used to analyze a tumor's response to a therapy. Correct cellularity assessment increases the chances of getting an appropriate treatment and facilitates the patient's survival. In current clinical practice, tumor cellularity is manually estimated by pathologists; this process is tedious and prone to errors or low agreement rates between assessors. In this work, we evaluated three strong novel Deep Learning-based approaches for automatic assessment of tumor cellularity from post-treated breast surgical specimens stained with hematoxylin and eosin. We validated the proposed methods on the BreastPathQ SPIE challenge dataset that consisted of 2395 image patches selected from whole slide images acquired from 64 patients. Compared to expert pathologist scoring, our best performing method yielded the Cohen's kappa coefficient of 0.70 (vs. 0.42 previously known in literature) and the intra-class correlation coefficient of 0.89 (vs. 0.83). Our results suggest that Deep Learning-based methods have a significant potential to alleviate the burden on pathologists, enhance the diagnostic workflow, and, thereby, facilitate better clinical outcomes in breast cancer treatment. △ Less

Submitted 3 September, 2019; v1 submitted 5 May, 2019; originally announced May 2019.

arXiv:1902.06426 [pdf, other]

2017 Robotic Instrument Segmentation Challenge

Authors: Max Allan, Alex Shvets, Thomas Kurmann, Zichen Zhang, Rahul Duggal, Yun-Hsuan Su, Nicola Rieke, Iro Laina, Niveditha Kalavakonda, Sebastian Bodenstedt, Luis Herrera, Wenqi Li, Vladimir Iglovikov, Huoling Luo, Jian Yang, Danail Stoyanov, Lena Maier-Hein, Stefanie Speidel, Mahdi Azizian

Abstract: In mainstream computer vision and machine learning, public datasets such as ImageNet, COCO and KITTI have helped drive enormous improvements by enabling researchers to understand the strengths and limitations of different algorithms via performance comparison. However, this type of approach has had limited translation to problems in robotic assisted surgery as this field has never established the… ▽ More In mainstream computer vision and machine learning, public datasets such as ImageNet, COCO and KITTI have helped drive enormous improvements by enabling researchers to understand the strengths and limitations of different algorithms via performance comparison. However, this type of approach has had limited translation to problems in robotic assisted surgery as this field has never established the same level of common datasets and benchmarking methods. In 2015 a sub-challenge was introduced at the EndoVis workshop where a set of robotic images were provided with automatically generated annotations from robot forward kinematics. However, there were issues with this dataset due to the limited background variation, lack of complex motion and inaccuracies in the annotation. In this work we present the results of the 2017 challenge on robotic instrument segmentation which involved 10 teams participating in binary, parts and type based segmentation of articulated da Vinci robotic instruments. △ Less

Submitted 21 February, 2019; v1 submitted 18 February, 2019; originally announced February 2019.

arXiv:1810.02981 [pdf, other]

Camera Model Identification Using Convolutional Neural Networks

Authors: Artur Kuzin, Artur Fattakhov, Ilya Kibardin, Vladimir Iglovikov, Ruslan Dautov

Abstract: Source camera identification is the process of determining which camera or model has been used to capture an image. In the recent years, there has been a rapid growth of research interest in the domain of forensics. In the current work, we describe our Deep Learning approach to the camera detection task of 10 cameras as a part of the Camera Model Identification Challenge hosted by Kaggle.com where… ▽ More Source camera identification is the process of determining which camera or model has been used to capture an image. In the recent years, there has been a rapid growth of research interest in the domain of forensics. In the current work, we describe our Deep Learning approach to the camera detection task of 10 cameras as a part of the Camera Model Identification Challenge hosted by Kaggle.com where our team finished 2nd out of 582 teams with the accuracy on the unseen data of 98%. We used aggressive data augmentations that allowed a model to stay robust against transformations. A number of experiments are carried out on datasets collected by organizers and scraped from the web. △ Less

Submitted 9 December, 2018; v1 submitted 6 October, 2018; originally announced October 2018.

Comments: 5 pages, 2 figures

arXiv:1809.06839 [pdf, other]

doi 10.3390/info11020125

Albumentations: fast and flexible image augmentations

Authors: Alexander Buslaev, Alex Parinov, Eugene Khvedchenya, Vladimir I. Iglovikov, Alexandr A. Kalinin

Abstract: Data augmentation is a commonly used technique for increasing both the size and the diversity of labeled training sets by leveraging input transformations that preserve output labels. In computer vision domain, image augmentations have become a common implicit regularization technique to combat overfitting in deep convolutional neural networks and are ubiquitously used to improve performance. Whil… ▽ More Data augmentation is a commonly used technique for increasing both the size and the diversity of labeled training sets by leveraging input transformations that preserve output labels. In computer vision domain, image augmentations have become a common implicit regularization technique to combat overfitting in deep convolutional neural networks and are ubiquitously used to improve performance. While most deep learning frameworks implement basic image transformations, the list is typically limited to some variations and combinations of flipping, rotating, scaling, and cropping. Moreover, the image processing speed varies in existing tools for image augmentation. We present Albumentations, a fast and flexible library for image augmentations with many various image transform operations available, that is also an easy-to-use wrapper around other augmentation libraries. We provide examples of image augmentations for different computer vision tasks and show that Albumentations is faster than other commonly used image augmentation tools on the most of commonly used image transformations. The source code for Albumentations is made publicly available online at https://github.com/albu/albumentations △ Less

Submitted 18 September, 2018; originally announced September 2018.

arXiv:1806.05182 [pdf, other]

Fully Convolutional Network for Automatic Road Extraction from Satellite Imagery

Authors: Alexander V. Buslaev, Selim S. Seferbekov, Vladimir I. Iglovikov, Alexey A. Shvets

Abstract: Analysis of high-resolution satellite images has been an important research topic for traffic management, city planning, and road monitoring. One of the problems here is automatic and precise road extraction. From an original image, it is difficult and computationally expensive to extract roads due to presences of other road-like features with straight edges. In this paper, we propose an approach… ▽ More Analysis of high-resolution satellite images has been an important research topic for traffic management, city planning, and road monitoring. One of the problems here is automatic and precise road extraction. From an original image, it is difficult and computationally expensive to extract roads due to presences of other road-like features with straight edges. In this paper, we propose an approach for automatic road extraction based on a fully convolutional neural network of U-net family. This network consists of ResNet-34 pre-trained on ImageNet and decoder adapted from vanilla U-Net. Based on validation results, leaderboard and our own experience this network shows superior results for the DEEPGLOBE - CVPR 2018 road extraction sub-challenge. Moreover, this network uses moderate memory that allows using just one GTX 1080 or 1080ti video cards to perform whole training and makes pretty fast predictions. △ Less

Submitted 19 June, 2018; v1 submitted 13 June, 2018; originally announced June 2018.

Comments: arXiv admin note: substantial text overlap with arXiv:1806.03510, arXiv:1804.08024, arXiv:1801.05746, arXiv:1803.01207

arXiv:1806.03510 [pdf, other]

Feature Pyramid Network for Multi-Class Land Segmentation

Authors: Selim S. Seferbekov, Vladimir I. Iglovikov, Alexander V. Buslaev, Alexey A. Shvets

Abstract: Semantic segmentation is in-demand in satellite imagery processing. Because of the complex environment, automatic categorization and segmentation of land cover is a challenging problem. Solving it can help to overcome many obstacles in urban planning, environmental engineering or natural landscape monitoring. In this paper, we propose an approach for automatic multi-class land segmentation based o… ▽ More Semantic segmentation is in-demand in satellite imagery processing. Because of the complex environment, automatic categorization and segmentation of land cover is a challenging problem. Solving it can help to overcome many obstacles in urban planning, environmental engineering or natural landscape monitoring. In this paper, we propose an approach for automatic multi-class land segmentation based on a fully convolutional neural network of feature pyramid network (FPN) family. This network is consisted of pre-trained on ImageNet Resnet50 encoder and neatly developed decoder. Based on validation results, leaderboard score and our own experience this network shows reliable results for the DEEPGLOBE - CVPR 2018 land cover classification sub-challenge. Moreover, this network moderately uses memory that allows using GTX 1080 or 1080 TI video cards to perform whole training and makes pretty fast predictions. △ Less

Submitted 19 June, 2018; v1 submitted 9 June, 2018; originally announced June 2018.

arXiv:1806.00844 [pdf, other]

TernausNetV2: Fully Convolutional Network for Instance Segmentation

Authors: Vladimir I. Iglovikov, Selim Seferbekov, Alexander V. Buslaev, Alexey Shvets

Abstract: The most common approaches to instance segmentation are complex and use two-stage networks with object proposals, conditional random-fields, template matching or recurrent neural networks. In this work we present TernausNetV2 - a simple fully convolutional network that allows extracting objects from a high-resolution satellite imagery on an instance level. The network has popular encoder-decoder t… ▽ More The most common approaches to instance segmentation are complex and use two-stage networks with object proposals, conditional random-fields, template matching or recurrent neural networks. In this work we present TernausNetV2 - a simple fully convolutional network that allows extracting objects from a high-resolution satellite imagery on an instance level. The network has popular encoder-decoder type of architecture with skip connections but has a few essential modifications that allows using for semantic as well as for instance segmentation tasks. This approach is universal and allows to extend any network that has been successfully applied for semantic segmentation to perform instance segmentation task. In addition, we generalize network encoder that was pre-trained for RGB images to use additional input channels. It makes possible to use transfer learning from visual to a wider spectral range. For DeepGlobe-CVPR 2018 building detection sub-challenge, based on public leaderboard score, our approach shows superior performance in comparison to other methods. The source code corresponding pre-trained weights are publicly available at https://github.com/ternaus/TernausNetV2 △ Less

Submitted 19 June, 2018; v1 submitted 3 June, 2018; originally announced June 2018.

arXiv:1804.08024 [pdf, other]

doi 10.1109/ICMLA.2018.00098

Angiodysplasia Detection and Localization Using Deep Convolutional Neural Networks

Authors: Alexey Shvets, Vladimir Iglovikov, Alexander Rakhlin, Alexandr A. Kalinin

Abstract: Accurate detection and localization for angiodysplasia lesions is an important problem in early stage diagnostics of gastrointestinal bleeding and anemia. Gold-standard for angiodysplasia detection and localization is performed using wireless capsule endoscopy. This pill-like device is able to produce thousand of high enough resolution images during one passage through gastrointestinal tract. In t… ▽ More Accurate detection and localization for angiodysplasia lesions is an important problem in early stage diagnostics of gastrointestinal bleeding and anemia. Gold-standard for angiodysplasia detection and localization is performed using wireless capsule endoscopy. This pill-like device is able to produce thousand of high enough resolution images during one passage through gastrointestinal tract. In this paper we present our winning solution for MICCAI 2017 Endoscopic Vision SubChallenge: Angiodysplasia Detection and Localization its further improvements over the state-of-the-art results using several novel deep neural network architectures. It address the binary segmentation problem, where every pixel in an image is labeled as an angiodysplasia lesions or background. Then, we analyze connected component of each predicted mask. Based on the analysis we developed a classifier that predict angiodysplasia lesions (binary variable) and a detector for their localization (center of a component). In this setting, our approach outperforms other methods in every task subcategory for angiodysplasia detection and localization thereby providing state-of-the-art results for these problems. The source code for our solution is made publicly available at https://github.com/ternaus/angiodysplasia-segmentatio △ Less

Submitted 21 April, 2018; originally announced April 2018.

Comments: 12 pages, 6 figures

Journal ref: 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA)

arXiv:1803.01207 [pdf, other]

doi 10.1109/ICMLA.2018.00100

Automatic Instrument Segmentation in Robot-Assisted Surgery Using Deep Learning

Authors: Alexey Shvets, Alexander Rakhlin, Alexandr A. Kalinin, Vladimir Iglovikov

Abstract: Semantic segmentation of robotic instruments is an important problem for the robot-assisted surgery. One of the main challenges is to correctly detect an instrument's position for the tracking and pose estimation in the vicinity of surgical scenes. Accurate pixel-wise instrument segmentation is needed to address this challenge. In this paper we describe our winning solution for MICCAI 2017 Endosco… ▽ More Semantic segmentation of robotic instruments is an important problem for the robot-assisted surgery. One of the main challenges is to correctly detect an instrument's position for the tracking and pose estimation in the vicinity of surgical scenes. Accurate pixel-wise instrument segmentation is needed to address this challenge. In this paper we describe our winning solution for MICCAI 2017 Endoscopic Vision SubChallenge: Robotic Instrument Segmentation. Our approach demonstrates an improvement over the state-of-the-art results using several novel deep neural network architectures. It addressed the binary segmentation problem, where every pixel in an image is labeled as an instrument or background from the surgery video feed. In addition, we solve a multi-class segmentation problem, where we distinguish different instruments or different parts of an instrument from the background. In this setting, our approach outperforms other methods in every task subcategory for automatic instrument segmentation thereby providing state-of-the-art solution for this problem. The source code for our solution is made publicly available at https://github.com/ternaus/robot-surgery-segmentation △ Less

Submitted 19 June, 2018; v1 submitted 3 March, 2018; originally announced March 2018.

Comments: 9 pages, 3 figures. arXiv admin note: substantial text overlap with arXiv:1804.08024

Journal ref: 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA)

arXiv:1802.00752 [pdf, other]

doi 10.1007/978-3-319-93000-8_83

Deep Convolutional Neural Networks for Breast Cancer Histology Image Analysis

Authors: Alexander Rakhlin, Alexey Shvets, Vladimir Iglovikov, Alexandr A. Kalinin

Abstract: Breast cancer is one of the main causes of cancer death worldwide. Early diagnostics significantly increases the chances of correct treatment and survival, but this process is tedious and often leads to a disagreement between pathologists. Computer-aided diagnosis systems showed potential for improving the diagnostic accuracy. In this work, we develop the computational approach based on deep convo… ▽ More Breast cancer is one of the main causes of cancer death worldwide. Early diagnostics significantly increases the chances of correct treatment and survival, but this process is tedious and often leads to a disagreement between pathologists. Computer-aided diagnosis systems showed potential for improving the diagnostic accuracy. In this work, we develop the computational approach based on deep convolution neural networks for breast cancer histology image classification. Hematoxylin and eosin stained breast histology microscopy image dataset is provided as a part of the ICIAR 2018 Grand Challenge on Breast Cancer Histology Images. Our approach utilizes several deep neural network architectures and gradient boosted trees classifier. For 4-class classification task, we report 87.2% accuracy. For 2-class classification task to detect carcinomas we report 93.8% accuracy, AUC 97.3%, and sensitivity/specificity 96.5/88.0% at the high-sensitivity operating point. To our knowledge, this approach outperforms other common methods in automated histopathological image classification. The source code for our approach is made publicly available at https://github.com/alexander-rakhlin/ICIAR2018 △ Less

Submitted 3 April, 2018; v1 submitted 2 February, 2018; originally announced February 2018.

Comments: 8 pages, 4 figures

arXiv:1801.05746 [pdf, other]

TernausNet: U-Net with VGG11 Encoder Pre-Trained on ImageNet for Image Segmentation

Authors: Vladimir Iglovikov, Alexey Shvets

Abstract: Pixel-wise image segmentation is demanding task in computer vision. Classical U-Net architectures composed of encoders and decoders are very popular for segmentation of medical images, satellite images etc. Typically, neural network initialized with weights from a network pre-trained on a large data set like ImageNet shows better performance than those trained from scratch on a small dataset. In s… ▽ More Pixel-wise image segmentation is demanding task in computer vision. Classical U-Net architectures composed of encoders and decoders are very popular for segmentation of medical images, satellite images etc. Typically, neural network initialized with weights from a network pre-trained on a large data set like ImageNet shows better performance than those trained from scratch on a small dataset. In some practical applications, particularly in medicine and traffic safety, the accuracy of the models is of utmost importance. In this paper, we demonstrate how the U-Net type architecture can be improved by the use of the pre-trained encoder. Our code and corresponding pre-trained weights are publicly available at https://github.com/ternaus/TernausNet. We compare three weight initialization schemes: LeCun uniform, the encoder with weights from VGG11 and full network trained on the Carvana dataset. This network architecture was a part of the winning solution (1st out of 735) in the Kaggle: Carvana Image Masking Challenge. △ Less

Submitted 17 January, 2018; originally announced January 2018.

Comments: 5 pages, 4 figures

arXiv:1712.05053 [pdf, other]

doi 10.1007/978-3-030-00889-5_34

Pediatric Bone Age Assessment Using Deep Convolutional Neural Networks

Authors: Vladimir Iglovikov, Alexander Rakhlin, Alexandr Kalinin, Alexey Shvets

Abstract: Skeletal bone age assessment is a common clinical practice to diagnose endocrine and metabolic disorders in child development. In this paper, we describe a fully automated deep learning approach to the problem of bone age assessment using data from Pediatric Bone Age Challenge organized by RSNA 2017. The dataset for this competition is consisted of 12.6k radiological images of left hand labeled by… ▽ More Skeletal bone age assessment is a common clinical practice to diagnose endocrine and metabolic disorders in child development. In this paper, we describe a fully automated deep learning approach to the problem of bone age assessment using data from Pediatric Bone Age Challenge organized by RSNA 2017. The dataset for this competition is consisted of 12.6k radiological images of left hand labeled by the bone age and sex of patients. Our approach utilizes several deep learning architectures: U-Net, ResNet-50, and custom VGG-style neural networks trained end-to-end. We use images of whole hands as well as specific parts of a hand for both training and inference. This approach allows us to measure importance of specific hand bones for the automated bone age analysis. We further evaluate performance of the method in the context of skeletal development stages. Our approach outperforms other common methods for bone age assessment. △ Less

Submitted 19 June, 2018; v1 submitted 13 December, 2017; originally announced December 2017.

Comments: 14 pages, 9 figures

arXiv:1706.06169 [pdf, other]

Satellite Imagery Feature Detection using Deep Convolutional Neural Network: A Kaggle Competition

Authors: Vladimir Iglovikov, Sergey Mushinskiy, Vladimir Osin

Abstract: This paper describes our approach to the DSTL Satellite Imagery Feature Detection challenge run by Kaggle. The primary goal of this challenge is accurate semantic segmentation of different classes in satellite imagery. Our approach is based on an adaptation of fully convolutional neural network for multispectral data processing. In addition, we defined several modifications to the training objecti… ▽ More This paper describes our approach to the DSTL Satellite Imagery Feature Detection challenge run by Kaggle. The primary goal of this challenge is accurate semantic segmentation of different classes in satellite imagery. Our approach is based on an adaptation of fully convolutional neural network for multispectral data processing. In addition, we defined several modifications to the training objective and overall training pipeline, e.g. boundary effect estimation, also we discuss usage of data augmentation strategies and reflectance indices. Our solution scored third place out of 419 entries. Its accuracy is comparable to the first two places, but unlike those solutions, it doesn't rely on complex ensembling techniques and thus can be easily scaled for deployment in production as a part of automatic feature labeling systems for satellite imagery analysis. △ Less

Submitted 19 June, 2017; originally announced June 2017.

Showing 1–15 of 15 results for author: Iglovikov, V