-
NCT-CRC-HE: Not All Histopathological Datasets Are Equally Useful
Authors:
Andrey Ignatov,
Grigory Malivenko
Abstract:
Numerous deep learning-based solutions have been proposed for histopathological image analysis over the past years. While they usually demonstrate exceptionally high accuracy, one key question is whether their precision might be affected by low-level image properties not related to histopathology but caused by microscopy image handling and pre-processing. In this paper, we analyze a popular NCT-CR…
▽ More
Numerous deep learning-based solutions have been proposed for histopathological image analysis over the past years. While they usually demonstrate exceptionally high accuracy, one key question is whether their precision might be affected by low-level image properties not related to histopathology but caused by microscopy image handling and pre-processing. In this paper, we analyze a popular NCT-CRC-HE-100K colorectal cancer dataset used in numerous prior works and show that both this dataset and the obtained results may be affected by data-specific biases. The most prominent revealed dataset issues are inappropriate color normalization, severe JPEG artifacts inconsistent between different classes, and completely corrupted tissue samples resulting from incorrect image dynamic range handling. We show that even the simplest model using only 3 features per image (red, green and blue color intensities) can demonstrate over 50% accuracy on this 9-class dataset, while using color histogram not explicitly capturing cell morphology features yields over 82% accuracy. Moreover, we show that a basic EfficientNet-B0 ImageNet pretrained model can achieve over 97.7% accuracy on this dataset, outperforming all previously proposed solutions developed for this task, including dedicated foundation histopathological models and large cell morphology-aware neural networks. The NCT-CRC-HE dataset is publicly available and can be freely used to replicate the presented results. The codes and pre-trained models used in this paper are available at https://github.com/gmalivenko/NCT-CRC-HE-experiments
△ Less
Submitted 17 September, 2024;
originally announced September 2024.
-
Histopathological Image Classification with Cell Morphology Aware Deep Neural Networks
Authors:
Andrey Ignatov,
Josephine Yates,
Valentina Boeva
Abstract:
Histopathological images are widely used for the analysis of diseased (tumor) tissues and patient treatment selection. While the majority of microscopy image processing was previously done manually by pathologists, recent advances in computer vision allow for accurate recognition of lesion regions with deep learning-based solutions. Such models, however, usually require extensive annotated dataset…
▽ More
Histopathological images are widely used for the analysis of diseased (tumor) tissues and patient treatment selection. While the majority of microscopy image processing was previously done manually by pathologists, recent advances in computer vision allow for accurate recognition of lesion regions with deep learning-based solutions. Such models, however, usually require extensive annotated datasets for training, which is often not the case in the considered task, where the number of available patient data samples is very limited. To deal with this problem, we propose a novel DeepCMorph model pre-trained to learn cell morphology and identify a large number of different cancer types. The model consists of two modules: the first one performs cell nuclei segmentation and annotates each cell type, and is trained on a combination of 8 publicly available datasets to ensure its high generalizability and robustness. The second module combines the obtained segmentation map with the original microscopy image and is trained for the downstream task. We pre-trained this module on the Pan-Cancer TCGA dataset consisting of over 270K tissue patches extracted from 8736 diagnostic slides from 7175 patients. The proposed solution achieved a new state-of-the-art performance on the dataset under consideration, detecting 32 cancer types with over 82% accuracy and outperforming all previously proposed solutions by more than 4%. We demonstrate that the resulting pre-trained model can be easily fine-tuned on smaller microscopy datasets, yielding superior results compared to the current top solutions and models initialized with ImageNet weights. The codes and pre-trained models presented in this paper are available at: https://github.com/aiff22/DeepCMorph
△ Less
Submitted 11 July, 2024;
originally announced July 2024.
-
Virtually Enriched NYU Depth V2 Dataset for Monocular Depth Estimation: Do We Need Artificial Augmentation?
Authors:
Dmitry Ignatov,
Andrey Ignatov,
Radu Timofte
Abstract:
We present ANYU, a new virtually augmented version of the NYU depth v2 dataset, designed for monocular depth estimation. In contrast to the well-known approach where full 3D scenes of a virtual world are utilized to generate artificial datasets, ANYU was created by incorporating RGB-D representations of virtual reality objects into the original NYU depth v2 images. We specifically did not match ea…
▽ More
We present ANYU, a new virtually augmented version of the NYU depth v2 dataset, designed for monocular depth estimation. In contrast to the well-known approach where full 3D scenes of a virtual world are utilized to generate artificial datasets, ANYU was created by incorporating RGB-D representations of virtual reality objects into the original NYU depth v2 images. We specifically did not match each generated virtual object with an appropriate texture and a suitable location within the real-world image. Instead, an assignment of texture, location, lighting, and other rendering parameters was randomized to maximize a diversity of the training data, and to show that it is randomness that can improve the generalizing ability of a dataset. By conducting extensive experiments with our virtually modified dataset and validating on the original NYU depth v2 and iBims-1 benchmarks, we show that ANYU improves the monocular depth estimation performance and generalization of deep neural networks with considerably different architectures, especially for the current state-of-the-art VPD model. To the best of our knowledge, this is the first work that augments a real-world dataset with randomly generated virtual 3D objects for monocular depth estimation. We make our ANYU dataset publicly available in two training configurations with 10% and 100% additional synthetically enriched RGB-D pairs of training images, respectively, for efficient training and empirical exploration of virtual augmentation at https://github.com/ABrain-One/ANYU
△ Less
Submitted 15 April, 2024;
originally announced April 2024.
-
MicroISP: Processing 32MP Photos on Mobile Devices with Deep Learning
Authors:
Andrey Ignatov,
Anastasia Sycheva,
Radu Timofte,
Yu Tseng,
Yu-Syuan Xu,
Po-Hsiang Yu,
Cheng-Ming Chiang,
Hsien-Kai Kuo,
Min-Hung Chen,
Chia-Ming Cheng,
Luc Van Gool
Abstract:
While neural networks-based photo processing solutions can provide a better image quality compared to the traditional ISP systems, their application to mobile devices is still very limited due to their very high computational complexity. In this paper, we present a novel MicroISP model designed specifically for edge devices, taking into account their computational and memory limitations. The propo…
▽ More
While neural networks-based photo processing solutions can provide a better image quality compared to the traditional ISP systems, their application to mobile devices is still very limited due to their very high computational complexity. In this paper, we present a novel MicroISP model designed specifically for edge devices, taking into account their computational and memory limitations. The proposed solution is capable of processing up to 32MP photos on recent smartphones using the standard mobile ML libraries and requiring less than 1 second to perform the inference, while for FullHD images it achieves real-time performance. The architecture of the model is flexible, allowing to adjust its complexity to devices of different computational power. To evaluate the performance of the model, we collected a novel Fujifilm UltraISP dataset consisting of thousands of paired photos captured with a normal mobile camera sensor and a professional 102MP medium-format FujiFilm GFX100 camera. The experiments demonstrated that, despite its compact size, the MicroISP model is able to provide comparable or better visual results than the traditional mobile ISP systems, while outperforming the previously proposed efficient deep learning based solutions. Finally, this model is also compatible with the latest mobile AI accelerators, achieving good runtime and low power consumption on smartphone NPUs and APUs. The code, dataset and pre-trained models are available on the project website: https://people.ee.ethz.ch/~ihnatova/microisp.html
△ Less
Submitted 8 November, 2022;
originally announced November 2022.
-
Realistic Bokeh Effect Rendering on Mobile GPUs, Mobile AI & AIM 2022 challenge: Report
Authors:
Andrey Ignatov,
Radu Timofte,
Jin Zhang,
Feng Zhang,
Gaocheng Yu,
Zhe Ma,
Hongbin Wang,
Minsu Kwon,
Haotian Qian,
Wentao Tong,
Pan Mu,
Ziping Wang,
Guangjing Yan,
Brian Lee,
Lei Fei,
Huaijin Chen,
Hyebin Cho,
Byeongjun Kwon,
Munchurl Kim,
Mingyang Qian,
Huixin Ma,
Yanan Li,
Xiaotao Wang,
Lei Lei
Abstract:
As mobile cameras with compact optics are unable to produce a strong bokeh effect, lots of interest is now devoted to deep learning-based solutions for this task. In this Mobile AI challenge, the target was to develop an efficient end-to-end AI-based bokeh effect rendering approach that can run on modern smartphone GPUs using TensorFlow Lite. The participants were provided with a large-scale EBB!…
▽ More
As mobile cameras with compact optics are unable to produce a strong bokeh effect, lots of interest is now devoted to deep learning-based solutions for this task. In this Mobile AI challenge, the target was to develop an efficient end-to-end AI-based bokeh effect rendering approach that can run on modern smartphone GPUs using TensorFlow Lite. The participants were provided with a large-scale EBB! bokeh dataset consisting of 5K shallow / wide depth-of-field image pairs captured using the Canon 7D DSLR camera. The runtime of the resulting models was evaluated on the Kirin 9000's Mali GPU that provides excellent acceleration results for the majority of common deep learning ops. A detailed description of all models developed in this challenge is provided in this paper.
△ Less
Submitted 7 November, 2022;
originally announced November 2022.
-
PyNet-V2 Mobile: Efficient On-Device Photo Processing With Neural Networks
Authors:
Andrey Ignatov,
Grigory Malivenko,
Radu Timofte,
Yu Tseng,
Yu-Syuan Xu,
Po-Hsiang Yu,
Cheng-Ming Chiang,
Hsien-Kai Kuo,
Min-Hung Chen,
Chia-Ming Cheng,
Luc Van Gool
Abstract:
The increased importance of mobile photography created a need for fast and performant RAW image processing pipelines capable of producing good visual results in spite of the mobile camera sensor limitations. While deep learning-based approaches can efficiently solve this problem, their computational requirements usually remain too large for high-resolution on-device image processing. To address th…
▽ More
The increased importance of mobile photography created a need for fast and performant RAW image processing pipelines capable of producing good visual results in spite of the mobile camera sensor limitations. While deep learning-based approaches can efficiently solve this problem, their computational requirements usually remain too large for high-resolution on-device image processing. To address this limitation, we propose a novel PyNET-V2 Mobile CNN architecture designed specifically for edge devices, being able to process RAW 12MP photos directly on mobile phones under 1.5 second and producing high perceptual photo quality. To train and to evaluate the performance of the proposed solution, we use the real-world Fujifilm UltraISP dataset consisting on thousands of RAW-RGB image pairs captured with a professional medium-format 102MP Fujifilm camera and a popular Sony mobile camera sensor. The results demonstrate that the PyNET-V2 Mobile model can substantially surpass the quality of tradition ISP pipelines, while outperforming the previously introduced neural network-based solutions designed for fast image processing. Furthermore, we show that the proposed architecture is also compatible with the latest mobile AI accelerators such as NPUs or APUs that can be used to further reduce the latency of the model to as little as 0.5 second. The dataset, code and pre-trained models used in this paper are available on the project website: https://github.com/gmalivenko/PyNET-v2
△ Less
Submitted 8 November, 2022;
originally announced November 2022.
-
Efficient and Accurate Quantized Image Super-Resolution on Mobile NPUs, Mobile AI & AIM 2022 challenge: Report
Authors:
Andrey Ignatov,
Radu Timofte,
Maurizio Denna,
Abdel Younes,
Ganzorig Gankhuyag,
Jingang Huh,
Myeong Kyun Kim,
Kihwan Yoon,
Hyeon-Cheol Moon,
Seungho Lee,
Yoonsik Choe,
Jinwoo Jeong,
Sungjei Kim,
Maciej Smyl,
Tomasz Latkowski,
Pawel Kubik,
Michal Sokolski,
Yujie Ma,
Jiahao Chao,
Zhou Zhou,
Hongfan Gao,
Zhengfeng Yang,
Zhenbing Zeng,
Zhengyang Zhuge,
Chenghua Li
, et al. (71 additional authors not shown)
Abstract:
Image super-resolution is a common task on mobile and IoT devices, where one often needs to upscale and enhance low-resolution images and video frames. While numerous solutions have been proposed for this problem in the past, they are usually not compatible with low-power mobile NPUs having many computational and memory constraints. In this Mobile AI challenge, we address this problem and propose…
▽ More
Image super-resolution is a common task on mobile and IoT devices, where one often needs to upscale and enhance low-resolution images and video frames. While numerous solutions have been proposed for this problem in the past, they are usually not compatible with low-power mobile NPUs having many computational and memory constraints. In this Mobile AI challenge, we address this problem and propose the participants to design an efficient quantized image super-resolution solution that can demonstrate a real-time performance on mobile NPUs. The participants were provided with the DIV2K dataset and trained INT8 models to do a high-quality 3X image upscaling. The runtime of all models was evaluated on the Synaptics VS680 Smart Home board with a dedicated edge NPU capable of accelerating quantized neural networks. All proposed solutions are fully compatible with the above NPU, demonstrating an up to 60 FPS rate when reconstructing Full HD resolution images. A detailed description of all models developed in the challenge is provided in this paper.
△ Less
Submitted 7 November, 2022;
originally announced November 2022.
-
Power Efficient Video Super-Resolution on Mobile NPUs with Deep Learning, Mobile AI & AIM 2022 challenge: Report
Authors:
Andrey Ignatov,
Radu Timofte,
Cheng-Ming Chiang,
Hsien-Kai Kuo,
Yu-Syuan Xu,
Man-Yu Lee,
Allen Lu,
Chia-Ming Cheng,
Chih-Cheng Chen,
Jia-Ying Yong,
Hong-Han Shuai,
Wen-Huang Cheng,
Zhuang Jia,
Tianyu Xu,
Yijian Zhang,
Long Bao,
Heng Sun,
Diankai Zhang,
Si Gao,
Shaoli Liu,
Biao Wu,
Xiaofeng Zhang,
Chengjian Zheng,
Kaidi Lu,
Ning Wang
, et al. (29 additional authors not shown)
Abstract:
Video super-resolution is one of the most popular tasks on mobile devices, being widely used for an automatic improvement of low-bitrate and low-resolution video streams. While numerous solutions have been proposed for this problem, they are usually quite computationally demanding, demonstrating low FPS rates and power efficiency on mobile devices. In this Mobile AI challenge, we address this prob…
▽ More
Video super-resolution is one of the most popular tasks on mobile devices, being widely used for an automatic improvement of low-bitrate and low-resolution video streams. While numerous solutions have been proposed for this problem, they are usually quite computationally demanding, demonstrating low FPS rates and power efficiency on mobile devices. In this Mobile AI challenge, we address this problem and propose the participants to design an end-to-end real-time video super-resolution solution for mobile NPUs optimized for low energy consumption. The participants were provided with the REDS training dataset containing video sequences for a 4X video upscaling task. The runtime and power efficiency of all models was evaluated on the powerful MediaTek Dimensity 9000 platform with a dedicated AI processing unit capable of accelerating floating-point and quantized neural networks. All proposed solutions are fully compatible with the above NPU, demonstrating an up to 500 FPS rate and 0.2 [Watt / 30 FPS] power consumption. A detailed description of all models developed in the challenge is provided in this paper.
△ Less
Submitted 7 November, 2022;
originally announced November 2022.
-
Efficient Single-Image Depth Estimation on Mobile Devices, Mobile AI & AIM 2022 Challenge: Report
Authors:
Andrey Ignatov,
Grigory Malivenko,
Radu Timofte,
Lukasz Treszczotko,
Xin Chang,
Piotr Ksiazek,
Michal Lopuszynski,
Maciej Pioro,
Rafal Rudnicki,
Maciej Smyl,
Yujie Ma,
Zhenyu Li,
Zehui Chen,
Jialei Xu,
Xianming Liu,
Junjun Jiang,
XueChao Shi,
Difan Xu,
Yanan Li,
Xiaotao Wang,
Lei Lei,
Ziyu Zhang,
Yicheng Wang,
Zilong Huang,
Guozhong Luo
, et al. (14 additional authors not shown)
Abstract:
Various depth estimation models are now widely used on many mobile and IoT devices for image segmentation, bokeh effect rendering, object tracking and many other mobile tasks. Thus, it is very crucial to have efficient and accurate depth estimation models that can run fast on low-power mobile chipsets. In this Mobile AI challenge, the target was to develop deep learning-based single image depth es…
▽ More
Various depth estimation models are now widely used on many mobile and IoT devices for image segmentation, bokeh effect rendering, object tracking and many other mobile tasks. Thus, it is very crucial to have efficient and accurate depth estimation models that can run fast on low-power mobile chipsets. In this Mobile AI challenge, the target was to develop deep learning-based single image depth estimation solutions that can show a real-time performance on IoT platforms and smartphones. For this, the participants used a large-scale RGB-to-depth dataset that was collected with the ZED stereo camera capable to generated depth maps for objects located at up to 50 meters. The runtime of all models was evaluated on the Raspberry Pi 4 platform, where the developed solutions were able to generate VGA resolution depth maps at up to 27 FPS while achieving high fidelity results. All models developed in the challenge are also compatible with any Android or Linux-based mobile devices, their detailed description is provided in this paper.
△ Less
Submitted 7 November, 2022;
originally announced November 2022.
-
Learned Smartphone ISP on Mobile GPUs with Deep Learning, Mobile AI & AIM 2022 Challenge: Report
Authors:
Andrey Ignatov,
Radu Timofte,
Shuai Liu,
Chaoyu Feng,
Furui Bai,
Xiaotao Wang,
Lei Lei,
Ziyao Yi,
Yan Xiang,
Zibin Liu,
Shaoqing Li,
Keming Shi,
Dehui Kong,
Ke Xu,
Minsu Kwon,
Yaqi Wu,
Jiesi Zheng,
Zhihao Fan,
Xun Wu,
Feng Zhang,
Albert No,
Minhyeok Cho,
Zewen Chen,
Xiaze Zhang,
Ran Li
, et al. (13 additional authors not shown)
Abstract:
The role of mobile cameras increased dramatically over the past few years, leading to more and more research in automatic image quality enhancement and RAW photo processing. In this Mobile AI challenge, the target was to develop an efficient end-to-end AI-based image signal processing (ISP) pipeline replacing the standard mobile ISPs that can run on modern smartphone GPUs using TensorFlow Lite. Th…
▽ More
The role of mobile cameras increased dramatically over the past few years, leading to more and more research in automatic image quality enhancement and RAW photo processing. In this Mobile AI challenge, the target was to develop an efficient end-to-end AI-based image signal processing (ISP) pipeline replacing the standard mobile ISPs that can run on modern smartphone GPUs using TensorFlow Lite. The participants were provided with a large-scale Fujifilm UltraISP dataset consisting of thousands of paired photos captured with a normal mobile camera sensor and a professional 102MP medium-format FujiFilm GFX100 camera. The runtime of the resulting models was evaluated on the Snapdragon's 8 Gen 1 GPU that provides excellent acceleration results for the majority of common deep learning ops. The proposed solutions are compatible with all recent mobile GPUs, being able to process Full HD photos in less than 20-50 milliseconds while achieving high fidelity results. A detailed description of all models developed in this challenge is provided in this paper.
△ Less
Submitted 7 November, 2022;
originally announced November 2022.
-
Real-Time Video Super-Resolution on Smartphones with Deep Learning, Mobile AI 2021 Challenge: Report
Authors:
Andrey Ignatov,
Andres Romero,
Heewon Kim,
Radu Timofte,
Chiu Man Ho,
Zibo Meng,
Kyoung Mu Lee,
Yuxiang Chen,
Yutong Wang,
Zeyu Long,
Chenhao Wang,
Yifei Chen,
Boshen Xu,
Shuhang Gu,
Lixin Duan,
Wen Li,
Wang Bofei,
Zhang Diankai,
Zheng Chengjian,
Liu Shaoli,
Gao Si,
Zhang Xiaofeng,
Lu Kaidi,
Xu Tianyu,
Zheng Hui
, et al. (6 additional authors not shown)
Abstract:
Video super-resolution has recently become one of the most important mobile-related problems due to the rise of video communication and streaming services. While many solutions have been proposed for this task, the majority of them are too computationally expensive to run on portable devices with limited hardware resources. To address this problem, we introduce the first Mobile AI challenge, where…
▽ More
Video super-resolution has recently become one of the most important mobile-related problems due to the rise of video communication and streaming services. While many solutions have been proposed for this task, the majority of them are too computationally expensive to run on portable devices with limited hardware resources. To address this problem, we introduce the first Mobile AI challenge, where the target is to develop an end-to-end deep learning-based video super-resolution solutions that can achieve a real-time performance on mobile GPUs. The participants were provided with the REDS dataset and trained their models to do an efficient 4X video upscaling. The runtime of all models was evaluated on the OPPO Find X2 smartphone with the Snapdragon 865 SoC capable of accelerating floating-point networks on its Adreno GPU. The proposed solutions are fully compatible with any mobile GPU and can upscale videos to HD resolution at up to 80 FPS while demonstrating high fidelity results. A detailed description of all models developed in the challenge is provided in this paper.
△ Less
Submitted 17 May, 2021;
originally announced May 2021.
-
Fast and Accurate Quantized Camera Scene Detection on Smartphones, Mobile AI 2021 Challenge: Report
Authors:
Andrey Ignatov,
Grigory Malivenko,
Radu Timofte,
Sheng Chen,
Xin Xia,
Zhaoyan Liu,
Yuwei Zhang,
Feng Zhu,
Jiashi Li,
Xuefeng Xiao,
Yuan Tian,
Xinglong Wu,
Christos Kyrkou,
Yixin Chen,
Zexin Zhang,
Yunbo Peng,
Yue Lin,
Saikat Dutta,
Sourya Dipta Das,
Nisarg A. Shah,
Himanshu Kumar,
Chao Ge,
Pei-Lin Wu,
Jin-Hua Du,
Andrew Batutin
, et al. (6 additional authors not shown)
Abstract:
Camera scene detection is among the most popular computer vision problem on smartphones. While many custom solutions were developed for this task by phone vendors, none of the designed models were available publicly up until now. To address this problem, we introduce the first Mobile AI challenge, where the target is to develop quantized deep learning-based camera scene classification solutions th…
▽ More
Camera scene detection is among the most popular computer vision problem on smartphones. While many custom solutions were developed for this task by phone vendors, none of the designed models were available publicly up until now. To address this problem, we introduce the first Mobile AI challenge, where the target is to develop quantized deep learning-based camera scene classification solutions that can demonstrate a real-time performance on smartphones and IoT platforms. For this, the participants were provided with a large-scale CamSDD dataset consisting of more than 11K images belonging to the 30 most important scene categories. The runtime of all models was evaluated on the popular Apple Bionic A11 platform that can be found in many iOS devices. The proposed solutions are fully compatible with all major mobile AI accelerators and can demonstrate more than 100-200 FPS on the majority of recent smartphone platforms while achieving a top-3 accuracy of more than 98%. A detailed description of all models developed in the challenge is provided in this paper.
△ Less
Submitted 17 May, 2021;
originally announced May 2021.
-
Fast and Accurate Single-Image Depth Estimation on Mobile Devices, Mobile AI 2021 Challenge: Report
Authors:
Andrey Ignatov,
Grigory Malivenko,
David Plowman,
Samarth Shukla,
Radu Timofte,
Ziyu Zhang,
Yicheng Wang,
Zilong Huang,
Guozhong Luo,
Gang Yu,
Bin Fu,
Yiran Wang,
Xingyi Li,
Min Shi,
Ke Xian,
Zhiguo Cao,
Jin-Hua Du,
Pei-Lin Wu,
Chao Ge,
Jiaoyang Yao,
Fangwen Tu,
Bo Li,
Jung Eun Yoo,
Kwanggyoon Seo,
Jialei Xu
, et al. (13 additional authors not shown)
Abstract:
Depth estimation is an important computer vision problem with many practical applications to mobile devices. While many solutions have been proposed for this task, they are usually very computationally expensive and thus are not applicable for on-device inference. To address this problem, we introduce the first Mobile AI challenge, where the target is to develop an end-to-end deep learning-based d…
▽ More
Depth estimation is an important computer vision problem with many practical applications to mobile devices. While many solutions have been proposed for this task, they are usually very computationally expensive and thus are not applicable for on-device inference. To address this problem, we introduce the first Mobile AI challenge, where the target is to develop an end-to-end deep learning-based depth estimation solutions that can demonstrate a nearly real-time performance on smartphones and IoT platforms. For this, the participants were provided with a new large-scale dataset containing RGB-depth image pairs obtained with a dedicated stereo ZED camera producing high-resolution depth maps for objects located at up to 50 meters. The runtime of all models was evaluated on the popular Raspberry Pi 4 platform with a mobile ARM-based Broadcom chipset. The proposed solutions can generate VGA resolution depth maps at up to 10 FPS on the Raspberry Pi 4 while achieving high fidelity results, and are compatible with any Android or Linux-based mobile devices. A detailed description of all models developed in the challenge is provided in this paper.
△ Less
Submitted 17 May, 2021;
originally announced May 2021.
-
Fast Camera Image Denoising on Mobile GPUs with Deep Learning, Mobile AI 2021 Challenge: Report
Authors:
Andrey Ignatov,
Kim Byeoung-su,
Radu Timofte,
Angeline Pouget,
Fenglong Song,
Cheng Li,
Shuai Xiao,
Zhongqian Fu,
Matteo Maggioni,
Yibin Huang,
Shen Cheng,
Xin Lu,
Yifeng Zhou,
Liangyu Chen,
Donghao Liu,
Xiangyu Zhang,
Haoqiang Fan,
Jian Sun,
Shuaicheng Liu,
Minsu Kwon,
Myungje Lee,
Jaeyoon Yoo,
Changbeom Kang,
Shinjo Wang,
Bin Huang
, et al. (7 additional authors not shown)
Abstract:
Image denoising is one of the most critical problems in mobile photo processing. While many solutions have been proposed for this task, they are usually working with synthetic data and are too computationally expensive to run on mobile devices. To address this problem, we introduce the first Mobile AI challenge, where the target is to develop an end-to-end deep learning-based image denoising solut…
▽ More
Image denoising is one of the most critical problems in mobile photo processing. While many solutions have been proposed for this task, they are usually working with synthetic data and are too computationally expensive to run on mobile devices. To address this problem, we introduce the first Mobile AI challenge, where the target is to develop an end-to-end deep learning-based image denoising solution that can demonstrate high efficiency on smartphone GPUs. For this, the participants were provided with a novel large-scale dataset consisting of noisy-clean image pairs captured in the wild. The runtime of all models was evaluated on the Samsung Exynos 2100 chipset with a powerful Mali GPU capable of accelerating floating-point and quantized neural networks. The proposed solutions are fully compatible with any mobile GPU and are capable of processing 480p resolution images under 40-80 ms while achieving high fidelity results. A detailed description of all models developed in the challenge is provided in this paper.
△ Less
Submitted 17 May, 2021;
originally announced May 2021.
-
Fast and Accurate Camera Scene Detection on Smartphones
Authors:
Angeline Pouget,
Sidharth Ramesh,
Maximilian Giang,
Ramithan Chandrapalan,
Toni Tanner,
Moritz Prussing,
Radu Timofte,
Andrey Ignatov
Abstract:
AI-powered automatic camera scene detection mode is nowadays available in nearly any modern smartphone, though the problem of accurate scene prediction has not yet been addressed by the research community. This paper for the first time carefully defines this problem and proposes a novel Camera Scene Detection Dataset (CamSDD) containing more than 11K manually crawled images belonging to 30 differe…
▽ More
AI-powered automatic camera scene detection mode is nowadays available in nearly any modern smartphone, though the problem of accurate scene prediction has not yet been addressed by the research community. This paper for the first time carefully defines this problem and proposes a novel Camera Scene Detection Dataset (CamSDD) containing more than 11K manually crawled images belonging to 30 different scene categories. We propose an efficient and NPU-friendly CNN model for this task that demonstrates a top-3 accuracy of 99.5% on this dataset and achieves more than 200 FPS on the recent mobile SoCs. An additional in-the-wild evaluation of the obtained solution is performed to analyze its performance and limitation in the real-world scenarios. The dataset and pre-trained models used in this paper are available on the project website.
△ Less
Submitted 17 May, 2021;
originally announced May 2021.
-
Real-Time Quantized Image Super-Resolution on Mobile NPUs, Mobile AI 2021 Challenge: Report
Authors:
Andrey Ignatov,
Radu Timofte,
Maurizio Denna,
Abdel Younes,
Andrew Lek,
Mustafa Ayazoglu,
Jie Liu,
Zongcai Du,
Jiaming Guo,
Xueyi Zhou,
Hao Jia,
Youliang Yan,
Zexin Zhang,
Yixin Chen,
Yunbo Peng,
Yue Lin,
Xindong Zhang,
Hui Zeng,
Kun Zeng,
Peirong Li,
Zhihuang Liu,
Shiqi Xue,
Shengpeng Wang
Abstract:
Image super-resolution is one of the most popular computer vision problems with many important applications to mobile devices. While many solutions have been proposed for this task, they are usually not optimized even for common smartphone AI hardware, not to mention more constrained smart TV platforms that are often supporting INT8 inference only. To address this problem, we introduce the first M…
▽ More
Image super-resolution is one of the most popular computer vision problems with many important applications to mobile devices. While many solutions have been proposed for this task, they are usually not optimized even for common smartphone AI hardware, not to mention more constrained smart TV platforms that are often supporting INT8 inference only. To address this problem, we introduce the first Mobile AI challenge, where the target is to develop an end-to-end deep learning-based image super-resolution solutions that can demonstrate a real-time performance on mobile or edge NPUs. For this, the participants were provided with the DIV2K dataset and trained quantized models to do an efficient 3X image upscaling. The runtime of all models was evaluated on the Synaptics VS680 Smart Home board with a dedicated NPU capable of accelerating quantized neural networks. The proposed solutions are fully compatible with all major mobile AI accelerators and are capable of reconstructing Full HD images under 40-60 ms while achieving high fidelity results. A detailed description of all models developed in the challenge is provided in this paper.
△ Less
Submitted 17 May, 2021;
originally announced May 2021.
-
Learned Smartphone ISP on Mobile NPUs with Deep Learning, Mobile AI 2021 Challenge: Report
Authors:
Andrey Ignatov,
Cheng-Ming Chiang,
Hsien-Kai Kuo,
Anastasia Sycheva,
Radu Timofte,
Min-Hung Chen,
Man-Yu Lee,
Yu-Syuan Xu,
Yu Tseng,
Shusong Xu,
Jin Guo,
Chao-Hung Chen,
Ming-Chun Hsyu,
Wen-Chia Tsai,
Chao-Wei Chen,
Grigory Malivenko,
Minsu Kwon,
Myungje Lee,
Jaeyoon Yoo,
Changbeom Kang,
Shinjo Wang,
Zheng Shaolong,
Hao Dejun,
Xie Fen,
Feng Zhuang
, et al. (16 additional authors not shown)
Abstract:
As the quality of mobile cameras starts to play a crucial role in modern smartphones, more and more attention is now being paid to ISP algorithms used to improve various perceptual aspects of mobile photos. In this Mobile AI challenge, the target was to develop an end-to-end deep learning-based image signal processing (ISP) pipeline that can replace classical hand-crafted ISPs and achieve nearly r…
▽ More
As the quality of mobile cameras starts to play a crucial role in modern smartphones, more and more attention is now being paid to ISP algorithms used to improve various perceptual aspects of mobile photos. In this Mobile AI challenge, the target was to develop an end-to-end deep learning-based image signal processing (ISP) pipeline that can replace classical hand-crafted ISPs and achieve nearly real-time performance on smartphone NPUs. For this, the participants were provided with a novel learned ISP dataset consisting of RAW-RGB image pairs captured with the Sony IMX586 Quad Bayer mobile sensor and a professional 102-megapixel medium format camera. The runtime of all models was evaluated on the MediaTek Dimensity 1000+ platform with a dedicated AI processing unit capable of accelerating both floating-point and quantized neural networks. The proposed solutions are fully compatible with the above NPU and are capable of processing Full HD photos under 60-100 milliseconds while achieving high fidelity results. A detailed description of all models developed in this challenge is provided in this paper.
△ Less
Submitted 17 May, 2021;
originally announced May 2021.
-
AIM 2020 Challenge on Learned Image Signal Processing Pipeline
Authors:
Andrey Ignatov,
Radu Timofte,
Zhilu Zhang,
Ming Liu,
Haolin Wang,
Wangmeng Zuo,
Jiawei Zhang,
Ruimao Zhang,
Zhanglin Peng,
Sijie Ren,
Linhui Dai,
Xiaohong Liu,
Chengqi Li,
Jun Chen,
Yuichi Ito,
Bhavya Vasudeva,
Puneesh Deora,
Umapada Pal,
Zhenyu Guo,
Yu Zhu,
Tian Liang,
Chenghua Li,
Cong Leng,
Zhihong Pan,
Baopu Li
, et al. (14 additional authors not shown)
Abstract:
This paper reviews the second AIM learned ISP challenge and provides the description of the proposed solutions and results. The participating teams were solving a real-world RAW-to-RGB mapping problem, where to goal was to map the original low-quality RAW images captured by the Huawei P20 device to the same photos obtained with the Canon 5D DSLR camera. The considered task embraced a number of com…
▽ More
This paper reviews the second AIM learned ISP challenge and provides the description of the proposed solutions and results. The participating teams were solving a real-world RAW-to-RGB mapping problem, where to goal was to map the original low-quality RAW images captured by the Huawei P20 device to the same photos obtained with the Canon 5D DSLR camera. The considered task embraced a number of complex computer vision subtasks, such as image demosaicing, denoising, white balancing, color and contrast correction, demoireing, etc. The target metric used in this challenge combined fidelity scores (PSNR and SSIM) with solutions' perceptual results measured in a user study. The proposed solutions significantly improved the baseline results, defining the state-of-the-art for practical image signal processing pipeline modeling.
△ Less
Submitted 10 November, 2020;
originally announced November 2020.
-
AIM 2020 Challenge on Rendering Realistic Bokeh
Authors:
Andrey Ignatov,
Radu Timofte,
Ming Qian,
Congyu Qiao,
Jiamin Lin,
Zhenyu Guo,
Chenghua Li,
Cong Leng,
Jian Cheng,
Juewen Peng,
Xianrui Luo,
Ke Xian,
Zijin Wu,
Zhiguo Cao,
Densen Puthussery,
Jiji C V,
Hrishikesh P S,
Melvin Kuriakose,
Saikat Dutta,
Sourya Dipta Das,
Nisarg A. Shah,
Kuldeep Purohit,
Praveen Kandula,
Maitreya Suin,
A. N. Rajagopalan
, et al. (10 additional authors not shown)
Abstract:
This paper reviews the second AIM realistic bokeh effect rendering challenge and provides the description of the proposed solutions and results. The participating teams were solving a real-world bokeh simulation problem, where the goal was to learn a realistic shallow focus technique using a large-scale EBB! bokeh dataset consisting of 5K shallow / wide depth-of-field image pairs captured using th…
▽ More
This paper reviews the second AIM realistic bokeh effect rendering challenge and provides the description of the proposed solutions and results. The participating teams were solving a real-world bokeh simulation problem, where the goal was to learn a realistic shallow focus technique using a large-scale EBB! bokeh dataset consisting of 5K shallow / wide depth-of-field image pairs captured using the Canon 7D DSLR camera. The participants had to render bokeh effect based on only one single frame without any additional data from other cameras or sensors. The target metric used in this challenge combined the runtime and the perceptual quality of the solutions measured in the user study. To ensure the efficiency of the submitted models, we measured their runtime on standard desktop CPUs as well as were running the models on smartphone GPUs. The proposed solutions significantly improved the baseline results, defining the state-of-the-art for practical bokeh effect rendering problem.
△ Less
Submitted 10 November, 2020;
originally announced November 2020.
-
Controlling Information Capacity of Binary Neural Network
Authors:
Dmitry Ignatov,
Andrey Ignatov
Abstract:
Despite the growing popularity of deep learning technologies, high memory requirements and power consumption are essentially limiting their application in mobile and IoT areas. While binary convolutional networks can alleviate these problems, the limited bitwidth of weights is often leading to significant degradation of prediction accuracy. In this paper, we present a method for training binary ne…
▽ More
Despite the growing popularity of deep learning technologies, high memory requirements and power consumption are essentially limiting their application in mobile and IoT areas. While binary convolutional networks can alleviate these problems, the limited bitwidth of weights is often leading to significant degradation of prediction accuracy. In this paper, we present a method for training binary networks that maintains a stable predefined level of their information capacity throughout the training process by applying Shannon entropy based penalty to convolutional filters. The results of experiments conducted on SVHN, CIFAR and ImageNet datasets demonstrate that the proposed approach can statistically significantly improve the accuracy of binary networks.
△ Less
Submitted 4 August, 2020;
originally announced August 2020.
-
Rendering Natural Camera Bokeh Effect with Deep Learning
Authors:
Andrey Ignatov,
Jagruti Patel,
Radu Timofte
Abstract:
Bokeh is an important artistic effect used to highlight the main object of interest on the photo by blurring all out-of-focus areas. While DSLR and system camera lenses can render this effect naturally, mobile cameras are unable to produce shallow depth-of-field photos due to a very small aperture diameter of their optics. Unlike the current solutions simulating bokeh by applying Gaussian blur to…
▽ More
Bokeh is an important artistic effect used to highlight the main object of interest on the photo by blurring all out-of-focus areas. While DSLR and system camera lenses can render this effect naturally, mobile cameras are unable to produce shallow depth-of-field photos due to a very small aperture diameter of their optics. Unlike the current solutions simulating bokeh by applying Gaussian blur to image background, in this paper we propose to learn a realistic shallow focus technique directly from the photos produced by DSLR cameras. For this, we present a large-scale bokeh dataset consisting of 5K shallow / wide depth-of-field image pairs captured using the Canon 7D DSLR with 50mm f/1.8 lenses. We use these images to train a deep learning model to reproduce a natural bokeh effect based on a single narrow-aperture image. The experimental results show that the proposed approach is able to render a plausible non-uniform bokeh even in case of complex input data with multiple objects. The dataset, pre-trained models and codes used in this paper are available on the project website.
△ Less
Submitted 10 June, 2020;
originally announced June 2020.
-
Replacing Mobile Camera ISP with a Single Deep Learning Model
Authors:
Andrey Ignatov,
Luc Van Gool,
Radu Timofte
Abstract:
As the popularity of mobile photography is growing constantly, lots of efforts are being invested now into building complex hand-crafted camera ISP solutions. In this work, we demonstrate that even the most sophisticated ISP pipelines can be replaced with a single end-to-end deep learning model trained without any prior knowledge about the sensor and optics used in a particular device. For this, w…
▽ More
As the popularity of mobile photography is growing constantly, lots of efforts are being invested now into building complex hand-crafted camera ISP solutions. In this work, we demonstrate that even the most sophisticated ISP pipelines can be replaced with a single end-to-end deep learning model trained without any prior knowledge about the sensor and optics used in a particular device. For this, we present PyNET, a novel pyramidal CNN architecture designed for fine-grained image restoration that implicitly learns to perform all ISP steps such as image demosaicing, denoising, white balancing, color and contrast correction, demoireing, etc. The model is trained to convert RAW Bayer data obtained directly from mobile camera sensor into photos captured with a professional high-end DSLR camera, making the solution independent of any particular mobile ISP implementation. To validate the proposed approach on the real data, we collected a large-scale dataset consisting of 10 thousand full-resolution RAW-RGB image pairs captured in the wild with the Huawei P20 cameraphone (12.3 MP Sony Exmor IMX380 sensor) and Canon 5D Mark IV DSLR. The experiments demonstrate that the proposed solution can easily get to the level of the embedded P20's ISP pipeline that, unlike our approach, is combining the data from two (RGB + B/W) camera sensors. The dataset, pre-trained models and codes used in this paper are available on the project website.
△ Less
Submitted 13 February, 2020;
originally announced February 2020.
-
AI Benchmark: All About Deep Learning on Smartphones in 2019
Authors:
Andrey Ignatov,
Radu Timofte,
Andrei Kulik,
Seungsoo Yang,
Ke Wang,
Felix Baum,
Max Wu,
Lirong Xu,
Luc Van Gool
Abstract:
The performance of mobile AI accelerators has been evolving rapidly in the past two years, nearly doubling with each new generation of SoCs. The current 4th generation of mobile NPUs is already approaching the results of CUDA-compatible Nvidia graphics cards presented not long ago, which together with the increased capabilities of mobile deep learning frameworks makes it possible to run complex an…
▽ More
The performance of mobile AI accelerators has been evolving rapidly in the past two years, nearly doubling with each new generation of SoCs. The current 4th generation of mobile NPUs is already approaching the results of CUDA-compatible Nvidia graphics cards presented not long ago, which together with the increased capabilities of mobile deep learning frameworks makes it possible to run complex and deep AI models on mobile devices. In this paper, we evaluate the performance and compare the results of all chipsets from Qualcomm, HiSilicon, Samsung, MediaTek and Unisoc that are providing hardware acceleration for AI inference. We also discuss the recent changes in the Android ML pipeline and provide an overview of the deployment of deep learning models on mobile devices. All numerical results provided in this paper can be found and are regularly updated on the official project website: http://ai-benchmark.com.
△ Less
Submitted 15 October, 2019;
originally announced October 2019.
-
Fast Perceptual Image Enhancement
Authors:
Etienne de Stoutz,
Andrey Ignatov,
Nikolay Kobyshev,
Radu Timofte,
Luc Van Gool
Abstract:
The vast majority of photos taken today are by mobile phones. While their quality is rapidly growing, due to physical limitations and cost constraints, mobile phone cameras struggle to compare in quality with DSLR cameras. This motivates us to computationally enhance these images. We extend upon the results of Ignatov et al., where they are able to translate images from compact mobile cameras into…
▽ More
The vast majority of photos taken today are by mobile phones. While their quality is rapidly growing, due to physical limitations and cost constraints, mobile phone cameras struggle to compare in quality with DSLR cameras. This motivates us to computationally enhance these images. We extend upon the results of Ignatov et al., where they are able to translate images from compact mobile cameras into images with comparable quality to high-resolution photos taken by DSLR cameras. However, the neural models employed require large amounts of computational resources and are not lightweight enough to run on mobile devices. We build upon the prior work and explore different network architectures targeting an increase in image quality and speed. With an efficient network architecture which does most of its processing in a lower spatial resolution, we achieve a significantly higher mean opinion score (MOS) than the baseline while speeding up the computation by 6.3 times on a consumer-grade CPU. This suggests a promising direction for neural-network-based photo enhancement using the phone hardware of the future.
△ Less
Submitted 31 December, 2018;
originally announced December 2018.
-
PIRM Challenge on Perceptual Image Enhancement on Smartphones: Report
Authors:
Andrey Ignatov,
Radu Timofte,
Thang Van Vu,
Tung Minh Luu,
Trung X Pham,
Cao Van Nguyen,
Yongwoo Kim,
Jae-Seok Choi,
Munchurl Kim,
Jie Huang,
Jiewen Ran,
Chen Xing,
Xingguang Zhou,
Pengfei Zhu,
Mingrui Geng,
Yawei Li,
Eirikur Agustsson,
Shuhang Gu,
Luc Van Gool,
Etienne de Stoutz,
Nikolay Kobyshev,
Kehui Nie,
Yan Zhao,
Gen Li,
Tong Tong
, et al. (23 additional authors not shown)
Abstract:
This paper reviews the first challenge on efficient perceptual image enhancement with the focus on deploying deep learning models on smartphones. The challenge consisted of two tracks. In the first one, participants were solving the classical image super-resolution problem with a bicubic downscaling factor of 4. The second track was aimed at real-world photo enhancement, and the goal was to map lo…
▽ More
This paper reviews the first challenge on efficient perceptual image enhancement with the focus on deploying deep learning models on smartphones. The challenge consisted of two tracks. In the first one, participants were solving the classical image super-resolution problem with a bicubic downscaling factor of 4. The second track was aimed at real-world photo enhancement, and the goal was to map low-quality photos from the iPhone 3GS device to the same photos captured with a DSLR camera. The target metric used in this challenge combined the runtime, PSNR scores and solutions' perceptual results measured in the user study. To ensure the efficiency of the submitted models, we additionally measured their runtime and memory requirements on Android smartphones. The proposed solutions significantly improved baseline results defining the state-of-the-art for image enhancement on smartphones.
△ Less
Submitted 3 October, 2018;
originally announced October 2018.
-
AI Benchmark: Running Deep Neural Networks on Android Smartphones
Authors:
Andrey Ignatov,
Radu Timofte,
William Chou,
Ke Wang,
Max Wu,
Tim Hartley,
Luc Van Gool
Abstract:
Over the last years, the computational power of mobile devices such as smartphones and tablets has grown dramatically, reaching the level of desktop computers available not long ago. While standard smartphone apps are no longer a problem for them, there is still a group of tasks that can easily challenge even high-end devices, namely running artificial intelligence algorithms. In this paper, we pr…
▽ More
Over the last years, the computational power of mobile devices such as smartphones and tablets has grown dramatically, reaching the level of desktop computers available not long ago. While standard smartphone apps are no longer a problem for them, there is still a group of tasks that can easily challenge even high-end devices, namely running artificial intelligence algorithms. In this paper, we present a study of the current state of deep learning in the Android ecosystem and describe available frameworks, programming models and the limitations of running AI on smartphones. We give an overview of the hardware acceleration resources available on four main mobile chipset platforms: Qualcomm, HiSilicon, MediaTek and Samsung. Additionally, we present the real-world performance results of different mobile SoCs collected with AI Benchmark that are covering all main existing hardware configurations.
△ Less
Submitted 15 October, 2018; v1 submitted 2 October, 2018;
originally announced October 2018.
-
WESPE: Weakly Supervised Photo Enhancer for Digital Cameras
Authors:
Andrey Ignatov,
Nikolay Kobyshev,
Radu Timofte,
Kenneth Vanhoey,
Luc Van Gool
Abstract:
Low-end and compact mobile cameras demonstrate limited photo quality mainly due to space, hardware and budget constraints. In this work, we propose a deep learning solution that translates photos taken by cameras with limited capabilities into DSLR-quality photos automatically. We tackle this problem by introducing a weakly supervised photo enhancer (WESPE) - a novel image-to-image Generative Adve…
▽ More
Low-end and compact mobile cameras demonstrate limited photo quality mainly due to space, hardware and budget constraints. In this work, we propose a deep learning solution that translates photos taken by cameras with limited capabilities into DSLR-quality photos automatically. We tackle this problem by introducing a weakly supervised photo enhancer (WESPE) - a novel image-to-image Generative Adversarial Network-based architecture. The proposed model is trained by under weak supervision: unlike previous works, there is no need for strong supervision in the form of a large annotated dataset of aligned original/enhanced photo pairs. The sole requirement is two distinct datasets: one from the source camera, and one composed of arbitrary high-quality images that can be generally crawled from the Internet - the visual content they exhibit may be unrelated. Hence, our solution is repeatable for any camera: collecting the data and training can be achieved in a couple of hours. In this work, we emphasize on extensive evaluation of obtained results. Besides standard objective metrics and subjective user study, we train a virtual rater in the form of a separate CNN that mimics human raters on Flickr data and use this network to get reference scores for both original and enhanced photos. Our experiments on the DPED, KITTI and Cityscapes datasets as well as pictures from several generations of smartphones demonstrate that WESPE produces comparable or improved qualitative results with state-of-the-art strongly supervised methods.
△ Less
Submitted 3 March, 2018; v1 submitted 4 September, 2017;
originally announced September 2017.
-
A Large-Scale CNN Ensemble for Medication Safety Analysis
Authors:
Liliya Akhtyamova,
Andrey Ignatov,
John Cardiff
Abstract:
Revealing Adverse Drug Reactions (ADR) is an essential part of post-marketing drug surveillance, and data from health-related forums and medical communities can be of a great significance for estimating such effects. In this paper, we propose an end-to-end CNN-based method for predicting drug safety on user comments from healthcare discussion forums. We present an architecture that is based on a v…
▽ More
Revealing Adverse Drug Reactions (ADR) is an essential part of post-marketing drug surveillance, and data from health-related forums and medical communities can be of a great significance for estimating such effects. In this paper, we propose an end-to-end CNN-based method for predicting drug safety on user comments from healthcare discussion forums. We present an architecture that is based on a vast ensemble of CNNs with varied structural parameters, where the prediction is determined by the majority vote. To evaluate the performance of the proposed solution, we present a large-scale dataset collected from a medical website that consists of over 50 thousand reviews for more than 4000 drugs. The results demonstrate that our model significantly outperforms conventional approaches and predicts medicine safety with an accuracy of 87.17% for binary and 62.88% for multi-classification tasks.
△ Less
Submitted 17 June, 2017;
originally announced June 2017.
-
Decision Stream: Cultivating Deep Decision Trees
Authors:
Dmitry Ignatov,
Andrey Ignatov
Abstract:
Various modifications of decision trees have been extensively used during the past years due to their high efficiency and interpretability. Tree node splitting based on relevant feature selection is a key step of decision tree learning, at the same time being their major shortcoming: the recursive nodes partitioning leads to geometric reduction of data quantity in the leaf nodes, which causes an e…
▽ More
Various modifications of decision trees have been extensively used during the past years due to their high efficiency and interpretability. Tree node splitting based on relevant feature selection is a key step of decision tree learning, at the same time being their major shortcoming: the recursive nodes partitioning leads to geometric reduction of data quantity in the leaf nodes, which causes an excessive model complexity and data overfitting. In this paper, we present a novel architecture - a Decision Stream, - aimed to overcome this problem. Instead of building a tree structure during the learning process, we propose merging nodes from different branches based on their similarity that is estimated with two-sample test statistics, which leads to generation of a deep directed acyclic graph of decision rules that can consist of hundreds of levels. To evaluate the proposed solution, we test it on several common machine learning problems - credit scoring, twitter sentiment analysis, aircraft flight control, MNIST and CIFAR image classification, synthetic data classification and regression. Our experimental results reveal that the proposed approach significantly outperforms the standard decision tree learning methods on both regression and classification tasks, yielding a prediction error decrease up to 35%.
△ Less
Submitted 3 September, 2017; v1 submitted 25 April, 2017;
originally announced April 2017.
-
DSLR-Quality Photos on Mobile Devices with Deep Convolutional Networks
Authors:
Andrey Ignatov,
Nikolay Kobyshev,
Radu Timofte,
Kenneth Vanhoey,
Luc Van Gool
Abstract:
Despite a rapid rise in the quality of built-in smartphone cameras, their physical limitations - small sensor size, compact lenses and the lack of specific hardware, - impede them to achieve the quality results of DSLR cameras. In this work we present an end-to-end deep learning approach that bridges this gap by translating ordinary photos into DSLR-quality images. We propose learning the translat…
▽ More
Despite a rapid rise in the quality of built-in smartphone cameras, their physical limitations - small sensor size, compact lenses and the lack of specific hardware, - impede them to achieve the quality results of DSLR cameras. In this work we present an end-to-end deep learning approach that bridges this gap by translating ordinary photos into DSLR-quality images. We propose learning the translation function using a residual convolutional neural network that improves both color rendition and image sharpness. Since the standard mean squared loss is not well suited for measuring perceptual image quality, we introduce a composite perceptual error function that combines content, color and texture losses. The first two losses are defined analytically, while the texture loss is learned in an adversarial fashion. We also present DPED, a large-scale dataset that consists of real photos captured from three different phones and one high-end reflex camera. Our quantitative and qualitative assessments reveal that the enhanced image quality is comparable to that of DSLR-taken photos, while the methodology is generalized to any type of digital camera.
△ Less
Submitted 5 September, 2017; v1 submitted 8 April, 2017;
originally announced April 2017.
-
Automatic Analysis, Decomposition and Parallel Optimization of Large Homogeneous Networks
Authors:
Dmitry Yu. Ignatov,
Alexander N. Filippov,
Andrey D. Ignatov,
Xuecang Zhang
Abstract:
The life of the modern world essentially depends on the work of the large artificial homogeneous networks, such as wired and wireless communication systems, networks of roads and pipelines. The support of their effective continuous functioning requires automatic screening and permanent optimization with processing of the huge amount of data by high-performance distributed systems. We propose new m…
▽ More
The life of the modern world essentially depends on the work of the large artificial homogeneous networks, such as wired and wireless communication systems, networks of roads and pipelines. The support of their effective continuous functioning requires automatic screening and permanent optimization with processing of the huge amount of data by high-performance distributed systems. We propose new meta-algorithm of large homogeneous network analysis, its decomposition into alternative sets of loosely connected subnets, and parallel optimization of the most independent elements. This algorithm is based on a network-specific correlation function, Simulated Annealing technique, and is adapted to work in the computer cluster. On the example of large wireless network, we show that proposed algorithm essentially increases speed of parallel optimization. The elaborated general approach can be used for analysis and optimization of the wide range of networks, including such specific types as artificial neural networks or organized in networks physiological systems of living organisms.
△ Less
Submitted 23 January, 2017;
originally announced January 2017.