E-mail: 111221001@live.asia.edu.tw
Instructor: Prof. Chun-Yuan Lin (林俊淵)

Agenda
• Preface
  – Abstract
  – Introduction
• Deep learning
  – Deep learning architectures
  – Classification
  – Segmentation
  – Deep learning development framework
• Medical image
  – Deep learning for medical imaging analysis
• Conclusions and future research

Preface
• This post-reading report is based on the article “A review of the application of deep learning in medical image classification and segmentation”, freely available at https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7327346/

Abstract
• Big medical data mainly include electronic health record data, medical image data, gene information data, etc. Among them, medical image data account for the vast majority of medical data at this stage.
• How can big medical data be applied to clinical practice? This is an issue of great concern to medical and computer researchers, and intelligent imaging and deep learning provide a good answer.
• This review introduces the application of intelligent imaging and deep learning in big data analysis and early diagnosis of diseases, combining the latest research progress in big data analysis of medical images with the authors' work in this field, especially the classification and segmentation of medical images.

Introduction
• Since 2006, deep learning has emerged as a branch of machine learning and entered the public's field of vision. It is a method of data processing that uses multiple layers of complex structures, or multiple processing layers composed of multiple nonlinear transformations (1).
• In recent years, deep learning has made breakthroughs in the fields of computer vision, speech recognition, natural language processing, audio recognition and bioinformatics (2).

Deep learning architectures
Deep learning algorithms
• Deep learning has developed into a hot research field, and there are dozens of algorithms, each with its own advantages and disadvantages. These algorithms cover almost all aspects of image processing, focusing mainly on classification and segmentation. Figure 1 gives an overview of some typical network structures in these areas.

Classification-1
Convolutional neural network (CNN) is the most widely used structure. Since Krizhevsky et al. proposed AlexNet, a deep CNN model, in 2012 (5) and won that year's ImageNet image-classification championship, deep learning began to explode. In 2013, Lin et al. proposed the network-in-network (NIN) structure, which uses global average pooling to reduce the risk of overfitting (6). In 2014, GoogLeNet and VGGNet both further improved accuracy on the ImageNet dataset (7,8).

Classification-2
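As a side note, the global average pooling used by NIN can be sketched in a few lines of NumPy (an illustrative sketch, not code from the reviewed paper): instead of flattening a C×H×W feature map into a large fully connected layer, each channel is averaged to a single value, which leaves far fewer parameters to overfit.

```python
import numpy as np

def global_average_pooling(feature_map):
    """Collapse a (C, H, W) feature map to a length-C vector
    by averaging each channel over its spatial positions."""
    return feature_map.mean(axis=(1, 2))

# A toy 256-channel, 6x6 feature map.
fmap = np.ones((256, 6, 6))
vec = global_average_pooling(fmap)
print(vec.shape)  # (256,)
```

NIN, and many later networks, feed this length-C vector directly into the classifier instead of a stack of large fully connected layers.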
• To address CNN's fixed-input-size requirement, He et al. proposed the spatial pyramid pooling (SPP) model to enhance robustness to the input data (12). As deep learning models grew deeper, He et al. proposed the residual network ResNet to counter the model degradation that may occur, further advancing deep learning technology (13).

Classification-3
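The SPP idea above can be sketched with NumPy (a simplified, illustrative version using max pooling over a few grid sizes): whatever the input's height and width, pooling over fixed n×n grids yields a fixed-length vector, which is what lets the following fully connected layers accept variable-sized inputs.

```python
import numpy as np

def spatial_pyramid_pool(fmap, levels=(1, 2, 4)):
    """Max-pool a (C, H, W) feature map over an n x n grid for each
    pyramid level, yielding a fixed-length vector regardless of H, W."""
    c, h, w = fmap.shape
    pooled = []
    for n in levels:
        # Split rows/cols into n roughly equal bins and max-pool each bin.
        rows = np.array_split(np.arange(h), n)
        cols = np.array_split(np.arange(w), n)
        for r in rows:
            for cl in cols:
                pooled.append(fmap[:, r][:, :, cl].max(axis=(1, 2)))
    return np.concatenate(pooled)

# Two inputs of different sizes produce the same output length:
v1 = spatial_pyramid_pool(np.random.rand(8, 13, 13))
v2 = spatial_pyramid_pool(np.random.rand(8, 20, 31))
print(len(v1), len(v2))  # both 8 * (1 + 4 + 16) = 168
```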
• Take AlexNet as an example. In 2012, AlexNet adopted an 8-layer network structure consisting of five convolutional layers and three fully connected layers. Max pooling is performed after the first, second and fifth convolutional layers to reduce the amount of data. AlexNet accepts 227×227-pixel input data; after the five rounds of convolution and pooling operations, the resulting 6×6×256 feature matrix is sent to the fully connected layers.

Classification-4
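The 227×227 → 6×6×256 path above can be verified with the standard output-size formula out = (in − kernel + 2·pad) / stride + 1; the sketch below assumes AlexNet's commonly cited layer settings (the kernel sizes, strides and paddings are not spelled out in the slides):

```python
def out_size(size, kernel, stride=1, pad=0):
    """Spatial output size of a conv/pool layer: (n - k + 2p) // s + 1."""
    return (size - kernel + 2 * pad) // stride + 1

s = 227                      # input width/height
s = out_size(s, 11, 4)       # conv1, 11x11, stride 4 -> 55
s = out_size(s, 3, 2)        # max pool, 3x3, stride 2 -> 27
s = out_size(s, 5, 1, 2)     # conv2, 5x5, pad 2       -> 27
s = out_size(s, 3, 2)        # max pool                -> 13
s = out_size(s, 3, 1, 1)     # conv3                   -> 13
s = out_size(s, 3, 1, 1)     # conv4                   -> 13
s = out_size(s, 3, 1, 1)     # conv5                   -> 13
s = out_size(s, 3, 2)        # max pool                -> 6
print(s)  # 6 -> a 6x6x256 feature matrix enters the fully connected layers
```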
• AlexNet's error rate on ImageNet was 15.3%, far better than the 26.2% achieved by the second-place entry. At the same time, its activation function was not the sigmoid but ReLU, and the results proved that the ReLU function is more effective.
• VGG16 was first proposed by the VGG group at Oxford University. Compared with AlexNet, it uses several consecutive 3×3 kernels instead of AlexNet's larger convolution kernels such as 11×11 and 5×5.

Classification-5
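The claim above that ReLU is more effective than sigmoid is commonly explained by gradient magnitude: the sigmoid derivative never exceeds 0.25, so gradients shrink as they pass backwards through stacked layers, while ReLU passes a gradient of exactly 1 for positive inputs. A small NumPy check (illustrative only):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def relu_grad(x):
    return (x > 0).astype(float)

x = np.linspace(-6, 6, 1001)
print(sigmoid_grad(x).max())      # 0.25 at x = 0: gradients vanish when stacked
print(relu_grad(x[x > 0]).min())  # 1.0: the gradient passes through unchanged
```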
• For a given receptive-field size, stacking several small convolution kernels works better than using one larger kernel: the additional nonlinear layers increase the network depth so that more complex patterns can be learned, while the computational cost is also smaller.

Segmentation-1
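The cost argument above can be made concrete by counting weights. For C input and C output channels, a k×k convolution has k²·C² weights, so two stacked 3×3 layers (receptive field 5×5) cost 18C² weights versus 25C² for a single 5×5 layer, and three stacked 3×3 layers (receptive field 7×7) cost 27C² versus 49C². A quick check (illustrative arithmetic, biases ignored):

```python
def conv_weights(kernel, channels, layers=1):
    """Weight count of `layers` stacked kernel x kernel convolutions,
    each mapping `channels` input channels to `channels` output channels."""
    return layers * kernel * kernel * channels * channels

C = 64
print(conv_weights(3, C, layers=2), "vs", conv_weights(5, C))  # 73728 vs 102400
print(conv_weights(3, C, layers=3), "vs", conv_weights(7, C))  # 110592 vs 200704
```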
• Semantic segmentation is an important research field of deep learning. With the rapid development of deep learning technology, excellent semantic-segmentation networks have emerged in large numbers, continually setting new state-of-the-art results in various segmentation competitions.

Segmentation-2
• Since CNN's success in the classification field, people have tried to apply CNN to image segmentation. Although convolutional layers can accept images of any size as input, a CNN loses some detail while pooling to extract features, and it loses the spatial information of the input image in the fully connected layers at the end of the network. It is therefore difficult for a plain CNN to pinpoint which category each pixel belongs to. With the development of deep learning technology, segmentation networks based on convolutional structures were derived.

Segmentation-3
• The fully convolutional network (FCN) (14) proposed by Long et al. is the originator of semantic-segmentation networks. It replaces the fully connected layers of the classification network VGG16 with convolutional layers, retaining the spatial information of the feature maps and achieving pixel-level classification. Finally, FCN uses deconvolution and feature-map fusion to restore the image size, and provides the segmentation result for each pixel via softmax.

Segmentation-4
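FCN's final step, a per-pixel softmax over the class score maps, can be sketched with NumPy (a minimal illustration; in the real network the score maps are produced by convolution and deconvolution layers):

```python
import numpy as np

def pixelwise_softmax(scores):
    """Turn a (num_classes, H, W) score map into per-pixel class
    probabilities; argmax over classes gives the segmentation."""
    e = np.exp(scores - scores.max(axis=0, keepdims=True))  # stabilised
    return e / e.sum(axis=0, keepdims=True)

scores = np.random.randn(3, 4, 4)   # 3 classes, 4x4 image
probs = pixelwise_softmax(scores)
mask = probs.argmax(axis=0)         # (4, 4) label map, one class per pixel
print(mask.shape, probs.sum(axis=0).round(6).min())  # (4, 4) 1.0
```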
• The performance of FCN on the Pascal VOC 2012 dataset (15) increased by 20% compared with the previous best method, reaching 62.2% mIoU.
• U-Net (16) was proposed by Olaf Ronneberger et al. based on FCN and has been widely used in medical imaging. Building on FCN's idea of using deconvolution to restore image size and features, U-Net established the encoder-decoder structure in the field of semantic segmentation.

Segmentation-5
• The encoder gradually reduces the spatial dimension by repeatedly merging layers to extract feature information, and the decoder gradually restores the target details and the spatial dimension from that feature information.
• The encoder's step of gradually reducing the image size is called downsampling, and the decoder's step of gradually restoring the image details and size is called upsampling.

Segmentation-6
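The downsampling/upsampling bookkeeping above can be sketched at the shape level with NumPy (illustrative only; the real U-Net uses learned convolutions and transposed convolutions): a 2×2 max pool halves the spatial size, nearest-neighbour repetition restores it, and a skip connection concatenates encoder and decoder features along the channel axis.

```python
import numpy as np

def downsample(x):
    """2x2 max pooling on a (C, H, W) map (H and W assumed even)."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def upsample(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

enc = np.random.rand(16, 32, 32)     # encoder feature map
down = downsample(enc)               # (16, 16, 16): spatial size halved
up = upsample(down)                  # (16, 32, 32): spatial size restored
skip = np.concatenate([enc, up], 0)  # (32, 32, 32): skip connection
print(down.shape, up.shape, skip.shape)
```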
• In order to fuse multi-scale context information at the same level, PSPNet (18) proposes a pyramid pooling structure, which realizes image segmentation in which the target's surroundings can be understood, and solves a problem that FCN cannot handle effectively: the relationship between global information and scenes. The pyramid pooling structure aggregates context information from regions of different sizes, thereby improving the ability to obtain global information.

Deep learning development framework-1
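The pyramid pooling module described above can be sketched at the shape level with NumPy (a simplified illustration that assumes the spatial size is divisible by every pyramid level): each level average-pools the map over an n×n grid, the pooled maps are stretched back to full resolution, and everything is concatenated with the original features.

```python
import numpy as np

def pyramid_pooling(fmap, levels=(1, 2, 4)):
    """Average-pool a (C, H, W) map over n x n grids, stretch each
    pooled map back to H x W, and concatenate with the input along
    the channel axis (shape-level sketch of PSPNet's pyramid module)."""
    c, h, w = fmap.shape
    outputs = [fmap]
    for n in levels:
        # Requires h and w divisible by n in this simplified sketch.
        pooled = fmap.reshape(c, n, h // n, n, w // n).mean(axis=(2, 4))
        outputs.append(pooled.repeat(h // n, axis=1).repeat(w // n, axis=2))
    return np.concatenate(outputs, axis=0)

out = pyramid_pooling(np.random.rand(8, 16, 16))
print(out.shape)  # (32, 16, 16): original 8 channels + 8 per pyramid level
```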
• While deep learning technology is developing in theory, software development frameworks based on deep learning theory are also booming.
• Convolutional Architecture for Fast Feature Embedding (Caffe) was born in Berkeley, California, and is now hosted by BVLC. Caffe features high performance, seamless switching between CPU and GPU modes, and cross-platform support.
• TensorFlow is an open-source software library that uses data-flow graphs for numerical computation. Google open-sourced the TensorFlow computing framework on November 9, 2015, and officially released TensorFlow 1.0 in 2017.

Deep learning development framework-2
• The TensorFlow framework supports a wide range of deep learning algorithms such as CNN, RNN and LSTM, but its application is not limited to deep learning; it also supports the construction of general machine-learning models.
• PyTorch is the Python version of Torch, a neural network framework open-sourced by Facebook and specifically targeted at GPU-accelerated deep neural network programming. Unlike TensorFlow's static computation graph, PyTorch's computation graph is dynamic and can be changed in real time according to computational needs.

Deep learning development framework-3
High-performance computing based on GPU
• The key factors for image processing in the medical imaging field are imaging speed, image size and resolution.
• GPU stands for Graphics Processing Unit, a microprocessor that performs image computing on PCs.
• In August 1999, NVIDIA released the GeForce 256 graphics chip, codenamed NV10. Its architecture is very different from that of the CPU.

Deep learning development framework-4
• With a GPU, the CPU no longer needs to perform graphics-processing work and can carry out other system tasks, which can greatly improve the overall performance of the computer.

Deep learning for medical imaging analysis
• In recent years, various types of medical image processing and recognition have adopted deep learning methods, including fundus images, endoscopic images, CT/MRI images, ultrasound images, pathological images, etc.

The classification of medical images
• On the one hand, academia has made great efforts to design a variety of efficient CNN models, which have achieved high accuracy and even exceeded human recognition ability. On the other hand, the application of CNN models to medical image analysis has become one of the most attractive directions of deep learning.
Diabetic retinopathy detection
• The main way of studying fundus diseases with deep learning techniques is to classify and detect fundus images, for example diabetic retinopathy detection and glaucoma detection. Table 1 lists the deep learning methods applied to fundus image analysis in the past 3 years.
• These range from the earliest shallow CNN models to deep CNN models and combination models, together with new methods and techniques such as transfer learning and data augmentation.

The segmentation of medical image analysis
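The data augmentation mentioned above can be illustrated with NumPy (a toy sketch; real pipelines use dedicated libraries and richer, label-preserving transforms): simple flips and 90° rotations turn one labelled image into several training samples.

```python
import numpy as np

def augment(image):
    """Return simple augmented variants of an (H, W) image:
    the original, horizontal/vertical flips, and 90-degree rotations."""
    variants = [image, np.fliplr(image), np.flipud(image)]
    variants += [np.rot90(image, k) for k in (1, 2, 3)]
    return variants

img = np.arange(16.0).reshape(4, 4)  # stand-in for a fundus image
batch = augment(img)
print(len(batch))  # 6 training samples from 1 image
```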
• Early detection of Alzheimer’s disease (AD)
Brain MRI analysis mainly targets the segmentation of different brain regions and the diagnosis of brain diseases, such as brain tumor segmentation (31), schizophrenia diagnosis, early diagnosis of Parkinson's syndrome (32) and early diagnosis of AD.

Conclusions and future research-1
• Deep learning models rely heavily on data sets: each deep learning network requires massive data for training, which makes data-set acquisition demanding.
• The root cause is that the pixel features of the original input image are too complex, so designing networks that can be trained on smaller data sets is a future development trend.

Conclusions and future research-2
• How to better apply deep learning to all stages of medical treatment is an even more challenging task. It depends on two aspects: one is the constant iteration of technology, and the other is the continuous accumulation of medical experience.
• Thank You!!!