Final Report 2
PNEUMONIA USING
CONVOLUTIONAL NEURAL NETWORK
A PROJECT REPORT
Submitted by
AJITH. M
DURAI SHRIDHARSHAN.R
NANDHINI.S
PAVITHRA.S
BONAFIDE CERTIFICATE
With a deep sense of gratitude, we extend our earnest and sincere thanks to our
project coordinators Mr. T. Karthikeyan, Assistant Professor, and
Mr. M. Senthilkumar, Assistant Professor, Department of Computer
Science and Engineering, for their kind guidance and encouragement during
this project. We would also like to thank all the staff members of our
department, our friends, and the students who helped us directly and
indirectly in completing this project work successfully.
TABLE OF CONTENTS
LIST OF FIGURES II
1 INTRODUCTION 1
4 SYSTEM SPECIFICATION 15
4.1 HARDWARE REQUIREMENTS 15
4.2 SOFTWARE REQUIREMENTS 15
5 SYSTEM STUDY 16
5.1.1 TECHNICAL FEASIBILITY 16
6 SYSTEM DESIGN 18
7 MODULE DESCRIPTION 20
7.1 MODULES 20
7.1.1 ANALYZING THE PROBLEM 20
7.1.4 MODELING 21
7.1.5 DEPLOYMENT 22
7.1.6 VISUALIZATION 23
8 SYSTEM TESTING 25
8.1 TESTING 25
8.2 TYPES OF TESTS 25
8.2.1 Unit Testing 26
8.2.2 Integration Testing 26
8.2.3 Validation Testing 26
8.2.4 Output Testing 26
8.2.5 User Acceptance Testing 27
8.2.6 Performance Testing 27
9 SYSTEM IMPLEMENTATION 28
10 CONCLUSION 29
10.1 CONCLUSION 29
11 APPENDIX A1
Chest diseases are very serious health problems in people's lives, and their
early diagnosis is very important. Many methods have been developed for this
purpose. In chest X-rays (CXRs), factors such as the positioning of the patient
and the depth of inspiration can alter the appearance of the image, and
clinicians are faced with reading high volumes of images every shift. To assist
with this task, we study the detection of pneumonia using chest X-rays. In this
report, convolutional neural networks (CNNs) are presented for the diagnosis of
pneumonia. The architecture of the CNN and its design principles are presented.
First, we take a dataset that is gathered and organised into folders. The
entire dataset is passed forward and backward through the neural network, and
once the input data has been processed through the network, the model outputs a
prediction.
LIST OF FIGURES
7.1.4 Modelling 22
LIST OF ABBREVIATIONS
ABBREVIATIONS EXPANSIONS
AI Artificial Intelligence
ANN Artificial Neural Network
CNN Convolutional Neural Network
CXR Chest X-Ray
CHAPTER 1
INTRODUCTION
Deep learning is a specific approach used for building and training
neural networks, which are considered highly promising decision-making
nodes. An algorithm is considered deep if the input data is passed
through a series of nonlinearities, or nonlinear transformations, before it
becomes output. Deep learning removes the manual identification of
features in the data and instead relies on its training process to
discover the useful patterns in the input.
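The idea of passing input through a series of nonlinear transformations can be sketched in a few lines of NumPy; the layer sizes and random weights below are arbitrary illustrations, not part of this project's model:

```python
import numpy as np

def relu(x):
    # The nonlinearity applied after each layer's linear transform
    return np.maximum(0, x)

# Arbitrary illustrative weights: 4 inputs -> 3 hidden -> 2 hidden -> 1 output
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(3, 2))
W3 = rng.normal(size=(2, 1))

def forward(x):
    # "Deep": the input passes through a series of nonlinear
    # transformations before it becomes output
    h1 = relu(x @ W1)
    h2 = relu(h1 @ W2)
    return h2 @ W3

print(forward(np.ones(4)).shape)  # a single output value per input vector
```

With trained (rather than random) weights, the same forward pass is what produces the prediction.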
The network ends with a fully connected layer to predict pneumonia, given a
chest X-ray image as the input. Our data will be split into training,
validation and testing sets. A convolution multiplies two matrices and yields a
third, smaller matrix. The network takes an input image and uses a filter to
create a feature map describing the image. In the convolution operation, we
take a filter and slide it over the image matrix. The corresponding numbers in
both matrices are multiplied and summed into a single number describing that
part of the input space. This process is repeated all over the image. We use
different filters to pass over our input, take all the resulting feature maps,
and put them together as the final output of the convolutional layer. The next
step in our process involves further reducing the dimensionality of the data,
which is achieved using the pooling layer. The output from the convolutional
and pooling layers represents high-level features in the data. While that
output could be flattened and connected directly to the output layer, adding a
fully connected layer is a way of learning non-linear combinations of these
features. With minimal effort, we managed to detect the presence of pneumonia
in the X-ray.
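The slide-multiply-sum operation described above can be sketched in plain NumPy; the 4x4 image and the all-ones 3x3 filter are hypothetical values chosen only for illustration:

```python
import numpy as np

def convolve2d(image, kernel):
    # Slide `kernel` over `image`; at each position, multiply the
    # overlapping numbers elementwise and sum them into one output value
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.ones((3, 3))          # a hypothetical 3x3 filter
fmap = convolve2d(image, kernel)  # the feature map
print(fmap.shape)  # (2, 2): the result is a smaller matrix
```

Each filter applied this way produces one feature map; stacking the maps from all filters gives the convolutional layer's output.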
CHAPTER 2
LITERATURE SURVEY
Existing approaches are computationally expensive and require well-labelled
data during training. We pre-process the 3D CT scans using segmentation,
normalization, down-sampling, and zero-centring. Our initial approach
was to simply input the pre-processed 3D CT scans into 3D CNNs, but
the results were poor, so we needed additional pre-processing to input
only regions of interest into the 3D CNNs. To identify regions of interest,
we train a U-Net for nodule candidate detection. We then input regions
around the nodule candidates detected by the U-Net into 3D CNNs to
ultimately classify the CT scans as positive or negative for lung cancer.
The deep 3D CNN models, namely the GoogLeNet-based model, performed the
best on the test set. While we do not achieve state-of-the-art performance,
we perform well considering that we use less labelled data than most
state-of-the-art CAD systems. As an interesting observation, we examined
the first activation layer of one of the older models (where we input the
entire CT volume) for a validation example that was labelled as positive
for cancer. Other future work includes extending our models to 3D images
for other cancers and lung diseases like pneumonia using deep learning
algorithms, particularly the Artificial Neural Network (ANN) and the
Convolutional Neural Network (CNN). (1)
Accurate models to identify patients with active TB have been lacking. The
reason for this lies in the complexity of the clinical and radiographic
presentation, the relatively small patient samples, and the use of
modelling techniques that are poorly suited to the task. Neural networks
are computation systems that process information in parallel, using large
numbers of simple units, and that excel in tasks involving pattern
recognition. These intrinsic properties of neural networks have translated
into higher accuracy in outcome prediction compared with expert opinion or
conventional statistical methods. Therefore, we hypothesized that the
ability to correctly identify patients with active pulmonary TB could be
improved by using computer analyses involving neural networks. To test this
hypothesis, we applied an artificial neural network to the analysis of data
from patients considered to be at high risk for active pulmonary TB and
compared the network output to the physicians' prediction.
A general regression neural network (GRNN) was used to develop the
prognostic model, and its predictive accuracy was compared with the
clinicians' assessment. Predictive accuracy was assessed by the c-index,
which is equivalent to the area under the receiver operating characteristic
curve. The GRNN considerably outperformed the physicians' prediction, with
calculated c-indices (±SEM) of 0.947 ± 0.028 and 0.61 ± 0.045, respectively
(p < 0.001). When the GRNN was applied to the validation group, the
corresponding c-indices were 0.923 ± 0.056 and 0.716 ± 0.095,
respectively. (2)
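As a side note, the c-index mentioned above can be computed directly from its pairwise definition; the risk scores and labels below are made up purely for illustration:

```python
def c_index(scores, labels):
    # Probability that a randomly chosen positive case receives a higher
    # score than a randomly chosen negative case (ties count as 0.5).
    # Equivalent to the area under the ROC curve.
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical risk scores; label 1 = active TB, 0 = no TB
print(c_index([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))  # 1.0 (perfect ranking)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect discrimination, which is why the GRNN's 0.947 far exceeds the physicians' 0.61.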
2.3 A SURVEY ON DEEP LEARNING IN MEDICAL
IMAGE ANALYSIS (Litjens G, Kooi T)
Initially, from the 1970s to the 1990s, medical image analysis was done by
the sequential application of low-level pixel processing (edge and line
detector filters, region growing) and mathematical modelling (fitting lines,
circles and ellipses) to construct compound rule-based systems that solved
particular tasks. (3)
2.4 SURVEY ON RECENT CAD SYSTEMS FOR LIVER
DISEASE DIAGNOSIS (S. S. KUMAR)
The survey indicates that the highest accuracy achieved is 96.7%, the
highest sensitivity is 97.3% and the highest specificity is 96%, obtained
with the contourlet coefficient co-occurrence features. CAD is an
interdisciplinary technology combining artificial intelligence and digital
image processing. These applications not only increase efficiency and
productivity, but also enhance health services for the public. (4)
The seed point is automatically refined by considering the closest point to
the manually selected one that is characterized by a lower intensity.
Starting from this point, a region-growing technique is applied. The region
is iteratively grown by comparing all unallocated neighbouring pixels to the
region. The difference between a pixel's intensity value and the region's
mean is used as a measure of similarity; the pixel with the smallest
difference measured this way is allocated to the respective region. This
process stops when the intensity difference between the region mean and new
pixels becomes larger than a threshold, defined experimentally as two times
the maximum intensity distance in the region. On each slice, the kidney area
was calculated as the number of pixels inside the detected contour
multiplied by the spatial resolution. Unexpectedly low area values inside
the detected contour resulted in the automatic repetition of the
region-growing step with a larger value for the distance evaluation.
Non-kidney structures were automatically identified and excluded from
detection and from area measurement. Right and left kidney volumes measured
from stereology ranged from 197 to 3,111 ml and from 161 to 2,156 ml,
respectively, reflecting the wide range of volumes in the selected
population of ADPKD patients. The results of this study demonstrate the
feasibility of using our approach for accurate and fast evaluation of total
renal volume, even in markedly enlarged ADPKD kidneys. (5)
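A much simplified sketch of region growing is shown below. It is not the paper's exact procedure: it admits any 4-neighbour whose intensity is within a fixed threshold of the running region mean, rather than always taking the single closest unallocated pixel, and the tiny image is invented for illustration:

```python
import numpy as np
from collections import deque

def region_grow(img, seed, thresh):
    # Grow a region from `seed`, admitting any 4-neighbour whose intensity
    # differs from the current region mean by at most `thresh`
    h, w = img.shape
    region = np.zeros((h, w), dtype=bool)
    region[seed] = True
    queue = deque([seed])
    mean, count = float(img[seed]), 1
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1)):
            if 0 <= ny < h and 0 <= nx < w and not region[ny, nx]:
                if abs(float(img[ny, nx]) - mean) <= thresh:
                    region[ny, nx] = True
                    # update the running region mean as pixels are added
                    mean = (mean * count + float(img[ny, nx])) / (count + 1)
                    count += 1
                    queue.append((ny, nx))
    return region

img = np.array([[10, 11, 50],
                [12, 11, 52],
                [10, 13, 51]], dtype=float)
mask = region_grow(img, (0, 0), thresh=5)
print(mask.sum())  # 6: the low-intensity block, excluding the bright column
```

The stopping criterion plays the role of the paper's experimentally defined threshold: growth halts once no remaining neighbour is close enough to the region mean.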
Liver tumours change the intensity and structure of the liver. Due to the
large variance and shape deformation produced by such abnormalities, it is
difficult to achieve accurate segmentation with a completely automatic
method. Graph-cut algorithms have been successfully applied to the medical
image segmentation of different organs on 3D volume data; however, this not
only leads to a very large-scale graph, because the node count equals the
voxel count, but also completely ignores available organ shape priors. Thus,
a slice-by-slice liver segmentation method that combines shape constraints
from the previous slice's segmentation has been proposed based on graph cut.
Conventional graph-cut-based organ segmentation methods need to consider all
voxels as nodes and construct a very large-scale graph, which requires large
memory space for storage and high computational cost for the optimization.
SLIC (Simple Linear Iterative Clustering) is used to obtain superpixels in
the medical data, but the constructed graph scale is still large, and the
computation of a distance map from every voxel to the segmented shape leads
to high cost. Our superpixel-based graph can reduce the memory usage. In
order to explore an efficient and effective slice-by-slice segmentation
method for the liver, this paper proposes to apply a clustering algorithm
that first groups slice pixels into superpixels as nodes for constructing
the graph, which not only greatly reduces the graph scale but also
significantly speeds up the optimization procedure. To validate the
effectiveness and efficiency of our proposed method, we conducted
experiments on 10 CT volumes, most of which have tumours inside the liver
and an abnormally deformed liver shape. Our method yields an average Dice
coefficient of 0.94, takes about 659.22 seconds of computation, and uses
only 1.5 GB of memory. (6)
CHAPTER 3
SYSTEM ANALYSIS
Drawbacks
1. Requires breath holding, which some patients cannot manage.
2. Side effects of CT scans include diarrhoea, nausea or vomiting, and
constipation.
3.2 PROPOSED SYSTEM
Reading chest X-ray images can be tricky and requires domain expertise and
experience. It would be convenient if we could simply ask a computer to read
the images and tell us the results. In this project, we use deep learning to
train an AI algorithm that analyses chest X-ray images and detects
pneumonia. A convolutional neural network (CNN) is a class of deep neural
network that specializes in analysing images and is therefore widely used in
computer vision applications such as image classification and clustering,
object detection, and neural style transfer.
Advantages of the proposed system
1. Once trained, predictions are fast.
2. A CNN can be trained with any number of input layers.
3.3 LANGUAGE DESCRIPTION
PYTHON
Python is an interpreted high-level programming language for general-
purpose programming. Python has a design philosophy that emphasizes code
readability, and a syntax that allows programmers to express concepts in
fewer lines of code, notably using significant whitespace. It provides
constructs that enable clear programming on both small and large scales.
Rather than having all of its functionality built into its core, Python
was designed to be highly extensible. This compact modularity has made it
particularly popular as a means of adding programmable interfaces to
existing applications.
The Natural Language Toolkit, or more commonly NLTK, is a suite of
libraries and programs for symbolic and statistical Natural Language
Processing (NLP) for English written in the Python programming language.
NLTK includes graphical demonstrations and sample data. It is accompanied
by a book that explains the underlying concepts behind the language
processing tasks supported by the toolkit.
CHAPTER 4
SYSTEM SPECIFICATION
A complete specification of hardware and software requirements is
essential for the success of software development. This software has been
developed on a powerful, high-performance multi-user computing system and
is applicable in areas where high processing speed is required.
CHAPTER 5
SYSTEM STUDY
5.1 FEASIBILITY STUDY
Feasibility and risk analysis are related in many ways. If project risk is
great, the feasibility of producing quality software is reduced. During
product engineering, however, we concentrate our attention on primary areas
of interest.
Technical Feasibility
Economical Feasibility
Behavioural Feasibility
5.1.1 TECHNICAL FEASIBILITY
Technical feasibility is the most difficult area to assess at this stage of
the system development process: because objectives, functions and
performance are somewhat hazy, anything seems possible if the right
assumptions are made.
CHAPTER 6
SYSTEM DESIGN
SYSTEM ARCHITECTURE
Analysing the problem
Data collection
Data Understanding
Modeling
Deployment
Visualization
CHAPTER 7
MODULE DESCRIPTION
7.1 MODULES
1. Analysing the problem
2. Data Collection
3. Data Understanding
4. Modeling
5. Deployment
6. Visualization
We resize the images to 226 x 226 and also flip them horizontally, so that
we have more data (images) to train on.
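As a rough NumPy-only sketch of this augmentation step (the appendix uses Keras generators for this instead; nearest-neighbour resizing below stands in for whatever interpolation a library would use, and the input image is random test data):

```python
import numpy as np

def resize_nearest(img, size):
    # Nearest-neighbour resize of an H x W (x C) image to size x size
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows][:, cols]

def augment(img, size=226):
    # Resize to 226 x 226 and also return the horizontal flip,
    # doubling the number of training images as described above
    r = resize_nearest(img, size)
    return r, np.fliplr(r)

img = np.random.default_rng(0).integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
orig, flipped = augment(img)
print(orig.shape, flipped.shape)  # (226, 226, 3) (226, 226, 3)
```

Flipping is a safe augmentation for chest X-rays in the sense that it changes the image without changing the label.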
7.1.3 DATA UNDERSTANDING
7.1.4 MODELING
FIG 7.1.4 MODELING
There are three types of layers through which the given dataset is trained
and tested:
1. Convolutional layer
2. Pooling layer
3. Fully connected layer
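A minimal sketch of what the pooling layer does (2x2 max pooling with stride 2 on a hypothetical feature map, halving its height and width):

```python
import numpy as np

def max_pool(fmap, k=2):
    # k x k max pooling with stride k: keep the largest value in each
    # window, reducing the feature map's height and width by a factor of k
    h, w = fmap.shape
    out = fmap[:h - h % k, :w - w % k].reshape(h // k, k, w // k, k)
    return out.max(axis=(1, 3))

fmap = np.array([[1, 3, 2, 0],
                 [4, 2, 1, 5],
                 [0, 1, 3, 2],
                 [6, 2, 1, 4]], dtype=float)
print(max_pool(fmap))
# [[4. 5.]
#  [6. 4.]]
```

This dimensionality reduction is what keeps the number of parameters manageable before the fully connected layers.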
7.1.5 DEPLOYMENT
Therefore, "deployment" should be interpreted as a general process that has to
be customized according to specific requirements or characteristics.
7.1.6 VISUALIZATION
CHAPTER 8
SYSTEM TESTING
8.1 TESTING
Testing is a series of different tests whose primary purpose is to fully
exercise the computer-based system. Although each test has a different
purpose, all of them should verify that every system element has been
properly integrated and performs its allocated function. Testing is the
process of checking whether the developed system works according to the
actual requirements and objectives of the system. The philosophy behind
testing is to find errors. A good test is one that has a high probability of
finding an undiscovered error, and a successful test is one that uncovers
such an error. Test cases are devised with this purpose in mind. A test case
is a set of data that the system will process as input; the data are created
with the intent of determining whether the system will process them
correctly, without any errors, to produce the required output.
8.2 TYPES OF TESTING
1. Unit testing
2. Integration testing
3. Validation testing
4. Output testing
5. User acceptance testing
6. Performance testing
8.2.1 UNIT TESTING
All modules were tested individually as soon as they were completed and
were checked for their correct functionality.
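As an illustration of unit testing in this project's language, the sketch below tests a small preprocessing helper in isolation; the `normalize` function is an invented stand-in, not part of the project code:

```python
import unittest
import numpy as np

def normalize(img):
    # Hypothetical preprocessing helper, used here only as the unit under
    # test: scale 8-bit pixel values into [0, 1]
    return img.astype(np.float32) / 255.0

class TestNormalize(unittest.TestCase):
    # A unit test checks one module in isolation for correct functionality
    def test_range(self):
        out = normalize(np.array([0, 128, 255], dtype=np.uint8))
        self.assertAlmostEqual(float(out.min()), 0.0)
        self.assertAlmostEqual(float(out.max()), 1.0)

# Run the test suite without exiting the interpreter
unittest.main(argv=["unit"], exit=False)
```

Each module of the system can be exercised this way before integration.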
8.2.2 INTEGRATION TESTING
The entire project was split into small programs, each of which produces a
frame as output. These programs were tested individually; finally, all of
them were combined by creating another program in which all these
constructors were used. Initially, this caused problems because the programs
did not function in an integrated manner. User interface testing is
important since the user has to confirm that the arrangements made in the
frames are convenient and satisfactory. When the frames were given for
testing, the end users gave suggestions; based on these suggestions the
frames were modified and put into practice.
The on-screen output conforms to the user's needs, and for the hard copy the
output is produced according to the specifications requested by the user.
8.2.5 USER ACCEPTANCE TESTING
CHAPTER 9
SYSTEM IMPLEMENTATION
CHAPTER 10
CONCLUSION
10.1 CONCLUSION
Our future work will be based on studying more complex diseases that can be
addressed with neural networks.
APPENDIX
APPENDIX A.1
A.1 SOURCE CODE
# Import Packages
import os
import numpy as np
import matplotlib.pyplot as plt
import keras
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Flatten
from keras.layers import Dense
from keras.preprocessing.image import ImageDataGenerator, load_img
mainDIR = os.listdir('D:/chest_xray/test')
print(mainDIR)
train_folder = 'D:/chest_xray/train'
val_folder = 'D:/chest_xray/val'
test_folder = 'D:/chest_xray/test'
# NORMAL and PNEUMONIA sub-folders of the training set
train_n = train_folder + '/NORMAL/'
train_p = train_folder + '/PNEUMONIA/'
print(len(os.listdir(train_n)))
# Normal
rand_norm = np.random.randint(0, len(os.listdir(train_n)))
norm_pic = os.listdir(train_n)[rand_norm]
print('normal picture title: ', norm_pic)
norm_pic_address = train_n + norm_pic
# Pneumonia
rand_p = np.random.randint(0, len(os.listdir(train_p)))
sic_pic = os.listdir(train_p)[rand_p]
sic_address = train_p + sic_pic
print('pneumonia picture title:', sic_pic)
sic_load = load_img(sic_address)  # the pneumonia sample chosen above
f = plt.figure()
a2 = f.add_subplot(1, 2, 2)
img_plot = plt.imshow(sic_load)
a2.set_title('Pneumonia')
# let's build the CNN model
cnn = Sequential()
# Convolution
cnn.add(Conv2D(32, (3, 3), activation="relu", input_shape=(64, 64, 3)))
# Pooling
cnn.add(MaxPooling2D(pool_size=(2, 2)))
# 2nd Convolution
cnn.add(Conv2D(32, (3, 3), activation="relu"))
# 2nd Pooling layer
cnn.add(MaxPooling2D(pool_size=(2, 2)))
# Flatten and fully connected layers
cnn.add(Flatten())
cnn.add(Dense(activation='relu', units=128))
cnn.add(Dense(activation='sigmoid', units=1))
# Compile the model for binary classification
cnn.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
# Image normalization (the generator definitions were missing: rescale pixel
# values to [0, 1]; training images are also flipped horizontally)
train_datagen = ImageDataGenerator(rescale=1./255, horizontal_flip=True)
test_datagen = ImageDataGenerator(rescale=1./255)
training_set = train_datagen.flow_from_directory('D:/chest_xray/train/',
                                                 target_size=(64, 64),
                                                 batch_size=32,
                                                 class_mode='binary')
validation_generator = test_datagen.flow_from_directory('D:/chest_xray/val/',
                                                        target_size=(64, 64),
                                                        batch_size=32,
                                                        class_mode='binary')
test_set = test_datagen.flow_from_directory('D:/chest_xray/test/',
                                            target_size=(64, 64),
                                            batch_size=32,
                                            class_mode='binary')
cnn.summary()
cnn_model = cnn.fit_generator(training_set,
                              steps_per_epoch=163,
                              epochs=10,
                              validation_data=validation_generator,
                              validation_steps=624)
test_accu = cnn.evaluate_generator(test_set, steps=624)
print('The testing accuracy is :', test_accu[1]*100, '%')
plt.plot(cnn_model.history['acc'])
plt.plot(cnn_model.history['val_acc'])
plt.title('Model Accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Training set', 'Validation set'], loc='upper left')
plt.show()
plt.plot(cnn_model.history['loss'])
plt.plot(cnn_model.history['val_loss'])
plt.title('Model Loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Training set', 'Validation set'], loc='upper left')
plt.show()
APPENDIX A.2
A.2 SCREENSHOTS
A.2.3 PYTHON INSTALLATION IN ANACONDA PROMPT
A.2.5 PILLOW INSTALLATION
A.2.6 DATASET FOR THE MODEL
A.2.7 KERAS INSTALLATION
A.2.8 MODEL ACCURACY OF PNEUMONIA
REFERENCES