Final Report 2
PNEUMONIA USING
CONVOLUTIONAL NEURAL NETWORK
A PROJECT REPORT
Submitted by
AJITH. M
DURAI SHRIDHARSHAN.R
NANDHINI.S
PAVITHRA.S
BONAFIDE CERTIFICATE
With a deep sense of gratitude, we extend our earnest and sincere thanks to our
project coordinators Mr. T. Karthikeyan, Assistant Professor, and
Mr. M. Senthilkumar, Assistant Professor, Department of Computer
Science and Engineering, for their kind guidance and encouragement during
this project. We would also like to thank all the staff members of our
department, our friends, and the students who helped us directly and
indirectly in completing this project work successfully.
TABLE OF CONTENTS
LIST OF FIGURES II
1 INTRODUCTION 1
4 SYSTEM SPECIFICATION 15
4.1 HARDWARE REQUIREMENTS 15
4.2 SOFTWARE REQUIREMENTS 15
5 SYSTEM STUDY 16
5.1.1 TECHNICAL FEASIBILITY 16
6 SYSTEM DESIGN 18
7 MODULE DESCRIPTION 20
7.1 MODULES 20
7.1.1 ANALYZING THE PROBLEM 20
7.1.4 MODELING 21
7.1.5 DEPLOYMENT 22
7.1.6 VISUALIZATION 23
8 SYSTEM TESTING 25
8.1 TESTING 25
8.2 TYPES OF TESTS 25
8.2.1 Unit Testing 26
8.2.2 Integration Testing 26
8.2.3 Validation Testing 26
8.2.4 Output Testing 26
8.2.5 User Acceptance Testing 27
8.2.6 Performance Testing 27
9 SYSTEM IMPLEMENTATION 28
10 CONCLUSION 29
10.1 CONCLUSION 29
11 APPENDIX A1
Chest diseases are very serious health problems in people's lives, and their
early diagnosis is very important. Many methods have been developed for this
purpose. In chest X-rays (CXRs), factors such as the positioning of the patient
and the depth of inspiration can alter the appearance of the image, and
clinicians are faced with reading high volumes of images every shift. To assist
with this task, we study the detection of pneumonia using chest X-rays. In this
report, convolutional neural networks (CNNs) are presented for the diagnosis of
pneumonia. The architecture of the CNN and its design principles are presented.
First, we take a dataset that is gathered and organised into folders. The
entire dataset is passed forward and backward through the neural network, and
once the input data has been processed through the network, the model outputs a
prediction.
LIST OF FIGURES
7.1.4 Modelling 22
LIST OF ABBREVIATIONS
ABBREVIATIONS EXPANSIONS
AI Artificial Intelligence
ANN Artificial Neural Network
CNN Convolutional Neural Network
CXR Chest X-Ray
CHAPTER 1
INTRODUCTION
Deep learning is a specific approach used for building and training
neural networks, which are considered highly promising decision-making
nodes. An algorithm is considered deep if the input data is passed
through a series of nonlinearities, or nonlinear transformations, before it
becomes output. Deep learning removes the manual identification of
features in the data and instead relies on its training process to
discover the useful patterns in the input.
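The idea of passing input through a series of nonlinear transformations can be sketched in a few lines of NumPy; the layer sizes and random weights below are arbitrary illustrations, not part of this project's model:

```python
import numpy as np

def relu(x):
    # The nonlinearity applied after each layer's linear transform
    return np.maximum(0, x)

# Arbitrary illustrative weights: 4 inputs -> 3 hidden -> 2 hidden -> 1 output
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(3, 2))
W3 = rng.normal(size=(2, 1))

def forward(x):
    # "Deep": the input passes through a series of nonlinear
    # transformations before it becomes output
    h1 = relu(x @ W1)
    h2 = relu(h1 @ W2)
    return h2 @ W3

print(forward(np.ones(4)).shape)  # a single output value per input vector
```

With trained (rather than random) weights, the same forward pass is what produces the prediction.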
The network ends with a fully connected layer to predict pneumonia, given a
chest X-ray image as the input. Our data will be split into training,
validation and testing sets. A convolution multiplies two matrices and yields a
third, smaller matrix. The network takes an input image and uses a filter to
create a feature map describing the image. In the convolution operation, we
take a filter and slide it over the image matrix. The corresponding numbers in
both matrices are multiplied and summed into a single number describing that
part of the input space. This process is repeated all over the image. We use
different filters to pass over our input, take all the resulting feature maps,
and put them together as the final output of the convolutional layer. The next
step in our process involves further reducing the dimensionality of the data,
which is achieved using the pooling layer. The output from the convolutional
and pooling layers represents high-level features in the data. While that
output could be flattened and connected directly to the output layer, adding a
fully connected layer is a way of learning non-linear combinations of these
features. With minimal effort, we managed to detect the presence of pneumonia
in the X-ray.
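The slide-multiply-sum operation described above can be sketched in plain NumPy; the 4x4 image and the all-ones 3x3 filter are hypothetical values chosen only for illustration:

```python
import numpy as np

def convolve2d(image, kernel):
    # Slide `kernel` over `image`; at each position, multiply the
    # overlapping numbers elementwise and sum them into one output value
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.ones((3, 3))          # a hypothetical 3x3 filter
fmap = convolve2d(image, kernel)  # the feature map
print(fmap.shape)  # (2, 2): the result is a smaller matrix
```

Each filter applied this way produces one feature map; stacking the maps from all filters gives the convolutional layer's output.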
CHAPTER 2
LITERATURE SURVEY
Existing approaches are computationally expensive and require well-labelled
data during training. We pre-process the 3D CT scans using segmentation,
normalization, down-sampling, and zero-centring. Our initial approach
was to simply input the pre-processed 3D CT scans into 3D CNNs, but
the results were poor, so we needed additional pre-processing to input
only regions of interest into the 3D CNNs. To identify regions of interest,
we train a U-Net for nodule candidate detection. We then input regions
around the nodule candidates detected by the U-Net into 3D CNNs to
ultimately classify the CT scans as positive or negative for lung cancer.
The deep 3D CNN models, namely the GoogLeNet-based model, performed the
best on the test set. While we do not achieve state-of-the-art performance,
we perform well considering that we use less labelled data than most
state-of-the-art CAD systems. As an interesting observation, we examined
the first activation layer of one of the older models (where we input the
entire CT volume) for a validation example that was labelled as positive
for cancer. Other future work includes extending our models to 3D images
for other cancers and lung diseases like pneumonia using deep learning
algorithms, particularly the Artificial Neural Network (ANN) and the
Convolutional Neural Network (CNN). (1)
Accurate models to identify patients with active TB have been lacking. The
reason for this lies in the complexity of the clinical and radiographic
presentation, the relatively small patient samples, and the use of
modelling techniques that are poorly suited to the task. Neural networks
are computation systems that process information in parallel, using large
numbers of simple units, and that excel in tasks involving pattern
recognition. These intrinsic properties of neural networks have translated
into higher accuracy in outcome prediction compared with expert opinion or
conventional statistical methods. Therefore, we hypothesized that the
ability to correctly identify patients with active pulmonary TB could be
improved by using computer analyses involving neural networks. To test this
hypothesis, we applied an artificial neural network to the analysis of data
from patients considered to be at high risk for active pulmonary TB and
compared the network output to the physicians' prediction.
A general regression neural network (GRNN) was used to develop the
prognostic model, and its predictive accuracy was compared with the
clinicians' assessment. Predictive accuracy was assessed by the c-index,
which is equivalent to the area under the receiver operating characteristic
curve. The GRNN considerably outperformed the physicians' prediction, with
calculated c-indices (±SEM) of 0.947 ± 0.028 and 0.61 ± 0.045, respectively
(p < 0.001). When the GRNN was applied to the validation group, the
corresponding c-indices were 0.923 ± 0.056 and 0.716 ± 0.095,
respectively. (2)
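As a side note, the c-index mentioned above can be computed directly from its pairwise definition; the risk scores and labels below are made up purely for illustration:

```python
def c_index(scores, labels):
    # Probability that a randomly chosen positive case receives a higher
    # score than a randomly chosen negative case (ties count as 0.5).
    # Equivalent to the area under the ROC curve.
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical risk scores; label 1 = active TB, 0 = no TB
print(c_index([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))  # 1.0 (perfect ranking)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect discrimination, which is why the GRNN's 0.947 far exceeds the physicians' 0.61.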
2.3 A SURVEY ON DEEP LEARNING IN MEDICAL
IMAGE ANALYSIS (Litjens G, Kooi T)
Initially, from the 1970s to the 1990s, medical image analysis was done by
the sequential application of low-level pixel processing (edge and line
detector filters, region growing) and mathematical modelling (fitting lines,
circles and ellipses) to construct compound rule-based systems that solved
particular tasks. (3)
2.4 SURVEY ON RECENT CAD SYSTEMS FOR LIVER
DISEASE DIAGNOSIS (S. S. KUMAR)
The survey indicates that the highest accuracy achieved is 96.7%, the
highest sensitivity is 97.3% and the highest specificity is 96%, obtained
with the contourlet coefficient co-occurrence features. CAD is an
interdisciplinary technology combining artificial intelligence and digital
image processing. These applications not only increase efficiency and
productivity, but also enhance health services for the public. (4)
The seed point is automatically refined by considering the closest point to
the manually selected one that is characterized by a lower intensity.
Starting from this point, a region-growing technique is applied. The region
is iteratively grown by comparing all unallocated neighbouring pixels to the
region. The difference between a pixel's intensity value and the region's
mean is used as a measure of similarity; the pixel with the smallest
difference measured this way is allocated to the respective region. This
process stops when the intensity difference between the region mean and new
pixels becomes larger than a threshold, defined experimentally as two times
the maximum intensity distance in the region. On each slice, the kidney area
was calculated as the number of pixels inside the detected contour
multiplied by the spatial resolution. Unexpectedly low area values inside
the detected contour resulted in the automatic repetition of the
region-growing step with a larger value for the distance evaluation.
Non-kidney structures were automatically identified and excluded from
detection and from area measurement. Right and left kidney volumes measured
from stereology ranged from 197 to 3,111 ml and from 161 to 2,156 ml,
respectively, reflecting the wide range of volumes in the selected
population of ADPKD patients. The results of this study demonstrate the
feasibility of using our approach for accurate and fast evaluation of total
renal volume, even in markedly enlarged ADPKD kidneys. (5)
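A much simplified sketch of region growing is shown below. It is not the paper's exact procedure: it admits any 4-neighbour whose intensity is within a fixed threshold of the running region mean, rather than always taking the single closest unallocated pixel, and the tiny image is invented for illustration:

```python
import numpy as np
from collections import deque

def region_grow(img, seed, thresh):
    # Grow a region from `seed`, admitting any 4-neighbour whose intensity
    # differs from the current region mean by at most `thresh`
    h, w = img.shape
    region = np.zeros((h, w), dtype=bool)
    region[seed] = True
    queue = deque([seed])
    mean, count = float(img[seed]), 1
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1)):
            if 0 <= ny < h and 0 <= nx < w and not region[ny, nx]:
                if abs(float(img[ny, nx]) - mean) <= thresh:
                    region[ny, nx] = True
                    # update the running region mean as pixels are added
                    mean = (mean * count + float(img[ny, nx])) / (count + 1)
                    count += 1
                    queue.append((ny, nx))
    return region

img = np.array([[10, 11, 50],
                [12, 11, 52],
                [10, 13, 51]], dtype=float)
mask = region_grow(img, (0, 0), thresh=5)
print(mask.sum())  # 6: the low-intensity block, excluding the bright column
```

The stopping criterion plays the role of the paper's experimentally defined threshold: growth halts once no remaining neighbour is close enough to the region mean.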
Liver tumours change the intensity and structure of the liver. Due to the
large variance and shape deformation produced by such abnormalities, it is
difficult to achieve accurate segmentation with a completely automatic
method. Graph-cut algorithms have been successfully applied to the medical
image segmentation of different organs on 3D volume data; however, this not
only leads to a very large-scale graph, because the node count equals the
voxel count, but also completely ignores available organ shape priors. Thus,
a slice-by-slice liver segmentation method that combines shape constraints
from the previous slice's segmentation has been proposed based on graph cut.
Conventional graph-cut-based organ segmentation methods need to consider all
voxels as nodes and construct a very large-scale graph, which requires large
memory space for storage and high computational cost for the optimization.
SLIC (Simple Linear Iterative Clustering) is used to obtain superpixels in
the medical data, but the constructed graph scale is still large, and the
computation of a distance map from every voxel to the segmented shape leads
to high cost. Our superpixel-based graph can reduce the memory usage. In
order to explore an efficient and effective slice-by-slice segmentation
method for the liver, this paper proposes to apply a clustering algorithm
that first groups slice pixels into superpixels as nodes for constructing
the graph, which not only greatly reduces the graph scale but also
significantly speeds up the optimization procedure. To validate the
effectiveness and efficiency of our proposed method, we conducted
experiments on 10 CT volumes, most of which have tumours inside the liver
and an abnormally deformed liver shape. Our method yields an average Dice
coefficient of 0.94, takes about 659.22 seconds of computation, and uses
only 1.5 GB of memory. (6)
CHAPTER 3
SYSTEM ANALYSIS
Drawbacks
1. Requires breath holding, which some patients cannot manage.
2. Side effects of CT scans include diarrhoea, nausea or vomiting, and
constipation.
3.2 PROPOSED SYSTEM
Reading chest X-ray images can be tricky and requires domain expertise and
experience. It would be convenient if we could simply ask a computer to read
the images and tell us the results. In this project, we use deep learning to
train an AI algorithm that analyses chest X-ray images and detects
pneumonia. A convolutional neural network (CNN) is a class of deep neural
network that specializes in analysing images and is therefore widely used in
computer vision applications such as image classification and clustering,
object detection, and neural style transfer.
Advantages of the proposed system
1. Once trained, predictions are fast.
2. A CNN can be trained with any number of input layers.
3.3 LANGUAGE DESCRIPTION
PYTHON
Python is an interpreted high-level programming language for general-
purpose programming. Python has a design philosophy that emphasizes code
readability, and a syntax that allows programmers to express concepts in
fewer lines of code, notably using significant whitespace. It provides
constructs that enable clear programming on both small and large scales.
Rather than having all of its functionality built into its core, Python
was designed to be highly extensible. This compact modularity has made it
particularly popular as a means of adding programmable interfaces to
existing applications.
The Natural Language Toolkit, or more commonly NLTK, is a suite of
libraries and programs for symbolic and statistical Natural Language
Processing (NLP) for English written in the Python programming language.
NLTK includes graphical demonstrations and sample data. It is accompanied
by a book that explains the underlying concepts behind the language
processing tasks supported by the toolkit.
CHAPTER 4
SYSTEM SPECIFICATION
A complete specification of hardware and software requirements is
essential for the success of software development. This software has been
developed on a powerful, high-performance multi-user computing system and
is applicable in areas where high processing speed is required.
CHAPTER 5
SYSTEM STUDY
5.1 FEASIBILITY STUDY
Feasibility and risk analysis are related in many ways. If project risk is
great, the feasibility of producing quality software is reduced. During
product engineering, however, we concentrate our attention on primary areas
of interest.
Technical Feasibility
Economical Feasibility
Behavioural Feasibility
5.1.1 TECHNICAL FEASIBILITY
Technical feasibility is the most difficult area to assess at this stage of
the system development process: because objectives, functions and
performance are somewhat hazy, anything seems possible if the right
assumptions are made.
CHAPTER 6
SYSTEM DESIGN
SYSTEM ARCHITECTURE
Analysing the problem
Data collection
Data Understanding
Modeling
Deployment
Visualization
CHAPTER 7
MODULE DESCRIPTION
7.1 MODULES
1. Analysing the problem
2. Data Collection
3. Data Understanding
4. Modeling
5. Deployment
6. Visualization
We resize the images to 226 x 226 and also flip them horizontally, so that
we have more data (images) to train on.
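As a rough NumPy-only sketch of this augmentation step (the appendix uses Keras generators for this instead; nearest-neighbour resizing below stands in for whatever interpolation a library would use, and the input image is random test data):

```python
import numpy as np

def resize_nearest(img, size):
    # Nearest-neighbour resize of an H x W (x C) image to size x size
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows][:, cols]

def augment(img, size=226):
    # Resize to 226 x 226 and also return the horizontal flip,
    # doubling the number of training images as described above
    r = resize_nearest(img, size)
    return r, np.fliplr(r)

img = np.random.default_rng(0).integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
orig, flipped = augment(img)
print(orig.shape, flipped.shape)  # (226, 226, 3) (226, 226, 3)
```

Flipping is a safe augmentation for chest X-rays in the sense that it changes the image without changing the label.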
7.1.3 DATA UNDERSTANDING
7.1.4 MODELING
FIG 7.1.4 MODELING
There are three types of layers through which the given dataset is trained
and tested:
1. Convolutional layer
2. Pooling layer
3. Fully connected layer
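A minimal sketch of what the pooling layer does (2x2 max pooling with stride 2 on a hypothetical feature map, halving its height and width):

```python
import numpy as np

def max_pool(fmap, k=2):
    # k x k max pooling with stride k: keep the largest value in each
    # window, reducing the feature map's height and width by a factor of k
    h, w = fmap.shape
    out = fmap[:h - h % k, :w - w % k].reshape(h // k, k, w // k, k)
    return out.max(axis=(1, 3))

fmap = np.array([[1, 3, 2, 0],
                 [4, 2, 1, 5],
                 [0, 1, 3, 2],
                 [6, 2, 1, 4]], dtype=float)
print(max_pool(fmap))
# [[4. 5.]
#  [6. 4.]]
```

This dimensionality reduction is what keeps the number of parameters manageable before the fully connected layers.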
7.1.5 DEPLOYMENT
Therefore, "deployment" should be interpreted as a general process that has to
be customized according to specific requirements or characteristics.
7.1.6 VISUALIZATION
CHAPTER 8
SYSTEM TESTING
8.1 TESTING
Testing is a series of different tests whose primary purpose is to fully
exercise the computer-based system. Although each test has a different
purpose, all of them should verify that every system element has been
properly integrated and performs its allocated function. Testing is the
process of checking whether the developed system works according to the
actual requirements and objectives of the system. The philosophy behind
testing is to find errors. A good test is one that has a high probability of
finding an undiscovered error, and a successful test is one that uncovers
such an error. Test cases are devised with this purpose in mind. A test case
is a set of data that the system will process as input; the data are created
with the intent of determining whether the system will process them
correctly, without any errors, to produce the required output.
8.2 TYPES OF TESTING
1. Unit testing
2. Integration testing
3. Validation testing
4. Output testing
5. User acceptance testing
6. Performance testing
8.2.1 UNIT TESTING
All modules were tested individually as soon as they were completed and
were checked for their correct functionality.
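As an illustration of unit testing in this project's language, the sketch below tests a small preprocessing helper in isolation; the `normalize` function is an invented stand-in, not part of the project code:

```python
import unittest
import numpy as np

def normalize(img):
    # Hypothetical preprocessing helper, used here only as the unit under
    # test: scale 8-bit pixel values into [0, 1]
    return img.astype(np.float32) / 255.0

class TestNormalize(unittest.TestCase):
    # A unit test checks one module in isolation for correct functionality
    def test_range(self):
        out = normalize(np.array([0, 128, 255], dtype=np.uint8))
        self.assertAlmostEqual(float(out.min()), 0.0)
        self.assertAlmostEqual(float(out.max()), 1.0)

# Run the test suite without exiting the interpreter
unittest.main(argv=["unit"], exit=False)
```

Each module of the system can be exercised this way before integration.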
8.2.2 INTEGRATION TESTING
The entire project was split into small programs, each of which produces a
frame as output. These programs were tested individually; finally, all of
them were combined by creating another program in which all these
constructors were used. Initially, this caused problems because the programs
did not function in an integrated manner. User interface testing is
important since the user has to confirm that the arrangements made in the
frames are convenient and satisfactory. When the frames were given for
testing, the end users gave suggestions; based on these suggestions the
frames were modified and put into practice.
The on-screen output conforms to the user's needs, and for the hard copy the
output is produced according to the specifications requested by the user.
8.2.5 USER ACCEPTANCE TESTING
CHAPTER 9
SYSTEM IMPLEMENTATION
CHAPTER 10
CONCLUSION
10.1 CONCLUSION
Our future work will be based on studying more complex diseases that can be
addressed with neural networks.
APPENDIX
APPENDIX A.1
A.1 SOURCE CODE
# Import Packages
import os
import numpy as np
import matplotlib.pyplot as plt
import keras
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Flatten
from keras.layers import Dense
from keras.preprocessing.image import ImageDataGenerator, load_img
mainDIR = os.listdir('D:/chest_xray/test')
print(mainDIR)
train_folder = 'D:/chest_xray/train'
val_folder = 'D:/chest_xray/val'
test_folder = 'D:/chest_xray/test'
# NORMAL and PNEUMONIA sub-folders of the training set
train_n = train_folder + '/NORMAL/'
train_p = train_folder + '/PNEUMONIA/'
print(len(os.listdir(train_n)))
# Normal
rand_norm = np.random.randint(0, len(os.listdir(train_n)))
norm_pic = os.listdir(train_n)[rand_norm]
print('normal picture title: ', norm_pic)
norm_pic_address = train_n + norm_pic
# Pneumonia
rand_p = np.random.randint(0, len(os.listdir(train_p)))
sic_pic = os.listdir(train_p)[rand_p]
sic_address = train_p + sic_pic
print('pneumonia picture title:', sic_pic)
sic_load = load_img(sic_address)  # the pneumonia sample chosen above
f = plt.figure()
a2 = f.add_subplot(1, 2, 2)
img_plot = plt.imshow(sic_load)
a2.set_title('Pneumonia')
# let's build the CNN model
cnn = Sequential()
# Convolution
cnn.add(Conv2D(32, (3, 3), activation="relu", input_shape=(64, 64, 3)))
# Pooling
cnn.add(MaxPooling2D(pool_size=(2, 2)))
# 2nd Convolution
cnn.add(Conv2D(32, (3, 3), activation="relu"))
# 2nd Pooling layer
cnn.add(MaxPooling2D(pool_size=(2, 2)))
# Flatten and fully connected layers
cnn.add(Flatten())
cnn.add(Dense(activation='relu', units=128))
cnn.add(Dense(activation='sigmoid', units=1))
# Compile the model for binary classification
cnn.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
# Image normalization (the generator definitions were missing: rescale pixel
# values to [0, 1]; training images are also flipped horizontally)
train_datagen = ImageDataGenerator(rescale=1./255, horizontal_flip=True)
test_datagen = ImageDataGenerator(rescale=1./255)
training_set = train_datagen.flow_from_directory('D:/chest_xray/train/',
                                                 target_size=(64, 64),
                                                 batch_size=32,
                                                 class_mode='binary')
validation_generator = test_datagen.flow_from_directory('D:/chest_xray/val/',
                                                        target_size=(64, 64),
                                                        batch_size=32,
                                                        class_mode='binary')
test_set = test_datagen.flow_from_directory('D:/chest_xray/test/',
                                            target_size=(64, 64),
                                            batch_size=32,
                                            class_mode='binary')
cnn.summary()
cnn_model = cnn.fit_generator(training_set,
                              steps_per_epoch=163,
                              epochs=10,
                              validation_data=validation_generator,
                              validation_steps=624)
test_accu = cnn.evaluate_generator(test_set, steps=624)
print('The testing accuracy is :', test_accu[1]*100, '%')
plt.plot(cnn_model.history['acc'])
plt.plot(cnn_model.history['val_acc'])
plt.title('Model Accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Training set', 'Validation set'], loc='upper left')
plt.show()
plt.plot(cnn_model.history['loss'])
plt.plot(cnn_model.history['val_loss'])
plt.title('Model Loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Training set', 'Validation set'], loc='upper left')
plt.show()
APPENDIX A.2
A.2 SCREENSHOTS
A.2.3 PYTHON INSTALLATION IN ANACONDA PROMPT
A.2.5 PILLOW INSTALLATION
A.2.6 DATASET FOR THE MODEL
A.2.7 KERAS INSTALLATION
A.2.8 MODEL ACCURACY OF PNEUMONIA
REFERENCES