Major Final

A PROJECT PHASE-1 REPORT SUBMITTED ON
LUNG NODULE DETECTION USING DEEP LEARNING

A report submitted in partial fulfillment of the requirements for the Award of Degree of
BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE AND ENGINEERING
By
BOINPALLY RAVITEJA 20671A0503
KONAPURAM SIDDA 20671A0522
L. SHARATH KUMAR 20671A0525
GURU DEVENDER 21675A0503
Under the esteemed guidance of
Mrs. S. PAVANI
ASSISTANT PROFESSOR
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

J.B. INSTITUTE OF ENGINEERING & TECHNOLOGY
(UGC Autonomous)
Approved by AICTE, Autonomous, accredited by NBA & NAAC, Permanently affiliated
to JNTUH, Hyderabad, Telangana.
2023-2024
CERTIFICATE
This is to certify that the Major Project stage-1 report entitled “LUNG NODULE DETECTION
USING DEEP LEARNING”, submitted to the Department of Computer Science and Engineering,
J.B. Institute of Engineering & Technology, in accordance with Jawaharlal Nehru Technological
University regulations in partial fulfillment of the requirements for the successful completion of
Bachelor of Technology, is a record of bonafide work carried out during the academic year 2023-2024 by
BOINPALLY RAVITEJA 20671A0503
KONAPURAM SIDDA 20671A0522
L. SHARATH KUMAR 20671A0525
GURU DEVENDER 21675A0503
Internal Guide: Mrs. S. PAVANI, Assistant Professor
Head of the Department: Dr. G. SREENIVASULU, Associate Professor
Department of CSE
DECLARATION
We hereby declare that the Major Project stage-1 report entitled “LUNG NODULE DETECTION
USING DEEP LEARNING”, carried out by us under the guidance of Mrs. S. Pavani, Assistant
Professor, is submitted in partial fulfillment of the requirements for the award of the degree of Bachelor
of Technology in Computer Science and Engineering. This is a record of bonafide work carried out
by us, and the results embodied in this project report have not been reproduced or copied from any
source. The results embodied in this project report have not been submitted to any other university
or institute for the award of any other degree or diploma.
Date: 13 / 12 / 2023
ACKNOWLEDGEMENT
At the outset, we express our gratitude to the Almighty for showering his grace and blessings upon us
to complete this Major Project stage-1. Although our names appear on the cover of this report, many
people have contributed in some form or other to the development of this project. We could not have
completed this project without the assistance and support of each of the following.
First of all, we are highly indebted to Dr. P. C. KRISHNAMACHARY, Principal, for giving us
permission to carry out this Major Project stage-1.
We would like to thank Dr. G. SREENIVASULU, Associate Professor & Head of the Department of
COMPUTER SCIENCE AND ENGINEERING, for being a source of moral support throughout the period of
study in the Department.
We are grateful to Mrs. S. Pavani, Assistant Professor, COMPUTER SCIENCE AND ENGINEERING,
for her valuable suggestions and guidance during the execution of this project work.
We would like to thank the Teaching and Non-Teaching Staff of the Department of Computer
Science & Engineering for sharing their knowledge with us.
TABLE OF CONTENTS
SL.NO NAME OF TOPIC
1. INTRODUCTION
2. LITERATURE SURVEY
3. SYSTEM ANALYSIS
3.1 Existing Systems
3.2 Disadvantages
3.3 Proposed Systems
3.4 Advantages
4. REQUIREMENT SPECIFICATIONS
4.1 Functional Requirements
4.2 Non-Functional Requirements
4.3 Software Requirements
4.4 Hardware Requirements
5. SYSTEM DESIGN
5.1 System Architecture
5.2 Data Flow Diagram
5.3 Sequence Diagram
6. BIBLIOGRAPHY
LIST OF FIGURES
Sl. No Description of Figures
1. System Architecture
2. Data Flow Diagram
3. Sequence Diagram
ABSTRACT
Lung cancer is a high-risk disease that affects people all over the world, and lung nodules are the most
common sign of early lung cancer. Since early identification of lung cancer can considerably improve
a patient's chance of survival, an accurate and efficient nodule detection system is essential.
Automatic lung nodule recognition reduces radiologists' workload as well as the risk of misdiagnosis
and missed diagnoses. Hence, this work develops a lung nodule detection model with four stages:
image pre-processing, segmentation, feature extraction, and classification. In the first stage, the
input image is subjected to a series of pre-processing operations. In the second stage, the Otsu
threshold model is used to segment the pre-processed images. In the third stage, Local Binary
Pattern (LBP) features are extracted, which are then classified by an optimized Convolutional Neural
Network (CNN). The activation function and the number of convolutional layers of the CNN are tuned
by a proposed algorithm called Improved Moth Flame Optimization (IMFO). Finally, the improvement
offered by the scheme is validated through an analysis in terms of selected performance measures,
accuracy in particular.
1. INTRODUCTION
Lung cancer and COVID-19 are two severe pulmonary diseases that cause millions of deaths globally
each year. Lung cancer is the second most common form of cancer in both women and men, and it is the
leading cause of cancer deaths in the US. The best chances of survival come from early detection and
diagnosis, which can be aided by improved automated techniques for recognizing malignant nodules. A
lung nodule is a small, roughly round growth of tissue found in the chest cavity. Nodules are usually
below 30 mm in size; larger growths are termed masses and are assumed to be malignant.
Nodules between 5 and 30 mm may be malignant or benign, with the probability of malignancy rising
with size. Spiculated or lobulated nodule edges may indicate malignancy, whereas smooth nodules
with signs of calcification are likely to be benign. The two most important chest imaging methods
are conventional X-ray imaging (radiography) and computed tomography (CT).
Radiographs, or chest X-ray images, offer a single view of the chest cavity. The posteroanterior view,
in which the X-ray beam passes through the patient's chest from back to front, is the most common. CT
scans are 3-D images reconstructed from X-ray images acquired from several orientations. They offer
a complete view of the internal parts of the chest and can therefore be used to detect the shapes,
sizes, locations, and densities of lung nodules more easily. However, CT equipment is expensive and
is often not available in rural areas or small hospitals.
Radiographs, in contrast, are comparatively fast and cheap and expose patients to only a small dose of
radiation, so they are typically the initial diagnostic step for identifying abnormalities in the
chest. Computer-aided detection (CAD) methods have been deployed to identify lung nodules more
precisely and quickly. Conventional nodule recognition approaches use image processing schemes to
find areas of the chest radiograph that contain a brighter object with the expected texture, shape,
and size of a lung nodule. With recent advances in CNNs, researchers have aimed to exploit these
techniques to classify lung nodules. Unfortunately, the datasets available in medical imaging are
comparatively small. The literature on detecting and diagnosing lung nodules is extensive.
To date, the general technique for lung nodule diagnosis in existing CAD systems has been to use
a candidate identification stage. While some of these studies use low-level appearance-based
features to drive this identification task, others use shape and size information. Among deep
learning-based methods, Ypsilantis et al. proposed using recurrent neural networks in a patch-based
strategy to improve nodule detection. A 2D multi-step segmentation approach was presented by
Krishnamurthy et al. to discover candidates.
There have also been in-depth studies on extracting high-level discriminative information with
deep networks to reduce false positives (FP). Setio et al. trained nine independent 2D convolutional
neural networks on nine different views of each candidate and fused their outputs to perform FP
reduction. For candidate detection, another study used a modified version of Faster R-CNN, which
was the state-of-the-art object detector at the time, and a patch-based 3D CNN for the FP reduction
step. All of these approaches, however, are computationally inefficient.
2. LITERATURE SURVEY
In the realm of lung cancer detection and prediction, several studies have explored various
methodologies and algorithms to enhance accuracy and efficiency.
Detection of Lung Cancer in CT Images using Image Processing (Nidhi S., 2019) introduced an image
processing approach coupled with a support vector machine (SVM) for lung cancer detection. However,
this study did not use a dedicated database for its analysis.
Multi-Stage Lung Cancer Detection and Prediction Using Multi-class SVM Classifier (Janee Alam,
2019) applied a multi-class SVM classifier to lung cancer detection and prediction across multiple
stages. The study highlighted the difficulty of achieving accurate detection and reported low
accuracy rates.
A Comparative Study of Lung Cancer Detection using Supervised Neural Network (Ahana Gangly,
2019) compared lung cancer detection techniques based on a supervised neural network, using the
SURF (Speeded Up Robust Features) algorithm. However, the reported accuracy rates were relatively
low.
Multi-Layer Perceptron Based Lung Tumor Classification (Sneha Petagna, 2018) explored lung tumor
classification using a Multi-Layer Perceptron (MLP) algorithm within an image processing
framework. One of the identified challenges was the time-consuming nature of the process.
Robustness-Driven Feature Selection in Classification of Fibrotic Interstitial Lung Disease Patterns
in Computed Tomography Using 3D Texture Features (Dainel Y Hung, 2016) proposed a
robustness-driven feature selection approach for classifying fibrotic interstitial lung disease
patterns in CT images. However, the study reported slow processing times as a limitation.
Automatic Detection and Segmentation of Lung Nodule on CT Images (Yangchuan, 2018) focused on
automatic detection and segmentation of lung nodules in CT images using a fully convolutional
network (FCN). Nevertheless, the study reported poor detection.
3. SYSTEM ANALYSIS
3.1 EXISTING SYSTEM
K-means clustering
Wavelet and principal component analysis (PCA)
KNN classifier
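The KNN classifier listed above can be illustrated with a minimal, dependency-free sketch. The feature vectors, labels, and the function name `knn_predict` are hypothetical and serve only to show the majority-vote rule; they are not taken from any of the existing systems.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    # Euclidean distance from the query to every training vector
    dists = [(math.dist(x, query), y) for x, y in zip(train_x, train_y)]
    dists.sort(key=lambda pair: pair[0])
    # Majority vote over the labels of the k closest points
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Toy 2-D feature vectors (for example, two principal components); values are made up.
train_x = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.25), (0.9, 0.8), (0.8, 0.9), (0.85, 0.95)]
train_y = ["non-nodule", "non-nodule", "non-nodule", "nodule", "nodule", "nodule"]

print(knn_predict(train_x, train_y, (0.82, 0.88)))
```

Because every prediction scans the whole training set, the cost grows with the number of stored samples, which is one reason such classifiers can be slow on large image collections.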
3.2 DISADVANTAGES OF EXISTING SYSTEM
Difficult to get accurate results
Not applicable to segmenting lesions in multiple images within a short time
Poor discriminatory power and low classification accuracy
3.3 PROPOSED SYSTEM
Preprocessing
Feature extraction
CNN
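The abstract names Otsu thresholding for segmentation and LBP features for feature extraction. The following is a minimal pure-Python sketch of both ideas, assuming 8-bit integer grey levels and a 3x3 neighbourhood. The function names, the clockwise bit ordering, and the toy pixel values are assumptions made for illustration, not the project's implementation; in practice a library such as scikit-image (`threshold_otsu`, `local_binary_pattern`) would likely be used.

```python
def otsu_threshold(pixels, levels=256):
    """Return the grey level t that maximises the between-class variance
    of the two groups (pixels <= t) and (pixels > t).
    Pixels must be integers in the range [0, levels)."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels at or below the candidate threshold
    sum0 = 0  # intensity sum of those pixels
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def lbp_code(img, r, c):
    """8-neighbour Local Binary Pattern code of the interior pixel (r, c)."""
    center = img[r][c]
    # Neighbours visited clockwise, starting at the top-left corner (assumed ordering)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


# Toy data: three dark and three bright pixels, and one 3x3 patch.
pixels = [10, 12, 11, 200, 198, 202]
t = otsu_threshold(pixels)
mask = [p > t for p in pixels]  # True marks the bright (foreground) pixels
patch = [[5, 9, 1],
         [4, 6, 7],
         [2, 8, 3]]
print(t, mask, lbp_code(patch, 1, 1))
```

A histogram of the LBP codes over a segmented region could then serve as the texture feature vector passed to the classifier.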
3.4 ADVANTAGES OF PROPOSED SYSTEM
High Accuracy
Automation and Efficiency
Early Detection
4. REQUIREMENT SPECIFICATIONS
4.1 FUNCTIONAL REQUIREMENTS
1. Image Acquisition and Input Handling: The system should be able to handle DICOM images
obtained from various medical imaging devices. It should support multiple input formats and
resolutions.
2. Preprocessing: The system must preprocess images to enhance nodule features and ensure
consistency. Preprocessing steps should include resizing, normalization, and noise reduction.
3. Nodule Detection: The system should accurately detect and localize lung nodules within CT scan
images. It must be able to distinguish nodules from other anatomical structures. The detection
should provide information about the size, shape, and location of nodules.
4. Model Training and Validation: The system must train deep learning models using annotated
datasets. It should support various deep learning architectures suitable for nodule detection.
5. Integration and Deployment: The system should integrate with existing PACS or other healthcare
systems for seamless deployment.
6. Reporting and Visualization: The system should generate comprehensive reports detailing
detected nodules and their characteristics.
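The preprocessing steps named in requirement 2 (resizing, normalization, and noise reduction) can be sketched as follows. This is a minimal illustration on nested Python lists, assuming nearest-neighbour resizing, a 3x3 mean filter, and min-max scaling; the function names and the target size are hypothetical, and a real system would read DICOM pixel data and would likely use an array library.

```python
def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of a 2-D list to out_h x out_w."""
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


def mean_filter(img):
    """3x3 mean filter; border pixels average over the neighbours that exist."""
    h, w = len(img), len(img[0])
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            vals = [img[rr][cc]
                    for rr in range(max(0, r - 1), min(h, r + 2))
                    for cc in range(max(0, c - 1), min(w, c + 2))]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out


def normalize(img):
    """Min-max scale pixel values to the range [0, 1]."""
    flat = [p for row in img for p in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1  # avoid division by zero on a constant image
    return [[(p - lo) / span for p in row] for row in img]


def preprocess(img, size=(4, 4)):
    """Resize, denoise, then normalize (order chosen for illustration)."""
    return normalize(mean_filter(resize_nearest(img, *size)))


# Toy 2x2 "image" with made-up intensities
img = [[0, 100],
       [200, 400]]
out = preprocess(img)
print(out[0][0], out[3][3])
```

Applying the same fixed sequence to every input is what provides the consistency the requirement asks for, since the model then always sees images of one size and one intensity range.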
4.2 NON-FUNCTIONAL REQUIREMENTS
1. Accuracy and Performance: The system must achieve high accuracy in nodule detection to
minimize false positives and false negatives. It should be able to process images within a
reasonable time frame to support clinical workflows.
2. Scalability: The system should be scalable to handle large volumes of image data efficiently. It
should accommodate future growth in the dataset size and user base.
3. Security and Privacy: The system must comply with relevant healthcare data security and privacy
regulations (e.g., HIPAA, GDPR).
4. Reliability and Availability: The system should be reliable, with minimal downtime and robust
error handling mechanisms. It should have built-in redundancy and failover capabilities to ensure
continuous availability.
5. Interoperability: The system should be interoperable with other healthcare IT systems, allowing
seamless data exchange and integration. It should support standard data formats and
communication protocols used in the healthcare industry.
6. Usability: The system interface should be intuitive and easy to use, even for non-technical
clinicians. It should provide clear feedback and guidance to users during operation.
7. Regulatory Compliance: The system must comply with applicable medical device regulations and
standards. It should undergo rigorous testing and validation to ensure safety and efficacy in clinical
use.
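The accuracy requirement above is stated in terms of false positives and false negatives. A minimal sketch of how these counts translate into standard measures is given below, assuming binary labels where 1 means "nodule" and 0 means "no nodule". The function name and the toy labels are hypothetical and are not results of this project.

```python
def detection_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity, specificity, and precision
    from binary ground-truth and predicted labels (1 = nodule)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # nodules found
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed nodules

    def ratio(num, den):
        # Return 0.0 when the denominator is empty instead of raising
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
    }


# Made-up labels for eight cases
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
metrics = detection_metrics(y_true, y_pred)
print(metrics)
```

Sensitivity reflects missed nodules (false negatives), while specificity and precision reflect false alarms, so reporting them alongside accuracy shows which kind of error dominates.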
4.3 SOFTWARE REQUIREMENTS
• Coding Language: Python
4.4 HARDWARE REQUIREMENTS
• Operating System: Windows 10
• Processor: i5 and above
• RAM: 4 GB
• Disk Space: 16 GB
The most common set of requirements defined by any operating system or software application is the
physical computer resources, also known as hardware. A hardware requirements list is often
accompanied by a hardware compatibility list (HCL), especially in the case of operating systems. An
HCL lists tested, compatible, and sometimes incompatible hardware devices for a particular operating
system or application. The following paragraphs discuss the various aspects of hardware
requirements.
Architecture – All computer operating systems are designed for a particular computer architecture.
Most software applications are limited to particular operating systems running on particular
architectures. Although architecture-independent operating systems and applications exist, most need
to be recompiled to run on a new architecture.
Processing power – The power of the central processing unit (CPU) is a fundamental system
requirement for any software. Most software running on the x86 architecture defines processing power
as the model and the clock speed of the CPU. Many other features of a CPU that influence its speed
and power, such as bus speed, cache, and MIPS, are often ignored. This definition of power is often
erroneous, as AMD Athlon and Intel Pentium CPUs at similar clock speeds often have different
throughput. Intel Pentium CPUs have enjoyed a considerable degree of popularity and are often
mentioned in this category.
Memory – All software, when run, resides in the random access memory (RAM) of a computer.
Memory requirements are defined after considering demands of the application, operating system,
supporting software and files, and other running processes. Optimal performance of other unrelated
software running on a multi-tasking computer system is also considered when defining this
requirement.
Secondary storage – Hard-disk requirements vary, depending on the size of software installation,
temporary files created and maintained while installing or running the software, and possible use of
swap space (if RAM is insufficient).
Display adapter – Software requiring a better than average computer graphics display, like graphics
editors and high-end games, often defines high-end display adapters in the system requirements.
5. SYSTEM DESIGN
5.1 SYSTEM ARCHITECTURE
The architecture for lung nodule detection using deep learning encompasses several
interconnected components designed to facilitate accurate and efficient detection of nodules
within CT scan images. It begins with the acquisition of DICOM-format lung images, which
undergo preprocessing steps such as resizing, normalization, and noise reduction to enhance
relevant features and ensure consistency. Annotated data, detailing the location and characteristics
of nodules, serves as the foundation for training deep learning models, typically employing
architectures like Convolutional Neural Networks (CNNs) tailored for medical imaging tasks.
Following model training and validation to ensure robust performance metrics, the system
integrates seamlessly into clinical workflows through deployment, often within Picture Archiving
and Communication Systems (PACS) or standalone applications. In the operational phase, the
system applies the trained model to new images for nodule detection, providing clinicians with
valuable insights into nodule size, shape, and location. Continuous feedback loops, supported by
post-processing techniques and comprehensive reporting functionalities, enable iterative
improvements to both model performance and system usability.
Fig 5.1: Block Diagram
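The CNN at the centre of this architecture is built from convolution, activation, and pooling operations. The sketch below shows one forward pass through these three blocks in pure Python, assuming a single hand-picked 3x3 centre-surround kernel and a toy 5x5 image. The kernel values and function names are illustrative assumptions; a trained network learns its kernels from annotated data and would normally be built with a deep learning framework.

```python
def conv2d(img, kernel):
    """Valid (no padding) 2-D cross-correlation, as used in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(img) - kh + 1, len(img[0]) - kw + 1
    return [[sum(img[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def relu(fmap):
    """Rectified linear activation: negative responses become zero."""
    return [[max(0, v) for v in row] for row in fmap]


def max_pool2x2(fmap):
    """2x2 max pooling with stride 2; an odd trailing row or column is dropped."""
    return [[max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
             for c in range(0, len(fmap[0]) - 1, 2)]
            for r in range(0, len(fmap) - 1, 2)]


# Toy 5x5 image with one bright spot in the middle (values are made up)
img = [[0, 0, 0, 0, 0],
       [0, 1, 1, 1, 0],
       [0, 1, 9, 1, 0],
       [0, 1, 1, 1, 0],
       [0, 0, 0, 0, 0]]

# Hand-picked centre-surround kernel that responds to small bright blobs
kernel = [[-1, -1, -1],
          [-1, 8, -1],
          [-1, -1, -1]]

feature_map = conv2d(img, kernel)
pooled = max_pool2x2(relu(feature_map))
print(feature_map)
print(pooled)
```

The strong positive response appears only where the bright spot sits, which is the kind of localized evidence that later layers combine to report a candidate nodule's position.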
5.2 DATA FLOW DIAGRAM:
1. The DFD is also called a bubble chart. It is a simple graphical formalism that can be used to
represent a system in terms of the input data to the system, the processing carried out on this
data, and the output data generated by the system.
2. The data flow diagram (DFD) is one of the most important modeling tools. It is used to model
the system components: the system processes, the data used by the processes, the external
entities that interact with the system, and the information flows in the system.
3. A DFD shows how information moves through the system and how it is modified by a series
of transformations. It is a graphical technique that depicts information flow and the
transformations that are applied as data moves from input to output.
4. A DFD may be used to represent a system at any level of abstraction. It may be partitioned
into levels that represent increasing information flow and functional detail.
5.3 SEQUENCE DIAGRAM:
Sequence diagrams represent the objects participating in the interaction horizontally and time
vertically. A use case is a kind of behavioral classifier that represents a declaration of an offered
behavior. Each use case specifies some behavior, possibly including variants, that the subject can
perform in collaboration with one or more actors. Use cases define the offered behavior of the subject
without reference to its internal structure. These behaviors, involving interactions between the actor
and the subject, may result in changes to the state of the subject and communications with its
environment. A use case can include possible variations of its basic behavior, including exceptional
behavior and error handling.
Class Diagram:
The class diagram is the main building block of object-oriented modeling. It is used for general
conceptual modeling of the structure of the application, and for detailed modeling that translates
the models into programming code. Class diagrams can also be used for data modeling. The classes in
a class diagram represent both the main elements and interactions in the application and the classes
to be programmed.
In the diagram, classes are represented with boxes that contain three compartments:
The top compartment contains the name of the class. It is printed in bold and centered, and the first
letter is capitalized.
The middle compartment contains the attributes of the class. They are left-aligned and the first letter
is lowercase.
The bottom compartment contains the operations the class can execute. They are also left-aligned and
the first letter is lowercase.
6. BIBLIOGRAPHY
1. Paul. Key Statistics for Lung Cancer. Version 1.6.0. Available online:
https://www.cancer.org/cancer/non-small-cell-lung-cancer/about/key-statistics.html (accessed on
15 May 2019).
2. Zhou, Z.H.; Jiang, Y.; Yang, Y.B.; Chen, S.F. Lung cancer cell identification based on artificial
neural network ensembles. Artif. Intell. Med. 2002, 24, 25–36.
3. Boroczky, L.; Zhao, L.; Lee, K.P. Feature subset selection for improving the performance of
false positive reduction in lung nodule CAD. IEEE Trans. Inf. Technol. Biomed. 2006, 10, 504–511.
4. Tajbakhsh, N.; Suzuki, K. Comparing two classes of end-to-end machine-learning models in lung
nodule detection and classification: MTANNs vs. CNNs. Pattern Recognit. 2017, 63, 476–486.
5. Sivakumar, S.; Chandrasekar, C. Lung nodule detection using fuzzy clustering and support vector
machines. Int. J. Eng. Technol. 2013, 5, 179–185.
6. Russell, S.J.; Norvig, P. Artificial Intelligence: A Modern Approach; Pearson Education Limited:
Malaysia, 2016.
7. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.;
Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine learning in Python. J. Mach.
Learn. Res. 2011, 12, 2825–2830.
8. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436.