EARLY PREDICTION OF MELANOCARCINOMA USING DEEP LEARNING
PHASE I REPORT
Submitted by
PRAVEEN KUMAR. S
MASTER OF ENGINEERING
in
BIOMETRICS AND CYBERSECURITY
DECEMBER 2024
BONAFIDE CERTIFICATE
….………………………………….. …………………………………….
SIGNATURE SIGNATURE
GKM COLLEGE OF ENGINEERING AND TECHNOLOGY
CHENNAI-600063
ANNA UNIVERSITY :: CHENNAI-600025
………………………………… …………………………………
ACKNOWLEDGEMENT
Dr. N.S. BHUVANESWARI for her continuous motivation, kind support and
guidance throughout the project.
TABLE OF CONTENTS
ABSTRACT
1 INTRODUCTION
1.1 METHODOLOGY
1.2 SCOPE OF THE PROJECT
1.3 LITERATURE SURVEY
2 SYSTEM ANALYSIS
2.1 PROBLEM DEFINITION
2.2 OBJECTIVE
2.3 EXISTING SYSTEM
2.4 PROPOSED SYSTEM
3 HARDWARE AND SOFTWARE REQUIREMENT
4 SYSTEM DESIGN
4.1 ARCHITECTURE DIAGRAM
4.2 SEQUENCE DIAGRAM
4.3 UML DIAGRAM
4.4 USECASE DIAGRAM
4.5 ACTIVITY DIAGRAM
4.6 DATA FLOW DIAGRAM
4.7 BLOCK DIAGRAM
5 MODULE DESCRIPTION
5.1 DATA COLLECTION
5.2 DATA PREPROCESSING
5.3 FEATURE EXTRACTION
5.4 MODEL CREATION
5.5 PREDICTION
6 FUTURE WORK
7 CONCLUSION
8 REFERENCES
ABSTRACT
LIST OF FIGURES
LIST OF ABBREVIATIONS
CHAPTER-1
INTRODUCTION:
analyze lesion evolution. This project aims to develop an advanced deep learning
framework that incorporates sequential dermoscopic image analysis to improve the
early detection and diagnosis of melanoma. By modeling lesion evolution, the
proposed system seeks to enhance diagnostic accuracy and provide clinicians with
explainable and actionable insights.
1.1 METHODOLOGY:
LSTM:
demonstrating their versatility in handling sequential data and contributing to
improved performance in tasks requiring nuanced temporal understanding.
CNN – LSTM:
1.2 SCOPE OF THE PROJECT:
The primary scope of this project is to design and develop an advanced deep
learning-based system for the early diagnosis of skin cancer, with a focus on
melanoma. The system will leverage Convolutional Neural Networks (CNNs) and
CNN-LSTM architectures to classify dermoscopic images into six distinct categories:
basal cell carcinoma, benign keratosis-like lesions, dermatofibroma, melanocytic nevi,
vascular lesions (including pyogenic granulomas and hemorrhage), and melanoma. The project will utilize the
HAM10000 dataset, a comprehensive collection of labeled dermoscopic images, to
train and evaluate the model's performance.
This project also aims to address the limitations of existing diagnostic methods
by incorporating sequential dermoscopic image analysis. By tracking and analyzing
lesion evolution over time, the system will enhance the detection of early-stage
melanoma, even in cases where static visual characteristics are subtle. The inclusion of
temporal data analysis will support clinicians in identifying changes indicative of
malignancy, improving diagnostic accuracy and reducing over-diagnosis of benign
lesions.
1.3 LITERATURE SURVEY:
2. Darrell S Rigel and John A Carucci - “Early Detection of Malignant Melanoma
Using Machine Learning Algorithms”
Darrell S. Rigel proposed the integration of machine learning algorithms,
especially Convolutional Neural Networks (CNNs), to enhance the accuracy of
melanoma diagnosis. The study focused on image analysis and pattern recognition in
dermoscopic images, transitioning beyond traditional ABCD criteria. By utilizing
advanced AI technologies, this approach has improved the early detection of
melanoma, leading to better patient outcomes. The research highlights the evolution of
melanoma diagnosis through multidisciplinary approaches and technological
advancements over 25 years.
Mehwish Dildar reviewed the use of machine learning and deep learning techniques, such as
CNNs, SVMs, and DNNs, for classifying skin lesion images. The study concluded that CNNs
outperform other models in image-based classification because their architecture is well
suited to computer vision tasks. This research highlighted the importance of selecting
appropriate algorithms for optimal performance, providing a framework for advancing
AI-based melanoma detection methods.
Gabriel Salerni and his team introduced a "two-step method" combining total
body photography and digital dermatoscopy for early melanoma detection in high-risk
patients. This digital follow-up (DFU) method enabled more precise monitoring of
suspicious lesions, leading to the early identification of melanoma. The study revealed
that 8.5% of excised lesions were diagnosed as melanomas, most being in situ with
low Breslow indices, demonstrating the effectiveness of this method in minimizing
invasive procedures while ensuring timely diagnoses.
CHAPTER 2
SYSTEM ANALYSIS
2.1 PROBLEM DEFINITION:
The early detection of malignant melanoma, the most aggressive form of skin
cancer, remains a critical challenge in dermatology. Despite advancements in
dermoscopic techniques and diagnostic criteria, such as the ‘7-point checklist’ and
ABCD guidelines, identifying melanoma in its incipient stages is often hindered by the
subtle nature of early lesions, which may lack definitive dermoscopic features. This
difficulty contributes to a delicate balance between underdiagnosing melanoma and
overdiagnosing benign lesions, leading to unnecessary biopsies or missed early-stage
cancers.
2.2 OBJECTIVE:
2.3 EXISTING SYSTEM:
Support Vector Machines (SVMs) are a well-established family of machine
learning algorithms used in skin cancer detection, particularly melanoma diagnosis.
SVMs are supervised learning models that separate data points into
distinct classes by constructing an optimal hyperplane that maximizes the
margin between the classes. In skin cancer detection, SVMs are widely used for
classifying dermoscopic images into benign or malignant categories. The workflow
generally involves preprocessing steps such as noise reduction, image segmentation,
feature extraction, and then using these features to train the SVM model. The SVM
classifier is effective in handling high-dimensional data and is computationally
efficient, making it suitable for smaller datasets.
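The hyperplane idea described above can be illustrated with a minimal Python sketch of a linear SVM decision rule. The weight vector, the bias, and the two-feature lesion descriptors are hypothetical values chosen for illustration; they are not parameters learned from any dataset, and a practical SVM would obtain them by training.

```python
import math

def decision_value(w, b, x):
    """Signed score w.x + b; its sign tells which side of the hyperplane x lies on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict(w, b, x):
    """Label a feature vector 'malignant' for a non-negative score, else 'benign'."""
    return "malignant" if decision_value(w, b, x) >= 0 else "benign"

def margin_width(w):
    """Distance between the two supporting hyperplanes of a linear SVM: 2 / ||w||."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

# Hypothetical hyperplane over two hand-crafted features
# (for example, an asymmetry score and a border-irregularity score).
w = [3.0, 4.0]
b = -3.5

print(predict(w, b, [0.2, 0.3]))  # benign
print(predict(w, b, [0.8, 0.9]))  # malignant
print(margin_width(w))            # 0.4
```

Training an SVM amounts to choosing w and b so that this margin is as wide as possible while the training samples fall on the correct side.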
Disadvantages of SVMs:
5. Requires Extensive Feature Engineering: SVMs need careful feature
selection and transformation, which can be time-consuming and require domain
expertise.
Disadvantages of KNN:
1. Sensitivity to Dataset Quality and Size: KNN scales poorly to large
datasets and degrades on high-dimensional data due to the "curse of dimensionality," where
distances between samples become less informative as the number of features increases.
2. High Computational Cost: During the classification phase, KNN requires
significant computational resources to compute the distance between the query
point and all training samples, leading to slower real-time predictions.
3. Dependency on the Choice of k: The accuracy of KNN is highly dependent on
the choice of the number of neighbors (k), which can be challenging to optimize
and tune.
4. Limited Ability to Model Complex Relationships: While simple and
effective, KNN lacks the ability to model complex relationships within the data,
making it less robust compared to advanced algorithms like CNNs or hybrid
models like CNN-LSTM.
5. Lack of Robustness: KNN is less effective in capturing subtle nuances in
melanoma detection, which are crucial for accurate diagnosis.
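Two of the points above, the cost of comparing a query against every training sample and the dependence on k, can be seen in a small Python sketch. The two-dimensional feature vectors and labels are hypothetical and serve only as an illustration.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Classify `query` by majority vote among its k nearest training samples.

    `train` is a list of (feature_vector, label) pairs. Every call measures
    the distance from the query to all training samples, which is why
    prediction becomes slow as the training set grows.
    """
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical lesion descriptors with their labels.
train = [
    ([0.52, 0.52], "malignant"),
    ([0.40, 0.45], "benign"),
    ([0.45, 0.38], "benign"),
    ([0.10, 0.20], "benign"),
    ([0.90, 0.90], "malignant"),
]
query = [0.5, 0.5]

print(knn_predict(train, query, k=1))  # malignant
print(knn_predict(train, query, k=3))  # benign
```

The same query receives different labels for k=1 and k=3, which shows why the choice of k needs careful tuning.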
2.4 PROPOSED SYSTEM:
The CNN-LSTM model integrates the strengths of both CNN and LSTM to
create a system capable of understanding the progression of skin lesions. CNNs are
used to extract spatial features from dermoscopic images, such as texture, shape, and
color variations that are indicative of melanoma. These features are then passed to the
LSTM network, which is designed to capture temporal changes by analyzing a
sequence of images of the same lesion over time. This approach allows the system to
model lesion evolution, helping to differentiate between benign lesions and early-stage
melanoma based on subtle changes that might not be evident in a single time-point
image.
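The data flow described above (per-image feature extraction followed by a recurrent update over the image sequence) can be sketched in plain Python. This is a toy illustration under strong assumptions: `extract_features` is a hand-written stand-in for a trained CNN, the LSTM has a single hidden unit, and the gate weights in `params` are hypothetical and untrained. A real system would use trained CNN and LSTM layers from a deep learning framework.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def extract_features(image):
    """Stand-in for the CNN: summarize a grayscale image (nested lists with
    values in [0, 1]) as [mean intensity, intensity range]."""
    pixels = [p for row in image for p in row]
    return [sum(pixels) / len(pixels), max(pixels) - min(pixels)]

def lstm_step(x, h, c, params):
    """One LSTM time step for a single hidden unit.

    `params` maps each gate name to (input_weights, recurrent_weight, bias).
    """
    def pre(gate):
        w_x, w_h, bias = params[gate]
        return sum(wi * xi for wi, xi in zip(w_x, x)) + w_h * h + bias

    i = sigmoid(pre("input"))         # how much new information to write
    f = sigmoid(pre("forget"))        # how much of the old cell state to keep
    o = sigmoid(pre("output"))        # how much of the cell state to expose
    g = math.tanh(pre("candidate"))   # candidate cell content
    c_new = f * c + i * g
    h_new = o * math.tanh(c_new)
    return h_new, c_new

def encode_sequence(images, params):
    """Run the LSTM over the per-image features and return the final hidden state."""
    h, c = 0.0, 0.0
    for image in images:
        h, c = lstm_step(extract_features(image), h, c, params)
    return h

# Hypothetical, untrained weights shared by all four gates.
params = {gate: ([1.0, 1.0], 0.5, 0.0)
          for gate in ("input", "forget", "output", "candidate")}

# Three visits of a stable lesion versus one whose contrast grows over time.
stable_seq = [[[0.2, 0.2], [0.2, 0.2]]] * 3
changing_seq = [
    [[0.2, 0.2], [0.2, 0.2]],
    [[0.2, 0.4], [0.2, 0.4]],
    [[0.2, 0.8], [0.2, 0.8]],
]

stable = encode_sequence(stable_seq, params)
changing = encode_sequence(changing_seq, params)
print(stable, changing)
```

Because the hidden state carries information from earlier time points, the two sequences end in different states even though they start from the same image. In a trained model, a classification layer would map this final state to a diagnosis.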
Temporal Analysis for Early Detection: Unlike traditional systems that focus on
single-time point images, the CNN-LSTM model incorporates the sequential nature of
dermoscopic images, making it possible to detect subtle changes over time. By
modeling lesion evolution, the system can recognize early-stage melanomas that may
be missed in static images.
Explainability and Transparency: Unlike traditional deep learning models that are
often "black boxes," CNN-LSTM models can be designed to provide more
interpretable results by highlighting which features of the lesion have changed over
time. This makes it easier for clinicians to understand and trust the AI’s reasoning.
Automation and Efficiency: The CNN-LSTM system automates the process of lesion
analysis and evolution tracking, improving the efficiency of melanoma detection. This
reduces the reliance on manual review and the potential for human error, leading to
faster and more reliable diagnoses.
CONCLUSION:
CHAPTER 3
HARDWARE AND SOFTWARE REQUIREMENT
The system's hardware requirements for the proposed project are as follows:
For this project, we're utilizing machine learning techniques, particularly focusing on
the CNN-LSTM model, to improve melanoma detection from sequential dermoscopic
images. Given the computational demands of deep learning, the system needs
powerful hardware to support model training and inference efficiently. The hardware
configuration includes a high-performance CPU, sufficient RAM to handle large
datasets, and a dedicated GPU for model training acceleration.
We are using high-capacity storage (SSD) to store large image datasets and model
parameters, ensuring smooth access and faster processing speeds. The graphical output
from the system can be viewed on a Full HD monitor for better analysis and
evaluation of results.
The system's software requirements for the proposed project are as follows:
CHAPTER 4
SYSTEM DESIGN
4.2 SEQUENCE DIAGRAM:
4.4 USECASE DIAGRAM:
4.5 ACTIVITY DIAGRAM:
4.6 DATA FLOW DIAGRAM:
[Figure: data flow diagram; surviving labels: Dataset, Preprocessing]
CHAPTER 5
MODULE DESCRIPTION
5.1 DATA COLLECTION:
For the melanoma detection project, we will collect images from the
HAM10000 (Human Against Machine with 10000 training images) dataset, which
is specifically designed for skin lesion classification. This dataset is available on
platforms like Kaggle and contains 10,015 dermatoscopic images, each annotated
with a diagnostic label from which benign or malignant status can be determined.
Additionally, the dataset includes metadata such as age, gender, body site, and
other information relevant to the lesions, which can enhance the model’s predictive
accuracy. Once the data is collected, it will undergo preprocessing to ensure
consistency and robustness in the model training process. This diverse and well-
labeled dataset will be a key asset in developing a reliable and accurate melanoma
detection model.
5.2 DATA PREPROCESSING:
For data preprocessing, the first step is to clean the dataset by removing
irrelevant or duplicate images and ensuring that each image is correctly labeled as
either malignant or benign. All images will be resized to a consistent dimension of
224x224 pixels to meet the input requirements of the Convolutional Neural Network
(CNN) used in this project.
Data augmentation techniques such as rotation, flipping, brightness
adjustments, zooming, and shifting will be applied to artificially expand the dataset
and improve the model's generalization. Pixel values will be normalized to a range
between 0 and 1 to stabilize the training process.
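A few of the steps above can be sketched in plain Python on a tiny grayscale image stored as nested lists. The function names are illustrative assumptions, 8-bit input pixels (0 to 255) are assumed, and zooming and shifting are omitted for brevity; an actual pipeline would normally rely on an image-processing library.

```python
def normalize(image):
    """Scale 8-bit pixel values (0-255) to the range [0, 1]."""
    return [[p / 255.0 for p in row] for row in image]

def flip_horizontal(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]

def flip_vertical(image):
    """Mirror the image top to bottom."""
    return [row[:] for row in image[::-1]]

def rotate_90_clockwise(image):
    """Rotate the image by 90 degrees clockwise."""
    return [list(col) for col in zip(*image[::-1])]

def adjust_brightness(image, delta):
    """Add `delta` to normalized pixels, clipping the result to [0, 1]."""
    return [[min(1.0, max(0.0, p + delta)) for p in row] for row in image]

raw = [[0, 255],
       [51, 102]]

image = normalize(raw)
augmented = [
    flip_horizontal(image),
    flip_vertical(image),
    rotate_90_clockwise(image),
    adjust_brightness(image, 0.1),
]
print(len(augmented), "augmented variants from one image")
```

Each variant keeps the label of the original image, so one labeled lesion image yields several training samples.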
Additionally, the dataset will be split into training, validation, and test sets to
ensure the model’s unbiased evaluation and to prevent overfitting. This preprocessing
pipeline prepares the data for effective training and validation, enabling the model to
generalize well to new, unseen data.
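One possible way to perform the split is sketched below. The 70/15/15 proportions, the fixed seed, and the function name `stratified_split` are assumptions made for illustration, since the report does not state them. The split is stratified so that each subset keeps roughly the same class balance.

```python
import random
from collections import defaultdict

def stratified_split(labels, train_frac=0.7, val_frac=0.15, seed=42):
    """Return (train, val, test) lists of sample indices, split class by class."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(len(indices) * train_frac)
        n_val = round(len(indices) * val_frac)
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]
    return train, val, test

# Hypothetical imbalanced label list: 80 benign and 20 malignant samples.
labels = ["benign"] * 80 + ["malignant"] * 20
train, val, test = stratified_split(labels)
print(len(train), len(val), len(test))  # 70 15 15
```

When several images of the same lesion are present, it is safer to split by lesion rather than by image, so that near-identical images do not end up in both the training and the test set.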
5.3 FEATURE EXTRACTION:
In the training phase, the CNN model processes each image through multiple
layers to detect key features such as edges, textures, and shapes that distinguish
melanoma from other types of skin lesions. The model’s feature extraction process
will focus on identifying the subtle visual characteristics that differentiate malignant
lesions from benign ones.
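To illustrate how a convolutional layer responds to edges, the sketch below slides a fixed 3x3 kernel over a small image. The Sobel kernel used here is a classical hand-crafted edge filter chosen as an assumption for illustration; a CNN learns its kernel values during training instead of using fixed ones.

```python
def convolve2d(image, kernel):
    """Valid-mode 2-D cross-correlation, the operation used in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(len(image) - kh + 1):
        row = []
        for c in range(len(image[0]) - kw + 1):
            row.append(sum(kernel[i][j] * image[r + i][c + j]
                           for i in range(kh) for j in range(kw)))
        out.append(row)
    return out

# Sobel kernel that responds to vertical edges (horizontal intensity change).
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]

# 4x4 image with a dark left half and a bright right half.
edge_image = [[0, 0, 1, 1] for _ in range(4)]
# 4x4 image with uniform intensity.
flat_image = [[1, 1, 1, 1] for _ in range(4)]

print(convolve2d(edge_image, SOBEL_X))  # [[4, 4], [4, 4]]
print(convolve2d(flat_image, SOBEL_X))  # [[0, 0], [0, 0]]
```

The filter produces a strong response where intensity changes and no response on the uniform image, which is the kind of feature map that later layers combine into texture and shape descriptors.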
5.4 MODEL CREATION:
For model creation, the CNN is used to automatically extract relevant features
from the input images, such as color, texture, shape, and edges. These features are
then passed through pooling layers to reduce the spatial dimensions and improve
computational efficiency while mitigating the risk of overfitting. The fully connected
layers use these features to make predictions, outputting the likelihood of a lesion
being malignant or benign.
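The pooling step mentioned above can be sketched as follows. A 2x2 window with stride 2 is assumed, which is a common choice but is not stated in the report, and the feature map values are made up for illustration.

```python
def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2; expects even height and width."""
    pooled = []
    for r in range(0, len(feature_map) - 1, 2):
        row = []
        for c in range(0, len(feature_map[0]) - 1, 2):
            row.append(max(feature_map[r][c], feature_map[r][c + 1],
                           feature_map[r + 1][c], feature_map[r + 1][c + 1]))
        pooled.append(row)
    return pooled

feature_map = [[1, 3, 2, 0],
               [4, 2, 1, 1],
               [0, 1, 5, 6],
               [2, 2, 7, 8]]

print(max_pool_2x2(feature_map))  # [[4, 2], [2, 8]]
```

The 4x4 map shrinks to 2x2 while the strongest response in each region is kept, which reduces the number of values passed to the fully connected layers.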
The model will apply a softmax activation function at the output layer to
convert raw predictions into probabilities. The model will be trained using the labeled
data from the HAM10000 dataset, continuously refining its predictions during training
to achieve high classification accuracy.
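The softmax step can be written in a few lines of Python. The logits and the two-class label list below are hypothetical example values, not outputs of the trained model.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1.

    Subtracting the maximum first avoids overflow for large scores
    without changing the result.
    """
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

CLASSES = ["benign", "malignant"]  # hypothetical label order
logits = [0.5, 2.0]                # hypothetical raw outputs of the last layer

probs = softmax(logits)
predicted = CLASSES[probs.index(max(probs))]
print(predicted, probs)
```

The class with the largest probability is reported as the prediction, and the probability itself can be shown to the clinician as a confidence value.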
5.5 PREDICTION:
Furthermore, the hybrid model combining CNN for feature extraction and
LSTM for temporal analysis can be used to analyze sequences of images over time
(e.g., capturing multiple images of the same lesion over several weeks). This approach
will enable the system to detect changes in lesion characteristics, which may indicate
malignancy. For recommendation purposes, the system can suggest similar lesions
from the dataset to aid in diagnosis.
Additionally, the system can incorporate metadata such as age, gender, and
location of the lesion to offer more personalized recommendations. This combination
of CNN for feature extraction and LSTM for analyzing changes over time allows the
system to provide accurate classifications as well as valuable insights to clinicians,
improving melanoma detection and patient outcomes.
CHAPTER 6
FUTURE WORK:
For future work on the melanoma detection project, here are the most useful areas to
focus on:
Use Advanced Models: We could try more advanced deep learning models to
improve how well the system detects melanoma. This could include combining
CNNs (for image analysis) with LSTMs (for tracking changes over time).
Better Accuracy: Using models that can look at how skin lesions change over
time can help spot melanoma earlier.
Mobile App: We could make the system work on smartphones or other mobile
devices so that anyone can check skin lesions in real-time, no matter where they
are.
Doctor’s Tool: The system should be integrated into clinics so that doctors can
use it to help with diagnoses, making it faster and more accurate.
Explain the Results: It’s important that doctors can understand why the system
gave a certain result. Using tools that show which part of the image the model is
focusing on could help doctors trust the system more.
Human Assistance: The system should be a tool to help doctors, but the final
diagnosis should always come from a healthcare professional.
4. Get More Data
More Diverse Data: Gathering more images from different skin types, ages,
and geographic locations will make the system work better for everyone.
Extra Information: Adding patient details, such as age or family history of
skin cancer, could make predictions more personalized.
Testing with Doctors: We need to test the system in real hospitals and clinics to
make sure it works well in real-life situations.
Ongoing Updates: The model should be updated regularly with new data to
keep it accurate and effective.
CHAPTER 7
CONCLUSION:
CHAPTER 8
REFERENCES:
Zongyuan Ge, Sergey Demyanov, Rajib Chakravorty, Adrian Bowling and Rahil Garnavi -
“Skin Disease Recognition Using Deep Saliency Features and Multimodal Learning
of Dermoscopy and Clinical Images”, conference paper, first online 04
September 2017.