

UNIVERSITY OF BUEA

FACULTY OF ENGINEERING AND TECHNOLOGY

DEPARTMENT OF ELECTRICAL AND ELECTRONIC ENGINEERING

MACHINE LEARNING-BASED BREAST CANCER DETECTION: A CASE STUDY

By

BATCHANOU TATAP PERETS ARNAUD

B.Eng. (Hons) Telecommunications Engineering

A Dissertation Submitted to the Department of Electrical and Electronic Engineering,


Faculty of Engineering and Technology, University of Buea, in Partial Fulfillment of
the Requirements for the Award of a Master of Engineering (M.Eng.)
Degree in Telecommunications and Networks

Supervisor: FOZIN F. Théophile, PhD

Co-Supervisor: NGWASHI Divine Khan, PhD

April 2023

DEDICATION

This dissertation is dedicated to the TATAP family.



UNIVERSITY OF BUEA

FACULTY OF ENGINEERING AND TECHNOLOGY

DEPARTMENT OF ELECTRICAL AND ELECTRONIC ENGINEERING

CERTIFICATION

This dissertation of Batchanou Tatap Perets Arnaud (FE20P032) entitled, “Machine

Learning-based Breast Cancer Detection: A Case Study”, submitted to the

Department of Electrical and Electronic Engineering, Faculty of Engineering and

Technology of the University of Buea, in partial fulfillment of the requirements for the

award of a Master of Engineering (M.Eng.) Degree in Telecommunications and

Networks has been read, examined and approved by the examination panel composed

of:

Elie Fute, PhD, Chairperson (Associate Professor)


Fozin Théophile, PhD, Supervisor (Lecturer)
Ngwashi Divine, PhD, Co-Supervisor (Lecturer)
Nkemeni Valery, PhD, Examiner (Lecturer)

Tsafack Pierre, PhD (Associate Professor), Head of Department

Fozin Théophile, PhD, Supervisor

Ngwashi Divine, PhD, Co-Supervisor

This dissertation has been accepted by the Faculty of Engineering and Technology

Agbor Dieudonne, PhD (Associate Professor), Dean

ACKNOWLEDGEMENTS

This dissertation could not have been possible without the help of many people, to whom I would like to express my gratitude for their help and support throughout the journey.

I thank the University of Buea, the Faculty of Engineering and Technology and all

our Professors, Doctors and Lecturers who gave me insight throughout the project. I

particularly thank my supervisors Dr. FOZIN Theophile and Dr. NGWASHI Divine

who helped me with reviews, insights and corrections throughout my dissertation.

To Prof. TANYI Emmanuel, Prof. TSAFACK Pierre, Dr. SITAMZE Bertrand, Dr.

NKEMENI Valery, Dr. FENDJI Danielle, Dr. TENE, and so many more, I say thanks

for their constant support and remarks, which helped me through the dissertation.

I also thank Dr. MANGA at the Radiology Department of the Douala General

Hospital who helped me with insight on the project.

I also thank my fellow mates who chipped in advice from time to time as we all moved on together: Forka Da-Silva, Samba Shanice, Ngoran Bedes, Nchunga Florine, and many others.

I thank the TATAP family, my dad TATAP Jean, my mum TATAP Marie, my

brothers and sisters – Joel, Peter, Prince and Olive, and also the SIAKET family, who

helped me with all that was necessary for the success of this project. Their constant love

was a strong driving force that helped me remember my goal and complete this project.

Finally, and most importantly, I thank God Almighty for his mercies throughout my

dissertation.

ABSTRACT

The detection of breast cancer is a crucial task that even the most seasoned doctors cannot perform with one hundred percent accuracy. Late detection of the tumor causes a significant number of deaths each year. Fortunately, the introduction of artificial intelligence has helped address this problem in the field. The aim of our study is to develop a more accurate way of diagnosing breast cancer using machine learning. Our solution involves the use of artificial intelligence methods, namely Convolutional Neural Networks, to diagnose breast cancer. The algorithm is implemented in a Python environment. The results show that this method is more efficient than the other techniques compared, achieving accuracy above 97%. Moreover, a web app is deployed for user-friendliness, allowing the classification of any subsequent sample patient image. This solution could be of great use in the field of medicine for early and accurate detection of the tumor, and goes a long way to show that technology can revolutionize the way we live.

Keywords: Image Classification, Convolutional Neural Networks, Artificial Intelligence, Google Colab, TensorFlow, Keras

TABLE OF CONTENTS

DEDICATION...........................................................................................................ii

CERTIFICATION ................................................................................................... iii

ACKNOWLEDGEMENTS ......................................................................................iv

ABSTRACT .............................................................................................................. v

LIST OF FIGURES ..................................................................................................ix

LIST OF TABLES..................................................................................................... x

LIST OF ABBREVIATIONS……………………………………………….........xii

CHAPTER ONE

GENERAL INTRODUCTION

1.1 Overview on Breast Cancer ....................................................................... 1

1.1.1 Signs and Symptoms of Breast Cancer ........................................................... 2

1.1.2 Diagnosis of breast cancer ............................................................................... 3

1.2 Overview on artificial intelligence and benefits ................................................. 4

1.2.1 Overview on artificial intelligence .................................................................. 4

1.2.2 Benefits of Artificial Intelligence .................................................................... 4

1.3 Problem Statement .............................................................................................. 6

1.4 Research Question ............................................................................................... 6

1.5 Objectives ............................................................................................................ 6

1.6 Dissertation Outline............................................................................................. 7



CHAPTER TWO

LITERATURE REVIEW

2.1 Introduction ......................................................................................................... 8

2.2 Overview on Machine Learning Algorithms ...................................................... 8

2.2.1 Supervised Machine Learning Algorithms ..................................................... 9

2.2.2 Unsupervised Machine Learning Algorithms ............................................... 16

2.2.3 Semi Supervised Machine Learning Algorithms .......................................... 17

2.2.4 Deep Learning Algorithms ............................................................................ 18

2.3 Review of Previous Works on Machine Learning for General Diseases


Prediction ................................................................................................................. 21

2.4 Review of Previous Works on Machine Learning for Breast Cancer Prediction . 22

2.5 Survey of Previous Works ................................................................................. 30

2.6 Partial Conclusion ............................................................................................. 32

CHAPTER THREE

MATERIALS AND METHODS

3.1 Introduction .............................................................................................. 33

3.2 Project methodology ................................................................................ 33

3.2.1 The Data Set ................................................................................................... 34

3.2.2 Convolutional Neural Network Architecture ................................................ 35

3.2.3 Training the Neural Network ......................................................................... 36



3.2.4 Classifier Performance Index ........................................................................ 39

3.2.5 Web App Development .................................................................................. 41

3.3 Use Case Diagram ............................................................................................. 44

3.4 Partial Conclusion ............................................................................................. 45

CHAPTER FOUR

RESULTS AND DISCUSSIONS

4.1 Introduction ....................................................................................................... 46

4.2 Results from Google Colab ............................................................................... 46

4.3 Results from Streamlit App............................................................................... 48

4.4 Discussion of Results and Comparative Analysis with Previous Works .... 50

4.5 Partial Conclusion ............................................................................................. 52

CHAPTER FIVE

GENERAL CONCLUSION

5.1 Summary of Findings ........................................................................................ 53

5.2 Implications to Existing Knowledge................................................................. 53

5.3 Recommendations ............................................................................................. 54

5.4 Future Scope ...................................................................................................... 54

References ............................................................................................................... 56

Appendices .............................................................................................................. 61

LIST OF FIGURES

Figure 1. 1: Major types of Breast Cancer [1] .................................................................. 2


Figure 1. 2: Relationship between AI, ML and DL .......................................................... 4

Figure 2. 1: Classification of Machine Learning algorithms [4] ...................................... 9


Figure 2. 2: A simplified illustration of how the support vector machine works [5] ..... 11
Figure 2. 3: A simplified illustration of how the decision tree works [5]....................... 12
Figure 2. 4: A simplified illustration of how the random forest works [5] ........................ 13
Figure 2. 5: An illustration of the Naïve Bayes algorithm [5]....................................... 14
Figure 2. 6: A simplified illustration of the K-nearest neighbor algorithm [5] .............. 15
Figure 2. 7: An illustration of the artificial neural network structure with two hidden
layers [5] ......................................................................................................................... 16
Figure 2. 8: An illustration of recurrent neural networks [6].......................................... 19

Figure 3. 1: Project Methodology Flowchart .................................................................. 33


Figure 3. 2: Section of data set showing first five rows and columns ............................ 34
Figure 3. 3: Illustration of how a Convolutional Neural Networks performs [31] ......... 36
Figure 3. 4: Convolution layer scheme [31] ................................................................... 37
Figure 3. 5: Pooling layer scheme [31] ........................................................................... 37
Figure 3. 6: Fully Connected layer scheme [31] ............................................................. 38
Figure 3. 7: a) The basic framework of the confusion matrix; and b) A presentation of
the ROC curve [5] ........................................................................................................... 39
Figure 3. 8: The seven phases of software development life cycle [32] ......................... 41
Figure 3. 9: Use case diagram of the system .................................................................. 44

Figure 4. 1: ROC curve performance.............................................................................. 46


Figure 4. 2: Confusion matrix performance.................................................................... 49
Figure 4. 3: Training and validation accuracy vs training and validation loss ............... 48
Figure 4. 4: Web app homepage ..................................................................................... 49
Figure 4. 5: Benign image prediction ............................................................................. 49
Figure 4. 6: A malignant image prediction ..................................................................... 50

LIST OF TABLES

Table 2. 1: Survey of Previous Works on Breast Cancer Detection ............................... 31

Table 4. 1 Comparative analysis of results with previous works on deep learning for

breast cancer detection .................................................................................................... 51



LIST OF ABBREVIATIONS

ACM Association for Computing Machinery

AI Artificial Intelligence

ANN Artificial Neural Network

AUC Area under Curve

BC Breast Cancer

CAD Computer-Aided Diagnosis

CNN Convolutional Neural Network

CPU Central Processing Unit

DCIS Ductal Carcinoma in Situ

DL Deep Learning

DT Decision Tree

EHR Electronic Health Record

FN False Negative

FNA Fine Needle Aspiration

FNR False Negative Rate

FP False Positive

FPR False Positive Rate

GAN Generative Adversarial Network

GPU Graphics Processing Unit

HASHI High-throughput adaptive sampling for whole-slide

histopathology image analysis

IBC Inflammatory Breast Cancer

IDC Invasive Ductal Carcinoma



IEEE Institute of Electrical and Electronics Engineers

KNN K Nearest Neighbor

KPI Key Performance Indicator

LBC Lobular Breast Cancer

LR Logistic Regression

LSTN Long Short-Term Memory Neural Network

MBC Mucinous Breast Cancer

MIAS Mammographic Image Analysis Society

ML Machine Learning

MRI Magnetic Resonance Imaging

MTBC Mixed Tumors Breast Cancer

NB Naïve Bayes

NPV Negative Predictive Value

PPV Positive Predictive Value

ReLU Rectified Linear Unit

ResNet Residual Neural Network

RF Random Forest

RNN Recurrent Neural Network

ROC Receiver Operating Characteristic

SDLC Software Development Life Cycle

SVM Support Vector Machine

TF TensorFlow

TN True Negative

TNR True Negative Rate

TP True Positive

TPU Tensor Processing Unit

VGG Visual Geometry Group

WDBC Wisconsin Diagnostic Breast Cancer



CHAPTER ONE

GENERAL INTRODUCTION

1.1 Overview on Breast Cancer

Breast cancer is one of the most lethal and heterogeneous diseases of the present era, causing the death of an enormous number of women all over the world [1]. Breast cancer (BC) is the most common cancer in women, affecting about 10% of all women at some stage of their life. In recent years the incidence rate has kept increasing, and data show that the survival rate is 88% five years after diagnosis and 80% ten years after diagnosis [1]. Early prediction of breast cancer is one of the most crucial tasks in the follow-up process. It is the second largest cause of death in women after heart disease. Tumors can be benign (noncancerous) or malignant (cancerous). Benign tumors tend to grow slowly and do not spread. Malignant tumors can grow rapidly, invade and destroy nearby normal tissues, and spread throughout the body. Abnormal growth of the fatty and fibrous tissues of the breast becomes the cause of breast cancer, and the cancer cells spread throughout the tumors, causing different stages of cancer.

Figure 1.1 shows the various types of breast cancer that exist. The different types of breast cancer occur as affected cells and tissues spread through the body. Ductal Carcinoma in Situ (DCIS) is a type of breast cancer in which abnormal cells are found in the lining of a breast duct but have not spread beyond it; it is also known as non-invasive cancer. The second type is Invasive Ductal Carcinoma (IDC), also known as infiltrating ductal carcinoma. This type of cancer occurs when the abnormal cells spread beyond the ducts into the surrounding breast tissues; IDC is also the type most commonly found in men. Mixed Tumors Breast Cancer (MTBC) is the third type, also known as invasive mammary breast cancer; it arises from both abnormal duct cells and lobular cells. The fourth type is Lobular Breast Cancer (LBC), which occurs inside the lobules and increases the risk of other invasive cancers. Mucinous Breast Cancer (MBC), the fifth type, arises from invasive ductal cells and is also known as colloid breast cancer; it occurs when the abnormal tissues spread around the duct. Inflammatory Breast Cancer (IBC) is the last type, causing swelling and reddening of the breast. It is a fast-growing breast cancer that appears when lymph vessels in the breast become blocked [1].

Figure 1. 1: Major types of Breast Cancer [1]

1.1.1 Signs and Symptoms of Breast Cancer

It is found that most women who have breast cancer symptoms and signs will initially

notice only one or two. Some people do not have any signs or symptoms at all. The

most common signs of breast cancer are:

• A lump or thickening in or near the breast or in the underarm (armpit) area [2];

• Enlarged lymph nodes in the armpit;

• Changes in size, shape, skin texture or color of the breast;

• Pain in any area of the breast;

• Skin redness;

• Dimpling or puckering;

• Fluid, other than breast milk, from the nipple, especially if it is bloody [2];

• Scaly, red or swollen skin on the breast, nipple or areola (the dark area of skin around the nipple);

• Nipple pulling to one side or a change in direction [2].

1.1.2 Diagnosis of Breast Cancer

Breast cancer can be detected using one of the following methods.

Breast ultrasound: A machine that uses sound waves to make pictures, called sonograms, of areas inside the breast [2].

Diagnostic mammogram: If you have a problem in your breast, such as lumps, or if an area of the breast looks abnormal on a screening mammogram, doctors may have you get a diagnostic mammogram. This is a more detailed X-ray of the breast.

Breast magnetic resonance imaging (MRI): A kind of body scan that uses a magnet linked to a computer. The MRI scan makes detailed pictures of areas inside the breast.

Biopsy: A test that removes tissue or fluid from the breast to be examined under a microscope and subjected to further testing. There are different kinds of biopsies (for example, fine-needle aspiration, core biopsy, or open biopsy).

As an innovation, we pursue a more accurate and effective way of detecting cancer, hence the introduction of AI-based methods.



1.2 Overview on Artificial Intelligence and Benefits

1.2.1 Overview on Artificial Intelligence

Artificial intelligence (AI) is a branch of Computer Science. It involves developing computer programs to complete tasks which would otherwise require human intelligence. AI algorithms can tackle learning, perception, problem-solving, language understanding and/or logical reasoning. Within AI we have machine learning and deep learning. Figure 1.2 shows the relationship between AI, ML and DL.

Figure 1. 2: Relationship between AI, ML and DL

1.2.2 Benefits of Artificial Intelligence

Broad areas of life are using AI in various ways. AI and ML-powered software and devices are mimicking human thought patterns to facilitate the digital transformation of society. AI systems perceive their environment, deal with what they perceive, solve problems and act to help with tasks that make everyday life easier. The following are ways in which AI has helped revolutionize our lives [3]:

• Voice Assistants: Digital assistants like Siri, Google Home, and Alexa use AI-backed Voice User Interfaces (VUI) to process and decipher voice commands. AI gives these applications the freedom to not rely solely on voice commands but also to leverage vast databases on cloud storage platforms. [3]

• Entertainment Streaming Apps: Streaming giants like Netflix, Spotify, and Hulu are

continually feeding data into machine learning algorithms to make the user experience

seamless. [3]

• Personalized Marketing: Brands use AI-driven personalization solutions based on

customer data to drive more engagement. [3]

• Smart Input Keyboards: The latest versions of mobile keyboard apps combine the

provisions of autocorrection and language detection to provide a user-friendly

experience. [3]

• Navigation and Travel: The work of AI programmers behind navigation apps like Google Maps and Waze never ends. Yottabytes of geographical data, updated every second, can only be effectively cross-checked by ML algorithms unleashed on satellite images. [3]

• Self-driving vehicles: The technology of Autonomous Vehicle AI is witnessing large-

scale innovation driven by global corporate interest. AI is making innovations beyond

cruise-control and blind-spot detection to include fully autonomous capabilities. [3]

• Security and Surveillance: It is nearly impossible for a human being to keep a constant

eye on too many monitors of a CCTV network at the same time. So, naturally, we have

felt the need to automate such surveillance tasks and further enhance them by leveraging

machine learning methodologies. [3]

• Internet of Things: The confluence of AI and the Internet of Things (IoT) opens up a plethora of opportunities to develop smarter home appliances that require minimal human interference to operate. While IoT deals with devices interacting with the internet, the AI part helps these devices learn from data. [3]

• Facial Recognition Technologies: The most popular application of this technology is the Face ID unlock feature in most flagship smartphone models today. The biggest challenge faced by this technology is widespread concern about racial and gender bias in its forensic use. [3]

• Medicine: Artificially intelligent computer systems are used extensively in medical

sciences. Common applications include diagnosing patients, end-to-end drug discovery

and development, improving communication between physician and patient,

transcribing medical documents, such as prescriptions, and remotely treating patients.

1.3 Problem Statement

Given what has been said in Section 1.1, the inaccuracy of diagnosing breast cancer using traditional methods has proven to be considerable, hence the need for a fast, accurate and efficient diagnosis of breast cancer.

1.4 Research Question

From the problem statement, we set our research question as: how can the accuracy of ML-based methods for breast cancer detection be improved compared to that of traditional methods?

1.5 Objectives

General objective

The main objective of this work is to increase the accuracy of breast cancer detection by exploiting machine learning techniques.



Specific objectives

The specific objectives of this project are to:

• Design a highly accurate, low-error-rate machine learning technique;

• Develop a web app capable of performing any subsequent prediction on new patient data.

1.6 Dissertation Outline

After this introduction, our work is organized as follows. Chapter 2 gives the state of the art and reviews previous works related to the project. It presents the major contributions made by other scientists, researchers and engineers, the results they obtained, and the limitations of their studies. It is from this chapter that a foreknowledge of the methodology to be implemented is obtained.

Chapter 3 then focuses on the methodology used for the project. Here, the project methodology is explained, giving the steps taken to build and train the machine learning model. In this chapter we see how the machine learning algorithm is trained, how the main KPIs of the model are generated, and how the web app development is performed. Chapter 4 presents the results of the trained model, the web app developed, the performance indices given by the confusion matrix and the ROC curve, and also a comparative analysis of the results with those of previous works. Finally, Chapter 5 concludes the work, presenting the summary of findings and recommendations, and giving thought to the future.



CHAPTER TWO

LITERATURE REVIEW

2.1 Introduction

This chapter presents recent research work and contributions in the field of breast cancer detection with machine learning techniques, and explains the various methods used to detect breast cancer.

2.2 Overview on Machine Learning Algorithms

Machine Learning is a subset of Artificial Intelligence that uses statistical learning

algorithms to build systems that have the ability to automatically learn and improve

from experiences without being explicitly programmed. Deep learning is a type of

machine learning and artificial intelligence (AI) that imitates the way humans gain

certain types of knowledge. While traditional machine learning algorithms are linear,

deep learning algorithms are stacked in a hierarchy of increasing complexity and

abstraction.

In its most basic sense, machine learning uses programmed algorithms that learn and

optimize their operations by analyzing input data to make predictions within an

acceptable range. With the feeding of new data, these algorithms tend to make more

accurate predictions. Although there are some variations of how to group machine

learning algorithms, they can be divided into three broad categories according to their

purposes and the way the underlying machine is being taught. These three categories

are: supervised, unsupervised and semi-supervised. There also exists a fourth category

known as reinforcement ML. Figure 2.1 shows an illustration of the classification of

machine learning algorithms.



Figure 2. 1: Classification of Machine Learning algorithms [4]

2.2.1 Supervised Machine Learning Algorithms

In this type of algorithm, a model gains knowledge from data that contains predefined examples with both inputs and expected outputs, so that it can compare its own output with the correct one. Classification is one of the standard formulations of the supervised learning task, where the data is mapped into a class after looking at numerous input-output examples of a function. Supervised learning is a branch of ML which deals with a given dataset consisting of multiple data items along with their corresponding classes. It can be used both for decision trees and artificial neural networks. In decision trees it can be used to determine which attributes of the given data provide the most relevant information. In artificial neural networks, the models are trained on the given dataset and the classification of unknown data samples is then carried out.
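As an illustrative sketch only (not the pipeline developed in this dissertation), the supervised workflow just described can be reproduced in Python with scikit-learn, using the Wisconsin Diagnostic Breast Cancer (WDBC) dataset bundled with the library and a k-nearest neighbor model as a placeholder classifier:

```python
# Sketch of supervised classification: fit on labelled input/output
# pairs, then classify unseen samples. Assumes scikit-learn is
# installed; the WDBC dataset ships with it.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)  # 569 samples, 30 features

# Hold out a test set so accuracy is measured on data the model never saw.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(X_train, y_train)             # learn from the labelled examples
accuracy = clf.score(X_test, y_test)  # classify the unknown samples
print(f"Test accuracy: {accuracy:.3f}")
```

The held-out test set plays the role of the "unknown samples" mentioned above; any other classifier could be substituted for the k-nearest neighbor model without changing the workflow.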



1 Logistic Regression

Logistic regression (LR) is a powerful and well-established method for supervised classification [4]. It can be considered an extension of ordinary regression and can model only a dichotomous variable, which usually represents the occurrence or non-occurrence of an event. LR helps in finding the probability that a new instance belongs to a certain class. Since it is a probability, the outcome lies between 0 and 1. Therefore, to use LR as a binary classifier, a threshold needs to be assigned to differentiate the two classes. For example, a probability value higher than 0.50 for an input instance will classify it as 'class A'; otherwise, 'class B'.
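The probability output and the 0.50 threshold can be sketched as follows; this is a minimal illustration assuming scikit-learn and the bundled WDBC dataset, not the algorithm developed in this work:

```python
# Logistic regression outputs a probability in (0, 1); applying a
# 0.50 threshold turns it into a binary class decision.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

scaler = StandardScaler().fit(X_train)  # LR converges better on scaled features
lr = LogisticRegression(max_iter=1000).fit(scaler.transform(X_train), y_train)

proba = lr.predict_proba(scaler.transform(X_test))[:, 1]  # P(instance in class 1)
pred = (proba >= 0.50).astype(int)  # threshold: 'class A' vs 'class B'
accuracy = (pred == y_test).mean()
print(f"Test accuracy at threshold 0.50: {accuracy:.3f}")
```

Moving the threshold away from 0.50 trades false positives against false negatives, which is exactly what the ROC curve discussed later in the dissertation visualizes.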

2 Support Vector Machine (SVM)

The support vector machine (SVM) algorithm can classify both linear and non-linear data. It first maps each data item into an n-dimensional feature space, where n is the number of features. It then identifies the hyperplane that separates the data items into two classes while maximizing the marginal distance for both classes and minimizing the classification errors. The marginal distance for a class is the distance between the decision hyperplane and its nearest instance that is a member of that class. Figure 2.2 shows an illustration of the support vector machine. The SVM has identified a hyperplane (here, a line) which maximizes the separation between the 'star' and 'circle' classes. More formally, each data point is first plotted as a point in an n-dimensional space (where n is the number of features), with the value of each feature being the value of a specific coordinate. To perform the classification, we then need to find the hyperplane that differentiates the two classes by the maximum margin [5].
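As a hedged sketch of the idea (assumed setup with scikit-learn and its bundled WDBC dataset, not this dissertation's code), a linear SVM can be fitted as follows; the support vectors it reports are the training points closest to the maximum-margin hyperplane:

```python
# A linear SVM finds the hyperplane maximizing the margin between the
# two classes in the n-dimensional feature space.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Scaling matters: the margin is measured in the feature space, so
# features on large scales would otherwise dominate the hyperplane.
svm = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
svm.fit(X_train, y_train)

print(f"Test accuracy: {svm.score(X_test, y_test):.3f}")
print(f"Support vectors per class: {svm.named_steps['svc'].n_support_}")
```

Swapping `kernel="linear"` for `kernel="rbf"` handles the non-linear case mentioned above by implicitly mapping the data into a higher-dimensional space.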



Figure 2. 2: A simplified illustration of how the support vector machine works [5]

3 Decision Tree (DT)

Decision tree (DT) is one of the earliest and most prominent machine learning algorithms. A decision tree organizes tests and their corresponding outcomes for classifying data items into a tree-like structure. The nodes of a decision tree normally have multiple levels, where the first or top-most node is called the root node. All internal nodes (i.e., nodes having at least one child) represent tests on input variables or attributes.

Figure 2.3 shows an illustration of the decision tree. Each variable (C1, C2, and C3) is represented by a circle and the decision outcomes (Class A and Class B) are shown by rectangles. In order to successfully classify a sample, each branch is labelled with either 'True' or 'False' based on the outcome value from the test of its ancestor node.

Depending on the test outcome, the classification algorithm branches towards the appropriate child node, where the process of testing and branching repeats until it reaches a leaf node. The leaf or terminal nodes correspond to the decision outcomes. DTs have been found easy to interpret and quick to learn, and are a common component of many medical diagnostic protocols. When traversing the tree for the classification of a sample, the outcomes of all tests at each node along the path provide sufficient information to conjecture about its class.
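The root-to-leaf structure described above can be made concrete with a small sketch (assuming scikit-learn and its bundled WDBC dataset, chosen here purely for illustration); printing the fitted tree shows the threshold test at each internal node:

```python
# A shallow decision tree: internal nodes test feature thresholds,
# leaves carry the class decisions. max_depth=3 keeps the printed
# tree readable and limits overfitting.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=0)

tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_train, y_train)

# Show the learned tests from the root node downwards.
print(export_text(tree, feature_names=list(data.feature_names)))
print(f"Test accuracy: {tree.score(X_test, y_test):.3f}")
```

The printed rules are what makes DTs easy to interpret in medical diagnostic settings: each classification can be traced as a sequence of explicit threshold tests.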



Figure 2. 3: A simplified illustration of how the decision tree works [5]

4 Random Forest (RF)

A random forest (RF) is an ensemble classifier consisting of many DTs, similar to the way a forest is a collection of many trees. DTs that are grown very deep often overfit the training data, resulting in high variation in the classification outcome for a small change in the input data. They are very sensitive to their training data, which makes them error-prone on the test dataset. The different DTs of an RF are trained using different parts of the training dataset.

Figure 2.4 shows an illustration of the RF algorithm, which here consists of three different decision trees. Each of those three decision trees was trained using a random subset of the training data. To classify a new sample, the input vector of that sample is passed down each DT of the forest. Each DT then considers a different part of that input vector and gives a classification outcome. The forest then chooses the classification having the most 'votes' (for a discrete classification outcome) or the average of all trees in the forest (for a numeric classification outcome). Since the RF algorithm considers the outcomes from many different DTs, it can reduce the variance that results from relying on a single DT for the same dataset.



Figure 2.4: A simplified illustration of how the random forest works [5]
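The bootstrap-and-vote procedure can be sketched with scikit-learn (assumed available; the data is illustrative):

```python
from sklearn.ensemble import RandomForestClassifier

X = [[0, 0], [0, 1], [1, 0], [3, 3], [3, 4], [4, 3]]
y = [0, 0, 0, 1, 1, 1]

# Each of the 10 trees is trained on a bootstrap sample of the data;
# the forest's prediction is the majority 'vote' of all trees.
forest = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)
print(forest.predict([[0.5, 0.5], [3.5, 3.5]]))

# predict_proba exposes the voting: the fraction of trees per class
print(forest.predict_proba([[3.5, 3.5]]))
```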

5 Naïve Bayes (NB)

Naïve Bayes (NB) is a classification technique based on Bayes' theorem. This theorem describes the probability of an event based on prior knowledge of conditions related to that event. The classifier assumes that a particular feature in a class is not directly related to any other feature, although features of that class could have interdependence among themselves. Considering the task of classifying a new object (white circle) as either 'green' or 'red', Figure 2.5 shows an illustration of the Naïve Bayes algorithm. According to this figure, it is reasonable to believe that any new object is twice as likely to have 'green' membership rather than 'red', since there are twice as many 'green' objects (40) as 'red' (20). In Bayesian analysis, this belief is known as the prior probability. Therefore, the prior probabilities of 'green' and 'red' are 0.67 (40 ÷ 60) and 0.33 (20 ÷ 60), respectively. Now, to classify the 'white' object, we draw a circle around it that encompasses several points (the number is chosen beforehand) irrespective of their class labels. Four points (three 'red' and one 'green') were considered in this figure. Thus, the likelihood of 'white' given 'green' is 0.025 (1 ÷ 40) and the likelihood of 'white' given 'red' is 0.15 (3 ÷ 20). Although the prior probability indicates that the new 'white' object is more likely to have 'green' membership, the likelihood shows that it is more likely to be in the 'red' class. In Bayesian analysis, the final classifier is produced by combining both sources of information (i.e., the prior probability and the likelihood value). The 'multiplication' function is used to combine these two types of information, and the product is called the 'posterior' probability. Finally, the posterior probability of 'white' being 'green' is 0.017 (0.67 × 0.025) and the posterior probability of 'white' being 'red' is 0.049 (0.33 × 0.15). Thus, the new 'white' object should be classified as a member of the 'red' class according to the NB technique.

Figure 2.5: An illustration of the Naïve Bayes algorithm [5]
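The arithmetic of this worked example can be reproduced directly in plain Python:

```python
# Counts from the illustration: 60 objects in total, 40 'green' and
# 20 'red'; the circle around the new 'white' object contains
# 1 'green' and 3 'red' neighbours.
total, n_green, n_red = 60, 40, 20

prior_green = n_green / total          # ~0.67
prior_red = n_red / total              # ~0.33
likelihood_green = 1 / n_green         # 0.025
likelihood_red = 3 / n_red             # 0.15

posterior_green = prior_green * likelihood_green   # ~0.017
posterior_red = prior_red * likelihood_red         # ~0.05

# The larger posterior wins: the 'white' object is classified 'red'.
print("red" if posterior_red > posterior_green else "green")
```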

6 K-Nearest Neighbor (KNN)

The K-nearest neighbor (KNN) algorithm is one of the simplest and earliest classification algorithms. It can be thought of as a simpler version of an NB classifier. Unlike the NB technique, the KNN algorithm does not require probability values. The 'K' in the KNN algorithm is the number of nearest neighbors considered in taking a 'vote'. The selection of different values for 'K' can generate different classification results for the same sample object. Figure 2.6 shows an illustration of the KNN algorithm. For K=3, the new object (star) is classified as 'black'; however, it is classified as 'red' when K=5.

Figure 2.6: A simplified illustration of the K-nearest neighbor algorithm [5]
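The vote-flipping effect of K can be reproduced with a minimal pure-Python KNN; the 1-D points below are illustrative, chosen so that K=3 and K=5 disagree:

```python
from collections import Counter

# 1-D toy points labelled by class; the query sits at position 0.
points = [(1.0, "black"), (2.0, "black"), (1.5, "red"), (2.5, "red"), (3.0, "red")]
query = 0.0

def knn_classify(k):
    # Sort neighbours by distance to the query and vote among the k nearest.
    nearest = sorted(points, key=lambda p: abs(p[0] - query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_classify(3))  # 'black' wins 2-1 among the 3 nearest
print(knn_classify(5))  # 'red' wins 3-2 among all 5
```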

7 Artificial Neural Network (ANN)

Artificial neural networks (ANNs) are a set of machine learning algorithms inspired by the functioning of the neural networks of the human brain. They were first proposed by McCulloch and Pitts and later popularized by the works of Rumelhart et al. in the 1980s. In the biological brain, neurons are connected to each other through multiple axon junctions, forming a graph-like architecture. These interconnections can be rewired (e.g., through neuroplasticity), which helps the brain adapt, process, and store information. Likewise, ANN algorithms can be represented as an interconnected group of nodes. The output of one node goes as input to another node for subsequent processing according to the interconnections. Nodes are normally grouped into a matrix called a layer, depending on the transformation they perform. Apart from the input and output layers, there can be one or more hidden layers in an ANN framework. Figure 2.7 shows an illustration of an artificial neural network with two hidden layers; the arrows connect the output of nodes in one layer to the input of nodes in another layer. Nodes and edges have weights that adjust the signal strength of the communication, which can be amplified or weakened through repeated training. Based on the training and the subsequent adaptation of node and edge weights, ANNs can make predictions for the test data.

Figure 2.7: An illustration of the artificial neural network structure with two hidden layers [5]
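A forward pass through such a two-hidden-layer network can be sketched in plain NumPy; the random weights are stand-ins for values that training (e.g., backpropagation) would normally learn:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0, x)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Weights for a network with 3 inputs, two hidden layers of 4 nodes
# each, and 1 output node.
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 4)), np.zeros(4)
W3, b3 = rng.normal(size=(4, 1)), np.zeros(1)

def forward(x):
    h1 = relu(x @ W1 + b1)        # first hidden layer
    h2 = relu(h1 @ W2 + b2)       # second hidden layer
    return sigmoid(h2 @ W3 + b3)  # output layer

out = forward(np.array([0.5, -1.0, 2.0]))
print(out)  # a value in (0, 1), interpretable as a class probability
```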

2.2.2 Unsupervised Machine Learning Algorithms

In unsupervised learning, only input data is provided to the model, without the use of labeled datasets; that is, unsupervised learning algorithms do not use labeled input and output data. An example of an unsupervised task is clustering. In contrast to supervised learning, unsupervised learning methods are suitable when the output variables (i.e., the labels) are not provided. Some examples of unsupervised learning algorithms include K-Means Clustering, Principal Component Analysis, and Hierarchical Clustering.

1 K-Means Clustering: K-means is a clustering algorithm that partitions the data into small clusters. The algorithm is used to find similarities between different data points. Each data point belongs to exactly one cluster, which makes the method well suited to the evaluation of big datasets.



2 C-Means Clustering: Clusters are identified on the basis of similarity; a cluster consists of similar data points belonging to one single family. In the C-means (fuzzy) algorithm, each data point can belong to more than one cluster with a degree of membership. It is mostly used in medical image segmentation and disease prediction.

3 Hierarchical Algorithm: The hierarchical algorithm mostly evaluates raw data in the form of a matrix. Each cluster is separated from the other clusters in a hierarchy, and every single cluster consists of similar data points. A probabilistic model can be used to measure the distance between clusters.

4 Gaussian Mixture Algorithm: This is one of the most popular unsupervised learning techniques. It is known as a soft clustering technique, used to compute the probability that a data point belongs to each cluster. The implementation of this algorithm is based on expectation maximization.
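As a minimal sketch of the clustering idea, K-means from scikit-learn (assumed available) can be run on two illustrative groups of unlabeled points:

```python
from sklearn.cluster import KMeans

# Two well-separated groups of unlabeled 2-D points
X = [[0, 0], [0, 1], [1, 0], [8, 8], [8, 9], [9, 8]]

# n_init controls random restarts; random_state makes the run repeatable
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)          # each point is assigned to exactly one cluster
print(km.cluster_centers_)
```

Note that the numeric cluster labels are arbitrary; only the grouping of points into the same or different clusters is meaningful.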

2.2.3 Semi Supervised Machine Learning Algorithms

Semi-supervised machine learning is a combination of supervised and unsupervised machine learning methods. With the more common supervised methods, a machine learning algorithm is trained on a "labeled" dataset in which each record includes the outcome information. Semi-supervised learning instead combines a small amount of labeled data with a large amount of unlabeled data during training, and thus falls between unsupervised learning (with no labeled training data) and supervised learning (with only labeled training data). Semi-supervised learning is used in speech analysis: since labeling audio files is a very intensive task, it is a very natural approach to this problem. It is also used for Internet content classification, because labeling each webpage individually is an impractical and unfeasible process.
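A minimal sketch with scikit-learn's LabelPropagation (one of several semi-supervised estimators; both the toy data and the choice of algorithm are illustrative), where unlabeled points carry the conventional label -1:

```python
import numpy as np
from sklearn.semi_supervised import LabelPropagation

# Two well-separated groups; only one point in each group is labeled,
# the rest are marked with -1 (the convention for "unlabeled").
X = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
              [5.0, 5.0], [5.2, 5.1], [5.1, 5.3]])
y = np.array([0, -1, -1, 1, -1, -1])

model = LabelPropagation().fit(X, y)
print(model.transduction_)  # labels inferred for every point
```

The two known labels propagate through the unlabeled neighbours of each group.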

2.2.4 Deep Learning Algorithms

Deep learning has gained massive popularity in scientific computing, and its algorithms are widely used by industries that solve complex problems. All deep learning algorithms use different types of neural networks to perform specific tasks. The ten most popular deep learning algorithms are listed below [6].

1 Convolutional Neural Networks (CNNs): CNNs, also known as ConvNets, consist of multiple layers and are mainly used for image processing and object detection. Yann LeCun developed the first CNN, called LeNet, in the late 1980s; it was used for recognizing characters such as ZIP codes and digits. CNNs are widely used to identify satellite images, process medical images, forecast time series, and detect anomalies [6].

2 Long Short-Term Memory Networks (LSTMs): LSTMs are a type of recurrent neural network (RNN) that can learn and memorize long-term dependencies; recalling past information for long periods is their default behavior. LSTMs retain information over time and are useful in time-series prediction because they remember previous inputs. They have a chain-like structure in which four interacting layers communicate in a unique way. Besides time-series prediction, LSTMs are typically used for speech recognition, music composition, and pharmaceutical development [6].

3 Recurrent Neural Networks (RNNs): An unfolded RNN is illustrated in Figure 2.8. RNNs have connections that form directed cycles, which allow the output of one step to be fed as input to the current step; thanks to this internal memory, the network can memorize previous inputs. RNNs are commonly used for image captioning, time-series analysis, natural-language processing, handwriting recognition, and machine translation.



Figure 2.8: An illustration of recurrent neural networks [6]

4 Generative Adversarial Networks (GANs): GANs are generative deep learning algorithms that create new data instances resembling the training data. A GAN has two components: a generator, which learns to generate fake data, and a discriminator, which learns to distinguish that fake data from real data. The usage of GANs has increased over time. They can be used to improve astronomical images and simulate gravitational lensing for dark-matter research. Video game developers use GANs to upscale low-resolution, 2D textures in old video games by recreating them in 4K or higher resolutions via image training. GANs also help generate realistic images and cartoon characters, create photographs of human faces, and render 3D objects.

5 Radial Basis Function Networks (RBFNs): RBFNs are special types of feedforward neural networks that use radial basis functions as activation functions. They have an input layer, a hidden layer, and an output layer, and are mostly used for classification, regression, and time-series prediction.

6 Multilayer Perceptrons (MLPs): MLPs are an excellent place to start learning about deep learning technology. MLPs belong to the class of feedforward neural networks with multiple layers of perceptrons that have activation functions. An MLP consists of an input layer and an output layer that are fully connected, possibly with multiple hidden layers in between, and can be used to build speech-recognition, image-recognition, and machine-translation software.

7 Self-Organizing Maps (SOMs): Professor Teuvo Kohonen invented SOMs, which enable data visualization by reducing the dimensions of data through self-organizing artificial neural networks. Data visualization attempts to solve the problem that humans cannot easily visualize high-dimensional data, and SOMs were created to help users understand this high-dimensional information.

8 Deep Belief Networks (DBNs): DBNs are generative models that consist of multiple layers of stochastic, latent variables. The latent variables have binary values and are often called hidden units [6]. DBNs are a stack of Restricted Boltzmann Machines (RBMs) with connections between the layers, and each RBM layer communicates with both the previous and subsequent layers. DBNs are used for image recognition, video recognition, and motion-capture data.

9 Restricted Boltzmann Machines (RBMs): Developed by Geoffrey Hinton, RBMs are stochastic neural networks that can learn from a probability distribution over a set of inputs. This deep learning algorithm is used for dimensionality reduction, classification, regression, collaborative filtering, feature learning, and topic modeling. RBMs constitute the building blocks of DBNs. An RBM consists of two layers, visible units and hidden units, with each visible unit connected to all hidden units. RBMs have a bias unit connected to all the visible and hidden units, and they have no output nodes [6].

10 Autoencoders: Autoencoders are a specific type of feedforward neural network in which the input and output are identical. Geoffrey Hinton designed autoencoders in the 1980s to solve unsupervised learning problems. They are trained neural networks that replicate the data from the input layer to the output layer. Autoencoders are used for purposes such as pharmaceutical discovery, popularity prediction, and image processing [6].
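As a sketch of the convolution operation at the heart of the CNNs listed above, the plain-NumPy function below slides a small kernel over an image (computed as cross-correlation, the variant deep learning frameworks actually implement; the edge-detector kernel is illustrative):

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2-D convolution (cross-correlation), the core CNN-layer op."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Sum of the elementwise product of the kernel and the window
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# An image with a dark/bright vertical boundary, and an edge-detector kernel
image = np.array([[0, 0, 1, 1]] * 4, dtype=float)
kernel = np.array([[-1, 1]], dtype=float)

response = conv2d(image, kernel)
print(response)  # the response peaks along the edge column
```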

2.3 Review of Previous Works on Machine Learning for General Disease Prediction

Extensive work has been carried out in the field of artificial intelligence, and especially machine learning, to detect common diseases. Dahiwade et al. [7] proposed an ML-based system that predicts common diseases. The symptoms dataset was imported from the UCI ML repository, and it contained symptoms of many common diseases. The system used CNN and KNN as classification techniques to achieve multiple-disease prediction. Moreover, the proposed solution was supplemented with additional information concerning the living habits of the tested patient, which proved helpful in understanding the level of risk attached to the predicted disease. Dahiwade et al. compared the results of the KNN and CNN algorithms in terms of processing time and accuracy; the accuracy and processing time of CNN were 84.5% and 11.1 seconds, respectively.

In line with this study, the findings of Chen et al. [8] also agreed that CNN outperformed typical supervised algorithms such as KNN, NB, and DT. The authors concluded that the proposed model scored higher in terms of accuracy, which is explained by the capability of the model to detect complex nonlinear relationships in the feature space. Moreover, a CNN detects high-importance features that render a better description of the disease, which enables it to accurately predict diseases of high complexity. This conclusion is well supported and backed with empirical observations and statistical arguments. Nonetheless, the presented models lacked details, for instance neural network parameters such as network size, architecture type, learning rate, and the backpropagation algorithm. In addition, the performance is only evaluated in terms of accuracy, which limits the validity of the presented findings. Moreover, the authors did not take into consideration the bias problem faced by the tested algorithms; for instance, the incorporation of more feature variables could immensely improve the performance metrics of the underperforming algorithms.

Uddin et al. [5] compared various supervised ML techniques. In their study, extensive research efforts were made to identify studies that applied more than one supervised machine learning algorithm to the prediction of a single disease. Two databases (Scopus and PubMed) were searched with different types of search terms. In total, they selected 48 articles for the comparison of supervised machine learning algorithms for disease prediction. They found that the Support Vector Machine (SVM) algorithm was applied most frequently (in 29 studies), followed by the Naïve Bayes algorithm (in 23 studies). However, the Random Forest (RF) algorithm showed comparatively superior accuracy: of the 17 studies in which it was applied, RF showed the highest accuracy in 9 of them (53%). It was followed by SVM, which topped 41% of the studies in which it was considered.

2.4 Review of Previous Works on Machine Learning for Breast Cancer Prediction

Sengar et al. [9] attempted to detect breast cancer using ML algorithms, namely RF, Bayesian networks, and SVM. The researchers obtained the Wisconsin original breast cancer dataset from the UCI repository and utilized it to compare the learning models in terms of key parameters such as accuracy, recall, precision, and the area under the ROC curve. The classifiers were tested using the K-fold validation method with K equal to 10. The simulation results proved that SVM excelled in terms of recall, accuracy, and precision. However, RF had a higher probability of correctly classifying the tumor, as implied by the ROC graph. In

contrast, Yao [10] experimented with various data mining methods including RF and SVM to determine the best-suited algorithm for breast cancer prediction. According to the results, the classification rate, sensitivity, and specificity of the Random Forest algorithm were 96.27%, 96.78%, and 94.57%, respectively, while SVM scored an accuracy of 95.85%, a sensitivity of 95.95%, and a specificity of 95.53%. Yao concluded that the RF algorithm performed better than SVM because the former provides better estimates of the information gained from each feature attribute. Furthermore, RF is well suited to breast disease classification, since it scales well to large datasets and presents lower chances of variance and data overfitting. These studies advantageously presented multiple performance metrics that solidified the underlying argument. Nevertheless, the inclusion of a preprocessing stage to prepare raw data for training proved disadvantageous for the ML models: according to Yao, omitting parts of the data reduces the quality of the images, and the performance of the ML algorithm is therefore hindered.
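The sensitivity, specificity, and accuracy figures quoted in these studies are simple ratios over the confusion matrix; a minimal sketch with hypothetical counts (not taken from any cited paper):

```python
# Hypothetical confusion-matrix counts for a binary breast cancer classifier:
tp, fn = 90, 10   # malignant cases classified correctly / incorrectly
tn, fp = 85, 15   # benign cases classified correctly / incorrectly

sensitivity = tp / (tp + fn)                # recall on the positive class
specificity = tn / (tn + fp)                # recall on the negative class
accuracy = (tp + tn) / (tp + tn + fp + fn)  # overall fraction correct

print(sensitivity, specificity, accuracy)   # 0.9 0.85 0.875
```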

Noreen Fatima et al. [1] performed a comparative review of machine learning techniques and analyzed their accuracy across various journals. Their main focus was to comparatively analyze existing machine learning and data mining techniques in order to find the most appropriate method, one that supports large datasets with good prediction accuracy. They found that machine learning techniques were used in 27 papers, ensemble techniques in 4 papers, and deep learning techniques in 8 papers. They concluded that each technique is suitable under different conditions and for different types of dataset; the comparative analysis of these algorithms shows that the SVM machine learning algorithm is the most suitable for the prediction of breast cancer. Different researchers have analyzed prediction algorithms using the Wisconsin Diagnostic Breast Cancer (WDBC) dataset, and the analysis shows that the accuracy of the SVM algorithm is each time higher than that of the other machine learning algorithms.

Delen et al. [11] used artificial neural networks, decision trees, and logistic regression to develop prediction models for breast cancer survival by analyzing a large dataset, the SEER cancer incidence database. Two popular data mining algorithms (artificial neural networks and decision trees) were used, along with a commonly used statistical method (logistic regression), to develop the prediction models from more than 200,000 cases. A 10-fold cross-validation method was used to measure the unbiased estimate of the three prediction models for performance comparison purposes. The results indicated that the decision tree (C5) is the best predictor with 93.6% accuracy on the holdout sample (a prediction accuracy better than any reported in the literature), artificial neural networks came second with 91.2% accuracy, and the logistic regression models came out worst of the three with 89.2% accuracy. The comparative study of multiple prediction models for breast cancer survivability using a large dataset along with 10-fold cross-validation provided insight into the relative prediction ability of different data mining methods, and sensitivity analysis on the neural network models provided the prioritized importance of the prognostic factors used in the study.
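The 10-fold cross-validation procedure used in several of these studies can be sketched with scikit-learn (assumed available) on the Wisconsin diagnostic dataset it bundles; the choice of a decision tree here is purely illustrative:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# The Wisconsin diagnostic breast cancer dataset bundled with scikit-learn
X, y = load_breast_cancer(return_X_y=True)

# 10-fold cross-validation: train on 9 folds, test on the held-out fold,
# and repeat so that every fold serves as the test set exactly once.
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=10)
print(scores.mean())  # the unbiased accuracy estimate is the fold average
```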

Lundin et al. [12] used ANN and logistic regression models to predict 5-, 10-, and 15-year breast cancer survival. They studied 951 breast cancer patients and used tumor size, axillary nodal status, histological type, mitotic count, nuclear pleomorphism, tubule formation, tumor necrosis, and age as input variables. In this study, they showed that data mining can be a valuable tool for identifying similarities (patterns) in breast cancer cases, which can be used for diagnosis, prognosis, and treatment purposes. The area under the ROC curve (AUC) was used as the measure of accuracy of the prediction models in generating survival estimates for the patients in an independent validation set. The AUC values of the neural network models for 5-, 10-, and 15-year breast-cancer-specific survival were 0.909, 0.886, and 0.883, respectively; the corresponding AUC values for logistic regression were 0.897, 0.862, and 0.858. Axillary lymph node status (N0 vs. N+) predicted 5-year survival with a specificity of 71% and a sensitivity of 77%, whereas the sensitivity of the neural network model was 91% at this specificity level. The rate of false predictions at 5 years was 82/300 for nodal status and 40/300 for the neural network; when nodal status was excluded from the neural network model, the rate of false predictions increased only to 49/300 (AUC 0.877). An artificial neural network is thus very accurate in 5-, 10-, and 15-year breast-cancer-specific survival prediction. The consistently high accuracy over time and the good predictive performance of a network trained without information on nodal status demonstrate that neural networks can be important tools for cancer survival prediction.

Ahmad et al. [13] implemented machine learning techniques, i.e., Decision Tree (C4.5), Support Vector Machine (SVM), and Artificial Neural Network (ANN), to develop predictive models for patients registered in the Iranian Center for Breast Cancer (ICBC) program from 1997 to 2008. The dataset contained 1189 records, 22 predictor variables, and one outcome variable. The main goal of their paper was to compare the performance of these three well-known algorithms on the data through sensitivity, specificity, and accuracy. Their analysis showed that the accuracies of DT, ANN, and SVM are 0.936, 0.947, and 0.957, respectively. The SVM classification model predicts breast cancer recurrence with the lowest error rate and highest accuracy, while the predicted accuracy of the DT model is the lowest of all. The results were achieved using 10-fold cross-validation to measure the unbiased prediction accuracy of each model.

Ayer et al. [14] examined two of the most frequently used computer models in clinical risk estimation: logistic regression and artificial neural networks. A study was conducted to review and compare these two models, elucidate the advantages and disadvantages of each, and provide criteria for model selection. The two models were used to estimate breast cancer risk on the basis of mammographic descriptors and demographic risk factors. Although they demonstrated similar performance, the two models have unique characteristics (strengths as well as limitations) that must be considered and may prove complementary in contributing to improved clinical decision making. In general, ANNs were more advantageous than the logistic regression model due to their hidden layers of nodes; in fact, a special ANN with no hidden nodes has been shown to be identical to a logistic regression model. ANNs are particularly useful when there are implicit interactions and complex relationships in the data, whereas logistic regression models are the better choice when one needs to draw statistical inferences from the output.

Maxine Tan et al. [15] proposed a novel computerized system to predict breast cancer risk using quantitative assessment of mammographic images. In this research work, data were collected from 335 women and the proposed model was applied to the collected images. The output of this research showed that 159 cancers were identified; the remaining 176 images were cancer-free. In this work, an SVM classifier was used to predict the disease.

Habib Dhahri et al. [16] explained how various machine learning methods are used to predict cancer. Their proposed model was based on genetic programming combined with machine learning algorithms. The purpose of the study was to easily differentiate between different types of breast cancer tumor, such as benign and malignant. Genetic programming was applied to find the features and the correct attribute values for the machine learning approaches. The performance of the proposed system was measured in terms of sensitivity, specificity, and precision. This research work proved that the genetic programming concept can automatically generate a model by combining feature pre-processing methods with a classifier.

Konstantina Kourou et al. [17] noted that many different types of cancer exist today and that a disease identified at an early stage is curable; frequent screening is needed to detect it. People are categorized based on the risk level of the cancer. Machine learning algorithms are used to identify the important features in complex datasets. Various machine learning approaches, including artificial neural networks, Bayesian networks, support vector machines, and decision trees, have been broadly used in cancer detection research. Their work reviewed the latest machine learning concepts applied across the stages of cancer disease development.

Somil Jain et al. [18] discussed the causes of breast cancer and how to predict it at an earlier stage. Recently, many people have been affected by cancer. Different machine learning approaches and data mining techniques have been used for medical data prediction for various diseases, including breast cancer. The major contribution of this research work was to calculate the correctness of the classification concepts and identify the best algorithm based on accuracy and predictive capability [5].

Anusha Bharat et al. [19] explained in their research work that machine learning is mostly used in healthcare applications such as identifying the kind of cancerous cells. Breast cancer is one of the most common diseases throughout the world, and the two types of cancer cells are benign and malignant. In this proposed work, an SVM classifier is used to predict the cancer. The SVM concept was applied to the Wisconsin Breast Cancer dataset, and the dataset was also trained using other concepts: KNN, Naïve Bayes, and CART. The accuracy level of each algorithm was compared. Shubham Sharma et al. note that many Indian women are affected by breast cancer, and that about 50% of the affected women reach a fatal condition due to the disease. Their research work compared various machine learning algorithms used in diagnosing breast cancer, and the obtained results can be used in breast cancer detection.

Moh'd Rasoul Al-hadidi et al. [20] note that breast cancer is a critical disease for females throughout the entire world, and that detecting the disease early is the first step to improving survival rates. Radiologists use mammography images to predict the disease. The authors proposed a new model to sense breast cancer with high exactness. The process was divided into two stages: in the first stage, image processing concepts are applied to prepare the input mammography pictures for the feature and pattern extraction task; supervised learning methods are then applied to the extracted features. Here, the Back Propagation Neural Network concept and the Logistic Regression (LR) technique are used, and finally the accuracy of the above-mentioned models is compared.

Naresh Khuriwal et al. [21] note that breast cancer is a critical disease for women. The main target of their study was curing the cancer at the initial stage using scientific methods, since early diagnosis of the disease can be used to remove it entirely. In this research work, 41 research papers were reviewed. The authors used deep learning concepts and convolutional neural network algorithms to predict breast cancer, with mammography images taken from the MIAS database. The output of this research shows 98% accuracy. The processing was divided into three stages: in the first stage, the data were collected and the unwanted data removed using pre-processing techniques; then the dataset was split for training and testing purposes; finally, a model was developed using machine learning algorithms.

B. M. Gayathri et al. [22] explained that breast cancer risk has risen with changes in women's lifestyles. Predicting this disease manually is very difficult and time-consuming, so machine learning concepts are used to detect it. Their research work performed a comparative study of the Relevance Vector Machine (RVM) against some other machine learning concepts used for breast cancer prediction.

Dana Bazazeh et al. [23] note that breast cancer is among the most widespread fatal diseases affecting women throughout the world, and that machine learning approaches are used to diagnose it at an early stage. In this research work, the authors compared the most commonly used machine learning concepts, namely the Support Vector Machine, the Random Forest technique, and Bayesian networks; the Wisconsin original breast cancer dataset was used to implement them.

Zhiqiong Wang et al. [24] used convolutional neural network (CNN) deep features to detect cancer. As an initial step, the authors used a mass detection method based on CNN deep learning features and an unsupervised machine learning clustering concept. They then built various feature sets, such as morphological features, texture features, and density features. Finally, an ELM classifier was designed to classify benign and malignant breast tumors.

Yawen Xiao et al. [25] note that breast cancer is a common disease among women. Their research work demonstrated a new system embedding a deep-learning-based unsupervised feature extraction algorithm; the stacked autoencoder concept was used together with a support vector machine technique to predict breast cancer. The proposed method was tested on the Wisconsin Diagnostic Breast Cancer dataset, and the results show that the SAE-SVM method increased the accuracy to 98.25%.

Junaid Ahmad Bhat et al. [26] developed a new tool to detect breast cancer at an early stage. In this research work, the authors presented preliminary results of the BCDM project developed using Matlab software; the algorithm was implemented using the adaptive resonance approach.

P. Malathi et al. [27] proposed a computer-aided detection (CAD) system for breast cancer diagnosis from digital mammographic images. The work aimed to create computer-aided diagnosis tools that can help radiologists make precise interpretations of digital mammograms. The methodologies applied to build the various phases of the CAD framework are summarized.

2.5 Survey of Previous Works

After research, a great number of articles and publications on breast cancer prediction using deep learning and machine learning were investigated to find out the accuracy and the frequency of use of the best-performing algorithms. A keyword search returned a total of 43,900 papers from platforms such as ACM, IEEE, ResearchGate and ScienceDirect. Our search query focused on four keywords: machine learning, deep learning, data mining and breast cancer prognosis or diagnosis.



Table 2. 1: Survey of Previous Works on Breast Cancer Detection

Reference               Year  Algorithm     Subjects  Prediction performance                     Limitation   Best algorithm

Lundin et al. (1999)    1999  ANN, LR       951       AUC (ANN=0.909, LR=0.897)                  No accuracy  ANN

Delen et al. (2004)     2004  ANN, DT, LR   202,532   Accuracy (ANN=0.909, DT=0.935, LR=0.894)   No AUC       DT

Ahmad et al. (2013)     2013  ANN, DT, SVM  1,189     Accuracy (ANN=0.947, DT=0.936, SVM=0.957)  No AUC       SVM

Yao et al. (2013)       2013  DT, RF, SVM   569       Accuracy (DT=0.932, RF=0.963, SVM=0.959)   No AUC       RF

Ayer et al. (2010)      2010  ANN, LR       62,219    Accuracy (ANN=0.965, LR=0.963)             No AUC       ANN

2.6 Partial Conclusion

Investigation of the relevant literature helps in sorting out the different deep learning and machine learning techniques that could be exploited to detect breast cancer. After reviewing all techniques, their performance, their accuracy and the number of times they appeared in the literature, the technique selected for breast cancer detection was the artificial neural network, and more precisely the Convolutional Neural Network (CNN), because it was used in more references than any other algorithm. The CNN architecture and its functioning are further explained in Chapter 3.



CHAPTER THREE

MATERIALS AND METHODS

3.1 Introduction

This chapter carefully explains the various methods and processes taken to realize the

project. It reveals the model used, the training of the model in the software, and several

other parameters.

3.2 Project Methodology

The summary of the project methodology is explained in Figure 3.1. This project

aims to assess whether a lump in a breast could be malignant (cancerous) or benign (non-cancerous).

Figure 3. 1: Project Methodology Flowchart

For that, we use digitized images of fine-needle aspiration (FNA) biopsies, classified using machine learning. First, the CNN model is built and trained in Colab by importing the chosen data set. Then, once a high accuracy is achieved, a web app is created on the front end to allow a new prediction to be made for any patient image. Google Colab was preferred to Kaggle for training the model because it is very simple to use and provides default code to load a dataset directly into the model.

3.2.1 The Data Set

The data set for this project can be downloaded at kaggle.com/uciml/breast-cancer-wisconsin-data. Dr. William H. Wolberg, from the University of Wisconsin Hospitals, Madison, obtained this breast cancer database [28]. Figure 3.2 shows the first five rows and columns of the data set. The data set contains 30 input parameters, and more than 600 patient cases were used. Target variables can only take two values in a classification model: 0 (false) or 1 (true). Since this dataset doesn't contain image data, another dataset of histopathological FNA biopsy images, from kaggle.com/datasets/paultimothymooney/breast-histopathology-images, was also used to classify the instances as either benign or malignant. About 4,000 images were used for the training.

Figure 3. 2: Section of data set showing first five rows and columns

3.2.2 Convolutional Neural Network Architecture

The Convolutional Neural Network was built in Google Colaboratory, a free environment that provides users with free Graphics Processing Unit (GPU) and Tensor Processing Unit (TPU) runtimes for training their machine learning algorithms. Once in the Google Colab environment, a user simply has to log in to a Gmail account and create a new notebook in order to build a new neural network. CNN was chosen over other ANN algorithms because digital images are collections of pixels with high values, which CNNs are well suited to analyze: a CNN progressively reduces these values, which makes the training phase cheaper in computational power with little information loss. The main reason why ReLU (Rectified Linear Unit) is used in preference to other activation functions is that it is simple, fast, and empirically it seems to work well. Early papers observed that training a deep network with ReLU tended to converge much more quickly and reliably than training a deep network with sigmoid activation.

CNNs are composed of three types of layers: convolutional layers, pooling layers and fully-connected layers. When these layers are stacked, a CNN architecture is formed [29]. Figure 3.3 shows the architecture. In the text-processing setting described in [30], for instance, a CNN takes as input a sequence of word embeddings, summarizes the sentence meaning by convolving a sliding window and pooling the saliency through the sentence, and yields a fixed-length distributed vector via further layers such as dropout and fully connected layers [30].



Figure 3. 3: Illustration of how a Convolutional Neural Networks performs [31]

3.2.3 Training the Neural Network

The CNN model was trained in Colab using the TensorFlow and Keras modules. For classification, the data set was separated into a training set and a validation set. The training was run for ten epochs on the training and validation data sets. A graph of the training and validation accuracy and loss was produced, which will be discussed in the next chapter.
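The training flow described above can be sketched with TensorFlow/Keras. The dissertation's exact architecture is in Appendix A; the layer sizes, the 32x32 input shape and the random stand-in data below are illustrative assumptions only, and the epoch count is shortened so the sketch runs quickly.

```python
import numpy as np
import tensorflow as tf

# A small binary-classification CNN in the spirit of the model described here
# (not the dissertation's exact architecture).
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(8, 3, activation="relu", input_shape=(32, 32, 3)),
    tf.keras.layers.MaxPooling2D(),                  # shrink the feature map
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),                       # vector for dense layers
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # benign (0) / malignant (1)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Random stand-in images; the real run used about 4,000 histopathology images,
# a training/validation split and ten epochs.
x = np.random.rand(20, 32, 32, 3).astype("float32")
y = np.random.randint(0, 2, size=(20,))
history = model.fit(x, y, validation_split=0.2, epochs=2, verbose=0)
print(sorted(history.history))  # training and validation accuracy/loss curves
```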

Convolution Layer: Figure 3.4 shows the convolution operation. This is the first layer of the convolutional network; it performs feature extraction by sliding a filter over the input image. The output, or convolved feature, is the sum of the element-wise products of the filter and the image for every sliding step. The output, also known as the feature map, captures features of the original image such as curves, sharp edges and textures. In networks with more convolutional layers, the initial layers extract the generic features, while more complex features are captured as the network gets deeper.

Figure 3. 4: Convolution layer scheme [31]

Figure 3. 5: Pooling layer scheme [31]

Pooling Layer: Figure 3.5 shows the functioning of the pooling layer. The primary purpose of this layer is to reduce the number of trainable parameters by decreasing the spatial size of the image, thereby reducing the computational cost. The image depth remains unchanged, since pooling is done independently on each depth dimension. Max pooling is the most common pooling method: the largest element in each window of the feature map is kept. Max pooling thus yields an output image with greatly reduced dimensions while retaining the essential information.
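The "keep the largest element in each window" rule can be made concrete with a small hand-rolled example. This is illustrative only; in practice frameworks provide pooling as a layer (e.g. a 2x2 max-pooling layer), and the feature-map values below are made up.

```python
import numpy as np

# 2x2 max pooling with stride 2 on a single-channel feature map: the map is
# split into non-overlapping 2x2 windows and only the maximum of each survives.
def max_pool_2x2(fmap):
    h, w = fmap.shape
    return fmap.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fmap = np.array([
    [1, 3, 2, 1],
    [4, 6, 5, 0],
    [7, 2, 9, 8],
    [1, 0, 3, 4],
])
print(max_pool_2x2(fmap))
# [[6 5]
#  [7 9]]
```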

Fully Connected Layer: Figure 3.6 shows the functioning of the fully connected layer. The last few layers, which determine the output, are the fully connected layers. The output from the pooling layer is flattened into a one-dimensional vector and then given as input to the fully connected layer. The output layer has the same number of neurons as the number of categories in the classification problem, thus associating features with a particular label.

Figure 3. 6: Fully Connected layer scheme [31]

This process is known as forward propagation. The output so generated is compared to the actual output to compute the error. The error is then back-propagated to update the filters (weights) and bias values. One training iteration is completed after each cycle of forward and backward propagation.



3.2.4 Classifier Performance Index

The diagnostic ability of classifiers has usually been determined by the confusion matrix and the Receiver Operating Characteristic (ROC) curve. In the machine learning research domain, the confusion matrix is also known as the error or contingency matrix. The basic framework of the confusion matrix is provided in Figure 3.7. In this framework, true positives (TP) are the positive cases that the classifier correctly identified, and true negatives (TN) are the negative cases that the classifier correctly identified. False positives (FP) are the negative cases that the classifier incorrectly identified as positive, and false negatives (FN) are the positive cases that the classifier incorrectly identified as negative. The following measures, which are based on the confusion matrix, are commonly used to analyze the performance of classifiers, including those based on supervised machine learning algorithms. The acceptable ranges for accuracy, precision, F1 score, sensitivity and specificity are from 90% to 100%, while the acceptable range for the false positive rate is from 0% to 10%.
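The confusion-matrix measures above follow directly from the four cell counts. The sketch below computes them; the counts are made-up illustration values, not the dissertation's results.

```python
# Standard classifier metrics derived from the confusion matrix cells.
def classifier_metrics(tp, tn, fp, fn):
    total = tp + tn + fp + fn
    return {
        "accuracy":    (tp + tn) / total,  # ratio of instances correctly classified
        "error_rate":  (fp + fn) / total,  # ratio of instances misclassified
        "sensitivity": tp / (tp + fn),     # real positives predicted positive
        "specificity": tn / (tn + fp),     # real negatives predicted negative
        "precision":   tp / (tp + fp),     # predicted positives that are real
        "fpr":         fp / (fp + tn),     # false positive rate
    }

m = classifier_metrics(tp=90, tn=95, fp=5, fn=10)
print(round(m["accuracy"], 3), round(m["sensitivity"], 3))  # 0.925 0.9
```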

Figure 3. 7: a) The basic framework of the confusion matrix; and b) A presentation of


the ROC curve [5]

ROC is one of the fundamental tools for diagnostic test evaluation and is created by plotting the true positive rate against the false positive rate at various threshold settings. The area under the ROC curve (AUC) is also commonly used to determine the predictive ability of a classifier: a higher AUC value indicates a better classifier, and vice versa. Figure 3.7 illustrates three ROC curves based on an abstract dataset. The area under the blue ROC curve is half of the shaded rectangle, so the AUC value for the blue curve is 0.5. Because it covers a larger area, the AUC value for the red ROC curve is higher than that of the black ROC curve. Hence, the classifier that produced the red ROC curve shows higher predictive accuracy than the two classifiers that generated the blue and black ROC curves.
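The ROC curve and AUC described here can be computed with scikit-learn. The labels and scores below are toy values for illustration, not the model's actual predictions.

```python
from sklearn.metrics import roc_curve, roc_auc_score

# Toy ground-truth labels and classifier scores; a random scorer would land
# near AUC = 0.5, a perfect one at 1.0.
y_true  = [0, 0, 1, 1, 0, 1, 1, 0]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.9, 0.3]

# roc_curve sweeps the decision threshold and returns the curve's points.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)
print(round(auc, 3))  # 0.938
```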

3.2.5 Web App Development

The web app development proposed should satisfy the SDLC, or Software

Development Life Cycle, which is a set of steps used to create software applications.

Figure 3.8 shows an illustration of the seven steps of the software development life

cycle. These steps divide the development process into tasks that can then be assigned,

completed, and measured. It simply outlines each task required to put together a

software application. This helps to reduce waste and increase the efficiency of the

development process. Monitoring also ensures the project stays on track, and continues

to be a feasible investment for the company.

Figure 3. 8: The seven phases of software development life cycle [32]

1 Planning

In this phase, the scope and purpose of the application are defined. The purpose of the web app is to enable any patient or medical expert to know, in a very accurate way, whether an uploaded image is benign or malignant. The first step is for the user to get a fine needle aspiration biopsy from a hospital. The biopsy costs approximately $10, which is around 7,000 FCFA; that is the cost of obtaining an image that can be used in the app.

2 Define Requirements

Our application is supposed to read the image uploaded by the user and compare it against the thousands of training images in the back end, in order to give an accurate diagnosis of the cancer.

3 Coding

The app is built for the web using Python as the programming language, Google Colab or Visual Studio Code as the IDE, and Streamlit as the web development framework. The model was trained with our CNN algorithm over ten epochs.

The source code for building the web app in python is given in Appendix A.

4 Software Development

The Streamlit utility is imported in the code. The app.py code was downloaded from Colab and saved in a directory on the computer.

In the command prompt window, from the directory containing app.py, a Heroku app is created from the code. After logging in to Heroku with heroku login, the app was created with heroku create aicancertracer. The commands for creating these files are in the appendix section.

After this has been done, some files are needed to run the app online: the requirements.txt file, the setup.sh file and the Procfile.

The requirements.txt file contains all the libraries that need to be installed for the project to work. This file can be created manually, by going through all files and noting which libraries are used, or automatically using a tool like pipreqs. This can be found in Appendix B.

Using the setup.sh and Procfile files, Heroku is told the commands needed to start the application. In the setup.sh file, we create a .streamlit folder with a credentials.toml and a config.toml file. The Procfile executes setup.sh and then calls streamlit run to launch the application. These can be found in Appendices C and D respectively.
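The dissertation's exact files are in Appendices C and D; a typical pair for deploying a Streamlit app on Heroku looks like the following sketch, where the config values and file paths are common conventions rather than the author's verbatim files (Heroku supplies the port through the PORT environment variable).

```shell
# setup.sh — writes the Streamlit server config that Heroku expects
mkdir -p ~/.streamlit
cat > ~/.streamlit/config.toml <<EOF
[server]
headless = true
port = $PORT
enableCORS = false
EOF

# Procfile (a separate one-line file) — runs setup.sh, then starts the app:
#   web: sh setup.sh && streamlit run app.py
```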

The final step is to initialize a git repository (git init) and push all the files mentioned to the Heroku master branch in order to deploy the app online. This can be found in Appendix E.

5 Testing

Prior to launching the app on the web, it was tested on the local host using ngrok in Google Colab. After coding, the ngrok utility was used to check that the app behaved correctly before deploying it with Streamlit. If the IDE used was VS Code, the app can still be tested on the computer's local host on port 8500, 8501 or 8502.

6 Deployment

After successful testing, the app was launched at https://aicancertracer.herokuapp.com

7 Maintenance

Heroku is a platform that requires regular maintenance, which is done from the Heroku CLI dashboard by adjusting the settings to keep the app running. The Kaffeine service was also used to keep the app awake, at https://kaffeine.

3.3 Use Case Diagram

In Unified Modelling Language (UML), a use case diagram helps understand how a

user might interact with the system. In this case the use case diagram of the system is

presented in Figure 3.9. There are two actors in the process; the patient and the doctor.

The interactions between the actors and cases are demonstrated in the following

paragraphs. [33]

Figure 3. 9: Use case diagram of the system

Case 1: Perform diagnosis: The doctor performs a series of tests on the patient to determine the type of the cancer.

First, the doctor performs a fine needle aspiration biopsy, taking a tissue sample from the breast for further examination. Next, he inspects the physical appearance of the biopsy sample under a microscope, and loads the resulting image into the web app to determine the type of cancer, whether benign or malignant. [33]

Next, the doctor shares the image data of the diagnosis with the patient to show him or her the findings, so that the patient can also check his or her status via the web app. If the cancer is malignant, the doctor proceeds with a number of tests to see how advanced the cancer is and how far it has spread.

Then the doctor will look into other factors to determine the prognosis of the disease.

[33]

Case 2: Propose treatment: When the above investigations are completed, the doctor counsels the patient regarding the best treatment options available, based on the type and the stage of the disease and some prognostic factors.

Besides proposing the treatment options, the doctor also needs to explain to the patient the risks of the particular treatment and the chances of recovery. Upon endorsement by the patient, the doctor schedules the treatments. [33]

3.4 Partial Conclusion

In this chapter we have seen how the data set and images were used to train the CNN, and how the web app was developed using Streamlit in Google Colab. In the next chapter we shall view the results of the training, the ROC curve and the confusion matrix, together with all the KPIs involved, and the deployed app.



CHAPTER FOUR

RESULTS AND DISCUSSIONS

4.1 Introduction

This chapter presents the results of training the CNN model, covering the ROC curve, the confusion matrix, the accuracy and other KPIs, as well as the deployed application.

4.2 Results from Google Colab

From the coding of the model, which is found in Appendix A, Figure 4.1 shows the ROC curve obtained after training the model. The output was an AUC of 0.98, indicating a strong classifier.

Figure 4. 1: ROC curve performance

The next result obtained was the confusion matrix computed after training, from which all the reported KPIs follow. Figure 4.2 shows the confusion matrix, while Figure 4.3 plots the training and validation accuracy against the training and validation loss.

Figure 4.2: Confusion Matrix Performance

From the results obtained, the prediction accuracy is 97.9%, implying a very well trained model. Here is a summary of the training KPIs:



• Classification accuracy (ratio of instances correctly classified): 97.9%.


• Error rate (ratio of instances misclassified): 1.25%.
• Specificity (ratio of real negative which are predicted negative): 98.75%.
• Sensitivity (ratio of real positive which are predicted positive): 96.8%.
• Precision (ratio of real positive which are predicted positive): 98.4%.

Plots of the training and validation accuracy, together with their respective losses as a function of the epoch, were also obtained from the TensorFlow run. From Figure 4.3, the training and validation accuracy increases with the number of epochs, while the training and validation loss decreases, indicating that the system becomes more accurate as more training time is given to it.

Figure 4. 3: Training and validation accuracy vs training and validation loss
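A figure like Figure 4.3 can be produced directly from a Keras History object. In the sketch below, the history values are placeholder numbers standing in for `history.history`, and the output file name is arbitrary; a headless backend is selected so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend: render to file, no display needed
import matplotlib.pyplot as plt

# Placeholder curves standing in for history.history from model.fit(...).
hist = {
    "accuracy":     [0.80, 0.88, 0.93, 0.96],
    "val_accuracy": [0.78, 0.86, 0.91, 0.94],
    "loss":         [0.55, 0.35, 0.20, 0.12],
    "val_loss":     [0.60, 0.40, 0.25, 0.16],
}
epochs = range(1, len(hist["accuracy"]) + 1)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(epochs, hist["accuracy"], label="training")
ax1.plot(epochs, hist["val_accuracy"], label="validation")
ax1.set(title="Accuracy", xlabel="epoch")
ax2.plot(epochs, hist["loss"], label="training")
ax2.plot(epochs, hist["val_loss"], label="validation")
ax2.set(title="Loss", xlabel="epoch")
for ax in (ax1, ax2):
    ax.legend()
fig.savefig("training_curves.png")
```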


49

4.3 Results from Streamlit App

The Streamlit app was able to classify new cases as benign or malignant when the user uploads an image. The app is available at aicancertracer.herokuapp.com. Figure 4.4 shows the homepage of the web app, while Figure 4.5 and Figure 4.6 show the prediction results for a benign and a malignant image respectively. The app takes about three seconds to return the prediction for either image type.

Figure 4. 4: Web app homepage

A benign image was uploaded to the app, as can be seen in Figure 4.5, and the result was benign, indicating a correct classification by the application.

Figure 4. 5: Benign image prediction


50

Next, a malignant image was uploaded to the app, as can be seen in Figure 4.6, and the result was malignant, indicating another correct classification by the application.

Figure 4. 6: A malignant image prediction

4.4 Discussion of Results and Comparative Analysis with Previous Works

CNN was used for this prediction and gave an accuracy of 97.9%, which compares favorably with previous works. Table 4.1 summarizes the comparison of results with those from previous works.

From the previous works, it is clearly revealed that CNN is the most preferred deep learning technique. It is preferred over other neural network algorithms, such as RNN, feed-forward neural networks, Kohonen self-organizing networks or VGG-16, because it processes high-valued pixel data faster than the other algorithms. Therefore, choosing it for this project was well justified.

Table 4. 1: Comparative analysis of results with previous works on deep learning for breast cancer detection

Reference              Algorithm    Accuracy  Other KPIs

My results             CNN          0.979     AUC = 0.98, Specificity = 0.9875, Sensitivity = 0.968

Rawat et al. [34]      CNN          0.846     Only accuracy

Cruz-Roa et al. [35]   CNN+HASHI    -         NPV = 0.97, TNR = 0.92, FPR = 0.08, FNR = 0.13

Toğaçar et al. [36]    Autoencoder  0.986     Sensitivity = 0.9812, Precision = 0.9688

Shallu et al. [37]     VGG16        0.926     ROC = 0.956, Precision = 0.959

Murtaza et al. [38]    CNN          0.954     Sensitivity = 0.935

Bardou et al. [39]     CNN          0.98      Precision = 0.842

Elbashir et al. [40]   CNN          0.987     Sensitivity = 0.914, Specificity = 1.00, Precision = 1.00, F1 score = 0.955

Saha et al. [41]       CNN          0.945     Precision = 0.958

In addition, in the previous works the main KPI, the accuracy, ranges from 0.846 to 0.987, indicating strong classification. Our model yields 0.979, which is higher than most results in previous works, demonstrating efficient classification.

Another remark is that multiclass variables, beyond the traditional benign or malignant classification, were also detected at the output, indicating the possibility of extending the project according to the type of cancer to be detected. In general, my results do not differ greatly from those of previous works. The accuracy and other KPIs are high enough to indicate that the training and prediction were efficient.

4.5 Partial Conclusion

We now have all the results of our classification model. The accuracy was very high, higher than those of most previous works, indicating a good choice of model and training dataset. Also, the web app was deployed to enable any user with a new patient image to obtain a prediction of the most likely class of that image, hence saving lives through early detection.



CHAPTER FIVE

GENERAL CONCLUSION

5.1 Summary of Findings

In this dissertation, we proposed a simple and effective method for the classification of histopathology breast cancer images in the presence of large training data. Following the training of the artificial neural network with the breast cancer dataset, we obtained the following results:

• Classification accuracy (ratio of instances correctly classified): 97.9%.

• Error rate (ratio of instances misclassified): 1.25%.

• Specificity (ratio of real negative which are predicted negative): 98.75%.

• Sensitivity (ratio of real positive which are predicted positive): 96.8%.

• Precision (ratio of real positive which are predicted positive): 98.4%.

From the above values, we can see that the training and classification of the model were done accurately, with ease and simplicity. Approximately 4,000 images were used for training, with 20% of that number used for validation. The feature extraction and classification were therefore done with good precision.

5.2 Implications to Existing Knowledge

The realization of this project shows that there exist various ways in which breast cancer and other diseases can be easily detected using AI. Cancer detection in medical imaging is a field where deep learning technology can achieve very good results. The reviewed papers are summarized in Table 4.1. So far, the results are very satisfactory; moreover, deep learning technology is developing very fast, the supply of researchable medical image data is growing, and research funding is increasing, so the future is bright. In the future, it will be easier and more accurate to diagnose not only medical images but also EHR and genetic information with the help of deep learning technology. The development of deep learning technologies is important for this, but the role of physicians who understand and use these technologies becomes increasingly important. [37]

5.3 Recommendations

This project can be widely used in the field of medicine, where diseases like breast cancer, heart disease and others can be easily diagnosed for the good of everyone. This is a great contribution to engineering and technology, as AI has been successfully applied for the good of society. This study will be valuable to the medical field, as it will allow fast diagnosis of breast cancer even in areas without specialists. Moreover, it could be of high interest to patients who wish to confirm their diagnosis.

5.4 Future Scope

This study had some limitations. The histopathology images were downsized to fit the available GPU. As more GPU memory becomes available, future studies will be able to train models using larger image sizes, or retain the original image resolution without the need for downsizing. Retaining the full resolution of the images will provide finer detail and likely improve the KPIs.

For future work, we intend to use and evaluate other pretrained CNN models for the feature extraction stage, and to extend the application to other types of cancer, such as colorectal, lung or prostate cancer.

Most papers published in the field of breast cancer detection and subtype classification use machine learning techniques. However, deep learning models have not been heavily investigated in this domain. A direction for the future would be for researchers to use other deep learning mechanisms, such as LSTM, GAN and RNN, to predict patient status, as such studies have not yet been conducted in the field.

We also intend to develop a mobile app for this solution in order to maximize the utility

of the project.

REFERENCES

[1] F. Noreen, L. Liu, H. Sha, and H. Ahmed, “Prediction of breast cancer, comparative

review of machine learning techniques, and their analysis,” IEEE Access, vol. PP, pp.

1–1, 08 2020.

[2] American Cancer Society, “Breast cancer early detection and diagnosis.” https://www.cancer.org/cancer/breast-cancer/screening-tests-and-early-detection/breast-mri-scans.html, 2022. Accessed: 2022-07-27.

[3] A. Victor, “10 uses of artificial intelligence in day to day life.” https://insights.daffodilsw.com/blog/10-uses-of-artificial-intelligence-in-day-to-day-life, 2021. Accessed: 2022-07-26.

[4] A. Dasgupta and A. Nath, “Classification of machine learning algorithms,” International Journal of Innovative Research in Advanced Engineering (IJIRAE), ISSN: 2349-2763, vol. 3, pp. 6–11, 03 2016.

[5] S. Uddin, A. Khan, M. Hossain, and M. A. Moni, “Comparing different supervised

machine learning algorithms for disease prediction,” BMC Medical Informatics and

Decision Making, vol. 19, 12 2019.

[6] A. Biswal, “Top 10 deep learning algorithms you should know in 2023.” https:

//www.Top10DeepLearningAlgorithmsYouShouldKnowin2022, 2022. Accessed: 2022-

07-23.

[7] D. Dahiwade, G. Patle, and E. Meshram, “Designing disease prediction model using

machine learning approach,” pp. 1211–1215, 03 2019.

[8] H. Chen, “An efficient diagnosis system for detection of Parkinson’s disease using fuzzy k-nearest neighbor approach,” Expert Systems with Applications, 01 2013.



[9] P. Sengar, M. Gaikwad, and D.-A. Nagdive, “Comparative study of machine learning

algorithms for breast cancer prediction,” pp. 796–801, 08 2020.

[10] Y. Dengju, J. Yang, and X. Zhan, “A novel method for disease prediction: Hybrid of

random forest and multivariate adaptive regression splines,” Journal of Computers, vol.

8, 01 2013.

[11] D. Delen, G. Walker, and A. Kadam, “Predicting breast cancer survivability: A comparison of three data mining methods,” Artificial Intelligence in Medicine, vol. 34, pp. 113–27, 07 2005.

[12] M. Lundin, J. Lundin, H. Burke, S. Toikkanen, L. Pylkkänen, and H. Joensuu, “Artificial neural networks applied to survival prediction in breast cancer,” Oncology, vol. 57, pp. 281–6, 12 1999.

[13] L. Ghasem Ahmad, A. Eshlaghy, A. Pourebrahimi, M. Ebrahimi, and A. Razavi, “Using three machine learning techniques for predicting breast cancer recurrence,” Journal of Health and Medical Informatics, vol. 4, pp. 124–130, 01 2013.

[14] T. Ayer, J. Chhatwal, O. Alagoz, C. E. Kahn, R. W. Woods, and E. S. Burnside, “Informatics in radiology: comparison of logistic regression and artificial neural network models in breast cancer risk estimation,” Radiographics: a review publication of the Radiological Society of North America, Inc, vol. 30 1, pp. 13–22, 2010.

[15] M. Tan, B. Zheng, J. Leader, and D. Gur, “Association between changes in mammographic image features and risk for near-term breast cancer development,” IEEE Transactions on Medical Imaging, vol. 35, pp. 1–1, 02 2016.

[16] H. Dhahri, “Automated breast cancer diagnosis based on machine learning algorithms,” Journal of Healthcare Engineering, 2019.



[17] K. Kourou, T. Exarchos, K. Exarchos, M. Karamouzis, and D. Fotiadis, “Machine learning applications in cancer prognosis and prediction,” Computational and Structural Biotechnology Journal, vol. 13, 11 2014.

[18] S. Jain and P. Kumar, “Prediction of breast cancer using machine learning,” Recent

Patents on Computer Science, vol. 12, 06 2019.

[19] A. Bharat, N. Pooja, and R. Reddy, “Using machine learning algorithms for breast

cancer risk prediction and diagnosis,” pp. 1–4, 10 2018.

[20] M. Hadidi, A. Alarabeyyat, and M. Alhanahnah, “Breast cancer detection using k-nearest neighbor machine learning algorithm,” pp. 35–39, 08 2016.

[21] N. Khuriwal and N. Mishra, “Breast cancer diagnosis using deep learning algorithm,” 10

2018.

[22] B. Gayathri and C. Sumathi, “Comparative study of relevance vector machine with

various machine learning techniques used for detecting breast cancer,” pp. 1–5, 12

2016.

[23] R. Shubair, “Comparative study of machine learning algorithms for breast cancer

detection and diagnosis,” 12 2016.

[24] Z. Wang, M. Li, H. Wang, H. Jiang, Y. Yao, H. Zhang, and J. Xin, “Breast cancer detection using extreme learning machine based on feature fusion with CNN deep features,” IEEE Access, vol. PP, pp. 1–1, 01 2019.

[25] Y. Xiao, J. Wu, Z. Lin, and X. Zhao, “Breast cancer diagnosis using an unsupervised

feature extraction algorithm based on deep learning,” pp. 9428–9433, 07 2018.

[26] J. Bhat, V. George, and B. Malik, “Cloud computing with machine learning could help us in the early diagnosis of breast cancer,” 05 2015.

[27] M. Perumal, “A research on computer aided detection system for women breast cancer diagnosis from digital mammographic images,” 04 2021.

[28] W. Wolberg and O. Mangasarian, “Multisurface method of pattern separation for

medical diagnosis applied to breast cytology,” Proceedings of the National Academy of

Sciences of the United States of America, vol. 87, pp. 9193–6, 01 1991.

[29] K. O’Shea and R. Nash, “An introduction to convolutional neural networks,” ArXiv e-prints, 11 2015.

[30] J. Zhang and C. Zong, “Deep neural networks in machine translation: An overview,”

IEEE Intelligent Systems, vol. 30, pp. 16–25, 09 2015.

[31] X. Kang, B. Song, and F. Sun, “A deep similarity metric method based on incomplete data

for traffic anomaly detection in iot,” Applied Sciences, vol. 9, p. 135, 01 2019.

[32] J. T. Point, “Software development life cycle (sdlc).”

https://www.javatpoint.com/software-engineering-software-development-life-cycle, 2021.

Accessed: 2022-07-26.

[33] O. Sheta and A. Nour Eldeen, “Building a health care data warehouse for cancer

diseases,” International Journal of Database Management Systems ( IJDMS ), vol. 4, 10

2012.

[34] R. Rawat, D. Ruderman, P. Macklin, D. Rimm, and D. Agus, “Correlating nuclear morphometric patterns with estrogen receptor status in breast cancer pathologic specimens,” npj Breast Cancer, vol. 4, 12 2018.

[35] A. Cruz-Roa, H. Gilmore, et al., “High-throughput adaptive sampling for whole-slide histopathology image analysis (HASHI) via convolutional neural networks: Application to invasive breast cancer detection,” PubMed, 12 2018.

[36] M. Toğaçar, B. Ergen, and Z. Cömert, “Application of breast cancer diagnosis based on a combination of convolutional neural networks, ridge regression and linear discriminant analysis using invasive breast cancer images processed with autoencoders,” Medical Hypotheses, vol. 135, p. 109503, 2020.

[37] S. Sharma and D. R. Mehra, “Breast cancer histology images classification: Training

from scratch or transfer learning?,” ICT Express, vol. 4, 11 2018.

[38] G. Murtaza, L. Shuib, G. Mujtaba, and G. Raza, “Breast cancer multi-classification

through deep neural network and hierarchical classification approach,” Multimedia

Tools and Applications, vol. 79, 06 2020.

[39] A. S. Bardou D, Zhang K, “Classification of breast cancer based on histology images

using convolutional neural networks,” IEEE Access, 2018.

[40] M. K. Elbashir, M. Ezz, M. Mohammed, and S. S. Saloum, “Lightweight convolu-

tional neural network for breast cancer classification using rna-seq gene expression

data,” IEEE Access, vol. 7, pp. 185338–185348, 2019.

[41] M. Saha, I. Arun, R. Ahmed, S. Chatterjee, and C. Chakraborty, “Hscorenet: A deep

network for estrogen and progesterone scoring using breast ihc images,” Pat- tern

Recognition, vol. 102, p. 107200, 06 2020.


61

Appendices

A Python Code for the Streamlit App

import streamlit
import os
import pathlib
import base64
from io import BytesIO

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential
from PIL import Image

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import plot_roc_curve, plot_confusion_matrix

# Tabular (Wisconsin) dataset: gradient boosting baseline
cancer = load_breast_cancer()
X = pd.DataFrame(cancer.data, columns=cancer.feature_names)
y = pd.Series(cancer.target)
X_train, X_test, y_train, y_test = train_test_split(X, y)
X_train.head()

def plot_to_str():
    """Serialize the current matplotlib figure to a base64 PNG string."""
    img = BytesIO()
    plt.savefig(img, format='png')
    return base64.encodebytes(img.getvalue()).decode('utf-8')

clf = GradientBoostingClassifier().fit(X_train, y_train)

# Plot ROC curve
plot_roc_curve(clf, X_test, y_test)
roc_curve = plot_to_str()

# Plot confusion matrix
plot_confusion_matrix(clf, X_test, y_test)
confusion_matrix = plot_to_str()

# Image (FNA) dataset: build train/validation splits from the directory tree
DATADIR = "/content/FNA"
CATEGORIES = ['benign', 'malignant']
for category in CATEGORIES:
    path = os.path.join(DATADIR, category)

data_dir = pathlib.Path(DATADIR)
image_count = len(list(data_dir.glob('*/*.png')))
benign = list(data_dir.glob('benign/*'))
malignant = list(data_dir.glob('malignant/*'))

batch_size = 32
img_height = 180
img_width = 180

train_ds = tf.keras.utils.image_dataset_from_directory(
    data_dir,
    validation_split=0.2,
    subset="training",
    seed=123,
    image_size=(img_height, img_width),
    batch_size=batch_size)

val_ds = tf.keras.utils.image_dataset_from_directory(
    data_dir,
    validation_split=0.2,
    subset="validation",
    seed=123,
    image_size=(img_height, img_width),
    batch_size=batch_size)

class_names = train_ds.class_names
print(class_names)

AUTOTUNE = tf.data.AUTOTUNE
train_ds = train_ds.cache().shuffle(1000).prefetch(buffer_size=AUTOTUNE)
val_ds = val_ds.cache().prefetch(buffer_size=AUTOTUNE)

normalization_layer = layers.Rescaling(1. / 255)
normalized_ds = train_ds.map(lambda x, y: (normalization_layer(x), y))
image_batch, labels_batch = next(iter(normalized_ds))
first_image = image_batch[0]
# Notice the pixel values are now in `[0, 1]`.

num_classes = len(class_names)

# Baseline CNN
model = Sequential([
    layers.Rescaling(1. / 255, input_shape=(img_height, img_width, 3)),
    layers.Conv2D(16, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(num_classes)
])

model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
model.summary()

epochs = 10
history = model.fit(train_ds, validation_data=val_ds, epochs=epochs)

def plot_history(history, epochs):
    """Plot training/validation accuracy and loss side by side."""
    acc = history.history['accuracy']
    val_acc = history.history['val_accuracy']
    loss = history.history['loss']
    val_loss = history.history['val_loss']
    epochs_range = range(epochs)

    plt.figure(figsize=(8, 8))
    plt.subplot(1, 2, 1)
    plt.plot(epochs_range, acc, label='Training Accuracy')
    plt.plot(epochs_range, val_acc, label='Validation Accuracy')
    plt.legend(loc='lower right')
    plt.title('Training and Validation Accuracy')

    plt.subplot(1, 2, 2)
    plt.plot(epochs_range, loss, label='Training Loss')
    plt.plot(epochs_range, val_loss, label='Validation Loss')
    plt.legend(loc='upper right')
    plt.title('Training and Validation Loss')
    plt.show()

plot_history(history, epochs)

# Augmented CNN with dropout
data_augmentation = keras.Sequential([
    layers.RandomFlip("horizontal", input_shape=(img_height, img_width, 3)),
    layers.RandomRotation(0.1),
    layers.RandomZoom(0.1),
])

model = Sequential([
    data_augmentation,
    layers.Rescaling(1. / 255),
    layers.Conv2D(16, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Dropout(0.2),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(num_classes)
])

model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
model.summary()

epochs = 15
history = model.fit(train_ds, validation_data=val_ds, epochs=epochs)
plot_history(history, epochs)

# Streamlit front end: classify an uploaded image
file = streamlit.file_uploader("Upload an image", type=["png", "jpg", "jpeg"])
if file is not None:
    image = Image.open(file)
    streamlit.image(image)
    img_array = np.array(image)
    img = tf.image.resize(img_array, size=(180, 180))
    img = tf.expand_dims(img, axis=0)
    predictions = model.predict(img)
    score = tf.nn.softmax(predictions[0])
    if class_names[np.argmax(score)] == "benign":
        streamlit.title("This image is most likely {} with a {:.2f} percent confidence."
                        .format(class_names[np.argmax(score)], 100 * np.max(score)))
        streamlit.markdown('<style>h1 {color: green;}</style>', unsafe_allow_html=True)
    if class_names[np.argmax(score)] == "malignant":
        streamlit.title("This image is most likely {} with a {:.2f} percent confidence."
                        .format(class_names[np.argmax(score)], 100 * np.max(score)))
        streamlit.markdown('<style>h1 {color: red;}</style>', unsafe_allow_html=True)
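Note that the plot_roc_curve and plot_confusion_matrix helpers used above were deprecated in scikit-learn 1.0 and removed in 1.2. On newer scikit-learn versions the same evaluation can be reproduced with the metric functions directly; the following is a minimal sketch on the same Wisconsin dataset, not part of the deployed app:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, random_state=0)

clf = GradientBoostingClassifier().fit(X_train, y_train)

# ROC AUC from the predicted probability of the positive class
# (label 1, which is "benign" in this dataset)
auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])

# Confusion matrix from hard predictions
cm = confusion_matrix(y_test, clf.predict(X_test))

print("AUC:", round(auc, 3))
print(cm)
```

The numbers themselves replace the two base64-encoded plots; if the plots are still needed, scikit-learn 1.2+ exposes RocCurveDisplay.from_estimator and ConfusionMatrixDisplay.from_estimator as drop-in replacements.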

B Requirements.txt file used for Heroku

streamlit==0.79.0
pandas==1.2.3
numpy==1.18.5
matplotlib==3.3.2
seaborn==0.11.0
tensorflow-cpu==1.11.1

C Setup.sh file used for Heroku

mkdir -p ~/.streamlit/

echo "\
[general]\n\
email = \"your-email@domain.com\"\n\
" > ~/.streamlit/credentials.toml

echo "\
[server]\n\
headless = true\n\
enableCORS = false\n\
port = $PORT\n\
" > ~/.streamlit/config.toml
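The script relies on the POSIX sh echo interpreting the \n escapes (the Procfile invokes it with sh, whose echo does this on Heroku's stack), and on the shell expanding $PORT, which Heroku sets when the dyno starts. Assuming an example port of 5000, the generated ~/.streamlit/config.toml would look roughly like:

```toml
[server]
headless = true
enableCORS = false
port = 5000
```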

D Procfile file used for Heroku

web: sh setup.sh && streamlit run app.py

E Git commands used for Heroku

git init
heroku login
heroku create aicancertracer
git add .
git commit -m "some message"
git push heroku master
heroku ps:scale web=1
