Cancer Detection Using Data Mining

A PPT on Python software development using data mining to build an application that detects cancer from patients' past data.

Uploaded by

rishabh kumar

Cancer is a disease in which abnormal cells divide uncontrollably and destroy body
tissues. The human body comprises millions of cells, each with its own function.
Unregulated growth of any of these cells is termed cancer.
Cancer is the name given to a collection of related diseases. It is classified by the
type of cell affected, and more than 200 types of cancer are known. This paper
focuses on breast cancer.

Some risk factors associated with cancer:

1. Gender
2. Age
3. Genetic factors
4. Family history
5. Being overweight
6. Alcohol consumption
7. Smoking
The dataset used here is publicly available and was created
by Dr. William H. Wolberg, physician at the University of
Wisconsin Hospital at Madison, Wisconsin, USA.
Reference:
http://archive.ics.uci.edu/ml/datasets/breast+cancer+wisconsin+%28diagnostic%29

Attribute Information
1. ID number
2. Diagnosis (M = cancerous, B = non-cancerous)
Ten real-valued features are computed for each cell nucleus:
1. radius (mean of distances from center to points on the perimeter)
2. texture (standard deviation of gray-scale values)
3. perimeter
4. area
5. smoothness (local variation in radius lengths)
6. compactness (perimeter² / area - 1.0)
7. concavity (severity of concave portions of the contour)
8. concave points (number of concave portions of the contour)
9. symmetry
10. fractal dimension (“coastline approximation” - 1)
The mean, standard error and “worst” or largest (mean of the three
largest values) of these features were computed for each image, resulting in 30
features. For instance, field 3 is Mean Radius, field 13 is Radius SE, and field 23 is
Worst Radius.
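
The field numbering described above can be checked with a short sketch. The field names used here (for example `radius_mean`) are illustrative assumptions, not names taken from the dataset file. The sketch only reproduces the layout stated in the text: ID, diagnosis, then ten means, ten standard errors and ten "worst" values.

```python
# Base measurements computed for each cell nucleus, in the order listed above.
BASE_FEATURES = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave_points", "symmetry",
    "fractal_dimension",
]
# The three statistics computed per image for every base measurement.
STATISTICS = ["mean", "se", "worst"]


def build_field_names():
    """Return the 32 field names: id, diagnosis, then 3 x 10 features."""
    names = ["id", "diagnosis"]  # hypothetical names for fields 1 and 2
    for stat in STATISTICS:
        for feature in BASE_FEATURES:
            names.append(f"{feature}_{stat}")
    return names


FIELD_NAMES = build_field_names()

# Fields are numbered from 1 in the text, so field k is FIELD_NAMES[k - 1].
for k in (3, 13, 23):
    print(k, FIELD_NAMES[k - 1])
# 3 radius_mean
# 13 radius_se
# 23 radius_worst
```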
We can observe that the data set contains 569 rows and 32 columns.
‘Diagnosis’ is the column we are going to predict; it says whether
the cancer is M = malignant or B = benign.
In the encoded data, 1 means the cancer is malignant and 0 means benign.
Of the 569 persons, 357 are labeled B (benign) and
212 M (malignant).
Visualization of data is an important aspect of data science. It
helps in understanding the data and in explaining it to another
person.
Categorical data are variables that contain label values rather than
numeric values.
The number of possible values is often limited to a fixed set.
For example, users are typically described by country, gender, age group,
etc.
In this step we assign a fixed numeric value to each label value:
M = cancerous is changed to 1.
B = non-cancerous is changed to 0.
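
The encoding step can be sketched in a few lines of plain Python. The sample labels below are made-up values for illustration; only the M to 1 and B to 0 mapping comes from the text.

```python
# Mapping from the text: M (cancerous) -> 1, B (non-cancerous) -> 0.
DIAGNOSIS_CODES = {"M": 1, "B": 0}


def encode_diagnosis(labels):
    """Replace each 'M'/'B' label with its numeric code."""
    return [DIAGNOSIS_CODES[label] for label in labels]


sample_labels = ["M", "B", "B", "M", "B"]  # hypothetical values
encoded = encode_diagnosis(sample_labels)
print(encoded)  # [1, 0, 0, 1, 0]
```

scikit-learn's `LabelEncoder` would likely give the same codes here, since it numbers classes in sorted order (B before M).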

Splitting the data set –

The data we use is usually split into training data and test data. The
training set contains a known output, and the model learns on this data so
that it can generalize to other data later on. The test dataset (or
subset) is held out in order to test our model’s predictions on this subset.
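
A minimal sketch of such a split, using only the standard library. The 20% test fraction and the seed are assumptions for illustration; the slides do not state the ratio that was actually used.

```python
import random


def split_train_test(rows, test_fraction=0.2, seed=0):
    """Shuffle row indices and hold out a fraction of them for testing."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # seeded so the split is repeatable
    n_test = round(len(rows) * test_fraction)
    test = [rows[i] for i in indices[:n_test]]
    train = [rows[i] for i in indices[n_test:]]
    return train, test


rows = list(range(569))  # stand-in for the 569 records
train, test = split_train_test(rows)
print(len(train), len(test))  # 455 114
```

In scikit-learn the same step is usually done with `train_test_split`, which can also stratify by class so that both subsets keep a similar share of M and B.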
In this phase, we use a data transformation technique
to scale our data to a common range such as 0-100 or 0-1.
Most of the time, a dataset contains features that vary
widely in magnitude, units and range. Since many
machine learning algorithms use the Euclidean distance between two
data points in their computations,
we need to bring all features to the same level of
magnitude. This can be achieved by scaling.
Various transformation techniques are available in data
mining:
1. Min-Max Scaling
2. Z-Score Scaling
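
Both techniques can be written out directly. The sketch below scales a single feature column; the sample values are hypothetical. Min-max scaling maps the column to the range 0-1, and z-score scaling centers it on 0 with unit standard deviation.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale linearly so the smallest value maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score_scale(values):
    """Subtract the mean and divide by the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


areas = [200.0, 400.0, 600.0, 800.0, 1000.0]  # hypothetical values
print(min_max_scale(areas))  # [0.0, 0.25, 0.5, 0.75, 1.0]
print(z_score_scale(areas))
```

In practice the minimum and maximum (or mean and deviation) would normally be computed on the training set and reused for the test set. scikit-learn provides `MinMaxScaler` and `StandardScaler` for this.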
In this phase, we apply data mining algorithms to our
data set. Algorithms can be classified into two groups:
Supervised learning: Supervised learning is a type of
system in which both input and desired output data are
provided. Input and output data are labeled for
classification to provide a learning basis for future data
processing.
Supervised learning problems can be further grouped
into regression and classification problems.
•A regression problem is when the output variable is a
real or continuous value, such as “salary” or “weight”.
•A classification problem is when the output variable is a
category, such as labeling emails as “spam” or “not spam”.

Unsupervised learning: In unsupervised learning,
the algorithm uses information that is neither classified nor
labeled and acts on
that information without guidance.
In our dataset the outcome (dependent) variable Y
has only two possible values, either M (malignant) or B
(benign),
so we will use a classification algorithm from supervised learning.
A decision tree [7] is a classifier that is expressed as a recursive partition of the
instance space. It creates a predictive model, which maps observations about an
item to conclusions about the item’s target value. In the tree structure, leaves
represent the class labels and branches represent conjunctions of features leading
to the class labels. The figure shows an example of a binary decision tree.
PROCEDURE:
1. Acquire the dataset from the hospital breast cancer datasets.
2. Pre-process the data for applying the J48 decision tree data mining technique.
   a. Remove Sample Code Number from the attribute list.
   b. Convert the Class attribute from numeric to nominal type
      (2 - Benign, 4 - Malignant).
3. Load the pre-processed dataset into sklearn, the Python toolkit, for analysis.
4. Apply the information gain algorithm in sklearn to the respective attribute
   records.
5. Implement the J48 decision tree algorithm, generating a decision tree
   with leaf nodes as the class labels (benign and malignant).
6. Diagnose new patients by cross-referencing their attribute
   values in the decision tree and following the path until a leaf node is
   reached, which specifies either a benign or a malignant tumor.
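
Step 4 relies on information gain, the entropy-based measure a decision tree uses to pick the attribute and threshold to split on. The following is a hand-written sketch of that measure, not the code used in the slides and not sklearn's implementation. The toy values and the function names are assumptions for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def information_gain(values, labels, threshold):
    """Entropy reduction from splitting samples on `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:  # split separates nothing
        return 0.0
    weighted = (len(left) * entropy(left)
                + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted


# Hypothetical toy data: one feature value and one B/M label per sample.
radius = [11.0, 12.5, 13.0, 17.5, 19.0, 21.0]
diagnosis = ["B", "B", "B", "M", "M", "M"]

# A threshold that separates the classes perfectly removes all uncertainty.
print(information_gain(radius, diagnosis, threshold=15.0))  # 1.0
# A worse threshold leaves a mixed group and yields a smaller gain.
print(information_gain(radius, diagnosis, threshold=12.0))
```

A tree builder would evaluate many candidate thresholds per attribute and split on the one with the highest gain, then repeat on each branch. In sklearn, `DecisionTreeClassifier(criterion="entropy")` selects splits with this kind of entropy-based criterion.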
Using the decision tree method to classify our data set gives
an accuracy of approximately 96.46%, which is a good result for a small data
set. Higher accuracy may be achievable by adding more information about
the data set.
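
To put an accuracy figure in context, it can be compared with a trivial baseline that always predicts the majority class. The sketch below computes that baseline from the class counts given earlier (357 benign, 212 malignant). It uses the whole data set rather than the test subset, because the slides do not state the size of the test subset.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Class counts reported for the full data set (0 = benign, 1 = malignant).
n_benign, n_malignant = 357, 212
y_true = [0] * n_benign + [1] * n_malignant

# A trivial model that always predicts the majority class (benign).
y_majority = [0] * len(y_true)
baseline = accuracy(y_true, y_majority)
print(round(baseline, 4))  # 0.6274
```

Since false negatives matter in a diagnosis setting, accuracy is often reported together with per-class measures such as sensitivity.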

The automatic diagnosis of breast cancer is an important real-world medical
problem. Detection of breast cancer in its early stages is the key to treatment.
This paper shows how decision trees are used to model actual diagnosis of
breast cancer for local and systemic treatment, along with presenting other
techniques that can be applied.
Experimental results show the effectiveness of the proposed model.
The performance of the decision tree technique was investigated for the breast
cancer diagnosis problem.
