Cancer Detection Using Data Mining
Cancer is the name given to a collection of related diseases. The human body
comprises millions of cells, each with its own function. When any of these cells
grows in an unregulated way, the condition is termed cancer.
Cancer is classified by the type of cell that is affected, and more than 200 types of
cancer are known. This paper focuses on breast cancer.
Attribute Information
1.ID Number
2.Diagnosis (M = Cancerous, B = Non-Cancerous)
Ten real-valued features are computed for each cell nucleus:
1.radius (mean of distances from center to points on the perimeter)
2.texture (standard deviation of gray-scale values)
3.perimeter
4.area
5.smoothness (local variation in radius lengths)
6.compactness (perimeter² / area - 1.0)
7.concavity (severity of concave portions of the contour)
8.concave points (number of concave portions of the contour)
9.symmetry
10.fractal dimension (“coastline approximation” - 1)
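To make the compactness definition in the list above concrete, here is a minimal Python sketch that evaluates perimeter² / area - 1.0 for a perfectly circular outline. The function name `compactness` and the radius of 5 units are assumptions chosen for illustration; they are not part of the data set.

```python
import math


def compactness(perimeter: float, area: float) -> float:
    """Compactness as given in the attribute list: perimeter^2 / area - 1.0."""
    if area <= 0:
        raise ValueError("area must be positive")
    return perimeter ** 2 / area - 1.0


# Hypothetical, perfectly circular nucleus of radius 5 (arbitrary units).
radius = 5.0
circle_perimeter = 2 * math.pi * radius
circle_area = math.pi * radius ** 2

circle_compactness = compactness(circle_perimeter, circle_area)
print(round(circle_compactness, 3))  # 11.566, i.e. 4*pi - 1
```

For a circle the ratio perimeter² / area equals 4π whatever the radius, so the result is always 4π - 1. A circle has the smallest perimeter for a given area, so any less regular outline gives a larger value.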
The mean, standard error and “worst” or largest (mean of the three
largest values) of these features were computed for each image, resulting in 30
features. For instance, field 3 is Mean Radius, field 13 is Radius SE, field 23 is
Worst Radius.
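The field numbering described above can be sketched in a few lines of Python: two leading fields (ID and diagnosis) followed by the ten features repeated for each of the three statistics. The column names, such as `radius_mean`, are a naming scheme assumed here for illustration; the actual file may label its columns differently.

```python
# The ten base features, in the order listed in the attribute information.
BASE_FEATURES = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave points", "symmetry",
    "fractal dimension",
]

# The three statistics computed per feature, in field order.
STATISTICS = ["mean", "se", "worst"]


def build_field_names():
    """Return all field names: ID, diagnosis, then 3 x 10 feature columns."""
    fields = ["id", "diagnosis"]
    for stat in STATISTICS:
        for feature in BASE_FEATURES:
            fields.append(f"{feature}_{stat}")
    return fields


FIELDS = build_field_names()


def field_name(number: int) -> str:
    """Look up a field by its 1-based number, as used in the text."""
    return FIELDS[number - 1]


print(len(FIELDS))     # 32
print(field_name(3))   # radius_mean
print(field_name(13))  # radius_se
print(field_name(23))  # radius_worst
```

The total of 32 fields is the ID, the diagnosis, and the 30 computed features.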
We can observe that the data set contains 569 rows and 32 columns.
‘Diagnosis’ is the column we are going to predict; it says whether
the cancer is M (malignant) or B (benign).
Once encoded, 1 means the cancer is malignant and 0 means benign.
Out of the 569 cases, 357 are labeled B (benign) and
212 are labeled M (malignant).
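As a rough illustration of how such a class count is obtained, the sketch below tallies a list of diagnosis labels and reports each class's share. The `labels` list is rebuilt here from the counts quoted above rather than read from the data file, so it stands in for the real ‘Diagnosis’ column.

```python
from collections import Counter

# Stand-in for the Diagnosis column, rebuilt from the counts quoted in the text.
labels = ["B"] * 357 + ["M"] * 212

counts = Counter(labels)
total = sum(counts.values())

# Percentage of each class, rounded to one decimal place.
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}

print(total)   # 569
print(counts)  # Counter({'B': 357, 'M': 212})
print(shares)  # {'B': 62.7, 'M': 37.3}
```

Roughly 63% of the cases are benign, so the classes are moderately imbalanced; this is worth keeping in mind when judging a model by accuracy alone.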
Data visualization is an essential part of data science. It
helps us understand the data and explain it to other
people.
Categorical data are variables that contain label values rather than
numeric values.
The number of possible values is often limited to a fixed set.
For example, users are typically described by country, gender, age group,
etc.
In this step we assign a fixed numeric value to each label value:
M (cancerous) is changed to 1.
B (non-cancerous) is changed to 0.
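The mapping above can be sketched in plain Python as follows. The names `DIAGNOSIS_CODES` and `encode_diagnosis`, and the four-item sample list, are assumptions made for this example; the original analysis may have used a library encoder instead.

```python
# Fixed numeric code for each diagnosis label, as described in the text.
DIAGNOSIS_CODES = {"M": 1, "B": 0}


def encode_diagnosis(labels):
    """Replace each diagnosis label with its numeric code.

    Raises ValueError on any label other than M or B, so that unexpected
    values are not silently turned into a wrong class.
    """
    encoded = []
    for label in labels:
        key = label.strip().upper()
        if key not in DIAGNOSIS_CODES:
            raise ValueError(f"unexpected diagnosis label: {label!r}")
        encoded.append(DIAGNOSIS_CODES[key])
    return encoded


# Hypothetical sample of labels, for illustration only.
sample = ["M", "B", "B", "M"]
encoded_sample = encode_diagnosis(sample)
print(encoded_sample)  # [1, 0, 0, 1]
```

If the data is held in a pandas DataFrame, the same mapping is commonly written as `df["diagnosis"].map({"M": 1, "B": 0})`, assuming the column is named `diagnosis`.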