Breast Cancer Wisconsin (Diagnostic)
Donated on 10/31/1995
Diagnostic Wisconsin Breast Cancer Database.
Dataset Characteristics
Multivariate
Subject Area
Health and Medicine
Associated Tasks
Classification
Feature Type
Real
# Instances
569
# Features
30
Dataset Information
Additional Information
Features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. They describe characteristics of the cell nuclei present in the image. A few of the images can be found at http://www.cs.wisc.edu/~street/images/
The separating plane was obtained using Multisurface Method-Tree (MSM-T) [K. P. Bennett, "Decision Tree Construction Via Linear Programming." Proceedings of the 4th Midwest Artificial Intelligence and Cognitive Science Society, pp. 97-101, 1992], a classification method which uses linear programming to construct a decision tree. Relevant features were selected using an exhaustive search in the space of 1-4 features and 1-3 separating planes.
The actual linear program used to obtain the separating plane in the 3-dimensional space is described in [K. P. Bennett and O. L. Mangasarian: "Robust Linear Programming Discrimination of Two Linearly Inseparable Sets", Optimization Methods and Software 1, 1992, 23-34].
This database is also available through the UW CS ftp server:
ftp ftp.cs.wisc.edu
cd math-prog/cpo-dataset/machine-learn/WDBC/
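For orientation, the robust linear program from the Bennett and Mangasarian paper cited above is commonly stated as shown here. The notation is an assumption made for this sketch and does not appear on this page: the rows of A (m points) and B (k points) hold the two classes, e is a vector of ones of matching length, and the separating plane is w^T x = gamma. Consult the paper for the authoritative form.

```latex
\begin{aligned}
\min_{w,\,\gamma,\,y,\,z} \quad & \frac{1}{m}\, e^{T} y + \frac{1}{k}\, e^{T} z \\
\text{subject to} \quad & A w - e\gamma + y \ge e, \\
& -B w + e\gamma + z \ge e, \\
& y \ge 0, \quad z \ge 0
\end{aligned}
```

The slack vectors y and z measure how far points of each class fall on the wrong side of the margin, so the objective is the sum of the two per-class average violations.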
Has Missing Values?
No
Introductory Paper
By W. Street, W. Wolberg, O. Mangasarian. 1993
Published in Electronic imaging
Variables Table
Variable Name | Role | Type | Description | Units | Missing Values |
---|---|---|---|---|---|
ID | ID | Categorical | | | no |
Diagnosis | Target | Categorical | | | no |
radius1 | Feature | Continuous | | | no |
texture1 | Feature | Continuous | | | no |
perimeter1 | Feature | Continuous | | | no |
area1 | Feature | Continuous | | | no |
smoothness1 | Feature | Continuous | | | no |
compactness1 | Feature | Continuous | | | no |
concavity1 | Feature | Continuous | | | no |
concave_points1 | Feature | Continuous | | | no |
Showing the first 10 of 32 variables.
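The table lists only 10 of the 32 variables. A minimal sketch of how the full list of column names might be generated follows. It assumes that the suffixes 1, 2 and 3 stand for the mean, standard error and "worst" value of each measurement (as in the original wdbc.names description), and that the two measurements not shown in the table are named `symmetry` and `fractal_dimension`. The helper `wdbc_column_names` is hypothetical and is not part of any library.

```python
# Ten per-nucleus measurements. The last two names are assumed, since the
# table above stops at concave_points1.
MEASUREMENTS = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave_points",
    "symmetry", "fractal_dimension",
]

# Assumed meaning of the numeric suffix on each feature name.
STAT_SUFFIXES = {1: "mean", 2: "standard error", 3: "worst"}


def wdbc_column_names():
    """Return the 32 assumed column names: ID, Diagnosis, then 30 features."""
    names = ["ID", "Diagnosis"]
    for suffix in STAT_SUFFIXES:
        # All ten measurements for one statistic, then the next statistic.
        names.extend(f"{measurement}{suffix}" for measurement in MEASUREMENTS)
    return names
```

Comparing this list against the `variables` metadata returned by `ucimlrepo` is a quick way to check the naming assumption before relying on it.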
Additional Variable Information
1) ID number
2) Diagnosis (M = malignant, B = benign)
3-32) Ten real-valued features are computed for each cell nucleus:
a) radius (mean of distances from center to points on the perimeter)
b) texture (standard deviation of gray-scale values)
c) perimeter
d) area
e) smoothness (local variation in radius lengths)
f) compactness (perimeter^2 / area - 1.0)
g) concavity (severity of concave portions of the contour)
h) concave points (number of concave portions of the contour)
i) symmetry
j) fractal dimension ("coastline approximation" - 1)
For each image, the mean, standard error, and "worst" value (mean of the three largest values) of each of these ten features are computed, which gives 30 features.
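As an illustration of the geometric definitions above, the following sketch computes radius, perimeter, area and compactness for a nucleus outline given as a closed polygon. The polygon representation, the use of the mean boundary point as the center, and the function name `nucleus_shape_features` are assumptions made for this example. The original feature extraction works on image contours and may differ in detail.

```python
import math


def nucleus_shape_features(boundary):
    """Compute simple shape features for a closed polygon.

    `boundary` is a list of (x, y) vertices listed in order around the contour.
    """
    n = len(boundary)
    if n < 3:
        raise ValueError("need at least three boundary points")

    # Assumption: the center is the mean of the boundary points.
    cx = sum(x for x, _ in boundary) / n
    cy = sum(y for _, y in boundary) / n

    # radius: mean of distances from center to points on the perimeter
    radius = sum(math.hypot(x - cx, y - cy) for x, y in boundary) / n

    perimeter = 0.0
    twice_signed_area = 0.0
    for i in range(n):
        x0, y0 = boundary[i]
        x1, y1 = boundary[(i + 1) % n]  # wrap around to close the polygon
        perimeter += math.hypot(x1 - x0, y1 - y0)
        twice_signed_area += x0 * y1 - x1 * y0  # shoelace formula term
    area = abs(twice_signed_area) / 2.0
    if area == 0.0:
        raise ValueError("degenerate contour with zero area")

    # compactness: perimeter^2 / area - 1.0, as defined in the list above
    compactness = perimeter ** 2 / area - 1.0

    return {
        "radius": radius,
        "perimeter": perimeter,
        "area": area,
        "compactness": compactness,
    }
```

For a unit square the perimeter is 4 and the area is 1, so the compactness is 4^2 / 1 - 1.0 = 15.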
Dataset Files
File | Size |
---|---|
wdbc.data | 121.2 KB |
wdbc.names | 4.6 KB |
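If the raw file is used instead of the Python package, a minimal parsing sketch could look like the following. It assumes that `wdbc.data` is a headerless, comma-separated file with 32 fields per line (ID, diagnosis, then 30 real-valued features). The function names and the default path are hypothetical.

```python
def parse_wdbc_line(line):
    """Split one line of wdbc.data into (id, diagnosis, features)."""
    fields = line.strip().split(",")
    if len(fields) != 32:
        raise ValueError(f"expected 32 fields, got {len(fields)}")

    record_id, diagnosis = fields[0], fields[1]
    if diagnosis not in ("M", "B"):
        raise ValueError(f"unexpected diagnosis label: {diagnosis!r}")

    # The remaining 30 fields are the real-valued features.
    features = [float(value) for value in fields[2:]]
    return record_id, diagnosis, features


def load_wdbc(path="wdbc.data"):
    """Read every non-empty line of the file at `path` (assumed local copy)."""
    with open(path, encoding="utf-8") as handle:
        return [parse_wdbc_line(line) for line in handle if line.strip()]
```

Checking the field count and the label on every line makes a truncated or mislabeled download fail early rather than produce a silently shorter feature matrix.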
pip install ucimlrepo

from ucimlrepo import fetch_ucirepo

# fetch dataset
breast_cancer_wisconsin_diagnostic = fetch_ucirepo(id=17)

# data (as pandas dataframes)
X = breast_cancer_wisconsin_diagnostic.data.features
y = breast_cancer_wisconsin_diagnostic.data.targets

# metadata
print(breast_cancer_wisconsin_diagnostic.metadata)

# variable information
print(breast_cancer_wisconsin_diagnostic.variables)
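Many classifiers expect numeric targets, so the M/B diagnosis labels usually need encoding first. The sketch below shows one way to do that with plain Python lists. The mapping of malignant to 1 and benign to 0 is a convention chosen for this example, and `encode_diagnosis` is a hypothetical helper.

```python
def encode_diagnosis(labels):
    """Map diagnosis labels to integers: M (malignant) -> 1, B (benign) -> 0."""
    mapping = {"M": 1, "B": 0}
    encoded = []
    for label in labels:
        if label not in mapping:
            raise ValueError(f"unexpected diagnosis label: {label!r}")
        encoded.append(mapping[label])
    return encoded
```

With the dataframes loaded above, this could be called as `encode_diagnosis(y["Diagnosis"].tolist())`, assuming the target column is named `Diagnosis` as in the variables table.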
Wolberg, W., Mangasarian, O., Street, N., & Street, W. (1993). Breast Cancer Wisconsin (Diagnostic) [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5DW2B.
Creators
William Wolberg
Olvi Mangasarian
Nick Street
W. Street
DOI
10.24432/C5DW2B
License
This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
This allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given.