Machine Learning Algorithm

A. Naïve Bayes

Objective: This Python script demonstrates the implementation of a Naïve Bayes classifier to predict breast cancer.
Name of Dataset: Breast Cancer dataset from Scikit-Learn.
Overview of Dataset: This dataset contains features computed from digitized images of fine needle aspirates of breast masses. It aims to classify tumors as malignant or benign based on these features.

Step 1 : Importing Library and load the dataset

# Importing necessary libraries


import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer

# Load the breast cancer dataset


breast_cancer = load_breast_cancer()
X = breast_cancer.data
y = breast_cancer.target

Step 2 : Data Exploration

breast_cancer.target_names

df = pd.DataFrame(np.c_[breast_cancer.data, breast_cancer.target],
                  columns=list(breast_cancer.feature_names) + ['target'])
df.info()

import matplotlib.pyplot as plt

# Count the occurrences of each class in the target array


benign_count = np.sum(breast_cancer.target == 1)     # in this dataset, target 1 = benign
malignant_count = np.sum(breast_cancer.target == 0)  # and target 0 = malignant

# Create a bar plot


plt.figure(figsize=(8, 6))
bars = plt.bar(['Benign', 'Malignant'], [benign_count, malignant_count],
               color=['skyblue', 'salmon'])

# Annotate the bars with the count of each class
for bar in bars:
    yval = bar.get_height()
    plt.text(bar.get_x() + bar.get_width() / 2, yval, round(yval), va='bottom')

plt.xlabel('Class')
plt.ylabel('Count')
plt.title('Distribution of Benign and Malignant Cases')
plt.show()

Output
Based on the report generated below, it shows that:
Number of attributes / features: 30 (mean radius, mean texture, mean area, mean smoothness, etc.)
Number of patients: 569
Number of class labels: 2 ('malignant' and 'benign', corresponding to 212 malignant and 357 benign patients)
Missing values: none (no attribute has missing values)
Value format consistency: all attributes have the same format, float64 (numbers)

array(['malignant', 'benign'], dtype='<U9')

RangeIndex: 569 entries, 0 to 568
Data columns (total 31 columns):
 #   Column                   Non-Null Count  Dtype
---  ------                   --------------  -----
 0   mean radius              569 non-null    float64
 1   mean texture             569 non-null    float64
 2   mean perimeter           569 non-null    float64
 3   mean area                569 non-null    float64
 4   mean smoothness          569 non-null    float64
 5   mean compactness         569 non-null    float64
 6   mean concavity           569 non-null    float64
 7   mean concave points      569 non-null    float64
 8   mean symmetry            569 non-null    float64
 9   mean fractal dimension   569 non-null    float64
 10  radius error             569 non-null    float64
 11  texture error            569 non-null    float64
 12  perimeter error          569 non-null    float64
 13  area error               569 non-null    float64
 14  smoothness error         569 non-null    float64
 15  compactness error        569 non-null    float64
 16  concavity error          569 non-null    float64
 17  concave points error     569 non-null    float64
 18  symmetry error           569 non-null    float64
 19  fractal dimension error  569 non-null    float64
 20  worst radius             569 non-null    float64
 21  worst texture            569 non-null    float64
 22  worst perimeter          569 non-null    float64
 23  worst area               569 non-null    float64
 24  worst smoothness         569 non-null    float64
 25  worst compactness        569 non-null    float64
 26  worst concavity          569 non-null    float64
 27  worst concave points     569 non-null    float64
 28  worst symmetry           569 non-null    float64
 29  worst fractal dimension  569 non-null    float64
 30  target                   569 non-null    float64
Step 3 : Data Pre-processing
As the report above shows, the dataset consists of numerical features and there are no missing
values, so there is no need to perform extensive preprocessing for this dataset. Pre-processing is
minimal for the breast cancer dataset because it comes clean and ready-made in the sklearn library.
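As a quick sanity check (illustrative, not part of the original script), the absence of missing values and the uniform numeric format can be confirmed directly from the DataFrame built in Step 2:

# Illustrative sanity check on the DataFrame from Step 2
print(df.isnull().sum().sum())   # expected: 0 (no missing values anywhere)
print(df.dtypes.unique())        # expected: [dtype('float64')] (all columns numeric)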

Step 4 : Splitting dataset


80% of the data is set up for training while 20% is used for testing

# Split the dataset into training and testing sets
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    breast_cancer.data, breast_cancer.target, test_size=0.2, random_state=42)

Step 5 : Model Training


As the dataset contains numerical data, Gaussian Naïve Bayes works best. This script
demonstrates the use of Gaussian Naïve Bayes.
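For reference, Gaussian Naïve Bayes models each continuous feature with a class-conditional normal distribution, estimating a mean and variance per feature and per class from the training data:

P(x_i \mid y) = \frac{1}{\sqrt{2\pi\sigma_{y,i}^{2}}} \exp\left(-\frac{(x_i - \mu_{y,i})^{2}}{2\sigma_{y,i}^{2}}\right)

where \mu_{y,i} and \sigma_{y,i}^{2} are the mean and variance of feature i over the training samples of class y.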

# Initialize the Gaussian Naive Bayes classifier
from sklearn.naive_bayes import GaussianNB

naive_bayes = GaussianNB()

# Train the classifier


naive_bayes.fit(X_train, y_train)

# Predict the labels for the test set


y_pred = naive_bayes.predict(X_test)

Step 6 : Evaluation - Calculate Accuracy and Confusion Matrix

# Calculate the accuracy of the classifier
from sklearn.metrics import accuracy_score, confusion_matrix

accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

# Generate the confusion matrix
cm = confusion_matrix(y_test, y_pred)

# Plot the confusion matrix
import seaborn as sns

plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=breast_cancer.target_names,
            yticklabels=breast_cancer.target_names)
plt.xlabel('Predicted')
plt.ylabel('True')
plt.title('Confusion Matrix')
plt.show()

Output

Based on the classification report below, the model performed excellently with an accuracy score
of 0.973 (97.3%), which means that roughly 97 out of every 100 predictions (whether benign or
malignant) are correct.

Based on the confusion matrix below, the model correctly predicted 71 benign cases and 40
malignant cases. The off-diagonal cells show the number of incorrect predictions: the model
predicted 3 malignant cases as benign, and 0 benign cases as malignant.
accuracy score: 0.9736842105263158
classification report:
              precision    recall  f1-score   support

           0       1.00      0.93      0.96        43
           1       0.96      1.00      0.98        71

    accuracy                           0.97       114
   macro avg       0.98      0.97      0.97       114
weighted avg       0.97      0.97      0.97       114
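As a quick check, the reported accuracy follows directly from the confusion-matrix counts (71 correct benign plus 40 correct malignant out of 114 test samples):

\text{accuracy} = \frac{71 + 40}{114} = \frac{111}{114} \approx 0.9737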

Full Code via Google Colab

https://colab.research.google.com/drive/1YATsBgPmBGMm9XTUufbFk0SaDywHq3h3?usp=sharing
B. SVM

Objective: This Python script demonstrates the implementation of Support Vector Machine
(SVM) using different kernels to classify iris flowers based on their features.
Name of Dataset: Iris dataset from Scikit-Learn.
Overview of Dataset: The Iris dataset contains features measured from samples of three species
of iris flowers: Iris setosa, Iris versicolor, and Iris virginica. The features include sepal length,
sepal width, petal length, and petal width. It aims to classify iris flowers into one of the three
species based on these features.

Step 1 : Importing Library and load the dataset

from sklearn import datasets


from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
from sklearn.model_selection import GridSearchCV
import numpy as np
from sklearn.datasets import load_iris

# Load Iris dataset


iris = datasets.load_iris()
X = iris.data
y = iris.target

Step 2 : Data Exploration


# Convert the dataset to a DataFrame for easier visualization
iris_df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
iris_df['target'] = iris.target

# Plot the pairplot to visualize the distribution


sns.pairplot(iris_df, hue='target', palette='Set1')
plt.show()

feature_names = iris.feature_names

# Convert the data to a pandas DataFrame


iris_df = pd.DataFrame(X, columns=feature_names)
# Calculate the correlation matrix
correlation_matrix = iris_df.corr()

# Plot the correlation matrix as a heatmap


plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt=".2f",
annot_kws={"size": 10})
plt.title('Correlation Between Features')
plt.show()
Output

Based on the correlation matrix and data distribution graph generated above, it can be concluded
that sepal length has a stronger positive linear relationship with petal length (0.87) and petal
width (0.82) than with sepal width (-0.12). Petal length and petal width have the strongest
positive linear relationship (0.96) among all the features. Sepal width has only weak negative
linear relationships with the other features, for example with sepal length (-0.12) and petal width (-0.37).
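As an illustrative aside (not in the original script), the most strongly correlated pair of distinct features can also be read off programmatically from the correlation matrix computed above:

# Illustrative: find the most strongly correlated pair of distinct features
mask = ~np.eye(len(correlation_matrix), dtype=bool)           # ignore self-correlations on the diagonal
row, col = correlation_matrix.abs().where(mask).stack().idxmax()
print(row, col, round(correlation_matrix.loc[row, col], 2))   # expected: petal length / petal width, 0.96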

Given the non-linear separability visible in the Iris data distribution (pairplot) above, SVM
kernels will be employed for further analysis.

Step 3 : Splitting Data


80% of the data is set up for training while 20% is used for testing

# Split the dataset into training and testing sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)

Step 4 : Find The Best Hyperparameters For SVM Classifiers With Different Kernels

1. Linear Kernel (param_grid_linear):
o C: takes values of [0.1, 1, 10, 100], representing different levels of regularization strength.
2. Polynomial Kernel (param_grid_poly):
o C: same as in the linear kernel, controlling the regularization strength.
o gamma: takes values of [0.1, 0.01, 0.001], representing different levels of influence of individual training samples.
o degree: takes values of [2, 3, 4], representing different degrees of the polynomial function.
3. RBF Kernel (param_grid_rbf):
o C: same as in the linear and polynomial kernels, controlling the regularization strength.
o gamma: takes values of [0.1, 0.01, 0.001], controlling the width of the kernel.

In the code provided, cv=5 is used, which means the training data is divided into 5 folds. Each
parameter combination is trained and validated 5 times, resulting in 5 estimates of the model's
performance that are averaged into mean_test_score.
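For intuition, the sketch below (not part of the original script) shows what GridSearchCV does internally for a single parameter combination: it runs 5-fold cross-validation and averages the five fold scores, which is what appears as mean_test_score.

# Illustrative only: 5-fold cross-validation for one parameter setting,
# equivalent to one row of GridSearchCV's cv_results_
from sklearn.model_selection import cross_val_score

scores = cross_val_score(SVC(kernel='linear', C=1), X_train, y_train, cv=5)
print(scores)          # the five per-fold accuracies
print(scores.mean())   # should match mean_test_score for C=1 (about 0.958)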

# Define parameter grids for each kernel


param_grid_linear = {'C': [0.1, 1, 10, 100]}
param_grid_poly = {'C': [0.1, 1, 10, 100],
'gamma': [0.1, 0.01, 0.001],
'degree': [2, 3, 4]}
param_grid_rbf = {'C': [0.1, 1, 10, 100],
'gamma': [0.1, 0.01, 0.001]}

# Perform grid search with cross-validation for each kernel


svm_linear_grid = GridSearchCV(SVC(kernel='linear'), param_grid_linear,
cv=5)
svm_poly_grid = GridSearchCV(SVC(kernel='poly'), param_grid_poly, cv=5)
svm_rbf_grid = GridSearchCV(SVC(kernel='rbf'), param_grid_rbf, cv=5)

# Fit the models


svm_linear_grid.fit(X_train, y_train)
svm_poly_grid.fit(X_train, y_train)
svm_rbf_grid.fit(X_train, y_train)

# Get the mean cross-validation accuracy for each iteration of grid search
print("Linear Kernel:")
print(pd.DataFrame(svm_linear_grid.cv_results_)[['param_C',
'mean_test_score']])
print("")

print("Polynomial Kernel:")
print(pd.DataFrame(svm_poly_grid.cv_results_)[['param_C', 'param_degree',
'param_gamma', 'mean_test_score']])
print("")

print("RBF Kernel:")
print(pd.DataFrame(svm_rbf_grid.cv_results_)[['param_C', 'param_gamma',
'mean_test_score']])
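The "Best Parameters" and "Best Score" lines in the output below were presumably produced with GridSearchCV's best_params_ and best_score_ attributes, along the following lines:

# Report the best hyperparameters and best mean CV score found by each grid search
print("Best Parameters (Linear Kernel):", svm_linear_grid.best_params_)
print("Best Score (Linear Kernel):", svm_linear_grid.best_score_)
print("Best Parameters (Polynomial Kernel):", svm_poly_grid.best_params_)
print("Best Score (Polynomial Kernel):", svm_poly_grid.best_score_)
print("Best Parameters (RBF Kernel):", svm_rbf_grid.best_params_)
print("Best Score (RBF Kernel):", svm_rbf_grid.best_score_)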
Output
Linear Kernel:
param_C mean_test_score
0 0.1 0.941667
1 1 0.958333
2 10 0.950000
3 100 0.950000

Polynomial Kernel:
param_C param_degree param_gamma mean_test_score
0 0.1 2 0.1 0.950000
1 0.1 2 0.01 0.441667
2 0.1 2 0.001 0.441667
3 0.1 3 0.1 0.958333
4 0.1 3 0.01 0.425000
5 0.1 3 0.001 0.425000
6 0.1 4 0.1 0.941667
7 0.1 4 0.01 0.441667
8 0.1 4 0.001 0.408333
9 1 2 0.1 0.958333
10 1 2 0.01 0.883333
11 1 2 0.001 0.441667
12 1 3 0.1 0.950000
13 1 3 0.01 0.841667
14 1 3 0.001 0.425000
15 1 4 0.1 0.933333
16 1 4 0.01 0.816667
17 1 4 0.001 0.408333
18 10 2 0.1 0.950000
19 10 2 0.01 0.950000
20 10 2 0.001 0.441667
21 10 3 0.1 0.933333
22 10 3 0.01 0.958333
23 10 3 0.001 0.425000
24 10 4 0.1 0.941667
25 10 4 0.01 0.925000
26 10 4 0.001 0.408333
27 100 2 0.1 0.950000
28 100 2 0.01 0.958333
29 100 2 0.001 0.883333
30 100 3 0.1 0.941667
31 100 3 0.01 0.958333
32 100 3 0.001 0.425000
33 100 4 0.1 0.941667
34 100 4 0.01 0.950000
35 100 4 0.001 0.408333

RBF Kernel:
param_C param_gamma mean_test_score
0 0.1 0.1 0.900000
1 0.1 0.01 0.466667
2 0.1 0.001 0.466667
3 1 0.1 0.950000
4 1 0.01 0.908333
5 1 0.001 0.466667
6 10 0.1 0.950000
7 10 0.01 0.950000
8 10 0.001 0.916667
9 100 0.1 0.950000
10 100 0.01 0.958333
11 100 0.001 0.950000
Best Parameters (Linear Kernel): {'C': 1}
Best Score (Linear Kernel): 0.9583333333333334
Best Parameters (Polynomial Kernel): {'C': 0.1, 'degree': 3, 'gamma':
0.1}
Best Score (Polynomial Kernel): 0.9583333333333334
Best Parameters (RBF Kernel): {'C': 100, 'gamma': 0.01}
Best Score (RBF Kernel): 0.9583333333333334

Observations:

All three kernel types (Linear, Polynomial, and RBF) achieved the same best cross-validation
accuracy of 95.83%. This suggests that, for the Iris dataset, the choice of kernel might not
have a significant impact on SVM classification performance.

According to the decision boundary plots for the linear and RBF SVMs (in the full notebook; a
sketch of how to produce them is given below), the boundaries are close to straight lines,
indicating reasonable classification. The polynomial kernel produces more complicated decision
boundaries.

Given that Iris is a small dataset, the linear kernel seems like a good choice due to its
simplicity. It achieves high accuracy and avoids the potential for overfitting that can
occur with more complex kernels like Polynomial and RBF.
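The decision boundary figures themselves are not reproduced in this document. A minimal sketch of how such plots could be generated is shown below; it trains on only the first two Iris features so the boundaries can be drawn in 2-D, and the feature choice and plotting details are illustrative assumptions rather than the exact code used in the notebook.

# Illustrative sketch: 2-D decision boundaries for the three kernels,
# trained on the first two Iris features only (sepal length, sepal width)
X2 = X[:, :2]
xx, yy = np.meshgrid(np.linspace(X2[:, 0].min() - 1, X2[:, 0].max() + 1, 300),
                     np.linspace(X2[:, 1].min() - 1, X2[:, 1].max() + 1, 300))
kernels = {'linear': SVC(kernel='linear', C=1),
           'poly': SVC(kernel='poly', C=0.1, gamma=0.1, degree=3),
           'rbf': SVC(kernel='rbf', C=100, gamma=0.01)}

fig, axes = plt.subplots(1, 3, figsize=(15, 4))
for ax, (name, clf) in zip(axes, kernels.items()):
    clf.fit(X2, y)
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    ax.contourf(xx, yy, Z, alpha=0.3, cmap='Set1')                       # decision regions
    ax.scatter(X2[:, 0], X2[:, 1], c=y, cmap='Set1', edgecolor='k', s=20)
    ax.set_title(name + ' kernel')
plt.show()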

Full Code via Google Colab

https://colab.research.google.com/drive/1MVm5P9lPHF9ukZMZPHxrdmvWSnemBDoC?usp=sharing
C. Logistic Regression

Title: Python Script: Logistic Regression for Classifying Handwritten Digits


Objective: This Python script showcases the implementation of logistic regression to classify
handwritten digits using the digit dataset from Scikit-Learn.
Dataset: The digit dataset consists of images of handwritten digits (0 through 9). Each image is
represented as an array of pixel values, where each pixel represents the grayscale intensity. The
objective is to classify these images into one of the ten digits based on their pixel values.

Step 1 : Importing Library and load the dataset


The script begins by importing the required libraries. These include scikit-learn modules
for loading the dataset, splitting the data, preprocessing, and evaluating the model. The
script loads the digit dataset using the load_digits function from scikit-learn. This dataset
contains images of handwritten digits along with their corresponding labels.

# Import necessary libraries

from sklearn.datasets import load_digits


from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report

# Load the digits dataset


digits = load_digits()

Step 2 : Split the Dataset:


The dataset is split into training and testing sets using the train_test_split function: 80% is used
for training and 20% for testing.
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(digits.data,
digits.target, test_size=0.2, random_state=42)

Step 3 : Preprocessing to Standardize the Features:


Feature scaling is performed to standardize the features, ensuring that each feature has a
mean of 0 and a standard deviation of 1. This is done using the StandardScaler class.

# Standardize the features


scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
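As a quick sanity check (illustrative, not part of the original script), the scaled training features should now have a mean of approximately 0 and a standard deviation of approximately 1:

# Illustrative check: scaled training features should have mean ~0 and std ~1
print(X_train_scaled.mean(axis=0).round(3))   # values close to 0
print(X_train_scaled.std(axis=0).round(3))    # values close to 1 (0 for constant pixels)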
Step 4 : Train the model and predict on test data
The logistic regression model is trained on the training data using the fit() method, and the labels
for the test data are predicted using the predict() method.

# Initialize the logistic regression model


logistic_regression = LogisticRegression(max_iter=1000)

# Train the model


logistic_regression.fit(X_train_scaled, y_train)

# Predict on the test set


y_pred = logistic_regression.predict(X_test_scaled)

Step 5 : Evaluate Model Performance and Visualize the Confusion Matrix:


The performance of the model is evaluated using metrics such as the accuracy score and a
classification report. Finally, a visually readable confusion matrix is generated using seaborn to
assess the model's classification performance.
# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

# Print classification report


print(classification_report(y_test, y_pred))

import matplotlib.pyplot as plt


import seaborn as sns
from sklearn.metrics import confusion_matrix

# Compute confusion matrix


conf_matrix = confusion_matrix(y_test, y_pred)

# Plot confusion matrix using seaborn


plt.figure(figsize=(8, 6))
sns.heatmap(conf_matrix, annot=True, fmt="d", cmap="Blues", cbar=False)
plt.title("Confusion Matrix")
plt.xlabel("Predicted Label")
plt.ylabel("True Label")
plt.show()
Output
Accuracy: 0.9722222222222222
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        33
           1       0.97      1.00      0.98        28
           2       1.00      1.00      1.00        33
           3       0.97      0.97      0.97        34
           4       1.00      0.98      0.99        46
           5       0.94      0.94      0.94        47
Observation
The overall result shows 97.2% accuracy. The model performed well on classifying the digits 0, 2, 4,
and 7; there were no instances where the model predicted these digits incorrectly. The model had
slight difficulty classifying digits 1, 3, 5, 6, 8, and 9: there were a few instances where these
digits were classified incorrectly as other digits. Overall, the model seems to be performing well,
but there is room for improvement in classifying digits 1, 3, 5, 6, 8, and 9.
Full Code via Google Colab
https://colab.research.google.com/drive/1_eXsElMEFZmOAkag6LhdP-1K2sCrHhWM?usp=sharing

D. Ensemble

Title: Ensemble Method vs Single Classifier for Heart Disease Prediction


Objective: This Python script compares the performance of an ensemble method with a single
classifier (Support Vector Machine) for predicting the presence or absence of heart disease using
the UCI Statlog (Heart) dataset.
Dataset: The UCI Statlog (Heart) dataset contains various attributes related to heart health,
including age, gender, chest pain type, blood pressure, cholesterol level, and more.
Target variable: "Presence of Heart Disease", a binary variable indicating whether the
individual has heart disease (1) or not (0).
Step 1 : Importing Library and load the dataset

Loads the Heart dataset from the openml repository into the variable data.

from sklearn.datasets import fetch_openml


from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the Heart dataset


data = fetch_openml(name='heart-statlog', version=1, as_frame=True)
X = data.data
y = data.target

Step 2 : Pre-Process Data


X = X.dropna(): drops rows with missing values from the feature matrix X. This is a simple way
to handle missing values (it works here because this dataset has no missing rows; otherwise the
corresponding rows of y would also need to be dropped to keep X and y aligned).
The dataset is then split into training and testing sets: 80% of the data is used for training (X_train
and y_train), and 20% is used for testing (X_test and y_test).

# Data preprocessing
# For simplicity, let's handle missing values by dropping them
X = X.dropna()

# Split data into train and test sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)

Step 3 : Create Ensemble Model and Single classifier. Evaluate accuracy

# Instantiate SVM classifier (single classifier)


svm_clf = SVC(kernel='linear', probability=True, random_state=42)

# Fit and evaluate SVM Classifier


svm_clf.fit(X_train, y_train)
y_pred_svm = svm_clf.predict(X_test)
accuracy_svm = accuracy_score(y_test, y_pred_svm)
print("SVM Classifier Accuracy:", accuracy_svm)
# Instantiate classifiers
rf_clf = RandomForestClassifier(n_estimators=100, random_state=42)
svm_clf = SVC(kernel='linear', probability=True, random_state=42)
lr_clf = LogisticRegression(max_iter=1000, random_state=42)

# Create ensemble model using VotingClassifier


ensemble_model = VotingClassifier(estimators=[('rf', rf_clf), ('svm',
svm_clf), ('lr', lr_clf)], voting='soft')

# Fit and evaluate ensemble model


ensemble_model.fit(X_train, y_train)
y_pred_ensemble = ensemble_model.predict(X_test)
accuracy_ensemble = accuracy_score(y_test, y_pred_ensemble)
print("Ensemble Model Accuracy:", accuracy_ensemble)

Output

SVM Classifier Accuracy: 0.8703703703703703


Ensemble Model Accuracy: 0.9074074074074074

Observation

The accuracy results show that the ensemble model outperformed the Support Vector Machine
(SVM) classifier. The ensemble model achieved a higher accuracy by combining the predictions
of multiple classifiers (Random Forest, SVM, Logistic Regression) using the soft voting strategy.
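For reference, "soft" voting means the ensemble averages the class probabilities predicted by each base classifier and picks the class with the highest average probability. A minimal sketch of this behaviour is shown below (illustrative only; it fits the three base classifiers separately, which the original script does not do, since VotingClassifier fits its own clones internally):

# Illustrative: what VotingClassifier(voting='soft') does internally
import numpy as np

for clf in (rf_clf, svm_clf, lr_clf):        # fit each base classifier separately
    clf.fit(X_train, y_train)

avg_proba = np.mean([clf.predict_proba(X_test) for clf in (rf_clf, svm_clf, lr_clf)],
                    axis=0)                  # average the predicted class probabilities
manual_vote = rf_clf.classes_[np.argmax(avg_proba, axis=1)]   # pick the most probable class
print((manual_vote == y_pred_ensemble).mean())   # expected to be 1.0, or very close to it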

Step 4: Visualizing Confusion Matrix
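The plotting code for this step is not included in this document. A minimal sketch of how the confusion matrix described below could be produced, following the same pattern as the earlier sections, is:

# Sketch: confusion matrix for the ensemble model's predictions
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_test, y_pred_ensemble)

plt.figure(figsize=(6, 5))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.xlabel('Predicted')
plt.ylabel('True')
plt.title('Confusion Matrix - Ensemble Model')
plt.show()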

Output
Observation

• True positives (32): The model correctly predicted 32 people who have heart disease.
• False positives (1): The model incorrectly predicted that 1 person has heart disease when they actually do not.
• True negatives (17): The model correctly predicted 17 people who do not have heart disease.
• False negatives (4): The model missed predicting 4 cases of heart disease.

Overall, the model appears to be good at identifying people with heart disease with few
mistakes.

Full Code via Google Colab


https://colab.research.google.com/drive/1VJ9vN1zqw1z0lXOw-N3YQaVgs6a77rN-?usp=sharing
