Data Analysis in Python_ML

Uploaded by

Khushal Khan

Anaconda is a distribution of the Python and R programming languages for scientific computing that aims to simplify package management and deployment. The distribution includes data-science packages and is available for Windows, Linux, and macOS. It can be downloaded from

https://www.anaconda.com/products/individual

After installing Anaconda, search for Anaconda Navigator and launch it by clicking its icon. When it opens, find and launch Jupyter Notebook, or type jupyter notebook in a terminal.

A terminal window is displayed for a few seconds, and then the home screen opens in a browser. Go to New and select Python 3 in the drop-down list.

A new notebook opens. You may rename it by clicking the text box displaying “Untitled”.

Enter your commands in the empty cells. A green border indicates the active cell, while a blue border indicates an inactive one. Press the Esc key to make a cell inactive. To make it active again, click inside it or press Enter.

To create a new cell, click the + icon at the top, or, with the current cell inactive, press B to add a cell below or A to add a cell above it.

To remove a cell, make it inactive and press D twice. The label In [n] to the left of a cell indicates the number of the command executed.

# A hash sign starts a comment.

# Like R, Python evaluates a number typed on its own, but a variable name that has not been assigned a value raises an error.

print('Bismillah') # You can print a value to the screen using print().

# = is used as the assignment operator.

A = 1

A # displays the contents of A

type(A) # tells the class (type) of the object A created above


Variable types don’t need to be declared. Python figures out the variable types on its own.

Variable names are case sensitive and cannot start with a number. They can contain letters, numbers, and underscores.

a, b, c = 17, 3.14, "test" # We can assign values to multiple variables at a time.
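The points above about undeclared types and case-sensitive names can be sketched as follows. The variable names (count, ratio, label, Total, total) are made up for illustration.

```python
# Python infers the type from the value; no declaration is needed.
count = 17          # int
ratio = 3.14        # float
label = "test"      # str
print(type(count).__name__, type(ratio).__name__, type(label).__name__)

# The same name can later be bound to a value of another type.
count = "seventeen"
print(type(count).__name__)

# Names are case sensitive: Total and total are two different variables.
Total, total = 1, 2
print(Total == total)
```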

The + operator concatenates (joins) two strings.

a = "hello "

b = "world"

print(a + b)

Python Data Structures

• Integers: Whole numbers, e.g. 2, 3, 5, 0, -1

• Floats: Numbers with a decimal point, e.g. 1.50

• Strings: Single (' '), double (" "), or triple (""" or ''') quotes can be used. For example, "python" and 'python' are the same string. A quote character of the other kind can occur within the string, e.g. "datatype's".

• Tuple ( ): An ordered collection of items, possibly of different types. Tuples are “immutable”, i.e., they cannot be modified after creation.
myTuple = ('abc', 2.5, A)

myTuple[2] # returns the value at index position 2, i.e. the third element, since counting starts from 0

myTuple.index(2.5) # returns the index position of 2.5

• List [ ]: Lists are “mutable”, i.e., their elements can be modified.

myList = ['abc', 'def', 'ghij']

myList.append('klm')

myList

myList.count('def') # shall count the occurrences of def in a list

myList2 = [1,2,3]

myList3 = [4,5,6]
myList2 + myList3

• Array [ ]: Vectors (1-D) and matrices (2-D or higher) for numerical data manipulation are defined in NumPy. We need to import numpy into our Python session. Lists are containers for elements of differing data types, whereas arrays are containers for elements of the same data type.
# Importing numpy under the alias np is the community convention; functions are then
# accessed as np.array etc. Any alias may be used here. With a plain "import numpy",
# functions are accessed as numpy.array etc.
import numpy as np

myArray2 = np.array(myList2)

myArray3 = np.array(myList3)

myArray2 + myArray3

myArray2.dot(myArray3)
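The differences between tuples, lists, and arrays described above can be checked with a short sketch. The values are the same toy values used in the notes; the variable names are illustrative only.

```python
import numpy as np

# Tuples are immutable: item assignment raises TypeError.
my_tuple = ('abc', 2.5, 1)
try:
    my_tuple[0] = 'xyz'
    tuple_changed = True
except TypeError:
    tuple_changed = False

# Lists are mutable: elements can be replaced or appended in place.
my_list = ['abc', 'def', 'ghij']
my_list[0] = 'xyz'
my_list.append('klm')

# + joins two lists end to end, but adds two arrays element by element.
list_sum = [1, 2, 3] + [4, 5, 6]
array_sum = np.array([1, 2, 3]) + np.array([4, 5, 6])
dot_product = np.array([1, 2, 3]).dot(np.array([4, 5, 6]))  # 1*4 + 2*5 + 3*6

print(tuple_changed, my_list)
print(list_sum, array_sum, dot_product)
```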

Data Analysis in Python

To perform various tasks, a set of instructions is combined into a function. A function is defined with the keyword def and can be defined anywhere before it is first called. Related functions are put together in modules, which in turn constitute packages.
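As a minimal sketch of the idea, the snippet below defines one function with def and uses a function from the standard-library module math. The function name hypotenuse is a made-up example, not part of the original notes.

```python
import math  # a module from the standard library


def hypotenuse(a, b):
    """Return the hypotenuse of a right triangle with legs a and b."""
    return math.sqrt(a ** 2 + b ** 2)


length = hypotenuse(3, 4)
print(length)
```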

# The pandas library is designed for quick and easy data manipulation, reading, aggregation, and visualization.
import pandas as pd

# NumPy processes arrays that store values of the same data type. It facilitates math operations on arrays and their vectorization.
import numpy as np

# matplotlib.pyplot is used to plot histograms and other statistical graphs.
import matplotlib.pyplot as plt

Working Directory Setting:

pwd # tells us where we are (same as in Linux)

ls # tells us what we have there (files and folders)

dir() # lists the names of the variables (objects) we have; also try vars()

cd # changes the directory

# The commands pwd, ls, and cd work in Jupyter but might not work in other Python IDEs.
# The os module is the portable way: it provides functions for creating and removing a
# directory (folder), fetching its contents, and changing and identifying the current directory.
import os
os.chdir("C:/Desktop/DataAnalysis/") # changes to the required directory

os.getcwd() # confirms that we are in the right place
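The os calls above can be tried without depending on a particular folder. The sketch below uses a temporary directory from the standard-library tempfile module in place of the project path; that substitution is an assumption made so the example runs anywhere.

```python
import os
import tempfile

original_dir = os.getcwd()            # remember where we started

with tempfile.TemporaryDirectory() as tmp:
    os.chdir(tmp)                     # change the working directory
    inside = os.path.samefile(os.getcwd(), tmp)
    contents = os.listdir(".")        # a fresh temporary folder is empty
    os.chdir(original_dir)            # go back before the folder is deleted

restored = os.getcwd() == original_dir
print(inside, contents, restored)
```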

# Reading the dataframe exported previously
combined_PRAD = pd.read_csv("PRAD_labeled.csv", index_col=0)

# Tells you what type of variable it is.
type(combined_PRAD)

# Describes the variables (columns) contained in the dataframe.
combined_PRAD.info()

# Gives the total size of the dataframe, i.e. rows multiplied by columns.
combined_PRAD.size

# Gives the shape of the data, i.e. how many rows and columns are present.
combined_PRAD.shape

# Gives the number of dimensions of the dataset (2 for a dataframe).
combined_PRAD.ndim

# Outputs the first five rows of the data.
combined_PRAD.head()

# Outputs the last 10 rows of the dataset.
combined_PRAD.tail(10)
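The same inspection attributes can be seen on a tiny stand-in dataframe. The column names and values below are invented for illustration and are not taken from PRAD_labeled.csv.

```python
import pandas as pd

# A toy dataframe: 3 samples (rows) by 3 columns.
toy = pd.DataFrame(
    {
        "gene_a": [0.1, 0.4, 0.3],
        "gene_b": [1.2, 0.9, 1.1],
        "labels": ["tumor", "normal", "tumor"],
    },
    index=["s1", "s2", "s3"],
)

print(toy.shape)    # (rows, columns)
print(toy.size)     # rows * columns
print(toy.ndim)     # number of dimensions
print(toy.head(2))  # first two rows
print(toy.tail(1))  # last row
```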

# Import the label encoder.
from sklearn import preprocessing

# The label_encoder object converts word labels into integer codes.
label_encoder = preprocessing.LabelEncoder()

# Encode the labels in the column 'labels'.
combined_PRAD['labels'] = label_encoder.fit_transform(combined_PRAD['labels'])

# List the unique entries.
combined_PRAD['labels'].unique()

# Count the number of samples in each class.
combined_PRAD["labels"].value_counts()
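To see which integer each class receives, the encoder's classes_ attribute can be inspected. The sketch below uses invented labels ("tumor", "normal"); the actual class names in the dataset may differ.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

labels = pd.Series(["tumor", "normal", "tumor", "normal", "tumor"])

encoder = LabelEncoder()
encoded = encoder.fit_transform(labels)

# classes_ is sorted, and the position of a class is its integer code.
mapping = {cls: code for code, cls in enumerate(encoder.classes_)}

# inverse_transform recovers the original word labels.
decoded = encoder.inverse_transform(encoded)

print(mapping)
print(encoded)
```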

# Assign the numerical data to an "X" variable and the labels column to a "y" variable;
# both are used in the next steps. This assumes 'labels' is the last column.
X = combined_PRAD.iloc[:, :-1]
y = combined_PRAD["labels"]

# Split the data into a training set (70%) and a test set (30%).
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, y, test_size=0.30, random_state=42)

# Standardize the features: fit the scaler on the training set only, then apply it to both sets.
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
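The effect of the split and the scaling can be checked on synthetic data. The sketch below assumes 100 random samples with 4 features; the names X_demo, y_demo, and the other *_demo-style variables are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_demo = rng.normal(loc=5.0, scale=2.0, size=(100, 4))  # 100 samples, 4 features
y_demo = np.array([0, 1] * 50)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.30, random_state=42
)

scaler = StandardScaler()
X_tr_scaled = scaler.fit_transform(X_tr)  # learn mean and std from the training set
X_te_scaled = scaler.transform(X_te)      # reuse the training mean and std

print(X_tr.shape, X_te.shape)
print(X_tr_scaled.mean(axis=0).round(6), X_tr_scaled.std(axis=0).round(6))
```

After scaling, each training feature has mean 0 and standard deviation 1. The test features are only approximately standardized, because they are scaled with the training statistics.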

######### Plotting a t-SNE plot to check whether the problem is linear or not #########

import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.manifold import TSNE

fig, ax = plt.subplots()
m = TSNE(learning_rate=50)
X_tsne = m.fit_transform(X)  # X holds all samples, so the projection has one row per sample
combined_PRAD["y"] = y  # labels of all samples; Y_train would cover only the training rows
combined_PRAD["comp-1"] = X_tsne[:, 0]
combined_PRAD["comp-2"] = X_tsne[:, 1]

sns.scatterplot(x="comp-1", y="comp-2", hue=combined_PRAD.y.tolist(),
                palette=sns.color_palette('husl', 2),  # two colors, one per class
                data=combined_PRAD).set(title="Cancer data T-SNE projection")
plt.savefig("TSNE-plot.png", dpi=600)

from sklearn.metrics import accuracy_score
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score
from sklearn.metrics import f1_score
from sklearn.metrics import classification_report
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
from sklearn.neighbors import KNeighborsClassifier

# K-nearest neighbors with k=7; the Minkowski metric with p=1 is the Manhattan distance.
KNN = KNeighborsClassifier(n_neighbors=7, metric='minkowski', p=1)
KNN.fit(X_train, Y_train)
# Predict the samples in the test set.
prediction = KNN.predict(X_test)
Accuracy = accuracy_score(Y_test, prediction)
print('accuracy', Accuracy)

#Printing precision, recall and f1_scores


print(precision_score(Y_test,prediction))
print(recall_score(Y_test,prediction))
print(f1_score(Y_test,prediction))
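What these three scores measure can be seen by computing them by hand on a small example and comparing with scikit-learn. The label vectors below are invented; class 1 is taken as the positive class, which is the default for these functions on binary labels.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]

# Count the outcomes for the positive class.
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # true positives
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # false positives
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # false negatives

precision = tp / (tp + fp)   # share of predicted positives that are correct
recall = tp / (tp + fn)      # share of actual positives that are found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(precision, precision_score(y_true, y_pred))
print(recall, recall_score(y_true, y_pred))
print(f1, f1_score(y_true, y_pred))
print(accuracy_score(y_true, y_pred))
```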

# KNN is the trained k-nearest neighbors model from above.
# Get predictions from the KNN model.
Y_pred = KNN.predict(X_test)
# Create the confusion matrix.
cm = confusion_matrix(Y_test, Y_pred)

# Display the confusion matrix using ConfusionMatrixDisplay.
# KNN.classes_ contains the class labels.
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=KNN.classes_)
disp.plot(cmap=plt.cm.Greens)
plt.xlabel('Predicted label', color='black')
plt.ylabel('True label', color='black')
plt.gca().tick_params(axis='x', colors='white')  # Modify ticks on the x-axis
plt.gca().tick_params(axis='y', colors='white')  # Modify ticks on the y-axis
plt.gcf().set_size_inches(10, 6)
plt.savefig("KNN.png", dpi=600)
plt.show()
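The layout of the matrix returned by confusion_matrix is easy to misread, so a small numeric sketch may help. Rows are true classes and columns are predicted classes; for binary labels 0 and 1 the flattened order is TN, FP, FN, TP. The label vectors below are invented.

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]

cm_demo = confusion_matrix(y_true, y_pred)

# Row 0: true class 0 -> [TN, FP]; row 1: true class 1 -> [FN, TP].
tn, fp, fn, tp = cm_demo.ravel()

print(cm_demo)
print(tn, fp, fn, tp)
```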

# ROC curve and AUC
from sklearn.metrics import roc_curve
from sklearn.metrics import roc_auc_score
from matplotlib import pyplot

KNN_probs = KNN.predict_proba(X_test)
KNN_probs = KNN_probs[:, 1]  # keep probabilities for the positive class only
KNN_auc = roc_auc_score(Y_test, KNN_probs)
KNN_fpr, KNN_tpr, _ = roc_curve(Y_test, KNN_probs)

# Plot the ROC curve for the model.
fig = pyplot.figure(figsize=(7, 5))
pyplot.plot(KNN_fpr, KNN_tpr, label='KNN =%.3f' % (KNN_auc))
params = {'legend.fontsize': 10,
          'legend.handlelength': 2}
pyplot.rcParams.update(params)
pyplot.ylabel('True Positive Rate', fontsize=15)  # an ROC curve plots TPR against FPR
pyplot.xlabel('False Positive Rate', fontsize=15)
pyplot.xticks(fontsize=12)
pyplot.yticks(fontsize=12)
# Show the legend.
pyplot.legend()
#plt.savefig("AUC_ROC.png", dpi = 600)
pyplot.show()
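One way to read the AUC value is as the fraction of (positive, negative) sample pairs in which the positive sample receives the higher score. The sketch below checks this on four invented scores; it illustrates the metric and is not part of the original analysis.

```python
from itertools import product

from sklearn.metrics import roc_auc_score, roc_curve

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]   # predicted probability of class 1

auc_value = roc_auc_score(y_true, scores)
fpr, tpr, thresholds = roc_curve(y_true, scores)

# Pairwise interpretation: count pairs where the positive outranks the negative.
positives = [s for s, t in zip(scores, y_true) if t == 1]
negatives = [s for s, t in zip(scores, y_true) if t == 0]
ranked_correctly = sum(p > n for p, n in product(positives, negatives))
pairwise_auc = ranked_correctly / (len(positives) * len(negatives))

print(auc_value, pairwise_auc)
```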

#SVC_linear
from sklearn.svm import SVC
svm_linear = SVC(kernel='linear', probability=True, random_state=40)
svm_linear.fit(X_train,Y_train).decision_function(X_test)
prediction = svm_linear.predict(X_test)
Accuracy = accuracy_score(Y_test,prediction)
print('accuracy',Accuracy)

#Printing precision, recall and f1_scores

print(precision_score(Y_test,prediction))
print(recall_score(Y_test,prediction))
print(f1_score(Y_test,prediction))

# svm_linear is the trained support vector machine with a linear kernel from above.
# Get predictions from the SVM linear model.
Y_pred_svm_linear = svm_linear.predict(X_test)

# Create the confusion matrix.
cm_svm_linear = confusion_matrix(Y_test, Y_pred_svm_linear)

# Display the confusion matrix using ConfusionMatrixDisplay.
disp_svm_linear = ConfusionMatrixDisplay(confusion_matrix=cm_svm_linear,
                                         display_labels=svm_linear.classes_)
disp_svm_linear.plot(cmap=plt.cm.Greens)
plt.xlabel('Predicted label', color='black')
plt.ylabel('True label', color='black')
plt.gca().tick_params(axis='x', colors='white')  # Modify ticks on the x-axis
plt.gca().tick_params(axis='y', colors='white')  # Modify ticks on the y-axis
plt.gcf().set_size_inches(10, 6)
plt.savefig("svm_linear.png", dpi=600)
plt.show()

svm_linear_probs = svm_linear.predict_proba(X_test)
svm_linear_probs = svm_linear_probs[:, 1]
svm_linear_auc = roc_auc_score(Y_test, svm_linear_probs)
svm_linear_fpr, svm_linear_tpr, _ = roc_curve(Y_test,
svm_linear_probs)

# plot the roc curve for the model


fig = pyplot.figure(figsize=(7, 5))
pyplot.plot(svm_linear_fpr, svm_linear_tpr ,label='SVM_Linear =%.3f' %
(svm_linear_auc))
params = {'legend.fontsize': 10,
'legend.handlelength': 2}
pyplot.rcParams.update(params)
pyplot.legend()
pyplot.ylabel('True Positive Rate', fontsize=15)
pyplot.xlabel('False Positive Rate', fontsize=15)
pyplot.xticks(fontsize=12)
pyplot.yticks(fontsize=12)
#Show legend
pyplot.legend() #
#plt.savefig("AUC_ROC.png", dpi = 600)
pyplot.show()

#SVC_poly
from sklearn.svm import SVC
# Training a SVM classifier using SVC polynomial
svm_poly = SVC(kernel='poly', probability=True, random_state=40)
svm_poly.fit(X_train,Y_train).decision_function(X_test)
prediction = svm_poly.predict(X_test)
Accuracy = accuracy_score(Y_test,prediction)
print('accuracy',Accuracy)

#Printing precision, recall and f1_scores

print(precision_score(Y_test,prediction))
print(recall_score(Y_test,prediction))
print(f1_score(Y_test,prediction))

Y_pred_svm_poly = svm_poly.predict(X_test)

# Create the confusion matrix


cm_svm_poly = confusion_matrix(Y_test, Y_pred_svm_poly)

# Display the confusion matrix using ConfusionMatrixDisplay


disp_svm_poly = ConfusionMatrixDisplay(confusion_matrix=cm_svm_poly,
display_labels=svm_poly.classes_)
disp_svm_poly.plot(cmap=plt.cm.Greens)
plt.xlabel('Predicted label', color='black')
plt.ylabel('True label', color='black')
plt.gca().tick_params(axis='x', colors='white')  # Modify ticks on the x-axis
plt.gca().tick_params(axis='y', colors='white')  # Modify ticks on the y-axis
plt.gcf().set_size_inches(10, 6)
plt.savefig("svm_poly.png", dpi=600)
plt.show()

svm_poly_probs = svm_poly.predict_proba(X_test)
svm_poly_probs = svm_poly_probs[:, 1]
svm_poly_auc = roc_auc_score(Y_test, svm_poly_probs)
svm_poly_fpr, svm_poly_tpr, _ = roc_curve(Y_test, svm_poly_probs)

# plot the roc curve for the model


fig = pyplot.figure(figsize=(7, 5))
pyplot.plot(svm_poly_fpr, svm_poly_tpr ,label='SVM_poly =%.3f' %
(svm_poly_auc))
params = {'legend.fontsize': 10,
'legend.handlelength': 2}
pyplot.rcParams.update(params)
pyplot.legend()
pyplot.ylabel('True Positive Rate', fontsize=15)
pyplot.xlabel('False Positive Rate', fontsize=15)
pyplot.xticks(fontsize=12)
pyplot.yticks(fontsize=12)
#Show legend
pyplot.legend() #
#plt.savefig("AUC_ROC.png", dpi = 600)
pyplot.show()

#SVC_RBF
from sklearn.svm import SVC
# Training a SVM classifier using SVC class
svm_rbf = SVC(kernel='rbf', probability=True, random_state=40)
svm_rbf.fit(X_train,Y_train).decision_function(X_test)
prediction = svm_rbf.predict(X_test)
Accuracy = accuracy_score(Y_test,prediction)
print('accuracy',Accuracy)

#Printing precision, recall and f1_scores

print(precision_score(Y_test,prediction))
print(recall_score(Y_test,prediction))
print(f1_score(Y_test,prediction))

# svm_rbf is the trained SVM with an RBF kernel from above.
# Get predictions from the SVM RBF model.
Y_pred_svm_rbf = svm_rbf.predict(X_test)

# Create the confusion matrix


cm_svm_rbf = confusion_matrix(Y_test, Y_pred_svm_rbf)

# Display the confusion matrix using ConfusionMatrixDisplay


disp_svm_rbf = ConfusionMatrixDisplay(confusion_matrix=cm_svm_rbf,
display_labels=svm_rbf.classes_)
disp_svm_rbf.plot(cmap=plt.cm.Greens)
plt.xlabel('Predicted label', color='black')
plt.ylabel('True label', color='black')
plt.gca().tick_params(axis='x', colors='white')  # Modify ticks on the x-axis
plt.gca().tick_params(axis='y', colors='white')  # Modify ticks on the y-axis
plt.gcf().set_size_inches(10, 6)
plt.savefig("svm_rbf.png", dpi=600)
plt.show()

svm_rbf_probs = svm_rbf.predict_proba(X_test)
svm_rbf_probs = svm_rbf_probs[:, 1]
svm_rbf_auc = roc_auc_score(Y_test, svm_rbf_probs)
svm_rbf_fpr, svm_rbf_tpr, _ = roc_curve(Y_test, svm_rbf_probs)

# plot the roc curve for the model


fig = pyplot.figure(figsize=(7, 5))
pyplot.plot(svm_rbf_fpr, svm_rbf_tpr ,label='SVM_rbf =%.3f' %
(svm_rbf_auc))
params = {'legend.fontsize': 10,
'legend.handlelength': 2}
pyplot.rcParams.update(params)
pyplot.legend()
pyplot.ylabel('True Positive Rate', fontsize=15)
pyplot.xlabel('False Positive Rate', fontsize=15)
pyplot.xticks(fontsize=12)
pyplot.yticks(fontsize=12)
#Show legend
pyplot.legend() #
#plt.savefig("AUC_ROC.png", dpi = 600)
pyplot.show()

from sklearn.linear_model import LogisticRegression


LR = LogisticRegression()
LR.fit(X_train,Y_train).decision_function(X_test)
prediction = LR.predict(X_test)
Accuracy = accuracy_score(Y_test,prediction)
print('accuracy',Accuracy)

#Printing precision, recall and f1_scores

print(precision_score(Y_test,prediction))
print(recall_score(Y_test,prediction))
print(f1_score(Y_test,prediction))

# LR is the trained logistic regression model from above.
# Get predictions from the logistic regression model.
Y_pred_LR = LR.predict(X_test)

# Create the confusion matrix


cm_LR = confusion_matrix(Y_test, Y_pred_LR)

# Display the confusion matrix using ConfusionMatrixDisplay


disp_LR = ConfusionMatrixDisplay(confusion_matrix=cm_LR,
display_labels=LR.classes_)
disp_LR.plot(cmap=plt.cm.Greens)
plt.xlabel('Predicted label', color='black')
plt.ylabel('True label', color='black')
plt.gca().tick_params(axis='x', colors='white')  # Modify ticks on the x-axis
plt.gca().tick_params(axis='y', colors='white')  # Modify ticks on the y-axis
plt.gcf().set_size_inches(10, 6)
plt.savefig("LR.png", dpi=600)
plt.show()

LR_probs = LR.predict_proba(X_test)
LR_probs = LR_probs[:, 1]
LR_auc = roc_auc_score(Y_test, LR_probs)
LR_fpr, LR_tpr, _ = roc_curve(Y_test, LR_probs)

# plot the roc curve for the model


fig = pyplot.figure(figsize=(7, 5))
pyplot.plot(LR_fpr, LR_tpr ,label='LR =%.3f' % (LR_auc))
params = {'legend.fontsize': 10,
'legend.handlelength': 2}
pyplot.rcParams.update(params)
pyplot.legend()
pyplot.ylabel('True Positive Rate', fontsize=15)
pyplot.xlabel('False Positive Rate', fontsize=15)
pyplot.xticks(fontsize=12)
pyplot.yticks(fontsize=12)
#Show legend
pyplot.legend() #
#plt.savefig("AUC_ROC.png", dpi = 600)
pyplot.show()

#Naive bayes
from sklearn.naive_bayes import GaussianNB
NB = GaussianNB()
NB.fit(X_train, Y_train)
prediction = NB.predict(X_test)
Accuracy = accuracy_score(Y_test,prediction)
print('accuracy',Accuracy)

#Printing precision, recall and f1_scores

print(precision_score(Y_test,prediction))
print(recall_score(Y_test,prediction))
print(f1_score(Y_test,prediction))

# NB is the trained Gaussian naive Bayes model from above.
# Get predictions from the naive Bayes model.
Y_pred_NB = NB.predict(X_test)

# Create the confusion matrix


cm_NB = confusion_matrix(Y_test, Y_pred_NB)

# Display the confusion matrix using ConfusionMatrixDisplay


disp_NB = ConfusionMatrixDisplay(confusion_matrix=cm_NB,
display_labels=NB.classes_)
disp_NB.plot(cmap=plt.cm.Greens)
plt.xlabel('Predicted label', color='black')
plt.ylabel('True label', color='black')
plt.gca().tick_params(axis='x', colors='white')  # Modify ticks on the x-axis
plt.gca().tick_params(axis='y', colors='white')  # Modify ticks on the y-axis
plt.gcf().set_size_inches(10, 6)
plt.savefig("NB.png", dpi=600)
plt.show()

NB_probs = NB.predict_proba(X_test)
NB_probs = NB_probs[:, 1]
NB_auc = roc_auc_score(Y_test, NB_probs)
NB_fpr, NB_tpr, _ = roc_curve(Y_test, NB_probs)

# plot the roc curve for the model


fig = pyplot.figure(figsize=(7, 5))
pyplot.plot(NB_fpr, NB_tpr ,label='NB =%.3f' % (NB_auc))
params = {'legend.fontsize': 10,
'legend.handlelength': 2}
pyplot.rcParams.update(params)
pyplot.legend()
pyplot.ylabel('True Positive Rate', fontsize=15)
pyplot.xlabel('False Positive Rate', fontsize=15)
pyplot.xticks(fontsize=12)
pyplot.yticks(fontsize=12)
#Show legend
pyplot.legend() #
#plt.savefig("AUC_ROC.png", dpi = 600)
pyplot.show()

#DECISION TREE CLASSIFIER


from sklearn.tree import DecisionTreeClassifier
DT= DecisionTreeClassifier(random_state=0)
DT.fit(X_train, Y_train)
prediction = DT.predict(X_test)
Accuracy = accuracy_score(Y_test,prediction)
print('accuracy',Accuracy)

#Printing precision, recall and f1_scores

print(precision_score(Y_test,prediction))
print(recall_score(Y_test,prediction))
print(f1_score(Y_test,prediction))

# DT is the trained decision tree classifier from above.
# Get predictions from the decision tree model.
Y_pred_DT = DT.predict(X_test)

# Create the confusion matrix


cm_DT = confusion_matrix(Y_test, Y_pred_DT)

# Display the confusion matrix using ConfusionMatrixDisplay


disp_DT = ConfusionMatrixDisplay(confusion_matrix=cm_DT,
display_labels=DT.classes_)
disp_DT.plot(cmap=plt.cm.Greens)
plt.xlabel('Predicted label', color='black')
plt.ylabel('True label', color='black')
plt.gca().tick_params(axis='x', colors='white')  # Modify ticks on the x-axis
plt.gca().tick_params(axis='y', colors='white')  # Modify ticks on the y-axis
plt.gcf().set_size_inches(10, 6)
plt.savefig("DT.png", dpi=600)
plt.show()

DT_probs = DT.predict_proba(X_test)
DT_probs = DT_probs[:, 1]
DT_auc = roc_auc_score(Y_test, DT_probs)
DT_fpr, DT_tpr, _ = roc_curve(Y_test, DT_probs)

# plot the roc curve for the model


fig = pyplot.figure(figsize=(7, 5))
pyplot.plot(DT_fpr, DT_tpr ,label='DT =%.3f' % (DT_auc))
params = {'legend.fontsize': 10,
'legend.handlelength': 2}
pyplot.rcParams.update(params)
pyplot.legend()
pyplot.ylabel('True Positive Rate', fontsize=15)
pyplot.xlabel('False Positive Rate', fontsize=15)
pyplot.xticks(fontsize=12)
pyplot.yticks(fontsize=12)
#Show legend
pyplot.legend() #
#plt.savefig("AUC_ROC.png", dpi = 600)
pyplot.show()

from sklearn.neural_network import MLPClassifier


MLP = MLPClassifier()
MLP.fit(X_train, Y_train)
prediction = MLP.predict(X_test)
Accuracy = accuracy_score(Y_test,prediction)
print('accuracy',Accuracy)

#Printing precision, recall and f1_scores

print(precision_score(Y_test,prediction))
print(recall_score(Y_test,prediction))
print(f1_score(Y_test,prediction))

# MLP is the trained multi-layer perceptron classifier from above.
# Get predictions from the MLP model.
Y_pred_MLP = MLP.predict(X_test)

# Create the confusion matrix


cm_MLP = confusion_matrix(Y_test, Y_pred_MLP)

# Display the confusion matrix using ConfusionMatrixDisplay


disp_MLP = ConfusionMatrixDisplay(confusion_matrix=cm_MLP,
display_labels=MLP.classes_)
disp_MLP.plot(cmap=plt.cm.Greens)
plt.xlabel('Predicted label', color='black')
plt.ylabel('True label', color='black')
plt.gca().tick_params(axis='x', colors='white')  # Modify ticks on the x-axis
plt.gca().tick_params(axis='y', colors='white')  # Modify ticks on the y-axis
plt.gcf().set_size_inches(10, 6)
plt.savefig("MLP.png", dpi=600)
plt.show()

MLP_probs = MLP.predict_proba(X_test)
MLP_probs = MLP_probs[:,1]
MLP_auc = roc_auc_score(Y_test, MLP_probs)
MLP_fpr, MLP_tpr, _ = roc_curve(Y_test, MLP_probs)

# plot the roc curve for the model


fig = pyplot.figure(figsize=(7, 5))
pyplot.plot(MLP_fpr, MLP_tpr ,label='MLP =%.3f' % (MLP_auc))
params = {'legend.fontsize': 10,
'legend.handlelength': 2}
pyplot.rcParams.update(params)
pyplot.legend()
pyplot.ylabel('True Positive Rate', fontsize=15)
pyplot.xlabel('False Positive Rate', fontsize=15)
pyplot.xticks(fontsize=12)
pyplot.yticks(fontsize=12)
#Show legend
pyplot.legend() #
#plt.savefig("AUC_ROC.png", dpi = 600)
pyplot.show()

from sklearn.ensemble import AdaBoostClassifier


# define the model
AB = AdaBoostClassifier()
AB.fit(X_train, Y_train)
prediction = AB.predict(X_test)
Accuracy = accuracy_score(Y_test,prediction)
print('accuracy',Accuracy)

#Printing precision, recall and f1_scores

print(precision_score(Y_test,prediction))
print(recall_score(Y_test,prediction))
print(f1_score(Y_test,prediction))

# AB is the trained AdaBoost classifier from above.
# Get predictions from the AdaBoost model.
Y_pred_AB = AB.predict(X_test)

# Create the confusion matrix


cm_AB = confusion_matrix(Y_test, Y_pred_AB)

# Display the confusion matrix using ConfusionMatrixDisplay


disp_AB = ConfusionMatrixDisplay(confusion_matrix=cm_AB,
display_labels=AB.classes_)
disp_AB.plot(cmap=plt.cm.Greens)
plt.xlabel('Predicted label', color='black')
plt.ylabel('True label', color='black')
plt.gca().tick_params(axis='x', colors='white')  # Modify ticks on the x-axis
plt.gca().tick_params(axis='y', colors='white')  # Modify ticks on the y-axis
plt.gcf().set_size_inches(10, 6)
plt.savefig("AB.png", dpi=600)
plt.show()
AB_probs = AB.predict_proba(X_test)
AB_probs = AB_probs[:, 1]
AB_auc = roc_auc_score(Y_test, AB_probs)
AB_fpr, AB_tpr, _ = roc_curve(Y_test, AB_probs)

# plot the roc curve for the model


fig = pyplot.figure(figsize=(7, 5))
pyplot.plot(AB_fpr, AB_tpr ,label='AB =%.3f' % (AB_auc))
params = {'legend.fontsize': 10,
'legend.handlelength': 2}
pyplot.rcParams.update(params)
pyplot.legend()
pyplot.ylabel('True Positive Rate', fontsize=15)
pyplot.xlabel('False Positive Rate', fontsize=15)
pyplot.xticks(fontsize=12)
pyplot.yticks(fontsize=12)
#Show legend
pyplot.legend() #
#plt.savefig("AUC_ROC.png", dpi = 600)
pyplot.show()

from sklearn.ensemble import RandomForestClassifier


# define the model
RF = RandomForestClassifier()
RF.fit(X_train, Y_train)
prediction = RF.predict(X_test)
Accuracy = accuracy_score(Y_test,prediction)
print('accuracy',Accuracy)

#Printing precision, recall and f1_scores

print(precision_score(Y_test,prediction))
print(recall_score(Y_test,prediction))
print(f1_score(Y_test,prediction))

# RF is the trained random forest classifier from above.
# Get predictions from the random forest model.
Y_pred_RF = RF.predict(X_test)

# Create the confusion matrix


cm_RF = confusion_matrix(Y_test, Y_pred_RF)

# Display the confusion matrix using ConfusionMatrixDisplay


disp_RF = ConfusionMatrixDisplay(confusion_matrix=cm_RF,
display_labels=RF.classes_)
disp_RF.plot(cmap=plt.cm.Greens)
plt.xlabel('Predicted label', color='black')
plt.ylabel('True label', color='black')
plt.gca().tick_params(axis='x', colors='white')  # Modify ticks on the x-axis
plt.gca().tick_params(axis='y', colors='white')  # Modify ticks on the y-axis
plt.gcf().set_size_inches(10, 6)
plt.savefig("RF.png", dpi=600)
plt.show()

RF_probs = RF.predict_proba(X_test)
RF_probs = RF_probs[:,1]
RF_auc = roc_auc_score(Y_test, RF_probs)
RF_fpr, RF_tpr, _ = roc_curve(Y_test, RF_probs)

# plot the roc curve for the model


fig = pyplot.figure(figsize=(7, 5))
pyplot.plot(RF_fpr, RF_tpr ,label='RF =%.3f' % (RF_auc))
params = {'legend.fontsize': 10,
'legend.handlelength': 2}
pyplot.rcParams.update(params)
pyplot.legend()
pyplot.ylabel('True Positive Rate', fontsize=15)
pyplot.xlabel('False Positive Rate', fontsize=15)
pyplot.xticks(fontsize=12)
pyplot.yticks(fontsize=12)
#Show legend
pyplot.legend() #
#plt.savefig("AUC_ROC.png", dpi = 600)
pyplot.show()
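The per-model blocks above repeat the same fit, predict, and score steps. One possible way to reduce the repetition is a small helper applied to a dictionary of models. The sketch below is not the author's code: the helper name evaluate_model is hypothetical, the data comes from make_classification rather than the PRAD dataset, and only three of the classifiers are included to keep it short.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler


def evaluate_model(model, X_train, Y_train, X_test, Y_test):
    """Fit one classifier and return its test-set scores (hypothetical helper)."""
    model.fit(X_train, Y_train)
    prediction = model.predict(X_test)
    positive_probs = model.predict_proba(X_test)[:, 1]  # probability of class 1
    return {
        "accuracy": float(accuracy_score(Y_test, prediction)),
        "precision": float(precision_score(Y_test, prediction)),
        "recall": float(recall_score(Y_test, prediction)),
        "f1": float(f1_score(Y_test, prediction)),
        "roc_auc": float(roc_auc_score(Y_test, positive_probs)),
    }


# Synthetic binary classification data standing in for the real dataset.
X_demo, y_demo = make_classification(n_samples=200, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.30, random_state=42
)
scaler = StandardScaler()
X_tr = scaler.fit_transform(X_tr)
X_te = scaler.transform(X_te)

models = {
    "KNN": KNeighborsClassifier(n_neighbors=7, metric="minkowski", p=1),
    "LR": LogisticRegression(),
    "NB": GaussianNB(),
}

results = {name: evaluate_model(model, X_tr, y_tr, X_te, y_te)
           for name, model in models.items()}

for name, scores in results.items():
    print(name, {metric: round(value, 3) for metric, value in scores.items()})
```

The same dictionary could hold the remaining classifiers, as long as each one provides predict_proba (for SVC this requires probability=True, as in the notes).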

#####Combined AUROC and PR Curves

KNN_probs = KNN.predict_proba(X_test)
AB_probs = AB.predict_proba(X_test)
DT_probs = DT.predict_proba(X_test)
LR_probs = LR.predict_proba(X_test)
RF_probs = RF.predict_proba(X_test)
NB_probs = NB.predict_proba(X_test)
MLP_probs = MLP.predict_proba(X_test)
svm_linear_probs =svm_linear.predict_proba(X_test)
svm_rbf_probs = svm_rbf.predict_proba(X_test)
svm_poly_probs = svm_poly.predict_proba(X_test)

# keep probabilities for the positive outcome only


KNN_probs = KNN_probs[:, 1]
AB_probs = AB_probs[:, 1]
DT_probs = DT_probs[:, 1]
LR_probs = LR_probs[:, 1]
RF_probs = RF_probs[:, 1]
NB_probs = NB_probs[:, 1]
MLP_probs = MLP_probs[:,1]
svm_linear_probs = svm_linear_probs[:, 1]
svm_poly_probs = svm_poly_probs[:, 1]
svm_rbf_probs = svm_rbf_probs[:, 1]

# calculate scores
KNN_auc = roc_auc_score(Y_test, KNN_probs)
AB_auc = roc_auc_score(Y_test, AB_probs)
DT_auc = roc_auc_score(Y_test, DT_probs)
LR_auc = roc_auc_score(Y_test, LR_probs)
NB_auc = roc_auc_score(Y_test, NB_probs)
RF_auc = roc_auc_score(Y_test, RF_probs)
MLP_auc = roc_auc_score(Y_test, MLP_probs)
svm_poly_auc = roc_auc_score(Y_test, svm_poly_probs)
svm_linear_auc = roc_auc_score(Y_test, svm_linear_probs)
svm_rbf_auc = roc_auc_score(Y_test, svm_rbf_probs)

# calculate roc curves


KNN_fpr, KNN_tpr, _ = roc_curve(Y_test, KNN_probs)
AB_fpr, AB_tpr, _ = roc_curve(Y_test, AB_probs)
DT_fpr, DT_tpr, _ = roc_curve(Y_test, DT_probs)
NB_fpr, NB_tpr, _ = roc_curve(Y_test, NB_probs)
LR_fpr, LR_tpr, _ = roc_curve(Y_test, LR_probs)
RF_fpr, RF_tpr, _ = roc_curve(Y_test, RF_probs)
MLP_fpr, MLP_tpr, _ = roc_curve(Y_test, MLP_probs)
svm_linear_fpr, svm_linear_tpr, _ = roc_curve(Y_test,
svm_linear_probs)
svm_poly_fpr, svm_poly_tpr, _ = roc_curve(Y_test, svm_poly_probs)
svm_rbf_fpr, svm_rbf_tpr, _ = roc_curve(Y_test, svm_rbf_probs)

# plot the roc curve for the model


fig = pyplot.figure(figsize=(7, 5))
pyplot.plot(KNN_fpr, KNN_tpr ,label='KNN =%.3f' % (KNN_auc))
pyplot.plot(AB_fpr, AB_tpr ,label='AB =%.3f' % (AB_auc))
pyplot.plot(NB_fpr, NB_tpr ,label='NB =%.3f' % (NB_auc))
pyplot.plot(DT_fpr, DT_tpr ,label='DT =%.3f' % (DT_auc))
pyplot.plot(LR_fpr, LR_tpr ,label='LR =%.3f' % (LR_auc))
pyplot.plot(RF_fpr, RF_tpr ,label='RF =%.3f' % (RF_auc))
pyplot.plot(MLP_fpr, MLP_tpr ,label='MLP =%.3f' % (MLP_auc))
pyplot.plot(svm_linear_fpr, svm_linear_tpr ,label='SVM_Linear =%.3f' %
(svm_linear_auc))
pyplot.plot(svm_poly_fpr, svm_poly_tpr ,label='SVM_Poly =%.3f' %
(svm_poly_auc))
pyplot.plot(svm_rbf_fpr, svm_rbf_tpr ,label='SVM_RBF =%.3f' %
(svm_rbf_auc))
params = {'legend.fontsize': 10,
'legend.handlelength': 2}
pyplot.rcParams.update(params)
pyplot.legend()
pyplot.ylabel('True Positive Rate', fontsize=15)
pyplot.xlabel('False Positive Rate', fontsize=15)
pyplot.xticks(fontsize=12)
pyplot.yticks(fontsize=12)
#Show legend
pyplot.legend() #
plt.savefig("AUC_ROC.png", dpi = 600)
pyplot.show()

from sklearn.metrics import precision_score
from sklearn.metrics import recall_score
from sklearn.metrics import precision_recall_curve

# Calculate the precision-recall curves.
KNN_precision, KNN_recall, _ = precision_recall_curve(Y_test, KNN_probs)
AB_precision, AB_recall, _ = precision_recall_curve(Y_test, AB_probs)
DT_precision, DT_recall, _ = precision_recall_curve(Y_test, DT_probs)
LR_precision, LR_recall, _ = precision_recall_curve(Y_test, LR_probs)
NB_precision, NB_recall, _ = precision_recall_curve(Y_test, NB_probs)
RF_precision, RF_recall, _ = precision_recall_curve(Y_test, RF_probs)
MLP_precision, MLP_recall, _ = precision_recall_curve(Y_test, MLP_probs)
svm_linear_precision, svm_linear_recall, _ = precision_recall_curve(Y_test, svm_linear_probs)
svm_poly_precision, svm_poly_recall, _ = precision_recall_curve(Y_test, svm_poly_probs)
svm_rbf_precision, svm_rbf_recall, _ = precision_recall_curve(Y_test, svm_rbf_probs)

from sklearn.metrics import auc


# calculate the precision-recall auc
KNN_auc = auc(KNN_recall, KNN_precision)
AB_auc = auc(AB_recall, AB_precision)
DT_auc = auc(DT_recall, DT_precision)
NB_auc = auc(NB_recall, NB_precision)
LR_auc = auc(LR_recall, LR_precision)
RF_auc = auc(RF_recall, RF_precision)
MLP_auc = auc(MLP_recall, MLP_precision)
svm_linear_auc = auc(svm_linear_recall, svm_linear_precision)
svm_poly_auc = auc(svm_poly_recall, svm_poly_precision)
svm_rbf_auc = auc(svm_rbf_recall, svm_rbf_precision)

# Plot the precision-recall curves for the models (recall on the x-axis, precision on the y-axis).
fig = pyplot.figure(figsize=(7, 5))
pyplot.plot(KNN_recall, KNN_precision, label='KNN =%.3f' % (KNN_auc))
pyplot.plot(AB_recall, AB_precision, label='AB =%.3f' % (AB_auc))
pyplot.plot(NB_recall, NB_precision, label='NB =%.3f' % (NB_auc))
pyplot.plot(DT_recall, DT_precision, label='DT =%.3f' % (DT_auc))
pyplot.plot(LR_recall, LR_precision, label='LR =%.3f' % (LR_auc))
pyplot.plot(RF_recall, RF_precision, label='RF =%.3f' % (RF_auc))
pyplot.plot(MLP_recall, MLP_precision, label='MLP =%.3f' % (MLP_auc))
pyplot.plot(svm_linear_recall, svm_linear_precision, label='SVM_Linear =%.3f' % (svm_linear_auc))
pyplot.plot(svm_poly_recall, svm_poly_precision, label='SVM_Poly =%.3f' % (svm_poly_auc))
pyplot.plot(svm_rbf_recall, svm_rbf_precision, label='SVM_RBF =%.3f' % (svm_rbf_auc))
params = {'legend.fontsize': 10,
          'legend.handlelength': 2}
pyplot.rcParams.update(params)
pyplot.ylabel('Precision', fontsize=15)
pyplot.xlabel('Recall', fontsize=15)
pyplot.xticks(fontsize=12)
pyplot.yticks(fontsize=12)
# Show the legend.
pyplot.legend()
plt.savefig("PR_curve.png", dpi=600)
pyplot.show()
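The return order of precision_recall_curve is precision first and recall second, while the area is computed with recall as the x values. A minimal sketch on four invented scores shows the shapes involved; the variable names are illustrative.

```python
from sklearn.metrics import auc, precision_recall_curve

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]   # predicted probability of class 1

# Returns precision, recall, and the thresholds at which they were computed.
precision, recall, thresholds = precision_recall_curve(y_true, scores)

# Area under the precision-recall curve: recall is x, precision is y.
pr_auc = auc(recall, precision)

print(precision)
print(recall)
print(pr_auc)
```

The precision and recall arrays are one element longer than thresholds, because a final point with precision 1 and recall 0 is appended to close the curve.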
