Data Analysis in Python_ML
Anaconda is a distribution of Python and R for scientific computing that aims to simplify package management and deployment. The distribution includes data-science packages suitable for Windows, Linux, and macOS. It can be downloaded from
https://www.anaconda.com/products/individual
After installing Anaconda, launch it by searching for it and clicking its icon. When it opens, find and launch Jupyter Notebook, or type jupyter notebook in a terminal. The home screen opens in a browser after the terminal is displayed for a few seconds.
Go to New and select Python 3 from the drop-down list.
A new notebook will open; you can rename it by clicking the text box displaying "Untitled".
You may enter your commands in the empty cells. A green border indicates the active cell (edit mode), while a blue border indicates an inactive cell (command mode). Press the Esc key to make a cell inactive; hover the mouse over an inactive cell and click, or press the Enter key, to make it active again.
To create a new cell, click the + icon at the top, or, with a cell inactive, press B to add a cell below or A to add a cell above the current cell.
# A hash (#) starts a comment
A = 1
Variable names are case-sensitive and cannot start with a number. They can contain letters, numbers, and underscores.
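A brief, illustrative example of these rules (the names used here are arbitrary):
my_value_2 = 10 # valid: letters, numbers, and underscores
My_value_2 = 20 # a different variable, because names are case-sensitive
# 2_value = 30 would raise a SyntaxError, since a name cannot start with a number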
a = "hello "
b = "world"
print (a + b)
• Strings: single (''), double ("") or triple quotes (""" or ''') can be used; for example, "python" and 'python' are the same string. The other kind of quote can appear unmatched inside a string:
"datatype's"
• Tuple ( ): an ordered collection that can hold items of different types. Tuples are "immutable", i.e., they cannot be modified after creation.
myTuple = ('abc', 2.5, A)
myTuple[2] # returns the value at index position 2, i.e. the third element (indexing starts from 0)
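A quick sketch illustrating immutability; the try/except simply shows the error that is raised when a tuple element is reassigned:
try:
    myTuple[0] = 'xyz' # attempting to modify a tuple element
except TypeError as e:
    print(e) # tuples do not support item assignment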
• List [ ]: an ordered collection, similar to a tuple but "mutable", i.e., it can be modified after creation.
myList = ['abc', 2.5, A]
myList.append('klm') # adds 'klm' to the end of the list
myList
myList2 = [1,2,3]
myList3 = [4,5,6]
myList2 + myList3 # concatenates the two lists
• Array [ ]: vectors (1-D) and matrices (>1-D) for numerical data manipulation are defined in NumPy, so we need to import numpy into our Python session.
import numpy as np # importing NumPy as "np" is the community convention; functions are then accessed as np.array, etc. Any alias may be used, or we may simply write "import numpy", but then the functions must be accessed as numpy.array, etc.
Lists are containers for elements of differing data types, whereas arrays are containers for elements of the same data type.
myArray2 = np.array(myList2)
myArray3 = np.array(myList3)
myArray2 + myArray3 # element-wise addition
myArray2.dot(myArray3) # dot (inner) product
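For the arrays above, the element-wise sum is array([5, 7, 9]) and the dot product is 1*4 + 2*5 + 3*6 = 32. A small sketch showing a 2-D array (matrix) built from the same lists:
myMatrix = np.array([myList2, myList3]) # a 2x3 matrix
myMatrix.shape # (2, 3): 2 rows, 3 columns
myMatrix.T # transpose: a 3x2 matrix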
import pandas as pd # the pandas library is designed for quick and easy data manipulation, reading, aggregation, and visualization
import numpy as np # NumPy is used to process arrays that store values of the same data type; it facilitates math operations on arrays and their vectorization
import matplotlib.pyplot as plt # to plot histograms and other statistical graphs
import os # the os module provides functions for creating and removing directories (folders), fetching their contents, and changing and identifying the current working directory
# Importing the libraries like this is the right way to access them; in other Python IDEs the commands above might not work unless the packages are installed first.
os.chdir("C:/Desktop/DataAnalysis/") # changes the current working directory to the required directory
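The DataFrame combined_PRAD used below is not loaded in this excerpt; a minimal sketch, assuming the data sit in a CSV file (the file name "combined_PRAD.csv" is a hypothetical placeholder):
combined_PRAD = pd.read_csv("combined_PRAD.csv") # hypothetical file name; adjust to your data
combined_PRAD.head() # shows the first five rows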
# .shape tells the shape of the data: how many rows and columns are present
combined_PRAD.shape
# Assign the numerical data to an "X" variable and the labels column to a "y" variable; these will be used in the next steps
X = combined_PRAD.iloc[:,:-1] # all columns except the last
y = combined_PRAD["labels"] # the class labels
# importing train_test_split
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, y, test_size=0.30, random_state=42)
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train) # fit the scaler on the training data and transform it
X_test = sc.transform(X_test) # apply the same scaling (without refitting) to the test data
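The evaluation code below relies on several scikit-learn metric functions and on a trained K-Nearest Neighbors model (KNN), none of which are defined in this excerpt. A minimal sketch, assuming a binary classification task and default hyperparameters:
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix, roc_auc_score, roc_curve)
from sklearn.neighbors import KNeighborsClassifier
KNN = KNeighborsClassifier(n_neighbors=5) # assumed hyperparameter (scikit-learn default)
KNN.fit(X_train, Y_train) # fit on the scaled training data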
# Assuming you have trained your KNN model and have X_test and Y_test
# KNN is your trained K Nearest Neighbors model
# Get predictions from the KNN model
Y_pred = KNN.predict(X_test)
# Create the confusion matrix
cm = confusion_matrix(Y_test, Y_pred)
KNN_probs = KNN.predict_proba(X_test)
KNN_probs = KNN_probs[:, 1] # keep only the probabilities of the positive class
KNN_auc = roc_auc_score(Y_test, KNN_probs)
KNN_fpr, KNN_tpr, _ = roc_curve(Y_test, KNN_probs)
#SVC_linear
from sklearn.svm import SVC
svm_linear = SVC(kernel='linear', probability=True, random_state=40)
svm_linear.fit(X_train, Y_train)
prediction = svm_linear.predict(X_test)
Accuracy = accuracy_score(Y_test,prediction)
print('accuracy',Accuracy)
print(precision_score(Y_test,prediction))
print(recall_score(Y_test,prediction))
print(f1_score(Y_test,prediction))
# Assuming you have trained your SVM linear model and have X_test and Y_test
# svm_linear is your trained Support Vector Machine with linear kernel model
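The ROC quantities for the linear-kernel SVM are not computed in this excerpt, although svm_linear_probs and svm_linear_auc are used in the final comparison; a sketch following the same pattern as the other kernels:
svm_linear_probs = svm_linear.predict_proba(X_test)
svm_linear_probs = svm_linear_probs[:, 1] # probabilities of the positive class
svm_linear_auc = roc_auc_score(Y_test, svm_linear_probs)
svm_linear_fpr, svm_linear_tpr, _ = roc_curve(Y_test, svm_linear_probs)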
#SVC_poly
from sklearn.svm import SVC
# Training an SVM classifier with a polynomial kernel
svm_poly = SVC(kernel='poly', probability=True, random_state=40)
svm_poly.fit(X_train, Y_train)
prediction = svm_poly.predict(X_test)
Accuracy = accuracy_score(Y_test,prediction)
print('accuracy',Accuracy)
print(precision_score(Y_test,prediction))
print(recall_score(Y_test,prediction))
print(f1_score(Y_test,prediction))
Y_pred_svm_poly = svm_poly.predict(X_test)
svm_poly_probs = svm_poly.predict_proba(X_test)
svm_poly_probs = svm_poly_probs[:, 1]
svm_poly_auc = roc_auc_score(Y_test, svm_poly_probs)
svm_poly_fpr, svm_poly_tpr, _ = roc_curve(Y_test, svm_poly_probs)
#SVC_RBF
from sklearn.svm import SVC
# Training an SVM classifier with an RBF kernel
svm_rbf = SVC(kernel='rbf', probability=True, random_state=40)
svm_rbf.fit(X_train, Y_train)
prediction = svm_rbf.predict(X_test)
Accuracy = accuracy_score(Y_test,prediction)
print('accuracy',Accuracy)
print(precision_score(Y_test,prediction))
print(recall_score(Y_test,prediction))
print(f1_score(Y_test,prediction))
# Assuming you have trained your SVM RBF model and have X_test and Y_test
# svm_rbf is your trained SVM with RBF kernel model
svm_rbf_probs = svm_rbf.predict_proba(X_test)
svm_rbf_probs = svm_rbf_probs[:, 1]
svm_rbf_auc = roc_auc_score(Y_test, svm_rbf_probs)
svm_rbf_fpr, svm_rbf_tpr, _ = roc_curve(Y_test, svm_rbf_probs)
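The logistic regression model (LR) evaluated below is not trained anywhere in this excerpt; a minimal sketch, assuming default hyperparameters (max_iter raised to help convergence on the scaled data):
#Logistic_Regression
from sklearn.linear_model import LogisticRegression
LR = LogisticRegression(max_iter=1000, random_state=40) # assumed settings
LR.fit(X_train, Y_train)
prediction = LR.predict(X_test)
Accuracy = accuracy_score(Y_test, prediction)
print('accuracy', Accuracy)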
print(precision_score(Y_test,prediction))
print(recall_score(Y_test,prediction))
print(f1_score(Y_test,prediction))
# Assuming you have trained your logistic regression model and have X_test and Y_test
# LR is your trained logistic regression model
LR_probs = LR.predict_proba(X_test)
LR_probs = LR_probs[:, 1]
LR_auc = roc_auc_score(Y_test, LR_probs)
LR_fpr, LR_tpr, _ = roc_curve(Y_test, LR_probs)
#Naive_Bayes
from sklearn.naive_bayes import GaussianNB
NB = GaussianNB()
NB.fit(X_train, Y_train)
prediction = NB.predict(X_test)
Accuracy = accuracy_score(Y_test,prediction)
print('accuracy',Accuracy)
print(precision_score(Y_test,prediction))
print(recall_score(Y_test,prediction))
print(f1_score(Y_test,prediction))
# Assuming you have trained your Naive Bayes model and have X_test and Y_test
# NB is your trained Gaussian Naive Bayes model
NB_probs = NB.predict_proba(X_test)
NB_probs = NB_probs[:, 1]
NB_auc = roc_auc_score(Y_test, NB_probs)
NB_fpr, NB_tpr, _ = roc_curve(Y_test, NB_probs)
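The decision tree model (DT) evaluated below is likewise not trained in this excerpt; a minimal sketch, assuming default hyperparameters:
#Decision_Tree
from sklearn.tree import DecisionTreeClassifier
DT = DecisionTreeClassifier(random_state=40) # assumed settings
DT.fit(X_train, Y_train)
prediction = DT.predict(X_test)
Accuracy = accuracy_score(Y_test, prediction)
print('accuracy', Accuracy)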
print(precision_score(Y_test,prediction))
print(recall_score(Y_test,prediction))
print(f1_score(Y_test,prediction))
# Assuming you have trained your decision tree model and have X_test and Y_test
# DT is your trained decision tree model
DT_probs = DT.predict_proba(X_test)
DT_probs = DT_probs[:, 1]
DT_auc = roc_auc_score(Y_test, DT_probs)
DT_fpr, DT_tpr, _ = roc_curve(Y_test, DT_probs)
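The multi-layer perceptron (MLP) evaluated below is not trained in this excerpt; a minimal sketch, assuming default hyperparameters (max_iter raised to help convergence):
#MLP
from sklearn.neural_network import MLPClassifier
MLP = MLPClassifier(max_iter=1000, random_state=40) # assumed settings
MLP.fit(X_train, Y_train)
prediction = MLP.predict(X_test)
Accuracy = accuracy_score(Y_test, prediction)
print('accuracy', Accuracy)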
print(precision_score(Y_test,prediction))
print(recall_score(Y_test,prediction))
print(f1_score(Y_test,prediction))
# Assuming you have trained your MLP model and have X_test and Y_test
# MLP is your trained multi-layer perceptron model
MLP_probs = MLP.predict_proba(X_test)
MLP_probs = MLP_probs[:,1]
MLP_auc = roc_auc_score(Y_test, MLP_probs)
MLP_fpr, MLP_tpr, _ = roc_curve(Y_test, MLP_probs)
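The AdaBoost model (AB) used in the final comparison is not trained in this excerpt; a minimal sketch, assuming default hyperparameters, placed before the metrics block that appears to belong to it:
#AdaBoost
from sklearn.ensemble import AdaBoostClassifier
AB = AdaBoostClassifier(random_state=40) # assumed settings
AB.fit(X_train, Y_train)
prediction = AB.predict(X_test)
Accuracy = accuracy_score(Y_test, prediction)
print('accuracy', Accuracy)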
print(precision_score(Y_test,prediction))
print(recall_score(Y_test,prediction))
print(f1_score(Y_test,prediction))
# Assuming you have trained your AdaBoost model and have X_test and Y_test
# AB is your trained AdaBoost model
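The random forest model (RF), whose probabilities are computed just below, is also not trained in this excerpt; a minimal sketch, assuming default hyperparameters:
#Random_Forest
from sklearn.ensemble import RandomForestClassifier
RF = RandomForestClassifier(random_state=40) # assumed settings
RF.fit(X_train, Y_train)
prediction = RF.predict(X_test)
Accuracy = accuracy_score(Y_test, prediction)
print('accuracy', Accuracy)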
print(precision_score(Y_test,prediction))
print(recall_score(Y_test,prediction))
print(f1_score(Y_test,prediction))
# Assuming you have trained your random forest model and have X_test and Y_test
# RF is your trained random forest model
RF_probs = RF.predict_proba(X_test)
RF_probs = RF_probs[:,1]
RF_auc = roc_auc_score(Y_test, RF_probs)
RF_fpr, RF_tpr, _ = roc_curve(Y_test, RF_probs)
# Collect the positive-class probabilities from each trained model (roc_auc_score expects a 1-D array for binary labels)
KNN_probs = KNN.predict_proba(X_test)[:, 1]
AB_probs = AB.predict_proba(X_test)[:, 1]
DT_probs = DT.predict_proba(X_test)[:, 1]
LR_probs = LR.predict_proba(X_test)[:, 1]
RF_probs = RF.predict_proba(X_test)[:, 1]
NB_probs = NB.predict_proba(X_test)[:, 1]
MLP_probs = MLP.predict_proba(X_test)[:, 1]
svm_linear_probs = svm_linear.predict_proba(X_test)[:, 1]
svm_rbf_probs = svm_rbf.predict_proba(X_test)[:, 1]
svm_poly_probs = svm_poly.predict_proba(X_test)[:, 1]
# calculate scores
KNN_auc = roc_auc_score(Y_test, KNN_probs)
AB_auc = roc_auc_score(Y_test, AB_probs)
DT_auc = roc_auc_score(Y_test, DT_probs)
LR_auc = roc_auc_score(Y_test, LR_probs)
NB_auc = roc_auc_score(Y_test, NB_probs)
RF_auc = roc_auc_score(Y_test, RF_probs)
MLP_auc = roc_auc_score(Y_test, MLP_probs)
svm_poly_auc = roc_auc_score(Y_test, svm_poly_probs)
svm_linear_auc = roc_auc_score(Y_test, svm_linear_probs)
svm_rbf_auc = roc_auc_score(Y_test, svm_rbf_probs)
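To compare the classifiers visually, the ROC curves can be drawn on one figure with the matplotlib import from earlier; a minimal sketch using the positive-class probabilities and AUC scores computed above:
plt.figure(figsize=(8, 6))
models = [('KNN', KNN_probs, KNN_auc), ('AdaBoost', AB_probs, AB_auc),
          ('Decision Tree', DT_probs, DT_auc), ('Logistic Regression', LR_probs, LR_auc),
          ('Random Forest', RF_probs, RF_auc), ('Naive Bayes', NB_probs, NB_auc),
          ('MLP', MLP_probs, MLP_auc), ('SVM linear', svm_linear_probs, svm_linear_auc),
          ('SVM poly', svm_poly_probs, svm_poly_auc), ('SVM RBF', svm_rbf_probs, svm_rbf_auc)]
for name, probs, auc in models:
    fpr, tpr, _ = roc_curve(Y_test, probs) # ROC curve for each classifier
    plt.plot(fpr, tpr, label='%s (AUC = %.3f)' % (name, auc))
plt.plot([0, 1], [0, 1], linestyle='--', color='grey') # chance line
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.legend()
plt.show()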