SVM (Support Vector Machine) For Classification - by Aditya Kumar - Towards Data Science
There are many cases where the separation is not as simple as shown above. In
that case, the data needs to be moved from its original dimension to a higher
(Nth) dimension. The function that does this is called a kernel. Put more simply, a
kernel is a functional relationship between two observations. It adds more
dimensions to the data so that we can easily separate the classes.
1. Linear Kernels
2. Polynomial Kernels
In practice, it is very difficult to separate real data with a straight hyperplane.
Consider the image below, where the points are mixed together. You cannot separate
them using a straight line in 2D.
Image by Author
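To make the idea of "adding dimensions" concrete, here is a minimal sketch. It does not use the article's data: the ring-shaped points come from scikit-learn's make_circles, and the extra dimension (squared distance from the origin) is added by hand purely for illustration.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Synthetic "mixed" points (an assumption): one class forms a ring around the other
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# A straight line in the original 2 dimensions cannot separate the rings
flat = SVC(kernel="linear").fit(X, y)
acc_2d = flat.score(X, y)

# Add a third dimension by hand: squared distance from the origin
X_lifted = np.column_stack([X, (X ** 2).sum(axis=1)])
lifted = SVC(kernel="linear").fit(X_lifted, y)
acc_3d = lifted.score(X_lifted, y)

print(f"linear SVM, 2 dimensions: {acc_2d:.2f}")
print(f"linear SVM, 3 dimensions: {acc_3d:.2f}")
```

A polynomial kernel such as SVC(kernel="poly", degree=2) performs a similar lift implicitly, without building the extra column yourself.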
1. Face detection
2. Handwriting detection
3. Image Classifications
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
path = 'https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DA0101EN/automobileEDA.csv'
df = pd.read_csv(path)
df.head()
Image by Author
df.head() shows the first 5 rows of every column. We can use df.tail() to get the
last 5 rows and, similarly, df.head(10) to get the first 10 rows.
The data is about cars, and we need to predict the price of a car from the other
columns. We will use SVM regression to predict the price.
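As a quick illustration of head() and tail(), here is a small sketch on a made-up 12-row frame. The column names and values are placeholders, not the car data.

```python
import pandas as pd

# Toy frame with 12 rows, standing in for the car data (values are made up)
toy = pd.DataFrame({"id": range(12), "price": [1000.0 * i for i in range(12)]})

first_five = toy.head()     # rows 0 to 4
last_five = toy.tail()      # rows 7 to 11
first_ten = toy.head(10)    # rows 0 to 9

print(first_five.index.tolist(), last_five.index.tolist(), len(first_ten))
```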
df.dtypes
symboling int64
normalized-losses int64
make object
aspiration object
num-of-doors object
body-style object
drive-wheels object
engine-location object
wheel-base float64
length float64
width float64
height float64
curb-weight int64
engine-type object
num-of-cylinders object
engine-size int64
fuel-system object
bore float64
stroke float64
compression-ratio float64
horsepower float64
peak-rpm float64
city-mpg int64
highway-mpg int64
price float64
city-L/100km float64
horsepower-binned object
diesel int64
gas int64
dtype: object
df.describe()
Image by Author
In the above data frame, some of the columns are not numeric. So we will consider
only the columns with numeric values and convert all integer columns to float.
df.dtypes
for x in df:
    if df[x].dtypes == "int64":
        df[x] = df[x].astype(float)
        print (df[x].dtypes)
Out:
float64
float64
float64
float64
float64
float64
float64
float64
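The same conversion can also be written without an explicit loop. This is a sketch on a made-up three-column frame (the values are placeholders), not a replacement for the code above.

```python
import pandas as pd

# Toy frame (values are made up): one integer, one float and one text column
toy = pd.DataFrame({
    "engine-size": [130, 152, 109],
    "bore": [3.47, 2.68, 3.19],
    "make": ["audi", "bmw", "volvo"],
})

# Pick the int64 columns and cast only those to float
int_cols = toy.select_dtypes(include="int64").columns
toy = toy.astype({col: "float64" for col in int_cols})

print(toy.dtypes)
```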
Preparing the Data
As with the classification task, in this section we will divide our data into
attributes and labels, and then into training and test sets. We will create 2 data
sets, one for the price (y) and one for everything else (X, that is, df without price).
Since our data frame has several columns in object format, for this analysis we
remove all columns of object type, and we replace every NaN value with the mean
of its column.
df = df.select_dtypes(exclude=['object'])
df=df.fillna(df.mean())
X = df.drop('price',axis=1)
y = df['price']
Here the X variable contains all the columns from the dataset except the 'price'
column, which is the label. The y variable contains the values from the 'price'
column. In other words, X contains the attribute set and y contains the
corresponding labels.
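Here is a minimal sketch of the same preparation on a made-up four-row frame, so the effect of each step is visible. The values are placeholders, and the sketch selects numeric columns with include="number" instead of excluding object columns, which is an assumption made to keep it independent of how pandas stores text.

```python
import numpy as np
import pandas as pd

# Toy frame (values are made up) with one text column and one missing value
toy = pd.DataFrame({
    "make": ["audi", "bmw", "volvo", "audi"],
    "horsepower": [111.0, np.nan, 114.0, 102.0],
    "engine-size": [130.0, 152.0, 141.0, 109.0],
    "price": [13495.0, 16500.0, 16845.0, 13950.0],
})

numeric = toy.select_dtypes(include="number")   # keep numeric columns only
numeric = numeric.fillna(numeric.mean())        # NaN becomes the column mean
X = numeric.drop("price", axis=1)               # attributes
y = numeric["price"]                            # label

print(X)
print(y)
```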
Training SVM
We will create a support vector regression object, svr, with a linear kernel. Before fitting, we standardize the features with StandardScaler.
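from sklearn.preprocessing import StandardScaler

# Fit the scaler on the training data only, then apply it to both sets
sc = StandardScaler().fit(X_train)
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)
X_test_std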
Out:
array([[ 0.17453157, -0.7473421 , -0.70428107, -1.4995245 ,
-1.05619832,
-0.67877552, -1.30249126, -0.87278899, -1.15396095,
-0.47648372,
-0.09140157, -0.90774727, 0.59090608, 2.00340082,
1.79022864,
-1.50033307, -0.29738086, 0.29738086],
[-1.42118568, -1.74885637, 0.63398001, 0.14076744,
0.30739662,
0.50614488, -0.1142863 , -0.38613195, -0.24348674,
0.32881569,
3.4734668 , -0.82496629, -1.30634872, 0.73955013,
0.31375141,
-0.82735207, 3.36269123, -3.36269123],
[-0.62332705, -0.01896807, 2.63290164, 2.08080815,
1.2007864 ,
2.05879919, 1.74841125, 0.63584784, 1.38777956,
0.89923611,
3.05894722, -0.2179058 , -2.04417004, -0.05035655,
-0.86743037,
-0.18802012, 3.36269123, -3.36269123],
[-0.62332705, 0.16312543, 0.29517974, 0.64867509,
0.30739662,
0.58786352, 1.09156527, 1.34150055, 0.36349607,
0.06038255,
-0.2572094 , 1.35493275, 0.16929391, -1.31420724,
-1.31037354,
1.61715245, -0.29738086, 0.29738086],
[-1.42118568, -0.01896807, 0.9897203 , 1.15658275,
0.30739662,
0.17927028, 1.20136639, 0.85484351, -0.24348674,
0.32881569,
-0.20194012, 1.46530738, 0.16929391, -0.99824457,
-1.0150781 ,
1.02334568, -0.29738086, 0.29738086],
[ 0.9723902 , -0.86873777, -0.22996069, -0.18396041,
-0.16280853,
0.83301947, -0.51623682, -0.4104648 , -0.54697814,
0.4965864 ,
-0.2572094 , -0.49384238, 0.27469695, 0.26560612,
0.46139913,
-0.47216765, -0.29738086, 0.29738086],
[ 0.9723902 , -0.01896807, 0.19353966, 0.28231547,
0.21335559,
-0.22932296, -0.06134647, 0.24652221, -0.54697814,
0.4965864 ,
-0.39538259, 0.19599908, 0.80171217, -0.99824457,
-0.86743037,
1.02334568, -0.29738086, 0.29738086],
[ 0.17453157, -1.08118019, -0.50100091, -1.26638656,
-1.05619832,
0.34270758, -1.08484976, -0.82412328, -1.07808809,
-0.74491686,
-0.2572094 , -1.12849654, -0.67393045, 1.52945681,
1.19963775,
-1.28401775, -0.29738086, 0.29738086],
[-0.62332705, 1.98406049, 0.43069985, 0.2406837 ,
-0.49195214,
0.26098893, 0.44452297, 0.92784206, -0.09174103,
-0.20805059,
-0.2572094 , 0.49952933, -1.83336395, -0.6822819 ,
-0.4244872 ,
0.54264497, -0.29738086, 0.29738086],
[-0.62332705, -0.95978452, -0.50100091, -0.63358358,
-0.6800342 ,
-0.27018228, -0.89661927, -0.67812617, -0.54697814,
-0.74491686,
-0.2572094 , -0.90774727, -0.67393045, 0.73955013,
0.9043423 ,
-0.82735207, -0.29738086, 0.29738086],
[-0.62332705, -0.2314105 , 0.02413952, 0.32394725,
0.30739662,
0.75130082, -0.22212668, -0.09413772, 0.21175037,
0.46303226,
-0.36774795, -0.52143604, -0.67393045, 0.10762479,
0.16610369,
-0.33555826, -0.29738086, 0.29738086],
[ 1.77024883, -0.01896807, -1.55128176, -0.41709835,
-0.39791111,
-0.84221282, 0.51314867, 1.65782762, 1.53952526,
-1.18112071,
-0.11903621, 2.87258398, 1.64493653, -1.31420724,
-0.86743037,
1.61715245, -0.29738086, 0.29738086],
[ 0.9723902 , -0.86873777, -0.22996069, -0.18396041,
-0.16280853,
0.83301947, -0.5378049 , -0.70245902, -1.2298338 ,
0.4965864 ,
3.61163999, -1.40443312, -0.67393045, 1.84541948,
2.23317181,
-1.43212554, 3.36269123, -3.36269123],
[-0.62332705, -0.95978452, -0.50100091, -0.63358358,
-0.6800342 ,
-0.27018228, -0.51623682, -0.38613195, -0.24348674,
0.32881569,
3.4734668 , -1.29405849, -1.30634872, 1.37147547,
0.75669458,
-1.20342969, 3.36269123, -3.36269123],
[ 1.77024883, -0.01896807, -0.46712088, -0.05906508,
0.21335559,
-1.41424335, 0.75039751, 0.73317925, 0.97047888,
2.04007695,
-0.80990217, 1.16177714, -0.25231827, -0.99824457,
-1.0150781 ,
1.02334568, -0.29738086, 0.29738086],
[ 0.17453157, -0.01896807, 1.20994048, 1.56457414,
2.61140185,
0.83301947, 0.81510174, 0.24652221, -0.54697814,
0.4965864 ,
-0.39538259, 0.19599908, 0.80171217, -0.99824457,
-0.86743037,
1.02334568, -0.29738086, 0.29738086],
[ 0.17453157, -0.65629534, -0.83980118, -1.99077944,
-0.86811626,
-0.43361958, -1.14171105, -0.82412328, -1.60919805,
0.53014054,
-0.20194012, -0.74218531, 1.85574262, 0.73955013,
0.46139913,
-0.82735207, -0.29738086, 0.29738086],
[ 1.77024883, 0.83080162, 0.07495956, 1.05666649,
0.30739662,
0.99645676, 0.33080038, -0.11847057, -3.01284579,
-3.96611452,
-0.17430548, 0.19599908, 0.27469695, -0.6822819 ,
-0.4244872 ,
0.54264497, -0.29738086, 0.29738086],
[-0.62332705, -0.50455075, -0.3654808 , -0.53366732,
-0.30387008,
-0.14760431, -0.48878654, -0.38613195, -0.69872384,
1.10056096,
-0.2572094 , -0.46624873, 1.43413044, 0.26560612,
0.31375141,
-0.47216765, -0.29738086, 0.29738086],
[ 0.9723902 , 1.16463971, -0.83980118, -1.38295553,
-0.6800342 ,
-1.16908741, -1.16523986, -0.82412328, -1.3815795 ,
-0.07383402,
-0.14667084, -0.96293458, 0.80171217, 0.89753147,
1.05199003,
-0.93047013, -0.29738086, 0.29738086],
[ 0.9723902 , -0.86873777, -0.22996069, -0.18396041,
-0.16280853,
0.83301947, -0.42996451, -0.70245902, -1.2298338 ,
0.4965864 ,
3.61163999, -0.96293458, -1.30634872, 1.84541948,
1.64258092,
-1.43212554, 3.36269123, -3.36269123],
[-0.62332705, -1.14187802, -0.29772074, -0.02575966,
-0.20982905,
0.50614488, 0.21903853, -0.43479765, 1.08428815,
-2.05352841,
-0.6164597 , 0.22359274, -0.67393045, -0.36631922,
-1.16272582,
0.14554438, -0.29738086, 0.29738086],
[-0.62332705, -0.01896807, 2.42962147, 2.13909264,
1.76503258,
-0.35190093, 2.99543824, 3.21513015, 1.12222458,
3.08025536,
-0.50592114, 2.01718056, -0.7793335 , -1.63016991,
-1.75331671,
2.36930768, -0.29738086, 0.29738086],
[ 0.17453157, 1.37708213, -0.70428107, -0.43375106,
-0.86811626,
-0.43361958, -0.72407465, -0.67812617, -0.54697814,
-0.74491686,
-0.2572094 , -0.90774727, -0.67393045, 0.58156879,
0.46139913,
-0.71712242, -0.29738086, 0.29738086],
[-0.62332705, -0.01896807, 0.02413952, 0.32394725,
0.30739662,
0.75130082, -0.18683347, -0.09413772, 0.21175037,
0.46303226,
3.52873607, -1.07330922, -0.99013959, 1.68743815,
1.64258092,
-1.3601287 , 3.36269123, -3.36269123],
[ 1.77024883, -0.01896807, -1.55128176, -0.41709835,
-0.39791111,
-0.84221282, 0.42687636, 1.65782762, 1.53952526,
-1.18112071,
-0.11903621, 2.87258398, 1.64493653, -1.31420724,
-0.86743037,
1.61715245, -0.29738086, 0.29738086],
[ 0.9723902 , -0.01896807, -0.22996069, -0.18396041,
-0.16280853,
0.83301947, -0.64564528, -0.4104648 , -0.54697814,
0.4965864 ,
-0.2572094 , -0.49384238, 0.27469695, 0.26560612,
0.46139913,
-0.47216765, -0.29738086, 0.29738086],
[ 1.77024883, -0.01896807, -0.70428107, -1.21642843,
-0.77407523,
0.79216014, -0.55741224, -0.4104648 , -0.54697814,
0.4965864 ,
-0.39538259, -0.35587409, 0.80171217, -0.20833789,
-0.27683948,
-0.02818713, -0.29738086, 0.29738086],
[ 1.77024883, 1.92336265, -0.70428107, -0.41709835,
1.15376589,
-1.41424335, 0.47001251, 0.61151499, 2.29825376,
-0.47648372,
-0.11903621, 1.10658982, 0.80171217, -0.99824457,
-0.57213493,
1.02334568, -0.29738086, 0.29738086],
[-0.62332705, 0.67905703, 2.42962147, 2.13909264,
1.76503258,
-0.35190093, 2.99543824, 3.21513015, 1.12222458,
3.08025536,
-0.50592114, 2.01718056, -0.7793335 , -1.63016991,
-1.75331671,
2.36930768, -0.29738086, 0.29738086],
[-0.62332705, -0.01896807, 1.92142106, 1.92260741,
2.37629928,
1.07817541, 1.89546632, 2.0228204 , 1.08428815,
0.46303226,
-0.53355578, 2.18274251, 0.59090608, -1.63016991,
-1.60566899,
2.36930768, -0.29738086, 0.29738086],
[ 1.77024883, 0.83080162, -0.56876096, -0.40877199,
-0.0687675 ,
-1.6593993 , -0.07507161, -1.11611751, -0.01681188,
0.01643855,
-0.14667084, 0.88584055, 1.85574262, -1.47218858,
-1.16272582,
1.96972521, -0.29738086, 0.29738086],
[-0.62332705, -1.26327369, -0.50100091, -0.35048751,
-1.05619832,
2.22223648, -0.48682581, -0.82412328, -1.07808809,
-0.74491686,
-0.2572094 , -1.12849654, -0.67393045, 0.26560612,
0.16610369,
-0.47216765, -0.29738086, 0.29738086],
[-0.62332705, -0.01896807, 2.63290164, 2.08080815,
1.2007864 ,
2.05879919, 1.85625163, 0.63584784, 1.38777956,
0.89923611,
3.05894722, -0.2179058 , -2.04417004, -0.05035655,
-0.86743037,
-0.18802012, 3.36269123, -3.36269123],
[ 0.17453157, -0.14036374, -0.83980118, -1.38295553,
-0.96215729,
-1.16908741, -0.80446476, -0.67812617, -1.15396095,
0.46303226,
-0.64409434, -0.02475019, 0.80171217, -0.20833789,
-0.12919176,
-0.02818713, -0.29738086, 0.29738086],
[-0.62332705, -0.01896807, 0.29517974, 0.76524406,
0.49547868,
0.58786352, 0.04845465, -0.4104648 , -0.54697814,
0.4965864 ,
-0.2572094 , -0.41106141, 0.80171217, -0.05035655,
0.01845597,
-0.18802012, -0.29738086, 0.29738086],
[ 0.9723902 , -0.56524859, 0.07495956, 1.05666649,
0.30739662,
0.99645676, 0.30727157, -0.11847057, 0.78079675,
-0.61070029,
-0.17430548, 0.19599908, 0.27469695, -0.6822819 ,
-0.4244872 ,
0.54264497, -0.29738086, 0.29738086],
[ 0.9723902 , 1.25568646, 0.1257796 , 0.22403099,
0.2603761 ,
0.26098893, 0.56020629, 0.24652221, -0.54697814,
0.4965864 ,
-0.53355578, 0.33396738, 0.80171217, -1.1562259 ,
-1.31037354,
1.30375443, -0.29738086, 0.29738086],
[ 0.17453157, 0.07207868, -0.39936082, -0.12567592,
-0.20982905,
-0.84221282, -0.26134137, -0.09413772, 0.06000467,
0.69791126,
-0.39538259, -0.41106141, -0.25231827, -0.05035655,
0.16610369,
-0.18802012, -0.29738086, 0.29738086],
[-0.62332705, -0.01896807, 2.63290164, 2.08080815,
1.2007864 ,
2.05879919, 1.3562644 , -0.14280343, 0.47730535,
-0.20805059,
-0.42301723, -0.16271848, -0.25231827, -0.99824457,
-1.0150781 ,
1.02334568, -0.29738086, 0.29738086],
[ 0.9723902 , -1.20257586, -0.83980118, -1.41626095,
-1.15023935,
0.01583299, -0.95740203, -0.70245902, 1.08428815,
-2.99304439,
-0.2572094 , -0.93534092, -0.46312436, 0.89753147,
0.75669458,
-0.93047013, -0.29738086, 0.29738086]])
svr.fit(X_train_std,y_train)
y_test_pred = svr.predict(X_test_std)
y_train_pred = svr.predict(X_train_std)
y_test_pred
from sklearn.metrics import r2_score
r2_score(y_train,y_train_pred)
Out:
0.8510467833352241
r2_score(y_test,y_test_pred)
Out:
0.720783662521954
Our R² (R squared) score is 0.72 for the test data and 0.85 for the train data, which
is a good value.
The above is the graph between the actual and predicted values.
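The regression steps of this section (split, scale, fit, score) can be sketched end to end as follows. The train/test split and the creation of svr are not shown in the article, so everything here is an assumption: the data is synthetic (make_regression), and the split ratio and the value of C are chosen only for this toy example.

```python
from sklearn.datasets import make_regression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the car data (an assumption, not the real dataset)
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Standardize using statistics from the training set only
sc = StandardScaler().fit(X_train)
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)

# Linear-kernel support vector regression; C is picked for this toy data only
svr = SVR(kernel="linear", C=100.0)
svr.fit(X_train_std, y_train)

r2_train = r2_score(y_train, svr.predict(X_train_std))
r2_test = r2_score(y_test, svr.predict(X_test_std))
print(f"train R^2: {r2_train:.2f}, test R^2: {r2_test:.2f}")
```

Fitting the scaler on the training set alone keeps information from the test set out of the model.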
Here we will use the diabetes data that I used in my earlier story on KNN:
https://towardsdatascience.com/knn-algorithm-what-when-why-how-41405c16c36f
Let's predict the result for the same dataset using SVM for classification.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
data = pd.read_csv("../input/diabetes.csv")
data.head()
Data from the dataset.
We read the CSV file with pd.read_csv, and with head() we can see the top 5 rows.
There are some columns where the value cannot be zero. For example, the glucose
value cannot be 0 for a human. Similarly, blood pressure, skin thickness, insulin,
and BMI cannot be zero for a human.
non_zero = ['Glucose','BloodPressure','SkinThickness','Insulin','BMI']
for column in non_zero:
    # Treat zeros as missing, then fill them with the (truncated) column mean
    data[column] = data[column].replace(0,np.nan)
    mean = int(data[column].mean(skipna = True))
    data[column] = data[column].replace(np.nan,mean)
    print(data[column])
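To see what this replacement does, here is a small sketch on a made-up frame with two of the columns. The values are placeholders, not rows from the diabetes data.

```python
import numpy as np
import pandas as pd

# Toy frame (values are made up): each column contains one impossible zero
toy = pd.DataFrame({"Glucose": [148, 0, 183, 89], "BMI": [33.6, 26.6, 0.0, 28.1]})

for column in ["Glucose", "BMI"]:
    toy[column] = toy[column].replace(0, np.nan)   # mark zeros as missing
    mean = int(toy[column].mean(skipna=True))      # truncated mean of the non-missing values
    toy[column] = toy[column].fillna(mean)         # fill the gaps with that mean

print(toy)
```

The zero in Glucose becomes 140 (the mean of 148, 183 and 89), and the zero in BMI becomes 29 (the mean 29.43 truncated by int()).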
from sklearn.model_selection import train_test_split
X =data.iloc[:,0:8]
y =data.iloc[:,8]
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2,random_state=0, stratify=y)
X.head()
Image by Author
For X we take all the rows of the columns at positions 0 to 7. Similarly, for y we
take all the rows of the column at position 8 (the ninth column).
train_test_split, imported above, splits the data. We set test_size to 0.2, which
means that 20% of the data is kept aside to test the model at a later stage.
Out:
array([0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,
0, 0,
1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0,
1, 0,
1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0,
0, 0,
0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1,
0, 0,
1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1,
0, 0,
0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0,
0, 0,
0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1,
0, 0])
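The code that trains the classifier and produces this prediction array is not shown above. A minimal sketch of what such a step could look like follows, assuming a linear SVC with C = 0.01 (the value the article mentions later) and synthetic data from make_classification in place of the diabetes file.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in (an assumption): 768 rows, 8 features, imbalanced classes
X, y = make_classification(
    n_samples=768, n_features=8, weights=[0.65, 0.35], random_state=0
)

# stratify=y keeps the class proportions the same in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Linear-kernel classifier; C=0.01 is taken from the text, not from shown code
clf = SVC(kernel="linear", C=0.01)
clf.fit(X_train, y_train)

y_test_pred = clf.predict(X_test)
print(y_test_pred[:10])
```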
We have an array of predictions, but we need to evaluate our model to check its accuracy.
Let's start with the confusion matrix.
confusion_matrix(y_test,y_test_pred)
Out:
array([[92, 8],
[26, 28]])
In the confusion matrix, the diagonal values 92 and 28 are the correct predictions
(120 in total), and the off-diagonal values 8 and 26 are the predictions that we
missed (34 in total).
accuracy_score(y_test,y_test_pred)
Out:
0.7792207792207793
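The accuracy above can be recovered by hand from the confusion matrix. The sketch below does that arithmetic and also derives precision and recall for class 1, which the article does not report. It relies on scikit-learn's convention that rows are actual classes and columns are predicted classes.

```python
import numpy as np

# Confusion matrix from the article: rows = actual, columns = predicted
cm = np.array([[92, 8],
               [26, 28]])
tn, fp, fn, tp = cm.ravel()

accuracy = (tn + tp) / cm.sum()    # (92 + 28) / 154
precision = tp / (tp + fp)         # 28 / 36
recall = tp / (tp + fn)            # 28 / 54

print(f"accuracy={accuracy:.4f} precision={precision:.4f} recall={recall:.4f}")
# accuracy=0.7792 precision=0.7778 recall=0.5185
```

The recall of about 0.52 shows that the model misses almost half of the positive cases, which the overall accuracy alone does not reveal.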
Let’s figure out the difference between the actual and predicted values.
df=pd.DataFrame({'Actual':y_test, 'Predicted':y_test_pred})
df
Actual vs Predicted
We created our linear model with C as 0.01. But how do we ensure it's the best value?
One option is to change it manually: we can assign different values and run the code
one by one, but this process is very lengthy and time-consuming.
We will use a grid search, where we assign a set of candidate values for C in a
dictionary and the search tells us which value of C is best for the model. To do so
we need to import GridSearchCV.
svm.grid.fit(X_train,y_train)
[Parallel(n_jobs=1)]: Done 120 out of 120 | elapsed: 43.8s finished
Out:
svm.grid.best_params_
Out:
{'C': 0.1}
This will give us the result of the best C value for the model
linsvm_clf = svm.grid.best_estimator_
accuracy_score(y_test,linsvm_clf.predict(X_test))
Out:
0.7597402597402597
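This is the test accuracy of the model with the C value that the grid search selected from the candidates above. Note that it is slightly lower than the 0.78 obtained earlier with C = 0.01, since grid search picks C by cross-validation on the training data rather than by test accuracy.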
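The grid search setup itself is not shown in the article. A minimal sketch of the steps described above could look like this, assuming synthetic data, a shortened list of C candidates, and 5-fold cross-validation (all three are assumptions for illustration).

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the diabetes data (an assumption)
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Candidate values for C; scikit-learn requires C to be strictly positive
param_grid = {"C": [0.01, 0.1, 1, 10]}

grid = GridSearchCV(SVC(kernel="linear"), param_grid, cv=5, scoring="accuracy")
grid.fit(X_train, y_train)

print(grid.best_params_)                             # C with the best mean CV accuracy
print(grid.best_score_)                              # that mean CV accuracy
print(grid.best_estimator_.score(X_test, y_test))    # accuracy on the held-out test set
```

best_score_ is measured by cross-validation on the training data, so it can differ from the test accuracy in the last line.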
In a similar way we can try kernel='poly'. For 'rbf' we need to define gamma
values as well:
param = {'C':(0,0.01,0.5,0.1,1,2,5,10,50,100,500,1000), 'gamma':(0,0.1,0.2,2,10)}
Or, without a grid search, with a single value of C and gamma:
from sklearn import svm
svm1 = svm.SVC(kernel='rbf',gamma=0.5, C = 0.01)
svm1.fit(X_train,y_train)
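For the RBF kernel, the grid has two parameters, so every combination of C and gamma is tried. A minimal sketch follows, again on synthetic data and with a small grid chosen only for illustration (both are assumptions, and the values differ from the article's list).

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the diabetes data (an assumption)
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# 3 values of C times 3 values of gamma = 9 combinations
param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}

grid = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
grid.fit(X_train, y_train)    # fit on the training data only

print(grid.best_params_)
print(grid.best_estimator_.score(X_test, y_test))
```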
adityakumar529/Coursera_Capstone (github.com)
Data Scientist with 6 years of experience. To find out more connect with me on
https://www.linkedin.com/in/adityakumar529/