SVM Using Python

This document provides an overview of support vector machines (SVMs) and how they can be used for classification problems in Python. It explains that SVMs find the optimal hyperplane that separates classes by maximizing the margin between them. The document discusses support vectors, kernels, different kernel types (linear, polynomial, radial basis function), and provides an example of using SVMs for image classification with character recognition. It also covers classification reports, precision, recall, f1 score, support, and confusion matrices for evaluating SVM performance.

Support Vector Machine using Python

What is support vector?


• “Support Vector Machine” (SVM) is a supervised
machine learning algorithm which can be used for
both classification and regression problems.
However, it is mostly used for classification.
• In this algorithm, we plot each data item as a point
in n-dimensional space (where n is the number of
features you have), with the value of each feature
being the value of a particular coordinate.
• Then, we perform classification by finding the
hyperplane that differentiates the two classes best.
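As a minimal sketch of this idea, assuming scikit-learn is available (the toy data and labels below are made up for illustration):

```python
# Minimal sketch: fitting an SVM classifier on a toy 2-D dataset.
from sklearn.svm import SVC

X = [[0, 0], [0, 1], [1, 0],   # class 0: points near the origin
     [2, 2], [2, 3], [3, 2]]   # class 1: points further away
y = [0, 0, 0, 1, 1, 1]

clf = SVC(kernel='linear')
clf.fit(X, y)

# A new point is classified by which side of the hyperplane it falls on.
print(clf.predict([[3, 3], [0, 0]]))
```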
Support Vector Machine
• Generally, the Support Vector Machine is considered to be
a classification approach, but it can be employed for
both classification and regression problems.
• It can easily handle multiple continuous and
categorical variables.
• SVM constructs a hyperplane in multidimensional space
to separate different classes. SVM generates the optimal
hyperplane in an iterative manner, so as to
minimize the classification error.
• The core idea of SVM is to find a maximum
marginal hyperplane(MMH) that best divides the
dataset into classes.
Decision Vectors
Definitions
• Support Vectors
– Support vectors are the data points which are
closest to the hyperplane. These points define the
separating line through the margin calculation,
and are the most relevant to the construction
of the classifier.
• Hyperplane
– A hyperplane is a decision plane which
separates a set of objects having
different class memberships.
Definitions (contd.)
• Margin
– A margin is the gap between the two lines
through the closest points of the two classes.
– It is calculated as the perpendicular
distance from the separating line to the
support vectors (the closest points).
– A larger margin between the classes is
considered a good margin; a smaller
margin is a bad margin.
How SVM works ?
• The main objective is to segregate the given dataset
in the best possible way.
• The distance between the nearest points of the
two classes is known as the margin.
• The objective is to select a hyperplane with the
maximum possible margin between support vectors
in the given dataset. SVM searches for the maximum
marginal hyperplane in the following steps:
– Generate hyperplanes which segregate the classes in
the best way.
– Select the hyperplane with the maximum
distance from the nearest data points of either class.
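For a linear kernel, the margin width can be read off a fitted model as 2/||w||. A small sketch, assuming scikit-learn and using made-up separable data:

```python
# Sketch: recovering the maximum margin from a fitted linear SVM.
import numpy as np
from sklearn.svm import SVC

X = [[0, 0], [0, 1], [1, 0],   # class 0
     [3, 3], [3, 4], [4, 3]]   # class 1
y = [0, 0, 0, 1, 1, 1]

clf = SVC(kernel='linear', C=1e6).fit(X, y)  # large C ~ hard margin

w = clf.coef_[0]
margin = 2 / np.linalg.norm(w)   # width of the maximum-margin band
print(clf.support_vectors_)      # only the closest points define the plane
print(margin)
```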
How SVM works ?
Non-linear and inseparable planes
• Some problems can’t be solved using
linear hyperplane.
• In such situations, SVM uses a kernel trick
to transform the input space into a higher-
dimensional space, as shown on the right.
• The data points are plotted on the x-axis and
z-axis (z is the squared sum of both x and y:
z = x^2 + y^2).
• Now you can easily segregate these points
using linear separation.
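The lift the slide describes can be checked numerically. A sketch with made-up points on two rings, which are not linearly separable in (x, y):

```python
import numpy as np

# Class 0: points on a unit circle; class 1: points on a radius-3 circle.
angles = np.linspace(0, 2 * np.pi, 20, endpoint=False)
inner = np.c_[np.cos(angles), np.sin(angles)]
outer = np.c_[3 * np.cos(angles), 3 * np.sin(angles)]

# Lift each point with z = x^2 + y^2, as on the slide.
z_inner = (inner ** 2).sum(axis=1)   # all equal to 1
z_outer = (outer ** 2).sum(axis=1)   # all equal to 9

# In the lifted space, the plane z = 5 separates the classes perfectly.
print(z_inner.max() < 5 < z_outer.min())
```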
Non-linear and inseparable planes
SVM Kernels
• The SVM algorithm is implemented in practice using a
kernel. A kernel transforms an input data space into
the required form.
• SVM uses a technique called the kernel trick. Here,
the kernel takes a low-dimensional input space and
transforms it into a higher dimensional space.
• In other words, it converts a non-separable
problem into a separable one by adding
more dimensions.
• It is most useful in non-linear separation problems.
Kernel trick helps you to build a more accurate
classifier.
Kernel Types
• Linear Kernel
• Polynomial Kernel
• Radial Basis Function Kernel
Linear Kernel
• A linear kernel is simply the ordinary
dot product of any two given
observations.
• The product between two vectors is
the sum of the multiplication of each
pair of input values.

K(x, xi) = sum(x * xi)
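Written out in plain Python, that formula is just the dot product:

```python
# Linear kernel: K(x, xi) = sum(x * xi), i.e. the dot product.
def linear_kernel(x, xi):
    return sum(a * b for a, b in zip(x, xi))

print(linear_kernel([1, 2, 3], [4, 5, 6]))  # 1*4 + 2*5 + 3*6 = 32
```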


Polynomial Kernel
• A polynomial kernel is a more generalized
form of the linear kernel.
• The polynomial kernel can distinguish curved
or nonlinear input space.
K(x, xi) = (1 + sum(x * xi))^d
• Where d is the degree of the polynomial. d=1
is similar to the linear transformation.
• The degree needs to be manually specified
in the learning algorithm.
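A small sketch of the polynomial kernel, using the common (1 + x·xi)^d convention:

```python
# Polynomial kernel: K(x, xi) = (1 + sum(x * xi))^d.
def polynomial_kernel(x, xi, d=2):
    return (1 + sum(a * b for a, b in zip(x, xi))) ** d

# With d = 1 it behaves like the linear kernel shifted by 1.
print(polynomial_kernel([1, 2], [3, 4], d=2))  # (1 + 11)^2 = 144
print(polynomial_kernel([1, 2], [3, 4], d=1))  # 1 + 11 = 12
```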
Radial Basis Function Kernel
• The Radial Basis Function (RBF) kernel is a popular kernel
function commonly used in support vector machine
classification. RBF can map an input space into an
infinite-dimensional space.

K(x, xi) = exp(-gamma * sum((x - xi)^2))

• Here gamma is a parameter, typically between 0 and 1. A
higher value of gamma fits the training dataset more
closely, which can cause over-fitting. Gamma = 0.1 is
considered to be a good default value.
• The value of gamma needs to be manually specified in
the learning algorithm.
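The RBF formula translates directly into code (the gamma value below is just the slide's suggested default):

```python
import math

# RBF kernel: K(x, xi) = exp(-gamma * sum((x - xi)^2)).
def rbf_kernel(x, xi, gamma=0.1):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, xi))
    return math.exp(-gamma * sq_dist)

print(rbf_kernel([1, 2], [1, 2]))  # identical points -> 1.0
print(rbf_kernel([0, 0], [3, 4]))  # farther apart -> closer to 0
```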
Example: Image Processing
• Image processing is a difficult task for many types
of machine learning algorithms.
• The relationships linking patterns of pixels to
higher concepts are extremely complex and hard
to define.
• For instance, it's easy for a human being to recognize
a face, a cat, or the letter "A", but defining these
patterns in strict rules is difficult.
• Furthermore, image data is often noisy. There can be
many slight variations in how the image was
captured, depending on the lighting, orientation,
and positioning of the subject.
Example: Data Collection
• When OCR software first processes a document, it
divides the paper into a matrix such that each cell
in the grid contains a single glyph, which is just a
term referring to a letter, symbol, or number.
• Next, for each cell, the software will attempt
to match the glyph to a set of all characters
it recognizes.
• Finally, the individual characters would be
combined back together into words, which
optionally could be spell-checked against a
dictionary in the document's language.
The Dataset
• We'll use a dataset donated to the UCI
Machine Learning Repository
(http://archive.ics.uci.edu/ml) by P. W. Frey and
D. J. Slate.
• The dataset contains 20,000 examples of the 26 capital
letters of the English alphabet, printed using 20 different
randomly reshaped and distorted black-and-white
fonts.
• The following figure, published by Frey and Slate,
provides an example of some of the printed
glyphs.
• Distorted in this way, the letters are challenging for
a computer to identify, yet are easily recognized by
a human.
The SVC() function
• C : float, optional (default=1.0)
– Penalty parameter C of the error term.
• kernel : string, optional (default=’rbf’)
– Specifies the kernel type to be used in the algorithm. It
must be one of ‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’,
‘precomputed’ or a callable. If none is given, ‘rbf’ will be
used.
• random_state : int, RandomState instance or
None, optional (default=None)
– The seed of the pseudo random number generator to use
when shuffling the data. If int, random_state is the seed
used by the random number generator; if RandomState
instance, random_state is the random number generator;
if None, the random number generator is the RandomState
instance used by np.random.
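Putting those parameters together, a sketch of constructing the classifier (the parameter values are chosen for illustration, not as recommendations):

```python
from sklearn.svm import SVC

# Illustrative parameter choices for the SVC() function.
clf = SVC(C=1.0, kernel='rbf', gamma=0.1, random_state=42)
print(clf.C, clf.kernel, clf.gamma)
```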
Classification report: Precision
• Precision is the ability of a classifier not
to label an instance positive that is
actually negative.
• For each class it is defined as the ratio
of true positives to the sum of true and
false positives.
• Said another way, “for all instances
classified positive, what percent
was correct?”
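As a quick arithmetic sketch (the counts are made up):

```python
# Precision = TP / (TP + FP): of everything predicted positive,
# how much was actually positive?
tp, fp = 8, 2
precision = tp / (tp + fp)
print(precision)  # 0.8
```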
Classification report: Recall
• Recall is the ability of a classifier to
find all positive instances.
• For each class it is defined as the ratio of
true positives to the sum of true
positives and false negatives.
• Said another way, “for all instances that
were actually positive, what percent
was classified correctly?”
Classification report: f1 score
• The F1 score is a weighted harmonic
mean of precision and recall such that
the best score is 1.0 and the worst is 0.0.
• Generally speaking, F1 scores are lower
than accuracy measures as they embed
precision and recall into their
computation.
• As a rule of thumb, the weighted average
of F1 should be used to compare
classifier models, not global accuracy.
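Continuing the made-up counts above, the harmonic mean works out as:

```python
# F1 = harmonic mean of precision and recall.
precision, recall = 0.8, 2 / 3
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 3))  # 0.727
```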
Classification report: Support
• Support is the number of actual occurrences of
the class in the specified dataset.
• Imbalanced support in the training data may
indicate structural weaknesses in the reported
scores of the classifier and could indicate the
need for stratified sampling or rebalancing.
• Support doesn’t change between models but
instead diagnoses the evaluation process.
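Precision, recall, F1, and support all appear together in scikit-learn's classification_report. A sketch on made-up labels:

```python
from sklearn.metrics import classification_report

# Toy labels (illustrative). Support counts the true occurrences per class.
y_true = [0, 0, 0, 1, 1, 1, 1, 1]   # support: 3 for class 0, 5 for class 1
y_pred = [0, 0, 1, 1, 1, 1, 0, 1]

print(classification_report(y_true, y_pred))
```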
Confusion Matrix
• A confusion matrix, also known as an error matrix, is a specific
table layout that allows visualization of the performance of an
algorithm, typically a supervised learning one.
• Each row of the matrix represents the instances in a
predicted class while each column represents the instances
in an actual class (or vice versa).
• The name stems from the fact that it makes it easy to see if
the system is confusing two classes (i.e. commonly
mislabeling one as another).
• It is a special kind of contingency table, with two dimensions
("actual" and "predicted"), and identical sets of "classes" in
both dimensions (each combination of dimension and class is
a variable in the contingency table).
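A sketch with scikit-learn's confusion_matrix on made-up labels (note that scikit-learn's convention is actual classes in rows and predicted classes in columns):

```python
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 0, 1, 1, 1, 1, 1]   # made-up labels for illustration
y_pred = [0, 0, 1, 1, 1, 1, 0, 1]

cm = confusion_matrix(y_true, y_pred)
print(cm)
# Diagonal entries are correct predictions; off-diagonal entries show
# where the classifier confuses one class with another.
```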
