This document provides an overview of support vector machines (SVMs) and how they can be used for classification problems in Python. It explains that SVMs find the optimal hyperplane that separates classes by maximizing the margin between them. The document discusses support vectors, kernels, different kernel types (linear, polynomial, radial basis function), and provides an example of using SVMs for image classification with character recognition. It also covers classification reports, precision, recall, F1 score, support, and confusion matrices for evaluating SVM performance.
Support Vector Machine using Python
What is a Support Vector Machine?
• "Support Vector Machine" (SVM) is a supervised machine learning algorithm which can be used for both classification and regression challenges. However, it is mostly used in classification problems.
• In this algorithm, we plot each data item as a point in n-dimensional space (where n is the number of features you have), with the value of each feature being the value of a particular coordinate.
• Then, we perform classification by finding the hyperplane that best differentiates the two classes.

Support Vector Machine
• Generally, Support Vector Machines are considered a classification approach, but they can be employed in both classification and regression problems.
• SVMs can easily handle multiple continuous and categorical variables.
• SVM constructs a hyperplane in multidimensional space to separate different classes. SVM generates the optimal hyperplane in an iterative manner, which is used to minimize the error.
• The core idea of SVM is to find a maximum marginal hyperplane (MMH) that best divides the dataset into classes.

Definitions
• Support Vectors – Support vectors are the data points closest to the hyperplane. These points define the separating line by determining the margins, and they are the most relevant to the construction of the classifier.
• Hyperplane – A hyperplane is a decision plane which separates a set of objects having different class memberships.
• Margin – A margin is the gap between the two lines through the closest class points, calculated as the perpendicular distance from the separating line to the support vectors. A larger margin between the classes is considered a good margin; a smaller margin is a bad margin.

How SVM works?
• The main objective is to segregate the given dataset in the best possible way.
• The distance between the nearest points of either class is known as the margin.
• The objective is to select a hyperplane with the maximum possible margin between the support vectors in the given dataset. SVM searches for the maximum marginal hyperplane in two steps:
– Generate hyperplanes which segregate the classes in the best way.
– Select the hyperplane with the maximum separation from the nearest data points of either class.

Non-linear and inseparable planes
• Some problems can't be solved using a linear hyperplane. In such situations, SVM uses a kernel trick to transform the input space into a higher-dimensional space.
• For example, the data points can be plotted on the x-axis and a new z-axis, where z is the squared sum of x and y: z = x^2 + y^2. In this higher-dimensional space, the points can easily be segregated using linear separation (a small sketch of this mapping follows the kernel list below).

SVM Kernels
• The SVM algorithm is implemented in practice using a kernel. A kernel transforms an input data space into the required form.
• SVM uses a technique called the kernel trick: the kernel takes a low-dimensional input space and transforms it into a higher-dimensional space.
• In other words, it converts a non-separable problem into a separable problem by adding more dimensions to it. It is most useful in non-linear separation problems, and the kernel trick helps you build a more accurate classifier.

Kernel Types
• Linear Kernel
• Polynomial Kernel
• Radial Basis Function Kernel
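As a concrete illustration of the z = x^2 + y^2 mapping described above, here is a minimal NumPy sketch; the two toy classes (points near the origin versus points on a ring) are made up for illustration:

```python
# A minimal sketch of the z = x^2 + y^2 kernel-trick mapping.
# The two classes are not linearly separable in (x, y), but become
# separable along the new z axis.
import numpy as np

rng = np.random.default_rng(0)

# Inner class: points near the origin; outer class: points on a radius-3 ring.
inner = rng.normal(0.0, 0.5, size=(50, 2))
angles = rng.uniform(0.0, 2.0 * np.pi, size=50)
outer = np.column_stack([3.0 * np.cos(angles), 3.0 * np.sin(angles)])

# The mapping from the slide: z = x^2 + y^2.
z_inner = (inner ** 2).sum(axis=1)
z_outer = (outer ** 2).sum(axis=1)

# Along z, a single threshold (a linear separator) splits the classes:
# every inner z falls below every outer z (which is 9 here).
print(z_inner.max(), z_outer.min())
```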
Linear Kernel
• A linear kernel uses the normal dot product of any two given observations. The product between two vectors is the sum of the multiplication of each pair of input values:
K(x, xi) = sum(x * xi)
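A minimal sketch of the linear kernel in code, assuming scikit-learn and a toy make_blobs dataset (not part of the slides):

```python
# Linear kernel: by hand and via scikit-learn's SVC.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, random_state=42)

# The kernel by hand for two observations: a plain dot product.
k = np.sum(X[0] * X[1])  # same as np.dot(X[0], X[1])

# The same kernel inside scikit-learn's SVC.
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)
print(k, len(clf.support_vectors_))
```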
Polynomial Kernel
• A polynomial kernel is a more generalized form of the linear kernel; it can distinguish curved or nonlinear input spaces:
K(x, xi) = (1 + sum(x * xi))^d
• Where d is the degree of the polynomial; d = 1 is similar to the linear transformation. The degree needs to be manually specified in the learning algorithm.
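A similar sketch for the polynomial kernel, again assuming scikit-learn; the make_moons dataset and d = 3 are illustrative choices, and gamma=1.0, coef0=1.0 are set so that scikit-learn's (gamma * sum(x * xi) + coef0)^degree form matches the formula above:

```python
# Polynomial kernel: by hand and via scikit-learn's SVC.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.15, random_state=42)

d = 3  # the degree, specified manually as the slide notes

# The kernel by hand for two observations.
k = (1 + np.sum(X[0] * X[1])) ** d

# gamma=1.0 and coef0=1.0 make scikit-learn's poly kernel,
# (gamma * <x, xi> + coef0)^degree, match the formula above.
clf = SVC(kernel="poly", degree=d, gamma=1.0, coef0=1.0)
clf.fit(X, y)
print(k, clf.score(X, y))
```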
Radial Basis Function Kernel
• The radial basis function (RBF) kernel is a popular kernel function commonly used in support vector machine classification. RBF can map an input space into an infinite-dimensional space:
K(x, xi) = exp(-gamma * sum((x - xi)^2))
• Here gamma is a parameter which typically ranges from 0 to 1. A higher value of gamma fits the training dataset more closely, which can cause over-fitting; gamma = 0.1 is considered a good default value.
• The value of gamma needs to be manually specified in the learning algorithm.
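A small sketch of gamma's effect, assuming scikit-learn and a toy make_moons dataset; the printed scores are illustrative, but a very large gamma should show a clear gap between training and test accuracy (over-fitting):

```python
# How gamma changes an RBF SVM's fit.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.25, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# A moderate gamma generalizes; a very large gamma memorizes the
# training set, so train and test scores diverge.
for gamma in (0.1, 1.0, 100.0):
    clf = SVC(kernel="rbf", gamma=gamma).fit(X_train, y_train)
    print(gamma, clf.score(X_train, y_train), clf.score(X_test, y_test))
```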
Example: Image Processing
• Image processing is a difficult task for many types of machine learning algorithms.
• The relationships linking patterns of pixels to higher concepts are extremely complex and hard to define. For instance, it's easy for a human being to recognize a face, a cat, or the letter "A", but defining these patterns in strict rules is difficult.
• Furthermore, image data is often noisy. There can be many slight variations in how the image was captured, depending on the lighting, orientation, and positioning of the subject.

Example: Data Collection
• When OCR software first processes a document, it divides the paper into a matrix such that each cell in the grid contains a single glyph, which is just a term referring to a letter, symbol, or number.
• Next, for each cell, the software attempts to match the glyph against the set of all characters it recognizes.
• Finally, the individual characters are combined back together into words, which optionally can be spell-checked against a dictionary in the document's language.

The Dataset
• We'll use a dataset donated to the UCI Machine Learning Data Repository (http://archive.ics.uci.edu/ml) by W. Frey and D. J. Slate.
• The dataset contains 20,000 examples of the 26 English capital letters as printed using 20 different randomly reshaped and distorted black and white fonts.
• A figure published by Frey and Slate provides an example of some of the printed glyphs. Distorted in this way, the letters are challenging for a computer to identify, yet are easily recognized by a human being.

The SVC() function
• C : float, optional (default=1.0) – Penalty parameter C of the error term.
• kernel : string, optional (default='rbf') – Specifies the kernel type to be used in the algorithm. It must be one of 'linear', 'poly', 'rbf', 'sigmoid', 'precomputed' or a callable. If none is given, 'rbf' will be used.
• random_state : int, RandomState instance or None, optional (default=None) – The seed of the pseudo-random number generator to use when shuffling the data. If an int, random_state is the seed used by the random number generator; if a RandomState instance, random_state is the random number generator itself.

Classification report: Precision
• Precision is the ability of a classifier not to label an instance positive that is actually negative.
• For each class it is defined as the ratio of true positives to the sum of true and false positives.
• Said another way: "for all instances classified positive, what percent was correct?"

Classification report: Recall
• Recall is the ability of a classifier to find all positive instances.
• For each class it is defined as the ratio of true positives to the sum of true positives and false negatives.
• Said another way: "for all instances that were actually positive, what percent was classified correctly?"

Classification report: F1 score
• The F1 score is a weighted harmonic mean of precision and recall such that the best score is 1.0 and the worst is 0.0.
• Generally speaking, F1 scores are lower than accuracy measures, as they embed precision and recall into their computation.
• As a rule of thumb, the weighted average of F1 should be used to compare classifier models, not global accuracy.
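A minimal sketch putting SVC() and the classification report together, assuming scikit-learn; the built-in digits dataset stands in for the Frey and Slate letters dataset, for which the slides give no code:

```python
# Train an SVC and print per-class precision, recall, F1, and support.
from sklearn.datasets import load_digits
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, random_state=42)

clf = SVC(C=1.0, kernel="rbf", random_state=42)
clf.fit(X_train, y_train)

# One row per class: precision, recall, F1 score, and support.
print(classification_report(y_test, clf.predict(X_test)))
```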
Classification report: Support
• Support is the number of actual occurrences of the class in the specified dataset.
• Imbalanced support in the training data may indicate structural weaknesses in the reported scores of the classifier and could indicate the need for stratified sampling or rebalancing.
• Support doesn't change between models; it instead diagnoses the evaluation process.

Confusion Matrix
• A confusion matrix, also known as an error matrix, is a specific table layout that allows visualization of the performance of an algorithm, typically a supervised learning one.
• Each row of the matrix represents the instances in a predicted class, while each column represents the instances in an actual class (or vice versa).
• The name stems from the fact that it makes it easy to see if the system is confusing two classes (i.e. commonly mislabeling one as another).
• It is a special kind of contingency table, with two dimensions ("actual" and "predicted") and identical sets of "classes" in both dimensions (each combination of dimension and class is a variable in the contingency table).
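A matching sketch for the confusion matrix, under the same assumptions as the classification-report example above:

```python
# Compute a confusion matrix for an RBF SVC on the digits dataset.
from sklearn.datasets import load_digits
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, random_state=42)

clf = SVC(kernel="rbf").fit(X_train, y_train)

# In scikit-learn's convention, rows are actual classes and columns are
# predicted classes; off-diagonal cells show which classes get confused.
print(confusion_matrix(y_test, clf.predict(X_test)))
```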