SVM & Image Classification.
Benoît Patra
February 2014
Contents

1 Support Vector Machines
  1.1 Introduction
  1.2 Support Vector Machines
    1.2.1 Linearly separable set
    1.2.2 Nearly linearly separable set
    1.2.3 Linearly inseparable set
      1.2.3.1 The kernel trick
      1.2.3.2 Classification: projection
      1.2.3.3 Mapping conveniently
      1.2.3.4 Usual kernel functions

2 Computation under C++
  2.1 Libraries & datasets employed
  2.2 Project format
  2.3 Two-class SVM implementation
    2.3.1 First results
    2.3.2 Parameter selection
      2.3.2.1 Optimal training on parameter grid
      2.3.2.2 Iterating and sharpening results
  2.4 A good insight: testing on a small zone
  2.5 Central results: testing on a larger zone
    2.5.1 Results
    2.5.2 Case of an unreached minimum
  2.6 Going further: enriching our model
    2.6.1 Case (A): limited dataset
    2.6.2 Case (B): richer dataset
  2.7 Conclusions

A Unbalanced data set
  A.1 Different costs for misclassification

B Multi-class SVM
  B.1 One-versus-all
  B.2 One-versus-one

Bibliography
Chapter 1
Support Vector Machines

1.1 Introduction
Support vector learning is based on simple ideas which originated from statistical learning theory [1]. Support vector machines (SVMs) are supervised learning models with associated learning algorithms that analyze data and recognize patterns. A basic SVM takes a set of input data and predicts, for each given input, which of two possible classes forms the output, making it a non-probabilistic binary linear classifier. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other. An SVM model is a representation of the examples as points in space, mapped so that the examples of the separate categories are divided by a clear gap that is as wide as possible. New examples are then mapped into that same space and predicted to belong to a category based on which side of the gap they fall on.
1.2 Support Vector Machines
A data set containing points which belong to two different classes can be represented by the following set:
\[
D = \{(x_i, y_i),\ 1 \le i \le m \mid \forall i,\ y_i \in \{-1; 1\},\ x_i \in \mathbb{R}^q\}, \qquad (m, q) \in \mathbb{N}^2 \tag{1.1}
\]
where $y_i$ represents the belonging to one of the two classes, $x_i$ the training points, and $q$ the dimension of the data set. One of the most important things we have to focus on is the shape of the data set. Our goal is to find the best way to distinguish between the two classes. Ideally, we would like to have a linearly separable data set, in which our two sets of points can be fully separated by a line in a two-dimensional space, or by a hyperplane in an n-dimensional space. However, this is not the case in general. We will look in the following subsections at three possible configurations for our dataset.
1.2.1 Linearly separable set
In the following example (Fig. 1.1), it is easy to see that the data points can be linearly separated. Most of the time, with a big data set, it is impossible to say just by visualizing the data whether it can be linearly separated or not; sometimes the data cannot even be visualized.
Figure 1.1: A simple linearly separable dataset. Blue points are labelled 1; red points are labelled -1.
To solve the problem analytically, we have to define several new objects.

Definition. A linear separator is a function $f$ that depends on two parameters $w$ and $b$, given by the following formula:
\[
f_{w,b}(x) = \langle w, x \rangle + b, \qquad b \in \mathbb{R},\ w \in \mathbb{R}^q. \tag{1.2}
\]
This separator can take more values than $-1$ and $1$. When $f_{w,b}(x) \ge 0$, $x$ will belong to the class of vectors such that $y_i = 1$; in the opposite case, to the other class (i.e. such that $y_i = -1$). The line of separation is the contour line defined by the equation $f_{w,b}(x) = 0$.
Definition. The margin of an element $(x_i, y_i)$, relatively to a separator $f$, noted $\gamma^f_{(x_i, y_i)}$, is the real number given by:
\[
\gamma^f_{(x_i, y_i)} = f(x_i)\, y_i \ge 0. \tag{1.3}
\]
Definition. The margin of a set of points $D$, relatively to a separator $f$, is the minimum of the margins over all the elements of $D$:
\[
\gamma^f_D = \min\left\{ \gamma^f_{(x_i, y_i)} \mid (x_i, y_i) \in D \right\}. \tag{1.4}
\]
Up to rescaling $(w, b)$, we can restrict our attention to separators whose margin on $D$ is at least 1, i.e.
\[
y_i(\langle w, x_i \rangle + b) \ge 1. \tag{1.5}
\]
The goal of the SVM is to maximize the margin of the data set.
Figure 1.2: Support vectors and minimal margin. The orange line represents the separation, while the pink and blue ones represent respectively the hyperplanes associated with the equations $f_{w,b}(x) = 1$ and $f_{w,b}(x) = -1$.
Lemma. The width of the band delimited by the hyperplanes $f_{w,b}(x) = 1$ and $f_{w,b}(x) = -1$ equals $\frac{2}{\|w\|}$.

Proof. Let $u$ be a point of the contour line defined by $f_{w,b}(x) = 1$, and let $u'$ be its orthogonal projection on the contour line $f_{w,b}(x) = -1$. Hence we have $f_{w,b}(u) - f_{w,b}(u') = 2$, i.e. $\langle u - u', w \rangle = 2$. Yet $u - u'$ and $w$ are collinear and have the same orientation, so $\langle u - u', w \rangle = \|u - u'\|\,\|w\|$. Besides, $\|u - u'\|$ is equal to the width of the band constituted by the two contour lines, which therefore equals $\frac{2}{\|w\|}$.
In order to find the best separator, i.e. the one providing the maximum margin, we have to seek within the class of separators such that $\gamma^f_{(x_i, y_i)} \ge 1$, $\forall (x_i, y_i) \in D$, and retain the one for which $\|w\|$ is minimal. This leads us to solve the following constrained optimization problem:
\[
\min_{w,b} \ \frac{\|w\|^2}{2} \qquad \text{subject to} \quad y_i(\langle w, x_i \rangle + b) \ge 1, \ \forall i. \tag{1.6}
\]
NB. We minimize $\frac{\|w\|^2}{2}$ for calculus purposes: derivations become easier; besides, it is better to work with the squared norm. By introducing Lagrange multipliers $\alpha_i$, the previous constrained problem can be expressed as:
\[
\underset{w,b}{\arg\min}\ \max_{\alpha_i \ge 0} \left\{ \frac{\|w\|^2}{2} - \sum_{i=1}^{m} \alpha_i \left[ y_i(\langle w, x_i \rangle + b) - 1 \right] \right\} \tag{1.7}
\]
that is, we look for a saddle point. In doing so, all the points which can be separated as $y_i(\langle w, x_i \rangle + b) - 1 > 0$ do not matter, since we must set the corresponding $\alpha_i$ to zero.
This problem can now be solved by standard quadratic programming techniques.
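For completeness, the standard derivation behind that quadratic program (not detailed in the text above, see [1]): setting the derivatives of the Lagrangian (1.7) with respect to $w$ and $b$ to zero gives
\[
w = \sum_{i=1}^{m} \alpha_i y_i x_i, \qquad \sum_{i=1}^{m} \alpha_i y_i = 0,
\]
and substituting back yields the dual problem actually handed to the solver:
\[
\max_{\alpha} \ \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j \, y_i y_j \, \langle x_i, x_j \rangle
\qquad \text{subject to} \quad \alpha_i \ge 0, \quad \sum_{i=1}^{m} \alpha_i y_i = 0.
\]
The training points with $\alpha_i > 0$ are exactly the support vectors of Figure 1.2.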
1.2.2 Nearly linearly separable set
In this subsection, we will discuss the case of a nearly separable set, i.e. a dataset for which using a linear separator would be efficient enough. If there exists no hyperplane that can split the dataset entirely, the following method, called the soft margin method, will choose a hyperplane that splits the examples as cleanly as possible, while still maximizing the distance to the nearest cleanly split examples. Let us modify the maximum margin idea to allow mislabeled examples to be treated the same way, by allowing points to have a margin which can be smaller than 1, even negative. The previous constraint in (1.6) now becomes:
\[
\forall (x_i, y_i) \in D, \quad y_i(\langle w, x_i \rangle + b) \ge 1 - \xi_i, \tag{1.8}
\]
where the $\xi_i \ge 0$ are called the slack variables, and measure the degree of misclassification of the data $x_i$. The objective function we minimize also has to be changed: we increase it by a function which penalizes non-zero $\xi_i$, and the optimization becomes a trade-off between a large margin and a small error penalty. If the penalty function is linear, the optimization problem becomes:
\[
\min_{w,b,\xi} \ \frac{\|w\|^2}{2} + C \sum_{i=1}^{m} \xi_i \qquad \text{subject to} \quad y_i(\langle w, x_i \rangle + b) \ge 1 - \xi_i, \ \ \xi_i \ge 0. \tag{1.9}
\]
This constrained minimization problem can be solved using Lagrange multipliers, as done previously. We now solve the following problem:
\[
\underset{w,b,\xi}{\arg\min}\ \max_{\alpha_i, \beta_i \ge 0} \left\{ \frac{\|w\|^2}{2} + C \sum_{i=1}^{m} \xi_i - \sum_{i=1}^{m} \alpha_i \left[ y_i(\langle w, x_i \rangle + b) - 1 + \xi_i \right] - \sum_{i=1}^{m} \beta_i \xi_i \right\} \tag{1.10}
\]
with $\alpha_i, \beta_i \ge 0$.
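As a point of reference (a standard result stated here for completeness, not taken from the text above): eliminating $\xi$ and $\beta$ from (1.10) leaves the same dual problem as in the separable case, the only difference being that the multipliers are now bounded above by $C$:
\[
\max_{\alpha} \ \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j \, y_i y_j \, \langle x_i, x_j \rangle
\qquad \text{subject to} \quad 0 \le \alpha_i \le C, \quad \sum_{i=1}^{m} \alpha_i y_i = 0.
\]
Since this dual only involves the inner products $\langle x_i, x_j \rangle$, it is also the form in which the kernel trick of the next subsection is applied.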
1.2.3 Linearly inseparable set
We saw in the previous subsection that linear classification can lead to misclassifications; this is especially true if the dataset $D$ is not separable at all. Let us consider the following example (Fig. 1.3). For this set of data points, any linear classification would introduce too much misclassification to be considered accurate enough.
Figure 1.3: Linearly inseparable set. Blue points are labelled 1; red points are labelled -1.
1.2.3.1 The kernel trick
To solve our classification problem, let us introduce the kernel trick. For machine learning algorithms, the kernel trick is a way of mapping observations from a general data set $S$ into an inner product space $V$, without having to compute the mapping explicitly, such that the observations will have a meaningful linear structure in $V$. Hence linear classifications in $V$ are equivalent to generic classifications in $S$. The trick used to avoid the explicit mapping is to use learning algorithms that only require dot products between the vectors in $V$, and to choose the mapping such that these high-dimensional dot products can be computed within the original space, by means of a certain kernel function: a function $K : S^2 \to \mathbb{R}$ that can be expressed as an inner product in $V$.
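A classic textbook illustration of this (independent of the dataset used later): in $\mathbb{R}^2$, the kernel $K(x, z) = \langle x, z \rangle^2$ corresponds to the explicit mapping
\[
p(x) = \left( x_1^2,\ \sqrt{2}\, x_1 x_2,\ x_2^2 \right), \qquad
\langle p(x), p(z) \rangle = x_1^2 z_1^2 + 2\, x_1 x_2 z_1 z_2 + x_2^2 z_2^2 = \langle x, z \rangle^2,
\]
so the dot product in the three-dimensional feature space is obtained from a single dot product in the original plane, without ever computing $p$.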
1.2.3.2 Classification: projection
To understand the usefulness of the trick, let us go back to our classification problem. Let us consider a simple projection of the vectors of our dataset $D$ into a much richer, higher-dimensional feature space. We project each point of $D$ into this bigger space and make a linear separation there. Let us name $p$ this projection:
\[
\forall (x_i, y_i) \in D, \quad p(x_i) = \begin{pmatrix} p_1(x_i) \\ \vdots \\ p_n(x_i) \end{pmatrix}
\]
as we express the projected vector $p(x_i)$ in a basis of the $n$-dimensional new space. This point of view can lead to problems, because $n$ can grow without any limit, and nothing assures us that the $p_i$ are linear in the vectors. Following the same method as above would imply working on a new set $D'$:
\[
D' = p(D) = \{(p(x_i), y_i),\ 1 \le i \le m \mid \forall i,\ y_i \in \{-1; 1\},\ x_i \in \mathbb{R}^q\}, \qquad (m, q) \in \mathbb{N}^2 \tag{1.11}
\]
Because it implies calculating $p$ for each vector of $D$, this method will never be used in practice.
1.2.3.3 Mapping conveniently
Let us first notice that it is not necessary to calculate $p$, as the optimization problem only involves inner products between the different vectors. We can now consider the kernel trick approach. We construct:
\[
K : D^2 \to \mathbb{R} \quad \text{such that} \quad K(x, z) = \langle p(x), p(z) \rangle, \qquad \forall (x, y_x), (z, y_z) \in D \tag{1.12}
\]
making sure that it corresponds to a projection into the (possibly unknown) space $V$. We then avoid the calculation of $p$, and the description of the space into which we are projecting. The optimization problem remains the same, through replacing $\langle \cdot, \cdot \rangle$ by $K(\cdot, \cdot)$:
\[
\min_{w,b,\xi} \ \frac{K(w, w)}{2} + C \sum_{i=1}^{m} \xi_i, \qquad \xi_i \ge 0. \tag{1.13}
\]
1.2.3.4 Usual kernel functions

Polynomial: $K(x, z) = (x^T z + c)^d$, where $c \ge 0$ is a constant trading off the influence of higher-order versus lower-order terms in the polynomial. Polynomials such that $c = 0$ are called homogeneous.

Gaussian radial basis (RBF): $K(x, z) = \exp(-\gamma \|x - z\|^2)$, $\gamma > 0$. Sometimes parametrized using $\gamma = \frac{1}{2\sigma^2}$.

Hyperbolic tangent: $K(x, z) = \tanh(\kappa\, x^T z + c)$, for $\kappa > 0$, $c < 0$ well chosen.
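The polynomial, RBF and hyperbolic tangent kernels above are available in the OpenCV/LibSVM interface used in the next chapter as CvSVM::POLY, CvSVM::RBF and CvSVM::SIGMOID. A minimal sketch of the corresponding configuration, assuming the OpenCV 2.4.x C++ API and placeholder parameter values (not the ones selected later):

#include <opencv2/core/core.hpp>
#include <opencv2/ml/ml.hpp>

// Sketch: parameters of a two-class soft-margin SVM with an RBF kernel.
CvSVMParams makeRbfParams(double C, double gamma)
{
    CvSVMParams params;
    params.svm_type    = CvSVM::C_SVC;   // soft-margin classification (problem 1.9)
    params.kernel_type = CvSVM::RBF;     // K(x, z) = exp(-gamma * ||x - z||^2)
    params.C           = C;              // penalty on the slack variables
    params.gamma       = gamma;          // RBF width parameter
    params.term_crit   = cvTermCriteria(CV_TERMCRIT_ITER + CV_TERMCRIT_EPS,
                                        1000, 1e-6);
    return params;
}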
Chapter 2
Computation under C++

2.1 Libraries & datasets employed
We used for this project the computer vision and machine learning library OpenCV. All its SVM features are based on the specific library LibSVM, by Chih-Chung Chang and Chih-Jen Lin. We trained our models on the Image Classification Dataset from Andrea Vedaldi and Andrew Zisserman's Oxford assignment. It includes five different image classes (aeroplanes, motorbikes, people, horses and cars) of various sizes, and pre-computed feature vectors, in the form of a sequence of consecutive 6-digit values. The pictures used are all colour images in .jpg format, of various dimensions. The dataset can be downloaded at: http://www.robots.ox.ac.uk/~vgg/share/practical-image-classification.htm.
2.2 Project format
The C++ project itself possesses 4 branches, for the opening, saving, training and testing phases. In its original form, it allows opening two training files and a testing one, on a user-friendly, console-input basis. The user enters file directories, the format used and labels for the different training classes. For the testing phase, a label is asked, so that results obtained via the SVM classification can be compared with the prior label given by the user; the latter can directly see the misclassification results (rate, number of misclassified files) in the console output. The user can either choose their own kernel type and parameter values, or let the program find the optimal ones; classes have been created accordingly. The following results have been obtained using this program and additional versions (especially when including multiple training files) that derive directly from it; the latter will not be presented here. The project can be found on GitHub at: https://github.com/Parveez/CPP_Project_ENSAE_2013.
2.3 Two-class SVM implementation

2.3.1 First results
We first trained our SVM with the training sets aeroplane train.txt and horse train.txt; the data tested was contained in aeroplane val.txt and horse val.txt. As the images included in the two training classes may vary in size, we resized them all to a unique zone; the same goes for the testing set. All images are stored in two matrices, one for the training phase and one for the testing phase: each matrix row is a point (here, an image), and all its coefficients are features (here, pixels). For example, for 251 training images, all of size 50x50 pixels, the training matrix will be of dimensions 251x2500. For a 50x50 pixel zone, with respectively 112 and 139 elements in each class, learning time amounts to 0.458 seconds; testing time, for 274 elements, amounts to 11.147 seconds. But a classifier of any type produces bad results for randomly-assigned parameter values: for example, with the default values assigned to C and $\gamma$, a Gaussian classifier misclassifies 126 elements of the aeroplane val.txt file. The following section discusses the optimal selection of the statmodel parameters.
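The project's exact loading code is not reproduced here; the sketch below only illustrates the layout just described (one image per row, one pixel per column), assuming greyscale images resized to a square zone with the OpenCV 2.4.x API:

#include <string>
#include <vector>
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>

// Build a training or testing matrix: one image per row, one pixel per column.
// For 251 images of 50x50 pixels this produces a 251x2500 matrix of floats.
cv::Mat buildFeatureMatrix(const std::vector<std::string>& files, int zoneSize)
{
    cv::Mat features;                              // N x (zoneSize*zoneSize), CV_32F
    for (size_t i = 0; i < files.size(); ++i) {
        cv::Mat img = cv::imread(files[i], CV_LOAD_IMAGE_GRAYSCALE);
        if (img.empty()) continue;                 // skip unreadable files
        cv::resize(img, img, cv::Size(zoneSize, zoneSize));
        img.convertTo(img, CV_32F);                // SVM training expects float data
        features.push_back(img.reshape(1, 1));     // flatten the image to one row
    }
    return features;
}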
2.3.2 Parameter selection

2.3.2.1 Optimal training on parameter grid
The effectiveness of an SVM depends on the selection of the kernel, the kernel's parameters, and the soft margin parameter C. The best combination is here selected by a grid search with multiplicative growing sequences of the parameter, given a certain step. Input parameters for the parameter selection are:
- min val, max val: the extremal values tested;
- step: the step parameter.
Parameter values are tested through the following iteration sequence: (min val, min val * step, ..., min val * step^n), with n such that min val * step^n < max val. Parameters are considered optimal when they have the best cross-validation accuracy. Using an initial grid gives us a first approximation of the best parameter possible, and produces better results than default training and testing. It is important to mention here that, without specifying any kernel type to our program, the RBF kernel was always chosen as the best fit for our data. All the results presented thereafter will be presented for the RBF kernel, with optimization of the parameters C and $\gamma$; the following methods are applicable to other classifiers as well, even though they remain less efficient.
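In OpenCV this grid search is exposed through CvSVM::train_auto, which cross-validates over CvParamGrid objects built exactly from (min val, max val, step). A minimal sketch, with our own helper and variable names:

#include <opencv2/core/core.hpp>
#include <opencv2/ml/ml.hpp>

// Sketch: grid search over (C, gamma) with k-fold cross-validation.
// trainData holds one sample per row (CV_32F); labels holds one label per row.
CvSVMParams gridSearch(const cv::Mat& trainData, const cv::Mat& labels,
                       double minC, double maxC,
                       double minGamma, double maxGamma, double step)
{
    CvSVMParams params;
    params.svm_type    = CvSVM::C_SVC;
    params.kernel_type = CvSVM::RBF;

    CvParamGrid cGrid(minC, maxC, step);           // values minC, minC*step, ...
    CvParamGrid gammaGrid(minGamma, maxGamma, step);

    CvSVM svm;
    svm.train_auto(trainData, labels, cv::Mat(), cv::Mat(), params,
                   10,                             // 10-fold cross-validation
                   cGrid, gammaGrid,
                   CvSVM::get_default_grid(CvSVM::P),
                   CvSVM::get_default_grid(CvSVM::NU),
                   CvSVM::get_default_grid(CvSVM::COEF),
                   CvSVM::get_default_grid(CvSVM::DEGREE));
    return svm.get_params();                       // contains the optimal C and gamma
}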
Even if results are improved by the use of a parameter grid, refinements can be added. Indeed, we sharpen our estimation by computing iterative parameter selection, each time on smaller grids:

Data: default initial grid
Result: optimal parameter for SVM training
while iterations under threshold do
    train the SVM on the grid through cross-validation;
    retrieve the best parameter;
    set parameter = best parameter;
    re-center the grid;
    diminish the grid size;
end
Algorithm 1: Basic iterative parameter testing.

One can initially think of:
\[
\mathrm{max\ val}^{(j)} = \mathrm{max\ val}^{(j-1)}, \qquad
\mathrm{min\ val}^{(j)} = \mathrm{min\ val}^{(j-1)}, \qquad
\mathrm{step}^{(j)} = \sqrt{\mathrm{step}^{(j-1)}}
\]
to implement the grid resizing at step $j$, with $\mathrm{param}^{(j)}$ the best parameter value obtained after training the SVM model. Yet such a recursion is not properly efficient: as $j$ grows, the calculation time grows very fast. Indeed, as the step gets smaller, the number of iterations needed to reach max val increases very quickly. As we usually initialize the C and $\gamma$ grid extremal values at different powers of ten, with $\mathrm{step}^{(0)} = 10$, a convenient way to resize the grid at step $j$ is the following:
\[
\mathrm{max\ val}^{(j)} = \mathrm{param}^{(j)} \cdot 10^{\frac{1}{2^{j}}}, \qquad
\mathrm{min\ val}^{(j)} = \mathrm{param}^{(j)} \cdot 10^{-\frac{1}{2^{j}}}, \qquad
\mathrm{step}^{(j)} = \sqrt{\mathrm{step}^{(j-1)}} = \dots = 10^{\frac{1}{2^{j}}}
\]
as we can express min val and max val using powers of ten after replacing $\mathrm{step}^{(j)}$. It only takes a couple of iterations to go through the grid, and produces equivalent or better results. Besides, the more precise the estimation of the parameters, the faster the iteration.
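Putting the two previous points together, the refinement loop can be sketched as follows (illustrative names only; gridSearch is the helper sketched in the previous subsection, the grid at step j is re-centred on the previous optimum, and the step is square-rooted at each iteration):

#include <cmath>
#include <opencv2/core/core.hpp>
#include <opencv2/ml/ml.hpp>

// Cross-validation helper sketched in the previous subsection.
CvSVMParams gridSearch(const cv::Mat& trainData, const cv::Mat& labels,
                       double minC, double maxC,
                       double minGamma, double maxGamma, double step);

// Sketch: iterative refinement of (C, gamma) on shrinking multiplicative grids.
CvSVMParams refineParams(const cv::Mat& trainData, const cv::Mat& labels, int nIterations)
{
    double step = 10.0;                            // step^(0)
    double minC = 1e-3,  maxC = 7e3;               // initial C grid (illustrative)
    double minG = 1e-7,  maxG = 3e10;              // initial gamma grid (illustrative)
    CvSVMParams best;

    for (int j = 1; j <= nIterations; ++j) {
        best = gridSearch(trainData, labels, minC, maxC, minG, maxG, step);

        step = std::sqrt(step);                    // step^(j) = 10^(1/2^j)
        minC = best.C / step;     maxC = best.C * step;        // re-centre on best C
        minG = best.gamma / step; maxG = best.gamma * step;    // re-centre on best gamma
    }
    return best;
}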
2.4 A good insight: testing on a small zone
We first sought results for a small zone of 50x50 pixels, to get a primary overview of how our algorithm works. For such a zone, and the following initial grid and characteristics(1):

Initial grid: $\gamma \in [10^{-7},\ 3 \times 10^{10}]$, $C \in [10^{-3},\ 7 \times 10^{3}]$.
Number of class 1 files: 112; number of class -1 files: 139; files tested: 274.

we obtained the following results.

No iterations nor grid usage (latest calculation time(2): 0.599 seconds): default value $C = 1$, default value $\gamma = 1$, files misclassified: 126, misclassification rate: 0.459.

After 1 iteration (latest calculation time: 11.691 seconds): final value $\gamma = 10^{-7}$, final value $C = 1000$, files misclassified: 68, misclassification rate: 0.248.

After 5 iterations (latest calculation time: 4.138 seconds): final value $\gamma = 9.085 \times 10^{-8}$, final value $C = 90.851$, files misclassified: 68, misclassification rate: 0.248.

After 20 iterations (latest calculation time: 3.974 seconds): final values of $\gamma$, C, files misclassified and misclassification rate:
(1) Again, we point out that the RBF kernel type was not specified initially by the user, but chosen by the program during parameter optimization.
(2) Here, the latest calculation time represents the total calculation time, i.e. including training and testing time, for the last iteration mentioned.
What can we surmise from those results? Firstly, the number of misclassified images is improved by automatically training our model on a grid. Secondly, it is also improved by iterating the parameter selection process. Although the decay is slow, each iteration helps our SVM classify the testing data better. Lastly, the calculation time seems to be globally lower iteration after iteration, in acceptable proportions considering the small size of our zone.
2.5 Central results: testing on a larger zone

2.5.1 Results
Let us now run training and testing on a larger zone of 300x300 pixels, to gain a better comprehension of our model's behaviour. Parameter grids are initialized to the same values as in the previous subsection; here again, the RBF kernel is the optimal kernel type for the data.

No iterations nor grid usage (latest calculation time: 20.857 seconds): default value $C = 1$, default value $\gamma = 1$, files misclassified: 126, misclassification rate: 0.459.
After 1 iteration (latest calculation time: 420.265 seconds): final value $\gamma = 10^{-7}$, final value $C = 1000$, files misclassified: 118, misclassification rate: 0.430.

After 5 iterations (latest calculation time: 161.741 seconds): final value $\gamma = 9.085 \times 10^{-9}$, final value $C = 133.352$, files misclassified: 60, misclassification rate: 0.218.

After 15 iterations (latest calculation time: 143.982 seconds): final value $\gamma = 3.048 \times 10^{-9}$, final value $C = 38.983$, files misclassified: 68, misclassification rate: 0.248.
Figure 2.4: Values of C per iteration. Blue background, left: normal scale. Red background, right: logarithmic scale.

Figure 2.5: Values of $\gamma \times 10^{10}$ per iteration. Blue background, left: normal scale. Red background, right: logarithmic scale.
2.5.2 Case of an unreached minimum
Here the most intriguing fact is probably that, after 5 iterations, the number of misclassified files drops to 60 out of 274 tested, then rises to 62 at the next step. This can be explained by the following fact: the point $(\gamma^{(5)}, C^{(5)})$ is near the minimum value we are seeking, i.e. the one providing the minimal misclassification rate, whose exact value cannot be reached through the grid at the fifth step; and as we reposition $(\gamma, C)$ and resize the grid at $(\gamma^{(5)}, C^{(5)})$, we might actually re-center the problem on a new area that does not include the minimum at all.
Figure 2.6: Problem of the unreached minimum. Here the minimum is included in the upper-middle cell of the grid at step 5. (Gamma, C) is the best approximation available over the grid, but shrinking the grid around this exact point leaves the minimum off the new grid at step 6.
A solution to address this problem may be to use a smoother re-sizing algorithm, like the first one we presented. But this may actually have a negative impact on the calculation time at each step. For example, let us compare our results with those obtained with the initial, less efficient re-sizing algorithm; for the latter, with the same 300x300 pixel zone, the first three steps of iteration on parameter selection produced the following results.
After 1 iteration (latest calculation time: 432.228 seconds): final value $\gamma = 10^{-7}$, final value $C = 1000$, files misclassified: 118, misclassification rate: 0.430.

After 2 iterations (latest calculation time: 644.136 seconds): final value $\gamma = 10^{-7}$, final value $C = 1000$, files misclassified: 118, misclassification rate: 0.430.

After 3 iterations (latest calculation time: 1590.78 seconds): final value $\gamma = 10^{-7}$, final value $C = 1000$, files misclassified: 118, misclassification rate: 0.430.
At the first step, the misclassification rate is the same as with the second re-sizing method; the decay is indeed much slower (the resizing is so smooth that the second and third steps still give a rate of 0.430), and the calculation times are very poor. The third iteration takes 1590.78 seconds to compute, compared to 161.975 seconds with the convenient method. The conclusion of this section is that, in many cases, there is an actual trade-off between computing performance and avoiding the unreached-minimum problem.
2.6 Going further: enriching our model
In the first two sections, we trained our model on two different subsets, aeroplane train.txt and horse train.txt, trying to make predictions for both aeroplanes and horses. Here, we will include more objects (horses, background, motorbikes, and cars) in the class -1, and leave aeroplanes in the class 1; we will only try to classify files from the testing set aeroplane val.txt. Our goal here is to show how using a larger training set can improve our predictions. Let us compare the results between a class -1 training set containing only horses, case (A), and the enriched training set described above, case (B). The RBF kernel is the optimal kernel type in both cases. The zone used is of size 300x300.
2.6.1 Case (A): limited dataset
After 5 iterations (latest calculation time: 145.650 seconds): final value $\gamma = 3.83 \times 10^{-8}$, final value $C = 177.8$, files misclassified: 41, misclassification rate: 0.325.

After 10 iterations (latest calculation time: 137.342 seconds): final value $\gamma = 3.28 \times 10^{-8}$, final value $C = 56.51$, files misclassified: 41, misclassification rate: 0.325.

After 20 iterations (latest calculation time: 135.250 seconds): final value $\gamma = 2.27 \times 10^{-8}$, final value $C = 26.25$, files misclassified: 36, misclassification rate: 0.285.

After 40 iterations (latest calculation time: 141.561 seconds): final value $\gamma = 1.59 \times 10^{-8}$, final value $C = 13.98$, files misclassified: 34, misclassification rate: 0.269.
2.6.2 Case (B): richer dataset
After 1 iteration (latest calculation time: 681.084 seconds): final value $\gamma = 10^{-6}$, final value $C = 1000$, files misclassified: 12, misclassification rate: 0.095.
We directly see here, after only 1 iteration, that the classification accuracy is much better; the larger the initial training set, the better. Note that the calculation time can become quite high for very large datasets.
2.7 Conclusions
From all the experiments we conducted in this section, we can draw the following conclusions:
- The number of misclassified images is improved by automatically training our model on a parameter grid.
- It can also be improved by selecting the best parameters iteratively, shrinking our grid after each step.
- Choosing the right shrinking algorithm is very important, and can be very tricky. Indeed, for a very sharp resizing, the calculation time can be acceptable but we might leave the point of minimal misclassification out of the grid.
- Using a large training set is always a good thing, as it drastically improves classification accuracy.
Appendix A
Unbalanced data set

A.1 Different costs for misclassification
Let us consider an unbalanced data set of the following form:
\[
D = \{(x_i, y_i),\ 1 \le i \le m \mid \forall i,\ y_i \in \{-1; 1\},\ x_i \in \mathbb{R}^q\}, \qquad (m, q) \in \mathbb{N}^2 \tag{A.1}
\]
where one of the two classes contains many more elements than the other. The soft margin problem of Section 1.2.2 penalizes every slack variable with the same constant $C$:
\[
\min_{w,b,\xi} \ \frac{\|w\|^2}{2} + C \sum_{i=1}^{m} \xi_i, \qquad \xi_i \ge 0. \tag{A.2}
\]
To take the imbalance into account, we replace the single penalty term $C \sum_{i=1}^{m} \xi_i$ by a new one:
\[
C_+ \sum_{i \in J_+} \xi_i + C_- \sum_{i \in J_-} \xi_i, \qquad C_+ \ge 0,\ C_- \ge 0, \tag{A.3}
\]
where $J_+$ (resp. $J_-$) denotes the set of indices $i$ such that $y_i = 1$ (resp. $y_i = -1$).
One condition has to be satisfied in order to give equal overall weight to each class: the total penalty term has to be the same for each class. A hypothesis commonly made is to suppose that the number of misclassified vectors in each class is proportional to the number of vectors in that class, leading us to the following condition:
\[
C_- \, \mathrm{Card}(J_-) = C_+ \, \mathrm{Card}(J_+). \tag{A.4}
\]
If, for instance, $\mathrm{Card}(J_-) \ge \mathrm{Card}(J_+)$, then $C_- \le C_+$: a larger importance will be given to misclassified vectors $x_i$ such that $y_i = 1$.
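For instance, with $\mathrm{Card}(J_-) = 500$ and $\mathrm{Card}(J_+) = 100$, condition (A.4) gives $C_+ = 5\, C_-$. In the OpenCV interface used in Chapter 2, such per-class penalties can be set (for C_SVC) through the class_weights field of CvSVMParams, which multiplies C for each class. A hedged sketch, assuming the weights follow the order of the class labels (here -1 first, then +1):

#include <opencv2/core/core.hpp>
#include <opencv2/ml/ml.hpp>

// Sketch: giving different penalties C- and C+ to the two classes.
// The effective penalty of class k becomes C * weight_k.
void trainWeightedSvm(CvSVM& svm, const cv::Mat& trainData, const cv::Mat& labels,
                      double C, float weightNeg, float weightPos)
{
    CvSVMParams params;
    params.svm_type    = CvSVM::C_SVC;
    params.kernel_type = CvSVM::RBF;
    params.gamma       = 1e-7;                     // placeholder value
    params.C           = C;

    cv::Mat weights  = (cv::Mat_<float>(2, 1) << weightNeg, weightPos);
    CvMat   cWeights = weights;                    // class_weights expects the C struct
    params.class_weights = &cWeights;

    svm.train(trainData, labels, cv::Mat(), cv::Mat(), params);
}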
Appendix B
Multi-class SVM
Several methods have been suggested to extend the previous SVM scheme to solve multiple-class problems [2]. All the following schemes are applicable to any binary classifier, and are not exclusively related to SVMs. The most famous methods are the one-versus-all and one-versus-one methods.
B.1 One-versus-all
In this and the following subsection, the training and testing sets can be classified in $M$ classes $C_1, C_2, \dots, C_M$. The one-versus-all method is based on the construction of $M$ binary classifiers, each labelling one specified class 1 and all the others -1. During the testing phase, the classifier providing the highest margin determines the class.
B.2 One-versus-one
The one-versus-one method is based on the construction of $\frac{M(M-1)}{2}$ binary classifiers, one for each pair of the $M$ classes. During the testing phase, every point is analysed by each classifier, and a majority vote is conducted to determine its class. If we denote $x_t$ the point to classify and $h_{ij}$ the SVM classifier separating the classes $C_i$ and $C_j$, then the label awarded to $x_t$ can formally be written:
\[
y(x_t) = \underset{i \in \{1, \dots, M\}}{\arg\max} \ \mathrm{Card}\left\{ j \ne i \mid h_{ij}(x_t) = C_i \right\}. \tag{B.1}
\]
This represents the class awarded to $x_t$ most of the time, after being analysed by all the classifiers $h_{ij}$. Some ambiguity may exist in the counting of votes, if there is no majority. Both methods present downsides. For the one-versus-all version, nothing indicates that the classification results of the $M$ classifiers are comparable. Besides, the problem isn't well-balanced anymore: for example, with $M = 10$, we use only 10% of positive examples, against 90% negative ones.
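A possible sketch of the one-versus-one vote of (B.1), assuming the $\frac{M(M-1)}{2}$ two-class machines have already been trained with label +1 for class $C_i$ and -1 for class $C_j$ (all names below are illustrative, not taken from the project):

#include <vector>
#include <opencv2/core/core.hpp>
#include <opencv2/ml/ml.hpp>

// One trained two-class SVM h_ij, labelling +1 for class i and -1 for class j.
struct PairwiseSvm {
    int   classI;
    int   classJ;
    CvSVM svm;
};

// One-versus-one majority vote: each classifier votes for one of its two classes;
// the most voted class wins (ties resolved arbitrarily).
int classifyOneVsOne(const std::vector<PairwiseSvm*>& classifiers,
                     const cv::Mat& sample, int nClasses)
{
    std::vector<int> votes(nClasses, 0);
    for (size_t k = 0; k < classifiers.size(); ++k) {
        float response = classifiers[k]->svm.predict(sample);
        if (response > 0) votes[classifiers[k]->classI]++;
        else              votes[classifiers[k]->classJ]++;
    }
    int best = 0;
    for (int c = 1; c < nClasses; ++c)
        if (votes[c] > votes[best]) best = c;
    return best;
}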
Bibliography
[1] Vladimir N. Vapnik. The Nature of Statistical Learning Theory, 1995.
[2] Christopher M. Bishop. Pattern Recognition and Machine Learning, 2006.