SVM Part 2
This is Part 2 of my series of tutorials about the math behind Support Vector Machines.
If you did not read the previous article, you might want to start the series at the beginning
by reading this article: an overview of Support Vector Machine.
In the first part, we saw what the aim of the SVM is: its goal is to find the hyperplane
which maximizes the margin.
But how do we calculate this margin?
SVM = Support VECTOR Machine
In Support Vector Machine, there is the word vector.
That means it is important to understand vectors well and how to use them.
Here is a short summary of what we will see today:
What is a vector?
its norm
its direction
How to add and subtract vectors?
What is the dot product?
How to project a vector onto another?
Once we have all these tools in our toolbox, we will then see:
What is the equation of the hyperplane?
How to compute the margin?
What is a vector?
If we define a point A(3, 4) in ℝ² we can plot it like this.
Figure 1: a point
Note: You can write a vector either with an arrow on top of it, or in bold. In the rest of
this text I will use the arrow notation when there are two letters, like OA→, and the bold
notation otherwise.
Ok so now we know that there is a vector, but we still don't know what IS a vector. A vector
has two components: its norm (its magnitude) and its direction.

1) The norm

The norm ‖OA‖ of the vector OA is its length, which we can compute with the Pythagorean theorem:

OA² = 3² + 4²

OA² = 25

OA = √25

‖OA‖ = OA = 5
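The norm computation above can be checked in a few lines of Python (a quick sketch; A(3, 4) is the point from Figure 1):

```python
import math

# Vector OA goes from the origin O(0, 0) to the point A(3, 4).
oa = (3, 4)

# Pythagoras' theorem: ||OA||^2 = 3^2 + 4^2
norm_squared = oa[0] ** 2 + oa[1] ** 2   # 25
norm = math.sqrt(norm_squared)           # 5.0

# math.hypot computes the same Euclidean norm directly.
assert norm == math.hypot(*oa)
```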
2) The direction
The direction is the second component of a vector.
Definition: The direction of a vector u(u1, u2) is the vector

w(u1/‖u‖, u2/‖u‖)
Naively, we could define the direction of u by the angle θ it makes with
the horizontal axis, and the angle α it makes with the vertical axis.
This is tedious. Instead of that we will use the cosines of the angles.
In a right triangle, the cosine of an angle β is defined by:

cos(β) = adjacent / hypotenuse
In Figure 4 we can see that we can form two right triangles, and in both cases the adjacent
side will be on one of the axes. Which means that the definition of the cosine implicitly
contains the axis related to an angle. We can rephrase our naive definition as:

Naive definition 2: The direction of the vector u is defined by the cosine of the angle θ
and the cosine of the angle α.

These cosines are called direction cosines.
Computing the direction vector
We will now compute the direction of the vector u from Figure 4:

cos(θ) = u1/‖u‖ = 3/5 = 0.6

and

cos(α) = u2/‖u‖ = 4/5 = 0.8
The direction of u(3, 4) is the vector w(0.6, 0.8). What is
interesting about direction vectors like w is that their norm is equal to 1. That's why we
also call w a unit vector.
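We can verify that the direction vector has norm 1 with a short sketch (u(3, 4) is the vector from Figure 4):

```python
import math

u = (3, 4)
norm_u = math.hypot(*u)  # 5.0

# Direction vector: each coordinate divided by the norm (the direction cosines).
w = (u[0] / norm_u, u[1] / norm_u)  # (0.6, 0.8)

# A direction vector always has norm 1: it is a unit vector.
assert math.isclose(math.hypot(*w), 1.0)
```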
Which means that adding two vectors gives us a third vector whose coordinates are the
sum of the coordinates of the original vectors.
You can convince yourself with the example below:
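For instance, with two vectors u and v (values chosen here purely for illustration):

```python
# Adding two vectors sums their coordinates; subtracting works the same way.
u = (2, 1)
v = (3, 4)

u_plus_v = (u[0] + v[0], u[1] + v[1])   # (5, 5)
u_minus_v = (u[0] - v[0], u[1] - v[1])  # (-1, -3)

print(u_plus_v, u_minus_v)
```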
However, since a vector has a magnitude and a direction, we often consider that parallel
translates of a given vector (vectors with the same magnitude and direction but with a
different origin) are the same vector, just drawn in a different place in space.
So don't be surprised if you see the same vector drawn from different starting points.
Strictly speaking, a vector is not the same thing as a
point, but it is a convenient way of thinking about vectors which you'll encounter often.
The dot product
One very important notion to understand SVM is the dot product.
Why ?
To understand, let's look at the problem geometrically. Geometrically, the dot product of
two vectors x and y is defined as:

x ⋅ y = ‖x‖‖y‖cos(θ)

where θ is the angle between x and y.

Figure 12

In the definition, we see cos(θ); let's see what it is.
By definition we know that in a right-angled triangle:
cos(θ) = adjacent / hypotenuse
Figure 14
So now we can view our original schema like this:
Figure 15
We can see that

θ = β − α

Reading Figure 15, in each right triangle we have:

cos(β) = adjacent/hypotenuse = x1/‖x‖

sin(β) = opposite/hypotenuse = x2/‖x‖

cos(α) = adjacent/hypotenuse = y1/‖y‖

sin(α) = opposite/hypotenuse = y2/‖y‖

Using the difference identity cos(β − α) = cos(β)cos(α) + sin(β)sin(α), we get:

cos(θ) = (x1/‖x‖)(y1/‖y‖) + (x2/‖x‖)(y2/‖y‖)

cos(θ) = (x1y1 + x2y2) / (‖x‖‖y‖)
If we multiply both sides by ‖x‖‖y‖ we get:
‖x‖‖y‖cos(θ) = x1y1 + x2y2

Which is the same as:

‖x‖‖y‖cos(θ) = x ⋅ y
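We can check numerically that the geometric definition ‖x‖‖y‖cos(θ) and the algebraic definition x1y1 + x2y2 give the same number (the vectors here are arbitrary, chosen just for illustration):

```python
import math

x = (3, 5)
y = (8, 2)

# Algebraic definition: sum of the products of the coordinates.
dot_algebraic = x[0] * y[0] + x[1] * y[1]  # 34

# Geometric definition: ||x|| ||y|| cos(theta),
# where theta is the angle between x and y.
theta = math.atan2(x[1], x[0]) - math.atan2(y[1], y[0])
dot_geometric = math.hypot(*x) * math.hypot(*y) * math.cos(theta)

assert math.isclose(dot_algebraic, dot_geometric)
```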
Figure 17

This gives us the vector z, the orthogonal projection of x onto y. To compute z, we take u,
the unit vector in the direction of y:

u = y/‖y‖

Since ‖z‖ = ‖x‖cos(θ) and cos(θ) = (x ⋅ y)/(‖x‖‖y‖), we get ‖z‖ = (x ⋅ y)/‖y‖, that is:

‖z‖ = u ⋅ x

And since u is also the unit vector in the direction of z, u = z/‖z‖, so:

z = ‖z‖u
Why are we interested in the orthogonal projection? Well, in our example, it allows us to
compute the distance between x and the line which goes through y.

Figure 19
We see that this distance is ‖x − z‖:

‖x − z‖ = √((3 − 4)² + (5 − 1)²) = √17

Notice that computing the projection only required taking the dot product of
two vectors, and if you recall, the inner product is just another name for the dot product!
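Putting the projection formulas together (here x(3, 5) as in the figure, and y chosen along the direction of z for illustration):

```python
import math

x = (3, 5)
y = (8, 2)  # any vector in this direction gives the same projection

# Unit vector in the direction of y.
norm_y = math.hypot(*y)
u = (y[0] / norm_y, y[1] / norm_y)

# ||z|| = u . x, then z = ||z|| u
norm_z = u[0] * x[0] + u[1] * x[1]
z = (norm_z * u[0], norm_z * u[1])  # (4.0, 1.0)

# Distance between x and the line through y: ||x - z||
distance = math.hypot(x[0] - z[0], x[1] - z[1])
assert math.isclose(distance, math.sqrt(17))
```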
Note that the line with equation y = ax + b can also be written as wᵀx = 0, with
w(−b, −a, 1) and x(1, x, y):

wᵀx = −b × (1) + (−a) × x + 1 × y

wᵀx = y − ax − b

The two equations are just different ways of expressing the same thing.
It is interesting to note that w0 is −b, which means that this value determines the
intersection of the line with the vertical axis.
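We can verify that the two formulations agree on a small example (the values of a and b are arbitrary, picked for illustration):

```python
# Line y = ax + b, written as w^T x = 0 with w = (-b, -a, 1) and x = (1, x, y).
a, b = 2.0, 1.0
w = (-b, -a, 1.0)

def w_dot_x(px, py):
    """Dot product of w with the augmented vector (1, x, y)."""
    x_aug = (1.0, px, py)
    return sum(wi * xi for wi, xi in zip(w, x_aug))

# A point lying on the line satisfies w^T x = 0 ...
assert w_dot_x(3.0, a * 3.0 + b) == 0.0
# ... and in general w^T x = y - ax - b.
assert w_dot_x(3.0, 10.0) == 10.0 - a * 3.0 - b
```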
Why do we use the hyperplane equation wᵀx = 0 instead of y = ax + b?

For two reasons:

it is easier to work in more than two dimensions with this notation,
the vector w will always be normal to the hyperplane. (Note: I received a lot of
questions about this last remark; w will always be normal because we use this
definition of the hyperplane.)
Consider, for instance, the hyperplane with equation x2 = −2x1, which is equivalent to

wᵀx = 0

with w(2, 1) and x(x1, x2).
Note that the vector w is shown in Figure 20 (w is not a data point).
We would like to compute the distance between the point A(3, 4) and the hyperplane.
This is the distance between A and its projection onto the hyperplane
Figure 21
We can view the point A as a vector a going from the origin to A.
If we project it onto the normal vector w, we get the vector p:

u = w/‖w‖ = (2/√5, 1/√5)

p = (u ⋅ a)u

p = (3 × 2/√5 + 4 × 1/√5)u

p = (6/√5 + 4/√5)u

p = (10/√5)u

p = (10/√5 × 2/√5, 10/√5 × 1/√5)

p = (20/5, 10/5)

p = (4, 2)

‖p‖ = √(4² + 2²) = 2√5

The distance between the point A and the hyperplane is ‖p‖ = 2√5, and the margin is twice
this distance:

margin = 2‖p‖ = 4√5
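The whole computation of ‖p‖ can be reproduced in a few lines (w(2, 1) and A(3, 4) are the values from the text):

```python
import math

w = (2, 1)   # normal vector of the hyperplane
a = (3, 4)   # the point A viewed as a vector from the origin

# Unit vector in the direction of w.
norm_w = math.hypot(*w)                # sqrt(5)
u = (w[0] / norm_w, w[1] / norm_w)     # (2/sqrt(5), 1/sqrt(5))

# Projection of a onto w: p = (u . a) u
u_dot_a = u[0] * a[0] + u[1] * a[1]    # 10/sqrt(5)
p = (u_dot_a * u[0], u_dot_a * u[1])   # (4.0, 2.0)

norm_p = math.hypot(*p)                # 2 * sqrt(5)
margin = 2 * norm_p                    # 4 * sqrt(5)
```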
Alexandre KOWALCZYK
I am passionate about machine learning and Support Vector Machine. I like to explain things
simply to share my knowledge with people from around the world.