Computer Vision For The Web - Sample Chapter
Apply various image filters to images
and videos
Recognize and track many different objects,
including faces and face particles, using
powerful facial recognition algorithms
Explore ways to control your browser without
touching the mouse or keyboard
Foat Akhmadeev
Unleash the power of Computer Vision algorithms in JavaScript to
develop vision-enabled web content
Foat Akhmadeev
research. He completed his master's degree in the year 2014 from the Kazan Federal
University, Russia. He has worked on different projects, including development
of high-loaded websites written in Java and real-time object detection for mobile
phones. He has an extensive background in the field of Computer Vision. He has
also written a scientific paper on 3D reconstruction from a single image. For more
information, you can visit his website at http://foat.me.
Preface
Computer Vision is one of the popular areas in computer science that have gained
widespread importance lately. Besides, the power of personal computers has grown,
thus opening the gate for developers to use Computer Vision algorithms directly
on end user machines using client-side scripting. Nowadays, the most popular
programming language for the web is JavaScript. It allows us to develop complex
algorithms and run them directly in a web browser; this solves several major
problems: the user needs nothing but a browser to run a web application, and as
a developer, you get a lower load on your server. In this book, we will provide a
comprehensive overview of the most popular JavaScript libraries and discuss the
techniques they provide to help you in your initial steps in exciting fields, such as
image processing and Computer Vision. This book covers Computer Vision methods
by providing an intuitive overview of each algorithm and showing clear examples of
the usage of libraries.
Chapter 5, May JS Be with You! Control Your Browser with Motion, extends the topic
of object detection to object tracking and provides exhaustive examples. It also
demonstrates how to create a human interface using gestures or head motion.
Chapter 6, What's Next? summarizes all that we will do throughout this book.
Moreover, it provides references to several libraries that are not presented here.
Conventions
In this book, you will find a number of styles of text that distinguish between
different kinds of information. Here are some examples of these styles, and an
explanation of their meaning.
Code words in text are shown as follows: "The installation of a JavaScript library is
straightforward. You just need to add a script file to your <head> tag."
A block of code is set as follows:
var dataBuffer = new jsfeat.data_t(cols * rows, imageData.data.buffer);
var mat = new jsfeat.matrix_t(cols, rows, jsfeat.U8C4_t, dataBuffer);
var gray = tracking.Image.grayscale(mat.data, cols, rows, true);
As you can see, we just added a JavaScript library here without any additional
actions. We do not need any particular software, since JavaScript is fast enough
for many Computer Vision tasks.
The core data structure for the JSFeat library is a matrix. We will cover more topics
about matrices in the next section, but to check whether everything works correctly,
let's try to create an example.
Add the following code to a <script> tag:
var matrix = new jsfeat.matrix_t(3, 3, jsfeat.U8_t | jsfeat.C1_t);
matrix.data[1] = 1;
matrix.data[5] = 2;
matrix.data[7] = 1;
// Log the matrix row by row.
for (var i = 0; i < matrix.rows; ++i) {
    var start = i * matrix.cols;
    console.log(matrix.data.subarray(start, start + matrix.cols));
}
In the preceding code, we create a new 3 x 3 matrix of the unsigned byte type with
one channel. Next, we set a few of its elements and log the content of the matrix
to the console row by row. The matrix data is stored as a one-dimensional array.
Remember this; we will clarify it in the next section.
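The row-by-row layout can be sketched in plain JavaScript, without JSFeat. The helper name `index` is hypothetical; it only illustrates how a (row, column) pair maps onto a flat typed array, assuming the row-major order used by the logging loop above.

```javascript
// A 3 x 3 single-channel matrix stored row by row in a flat typed array.
const rows = 3, cols = 3;
const data = new Uint8Array(rows * cols);

// Hypothetical helper: flat index of the element at (row, col).
function index(row, col) {
  return row * cols + col;
}

data[index(0, 1)] = 1; // same cell as data[1]
data[index(1, 2)] = 2; // same cell as data[5]
data[index(2, 1)] = 1; // same cell as data[7]

// Collect the rows as plain arrays so they are easy to inspect.
const printed = [];
for (let r = 0; r < rows; r++) {
  printed.push(Array.from(data.subarray(r * cols, (r + 1) * cols)));
}
console.log(printed);
```

The three writes touch the same cells as the JSFeat example, so the printed rows should match its console output.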
Finally, you did it! You have successfully added the JSFeat Computer Vision library
to your first project. Now, we will discuss what a matrix actually is.
Then we need to place an image here. We do this with just a few lines of code:
var canvas = document.getElementById('initCanvas'),
    context = canvas.getContext('2d'),
    image = new Image();
image.src = 'path/to/image.jpg';
image.onload = function () {
    var cols = image.width;
    var rows = image.height;
    canvas.width = cols;
    canvas.height = rows;
    context.drawImage(image, 0, 0, image.width, image.height);
};
This is just a common way of displaying an image on a canvas. We define the image
source path, and when the image is loaded, we set the canvas dimensions to those of
the image and draw the image itself. Let's move on. Loading a canvas' content into a
matrix is a bit tricky. Why is that? We need to use a jsfeat.data_t object, which
is a data structure that holds a binary representation of an array. Anyway, since it is
just a wrapper for the JavaScript ArrayBuffer, it should not be a problem:
var imageData = context.getImageData(0, 0, cols, rows);
var dataBuffer = new jsfeat.data_t(cols * rows, imageData.data.buffer);
var mat = new jsfeat.matrix_t(cols, rows, jsfeat.U8_t | jsfeat.C4_t, dataBuffer);
Here, we create a matrix as we did earlier, but in addition to that we add a new
parameter, matrix buffer, which holds all the necessary data.
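To see why wrapping an ArrayBuffer is cheap, here is a minimal sketch in plain JavaScript. It does not use JSFeat; the `pixels` array is a stand-in for `imageData.data`, and the example only assumes the standard typed-array behavior that several views can share one buffer.

```javascript
// Stand-in for imageData.data: two RGBA pixels in a clamped byte array.
const pixels = new Uint8ClampedArray([255, 0, 0, 255, 0, 0, 255, 255]);

// A second view over the same ArrayBuffer; no bytes are copied.
const view = new Uint8Array(pixels.buffer);

view[1] = 128;          // write through the second view...
console.log(pixels[1]); // ...and read the same byte through the first: 128
```

Because both views point at the same memory, a wrapper built on the buffer can expose the image bytes without duplicating them.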
You have probably noticed that the third parameter of the matrix constructor
looks strange. It sets the type of the matrix, which has two parts:
The first part represents the type of data in the matrix. In our example, it is
U8_t, which states that we use an unsigned byte array. An image usually uses the
0-255 range for color representation, which is why we need bytes here.
The second part sets the number of channels in the matrix. Remember that an
image consists of three main channels (red, green, and blue) and an alpha
channel. If there is only one channel, then it is a grayscale image.
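The channel count also determines how the flat array is laid out. The following sketch assumes the interleaved RGBA order that a canvas produces; the helper `channelIndex` is hypothetical and is not part of JSFeat.

```javascript
// A 2 x 1 RGBA image: one red pixel followed by one half-transparent blue pixel.
const width = 2, channels = 4;
const rgba = new Uint8Array([255, 0, 0, 255, 0, 0, 255, 128]);

// Hypothetical helper: flat index of channel c (0=R, 1=G, 2=B, 3=A) at (x, y).
function channelIndex(x, y, c) {
  return (y * width + x) * channels + c;
}

const blue = rgba[channelIndex(1, 0, 2)];  // blue value of the second pixel
const alpha = rgba[channelIndex(1, 0, 3)]; // alpha value of the second pixel
console.log(blue, alpha); // 255 128
```

With a single channel, the same formula reduces to `y * width + x`, which is the layout of a grayscale matrix.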
How do we convert a colored image into a grayscale image? For the answer, we must
move to the next section.
Just a few lines of code! First, we create an object, which will hold our grayscale
image. Next, we apply the JSFeat function to that image. You may also define
matrix boundaries for conversion, if you want. Here is the result of the conversion:
For this type of operation, you do not actually need to load a color image into the
matrix; instead of mat.data, you can use imageData.data from the context. It's up
to you.
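What such a conversion does per pixel can be sketched in plain JavaScript. This is not the JSFeat implementation: the function name `toGrayscale` is hypothetical, and the weights are the common Rec. 601 luma coefficients, which is an assumption; a library may use different or integer-approximated weights.

```javascript
// Convert interleaved RGBA bytes to one gray byte per pixel.
// Assumption: Rec. 601 luma weights (0.299, 0.587, 0.114); alpha is ignored.
function toGrayscale(rgba, width, height) {
  const gray = new Uint8Array(width * height);
  for (let i = 0, j = 0; i < gray.length; i++, j += 4) {
    gray[i] = Math.round(
      0.299 * rgba[j] + 0.587 * rgba[j + 1] + 0.114 * rgba[j + 2]
    );
  }
  return gray;
}

const sample = new Uint8ClampedArray([
  255, 255, 255, 255, // white
  0, 0, 0, 255,       // black
  255, 0, 0, 255      // red
]);
const gray = toGrayscale(sample, 3, 1);
console.log(Array.from(gray)); // [255, 0, 76]
```

The input here is the raw RGBA byte array, which mirrors the remark above that `imageData.data` can be used directly.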
To see how to display a matrix, refer to the Matrix displaying section.
One of the useful operations in Computer Vision is a matrix transpose, which
flips a matrix over its main diagonal so that rows become columns. For an image,
the result looks like a 90-degree rotation combined with a mirror flip. Keep in
mind that the numbers of rows and columns are swapped during this operation:
var transposed = new jsfeat.matrix_t(mat.rows, mat.cols, mat.type | mat.channel);
jsfeat.matmath.transpose(transposed, mat);
Again, we need to predefine the resulting matrix, and only then can we apply the
transpose operation.
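A transpose on a flat row-major array can be sketched as follows. This is plain JavaScript with a hypothetical `transpose` helper, not the JSFeat routine; it shows that the element at (row, column) moves to (column, row) and that the output has swapped dimensions.

```javascript
// Transpose a rows x cols row-major matrix into a cols x rows one.
function transpose(src, rows, cols) {
  const dst = new Uint8Array(src.length);
  for (let r = 0; r < rows; r++) {
    for (let c = 0; c < cols; c++) {
      dst[c * rows + r] = src[r * cols + c];
    }
  }
  return dst;
}

// 2 x 3 input:      3 x 2 output:
// [1, 2, 3]         [1, 4]
// [4, 5, 6]         [2, 5]
//                   [3, 6]
const t = transpose(new Uint8Array([1, 2, 3, 4, 5, 6]), 2, 3);
console.log(Array.from(t)); // [1, 4, 2, 5, 3, 6]
```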
}
jsfeat.matmath.multiply(C, A, B);
Here, M = K = 3 and N = 2. Keep in mind that during the matrix creation,
we pass the number of columns as the first parameter and the number of rows as
the second. We populate the matrices with dummy values and call the multiply
function. After displaying the result in the console, you will see this:
[1, 2]   [3,  2,  1]   [ 3, 0, -3]
[3, 4]   [0, -1, -2]   [ 9, 2, -5]
[5, 6]                 [15, 4, -7]
Here, the first column is matrix A, the second is matrix B, and the third is the
result matrix C.
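The product can be checked by hand with a short plain-JavaScript sketch. The `multiply` function below is a hypothetical helper and not the JSFeat routine; it takes A and B from the listing above as flat row-major arrays.

```javascript
// Multiply an m x n matrix by an n x k matrix, both stored row-major.
function multiply(a, b, m, n, k) {
  const c = new Float32Array(m * k);
  for (let i = 0; i < m; i++) {
    for (let j = 0; j < k; j++) {
      let sum = 0;
      for (let p = 0; p < n; p++) {
        sum += a[i * n + p] * b[p * k + j];
      }
      c[i * k + j] = sum;
    }
  }
  return c;
}

const A = [1, 2, 3, 4, 5, 6];      // 3 x 2
const B = [3, 2, 1, 0, -1, -2];    // 2 x 3
const C = multiply(A, B, 3, 2, 3); // 3 x 3

// Print the result row by row.
for (let i = 0; i < 3; i++) {
  console.log(Array.from(C.subarray(i * 3, i * 3 + 3)));
}
```

Each entry of C is the dot product of one row of A with one column of B, which is why the inner dimension N must match.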
Going deeper
Consider finding features in an image; such features are usually used for object
detection. There are many algorithms for this, but you need a robust approach
that works with different object sizes. Moreover, you may need to reduce
the redundancy of an image or search for something whose size you are unsure of.
In that case, you need a set of images. The solution is an image pyramid: a
collection of several images that are downsampled from the original.
The code for creating an image pyramid will look like this:
var levels = 4, start_width = mat.cols, start_height = mat.rows,
data_type = jsfeat.U8_t | jsfeat.C1_t;
var pyramid = new jsfeat.pyramid_t(levels);
pyramid.allocate(start_width, start_height, data_type);
pyramid.build(mat);
First, we define the number of levels for the pyramid; here, we set it to 4. In JSFeat,
the first level is skipped by default, since it is the original image. Next, we define the
starting dimensions and output types. Then, we allocate space for the pyramid levels
and build the pyramid itself. A pyramid is generally downsampled by a factor of 2
at each level. A JSFeat pyramid is just an array of matrices; it holds the pyramid
layers, starting from the original image and ending with the smallest image.
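The idea of halving an image at each level can be sketched in plain JavaScript. This is an illustration under stated assumptions, not the JSFeat algorithm: the helpers `downsample` and `buildPyramid` are hypothetical, and the sketch averages each 2 x 2 block, whereas a library may use a different smoothing filter.

```javascript
// Halve a single-channel image by averaging each 2 x 2 block.
function downsample(src, width, height) {
  const w = width >> 1, h = height >> 1;
  const dst = new Uint8Array(w * h);
  for (let y = 0; y < h; y++) {
    for (let x = 0; x < w; x++) {
      const i = (2 * y) * width + 2 * x;
      const sum = src[i] + src[i + 1] + src[i + width] + src[i + width + 1];
      dst[y * w + x] = (sum + 2) >> 2; // rounded average of four pixels
    }
  }
  return { data: dst, cols: w, rows: h };
}

// Build `levels` layers; layer 0 is the original image.
function buildPyramid(data, cols, rows, levels) {
  const pyramid = [{ data: data, cols: cols, rows: rows }];
  for (let i = 1; i < levels; i++) {
    const prev = pyramid[i - 1];
    pyramid.push(downsample(prev.data, prev.cols, prev.rows));
  }
  return pyramid;
}

const image = new Uint8Array(8 * 8).fill(100);
const pyramid = buildPyramid(image, 8, 8, 4);
console.log(pyramid.map(function (l) { return l.cols + "x" + l.rows; }));
// ["8x8", "4x4", "2x2", "1x1"]
```

Searching for an object of unknown size then amounts to running the same detector on every layer.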
Matrix displaying
What we did not discuss in the previous section is how to display output matrices.
It is done in different ways for grayscale and colored images. Here is the code for
displaying matrices for a colored image:
var data = new Uint8ClampedArray(matColour.data);
var imageData = new ImageData(data, matColour.cols, matColour.rows);
context.putImageData(imageData, 0, 0);
We just need to cast the matrix data to the appropriate format and put the resulting
ImageData object into the context. It is harder to do so for a grayscale image:
var imageData = new ImageData(mat.cols, mat.rows);
// View the RGBA bytes as one 32-bit value per pixel.
var data = new Uint32Array(imageData.data.buffer);
var alpha = (0xff << 24);
var i = mat.cols * mat.rows, pix = 0;
while (--i >= 0) {
    pix = mat.data[i];
    // Copy the gray value into R, G, and B and make the pixel opaque.
    data[i] = alpha | (pix << 16) | (pix << 8) | pix;
}
context.putImageData(imageData, 0, 0);
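The loop above writes whole 32-bit pixels, which relies on the little-endian byte order of typical desktop and mobile hardware. A byte-wise variant that does not depend on byte order might look like the following sketch; `grayToRgba` is a hypothetical helper name.

```javascript
// Expand one gray byte per pixel into opaque RGBA bytes.
function grayToRgba(gray) {
  const rgba = new Uint8ClampedArray(gray.length * 4);
  for (let i = 0, j = 0; i < gray.length; i++, j += 4) {
    rgba[j] = rgba[j + 1] = rgba[j + 2] = gray[i]; // R = G = B = gray value
    rgba[j + 3] = 255;                             // fully opaque
  }
  return rgba;
}

const rgbaBytes = grayToRgba(new Uint8Array([0, 128, 255]));
console.log(Array.from(rgbaBytes));
// [0, 0, 0, 255, 128, 128, 128, 255, 255, 255, 255, 255]
```

In a browser, the returned array can be passed to the ImageData constructor together with the matrix dimensions, as in the colored example.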
The first parameter defines an array for sorting, the second and third are the starting
index and the ending index, respectively. The final parameter defines the comparison
function. You will see the following image:
As we can see, the lower part of the image was sorted. Looks good!
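Sorting only a range of a typed array can be sketched with standard JavaScript alone. The helper `sortRange` is hypothetical; it assumes an inclusive ending index and uses the standard comparator convention (negative, zero, or positive), which may differ from the comparison function a library expects.

```javascript
// Sort only the elements with indices low..high (inclusive), in place.
function sortRange(array, low, high, compare) {
  // subarray() returns a view on the same memory, so sorting the view
  // reorders the original typed array.
  array.subarray(low, high + 1).sort(compare);
  return array;
}

const values = new Uint8Array([9, 7, 8, 5, 1, 4, 2]);
sortRange(values, 3, 6, function (a, b) { return a - b; });
console.log(Array.from(values)); // [9, 7, 8, 1, 2, 4, 5]
```

The first three elements stay untouched, which matches the effect of sorting only part of an image.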
You will probably need a median function, which returns the number that separates
the higher part of the data from the lower part. To understand this better, we need to
see some examples:
var
var
var
var
For the first array, the result is 3. It is simple: in the sorted array, the number 3
separates 1, 2 from 5, 8. For the second array, the result is 4. Different median
algorithms may return different results; the algorithm in JSFeat picks one of the
array elements as the result. In contrast, many approaches will return 5 in that
case, since 5 is the mean of the two middle values (4, 6). Taking that into account,
be careful and check how the algorithm is implemented.
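The two conventions can be compared side by side in plain JavaScript. Both functions and both input arrays below are hypothetical illustrations and are not taken from JSFeat; the arrays are chosen only so that the results match the values discussed above.

```javascript
// Lower median: always returns an element that is present in the input.
function lowerMedian(values) {
  const sorted = Array.from(values).sort(function (a, b) { return a - b; });
  return sorted[Math.floor((sorted.length - 1) / 2)];
}

// Averaging median: for an even count, the mean of the two middle values.
function meanMedian(values) {
  const sorted = Array.from(values).sort(function (a, b) { return a - b; });
  const mid = sorted.length >> 1;
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

const odd = [5, 1, 3, 8, 2]; // hypothetical data with an odd count
const even = [6, 2, 8, 4];   // hypothetical data with an even count
console.log(lowerMedian(odd), meanMedian(odd));   // 3 3
console.log(lowerMedian(even), meanMedian(even)); // 4 5
```

The two definitions agree for an odd number of elements and can differ only when the count is even.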
Linear algebra
Who wants to solve a system of linear equations? No one? Don't worry, it can be
done very easily.
First, let's define a simple linear system Ax = B, where the matrices A and B are
known and we need to find x:
var bufA = [9, 6, -3, 2, -2, 4, -2, 1, -2],
bufB = [6, -4, 0];
var A = new jsfeat.matrix_t(3, 3, jsfeat.F32_t | jsfeat.C1_t,
    new jsfeat.data_t(bufA.length, bufA));
var B = new jsfeat.matrix_t(3, 1, jsfeat.F32_t | jsfeat.C1_t,
    new jsfeat.data_t(bufB.length, bufB));
jsfeat.linalg.lu_solve(A, B);
JSFeat places the result into the B matrix, so be careful if you want to use B somewhere
else, or you will lose your data. The result will look like this:
[2.000..., -4.000..., -4.000..]
Since the algorithm works with floats, we cannot get the exact values, but after
applying a round operation, everything will look fine:
[2, -4, -4]
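The same system can be solved and verified without the library. The sketch below uses Gaussian elimination with partial pivoting; it is not the JSFeat lu_solve implementation, and the function name `solveLinearSystem` is hypothetical. Unlike the call above, it leaves its inputs unchanged and returns a new array.

```javascript
// Solve a * x = b for a square system using Gaussian elimination
// with partial pivoting. `a` is an array of rows, `b` is a flat array.
function solveLinearSystem(a, b) {
  const n = b.length;
  // Work on an augmented copy [a | b] so the inputs are not modified.
  const m = a.map(function (row, i) { return row.concat([b[i]]); });

  for (let col = 0; col < n; col++) {
    // Pick the row with the largest absolute value in this column.
    let pivot = col;
    for (let r = col + 1; r < n; r++) {
      if (Math.abs(m[r][col]) > Math.abs(m[pivot][col])) pivot = r;
    }
    if (Math.abs(m[pivot][col]) < 1e-12) throw new Error("Matrix is singular");
    const tmp = m[col];
    m[col] = m[pivot];
    m[pivot] = tmp;

    // Eliminate the entries below the pivot.
    for (let r = col + 1; r < n; r++) {
      const f = m[r][col] / m[col][col];
      for (let c = col; c <= n; c++) m[r][c] -= f * m[col][c];
    }
  }

  // Back substitution.
  const x = new Array(n).fill(0);
  for (let r = n - 1; r >= 0; r--) {
    let sum = m[r][n];
    for (let c = r + 1; c < n; c++) sum -= m[r][c] * x[c];
    x[r] = sum / m[r][r];
  }
  return x;
}

const x = solveLinearSystem([[9, 6, -3], [2, -2, 4], [-2, 1, -2]], [6, -4, 0]);
console.log(x.map(Math.round)); // [2, -4, -4]
```

Substituting the rounded values back into the three equations reproduces 6, -4, and 0, which confirms the solution.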
In addition to this, you can use the svd_solve function. In that case, you will need to
define an X matrix as well:
jsfeat.linalg.svd_solve(A, X, B);
A perspective example
Let us show you a more catchy illustration. Suppose you have an image that is
distorted by perspective or you want to rectify an object plane, for example, a
building wall. Here's an example:
Looks good, doesn't it? How do we do that? Let's look at the code:
var imgRectified = new jsfeat.matrix_t(mat.cols, mat.rows, jsfeat.U8_t | jsfeat.C1_t);
var transform = new jsfeat.matrix_t(3, 3, jsfeat.F32_t | jsfeat.C1_t);
jsfeat.math.perspective_4point_transform(transform,
    0, 0, 0, 0,         // first pair: x1_src, y1_src, x1_dst, y1_dst
    640, 0, 640, 0,     // x2_src, y2_src, x2_dst, y2_dst, and so on
    640, 480, 640, 480,
    0, 480, 180, 480);
jsfeat.matmath.invert_3x3(transform, transform);
jsfeat.imgproc.warp_perspective(mat, imgRectified, transform, 255);
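How a 3 x 3 perspective transform moves a single point can be sketched in plain JavaScript. The helper `applyTransform` and the sample matrices are hypothetical and do not come from JSFeat; the sketch assumes a row-major 3 x 3 matrix and the usual division by the third homogeneous coordinate.

```javascript
// Map a point (x, y) through a 3 x 3 row-major perspective transform.
function applyTransform(h, x, y) {
  const w = h[6] * x + h[7] * y + h[8]; // third homogeneous coordinate
  return {
    x: (h[0] * x + h[1] * y + h[2]) / w,
    y: (h[3] * x + h[4] * y + h[5]) / w
  };
}

const identity = [1, 0, 0, 0, 1, 0, 0, 0, 1];     // leaves points in place
const shift = [1, 0, 10, 0, 1, -5, 0, 0, 1];      // moves points by (10, -5)
const projective = [1, 0, 0, 0, 1, 0, 0.5, 0, 1]; // scale depends on x

console.log(applyTransform(identity, 640, 480)); // { x: 640, y: 480 }
console.log(applyTransform(shift, 1, 2));        // { x: 11, y: -3 }
console.log(applyTransform(projective, 2, 4));   // { x: 1, y: 2 }
```

The last matrix shows the perspective effect: the bottom row makes the divisor depend on the point's position, so points farther along x shrink more.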
Summary
In this chapter, we saw many useful Computer Vision applications. Every time
you want to implement something new, you need to start from the beginning.
Fortunately, there are many libraries that can help you with your investigation. Here,
we mainly covered the JSFeat library, since it provides basic methods for Computer
Vision applications. We discussed how and when to apply the core of this library.
Nevertheless, this is just a starting point, and if you want to see more exciting math
topics and dig into the Computer Vision logic, we strongly encourage you to go
through the next chapters of this book. See you there!