OpenCV 3.0 Computer Vision With Java - Sample Chapter
Preface
Living in times when self-driving vehicles are becoming a reality might make curious minds wonder how a computer's incipient vision works. Having a face recognized for access control, getting our pictures automatically organized by subject or person, and having characters automatically recognized from paper scans are tasks that have become common in our lives. All of these tasks belong to the study area known as computer vision.
As a scientific discipline, computer vision can be described as the theory behind systems that extract information from images. It has been adopted to extract valuable measurements from medical images, as well as to help humans delineate the boundaries of important image areas in so-called semi-automatic procedures.
In the context of providing a simple-to-use computer vision infrastructure to help
people rapidly build sophisticated vision applications, an open source library was
created: OpenCV. It was designed for real-time applications and is written in C++,
containing several hundred computer vision algorithms.
Although OpenCV had its debut alpha release back in January 1999, it was only in
February 2013 that it officially supported desktop Java through bindings. As this
is one of the most popular introductory teaching languages adopted in computer
science departments as well as K-12 computer-related courses, it is important to
have a good reference for how to build vision apps in a Java environment.
This book covers the basic OpenCV computer vision algorithms and their integration with Java. As the Swing GUI widget toolkit is widely adopted to build GUIs in Java, you will benefit from the chapters that deal with this topic, and you will also learn how to set up a development environment that handles native code bindings. In addition, operations such as stretching, shrinking, warping, and rotating, as well as finding edges, lines, and circles, are all covered through interesting and practical sample projects in this book.
As the Kinect device has become a great tool for background segmentation, we have covered it in this book as well.
Another hot topic that is commonly explored with computer vision is machine
learning, and in this book, you will find useful information to create your own object
tracker and to use OpenCV's built-in face tracker as well.
Since Java has been widely used for web applications, we have covered computer
vision applications on the server side as well, explaining the details of image
uploading and integration with OpenCV.
By the end of this book, you will have a solid background in how to use Java with OpenCV, from setup to the server side; a brief explanation of the basic computer vision topics is also covered. You will also get the source code of several complete projects, which you can extend and to which you can add your own functionality.
Chapter 6, Detecting Foreground and Background Regions and Depth with a Kinect Device, explores the important problem of extracting the background. Furthermore, it explains how to use a Kinect device to retrieve depth information.
Chapter 7, OpenCV on the Server Side, explains how to set up a web server application
with OpenCV.
Image Transforms
This chapter covers the methods that change an image into an alternate representation of data in order to address important problems of computer vision and image processing. Examples of these methods are the operators used to find image edges, as well as the transforms that help us find lines and circles in an image. In this chapter, we also cover the stretch, shrink, warp, and rotate operations. A very useful and famous transform is the Fourier transform, which converts signals between the time domain and the frequency domain. In OpenCV, you can find the Discrete Fourier Transform (DFT) and the Discrete Cosine Transform (DCT). Another transform covered in this chapter is related to integral images, which allow the rapid summing of subregions, a very useful step in face-tracking algorithms. Besides this, you will also get to see the distance transform and histogram equalization in this chapter.
In this chapter, we will cover the following topics:
Gradients and Sobel derivatives
The Laplacian and Canny edge detectors
The line and circle Hough transforms
Geometric transforms: stretch, shrink, warp, and rotate
The Discrete Fourier Transform (DFT) and Discrete Cosine Transform (DCT)
Integral images
Distance transforms
Histogram equalization
By the end of this chapter, you will have learned a handful of transforms that will enable you to find edges, lines, and circles in images. Besides, you will be able to stretch, shrink, warp, and rotate images, and you will be able to change an image from the spatial domain to the frequency domain. Other important transforms used for face tracking will be covered in this chapter as well. Finally, distance transforms and histogram equalization will also be explored in detail.
The Sobel operator estimates the horizontal derivative of an image by convolving it with the following kernel, Gx:

Gx =  | -1   0  +1 |
      | -2   0  +2 |
      | -1   0  +1 |
This means that, for each input pixel, the value of its upper-right neighbor, plus twice its right neighbor, plus its bottom-right neighbor, minus its upper-left neighbor, minus twice its left neighbor, minus its bottom-left neighbor will be calculated, yielding the resulting image. In order to use this operator in OpenCV, you can call Imgproc's Sobel function according to the following signature:
public static void Sobel(Mat src, Mat dst, int ddepth, int dx, int dy)
The src parameter is the input image and dst is the output. The ddepth parameter is the output image's depth; when it is assigned -1, the output has the same depth as the source. The dx and dy parameters give the order of the derivative in each of these directions; for instance, when setting dy to 0 and dx to 1, the kernel used is the one shown in the preceding matrix. The example project kernels from this chapter provides a customizable look at these operators, as shown in the following screenshot:
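The following is a minimal sketch of how the Sobel function might be called to obtain both derivatives; the grayImage variable and the CV_16S output depth are assumptions for illustration and are not taken from the kernels sample:

Mat gradX = new Mat();
Mat gradY = new Mat();
// dx = 1, dy = 0 gives the horizontal derivative (the kernel shown above);
// dx = 0, dy = 1 gives the vertical one. CV_16S keeps negative values.
Imgproc.Sobel(grayImage, gradX, CvType.CV_16S, 1, 0);
Imgproc.Sobel(grayImage, gradY, CvType.CV_16S, 0, 1);
// Convert back to 8 bits for display.
Mat absGradX = new Mat();
Mat absGradY = new Mat();
Core.convertScaleAbs(gradX, absGradX);
Core.convertScaleAbs(gradY, absGradY);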
Another way to find edges is through the Laplacian operator, which is based on the sum of the second derivatives of the image in the x and y directions:

Laplace(f) = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}
This operator can be approximated by a convolution with the following kernel when using finite difference methods and a 3x3 aperture:

|  0   1   0 |
|  1  -4   1 |
|  0   1   0 |
The signature of Imgproc's Laplacian function, which applies this operator, is as follows:
Laplacian(Mat source, Mat destination, int ddepth)
While source and destination matrices are simple parameters, ddepth is the depth
of the destination matrix. When you set this parameter to -1, it will have the same
depth as the source image, although you might want more depth when you apply
this operator. Besides this, there are overloaded versions of this method that receive
an aperture size, a scale factor, and an adding scalar.
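As a minimal sketch, the operator could be applied like this; grayImage is an assumed single-channel input, and CV_16S is chosen here so that negative responses are not clipped:

Mat laplacian = new Mat();
Imgproc.Laplacian(grayImage, laplacian, CvType.CV_16S);
// Convert back to 8 bits for display.
Core.convertScaleAbs(laplacian, laplacian);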
Besides using the Laplacian method, you can also use the Canny algorithm, an excellent approach proposed by the computer scientist John F. Canny, who optimized edge detection for a low error rate, a single response per edge, and correct localization. In order to achieve this, the Canny algorithm applies a Gaussian filter to reduce noise, calculates intensity gradients through Sobel, suppresses spurious responses, and applies double thresholds followed by a hysteresis step that discards weak and unconnected edges. For more information, check the paper [2]. The method's signature is as follows:
Canny(Mat image, Mat edges, double threshold1, double threshold2, int apertureSize, boolean L2gradient)
The image parameter is the input matrix and edges is the output image. threshold1 is the lower threshold for the hysteresis procedure (values below it are discarded), and threshold2 is the higher threshold (values above it are considered strong edges, while values between the two thresholds are kept only if they are connected to strong edges). The apertureSize parameter is used for the Sobel operator when calculating the gradient, and the boolean flag tells OpenCV which norm to use for the gradient. You can also check out the source code that uses this operator in the kernels project sample in this chapter.
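For reference, a minimal call might look like the following; the threshold values and the grayImage input are illustrative assumptions rather than the values used in the sample project:

Mat edges = new Mat();
// Low threshold 50, high threshold 150, 3x3 Sobel aperture, L1 gradient norm.
Imgproc.Canny(grayImage, edges, 50, 150, 3, false);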
The hough project from this chapter shows an example of how to use both the line and circle Hough transforms. The following is the code to retrieve lines through Imgproc.HoughLines:
Mat canny = new Mat();
Imgproc.Canny(originalImage, canny, 10, 50, aperture, false);
image = originalImage.clone();
Mat lines = new Mat();
Imgproc.HoughLines(canny, lines, 1, Math.PI/180, lowThreshold);
Note that we need to apply the Hough transform over an edge image; therefore, the
first two lines of the preceding code will take care of this. Then, the original image is
cloned for display and a Mat object is created in the fourth line in order to keep the
lines. In the last line, we can see the application of HoughLines.
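Each row of the lines matrix holds a (rho, theta) pair describing one detected line in polar form. The following is a rough sketch of how these lines could be drawn over the cloned image; the row-wise layout of the output matrix and the drawing parameters are assumptions, not the hough sample's actual code:

for (int i = 0; i < lines.rows(); i++) {
    double[] polar = lines.get(i, 0);        // {rho, theta}
    double rho = polar[0], theta = polar[1];
    double a = Math.cos(theta), b = Math.sin(theta);
    double x0 = a * rho, y0 = b * rho;
    // Two far-apart points along the line, used only for drawing.
    Point pt1 = new Point(x0 - 1000 * b, y0 + 1000 * a);
    Point pt2 = new Point(x0 + 1000 * b, y0 - 1000 * a);
    Imgproc.line(image, pt1, pt2, new Scalar(0, 0, 255), 2);
}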
Besides the standard Hough transform, OpenCV also offers a probabilistic Hough line transform as well as a circular version. Both implementations are explored in the same hough sample project, and the following screenshot shows the working of the circular version:
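A minimal sketch of how the two variants might be invoked follows; canny, grayImage, and lowThreshold are assumed to come from the same sample, and the remaining parameter values are illustrative:

// Probabilistic variant: returns line segments as {x1, y1, x2, y2} rows.
Mat segments = new Mat();
Imgproc.HoughLinesP(canny, segments, 1, Math.PI / 180, lowThreshold);

// Circular variant: works directly on the grayscale image and runs its own
// internal edge detection; dp = 1 and a minimum distance between centers.
Mat circles = new Mat();
Imgproc.HoughCircles(grayImage, circles, Imgproc.HOUGH_GRADIENT, 1, grayImage.rows() / 4);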
Here, we will find the perspective transform that maps the side of a building seen in a perspective view to its frontal view:
Note that the input to this problem is the perspective photograph of the building, which is seen on the left-hand side of the preceding image, as well as the four corner points of the highlighted quadrilateral shape. The output is to the right and shows what a viewer would see if he or she looked straight at the side of the building.
Since affine transforms are a subset of perspective transformations, we will focus on
the latter ones here. The code available for this example is in the warps project of this
chapter. The main method used here is warpPerspective from Imgproc. It applies
a perspective transformation to an input image. Here is the method signature for the
warpPerspective method:
public static void warpPerspective(Mat src, Mat dst, Mat M, Size dsize)
The Mat src parameter is, naturally, the input image, which is the left-hand
side image in the preceding screenshot, while dst Mat is the image on the
right-hand side; make sure you initialize this parameter before using the method.
The not-so-straightforward parameter here is Mat M, which is the warping matrix.
In order to calculate it, you can use the getPerspectiveTransform method from
Imgproc as well. This method will calculate the perspective matrix from two sets
of the four correlated 2D points, the source and destination points. In our example,
the source points are the ones that are highlighted on the left-hand side of the
screenshot, while the destination points are the four corner points of the image to the
right. These points can be stored through the MatOfPoint2f class, which stores the
Point objects. The getPerspectiveTransform method's signature is as follows:
public static Mat getPerspectiveTransform(Mat src, Mat dst)
The src and dst parameters are instances of the MatOfPoint2f class mentioned previously, which is a subclass of Mat.
In our example, we added a mouse listener to retrieve the points clicked by the user. A detail to keep in mind is that these points are stored in the following order: top-left, top-right, bottom-left, and bottom-right. In the example application, the point currently being modified can be chosen through the four radio buttons above the images. Click and drag listeners have both been added to the code, so both approaches work.
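Putting the pieces together, a rough sketch of the whole operation might look like the following; the source corner coordinates are made-up values standing in for the points clicked by the user, not the ones from the warps project:

// Four clicked corners in the source image: top-left, top-right,
// bottom-left, bottom-right (hypothetical coordinates).
MatOfPoint2f srcPoints = new MatOfPoint2f(
        new Point(30, 60), new Point(320, 20),
        new Point(25, 460), new Point(330, 440));
// Corresponding corners of the output image.
MatOfPoint2f dstPoints = new MatOfPoint2f(
        new Point(0, 0), new Point(originalImage.cols() - 1, 0),
        new Point(0, originalImage.rows() - 1),
        new Point(originalImage.cols() - 1, originalImage.rows() - 1));
Mat warpMatrix = Imgproc.getPerspectiveTransform(srcPoints, dstPoints);
Mat warped = new Mat();
Imgproc.warpPerspective(originalImage, warped, warpMatrix, originalImage.size());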
Discrete Fourier Transform and Discrete Cosine Transform
The Discrete Fourier Transform (DFT) converts an image from the spatial domain to the frequency domain. For an N x N image, it is defined as follows:

F(k, l) = \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} f(i, j) \, e^{-i 2\pi (ki/N + lj/N)}
The f(i,j) value is the image in the spatial domain and F(k,l) is the image in the frequency domain. Note that F(k,l) is a complex function, which means that it has a real and an imaginary part. This way, it will be represented by two OpenCV Mat objects or by a single Mat with two channels. The easiest way to analyze a DFT is by plotting its magnitude and taking its logarithm, since the values of the DFT can span several orders of magnitude.
For instance, consider a pulse pattern, a signal that alternates between zero (represented as black) and its maximum value (represented as white), shown on the left, and its Fourier transform magnitude with the logarithm applied, shown on the right:
Looking back at the preceding DFT transform, we can think of F(k,l) as the value
that would be yielded by multiplying each point of the spatial image with a base
function, which is related to the frequency domain, and by summing the products.
Remember that the base functions are sinusoidal and have increasing frequencies. This way, if a base function oscillates at the same rate as the signal, the products will sum up to a large number, which will be seen as a bright dot in the Fourier Transform image. On the other hand, if the given frequency is not present in the image, the oscillation and multiplication with the image will result in a small number, which won't be noticeable in the Fourier Transform image.
Another thing to observe from the equation is that F(0,0) corresponds to a base function that is always 1. This way, F(0,0) simply equals the sum of all the pixels of the spatial image, while F(N-1, N-1) corresponds to the base function with the highest frequency in the image. Note that the previous image basically has a DC component, which is the image mean, and it can be seen as the white dot in the middle of the Discrete Fourier Transform image. Besides, the image to the left can be seen as a series of pulses and hence has a dominant frequency along the x axis, which can be noticed from the two dots near the central point in the Fourier Transform image to the right. Nonetheless, multiple frequencies are needed to approximate the pulse shape, so more dots can be seen along the x axis of the image to the right. The following screenshot gives more insight and helps you understand the Fourier analysis:
Now, we will again check the DC level at the center of the DFT image, to the right,
as a bright central dot. Besides, we can also check multiple frequencies in a diagonal
pattern. An important piece of information that can be retrieved is the direction of
spatial variation, which is clearly seen as bright dots in the DFT image.
It is time to work on some code now. The following code shows you how to make room to apply the DFT. Remember, from the preceding equation, that the result of a DFT is complex. Besides, we need the values stored as floating point numbers. This way, we first convert our 3-channel image to gray and then to float. After this, we put the converted image and a zero-filled Mat object into a list of Mats, combining them into a single two-channel Mat object through the use of the Core.merge function, shown as follows:
Mat gray = new Mat();
Imgproc.cvtColor(originalImage, gray, Imgproc.COLOR_RGB2GRAY);
Mat floatGray = new Mat();
gray.convertTo(floatGray, CvType.CV_32FC1);
List<Mat> matList = new ArrayList<Mat>();
matList.add(floatGray);
Mat zeroMat = Mat.zeros(floatGray.size(), CvType.CV_32F);
matList.add(zeroMat);
Mat complexImage = new Mat();
Core.merge(matList, complexImage);
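The forward transform itself, which is not shown in the excerpt above but has to run before the magnitude can be computed, is a single call applied in place on the two-channel matrix:

Core.dft(complexImage, complexImage);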
In order to get some meaningful information, we will display the image, but first, we have to obtain its magnitude. To compute it, we use the standard approach we learned in school, which is taking the square root of the sum of the squares of the real and imaginary parts of each number.
Again, OpenCV has a function for this, which is Core.magnitude, whose signature
is magnitude(Mat x, Mat y, Mat magnitude), as shown in the following code:
List<Mat> splitted = new ArrayList<Mat>();
Core.split(complexImage,splitted);
Mat magnitude = new Mat();
Core.magnitude(splitted.get(0), splitted.get(1), magnitude);
Before using Core.magnitude, notice how the two channels of the DFT result are unpacked into the splitted list of Mats using Core.split.
Since the values can be in different orders of magnitude, it is important to get the
values in a logarithmic scale. Before doing this, it is important to add 1 to all the
values in the matrix just to make sure we won't get negative values when applying
the log function. Besides this, there's already an OpenCV function to deal with
logarithms, which is Core.log:
Core.add(Mat.ones(magnitude.size(), CvType.CV_32F), magnitude, magnitude);
Core.log(magnitude, magnitude);
Now, it is time to shift the image to the center, so that it's easier to analyze its
spectrum. The code to do this is simple and goes like this:
int cx = magnitude.cols()/2;
int cy = magnitude.rows()/2;
// The four quadrants of the magnitude image.
Mat q0 = new Mat(magnitude, new Rect(0, 0, cx, cy));   // top-left
Mat q1 = new Mat(magnitude, new Rect(cx, 0, cx, cy));  // top-right
Mat q2 = new Mat(magnitude, new Rect(0, cy, cx, cy));  // bottom-left
Mat q3 = new Mat(magnitude, new Rect(cx, cy, cx, cy)); // bottom-right
Mat tmp = new Mat();
// Swap the top-left quadrant with the bottom-right one.
q0.copyTo(tmp);
q3.copyTo(q0);
tmp.copyTo(q3);
// Swap the top-right quadrant with the bottom-left one.
q1.copyTo(tmp);
q2.copyTo(q1);
tmp.copyTo(q2);
As a last step, it's important to normalize the image, so that it can be seen in a
better way. Before we normalize it, it should be converted to CV_8UC1:
magnitude.convertTo(magnitude, CvType.CV_8UC1);
Core.normalize(magnitude, magnitude, 0, 255, Core.NORM_MINMAX, CvType.CV_8UC1);
When you deal with real-valued data, as is the case with images, it's often enough to calculate only half of the DFT. This way, an analogous concept called the Discrete Cosine Transform can be used. In case you need it, it can be invoked through Core.dct.
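A minimal sketch of that call follows, assuming floatGray is the single-channel CV_32F image prepared earlier; note that dct requires an even number of rows and columns:

Mat dctResult = new Mat();
Core.dct(floatGray, dctResult);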
Integral images
Some detection algorithms, such as OpenCV's face detection algorithm, make heavy use of features like the ones shown in the following image:
These are the so-called Haar-like features, and they are calculated as the sum of pixels in the white area minus the sum of pixels in the black area. You might find this type of feature somewhat odd, but when it is trained for face detection, an extremely powerful classifier can be built using only two of these features, as depicted in the following image:
In fact, a classifier that uses only the two preceding features can be adjusted to detect 100 percent of a given face training database with only 40 percent false positives.
Summing all the pixels of an image, as well as calculating the sum of each rectangular area, can be a long process. However, these features must be evaluated over many candidate windows of a given input image; hence, calculating them quickly is a requirement that we need to fulfill.
First, let's define an integral image sum as the following expression:
sum(X, Y) = \sum_{x < X,\, y < Y} image(x, y)
For instance, take the following 3 x 3 matrix A as the input image:

A =  |  0   2   4 |
     |  6   8  10 |
     | 12  14  16 |
Its integral image, Sum A, gains an extra row and column of zeroes and looks like this:

Sum A =  |  0   0   0   0 |
         |  0   0   2   6 |
         |  0   6  16  30 |
         |  0  18  42  72 |
The sum of the pixels inside any rectangle of the original image can then be obtained from just four values of the integral image:

\sum_{x_1 \le x \le x_2,\, y_1 \le y \le y_2} image(x, y) = sum(x_2, y_2) - sum(x_1 - 1, y_2) - sum(x_2, y_1 - 1) + sum(x_1 - 1, y_1 - 1)
This means that, in order to find the sum of a given rectangle bounded by the points (x1,y1), (x2,y1), (x2,y2), and (x1,y2), you just need to take the integral image at the point (x2,y2) and subtract from it the values at the points (x1-1,y2) and (x2,y1-1). Also, since the value of the integral image at (x1-1, y1-1) has been subtracted twice, we just need to add it back once.
The following code will generate the preceding matrix and make use of
Imgproc.integral to create the integral images:
Mat image = new Mat(3,3 ,CvType.CV_8UC1);
Mat sum = new Mat();
byte[] buffer = {0,2,4,6,8,10,12,14,16};
image.put(0,0,buffer);
System.out.println(image.dump());
Imgproc.integral(image, sum);
System.out.println(sum.dump());
The output of this program is like the one shown in the preceding matrices for A and
Sum A.
It is important to verify that the output is a 4 x 4 matrix because of the initial row and
column of zeroes, which are used to make the computation efficient.
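As a small illustration of the rectangle-sum idea, and assuming the image and sum matrices from the preceding snippet, the sum of the bottom-right 2 x 2 block of A (the values 8, 10, 14, and 16) can be read from just four entries of the integral image:

// Because of the extra leading row and column of zeroes, the four entries
// to read for rows 1-2 and columns 1-2 of A are (3,3), (1,3), (3,1), and (1,1):
// 72 - 6 - 18 + 0 = 48, which equals 8 + 10 + 14 + 16.
double regionSum = sum.get(3, 3)[0] - sum.get(1, 3)[0]
        - sum.get(3, 1)[0] + sum.get(1, 1)[0];
System.out.println(regionSum);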
Distance transforms
Simply put, a distance transform applied to an image generates an output image whose pixel values are the distance to the closest zero-valued pixel in the input image. Basically, each pixel holds its distance to the closest background pixel, given a specified distance measure. The following screenshot gives you an idea of what happens to the silhouette of a human body:
This transform can be very useful in the process of obtaining the topological skeleton of a given segmented image as well as in producing blurring effects. Another interesting application of this transform is in the segmentation of overlapping objects, along with a watershed transform.
Generally, the distance transform is applied to an edge image, which results from a
Canny filter. We are going to make use of Imgproc's distanceTransform method,
which can be seen in action in the distance project, which you can find in this
chapter's source code. Here are the most important lines of this example program:
protected void processOperation() {
    Imgproc.Canny(originalImage, image, 220, 255, 3, false);
    Imgproc.threshold(image, image, 100, 255, Imgproc.THRESH_BINARY_INV);
    Imgproc.distanceTransform(image, image, Imgproc.CV_DIST_L2, 3);
    image.convertTo(image, CvType.CV_8UC1);
    Core.multiply(image, new Scalar(20), image);
    updateView();
}
Firstly, a Canny edge detector filter is applied to the input image. Then, a threshold with THRESH_BINARY_INV converts the edges to black and the background to white. Only then is the distance transform applied. Its first argument is the input image, the second one is the output matrix, and the third argument specifies how distances are calculated; in our example, CV_DIST_L2 means the Euclidean distance, while other distances, such as CV_DIST_L1 or CV_DIST_L12, among others, also exist. Since the output of distanceTransform is a single-channel, 32-bit float image, a conversion is required. Finally, we apply Core.multiply to increase the contrast.
The following screenshot gives you a good idea of the whole process:
Histogram equalization
The human visual system is very sensitive to contrast in images, which is the difference in the color and brightness of different objects. Besides, the human eye is a miraculous system that can perceive intensities across roughly 10^16 light levels [4]. No wonder some sensors can mess up the image data.
When analyzing images, it is very useful to draw their histograms. A histogram simply shows the lightness distribution of a digital image: you count the number of pixels at each lightness value and plot the counts as a distribution graph. This gives us great insight into the dynamic range of an image.
When a picture has been captured over a very narrow light range, it becomes difficult to see details in the shadowed areas or in other areas with poor local contrast. Fortunately, there's a technique that spreads the most frequent intensity values toward a more uniform distribution, called histogram equalization. The following image shows the same picture with its respective histograms before and after the histogram equalization technique is applied:
Note that the light values, located at the rightmost part of the upper histogram, are rarely used, while the middle-range values are crowded together. Spreading the values along the full range yields better contrast, and details can be more easily perceived as a result. The histogram-equalized image makes better use of the available intensities, which generates better contrast. In order to accomplish this task, a cumulative distribution can be used to remap the histogram to something that resembles a uniform distribution. Then, it's just a matter of checking where the points from the original histogram would be mapped to the uniform distribution through the use of a cumulative Gaussian distribution, for instance.
Now, the good part is that all these details have been wrapped in a simple call to
OpenCV's equalizeHist function. Here is the sample from the histogram project
in this chapter:
protected void processOperation() {
    Imgproc.cvtColor(originalImage, grayImage, Imgproc.COLOR_RGB2GRAY);
    Imgproc.equalizeHist(grayImage, image);
    updateView();
}
This piece of code simply converts the image to a single-channel image and equalizes it; however, you can use equalizeHist on a color image as long as you treat each channel separately. The Imgproc.equalizeHist method outputs the corrected image following the previously mentioned concept.
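A minimal sketch of the per-channel approach follows; colorImage is an assumed three-channel input. Note that equalizing the channels independently can shift colors, so converting to a luminance/chrominance color space and equalizing only the luminance channel is often preferable:

List<Mat> channels = new ArrayList<Mat>();
Core.split(colorImage, channels);
for (Mat channel : channels) {
    // Equalize each channel in place.
    Imgproc.equalizeHist(channel, channel);
}
Mat equalizedColor = new Mat();
Core.merge(channels, equalizedColor);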
References
1. Sobel, I. and Feldman, G., A 3x3 Isotropic Gradient Operator for Image Processing, presented at a talk at the Stanford Artificial Intelligence Project, 1968.
2. Canny, J., A Computational Approach to Edge Detection, IEEE Transactions on Pattern Analysis and Machine Intelligence, 1986.
3. Matas, J., Galambos, C., and Kittler, J.V., Robust Detection of Lines Using the Progressive Probabilistic Hough Transform, CVIU 78(1), pp. 119-137, 2000.
4. Banterle, Francesco; Artusi, Alessandro; Debattista, Kurt; Chalmers, Alan, Advanced High Dynamic Range Imaging: Theory and Practice, CRC Press.
Summary
This chapter covered the key image transforms used in everyday computer vision. We started with the important edge detectors, where you gained experience with the Sobel, Laplacian, and Canny operators. Then, we saw how to use the Hough transforms to find straight lines and circles. After that, the geometric transforms stretch, shrink, warp, and rotate were explored with an interactive sample. We then explored how to transform images from the spatial domain to the frequency domain using the Discrete Fourier Transform. After that, we showed you a trick to calculate Haar-like features quickly through the use of integral images. We then explored the important distance transforms and finished the chapter by explaining histogram equalization.
Now, be ready to dive into machine learning algorithms, as we will cover how to detect faces in the next chapter. Also, you will learn how to create your own object detector and understand how supervised learning works in order to better train your classification trees.