Robotics Perception Week 3 Assignment
Image Projection
1 Introduction
In this programming assignment, we will use tracking and pose estimation to implement
a simple augmented reality application. There will be two steps to this process. First,
we will use a KLT tracker to get the position of corners across different frames. Then
we will use homography estimation to compute the 3D pose of a set of 4 points in the
world and, instead of simply overlaying a logo like in the previous assignment, render
a 3D object in the frame. For simplicity we will be rendering a cube, but in principle,
any object could be used. A few example images of the results are shown in Figure 1. The goal
is, given a set of 4 projected points in your image with known world coordinates (the
AprilTag or soccer goal corners), to find the position and orientation needed to draw
the cube over the points.
2 Technical Details
The functions you must implement are track_corners.m and ar_cube.m. First you will
use MATLAB's point tracker to implement KLT in track_corners, to find where the
corners are in each frame. The output of track_corners will be an array of the points
you have tracked across the images.
Then, in ar_cube, you will use the pixel coordinates of the corners you have tracked, the
world coordinates of the points you have to render, and the calibration matrix to estimate
the camera pose. The desired output is the position and orientation (represented as a
vector and a rotation matrix, respectively). Using that pose, you will generate the
projected points (in pixel coordinates) of a virtual object. You will be given a set of
generic points to project, but the visualization will use a cube that lies on top of the points.
You can visualize the results of these functions with the play_video script (more
details below).
2.1 KLT Tracking
In track_corners you will be given skeleton code, where you have to fill in where the
comments indicate. You are given the images in which you will track the points, and the
locations of the points in the first image where the tracking should start. You do
not need to implement KLT yourself; you should use the MATLAB Computer Vision
libraries. Search online for the relevant functions. You will store all the points in a
3D array. To draw the virtual objects in the scene, these corners will be passed to
ar_cube.
\[ H p_w \sim p_c = K^{-1} p_{im}, \qquad H = \lambda \begin{bmatrix} | & | & | \\ r_1 & r_2 & t \\ | & | & | \end{bmatrix} \]
We give you K, p_im, and p_w, and thus you can solve for H, then extract the rotation
R and translation t from this homography, enforcing that the z component of t is
positive (more in the appendix). Please note you are highly encouraged to create
your own test cases for this - this can be done by generating a random rotation,
translation, and scale in MATLAB, then passing them through ar_cube. You can search
online to learn how to generate a random rotation.
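As a sketch of such a test case (written here in NumPy rather than MATLAB; the names random_rotation, rng, and scale are illustrative, not part of the assignment code), one common way to generate a random rotation is to orthonormalize a random matrix with a QR decomposition:

```python
import numpy as np

def random_rotation(rng):
    """Return a random 3x3 rotation matrix via QR orthonormalization."""
    Q, Rq = np.linalg.qr(rng.standard_normal((3, 3)))
    Q = Q @ np.diag(np.sign(np.diag(Rq)))  # make the factorization sign-consistent
    if np.linalg.det(Q) < 0:               # ensure a proper rotation (det = +1)
        Q[:, 2] *= -1.0
    return Q

rng = np.random.default_rng(0)
R = random_rotation(rng)
t = rng.standard_normal(3)
t[2] = abs(t[2]) + 1.0                     # keep the z component of t positive
scale = 2.0                                # arbitrary homography scale lambda
H = scale * np.column_stack([R[:, 0], R[:, 1], t])
```

Passing the synthetic H = lambda [r1 r2 t] through your decomposition code should recover R and t exactly (with t's z component positive), which makes bugs easy to spot.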
Once you have the pose and the 3D points X of the virtual object, we simply project them onto the image:

\[ X_c = \begin{bmatrix} x_c \\ y_c \\ z_c \end{bmatrix} = K(RX + t), \qquad X_{im} = X_c / z_c \]
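The projection step above can be sketched in a few lines of NumPy (the assignment itself uses MATLAB; project_points and the sample calibration matrix below are illustrative assumptions, not the required interface):

```python
import numpy as np

def project_points(K, R, t, X_world):
    """Project Nx3 world points into pixel coordinates.

    Implements X_c = K (R X + t), then X_im = X_c / z_c.
    """
    X_c = (K @ (R @ X_world.T + t.reshape(3, 1))).T  # N x 3 camera-frame points
    return X_c[:, :2] / X_c[:, 2:3]                  # divide each row by its z_c

# Identity orientation, camera 5 units from the plane, simple calibration matrix.
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
R = np.eye(3)
t = np.array([0.0, 0.0, 5.0])
cube_corner = np.array([[0.0, 0.0, 0.0]])
pix = project_points(K, R, t, cube_corner)
```

For a point at the world origin seen head-on, the projection lands at the principal point, (320, 240) here.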
3 Visualizing Results
To check the results of only your tracker, run the generate_klt_images script. To
visualize the full projected AR as a video, run the generate_ar_images script. Then call
the MATLAB command:

play_video(generated_imgs)

These will be run on a sequence of images of an AprilTag. You can also generate your
own video with a set of points and edit generate_klt_images or generate_ar_images
if you would like to play with your own data.
4 Submitting
To submit your results, run the submit script, which will test your est_homography.m
and ar_cube.m functions by passing them some sample points. The Barca Real goal
image sequence will be used for testing your corner tracking. The submit script will
generate a mat file called RoboticsPerceptionWeek3Submission.mat. Upload this file
onto the assignment page, and you should receive your score immediately.
5 Appendix A: Computing Pose from a Homography
As we saw in the lectures, the projection of points on a plane takes the form of a homography. Assuming
all the world points lie on Z = 0:
\[ \begin{bmatrix} x_c \\ y_c \\ z_c \end{bmatrix} = \begin{bmatrix} R & t \end{bmatrix} \begin{bmatrix} x_w \\ y_w \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} r_1 & r_2 & r_3 & t \end{bmatrix} \begin{bmatrix} x_w \\ y_w \\ 0 \\ 1 \end{bmatrix} \]

\[ \Longrightarrow z_c \begin{bmatrix} x_c/z_c \\ y_c/z_c \\ 1 \end{bmatrix} = \begin{bmatrix} r_1 & r_2 & t \end{bmatrix} \begin{bmatrix} x_w \\ y_w \\ 1 \end{bmatrix} \]

\[ \Longrightarrow \begin{bmatrix} x_{im} \\ y_{im} \\ 1 \end{bmatrix} = \frac{1}{z_c} \begin{bmatrix} r_1 & r_2 & t \end{bmatrix} \begin{bmatrix} x_w \\ y_w \\ 1 \end{bmatrix} = \underbrace{\begin{bmatrix} h_1 & h_2 & h_3 \end{bmatrix}}_{H} \begin{bmatrix} x_w \\ y_w \\ 1 \end{bmatrix} \]
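A quick numerical check of this identity (a NumPy sketch, not part of the assignment code): for a point on the Z = 0 plane, projecting with the full [R | t] matrix and mapping with H = [r1 r2 t] give the same normalized image point.

```python
import numpy as np

R = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])        # 90-degree rotation about the z axis
t = np.array([0.5, -0.2, 4.0])

H = np.column_stack([R[:, 0], R[:, 1], t])   # H = [r1 r2 t]

xw, yw = 1.0, 2.0
X_c = R @ np.array([xw, yw, 0.0]) + t        # full projection of (xw, yw, 0)
p_full = X_c[:2] / X_c[2]

p_h = H @ np.array([xw, yw, 1.0])            # homography applied to (xw, yw, 1)
p_homog = p_h[:2] / p_h[2]
```

Both routes give the same (x_im, y_im), which is exactly what the derivation above claims.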
Now that we have extracted H using the AprilTag corners, we need to extract R and t
from it. If there were no noise, we would have h_1 = λ r_1, h_2 = λ r_2, and h_3 = λ t for
some arbitrary scale factor λ. Since r_1 has norm 1, we could just say r_1 = h_1/‖h_1‖,
r_2 = h_2/‖h_2‖, r_3 = r_1 × r_2, and t = h_3/‖h_1‖. However, because of noise, h_1 and
h_2 may not be orthogonal, and thus what we computed would not be a rotation matrix.
To fix this we use the SVD once again, giving our final equations to extract R and t
from our estimated H:
\[ R' = \begin{bmatrix} | & | & | \\ h_1 & h_2 & h_1 \times h_2 \\ | & | & | \end{bmatrix} = U S V^T \]

\[ R = U \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & \det(U V^T) \end{bmatrix} V^T, \qquad t = \frac{h_3}{\|h_1\|} \]
IMPORTANT NOTE: The above determines the basic algorithm; however, since a
homography is only determined up to a scale, there is a sign ambiguity: if you negate
H you get the approximate rotation

\[ R' = \begin{bmatrix} -h_1 & -h_2 & (-h_1) \times (-h_2) \end{bmatrix} = \begin{bmatrix} -h_1 & -h_2 & (h_1 \times h_2) \end{bmatrix} \]

This gives a flip of the x and y axes, and thus we have two valid rotation matrices,
corresponding to the translations t and −t respectively. To disambiguate this, your code
must enforce that the z component of t is positive (i.e. H_{3,3} > 0, which makes t_3 > 0) and
multiply the H matrix by the appropriate sign to do this.
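Putting Appendix A together, a pose-extraction sketch in NumPy (hedged: this mirrors the equations above, and pose_from_homography is an illustrative name, not the required ar_cube.m interface) could look like:

```python
import numpy as np

def pose_from_homography(H):
    """Recover R and t from a plane-induced homography H = lambda [h1 h2 h3].

    Resolves the sign ambiguity by forcing the z component of t positive,
    then projects [h1 h2 h1 x h2] onto the rotation group with an SVD.
    """
    if H[2, 2] < 0:                     # sign ambiguity: make t_z positive
        H = -H
    h1, h2, h3 = H[:, 0], H[:, 1], H[:, 2]
    R_prime = np.column_stack([h1, h2, np.cross(h1, h2)])
    U, _, Vt = np.linalg.svd(R_prime)
    D = np.diag([1.0, 1.0, np.linalg.det(U @ Vt)])
    R = U @ D @ Vt                      # closest rotation to R_prime
    t = h3 / np.linalg.norm(h1)
    return R, t

# Round-trip check with a synthetic pose whose t has a positive z component.
R_true = np.array([[0.0, -1.0, 0.0],
                   [1.0,  0.0, 0.0],
                   [0.0,  0.0, 1.0]])
t_true = np.array([0.1, 0.2, 3.0])
H = 2.5 * np.column_stack([R_true[:, 0], R_true[:, 1], t_true])  # arbitrary scale
R_est, t_est = pose_from_homography(H)
```

Feeding in -H instead of H should recover the same pose, which is exactly the disambiguation the note above asks for.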
6 Appendix B: Estimating the Homography

\[ x' \sim H x \tag{1} \]

\[ \lambda \begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} \tag{2} \]

\[ \lambda x' = h_{11} x + h_{12} y + h_{13} \tag{3} \]
\[ \lambda y' = h_{21} x + h_{22} y + h_{23} \tag{4} \]
\[ \lambda = h_{31} x + h_{32} y + h_{33} \tag{5} \]
In order to recover x' and y', we can divide equations (3) and (4) by (5):
\[ x' = \frac{h_{11} x + h_{12} y + h_{13}}{h_{31} x + h_{32} y + h_{33}} \tag{6} \]

\[ y' = \frac{h_{21} x + h_{22} y + h_{23}}{h_{31} x + h_{32} y + h_{33}} \tag{7} \]
Rearranging the terms above, we can get a set of equations that is linear in the entries
of H:

\[ -h_{11} x - h_{12} y - h_{13} + h_{31} x x' + h_{32} y x' + h_{33} x' = 0 \tag{8} \]
\[ -h_{21} x - h_{22} y - h_{23} + h_{31} x y' + h_{32} y y' + h_{33} y' = 0 \tag{9} \]
Finally, we can write the above as a matrix equation:

\[ \begin{bmatrix} a_x \\ a_y \end{bmatrix} h = 0 \tag{10} \]
where:

\[ a_x = \begin{bmatrix} -x & -y & -1 & 0 & 0 & 0 & x x' & y x' & x' \end{bmatrix} \tag{11} \]
\[ a_y = \begin{bmatrix} 0 & 0 & 0 & -x & -y & -1 & x y' & y y' & y' \end{bmatrix} \tag{12} \]
\[ h = \begin{bmatrix} h_{11} & h_{12} & h_{13} & h_{21} & h_{22} & h_{23} & h_{31} & h_{32} & h_{33} \end{bmatrix}^T \tag{13} \]
Our matrix H has 8 degrees of freedom, and each point correspondence gives 2 equations,
so we will need 4 points to solve for h uniquely. So, given four points (such as the
corners provided for this assignment), we can generate the vectors a_x and a_y for each, and
concatenate them together:
\[ A = \begin{bmatrix} a_{x,1} \\ a_{y,1} \\ \vdots \\ a_{x,n} \\ a_{y,n} \end{bmatrix} \tag{14} \]

\[ A h = 0 \tag{15} \]
As A is an 8x9 matrix, it has a one-dimensional null space. Normally, we could use MATLAB's
null function; however, due to noise in our measurements, there may not be an h such
that Ah is exactly 0. Instead, we have, for some small ε:

\[ A h = \vec{\epsilon} \tag{16} \]

To resolve this issue, we can find the unit vector h that minimizes the norm of this ε. To
do this, we must use the SVD, which we will cover in week 3. For this project, all you
need to know is that you need to run the command:
[U, S, V] = svd(A);
The vector h will then be the last column of V. You can then construct the 3x3
homography matrix by reshaping the 9x1 vector h; note that h is ordered row-wise, so in
MATLAB use reshape(h, 3, 3).' (reshape fills column by column, so the transpose is needed).
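The whole procedure of this appendix can be sketched in NumPy (illustrative names; your est_homography.m performs the same steps with MATLAB's svd): build two rows of A per correspondence, take the right singular vector for the smallest singular value, and reshape.

```python
import numpy as np

def est_homography(src, dst):
    """Estimate H mapping src (Nx2 points) to dst (Nx2 points), N >= 4."""
    rows = []
    for (x, y), (xp, yp) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, x * xp, y * xp, xp])   # a_x, eq. (11)
        rows.append([0, 0, 0, -x, -y, -1, x * yp, y * yp, yp])   # a_y, eq. (12)
    A = np.array(rows)
    _, _, Vt = np.linalg.svd(A)
    h = Vt[-1]             # right singular vector of the smallest singular value
    # NumPy reshape is row-major, so no transpose is needed here (unlike MATLAB).
    return h.reshape(3, 3)

# Round-trip check against a known homography.
H_true = np.array([[1.0,   0.2,   3.0],
                   [0.1,   0.9,  -2.0],
                   [0.001, 0.002, 1.0]])
src = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
dst_h = (H_true @ np.column_stack([src, np.ones(4)]).T).T
dst = dst_h[:, :2] / dst_h[:, 2:3]
H_est = est_homography(src, dst)
H_est = H_est / H_est[2, 2]   # fix the arbitrary scale for comparison
```

With noise-free correspondences the smallest singular value is zero and H is recovered exactly (up to scale); with noisy tracked corners, the same code gives the least-squares estimate described above.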