UNIVERSIDADE DE COIMBRA
DEPARTAMENTO DE ENGENHARIA ELECTROTÉCNICA E DE COMPUTADORES
INSTITUTO DE SISTEMAS E ROBÓTICA
3030-290 COIMBRA, PORTUGAL

ARTICLE: Direct Solution to the Minimal Generalized Pose

Author: Pedro Miraldo (miraldo@isr.uc.pt)
Co-Author: Helder Araujo (helder@isr.uc.pt)
CONTENTS

I     Introduction
      I-A   General Imaging Model
      I-B   Previous Approaches vs Our Approach
      I-C   Background and Notation

II    Proposed Approach
      II-A  Decomposition of the Homography Matrix
      II-B  Formalization of the problem
      II-C  The Case of a Central Camera

III   Solving for the Intersection of Quadrics

IV    Algorithm Outline

V     Experimental Results
      V-A   Experiments with Synthetic Data
            V-A1  Numerical Errors
            V-A2  Number of Solutions
            V-A3  Computational Effort
      V-B   Experimental Results with Real Data
      V-C   Analysis of the Experimental Results

VI    Conclusions

References

Biographies
      Pedro Miraldo
      Helder Araujo
Preprint: submitted to the IEEE Trans. Cybernetics, August, 2013.
SUBMITTED TO IEEE TRANS. ON CYBERNETICS: DIRECT SOLUTION FOR THE MINIMAL GENERALIZED POSE PROBLEM
Direct Solution to the Minimal Generalized Pose
Pedro Miraldo and Helder Araujo
Abstract—Pose estimation is a relevant problem for imaging
systems whose applications range from augmented reality to
robotics. In this paper we propose a novel solution for the
minimal pose problem, within the framework of generalized
camera models and using a planar homography. Within this
framework and considering only the geometric elements of the
generalized camera models, an imaging system can be modeled
by a set of mappings associating image pixels to 3D straight lines.
This mapping is defined in a 3D world coordinate system. Pose
estimation performs the computation of the rigid transformation
between the original 3D world coordinate system and the one
in which the camera was calibrated. Using synthetic data, we
compare the proposed minimal-based method with the state-of-the-art methods in terms of numerical errors, number of solutions,
and processing time. From the experiments, we conclude that the
proposed method performs better, especially because there is a
smaller variation in numerical errors, while results are similar in
terms of number of solutions and computation time. To further
evaluate the proposed approach we tested our method with real
data. One of the relevant contributions of this paper is theoretical.
When compared to the state-of-the-art approaches, we propose a
completely new parametrization of the problem that can be solved
in four simple steps. In addition, our approach does not require
any predefined transformation of the dataset, which yields a
simpler solution for the problem.
Index Terms—Minimal problems in computer vision, absolute
pose, generalized camera models, homography matrix.
I. INTRODUCTION
Camera calibration within the framework of the generalized
camera models consists of estimating the correspondences
between image pixels and the corresponding projecting 3D
straight lines. Image space does not usually change and
therefore we can define a 3D coordinate system for the
image coordinates. But a camera is a mobile device and
so we cannot define a fixed global coordinate system to
represent the lines mapped into the image points. Therefore,
we define a 3D reference coordinate system associated with
the camera to represent the 3D lines mapped into the image
pixels [1]. As a consequence, to estimate the coordinates of
3D entities represented in a different coordinate system we
need to estimate a rigid transformation that maps the camera
coordinate system into the other reference coordinate system,
which we call the absolute pose problem. This problem can
be divided into two subproblems: minimal and non-minimal
cases.
Absolute pose is important for many applications that need
to use the 3D entities imaged by a camera. The analysis of
solutions using minimal data is important because it provides
insights and a better understanding of the problem. Non-minimal solutions, meanwhile, are used to estimate the pose
P. Miraldo and H. Araujo are with the Institute for Systems and Robotics,
Department of Electrical and Computer Engineering, University of Coimbra,
Portugal.
for more than the minimal point correspondences, when we
need a robust solution. The advantages of minimal solutions
are in terms of computation speed and their applications in
the hypothesis-and-test estimation methods such as RANSAC
[2], [3].
Pose estimation has been the subject of a significant amount of research for cameras that can be modeled by a common perspective imaging system, in which all rays intersect at a single point (Fig. 1(b)). The works [4], [5], [6], [7], [8] are examples of solutions to non-minimal pose problems, while [9], [10], [11] are examples of minimal solutions, in which only three 3D points in the world and their images are used. Guo [12] and Bujnak et al. [13] developed minimal solutions for the computation of the pose of an uncalibrated perspective camera; in this case, instead of three 3D points, four 3D points and their images are required.
If we consider imaging devices where image formation
cannot be modeled by a central projection (see Fig. 1(a)),
e.g. when the imaging rays are subject to reflection and/or
refraction [14], [15], [16], [17], [18], [19] or linear cameras
[20], [21], [22], [23], new methods and algorithms have to be
considered. In this article we address minimal pose estimation for general imaging devices, using a single image of three points in the scene. We do not address the relative pose
problem where two or more images of the scene are used.
In general, estimates based on minimal data provide multiple
solutions. Here we describe a new method for pose estimation
using the minimal number of three known coordinates of 3D
points. As far as we know, this problem has been studied by
Nistér & Stewénius [24], [25], by Chen & Chang [26], [27]
and by Ramalingam et al. in [28]. A more detailed analysis
of these methods is presented in the next section.
We use a planar 3D homography transformation to define an
algebraic relationship that relates any three points in the world
(represented in the world coordinate system) and any three 3D
lines that pass through those points (the lines are represented
in the 3D camera coordinate system). Note that, since we are
using only three 3D points, the use of a planar homography to
represent the rigid transformation does not restrict the use of
the method to planar patterns. Any three non-collinear 3D points define a plane in the world. However, as we will see
(and without additional constraints), the use of three points and
three lines leads to an infinite number of possible solutions.
To have a finite number of solutions, the constraint that the
homography matrix must define a rigid transformation is used.
In this article, existing and proposed approaches are also
compared. The proposed method performs better than the
previous approaches in terms of numerical error and is similar
to the best state-of-the-art method in terms of the number of
solutions and computation effort.
[Fig. 1 graphic: (a) a non-central generalized camera; (b) a central (perspective) camera, with the image plane shown.]
Fig. 1. Representation of the minimal pose estimation problem, considering a calibrated generalized camera. Three 3D points are imaged through three rays. Since the generic camera model is calibrated, the 3D ray coordinates are known in the camera coordinate system. Pose estimation therefore consists of estimating the rotation and translation that define the transformation from the 3D world coordinate system into the camera coordinate system, such that the incidence relationship between the points and the lines is satisfied.
We do not consider the non-minimal problem here. However, solutions for these cases were also developed by Chen
and Chang [26], [27] and by Schweighofer and Pinz [29].
This paper is organized as follows: in the rest of this
section we describe the framework for the generalized camera
model, give a short description of the previous approaches,
and indicate the mathematical notation used in the article. The
proposed approach is presented in Sec. II. A method to estimate the intersection points of three quadrics is described in Sec. III. The structure of the algorithm is shown in Sec. IV, and in Sec. V-A we compare the proposed approach with the state-of-the-art methods in terms of numerical accuracy, number of possible solutions and processing time. Some conclusions are set out in Sec. VI.
The contributions of the paper are stated in Sec. II and III.
In Sec. II, we propose the homography-based solution for our
problem (to the best of our knowledge, it is the first time
that homography has been used for this kind of problem).
In Sec. III, we adjust a general minimal solver (based on
polynomial eigenvalues) to compute the intersection of the
quadrics, as required by our method (solution presented in
Sec. II).
A. General Imaging Model
For the estimation of the 3D pose, the calibration of the
imaging device is assumed to be known. We use a model that
can represent any type of imaging device. The model used is
the general imaging model proposed by Grossberg and Nayar
[30]. This model assumes that each image pixel is mapped into a ray in the 3D world. The mapping parameters form an
array of parameters called raxels. The set of all raxels that
represent all image pixels constitutes the complete camera
model. Note that there are no constraints on the mapping
parameters, which means that the imaging model can represent
any type of imaging system, central or non-central.
Since we assume that the camera has been previously
calibrated and represented by the raxel imaging model for
all image pixels, the corresponding 3D line coordinates are
known in the camera coordinate system. Several calibration
procedures for the general imaging model were described in
the literature, for instance [30], [31], [32].
B. Previous Approaches vs Our Approach
The algorithm proposed by Chen and Chang in [26], [27] was the first method to solve the minimal three-point pose problem for general imaging devices. The algorithm is
based on the estimation of the roots of an eighth degree
polynomial equation. To get this polynomial equation, the
data-set is transformed such that lines and points verify a set
of constraints. An algorithm is used in the new coordinate
system to relate the triangle defined by distances between the
three known world points and the three projection lines.
This algorithm estimates the coordinates of the points in the camera coordinate system. Thus, since the pose is described by a rotation and a translation, an additional estimation of the rigid transformation is also required, based on the matching between the coordinates of the points in the camera and world coordinate systems, for instance [33], [34].
The approach of Ramalingam et al. [28] is similar to that proposed by Chen and Chang: both are based on the fact that the distances between the world points must be preserved. They claim that the three distance constraints generate an eighth degree polynomial equation. However,
unlike the algorithm proposed by Chen and Chang, they do
not give a closed-form solution for the polynomial parameters.
Since the basis of this algorithm is similar to the method of
Chen and Chang, we will restrict our analysis to the latter
algorithm.
Nistér and Stewénius [24], [25] proposed an algorithm that
is also based on the estimation of the roots of a single
variable eighth degree polynomial equation. This algorithm
also requires predefined transformations to the 3D points and
3D lines, in the world and camera coordinate systems. The
coefficients of the polynomial equation are obtained from the
intersection between a ruled quartic surface and a circle.
Finally, we note that both algorithms are based on the
derivation of an eighth degree polynomial equation, as a
function of a single variable. Despite the differences between
the approaches, these algorithms require a set of rigid transformations of the dataset to be performed to get the coefficients of
the eighth degree polynomial equation. These transformations
are somewhat complex – see, for example, the derivations in [35], [25] – and can lead to critical configurations – see, for instance, [27, Sec. 3.2].
In this article, we propose a different approach. We do
not aim to obtain a polynomial equation as a function of
a single variable. Instead, we express the solutions as the
intersection between three quadrics, where the number of
solutions will be up to eight, as with the previous approaches.
There are, however, methods to express the intersection of
three quadrics as a problem of finding the roots of an eighth
degree polynomial equation. In [12, Theorem 1], Guo shows a procedure to generate, in closed form, the coefficients of an eighth degree polynomial equation from a general problem of finding the points where quadrics intersect. The main results of our approach, in terms
of problem parametrization, are that it does not require any
transformation of the dataset and it is based on a very simple
and intuitive four-step algorithm.
Note, too, that no comparison between previous methods
has yet been made.
C. Background and Notation

By default, bold capital letters represent matrices (e.g. A ∈ R^(n×m), with n rows and m columns), bold small letters represent vectors (e.g. a ∈ R^n, with n elements), and small letters represent one-dimensional elements (e.g. a). By default, a vector is a column vector.

Let us consider U ∈ R^(n×m) and V ∈ R^(k×l) and the equation

    U X V^T = C                                            (1)

where X ∈ R^(m×l) is the unknown. Using the Kronecker product, this equation can be rewritten as

    (V ⊗ U) vec(X) = vec(C)                                (2)

where ⊗ represents the Kronecker product, with (V ⊗ U) ∈ R^(kn×lm), and vec(.) is the vector formed by stacking the columns of the relevant matrix.

The hat operator, Â ∈ R^(3×3), denotes the skew-symmetric matrix that linearizes the exterior product, such that a × b = Âb.

In this article we use Plücker coordinates to represent lines, l ≐ (d, m), where the elements of l ∈ R^6 are constrained to ⟨d, m⟩ = 0 (known as the Klein quadric). For more information about Plücker coordinates see [36]. d ∈ R^3 and m ∈ R^3 are respectively the direction and the moment of the line, and for any point x ∈ R^3 incident with the line we can write m = d × x.

The dimensions of the column-space and of the null-space are denoted by rank(.) and nullity(.), respectively.

II. PROPOSED APPROACH

Pose estimation requires the estimation of a rotation matrix R ∈ SO(3) and a translation vector t ∈ R^3 that define the rigid transformation between the world and camera coordinate systems. Since we consider that the imaging device is calibrated, pose is specified by the rigid transformation that satisfies the relationship of incidence between points in the world coordinate system and 3D straight lines represented in the camera coordinate system, Fig. 1. To distinguish between features represented in the world coordinate system and entities in the camera coordinate system, we use the superscripts (W) and (C), respectively.

The rigid transformation between a point in world coordinates and the same point in camera coordinates is given by

    p(C) = R p(W) + t.                                     (3)

Let us consider a plane π(W) ∈ R^4, defined by any three 3D points. Using this assumption and from the definition of the homography map [37], [1], we can rewrite (3) as

    p(C) = ( R + (1/π0) t π^T ) p(W),                      (4)
           └──────────────────┘
                     H

valid for any point that belongs to that plane. The matrix H is called the homography matrix; π0 and π are respectively the distance from the plane to the origin and the unit normal vector to the plane.

The homography matrix can be defined up to a scale factor. However, this matrix only defines a rigid transformation for points belonging to the plane π(W) if it meets the following condition:

Proposition 1. A matrix H belongs to the space of homography matrices H if it verifies the condition

    H ∈ H ≐ { H ∈ R^(3×3) : σ2(H) = 1 }                    (5)

where σ2(H) is the second largest singular value of H.

The proof of this proposition is a simple extension of the proof found in [37, Lemma 5.18].
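Proposition 1 and the line representation above can be checked numerically. The following NumPy sketch (our illustration, not part of the original paper; variable names are ours) builds a homography H = R + (1/π0) t π^T as in (4) from a random pose and plane, and verifies that its second singular value is 1 and that a line moment m = d × x lies on the Klein quadric:

```python
import numpy as np

rng = np.random.default_rng(7)

def rodrigues(axis, angle):
    """Rotation matrix about a (not necessarily unit) axis, Rodrigues' formula."""
    axis = axis / np.linalg.norm(axis)
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

# Random rigid transformation and plane (unit normal pi, distance pi0).
R = rodrigues(rng.normal(size=3), 0.7)
t = rng.normal(size=3)
pi = rng.normal(size=3); pi /= np.linalg.norm(pi)
pi0 = 2.5

# Homography of eq. (4): H = R + (1/pi0) t pi^T.
H = R + np.outer(t, pi) / pi0

# Proposition 1: the second-largest singular value of H equals 1.
print(np.isclose(np.linalg.svd(H, compute_uv=False)[1], 1.0))   # -> True

# Pluecker coordinates (Sec. I-C): for a point x on a line with direction d,
# the moment is m = cross(d, x), and <d, m> = 0 (the Klein quadric).
d = rng.normal(size=3); d /= np.linalg.norm(d)
x = rng.normal(size=3)
m = np.cross(d, x)
print(np.isclose(np.dot(d, m), 0.0))                            # -> True
```

The σ2(H) = 1 property holds for any t: H preserves the norm of every vector orthogonal to π, which pins the middle singular value at 1.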
A. Decomposition of the Homography Matrix

For the decomposition of the homography matrix we use the method described in Section 5.3.3 of [37]. In this section we briefly describe this method and analyze the decomposition used in our approach, in which the coordinates of π(W) are known.

Let us consider the eigen decomposition of H^T H as

    H^T H = V Σ² V^T,                                      (7)

where vi denotes the ith column of V and σi² the ith eigenvalue of H^T H (note that, from Proposition 1, we know that σ2² = 1). Using these assumptions, we define

    u1 = ( √(1 − σ3²) v1 + √(σ1² − 1) v3 ) / √(σ1² − σ3²)  (8)

    u2 = ( √(1 − σ3²) v1 − √(σ1² − 1) v3 ) / √(σ1² − σ3²)  (9)
and the matrices

    U(1) = [ v2   u1   v2 × u1 ]                           (10)
    U(2) = [ v2   u2   v2 × u2 ]                           (11)
    W(1) = [ Hv2   Hu1   Hv2 × Hu1 ]                       (12)
    W(2) = [ Hv2   Hu2   Hv2 × Hu2 ].                      (13)

To conclude, the four possible solutions for the decomposition of the homography matrix are shown in (6):

    Solution 1:                          Solution 2:
      R(1) = W(1) U(1)^T                   R(2) = W(2) U(2)^T
      π(1) = v2 × u1                       π(2) = v2 × u2
      (1/d) t(1) = (H − R(1)) π(1)         (1/d) t(2) = (H − R(2)) π(2)

    Solution 3:                          Solution 4:
      R(3) = R(1)                          R(4) = R(2)
      π(3) = −π(1)                         π(4) = −π(2)
      (1/d) t(3) = −(1/d) t(1)             (1/d) t(4) = −(1/d) t(2)     (6)

Usually, the homography matrix is used to estimate the relative pose between two perspective images when a planar pattern is viewed by both cameras. In that case, the decomposition of H into a rotation matrix and a translation vector can yield up to four solutions [37]. However, in the problem addressed in this paper, we assume that the 3D coordinates of the points are known in one of the coordinate systems, which means that the plane coordinates are known in that coordinate system. For this problem, it is possible to state the following theorem.

Theorem 1. Given a homography matrix H ∈ H and a plane π(W) of known coordinates, the decomposition of H into rotation and translation has a unique solution.

The proof of this theorem is given in the Supplementary Material [38].
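This decomposition can be implemented directly. The sketch below (our Python illustration of (6)–(13), following Section 5.3.3 of [37]; function and variable names are ours) returns the four candidate triplets (R, t/d, π) for a homography with σ2(H) = 1; by Theorem 1, knowledge of the plane then singles out one of them:

```python
import numpy as np

def decompose_homography(H):
    """Four candidate (R, t/d, n) with H = R + (t/d) n^T; assumes sigma_2(H) = 1."""
    w, V = np.linalg.eigh(H.T @ H)              # eigenvalues of H^T H, ascending
    s3, _, s1 = w                               # sigma3^2 <= sigma2^2 = 1 <= sigma1^2
    v3, v2, v1 = V[:, 0], V[:, 1], V[:, 2]
    den = np.sqrt(max(s1 - s3, 1e-15))          # degenerate if H is a pure rotation
    ca = np.sqrt(max(1.0 - s3, 0.0))            # clamp tiny negatives (round-off)
    cb = np.sqrt(max(s1 - 1.0, 0.0))
    u1 = (ca * v1 + cb * v3) / den              # eq. (8)
    u2 = (ca * v1 - cb * v3) / den              # eq. (9)
    sols = []
    for u in (u1, u2):
        U = np.column_stack([v2, u, np.cross(v2, u)])            # eqs. (10)-(11)
        W = np.column_stack([H @ v2, H @ u,
                             np.cross(H @ v2, H @ u)])           # eqs. (12)-(13)
        R = W @ U.T
        n = np.cross(v2, u)
        t = (H - R) @ n                         # equals t/d for this candidate
        sols.extend([(R, t, n), (R, -t, -n)])   # the sign pair, eq. (6)
    return sols

# Check on a synthetic homography H = R + (t/d) n^T.
rng = np.random.default_rng(3)
axis = rng.normal(size=3); axis /= np.linalg.norm(axis)
K = np.array([[0, -axis[2], axis[1]], [axis[2], 0, -axis[0]], [-axis[1], axis[0], 0]])
R_true = np.eye(3) + np.sin(0.5) * K + (1 - np.cos(0.5)) * (K @ K)
n_true = rng.normal(size=3); n_true /= np.linalg.norm(n_true)
td_true = rng.normal(size=3)                    # t/d
H = R_true + np.outer(td_true, n_true)

sols = decompose_homography(H)
# Every candidate reconstructs H; only one matches the known plane normal.
print(all(np.allclose(R + np.outer(t, n), H, atol=1e-8) for R, t, n in sols))  # -> True
print(min(np.linalg.norm(n - n_true) for _, _, n in sols) < 1e-6)              # -> True
```

All four candidates reconstruct H exactly; checking the recovered normal against the known plane π(W) identifies the unique physically consistent one, as Theorem 1 guarantees.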
B. Formalization of the problem

As mentioned in Sec. I-A, the camera is deemed to have been previously calibrated. Therefore, for each pixel the corresponding 3D line (in the camera coordinate system) is known. A 3D line can be represented in Plücker coordinates, Sec. I-C. Using this representation and from [36], a point that is incident on a line satisfies the relationship

    d(C) × p(C) = D̂(C) p(C) = m(C).                        (14)

We wish to determine the relationship between the points in the world coordinate system and the lines in the camera coordinate system. Thus, for the plane π(W), defined by the three world points, we can write

    D̂(C) H p(W) = m(C).                                    (15)

The aim of this approach is to estimate the pose based on the homography matrix. Thus, let us consider the linearization of the unknown matrix H in (15), using the Kronecker product, Sec. I-C,

    ( p(W)^T ⊗ D̂(C) ) vec(H) = m(C).                       (16)

The minimal problem, in the case of the camera pose, corresponds to the determination of the mapping between three world points (represented in the world coordinate system) and their corresponding 3D lines (represented in the camera coordinate system). Therefore, using the representation of (16), we can define the following algebraic relation

    [ p1(W)^T ⊗ D̂1(C) ]             [ m1(C) ]
    [ p2(W)^T ⊗ D̂2(C) ]  vec(H)  =  [ m2(C) ].             (17)
    [ p3(W)^T ⊗ D̂3(C) ]             [ m3(C) ]
    └─────────────────┘             └───────┘
             M                          w

The column-space of the matrix M is important for the estimation of the unknown parameters. Let us therefore consider the following theorem.

Theorem 2. Consider a set of three points defined in the world coordinate system and their corresponding lines in the camera coordinate system. If the three points define a plane that does not pass through the origin, the dimension of the column-space of M in (17) will be rank(M) = 6.

The proof of this theorem is given in the Supplementary Material [38]. Even if the plane contains the origin, a simple change of coordinate system ensures that the conditions of the theorem hold.

The goal of the method is to compute the parameters of H. As a result, let us consider the following derivation

    [ M   −w ] ξ = 0,   where   ξ = [ vec(H) ].            (18)
    └────────┘                      [    1   ]
         N
From Theorem 2, it follows that nullity(N) = 4. Let us represent the null-space of N as

    null(N) ≐ { a e(1) + b e(2) + c e(3) + d e(4) : a, b, c, d ∈ R },    (19)

where the e(i) are the vectors defining its basis. Note that ξ has four degrees of freedom. However, from (18), we see that the tenth element of the vector ξ must be ξ10 = 1. Without loss of generality, let us consider e10(i) = 1, for all i. Using these definitions, one can see that ξ10 = 1 is satisfied for ξ ∈ R, such that

    R ≐ { a f(1) + b f(2) + c f(3) + f(4) : a, b, c ∈ R },               (20)
and

    f(1) = e(1) − e(4)                                     (21)
    f(2) = e(2) − e(4)                                     (22)
    f(3) = e(3) − e(4)                                     (23)
    f(4) = e(4).                                           (24)

Note that in this representation the space has only three degrees of freedom.
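The construction (17)–(24) can be verified numerically. In the sketch below (our illustration; variable names are ours), a generic non-central camera is simulated by drawing, for each world point, a random ray through its transformed position; we then build N, check rank(N) = 6, form the basis f(i), and check that ξ = (vec(H), 1) for the true homography lies in the null-space:

```python
import numpy as np

rng = np.random.default_rng(11)

def rodrigues(axis, angle):
    axis = axis / np.linalg.norm(axis)
    K = np.array([[0, -axis[2], axis[1]],
                  [axis[2], 0, -axis[0]],
                  [-axis[1], axis[0], 0]])
    return np.eye(3) + np.sin(angle) * K + (1 - np.cos(angle)) * (K @ K)

def hat(v):
    """Skew-symmetric matrix such that hat(v) @ x = cross(v, x)."""
    return np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]])

# Ground-truth pose and three generic (non-collinear) world points.
R, t = rodrigues(rng.normal(size=3), 0.9), rng.normal(size=3)
P_w = rng.normal(size=(3, 3))                 # rows: p1(W), p2(W), p3(W)

# One 3D line per point, known in the camera frame: a generic non-central
# camera is simulated by a random ray through each p(C) = R p(W) + t.
rows, rhs = [], []
for p in P_w:
    d = rng.normal(size=3); d /= np.linalg.norm(d)
    rows.append(np.kron(p, hat(d)))           # p(W)^T (x) D^, as in (17)
    rhs.append(np.cross(d, R @ p + t))        # moment m = d x p(C), eq. (14)
M = np.vstack(rows)                           # 9 x 9
w = np.concatenate(rhs)
N = np.hstack([M, -w[:, None]])               # N = [M  -w], eq. (18)
print(np.linalg.matrix_rank(N))               # -> 6 (Theorem 2)

# Null-space basis e(1)..e(4), normalized so e10(i) = 1, and f(i) of (21)-(24).
E = np.linalg.svd(N)[2][6:].T
E = E / E[9]
f = [E[:, i] - E[:, 3] for i in range(3)] + [E[:, 3]]

# Any combination a f(1) + b f(2) + c f(3) + f(4) has tenth element 1, as in (20).
comb = 0.3 * f[0] - 1.2 * f[1] + 0.7 * f[2] + f[3]
print(np.isclose(comb[9], 1.0))               # -> True

# The true xi = (vec(H), 1), with H from eq. (4), satisfies N xi = 0.
pi_n = np.cross(P_w[1] - P_w[0], P_w[2] - P_w[0]); pi_n /= np.linalg.norm(pi_n)
H_true = R + np.outer(t, pi_n) / (pi_n @ P_w[0])
xi = np.concatenate([H_true.T.flatten(), [1.0]])   # vec() stacks columns
print(np.allclose(N @ xi, 0.0, atol=1e-9))    # -> True
```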
Let us consider the estimation of the homography matrix represented as H. Using the result of (20), we can see that the homography matrix can be represented as

    H = a F(1) + b F(2) + c F(3) + F(4),                   (25)

where the F(i) ∈ R^(3×3) are obtained by un-stacking the vectors f(i), and a, b and c are the unknowns.
The estimated homography matrix must represent a rigid-body transformation for points at π(W), which means that the distance between any two points that belong to that plane must be preserved. Formally, since pi(C) = H pi(W),

    ‖ pi(W) − pj(W) ‖ = ‖ pi(C) − pj(C) ‖ = ‖ H pi(W) − H pj(W) ‖.    (26)

Using this result and considering qi,j = pi(W) − pj(W) for the three world points, one can define three constraints that the estimated homography matrix must satisfy:

    q1,2^T q1,2 = q1,2^T H^T H q1,2
    q1,3^T q1,3 = q1,3^T H^T H q1,3                        (27)
    q2,3^T q2,3 = q2,3^T H^T H q2,3.

Using (25), we can rewrite these three constraints as a function of the unknowns as

    f1,2(a, b, c) = f1,3(a, b, c) = f2,3(a, b, c) = 0      (28)
where each fi,j(a, b, c) is such that

    fi,j(a, b, c) = k1(i,j) a² + k2(i,j) b² + k3(i,j) c² + k4(i,j) ab
                    + k5(i,j) ac + k6(i,j) bc + k7(i,j) a + k8(i,j) b
                    + k9(i,j) c + k10(i,j).                (29)

The coefficients kn(i,j), for n = 1, . . . , 10, are
shown in (30). Note that each of these functions represents a
quadric. As a result, the solutions for vector (a, b, c) that verify
the system of (28), can be seen as corresponding to the points
where the three quadrics intersect. A method to estimate the
intersection of quadrics is described in Sec. III.
A necessary condition for the decomposition of the homography matrix described in Sec. II-A is that the estimated matrices satisfy H ∈ H. This is the same as proving that H satisfies the conditions of a rigid transformation for any point that belongs to the plane π(W). Therefore, this requirement must be met to ensure that the estimates satisfy R ∈ SO(3) and t ∈ R^3.

From the following proposition, all the solutions for H that are obtained from the intersection of the three quadrics, (27) and (28), satisfy the condition H ∈ H.

Proposition 2. A matrix H ∈ R^(3×3) that satisfies (27) and (28) defines a rigid transformation for any point that belongs to π(W) (the plane defined by the three points pi(W)) and, as a result, also satisfies Proposition 1.

The proof of this proposition is presented in the Supplementary Material [38]. This proposition shows that a homography defines a rigid transformation of a plane if it preserves the distances between any three points of the plane.

Note that, from Theorem 1, each solution estimated for H ∈ H will generate a single solution for R ∈ SO(3) and t ∈ R^3.
C. The Case of a Central Camera

Let us consider the case of a central/perspective camera. Without loss of generality, we take the origin of the camera coordinate system to be the center of projection, i.e., the point where all the rays intersect. Since all the rays pass through the origin, the moments of the 3D lines will be zero and we can write w = 0. In this case, from the algebraic relationship of (17), the solutions for the unknown vector vec(H) will belong to the null-space of M. From Theorem 2, we have nullity(M) = 3, which means that we can define vec(H) ∈ null(M), such that

    null(M) ≐ { a f(1) + b f(2) + c f(3) : a, b, c ∈ R },  (31)

where the f(i) are the vectors corresponding to the basis of the null-space. As in the general case, a space of solutions for H can be obtained (in this case from the null-space of M) using (25), by setting f(4) = 0.
Since H must represent a rigid transformation for points that belong to the plane π(W), the constraints defined in (27) must be enforced. As in the general case, each of these equations will correspond to a quadric surface fi,j(a, b, c), where

    fi,j(a, b, c) = k1(i,j) a² + k2(i,j) b² + k3(i,j) c² + k4(i,j) ab
                    + k5(i,j) ac + k6(i,j) bc + k10(i,j)   (32)

and the parameters km(i,j) are given by (30), by setting F(4) = 0. Note that in this case Proposition 2 is also satisfied and, as a result, H ∈ H. This means that, for each estimate H ∈ H, we have a single R ∈ SO(3) and t ∈ R^3.

Note that in the case of a minimal pose problem for a central camera, we have up to four solutions [9]. Let us consider the solutions for vec(H) as ui, for i = 1, . . . , n, where n ≤ 8 is the number of possible solutions. It is straightforward to verify that the solutions can be grouped into pairs {ul, um}, where um = −ul (if we have a solution ul that satisfies M ul = 0, then um = −ul also satisfies M um = 0). As a result, if we consider only solutions that correspond to points in front of the camera (a constraint usually imposed in these cases [9]), one of the solutions from each pair can be eliminated, and then we will have up to four solutions, too.
III. SOLVING FOR THE INTERSECTION OF QUADRICS
The most important step in previous algorithms was the estimation of the roots of an eighth degree polynomial equation.
There is no analytical solution for the estimation of these roots
[26], [27]. However, there are several numerical methods that
can be used to estimate these solutions. Nistér & Stewénius
and Chen & Chang proposed the use of the companion matrix
[39], [24], [25]. According to Chen and Chang, this procedure
is suitable to solve the proposed problem when considering
criteria of speed and accuracy. With this method, the solutions
for the roots are given by the eigenvalues of the companion
matrix.
For the algorithm proposed in this article, the solutions for the pose problem are obtained by estimating the intersection points of three quadrics. As mentioned above, Guo in [12] describes a method to derive the intersection of three generic quadrics by solving for the roots of an eighth degree polynomial equation – the coefficients of this polynomial equation can be given in closed form. But other solvers can be used to
solve the intersection of the quadrics. The usual methods to estimate these intersection points are based on the use of Gröbner bases [40], [41] and hidden variable techniques [42]. More recently, a polynomial eigenvalue method [39] for solving minimal problems in computer vision has been presented [43]. As claimed by Kukelova et al. [43], these approaches are accurate, simple, fast and easy to implement. As a result, in this paper, we derive a polynomial eigenvalue solution to estimate the intersection points of the quadrics. We now describe the proposed derivation.

Let us consider the system of three equations represented in (28). If we choose a as the eigenvalue, the polynomial eigenvalue problem has the form

    ( C2 a² + C1 a + C0 ) v = 0.                           (33)

Polynomial eigenvalue solvers require the matrices Ci to be square. Note that, since the original problem has only three equations and the size of the vector v is six, we cannot get square matrices Ci directly. We have to generate new equations as polynomial combinations of the original ones (resultant-based methods) to ensure that the coefficient matrices Ci are square.

After the generation of the new polynomial equations, we get matrices C0, C1, C2 ∈ R^(15×15) from (35) and (34), and v ∈ R^15 is such that

    v = ( b⁴, b³c, bc³, b²c², c⁴, b³, b²c, bc², c³, b², bc, c², b, c, 1 ).    (36)

Since the matrix C0 will be non-singular, the polynomial eigenvalue problem of (33) can be rewritten as a standard eigen decomposition problem C ν = ā ν, where ā = 1/a and the matrix C ∈ R^(30×30) and vector ν ∈ R^30 are

    C = [     0            I      ],     ν = [  v  ].      (37)
        [ −C0⁻¹ C2     −C0⁻¹ C1   ]          [ ā v ]

To conclude, the solution for the unknowns (a, b, c) that satisfies (28) can be given by the eigen decomposition of the matrix C, where a is given by the inverse of the eigenvalue of C, and the corresponding b and c are given by the thirteenth and fourteenth elements of the corresponding eigenvector.

Note that, given the size of the matrix C, there will be thirty eigenvalues. However, from (34), we can see that sixteen columns of the matrices −C0⁻¹ C2 and −C0⁻¹ C1 are equal to zero. These columns generate zero eigenvalues, which cannot be used since the value of a is the inverse of each eigenvalue. Thus, we can eliminate these columns and the corresponding rows, resulting in a 14 × 14 matrix. Additional zero and complex eigenvalues are also generated and have to be eliminated, which will yield up to eight real solutions.

The coefficients kn(i,j) of the quadrics, used in (29) and (32), are given by

    k1(i,j)  = qi,j^T F(1)^T F(1) qi,j
    k2(i,j)  = qi,j^T F(2)^T F(2) qi,j
    k3(i,j)  = qi,j^T F(3)^T F(3) qi,j
    k4(i,j)  = qi,j^T ( F(1)^T F(2) + F(2)^T F(1) ) qi,j
    k5(i,j)  = qi,j^T ( F(1)^T F(3) + F(3)^T F(1) ) qi,j
    k6(i,j)  = qi,j^T ( F(2)^T F(3) + F(3)^T F(2) ) qi,j   (30)
    k7(i,j)  = qi,j^T ( F(1)^T F(4) + F(4)^T F(1) ) qi,j
    k8(i,j)  = qi,j^T ( F(2)^T F(4) + F(4)^T F(2) ) qi,j
    k9(i,j)  = qi,j^T ( F(3)^T F(4) + F(4)^T F(3) ) qi,j
    k10(i,j) = qi,j^T ( F(4)^T F(4) − I ) qi,j.
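The linearization (37) can be illustrated on a generic quadratic polynomial eigenvalue problem. In the sketch below (our illustration; small random square matrices Ci stand in for the 15 × 15 matrices of (34)–(35), which are not reproduced here), for each eigenvalue ā of C we check that a = 1/ā solves (33):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 4                                        # small generic example (15 in the paper)
C0, C1, C2 = (rng.normal(size=(n, n)) for _ in range(3))

# Companion linearization of (C2 a^2 + C1 a + C0) v = 0, as in eq. (37),
# with eigenvalue abar = 1/a (a random C0 is non-singular with probability 1).
C0inv = np.linalg.inv(C0)
C = np.block([[np.zeros((n, n)), np.eye(n)],
              [-C0inv @ C2, -C0inv @ C1]])

abar, nu = np.linalg.eig(C)
ok = []
for lam, vec in zip(abar, nu.T):
    if abs(lam) < 1e-10:
        continue                             # zero eigenvalues carry no solution
    a = 1.0 / lam                            # a is the inverse of the eigenvalue
    v = vec[:n]                              # top half of nu is v, bottom is abar*v
    residual = (C2 * a**2 + C1 * a + C0) @ v
    ok.append(np.linalg.norm(residual) < 1e-6 * max(1.0, np.linalg.norm(v)))
print(all(ok) and len(ok) == 2 * n)          # -> True
```

In the paper's setting the eigenvector is additionally structured as in (36), so b and c can be read off its thirteenth and fourteenth entries after normalizing the last entry to 1.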
IV. ALGORITHM OUTLINE
We will now give an overview of the proposed method. The
algorithm can be decomposed into the following four steps:
1) compute the null-space of matrix N at (18), which is
obtained from the set of correspondences between world
points and 3D lines from the general imaging model;
2) using the basis of the estimated null-space, compute the
coefficients of the quadric surfaces using (30); using
these coefficients get matrices C0 , C1 , C2 from (35)
and (34), and use these matrices to compute matrix C,
as shown in (37);
3) compute the eigen decomposition of matrix C, and, for
each eigenvalue, set a equal to the inverse of the eigenvalue and b and c equal respectively to the thirteenth and
fourteenth elements of the corresponding eigenvector;
4) for each set of (a, b, c), compute the estimates for the
homography matrix using (25); and decompose each
estimate using the method described in Sec. II-A.
We again note that the third step – which corresponds to
the solver for the quadrics intersection proposed at Sec. III
– can be replaced by the solution described by Guo in [12,
Theorem 1]. The method proposed in Sec. III is much more
intuitive and easy to implement than the method described in
Guo’s article, which derives an eighth degree polynomial equation for the problem. However, as we describe in Sec. V-A3,
finding the roots of an eighth degree polynomial equation can solve the intersection of the quadrics faster.
The algorithm proposed in this paper is implemented in
MATLAB and will be available on the author's page.
V. EXPERIMENTAL RESULTS
In this section we evaluate the proposed method against the
state-of-the-art approaches, using synthetic data. In addition,
we validate our method by testing it using real data. We
conclude by analyzing the experimental results.
A. Experiments with Synthetic Data
The important parameters to evaluate the solutions of minimal problems are numerical accuracy, computation time, and
the number of possible solutions given by the algorithms.
As noted by Nistér and Stewénius [24], [25], all minimal
solutions will behave in much the same way with respect to
noise, which means that tests of this type are less relevant.
We compare the algorithm proposed in this article with the
methods proposed by Nistér and Stewénius at [24], [25] and
by Chen and Chang at [26], [27]. Afterwards, we describe the
generation of the dataset used to test the minimal three-point
pose problem with the three algorithms.
[Equation (34): the sparse matrices $\mathbf{C}_1^{(6:15)\,\dagger}$ and $\mathbf{C}_2^{(10:15)\,\dagger}$, whose nonzero entries are the coefficients $k_1^{(i,j)}$, $k_4^{(i,j)}$, $k_5^{(i,j)}$, and $k_7^{(i,j)}$ of the quadric pairs $(i,j) \in \{(1,2), (1,3), (2,3)\}$.]   (34)
[Equation (35): the sparse matrix $\mathbf{C}_0$, whose nonzero entries are the coefficients $k_2^{(i,j)}$, $k_3^{(i,j)}$, $k_6^{(i,j)}$, $k_8^{(i,j)}$, $k_9^{(i,j)}$, and $k_{10}^{(i,j)}$ of the quadric pairs $(i,j) \in \{(1,2), (1,3), (2,3)\}$.]   (35)
† – The superscripts at $\mathbf{C}_1^{(6:15)}$ and $\mathbf{C}_2^{(10:15)}$ represent the columns of the respective matrices. The disregarded columns have only zero elements.
TABLE I
In this table, we show the median and mode for each curve of Fig. 2, which represent the numerical errors for rotation, translation, and distance between points.

| Numerical Errors:       | Our Method              | Nistér and Stewénius    | Chen and Chang          |
|                         | median     | mode       | median     | mode       | median     | mode       |
| Rotation                | 1.4×10⁻¹³  | 1.9×10⁻¹⁴  | 9.5×10⁻¹²  | 8.3×10⁻¹³  | 1.2×10⁻¹²  | 8.3×10⁻¹³  |
| Translation             | 2.3×10⁻¹¹  | 3.7×10⁻¹²  | 1.5×10⁻⁹   | 7.5×10⁻¹¹  | 1.7×10⁻¹⁰  | 3.5×10⁻¹¹  |
| Distance between points | 9.4×10⁻¹²  | 8.3×10⁻¹³  | 3.8×10⁻¹⁰  | 7.5×10⁻¹¹  | 3.6×10⁻¹¹  | 1.7×10⁻¹¹  |

We used synthetic datasets for these experiments. For that purpose, consider a cube with a side length of 500 units. The data were generated by randomly mapping 3D lines and points $\mathbf{l}_i^{(C)} \leftrightarrow \mathbf{p}_i^{(C)}$, for i = 1, 2, 3. A random rigid transformation ($\mathbf{R} \in SO(3)$ and $\mathbf{t} \in \mathbb{R}^3$) was generated and applied to the set of points, such that $\mathbf{p}_i^{(C)} \mapsto \mathbf{p}_i^{(W)}$. For the data generated, and using the mappings $\mathbf{l}_i^{(C)} \leftrightarrow \mathbf{p}_i^{(W)}$, for i = 1, 2, 3, the pose is estimated using the corresponding algorithms.
This process is repeated for 10⁵ trials, with a new pose and data being randomly generated for each trial.
1) Numerical Errors: The characterization of the numerical error is one of the main parameters used to evaluate minimal three-point pose algorithms. In Fig. 2, we show the results for the three algorithms that are compared in this article.
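The synthetic-data generation described above can be sketched as follows; this is an illustrative NumPy fragment (the helper `random_rotation` and the QR-based sampling are ours, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(1)

def random_rotation():
    """Random R in SO(3) from the QR decomposition of a Gaussian matrix."""
    Q, R = np.linalg.qr(rng.standard_normal((3, 3)))
    Q *= np.sign(np.diag(R))      # fix column signs for a proper sample
    if np.linalg.det(Q) < 0:      # ensure det(Q) = +1
        Q[:, 2] *= -1
    return Q

# Three random points inside a cube of side 500, as in the experiments
p_cam = rng.uniform(-250.0, 250.0, size=(3, 3))

# Apply a random rigid transformation: p^(W) = R p^(C) + t (row-vector form)
R, t = random_rotation(), rng.uniform(-250.0, 250.0, size=3)
p_world = p_cam @ R.T + t
```

Each trial draws a fresh (R, t) and fresh points, and the mappings between the camera-frame lines and the transformed points are fed to the pose solvers.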
We consider three different metrics for the numerical errors:
1) rotation error;
2) translation error;
3) average of the distance error between the estimated
and ground-truth world points in the camera coordinate
system.
Next, we define the errors in detail. For each of the 10⁵ trials we computed the estimated poses using the algorithms
mentioned. From the set of possible solutions, we eliminated
those that do not fit the problem, using the distance from the
estimated solutions to the ground-truth.
Fig. 2. In this figure, we show the distribution of the numerical error in the computation of the rotation parameters (a) (distance between the estimated and ground-truth angles); in the computation of the translation vector (b) (distance between the estimated and ground-truth translation vectors); and in terms of the distance between the ground-truth and estimated points in the camera coordinate system (c).
For each trial we computed
$$
d_{\text{rotation}} = \left\| (\hat{\alpha}, \hat{\beta}, \hat{\gamma}) - (\alpha, \beta, \gamma) \right\|, \tag{38}
$$
where $(\hat{\alpha}, \hat{\beta}, \hat{\gamma})$ and $(\alpha, \beta, \gamma)$ are the rotation angles of the estimated and ground-truth solutions, respectively. The error is the usual Euclidean error defined between two vectors ($L_2$ metric). The frequency distribution of the error values is shown in Fig. 2(a).
The numerical error for translation was computed using
$$
d_{\text{translation}} = \left\| \hat{\mathbf{t}} - \mathbf{t} \right\|, \tag{39}
$$
where $\hat{\mathbf{t}}$ and $\mathbf{t}$ are the estimated and ground-truth translation vectors, respectively. The distribution of the numerical errors of translation is shown in Fig. 2(b).
We also computed an error measure related to the estimation of the coordinates of the world points in the camera coordinate system. Since the ground-truth $\mathbf{p}^{(C)}$ is known, we computed the error as
$$
d_{\text{points}} = \frac{1}{3} \sum_{i=1}^{3} \left\| \mathbf{p}_i^{(C)} - \hat{\mathbf{p}}_i^{(C)} \right\|, \quad \text{where } \hat{\mathbf{p}}_i^{(C)} = \hat{\mathbf{R}}\, \mathbf{p}_i^{(W)} + \hat{\mathbf{t}}, \tag{40}
$$
where $\hat{\mathbf{R}}$ and $\hat{\mathbf{t}}$ are the estimates for rotation and translation, respectively. The frequency distribution of the numerical errors is shown in Fig. 2(c).
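The three error metrics of (38)–(40) are straightforward to compute; the sketch below is an illustrative NumPy helper (the function name and argument layout are ours):

```python
import numpy as np

def pose_errors(angles_est, angles_gt, t_est, t_gt,
                R_est, p_world, p_cam_gt):
    """The three metrics of (38)-(40): L2 distance between the
    rotation-angle vectors, between the translation vectors, and the
    mean distance between the ground-truth camera-frame points and the
    re-projected world points."""
    d_rot = np.linalg.norm(np.asarray(angles_est) - np.asarray(angles_gt))
    d_trans = np.linalg.norm(np.asarray(t_est) - np.asarray(t_gt))
    p_cam_est = p_world @ np.asarray(R_est).T + np.asarray(t_est)
    d_points = np.mean(np.linalg.norm(p_cam_est - p_cam_gt, axis=1))
    return d_rot, d_trans, d_points

# A perfect estimate gives all three errors equal to zero
P = np.arange(9.0).reshape(3, 3)
t = np.array([1.0, 2.0, 3.0])
errs = pose_errors((0.1, 0.2, 0.3), (0.1, 0.2, 0.3),
                   t, t, np.eye(3), P, P + t)
```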
TABLE II
In this table we show the averages of the number of solutions generated by each algorithm, without and with the application of the orientation constraint.
We also computed the median and mode for each curve.
The results are shown in Tab. I.
2) Number of Solutions: Note that in general, minimal
problems in computer vision do not generate single solutions.
As a result, comparing the different methods in terms of the
number of solutions is important. In this section we analyze
the distribution of the number of solutions for the 10⁵ trials
described above.
Wrong solutions are eliminated by assuming the constraint that “the camera looks ahead”. Assuming that the line is represented by a point $\mathbf{x}^{(C)}$ and its direction $\mathbf{d}^{(C)}$, this constraint can be formalized by the inequality $\mathbf{d}_i^{(C)T} \left( \mathbf{p}_i^{(C)} - \mathbf{x}_i^{(C)} \right) > 0$, for i = 1, 2, 3.
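A minimal sketch of this orientation test in NumPy (the helper name is illustrative, not the paper's):

```python
import numpy as np

def looks_ahead(d_cam, x_cam, p_cam):
    """Orientation ("camera looks ahead") constraint: each estimated
    point must lie along the positive direction of its projection line,
    i.e. d_i . (p_i - x_i) > 0 for every i."""
    d = np.asarray(d_cam)
    x = np.asarray(x_cam)
    p = np.asarray(p_cam)
    return bool(np.all(np.sum(d * (p - x), axis=1) > 0.0))

# Points one unit along each line direction satisfy the constraint;
# points one unit against it do not.
d = np.eye(3)              # three line directions
x = np.zeros((3, 3))       # points on the lines
ok = looks_ahead(d, x, x + d)
bad = looks_ahead(d, x, x - d)
```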
We computed the number of solutions for all the 10⁵ trials
and the results are shown in Figs. 3(a) and 3(b), before and
after imposing the orientation constraint, respectively.
We also computed the averages of the number of solutions
generated by the three algorithms considered, with and without
the imposition of the orientation constraint. The results are
presented in Tab. II.
3) Computational Effort: We implemented the algorithms
in MATLAB using an Intel Core i7-3930K with a 3.2 GHz
processor. All the computation times given in the paper were
estimated as the median of the respective computation times,
for all the 10⁵ trials. All the proposed approaches require
iterative steps to solve the problem. When compared to the
iterative steps, the computation time spent on simple analytical
operations is negligible. As a result, for the evaluation we
only considered the steps that require the most significant
Fig. 3. In this figure we show the distribution of the number of solutions
for the three algorithms. Figures (a) and (b) display the number of solutions
before and after the application of the orientation constraint respectively.
| Mean:                  | Orientation Constraint |       |
|                        | Without                | With  |
| Our Method             | 3.686                  | 2.909 |
| Nistér and Stewénius   | 3.691                  | 2.912 |
| Chen and Chang         | 3.695                  | 2.915 |
Fig. 4. For results with real data, we use a catadioptric system with quadric mirrors. Seventeen images of 3D planes were acquired (c). An example of data
used for pose estimation is displayed in Figs. (a) and (b). Using only three points in the image and their corresponding points in the chess-board coordinate
system (red coordinate system in (a)), rotation and translation (corresponding to the rigid transformation between the chess-board and camera coordinate
system) are estimated. In (b) the camera coordinate system is shown in black and the estimated chess-board coordinate system is shown in red. In Fig. (c)
the metric units are in millimeters.
computational effort, i.e., the iterative steps.
The main computation step for all the algorithms was the
computation of the minimal solver, which may correspond
to finding the roots of an eighth degree polynomial equation
(other simple solvers can be used for the method proposed
in this paper). As suggested by both Nistér & Stewénius and
Chen & Chang, we used the companion matrix method to
solve the roots of the eighth degree polynomial equations. This
method amounts to performing an eigen decomposition of an
8 × 8 matrix, which takes 43µs.
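The companion matrix method can be sketched directly: the roots of a degree-n polynomial are the eigenvalues of its n × n companion matrix. Below is an illustrative NumPy version checked against a degree-eight polynomial with known roots (the function name is ours):

```python
import numpy as np

def companion_roots(coeffs):
    """Roots of a polynomial (coefficients in descending powers) via the
    eigenvalues of its companion matrix -- for the minimal solvers this
    is an 8x8 eigen decomposition."""
    c = np.asarray(coeffs, dtype=float)
    c = c / c[0]                      # normalize to a monic polynomial
    n = len(c) - 1
    C = np.zeros((n, n))
    C[1:, :-1] = np.eye(n - 1)        # sub-diagonal of ones
    C[:, -1] = -c[1:][::-1]           # last column: -c_n ... -c_1
    return np.linalg.eig(C)[0]

# Degree-8 check: polynomial with roots 1..8
true_roots = np.arange(1.0, 9.0)
coeffs = np.poly(true_roots)          # descending-power coefficients
roots = np.sort(companion_roots(coeffs).real)
```

This is essentially what `numpy.roots` (and the MATLAB `roots` function) does internally.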
For our parametrization, we had to consider the estimation of the null-space of the matrix N, at (18). Iterative singular value decomposition methods would add around 40µs to the computation time. However, there are techniques that can be used to derive general analytical solutions for the singular value decomposition of matrix N. For instance, we
can symbolically produce the reduced row echelon form of
matrix N and use this derivation to compute the null-space.
We also derived an analytical solution using the geometric
properties of the cross and dot products and considering the
rows of matrices M and N from (17) and (18). As a result,
it is possible to compute the null-space in closed-form, using
very simple analytical operations.
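For reference, the generic (iterative) null-space computation that the analytical derivation replaces can be written in a few lines of NumPy: the null-space basis consists of the right singular vectors whose singular values are numerically zero. The example matrix and tolerance below are illustrative:

```python
import numpy as np

def null_space(N, rtol=1e-10):
    """Basis for the right null-space of N from its SVD: the right
    singular vectors associated with (numerically) zero singular values."""
    U, s, Vt = np.linalg.svd(N)
    rank = int(np.sum(s > rtol * s[0]))
    return Vt[rank:].T                 # columns span the null-space

# Rank-2 example: a 4x4 matrix with a two-dimensional null-space
N = np.array([[1., 0., 0., 0.],
              [0., 1., 0., 0.],
              [1., 1., 0., 0.],
              [2., 1., 0., 0.]])
B = null_space(N)                      # 4x2 basis, N @ B = 0
```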
Both our approach and Chen & Chang’s method require
additional steps to recover the rotation and translation that
define the pose. Chen & Chang give the coordinates of the
world points in the camera coordinate system and the method
we propose estimates the homography matrix that defines pose.
In both cases, the computation of an eigen decomposition of
a 3 × 3 matrix is required for each valid solution. Note that
this decomposition can be computed in closed form. We now
describe how we can achieve this decomposition.
Let us consider that we want to compute the eigen decomposition of the matrix $\mathbf{H}^T\mathbf{H} \in \mathbb{R}^{3 \times 3}$ from the problem described in (7). The eigenvalues $\sigma_i^2$, which will be the diagonal elements of the matrix $\boldsymbol{\Sigma}^2$ in (7), are computed such that
$$
\det\left( \mathbf{H}^T\mathbf{H} - \sigma^2 \mathbf{I} \right) = 0, \tag{41}
$$
where $\det(.)$ denotes the determinant and $\mathbf{I}$ denotes the 3 × 3 identity matrix. (41) defines a third degree polynomial equation in the unknown $\sigma^2$. As a result, the solutions $\sigma_i^2$ for i = 1, 2, 3 are given by the roots of this equation, which can be computed in closed form simply by applying Cardano's method. To conclude, and since we already have $\sigma_i^2$, the solutions for the columns $\mathbf{v}_i \in \mathbb{R}^3$, for i = 1, 2, 3, that form $\mathbf{V}$ of (7) are given by the equation
$$
\mathbf{H}^T\mathbf{H}\, \mathbf{v}_i = \sigma_i^2 \mathbf{v}_i. \tag{42}
$$
B. Experimental Results with Real Data
To obtain data from real images, we considered a non-central catadioptric imaging device comprising a perspective camera and a quadric mirror. To obtain the data, we used a chess board in seventeen different positions and orientations, Fig. 4(c). The images of the chess board were acquired by the non-central catadioptric camera. To obtain the ground-truth, the 3D positions of the points of the chess board were
estimated in world coordinates using a set-up with perspective
cameras. An example displaying the points in the world and
their corresponding image coordinates is shown in Figs. 4(a)
and 4(b). In both figures blue dots represent image points and
their corresponding world points in ground-truth coordinates.
Sixteen of the seventeen chess board planes (and their
images) were used to calibrate the camera (using [32]).
The resulting calibration provides the coordinates of the 3D
projecting rays for each image pixel. A reference coordinate
system was associated with the chess board plane that was
not used to calibrate the camera. The coordinate system was
defined so that Theorem 2 is satisfied (the plane does not
contain the origin of the coordinate system). Three points of
the chess board plane and their images were then used to
estimate the pose of the plane. The 3D projection line was
estimated in the camera coordinate system, for each of the
three points in the image. Figs. 4(a) and 4(b) show the image
points plus an example of the 3D lines (in green).
The mappings between the three points in the world (chess
board coordinate system) and their projection lines (represented in the camera coordinate system) were used to estimate
pose. In addition, we selected the best solution (in terms of the
errors in the 3D reconstructed points) from the set of possible
TABLE III
In this table, we show the absolute errors for all six pose parameters in all of the seventeen pose estimates. Rotation angles are in degrees. Note that the units for translation differ between the ground truth and the absolute errors: the former is in meters, the latter in millimeters.
Ground Truth
| Plane | αx   | αy     | αz     | tx [m] | ty [m] | tz [m] |
| 1     | 0.86 | -18.68 | 169.21 | -0.37  | 0.06   | 2.10   |
| 2     | 0.99 | -39.83 | 168.92 | -0.43  | 0.09   | 1.91   |
| 3     | 0.89 | -8.89  | 169.30 | -0.41  | 0.06   | 2.08   |
| 4     | 0.94 | 8.04   | 169.46 | -0.32  | 0.06   | 2.11   |
| 5     | 1.57 | -51.95 | 168.14 | -0.38  | 0.12   | 1.76   |
| 6     | 1.60 | -50.09 | 168.08 | -0.32  | 0.13   | 1.72   |
| 7     | 0.97 | 4.18   | 169.50 | -0.34  | 0.02   | 2.30   |
| 8     | 1.26 | 37.49  | 170.08 | -0.12  | -0.01  | 2.48   |
| 9     | 1.34 | -54.10 | 168.32 | -0.51  | 0.08   | 1.97   |
| 10    | 1.28 | -41.00 | 168.50 | -0.33  | 0.10   | 1.90   |
| 11    | 1.79 | -62.64 | 167.90 | -0.30  | 0.12   | 1.75   |
| 12    | 1.95 | -68.73 | 167.74 | -0.53  | 0.13   | 1.70   |
| 13    | 1.95 | -58.60 | 167.68 | -0.29  | 0.15   | 1.63   |
| 14    | 1.35 | -42.33 | 168.38 | -0.35  | 0.13   | 1.73   |
| 15    | 0.98 | 5.81   | 169.55 | -0.45  | -0.04  | 2.63   |
| 16    | 0.96 | -31.92 | 168.95 | -0.47  | 0.04   | 2.16   |
| 17    | 2.12 | -63.70 | 167.52 | -0.42  | 0.06   | 2.09   |
pose estimates. Using the pose estimate thus obtained, the
errors for all six pose parameters were computed.
This procedure was repeated for all 17 planes using a leave-one-out procedure.
We computed the pose parameter’s errors for all the seventeen different configurations. The absolute errors in the
estimated parameters are shown in Tab. III.
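The leave-one-out scheme over the seventeen planes can be sketched as follows; the calibration and pose-estimation routines themselves are hypothetical placeholders and are only named in the comment:

```python
# Leave-one-out over the seventeen chess-board planes: each configuration
# calibrates the camera on sixteen planes (e.g. a hypothetical
# `calibrate(train)`) and estimates pose on the held-out plane
# (e.g. a hypothetical `estimate_pose(held_out)`).
n_planes = 17
folds = [([j for j in range(n_planes) if j != i], i)
         for i in range(n_planes)]
```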
C. Analysis of the Experimental Results
Let us first look at the experiments with synthetic data,
Sec. V-A. In terms of the number of solutions (Fig. 3) with
and without the orientation constraint, the results are similar
for the three methods and so this comparison does not offer
additional information on the relative merits of the algorithms.
This can be further checked by considering the averages of
the number of solutions generated by the three methods that
are presented in Tab. II. Thus, in this sub-section we discuss
the experimental results in terms of numerical accuracy and
processing time.
In terms of numerical accuracy (Fig. 2), it can be seen that the method proposed in this article performed better than the state-of-the-art algorithms in all the tests. The algorithm
proposed by Chen & Chang performs the second best followed
by the method proposed by Nistér & Stewénius. This analysis
can be confirmed by taking into account the data in Tab. I.
This shows that the numerical errors for the method proposed
in this paper are better than state-of-the-art methods, for all
the evaluation parameters. Moreover, note that the mode of the
curves for the results obtained with the previous approaches
give worse results than the median of the method proposed in
this article, for all the numerical accuracy experiments.
In terms of computational time, Sec. V-A3, we saw that
all three algorithms can be implemented with similar computational effort (solving for the roots of an eighth degree
polynomial equation). Nevertheless, both our method and
Absolute Error
| Plane | α̂x     | α̂y     | α̂z     | t̂x [mm] | t̂y [mm] | t̂z [mm] |
| 1     | 0.0053 | 0.2831 | 0.1082 | 1.8810  | 0.4149  | 0.0305  |
| 2     | 0.2139 | 0.6406 | 0.2383 | 1.1964  | 0.3307  | 3.0765  |
| 3     | 0.3361 | 1.2368 | 0.4696 | 5.2738  | 1.6796  | 1.1239  |
| 4     | 0.0072 | 0.0781 | 0.1519 | 0.6116  | 0.8114  | 0.2942  |
| 5     | 0.4777 | 0.2570 | 0.6365 | 0.7986  | 1.4233  | 1.0428  |
| 6     | 1.4092 | 0.3863 | 1.8659 | 1.1793  | 5.3712  | 2.4334  |
| 7     | 0.0027 | 1.1902 | 0.8675 | 6.2205  | 4.8714  | 5.1137  |
| 8     | 0.0507 | 0.0752 | 0.0321 | 1.1442  | 0.6326  | 1.3746  |
| 9     | 0.0292 | 0.0882 | 0.1513 | 0.0811  | 0.3610  | 0.0094  |
| 10    | 0.1598 | 0.1031 | 0.2384 | 0.7491  | 1.0101  | 1.0732  |
| 11    | 0.3740 | 0.0378 | 0.5223 | 0.5351  | 0.5899  | 0.4138  |
| 12    | 0.5593 | 1.5524 | 0.2773 | 9.6657  | 5.4284  | 11.1939 |
| 13    | 4.4823 | 1.6889 | 5.1984 | 7.1865  | 13.5883 | 10.0518 |
| 14    | 0.3031 | 0.9430 | 0.0728 | 2.0597  | 0.7150  | 9.8622  |
| 15    | 0.0538 | 0.1246 | 0.1570 | 0.6553  | 0.7758  | 1.3818  |
| 16    | 0.9328 | 2.9167 | 2.2671 | 2.4659  | 10.1745 | 24.6971 |
| 17    | 3.7783 | 1.3850 | 4.6183 | 3.6914  | 7.4837  | 0.1563  |
Chen & Chang's method may or may not include additional iterative steps. These iterative steps are acceptable when a quick implementation suffices; however, if computation time is critical, the analytical derivations should be used instead.
Now let us look at the experiments with real data,
Sec. V-B. As noted earlier and as mentioned by Nistér and
Stewénius [24], [25], all minimal solutions will behave in the
same way with respect to noise. But to further evaluate the
proposed approach we tested our method using real images.
As we can see from Tab. III, and despite the fact that we are
using minimal data, the results are good. Note that the absolute
errors for the angles are in degrees and that the translation
errors are in millimeters (while the ground truth is in meters).
VI. CONCLUSIONS
In this article we have presented a novel homography-based algorithm for the minimal three-point pose problem for
generalized cameras, where the constraint that the projection
rays must intersect at a single point is not considered. This
method can be used to estimate pose for imaging models where
reflection or refraction occurs. That is the case, for example,
of catadioptric systems made up of arbitrary mirrors, or of a
perspective camera looking at a water environment.
Previous approaches used algebraic techniques that lead to a single-variable eighth degree polynomial, and the pose estimates are given by the roots of that polynomial. In the case
of this paper, the important step is the estimation of the
intersection points of three quadrics (which can also be
represented by the roots of an eighth degree polynomial
equation). The algorithm proposed estimates the pose by
computing four vectors obtained from a null-space and, using
this null-space, we can define the three quadric equations.
The intersection of quadric surfaces occurs in several minimal
problems in computer vision. We have described a polynomial
eigenvalue representation of the intersection of quadrics, where
the solutions are given by an eigen decomposition. Unlike
previous methods, our proposed approach does not require any
transformation of the dataset either, and this is why we call it
the direct solution to the problem.
The algorithm presented in this paper was not intended to
be used where non-minimal data is available; it is intended
to be a simple method to allow the analysis of the solutions
when one only has access to minimal data. This method is
suitable for RANSAC-based approaches for pose estimation.
Also, using this algorithm, the solutions with minimal data
can be easily obtained and then analyzed for the purpose of
comparing non-central cameras.
We compared the proposed algorithm with the state-of-the-art methods and concluded that our algorithm performs
better than them in terms of numerical accuracy, with similar
computational effort and a similar number of solutions. The
validation of the method by testing with real experiments
showed good results.
ACKNOWLEDGEMENTS
This work was supported by the Portuguese
Foundation for Science and Technology, scholarship
SFRH/BD/49054/2008 and by project A “Surgery
and Diagnosis Assisted by Computer Using Images”
(SCT_2011_02_027_4824) funded by the QREN
programme “Mais Centro” with financing from FEDER.
Helder Araujo would also like to thank the support of Project
FCT/PTDC/EIA-EIA/122454/2010, funded by the
Portuguese Science Foundation (FCT) by means of national
funds (PIDDAC) and co-funded by the European Fund
for Regional Development (FEDER) through COMPETE
Operational Programme Competitive Factors (POFC).
REFERENCES
[1] R. Hartley and A. Zisserman, Multiple View Geometry in Computer
Vision. Cambridge University Press, 2000.
[2] M. A. Fischler and R. C. Bolles, “Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography,” Comm. of the ACM, 1981.
[3] D. Nistér, “Preemptive RANSAC for Live Structure and Motion Estimation,” IEEE Proc. Int’l Conf. Computer Vision, 2003.
[4] R. M. Haralick, H. Joo, C.-N. Lee, X. Zhuang, V. G. Vaidya, and M. B.
Kim, “Pose Estimation from Corresponding Point Data,” IEEE Trans.
System, Man and Cybernetics, 1989.
[5] C.-P. Lu, G. D. Hager, and E. Mjolsness, “Fast and Globally Convergent
Pose Estimation from Video Images,” IEEE Trans. Pattern Analysis and
Machine Intelligence, 2000.
[6] F. Moreno-Noguer, V. Lepetit, and P. Fua, “Accurate Non–Iterative O(n) Solution to the PnP Problem,” IEEE Proc. Int'l Conf. Computer Vision, 2007.
[7] J. Hesch and S. Roumeliotis, “A Direct Least-Squares (DLS) Method
for PnP,” IEEE Proc. Int’l Conf. Computer Vision, 2011.
[8] O. Tahri, H. Araujo, Y. Mezouar, and F. Chaumette, “Efficient Iterative
Pose Estimation using an Invariant to Rotations,” IEEE Trans. Cybernetics, 2014.
[9] R. Haralick, C.-N. Lee, K. Ottenberg, and M. Nolle, “Review and
Analysis of Solutions of the Three Point Perspective Pose Estimation
Problem,” Int’l J. Computer Vision, 1994.
[10] X.-S. Gao, X.-R. Hou, J. Tang, and H.-F. Cheng, “Complete Solution
Classification for the Perspective–Three–Point Problem,” IEEE Trans.
Pattern Analysis and Machine Intelligence, 2003.
[11] L. Kneip, D. Scaramuzza, and R. Siegwart, “A Novel Parametrization
of the Perspective–Three–Point Problem for a Direct Computation of
Absolute Camera Position and Orientation,” IEEE Proc. Computer
Vision and Pattern Recognition, 2011.
[12] Y. Guo, “A Novel Solution to the P4P Problem for an Uncalibrated Camera,” J. Math. Imaging Vis., 2012.
[13] M. Bujnak, Z. Kukelova, and T. Pajdla, “A general solution to the P4P
problem for camera with unknown focal length,” IEEE Proc. Computer
Vision and Pattern Recognition, 2008.
[14] N. Goncalves, “Noncentral Catadioptric Systems with Quadric Mirrors,
Geometry and Calibration,” Ph.D. dissertation, University of Coimbra,
October 2008.
[15] A. Agrawal, Y. Taguchi, and S. Ramalingam, “Analytical Forward
Projection for Axial Non–Central Dioptric & Catadioptric Cameras,”
Proc. European Conf. Computer Vision, 2010.
[16] R. Swaminathan, M. D. Grossberg, and S. K. Nayar, “Non–Single
Viewpoint Catadioptric Cameras: Geometry and Analysis,” Int’l J.
Computer Vision, 2006.
[17] B. Micusik and T. Pajdla, “Autocalibration & 3D Reconstruction with
Non–Central Catadioptric Cameras,” IEEE Proc. Computer Vision and
Pattern Recognition, 2004.
[18] T. Treibitz, Y. Y. Schechner, and H. Singh, “Flat Refractive Geometry,”
IEEE Proc. Computer Vision and Pattern Recognition, 2008.
[19] K. N. Kutulakos and E. Steger, “A Theory of Refractive and Specular
3D Shape by Light–Path Triangulation,” Int’l J. Computer Vision, 2008.
[20] J. Ponce, “What is a Camera?” IEEE Proc. Computer Vision and Pattern Recognition, 2009.
[21] R. Gupta and R. I. Hartley, “Linear Pushbroom Cameras,” IEEE Trans.
Pattern Analysis and Machine Intelligence, 1997.
[22] J. Yu and L. McMillan, “General Linear Cameras,” Proc. European
Conf. Computer Vision, 2004.
[23] A. Zomet, D. Feldman, S. Peleg, and D. Weinshall, “Mosaicing New
Views: The Crossed–Slits Projection,” IEEE Trans. Pattern Analysis and
Machine Intelligence, 2003.
[24] D. Nistér, “A Minimal Solution to the Generalized 3–Point Pose
Problem,” IEEE Proc. Computer Vision and Pattern Recognition, 2004.
[25] D. Nistér and H. Stewénius, “A Minimal Solution to the Generalized 3–Point Pose Problem,” J. Math. Imaging Vis., 2007.
[26] C.-S. Chen and W.-Y. Chang, “Pose Estimation for Generalized Imaging Device via Solving Non–Perspective n Point Problem,” IEEE Proc. Int'l Conf. Robotics and Automation, 2002.
[27] ——, “On Pose Recovery for Generalized Visual Sensors,” IEEE Trans.
Pattern Analysis and Machine Intelligence, 2004.
[28] S. Ramalingam, S. K. Lodha, and P. Sturm, “A Generic Structure–from–
Motion Framework,” Computer Vision and Image Understanding, 2006.
[29] G. Schweighofer and A. Pinz, “Globally Optimal O (n) Solution to
the PnP Problem for General Camera Models,” Proc. British Machine
Vision Conf., 2008.
[30] G. Grossberg and S. Nayar, “A General Imaging Model and a Method
for Finding its Parameters,” IEEE Proc. Int’l Conf. Computer Vision,
2001.
[31] P. Sturm and S. Ramalingam, “A Generic Concept for Camera Calibration,” Proc. European Conf. Computer Vision, 2004.
[32] P. Miraldo, H. Araujo, and J. Queiró, “Point–based Calibration Using a Parametric Representation of the General Imaging Model,” IEEE Proc. Int'l Conf. Computer Vision, 2011.
[33] K. S. Arun, T. S. Huang, and S. D. Blostein, “Least–Squares Fitting of Two 3-D Point Sets,” IEEE Trans. Pattern Analysis and Machine Intelligence, 1987.
[34] S. Umeyama, “Least–Squares Estimation of Transformation Parameters Between Two Point Patterns,” IEEE Trans. Pattern Analysis and Machine Intelligence, 1991.
[35] D. Nistér, “An Efficient Solution to the Five–Point Relative Pose
Problem,” IEEE Trans. Pattern Analysis and Machine Intelligence, 2004.
[36] H. Pottmann and J. Wallner, Computational Line Geometry. Springer–
Verlag, 2001.
[37] Y. Ma, S. Soatto, J. Košecká, and S. S. Sastry, An Invitation to 3–D
Vision: From Images to Geometry Models. Springer Science+Business,
2004.
[38] P. Miraldo and H. Araujo, “Direct solution to the minimal generalized
pose,” Supplement Material, 2013.
[39] D. A. Cox, J. Little, and D. O’Shea, Using Algebraic Geometry.
Springer Science+Business, 2004.
[40] H. Stewénius, “Gröbner Basis Methods for Minimal Problems in Computer Vision,” PhD Thesis, Lund University, 2005.
[41] Z. Kukelova, M. Bujnak, and T. Pajdla, “Automatic Generator of
Minimal Problem Solvers,” Proc. European Conf. Computer Vision,
2008.
[42] R. Hartley and H. Li, “An Efficient Hidden Variable Approach to
Minimal-Case Camera Motion Estimation,” IEEE Trans. Pattern Analysis and Machine Intelligence, 2012.
[43] Z. Kukelova, M. Bujnak, and T. Pajdla, “Polynomial Eigenvalue Solutions to Minimal Problems in Computer Vision,” IEEE Trans. Pattern
Analysis and Machine Intelligence, 2012.
Pedro Miraldo received the master's degree in electrical and computer engineering from the University of Coimbra in 2008 and the Ph.D. degree in electrical engineering from the University of Coimbra in 2013. Currently, he is a researcher at the Institute for Systems and Robotics-Coimbra. His work focuses on general camera models and camera localization.
Helder Araujo is a Professor in the Department of Electrical and Computer Engineering of the University of Coimbra, Portugal, and also an Invited Professor at Blaise Pascal University, Clermont-Ferrand, France. He is a researcher at the Institute
for Systems and Robotics-Coimbra. His research
interests include computer and robot vision, robot
navigation and localization, and sensor modeling for
robot navigation and localization.