UNIVERSIDADE DE COIMBRA
DEPARTAMENTO DE ENGENHARIA ELECTROTÉCNICA E DE COMPUTADORES
INSTITUTO DE SISTEMAS E ROBÓTICA
3030-290 COIMBRA, PORTUGAL

ARTICLE: Direct Solution to the Minimal Generalized Pose

Author: Pedro Miraldo (miraldo@isr.uc.pt)
Co-Author: Helder Araujo (helder@isr.uc.pt)
CONTENTS

I     Introduction
      I-A   General Imaging Model
      I-B   Previous Approaches vs Our Approach
      I-C   Background and Notation

II    Proposed Approach
      II-A  Decomposition of the Homography Matrix
      II-B  Formalization of the problem
      II-C  The Case of a Central Camera

III   Solving for the Intersection of Quadrics

IV    Algorithm Outline

V     Experimental Results
      V-A   Experiments with Synthetic Data
            V-A1  Numerical Errors
            V-A2  Number of Solutions
            V-A3  Computational Effort
      V-B   Experimental Results with Real Data
      V-C   Analysis of the Experimental Results

VI    Conclusions

References

Biographies
      Pedro Miraldo
      Helder Araujo
Preprint: submitted to the IEEE Trans. Cybernetics, August, 2013.
SUBMITTED TO IEEE TRANS. ON CYBERNETICS: DIRECT SOLUTION FOR THE MINIMAL GENERALIZED POSE PROBLEM
Direct Solution to the Minimal Generalized Pose
Pedro Miraldo and Helder Araujo
Abstract—Pose estimation is a relevant problem for imaging
systems whose applications range from augmented reality to
robotics. In this paper we propose a novel solution for the
minimal pose problem, within the framework of generalized
camera models and using a planar homography. Within this
framework and considering only the geometric elements of the
generalized camera models, an imaging system can be modeled
by a set of mappings associating image pixels to 3D straight lines.
This mapping is defined in a 3D world coordinate system. Pose
estimation performs the computation of the rigid transformation
between the original 3D world coordinate system and the one
in which the camera was calibrated. Using synthetic data, we
compare the proposed minimal-based method with the state-of-the-art methods in terms of numerical errors, number of solutions,
and processing time. From the experiments, we conclude that the
proposed method performs better, especially because there is a
smaller variation in numerical errors, while results are similar in
terms of number of solutions and computation time. To further
evaluate the proposed approach we tested our method with real
data. One of the relevant contributions of this paper is theoretical.
When compared to the state-of-the-art approaches, we propose a
completely new parametrization of the problem that can be solved
in four simple steps. In addition, our approach does not require
any predefined transformation of the dataset, which yields a
simpler solution for the problem.
Index Terms—Minimal problems in computer vision, absolute
pose, generalized camera models, homography matrix.
I. INTRODUCTION
Camera calibration within the framework of the generalized
camera models consists of estimating the correspondences
between image pixels and the corresponding projecting 3D
straight lines. Image space does not usually change and
therefore we can define a 3D coordinate system for the
image coordinates. But a camera is a mobile device and
so we cannot define a fixed global coordinate system to
represent the lines mapped into the image points. Therefore,
we define a 3D reference coordinate system associated with
the camera to represent the 3D lines mapped into the image
pixels [1]. As a consequence, to estimate the coordinates of
3D entities represented in a different coordinate system we
need to estimate a rigid transformation that maps the camera
coordinate system into the other reference coordinate system,
which we call the absolute pose problem. This problem can
be divided into two subproblems: minimal and non-minimal
cases.
Absolute pose is important for many applications that need
to use the 3D entities imaged by a camera. The analysis of
solutions using minimal data is important because it provides
insights and a better understanding of the problem. Non-minimal solutions, meanwhile, are used to estimate the pose
P. Miraldo and H. Araujo are with the Institute for Systems and Robotics,
Department of Electrical and Computer Engineering, University of Coimbra,
Portugal.
for more than the minimal point correspondences, when we
need a robust solution. The advantages of minimal solutions
are in terms of computation speed and their applications in
the hypothesis-and-test estimation methods such as RANSAC
[2], [3].
Pose estimation has been the subject of a significant amount of research for cameras that can be modeled by a common perspective imaging system, in which all rays intersect at a single point (Fig. 1(b)). The works [4], [5], [6], [7], [8] are examples of solutions to non-minimal pose problems, while [9], [10], [11] are examples of minimal solutions, in which only three 3D points in the world and their images are used. Guo [12] and Bujnak et al. [13] developed minimal solutions for the computation of the pose of an uncalibrated perspective camera; in this case, instead of three 3D points, four 3D points and their images are required.
If we consider imaging devices where image formation
cannot be modeled by a central projection (see Fig. 1(a)),
e.g. when the imaging rays are subject to reflection and/or
refraction [14], [15], [16], [17], [18], [19] or linear cameras
[20], [21], [22], [23], new methods and algorithms have to be
considered. In this article we address minimal pose estimation for general imaging devices, using a single image of three points in the scene. We do not address the relative pose
problem where two or more images of the scene are used.
In general, estimates based on minimal data provide multiple
solutions. Here we describe a new method for pose estimation
using the minimal number of three known coordinates of 3D
points. As far as we know, this problem has been studied by
Nistér & Stewénius [24], [25], by Chen & Chang [26], [27]
and by Ramalingam et al. in [28]. A more detailed analysis
of these methods is presented in the next section.
We use a planar 3D homography transformation to define an
algebraic relationship that relates any three points in the world
(represented in the world coordinate system) and any three 3D
lines that pass through those points (the lines are represented
in the 3D camera coordinate system). Note that, since we are
using only three 3D points, the use of a planar homography to
represent the rigid transformation does not restrict the use of
the method to planar patterns. Any three non-collinear 3D points define a plane in the world. However, as we will see
(and without additional constraints), the use of three points and
three lines leads to an infinite number of possible solutions.
To have a finite number of solutions, the constraint that the
homography matrix must define a rigid transformation is used.
In this article, existing and proposed approaches are also
compared. The proposed method performs better than the
previous approaches in terms of numerical error and is similar
to the best state-of-the-art method in terms of the number of
solutions and computation effort.
[Fig. 1 graphic: (a) a non-central generalized camera; (b) a central (perspective) camera, with the image plane shown.]
Fig. 1. Representation of the minimal pose estimation problem, considering a calibrated generalized camera. Three 3D points are imaged through three rays. Since the generic camera model is calibrated, the 3D ray coordinates are known in the camera coordinate system. Pose estimation therefore consists of estimating the rotation and translation that define the transformation from the 3D world coordinate system into the camera coordinate system, such that the incidence relationship between the points and the lines is satisfied.
We do not consider the non-minimal problem here. However, solutions for these cases were also developed by Chen
and Chang [26], [27] and by Schweighofer and Pinz [29].
This paper is organized as follows: in the rest of this
section we describe the framework for the generalized camera
model, give a short description of the previous approaches,
and indicate the mathematical notation used in the article. The
proposed approach is presented in Sec. II. A method to estimate the intersection points of three quadrics is described in Sec. III. The structure of the algorithm is shown in Sec. IV, and in Sec. V-A we compare the proposed approach with the state-of-the-art methods in terms of numerical accuracy, number of possible solutions and processing time. Some conclusions are set out in Sec. VI.
The contributions of the paper are stated in Sec. II and III.
In Sec. II, we propose the homography-based solution for our
problem (to the best of our knowledge, it is the first time
that homography has been used for this kind of problem).
In Sec. III, we adjust a general minimal solver (based on
polynomial eigenvalues) to compute the intersection of the
quadrics, as required by our method (solution presented in
Sec. II).
A. General Imaging Model
For the estimation of the 3D pose, the calibration of the
imaging device is assumed to be known. We use a model that
can represent any type of imaging device. The model used is
the general imaging model proposed by Grossberg and Nayar
[30]. This model assumes that each image pixel is mapped into a ray in the 3D world. The mapping parameters form an
array of parameters called raxels. The set of all raxels that
represent all image pixels constitutes the complete camera
model. Note that there are no constraints on the mapping
parameters, which means that the imaging model can represent
any type of imaging system, central or non-central.
Since we assume that the camera has been previously
calibrated and represented by the raxel imaging model for
all image pixels, the corresponding 3D line coordinates are
known in the camera coordinate system. Several calibration
procedures for the general imaging model were described in
the literature, for instance [30], [31], [32].
B. Previous Approaches vs Our Approach
The algorithm proposed by Chen and Chang in [26], [27] was the first method to solve the minimal three-point pose problem for general imaging devices. The algorithm is
based on the estimation of the roots of an eighth degree
polynomial equation. To get this polynomial equation, the
data-set is transformed such that lines and points verify a set
of constraints. An algorithm is used in the new coordinate
system to relate the triangle defined by distances between the
three known world points and the three projection lines.
This algorithm estimates the coordinates of the points in the camera coordinate system. Thus, since the pose is described by a rotation and a translation, an additional estimation of the rigid transformation is also required, based on the matching between the coordinates of the points in the camera and world coordinate systems, for instance [33], [34].
The approach of Ramalingam et al. [28] is similar to that proposed by Chen and Chang: both are based on the fact that the distances between the world points must be preserved. They claim that the three distance constraints generate an eighth degree polynomial equation. However,
unlike the algorithm proposed by Chen and Chang, they do
not give a closed-form solution for the polynomial parameters.
Since the basis of this algorithm is similar to the method of
Chen and Chang, we will restrict our analysis to the latter
algorithm.
Nistér and Stewénius [24], [25] proposed an algorithm that
is also based on the estimation of the roots of a single
variable eighth degree polynomial equation. This algorithm
also requires predefined transformations to the 3D points and
3D lines, in the world and camera coordinate systems. The
coefficients of the polynomial equation are obtained from the
intersection between a ruled quartic surface and a circle.
Finally, we note that both algorithms are based on the
derivation of an eighth degree polynomial equation, as a
function of a single variable. Despite the differences between
the approaches, these algorithms require a set of rigid transformations of the dataset to be performed to get the coefficients of
the eighth degree polynomial equation. These transformations
are somewhat complex – see, for example, the derivations in [35], [25] – and can lead to critical configurations – see, for instance, [27, Sec. 3.2].
In this article, we propose a different approach. We do
not aim to obtain a polynomial equation as a function of
a single variable. Instead, we express the solutions as the
intersection between three quadrics, where the number of
solutions will be up to eight, as with the previous approaches.
There are, however, methods to express the intersection of
three quadrics as a problem of finding the roots of an eighth
degree polynomial equation. In [12, Theorem 1], Guo shows a procedure to generate, in closed form, the coefficients of an eighth degree polynomial equation from a general problem of finding the points where quadrics intersect. The main results of our approach, in terms
of problem parametrization, are that it does not require any
transformation of the dataset and it is based on a very simple
and intuitive four-step algorithm.
Note, too, that no comparison between previous methods
has yet been made.
C. Background and Notation

By default, bold capital letters represent matrices (e.g. A ∈ R^(n×m), with n rows and m columns), bold small letters represent vectors (e.g. a ∈ R^n, with n elements), and small letters represent one-dimensional elements (e.g. a). By default, a vector is a column vector.

Let us consider U ∈ R^(n×m) and V ∈ R^(k×l) and the equation

    U X V^T = C                                            (1)

where X ∈ R^(m×l) is the unknown. Using the Kronecker product, this equation can be rewritten as

    (V ⊗ U) vec(X) = vec(C)                                (2)

where ⊗ represents the Kronecker product, with (V ⊗ U) ∈ R^(kn×lm), and vec(.) is the vector formed by stacking the columns of the relevant matrix.

The hat operator, Â ∈ R^(3×3), denotes the skew-symmetric matrix that linearizes the exterior product, such that a × b = Âb.

In this article we use Plücker coordinates to represent lines, l ≐ (d, m), where the elements of l ∈ R^6 are constrained to ⟨d, m⟩ = 0 (known as the Klein quadric). For more information about Plücker coordinates see [36]. d ∈ R^3 and m ∈ R^3 are respectively the direction and the moment of the line, and for any point x ∈ R^3 incident with the line we can write m = d × x.

The dimensions of the column-space and of the null-space are denoted by rank(.) and nullity(.), respectively.

II. PROPOSED APPROACH

Pose estimation requires the estimation of a rotation matrix R ∈ SO(3) and a translation vector t ∈ R^3 that define the rigid transformation between the world and camera coordinate systems. Since we consider that the imaging device is calibrated, pose is specified by the rigid transformation that satisfies the relationship of incidence between points in the world coordinate system and 3D straight lines represented in the camera coordinate system, Fig. 1. To distinguish between features represented in the world coordinate system and entities in the camera coordinate system, we use the superscripts (W) and (C), respectively.

The rigid transformation between a point in world coordinates and the same point in camera coordinates is given by

    p(C) = R p(W) + t.                                     (3)

Let us consider a plane π(W) ∈ R^4, defined by any three 3D points. Using this assumption and from the definition of the homography map [37], [1], we can rewrite (3) as

    p(C) = ( R + (1/π0) t π^T ) p(W),                      (4)
           └──────────────────┘
                     H

valid for any point that belongs to that plane. The matrix H is called the homography matrix; π0 and π are respectively the distance from the plane to the origin and the unit normal vector to the plane.

The homography matrix can be defined up to a scale factor. However, this matrix only defines a rigid transformation for points belonging to the plane π(W) if it meets the following condition:

Proposition 1. A matrix H belongs to the space of homography matrices H if it verifies the condition

    H ∈ H ≐ { H ∈ R^(3×3) : σ2(H) = 1 }                    (5)

where σ2(H) is the second largest singular value of H.

The proof of this proposition is a simple extension of the proof found in [37, Lemma 5.18].
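Proposition 1 and the line representation above can be checked numerically. The following NumPy sketch (our illustration, not part of the original paper; variable names are ours) builds a homography H = R + (1/π0) t π^T as in (4) from a random pose and plane, and verifies that its second singular value is 1 and that a line moment m = d × x lies on the Klein quadric:

```python
import numpy as np

rng = np.random.default_rng(7)

def rodrigues(axis, angle):
    """Rotation matrix about a (not necessarily unit) axis, Rodrigues' formula."""
    axis = axis / np.linalg.norm(axis)
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

# Random rigid transformation and plane (unit normal pi, distance pi0).
R = rodrigues(rng.normal(size=3), 0.7)
t = rng.normal(size=3)
pi = rng.normal(size=3); pi /= np.linalg.norm(pi)
pi0 = 2.5

# Homography of eq. (4): H = R + (1/pi0) t pi^T.
H = R + np.outer(t, pi) / pi0

# Proposition 1: the second-largest singular value of H equals 1.
print(np.isclose(np.linalg.svd(H, compute_uv=False)[1], 1.0))   # -> True

# Pluecker coordinates (Sec. I-C): for a point x on a line with direction d,
# the moment is m = cross(d, x), and <d, m> = 0 (the Klein quadric).
d = rng.normal(size=3); d /= np.linalg.norm(d)
x = rng.normal(size=3)
m = np.cross(d, x)
print(np.isclose(np.dot(d, m), 0.0))                            # -> True
```

The σ2(H) = 1 property holds for any t: H preserves the norm of every vector orthogonal to π, which pins the middle singular value at 1.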
A. Decomposition of the Homography Matrix

For the decomposition of the homography matrix we use the method described in Section 5.3.3 of [37]. In this section we briefly describe this method and analyze the decomposition used in our approach, in which the coordinates of π(W) are known.

Let us consider the eigen decomposition of H^T H as

    H^T H = V Σ² V^T,                                      (7)

where vi denotes the ith column of V and σi² the ith eigenvalue of H^T H (note that, from Proposition 1, we know that σ2² = 1). Using these assumptions, we define

    u1 = ( √(1 − σ3²) v1 + √(σ1² − 1) v3 ) / √(σ1² − σ3²)  (8)

    u2 = ( √(1 − σ3²) v1 − √(σ1² − 1) v3 ) / √(σ1² − σ3²)  (9)
and the matrices

    U(1) = [ v2   u1   v2 × u1 ]                           (10)
    U(2) = [ v2   u2   v2 × u2 ]                           (11)
    W(1) = [ Hv2   Hu1   Hv2 × Hu1 ]                       (12)
    W(2) = [ Hv2   Hu2   Hv2 × Hu2 ].                      (13)

To conclude, the four possible solutions for the decomposition of the homography matrix are shown in (6):

    Solution 1:                          Solution 2:
      R(1) = W(1) U(1)^T                   R(2) = W(2) U(2)^T
      π(1) = v2 × u1                       π(2) = v2 × u2
      (1/d) t(1) = (H − R(1)) π(1)         (1/d) t(2) = (H − R(2)) π(2)

    Solution 3:                          Solution 4:
      R(3) = R(1)                          R(4) = R(2)
      π(3) = −π(1)                         π(4) = −π(2)
      (1/d) t(3) = −(1/d) t(1)             (1/d) t(4) = −(1/d) t(2)     (6)

Usually, the homography matrix is used to estimate the relative pose between two perspective images when a planar pattern is viewed by both cameras. In that case, the decomposition of H into a rotation matrix and a translation vector can yield up to four solutions [37]. However, in the problem addressed in this paper, we assume that the 3D coordinates of the points are known in one of the coordinate systems, which means that the plane coordinates are known in that coordinate system. For this problem, it is possible to state the following theorem.

Theorem 1. Given a homography matrix H ∈ H and a plane π(W) of known coordinates, the decomposition of H into rotation and translation has a unique solution.

The proof of this theorem is given in the Supplementary Material [38].
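This decomposition can be implemented directly. The sketch below (our Python illustration of (6)–(13), following Section 5.3.3 of [37]; function and variable names are ours) returns the four candidate triplets (R, t/d, π) for a homography with σ2(H) = 1; by Theorem 1, knowledge of the plane then singles out one of them:

```python
import numpy as np

def decompose_homography(H):
    """Four candidate (R, t/d, n) with H = R + (t/d) n^T; assumes sigma_2(H) = 1."""
    w, V = np.linalg.eigh(H.T @ H)              # eigenvalues of H^T H, ascending
    s3, _, s1 = w                               # sigma3^2 <= sigma2^2 = 1 <= sigma1^2
    v3, v2, v1 = V[:, 0], V[:, 1], V[:, 2]
    den = np.sqrt(max(s1 - s3, 1e-15))          # degenerate if H is a pure rotation
    ca = np.sqrt(max(1.0 - s3, 0.0))            # clamp tiny negatives (round-off)
    cb = np.sqrt(max(s1 - 1.0, 0.0))
    u1 = (ca * v1 + cb * v3) / den              # eq. (8)
    u2 = (ca * v1 - cb * v3) / den              # eq. (9)
    sols = []
    for u in (u1, u2):
        U = np.column_stack([v2, u, np.cross(v2, u)])            # eqs. (10)-(11)
        W = np.column_stack([H @ v2, H @ u,
                             np.cross(H @ v2, H @ u)])           # eqs. (12)-(13)
        R = W @ U.T
        n = np.cross(v2, u)
        t = (H - R) @ n                         # equals t/d for this candidate
        sols.extend([(R, t, n), (R, -t, -n)])   # the sign pair, eq. (6)
    return sols

# Check on a synthetic homography H = R + (t/d) n^T.
rng = np.random.default_rng(3)
axis = rng.normal(size=3); axis /= np.linalg.norm(axis)
K = np.array([[0, -axis[2], axis[1]], [axis[2], 0, -axis[0]], [-axis[1], axis[0], 0]])
R_true = np.eye(3) + np.sin(0.5) * K + (1 - np.cos(0.5)) * (K @ K)
n_true = rng.normal(size=3); n_true /= np.linalg.norm(n_true)
td_true = rng.normal(size=3)                    # t/d
H = R_true + np.outer(td_true, n_true)

sols = decompose_homography(H)
# Every candidate reconstructs H; only one matches the known plane normal.
print(all(np.allclose(R + np.outer(t, n), H, atol=1e-8) for R, t, n in sols))  # -> True
print(min(np.linalg.norm(n - n_true) for _, _, n in sols) < 1e-6)              # -> True
```

All four candidates reconstruct H exactly; checking the recovered normal against the known plane π(W) identifies the unique physically consistent one, as Theorem 1 guarantees.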
B. Formalization of the problem

As mentioned in Sec. I-A, the camera is deemed to have been previously calibrated. Therefore, for each pixel the corresponding 3D line (in the camera coordinate system) is known. A 3D line can be represented in Plücker coordinates, Sec. I-C. Using this representation and from [36], a point that is incident on a line satisfies the relationship

    d(C) × p(C) = D̂(C) p(C) = m(C).                        (14)

We wish to determine the relationship between the points in the world coordinate system and the lines in the camera coordinate system. Thus, for the plane π(W), defined by the three world points, we can write

    D̂(C) H p(W) = m(C).                                    (15)

The aim of this approach is to estimate the pose based on the homography matrix. Thus, let us consider the linearization of the unknown matrix H in (15), using the Kronecker product, Sec. I-C,

    ( p(W)^T ⊗ D̂(C) ) vec(H) = m(C).                       (16)

The minimal problem, in the case of the camera pose, corresponds to the determination of the mapping between three world points (represented in the world coordinate system) and their corresponding 3D lines (represented in the camera coordinate system). Therefore, using the representation of (16), we can define the following algebraic relation

    [ p1(W)^T ⊗ D̂1(C) ]             [ m1(C) ]
    [ p2(W)^T ⊗ D̂2(C) ]  vec(H)  =  [ m2(C) ].             (17)
    [ p3(W)^T ⊗ D̂3(C) ]             [ m3(C) ]
    └─────────────────┘             └───────┘
             M                          w

The column-space of the matrix M is important for the estimation of the unknown parameters. Let us therefore consider the following theorem.

Theorem 2. Consider a set of three points defined in the world coordinate system and their corresponding lines in the camera coordinate system. If the three points define a plane that does not pass through the origin, the dimension of the column-space of M in (17) will be rank(M) = 6.

The proof of this theorem is given in the Supplementary Material [38]. Even if the plane contains the origin, a simple change of coordinate system ensures that the conditions of the theorem hold.

The goal of the method is to compute the parameters of H. As a result, let us consider the following derivation

    [ M   −w ] ξ = 0,   where   ξ = [ vec(H) ].            (18)
    └────────┘                      [    1   ]
         N
From Theorem 2, it follows that nullity(N) = 4. Let us represent the null-space of N as

    null(N) ≐ { a e(1) + b e(2) + c e(3) + d e(4) : a, b, c, d ∈ R },    (19)

where the e(i) are the vectors defining its basis. Note that ξ has four degrees of freedom. However, from (18), we see that the tenth element of the vector ξ must be ξ10 = 1. Without loss of generality, let us consider e10(i) = 1, for all i. Using these definitions, one can see that ξ10 = 1 is satisfied for ξ ∈ R, such that

    R ≐ { a f(1) + b f(2) + c f(3) + f(4) : a, b, c ∈ R },               (20)
and

    f(1) = e(1) − e(4)                                     (21)
    f(2) = e(2) − e(4)                                     (22)
    f(3) = e(3) − e(4)                                     (23)
    f(4) = e(4).                                           (24)

Note that in this representation the space has only three degrees of freedom.
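The construction (17)–(24) can be verified numerically. In the sketch below (our illustration; variable names are ours), a generic non-central camera is simulated by drawing, for each world point, a random ray through its transformed position; we then build N, check rank(N) = 6, form the basis f(i), and check that ξ = (vec(H), 1) for the true homography lies in the null-space:

```python
import numpy as np

rng = np.random.default_rng(11)

def rodrigues(axis, angle):
    axis = axis / np.linalg.norm(axis)
    K = np.array([[0, -axis[2], axis[1]],
                  [axis[2], 0, -axis[0]],
                  [-axis[1], axis[0], 0]])
    return np.eye(3) + np.sin(angle) * K + (1 - np.cos(angle)) * (K @ K)

def hat(v):
    """Skew-symmetric matrix such that hat(v) @ x = cross(v, x)."""
    return np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]])

# Ground-truth pose and three generic (non-collinear) world points.
R, t = rodrigues(rng.normal(size=3), 0.9), rng.normal(size=3)
P_w = rng.normal(size=(3, 3))                 # rows: p1(W), p2(W), p3(W)

# One 3D line per point, known in the camera frame: a generic non-central
# camera is simulated by a random ray through each p(C) = R p(W) + t.
rows, rhs = [], []
for p in P_w:
    d = rng.normal(size=3); d /= np.linalg.norm(d)
    rows.append(np.kron(p, hat(d)))           # p(W)^T (x) D^, as in (17)
    rhs.append(np.cross(d, R @ p + t))        # moment m = d x p(C), eq. (14)
M = np.vstack(rows)                           # 9 x 9
w = np.concatenate(rhs)
N = np.hstack([M, -w[:, None]])               # N = [M  -w], eq. (18)
print(np.linalg.matrix_rank(N))               # -> 6 (Theorem 2)

# Null-space basis e(1)..e(4), normalized so e10(i) = 1, and f(i) of (21)-(24).
E = np.linalg.svd(N)[2][6:].T
E = E / E[9]
f = [E[:, i] - E[:, 3] for i in range(3)] + [E[:, 3]]

# Any combination a f(1) + b f(2) + c f(3) + f(4) has tenth element 1, as in (20).
comb = 0.3 * f[0] - 1.2 * f[1] + 0.7 * f[2] + f[3]
print(np.isclose(comb[9], 1.0))               # -> True

# The true xi = (vec(H), 1), with H from eq. (4), satisfies N xi = 0.
pi_n = np.cross(P_w[1] - P_w[0], P_w[2] - P_w[0]); pi_n /= np.linalg.norm(pi_n)
H_true = R + np.outer(t, pi_n) / (pi_n @ P_w[0])
xi = np.concatenate([H_true.T.flatten(), [1.0]])   # vec() stacks columns
print(np.allclose(N @ xi, 0.0, atol=1e-9))    # -> True
```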
Let us consider the estimation of the homography matrix represented as H. Using the result of (20), we can see that the homography matrix can be represented as

    H = a F(1) + b F(2) + c F(3) + F(4),                   (25)

where the F(i) ∈ R^(3×3) are obtained by un-stacking the vectors f(i), and a, b and c are the unknowns.
The estimated homography matrix must represent a rigid-body transformation for points at π(W), which means that the distance between any two points that belong to that plane must be preserved. Formally, since pi(C) = H pi(W),

    ‖ pi(W) − pj(W) ‖ = ‖ pi(C) − pj(C) ‖ = ‖ H pi(W) − H pj(W) ‖.    (26)

Using this result and considering qi,j = pi(W) − pj(W) for the three world points, one can define three constraints that the estimated homography matrix must satisfy:

    q1,2^T q1,2 = q1,2^T H^T H q1,2
    q1,3^T q1,3 = q1,3^T H^T H q1,3                        (27)
    q2,3^T q2,3 = q2,3^T H^T H q2,3.

Using (25), we can rewrite these three constraints as a function of the unknowns as

    f1,2(a, b, c) = f1,3(a, b, c) = f2,3(a, b, c) = 0      (28)
where each fi,j(a, b, c) is such that

    fi,j(a, b, c) = k1(i,j) a² + k2(i,j) b² + k3(i,j) c² + k4(i,j) ab
                    + k5(i,j) ac + k6(i,j) bc + k7(i,j) a + k8(i,j) b
                    + k9(i,j) c + k10(i,j).                (29)

The coefficients kn(i,j), for n = 1, . . . , 10, are
shown in (30). Note that each of these functions represents a
quadric. As a result, the solutions for vector (a, b, c) that verify
the system of (28), can be seen as corresponding to the points
where the three quadrics intersect. A method to estimate the
intersection of quadrics is described in Sec. III.
A necessary condition for the decomposition of the homography matrix described in Sec. II-A is that the estimated matrices satisfy H ∈ H. This is the same as proving that H satisfies the conditions of a rigid transformation for any point that belongs to the plane π(W). Therefore, this requirement must be met to ensure that the estimates satisfy R ∈ SO(3) and t ∈ R^3.

From the following proposition, all the solutions for H that are obtained from the intersection of the three quadrics, (27) and (28), satisfy the condition H ∈ H.

Proposition 2. A matrix H ∈ R^(3×3) that satisfies (27) and (28) defines a rigid transformation for any point that belongs to π(W) (the plane defined by the three points pi(W)) and, as a result, also satisfies Proposition 1.

The proof of this proposition is presented in the Supplementary Material [38]. This proposition shows that a homography defines a rigid transformation of a plane if it preserves the distances between any three points of the plane.

Note that, from Theorem 1, each solution estimated for H ∈ H will generate a single solution for R ∈ SO(3) and t ∈ R^3.
C. The Case of a Central Camera

Let us consider the case of a central/perspective camera. Without loss of generality, we take the origin of the camera coordinate system to be the center of projection, i.e., the point where all the rays intersect. Since all the rays pass through the origin, the moments of the 3D lines will be zero and we can write w = 0. In this case, from the algebraic relationship of (17), the solutions for the unknown vector vec(H) will belong to the null-space of M. From Theorem 2, we have nullity(M) = 3, which means that we can define vec(H) ∈ null(M), such that

    null(M) ≐ { a f(1) + b f(2) + c f(3) : a, b, c ∈ R },  (31)

where the f(i) are the vectors corresponding to the basis of the null-space. As in the general case, a space of solutions for H can be obtained (in this case from the null-space of M) using (25), by setting f(4) = 0.
Since H must represent a rigid transformation for points that belong to the plane π(W), the constraints defined in (27) must be enforced. As in the general case, each of these equations will correspond to a quadric surface fi,j(a, b, c), where

    fi,j(a, b, c) = k1(i,j) a² + k2(i,j) b² + k3(i,j) c² + k4(i,j) ab
                    + k5(i,j) ac + k6(i,j) bc + k10(i,j)   (32)

and the parameters km(i,j) are given by (30), by setting F(4) = 0. Note that in this case Proposition 2 is also satisfied and, as a result, H ∈ H. This means that, for each estimate H ∈ H, we have a single R ∈ SO(3) and t ∈ R^3.

Note that in the case of a minimal pose problem for a central camera, we have up to four solutions [9]. Let us consider the solutions for vec(H) as ui, for i = 1, . . . , n, where n ≤ 8 is the number of possible solutions. It is straightforward to verify that the solutions can be grouped into pairs {ul, um}, where um = −ul (if we have a solution ul that satisfies M ul = 0, then um = −ul also satisfies M um = 0). As a result, if we consider only solutions that correspond to points in front of the camera (a constraint usually imposed in these cases [9]), one of the solutions from each pair can be eliminated, and then we will have up to four solutions, too.
III. SOLVING FOR THE INTERSECTION OF QUADRICS
The most important step in previous algorithms was the estimation of the roots of an eighth degree polynomial equation.
There is no analytical solution for the estimation of these roots
[26], [27]. However, there are several numerical methods that
can be used to estimate these solutions. Nistér & Stewénius
and Chen & Chang proposed the use of the companion matrix
[39], [24], [25]. According to Chen and Chang, this procedure
is suitable to solve the proposed problem when considering
criteria of speed and accuracy. With this method, the solutions
for the roots are given by the eigenvalues of the companion
matrix.
For the algorithm proposed in this article, the solutions for the pose problem are obtained by estimating the intersection points of three quadrics. As mentioned above, Guo in [12] describes a method to derive the intersection of three generic quadrics by solving for the roots of an eighth degree polynomial equation – the coefficients of this polynomial equation can be given in closed form. But other solvers can be used to
solve the intersection of the quadrics. The usual methods to estimate these intersection points are based on the use of Gröbner bases [40], [41] and hidden variable techniques [42]. More recently, a polynomial eigenvalue method [39] for solving minimal problems in computer vision has been presented [43]. As claimed by Kukelova et al. [43], these approaches are accurate, simple, fast and easy to implement. As a result, in this paper, we derive a polynomial eigenvalue solution to estimate the intersection points of the quadrics. We now describe the proposed derivation.

Let us consider the system of three equations represented in (28). If we choose a as the eigenvalue, the polynomial eigenvalue problem has the form

    ( C2 a² + C1 a + C0 ) v = 0.                           (33)

Polynomial eigenvalue solvers require the matrices Ci to be square. Note that, since the original problem has only three equations and the size of the vector v is six, we cannot get square matrices Ci directly. We have to generate new equations as polynomial combinations of the original ones (resultant-based methods) to ensure that the coefficient matrices Ci are square.

After the generation of the new polynomial equations, we get matrices C0, C1, C2 ∈ R^(15×15) from (35) and (34), and v ∈ R^15 is such that

    v = ( b⁴, b³c, bc³, b²c², c⁴, b³, b²c, bc², c³, b², bc, c², b, c, 1 ).    (36)

Since the matrix C0 will be non-singular, the polynomial eigenvalue problem of (33) can be rewritten as a standard eigen decomposition problem C ν = ā ν, where ā = 1/a and the matrix C ∈ R^(30×30) and vector ν ∈ R^30 are

    C = [     0            I      ],     ν = [  v  ].      (37)
        [ −C0⁻¹ C2     −C0⁻¹ C1   ]          [ ā v ]

To conclude, the solution for the unknowns (a, b, c) that satisfies (28) can be given by the eigen decomposition of the matrix C, where a is given by the inverse of the eigenvalue of C, and the corresponding b and c are given by the thirteenth and fourteenth elements of the corresponding eigenvector.

Note that, given the size of the matrix C, there will be thirty eigenvalues. However, from (34), we can see that sixteen columns of the matrices −C0⁻¹ C2 and −C0⁻¹ C1 are equal to zero. These columns generate zero eigenvalues, which cannot be used since the value of a is the inverse of each eigenvalue. Thus, we can eliminate these columns and the corresponding rows, resulting in a 14 × 14 matrix. Additional zero and complex eigenvalues are also generated and have to be eliminated, which will yield up to eight real solutions.

The coefficients kn(i,j) of the quadrics, used in (29) and (32), are given by

    k1(i,j)  = qi,j^T F(1)^T F(1) qi,j
    k2(i,j)  = qi,j^T F(2)^T F(2) qi,j
    k3(i,j)  = qi,j^T F(3)^T F(3) qi,j
    k4(i,j)  = qi,j^T ( F(1)^T F(2) + F(2)^T F(1) ) qi,j
    k5(i,j)  = qi,j^T ( F(1)^T F(3) + F(3)^T F(1) ) qi,j
    k6(i,j)  = qi,j^T ( F(2)^T F(3) + F(3)^T F(2) ) qi,j   (30)
    k7(i,j)  = qi,j^T ( F(1)^T F(4) + F(4)^T F(1) ) qi,j
    k8(i,j)  = qi,j^T ( F(2)^T F(4) + F(4)^T F(2) ) qi,j
    k9(i,j)  = qi,j^T ( F(3)^T F(4) + F(4)^T F(3) ) qi,j
    k10(i,j) = qi,j^T ( F(4)^T F(4) − I ) qi,j.
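The linearization (37) can be illustrated on a generic quadratic polynomial eigenvalue problem. In the sketch below (our illustration; small random square matrices Ci stand in for the 15 × 15 matrices of (34)–(35), which are not reproduced here), for each eigenvalue ā of C we check that a = 1/ā solves (33):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 4                                        # small generic example (15 in the paper)
C0, C1, C2 = (rng.normal(size=(n, n)) for _ in range(3))

# Companion linearization of (C2 a^2 + C1 a + C0) v = 0, as in eq. (37),
# with eigenvalue abar = 1/a (a random C0 is non-singular with probability 1).
C0inv = np.linalg.inv(C0)
C = np.block([[np.zeros((n, n)), np.eye(n)],
              [-C0inv @ C2, -C0inv @ C1]])

abar, nu = np.linalg.eig(C)
ok = []
for lam, vec in zip(abar, nu.T):
    if abs(lam) < 1e-10:
        continue                             # zero eigenvalues carry no solution
    a = 1.0 / lam                            # a is the inverse of the eigenvalue
    v = vec[:n]                              # top half of nu is v, bottom is abar*v
    residual = (C2 * a**2 + C1 * a + C0) @ v
    ok.append(np.linalg.norm(residual) < 1e-6 * max(1.0, np.linalg.norm(v)))
print(all(ok) and len(ok) == 2 * n)          # -> True
```

In the paper's setting the eigenvector is additionally structured as in (36), so b and c can be read off its thirteenth and fourteenth entries after normalizing the last entry to 1.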
IV. ALGORITHM OUTLINE
We will now give an overview of the proposed method. The
algorithm can be decomposed into the following four steps:
1) compute the null-space of matrix N at (18), which is
obtained from the set of correspondences between world
points and 3D lines from the general imaging model;
2) using the basis of the estimated null-space, compute the
coefficients of the quadric surfaces using (30); using
these coefficients get matrices C0 , C1 , C2 from (35)
and (34), and use these matrices to compute matrix C,
as shown in (37);
3) compute the eigen decomposition of matrix C, and, for
each eigenvalue, set a equal to the inverse of the eigenvalue and b and c equal respectively to the thirteenth and
fourteenth elements of the corresponding eigenvector;
4) for each set of (a, b, c), compute the estimates for the
homography matrix using (25); and decompose each
estimate using the method described in Sec. II-A.
We again note that the third step – which corresponds to
the solver for the quadrics intersection proposed at Sec. III
– can be replaced by the solution described by Guo in [12,
Theorem 1]. The method proposed in Sec. III is much more
intuitive and easy to implement than the method described in
Guo’s article, which derives an eighth degree polynomial equation for the problem. However, as we describe in Sec. V-A3,
finding the roots of an eighth degree polynomial equation can solve the intersection of the quadrics faster.
The algorithm proposed in this paper is implemented in
MATLAB and will be available on the author's page.
V. EXPERIMENTAL RESULTS
In this section we evaluate the proposed method against the
state-of-the-art approaches, using synthetic data. In addition,
we validate our method by testing it using real data. We
conclude by analyzing the experimental results.
A. Experiments with Synthetic Data
The important parameters to evaluate the solutions of minimal problems are numerical accuracy, computation time, and
the number of possible solutions given by the algorithms.
As noted by Nistér and Stewénius [24], [25], all minimal
solutions will behave in much the same way with respect to
noise, which means that tests of this type are less relevant.
We compare the algorithm proposed in this article with the
methods proposed by Nistér and Stewénius at [24], [25] and
by Chen and Chang at [26], [27]. Afterwards, we describe the
generation of the dataset used to test the minimal three-point
pose problem with the three algorithms.
[Equation (34): the sparse matrices $\mathbf{C}_1^{(6:15)\,\dagger}$ and $\mathbf{C}_2^{(10:15)\,\dagger}$, whose nonzero entries are the coefficients $k_1^{(i,j)}$, $k_4^{(i,j)}$, $k_5^{(i,j)}$, and $k_7^{(i,j)}$ of the quadric pairs $(i,j) \in \{(1,2), (1,3), (2,3)\}$.]   (34)
[Equation (35): the sparse matrix $\mathbf{C}_0$, whose nonzero entries are the coefficients $k_2^{(i,j)}$, $k_3^{(i,j)}$, $k_6^{(i,j)}$, $k_8^{(i,j)}$, $k_9^{(i,j)}$, and $k_{10}^{(i,j)}$ of the quadric pairs $(i,j) \in \{(1,2), (1,3), (2,3)\}$.]   (35)
† – The superscripts at $\mathbf{C}_1^{(6:15)}$ and $\mathbf{C}_2^{(10:15)}$ represent the columns of the respective matrices. The disregarded columns have only zero elements.
TABLE I
In this table, we show the median and mode for each curve of Fig. 2, which represent the numerical errors for rotation, translation, and distance between points.

| Numerical Errors:       | Our Method              | Nistér and Stewénius    | Chen and Chang          |
|                         | median     | mode       | median     | mode       | median     | mode       |
| Rotation                | 1.4×10⁻¹³  | 1.9×10⁻¹⁴  | 9.5×10⁻¹²  | 8.3×10⁻¹³  | 1.2×10⁻¹²  | 8.3×10⁻¹³  |
| Translation             | 2.3×10⁻¹¹  | 3.7×10⁻¹²  | 1.5×10⁻⁹   | 7.5×10⁻¹¹  | 1.7×10⁻¹⁰  | 3.5×10⁻¹¹  |
| Distance between points | 9.4×10⁻¹²  | 8.3×10⁻¹³  | 3.8×10⁻¹⁰  | 7.5×10⁻¹¹  | 3.6×10⁻¹¹  | 1.7×10⁻¹¹  |

We used synthetic datasets for these experiments. For that purpose, consider a cube with a side length of 500 units. The data were generated by randomly mapping 3D lines and points $\mathbf{l}_i^{(C)} \leftrightarrow \mathbf{p}_i^{(C)}$, for i = 1, 2, 3. A random rigid transformation ($\mathbf{R} \in SO(3)$ and $\mathbf{t} \in \mathbb{R}^3$) was generated and applied to the set of points, such that $\mathbf{p}_i^{(C)} \mapsto \mathbf{p}_i^{(W)}$. For the data generated, and using the mappings $\mathbf{l}_i^{(C)} \leftrightarrow \mathbf{p}_i^{(W)}$, for i = 1, 2, 3, the pose is estimated using the corresponding algorithms.
This process is repeated for 10⁵ trials, with a new pose and data being randomly generated for each trial.
1) Numerical Errors: The characterization of the numerical error is one of the main parameters used to evaluate minimal three-point pose algorithms. In Fig. 2, we show the results for the three algorithms that are compared in this article.
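The synthetic-data generation described above can be sketched as follows; this is an illustrative NumPy fragment (the helper `random_rotation` and the QR-based sampling are ours, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(1)

def random_rotation():
    """Random R in SO(3) from the QR decomposition of a Gaussian matrix."""
    Q, R = np.linalg.qr(rng.standard_normal((3, 3)))
    Q *= np.sign(np.diag(R))      # fix column signs for a proper sample
    if np.linalg.det(Q) < 0:      # ensure det(Q) = +1
        Q[:, 2] *= -1
    return Q

# Three random points inside a cube of side 500, as in the experiments
p_cam = rng.uniform(-250.0, 250.0, size=(3, 3))

# Apply a random rigid transformation: p^(W) = R p^(C) + t (row-vector form)
R, t = random_rotation(), rng.uniform(-250.0, 250.0, size=3)
p_world = p_cam @ R.T + t
```

Each trial draws a fresh (R, t) and fresh points, and the mappings between the camera-frame lines and the transformed points are fed to the pose solvers.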
We consider three different metrics for the numerical errors:
1) rotation error;
2) translation error;
3) average of the distance error between the estimated
and ground-truth world points in the camera coordinate
system.
Next, we define the errors in detail. For each of the 10⁵ trials we computed the estimated poses using the algorithms
mentioned. From the set of possible solutions, we eliminated
those that do not fit the problem, using the distance from the
estimated solutions to the ground-truth.
Fig. 2. In this figure, we show the distribution of the numerical error in the computation of the rotation parameters (a) (distance between the estimated and ground-truth angles); in the computation of the translation vector (b) (distance between the estimated and ground-truth translation vectors); and in terms of the distance between the ground-truth and estimated points in the camera coordinate system (c).
For each trial we computed
$$
d_{\text{rotation}} = \left\| (\hat{\alpha}, \hat{\beta}, \hat{\gamma}) - (\alpha, \beta, \gamma) \right\|, \tag{38}
$$
where $(\hat{\alpha}, \hat{\beta}, \hat{\gamma})$ and $(\alpha, \beta, \gamma)$ are the rotation angles of the estimated and ground-truth solutions, respectively. The error is the usual Euclidean error defined between two vectors ($L_2$ metric). The frequency distribution of the error values is shown in Fig. 2(a).
The numerical error for translation was computed using
$$
d_{\text{translation}} = \left\| \hat{\mathbf{t}} - \mathbf{t} \right\|, \tag{39}
$$
where $\hat{\mathbf{t}}$ and $\mathbf{t}$ are the estimated and ground-truth translation vectors, respectively. The distribution of the numerical errors of translation is shown in Fig. 2(b).
We also computed an error measure related to the estimation of the coordinates of the world points in the camera coordinate system. Since the ground-truth $\mathbf{p}^{(C)}$ is known, we computed the error as
$$
d_{\text{points}} = \frac{1}{3} \sum_{i=1}^{3} \left\| \mathbf{p}_i^{(C)} - \hat{\mathbf{p}}_i^{(C)} \right\|, \quad \text{where } \hat{\mathbf{p}}_i^{(C)} = \hat{\mathbf{R}}\, \mathbf{p}_i^{(W)} + \hat{\mathbf{t}}, \tag{40}
$$
where $\hat{\mathbf{R}}$ and $\hat{\mathbf{t}}$ are the estimates for rotation and translation, respectively. The frequency distribution of the numerical errors is shown in Fig. 2(c).
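The three error metrics of (38)–(40) are straightforward to compute; the sketch below is an illustrative NumPy helper (the function name and argument layout are ours):

```python
import numpy as np

def pose_errors(angles_est, angles_gt, t_est, t_gt,
                R_est, p_world, p_cam_gt):
    """The three metrics of (38)-(40): L2 distance between the
    rotation-angle vectors, between the translation vectors, and the
    mean distance between the ground-truth camera-frame points and the
    re-projected world points."""
    d_rot = np.linalg.norm(np.asarray(angles_est) - np.asarray(angles_gt))
    d_trans = np.linalg.norm(np.asarray(t_est) - np.asarray(t_gt))
    p_cam_est = p_world @ np.asarray(R_est).T + np.asarray(t_est)
    d_points = np.mean(np.linalg.norm(p_cam_est - p_cam_gt, axis=1))
    return d_rot, d_trans, d_points

# A perfect estimate gives all three errors equal to zero
P = np.arange(9.0).reshape(3, 3)
t = np.array([1.0, 2.0, 3.0])
errs = pose_errors((0.1, 0.2, 0.3), (0.1, 0.2, 0.3),
                   t, t, np.eye(3), P, P + t)
```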
TABLE II
In this table we show the averages of the number of solutions generated by each algorithm, without and with the application of the orientation constraint.
We also computed the median and mode for each curve.
The results are shown in Tab. I.
2) Number of Solutions: Note that in general, minimal
problems in computer vision do not generate single solutions.
As a result, comparing the different methods in terms of the
number of solutions is important. In this section we analyze
the distribution of the number of solutions for the 10⁵ trials
described above.
Wrong solutions are eliminated by assuming the constraint that “the camera looks ahead”. Assuming that the line is represented by a point $\mathbf{x}^{(C)}$ and its direction $\mathbf{d}^{(C)}$, this constraint can be formalized by the inequality $\mathbf{d}_i^{(C)T} \left( \mathbf{p}_i^{(C)} - \mathbf{x}_i^{(C)} \right) > 0$, for i = 1, 2, 3.
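A minimal sketch of this orientation test in NumPy (the helper name is illustrative, not the paper's):

```python
import numpy as np

def looks_ahead(d_cam, x_cam, p_cam):
    """Orientation ("camera looks ahead") constraint: each estimated
    point must lie along the positive direction of its projection line,
    i.e. d_i . (p_i - x_i) > 0 for every i."""
    d = np.asarray(d_cam)
    x = np.asarray(x_cam)
    p = np.asarray(p_cam)
    return bool(np.all(np.sum(d * (p - x), axis=1) > 0.0))

# Points one unit along each line direction satisfy the constraint;
# points one unit against it do not.
d = np.eye(3)              # three line directions
x = np.zeros((3, 3))       # points on the lines
ok = looks_ahead(d, x, x + d)
bad = looks_ahead(d, x, x - d)
```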
We computed the number of solutions for all the 10⁵ trials
and the results are shown in Figs. 3(a) and 3(b), before and
after imposing the orientation constraint, respectively.
We also computed the averages of the number of solutions
generated by the three algorithms considered, with and without
the imposition of the orientation constraint. The results are
presented in Tab. II.
3) Computational Effort: We implemented the algorithms
in MATLAB using an Intel Core i7-3930K with a 3.2 GHz
processor. All the computation times given in the paper were
estimated as the median of the respective computation times,
for all the 10⁵ trials. All the proposed approaches require
iterative steps to solve the problem. When compared to the
iterative steps, the computation time spent on simple analytical
operations is negligible. As a result, for the evaluation we
only considered the steps that require the most significant
Fig. 3. In this figure we show the distribution of the number of solutions
for the three algorithms. Figures (a) and (b) display the number of solutions
before and after the application of the orientation constraint respectively.
| Mean:                  | Orientation Constraint |       |
|                        | Without                | With  |
| Our Method             | 3.686                  | 2.909 |
| Nistér and Stewénius   | 3.691                  | 2.912 |
| Chen and Chang         | 3.695                  | 2.915 |
Fig. 4. For results with real data, we use a catadioptric system with quadric mirrors. Seventeen images of 3D planes were acquired (c). An example of data
used for pose estimation is displayed in Figs. (a) and (b). Using only three points in the image and their corresponding points in the chess-board coordinate
system (red coordinate system in (a)), rotation and translation (corresponding to the rigid transformation between the chess-board and camera coordinate
system) are estimated. In (b) the camera coordinate system is shown in black and the estimated chess-board coordinate system is shown in red. In Fig. (c)
the metric units are in millimeters.
computational effort, i.e., the iterative steps.
The main computation step for all the algorithms was the
computation of the minimal solver, which may correspond
to finding the roots of an eighth degree polynomial equation
(other simple solvers can be used for the method proposed
in this paper). As suggested by both Nistér & Stewénius and
Chen & Chang, we used the companion matrix method to
solve the roots of the eighth degree polynomial equations. This
method amounts to performing an eigen decomposition of an
8 × 8 matrix, which takes 43µs.
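The companion matrix method can be sketched directly: the roots of a degree-n polynomial are the eigenvalues of its n × n companion matrix. Below is an illustrative NumPy version checked against a degree-eight polynomial with known roots (the function name is ours):

```python
import numpy as np

def companion_roots(coeffs):
    """Roots of a polynomial (coefficients in descending powers) via the
    eigenvalues of its companion matrix -- for the minimal solvers this
    is an 8x8 eigen decomposition."""
    c = np.asarray(coeffs, dtype=float)
    c = c / c[0]                      # normalize to a monic polynomial
    n = len(c) - 1
    C = np.zeros((n, n))
    C[1:, :-1] = np.eye(n - 1)        # sub-diagonal of ones
    C[:, -1] = -c[1:][::-1]           # last column: -c_n ... -c_1
    return np.linalg.eig(C)[0]

# Degree-8 check: polynomial with roots 1..8
true_roots = np.arange(1.0, 9.0)
coeffs = np.poly(true_roots)          # descending-power coefficients
roots = np.sort(companion_roots(coeffs).real)
```

This is essentially what `numpy.roots` (and the MATLAB `roots` function) does internally.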
For our parametrization, we had to consider the estimation of the null-space of the matrix N, at (18). Iterative singular value decomposition methods would add around 40µs to the computation time. However, there are techniques that can be used to derive general analytical solutions for the singular value decomposition of matrix N. For instance, we
can symbolically produce the reduced row echelon form of
matrix N and use this derivation to compute the null-space.
We also derived an analytical solution using the geometric
properties of the cross and dot products and considering the
rows of matrices M and N from (17) and (18). As a result,
it is possible to compute the null-space in closed-form, using
very simple analytical operations.
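For reference, the generic (iterative) null-space computation that the analytical derivation replaces can be written in a few lines of NumPy: the null-space basis consists of the right singular vectors whose singular values are numerically zero. The example matrix and tolerance below are illustrative:

```python
import numpy as np

def null_space(N, rtol=1e-10):
    """Basis for the right null-space of N from its SVD: the right
    singular vectors associated with (numerically) zero singular values."""
    U, s, Vt = np.linalg.svd(N)
    rank = int(np.sum(s > rtol * s[0]))
    return Vt[rank:].T                 # columns span the null-space

# Rank-2 example: a 4x4 matrix with a two-dimensional null-space
N = np.array([[1., 0., 0., 0.],
              [0., 1., 0., 0.],
              [1., 1., 0., 0.],
              [2., 1., 0., 0.]])
B = null_space(N)                      # 4x2 basis, N @ B = 0
```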
Both our approach and Chen & Chang’s method require
additional steps to recover the rotation and translation that
define the pose. Chen & Chang give the coordinates of the
world points in the camera coordinate system and the method
we propose estimates the homography matrix that defines pose.
In both cases, the computation of an eigen decomposition of
a 3 × 3 matrix is required for each valid solution. Note that
this decomposition can be computed in closed form. We now
describe how we can achieve this decomposition.
Let us consider that we want to compute the eigen decomposition of the matrix $\mathbf{H}^T\mathbf{H} \in \mathbb{R}^{3 \times 3}$ from the problem described in (7). The eigenvalues $\sigma_i^2$, which will be the diagonal elements of the matrix $\boldsymbol{\Sigma}^2$ in (7), are computed such that
$$
\det\left( \mathbf{H}^T\mathbf{H} - \sigma^2 \mathbf{I} \right) = 0, \tag{41}
$$
where $\det(.)$ denotes the determinant and $\mathbf{I}$ denotes the 3 × 3 identity matrix. (41) defines a third degree polynomial equation in the unknown $\sigma^2$. As a result, the solutions $\sigma_i^2$ for i = 1, 2, 3 are given by the roots of this equation, which can be computed in closed form simply by applying Cardano's method. To conclude, and since we already have $\sigma_i^2$, the solutions for the columns $\mathbf{v}_i \in \mathbb{R}^3$, for i = 1, 2, 3, that form $\mathbf{V}$ of (7) are given by the equation
$$
\mathbf{H}^T\mathbf{H}\, \mathbf{v}_i = \sigma_i^2 \mathbf{v}_i. \tag{42}
$$
B. Experimental Results with Real Data
To obtain data from real images, we considered a non-central catadioptric imaging device comprising a perspective camera and a quadric mirror. To obtain the data, we used a chess board in seventeen different positions and orientations, Fig. 4(c). The images of the chess board were acquired by the non-central catadioptric camera. To obtain the ground-truth, the 3D positions of the points of the chess board were
estimated in world coordinates using a set-up with perspective
cameras. An example displaying the points in the world and
their corresponding image coordinates is shown in Figs. 4(a)
and 4(b). In both figures blue dots represent image points and
their corresponding world points in ground-truth coordinates.
Sixteen of the seventeen chess board planes (and their
images) were used to calibrate the camera (using [32]).
The resulting calibration provides the coordinates of the 3D
projecting rays for each image pixel. A reference coordinate
system was associated with the chess board plane that was
not used to calibrate the camera. The coordinate system was
defined so that Theorem 2 is satisfied (the plane does not
contain the origin of the coordinate system). Three points of
the chess board plane and their images were then used to
estimate the pose of the plane. The 3D projection line was
estimated in the camera coordinate system, for each of the
three points in the image. Figs. 4(a) and 4(b) show the image
points plus an example of the 3D lines (in green).
The mappings between the three points in the world (chess
board coordinate system) and their projection lines (represented in the camera coordinate system) were used to estimate
pose. In addition, we selected the best solution (in terms of the
errors in the 3D reconstructed points) from the set of possible
TABLE III
In this table, we show the absolute errors for all six pose parameters in all of the seventeen pose estimates. Rotation angles are in degrees. Note that the units for translation differ between the ground truth and the absolute errors: the former is in meters, the latter in millimeters.
Ground Truth
| Plane | αx   | αy     | αz     | tx [m] | ty [m] | tz [m] |
| 1     | 0.86 | -18.68 | 169.21 | -0.37  | 0.06   | 2.10   |
| 2     | 0.99 | -39.83 | 168.92 | -0.43  | 0.09   | 1.91   |
| 3     | 0.89 | -8.89  | 169.30 | -0.41  | 0.06   | 2.08   |
| 4     | 0.94 | 8.04   | 169.46 | -0.32  | 0.06   | 2.11   |
| 5     | 1.57 | -51.95 | 168.14 | -0.38  | 0.12   | 1.76   |
| 6     | 1.60 | -50.09 | 168.08 | -0.32  | 0.13   | 1.72   |
| 7     | 0.97 | 4.18   | 169.50 | -0.34  | 0.02   | 2.30   |
| 8     | 1.26 | 37.49  | 170.08 | -0.12  | -0.01  | 2.48   |
| 9     | 1.34 | -54.10 | 168.32 | -0.51  | 0.08   | 1.97   |
| 10    | 1.28 | -41.00 | 168.50 | -0.33  | 0.10   | 1.90   |
| 11    | 1.79 | -62.64 | 167.90 | -0.30  | 0.12   | 1.75   |
| 12    | 1.95 | -68.73 | 167.74 | -0.53  | 0.13   | 1.70   |
| 13    | 1.95 | -58.60 | 167.68 | -0.29  | 0.15   | 1.63   |
| 14    | 1.35 | -42.33 | 168.38 | -0.35  | 0.13   | 1.73   |
| 15    | 0.98 | 5.81   | 169.55 | -0.45  | -0.04  | 2.63   |
| 16    | 0.96 | -31.92 | 168.95 | -0.47  | 0.04   | 2.16   |
| 17    | 2.12 | -63.70 | 167.52 | -0.42  | 0.06   | 2.09   |
pose estimates. Using the pose estimate thus obtained, the
errors for all six pose parameters were computed.
This procedure was repeated for all 17 planes using a leave-one-out procedure.
We computed the pose parameter’s errors for all the seventeen different configurations. The absolute errors in the
estimated parameters are shown in Tab. III.
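The leave-one-out scheme over the seventeen planes can be sketched as follows; the calibration and pose-estimation routines themselves are hypothetical placeholders and are only named in the comment:

```python
# Leave-one-out over the seventeen chess-board planes: each configuration
# calibrates the camera on sixteen planes (e.g. a hypothetical
# `calibrate(train)`) and estimates pose on the held-out plane
# (e.g. a hypothetical `estimate_pose(held_out)`).
n_planes = 17
folds = [([j for j in range(n_planes) if j != i], i)
         for i in range(n_planes)]
```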
C. Analysis of the Experimental Results
Let us first look at the experiments with synthetic data,
Sec. V-A. In terms of the number of solutions (Fig. 3) with
and without the orientation constraint, the results are similar
for the three methods and so this comparison does not offer
additional information on the relative merits of the algorithms.
This can be further checked by considering the averages of
the number of solutions generated by the three methods that
are presented in Tab. II. Thus, in this sub-section we discuss
the experimental results in terms of numerical accuracy and
processing time.
In terms of numerical accuracy (Fig. 2), it can be seen that the method proposed in this article performed better than the state-of-the-art algorithms in all the tests. The algorithm
proposed by Chen & Chang performs the second best followed
by the method proposed by Nistér & Stewénius. This analysis
can be confirmed by taking into account the data in Tab. I.
This shows that the numerical errors for the method proposed
in this paper are better than state-of-the-art methods, for all
the evaluation parameters. Moreover, note that the mode of the
curves for the results obtained with the previous approaches
give worse results than the median of the method proposed in
this article, for all the numerical accuracy experiments.
In terms of computational time, Sec. V-A3, we saw that
all three algorithms can be implemented with similar computational effort (solving for the roots of an eighth degree
polynomial equation). Nevertheless, both our method and
Absolute Error
| Plane | α̂x     | α̂y     | α̂z     | t̂x [mm] | t̂y [mm] | t̂z [mm] |
| 1     | 0.0053 | 0.2831 | 0.1082 | 1.8810  | 0.4149  | 0.0305  |
| 2     | 0.2139 | 0.6406 | 0.2383 | 1.1964  | 0.3307  | 3.0765  |
| 3     | 0.3361 | 1.2368 | 0.4696 | 5.2738  | 1.6796  | 1.1239  |
| 4     | 0.0072 | 0.0781 | 0.1519 | 0.6116  | 0.8114  | 0.2942  |
| 5     | 0.4777 | 0.2570 | 0.6365 | 0.7986  | 1.4233  | 1.0428  |
| 6     | 1.4092 | 0.3863 | 1.8659 | 1.1793  | 5.3712  | 2.4334  |
| 7     | 0.0027 | 1.1902 | 0.8675 | 6.2205  | 4.8714  | 5.1137  |
| 8     | 0.0507 | 0.0752 | 0.0321 | 1.1442  | 0.6326  | 1.3746  |
| 9     | 0.0292 | 0.0882 | 0.1513 | 0.0811  | 0.3610  | 0.0094  |
| 10    | 0.1598 | 0.1031 | 0.2384 | 0.7491  | 1.0101  | 1.0732  |
| 11    | 0.3740 | 0.0378 | 0.5223 | 0.5351  | 0.5899  | 0.4138  |
| 12    | 0.5593 | 1.5524 | 0.2773 | 9.6657  | 5.4284  | 11.1939 |
| 13    | 4.4823 | 1.6889 | 5.1984 | 7.1865  | 13.5883 | 10.0518 |
| 14    | 0.3031 | 0.9430 | 0.0728 | 2.0597  | 0.7150  | 9.8622  |
| 15    | 0.0538 | 0.1246 | 0.1570 | 0.6553  | 0.7758  | 1.3818  |
| 16    | 0.9328 | 2.9167 | 2.2671 | 2.4659  | 10.1745 | 24.6971 |
| 17    | 3.7783 | 1.3850 | 4.6183 | 3.6914  | 7.4837  | 0.1563  |
Chen & Chang's method may or may not include additional iterative steps. These iterative steps are acceptable when a quick implementation suffices; however, if computation time is critical, the analytical derivations should be used instead.
Now let us look at the experiments with real data,
Sec. V-B. As noted earlier and as mentioned by Nistér and
Stewénius [24], [25], all minimal solutions will behave in the
same way with respect to noise. But to further evaluate the
proposed approach we tested our method using real images.
As we can see from Tab. III, and despite the fact that we are
using minimal data, the results are good. Note that the absolute
errors for the angles are in degrees and that the translation
errors are in millimeters (while the ground truth is in meters).
VI. CONCLUSIONS
In this article we have presented a novel homography-based algorithm for the minimal three-point pose problem for
generalized cameras, where the constraint that the projection
rays must intersect at a single point is not considered. This
method can be used to estimate pose for imaging models where
reflection or refraction occurs. That is the case, for example,
of catadioptric systems made up of arbitrary mirrors, or of a
perspective camera looking at a water environment.
Previous approaches used algebraic techniques that lead to a single-variable eighth degree polynomial, and the pose estimates are given by the roots of that polynomial. In the case
of this paper, the important step is the estimation of the
intersection points of three quadrics (which can also be
represented by the roots of an eighth degree polynomial
equation). The algorithm proposed estimates the pose by
computing four vectors obtained from a null-space and, using
this null-space, we can define the three quadric equations.
The intersection of quadric surfaces occurs in several minimal
problems in computer vision. We have described a polynomial
eigenvalue representation of the intersection of quadrics, where
the solutions are given by an eigen decomposition. Unlike
previous methods, our proposed approach does not require any
transformation of the dataset either, and this is why we call it
the direct solution to the problem.
The algorithm presented in this paper was not intended to
be used where non-minimal data is available; it is intended
to be a simple method to allow the analysis of the solutions
when one only has access to minimal data. This method is
suitable for RANSAC-based approaches for pose estimation.
Also, using this algorithm, the solutions with minimal data
can be easily obtained and then analyzed for the purpose of
comparing non-central cameras.
We compared the proposed algorithm with the state-of-the-art methods and concluded that our algorithm performs
better than them in terms of numerical accuracy, with similar
computational effort and a similar number of solutions. The
validation of the method by testing with real experiments
showed good results.
ACKNOWLEDGEMENTS
This work was supported by the Portuguese
Foundation for Science and Technology, scholarship
SFRH/BD/49054/2008 and by project A “Surgery
and Diagnosis Assisted by Computer Using Images”
(SCT_2011_02_027_4824) funded by the QREN
programme “Mais Centro” with financing from FEDER.
Helder Araujo would also like to thank the support of Project
FCT/PTDC/EIA-EIA/122454/2010, funded by the
Portuguese Science Foundation (FCT) by means of national
funds (PIDDAC) and co-funded by the European Fund
for Regional Development (FEDER) through COMPETE
Operational Programme Competitive Factors (POFC).
REFERENCES
[1] R. Hartley and A. Zisserman, Multiple View Geometry in Computer
Vision. Cambridge University Press, 2000.
[2] M. A. Fischler and R. C. Bolles, “Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography,” Comm. of the ACM, 1981.
[3] D. Nistér, “Preemptive RANSAC for Live Structure and Motion Estimation,” IEEE Proc. Int’l Conf. Computer Vision, 2003.
[4] R. M. Haralick, H. Joo, C.-N. Lee, X. Zhuang, V. G. Vaidya, and M. B.
Kim, “Pose Estimation from Corresponding Point Data,” IEEE Trans.
System, Man and Cybernetics, 1989.
[5] C.-P. Lu, G. D. Hager, and E. Mjolsness, “Fast and Globally Convergent
Pose Estimation from Video Images,” IEEE Trans. Pattern Analysis and
Machine Intelligence, 2000.
[6] F. Moreno-Noguer, V. Lepetit, and P. Fua, “Accurate Non–Iterative O(n) Solution to the PnP Problem,” IEEE Proc. Int'l Conf. Computer Vision, 2007.
[7] J. Hesch and S. Roumeliotis, “A Direct Least-Squares (DLS) Method
for PnP,” IEEE Proc. Int’l Conf. Computer Vision, 2011.
[8] O. Tahri, H. Araujo, Y. Mezouar, and F. Chaumette, “Efficient Iterative
Pose Estimation using an Invariant to Rotations,” IEEE Trans. Cybernetics, 2014.
[9] R. Haralick, C.-N. Lee, K. Ottenberg, and M. Nolle, “Review and
Analysis of Solutions of the Three Point Perspective Pose Estimation
Problem,” Int’l J. Computer Vision, 1994.
[10] X.-S. Gao, X.-R. Hou, J. Tang, and H.-F. Cheng, “Complete Solution
Classification for the Perspective–Three–Point Problem,” IEEE Trans.
Pattern Analysis and Machine Intelligence, 2003.
[11] L. Kneip, D. Scaramuzza, and R. Siegwart, “A Novel Parametrization
of the Perspective–Three–Point Problem for a Direct Computation of
Absolute Camera Position and Orientation,” IEEE Proc. Computer
Vision and Pattern Recognition, 2011.
[12] Y. Guo, “A Novel Solution to the P4P Problem for an Uncalibrated Camera,” J. Math. Imaging Vis., 2012.
[13] M. Bujnak, Z. Kukelova, and T. Pajdla, “A general solution to the P4P
problem for camera with unknown focal length,” IEEE Proc. Computer
Vision and Pattern Recognition, 2008.
[14] N. Goncalves, “Noncentral Catadioptric Systems with Quadric Mirrors,
Geometry and Calibration,” Ph.D. dissertation, University of Coimbra,
October 2008.
[15] A. Agrawal, Y. Taguchi, and S. Ramalingam, “Analytical Forward
Projection for Axial Non–Central Dioptric & Catadioptric Cameras,”
Proc. European Conf. Computer Vision, 2010.
[16] R. Swaminathan, M. D. Grossberg, and S. K. Nayar, “Non–Single
Viewpoint Catadioptric Cameras: Geometry and Analysis,” Int’l J.
Computer Vision, 2006.
[17] B. Micusik and T. Pajdla, “Autocalibration & 3D Reconstruction with
Non–Central Catadioptric Cameras,” IEEE Proc. Computer Vision and
Pattern Recognition, 2004.
[18] T. Treibitz, Y. Y. Schechner, and H. Singh, “Flat Refractive Geometry,”
IEEE Proc. Computer Vision and Pattern Recognition, 2008.
[19] K. N. Kutulakos and E. Steger, “A Theory of Refractive and Specular
3D Shape by Light–Path Triangulation,” Int’l J. Computer Vision, 2008.
[20] J. Ponce, “What is a Camera?” IEEE Proc. Computer Vision and Pattern Recognition, 2009.
[21] R. Gupta and R. I. Hartley, “Linear Pushbroom Cameras,” IEEE Trans.
Pattern Analysis and Machine Intelligence, 1997.
[22] J. Yu and L. McMillan, “General Linear Cameras,” Proc. European
Conf. Computer Vision, 2004.
[23] A. Zomet, D. Feldman, S. Peleg, and D. Weinshall, “Mosaicing New
Views: The Crossed–Slits Projection,” IEEE Trans. Pattern Analysis and
Machine Intelligence, 2003.
[24] D. Nistér, “A Minimal Solution to the Generalized 3–Point Pose
Problem,” IEEE Proc. Computer Vision and Pattern Recognition, 2004.
[25] D. Nistér and H. Stewénius, “A Minimal Solution to the Generalized 3–Point Pose Problem,” J. Math. Imaging Vis., 2007.
[26] C.-S. Chen and W.-Y. Chang, “Pose Estimation for Generalized Imaging Device via Solving Non–Perspective n Point Problem,” IEEE Proc. Int'l Conf. Robotics and Automation, 2002.
[27] ——, “On Pose Recovery for Generalized Visual Sensors,” IEEE Trans.
Pattern Analysis and Machine Intelligence, 2004.
[28] S. Ramalingam, S. K. Lodha, and P. Sturm, “A Generic Structure–from–
Motion Framework,” Computer Vision and Image Understanding, 2006.
[29] G. Schweighofer and A. Pinz, “Globally Optimal O (n) Solution to
the PnP Problem for General Camera Models,” Proc. British Machine
Vision Conf., 2008.
[30] G. Grossberg and S. Nayar, “A General Imaging Model and a Method
for Finding its Parameters,” IEEE Proc. Int’l Conf. Computer Vision,
2001.
[31] P. Sturm and S. Ramalingam, “A Generic Concept for Camera Calibration,” Proc. European Conf. Computer Vision, 2004.
[32] P. Miraldo, H. Araujo, and J. Queiró, “Point–based Calibration Using a Parametric Representation of the General Imaging Model,” IEEE Proc. Int'l Conf. Computer Vision, 2011.
[33] K. S. Arun, T. S. Huang, and S. D. Blostein, “Least–Squares Fitting of Two 3-D Point Sets,” IEEE Trans. Pattern Analysis and Machine Intelligence, 1987.
[34] S. Umeyama, “Least–Squares Estimation of Transformation Parameters Between Two Point Patterns,” IEEE Trans. Pattern Analysis and Machine Intelligence, 1991.
[35] D. Nistér, “An Efficient Solution to the Five–Point Relative Pose
Problem,” IEEE Trans. Pattern Analysis and Machine Intelligence, 2004.
[36] H. Pottmann and J. Wallner, Computational Line Geometry. Springer–
Verlag, 2001.
[37] Y. Ma, S. Soatto, J. Košecká, and S. S. Sastry, An Invitation to 3–D
Vision: From Images to Geometry Models. Springer Science+Business,
2004.
[38] P. Miraldo and H. Araujo, “Direct solution to the minimal generalized
pose,” Supplement Material, 2013.
[39] D. A. Cox, J. Little, and D. O’Shea, Using Algebraic Geometry.
Springer Science+Business, 2004.
[40] H. Stewénius, “Gröbner Basis Methods for Minimal Problems in Computer Vision,” PhD Thesis, Lund University, 2005.
[41] Z. Kukelova, M. Bujnak, and T. Pajdla, “Automatic Generator of
Minimal Problem Solvers,” Proc. European Conf. Computer Vision,
2008.
[42] R. Hartley and H. Li, “An Efficient Hidden Variable Approach to
Minimal-Case Camera Motion Estimation,” IEEE Trans. Pattern Analysis and Machine Intelligence, 2012.
[43] Z. Kukelova, M. Bujnak, and T. Pajdla, “Polynomial Eigenvalue Solutions to Minimal Problems in Computer Vision,” IEEE Trans. Pattern
Analysis and Machine Intelligence, 2012.
Pedro Miraldo received the master's degree in electrical and computer engineering from the University of Coimbra in 2008 and the Ph.D. degree in electrical engineering from the University of Coimbra in 2013. Currently, he is a researcher at the Institute for Systems and Robotics-Coimbra. His work focuses on general camera models and camera localization.
Helder Araujo is a Professor in the Department of Electrical and Computer Engineering of the University of Coimbra, Portugal, and also an Invited Professor at Blaise Pascal University, Clermont-Ferrand, France. He is a researcher at the Institute
for Systems and Robotics-Coimbra. His research
interests include computer and robot vision, robot
navigation and localization, and sensor modeling for
robot navigation and localization.