An N-Point Linear Solver for Line and Motion Estimation with Event Cameras
Abstract
Event cameras respond primarily to edges—formed by strong gradients—and are thus particularly well-suited for line-based motion estimation. Recent work has shown that events generated by a single line each satisfy a polynomial constraint which describes a manifold in the space-time volume. Multiple such constraints can be solved simultaneously to recover the partial linear velocity and line parameters. In this work, we show that, with a suitable line parametrization, this system of constraints is actually linear in the unknowns, which allows us to design a novel linear solver. Unlike existing solvers, our linear solver (i) is fast and numerically stable since it does not rely on expensive root finding, (ii) can solve both minimal and overdetermined systems with more than 5 events (i.e. ), and (iii) admits the characterization of all degenerate cases and multiple solutions. The found line parameters are singularity-free and have a fixed scale, which eliminates the need for auxiliary constraints typically encountered in previous work. To recover the full linear camera velocity we fuse observations from multiple lines with a novel velocity averaging scheme that relies on a geometrically-motivated residual, and thus solves the problem more efficiently than previous schemes which minimize an algebraic residual. Extensive experiments in synthetic and real-world settings demonstrate that our method surpasses the previous work in numerical stability, and operates over 600 times faster.
Project page: https://mgaoling.github.io/eventail/
1 Introduction
Man-made scenes contain a multitude of straight lines, and exploiting these lines for motion estimation is an important feature of modern mobile vision systems like AR/VR devices and robotic systems [17, 25, 42, 23]. However, computer vision algorithms aimed at leveraging these line features still suffer from fundamental limitations when using standard frame-based sensing: During high-speed motion and challenging illumination conditions, these sensors suffer from motion blur and saturation effects, which have deleterious effects on line feature extraction. Event cameras [10] are biologically inspired sensors that address the limitation of frame-based sensors by instead only measuring the changes in intensity at a per-pixel level, and they do this with high dynamic range, low motion blur, high temporal resolution, and high spatial data-sparsity.
Due to their working principle, event cameras respond primarily to edges—formed by strong gradients—and are thus particularly well-suited for line-based motion estimation. A recent breakthrough [12] in event-based motion estimation introduced an incidence relation that enforces the intersection of bearing vectors emitted by events and a corresponding line that generates those events. Using this relation, a 5-point minimal solver was designed that recovers the parameters of a minimal two-point-two-plane parametrization of the line and two velocity ratios in the plane perpendicular to the line, using the Gröbner basis method and polynomial elimination theory. However, this solver suffers from several limitations: First, it relies on a non-minimal line representation using four degrees of freedom (DoF) that (i) fails to realize that, in the absence of scale, only three DoF are needed, and (ii) encounters singularities when describing lines parallel to the two planes. Secondly, to solve for the motion and line parameters, previous work employs a polynomial solver, which (i) is by definition minimal and thus incapable of incorporating more than five events, and (ii) relies on root finding algorithms that are expensive to run and suffer from instabilities that are not easily detected.
This work addresses these limitations with two important innovations: First, it introduces a new line representation based on the angle-axis representation of a rotation matrix, which is singularity-free and only depends on three DoF and thus implicitly enforces scale ambiguity. Second, in formulating the incidence relation with this new parametrization, we derive a simple algorithm for determining motion and line parameters that only relies on solving a linear system and simple vector operations and is thus orders of magnitude faster than the polynomial solver in [12]. This linear system is easily extended to events, with a minimal increase in complexity, which enables solution refinement with inliers when employing random sample consensus (RANSAC) schemes.
The proposed solver sheds light on all possible cases of degenerate solutions, how they arise, and all additional solutions that arise from symmetries in the incidence relation. Finally, expressing the incidence relation in terms of our new line parametrization allows us to fully characterize and visualize the types of manifolds circumscribed by events generated by a single line. Our contributions are:
- A minimal, three DoF representation of lines based on the angle-axis representation of a rotation matrix. This representation encodes a reference frame centered at the 3D line, and enforces a unit distance to the closest point.
- A linear algorithm for determining line and motion parameters from a set of events triggered by a line. This algorithm is fast, and extensible to multiple events, and sheds light on degenerate and multiplicitous solutions.
- A simpler and faster scheme for fusing partial linear velocity measurements from multiple lines, based on geometrically-motivated constraints, instead of solving algebraic equations.
We validate our method extensively in simulated and real-world settings. In particular, our solver is on average 600 times faster, taking only for five events, compared to for [12], while achieving a similar performance.
2 Related Work
Ego-motion estimation is crucial for intelligent mobile devices and thus has been the subject of extensive research over the past few decades. The sub-class of vision-based solutions comprises single-camera, stereo-camera, and multi-camera solutions that are potentially supported by an inertial measurement unit. A review would go beyond the scope of the present paper, and the reader is kindly referred to recent reviews such as the one by Cadena et al. [4]. This paper focuses on motion estimation with event cameras.
The latter is a challenging problem that is initially often addressed for constrained scenarios such as 2D motion [39], known depth or 3D structure [40, 5, 27, 7, 3, 6, 46], pure rotation [8], and homographic warping [9, 20, 34, 24, 31]. The community has furthermore explored the combination with other sensors such as standard cameras [20, 38, 16], inertial units [38, 45], or a second event camera [44]. In turn, the present paper considers motion in 3D with a single event camera and in arbitrary environments. Different optimization-based [32, 21, 28], filter-based [19, 45] and learning-based [26, 13] solutions have already been presented. Of particular interest to the present work are methods that rely on line features [43, 21].
In the spirit of original works on monocular visual odometry [29], the present work addresses the relatively unexplored topic of geometric incidence relationships for local relative motion calculation with an event camera. Geometric solutions remain important to date owing to their ability to find solutions with optimality guarantees, and potentially certificates, under known assumptions, unlike optimization-, filter- or learning-based solvers, which often lack these guarantees. Given that events are primarily triggered by high-gradient appearance boundaries, the dominant feature for event-based motion estimation is given by lines. Weng et al. [41] and Hartley et al. [15] have proposed closed-form solutions for frame-based cameras. Tri-focal tensor geometry inspired the first closed-form solution for event cameras [30]. An important characteristic of this solver is that it relies on a local constant velocity motion model and makes use of the first-order dynamics of the camera. This is important as it permits the inclusion of the time-stamped, asynchronous measurements produced by an event camera. Nonetheless, the method by Peng et al. [30] is not general as it still depends on approximate event-based line-feature extractors [2, 37, 36] rather than events, only.
The works most closely related to ours are those by Ieng et al. [18], Seok and Lim [33], and Gao et al. [12], who propose model-based fitting of the manifold locations of the events generated by the observation of a line under motion. In particular, Gao et al. [12] introduce an exact incidence relation that all such events must obey under constant linear velocity. It depends on observable camera motion and 3D line parameters, and thus serves as a basis for joint manifold fitting and motion estimation. Based on this foundation, the present work develops the first linear N-point solution to this problem, which not only unlocks unprecedented efficiency, but also a simplified understanding of degenerate geometric conditions.
3 Methodology
Assume a calibrated event camera undergoing an arbitrary six DoF motion, while observing a set of lines . Each line generates a set of events where each event is characterized by its pixel coordinate in the image plane, timestamp (with resolution), and polarity .
For a small time window , centered at reference time , such that the camera motion can be approximated by linear dynamics, the events generated by a single line circumscribe a manifold termed eventail [12]. In Sec. 3.1, we describe the geometric incidence relation [12] which needs to be satisfied by events on this manifold and depends on the observable components of the velocity and the line for which we introduce a minimal parametrization. Then, in Sec. 3.2 we will introduce a solver that recovers the line parameters, and observable linear velocity parameters from a set of events that lie on the manifold. Finally, in Sec. 3.6, we will explore how to recover the full linear velocity from a set of partial velocity observations.
3.1 Incidence Relationship
We reiterate here briefly the incidence relation introduced in [12] and use Fig. 1 for illustration purposes. For simplicity, we will first consider the case of one line, and thus drop the index from the variables. We furthermore express all quantities in the camera frame centered at time . The incidence relation enforces that events are triggered by points on the line, such that the line (in Plücker coordinates) emanating from an individual event triggered at time (orange line in Fig. 1) intersects the line (blue line in Fig. 1). The condition for intersection of two non-parallel lines is
(1) |
Plücker line coordinates comprise the line direction and moment , with being an arbitrary point on the line. As [12] for the event ray we use the camera position at time , and the event bearing vector rotated into the reference frame at time . Under first order dynamics, these are and with . As [12], we assume given and computed by integrating angular rate measurements from an available inertial measurement unit (IMU) with . Here denotes the matrix exponential and denotes the skew-symmetric matrix associated with the angular rate . The Plücker coordinates are thus and the incidence relation becomes
(2) |
The above equation relates measurements and to our unknowns and , but is still not in a minimal form. First, Plücker coordinates are not minimal: true minimal line representations have only four DoF. Second, in a monocular setup, there is scale ambiguity, which dictates that the absolute scale of and is unobservable. Finally, the velocity component along the line direction is unobservable due to the aperture problem. To encapsulate these constraints, we next introduce a line representation based on the angle-axis representation of rotation matrices. The associated rotation matrix spans a coordinate frame in which we express the camera velocity and in which the aperture problem can be enforced succinctly.
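The incidence relation can be checked numerically on synthetic data. The following is a minimal sketch, assuming the reference frame is the camera frame at the reference time, that the camera sits at `t_rel * v` at relative time `t_rel`, and that `exp(skew(omega) * t_rel)` rotates a bearing from the camera frame at that time into the reference frame. All function and variable names are illustrative and do not come from the original implementation.

```python
import numpy as np

def skew(w):
    """Skew-symmetric matrix such that skew(w) @ a == np.cross(w, a)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def rotation_exp(omega, t):
    """exp(skew(omega) * t), evaluated with the Rodrigues formula."""
    rate = np.linalg.norm(omega)
    theta = rate * t
    if abs(theta) < 1e-12:
        return np.eye(3)
    K = skew(omega / rate)
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def incidence_residual(f_cam, t_rel, omega, v, d, P):
    """Reciprocal product of the event ray and the line (zero if they meet)."""
    f_ref = rotation_exp(omega, t_rel) @ f_cam  # motion-corrected bearing
    C = t_rel * v                               # camera position (assumed model)
    # Event ray: direction f_ref, moment C x f_ref. Line: direction d, moment P x d.
    return f_ref @ np.cross(P, d) + d @ np.cross(C, f_ref)

def observe(X, t_rel, omega, v):
    """Unit bearing, in the camera frame at t_rel, of a 3D point X."""
    f = rotation_exp(omega, t_rel).T @ (X - t_rel * v)
    return f / np.linalg.norm(f)

rng = np.random.default_rng(0)
d = np.array([1.0, 0.5, 0.2])
d /= np.linalg.norm(d)                 # line direction
P = np.array([0.2, -0.1, 3.0])         # a point on the line
v = np.array([0.4, -0.3, 0.1])         # linear velocity
omega = np.array([0.1, -0.2, 0.3])     # angular rate

# Events triggered by points on the line satisfy the relation.
residuals = []
for t_rel, s in zip(rng.uniform(-0.25, 0.25, 5), rng.uniform(-1.0, 1.0, 5)):
    f_cam = observe(P + s * d, t_rel, omega, v)
    residuals.append(incidence_residual(f_cam, t_rel, omega, v, d, P))

# A point displaced off the line violates it.
X_off = P + np.cross(P, d)
residual_off = incidence_residual(observe(X_off, 0.1, omega, v), 0.1, omega, v, d, P)
```

In this sketch the residuals of events generated on the line vanish up to floating-point error, while the displaced point yields a clearly non-zero value.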
3.2 Transition into a Minimal Form
The transition into minimal form consists of two steps. We first derive the angle-axis-based line representation by successively eliminating internal constraints in the Plücker line coordinate representation, and then proceed with finding the minimal camera velocity representation. We start by observing that scaling both and yields the same line, and thus we may choose to fix the scale of to be unity. Next, we observe that is by definition perpendicular to . Moreover, we see that any on the same line results in the same moment, and thus we may choose to select it closest to the origin, such that it is perpendicular to . Since the scale is unobservable, we may furthermore fix the distance from to the origin to be unity. Summarizing these observations, we conclude that we may select and to be perpendicular unit vectors, and in particular we will select and , such that the resulting moment is . We visualize the three unit vectors in Fig. 1, and observe that they span a line-dependent coordinate frame via the rotation matrix . Rotation matrices belong to , and thus it becomes apparent that this line representation can be further compressed via the matrix logarithm , to yield only three DoF, which is a minimal line representation in the absence of scale. Here maps the skew-symmetric matrix in the argument to the associated vector.
Note that compared to the two-point-two-plane parametrization [14] in [12], this representation is (i) minimal, relying on three instead of four DoF, and (ii) more flexible, since it does not need a reparametrization step for lines that are almost parallel to the -plane.
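The construction of the line-dependent frame and its compression to three parameters can be sketched as follows. This is an illustration under the conventions described above (first axis along the line, third axis towards the closest point, second axis equal to the moment of the unit-distance line). The helper names are hypothetical, and the logarithm shown here is the textbook formula, which is not robust for rotation angles close to pi.

```python
import numpy as np

def line_frame(point, direction):
    """Rotation [e1 e2 e3] of a 3D line and its distance to the origin."""
    e1 = direction / np.linalg.norm(direction)
    closest = point - (point @ e1) * e1      # point on the line closest to origin
    dist = np.linalg.norm(closest)
    if dist < 1e-12:
        raise ValueError("line passes through the origin (not handled here)")
    e3 = closest / dist                      # unit distance after dropping scale
    e2 = np.cross(e3, e1)                    # moment of the unit-distance line
    return np.column_stack([e1, e2, e3]), dist

def rotation_log(R):
    """Angle-axis vector of R (valid for angles away from 0 and pi)."""
    theta = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    if theta < 1e-12:
        return np.zeros(3)
    axis = np.array([R[2, 1] - R[1, 2],
                     R[0, 2] - R[2, 0],
                     R[1, 0] - R[0, 1]]) / (2.0 * np.sin(theta))
    return theta * axis

def rotation_exp(r):
    """Inverse of rotation_log via the Rodrigues formula."""
    theta = np.linalg.norm(r)
    if theta < 1e-12:
        return np.eye(3)
    k = r / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

R_line, dist = line_frame(np.array([0.0, 1.0, 3.0]), np.array([1.0, 2.0, 2.0]))
r_min = rotation_log(R_line)        # three numbers describe the line up to scale
R_back = rotation_exp(r_min)        # and reproduce the full frame
```

The distance returned by `line_frame` is exactly the quantity that is lost to scale ambiguity in the monocular setting, which is why only the three entries of `r_min` remain.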
We now address the camera velocity parametrization, which we express in the line-dependent coordinate frame.
(3) |
introduces the camera velocity expressed in the line coordinate frame. Using these new parametrizations, the incidence relation Eq. 2 becomes
(4) |
Cycling through the triple product in the first summand, i.e. , we arrive at
(5) |
which can be expanded and further simplified to
(6) |
Note that due to the cross product with , the camera velocity becomes unobservable within this incidence relation, i.e. changing it does not affect the residual. This confirms our intuition that the aperture problem should make velocities along the line direction unobservable. As a result, we focus on only solving for and . Furthermore, observe that these equations are in terms of and instead of the minimal parameters . In what follows we design the solver around recovering and since it yields a simpler algorithm, yet it should be remembered that the minimal representation can always be recovered using the matrix logarithm of . We now discuss how to recover the unknowns from a set of incidence relations from multiple events, which is summarized in Alg. 1.
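For reference, the chain of steps described in this section can be written out explicitly. The notation below is assumed for illustration, since the symbols are not fixed by the surrounding text: $\mathbf{f}'_j$ is the motion-corrected bearing of event $j$, $t'_j$ its timestamp relative to the reference time, $\mathbf{R}_\ell = [\mathbf{e}_1\ \mathbf{e}_2\ \mathbf{e}_3]$ the line frame with direction $\mathbf{d} = \mathbf{e}_1$ and closest point $\mathbf{P} = \mathbf{e}_3$, and $\mathbf{u} = [u_x\ u_y\ u_z]^\top$ the velocity in that frame, i.e. $\mathbf{v} = \mathbf{R}_\ell \mathbf{u}$.

```latex
\begin{align}
0 &= \mathbf{f}'^{\top}_j (\mathbf{P} \times \mathbf{d})
   + \mathbf{d}^{\top} \left( t'_j \mathbf{v} \times \mathbf{f}'_j \right)
   && \text{(intersection of event ray and line)} \\
  &= \mathbf{f}'^{\top}_j \mathbf{e}_2
   + t'_j\, \mathbf{e}_1^{\top} \left( \mathbf{R}_\ell \mathbf{u} \times \mathbf{f}'_j \right)
   && \text{(}\mathbf{e}_3 \times \mathbf{e}_1 = \mathbf{e}_2\text{)} \\
  &= \mathbf{f}'^{\top}_j \mathbf{e}_2
   + t'_j\, \mathbf{f}'^{\top}_j \left( \mathbf{e}_1 \times \mathbf{R}_\ell \mathbf{u} \right)
   && \text{(cyclic triple product)} \\
  &= \mathbf{f}'^{\top}_j \left( \mathbf{e}_2 + t'_j \left( u_y \mathbf{e}_3 - u_z \mathbf{e}_2 \right) \right)
   && \text{(}\mathbf{e}_1 \times \mathbf{e}_1 = \mathbf{0}\text{)}
\end{align}
```

The last step uses $\mathbf{e}_1 \times (u_x \mathbf{e}_1 + u_y \mathbf{e}_2 + u_z \mathbf{e}_3) = u_y \mathbf{e}_3 - u_z \mathbf{e}_2$, which makes explicit why $u_x$, the velocity component along the line, drops out of the constraint.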
3.3 Five-point Minimal Solver
The incidence relationship in Eq. 6 has five unknowns, three from and two from , and thus can be solved by stacking a minimum of five such constraints. Since each such constraint originates from a single event, this means that five events are the minimum number to solve this system. This stack of equations is
This system of equations is linear in the unknowns and can be rewritten as a single matrix equation
(7) |
Note that this formulation successfully groups the terms from the events in and unknowns in . We can thus solve for , and then reconstruct the unknowns from the found solution. Solving Eq. 7 can be done with a singular value decomposition of A and then selecting the last column of corresponding to the smallest singular value of . Let us denote this solution with . Note that is normalized, and, due to the homogeneous nature of Eq. 7, only known up to parity, i.e. both are solutions. Note also that this procedure is not limited to using only five events, but can be applied to , however in this case the solution will no longer be exact, and instead approximate but with a globally minimal squared residual equal to the smallest singular value of . The ability to process more than five events sets this method apart from the solver in [12] which uses a fixed elimination template tailored to only five events. Next, we discuss how to recover the unknowns from a solution .
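The SVD-based solution of a homogeneous system can be illustrated on a generic example. The sketch below uses a random matrix with a known null vector rather than real event data, and the helper names are hypothetical. It shows two practical points: the full right singular basis must be requested so that the minimal case with five rows still returns a six-dimensional null vector, and in the overdetermined case the norm of the residual equals the smallest singular value.

```python
import numpy as np

def solve_homogeneous(A):
    """Unit vector x minimizing ||A x||, plus the singular values of A."""
    # full_matrices=True keeps Vh square (6 x 6) even when A has only 5 rows.
    _, s, Vh = np.linalg.svd(A, full_matrices=True)
    return Vh[-1], s

rng = np.random.default_rng(0)
x_true = rng.normal(size=6)
x_true /= np.linalg.norm(x_true)

def random_system(n_rows, noise):
    """Random n_rows x 6 matrix whose rows are orthogonal to x_true, plus noise."""
    B = rng.normal(size=(n_rows, 6))
    A = B - np.outer(B @ x_true, x_true)
    return A + noise * rng.normal(size=(n_rows, 6))

# Minimal case: five constraints, exact solution up to sign.
x5, _ = solve_homogeneous(random_system(5, 0.0))
err_minimal = min(np.linalg.norm(x5 - x_true), np.linalg.norm(x5 + x_true))

# Overdetermined case: fifty noisy constraints, least-squares solution.
A50 = random_system(50, 1e-3)
x50, s50 = solve_homogeneous(A50)
residual_norm = np.linalg.norm(A50 @ x50)
err_overdetermined = min(np.linalg.norm(x50 - x_true), np.linalg.norm(x50 + x_true))
```

The cost of the decomposition grows only linearly with the number of rows, since the unknown vector always has six entries.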
3.4 Recovering the Unknowns from
For simplicity, we will only treat the case with , but will state that the ambiguity in the parity of the solution to Eq. 7 gives rise to the solution pairs , and . Remembering the definition of we write that
(8) |
Here is an unknown scaling factor. However, since the last three entries of correspond to the unit vector , we can simply normalize by the length of . In what follows we assume that this normalization is done beforehand, and thus ignore this scaling factor by setting . Straightforward manipulation yields
(9a) | ||||
(9b) | ||||
(9c) |
From the last equation we can recover and by taking the norm, and normalized vector
(10) |
Note that this decomposition is not unique, as we may simultaneously flip the signs of and , resulting in the same product. Finally, we can recover . While is the globally optimal solution to the incidence relation constraints in Eq. 7, it is not immediately clear that the recovered are also globally optimal with respect to this constraint. As proved in the supplementary material it turns out that are globally optimal. Next, we will comment on verifying the correctness of the solution.
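The recovery steps can be sketched end to end on noise-free synthetic events. The code assumes one particular arrangement of the linear system that is consistent with the description above: each event contributes a row made of its time-scaled bearing followed by its bearing, and the solution vector stacks `u_y * e3 - u_z * e2` on top of `e2`. This ordering, and all names used, are assumptions made for illustration.

```python
import numpy as np

def build_system(bearings, times):
    """One row [t'_j * f'_j, f'_j] per event, giving an N x 6 matrix."""
    return np.hstack([times[:, None] * bearings, bearings])

def recover_unknowns(x):
    """Split a null vector x into the line frame and the observable velocity."""
    x = x / np.linalg.norm(x[3:])    # last three entries form the unit vector e2
    e2 = x[3:]
    u_z = -e2 @ x[:3]
    w = x[:3] + u_z * e2             # equals u_y * e3
    u_y = np.linalg.norm(w)          # vanishes only without observable motion
    e3 = w / u_y
    e1 = np.cross(e2, e3)
    return np.column_stack([e1, e2, e3]), u_y, u_z

# Ground-truth line frame (orthonormal, right-handed) and velocity in that frame.
e1_true = np.array([1.0, 2.0, 2.0]) / 3.0
e3_true = np.array([2.0, -2.0, 1.0]) / 3.0
e2_true = np.cross(e3_true, e1_true)
R_true = np.column_stack([e1_true, e2_true, e3_true])
u_true = np.array([0.3, 0.8, -0.5])
v_true = R_true @ u_true

# Events: bearings towards points on the unit-distance line, seen from t' * v.
rng = np.random.default_rng(1)
times = rng.uniform(-0.5, 0.5, 8)
offsets = rng.uniform(-2.0, 2.0, 8)
points = e3_true + offsets[:, None] * e1_true
rays = points - times[:, None] * v_true
bearings = rays / np.linalg.norm(rays, axis=1, keepdims=True)

A = build_system(bearings, times)
x = np.linalg.svd(A)[2][-1]          # right singular vector of smallest value
R_est, u_y_est, u_z_est = recover_unknowns(x)
```

Depending on the sign returned by the SVD, `R_est` matches the ground truth or one of its sign-flipped counterparts, while `u_z_est` is unaffected by that sign. The component `u_true[0]` along the line never enters the system, in line with the aperture problem.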
3.5 Degenerate Solutions and Solution Multiplicity
First, we state a theorem that the only degenerate cases arise when the matrix in Eq. 7 is rank-deficient. This means that, if , the previously discussed decomposition always succeeds and yields four solutions.
Theorem 1: If , with defined in Eq. 7, the decomposition of into and always succeeds and yields four distinct solutions. If the solver returns infinitely many solutions.
Note that this theorem also handles cases in which the line passes through the origin. Both the proof of the above theorem and the handling of this case are described in the supplementary material. Rank deficiency of occurs when events share the same timestamp or motion-corrected bearing vector . To identify this scenario, the matrix rank can be checked before solving for . After the SVD, the second smallest singular value should be verified to be sufficiently large, since small values indicate near rank deficiency.
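A singular-value test of this kind might look as follows. The sketch assumes the same row layout as before (time-scaled bearing followed by bearing). The relative tolerance is a hypothetical value that would need tuning in practice. The fifth-largest singular value is the one inspected, which corresponds to the second smallest of the six values once the zero associated with the null vector is counted.

```python
import numpy as np

def is_degenerate(A, rel_tol=1e-6):
    """True if the N x 6 system has (numerical) rank below five."""
    s = np.linalg.svd(A, compute_uv=False)   # sorted in descending order
    if s.size < 5:
        return True                          # fewer than five constraints
    return s[4] <= rel_tol * s[0]

def build_system(bearings, times):
    return np.hstack([times[:, None] * bearings, bearings])

rng = np.random.default_rng(3)
bearings = rng.normal(size=(5, 3))
bearings /= np.linalg.norm(bearings, axis=1, keepdims=True)

# Generic bearings with distinct timestamps give a rank-5 system.
A_generic = build_system(bearings, rng.uniform(-0.25, 0.25, 5))

# If all events share one timestamp, the two column blocks become proportional.
A_same_time = build_system(bearings, np.full(5, 0.1))

generic_flag = is_degenerate(A_generic)
same_time_flag = is_degenerate(A_same_time)
```

With identical timestamps the rank cannot exceed three, so the test flags the system regardless of the bearings.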
Next, we discuss the multiplicity of solutions. As previously stated, the designed solver returns four distinct solutions if the rank of is at least 5. Here we enumerate these solutions (visualized in Fig. 2), stated as a theorem:
Theorem 2: Given a solution to the incidence relation in Eq. 6, then
are also solutions. For solutions and the closest point on the line is behind the camera, while for solutions and the line direction is flipped, which represents an ambiguity in the definition of direction of .
We state the proof in the supplementary material. Note that two configurations correspond to flipping across the xy-plane. We eliminate these solutions by enforcing that the intersection point (see Fig. 1) between the line and event ray is in front of the camera. In the supplementary material, we use this geometric interpretation to characterize manifolds spanned by events in more detail.
[Figure 2: the four solutions of Theorem 2, shown in panels (a) to (d).]
3.6 Velocity Averaging from Multiple Manifolds
As previously discussed, the linear solver can only recover partial velocities perpendicular to the line generating the events in a single cluster. We now describe how to recover the full velocity from a set of partial observations from multiple lines . Each partial observation of the velocity is the projection into the -plane where , which is visualized in Fig. 3(i). The observation is given by the two projections and scaling the second and third basis vectors of , respectively. Thus, the correct velocity estimate must satisfy
(11) |
Unlike the velocity averaging scheme in [12] we adopt the following geometrically motivated, but equivalent formalism to solve multiple such equations: Each such constraint (one for each line) can be converted into a homogeneous linear constraint following the steps in Fig. 3(i). We see that the 90∘ rotated velocity must be perpendicular to the projected camera velocity . Forming the dot product with this vector yields:
(12) | ||||
Note that the above constraint remains, even when inserting the alternative solutions or , since these only lead to a change in the sign. Stacking such equations, one for each line, can be summarized as:
(13) |
We solve this equation again with SVD, by selecting the column of corresponding with the minimal singular value. From this system, we also conclude that we need a minimum of two lines to recover . Again, SVD only recovers up to an unknown parity, i.e. both satisfy the equation. To disambiguate the solution, we enforce that the projected velocity must point in the same direction as for each line. This can be written concisely as
(14) |
If at least one line does not satisfy this condition, the sign of should be flipped. Compared to [12], which uses a more expensive Schur complement step to eliminate unknown scale factors, our algorithm only requires a single SVD, but yields the same results. This is because our algorithm is actually equivalent to that of [12], as will be shown in the supplementary material.
Note that the above line averaging scheme may run into issues if two lines are parallel, since the common component of is then unobservable. However, this case actually induces a rank deficiency on , as proved in the supplementary material, and can thus easily be discarded.
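The averaging scheme can be sketched on synthetic partial observations. The code assumes that each line contributes its frame `R = [e1 e2 e3]` together with the pair `(u_y, u_z)`, that the stacked row for a line is `u_z * e2 - u_y * e3`, and that the sign test uses `u_y * e2 + u_z * e3`. These expressions follow the geometric description above but are written in assumed notation, and the function names are illustrative.

```python
import numpy as np

def average_velocity(frames, u_y, u_z):
    """Velocity direction from per-line partial observations, plus singular values."""
    B = np.vstack([uz * R[:, 1] - uy * R[:, 2]
                   for R, uy, uz in zip(frames, u_y, u_z)])
    _, s, Vh = np.linalg.svd(B)
    v = Vh[-1]
    # Resolve the sign: the projected velocity must agree with each observation.
    if any((uy * R[:, 1] + uz * R[:, 2]) @ v < 0
           for R, uy, uz in zip(frames, u_y, u_z)):
        v = -v
    return v, s

def random_frame(rng):
    Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(Q) < 0:
        Q[:, 0] = -Q[:, 0]
    return Q

def partial_observation(R, v, dist):
    """Observable velocity components, scaled by the unknown line distance."""
    return R[:, 1] @ v / dist, R[:, 2] @ v / dist

rng = np.random.default_rng(2)
v_true = np.array([0.6, -0.2, 0.77])
v_true /= np.linalg.norm(v_true)

frames = [random_frame(rng) for _ in range(5)]
obs = [partial_observation(R, v_true, d) for R, d in zip(frames, rng.uniform(1, 5, 5))]
v_est, s = average_velocity(frames, [o[0] for o in obs], [o[1] for o in obs])

# Two parallel lines: same e1, second frame rotated about e1 by 0.7 rad.
Ra = frames[0]
c, sn = np.cos(0.7), np.sin(0.7)
Rb = np.column_stack([Ra[:, 0], c * Ra[:, 1] + sn * Ra[:, 2], -sn * Ra[:, 1] + c * Ra[:, 2]])
obs_p = [partial_observation(Ra, v_true, 1.0), partial_observation(Rb, v_true, 2.0)]
_, s_parallel = average_velocity([Ra, Rb], [o[0] for o in obs_p], [o[1] for o in obs_p])
```

In the parallel configuration the two stacked rows are proportional, so the second singular value in `s_parallel` vanishes and the configuration can be detected and discarded.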
[Figure 3: (i) single line constraint; (ii) multiple line constraints.]
4 Implementation
We integrate the aforementioned linear solver into a RANSAC framework for parameter determination of each manifold, followed by fitting over all inliers. Diverging from the approach of [12], our implementation adopts the GC-RANSAC framework [1] for robust geometric model estimation. GC-RANSAC enhances the original RANSAC by introducing a few versatile functionalities tailored for early termination, thereby expediting the overall process. Essentially, GC-RANSAC first iteratively selects a minimal, spatially coherent subset of events () from the incoming event stream, applies the proposed linear solver described in Alg. 1, and evaluates the quality of the resulting hypothesis. Each occurrence of a so-far-the-best hypothesis will trigger a local refinement within a subset of its inliers. This procedure is repeated times to separately identify manifolds. We will now delve into the critical aspects influencing this process.
Spatially Coherent Sampler: The manifold’s continuous structure allows for the examination of the data’s spatial coherency. We utilize NAPSAC [35] to sample from the incoming event stream. This approach starts by randomly selecting one event in the space-time volume followed by identifying four additional events within a hyper-sphere centered on the initial event, based on a predetermined radius . In practice, these four points, located within the hyper-sphere, are likely to be inliers. (Footnote 1: Note that this method does not conflict with the findings about ensuring spatial distribution among samples [12]. However, we adhere to a general rule of maintaining this spatial distribution within our defined hyper-sphere.)
Angular Reprojection Residual: We employ the angular reprojection residual [22] for inlier selection. The objective is to minimally correct the two lines, the bearing vector emanating from an event and the 3D line, so that they intersect at a single point (i.e. in Fig. 1). Unlike the typical image reprojection residual, this measurement is invariant to both rotation and scale.
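The sampling idea can be sketched in a few lines. This is a simplified, brute-force illustration of neighbourhood sampling in the space-time volume. It is not the GC-RANSAC or NAPSAC implementation, the function name and array layout are hypothetical, and the time scale of 1,000 and radius of 50 are taken from the real-world setup described in Sec. 5.2.

```python
import numpy as np

def sample_coherent(events, radius, time_scale, rng, n_neighbors=4):
    """Indices of one seed event and n_neighbors events inside its hyper-sphere.

    events: (N, 3) array with columns x, y (pixels) and t (timestamp).
    Returns None if the seed does not have enough neighbours.
    """
    pts = events * np.array([1.0, 1.0, time_scale])   # make time comparable to pixels
    seed = rng.integers(len(pts))
    dist = np.linalg.norm(pts - pts[seed], axis=1)
    inside = (dist <= radius) & (np.arange(len(pts)) != seed)
    candidates = np.flatnonzero(inside)
    if candidates.size < n_neighbors:
        return None
    picked = rng.choice(candidates, size=n_neighbors, replace=False)
    return np.concatenate([[seed], picked])

rng = np.random.default_rng(4)
events = np.column_stack([rng.uniform(0, 100, 2000),     # x in pixels
                          rng.uniform(0, 100, 2000),     # y in pixels
                          rng.uniform(0, 0.05, 2000)])   # t (illustrative range)
sample = sample_coherent(events, radius=50.0, time_scale=1000.0, rng=rng)
```

A practical implementation would replace the brute-force distance computation with a precomputed neighbourhood graph or spatial index, so that each draw does not scan all events.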
Local Refinement: RANSAC often results in numerous ineffective iterations. Therefore, when a promising hypothesis emerges, it is recommended to perform local refinement using the inlier sets, which can enhance the inlier ratio and decrease the total number of required iterations. In our work, we introduce two distinct methods for local refinement. The first method leverages the over-determined nature of our proposed linear solver, while the second employs non-linear optimization with Levenberg–Marquardt over an algebraic error (i.e. Eq. 6). As suggested in [1], we randomly select a subset from the inlier set () and repeat this procedure times.
5 Experiments
We perform evaluations both on synthetic and real data. We first confirm the runtime improvement and numerical stability of our linear solver. Next, we discuss the impact of the number of used events or lines, over different noise setups. We conclude with experiments on a few public real-world sequences, demonstrating the advantage over existing methods. We quantify the accuracy of our results with the same criterion as [12], the direction error between the estimated and the ground truth velocities, given that the scale is not observable. All experiments are conducted on a 32GB RAM desktop with an Intel Core i9-10900F Processor.
5.1 Simulation
We first evaluate the performance of the proposed linear solver under different setups over synthetic data. Readings from individual manifolds are generated as follows. We first sample randomly directed linear and angular velocities of and magnitude, respectively. The time window length is set to , and the virtual event camera has a resolution of 640480 and a focal length of 320 pixels. Next, we sample a random line in 3D with a finite length and sample random events according to the spatiotemporal strategy in [12]. We study three types of noise with different magnitudes: pixel noise (), timestamp jitter (), and gyroscope noise (), which is assumed to be given by an IMU. The magnitude of the pixel noise and the noise on camera angular velocities are consistent within the same noise level but vary in direction. Timestamp noise follows a zero-mean Gaussian. For a more detailed sensitivity study for different noise sources and levels see the supplementary material. We generate one million random line, velocity, and event configurations, and report the mean and median angle error, as well as the minimum and mean runtime in milliseconds of our method and the one in [12].
Method | Runtime () min. | Runtime () avg. | Error Rate (%) | Error Rate (%) |
---|---|---|---|---|
Gröbner [12] | 1893 | 2046 | 1.00 | 0.28 |
Linear (ours) | 3.00 | 3.25 | 0.00 | 0.00 |
Runtime Analysis: In each run we record the runtime of both solvers. As reflected in Tab. 1, our solver runs over 600 times faster than the Gröbner basis solver in [12].
Numerical Stability: We further analyze the numerical stability of both methods under the noise-free setup. We report the likelihood of the solver to converge to within a low (, or ) error. Our solver consistently reaches a zero error, unlike the polynomial solver, which, due to numerical instabilities of the elimination template, fails to converge within a error range of times.
Method | Num. events | Pixel Noise () | Time. Jitter () | Gyro. Noise () |
---|---|---|---|---|
Gröbner [12] | 5 | 7.80/1.67 | 3.61/0.83 | 7.48/3.09 |
Linear (ours) | 5 | 5.53/1.24 | 2.87/0.73 | 6.53/2.47 |
Linear (ours) | 10 | 0.46/0.15 | 0.17/0.12 | 1.50/1.17 |
Analysis of the Number of Used Events: In each simulation, we sample 1,000 signal events and introduce a type of representative noise to the measurements. From these, we use the first events as input for our linear solver for a fair comparison and document the resulting error. We repeat this simulation a million times, varying , and report the results in Fig. 4. We observe a clear trend that as the number of used events increases, the error decreases markedly, except when noise is introduced to the camera’s angular velocity. This exception occurs because the solver cannot average out the noise on the angular velocity, regardless of the number of events processed. The other two errors approach near zero when 1,000 events are used. A full analysis of the solver’s performance under each noise type can be found in the supplementary material.
Analysis of the Number of Used Lines: Finally, we validate our velocity averaging scheme. We extend our simulation to multiple manifolds. For each run, we sampled ten lines and, within each line, we selected ten signal events with known line associations as input to our solver. The first solutions (line and partial motion parameters) from each manifold were taken into the linear averaging scheme, and the error was recorded. This simulation was executed 10,000 times. Fig. 4 shows that as the number of used lines increases, the error drops significantly. A comprehensive analysis of the solver’s performance against various noise types is available in the supplementary material. Additionally, we show quantitative results in Tab. 2. Both the Gröbner Solver and our linear solver use five lines with either five or ten events each. Our approach demonstrates a lower error with noisy measurements, and this margin grows further in overdetermined systems (i.e. ).
5.2 Real-world Experiment
Similarly to [12], we validate our method on the same data sequence from VECtor Benchmark [11]. Unlike the previous work, we first segment the event data into non-overlapping intervals of each and reduce the overall size of the data to approximately 5,000 events per interval for efficiency. Motion-corrected bearing vectors are then calculated by fusing gyroscope readings from the attached IMU. Next, to construct the chosen sampler used in GC-RANSAC, we multiply the timestamp by a scale factor of 1,000 and use a radius of 50 to compose the spatially coherent graph in the space-time volume. We apply an angular reprojection threshold of for inlier selection, consistent across both the primary iterations and the local refinement stages. The number of iterations for each manifold fitting is capped at 100, and the number of evaluated manifolds is capped at 10. In Tab. 3, we summarize the performance, including both mean and median errors, across two baseline approaches and three variants of our proposed method. Importantly, as [12] reports, CELC+opt [30] is limited to certain sub-sequences where spatial-temporal plane clustering is feasible, while [12] does not suffer from this limitation. We test three configurations of our method: linear only, linear with non-minimal refinement, and linear with non-linear optimization. Each configuration uses GC-RANSAC with the spatially coherent sampler. The latter two configurations perform different operations when a new best hypothesis is found by the sampler: “Linear w/ non-min. solver” samples 10 events from the found inliers and feeds them to the linear solver, resulting in a refined solution. “Linear w/ non-linear opt.” runs on-manifold Levenberg-Marquardt optimization steps (with line parameters ) over the 10 selected events and minimizes the algebraic error in Eq. 6. We use a subset of 10 inliers to enhance efficiency and reduce errors, as seen in simulated experiments.
Results: In each configuration, our method achieves a lower error than the two baseline approaches. Additionally, local refinement enhances accuracy significantly. Without refinement it already has a lower mean error than [30], and lower mean error than [12]. Introducing non-minimal solver refinement reduces the mean error by another over the “Linear only” baseline and non-linear optimization reduces it by .
6 Conclusion and Future Work
This work introduced a novel, efficient, and linear N-point solver for line-based relative motion estimation of an event camera. Compared to existing works that rely on polynomial system solvers, our method is more numerically stable, over 600 times faster, and allows the identification of degenerate cases explicitly. Moreover, we introduce a novel velocity averaging scheme that is simpler and faster than previous approaches. When combined with GC-RANSAC we show improved normalized velocity estimation compared to existing approaches, at a fraction of the runtime. Finally, the solutions found by our solver deliver new insights into event manifolds generated by lines and thus pave the way for line-based motion estimation with events. Moreover, despite focusing on event cameras in this work, our formulation is fully compatible with line detections from standard cameras. Thus the tools developed in this work can benefit both frame- and event-based computer vision. Our next steps consist of adding uncertainties to the partial velocity readings and applying the fusion strategy asynchronously over time as well as in conjunction with IMU measurements.
Acknowledgments
This research has been supported by projects 22DZ1201900 and 22ZR1441300 funded by the Natural Science Foundation of Shanghai as well as project 62250610225 by the National Science Foundation of China (NSFC). This work was also supported by the European Research Council (ERC) under grant agreement No. 864042 (AGILEFLIGHT).
Supplementary Material
7 Appendix
Here we report additional results of our algorithm for varying noise sources in Sec. 7.1, before discussing the proofs of Theorem 1 and Theorem 2 in Sec. 7.2 and Sec. 7.3, as well as the connection of our proposed line averaging scheme with that of [12] in Sec. 7.6. Finally, we provide additional visual insights into the manifolds spanned by events generated by a line. We show that these manifolds can be canonicalized, i.e. reduced to a small family of manifolds which are highly interpretable (see Sec. 7.7).
7.1 Noise Sensitivity Analysis
In Fig. 5, we provide additional results of our method in simulation, as we vary the number of events used by our solver, and the magnitude of the various noise sources, e.g. pixel noise, timestamp jitter, gyroscope noise. As expected, we see that all errors decrease as more events are used, and errors increase as more noise is injected. Again, the only noise source that cannot be completely eliminated through addition of events is the gyroscope noise, which introduces systematic errors. Experimentally, we found that events gives a good tradeoff between the speed of the algorithm, and observed errors for all noise levels and sources.
We also present additional results for differing noise sources and magnitudes of our line averaging scheme in Fig. 6, and analyse the resulting errors as the number of used lines increases. Again we see that all errors tend to zero as more lines are used, except for the gyroscope noise.
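The difference between noise that averages out and systematic error can be illustrated with a toy experiment. The sketch below is not the simulation behind Fig. 5 or Fig. 6: it uses a generic homogeneous system with six unknowns, Gaussian noise on the matrix entries, and a fixed rotation applied to every row as a stand-in for a systematic perturbation. All names and the noise model are assumptions made for illustration.

```python
import numpy as np

def null_vector(A):
    """Unit vector minimizing ||A x||: the last right singular vector of A."""
    return np.linalg.svd(A)[2][-1]

def angle_to(x_est, x_true):
    """Angle in radians between two directions, ignoring sign."""
    c = abs(float(x_est @ x_true)) / (np.linalg.norm(x_est) * np.linalg.norm(x_true))
    return float(np.arccos(min(1.0, c)))

def planar_rotation(theta, dim=6):
    """Rotation by theta in the plane of the first two coordinates."""
    Q = np.eye(dim)
    Q[0, 0] = Q[1, 1] = np.cos(theta)
    Q[0, 1], Q[1, 0] = -np.sin(theta), np.sin(theta)
    return Q

def mean_error(n_rows, sigma, bias_rotation, trials=50, seed=0):
    """Mean angular error of the null-vector estimate over random trials."""
    rng = np.random.default_rng(seed)
    x_true = np.ones(6) / np.sqrt(6.0)
    errors = []
    for _ in range(trials):
        rows = rng.standard_normal((n_rows, 6))
        A = rows - np.outer(rows @ x_true, x_true)    # exact system: A @ x_true = 0
        A = A @ bias_rotation.T                       # same perturbation on every row
        A = A + sigma * rng.standard_normal(A.shape)  # independent zero-mean noise
        errors.append(angle_to(null_vector(A), x_true))
    return float(np.mean(errors))
```

In this toy setting, calling `mean_error` with a nonzero `sigma` and an identity `bias_rotation` gives an error that shrinks as `n_rows` grows, whereas a fixed `planar_rotation` with `sigma=0.0` gives the same error for any number of rows, since every row is corrupted consistently.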
7.2 Proof of Theorem 1 on Degeneracies
For clarity, we restate Theorem 1 here:
Theorem 1: If , with defined in Eq. 7, the decomposition in Eqs. (8, 9, 10) always succeeds and yields four distinct solutions. If the solver returns infinitely many solutions.
Proof: First assume . Then SVD returns two distinct principal directions . After decomposition, Eq. 10 yields two more solutions, resulting in a total of four distinct solutions. Now assume that the decomposition fails. This can happen for three reasons:
Failure to normalize in Eq. 8: Normalization may fail if has zero norm. However, this case is impossible for a matrix with rank for the following reason: Let be the three left and right columns of (see Eq. 7). Moreover, note that , where is a diagonal matrix, i.e. each row of is a multiple of the corresponding row in .
If has zero norm, . Next, let be the smallest singular value of corresponding to the solution . Then
(15)
(16)
(17)
The last three rows of the last equation are
(18)
and imply either that , i.e. is in the left null-space of , or , i.e. is in the right null-space of . Both imply that . This can only be the case if , following the assumption. This implies that the smallest singular value is . From the first three equations above, this implies that . But then
(19)
which implies that is also in the null space of . We now find that both and are in the null-space of . These vectors are independent, and render the rank of . This is a contradiction.
Failure to recover : Recovering fails if the norm of the cross product in Eq. 10 is . This implies that . For similar reasons as above, this implies that solves both and . This implies that can be freely varied, which would imply a two-dimensional null space of and a rank which is again a contradiction.
Line passing through the origin at : Note that in such a case, would not be defined, and could cause issues in solving. However, we can then use a different definition of the line, with the direction , and point on the line . The line moment then becomes . Inserting this into Eq. 2, transforms Eq. 6 into
(20)
However, this would imply that the system in Eq. 7 has a solution of the form , with . However, we proved in the last two cases that such a solution form implies that the rank of is smaller than 5. Thus ensuring is sufficient for discarding the case where the line passes through the origin.
We conclude that if , the decomposition cannot fail, and always returns four distinct solutions. Moreover, we conclude that a yields solutions from a two dimensional nullspace, which yields infinitely many decompositions.
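The rank condition in Theorem 1 can be checked numerically from the singular values of the constraint matrix. The sketch below assumes a matrix with six columns (three left and three right, as referenced in the proof) and one row per event; the function name, the return convention, and the relative tolerance are hypothetical. With noisy measurements the small singular values are not exactly zero, so the tolerance would need to be chosen with the noise level in mind.

```python
import numpy as np

def solve_and_check(A, rel_tol=1e-9):
    """Return (x, rank, degenerate) for the homogeneous system A x = 0.

    x is the unit-norm minimizer of ||A x|| (last right singular vector).
    rank counts singular values above rel_tol times the largest one.
    degenerate is True when the null space has dimension two or more.
    """
    _, s, vt = np.linalg.svd(A)
    rank = int(np.sum(s > rel_tol * s[0])) if s[0] > 0 else 0
    degenerate = rank < A.shape[1] - 1
    return vt[-1], rank, degenerate
```

A sample flagged as degenerate has no unique solution direction and can be discarded before attempting the decomposition.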
7.3 Proof of Theorem 2 on Solution Multiplicity
For clarity, we restate Theorem 2 here:
Theorem 2: Given a solution to the incidence relation in Eq. 6, then
are also solutions. These four solutions are visualized in Fig. 2. For solutions and the closest point on the line is behind the camera, while for solutions and the line direction is flipped, which represents an ambiguity in the definition of direction of .
Proof: We will only prove solutions and since can be derived from a composition of and . Inserting into Eq. 6, we have
and yields
7.4 Handling of Parallel Lines
As mentioned in the main text, parallel lines may cause difficulties in identifying the direction of the camera velocity. However, we can identify this case easily by checking the rank of . If it is lower than 2, we can discard the sample, and select a new one, or even use another RANSAC loop to select pairs of lines until the rank of is at least 2. Let us now prove that parallel lines cause a rank deficiency in .
Proof: We will proceed in showing that if two lines are parallel, the two corresponding rows and in are parallel and will thus result in rank deficiency (see Eq. 13). Expanding in the two line coordinate frames yields
(21)
(22)
with unknown scale factors . For parallel lines . Computing in two ways (with two expansions of , we recover exactly the rows of by
(23)
(24)
It follows that , i.e. they are parallel.
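The rank test described at the start of this section can be sketched as follows, assuming each line contributes one row with three entries to the stacked matrix. The names `numerical_rank` and `first_independent_pair` and the tolerance are hypothetical; the exhaustive pair search stands in for the additional RANSAC loop mentioned above.

```python
import itertools
import numpy as np

def numerical_rank(B, rel_tol=1e-6):
    """Number of singular values above rel_tol times the largest one."""
    s = np.linalg.svd(np.atleast_2d(np.asarray(B, dtype=float)), compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))

def first_independent_pair(rows, rel_tol=1e-6):
    """Indices of the first pair of rows that is not parallel, or None if all are."""
    rows = np.asarray(rows, dtype=float)
    for i, j in itertools.combinations(range(len(rows)), 2):
        if numerical_rank(rows[[i, j]], rel_tol) >= 2:
            return i, j
    return None
```

Rows that are scalar multiples of each other, as produced by parallel lines, yield rank one and are skipped.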
7.5 Global Optimality of and
As noted in the main text, while the SVD-based solver which recovers from a set of incidence relations (Eq. 7) finds a globally optimal solution , it is not clear if the decomposed solution is also optimal with respect to the same objective. We prove this here.
Proof: We will prove this by way of contradiction. Assume given the SVD-based solution
which is globally optimal, and decomposition with . Assume that there exists a different, more optimal with . Then but this is impossible since it would imply that is more optimal than , but is already optimal. This implies that the objective is already optimal in which concludes the proof.
7.6 Connection between the Proposed Line Averaging Scheme and [12]
The presented velocity averaging scheme is conceptually simpler, and lends itself to geometric interpretation, unlike the scheme in [12]. However, surprisingly these schemes are actually equivalent, as will be demonstrated next. In [12], Eq. 11 is used to set up a number of constraints
(25)
(26)
Introducing unknowns and , one for each line. Stacking multiple of these equations results in a system
(27)
This system is then multiplied from the left with , and the Schur complement trick is employed to eliminate the extraneous variables , resulting in the equation , where we use the following definitions:
(28)
(29)
(30)
(31)
Inserting the equations, and simplifying we get
(32)
(33)
(34)
(35)
with
(36)
The linear system thus becomes
(37)
(38)
(39)
if has full rank this implies that
(40)
Note that is identical to in Eq. 13 up to normalization of each velocity separately. This normalization strategy can be seamlessly integrated into the computation of . Moreover, computing is much simpler than computing .
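The elimination step used in this derivation is the standard Schur complement applied to the normal equations of a homogeneous system. The sketch below uses generic block names: `A` multiplies the three velocity unknowns and `B` multiplies the auxiliary per-line unknowns. These names, and the assumption that `B.T @ B` is invertible, are illustrative and not the notation of [12].

```python
import numpy as np

def schur_reduce(A, B):
    """Eliminate lam from [A B] [v; lam] = 0 via the Schur complement.

    The normal equations are
        [A^T A  A^T B] [v  ]   [0]
        [B^T A  B^T B] [lam] = [0].
    Solving the second block row for lam and substituting gives S v = 0 with
        S = A^T A - A^T B (B^T B)^{-1} B^T A.
    """
    AtA, AtB, BtB = A.T @ A, A.T @ B, B.T @ B
    return AtA - AtB @ np.linalg.solve(BtB, AtB.T)

def smallest_eigenvector(S):
    """Eigenvector of the symmetric matrix S with the smallest eigenvalue."""
    _, V = np.linalg.eigh(S)  # eigenvalues are returned in ascending order
    return V[:, 0]
```

The reduced matrix is only 3 x 3 regardless of how many lines are stacked, and its smallest eigenvector matches the velocity part of the null vector of the full system up to scale.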
7.7 Canonicalization of the Manifold
Figure 7: (i) event transformations; (ii) eventail in 2D.
The incidence relation in Eq. 6 yields a simple way to visualize the manifold in its canonical form, and also shows the dependence on the line velocity parameters and . To reach this canonical form, we simply rotate the bearing vectors of all events into the line-dependent coordinate frame by replacing . We visualize this transformation in Fig. 7, where we transition from raw events in normalized coordinates (A) to derotated events (B), and then to events in the line reference frame (C). This coordinate frame corresponds to that of a line that is parallel to the camera’s -axis. Doing this replacement yields
(41)
where corresponds to the unit vectors in the camera coordinate frame. Distributing and dividing out the third component of , i.e. transitioning to normalized coordinates in the new line reference frame, we reach
(42)
where is the event -coordinate in normalized, line coordinates. This form describes the shape of the manifold in two dimensions and is visualized in Fig. 7(ii) for varying and .
From these visualizations we make a number of observations: First, configurations with trace straight lines, corresponding to planar manifolds in the line coordinate frame. Note, however, that in the derotated frame (B) these may still be non-planar. Second, we see that induces a curvature in the manifold which increases as time progresses. This configuration corresponds to a camera approaching the line, and thus the reduced distance increases the apparent motion, which results in a larger slope. Finally, results in flattened curves. This corresponds to cameras retracting from the line, which reduces the apparent motion, and thus reduces the slope in the manifold.
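The change of coordinates used for canonicalization, rotating bearing vectors into the line frame and then dividing by the third component, can be sketched as follows. The convention is an assumption: `R_line` is taken to hold the line-frame axes as columns, expressed in the camera frame, so that a camera-frame bearing `f` maps to `R_line.T @ f`. The function name is hypothetical.

```python
import numpy as np

def to_line_frame(bearings, R_line):
    """Rotate bearing vectors (one per row) into the line frame and normalize.

    For a row vector f, the product f @ R_line equals (R_line.T @ f) transposed.
    Each result is divided by its third component, giving normalized
    coordinates in the line frame. Rows whose third component is zero after
    rotation cannot be normalized and should be filtered out beforehand.
    """
    rotated = np.asarray(bearings, dtype=float) @ np.asarray(R_line, dtype=float)
    return rotated / rotated[:, 2:3]
```

After this step, only the in-plane coordinates and the timestamp of each event are needed to plot the two-dimensional curves discussed above.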
References
- Barath and Matas [2021] Daniel Barath and Jiri Matas. Graph-cut ransac: Local optimization on spatially coherent structures. IEEE TPAMI, 44(9):4961–4974, 2021.
- Brändli et al. [2016] Christian Brändli, Jonas Strubel, Susanne Keller, Davide Scaramuzza, and Tobi Delbruck. ELiSeD — an event-based line segment detector. In EBCCSP, pages 1–7, 2016.
- Bryner et al. [2019] Samuel Bryner, Guillermo Gallego, Henri Rebecq, and Davide Scaramuzza. Event-based, direct camera tracking from a photometric 3d map using nonlinear optimization. In ICRA, pages 325–331, 2019.
- Cadena et al. [2016] Cesar Cadena, Luca Carlone, Henry Carrillo, Yasir Latif, Davide Scaramuzza, José Neira, Ian Reid, and John J. Leonard. Past, present, and future of simultaneous localization and mapping: Toward the robust-perception age. IEEE T-RO, 32(6):1309–1332, 2016.
- Censi and Scaramuzza [2014] Andrea Censi and Davide Scaramuzza. Low-latency event-based visual odometry. In ICRA, pages 703–710, 2014.
- Chamorro Hernández et al. [2020] William Oswaldo Chamorro Hernández, Juan Andrade-Cetto, and Joan Solà Ortega. High-speed event camera tracking. In BMVC, 2020.
- Gallego et al. [2016] Guillermo Gallego, Jon E.A. Lund, Elias Mueggler, Henri Rebecq, Tobi Delbruck, and Davide Scaramuzza. Event-based, 6-dof camera tracking from photometric depth maps. IEEE TPAMI, 40(2):2402–2412, 2016.
- Gallego et al. [2018] Guillermo Gallego, Henri Rebecq, and Davide Scaramuzza. A unifying contrast maximization framework for event cameras, with applications to motion, depth and optical flow estimation. In CVPR, pages 3867–3876, 2018.
- Gallego et al. [2019] Guillermo Gallego, Mathias Gehrig, and Davide Scaramuzza. Focus is all you need: Loss functions for event-based vision. In CVPR, pages 12272–12281, 2019.
- Gallego et al. [2020] Guillermo Gallego, Tobi Delbrück, Garrick Orchard, Chiara Bartolozzi, Brian Taba, Andrea Censi, Stefan Leutenegger, Andrew J. Davison, Jörg Conradt, Kostas Daniilidis, and Davide Scaramuzza. Event-based vision: A survey. IEEE TPAMI, 44(1):154–180, 2020.
- Gao et al. [2022] Ling Gao, Yuxuan Liang, Jiaqi Yang, Shaoxun Wu, Chenyu Wang, Jiaben Chen, and Laurent Kneip. VECtor: A versatile event-centric benchmark for multi-sensor slam. IEEE RA-L, 7(3):8217–8224, 2022.
- Gao et al. [2023] Ling Gao, Hang Su, Daniel Gehrig, Marco Cannici, Davide Scaramuzza, and Laurent Kneip. A 5-point minimal solver for event camera relative motion estimation. In ICCV, pages 8015–8025, 2023.
- Gehrig et al. [2020] Mathias Gehrig, Sumit Bam Shrestha, Daniel Mouritzen, and Davide Scaramuzza. Event-based angular velocity regression with spiking networks. In ICRA, pages 4195–4202, 2020.
- Hartley and Zisserman [2003] Richard Hartley and Andrew Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, 2003.
- Hartley [1997] Richard I. Hartley. Lines and points in three views and the trifocal tensor. IJCV, 22(2):125–140, 1997.
- Hidalgo-Carrió et al. [2022] Javier Hidalgo-Carrió, Guillermo Gallego, and Davide Scaramuzza. Event-aided direct sparse odometry. In CVPR, pages 5771–5780, 2022.
- Huang et al. [2020] Shi-Sheng Huang, Ze-Yu Ma, Tai-Jiang Mu, Hongbo Fu, and Shi-Min Hu. Lidar-monocular visual odometry using point and line features. In ICRA, pages 1091–1097, 2020.
- Ieng et al. [2017] Sio-Hoi Ieng, João Carneiro, and Ryad B. Benosman. Event-based 3d motion flow estimation using 4d spatio temporal subspaces properties. Frontiers in Neuroscience, 10:596, 2017.
- Kim et al. [2016] Hanme Kim, Stefan Leutenegger, and Andrew J. Davison. Real-time 3d reconstruction and 6-dof tracking with an event camera. In ECCV, pages 349–364, 2016.
- Kueng et al. [2016] Beat Kueng, Elias Mueggler, Guillermo Gallego, and Davide Scaramuzza. Low-latency visual odometry using event-based feature tracks. In IROS, pages 16–23, 2016.
- Le Gentil et al. [2020] Cedric Le Gentil, Florian Tschopp, Ignacio Alzugaray, Teresa Vidal-Calleja, Roland Siegwart, and Juan Nieto. IDOL: A framework for imu-dvs odometry using lines. In IROS, pages 5863–5870, 2020.
- Lee and Civera [2019] Seong Hun Lee and Javier Civera. Closed-form optimal two-view triangulation based on angular errors. In ICCV, pages 2681–2689, 2019.
- Lim et al. [2022] Hyunjun Lim, Jinwoo Jeon, and Hyun Myung. UV-SLAM: Unconstrained line-based slam using vanishing points for structural mapping. IEEE RA-L, 7(2):1518–1525, 2022.
- Liu et al. [2020] Daqi Liu, Alvaro Parra, and Tat-Jun Chin. Globally optimal contrast maximisation for event-based motion estimation. In CVPR, pages 6348–6357, 2020.
- Lu et al. [2021] Junxin Lu, Zhijun Fang, Yongbin Gao, and Jieyu Chen. Line-based visual odometry using local gradient fitting. Journal of Visual Communication and Image Representation, 77:103071, 2021.
- Maqueda et al. [2018] Ana I. Maqueda, Antonio Loquercio, Guillermo Gallego, Narciso García, and Davide Scaramuzza. Event-based vision meets deep learning on steering prediction for self-driving cars. In CVPR, pages 5419–5427, 2018.
- Mueggler et al. [2014] Elias Mueggler, Basil Huber, and Davide Scaramuzza. Event-based, 6-dof pose tracking for high-speed maneuvers. In IROS, pages 2761–2768, 2014.
- Mueggler et al. [2018] Elias Mueggler, Guillermo Gallego, Henri Rebecq, and Davide Scaramuzza. Continuous-time visual-inertial odometry for event cameras. IEEE T-RO, 34(6):1425–1440, 2018.
- Nister et al. [2004] David Nister, Oleg Naroditsky, and James Bergen. Visual odometry. In CVPR, pages I–I, 2004.
- Peng et al. [2021] Xin Peng, Wanting Xu, Jiaqi Yang, and Laurent Kneip. Continuous event-line constraint for closed-form velocity initialization. In BMVC, 2021.
- Peng et al. [2022] Xin Peng, Ling Gao, Yifu Wang, and Laurent Kneip. Globally-optimal contrast maximisation for event cameras. IEEE TPAMI, 44(7):3479–3495, 2022.
- Rebecq et al. [2016] Henri Rebecq, Timo Horstschäfer, Guillermo Gallego, and Davide Scaramuzza. EVO: A geometric approach to event-based 6-dof parallel tracking and mapping in real-time. IEEE RA-L, 2(2):593–600, 2016.
- Seok and Lim [2020] Hochang Seok and Jongwoo Lim. Robust feature tracking in dvs event stream using bézier mapping. In WACV, pages 1658–1667, 2020.
- Stoffregen and Kleeman [2019] Timo Stoffregen and Lindsay Kleeman. Event cameras, contrast maximization and reward functions: An analysis. In CVPR, pages 12292–12300, 2019.
- Torr et al. [2002] Philip Hilaire Torr, Slawomir J. Nasuto, and John Mark Bishop. NAPSAC: High noise, high dimensional robust estimation - it’s in the bag. In BMVC, page 3, 2002.
- Vakhitov and Lempitsky [2019] Alexander Vakhitov and Victor Lempitsky. Learnable line segment descriptor for visual slam. IEEE Access, 7:39923–39934, 2019.
- Valeiras et al. [2019] David Reverter Valeiras, Xavier Clady, Sio-Hoi Ieng, and Ryad Benosman. Event-based line fitting and segment detection using a neuromorphic visual sensor. IEEE TNNLS, 30(4):1218–1230, 2019.
- Vidal et al. [2018] Antoni Rosinol Vidal, Henri Rebecq, Timo Horstschaefer, and Davide Scaramuzza. Ultimate slam? combining events, images, and imu for robust visual slam in hdr and high-speed scenarios. IEEE RA-L, 3(2):994–1001, 2018.
- Weikersdorfer et al. [2013] David Weikersdorfer, Raoul Hoffmann, and Jörg Conradt. Simultaneous localization and mapping for event-based vision systems. In ICVS, pages 133–142, 2013.
- Weikersdorfer et al. [2014] David Weikersdorfer, David B. Adrian, Daniel Cremers, and Jörg Conradt. Event-based 3d slam with a depth-augmented dynamic vision sensor. In ICRA, pages 359–364, 2014.
- Weng et al. [1992] Juyang Weng, Thomas S. Huang, and Narendra Ahuja. Motion and structure from line correspondences; closed-form solution, uniqueness, and optimization. IEEE TPAMI, 14(3):318–336, 1992.
- Yousif et al. [2015] Khalid Yousif, Alireza Bab-Hadiashar, and Reza Hoseinnezhad. An overview to visual odometry and visual slam: Applications to mobile robotics. Intelligent Industrial Systems, 1(4):289–311, 2015.
- Yuan and Ramalingam [2016] Wenzhen Yuan and Srikumar Ramalingam. Fast localization and tracking using event sensors. In ICRA, pages 4564–4571, 2016.
- Zhou et al. [2021] Yi Zhou, Guillermo Gallego, and Shaojie Shen. Event-based stereo visual odometry. IEEE T-RO, 37(5):1433–1450, 2021.
- Zhu et al. [2017] Alex Zihao Zhu, Nikolay Atanasov, and Kostas Daniilidis. Event-based visual inertial odometry. In CVPR, pages 5816–5824, 2017.
- Zuo et al. [2022] Yi-Fan Zuo, Jiaqi Yang, Jiaben Chen, Xia Wang, Yifu Wang, and Laurent Kneip. DEVO: Depth-event camera visual odometry in challenging conditions. In ICRA, pages 2179–2185, 2022.