[Figure 2: six panels. (a) Final map using Mutual Information for position and constant angular velocity; (b) final map using Mutual Information for position and orientation; (c) final map using Mutual Information for position and Fisher Information for orientation; (d)-(f) the corresponding entropy plots, log2|P| vs. time.]
Fig. 2. Trajectories with final maps and entropy. (rReal and rEst are the real and estimated camera trajectories; the label newland, together with the green dots and dotted vertical lines, marks the value of the entropy at the instants when new landmarks are initialised; Pcam, Plan, and P indicate the camera, map, and overall entropies.)
The third alternative, controlling camera orientation by maximising the Fisher Information entering the filter, has the effect of focusing on reducing the uncertainty of the already seen landmarks instead of eagerly exploring the entire room for new ones. The reason is that landmarks that have been observed for only a short period of time still have large depth uncertainty, and the Fisher Information metric is maximised when observations are directed towards them. The technique tends to close loops at a faster pace than the other two approaches, thus propagating correlations amongst landmarks and poses more efficiently. Additionally, by revisiting fiduciary points more often, orientations are much better estimated in this case.

Strategy (iii) needs more time to reduce entropy and takes longer to insert the same number of landmarks in the map. But at the point at which the same number of landmarks is available, it has lower entropy than the other two strategies (see for example Figure 2, panels (d)-(f): when the 14th landmark is added, the times are 19, 18, and 30 secs, and the entropies are -530, -550, and -610, respectively).

V. EXPERIMENTS

This section presents an initial experimental result validating the maximisation-of-mutual-information strategy for the control of a hand-held camera in a challenging 15 fps visual SLAM application. Within a room, the camera starts approximately at rest with some known object in view to act as a starting point and provide a metric scale to the proceedings. The camera moves, translating and rotating freely in 3D, according to the instructions provided in a graphical user interface and executed by the user, within a room or a restricted volume, such that various parts of the unknown environment come into view. The aim is to estimate and control the full camera pose continuously during arbitrarily long periods of movement. This involves accurately mapping (estimating the locations of) a sparse set of features in the environment.

Given that the control loop is being closed by the human operator, only displacement commands are computed; gaze control is left to the user. Furthermore, the mutual information measure requires evaluating the determinant of the full covariance matrix at each iteration. Because of the complexity of this operation, single motion predictions are evaluated one frame at a time, and only at every 15th frame in the sequence are all mutual information measures compared and a desired action displayed on screen. That is, the user is presented with motion directions to obey every second. Note also that, in computing the mutual information measure, only the camera position and map parts of the covariance matrix are used, leaving out the gaze and velocity parts of the matrix. Finally, to keep it running in real time, the resulting application must be designed for sparse mapping; with the computing capabilities of an off-the-shelf system, our current application is limited to fewer than 50 landmarks.
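To make this selection step concrete, here is a minimal sketch (ours, not the authors' code) of choosing a displacement command by the mutual information gain between the prior and predicted posterior covariances. It assumes the covariance P has already been reduced to its camera-position and map blocks, and `predict_covariance` is a hypothetical stand-in for one EKF prediction/update cycle under a candidate action:

```python
import numpy as np

def entropy_bits(P):
    """Entropy surrogate plotted in Figs. 2 and 5: log2|P|,
    computed stably via the sign/log-determinant."""
    _, logdet = np.linalg.slogdet(P)
    return logdet / np.log(2.0)

def select_action(P_prior, candidate_actions, predict_covariance):
    """Return the candidate action maximising the mutual information
    I(x; z) = 0.5 * (log2|P_prior| - log2|P_posterior|), i.e. the
    command that most reduces the entropy of the camera-position
    and map estimate."""
    best_action, best_gain = None, -np.inf
    h_prior = entropy_bits(P_prior)
    for action in candidate_actions:
        P_post = predict_covariance(P_prior, action)  # hypothetical helper
        gain = 0.5 * (h_prior - entropy_bits(P_post))
        if gain > best_gain:
            best_action, best_gain = action, gain
    return best_action
```

In the experiment, the candidate set would be the displacement commands later shown to the user (STAY, LEFT, RIGHT, FORWARD, BACKWARDS, UP; cf. Fig. 5), with one candidate prediction evaluated per frame and the winning command displayed every 15th frame.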
[Figure 3: six panels. (a) Position error when using MI for position and constant angular velocity; (b) position error when using MI for position and orientation; (c) position error when using MI for position and FI for orientation; (d)-(f) the corresponding orientation errors (quaternion components q0-q3).]
Fig. 3. Estimation errors for camera position and orientation and their corresponding 2σ variance bounds. Position errors are plotted as x, y, and z distances to the real camera location in meters, and orientation errors are plotted as quaternions.
Figure 4 shows the graphical user interface. The top part of the figure contains a 3D plot of the camera and the mapped landmarks, while the bottom part shows the information displayed to the user superimposed on the camera view. Figure 5 contains a plot of the decrease in the various entropies for the map being built, and the list of actions shown to the user during the first minute.

It is worth noticing that in the real-time implementation the system prompts the user for repeated up-down movements, as well as left-right commands. This can be explained as follows: after initialising new features, the system repeatedly asks for motions perpendicular to the line of sight in order to best reduce their uncertainty. Also, closing loops has an interesting effect on the reduction of entropy, as can be seen around the 1500th frame in Fig. 5-a.

VI. CONCLUSION

In conclusion, we have shown plausible motion strategies in a video-rate visual SLAM application. On the one hand, by choosing the maximally mutually informative motion command, we maximise the difference between prior and posterior SLAM entropies, yielding the motion command that most reduces the uncertainty of x given the knowledge of z. Alternatively, by controlling gaze to maximise the information about the measurements, we get a system that prioritises accurately locating the already seen landmarks before actively searching for new ones.

Our method is validated in a video-rate hand-held visual SLAM implementation. Given that our system is capable of producing motion commands for real-time 6DOF visual SLAM, it is sufficiently general to be incorporated into any type of mobile platform, without the need for other sensors. A possible weakness of this information-based approach is that it estimates the utility of measurements assuming that our models are correct; model discrepancies and the effects of linearisation in the computation of our estimation and control commands might lead to undesirable results.

APPENDIX

The orientation of the camera frame, and its rate of change, are related to the angular velocity by the quaternion multiplication Ω = 2q̇q*, with Ω = [0, ωx, ωy, ωz]⊤ the angular velocity vector expressed in quaternion form, and q* the orientation quaternion conjugate. Or, equivalently, by q̇ = ½Mq ≈ (q(k+1) − q(k))/∆t, with
$$
M = \begin{bmatrix}
0 & -\omega_x & -\omega_y & -\omega_z \\
\omega_x & 0 & -\omega_z & \omega_y \\
\omega_y & \omega_z & 0 & -\omega_x \\
\omega_z & -\omega_y & \omega_x & 0
\end{bmatrix}.
$$
Solving for q(k+1) in the above approximation when ω is constant, our smooth motion model for the prediction of change
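For illustration, a minimal numerical sketch (ours, not the paper's implementation): solving the finite-difference approximation above for q(k+1) under constant ω gives q(k+1) ≈ (I + (∆t/2)M) q(k); we add a renormalisation step, since the first-order solution drifts off the unit sphere:

```python
import numpy as np

def omega_matrix(w):
    """The matrix M built from the angular velocity w = (wx, wy, wz),
    as defined in the appendix."""
    wx, wy, wz = w
    return np.array([[0.0, -wx, -wy, -wz],
                     [ wx, 0.0, -wz,  wy],
                     [ wy,  wz, 0.0, -wx],
                     [ wz, -wy,  wx, 0.0]])

def predict_orientation(q, w, dt):
    """One constant-angular-velocity prediction step:
    q(k+1) = (I + (dt/2) M) q(k), renormalised so that q
    remains a unit quaternion."""
    q_next = (np.eye(4) + 0.5 * dt * omega_matrix(w)) @ q
    return q_next / np.linalg.norm(q_next)

# Example: one 15 fps frame of pure yaw at 0.5 rad/s from the identity orientation.
q = predict_orientation(np.array([1.0, 0.0, 0.0, 0.0]), (0.0, 0.0, 0.5), 1.0 / 15.0)
```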
[Figure 5: (a) entropy evolution, log2|P| vs. frames (0-2500), with curves for Pcam, Plan, and P decreasing from 0 toward −900 and newland markers at landmark initialisations; (b) the sequence of motion commands (STAY, LEFT, RIGHT, FORWARD, BACKWARDS, UP) over the first 60 s.]
Fig. 4. Feature map and camera view as shown in the Graphical User Interface (844th frame).
Fig. 5. Real-time Active Vision SLAM.