
Active Control for Single Camera SLAM

Teresa Vidal-Calleja
Institut de Robòtica i Informàtica Industrial, CSIC-UPC
Llorens Artigas 4-6, Barcelona 08028, Spain.
tvidal@iri.upc.edu

Andrew J. Davison
Department of Computing, Imperial College London
180 Queen's Gate, London SW7 2AZ, UK.
ajd@doc.ic.ac.uk

Juan Andrade-Cetto
Centre de Visió per Computador, UAB
Edifici O, Campus UAB, Bellaterra 08193, Spain.
cetto@cvc.uab.es

David W. Murray
Robotics Research Group, Department of Engineering Science, University of Oxford
Oxford OX1 3PJ, UK.
dwm@robots.ox.ac.uk

Abstract—In this paper we consider a single hand-held camera performing SLAM at video rate with generic 6DOF motion. The aim is to optimise both the localisation of the sensor and the building of the feature map by computing the most appropriate control actions or movements. The actions belong to a discrete set (e.g. go forward, go left, go up, turn right, etc.), and are chosen so as to maximise the mutual information gain between posterior states and measurements. Maximising the mutual information helps the camera avoid making the ill-conditioned measurements typical of bearing-only SLAM. Moreover, orientation changes are determined by maximising the trace of the Fisher Information Matrix. In this way, we allow the camera to continue looking at those landmarks with large uncertainty, but from better-posed directions. Various position and gaze control strategies are first tested in a simulated environment, and then validated in a video-rate implementation. Given that our system is capable of producing motion commands for real-time 6DOF visual SLAM, it could be used with any type of mobile platform, without the need for other sensors.

This research is supported by the Spanish Ministry of Education and Science under projects DPI 2004-5414 and TIC 2003-09291 and the EU PACO-PLUS project FP6-2004-IST-4-27657 to TVC and JAC, and by the UK Engineering and Physical Sciences Research Council under Advanced Research Fellowship Grant GR/N03266 to AJD and Grant GR/S97774 to DWM.

I. INTRODUCTION

Impressive advances in 2D and, more recently, 3D simultaneous localisation and mapping (SLAM) for mobile robots have been made over the last 15 years, largely using sonar and laser range sensing [1]–[5]. Most recently, there has been considerable interest in solving the SLAM problem using visual sensing, both in order to obtain more accurate 3D representations of the environment and to exploit its richer potential for scene representation [6], [7]. In this communication, we consider the problem of SLAM with a single camera carried by a human, and how to implement control strategies in this context. In that sense, this work differs from other work on control because we can only give a human quite approximate, low-frequency, easy-to-understand commands like 'left', 'right', 'stay'.

One of the first active vision-based SLAM approaches used feature correspondences from stereo image pairs [6]. The computational burden of accurately detecting and matching image pairs motivated the use of active visual sensing for landmark selection in sparse feature maps. That work differs from ours in that only the orientation of the stereo head was controlled, whereas here we control translation as well. Other reported techniques for visual SLAM — although with no control — include the use of SIFT features and matching over a trinocular rig [7]. More elegantly and economically, feature locations can also be computed by tracking landmarks over multiple views from only one camera, a process referred to as 'bearing-only SLAM'.

One key issue in bearing-only SLAM is the initialisation of feature locations. In [8], for example, the initial estimate of a landmark's location is obtained by sampling hypotheses of a 1D particle distribution along the line of sight. Another technique uses sums of Gaussian distributions to parameterise 3D feature locations over a delayed-state representation [9].

When the sensor capabilities in SLAM are limited, camera motion plays an important role in the quality of the reconstruction obtained. Driving the sensor to the locations that maximise the expected information gain from acquiring an observation at that location has been a common strategy [10]–[12]. However, Sim has shown that maximising the expected information gain leads to ill-conditioned filter updates in bearing-only SLAM [13]. In [14], Bryson et al. present simulated results of the effect different vehicle actions have on the entropic mutual information gain. The analysis is performed for a 6DOF aerial vehicle equipped with two cameras and an inertial sensor, for which landmark range, azimuth, and elevation readings are simulated, and data association is known.

In this paper we are interested in the video-rate estimation and control of a single camera's motion, moving rapidly with 6DOF in 3D in normal human environments, mapping visual features with minimal prior information about the motion dynamics. Our aim is to localise the sensor and build a feature map by computing the appropriate control actions in order to improve the overall system estimation.

However, insisting on video-rate performance using modest hardware imposes severe restrictions on the volume of computation that can take place in each 33 ms time step. Re-estimation must take place of course, but making strictly optimal camera movements would in addition require computing the derivatives of a well-chosen performance metric with respect to the inputs [15]. Such a computation remains unfeasible for a 6DOF, highly nonlinear system model. Besides, human actions can only be approximate and at low frequency. So, instead of computing the optimal motion command, we decide only upon a small set of choices.

Actions belong to a discrete set (e.g. go forward, go left, go up, turn right, etc.), and the particular movement chosen is the one that maximises the mutual information gain between posterior states and measurements. Using entropy for exploration only makes sense if we can be certain that uncertainty is reduced as landmarks are being discovered. To that end, one must first have an idea of the shape of the space to be mapped, and fill it with randomly placed features with large uncertainty [14]. Maximising the mutual information aims at reducing the overall state uncertainty, and helps the camera move away from making repeated ill-conditioned measurements. Orientation changes are determined by maximising the trace of the Fisher Information Matrix. In this way, we allow the camera still to look at those landmarks with large uncertainty, but from better-posed directions.

The remainder of the paper is ordered as follows. First we briefly describe the system and the estimation scheme. Then the metrics used as cost functions to choose the appropriate actions are explained, and our control strategy is illustrated through simulations. Lastly, we present the results of real-time experiments with a hand-held wide-angle camera, where a GUI feeds back motion commands to the user.

II. 6DOF BEARING-ONLY SLAM

A. Unconstrained Camera Motion

It is assumed that the camera could be attached to any mobile platform — in our case the hand — and is free to move in any direction in R³ × SO(3). We adopt a smooth, unconstrained, constant-velocity motion model, its translational and rotational velocities perturbed only by zero-mean, normally distributed accelerations so that on average they remain constant. The Gaussian acceleration assumption means that large impulsive changes of direction are unlikely. The camera motion prediction model is

$$
\mathbf{x}_{v(k+1|k)} =
\begin{bmatrix}
\mathbf{p}_{(k+1|k)} \\
\mathbf{q}_{(k+1|k)} \\
\mathbf{v}_{(k+1|k)} \\
\boldsymbol{\omega}_{(k+1|k)}
\end{bmatrix}
=
\begin{bmatrix}
\mathbf{p}_{(k|k)} + (\mathbf{v}_{(k|k)} + \mathbf{a}_{(k)}\,\Delta t)\,\Delta t \\
\mathbf{Q}\,\mathbf{q}_{(k|k)} \\
\mathbf{v}_{(k|k)} + \mathbf{a}_{(k)}\,\Delta t \\
\boldsymbol{\omega}_{(k|k)} + \boldsymbol{\alpha}_{(k)}\,\Delta t
\end{bmatrix},
$$

with p = [x, y, z]⊤ and q = [q0, q1, q2, q3]⊤ denoting the camera pose (three states for position and four for orientation using a unit-norm quaternion representation), and v = [vx, vy, vz]⊤ and ω = [ωx, ωy, ωz]⊤ denoting the linear and angular velocities, respectively. The subscripts (k|k) and (k+1|k) denote the posterior at time k and the prior (before integrating measurements) at k+1. The input to the system is the acceleration vector u = [a⊤, α⊤]⊤ = [ax, ay, az, αx, αy, αz]⊤.

An Extended Kalman Filter propagates the camera pose and velocity estimates, as well as the feature estimates. The state including the features y is x = [xv⊤, y⊤]⊤. The model Q for the prediction of the change in orientation is inspired by [16] and is detailed in the Appendix. The redundancy in the quaternion representation is removed by a ||q|| = 1 normalisation at each update, accompanied by the corresponding Jacobian modification.
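The following is a minimal sketch of the constant-velocity prediction step described above; the function names (`predict_camera_state`, `delta_quaternion`, `quat_mult`) are illustrative, not part of the original system, and the left-multiplication order of the incremental rotation is an assumption consistent with the transition matrix given in the Appendix.

```python
import numpy as np

def quat_mult(a, b):
    """Hamilton product a ⊗ b, quaternions stored as [q0, q1, q2, q3]."""
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return np.array([a0*b0 - a1*b1 - a2*b2 - a3*b3,
                     a0*b1 + a1*b0 + a2*b3 - a3*b2,
                     a0*b2 - a1*b3 + a2*b0 + a3*b1,
                     a0*b3 + a1*b2 - a2*b1 + a3*b0])

def delta_quaternion(omega, dt):
    """Incremental rotation quaternion for constant angular velocity over dt."""
    angle = np.linalg.norm(omega) * dt
    if angle < 1e-12:                              # negligible rotation
        return np.array([1.0, 0.0, 0.0, 0.0])
    axis = omega / np.linalg.norm(omega)
    return np.concatenate(([np.cos(angle / 2.0)], np.sin(angle / 2.0) * axis))

def predict_camera_state(p, q, v, w, a, alpha, dt):
    """One prediction step of the smooth constant-velocity 6DOF model."""
    v_new = v + a * dt                             # linear velocity + acceleration impulse
    w_new = w + alpha * dt                         # angular velocity + acceleration impulse
    p_new = p + v_new * dt                         # position integrates the updated velocity
    q_new = quat_mult(delta_quaternion(w_new, dt), q)
    q_new /= np.linalg.norm(q_new)                 # enforce the unit-norm constraint
    return p_new, q_new, v_new, w_new
```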
B. Feature Extraction

In this work we are interested in mapping the 3D coordinates of salient point features from images, and need to do so at video rate. As in previous work, we use the Shi-Tomasi saliency operator, and match correspondences in subsequent frames using normalised sum-of-squared differences [6], [8]. Although more robust detectors such as SIFT have become widely popular for their ability to find and match features with a higher degree of uniqueness, they come at the expense of a heavier computational load.

Image projection is modelled using a full-perspective wide-angle camera. The position of a 3D scene point yi is transformed into the camera frame as yi^c = [x^c, y^c, z^c]⊤ = R⊤(yi − p), with R the rotation matrix equivalent of q. The point's projection onto the image plane is

$$
\mathbf{h}_i =
\begin{bmatrix} u \\ v \end{bmatrix}
=
\begin{bmatrix} u_0 - u_c/\sqrt{d} \\ v_0 - v_c/\sqrt{d} \end{bmatrix},
\qquad (1)
$$

where u_c = f k_u x^c / z^c, v_c = f k_v y^c / z^c, the radial distortion term is d = 1 + K_d (u_c² + v_c²), and the intrinsic calibration of the camera — focal distance f, principal point (u0, v0), pixel densities k_u and k_v, and radial distortion parameter K_d — is determined beforehand.
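A direct transcription of Eq. (1) is sketched below; `project_point` is a hypothetical name, and the calibration parameters follow the definitions just given.

```python
import numpy as np

def project_point(y_i, p, R, f, ku, kv, u0, v0, Kd):
    """Project a 3D world point y_i into the image following Eq. (1).
    p is the camera position and R the rotation matrix equivalent of q."""
    xc, yc, zc = R.T @ (y_i - p)           # point expressed in the camera frame
    uc = f * ku * xc / zc                  # undistorted image-plane coordinates
    vc = f * kv * yc / zc
    d = 1.0 + Kd * (uc**2 + vc**2)         # radial distortion term
    return np.array([u0 - uc / np.sqrt(d),
                     v0 - vc / np.sqrt(d)])
```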
When an image feature is detected, its measurement must either be associated with an existing feature or added as a new feature to the map. The location of the camera, along with the locations of the already mapped features, is used to predict each feature position hi using Eq. (1), and these estimates are checked against the measurements using a nearest-neighbour test. Feature search is constrained to 3σ elliptical regions around the image estimates, as defined by the innovation covariance matrix Si = Hi Pk+1|k Hi⊤ + R, with Hi the Jacobian of the sensor model with respect to the state, Pk+1|k the prior state covariance, and the measurements zi assumed corrupted by zero-mean Gaussian noise with covariance R.
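As a sketch, the 3σ gating test above can be written as a Mahalanobis-distance check on the innovation; `innovation_gate` is a hypothetical helper, not a function from the authors' implementation.

```python
import numpy as np

def innovation_gate(z_i, h_i, H_i, P_prior, R_meas, n_sigma=3.0):
    """Accept measurement z_i for predicted feature h_i if it falls inside the
    n-sigma ellipse of the innovation covariance S_i = H_i P H_i^T + R."""
    S_i = H_i @ P_prior @ H_i.T + R_meas
    nu = z_i - h_i                                  # innovation
    d2 = nu.T @ np.linalg.solve(S_i, nu)            # squared Mahalanobis distance
    return d2 <= n_sigma**2
```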
C. Initialisation

Inserting a new feature into the map cannot be done immediately because the measurement model is non-invertible: though bearing is recoverable from one measurement, 3D depth is not.

Several schemes have been reported [8], [9], [17], and we adopt the first of these. The initial measurement results in a semi-infinite line with Gaussian uncertainty in its parameters, starting at the estimated camera position and heading to infinity along the feature viewing direction. A 1D particle distribution represents the likelihood of the 3D feature's position along this line. The line is projected as an epipolar line into subsequent images, but specifically it is the projection of the point particles and their uncertainty ellipses that provides the regions to be searched for a match, in turn producing likelihoods for Bayesian re-weighting of the depth distribution. A small number of steps is required to reduce the ratio of the standard deviation in depth to the depth estimate itself below a threshold. At that time, the depth distribution is re-approximated as Gaussian and the feature is initialised as a 3D point yi in the map.
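A minimal sketch of this depth-particle initialisation follows. The uniform depth prior, the particle count, and the helper names are assumptions for illustration; the 0.3 ratio threshold is the value quoted later for the simulations.

```python
import numpy as np

def init_depth_particles(d_min=0.5, d_max=5.0, n=100):
    """Uniform 1D particle distribution of depth hypotheses along the viewing ray
    (assumed prior depth range; the original range is not specified here)."""
    depths = np.linspace(d_min, d_max, n)
    weights = np.full(n, 1.0 / n)
    return depths, weights

def reweight(depths, weights, match_likelihood):
    """Bayesian re-weighting of the depth hypotheses given per-particle match
    likelihoods measured in the current image."""
    weights = weights * match_likelihood(depths)
    return weights / weights.sum()

def ready_to_convert(depths, weights, ratio_threshold=0.3):
    """Convert to a Gaussian 3D point once std(depth)/mean(depth) is small enough."""
    mean = np.sum(weights * depths)
    std = np.sqrt(np.sum(weights * (depths - mean) ** 2))
    return std / mean < ratio_threshold, mean, std
```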
III. INFORMATION GAIN

This section first presents a metric for the expected information gain resulting from performing a given action, and then develops an overall information-conditioning strategy for the computation of orientations. The aim will be to move the camera in the direction that most reduces the uncertainty in the entire SLAM state, using the information that would be gained from future, predicted landmark observations were such a move to be made, while taking into account the information lost as a result of moving with uncertainty.

Fig. 1. Maximisation of mutual information for the evaluation of motion commands. A simple 2DOF camera is located at the centre of the plot, and a decision where to move must be taken as a function of the pose and landmark states, and the expected measurements. Three landmarks are located to its left, front, and right-front. Moving to the location in between landmarks 2 and 3 maximises the mutual information between the SLAM prior and the measurements for this particular example.
A. Mutual Information Gain

We adopt entropy as a measure of uncertainty; that is, as a measure of how much randomness there is in our state estimate. Entropy is defined as H(X) = −Σ_x p(x) log p(x), which, for our case where p(x) is an n-variate Gaussian distribution, reduces to H(X) = ½ log((2πe)^n |P|).

Now consider the following two random vectors: the state prior xk+1|k, and the prediction of measurement i, zi,k+1|k. We want to choose the action that maximises the mutual information between the two. The mutual information is defined as the relative entropy between the joint distribution p(x, zi) and the marginals p(x) and p(zi),

$$
I(X;Z) = \sum_{x \in X,\, z_i \in Z} p(x, z_i)\,\log\frac{p(x, z_i)}{p(x)\,p(z_i)}
       = H(X) + H(Z) - H(X,Z)
       = H(X) - H(X|Z),
$$

which, for our Gaussian multivariate case, evaluates to

$$
I(X;Z) = \frac{1}{2}\log\frac{|\mathbf{P}_x|}{|\mathbf{P}_x - \mathbf{P}_{xz}\mathbf{P}_z^{-1}\mathbf{P}_{xz}^{\top}|}
       = \frac{1}{2}\log\frac{|\mathbf{P}_{k+1|k}|}{|\mathbf{P}_{k+1|k} - \mathbf{P}_{k+1|k}\mathbf{H}_i^{\top}\mathbf{S}_i^{-1}\mathbf{H}_i\mathbf{P}_{k+1|k}|}
       = \frac{1}{2}\left(\log|\mathbf{P}_{k+1|k}| - \log|\mathbf{P}_{k+1|k+1}|\right).
$$

Thus, in choosing a maximally mutually informative motion command, we are maximising the difference between the prior and posterior entropies [18]. In other words, we are choosing the motion command that most reduces the uncertainty of x due to the knowledge of z as a result of a particular action. Figure 1 shows the directions maximising the mutual information for a simple 2DOF camera and 3 landmarks.

Note that the use of mutual information only makes sense prior to reaching full correlation. In SLAM, |Pk|k| tends asymptotically to zero, the point at which the map becomes fully correlated and there is nothing else the camera can do to improve the estimates of the features. From then on, entropy can still be used to decide what actions to take to reduce the camera's own uncertainty, and this can be done simply by replacing x with xv in the above discussion.
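The sketch below evaluates the mutual information gain of one candidate action from the prior covariance and the stacked measurement Jacobian of the features predicted to be visible; the function names are hypothetical, and log-determinants are computed via Cholesky factors purely for numerical convenience.

```python
import numpy as np

def logdet(P):
    """Numerically stable log-determinant of a covariance via its Cholesky factor."""
    L = np.linalg.cholesky(P)
    return 2.0 * np.sum(np.log(np.diag(L)))

def mutual_information_gain(P_prior, H, R_meas):
    """I(X;Z) = 0.5 * (log|P_prior| - log|P_posterior|) for the measurements
    predicted from a candidate pose, with stacked Jacobian H and noise R_meas."""
    S = H @ P_prior @ H.T + R_meas                  # innovation covariance
    K = P_prior @ H.T @ np.linalg.inv(S)            # Kalman gain
    P_post = P_prior - K @ H @ P_prior              # covariance after the predicted update
    return 0.5 * (logdet(P_prior) - logdet(P_post))
```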
B. Fisher Information for Gaze Direction

Measurements in the bearing-only SLAM case are ill-posed for motions along the principal axis, when points are close to the principal axis and there is little perspective distortion. Motion commands based on the maximisation of the mutual information metric drive the camera away from those configurations, that is, perpendicular to the principal axis. However, we still want the camera to look at those landmarks with large uncertainty so as to reduce their covariance when they are seen from different locations. To do that, we incorporate another information metric to control the direction of gaze. From a set of possible orientation changes, we propose choosing the one which maximises the trace of the Fisher Information Matrix. In this way we will be choosing the best direction to look at, in the sense that it is the most informative one, but from a different position than the ill-posed one. Under the Gaussian assumption for sensor and platform noises, the minimisation of the least-squares criterion (the KF) is equivalent to the maximisation of a likelihood function Λ(x) given the set of observations Z^k, that is, the maximisation of the joint pdf of the entire history of observations, Λ(x) = ∏_{i=1}^{k} p(zi | x, Z^{i−1}).

The Total Fisher Information Matrix, a quantification of the maximum information existing in the observations about the state, is defined in [19] as the expectation

$$
\mathbf{J} = E\!\left[\left(\nabla \log \Lambda(\mathbf{x})\right)\left(\nabla \log \Lambda(\mathbf{x})\right)^{\top}\right],
$$

which here evaluates to J = Σ H⊤ S⁻¹ H.

The information contributed to the reconstruction of the state by the set of measurements at each iteration is contained in H⊤S⁻¹H. The eigenvalues λj of this contribution to J show which linear combinations of the states can be estimated with good accuracy and which will have large uncertainties from the coming measurements. They also show which linear combinations of states are unobservable. When one dimension of J has a very small eigenvalue (information along the line of sight), the product of the eigenvalues (the determinant) is not a reliable measure of the elongation of the information hyperellipsoid, as it collapses the volume to zero. Our strategy is instead to look in the direction for which Σ λj is maximum [20]. This is the viewing direction that will introduce the largest amount of information in a single measurement step.

Under a Fisher information motion strategy, maximally informative actions move the robot as close as possible to the landmarks under observation. We do not want to move towards them, but only to orient towards them. Our idea in using the Fisher Information is only to fixate the camera on the most uncertain landmarks, and to use the change in entropy to select movement actions. This way, by using the mutual information metric, maximally informative actions prevent the camera from producing ill-posed measurements. Note that an omnidirectional sensor would not require a strategy to direct fixation. In our case, as opposed to a mobile robot, translation and orientation changes are kinematically decoupled; for this reason, it makes sense to use different information measures in evaluating them.
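A sketch of the gaze-scoring criterion follows. The trace of the per-step contribution H⊤S⁻¹H equals the sum of its eigenvalues Σλj; `jacobian_for` is a hypothetical callback mapping a candidate orientation to the stacked Jacobian of the landmarks visible from it.

```python
import numpy as np

def fisher_trace(P_prior, H, R_meas):
    """Trace of the per-step Fisher information contribution H^T S^-1 H,
    i.e. the sum of its eigenvalues, used to score a candidate gaze direction."""
    S = H @ P_prior @ H.T + R_meas
    return np.trace(H.T @ np.linalg.solve(S, H))

def best_gaze(candidate_orientations, jacobian_for, P_prior, R_meas):
    """Pick the orientation whose predicted measurements maximise the Fisher trace."""
    scores = [fisher_trace(P_prior, jacobian_for(o), R_meas)
              for o in candidate_orientations]
    return candidate_orientations[int(np.argmax(scores))]
```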
IV. CONTROL STRATEGY

In this section we demonstrate in simulation how combining the two strategies — controlling translation by maximising mutual information and controlling orientation by maximising the information available from the new position — yields reliable active control of pose and velocity for a freely moving camera, whilst building a map optimally.

A. Deciding Where to Go and Where to Look

As noted earlier, the real-time requirements of the task preclude an optimal control decision that takes into account all possible motion commands: such a decision is impracticable to compute, leading to exponential growth because of the curse of dimensionality in long-term action evaluation. Instead, we evaluate our information metrics for a small set of actions carried out over a fixed amount of time, and choose the best action from those.

The set of possible actions is divided into two groups. Mutual information is evaluated for the translational actions go_forward, go_backwards, go_right, go_left, go_up, go_down, and stay; and Fisher information is maximised over the set of orientation commands turn_right, turn_left, and stay.

In our simulated setting, desired camera locations are predicted for the best action chosen, and a PD low-level control law is applied to ensure these locations are reached at the end of one second; at that point the motion metric is again evaluated to determine the next desired action. Orientations, however, are evaluated at frame rate, leaving the system free to rotate, governed only by the information maximisation strategy.

The simulation considers a fixed number of expected landmarks to be found, and both the Mutual Information and Fisher Information metrics are computed taking into account the corresponding full covariance matrices, including the unvisited landmarks, which have been initialised with large uncertainties. This is the only thing that prevents our control strategies from defaulting to homeostasis.
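To make the two-group action selection concrete, the sketch below scores each discrete command and returns the winner, reusing the `mutual_information_gain` and `fisher_trace` helpers sketched in Section III. The axis convention of the 30 cm displacements and the callback names are assumptions for illustration only.

```python
import numpy as np

# Illustrative discrete action sets; each translational action maps to a 30 cm
# displacement in the corresponding direction, as in the simulations below.
# (The axis assignment here is an assumption, not the authors' convention.)
TRANSLATIONS = {
    "go_forward":   np.array([0.0, 0.0,  0.3]),
    "go_backwards": np.array([0.0, 0.0, -0.3]),
    "go_right":     np.array([ 0.3, 0.0, 0.0]),
    "go_left":      np.array([-0.3, 0.0, 0.0]),
    "go_up":        np.array([0.0,  0.3, 0.0]),
    "go_down":      np.array([0.0, -0.3, 0.0]),
    "stay":         np.array([0.0, 0.0, 0.0]),
}
TURNS = ["turn_right", "turn_left", "stay"]

def choose_translation(p, P_prior, predict_jacobian, R_meas):
    """Score each candidate displacement by the mutual information gain of the
    measurements predicted from that location and return the best action."""
    best_action, best_gain = None, -np.inf
    for name, delta in TRANSLATIONS.items():
        H = predict_jacobian(p + delta)        # stacked Jacobian of visible features
        gain = mutual_information_gain(P_prior, H, R_meas)
        if gain > best_gain:
            best_action, best_gain = name, gain
    return best_action

def choose_turn(q, P_prior, jacobian_for_turn, R_meas):
    """Score each candidate orientation command by the Fisher information trace."""
    scores = [fisher_trace(P_prior, jacobian_for_turn(q, t), R_meas) for t in TURNS]
    return TURNS[int(np.argmax(scores))]
```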
B. Simulation Results

Figure 2 contains simulation results from our mutual information strategy for the computation of motion commands, and compares various orientation computation schemes. The simulated environment represents a room 6 × 6 × 2 m³ in size containing 33 randomly distributed point landmarks, of which 6 are fiduciary points used as global references [21].

The initial standard deviation in camera pose is 6 cm in the x and y directions, 4.6 cm in height z, and 45° in orientation, right after matching the fiduciary points but before any motion takes place. The sensor standard deviation is set at 2 pixels, and data association is not known a priori. Instead, nearest-neighbour χ² tests are computed to guarantee correct matching. New features are initialised once the ratio of their depth standard deviation to their depth estimate falls below a threshold of 0.3.

The plots show the results of actively moving a 6DOF camera whilst building a map of 3D features. In all cases, each of the seven motion actions produces a displacement of 30 cm in the corresponding direction. Our mutual information metric is evaluated at each of these positions. The action that maximises the metric is chosen, and the camera is controlled to reach that position in one second with a PD control law. Orientation changes are computed every 50 ms.

Three approaches were tested for the computation of gaze commands: (i) constant rotational velocity of 0.2 rad/s, frames (a,d); (ii) maximisation of mutual information for both the position and orientation of the moving camera, frames (b,e); and (iii) maximisation of mutual information for position and maximisation of Fisher information for gaze, frames (c,f). The experiment shown in the plots lasted 35 seconds.

The constant rotational velocity and the mutual information strategies tend to insert landmarks into the map at a faster pace than the Fisher Information strategy. As can be seen in the error plots in Figure 3, this might not always be the best choice. It seems reasonable to let the system accurately locate the already seen landmarks before actively searching for new ones.

Fig. 2. Trajectories with final maps and entropy. (a) Final map using Mutual Information for position and constant angular velocity. (b) Final map using Mutual Information for position and orientation. (c) Final map using Mutual Information for position and Fisher Information for orientation. (d) Entropy for MI in position and constant angular velocity. (e) Entropy for MI in position and orientation. (f) Entropy for MI in position and FI in orientation. rReal and rEst are the real and estimated camera trajectories; the label newland and the dots and dotted vertical lines mark the value of entropy at the instant when new landmarks are initialised; Pcam, Plan, and P indicate the camera, map, and overall entropies.

The third alternative, controlling camera orientation by maximising the Fisher Information entering the filter, has the effect of focusing on reducing the uncertainty of the already seen landmarks, instead of eagerly exploring the entire room for new landmarks. The reason is that landmarks that have been observed for only a short period of time still have large depth uncertainty, and the Fisher Information metric is maximised when observations are directed towards them. This technique tends to close loops at a faster pace than the other two approaches, thus propagating correlations amongst landmarks and poses in a more efficient way. Additionally, by revisiting fiduciary points more often, orientations are much better estimated in this case.

Strategy (iii) needs more time to reduce entropy and takes more time to insert the same number of landmarks into the map. But, at the point at which the same number of landmarks is available, it has lower entropy than the other two strategies (see for example in Figure 2, frames (d-f), that when the 14th landmark is added, the times are 19, 18, and 30 s, and the entropies are -530, -550, and -610).

V. EXPERIMENTS

This section presents an initial experimental result validating the maximisation of mutual information strategy for the control of a hand-held camera in a challenging 15 fps visual SLAM application. Within a room, the camera starts approximately at rest with some known object in view to act as a starting point and provide a metric scale to the proceedings. The camera moves, translating and rotating freely in 3D, according to the instructions provided in a graphical user interface and executed by the user, within a room or a restricted volume, such that various parts of the unknown environment come into view. The aim is to estimate and control the full camera pose continuously during arbitrarily long periods of movement. This involves accurately mapping (estimating the locations of) a sparse set of features in the environment.

Given that the control loop is being closed by the human operator, only displacement commands are computed; gaze control is left to the user. Furthermore, the mutual information measure requires evaluating the determinant of the full covariance matrix at each iteration. Because of the complexity of this operation, single motion predictions are evaluated one frame at a time. It is not until the 15th frame in the sequence that all mutual information measures are compared and a desired action is displayed on screen. That is, the user is presented with motion directions to obey every second. Note also that, in computing the mutual information measure, only the camera position and map parts of the covariance matrix are used, leaving out the gaze and velocity parts of the matrix. Finally, to keep it running in real time, the resulting application must be designed for sparse mapping. That is, with the computing capabilities of an off-the-shelf system, our current application is limited to fewer than 50 landmarks.
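A rough sketch of this amortised evaluation is given below, under the assumption that one candidate displacement is scored per frame and the comparison is made every 15 frames (once a second at 15 fps); the class name, the scheduling details, and the `mi_of_action` callback are illustrative only.

```python
class AmortisedActionSelector:
    """Score one candidate displacement per frame and emit a decision once per
    period (e.g. every 15 frames at 15 fps), spreading the cost of the full
    mutual-information comparison over the video sequence."""

    def __init__(self, actions, mi_of_action, period=15):
        self.actions = list(actions)          # e.g. the seven translational commands
        self.mi_of_action = mi_of_action      # action -> mutual information gain
        self.period = period
        self.scores = {}

    def step(self, frame_idx):
        """Call once per frame; returns the chosen action on decision frames, else None."""
        i = frame_idx % self.period
        if i < len(self.actions):             # score one candidate this frame
            action = self.actions[i]
            self.scores[action] = self.mi_of_action(action)
        if i == self.period - 1:              # compare all candidates once per period
            best = max(self.scores, key=self.scores.get)
            self.scores.clear()
            return best                       # displayed to the user in the GUI
        return None
```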
Fig. 3. Estimation errors for camera position and orientation and their corresponding 2σ variance bounds. Position errors are plotted as x, y, and z distances to the real camera location in metres, and orientation errors are plotted as quaternions. (a) Position error when using MI for position and constant angular velocity. (b) Position error when using MI for position and orientation. (c) Position error when using MI for position and FI for orientation. (d) Orientation error when using MI for position and constant angular velocity. (e) Orientation error when using MI for position and orientation. (f) Orientation error when using MI for position and FI for orientation.

Figure 4 shows the graphical user interface. The top part of the figure contains a 3D plot of the camera and the landmarks mapped, while the bottom part shows the information being displayed to the user superimposed on the camera view. Figure 5 contains a plot of the decrease in the various entropies for the map being built, and the list of actions chosen as shown to the user during the first minute.

Fig. 4. Feature map and camera view as shown in the Graphical User Interface (844th frame).

Fig. 5. Real-time Active Vision SLAM. (a) Camera, Map, and Total Entropies. (b) Actions for the first minute.

Worth noticing is that, in the real-time implementation, the system prompts the user for repeated up-down movements as well as left-right commands. This can be explained by the fact that, after initialising new features, the system repeatedly asks for motions perpendicular to the line of sight, which best reduce their uncertainty. Also, closing loops has an interesting effect on the reduction of entropy, as can be seen around the 1500th frame in Fig. 5-a.

VI. CONCLUSION

In conclusion, we have shown plausible motion strategies in a video-rate visual SLAM application. On the one hand, by choosing a maximally mutually informative motion command, we are maximising the difference between prior and posterior SLAM entropies, resulting in the motion command that most reduces the uncertainty of x due to the knowledge of z. Alternatively, by controlling gaze to maximise the information about the measurements, we get a system that prioritises accurately locating the already seen landmarks before actively searching for new ones.

Our method is validated in a video-rate hand-held visual SLAM implementation. Given that our system is capable of producing motion commands for real-time 6DOF visual SLAM, it is sufficiently general to be incorporated into any type of mobile platform, without the need for other sensors.

A possible weakness of this information-based approach is that it estimates the utility of measurements assuming that our models are correct. Model discrepancies, and the effects of linearisation in the computation of our estimation and control commands, might lead to undesirable results.

APPENDIX

The orientation of the camera frame and its rate of change are related to the angular velocity by the quaternion multiplication Ω = 2 q̇ q*, with Ω = [0, ωx, ωy, ωz]⊤ the angular velocity vector expressed in quaternion form, and q* the orientation quaternion conjugate. Equivalently, q̇ = ½ M q ≈ (q(k+1) − q(k))/∆t, with

$$
\mathbf{M} =
\begin{bmatrix}
0 & -\omega_x & -\omega_y & -\omega_z \\
\omega_x & 0 & -\omega_z & \omega_y \\
\omega_y & \omega_z & 0 & -\omega_x \\
\omega_z & -\omega_y & \omega_x & 0
\end{bmatrix}.
$$

Solving for q(k+1) in the above relation when ω is constant, our smooth motion model for the prediction of the change in orientation becomes q(k+1) = Q q(k), with the quaternion transition matrix

$$
\mathbf{Q} = \cos\!\left(\frac{\|\boldsymbol{\Omega}\|\,\Delta t}{2}\right)\mathbf{I}
 + \frac{1}{\|\boldsymbol{\Omega}\|}\sin\!\left(\frac{\|\boldsymbol{\Omega}\|\,\Delta t}{2}\right)\mathbf{M}.
$$

Note that when computing the quaternion propagation, the angular velocities are to be evaluated at (k+1|k), i.e., including the angular acceleration term.
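The sketch below builds the transition matrix Q from ω and ∆t using the standard closed-form solution of q̇ = ½ M q for constant angular velocity; the function name is hypothetical.

```python
import numpy as np

def quaternion_transition(omega, dt):
    """Quaternion transition matrix Q such that q_{k+1} = Q q_k for constant
    angular velocity omega over dt (closed-form solution of q_dot = 0.5 M q,
    with M as defined in the Appendix)."""
    wx, wy, wz = omega
    M = np.array([[0.0, -wx, -wy, -wz],
                  [ wx, 0.0, -wz,  wy],
                  [ wy,  wz, 0.0, -wx],
                  [ wz, -wy,  wx, 0.0]])
    norm = np.linalg.norm(omega)
    if norm < 1e-12:                       # negligible rotation: first-order expansion
        return np.eye(4) + 0.5 * dt * M
    half = 0.5 * dt * norm
    return np.cos(half) * np.eye(4) + (np.sin(half) / norm) * M

# Example prediction step: omega is evaluated at (k+1|k), i.e. it already
# includes the angular acceleration term alpha * dt.
# q_next = quaternion_transition(omega_pred, dt) @ q
# q_next /= np.linalg.norm(q_next)         # re-impose the unit-norm constraint
```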
REFERENCES

[1] R. C. Smith and P. Cheeseman, "On the representation and estimation of spatial uncertainty," Int. J. Robot. Res., vol. 5, no. 4, pp. 56–68, 1986.
[2] M. W. M. G. Dissanayake, P. Newman, S. Clark, H. F. Durrant-Whyte, and M. Csorba, "A solution to the simultaneous localization and map building (SLAM) problem," IEEE Trans. Robot. Automat., vol. 17, no. 3, pp. 229–241, Jun. 2001.
[3] F. Lu and E. Milios, "Globally consistent range scan alignment for environment mapping," Auton. Robot., vol. 4, no. 4, pp. 333–349, 1997.
[4] U. Frese, P. Larsson, and T. Duckett, "A multigrid algorithm for simultaneous localization and mapping," IEEE Trans. Robot., vol. 21, no. 2, pp. 1–12, 2005.
[5] S. Thrun, Y. Liu, D. Koller, A. Y. Ng, Z. Ghahramani, and H. Durrant-Whyte, "Simultaneous localization and mapping with sparse extended information filters," Int. J. Robot. Res., vol. 23, no. 7-8, pp. 693–716, Jul. 2004.
[6] A. J. Davison and D. W. Murray, "Simultaneous localisation and map-building using active vision," IEEE Trans. Pattern Anal. Machine Intell., vol. 24, no. 7, pp. 865–880, Jul. 2002.
[7] S. Se, D. Lowe, and J. Little, "Mobile robot localization and mapping with uncertainty using scale-invariant visual landmarks," Int. J. Robot. Res., vol. 21, no. 8, pp. 735–758, Aug. 2002.
[8] A. Davison, W. Mayol, and D. Murray, "Real-time localisation and mapping with wearable active vision," in Proc. IEEE Int. Sym. Mixed and Augmented Reality, Tokyo, Oct. 2003.
[9] J. Sola, A. Monin, M. Devy, and T. Lemaire, "Undelayed initialization in bearing only SLAM," in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., Edmonton, Aug. 2005.
[10] P. Whaite and F. Ferrie, "Autonomous exploration: Driven by uncertainty," in Proc. 9th IEEE Conf. Comput. Vision Pattern Recog., Seattle, Jun. 1994, pp. 339–346.
[11] H. Feder, J. Leonard, and C. Smith, "Adaptive mobile robot navigation and mapping," Int. J. Robot. Res., vol. 18, pp. 650–668, 1999.
[12] F. Bourgault, A. Makarenko, S. Williams, and B. Grocholsky, "Information based adaptive robotic exploration," in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., Lausanne, Oct. 2002.
[13] R. Sim, "Stable exploration for bearings-only SLAM," in Proc. IEEE Int. Conf. Robot. Automat., Barcelona, Apr. 2005, pp. 2422–2427.
[14] M. Bryson and S. Sukkarieh, "An information-theoretic approach to autonomous navigation and guidance of an uninhabited aerial vehicle in unknown environments," in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., Edmonton, Aug. 2005.
[15] A. Adam, E. Rivlin, and I. Shimshoni, "Computing the sensory uncertainty field of a vision-based localization sensor," IEEE Trans. Robot. Automat., vol. 17, no. 3, pp. 258–267, Jun. 2001.
[16] T. Broida, S. Chandrashekhar, and R. Chellappa, "Recursive 3-D motion estimation from a monocular image sequence," IEEE Trans. Aerosp. Electron. Syst., vol. 26, no. 4, pp. 639–656, Jul. 1990.
[17] T. Bailey, "Constrained initialisation for bearing-only SLAM," in Proc. IEEE Int. Conf. Robot. Automat., vol. 2, Taipei, Sep. 2003, pp. 1966–1971.
[18] D. J. C. MacKay, "Information based objective functions for active data selection," Neural Comput., vol. 4, no. 4, pp. 589–603, 1992.
[19] Y. Bar-Shalom, X. R. Li, and T. Kirubarajan, Estimation with Applications to Tracking and Navigation. New York: John Wiley & Sons, 2001.
[20] R. Sim and N. Roy, "Global A-optimal robot exploration in SLAM," in Proc. IEEE Int. Conf. Robot. Automat., Barcelona, Apr. 2005, pp. 673–678.
[21] J. Andrade-Cetto and A. Sanfeliu, "The effects of partial observability when building fully correlated maps," IEEE Trans. Robot., vol. 21, no. 4, pp. 771–777, Aug. 2005.
